intro

I’ve always been curious about emulation, hypervisors, etc., so I started to read about it and came across an awesome blog post explaining the basics of emulation with chip8 as an example.

It demystified a lot without spoiling too much, which motivated me to write my own emulator from scratch using Rust and targeting WebAssembly: chiphuit.

First of all, I didn’t want to use third-party crates that would have spoiled the journey, such as gloo. I wanted to learn to use web-sys in order to control the interaction between my code and the browser as much as possible. This taught me a lot about browser APIs, since web-sys types map directly to the standard Web APIs. According to its documentation, web-sys “is sort of like the libc crate, but for the Web”.

Also, many Rust/WebAssembly projects tend to handle a lot of logic on the browser side with JavaScript code. I didn’t want to do that and focused on doing as much as possible in Rust.

chip8 vm

So, what’s the chip8 VM?

chip8 is a VM developed by Joseph Weisbecker in the late 70’s that allowed computers to run 2D monochrome games such as the famous Space Invaders, Pong, or Tetris.

It’s one of the simplest VMs so it’s often chosen to build a first toy emulator (yes that’s my case!).

Its screen resolution is 64x32 (only black & white pixels), and it has a keypad made of 16 keys to handle user input.

It also has everything a stack based VM needs:

  • 4096 bytes of memory
  • 16 registers
  • an index register
  • a program counter
  • a stack & a stack pointer
  • 35 different opcodes

NB: An opcode is a set of bits that identifies one atomic operation of a given VM.

Thus we can store these in an Emulator struct.

struct Emulator {
    pub current_opcode: OpCode,
    memory: [u8; 4096],

    registers: [u8; 16],
    index_register: u16,
    program_counter: u16,

    pub screen: [bool; 64 * 32],

    pub stack: [u16; 16],
    stack_pointer: usize,

    delay_timer: u8,
    sound_timer: u8,

    pub keypad: Rc<RefCell<[bool; 16]>>,

    pub rom_buffer: Rc<RefCell<Vec<u8>>>,

    pub running: Rc<RefCell<bool>>,
}

Each field is defined by a primitive type:

  • u8 is an unsigned integer encoded on 8 bits.
  • u16 is an unsigned integer encoded on 16 bits.
  • usize is the pointer-sized unsigned integer type.
  • [T; n] is an array of type T and length n.
  • OpCode is a struct made of 4 u8.

pub struct OpCode {
    pub first_nibble: u8,
    pub second_nibble: u8,
    pub third_nibble: u8,
    pub fourth_nibble: u8,
}

Rust is a safe language and has concepts other languages do not have. The borrow checker is one of them and Rc<RefCell<T>> is just a way to check borrowing rules correctness at runtime instead of compile time. But that’s absolutely not the point of this blogpost.

how it works

When the Emulator is initialized, all its fields are set to their default values (basically everything is set to 0 / false, except program_counter, which starts at 512).

impl Emulator {
    pub fn new() -> Emulator {
        Emulator {
            current_opcode: OpCode {
                first_nibble: 0_u8,
                second_nibble: 0_u8,
                third_nibble: 0_u8,
                fourth_nibble: 0_u8,
            },
            memory: [0; 4096],

            registers: [0; 16],
            index_register: 0,
            program_counter: 512,

            screen: [false; 64 * 32],

            stack: [0; 16],
            stack_pointer: 0,

            delay_timer: 0,
            sound_timer: 0,

            keypad: Rc::new(RefCell::new([false; 16])),

            running: Rc::new(RefCell::new(false)),

            rom_buffer: Rc::new(RefCell::new(Vec::new())),
        }
    }
}

The Emulator then loads its fonts in memory.

pub const FONTS: [u8; 80] = [
    0xF0, 0x90, 0x90, 0x90, 0xF0, // 0
    0x20, 0x60, 0x20, 0x20, 0x70, // 1
    0xF0, 0x10, 0xF0, 0x80, 0xF0, // 2
    0xF0, 0x10, 0xF0, 0x10, 0xF0, // 3
    0x90, 0x90, 0xF0, 0x10, 0x10, // 4
    0xF0, 0x80, 0xF0, 0x10, 0xF0, // 5
    0xF0, 0x80, 0xF0, 0x90, 0xF0, // 6
    0xF0, 0x10, 0x20, 0x40, 0x40, // 7
    0xF0, 0x90, 0xF0, 0x90, 0xF0, // 8
    0xF0, 0x90, 0xF0, 0x10, 0xF0, // 9
    0xF0, 0x90, 0xF0, 0x90, 0x90, // A
    0xE0, 0x90, 0xE0, 0x90, 0xE0, // B
    0xF0, 0x80, 0x80, 0x80, 0xF0, // C
    0xE0, 0x90, 0x90, 0x90, 0xE0, // D
    0xF0, 0x80, 0xF0, 0x80, 0xF0, // E
    0xF0, 0x80, 0xF0, 0x80, 0x80, // F
];

pub fn load_font(&mut self) {
    self.memory[0..80].copy_from_slice(&FONTS);
}

It’s just a constant u8 array that encodes the 0-9A-F charset. Each character takes 5 bytes (one per row), and only the first nibble of each byte is used, as follows:

hexadecimal             binary             visual
   0xF0               1111 0000             ████
   0x90               1001 0000             █  █
   0x90               1001 0000             █  █
   0x90               1001 0000             █  █
   0xF0               1111 0000             ████
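To make the encoding concrete, here is a minimal sketch that turns the 5 bytes of one glyph into text rows. render_glyph is a hypothetical helper written for this post, not part of chiphuit; it only looks at the first nibble of each byte, as in the table above.

```rust
/// Renders one 5-byte glyph as 5 rows of 4 characters.
/// Hypothetical helper for illustration; not part of the emulator.
fn render_glyph(glyph: &[u8; 5]) -> Vec<String> {
    glyph
        .iter()
        .map(|&byte| {
            // Only the high nibble (the 4 leftmost bits) carries pixels.
            (0..4u32)
                .map(|col| if byte & (0x80u8 >> col) != 0 { '█' } else { ' ' })
                .collect::<String>()
        })
        .collect()
}
```

Feeding it the first 5 bytes of FONTS gives back the rows of the "0" glyph shown in the table.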

Then the user has to supply a ROM (Tetris for example) to the emulator. The Emulator copies the ROM’s bytes into the RAM (memory field of the Emulator struct), starting from offset 512, according to cowgod’s chip8 technical reference. That’s also why the program_counter is initialized with the value 512.

pub fn load_rom(&mut self) {
    let rom_length = self.rom_buffer.borrow().len();
    self.memory[512..512 + rom_length].copy_from_slice(&self.rom_buffer.borrow());
    self.rom_buffer.borrow_mut().clear();
}
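Note that slicing memory[512..512 + rom_length] panics if the ROM is larger than the 3584 bytes left after offset 512. Here is a minimal sketch of a checked variant, working on plain slices instead of the Emulator struct. load_rom_checked, MEMORY_SIZE and PROGRAM_START are names I made up for this example.

```rust
const MEMORY_SIZE: usize = 4096;
const PROGRAM_START: usize = 512;

/// Copies `rom` into `memory` starting at offset 512.
/// Returns an error instead of panicking when the ROM does not fit.
/// Hypothetical variant of load_rom, for illustration only.
fn load_rom_checked(memory: &mut [u8; MEMORY_SIZE], rom: &[u8]) -> Result<(), String> {
    let capacity = MEMORY_SIZE - PROGRAM_START;
    if rom.len() > capacity {
        return Err(format!(
            "ROM is {} bytes but only {} bytes are available",
            rom.len(),
            capacity
        ));
    }
    memory[PROGRAM_START..PROGRAM_START + rom.len()].copy_from_slice(rom);
    Ok(())
}
```

In a browser this matters a bit more than usual, since the user can pick any file as a ROM.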

Now we’re ready to enter the event loop.

The event loop is basically an infinite loop that handles 3 things on each iteration:

  • the instruction cycle
  • the screen display & audio
  • the user input

NB: wasm-bindgen docs provide a nice example of an event loop using requestAnimationFrame.

Hence, making a chip8 emulator from scratch can be split into these 3 parts.

However, the way you handle user input & the screen depends on the API you’re using. A chip8 emulator can be rendered in a terminal since printing white squares is enough to emulate the screen. In our implementation we’re targeting WebAssembly so we’re going to use browser APIs.

The way you handle opcodes, on the other hand, has to be very precise and close to the architecture documentation in order to run any game properly.

handling the instruction cycle

An instruction cycle in a VM is often represented as a 3-step process:

  • fetch
  • decode
  • execute

fetch

Fetching the opcode is done by getting it from the Emulator’s memory according to its program counter. A chip8 opcode is made of 4 nibbles (1 nibble = 4 bits).

4x4 = 16, so we can store a whole opcode in a u16 (16-bit unsigned integer type).

Example of an opcode:

first nibble   second nibble   third nibble   fourth nibble
    0001           0110            0110           0101

According to the documentation, the 1st and 2nd nibbles of the opcode to process lie in memory at offset program_counter, and the 3rd and 4th nibbles at offset program_counter + 1. Since our memory is a u8 array, we build a u16 out of two u8.

Then we store each nibble in a u8 (8-bit unsigned integer type) since Rust doesn’t have a u4 type, and wrap everything in an OpCode struct so that we can access nibbles easily when processing the opcode later.

    fn fetch_opcode(&mut self) {
        let opcode = (self.memory[self.program_counter as usize] as u16) << 8
            | self.memory[(self.program_counter as usize + 1) as usize] as u16;

        self.current_opcode = OpCode {
            first_nibble: ((opcode & 0xF000) >> 12) as u8,
            second_nibble: ((opcode & 0x0F00) >> 8) as u8,
            third_nibble: ((opcode & 0x00F0) >> 4) as u8,
            fourth_nibble: (opcode & 0x000F) as u8,
        };
    }

To do so, we used bitwise operators:

  • bit shifting (<< and >>)
  • bitwise OR (|)
  • bitwise AND (&).
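Here is the same arithmetic as a small standalone sketch, applied to the example opcode 0001 0110 0110 0101 (0x1665), which sits in memory as the two bytes 0x16 and 0x65. combine and nibbles are hypothetical free functions; the emulator does both steps inside fetch_opcode.

```rust
/// Combines two consecutive memory bytes into one 16-bit opcode.
/// The first byte ends up in the 8 most significant bits.
fn combine(high: u8, low: u8) -> u16 {
    (high as u16) << 8 | low as u16
}

/// Splits a 16-bit opcode into its four nibbles, most significant first.
fn nibbles(opcode: u16) -> [u8; 4] {
    [
        ((opcode & 0xF000) >> 12) as u8, // mask the nibble, then shift it down
        ((opcode & 0x0F00) >> 8) as u8,
        ((opcode & 0x00F0) >> 4) as u8,
        (opcode & 0x000F) as u8,
    ]
}
```

With these, combine(0x16, 0x65) is 0x1665 and nibbles(0x1665) is [1, 6, 6, 5].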

decode

Decoding is identifying the opcode in order to know which function to execute. Basically, it is a big switch case between all the different types of opcodes (35 in our case).

Wikipedia provides a nice description of all opcodes, so all we have to do is to reimplement their logic.

Match arms are pretty handy to pattern match opcodes and should be the way to go when handling opcodes of a VM, whatever the VM in my opinion.

Yes, coding an emulator is basically implementing each handle function for each opcode by reading its documentation, but in the end you’ll be able to play tetris.

pub fn process_opcode(&mut self) {
        self.program_counter += 2;

        match (
            self.current_opcode.first_nibble,
            self.current_opcode.second_nibble,
            self.current_opcode.third_nibble,
            self.current_opcode.fourth_nibble,
        ) {
            (0, 0, 0xE, 0xE) => self._00ee(),
            (0, 0, 0xE, 0) => self._00e0(),
            (0, _, _, _) => self._0nnn(),
            (1, _, _, _) => self._1nnn(),
            (2, _, _, _) => self._2nnn(),
            (3, _, _, _) => self._3xnn(),
            (4, _, _, _) => self._4xnn(),
            (5, _, _, 0) => self._5xy0(),
            (6, _, _, _) => self._6xnn(),
            (7, _, _, _) => self._7xnn(),
            (8, _, _, 0) => self._8xy0(),
            (8, _, _, 1) => self._8xy1(),
            (8, _, _, 2) => self._8xy2(),
            (8, _, _, 3) => self._8xy3(),
            (8, _, _, 4) => self._8xy4(),
            (8, _, _, 5) => self._8xy5(),
            (8, _, _, 6) => self._8xy6(),
            (8, _, _, 7) => self._8xy7(),
            (8, _, _, 0xE) => self._8xye(),
            (9, _, _, 0) => self._9xy0(),
            (0xA, _, _, _) => self.annn(),
            (0xB, _, _, _) => self.bnnn(),
            (0xC, _, _, _) => self.cxnn(),
            (0xD, _, _, _) => self.dxyn(),
            (0xE, _, 9, 0xE) => self.ex9e(),
            (0xE, _, 0xA, 1) => self.exa1(),
            (0xF, _, 0, 7) => self.fx07(),
            (0xF, _, 0, 0xA) => self.fx0a(),
            (0xF, _, 1, 5) => self.fx15(),
            (0xF, _, 1, 8) => self.fx18(),
            (0xF, _, 1, 0xE) => self.fx1e(),
            (0xF, _, 2, 9) => self.fx29(),
            (0xF, _, 3, 3) => self.fx33(),
            (0xF, _, 5, 5) => self.fx55(),
            (0xF, _, 6, 5) => self.fx65(),
            _ => {
                console::log_1(
                    &format!(
                        "Unknown opcode: {:X}{:X}{:X}{:X}",
                        self.current_opcode.first_nibble,
                        self.current_opcode.second_nibble,
                        self.current_opcode.third_nibble,
                        self.current_opcode.fourth_nibble
                    )
                    .into(),
                );
            }
        }
    }
}

As an example, we are going to decode an opcode by hand.

Let’s consider the following u16: 1000011001010111

Let’s split it into 4 nibbles as follows:

first nibble   second nibble   third nibble   fourth nibble
    1000           0110            0101           0111

  • The 1st nibble is 1000, which is binary for 8.
  • The 4th nibble is 0111, which is binary for 7.

Hence it is the 8XY7 opcode, and the 2nd (X) and 3rd nibbles (Y) are going to be processed to modify our emulator state during the execution of the 8XY7 function handler.

first nibble   second nibble   third nibble   fourth nibble
  1000 = 8       0110 = X        0101 = Y       0111 = 7
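The same decoding can be checked in code. This is only a sketch: split and opcode_name are hypothetical helpers, and opcode_name covers a handful of opcodes and returns a name, where the real emulator calls a handler method.

```rust
/// Splits a 16-bit opcode into its four nibbles, most significant first.
fn split(opcode: u16) -> (u8, u8, u8, u8) {
    (
        (opcode >> 12) as u8,
        ((opcode >> 8) & 0xF) as u8,
        ((opcode >> 4) & 0xF) as u8,
        (opcode & 0xF) as u8,
    )
}

/// Names a few opcodes from their nibbles (hypothetical helper).
fn opcode_name(nibbles: (u8, u8, u8, u8)) -> &'static str {
    match nibbles {
        // Specific patterns must come before the wildcard arm below,
        // otherwise (0, _, _, _) would swallow 00E0 and 00EE.
        (0, 0, 0xE, 0) => "00E0",
        (0, 0, 0xE, 0xE) => "00EE",
        (0, _, _, _) => "0NNN",
        (8, _, _, 6) => "8XY6",
        (8, _, _, 7) => "8XY7",
        _ => "unknown",
    }
}
```

Calling opcode_name(split(0b1000_0110_0101_0111)) picks the 8XY7 arm, with X = 6 and Y = 5 available in the tuple.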

execute

Executing is calling the opcode handler function: it’s the right-hand side of the match arm.

For example, here’s the handler function for the 8XY7 opcode we just decoded:

    /// Sets VX to VY minus VX. VF is set to 0 when there's a borrow,
    /// and 1 when there is not.
    /// vx = vy - vx
    fn _8xy7(&mut self) {
        let substraction = (self.get_vy() - self.get_vx()) as i8;
        self.registers[15] = self.carry(substraction);
        self.registers[self.current_opcode.second_nibble as usize] = substraction as u8;
    }

Another example with the handler function for the 8XY6 opcode:

/// Stores the least significant bit of VX in VF and then shifts
/// VX to the right by 1.
/// vx >>= 1
fn _8xy6(&mut self) {
    self.registers[15] = 1u8 & self.get_vx();
    self.registers[self.current_opcode.second_nibble as usize] >>= 1;
}

You’ll notice the presence of get_vy(), get_vx() and the carry functions.

All the opcodes share common patterns, so it’s a good idea to factor them out into separate functions in order to reuse code. The code usually looks cleaner this way.
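To give an idea of what such helpers can look like, here is a self-contained sketch. The Cpu struct, its x and y fields and sub_vy_vx are assumptions made for this example, and I don't claim the helpers in chiphuit have these signatures. It also replaces the carry helper with the standard library's overflowing_sub, which returns the wrapped result together with a flag telling whether a borrow happened.

```rust
/// Minimal stand-in for the emulator state used by the 8XY_ opcodes.
/// Hypothetical: the real Emulator reads x and y from current_opcode.
struct Cpu {
    registers: [u8; 16],
    x: usize, // second nibble of the current opcode
    y: usize, // third nibble of the current opcode
}

impl Cpu {
    fn get_vx(&self) -> u8 {
        self.registers[self.x]
    }

    fn get_vy(&self) -> u8 {
        self.registers[self.y]
    }

    /// 8XY7: VX = VY - VX. VF is 0 when there is a borrow, 1 otherwise.
    fn sub_vy_vx(&mut self) {
        let (result, borrow) = self.get_vy().overflowing_sub(self.get_vx());
        self.registers[self.x] = result;
        self.registers[15] = if borrow { 0 } else { 1 };
    }
}
```

Unlike a plain `-` on two u8 values, overflowing_sub never panics in debug builds when VX is larger than VY.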

NB: A very handy way to check the correctness of all opcode implementations is to load this ROM in your emulator. You’ll be able to see which opcode handler function is incorrect. It helped me a lot when I didn’t have my debugger yet.

handling the screen display

In order to display the screen of our Emulator, we can choose between 2 Web APIs: WebGL and Canvas.

WebGL kinda brings OpenGL into the browser, which can be very useful when rendering complex geometry and shaders, whereas Canvas offers a 2D rectangular space to draw in a simple manner.

Since chip8 involves neither 3D geometry nor shaders, but only 64x32 black & white pixels, using the Canvas API is much more appropriate IMO for our use case :).

A great example of the Canvas API is provided by the wasm-bindgen repo, where a Julia set is rendered.

Therefore, in our event loop we’re going to update the Canvas from our Emulator’s screen field. To do so, we iterate over our screen array and convert each boolean into a RGBA pixel in order to build an ImageData that our Canvas will display in the Browser.

If we implemented the opcodes logic correctly, our Emulator’s screen should contain false for every pixel turned off, and true for every pixel turned on. In RGBA, [255, 255, 255, 255] is an opaque white pixel, while [0, 0, 0, 0] is a fully transparent pixel, which appears black as long as whatever sits behind the canvas is black.

pub fn draw_screen(context: &CanvasRenderingContext2d, screen: [bool; 64 * 32]) {
    let rgba_screen: Vec<u8> = screen
        .iter()
        .flat_map(|x| match x {
            false => [0u8; 4],
            true => [255u8; 4],
        })
        .collect();

    let frame =
        ImageData::new_with_u8_clamped_array_and_sh(Clamped(&rgba_screen), WIDTH, HEIGHT).unwrap();

    context.put_image_data(&frame, 0.0, 0.0).unwrap();
}

  • [0u8; 4] is just a handy way to create a u8 array of length 4 filled with zeros, and [255u8; 4] does the same for a u8 array filled with 255s.

  • flat_map is just a way to iterate over screen and to avoid nesting arrays in rgba_screen.

Then all we will have to do is call draw_screen in our event loop to refresh the emulator’s screen in our Browser’s Canvas.
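The bool-to-RGBA conversion can be tried without a browser. The sketch below is a variation, not the code used in chiphuit: screen_to_rgba is a hypothetical function, and it emits opaque black ([0, 0, 0, 255]) for pixels that are off, so the result no longer depends on what is behind the canvas.

```rust
/// Converts a monochrome screen into a flat RGBA byte buffer,
/// 4 bytes per pixel, in the same order as the input.
/// Hypothetical variation: "off" pixels are opaque black here.
fn screen_to_rgba(screen: &[bool]) -> Vec<u8> {
    screen
        .iter()
        .flat_map(|&on| if on { [255u8; 4] } else { [0, 0, 0, 255u8] })
        .collect()
}
```

For the full 64 * 32 screen the buffer is 8192 bytes long, which is the length ImageData expects for that width and height.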

There’s only 1 opcode that handles the drawing of sprites on a chip8 emulator screen: DXYN. It does so by XORing rectangular areas with the previous frame. You can see its implementation here.
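As a rough idea of what the XOR drawing involves, here is a minimal standalone sketch. It is not the implementation linked above: draw_sprite is a hypothetical function, and wrapping around the screen edges is an assumption on my side (clipping at the edges is another common choice).

```rust
const WIDTH: usize = 64;
const HEIGHT: usize = 32;

/// Draws an 8-pixel-wide sprite at (x, y) by XORing it onto `screen`.
/// Each byte of `sprite` is one row. Returns true when at least one lit
/// pixel was turned off, which is the value VF takes after DXYN.
fn draw_sprite(screen: &mut [bool; WIDTH * HEIGHT], x: usize, y: usize, sprite: &[u8]) -> bool {
    let mut collision = false;
    for (row, &byte) in sprite.iter().enumerate() {
        for col in 0..8usize {
            if byte & (0x80u8 >> col) == 0 {
                continue; // this sprite bit is off: leave the pixel alone
            }
            // Assumption: coordinates wrap around the screen edges.
            let px = (x + col) % WIDTH;
            let py = (y + row) % HEIGHT;
            let pixel = &mut screen[py * WIDTH + px];
            collision |= *pixel;
            *pixel ^= true;
        }
    }
    collision
}
```

Drawing the same sprite twice at the same position erases it and reports a collision, which is how many games detect hits.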

handling audio

The Emulator struct has a u8 sound_timer. While sound_timer is greater than 0, we should hear a ‘beep’ noise and decrement the timer, so the code is pretty straightforward:

pub fn sound(emulator: &mut cpu::Emulator, audio_context: &FmOsc) {
    match emulator.sound_timer {
        0 => audio_context.gain.gain().set_value(0.0),
        _ => {
            audio_context.gain.gain().set_value(0.04);
            emulator.sound_timer -= 1;
        }
    }
}

Once again, wasm-bindgen provides an example of how to use the AudioContext API.

handling user input

We want buttons in the browser’s DOM that change our Emulator’s keypad field values, but we also want them to work on ALL devices that can run a browser, touchscreens included.

Therefore, we want the user to be able to do a keypress by:

  • clicking with a mouse on the button
  • pressing the key on their physical keyboard
  • touching a key on the virtual keyboard on a touchscreen device

First of all, we have to initialize our keypad in the GUI, then we will add callbacks to its keys so they can affect our Emulator struct.

Initializing the keypad is basically just creating a ‘keypad’ HTML element, setting its id and class name, and filling it with its 16 keys (1234QWERASDFZXCV). In the initialization loop, we hook the keys up to the callbacks we’re going to define.

pub fn set_keypad(emulator_keypad: &Rc<RefCell<[bool; 16]>>) {
    let keypad = document()
        .create_element("keypad")
        .expect("should have a keypad.");

    keypad.set_id("keypad");
    keypad.set_class_name("keypad-base");

    append_to_body(&keypad);

    for (index, &key) in [
        "1", "2", "3", "4", "Q", "W", "E", "R", "A", "S", "D", "F", "Z", "X", "C", "V",
    ]
    .iter()
    .enumerate()
    {
        let keypad_key = document().create_element("div").unwrap();
        keypad_key.set_id(key);
        keypad_key.set_inner_html(key);
        keypad_key.set_class_name("key");
        keypad
            .append_child(&Node::from(keypad_key.clone()))
            .unwrap();

        // Handle clicks on virtual keypad
        set_callback_to_button(true, &keypad_key, emulator_keypad, index);
        set_callback_to_button(false, &keypad_key, emulator_keypad, index);

        // Handle keyboard events
        set_callback_to_key(true, key.to_string(), emulator_keypad, index);
        set_callback_to_key(false, key.to_string(), emulator_keypad, index);
    }
}

To define a callback upon a specific event, we have to use event handlers. This can be done using Rust closures, and once again wasm-bindgen provides a nice example of how to do so.

Therefore, to fulfill our needs, we define two functions:

set_callback_to_key adds an event listener to handle keyboard events (keydown or keyup, depending on press). These allow the user to play on the emulator using their physical keyboard.

pub fn set_callback_to_key(
    press: bool,
    key: String,
    keypad: &Rc<RefCell<[bool; 16]>>,
    index: usize,
) {
    let keypad_clone = Rc::clone(keypad);
    let callback = Closure::wrap(Box::new(move |event: web_sys::KeyboardEvent| {
        // Compare in uppercase so that both "q" and "Q" match.
        if event.key().to_uppercase() == key {
            keypad_clone.borrow_mut()[index] = press;
        }
    }) as Box<dyn FnMut(_)>);

    web_sys::window()
        .unwrap()
        .add_event_listener_with_callback(
            match press {
                true => "keydown",
                false => "keyup",
            },
            callback.as_ref().unchecked_ref(),
        )
        .unwrap();
    callback.forget();
}
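With this approach, 32 keyboard listeners end up on the window (16 keys, keydown and keyup for each). One possible alternative, which is not what chiphuit does, is a single pair of listeners that looks the key up in a table. The lookup itself is plain Rust. KEYS and keypad_index are names I made up for this sketch; the order of KEYS is the one used in set_keypad.

```rust
/// Keyboard keys in the same order as in set_keypad.
const KEYS: [&str; 16] = [
    "1", "2", "3", "4", "Q", "W", "E", "R", "A", "S", "D", "F", "Z", "X", "C", "V",
];

/// Returns the keypad index for a key name as reported by
/// KeyboardEvent.key, ignoring case, or None for any other key.
/// Hypothetical helper for illustration.
fn keypad_index(key: &str) -> Option<usize> {
    let upper = key.to_uppercase();
    KEYS.iter().position(|&k| k == upper)
}
```

A keydown handler would then set keypad[index] to true when keypad_index returns Some(index), and ignore the event otherwise.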

set_callback_to_button adds 2 event listeners so the user can play with the virtual keypad in the GUI:

pub fn set_callback_to_button(
    press: bool,
    button: &Element,
    keypad: &Rc<RefCell<[bool; 16]>>,
    index: usize,
) {
    let keypad_clone = Rc::clone(&keypad);
    let mouse_callback = Closure::wrap(Box::new(move |_event: web_sys::MouseEvent| {
        keypad_clone.borrow_mut()[index] = press;
    }) as Box<dyn FnMut(_)>);

    button
        .add_event_listener_with_callback(
            match press {
                true => "mousedown",
                false => "mouseup",
            },
            mouse_callback.as_ref().unchecked_ref(),
        )
        .unwrap();
    mouse_callback.forget();

    let keypad_clone = Rc::clone(&keypad);
    let touch_callback = Closure::wrap(Box::new(move |_event: web_sys::TouchEvent| {
        keypad_clone.borrow_mut()[index] = press;
    }) as Box<dyn FnMut(_)>);

    button
        .add_event_listener_with_callback(
            match press {
                true => "touchstart",
                false => "touchend",
            },
            touch_callback.as_ref().unchecked_ref(),
        )
        .unwrap();
    touch_callback.forget();
}

SPOILER ALERT: user input isn’t just about key presses :)

Indeed, as we said before, the user has to provide the emulator a ROM to run.

Handling user file input in Rust was kind of a pain in the neck since there was no example of it in the wasm-bindgen documentation, and people on Stack Overflow were wondering how to achieve it.

Eventually I found a way to do it with a mix of js-sys & web-sys by using the FileReader API. This was a great opportunity to contribute to Stack Overflow for the first time.

a bit of GUI (Graphical User Interface)

Since we’re in a browser, we can style up the emulator buttons with CSS.

Basically, it displays the keypad in a grid, sets some hovering effects on buttons, and allows 2 orientation modes (landscape & portrait).

writing a debugger

A debugger is a computer program used to view, analyze, instrument & edit other programs, in our case: the chip8 emulator. The goal is to allow editing the emulator at runtime from the browser, offering complete control over the Emulator’s struct fields, RAM included (memory field).

Hence, the debugger should be able to do a few things with our Emulator struct:

  • offer a view of its field values
  • modify its fields values
  • make a snapshot of it
  • trace its execution at each CPU cycle

Thanks to serde, we can load and export our Emulator as JSON just by adding a derive attribute on top of our Emulator struct, as follows:

#[derive(Serialize, Deserialize)]
pub struct Emulator {
    current_opcode: OpCode,
    /// [...]
}

This way we can build a debugger:

  • the Serialize trait on our Emulator struct allows us to export it as JSON
  • the Deserialize trait allows us to spawn a new Emulator instance from JSON.

So if you want to change the Emulator’s fields values, pause the emulator, copy the snapshot in JSON format to your clipboard (copy button), edit the JSON values, and load your modified JSON (load button).

If you’re curious enough you can check the code of debugger.rs, or try to load a Tetris snapshot I made in the emulator.

compiling & serving the emulator

Compiling is a 2-step procedure. First, we build the project using cargo:

cargo build --release

Then, we have to use wasm-bindgen:

wasm-bindgen ./target/wasm32-unknown-unknown/release/chiphuit.wasm \
--out-dir build --no-typescript --target no-modules \
--remove-name-section --remove-producers-section \
--omit-default-module-path --omit-imports

The compilation emits 2 build files in our build directory:

  • chiphuit_bg.wasm, our emulator.
  • chiphuit.js, JavaScript glue code necessary to load our WebAssembly bytecode.

Our build directory also contains an HTML file and a CSS file, both necessary to load our program and display the GUI in a browser:

build/
├── chiphuit_bg.wasm
├── chiphuit.css
├── chiphuit.js
├── index.html
└── favicon.ico

Finally, serve this directory to play the emulator in your favorite browser.

troubleshooting

One of my first mistakes was to render the screen of the Emulator on each CPU cycle. This made the emulator very slow, since refreshing the screen has a cost and chip8 CPU cycles happen many times per second.

Hence, I refreshed the screen every 10 CPU cycles and the emulator ran as expected.

[...] main function
if *emulator.running.borrow() {
    for _ in 0..10 {
        emulator.cycle();
    }
    emulator.update_emulator_state(&debugger.element.rows());
    graphics::draw_screen(&canvas, emulator.screen);
}
[...]
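To get a feeling for the resulting speed, here is a tiny sketch that only counts cycles and frames. Counter, CYCLES_PER_FRAME and run_frame are hypothetical names. I'm assuming the loop is driven by requestAnimationFrame, which usually fires around 60 times per second (this depends on the display), so 10 cycles per frame gives roughly 600 CPU cycles per second.

```rust
/// Hypothetical stand-in for the emulator: it only counts what happens.
struct Counter {
    cycles: u32,
    frames: u32,
}

const CYCLES_PER_FRAME: u32 = 10;

/// One iteration of the event loop: several CPU cycles, then one redraw.
fn run_frame(state: &mut Counter) {
    for _ in 0..CYCLES_PER_FRAME {
        state.cycles += 1; // emulator.cycle() in the real code
    }
    state.frames += 1; // a single draw_screen call per frame
}
```

Changing CYCLES_PER_FRAME is then the one knob to turn if a game feels too slow or too fast.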

Also, the emulator wouldn’t respond as expected on touchscreens. It would:

  • select the text in the keypad
  • zoom on double-taps

This can be mitigated on the CSS side by adding a touch-action property to the body.

body {
  [...]
  touch-action: none;
  [...]
}

result

An instance of the emulator is hosted here. You can supply the emulator ROMs from here to play several games.

If you’re too lazy to click and download a ROM, here’s a video of the emulator running on an iPhone (more precisely, it runs in WebKit, Safari’s browser engine).

Also, the source code is open source and available here.

what’s next

First, the chip8 instruction set can be extended, so we could modify the emulator to run even more ROMs if we wanted to.

Also, there are 2 possible improvements to the emulator:

  • targeting another architecture than WebAssembly using conditional compilation
  • using a GUI library to build a fully fledged debugger

If I use conditional compilation to target another architecture, I will have to remake the front end since my current GUI targets a browser. So in both cases I’m going to have to remake the GUI.
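For the record, conditional compilation in Rust relies on the cfg attribute. Here is a minimal sketch of how a front end could be selected per target. The frontend function is hypothetical and only returns a name; a real project would put the browser code and the native GUI code behind the same kind of attributes.

```rust
/// Returns the name of the front end selected at compile time.
/// Hypothetical example: only one of the two definitions is compiled.
#[cfg(target_arch = "wasm32")]
fn frontend() -> &'static str {
    "browser"
}

#[cfg(not(target_arch = "wasm32"))]
fn frontend() -> &'static str {
    "native"
}
```

The CPU part of the emulator (memory, registers, opcodes) would not need any of these attributes, since it doesn't depend on the platform.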

Thus, I’m currently looking for a GUI library that would tackle both problems among these libraries:

I’m quite a fan of egui at the moment but haven’t tested others yet.

Anyway, maybe I’ll start another emulator with that in mind soon and keep this one in its current state. According to the emudev community, the Game Boy is the next emulator to build after chip8.

Documentation & links