Saturday, November 23, 2013

Toorum's Quest II - Retro video game and console

Okay, this project turned out to be a lot bigger and more time-consuming than I initially thought. I started it last summer and planned to finish by the end of the summer holidays. It actually took a few months longer, and I'm glad it's finally done! Watch the following video to see what this is all about:



What did you just see? Simply put, I made a retro 8-bit video game console, called The Box, and a 2D platformer game for it. Here is the feature list:

  • Based on ATmega328P running at 16 MHz (same as Arduino Uno).
  • The game has a display resolution of 104x80 with 256 colors.
  • Video mode is tile based and supports up to 3 sprites per scan line.
  • Sprites are multiplexed, so there can be an unlimited number of sprites vertically on the screen.
  • 4 audio channels with triangle, pulse, sawtooth and noise waveforms.
  • Chiptune music playroutine and sound effects.
  • NES controller support.

Read on to learn more!

Source code on GitHub

Top-left: 128x80 untiled titlescreen, Top-right: game screen, 104x80 tiled with sprites
Bottom-left: Final console hardware, Bottom-right: Prototype based on Arduino Uno


The Box hardware


The design is based on the ATmega328P microcontroller (MCU), which has only 2 kilobytes of RAM and 32 kilobytes of program memory. The MCU is clocked at 16 MHz, which makes the specs exactly the same as the Arduino Uno's (an intentional choice, by the way, because the initial prototype was built on an Arduino Uno). Everything (NTSC video signal generation, sound synthesis, the music playroutine and the game logic) runs on the MCU, so many things had to be hand-optimized in assembly language. The only other IC needed besides the ATmega328P is the AD725.

The most interesting part of the hardware is video signal generation. Here's the basic idea, inspired by Uzebox (which, by the way, uses a more powerful MCU running at almost double the clock rate). The MCU outputs 8-bit colors in R3G3B2 format every sixth clock cycle, and the bits are turned into analog voltages using a resistor DAC. The analog R, G and B signals are fed to the AD725, which is an RGB to NTSC/PAL encoder and outputs a composite video signal. The AD725 requires a 14.31818 MHz clock signal for NTSC color modulation, so I have a DIP14-packaged crystal oscillator on board for this. The hardware design of the video stage is mostly based on the reference design in the AD725 datasheet. The AD725 is a surface-mount part, so I bought a SOIC28-DIP adapter for it and modded it into a SOIC16-DIP adapter to take up less space on the PCB.
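As a small illustration of the R3G3B2 format, here is a C sketch that packs a color into one byte. The bit order (red in the top three bits, blue in the bottom two) is an assumption made for this example; the post only names the format, and the actual port wiring may differ. The function names are hypothetical. The roughly 1:2:4 ratio of the DAC resistor values in the parts list is consistent with binary weighting of the bits within each channel.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 3-bit red, 3-bit green and 2-bit blue into one byte.
   Assumed layout: RRRGGGBB (red in the most significant bits). */
static uint8_t pack_r3g3b2(uint8_t r, uint8_t g, uint8_t b)
{
    assert(r < 8 && g < 8 && b < 4);
    return (uint8_t)((r << 5) | (g << 2) | b);
}

/* Reduce a 24-bit RGB color to R3G3B2 by keeping only the
   top bits of each channel. */
static uint8_t rgb888_to_r3g3b2(uint8_t r, uint8_t g, uint8_t b)
{
    return pack_r3g3b2((uint8_t)(r >> 5), (uint8_t)(g >> 5), (uint8_t)(b >> 6));
}
```

With this layout, pure red becomes 0xE0, pure green 0x1C, pure blue 0x03 and white 0xFF.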

The image quality is actually very good, the best we can get with composite video I would say. The picture is rock solid; only slight jitter can be seen between highly saturated colors, a problem inherent to composite video. If you are going to build the console on a breadboard, expect lower visual quality: only by building this on a PCB can you get a rock-solid picture. The breadboard version is actually not that bad, but after seeing the quality of the PCB version you can't go back :)

Schematic


Parts list

Apart from standard value resistors and capacitors you need the following parts:
  • ATmega328P (easily found anywhere)
  • 16 MHz crystal
  • AD725 (I ordered 5 from China)
  • 14.31818 MHz crystal oscillator in DIP8 or DIP14 package (RS Components has the DIP14 version)
  • SOIC28-DIP adapter for AD725 (I got it from Sparkfun)
  • The 10uF filtering caps on the power supply lines should be tantalum (recommended by the AD725 datasheet)
  • 3.16k, 1.58k and 806 ohm resistors (1% tolerance) for the DAC
  • NES controller
  • NES controller socket (these can be bought online, e.g. from www.parallax.com)
You can also build this on a breadboard and connect it to an Arduino Uno board. The compiled code fits into program memory with the standard Uno bootloader. It's much easier and faster to build the console like this, but of course the end result won't be as pretty.

Tiled graphics mode with sprites


Video generation is written in AVR assembler. I don't know if anybody else has written a tile-based color graphics mode with sprites on a 16 MHz Arduino-compatible setup before. The MCU has only 2 kilobytes of RAM, which is not enough to hold a frame buffer. The game uses a display resolution of 104x80, so with 8-bit colors the frame buffer alone would need 8320 bytes, clearly out of our reach. So, first I had to do a tiled graphics mode.

The game screen is made of 13x10 tiles, each 8x8 pixels. A tile, therefore, consumes 64 bytes of program memory. I have a tile buffer of 13x10 pointers that point into tile graphics in program memory. On each scanline, I fetch the tile pointer from RAM, pull 8 pixels from the tile and output the pixels exactly every 6 cycles. With a pixel width of 6 cycles and with doubled scanlines, the pixels are approximately square on screen. Pulling a pixel from program memory takes 3 cycles and outputting a pixel takes 1 cycle, so there are only 2 cycles per pixel remaining to fetch the tile addresses. With careful ordering of instructions and unrolling the loop it can be done. Overall, it was fairly easy to get the basic tiling setup working.
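The tile lookup described above can be modelled in plain C. This is only a reference sketch of the addressing logic, not the cycle-counted assembler the console runs; the names render_row, tile_map and the flat row-major map layout are assumptions made for the example. On the AVR, data pointers are 16 bits wide, so a map of 130 pointers would take 260 bytes of RAM instead of the 8320 bytes a full frame buffer would need.

```c
#include <assert.h>
#include <stdint.h>

enum { TILE_W = 8, TILE_H = 8, TILES_X = 13, TILES_Y = 10,
       SCREEN_W = TILES_X * TILE_W };          /* 104 pixels per row */

/* tile_map holds TILES_X * TILES_Y pointers in row-major order (assumed).
   Each pointer refers to 64 bytes of tile graphics: 8x8 pixels,
   one color byte per pixel. */
static void render_row(const uint8_t *const tile_map[],
                       unsigned y, uint8_t out[SCREEN_W])
{
    assert(y < TILES_Y * TILE_H);
    unsigned tile_row = y / TILE_H;            /* which row of tiles      */
    unsigned line     = y % TILE_H;            /* which line inside tiles */

    for (unsigned tx = 0; tx < TILES_X; tx++) {
        /* Fetch the tile pointer, then step to the wanted pixel line. */
        const uint8_t *src = tile_map[tile_row * TILES_X + tx] + line * TILE_W;
        for (unsigned px = 0; px < TILE_W; px++)
            out[tx * TILE_W + px] = src[px];
    }
}
```

In the real video code the inner loop is unrolled and every pixel is written to the output port at a fixed 6-cycle interval rather than stored in an array.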

However, things started to get much more complicated because I also wanted to have sprites on top of the tiles. The ATmega328P running at 16 MHz is not fast enough to do the tiles and mask sprites on top of them within the time of a single scanline. It took me a while to figure out how to do the sprites. Then it hit me. Because I have doubled the scanlines, I actually have two scanlines of time to process a single row of 104 pixels. In order to pull this off I had to use double buffering, so that while I was computing the next row of pixels, I was pulling in the previously computed row and still outputting pixels every 6th cycle. So on even scanlines, I do as many tiles as possible (which turned out to be 9 tiles) and write the pixels to a scanline buffer WHILE reading the pixels of the previous row and outputting them to the screen. On odd scanlines, I do the remaining 4 tiles, write the resulting pixels to memory and mask sprites on top of the tiles, again while pulling pixels from the previously computed row and outputting them to the screen. A buffer holds a single row of 104 pixels. After two scanlines the buffers are swapped. Doing everything while outputting a pixel every 6 cycles meant that every cycle had to be counted.
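The even/odd scanline split can be sketched in C as follows. This is a simplified model under stated assumptions: the helper functions draw_tiles, draw_sprites and output_scanline are hypothetical stand-ins, and the work is shown sequentially, whereas on the console the drawing and the pixel output are interleaved cycle by cycle.

```c
#include <assert.h>
#include <stdint.h>

enum { ROW_W = 104, TILE_W = 8, TILES_X = 13, FIRST_PASS_TILES = 9 };

typedef struct {
    uint8_t  buf[2][ROW_W];   /* two pixel-row buffers                 */
    unsigned back;            /* index of the buffer being built       */
    unsigned scanlines_out;   /* TV scanlines emitted so far           */
} Video;

/* Stand-in for tile rendering: fills tiles [first, first + count). */
static void draw_tiles(uint8_t *dst, unsigned row, unsigned first, unsigned count)
{
    for (unsigned x = first * TILE_W; x < (first + count) * TILE_W; x++)
        dst[x] = (uint8_t)(row + 1);           /* dummy pixel value */
}

/* Stand-in for sprite masking: overwrites one pixel of the row. */
static void draw_sprites(uint8_t *dst, unsigned row)
{
    dst[row % ROW_W] = 0xFF;
}

/* Stand-in for pixel output: only counts the emitted scanline. */
static void output_scanline(Video *v, const uint8_t *front)
{
    (void)front;
    v->scanlines_out++;
}

/* Build one pixel row over two TV scanlines while the previous row is shown. */
static void pixel_row(Video *v, unsigned row)
{
    assert(v->back < 2);
    uint8_t       *back  = v->buf[v->back];
    const uint8_t *front = v->buf[v->back ^ 1];

    /* Even scanline: first 9 tiles, previous row goes to the screen. */
    draw_tiles(back, row, 0, FIRST_PASS_TILES);
    output_scanline(v, front);

    /* Odd scanline: remaining 4 tiles plus sprites, same row shown again. */
    draw_tiles(back, row, FIRST_PASS_TILES, TILES_X - FIRST_PASS_TILES);
    draw_sprites(back, row);
    output_scanline(v, front);

    v->back ^= 1;                              /* swap buffers */
}
```

Two 104-byte row buffers cost 208 bytes of RAM, a small fraction of what a full frame buffer would take.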

There are actually three separate "threads" running in the code, and the threads are manually interleaved. This was very painful to code, but eventually I managed to do it. The result is a video mode where I can have 3 sprites on each scanline. The game actually has more sprites because I can reuse the hardware sprites vertically on the screen by using multiplexing. I have a buffer in RAM which stores the sprite locations and image pointers for three sprites on each scanline. Multiplexing the sprites is as simple as writing the sprite data to the buffer in the correct place.
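A per-row sprite buffer of this kind might look like the C sketch below. The struct layout, the 8x8 sprite size, the 80-row table and the function names are all assumptions for illustration; the post does not give the actual memory layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ROWS = 80, SLOTS = 3, SPRITE_W = 8, SPRITE_H = 8 };

typedef struct {
    uint8_t        x;      /* horizontal position on this row           */
    const uint8_t *line;   /* 8 pixels of sprite image, NULL if unused  */
} SpriteSlot;

static SpriteSlot slots[ROWS][SLOTS];

/* Empty the buffer, e.g. once per frame before sprites are added. */
static void clear_sprites(void)
{
    for (int r = 0; r < ROWS; r++)
        for (int s = 0; s < SLOTS; s++)
            slots[r][s].line = NULL;
}

/* Write one sprite into the rows it covers. Returns the number of rows
   on which a free slot was found; rows that already hold three sprites
   are skipped, rows outside the screen are clipped. */
static unsigned add_sprite(uint8_t x, int y, const uint8_t *image)
{
    assert(image != NULL);
    unsigned placed = 0;
    for (int i = 0; i < SPRITE_H; i++) {
        int row = y + i;
        if (row < 0 || row >= ROWS)
            continue;
        for (int s = 0; s < SLOTS; s++) {
            if (slots[row][s].line == NULL) {
                slots[row][s].x = x;
                slots[row][s].line = image + i * SPRITE_W;
                placed++;
                break;
            }
        }
    }
    return placed;
}
```

Because the limit applies per row, sprites at different heights never compete for slots, which is what makes the vertical reuse possible.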


Multichannel music and sound effects


I also wanted to have multichannel music in the spirit of C64's SID and Rob Hubbard (the best chiptune musician ever, just listen to the music of Commando, International Karate or Monty on the Run if you don't believe me). Unfortunately there is not enough time left on the scanlines to do any sound synthesis. So sound had to be generated in the vertical blank period, when the MCU is not busy doing the tiles and sprites. There are at most 263 scanlines on an NTSC screen, so I fill a buffer of 263 bytes of 8-bit audio samples during the vblank. The video generation reads the samples and sends them out of the chip using pulse width modulation (PWM). Since we are constantly sending out samples while generating new samples, the sound needs to be double buffered. Otherwise clicks and pops can be heard.
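A double-buffered sample queue along these lines could be sketched in C as below. The two-buffer struct and the function names are assumptions for the example; only the 263-sample size and the one-sample-per-scanline playback come from the description. One sample per scanline corresponds to a sample rate of roughly 15.7 kHz, the NTSC line rate.

```c
#include <assert.h>
#include <stdint.h>

enum { SAMPLES_PER_FRAME = 263 };   /* one sample per NTSC scanline */

typedef struct {
    uint8_t  buf[2][SAMPLES_PER_FRAME];
    unsigned play;                  /* buffer being played this frame */
} AudioBuffers;

/* Buffer the synthesis code may write to during vertical blank. */
static uint8_t *audio_fill_target(AudioBuffers *a)
{
    return a->buf[a->play ^ 1];
}

/* Sample the video code would hand to the PWM output on a scanline. */
static uint8_t audio_sample_for_scanline(const AudioBuffers *a, unsigned scanline)
{
    assert(scanline < SAMPLES_PER_FRAME);
    return a->buf[a->play][scanline];
}

/* Swap once per frame, after the fill buffer is complete. */
static void audio_swap(AudioBuffers *a)
{
    a->play ^= 1;
}
```

Writing only to the buffer that is not being played means the playback side never reads a half-updated sample run.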

The audio system supports 4 channels, with triangle, pulse (with varying pulse width), sawtooth and noise waveforms. Volume is controlled using ADSR envelopes. Oscillators and mixing are coded in assembler. The music playroutine is pretty much a standard four-channel tracker with support for pulse width animation, volume slides, arpeggios, vibrato and portamento effects. Music data is compressed in memory so that each track row uses only 1 byte. The catchy tune was composed by Antti Tiihonen aka jpeeba using a custom textmode tracker I wrote just for this project.
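For readers unfamiliar with this kind of synthesis, here is a generic C sketch of the four waveforms built on a phase accumulator, plus a simple four-channel mix. It is not the console's assembler code: the 16-bit phase, the 0..63 volume range, the Galois LFSR used for noise and all names are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { WAVE_TRIANGLE, WAVE_PULSE, WAVE_SAWTOOTH, WAVE_NOISE } Waveform;

typedef struct {
    uint16_t phase;        /* phase accumulator                          */
    uint16_t step;         /* phase increment per sample (sets pitch)    */
    uint8_t  pulse_width;  /* duty threshold for the pulse wave          */
    uint16_t lfsr;         /* noise state, must be nonzero for noise     */
    uint8_t  volume;       /* 0..63, e.g. driven by an ADSR envelope     */
    Waveform wave;
} Channel;

/* Produce one unsigned 8-bit sample and advance the oscillator. */
static uint8_t osc_sample(Channel *c)
{
    uint8_t p = (uint8_t)(c->phase >> 8);      /* top 8 bits of phase */
    uint8_t out;
    switch (c->wave) {
    case WAVE_SAWTOOTH:
        out = p;
        break;
    case WAVE_PULSE:
        out = (p < c->pulse_width) ? 255 : 0;
        break;
    case WAVE_TRIANGLE:
        out = (p < 128) ? (uint8_t)(p << 1) : (uint8_t)((255 - p) << 1);
        break;
    default:                                   /* WAVE_NOISE: 16-bit Galois LFSR */
        c->lfsr = (uint16_t)((c->lfsr >> 1) ^ ((c->lfsr & 1u) ? 0xB400u : 0u));
        out = (uint8_t)c->lfsr;
        break;
    }
    c->phase = (uint16_t)(c->phase + c->step);
    return out;
}

/* Scale each channel by its volume and average the four results. */
static uint8_t mix4(Channel ch[4])
{
    unsigned sum = 0;
    for (int i = 0; i < 4; i++) {
        assert(ch[i].volume < 64);
        sum += (unsigned)(osc_sample(&ch[i]) * ch[i].volume) >> 6;
    }
    return (uint8_t)(sum >> 2);
}
```

Changing pulse_width over time gives the pulse width animation mentioned above, and changing step gives vibrato, arpeggio and portamento.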

Other tidbits


Rooms are stored compressed in program memory using a simple RLE compression. I had very limited RAM left because all the sound buffers, scanline buffers, tile pointers and sprite buffers use up almost every byte of available RAM. I could not have the game state of every room simultaneously in memory. So when the player moves to a new room, I store only collected hearts, gold and opened doors as compressed bitfields in RAM. This way each inactive room consumes only 1 byte of memory as long as there are only 8 things per room to be stored.
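Both ideas are easy to sketch in C. The RLE format below (count/value byte pairs ending with a zero count) and the room count of 32 are assumptions for illustration only; the post does not describe the actual encoding or the number of rooms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { NUM_ROOMS = 32 };               /* hypothetical room count */

/* One byte per room: bit n set means item n (heart, gold, door)
   has already been collected or opened. */
static uint8_t room_state[NUM_ROOMS];

static void room_mark(unsigned room, unsigned item)
{
    assert(room < NUM_ROOMS && item < 8);
    room_state[room] |= (uint8_t)(1u << item);
}

static int room_is_marked(unsigned room, unsigned item)
{
    assert(room < NUM_ROOMS && item < 8);
    return (room_state[room] >> item) & 1;
}

/* Decode (count, value) pairs until a zero count is reached.
   Returns the number of bytes written; stops early rather than
   overflowing dst. */
static size_t rle_decode(const uint8_t *src, uint8_t *dst, size_t dst_size)
{
    size_t n = 0;
    while (src[0] != 0) {
        uint8_t count = src[0], value = src[1];
        if (n + count > dst_size)
            break;
        memset(dst + n, value, count);
        n += count;
        src += 2;
    }
    return n;
}
```

When a room is entered, the decoder would rebuild its tiles and the saved byte would then be used to remove whatever the player has already picked up or opened.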

Some of the tiles are animated on the screen: gold pieces, hearts and the princess are technically background tiles. I only swap their tile pointers every few frames. To make this really fast I scan only a single row of tiles per frame, so that the whole screen is updated every 10 frames.
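The row-per-frame scan can be sketched in C like this. The AnimStep table, which maps a tile image to the next image in its animation, and the function name are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TILES_X = 13, TILES_Y = 10 };

/* Hypothetical animation table entry: replace tile 'from' with 'to'. */
typedef struct {
    const uint8_t *from;
    const uint8_t *to;
} AnimStep;

/* Scan a single row of the tile map (chosen by the frame counter) and
   advance any animated tile found there. Returns the number of tiles
   changed. Ten consecutive frames cover the whole 13x10 screen. */
static unsigned animate_row(const uint8_t *tile_map[TILES_Y][TILES_X],
                            unsigned frame,
                            const AnimStep *steps, unsigned n_steps)
{
    assert(steps != NULL || n_steps == 0);
    unsigned row = frame % TILES_Y;
    unsigned changed = 0;
    for (unsigned tx = 0; tx < TILES_X; tx++) {
        for (unsigned s = 0; s < n_steps; s++) {
            if (tile_map[row][tx] == steps[s].from) {
                tile_map[row][tx] = steps[s].to;   /* swap the pointer only */
                changed++;
                break;                             /* one step per visit */
            }
        }
    }
    return changed;
}
```

Only pointers are rewritten, so no pixel data is copied; the video code picks up the new image the next time it draws that tile.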

There are actually three different video modes in the game: the main game mode with tiles and sprites (13x10 tiles), untiled titlescreen mode with 128x80 resolution and intro text mode with 14x10 tiles with no sprites. I did not need sprites for the titlescreen and there was space left in program memory so I could afford a slightly bigger resolution for the titlescreen. I couldn't fit the intro text beautifully into only 13x10 tiles, so I had to do a custom graphics mode with one more tile horizontally for the intro ;-)

In the end, there are only a few bytes of RAM and about 200 bytes of program memory left. I know that by optimizing and with better compression techniques (and by removing one or two of the extra video modes) I could fit even more into memory, but luckily the game does not really need more stuff.

Thanks for reading!

Etched circuit board (excuse the hand drawn lines)

Media coverage:

Legend of Grimrock Co-Creator Builds 8-Bit Game On DIY Console
8-Bit Video Game is Best of Retro Gaming on a Shoestring Budget
8-bit gaming with Atmel’s ATmega328P
The True 8-Bit Video Game Toorum’s Quest II And The Console Made To Play It

7 comments:

  1. Hi peten,

    The resistors in your video DAC, you say standard values, but those values look like SMD values, while your PCB is using through hole metal film resistors.

    Is this a typo?

    Cheers

    Replies
    1. Hi! 806, 1.58K and 3.16K are standard resistor values in the E96 series with 1% tolerance.

  2. I understand sticking with 16 MHz for Arduino compatibility, but those chips are specced to go up to 20. What could you do with the extra speed, if you decided to?

    Replies
    1. Oh, great job, btw... :)

    2. Thanks! :)

      Yeah, at the start of the project I pondered for some time whether I should run the MCU at 20 MHz, but I decided to keep the clock rate at 16 MHz for Arduino compatibility.

      20 MHz would allow 25% more pixels per scanline or a few more sprites. Higher resolution graphics would require redoing the graphics, and probably touching a lot of code, though.

  3. First of all, congratulations on this! :)

    If you had no sound or a less complex sound generation like only one square wave, you could expand video resolution, right?
    Reading bits directly from a PORT byte (like Atari or SMS joysticks) instead of clocking in the data could optimize the software as well?

    Cheers!

    Replies
    1. Thank you!

      Sound is generated during the vertical blanking period and stored in a buffer. I just pull one sample per scanline from the buffer and output it. If I skipped sound generation, I could expand the horizontal resolution by about 1 pixel :)

      Gamepad input is fairly cheap as it is. I have to read the bits one by one because that's how the serial protocol used by NES controller works.
