This article contains a brief overview and background of the A-Z80 CPU, created for FPGA boards, and of a ZX Spectrum implementation tied to it.
(You can find the Russian translation of this article here: https://howtorecover.me/z80-s-nulya)
Every so often I let go of all that's on my mind and simply brainstorm and play with new ideas and their combinations (mostly based on the retro stuff). Then I pick what seems to excite me the most and deep dive into it.
This time (early 2014), I wanted to re-make a Sinclair ZX Spectrum on an FPGA. There were already several implementations available and most of them used "off the shelf" components. One could simply pluck components written in Verilog (CPU, ULA) and with some glue logic quickly build a retro FPGA solution. There is no fun in doing that, or at least not nearly as much to learn about each component as there is in creating them from scratch.
I wanted to start with the Zilog Z80 CPU. Since real people design their own CPUs (right?!), I decided I would make my own version of it. 🙂
Both parts of this project (the A-Z80 CPU and the ZX Spectrum code that uses it) are fully described in separate blog posts, and here I will try to tie them together using a somewhat less technical narrative.
I started reverse-engineering the Zilog Z80 CPU about a year ago by running a working chip on a custom Arduino dongle board. I described it here. It was interesting to see how the pins responded to various scenarios. I was mostly interested in undocumented behaviour, hoping it would give me some hints on the CPU's internal architecture.
Then I found a set of articles by Ken Shirriff, who actually reverse-engineered large portions of the Z80 from an image of a die. I found the knowledge of "reading a die image" very exciting and the skill (in a weird way) very useful, so I painstakingly learned to do the same. I reverse engineered a few other pieces of that silicon, like the IR register and a data pin. Soon I got the hang of it and started "reading" other parts, as needed. It felt like a whole new world opened right in front of my eyes! Not unlike stereogram images: if you look at one long enough, gates suddenly pop up in your mind and you start seeing transistors, pull-ups and latches in a tangled mesh of colorful flat traces.
I had purchased an Altera FPGA board with a desire to learn that technology after working on a project at my work which was using one quite beefy Altera chip. Designing circuitry on my own, I felt, was an exciting challenge and a great learning opportunity. Soon, I figured out how Verilog/SystemVerilog "works"; it is a language not too difficult to learn for any software person. I also brushed up on logic design (I had one poorly taught EE class some 20 years ago).
Long story short, I started implementing selected parts of the Z80 in the Quartus schematic editor and wrote related ModelSim tests one by one: first came the ALU (since its design was well described by Ken), then the register file, sequencer, etc. Frankly, I did not know how all of that would fit or work together; I was simply enjoying the process, one block at a time. I just loved seeing wires toggling and gates doing their thing in a ModelSim simulation: a certain deep understanding of digital design sank in while playing with it.
I also read all I could get my hands on: patents on the Z80 (and other CPUs), which provided valuable insights, as well as talks and lectures. Over time I got a pretty good idea of how it all should work.
The most complicated task was to create the instruction timing matrix. Using all the information gathered in the process, I created an Excel spreadsheet with the exact timing of the required micro-operations for each M (machine) cycle and T (clock) state of each opcode class.
This matrix defines the CPU and ensures it is fully Z80-compatible. In fact, if you modify it, you could create new instructions, enhance the CPU or even create a completely different one and the design would still just work!
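To make the idea of a table-driven CPU a bit more concrete, here is a minimal Python sketch of such a matrix. It is only an illustration and not the actual A-Z80 table: the micro-operation tokens (`abus<-PC`, `reg<-dbus` and so on) and the helper names `sequence` and `t_states` are made up for this example. The only fact borrowed from the real Z80 is that `LD r,n` takes two M cycles of 4 and 3 T states.

```python
from typing import Dict, List, Tuple

# (opcode class, M cycle, T state) -> micro-operations fired in that clock.
# All tokens below are hypothetical placeholders, not the real A-Z80 tokens.
Matrix = Dict[Tuple[str, int, int], List[str]]

TIMING: Matrix = {
    ("ld r,n", 1, 1): ["abus<-PC"],
    ("ld r,n", 1, 2): ["PC<-PC+1", "IR<-dbus"],
    ("ld r,n", 1, 3): ["refresh"],
    ("ld r,n", 1, 4): ["decode"],
    ("ld r,n", 2, 1): ["abus<-PC"],
    ("ld r,n", 2, 2): ["PC<-PC+1"],
    ("ld r,n", 2, 3): ["reg<-dbus", "end"],
}


def sequence(matrix: Matrix, opcode_class: str) -> List[Tuple[int, int, List[str]]]:
    """Return the (M, T, micro-ops) steps of one opcode class in clock order."""
    return sorted(
        (m, t, ops) for (cls, m, t), ops in matrix.items() if cls == opcode_class
    )


def t_states(matrix: Matrix, opcode_class: str) -> int:
    """Total number of clocks the opcode class occupies."""
    return len(sequence(matrix, opcode_class))


if __name__ == "__main__":
    for m, t, ops in sequence(TIMING, "ld r,n"):
        print(f"M{m} T{t}: {', '.join(ops)}")
```

In a model like this the sequencer itself knows nothing about individual instructions; adding or changing rows in the table is what changes the CPU's behavior, which is the property described above.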
The following set of images shows the complete timing matrix printed from this XPS file. It is printed into 8 consecutive images; click on each to see it in a full, readable size:
Each column, B through AH, contains zero or more micro-operations on specific internal design blocks (this file deciphers the tokens used). Groups of instructions are listed in a row as statically decoded by the PLA table, each line corresponding to a specific M cycle and T state.
Rather early in the process, I devised several tests, including a full Fuse-based test suite which could run every instruction against a set of known results. That test proved the most valuable: it made it possible to flag regressions as well as to check that the instructions were implemented correctly.
After some time, I was able to combine several Z80 instructions and run them in sequence. Of the many exciting milestone moments along the way, the one that stuck with me was when the PC register correctly incremented and fetched successive instructions. They were also correctly executed since each one was tested separately: it all just worked!
After adding a virtual UART module in Verilog, I was able to get simple "Hello, world" programs sending the text out to a terminal! The same output would be captured by ModelSim simulating that code. It proved very helpful having several different ways to test and correlate the results.
The most difficult part was getting the interrupt handling done just right. The Z80 has 3 interrupt modes (im0, im1, im2), a maskable INT with two flags (the enable bit and its shadow) and a non-maskable NMI. INT is level-triggered while the NMI is latched, edge-triggered and has precedence over INT. The pins should be checked at known times and be inhibited in certain situations, including during the execution of some instructions. It all needed to be working perfectly at known clock boundaries.
The solution ended up very elegant and used a minimum number of extra gates: besides the necessary state flags and priority logic, it used a couple of control signals wired just right. See the timings on page 7, lines 1028-1060, entry "rst p", and how the instruction RST38 is being used to handle both NMI and INT (im2) at the same time. You start seeing the beauty of Mr. Faggin's original design when you realize the coherence of the known behavior and the optimal implementation. The simplicity of it became the proof of the implementation.
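For readers who prefer code to prose, the acceptance rules described above can be sketched as a small behavioral model in Python. This is a simplified illustration under stated assumptions, not the A-Z80 gate-level logic: the class and method names are hypothetical, the pins are passed in as "asserted" booleans rather than active-low levels, the one-instruction delay after EI is modeled for INT only, and the three interrupt modes are not modeled at all.

```python
from typing import Optional


class InterruptLogic:
    """Simplified model of Z80 interrupt acceptance (hypothetical names)."""

    def __init__(self) -> None:
        self.iff1 = False            # the enable bit
        self.iff2 = False            # its shadow
        self.nmi_pending = False     # latched NMI request
        self._nmi_was_asserted = False
        self._ei_delay = False       # INT stays blocked right after EI

    def sample_nmi(self, asserted: bool) -> None:
        # NMI is edge-triggered: only an inactive-to-active transition latches.
        if asserted and not self._nmi_was_asserted:
            self.nmi_pending = True
        self._nmi_was_asserted = asserted

    def ei(self) -> None:
        self.iff1 = self.iff2 = True
        self._ei_delay = True

    def di(self) -> None:
        self.iff1 = self.iff2 = False

    def retn(self) -> None:
        # RETN restores the enable bit from its shadow.
        self.iff1 = self.iff2

    def end_of_instruction(self, int_asserted: bool) -> Optional[str]:
        """Decide what to service at an instruction boundary."""
        delayed = self._ei_delay
        self._ei_delay = False
        if self.nmi_pending:         # NMI has precedence over INT
            self.nmi_pending = False
            self.iff1 = False        # the shadow keeps the pre-NMI state
            return "nmi"
        # INT is level-triggered: it is serviced only while still asserted.
        if int_asserted and self.iff1 and not delayed:
            self.iff1 = self.iff2 = False
            return "int"
        return None
```

A test bench can call `sample_nmi` on every clock and `end_of_instruction` at each instruction boundary, then compare the returned decisions against what the hardware simulation does.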
Once the CPU was wrapped up, it could run any code sequence using any instruction assembled and pushed onto a simple "board" model. That included some devilish and mean-spirited interrupt-bombardment tests. It also ran the well-known ZEXDOC and ZEXALL programs. They all passed except several XF/YF (undocumented flags) behavioral tests for 2 sets of instructions. Those are really weird and if you have ever played with Z80 emulators you would know what I mean. This particular Z80 behavior is likely a side-effect of internal bus charge/discharge cycles and some spurious control signals; various second-sourced and cloned Z80 chips also behave differently in these scenarios. I decided not to sweat over it as it had no practical impact.
Frankly, at this stage, I was already satisfied with the results since my original goal was to understand and replicate the internal architecture of the Z80, and I had achieved it.
Compare the following image - a conceptual block diagram of the A-Z80 CPU - with the annotated image of a Z80 die from the top of this article to see the similarity in the layout. I have placed schematic modules at the locations roughly corresponding to their positions on the Z80 die. Not to scale. You should be able to trace major buses (data bus in green and address bus in red) and zoom into detailed diagrams to see gate level implementations of the blocks. A few SystemVerilog sources appear just symbolically; naturally, the complete code would not fit on a printed page. Open it up in an image viewer that can zoom and pan.
By that time, I was already fairly fluent in Verilog so the coding of a ZX Spectrum model was not a problem. First, I added a video unit by picking one (most suitable) VGA timing standard and was able to display static screens from various games. Then came the correct character blink and border behavior. After that: keyboard interface, speaker, mic - all fairly simple modules. Once I got all internal mappings correctly set up, I powered it up, flashed it with the newest FPGA file and, to my huge excitement, saw a black screen clearing up and the magic "(C) 1982 Sinclair Research Ltd" prompt on the screen! I can't tell you just how exciting that moment was; I kept resetting it over and over just to see it appear again!
Soon I was able to load and play games!
But there was one problem: the design would randomly reboot or lock up. I spent a long time debugging it, but a chance conversation with Ed (a co-worker of mine; Ed is a world-class chip designer who really knows this stuff) revealed a possible problem: trying to follow the exact Z80 architecture, I had used transparent latches in my FPGA design throughout. Well, no FPGA designs use latches these days. Ed told me a latch design would have never worked [reliably]. Well, I did make it work, somehow. Kind of.
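As a rough illustration of the general difference between the two storage elements (and not a diagnosis of what actually went wrong in my design), here is a tiny Python model. The class names and the sample waveform are made up for this sketch. A transparent latch passes its input through for as long as its enable is high, so a glitch on the input during that window shows up on the output; an edge-triggered flip-flop only captures the input at the rising clock edge.

```python
class TransparentLatch:
    """Output follows D while enable is high, holds while it is low."""

    def __init__(self) -> None:
        self.q = 0

    def step(self, d: int, enable: int) -> int:
        if enable:
            self.q = d
        return self.q


class DFlipFlop:
    """Output changes only on the rising edge of the clock."""

    def __init__(self) -> None:
        self.q = 0
        self._prev_clk = 0

    def step(self, d: int, clk: int) -> int:
        if clk and not self._prev_clk:   # rising edge detected
            self.q = d
        self._prev_clk = clk
        return self.q


# (clock/enable, D) samples; D glitches low in the middle of the high phase.
WAVEFORM = [(0, 0), (1, 1), (1, 0), (1, 1), (0, 1), (0, 0)]

latch, flop = TransparentLatch(), DFlipFlop()
latch_out = [latch.step(d, clk) for clk, d in WAVEFORM]
flop_out = [flop.step(d, clk) for clk, d in WAVEFORM]

print("latch:", latch_out)   # latch: [0, 1, 0, 1, 1, 1]
print("flop: ", flop_out)    # flop:  [0, 1, 1, 1, 1, 1]
```

The glitch appears in the latch output (the third sample) but not in the flip-flop output, which is one reason fully synchronous, edge-triggered designs are easier for timing tools to analyze.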
Back to the drawing board and one month later, I had re-implemented the whole design to use flip-flops. Accordingly, I also had to modify a few timings here and there. The new design is fully synchronous and meets all timings; it is checked using the TimeQuest static timing analyzer tool and has fully constrained timings (using SDC files). In the meantime, I also found and fixed several issues, including one important bug in the instruction timing matrix. I speculate that this fix alone would most likely have solved the random hang issue with the latches (so the latch implementation would likely have worked well), but I never went back to check that. The final design is rock-solid to more than 18 MHz (depending on various factors). It's open and free for anyone to use; I uploaded it to OpenCores and the complete User's Guide is here.
The lessons? I have learned a great deal about how to design, simulate and test a (small) microprocessor and work with FPGAs; I understood the Zilog Z80 inside-out by breaking it up into pieces and then re-assembling it from scratch; but most importantly, this ZX Spectrum implementation plays all 48K games I ever wanted to play again.
I would recommend a similar endeavor to anyone curious enough to undertake it for their own benefit. If you are one of those, please email me and we can connect, and if I can be of any assistance along the way, I will gladly help you in the same way I got helped.
Note: This article was updated in 2016. In the meantime, I have ported all SystemVerilog files to Verilog and made the modifications needed to use it with Xilinx FPGA tools. The A-Z80 CPU is being used by some retro boards such as the Mist board (link), and ports to a few other boards are in progress.
To support loading ZX Spectrum games, I have also written an Android app, PlayZX.
A-Z80 in more details...