The A-Z80 CPU

This article contains a brief overview and background of the A-Z80 CPU, created for FPGA boards, and of a ZX Spectrum implementation tied to it.

Every so often I let go of all that's on my mind and simply brainstorm and play with new ideas and combinations (mostly based on retro stuff). Then I pick what most excites me and deep dive into it.

(partial) list of brainstorming ideas (2014)

This time (early 2014), I wanted to re-make a Sinclair ZX Spectrum on an FPGA. There were already several implementations available and most of them used "off the shelf" components. One could simply pluck components written in Verilog (CPU, ULA) and with some glue logic quickly build a retro FPGA solution. There is no fun in doing that, or at least not nearly as much to learn about each component as there is in creating them from scratch.

I wanted to start with the Zilog Z80 CPU. Since real people design their own CPUs (right?!), I decided I would make my own version of it. 🙂

Both parts of this project (the A-Z80 CPU and a ZX Spectrum code that uses it) are fully described in separate blogs, and here I will try to tie them together using a somewhat less technical narrative.

I started reverse-engineering the Zilog Z80 CPU about a year ago by running a working chip on a custom Arduino dongle board. I described it here. It was interesting to see how the pins responded to various scenarios. I was mostly interested in undocumented behaviour, hoping it would give me some hints about the CPU's internal architecture.

Then I found a set of articles by Ken Shirriff, who actually reverse-engineered large portions of the Z80 from an image of a die. I found the knowledge of "reading a die image" very exciting and the skill (in a weird way) very useful, so I painstakingly learned to do the same. I reverse-engineered a few other pieces of that silicon, like the IR register and a data pin. Soon I got the hang of it and started "reading" other parts, as needed. It felt like a whole new world opened right in front of my eyes! It is not unlike stereogram images: as you look at it long enough, gates suddenly pop up in your mind and you start seeing transistors, pull-ups and latches in a tangled mesh of colorful flat traces.

I had purchased an Altera FPGA board with a desire to learn that technology after working on a project at work which used one quite beefy Altera chip. Designing circuitry on my own, I felt, was an exciting challenge and a great learning opportunity. Soon I figured out how Verilog/SystemVerilog "works"; it is a language not too difficult to learn for any software person. I also brushed up on logic design (I had one lousily taught EE class some 20 years ago).

Long story short, I started implementing selected parts of the Z80 in the Quartus schematic editor and wrote related ModelSim tests one by one: first came the ALU (since its design was well described by Ken), then the register file, sequencer, etc. Frankly, I did not know how all of that would fit or work together; I was simply enjoying it one circuit at a time. I just loved seeing wires toggling and gates doing their thing in a ModelSim simulation: a certain deep understanding of digital design sank in while playing with it.

ModelSim simulation run

I also read all I could get my hands on: patents on the Z80 (and other CPUs), which provided valuable insights, as well as talks and lectures. Over time I got a pretty good idea of how it all should work.

The most complicated task was to create the instruction timing matrix. Using all the information gathered in the process, I created an Excel spreadsheet with the exact timing of the required micro-operations for each M (machine) and T (clock) state of each opcode class.

This matrix defines the CPU and ensures it is fully Z80-compatible. In fact, if you modify it, you could create new instructions, enhance the CPU or even create a completely different one and it would just work!

The following set of images shows the complete timing matrix printed from this XPS file. It is printed as 8 consecutive images; click on each to see it in a full, readable size:

A-Z80 Timings 1/8
A-Z80 Timings 2/8
A-Z80 Timings 3/8
A-Z80 Timings 4/8
A-Z80 Timings 5/8
A-Z80 Timings 6/8
A-Z80 Timings 7/8
A-Z80 Timings 8/8

Each column, B through AH, contains zero or more micro-operations on specific internal design blocks (this file deciphers the tokens used). Groups of instructions are listed in rows as statically decoded by the PLA table, each line corresponding to a specific M and T cycle.
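
To make the table-driven idea concrete, here is a minimal Python sketch of how one row of such a matrix could be looked up. The opcode class names, the micro-operation tokens and the table contents are all made up for illustration; they are not the actual A-Z80 tokens or timings.

```python
# Hypothetical, heavily simplified model of a timing matrix.
# Keys are (opcode class, M cycle, T state); values are lists of
# micro-operation tokens. None of these names come from the real sheet.
TIMING_MATRIX = {
    ("example_alu", 1, 3): ["reg_to_alu"],
    ("example_alu", 1, 4): ["alu_to_reg", "update_flags"],
    ("example_load", 1, 4): ["pc_to_address_bus"],
    ("example_load", 2, 3): ["data_bus_to_reg"],
}


def micro_ops(opcode_class, m, t):
    """Return the micro-operations scheduled for one clock of one opcode class."""
    return TIMING_MATRIX.get((opcode_class, m, t), [])


def schedule(opcode_class):
    """List every populated (M, T) slot of an opcode class in execution order."""
    slots = sorted((m, t) for (cls, m, t) in TIMING_MATRIX if cls == opcode_class)
    return [(m, t, micro_ops(opcode_class, m, t)) for (m, t) in slots]
```

Editing a row of the dictionary changes what the "CPU" does at that clock, which is the property the spreadsheet gives the real design.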

Rather early in the process, I devised several tests, including a full Fuse-based test suite which could run every instruction against a set of known results. That test proved the most valuable: it made it possible to flag regressions as well as to check that the instructions were implemented correctly.
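
The principle behind such a test can be sketched in a few lines of Python: run one instruction from a known starting state and compare the resulting state with a stored expectation. The record layout and the toy `execute` function below are assumptions made for illustration; they do not reproduce the Fuse test file format or the actual A-Z80 test bench.

```python
def execute(state, opcode):
    """Toy stand-in for the device under test: handles NOP and INC A only.

    Flags are deliberately ignored to keep the sketch short.
    """
    new = dict(state)
    if opcode == 0x3C:                      # INC A
        new["A"] = (state["A"] + 1) & 0xFF
    elif opcode != 0x00:                    # anything but NOP is unsupported
        raise ValueError(f"unsupported opcode {opcode:#04x}")
    new["PC"] = (state["PC"] + 1) & 0xFFFF
    return new


def diff_state(expected, actual):
    """Return {register: (expected, actual)} for every mismatching register."""
    return {
        reg: (value, actual.get(reg))
        for reg, value in expected.items()
        if actual.get(reg) != value
    }


def run_tests(tests):
    """Run all tests; return a dict of test name -> mismatches (empty if all pass)."""
    failures = {}
    for name, (start, opcode, expected) in tests.items():
        mismatches = diff_state(expected, execute(start, opcode))
        if mismatches:
            failures[name] = mismatches
    return failures
```

Because every test is just data, the same table can be re-run after each change, which is what makes this style of test useful for catching regressions.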

After some time, I was able to combine several Z80 instructions and run them in sequence. Of the many exciting milestones along the way, the one that stuck with me was when the PC register correctly incremented and kept fetching successive instructions. They executed correctly since each had been tested separately: it all just worked!

After adding a virtual UART module in Verilog, I was able to get simple "Hello, world" programs sending the text out to a terminal! The same output would be captured by ModelSim simulating that code. It proved very helpful having several different ways to test and correlate the results.
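
As an illustration of what a UART puts on the wire, here is a small Python sketch of byte framing. It assumes the common 8N1 format (one start bit, eight data bits sent least significant bit first, one stop bit); the article does not say which format the virtual UART used, and the function names are made up.

```python
def uart_frame(byte):
    """Return the 10 line levels for one byte in 8N1 framing (assumed format)."""
    bits = [0]                                    # start bit (line pulled low)
    bits += [(byte >> i) & 1 for i in range(8)]   # data bits, LSB first
    bits.append(1)                                # stop bit (line back to idle high)
    return bits


def uart_decode(bits):
    """Recover the byte from one 10-bit frame, checking start and stop bits."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


def send_text(text):
    """Frame every character of an ASCII string."""
    return [uart_frame(b) for b in text.encode("ascii")]
```

Decoding the frames produced by `send_text("Hello, world")` gives back the original bytes, which is the kind of cross-check a simulated terminal provides.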

The most difficult part was getting the interrupt handling done just right. Z80 has 3 interrupt modes (im0, im1, im2), maskable INT with two flags (the enable bit and its shadow) and non-maskable NMI. INT is level-triggered while the NMI is latched, edge-triggered and has precedence over INT. The pins should be checked at known times and be inhibited at others, including during the execution of some instructions. It all needed to be set perfectly at well known clock boundaries.
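
The rules in the previous paragraph can be summarised in a short behavioral model. This Python sketch is only an illustration of the documented Z80 behavior, not the A-Z80 gate-level logic; the class and method names are made up, and it assumes that the one-instruction delay after EI applies to INT only, not to NMI.

```python
class InterruptLogic:
    """Behavioral sketch of Z80 interrupt sampling (pins are active low)."""

    def __init__(self):
        self.iff1 = False          # interrupt enable flag
        self.iff2 = False          # its shadow copy
        self.nmi_latch = False     # remembers a falling edge on NMI
        self.prev_nmi_pin = True   # pins idle high
        self.ei_delay = False      # INT is not accepted right after EI

    def clock(self, nmi_pin):
        """Call on every clock: latch a falling edge on the NMI pin."""
        if self.prev_nmi_pin and not nmi_pin:
            self.nmi_latch = True
        self.prev_nmi_pin = nmi_pin

    def ei(self):
        """EI instruction: enable interrupts, effective after the next instruction."""
        self.iff1 = self.iff2 = True
        self.ei_delay = True

    def di(self):
        """DI instruction: disable maskable interrupts."""
        self.iff1 = self.iff2 = False

    def end_of_instruction(self, int_pin):
        """Decide what to service at an instruction boundary: 'nmi', 'int' or None."""
        if self.nmi_latch:                 # NMI has precedence over INT
            self.nmi_latch = False
            self.iff1 = False              # IFF2 keeps the old state for RETN
            self.ei_delay = False
            return "nmi"
        if self.ei_delay:                  # boundary right after EI: skip INT
            self.ei_delay = False
            return None
        if not int_pin and self.iff1:      # INT is level-sensitive
            self.iff1 = self.iff2 = False
            return "int"
        return None
```

Note how NMI is remembered by `clock()` even if the pin has already gone back high, while INT only counts if the pin is still low at the moment it is sampled.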

The solution ended up very elegant and used a minimal number of extra gates: besides the necessary state flags and priority logic, it used a couple of control signals wired just right: see the timings page 7, lines 1028-1060, entry "rst p" and how the instruction RST38 is being used to handle both NMI and INT (im2) at the same time. You start seeing the beauty of Mr. Faggin's original design when you realize the coherence of the known behavior and the most optimal implementation. The simplicity of it became the proof of the implementation.

Once the CPU was wrapped up, it could run any code sequence using any instruction assembled and pushed onto a simple "board" model. That included some devilish, deliberately mean interrupt-bombardment tests. It also ran the well-known ZEXDOC and ZEXALL tests. They all passed except for some weird XF/YF (undocumented flags) behavior in 2 sets of instructions. Those are really weird, and if you have ever played with Z80 emulators you will know what I mean. I believe this particular Z80 (mis)behavior is likely a side-effect of internal bus charge/discharge cycles and some spurious control signals, since different second-sourced Z80 chips behave differently in these scenarios. I decided not to sweat over it as it had no practical impact.
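
For readers unfamiliar with these flags: XF and YF are the two undocumented bits 3 and 5 of the F register. For plain 8-bit arithmetic they simply mirror bits 3 and 5 of the result; the troublesome cases are presumably instructions where that simple rule does not hold. The following Python sketch shows only the easy case, using `ADD A, n` as the example (the function name is made up for illustration).

```python
def add_a_flags(a, n):
    """Return (result, F) for the Z80 instruction ADD A, n on 8-bit operands."""
    total = a + n
    result = total & 0xFF
    flags = 0
    flags |= result & 0x80                                     # S: sign
    flags |= 0x40 if result == 0 else 0                        # Z: zero
    flags |= result & 0x20                                     # YF: copy of result bit 5
    flags |= 0x10 if (a & 0x0F) + (n & 0x0F) > 0x0F else 0     # H: half carry
    flags |= result & 0x08                                     # XF: copy of result bit 3
    overflow = ~(a ^ n) & (a ^ result) & 0x80                  # same-sign operands, sign flipped
    flags |= 0x04 if overflow else 0                           # P/V: signed overflow
    # N (bit 1) stays 0 for additions
    flags |= 0x01 if total > 0xFF else 0                       # C: carry
    return result, flags
```

For example, adding 0x08 to 0x20 gives 0x28, and both XF and YF end up set because bits 3 and 5 of the result are set.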

Frankly, at this stage, I was already satisfied with the results since my original goal was to understand and replicate the internal architecture of the Z80, and I had achieved it.

Compare the following image - a conceptual block diagram of the A-Z80 CPU - with the annotated image of a Z80 die from the top of this article to see the similarity in the layout. I have placed schematic modules at the locations roughly corresponding to their positions on the Z80 die (not to scale). You should be able to trace the major buses (data bus in green and address bus in red) and zoom into detailed diagrams to see gate-level implementations of the blocks. A few SystemVerilog sources appear just symbolically; naturally, the complete code would not fit on a printed page. Open it up in an image viewer that can zoom and pan.

A-Z80 block diagram

By that time, I was already fairly fluent in Verilog so the coding of a ZX Spectrum model was not a problem. First, I added a video unit by picking one (most suitable) VGA timing standard and was able to display static screens from various games. Then came the correct character blink and border behavior. After that: keyboard interface, speaker, mic - all fairly simple modules. Once I got all internal mappings correctly set up, one morning I powered it up, flashed it with the newest FPGA file and, to my huge excitement, saw a black screen clearing up and the magic "(C) 1982 Sinclair Research Ltd" prompt on the screen! I can't tell you just how exciting that moment was; I kept resetting it over and over just to see it appearing again!
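
A video unit for the Spectrum has to fetch pixel and attribute bytes using the machine's well known, slightly odd screen memory layout. The Python sketch below shows that address arithmetic and the decoding of an attribute byte, including the FLASH bit behind the character blink. It illustrates the standard 48K layout only, not the Verilog used in this project, and the function names are made up.

```python
def pixel_address(x, y):
    """Address of the bitmap byte holding pixel (x, y); x in 0..255, y in 0..191."""
    return (0x4000
            | ((y & 0xC0) << 5)    # which third of the screen
            | ((y & 0x07) << 8)    # pixel row within the character cell
            | ((y & 0x38) << 2)    # character row within the third
            | (x >> 3))            # byte column (8 pixels per byte)


def attribute_address(x, y):
    """Address of the attribute byte for the 8x8 cell containing pixel (x, y)."""
    return 0x5800 + (y >> 3) * 32 + (x >> 3)


def decode_attribute(attr, flash_phase=False):
    """Return (ink, paper, bright); ink and paper swap while a FLASH cell is inverted."""
    ink = attr & 0x07
    paper = (attr >> 3) & 0x07
    bright = bool(attr & 0x40)
    flash = bool(attr & 0x80)
    if flash and flash_phase:
        ink, paper = paper, ink
    return ink, paper, bright
```

The interleaving in `pixel_address` is why a loading screen fills in thirds, one pixel row of each character line at a time.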

Soon I was able to load and play games!

The Legend of Avalon, Graftgold Ltd, 1984

But there was one problem: the design would randomly reboot or lock up. I spent a long time debugging it, but a chance conversation with Ed (a co-worker of mine and a world-class chip designer who really knows this stuff) revealed a possible problem: trying to follow the exact Z80 architecture, I had used transparent latches throughout my FPGA design. Well, no FPGA designs use latches these days. Ed told me a latch design would never have worked [reliably]. Well, I did make it work, somehow. Kind of.

Back to the drawing board and, one month later, I had re-implemented the whole design to use flip-flops. Accordingly, I also had to modify a few timings here and there. The new design is fully synchronous and meets all timings; it is checked using the TimeQuest static timing analyzer and has fully constrained timings (using SDC files). In the meantime, I also found and fixed several issues, including one important bug in the instruction timing matrix. I speculate that this fix alone would most likely have solved the random hang issue with the latches (so the latch implementation would likely have worked well), but I never went back to check that. The final design is rock-solid to more than 18MHz (depending on various factors). It's open and free for anyone to use; I uploaded it to OpenCores and the complete User's Guide is here.
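
The difference between the two storage elements can be shown with a tiny behavioral model. This Python sketch (class names are made up) only illustrates the concept: a transparent latch passes its input through for as long as the enable is high, while a flip-flop samples its input only on the rising clock edge.

```python
class TransparentLatch:
    """Level-sensitive: output follows d while enable is high, holds otherwise."""

    def __init__(self):
        self.q = 0

    def update(self, enable, d):
        if enable:
            self.q = d
        return self.q


class DFlipFlop:
    """Edge-triggered: output takes the value of d only on a rising clock edge."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def update(self, clk, d):
        if clk and not self._prev_clk:
            self.q = d
        self._prev_clk = clk
        return self.q


def trace(element, samples):
    """Feed (clock_or_enable, d) pairs to an element and collect its outputs."""
    return [element.update(c, d) for c, d in samples]
```

With the input changing while the clock is high, the latch output changes too, whereas the flip-flop keeps the value captured at the edge. That predictability is a large part of why synchronous, flop-based designs are easier for timing analysis tools to check.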

The lessons? I have learned a great deal about how to design, simulate and test a (small) microprocessor and work with FPGAs; I understood the Zilog Z80 inside-out by breaking it up into pieces and then re-assembling it from scratch; but most importantly, this ZX Spectrum implementation plays all the 48K games I ever wanted to play again.

I would recommend this endeavor to anyone curious enough to undertake it for their own benefit. If you are one of those, please email me so we can connect; if I can be of any assistance along the way, I will gladly help you in the same way I got helped.

Manic Miner loads using PlayTZX

Note: This article was updated in 2016. In the meantime, I have ported all SystemVerilog files to Verilog and made the modifications needed to use it with Xilinx FPGA tools. The A-Z80 CPU is being used by some retro boards such as the MiST board, and ports to a few other boards are in progress.

To support loading ZX Spectrum games, I have also written an Android app, PlayZX.



This project is part of the:

Homebuilt CPUs WebRing





18 thoughts on “The A-Z80 CPU”

  1. Ed

    Hey Goran, nice to read a new post from you. Genius as always. I ended up opening a blog to discuss the work I did on one custom Z80 part used in arcade games. Take a look when you can: arcadehacker.blogspot.com

  2. Jeff Braine

    Rock solid up to 10MHz... have you tried it at faster clock speeds and had failures?
    Just wondering, as I currently have a user-programmable CPU speed on ZX Prism up to 51MHz

    Cheers, Jeff

    1. Goran Devic

      I did not do a more extensive performance characterization, let alone tuning. I know that 50% penalty is due to data/address pin latching on opposite edges - and that was a compromise I had to make to port it from using latches to using flops. If that's changed, certain bus timings would be 1T off (instead of 1/2T) but fMax would be ~20MHz.

      The goal was not to have a super-fast Z80 but an accurate reproduction of it (as much as it's feasible) including being able to "see" Z80 internals at the schematic level. (People needing a fast soft-cpu have many alternatives!)

      That said, it might be interesting to (1) tweak the arch and intentionally deviate from Z80 bus timings (or add delays) just to see how fast it can be made to run, and (2) have an original, latch-based version.

  3. mp

    I'm currently trying to port the A-Z80 to a Lattice FPGA using icestorm tools, and I'm running into some issues with getting data_pins.v mapped (the tool thinks that dout has multiple drivers). What is that module supposed to do? I can't seem to get my head around what it does.

    1. gdevic

      data_pins.v (generated from data_pins.bdf which is a Quartus schematic file) contains D[7..0] gates, it is an interface to the outside world for data bus. D[7..0] are bidirectional pins, don't know if and how Lattice toolset deals with bidir wires.

  4. mp

    replaced the code with

    seems to map properly now.

    1. juansolsona

      Hi Goran, MP,

      I am also having issues when synthesizing the A-Z80 for Lattice FPGAs with Yosys (icestorm). My issues are:

      * After synthesis (on FPGA) in the out cycles the data bus instead of holding the out value, holds A[7:0] (condition nIOREQ==0 nWR==0 nRD==1 nMREQ==1). No matter if I insert a wait state.
      * On behavioural simulation, with IVerilog/GTKWave all the write cycles have an undefined value in the address bus.

      Did you guys have the same experience?

      MP, could you please post your complete data_pins.v file? I can check whether that also fixes my issues.

      Regards
      Juan

      1. gdevic

        New data_pins_lattice.v code has been pushed to BitBucket (https://bitbucket.org/gdevic/a-z80)
        The export.py file has been modified to notify of this special case with the Lattice toolset; for now, simply remove data_pins.v and use Lattice variation.

        Unfortunately, I don't have a Lattice setup to check; I could get a board from eBay, but not sure about the total cost of all the tools. Right now I do verify every change on Altera and Xilinx (and also people email me as they are using various boards but it has been only Altera or Xilinx up to now).

        1. Juan Solsona

          Hi Goran,

          If you are thinking of buying a development board, make sure to use the "iCE40-HX8K Breakout Board" (http://www.latticesemi.com/en/Products/DevelopmentBoardsAndKits/iCE40HX8KBreakoutBoard.aspx) at minimum. Please note that they are really small FPGAs; an A-Z80 takes about 50% of the resources on that hardware.

          About the tools, the ones I am using are free! I follow the guide for APIO Framework (https://github.com/fpgawars/apio)

          Please do not hesitate to contact me if you need some support.

    2. gdevic

      Did you have any other issues with the Lattice toolset and this code?
      I am adding it to the documentation and your code to the release as a Lattice special case.
      Thanks!

  5. Juan Solsona

    Hi Goran,

    Sorry, I forgot to really thank you for such excellent work. It is really cool and a reference for me. I am a rookie in RTL, so you may find some mistakes in the following lines.

    As I wrote before, I cannot properly simulate write cycles using your latest sources with either the ModelSim or the IVerilog/GTKWave simulators. I always have the same issue: undefined address bus values on write cycles.

    I made the following top module using your CPU:

    And this testbench:

    And, also wrote a simple assembler application that spins as follows:

    It is clear that the processor properly executes fetch cycles, and I also tested read cycles, but with writes the processor always sets the address bus to undefined. I have a bitmap that I could upload with some help.

    Thanks for your support
    Juan

      1. gdevic

        Hi Juan,

        Our posts got a bit out of sync since I have to manually approve each, otherwise I was getting tons of spam. Likewise, I manually edit each post (incl. yours) for source formatting, so don't worry about that.

        Would you mind if we moved our conversation to email? It would work better until we resolve your issues. Please email me at gdevic at yahoo com

        Everything that you mentioned so far is still using the Lattice toolset, right? I test on both Altera and Xilinx and those have no issues synthesizing. If you could advise which Lattice tools to download (free, I hope), perhaps I could repro some of the things you see, at least the simulation part. Hope the learning curve on it is not too bad...

        I run ModelSim (Altera edition) extensively, though.

        Thanks!

  6. Jeffrey Burrell

    Hi, Goran
    I am trying to compile your core (downloaded from GitHub) and am getting many errors like this:
    275022 Illegal bus range or name for logic function for instance "alu_" of type 4-BIT ALU CORE UNIT

    My usual language is VHDL, so I don't know if this is a real problem with the files or some Verilog thing I don't know about. I would appreciate any insight you can provide.

    Also, the top level files for the hosts are not in your GitHub files. This is not a problem for me since I can easily interconnect them, but it might be a problem for others.

    Thanks

    Jeff

    1. gdevic

      Jeff, I am using "Quartus II Version 13.0.1 Web Edition" for Altera Cyclone II device on DE1 board. I just double checked and all files should be there. Did you try to load "https://github.com/gdevic/A-Z80/blob/master/host/basic_de1/basic_de1.qpf" for example?
      You can email me in private and I will try to help you.
