Saturday 14 December 2019

MAX1000 NIOS

SYSIN and SYSOUT
With a working NIOS processor available to us, thoughts turn more to software.  Programs are written in C and compiled in Eclipse.  The first requirement for C programs is to have SYSIN and SYSOUT available.  If we have a single serial port (for example the MAX1000 JTAG UART), SYSIN and SYSOUT are assigned to it by default.  If we have multiple serial ports we can choose (in the BSP) which ones to use.

Hello World
The simplest way to create C applications is to use "create new application and BSP" in Eclipse.  There is a basic "Hello World" template which sends a SYSOUT message.  Tutorials generally instruct you to use the small C library and reduced device drivers to save space; the "Hello World small" template selects both for you.

Typically C "Hello World" specifies <stdio.h> and calls printf for terminal output.
If you select the small option you use an alternative library and functions.


Our first processor has about 40KB on-chip memory defined which is just about enough for printf output but not for input as well.  To have a program including input and output we use the small libraries and alt_putstr/alt_getchar.  These can use as little as 800B program memory.
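As a rough sketch of the small-library style, the fragment below uses alt_putstr and alt_getchar from the HAL header sys/alt_stdio.h.  The helper names say_hello and read_key are made up for illustration, and the #else branch is only a stand-in of my own so that the same file also builds on a PC where the HAL header is not available.

```c
#ifdef __NIOS2__
#include "sys/alt_stdio.h"   /* HAL small-footprint I/O: alt_putstr, alt_getchar */
#else
#include <stdio.h>
/* PC stand-ins so the sketch can be tried without a board (assumption). */
static int alt_putstr(const char *s) { return fputs(s, stdout); }
static int alt_getchar(void) { return getchar(); }
#endif

/* Hypothetical helper: send a greeting to SYSOUT without pulling in printf. */
int say_hello(void)
{
    return alt_putstr("Hello from Nios II!\n");
}

/* Hypothetical helper: prompt on SYSOUT, then wait for one character on SYSIN. */
int read_key(void)
{
    alt_putstr("Press a key: ");
    return alt_getchar();
}
```

On the board, main() would typically call say_hello() once and then loop on read_key().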

SDRAM
The physical Altera FPGA chip has 64kB on-chip RAM and 64MB SDRAM so it makes sense to use SDRAM for our C programs and data.  A tutorial from University of Las Vegas suggests that you can simply add SDRAM and directly replace on-chip memory with it.
I ran into difficulties when I tried to add SDRAM as it requires lots of pins to be defined.  I found that the test_board project provided with CycloneIV documentation checks SDRAM so I used the pin assignments from this for my SDRAM.

Once I had added SDRAM in Platform Designer, I removed on-chip memory and pointed the reset and exception vectors to SDRAM.
Now when I download programs from Eclipse to NIOS I can make them as large as I like.  A simple C program with printf and scanf takes about 100KB.

MICRIUM
There is a third variant of the Hello World program provided in Eclipse which uses the MicroC-OS/II RTOS.  It seems sensible to me to have an RTOS in an embedded processor.  I could compile the RTOS Hello World quite easily and it only took 60KB.  It starts two tasks which run concurrently.  It will be useful for multi-tasking but it doesn't have a user shell (unless I write it myself).





Cyclone IV NIOS

Introduction
Whilst building CycloneIV 8080 CPU I checked out a NIOS processor, in particular to determine how it used serial I/O.
Altera have provided NIOS as their ready-made processor for a number of years.  One reason for using FPGAs is to combine bespoke digital logic with an embedded microprocessor in the same chip, and not many customers would want to build these features from scratch. [November 2019]

Overview
To specify hardware, Altera provide Platform Designer so that you can choose components for your processor, including a NIOS II core.  When you have working hardware, the software build tools (SBT) for Eclipse (IDE) enable you to specify, compile and download C programs to the processor.
If you need extra functionality or peripherals you simply add them to the hardware design and utilise them in your program.

Samples
CycloneIV development kit came with some NIOS samples.  To check out the LED example I simply downloaded the SOF file and it ran.  Similarly for the Bell.  Unfortunately TFT examples didn't work directly.

Simple Processor
It is very easy to create a simple processor in Platform Designer.  The minimal components needed are a clock, some on-chip memory, a NIOS core and parallel IO for some LED outputs.  I wanted to add terminal I/O to the solution as I was researching that facility for my 8080 CPU and was also a bit fed up with using LEDs for debugging.
Using LEDs for output was easy but, try as I might, I couldn't use JTAG USB as a serial port.  I feel that it may not be supported, or perhaps my USB blaster doesn't support that function.  Along the way I found it is much easier to debug NIOS problems using commands in the NIOS2 shell.


Once I used the DB9 serial port the process became a lot simpler.  Having tried a number of tutorials, mainly on YouTube, my favourite was from Labbook pages.  This helped me understand what we are creating at each step of the process.



Useable Processor
The Labbook processor included the C executable in its image.  I used a YouTube tutorial to create my useable processor, which contained LEDs and serial I/O as well as the ability to load programs through Eclipse.

Extending Functionality
I could now add a seven-segment display to my processor.  I chose to do this by outputting a 16-bit number from the CPU and then using previously written Verilog to convert it to hex digits and display them.  The display works by refreshing each digit in turn every millisecond so that all the digits appear to remain lit.
I could have put this functionality in the NIOS processor and output 7SEG pin signals, but that was more work.
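To make the digit multiplexing concrete, here is a small C model of what the display logic does with the 16-bit value.  It is only a sketch: the function names are hypothetical, and the segment table assumes bit 0 = segment a through bit 6 = segment g with 1 meaning lit, which may not match the polarity or wiring of the real board.

```c
#include <stdint.h>

/* Segment patterns for 0-F, bit 0 = segment a ... bit 6 = segment g,
   1 = lit. The real board's polarity and bit order may differ (assumption). */
static const uint8_t SEG_PATTERN[16] = {
    0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07,
    0x7F, 0x6F, 0x77, 0x7C, 0x39, 0x5E, 0x79, 0x71
};

/* Pick the hex digit shown at position pos (0 = rightmost) of a 16-bit value. */
uint8_t hex_digit(uint16_t value, unsigned pos)
{
    return (uint8_t)((value >> (4u * (pos & 3u))) & 0xFu);
}

/* Segment pattern to drive while digit position pos is selected. */
uint8_t segments_for(uint16_t value, unsigned pos)
{
    return SEG_PATTERN[hex_digit(value, pos)];
}

/* Which of the four digits is active at a given millisecond tick. */
unsigned active_digit(uint32_t ms_tick)
{
    return (unsigned)(ms_tick & 3u);
}
```

With one digit per millisecond, each of the four digits is revisited every 4 ms, which is fast enough for all of them to look permanently lit.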



Cyclone IV 8080 CPU

When I have successfully reconstructed the 8080 CPU on MAX1000 it should be straightforward to build the same functionality into CycloneIV.  We can use project stages from the MAX1000 build and amend the project for CycloneIV pins and hardware components. [October 2019]

Stage 1
Create a project and use a generic Verilog program (LEDwater) to check that the LEDs are set up.  The top level module becomes Board.v, which we will use as a "PCB" for our processor.
We can then add CPU functions in increments:
test states
Add data memory
Add ALU

Stage 2
CycloneIV provides an RS232 UART for I/O.  I had an old PL2303 DB9 cable which I could plug in to the connector and PC-USB.  Unfortunately it was too old; although I persuaded it to do output, I couldn't do input until I bought a new cable.
It was then straightforward to implement keyboard input and screen output via a PuTTY terminal emulation session.

Stage 3
In addition to the DB9 UART, CycloneIV allows terminal communication via USB for a NIOS console, and this same feature worked fine for my second MAX1000 serial interface.  I wasn't able to get the inbuilt port to work on CycloneIV, so I added an FTDI RS232 interface for the serial port.  Once the corresponding CPU Verilog functions had been added I had a working CycloneIV 8080 development environment, complete with program load capabilities.

Thursday 12 December 2019

Elektor processor dissection and reconstruction

To understand how a machine works you can, perhaps, take it apart and put it back together again.  Whilst Elektor exp5 is great to see and you can look at the code to get a general idea of its construction, something more is needed to become familiar with it and understand it better. [October 2019]

Dissection


Stage 1
The processor has a debug serial input allowing you to type in single character commands and get text output.  I checked that I could add commands myself to look at the processor.

Stage 2
I removed unwanted peripherals from Top.v, the top level function:  DAC, accelerometer, SPI.
I then took out UART processing as this requires a lot of code.  Subsequent stages use LEDs for output.

Stage 3
I slowed down the CPU to 16Hz (400 ticks) so that LED changes appear in real time, i.e. without needing to insert delays.
At the end of this stage Top.v is small but we still have a working CPU running C programs and producing output.

Stage 4
Firstly we remove the code for LED output from the processor and use the LEDs for debugging output instead.
In Icarus we can see each opcode being processed.
We can remove the special states div1, div2, readmemat3 without affecting processing.
Finally we can remove all the opcodes from the CPU except for jumps.  A test program now loops but other codes are treated as NOP.
The end product is a processor with a clock, program counter and jump instructions.
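The stripped-down processor can be pictured with a few lines of C.  This is a behavioural sketch only, not the Elektor Verilog: the opcode values, the one-byte jump target and the names MiniCpu, step and run are all invented for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical opcode values, chosen only for this sketch. */
enum { OP_NOP = 0x00, OP_JMP = 0x01 };

typedef struct {
    uint16_t pc;   /* program counter */
} MiniCpu;

/* Execute one instruction: JMP loads pc from the next byte,
   anything else is treated as NOP and just advances pc. */
void step(MiniCpu *cpu, const uint8_t *code, size_t len)
{
    uint8_t op = code[cpu->pc % len];
    if (op == OP_JMP) {
        cpu->pc = code[(cpu->pc + 1u) % len];
    } else {
        cpu->pc = (uint16_t)((cpu->pc + 1u) % len);
    }
}

/* Run n steps from pc = 0 and return the final pc. */
uint16_t run(const uint8_t *code, size_t len, unsigned n)
{
    MiniCpu cpu = { 0 };
    for (unsigned i = 0; i < n; i++) {
        step(&cpu, code, len);
    }
    return cpu.pc;
}
```

A five-byte program such as NOP, NOP, an unknown code, JMP, 1 walks the program counter 0, 1, 2, 3 and then loops back to 1 forever, with the unknown code behaving as a NOP.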

Construction

1 Clock
We start a new MAX1000 project and add ALTPLL clock and LPM_COUNTER IP.  In a skeleton top level program Board.v we incorporate these components and output appropriate bits from the counter to LEDs so that we can see binary values being incremented.
In Quartus we need to add LED, clock pins and timing (SDC) information.
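Choosing the "appropriate bits" is simple arithmetic, sketched here in C.  The function names are hypothetical, and the 12 MHz clock and 8 LEDs used in the note below are assumptions about the board; with the PLL in the path the real counter clock may differ.

```c
#include <stdint.h>

/* Bit n of a free-running counter has a period of 2^(n+1) clock cycles,
   so its frequency is clock_hz / 2^(n+1). */
double bit_toggle_hz(double clock_hz, unsigned bit)
{
    return clock_hz / (double)(1ull << (bit + 1u));
}

/* The 8 LED values for a given counter value, taking bits [shift+7:shift]. */
uint8_t led_bits(uint32_t counter, unsigned shift)
{
    return (uint8_t)((counter >> shift) & 0xFFu);
}
```

For a 12 MHz clock, bit 23 cycles at roughly 0.7 Hz, so wiring counter bits 23 and upwards to the LEDs gives a count that is slow enough to follow by eye.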

2 States
Add a skeleton Cpu.v which just switches between the states fetch, decode, readmem etc.
Add Testbench.v so that we can run tests in Icarus first.
Add USR_BTN, which stops the processor when pressed; we can use this for single stepping.
Use LEDs to see the processor cycles through instructions and states.

3,4 Codemem, datamem
We implement JMP and NOP instructions.
We use a program copied from stage 4 above and can see the program counter increasing until the JMP and then looping round.
We can now implement the instructions ST (store) and LD (load) to access memory, as well as LDIND and STIND.  We set up a stack at the end of datamem and implement CALL and RET.  Add stack operations, e.g. PUSHR0, ADDSP, ...
We also add HALT to finish the program.

5 Arithmetic
Add arithmetic, logic and comparison instructions:
IADD
XOR, OR, AND, COM, NEG, MUL
CMPEQ/NE/LT/LE/GT/GE, CMPULT/ULE/UGT/UGE
Also add conditional jumping.
A few more optimiser instructions (added to C by the author to decrease number of instructions) were also added.
At this stage it is possible to compile and run a C program containing code like result=i+i;
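The reason for having both CMPLT and CMPULT style instructions is that the same bit pattern compares differently when read as signed or unsigned.  A minimal C illustration follows, assuming 16-bit operands (the word size is my assumption) and using hypothetical function names:

```c
#include <stdint.h>

/* Signed compare, as in CMPLT: operands are read as two's complement. */
int cmp_lt(uint16_t a, uint16_t b)
{
    return (int16_t)a < (int16_t)b;
}

/* Unsigned compare, as in CMPULT: operands are plain magnitudes. */
int cmp_ult(uint16_t a, uint16_t b)
{
    return a < b;
}
```

For example 0xFFFF is -1 when signed, so it is less than 1 under cmp_lt, but it is 65535 when unsigned, so it is not less than 1 under cmp_ult.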

6 Output
All peripheral output is directed by the OUTA instruction.  Initially we implement channel 5 to set LED values.  We then add output channels 9 (output character) and 8 (set output speed), and input channel 5 (determine bits left to transmit).  The CPU needs code to process the channels and Top.v needs corresponding details for physical hardware processing.
We have to add TXuart.v to do the bit-banging.
We can then run compiled programs including the C putchar() function.
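What the transmitter has to do for each character can be modelled in C.  This is a sketch under assumptions rather than a description of TXuart.v: it assumes the common 8N1 framing (one start bit, eight data bits sent LSB first, one stop bit), and the function names are hypothetical.

```c
#include <stdint.h>

/* Clock ticks per bit for a given system clock and baud rate (rounded). */
uint32_t ticks_per_bit(uint32_t clock_hz, uint32_t baud)
{
    return (clock_hz + baud / 2u) / baud;
}

/* Build a 10-bit 8N1 frame, sent from bit 0 upwards: start bit (0) in bit 0,
   data in bits 1..8, stop bit (1) in bit 9. */
uint16_t uart_frame(uint8_t ch)
{
    return (uint16_t)((1u << 9) | ((uint16_t)ch << 1));
}

/* Level of the TX line during bit slot n (0..9) of the frame. */
int tx_level(uint8_t ch, unsigned n)
{
    return (uart_frame(ch) >> n) & 1;
}
```

The hardware version holds each tx_level for ticks_per_bit clock cycles before moving to the next slot, which is why the CPU needs a way to ask how many bits are left to transmit.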

7 Input, debug and load
Add RXuart.v module.
Add IRQ processing to the CPU.
Add bootload/standalone parameter to switch program load.
This was quite an extensive step involving quite a lot of code, at the end of which we had most C instructions available to us.

8 RTC, DAC, LIS
Finally we add other peripheral functions to the processor so we are confident we have a complete working system.

Conclusion
This was a time-consuming and very worthwhile exercise which allowed me to understand how the Elektor-provided Verilog code creates an 8080 processor.  I kept variable names and code formatting the same so that the final result doesn't look radically different from the starting version, but I understand the content a lot better.

MAX1000 FPGA UARTs

UARTs are particularly useful for FPGA embedded processors as it quickly becomes very tedious using LEDs for debugging and for program output.  The Elektor 8080 embedded processor experiment 5 requires two UARTs: one for terminal I/O and the other for program load and debug statements.

As an introduction to UARTs I used an Electronoobs tutorial which provides a clear explanation of the Verilog required to send and receive bits.  I used MAX1000 pins A4/B4 to utilise the internal UART for testing.

For the second Elektor UART I used an FTDI RS232 cable attached to pins M2/M1.  I could have used any available GPIO, and if I wanted I could add more terminals using more FTDIs/GPIOs.

Initially, in Elektor experiments 1 and 2, executable 8080 C programs are loaded into the image which is downloaded to MAX1000.  Changing a program requires you to compile it, put the executable in the Quartus project folder, run synthesis and load the result using Programmer, which quickly becomes very tedious.  Elektor experiment 5 uses Processing (a PC programming environment similar to the Arduino IDE) to transmit executable 8080 C programs to MAX1000 via a serial interface.  The interface also provides some basic commands to be used for debugging.

Our first serial interface (built-in B4/A4) is required for SYSIN/SYSOUT terminal I/O and we use the second one (FTDI M2/M1) for program loading. [August 2019]