An 8-bit Stack Processor
author: Steven Sutankayo
designers: Rob Chapman and Steven Sutankayo
1.0 Introduction
Digital systems can sometimes be implemented more cheaply or easily with
a processor-based solution. This can allow hardware savings by implementing
functions algorithmically rather than in pure hardware. Instead
of using an external processor, a lightweight processor can be synthesized
on the FPGA and the remainder of the chip made available for system peripherals.
An on-board processor is thus suitable for "system on a chip" applications.
This application note presents a general purpose 8-bit stack processor
(SP) suitable for such applications.
A general overview and history are presented, along with the source files
and example usage.
2.0 Overview
For an overview of stack-based processors, refer to http://www.cs.cmu.edu/~koopman/stack_computers/index.html.
The processor is an Altera FPGA implementation of an existing processor
created by my lab partner Rob Chapman in previous courses. His papers
"A Stack Processor: Synthesis" and "A Writable Computer" describe the
creation and evolution of the processor. Our project specification
document also contains some useful documentation on how we use the stack
processor to implement a neuroprocessor network.
2.1 Architecture
The processor has an eight-bit data bus, an eight-bit address bus, one data
stack, and one return stack. It is a zero-operand processor, which
means that the instruction opcodes do not contain information pertaining
to the source, destination, or value of its instruction arguments.
As is prevalent in stack processors, there are no data or address registers.
Addition and subtraction are performed with a statistical addition circuit
in order to conserve chip resources.
2.2 Programmability
Instructions are available for data stack manipulation, program branching
and returning, conditional branching, addition, subtraction, pointer fetching,
pointer loading, and single-instruction incrementing of counters and pointers.
Text source code files can be assembled into the Memory Initialization File
(.mif file) format, which is used to initialize memory at compile time.
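To make the target format concrete, the following is a minimal Python sketch that renders a list of bytes as .mif text. It is not SPASM itself, and the function name to_mif is hypothetical. The DEPTH/WIDTH header and CONTENT BEGIN ... END; layout follow the Altera .mif format, but the exact layout that SPASM emits may differ.

```python
def to_mif(words, depth=256, width=8):
    """Render a list of integer words as Altera .mif text in hex radix."""
    if len(words) > depth:
        raise ValueError("program does not fit in memory")
    lines = [
        f"DEPTH = {depth};",
        f"WIDTH = {width};",
        "ADDRESS_RADIX = HEX;",
        "DATA_RADIX = HEX;",
        "CONTENT",
        "BEGIN",
    ]
    for addr, word in enumerate(words):
        if not 0 <= word < (1 << width):
            raise ValueError(f"word at address {addr} does not fit in {width} bits")
        # One "address : data;" entry per memory location.
        lines.append(f"{addr:02X} : {word:02X};")
    lines.append("END;")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_mif([0x12, 0xAB]), end="")
```

Only the locations that hold program bytes are listed here; this sketch does not emit entries for the rest of the address space.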
2.3 Synthesis Issues
A simple test system requires 500 LCs, approximately 90% of an Altera 10k10
series FPGA. Version 8.1 of the maxplus2 design package was used.
Since the source file assembler outputs .mif files, program memory should
be implemented using the LPM_RAM_DQ megafunction. This implements
memory using Altera's Embedded Array Blocks (EABs). The use of EABs
for memory also reduces the delay that would be associated with combinational
lookup tables.
3.0 Design Files
Two sets of files are available: stackprocessor.tar
and example.tar.
3.1 Stackprocessor.tar
Included in stackprocessor.tar are all the source code files
and project configuration files (.cnf files) required to compile the SP.
Also included is the Stack Processor Assembler (SPASM). SPASM
compiles the user's source code file into a .mif file. It is a two-pass
assembler, so program counter labels may be used by the programmer and
they will be resolved by the assembler. If the "-w" option is used,
SPASM also compiles instruction code constants into a VHDL package.
This is only necessary if the designer wants to modify or add instructions
to the instruction register. The PERL interpreter is required to
run SPASM.
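The two-pass label resolution described above can be sketched in a few lines of Python. This is an illustration of the general technique only, not the SPASM source: the "name:" label syntax, the one-byte-per-token layout, and the opcode values in OPCODES are all assumptions made for the example.

```python
# Hypothetical opcode encodings, chosen only for this example.
OPCODES = {"psh_mem_DS": 0x01, "add": 0x02, "bra_mem": 0x03}


def assemble(source):
    """Assemble whitespace-separated tokens into a list of bytes."""
    tokens = source.split()

    # Pass 1: walk the program and record the address of every label.
    labels = {}
    addr = 0
    for tok in tokens:
        if tok.endswith(":"):
            labels[tok[:-1]] = addr
        else:
            addr += 1  # every other token occupies one byte

    # Pass 2: emit bytes, replacing label references with their addresses.
    image = []
    for tok in tokens:
        if tok.endswith(":"):
            continue
        if tok in OPCODES:
            image.append(OPCODES[tok])
        elif tok in labels:
            image.append(labels[tok])
        else:
            image.append(int(tok, 0) & 0xFF)  # numeric literal
    return image


if __name__ == "__main__":
    print(assemble("start: psh_mem_DS 5 add bra_mem start"))
```

Because all label addresses are known before any byte is emitted, a branch may refer to a label that is defined later in the file.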
3.2 Example.tar
The example.tar package contains a sample design for memory, and
a sample top-level design file. A summation program is included which can
be compiled by SPASM and used to provide the memory initialization file
for the test system.
4.0 Using the Processor in a Design
Using the stack processor to create a digital system involves designing
memory, creating the top-level entity (i.e. your stack processor based
system), creating the program source file, assembling it, initializing
memory, and simulation. If new instructions are needed, the stack
processor can be modified, but this requires detailed knowledge of the
processor's internals.
4.1 Memory Specification
To function properly, the SP requires that the system
memory behave in a well-defined way. Memory is synchronous. To access a particular
memory location, the appropriate address is placed on the address lines.
On the next rising clock edge, the address is latched in. The SP
assumes that the data will be available on the rising clock edge after that.
As mentioned, the LPM_RAM_DQ megafunction is used to implement program
memory. Altera recommends that this memory be configured as synchronous.
There are two clock inputs to the RAM: address_in and data_out. If
both of these inputs are clocked on the rising edge, two clock cycles would
be required to fetch a particular memory location. Since this would
violate our memory specification, the data_out port should be falling-edge
triggered. This can be done by inverting the system clock.
This way, the memory specification is met because the contents of the memory
address present at any rising clock edge are guaranteed to be available
at the next rising clock edge. NOTE: When I attempted this
using version 7.1 of maxplus2, it did not work (see the
Endnotes).
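The effect of the output clock edge can be illustrated with a small, idealized Python model. It assumes a RAM with a registered address and a registered output, zero propagation delay, and a consumer that samples the output on rising edges; the function name first_edge_with_data is hypothetical. It is a sketch of the timing argument above, not a model of the actual megafunction.

```python
def first_edge_with_data(mem, addr, output_on_falling_edge):
    """Return (edge, data): the rising edge on which a consumer first sees data.

    Edge 1 is the rising edge that latches the address into the RAM.
    """
    addr_reg = None  # registered address inside the RAM
    q = None         # registered data output of the RAM
    for edge in range(1, 10):
        # The consumer samples q as it was just before this rising edge.
        if q is not None:
            return edge, q
        # Rising edge: registers update from their pre-edge inputs.
        if not output_on_falling_edge and addr_reg is not None:
            q = mem[addr_reg]
        addr_reg = addr
        # Falling edge in the middle of the cycle.
        if output_on_falling_edge:
            q = mem[addr_reg]
    raise RuntimeError("data never appeared")


if __name__ == "__main__":
    mem = [10, 20, 30]
    print(first_edge_with_data(mem, 1, output_on_falling_edge=True))
    print(first_edge_with_data(mem, 1, output_on_falling_edge=False))
```

With the output register on the falling edge, the data is visible on edge 2, the rising edge after the address is latched. With both registers on the rising edge, it is not visible until edge 3.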
8 bits of addressing are available, which gives 256 separate memory
locations. For a general-purpose system, 128 locations could be allocated
to program RAM and the rest to ROMs or peripherals. Larger programs could
be allocated more address bits to provide more locations.
Care must be taken when designing complex memory systems, because of
the synchronous memory requirements. The method in the example memory design
uses a synchronous state machine to multiplex the two different memory
components. This method can be scaled to more complicated memory systems
by cascading the memory-decoding state machines.
The memory system included with the sample design allocates the first
128 locations to an EAB-based RAM, which is implemented with the Altera
lpm_ram_dq megafunction.
The upper half of the memory space (the last 128 locations) is left
undecoded, except for one 8-bit register which is fully decoded at address
FF.
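The memory map of the sample design can be summarized with a short behavioral model in Python. This sketch covers only the address decoding; it ignores the clocking and the multiplexing state machine. The class name MemoryMap and the choice to return None for undecoded reads are assumptions made for the example.

```python
RAM_SIZE = 128   # first 128 locations: EAB-based RAM
REG_ADDR = 0xFF  # single fully decoded 8-bit register


class MemoryMap:
    """Behavioral model of the sample design's address decoding."""

    def __init__(self):
        self.ram = [0] * RAM_SIZE
        self.reg = 0

    def read(self, addr):
        addr &= 0xFF  # 8-bit address bus
        if addr < RAM_SIZE:
            return self.ram[addr]
        if addr == REG_ADDR:
            return self.reg
        return None  # undecoded location (result is an assumption)

    def write(self, addr, value):
        """Store value; return False if the address is undecoded."""
        addr &= 0xFF
        value &= 0xFF  # 8-bit data bus
        if addr < RAM_SIZE:
            self.ram[addr] = value
        elif addr == REG_ADDR:
            self.reg = value
        else:
            return False
        return True


if __name__ == "__main__":
    m = MemoryMap()
    m.write(0x10, 7)
    m.write(0xFF, 0x55)
    print(m.read(0x10), m.read(0xFF), m.read(0x80))
```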
4.2 Designing The Top-Level Entity
The top-level design merely instantiates the stack processor and memory
components and provides the required inputs and outputs. The processor
data and address lines must be connected to memory, and inputs/outputs
for peripherals must be provided.
4.3 Creating the Program File
This version of the stack processor application note does not include extensive
documentation of the instructions available for programs, but a simple
and well commented example program has been included.
A brief description of each instruction is given here:
Instruction | Description | Action
psh_mem_DS | push the contents of the given memory location to the data stack | DS_index <= DS_index - 1; DS(index) <= memory(PC+1)
pop_DS_TOP | pop the data stack and store its contents to TOP | TOP <= DS(index); DS_index <= DS_index + 1
sto_DS_memimm | store the contents of the data stack to the given memory location | memory(PC+1) <= DS(index)
sto_memimm_DS_and_sto_DS_TOP | fetch the contents of the given memory address and store to the data stack, while storing the current data stack to TOP | TOP <= DS(index); DS(index) <= memory(memory(PC+1))
bra_mem | branch to the given memory location | PC <= memory(PC+1)
bnzero_imm | branch to the given memory location if TOP is nonzero | if TOP != 0 then PC <= memory(PC+1)
bzero_imm | branch to the given memory location if TOP is zero | if TOP = 0 then PC <= memory(PC+1)
add | add TOP to the data stack, place the result in the data stack | DS(index) <= DS + TOP
subtract | subtract TOP from the data stack, place the result in the data stack | DS(index) <= DS - TOP
sto_mem_DS | store the contents of the given memory location to the data stack | DS(index) <= memory(PC+1)
psh_memptr_DS | push the contents of the location referenced by the given pointer to the data stack | DS(index) <= memory(memory(memory(PC+1)))
sto_DS_TOP | copy the data stack to TOP | TOP <= DS
swap | exchange the contents of the data stack and TOP | TOP <= DS, DS <= TOP
sto_TOP_DS | copy TOP to the data stack | DS <= TOP
incr_ptr | increment given pointer by 1 | ptr = memory(memory(PC+1)); memory(ptr) <= memory(ptr) + 1
sto_DS_memptr | store the data stack to the location referenced by the given pointer | ptr = memory(memory(PC+1)); memory(ptr) <= DS
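To show how the Action column reads in practice, here is a minimal Python sketch that interprets a handful of these instructions. It is not derived from the processor sources. The opcode values, the stack depth, the PC increments (2 for an instruction followed by an operand byte, 1 otherwise), and the 8-bit wraparound are all assumptions made for the example.

```python
# Hypothetical opcode values; the real encodings are defined by the SP sources.
PSH_MEM_DS, POP_DS_TOP, ADD, SUBTRACT, BRA_MEM, BNZERO_IMM = range(6)


def run(memory, steps, stack_depth=16):
    """Execute `steps` instructions; return (current data stack entry, TOP)."""
    ds = [0] * stack_depth
    index = stack_depth  # empty stack; a push decrements the index
    top = 0
    pc = 0
    for _ in range(steps):
        op = memory[pc]
        if op == PSH_MEM_DS:
            index -= 1
            ds[index] = memory[pc + 1]
            pc += 2
        elif op == POP_DS_TOP:
            top = ds[index]
            index += 1
            pc += 1
        elif op == ADD:
            ds[index] = (ds[index] + top) & 0xFF
            pc += 1
        elif op == SUBTRACT:
            ds[index] = (ds[index] - top) & 0xFF
            pc += 1
        elif op == BRA_MEM:
            pc = memory[pc + 1]
        elif op == BNZERO_IMM:
            pc = memory[pc + 1] if top != 0 else pc + 2
        else:
            raise ValueError(f"unknown opcode {op}")
    current = ds[index] if index < stack_depth else None
    return current, top


if __name__ == "__main__":
    # Load 3 into TOP, push 5, then add: the data stack entry becomes 8.
    program = [PSH_MEM_DS, 3, POP_DS_TOP, PSH_MEM_DS, 5, ADD]
    print(run(program, steps=4))
```

Note that none of the opcodes name a source or destination; the operands are implied by the data stack and TOP, which is what makes this a zero-operand design.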
4.4 Running the assembler
The SPASM assembler assembles a program file and outputs a Memory Initialization
File (.mif), which is used by the Altera compiler and simulator to initialize
the RAM. It is written in PERL and expects the PERL interpreter to
be located at /usr/local/bin/perl. If PERL has been installed
in a different location, edit the SPASM source code and change the
first line accordingly.
To assemble your source file, type: cat program_file.src | spasm > program_file.mif
4.5 Initializing memory, compilation, and simulation
If you wish to initialize the memory via the compiler, just recompile your
memory file or your top-level file. If no compilation is necessary,
you may choose the "Initialize Memory" menu item in the Simulator menu.
A dialog box will appear where you can inspect and initialize the contents
of memory.
You may then simulate. Be careful that the simulator does not
over-write your memory initialization between simulation runs. If
this happens, just re-initialize it manually, or compile your memory file
with the new .mif file present.
Endnotes
Version Discrepancy for EAB-based RAM:
When I attempted to clock the data_out port of the LPM_RAM_DQ megafunction
from an inverted clock in version 7.1 of the tools, it did not work.
It still took 2 clock cycles to fetch the memory location.
References
Altera Corporation. "Guide to EAB-based RAM megafunctions", http://www.altera.com/document/an/an052_01.pdf
Rob Chapman. "A Stack Processor: Synthesis", http://www.compusmart.ab.ca/rc/Papers/spsynthesis.pdf
Rob Chapman. "A Writable Computer", http://www.compusmart.ab.ca/rc/Papers/writablecomputer.pdf
Steven Sutankayo, Rob Chapman. "Implementing A Neuroprocesor Network:
Product Specification", http://www.compusmart.ab.ca/rc/Papers/inndps.pdf