Instruction Decoder

As the machine code instructions are retrieved from memory, this functional block decodes them into the many signal lines and proper timing needed to control the hardware components. The decoder orchestrates the "symphony of the hardware"! It is the first level of abstraction in the computer model, where the hardware is first 'hidden' behind concepts (instruction codes) that we design to suit our needs.

In general there are two classes of instruction sets, and thus of decoder design: RISC and CISC machines. RISC machines may be further broken into 2 types: direct or encoded instructions (a CISC machine is by nature encoded). See below for a brief discussion of the alternatives.

This design uses a modified CISC decoder, in that sequencers are used to execute the instructions as in CISC machines and some machine code bits may be connected directly to the hardware. The instruction width is fixed at 24 bits, which is more like an encoded RISC machine.

Detailed execution sequence descriptions for all instructions will be posted in the Instruction Sequences article as they are developed. A general reference guide for the programmer is provided in the Instruction Set article.

=Design=
The Design Model is the 0-operand stack machine with Forth primitives (at least) encoded in the machine code. Instructions to this end must be included in the set. The design also includes instructions and resources far beyond those required to meet that minimum. For instance, the design includes a fast register bank which is not required for the minimum Design Model.

The decoder will execute the minimum instruction set with the least delay possible and include instructions more suitable to a 'standard' execution model. By incorporating a sequencer for machine control, new instructions will be easier to add once the hardware is designed and added. The design and layout of the machine code is therefore critical to maintaining this flexibility.

Factors
There are four basic groups of instructions in this machine:
 * 1) ALU Operations
 * 2) Data Transfers
 * 3) Stack Manipulation
 * 4) Program Flow Control

These provide the basis for the minimum Design Model and the flexibility to be much more as well. A set of machine codes to directly manipulate the stack(s) is usually not found on general purpose machines with a hardware stack. Since this is a stack-based data processor that also has a hardware 'return' stack, the Stack commands will greatly simplify higher level design and give the quickest execution time of the Forth primitives.

The instruction fetch time is the result of the design of the instruction memory and the program counter. This design uses a constant instruction width, which gives a fixed fetch time regardless of the instruction. The PC is designed such that as soon as the current instruction address is latched, the PC begins its +1 computation of the next sequential IP value. It will then be ready as soon as possible, before the end of the execution sequence of most instructions. This pre-calculation allows the controller to access the 'next instruction location' (after the +1 is computed, of course) as well as the current one.

In the standard Forth model the pointers to both the data stack and return stack may be manipulated by the code. This will not be allowed in this system. The pointers may be INC'd or DEC'd but their actual values cannot be read or changed (by instructions meant for programmer's use). There are plenty of GP or dedicated registers that may be used if necessary. Instructions may be included in the set to do these functions but they will be meant for testing and debugging only.

In many systems the instruction code resides in ROM, which cannot be altered once design is complete. Because the Design Model is an extensible language, it will be helpful to have the ability to move new commands (words) from RAM into ROM after they are debugged. This system will include several instructions to assist with that process. It will be assumed that all such transfers will essentially be 'bulk' moves and will probably use resources not used by the basic Design Model.

Opcode Design
There are 24 bits available to the machine code. The instructions with the biggest parameter fields require both a 16-bit address and 5 bits(?) for a register designator. That leaves only 3 bits (8 codes, -1), or enough for only 7 opcodes.
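The bit budget above can be checked with a few lines of arithmetic. This is only a sketch: the function name `opcode_budget` and the `reserved_codes` parameter are hypothetical, and reading the "-1" as a single unusable code is an assumption.

```python
INSTRUCTION_BITS = 24   # fixed instruction width
ADDRESS_BITS = 16       # address field of the widest instructions
REGISTER_BITS = 5       # register designator (still tentative in the notes)


def opcode_budget(total_bits, *field_bits, reserved_codes=1):
    """Return (opcode_bits, usable_opcodes) left over after the parameter fields.

    reserved_codes is an assumption: it models the "-1" as one code that
    cannot be used as an opcode.
    """
    opcode_bits = total_bits - sum(field_bits)
    if opcode_bits < 0:
        raise ValueError("parameter fields exceed the instruction width")
    return opcode_bits, max(2 ** opcode_bits - reserved_codes, 0)


bits, opcodes = opcode_budget(INSTRUCTION_BITS, ADDRESS_BITS, REGISTER_BITS)
print(bits, opcodes)  # 3 7
```

Shrinking the register designator to 4 bits would give `opcode_budget(24, 16, 4)`, i.e. 4 opcode bits and 15 usable codes under the same assumption.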

-THOUGHTS- Possibilities using the 2 fields:
 * A direct-immediate transfer, where the value IN the instruction is written to the specified register, claims 1.
 * An indirect-immediate transfer, which writes the same immediate data to the memory pointed to by the register, requires 1.
 * Direct-direct transfers affect the specified register (one 'direct' side) and the specified memory location (the other 'direct' side); the pair is 2.
 * Indirect-direct addressing, which uses the specified register as a pointer into memory ('indirect') together with the specified memory location ('direct'), would use 2. [This may be a useless operation.]
 * Indirect-indirect transfers use both the specified register and the specified memory location as pointers into memory. They would use another 2. [Wouldn't this be much neater as M[R1] -> M[R2]? And this might take too long for a single sequence: a double access into main memory?!? But it is high level pointers at machine level.]
 * Double indirect-indirect? The specified register points to a location in memory that is itself a pointer into memory, and the transfer is with the memory location pointed to by the specified memory location field.
 * 0branch to a specified register instead of assuming T?

OR

Use only 4 bits, leaving no access to dedicated registers? OR vice-versa, have no access to GP registers and transfer only with dedicated registers? YES, but it wouldn't be "bits"; the registers would be specific to the instruction and probably limited.

Or, I'd need a bit somewhere to indicate GP/DEDICATED registers!?

IRL the DEDicated registers are fewer than 8 so the MC can still access at least 8 GENeral purpose as fast as the DEDs? This is only for addressing modes needing a 16 bit field.

OR - I fully adopt the stack and force ALL transfers be through the TOS??? Or at least all main memory transfers? The decoder then has 8 bits + address field minimum... I can write minimal forth primitives and still allow for greater flexibility if no cost.

- I can always rewrite sequences and add or subtract modules!!! As long as the hardware can make it happen - or be extended to do so! BUT - NOT - adding to the instruction width.

Instructions without a 16-bit field use at most two 4-bit fields (less the 3 or 4 bits above), leaving 13 bits for machine code, or 3 Hexes.

Electrical Design
The simulation environment uses Hex (4 bit) signals as easily as a single bit. Consequently this design is built using Hex operation devices so the machine code is most easily and quickly broken into 4-bit segments. Some of these segments may be executed in parallel and some may be read by another segment for a cascaded execution sequence.
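As a rough illustration of breaking the machine code into Hex segments, the sketch below splits a 24-bit word into six 4-bit values. The function name `split_hexes` and the most-significant-first ordering are assumptions, not part of the design.

```python
def split_hexes(instruction, width_bits=24):
    """Split an instruction word into 4-bit segments, most significant first."""
    if width_bits % 4 != 0:
        raise ValueError("width must be a whole number of hexes")
    if not 0 <= instruction < (1 << width_bits):
        raise ValueError("instruction does not fit in the given width")
    count = width_bits // 4
    return [(instruction >> (4 * (count - 1 - i))) & 0xF for i in range(count)]


# Each segment could then be bussed to its own sequencer or straight to hardware.
print([format(h, "X") for h in split_hexes(0xA1B2C3)])
# ['A', '1', 'B', '2', 'C', '3']
```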

Using many parallel devices each decoding its own segment (Hex) of the machine code is the most time efficient method and the sequencer components may be distributed at the hardware sub-systems they need to control. The machine code is bussed throughout the system to all the sequencers and directly to other hardware. The sequencer can be as wide as desired without problems or limit. Each (Hex) instruction segment requires a device to provide the feedback address for the sequencer and another one to generate the "New Instruction" signal to the PC.

The various sequencer outputs directly control the hardware on a state-by-state basis. A maximum of 16 machine codes can be grouped into a single sequencer, but that sequencer can have as many outputs as needed. Many control lines can also be "wire-ORed" so that more than one sequencer may control the same hardware. Proper sequencer design ensures that no two sequencers use a resource at the same time.

The instruction code forms the address of the beginning of a microcode sequence. It is the high-order hex of the sequencer ROM address. The low-order bits come from the feedback address generator ROM for that sequencer.

To implement the WAIT state, the code hex is gated by an AND. The other input to the AND comes from the respective ROM. When the code input is disabled, the ROM 'jumps' to address '0X'. At the same time the feedback address ROM outputs a '0', so the ROM actually jumps to '00'. This location continues to output the signals needed to keep the sequencer in WAIT.
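The address formation and the AND gating can be sketched as follows. The name `sequencer_address` and the boolean `code_enabled` (standing in for the ROM output that drives the AND gate) are hypothetical.

```python
def sequencer_address(code_hex, feedback_hex, code_enabled):
    """Form the 8-bit ROM address: gated instruction hex on top, feedback hex below."""
    mask = 0xF if code_enabled else 0x0   # models the AND gate on the code hex
    return ((code_hex & mask) << 4) | (feedback_hex & 0xF)


# Executing: instruction code 7, state 3 selects address 0x73.
print(hex(sequencer_address(0x7, 0x3, True)))    # 0x73

# WAIT: the code is gated off and the feedback ROM outputs 0, giving address 00.
print(hex(sequencer_address(0x7, 0x0, False)))   # 0x0
```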

=Implementing a Microcoded Sequencer=
This decoder is implemented with a parallel array of fast memories. The instruction code forms the upper part of the address to access the first "micro-instruction". The array outputs directly to the hardware and also feeds several bits back to its own address input. This accesses the next micro-instruction, and so on. This setup should have the array run its sequences as fast as the system clock will allow. The virtual environment allows for unlimited "fan-out" without any signal degradation or timing issues. The Memory Bank is used as the blocks of the sequencer. It provides the functionality of a 16 bit input, 4 bit output PLD that will be run in asynchronous mode so it responds in a single sysclock tic. (These are 2 factors that cannot be achieved in a physical system.)

The Memory Bank is used as the sequencer memory. It may be set up as a fast running sequencer by simply connecting the (hex) output back to one of its (hex) address inputs. The other address input is the instruction code. This allows the memory to advance at sysclock (tic) speed. This does mean that each instruction sequence can be no more than 16 states long, including the wait state "call". But this "call" output cannot be acted upon until the next tic at the soonest, so it can be generated in parallel with the last microinstruction. This way it will not cause any additional delay.
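The feedback arrangement can be modelled as a 16 x 16 table whose output becomes the next column address. This is a sketch only: `trace_states` and the convention that a state pointing to itself is a hold loop are assumptions made for the example.

```python
def trace_states(feedback_rom, code, start_state=0, max_tics=16):
    """Follow the feedback path for one instruction code.

    feedback_rom[code][state] is the hex fed back as the next state address.
    Tracing stops when a state points to itself (treated here as a hold loop).
    """
    states = [start_state]
    for _ in range(max_tics):
        nxt = feedback_rom[code][states[-1]] & 0xF
        if nxt == states[-1]:
            break
        states.append(nxt)
    return states


# A 16 x 16 bank, all zero except a short sequence programmed for code 3:
# state 0 -> 1 -> 2 -> 3, and state 3 holds.
rom = [[0x0] * 16 for _ in range(16)]
rom[3][0:4] = [0x1, 0x2, 0x3, 0x3]

print(trace_states(rom, 3))  # [0, 1, 2, 3]
```

Because the state address is a single hex, no row can hold more than 16 distinct states, which is the limit described above.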

To provide the fastest operation speed possible, the single-deep Memory Bank should be the whole of the sequencer memory. However, this will only allow 16 sequencer operations! To provide a usable instruction set the decoder uses segments of 4 bits each. A single 'primary' sequencer includes codes to activate 'secondary' sequencers and to wait for their completion before continuing its own sequence. (At this time it appears the primary sequencer must be in the WAIT state while a secondary sequencer executes and cannot perform any more operations once control is returned to it.)

The signal sent to the PC is NOT the same as the control which puts the sequencer into WAIT. The call for a new instruction may be made sooner in order to let the PC fetch the new instruction while the sequencer is still executing. Or the wait state may be entered without sending the "New Instruction" signal. The hardware must then otherwise provide a signal to end the wait state.

This is a representation of the memory in the Memory Bank. The row is selected by the instruction code and the column within that row is determined by the feedback address from another block within the sequencer.
 * Memory Bank Info

||0 1 2 3 4 5 6 7 8 9 A B C D E F| 0||X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X| 1||X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X| 2||X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X| 3||X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X| 4||X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X| 5||X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X| 6||X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X| 7||X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X| 8||X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X| 9||X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X| A||X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X| B||X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X| C||X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X| D||X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X| E||X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X| F||X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|

Each row in the matrix represents a single instruction execution sequence and sets the timing of that particular block's output during each instruction. During operation the low-order address is incremented by one each sysclock cycle (tic) and the block's output is set to the value programmed for that cycle. So each row represents the waveform of the output during the selected instruction.

Note that the output of the Bank is actually a 'hex' signal. This gives the system excellent flexibility. Most blocks will only need to output a single bit equivalent. This is satisfied by only programming either a '0' or 'F' in the proper locations. Other blocks may use the full hex capability, even allowing further decoding of the output if needed.
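To make the "row as waveform" idea concrete, here is a small sketch of one output block read out over a run of states. The helper names `waveform` and `to_bit` are hypothetical; only the '0'/'F' convention comes from the text.

```python
def waveform(output_row, state_sequence):
    """Return the block's hex output for each state visited, one value per tic."""
    return [output_row[state] for state in state_sequence]


def to_bit(hex_value):
    """Interpret a block output as a single bit: '0' is inactive, 'F' is active."""
    if hex_value not in (0x0, 0xF):
        raise ValueError("single-bit blocks should only hold 0 or F")
    return 1 if hex_value == 0xF else 0


# One row of a single-bit block: active during states 1 and 2 only.
row = [0x0, 0xF, 0xF, 0x0] + [0x0] * 12
wave = waveform(row, [0, 1, 2, 3])
print([to_bit(v) for v in wave])  # [0, 1, 1, 0]
```

A block that uses the full hex capability would skip `to_bit` and pass the values of `waveform` on for further decoding.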

Operation
The sequencer starts each instruction in the WAIT state, as described above.

The "New Instruction" signal is received from the Program Counter after the new instruction code is stable. This signal initiates execution of that instruction code's particular sequence and the first microinstruction is executed. Each microinstruction sends signals to the various hardware components that it controls.

One section of the sequencer's output blocks (4 bits) is immediately fed back to the sequencer's address input. This addresses the next microinstruction, which is fetched and output. This address portion will be called the "state address" herein. So the complete address of the sequencer is formed from the sequence address (from the instruction code) and the state address (from the feedback block in the sequencer). The instruction code must remain latched as long as the sequencer is executing it. It can only change (without repercussions) during the sequencer's WAIT state.

This repeats until the sequence is complete. Throughout the sequence each output creates a waveform in time. The resolution is at single tic speeds. This is how the sequencer controls the timing of each machine function. All base functions should be able to be broken down into sequences short enough to fit within 15 states (or tics). [Implementation granting.]

The sequencer can be made as wide as needed to accommodate all machine control signals. It is assumed that these signals may be reset to inactive at the conclusion of every instruction. If this is not the case, special provisions must be made [along with much cursing].

On the final microinstruction, in addition to any required machine controls, the sequencer will activate an "I'm waiting" signal and enter the WAIT state loop. This signal goes to the Program Counter to let it know the decoder can accept a new instruction. All outputs are set to their default values during WAIT.

And the process begins again.

Wait State Control
Entering and leaving the wait state must be fast since this affects the execution time of every instruction code. To eliminate the need for a gate chip, the environment's "wired-OR" feature will be used. When the new instruction is present and the PC sends the "READY" signal, this signal can be directly tied to the state address line. Then a 'F' on the ready signal will force the state address to an 'F' for the current instruction code. Execution of instructions thusly will begin at state 'F'.

There is no logical problem with this since the sequence of state addresses does not need to be sequential. However, for human reading it is best that each next state is +1 from the former. This sequencer could be made to count 'backwards' from 'F' but it is probably easier to understand if it just 'rolls over' to the '0' state, then '1', etc.

In this case, it is best to use the 'E' state as the WAIT loop. The state address generator for each sequencer has an 'E' in the 'E' state of every sequence it executes. That forms the endless WAIT loop. Then when the "READY" signal is sent from the PC, it forces the state address to an 'F' so the new instruction may begin.

The new sequence will not continue as long as this "READY" signal stays active because it forces the state address to stay at 'F'. So "READY" must be as short as possible, preferably a single tic long. When this signal is de-activated, the output of the state address generator block will be presented to the sequencer's address input and the next state ('0') will be entered at the next tic.

Execution will continue until this sequence is done. At that time the state address generator outputs an 'E' which will force the sequence into the WAIT state.
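The whole WAIT / READY cycle can be walked through tic by tic. This is a minimal sketch, assuming the wired-OR behaves as a bitwise OR of the two hex values and that working states run from '0' upward; `build_feedback_row` and `simulate` are hypothetical names.

```python
WAIT, START = 0xE, 0xF   # 'E' is the WAIT loop, 'F' is the forced start state


def build_feedback_row(length):
    """State address generator row for one instruction with `length` working states."""
    if not 1 <= length <= 14:
        raise ValueError("working states must fit in 0..D")
    row = [WAIT] * 16            # unused states, and 'E' itself, point at WAIT
    row[START] = 0x0             # 'F' rolls over to state '0'
    for state in range(length - 1):
        row[state] = state + 1   # each working state steps to the next
    return row                   # the last working state already points at WAIT


def simulate(row, ready_by_tic):
    """Return the state address at each tic; READY is wire-ORed onto the feedback."""
    state = WAIT
    trace = []
    for ready in ready_by_tic:
        state = row[state] | (0xF if ready else 0x0)
        trace.append(state)
    return trace


row = build_feedback_row(3)
ready = [False, True, False, False, False, False, False]
print([format(s, "X") for s in simulate(row, ready)])
# ['E', 'F', '0', '1', '2', 'E', 'E']
```

Holding READY active for more than one tic simply repeats 'F' in the trace, which is why the signal should be a single tic long.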

Chained Sequences
The above description is about a single stage sequencer. This design can allow chaining sequencer stages. The caveat for this extension method is that the machine code input cannot be an 'F'.

To implement a chain, an additional block is added to the primary sequencer stage. This block outputs an 'F' until the final state. Then it outputs a '0'. This output is wire-OR'd with the code input for the following stage. The wire-OR keeps the next stage in a wait state (in code 'F'). Then when the output goes to '0' it allows the machine code to pass through to the next stage to continue the extended sequence. The machine code must be isolated between each stage to prevent the wire-OR from affecting the stage currently executing the code. (This adds NO delay to execution.)

The timing of the "Next_Stage" signal is set for a seamless transition from one stage to the next.
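The hand-off between stages can be sketched by OR-ing the extra block's output onto the code seen by the following stage. Again the wired-OR is assumed to act as a bitwise OR, and the names `next_stage_code` and `chained_code_waveform` are hypothetical.

```python
HOLD, RELEASE = 0xF, 0x0   # outputs of the extra block in the primary stage


def next_stage_code(machine_code, hold_output):
    """Code hex seen by the following stage after the wire-OR."""
    if machine_code == 0xF:
        raise ValueError("code 'F' is reserved as the chained WAIT code")
    return machine_code | hold_output


def chained_code_waveform(machine_code, primary_states):
    """Code presented to the next stage for each tic of the primary sequence."""
    hold = [HOLD] * (primary_states - 1) + [RELEASE]   # 'F' until the final state
    return [next_stage_code(machine_code, h) for h in hold]


# Primary runs 4 states; the secondary sees 'F' (wait) until the last one.
print([format(c, "X") for c in chained_code_waveform(0x5, 4)])
# ['F', 'F', 'F', '5']
```

This also shows why the machine code cannot be 'F': the following stage could not tell it apart from the held state.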

Chained stages can only use 13? states while the primary can use 14. However, the sequencer may be indefinitely extended in this fashion.

=Notes=
 * This design embraces parallelism and the decoder is no exception! Although there is the limitation of 16 instructions fed to the sequencer, the decoder may be parallelized (?). Each 4-bit group within the instruction code may be fed to a separate sequencer and all sequencers will operate in parallel.
 * Some bits of the instruction do not need to be fed through a sequencer, such as the (hex) control signal which is sent to the ALU.
 * If a portion of the instruction width is dedicated to addressing, this portion may be fed either directly to the associated memory or to a demultiplexer that routes it to the correct memory. This demux is in turn controlled by other bits within the instruction and these would likely come from another sequencer. That sequencer would set the timing of access to the proper memory.
 * Some of the states may read other parts of the machine code and use it according to the instruction segment.

When the primary sequencer gives control to a secondary one, the primary goes into WAIT but does not send the "New Instruction" request. It is up to the secondary sequencer to call for the next instruction at the appropriate time for the instruction being executed.

Can I have a secondary sequencer start an asynchronous function? The function would feed back to where? Perhaps use one of the feedback address bits? If so, that cuts the maximum sequence length in half.