Digital Simulator Core

Writing an integrated digital simulator including schematic capture, automatic netlist extraction and all associated facilities is probably an arduous task. On the other hand, once a netlist is available, a digital simulator core is a more manageable project. In the first instance, only the functional level will be implemented, not the behavioural (or even mixed) mode. A simple simulator is built upon the premise that ' circuits shall not be tested very close to their maximum operational frequency'.

This section contains the following parts:

Preprocessor options

Input netlist checks

Tristate/ unknown nodes

Sensible options for the input preprocessor.

Here are some imperative features of the netlist input preprocessor:

Symbolic names, as well as node numbers, must be accepted.
A bus notation must be provided- it is no good having to quote 64 (or more) signal names each time. It will be necessary to reference part of the bus, too.
Identification and expansion of macros will be required: If parts of the circuit are repeated (looking, more or less, the same), it should only be necessary to define them once. Thereafter, they are described by instantiating the type defined and possibly quoting a few parameters. The preprocessor flattens macros to a detailed gatelist, after confirming that the proper number and type of arguments is used.
Users of Verilog are accustomed to outputs preceding inputs, MSB first, while VHDL fans seem to favour the opposite. It is impossible to please both camps without an input format converter.

Essential input netlist checks.

Here are some sound input netlist checks:

Unless it has been designated as a primary output, each node must have at least one gate/ module input connected to it.
If a node is not a primary input, precisely one gate output must be connected to it. Normally, multiple outputs connected to the same node give a warning. Tristate nodes, on the other hand, allow multiple tristate devices to be attached to them.
Fanout violations must be reported.
The program should not like unconnected inputs to or outputs from gates or modules: Nodes 0 and 1 can permanently provide logic levels of the corresponding values for unused inputs. Node 2 can sink any number of unused outputs (such as may be the complimentary outputs of flip flops.)

These checks will prevent an input being substituted for an output, or vice versa. They will not, however, stop an input being supplied for a different input (or an output for a different output.)

Delay models.

Delays would be of little concern, were circuits purely combinatorial: They would simply imply a maximum operational speed for the circuit. However, as soon as some of the results have to be stored, the register latching them must wait for the data from the previous stage to become valid, before doing so. Data may be lost if the register delays too long, on the other hand, (and the design will not function as fast as it could, anyway.) When dynamic registers are involved, a minimum operational frequency is enforced upon the circuit, even if the remaining part of the design does not need such a high speed.

The duration of the widest pulse which will be suppressed by a gate defines its inertia delay. On the other hand, transport delay is the time difference between a change in an input signal and the corresponding transition taking effect on the output. Strictly speaking, the time for an output to rise, can be different from the time it needs to fall. It is important to calculate exact delay values when designs are operating near their maximum frequency.

On- chip, transport delay is basically proportional to the sum of inputs and outputs connected to the gate. For really fast chips, some wires become slower than gates. Then, the exact delay is not known until the subunits of the integrated circuit have been placed, and connections implemented by the autorouter. (You will be using a more sophisticated simulator long before that.)

Turning to sequential circuits, the input data must be sustained for a short time after the active edge of the clock signal, if it is to be stored reliably. This is known as the hold time. Some simulators even insist that data be stable for a small time interval prior to the active transition of the clock- that's the setup time. (A similar constraint is imposed upon the 'program enable' input.) Additionally, control inputs such as clock, clear, preset, program load, count and so on must persist for a minimum amount of time, or the state of the flip flops becomes undefined. There may be technology- dependent priority rules if several control inputs are excited at the same time.

Inertia delays, hold times and minimum control pulse widths are verified by performing timing checks. If a timing check fails, some of the logic values scheduled have to be deleted, as well as some of the future timing checks: Since a leading edge does not occur, there is no need to verify the timing of the trailing edge.

Many are the advantages of synchronous design, not least, that it results in circuits which are easy to understand and simulate. Unless the circuit is near its maximum speed, restrictions on sequential elements should be likely to be observed.

Data structures for digital simulator.

The simulator needs a data structure to hold the input netlist, that is, the node numbers to which each gate or module is connected: Thus each gate knows where to look for its input data, and where to place the results. (It may seem that this is just a copy of the netlist file, but macros must be collapsed to gatelists, and additional preprocessing may be needed. More importantly, the items in the file can only be examined sequentially, so the speed of access will be poor.)

Here are some options:

Directly accessing memory- abandoned a long time ago.
Using pointers- adequate, but not available in some well- established languages.
Using a non- uniform two- dimensional array- ditto.
Using a two dimensional array initialised to the maximum width required- very wasteful of memory (especially if modules, like multi- bit adders are provided to the user ready- made). It may be acceptable, though, if the circuit to be simulated is small.
The uneven 2- d array is collapsed to a one dimensional array. A second array gives the starting index for each gate/ module in the first array.

This last option makes the program easily portable to any language- it looks like the preferred choice.

A similar data structure is used for the affected component list -ACL- (sometimes known as the fan- out list- FOL.) For each node, a list of the serial numbers of each gate or module, which has at least one of its inputs connected to the node, is compiled. When the logic level on the node changes, it is thus easily possible to find out which gates may be affected. Their output(s) will have to be evaluated again.

From the subsection on delays, it should be clear that node values are scheduled, rather than being applied to nodes immediately. A data structure is needed to hold that information in the meantime. In the first array of records, each record has the following fields:

the node number
the logic value to be impressed upon the node
the index of the next record pertaining to the same simulation time (or -1 if there isn't one.)

The second array of records has the following fields:

the simulation time
the index of the first record (in the first array) corresponding to that simulation time

(These data structures shall be collectively termed 'the event buffer' from now on. 'Event' here has the usual programming meaning, not the one in hardware description languages (HDL.)

The third array can be used as a stack to reserve and (later on, after records have been processed) to free slots for records. Slots (indices) are allocated from the top of the stack, and restored there, after the corresponding records have been read. The reason for needing this array, is that generally record slots will not be recovered in the same order they have been reserved (especially if modules, rather than just gates, are used.) This may look like a waste of space and time, since memory is plentiful nowadays. However, circuits have grown very large, too. This array can be omitted if only small circuits are going to be simulated: Addresses for records already processed are simply discarded, rather than retrieved for new use. The entire workspace can be reclaimed by the operating system at the end of the simulation. Sample code for asserting and relinquishing slots for records is in the fast fill routine section, which, by the way, now includes the improved version.

Records for timing checks are similar to records for values scheduled to be applied to nodes: Of course, the logic levels indicated are to be verified until the time mentioned, rather than being impressed upon the instant specified.

Additional data structures are needed for tristate nodes; these are discussed in the tristate section.

Describing input waveforms.

There are about three ways to define input signals:

specify initial state, frequency and mark- to- space ratio, for the more mundane signals
specify initial state and switching instants, for the more exotic waveforms
logic equations

Boolean expressions take the following form :

Reset= (Time>100 nsec)
Interrupt= (Time<1000 msec) Or (Time>1001msec)
Clock= (Time Mod 10)<5 nsec

Simulation process.

Here are the main simulation steps:

Initially (at time zero), the event buffer is empty and all node levels undefined. (It is possible to continue from a previous simulation run, if all the values have been stored.)
The levels on primary input nodes at that time are entered in the event buffer.
The records pertaining to the present simulation time are extracted from the event buffer. Timing checks are performed and the state on tristate nodes resolved. Logic levels are enforced on nodes, if different from old values. (It may be that no values have to be updated. Should that happen, time is advanced to the next simulation step.)
The affected components list is consulted, and the corresponding modules are reevaluated. Events are scheduled for the levels at the outputs of those modules.
If the corresponding simulation time does not have a record, it is created and placed in the appropriate place, so that the list is ordered. Records, for simulation times whose events have been exhausted, are deleted.
Time is advanced to the next simulation event, and all steps above save the first are repeated.
Simulation proceeds until the time specified or an error is met.

There are some clear timesaving steps here:

all the new logic values are impressed simultaneously
each module is only reevaluated once, whether one or all of its input nodes had their levels changed

It looks as though half the processing can be evaded (much more for multi- bit counters and decoders) if events are only generated when the (future) logic level to be scheduled is different from the present value on the node . However, this is also like rushing out on thin ice. Simulation errors will escalate as soon as any module is provided with input data any faster than it can produce the results. (Why you would want to do this, I don't know. It can happen, though. For example, the gates driving the module inputs may need different times to switch to the low, or high, state.) Therefore, it is better to be safe than sorry. An event is generated even if the present and scheduled (future) state of the node are the same. If, at the time the new value is to be impressed, it is no different from the old value, the event is propagated no further.

Handling tristate or ambiguous nodes.

Three levels of tolerance can be applied to nodes, if they have multiple outputs connected to them:

at most one (tristate) output may drive the node at any time
several outputs can assert the node simultaneously, if they are trying to enforce the same logic level (so as to increase the output current)
wired- And/ Or is allowed

Being technology- dependent, this last option is probably best avoided.

Tristate modules conveniently arbitrate bus use without need for multiplexers. How to cope with unknown values on nodes? There may be some scope for claiming that, for example, once any input to an And gate is zero, the output will be zero- whatever the state of all the remaining inputs. This, however, is probably not what the designer intended (after the device has been reset), so it seems better to give a warning.

Slim are the chances for resolving the state of sequential circuits in the presence of uninitialised or vague inputs: (there is one exception to which the text will refer later under this subheading.) As soon as the clock, clear, set, program, count and so on input becomes undefined, so does the sequential element. Generally, the states of internal flip- flops in sequential blocks will be unknown at power- on; this implies that, for example, counters will not count until they have been reset or programmed. Shift registers, on the other hand, gradually shift out the ambiguous state as fresh data is clocked into them.

The only time, really, that unknown values are acceptable (after the device has been reset), is on data inputs when the clock is not active (while respecting hold times, and, possibly, set- up times.)

If at most one driver will assert a tristate node at any time, it is possible to get away just with storing the serial number of the driver last enforcing a logic level upon the node. It is then checked that this driver has switched- off before another device can drive the node.

If multiple outputs may drive a node at the same time, it probably easier to define a dummy node for each output. The logic levels on dummy nodes are combined, according to the rules at the beginning of this section, to resolve the eventual state of the node (or, maybe, to give an error message.) Whenever any dummy node is updated, the aggregate value must be recomputed.

Plausible output options.

Excessive data shall either be filtered, or ignored. Here are some reasonable output options:

Only the nodes selected by the user shall have their waveforms displayed: These will be local nodes, when debugging subunits of the design. Global (primary output) nodes will be examined when verifying the circuit in its entirety.
Maybe it is necessary to display waveforms for a short time before the manifestation of a fault only.
The output postprocessor must provide a coarse mode, too. This will display node waveforms changes as if they were coincident with the most recent transition in one of the primary inputs. If any nodes do not have time to stabilise before the next edge in one of the primary inputs, their levels must be marked as provisional. The simulation process is still exactly the same; only the results are presented differently, to help in keeping the screen relatively uncluttered.

The coarse mode is useful to detect logic faults, and some timing faults. Additional timing faults may appear near the maximum operational frequency. In that condition, the coarse mode becomes more of a liability, than an asset. The normal, fine mode, will be selected (but, at low frequencies, it is the detailed mode which will look more confusing.)

Basic library.

Certain building blocks form the bulk of any digital design. Most of the time, it makes sense to simulate those at the behavioural level, not the gate level, (even if the former is not accessible to the user.) Here are some predefined modules:

gates from two to eight inputs (excluding seven): And, Nand, Or, Nor. Also Exor, Exnor and inverters. Buffers (in multiple of 8 bits), including tristate option.
multiplexers, encoders/ decoders: In powers of two, up to sixteen inputs/ outputs. Multiplexers have a tristate option.
full adders and probably an ALU, too, (in multiples of 4 bits)
delay and Set- Reset flip- flops
bus transceivers and registers, in multiples of 8 bits, including tristate option
counters, programmable counters and shift registers (in multiples of 4 bits)

A library can become very detailed and extensive. Then its length is a considerable proportion of the simulator package size.

Drawbacks.

Since the netlist is not automatically extracted, the program is incomplete . It is much faster to type in a netlist, if a labeled (paper) schematic is already available, than transfer the design to a computer using a CAD package. On the other hand, typing is boring and error- prone. It is ill- suited to people who have been justifiably used to the nice graphical interfaces of the last decade, or so. Any amount of time can be expended on the input interface.

Not providing the behavioural mode to the user implies (among other things) that use cannot be made of several excellent automatic synthesis tools: These (on their own) generate a gatelist for the controller of a finite state machine, that is, the circuitry for present state outputs, and next state logic. However, finite state machines are arguably better implemented on a microcontroller board. Very high speed may not be needed. Ease of design, flexibility, and lower risk associated with design faults which must be rectified before the project is committed to production, seem to point towards the software implementation. This is also the cheaper option if the volume is not large.

Afterword.

Verifying, rather than hoping, that software components operate as expected, is surely a crucial factor to writing a successful suite. Yet, once the program has been tested, the simulator may not have to be blamed when the results occasionally appear 'weird'. It may well be that the circuit is not operating as initially expected.

For example, when both inputs to a symmetric Set- Reset bistable are asserted, then released at the same time, the simulation will show that the outputs are oscillating. This may not be obvious from an inspection of the truth table: Readers may concentrate on different input combinations, rather than sequences of those combinations, (which are far more numerous.)

Related future sections.

It looks like this section has grown too long as it is. An example is clearly missing: A test- case circuit, together with input netlist and output waveforms will be presented in a future section.

Some material had to be left out, though equally important: That includes testability analysis, submodule placement, autorouting, possibly by using Lee's algorithm, or heuristics, and fault simulation. (Part of the latter, admittedly, only applies to circuits of limited, and non- reconvergent, fanouts.)