This is only a preview of the April 2021 issue of Practical Electronics.
Circuit Surgery
Regular clinic by Ian Bell
Timing and metastability in synchronous circuits – Part 2
Our discussion on digital
timing and metastability started
a couple of months ago when we
investigated a digital frequency divider
simulation in Micro-Cap 12. The circuit
oscillated in the simulation but worked
fine as a physical circuit. The simulated
behaviour was due to the fact that all the
gates in the simulation had exactly the
same delay. This set up the conditions for
oscillation, which would be unlikely in
a real circuit – however, the simulation
highlighted the fact that the circuit could
potentially suffer from timing problems
in a real implementation.
Micro-Cap 12 forum
On the subject of Micro-Cap 12, we
discovered that an online user’s forum
has been started recently: ‘Micro-Cap
EDA Users’ at mc12.createaforum.com
Readers interested in using this
software may find it a useful resource.
This once-pricey software was made
freely available in July 2019 after
development stopped, so there is no
longer official support.
Recap on synchronous
circuit timing
The divider circuit was relatively
unusual in that it was designed
using a minimum number of NAND
gates, using asynchronous design
techniques, rather than just using an
existing flip-flop. So last month we
looked at timing issues in the much
more common context of synchronous
digital circuits. Synchronous circuits
are controlled by a clock signal – a
regular train of pulses which controls
the overall timing of the circuit. Even
if a synchronous circuit has a complex
overall structure, it fundamentally
comprises register-to-register transfers,
as shown in Fig.1 – data held in register
1 (R1) is processed by the combinational
logic (CL) and the result is stored in R2.
On each clock cycle, R1 loads new data
to be processed, and R2 stores the result
of processing the data that was held in
R1 in the previous clock cycle.
The circuit in Fig.1 is not infinitely fast.
There is a delay from when the active
clock edge occurs to when the register’s
outputs change (TDR) and a delay from
when the combinational logic’s inputs
change to when we can guarantee that
its outputs are correct (TDC). We also have to
consider that when the data changes, the
flip-flop’s internal circuitry takes time
to settle in response to that change. If
the clock is activated too close to a data
change, the flip-flop may not function
correctly (we say a timing violation has
occurred). It may load the wrong value or
go metastable, potentially resulting in a
much longer than normal delay before the
output changes. To help prevent timing
violations, flip-flops are specified by a
setup time (TSetup) and a hold time (Thold)
– the time before and after the clock edge
during which the data must not change
in order to ensure correct operation.
Timing violations
As discussed last month, for the circuit
in Fig.1, the clock period must be
greater than TDR + TDC + TSetup
to make sure that the data loaded
into R2 is valid. We can ensure this
in a synchronous circuit by design,
which means that the circuit will not
suffer timing violations. This is not
necessarily easy, particularly in large
designs, where there are performance
requirements which demand high clock
rates. Professional design tools for large
digital circuits (eg, FPGA design) include
timing analysers to help identify timing
problems. Sources of timing issues are
more complex than just the clock period
condition we mentioned above. For
example, in a large design the clock will
not arrive at each flip-flop at exactly the
same time (this is called clock skew),
which can also cause timing violations.
Nevertheless, it is possible, with some
effort, to ensure that timing violations
will not occur in a synchronous circuit
with a single clock.
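As a rough illustration of the clock-period condition, the Python sketch below computes the minimum clock period and the corresponding maximum clock frequency from TDR, TDC and TSetup. The function name and the delay values are hypothetical, chosen only for illustration, and the sketch ignores clock skew and hold-time checks.

```python
def min_clock_period(t_dr, t_dc, t_setup):
    """Lower bound on the clock period for the Fig.1 register-to-register path.

    All times are in seconds. The clock period must exceed this value.
    """
    return t_dr + t_dc + t_setup


# Hypothetical delays, not taken from any real device
t_dr = 0.3e-9      # register clock-to-output delay (TDR)
t_dc = 1.2e-9      # combinational logic delay (TDC)
t_setup = 0.2e-9   # flip-flop setup time (TSetup)

t_min = min_clock_period(t_dr, t_dc, t_setup)
f_max = 1.0 / t_min

print(f"Minimum clock period: {t_min * 1e9:.2f} ns")
print(f"Maximum clock frequency: {f_max / 1e6:.0f} MHz")
```

A timing analyser performs this kind of check for every register-to-register path in a design and reports the slowest one.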
This ‘guaranteed by design’ approach does
not apply when we have external
asynchronous signals – they can change
at any time in the clock cycle, which means
it is possible for them to change close
enough to the active clock edge to cause
timing violations. Similarly, in circuits
with multiple clocks (clock domains)
there is a possibility of timing violations
when signals cross clock domains. There
is a period of time as the data changes
(metastability window, T0) when clocking
the latch will result in metastability (see
Fig.2 and Fig.3). If metastability occurs,
the latch will take an amount of time,
called the ‘resolution time’ (TR), before
it returns to one of the stable states. In
theory, this could be infinite, but in
practice it is more likely to be in a range
of up to about ten times the propagation
delay (TDR). For asynchronous signals
we have no control over the relative
signal timing, so we cannot guarantee to
prevent metastability. We have to deal
with it in terms of probability, which
we will discuss in more detail shortly.
Metastability: philosophy
and analogy
Before looking at metastability
probability in circuits it is worth looking
Fig.1. Register-to-register transfer (R1, R2) via a block of
combinational logic (CL) is the key structure in a synchronous circuit.
Fig.2. A latch circuit captures a 1 or 0 in a storage loop. Metastability
occurs if it captures an intermediate voltage.
Practical Electronics | April | 2021
Fig.3. Latch metastability waveforms (Clock with period TC, Data changing within window T0, and output Q; the latch captures an intermediate voltage and exits from metastability after TR).
at an analogy or two. First, last month
we noted that metastability is like flip-flop
indecision – it gets stuck half-way
between 0 and 1 and takes much longer
than usual before it finally settles to
one of the stable states. This is similar
to a paradox in philosophy known as
Fig.4. The ball and hill analogy – there are
two stable states on the flat on either side.
Fig.5. Analogy to normal flip-flop operation
– the ball lands close to the stable state and
quickly attains a stable state.
Fig.6. Analogy to metastable flip-flop
operation – the ball lands close to (a), or
exactly on (b), the top of the hill and takes a
long time to return to a stable state.
‘Buridan’s ass’ (donkey). The
idea is that an animal (the
donkey) is positioned exactly
halfway between two equally
desirable items of food or
drink and therefore is unable
to decide which one to
consume – it takes so long to
make up its mind that it dies
of hunger or thirst. As well
as featuring in philosophical
discussions on reason and
determinism from antiquity
(predating Buridan) the idea
has been used numerous
times in popular culture.
For more information, see the
Wikipedia page on Buridan’s
ass (http://bit.ly/pe-apr21-ass).
Buridan’s unfortunate ass (donkey) – courtesy of Julian Mayers, YouTube.
The second analogy – the ball and hill – is commonly used to help discuss
metastable circuits. It helps us understand the variation of resolution time with
input timing. The idea is illustrated in Fig.4. The ball can be in one of two
stable positions on either side of the hill – this corresponds to the latch circuit
holding a 0 or a 1. Attempting to store a new value in the latch corresponds with
kicking the ball. To properly reload the same state the ball receives a small kick
and quickly rolls back from the unstable position on the slope to the original stable
state (Fig.5a). To cleanly change state it receives a large kick, lands low down
on the other side and quickly rolls to the other stable position (Fig.5b).
This analogy is not based on the detailed physics of kicked balls – we
assume the ball drops vertically onto the hill. If the ball receives an
intermediate-strength kick, corresponding with a latch storing an intermediate
voltage part way between 0 and 1, it lands near the top of the hill. It will take
much longer to roll to a stable state (Fig.6a), or, in the most extreme case the
ball will balance exactly on the hill-top and take a potentially infinite time to
reach one of the stable states (Fig.6b).
Circuit analysis
The inverter loop shown in Fig.2 captures two voltages – on the output
of each inverter when the clock occurs. Normally, one inverter is at logic 0
and the other at 1, so the voltage difference between the two outputs
(VD) is relatively large. However, if the data is changing at the time of the
clock, as shown in Fig.3, the loop will capture a small voltage difference. We
can model what happens by considering the inverter loop as two amplifiers
connected to RC circuits (wiring and inverter input and output capacitance
and resistance) – inverters act like amplifiers with intermediate input
voltages (see Fig.7). Analysis results in a differential equation, but we’ll
not go into the full details of the maths here. However, readers familiar with
RC charging may not be surprised to learn that the solution is an exponential
function relating VD at time t after the clock edge to the initial voltage
difference captured by the loop (VD0). The smaller VD0 is the longer it takes
for the latch to get back to a normal state – this corresponds with the ball
landing closer to the top of the hill in the analogy discussed above.
Fig.7. The inverter loop (see Fig.2) in the latch behaves like two amplifiers
each driving an RC circuit.
In terms of digital circuit design, we would like any flip-flop that happens
to go metastable to recover sufficiently quickly not to cause any problems.
Typically, this means within one clock cycle, with relevant parameters such as
delays and setup time taken into account, in a similar way to our earlier discussion
on maximum clock frequency. This sets a maximum resolution time (TR) which
we can tolerate. Fig.3 shows two possible voltage waveforms on the latch output
(for equal but opposite initial voltages). This is extended in Fig.8 to show a range
of waveforms resulting from different initial voltages. The latch exits from
metastability when the voltage difference (VD) exceeds the minimum which can be
considered as ‘normal’ latch operation – with digital 0 and 1 on the latch outputs
– call this VN. From the solution of the loop differential equation, we can find
the relationship between the initial voltage difference (VD0), resolution time
(TR) and the metastability exit voltage (VN). The boundary between the circuit
failing and not failing occurs when the voltage difference just reaches VN at TR
(see Fig.8). This occurs with a specific initial voltage difference VD0N. From the
circuit equation we find (if we solve the differential equation):

$$V_{D0N} = V_N \exp\left(-\frac{T_R}{\tau}\right)$$

Here, τ is the time constant of the latch loop – it depends on resistor and capacitor
values and amplifier gain (Fig.7).
Fig.8. Voltage difference changes in a latch which enters metastability at time t = 0.
(VD plotted against time t. When the voltage difference between the inverters becomes
greater than the normal difference VN the latch exits metastability. TR is the maximum
time the latch can remain metastable without causing a system failure. OK: voltage
reaches VN before TR. FAIL: voltage is less than VN at TR, so the latch is still metastable.)
Probabilities and MTBF
If the clock happens to occur within the time range designated as T0 in Fig.3
then the latch will go metastable. Within this period, we will assume that all
initial voltages (VD0) occur with equal probability. The probability that the
latch fails (given that it went metastable in the first place) is the probability
that the latch is still metastable after the acceptable TR – call this PS (‘still
metastable’ probability). This is the probability that VD0 is less than VD0N.
Given the assumption that all VD0 values are equally probable, the failure
probability is simply the proportion of metastable VD0 values less than
VD0N. The maximum value of VD0 for metastability is VN as initial voltages
above this imply normal operation. So, the probability of failure after entering
metastability is PS = VD0N/VN. From the exponential equation above we get:

$$P_S = \exp\left(-\frac{T_R}{\tau}\right)$$

This gives a probability that the latch will fail if it has become metastable,
but for an overall failure probability (PF) we also need to know the probability that
the latch enters metastability in the first place (PE). This is more straightforward
to calculate and was mentioned in last month’s article. The probability of a
latch becoming metastable is basically the proportion of the clock cycle taken
by T0, that is PE = T0/TC – we assume the asynchronous signal can change
at a point of the clock cycle with equal probability. We can also write
this as PE = fcT0, where fc is the clock frequency. The probability of the latch
failing (PF) is the probability that it both enters metastability and that it is still
metastable after the acceptable resolution time. If something is dependent on
two conditions occurring together we multiply the probabilities to find the
overall probability, so PF = PEPS.
Digital circuits are processing information continuously, so a single
failure probability is not very useful. We are more interested in how often the
circuit will fail in operation. Given an asynchronous input to a synchronous
latch, the rate of failure will be given by the rate at which data is changing
(the data rate fD) and the probability of failure (PF from above), which occurs
each time the data changes. We get:

$$\text{Failure rate} = f_D P_F = f_D P_E P_S = f_D f_c T_0 \exp\left(-\frac{T_R}{\tau}\right)$$

It is common to discuss system reliability in terms of Mean Time Between Failures
(MTBF), which is simply the reciprocal of failure rate (1/failure rate). The MTBF
for a flip-flop is:

$$\text{MTBF} = \frac{\exp\left(\frac{T_R}{\tau}\right)}{f_D f_c T_0}$$

Note the change in sign in the exponential
from taking the reciprocal.
Fig.9. Single flip-flop synchroniser to protect a
synchronous system from metastability due to an
asynchronous input.
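The ‘still metastable’ probability can be checked numerically. The sketch below assumes the exponential solution has the form VD(t) = VD0·exp(t/τ), which is what the VD0N equation implies, draws initial voltages uniformly between 0 and VN, and counts how many have not reached VN by TR. The values of τ, TR and VN are hypothetical, and the function names are ours.

```python
import math
import random


def resolution_time(v_d0, v_n, tau):
    """Time for VD to grow from v_d0 to v_n, assuming VD(t) = VD0 * exp(t / tau)."""
    return tau * math.log(v_n / v_d0)


def p_still_metastable(t_r, tau):
    """PS = exp(-TR / tau): probability that a metastable latch has not resolved by TR."""
    return math.exp(-t_r / tau)


# Hypothetical values for illustration only
tau = 60e-12   # latch loop time constant (s)
t_r = 180e-12  # allowed resolution time (s), kept short so failures are easy to count
v_n = 1.0      # normalised exit voltage

rng = random.Random(1)  # fixed seed so the run is repeatable
trials = 200_000
failures = 0
for _ in range(trials):
    # 1 - random() lies in (0, 1], which avoids log(0) for a zero initial voltage
    v_d0 = v_n * (1.0 - rng.random())
    if resolution_time(v_d0, v_n, tau) > t_r:
        failures += 1

estimate = failures / trials
predicted = p_still_metastable(t_r, tau)
print(f"Monte Carlo estimate of PS: {estimate:.4f}")
print(f"exp(-TR/tau):               {predicted:.4f}")
```

The two figures should agree to within sampling noise, which supports the step from PS = VD0N/VN to the exponential form.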
Synchronisers
A typical strategy for avoiding errors due
to metastability caused by asynchronous
inputs to synchronous systems is to add
a flip-flop, clocked by the system clock,
between the asynchronous signal and
the system input (see Fig.9). This is
known as a ‘synchroniser’. The idea is
that it is OK for the synchroniser flip-flop
to become metastable as long as it
recovers by the next clock cycle – exactly
the scenario we calculated the MTBF
for above. It is important to note
that adding the synchroniser does not
eliminate the possibility of failure of
the system, but the failure probability will be lower than if
the signal were input directly. The MTBF
equation above indicates the performance
of the synchroniser, but we need to be
able to interpret the results correctly.
Assuming failures occur at a constant average rate, about 63% of items will fail within the
MTBF time, so it generally needs to be
considerably longer than the acceptable
error-free lifetime of the system. The
issue is compounded in large digital
circuits which contain a large number
Fig.10. Two-flip-flop synchroniser.
of synchronisers – the system may fail
if any one of them fails.
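The 63% figure corresponds to a model in which the fraction of units that have failed by time t is 1 − exp(−t/MTBF). The sketch below assumes this exponential model and independent, identical synchronisers, and shows how the MTBF of a system shrinks with the number of synchronisers. The MTBF value and the synchroniser count are hypothetical.

```python
import math


def fraction_failed(t, mtbf):
    """Fraction of units expected to have failed by time t (constant failure rate)."""
    return 1.0 - math.exp(-t / mtbf)


def system_mtbf(mtbf_single, n_synchronisers):
    """MTBF of a system that fails if any one of n independent, identical synchronisers fails.

    Failure rates add, so the system MTBF is the single MTBF divided by n.
    """
    return mtbf_single / n_synchronisers


mtbf_years = 1000.0  # hypothetical MTBF of one synchroniser
print(f"Failed within one MTBF:   {fraction_failed(mtbf_years, mtbf_years):.1%}")
print(f"Failed within 10 years:   {fraction_failed(10.0, mtbf_years):.2%}")

mtbf_sys = system_mtbf(mtbf_years, 50)  # hypothetical design with 50 synchronisers
print(f"System MTBF with 50 synchronisers: {mtbf_sys:.0f} years")
print(f"System failed within 10 years:     {fraction_failed(10.0, mtbf_sys):.1%}")
```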
If we know the values for τ and T0,
which are dependent on the specific
technology and flip-flops used, then we
can calculate the MTBF. The clock and
data rates should be known from the
system specification and TR is typically
the clock period minus the setup time of
the synchronous system input and any
propagation delays from the synchroniser
to the system.
Example
As an example MTBF calculation, we
will use τ = 60ps and T0 = 400ps – these
are not for a particular technology, just
for illustration. Consider a system with a
clock of 500MHz and an input data rate
of 150MHz. The clock cycle is 2ns, so TR
must be less than this, say 1.6ns (again,
just for illustrative purposes). Putting
these numbers into the MTBF equation
we get about 3.5 hours – so if we built
many copies of our circuit, 63% would
fail in the first 3.5 hours of operation.
This is unlikely to be acceptable!
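The arithmetic can be reproduced with a few lines of Python. This sketch evaluates the MTBF equation with T0 = 400 ps, a 500 MHz clock, a 150 MHz data rate and TR = 1.6 ns; the function name is ours. With τ = 60 ps it returns roughly 3.5 hours. The result is very sensitive to τ: with τ = 65 ps the same equation gives under half an hour.

```python
import math


def mtbf_seconds(t_r, tau, f_d, f_c, t_0):
    """MTBF = exp(TR / tau) / (fD * fC * T0), with times in seconds and rates in Hz."""
    return math.exp(t_r / tau) / (f_d * f_c * t_0)


f_c = 500e6    # clock frequency (Hz)
f_d = 150e6    # input data rate (Hz)
t_0 = 400e-12  # metastability window (s)
t_r = 1.6e-9   # allowed resolution time (s)

for tau in (60e-12, 65e-12):
    hours = mtbf_seconds(t_r, tau, f_d, f_c, t_0) / 3600.0
    print(f"tau = {tau * 1e12:.0f} ps -> MTBF = {hours:.2f} hours")
```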
A possible solution, if a single
synchroniser flip-flop is unable to achieve
a sufficient MTBF, is to use two or
three in a chain. For two flip-flops (see
Fig.10) the probability that the second
is metastable at the point the data enters
the system is PF = PEPS1PS2 – that is, the
first flip-flop has to enter metastability
and still be metastable after TR, causing
the second one to enter metastability and
it still has to be metastable after a further
TR. If both have the same parameters,
we end up with a new MTBF equation
with 2TR instead of TR in the exponential.
Running this calculation with the values
from the previous example gives an MTBF
of about 150 million years and a much
smaller likelihood of failure during the
circuit’s operational lifetime. In practice,
it may be difficult to find values for τ
and T0, but hopefully these examples
illustrate the fact that it is not necessarily
obvious how many synchroniser stages
are required.
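To explore how many stages are needed, the MTBF equation can be generalised to n identical stages by putting n·TR in the exponential, as described for two flip-flops. The sketch below assumes every stage has the same τ and the same TR, uses τ = 60 ps with the other example values, and picks a target MTBF of 1000 years purely for illustration. The function names are ours.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def mtbf_years(n_stages, t_r, tau, f_d, f_c, t_0):
    """MTBF in years for n identical synchroniser stages (n * TR in the exponential)."""
    seconds = math.exp(n_stages * t_r / tau) / (f_d * f_c * t_0)
    return seconds / SECONDS_PER_YEAR


def stages_needed(target_years, t_r, tau, f_d, f_c, t_0, max_stages=8):
    """Smallest number of stages whose MTBF meets the target, or None if none do."""
    for n in range(1, max_stages + 1):
        if mtbf_years(n, t_r, tau, f_d, f_c, t_0) >= target_years:
            return n
    return None


params = dict(t_r=1.6e-9, tau=60e-12, f_d=150e6, f_c=500e6, t_0=400e-12)

for n in (1, 2, 3):
    print(f"{n} stage(s): MTBF = {mtbf_years(n, **params):.3g} years")

print("Stages needed for a 1000-year MTBF:", stages_needed(1000.0, **params))
```

Each extra stage multiplies the MTBF by exp(TR/τ), which is why one additional flip-flop moves the result from hours to millions of years in this example.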
Multi-bit synchronisation
The synchronisers shown in Fig.9 and
Fig.10 can only be used for single-bit
data. If we have multi-bit data, then it
may seem that we could simply use a
synchroniser on each bit in parallel.
Unfortunately, the nature of metastability
and the effect of slight differences
in input timing, clock skew and the
variability of individual flip-flops make
this a very risky approach. Consider a
single bit entering a synchroniser flip-flop;
say it changes from 0 to 1 but causes
metastability which resolves in time, so
does not cause a failure. It may resolve
to 0 or 1, depending on exactly what got
captured in the latch. On the next clock
cycle the input bit will have definitely
settled to 1, so the synchroniser flip-flop
will load a 1 with no metastability. The
1 enters the system OK in this scenario,
but there is uncertainty as to which clock
cycle this occurs in. This is fine for a
single bit – it is asynchronous, so it is
not expected in a particular clock cycle.
For multi-bit values, which happen to
change close to the clock, individual
bits in a set of parallel synchronisers
could resolve on different clock cycles.
This would present corrupt data to
the system for a clock cycle – which
could be catastrophic, depending on the
implications of inputting wrong values.
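The hazard can be illustrated with a small enumeration. Suppose a 4-bit binary value changes from 7 (0111) to 8 (1000) close to the clock edge, and each bit's synchroniser independently delivers either the old or the new bit on that cycle. The sketch below is a simplified model of our own, not a circuit simulation, and lists every value the system could see.

```python
from itertools import product


def possible_captures(old, new, width):
    """Values a bank of per-bit synchronisers could present for one cycle.

    Simplified model: each bit that differs between old and new may
    independently resolve to its old or its new value.
    """
    changing = [i for i in range(width) if (old ^ new) >> i & 1]
    values = set()
    for choice in product((0, 1), repeat=len(changing)):
        value = old
        for bit, take_new in zip(changing, choice):
            if take_new:
                value ^= 1 << bit  # flip this bit to its new state
        values.add(value)
    return sorted(values)


seen = possible_captures(0b0111, 0b1000, width=4)
print([format(v, "04b") for v in seen])
print("Values that are neither old nor new:", len(seen) - 2)
```

When all four bits change at once, every 4-bit value is a possible capture, so the system could briefly see a number unrelated to either 7 or 8.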
For transferring multi-bit data between
synchronous systems, we must use
different approaches. One way is to
use synchronised handshake signals.
The sender sets up a new data value,
and only after this is stable sends a
single-bit ‘I have data for you’ handshake
signal via a synchroniser to the receiving
system. The receiver sends a single-bit
acknowledge signal back, again via a
synchroniser, when it has loaded the
data. This is effective, but relatively
slow. For faster data rates a special
FIFO (first in first out) memory can
be used. Any FIFO contains a bank of
dual port memory (it is written to and
read from via different ports rather
than a single bus) and acts as a buffer
where data production and consumption
rates may vary (like buffering online
videos). If both sides use the same clock
things are straightforward, but if they
are asynchronous the problem is not
synchronising the memory data but
synchronising the counters that keep
track of where the data is being written
and read in the memory. These have
to be compared to check for FIFO full
and empty conditions. As one counter
is associated with each asynchronous
system, they are multi-bit values which
have to be synchronised in order to
perform the comparisons. The clever
trick here is to use a Gray code number
system for the counters. In Gray code
an increment of one causes just one bit
to change, so parallel synchronisers
like those in Fig.9 and Fig.10 can be
used on each bit. Since only one bit
changes at a time, there is no possibility
of corrupting the count value by different
synchronisers resolving in different
clock cycles.
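A common way to build such a counter is the reflected binary Gray code, where the code for count n is n XOR (n >> 1); the article does not say which Gray code is used, so this choice is an assumption. The sketch below checks that successive counts, including the wrap-around, differ in exactly one bit for a hypothetical 4-bit pointer.

```python
def binary_to_gray(n):
    """Reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Inverse of binary_to_gray."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


WIDTH = 4  # hypothetical pointer width
size = 1 << WIDTH

for count in range(size):
    current = binary_to_gray(count)
    following = binary_to_gray((count + 1) % size)  # includes wrap-around to 0
    changed_bits = bin(current ^ following).count("1")
    assert changed_bits == 1
    assert gray_to_binary(current) == count

print([format(binary_to_gray(c), "04b") for c in range(8)])
```

Because only one bit differs between neighbouring codes, a pointer sampled during a change reads as either the old count or the new count, never an unrelated value.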
Simulation files
Most months, but not every month, LTSpice
is used to support descriptions and
analysis in Circuit Surgery.
The examples and files are available
for download from the PE website.