Circuit Surgery - April 2021 - Silicon Chip Online


This is only a preview of the April 2021 issue of Practical Electronics.


Circuit Surgery
Regular clinic by Ian Bell

Timing and metastability in synchronous circuits – Part 2

Our discussion on digital timing and metastability started a couple of months ago when we investigated a digital frequency divider simulation in Micro-Cap 12. The circuit oscillated in the simulation but worked fine as a physical circuit. The simulated behaviour was due to the fact that all the gates in the simulation had exactly the same delay. This set up the conditions for oscillation, which would be unlikely in a real circuit – however, the simulation highlighted the fact that the circuit could potentially suffer from timing problems in a real implementation.

Micro-Cap 12 forum

On the subject of Micro-Cap 12, we discovered that an online users' forum has been started recently: 'Micro-Cap EDA Users' at mc12.createaforum.com. Readers interested in using this software may find it a useful resource. This once-pricey software was made freely available in July 2019 after development stopped, so there is no longer official support.

Recap on synchronous circuit timing

The divider circuit was relatively unusual in that it was designed using a minimum number of NAND gates, using asynchronous design techniques, rather than just using an existing flip-flop. So last month we looked at timing issues in the much more common context of synchronous digital circuits. Synchronous circuits are controlled by a clock signal – a regular train of pulses which controls the overall timing of the circuit. Even if a synchronous circuit has a complex overall structure, it fundamentally comprises register-to-register transfers, as shown in Fig.1 – data held in register 1 (R1) is processed by the combinational logic (CL) and the result is stored in R2. On each clock cycle, R1 loads new data to be processed, and R2 stores the result of processing the data that was held in R1 in the previous clock cycle.

The circuit in Fig.1 is not infinitely fast.
There is a delay from when the active clock edge occurs to when the register's outputs change (TDR), and a delay from when the combinational logic's inputs change to when we can guarantee that its outputs are correct (TDC). We also have to consider that when the data changes, the flip-flop's internal circuitry takes time to settle in response to that change. If the clock is activated too close to a data change, the flip-flop may not function correctly (we say a timing violation has occurred). It may load the wrong value or go metastable, potentially resulting in a much longer than normal delay before the output changes. To help prevent timing violations, flip-flops are specified by a setup time (TSetup) and a hold time (THold) – the time before and after the clock edge during which the data must not change in order to ensure correct operation.

Timing violations

As discussed last month, for the circuit in Fig.1, the clock period must be greater than TDR + TDC + TSetup to make sure that the data loaded into R2 is valid. We can ensure this in a synchronous circuit by design, which means that the circuit will not suffer timing violations. This is not necessarily easy, particularly in large designs, where there are performance requirements which demand high clock rates. Professional design tools for large digital circuits (eg, FPGA design) include timing analysers to help identify timing problems. Sources of timing issues are more complex than just the clock period condition we mentioned above. For example, in a large design the clock will not arrive at each flip-flop at exactly the same time (this is called clock skew), which can also cause timing violations. Nevertheless, it is possible, with some effort, to ensure that timing violations will not occur in a synchronous circuit with a single clock.
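The clock period condition above is simple enough to check numerically. The following is a minimal sketch, not a substitute for a real timing analyser: it ignores clock skew and hold time, and the function names and the delay values in the usage note are hypothetical, chosen only for illustration.

```python
def min_clock_period(t_dr, t_dc, t_setup):
    """Lower bound on the clock period (seconds) for a register-to-register path.

    t_dr    : clock-to-output delay of the source register (TDR)
    t_dc    : worst-case delay through the combinational logic (TDC)
    t_setup : setup time of the destination register (TSetup)
    Clock skew and hold time are deliberately ignored in this sketch.
    """
    return t_dr + t_dc + t_setup


def max_clock_frequency(t_dr, t_dc, t_setup):
    """Upper bound on the clock frequency (Hz) implied by the period bound."""
    return 1.0 / min_clock_period(t_dr, t_dc, t_setup)


def setup_slack(clock_period, t_dr, t_dc, t_setup):
    """Margin (seconds) between the chosen clock period and the bound.

    A positive value means the path meets the condition; zero or negative
    means a setup timing violation is possible.
    """
    return clock_period - min_clock_period(t_dr, t_dc, t_setup)
```

For example, with assumed delays of TDR = 0.4ns, TDC = 1.2ns and TSetup = 0.2ns, `min_clock_period(0.4e-9, 1.2e-9, 0.2e-9)` gives 1.8ns, so a 2ns clock leaves 0.2ns of slack on this path.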
The 'guaranteed by design' does not apply when we have external asynchronous signals – they can change at any time in the clock cycle, which means it is possible for them to change close enough to the active clock edge to cause timing violations. Similarly, in circuits with multiple clocks (clock domains) there is a possibility of timing violations when signals cross clock domains. There is a period of time as the data changes (the metastability window, T0) when clocking the latch will result in metastability (see Fig.2 and Fig.3). If metastability occurs, the latch will take an amount of time, called the 'resolution time' (TR), before it returns to one of the stable states. In theory, this could be infinite, but in practice it is more likely to be in a range of up to about ten times the propagation delay (TDR). For asynchronous signals we have no control over the relative signal timing, so we cannot guarantee to prevent metastability. We have to deal with it in terms of probability, which we will discuss in more detail shortly.

Fig.1. Register-to-register transfer (R1, R2) via a block of combinational logic (CL) is the key structure in a synchronous circuit.

Fig.2. A latch circuit captures a 1 or 0 in a storage loop. Metastability occurs if it captures an intermediate voltage.

Fig.3. Latch metastability waveforms.

Metastability: philosophy and analogy

Before looking at metastability probability in circuits it is worth looking at an analogy or two. First, last month we noted that metastability is like flip-flop indecision – it gets stuck half-way between 0 and 1 and takes much longer than usual before it finally settles to one of the stable states. This is similar to a paradox in philosophy known as 'Buridan's ass' (donkey). The idea is that an animal (the donkey) is positioned exactly halfway between two equally desirable items of food or drink and therefore is unable to decide which one to consume – it takes so long to make up its mind that it dies of hunger or thirst. As well as featuring in philosophical discussions on reason and determinism from antiquity (predating Buridan) the idea has been used numerous times in popular culture. For more information, see the Wikipedia page on Buridan's ass (http://bit.ly/pe-apr21-ass).

Buridan's unfortunate ass (donkey) – courtesy of Julian Mayers, YouTube.

The second analogy – the ball and hill – is commonly used to help discuss metastable circuits. It helps us understand the variation of resolution time with input timing. The idea is illustrated in Fig.4. The ball can be in one of two stable positions on either side of the hill – this corresponds to the latch circuit holding a 0 or a 1. Attempting to store a new value in the latch corresponds with kicking the ball. To properly reload the same state the ball receives a small kick and quickly rolls back from the unstable position on the slope to the original stable state (Fig.5a). To cleanly change state it receives a large kick, lands low down on the other side and quickly rolls to the other stable position (Fig.5b).

This analogy is not based on the detailed physics of kicked balls – we assume the ball drops vertically onto the hill. If the ball receives an intermediate-strength kick, corresponding with a latch storing an intermediate voltage part way between 0 and 1, it lands near the top of the hill. It will take much longer to roll to a stable state (Fig.6a), or, in the most extreme case, the ball will balance exactly on the hill-top and take a potentially infinite time to reach one of the stable states (Fig.6b).

Fig.4. The ball and hill analogy – there are two stable states on the flat on either side.

Fig.5. Analogy to normal flip-flop operation – the ball lands close to the stable state and quickly attains a stable state.

Fig.6. Analogy to metastable flip-flop operation – the ball lands close to (a), or exactly on (b), the top of the hill and takes a long time to return to a stable state.

Circuit analysis

The inverter loop shown in Fig.2 captures two voltages – one on the output of each inverter – when the clock occurs. Normally, one inverter is at logic 0 and the other at 1, so the voltage difference between the two outputs (VD) is relatively large. However, if the data is changing at the time of the clock, as shown in Fig.3, the loop will capture a small voltage difference. We can model what happens by considering the inverter loop as two amplifiers connected to RC circuits (wiring and inverter input and output capacitance and resistance) – inverters act like amplifiers with intermediate input voltages (see Fig.7). Analysis results in a differential equation, but we'll not go into the full details of the maths here. However, readers familiar with RC charging may not be surprised to learn that the solution is an exponential function relating VD at time t after the clock edge to the initial voltage difference captured by the loop (VD0). The smaller VD0 is, the longer it takes for the latch to get back to a normal state – this corresponds with the ball landing closer to the top of the hill in the analogy discussed above.

Fig.7. The inverter loop (see Fig.2) in the latch behaves like two amplifiers each driving an RC circuit.

In terms of digital circuit design, we would like any flip-flop that happens to go metastable to recover sufficiently quickly not to cause any problems. Typically, this means within one clock cycle, with relevant parameters such as delays and setup time taken into account, in a similar way to our earlier discussion on maximum clock frequency. This sets a maximum resolution time (TR) which we can tolerate. Fig.3 shows two possible voltage waveforms on the latch output (for equal but opposite initial voltages). This is extended in Fig.8 to show a range of waveforms resulting from different initial voltages. The latch exits from metastability when the voltage difference (VD) exceeds the minimum which can be considered as 'normal' latch operation, with digital 0 and 1 on the latch outputs – call this VN.

Fig.8. Voltage difference changes in a latch which enters metastability at time t = 0. The latch exits metastability when the voltage difference (VD) between the inverters becomes greater than the normal difference (VN). TR is the maximum time the latch can remain metastable without causing a system failure: initial voltages of magnitude greater than VN exp(–TR/τ) reach VN before TR (OK), while smaller ones leave the latch still metastable at TR (FAIL).

From the solution of the loop differential equation, we can find the relationship between the initial voltage difference (VD0), the resolution time (TR) and the metastability exit voltage (VN). The boundary between the circuit failing and not failing occurs when the voltage difference just reaches VN at TR (see Fig.8). This occurs with a specific initial voltage difference VD0N. From the circuit equation we find (if we solve the differential equation):

$$V_{D0N} = V_N \exp\left(-\frac{T_R}{\tau}\right)$$

Here, τ is the time constant of the latch loop – it depends on resistor and capacitor values and amplifier gain (Fig.7).

Probabilities and MTBF

If the clock happens to occur within the time range designated as T0 in Fig.3 then the latch will go metastable. Within this period, we will assume that all initial voltages (VD0) occur with equal probability. The probability that the latch fails (given that it went metastable in the first place) is the probability that the latch is still metastable after the acceptable TR – call this PS ('still metastable' probability). This is the probability that VD0 is less than VD0N. Given the assumption that all VD0 values are equally probable, the failure probability is simply the proportion of metastable VD0 values less than VD0N. The maximum value of VD0 for metastability is VN, as initial voltages above this imply normal operation. So, the probability of failure after entering metastability is PS = VD0N/VN. From the exponential equation above we get:

$$P_S = \exp\left(-\frac{T_R}{\tau}\right)$$

This gives the probability that the latch will fail if it has become metastable, but for an overall failure probability (PF) we also need to know the probability that the latch enters metastability in the first place (PE). This is more straightforward to calculate and was mentioned in last month's article. The probability of a latch becoming metastable is basically the proportion of the clock cycle taken by T0, that is PE = T0/TC – we assume the asynchronous signal can change at any point of the clock cycle with equal probability. We can also write this as PE = fcT0, where fc is the clock frequency. The probability of the latch failing (PF) is the probability that it both enters metastability and that it is still metastable after the acceptable resolution time. If something is dependent on two conditions occurring together we multiply the probabilities to find the overall probability, so PF = PE × PS.

Digital circuits are processing information continuously, so a single failure probability is not very useful. We are more interested in how often the circuit will fail in operation. Given an asynchronous input to a synchronous latch, the rate of failure will be given by the rate at which data is changing (the data rate fD) and the probability of failure (PF from above), which applies each time the data changes. We get:

$$\text{Failure rate} = f_D P_F = f_D P_E P_S = f_D f_c T_0 \exp\left(-\frac{T_R}{\tau}\right)$$

It is common to discuss system reliability in terms of Mean Time Between Failures (MTBF), which is simply the reciprocal of failure rate (1/failure rate). The MTBF for a flip-flop is:

$$\text{MTBF} = \frac{\exp\left(\frac{T_R}{\tau}\right)}{f_D f_c T_0}$$

Note the change in sign in the exponential from taking the reciprocal.

Synchronisers

A typical strategy for avoiding errors due to metastability caused by asynchronous inputs to synchronous systems is to add a flip-flop, clocked by the system clock, between the asynchronous signal and the system input (see Fig.9). This is known as a 'synchroniser'. The idea is that it is OK for the synchroniser flip-flop to become metastable as long as it recovers by the next clock cycle – exactly the scenario we calculated the MTBF for above.

Fig.9. Single flip-flop synchroniser to protect a synchronous system from metastability due to an asynchronous input.
An important point here is that adding the synchroniser does not eliminate the possibility of failure of the system, but the failure rate will be lower than if the signal were input directly. The MTBF equation above indicates the performance of the synchroniser, but we need to be able to interpret the results correctly. For a constant failure rate, about 63% of items will have failed by the MTBF time, so it generally needs to be considerably longer than the acceptable error-free lifetime of the system. The issue is compounded in large digital circuits which contain a large number of synchronisers – the system may fail if any one of them fails.

Fig.10. Two-flip-flop synchroniser.

If we know the values for τ and T0, which are dependent on the specific technology and flip-flops used, then we can calculate the MTBF. The clock and data rates should be known from the system specification and TR is typically the clock period minus the setup time of the synchronous system input and any propagation delays from the synchroniser to the system.

Example

As an example MTBF calculation, we will use τ = 60ps and T0 = 400ps – these are not for a particular technology, just for illustration. Consider a system with a clock of 500MHz and an input data rate of 150MHz. The clock cycle is 2ns, so TR must be less than this, say 1.6ns (again, just for illustrative purposes). Putting these numbers into the MTBF equation we get about 3.5 hours – so if we built many copies of our circuit, 63% would fail in the first 3.5 hours of operation. This is unlikely to be acceptable! A possible solution, if a single synchroniser flip-flop is unable to achieve a sufficient MTBF, is to use two or three in a chain.
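The example above can be reproduced with a few lines of code. This is a minimal sketch of the MTBF equation with hypothetical function names. It assumes τ = 60ps, the value at which the formula gives the roughly 3.5 hours quoted above; the result is very sensitive to τ, and 65ps with the other values unchanged gives under half an hour. The optional `stages` argument multiplies TR in the exponent, which is the form the article goes on to use for a chain of identical flip-flops, and `fraction_failed_by` assumes a constant failure rate, which is where the 63% figure comes from.

```python
import math

SECONDS_PER_HOUR = 3600.0
SECONDS_PER_YEAR = 365.25 * 24 * 3600.0


def mtbf_seconds(tau, t0, f_clock, f_data, t_r, stages=1):
    """MTBF = exp(stages * T_R / tau) / (f_D * f_c * T_0), in seconds.

    stages=1 is the single flip-flop synchroniser; stages=n assumes n
    identical flip-flops, each allowed the same resolution time t_r.
    """
    return math.exp(stages * t_r / tau) / (f_data * f_clock * t0)


def fraction_failed_by(t, mtbf):
    """Fraction of units expected to have failed by time t.

    Assumes a constant failure rate (exponentially distributed failures).
    At t == mtbf this is 1 - 1/e, or about 63%.
    """
    return 1.0 - math.exp(-t / mtbf)


# Illustrative values: tau, T0, clock rate, data rate and T_R.
EXAMPLE = dict(tau=60e-12, t0=400e-12, f_clock=500e6, f_data=150e6, t_r=1.6e-9)

single_hours = mtbf_seconds(**EXAMPLE) / SECONDS_PER_HOUR
double_years = mtbf_seconds(**EXAMPLE, stages=2) / SECONDS_PER_YEAR
```

Here `single_hours` comes out at roughly 3.5 and `double_years` at roughly 150 million, which shows how strongly the exponential term dominates: doubling the time available for resolution changes the MTBF by many orders of magnitude.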
For two flip-flops (see Fig.10) the probability that the second is metastable at the point the data enters the system is PF = PE × PS1 × PS2 – that is, the first flip-flop has to enter metastability and still be metastable after TR, causing the second one to enter metastability, and it still has to be metastable after a further TR. If both have the same parameters, we end up with a new MTBF equation with 2TR instead of TR in the exponential. Running this calculation with the values from the previous example gives an MTBF of about 150 million years and a much smaller likelihood of failure during the circuit's operational lifetime. In practice, it may be difficult to find values for τ and T0, but hopefully these examples illustrate the fact that it is not necessarily obvious how many synchroniser stages are required.

Multi-bit synchronisation

The synchronisers shown in Fig.9 and Fig.10 can only be used for single-bit data. If we have multi-bit data, then it may seem that we could simply use a synchroniser on each bit in parallel. Unfortunately, the nature of metastability and the effect of slight differences in input timing, clock skew and the variability of individual flip-flops make this a very risky approach. Consider a single bit entering a synchroniser flip-flop; say it changes from 0 to 1 but causes metastability which resolves in time, so does not cause a failure. It may resolve to 0 or 1, depending on exactly what got captured in the latch. On the next clock cycle the input bit will have definitely settled to 1, so the synchroniser flip-flop will load a 1 with no metastability. The 1 enters the system OK in this scenario, but there is uncertainty as to which clock cycle this occurs in. This is fine for a single bit – it is asynchronous, so it is not expected in a particular clock cycle.
For multi-bit values which happen to change close to the clock, individual bits in a set of parallel synchronisers could resolve on different clock cycles. This would present corrupt data to the system for a clock cycle – which could be catastrophic, depending on the implications of inputting wrong values.

For transferring multi-bit data between synchronous systems, we must use different approaches. One way is to use synchronised handshake signals. The sender sets up a new data value, and only after this is stable sends a single-bit 'I have data for you' handshake signal via a synchroniser to the receiving system. The receiver sends a single-bit acknowledge signal back, again via a synchroniser, when it has loaded the data. This is effective, but relatively slow.

For faster data rates a special FIFO (first in first out) memory can be used. Any FIFO contains a bank of dual-port memory (it is written to and read from via different ports rather than a single bus) and acts as a buffer where data production and consumption rates may vary (like buffering online videos). If both sides use the same clock things are straightforward, but if they are asynchronous the problem is not synchronising the memory data but synchronising the counters that keep track of where the data is being written and read in the memory. These have to be compared to check for FIFO full and empty conditions. As one counter is associated with each asynchronous system, they are multi-bit values which have to be synchronised in order to perform the comparisons.

The clever trick here is to use a Gray code number system for the counters. In Gray code an increment of one causes just one bit to change, so parallel synchronisers like those in Fig.9 and Fig.10 can be used on each bit. Since only one bit changes at a time, there is no possibility of corrupting the count value by different synchronisers resolving in different clock cycles.
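The benefit of Gray-coded counters can be illustrated with a short sketch. It uses the standard reflected binary Gray code (n XOR n shifted right by one) and a deliberately simple model of a bank of per-bit synchronisers, in which every bit that is changing may independently deliver either its old or its new value for one clock cycle. The function names and this model are assumptions made for illustration; real asynchronous FIFO pointer logic involves more than is shown here.

```python
def binary_to_gray(n):
    """Convert a non-negative integer to reflected binary Gray code."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Convert a reflected binary Gray code value back to an integer."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def possible_samples(old, new, width):
    """Values a bank of per-bit synchronisers might deliver for one cycle.

    Simplified model: each bit that differs between old and new may
    independently resolve to its old or its new value.
    """
    changing = [i for i in range(width) if ((old ^ new) >> i) & 1]
    results = set()
    for mask in range(1 << len(changing)):
        value = old
        for k, bit in enumerate(changing):
            if (mask >> k) & 1:
                value ^= 1 << bit  # this bit has already taken its new value
        results.add(value)
    return results
```

For a plain binary counter stepping from 7 (0111) to 8 (1000) all four bits change, so `possible_samples(7, 8, 4)` contains every 4-bit value. For the same step in Gray code, `possible_samples(binary_to_gray(7), binary_to_gray(8), 4)` contains only the old and new codes, so the receiving side sees either the previous count or the next one, never a corrupt value.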
Simulation files

In most months, though not every month, LTspice is used to support descriptions and analysis in Circuit Surgery. The examples and files are available for download from the PE website.