This paper presents a 20 Gb/s injection-locked clock and data recovery (CDR) circuit for burst-mode applications. Using a new injection-locked LC oscillator as a half-rate CDR relaxes the speed requirements of the VCO and the retiming flip-flops, which allows a higher-speed CDR to be designed with lower power consumption. Complementary clocks drive two flip-flops in the decision circuit to retime the data. Simulated in 0.18 μm CMOS, the circuit consumes 68.9 mW at 20 Gb/s, and the recovered clock jitter is 0.84 ps rms in response to a \( 2^7-1 \) PRBS.

Keywords—burst mode, clock and data recovery (CDR), fast acquisition CDR, injection locking, injection-locked LC oscillator

I. INTRODUCTION

Clock and data recovery (CDR) circuits are widely used in modern high-speed serial links, where a data stream is transmitted without any additional timing reference. The clock must therefore be extracted from the data to allow synchronous data processing in the receiver. In high-speed optical multi-access networks such as passive optical networks (PON), the amplitude and phase of packets are not fixed and may vary from one packet to another [1]. Burst-mode receivers must therefore realign the clock phase at the beginning of every packet, and they require a fast response time to detect short packets or burst data reliably [2]. The CDR circuit in these receivers extracts a clock from the header of each burst packet in a small fraction of the packet transmission time [3].

CDR circuits can be implemented as open-loop or closed-loop circuits. Closed-loop CDRs usually use a PLL architecture to track the data phase. They perform well in the presence of frequency offset and can also suppress jitter, which PLL-based CDRs achieve by using low-bandwidth loop filters. The low loop bandwidth, however, leads to a long lock time, which disqualifies them from burst-mode data recovery. Because open-loop CDRs can lock quickly, they are attractive for burst-mode data communication. However, open-loop CDRs suffer from limited frequency tracking and very weak jitter suppression [4]. Since no repeater is required in these applications, jitter transfer is not a serious issue [5], and open-loop CDRs are a suitable solution.

Open-loop CDRs can be built around a gated voltage-controlled oscillator (GVCO) or an injection-locked LC oscillator (ILO). GVCOs are used in [5]-[7] for burst-mode applications. CDR circuits using GVCOs consume little power and are compact. However, the ring structure of the GVCO, shown in Fig. 1, increases the jitter of the extracted clock, and the operating frequency of a GVCO is lower than that of an ILO [8]. The injection-locking method uses an LC VCO instead of a ring oscillator, which offers better phase noise and higher speed.

A CDR circuit based on injection locking is described in [8]. The rising and falling edges of the received data are detected by an edge detector and injected into an LC oscillator, which extracts a full-rate clock from the input data. However, using the injection-locked oscillator as a full-rate CDR requires a high-frequency VCO. A further issue is the edge detector, which needs high-speed CML XOR and delay-cell circuits. In [9] a half-rate injection-locked CDR is described, in which the injection signal is applied to a super-harmonic ILO. In a super-harmonic structure the only block that operates at twice the oscillator frequency is the edge detector. The edge detector is therefore a serious bottleneck in high-speed CDR design, and it also increases the power consumption and chip area.

In this paper, a high-speed injection-locked CDR circuit with a new injection technique is proposed. The new architecture improves the speed of the injection-locked CDR by eliminating the edge detector and sampling the data with a half-rate clock. Because the oscillator output frequency is half the bit rate, the operating frequencies of the VCO and the decision circuit are half of those in full-rate techniques.

In [8], long runs cause fluctuations in the VCO output amplitude because unbalanced currents are steered into the oscillator outputs. To purify the clock, the output of VCO1 is therefore injected into another identical VCO. In our proposed CDR, however, no current is steered during long runs, so the output clock does not suffer from these fluctuations.

Fig. 1 Gated voltage-controlled oscillator (GVCO)

II. THE PROPOSED CDR

Fig. 2 shows the proposed CDR architecture. It comprises an injection-locked oscillator, a reference PLL, and a decision circuit consisting of two retiming flip-flops. The data is injected into the injection-locked LC oscillator to recover the clock. The oscillator output is fed to a clock buffer, and the differential output of the clock buffer drives the two flip-flops, which sample the data with the half-rate clock. Because injection locking has a narrow locking range, the VCO may fail to lock to the frequency of the incoming data stream. A PLL is therefore used to generate a 10 GHz clock from a reference clock, which is 312 MHz here. The VCO of the reference PLL is identical to the VCO in the injection-locking path, so its control voltage can serve as an initial value that helps the injection-locked VCO lock to the incoming data. It also helps compensate for PVT variations [8]. The two oscillators may suffer from fabrication mismatch, but this can be minimized by careful layout. Since the data line can pull the VCO frequency through coupling between the D and CLK ports of the flip-flops, a clock buffer is used to isolate them [10].

Since the data rate is constant, the output frequency of the oscillator is constant as well. Nevertheless, PVT variations can shift the VCO away from its locking conditions and cause data recovery to fail. As stated above, the reference PLL accommodates these variations by correcting the free-running frequency of the VCO.

III. BUILDING BLOCKS

A. Proposed Injection-Locked LC Oscillator

In full-rate CDRs, the output clock frequency must equal the data rate in order to sample the random data. In [8] the data is applied to an edge detector to create a frequency component at the input data rate. To increase the CDR speed, the clock frequency can instead be half the data rate, with the data sampled by two clock signals that are 180 degrees out of phase.
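The half-rate sampling idea can be illustrated with a small behavioral sketch. It is not a circuit model: the PRBS polynomial (\( x^7 + x^6 + 1 \)), the function names, and the ideal, already-locked clock are all assumptions made for illustration. One flip-flop samples on the rising edge of the half-rate clock and the other on the rising edge of the complementary clock, so each one sees every second bit.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a 2^7-1 PRBS (assumed polynomial x^7 + x^6 + 1)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def half_rate_sample(data):
    """Split a full-rate bit stream between two flip-flops.

    FF1 is clocked by CLK and captures bits 0, 2, 4, ...
    FF2 is clocked by the complementary clock and captures bits 1, 3, 5, ...
    Each flip-flop is therefore clocked at half the bit rate.
    """
    ff1 = data[0::2]
    ff2 = data[1::2]
    return ff1, ff2


def interleave(ff1, ff2):
    """Recombine the two half-rate streams into the original bit order."""
    out = []
    for i in range(len(ff1)):
        out.append(ff1[i])
        if i < len(ff2):
            out.append(ff2[i])
    return out


data = prbs7(127)                    # one full PRBS period
ff1, ff2 = half_rate_sample(data)
recovered = interleave(ff1, ff2)
print("bits recovered correctly:", recovered == data)
```

Because each flip-flop only has to resolve one bit per clock period of two unit intervals, its timing budget is twice that of a full-rate sampler.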

There are several techniques for injecting an incident signal into a differential LC oscillator. In the first technique, the incident signal is injected at the gate of the tail transistor, as shown in Fig. 4(a) [11]. Here the cross-coupled transistors mix the output and the incident signal to generate the \( \omega_{\text{inj}} - \omega_0 \) component, while the sum component (\( \omega_0 + \omega_{\text{inj}} \)) is filtered by the tank [12]. In this method the tail node oscillates at twice the output frequency, so it is a good point for second-harmonic injection.
The drawback of the first topology is its large input capacitance, which reduces the locking range and the operating frequency of the oscillator [13]. This structure also has a single-ended injection input, so the previous stage sees an asymmetric load.

In the other technique, the incident signal is injected directly into the LC tank of the oscillator through MOS transistors [8], [13]. In [8] the coupling is done with two NMOS transistors, as shown in Fig. 4(b), and in [13] the switch transistors are placed between the two outputs, as shown in Fig. 4(c).

The topology shown in Fig. 4(b) has a lower input capacitance and differential inputs, but it cannot be used as a super-harmonic structure. It also suffers from asymmetric loading on the oscillator output nodes during long runs, so long runs distort the oscillator output signal [14]. To use the third topology in a half-rate CDR, the input data cannot be injected directly into the injection switches, because that would short the VCO outputs during long runs. The data must therefore be applied to an edge detector before being injected into the VCO.

In the proposed structure, shown in Fig. 5(a), the data is injected directly into the ILO through a simple RC network instead of being applied to an edge detector. M1 and M2 create an incident current on the rising edge of the data. Thus, at every data transition the two outputs of the oscillator are shorted, and the oscillator output clock locks to the data edges. This structure is used as a half-rate CDR in this paper. Its advantages are low power consumption and a maximum operating frequency that is half that of full-rate structures. The same technique can also be used as a full-rate CDR, as shown in Fig. 5(b).
On the falling edge of the data, the gate voltages drop below the bias point, so the transistors do not pass any current. Fig. 8 shows the currents of M1 and M2 for a 20 Gb/s input data stream.

Fig. 6 Circuit for direct injection of data

Fig. 7 Vg1 and Vg2

Fig. 8 The injection currents

In [12] the locking range of an ILO is calculated using a phase-domain approach. Injection causes a phase shift around the feedback loop of the oscillator. To satisfy the Barkhausen phase criterion, the tank must oscillate at a new frequency \( \omega_{\text{inj}} \). The resulting locking range is proportional to \( \frac{\omega_0}{Q} \cdot \frac{I_{\text{inj}}}{I_{\text{bias}}} \), where \( I_{\text{inj}} \) is the injection current, \( I_{\text{bias}} \) is the bias current of the ILO, Q is the quality factor and \( \omega_0 \) is the self-oscillation frequency of the LC tank. It can be seen from this expression that the locking range is directly proportional to \( I_{\text{inj}} / I_{\text{bias}} \) and inversely proportional to the quality factor. To increase the locking range we can raise the injection ratio or lower the quality factor. \( I_{\text{inj}} \) can be increased by enlarging the switch transistors, but this increases the parasitic capacitance and reduces the operating frequency of the ILO [15].
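The proportionality described above can be explored numerically. The sketch below assumes the common weak-injection (Adler-type) estimate \( f_L \approx \frac{f_0}{2Q} \cdot \frac{I_{\text{inj}}}{I_{\text{bias}}} \); the factor of 1/2, the function name, and all component values are assumptions chosen for illustration and are not taken from this design.

```python
import math


def locking_range_hz(f0_hz, q, i_inj, i_bias):
    """One-sided locking range from a weak-injection (Adler-type) estimate.

    Assumed form: f_L ~= f0 / (2 * Q) * (I_inj / I_bias), for I_inj << I_bias.
    """
    if q <= 0 or i_bias <= 0:
        raise ValueError("Q and I_bias must be positive")
    return f0_hz / (2.0 * q) * (i_inj / i_bias)


# Hypothetical values, for illustration only (not taken from the paper).
f0 = 10e9  # free-running frequency in Hz
base = locking_range_hz(f0, q=10.0, i_inj=0.2e-3, i_bias=4e-3)
double_inj = locking_range_hz(f0, q=10.0, i_inj=0.4e-3, i_bias=4e-3)
half_q = locking_range_hz(f0, q=5.0, i_inj=0.2e-3, i_bias=4e-3)

print(f"baseline locking range : {base / 1e6:.1f} MHz")
print(f"injection ratio doubled: {double_inj / 1e6:.1f} MHz")
print(f"quality factor halved  : {half_q / 1e6:.1f} MHz")
```

Doubling the injection ratio and halving Q widen the estimated range by the same factor, which is why the text presents them as alternative knobs, each with its own cost (parasitic capacitance in one case, phase noise of a lower-Q tank in the other).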

B. Flip Flop

For designing high speed circuits, current mode logic (CML) circuits are used because of their high frequency operation. Fig. 9 shows the operation principle of this logic family. The CML gates are fully differential, so they reject any common-mode noise created by the power supply or the ambient noise.

Fig. 9 The CML circuit overview

![Inductive peaking and cross coupled transistors](image)

There are several techniques for improving the speed of CML circuits. In [16], [17], an inductor is added in series with the load resistor to extend the bandwidth. The resulting inductive branch shunts the output capacitor and adds a zero to the frequency response. In effect, the inductor initially steers more of the current into the capacitor and therefore reduces the rise time. This method is called inductive peaking.
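The bandwidth benefit of inductive peaking can be checked with a simple small-signal sketch. It models the load as a series R-L branch in parallel with the output capacitance C and finds the -3 dB frequency numerically. The normalization (R = 1, C = 1, L = k), the chosen value k = 0.4, and the function names are assumptions for illustration, not values from this design.

```python
import math


def load_gain_sq(w, k):
    """|Z(jw)|^2 of a normalized load: (R + jwL) in parallel with C.

    Assumed normalization: R = 1, C = 1, L = k, so k = L / (R^2 * C)
    and w is expressed in units of 1 / (R * C).
    """
    num = 1.0 + (w * k) ** 2
    den = (1.0 - k * w * w) ** 2 + w * w
    return num / den


def bandwidth_3db(k, w_hi=100.0, iters=200):
    """Find the -3 dB frequency by bisection on |Z|^2 = 1/2."""
    lo, hi = 0.0, w_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if load_gain_sq(mid, k) > 0.5:
            lo = mid      # still above the half-power level
        else:
            hi = mid      # already below the half-power level
    return 0.5 * (lo + hi)


bw_plain = bandwidth_3db(0.0)    # resistor-only load
bw_peaked = bandwidth_3db(0.4)   # hypothetical peaking inductor
print(f"bandwidth without inductor: {bw_plain:.3f} / (RC)")
print(f"bandwidth with k = 0.4    : {bw_peaked:.3f} / (RC)")
```

In this idealized model a modest inductor extends the bandwidth by roughly a factor of 1.7 without visible peaking in the magnitude response; larger values of k extend it further at the cost of overshoot.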

Another way to increase the speed of CML circuits is to add an extra cross-coupled transistor pair to the outputs as shown in Fig. 10. The added cross-coupled transistors provide positive feedback which accelerates transition of output signals.
Fig. 11 shows the latch circuit, which is designed as a CML block and uses inductive peaking to enhance the speed.

![Latch Circuit Diagram](image)

**Fig.11 Implementation of latch**

C. Reference PLL

As mentioned before, a reference PLL is used to provide a control voltage for the injection-locked main VCOs. The PLL circuit is depicted in Fig. 12. In this PLL a VCO that is identical to the main VCO is used to produce the required control voltage.

![Reference PLL Circuit](image)

**Fig.12 Reference PLL circuit**

A 312 MHz reference is applied to a conventional PFD, which compares its phase with that of the divided VCO output and produces up/down signals for the V/I converter, which is followed by a second-order loop filter. The V/I converter is shown in Fig. 13 [18]. The loop filter generates a proportional control voltage for tuning the main VCO's frequency. A divider chain in the feedback path divides the VCO output by a factor of 64.

![V/I Converter](image)

**Fig.13 V/I converter**

Since the PLL in this design must lock to a constant frequency, settling time and the magnitude of the reference frequency are not of prime importance. By reducing the loop-filter bandwidth and choosing a relatively high reference frequency, we can reduce the ripple on the VCO control voltage. CML dividers are used for the first two stages of the divider chain so that they can work at high speed. For the following stages, dynamic dividers are used, as shown in Fig. 14. This structure is a T flip-flop made of two consecutive D flip-flops (the CML divide-by-two circuit is likewise made of two CML D latches combined into a CML T flip-flop). The dynamic divider consumes less power than the CML circuits. Its D flip-flops are dynamic latches that sample the data on every rising edge of the clock. With the cross-coupled PMOS configuration, the output node voltages can change faster, which enables operation at high speed.

![Dynamic Frequency Divider](image)

**Fig.14 Dynamic frequency divider**
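The behavior of the divider chain can be sketched with ideal toggle flip-flops. The text gives an overall ratio of 64 and two CML stages followed by dynamic stages; treating the chain as six cascaded divide-by-two stages is an inference from those numbers, and the class and function names below are hypothetical.

```python
class ToggleFlipFlop:
    """Ideal divide-by-two stage: output toggles on each rising input edge."""

    def __init__(self):
        self.q = 0
        self._prev_in = 0

    def step(self, clk_in):
        if clk_in == 1 and self._prev_in == 0:  # rising edge detected
            self.q ^= 1
        self._prev_in = clk_in
        return self.q


def count_output_cycles(n_stages, n_input_cycles):
    """Drive a chain of divide-by-two stages and count output rising edges."""
    chain = [ToggleFlipFlop() for _ in range(n_stages)]
    prev_out = 0
    rising_edges = 0
    for _ in range(n_input_cycles):
        for level in (1, 0):  # one input clock cycle: high phase, then low
            sig = level
            for stage in chain:
                sig = stage.step(sig)
            if sig == 1 and prev_out == 0:
                rising_edges += 1
            prev_out = sig
    return rising_edges


# Assumed split: 2 CML stages + 4 dynamic stages = 6 stages, ratio 2**6 = 64.
cycles_after_cml = count_output_cycles(2, 640)
cycles_after_chain = count_output_cycles(6, 640)
print("output cycles after the two CML stages:", cycles_after_cml)
print("output cycles after the full chain    :", cycles_after_chain)
```

Only the first stages see the full VCO frequency, which is consistent with reserving the power-hungry CML implementation for them and using dynamic logic once the frequency has been divided down.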

IV. SIMULATION RESULTS

The proposed circuit has been designed and simulated in a 0.18 μm CMOS technology. It consumes 68.9 mW from a 1.8 V supply, of which 38.6 mW is consumed in the reference PLL and 30.3 mW in the CDR core (including the two flip-flops). Because the performance of the reference PLL (locking speed, etc.) does not have a large impact on total system performance, the power consumption of the PLL can be reduced to achieve a more optimized design.
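The power figures quoted above can be cross-checked with a few lines of arithmetic. The energy-per-bit metric is not reported in the paper; it is derived here from the quoted power and bit rate only to put the numbers in perspective, and the variable names are arbitrary.

```python
import math

P_PLL_MW = 38.6    # reference PLL power, from the text
P_CORE_MW = 30.3   # CDR core power incl. the two flip-flops, from the text
BIT_RATE = 20e9    # bit/s

total_mw = P_PLL_MW + P_CORE_MW
pll_share = P_PLL_MW / total_mw

# Derived metric (not reported in the paper): energy spent per received bit.
energy_per_bit_pj = total_mw * 1e-3 / BIT_RATE * 1e12
core_energy_per_bit_pj = P_CORE_MW * 1e-3 / BIT_RATE * 1e12

print(f"total power         : {total_mw:.1f} mW")
print(f"PLL share of total  : {pll_share:.1%}")
print(f"energy per bit      : {energy_per_bit_pj:.3f} pJ/bit")
print(f"core energy per bit : {core_energy_per_bit_pj:.3f} pJ/bit")
```

The reference PLL accounts for more than half of the total, which supports the remark that optimizing the PLL is the most direct way to lower the overall power.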

Fig. 15 depicts the output clock spectrum and waveform of the reference PLL. The reference spurs are approximately −60 dBc. The transient response of the VCO control voltage is shown in Fig. 16.

The input data is shown in Fig. 17(a), and the recovered data in response to a \( 2^7-1 \) continuous-mode PRBS is shown in Fig. 17(b). The simulated jitter of the recovered data is 4.8 ps peak-to-peak and 1.3 ps rms. The recovered clock is shown in Fig. 18; its simulated jitter is 2.4 ps peak-to-peak and 0.84 ps rms. The VCO locking range for a fixed control voltage is 150 MHz. With the reference PLL, the locking range of the CDR circuit is extended to 1 GHz. The CDR circuit recovers the data in less than 50 ps, which equals one bit period at 20 Gb/s. Fig. 19 shows this fast locking time.
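The timing figures above are easier to compare when expressed in unit intervals (UI). The short sketch below only converts the quoted values; the helper name `to_ui` is arbitrary.

```python
import math

BIT_RATE = 20e9               # bit/s, from the text
ui_ps = 1e12 / BIT_RATE       # one unit interval in picoseconds
clock_period_ps = 2 * ui_ps   # the recovered clock runs at half the bit rate


def to_ui(t_ps):
    """Convert a time in picoseconds to unit intervals at 20 Gb/s."""
    return t_ps / ui_ps


clock_jitter_rms_ui = to_ui(0.84)
clock_jitter_pp_ui = to_ui(2.4)
data_jitter_rms_ui = to_ui(1.3)
data_jitter_pp_ui = to_ui(4.8)
lock_time_ui = to_ui(50.0)

print(f"unit interval        : {ui_ps:.0f} ps")
print(f"half-rate clock      : {clock_period_ps:.0f} ps period")
print(f"clock jitter rms/pp  : {clock_jitter_rms_ui:.4f} / {clock_jitter_pp_ui:.3f} UI")
print(f"data jitter rms/pp   : {data_jitter_rms_ui:.3f} / {data_jitter_pp_ui:.3f} UI")
print(f"lock time (50 ps)    : {lock_time_ui:.0f} UI")
```

Expressed this way, the peak-to-peak data jitter stays below a tenth of a unit interval, and the quoted 50 ps lock time corresponds to exactly one bit.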

The circuit is simulated across process corners, temperatures, and supply voltages. Fig. 20 shows how the VCO frequency changes with PVT variations. In the slow/slow corner at 70°C and 1.65 V, the rms and peak-to-peak jitter of the recovered clock are 1.07 ps and 3.2 ps, respectively; the power consumption is 60.7 mW and the circuit locks after 1 bit. In the fast/fast corner at −20°C and 1.95 V, the rms and peak-to-peak jitter of the recovered clock are 0.86 ps and 2.43 ps, respectively; the power consumption is 83.8 mW and the circuit again locks after 1 bit.

Tables 1 and 2 summarize and compare the performance of this work and other recently published burst-mode CDRs. By using a super-harmonic ILO instead of a first-harmonic ILO, the proposed design achieves a speed similar to that of [8] while using a technology two nodes older than that of [8], and it consumes less power because it uses a half-rate clock.

Table 1 Measurement results of similar works

<table>
<thead>
<tr>
<th></th>
<th>[7]</th>
<th>[19]</th>
</tr>
</thead>
<tbody>
<tr>
<td>Data Rate</td>
<td>10 Gb/s</td>
<td>1.3-5.2 Gb/s</td>
</tr>
<tr>
<td>Recovered Clock Jitter</td>
<td>1.35 ps rms (with 2^-1 PRBS)</td>
<td>0.82 ps rms (with 2^-1 PRBS)</td>
</tr>
<tr>
<td>Operation Range</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>Locking Time</td>
<td>32 UI</td>
<td>&lt; 20 UI</td>
</tr>
<tr>
<td>Supply Voltage</td>
<td>1.8 V</td>
<td>1.1 V</td>
</tr>
<tr>
<td>Power Diss.</td>
<td>200 mW</td>
<td>12.4 mW</td>
</tr>
<tr>
<td>Technology</td>
<td>0.18 µm CMOS</td>
<td>40 nm CMOS</td>
</tr>
</tbody>
</table>

Table 2 Simulation results of the proposed CDR compared with similar work

<table>
<thead>
<tr>
<th></th>
<th>[9]</th>
<th>This work</th>
</tr>
</thead>
<tbody>
<tr>
<td>Data Rate</td>
<td>71 Gb/s</td>
<td>20 Gb/s</td>
</tr>
<tr>
<td>Recovered Clock Jitter</td>
<td>N/A</td>
<td>0.84 ps rms (with 2^7-1 PRBS)</td>
</tr>
<tr>
<td>Operation Range</td>
<td>N/A</td>
<td>1 GHz</td>
</tr>
<tr>
<td>Locking Time</td>
<td>N/A</td>
<td>1 UI</td>
</tr>
<tr>
<td>Supply Voltage</td>
<td>3.3 V</td>
<td>1.8 V</td>
</tr>
<tr>
<td>Power Diss.</td>
<td>0.5 W</td>
<td>68.9 mW</td>
</tr>
<tr>
<td>Technology</td>
<td>0.18 µm SiGe</td>
<td>0.18 µm CMOS</td>
</tr>
</tbody>
</table>

V. CONCLUSION

A 20 Gb/s CDR circuit for burst-mode applications is proposed and simulated in a 0.18 µm CMOS process. With the super-harmonic injection-locking technique, the oscillator frequency is reduced to half the bit rate, so two flip-flops are needed to sample the data with the half-rate clock. To help the CDR lock to the incoming data and track the frequency in the presence of PVT variations, a control voltage is generated by a reference PLL and applied to the VCOs. Using a new injection-locked oscillator as the VCO makes it possible to achieve the same high bit rate (20 Gb/s) as the previously published work (in 90 nm CMOS) in an older technology (0.18 µm CMOS).

REFERENCES


[5] P. Han and W. Y. Choi, “1.25/2.5-Gb/s dual bit-rate burst-mode clock recovery circuits in 0.18-µm CMOS technology,” in IEEE Transactions...


