# Dual Sawtooth-Based Delay Locked Loops for Heterogeneous 3-D Clock Networks

Andres Ayes and Eby G. Friedman
Department of Electrical and Computer Engineering
University of Rochester
Rochester, New York
(aayes, friedman)@ece.rochester.edu

*Abstract*—3-D systems and chiplets pose a significant challenge for reliable timing of heterogeneous integrated systems. Delay locked loops are capable of synchronizing clock signals; however, the increasing speed of deeply scaled technologies leads to large and complex delay lines. In this paper, a dual sawtooth-based delay locked loop is proposed to address the increasing difficulty of delay generation in high speed 3-D systems. The proposed architecture lacks a traditional voltage controlled delay line in favor of differential integrators to generate the target delay. The architecture is verified using PTM 7 nm models and achieves locking speeds as low as six cycles for a 1 GHz clock signal.

Index Terms—3-D IC, synchronization, delay locked loop, through silicon via (TSV), chiplets

# I. INTRODUCTION

The individual layers of a three-dimensional (3-D)/chiplet system experience significant variations in process, voltage, and temperature (PVT) [1]. These variations may result in timing failures. Phase compensation circuits, such as phase locked loops (PLLs) and delay locked loops (DLLs), mitigate the occurrence of these timing failures. DLLs have recently garnered significant attention due to a shorter locking time, improved linearity, and the lack of feedback paths within the DLL [2].

An advantage in synchronizing 3-D integrated systems is the short feedback path enabled by through silicon vias (TSVs) [3], [4]. A DLL can be placed above or below the leaf of a clock tree to synchronize the clock signal to the clock signal source. This approach supports accurate point-to-point synchronization between layers. A generalized representation of a DLL within a 3-D environment is illustrated in Fig. 1. To fully reap the benefits of point-to-point synchronization, a distributed approach composed of multiple DLLs spread throughout the system is proposed to ensure effective timing at the local level. Delay lines within DLLs, however, require multiple delay components. These delay components become faster with technology scaling. The short latency of these components leads to either small delay ranges of the DLL or an excessive number of delay components.

Fig. 1. Single layer DLL within 3-D system.

A DLL architecture that replaces the traditional phase detector and voltage controlled delay line with sawtooth waveforms to align the clock edges is proposed here. The proposed DLL achieves locking times comparable to fast locking DLLs. The sawtooth DLL periodically aligns the clock signals and is managed by a central DLL control unit that determines the periodicity of the alignment. The focus of this paper is on the sawtooth DLL.

This paper is organized as follows. Background on previous work on DLLs is provided in Section II. The operating principle of the DLL is described in Section III. The circuit topology of the proposed DLL architecture is described in Section IV. Simulation results are reviewed in Section V. Conclusions are drawn in Section VI.

# II. BACKGROUND

Clock tree topologies for fully synchronous 3-D systems have previously been explored [5], [6]. However, with the advent of heterogeneous integration and multi-clock domains, modular solutions to ensure timing in 3-D/chiplet systems require renewed exploration.

As noted in Section I, phase compensation circuits are frequently used at the interface between the 3-D/chiplet boundaries, as described in [7]. DLLs are a favored candidate for phase compensation and are generally composed of the following components: a phase detector, charge pump, loop filter, and voltage controlled delay line [2]. In digital DLLs, the delay line has multiple tap points that are multiplexed by a digital code. Speedup techniques to improve the locking time (the time required to align the reference and feedback signals) include successive approximation register (SAR) algorithms and flash time-to-digital conversion (TDC) [8]–[12]. DLLs targeting 3-D systems have also been proposed [13], [14].

The effort depicted is supported by the National Science Foundation under Grant No. 2124453 (DISCOVER Expeditions). The content of the information does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred. This work was also supported by Qualcomm Corporation.

A primary concern in modern DLLs is the temporal range of the delay lines. The delay components are typically composed of NAND gates or inverter cells. In modern technology nodes, the delay of each cell can be quite small (tens of picoseconds). Achieving a target delay therefore requires longer delay lines and more complex DLL structures. A sawtooth waveform is one method to mitigate these challenges. A single-ended sawtooth-based delay line is proposed in [15]. In this work, a DLL architecture based on a differential sawtooth waveform to perform phase detection and delay generation is presented.

# III. OPERATING PRINCIPLE

The operating principle of a sawtooth DLL is described in this section. Consider a reference clock signal with a 50% duty cycle and period $T_{period}$, together with an associated sawtooth waveform. The waveform rises when the clock signal is high and falls when the clock signal is low, as depicted in the first two waveforms shown in Fig. 2. After the START signal goes high, the DLL sources a pulse from the reference clock signal, which propagates through the clock tree. When this pulse arrives at the feedback input of the DLL, the pulse has been delayed by the propagation delay $T_{delay}$ of the clock tree. At the rising edge of the feedback signal, the sawtooth DLL holds the voltage level of the sawtooth waveform; this held level is $V_{offset}$.



Fig. 2. Sawtooth DLL waveforms depicting the ideal timing relationship between the reference clock signal, DLL output, and feedback clock signal.

The output edge of the DLL is produced when the sawtooth waveform next crosses $V_{offset}$ in the opposite direction. The time of this crossing, measured from the rising edge of the reference clock signal, is denoted $T_{output}$ and is described by the following expressions:

$$T_{output} = T_{delay} + 2(T_{period}/2 - T_{delay}) = T_{period} - T_{delay}, \quad \text{for } T_{delay} < T_{period}/2, \tag{1}$$

$$T_{output} = T_{delay} + 2(T_{period} - T_{delay}) = 2T_{period} - T_{delay}, \quad \text{for } T_{delay} > T_{period}/2. \tag{2}$$

Note that the term $2T_{period}$ in (2) arises because an ideal DLL requires an additional cycle when $T_{delay} > T_{period}/2$. After the output signal of the DLL propagates through the clock tree, CLK<sub>IN</sub> and CLK<sub>FB</sub> are aligned. The first occurrence of $T_{output}$ is skipped to avoid an overlap between the first and second pulses of CLK<sub>OUT</sub>. The sawtooth DLL must generate a sawtooth shaped waveform, maintain the voltage offset, and delay the reference signal by $T_{output}$. The circuit architecture and components used to realize these functions are described in the following section.
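As an illustration of (1) and (2), the following minimal Python sketch evaluates $T_{output}$ for an ideal sawtooth DLL and checks that the delayed edge arrives at the feedback input on a reference edge. The function names, the use of integer picoseconds, and the rejection of the boundary cases $T_{delay} = 0$ and $T_{delay} = T_{period}/2$ are assumptions made for this example and are not part of the proposed circuit.

```python
def t_output_ps(t_delay_ps: int, t_period_ps: int) -> int:
    """Ideal delay inserted by the sawtooth DLL, following (1) and (2).

    Times are integer picoseconds, measured from the rising edge of CLK_IN.
    """
    if not 0 < t_delay_ps < t_period_ps:
        raise ValueError("expected 0 < T_delay < T_period")
    if 2 * t_delay_ps == t_period_ps:
        # (1) and (2) use strict inequalities; the half-period case is excluded.
        raise ValueError("T_delay = T_period/2 is not covered by (1) or (2)")
    if 2 * t_delay_ps < t_period_ps:
        return t_period_ps - t_delay_ps        # expression (1)
    return 2 * t_period_ps - t_delay_ps        # expression (2): one additional cycle


def feedback_arrival_ps(t_delay_ps: int, t_period_ps: int) -> int:
    """Arrival time of the delayed edge at CLK_FB after crossing the clock tree."""
    return t_output_ps(t_delay_ps, t_period_ps) + t_delay_ps


if __name__ == "__main__":
    period = 1000  # 1 GHz reference clock, in picoseconds
    for delay in (300, 700):
        arrival = feedback_arrival_ps(delay, period)
        print(delay, t_output_ps(delay, period), arrival % period == 0)
```

For a 1 GHz clock, a clock tree delay of 300 ps yields $T_{output} = 700$ ps, while a delay of 700 ps yields $T_{output} = 1300$ ps. In both cases, the sum of $T_{output}$ and $T_{delay}$ is an integer multiple of $T_{period}$.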

# IV. CIRCUIT ARCHITECTURE

The architecture and components of the sawtooth DLL used to realize the behavior outlined in Section III are described in this section. A block diagram of the DLL is shown in Fig. 3. Two sawtooth waveforms are generated by the DLL. The first waveform aligns the rising edge, while the second waveform aligns the falling edge. The pulse generation stages detect when the sawtooth waveform crosses $V_{offset}$, generating a pulse to change the output state. The output generation stage changes the output clock waveform based on the timing characteristics of the previous stages.



Fig. 3. Block diagram of the proposed dual sawtooth DLL.

## A. Sawtooth generation

A differential integrator is used to generate the sawtooth waveform [16]. Two capacitors,  $C_1$  and  $C_2$ , provide robustness to PVT variations and also allow the circuit to maintain the voltage offset by terminating the charge and discharge operation of one of the capacitors when the feedback clock signal arrives.



Fig. 4. Integrator circuit with differential output. When  $\text{CLK}_{\text{FB}}$  arrives, *i.e.*, at  $T_{delay}$ , capacitor  $C_2$  is no longer charged or discharged by the integrator, maintaining the voltage offset.

When the feedback clock signal arrives, the signal that controls the charging and discharging of $C_1$ switches from $\text{CLK}_{\text{IN}}$ to $\overline{\text{CLK}_{\text{IN}}}$. This switch maintains the timing relationship between the sawtooth waveform and $V_{offset}$ shown in Fig. 2. The switch causes a voltage overshoot on $C_1$, but the voltage returns to the average value after a few cycles. The integrator is shown in Fig. 5a, and the relevant waveforms are shown in Fig. 5b. Note that $\text{STOP}_{\text{INT}}$ transitions high at the rising edge of the first $\text{CLK}_{\text{FB}}$ pulse, and the multiplexer switches the input of $C_1$. The integrator responsible for aligning the falling edge has a similar architecture, where $\text{STOP}_{\text{INT}}$ transitions high at the falling edge of the first $\text{CLK}_{\text{FB}}$ pulse.

## B. Pulse and output generation

The DLL generates a delayed version of the input clock based on the timing information provided by the sawtooth waveform. The pulse generation stage is composed of a pulse generator and a level detector, which is achieved by a differential amplifier with  $C_1$  and  $C_2$  at the input. The pulse generator is composed of an inverter chain and a NAND gate. The pulse generator provides a pulse with sufficient hold time for the toggle flip flop. The toggle flip flop is the output generator, switching between high and low depending upon whether the flip flop receives a pulse from the rising edge locator or the falling edge locator. These components are illustrated in Fig. 6.
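The behavior of the output stage can be sketched with an idealized event model. The Python example below treats each pulse from the rising edge locator or the falling edge locator as an instantaneous event that flips an ideal toggle flip flop. The function name, the event-list representation, the initial low output state, and the pulse times are assumptions for illustration; the inverter chain, NAND gate, and hold time are not modeled.

```python
from typing import List, Tuple


def toggle_output(rise_pulses_ps: List[float],
                  fall_pulses_ps: List[float],
                  initial_level: int = 0) -> List[Tuple[float, int]]:
    """Return the (time, level) transitions of an ideal toggle flip flop.

    Every pulse from either edge locator flips the output once.
    """
    events = sorted(list(rise_pulses_ps) + list(fall_pulses_ps))
    level = initial_level
    transitions = []
    for t in events:
        level ^= 1                      # toggle on each incoming pulse
        transitions.append((t, level))
    return transitions


if __name__ == "__main__":
    # Hypothetical pulse times for a 1000 ps period and a 50% duty cycle.
    print(toggle_output([700.0, 1700.0], [1200.0, 2200.0]))
```

Because the flip flop in this model only toggles, the output follows the intended clock only when the pulses from the two locators strictly alternate; a spurious pulse inverts every subsequent output level.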

# V. PERFORMANCE OF SAWTOOTH DLL

Fig. 5. Integrator stage: (a) integrator circuit, and (b) sawtooth waveform and its relationship to $\text{CLK}_{\text{IN}}$ and $\text{CLK}_{\text{FB}}$.

Fig. 6. Level detectors for rising and falling edges, pulse generators, and toggle output stage.

The proposed DLL is evaluated assuming a 7 nm predictive technology model (PTM) [17]. Waveforms of the relevant voltage nodes are shown in Fig. 7 for a 1 GHz reference clock signal and a clock delay $T_{delay}$ of 300 ps. After the START signal transitions high, a single pulse from CLK<sub>IN</sub> is passed to the output. At the rising edge of the feedback clock signal, $C_2$ of the RISE<sub>SAW</sub> waveform stops charging and discharging. At this moment, the input that charges $C_1$ changes from CLK<sub>IN</sub> to $\overline{\text{CLK}_{\text{IN}}}$. The RISE<sub>TOGGLE</sub> and FALL<sub>TOGGLE</sub> pulses are, respectively, the outputs of the rising edge locator and the falling edge locator, as illustrated in Fig. 6. These pulses are produced when the sawtooth waveforms cross $V_{offset}$. The READY signal indicates when the $C_1$ sawtooth waveform has achieved steady state after the inputs have switched. In future work, this signal will be produced by the central DLL control unit, which will determine the average cycle time based on statistical information characterizing the technology and system. For this work, two cycles are required to stabilize $C_1$.

Note the small pulse at the output generated during the low-to-high transition of the READY signal. This pulse is caused by the pulse generators between the level detectors and the toggle output stage, which produce a small pulse upon waking up from sleep mode.

The skew reduction performance of the sawtooth DLL for different $T_{delay}$ is shown in Fig. 8. The DLL successfully attenuates the skew between the feedback and reference signals across a range of delays from 0 ps to 950 ps. Note two exceptions: at 0 ps (*i.e.*, when $T_{delay}$ is a multiple of $T_{period}$) and at 450 ps (*i.e.*, when $T_{delay}$ is an odd multiple of $T_{period}/2$). This failure in attenuating the skew arises from the sawtooth waveforms crossing $V_{offset}$ near the peak of the waveform. These peaks occur at $T_{period}$ (for the 0 ps case) and at $T_{period}/2$ (for the 450 ps case). In these cases, the central DLL control unit determines whether the sawtooth DLL applies no delay or inverts the reference clock signal, depending upon which peak is closer to $V_{offset}$.
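The choice between the sawtooth path, no delay, and inversion can be illustrated with a small decision function. This is only a sketch: the name `select_mode`, the mode labels, and the guard band `guard_ps` around each peak are hypothetical, since the decision logic of the central control unit is left to future work.

```python
def select_mode(t_delay_ps: float, t_period_ps: float, guard_ps: float) -> str:
    """Pick how the reference clock is handled for a given clock tree delay.

    guard_ps is a hypothetical half-width of the region around each sawtooth
    peak in which the crossing of V_offset is considered unreliable.
    """
    if not 0 <= guard_ps < t_period_ps / 4:
        raise ValueError("expected 0 <= guard < T_period/4")
    phase = t_delay_ps % t_period_ps
    dist_to_full = min(phase, t_period_ps - phase)    # distance to a multiple of T_period
    dist_to_half = abs(phase - t_period_ps / 2)       # distance to T_period/2
    if dist_to_full <= guard_ps:
        return "bypass"      # apply no delay
    if dist_to_half <= guard_ps:
        return "invert"      # invert the reference clock signal
    return "sawtooth"        # use the crossing of V_offset


if __name__ == "__main__":
    for delay in (0.0, 300.0, 500.0, 990.0):
        print(delay, select_mode(delay, 1000.0, guard_ps=25.0))
```

Inverting a clock with a 50% duty cycle shifts the rising edge by $T_{period}/2$, which is why inversion compensates a clock tree delay close to $T_{period}/2$.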



Fig. 7. Relevant waveforms within the sawtooth DLL for  $T_{delay} = 400$  ps.

Fig. 8. Attenuation of clock skew due to initial $T_{delay}$.

The differential nature of the sawtooth DLL offers additional benefits such as robustness to temperature corners. The DLL is evaluated at -50°C, 27°C, and 125°C. As shown in Fig. 9a, the extreme temperature corners affect the amplitude of the sawtooth waveforms; however, the timing relationship between $V_{offset}$ and the sawtooth waveforms is maintained despite the wide temperature corners. The feedback signal CLK<sub>FB</sub> for the three cases is shown in Fig. 9b. Any difference at the output due to the temperature corners is barely noticeable. The DLL exhibits an average power dissipation of 510 $\mu$W at -50°C, 613 $\mu$W at 27°C, and 991 $\mu$W at 125°C.
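One way to see why a change in sawtooth amplitude need not disturb the timing is an idealized triangular waveform whose rising and falling slopes scale together. The Python sketch below is a behavioral model, not the simulated circuit. The model scans the waveform for the first crossing of $V_{offset}$ in the direction opposite to the direction at the sampling instant. The parameter `slope` (in arbitrary units per picosecond), the scan step `step_ps`, and the function names are assumptions for this example.

```python
def sawtooth(t_ps: float, t_period_ps: float, slope: float) -> float:
    """Ideal waveform: ramps up while the clock is high, down while it is low."""
    phase = t_ps % t_period_ps
    half = t_period_ps / 2
    return slope * (phase if phase < half else t_period_ps - phase)


def crossing_time_ps(t_delay_ps, t_period_ps, slope, step_ps=0.01):
    """First crossing of V_offset in the direction opposite to the sampling direction."""
    v_offset = sawtooth(t_delay_ps, t_period_ps, slope)   # level held at the CLK_FB edge
    sampled_while_rising = (t_delay_ps % t_period_ps) < t_period_ps / 2
    prev = sawtooth(t_delay_ps + step_ps, t_period_ps, slope) - v_offset
    k = 2
    while k * step_ps < 2 * t_period_ps:
        t = t_delay_ps + k * step_ps
        cur = sawtooth(t, t_period_ps, slope) - v_offset
        if sampled_while_rising and prev > 0 >= cur:
            return t                                       # falling crossing
        if not sampled_while_rising and prev < 0 <= cur:
            return t                                       # rising crossing
        prev = cur
        k += 1
    return None


if __name__ == "__main__":
    for slope in (0.5, 1.0, 2.0):   # different amplitudes, same timing
        print(slope, round(crossing_time_ps(300.0, 1000.0, slope), 1))
```

In this ideal model, the crossing time depends only on $T_{delay}$ and $T_{period}$, in agreement with (1) and (2). A real integrator departs from this picture to the extent that the two slopes do not scale together.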



Fig. 9. Performance of the sawtooth DLL at extreme temperature corners.

# VI. CONCLUSIONS

A differential sawtooth DLL to synchronize 3-D systems is described in this paper. The DLL aligns a reference signal to a clock leaf (register) on a separate layer within the 3-D system. The differential nature of the DLL provides robustness against temperature corners, an issue of significant concern in 3-D systems. The proposed sawtooth DLL is intended for distributed DLL networks where a central control unit provides relevant timing information, such as how frequently the DLL should align the signals. The central control unit and placement optimization of the sawtooth DLL will be explored in future work.

# REFERENCES

- [1] V. F. Pavlidis, I. Savidis, and E. G. Friedman, *Three-Dimensional Integrated Circuit Design, 2nd Edition*, Morgan Kaufmann, 2017.
- [2] E. Salman and E. Friedman, *High Performance Integrated Circuit Design*, McGraw-Hill Professional, August 2012.
- [3] I. Savidis and E. G. Friedman, "Electrical Modeling and Characterization of 3-D Vias," *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 784–787, May 2008.
- [4] I. Savidis and E. G. Friedman, "Closed-Form Expressions of 3-D Via Resistance, Inductance, and Capacitance," *IEEE Transactions on Electron Devices*, vol. 56, no. 9, pp. 1873–1881, September 2009.
- [5] V. F. Pavlidis, I. Savidis, and E. G. Friedman, "Clock Distribution Networks for 3-D Integrated Circuits," *Proceedings of the IEEE Custom Integrated Circuits Conference*, pp. 651–654, September 2008.
- [6] V. F. Pavlidis, I. Savidis, and E. G. Friedman, "Clock Distribution Networks for 3-D Integrated Circuits," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 19, no. 12, pp. 2256–2268, December 2011.
- [7] W. Gomes *et al.*, "Ponte Vecchio: A Multi-Tile 3D Stacked Processor for Exascale Computing," *Proceedings of the IEEE International Solid-State Circuits Conference*, pp. 42–44, February 2022.
- [8] M. E. Quchani and M. Maymandi-Nejad, "Design of a Low-Power Linear SAR-Based All-Digital Delay-Locked Loop," *Proceedings of the Iranian Conference on Electrical Engineering*, pp. 118–124, April 2019.
- [9] D. Park and J. Kim, "A 7-GHz Fast-Lock 2-Step TDC-Based All-Digital DLL for Post-DDR4 SDRAMs," *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 1–4, May 2018.
- [10] M. Hossain *et al.*, "A Fast-Lock, Jitter Filtering All-Digital DLL Based Burst-Mode Memory Interface," *IEEE Journal of Solid-State Circuits*, no. 4, pp. 1048–1062, February 2014.
- [11] J.-H. Chae *et al.*, "A 1.74 mW/GHz 0.11–2.5 GHz Fast-Locking, Jitter-Reducing, 180° Phase-Shift Digital DLL with a Window Phase Detector for LPDDR4 Memory Controllers," *Proceedings of the IEEE Asian Solid-State Circuits Conference*, pp. 1–4, November 2015.
- [12] D. Zhang *et al.*, "A Multiphase DLL with a Novel Fast-Locking Fine-Code Time-to-Digital Converter," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 23, no. 11, pp. 2680–2684, December 2014.
- [13] C.-C. Chung and C.-Y. Hou, "All-Digital Delay-Locked Loop for 3D-IC Die-to-Die Clock Synchronization," *Proceedings of the IEEE International Symposium on VLSI Design, Automation and Test*, pp. 1–4, April 2014.
- [14] M. Sadi, S. Kannan, L. England, and M. Tehranipoor, "Design of a Digital IP for 3D-IC Die-to-Die Clock Synchronization," *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 1–4, May 2017.
- [15] C.-C. Chen and S.-I. Liu, "An Infinite Phase Shift Delay-Locked Loop with Voltage-Controlled Sawtooth Delay Line," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 11, pp. 2413–2421, November 2008.
- [16] T. H. Smilkstein, "Jitter Reduction on High-Speed Clock Signals," Ph.D. Dissertation, University of California, Berkeley, August 2007.
- [17] Y. Cao, "PTM," 2018, [Online]. Available: ptm.asu.edu