# **Exploiting On-Chip Inductance in High Speed Clock Distribution Networks**

Yehea I. Ismail, Department of Electrical and Computer Eng., Northwestern University, Evanston, IL 60208

Eby G. Friedman, Department of Electrical and Computer Eng., University of Rochester, Rochester, NY 14627

Jose L. Neves, IBM Microelectronics, 1580 Route 52, Fishkill, NY 12533

**Abstract**

On-chip inductance effects can be used to improve the performance of high speed integrated circuits. Specifically, inductance can improve the signal slew rate (the rise time), virtually eliminate short-circuit power consumption, and reduce the area of the active devices and repeaters inserted to optimize the performance of long interconnects. These positive effects suggest the development of design strategies that benefit from on-chip inductance. An example of an industrial clock distribution network is presented to illustrate how inductance can be used to improve the performance of high speed integrated circuits.

## I. Introduction

The importance of on-chip inductance is continuously increasing with faster on-chip rise times, wider wires, and the introduction of new materials for low resistance interconnect [1]-[16]. These increasing inductance effects are typically viewed as an added problem that must be properly managed. Dealing with inductance requires efficient extraction methods [3]-[7], [11] and increases the algorithmic and computational complexity of IC design and analysis tools. Furthermore, underdamped responses can cause reliability issues and increase noise in integrated circuits. However, these concerns regarding on-chip inductance are primarily due to the insufficient sophistication of the tools and methods that are available for designing and analyzing high performance integrated circuits. To date, much of the effort within industry has focused on limiting the effects of inductance (see, e.g., [1], [8], and [10]).
However, suppressing inductance effects typically comes at the expense of degraded circuit performance in terms of speed, power consumption, and/or device area. This paper presents an industrial clock distribution network that illustrates this behavior. As described here, inductance has beneficial effects on integrated circuits, such as faster signal rise times, lower power consumption, and less active device area. Design methodologies can be developed to exploit these useful effects of on-chip inductance while maintaining noise at acceptable levels so as to guarantee the reliable operation of an integrated circuit.

The primary objective of this paper is to describe these beneficial effects of inductance on the performance of an integrated circuit. This topic is introduced in section II. The design of a clock distribution network is presented in section III to illustrate how inductance effects can significantly improve circuit performance. Finally, some conclusions are offered in section IV.

## II. Useful Inductance Effects

The effect of inductance on the rise time of signals within an integrated circuit is discussed in subsection A. It is shown that increasing inductance effects can result in faster signal rise times. In subsection B, the effects of inductance on the area of the repeaters inserted to reduce signal degradation along long interconnects are discussed. It is shown that the total repeater area decreases as inductance effects increase. The effects of inductance on the power dissipated by CMOS gates are discussed in subsection C, particularly the dramatic decrease in the short-circuit power consumption of CMOS gates. In addition, the smaller device area required to drive inductive lines results in less device capacitance, further decreasing the total power consumption.

### A. Effects of Inductance on the Signal Rise Time

The faster rise times of signals in a high speed integrated circuit as inductance effects increase can be explained by examining the signal propagation characteristics of a lossy RLC transmission line. Signals attenuate when propagating across an RLC transmission line. This attenuation is frequency dependent and is given by

$$\alpha = \omega \sqrt{LC}\, \sqrt{\frac{1}{2} \left( \sqrt{1 + \left(\frac{R}{\omega L}\right)^2} - 1 \right)}, \tag{1}$$

where R, L, and C are the resistance, inductance, and capacitance per unit length of the line, respectively, and $\omega$ is the angular frequency. The attenuation constant is plotted as a function of frequency in Fig. 1 for L = 10 nH/cm, C = 1 pF/cm, and several values of R.

The frequency components of a signal launched at the input of an RLC transmission line travel at different speeds and suffer different levels of attenuation. As shown in Fig. 1, the high frequency components that form the edges of a pulse suffer greater attenuation than the low frequency components. The shape of a signal therefore degrades as the signal travels across a lossy transmission line due to the loss of these high frequency components. As also shown in Fig. 1, the attenuation constant becomes less frequency dependent as the inductance effects increase, that is, as $R/(\omega L)$ decreases. In the limiting case of a lossless line, which represents maximum inductance effects, the attenuation constant $\alpha$ becomes zero. Thus, as inductance effects increase, a pulse propagating across an RLC line retains the high frequency components in its edges, improving the signal rise and fall times. This behavior is qualitatively illustrated in Fig. 2.

Fig. 1. Attenuation constant of a lossy transmission line versus frequency. L = 10 nH/cm, C = 1 pF/cm, and R = 10, 50, 100, 200, and 400 Ω/cm.

Fig. 2. Signal dispersion of a square wave signal in a lossy transmission line.
a) Pulse shape after traveling along a lossless transmission line. b) Pulse shape after traveling along a lossy transmission line.

### B. Effects of Inductance on the Repeater Insertion Process

Repeater insertion has become a common design methodology for driving long resistive interconnect [18]-[24]. Since the propagation delay of an RC interconnect line depends quadratically on its length, subdividing the line into shorter sections by inserting repeaters is an effective strategy for reducing the total propagation delay. Typical high performance circuits currently have a significant number of repeaters inserted along the global interconnect lines. These repeaters are large gates and consume a significant portion of the total circuit power.

The propagation delay from the input to the output of an RLC line of length l with an ideal power supply and an open circuit load is given by [25]

$$t_{pd} = \sqrt{LC} \left( e^{-2.9 \left( \alpha_{asym} l \right)^{1.35}} l + 0.74\, \alpha_{asym} l^2 \right), \tag{2}$$

where

$$\alpha_{asym} = \frac{R}{2} \sqrt{\frac{C}{L}} \tag{3}$$

is the asymptotic high frequency value of the attenuation per unit length of a signal propagating across a lossy transmission line, as shown in Fig. 1. For the limiting case where $L \to 0$, (2) reduces to $0.37RCl^2$. Again, note the quadratic dependence on the length of an RC wire. For the other limiting case where $R \to 0$, the propagation delay is $l\sqrt{LC}$. Note the linear dependence on the length of the line. Inserting repeaters in an LC line only increases the total delay due to the added gate delay. Thus, an LC line requires zero repeaters to minimize the overall propagation delay.
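Equations (1) through (3) are straightforward to evaluate numerically. The Python sketch below is an illustration only, not the tooling behind Fig. 1 or [25]. It assumes per-centimeter units (Ω/cm, H/cm, F/cm, lengths in cm), takes L = 10 nH/cm and C = 1 pF/cm from Fig. 1, and uses hypothetical function names (`attenuation`, `alpha_asym`, `t_pd`). It also lets one check that (1) approaches (3) at high frequency and that (2) collapses to the RC and LC limits stated above.

```python
import math

# Units (an assumption for this sketch): R in ohm/cm, L in H/cm, C in F/cm, lengths in cm.


def attenuation(omega, R, L, C):
    """Attenuation constant alpha(omega) of an RLC line, Eq. (1), in 1/cm."""
    x = (R / (omega * L)) ** 2
    # sqrt(1 + x) - 1 is rewritten as x / (sqrt(1 + x) + 1) to avoid
    # floating-point cancellation when x is small (high frequency, low loss).
    return omega * math.sqrt(L * C) * math.sqrt(0.5 * x / (math.sqrt(1.0 + x) + 1.0))


def alpha_asym(R, L, C):
    """High-frequency asymptote of Eq. (1), given by Eq. (3), in 1/cm."""
    return 0.5 * R * math.sqrt(C / L)


def t_pd(R, L, C, length):
    """Delay of an RLC line with an ideal source and open-circuit load, Eq. (2), in s."""
    a = alpha_asym(R, L, C)
    return math.sqrt(L * C) * (
        math.exp(-2.9 * (a * length) ** 1.35) * length + 0.74 * a * length ** 2
    )


if __name__ == "__main__":
    L, C = 10e-9, 1e-12  # 10 nH/cm and 1 pF/cm, as in Fig. 1
    for R in (10.0, 50.0, 100.0, 200.0, 400.0):
        # alpha at 10 Grad/s compared with its high-frequency limit
        print(f"R={R:5.0f} ohm/cm  alpha(1e10)={attenuation(1e10, R, L, C):.4f}/cm"
              f"  alpha_asym={alpha_asym(R, L, C):.4f}/cm")
    # Delay growth when the line length doubles (exactly 4 for a pure RC line)
    R = 100.0
    print(t_pd(R, L, C, 2.0) / t_pd(R, L, C, 1.0))
```

For R = 100 Ω/cm the last line prints a ratio well below 4; this sub-quadratic growth of the delay with length is what lowers the repeater area needed for minimum delay.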
In the general case of an RLC line, the optimal repeater area for minimum propagation delay lies between the maximum repeater area of the RC case and the zero repeater area of the LC case. The repeater area for minimum propagation delay of an RLC line decreases as the inductance effects increase due to the sub-quadratic dependence of the propagation delay on the length of the interconnect [25]. Hence, inserting repeaters based on an RC model that neglects inductance results in a larger repeater area than necessary to achieve minimum delay. The magnitude of the excess repeater area when using an RC model depends upon the relative magnitude of the inductance within the RLC tree. The reduced number of inserted repeaters also simplifies the layout and routing constraints. Finally, the smaller repeater area greatly reduces the power consumed by the repeaters. A more thorough analysis of the effect of inductance on the repeater insertion process is described in [25].

### C. Effects of Inductance on Power Dissipation

Power consumption is an increasingly important design parameter in mobile systems and in high performance, high complexity circuits such as leading edge microprocessors. If the switching frequency is f cycles per second, the dynamic power consumption is characterized by the well-known expression

$$P_{dyn} = C_t V_{DD}^2 f, \tag{4}$$

where $C_t$ is the total load capacitance and $V_{DD}$ is the supply voltage. The dynamic power therefore depends only on the total load capacitance, the supply voltage, and the operating frequency. As discussed in subsection B, increasing inductance effects result in fewer repeaters as well as smaller repeaters. The smaller size and number of repeaters significantly reduce the total capacitance of the repeaters and, consequently, the total dynamic power consumption.
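Equation (4) can be cross-checked against the energy integral used in section III, where the energy consumed by a gate is the time integral of $V_{DD}$ times the supply current. The Python sketch below is illustrative and rests on assumptions that are not in the paper: an ideal supply charges a capacitor from 0 V through a hypothetical resistance `r`, and the integral is evaluated with the trapezoidal rule. The supply delivers $C V_{DD}^2$ per charging event regardless of `r`; half of it is stored on the capacitor and dissipated on the following discharge, so one charge and discharge per cycle gives (4). The 250 fF, 1.8 V, and 250 MHz values are borrowed from the clock network example of section III.

```python
import math


def supply_energy(v_dd, r, c, t_end, steps=20_000):
    """Energy drawn from an ideal supply while it charges c through r, starting at 0 V.

    Integrates V_DD * i(t) over [0, t_end] with the trapezoidal rule, where
    i(t) = (V_DD / r) * exp(-t / (r * c)) is the RC charging current.
    """
    tau = r * c
    dt = t_end / steps
    energy = 0.0
    for k in range(steps):
        i0 = v_dd / r * math.exp(-k * dt / tau)
        i1 = v_dd / r * math.exp(-(k + 1) * dt / tau)
        energy += v_dd * 0.5 * (i0 + i1) * dt
    return energy


def dynamic_power(c_t, v_dd, f):
    """Eq. (4): P_dyn = C_t * V_DD**2 * f."""
    return c_t * v_dd ** 2 * f


if __name__ == "__main__":
    c_load, v_dd, f_clk = 250e-15, 1.8, 250e6  # values from the section III example
    for r in (100.0, 1e3, 10e3):  # hypothetical driver resistances
        e = supply_energy(v_dd, r, c_load, t_end=20 * r * c_load)
        print(f"r = {r:8.0f} ohm: {e * 1e12:.4f} pJ drawn from the supply")
    print(f"C V^2 = {c_load * v_dd ** 2 * 1e12:.4f} pJ, "
          f"P_dyn = {dynamic_power(c_load, v_dd, f_clk) * 1e3:.4f} mW")
```

Integrating over 20 time constants captures all but a negligible fraction of the charge, so each resistance yields about 0.81 pJ, equal to $C V_{DD}^2$, and (4) gives about 0.20 mW at 250 MHz.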
Short-circuit power [26]-[28] results from the NMOS and PMOS blocks of a CMOS gate being on simultaneously during the rise and fall times of the input signal, creating a current path between the power supply and ground. As discussed in subsection A, inductance reduces the rise time of the signals in an integrated circuit, thereby reducing the short-circuit power. To quantify this effect, consider the circuit configuration shown in Fig. 3. A fast input signal drives CMOS gate 1, which in turn drives an *RLC* transmission line. The signal at the far end of the *RLC* transmission line is the input to gate 2, which drives a capacitive load. The short-circuit energy consumed by gate 2 per cycle is plotted in Fig. 4. Note that as inductance effects increase, the short-circuit power decreases significantly due to the faster input rise time.

Fig. 3. A CMOS gate driving another CMOS gate with an RLC transmission line connecting the two gates. The second gate drives a capacitive load.

The effect of smaller repeater sizes on the short-circuit power consumption is significant. Decreasing the width of the transistors decreases the short-circuit current, because the output current of the NMOS and PMOS transistors is linearly proportional to the transistor width. Decreasing the transistor width also slows the output transition, which reduces the source-to-drain voltage across the transistor conducting the short-circuit current. Since the current of a MOS transistor operating in the linear region is proportional to its source-to-drain voltage, the short-circuit current decreases further with decreasing transistor size. Decreasing the transistor size therefore has a twofold effect on the short-circuit power. In general, the short-circuit power has a superlinear dependence on gate size.
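The dependence of short-circuit power on the input rise time can be seen in the classic closed-form estimate of Veendrick [26]. This is a different and much simpler model than the AS/X simulations behind Figs. 4 and 5: it assumes an unloaded, symmetric CMOS inverter with equal NMOS and PMOS gain factors $\beta$, threshold voltages of equal magnitude $V_T$, and linear input ramps of duration $\tau$, which give a short-circuit energy per cycle (one rising and one falling input edge) of $E_{sc} = \frac{\beta}{12}(V_{DD} - 2V_T)^3 \tau$. The Python sketch below evaluates it; the function name and the device parameters are hypothetical.

```python
def short_circuit_energy(beta, v_dd, v_t, tau):
    """Veendrick's no-load estimate of short-circuit energy per clock cycle (J).

    beta: gain factor of each transistor (A/V^2), assumed equal for NMOS and PMOS
    v_dd: supply voltage (V); v_t: threshold voltage magnitude (V), assumed equal
    tau:  duration of each linear input ramp (s); one rising and one falling edge per cycle
    """
    overdrive = v_dd - 2.0 * v_t
    if overdrive <= 0.0:
        # NMOS and PMOS are never on at the same time, so no short-circuit path exists
        return 0.0
    return beta / 12.0 * overdrive ** 3 * tau


if __name__ == "__main__":
    beta, v_dd, v_t = 1e-3, 1.8, 0.45  # hypothetical device parameters
    for tau in (1.2e-9, 200e-12):  # slow versus fast input edges
        e = short_circuit_energy(beta, v_dd, v_t, tau)
        print(f"tau = {tau * 1e12:6.0f} ps: {e * 1e12:.4f} pJ/cycle")
```

In this idealized model the energy is proportional to $\tau$, so shortening an input edge from 1.2 ns to 200 ps cuts the estimate by a factor of six. Because the model ignores output loading, it is only linear in $\beta$ and does not reproduce the superlinear width dependence described above.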
AS/X [17] simulations of the short-circuit energy per cycle versus the gate size of a CMOS gate driving a constant capacitance of 0.2 pF with a 100 ps input rise time are depicted in Fig. 5. The superlinear behavior is evident in Fig. 5. Thus, the smaller size and number of repeaters significantly reduce the overall short-circuit power. As shown in [28], the short-circuit power consumption of a CMOS gate decreases as the inductance of the driven net becomes more significant.

Fig. 4. Short-circuit energy consumed per cycle by gate 2 shown in Fig. 3 versus the inductance of the transmission line. The total resistance and capacitance of the line are maintained constant at 100 Ω and 1 pF, respectively.

Fig. 5. Simulations of the short-circuit energy consumed per cycle by a gate driving a load capacitance of 0.2 pF versus the gate width. The rise time of the input signal is 100 ps.

## III. Clock Distribution Network Example

The clock distribution network significantly affects the performance of an integrated circuit and consumes a large portion of the total chip power (typically 20% to 40%) [29]-[31]. To demonstrate the concepts discussed in section II, an industrial clock distribution network is investigated. The integrated circuit has been designed in a 0.18 µm IBM CMOS technology with copper interconnect. The supply voltage is 1.8 volts and the target frequency is 250 MHz. The integrated circuit is composed of four primary modules and several smaller modules. At the top level, the clock distribution network consists of a wide buffer that drives a four-node H-tree carrying the clock signal to the center of each of the four quadrants of the integrated circuit. At the center of each quadrant, a local central wide buffer receives the clock signal and drives the local clock distribution network of that quadrant. Each local central buffer drives a clock tree connected to an average of 1350 sinks.
At each sink, a final buffer (a CMOS inverter) receives the clock signal and drives the final group of flip-flops. Each of the final buffers drives a capacitive load of approximately 250 fF. The structure of the local clock distribution network of this module is schematically depicted in Fig. 6.

The top level and local clock distribution networks were initially simulated with wires sized to satisfy the design constraints of the clock tree. The transition time at the input of the latches is designed to be within 5% of the clock period, and the clock delay (or phase delay) must be less than the clock period. AS/X [17] simulated waveforms at the input of the central buffer and at the inputs of the final buffers are shown in Fig. 7. The initial wire sizes result in degraded signal waveforms on the internal nodes of the clock distribution network; note that the transition time of the signals illustrated in Fig. 7 is greater than 1 ns. The final buffers restore the signal rise time to 200 ps at the inputs of the local flip-flops. This faster rise time is necessary to maintain stable operation of the flip-flops. Thus, the clock distribution network satisfies the target cycle period of 4 ns. Note also that this clock distribution network exhibits no significant inductance effects; therefore, an RC model accurately represents the clock distribution network.

Fig. 6. Local clock distribution network of a primary quadrant of a large integrated circuit.

Fig. 7. AS/X [17] simulations of the signals at the input of the central buffer, at the inputs of the final buffers, and at the outputs of the final buffers for the local clock distribution network shown in Fig. 6 with narrow wires.
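The transition-time constraint above reduces to simple arithmetic: at 250 MHz the clock period is 4 ns, so 5% of it is 200 ps, which matches the 200 ps rise time restored by the final buffers. The Python sketch below measures the rise time of a sampled waveform (for example, one exported from a circuit simulator) and compares it with that budget. The 10%-90% thresholds, the synthetic linear ramps, and all function names are assumptions for illustration; the paper does not state how the transition time is measured.

```python
def crossing_time(t, v, level):
    """First time a sampled, rising waveform reaches `level` (linear interpolation)."""
    for k in range(1, len(t)):
        if v[k - 1] < level <= v[k]:
            frac = (level - v[k - 1]) / (v[k] - v[k - 1])
            return t[k - 1] + frac * (t[k] - t[k - 1])
    raise ValueError("waveform never rises through the requested level")


def rise_time(t, v, v_dd, lo=0.1, hi=0.9):
    """Rise time between lo*V_DD and hi*V_DD (10%-90% by default, an assumed definition)."""
    return crossing_time(t, v, hi * v_dd) - crossing_time(t, v, lo * v_dd)


def meets_transition_budget(t, v, v_dd, f_clk, fraction=0.05):
    """True if the rise time is within `fraction` of the clock period 1/f_clk."""
    return rise_time(t, v, v_dd) <= fraction * (1.0 / f_clk)


def ramp(t_rise, v_dd=1.8, dt=1e-12, t_stop=2e-9):
    """Synthetic linear ramp from 0 V to v_dd over t_rise, sampled every dt."""
    n = int(round(t_stop / dt))
    t = [k * dt for k in range(n + 1)]
    return t, [min(v_dd, v_dd * tk / t_rise) for tk in t]


if __name__ == "__main__":
    budget = 0.05 / 250e6  # 5% of the 4 ns period at 250 MHz
    for t_rise in (1.2e-9, 200e-12):
        t, v = ramp(t_rise)
        ok = meets_transition_budget(t, v, 1.8, 250e6)
        print(f"{t_rise * 1e12:6.0f} ps ramp: 10-90% rise = "
              f"{rise_time(t, v, 1.8) * 1e12:.0f} ps, within {budget * 1e12:.0f} ps: {ok}")
```

A linear ramp lasting 1.2 ns has a 10%-90% rise time of 960 ps and violates the budget, while a 200 ps ramp (160 ps from 10% to 90%) satisfies it.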
The power consumption of the clock distribution network, however, is excessively high due to the slow signal transition times at the inputs of the central buffer and the final buffers. AS/X simulations of the dynamic and short-circuit power consumption of the central buffer are shown in Fig. 8. The current depicted in Fig. 8 is drawn from the supply voltage $V_{DD}$ through the PMOS network. When the output is pulled down, this current is the short-circuit current. When the output is pulled high, the current drawn from the power supply is the sum of the short-circuit current (through the N-channel transistor) and the dynamic current charging the output capacitance. The energy curve shown in Fig. 8 is the time integral of the supply current multiplied by the supply voltage and represents the cumulative energy consumed by the gate up to any given time. Note in Fig. 8 that the short-circuit power is much higher than the dynamic power consumption, constituting about 80% of the total power consumption of the central buffer. This large short-circuit current directly contradicts the common assumption that short-circuit power contributes less than 20% of the total power consumption [26], [27]. The 20% figure typically holds when the input and output rise times are close to each other. In this clock distribution network example, however, the input rise time is extremely slow (> 1 ns), while the final buffers provide sufficient current to produce fast rise times of 200 ps at the inputs of the flip-flops. The dynamic and short-circuit power consumption of the central buffer and the final buffers are listed separately in Table 3. Note again that the short-circuit power dominates the dynamic power for the final buffers.

Table 3. Dynamic and short-circuit energy of the central buffer and the final buffers in the local clock distribution network depicted in Fig.
6 with narrow wires.

| | Dynamic energy (pJ/cycle) | Short-circuit energy (pJ/cycle) | Total energy (pJ/cycle) |
|---------------------|------|------|------|
| Central buffer | 68 | 133 | 201 |
| Single final buffer | 0.71 | 1.86 | 2.57 |
| All final buffers | 958 | 2511 | 3469 |
| Local CDN (total) | 1026 | 2644 | 3670 |

Fig. 8. AS/X [17] simulations of the dynamic current, short-circuit current, and energy of the central buffer in the local clock distribution network depicted in Fig. 6 with narrow wires.

To decrease the power dissipated by the final buffers, the clock distribution network is rerouted with wires twice as wide as the original wires. The transition time of the signals at the inputs of the central buffer and the final buffers is then below 200 ps. The short-circuit and dynamic power of the central buffer are shown in Fig. 9. Note that the dynamic power consumption of the central buffer has increased due to the larger capacitance of the wider wires driven by the central buffer. However, the faster input transition time has effectively eliminated the short-circuit power, reducing the total power consumption of the central buffer. The short-circuit power consumed by the final buffers is also virtually eliminated, while their dynamic power remains constant since the load of the final buffers has not changed. The power consumption of the redesigned clock distribution network is compared to that of the original clock distribution network in Table 4. Note that the effects of inductance become significant with the wider wires, requiring that inductance be included in the interconnect model. This example illustrates that exploiting inductance can improve the performance of an integrated circuit, and that penalties in transition time, propagation delay, and/or power consumption are incurred if these effects are eliminated.

Table 4.
The power consumption of the central buffer, the final buffers, and the clock distribution network shown in Fig. 6 when wider wires are used, as compared to a narrow wire implementation.

| Total energy (pJ/cycle) | Old design (narrow wires) | New design (wider wires) | Power savings |
|-------------------|------|------|-------|
| Central buffer | 201 | 137 | 31.8% |
| All final buffers | 3469 | 1445 | 58.3% |
| Local CDN | 3670 | 1582 | 56.9% |

Fig. 9. AS/X [17] simulations of the dynamic current, short-circuit current, and energy of the central buffer in the local clock distribution network depicted in Fig. 6 with wider wires.

## IV. Summary

It is shown in this paper that on-chip inductance can be exploited to improve the performance of high speed integrated circuits. Specifically, inductance improves the signal slew rate, dramatically reduces the short-circuit power consumption, and decreases the area of the active repeaters inserted to optimize the performance of long interconnects. These beneficial effects encourage design strategies that exploit on-chip inductance. AS/X simulations of an industrial clock distribution network are presented to illustrate how inductance can be used to improve the performance of high speed integrated circuits. When wider, more inductive wires are used, the power consumption of the clock distribution network decreases from 3670 pJ/cycle to 1582 pJ/cycle, and the transition time on the internal nodes of the clock distribution network decreases from 1.2 ns to 200 ps.

## References

- [1] D. A. Priore, "Inductance on Silicon for Sub-Micron CMOS VLSI," *Proceedings of the IEEE Symposium on VLSI Circuits*, pp. 17-18, May 1993.
- [2] D. B. Jarvis, "The Effects of Interconnections on High-Speed Logic Circuits," *IEEE Transactions on Electronic Computers*, Vol. EC-10, No. 4, pp. 476-487, October 1963.
- [3] M. P. May, A. Taflove, and J.
Baron, "FD-TD Modeling of Digital Signal Propagation in 3-D Circuits with Passive and Active Loads," *IEEE Transactions on Microwave Theory and Techniques*, Vol. MTT-42, No. 8, pp. 1514-1523, August 1994.
- [4] Y. Eo and W. R. Eisenstadt, "High-Speed VLSI Interconnect Modeling Based on S-Parameter Measurement," *IEEE Transactions on Components, Hybrids, and Manufacturing Technology*, Vol. CHMT-16, No. 5, pp. 555-562, August 1993.
- [5] A. Deutsch, et al., "High-Speed Signal Propagation on Lossy Transmission Lines," *IBM Journal of Research and Development*, Vol. 34, No. 4, pp. 601-615, July 1990.
- [6] A. Deutsch, et al., "Modeling and Characterization of Long Interconnections for High-Performance Microprocessors," *IBM Journal of Research and Development*, Vol. 39, No. 5, pp. 547-667, September 1995.
- [7] A. Deutsch, et al., "When are Transmission-Line Effects Important for On-Chip Interconnections?," *IEEE Transactions on Microwave Theory and Techniques*, Vol. 45, No. 10, pp. 1836-1846, October 1997.
- [8] M. Shoji, *High-Speed Digital Circuits*, Addison-Wesley, Massachusetts, 1996.
- [9] A. Deutsch, A. Kopcsay, and G. V. Surovic, "Challenges Raised by Long On-Chip Wiring for CMOS Microprocessors," *Proceedings of the IEEE Topical Meeting on Electrical Performance of Electronic Packaging*, pp. 21-23, October 1995.
- [10] Y. Massoud, S. Majors, T. Bustami, and J. White, "Layout Techniques for Minimizing On-Chip Interconnect Self Inductance," *Proceedings of the IEEE/ACM Design Automation Conference*, pp. 566-571, June 1998.
- [11] B. Krauter and S. Mehrotra, "Layout Based Frequency Dependent Inductance and Resistance Extraction for On-Chip Interconnect Timing Analysis," *Proceedings of the IEEE/ACM Design Automation Conference*, pp. 303-308, June 1998.
- [12] A. Deutsch, et al., "Design Guidelines for Short, Medium, and Long On-Chip Interconnect," *Proceedings of the IEEE Topical Meeting on Electrical Performance of Electronic Packaging*, pp.
30-32, October 1996.
- [13] Y. I. Ismail, E. G. Friedman, and J. L. Neves, "Figures of Merit to Characterize the Importance of On-Chip Inductance," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 7, No. 4, pp. 442-449, December 1999.
- [14] L. T. Pillage, "Coping with RC(L) Interconnect Design Headaches," *Proceedings of the IEEE/ACM International Conference on Computer-Aided Design*, pp. 246-253, September 1995.
- [15] J. Torres, "Advanced Copper Interconnections for Silicon CMOS Technologies," *Applied Surface Science*, Vol. 91, No. 1, pp. 112-123, October 1995.
- [16] P. J. Restle and A. Deutsch, "Designing the Best Clock Distribution Network," *Proceedings of the IEEE VLSI Circuit Symposium*, pp. 2-5, June 1998.
- [17] *AS/X User's Guide*, IBM Corporation, New York, 1996.
- [18] L. V. Ginneken, "Buffer Placement in Distributed RC-tree Networks for Minimal Elmore Delay," *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 865-868, May 1990.
- [19] V. Adler and E. G. Friedman, "Delay and Power Expressions for a CMOS Inverter Driving a Resistive-Capacitive Load," *Analog Integrated Circuits and Signal Processing*, Vol. 14, No. 1/2, pp. 29-39, September 1997.
- [20] H. B. Bakoglu and J. D. Meindl, "Optimal Interconnection Circuits for VLSI," *IEEE Transactions on Electron Devices*, Vol. ED-32, No. 5, pp. 903-909, May 1985.
- [21] H. B. Bakoglu, *Circuits, Interconnections, and Packaging for VLSI*, Addison-Wesley Publishing Company, 1990.
- [22] V. Adler and E. G. Friedman, "Repeater Design to Reduce Delay and Power in Resistive Interconnect," *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, Vol. CAS II-45, No. 5, pp. 607-616, May 1998.
- [23] C. J. Alpert and A. Devgan, "Wire Segmenting for Improved Buffer Insertion," *Proceedings of the IEEE/ACM Design Automation Conference*, pp. 649-654, June 1997.
- [24] S. Dhar and M. A.
Franklin, "Optimum Buffer Circuits for Driving Long Uniform Lines," *IEEE Journal of Solid-State Circuits*, Vol. SC-26, No. 1, pp. 32-40, January 1991.
- [25] Y. I. Ismail and E. G. Friedman, "Effects of Inductance on the Propagation Delay and Repeater Insertion in VLSI Circuits," *Proceedings of the IEEE/ACM Design Automation Conference*, pp. 721-724, June 1999.
- [26] H. J. M. Veendrick, "Short-Circuit Dissipation of Static CMOS Circuitry and its Impact on the Design of Buffer Circuits," *IEEE Journal of Solid-State Circuits*, Vol. SC-19, No. 4, pp. 468-473, August 1984.
- [27] S. R. Vemuru and N. Scheinberg, "Short-Circuit Power Dissipation Estimation for CMOS Logic Gates," *IEEE Transactions on Circuits and Systems*, Vol. CAS-41, No. 11, pp. 762-765, November 1994.
- [28] Y. I. Ismail, E. G. Friedman, and J. L. Neves, "Dynamic and Short-Circuit Power of CMOS Gates Driving Lossless Transmission Lines," *IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications*, Vol. CAS-46, No. 8, pp. 950-961, August 1999.
- [29] E. G. Friedman, *Clock Distribution Networks in VLSI Circuits and Systems*, IEEE Press, New Jersey, 1995.
- [30] E. G. Friedman, *High Performance Clock Distribution Networks*, Kluwer Academic Publishers, Massachusetts, 1997.
- [31] I. S. Kourtev and E. G. Friedman, *Timing Optimization Through Clock Skew Scheduling*, Kluwer Academic Publishers, Massachusetts, 2000.