## Repeater Insertion to Reduce Delay and Power in RC Tree Structures

Victor Adler and Eby G. Friedman Department of Electrical Engineering University of Rochester Rochester, New York 14627

Abstract—In large chips, the propagation delay of the data and clock signals is limited due to long resistive interconnect. The proper insertion of repeaters alleviates the quadratic increase in propagation delay with interconnect length while decreasing power dissipation by reducing short-circuit current. These repeaters are inserted within different types of common resistive interconnect structures, such as a line or a tree. In this paper, the application of repeaters to RC tree structures is discussed.

A tree topology is a common interconnect structure frequently found in VLSI circuits. A short-channel transistor model is used as a foundation for the development of delay and power expressions to develop a design methodology for inserting repeaters into an RC tree network. Power dissipation expressions for these repeater structures are presented which consider both dynamic and short-circuit power. These design expressions are validated against simulated experiments with a maximum 11% and 16% deviation from SPICE for delay and power, respectively.

## I. Introduction

Interconnect delay has become a dominant performance limitation in VLSI circuit design. A common method of driving long interconnect is to insert a buffer at the beginning and the end of the interconnect line to improve the delay and slew rate of the signal. This method, however, does not produce the most effective delay results and may introduce a large amount of short-circuit power.

Bakoglu presented a methodology for inserting repeaters to overcome the quadratic increase in delay due to a linear increase in interconnect length so that the RC interconnect impedances do not dominate the delay of a critical path [1]. Extensions to this repeater insertion methodology have also been described in [2, 3]. In this paper, the propagation delay and transition time characteristics of a system of repeaters driving an RC tree structure are analyzed. Expressions are presented which permit the development of a repeater design methodology for efficiently driving an RC tree structure, such as a clock tree, so as to reduce both delay and slew rate. In this methodology, the number and size of the repeaters to minimize propagation delay and transition time are determined. The design expressions are based on an analytical expression derived from the $\alpha$-power law model for short-channel devices [4, 5].

In addition to delay, the introduction of portable and massively parallel applications has made power an important criterion in the circuit design process. Thus, power consumption must be both accurately estimated and minimized when developing design techniques that decrease the delay of a signal propagating through a long resistive interconnect. The power dissipation of a repeater system inserted into an RC tree is therefore an important issue. Analytical expressions for dynamic and short-circuit power dissipation are applied to an example RC tree. Furthermore, the relative contribution of short-circuit power versus dynamic power is also discussed.

The paper is organized as follows. In Section II, the time response of a repeater driving a lumped RC load and a method to determine the signal delay through an RC tree are presented. A comparison of the analytical model with circuit simulation is presented in Section III, together with a comparison of the efficiency of repeaters versus buffers in driving resistive interconnect. Power dissipation in RC trees is discussed in Section IV. Finally, some concluding comments are offered in Section V.

## II. Analytical Delay Model for RC Trees

An analytical model for determining the delay and placement of uniformly sized and spaced repeaters in RC trees based on Sakurai's  $\alpha$ -power law is presented in this section [4–8]. This model assumes that the transistor operates in the linear region when driving an RC load since the linear region is the dominant region of operation when operating with fast input signals.

The structure of an RC tree is composed of a primary trunk with branching points. Each branch is modeled as a lumped resistance and capacitance, as shown in Fig. 1. The total delay is from the signal input of the trunk to each end point of the tree (or leaf node).

The method presented here for optimizing delay in an RC tree with uniform repeaters is performed in a stepwise fashion. With the assumption that each branch has a repeater at its source, each branch is first optimized to minimize delay, and the path from the trunk to the leaf is then minimized. The method for optimization, therefore, is depth-first, in which the lowest level branches are optimized first, moving towards the trunk of the RC tree.

This research was supported in part by the National Science Foundation under Grant No. MIP-9208165, Grant No. MIP-9423886, and Grant No. MIP-9610108, the Army Research Office under Grant No. DAAH04-93-G-0323, a grant from the New York State Science and Technology Foundation to the Center for Advanced Technology-Electronic Imaging Systems, and by grants from Xerox, IBM, and Intel.

The time required to drive a branch of an RC tree using uniform repeaters is

$$t_{branch} = t_{first\ stage} + (n-2)\,t_{int.\ stage} + t_{final\ stage} . \tag{1}$$

The first term  $t_{first\ stage}$  is the time required for the output of the first repeater to reach the turn-on voltage of the second repeater assuming the output voltage is initially at  $V_{DD}$  given a step input. The term  $t_{int.\ stage}$  describes the time required for each repeater between the first and last stage to transition from  $V_{DD} + V_{TP}$  to  $V_{TN}$  or vice versa. The time required for the output of the final repeater to reach either 10%, 90%, or 50% of  $V_{DD}$  from one threshold voltage is described by the third component of (1),  $t_{final\ stage}$  [7].

The terms  $t_{first\ stage}$ ,  $t_{int.\ stage}$ , and  $t_{final\ stage}$  are based on the expression for the delay of a CMOS inverter reaching an output voltage  $V_{out}$ ,

$$t_{out} = \frac{(1 + \mho_{do}R)(C_{rep} + C_{int})}{\mho_{do}} \ln \left(\frac{V_{DD}}{V_{out}}\right) . \tag{2}$$

 $\mho_{do}$  is the saturation conductance, a device parameter from the  $\alpha$ -power law model derived from  $\frac{I_{do}}{V_{do}}$ .  $I_{do}$  is the saturation current of the device when  $V_{DS} = V_{DD}$ .  $V_{do}$  is the voltage at which the device begins to operate in the saturation region [4, 6].  $C_{rep}$  and  $C_{int}$  are the capacitances of the following inverting repeater and the interstage load capacitance, respectively.
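
As a minimal sketch of how (2) can be evaluated numerically, the function below computes $t_{out}$ directly from the symbols defined above. The function name, argument names, and the numerical values in the usage line are illustrative assumptions and are not taken from the paper.

```python
import math

def t_out(u_do, r, c_rep, c_int, v_dd, v_out):
    """Evaluate eq. (2): time for the output to reach v_out.

    u_do  : saturation conductance I_do / V_do, in siemens
    r     : interconnect resistance R, in ohms
    c_rep : input capacitance of the following repeater, in farads
    c_int : interstage load capacitance, in farads
    """
    # Prefactor of eq. (2), which has units of seconds.
    tau = (1.0 + u_do * r) * (c_rep + c_int) / u_do
    return tau * math.log(v_dd / v_out)

# Hypothetical device and load values, chosen only to exercise the formula.
delay_to_half = t_out(u_do=2e-3, r=500.0, c_rep=50e-15, c_int=450e-15,
                      v_dd=5.0, v_out=2.5)
```

With these assumed values the prefactor is 0.5 ns, so the delay to half of $V_{DD}$ is $0.5\ln 2$ ns.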

A plot of  $t_{branch}$  derived from (1) versus the size and number of repeater stages n in a branch is shown in Fig. 2. The optimal implementation of the number and size of the repeater system for a specific RC load is represented by the minimum point on the graph. A similar graph can be drawn for any RC branch load. The optimal number of repeaters inserted within a branch to minimize delay is determined from a numerical solution of the data illustrated in Fig. 2.
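
The numerical search over the number and size of repeaters can be sketched as a simple grid search over (1). The sketch below rests on several assumptions that are not stated in the paper: the repeater conductance and input capacitance scale linearly with a size factor `h`, the `n` repeaters divide the branch resistance and capacitance evenly, the final stage sees the same load as an intermediate stage, the final stage is measured to 50% of $V_{DD}$, and the time to move between two voltages follows from (2) as the prefactor times the logarithm of the voltage ratio. All names and parameter values are hypothetical.

```python
import math

def t_branch(n, h, r, c, u0, c0, v_dd, v_tn, v_tp):
    """Eq. (1) for n >= 2 uniform repeaters of size h on a branch (total R, C)."""
    u_do = h * u0                    # assumed: conductance scales with size
    c_rep = h * c0                   # assumed: input capacitance scales with size
    r_seg, c_seg = r / n, c / n      # assumed: repeaters split the branch evenly
    tau = (1.0 + u_do * r_seg) * (c_rep + c_seg) / u_do   # prefactor of eq. (2)
    v_on_p = v_dd + v_tp             # v_tp is negative for a P-channel device
    t_first = tau * math.log(v_dd / v_on_p)       # V_DD down to V_DD + V_TP
    t_int = tau * math.log(v_on_p / v_tn)         # V_DD + V_TP down to V_TN
    t_final = tau * math.log(v_on_p / (0.5 * v_dd))  # threshold to 50% of V_DD
    return t_first + (n - 2) * t_int + t_final

def best_design(r, c, u0, c0, v_dd, v_tn, v_tp, n_max=10, h_max=60):
    """Return (delay, n, h) with the smallest delay on an integer grid."""
    candidates = ((t_branch(n, h, r, c, u0, c0, v_dd, v_tn, v_tp), n, h)
                  for n in range(2, n_max + 1)
                  for h in range(1, h_max + 1))
    return min(candidates)

# Hypothetical device parameters for the 1 kOhm, 1 pF branch of Fig. 2.
params = dict(r=1e3, c=1e-12, u0=1e-4, c0=5e-15, v_dd=5.0, v_tn=0.8, v_tp=-0.8)
t_min, n_opt, h_opt = best_design(**params)
```

The location of the minimum in this sketch depends strongly on the assumed device parameters and stage voltages, so it should not be expected to reproduce the surface in Fig. 2.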

Because the repeater insertion for each branch is determined recursively in a depth-first manner, if a branch has any sub-branches, the repeater insertion for these sub-branches is first determined. This depth-first method accounts for the load capacitance that the last repeater of an upstream node must drive due to the capacitance of the repeaters of the downstream branches (in Fig. 1, branch 1 is downstream from branch 4).
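
The depth-first ordering can be illustrated with a short post-order traversal. This is only a sketch: the `Branch` class, the `size_branch` callback, and the toy sizing rule are hypothetical stand-ins for the per-branch optimization of (1), and the three-branch tree is not the tree of Fig. 1.

```python
from dataclasses import dataclass, field

@dataclass
class Branch:
    name: str
    r: float                      # lumped branch resistance, in ohms
    c: float                      # lumped branch capacitance, in farads
    children: list = field(default_factory=list)

def insert_repeaters(branch, size_branch, order):
    """Size all sub-branches first, then the branch that drives them.

    Returns the input capacitance of the first repeater of `branch`, which
    is the extra load seen by the last repeater of the upstream branch.
    """
    downstream_c = sum(insert_repeaters(child, size_branch, order)
                       for child in branch.children)
    c_in = size_branch(branch, downstream_c)
    order.append(branch.name)     # record the order in which branches are sized
    return c_in

def toy_size_branch(branch, downstream_c):
    # Placeholder rule (an assumption): a real implementation would choose
    # the number and size of repeaters that minimize eq. (1) for this load.
    return 0.1 * (branch.c + downstream_c)

leaf_a = Branch("leaf_a", 500.0, 0.5e-12)
leaf_b = Branch("leaf_b", 800.0, 0.3e-12)
trunk = Branch("trunk", 1000.0, 1.0e-12, [leaf_a, leaf_b])

order = []
c_root = insert_repeaters(trunk, toy_size_branch, order)
```

Because the recursion returns before a parent is sized, the parent always knows the repeater capacitance presented by its downstream branches.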



Fig. 1. An example of an RC tree. Circled numbers are used to identify specific branches (note that the downstream nodes are to the right of the upstream nodes).



Fig. 2. Total delay for an  $R=1~{\rm k}\Omega,~C=1~{\rm pF}$  branch as a function of the number of repeaters and repeater sizes.

## III. Repeater Insertion and Comparison to Tapered Buffers

Fig. 3 shows the RC tree of Fig. 1 with the appropriate repeaters inserted. Table I compares the analytically derived total delay $t_{total}$ from the input of the RC tree (the root of the trunk node) to the output leaf node of each branch against the numerically derived delay values from SPICE. Note that the error of the analytical prediction versus SPICE for this example RC tree ranges from zero to a maximum of 11%.

A comparison is made here between the proposed repeater system and a typical system of cascaded buffers inserted at the source of each branch. The buffer system used for comparison is a series of optimally tapered buffers (assuming a tapering factor of three [9–11]) placed at the input of each branching section so as to drive the capacitive load of each branch.

TABLE I. COMPARISON OF THE ANALYTICAL EXPRESSION VS. SPICE FOR THE TOTAL DELAY OF EACH BRANCH.

| Branch | $t_{total}$ (ns)<br>Analytical | $t_{total}$ (ns)<br>SPICE | % error |
|--------|--------------------------------|---------------------------|---------|
| 1      | 1.99                           | 2.00                      | 0       |
| 2      | 1.98                           | 1.95                      | 2       |
| 3      | 2.11                           | 2.05                      | 3       |
| 4      | 1.98                           | 1.78                      | 11      |
| 5      | 1.24                           | 1.31                      | 5       |
| 6      | 1.77                           | 1.72                      | 3       |
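
The percentage errors in Table I can be rechecked with a few lines of arithmetic. The sketch below assumes the error is taken relative to the SPICE value, which is consistent with the tabulated figures but is not stated explicitly in the paper.

```python
# (analytical delay, SPICE delay) in ns for each branch, copied from Table I.
table_1 = {
    1: (1.99, 2.00),
    2: (1.98, 1.95),
    3: (2.11, 2.05),
    4: (1.98, 1.78),
    5: (1.24, 1.31),
    6: (1.77, 1.72),
}

# Assumed definition: |analytical - SPICE| / SPICE, expressed in percent.
errors = {branch: abs(analytical - spice) / spice * 100.0
          for branch, (analytical, spice) in table_1.items()}

worst_branch = max(errors, key=errors.get)
worst_error = errors[worst_branch]
```

Under this definition the largest deviation occurs at branch 4, at roughly 11%.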

The waveforms at the final branch output of the repeater system and the optimally tapered buffer system are shown in Fig. 4. The performance improvement of the repeater system over the tapered buffer system for this example RC tree is in the range of 30 to 50%. The buffer system does not drive the highly resistive lines effectively, hence longer than expected propagation delays and slower rise times are generated, particularly for branch 2.

## IV. Power Dissipation of Repeaters in RC Trees

Transient power dissipation in repeaters can be broken into two components: the dynamic power dissipated by switching the capacitance of the interconnect and the repeaters and the short-circuit power dissipated when an input signal simultaneously turns on both the P-channel and N-channel transistors [12]. Both of these power components are examined in this section.



Fig. 4. The delay from the input of the RC tree to the final branch outputs using the proposed repeater system versus using optimally tapered buffers. Circled numbers indicate the leaf nodes shown in Fig. 3.

The dynamic power dissipation is quantified by

$$CV^2f, \tag{3}$$

where V is the voltage to which the capacitance is switched, typically  $V_{DD}$ , f is the frequency of the switching activity, and C is the total capacitance being charged and discharged. In the case of a repeater system driving an RC tree, C is the sum of the capacitance of the RC tree plus the sum of the gate and active diffusion capacitances of the transistors within the repeater system.

An expression for the short-circuit power of a CMOS inverter can be approximated by [8]

$$P_{SC} = \frac{1}{2} I_{peak} t_{base} V_{DD} f \quad . \tag{4}$$

Fig. 3. The RC tree shown in Fig. 1 implemented with the synthesized repeater system. The transistor widths are shown below the first repeater of each branch, and the number of repeaters per branch is shown inside the last repeater of the branch.

$I_{peak}$ is the maximum short-circuit current sourced by the inverter. $t_{base}$ is the time that the input waveform is switching from the threshold voltage of the P-channel transistor to the threshold voltage of the N-channel transistor and is [8]

$$t_{base} = \left| \ln\left(\frac{V_{TN}}{V_{DD} + V_{TP}}\right) \right| \frac{C + \mho_{do}RC}{\mho_{do}} . \tag{5}$$

Therefore, the short-circuit power is

$$P_{SC} = \frac{1}{2} \left| \ln\left(\frac{V_{TN}}{V_{DD} + V_{TP}}\right) \right| \frac{C + \mho_{do}RC}{\mho_{do}} I_{peak} f V_{DD} . \tag{6}$$

The value of  $I_{peak}$  is based on (12) from [13] and is

$$I_{peak} = I_{DSAT} \left(2 - \frac{V_{DD} - V_O(t_{INV})}{V_{DSAT_p}}\right) \left(\frac{V_{DD} - V_O(t_{INV})}{V_{DSAT_p}}\right). \tag{7}$$

 $I_{DSAT}$  is the saturation current at the saturation voltage  $V_{DSAT}$ .  $V_O(t_{INV})$  is the output voltage when the input reaches the logic threshold voltage  $V_{INV}$ . In a uniform repeater structure, the short-circuit power in each repeater stage within a branch is the same because the transition times of the waveforms between each repeater are designed to be equal.
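
The short-circuit power of a single repeater stage can be evaluated by chaining (5), (7), and (4). The following is a sketch only: the function names and every numerical value are hypothetical, chosen to exercise the formulas rather than to represent a particular technology.

```python
import math

def t_base(v_dd, v_tn, v_tp, u_do, r, c):
    """Eq. (5): time the input spends between the two threshold voltages."""
    return abs(math.log(v_tn / (v_dd + v_tp))) * (c + u_do * r * c) / u_do

def i_peak(i_dsat, v_dd, v_o_inv, v_dsat_p):
    """Eq. (7): peak short-circuit current.

    v_o_inv is the output voltage at the moment the input reaches V_INV.
    """
    x = (v_dd - v_o_inv) / v_dsat_p
    return i_dsat * (2.0 - x) * x

def p_sc(i_pk, tb, v_dd, f):
    """Eq. (4): short-circuit power, a triangular current pulse of base tb."""
    return 0.5 * i_pk * tb * v_dd * f

# Hypothetical values (assumptions, not from the paper).
tb = t_base(v_dd=5.0, v_tn=0.8, v_tp=-0.8, u_do=2e-3, r=500.0, c=0.5e-12)
i_pk = i_peak(i_dsat=1e-3, v_dd=5.0, v_o_inv=4.5, v_dsat_p=2.0)
power = p_sc(i_pk, tb, v_dd=5.0, f=100e6)
```

For a uniform repeater branch, the same per-stage value would apply to every stage, since the interstage transition times are designed to be equal.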

The total power dissipated by the *RC* tree with repeaters as shown in Fig. 3 is 30.3 mW at a frequency of 100 MHz (as compared to a simulated 36.3 mW). The analytical model of the power dissipation is based on a total switched capacitance of 11.64 pF. A dynamic power of 29 mW and a short-circuit power of 1.3 mW make up the total power that is dissipated. In this example, the short-circuit power is 4.5% of the dynamic power. However, the relative contribution of short-circuit power to the total transient power is dependent on the number of repeaters in each *RC* branch.
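
The reported power figures can be cross-checked with (3). The supply voltage is not stated in this section; the sketch below assumes $V_{DD} = 5$ V, a value chosen here because it reproduces the reported 29 mW of dynamic power from the 11.64 pF of switched capacitance.

```python
V_DD = 5.0            # volts; an assumption, not stated in the paper
C_TOTAL = 11.64e-12   # farads, total switched capacitance from the text
FREQ = 100e6          # hertz

# Eq. (3): dynamic power C * V^2 * f.
p_dynamic = C_TOTAL * V_DD ** 2 * FREQ

# Figures quoted in the text, in watts.
p_dynamic_reported = 29e-3
p_short_circuit = 1.3e-3
p_total_analytical = 30.3e-3
p_total_spice = 36.3e-3

# Short-circuit power as a percentage of dynamic power.
sc_fraction_pct = p_short_circuit / p_dynamic_reported * 100.0

# Deviation of the analytical total from the simulated total, in percent.
deviation_pct = (p_total_spice - p_total_analytical) / p_total_spice * 100.0
```

Under the 5 V assumption the dynamic power evaluates to about 29.1 mW, the short-circuit share to about 4.5%, and the deviation from SPICE to roughly 16%.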

A comparison of short-circuit power versus dynamic power for an RC load of 1 k$\Omega$ and 1 pF is shown in Fig. 5. Both the short-circuit power and the dynamic power dissipated within the repeater chain versus the number of repeaters are shown. For the larger sized repeater, the peak short-circuit power is about 30% of the dynamic power at two stages; at five stages the short-circuit power is 12% of the dynamic power; and at nine stages, about 5%.

## V. Conclusions

Design expressions for determining the optimum number and size of uniform repeaters inserted into an RC tree are presented. Analytical estimates of the total propagation delay for an example tree are within 11% of SPICE. The power dissipation of the inserted repeaters is examined. The analytically derived estimate of the sum of the dynamic and short-circuit power is within 16% of the total power dissipation derived from SPICE. Thus, this paper presents an accurate methodology for inserting repeaters into RC trees that minimizes path delay and power dissipation.



Fig. 5. The short-circuit and dynamic power dissipation versus the number of stages in a repeater system.

## REFERENCES

- [1] H. B. Bakoglu, *Circuits, Interconnections, and Packaging for VLSI*. Addison-Wesley Publishing Company, 1990.
- [2] C. Y. Wu and M. Shiau, "Accurate Speed Improvement Techniques for RC Line and Tree Interconnections in CMOS VLSI," Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 2.1648-2.1651, May 1990.
- [3] M. Nekili and Y. Savaria, "Optimal Methods of Driving Interconnections in VLSI Circuits," Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 21-23, May 1992.
- [4] T. Sakurai and A. R. Newton, "Alpha-Power Law MOSFET Model and its Applications to CMOS Inverter Delay and Other Formulas," *IEEE Journal of Solid-State Circuits*, Vol. SC-25, No. 2, pp. 584-594, April 1990.
- [5] T. Sakurai and A. R. Newton, "A Simple Short-Channel MOSFET Model and its Application to Delay Analysis of Inverters and Series-Connected MOSFETs," Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 105-108, May 1990.
- [6] V. Adler and E. G. Friedman, "Delay and Power Expressions for a CMOS Inverter Driving a Resistive-Capacitive Load," Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 4.101-4.104, May 1996.
- [7] V. Adler and E. G. Friedman, "Repeater Design to Reduce Delay and Power in Resistive Interconnect," Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 2148-2151, June 1997.
- [8] V. Adler and E. G. Friedman, "Delay and Power Expressions for a CMOS Inverter Driving a Resistive-Capacitive Load," Analog Integrated Circuits for Signal Processing, Vol. 14, No. 1/2, pp. 29-40, September 1997.
- [9] R. C. Jaeger, "Comments on 'An Optimized Output Stage for MOS Integrated Circuits'," *IEEE Journal of Solid-State Circuits*, Vol. SC-10, No. 3, pp. 185-186, June 1975.
- [10] B. S. Cherkauer and E. G. Friedman, "A Unified Design Methodology for CMOS Tapered Buffers," *IEEE Transactions on VLSI Systems*, Vol. VLSI-3, No. 1, pp. 99-111, March 1995.
- [11] B. S. Cherkauer and E. G. Friedman, "Design of Tapered Buffers with Local Interconnect Capacitance," *IEEE Journal of Solid-State Circuits*, Vol. SC-30, No. 2, pp. 151-155, February 1995.
- [12] H. J. M. Veendrick, "Short-Circuit Dissipation of Static CMOS Circuitry and Its Impact on the Design of Buffer Circuits," *IEEE Journal of Solid-State Circuits*, Vol. SC-19, No. 4, pp. 468-473, August 1984.
- [13] A. Hirata, H. Onodera, and K. Tamaru, "Estimation of Short-Circuit Power Dissipation for Static CMOS Gates," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol. E79-A, No. 3, pp. 304-311, March 1996.