# Low Power Repeaters Driving RC Interconnects with Delay and Bandwidth Constraints

Guoqing Chen and Eby G. Friedman Department of Electrical and Computer Engineering University of Rochester, Rochester, NY 14627 Email: guchen,friedman@ece.rochester.edu

Abstract—A repeater insertion methodology is presented for achieving the minimum power in an RC interconnect while satisfying delay and bandwidth constraints. With delay constraints, closed form solutions for the minimum power are developed which are within 10% of SPICE. With bandwidth constraints, the minimum power can be achieved with minimum sized repeaters.

## I. INTRODUCTION

Repeater insertion is an efficient method for reducing interconnect delay. The optimal number and size of the repeaters to achieve the minimum delay of an *RC* interconnect has been described in [1]. The size of an optimal repeater is typically much larger than a minimum sized repeater. Since millions of repeaters will be inserted to drive global interconnects in future high complexity circuits [2], significant power will be consumed by these repeaters, particularly if delay-optimal repeaters are used. A power-delay tradeoff is, therefore, necessary to support efficient repeater insertion methodologies [3].

The number and size of the repeaters are determined in [4] to minimize the total interconnect power and area while satisfying a target delay constraint. In [4], only dynamic power is considered. By including both short-circuit and leakage power, a more practical design methodology is presented in [5]. Closed form solutions, however, are not provided.

With on-chip signal frequencies continuously increasing, bandwidth has become another important criterion in interconnect design. In this paper, a new repeater insertion methodology is proposed. As compared with the literature, the contributions of this paper are

1) A new and more accurate delay and transition time model is developed and applied in the repeater insertion methodology.
2) Bandwidth as well as delay constraints are considered.
3) Closed form solutions for the minimum power are provided.

$$\overset{h}{\searrow} \overset{R_t/k, \ C_t/k}{\longrightarrow} \overset{h}{\longrightarrow} \cdots \overset{h}{\longrightarrow} \overset{R_t/k, \ C_t/k}{\longrightarrow} \overset{h}{\longrightarrow} \cdots$$

Fig. 1. Repeater insertion in an RC interconnect.

The paper is organized as follows. By including the effects of the input transition time, a new model of delay and transition time for RC interconnects with repeaters is presented in section II. In section III, the three primary power dissipation sources are reviewed. A more accurate model [6] is also adopted to analyze short-circuit power. In sections IV and V, analytic methods for achieving the minimum power are presented with delay constraints and bandwidth constraints, respectively. Finally, some conclusions are offered in section VI.

## II. DELAY AND TRANSITION TIME MODEL

As shown in Fig. 1, a distributed RC interconnect is evenly divided into k segments by uniform repeaters.  $R_t$ and  $C_t$  are the total resistance and capacitance of the interconnect. The repeaters are h times as large as a minimum sized repeater, with the output resistance  $R_{tr0}/h$ , output capacitance  $hC_{d0}$ , and input capacitance  $hC_{g0}$ , where  $R_{tr0}$ ,  $C_{d0}$ , and  $C_{g0}$  are the output resistance, output capacitance, and input capacitance of a minimum sized repeater, respectively.

In this paper, the repeater is implemented by a CMOS inverter. The inverter is assumed to be symmetric such that the effective output resistance is the same for both rising and falling signal transitions. The Berkeley predictive technology model (BPTM) [7] for a 45 nm printed channel length is used in this paper, which corresponds to the 80 nm technology node described in the ITRS [8]. Some model parameters are modified to capture the trends of the saturated drain current and subthreshold current predicted by the ITRS.

 $R_{tr0}$  can be approximated as

$$R_{tr0} = K \frac{V_{dd}}{I_{dn0}},\tag{1}$$

where K is a fitting parameter, and $I_{dn0}$ is the saturated drain current of a minimum sized NMOS transistor with both $V_{gs}$ and $V_{ds}$ equal to $V_{dd}$. K can be determined by matching either the 50% delay or the transition time of the step response of an RC equivalent circuit to SPICE simulations. Note that the K obtained by matching the 50% delay and the K obtained by matching the transition time are different and are denoted here as $K_d$ and $K_r$, respectively. The corresponding output resistances are $R_{d0}$ and $R_{r0}$, respectively.

This research was supported in part by the Semiconductor Research Corporation under Contract No. 2003-TJ-1068, the DARPA/ITO under AFRL Contract F29601-00-K-0182, the National Science Foundation under Contract No. CCR-0304574, the Fulbright Program under Grant No. 87481764, grants from the New York State Office of Science, Technology & Academic Research to the Center for Advanced Technology - Electronic Imaging Systems and to the Microelectronics Design Center, and by grants from Xerox Corporation, IBM Corporation, Intel Corporation, Lucent Technologies Corporation, and Eastman Kodak Company.

The delay  $t_{ds}$  and transition time  $t_{rs}$  of a single stage of the interconnect for a step input can be obtained from [9],

$$t_{ds} = 0.377 \frac{R_t C_t}{k^2} + 0.693 \left(R_{d0} C_0 + \frac{R_{d0} C_t}{hk} + \frac{R_t C_{g0} h}{k}\right), \tag{2}$$

$$\begin{aligned} t_{rs} &= \frac{t_{90\%} - t_{10\%}}{0.8} \\ &= 1.1 \frac{R_t C_t}{k^2} + 2.75 \left(R_{r0} C_0 + \frac{R_{r0} C_t}{hk} + \frac{R_t C_{g0} h}{k}\right), \end{aligned} \tag{3}$$

where  $C_0 = C_{d0} + C_{g0}$ . With a finite input slew rate, both the repeater delay [10] and the repeater output transition time [11] depend linearly on the input transition time  $Tr_{in}$ . The contribution of  $Tr_{in}$  to the repeater delay can be represented by  $\gamma Tr_{in}$ , where [10]

$$\gamma = \frac{1}{2} - \frac{1 - v_t}{1 + \alpha}.\tag{4}$$

In (4),  $\alpha$  is the velocity saturation index, and  $v_t$  is the normalized threshold voltage.

The transition time at the far end of an interconnect excited by a ramp input is difficult to model directly. For the interconnect system shown in Fig. 1, each stage is assumed to exhibit the same behavior. The input and output signals of a stage, therefore, exhibit the same transition time. The transition time in such a situation is denoted as $t_r$. The ratio between $t_r$ and $t_{rs}$ is represented by $\beta$, which is nearly constant for various interconnect parameters and repeater sizes. For the BPTM 45 nm technology, $\beta$ is approximately 1.15. By including the effects of the input transition time, the delay and transition time of each stage are, respectively,

$$t_{d\_stage} = t_{ds} + \gamma t_r, \tag{5}$$

$$t_r = \beta t_{rs}.\tag{6}$$

The total delay of the interconnect is

$$T_{total} = a_1 \frac{R_t C_t}{k} + a_2 \left(R_0 C_0 k + \frac{R_0 C_t}{h} + R_t C_{g0} h\right), \tag{7}$$

where

$$a_1 = 0.377 + 1.1\gamma\beta, \tag{8}$$

$$a_2 = 0.693 + 2.75\gamma\beta, \tag{9}$$

$$R_0 = \frac{0.693R_{d0} + 2.75\gamma\beta R_{r0}}{a_2}. \tag{10}$$
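As a consistency check on (7)-(10), the closed form can be compared with the sum of k identical stage delays built from (2), (3), (5), and (6). The following is a minimal sketch; the device values in `Tech` (output resistances, capacitances, and $\gamma$) are placeholders chosen only for illustration and are not the values used in this paper. Only $\beta = 1.15$ and the line parameters of Fig. 2 are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tech:
    """Placeholder device parameters (hypothetical, except beta = 1.15 from the text)."""
    r_d0: float = 10e3      # ohm, minimum-size output resistance fitted to the 50% delay
    r_r0: float = 8e3       # ohm, minimum-size output resistance fitted to the transition time
    c_d0: float = 0.15e-15  # F, minimum-size output capacitance
    c_g0: float = 0.20e-15  # F, minimum-size input capacitance
    gamma: float = 0.17     # input-slew coefficient of (4)
    beta: float = 1.15      # ratio t_r / t_rs


def stage_step_response(t, r_t, c_t, k, h):
    """Step-input delay (2) and transition time (3) of a single stage."""
    c0 = t.c_d0 + t.c_g0
    wire = r_t * c_t / k**2
    t_ds = 0.377 * wire + 0.693 * (t.r_d0 * c0 + t.r_d0 * c_t / (h * k) + r_t * t.c_g0 * h / k)
    t_rs = 1.1 * wire + 2.75 * (t.r_r0 * c0 + t.r_r0 * c_t / (h * k) + r_t * t.c_g0 * h / k)
    return t_ds, t_rs


def total_delay(t, r_t, c_t, k, h):
    """Closed-form total delay (7) with the coefficients (8)-(10)."""
    c0 = t.c_d0 + t.c_g0
    a1 = 0.377 + 1.1 * t.gamma * t.beta
    a2 = 0.693 + 2.75 * t.gamma * t.beta
    r0 = (0.693 * t.r_d0 + 2.75 * t.gamma * t.beta * t.r_r0) / a2
    return a1 * r_t * c_t / k + a2 * (r0 * c0 * k + r0 * c_t / h + r_t * t.c_g0 * h)


tech = Tech()
length_um = 5000.0                                   # 5 mm line, as in Fig. 2
r_t, c_t = 0.31 * length_um, 0.223e-15 * length_um   # total wire resistance and capacitance
k, h = 6, 50

t_ds, t_rs = stage_step_response(tech, r_t, c_t, k, h)
summed = k * (t_ds + tech.gamma * tech.beta * t_rs)  # k stages of (5), with t_r from (6)
closed = total_delay(tech, r_t, c_t, k, h)           # closed form (7)
step_only = k * t_ds                                 # input transition time neglected

print(f"(7): {closed * 1e12:.1f} ps, summed stages: {summed * 1e12:.1f} ps, "
      f"step-input model: {step_only * 1e12:.1f} ps")
```

Since $\gamma\beta t_{rs}$ is positive, the step-input model always returns a smaller delay than (7).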

The total delay obtained from (7) and the delay obtained from a model that neglects input transition time effects are compared with SPICE in Fig. 2 for different numbers of repeaters. As shown in Fig. 2, neglecting the effects of the input transition time significantly underestimates the total delay. When more than four repeaters are inserted, the error of (7) is within 5% as compared with SPICE.

Fig. 2. Total delay for an *RC* interconnect driven by repeaters. $R = 0.31 \Omega/\mu m$, $C = 0.223 \text{ fF}/\mu m$, l = 5 mm, and h = 50.

By setting $\partial T_{total}/\partial k$ and $\partial T_{total}/\partial h$ to zero, the minimum delay can be obtained as

$$T_{opt} = 2\sqrt{a_1 a_2 R_t C_t R_0 C_0} \left(1 + \sqrt{\frac{a_2 C_{g0}}{a_1 C_0}}\right). \tag{11}$$
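Setting the two partial derivatives of (7) to zero gives $k_{opt} = \sqrt{a_1 R_t C_t / (a_2 R_0 C_0)}$ and $h_{opt} = \sqrt{R_0 C_t / (R_t C_{g0})}$, and substituting these values back into (7) yields (11). The sketch below evaluates these expressions and confirms numerically that (7) at $(h_{opt}, k_{opt})$ equals (11). The device values are hypothetical placeholders; only $\beta$ and the per-unit-length wire parameters are taken from the text.

```python
import math

# Placeholder inputs (hypothetical device values; beta and the wire values are from the text).
GAMMA, BETA = 0.17, 1.15
R_D0, R_R0 = 10e3, 8e3                           # ohm
C_D0, C_G0 = 0.15e-15, 0.20e-15                  # F
R_T, C_T = 0.31 * 10_000, 0.223e-15 * 10_000     # 10 mm line

C0 = C_D0 + C_G0
A1 = 0.377 + 1.1 * GAMMA * BETA                  # (8)
A2 = 0.693 + 2.75 * GAMMA * BETA                 # (9)
R0 = (0.693 * R_D0 + 2.75 * GAMMA * BETA * R_R0) / A2   # (10)


def total_delay(k, h):
    """Total delay (7) for k repeaters of size h."""
    return A1 * R_T * C_T / k + A2 * (R0 * C0 * k + R0 * C_T / h + R_T * C_G0 * h)


def delay_optimum():
    """Stationary point of (7) and the resulting minimum delay (11)."""
    k_opt = math.sqrt(A1 * R_T * C_T / (A2 * R0 * C0))
    h_opt = math.sqrt(R0 * C_T / (R_T * C_G0))
    t_opt = 2 * math.sqrt(A1 * A2 * R_T * C_T * R0 * C0) * (1 + math.sqrt(A2 * C_G0 / (A1 * C0)))
    return k_opt, h_opt, t_opt


k_opt, h_opt, t_opt = delay_optimum()
print(f"k_opt = {k_opt:.1f}, h_opt = {h_opt:.1f}, T_opt = {t_opt * 1e12:.1f} ps")
```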

## III. POWER DISSIPATION SOURCES

There are three primary types of power dissipation mechanisms in digital CMOS circuits: dynamic power, short-circuit power, and leakage power.

### A. Dynamic Power

Dynamic power is the power consumption due to charging and discharging the load capacitance. The total dynamic power dissipation in an interconnect is

$$P_d = \alpha_s f C_{total} V_{dd}^2 = \alpha_s f (C_t + khC_0) V_{dd}^2, \tag{12}$$

where f is the clock frequency and  $\alpha_s$  is the switching factor which is assumed in this paper to be 0.15 [5].

### B. Short-Circuit Power

If the signal applied at the input of a CMOS inverter has a finite slew rate, a direct current path exists between  $V_{dd}$ and ground when the input signal switches between  $V_{tn}$ and  $V_{dd} + V_{tp}$ . The power consumed in this way is called short-circuit power. The total short-circuit power dissipated in an interconnect system can be obtained by adopting the model presented in [6],

$$P_{s} = \frac{4\alpha_{s} f I_{d0}^{2} t_{r}^{2} V_{dd} k h^{2}}{V_{dsat} G C_{stage} + 2 H I_{d0} t_{r} h}. \tag{13}$$

In this expression, G and H are process related constants.  $V_{dsat}$  and  $I_{d0}$  are, respectively, the average saturated drain voltage and average saturated drain current of the NMOS and PMOS transistors in a minimum sized inverter.  $C_{stage}$ is the load capacitance in a single stage,

$$C_{stage} = C_0 h + \frac{C_t}{k}. \tag{14}$$



Fig. 3. Repeater design space with different delay constraints.  $R = 0.31 \Omega/\mu m$ ,  $C = 0.223 \text{ fF}/\mu m$ , and l = 10 mm.

### C. Leakage Power

In deep submicrometer CMOS technologies, the dominant leakage current sources are the subthreshold current and the gate leakage current [12]. The total leakage power dissipated in the repeaters is

$$P_l = hkV_{dd}(I_{sub0} + I_{g0}), \tag{15}$$

where  $I_{sub0}$  is the average subthreshold current of the NMOS and PMOS transistors in a minimum sized repeater.  $I_{g0}$  is the average gate leakage current of a minimum sized repeater with low and high inputs. Since the subthreshold current increases rapidly with increasing temperature, a worst case temperature of 100 °C is assumed in this paper to emphasize the leakage power.
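The three components (12), (13), and (15) can be evaluated directly once the device constants are known. The sketch below is illustrative only: the values in `params` (supply voltage, currents, $V_{dsat}$, G, H, and the transition time $t_r$) are hypothetical placeholders rather than the parameters extracted for this paper, and `h_const` is simply the name used here for the constant H to avoid a clash with the repeater size h.

```python
def dynamic_power(k, h, c_t, c0, f, v_dd, alpha_s=0.15):
    """Dynamic power (12); alpha_s = 0.15 is the switching factor assumed in the text."""
    return alpha_s * f * (c_t + k * h * c0) * v_dd**2


def short_circuit_power(k, h, t_r, c_t, c0, f, v_dd, i_d0, v_dsat, g, h_const, alpha_s=0.15):
    """Short-circuit power (13) with the stage load capacitance (14)."""
    c_stage = c0 * h + c_t / k
    numerator = 4 * alpha_s * f * i_d0**2 * t_r**2 * v_dd * k * h**2
    denominator = v_dsat * g * c_stage + 2 * h_const * i_d0 * t_r * h
    return numerator / denominator


def leakage_power(k, h, v_dd, i_sub0, i_g0):
    """Leakage power (15)."""
    return h * k * v_dd * (i_sub0 + i_g0)


# Hypothetical placeholder values, not taken from the paper.
params = dict(c_t=2.23e-12, c0=0.35e-15, f=5e9, v_dd=1.0)
sc_extra = dict(t_r=100e-12, i_d0=50e-6, v_dsat=0.4, g=1.0, h_const=1.0)
leak = dict(v_dd=1.0, i_sub0=50e-9, i_g0=10e-9)

k, h = 9, 50
p_d = dynamic_power(k, h, **params)
p_s = short_circuit_power(k, h, **params, **sc_extra)
p_l = leakage_power(k, h, **leak)
p_total = p_d + p_s + p_l

print(f"P_d = {p_d * 1e6:.1f} uW, P_s = {p_s * 1e6:.1f} uW, P_l = {p_l * 1e6:.1f} uW")
```

For a fixed k and a fixed $t_r$, each of the three functions grows with h, which is the behavior used in the following sections.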

## IV. POWER DISSIPATION WITH DELAY CONSTRAINTS

For a delay constraint  $T_{req}$  greater than  $T_{opt}$ , the design space of the repeaters is the area inside the closed curves shown in Fig. 3. An expression characterizing the design space edge is

$$T_{total} - T_{req} = 0. \tag{16}$$

With  $T_{req}$  approaching  $T_{opt}$ , the design space converges to the delay optimal point  $(h_{opt}, k_{opt})$ . Note in Fig. 3 that the minimum h that can satisfy the delay constraint occurs when  $k = k_{opt}$ . Alternatively, the minimum k that can satisfy the delay requirement occurs when  $h = h_{opt}$ .

The total power dissipated by an RC interconnect with repeaters is the summation of the three kinds of power dissipation sources,

$$P_{total} = P_d + P_s + P_l. \tag{17}$$

In Fig. 4, $P_{total}$ is plotted as a function of k and h. From (12) and (15), $P_d$ and $P_l$ increase linearly with increasing h for a fixed k. $P_s$ can also be shown to increase monotonically with increasing h for a fixed k by verifying that $\partial P_s / \partial h$ is always positive. $P_{total}$, therefore, increases monotonically with increasing h for a fixed k. This behavior is also illustrated in Fig. 4. For the design space shown in Fig. 3, the minimum power can only be reached on the left edge.

Fig. 4. Total power dissipated by an RC interconnect with repeaters as a function of h and k. f = 5 GHz, $R = 0.31 \Omega/\mu m$, C = 0.223 fF/$\mu m$, and l = 10 mm.

The power dissipation on the edge of the design space is plotted as a function of h in Fig. 5(a). The dynamic and leakage power are plotted together since both of these power components depend linearly on kh. The minimum total power with delay constraints $P_{m\_delay}$ can be obtained by solving $dP_{total}/dh = 0$. Note that on the edge of the design space, k is a function of h. In order to provide a closed form solution for $P_{m\_delay}$, the curve of $P_d + P_l$ around the power-optimal point is approximated by a part of an ellipse, as shown in Fig. 5(b). The optimal design parameters $(h_0, k_0)$ for minimizing $P_d + P_l$ with a delay constraint $T_{req}$ can be solved by the Lagrange method [4],

$$k_0 = \frac{-b - \sqrt{b^2 - T_{req}^2 a_1 a_2 R_0 C_0 R_t C_t}}{T_{req} a_2 R_0 C_0}, \tag{18}$$

$$h_0 = \frac{T_{req}k_0 - 2a_1 R_t C_t}{2a_2 R_t C_{g0} k_0}, \tag{19}$$

where

$$b = a_2 R_0 R_t C_t (a_2 C_{g0} - a_1 C_0) - \frac{T_{req}^2}{4}. \tag{20}$$
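Expressions (18)-(20) can be checked numerically: the point $(h_0, k_0)$ should lie on the edge of the design space, so that (7) evaluated at this point returns $T_{req}$. The sketch below performs this check for $T_{req} = 1$ ns. All device values are hypothetical placeholders; only $\beta$ and the wire parameters are taken from the text.

```python
import math

# Placeholder inputs (hypothetical device values; beta and the wire values are from the text).
GAMMA, BETA = 0.17, 1.15
R_D0, R_R0 = 10e3, 8e3                 # ohm
C_D0, C_G0 = 0.15e-15, 0.20e-15        # F
R_T, C_T = 3100.0, 2.23e-12            # 10 mm line: 0.31 ohm/um, 0.223 fF/um
T_REQ = 1e-9                           # s

C0 = C_D0 + C_G0
A1 = 0.377 + 1.1 * GAMMA * BETA
A2 = 0.693 + 2.75 * GAMMA * BETA
R0 = (0.693 * R_D0 + 2.75 * GAMMA * BETA * R_R0) / A2


def total_delay(k, h):
    """Total delay (7)."""
    return A1 * R_T * C_T / k + A2 * (R0 * C0 * k + R0 * C_T / h + R_T * C_G0 * h)


def min_kh_point(t_req):
    """(k0, h0) from (18)-(20): the point on T_total = T_req that minimizes P_d + P_l."""
    b = A2 * R0 * R_T * C_T * (A2 * C_G0 - A1 * C0) - t_req**2 / 4
    disc = b * b - t_req**2 * A1 * A2 * R0 * C0 * R_T * C_T
    if disc < 0:
        raise ValueError("T_req is below the minimum achievable delay")
    k0 = (-b - math.sqrt(disc)) / (t_req * A2 * R0 * C0)
    h0 = (t_req * k0 - 2 * A1 * R_T * C_T) / (2 * A2 * R_T * C_G0 * k0)
    return k0, h0


k0, h0 = min_kh_point(T_REQ)
print(f"k0 = {k0:.2f}, h0 = {h0:.1f}, T_total(k0, h0) = {total_delay(k0, h0) * 1e12:.1f} ps")
```

With these expressions, the two terms of (7) that involve the wire resistance $R_t$ sum to $T_{req}/2$, and the two terms that involve $R_0$ account for the other half.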

 $h_1$  is the minimum repeater size that can satisfy a target delay constraint, which can be obtained by inserting  $k_1 = k_{opt}$  into (16).  $P_0$  and  $P_1$  are the corresponding values of  $P_d + P_l$  at  $(h_0, k_0)$  and  $(h_1, k_1)$ , respectively. The curve of  $P_s$  is approximated by a linear function. With these approximations, the power-optimal repeater size  $h_p$  with a delay constraint can be obtained as

$$h_p = h_0 - \frac{x_0(h_0 - h_1)^2}{\sqrt{x_0^2(h_0 - h_1)^2 + (P_1 - P_0)^2}}, \tag{21}$$

$$\begin{aligned} x_{0} = \frac{dP_{s}}{dh}\Big|_{h_{0}} \approx{}& 4\alpha_{s} f I_{d0}^{2} V_{dd} k_{0}^{2} t_{r0}^{2} \\ &\cdot \frac{2V_{dsat} G(C_{0}h_{0}k_{0} + C_{t})h_{0} + 2HI_{d0}k_{0}t_{r0}h_{0}^{2}}{\left[V_{dsat} G(C_{0}h_{0}k_{0} + C_{t}) + 2HI_{d0}k_{0}t_{r0}h_{0}\right]^{2}}, \end{aligned} \tag{22}$$

$$t_{r0} = t_{r}\Big|_{h_{0},k_{0}}. \tag{23}$$
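The behavior of (21) can be illustrated with a few lines of code. In the sketch below, the inputs $h_0$, $h_1$, $P_0$, $P_1$, and $x_0$ are hypothetical placeholder numbers rather than results from this paper. The example shows that $h_p$ lies between $h_1$ and $h_0$, equals $h_0$ when the short-circuit slope $x_0$ vanishes, and approaches $h_1$ as $x_0$ becomes large.

```python
import math


def power_optimal_size(h0, h1, p0, p1, x0):
    """Power-optimal repeater size (21).

    h0, p0: size and P_d + P_l at the minimum of P_d + P_l on the edge
    h1, p1: minimum feasible size and the corresponding P_d + P_l
    x0:     slope dP_s/dh at h0, from (22)
    Requires p1 != p0 or x0 != 0 so that the denominator is nonzero.
    """
    dh = h0 - h1
    return h0 - x0 * dh**2 / math.sqrt(x0**2 * dh**2 + (p1 - p0) ** 2)


# Hypothetical placeholder inputs (sizes are multiples of a minimum repeater, powers in W).
h0, h1 = 54.0, 35.0
p0, p1 = 300e-6, 340e-6
x0 = 1e-6                                    # W per unit of repeater size

h_p = power_optimal_size(h0, h1, p0, p1, x0)
print(f"h_p = {h_p:.1f}")
```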

The corresponding $k_p$ can be solved by inserting $h_p$ into (16). Upon obtaining $h_p$ and $k_p$, $P_{m\_delay}$ can be obtained directly from (17). If $k_p$ is not an integer, the nearest two integers are used to determine the minimum power ($h_p$ will need to be re-calculated).

Fig. 5. Power dissipation with constant delay. f = 5 GHz, $T_{req} = 1$ ns, $R = 0.31 \Omega/\mu$m, C = 0.223 fF/$\mu$m, and l = 10 mm.
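The integer rounding step can be sketched as follows. For each of the two neighboring integer values of k, the repeater size is re-calculated as the smaller root of (16), which is the left edge of the design space, and the candidate with the lower power is retained. The device values are hypothetical placeholders, the non-integer starting value `k_p = 8.9` is an arbitrary example, and the product kh is used as a stand-in for power since $P_d + P_l$ depends linearly on kh. A full implementation would pass $P_{total}$ from (17) as the `power` argument.

```python
import math

# Placeholder inputs (hypothetical device values; beta and the wire values are from the text).
GAMMA, BETA = 0.17, 1.15
R_D0, R_R0 = 10e3, 8e3
C_D0, C_G0 = 0.15e-15, 0.20e-15
R_T, C_T = 3100.0, 2.23e-12
T_REQ = 1e-9

C0 = C_D0 + C_G0
A1 = 0.377 + 1.1 * GAMMA * BETA
A2 = 0.693 + 2.75 * GAMMA * BETA
R0 = (0.693 * R_D0 + 2.75 * GAMMA * BETA * R_R0) / A2


def total_delay(k, h):
    """Total delay (7)."""
    return A1 * R_T * C_T / k + A2 * (R0 * C0 * k + R0 * C_T / h + R_T * C_G0 * h)


def min_size_on_edge(k, t_req):
    """Smaller root h of T_total(k, h) = t_req, or None if k cannot meet t_req."""
    s = t_req - A1 * R_T * C_T / k - A2 * R0 * C0 * k    # delay budget left for the h terms
    disc = s * s - 4 * (A2 * R_T * C_G0) * (A2 * R0 * C_T)
    if s <= 0 or disc < 0:
        return None
    return (s - math.sqrt(disc)) / (2 * A2 * R_T * C_G0)


def choose_integer_segments(k_p, t_req, power):
    """Evaluate floor(k_p) and ceil(k_p) on the edge and keep the lower-power candidate."""
    candidates = []
    for k in sorted({math.floor(k_p), math.ceil(k_p)}):
        h = min_size_on_edge(k, t_req) if k >= 1 else None
        if h is not None:
            candidates.append((power(k, h), k, h))
    if not candidates:
        raise ValueError("neither neighboring integer satisfies the delay constraint")
    return min(candidates)


# k * h is only a proxy for P_d + P_l; it is not the total power of (17).
proxy, k_best, h_best = choose_integer_segments(8.9, T_REQ, power=lambda k, h: k * h)
print(f"k = {k_best}, h = {h_best:.1f}")
```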

For different interconnect loads and delay constraints, results from the proposed method are compared with SPICE simulations in Table I. The minimum power obtained analytically is within 10% of SPICE. In these simulations, the total power does not include the power dissipated by the buffer loading the interconnect.

## V. POWER DISSIPATION WITH BANDWIDTH CONSTRAINTS

The bandwidth of an interconnect is limited by the output signal transition time. Shorter signal transition times support a shorter signal bit period and, therefore, a higher bandwidth. For a bandwidth constraint $B_{req}$, the signal transition time is assumed to be less than or equal to half the bit period, *i.e.*, $t_r \leq 1/(2B_{req})$. The design space for different bandwidth constraints is shown in Fig. 6. The design space is the area on the upper-right side of the curve.

TABLE I. MINIMUM POWER WITH DELAY CONSTRAINTS OBTAINED ANALYTICALLY AS COMPARED WITH SPICE SIMULATIONS. f = 1 GHz.

| $R_t$ (k$\Omega$) | $C_t$ (pF) | $T_{req}$ (ps) | SPICE $k_p$ | SPICE $h_p$ | SPICE $P_{m\_delay}$ ($\mu$W) | Analytic $k_p$ | Analytic $h_p$ | Analytic $P_{m\_delay}$ ($\mu$W) | Error (%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 400  | 4 | 90   | 315.3 | 4  | 84.5 | 332.5 | 5.5 |
| 1 | 1 | 500  | 3 | 59   | 267.8 | 4  | 51.2 | 278.6 | 4.0 |
| 2 | 2 | 800  | 8 | 90   | 624.5 | 9  | 80.7 | 663.7 | 6.3 |
| 2 | 2 | 900  | 7 | 69   | 558.5 | 8  | 63.2 | 594.8 | 6.5 |
| 2 | 2 | 1000 | 7 | 56.5 | 528.3 | 8  | 51.2 | 557.1 | 5.5 |
| 3 | 1 | 700  | 7 | 47.5 | 300.6 | 7  | 47.1 | 328.2 | 9.2 |
| 3 | 1 | 800  | 7 | 36   | 274.9 | 7  | 34.4 | 292.1 | 6.2 |
| 3 | 1 | 900  | 6 | 30.5 | 260.7 | 6  | 28.7 | 273.0 | 4.7 |
| 2 | 3 | 1000 | 9 | 112  | 935.2 | 10 | 96.9 | 973.0 | 4.0 |
| 2 | 3 | 1200 | 9 | 71.5 | 801.6 | 9  | 66.8 | 844.5 | 5.4 |
| 2 | 3 | 1400 | 9 | 55   | 753.7 | 9  | 50.7 | 785.5 | 4.2 |
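The error column of Table I can be reproduced from the two power columns. The sketch below assumes that the error is defined relative to the SPICE value, which is an interpretation rather than a definition stated in the text, and reads the power entries of the last row as 753.7 and 785.5 $\mu$W.

```python
# (T_req in ps, SPICE power in uW, analytic power in uW, tabulated error in %)
rows = [
    (400, 315.3, 332.5, 5.5),
    (500, 267.8, 278.6, 4.0),
    (800, 624.5, 663.7, 6.3),
    (900, 558.5, 594.8, 6.5),
    (1000, 528.3, 557.1, 5.5),
    (700, 300.6, 328.2, 9.2),
    (800, 274.9, 292.1, 6.2),
    (900, 260.7, 273.0, 4.7),
    (1000, 935.2, 973.0, 4.0),
    (1200, 801.6, 844.5, 5.4),
    (1400, 753.7, 785.5, 4.2),
]


def percent_error(spice, analytic):
    """Error of the analytic power relative to SPICE, in percent (assumed definition)."""
    return 100.0 * abs(analytic - spice) / spice


errors = [percent_error(spice, analytic) for _, spice, analytic, _ in rows]
worst = max(errors)

for (t_req, _, _, tabulated), err in zip(rows, errors):
    print(f"T_req = {t_req:4d} ps: computed {err:.2f}%, tabulated {tabulated}%")
print(f"worst case: {worst:.2f}%")
```

Under this definition, every computed error agrees with the tabulated value to within 0.1 percentage points, and the worst case remains below 10%.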



Fig. 6. Repeater design space with bandwidth constraints.  $R = 0.31 \,\Omega/\mu$ m,  $C = 0.223 \,\text{fF}/\mu$ m, and  $l = 10 \,\text{mm}$ .

An expression for the design space edge is

$$t_r = \frac{1}{2B_{req}}. \tag{24}$$

From (24), k can be solved as a function of h,

$$k(h) = \frac{\sqrt{\tau_2^2 - 4.4\tau_1 R_t C_t} + \tau_2}{-2\tau_1}, \tag{25}$$

where

$$\tau_1 = 2.75 R_{r0} C_0 - \frac{1}{2\beta B_{req}},\tag{26}$$

$$\tau_2 = 2.75 \left(\frac{R_{r0}C_t}{h} + R_t C_{g0}h\right). \tag{27}$$

In order for k to be a positive real number,  $\tau_1$  should be negative. An upper limit, therefore, is placed on the bandwidth by the process technology,

$$B_{req} < \frac{1}{5.5\beta R_{r0} C_0}. \tag{28}$$
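Expression (25) follows from substituting (3) and (6) into (24), which gives the quadratic $\tau_1 k^2 + \tau_2 k + 1.1 R_t C_t = 0$ with a single positive root when $\tau_1 < 0$. The sketch below evaluates (25)-(28) and verifies that the resulting k reproduces $t_r = 1/(2B_{req})$. The device values are hypothetical placeholders; only $\beta$, the wire parameters, and $B_{req} = 1$ Gb/s are taken from the text.

```python
import math

# Placeholder inputs (hypothetical device values; beta and the wire values are from the text).
BETA = 1.15
R_R0 = 8e3                             # ohm
C_D0, C_G0 = 0.15e-15, 0.20e-15        # F
R_T, C_T = 3100.0, 2.23e-12            # 10 mm line
C0 = C_D0 + C_G0


def max_bandwidth():
    """Technology limit on the bandwidth, from (28)."""
    return 1.0 / (5.5 * BETA * R_R0 * C0)


def segments_on_edge(h, b_req):
    """Number of segments k(h) on the edge of the design space, from (25)-(27)."""
    tau1 = 2.75 * R_R0 * C0 - 1.0 / (2 * BETA * b_req)
    if tau1 >= 0:
        raise ValueError("B_req exceeds the limit set by the process technology")
    tau2 = 2.75 * (R_R0 * C_T / h + R_T * C_G0 * h)
    return (math.sqrt(tau2**2 - 4.4 * tau1 * R_T * C_T) + tau2) / (-2 * tau1)


def transition_time(k, h):
    """Stage transition time t_r = beta * t_rs, from (3) and (6)."""
    t_rs = 1.1 * R_T * C_T / k**2 + 2.75 * (R_R0 * C0 + R_R0 * C_T / (h * k) + R_T * C_G0 * h / k)
    return BETA * t_rs


b_req = 1e9                            # 1 Gb/s, as in Fig. 7
h = 20.0
k = segments_on_edge(h, b_req)
print(f"B_max = {max_bandwidth() / 1e9:.1f} Gb/s, k({h:.0f}) = {k:.2f}, "
      f"t_r = {transition_time(k, h) * 1e12:.1f} ps")
```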

Similar to the delay-constraint case, the minimum power with a bandwidth constraint can only be reached at the edge of the design space.  $P_s$  can be rewritten as

$$P_{s} = \frac{4\alpha_{s}fI_{d0}^{2}t_{r}^{2}V_{dd}kh}{V_{dsat}G\left(C_{0} + \frac{C_{t}}{kh}\right) + 2HI_{d0}t_{r}}. \tag{29}$$



Fig. 7. (a) Power and (b) the 50% delay at the edge of the design space with bandwidth constraint. $B_{req} = 1$ Gb/s, $R = 0.31 \Omega/\mu m$, C = 0.223 fF/$\mu m$, and l = 10 mm.

For a fixed $t_r$, $P_s$ increases monotonically with increasing kh. This relationship is also true for $P_d$ and $P_l$. On the edge of the design space, $t_r = 1/(2B_{req})$; therefore, kh can be obtained from (25),

$$kh = \frac{\sqrt{(\tau_2 h)^2 - 4.4\tau_1 R_t C_t h^2} + \tau_2 h}{-2\tau_1}. \tag{30}$$

From (30), kh increases monotonically with h (note that $\tau_1$ is negative). The total power at the edge of the design space, therefore, increases monotonically with increasing h, as shown in Fig. 7(a). The minimum power satisfying the bandwidth constraint can be achieved with minimum sized repeaters. For minimum sized repeaters, the corresponding k and total delay, however, are impractically large, as shown in Figs. 6 and 7(b). In order to produce an effective repeater system, delay and area constraints should also be applied. The multiple constraints problem is not addressed here due to space limitations.
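The monotonic dependence of kh on h can be observed by sweeping h in (30). From (27), $\tau_2 h = 2.75(R_{r0}C_t + R_t C_{g0}h^2)$, so both terms in the numerator of (30) grow with h. The sketch below uses hypothetical placeholder device values; only $\beta$, the wire parameters, and $B_{req} = 1$ Gb/s are taken from the text.

```python
import math

# Placeholder inputs (hypothetical device values; beta and the wire values are from the text).
BETA = 1.15
R_R0 = 8e3                             # ohm
C_D0, C_G0 = 0.15e-15, 0.20e-15        # F
R_T, C_T = 3100.0, 2.23e-12            # 10 mm line
B_REQ = 1e9                            # 1 Gb/s


def kh_on_edge(h):
    """Product k*h on the edge of the design space, from (30)."""
    c0 = C_D0 + C_G0
    tau1 = 2.75 * R_R0 * c0 - 1.0 / (2 * BETA * B_REQ)        # (26), negative here
    tau2_h = 2.75 * (R_R0 * C_T + R_T * C_G0 * h**2)          # tau_2 * h, from (27)
    return (math.sqrt(tau2_h**2 - 4.4 * tau1 * R_T * C_T * h**2) + tau2_h) / (-2 * tau1)


sizes = [1, 2, 5, 10, 20, 50, 100]
products = [kh_on_edge(h) for h in sizes]

for h, kh in zip(sizes, products):
    print(f"h = {h:3d}: k*h = {kh:8.1f}, k = {kh / h:6.1f}")
```

In this sweep the smallest kh occurs at h = 1, where the number of segments k is also the largest, consistent with the observation that minimum sized repeaters minimize power at the cost of a large k.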

## VI. CONCLUSIONS AND FUTURE WORK

In this paper, a repeater insertion methodology is presented for achieving the minimum power in RC interconnects while satisfying delay and bandwidth constraints. A delay and transition time model of RC interconnect with repeaters is introduced. An error of less than 5% as compared with SPICE is demonstrated for interconnects with more than four repeaters. The minimum power is achieved at the edge of the design space. With delay constraints, closed form solutions for the minimum power are developed which are within 10% of SPICE. With bandwidth constraints, the minimum power can be achieved with minimum sized repeaters.

Inductive effects are not included in this paper since interconnect inductance effects are not significant in narrow interconnects with technology scaling [5]. For wide interconnects, the proposed methodology, however, has to be refined to include the effects of inductance on the transition time, delay, and power [13], [14].

## REFERENCES

- [1] H. B. Bakoglu and J. D. Meindl, "Optimal Interconnection Circuits for VLSI," *IEEE Transactions on Electron Devices*, Vol. ED-32, No. 5, pp. 903-909, May 1985.
- [2] P. Kapur, G. Chandra, and K. C. Saraswat, "Power Estimation in Global Interconnects and Its Reduction Using a Novel Repeater Optimization Methodology," *Proceedings of the ACM/IEEE Design Automation Conference*, pp. 461-466, June 2002.
- [3] V. Adler and E. G. Friedman, "Repeater Design to Reduce Delay and Power in Resistive Interconnect," *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, Vol. 45, No. 5, pp. 607-616, May 1998.
- [4] A. Nalamalpu and W. Burleson, "A Practical Approach to DSM Repeater Insertion: Satisfying Delay Constraints while Minimizing Area and Power," *Proceedings of the IEEE ASIC/SOC Conference*, pp. 152-156, September 2001.
- [5] K. Banerjee and A. Mehrotra, "A Power-Optimal Repeater Insertion Methodology for Global Interconnects in Nanometer Designs," *IEEE Transactions on Electron Devices*, Vol. 49, No. 11, pp. 2001-2007, November 2002.
- [6] K. Nose and T. Sakurai, "Analysis and Future Trend of Short-Circuit Power," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. 19, No. 9, pp. 1023-1030, September 2000.
- [7] Berkeley predictive technology model. [Online]. Available: http://www-device.eecs.berkeley.edu/~ptm
- [8] The International Technology Roadmap for Semiconductors. CA: Semiconductor Industry Association, 2003.
- [9] T. Sakurai, "Closed-Form Expressions for Interconnection Delay, Coupling, and Crosstalk in VLSI's," *IEEE Transactions on Electron Devices*, Vol. 40, No. 1, pp. 118-124, January 1993.
- [10] T. Sakurai and A. R. Newton, "Alpha-Power Law MOSFET Model and its Applications to CMOS Inverter Delay and Other Formulas," *IEEE Journal of Solid-State Circuits*, Vol. 25, No. 2, pp. 584-594, April 1990.
- [11] S. O. Nakagawa *et al.*, "On-Chip Crosstalk Noise Model for Deep-Submicrometer ULSI Interconnect," *The Hewlett-Packard Journal*, pp. 39-45, August 1998.
- [12] A. Ferré and J. Figueras, "Leakage Power Bounds in CMOS Digital Technologies," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. 21, No. 6, pp. 731-738, June 2002.
- [13] Y. I. Ismail and E. G. Friedman, "Effects of Inductance on the Propagation Delay and Repeater Insertion in VLSI Circuits," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 8, No. 2, pp. 195-206, April 2000.
- [14] Y. Ismail, E. G. Friedman, and J. L. Neves, "Exploiting the On-Chip Inductance in High-Speed Clock Distribution Networks," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 9, No. 6, pp. 963-973, December 2001.