# A Repeater Timing Model and Insertion Algorithm to Reduce Delay in RC Tree Structures

Victor Adler and Eby G. Friedman
Department of Electrical Engineering
University of Rochester
Rochester, New York 14627 USA
adler@ee.rochester.edu, friedman@ee.rochester.edu

#### Abstract

One method of overcoming wire delay due to long resistive interconnect is to insert repeaters in the line. Analytical expressions describing a CMOS inverter driving an RC load have been integrated into a methodology for inserting repeaters in RC trees. These expressions are based on a short-channel I-V model and exhibit less than 10% error. This repeater insertion methodology and its software implementation are described in this paper.

### I. INTRODUCTION

Interconnect delay has become a dominant performance limitation in VLSI circuit design. A common method of driving long interconnect is to insert a buffer at the beginning and the end of the interconnect line to improve the delay and slew rate of the signal. This method, however, does not necessarily minimize the delay caused by the large resistance encountered in long lines. Bakoglu presents a methodology for inserting repeaters in a line to overcome the quadratic increase in delay due to a linear increase in interconnect length so that the RC interconnect impedance does not dominate the delay of a critical path [1]. Extensions to this repeater insertion methodology have also been reported in [2,3]. In [4], a buffer placement methodology based strictly on minimizing the Elmore delay is described.

In this paper, the propagation delay and transition time characteristics of a system of CMOS repeaters driving an RC tree structure are analyzed. Expressions are presented which permit the development of a repeater design methodology for efficiently driving an RC tree structure, such as a clock distribution network, so as to reduce both the propagation delay and the transition time.
In this methodology, the number and size of the repeaters that minimize the propagation delay and transition time are determined. The design expressions are based on an analytical expression derived from the $\alpha$-power law model for short-channel CMOS devices [5]. The algorithm and software implementation of the proposed methodology are described in this paper. Furthermore, the efficacy of the proposed repeater insertion methodology is compared to the more standard buffer insertion methodology.

This paper is organized as follows: in Section II, a methodology for determining an optimal repeater placement within an RC tree is presented. The repeater insertion algorithm is discussed in Section III. A comparison of the analytic model versus circuit simulation is presented in Section IV, as well as a comparison of the efficiency of repeaters versus buffers in driving resistive interconnect. Finally, some concluding comments are offered in Section V.

# II. Analytical Delay Model for RC Trees

An analytical model for determining the delay and placement of uniformly sized and spaced repeaters in RC trees based on Sakurai's $\alpha$-power law is presented in this section [5–8]. This model assumes that the transistor operates in the linear region when driving an RC load, since the linear region is the dominant region of operation for fast input signals.

An RC tree is composed of a primary trunk with branching points. Each branch is modeled as a lumped resistance and capacitance, as exemplified by Fig. 1. The total path delay is measured from the signal input at the root of the trunk to each end point of the tree (or leaf node). The time required to drive a branch of an RC tree using uniform repeaters is

$$t_{branch} = t_{first\ stage} + (n-2)\,t_{int.\ stage} + t_{final\ stage}. \tag{1}$$

The first component, $t_{first\ stage}$, is the time required for the output of the first repeater in a branch to reach the turn-on voltage of the second repeater.
The $t_{int.\ stage}$ component describes the time required for each repeater between the first and last stage to transition from $V_{DD}+V_{TP}$ to $V_{TN}$ or vice versa. The last component, $t_{final\ stage}$, is the time required to reach a given output voltage from either $V_{DD}+V_{TP}$ or $V_{TN}$ [7–9]. $t_{final\ stage}$ also considers the effect of the additional capacitance, $C_{branch}$, of the downstream repeaters at a branching point.

This research was supported in part by the National Science Foundation under Grant No. MIP-9423886 and Grant No. MIP-9610108, the Army Research Office under Grant No. DAAH04-93-G-0323, a grant from the New York State Science and Technology Foundation to the Center for Advanced Technology-Electronic Imaging Systems, and by grants from Xerox, IBM, and Intel.

The components $t_{first\ stage}$, $t_{int.\ stage}$, and $t_{final\ stage}$ utilize an expression for the delay of a CMOS inverter reaching an output voltage $V_{out}$ given a step input [8],

$$t_{out} = \frac{(1 + \mho_{do}R)(C_{rep/branch} + C_{int})}{\mho_{do}} \ln\left(\frac{V_{DD}}{V_{out}}\right). \tag{2}$$

$\mho_{do}$ is the saturation conductance, a device parameter from the $\alpha$-power law model derived from $I_{do}/V_{do}$. $I_{do}$ is the saturation current of the device when $V_{DS} = V_{DD}$. $V_{do}$ is the voltage at which the device begins to operate in the saturation region [5, 6]. $R$ is the resistance of the RC load. $C_{rep/branch}$ is the capacitance of the following inverting repeater (or, at a branching point, of the downstream repeaters), and $C_{int}$ is the interstage load capacitance.

A plot of $t_{branch}$ derived from (1) versus the size and number of repeater stages $n$ in a branch is shown in Fig. 2 for $C_{rep}=0$. The optimal implementation of a repeater system for a specific RC load, in terms of the number and geometric size of each repeater, is represented by the minimum point on the graph. A similar graph can be drawn for each RC branch.
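
A surface of the kind shown in Fig. 2 can be approximated numerically. The following Python sketch evaluates (2) for each stage, combines the stage times according to (1), and searches a grid of repeater widths and counts for the minimum. Several choices here are assumptions made for illustration and are not taken from the paper: the `Tech` parameter values, the linear scaling of the conductance and of the repeater input capacitance with width, the even split of R and C across the n stages, the 10% end point for the final stage, and obtaining each stage time by differencing (2) at two voltages.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tech:
    """Hypothetical device parameters (illustrative values, not from the paper)."""
    v_dd: float = 5.0          # supply voltage, V
    v_tn: float = 0.8          # NMOS threshold, V
    v_tp: float = -0.9         # PMOS threshold, V
    g_per_um: float = 1.0e-4   # saturation conductance per um of width, S/um
    c_per_um: float = 2.0e-15  # repeater input capacitance per um of width, F/um


def t_out(g_do, r, c_rep, c_int, v_dd, v_out):
    """Eq. (2): time for the output to reach v_out, starting from v_dd."""
    return (1.0 + g_do * r) * (c_rep + c_int) / g_do * math.log(v_dd / v_out)


def branch_delay(tech, r_total, c_total, width, n, c_branch=0.0, v_end=None):
    """Eq. (1) for n >= 2 uniform repeaters of the given width (um)."""
    if n < 2:
        raise ValueError("Eq. (1) is written for n >= 2")
    g_do = tech.g_per_um * width             # assumed linear in width
    c_rep = tech.c_per_um * width            # assumed linear in width
    r, c_int = r_total / n, c_total / n      # assumed even split per stage
    v_on = tech.v_dd + tech.v_tp             # V_DD + V_TP
    v_end = 0.1 * tech.v_dd if v_end is None else v_end  # assumed end point

    def t(c_next, v):
        return t_out(g_do, r, c_next, c_int, tech.v_dd, v)

    t_first = t(c_rep, v_on)                          # V_DD -> V_DD + V_TP
    t_int = t(c_rep, tech.v_tn) - t(c_rep, v_on)      # V_DD + V_TP -> V_TN
    t_final = t(c_branch, v_end) - t(c_branch, v_on)  # V_DD + V_TP -> v_end
    return t_first + (n - 2) * t_int + t_final


def best_branch(tech, r_total, c_total, widths, counts, c_branch=0.0):
    """Grid search over widths and counts; returns (delay, width, n)."""
    return min(
        (branch_delay(tech, r_total, c_total, w, n, c_branch), w, n)
        for w in widths
        for n in counts
    )


if __name__ == "__main__":
    delay, width, n = best_branch(
        Tech(), 1.0e3, 1.0e-12, range(10, 201, 10), range(2, 21)
    )
    print(f"min delay {delay * 1e9:.3f} ns at width {width} um, n = {n}")
```

The grid search is the simplest possible numerical solution; with a calibrated device model the same loop structure would apply, only the `Tech` values and the stage-time expressions would change.
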
The optimal number of repeaters inserted within a branch to minimize the total delay is determined from a numerical solution of the data illustrated in Fig. 2. Each term in (1) is characterized by a step input to a single inverter driving an RC load so that the solution for the delay time remains tractable. This permits the output waveform to be approximated by (2). The output waveform of the first stage is the input waveform of the following repeater, assuming that the second repeater turns on quickly when its input threshold is reached. An example of this series of piecewise connections is shown in Fig. 3.

The information describing the waveform shape permits a more accurate delay estimation as compared to estimating the path delay based on the classical Elmore delay [10]. Since the Elmore delay sums the products of each resistance (composed of the sum of the linearized model of a repeater and the interconnect resistance) and all of its downstream capacitance, the Elmore delay neither accounts for the interaction of a repeater with the RC interconnect nor considers the shape of the output signal waveform. Thus, by integrating a more accurate timing model of the CMOS repeater into the algorithm for inserting repeaters into an RC tree, a more efficient circuit implementation can be achieved.

Fig. 1. An example of an RC tree. Ordered triplets (i, j, k) are used to identify specific branches (note that the downstream nodes are to the right of the upstream nodes).

Fig. 2. The total delay for a branch as a function of the number of repeaters and repeater sizes in a 0.8 $\mu$m technology. $C_{rep}=0$, $R=1$ k$\Omega$, $C=1$ pF.

# III. Repeater Insertion Algorithm

A local optimization method for repeater insertion into RC trees is presented here. With the assumption that each branch has a repeater at its source, the minimum delay of each branch is determined initially.
The total path delay from each leaf to the root is then minimized according to the expressions presented in Section II. The method for optimization is therefore depth first, in which the lowest level branches are optimized first, followed by each upstream branch. Thus, the RC tree is optimized locally, terminating at the root of the RC tree.

The program to perform this repeater insertion process requires information describing the RC characteristics and the number of sub-branches of each branch of the RC tree, beginning at the root. This procedure continues until all the leaf nodes have zero branches, indicating that the lowest level of the hierarchy of the RC tree has been reached. The RC tree is constructed in this top-down fashion with every branch identified by a triplet (i, j, k). In this notation, i is the depth of the branch within the tree, j is the branch number with respect to its parent branch, and k is the branch number of the parent branch with respect to its parent branch. In other words, k identifies the parent branch among the branches of the grandparent. Thus k of a branch at depth 3 is equal to j of the parent branch at depth 2. An example of this labeling is shown in Fig. 1.

Fig. 3. The analytic and SPICE derived output waveforms of an 11-stage repeater chain driving an evenly distributed RC load of 1 k$\Omega$ and 1 pF.

Once the tree has been constructed, it is traversed in a depth-first manner to determine the optimal repeater insertion for the final leaf nodes. When all of the branches of a parent have been optimized, the immediate upstream branch is optimized while considering the capacitance of the repeaters of the downstream branches according to the method described in Section II. In Fig. 1, the branches (3, 1, 1), (3, 2, 1), and (3, 3, 1) are downstream from branch (2, 1, 1). The pseudocode of the program is shown in Fig. 4.
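
The labeling and traversal order described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' program: the tree shape and element values are hypothetical (chosen only so that the labels agree with those quoted from Fig. 1 and Table I), and `toy_optimize` and `toy_c_in` are placeholder stand-ins for the delay model of Section II.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Triplet = Tuple[int, int, int]  # (depth i, branch number j, parent's branch number k)


@dataclass
class Branch:
    label: Triplet
    r: float                    # lumped branch resistance, ohms
    c: float                    # lumped branch capacitance, farads
    children: List["Branch"] = field(default_factory=list)
    c_branch: float = 0.0       # input capacitance of the downstream repeaters
    width: Optional[float] = None
    n: Optional[int] = None


def build_rc_tree(spec, depth=1, j=1, k=0):
    """Build the tree top-down from nested (R, C, [child specs]) tuples."""
    r, c, kids = spec
    node = Branch((depth, j, k), r, c)
    for idx, kid in enumerate(kids, start=1):
        # A child's k is the j of its parent.
        node.children.append(build_rc_tree(kid, depth + 1, idx, j))
    return node


def insert_repeaters(node, optimize, c_in, order=None):
    """Optimize all downstream branches first, then this branch."""
    if order is None:
        order = []
    for child in node.children:
        insert_repeaters(child, optimize, c_in, order)
    # Load seen at the branching point: first repeater of every child branch.
    node.c_branch = sum(c_in(child.width) for child in node.children)
    node.width, node.n = optimize(node.r, node.c, node.c_branch)
    order.append(node.label)
    return order


def toy_optimize(r, c, c_branch):
    """Placeholder optimizer (hypothetical): fixed width in um and repeater count."""
    return 10.0, 2


def toy_c_in(width):
    """Placeholder input capacitance (hypothetical): 2 fF per um of width."""
    return 2.0e-15 * width


def leaf(r, c):
    return (r, c, [])


# Hypothetical tree: a trunk, three depth-2 branches, four depth-3 leaves.
spec = (500.0, 0.5e-12, [
    (1.0e3, 1.0e-12, [leaf(1.0e3, 1.0e-12), leaf(2.0e3, 0.5e-12), leaf(1.0e3, 0.5e-12)]),
    leaf(3.0e3, 1.0e-12),
    (1.0e3, 1.0e-12, [leaf(1.0e3, 1.0e-12)]),
])
tree = build_rc_tree(spec)
order = insert_repeaters(tree, toy_optimize, toy_c_in)
```

Each branch is visited only after all of its downstream branches, so `c_branch` is always known when a branch is optimized. Note that a triplet alone is not guaranteed to be unique in deeper trees, which is why the sketch stores results on the nodes instead of in a table keyed by label.
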
The first function, build\_RCtree, recursively builds each branch starting from the root and its subbranches based on the specific branch resistances and capacitances. The second function, insert\_repeater, is a double loop cycling through the number of repeaters and repeater sizes and computing the delay for each branch, starting with the lowest level branches in the tree hierarchy.

```
(1) function build_RCtree(node);
    begin
      get R;
      get C;
      get number_of_branches;
      // recurse once per sub-branch; a leaf has zero branches
      while (number_of_branches > 0)
        build_RCtree(branch);
        number_of_branches--;
    end

(2) function insert_repeater(tree);
    begin
      // optimize the downstream branches first
      while (number_of_branches > 0)
        insert_repeater(branch);
        number_of_branches--;
      // then sweep repeater size and count for this branch
      for (width = 1 to i)
        for (number_of_repeaters = 1 to j)
          compute delay[width, number_of_repeaters];
    end
```

Fig. 4. The pseudocode of the repeater insertion algorithm.

# IV. Accuracy of the Repeater Methodology

The RC tree shown in Fig. 1 is illustrated again in Fig. 5 with the appropriate number and size of the repeaters inserted as determined by the repeater insertion program. The 90% delay $t_{total}$ from the input of the RC tree (the root of the trunk node) to the output leaf nodes of each branch is compared with the simulated delay values from SPICE in Table I. Note that the typical error of the analytical prediction versus SPICE for this example RC tree is well under 10%.
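
As a quick arithmetic check, the error column of Table I can be recomputed from the listed delays. The short Python sketch below copies the delay values from Table I; normalizing the difference to the SPICE value is an assumption (the table does not state the normalization), chosen because it reproduces the rounded percentages.

```python
# Delays in ns copied from Table I: branch -> (analytical, SPICE).
TABLE_I = {
    (3, 1, 1): (1.77, 1.70),
    (3, 1, 3): (1.70, 1.67),
    (2, 1, 1): (1.51, 1.54),
    (2, 2, 1): (1.71, 1.67),
    (2, 3, 1): (1.51, 1.48),
    (1, 1, 0): (1.05, 1.16),
}


def percent_error(analytical, spice):
    """Magnitude of the difference as a percentage of the SPICE value (assumed)."""
    return abs(analytical - spice) / spice * 100.0


errors = {branch: percent_error(a, s) for branch, (a, s) in TABLE_I.items()}

if __name__ == "__main__":
    for branch, err in errors.items():
        print(branch, f"{err:.1f}%")
```
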
[Figure 5: schematic of the RC tree with inserted repeaters; figure annotation: $R_{int} = R_{total}/n$.]

Fig. 5. The RC tree shown in Fig. 1 synthesized by the repeater insertion system. The transistor widths are shown below the first repeater of each branch, and the number of repeaters per branch is shown inside the last repeater of each branch.

TABLE I. Comparison of the analytical expression vs. SPICE for the 90% delay of each branch driven by uniform repeaters.
| Branch  | Analytical $t_{total}$ (ns) | SPICE $t_{total}$ (ns) | % error |
|---------|-----------------------------|------------------------|---------|
| (3,1,1) | 1.77                        | 1.70                   | 4       |
| (3,1,3) | 1.70                        | 1.67                   | 2       |
| (2,1,1) | 1.51                        | 1.54                   | 2       |
| (2,2,1) | 1.71                        | 1.67                   | 2       |
| (2,3,1) | 1.51                        | 1.48                   | 2       |
| (1,1,0) | 1.05                        | 1.16                   | 9       |

A comparison is made here between the proposed repeater system and a typical system of cascaded buffers inserted at the source of each branch. The buffer system used for comparison is a series of optimally tapered buffers (assuming a tapering factor of three [11–13]) placed at the input of each branch so as to drive the capacitive load of each branch without considering the interconnect resistance. Waveforms at the final branch output of the repeater system and the optimally tapered buffer system are shown in Fig. 6. The performance improvement of the repeater system over the tapered buffer system for this example RC tree is in the range of 25% to 33%. The buffer system does not drive the highly resistive lines effectively; hence, longer than expected propagation delays and slower rise times are generated, particularly for highly resistive branches such as branch (2,2,1).

Fig. 6. The delay from the input of the RC tree to specific leaves of the tree using the proposed repeater system versus using optimally tapered buffers. Triplets indicate the leaf nodes as labeled in Fig. 5.

#### V. Conclusions

A design system for determining the optimal number and size of uniform repeaters inserted into an RC tree is presented. An accurate timing model which considers the shape of the waveform is also presented. Analytical estimates of the total propagation delay of an example RC tree with inserted repeaters agree with SPICE to within 10%. A software program that implements the repeater insertion algorithm is also described.
The algorithm locally minimizes the delay of each branch of an RC tree by inserting repeaters so as to reduce the delay from the input of the RC tree to the various leaf nodes. Delay improvements of 25% to 33% over a typical buffer insertion methodology are demonstrated. Thus a design system for accurately inserting repeaters into an RC tree is presented in this paper. Extensions to this repeater insertion capability include power and area minimization while simultaneously optimizing for delay.

#### References

- [1] H. B. Bakoglu, *Circuits, Interconnections, and Packaging for VLSI*. Addison-Wesley Publishing Company, 1990.
- [2] C. Y. Wu and M. Shiau, "Accurate Speed Improvement Techniques for RC Line and Tree Interconnections in CMOS VLSI," *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 2.1648-2.1651, May 1990.
- [3] M. Nekili and Y. Savaria, "Optimal Methods of Driving Interconnections in VLSI Circuits," *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 21-23, May 1992.
- [4] L. P. P. P. van Ginneken, "Buffer Placement in Distributed RC-tree Networks for Minimal Elmore Delay," *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 865-868, May 1990.
- [5] T. Sakurai and A. R. Newton, "Alpha-Power Law MOSFET Model and its Applications to CMOS Inverter Delay and Other Formulas," *IEEE Journal of Solid-State Circuits*, Vol. SC-25, No. 2, pp. 584-594, April 1990.
- [6] V. Adler and E. G. Friedman, "Delay and Power Expressions for a CMOS Inverter Driving a Resistive-Capacitive Load," *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 4.101-4.104, May 1996.
- [7] V. Adler and E. G. Friedman, "Repeater Design to Reduce Delay and Power in Resistive Interconnect," *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 2148-2151, June 1997.
- [8] V. Adler and E. G. Friedman, "Delay and Power Expressions for a CMOS Inverter Driving a Resistive-Capacitive Load," *Analog Integrated Circuits and Signal Processing*, Vol. 14, No. 1/2, pp. 29-40, September 1997.
- [9] V. Adler and E. G. Friedman, "Repeater Insertion to Reduce Delay and Power in RC Tree Structures," *Proceedings of the Asilomar Conference on Signals, Systems, and Computers*, November 1997.
- [10] W. C. Elmore, "The Transient Response of Damped Linear Networks with Particular Regard to Wideband Amplifiers," *Journal of Applied Physics*, Vol. 19, No. 1, pp. 55-63, January 1948.
- [11] R. C. Jaeger, "Comments on 'An Optimized Output Stage for MOS Integrated Circuits'," *IEEE Journal of Solid-State Circuits*, Vol. SC-10, No. 3, pp. 185-186, June 1975.
- [12] B. S. Cherkauer and E. G. Friedman, "A Unified Design Methodology for CMOS Tapered Buffers," *IEEE Transactions on VLSI Systems*, Vol. VLSI-3, No. 1, pp. 99-111, March 1995.
- [13] B. S. Cherkauer and E. G. Friedman, "Design of Tapered Buffers with Local Interconnect Capacitance," *IEEE Journal of Solid-State Circuits*, Vol. SC-30, No. 2, pp. 151-155, February 1995.