# Buffered Clock Tree Synthesis with Non-Zero Clock Skew Scheduling for Increased Tolerance to Process Parameter Variations\*

JOSÉ LUIS NEVES AND EBY G. FRIEDMAN

Department of Electrical Engineering, University of Rochester, Rochester, NY 14618

Received August 15, 1996; Revised November 20, 1996

Abstract. An integrated top-down design system is presented in this paper for synthesizing clock distribution networks for application to synchronous digital systems. The timing behavior of a synchronous digital circuit is obtained from the register transfer level description of the circuit, and used to determine a non-zero clock skew schedule which reduces the clock period as compared to zero skew-based approaches. Concurrently, the *permissible range* of clock skew for each local data path is calculated to determine the maximum allowed variation of the scheduled clock skew such that no synchronization failures occur. The choice of clock skew values considers several design objectives, such as minimizing the effects of process parameter variations, imposing a zero clock skew constraint among the input and output registers, and constraining the permissible range of each local data path to a minimum value.

The clock skew schedule and the worst case variation of the primary process parameters are used to determine the hierarchical topology of the clock distribution network, defining the number of levels and branches of the clock tree and the delay associated with each branch. The delay of each branch of the clock tree is physically implemented with distributed buffers targeted in CMOS technology using a circuit model that integrates short-channel devices with the signal waveform shape and the characteristics of the clock tree interconnect. A bottom-up approach for calculating the worst case variation of the clock skew due to process parameter variations is integrated with the top-down synthesis system. Thus, the local clock skews and a clock distribution network are obtained which are more tolerant to process parameter variations.

This methodology and related algorithms have been demonstrated on several MCNC/ISCAS-89 benchmark circuits. Increases in system-wide clock frequency of up to 43% as compared with zero clock skew implementations are shown. Furthermore, examples of clock distribution networks that exploit intentional localized clock skew are presented which are tolerant to process parameter variations with worst case clock skew variations of up to 30%.

#### 1. Introduction

Most existing digital systems utilize fully synchronous timing, requiring a reference signal to control the temporal sequence of operations. Globally distributed signals, such as clock signals, are used to provide this synchronous time reference. These signals can dominate and limit the performance of VLSI-based digital systems. The importance of these global signals is, in part, due to the continuing reduction of feature size concurrent with increasing chip dimensions. Thus interconnect delay has become increasingly significant, perhaps of greater importance than active device delay. The increased global interconnect delay also leads to significant differences in clock signal propagation within the clock distribution network, called *clock skew*, which occurs when the clock signals arrive at the storage elements at different times. The clock skew can be further increased by unintentional factors such as process parameter variations which may limit the maximum frequency of operation, as well as create race conditions independent of clock frequency, leading to circuit failure. Therefore, the design of high performance, process tolerant clock distribution networks is a critical phase in the synthesis of synchronous VLSI digital circuits. Furthermore, the design of the clock distribution network, particularly in high speed applications, requires significant amounts of time, inconsistent with the high turnaround in the design of the more common data flow elements of digital VLSI circuits.

\*This research is based upon work supported by Grant 200484/89.3 from CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico, Brasil), the National Science Foundation under Grant No. MIP-9208165 and Grant No. MIP-9423886, the Army Research Office under Grant No. DAAH04-93-G-0323, and by a grant from the Xerox Corporation.

Several techniques have been developed to improve the performance and design efficiency of clock distribution networks, such as placing distributed buffers within clock tree layouts [1] to control the propagation delay and power consumption characteristics of the clock distribution networks, resizing clock nets for speed optimization and clock path delay balancing [2, 3], performing simultaneous buffer and interconnect sizing to optimize for speed and reduce power dissipation [4], using symmetric distribution networks, such as H-tree structures [5], to minimize clock skew, and applying zero-skew clock routing algorithms [e.g., 6, 7] to the automated layout of high speed clock distribution networks in cell-based circuits. Effort has also been placed on reducing clock skew due to process variations [e.g., 8-10], and on designing clock distribution networks so as to ensure minimal variation in clock skew [1, 7]. Alternative approaches have been developed for using intentional non-zero clock skew to improve circuit performance and reliability by properly choosing the local clock skews [10–12]. Targeting non-zero local clock skew, a synthesis methodology has been developed for designing clock distribution networks capable of accurately producing specific clock path delays [13, 14]. These clock distribution networks exploit intentional localized clock skew while taking into account the effects of process parameter variations on the clock path delays.

A design environment is presented in this paper for efficiently synthesizing distributed buffer, tree-structured clock distribution networks. This methodology is illustrated in terms of the IC design process cycle in Fig. 1. The IC design cycle typically begins with the System Specification phase. The Clock Tree Design Cycle utilizes timing information from the Logic Design phase, such as the minimum and maximum delay values of the logic blocks and the registers. The timing information is used to determine the maximum frequency of operation of the circuit, the non-zero clock skew schedule, the permissible range of clock skew between any pair of sequentially adjacent registers, and the minimum clock path delay to each register. The topology of the clock tree is designed to enforce the clock skew schedule. The delay of each clock path is accurately implemented using repeaters targeting CMOS technology. Finally, the clock tree is validated by ensuring that the worst case clock path delays caused by process parameter variations do not create clock skew values outside the allowed permissible range of each pair of sequentially adjacent registers. Process parameter information is extensively used in several stages of the design environment for ensuring the accuracy of the clock tree. The output of the Clock Tree Design Cycle is a detailed circuit description of the clock distribution network, including the number and geometric size of each buffer stage within each branch of the clock tree.

Figure 1. Block diagram of the clock tree design cycle integrated with standard IC design flow.

This paper is organized as follows: in Section 2, a localized clock skew schedule is derived from the effective permissible range of the clock skew for each local data path considering any global clock skew constraints and process parameter variations. In Section 3, a topology of the clock distribution network is obtained, producing a clock tree with specific delay values assigned to each branch. The design of circuit structures for implementing the individual branch delay values is summarized in Section 4. In Section 5, techniques for compensating the scheduled local clock skew values for process-dependent clock path delay variations are presented. In Section 6, these results are evaluated on a series of circuits, thereby demonstrating performance improvements and immunity to process parameter variations. Finally, some conclusions are drawn in Section 7.

#### 2. Optimal Clock Skew Scheduling

A synchronous digital circuit C can be modeled as a finite directed multi-graph G(V, E). Each vertex in the graph, $v_j \in V$, is associated with a register, circuit input, or circuit output. Each edge in the graph, $e_{ij} \in E$, represents a physical connection between vertices $v_i$ and $v_j$, with an optional combinational logic path between the two vertices. An edge is a bi-weighted connection representing the maximum (minimum) propagation delay $T_{\rm PD\,max}$ ($T_{\rm PD\,min}$) between two sequentially adjacent storage elements. The propagation delay $T_{\rm PD}$ includes the register, logic, and interconnect delays of a local data path [13], as described in (1),

$$T_{\rm PD} = T_{\rm C-Q} + T_{\rm Logic} + T_{\rm Int} + T_{\rm Set-up}, \qquad (1)$$

where  $T_{C-Q}$  is the time required for the data to leave  $R_i$  once it is triggered by a clock pulse  $C_i$ ,  $T_{Logic}$  is the propagation delay through the logic block between registers  $R_i$  and  $R_j$ ,  $T_{Int}$  accounts for the interconnect delay, and  $T_{Set-up}$  is the time to successfully propagate to and latch the data within  $R_j$  [15].

A local data path $L_{ij}$ is a set of two vertices connected by an edge, $L_{ij} = \{v_i, e_{ij}, v_j\}$ for any $v_i, v_j \in V$. A global data path, $P_{kl} = v_k \stackrel{l}{\to} v_l$, is a set of alternating edges and vertices $\{v_k, e_{k1}, v_1, e_{12}, \ldots, e_{n-1l}, v_l\}$, representing a physical connection between vertices $v_k$ and $v_l$, respectively. A multi-input circuit can be modeled as a single input graph, where each input is connected to vertex $v_0$ by a zero-weighted edge. $Pl(L_{ij})$ is defined as the permissible range of a local data path and $Pg(P_{kl})$ is the permissible range of a global data path.

#### 2.1. Timing Constraints

The timing behavior of a circuit C can be described in terms of two sets of timing constraints, local constraints and global constraints. The local constraints are designed to ensure the correct latching of data into the registers of a local data path. In particular, (2) prevents latching the incorrect data signal into  $R_i$  (preventing double clocking [10, 11]),

$$T_{\text{Skew}}(L_{ij}) \ge T_{\text{Hold }i} - T_{\text{PD(min)}} + \zeta_{ij}, \qquad (2)$$

where  $\zeta_{ij}$  is a safety term to provide some margin in a local data path against race conditions due to process parameter variations, and (3) guarantees that the data signal latched in  $R_i$  is latched into  $R_j$  by the following clock pulse (preventing zero clocking [10, 11]),

$$T_{\text{Skew}}(L_{ij}) \le T_{\text{CP}} - T_{\text{PD(max)}}. \qquad (3)$$

Constraints (2) and (3) are similar to the synchronous constraints introduced in [11, 12, 16, 17], where the clock skew $T_{\text{Skew }ij} = T_{\text{CD}i} - T_{\text{CD}j}$ and where $T_{\text{CD}i}$ ($T_{\text{CD}j}$) is the delay of the ith (jth) clock path.

Figure 2. Permissible range of the clock skew of a local data path.

Assuming that the minimum and maximum delay of each combinational logic block and register are known, a region of valid clock skew is assigned to each local data path, called the permissible range $Pl(L_{ij})$ [13, 18], as shown in Fig. 2. The bounds of $Pl(L_{ij})$ are determined from the local constraints, (2) and (3), for a given clock period $T_{CP}$. Also, the width of a permissible range is defined as the difference between the maximum ($T_{\text{Skew }ij(\text{max})}$) and the minimum ($T_{\text{Skew }ij(\text{min})}$) clock skew.
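As a minimal sketch of how the local constraints bound the permissible range, the following Python function evaluates the lower bound from (2) and the upper bound from (3) for one local data path. The names `local_range` and `width` and the numeric values are illustrative assumptions, not part of the paper; the hold time and safety term default to zero.

```python
def local_range(t_pd_min, t_pd_max, t_cp, t_hold=0.0, zeta=0.0):
    """Return (skew_min, skew_max) for one local data path.

    skew_min follows constraint (2): T_Hold - T_PD(min) + zeta.
    skew_max follows constraint (3): T_CP - T_PD(max).
    """
    skew_min = t_hold - t_pd_min + zeta
    skew_max = t_cp - t_pd_max
    return skew_min, skew_max


def width(rng):
    """Width of a permissible range; a negative width means the range is empty."""
    lo, hi = rng
    return hi - lo


# Hypothetical local data path: 3 tu minimum delay, 7 tu maximum delay, T_CP = 8 tu.
rng = local_range(t_pd_min=3, t_pd_max=7, t_cp=8)
print(rng, width(rng))
```

Increasing `t_cp` moves only the upper bound, which is why the width of the range grows with the clock period.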

Satisfying the clock skew constraints of each individual local data path does not guarantee that the clock skew between two vertices of a global data path $P_{kl}$ is satisfied, particularly when there are multiple parallel and feedback paths between the two vertices. Since any two registers connected by more than one global data path are each driven by a single clock path, the clock skew between these two registers is unique and the permissible range of every path connecting the two registers must contain this clock skew value to ensure that the circuit will operate correctly. As an example to illustrate that the clock skew between registers must be contained within the permissible range of each global data path connecting both registers, consider the circuit illustrated in Fig. 3, where the numbers assigned to the edges are the maximum and minimum propagation delay of each local data path $L_{ij}$, and the register set-up and hold times are assumed to be zero. Furthermore, the pair of clock skew values associated with a vertex are the minimum and maximum clock skew calculated with respect to the origin vertex $v_0$ for a given clock period. The minimum bound of $Pl(L_{ij})$ is given by (2) and is $T_{\text{Skew }ij(\text{min})} = -T_{\text{PD min}}$ and the maximum bound of $Pl(L_{ij})$ is given by (3) and is $T_{\text{Skew }ij(\text{max})} = T_{\text{CP}} - T_{\text{PD max}}$. Observe that in Fig. 3(a), a non-empty permissible range for each individual local data path is obtained with a clock period $T_{\text{CP}} = 6$ time units (tu). However, no clock skew value exists that is common to the paths connecting vertices $v_1$ and $v_3$. The common value for $T_{\text{Skew13}}$ is only obtained when the clock period is increased to 8 tu.

Figure 3. Matching permissible clock skew ranges by adjusting the clock period $T_{\rm CP}$.

To guarantee that a clock skew value exists for any pair of registers  $v_k$ ,  $v_l \in V$  within a global data path, a set of global timing constraints must be satisfied. Complete proofs of the following theorems are found in [19]. The global timing constraints (4) and (5) are used to calculate the permissible range of any global data path  $P_{kl} \in V$ , and are based on the permissible range of the local data paths within the respective global data path. In particular, (4) determines the minimum and maximum clock skew of a global data path with respect to  $v_k$ , while (5) constrains the clock skew of two vertices connected by multiple forward and feedback paths. These two constraints can be formally stated as:

**Theorem 1.** For any global data path  $P_{kl} \in V$ , clock skew is conserved. Alternatively, the clock skew between any two storage elements,  $v_k, v_l \in V$ , is the sum of the clock skews of each local data path  $L_{k1}, L_{12}, \ldots, L_{n-1l}$ , where  $L_{k1}, L_{12}, \ldots, L_{n-1l}$  are the local data paths within  $P_{kl}$ ,

$$T_{\text{Skew}}(P_{kl}) = T_{\text{Skew}}(L_{k1}) + T_{\text{Skew}}(L_{12}) + \cdots + T_{\text{Skew}}(L_{n-1l}). \qquad (4)$$

**Theorem 2.** For any global data path  $P_{kl}$  containing feedback paths, the clock skew in a feedback path between any two storage elements, say  $v_m$  and  $v_n \in P_{kl}$ , is the negative of the clock skew between  $v_m$  and  $v_n$  in the forward path,

$$T_{\text{Skew}}(P_{kl}) = -T_{\text{Skew}}(P_{lk}). \tag{5}$$
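Theorems 1 and 2 can be read as simple interval arithmetic on permissible ranges. The sketch below is an illustration under that reading, not the authors' implementation: `path_range` adds the bounds of the local ranges along a path, following (4), and `reverse_range` negates a range, following (5). The function names and the example ranges are hypothetical.

```python
def path_range(local_ranges):
    """Theorem 1 applied to ranges: add the bounds of the local ranges along a path."""
    lo = sum(r[0] for r in local_ranges)
    hi = sum(r[1] for r in local_ranges)
    return lo, hi


def reverse_range(rng):
    """Theorem 2 applied to a range: the skew from v_l to v_k is the negative
    of the skew from v_k to v_l, so the bounds swap and change sign."""
    lo, hi = rng
    return -hi, -lo


# Hypothetical series path v1 -> v2 -> v3 with local ranges [-1, 1] and [-3, -2].
forward = path_range([(-1, 1), (-3, -2)])  # range of T_Skew(P_13)
backward = reverse_range(forward)          # range of T_Skew(P_31)
print(forward, backward)
```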

In the presence of multiple parallel and/or feedback paths connecting any two registers  $R_k$  and  $R_l$ , a permissible range only exists between these two registers if there is overlap among the permissible ranges of each individual parallel and feedback path connecting both registers. Furthermore, the upper and lower bounds of such a permissible range are determined from the upper and lower bounds of the permissible ranges of each individual parallel and feedback path. Formally, the concept of permissible range overlap and the upper and lower bounds of the permissible range of a global data path  $P_{kl}$  can be stated as follows:

**Theorem 3.** Let $P_{kl} \in V$ be a global data path within a circuit C with m parallel and n feedback paths. Let the two vertices, $v_k$ and $v_l \in P_{kl}$, which are not necessarily sequentially-adjacent, be the origin and destination of the m parallel and n feedback paths, respectively. Also, let $Pg(P_{kl})$ be the permissible range of the global data path composed of vertices $v_k$ and $v_l$. $Pg(P_{kl})$ is a non-empty set of values iff the intersection of the permissible ranges of each individual parallel and feedback path is a non-empty set, or

$$Pg(P_{kl}) = \left(\bigcap_{i=1}^{m} Pg(P_{kl}^{i})\right) \cap \left(\bigcap_{j=1}^{n} Pg(P_{lk}^{j})\right) \neq \emptyset. \qquad (6)$$

**Theorem 4.** Let the two vertices,  $v_k$  and  $v_l \in P_{kl}$ , be the origin and destination of a global data path with m forward and n feedback paths. If  $Pg(P_{kl}) \neq \emptyset$ , the upper bound of  $Pg(P_{kl})$  is given by

$$T_{\text{Skew}}(P_{kl})_{\text{max}} = \text{MIN} \left\{ \min_{1 \le i \le m} \left[ T_{\text{Skew}}(P_{kl}^{i})_{\text{max}} \right], \min_{1 \le j \le n} \left[ T_{\text{Skew}}(P_{lk}^{j})_{\text{min}} \right] \right\}, \quad (7)$$

and the lower bound of  $Pg(P_{kl})$  is given by

$$T_{\text{Skew}}(P_{kl})_{\min} = \text{MAX} \left\{ \max_{1 \le i \le m} \left[ T_{\text{Skew}} \left( P_{kl}^{i} \right)_{\min} \right], \max_{1 \le j \le n} \left[ T_{\text{Skew}} \left( P_{lk}^{j} \right)_{\max} \right] \right\}. \quad (8)$$
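The overlap test of Theorem 3 and the bounds of Theorem 4 amount to intersecting intervals. The following sketch is a hedged illustration: the function name `global_range` and the example values are hypothetical, and feedback ranges are assumed to be given in the $v_l$ to $v_k$ direction and are negated before intersecting. That negation is this sketch's reading of Theorem 2, not a transcription of (7) and (8).

```python
def global_range(forward_ranges, feedback_ranges=()):
    """Intersect the permissible ranges of all paths between v_k and v_l.

    forward_ranges: (lo, hi) ranges of T_Skew(P_kl), one per parallel path.
    feedback_ranges: (lo, hi) ranges of T_Skew(P_lk), one per feedback path;
        each is negated first (assumption based on Theorem 2).
    Returns (lo, hi), or None when the ranges do not overlap.
    """
    candidates = list(forward_ranges)
    candidates += [(-hi, -lo) for lo, hi in feedback_ranges]
    lo = max(r[0] for r in candidates)  # largest lower bound
    hi = min(r[1] for r in candidates)  # smallest upper bound
    return (lo, hi) if lo <= hi else None


# Hypothetical pair of parallel paths between v1 and v3:
# a direct path and a two-stage path, at two different clock periods.
short_period = global_range([(-1, 1), (-4, -3)])  # ranges do not overlap
long_period = global_range([(-1, 2), (-4, -1)])   # ranges touch at one value
print(short_period, long_period)
```

In this example the longer clock period widens both ranges just enough for a single common skew value to exist, which mirrors the behavior described for Fig. 3.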

Two global timing constraints impose zero clock skew among the I/O storage elements and limit the permissible clock skew range that can be implemented by the fabrication technology. By constraining the clock skew among the off-chip registers to zero, race conditions are eliminated among all integrated circuits controlled by the same clock source by avoiding the propagation of a non-zero clock skew beyond the integrated circuit. This condition is represented by the following expression,

$$T_{\text{Skew}}(P_{kl}) = T_{\text{Skew}}(L_{k1}) + T_{\text{Skew}}(L_{12}) + \cdots + T_{\text{Skew}}(L_{n-1l}) = 0. \quad (9)$$

An immediate consequence of (9) is that the clock path delay from the clock source to every input and output register is equal.

Although the permissible range of a local data path is theoretically infinite, practical limitations place constraints on the minimum clock path delays that can be implemented with a given fabrication technology. These clock path delays determine the minimum clock skew that can be assigned to any two vertices in the circuit. These fabrication dependent timing constraints are

$$|T_{\text{Skew}}(L_{ij})_{\text{max}} - T_{\text{Skew}}(L_{ij})_{\text{min}}| \ge C_1,$$

$$|T_{\text{Skew }ij}| \ge C_2, \qquad (10)$$

where  $C_1$  and  $C_2$  are dependent on the fabrication technology and are a measure of the statistical variation of the process parameters.

#### 2.2. Optimal Clock Period

The problem of determining an optimal clock period for a synchronous circuit while exploiting non-zero clock skew has been previously studied [11, 12, 16, 17]. In these approaches, clock delays rather than clock skews are calculated. Therefore, these clock delays cannot be directly used for determining the permissible range of the local clock skews. Thus, there is no process for determining the position of the scheduled clock skew within the permissible range. A technique to perform this process is described in this paper to schedule the clock skew and to prevent synchronization failures due to process parameter variations.

The determination of the minimum clock period using permissible ranges is possible by recognizing that the width of the permissible range of a local data path is dependent on the clock period [from (3)]. The overlap of permissible ranges guarantees the synchronization of the data flow between non-adjacent registers connected by multiple feedback and/or parallel paths. This technique initially guarantees the existence of a permissible range for each local data path and terminates by satisfying (6) for every data path in the circuit. The difference between the propagation delays of a local data path $L_{ij}$ defines the minimum clock period necessary to safely latch data within $L_{ij}$. The largest difference among all the local data paths of the circuit defines the minimum clock period that can be used to safely latch data into any local data path. However, as shown in the example depicted in Fig. 3, in the presence of feedback and/or parallel paths, local timing constraints may not be sufficient to determine the minimum clock period (since certain global timing constraints such as (6) must also be satisfied). Nevertheless, a clock period always exists that satisfies all the local and global timing constraints of a circuit. This clock period is bounded by two terms, $T_{\text{CP min}}$ and $T_{\text{CP max}}$, as independently demonstrated by Deokar and Sapatnekar in [12]. The lower bound of the clock period, $T_{\text{CP min}}$, is the greatest difference in propagation delay of any local data path $L_{ij} \in C$,

$$T_{\text{CP min}} = \text{MAX} \left[ \max_{v_{ij} \in V} (T_{\text{PD max } ij} - T_{\text{PD min } ij}), \max_{v_{i} \in V} (T_{\text{PD max } ii}) \right], \qquad (11)$$

and the upper bound of the clock period, $T_{\rm CP\,max}$, is the greatest propagation delay of any local data path $L_{ij} \in C$,

$$T_{\text{CP max}} = \text{MAX} \left[ \max_{v_{ij} \in V} (T_{\text{PD max } ij}), \max_{v_i \in V} (T_{\text{PD max } ii}) \right]. \qquad (12)$$

The second term in (11) and (12) accounts for the self-loop circuit when the output of a register is connected to its input through an optional logic block. Since the initial and final registers are the same, the clock skew in a self-loop is zero and the clock period is determined by the maximum propagation delay of the path connecting the output of the register to its input. Observe that a clock period equal to the lower bound exists for circuits without parallel and/or feedback paths. Furthermore, a clock period equal to the upper bound always exists since the permissible range of any local data path in the circuit contains the zero clock skew value. Although (12) satisfies any local and global timing constraints of circuit C, it is possible to determine a lower clock period that satisfies (6).
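The bounds (11) and (12) can be evaluated directly from the edge weights of the graph. The sketch below assumes a hypothetical edge list of `(i, j, t_pd_min, t_pd_max)` tuples, with `i == j` marking a self-loop; the function name and the example circuit are illustrative and do not come from the benchmark data.

```python
def clock_period_bounds(edges):
    """Return (t_cp_min, t_cp_max) following (11) and (12).

    edges: iterable of (i, j, t_pd_min, t_pd_max); i == j denotes a self-loop.
    """
    lower = 0
    upper = 0
    for i, j, d_min, d_max in edges:
        if i == j:
            lower = max(lower, d_max)          # second term of (11)
        else:
            lower = max(lower, d_max - d_min)  # first term of (11)
        upper = max(upper, d_max)              # both terms of (12)
    return lower, upper


# Hypothetical three-register circuit with a two-stage path and a parallel direct path.
example_edges = [(0, 1, 1, 4), (1, 2, 3, 7), (0, 2, 1, 3)]
bounds = clock_period_bounds(example_edges)
print(bounds)
```

For this circuit the bounds are 4 tu and 7 tu, so any clock period found by skew scheduling lies in that interval, and 7 tu corresponds to the zero clock skew case.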

Several algorithms for determining the optimal clock period while exploiting non-zero clock skew exist. Fishburn [11] introduced this approach with a linear programming-based algorithm that minimizes the clock period while determining a set of clock path delays to drive the individual registers within the circuit. In [12], Deokar and Sapatnekar present a graph-based approach to achieve a similar goal, followed by an optimization step to reduce the skew between registers while preserving the minimum clock period. Other works, such as Sakallah et al. [16] and Szymanski [17], also calculate the optimal clock period and clock path delay schedule using linear programming techniques.

A graph-based algorithm is implemented in C to determine the minimum clock period and a permissible range for each local data path while ensuring that all the permissible ranges in the circuit satisfy (6) [18, 19]. The initial clock period is given by (11), and the local and global permissible ranges for each local data path are calculated assuming this clock period. If at least one data path does not satisfy (6), the clock period is increased and the permissible ranges are re-calculated. This iterative process continues until (6) is satisfied for all global data paths. The primary distinction of this algorithm is that the permissible range of each local data path $Pl(L_{ij})$ is determined rather than the individual clock path delays to registers $R_i$ and $R_j$. From each permissible range a clock skew value is chosen as explained in Section 2.3. This information is crucial for maximizing the performance of a synchronous circuit while considering the effects of process parameter variations in the design of high speed clock distribution networks.
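The iterative search can be sketched in Python as follows. This is not the authors' C implementation: instead of explicitly intersecting permissible ranges as in (6), the sketch substitutes a Bellman-Ford check on difference constraints, a standard way to test whether a consistent set of clock delays exists for a given clock period. Hold times and safety terms are assumed to be zero, the clock period is increased in fixed steps, and all names and values are hypothetical.

```python
def skew_schedule_exists(n, edges, t_cp):
    """Check whether clock delays t[0..n-1] satisfy (2) and (3) on every edge.

    Each constraint is written as t[a] - t[b] <= w and tested by Bellman-Ford
    relaxation starting from all-zero distances (an implicit source vertex).
    """
    cons = []  # (a, b, w) meaning t[a] - t[b] <= w
    for i, j, d_min, d_max in edges:
        cons.append((j, i, d_min))         # (2): t[i] - t[j] >= -d_min
        cons.append((i, j, t_cp - d_max))  # (3): t[i] - t[j] <= t_cp - d_max
    dist = [0] * n
    for _ in range(n):
        changed = False
        for a, b, w in cons:
            if dist[b] + w < dist[a]:
                dist[a] = dist[b] + w
                changed = True
        if not changed:
            return True   # every constraint holds with t = dist
    return False          # still relaxing after n passes: constraints conflict


def minimum_clock_period(n, edges, step=1):
    """Start at the lower bound (11) and increase T_CP until a schedule exists."""
    t_cp = max(d_max if i == j else d_max - d_min for i, j, d_min, d_max in edges)
    while not skew_schedule_exists(n, edges, t_cp):
        t_cp += step
    return t_cp


# Hypothetical three-register circuit: path 0 -> 1 -> 2 in parallel with 0 -> 2.
circuit = [(0, 1, 1, 4), (1, 2, 3, 7), (0, 2, 1, 3)]
t_cp_opt = minimum_clock_period(3, circuit)
print(t_cp_opt)
```

The loop terminates because a clock period equal to the upper bound (12) always admits the zero clock skew schedule, and a larger period only relaxes the constraints. The resolution of the result is limited by `step`.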

#### 2.3. Selecting Clock Skew Values

Given any two vertices $v_k, v_l \in V$, the set of valid clock skew values between $v_k$ and $v_l$ is given by (6) and bounded by (7) and (8), as described in Section 2.2. In the presence of feedback and/or parallel paths, the resulting permissible range $Pg(P_{kl})$ is a sub-set of the permissible range of each independent global data path between $v_k$ and $v_l$, as exemplified in Fig. 3. However, due to (4), $Pg(P_{kl})$ is the sum of the permissible range of each local data path for every global data path $P_{kl}$ connecting $v_k$ and $v_l$. Therefore, it is necessary to constrain the permissible range of each local data path to a sub-set of values within its original permissible range. Alternatively, if $Pl(L_{ij})$ is the permissible range of a local data path within one of the global data paths connecting $v_k$ and $v_l$, $\rho(L_{ij})$ is a sub-set of values within $Pl(L_{ij})$ such that $\rho(L_{ij}) \subseteq Pl(L_{ij})$. This new region $\rho(L_{ij})$ is described as the effective permissible range of a local data path.

An example of an effective permissible range is the parallel path shown in Fig. 3(a). For  $T_{\rm CP}=8$  tu, the permissible range  $Pg(P_{13})=[-2,-2]$ . Since  $Pg(P_{13})=Pl(L_{12})+Pl(L_{23})$ , the local data paths  $L_{12}$  and  $L_{23}$  can only assume clock skew values for which the sum is within [-2,-2]. In this case, the permissible range of each local data path is reduced to a single value, or  $Pl(L_{12})=[1,1]$  and  $Pl(L_{23})=[-3,-3]$ , respectively.

Assume that the clock period of the circuit in Fig. 3(a) is now increased from 8 tu to 9 tu. The new permissible range $Pg(P_{13}) = [-2, 0]$ and the effective permissible range of each local data path is $\rho(L_{12}) = [1, 2]$, $\rho(L_{23}) = [-3, -2]$, and $\rho(L_{13}) = [-2, 0]$, respectively. Note that selecting a clock skew value outside the effective permissible range of a local data path may lead to a race condition since (7) is violated. Also, there is no unique solution to the selection of an effective permissible range unless $\rho(L_{ij}) = Pl(L_{ij})$. For example, $Pl(L_{12})$ could be set to [0, 2] and $Pl(L_{23})$ set to [-2, -2], giving the same permissible range $Pg(P_{13}) = [-2, 0]$. Therefore, given any two vertices $v_k$, $v_l \in V$ with feedback and/or parallel paths connecting $v_k$ and $v_l$, the selection of a clock skew schedule requires determining the effective permissible range $\rho(L_{ij})$ for each local data path between $v_k$ and $v_l$, and the relative position of $\rho(L_{ij})$ within $Pl(L_{ij})$.

The effective permissible range of a local data path $\rho(L_{ij})$ may not be unique, leading to multiple solutions to the clock skew scheduling problem. It is, however, possible to obtain one solution that is most suitable for minimizing the clock period while reducing the possibility of race conditions due to the effects of process parameter variations. This solution for $\rho(L_{ij})$ is derived from the observation that the bounds of the permissible range of any two vertices $v_k, v_l \in V$ (with possible feedback and/or parallel paths connecting $v_k$ and $v_l$) are maximum when determined by (7) and (8), and that the permissible range $Pg(P_{kl})$ bounded by (7) and (8) is unique. Therefore, the clock skew scheduling problem can be divided into two phases. In the first phase, the permissible range of each global data path is derived from (6), with bounds given by (7) and (8). In the second phase, the clock skew schedule is solved by the following process: 1) the permissible range of a global data path $Pg(P_{kl})$ is divided equally among each local data path belonging to each global data path connecting the vertices $v_k$ and $v_l$; 2) within each global data path each effective permissible range $\rho(L_{ij})$ is placed as close as possible to the upper bound of the original permissible range $Pl(L_{ij})$, thereby minimizing the likelihood of creating any race conditions; and 3) the specific value of the clock skew is chosen in the middle of the effective permissible range, since no prior information describing the variation of a particular clock skew value may exist. An algorithm for selecting the clock skew of each local data path was implemented as described in [18, 19]. From this clock skew schedule the minimum clock path delay to each register in the circuit is calculated.

Providing independent clock path delays for each register is impractical due to the large capacitive load placed on the clock source and the inefficient use of die area. A tree structured clock distribution network is more appropriate, where the branching points are selected according to the delay of each clock path, the relative physical position of the clocked registers, and the sensitivity of each local data path to delay variations. Such an approach for determining the structural topology of a clock distribution network is described in the following section.

#### 3. Clock Tree Topological Design

The topology of a clock tree derived from a clock skew schedule must ensure that the clock path delays are accurately implemented while considering the effects of process parameter variations. A tree-structured topology can be based on the hierarchical description of the circuit netlist, on implementing a balanced tree with a fixed number of branching levels from the clock source to each register with a pre-defined number of branching points per node (an example of this approach is a binary tree with n levels for  $2^n$  registers with two branching points per node), on reducing the effects of process parameter variations by driving common local data paths by the same sub-tree, or by implementing each clock path delay with pre-defined delay segments such that the layout area of the clock tree is reduced.

The topology of the clock distribution tree is built by driving common local data paths by the same sub-tree and by assigning precise delay values to each branch of the clock tree such that the skew assignment is satisfied [20]. For this purpose, each clock path delay is partitioned into a series of branches, each branch emulating a precise quantified delay value. Between any two segments, there is a branching point to other registers or sub-trees of the clock tree, where several branches with pre-defined delays are cascaded to provide the appropriate delay between the clock source (or root) and each leaf node. The selection of the branch delay is dependent upon the minimum propagation delay that can be implemented for a particular fabrication process and the inverter transconductance (or gain). An example of the topology of a clock tree is shown in Fig. 4, where the numbers in brackets are the delays assigned to each branch and the numbers in parentheses are the clock skew assignment.

Figure 4. Topology of the clock distribution network.

#### 4. Circuit Design of the Clock Tree

The circuit structures are designed to emulate the delay values associated with each branch of the clock tree. Special attention is placed on guaranteeing that the clock skew *between* any two clock paths is satisfied rather than satisfying each individual clock path delay. The successful design of each clock path is primarily dependent on two factors: 1) isolating each branch delay using active elements, specifically CMOS inverters, and 2) using repeaters to integrate the inverter and interconnect delay equations so as to more accurately calculate the delay of each clock path.

The interconnect lines are modeled as purely capacitive lines by inserting inverting buffer repeaters into the clock path such that the output impedance of each inverter is significantly greater than the resistance of the driven interconnect line [21]. As a consequence, the slope of the input signal of a buffer connected to a branching point is identical to the slope of the output signal of the buffer driving that same branching point [22].

In the existing design methodology [14, 22], the delay of a branch is implemented with one or more CMOS inverters, as illustrated in Fig. 5. The delay equations of each inverter are based on the MOSFET $\alpha$-power law short-channel I-V model developed by Sakurai and Newton [23].

Figure 5. Design of a branch delay element.

Each inverter is assumed to be driven by a ramp signal with symmetric rising and falling slopes, selected such that during discharge (charge), the effects of the PMOS (NMOS) transistor can be neglected. The capacitive load of an inverter so as to satisfy a specific branch delay $t_{di}$ is

$$C_{\text{Li}} = \frac{2I_{\text{DO}}}{V_{\text{DD}}} \left[ t_{\text{di}} - \left( \frac{1}{2} - \frac{1 - v_{T}}{1 + \alpha} \right) t_{\text{Ti-1}} \right], \quad (13)$$

where $I_{DO}$ is the drain current at $V_{GS} = V_{DS} = V_{DD}$, $V_{DO}$ is the drain saturation voltage at $V_{GS} = V_{DD}$, $V_{th}$ is the threshold voltage, $\alpha$ is the velocity saturation index, $V_{DD}$ is the power supply voltage, $t_{di}$ is the delay of an inverter defined from the 50% $V_{DD}$ point of the input waveform to the 50% $V_{DD}$ point of the output waveform, $v_T = V_{th}/V_{DD}$, and $t_{Ti-1}$ is the transition time of the input signal. Note that $C_{Li}$ is composed of the capacitance of the driven interconnect line and the total gate capacitance of all $b_{i+1}$ inverters. Since $t_{di}$ is known, the only unknown in (13) is the transition time of the input signal (provided by [23]). The transition time $t_{Ti}$ can be approximated by a ramp shaped waveform, or by linearly connecting the points $0.1V_{DD}$ and $0.9V_{DD}$ of the output waveform. This assumption is accurate as long as the interconnect resistance is negligible as compared with the inverter output impedance.

$$t_{\text{Ti}} = \frac{t_{0.9} - t_{0.1}}{0.8} = \frac{C_{\text{Li}} V_{\text{DD}}}{I_{\text{DO}}} \left( \frac{0.9}{0.8} + \frac{V_{\text{DO}}}{0.8 V_{\text{DD}}} \ln \frac{10 V_{\text{DO}}}{e V_{\text{DD}}} \right). \quad (14)$$

For each clock path within the clock tree, the procedure to design the CMOS inverters is as follows: 1) the load of the initial trunk of the clock tree is determined from (13), assuming a step input clock signal; 2) the slope of the output signal is calculated from (14) and applied in (13) to determine the capacitive load of the following branch, permitting the slope of the output signal to be calculated; and 3) step 2 is repeated for each subsequent branch of the clock path. Steps 1-3 are applied to the remaining clock paths within the clock tree. Observe that if the transition time of the output signal of branch  $b_i$  does not satisfy

$$t_{\text{Ti}} \le \frac{1}{\left(\frac{1}{2} - \frac{1 - v_{T}}{1 + \alpha}\right)} \left(t_{\text{di+1}} - \frac{V_{\text{DD}}C_{\text{Li+1}}}{2I_{\text{DO}}}\right), \quad (15)$$

(13) is no longer valid. The transition time $t_{Ti}$ can be reduced in order to satisfy (15) by increasing the output current drive of the inverter in branch $b_i$. However, increasing $I_{DOi}$ would increase the capacitive load $C_{Li}$ in order to maintain the propagation delay $t_{di}$ for branch $b_i$. Therefore, the transition time associated with branch $b_i$ must be maintained constant as long as the propagation delay $t_{di}$ of the branch $b_i$ remains the same. Furthermore, the number of inverters required to implement the propagation delay $t_{di}$ is chosen such that (15) is satisfied and the proper polarity of the clock signal driving branch $b_{i+1}$ is maintained.
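Steps 1 to 3 of the design procedure can be sketched as a loop that alternates between (13) and (14) along one clock path. This is a minimal sketch under stated assumptions: one inverter per branch, identical device parameters for every inverter, and a hypothetical `C_min` parameter standing for the smallest load that can be realized. Rejecting a load below `C_min` is how the sketch interprets condition (15). All numeric values are illustrative.

```python
import math


def transition_coeff(v_T: float, alpha: float) -> float:
    """Coefficient that multiplies the input transition time in (13)."""
    return 0.5 - (1.0 - v_T) / (1.0 + alpha)


def load_for_delay(t_d, t_T_prev, I_DO, V_DD, v_T, alpha):
    """Capacitive load required for branch delay t_d, from (13)."""
    return (2.0 * I_DO / V_DD) * (t_d - transition_coeff(v_T, alpha) * t_T_prev)


def output_transition(C_L, I_DO, V_DD, V_DO):
    """Transition time of the output signal, from (14)."""
    log_term = math.log(10.0 * V_DO / (math.e * V_DD))
    return (C_L * V_DD / I_DO) * (0.9 / 0.8 + V_DO / (0.8 * V_DD) * log_term)


def design_clock_path(branch_delays, I_DO, V_DD, V_DO, V_th, alpha, C_min=0.0):
    """Return the load of each branch along one clock path (root first)."""
    v_T = V_th / V_DD
    t_T = 0.0  # step 1: a step input is assumed at the initial trunk
    loads = []
    for t_d in branch_delays:
        C_L = load_for_delay(t_d, t_T, I_DO, V_DD, v_T, alpha)
        if C_L <= C_min:
            # The input transition is too slow for this delay; see (15).
            raise ValueError("branch delay %g cannot be implemented" % t_d)
        loads.append(C_L)
        # Step 2: the output slope of this branch drives the next branch.
        t_T = output_transition(C_L, I_DO, V_DD, V_DO)
    return loads


# Hypothetical device parameters (SI units) and two 0.5 ns branches.
example_loads = design_clock_path([0.5e-9, 0.5e-9], I_DO=1e-3, V_DD=5.0,
                                  V_DO=2.0, V_th=1.0, alpha=1.3)
```

The second branch receives a smaller load than the first even though both implement the same delay, because part of its delay budget is consumed by the finite transition time of its input signal.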

# 5. Increasing Tolerance to Process Parameter Variations

Every semiconductor fabrication process can be characterized by variations in process parameters. These process parameter variations along with environmental variations, such as temperature, supply voltage, and radiation, may compromise both the performance and the reliability of the clock distribution network. A bottom-up approach is presented in this section for verifying the selected clock skew values and correcting for any variations of the clock skew due to process parameter variations that violate the bounds of the permissible range.

## 5.1. Circuit Design Considerations

Each clock path delay can be modeled as being composed of both a deterministic delay component and a probabilistic delay component. While the deterministic component can be characterized with well developed delay models [e.g., 23], the probabilistic component of the clock path delay is dependent upon variations of the fabrication process and the environmental conditions. The variations of the fabrication process affect both the active device parameters (e.g.,  $I_{DO}$ ,  $V_{th}$ ,  $\mu_o$ ) and the passive geometric parameters (e.g., the interconnect width and spacing).

The probabilistic delay component is determined for each clock path by assuming that the cumulative effects of the device parameter variations, such as threshold voltage and channel mobility, can be collected into a single parameter characterizing the gain of the inverter, specifically the output current of a CMOS inverter $I_{DO}$ [23]. The minimum and maximum clock path delays are calculated considering the minimum and maximum $I_{DO}$ of each inverter within a branch of the clock distribution network. The worst case variation of the clock skews is determined from the minimum and maximum clock path delays of each local data path. If at least one worst case clock skew value is outside the effective permissible range of the corresponding local data path (i.e., $T_{\text{Skew}\,ij} \notin \rho(L_{ij})$), a timing constraint is violated and the circuit will not work properly, as illustrated in the example shown in Fig. 6.

Figure 6. Example of upper and lower bound clock skew violations.
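The worst case check can be sketched as follows. The sketch makes several assumptions that are not stated in the text: with fixed loads, (13) and (14) make every branch delay inversely proportional to $I_{DO}$, so a relative variation `rel_var` of $I_{DO}$ scales a nominal path delay by $1/(1 \pm$ `rel_var`$)$; the two clock paths are treated as varying independently, which is pessimistic when they share branches; and the skew is taken as the delay to the initial register minus the delay to the final register. The numeric values are hypothetical.

```python
def delay_bounds(nominal_delay, rel_var):
    """(min, max) path delay when I_DO varies by +/- rel_var (delay ~ 1/I_DO)."""
    return nominal_delay / (1.0 + rel_var), nominal_delay / (1.0 - rel_var)


def worst_case_skew(delay_i, delay_j, rel_var):
    """(lowest, highest) skew between two independently varying clock paths."""
    min_i, max_i = delay_bounds(delay_i, rel_var)
    min_j, max_j = delay_bounds(delay_j, rel_var)
    return min_i - max_j, max_i - min_j


def violated_bounds(skew_range, permissible):
    """List which bounds of the effective permissible range are violated."""
    lowest, highest = skew_range
    lower, upper = permissible
    violations = []
    if lowest < lower:
        violations.append("lower")
    if highest > upper:
        violations.append("upper")
    return violations


# Hypothetical local data path: nominal skew of 4.0 - 7.0 = -3.0 ns,
# effective permissible range [-8, -2] ns.
wide = worst_case_skew(4.0, 7.0, rel_var=0.15)
narrow = worst_case_skew(4.0, 7.0, rel_var=0.05)
wide_violations = violated_bounds(wide, (-8.0, -2.0))
narrow_violations = violated_bounds(narrow, (-8.0, -2.0))
```

The list returned by `violated_bounds` corresponds to the information passed back to the top-down synthesis system, namely which bound of the effective permissible range is violated.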

This violation is passed to the top-down synthesis system, indicating which bound of the effective permissible range is violated. The clock skew of at least one local data path $L_{ij}$ within the system may violate the upper bound of $\rho(L_{ij})$, i.e., $T_{\text{Skew}\,ij} > T_{\text{Skew}\,ij(\text{max})}$. Observe that if $\rho(L_{ij}) = Pl(L_{ij})$, $T_{\text{Skew}\,ij}$ does not satisfy (3), shown as region C in Fig. 2, causing zero clocking [11]. By increasing the clock period $T_{CP}$, the effective permissible clock skew range for each local data path is also increased ($T_{\text{Skew}\,ij(\text{max})}$ is increased due to monotonicity), permitting those local data paths previously in region C to satisfy (3). The new clock skew value may also violate the lower bound of a local data path, i.e., $T_{\text{Skew}\,ij} < T_{\text{Skew}\,ij(\text{min})}$, where $T_{\text{Skew}\,ij(\text{min})} \in \rho(L_{ij})$. Observe that if $\rho(L_{ij}) = Pl(L_{ij})$, $T_{\text{Skew}\,ij}$ does not satisfy (2), shown as region A in Fig. 2, causing double clocking [11]. This situation is potentially dangerous since the lower bound of $Pl(L_{ij})$ is independent of the clock frequency, causing the circuit to function improperly.

Two compensation techniques are used to prevent lower bound violations, depending upon where the effective permissible range of a local data path $\rho(L_{ij})$ is located within the absolute permissible range of the local data path, $Pl(L_{ij})$. If the worst case clock skew is in between the lower bounds of $\rho(L_{ij})$ and $Pl(L_{ij})$, $\text{MIN}[Pl(L_{ij})] < T_{\text{Skew}\,ij} < \text{MIN}[\rho(L_{ij})]$, the clock period $T_{CP}$ is increased until the race condition is eliminated, since the effective permissible range will increase, due to monotonicity. If the worst case clock skew is less than the lower bound of the permissible range of the local data path, $T_{\text{Skew}\,ij} < \text{MIN}[Pl(L_{ij})]$, any increase in the clock period will not eliminate the synchronization failure since (2) is not dependent on the clock period. To compensate for this violation a safety term $\zeta_{ij} > 0$ is added to the local timing constraint that defines the lower bound of $Pl(L_{ij})$ [see (2)]. The clock period is increased and a new clock skew schedule is calculated for this value of the clock period. The increased clock period is required to obtain a set of effective permissible ranges with widths equal to or greater than the set of effective permissible ranges that existed before the clock skew violation. Observe that by including the safety term $\zeta_{ij}$, the lower bound of the clock skew of the local data path containing the race condition is shifted to the right (see Fig. 2), moving the new clock skew schedule of the entire circuit away from the bound violation and removing any race conditions. This iterative process continues until the worst case variations of the selected clock skews no longer violate the corresponding effective permissible range of each local data path.
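The choice between the compensation techniques can be summarized as a small decision function. This is only a sketch of the case analysis described above: the `Action` names are hypothetical, each range is represented as a `(lower, upper)` pair, and the clock skew rescheduling itself is not modeled.

```python
from enum import Enum


class Action(Enum):
    NONE = "skew is inside the effective permissible range"
    INCREASE_PERIOD = "increase the clock period T_CP"
    ADD_SAFETY_TERM = "add a safety term to (2), increase T_CP, and reschedule"


def compensation(worst_skew, rho, pl):
    """Select the compensation for one local data path.

    rho: effective permissible range, pl: absolute permissible range,
    with rho assumed to lie inside pl.
    """
    rho_lower, rho_upper = rho
    pl_lower, _ = pl
    if worst_skew > rho_upper:
        # Upper bound violation: a longer clock period widens the range.
        return Action.INCREASE_PERIOD
    if worst_skew < pl_lower:
        # Below the absolute lower bound: the clock period alone cannot help.
        return Action.ADD_SAFETY_TERM
    if worst_skew < rho_lower:
        # Between the lower bounds of Pl and rho.
        return Action.INCREASE_PERIOD
    return Action.NONE


# Hypothetical ranges in ns.
rho_example, pl_example = (-4.0, -2.0), (-6.0, -2.0)
actions = [compensation(s, rho_example, pl_example)
           for s in (-1.0, -3.0, -5.0, -7.0)]
```

In the iterative process described above, this decision would be re-evaluated for every local data path after each new clock skew schedule, until every path returns `Action.NONE`.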

# 6. Simulation Results

The simulation results presented in this section illustrate the performance improvements obtained by exploiting non-zero clock skew. In order to demonstrate these performance improvements, a set of ISCAS-89 sequential circuits is chosen as benchmark circuits. The performance results are illustrated in Table 1. The number of registers and gates within the circuit including the I/O registers are shown in Column 2. The upper bound of the clock period assuming zero clock skew $T_{CP0}$ is shown in Column 3. The clock period obtained with intentional clock skew $T_{CPi}$ is shown in Column 4. The resulting performance gain is shown in Column 5. The clock period obtained with the constraint of zero clock skew imposed among the I/O registers is shown in Column 6 while the performance gain with respect to zero I/O skew is shown in Column 7.

The results shown in Table 1 clearly demonstrate reductions of the minimum clock period when intentional clock skew is exploited. The amount of reduction is dependent on the characteristics of each circuit, particularly the differences in propagation delay between each local data path. Note also that by constraining the clock skew of the I/O registers to zero, circuit speed can be improved, although less than if this I/O constraint is not used.

Table 1. Performance improvement with non-zero clock skew.

| Circuit | Size (# registers/# gates) | $T_{CP0}$ ($T_{\text{Skew}\,ij} = 0$) | $T_{CPi}$ ($T_{\text{Skew}\,ij} \neq 0$) | Gain (%) | $T_{CP}$ ($T_{\text{Skew I/O}} = 0$) | Gain (%) |
|---------|----------------------------|------------------------------------------|----------------------------------------------|------------------------------------------------------------|------|------|
| ex1     | 20/-                       | 11.0                                     | 6.3                                          | 43.0                                                       | 7.2  | 35.0 |
| s27     | 7/10                       | 9.2                                      | 6.6                                          | 28.0                                                       | 9.2  | 0.0  |
| s298    | 23/119                     | 16.2                                     | 11.6                                         | 28.0                                                       | 11.6 | 28.0 |
| s344    | 35/160                     | 28.4                                     | 25.6                                         | 9.9                                                        | 25.6 | 9.9  |
| s386    | 20/159                     | 19.8                                     | 19.8                                         | 0.0                                                        | 19.8 | 0.0  |
| s444    | 30/181                     | 18.6                                     | 12.2                                         | 34.4                                                       | 12.2 | 34.4 |
| s510    | 32/211                     | 19.8                                     | 17.3                                         | 13.0                                                       | 17.3 | 13.0 |
| s938    | 67/446                     | 27.0                                     | 21.4                                         | 20.7                                                       | 25.0 | 7.4  |
| s1196   | 45/529                     | 37.0                                     | 30.8                                         | 16.8                                                       | 37.0 | 0.0  |
| s1512   | 89/780                     | 53.2                                     | 43.2                                         | 18.8                                                       | 53.2 | 0.0  |
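The gain columns of Table 1 are consistent with a relative reduction of the clock period. The expression below is inferred from the tabulated values rather than stated in the text, so it should be read as an assumption. For s444 it gives

$$\text{Gain} = \frac{T_{CP0} - T_{CPi}}{T_{CP0}} \times 100\% = \frac{18.6 - 12.2}{18.6} \times 100\% \approx 34.4\%,$$

which matches the entry in Column 5. Applying the same expression to s938 with the zero I/O skew constraint gives $(27.0 - 25.0)/27.0 \times 100\% \approx 7.4\%$, in agreement with Column 7.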

Table 2. Worst case variations in clock skew due to process parameter variations, assuming a $\pm 15\%$ variation of $I_{DO}$.

| Circuit | $T_{CP0}/T_{CPi}$ | Gain (%) | Permissible range | Selected clock skew | Simulated skew, nominal (ns) | Simulated skew, worst case (ns) | Error, nominal (%) | Error, worst case (%) |
|---------|-------------------------------------|---------|-------------------|---------------------|---------------------|------------|-----------|------------|
| cdn 1   | 11/9                                | 18.0    | [-8, -2]          | -3.0                | -3.0                | -2.10      | 0.0       | 30.0       |
| cdn 2   | 18/15                               | 17.0    | [-6.8, -1.4]      | -4.2                | -4.1                | -3.3       | 2.4       | 21.4       |
| cdn 3   | 27/18                               | 33.0    | [-14, 2.3]        | 1.1                 | 1.14                | 1.3        | 3.6       | 18.2       |

Clock distribution networks which exploit intentional clock skew and are less sensitive to the effects of process parameter variations are depicted in Table 2. The ratio of the minimum clock period assuming zero clock skew $T_{CP0}$ to the minimum clock period with intentional clock skew $T_{CPi}$ and the per cent improvement are shown in Columns 2 and 3, respectively. The permissible range most susceptible to process parameter variations is illustrated in Column 4. The selected clock skew is shown in Column 5. In Columns 6 and 7, respectively, the nominal and maximum clock skew are depicted, assuming a $\pm 15\%$ variation of the drain current $I_{DO}$ of each inverter. Note that both the nominal and the worst case value of the clock skew are within the permissible range. The per cent variation of clock skew due to the effects of process parameter variations is shown in Columns 8 and 9. This result confirms the claim stated previously that variations in clock skew due to process parameter variations can be both tolerated and compensated.
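The entries of Table 2 can be cross-checked with a few lines of code. The data below are copied from Table 2; the error definition (deviation of the simulated skew from the selected skew, relative to the selected skew) is inferred from the tabulated values and is an assumption rather than a definition given in the text.

```python
# (circuit, permissible range, selected skew, nominal skew, worst case skew)
TABLE_2 = [
    ("cdn 1", (-8.0, -2.0), -3.0, -3.0, -2.10),
    ("cdn 2", (-6.8, -1.4), -4.2, -4.1, -3.3),
    ("cdn 3", (-14.0, 2.3), 1.1, 1.14, 1.3),
]


def percent_error(selected, simulated):
    """Deviation from the selected skew, relative to the selected skew."""
    return abs(simulated - selected) / abs(selected) * 100.0


def within(value, bounds):
    """True if value lies inside the closed interval bounds."""
    lower, upper = bounds
    return lower <= value <= upper


report = []
for name, bounds, selected, nominal, worst in TABLE_2:
    report.append({
        "circuit": name,
        "nominal_ok": within(nominal, bounds),
        "worst_ok": within(worst, bounds),
        "nominal_error": round(percent_error(selected, nominal), 1),
        "worst_error": round(percent_error(selected, worst), 1),
    })
```

Under this definition the computed errors reproduce Columns 8 and 9, and every nominal and worst case skew lies inside the permissible range of Column 4.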

# 7. Conclusions

An integrated top-down, bottom-up approach is presented for synthesizing clock distribution networks tolerant to process parameter variations. In the top-down phase, the clock skew schedule and permissible ranges of each local data path are calculated while minimizing the clock period. The process of determining the bounds of the permissible ranges and selecting the clock skew value for each local data path so as to minimize the effects of process parameter variations is described. Rather than placing limits or bounds on the clock skew variations, this approach guarantees that each selected clock skew value is within the permissible range despite worst case variations of the clock skew. Techniques for designing the topology and the CMOS-based circuit structure of the clock trees are presented. In the bottom-up phase, worst case variations of clock skew due to process parameter variations are determined from the specific clock distribution network. Variations are compensated by the proper choice of clock skew for each local data path. Results of optimizing the clock skew schedule of several MCNC/ISCAS-89 benchmark circuits are presented. A schedule of the clock skews to make a clock distribution network less sensitive to process parameter variations is presented for several example networks. An 18% improvement in clock frequency with up to a 30% variation in the nominal clock skew, and a 33% improvement in clock frequency with up to an 18% variation in the nominal clock skew are demonstrated for several example circuits.

# References

1. S. Pullela, N. Menezes, J. Omar, and L.T. Pillage, "Skew and delay optimization for reliable buffered clock trees," *Proceedings of the IEEE International Conference on Computer-Aided Design*, pp. 556–562, Nov. 1993.
2. Q. Zhu, W.W.-M. Dai, and J.G. Xi, "Optimal sizing of high-speed clock networks based on distributed RC and lossy transmission line models," *Proceedings of the IEEE International Conference on Computer-Aided Design*, pp. 628–633, Nov. 1993.
3. J. Cong and K.-S. Leung, "Optimal wiresizing under the distributed Elmore delay model," *Proceedings of the IEEE International Conference on Computer-Aided Design*, pp. 634–639, Nov. 1993.
4. J. Cong and C.-K. Koh, "Simultaneous driver and wire sizing for performance and power optimization," *IEEE Transactions on VLSI Systems*, Vol. VLSI-2, No. 4, pp. 408–425, Dec. 1994.
5. H.B. Bakoglu, J.T. Walker, and J.D. Meindl, "A symmetric clock-distribution tree and optimized high-speed interconnections for reduced clock skew in ULSI and WSI circuits," *Proceedings of the IEEE International Conference on Computer Design*, pp. 118–122, Oct. 1986.
6. T.-H. Chao, Y.-C. Hsu, J.-M. Ho, K.D. Boese, and A.B. Kahng, "Zero skew clock routing with minimum wirelength," *IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing*, Vol. CAS-39, No. 11, pp. 799–814, Nov. 1992.
7. R.-S. Tsay, "An exact zero-skew clock routing algorithm," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. CAD-12, No. 2, pp. 242–249, Feb. 1993.
8. S. Lin and C.K. Wong, "Process-variation-tolerant clock skew minimization," *Proceedings of the IEEE International Conference on Computer-Aided Design*, pp. 284–288, Nov. 1994.
9. M. Shoji, "Elimination of process-dependent clock skew in CMOS VLSI," *IEEE Journal of Solid-State Circuits*, Vol. SC-21, No. 5, pp. 875–880, Oct. 1986.
10. E.G. Friedman, *Clock Distribution Networks in VLSI Circuits and Systems*, IEEE Press, 1995.
11. J.P. Fishburn, "Clock skew optimization," *IEEE Transactions on Computers*, Vol. C-39, No. 7, pp. 945–951, July 1990.
12. R.B. Deokar and S. Sapatnekar, "A graph-theoretic approach to clock skew optimization," *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 407–410, May 1994.
13. J.L. Neves and E.G. Friedman, "Design methodology for synthesizing clock distribution networks exploiting non-zero localized clock skew," *IEEE Transactions on VLSI Systems*, Vol. VLSI-4, No. 2, pp. 286–291, June 1996.
14. J.L. Neves and E.G. Friedman, "Synthesizing distributed buffer clock trees for high performance ASICs," *Proceedings of the IEEE ASIC Conference*, pp. 126–129, Sept. 1994.
15. E.G. Friedman, "Latching characteristics of a CMOS bistable register," *IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications*, Vol. CAS I-40, No. 12, pp. 902–908, Dec. 1993.
16. K.A. Sakallah, T.N. Mudge, and O.A. Olukotun, "CheckTc and minTc: Timing verification and optimal clocking of synchronous digital circuits," *Proceedings of the IEEE/ACM Design Automation Conference*, pp. 111–117, June 1990.
17. T.G. Szymanski, "Computing optimal clock schedules," *Proceedings of the IEEE/ACM Design Automation Conference*, pp. 399–404, June 1992.
18. J.L. Neves and E.G. Friedman, "Optimal clock skew scheduling tolerant to process variations," *Proceedings of the ACM/IEEE Design Automation Conference*, pp. 623–628, June 1996.
19. J.L. Neves, "Synthesis of Clock Distribution Networks for High Performance VLSI/ULSI-Based Synchronous Digital Systems," Ph.D. Dissertation, University of Rochester, Dec. 1995.
20. J.L. Neves and E.G. Friedman, "Topological design of clock distribution networks based on non-zero clock skew specifications," *Proceedings of the IEEE Midwest Symposium on Circuits and Systems*, pp. 461–471, Aug. 1993.
21. S. Dhar and M.A. Franklin, "Optimum buffer circuits for driving long uniform lines," *IEEE Journal of Solid-State Circuits*, Vol. SC-26, No. 1, pp. 32–40, Jan. 1991.
22. J.L. Neves and E.G. Friedman, "Circuit synthesis of clock distribution networks based on non-zero clock skew," *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 4.175–4.178, May 1994.
23. T. Sakurai and A.R. Newton, "Alpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas," *IEEE Journal of Solid-State Circuits*, Vol. SC-25, No. 2, pp. 584–594, April 1990.



José Luis P.C. Neves received the B.S. degree in Electrical Engineering in 1986, and the M.S. degree in Computer Science in 1989 from the Federal University of Minas Gerais (UFMG), Brazil. He received the M.S. and Ph.D. degrees in electrical engineering from the University of Rochester, New York, in 1991 and 1995, respectively.

He was with the Physics Department of the UFMG as an electrical engineer from 1986 to 1987, where he managed the automation of several research laboratories, designing data acquisition equipment and writing programs for data collection and analysis. He was a Teaching and Research Assistant at the University of Rochester from 1990 to 1995. He was a computer systems administrator with the Laboratory of Respiratory Physiology in the Department of Anesthesiology, University of Rochester from 1992 to 1996, writing programs for data collection and analysis, and designing the supporting electronic equipment. He has been with IBM Microelectronics since 1996 as an advisory engineer/scientist responsible for developing and implementing clock distribution design and synthesis tools. His research interests include high performance VLSI/IC design and analysis, timing issues in VLSI design, and CAD tool and methodology development with application to the design and synthesis of clock distribution networks, low power circuits, and CMOS circuit design techniques tolerant to process parameter variations.

Dr. Neves received a Doctoral Fellowship from the National Research Council (CNPq) Brazil from 1990 to 1994. He is a member of the Technical Program Committee of ISCAS '97.

neves@ee.rochester.edu



Eby G. Friedman was born in Jersey City, New Jersey in 1957. He received the B.S. degree from Lafayette College, Easton, PA in 1979, and the M.S. and Ph.D. degrees from the University of California, Irvine, in 1981 and 1989, respectively, all in electrical engineering.

He was with Philips Gloeilampen Fabrieken, Eindhoven, The Netherlands, in 1978 where he worked on the design of bipolar differential amplifiers. From 1979 to 1991, he was with Hughes Aircraft Company, rising to the position of manager of the Signal Processing Design and Test Department, responsible for the design and test of high performance digital and analog IC's. He has been with the Department of Electrical Engineering at the University of Rochester, Rochester, NY, since 1991, where he is an Associate Professor and Director of the High Performance VLSI/IC Design and Analysis Laboratory. His current research and teaching interests are in high performance microelectronic design and analysis with application to high speed portable processors and low power wireless communications.

He has authored two book chapters and many papers in the fields of high speed and low power CMOS design techniques, pipelining and retiming, and the theory and application of synchronous clock distribution networks, and has edited one book, Clock Distribution Networks in VLSI Circuits and Systems (IEEE Press, 1995). Dr. Friedman is a Senior Member of the IEEE, a Member of the editorial board of Analog Integrated Circuits and Signal Processing, Chair of the VLSI Systems and Applications CAS Technical Committee, Chair of the VLSI track for ISCAS '96 and '97, and a Member of the technical program committee of a number of conferences. He was a Member of the editorial board of the IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, Chair of the Electron Devices Chapter of the IEEE Rochester Section, and a recipient of the Howard Hughes Masters and Doctoral Fellowships, an NSF Research Initiation Award, an Outstanding IEEE Chapter Chairman Award, and a University of Rochester College of Engineering Teaching Excellence Award.

friedman@ee.rochester.edu