# FOPAC: Flexible On-Chip Power and Clock

Ragh Kuttappa, Student Member, IEEE, Selçuk Köse, Member, IEEE, and Baris Taskin, Senior Member, IEEE

Abstract—A novel flexible on-chip power and clock (FOPAC) generation and distribution circuit is proposed to enable fast dynamic voltage and frequency scaling (DVFS). FOPAC utilizes resonant rotary clocks (ReRoCs) along with multi-phase voltage regulators (MPVR) for the clock and power generation and distribution. The locally distributed ReRoCs provide the required clock phases to the MPVR, and the MPVR provides the required voltage levels to the ReRoC, providing spatial and temporal flexibility for fast DVFS. The ReRoC and MPVR share the on-chip fly capacitor of the switched capacitor voltage regulators to achieve greater frequency scaling at run-time while reducing the overhead. The FOPAC architecture is evaluated on industrial designs demonstrating a <2 ns DVFS switching time.

Index Terms—Resonant rotary clock, voltage regulators, low power, VLSI.

#### I. INTRODUCTION

Modern integrated circuits have an increasing need for various levels of both supply voltage (V) and operating frequency (f) available at fine spatial and temporal granularity. This work introduces a unique solution that provides a large number of high-quality, locally distributed V/f domains through FOPAC, as shown in Figure 1. Opportunistically sharing design resources and features between multi-phase voltage regulators (MPVRs) and resonant rotary clocks (ReRoCs) enables i) scalability to hundreds of domains, ii) fast switching times for both voltage and frequency, leading to temporal flexibility, and iii) locally distributed designs, leading to spatial flexibility.

The performance improvements and power savings enabled by flexible on-chip power and clocks (FOPAC) are motivated in Figure 2 with shaded regions. When higher performance is needed, the voltage is scaled up ($V_0$ to $V_1$), followed by frequency up-scaling ($f_0$ to $f_1$). The speed of V/f up-scaling enables the high-performance node to start at time $t_2$ as opposed to $t_4$ in Figure 2. Alternatively, when lower performance is sufficient (i.e., for better energy efficiency), the frequency is decreased, followed by voltage down-scaling. Here, the speed of down-scaling enables more time to be spent in

Manuscript received March 5, 2019; revised June 6, 2019 and July 20, 2019; accepted July 24, 2019. Date of publication August 27, 2019; date of current version December 6, 2019. This article was recommended by Associate Editor F. J. Kurdahi. (Corresponding author: Ragh Kuttappa.)

- R. Kuttappa and B. Taskin are with the Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA 19104 USA (e-mail: fr67@drexel.edu; taskin@coe.drexel.edu).
- S. Köse is with the Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY 14627 USA (e-mail: selcuk.kose@rochester.edu).

Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSI.2019.2934009



Fig. 1. FOPAC topology with resonant rotary clocks (ReRoC) and multiphase voltage regulators (MPVR).



Fig. 2. Fast and symmetric DVFS with FOPAC.

the lower performance node, initiated at time t<sub>8</sub> as opposed to t<sub>9</sub> in Figure 2. The granularities of the voltage V and frequency f values achievable with specific hardware implementations also impact the energy-efficiency. Costly implementations of fractional PLLs are used to provide frequency granularity (such as in [1]), whereas a multitude of voltage regulators and power grids are common to provide voltage granularity.

FOPAC provides spatially and temporally flexible power/clock domains that are fine-tuned for each individual unit and collectively designed with shared overhead and superior performance. This flexibility comes with significant improvements in power, performance, area, and accuracy, thanks to the opportunistic design of the MPVR and the ReRoC, leading to the following novelties:

- 1) fast switching between different V/f pairs,
- 2) symmetric switching between V/f pairs to improve power savings and performance, as illustrated in Figure 2,
- 3) improved granularity of frequency values, without the overhead of multiple fractional PLLs distributed locally,



Fig. 3. FOPAC circuit with the multiphase voltage regulator sharing the fly capacitor with the resonant rotary clock.



Fig. 4. 2:1 Switched capacitor voltage regulator (SCVR).

- 4) opportunistic sharing of the fly capacitor as illustrated in Figure 3, to reduce design overhead and to help scale the ReRoC frequency,
- 5) power savings of  $\approx$ 35% as compared to PLL based designs.

The preliminaries encapsulating the circuit level aspects necessary for the proposed DVFS approach with ReRoCs and MPVRs are presented in Section II. The proposed architecture is presented in Section III. The simulation setup and results are discussed in Section IV. Conclusions are provided in Section V.

#### II. PRELIMINARIES

The following sections discuss the on-chip voltage regulator and resonant rotary clock background, and prior works.

## A. Switched Capacitor Voltage Regulator

On-chip voltage regulators (OCVRs) have been widely studied in prior works, making their implementation feasible in traditional CMOS processes [2]-[5]. OCVRs can provide faster voltage scaling, reduce the number of I/O pins dedicated to power, and facilitate fine-granularity power management techniques [2], [4], [6]. Switched capacitor voltage regulators (SCVRs) utilize fly capacitors to generate a DC output voltage [2]. A schematic of a 2:1 SCVR is illustrated in Figure 4. SCVRs are designed with non-overlapping signals $\phi_1$ and $\phi_2$ that operate in the MHz frequency range [7]. The intrinsic, switching, and conduction losses related to the fly capacitors result in lower conversion efficiency. To overcome the ripple at the output, multi-stage interleaving is proposed [3], [8]. Interleaving requires multiple phases of the clock [3], [8], which in turn require dedicated and robust clock sources.

## B. Resonant Rotary Clock Design

Resonant rotary clocks (ReRoCs) are a type of resonant clocking with constant magnitude, low power, low jitter,



Fig. 5. Dynamic resonant rotary frequency divider [9].

and multiple phases [10]. ReRoCs are designed using IC interconnects for the transmission lines and inverter pairs that are uniformly distributed along the transmission lines in antiparallel fashion, as illustrated in Figure 3. The ReRoC is modeled as an LC oscillator, where the frequency is estimated by,

$$f_{osc} = \frac{1}{2\sqrt{L_T C_T}}. \tag{1}$$

In Eq. (1), $f_{osc}$ is the operating frequency of the ReRoC. The total inductance and total capacitance are given by $L_T$ and $C_T$, respectively [10]. The most efficient ReRoC designs are sparse rotary oscillator arrays (SROAs) [11], similar to a non-uniform clock mesh topology. The capacitive load and inductance affect the frequency of oscillation. The SROA is correct by design, owing to algorithmic novelties in the local distribution proposed for use in the FOPAC methodology.
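As a quick numerical illustration of Eq. (1), the following minimal sketch evaluates $f_{osc}$ for a given $L_T$ and $C_T$. The function name and the component values are hypothetical and chosen only to give a round number; they are not taken from the FOPAC design.

```python
import math

def rotary_frequency_hz(l_total_h, c_total_f):
    """Eq. (1): f_osc = 1 / (2 * sqrt(L_T * C_T)), with L_T in henries and C_T in farads."""
    return 1.0 / (2.0 * math.sqrt(l_total_h * c_total_f))

# Hypothetical values for illustration only (not from the paper).
L_T = 2.0e-9    # total inductance: 2 nH
C_T = 5.0e-12   # total capacitance: 5 pF

f_nominal = rotary_frequency_hz(L_T, C_T)
f_loaded = rotary_frequency_hz(L_T, 1.25 * C_T)  # 25% more capacitive load

print(f"nominal: {f_nominal / 1e9:.2f} GHz")
print(f"loaded:  {f_loaded / 1e9:.2f} GHz")
```

With these assumed values the nominal frequency is 5 GHz, and increasing the capacitive load lowers it, which is the dependence the text describes.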

The improved granularity of frequency values, without the overhead of multiple local fractional PLLs, is achieved via the use of the resonant frequency dividers [9]. The frequency dividers, as illustrated in Figure 5, are designed with spot advancing blocks (SABs) and transmission gates for the multiplexers to maintain the adiabatic property of the ReRoCs. The building blocks are the multi-input SABs (MISABs) and multi-output SABs (MOSABs), as illustrated in Figure 5. The CLK1 through CLK8 inputs of the SABs are the multiple phases readily available on each local ReRoC building block of the SROA. In particular, CLK1 is shifted by 0° from a reference, whereas CLKN is shifted by $(N-1) \cdot 360/N$ degrees. The dynamic frequency division with improved granularity proposed in this work is accomplished by topologically changing the connections CLK1 through CLK8 shown in Figure 5.

## C. Prior Work

The existing resonant DVFS approaches have relatively low frequency scalability and require bulky inductors and capacitors [5], [12]–[14], making it challenging to achieve high energy efficiency. On-chip SCVRs have been prototyped targeting high conversion efficiency [2], [3]. These prior works achieve good conversion efficiencies but require robust multi-phase non-overlapping clock signals. The generation and synchronization of the input clock signal to each multi-phase voltage regulator (MPVR) stage becomes quite costly and even infeasible when the number of phases is very high [15], [16]. A prototype implementation of the MPVRs utilized ring oscillators to generate the multi-phase clock signals [15]. Prototype implementations of ReRoCs on silicon have been presented to explore the energy efficiency and the multiple



Fig. 6. FOPAC device: MPVR architecture with ReRoCs.

TABLE I PRIOR WORKS WITH SILICON PROTOTYPES

| Design                            | Restle [5]                  | Rahman [12]       | Lu [15]           |
|-----------------------------------|-----------------------------|-------------------|-------------------|
|                                   | ISSCC 2014                  | JSSC 2018         | ISSCC 2015        |
| Technology                        | 22 nm SOI                   | 65 nm bulk        | 65 nm bulk        |
| Results                           | Experimental                | Experimental      | Experimental      |
| Clock source                      | PLL+resonant grid           | PLL+resonant grid | VCO               |
| System resonant                   | Always                      | Always            | No                |
| Resonant DVFS                     | No                          | Yes               | No, DVS only      |
| Voltage range                     | 0.75-1.05 V                 | 0.7-1.2 V         | 0.6-1.2 V         |
| Frequency range                   | 2.5-5 GHz                   | DC-132 MHz        | -                 |
| Inductor                          | On chip (0.3-2.5 nH each)   | Off chip (7 nH)   | -                 |
| Voltage regulator (VR)            | Yes                         | No                | Yes               |
| DVS speed                         | DNR                         | -                 | 2.5 V/$\mu$s      |
| $\eta_{max}$ of VR                | DNR                         | -                 | 78.3%             |
| $\rho$ (W/mm$^2$) @ $\eta_{max}$  | DNR                         | -                 | 0.18              |
| Power reduction                   | 36%                         | 34%-38%           | -                 |
| Clock power reduction             | -                           | -                 | -                 |

DNR - Did not report

phases [10], [17]. Rotary clock based distribution networks have been well studied in recent years, focusing on optimization of the local and global clock networks [11], [18]–[22]. Additionally, there has been some work on the co-design of power and clock distribution networks [23], which focuses on the global interconnect rather than the fusion of power and clock generation circuitry. In Table I, the silicon-measured results for DVFS implementations with resonant clocking and DVS implementations with multiphase SCVRs are presented.

#### III. PROPOSED FOPAC ARCHITECTURE

The flexible on-chip power and clock (FOPAC) architecture encompasses the FOPAC circuit building blocks, as illustrated in Figure 3, distributed through a circuit, as illustrated in Figure 1, using an ASIC-flow-compliant methodology. The on-chip voltage regulators are distributed throughout the power grid and the ReRoCs are distributed locally. The FOPAC methodology is detailed in the following sections, first focusing on DVFS operation (Section III-A and Section III-B) and next on describing the integration with the standard ASIC flow (Section III-C).

## A. Dynamic Voltage Scaling With MPVR

The circuit topology for the proposed MPVR, which is an integrated SCVR with ReRoC for FOPAC, is illustrated in Figure 6. MPVRs are designed with ReRoCs that provide the multiple clock phases for the interleaved operation. The voltage ripple across the capacitor is reduced with multiphase interleaving [3], [8], and the ReRoCs provide higher



Fig. 7. Integrator logic [24].

granularity of phases. The adaptive gain comparators and the integrator logic in the feedback loop are illustrated in Figures 6 and 7, respectively [24]. The comparators that provide high gain in the design are driven by a higher frequency clock signal ($\operatorname{CLK}_{Hf}$), whereas the comparators that provide nominal gain are driven by a lower frequency clock signal ($\operatorname{CLK}_{Lf}$) [24]. This significantly improves the speed of the regulation at the output of the MPVR while also maintaining the stability of the control loop. The two different clock frequencies are provided from the same ReRoC, with the minimal overhead of an additional frequency divider. A double-tail latch-type comparator architecture is chosen for its speed and low kickback noise. To increase the settling speed of the integrator, several integrator regions are designed with different step sizes based on the difference between the output and reference voltages. This technique helps the integrator keep up with the actual current requirement at the output rather than setting it to the maximum value for fast recovery [8].

Overall, one ReRoC structure with k dividers can provide the k distinct clock frequencies for comparator operation and the m multi-phase signals, shown in Figure 6 for k=2 (i.e., for $\operatorname{CLK}_{Hf}$ and $\operatorname{CLK}_{Lf}$). ReRoCs are designed in the GHz frequency range, and the clock for the SCVR is generated after frequency division and duty cycle conversion. The placement of the frequency dividers with respect to the ReRoC rings is illustrated in Figure 8. It is straightforward to tap the m



Fig. 8. ReRoC with divider placement.

multi-phase clock sources for the SCVRs since the tapping locations are accurately known and the routing is not as complex as the clock distribution networks accomplished in [21]. Consider the tapping point for a particular phase $\Theta_{P_i}$ to be located at (x,y). The SCVR clock source taps onto the $\Theta_{P_i}$ that satisfies the phase requirement. The placement of the SCVR depends on i) $\Theta_{SCVR_p}$, the phase required for the SCVR, and ii) $\Theta_{l_i}$, the phase attributed to the tapping wire $l_i$. The SCVR is placed such that $\Theta_{SCVR_p} = \Theta_{l_i} + \Theta_{P_i}$.
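The placement condition $\Theta_{SCVR_p} = \Theta_{l_i} + \Theta_{P_i}$ can be illustrated with a small sketch that, for a required SCVR phase, computes the wire phase each candidate tap would need. The selection rule used here (prefer the tap that needs the smallest wire phase, as a stand-in for the shortest tapping wire), the function name, and the eight evenly spaced tap phases are assumptions made for illustration; the source does not specify how the tap is chosen.

```python
def choose_tap(theta_scvr_deg, tap_phases_deg):
    """Pick the tap whose required wire phase is smallest.

    For every tap phase theta_p, the wire must contribute
    theta_l = (theta_scvr - theta_p) mod 360 so that
    theta_scvr = theta_l + theta_p (modulo one clock period).
    Returns (tap_phase_deg, wire_phase_deg).
    """
    best = None
    for theta_p in tap_phases_deg:
        theta_l = (theta_scvr_deg - theta_p) % 360.0
        if best is None or theta_l < best[1]:
            best = (theta_p, theta_l)
    return best

# Hypothetical example: eight evenly spaced phases on one ring.
taps = [k * 360.0 / 8 for k in range(8)]
tap_phase, wire_phase = choose_tap(100.0, taps)
print(f"tap at {tap_phase:.0f} deg, wire adds {wire_phase:.0f} deg")
```

In this example a required phase of 100° is met by tapping the 90° point and letting the tapping wire contribute the remaining 10°.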

Two sets of results are presented to validate and measure the effectiveness of the DVS operation with MPVR. The first set of results, shown in Figure 9, is the symmetric and fast response to both step-up and step-down changes in the reference voltage of the MPVR with a maximum output current of 50 mA. The high gain comparators are clocked at an arbitrarily selected ReRoC frequency of 3.3 GHz ($\operatorname{CLK}_{Hf}$ in Figure 6). The nominal gain comparators ($\operatorname{CLK}_{Lf}$ in Figure 6) and MPVRs are clocked at 360 MHz after performing frequency division by 9. Step-up and step-down scaling with MPVRs takes 89 ps and 87 ps, respectively. The robust ReRoC signals, with accurate phase matching between the SCVRs and ReRoCs, along with the digital control help in achieving the symmetric step-up and step-down scaling of the MPVRs. A maximum voltage ripple of 20 mV is achieved with 18 interleaved stages of the MPVR.

The second set of results for DVS with MPVR concerns the impact of fly capacitor selection on i) the SCVR conversion efficiency and ii) the opportunistic design assisting DFS. In Figure 10(a), the voltage conversion efficiency $\eta_{SCVR}$ for different fly capacitor values is shown. The overall power efficiency of the SCVR is computed as,

$$\eta_{SCVR} = \frac{P_{out}}{P_{out} + P_{sw} + P_{buff} + P_{control} + P_{par}}. \tag{2}$$

In Eq. (2), $P_{out}$, $P_{sw}$, $P_{buff}$, $P_{par}$, and $P_{control}$ are the output power, switching power, buffer power, parasitic power, and control and reference circuit related power, respectively. These values are obtained from SPICE simulations of extracted layouts of a FOPAC designed in a 65 nm technology. The (known) impact of the fly capacitor size on the maximum power efficiency for varying load current is shown in Figure 10(a). The fly capacitors can be split to achieve maximum power efficiency for varying load currents.
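Eq. (2) can be evaluated directly once the individual power terms are known. The sketch below is a plain transcription of the formula; the function name and the sample power values are hypothetical and are not simulation results from this work.

```python
def scvr_efficiency(p_out, p_sw, p_buff, p_control, p_par):
    """Eq. (2): output power divided by output power plus all loss terms.

    All arguments must use the same unit (e.g., mW).
    """
    losses = p_sw + p_buff + p_control + p_par
    return p_out / (p_out + losses)

# Hypothetical power breakdown in mW, for illustration only.
eta = scvr_efficiency(p_out=30.0, p_sw=5.0, p_buff=2.0, p_control=1.0, p_par=2.0)
print(f"eta_SCVR = {eta:.1%}")
```

With these assumed numbers the losses total 10 mW against 30 mW of delivered power, giving an efficiency of 75%.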



Fig. 9. MPVR symmetric step up and step down scaling, with reference voltages of 980 mV and 910 mV.

In the FOPAC architecture, fly capacitors connected to the ReRoCs can optionally be used to lower the frequency. Frequency scaling with different fly capacitor values on an arbitrarily selected ReRoC frequency of 4.7 GHz is shown in Figure 10(b). The fly capacitors are split to achieve finer granularity of frequency scaling. This design parameter of fly capacitors supplements the DFS operation provided by the resonant rotary divider (Section III-B), enabling even finer control of the DFS operation in FOPAC.
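The effect of loading the ReRoC with fly capacitance can be sketched from Eq. (1) under two simplifying assumptions that are not stated in the text: the fly capacitance adds directly to $C_T$, and $L_T$ is unchanged. The symbols $C_{fly}$ and $f'_{osc}$ (the loaded frequency) are introduced here only for this illustration.

$$f'_{osc} = \frac{1}{2\sqrt{L_T (C_T + C_{fly})}} \quad\Rightarrow\quad \frac{f'_{osc}}{f_{osc}} = \sqrt{\frac{C_T}{C_T + C_{fly}}} \quad\Rightarrow\quad C_{fly} = C_T \left[ \left( \frac{f_{osc}}{f'_{osc}} \right)^{2} - 1 \right].$$

Under this lumped model, lowering the frequency by 10% would require $C_{fly} \approx 0.23\, C_T$, which suggests why splitting the fly capacitors into smaller units gives finer frequency steps.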

## B. Dynamic Frequency Scaling of ReRoC

In the dynamic ReRoC frequency divider, illustrated in Figure 5, the phase delay between the adjacent MOSABs in the main-loop is expressed as $((m-1)/m) \cdot 2\pi$ and the phase delay between the MISABs in the sub-loop is expressed as $((m-2)/m) \cdot 2\pi$ [9]. When $n_1$ is the number of connections between the MOSABs in the main-loop, and $n_2$ is the number of connections between the MISABs and MOSABs in the main-loop and sub-loop, the numbers of connections required to perform a division of ratio r satisfy,

$$\left(n_1 \cdot \frac{m-1}{m}\right) \cdot 2\pi + \left(n_2 \cdot \frac{m-2}{m}\right) \cdot 2\pi = r \cdot 2\pi. \tag{3}$$

The m phases of the ReRoCs are used to produce the frequency divider output. When m=8, the phase delay between the adjacent SABs in the main-loop is $7/8 \cdot 2\pi$ and that in the sub-loop is $6/8 \cdot 2\pi$, as illustrated in Figure 5. From Eq. (3), to perform a frequency division of r=9, $n_1=6$ and $n_2=5$ connections are required. This can be achieved with a circuit topology similar but not identical to that in Figure 5. The proposed topology is not identical because for



Fig. 10. Power conversion efficiency of SCVRs and frequency scaling of ReRoC with fly caps.



Fig. 11. Power consumption of the dynamic frequency divider, 6 main-loops and 1 sub-loop.

r=10, $n_1=8$ and $n_2=4$ connections would be required, which exceed the SABs available in Figure 5 [9]. A modified topology is proposed in this paper in order to perform frequency division by ratios greater than 9. The main-loop in Figure 5 is stacked with an additional main-loop. An increase in the number of main-loops requires larger multiplexers to enable the selection between the main-loops and the sub-loop. In [9], it is shown that restricting the number of MISABs in the sub-loop is desirable for power savings and lower area. By stacking additional main-loops, any integer division ratio from 3 to n can be achieved at limited power and area cost. This is achieved by implementing the smallest $n_1$ and $n_2$ that satisfy Eq. (3), which is desirable for power savings and area reduction.
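The connection counts quoted for r=9 and r=10 can be reproduced by searching Eq. (3) for non-negative integer solutions. In the sketch below, Eq. (3) is multiplied through by $m/2\pi$ to give $n_1 (m-1) + n_2 (m-2) = r \cdot m$. Interpreting "smallest" as the minimum total $n_1 + n_2$ is an assumption made here, and the function name is hypothetical; the sketch does not model topology constraints such as the number of SABs physically available.

```python
def divider_connections(r, m=8):
    """Return (n1, n2) with the smallest n1 + n2 satisfying Eq. (3),
    n1*(m-1) + n2*(m-2) == r*m, or None if no solution exists.
    Requires m > 2.
    """
    target = r * m
    best = None
    for n1 in range(target // (m - 1) + 1):
        remainder = target - n1 * (m - 1)
        if remainder % (m - 2) == 0:
            n2 = remainder // (m - 2)
            if best is None or n1 + n2 < best[0] + best[1]:
                best = (n1, n2)
    return best

for ratio in (9, 10):
    n1, n2 = divider_connections(ratio)
    print(f"r={ratio}: n1={n1}, n2={n2}")
```

For m=8 this search returns $n_1=6$, $n_2=5$ for r=9 and $n_1=8$, $n_2=4$ for r=10, matching the values given in the text.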

The power consumption of the frequency dividers for division ratios 3 to 44 with a master clock of 3.3 GHz is shown in Figure 11. The total number of MISABs and MOSABs increases monotonically along with the division ratio. The power consumption however does not increase monotonically due to the adiabatic nature of the resonant frequency dividers and the selection lines for the DFS. The switching time between frequency domains is approximately 3 clock cycles (≈0.60 ns for a 5 GHz clock). Experiments are repeated for arbitrarily selected master clock frequencies of 2.5 GHz, 4.2 GHz, and 5 GHz, demonstrating similar trends of sublinearly increasing power dissipation with increasing divider value and under 3 clock cycles of switching time for DFS as shown in Table II.
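The three-cycle switching time can be checked with simple arithmetic against the average DFS times listed in Table II. The sketch below is only a consistency check; the function name is hypothetical, and the reported values are copied from the "Avg." row of Table II.

```python
def dfs_switch_time_ns(f_master_ghz, cycles=3):
    """Duration of `cycles` master-clock periods in ns (frequency in GHz)."""
    return cycles / f_master_ghz

# Average DFS times (ns) from Table II, keyed by master clock frequency (GHz).
reported_avg_ns = {2.5: 1.19, 4.2: 0.70, 5.0: 0.60}

for f_ghz, reported in reported_avg_ns.items():
    three_cycles = dfs_switch_time_ns(f_ghz)
    print(f"{f_ghz} GHz: 3 cycles = {three_cycles:.2f} ns, "
          f"reported average = {reported:.2f} ns")
```

Three periods correspond to 1.20 ns, about 0.71 ns, and 0.60 ns at 2.5 GHz, 4.2 GHz, and 5 GHz, respectively, in line with the reported averages.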



Fig. 12. Phase noise versus offset frequency of the ReRoCs.

In Figure 12, the phase noise of the ReRoCs designed for Table II is plotted. The phase noise of the 2.5 GHz ReRoC at an offset frequency of 1 MHz is -137.22 dBc/Hz. The phase noise values for the 4.2 GHz and 5 GHz master clock frequencies are -135.37 dBc/Hz and -132.40 dBc/Hz, respectively. For comparison, an all digital phase locked loop (ADPLL) manufactured in the 65 nm SOI technology node with an output frequency of 4 GHz has a phase noise of -111.22 dBc/Hz at an offset frequency of 1 MHz [25].

TABLE II
DYNAMIC FREQUENCY SCALING WITH DIVIDER

| Div ratio | 2.5 GHz master clock: power (mW) | 2.5 GHz: power normalized to div ratio 44 | 2.5 GHz: DFS time (ns) | 4.2 GHz master clock: power (mW) | 4.2 GHz: power normalized to div ratio 44 | 4.2 GHz: DFS time (ns) | 5 GHz master clock: power (mW) | 5 GHz: power normalized to div ratio 44 | 5 GHz: DFS time (ns) |
|-----------|-------|-------|------|------|-------|------|------|-------|------|
| 3         | 7.32  | 0.67× | 1.18 | 5.20 | 0.59× | 0.72 | 4.31 | 0.55× | 0.59 |
| 9         | 7.90  | 0.73× | 1.19 | 5.78 | 0.66× | 0.69 | 4.89 | 0.62× | 0.60 |
| 16        | 9.43  | 0.87× | 1.19 | 7.31 | 0.84× | 0.71 | 6.42 | 0.82× | 0.61 |
| 23        | 10.18 | 0.94× | 1.21 | 8.07 | 0.92× | 0.70 | 7.18 | 0.91× | 0.60 |
| 30        | 10.32 | 0.95× | 1.20 | 8.20 | 0.94× | 0.71 | 7.31 | 0.93× | 0.60 |
| 37        | 10.27 | 0.94× | 1.19 | 8.15 | 0.93× | 0.72 | 7.26 | 0.92× | 0.60 |
| 44        | 10.87 | 1×    | 1.19 | 8.74 | 1×    | 0.68 | 7.86 | 1×    | 0.59 |
| Avg.      | 9.47  | 0.85× | 1.19 | 7.35 | 0.81× | 0.70 | 6.46 | 0.79× | 0.60 |

TABLE III
PVT, IR AND LDI/DT ANALYSIS RESULTS

| Design        | PVT (500 Monte-Carlo runs): max. intra-die | PVT (500 Monte-Carlo runs): max. inter-die | Static IR & Ldi/dt |
|---------------|--------|--------|------|
| AES cipher    | 31 MHz | 15 MHz | 2.1% |
| CORTEX M0     | 35 MHz | 17 MHz | 1.9% |
| VSCALE RISC-V | 28 MHz | 24 MHz | 2.4% |
| Average       | 30 MHz | 19 MHz | 2.1% |

## C. FOPAC Methodology

The novelties of this work are embedded into the FOPAC methodology, compliant with the traditional ASIC flow, as illustrated in Figure 13. First, the design is synthesized with an industrial tool and undergoes initial placement, power planning, and placement blockages for the ReRoCs. Then, the designs undergo ReRoC design and power generation. The custom flows enabling FOPAC are as follows:

- 1) *ReRoC Design:* The custom ReRoC clock distribution network synthesis has five steps, illustrated in Figure 13.
  - 1) Register clustering to generate balanced capacitance load clusters.
  - 2) BST/DME to generate an unbuffered Steiner tree for each cluster [26].
  - 3) ReRoC topology generation with the dynamic frequency dividers.
  - 4) Generation of synchronous distribution aware sparse ReRoCs (SReRoC) [11].
  - 5) Physical connections translation to a netlist and PVT analysis of the clock distribution network.

Step 1 and step 2 constitute the subnetwork tree generation process for a bottom-up clock tree synthesis (CTS) process. The clock network and distribution are correct by design, thanks to steps 1 through 4, concluding with the PVT analysis with SPICE-accurate verification in step 5. The proposed steps for ReRoC are similar to those in [27], which details the automation of resonant rotary clock design for any ASIC design. ReRoC is differentiated from [27] in codesigning clock and power, i.e., the power planning input to the ReRoC stage has a pervasive impact on the ReRoC design, as the fly capacitors are opportunistically shared.

- 2) *Power Generation:* The power generation (and distribution) has five steps, illustrated in Figure 13.
  - 1) Power budget estimation for the design.
  - 2) Determination of the number of SCVRs and topology.
  - 3) Placement of the MPVR.
  - 4) Power grid extraction along with the ReRoCs and core load.
  - 5) Worst case static IR and Ldi/dt analysis.

Fig. 13. FOPAC Methodology.

The input to step 1 is the topology of the ReRoC rings, including the number of ReRoC rings and the tapping points for the multiple phases required for the MPVRs, from the ReRoC design flow. For a given power budget, an SCVR topology is designed with the goal of achieving the desired target efficiency by distributing the power budget over multiple SCVRs. The number of SCVRs required is divided such that each ReRoC ring has a voltage regulator (with load balancing) and the rest of the design has the appropriate number of voltage regulators necessary to operate during the low performance mode. Similar to the use of PVT analysis in ReRoC design for DFS, the power generation stage utilizes SPICE simulations in step 5 for signal integrity analysis of DVS operation.

TABLE IV
POWER CONSUMPTION OF PLL DESIGN VERSUS FOPAC OPERATING AT FREQ = 825 MHz, $V_{dd} = 0.98$ V, AND TEMP. = 25 °C

| Design        | Num. SCVRs | $PLL_{clock}$ power (mW) | $PLL_{core}$ power (mW) | $FOPAC_{clock}$ power (mW) | $FOPAC_{core}$ power (mW) | Num. ReRoC |
|---------------|------------|--------------------------|-------------------------|----------------------------|---------------------------|------------|
| AES cipher    | 8          | 18.17                    | 57.60                   | 5.96 (-67.19%)             | 37.87 (-34.25%)           | 4          |
| CORTEX M0     | 12         | 26.04                    | 73.74                   | 9.34 (-64.13%)             | 45.56 (-38.21%)           | 6          |
| VSCALE RISC-V | 16         | 41.21                    | 108.28                  | 12.78 (-68.98%)            | 69.04 (-36.23%)           | 10         |
| Average       | -          | -                        | -                       | -67%                       | -36%                      | -          |

#### IV. FOPAC EVALUATION

FOPAC is demonstrated on three different industrial designs that are publicly available: 1) *AES* encryption core, 2) *Arm core* - CORTEX M0, and 3) *VSCALE* RISC-V. The designs are placed and routed (P&R) and subjected to STA in order to verify the timing of the ASIC flow at the system level. The timing and power characteristics of the FOPAC components (ReRoC and MPVR) are analyzed in deeper detail through SPICE simulations of layout-extracted models that include parasitics. In particular, the transmission line interconnect parasitics are extracted using the high frequency structural simulator (HFSS) [28]. The algorithms are implemented in C++ and Matlab. An industrial 65 nm technology library is used for the evaluation.

The simulation results and conclusions are presented within three major categories:

- 1) FOPAC DVFS operation,
- 2) FOPAC power consumption with respect to traditional systems, and
- 3) FOPAC power consumption with respect to previous literature on resonant systems.

## A. FOPAC DVFS Operation

An arbitrary ReRoC frequency of 3.3 GHz ($F_M$) is chosen to evaluate the FOPAC methodology. Two sets of dynamic resonant frequency dividers that perform frequency division in integer ratios 3 to 9 are designed for the core clock source ($F_{core}$) and the MPVR clock source ($F_{mpvr}$). In the PVT stage, the geometries of the ReRoC rings along with the frequency dividers are varied by $\pm 10\%$ to represent the worst case scenarios. The deviations from the target frequency of 3.3 GHz with PVT variations for 500 Monte-Carlo runs are presented in Table III. The variation analysis is performed across the three designs. Average frequency variations of <1% and <0.6% are observed under intra- and inter-die variations, respectively. Worst case static IR and Ldi/dt analysis are performed on the layout extracted industrial designs (RLC models). The average worst case voltage drop across the three industrial designs is 2.1% of $V_{dd}$.
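The quoted percentages follow from the average deviations in Table III taken relative to the 3.3 GHz target frequency:

$$\frac{30\ \text{MHz}}{3.3\ \text{GHz}} \approx 0.91\% < 1\%, \qquad \frac{19\ \text{MHz}}{3.3\ \text{GHz}} \approx 0.58\% < 0.6\%.$$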

A sample FOPAC DVFS operation on the RISC-V core is presented in Figure 14, prior to the presentation of the performance comparisons to prior work in the literature in Tables IV and V. The switching speed between

TABLE V

COMPARISON OF FOPAC WITH PRIOR SIMULATION WORK

| Design                            | Ahn [29]   | This work        |
|-----------------------------------|------------|------------------|
|                                   | TCAD 2016  | FOPAC            |
| Technology                        | 45 nm bulk | 65 nm bulk       |
| Results                           | Simulation | Simulation       |
| Clock source                      | -          | ReRoC            |
| System resonant                   | Always     | Always           |
| Resonant DVFS                     | Yes        | Yes              |
| Voltage range                     | 1.5-1.9 V  | 0.9-1.2 V        |
| Frequency range                   | 2-4 GHz    | 348 MHz-1.1 GHz* |
| Inductor                          | On chip    | No               |
| Voltage regulator (VR)            | No         | Yes              |
| DVS speed                         | -          | 7.86 V/$\mu$s    |
| $\eta_{max}$ of VR                | -          | 77%              |
| $\rho$ (W/mm$^2$) @ $\eta_{max}$  | -          | 0.17             |
| Power reduction                   | 27%        | 25%-39%          |
| Clock power reduction             | -          | 62%-74%          |

\*Resonant frequency divider topology dependent

different frequency domains is <1 ns ($\approx$3 cycles of $F_M$). At the start, it takes 3 ns for the ReRoC oscillations to sustain at 3.3 GHz ($F_M$). Afterwards, a divide-by-5 is performed to generate the 660 MHz clock for the core ($F_{core}$) and a divide-by-9 is performed to generate the 360 MHz clock ($F_{mpvr}$) for the MPVRs at $V_{dd} = 0.98$ V (nominal). The high gain comparators are clocked at $F_M$ and the nominal gain comparators are clocked at $F_{mpvr}$. The frequency $F_{core}$ and the voltage are scaled between different levels to validate the accuracy of the switching speed. To enable the fly capacitor reuse mode ($RU_{mode}$), 10 of the SCVRs are shut down and the fly capacitor is loaded onto the ReRoC rings in the RISC-V core. In the $RU_{mode}$, it takes $\approx$3 ns for the ReRoC frequency $F_M$ to stabilize at 3.14 GHz (with voltage scaling). Then, it takes 0.97 ns to scale $F_{core}$ to 624 MHz and $F_{mpvr}$ to 348 MHz to operate the RISC-V core in the $RU_{mode}$ with $V_{dd} = 0.91$ V. In total, it takes 3.97 ns to scale the frequency to the $RU_{mode}$ at run time by utilizing the fly capacitor of the SCVRs.

# B. FOPAC Power Consumption

Fig. 14. FOPAC DVFS operation on the RISC-V core. (a) Dynamic voltage scaling speed. (b) Dynamic voltage and frequency scaling speed.

The power consumption of the FOPAC-based designs versus PLL-based designs operating at 825 MHz is presented in Table IV. The PLL-based design is built with a traditional PLL from a cell library used in the ASIC implementations performed with Cadence Innovus. The designs are extracted with Mentor Graphics Calibre. Power measurements are presented for the PLL only, labeled $PLL_{clock}$ in Table IV, and for the entire design, labeled $PLL_{core}$ in Table IV. The power of the SCVRs, the frequency dividers, and the control circuit is included in FOPAC$_{core}$. The PLL-based designs and the FOPAC-based designs have the same number of SCVRs. The clock sources for the SCVRs in the PLL-based designs are ring oscillators [15]. A total power saving of 36% is achieved for the circuits (FOPAC$_{core}$) when compared against a PLL-clocked core (PLL$_{core}$). The savings in clock power alone are larger: 67% (FOPAC$_{clock}$) when compared against a PLL-based design (PLL$_{clock}$).
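For reference, the savings percentages can be read as relative reductions with respect to the PLL-based baseline. This reading is an assumption, since the text does not define the metric explicitly, and the symbol $S_x$ is introduced here only for illustration:

$$
S_{x} = \frac{P_{\mathrm{PLL}_{x}} - P_{\mathrm{FOPAC}_{x}}}{P_{\mathrm{PLL}_{x}}} \times 100\%, \qquad x \in \{\mathrm{core}, \mathrm{clock}\}
$$

Under this reading, $S_{core} = 36\%$ and $S_{clock} = 67\%$ for the figures quoted above.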

# C. FOPAC Power Comparison With Previous Works

FOPAC is compared to prior resonant works in Table V, using simulation results only. The numbers reported in Table V are from SPICE simulations that sweep the V/f range (0.9 to 1.2 V and 348 MHz to 1.1 GHz) over the SS, FF, FS, and SF corners, and are not limited to the results reported in Table IV. Overall, FOPAC delivers power with 77% efficiency and achieves a 25%-39% power reduction thanks to a 64%-74% reduction in clock power. The voltage scaling within FOPAC is symmetric and robust, with a worst-case $t_{response}$ of 89 ps. The DFS switching time within FOPAC utilizing a 3.3 GHz ReRoC is 0.9 ns. Overall, FOPAC demonstrates voltage-frequency scaling over a wide range without the need for on-chip or off-chip inductors, while reusing the fly capacitor ($RU_{mode}$) for frequency tuning.

#### V. CONCLUSIONS

In this paper, the fusion of resonant rotary clocks with on-chip voltage regulators, enabling flexible on-chip power and clock, is presented. FOPAC is designed and evaluated on three different industrial designs to validate the architecture. FOPAC can switch between different V/f domains in 1.9 ns with a ReRoC operating at 3.3 GHz. FOPAC achieves 25%-39% power savings while allowing the fly capacitance to be reused to tune the ReRoC frequency at run time without any negative implications. FOPAC can provide a high number of V/f domains with fast DVFS capability while consuming low power and operating reliably, as demonstrated by the evaluation on industrial designs in this work.

## REFERENCES

- [1] K. Shu, E. Sanchez-Sinencio, J. Silva-Martinez, and S. H. K. Embabi, "A 2.4-GHz monolithic fractional-N frequency synthesizer with robust phase-switching prescaler and loop capacitance multiplier," *IEEE J. Solid-State Circuits*, vol. 38, no. 6, pp. 866–874, Jun. 2003.
- [2] H.-P. Le, S. R. Sanders, and E. Alon, "Design techniques for fully integrated switched-capacitor DC-DC converters," *IEEE J. Solid-State Circuits*, vol. 46, no. 9, pp. 2120–2131, Sep. 2011.
- [3] Y. K. Ramadass, A. A. Fayed, and A. P. Chandrakasan, "A fully-integrated switched-capacitor step-down DC-DC converter with digital capacitance modulation in 45 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 45, no. 12, pp. 2557–2565, Dec. 2010.
- [4] S. Sanders, E. Alon, H.-P. Le, M. Seeman, M. John, and V. Ng, "The road to fully integrated DC-DC conversion via the switched-capacitor approach," *IEEE Trans. Power Electron.*, vol. 28, no. 9, pp. 4146–4155, Sep. 2013.
- [5] P. Restle et al., "5.3 wide-frequency-range resonant clock with on-the-fly mode changing for the POWER8<sup>TM</sup> microprocessor," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2014, pp. 100–101.
- [6] G. Villar-Piqué, H. J. Bergveld, and E. Alarcón, "Survey and benchmark of fully integrated switching power converters: Switched-capacitor versus inductive approach," *IEEE Trans. Power Electron.*, vol. 28, no. 9, pp. 4156–4167, Sep. 2013.
- [7] O. A. Uzun and S. Köse, "Converter-gating: A power efficient and secure on-chip power delivery system," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 4, no. 2, pp. 169–179, Jun. 2014.
- [8] H. P. Le, J. Crossley, S. R. Sanders, and E. Alon, "A sub-ns response fully integrated battery-connected switched-capacitor voltage regulator delivering 0.19 W/mm<sup>2</sup> at 73% efficiency," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2013, pp. 372–373.
- [9] Y. Teng and B. Taskin, "Resonant frequency divider design methodology for dynamic frequency scaling," in *Proc. Int. Conf. Comput. Design (ICCD)*, Oct. 2013, pp. 479–482.
- [10] J. Wood, T. C. Edwards, and S. Lipa, "Rotary traveling-wave oscillator arrays: A new clock technology," *IEEE J. Solid-State Circuits*, vol. 36, no. 11, pp. 1654–1665, Nov. 2001.
- [11] Y. Teng and B. Taskin, "Sparse-rotary oscillator array (SROA) design for power and skew reduction," in *Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE)*, Mar. 2013, pp. 1229–1234.
- [12] F. U. Rahman and V. Sathe, "Quasi-resonant clocking: Continuous voltage-frequency scalable resonant clocking system for dynamic voltage-frequency scaling systems," *IEEE J. Solid-State Circuits*, vol. 53, no. 3, pp. 924–935, Mar. 2018.
- [13] L. G. Salem and P. P. Mercier, "26.4 A 0.4-to-1V 1MHz-to-2GHz switched-capacitor adiabatic clock driver achieving 55.6% clock power reduction," in *Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2017, pp. 442–443.
- [14] H. Fuketa, M. Nomura, M. Takamiya, and T. Sakurai, "Intermittent resonant clocking enabling power reduction at any clock frequency for near/sub-threshold logic circuits," *IEEE J. Solid-State Circuits*, vol. 49, no. 2, pp. 536–544, Feb. 2014.

- [15] Y. Lu et al., "A 123-phase DC-DC converter-ring with fast-DVS for microprocessors," in *Proc. Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2015, pp. 1–3.
- [16] Y. Lu, J. Jiang, and W.-H. Ki, "Design considerations of distributed and centralized switched-capacitor converters for power supply on-chip," *IEEE J. Emerg. Sel. Topics Power Electron.*, vol. 6, no. 2, pp. 515–525, Jun. 2018.
- [17] A. Martchovsky and K. D. Pedrotti, "Amplifier innovations for improvement of rotary traveling wave oscillators," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 65, no. 2, pp. 522–530, Feb. 2018.
- [18] G. Venkataraman, J. Hu, F. Liu, and C. N. Sze, "Integrated placement and skew optimization for rotary clocking," in *Proc. Design Autom. Test Eur. Conf. (DATE)*, vol. 1, Mar. 2006, pp. 1–6.
- [19] Z. Yu and X. Liu, "Design of rotary clock based circuits," in *Proc. Design Autom. Conf. (DAC)*, Jun. 2007, pp. 43–48.
- [20] J. Lu, V. Honkote, X. Chen, and B. Taskin, "Steiner tree based rotary clock routing with bounded skew and capacitive load balancing," in *Proc. Design, Autom. Test Eur. (DATE)*, Mar. 2011, pp. 1–6.
- [21] V. Honkote and B. Taskin, "CROA: Design and analysis of the custom rotary oscillatory array," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 19, no. 10, pp. 1837–1847, Oct. 2011.
- [22] X. Hu and M. R. Guthaus, "Distributed LC resonant clock grid synthesis," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 59, no. 11, pp. 2749–2760, Nov. 2012.
- [23] R. Jakushokas and E. G. Friedman, "Globally integrated power and clock distribution network," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, May/Jun. 2010, pp. 1751–1754.
- [24] O. A. Uzun, "Speed, power efficiency, and noise improvements for switched capacitor voltage converters," M.S. thesis, Univ. South Florida, Tampa, FL, USA, 2017.
- [25] J. A. Tierno, A. V. Rylyakov, and D. J. Friedman, "A wide power supply range, wide tuning range, all static CMOS all digital PLL in 65 nm SOI," *IEEE J. Solid-State Circuits*, vol. 43, no. 1, pp. 42–51, Jan. 2008.
- [26] K. D. Boese and A. B. Kahng, "Zero-skew clock routing trees with minimum wirelength," in *Proc. IEEE Int. ASIC Conf. Exhibit (ASIC)*, Sep. 1992, pp. 17–21.
- [27] R. Kuttappa, A. Balaji, V. Pano, B. Taskin, and H. Mahmoodi, "Rota-SYN: Rotary traveling wave oscillator SYNthesizer," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 66, no. 7, pp. 2685–2698, Jul. 2019.
- [28] High Frequency Structural Simulator (HFSS): User's Guide, Ansoft Corp., Pittsburgh, PA, USA, 2018.
- [29] S. Ahn, M. Kang, M. C. Papaefthymiou, and T. Kim, "Design methodology for synthesizing resonant clock networks in the presence of dynamic voltage/frequency scaling," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 35, no. 12, pp. 2068–2081, Mar. 2016.



Ragh Kuttappa (S'15) received the bachelor's degree from Visvesvaraya Technological University, India, in 2012, and the master's degree from San Francisco State University, San Francisco, CA, USA, in 2015. He is currently pursuing the Ph.D. degree with the Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA. He held an intern position at the Samsung Austin Research Center (SARC) in 2017. His current research interests include electronic design automation for VLSI, low-power circuits, and resonant clocking.



Selçuk Köse (S'10–M'12) received the B.S. degree in electrical and electronics engineering from Bilkent University, Ankara, Turkey, in 2006, and the M.S. and Ph.D. degrees in electrical engineering from the University of Rochester, Rochester, NY, USA, in 2008 and 2012, respectively.

He was with TUBITAK, Ankara, Intel Corporation, Santa Clara, CA, USA, and Freescale Semiconductor, Tempe, AZ, USA. He was an Assistant Professor with the University of South Florida, Tampa, FL, USA. He is currently an Associate Professor with the Department of Electrical Engineering, University of Rochester. His current research interests include integrated voltage regulation, 3-D integration, hardware security, and green computing.

Dr. Köse was a recipient of the NSF CAREER Award, the Cisco Research Award, the USF College of Engineering Outstanding Junior Researcher Award, and the USF Outstanding Faculty Award. He has served on the Technical Program and Organization Committees of various conferences. He is an Associate Editor of the *Journal of Circuits, Systems, and Computers* and the *Microelectronics Journal*.



Baris Taskin (S'01–M'05–SM'12) received the B.S. degree in electrical and electronics engineering from Middle East Technical University, Ankara, Turkey, in 2000, and the M.S. and Ph.D. degrees in electrical engineering from the University of Pittsburgh, Pittsburgh, PA, USA, in 2003 and 2005, respectively.

He is currently a Professor of electrical and computer engineering with Drexel University, Philadelphia, PA, USA. His current research interests include electronic design automation for VLSI, low-power circuits, resonant clocking, clock network synthesis, wireless IC interconnects, and networks-on-chip for chip multiprocessors.

Dr. Taskin was a recipient of a number of awards for his research and professional contributions, including the National Science Foundation Faculty Early Career Development (NSF CAREER) Award in 2009, the ACM SIGDA Distinguished Service Award in 2012, the Delaware Valley Young Electrical Engineer of the Year Award from the IEEE Philadelphia Section in 2013, and the Drexel ECE Department's Outstanding Research Award in 2015. He is the General Chair of the ACM Great Lakes Symposium on VLSI (GLSVLSI) 2019 and the Chair of the IEEE Circuits and Systems Society's Technical Committee on VLSI Systems and Applications (IEEE CAS VSA-TC) 2018–2020.