# **Supply and Threshold Voltage Scaling Techniques in CMOS Circuits**

by

#### Volkan Kursun

Submitted in Partial Fulfillment
of the
Requirements for the Degree
Doctor of Philosophy

Supervised by Professor Eby G. Friedman

Department of Electrical and Computer Engineering
School of Engineering and Applied Sciences
The College

University of Rochester
Rochester, New York
2004

UMI Number: 3122245



## **Dedication**

This work is dedicated to my mother Nazan and my sister Pınar.

## **Curriculum Vitae**

The author was born in Ankara, Turkey on June 5, 1974. He attended the Middle East Technical University from 1995 to 1999 and graduated with a Bachelor of Science degree in Electrical and Electronics Engineering in 1999. He received a Master of Science degree in Electrical and Computer Engineering from the University of Rochester in 2001. Since 1999 he has been working toward a Ph.D. degree in Electrical and Computer Engineering at the University of Rochester.

He performed research on high speed voltage interface circuits with Xerox Corporation, Webster, New York in 2000. During summers 2001 and 2002, he was with Intel Microprocessor Research Laboratories, Hillsboro, Oregon, responsible for the modeling and design of high frequency monolithic DC-DC converters. His current research interests include low voltage, low power, and high performance integrated circuit design, modeling of semiconductor devices, and emerging integrated circuit technologies.

## Acknowledgments

I would like to thank Professor Eby G. Friedman for his dedication and excellence as a supervisor. His technical skills as a researcher, combined with his immense experience in management, human psychology, and life in general, constitute a rich source that I have been sampling since we met. His guidance to his students goes well beyond the regular duty of a Ph.D. supervisor. He spent an ample amount of time helping me develop the skills necessary to become a successful researcher in the field of Electrical and Computer Engineering. His continuous encouragement and support throughout my Ph.D. studies are deeply appreciated.

I would like to thank Doctor Siva G. Narendra of Intel Corporation, Hillsboro, Oregon for sharing his experience and enthusiasm with me during summers 2001 and 2002. I would also like to thank him for taking time from his busy schedule to serve on my Ph.D. final oral examination committee.

I would like to thank Professor David Albonesi for his leading role in the Complexity Adaptive Processing Project, which initiated my interest in the areas of low power CMOS circuit design and monolithic DC-DC converters. I would also like to thank him for his support throughout my graduate studies and his service on my Ph.D. qualifying and final oral examination committees.

I would like to thank my colleagues at Intel Microprocessor Research Laboratories for the wonderful time we had during summers 2001 and 2002 in Hillsboro, Oregon. I continue to be inspired by their hard work and innovative thinking. My special thanks go to Shekhar Borkar, the Director of Circuit Research Labs, for managing a work environment that fully supports collaboration and creativity.

I would like to thank Professor Sandhya Dwarkadas and Professor Martin Margala for their support and service on my Ph.D. qualifying and final oral examination committees.

## **Abstract**

The generation, distribution, and dissipation of power are at the forefront of problems faced in the development of high performance integrated circuits. Several techniques for designing low power and high speed integrated circuits are presented in this dissertation. Supply and threshold voltage scaling techniques, targeting lower power consumption and enhanced device reliability without degrading circuit speed, are described.

Systems with multiple supply voltages can significantly reduce power consumption without degrading speed by selectively lowering the supply voltages along non-critical delay paths. High frequency monolithic DC-DC conversion techniques applicable to multiple supply voltage CMOS circuits are presented in order to provide additional voltage levels with low energy and area overhead. Full integration of a high efficiency buck converter on the same die as a dual supply voltage microprocessor is demonstrated to be feasible. A low swing DC-DC conversion technique is presented that enhances the energy efficiency of a monolithic DC-DC converter. Device reliability issues in monolithic DC-DC converters operating at high input voltages are discussed. A cascode bridge circuit that guarantees the reliable operation of deep submicrometer MOSFETs without exposure to high voltage stress while operating at high input and output voltages is introduced.

An important technique for reducing the impact of supply voltage scaling on circuit performance is scaling threshold voltages. Exponentially increasing subthreshold leakage currents and worsening short-channel effects at reduced threshold voltages are discussed. Increasing performance degradation caused by die-to-die and within-die parameter variations at reduced gate lengths and threshold voltages is described. Dynamic threshold voltage scaling techniques reduce the deleterious effects of static threshold voltage scaling. A novel variable threshold voltage CMOS circuit technique for simultaneously enhancing the speed and power characteristics of dynamic circuits is presented. Both reverse and forward body bias techniques are applied to domino logic circuits for enhanced robustness against on-chip noise. Multiple threshold voltage CMOS circuits offer decreased subthreshold leakage currents and enhanced performance by selectively lowering the threshold voltages along the speed critical paths. A sleep switch dual threshold voltage domino logic circuit technique providing significant savings in subthreshold leakage energy is described.

## **Contents**

| Section | Title | Page |
|---------|-------|------|
|         | Dedication | iii |
|         | Curriculum Vitae | iv |
|         | Acknowledgments | v |
|         | Abstract | vi |
|         | List of Tables | xv |
|         | List of Figures | xvii |
| 1       | Introduction | 1 |
| 1.1     | Evolution of Integrated Circuits | 4 |
| 1.2     | Outline of the Dissertation | 15 |
| 2       | Sources of Power Consumption in CMOS Integrated Circuits | 22 |
| 2.1     | Dynamic Switching Power | 23 |
| 2.2     | Leakage Power | 27 |
| 2.2.1   | Subthreshold Leakage Current | 27 |
| 2.2.1.1 | Short-Channel Effects | 29 |
| 2.2.1.2 | Drain-Induced Barrier Lowering | 31 |
| 2.2.1.3 | Characterization of Subthreshold Leakage Current | 31 |
| 2.2.2   | Gate Oxide Leakage Current | 37 |
| 2.3     | Short-Circuit Power | 45 |
| 2.4     | Static DC Power | 46 |
| 3       | Supply and Threshold Voltage Scaling Techniques | 48 |
| 3.1     | Dynamic Supply Voltage Scaling | 53 |
| 3.2     | Multiple Supply Voltage CMOS | 57 |
| 3.3     | Threshold Voltage Scaling | 61 |
| 3.3.1   | Body Bias Techniques | 67 |
| 3.3.1.1 | Reverse Body Bias | 67 |
| 3.3.1.2 | Forward Body Bias | 76 |
| 3.3.1.3 | Bidirectional Body Bias | 83 |
| 3.3.2   | Multiple Threshold Voltage CMOS | 88 |
| 3.4     | Multiple Supply and Threshold Voltage CMOS | 93 |
| 3.5     | Dynamic Supply and Threshold Voltage Scaling | 97 |
| 3.6     | Chapter Summary | 100 |
| 4       | Low Voltage Power Supplies | 102 |
| 4.1     | Linear DC-DC Converters | 106 |
| 4.2     | Switched-Capacitor DC-DC Converters | 110 |
| 4.3     | Switching DC-DC Converters | 112 |
| 4.3.1   | Operation of a Buck Converter | 113 |
| 4.3.2   | Power Reduction Techniques for Switching DC-DC Converters | 117 |
| 4.4     | Chapter Summary | 118 |
| 5       | Analysis of Buck Converters for On-Chip Integration with a Dual Supply Voltage Microprocessor | 122 |
| 5.1     | Circuit Model of a Buck Converter | 125 |
| 5.1.1   | MOSFET Related Power Losses | 126 |
| 5.1.2   | Filter Inductor Related Power Losses | 128 |
| 5.1.3   | Filter Capacitor Related Power Losses | 129 |
| 5.1.4   | Total Power Consumption of a Buck Converter | 129 |
| 5.2     | Efficiency Analysis of a Buck Converter | 130 |
| 5.2.1   | Circuit Analysis for Global Maximum Efficiency | 132 |
| 5.2.2   | Circuit Analysis with Limited Filter Capacitance | 136 |
| 5.2.3   | Output Voltage Ripple Constraint | 137 |
| 5.3     | Simulation Results | 140 |
| 5.4     | Chapter Summary | 142 |
| 6       | Low Voltage Swing Monolithic DC-DC Conversion | 144 |
| 6.1     | Circuit Model of a Low Voltage Swing Buck Converter | 146 |
| 6.1.1   | MOSFET Power Dissipation | 147 |
| 6.1.2   | MOSFET Model | 150 |
| 6.1.3   | Filter Inductor Power Dissipation | 151 |
| 6.2     | Low Voltage Swing Buck Converter Analysis | 152 |
| 6.2.1   | Full Swing Circuit Analysis for Global Maximum Efficiency | 153 |
| 6.2.2   | Low Swing Circuit Analysis for Global Maximum Efficiency | 155 |
| 6.3     | Chapter Summary | 159 |
| 7       | High Input Voltage Step-Down DC-DC Converters for Integration in a Low Voltage CMOS Process | 162 |
| 7.1     | Cascode Bridge Circuit | 165 |
| 7.2     | Large Step-Down Non-Isolated Switching DC-DC Converter | 166 |
| 7.2.1   | Operation of the Cascode DC-DC Converter | 166 |
| 7.2.2   | Efficiency Characteristics | 168 |
| 7.2.3   | Charge Recycling Mechanism | 170 |
| 7.3     | Chapter Summary | 171 |
| 8       | Signal Transfer in Integrated Circuits with Multiple Supply Voltages | 172 |
| 8.1     | A High Speed and Low Power Voltage Interface Circuit | 174 |
| 8.2     | Voltage Interface Circuit Simulation Results | 175 |
| 8.3     | Experimental Results | 180 |
| 8.4     | Chapter Summary | 182 |
| 9       | Domino Logic with Variable Threshold Voltage Keeper | 183 |
| 9.1     | Standard Domino Logic Circuits | 185 |
| 9.1.1   | Operation of Standard Domino Logic Circuits | 185 |
| 9.1.2   | Noise Immunity, Delay, and Energy Tradeoffs in Domino Logic Circuits | 187 |
| 9.2     | Domino Logic with Variable Threshold Voltage Keeper | 192 |
| 9.2.1   | Variable Threshold Voltage Keeper | 192 |
| 9.2.2   | Dynamic Body Bias Generator | 194 |
| 9.3     | Simulation Results | 195 |
| 9.3.1   | Multiple Output Domino Carry Generator with Variable Threshold Voltage Keeper | 196 |
| 9.3.1.1 | Improved Delay and Power Characteristics with Comparable Noise Immunity | 199 |
| 9.3.1.2 | Improved Noise Immunity with Comparable Delay or Power Characteristics | 202 |
| 9.3.2   | Clock-Delayed Domino Logic with Variable Threshold Voltage Keeper | 204 |
| 9.3.3   | Impact of Gate Size on the Energy Overhead of the Dynamic Body Bias Generator | 208 |
| 9.4     | Domino Logic with Forward and Reverse Body Biased Keeper | 210 |
| 9.4.1   | Clock-Delayed Domino Logic with Forward and Reverse Body Biased Keeper | 210 |
| 9.4.2   | Technology Scaling Characteristics of the Reverse and Forward Body Bias Techniques Applied to a Keeper Transistor | 215 |
| 9.5     | Chapter Summary | 217 |
| 10      | Subthreshold Leakage Current Characteristics of Dynamic Circuits | 219 |
| 10.1    | State Dependent Subthreshold Leakage Current Characteristics | 221 |
| 10.2    | Noise Immunity | 227 |
| 10.3    | Power and Delay Characteristics During the Active Mode | 232 |
| 10.4    | Dual Threshold Voltage CMOS Technology | 234 |
| 10.5    | Chapter Summary | 239 |
| 11      | Sleep Switch Dual Threshold Voltage Domino Logic with Reduced Standby Leakage Current | 241 |
| 11.1    | Previously Published Sleep Mode Circuit Techniques | 242 |
| 11.2    | Dual Threshold Voltage Domino Logic Employing Sleep Switches | 246 |
| 11.3    | Simulation Results | 247 |
| 11.3.1  | Subthreshold Leakage Energy Reduction | 249 |
| 11.3.2  | Stack Effect in Domino Logic Circuits | 251 |
| 11.3.3  | Delay and Power Reduction in the Active Mode | 254 |
| 11.3.4  | Sleep/Wake-up Delay and Energy Overhead | 255 |
| 11.4    | Noise Immunity Compensation | 260 |
| 11.5    | Chapter Summary | 265 |
| 12      | Low Swing Domino Logic | 268 |
| 12.1    | Power Reduction Techniques in Domino Logic Circuits | 269 |
| 12.2    | Low Swing Domino Logic | 272 |
| 12.2.1  | Low Swing Domino Logic with Fully Driven Keeper | 273 |
| 12.2.2  | Low Swing Domino Logic with Weakly Driven Keeper | 273 |
| 12.3    | Simulation Results | 274 |
| 12.4    | Dual Threshold Voltage Low Swing Domino Logic | 279 |
| 12.5    | Chapter Summary | 283 |
| 13      | Conclusions | 284 |
| 14      | Future Research | 293 |
| 14.1    | Nanometer Devices | 293 |
| 14.2    | Energy Efficiency in CMOS Circuits | 294 |
| 14.2.1  | Multiple Supply Voltage CMOS Circuits | 295 |
| 14.2.2  | Dynamic Supply Voltage and Frequency Scaling | 296 |
| 14.2.3  | Circuits with Multiple Voltage and Clock Domains | 297 |
| 14.2.4  | Leakage Current Reduction Techniques | 299 |
| 14.3    | Reliability in CMOS Circuits | 300 |
| 14.3.1  | On-Chip Noise and Immunity Issues in CMOS Integrated Circuits | 300 |
| 14.3.2  | Parameter Variations | 304 |
| 14.3.3  | On-Chip Clock Generation | 304 |
|         | Bibliography | 306 |
|         | Appendix A Patents | 326 |
|         | Appendix B Publications | 327 |

## **List of Tables**

| Table | Caption | Page |
|-------|---------|------|
| 1.1   | Technological trends of high performance microprocessors | 7 |
| 2.1   | A comparison of the subthreshold slope ($S$) and leakage current ($I_{OFF}$) of NMOS transistors fabricated at different technologies (T = 25°C) | 36 |
| 2.2   | Semiconductor device scaling trends | 38 |
| 2.3   | The dominant mechanisms of gate oxide tunneling current for different regions of operation of a MOSFET | 42 |
| 4.1   | A comparison of the electrical characteristics and typical applications of the linear, switched-capacitor, and switching DC-DC converters | 119 |
| 5.1   | Maximum efficiency circuit configurations of a buck converter with different filter capacitances | 136 |
| 6.1   | Efficiency ($\eta$) characteristics of the full swing (FS) and low swing (LS) DC-DC converter circuits obtained from the power model and simulation ($V_{DD1} = 1.8$ volts and $C = 3$ nF) | 159 |
| 7.1   | Circuit Characteristics of the Maximum Efficiency DC-DC Converters | 169 |
| 8.1   | Normalized area, MFSO, and average internal power consumption of each voltage interface circuit ($C_L = 1$ pF) | 179 |
| 8.2   | Experimentally measured test results | 181 |
| 9.1   | A comparison of the evaluation delay, power dissipation, power-delay product (PDP), and NML (for maximum reverse body biased keeper) of SD and DVTVK circuit techniques for KPR = 2.2 | 201 |
| 9.2   | Achievable improvement in NML with the DVTVK circuit technique as compared to SD while maintaining equal delay, power dissipation, or PDP (KPR of DVTVK is 2.2) | 203 |
| 9.3   | Delay, power, and PDP savings of COR-DVTVK as compared to COR-SD with different keeper sizes | 206 |
| 9.4   | Achievable improvement in NML with the DVTVK circuit technique as compared to SD while maintaining equal delay, power dissipation, or PDP (KPR of DVTVK is 2.2) | 207 |
| 9.5   | Delay, power, power-delay product (PDP), and NML savings of COR-DVTVK as compared to COR-SD (with a forward body bias voltage of 0.6 volts) | 214 |
| 11.1  | Input vectors applied to an adder | 248 |
| 11.2  | Degradation in noise immunity of standard dual-$V_t$ and sleep switch adders as compared to the low-$V_t$ adder with same size transistors | 261 |
| 11.3  | A comparison of normalized subthreshold leakage energy of low-$V_t$, standard dual-$V_t$, and sleep switch adders under similar and degraded noise immunity conditions | 263 |
| 11.4  | Minimum duration of the idle mode required for the sleep switch circuit technique to provide a net savings in standby energy as compared to the standard low-$V_t$ and dual-$V_t$ adders under similar and degraded noise immunity conditions | 266 |
| 12.1  | Normalized dynamic power, evaluation delay, and MTNA (PNEW = 3) | 278 |
| 12.2  | Standby mode leakage power and active mode total power for different threshold voltage distributions | 282 |
| 12.3  | Evaluation delay and MTNA for different threshold voltage distributions | 282 |

## **List of Figures**

| Figure | Caption | Page |
|--------|---------|------|
| 1.1    | Microphotographs of three landmark ICs from the evolution of the integrated circuit technology (the sizes of the dies are not to scale). (a) The first monolithic integrated circuit, Fairchild Semiconductor (1959). (b) The first microprocessor, Intel 4004 (1971). (c) The latest Intel Pentium 4 microprocessor (2002) | 1 |
| 1.2    | A timeline of some of the key events during the evolution of semiconductor technologies | 5 |
| 1.3    | The general form of Moore's law | 6 |
| 1.4    | Scaling of the minimum feature size and the increasing total number of transistors within each lead Intel microprocessor | 8 |
| 1.5    | Die area of lead Intel microprocessors | 9 |
| 1.6    | Operating frequency and supply voltage of lead Intel microprocessors | 11 |
| 1.7    | Maximum power consumption of lead Intel microprocessors | 12 |
| 1.8    | Power density trends of lead Intel microprocessors | 13 |
| 1.9    | Increasing current demand of lead Intel microprocessors | 14 |
| 2.1    | A CMOS gate driving an output capacitor. The drain-to-body junction capacitances of the driver gate, the equivalent capacitance of the interconnect, and the gate oxide capacitance of the load transistors are lumped into a single equivalent capacitance, C<sub>1</sub> | 23 |
| 2.2    | The drain current of a short-channel n-type MOSFET as a function of the gate-to-source voltage ($V_{GS}$) for three different drain-to-source voltages ($V_{DS}$). The DIBL, weak inversion, and p-n junction diode leakage components of the drain subthreshold leakage current are indicated (0.18 μm CMOS technology, W = 10 * W<sub>min</sub>, and L = L<sub>min</sub>) | 28 |
| 2.3    | Short-channel MOSFET threshold voltage roll-off for super-halo (both vertically and laterally non-uniform) and retrograde (vertically non-uniform) doping profiles | 30 |
| 2.4    | Variation of the subthreshold leakage current with junction temperature for four different CMOS technology generations (W = 10 * W<sub>min</sub> and L = L<sub>min</sub>) | 33 |
| 2.5    | Gate oxide tunneling current density as a function of the gate voltage for various gate oxide (SiO<sub>2</sub>) thicknesses assuming a 100 nm CMOS technology | 39 |
| 2.6    | The three mechanisms of gate dielectric tunneling current in an NMOS transistor | 40 |
| 2.7    | Different components of gate dielectric tunneling current in a MOSFET | 41 |
| 2.8    | Standby power dissipation current paths in a CMOS circuit | 42 |
| 2.9    | Comparison of the gate oxide capacitance per unit area versus the gate oxide leakage current density of various insulators for Aluminum Oxide (Al<sub>2</sub>O<sub>3</sub>), Hafnium Dioxide (HfO<sub>2</sub>), Silicon Dioxide (SiO<sub>2</sub>), Tantalum Pentoxide (Ta<sub>2</sub>O<sub>5</sub>), Titanium Dioxide (TiO<sub>2</sub>), and Zirconium Dioxide (ZrO<sub>2</sub>) | 44 |
| 2.10   | Static DC current in a full voltage rail CMOS inverter driven by a low voltage swing signal | 46 |

| 3.1  | Normalized power consumption versus supply voltage (V<sub>DD</sub>) of a 19 stage ring oscillator assuming a 0.18 µm CMOS technology | 49 |
|------|---------------------------------------------------------------------------------------------|------------|
| 3.2  | Normalized delay versus supply voltage (V <sub>DD</sub> ) of a 19 stage ring                |            |
|      | oscillator for a 0.18 μm CMOS technology                                                    | 51         |
| 3.3  | Variation of the throughput required to execute certain tasks in a typical                  |            |
|      | microprocessor system                                                                       | 54         |
| 3.4  | Feedback loop architecture for a dynamic voltage scaling circuit                            | 56         |
| 3.5  | A single supply voltage circuit                                                             | 58         |
| 3.6  | A dual supply voltage circuit. The gates that operate at a lower supply                     |            |
|      | voltage are shaded                                                                          | 59         |
| 3.7  | A dual supply voltage circuit with the clustered voltage scaling                            |            |
|      | technique. The circuits operating at a lower supply voltage are shaded.                     |            |
|      | VIC: voltage interface circuit                                                              | 61         |
| 3.8  | Variation of the delay of a CMOS inverter with supply voltage for                           |            |
|      | different MOSFET threshold voltages assuming a 0.18 µm CMOS                                 |            |
|      | technology                                                                                  | 62         |
| 3.9  | Effect of threshold voltage scaling on the delay of a 19 stage ring                         |            |
|      | oscillator for four different supply voltages assuming a 0.18 µm CMOS                       |            |
|      | technology                                                                                  | 63         |
| 3.10 | Effect of threshold voltage scaling on short-channel effects in an NMOS                     |            |
|      | transistor. (a) A high-V <sub>t</sub> short-channel MOSFET. (b) A low-V <sub>t</sub> short- |            |
|      | channel MOSFET. $N_A$: acceptor concentration in the channel area ($N_{A2} < N_{A1}$) | 65         |
| 3.11 | Reverse body bias circuit technique. (a) A reverse body biased NMOS                         |            |
|      | transistor. (b) A reverse body biased PMOS transistor                                       | 68         |

| 3.12 | Effect of reverse body bias on the depletion region and inversion layer charge in a MOSFET. (a) A zero body biased NMOS transistor. (b) A reverse body biased NMOS transistor. $W_{D1} < W_{D2}$ . $W_{I1} > W_{I2}$                                                                                                                                                    | 69 |
|------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| 3.13 | Variation of the total standby power of a microprocessor test circuit as a function of reverse body bias voltage                                                                                                                                                                                                                                                        | 70 |
| 3.14 | Block diagram of a speed adaptive body bias circuit                                                                                                                                                                                                                                                                                                                     | 71 |
| 3.15 | Reduced die-to-die delay variations by applying the speed adaptive reverse body bias circuit technique to test circuits fabricated in a 0.25 µm CMOS technology [73]. (a) Delay distribution of standard CMOS circuits with zero body bias. (b) Reduced delay distribution with adaptive body bias. (c) Enhanced worst case speed by further scaling the threshold voltages with the adaptive body bias circuit technique | 72 |
| 3.16 | Body effect degradation due to channel length scaling. (a) A long channel MOSFET. (b) A short-channel MOSFET                                                                                                                                                                                                                                                            | 73 |
| 3.17 | Increasing short-channel effects and threshold voltage roll-off with reverse body bias (RBB) for low-$V_t$ and high-$V_t$ MOSFETs for a 0.25 µm CMOS technology [63]. NBB: no body bias, RBB: reverse body bias | 75 |
| 3.18 | Effect of the reverse body bias circuit technique on drain-induced barrier lowering ( $\Delta V_t/\Delta V_{DS}$ ) for a 0.18 $\mu m$ CMOS technology. The threshold voltage ( $V_t$ ) is the gate-to-source voltage at which the drain current is equal to 1 $\mu A/\mu m$                                                                                             | 75 |
| 3.19 | Forward body bias circuit technique. (a) A forward body biased NMOS transistor. (b) A forward body biased PMOS transistor                                                                                                                                                                                                                                               | 76 |
| 3.20 | Effect of forward body bias on the depletion region and inversion layer charge in a MOSFET. (a) A zero body biased NMOS transistor. (b) A forward body biased NMOS transistor. $W_{D1} > W_{D2}$. $W_{I1} < W_{I2}$ | 77 |

| 3.21 | Effect of forward body bias on short-channel effects in an NMOS transistor. FBB: forward body bias ( $V_{Body} > 0$ ). ZBB: zero body bias ( $V_{Body} = 0$ )                   | 79 |
|------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| 3.22 | Schematic representation of a forward body biased CMOS circuit. I<sub>DIODE1</sub>: source-to-body junction diode current, I<sub>DIODE2</sub>: drain-to-body junction diode current, C<sub>J1</sub>: source-to-body junction capacitance, and C<sub>J2</sub>: drain-to-body junction capacitance | 80 |
| 3.23 | Variation of the propagation delay and energy consumption of a 101 stage ring oscillator with body bias voltage based on a 0.18 µm CMOS technology                              | 81 |
| 3.24 | Variation of the energy-delay product of a 101 stage ring oscillator with body bias voltage based on a 0.18 µm CMOS technology                                                  | 82 |
| 3.25 | Leakage power and clock frequency characteristics of microprocessor test circuits fabricated in a 0.15 μm CMOS technology (LFB: lower frequency bin, HFB: higher frequency bin) | 85 |
| 3.26 | A multithreshold-voltage CMOS (MTCMOS) circuit. The high threshold voltage transistors are illustrated by a bold line in the channel area                                       | 90 |
| 3.27 | A multiple supply and threshold voltage integrated circuit with voltage partitioning based on the difference of the activity factors among different circuit blocks             | 94 |
| 3.28 | Power dissipation of a test circuit with varying supply voltage for three different activity factors assuming a 2 GHz clock frequency and a 100 nm CMOS technology              | 95 |

| 3.29 | Power dissipation of dual-$V_{DD}$/dual-$V_t$ and standard single-$V_{DD}$/single-$V_t$ test circuits with varying supply voltage for three different clock frequencies, assuming a 100 nm CMOS technology. For a dual-$V_{DD}$/dual-$V_t$ circuit, $V_{DD2}$ and $\lvert V_{t2} \rvert$ are fixed at 0.5 volts and 0 volts, respectively. The supply voltage of the low activity circuits ($V_{DD1}$) is varied together with the threshold voltages ($\lvert V_{t1} \rvert$) while maintaining a target clock frequency | 96  |
|------|------------------------------------------------------------------------------------------------------|-----|
| 3.30 | Active mode power dissipation of a CMOS circuit with varying supply                                  |     |
|      | voltage for a fixed operating frequency. The threshold voltages are                                  |     |
|      | modified together with the supply voltage to maintain a constant                                     |     |
|      | frequency                                                                                            | 98  |
| 3.31 | Active power with varying supply voltage for various clock frequencies.                              |     |
|      | With each curve, the threshold voltages are modified together with the                               |     |
|      | supply voltage to maintain a constant frequency                                                      | 99  |
| 4.1  | Power supply system for a laptop computer                                                            | 104 |
| 4.2  | A simple voltage divider circuit describing the operating principle of a                             |     |
|      | linear DC-DC converter                                                                               | 106 |
| 4.3  | A linear voltage regulator                                                                           | 107 |
| 4.4  | A high current-efficiency linear regulator                                                           | 109 |
| 4.5  | Diagram representing the operation of the flexible control of the output                             |     |
|      | current (FCOC) technique proposed in [106]                                                           | 110 |
| 4.6  | Schematic representation of a switched-capacitor DC-DC converter                                     | 111 |
| 4.7  | Buck converter circuit                                                                               | 114 |
| 4.8  | Inductor current $i_L(t)$ , output voltage $V_{DD2}(t)$ , and capacitor current $i_C(t)$             |     |
|      | waveforms                                                                                            | 116 |
| 5.1  | Circuit model of the parasitic impedances of a buck converter                                        | 125 |

| 5.2 | Total power consumption of a buck converter as a function of $f_s$ and $\Delta i$                                                                                                                                                                                                                                 | 132 |
|-----|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| 5.3 | Efficiency of a buck converter as a function of $f_s$ and $\Delta i$                                                                                                                                                                                                                                              | 133 |
| 5.4 | Variation of the total MOSFET related optimized power (including the power dissipated in the gate driver buffers of the power MOSFETs) with the switching frequency and inductor current ripple                                                                                                                   | 134 |
| 5.5 | Variation of the total power dissipated in the filter inductor with the switching frequency and inductor current ripple                                                                                                                                                                                           | 135 |
| 5.6 | Variation of the total power dissipated in the filter capacitor with the switching frequency and inductor current ripple                                                                                                                                                                                          | 135 |
| 5.7 | Variation of maximum efficiency and switching frequency of a buck converter with filter capacitance $C$ (1 nF < $C$ < 100 nF) and output voltage ripple $\Delta V_{DD2}$ (5 mV < $\Delta V_{DD2}$ < 25 mV). (a) Maximum efficiency. (b) Switching frequency                                                       | 138 |
| 5.8 | Variation of filter inductance and MOSFET and inductor related power components of a buck converter with filter capacitance $C$ (1 nF < $C$ < 100 nF) and output voltage ripple $\Delta V_{DD2}$ (5 mV < $\Delta V_{DD2}$ < 25 mV). (a) Filter inductance. (b) Total MOSFET and inductor related power components | 139 |
| 5.9 | Simulation waveforms of a buck converter for $C = 100$ nF. (a) Output voltage ripple $V_{ripple}(t)$ . (b) Output response of a buck converter to a change in load current from $I_{min}$ to $I$ . (c) Output response of a buck converter to a step current changing between $I_{min}$ and $I$                   | 141 |
| 6.1 | Parasitic impedances and transistor geometric sizes of a buck converter                                                                                                                                                                                                                                           | 146 |
| 6.2 | Variation of the effective series resistance of 1 $\mu m$ wide NMOS and PMOS transistors with gate-to-source voltage, $V_{GS}$ ( $ V_{DS} $ = 0.1 volts)                                                                                                                                                          | 151 |
| 6.3 | The maximum efficiency attainable with a full swing (FS) buck converter circuit for different tapering factors                                                                                                                                                                                                    | 154 |

| 6.4 | Efficiency of a full swing buck converter as a function of the switching frequency ($f_s$) and inductor current ripple ($\Delta i$) | 155 |
|-----|----------------------------------------------------------------------------------------------------|-----|
| 6.5 | Optimum power supply voltage of the power NMOS drivers ( $V_{gn}$ ) and                            |     |
|     | optimum ground voltage of the power PMOS drivers $(V_{gp})$ that maximize                          |     |
|     | the efficiency for different tapering factors                                                      | 156 |
| 6.6 | A comparison of the optimum widths of the power PMOS and NMOS                                      |     |
|     | transistors that maximize the efficiency of the full swing (FS) and the low                        |     |
|     | swing (LS) buck converters for different tapering factors                                          | 157 |
| 6.7 | A comparison of the maximum efficiency attainable with the low swing                               |     |
|     | (LS) and the full swing (FS) buck converter circuits for different tapering                        |     |
|     | factors                                                                                            | 158 |
| 6.8 | A comparison of the total transistor width (including the widths of the                            |     |
|     | transistors within the gate drivers) of the low swing (LS) and the full                            |     |
|     | swing (FS) buck converter circuits with the highest efficiency                                     |     |
|     | characteristics for different tapering factors                                                     | 160 |
| 7.1 | High voltage off-chip power delivery and on-chip DC-DC conversion                                  | 163 |
| 7.2 | Input voltage constraint in an off-chip buck converter circuit ( $V_{DD1} \le$                     |     |
|     | V <sub>max</sub> )                                                                                 | 164 |
| 7.3 | Cascode bridge circuit operating at an input supply voltage of $V_{DD1}$ =                         |     |
|     | $3V_{max}$ ( $V_{DD3} = 2V_{max}$ and $V_{DD4} = V_{max}$ )                                        | 166 |
| 7.4 | Proposed DC-DC converter circuit operating at an input supply voltage of $V_{DD1} = 3V_{max}$ ($V_{DD3} = 2V_{max}$, $V_{DD4} = V_{max}$, and $V_{DD2} < V_{DD1}$) | 167 |
| 8.1 | Signal transfer between circuit blocks in a multiple supply voltage                                |     |
|     | integrated circuit                                                                                 | 172 |
| 8.2 | Circuit architecture for low swing interconnect                                                    | 173 |
| 8.3 | The proposed voltage interface circuit                                                             | 175 |

| 8.4 | Average delay versus load capacitance                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | 177 |
|-----|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| 8.5 | Average power versus load capacitance                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | 177 |
| 8.6 | Power efficiency versus load capacitance                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | 178 |
| 8.7 | Microphotograph of the interface circuit                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | 180 |
| 8.8 | Experimentally derived input and output voltage waveforms of the proposed voltage interface circuit. (a) 10 V → 5 V interface. (b) 5 V → 10 V interface.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | 181 |
| 9.1 | Domino gates with standard keeper transistors. (a) Standard footed domino gate. (b) Standard clock-delayed footless domino logic circuit                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | 186 |
| 9.2 | Comparison of the normalized noise immunity, evaluation delay, and power characteristics of standard footless domino logic circuits with different keeper sizes. (a) Effect of the increased keeper size on the circuit characteristics of a four input domino AND gate. (b) Effect of the increased keeper size on the circuit characteristics of a four input domino OR gate. NML 1, Delay 1, and Power 1: only one input is excited while the other inputs are either grounded (for the OR gates) or connected to $V_{DD}$ (for the AND gates). NML 2, Delay 2, and Power 2: All four inputs are excited with the same input or noise signal | 189 |
| 9.3 | A k input domino OR gate with a variable threshold voltage keeper                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | 193 |
| 9.4 | Waveforms that characterize the operation of the variable threshold voltage keeper circuit technique                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | 194 |
| 9.5 | Body bias generator circuit                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | 194 |
| 9.6 | A four-bit multiple-output domino carry generator of a carry lookahead adder implemented with the variable threshold voltage keeper circuit technique | 197 |

| 9.7  | Variation of the power-delay product (PDP), delay, power, and noise margin low (NML) characteristics of CG-DVTVK with V<sub>DD2</sub>. Values are normalized to those of a standard domino (SD) carry generator circuit with the same size transistors (KPR = 2.2) | 198 |
|------|---------------------------------------------------------------------------------|-----|
| 9.8  | SD and DVTVK simulation results for different keeper to critical path           |     |
|      | equivalent transistor width ratios (KPR). (a) Evaluation delay versus           |     |
|      | KPR. (b) Power dissipation versus KPR. (c) Noise margin versus KPR.             |     |
|      | (d) Power delay product versus KPR                                              | 200 |
| 9.9  | Clock delayed domino logic with the variable threshold voltage keeper           |     |
|      | circuit technique                                                               | 205 |
| 9.10 | Variation of the delay, power, and PDP savings of the CG-DVTVK and              |     |
|      | COR-DVTVK circuits with the output load capacitance as compared to              |     |
|      | CG-SD and COR-SD, respectively (KPR = 2.2)                                      | 209 |
| 9.11 | An eight-input footless domino OR gate with forward body biased keeper          | 211 |
| 9.12 | Variation of COR-DVTVK noise margins with the forward body bias for             |     |
|      | KPR = 1 and $KPR = 2.2$ . The noise margins are normalized to the zero          |     |
|      | body biased keeper condition. NML-ONE: noise couples to one input               |     |
|      | while all of the other inputs are grounded. NML-ALL: noise couples to           |     |
|      | all of the inputs                                                               | 213 |
| 9.13 | Variation of the savings in delay, power, and PDP of COR-DVTVK as               |     |
|      | compared to COR-SD with a forward body bias applied to the keeper for           |     |
|      | two different keeper sizes. {Delay1, Power1, PDP1} → KPR = 1.                   |     |
|      | {Delay2.2, Power2.2, PDP2.2} → KPR = 2.2                                        | 214 |
| 10.1 | Power trends of high performance microprocessors                                | 219 |
| 10.2 | A dual-V<sub>t</sub> domino logic circuit                                       | 222 |
| 10.3 | Variation of the subthreshold leakage current conduction paths with the         |     |
|      | state of the dynamic and output nodes in a two input standard low-V<sub>t</sub> |     |
|      | domino AND gate. (a) High (H) dynamic node voltage state. (b) Low (L)           |     |
|      | dynamic node voltage state. LVK: Low-V<sub>t</sub> keeper transistor. LVPU:     |     |
|      | Low-V<sub>t</sub> pull-up transistor. LVN: Low-V<sub>t</sub> NMOS transistor    | 223 |
| 10.4 | Variation of the subthreshold leakage current conduction paths with the           |     |
|      | node voltages in a two input dual-V<sub>t</sub> domino AND gate. (a) High (H)   |     |
|      | dynamic node voltage. (b) Low (L) dynamic node voltage. HVK:                    |     |
|      | High-V<sub>t</sub> keeper transistor. HVPU: High-V<sub>t</sub> pull-up transistor. HVN: High-V<sub>t</sub> |     |
|      | NMOS transistor                                                                   | 224 |
| 10.5 | Comparison of the subthreshold leakage current of low-V<sub>t</sub> and dual-V<sub>t</sub> |     |
|      | domino logic circuits for the two states of the dynamic node. The leakage         |     |
|      | current of each gate is normalized to the leakage current of the                  |     |
|      | corresponding low-V<sub>t</sub> gate with a high (H) dynamic node voltage. L: low |     |
|      | dynamic node voltage. AND2, AND4, AND6, and AND8: 2, 4, 6, and 8                  |     |
|      | input, respectively, domino AND gates. OR2, OR4, and OR8: 2, 4, and 8             |     |
|      | input, respectively, domino OR gates. MUX16: 16-bit domino                        |     |
|      | multiplexer                                                                       | 225 |
| 10.6 | Comparison of the noise immunity of low-V<sub>t</sub> and dual-V<sub>t</sub> domino logic |     |
|      | circuits with the same size transistors. The noise margin of each gate is         |     |
|      | normalized to the noise margin of the corresponding low-V<sub>t</sub> gate. HVK: |     |
|      | high-V<sub>t</sub> keeper. LVK: low-V<sub>t</sub> keeper                        | 229 |
| 10.7 | Comparison of the subthreshold leakage current of low-V<sub>t</sub> and dual-V<sub>t</sub> |     |
|      | domino logic circuits for the two states of the dynamic node (under               |     |
|      | similar noise immunity conditions). The leakage current of each gate is           |     |
|      | normalized to the leakage current of the corresponding low-V<sub>t</sub> gate with a |     |
|      | high dynamic node voltage (H). L: low dynamic node voltage.                       |     |
|      | Dual-V<sub>t</sub>-HVK: dual-V<sub>t</sub> domino with high-V<sub>t</sub> keeper. Dual-V<sub>t</sub>-LVK: dual-V<sub>t</sub> |     |
|      | domino with low-V<sub>t</sub> keeper. Low-V<sub>t</sub>: standard low-V<sub>t</sub> domino circuit | 230 |
| 10.8  | Subthreshold leakage current conduction paths for the low (L) voltage                        |     |
|       | state of the dynamic node in a dual-V<sub>t</sub> domino AND gate with a low-V<sub>t</sub> |     |
|       | keeper. LVK: Low-V<sub>t</sub> keeper transistor. HVPU: High-V<sub>t</sub> pull-up |     |
|       | transistor. HVN: High-V<sub>t</sub> NMOS transistor                                         | 231 |
| 10.9  | Comparison of the evaluation delay of domino logic circuits. The                             |     |
|       | evaluation delay of each gate is normalized to the delay of the                              |     |
|       | corresponding low-V<sub>t</sub> gate. SNI: same noise immunity                               | 233 |
| 10.10 | Comparison of the precharge delay of domino logic circuits. The                              |     |
|       | precharge delay of each gate is normalized to the precharge delay of the                     |     |
|       | corresponding low-V<sub>t</sub> gate. SNI: same noise immunity                               | 233 |
| 10.11 | Comparison of the power consumption of domino logic circuits during                          |     |
|       | the active mode. The power consumed by each gate is normalized to the                        |     |
|       | power consumption of the corresponding low-V<sub>t</sub> gate. SNI: same noise              |     |
|       | immunity                                                                                     | 234 |
| 10.12 | The range of savings in subthreshold leakage current provided by the                         |     |
|       | dual-V<sub>t</sub> domino logic circuit technique as compared to the standard                |     |
|       | low-V<sub>t</sub> domino logic circuit technique for three different sets of dual threshold  |     |
|       | voltages. Min_L and Max_L: minimum and maximum, respectively, of                             |     |
|       | the reduction in subthreshold leakage current as compared to the low-V<sub>t</sub>           |     |
|       | domino logic circuits at a low dynamic node voltage state. Min_H and                         |     |
|       | Max_H: minimum and maximum, respectively, of the reduction in                                |     |
|       | subthreshold leakage current as compared to the low-V<sub>t</sub> domino logic               |     |
|       | circuits at a high dynamic node voltage state                                                | 236 |
| 10.13 | The range of difference in evaluation delay of the dual-V<sub>t</sub> circuits as        |     |
|       | compared to the low-V<sub>t</sub> domino logic circuits for three different sets of      |     |
|       | dual threshold voltages. A negative difference indicates a smaller                       |     |
|       | evaluation delay as compared to a low-V<sub>t</sub> circuit. Min: minimum                |     |
|       | difference in evaluation delay as compared to the low-V<sub>t</sub> domino logic         |     |
|       | circuits. Max: maximum difference in evaluation delay as compared to                     |     |
|       | the low-V<sub>t</sub> domino logic circuits                                              | 237 |
| 10.14 | The range of difference in precharge delay between the dual-V<sub>t</sub> and            |     |
|       | low-V<sub>t</sub> domino logic circuits for three different sets of dual threshold       |     |
|       | voltages. Min: minimum difference in precharge delay as compared to                      |     |
|       | low-V<sub>t</sub> domino logic circuits. Max: maximum difference in precharge            |     |
|       | delay as compared to low-V<sub>t</sub> domino logic circuits                             | 238 |
| 10.15 | The range of difference in the power consumption (during the active                      |     |
|       | mode) of the dual-V<sub>t</sub> and low-V<sub>t</sub> domino logic circuits for three different |     |
|       | sets of dual threshold voltages. A negative difference indicates smaller                 |     |
|       | power consumption as compared to a low-V<sub>t</sub> circuit. Min: minimum               |     |
|       | difference in power consumption as compared to low-V<sub>t</sub> domino logic            |     |
|       | circuits. Max: maximum difference in power consumption as compared                       |     |
|       | to low-V<sub>t</sub> domino logic circuits                                               | 239 |
| 11.1 | Standard domino logic circuits. (a) Standard low-V<sub>t</sub> domino logic circuit. (b) Standard dual-V<sub>t</sub> domino logic circuit. High-V<sub>t</sub> transistors are symbolically represented by a thick line in the channel region | 243 |
| 11.2 | Sleep switch dual-V<sub>t</sub> domino logic circuit technique. High-V<sub>t</sub> transistors are symbolically represented by a thick line in the channel region | 246 |
| 11.3 | Block diagram of a clock-delayed domino carry lookahead adder with the sleep switch dual-V<sub>t</sub> circuit technique | 248 |
| 11.4 | A comparison of the leakage energy (per clock cycle) of the adder circuits with the low-V<sub>t</sub>, standard dual-V<sub>t</sub>, and sleep switch circuit techniques for six different input vectors | 250 |
| 11.5 | Variation of subthreshold leakage current conduction paths with input vector for a high voltage state at the dynamic node in a standard dual-V<sub>t</sub> domino logic circuit. (a) Sources of subthreshold leakage current for V<sub>0</sub>. (b) Sources of subthreshold leakage current for V<sub>1</sub>. (c) Sources of subthreshold leakage current for V<sub>5</sub>. H: high. L: low | 252 |
| 11.6 | A comparison of the delay, power, and power delay product (PDP) of adder circuits with low-V<sub>t</sub>, standard dual-V<sub>t</sub>, and sleep switch circuit techniques for the input vectors V<sub>2</sub> and V<sub>3</sub> |     |
| 11.7 | Cumulative standby energy dissipation of the low-V<sub>t</sub> and sleep switch adders for three different input vectors | 258 |
| 11.8 | Cumulative standby energy dissipation of the sleep switch and standard dual-V<sub>t</sub> adders for three different input vectors | 259 |
| 11.9 | Under similar noise immunity conditions, a comparison of the leakage energy (per clock cycle) of the adder circuits with the low-V<sub>t</sub>, standard dual-V<sub>t</sub>, and sleep switch circuit techniques for six different input vectors | 262 |
| 11.10 | Under similar noise immunity conditions, cumulative standby energy dissipation of the low-V<sub>t</sub> and sleep switch adders for three different input vectors | 264 |
| 11.11 | Under similar noise immunity conditions, cumulative standby energy dissipation of the sleep switch and standard dual-V<sub>t</sub> adders for three different input vectors | 265 |
| 12.1  | Domino logic circuit and voltage transfer characteristics (VTC). (a) Standard domino logic circuit with a keeper (SDK). (b) VTC of SDK for various threshold voltages                                                                                                            | 270 |
| 12.2  | The proposed low swing domino logic with fully driven keeper (LSDFDK) circuit technique                                                                                                                                                                                          | 273 |
| 12.3  | The proposed low swing domino logic with weakly driven keeper (LSDWDK) circuit technique                                                                                                                                                                                         | 274 |
| 12.4  | Three stage pipeline of four input domino AND gates                                                                                                                                                                                                                              | 275 |
| 12.5  | Four input domino AND gates based on the proposed low swing domino logic circuit techniques. (a) LSDFDK AND gate. (b) LSDWDK AND gate.                                                                                                                                           | 275 |
| 12.6  | Simulation results for different pull-down network transistor sizes for a constant keeper size ( $W_{keeper} = W_{min}$ ). (a) Power versus pull-down network equivalent width (PNEW). (b) Evaluation time versus PNEW. (c) Maximum tolerable noise amplitude (MTNA) versus PNEW | 277 |
| 12.7  | Dual threshold voltage implementation of the proposed low swing domino logic with a fully driven keeper (LSDFDK) circuit technique. High-V<sub>t</sub> transistors are illustrated with a bold line in the channel area | 280 |
| 12.8  | Dual threshold voltage implementation of the proposed low swing domino logic with a weakly driven keeper (LSDWDK) circuit technique. High-V<sub>t</sub> transistors are illustrated with a bold line in the channel area | 281 |
| 14.1  | Triple gate and gate-all-around (quadruple gate) MOSFETs | 294 |
| 14.2 | An integrated circuit with multiple voltage and clock domains           | 298 |
| 14.3 | Effect of technology scaling on the physical geometries of interconnect |     |
|      | lines                                                                   | 301 |
| 14.4 | Various sources of noise in a microprocessor                            | 303 |

## **Chapter 1**

#### Introduction

The scaling of semiconductor process technologies has been continuing for more than four decades. Advancing process technologies are the fuel that has been moving the semiconductor industry and the key to its growth. In response to growing customer demand for enhanced performance and functionality at reduced cost, a new process technology has been introduced by the semiconductor industry every two to three years during the past four decades [1]. Both the performance and the complexity of integrated circuits have grown dramatically since the invention of the integrated circuit in 1959. Microphotographs of the first monolithic integrated circuit (Fairchild Semiconductor, 1959), the first microprocessor (Intel 4004, 1971), and a recent microprocessor (Intel Pentium 4, 2002) are shown in Fig. 1.1.



Fig. 1.1. Microphotographs of three landmark ICs from the evolution of the integrated circuit technology (the sizes of the dies are not to scale). (a) The first monolithic integrated circuit, Fairchild Semiconductor (1959). (b) The first microprocessor, Intel 4004 (1971). (c) The latest Intel Pentium 4 microprocessor (2002).

Technology scaling reduces the delay of the circuit elements, enhancing the operating frequency of an integrated circuit (IC) [1]-[5]. The density and number of transistors on an IC increase with the scaling of the feature sizes. By utilizing this growing number of available transistors in each new process technology, novel circuit techniques and microarchitectures can be employed, further enhancing the performance of the ICs beyond the levels that are made possible by simply scaling (or shrinking) a previous generation [1]-[7]. The price for the performance and functional enhancements has traditionally been increased complexity and power consumption. Generation, distribution, and dissipation of power are at the forefront of current problems faced by IC designers.

Historically, circuit techniques and architectures employed during the evolution of the integrated circuits have followed two different paths. For a group of technologies, enhancing the speed has been at the core of the design process. This class of ICs represents the high end of the performance spectrum. In this high end arena, increasing clock frequency and die size and the wide-spread use of power hungry circuit techniques and microarchitectures (with continuously increasing levels of speculative execution often translated into an inefficient use of energy) have increased the power consumption many fold over the years [2], [3], [7]. Until recently, the removal of heat in high performance ICs was handled by inexpensive packaging solutions, passive heat sinks, and air fans. With the power dissipation of ICs rising well above 100 watts, however, more expensive packaging and cooling solutions such as liquid cooling or refrigeration hardware will soon be required [2]-[10]. The power dissipation and heat removal issues are likely to be the cause for the end of the ever-decreasing price to performance ratios of high performance ICs.

Another group of ICs has emerged as a result of the customer demand for miniaturization and portability. Portable devices, until recently, represented the low end of the performance spectrum with power constraints always dominating over performance [4], [6], [9]. Extended battery life and reduced system cost constraints drove the portable equipment design process until the 1990s. However, since the 1990s, strong customer demand has been growing for higher performance (for high speed computing and data transfer) and a wider variety of applications in portable equipment. Today, people expect from their portable devices the same computing capability as a desktop system.

While the performance of mobile devices continues to advance at a fast pace in accordance with general semiconductor technology trends, the evolution of battery technologies has proceeded at a much slower pace [4], [9], [11]. Before rechargeable battery technologies evolved to offer sufficient energy in a miniaturized volume, standard disposable alkaline battery technology was the popular power solution. Frequent battery purchases coupled with the inconvenience of carrying replacement batteries pushed the vendors to search for a rechargeable battery solution. Nickel-Cadmium (Ni-Cd) chemistry (invented in 1899) became the battery supply for portable devices towards the end of the 1980s. Ni-Cd was replaced by Nickel-Metal-Hydride (Ni-M-H) chemistry during the mid-1990s. Ni-M-H batteries offer twice the energy density with faster charging times as compared to Ni-Cd batteries [11]. The Lithium-ion (Li-ion) chemistry (first introduced in the early 1990s) gradually replaced the Ni-M-H technology towards the end of the last decade. Li-ion chemistry, with better energy density characteristics as compared to both the Ni-Cd and Ni-M-H batteries, is the most widely used battery technology today [11].

Vendors respond to the continuous market demand for greater functionality and higher processing speed while continuing to decrease the physical size and weight of the portable devices. Batteries are, therefore, required to offer increasing amounts of energy while occupying smaller volumes as the semiconductor technology progresses [11]. Today, the lack of a low cost, small volume, and light weight battery technology with a higher energy density than the Li-ion technology is a primary limitation to further advancements in portable integrated circuit technologies.

Traditional circuits and architectures of high performance ICs, because of the typical power hungry characteristics of these technologies, are not usable in ICs designed for portable systems. Alternatively, circuits and architectures that have been developed for portable devices, because of the typical low throughput characteristics of these technologies, are not applicable to high performance ICs. Today, the IC industry is experiencing a shift in the requirements at both the high performance and portability ends of the market. Power dissipation is no longer a secondary issue in high performance ICs. Similarly, enhancing the throughput is as important as lowering the power, area, and weight in many portable devices. Energy efficient semiconductor devices, circuit techniques, and microarchitectures are necessary to maintain the pace of expansion that the semiconductor industry has been enjoying for the past forty years [1]-[12].

Going back in history, the invention of the transistor in 1947 can be seen as the first step towards low power electronics. Operation of a vacuum tube requires hundreds of volts of anode voltage and a few watts of power. In comparison, a transistor operates at a higher speed and at a significantly lower supply voltage and consumes orders of magnitude less power [4]. Similarly, the invention of the integrated circuit (IC) in the late 1950s can be seen as the first step towards low power microelectronics. ICs consume less power, weigh less, and occupy a smaller volume while offering the same functionality, often with enhanced performance and reliability, as compared to circuits composed of discrete devices [13], [14]. The trends that shaped the evolution of the IC technology are reviewed in Section 1.1. An outline of this dissertation is presented in Section 1.2.

# 1.1. Evolution of the Integrated Circuit Technology

The monolithic integrated circuit (IC) was invented in 1959. The primary reasons for implementing certain functions as ICs were to lower the weight and size while enhancing the reliability and performance characteristics as compared to circuits composed of discrete devices [13]. ICs were an expensive technology during the 1960s, limiting the use of ICs to specific military applications with severe requirements of weight, size, and reliability. Gordon Moore noticed in 1965, only six years after the birth of the very first integrated circuit, that the unit costs of integrated circuits were steadily decreasing as the technology evolved and fabrication techniques matured [13], [14]. Moore saw that shrinking transistor sizes, increasing manufacturing yield, and the larger wafer and die sizes would make ICs increasingly cheaper, more powerful, and more plentiful. As Moore declared in 1965, "the future of integrated electronics is the future of electronics itself" [13]. Advances in integrated circuit technology enabled the so called "information age" that is experienced today. A timeline of some of the key events that have led to the invention and advancement of IC technologies is illustrated in Fig. 1.2.



Fig. 1.2. A timeline of some of the key events during the evolution of semiconductor technologies.

The essence of Moore's Law is depicted in Fig. 1.3 [13]. As more components are added onto an integrated circuit at a particular process technology generation (or a technology node), the relative manufacturing cost per component decreases (assuming that the same amount of semiconductor material and the same package are used to incorporate more components) [13]. However, as more components are integrated onto the same die, the complexity (at the process, circuits, and layout levels) increases, degrading yield. There is, therefore, an optimum number of components per IC that minimizes the total manufacturing cost at any generation in the evolution of an IC technology [13]. The unit price of a transistor decreases as the device dimensions scale, defect densities are reduced, and wafer and die sizes grow [13], [14]. The optimum number of transistors that minimizes the total manufacturing cost, therefore, increases from one technology generation to the next as shown in Fig. 1.3. The total number of transistors that can be integrated onto a piece of semiconductor material has increased by more than a million times since the mid 1960s, verifying the analysis Gordon Moore made in 1965. What began as an observation has become both the compass and the engine of the semiconductor industry, setting the bar over the past four decades.



Fig. 1.3. The general form of Moore's Law [13].
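The cost trade-off described above can be illustrated with a toy model. The sketch below is not taken from [13]; it assumes a fixed die cost and a hypothetical exponential yield model, exp(-n / n0), where n0 is an illustrative complexity scale that grows as defect densities fall. Under these assumptions the cost per working component has a minimum, and the minimum moves to a larger component count as n0 increases.

```python
import math

def cost_per_component(n, die_cost=1.0, n0=1000.0):
    """Relative cost per working component for n components on one die.

    Assumed toy model (not from the dissertation): a fixed die cost and an
    exponential yield exp(-n / n0). n0 is a hypothetical complexity scale.
    """
    yield_fraction = math.exp(-n / n0)
    return die_cost / (n * yield_fraction)

def optimum_components(n0, candidates):
    """Return the candidate component count with the lowest cost per component."""
    return min(candidates, key=lambda n: cost_per_component(n, n0=n0))

# Sweep the component count for three illustrative values of n0.
candidates = range(100, 20001, 100)
for n0 in (1000.0, 4000.0, 16000.0):
    best = optimum_components(n0, candidates)
    print(f"n0 = {n0:7.0f}: lowest cost per component near {best} components")
```

In this model the minimum falls at n = n0, so a maturing process (larger n0) shifts the optimum to a higher level of integration, which is the qualitative behavior sketched in Fig. 1.3.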

High performance microprocessors represent the front end of the market demand for enhanced performance and functionality. No other IC technology has, therefore, employed more aggressive semiconductor process technologies, circuits, and architectures than high performance microprocessors [1], [2]. The high performance microprocessor and high density random access memory (RAM) industries have, historically, encountered the side effects of the advances of process technologies before any other branch of the semiconductor industry. The focus of this section is on the advancements of high performance microprocessor technologies. The technological trends in the evolution of the lead Intel microprocessors will be examined. The choice of the lead microprocessor product line of Intel Corporation is due to the significant role that the company has played in the semiconductor industry by creating innovative technologies during the past three decades. Similar technological trends can also be observed in other leading vendor product lines. Common trends in the variation of some of the technological parameters among different microprocessor generations for three vendors are listed in Table 1.1.

TABLE 1.1
TECHNOLOGICAL TRENDS OF HIGH PERFORMANCE MICROPROCESSORS

| Vendor                      | DEC Alpha |       |       | AMD  |      |      | IBM/MOTOROLA PowerPC |      |      |
|-----------------------------|-----------|-------|-------|------|------|------|----------------------|------|------|
| Microprocessor              | 21064     | 21164 | 21264 | K5   | K6   | K7   | 750                  | 7400 | 7450 |
| Technology (μm)             | 0.75      | 0.5   | 0.35  | 0.5  | 0.3  | 0.25 | 0.29                 | 0.2  | 0.18 |
| Frequency (MHz)             | 200       | 300   | 600   | 75   | 233  | 500  | 266                  | 400  | 667  |
| Die Area (mm²)              | 234       | 299   | 314   | 251  | 162  | 184  | 67                   | 83   | 106  |
| Transistor Count (millions) | 1.68      | 9.3   | 15.2  | 4.3  | 8.8  | 22   | 6.35                 | 10.5 | 33   |
| Supply Voltage (V)          | 3.3       | 3.3   | 2.2   | 3.5  | 3.3  | 1.6  | 2.6                  | 1.8  | 1.6  |
| Supply Current (A)          | 9.1       | 15.2  | 32.7  | 3.3  | 9.2  | 26.3 | 3                    | 6.3  | 11.9 |
| Power (Watts)               | 30        | 50    | 72    | 11.6 | 30.2 | 42   | 7.9                  | 11.3 | 19   |
| Power Density (W/cm²)       | 12.8      | 16.7  | 22.9  | 4.6  | 18.6 | 22.8 | 11.8                 | 13.6 | 17.9 |
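The supply current and power density rows of Table 1.1 appear consistent with the simple relations I = P / V<sub>DD</sub> and power density = P / (die area). The following sketch recomputes both rows from the power, supply voltage, and die area entries; the function and variable names are illustrative and do not appear in the original text.

```python
# Rows: (processor, power W, supply voltage V, die area mm^2,
#        listed supply current A, listed power density W/cm^2), from Table 1.1.
TABLE_1_1 = [
    ("Alpha 21064",  30.0, 3.3, 234.0,  9.1, 12.8),
    ("Alpha 21164",  50.0, 3.3, 299.0, 15.2, 16.7),
    ("Alpha 21264",  72.0, 2.2, 314.0, 32.7, 22.9),
    ("AMD K5",       11.6, 3.5, 251.0,  3.3,  4.6),
    ("AMD K6",       30.2, 3.3, 162.0,  9.2, 18.6),
    ("AMD K7",       42.0, 1.6, 184.0, 26.3, 22.8),
    ("PowerPC 750",   7.9, 2.6,  67.0,  3.0, 11.8),
    ("PowerPC 7400", 11.3, 1.8,  83.0,  6.3, 13.6),
    ("PowerPC 7450", 19.0, 1.6, 106.0, 11.9, 17.9),
]

def supply_current(power_w, vdd_v):
    """Average supply current I = P / V_DD in amperes."""
    return power_w / vdd_v

def power_density(power_w, area_mm2):
    """Power density in W/cm^2 (1 cm^2 = 100 mm^2)."""
    return power_w / (area_mm2 / 100.0)

# Print the recomputed values next to the values listed in the table.
for name, p, v, a, i_listed, d_listed in TABLE_1_1:
    i = supply_current(p, v)
    d = power_density(p, a)
    print(f"{name:13s} I = {i:5.2f} A (listed {i_listed}), "
          f"density = {d:5.2f} W/cm^2 (listed {d_listed})")
```

The recomputed values agree with the listed entries to within rounding, which makes the trend in the table explicit: within each product line the supply voltage falls while the supply current and the power density rise.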

The first primary development behind the IC evolution is the advancing fabrication technology that permits technology scaling [1]-[9]. The feature sizes of the transistors and interconnect have been continually scaled, increasing the transistor density in each new process technology generation. The minimum feature size of the transistors in the lead Intel microprocessors has decreased from 10 µm in 1971 to 0.13 µm in 2002 as shown in Fig. 1.4. The second primary development behind the IC evolution is the reduced defect densities due to the maturing fabrication technology, thereby making larger dies (individual integrated circuits or chips) economical. Die areas grew steadily by about 14% per year from 1971 to 1995 as shown in Fig. 1.5. Starting with the mid 1990s, however, limiting further increases in die size became necessary due to concerns about power consumption [3], [5]. As a result of the reduced physical dimensions of the transistors and the increased die area, the total number of transistors in the lead Intel microprocessors has increased by twenty four thousand times over the past three decades as shown in Fig. 1.4.



Fig. 1.4. Scaling of the minimum feature size and the increasing total number of transistors within each lead Intel microprocessor.
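The cumulative factors quoted above can be translated into average annual rates. The sketch below is a back-of-the-envelope calculation that assumes steady compound growth and a 31 year span (1971 to 2002) for the transistor count; both are simplifying assumptions, and the helper names are illustrative.

```python
import math

def annual_growth_rate(total_factor, years):
    """Average compound annual growth rate implied by a cumulative factor."""
    return total_factor ** (1.0 / years) - 1.0

def doubling_time(rate):
    """Years needed to double at a given compound annual growth rate."""
    return math.log(2.0) / math.log(1.0 + rate)

def cumulative_factor(rate, years):
    """Cumulative growth factor after compounding a rate for a number of years."""
    return (1.0 + rate) ** years

# Transistor count: 24,000x, assumed to span 1971 to 2002 (31 years).
rate = annual_growth_rate(24000.0, 31)
print(f"transistor count: {rate:.1%} per year, "
      f"doubling every {doubling_time(rate):.2f} years")

# Die area: about 14% per year from 1971 to 1995 (24 years).
print(f"die area: {cumulative_factor(0.14, 24):.1f}x over 24 years")
```

Under these assumptions the transistor count grows by somewhat under 40% per year, which corresponds to a doubling roughly every two years, while the 14% annual die area growth compounds to a little over a twenty-fold increase between 1971 and 1995.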

The increasing number of transistors per IC in each new process technology generation offers more tools for enhancing circuit performance and functionality. The propagation delays are reduced as the physical dimensions of the transistors are scaled. Technology scaling related enhancements coupled with advances in circuits and microarchitectures (such as deeper pipelining, superscalar, and out-of-order execution) have significantly increased the performance of integrated circuits [1]-[8]. As shown in Fig. 1.6, the operating frequency of the lead Intel microprocessors has increased by more than twenty eight thousand times since the introduction of the first microprocessor (Intel 4004) in 1971.



Fig. 1.5. Die area of lead Intel microprocessors.

The period of technology scaling, since the invention of the first integrated circuit, is divided into two separate eras depending upon the characteristics of the supply voltage in a scaled technology as compared to a preceding technology generation. The supply voltage in the first three Intel microprocessor generations was 12 volts as shown in Fig. 1.6. Starting with the 3 µm technology node, the supply voltage was reduced to 5 volts. IC supply voltages were maintained at 5 volts up until the 0.8 µm technology node that was commercialized at the beginning of the 1990s (see Fig. 1.6). At the 0.8 µm technology node, supply voltage scaling became an essential part of the technology scaling process due to transistor reliability and power consumption concerns [3]-[5], [15]. The era (until 1993 in the case of Intel) during which supply voltage scaling was not necessarily a part of technology scaling is called the constant voltage scaling era. The technology scaling era (after 1993 in the case of Intel) during which supply voltage scaling has accompanied the scaling of the other device parameters is called the constant field scaling era [3], [4], [15]. The name constant field scaling arises from the fact that the supply voltage for a new technology is ideally chosen so as to maintain the electric fields between the terminals of a transistor constant [15]. The need to slow the rate of increase in power consumption became an increasingly important factor in supply voltage scaling towards the end of the 1990s. Today, the requirements for lowering the power consumption and improving device reliability together with the targeted circuit speed determine the rate of supply voltage scaling [1]-[5], [7], [8], [11], [15], [16].
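The difference between the two scaling eras can be illustrated with idealized first-order scaling rules. The sketch below is a simplified model rather than the analysis of [15]: all physical dimensions are assumed to shrink by a factor s, the gate capacitance is assumed to scale as 1/s, and, as a simplifying assumption applied to both cases, the operating frequency is assumed to scale as s. Switching power is taken as proportional to C·V²·f. More detailed device models give different exponents, particularly for the constant voltage case.

```python
def scale_technology(s, constant_field):
    """First-order scaling factors for a dimension scaling factor s > 1.

    All results are relative to the previous technology generation.
    The model (C ~ 1/s, f ~ s, P ~ C * V^2 * f) is an idealized assumption.
    """
    length = 1.0 / s                            # channel length and other dimensions
    voltage = 1.0 / s if constant_field else 1.0
    field = voltage / length                    # electric field E ~ V / L
    capacitance = 1.0 / s                       # area / oxide thickness
    frequency = s                               # assumed: gate delay scales as 1/s
    power_per_gate = capacitance * voltage ** 2 * frequency
    area_per_gate = 1.0 / s ** 2
    return {
        "field": field,
        "power_per_gate": power_per_gate,
        "power_density": power_per_gate / area_per_gate,
    }

for label, cf in (("constant field", True), ("constant voltage", False)):
    result = scale_technology(2.0, constant_field=cf)
    print(label, result)
```

In this idealized model, scaling the supply voltage together with the dimensions keeps both the electric field and the power density unchanged, whereas holding the supply voltage fixed raises the field by s and the power density by s². This is consistent with the reliability and power consumption concerns that motivated supply voltage scaling.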

An increase in the operating frequency and die size (due to the increased number of transistors for additional circuitry and novel microarchitectures) not only enhances the performance, but also increases the power consumption [1]-[6], [8]-[10], [16], [17]. As shown in Fig. 1.7, the maximum power dissipation of the lead Intel microprocessors has been increasing over the past 30 years. The circuitry in the first two Intel microprocessor generations was PMOS. Starting with the Intel 8080, NMOS became the preferred topology due to the speed and area advantages of NMOS as compared to PMOS. NMOS circuits, however, suffered from static DC power dissipation and low noise margins [18], [20]. By the end of the 1970s, technology scaling of NMOS circuits became increasingly difficult as the low noise margins of the NMOS circuits did not permit supply voltage scaling to accompany scaling of the feature size [18]. The increasing number of transistors operating at higher clock frequencies at a high supply voltage coupled with the intrinsic static DC power dissipation of the NMOS circuits set the stage for the end of a decade-long dominance of NMOS. As shown in Fig. 1.8, the power density of the last NMOS microprocessor of Intel (the i8086 that was commercialized in 1978) is similar to the power density of a kitchen hot plate. The cooling technology that was available at the beginning of the 1980s was already limited, permitting no further technological advances that would lead to an increase in power dissipation.



Fig. 1.6. Operating frequency and supply voltage of lead Intel microprocessors.

The CMOS circuit topology (first proposed in 1963 [18], [20]) was adopted by the IC industry in the early 1980s due to the intrinsically lower power dissipation and better scaling characteristics of CMOS as compared to NMOS [18]-[20]. The higher noise margins in CMOS circuits made possible supply voltage scaling that accelerated in the 1990s, enhancing both transistor reliability and energy efficiency. CMOS became the preferred circuit topology in the lead Intel microprocessors starting with the i286 (introduced in 1982). The transition from NMOS to CMOS reduced both the power consumption and the power density of the Intel microprocessors as shown in Figs. 1.7 and 1.8, respectively [5].



Fig. 1.7. Maximum power consumption of lead Intel microprocessors.

The reduction in the power dissipation of high performance microprocessors due to the transition to CMOS, however, provided only temporary relief. As designers continued to employ higher clock frequencies coupled with power hungry circuits and highly speculative architectures in order to achieve enhanced performance, the power consumption and power density of the post-NMOS era (*i.e.*, CMOS and BiCMOS) ICs were, once again, pushed to higher levels. As shown in Figs. 1.7 and 1.8, respectively, both the power consumption and power density of the lead Intel microprocessors (with the exception of the first generation Pentium 3) have been increasing since the introduction of the second generation CMOS microprocessor (i386) in 1985. As shown in Fig. 1.8, the power density of the current high performance microprocessors has well exceeded the power density of the heating coil of a kitchen hot plate [2], [3], [5], [21].



Fig. 1.8. Power density trends of lead Intel microprocessors.

The temperature of a die is controlled to maintain proper operation of the circuitry compliant with technical specifications [5], [10], [22]. Thermal management of high performance integrated circuits has become increasingly difficult due to the continuously increasing power dissipation and power density in each new process technology generation [2]-[5], [10], [12], [16], [17], [21], [23]. Within a few technology generations, traditional cooling solutions such as low cost heat sinks and air flow fans will become ineffective for thermal management of ICs [2]-[5], [10], [22]. If the current trend in the rate of increase in the power levels continues, ICs will consume thousands of watts of power in the near future [2], [5], [21]. The power density of a high performance microprocessor will exceed the power density levels encountered in typical rocket nozzles within the next decade [2], [5]. No cost effective cooling solutions that can handle power densities in excess of nuclear reactors or rocket nozzles are likely for integrated circuits. As acknowledged by many designers and researchers, excessive power dissipation has emerged as the single greatest threat to further advances in integrated circuit technologies [1]-[14].



Fig. 1.9. Increasing current demand of lead Intel microprocessors.

Another important issue is the increasing current demand of the ICs. While the power consumption of the ICs continues to increase, the supply voltages have been reduced as shown in Fig. 1.6. The supply current, therefore, increases as shown in Fig. 1.9. Increased current demand of ICs coupled with the scaling of the interconnect (as part of the technology scaling process) creates metal migration and resistive voltage drop problems in the power distribution network [21], [24], [25]. The tolerance of integrated circuits to voltage fluctuations in the power supply grid is typically reduced while the resistance of the interconnect is increased with technology scaling [18], [19], [24]. The resistive voltage drop of the power distribution grid, therefore, has become an increasingly important issue. Furthermore, current slew rates have increased due to the higher operating frequencies as well as the growing current demand. Therefore, in addition to resistance, the inductance of power distribution grids has also become an important factor that affects the propagation delay, area, and power consumption of CMOS circuits. Simultaneous switching noise ($L \frac{di}{dt}$) due to the inductance of a power distribution grid affects the propagation delays, thereby degrading the performance and causing circuit malfunction under certain circumstances [25].
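The trends described above can be made concrete with a short numerical sketch. The supply current follows from the power and supply voltage, the resistive drop from Ohm's law, and the inductive noise from $L \frac{di}{dt}$. All numerical values (100 W, 1.2 V, a 1 mΩ grid resistance, a 10 pH grid inductance, and a 10 A current step in 1 ns) are hypothetical and are not taken from the figures in this chapter. The function names are likewise illustrative.

```python
def supply_current(power_w, vdd):
    """Average supply current drawn by an IC dissipating power_w at vdd."""
    return power_w / vdd


def ir_drop(current_a, grid_resistance_ohm):
    """Resistive voltage drop across a lumped power grid resistance."""
    return current_a * grid_resistance_ohm


def l_di_dt(inductance_h, delta_i_a, delta_t_s):
    """Inductive noise for a current step delta_i_a occurring in delta_t_s."""
    return inductance_h * delta_i_a / delta_t_s


# Hypothetical operating point: 100 W at a 1.2 V supply.
current = supply_current(100.0, 1.2)
drop = ir_drop(current, 1e-3)          # assumed 1 mOhm lumped grid resistance
noise = l_di_dt(10e-12, 10.0, 1e-9)    # assumed 10 pH, 10 A step in 1 ns

print(f"supply current: {current:.1f} A")
print(f"IR drop:        {drop * 1e3:.1f} mV")
print(f"L di/dt noise:  {noise * 1e3:.1f} mV")
```

Even with these modest assumed parasitic impedances, the resulting voltage disturbances are a noticeable fraction of a 1.2 V supply, which is consistent with the concern raised in the text.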

Power generation, delivery, and dissipation are primary limitations to further scaling of semiconductor devices [1]-[9], [16]-[18], [21], [23]. In order to continue to reduce the unit cost of an IC while simultaneously enhancing the performance and functionality, radical changes are required in the way ICs have been designed during the past three decades. Higher performance at all costs is no longer an option. Novel energy efficient devices, circuits, microarchitectures, and macroarchitectures must be developed to lower the rate of increase in the power consumed by next generation ICs.

#### 1.2. Outline of the Dissertation

Several new techniques for the design of low power and high performance integrated circuits are proposed in this dissertation. Particular emphasis is placed on issues related to the scaling of the supply and threshold voltages in high performance integrated circuits.

An analysis of power dissipation related problems faced by the semiconductor industry starts with the identification of the sources of power consumption. The primary sources of power consumption in CMOS integrated circuits are described in Chapter 2.

Supply and threshold voltage scaling techniques, aimed at lowering power dissipation and enhancing device reliability without degrading performance, are discussed in Chapter 3. The importance of supply voltage scaling is discussed from an energy efficiency point of view. As the supply voltage is reduced, the performance of an IC degrades due to reduced transistor currents [27]. Systems with multiple supply voltages can minimize the speed degradation for reducing power by selectively lowering the supply voltages along non-critical delay paths [28]. Dynamic and static versions of multiple supply voltage IC design techniques are reviewed. Another alternative technique for reducing the impact of supply voltage scaling on circuit performance is threshold voltage scaling. Threshold voltage scaling has accelerated together with scaling of the supply voltages during the past decade. At reduced threshold voltages, however, subthreshold leakage currents increase. Supply voltage scaling when coupled with threshold voltage reduction, therefore, increases the leakage power while reducing the dynamic switching power. Multiple threshold voltage circuits reduce leakage currents while enhancing performance by selectively lowering the threshold voltages only on speed critical paths [29]. Dynamic threshold voltage scaling (V<sub>t</sub>-hopping) and multiple threshold voltage CMOS circuit techniques are also reviewed in Chapter 3. Dynamic and static versions of multiple threshold voltage circuit techniques are discussed.

The generation and distribution of the energy required for the functioning of an IC are important challenges due to system level power budget limitations and circuit reliability issues. Increasing supply currents together with reduced supply voltages degrade the energy efficiency and voltage quality of the power generation and distribution networks in high performance ICs. Energy efficient low voltage monolithic DC-DC conversion and voltage regulation techniques are developed in this dissertation. Before presenting the details of the proposed monolithic DC-DC conversion techniques in the following chapters, the basics of DC-DC conversion and a review of several widely employed types of low voltage DC-DC converters are presented in Chapter 4.

In single power supply microprocessors, the primary power supply is typically an external (non-integrated) buck converter. In a typical non-integrated switching DC-DC converter, significant energy is dissipated by the parasitic impedances of the interconnect between the non-integrated devices (the filter inductor, filter capacitor, power transistors, and pulse width modulation circuitry) [9], [26], [30], [31]. Moreover, the devices of a discrete DC-DC converter are typically fabricated in old technologies with poor parasitic impedance characteristics. Integrating a DC-DC converter onto a microprocessor can potentially lower the parasitic losses as the interconnect between (and within) the DC-DC converter and the microprocessor is reduced. Additional energy savings can be realized by utilizing advanced deep submicrometer fabrication technologies with lower parasitic impedances. The efficiency attainable with a monolithic DC-DC converter, therefore, is higher than a non-integrated DC-DC converter [30]. An analysis of an on-chip buck converter is presented in Chapter 5. A model of the parasitic impedances of a buck converter is proposed. With this model, a design space is determined that supports the integration of active and passive devices on the same die for a target technology. A monolithic, high efficiency, and high frequency switching DC-DC converter with an integrated inductor on the same die as a dual supply voltage microprocessor is demonstrated to be feasible.

The model proposed in Chapter 5 provides an accurate representation of the parasitic losses of a full voltage swing DC-DC converter (with an error of less than 2.4% as compared to simulation). A high switching frequency is the key design parameter that enables the full integration of a high efficiency DC-DC converter. At these high switching frequencies, the energy dissipated in the power MOSFETs and gate drivers dominates the total losses of a DC-DC converter. The efficiency can, therefore, be improved by applying a variety of MOSFET power reduction techniques [31]. A low swing MOSFET gate drive technique is proposed in Chapter 6 that improves the efficiency of a DC-DC converter. A new circuit model for low swing circuit optimization is proposed. The gate voltages and transistor sizes are included as independent parameters in the proposed model. The optimum gate voltage swing of a power MOSFET that maximizes efficiency is shown to be lower than a standard full voltage swing. Lowering the input and output voltage swing of a power MOSFET gate driver is shown to be effective for enhancing the efficiency characteristics of a DC-DC converter.

Due to the advantages of high voltage power delivery on a circuit board and monolithic DC-DC conversion, next generation low voltage and high power microprocessors are likely to require high input voltage, large step-down DC-DC converters monolithically integrated onto the same die. The voltage conversion ratios attainable with standard non-isolated switching DC-DC converter circuits are, however, limited due to MOSFET reliability issues. Provided that a DC-DC converter is integrated onto the same die as a microprocessor (fabricated in a low voltage nanometer CMOS technology), the range of input voltages that can be applied to a standard DC-DC converter circuit is further reduced. A standard non-isolated switching DC-DC converter topology such as the buck converter circuit discussed in Chapters 5 and 6 is, therefore, not suitable for future high performance integrated circuits. A cascode bridge circuit that can be used in a monolithic DC-DC converter providing a high voltage conversion ratio is presented in Chapter 7. The cascode bridge circuit ensures that the voltages across the terminals of all of the MOSFETs in a DC-DC converter are maintained within the limits imposed by an available low voltage CMOS technology.

In ICs with multiple supply voltages, signal transfer among the regions operating at different voltage levels requires specialized voltage interface circuits [32]. Another low power circuit technique that requires voltage level conversion is low-swing interconnect signaling. At each new IC generation, the relative amount of interconnect increases due to the greater number of transistors and the larger die size. In many recent systems, charging and discharging these interconnect lines can require more than 50% of the total power consumed on-chip. In certain programmable logic devices, more than 90% of the total power consumption is due to the interconnect wires [32]. Decreasing the signal voltage swing on the interconnect can significantly decrease the power consumption. In a low swing interconnect architecture, voltage level converters are placed at the driver and receiver ends of the low swing interconnect to change the voltage swing. A bi-directional CMOS voltage interface circuit that drives high capacitive loads to full swing at high speed while consuming no static DC power is presented in Chapter 8. The propagation delay, power consumption, and power efficiency characteristics of the proposed voltage interface circuit are compared to other interface circuits described in the literature. The proposed voltage interface circuit offers significant power savings and lower propagation delay as compared to these circuits.

Domino logic circuit techniques are extensively applied in high performance microprocessors due to the superior speed and area characteristics of dynamic CMOS circuits as compared to static CMOS circuits. High speed operation of domino logic circuits is primarily due to the lower noise margins of dynamic circuits as compared to static gates. This property of a lower noise margin, however, makes domino logic circuits highly sensitive to noise as compared to static gates. On-chip noise becomes more severe with technology scaling and higher operating frequencies. Furthermore, the noise sensitivity of domino logic circuits increases with technology scaling. Error free operation of domino logic circuits has, therefore, become a major challenge [33]. A variable threshold voltage keeper circuit technique is proposed in Chapter 9 for simultaneous power reduction and speed enhancement of domino logic circuits. The threshold voltage of the keeper transistor is modified during circuit operation to reduce the contention current without sacrificing noise immunity. The variable threshold voltage keeper circuit technique is shown to enhance circuit evaluation speed by up to 60% while reducing power consumption by 35% as compared to a standard domino logic circuit. The keeper size can be increased while preserving the same delay or power characteristics as compared to a standard domino circuit since the contention current is reduced with the proposed technique. The proposed domino logic circuit technique offers 14.1%, 8.9%, or 11.9% higher noise immunity under the same delay, power, or power-delay product conditions, respectively, as compared to a standard domino logic circuit technique. Forward body biasing the keeper transistor is also proposed for improved noise immunity as compared to a standard domino circuit with the same keeper size. 
It is shown that by applying forward and reverse body bias circuit techniques, the noise immunity and evaluation speed of domino logic circuits are both enhanced.

The subthreshold leakage current of a domino logic circuit can vary dramatically with the voltage state of the dynamic and output nodes. A quantitative study of the subthreshold leakage current characteristics of standard low threshold voltage and dual threshold voltage domino logic circuits is presented in Chapter 10. Different subthreshold leakage current conduction paths which exist depending upon whether the dynamic node is charged or discharged are identified. A discharged dynamic node is preferable for reducing leakage current in a dual threshold voltage circuit. Alternatively, a charged dynamic node is preferred for lower subthreshold leakage current in a standard low threshold voltage domino logic circuit with stacked pull-down devices, such as an AND gate. The effect of dual threshold voltage CMOS technologies on the noise immunity characteristics of domino logic circuits is also evaluated in Chapter 10.

A circuit technique is presented in Chapter 11 for exploiting the dynamic node voltage dependent asymmetry of the subthreshold leakage current characteristics of domino logic circuits. Sleep switch transistors are proposed to place an idle dual threshold voltage domino logic circuit into a low leakage state. The circuit technique enhances the effectiveness of a dual threshold voltage CMOS technology to reduce the subthreshold leakage current by strongly turning off all of the high threshold voltage transistors. The sleep switch circuit technique significantly reduces the subthreshold leakage energy as compared to both standard low threshold voltage and dual threshold voltage domino logic circuits. A domino adder enters and leaves the low leakage sleep mode within a single clock cycle. The energy overhead of the circuit technique is low, justifying the activation of the proposed sleep scheme by providing a net savings in total power consumption during short idle periods.

The low swing circuit technique has become an attractive method to reduce power consumption in high performance integrated circuits. Static CMOS circuits driven by low swing input signals, however, dissipate excessive power while displaying poor delay characteristics. Specialized voltage interface circuits are required to transfer signals between static CMOS circuits operating at different voltage levels. Low swing circuit techniques, however, can be effective in domino logic circuits [34]. In a domino gate, the input signals are only applied to the NMOS transistors in the pull-down path, while a single pull-up PMOS transistor is driven by a separate clock signal. Therefore, a low swing signal that transitions between ground and a second sufficiently high voltage level to effectively turn on an NMOS transistor does not impose any functional or static DC power consumption problems in domino logic circuits [34]. A low swing domino logic circuit is proposed in Chapter 12 to reduce dynamic switching power consumption without degrading the noise immunity. The low swing concept is also applied to the domino circuit keeper to further reduce the power consumption while enhancing speed. A simple and efficient circuit technique is proposed for a dual threshold voltage implementation of the proposed low swing circuits. Significant reductions in leakage power (when the circuit is idle) without incurring a delay penalty (when the circuit is active) are observed as compared to low threshold voltage circuits.

A summary of the research results presented in this dissertation is provided in Chapter 13. Finally, future work inspired by current challenges faced by the semiconductor industry is discussed in Chapter 14. Some important research topics that respond to these challenges are proposed.

# Chapter 2

# **Sources of Power Consumption in CMOS Integrated Circuits**

Power consumption is a primary limitation to the further advancement of semiconductor technologies. Identifying the sources of power consumption is critical for developing power reduction techniques at the circuit, architecture, and fabrication technology levels. There are four sources of power consumption in CMOS circuits. The total power consumption of a CMOS circuit is

$$P_{total} = P_{dynamic} + P_{leakage} + P_{short-circuit} + P_{DC}, \qquad (2.1)$$

where $P_{dynamic}$ is the dynamic switching power dissipated while charging or discharging the parasitic capacitances during a node voltage transition. $P_{leakage}$ is a combination of the subthreshold leakage power due to the non-ideal off state characteristics of the MOSFET switches and the gate leakage power caused by carrier tunneling through the thin gate oxides. $P_{short-circuit}$ is the transitory power dissipated during an input signal transition when both the pull-up and pull-down networks of a CMOS gate are simultaneously on. $P_{DC}$ is the static DC power consumed when a CMOS circuit is driven by low voltage swing input signals.

All four of these sources of power consumption in a CMOS integrated circuit are analyzed in this chapter. The dynamic switching power is discussed in Section 2.1. The sources of leakage power are identified in Section 2.2. The short-circuit and static DC power consumption mechanisms are discussed in Sections 2.3 and 2.4, respectively.

# 2.1. Dynamic Switching Power

The dominant component of power consumption in a typical CMOS circuit is the dynamic switching power [4], [9], [12], [21], [23], [27]-[29], [36]. The dynamic switching power is dissipated while charging or discharging the parasitic capacitances during the voltage transitions of the nodes within a CMOS circuit. The dynamic switching power is independent of the type of switching gate and the shape of the input waveform (input rise and fall times). The dynamic switching power is dependent only on the supply voltage, the switching frequency, the initial and final voltages, and the equivalent capacitance of a switching node [4], [9], [36]. Since the switching power is independent of the type of switching gate, a block diagram representation of a generic CMOS gate (as shown in Fig. 2.1) is used to explain dynamic switching power dissipation in CMOS circuits.



Fig. 2.1. A CMOS gate driving an output capacitor. The drain-to-body junction capacitances of the driver gate, the equivalent capacitance of the interconnect, and the gate oxide capacitance of the load transistors are lumped into a single equivalent capacitance, C<sub>L</sub>.

For a low-to-high transition at the output node, the pull-up network is activated and the pull-down network is disabled. The portion of the current sourced by the power supply that passes through the pull-up transistors to charge the output capacitor is denoted by  $I_{out}(t)$ . The instantaneous power drawn from the power supply to charge the output capacitor is

$$P(t) = V_{DD}I_{out}(t), \tag{2.2}$$

$$I_{out}(t) = C_L \frac{dV_{out}(t)}{dt}, \tag{2.3}$$

where  $V_{out}(t)$  is the instantaneous voltage across the output capacitor.

The energy drawn from the power supply for a  $V_1 \rightarrow V_2$  transition at the output node voltage is

$$E_{V_1 \to V_2} = \int_{t_1}^{t_2} P(t)dt = V_{DD} \int_{t_1}^{t_2} I_{out}(t)dt = C_L V_{DD} \int_{V_1}^{V_2} dV_{out}(t) = C_L V_{DD}(V_2 - V_1), \tag{2.4}$$

$$V_{swing} = V_2 - V_1, \tag{2.5}$$

$$E_{V_1 \to V_2} = C_L V_{DD} V_{swing} , \qquad (2.6)$$

where $E_{V_1 \to V_2}$ is the energy drawn from the power supply for charging the output capacitance from an initial voltage of $V_1$ to a final voltage of $V_2$ and $t_1$ and $t_2$ are the times at which the output voltage reaches $V_1$ and $V_2$, respectively. After the $V_1 \to V_2$ transition of the output node voltage is completed, the energy stored in the output capacitor is

$$E_{C_L} = \int_{t_1}^{t_2} P_{C_L}(t) dt = \int_{t_1}^{t_2} V_{out}(t) I_{out}(t) dt = C_L \int_{V_1}^{V_2} V_{out}(t) dV_{out}(t) = \frac{1}{2} C_L (V_2^2 - V_1^2), \tag{2.7}$$

where $P_{C_L}(t)$ is the instantaneous power delivered to the output capacitor. The remaining portion of the energy drawn from the power supply is dissipated in the parasitic resistances of the pull-up transistors during the output $V_1 \rightarrow V_2$ transition.
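The energy bookkeeping of (2.4) and (2.7) can be checked with a brief sketch. The function name `charging_energy` and the numerical values (a 10 fF load and a 1.8 V supply with a full swing transition) are assumptions chosen for illustration only.

```python
def charging_energy(c_load, vdd, v1, v2):
    """Energy terms for a V1 -> V2 output transition.

    Returns (e_supply, e_stored, e_pullup):
      e_supply: energy drawn from the supply, per (2.4)
      e_stored: energy stored in the load capacitor, per (2.7)
      e_pullup: remainder dissipated in the pull-up network
    """
    e_supply = c_load * vdd * (v2 - v1)
    e_stored = 0.5 * c_load * (v2**2 - v1**2)
    e_pullup = e_supply - e_stored
    return e_supply, e_stored, e_pullup


# Hypothetical full swing transition: 10 fF load, 0 V -> 1.8 V, VDD = 1.8 V.
e_supply, e_stored, e_pullup = charging_energy(10e-15, 1.8, 0.0, 1.8)
print(f"drawn from supply:     {e_supply * 1e15:.1f} fJ")
print(f"stored in capacitor:   {e_stored * 1e15:.1f} fJ")
print(f"dissipated in pull-up: {e_pullup * 1e15:.1f} fJ")
```

For a full swing transition, half of the energy drawn from the supply is stored in the capacitor and the other half is dissipated in the pull-up network, independent of the pull-up resistance.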

For a high-to-low transition of the output node voltage, the pull-up network transistors are cut off and the pull-down network is enabled. The magnitude of the portion of the instantaneous current through the pull-down network transistors that discharges the output node capacitor is $I_{out}(t)$. The polarity (or direction) of this discharging current is opposite to the direction of the load current as shown in Fig. 2.1. The energy dissipated in the parasitic resistances of the pull-down network transistors to discharge the output capacitor is

$$E_{V_2 \to V_1} = \int_{t_1}^{t_2} P_{pulldown}(t) dt = -\int_{t_1}^{t_2} V_{out}(t) I_{out}(t) dt = -C_L \int_{V_2}^{V_1} V_{out}(t) dV_{out}(t), \qquad (2.8)$$

$$E_{V_2 \to V_1} = -\frac{1}{2}C_L(V_1^2 - V_2^2) = \frac{1}{2}C_L(V_2^2 - V_1^2) = E_{C_L}, \tag{2.9}$$

where $E_{V_2 \to V_1}$ is the energy dissipated in the pull-down network while discharging the output capacitor from an initial voltage of $V_2$ to a final voltage of $V_1$ and $t_1$ and $t_2$ are the times at which the output voltage reaches $V_2$ and $V_1$, respectively. As given by (2.7) and (2.9), all of the energy stored in the output capacitor during a $V_1 \to V_2$ transition is dissipated in the impedances of the pull-down network transistors during the following $V_2 \to V_1$ transition.
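The result of (2.9) can also be verified numerically. The following sketch integrates the discharge of the load capacitor to ground ($V_1 = 0$) through a pull-down network that is modeled, as a simplifying assumption, by a single linear resistor. The function name, the forward Euler integration, the step count, and the component values are all illustrative choices.

```python
def discharge_energy(c_load, r_pulldown, v_start, steps_per_tau=2000, n_tau=20):
    """Energy dissipated in a resistor while discharging a capacitor to ground.

    Uses forward Euler integration over n_tau time constants. The pull-down
    network is approximated by a fixed linear resistance (an assumption).
    """
    tau = r_pulldown * c_load
    dt = tau / steps_per_tau
    v = v_start
    energy = 0.0
    for _ in range(steps_per_tau * n_tau):
        i = v / r_pulldown        # magnitude of the discharge current
        energy += v * i * dt      # energy dissipated during this time step
        v -= i * dt / c_load      # capacitor voltage decays toward ground
    return energy


c_load, v2 = 10e-15, 1.8
expected = 0.5 * c_load * v2**2   # (2.9) evaluated with V1 = 0

for r in (1e3, 10e3):
    e = discharge_energy(c_load, r, v2)
    print(f"R = {r:.0f} ohm: simulated {e * 1e15:.2f} fJ, "
          f"closed form {expected * 1e15:.2f} fJ")
```

Within the integration error, the dissipated energy matches the closed form result for both resistance values. The resistance changes only the duration of the transition, not the energy dissipated.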

The power is the energy stored or dissipated per unit of time [38]. Assuming that a node voltage periodically transitions between $V_1$ and $V_2$ with a period of $T_s$ (a switching frequency of $f_s = 1/T_s$), the average dynamic power consumed by a CMOS gate driving a switching node is

$$P = \frac{E_{V_1 \to V_2}}{T_s} = f_s C_L V_{DD} V_{swing}. \tag{2.10}$$

In a CMOS IC, not all of the internal nodes necessarily change state at each clock cycle. In a synchronous CMOS IC, if statistical data are available for the average number of transitions experienced by a node during the execution of a certain task, an average activity factor $\alpha$ can be introduced into the power and energy expressions. The average power consumed for switching a node i in a CMOS circuit is

$$P_i = \alpha_i f_s C_L V_{DD} V_{swing} , \qquad (2.11)$$

where  $P_i$  is the average dynamic switching power dissipation of a gate driving the  $i^{th}$  node and  $\alpha_i$  is the probability that a state changing voltage transition will occur at the  $i^{th}$  node within a clock cycle.

Summing the average dynamic switching power consumed at all of the nodes within a circuit, the total dynamic switching power consumption of an IC is [9], [27]

$$P_{Total} = f_s V_{DD} \sum_{i=1}^{N} \alpha_i C_{L_i} V_{swing_i}, \qquad (2.12)$$

where N is the total number of nodes within a CMOS circuit, $C_{L_i}$ is the equivalent parasitic capacitance of the $i^{th}$ node, and $V_{swing_i}$ is the voltage swing on the $i^{th}$ node.
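As an illustration of (2.12), the summation over the nodes can be sketched as follows. The `Node` class, the function name, and the three example nodes (including their activity factors, capacitances, and voltage swings) are hypothetical values introduced only to exercise the expression.

```python
from dataclasses import dataclass


@dataclass
class Node:
    alpha: float    # activity factor of the node
    c_load: float   # equivalent parasitic capacitance in farads
    v_swing: float  # voltage swing on the node in volts


def total_switching_power(f_s, vdd, nodes):
    """Total dynamic switching power per (2.12)."""
    return f_s * vdd * sum(n.alpha * n.c_load * n.v_swing for n in nodes)


# Hypothetical circuit with three nodes at a 1 GHz clock and a 1.8 V supply.
nodes = [
    Node(alpha=1.0, c_load=20e-15, v_swing=1.8),   # node switching every cycle
    Node(alpha=0.1, c_load=10e-15, v_swing=1.8),   # typical logic node
    Node(alpha=0.05, c_load=50e-15, v_swing=0.9),  # low swing interconnect
]
p_total = total_switching_power(1e9, 1.8, nodes)
print(f"total dynamic switching power: {p_total * 1e6:.2f} uW")
```

In this example the node that switches every cycle dominates the total, while the large interconnect capacitance contributes comparatively little because of the low activity factor and the reduced voltage swing.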

In CMOS circuits, the node voltages are typically full swing between the ground and  $V_{DD}$ . The average switching power consumed by a full swing CMOS gate is [from (2.11)]

$$P_i = \alpha_i f_s C_L V_{DD}^2. \tag{2.13}$$
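The quadratic dependence of (2.13) on the supply voltage can be illustrated with a short sketch. The function name and the parameter values are assumptions. The switching frequency is held constant here for simplicity, although in practice a lower supply voltage also reduces the attainable circuit speed.

```python
def full_swing_power(alpha, f_s, c_load, vdd):
    """Average switching power of a full swing CMOS gate per (2.13)."""
    return alpha * f_s * c_load * vdd**2


# Hypothetical node: alpha = 0.1, 1 GHz clock, 10 fF load.
baseline = full_swing_power(0.1, 1e9, 10e-15, 1.8)
ratios = {}
for vdd in (1.8, 1.5, 1.2, 0.9):
    ratios[vdd] = full_swing_power(0.1, 1e9, 10e-15, vdd) / baseline
    print(f"VDD = {vdd:.1f} V -> {ratios[vdd]:.2f}x baseline power")
```

Halving the supply voltage from 1.8 V to 0.9 V lowers the switching power to one quarter of the baseline under these assumptions.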

# 2.2. Leakage Power

A transistor switch is essentially a resistive-capacitive network between the power supply and ground. Due to the non-ideal off-state characteristics (a finite resistance) of a transistor, current is drawn from a power supply even when a transistor operates in the cutoff region. The leakage currents are dominated by weak inversion and reverse biased p-n junction diode currents in long channel devices [4], [9], [36], [37], [48]. In deep submicrometer integrated circuits, other leakage mechanisms such as drain-induced barrier lowering (DIBL) and gate oxide tunneling are also important [4], [37], [39]-[48].

The primary mechanisms of leakage current within deep submicrometer integrated circuits are reviewed in this section. The sources of subthreshold leakage current are discussed in Section 2.2.1. The gate-insulator leakage current mechanisms in deep submicrometer MOSFETs are reviewed in Section 2.2.2.

#### 2.2.1. Subthreshold Leakage Current

A MOSFET operates in the weak inversion (subthreshold) region when the magnitude of the gate-to-source voltage is less than the magnitude of the threshold voltage [44]. In the weak inversion mode, current conduction between the source and drain (the subthreshold leakage current) is primarily due to diffusion of the carriers [4], [9], [23], [44], [48]. The transistor off-state current (I<sub>OFF</sub>) is the drain current when the gate-to-source voltage is zero [4], [37]. I<sub>OFF</sub> is affected by the threshold voltage, channel length, channel width, depletion width beneath the channel area, channel/surface doping profiles, drain/source junction depths, gate oxide thickness, supply voltage, and the junction temperature [37]. The variation of the drain current of an NMOS transistor as a function of the gate voltage for three different drain voltages assuming a 0.18 µm CMOS technology is shown in Fig. 2.2. Measurements reveal the dominant leakage current mechanisms within a MOSFET fabricated in a deep submicrometer CMOS process.



Fig. 2.2. The drain current of a short-channel n-type MOSFET as a function of the gate-to-source voltage ($V_{GS}$) for three different drain-to-source voltages ($V_{DS}$). The DIBL, weak inversion, and p-n junction diode leakage components of the drain subthreshold leakage current are indicated (0.18 µm CMOS technology, $W = 10 \times W_{min}$, and $L = L_{min}$).

As the length of the channel is scaled, the capability of the gate to control the charge and potential distribution in the channel area degrades. The threshold voltage of a MOSFET is reduced with decreasing channel length [37], [39]-[45], [48]. The effects of scaling the channel length on the threshold voltage and subthreshold leakage current characteristics of a MOSFET are called *short-channel effects*. The short-channel effects are discussed in Section 2.2.1.1. While the gate loses some control of the channel region, the effect of the drain on the voltage potential distribution across the channel area increases with scaling of the gate length [45], [50]. The effect of the bias conditions of the drain on the threshold voltage and subthreshold leakage current characteristics of a MOSFET is called drain-induced barrier lowering (DIBL) [44], [48]. DIBL is discussed in Section 2.2.1.2. The various parameters that characterize subthreshold leakage current in a deep submicrometer IC are reviewed in Section 2.2.1.3.
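The sensitivity of the off-state current to the threshold voltage and the drain bias can be illustrated with a commonly used first-order exponential model. This model is an assumption introduced here for illustration and is not taken from this section: the drain current at $V_{GS} = 0$ is taken to fall by one decade for every subthreshold swing of threshold voltage, and DIBL is represented by a linear reduction of the threshold voltage with the drain-to-source voltage. The function name and all parameter values (a current of 1 µA at threshold, a threshold voltage of 0.4 V, a DIBL coefficient of 0.08 V/V, and a swing of 85 mV/decade) are hypothetical.

```python
def off_current(i_at_vt, vt0, vds, dibl_coeff, swing):
    """First-order off-state current at V_GS = 0 (illustrative model only).

    i_at_vt:    assumed drain current when V_GS equals the threshold voltage (A)
    vt0:        threshold voltage at zero drain bias (V)
    vds:        drain-to-source voltage (V)
    dibl_coeff: assumed DIBL coefficient (V of threshold shift per V of V_DS)
    swing:      assumed subthreshold swing (V per decade of current)
    """
    vt_eff = vt0 - dibl_coeff * vds              # DIBL lowers the threshold
    return i_at_vt * 10.0 ** (-vt_eff / swing)   # exponential subthreshold law


# Hypothetical device parameters.
i_low_vds = off_current(1e-6, 0.4, 0.1, 0.08, 0.085)
i_high_vds = off_current(1e-6, 0.4, 1.8, 0.08, 0.085)
dibl_ratio = i_high_vds / i_low_vds

# Lowering the threshold voltage by one subthreshold swing.
i_low_vt = off_current(1e-6, 0.4 - 0.085, 0.1, 0.08, 0.085)
vt_ratio = i_low_vt / i_low_vds

print(f"I_OFF increase from V_DS = 0.1 V to 1.8 V: {dibl_ratio:.1f}x")
print(f"I_OFF increase for an 85 mV lower V_t:     {vt_ratio:.1f}x")
```

Under this simplified model, a modest DIBL coefficient is sufficient to increase the off-state current by more than an order of magnitude between low and high drain bias, and each reduction of the threshold voltage by one subthreshold swing increases the off-state current by a factor of ten.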

#### 2.2.1.1. Short-Channel Effects

In a long channel MOSFET, extensions of the space charge regions at the source and drain-to-body p-n junctions into the channel area occupy only a small fraction of the entire channel region. The gate voltage controls most of the space charge induced in the channel area of a long channel device during inversion. The effects of the extensions of the source and drain depletion regions into the channel area on the threshold voltage are, therefore, negligible in a long channel device [4], [48], [50].

As the channel length is decreased, however, the gate begins to play a smaller role in the creation of a depletion layer in the channel area [37], [45], [50]. As the channel length of a MOSFET is reduced with technology scaling, the depletion regions around the source and drain terminals become closer. The total depth of the source and drain depletion regions becomes comparable to the effective channel length in deep submicrometer devices. More charge is contributed to the depletion region beneath the gate area by the source-to-substrate and drain-to-substrate depletion layers in a short-channel device as compared to a long-channel device [44], [48], [50]. The threshold voltage of a transistor is, therefore, reduced (typically called V<sub>t</sub> roll-off) with decreasing gate length as shown in Fig. 2.3 [39]-[45].

The increasing dependence of the threshold voltage of a short-channel MOSFET on the gate length is an issue that threatens the future of technology scaling [40], [50]. The within-die and inter-die variation of circuit parameters such as the gate length causes CMOS circuits to display different clock frequency and leakage power characteristics from die-to-die, wafer-to-wafer, and lot-to-lot. Increasing short-channel effects combined with increasing die-to-die and within-die variations of the critical dimensions (channel length, oxide thickness, and junction depth) cause the performance of CMOS integrated circuits to become increasingly probabilistic, degrading yield. The number of dies that satisfy functional, timing, and power dissipation specifications is reduced with technology scaling, thereby further increasing the cost of fabrication in each new technology generation [50].



Fig. 2.3. Short-channel MOSFET threshold voltage roll-off for super-halo (both vertically and laterally non-uniform) and retrograde (vertically non-uniform) doping profiles [37], [40]-[43].

Novel device structures are, therefore, necessary to control the short-channel effects in deep submicrometer CMOS technologies. As shown in Fig. 2.3, a retrograde doping profile (vertically non-uniform doping) in the channel region causes an unacceptably large threshold voltage degradation as the channel length is reduced. Novel and more complex doping profiles such as super-halo (both vertical and lateral non-uniform doping) are needed to control the short-channel effects in deep submicrometer devices [40]-[43], [51]-[55].

### 2.2.1.2. Drain-Induced Barrier Lowering (DIBL)

As the magnitude of the reverse bias voltage across the drain-to-body p-n junction is increased, the depth of the junction depletion layer increases. A deeper depletion layer around the drain contributes a larger amount of depletion charge to the channel. An increased drain-to-body reverse bias voltage, therefore, enhances the short-channel effects and lowers the threshold voltage of a MOSFET. The threshold voltage degradation caused by an increased drain bias voltage in an n-type MOSFET (or a decreased drain bias voltage in a p-type MOSFET) is commonly referred to as drain-induced barrier lowering (DIBL) [37], [44], [48]. As shown in Fig. 2.2, a significant portion of the subthreshold leakage current of a deep submicrometer MOSFET can be due to DIBL at high drain-to-body reverse bias voltages.

#### 2.2.1.3. Characterization of Subthreshold Leakage Current

The subthreshold leakage current in a short-channel MOSFET can be characterized by the following expression [48],

$$I_{subthreshold} = \frac{\mu W C_{OX}}{L} V_T^2 e^{\frac{|V_{GS}| - |V_t|}{n V_T}} \left(1 - e^{\frac{-|V_{DS}|}{V_T}}\right), \tag{2.14}$$

$$V_T = \frac{kT}{q},\tag{2.15}$$

where $\mu$ is the carrier mobility, W is the transistor width, L is the channel length, $C_{OX}$ is the oxide capacitance per unit area, $V_T$ is the thermal voltage, $V_{GS}$ is the gate-to-source voltage, $V_t$ is the threshold voltage, n is the subthreshold swing coefficient, $V_{DS}$ is the drain-to-source voltage, k is the Boltzmann constant (1.38 x $10^{-23}$ J/K), T is the absolute temperature (K), and q is the unit charge (1.6 x $10^{-19}$ C). The subthreshold swing coefficient (the ideality factor [40]) is the reciprocal of the rate of change in the channel surface potential as a function of the gate voltage [40], [100]. The subthreshold swing coefficient for a bulk MOSFET is [40]

$$n \cong 1 + \frac{\varepsilon_{Si} t_I}{\varepsilon_I t_{Si}}, \tag{2.16}$$

where  $\varepsilon_{Si}$  and  $\varepsilon_{I}$  are the permittivity of silicon and gate oxide, respectively,  $t_{Si}$  is the thickness of the depletion layer of the substrate, and  $t_{I}$  the physical thickness of the gate oxide.

Assuming the body effect is approximately linear for low source-to-body voltages [37], the threshold voltage of a short-channel MOSFET is

$$V_{t} \cong V_{t0} + \gamma V_{SB} - \eta V_{DS} , \qquad (2.17)$$

where  $V_{t0}$  is the zero body bias threshold voltage,  $\gamma$  is the body effect coefficient (assuming a linear approximation), and  $\eta$  is the drain-induced barrier lowering coefficient.
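As a rough numerical illustration of (2.14), (2.15), and (2.17), the following Python sketch evaluates the subthreshold current of a single n-type device at a low and a high drain bias. All device parameters (mobility, oxide capacitance, dimensions, $V_{t0}$, $\eta$, and n) are hypothetical values chosen only for illustration and do not describe a specific process; the function names are likewise assumptions.

```python
import math

K_BOLTZMANN = 1.38e-23  # J/K, value quoted in the text
Q_CHARGE = 1.6e-19      # C, value quoted in the text


def thermal_voltage(temp_k):
    """Thermal voltage V_T = kT/q of (2.15), in volts."""
    return K_BOLTZMANN * temp_k / Q_CHARGE


def threshold_voltage(vt0, gamma, v_sb, eta, v_ds):
    """Linearized threshold voltage of (2.17)."""
    return vt0 + gamma * v_sb - eta * v_ds


def subthreshold_current(mu, c_ox, w, l, v_gs, v_t, v_ds, n, temp_k):
    """Subthreshold current of (2.14), in amperes (SI units throughout)."""
    v_therm = thermal_voltage(temp_k)
    prefactor = mu * c_ox * (w / l) * v_therm ** 2
    gate_term = math.exp((abs(v_gs) - abs(v_t)) / (n * v_therm))
    drain_term = 1.0 - math.exp(-abs(v_ds) / v_therm)
    return prefactor * gate_term * drain_term


# Hypothetical device parameters (assumptions, not process data).
MU = 0.03        # m^2/(V s)
C_OX = 8.6e-3    # F/m^2
W, L = 1.0e-6, 0.18e-6  # m
VT0, GAMMA, ETA, N = 0.40, 0.2, 0.08, 1.4
TEMP_K = 300.0


def off_current(v_ds):
    """Leakage at V_GS = 0 and V_SB = 0 for a given drain bias."""
    v_t = threshold_voltage(VT0, GAMMA, 0.0, ETA, v_ds)
    return subthreshold_current(MU, C_OX, W, L, 0.0, v_t, v_ds, N, TEMP_K)


dibl_ratio = off_current(1.8) / off_current(0.1)
print(f"I_off(1.8 V) / I_off(0.1 V) = {dibl_ratio:.1f}")
```

With these assumed values, raising the drain bias from 0.1 V to 1.8 V increases the leakage by more than an order of magnitude, almost entirely through the $\eta V_{DS}$ term in (2.17).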

Weak inversion is the most significant source of the total leakage current in current deep submicrometer CMOS technologies [46], [48]. As given by (2.14) and (2.17), the subthreshold leakage current is exponentially dependent on the junction temperature and the gate-to-source, drain-to-source, and threshold voltages. The exponential variation of the subthreshold leakage current of an n-type MOSFET with temperature for four different CMOS technology generations is illustrated in Fig. 2.4. As shown in Fig. 2.4, the subthreshold leakage current increases with technology scaling due to the lower threshold voltages and increasing short-channel and drain-induced barrier lowering effects.



Fig. 2.4. Variation of the subthreshold leakage current with junction temperature for four different CMOS technology generations (W =  $10*W_{min}$  and L =  $L_{min}$ ).

A commonly used parameter in characterizing the leakage behavior of deep submicrometer circuits is the subthreshold slope (also called the gate swing or the subthreshold swing [100]). The subthreshold slope (S) is the amount of variation required in the gate-to-source ( $V_{GS}$ ) or threshold ( $V_t$ ) voltage in order to vary the weak inversion current by one decade [36], [100]. As shown in Fig. 2.2, the rate of change of the logarithm of the drain current [log ( $I_D$ )] with respect to the gate voltage is approximately linear in the subthreshold region. The subthreshold slope (mV/decade) can be evaluated by choosing two points in the subthreshold region of an $I_D$-$V_{GS}$ curve such that the subthreshold leakage current changes by a factor of ten. Using (2.14),

$$e^{\frac{|V_{GS1}|-|V_{GS2}|}{nV_T}} = 10, \tag{2.18}$$

$$S = |V_{GS1}| - |V_{GS2}| = nV_T \ln 10, \tag{2.19}$$

where $V_{GS1}$ and $V_{GS2}$ are the two gate-to-source voltages (in the subthreshold region of the $I_D$-$V_{GS}$ curve) between which the subthreshold current varies by one decade.
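The dependence of S on temperature and on the device structure can be made concrete by evaluating (2.16) and (2.19) numerically. The Python sketch below is only an illustration: the oxide and depletion layer thicknesses are assumed values, the relative permittivities (11.7 for silicon and 3.9 for SiO<sub>2</sub>) are the commonly quoted ones, and the function names are hypothetical.

```python
import math


def thermal_voltage(temp_k, k=1.38e-23, q=1.6e-19):
    """Thermal voltage V_T = kT/q of (2.15), in volts."""
    return k * temp_k / q


def swing_coefficient(t_ox, t_dep, eps_si=11.7, eps_ox=3.9):
    """Subthreshold swing coefficient n of (2.16).

    Only the permittivity ratio matters, so relative values are used.
    t_ox and t_dep must share the same length unit.
    """
    return 1.0 + (eps_si * t_ox) / (eps_ox * t_dep)


def subthreshold_slope_mv(n, temp_k):
    """Subthreshold slope S = n * V_T * ln(10) of (2.19), in mV/decade."""
    return n * thermal_voltage(temp_k) * math.log(10) * 1e3


# Assumed geometry: 2 nm gate oxide over a 20 nm deep depletion layer.
n_example = swing_coefficient(2e-9, 20e-9)
for temp_k in (300.0, 350.0, 400.0):
    s_ideal = subthreshold_slope_mv(1.0, temp_k)
    s_example = subthreshold_slope_mv(n_example, temp_k)
    print(f"T = {temp_k:.0f} K: S(n=1) = {s_ideal:.1f}, "
          f"S(n={n_example:.2f}) = {s_example:.1f} mV/decade")
```

For n = 1 the sketch yields approximately 59.6 mV/decade at 300 K, and S grows linearly with the absolute temperature for any fixed n.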

Substituting S in (2.14),

$$I_{subthreshold} = \frac{\mu W C_{OX}}{L} V_T^2 e^{\frac{\ln 10(|V_{GS}| - |V_t|)}{S}} \left(1 - e^{\frac{-|V_{DS}|}{V_T}}\right), \tag{2.20}$$

$$I_0 = \frac{\mu W C_{OX}}{L} V_T^2 e^{\frac{|V_{GS}| \ln 10}{S}} \left(1 - e^{\frac{-|V_{DS}|}{V_T}}\right), \tag{2.21}$$

$$I_{subthreshold} = I_0 e^{\frac{-|V_t|\ln 10}{S}} = I_0 y, \tag{2.22}$$

$$y = e^{\frac{-|V_t|\ln 10}{S}} \Rightarrow \ln y = \frac{-|V_t|\ln 10}{S} \Rightarrow \frac{\ln y}{\ln 10} = \frac{-|V_t|}{S} \Rightarrow \log_{10} y = \frac{-|V_t|}{S}, \tag{2.23}$$

$$y = 10^{\frac{-|V_t|}{S}}, \tag{2.24}$$

$$I_{subthreshold} = I_0 10^{\frac{-|V_t|}{S}}. \tag{2.25}$$

Equations (2.14), (2.20), (2.22), and (2.25) are commonly used subthreshold leakage current expressions found in the literature. Equation (2.14) is a simplified version of the Berkeley short-channel insulated gate field effect transistor model (BSIM) [49]. Equation (2.25) captures the approximately linear relationship between the logarithmic leakage current and the threshold voltage of a MOSFET operating in the subthreshold region (see Fig. 2.2). Alternatively, the following expression is often used to emphasize the logarithmic relationship between the gate-to-source and threshold voltages and the subthreshold leakage current,

$$I_{subthreshold} = I_0' 10^{\frac{(|V_{GS}| - |V_t|)}{S}}, \tag{2.26}$$

$$I_0' = \frac{\mu W C_{OX}}{L} V_T^2 \left(1 - e^{\frac{-|V_{DS}|}{V_T}}\right). \tag{2.27}$$
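The equivalence of the exponential form (2.14) and the decade form (2.26) can be checked numerically. The short Python sketch below compares only the voltage dependent factors of the two expressions (the common prefactor $I_0'$ cancels) and confirms that lowering $|V_t|$ by exactly S raises the leakage by one decade. The bias point and the value of n are arbitrary assumptions.

```python
import math


def exp_form(v_gs, v_t, n, v_therm):
    """Voltage dependent factor of (2.14): exp((|V_GS| - |V_t|) / (n V_T))."""
    return math.exp((abs(v_gs) - abs(v_t)) / (n * v_therm))


def decade_form(v_gs, v_t, s):
    """Voltage dependent factor of (2.26): 10^((|V_GS| - |V_t|) / S)."""
    return 10.0 ** ((abs(v_gs) - abs(v_t)) / s)


V_THERM = 0.025875                 # kT/q at 300 K with the constants of (2.15)
N = 1.4                            # assumed swing coefficient
S = N * V_THERM * math.log(10)     # subthreshold slope from (2.19), in volts

factor_exp = exp_form(0.0, 0.40, N, V_THERM)
factor_dec = decade_form(0.0, 0.40, S)
decade_step = decade_form(0.0, 0.40 - S, S) / decade_form(0.0, 0.40, S)

print(f"exponential form: {factor_exp:.6e}")
print(f"decade form:      {factor_dec:.6e}")
print(f"leakage increase for a V_t reduction of S: {decade_step:.3f}x")
```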

A low subthreshold slope is desirable as the subthreshold leakage current decreases exponentially with a reduced S. As given by (2.16) and (2.19), S can be reduced by lowering the thickness of the gate oxide and/or the doping concentration of the substrate (due to the increasing thickness of the depletion layer of the substrate). Another way to reduce S is to lower the junction temperature. As given by (2.16), if the depletion capacitance is assumed to be negligible as compared to the oxide capacitance ( $C_{DEPLETION}/C_{ox} \approx 0$  and  $n \approx 1$ ), the lower boundary of S is

$$S \ge V_T \ln 10. \tag{2.28}$$

As given by (2.28), a subthreshold slope of 60 mV/decade is a lower theoretical limit at room temperature for bulk silicon MOSFETs [84]. This minimum value of the subthreshold slope can be approached by fully depleted silicon-on-insulator devices [36], [73], [75]. Typical values of S vary between 80 mV/decade and 100 mV/decade at room temperature for bulk silicon MOSFETs fabricated in current CMOS technologies [51]-[55]. A comparison of the subthreshold slope, drain saturation current ( $I_{DSAT}$ ), and off-state leakage current ( $I_{OFF}$ ) of an NMOS transistor fabricated at different process technologies is listed in Table 2.1 [4], [51]-[55].

TABLE 2.1 A COMPARISON OF THE SUBTHRESHOLD SLOPE (S) AND LEAKAGE CURRENT ( $I_{OFF}$ ) OF NMOS TRANSISTORS FABRICATED AT DIFFERENT TECHNOLOGIES (T =  $25^{\circ}$ C) [4], [51]-[55]

| Technology (µm) | L<sub>eff</sub> (µm) | V<sub>DD</sub> (V) | V<sub>t</sub> (V) | I<sub>DSAT</sub> (mA/µm) | I<sub>OFF</sub> (nA/µm) | I<sub>DSAT</sub>/I<sub>OFF</sub> | S (mV/decade) |
|------------------------|-----------------------|---------------------|--------------------|---------------------------|-----------------------------|--------------------------|---------------|
| 0.80 | 0.55 | 5.0 | 0.60 | - | 5.8x10<sup>-5</sup> | - | |
| 0.60 | 0.35 | 3.3 | 0.58 | - | 1.5x10<sup>-4</sup> | - | 80 |
| 0.35 | 0.25 | 2.5 | 0.47 | - | 8.9x10<sup>-3</sup> | - | 80 |
| 0.25 | 0.15 | 1.8 | 0.43 | - | 24x10<sup>-3</sup> | - | 78 |
| 0.18 | 0.10 | 1.6 | 0.40 | - | 86x10<sup>-3</sup> | - | 85 |
| 0.18 | 0.10 | 1.5 | 0.29 | 1.04 | 3 | 347x10<sup>3</sup> | 90 |
| 0.13 | 0.06 | 1.4 | 0.30 | 1.14 | 10 | 114x10<sup>3</sup> | 85 |
| (Dual-V<sub>t</sub>) | 0.06 | 1.4 | 0.27 | 1.30 | 100 | 130x10<sup>2</sup> | 85 |
| 0.10 | 0.05 | 1.2 | 0.34 | 0.95 | 30 | 316x10<sup>2</sup> | 87 |
| 0.09 | 0.05 | 1.2 | - | 1.26 | 40 | 315x10<sup>2</sup> | 85 |
| (Dual-V<sub>t</sub>) | 0.05 | 1.2 | - | 1.45 | 400 | 3600 | 85 |
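The $I_{DSAT}/I_{OFF}$ column of Table 2.1 follows directly from the two current columns once the units (mA/µm and nA/µm) are reconciled. The Python sketch below recomputes the ratio for the rows in which both currents are listed. The row labels are an assumption: each "(Dual-V<sub>t</sub>)" row is taken to belong to the technology node listed immediately above it.

```python
# (label, I_DSAT in mA/um, I_OFF in nA/um), copied from Table 2.1.
ROWS = [
    ("0.18", 1.04, 3.0),
    ("0.13", 1.14, 10.0),
    ("0.13 dual-Vt", 1.30, 100.0),
    ("0.10", 0.95, 30.0),
    ("0.09", 1.26, 40.0),
    ("0.09 dual-Vt", 1.45, 400.0),
]


def on_off_ratio(i_dsat_ma_per_um, i_off_na_per_um):
    """Dimensionless on/off current ratio after converting both to A/um."""
    return (i_dsat_ma_per_um * 1e-3) / (i_off_na_per_um * 1e-9)


ratios = {label: on_off_ratio(i_on, i_off) for label, i_on, i_off in ROWS}
for label, ratio in ratios.items():
    print(f"{label:>13s} um: I_DSAT/I_OFF = {ratio:.3g}")
```

The recomputed ratios agree with the tabulated values to within rounding and fall by roughly two orders of magnitude between the 0.18 µm row and the last dual-V<sub>t</sub> row.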

## 2.2.2. Gate Oxide Leakage Current

Ideal scaling theory suggests maintaining the electric fields within a device by shrinking all of the voltages, currents, and physical dimensions and increasing all of the doping concentrations by the same scaling factor ( $\lambda$ ) [40], [56]. Historically, however, the voltages and currents have been scaled at a lower rate as compared to the physical dimensions. The electric fields within the devices, therefore, have significantly increased. The primary reason for the reluctance to scale the voltages and currents as rapidly as the physical dimensions has been the positive effect of increasing electric fields on the device performances (causing the carrier velocities to increase) [40], [56].

There are a number of challenges for continuing the scaling of MOSFET devices. A primary and most immediate challenge is imposed by continuing the scaling of the gate dielectric thickness. The scaling of the gate oxide thickness is crucial to enhancing the performances of semiconductor devices [40], [51]-[56]. Reducing the thickness of the gate oxide increases the oxide capacitance per unit area, thereby enhancing the drain current of a deep submicrometer MOSFET. An alternative, simpler argument that stems from geometric considerations is that the gate-insulator in a MOSFET should be thin as compared to the device channel length in order for the gate to exert dominant control over the charge distribution in the channel as compared to the source and drain terminals of a device [56]. An oxide thickness much smaller than the channel length reduces the short-channel effects (discussed in Section 2.2.1.1) [51]-[56]. Moreover, scaling the thickness of the gate oxide reduces the subthreshold slope, thereby lowering the subthreshold leakage current. The dielectric thickness is required to be a few percent of the channel length in a typical MOSFET [56]. Future scaling trends of the gate oxide in relation with general CMOS technology scaling trends, extracted from the projections of the International Technology Roadmap for Semiconductors, are listed in Table 2.2 [56].

Silicon dioxide (SiO<sub>2</sub>) has been the material of choice as the gate oxide insulator for the past three decades. SiO<sub>2</sub> is relatively easy to grow on silicon, forming an abrupt interface with near ideal electrical characteristics [37], [40], [56]. The trap and fixed charge densities at the Si-SiO<sub>2</sub> interface are typically less than one surface defect in $10^5$ surface silicon atoms, forming a nearly ideal interface between the silicon and SiO<sub>2</sub> [56].

TABLE 2.2
SEMICONDUCTOR DEVICE SCALING TRENDS [56]

| Year of Production                            | 1999    | 2002    | 2005    | 2008    | 2011    | 2014    |
|-----------------------------------------------|---------|---------|---------|---------|---------|---------|
| Minimum Feature Size (nm)                     | 180     | 130     | 100     | 70      | 50      | 35      |
| Gate Length (nm)                              | 100     | 70      | 50      | 35      | 24      | 18      |
| DRAM Bits/Chip                                | 1G      | 3G      | 8G      | 24G     | 64G     | 192G    |
| DRAM Chip Size (mm²)                          | 400     | 460     | 530     | 630     | 710     | 860     |
| Equivalent Physical Gate oxide Thickness (nm) | 1.9-2.5 | 1.5-1.9 | 1.0-1.5 | 0.8-1.2 | 0.6-0.8 | 0.5-0.6 |
| Dielectric Constant of DRAM Capacitor         | 22      | 50      | 250     | 700     | 1500    | 1500    |
| Power Supply (volts)                          | 1.5-1.8 | 1.2-1.5 | 0.9-1.2 | 0.6-0.9 | 0.5-0.6 | 0.5     |

As listed in Table 2.2, the equivalent physical thickness of the gate oxide will likely be scaled to the range of 1.0 nm to 1.5 nm to maintain a reasonable coupling capacitance between the gate and the channel of a MOSFET at the 100 nm technology generation. Oxides thinner than about 1.5 nm, however, conduct high direct tunneling currents. The quantum mechanical tunneling of carriers increases exponentially with decreasing insulator layer thickness [37], [40], [56]-[60]. The SiO<sub>2</sub> gate oxide tunneling current density versus the gate voltage for various insulator thicknesses of an NMOS device is shown in Fig. 2.5 assuming a 100 nm CMOS technology [40]-[43].



Fig. 2.5. Gate oxide tunneling current density as a function of the gate voltage for various gate oxide (SiO<sub>2</sub>) thicknesses assuming a 100 nm CMOS technology [40]-[43].

Direct tunneling current in an MOS device depends on the tunneling probability function and the number of tunneling carriers [56]. Different mechanisms of gate dielectric tunneling are shown in Fig. 2.6. The three mechanisms of gate dielectric tunneling leakage in a MOSFET are electron conduction band tunneling (ECB), electron valence band tunneling (EVB), and hole valence band tunneling (HVB) [39], [57]. Because of the higher tunneling barrier and the heavier effective mass of the holes, the hole tunneling current is approximately an order of magnitude smaller than the electron tunneling current for the same bias conditions [40], [56]. The gate oxide leakage current of a CMOS circuit is, therefore, typically dominated by the oxide leakage of the NMOS transistors.

As shown in Fig. 2.7, the gate tunneling current is composed of several components [39], [57].  $I_{gb}$  is the gate-to-substrate leakage current.  $I_{gs0}$  and  $I_{gd0}$  are, respectively, the leakage currents through the gate-to-source and gate-to-drain overlap regions.  $I_{gc}$  is the gate-to-channel tunneling current during operation in the inversion region. A portion of  $I_{gc}$  is collected by the source while the remaining portion of  $I_{gc}$  is collected by the drain. The primary gate tunneling current mechanisms (as shown in Fig. 2.6) that generate the current components (as illustrated in Fig. 2.7) for different regions of operation of a MOSFET are listed in Table 2.3 [57].



Fig. 2.6. The three mechanisms of gate dielectric tunneling current in an NMOS transistor.

A schematic of the subthreshold and gate oxide leakage current paths in a CMOS inverter is shown in Fig. 2.8. When the gate of an NMOS device is positively biased, an inversion layer is formed underneath the gate. The electrons in the inverted channel can tunnel to the positively biased polysilicon gate, producing a gate oxide leakage current. Assume that the drain current of a MOSFET in a 100 nm technology generation is approximately 1 mA/µm [56]. Also assume that the gate tunneling current is limited to 1% of the drain current in order to not significantly degrade the gain of the devices [56]. These assumptions constrain the acceptable maximum gate leakage current density to 10<sup>4</sup> A/cm<sup>2</sup>. As shown in Fig. 2.5 and assuming a supply voltage as listed in Table 2.2 for a 100 nm technology, a gate oxide thickness in the range of 1.0 nm to 1.5 nm would result in acceptable gate leakage without significantly degrading the gain of the devices in a 100 nm technology generation. These high levels of gate oxide tunneling current, however, are expected to significantly increase the energy dissipation and degrade the reliability characteristics of future deeply scaled CMOS technologies [37], [40], [56], [58], [59].



Fig. 2.7. Different components of gate dielectric tunneling current in a MOSFET.

TABLE 2.3

THE DOMINANT MECHANISMS OF GATE OXIDE TUNNELING CURRENT FOR DIFFERENT REGIONS OF OPERATION OF A MOSFET [57]

| Type of Transistor | I<sub>gc</sub> (Inversion) | I<sub>gb</sub> (V<sub>g</sub> > 0) | I<sub>gb</sub> (V<sub>g</sub> < 0) |
|--------------------|----------------------------|------------------------------------|------------------------------------|
| PMOS               | HVB                        | ECB                                | EVB                                |
| NMOS               | ECB                        | EVB                                | ECB                                |


Fig. 2.8. Standby power dissipation current paths in a CMOS circuit.

The direct tunneling currents that continuously flow in thin oxide layers create some long term reliability concerns [56]. The oxide defect density gradually increases due to the oxide leakage current. Eventually, the gate oxide destructively breaks down (via a short-circuit) [56], [58], [59]. As has been described in [58], the hard breakdown (failure) of the gate-insulator is highly dependent on the thickness of the insulator and the gate voltage. As the oxide tunneling current increases exponentially with decreasing oxide thickness and as the scaling of the supply voltage slows down due to the difficulty of scaling the threshold voltages, the time to breakdown will decrease with technology scaling [56], [58], [59].

For the gate oxide leakage current to not have a significant effect on the leakage energy characteristics of a deep submicrometer device, the gate oxide leakage current should not exceed the subthreshold leakage current. Assume that the subthreshold leakage current (I<sub>off</sub>) density of a MOSFET is 100 nA/µm in a 100 nm CMOS technology [56]. The gate oxide leakage current density will thus be limited to less than 100 A/cm² to have a non-dominant effect on the overall leakage energy characteristics. As shown in Fig. 2.5, this limits the thinnest acceptable gate oxide layer to approximately 1.25 nm at the 100 nm technology node [40], [56].
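The two gate leakage limits quoted in this section (10<sup>4</sup> A/cm<sup>2</sup> for preserving the device gain and 100 A/cm<sup>2</sup> for keeping the gate leakage below the subthreshold leakage) can be reproduced by dividing a current budget per unit width by the gate length. The Python sketch below assumes a gate length of 0.1 µm, which is the value that reproduces the quoted figures; the text does not state which length was used, and a shorter gate length (such as the 50 nm listed in Table 2.2 for the 100 nm generation) would double both limits. The function name is hypothetical.

```python
UM2_PER_CM2 = 1e8  # 1 cm^2 = 1e8 um^2


def gate_current_density_limit(i_per_width_a_per_um, gate_length_um):
    """Convert a current budget per unit width (A/um) into a current
    density (A/cm^2) over a gate area of (1 um width) x (gate length)."""
    return i_per_width_a_per_um / gate_length_um * UM2_PER_CM2


GATE_LENGTH_UM = 0.1   # assumption: chosen to match the quoted limits

DRAIN_CURRENT = 1e-3   # A/um, drain current of about 1 mA/um [56]
GAIN_FRACTION = 0.01   # gate current limited to 1% of the drain current
I_OFF = 100e-9         # A/um, subthreshold leakage of 100 nA/um [56]

gain_limit = gate_current_density_limit(
    GAIN_FRACTION * DRAIN_CURRENT, GATE_LENGTH_UM)
leakage_limit = gate_current_density_limit(I_OFF, GATE_LENGTH_UM)

print(f"gain-limited gate leakage:    {gain_limit:.3g} A/cm^2")
print(f"leakage-limited gate leakage: {leakage_limit:.3g} A/cm^2")
```

The leakage based constraint is two orders of magnitude tighter than the gain based constraint, which is why it determines the thinnest acceptable oxide.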

The dielectric thicknesses projected in Table 2.2 are, therefore, unrealizable beyond the year 2005 if SiO<sub>2</sub> is maintained as the insulator material. New materials with a higher dielectric constant (high-K) as compared to SiO<sub>2</sub> are required as the gate insulator beginning with the 100 nm technology generation [55], [56]. A material with a higher dielectric constant can have a physically thicker dielectric layer while offering an equivalent SiO<sub>2</sub> thickness that corresponds to the values listed in Table 2.2. A thicker and higher-K gate insulator material can significantly reduce the oxide tunneling leakage current while increasing the time to breakdown of the oxide, thereby enhancing both the device reliability and the energy efficiency [37], [56]. Some strong candidates that are likely to replace SiO<sub>2</sub> as the gate oxide material in the near future are shown in Fig. 2.9 [17].
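The tradeoff between the physical thickness and the equivalent SiO<sub>2</sub> thickness can be illustrated with the parallel plate relationship: two insulators provide the same capacitance per unit area when their thicknesses scale with their dielectric constants. The Python sketch below is a minimal illustration. The relative dielectric constant of 3.9 for SiO<sub>2</sub> is the commonly quoted value, whereas the high-K value of 20 is a hypothetical number that does not refer to any particular material in Fig. 2.9.

```python
K_SIO2 = 3.9  # relative dielectric constant of SiO2 (commonly quoted value)


def physical_thickness(t_equivalent_nm, k_high, k_sio2=K_SIO2):
    """Physical thickness (nm) of a high-K layer that matches the
    capacitance per unit area of an SiO2 layer t_equivalent_nm thick."""
    return t_equivalent_nm * k_high / k_sio2


def equivalent_oxide_thickness(t_physical_nm, k_high, k_sio2=K_SIO2):
    """Equivalent SiO2 thickness (nm) of a high-K layer."""
    return t_physical_nm * k_sio2 / k_high


K_HYPOTHETICAL = 20.0  # assumed high-K dielectric constant, illustration only

for t_eq in (1.5, 1.0, 0.6):
    t_phys = physical_thickness(t_eq, K_HYPOTHETICAL)
    print(f"equivalent thickness {t_eq:.1f} nm -> "
          f"physical thickness {t_phys:.2f} nm")
```

Under this assumption, a 1.0 nm equivalent thickness corresponds to a physical layer of roughly 5 nm. The calculation captures only the capacitive equivalence and says nothing about the tunneling barrier height.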

As discussed previously, gate oxide tunneling current not only depends on the thickness of the insulator but also on the effective mass of the carriers and the barrier height of the insulator [56]. Although employing a high-K dielectric provides the opportunity to increase the thickness of the insulator while enhancing the capacitive coupling between the gate and the channel, a higher-K material also typically decreases the bandgap of the insulator. A large bandgap is desirable as the barrier height typically scales with the bandgap [40], [56]. A high-K dielectric should, therefore, be chosen carefully as a thicker insulator does not always result in a significantly lower tunneling current.



Fig. 2.9. Comparison of the gate oxide capacitance per unit area versus the gate oxide leakage current density of various insulators [17] for Aluminum Oxide (Al<sub>2</sub>O<sub>3</sub>), Hafnium Dioxide (HfO<sub>2</sub>), Silicon Dioxide (SiO<sub>2</sub>), Tantalum Pentoxide (Ta<sub>2</sub>O<sub>5</sub>), Titanium Dioxide (TiO<sub>2</sub>), and Zirconium Dioxide (ZrO<sub>2</sub>).

Another issue with replacing SiO<sub>2</sub> with a high-K material is the possible degradation of the interface between the silicon and the new dielectric material. Provided that a new high-K material cannot offer a low defect density comparable to the Si-SiO<sub>2</sub> interface, the potential improvement in the transistor current due to enhanced capacitive coupling between the gate and the channel can be offset by degraded carrier mobility at the surface [56]. Increasing the dielectric constant above a certain limit has been shown to degrade the circuit performance due to a lower surface mobility provided that similar subthreshold leakage characteristics comparable to a standard SiO<sub>2</sub> MOSFET are maintained [60]. It has also been shown in [60] that employing a thicker high-K dielectric can increase the short-channel effects, thereby increasing the subthreshold leakage current and threshold voltage roll-off due to increasing fringing electric fields from the gate to the source and drain.

#### 2.3. Short-Circuit Power

In static CMOS circuits, there is a time period during the transition of the input signals when both the pull-up and pull-down network transistors are simultaneously on, thereby forming a DC current path between the power supply and ground. The DC current conducted by a CMOS circuit during an input signal transient (due to non-zero rise and fall times of the input signals) is called the short-circuit current [9], [36], [61]. The short-circuit current is temporarily observed during the portion of the input signal transition for which $V_{tn} \leq V_{in} \leq V_{DD} - |V_{tp}|$, where $V_{in}$ is the input voltage.

Short-circuit power is a function of the rise and fall times of the input and output signals and the output load. The short-circuit current can be significant when the rise and fall times of the input signals are significantly larger than the output rise and fall times as the short-circuit current path will exist for a longer period of time [36], [61]. As discussed in [9], [36], and [61], the short-circuit power typically contributes less than 10% of the total power consumed in a CMOS circuit provided that the input slew rate is higher than the output slew rate. However, if the output signal transition occurs faster than the input signal transition, the short-circuit power can be as high as the dynamic switching power [61].

As discussed in [62], the contribution of the short-circuit power to the total power consumption is expected to be smaller with technology scaling due to the increasing threshold to supply voltage ratio ( $V_t/V_{DD}$ ). The short-circuit current can be effectively eliminated by lowering the supply voltage below the sum of the threshold voltages of the PMOS and NMOS transistors,  $V_{DD} < V_{tn} + |V_{tp}|$  [36]. A CMOS circuit technology that does not suffer from short-circuit current is ultra-low power subthreshold CMOS.

The transistors in a subthreshold logic circuit operate in the weak inversion region [99] without consuming any short-circuit power.
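The condition under which a short-circuit path can exist follows from the two threshold voltages: both networks conduct only while the input lies above $V_{tn}$ and below $V_{DD} - |V_{tp}|$, and no such input range exists once $V_{DD} < V_{tn} + |V_{tp}|$. The Python sketch below expresses this check. The function name and the example voltages are assumptions chosen for illustration.

```python
def short_circuit_window(v_dd, v_tn, v_tp):
    """Return the input voltage range (low, high) over which both the
    pull-up and pull-down networks conduct, or None if no range exists.

    v_tp may be given with either sign; only its magnitude is used.
    """
    low = v_tn
    high = v_dd - abs(v_tp)
    if low >= high:
        # V_DD <= V_tn + |V_tp|: the two devices are never on together.
        return None
    return (low, high)


# Assumed example values (volts), not taken from a specific process.
nominal = short_circuit_window(1.8, 0.4, -0.4)
scaled = short_circuit_window(0.6, 0.4, -0.4)

print("V_DD = 1.8 V:", nominal)
print("V_DD = 0.6 V:", scaled)
```

In the second case the supply voltage is below the sum of the threshold voltage magnitudes, so the function reports that no short-circuit window exists.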

#### 2.4. Static DC Power

The change of the dominant IC technology from NMOS to CMOS in the early 1980s has diminished the issue of static DC power. CMOS circuits do not consume any static DC power (excluding leakage power) as long as the signal voltage at the internal nodes swings full rail between V<sub>DD</sub> and ground. Non-full rail voltage levels, however, are often encountered in CMOS circuits due to the employment of low signal swing circuitry such as NMOS pass gates [9] and low swing interconnect signaling techniques [32]. Non-full rail voltage levels can also be observed at the interfaces between different integrated circuits or circuit styles (as in SoC systems) operating at different voltage levels. When a CMOS circuit supplied by full rail power and ground supplies is driven by a low swing input signal, static power is dissipated as the transistors in both the pull-up and pull-down networks are simultaneously turned on. A CMOS inverter driven by a low swing signal is shown in Fig. 2.10. The second stage gate in Fig. 2.10 behaves as a voltage divider (rather than as an inverter), consuming static DC power and degrading the voltage swing at node<sub>2</sub>.



Fig. 2.10. Static DC current in a full voltage rail CMOS inverter driven by a low voltage swing signal.

In ICs with multiple supply voltages, energy efficient and full swing signal transfer among regions operating at different voltage levels requires specialized voltage interface circuits. Voltage interface circuits are discussed in Chapter 8.

# **Chapter 3**

# Supply and Threshold Voltage Scaling Techniques

Supply voltage scaling is an essential step in the technology scaling process. Two primary reasons for scaling the supply voltage are to maintain the power density of an integrated circuit below a limit dictated by available cost effective cooling techniques and to guarantee the long term reliability of the devices fabricated in a scaled semiconductor technology.

Provided that the supply voltage is not scaled together with the vertical and lateral dimensions of the devices, the electric fields across the terminals of the MOSFETs increase, degrading the reliability and changing the electrical characteristics of the devices. The electric fields between the source and drain and across the source-to-body and drain-to-body junctions must be maintained below certain levels to lessen any short-channel effects in a scaled CMOS technology. The electric field across the gate oxide must also be limited to maintain high carrier mobility in the channel region [52], [54] and to lower tunneling-based leakage currents through the scaled gate insulator [51], [52], [54]. The gate oxide leakage currents can significantly increase the static power dissipation while decreasing the transconductance and gate oxide failure time of the devices [40], [56], [58], [59].

Dynamic switching energy is the dominant component of the total energy consumed by an integrated circuit in current CMOS technologies. The dynamic switching energy is proportional to the square of the supply voltage in a full voltage swing CMOS circuit. Moreover, the leakage and short-circuit energy components also depend superlinearly on the supply voltage. Reducing the supply voltage, therefore, is an effective way to lower the power dissipation. The variation of the total power consumption of a 0.18 $\mu$m based CMOS ring oscillator with the supply voltage is shown in Fig. 3.1.



Fig. 3.1. Normalized power consumption versus supply voltage ( $V_{DD}$ ) of a 19 stage ring oscillator assuming a 0.18  $\mu$ m CMOS technology.

The propagation delay (high-to-low or low-to-high) through a CMOS gate can be approximated by [9], [91]

$$T_d \cong \frac{C_L V_{DD}}{I} = \frac{L_{eff} C_L V_{DD}}{W B (V_{DD} - V_t)^n}, \tag{3.1}$$

$$T_d \propto \frac{V_{DD}}{(V_{DD} - V_t)^n}, \tag{3.2}$$

where  $C_L$  is the load capacitance,  $V_{DD}$  is the supply voltage, I is the drain current in the saturation region, W is the effective transistor width, and  $L_{eff}$  is the effective transistor length. B and n are technology related parameters that determine the drain current characteristics of a deep submicrometer MOSFET operating in the saturation region [91]. The value of n typically varies between one and two depending upon the MOSFET fabrication technology. For a long-channel device, n is two. For a short-channel device, n is typically less than two due to velocity saturation.
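The sensitivity of the delay to the supply voltage in (3.2) can be examined with a few lines of Python. The sketch below evaluates the delay relative to a reference supply voltage for a long-channel exponent (n = 2) and for a velocity saturated exponent. The threshold voltage, the reference supply voltage, and the value n = 1.3 are assumptions chosen for illustration rather than fitted technology parameters.

```python
def normalized_delay(v_dd, v_t, n):
    """Delay of (3.2) up to a constant factor: V_DD / (V_DD - V_t)^n."""
    if v_dd <= v_t:
        raise ValueError("the model of (3.2) requires V_DD > V_t")
    return v_dd / (v_dd - v_t) ** n


def relative_delay(v_dd, v_ref, v_t, n):
    """Delay at v_dd divided by the delay at the reference supply v_ref."""
    return normalized_delay(v_dd, v_t, n) / normalized_delay(v_ref, v_t, n)


V_REF = 1.8  # assumed nominal supply voltage (V)
V_T = 0.4    # assumed threshold voltage (V)

for n in (2.0, 1.3):
    print(f"n = {n}")
    for v_dd in (1.8, 1.4, 1.0, 0.7):
        ratio = relative_delay(v_dd, V_REF, V_T, n)
        print(f"  V_DD = {v_dd:.1f} V: relative delay = {ratio:.2f}")
```

The delay penalty grows rapidly as $V_{DD}$ approaches $V_t$, and the penalty is more severe for larger n.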

Combining (2.11) and (3.2) and assuming a full voltage swing circuit, the relationship between the dynamic switching power consumption and the power supply and threshold voltages is

$$P_d \propto V_{DD} (V_{DD} - V_t)^n. \tag{3.3}$$

The relationship between the delay and supply voltage is nonlinear. As given by (3.2), the delay of a CMOS circuit increases with reduced supply voltage. The variation of the delay of a CMOS ring oscillator with supply voltage is shown in Fig. 3.2. Lowering the supply voltage (while maintaining the same threshold voltages) reduces both the energy consumed by the parasitic impedances (due to the lower amount of energy stored in the parasitic capacitors) and the maximum operating frequency. The reduction in power consumption by supply voltage scaling is, therefore, more than quadratic as given by (3.3) and as shown in Fig. 3.1.
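The intermediate step leading to (3.3) can be written out explicitly. The following sketch assumes that (2.11), which is not reproduced in this section, has the usual form of the dynamic switching power of a full swing circuit, $P_d \propto f C_L V_{DD}^2$, and that the maximum operating frequency f is inversely proportional to the delay of (3.2):

$$f \propto \frac{1}{T_d} \propto \frac{(V_{DD} - V_t)^n}{V_{DD}},$$

$$P_d \propto f V_{DD}^2 \propto \frac{(V_{DD} - V_t)^n}{V_{DD}} V_{DD}^2 = V_{DD} (V_{DD} - V_t)^n.$$

In the limiting case of $V_t \ll V_{DD}$, the expression reduces to $P_d \propto V_{DD}^{1+n}$, which is a stronger than quadratic dependence for any n greater than one. A nonzero threshold voltage makes the dependence steeper still, since $(V_{DD} - V_t)^n$ falls faster than $V_{DD}^n$ as the supply voltage is lowered.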

The primary reason for the reluctance to move to lower supply voltages has been the speed penalty. Until recently, the preferred value of supply voltage has typically been determined by device reliability requirements rather than by power dissipation concerns. Tradeoffs and priorities, however, have shifted as the power density of high performance integrated circuits approaches 100 W/cm<sup>2</sup> (see Fig. 1.8). Available low cost cooling solutions are ineffective at such high power densities. Moreover, satisfying the market demand for enhanced performance and functionality in portable applications has become increasingly challenging due to lagging improvements in battery technology and cooling solutions. There is, therefore, a necessity for optimizing high performance integrated circuits not only for higher speed and reliability but also for lower power dissipation. Supply voltage scaling is expected to continue into the foreseeable future, as scaling is the most effective technique for reducing power consumption in CMOS integrated circuits.



Fig. 3.2. Normalized delay versus supply voltage ( $V_{DD}$ ) of a 19 stage ring oscillator for a 0.18  $\mu m$  CMOS technology.

Several techniques have been proposed for exploiting the more than quadratic reduction in power consumption by lowering the supply voltage while compensating for the speed degradation when operating at a lower supply voltage. At the architectural level, an effective way to maintain circuit performance while lowering the supply voltage is to utilize parallel (or pipelined) architectures. As discussed in [36], employing parallel circuits (each parallel circuit has a similar function) permits the clock speed requirement per circuit block to be reduced in order to execute a specific task with a target latency. Parallel circuit blocks can operate at a lower supply voltage at reduced speed while achieving overall circuit throughput objectives [36]. The primary disadvantage of this technique, however, is the significant area and power overhead due to the parallel replication of the circuitry [29].
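The tradeoff offered by a parallel architecture can be estimated to first order with the delay model of (3.2). The sketch below assumes that N identical blocks each run at 1/N of the original clock frequency, so that the tolerable gate delay is N times longer, and that replication adds a fixed fractional capacitance overhead. The parameter values ($V_t$ = 0.4 volts, n = 1.5, 1.8 volts nominal, 15% overhead) and the function names are hypothetical.

```python
def delay(vdd, vt=0.4, n=1.5):
    """Delay from (3.2), up to a constant factor (assumes n >= 1)."""
    return vdd / (vdd - vt) ** n


def vdd_for_slowdown(slowdown, vdd_nom=1.8, vt=0.4, n=1.5, tol=1e-9):
    """Bisection for the supply voltage whose delay is slowdown x nominal."""
    target = slowdown * delay(vdd_nom, vt, n)
    lo, hi = vt + 1e-6, vdd_nom  # delay(lo) > target >= delay(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if delay(mid, vt, n) > target:
            lo = mid  # too slow at this voltage, search higher
        else:
            hi = mid
    return hi


def parallel_power_ratio(n_blocks, overhead=0.15, vdd_nom=1.8, vt=0.4, n=1.5):
    """Power of n_blocks parallel blocks relative to one block at vdd_nom.

    Each block switches at f / n_blocks, so the total switched capacitance
    per unit time is unchanged except for the assumed overhead.
    """
    vdd = vdd_for_slowdown(n_blocks, vdd_nom, vt, n)
    return (1.0 + overhead) * (vdd / vdd_nom) ** 2, vdd


ratio, vdd = parallel_power_ratio(2)
print(f"two parallel blocks: VDD = {vdd:.3f} V, power ratio = {ratio:.2f}")
```

The estimate ignores the leakage power of the replicated circuitry, which grows with the number of blocks.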

At the circuit level, an effective way to lower power consumption without degrading performance is to dynamically adjust the supply voltage as the workload varies with time. Dynamic voltage scaling techniques are discussed in Section 3.1. Another technique for minimizing the deleterious effects of supply voltage scaling is to lower the supply voltage of only those circuits along the non-critical delay paths while maintaining a higher supply voltage on the speed critical paths [28]. Multiple supply voltage circuit techniques that exploit differences in signal propagation delays along different delay paths by selectively scaling the local supply voltages are reviewed in Section 3.2.

The most widely employed technique for enhancing the performance of a circuit at a reduced supply voltage is to scale the threshold voltages. Lowering the threshold voltages enhances the gate overdrive ( $|V_{GS}|$ - $|V_t|$ ) of the transistors, thereby reducing the propagation delay of the circuits. The threshold voltage scaling technique is discussed in Section 3.3.

Threshold voltage scaling not only enhances the speed but also increases the subthreshold leakage current, short-channel effects, and die-to-die and within-die parameter variations. A promising circuit technique aimed at lowering deleterious side effects caused by supply and threshold voltage scaling is the use of multiple supplies and threshold voltages. Circuit techniques based on multiple supplies and threshold voltages are reviewed in Section 3.4.

Dynamic supply and threshold voltage scaling techniques combine the desirable characteristics of dynamic supply voltage scaling (in order to lower the power consumption) and dynamic threshold voltage scaling (in order to lower subthreshold leakage current and die-to-die and within-die variations of electrical characteristics). Dynamic supply and threshold voltage scaling techniques are discussed in Section 3.5. A summary of the various supply and threshold voltage scaling techniques presented in this chapter is provided in Section 3.6.

## 3.1. Dynamic Supply Voltage Scaling

The computational load in a microprocessor system varies with time [68]-[71]. Applications in a typical microprocessor system tend to have peak performance requirements followed by idle periods, as shown in Fig. 3.3. During active computation, the microprocessor performance required to execute a task varies as a function of the workload. An operation is a basic unit of computation [68]. Another commonly used basic unit of computation is an instruction. The utilization of a microprocessor can be evaluated in terms of the number of operations or the number of instructions required to complete the processing of a task within a specific time frame. A metric called throughput is often used as a measure of the utilization of a microprocessor system. The throughput is the number of operations (or the number of instructions) performed over a unit period of time [68],

$$\text{Throughput} = \frac{\text{Number of Operations}}{\text{Unit Time}}. \tag{3.4}$$

The throughput is typically described in terms of either millions of operations per second (MOPS) or millions of instructions per second (MIPS).

Computation-intensive and short-latency tasks (e.g., audio and video decompression, speech and image recognition) typically utilize the maximum throughput of a microprocessor [68]. Alternatively, low speed and long latency tasks (e.g., playing music, periodic system checks and backup) require only a fraction of the maximum throughput offered by a microprocessor. Executing the long latency tasks faster than the required throughput has no observable benefit for the user [68], [70]. Moreover, as shown in Fig. 3.3, there are frequent idle periods during which no active computation is made by a computer system (zero throughput requirement). Maintaining the full computational capacity of a processor during such idle periods wastes significant amounts of energy despite the zero throughput requirement. Particularly in portable applications such as cellular phones, the duration of these idle periods is typically significantly longer than the duration of the active periods [83]. Lowering the energy consumption of an integrated circuit during these long idle periods is therefore critical for extending the limited lifetime of a portable battery.



Fig. 3.3. Variation of the throughput required to execute certain tasks in a typical microprocessor system [68].

The dynamic voltage scaling (DVS) circuit technique exploits variations in the computational workload by dynamically modifying the supply voltage and clock frequency of a microprocessor system [68]-[70]. The primary objective of the DVS circuit technique is to provide high throughput during the execution of only the computation-intensive tasks while saving energy during the rest of the time by lowering the supply voltage and operating speed of a microprocessor.

The clock frequency of a DVS microprocessor is controlled by software. Only the operating system (OS) has the necessary information that characterizes all of the active tasks. The OS, therefore, controls the clock frequency by determining the minimum clock frequency required to complete a task within a specific time period.
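The frequency selection performed by the operating system can be sketched as follows. This is a simplified illustration, assuming that the work of a task is known in clock cycles and that the hardware offers a discrete set of clock frequencies. The function names and the frequency levels are hypothetical and are not taken from [68]-[70].

```python
def minimum_clock_hz(cycles, deadline_s):
    """Lowest clock frequency that completes `cycles` within the deadline."""
    if deadline_s <= 0:
        raise ValueError("deadline must be positive")
    return cycles / deadline_s


def select_clock_hz(cycles, deadline_s, available_hz):
    """Pick the slowest available frequency that still meets the deadline."""
    required = minimum_clock_hz(cycles, deadline_s)
    for f in sorted(available_hz):
        if f >= required:
            return f
    raise ValueError("deadline cannot be met at the maximum clock frequency")


# Hypothetical frequency levels offered by the hardware.
LEVELS_HZ = [100e6, 200e6, 400e6, 800e6]

# A task of 30 million cycles due in 100 ms needs at least 300 MHz.
print(select_clock_hz(30e6, 0.1, LEVELS_HZ))
# A light task of 5 million cycles due in 100 ms runs at the lowest level.
print(select_clock_hz(5e6, 0.1, LEVELS_HZ))
```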

Since the required clock frequency of a microprocessor varies, the supply voltage should also vary to minimize the energy consumption while guaranteeing the operation of the microprocessor circuitry at the revised clock frequency requested by the operating system. The software is not aware of the minimum supply voltage required for a microprocessor to operate at a desired clock frequency. Translating a desired clock frequency (requested by the operating system) to a particular minimum supply voltage at which the microprocessor circuitry can operate at the desired clock frequency (while dissipating minimum energy) is accomplished by the circuit hardware.

Using closed loop feedback circuitry as shown in Fig. 3.4, the supply voltage of a microprocessor can be dynamically adjusted [68]-[71]. The desired clock frequency (f<sub>DESIRED</sub>) to execute a specific task is passed to a register by the operating system. A replica of the most critical path in a microprocessor is employed to track the instantaneous clock frequency of a microprocessor at a specific supply voltage. A ring oscillator translates the supply voltage generated by a DVS DC-DC converter to a specific clock frequency (f<sub>CLOCK</sub>). This clock frequency is compared to the desired clock frequency, generating a digital frequency error signal (f<sub>ERROR</sub>). The loop filter, using this error function, generates the control signals for the drivers of the power transistors of the DC-DC converter to either modify or maintain the output voltage.

The minimum supply voltage required for the operation of a microprocessor at a desired clock frequency is, thereby, dynamically generated.
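The behavior of such a feedback loop can be illustrated with a simple discrete-time simulation. The sketch below is a behavioral model under stated assumptions and not the circuit of [70]: the ring oscillator frequency is taken as the inverse of the delay in (3.2), the loop filter is modeled as a plain integrator, and the DC-DC converter is assumed to follow the control value ideally. All parameter values and names are hypothetical.

```python
VT, N = 0.4, 1.5           # assumed device parameters
V_MIN, V_MAX = 0.6, 1.8    # assumed output range of the DC-DC converter (V)
F_MAX_MHZ = 1000.0         # assumed clock frequency at V_MAX


def ring_oscillator_mhz(vdd):
    """Replica frequency, proportional to (V_DD - V_t)^n / V_DD."""
    scale = F_MAX_MHZ * V_MAX / (V_MAX - VT) ** N
    return scale * (vdd - VT) ** N / vdd


def settle_supply(f_desired_mhz, vdd=V_MAX, gain=5e-4, steps=200):
    """Integrate the frequency error until the supply voltage settles."""
    for _ in range(steps):
        f_error = f_desired_mhz - ring_oscillator_mhz(vdd)
        vdd += gain * f_error                  # integrator acting as loop filter
        vdd = min(V_MAX, max(V_MIN, vdd))      # converter output limits
    return vdd


vdd = settle_supply(500.0)
print(f"VDD = {vdd:.3f} V gives f = {ring_oscillator_mhz(vdd):.1f} MHz")
```

The integrator gain is chosen small enough that the loop converges without overshoot for this model. A practical loop filter would also account for the dynamics of the converter.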



Fig. 3.4. Feedback loop architecture for a dynamic voltage scaling circuit [70].

The circuitry in a standard single supply voltage microprocessor system is typically designed to tolerate a maximum variation of approximately  $\pm$  10% of the supply voltage. Alternatively, the circuitry of a DVS microprocessor is designed to operate over a much wider range of supply voltages to maximize the energy efficiency while maintaining the operation of the microprocessor system during supply voltage transients. Static CMOS logic gates typically have a high tolerance to variations in the supply voltage. As the supply voltage is scaled, the delay of a static CMOS circuit scales proportionally [68], [70]. Dynamic CMOS circuits, custom arrays, latches, and analog circuits, however, cannot tolerate significant variations in the supply voltage. Modifications (such as replacing NMOS pass gates with CMOS pass gates, employing keeper transistors in dynamic gates, and avoiding high stacks of transistors) to standard cell libraries are often necessary in the design of a DVS microprocessor [68]-[71].

The DVS circuit technique can also be used to effectively reduce die-to-die variations of the electrical characteristics such as the clock frequency and power consumption. In standard CMOS circuits, frequency binning is the primary method to enhance yield [98]. The frequency binning method reduces the clock frequency of those dies that do not satisfy the target active power requirements, thereby exploiting the linear dependence of dynamic switching power on the switching frequency. The reduction in frequency required to lower the active mode power below the maximum acceptable limit is typically significant. This significant reduction in clock frequency pushes the frequency distribution to the lower frequency bins, causing a violation of the minimum acceptable clock frequency for a large number of dies. The enhancement in yield with a standard frequency binning technique is, therefore, limited. Alternatively, the dynamic voltage scaling circuit technique lowers both the supply voltage and clock frequency to reduce the active power. Since the dependence of the dynamic switching power on the supply voltage and frequency is cubic and the dependence of the leakage power on the supply voltage is superlinear, the dynamic voltage scaling technique can satisfy target active power constraints with more moderate scaling of the clock frequency. The dynamic voltage scaling technique, therefore, increases both the yield and the number of dies accepted in a higher frequency bin as compared to the standard frequency binning technique, subject to the constraints of total active power, burn-in leakage power, and standby leakage power [98].
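The advantage over frequency binning can be illustrated with a first-order estimate. As an idealization that is not taken from [98], assume that the supply voltage is scaled linearly with the clock frequency and that leakage power is neglected, so that $P \propto V_{DD}^2 f \propto f^3$. For a die that exceeds the active power limit by 25% (a hypothetical figure), the power must be reduced to 80% of the initial value:

$$\text{Frequency binning: } \frac{f'}{f} = \frac{P'}{P} = 0.80, \qquad \text{DVS: } \frac{f'}{f} = \left(\frac{P'}{P}\right)^{1/3} = 0.80^{1/3} \approx 0.93.$$

Under these assumptions, the clock frequency is lowered by approximately 7% with voltage scaling as compared to 20% with frequency binning alone.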

## 3.2. Multiple Supply Voltage CMOS

Current integrated circuits are typically designed to operate with a single supply voltage. The clock speed of a synchronous circuit is determined by the delay of the critical paths. A critical delay path between flip flops FF<sub>1</sub> and FF<sub>2</sub> in a single supply voltage synchronous circuit is shown in Fig. 3.5.

In a standard single supply voltage circuit, the value of the supply voltage is determined such that the target clock speed is achieved by the most critical (slowest) delay path. However, as the critical paths typically constitute only a small fraction of the total number of paths within an integrated circuit, a significant number of gates along the non-critical delay paths operate with excessive slack as shown in Fig. 3.5 (signals propagate faster than necessary and arrive early, generating a time gap between the arrival and utilization of the input signals). If a signal arrives earlier than is necessary to the circuitry at the end of a non-critical path, the performance of a circuit is not increased. Operating the gates along these non-critical delay paths at the same supply voltage level as the gates along the critical paths wastes energy.



Fig. 3.5. A single supply voltage circuit.

The multiple supply voltage circuit technique exploits these delay differences among the different signal propagation paths within an integrated circuit. The multiple supply voltage circuit technique selectively lowers the supply voltages of the gates on the non-critical delay paths while maintaining a higher supply voltage on the critical delay paths in order to satisfy a target clock frequency [28]. A dual supply voltage circuit in which the supply voltage of all of the gates along the non-critical delay paths is replaced by a lower supply voltage is shown in Fig. 3.6.



Fig. 3.6. A dual supply voltage circuit. The gates that operate at a lower supply voltage are shaded.

Scaling the supply voltage of all of the gates along a non-critical delay path, however, may not always be feasible due to local timing constraints. The slack of a delay path after supply voltage scaling must be significantly lower but still greater than zero so as to not degrade the overall performance or reliability (by creating a race condition) of an integrated circuit. A combination of high and low supply voltage gates can exist along a delay path if the delay requirements are not satisfied by scaling the supply voltages of all of the gates along a path. As discussed in Section 2.1.4, when a circuit supplied by a low supply voltage drives a CMOS circuit supplied by a higher supply voltage, static DC current and non-full-rail output voltage swing problems occur. Specialized voltage level converter circuits are required to interface the circuits operating at different supply voltages in a multiple supply voltage circuit [28], [32]. The power and area overhead of these voltage interface circuits must be included in the supply voltage optimization process. Circuit blocks with a lower supply voltage should, therefore, be chosen such that the number of voltage interface circuits and the total power (including the power overhead of the voltage level converters) are minimized while satisfying the timing constraints of all of the delay paths [28]. The clustered voltage scaling (CVS) technique, proposed in [28], minimizes the number of voltage level converters in a multiple supply voltage circuit. In the CVS technique shown in Fig. 3.7, the supply voltages are assigned such that no low supply voltage gate drives a high supply voltage gate.
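The structural constraint imposed by the CVS technique can be expressed as a simple check on a gate-level netlist. The sketch below is not the synthesis method of [28]. It only lists the connections at which a low supply voltage gate drives a high supply voltage gate, each of which would require a voltage level converter. The netlist representation, the gate names, and the function name are hypothetical.

```python
def level_converters_needed(fanout, supply):
    """Return (driver, load) pairs where a low-supply gate drives a high-supply gate.

    fanout: dict mapping each gate to the list of gates it drives.
    supply: dict mapping each gate to "H" (high supply) or "L" (low supply).
    """
    return [(driver, load)
            for driver, loads in fanout.items()
            for load in loads
            if supply[driver] == "L" and supply[load] == "H"]


# Hypothetical four-gate netlist: g1 drives g2 and g4, g2 drives g3.
FANOUT = {"g1": ["g2", "g4"], "g2": ["g3"], "g3": [], "g4": []}

# Clustered assignment: low supply gates only appear downstream of high ones.
clustered = {"g1": "H", "g2": "H", "g3": "L", "g4": "L"}
# Unclustered assignment: g2 (low) drives g3 (high).
scattered = {"g1": "H", "g2": "L", "g3": "H", "g4": "L"}

print(level_converters_needed(FANOUT, clustered))
print(level_converters_needed(FANOUT, scattered))
```

An assignment satisfies the CVS constraint within the combinational logic when the returned list is empty.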

A dual supply voltage media processor based on the clustered voltage scaling technique is presented in [28]. An automated synthesis method is proposed in [28] in which the total power dissipation is minimized without violating the timing constraints of the delay paths while scaling the supply voltage. A test circuit based on this method is fabricated in a 0.3 µm CMOS technology with a nominal supply voltage of 3.3 volts. An interesting observation reported in [28] is that an optimum lower supply voltage exists which minimizes the total power in a dual supply voltage circuit. As the supply voltage of the gates along the non-critical delay paths is reduced, not only the dynamic switching energy consumption per circuit block but also the number of circuit blocks supplied by this lower supply voltage is reduced (due to the timing constraints). There is, therefore, an optimum low supply voltage that minimizes the total power (reported as 1.9 volts in [28]). Within an optimum circuit configuration with the lowest power dissipation, the power supply for 76% of the cells is replaced with this lower supply voltage. A total of 5.8 million data paths are investigated. Fifteen thousand critical paths are identified, constituting only 0.3% of the total number of paths. It is reported in [28] that approximately 60% of the data paths have delays only half of the cycle time (a 50% slack time) in a standard single supply voltage processor. A 39% to 57% reduction in power is reported with the dual supply voltage scheme as compared to a standard single supply voltage circuit operating at a nominal supply voltage of 3.3 volts [28].
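A rough estimate of the dynamic power savings follows from the quadratic dependence of the switching energy on the supply voltage. The sketch below assumes that every cell contributes equally to the switching power and ignores the overhead of the voltage level converters as well as leakage. Both simplifications are assumptions made here for illustration and are not part of the analysis in [28].

```python
def dual_supply_power_ratio(low_fraction, vdd_low, vdd_high):
    """Dynamic power of a dual supply circuit relative to a single supply circuit.

    low_fraction: fraction of the switching power originally dissipated by
    the cells that are moved to the lower supply voltage.
    """
    if not 0.0 <= low_fraction <= 1.0:
        raise ValueError("low_fraction must lie between zero and one")
    return (1.0 - low_fraction) + low_fraction * (vdd_low / vdd_high) ** 2


# Values reported in [28]: 76% of the cells at 1.9 volts, 3.3 volts nominal.
ratio = dual_supply_power_ratio(0.76, 1.9, 3.3)
print(f"estimated power relative to a single supply: {ratio:.2f}")
```

With the reported values, the estimate corresponds to a reduction of roughly one half in dynamic power before level converter overhead, which is of the same order as the 39% to 57% range reported in [28].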



Fig. 3.7. A dual supply voltage circuit with the clustered voltage scaling technique [28]. The circuits operating at a lower supply voltage are shaded. VIC: voltage interface circuit.

## 3.3. Threshold Voltage Scaling

Lowering the supply voltage is an effective way to reduce power dissipation in CMOS integrated circuits. Lowering the supply voltage, however, degrades circuit speed due to reduced transistor currents. The variation of the delay of an inverter with supply voltage for different threshold voltages, based on a  $0.18~\mu m$  CMOS technology, is depicted in Fig. 3.8.

Reducing the threshold voltages for a fixed supply voltage enhances the circuit speed by increasing the gate overdrive ( $|V_{GS}|$ - $|V_t|$ ) of the transistors, as shown in Figs. 3.8 and 3.9. Reducing the threshold voltage permits the supply voltage to be scaled without degrading the speed. For example, as indicated with the delay line shown in Fig. 3.8, the supply voltage can be scaled to 0.8 volts from an initial voltage of 1.6 volts and the threshold voltage can be scaled to 0.1 volts from an initial voltage of 0.5 volts while maintaining the same delay characteristics. If the threshold voltage to supply voltage ratio ( $V_t/V_{DD}$ ) is maintained constant, the increase in delay due to scaling the supply voltage can be limited. The  $V_t/V_{DD}$  ratio should typically be maintained below 0.25 for reasonable performance in a scaled CMOS technology [67], [74], [82]. By scaling both the supply and threshold voltages, the power dissipation and propagation delay of a CMOS circuit can be simultaneously reduced.
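The effect of a constant $V_t/V_{DD}$ ratio on the delay can be seen directly from (3.2). As a short derivation, let $r = V_t/V_{DD}$ be fixed (the symbol $r$ is introduced here for illustration only). Substituting $V_t = r V_{DD}$ into (3.2),

$$T_d \propto \frac{V_{DD}}{(V_{DD} - r V_{DD})^n} = \frac{V_{DD}^{\,1-n}}{(1 - r)^n}.$$

For $n = 1$ the delay is independent of the supply voltage, and for $1 < n \le 2$ the delay grows only as $V_{DD}^{-(n-1)}$ as the supply voltage is reduced, rather than diverging as $V_{DD}$ approaches a fixed $V_t$. This result holds only within the range of validity of (3.2), assuming that B and n do not change with the operating point.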



Fig. 3.8. Variation of the delay of a CMOS inverter with supply voltage for different MOSFET threshold voltages assuming a 0.18 µm CMOS technology.

Although lowering the threshold voltage is effective in enhancing the speed, there are a number of issues that limit threshold voltage scaling in a new technology generation. Due to limitations in the maximum acceptable standby power and die-to-die and within-die variations of the electrical characteristics, scaling the threshold voltage typically lags scaling the supply voltage in each new technology generation. Therefore, despite also scaling the threshold voltages, the V<sub>t</sub>/V<sub>DD</sub> ratio typically increases with technology scaling, degrading the achievable gain in circuit performance in a scaled CMOS technology generation [74].



Fig. 3.9. Effect of threshold voltage scaling on the delay of a 19 stage ring oscillator for four different supply voltages assuming a 0.18 µm CMOS technology.

A primary limitation to threshold voltage scaling is exponentially increasing subthreshold leakage currents with reduced threshold voltages. Subthreshold leakage current is the primary source of energy dissipation in an idle CMOS circuit. Energy consumption caused by standby leakage in portable devices is a significant concern since subthreshold leakage current can greatly reduce the lifetime of a battery. In a high performance integrated circuit, leakage currents in both the active and standby modes of operation are a serious concern, due to the aggressive scaling of the threshold voltages. Provided that current technology scaling trends continue, by 2010 more than half of the total active mode power consumption in high performance integrated circuits is expected to be due to subthreshold leakage current [5].
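The exponential dependence of the subthreshold leakage current on the threshold voltage can be quantified with the standard first-order relation $I_{off} \propto 10^{-V_t/S}$, where S is the subthreshold swing. The sketch below assumes S = 100 mV/decade, a round value chosen for illustration. The actual swing depends on the technology and the temperature.

```python
def leakage_increase(delta_vt, swing=0.1):
    """Factor by which the off current grows when V_t is lowered by delta_vt.

    delta_vt: reduction of the threshold voltage in volts.
    swing: subthreshold swing in volts per decade (assumed value).
    """
    if swing <= 0:
        raise ValueError("subthreshold swing must be positive")
    return 10.0 ** (delta_vt / swing)


for delta_mv in (50, 100, 200, 400):
    factor = leakage_increase(delta_mv / 1000.0)
    print(f"V_t lowered by {delta_mv} mV: leakage x{factor:.0f}")
```

Under this assumption, every 100 mV reduction in the threshold voltage increases the subthreshold leakage current by one order of magnitude.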

Another significant issue with threshold voltage scaling is the increasing effect of die-to-die and within-die parameter variations on the speed and power dissipation characteristics. Traditionally, the focus of process engineers has been on controlling die-to-die parameter variations while within-die parameter variations have been somewhat neglected. Die-to-die parameter variations are caused by lot-to-lot and wafer-to-wafer differences in the processing temperature, wafer polishing, wafer placement, and the properties of the equipment used in the lithography process. Another source of die-to-die parameter variations is within-wafer differences primarily caused by aberrations in the stepper lens [64]. As the gate lengths of current semiconductor devices are reduced below the wavelength of light used in optical lithography (currently ranging from 193 nm to 248 nm), within-die parameter variations have also become a significant source of performance variation in CMOS circuits. Die-to-die and within-die parameter variations such as variations in the critical dimensions (e.g., gate length, gate oxide thickness, and junction depletion width) are difficult to control and typically do not scale. Die-to-die and within-die fluctuations of the critical dimensions, therefore, effectively increase with technology scaling [63], [72].

Alternatively, as discussed in Section 2.2.1.1, the sensitivity of the threshold voltage to variations in the critical dimensions is greater due to increasing short-channel effects as the gate length is reduced with technology scaling. The doping concentration in the channel area is typically reduced to lower the threshold voltage of a MOSFET in a scaled CMOS technology. As shown in Fig. 3.10, reducing the doping concentration in the channel area further increases short-channel and drain induced barrier lowering effects. The sensitivity of the threshold voltage to variations in the critical dimensions, therefore, increases as the threshold voltage is scaled. Die-to-die and within-die fluctuations of the threshold voltages from a nominal target value increase with technology scaling.



Fig. 3.10. Effect of threshold voltage scaling on short-channel effects in an NMOS transistor. (a) A high- $V_t$  short-channel MOSFET. (b) A low- $V_t$  short-channel MOSFET.  $N_A$ : acceptor concentration in the channel area ( $N_{A2} < N_{A1}$ ).

Process parameter variations cause integrated circuits to exhibit different clock frequency and power dissipation characteristics. The electrical characteristics of a CMOS circuit fabricated in a deep submicrometer process technology become increasingly non-deterministic due to the enhanced sensitivity of the devices to parameter variations. The variation in the performance characteristics increases with greater fluctuations in the threshold voltage. The number of dies that satisfy the minimum acceptable clock speed and the maximum tolerable power dissipation is reduced with scaled threshold voltages and minimum transistor dimensions, degrading the overall yield. The increasing cost of fabricating deep submicrometer integrated circuits is, therefore, further aggravated by scaling the threshold voltages.

Several threshold voltage scaling techniques have been proposed that lower the effect of threshold voltage scaling on active and standby mode leakage power and die-to-die and within-die parameter variations in CMOS circuits. The body bias circuit technique dynamically changes the threshold voltage of the transistors by varying the voltage of the body terminal depending upon the dynamically changing power and speed requirements during circuit operation. The body bias circuit technique is discussed in Section 3.3.1. The multiple threshold voltage CMOS circuit technique employs transistors with different threshold voltages within the same circuit. A version of the multiple threshold voltage circuit technique reduces standby leakage power by employing high threshold voltage transistors between the power supply and ground terminals and the low threshold voltage circuitry. The high threshold voltage switches are cut off in the standby mode to suppress the high subthreshold leakage current characteristics of the low threshold voltage circuitry. An alternative dual threshold voltage circuit technique reduces the subthreshold leakage power by selectively employing low threshold voltage transistors on the critical delay paths and high threshold voltage transistors on the non-critical delay paths. Different multiple threshold voltage CMOS circuit techniques are reviewed in Section 3.3.2.

### 3.3.1. Body Bias Techniques

The exponentially increasing subthreshold leakage current and die-to-die and within-die threshold voltage variations determine the lowest acceptable (or achievable) threshold voltages for a specific deep submicrometer technology generation. The threshold voltage is typically adjusted during the fabrication process by varying the doping concentration in the channel area (see Fig. 3.10). Alternatively, the body bias circuit technique utilizes the body terminal to dynamically modify the threshold voltage of a transistor during circuit operation. Depending upon the polarity of the voltage difference between the source and body terminals (V<sub>SB</sub>), the threshold voltage can be either increased or decreased as compared to a zero body biased transistor.

The threshold voltage is increased when the source-to-substrate p-n junction of a MOSFET is reverse biased. The reverse body bias circuit technique is described in Section 3.3.1.1. The threshold voltage of a MOSFET can also be reduced by forward biasing the source-to-substrate p-n junction. The forward body bias circuit technique is presented in Section 3.3.1.2. The bidirectional body bias circuit technique (providing both reverse and forward body bias voltages) is described in Section 3.3.1.3.

#### 3.3.1.1. Reverse Body Bias

The reverse body bias technique increases the threshold voltage of a MOSFET by applying a negative voltage across the source-to-substrate p-n junction, as shown in Fig. 3.11. The variation of the charge distribution in the depletion region and inversion layer of a MOSFET under zero body bias and reverse body bias conditions is illustrated in Fig. 3.12. In a MOSFET, the gate charge, the insulator, the mobile charges in the channel area, and the immobile ions in the depletion region form a capacitor (the MOS capacitor). The positive charge on the gate is balanced by the sum of the electronic charge in the inversion layer and the negative ionic charge in the depletion region. When a MOSFET is reverse body biased, the width of the depletion region beneath the gate increases as shown in Fig. 3.12b. Increasing depletion width corresponds to an increase in the ionic charge in the semiconductor plate of the MOS capacitor. In order to maintain the charge balance, the mobile charge (number of electrons) in the inversion layer decreases, as depicted in Fig. 3.12b. As the number of mobile charges in the inversion layer is reduced in a reverse body biased MOSFET, the gate voltage needs to be increased to achieve a similar level of inversion as compared to a zero body biased MOSFET. The threshold voltage of a reverse body biased MOSFET, therefore, increases.
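The dependence of the threshold voltage on the body bias is commonly approximated by the first-order long-channel body effect expression $V_t = V_{t0} + \gamma(\sqrt{2\phi_F + V_{SB}} - \sqrt{2\phi_F})$. The sketch below evaluates this expression for an NMOS transistor, where a positive $V_{SB}$ corresponds to a reverse body bias. The values of $V_{t0}$, $\gamma$, and $2\phi_F$ are assumed for illustration, and the expression does not capture the weaker body effect of short-channel devices.

```python
import math


def threshold_voltage(vsb, vt0=0.4, gamma=0.4, two_phi_f=0.7):
    """First-order body effect for an NMOS transistor.

    vsb: source-to-body voltage in volts (positive for reverse body bias).
    vt0: zero body bias threshold voltage in volts (assumed).
    gamma: body effect coefficient in V^0.5 (assumed).
    two_phi_f: twice the Fermi potential in volts (assumed).
    """
    if two_phi_f + vsb < 0.0:
        raise ValueError("forward bias exceeds the range of the model")
    return vt0 + gamma * (math.sqrt(two_phi_f + vsb) - math.sqrt(two_phi_f))


for vsb in (-0.3, 0.0, 0.5, 1.0):
    print(f"V_SB = {vsb:+.1f} V  ->  V_t = {threshold_voltage(vsb):.3f} V")
```

The square root dependence indicates diminishing returns: each additional increment of reverse body bias raises the threshold voltage by a smaller amount.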

The reverse body bias technique can be used during standby and burn-in modes to increase the threshold voltages of all of the transistors in an integrated circuit, thereby reducing the subthreshold leakage current. The standby mode is the mode during which a circuit is idle, while the burn-in mode is the mode during which standard stress tests are applied to an integrated circuit under elevated temperature and supply voltage conditions [72], [74], [75], [79]. Alternatively, the reverse body bias technique can be applied to the idle portions of an integrated circuit to reduce the active leakage power without degrading speed [76]. A significant reduction of up to ten thousand times in leakage power consumption is reported in [72] by applying a reverse body bias  $(1.2V_{DD})$  to all of the transistors during the idle mode in a discrete cosine transform processor fabricated in a 0.3  $\mu$ m CMOS technology.



Fig. 3.11. Reverse body bias circuit technique. (a) A reverse body biased NMOS transistor. (b) A reverse body biased PMOS transistor.



Fig. 3.12. Effect of reverse body bias on the depletion region and inversion layer charge in a MOSFET. (a) A zero body biased NMOS transistor. (b) A reverse body biased NMOS transistor.  $W_{D1} < W_{D2}$ .  $W_{I1} > W_{I2}$ .

Although increasing the reverse body bias voltage across the source-to-substrate p-n junction of a MOSFET increases the threshold voltage, thereby reducing the subthreshold leakage current, a reverse body bias also increases the tunneling leakage current at the reverse biased source-to-body and drain-to-body p-n junctions. The reverse biased junction leakage current in a MOSFET is composed of three primary components. The surface band-to-band tunneling current (also known as gate induced drain leakage) is the dominant junction leakage current component at zero body bias and low junction temperature conditions. At high junction temperatures and zero body bias, the junction current due to the thermal emission of carriers is a significant leakage current component. When a reverse body bias is applied to a MOSFET at room temperature, junction leakage due to thermal emission is typically negligible as compared to the junction current due to band-to-band tunneling [75]. The junction band-to-band tunneling leakage current is dominated by gate induced drain leakage (GIDL) at low reverse body bias voltages. The band-to-band tunneling current in the bulk is the dominant component of the junction leakage current at high reverse body bias voltages (typically above 0.5 volts) [75], [76].



Fig. 3.13. Variation of the total standby power of a microprocessor test circuit as a function of reverse body bias voltage [75].

As the reverse body bias voltage is increased, both the surface and bulk band-to-band tunneling current components increase while the subthreshold leakage current decreases [74]-[76]. There is, therefore, an optimum reverse body bias voltage (specific to a process technology) that minimizes the total leakage power consumption [75]. The variation of the total standby power consumption of a test circuit for various body bias voltages is shown in Fig. 3.13 [75]. The competing subthreshold and band-to-band tunneling leakage current mechanisms at increasing reverse body bias voltages are also illustrated. As shown in Fig. 3.13, the total leakage power does not monotonically decrease with increasing reverse body bias voltage, due to increasing band-to-band tunneling current.
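The existence of an optimum reverse body bias voltage can be illustrated with a two-term leakage model. The sketch below assumes that the subthreshold leakage current decays exponentially and that the band-to-band tunneling current grows exponentially with the reverse body bias voltage. The exponential forms and all coefficients are hypothetical and are not fitted to the data of [75].

```python
import math


def total_leakage(vrb, i_sub0=1.0, a=6.0, i_btbt0=0.01, b=4.0):
    """Normalized total leakage at a reverse body bias of vrb volts.

    i_sub0, a: subthreshold leakage at zero bias and its decay rate (1/V).
    i_btbt0, b: tunneling leakage at zero bias and its growth rate (1/V).
    """
    subthreshold = i_sub0 * math.exp(-a * vrb)
    tunneling = i_btbt0 * math.exp(b * vrb)
    return subthreshold + tunneling


def optimum_reverse_bias(v_max=1.5, step=0.001):
    """Grid search for the bias voltage that minimizes the total leakage."""
    candidates = [i * step for i in range(int(round(v_max / step)) + 1)]
    return min(candidates, key=total_leakage)


v_opt = optimum_reverse_bias()
print(f"optimum reverse body bias: {v_opt:.3f} V")
print(f"leakage relative to zero bias: {total_leakage(v_opt) / total_leakage(0.0):.3f}")
```

For this model the optimum also has the closed form $\ln(a I_{sub0} / (b I_{btbt0})) / (a + b)$, which shows that a higher tunneling current moves the optimum toward a lower reverse body bias voltage.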



Fig. 3.14. Block diagram of a speed adaptive body bias circuit [79].

The reverse body bias technique has also been shown to be effective in reducing variations in the speed and power characteristics of integrated circuits due to fluctuations in the supply voltage, temperature, and die-to-die process parameters [72], [73], [79], [80]. To compensate for the unpredictable variations of these circuit parameters, an adaptive body bias control scheme, shown in Fig. 3.14, can be used. This adaptive reverse body bias circuit dynamically varies the body bias voltages depending upon local speed and power requirements. A feedback circuit integrated onto the same die as the integrated circuit tracks changes in speed and power dissipation caused by variations in temperature, supply voltage, and/or process parameters. By matching an external reference signal to the delay or power information provided by a replica of a critical path, the necessary body bias voltages can be dynamically generated to adaptively compensate for variations in the circuit parameters. The external reference signal can either be a speed reference such as a clock signal (as shown in Fig. 3.14) [73] or a power reference such as a target leakage current [72].
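The comparison performed by a speed adaptive feedback loop can be sketched behaviorally as follows. This is not the circuit of [73] or [79]; the linear delay model `replica_delay`, its coefficients, the bias step, and the bias limit are all hypothetical. The sketch only shows the decision rule: the reverse body bias is raised (lowering leakage) for as long as the replica of the critical path still meets the reference clock period.

```python
def replica_delay(v_rbb, d0=1.10, k=0.15):
    """Hypothetical replica-path delay (ns) as a function of reverse body bias (V)."""
    return d0 + k * v_rbb


def settle_reverse_bias(t_ref, step=0.05, v_max=1.0):
    """Return the largest reverse body bias (a multiple of step, at most v_max)
    for which the replica delay still meets the reference period t_ref (ns).

    Returns 0.0 when the die cannot meet t_ref even at zero body bias.
    """
    n_max = int(round(v_max / step))
    n = 0
    # Take one more bias step only if the next setting still meets timing.
    while n < n_max and replica_delay((n + 1) * step) <= t_ref:
        n += 1
    return n * step
```

A fast die (large slack against `t_ref`) settles at a higher reverse bias than a slow die, which is the mechanism by which die-to-die delay variations are narrowed.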



Fig. 3.15. Reduced die-to-die delay variations by applying the speed adaptive reverse body bias circuit technique to test circuits fabricated in a 0.25  $\mu$ m CMOS technology [73]. (a) Delay distribution of standard CMOS circuits with zero body bias. (b) Reduced delay distribution with adaptive body bias. (c) Enhanced worst case speed by further scaling the threshold voltages with the adaptive body bias circuit technique.

This speed adaptive reverse body bias technique reduces die-to-die delay variations from 45% to 30%, as shown in Fig. 3.15 [73]. Since die-to-die frequency variations are reduced, the adaptive reverse body bias technique provides an opportunity to further scale the threshold voltages without violating limitations in leakage power. The worst case clock frequency is shown to increase by up to 37% by applying this speed adaptive reverse body bias technique [73].



Fig. 3.16. Body effect degradation due to channel length scaling. (a) A long channel MOSFET. (b) A short-channel MOSFET.

The effectiveness of the reverse body bias technique in lowering the subthreshold leakage current is reduced with technology scaling due to a weaker body effect [63], [74]-[78]. As the channel length is reduced, the body effect degrades due to increasing short-channel effects as illustrated in Fig. 3.16. Not only the gate terminal but also the body terminal loses some control of the charge distribution in the channel area in short-channel MOSFETs [63].

Reverse body biasing a MOSFET aggravates the short-channel effects by increasing the width of the junction depletion region. Moreover, in a circuit that is reverse body biased in the standby mode, the zero body bias threshold voltages are typically designed to be low to enhance the speed of the circuit when operating in the active mode. Lowering the doping concentration in the channel area to reduce the zero body bias threshold voltage further degrades the body effect (see Fig. 3.10) [63], [74]-[76]. The reverse body bias technique, therefore, becomes less effective in controlling the threshold voltage at reduced channel lengths and threshold voltages. Moreover, the optimum reverse body bias voltage that minimizes the leakage current decreases with technology scaling due to increased band-to-band tunneling current [74]. Alternatively, the reverse body bias voltage necessary to achieve a target threshold voltage variation ( $\Delta V_t$ ) increases with technology scaling due to the reduced body effect. The higher the subthreshold leakage current becomes with technology scaling, the less effective the reverse body bias technique is in lowering this leakage current.

Another significant disadvantage of the reverse body bias technique is that variations in the leakage current due to parameter variations increase with higher reverse body bias voltages [76]. The threshold voltage becomes more sensitive to parameter variations due to increasing short-channel and drain induced barrier lowering effects as the reverse body bias voltage is increased. The effect of the reverse body bias on short-channel effects and threshold voltage roll-off is shown in Fig. 3.17 [63]. As illustrated in Fig. 3.17, low threshold voltage devices are more sensitive to variations in the critical dimensions. Threshold voltage roll-off further increases with higher reverse body bias voltages. Increasing drain induced barrier lowering due to the reverse body bias circuit technique is illustrated in Fig. 3.18.



Fig. 3.17. Increasing short-channel effects and threshold voltage roll-off with reverse body bias (RBB) for low- $V_t$  and high- $V_t$  MOSFETs for a 0.25  $\mu$ m CMOS technology [63]. NBB: no body bias, RBB: reverse body bias.



Fig. 3.18. Effect of the reverse body bias circuit technique on drain-induced barrier-lowering ( $\Delta V_t / \Delta V_{DS}$ ) for a 0.18  $\mu$ m CMOS technology. The threshold voltage ( $V_t$ ) is the gate-to-source voltage at which the drain current is equal to 1  $\mu$ A /  $\mu$ m.

#### 3.3.1.2. Forward Body Bias

An alternative body bias scheme is the forward body bias technique. The threshold voltage of a MOSFET can be reduced by applying a positive voltage across the source-to-substrate p-n junction, as shown in Fig. 3.19. The variation of the charge distribution in the depletion region and inversion layer of a forward body biased MOSFET as compared to a zero body biased MOSFET is illustrated in Fig. 3.20. When a MOSFET is forward body biased, the width of the depletion region beneath the gate decreases as shown in Fig. 3.20b. A reduced depletion width corresponds to a decrease in the ionic charge in the semiconductor plate of the MOS capacitor. In order to maintain the charge balance, the mobile charge (number of electrons) in the inversion layer increases, as depicted in Fig. 3.20b. As the number of mobile charges in the inversion layer is increased, the gate voltage needed to achieve the same level of inversion as in a zero body biased MOSFET is reduced. The threshold voltage of a forward body biased MOSFET, hence, decreases.
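The direction and rough magnitude of the threshold voltage shift can be illustrated with the standard long-channel body effect expression, $V_t = V_{t0} + \gamma(\sqrt{2\phi_F + V_{SB}} - \sqrt{2\phi_F})$, where a negative source-to-body voltage $V_{SB}$ corresponds to a forward body bias on an NMOS transistor. The sketch below is a first-order illustration only: it neglects short-channel effects, and the values assumed for `v_t0`, `gamma`, and `phi_f` are hypothetical rather than taken from any technology discussed in this chapter.

```python
import math


def threshold_voltage(v_sb, v_t0=0.45, gamma=0.4, phi_f=0.4):
    """Long-channel NMOS threshold voltage (V) for a source-to-body voltage v_sb (V).

    v_sb > 0: reverse body bias, v_sb < 0: forward body bias.
    The parameter values are hypothetical and serve only as an illustration.
    """
    surface = 2.0 * phi_f
    if surface + v_sb < 0.0:
        raise ValueError("forward bias exceeds the range of this simple model")
    return v_t0 + gamma * (math.sqrt(surface + v_sb) - math.sqrt(surface))
```

With these assumed values, a 0.4 volt forward body bias lowers the threshold voltage by more than a 0.4 volt reverse body bias raises it, a consequence of the square root dependence on $2\phi_F + V_{SB}$.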



Fig. 3.19. Forward body bias circuit technique. (a) A forward body biased NMOS transistor. (b) A forward body biased PMOS transistor.



Fig. 3.20. Effect of forward body bias on the depletion region and inversion layer charge in a MOSFET. (a) A zero body biased NMOS transistor. (b) A forward body biased NMOS transistor.  $W_{D1} > W_{D2}$ .  $W_{I1} < W_{I2}$ .

As discussed in Section 3.3.1.1, for the purpose of reducing standby leakage current, the reverse body bias technique employs low threshold voltage transistors (under zero body bias conditions) to achieve a target circuit performance during the active mode of operation. The threshold voltages of these transistors are increased during the standby and burn-in modes by applying a reverse body bias, thereby reducing the subthreshold leakage current. Alternatively, the forward body bias technique employs high threshold voltage transistors (under zero body bias conditions) to maintain the standby leakage current below a target limit. The threshold voltages of these transistors are reduced during the active mode by applying a forward body bias to achieve a target circuit speed. The forward body bias is removed (by applying either a zero body bias or a reverse body bias) during the standby and burn-in modes to increase the threshold voltages, thereby reducing the subthreshold leakage current [74], [77], [78].

Similar to the reverse body bias technique, the effectiveness of the forward body bias technique is reduced with technology scaling due to the degradation of the body effect with increased short-channel effects at smaller channel lengths [74], [79]. However, unlike a reverse body biased transistor, the short-channel effects of a forward body biased transistor are lower as compared to a zero body biased transistor. As shown in Fig. 3.21, when a forward body bias is applied to a MOSFET, the depletion width of the source-to-substrate and drain-to-substrate p-n junctions is reduced. Increasing the forward body bias, therefore, reduces short-channel and drain induced barrier lowering effects while enhancing the body effect [74], [77], [78]. Moreover, contrary to the reverse body bias circuit technique, the zero body bias threshold voltages of a forward body biased circuit are typically higher, further enhancing the body effect and lowering the short-channel and drain induced barrier lowering effects (due to the higher doping concentration in the channel area of a high threshold voltage transistor, as shown in Fig. 3.10) [74], [77], [78]. The forward body bias technique, therefore, is more effective as compared to the reverse body bias technique with technology scaling. The forward body bias technique is expected to become more common as compared to the reverse body bias technique in future nanometer CMOS technology generations [77]-[79].



Fig. 3.21. Effect of forward body bias on short-channel effects in an NMOS transistor. FBB: forward body bias ( $V_{Body} > 0$ ). ZBB: zero body bias ( $V_{Body} = 0$ ).

The maximum forward body bias voltage applicable to a MOSFET is limited by diode currents in the forward biased source-to-body and drain-to-body p-n junctions. The junction diode currents increase the active leakage power in a forward body biased circuit. The voltage swing at an output node can be degraded due to these junction diode currents if the forward body bias voltage is increased to effectively turn on the body diodes [78]. Moreover, as shown in Fig. 3.22, the diode currents oppose the transition of the voltage state of a node, degrading the effective switching current and therefore the propagation delay. Another side effect of the forward body bias technique is the increased source-to-body and drain-to-body junction capacitances (C<sub>J1</sub> and C<sub>J2</sub> in Fig. 3.22) with higher forward body bias voltages. These larger junction capacitances increase the active mode switching power and can become significant at high forward body bias voltages, degrading the propagation delay.

The variation of the propagation delay and active mode energy with the body bias voltage, for a 101 stage ring oscillator fabricated in a 0.18  $\mu$ m CMOS technology, is illustrated in Fig. 3.23 [77]. The energy-delay product of this ring oscillator is shown in Fig. 3.24. The diode current from this test circuit is 3 nA/ $\mu$ m for a forward body bias voltage of 0.6 volts. This diode leakage current is significantly smaller as compared to the on state drain current ( $I_{DSAT} \approx 0.1$  mA/ $\mu$ m). Due to the significant enhancement of the drain current by the forward body bias circuit technique (e.g., the drain current increases by 40% for a 0.6 volt forward body bias), the propagation delay is reduced with increasing forward body bias voltage despite the increasing diode currents and junction capacitances.



Fig. 3.22. Schematic representation of a forward body biased CMOS circuit.  $I_{DIODE1}$ : source-to-body junction diode current,  $I_{DIODE2}$ : drain-to-body junction diode current,  $C_{J1}$ : source-to-body junction capacitance, and  $C_{J2}$ : drain-to-body junction capacitance.

As shown in Fig. 3.23, the active mode energy increases approximately linearly up to a forward body bias voltage of 0.4 volts due to the increasing junction capacitances. As the forward body bias voltage is increased beyond 0.4 volts, the exponentially increasing junction diode currents significantly increase the active mode energy. Similarly, as shown in Fig. 3.24, the energy-delay product is reduced with increasing forward body bias up to a bias voltage of 0.4 volts. As the forward body bias voltage is increased beyond 0.4 volts, the energy-delay product increases due to the significantly higher diode currents and junction capacitances. Similar optimum forward body bias voltages in the range of 0.4 volts to 0.6 volts have also been reported in [74] and [78] to maximize the clock frequency or minimize the energy-delay product.



Fig. 3.23. Variation of the propagation delay and energy consumption of a 101 stage ring oscillator with body bias voltage based on a 0.18 μm CMOS technology [77].

An interesting observation reported in [74], [77], and [78] is that the speed enhancement and the reduction in the energy-delay product achieved by the forward body bias technique increase with supply voltage scaling. As reported in [74], by applying a forward body bias voltage of 0.6 volts, the oscillation frequency of a ring oscillator fabricated in a 100 nm CMOS technology is improved by 30% at a supply voltage of 1.5 volts. Under the same body bias conditions, the speed enhancement increases to 45% and 150% as the supply voltage is scaled to 1.2 volts and 0.8 volts, respectively. The effectiveness of the forward body bias technique, therefore, increases provided that the supply voltage is scaled more aggressively than the threshold voltages, which is the likely trend in technology scaling (increasing V<sub>t</sub>/V<sub>DD</sub> ratio due to the constraints imposed by higher standby leakage power and large manufacturing induced variations in V<sub>t</sub>). The junction capacitances and switching energy of this ring oscillator increase by 10% for a forward body bias voltage of 0.6 volts (independent of the supply voltage) [74].
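The trend of a growing speed benefit at lower supply voltages can be reproduced qualitatively with the alpha-power law, in which the gate delay is proportional to $V_{DD}/(V_{DD} - V_t)^{\alpha}$. The sketch below is an illustration under stated assumptions and is not intended to reproduce the measured values of [74]: the zero body bias threshold voltage `v_t_zbb`, the threshold voltage reduction `delta_v_t` produced by the forward body bias, and the exponent `alpha` are all hypothetical.

```python
def frequency_gain(v_dd, v_t_zbb=0.35, delta_v_t=0.10, alpha=1.3):
    """Fractional frequency gain from lowering V_t by delta_v_t at a fixed v_dd.

    Uses the alpha-power law: frequency is proportional to (V_DD - V_t)**alpha / V_DD.
    V_DD cancels in the ratio because both cases share the same supply voltage.
    All default parameter values are hypothetical.
    """
    drive_zbb = (v_dd - v_t_zbb) ** alpha
    drive_fbb = (v_dd - (v_t_zbb - delta_v_t)) ** alpha
    return drive_fbb / drive_zbb - 1.0


# Evaluate the gain at the three supply voltages discussed in the text.
gains = {v_dd: frequency_gain(v_dd) for v_dd in (1.5, 1.2, 0.8)}
```

Because the same threshold voltage reduction is a larger fraction of the gate overdrive $V_{DD} - V_t$ at a lower supply voltage, the computed gain rises monotonically as the supply voltage is scaled from 1.5 volts to 0.8 volts.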



Fig. 3.24. Variation of the energy-delay product of a 101 stage ring oscillator with body bias voltage based on a 0.18 μm CMOS technology [77].

The forward body bias technique can also be used to reduce the active mode power consumption [74], [77], [78]. By forward biasing the substrate, a higher clock frequency can be achieved at a lower supply voltage. Lowering the supply voltage, due to the quadratic dependence of the switching energy on the supply voltage, significantly reduces the active mode power with moderate forward body bias voltages. As shown in [74], the active power consumption of a ring oscillator circuit (fabricated in a 100 nm CMOS technology) is reduced by approximately 40% by reducing the supply voltage from a nominal value of 1.1 volts to 0.8 volts. The same speed as a standard zero body biased circuit is maintained by applying a forward body bias voltage of 0.6 volts. The power savings caused by reducing the supply voltage outweigh the power overhead due to the increasing diode currents and junction capacitance.

Similarly, a microprocessor test circuit fabricated in a 0.15 µm CMOS technology operating at a nominal supply voltage of 1.2 volts is reported in [78]. This test circuit operates at a clock frequency of 1 GHz under standard zero body bias conditions. It is shown that the supply voltage can be scaled to 1.1 volts without degrading the clock frequency by applying a forward body bias of 0.5 volts to all of the transistors within this integrated circuit. For this body bias condition, the active mode leakage current increases by approximately a hundred times. Similarly, the total switched capacitance increases by 10% [78]. The total active mode power dissipation (including the energy overhead due to the increasing active leakage current and junction capacitance) of this microprocessor is reduced by approximately 8%.
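The two supply voltage scaling examples above are consistent with a simple first-order estimate based on the $C V_{DD}^2 f$ dependence of the dynamic switching power. The sketch below assumes a constant clock frequency and switching activity, applies the 10% increase in switched capacitance reported for a forward body biased circuit, and neglects the junction diode and leakage currents. It is a back-of-the-envelope check, not the analysis performed in [74] or [78].

```python
def relative_switching_power(v_dd_new, v_dd_old, cap_increase=0.0):
    """Ratio of new to old dynamic power at constant frequency and activity.

    Dynamic power is taken as proportional to C * V_DD**2 * f; cap_increase is the
    fractional growth of the switched capacitance (0.10 for a 10% increase).
    """
    return (1.0 + cap_increase) * (v_dd_new / v_dd_old) ** 2


# Ring oscillator: 1.1 V -> 0.8 V with a 10% larger switched capacitance.
ring_saving = 1.0 - relative_switching_power(0.8, 1.1, cap_increase=0.10)

# Microprocessor: 1.2 V -> 1.1 V with a 10% larger switched capacitance.
cpu_saving = 1.0 - relative_switching_power(1.1, 1.2, cap_increase=0.10)
```

Under these assumptions the estimated savings are approximately 42% for the ring oscillator and approximately 8% for the microprocessor, in line with the reported reductions of approximately 40% and 8%, respectively.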

#### 3.3.1.3. Bidirectional Body Bias

As described in [74], before the 0.13 µm technology node, a single static threshold voltage (zero body bias) has been standard for satisfying both speed and standby power requirements. However, due to the reduced circuit speed with lower supply voltages and the increased subthreshold and gate oxide leakage currents at scaled technologies, a single threshold voltage design space that satisfies both the speed and power requirements (for a single supply voltage system) is unlikely after the 0.13 µm technology generation. To scale the threshold voltages together with the supply voltages, some form of body bias is necessary. Either a reverse body bias or a forward body bias circuit technique will, therefore, be required to maintain the speed enhancements within a reasonable power budget below the 0.13 µm technology generation [74]. As the effectiveness of the reverse body bias circuit technique diminishes with technology scaling, a reverse-body-bias-only circuit technique will not simultaneously satisfy the speed and power requirements beyond the 70 nm technology generation [74]. Similarly, due to the weaker body effect, the forward-body-bias-only solution will no longer satisfy these performance requirements beyond the 50 nm technology generation. Beginning with the 50 nm technology generation, therefore, application of both forward and reverse body bias techniques within the same integrated circuit will become necessary to enhance circuit speed within a limited power budget [74].

In a bidirectional (forward and reverse) body bias circuit, the zero body bias threshold voltages of the transistors can be set to an intermediate value by controlling the channel doping concentration. In order to increase the circuit speed, the threshold voltages can be dynamically reduced by forward body biasing the transistors. Alternatively, in order to reduce both the circuit speed and leakage power, the threshold voltages can be increased by reverse body biasing the transistors. As discussed previously, the forward body bias voltage that can be applied to a CMOS circuit is limited due to increasing diode currents and junction capacitances. Similarly, the reverse body bias voltage that can be applied to a CMOS circuit is limited due to the increasing junction band-to-band tunneling currents with technology scaling. Since the transistors in a bidirectional body biased circuit would be initially set to an intermediate threshold voltage (rather than the low threshold voltages utilized in a reverse-body-bias-only circuit or the high threshold voltages utilized in a forward-body-bias-only circuit), the bidirectional body bias technique can produce a wider choice of threshold voltages.

As discussed in Section 3.3.1.1, the adaptive reverse body bias technique can be used to reduce die-to-die parameter variations. However, as shown in [63] and [80], the reverse body bias circuit technique increases within-die parameter variations due to increasing short-channel effects as compared to a zero body biased circuit. Since the forward body bias technique reduces short-channel effects, a bidirectional adaptive body bias technique can be used to reduce both die-to-die and within-die parameter variations.



Fig. 3.25. Leakage power and clock frequency characteristics of microprocessor test circuits fabricated in a 0.15  $\mu$ m CMOS technology (LFB: lower frequency bin, HFB: higher frequency bin) [80].

The measured clock frequency and leakage power consumption of 62 microprocessor test circuits fabricated in a 0.15 µm CMOS technology are shown in Fig. 3.25 [80]. Due to the die-to-die and within-die parameter variations, different dies display different frequency and leakage characteristics, as shown in Fig. 3.25. For an integrated circuit to be acceptable, both the minimum clock frequency and maximum leakage power requirements must be satisfied. The increasing die-to-die and within-die variations of the electrical characteristics are expected to further degrade yield with technology scaling. As shown in Fig. 3.25, a significant number of dies are rejected due to violating either the speed or power constraint.
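The acceptance criteria described above amount to a two-sided test on each die, followed by the assignment of an accepted die to a frequency bin. A minimal sketch is given below. The function name, the normalized limits `f_min`, `f_hfb`, and `p_max`, and the sample measurements are hypothetical and are not the data of [80].

```python
def classify_die(freq, leakage, f_min=1.0, f_hfb=1.1, p_max=2.0):
    """Classify a die from its normalized clock frequency and leakage power.

    A die is rejected if it is too leaky or too slow; otherwise it is placed in
    the higher (HFB) or lower (LFB) frequency bin. All limits are hypothetical.
    """
    if leakage > p_max:
        return "reject: leakage"
    if freq < f_min:
        return "reject: speed"
    return "HFB" if freq >= f_hfb else "LFB"


# Hypothetical (frequency, leakage) measurements for four dies.
dies = [(0.95, 0.8), (1.05, 1.2), (1.15, 1.8), (1.20, 3.5)]
bins = [classify_die(f, p) for f, p in dies]
accepted = sum(b in ("LFB", "HFB") for b in bins)
yield_fraction = accepted / len(dies)
```

In this illustration the slowest die and the leakiest die are both rejected, showing how a wide spread in either direction reduces yield.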

A simple bidirectional adaptive body bias scheme that reduces die-to-die parameter variations is the application of a single adaptively generated body bias combination (for the NMOS and PMOS transistors) to an entire integrated circuit. This body bias combination can satisfy the delay requirements of the longest critical delay path in an integrated circuit using a feedback circuit similar to the circuit shown in Fig. 3.14. Microprocessor test circuits based on this bidirectional adaptive body bias technique, fabricated in a 0.15 µm CMOS technology, are reported in [80]. With the standard zero body bias circuit technique, only 50% of the test circuits pass the speed and power tests. Moreover, most of the acceptable circuits have clock frequencies in the lower frequency bin (LFB). By applying a bidirectional adaptive body bias, the die acceptance rate increases to 100%.

This technique, however, ignores within-die parameter variations by applying a single body bias voltage combination to the entire integrated circuit. The die-to-die parameter variations similarly affect the electrical characteristics of all of the devices in an integrated circuit [64]. Applying a single set of body bias voltages to an entire circuit can, therefore, effectively reduce die-to-die parameter variations by shifting the threshold voltages of all of the devices by a similar ratio. The within-die parameter variations, however, affect the electrical characteristics of all of the individual devices differently. Applying a single set of adaptive body bias voltages to an integrated circuit is, therefore, ineffective for reducing within-die parameter variations. Due to the significance of within-die parameter variations in a deeply scaled CMOS technology, despite the reduction of die-to-die variations and the yield enhancement to 100%, only 32% of the dies are acceptable in the higher frequency bin (HFB) by applying the same bidirectional adaptive body bias voltages to all of the transistors in an integrated circuit. An alternative adaptive body bias technique is proposed in which a second set of test circuits is divided into different blocks. An independently generated adaptive body bias voltage is applied to each circuit block, thereby lowering the within-die parameter variations. It has been shown that by applying this second bidirectional adaptive body bias technique, which reduces both the within-die and die-to-die parameter variations, the number of dies accepted in the highest frequency bin is increased to 99% while maintaining a 100% yield [80].

An alternative bidirectional body bias technique (named  $V_t$ -hopping) is proposed in [81] to lower the leakage power consumed during both the active and standby modes of operation. The  $V_t$ -hopping technique is essentially a dynamic threshold voltage scaling technique inspired by the dynamic voltage scaling (DVS) technique (DVS is discussed in Section 3.1). The primary objective of the DVS circuit technique is to lower the dynamic switching power. The DVS circuit technique is effective in reducing the active power dissipation provided that the total active power is dominated by the dynamic switching power. Alternatively, the  $V_t$ -hopping scheme is effective for reducing the active power consumption provided that the dominant mechanism of active power consumption is subthreshold leakage current due to the aggressive scaling of the supply and threshold voltages ( $V_{DD} \le 0.5$  volts and  $V_t \approx 0$  volts).

A reduced instruction set (RISC) microprocessor based on the  $V_t$ -hopping circuit technique fabricated in a 0.6  $\mu$ m CMOS technology is reported in [81]. The  $V_t$ -hopping scheme utilizes two different sets of threshold voltages for operation at either a high clock frequency ( $f_{CLK}$ ) or a low clock frequency ( $f_{CLK}/2$ ). As the throughput requirement from a processor changes depending upon the variations of the workload, the operating system switches the desired clock frequency between  $f_{CLK}$  and  $f_{CLK}/2$ . The  $V_t$ -hopping circuitry, in response to a request from the operating system, applies either a predetermined set of forward body bias voltages or a predetermined set of reverse body bias voltages to switch the operating frequency of the processor circuitry to  $f_{CLK}$  or  $f_{CLK}/2$ , respectively. The  $V_t$ -hopping technique reduces the total active power dissipation by up to 82% as compared to a low threshold voltage RISC processor for the same workload at a supply voltage of 0.5 volts [81].
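The selection between the two operating points can be summarized by the following sketch. It models only the decision described above (a forward body bias set for $f_{CLK}$ and a reverse body bias set for $f_{CLK}/2$); the class, the function name, and the representation of the workload as a fraction of peak throughput are hypothetical and do not describe the controller of [81].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    name: str
    freq_fraction: float  # clock frequency as a fraction of f_CLK
    body_bias: str        # "forward" or "reverse"


HIGH = OperatingPoint("f_CLK", 1.0, "forward")    # low V_t, full speed
LOW = OperatingPoint("f_CLK/2", 0.5, "reverse")   # high V_t, reduced leakage


def select_operating_point(required_throughput):
    """Pick the slowest operating point that still meets the requested throughput.

    required_throughput is a fraction of the peak throughput (0.0 to 1.0).
    """
    if not 0.0 <= required_throughput <= 1.0:
        raise ValueError("required_throughput must lie between 0.0 and 1.0")
    return LOW if required_throughput <= LOW.freq_fraction else HIGH
```

Whenever the workload can be served at half of the peak clock frequency, the reverse body biased operating point is chosen, so the higher threshold voltages reduce the subthreshold leakage during the active mode.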

#### 3.3.2. Multiple Threshold Voltage CMOS

Multiple threshold voltage CMOS technologies employ both high and low threshold voltage transistors within the same integrated circuit. The primary goal of multiple threshold voltage circuits is to selectively scale the threshold voltages together with the supply voltage in order to enhance the speed without significantly increasing the subthreshold leakage current.

The multiple threshold voltage circuit technique selectively places low threshold voltage transistors on the speed critical paths of a circuit to enhance the speed while operating at a reduced supply voltage [88]-[90]. The motivation for this multiple threshold voltage circuit technique is similar to the motivation for the multiple supply voltage circuit technique (discussed in Section 3.2). In a standard single threshold voltage circuit, the threshold voltages of the transistors are chosen to achieve a specific target clock frequency. Since the speed of a synchronous digital circuit is determined by the most critical (slowest) delay paths, the threshold voltages (similar to the supply voltage) are primarily chosen to lower the propagation delay of the signals along the critical paths in order to satisfy a target clock period. Since all of the transistors have the same nominal threshold voltage in a standard CMOS circuit, the signal propagation along many non-critical delay paths is unnecessarily fast, creating excessive slack. A single threshold voltage circuit essentially wastes power in the form of leakage current on many non-critical delay paths. The multiple threshold voltage circuit technique exploits this characteristic by selectively scaling the threshold voltages only along the speed critical paths. The primary objective of the multiple threshold voltage circuit technique is to minimize the number of low threshold voltage transistors required to satisfy a target clock frequency while maximizing the number of high threshold voltage transistors to achieve the lowest subthreshold leakage current. The multiple threshold voltage circuit technique provides an opportunity to further scale the threshold voltages (as compared to a standard single threshold voltage circuit) without violating any limitation in the total subthreshold leakage power. A target clock frequency can, therefore, be satisfied within a limited power budget by only scaling the threshold voltages of those portions of a circuit where a low threshold voltage transistor is required to achieve a specific propagation delay at a reduced supply voltage.
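The objective stated above (as few low threshold voltage transistors as needed to meet the target clock period) can be illustrated with a simple greedy heuristic. This sketch is not the assignment method of [88]-[90]. It assumes that each path delay is the sum of its gate delays, that each gate has one known delay for each threshold voltage, and that all gate names, delays (in picoseconds), and the target are hypothetical.

```python
def assign_low_vt(paths, delay_high, delay_low, t_target):
    """Greedily choose the gates that receive a low threshold voltage.

    Start with every gate at high V_t. While the slowest path misses t_target,
    switch the gate on that path with the largest delay reduction to low V_t.
    Returns the set of low-V_t gates, or raises ValueError if the target
    cannot be met.
    """
    low = set()

    def path_delay(path):
        return sum(delay_low[g] if g in low else delay_high[g] for g in path)

    while True:
        worst = max(paths, key=path_delay)
        if path_delay(worst) <= t_target:
            return low
        candidates = [g for g in worst if g not in low]
        if not candidates:
            raise ValueError("target not reachable with low-V_t gates alone")
        low.add(max(candidates, key=lambda g: delay_high[g] - delay_low[g]))


# Hypothetical example: two paths sharing gate "a"; delays in picoseconds.
paths = [["a", "b", "c"], ["a", "d"]]
delay_high = {"a": 100, "b": 120, "c": 100, "d": 50}
delay_low = {"a": 80, "b": 90, "c": 80, "d": 40}
low_vt_gates = assign_low_vt(paths, delay_high, delay_low, t_target=290)
```

In this example only gate `b` on the critical path is switched to a low threshold voltage. The non-critical path retains high threshold voltage transistors and therefore a low leakage current. A greedy choice of this kind is not guaranteed to yield the minimum number of low threshold voltage gates in general.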

A dual threshold voltage PowerPC RISC microprocessor fabricated in a 0.25 μm dual threshold voltage CMOS technology is reported in [89]. Standard threshold voltage transistors are used in the caches, non-critical paths, and the leakage sensitive dynamic circuits (primarily because of noise immunity concerns). The low threshold voltage transistors are selectively used on the speed critical delay paths. The clock frequency of this dual threshold voltage microprocessor, with 40% of the transistors operating at a low threshold voltage, is enhanced by up to 10% as compared to a standard single threshold voltage microprocessor [89].

A 760 MHz G6 microprocessor fabricated in a dual threshold voltage 0.2 μm CMOS technology is reported in [90]. Similar to the previous example, standard threshold voltage transistors are used in the caches and the dynamic circuitry. The low threshold voltage transistors are selectively placed along the critical delay paths of the logic circuitry. The clock frequency of this microprocessor, with only 3% of the total transistors operating at a low threshold voltage (corresponding to 10% of the logic transistors), is enhanced by 10% as compared to a standard single threshold voltage microprocessor [90].

Another CMOS circuit technique employing multiple threshold voltage transistors, multithreshold-voltage CMOS (MTCMOS), has been proposed by Mutoh [29]. An MTCMOS circuit is shown in Fig. 3.26. In an MTCMOS circuit, all of the logic transistors have low threshold voltages to enhance circuit speed. In order to suppress the high subthreshold leakage current characteristics of the scaled low threshold voltage transistors, high threshold voltage switches are added between the low threshold voltage logic circuits and the power supply and ground lines. These high threshold voltage power supply and ground switches are controlled by a sleep signal. During the active mode of operation, the sleep control switches are activated, providing a virtual power and ground line for the logic circuits. During standby mode, these high threshold voltage sleep control switches are turned off, reducing the subthreshold leakage current. As shown in Fig. 3.26, in a typical MTCMOS circuit, the same sleep control transistor is shared by several low threshold voltage logic gates, assuming the switching activity of the gates connected to the same sleep switch is small (switching activity is assumed to be less than 30% in [29]).



Fig. 3.26. A multithreshold-voltage CMOS (MTCMOS) circuit [29], [83]. The high threshold voltage transistors are illustrated by a bold line in the channel area.

The delay of an MTCMOS circuit is degraded due to the high threshold voltage sleep switches connected in series, as compared to a standard low threshold voltage CMOS circuit. Due to the voltage drop across the series resistance of a sleep switch transistor, the voltage difference between the virtual power and ground lines is less than the standard full voltage swing between the primary power supply  $(V_{DD})$  and ground. The effective supply voltage in an MTCMOS circuit is, therefore, smaller than the primary supply voltage  $V_{DD}$ . In order to minimize the speed degradation due to this reduction in the effective supply voltage, the width of the sleep switches is increased. Increasing the width of the sleep transistors, however, increases the area overhead, the subthreshold leakage current (which is proportional to the total width of the sleep switches), and the energy overhead of activating/deactivating the sleep switches. The optimum size of the high threshold voltage sleep switches is, therefore, a critical design issue in an MTCMOS circuit.
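The sizing tradeoff can be made concrete with a first-order estimate in which the conducting sleep switch is treated as a resistor whose resistance is inversely proportional to its width. The sketch below is an illustration under that assumption, not a published sizing method. The resistance-width product `r_unit_ohm_um`, the peak current, and the allowed voltage drop are hypothetical values.

```python
def min_sleep_width_um(i_peak_a, v_dd, max_drop_fraction, r_unit_ohm_um=2000.0):
    """Smallest sleep transistor width (um) that keeps the IR drop within budget.

    The on-resistance is modeled as r_unit_ohm_um / width, so the voltage drop is
    i_peak_a * r_unit_ohm_um / width. All default values are hypothetical.
    """
    if not 0.0 < max_drop_fraction < 1.0:
        raise ValueError("max_drop_fraction must lie strictly between 0 and 1")
    max_drop_v = max_drop_fraction * v_dd
    return i_peak_a * r_unit_ohm_um / max_drop_v


# Hypothetical block drawing 5 mA from a 1.0 V supply with a 2% drop budget.
width = min_sleep_width_um(i_peak_a=0.005, v_dd=1.0, max_drop_fraction=0.02)
```

With these assumed numbers the required width is approximately 500 µm. Halving the allowed voltage drop doubles the required width, along with the associated area, leakage, and switching energy overheads.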

An average current method is proposed in [85] to optimize the width of the sleep transistors. This technique assumes that the current consumption of an MTCMOS circuit is constant and well understood before the beginning of the design process. An alternative technique is proposed in [86] to minimize the total width of the sleep switches. In this technique, the size of the sleep transistor of each low threshold voltage circuit is individually optimized based on circuit simulations. Once the optimum sleep transistor size is determined for all of the circuit blocks, the sleep transistors of the mutually exclusive gates (the gates that are guaranteed to not simultaneously switch) are merged, minimizing the total sleep transistor width [86]. Both of the circuit techniques described in [85] and [86] guarantee that for any input vector, the degradation in the effective supply voltage is limited such that the circuit delay is within a specific target range. For example, a maximum 2% degradation in effective supply voltage is allowed in [85] for a worst case 2% delay fluctuation. In addition to the resistive voltage drop, the virtual power and ground lines bounce whenever a gate switches. The parasitic capacitances of the virtual power and ground lines (C<sub>Virtual-VDD</sub> and C<sub>Virtual-GND</sub>) temporarily supply charge to the internal logic circuits, limiting any transient reduction in the effective supply voltage due to the switching activity.

The primary reason for the reduction in leakage current of an MTCMOS circuit as compared to a standard low threshold voltage CMOS circuit is the smaller subthreshold leakage current due to the high threshold voltage transistors between the low threshold voltage logic circuits and the power and ground lines. The subthreshold leakage current of a high threshold voltage transistor is exponentially smaller than that of a low threshold voltage transistor. The serially connected high threshold voltage sleep switches, therefore, suppress the leakage current characteristics of the low threshold voltage transistors. Moreover, the total effective transistor width available for subthreshold leakage current conduction is smaller in an MTCMOS circuit as compared to a standard low threshold voltage CMOS circuit, due to the use of the same sleep transistor by several logic gates. The total width of the sleep control switches is significantly smaller (typically less than 10%) than the total equivalent width of the low threshold voltage logic gates. Reducing the total effective transistor width between the power supply and ground linearly decreases the subthreshold leakage current.
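The two contributions (the exponential dependence on the threshold voltage and the linear dependence on the width) can be combined in a rough estimate. The sketch below assumes that the subthreshold leakage is proportional to $W \cdot 10^{-V_t/S}$, where $S$ is the subthreshold swing, and neglects the stacking effect and the voltage of the floating virtual rails. The threshold voltage difference, the swing, and the width ratio are hypothetical.

```python
def standby_leakage_ratio(width_ratio, delta_v_t, swing_v_per_decade=0.1):
    """Approximate MTCMOS standby leakage relative to low-V_t logic without sleep switches.

    width_ratio: sleep switch width divided by the equivalent logic width.
    delta_v_t:   high V_t minus low V_t (volts).
    Leakage is taken as proportional to W * 10**(-V_t / S); defaults are hypothetical.
    """
    return width_ratio * 10.0 ** (-delta_v_t / swing_v_per_decade)


# Hypothetical case: sleep switches at 10% of the logic width, 0.3 V higher V_t.
ratio = standby_leakage_ratio(width_ratio=0.10, delta_v_t=0.3)
```

For these assumed values, the threshold voltage difference contributes three decades of reduction and the narrower conduction path contributes one more decade, for a combined ratio on the order of $10^{-4}$.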

Using a narrower leakage current conduction path in order to reduce both the active and standby leakage power was first proposed by Sakata [84] in 1994. The switched-power-supply technique proposed in [84] assumes a single threshold voltage CMOS technology. The switched-power-supply technique inserts a standard threshold voltage transistor, called a standby-current-limiting transistor, between the primary power supply and a virtual power supply line. A switched-power-supply circuit is divided into separate blocks to reduce the active leakage power in the unused sections of a circuit. Each circuit block has a separate block-select transistor between the virtual power supply line and the logic circuitry within the block. This switched-power-supply technique reduces the leakage current during both the active and standby modes of operation. During the active mode, the standby-current-limiting switch is always on and the block-select transistors of the unused circuit blocks are selectively turned off, reducing the active mode leakage current. During the standby mode, the standby-current-limiting switch is turned off, isolating the virtual supply line from the main power supply, thereby reducing the standby leakage current of the entire circuit [84]. The reduction in subthreshold leakage current with this technique is limited as compared to the MTCMOS technique, since the power supply switches have the same threshold voltage as the transistors in the logic circuits.

#### 3.4. Multiple Supply and Threshold Voltage CMOS

As discussed in Section 3.2, a dual supply voltage circuit technique offers significant power savings as compared to a standard single supply voltage CMOS circuit by exploiting the slack in a significant number of non-critical delay paths. Scaling the supply voltage along the non-critical delay paths, however, is limited as long as a standard single threshold voltage is maintained in a dual supply voltage circuit. To achieve higher power gains by further scaling the supply voltage, the threshold voltages must also be scaled. In an integrated circuit where the power consumption is dominated by dynamic switching power, a dual supply and threshold voltage circuit technique can significantly increase the power savings as compared to a dual supply and single threshold voltage circuit [92].

A typical high performance CMOS integrated circuit, such as a microprocessor, consists of two types of circuits which depend upon the circuit activity. A small portion of the circuitry (typically less than 5%) has a very high activity factor, on the order of 100% (such as clock circuits). These high activity circuits dominate the total power consumption of a typical CMOS circuit. The majority of the gates in a typical integrated circuit have a much lower activity factor, typically between 1% and 10% [53]. Although these low activity factor circuits constitute the large majority of the circuitry (typically greater than 95%), the low activity circuits typically consume a smaller portion of the total power consumption.

In a high activity circuit, the total power is typically dominated by the dynamic switching power even at very low supply and threshold voltages. Supply and threshold voltage scaling can, therefore, be effectively applied to a high activity circuit. For a low activity circuit, the total power is dominated by the dynamic switching power only at high supply voltages. As the supply voltage is decreased, the greater subthreshold leakage power begins to dominate the total power consumption. A low activity circuit, therefore, has a very limited design space for supply and threshold voltage scaling. A multiple supply and threshold voltage circuit technique can exploit the different scaling characteristics of circuit blocks with different activity factors as illustrated in Fig. 3.27. Low supply and threshold voltages can be used in the high activity factor circuitry while high supply and threshold voltages can be maintained in the low activity factor circuitry. By scaling the supply voltage in the high activity factor circuitry which dominates the total power consumption of an integrated circuit, the total power can be significantly reduced without degrading the speed.



Fig. 3.27. A multiple supply and threshold voltage integrated circuit with voltage partitioning based on the difference of the activity factors among different circuit blocks.

The power consumption of a test circuit with varying supply voltage for three different activity factors assuming a 2 GHz clock frequency and a 100 nm CMOS technology is shown in Fig. 3.28. As illustrated in Fig. 3.28, for a 100% activity factor, the total power monotonically decreases as the supply and threshold voltages are scaled. For an activity factor of 10%, the total power is only dominated by dynamic switching power for supply voltages higher than 0.8 volts. As the supply voltage is scaled below 0.8 volts, the increasing subthreshold leakage power starts to dominate the total power consumption. Similarly, for an activity factor of 1%, the dynamic switching power dominates the total power dissipation for supply voltages above 1.2 volts. As the supply voltage is scaled below 1.2 volts, the higher subthreshold leakage current increases the total power consumption despite the reduced dynamic switching power.



Fig. 3.28. Power dissipation of a test circuit with varying supply voltage for three different activity factors assuming a 2 GHz clock frequency and a 100 nm CMOS technology [53].
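The trend described for Fig. 3.28 can be illustrated with a simple normalized model. The sketch below is only a qualitative illustration under stated assumptions: the delay follows the alpha-power law, the threshold voltage is lowered together with the supply voltage to hold the delay at a reference point, and the leakage falls by one decade per 100 mV of threshold voltage. The constants `ALPHA`, `SWING`, and `LEAK_SCALE`, and all of the function names, are hypothetical choices and are not taken from [53].

```python
ALPHA = 1.3                    # assumed velocity-saturation index (alpha-power law)
SWING = 0.1                    # assumed subthreshold swing, volts per decade
VDD_REF, VT_REF = 1.2, 0.32    # reference operating point that fixes the target delay
LEAK_SCALE = 1.0               # assumed leakage weight, with C*f normalized to 1

def vt_for_constant_delay(vdd):
    """Threshold voltage that keeps delay ~ Vdd / (Vdd - Vt)**ALPHA at its reference value."""
    k = VDD_REF / (VDD_REF - VT_REF) ** ALPHA
    return vdd - (vdd / k) ** (1.0 / ALPHA)

def total_power(vdd, activity):
    """Normalized dynamic plus subthreshold leakage power at a fixed clock frequency."""
    dynamic = activity * vdd ** 2
    leakage = LEAK_SCALE * vdd * 10.0 ** (-vt_for_constant_delay(vdd) / SWING)
    return dynamic + leakage

def optimum_vdd(activity, lo=0.5, hi=1.5, steps=101):
    """Supply voltage in [lo, hi] that minimizes the total power (simple grid search)."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda v: total_power(v, activity))

for a in (1.0, 0.1, 0.01):
    print(f"activity {a:>4}: optimum Vdd ~ {optimum_vdd(a):.2f} V")
```

With these assumed constants, the supply voltage that minimizes the total power moves upward as the activity factor falls, which is the qualitative behavior described above. The absolute values are not intended to reproduce the data in Fig. 3.28.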

A multiple supply and threshold voltage circuit technique is proposed in [53]. This technique employs a low supply voltage ($V_{DD2} = 0.5$ volts) and low threshold voltage transistors ($|V_{t2}| \approx 0$) in the high activity circuitry. Conversely, a high supply voltage and high threshold voltage transistors are employed in the low activity circuitry. The change in the total power consumption of the microprocessor test circuits based on dual supply and threshold voltages or standard single supply and threshold voltage circuit techniques with varying supply voltage is shown in Fig. 3.29. The primary power supply ($V_{DD1}$) and threshold voltages ($|V_{t1}|$) are scaled for three different target clock frequencies.



Fig. 3.29. Power dissipation of dual- $V_{DD}$ /dual- $V_t$  and standard single- $V_{DD}$ /single- $V_t$  test circuits with varying supply voltage for three different clock frequencies, assuming a 100 nm CMOS technology. For a dual- $V_{DD}$ /dual- $V_t$  circuit,  $V_{DD2}$  and  $|V_{t2}|$  are fixed at 0.5 volts and 0 volts, respectively. The supply voltage of the low activity circuits ( $V_{DD1}$ ) is varied together with the threshold voltages ( $|V_{t1}|$ ) while maintaining a target clock frequency [53].

As shown in Fig. 3.29, for a target clock frequency of 2 GHz, the optimum supply voltage that minimizes the total power consumption of a dual supply and threshold voltage test circuit is 1.2 volts. For this supply voltage and for the same clock frequency, the total power consumption of the dual supply and threshold voltage test circuit ($V_{DD1} = 1.2$ volts, $|V_{t1}| = 0.32$ volts, $V_{DD2} = 0.5$ volts, and $|V_{t2}| = 0$) is reduced by 55% as compared to a single supply and threshold voltage circuit ($V_{DD} = 1.2$ volts and $|V_t| = 0.32$ volts). The dual supply and threshold voltage circuit technique, therefore, enlarges the design space for supply voltage scaling, lowering the total power consumption as compared to a standard single supply and threshold voltage circuit while maintaining a target clock frequency.

#### 3.5. Dynamic Supply and Threshold Voltage Scaling

As discussed in Section 3.1, dynamic voltage scaling (DVS) is an effective circuit technique for reducing the active mode power of an integrated circuit under varying workload conditions. It is assumed in the DVS circuit technique that the dominant power dissipation mechanism in a CMOS circuit is dynamic switching power. DVS adjusts the supply voltage of a circuit to the minimum voltage required to complete a specific task with a targeted latency, thereby exploiting the quadratic dependence of dynamic switching power on the supply voltage. The threshold voltages are preset during the fabrication process, typically to satisfy a clock frequency for anticipated maximum supply voltage and workload conditions. As the operating conditions (such as temperature) or workload changes, the supply voltage is varied to adjust the operating frequency while the threshold voltages are maintained the same. The DVS circuit technique, therefore, primarily focuses on minimizing dynamic switching power while ignoring the effect of subthreshold leakage current on the total power dissipation of a CMOS circuit.
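The supply voltage selection performed by DVS can be sketched as follows. The example assumes an alpha-power-law speed model with a fixed threshold voltage and solves for the lowest supply voltage that sustains a requested fraction of the maximum clock frequency. The function names, the value of alpha, and the 1.2 volt and 0.32 volt operating point are illustrative assumptions.

```python
def relative_speed(vdd, vt, alpha=1.3):
    """Alpha-power-law speed metric, proportional to (Vdd - Vt)**alpha / Vdd."""
    return (vdd - vt) ** alpha / vdd

def dvs_supply(freq_fraction, vdd_max=1.2, vt=0.32, alpha=1.3):
    """Lowest supply voltage that sustains freq_fraction of the maximum clock frequency."""
    target = freq_fraction * relative_speed(vdd_max, vt, alpha)
    lo, hi = vt + 1e-6, vdd_max
    for _ in range(60):               # bisection; speed rises monotonically with Vdd
        mid = 0.5 * (lo + hi)
        if relative_speed(mid, vt, alpha) < target:
            lo = mid
        else:
            hi = mid
    return hi

vdd_half = dvs_supply(0.5)
energy_ratio = (vdd_half / 1.2) ** 2   # dynamic energy per operation scales as Vdd**2
print(f"half speed: Vdd ~ {vdd_half:.2f} V, "
      f"energy per operation ~ {energy_ratio:.0%} of nominal")
```

Because the threshold voltage is held fixed in this sketch, only the dynamic switching energy is considered, mirroring the assumption made by the DVS technique.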

As the subthreshold leakage current is expected to become a significant contributor to the total power consumption of future nanometer CMOS circuits, dynamic switching and subthreshold leakage power must be balanced to minimize the total active power consumption [5], [97]. As the supply voltage is reduced, the threshold voltages must also be varied to maintain a target clock frequency while minimizing the total power consumption. The dynamic switching power is quadratically reduced and the subthreshold leakage power is exponentially increased with scaled supply and threshold voltages, respectively. As shown in Fig. 3.30, the total power of a CMOS circuit does not monotonically decrease with smaller supply voltages, provided that the circuit speed is maintained the same. There is an optimum supply and threshold voltage pair that minimizes the total active power of a CMOS circuit at a target clock frequency. As the required clock frequency changes with varying workload, the optimum choice of supply and threshold voltages to minimize the total active power at a different target frequency also changes, as shown in Fig. 3.31. Since the threshold voltages are optimized for a specific operating frequency and supply voltage in a DVS circuit, the DVS circuit technique is not capable of minimizing the active power consumption for clock frequencies other than a nominal (typically, the maximum) clock frequency.



Fig. 3.30. Active mode power dissipation of a CMOS circuit with varying supply voltage for a fixed operating frequency. The threshold voltages are modified together with the supply voltage to maintain a constant frequency [97].

The dynamic supply and threshold voltage scaling circuit technique (DSTVS) dynamically adjusts both the supply and threshold voltages to change the clock frequency whenever a change in the workload or operating conditions is detected [93], [97]. The DSTVS circuit technique offers flexibility to further reduce the total active mode power dissipation beyond the power savings achievable with the DVS technique by trading subthreshold leakage power for dynamic switching power.



Fig. 3.31. Active power with varying supply voltage for various clock frequencies. With each curve, the threshold voltages are modified together with the supply voltage to maintain a constant frequency [97].

Circuits operating at low or high clock frequencies require the threshold voltages to be significantly scaled as compared to nominal zero body bias threshold voltages. However, since the maximum body bias voltages that can be applied to a CMOS circuit are limited (see the discussion in Section 3.3.1), the range of threshold voltages that can be achieved by the body bias technique is also limited. The theoretical optimum supply and threshold voltages, therefore, may not always be achievable by applying a dynamic supply and threshold voltage scaling technique [97].

#### 3.6. Chapter Summary

The effects of supply voltage scaling on the power consumption of a CMOS circuit are discussed in this chapter. Several circuit techniques that compensate for the degradation in speed at a lower supply voltage are presented.

An effective strategy for lowering power consumption without degrading performance is to dynamically adjust the supply voltage as the workload varies with time. The objective of dynamic voltage scaling is to provide high throughput when executing the computation-intensive tasks while saving energy during other times by adequately lowering the supply voltage and operating speed of a circuit.

The multiple supply voltage CMOS circuit technique exploits the delay differences among the different signal propagation paths within an integrated circuit. The multiple supply voltage circuit technique selectively lowers the supply voltage of the gates along the non-critical delay paths while maintaining a higher supply voltage along the critical delay paths in order to satisfy a target clock frequency.

The most widely employed technique for enhancing the speed of a circuit at a reduced supply voltage is to scale the threshold voltages. Lowering the threshold voltages enhances the gate overdrive ($|V_{GS}| - |V_t|$) of the transistors, thereby reducing the propagation delay of the circuits. Threshold voltage scaling, however, also increases the subthreshold leakage current, short-channel effects, and die-to-die and within-die parameter variations.

Several threshold voltage scaling techniques that lower the deleterious side effects of threshold voltage scaling are discussed. The body bias circuit technique dynamically adjusts the threshold voltage of the transistors by varying the voltage of the body terminal depending upon the changing power and speed requirements during circuit operation. An alternative threshold voltage scaling technique, multiple threshold voltage CMOS, employs transistors with different threshold voltages within the same circuit. A version of the multiple threshold voltage circuit technique reduces the standby leakage power by employing high threshold voltage transistors between the power supply and ground terminals and the low threshold voltage circuitry. These high threshold voltage switches are cut off during standby mode to suppress the high subthreshold leakage current of the low threshold voltage circuitry. Another dual threshold voltage circuit technique reduces the subthreshold leakage power by selectively employing low threshold voltage transistors only along the critical delay paths and high threshold voltage transistors along the non-critical delay paths.

A promising circuit technique aimed at lowering the deleterious side effects caused by supply and threshold voltage scaling is the use of multiple supplies and threshold voltages. With this multiple supply and threshold voltage circuit technique, the supply and threshold voltages in the high activity circuitry are scaled to very low levels while high supply and threshold voltages are maintained in the low activity circuitry. The multiple supply and threshold voltage circuit technique significantly lowers the total power as compared to a standard single supply and threshold voltage circuit without degrading the clock frequency.

Since subthreshold leakage current is expected to become a significant contributor to the total power consumption of future nanometer CMOS circuits, dynamic switching and subthreshold leakage power must be balanced to minimize the total active power consumption. The dynamic supply and threshold voltage scaling circuit technique (DSTVS) dynamically adjusts both the supply and threshold voltages to change the clock frequency whenever a change in the workload or operating conditions is detected. The DSTVS circuit technique offers flexibility to further reduce the total active mode power consumption beyond the power savings achievable with the dynamic voltage scaling technique by trading subthreshold leakage power for dynamic switching power.

# **Chapter 4**

## **Low Voltage Power Supplies**

The speed and power characteristics of an integrated circuit are highly dependent on the supply voltage. For a circuit to operate reliably while satisfying target performance specifications, a stable supply voltage is essential. In most electronic systems (such as a computer system), several circuits with different voltage and current requirements exist. In order to supply these circuits with different voltages, currents, and power ratings, several voltage converters are necessary.

Many integrated circuits (such as microprocessors, digital signal processors, dynamic random access memories, and static random access memories) are designed to operate with a DC supply voltage. Therefore, in a typical computer system, the AC voltage of the utility system is first converted into a DC voltage by an AC-to-DC converter. Once a DC voltage is obtained, several DC-DC converters generate the specific DC voltages required by the different circuit blocks within a system. The DC voltage supplied to a circuit must be maintained within a tight voltage envelope to satisfy the guaranteed performance and functionality of the circuitry under variations of the load current and DC input supply voltage. A power supply should, therefore, provide not only voltage conversion, but also voltage regulation. A DC-DC voltage regulator is a circuit that generates a regulated DC output voltage from a (possibly) unregulated DC input voltage with a different voltage magnitude and/or polarity.

The power delivered by a voltage converter varies dramatically across applications. In battery operated portable applications, the power demand of a load is typically on the order of a few watts. Power supplies for computers and office equipment can supply hundreds to thousands of watts. In variable speed motor drives, the power required by a load ranges from kilowatts to megawatts. The power levels encountered in rectifiers and inverters that interface DC transmission lines to an AC utility system can be as high as thousands of megawatts [102]. The preferred circuit topology for voltage conversion in order to satisfy the voltage and current requirements of the load in an energy efficient manner changes with the type of application. Many DC-DC conversion techniques have been developed over the years to provide energy efficient DC-DC conversion for a wide variety of applications [102]-[115].

The systems of interest in this dissertation are battery operated portable devices and computer systems. In a computer system, various blocks operate with different voltage and current requirements. In order to ensure coherent operation of these circuits so that a stable system can be formed and maintained, high quality voltage regulation is necessary. The power supply system of a typical laptop computer is illustrated in Fig. 4.1. A charger with a transformer line isolation converts the AC line voltage to DC in order to charge the battery. A lithium-ion battery supplies unregulated voltage to the entire system. Several voltage converters generate the regulated supply voltages required by different circuit blocks from this unregulated battery input voltage. A buck converter produces the low DC voltage required by a microprocessor. A boost converter increases the battery voltage to the level required by the disk drive. A DC-to-AC converter produces the high frequency AC voltage that supplies the display [102].

A primary factor that determines the quality of a DC-DC converter is the output regulation, the stability of the output voltage over a wide range of input voltages and load currents. The output stability of a voltage regulator is characterized by the output voltage droop and peak-to-peak output voltage ripple under changing conditions of the load current and input voltage. Another important factor that determines the quality of a voltage converter is the energy efficiency of the voltage conversion process. A specific amount of energy is dissipated by the parasitic impedances of a DC-DC converter in order to generate a supply voltage. The choice of DC-DC conversion topology and related circuit techniques for a specific application is critical to the energy efficiency of the voltage conversion process. The energy efficiency  $\eta$  of a DC-DC converter is

$$\eta = \frac{P_{out}}{P_{in}} = \frac{V_{out}I_{out}}{V_{in}I_{in}},\tag{4.1}$$

where $P_{out}$ is the power supplied to the load, $V_{out}$ and $I_{out}$ are the output voltage and the load current, $P_{in}$ is the total power supplied by the input power supply, and $V_{in}$ and $I_{in}$ are the input voltage and the current drawn from the input power supply. The power consumed by the parasitic impedances of the components within a voltage converter is

$$P_{lost} = P_{in} - P_{out} = P_{out} \left(\frac{1}{\eta} - 1\right). \tag{4.2}$$
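As a brief numerical illustration of (4.1) and (4.2), the sketch below evaluates the efficiency and the internal power loss of a converter. The function names and the operating point (12 watts delivered from a 15 watt input) are hypothetical values chosen only for the example.

```python
def efficiency(v_out, i_out, v_in, i_in):
    """Energy efficiency of a DC-DC converter, as in (4.1)."""
    return (v_out * i_out) / (v_in * i_in)

def power_lost(p_out, eta):
    """Power dissipated inside the converter, as in (4.2)."""
    return p_out * (1.0 / eta - 1.0)

# Assumed operating point: 1.2 V at 10 A out (12 W), 12 V at 1.25 A in (15 W).
eta = efficiency(v_out=1.2, i_out=10.0, v_in=12.0, i_in=1.25)
loss = power_lost(12.0, eta)
print(f"efficiency = {eta:.0%}, internal loss = {loss:.1f} W")
```

For this assumed operating point, an efficiency of 80% corresponds to 3 watts dissipated within the converter, which equals the difference between the input and output power.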



Fig. 4.1. Power supply system for a laptop computer.

Supply voltage scaling is an essential part of the technology scaling process in order to reduce the rate of increase of power consumption and to maintain device reliability with each new generation of high performance integrated circuits. Integrated circuits, with higher power consumption and lower supply voltages, require innovative DC-DC conversion techniques that can provide substantial amounts of power and current with high energy efficiency [101]. DC-DC conversion techniques for low voltage integrated circuits are examined in this chapter. The operating principles of linear, switched-capacitor, and switching DC-DC converters are discussed.

Linear regulators are used to generate a DC output voltage with a lower magnitude and the same polarity as compared to a DC input voltage. Linear regulators utilize resistive voltage division to produce an output supply voltage lower than an input supply voltage. Linear converters have intrinsically low efficiency, particularly if the input to output voltage conversion ratio is high [9], [102]. Linear regulators are found in many types of integrated circuits due to the easy design, low circuit complexity, and small area consistent with an on-chip implementation [103]-[106]. The basic operating principles of linear regulators are presented in Section 4.1.

Switched-capacitor DC-DC converters (or charge pumps) are widely used in integrated circuits to modify the amplitude and/or polarity of the primary power supply voltage of a system [9], [107]-[109]. Similar to a linear regulator, the efficiency of a switched-capacitor regulator is typically low. Moreover, the area occupied by a switched-capacitor regulator is larger than that of a linear regulator. Unlike a linear regulator, however, a switched-capacitor DC-DC converter can change the polarity and increase the amplitude of an input supply voltage. Switched-capacitor regulators are, therefore, preferred in on-chip low-to-high voltage conversion or polarity reversing applications. The operating principles of switched-capacitor regulators are reviewed in Section 4.2.

Switching regulators are capable of modifying both the amplitude and polarity of the input voltages [9], [102], [112]-[115]. The primary advantages of a switching regulator are the high conversion efficiency and good output voltage regulation characteristics as compared to a linear or switched-capacitor DC-DC converter. The primary drawback of switching regulators, however, is the inductive elements (inductors and/or transformers) required for energy storage and filtering. Filter inductors are, to date, prohibitive in the fabrication of an on-chip switching DC-DC converter. The operating principles of switching regulators are discussed in Section 4.3. A summary of the low voltage DC-DC conversion circuit techniques presented in this chapter is given in Section 4.4.

#### 4.1. Linear DC-DC Converters

Linear (series-pass) DC-DC converters are popular due to the simple structure and small physical area of these circuits [9], [102]. Linear DC-DC converters operate on the principle of resistive voltage division. The operation of a simple linear voltage converter is illustrated in Fig. 4.2.



Fig. 4.2. A simple voltage divider circuit describing the operating principle of a linear DC-DC converter.

As shown in Fig. 4.2, in an ideal linear converter, the current supplied to the load is equal to the current drawn from the primary power supply $V_{DD1}$. The highest efficiency $\eta_{max}$ attainable with an ideal (lossless) linear converter is, therefore,

$$\eta_{\text{max}} = \frac{V_{DD2}}{V_{DD1}},\tag{4.3}$$

where $V_{DD2}$ is the DC output voltage supplied to the load and $V_{DD1}$ is the DC input supply voltage. As given by (4.3), a linear DC-DC converter can only offer high energy efficiency (regardless of how ideal the circuit components are) if the difference between the input ($V_{DD1}$) and output ($V_{DD2}$) voltages is small.

A linear voltage regulator should maintain the output voltage within certain (upper and lower) limits under variations of the load current and input supply voltage. A circuit schematic of a simple linear regulator with feedback circuitry for output voltage regulation is shown in Fig. 4.3. A feedback circuit varies the gate voltage of a series transistor (which behaves as a variable resistor) by comparing the output voltage  $V_{DD2}$  to a reference voltage  $V_{REFERENCE}$ .



Fig. 4.3. A linear voltage regulator.

The ideal maximum efficiency, given by (4.3), is not attainable in a practical voltage regulator due to the energy losses in the parasitic impedances of the feedback circuitry and series switches. With careful design of the feedback circuit, however, the efficiency characteristics of a linear voltage regulator can approach the ideal upper limit. A different metric, called the current efficiency, is often used in the literature to characterize the efficiency of a linear regulator [106]. The current efficiency  $\eta_{current}$  is

$$\eta_{current} = \frac{I_{out}}{I_{in}},\tag{4.4}$$

where $I_{out}$ is the current supplied to the load and $I_{in}$ is the current drawn from the input power supply $V_{DD1}$. The current efficiency provides a measure of how close the efficiency of a linear regulator is to the ideal upper limit given by (4.3). The relationship between the energy and current efficiencies of a linear regulator is

$$\eta = \frac{V_{DD2}}{V_{DD1}} \eta_{current}. \tag{4.5}$$

As given by (4.5), if the difference between the input and output voltages is large, the energy efficiency of a linear regulator can be quite low despite a high current efficiency.

A high current efficiency linear regulator is proposed in [106]. This linear regulator circuit is shown in Fig. 4.4. By a technique called "flexible control technique of output current (FCOC)" [106], the output current drive capability of a linear regulator is dynamically modified depending upon changes in the load current. The operation of the FCOC technique is illustrated in Fig. 4.5.

The FCOC technique dynamically varies the output current in seven stages determined by variations of the load current (see Fig. 4.5). As shown in Fig. 4.4, an FCOC linear regulator has three independent current mirror amplifiers (A1, A2, and A3) which are either turned on or turned off depending upon the relationship between the instantaneous output voltage and the six reference voltages generated by the circuitry in block B (the reference voltage generator). For each current mirror circuit in block A, separate reference voltages (V<sub>ref-high</sub> and V<sub>ref-low</sub>) are generated within block B. When the output voltage is less than V<sub>ref-low</sub>, N-Tr of the corresponding current mirror is turned on while P-Tr is turned off. When the output voltage is higher than V<sub>ref-high</sub>, P-Tr of the corresponding current mirror is turned on while N-Tr is turned off. With this mechanism, the number of available current paths for charging or discharging the output node of the linear regulator is dynamically modified depending upon the variation of the load current. When the output voltage is within predetermined tolerable limits, all of the pull-up and pull-down transistors connected to the output node are cut off. This circuit technique, by dynamically varying the strength of a DC-DC converter to supply current based on varying load current requirements, reduces the power losses while stabilizing the output voltage over a wide range of load currents [106].
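The stage selection described above can be sketched behaviorally as a count of the comparisons that are satisfied. The sketch below is not the circuit of [106]; the function name, the signed return convention, and the six reference voltage values are hypothetical and serve only to illustrate the seven drive levels.

```python
def fcoc_drive_level(v_out, v_ref_low, v_ref_high):
    """Signed number of active current paths for a given output voltage.

    Positive values count charging paths (v_out below a low reference),
    negative values count discharging paths (v_out above a high reference),
    and zero means all output transistors are cut off.
    """
    charging = sum(v_out < v for v in v_ref_low)
    discharging = sum(v_out > v for v in v_ref_high)
    return charging - discharging

# Hypothetical reference voltages around a 3.0 V target, one pair per amplifier.
V_REF_LOW = (2.98, 2.95, 2.90)    # Vref-low-1, Vref-low-2, Vref-low-3
V_REF_HIGH = (3.02, 3.05, 3.10)   # Vref-high-1, Vref-high-2, Vref-high-3

for v in (2.80, 2.93, 2.96, 3.00, 3.03, 3.07, 3.20):
    print(f"Vout = {v:.2f} V -> drive level {fcoc_drive_level(v, V_REF_LOW, V_REF_HIGH):+d}")
```

Because every low reference lies below every high reference, at most one of the two counts is nonzero for any output voltage, which yields the seven distinct stages.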



Fig. 4.4. A high current-efficiency linear regulator [106].

This linear regulator has been fabricated in a 1.2 µm CMOS technology [106]. For 5 volt to 3 volt conversion, the current efficiency is 96.5% (corresponding to an energy efficiency of 57.9%) at a DC output current of 5.7 mA. The current efficiency is reduced to 90% when the DC load current is increased to 27.35 mA. The fluctuation of the output voltage is between 2.85 volts and 3.11 volts as the input supply voltage changes from 4.5 volts to 5.5 volts. For load currents below 5.7 mA, the output voltage ripple (peak-to-peak) is less than 150 mV.
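The reported efficiencies are consistent with (4.5), as the short check below illustrates. The function name is an illustrative choice; the voltages and the current efficiency are the values quoted above from [106].

```python
def linear_regulator_efficiency(v_in, v_out, current_efficiency):
    """Energy efficiency from (4.5): (V_DD2 / V_DD1) * current efficiency."""
    return (v_out / v_in) * current_efficiency

# 5 V to 3 V conversion with a 96.5% current efficiency.
eta = linear_regulator_efficiency(v_in=5.0, v_out=3.0, current_efficiency=0.965)
print(f"energy efficiency = {eta:.1%}")
```

The ideal upper limit given by (4.3) for this conversion is 60%, so a current efficiency of 96.5% places the energy efficiency at 57.9%.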

| Mode        | Driving current | Output voltage condition          |
|-------------|-----------------|-----------------------------------|
| Charging    | 3 × Idriv       | Vout < Vref-low-3                 |
| Charging    | 2 × Idriv       | Vref-low-3 < Vout < Vref-low-2    |
| Charging    | 1 × Idriv       | Vref-low-2 < Vout < Vref-low-1    |
| Neither     | ≈ 0             | Vref-low-1 < Vout < Vref-high-1   |
| Discharging | 1 × Idriv       | Vref-high-1 < Vout < Vref-high-2  |
| Discharging | 2 × Idriv       | Vref-high-2 < Vout < Vref-high-3  |
| Discharging | 3 × Idriv       | Vout > Vref-high-3                |

Fig. 4.5. Diagram representing the operation of the flexible control of the output current (FCOC) technique proposed in [106].

#### 4.2. Switched-Capacitor DC-DC Converters

Switched-capacitor DC-DC converters (or charge pumps) are used to generate a DC output supply voltage with a different magnitude and/or an opposite polarity as compared to a DC input supply voltage [9], [107]-[109]. On-chip switched-capacitor DC-DC converters are widely used to supply nonvolatile memory circuits (flash and electrically erasable-programmable read only memories), dynamic random access memories (DRAM), and analog portions of mixed-signal circuits [9], [107]. A schematic representation of a switched-capacitor DC-DC converter that doubles the input voltage is shown in Fig. 4.6.

The switched-capacitor voltage converter circuit shown in Fig. 4.6 operates in the following manner. There are two mutually exclusive switching networks controlled by two phase control signals in a switched-capacitor DC-DC converter. The switches labeled one are controlled by the phase one control signal while the switches labeled two are controlled by the phase two control signal. The phase one and phase two switch control signals do not overlap. When the phase one switches are activated (the phase two switches are cut off), $C_1$ is charged to $V_{DD1}$. In this phase, the output current is supplied by the output capacitor $C_{out}$. After $C_1$ is fully charged to $V_{DD1}$, the phase one switches are cut off and the phase two switches are activated. As a result of this connection, the output capacitor $C_{out}$ is charged to $2 \times V_{DD1}$. Provided that the switching action of the S1 and S2 switches is accomplished at a sufficiently high speed (the required frequency of switching depends upon the load current and output capacitor), the average output voltage ($V_{DD2}$) is maintained at twice $V_{DD1}$. An ideal step up conversion ratio of two is, hence, achieved with the circuit illustrated in Fig. 4.6. The output voltage in a practical switched-capacitor DC-DC converter based on the topology illustrated in Fig. 4.6 is, however, less than $2 \times V_{DD1}$ due to the voltage drop across the series resistance of the MOSFET switches. Moreover, in an actual charge pump, the output voltage degrades with increasing load current [107]-[109].



Fig. 4.6. Schematic representation of a switched-capacitor DC-DC converter ( $V_{DD2} = 2 \times V_{DD1}$ ).
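
The two-phase operation described above can be illustrated with a minimal charge-sharing sketch in Python. The model is an assumption made for illustration only: the switches are ideal (zero series resistance), $C_1$ is fully recharged in every phase one interval, and the load is a constant current sink. The function name `doubler_output` and all component values are hypothetical. Under these assumptions the sketch shows that the output equals $2 \times V_{DD1}$ only at zero load and droops as the load current increases.

```python
def doubler_output(v_dd1, c1, c_out, f_s, i_load, cycles=2000):
    """Output voltage at the end of a switching cycle, after `cycles`
    periods, for an idealized voltage doubler (ideal switches assumed)."""
    half_period = 0.5 / f_s
    v_out = 2.0 * v_dd1  # start from the ideal no-load value
    for _ in range(cycles):
        # Phase one: C1 is recharged to v_dd1 while C_out alone feeds the load.
        v_out -= i_load * half_period / c_out
        # Phase two: C1 (holding v_dd1) is stacked on the input, so it
        # presents 2 * v_dd1 to the output node and shares charge with C_out.
        v_out = (c1 * 2.0 * v_dd1 + c_out * v_out) / (c1 + c_out)
        # During phase two the load discharges C1 and C_out in parallel.
        v_out -= i_load * half_period / (c1 + c_out)
    return v_out


if __name__ == "__main__":
    # Hypothetical values: 1 nF pump capacitor, 10 nF output capacitor, 10 MHz.
    for i_load in (0.0, 0.5e-3, 1.0e-3):
        v = doubler_output(1.2, 1e-9, 10e-9, 10e6, i_load)
        print(f"I_load = {i_load * 1e3:.1f} mA -> V_DD2 = {v:.3f} V")
```

In this idealized model the droop at the end of each cycle settles to $I_{load}/(f_s C_1)$, which is one way to see why the required switching frequency depends on the load current. The additional drop across the switch resistance noted in the text is not modeled.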

A primary disadvantage of a switched-capacitor DC-DC converter is poor efficiency. The operation of a switched-capacitor regulator relies on periodically charging and discharging the charge pump capacitors through resistive switches. The internal power losses of a switched-capacitor regulator are, therefore, typically high [9], [107]-[109].

Another disadvantage of a charge pump circuit is poor output regulation [9]. In order to maintain a steady DC output voltage, a certain amount of charge should be maintained across each charge pump capacitor. The only control mechanism that can be employed in a charge pump regulator to maintain a specific amount of charge in the charge pump capacitors under varying load current conditions is to vary the conductance of the switches charging/discharging the charge pump capacitors. This strategy, however, typically requires feedback circuitry with a high energy consumption, further degrading the efficiency of the switched-capacitor regulator. An energy efficient feedback control scheme applicable to switched-capacitor regulators does not yet exist [9]. Switched-capacitor circuits are, therefore, typically used in applications with relaxed supply voltage constraints (such as DRAMs) that do not require tight voltage regulation [9], [107].

#### 4.3. Switching DC-DC Converters

A switching DC-DC converter generates a DC output supply voltage with a different magnitude and/or polarity than the DC input voltage. Among DC-DC converter topologies, switching voltage regulators are the most widely used due to the high efficiency and good output voltage regulation characteristics. Unlike a linear or switched-capacitor DC-DC converter, the efficiency of a switching DC-DC converter approaches 100% as the transistor switches are made more ideal (by employing a more advanced fabrication technology with reduced parasitic impedances).

Switching DC-DC converters can be divided into two primary categories. The first category of switching DC-DC converters utilizes transformers. Switching DC-DC converters with transformers are called isolated switching DC-DC converters [102]. The primary use of transformers in switching DC-DC converters is the DC isolation of the input and output grounds. Provided that the primary power supply operates at a relatively high voltage and/or is noisy, isolation of the load from the input supply is necessary to maintain reliable operation of the load. Another advantage of isolated switching DC-DC converters is the relatively easy and straightforward generation of multiple DC output voltages from a single DC input voltage. A single control circuit can be used to generate several different DC supply voltages by simply utilizing a multiple winding transformer, provided that the voltage regulation requirements of the load circuits are not excessively tight.

A second category of switching DC-DC converters utilizes inductors (no isolating transformers) for energy storage and signal filtering. These switching DC-DC converters without transformers are called non-isolated switching DC-DC converters [102]. Non-isolated switching DC-DC converters are widely used in both low power and low voltage applications. A switching DC-DC converter that generates an output supply voltage with a higher magnitude as compared to the input supply voltage is a boost converter. Alternatively, a switching DC-DC converter that generates an output supply voltage with a smaller magnitude as compared to the input supply voltage is a buck converter. Buck and boost types of non-isolated switching DC-DC converters are widely used to generate voltage levels required by microprocessors, digital signal processors, memory modules, and hard disks in modern computer systems.

In a typical computer system, the power to a microprocessor is supplied by a buck converter. The operation of a buck converter is described in Section 4.3.1. Several power reduction techniques applicable to switching DC-DC converters are discussed in Section 4.3.2.

#### 4.3.1. Operation of a Buck Converter

A buck converter is a standard switching DC-DC converter circuit topology with high efficiency and good output voltage regulation characteristics [9], [26], [30], [102], [112]-[115]. Buck converters are used to generate a DC output voltage from a higher DC input voltage with the same polarity. A buck converter is the preferred voltage regulator for a typical state-of-the-art high performance microprocessor. A typical buck converter circuit with a synchronous rectifier is shown in Fig. 4.7. Traditionally, a Schottky diode is employed for rectification in a buck converter. However, as described in [110] and [111], in low voltage applications the overhead of the voltage drop across the diode p-n junction significantly degrades the efficiency. Therefore, in low voltage buck converters the diode is replaced by a MOSFET rectifier ($N_1$ in Fig. 4.7) for improved efficiency [9], [102].



Fig. 4.7. Buck converter circuit.

A buck converter circuit operates in the following manner. The power MOSFETs, labeled $P_1$ and $N_1$ in Fig. 4.7, produce an AC signal at node<sub>1</sub> by a switching action controlled by a pulse width modulator (PWM). The AC signal at node<sub>1</sub> is applied to a second order low pass filter composed of an inductor and a capacitor. Assuming the filter corner frequency is much smaller than the switching frequency $f_s$ of the power MOSFETs, the low pass filter passes to the output the DC component of the AC signal at node<sub>1</sub> together with a small residue of the high frequency harmonics generated by the switching action of the power MOSFETs.

The buck converter output voltage  $V_{DD2}(t)$  is [26], [30]

$$V_{DD2}(t) = V_{DD2} + V_{ripple}(t), \tag{4.6}$$

where  $V_{DD2}$  is the DC component of the output voltage and  $V_{ripple}(t)$  is the voltage ripple waveform observed at the output due to the non-ideal characteristics of the output filter. The DC component of the output voltage is [26], [30]

$$V_{DD2} = \frac{1}{T_s} \int_{0}^{T_s} V_s(t) dt = DV_{DD1}, \tag{4.7}$$

where $V_s(t)$ is the AC signal generated at node<sub>1</sub> and $T_s$, $D$, and $V_{DD1}$ are the period, duty cycle, and amplitude, respectively, of $V_s(t)$. As given by (4.7), any positive DC output voltage less than $V_{DD1}$ can be generated by a buck converter.
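
As a brief illustration of (4.7), the following Python sketch computes the duty cycle required for a target output voltage and confirms the result by numerically averaging an ideal square wave. The function names are hypothetical, and the square wave is assumed to switch ideally between ground and $V_{DD1}$. The 1.2 volt to 0.9 volt conversion is the case analyzed in Chapter 5.

```python
def duty_cycle(v_dd1, v_dd2):
    """Duty cycle D that satisfies V_DD2 = D * V_DD1, as in (4.7)."""
    if not 0.0 < v_dd2 < v_dd1:
        raise ValueError("a buck converter requires 0 < V_DD2 < V_DD1")
    return v_dd2 / v_dd1


def average_node1_voltage(v_dd1, duty, samples=10_000):
    """Numerically average an ideal square wave of amplitude v_dd1 that is
    high for the fraction `duty` of each switching period."""
    high = sum(1 for k in range(samples) if (k + 0.5) / samples < duty)
    return v_dd1 * high / samples


if __name__ == "__main__":
    d = duty_cycle(1.2, 0.9)
    print(f"D = {d:.2f}, averaged V_DD2 = {average_node1_voltage(1.2, d):.3f} V")
```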

The power transistors are typically large in physical size and have high parasitic input capacitances. To control the operation of the power transistors, therefore, a series of MOSFET gate drivers is required. These gate driver buffers are typically tapered to drive these large capacitive loads [31], [120]. The gate driver buffers are controlled by a pulse width modulator. Using a feedback circuit, the PWM generates the necessary control signals for the power MOSFETs such that a square wave with an appropriate duty cycle is produced at node<sub>1</sub>. During operation of a buck circuit, the duty cycle may be modified in order to maintain the output voltage at a desired value whenever variations in the load current and input voltage are detected. Due to the strong dependence of the output voltage on the switching duty cycle [see (4.7)], precise output voltage regulation can be produced by a buck converter with a fast feedback circuit [26], [30], [31].

The inductor current $i_L(t)$, output voltage $V_{DD2}(t)$, and capacitor current $i_C(t)$ waveforms of a buck converter are shown in Fig. 4.8. The output voltage ripple is exaggerated in Fig. 4.8 for better illustration. In a typical buck converter, the amplitude of the output voltage ripple $\Delta V_{DD2}$ must be maintained at a small level (typically less than 1%) as compared to the output DC voltage $V_{DD2}$.



Fig. 4.8. Inductor current  $i_L(t)$ , output voltage  $V_{DD2}(t)$ , and capacitor current  $i_C(t)$  waveforms.

The filter capacitance is chosen such that the impedance of the capacitor is much smaller than the load impedance. The AC component of the inductor current, therefore, passes through the filter capacitor while the DC component I passes through the load (see Fig. 4.8). The output voltage increases whenever the inductor current rises above I, as the filter capacitor is being charged. Similarly, the output voltage falls whenever the inductor current decreases below I, as the filter capacitor is being discharged.

Expressions for the inductor current ripple  $\Delta i$  and the amplitude of the output voltage ripple  $\Delta V_{DD2}$  (see Fig. 4.8) are [26], [30], respectively,

$$\Delta i = \frac{(V_{DD1} - V_{DD2})D}{2Lf_s},\tag{4.8}$$

$$\Delta V_{DD2} = \frac{(V_{DD1} - V_{DD2})D}{16LCf_s^2} = \frac{\Delta i}{8Cf_s}, \tag{4.9}$$

where L is the filter inductance, C is the filter capacitance, and  $f_s$  is the switching frequency.
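
Expressions (4.8) and (4.9) can be inverted to size the filter for a target current ripple and voltage ripple. The Python sketch below does so; the function names are hypothetical, and the 100 MHz switching frequency and 1 ampere current ripple are assumed purely for illustration. The supply voltages and the 5 mV ripple are the values used later in Chapter 5.

```python
def filter_inductance(v_dd1, v_dd2, f_s, delta_i):
    """Inductance L (henries) obtained by solving (4.8) for L."""
    duty = v_dd2 / v_dd1
    return (v_dd1 - v_dd2) * duty / (2.0 * delta_i * f_s)


def filter_capacitance(f_s, delta_i, delta_v):
    """Capacitance C (farads) obtained by solving (4.9) for C."""
    return delta_i / (8.0 * delta_v * f_s)


if __name__ == "__main__":
    # f_s and delta_i are illustrative assumptions, not design values.
    f_s, delta_i, delta_v = 100e6, 1.0, 5e-3
    l_filter = filter_inductance(1.2, 0.9, f_s, delta_i)
    c_filter = filter_capacitance(f_s, delta_i, delta_v)
    print(f"L = {l_filter * 1e9:.3f} nH, C = {c_filter * 1e9:.1f} nF")
```

Both passive values scale as $1/f_s$, which is why a higher switching frequency eases on-chip integration.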

#### 4.3.2. Power Reduction Techniques for Switching DC-DC Converters

In low power portable systems the compactness and energy efficiency of a DC-DC converter are important due to the limitations of the available physical space, the limited effectiveness of the cooling solutions, and the need to extend battery life as much as possible. Switching DC-DC converters typically have large capacitive and inductive storage elements and power switches that significantly increase both the area and power. The sizes of the active and passive devices in a switching DC-DC converter are reduced with higher switching frequencies [26], [30], [31]. Increasing the switching frequency, however, also increases the power MOSFET related switching losses [26], [30]. Increasing the switching frequency beyond a certain value, therefore, degrades the converter efficiency [26], [30].

Two techniques have been proposed in the literature for reducing the dynamic switching power of the power MOSFETs and are briefly discussed in this section. The zero voltage switching (soft switching) technique can substantially reduce switching losses associated with high frequency operation [9], [113]. In a zero voltage switching (ZVS) scheme, the filter inductor is used to charge/discharge the parasitic capacitances at the input of the output filter (node<sub>1</sub> in Fig. 4.7) in a lossless manner. Provided that the activation times of the power transistors are carefully controlled, the parasitic capacitance at node<sub>1</sub> can be switched ideally without a power loss (neglecting the power dissipated by the series resistance of the filter inductor). If the power transistor $P_1$ ($N_1$) is turned on immediately after node<sub>1</sub> is charged (discharged) by the filter inductor, the power transistors are switched at a zero drain-to-source voltage difference, thereby eliminating the switching power losses that would otherwise have been dissipated in the power MOSFET while charging (discharging) node<sub>1</sub>. The purpose of the power transistors in a ZVS voltage regulator is, therefore, to maintain the voltage of node<sub>1</sub> at either $V_{DD1}$ or ground rather than to charge or discharge node<sub>1</sub>.

The activation time of the power transistors is critical for providing effective power savings with the ZVS circuit technique [9], [113]. The time required to charge/discharge node<sub>1</sub> depends upon the load current. To provide an effective ZVS over a wide range of loads, an adaptive dead time control scheme is proposed in [113]. The proposed technique dynamically adjusts the activation time of the power MOSFETs depending upon the instantaneous load current.

A similar power reduction technique, called the resonant gate drive technique, is proposed in [116] and [117] for lossless switching of the gate oxide related parasitic capacitances of the power MOSFETs. Based on a similar principle as the ZVS circuit technique, the resonant gate drive technique charges/discharges the input capacitances of the MOSFETs through an ideally lossless resonant circuit. Similar to the ZVS circuit technique, the resonant circuit technique stores energy in the inductors, utilizing this energy to charge or discharge the gate oxide related parasitic capacitors. Provided that the activation times of the gate driver transistors are carefully controlled, the resonant gate drive technique can significantly reduce the power dissipated by the power MOSFET gate drivers [117].

#### 4.4. Chapter Summary

Three power supply topologies used in low voltage applications are reviewed in this chapter. The operating principles of linear, switched-capacitor, and switching DC-DC converters are presented. A comparison of the electrical characteristics and typical applications of the linear, switched-capacitor, and switching DC-DC converters is listed in Table 4.1.

TABLE 4.1

A COMPARISON OF THE ELECTRICAL CHARACTERISTICS AND TYPICAL APPLICATIONS OF THE LINEAR, SWITCHED-CAPACITOR, AND SWITCHING DC-DC CONVERTERS

| Type of DC-DC Converter | Linear | Switched-Capacitor                   | Switching                                   |
|-------------------------|--------|--------------------------------------|---------------------------------------------|
| Low-to-High             | No     | Yes                                  | Yes                                         |
| High-to-Low             | Yes    | Yes                                  | Yes                                         |
| Polarity Reversal       | No     | Yes                                  | Yes                                         |
| Efficiency              | Low    | Low                                  | High                                        |
| Voltage Regulation      | Poor   | Poor                                 | Good                                        |
| Area                    | Small  | Medium                               | Large                                       |
| Typical Applications    | DRAM   | DRAM, Flash, EEPROM, and Mixed-Signal | Microprocessors, DSPs, SRAMs, and Hard Disks |

Linear regulators are used to generate a DC output voltage with a lower magnitude and the same polarity as compared to a DC input voltage. Linear regulators utilize resistive voltage division to produce an output supply voltage lower than an input supply voltage. Linear converters have intrinsically low efficiency, particularly if the input to output voltage conversion ratio is high. Linear regulators are found in many types of integrated circuits (such as high-density DRAMs) due to the easy design, low circuit complexity, and small area consistent with an on-chip implementation.

Switched-capacitor DC-DC converters (or charge pumps) are widely used in integrated circuits to modify the amplitude and/or polarity of the primary power supply voltage of a system. Similar to a linear regulator, the efficiency of a switched-capacitor regulator is typically low. The area occupied by a switched-capacitor regulator is, however, larger than that of a linear regulator. Unlike a linear regulator, a switched-capacitor DC-DC converter can change the polarity and increase the amplitude of an input supply voltage. Switched-capacitor regulators are, therefore, preferred in on-chip low-to-high voltage conversion or polarity reversing applications (such as flash and electrically erasable-programmable read only memories, DRAMs, and analog portions of mixed-signal circuits).

Switching regulators are capable of modifying both the amplitude and polarity of the input voltages. The primary advantages of a switching regulator are the high conversion efficiency and good output voltage regulation characteristics as compared to a linear or switched-capacitor DC-DC converter. A switching DC-DC converter is typically composed of discrete active and passive components and hence occupies a large area. The primary drawback of switching regulators is the inductive storage elements (inductors and/or transformers) required for energy storage and filtering. Filter inductors are, to date, prohibitive in the fabrication of an on-chip switching DC-DC converter.

A switching DC-DC converter that generates an output supply voltage with a higher magnitude as compared to the input supply voltage is a boost converter. Alternatively, a switching DC-DC converter that generates an output supply voltage with a smaller magnitude as compared to the input supply voltage is a buck converter. Buck and boost types of non-isolated switching DC-DC converters are widely used to generate voltage levels required by microprocessors, digital signal processors, memory modules, and hard disks in modern computer systems. In a typical computer system, the power to a microprocessor is supplied by a buck converter. The operation of a buck converter is described. Several power reduction techniques applicable to switching DC-DC converters are discussed.

## Chapter 5

# Analysis of Buck Converters for On-Chip Integration with a Dual Supply Voltage Microprocessor

Decreasing the power dissipation and current demand of high performance microprocessors are the two primary reasons for implementing a dual supply voltage (dual- $V_{DD}$ ) microprocessor [26], [30]. Due to the quadratic dependence of the dynamic switching power and the more than linear dependence of the subthreshold and gate oxide leakage power on the supply voltage, power dissipation is significantly reduced when portions of a microprocessor operate at a lower voltage level. A linear relationship exists between the current demand and power consumption of a microprocessor. Reducing the maximum power consumption, therefore, reduces the maximum current required by a microprocessor, thereby decreasing the number of power and ground pads on a microprocessor die. In order to maximize this reduction in current, the lower voltage supply of a dual- $V_{DD}$  microprocessor should be integrated on the same die with the microprocessor. Moreover, in order to fully exploit expected reductions in power and current, the energy overhead of an integrated DC-DC converter to produce a second voltage level must be minimized.

Buck converters are popular due to the high efficiency and good output voltage regulation characteristics of these circuits [9], [26], [30]. In single power supply microprocessors, the primary power supply is typically an external (non-integrated) buck converter. In a dual- $V_{DD}$  microprocessor, the choices are either a second external DC-DC converter, or a monolithic (both active and passive devices on the same die as the load) DC-DC converter.

In a typical non-integrated switching DC-DC converter, significant energy is dissipated by the parasitic impedances of the interconnect among the non-integrated devices (the filter inductor, filter capacitor, power transistors, and pulse width modulation circuitry) [9], [30]. Moreover, the integrated active devices of a pulse width modulation circuit are typically fabricated in an old technology with poor parasitic impedance characteristics [30].

Integrating a DC-DC converter with a microprocessor can potentially lower the parasitic losses as the interconnect between (and within) the DC-DC converter and the microprocessor is reduced [26], [30]. Additional energy savings can be realized by utilizing advanced deep submicrometer fabrication technologies with lower parasitic impedances. The efficiency attainable with a monolithic DC-DC converter, therefore, is higher than that of a non-integrated DC-DC converter [26], [30].

Fabrication of a monolithic switching DC-DC converter, however, imposes a challenge as the on-chip integration of inductive and capacitive devices is required for energy storage and output signal filtering. Integrated capacitors and inductors above certain values are not acceptable due to the tight area constraints that exist within high performance microprocessor integrated circuits (ICs). Another significant issue with integrated inductors is the poor parasitic impedance characteristics which can degrade the efficiency of a voltage regulator. The value, physical size, and parasitic impedances of the passive devices required to implement a buck converter, however, are reduced with increasing switching frequency [26], [30]. Integrated capacitors of small value (used for decoupling and constrained by the available area on the microprocessor die) are available in high performance microprocessors [118]. Furthermore, with the use of magnetic materials, a new integrated microinductor technology with relatively small parasitic impedances and higher cutoff frequencies (over 3 GHz) has recently been reported [119]. Therefore, employing switching frequencies higher than the typical switching frequency range found in conventional DC-DC converters permits the on-chip integration of active and passive devices of a buck converter onto the same die as a high performance microprocessor [30].

The efficiency characteristics of a buck converter, however, change dramatically as the switching frequency is increased. The switching frequency of DC-DC converters has been, so far, limited to the range from a few kHz to a few MHz [9], [26], [30], [102], [112]-[115]. Based on oversimplified circuit models of switching DC-DC converters, a general assumption in the research community has been that a high switching frequency DC-DC converter is not feasible, with the expectation that the efficiency would degrade significantly due to the increased power losses at high switching frequencies [9], [102]. The low switching frequency range utilized in typical non-integrated DC-DC converters has been a result of this assumption rather than of a study modeling the variation of the DC-DC converter efficiency as a function of the switching frequency. Comprehensive circuit models of the parasitic impedances of monolithic switching DC-DC converters are necessary in order to characterize an optimum circuit configuration with the maximum efficiency.

A parasitic model is presented in this chapter to analyze the frequency dependent efficiency characteristics of a buck converter. A closed form expression that characterizes the power consumption of a monolithic buck converter is proposed [26], [30]. The effects of scaling the active and passive devices and the related switching and conduction losses on the total power characteristics of a buck converter are examined. With the proposed buck converter energy model, a design space which characterizes the integration of both active and passive devices on the same die as a dual-$V_{DD}$ microprocessor while maintaining high efficiency is determined for an 80 nm CMOS technology. An efficiency of 88.4% is shown for a voltage conversion from 1.2 volts to 0.9 volts while supplying 9.5 amperes maximum current. The area of the buck converter at the target design point is 12.6 mm<sup>2</sup> which is primarily occupied by a 100 nF filter capacitor. Full integration of a high efficiency buck converter on the same die as a dual-$V_{DD}$ microprocessor is demonstrated to be feasible.

The proposed parasitic circuit model and a closed form expression of the average power dissipation of a buck converter are presented in Section 5.1. With the proposed analytic model, the efficiency characteristics of a buck converter are investigated in Section 5.2. Simulation results at a target design point are presented in Section 5.3. A summary of the research results presented in this chapter is offered in Section 5.4.

#### 5.1. Circuit Model of a Buck Converter

A circuit model has been developed to analyze the frequency dependence of the efficiency characteristics of a buck converter. The proposed circuit model for the parasitic impedances of a buck converter is shown in Fig. 5.1.



Fig. 5.1. Circuit model of the parasitic impedances of a buck converter.

The power consumption of a buck converter is a combination of the conduction losses caused by the parasitic resistive impedances and the switching losses due to the parasitic capacitive impedances of the circuit components. The power consumption of the pulse width modulation feedback circuit is typically small as compared to the power consumption of the power train (the power MOSFETs, MOSFET gate drivers, filter inductor, and filter capacitor) [26], [30], [31]. Only the power dissipation of the power train components is, therefore, considered in the efficiency analysis.

MOSFET related power losses are analyzed in Section 5.1.1. An analysis of the filter inductor related losses is presented in Section 5.1.2. The filter capacitor related losses are discussed in Section 5.1.3. An analytical expression for the total power dissipation of a buck converter is presented in Section 5.1.4.

#### 5.1.1. MOSFET Related Power Losses

The total power loss of a MOSFET is a combination of conduction losses and dynamic switching losses. The conduction power is dissipated in the series resistance of the transistors operating in the active region. The dynamic power is dissipated each switching cycle while charging/discharging the gate oxide, gate-to-source/drain overlap, and drain-to-body junction capacitances of the MOSFETs. In the following analysis it is assumed that the PWM control signals applied to $P_1$ and $N_1$ are non-overlapping. There is, therefore, no short-circuit current path through $P_1$ and $N_1$ during the PWM signal transition. The short-circuit power dissipated in the gate drivers is also neglected, assuming the transition times of the input signal applied at each power MOSFET gate driver are smaller than the output transition times [9], [61], [120].

The average power consumption of a power MOSFET and the related gate drivers is

$$P_{MOS} = \frac{R_0}{W}i_{rms}^2 + EWf_s, \tag{5.1}$$

$$E \cong \frac{\alpha}{\alpha - 1} (C_{ox} + C_{gs} + 2C_{gd} + C_{db}) V_{DD1}^2, \tag{5.2}$$

where  $P_{MOS}$  is the total power consumed during a switching cycle of a power MOSFET (which includes the power dissipated by the MOSFET gate drivers),  $R_0$  is the equivalent series resistance of a 1  $\mu$ m wide transistor,  $i_{rms}$  is the rms current passing through the power MOSFET, W is the width of the power MOSFET,  $\alpha$  is the tapering factor of the power MOSFET gate drivers,  $C_{ox}$ ,  $C_{gs}$ ,  $C_{gd}$ , and  $C_{db}$  are the gate oxide, gate-to-source overlap, gate-to-drain overlap, and drain-to-body junction capacitances, respectively, of a 1  $\mu$ m wide MOSFET, and E is the unit energy (per 1  $\mu$ m wide power MOSFET) consumed during a full switching cycle of a power MOSFET (includes the energy dissipated in the gate drivers).
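
The unit energy of (5.2) can be evaluated with the short Python sketch below. The interpretation of the factor $\alpha/(\alpha - 1)$ as the limit of the geometric series $1 + 1/\alpha + 1/\alpha^2 + \ldots$ (each gate driver stage being $\alpha$ times smaller than the stage it drives) is an assumption of this sketch rather than a statement from the text. The function names and the capacitance values used in the example are hypothetical.

```python
def unit_switching_energy(alpha, c_ox, c_gs, c_gd, c_db, v_dd1):
    """Unit energy E of (5.2), in joules per micrometer of MOSFET width.
    Capacitances are per micrometer of width, in farads."""
    if alpha <= 1.0:
        raise ValueError("the tapering factor must exceed one")
    c_unit = c_ox + c_gs + 2.0 * c_gd + c_db
    return alpha / (alpha - 1.0) * c_unit * v_dd1 ** 2


def driver_overhead(alpha, stages):
    """Partial sum 1 + 1/alpha + ... + 1/alpha**stages, which approaches
    alpha / (alpha - 1) as the number of driver stages grows."""
    return sum(alpha ** -k for k in range(stages + 1))


if __name__ == "__main__":
    # Hypothetical per-micrometer capacitances (farads).
    e_unit = unit_switching_energy(2.0, 1.0e-15, 0.2e-15, 0.2e-15, 0.5e-15, 1.2)
    print(f"E = {e_unit * 1e15:.3f} fJ per um of width")
    print(f"driver overhead with 10 stages: {driver_overhead(2.0, 10):.4f}")
```

With $\alpha = 2$ the gate drivers double the energy attributed to the power MOSFET alone, whereas a larger tapering factor reduces this overhead.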

As given by (5.1), increasing the MOSFET transistor width reduces the conduction losses while increasing the switching losses. An optimum MOSFET width, therefore, exists that minimizes the total MOSFET related power. The optimum MOSFET width and power loss expressions for a target rms current and switching frequency are

$$W_{opt} = \sqrt{\frac{R_0 i_{rms}^2}{f_s E}},\tag{5.3}$$

$$P_{MOS}(\min) = 2\sqrt{R_0 i_{rms}^2 f_s E}. \tag{5.4}$$
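
Expressions (5.3) and (5.4) follow from setting the derivative of (5.1) with respect to $W$ to zero. The Python sketch below checks this numerically by comparing the power at $W_{opt}$ with the power at slightly narrower and wider transistors. The function names and the parameter values ($R_0$, $E$, $i_{rms}$, and $f_s$) are hypothetical and chosen only for illustration.

```python
import math


def mosfet_power(width, r0, i_rms, e_unit, f_s):
    """Total MOSFET power of (5.1): conduction loss plus switching loss."""
    return r0 / width * i_rms ** 2 + e_unit * width * f_s


def optimum_width(r0, i_rms, e_unit, f_s):
    """Width W_opt of (5.3) that minimizes (5.1)."""
    return math.sqrt(r0 * i_rms ** 2 / (f_s * e_unit))


def minimum_mosfet_power(r0, i_rms, e_unit, f_s):
    """Minimum power of (5.4), obtained at W = W_opt."""
    return 2.0 * math.sqrt(r0 * i_rms ** 2 * f_s * e_unit)


if __name__ == "__main__":
    # Hypothetical values: R0 in ohm*um, E in J/um, width in um.
    r0, i_rms, e_unit, f_s = 1000.0, 5.0, 5e-15, 100e6
    w_opt = optimum_width(r0, i_rms, e_unit, f_s)
    for scale in (0.8, 1.0, 1.25):
        p = mosfet_power(scale * w_opt, r0, i_rms, e_unit, f_s)
        print(f"W = {scale:.2f} * W_opt -> P_MOS = {p:.4f} W")
```

At $W_{opt}$ the conduction and switching losses are equal, which is why the minimum in (5.4) is twice either term.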

As mentioned before, it is assumed that the PWM signals for the power MOSFETs are non-overlapping. The time period during which both  $N_1$  and  $P_1$  are cutoff is called the dead time. The rms currents through  $N_1$  and  $P_1$  (assuming a small dead time to switching period  $(T_s)$  ratio as compared to D) are

$$i_{rms}(NMOS) = \sqrt{(1-D)\left(I^2 + \frac{\Delta i^2}{3}\right)}, \tag{5.5}$$

$$i_{rms}(PMOS) = \sqrt{D\left(I^2 + \frac{\Delta i^2}{3}\right)}, \tag{5.6}$$

where *I* is the DC current supplied to the load.

Applying (5.4) for  $N_1$  and  $P_1$  and substituting the rms current expressions (5.5) and (5.6), an expression for the total MOSFET related optimized power consumption of a buck converter  $P_{tot,MOS}(opt)$  is

$$P_{tot,MOS}(opt) = a\sqrt{\left(I^2 + \frac{\Delta i^2}{3}\right)f_s}, \tag{5.7}$$

$$a = 2\left[\sqrt{R_{0NMOS}(1-D)E_{NMOS}} + \sqrt{R_{0PMOS}DE_{PMOS}}\right]. \tag{5.8}$$
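
The substitution leading to (5.7) and (5.8) can be verified numerically. The Python sketch below evaluates the rms currents of (5.5) and (5.6), applies (5.4) to each transistor, and compares the sum with the closed form of (5.7). The function names and the device parameters are hypothetical; the load current and duty cycle correspond to the case analyzed in Section 5.2.

```python
import math


def rms_currents(i_load, delta_i, duty):
    """Return (i_rms of N1, i_rms of P1) from (5.5) and (5.6)."""
    common = i_load ** 2 + delta_i ** 2 / 3.0
    return math.sqrt((1.0 - duty) * common), math.sqrt(duty * common)


def total_mosfet_power(i_load, delta_i, duty, f_s, r0_n, e_n, r0_p, e_p):
    """Optimized total MOSFET power of (5.7) with the factor a of (5.8)."""
    a = 2.0 * (math.sqrt(r0_n * (1.0 - duty) * e_n)
               + math.sqrt(r0_p * duty * e_p))
    return a * math.sqrt((i_load ** 2 + delta_i ** 2 / 3.0) * f_s)


if __name__ == "__main__":
    # Hypothetical device parameters: R0 in ohm*um, E in J/um.
    r0_n, e_n, r0_p, e_p = 1000.0, 5e-15, 2000.0, 5e-15
    i_load, delta_i, duty, f_s = 9.5, 2.0, 0.75, 500e6
    i_n, i_p = rms_currents(i_load, delta_i, duty)
    # Apply (5.4) separately to N1 and P1, then add the two results.
    separate = (2.0 * math.sqrt(r0_n * i_n ** 2 * f_s * e_n)
                + 2.0 * math.sqrt(r0_p * i_p ** 2 * f_s * e_p))
    closed = total_mosfet_power(i_load, delta_i, duty, f_s, r0_n, e_n, r0_p, e_p)
    print(f"sum of (5.4) terms = {separate:.4f} W, (5.7) = {closed:.4f} W")
```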

#### 5.1.2. Filter Inductor Related Power Losses

Some portion of the total energy consumption of a buck converter is due to the series resistance and the stray capacitance of the filter inductor. Integrated spiral inductors have a high series resistance and other intrinsic problems associated with a planar design which makes these inductors area inefficient [119]. Integration of a spiral inductor with sufficient inductance is, therefore, not feasible for a high performance microprocessor. A novel low resistance inductor has recently been reported [119]. Assuming the inductor parasitic impedances scale linearly with the inductance [121], the total power dissipated in the filter inductor is

$$P_{tot,inductor} = b \left[ \frac{I^2}{\Delta i f_s} + \frac{\Delta i}{3 f_s} + \frac{C_{L0} V_{DD1}^2}{R_{L0} \Delta i} \right], \tag{5.9}$$

$$b = \frac{(V_{DD1} - V_{DD2})DR_{L0}}{2},\tag{5.10}$$

where  $C_{L0}$  and  $R_{L0}$  are, respectively, the parasitic stray capacitance and parasitic series resistance per nH inductance.
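
One way to see how (5.9) arises is sketched below in Python. The sketch assumes that the series resistance and stray capacitance are $R_{L0}L$ and $C_{L0}L$, that the conduction loss is the series resistance multiplied by the mean square inductor current $I^2 + \Delta i^2/3$, and that the stray capacitance is charged to $V_{DD1}$ once per cycle. These assumptions are an interpretation consistent with (5.9), not a derivation quoted from the text. For unit consistency the per-nH parameters are expressed per henry in the code (multiplied by $10^9$). All names and values are hypothetical.

```python
def inductor_power(v_dd1, v_dd2, i_load, delta_i, f_s, r_l0, c_l0):
    """Filter inductor loss of (5.9) and (5.10).
    r_l0 is in ohms per henry and c_l0 in farads per henry."""
    duty = v_dd2 / v_dd1
    b = (v_dd1 - v_dd2) * duty * r_l0 / 2.0
    return b * (i_load ** 2 / (delta_i * f_s)
                + delta_i / (3.0 * f_s)
                + c_l0 * v_dd1 ** 2 / (r_l0 * delta_i))


def inductor_power_direct(v_dd1, v_dd2, i_load, delta_i, f_s, r_l0, c_l0):
    """Same loss computed step by step from the assumed physical model."""
    duty = v_dd2 / v_dd1
    inductance = (v_dd1 - v_dd2) * duty / (2.0 * delta_i * f_s)  # from (4.8)
    conduction = r_l0 * inductance * (i_load ** 2 + delta_i ** 2 / 3.0)
    switching = c_l0 * inductance * v_dd1 ** 2 * f_s
    return conduction + switching


if __name__ == "__main__":
    # Hypothetical parasitics: 10 milliohms and 10 fF per nH of inductance.
    args = (1.2, 0.9, 9.5, 2.0, 500e6, 0.01 * 1e9, 10e-15 * 1e9)
    print(f"(5.9): {inductor_power(*args):.4f} W, "
          f"direct: {inductor_power_direct(*args):.4f} W")
```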

#### 5.1.3. Filter Capacitor Related Power Losses

The filter capacitor affects the total power consumption of a buck converter through the effective series resistance (ESR) $R_C$. Assuming the integrated capacitor is implemented utilizing the gate oxide capacitance of a MOSFET, the total power dissipation of a filter capacitor is

$$P_{tot,capacitor} = df_s \Delta i, \tag{5.11}$$

$$d = \frac{8R_{0cap}L_{cap}C_0\Delta V_{DD2}}{3}, \tag{5.12}$$

where  $R_{0cap}$  is the effective series resistance of a 1  $\mu$ m wide MOSFET,  $C_0$  is the gate oxide capacitance per  $\mu$ m<sup>2</sup>, and  $L_{cap}$  is the channel length of the MOSFET.
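
Expressions (5.11) and (5.12) can be reproduced under the following assumptions, which are an interpretation rather than a derivation quoted from the text: the capacitor carries the triangular AC component of the inductor current (rms value $\Delta i/\sqrt{3}$), the MOSFET capacitor width is $C/(C_0 L_{cap})$, and the effective series resistance is $R_{0cap}$ divided by that width. The Python sketch below compares the closed form with this step-by-step calculation. All names and parameter values are hypothetical.

```python
def capacitor_power(delta_i, f_s, delta_v, r0_cap, l_cap, c0):
    """Filter capacitor loss of (5.11) and (5.12).
    r0_cap in ohm*um, l_cap in um, c0 in farads per um^2."""
    d = 8.0 * r0_cap * l_cap * c0 * delta_v / 3.0
    return d * f_s * delta_i


def capacitor_power_direct(delta_i, f_s, delta_v, r0_cap, l_cap, c0):
    """Same loss computed step by step from the assumed physical model."""
    capacitance = delta_i / (8.0 * delta_v * f_s)  # from (4.9)
    width = capacitance / (c0 * l_cap)             # MOSFET capacitor width, um
    esr = r0_cap / width                           # effective series resistance
    return esr * delta_i ** 2 / 3.0                # rms ripple squared times ESR


if __name__ == "__main__":
    # Hypothetical values: 1 kohm*um, 0.08 um channel length, 10 fF/um^2.
    args = (2.0, 500e6, 5e-3, 1000.0, 0.08, 10e-15)
    print(f"(5.11): {capacitor_power(*args):.6f} W, "
          f"direct: {capacitor_power_direct(*args):.6f} W")
```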

#### 5.1.4. Total Power Consumption of a Buck Converter

Combining (5.7), (5.9), and (5.11), the total power consumption of a buck converter is

$$P_{buck} = a\sqrt{\left(I^2 + \frac{\Delta i^2}{3}\right)f_s} + b\left[\frac{I^2}{\Delta i f_s} + \frac{\Delta i}{3f_s} + \frac{C_{L0}V_{DD1}^2}{R_{L0}\Delta i}\right] + df_s\Delta i, \tag{5.13}$$

where a, b, and d are given by (5.8), (5.10), and (5.12), respectively.

The power dissipation of a buck converter is a strong function of the switching frequency and the inductor current ripple. As given by (5.13), a higher switching frequency increases the MOSFET and filter capacitor related losses while decreasing the filter inductor related losses. Similarly, the MOSFET and filter capacitor power losses increase with greater inductor current ripple. The relationship between the inductor losses and the inductor current ripple, however, is more complicated. Increased current ripple reduces the filter inductance required for a target switching frequency, which reduces the inductor parasitic impedances and the related power loss. A higher current ripple, however, also increases the rms current through the filter inductor, which increases the conduction losses of the inductor.

Depending upon the ratio of the inductor and MOSFET related components of the total power dissipation of a buck converter, the efficiency can actually increase with higher switching frequency and current ripple within a specified  $(f_s, \Delta i)$  range. This observation agrees with the analysis presented in Section 5.2.
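
The opposing frequency trends of the three terms in (5.13) can be explored with the Python sketch below, which evaluates each loss component over the 10 MHz to 4 GHz range considered in this chapter. Every device parameter in the `HYPOTHETICAL` dictionary is an illustrative assumption and not a value of the 80 nm technology used in the analysis; the inductor parasitics are expressed per henry for unit consistency. The printed numbers therefore indicate trends only.

```python
import math

# Hypothetical parameters (not taken from the analysis in this chapter).
HYPOTHETICAL = {
    "r0_n": 1000.0, "e_n": 5e-15,   # NMOS: ohm*um, J/um
    "r0_p": 2000.0, "e_p": 5e-15,   # PMOS: ohm*um, J/um
    "r_l0": 1e7, "c_l0": 1e-5,      # inductor: ohm/H, F/H
    "r0_cap": 1000.0, "l_cap": 0.08, "c0": 10e-15,  # ohm*um, um, F/um^2
}


def buck_losses(i_load, delta_i, f_s, v_dd1, v_dd2, delta_v, p):
    """Return the (MOSFET, inductor, capacitor) terms of (5.13) in watts."""
    duty = v_dd2 / v_dd1
    a = 2.0 * (math.sqrt(p["r0_n"] * (1.0 - duty) * p["e_n"])
               + math.sqrt(p["r0_p"] * duty * p["e_p"]))          # (5.8)
    b = (v_dd1 - v_dd2) * duty * p["r_l0"] / 2.0                   # (5.10)
    d = 8.0 * p["r0_cap"] * p["l_cap"] * p["c0"] * delta_v / 3.0   # (5.12)
    p_mos = a * math.sqrt((i_load ** 2 + delta_i ** 2 / 3.0) * f_s)
    p_ind = b * (i_load ** 2 / (delta_i * f_s) + delta_i / (3.0 * f_s)
                 + p["c_l0"] * v_dd1 ** 2 / (p["r_l0"] * delta_i))
    p_cap = d * f_s * delta_i
    return p_mos, p_ind, p_cap


if __name__ == "__main__":
    for step in range(9):
        f_s = 10e6 * 400.0 ** (step / 8.0)  # log spaced, 10 MHz to 4 GHz
        p_mos, p_ind, p_cap = buck_losses(9.5, 2.0, f_s, 1.2, 0.9, 5e-3,
                                          HYPOTHETICAL)
        total = p_mos + p_ind + p_cap
        print(f"f_s = {f_s / 1e6:8.1f} MHz: MOSFET {p_mos:.3f} W, "
              f"inductor {p_ind:.3f} W, capacitor {p_cap:.5f} W, "
              f"total {total:.3f} W")
```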

#### 5.2. Efficiency Analysis of a Buck Converter

The efficiency of a buck converter is

$$\eta = 100 \times \frac{P_{load}}{P_{load} + P_{buck}},\tag{5.14}$$

where  $P_{load}$  is the average power delivered to the load and  $P_{buck}$  is the average total internal power consumption of a buck converter as given by (5.13).
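
As a simple numerical illustration of (5.14), the Python sketch below computes the efficiency from the load power and the internal loss, and also inverts the expression to find the internal loss implied by a given efficiency. Taking the load power as 0.9 volts multiplied by the 9.5 ampere maximum current is an assumption made here for illustration; the function names are hypothetical.

```python
def efficiency_percent(p_load, p_buck):
    """Efficiency of (5.14), in percent."""
    return 100.0 * p_load / (p_load + p_buck)


def loss_for_efficiency(p_load, eta_percent):
    """Internal power loss P_buck implied by (5.14) for a given efficiency."""
    return p_load * (100.0 / eta_percent - 1.0)


if __name__ == "__main__":
    p_load = 0.9 * 9.5  # watts, assuming the maximum load current at 0.9 V
    p_buck = loss_for_efficiency(p_load, 88.4)
    print(f"P_load = {p_load:.2f} W, implied P_buck = {p_buck:.2f} W")
    print(f"check: efficiency = {efficiency_percent(p_load, p_buck):.1f} %")
```

Under this assumption, an efficiency of 88.4% corresponds to roughly 1.12 watts of internal loss.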

The DC-DC converter efficiency is strongly dependent on the switching frequency $f_s$. Switching frequency is, therefore, a primary design variable in this analysis. High $f_s$ is desirable for a monolithic buck converter due to the dependence of the filter inductance and capacitance on $f_s$ as described by (4.8) and (4.9). As $f_s$ is increased, values of L and C required to satisfy the target output voltage and current are reduced. Since the integration of the active and passive devices of a buck converter circuit is a primary concern in this analysis, a frequency range higher than the typical ranges found in conventional buck converters is used throughout the analysis. The range of switching frequency $f_s$ is varied from 10 MHz to 4 GHz.

As given by (5.13), another buck converter circuit parameter that strongly affects the circuit efficiency is the inductor current ripple  $\Delta i$ . For a target  $f_s$ , increasing  $\Delta i$  reduces the required filter inductance [see (4.8)]. The filter capacitance, however, must be increased to maintain the output voltage ripple  $\Delta V_{DD2}$  within acceptable limits with increased  $\Delta i$  for a target  $f_s$  [see (4.9)]. An appropriate  $\Delta i$ , therefore, should be chosen that results in a filter inductance and capacitance suitable for on-chip integration.

In the following analysis, it is assumed that the two power supply voltage levels used in the microprocessor are 1.2 volts ( $V_{DD1}$ ) and 0.9 volts ( $V_{DD2}$ ). The maximum load current demand I is assumed to be 9.5 amperes. It is also assumed that the tapering factor $\alpha$ of the power MOSFET drivers is two for a worst case energy efficiency analysis. It should be noted that an optimal tapering factor of the power MOSFET gate drivers for energy efficiency is typically much greater than the tapering factor assumed in this analysis [see Chapter 6]. An 80 nm CMOS technology is assumed.

The global maximum efficiency circuit configuration is discussed in Section 5.2.1. The effect of a reduced filter capacitance on the circuit configuration and the resulting efficiency characteristics of a buck converter are analyzed in Section 5.2.2. The allowable output voltage ripple  $\Delta V_{DD2}$  is assumed to be 5 mV in Sections 5.2.1 and 5.2.2. Another advantage of an integrated DC-DC converter is that a higher  $\Delta V_{DD2}$  is acceptable as compared to a non-integrated DC-DC converter, while satisfying the same load voltage and current specifications. The beneficial effects of increasing  $\Delta V_{DD2}$  on the efficiency characteristics of a buck converter are examined in Section 5.2.3.

#### 5.2.1. Circuit Analysis for Global Maximum Efficiency

The power dissipation and efficiency variation of a buck converter are shown in Figs. 5.2 and 5.3, respectively, for 0.1 amperes  $\leq \Delta i \leq 9.5$  amperes and 10 MHz  $\leq f_s \leq 4$  GHz. The "z" axis represents the power (in watts) and the efficiency (%) in Figs. 5.2 and 5.3, respectively. The MOSFET, filter inductor, and filter capacitor components of the total power dissipation of a buck converter are shown in Figs. 5.4, 5.5, and 5.6, respectively. The "z" axis in Figs. 5.4, 5.5, and 5.6 represents the power (in watts).



Fig. 5.2. Total power consumption of a buck converter as a function of  $f_s$  and  $\Delta i$ .

As shown in Figs. 5.4 and 5.6, the MOSFET and capacitor related power increases with increasing switching frequency and inductor current ripple.

In contrast, as shown in Fig. 5.5, the inductor power monotonically decreases with increasing switching frequency and inductor current ripple. The capacitor power is negligibly small (less than 1%) as compared to the inductor and MOSFET power over the entire $(f_s, \Delta i)$ range of analysis. The filter capacitor losses, although included in the analysis, are therefore not discussed further in this chapter.



Fig. 5.3. Efficiency of a buck converter as a function of  $f_s$  and  $\Delta i$ .

The efficiency of a buck converter is characterized by competing inductor and MOSFET losses. At low $f_s$ and $\Delta i$, the buck converter power is primarily dissipated in the filter inductor. As the switching frequency and current ripple are increased, the inductance is dramatically reduced, lowering the parasitic losses of the inductor. The MOSFET power increases, however, with increasing $f_s$ and $\Delta i$. At a certain range of $f_s$ and $\Delta i$ the inductor losses dominate the total losses. As shown in Fig. 5.2, the total power dissipation of a buck converter decreases with increasing $f_s$ and $\Delta i$ in the range dominated by the inductor losses. After the peak efficiency is reached, increasing MOSFET losses begin to dominate the total power dissipation of a buck converter. Hence, the efficiency degrades with further increases in $f_s$ and $\Delta i$.

An optimum switching frequency and inductor current ripple pair exists that maximizes the efficiency of a buck converter. The global maximum efficiency is 92% at a switching frequency of 114 MHz and a current ripple of 9.5 amperes. The required filter capacitance and inductance at this operating point are 2083 nF and 104 pH, respectively. This filter capacitor would occupy an unacceptably large area on a microprocessor die for the target technology. Fabrication of a monolithic DC-DC converter at this maximum efficiency operating point is, therefore, not feasible.
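The filter values quoted for the global maximum efficiency operating point can be cross-checked with a short calculation. The sketch below assumes that (4.8) has the form $L = (V_{DD1} - V_{DD2})D / (2\Delta i f_s)$, that (4.9) is the ideal capacitor ripple relation $C = \Delta i / (8 f_s \Delta V_{DD2})$, and that the duty cycle is the ideal ratio $D = V_{DD2}/V_{DD1}$. Equations (4.8) and (4.9) are not reproduced in this chapter, so these forms and the function names are assumptions. They are, however, consistent with the reported values.

```python
def filter_inductance(v_dd1, v_dd2, delta_i, f_s):
    """Assumed form of (4.8), with an ideal duty cycle D = V_DD2 / V_DD1."""
    duty = v_dd2 / v_dd1
    return (v_dd1 - v_dd2) * duty / (2.0 * delta_i * f_s)


def filter_capacitance(delta_i, f_s, delta_v):
    """Assumed form of (4.9): ideal capacitor ripple relation."""
    return delta_i / (8.0 * f_s * delta_v)


# Global maximum efficiency operating point reported in Section 5.2.1.
f_s = 114e6       # switching frequency (Hz)
delta_i = 9.5     # inductor current ripple (A)
delta_v = 5e-3    # allowed output voltage ripple (V)

l_filter = filter_inductance(1.2, 0.9, delta_i, f_s)
c_filter = filter_capacitance(delta_i, f_s, delta_v)
print(f"L = {l_filter * 1e12:.0f} pH, C = {c_filter * 1e9:.0f} nF")
```

With these assumed expressions the calculation gives approximately 104 pH and 2083 nF, in agreement with the values stated above.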



Fig. 5.4. Variation of the total MOSFET related optimized power (including the power dissipated in the gate driver buffers of the power MOSFETs) with the switching frequency and inductor current ripple.



Fig. 5.5. Variation of the total power dissipated in the filter inductor with the switching frequency and inductor current ripple.



Fig. 5.6. Variation of the total power dissipated in the filter capacitor with the switching frequency and inductor current ripple.

#### 5.2.2. Circuit Analysis with Limited Filter Capacitance

Because of the area overhead of an integrated capacitor, the filter capacitance that can be integrated on a microprocessor die is limited. The filter capacitance is swept between 100 nF and 1 nF to evaluate the effects of a reduced filter capacitance on the circuit configuration and the efficiency characteristics of a buck converter. The circuit configurations at each operating point offering the highest efficiency ( $\eta$ ) are listed in Table 5.1.

TABLE 5.1

MAXIMUM EFFICIENCY CIRCUIT CONFIGURATIONS OF A BUCK
CONVERTER WITH DIFFERENT FILTER CAPACITANCES

| C<br>(nF) | η (%) | f <sub>s</sub> (MHz) | L<br>(pH) | W <sub>P1</sub> (mm) | W <sub>N1</sub> (mm) |
|-----------|-------|----------------------|-----------|----------------------|----------------------|
| 1         | 74.7  | 3174                 | 279       | 50.8                 | 20.2                 |
| 10        | 82.8  | 1227                 | 187       | 81.7                 | 32.5                 |
| 100       | 88.4  | 477                  | 124       | 131.9                | 52.5                 |

As listed in Table 5.1, an efficiency of 88.4% can be achieved with a 100 nF filter capacitance. The area occupied by the maximum efficiency configuration with a 100 nF filter capacitance is 12.6 mm<sup>2</sup>. The maximum achievable efficiency is reduced to 74.7% as the filter capacitance is lowered to 1 nF. The reason for the increase in power dissipation with reduced filter capacitance is explained by the relationship between the filter inductor, filter capacitor, output voltage ripple, and the inductor current ripple, as described by (4.8) and (4.9). As the filter capacitance is reduced, the filter inductance and switching frequency are both increased to satisfy the output voltage and current requirements. Therefore, both the switching and conduction power dissipation of the power MOSFETs and the filter inductor increase with reduced filter capacitance, thereby degrading the converter efficiency. Note that the conduction and switching components of the MOSFET power dissipation are equal at the optimum transistor width. Both power components increase due to increasing $f_s$ and MOSFET series resistance $R_{on}$ as the filter capacitance is reduced.
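The entries of Table 5.1 can be checked for internal consistency. The sketch below infers the inductor current ripple of each row from the listed L and $f_s$, and then recomputes the filter capacitance for a 5 mV output voltage ripple. It assumes that (4.8) has the form $L = (V_{DD1} - V_{DD2})D / (2\Delta i f_s)$ with $D = V_{DD2}/V_{DD1}$ and that (4.9) has the form $C = \Delta i / (8 f_s \Delta V_{DD2})$. Neither equation is reproduced in this chapter, so these forms and all names in the code are assumptions.

```python
V_DD1, V_DD2 = 1.2, 0.9   # supply voltages (V)
DELTA_V = 5e-3            # allowed output voltage ripple (V)

# Rows of Table 5.1: (C in nF, f_s in Hz, L in H).
TABLE_5_1 = [
    (1.0, 3174e6, 279e-12),
    (10.0, 1227e6, 187e-12),
    (100.0, 477e6, 124e-12),
]


def implied_ripple(l_filter, f_s):
    """Current ripple implied by the assumed form of (4.8)."""
    duty = V_DD2 / V_DD1
    return (V_DD1 - V_DD2) * duty / (2.0 * l_filter * f_s)


def implied_capacitance(delta_i, f_s):
    """Filter capacitance implied by the assumed form of (4.9)."""
    return delta_i / (8.0 * f_s * DELTA_V)


for c_nf, f_s, l_filter in TABLE_5_1:
    delta_i = implied_ripple(l_filter, f_s)
    c_calc_nf = implied_capacitance(delta_i, f_s) * 1e9
    print(f"C = {c_nf:5.0f} nF: implied ripple = {delta_i:.2f} A, "
          f"recomputed C = {c_calc_nf:.2f} nF")
```

Under these assumptions the recomputed capacitance is within 1% of the tabulated value in each row, and the implied current ripple falls from roughly 1.9 amperes at 100 nF to roughly 0.13 amperes at 1 nF.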

With this analysis, a design space is presented that supports full integration of a high efficiency buck converter onto a microprocessor die. With further capacitor space available on the microprocessor die, the attainable efficiency increases towards the global maximum efficiency of 92% as described in Section 5.2.1. Another advantage of a higher filter capacitance is the lower switching frequency requirement, thereby improving circuit reliability and making the design of the pulse width modulation circuitry less complicated.

#### 5.2.3. Output Voltage Ripple Constraint

In an external (non-integrated) DC-DC converter, as the current demand of the microprocessor varies during operation with changing circuit activity level, the voltage supplied to the load also varies due to the resistance of the interconnect between the converter output and the microprocessor input. A droop window of 10% is, typically, allowed as the microprocessor current demand steps from a minimum (caused by standby leakage current) to a maximum. The external wiring (the interconnect between the converter output and the on-chip power distribution network) that exists in an external DC-DC converter does not occur in an on-die DC-DC converter. A larger portion of the acceptable 10% voltage drop window can therefore be applied to the output voltage ripple of an integrated DC-DC converter.

The effect of increasing the output voltage ripple on the circuit configuration and efficiency characteristics of a buck converter is examined in this section. The output voltage ripple $\Delta V_{DD2}$ is increased from 5 mV (the value assumed in Sections 5.2.1 and 5.2.2) to 25 mV. The filter capacitance C is also increased from 1 nF to 100 nF. The maximum efficiency attainable with each $\Delta V_{DD2}$ and C pair is shown in Fig. 5.7a. The switching frequency and filter inductance of the buck converter circuit configuration offering the highest efficiency are shown in Figs. 5.7b and 5.8a, respectively. The filter inductor and MOSFET components of the total power dissipation of a buck converter are illustrated in Fig. 5.8b.



Fig. 5.7. Variation of maximum efficiency and switching frequency of a buck converter with filter capacitance C (1 nF < C < 100 nF) and output voltage ripple $\Delta V_{DD2}$ (5 mV < $\Delta V_{DD2}$ < 25 mV). (a) Maximum efficiency. (b) Switching frequency.





Fig. 5.8. Variation of filter inductance and MOSFET and inductor related power components of a buck converter with filter capacitance C (1 nF < C < 100 nF) and output voltage ripple $\Delta V_{DD2}$ (5 mV < $\Delta V_{DD2}$ < 25 mV). (a) Filter inductance. (b) Total MOSFET and inductor related power components.

As shown in Figs. 5.7b and 5.8a, increasing the output voltage ripple reduces the switching frequency and filter inductance required to satisfy the DC-DC converter output voltage and current specifications for a fixed filter capacitance. With decreased switching frequency and filter inductance, both the MOSFET and inductor related components of the total power dissipation of a buck converter are reduced, as shown in Fig. 5.8b. The efficiency attained by a limited filter capacitance, therefore, increases by relaxing the output voltage ripple constraint. Moreover, as the required filter inductance is reduced, the die area required for the integrated filter inductor becomes smaller. Similarly, as the required switching frequency is reduced, the circuit reliability increases while the design of the pulse width modulation circuit becomes less complicated. As shown in Fig. 5.7a, the maximum achievable efficiency increases by up to 7.9% as the output voltage ripple is increased from 5 mV to 25 mV. Similarly, the filter inductance and switching frequency required for a corresponding maximum efficiency configuration are reduced by 24% and 48.7%, respectively, as  $\Delta V_{DD2}$  is increased from 5 mV to 25 mV.

#### 5.3. Simulation Results

The buck converter circuit configuration that produces the maximum efficiency (see Table 5.1) with a filter capacitance of 100 nF and an output voltage ripple of 5 mV is evaluated assuming an 80 nm CMOS technology. The analytical expression [see (5.13)] for the total power consumption of a buck converter is effective in estimating the circuit efficiency characteristics. The buck converter efficiency as determined by simulation at the target design point is 86% which only differs by 2.4% from the efficiency determined from the analytic expression.

The converter output voltage (which supplies 9.5 amperes of DC current to the load) is shown in Fig. 5.9a. The peak-to-peak output voltage ripple is actually lower than the analytic expectation of 10 mV. This behavior is noted since the voltage drop across the equivalent parasitic resistance of the power MOSFETs and the filter inductor has been neglected during the steady-state analysis used in the development of (4.8) and (4.9).



Fig. 5.9. Simulation waveforms of a buck converter for C = 100 nF. (a) Output voltage ripple  $V_{ripple}(t)$ . (b) Output response of a buck converter to a change in load current from  $I_{min}$  to I. (c) Output response of a buck converter to a step current changing between  $I_{min}$  and I.

The response of the buck converter to changes in the current demand (between the minimum and the maximum) at the load has also been evaluated. A 10% output voltage window is allowed as the average current demand of the microprocessor swings from a minimum ( $I_{min}$ ) to a maximum (I). The minimum current demand $I_{min}$ is caused by leakage current when the microprocessor is idle and is assumed to be 25% of the maximum current demand I [26], [30]. The waveforms illustrating the DC-DC converter output response for a current step from $I_{min}$ to I are shown in Figs. 5.9b and 5.9c. As shown in Fig. 5.9b, the response time for the buck converter to settle within the allowed 10% voltage window after the microprocessor transitions to the maximum current mode from the idle mode is 87 ns. One solution that provides a stable voltage to the microprocessor until the buck converter output settles within the 10% window is to use several high speed linear regulators distributed around the microprocessor die. These regulators are activated whenever the buck converter output voltage drops below the lower limit of the 10% window. The linear regulator circuits are intrinsically low efficiency voltage converters [for a detailed discussion of the linear DC-DC converters see Chapter 4]. These large current steps, however, do not occur frequently and the linear regulators are only active for a brief amount of time (a worst case time of 87 ns) until the buck converter output settles within the 10% voltage droop window. The overall impact of these linear regulators on the energy dissipation of the microprocessor is, therefore, small.

#### 5.4. Chapter Summary

An analysis of the power characteristics of a standard switching DC-DC converter topology, a buck converter, is provided in this chapter. A parasitic model of a buck converter is presented. With this model, a closed form expression for the total power dissipation of a buck converter is proposed. An analysis over a range of design parameters is performed, permitting the development of a design space for full integration of active and passive devices on the same die for a target CMOS technology.

Two major challenges for a monolithic switching DC-DC converter are the area occupied by the integrated filter capacitor and the effect of the parasitic impedance characteristics of the integrated inductor on the overall efficiency characteristics of a switching DC-DC converter. A high switching frequency is the key design parameter that enables the integration of a high efficiency buck converter on the same die as a dual- $V_{\rm DD}$  microprocessor.

It is shown that an optimum switching frequency and inductor current ripple pair that maximizes the efficiency of a buck converter exists for a target technology. The global maximum efficiency is 92% at a switching frequency of 114 MHz and a current ripple of 9.5 amperes, assuming an 80-nm CMOS technology. The required filter capacitance and inductance at this operating point are 2083 nF and 104 pH, respectively.

The effects of reducing the filter capacitance due to the tight area constraints on a microprocessor die are examined. An efficiency of 88.4% is shown at a switching frequency of 477 MHz with a filter capacitance of 100 nF. The area occupied by the buck converter is 12.6 mm<sup>2</sup> and is dominated by the area of the integrated filter capacitor. The analytic model for the converter efficiency is within 2.4% of the simulation results at the target design point.

The output voltage ripple can be increased in a fully integrated DC-DC converter while maintaining the same 10% output voltage droop window as a non-integrated DC-DC converter. It is shown that the maximum attainable efficiency increases by up to 7.9% as the output voltage ripple is increased from 5 mV to 25 mV. Similarly, the filter inductance and switching frequency required for maximizing the efficiency of a buck converter are reduced by 24% and 48.7%, respectively, with increasing $\Delta V_{DD2}$.

## Chapter 6

## Low Voltage Swing Monolithic DC-DC Conversion

In a typical non-integrated switching DC-DC converter, significant energy is dissipated in the parasitic impedances of the circuit board interconnect and among the discrete components of the regulator [26], [30], [101]. As the supply current of high performance microprocessors increases with technology scaling, the energy losses of the off-chip power generation and distribution increase, further degrading the efficiency of DC-DC converters. Integrating both the active and passive devices of a buck converter onto the same die as a dual-V<sub>DD</sub> microprocessor is proposed in Chapter 5 in order to improve efficiency, reduce manufacturing costs, and decrease the number of I/O pads dedicated for power delivery on the microprocessor die. A model is developed and an analysis is presented that describes a design space for full integration of active and passive devices onto the same die as a dual-V<sub>DD</sub> microprocessor.

As shown in Chapter 5, a high switching frequency is the key design parameter that enables the full integration of a high efficiency buck converter. At these high switching frequencies, the energy dissipated in the power MOSFETs and gate drivers dominates the total losses of a DC-DC converter [31]. The efficiency can, therefore, be improved by applying MOSFET power reduction techniques. A low swing MOSFET gate drive technique is proposed in this chapter that improves the efficiency of a DC-DC converter.

The model proposed in Chapter 5 provides an accurate representation of the parasitic losses of a full voltage swing buck converter (with an error of less than 2.4% as compared to simulation). The model proposed in Chapter 5, however, does not provide the flexibility to further optimize the efficiency of the buck converter by varying the driver tapering factors and gate voltages of the power MOSFETs. The independent variables of the buck converter power expressions proposed in Chapter 5 are the switching frequency $f_s$ and the inductor current ripple $\Delta i$. The buck converter model proposed in Chapter 5 assumes that the PMOS to NMOS width ratio within each MOSFET gate driver is two. Similarly, the tapering factor of the MOSFET gate drivers is assumed to be two, assuming a worst case energy efficiency analysis. The signal swing at all of the internal nodes of the buck converter is assumed to be full rail between ground and $V_{DD1}$. A more comprehensive parasitic model of a buck converter, that permits the individual optimization of the gate voltage swings and tapering factors, is necessary in order to achieve the objective of this chapter; that is to design a low voltage swing monolithic DC-DC converter with optimized efficiency for a specific CMOS technology.

A circuit model that permits the optimization of the input and output voltage swing and tapering factor of the power MOSFET gate drivers (in addition to the switching frequency and discrete component values) is described in this chapter. Closed form expressions that characterize the power consumption of a low voltage swing buck converter are presented. The gate voltages and tapering factors of the MOSFETs are included as independent parameters in the proposed model. With the buck converter energy model, a design space is presented which characterizes the integration of both active and passive devices onto the same die (assuming a 0.18 µm CMOS technology). Lowering the input and output voltage swing of the power MOSFET gate drivers is shown to be effective in enhancing the efficiency characteristics of a DC-DC converter.

An efficiency of 84.1% is demonstrated for a voltage conversion from 1.8 volts to 0.9 volts at the target design point for a full swing DC-DC converter. Expressions for estimating the efficiency of a full swing buck converter are within 0.3% of circuit simulation. It is shown that the power dissipation of a low swing DC-DC converter is reduced by 27.9% as compared to a full swing DC-DC converter. The maximum efficiency achieved with a low swing DC-DC converter is 88%, 3.9% higher than that achieved with a full swing DC-DC converter.
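The 27.9% reduction in power dissipation follows directly from the two efficiency values through the efficiency definition of (5.14). The sketch below illustrates this arithmetic for a fixed load power. The function names are illustrative only.

```python
def loss_per_unit_load(eta_percent):
    """Converter loss normalized to the load power, from (5.14)."""
    return 100.0 / eta_percent - 1.0


def loss_reduction_percent(eta_before, eta_after):
    """Relative reduction (%) in converter loss at a fixed load power."""
    before = loss_per_unit_load(eta_before)
    after = loss_per_unit_load(eta_after)
    return 100.0 * (1.0 - after / before)


# Full swing (84.1%) versus low swing (88%) maximum efficiencies.
print(f"Loss reduction = {loss_reduction_percent(84.1, 88.0):.1f} %")
```

An efficiency improvement of 3.9 percentage points therefore corresponds to a much larger relative reduction in the internal losses of the converter.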

The chapter is organized as follows. The proposed variable voltage swing and tapering factor DC-DC converter circuit model and closed form expressions characterizing the average power dissipation of a buck converter are presented in Section 6.1. With this model, the efficiency characteristics of a low voltage swing buck converter are analyzed in Section 6.2. A summary of the research results presented in this chapter is provided in Section 6.3.

#### 6.1. Circuit Model of a Low Voltage Swing Buck Converter

A circuit model has been developed to analyze the efficiency characteristics of a low swing buck converter. The proposed circuit model for the parasitic impedances of a buck converter is shown in Fig. 6.1.



Fig. 6.1. Parasitic impedances and transistor geometric sizes of a buck converter.

The power consumed by a buck converter is due to a combination of conduction losses caused by the parasitic resistive impedances and switching losses due to the parasitic capacitive impedances of the circuit components. As discussed in Chapter 5, the power consumed by the pulse width modulation feedback circuit and the integrated filter capacitor is typically small as compared to the power consumed by the power train (the power MOSFETs, MOSFET gate drivers, and the filter inductor). Therefore, only the power dissipation of the power train components is considered in the efficiency analysis.

The MOSFET related power losses are analyzed in Section 6.1.1. The MOSFET model used during the analysis is discussed in Section 6.1.2. An analysis of the filter inductor related losses is presented in Section 6.1.3.

#### 6.1.1. MOSFET Power Dissipation

The total power loss of a MOSFET is a combination of conduction losses and dynamic switching losses. The conduction power is dissipated in the series resistance of the transistors operating in the active region. The dynamic power is dissipated each switching cycle while charging/discharging the gate oxide, gate-to-source/drain overlap, and drain-to-body junction capacitances of the MOSFETs.

As shown in Fig. 6.1, the buffers driving $P_1$ have a ground voltage of $V_{gp}$ where $0 \le V_{gp} < (V_{DD1} + V_{tp})$. The unit energy (per 1 $\mu$m wide power MOSFET) dissipated in the drivers of $P_1$, assuming ap > (b+1), is

$$E_{PMOSdrivers} \cong \frac{1}{ap - b - 1} (bC_{0PMOS} + C_{0NMOS}) (V_{DD1} - V_{gp})^2,\tag{6.1}$$

$$C_{0NMOS} = C_{ox0NMOS} + 2C_{gd0NMOS} + C_{gs0NMOS} + C_{db0NMOS},\tag{6.2}$$

$$C_{0PMOS} = C_{ox0PMOS} + 2C_{gd0PMOS} + C_{gs0PMOS} + C_{db0PMOS},\tag{6.3}$$

where  $C_{ox0}$ ,  $C_{gs0}$ ,  $C_{gd0}$ , and  $C_{db0}$  are the gate oxide, gate-to-source overlap, gate-to-drain overlap, and the drain-to-body junction capacitances, respectively, of a 1  $\mu$ m wide transistor, ap is the tapering factor of the buffers driving  $P_1$ , and b is the PMOS to NMOS transistor width ratio within each inverter (see Fig. 6.1).

The voltage swing at the gate of $P_1$ is between $V_{gp}$ and $V_{DD1}$. The dynamic energy dissipated during a full switching cycle to charge/discharge the parasitic capacitances of a 1 $\mu$m wide P-type power transistor is

$$E_{P1} = \begin{bmatrix} (C_{ox0PMOS} + C_{gs0PMOS})(V_{DD1} - V_{gp})^2 + \\ 2C_{gd0PMOS}(-V_{DD1}V_{gp} + V_{DD1}^2 + \frac{V_{gp}^2}{2}) + C_{db0PMOS}V_{DD1}^2 \end{bmatrix}.\tag{6.4}$$

Combining (6.1), (6.4), and the conduction power dissipated by the effective series resistance of  $P_1$ , the total power dissipation related to  $P_1$  is

$$P_{P1TOTAL} = \frac{R_{0PMOS}}{W_{P1}} i_{rmsPMOS}^2 + W_{P1} E_{P1TOTALswitching} f_s,\tag{6.5}$$

$$E_{P1TOTALswitching} = E_{P1} + E_{PMOSdrivers},\tag{6.6}$$

$$i_{rmsPMOS} = \sqrt{D(I^2 + \frac{\Delta i^2}{3})},\tag{6.7}$$

where $R_{0PMOS}$ is the effective series resistance of a 1 $\mu$m wide PMOS transistor, $W_{P1}$ is the width of $P_1$, $f_s$ is the switching frequency of the buck converter, D is the duty cycle of the signal generated at Node<sub>1</sub> (see Fig. 6.1), I is the DC current supplied to the microprocessor, and $\Delta i$ is the current ripple of the filter inductor.

As shown in Fig. 6.1, the buffers driving $N_1$ have a supply voltage of $V_{gn}$ ( $V_{tn} < V_{gn} \le V_{DD1}$ ). The unit energy (per 1 $\mu$m wide power MOSFET) dissipated in these buffers, assuming an > (b+1), is

$$E_{NMOSdrivers} \cong \frac{1}{an - b - 1} (bC_{0PMOS} + C_{0NMOS}) V_{gn}^2, \tag{6.8}$$

where an is the tapering factor of the  $N_1$  gate drivers.

The voltage swing at the gate of  $N_1$  is between ground (0 volts) and  $V_{gn}$ . The dynamic energy dissipated during a full switching cycle to charge/discharge the parasitic capacitances of a 1  $\mu$ m wide N-type power transistor is

$$E_{N1} = \begin{bmatrix} (C_{ox0NMOS} + C_{gs0NMOS} + C_{gd0NMOS})V_{gn}^{2} + \\ (C_{gd0NMOS} + C_{db0NMOS})V_{DD1}^{2} \end{bmatrix}.\tag{6.9}$$
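The effect of the gate voltage swing on the switching energy of the power transistors can be illustrated by evaluating (6.4) and (6.9) directly. Note that the gate-to-drain term of (6.4) can be rewritten as $C_{gd0PMOS}[(V_{DD1} - V_{gp})^2 + V_{DD1}^2]$, which mirrors the structure of (6.9). The capacitance values in the sketch below are hypothetical placeholders (in farads per 1 $\mu$m of transistor width) and are not taken from the target technology.

```python
def e_p1(c_ox, c_gs, c_gd, c_db, v_dd1, v_gp):
    """Switching energy per cycle of a 1 um wide P-type power transistor, (6.4)."""
    return ((c_ox + c_gs) * (v_dd1 - v_gp) ** 2
            + 2.0 * c_gd * (-v_dd1 * v_gp + v_dd1 ** 2 + v_gp ** 2 / 2.0)
            + c_db * v_dd1 ** 2)


def e_n1(c_ox, c_gs, c_gd, c_db, v_dd1, v_gn):
    """Switching energy per cycle of a 1 um wide N-type power transistor, (6.9)."""
    return (c_ox + c_gs + c_gd) * v_gn ** 2 + (c_gd + c_db) * v_dd1 ** 2


# Hypothetical capacitances per 1 um width (F): C_ox0, C_gs0, C_gd0, C_db0.
CAPS = (1.5e-15, 0.3e-15, 0.3e-15, 1.0e-15)
V_DD1 = 1.8

for v_gp in (0.0, 0.3, 0.6):
    print(f"V_gp = {v_gp:.1f} V: E_P1 = {e_p1(*CAPS, V_DD1, v_gp) * 1e15:.2f} fJ")
for v_gn in (1.8, 1.5, 1.2):
    print(f"V_gn = {v_gn:.1f} V: E_N1 = {e_n1(*CAPS, V_DD1, v_gn) * 1e15:.2f} fJ")
```

For identical capacitances the two expressions coincide in the full swing case ($V_{gp}$ = 0 and $V_{gn}$ = $V_{DD1}$), and both energies decrease as the gate voltage swing is reduced. The drain-to-body term is unaffected by the gate swing.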

Combining (6.8), (6.9), and the conduction power dissipated in the effective series resistance of  $N_1$ , the total power dissipation related to  $N_1$  is

$$P_{N1TOTAL} = \frac{R_{0NMOS}}{W_{N1}} i_{rmsNMOS}^2 + W_{N1} E_{N1TOTALswitching} f_s, \tag{6.10}$$

$$E_{N1TOTALswitching} = E_{N1} + E_{NMOSdrivers},\tag{6.11}$$

$$i_{rmsNMOS} = \sqrt{(1-D)(I^2 + \frac{\Delta i^2}{3})},\tag{6.12}$$

where $R_{0NMOS}$ is the effective series resistance of a 1 $\mu$m wide NMOS transistor and $W_{N1}$ is the width of $N_1$.

As given by (6.5) and (6.10), increasing the MOSFET transistor width reduces the conduction losses while increasing the switching losses. An optimum MOSFET width, therefore, exists that minimizes the total MOSFET related power. The optimum transistor widths for  $N_1$  and  $P_1$ , respectively, are

$$W_{N1opt} = \sqrt{\frac{R_{0NMOS}i_{rmsNMOS}^2}{f_s E_{N1TOTALswitching}}},\tag{6.13}$$

$$W_{P1opt} = \sqrt{\frac{R_{0PMOS}i_{rmsPMOS}^2}{f_s E_{P1TOTALswitching}}}.\tag{6.14}$$
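The trade-off expressed by (6.5), (6.10), (6.13), and (6.14) can be sketched numerically. The following Python example evaluates the conduction and switching power of a power MOSFET and the corresponding optimum width. The unit resistance and unit switching energy are hypothetical placeholder values, chosen only to illustrate the shape of the trade-off, and the function names are not part of the original model.

```python
import math


def rms_current(duty, i_dc, delta_i):
    """(6.7) with duty = D for P1, or (6.12) with duty = 1 - D for N1."""
    return math.sqrt(duty * (i_dc ** 2 + delta_i ** 2 / 3.0))


def mosfet_power(width, r0, i_rms, e_sw, f_s):
    """(6.5)/(6.10): returns (conduction, switching) power in watts.

    width is in um, r0 in ohm*um, and e_sw in joules per um per cycle.
    """
    return r0 / width * i_rms ** 2, width * e_sw * f_s


def optimum_width(r0, i_rms, e_sw, f_s):
    """(6.13)/(6.14): width (um) that minimizes the total MOSFET power."""
    return math.sqrt(r0 * i_rms ** 2 / (f_s * e_sw))


# Hypothetical device parameters (not from the target technology).
R0 = 2000.0    # effective series resistance of a 1 um wide transistor (ohm*um)
E_SW = 10e-15  # total switching energy per um of width per cycle (J/um)
F_S = 102e6    # switching frequency (Hz)

i_rms = rms_current(0.5, 0.25, 0.25)
w_opt = optimum_width(R0, i_rms, E_SW, F_S)
for scale in (0.5, 1.0, 2.0):
    p_cond, p_sw = mosfet_power(scale * w_opt, R0, i_rms, E_SW, F_S)
    print(f"W = {scale:.1f} x W_opt: conduction = {p_cond * 1e3:.2f} mW, "
          f"switching = {p_sw * 1e3:.2f} mW, total = {(p_cond + p_sw) * 1e3:.2f} mW")
```

At the optimum width the conduction and switching components are equal, so the minimum total MOSFET power is $2 i_{rms} \sqrt{R_0 E f_s}$ for the relevant transistor. Any departure from the optimum width increases the total.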

#### 6.1.2. MOSFET Model

A low swing MOSFET gate drive technique is investigated in this chapter to improve the efficiency of a DC-DC converter. At a reduced gate voltage, the effective series resistance of a MOSFET increases. As discussed in Section 6.1.1, conduction power dissipated in the series resistance of a power MOSFET constitutes a significant portion of the total MOSFET related power consumption in a buck converter (half of the total power dissipation of a power MOSFET with an optimized transistor width). An accurate MOSFET model is, therefore, required to evaluate the effective series resistance of the MOSFETs at each gate voltage within the range of analysis. The MOSFETs are modeled using the n<sup>th</sup> power law MOSFET model [91]. As shown in Fig. 6.2, the n<sup>th</sup> power law MOSFET model captures the dependence of the effective series resistance of the MOSFETs on the gate voltages. The worst case error of the model as compared to the simulation data is less than 10%.

#### 6.1.3. Filter Inductor Power Dissipation

Some portion of the total energy consumption of a buck converter occurs due to the series resistance and stray capacitance of the filter inductor. As shown in Chapter 5, the power dissipation in the integrated inductor dominates the total power losses of a buck converter at low switching frequencies.

The integrated filter inductor is a metal slab completely encapsulated by a magnetic material. The magnetic film surrounding the metal is an amorphous Cobalt-Tantalum-Zirconium (CoTaZr) alloy that exhibits a good high frequency response, small hysteresis losses, and can be integrated in a standard high temperature CMOS silicon process [119], [121].



Fig. 6.2. Variation of the effective series resistance of 1  $\mu m$  wide NMOS and PMOS transistors with gate-to-source voltage,  $V_{GS}$  ( $|V_{DS}| = 0.1$  volts).

In the following analysis it is assumed that the parasitic impedances of an integrated inductor scale linearly with the inductance (within the range of analysis) [121]. The total power dissipated in the filter inductor is

$$P_{inductor} = LR_{L0}i_{rms}^{2} + \frac{C_{L0}}{L}V_{DD1}^{2}f_{s},\tag{6.15}$$

$$L = \frac{(V_{DD1} - V_{DD2})D}{2\Delta i f_s},\tag{6.16}$$

where  $C_{L0}$  and  $R_{L0}$  are, respectively, the parasitic stray capacitance and parasitic series resistance per nH inductance and L is the filter inductance.

#### 6.2. Low Voltage Swing Buck Converter Analysis

The DC-DC converter provides 1.8 volts to 0.9 volts conversion while supplying a DC current of 250 mA per phase to the load in a 0.18 $\mu$m CMOS technology. The tapering factors of the P<sub>1</sub> and N<sub>1</sub> drivers are treated as independent variables and ap and an are assumed to be equal (a = an = ap). The PMOS to NMOS width ratio b within each MOSFET gate driver is assumed to be two. Using the model proposed in Section 6.1, the maximum efficiency attainable for each tapering factor ( $8 \le a \le 24$ ) is evaluated.

The switching frequency is the primary design variable used in the analysis. The efficiency of a buck converter is analyzed over the frequency range $10 \text{ MHz} \le f_s \le 1 \text{ GHz}$. At each tapering factor, the maximum attainable efficiency is evaluated over the switching frequency range, varying the circuit configuration. The maximum efficiency circuit configurations determined by the model are simulated to verify the circuit operation and performance characteristics.

In the first part of the analysis, the ground voltage of the power PMOS drivers $(V_{gp})$ and the power supply voltage of the power NMOS drivers $(V_{gn})$ (see Fig. 6.1) are fixed at 0 volts and 1.8 volts (full swing configuration), respectively. The maximum efficiency attainable with a full swing DC-DC converter is presented in Section 6.2.1. In the second part of the analysis, $V_{gp}$ and $V_{gn}$ are included as independent parameters of the global efficiency optimization process (low swing configuration). The maximum efficiency attainable with a low swing DC-DC converter is presented in Section 6.2.2.

#### 6.2.1. Full Swing Circuit Analysis for Global Maximum Efficiency

In the first part of the analysis,  $V_{gp}$  and  $V_{gn}$  (see Fig. 6.1) are fixed at 0 volts and 1.8 volts (full swing configuration), respectively. The maximum efficiency attainable with a full swing buck converter for each tapering factor is shown in Fig. 6.3. The global maximum efficiency attainable with a full swing DC-DC converter is 84.1% based on a tapering factor of 10. The switching frequency of the maximum efficiency configuration is 102 MHz. The analytic estimate of the efficiency for the full swing configuration is within 0.3% of the simulations.

The efficiency variation of a buck converter is shown in Fig. 6.4 for 10 mA $\leq \Delta i \leq$ 250 mA and 10 MHz $\leq f_s \leq$ 500 MHz ($a = 10$). The "z" axis represents the efficiency (%) in Fig. 6.4. Similar to the analysis made in Chapter 5 for an 80 nm CMOS technology, the efficiency of a buck converter is characterized by competing inductor and MOSFET losses. At low $f_s$ and $\Delta i$, the buck converter power is primarily dissipated in the filter inductor. As the switching frequency and current ripple are increased, the inductance is dramatically reduced, lowering the parasitic losses of the inductor. The MOSFET power increases, however, with increasing $f_s$ and $\Delta i$. At a certain range of $f_s$ and $\Delta i$, the inductor losses dominate the total losses. As shown in Fig. 6.4, the efficiency of a buck converter increases with increasing $f_s$ and $\Delta i$ in the range dominated by the inductor losses. After the peak efficiency is reached, increasing MOSFET losses begin to dominate the total power dissipation of a buck converter. Hence, the efficiency degrades with further increases in $f_s$ and $\Delta i$. An optimum switching frequency and inductor current ripple pair exists that maximizes the efficiency of a buck converter. The global maximum efficiency is 84.1% at a switching frequency of 102 MHz and a current ripple of 250 mA, assuming a 0.18 μm CMOS technology. The required filter inductance at this operating point is 8.8 nH.
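The quoted filter inductance can be cross-checked against the ideal buck converter ripple relation. The following is a minimal sketch, not the power model of this chapter: it assumes lossless switches, continuous conduction, an ideal duty cycle $D = V_{DD2}/V_{DD1}$, and that $\Delta i$ denotes the ripple amplitude (half of the peak-to-peak swing). The last assumption is chosen because it reproduces the 8.8 nH value; the function name `filter_inductance` is hypothetical.

```python
def filter_inductance(v_in, v_out, f_s, ripple_amplitude):
    """Ideal buck converter filter inductance in henries.

    Assumes continuous conduction, ideal switches, and a current ripple
    specified as an amplitude (half of the peak-to-peak swing).
    """
    duty = v_out / v_in                    # ideal duty cycle D = V_DD2 / V_DD1
    peak_to_peak = 2.0 * ripple_amplitude  # assumed ripple definition
    # During the on-time D / f_s the inductor sees (v_in - v_out).
    return (v_in - v_out) * duty / (peak_to_peak * f_s)


L = filter_inductance(v_in=1.8, v_out=0.9, f_s=102e6, ripple_amplitude=0.25)
print(f"L = {L * 1e9:.1f} nH")  # prints "L = 8.8 nH"
```

Under these assumptions, the 102 MHz and 250 mA operating point gives approximately 8.8 nH, consistent with the value reported for the maximum efficiency configuration.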



Fig. 6.3. The maximum efficiency attainable with a full swing (FS) buck converter circuit for different tapering factors.

In the full swing maximum efficiency circuit configuration, 62% of the total buck converter power is dissipated in the power MOSFETs (P<sub>1</sub> and N<sub>1</sub>) and the MOSFET gate driver buffers while 38% of the total power dissipation occurs in the parasitic impedances of the filter inductor. As most of the buck converter energy is dissipated in the MOSFETs, MOSFET related power reduction techniques can be effective in enhancing the efficiency characteristics of a DC-DC converter.



Fig. 6.4. Efficiency of a full swing buck converter as a function of the switching frequency  $(f_s)$  and inductor current ripple  $(\Delta i)$ .

#### 6.2.2. Low Swing Circuit Analysis for Global Maximum Efficiency

In the second part of the analysis, $V_{gp}$ and $V_{gn}$ are included in the optimization process as independent variables. The effect of reducing the voltage swing of the MOSFET gate driver buffers is explored. For $0 \le V_{gp} \le 1.2$ volts and 0.5 volts $\le V_{gn} \le 1.8$ volts, the optimal gate voltages are determined at each tapering factor $a$ ($8 \le a \le 24$). $V_{gp}$, $V_{gn}$, the switching frequency $f_s$, the filter inductance $L$, and the optimum MOSFET size of the maximum efficiency configurations are determined for each driver tapering factor $a$. The optimum $V_{gp}$ and $V_{gn}$ and the optimum transistor widths (of $P_1$ and $N_1$) that maximize efficiency for each $a$ are shown in Figs. 6.5 and 6.6, respectively. The optimum circuit configurations obtained from the model are simulated to verify operation. A comparison of the maximum efficiency attainable by a low swing DC-DC converter and a full swing DC-DC converter for each tapering factor is shown in Fig. 6.7.



Fig. 6.5. Optimum power supply voltage of the power NMOS drivers  $(V_{gn})$  and optimum ground voltage of the power PMOS drivers  $(V_{gp})$  that maximize the efficiency for different tapering factors.

The total power dissipation of the low swing buck converter is reduced by 27.9% as compared to the full swing maximum efficiency configuration by increasing $V_{gp}$ from 0 to 0.64 volts and lowering $V_{gn}$ from 1.8 to 1.13 volts. As shown in Fig. 6.7, the maximum efficiency for a low swing DC-DC converter is 88%, 3.9 percentage points higher than that achieved with a full swing DC-DC converter. The tapering factor, switching frequency, and filter inductance of the full swing and low swing circuit configurations with the maximum efficiency characteristics are listed in Table 6.1.
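The 27.9% reduction in power dissipation and the 3.9 point gain in efficiency can be related with a short calculation. As a sketch, assume that both configurations deliver the same output power $P_{out}$ and that the total power dissipation refers to the converter losses $P_{loss} = P_{in} - P_{out}$ (the subscripts $FS$ and $LS$ are introduced here for illustration):

$$\frac{P_{loss}}{P_{out}} = \frac{1}{\eta} - 1, \qquad \frac{P_{loss,LS}}{P_{loss,FS}} = \frac{1/0.880 - 1}{1/0.841 - 1} \approx \frac{0.1364}{0.1891} \approx 0.721$$

Under these assumptions, the losses of the low swing circuit are approximately 27.9% lower than the losses of the full swing circuit, in agreement with the reported reduction.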

The optimal circuit configurations with the highest efficiency characteristics change as the gate voltages are reduced from the full swing voltage. The effective series resistance of a MOSFET is increased while the total dynamic switching energy is decreased with reduced gate voltage. The optimum MOSFET width that minimizes the power dissipation, therefore, increases with a reduced gate voltage swing [as given by (6.13) and (6.14) and as shown in Fig. 6.6]. As shown in Fig. 6.8, the total transistor width of the power MOSFETs and gate drivers for the low swing circuit configuration with the highest efficiency is 23% larger as compared to the full swing circuit with the highest efficiency characteristics.



Fig. 6.6. A comparison of the optimum widths of the power PMOS and NMOS transistors that maximize the efficiency of the full swing (FS) and the low swing (LS) buck converters for different tapering factors.



Fig. 6.7. A comparison of the maximum efficiency attainable with the low swing (LS) and the full swing (FS) buck converter circuits for different tapering factors.

The proposed model does not include short-circuit currents in the MOSFET drivers. The model, therefore, produces an efficiency that increases monotonically with increasing a, as shown in Fig. 6.3. With increasing tapering factor, the dynamic switching power is reduced while the short-circuit currents increase [31], [120]. At a certain range of a, the dynamic switching energy losses dominate the total losses. As shown in Fig. 6.3, the efficiency of a buck converter increases with higher a in the range dominated by switching losses. After the peak efficiency is reached, the increasing short-circuit losses in the power MOSFET gate drivers begin to dominate the total power dissipation of the buck converter. Hence, the efficiency degrades with further increases in a. The optimum tapering factors are 10 and 16 for the full swing and low swing circuits, respectively.

TABLE 6.1 EFFICIENCY ( $\eta$ ) CHARACTERISTICS OF THE FULL SWING (FS) AND LOW SWING (LS) DC-DC CONVERTER CIRCUITS OBTAINED FROM THE POWER MODEL AND SIMULATION ( $V_{DD1}$  = 1.8 VOLTS AND C = 3 NF)

| Parameter            | FS Model | FS Simulation | LS Simulation |
|----------------------|----------|---------------|---------------|
| $V_{gp}$ (V)         | 0        | 0             | 0.64          |
| $V_{gn}$ (V)         | 1.8      | 1.8           | 1.13          |
| $f_s$ (MHz)          | 102      | 102           | 102           |
| $L$ (nH)             | 8.8      | 8.8           | 8.8           |
| $a$                  | 10       | 10            | 16            |
| Maximum $\eta$ (%)   | 84.4     | 84.1          | 88.0          |
| Power reduction      | N/A      | N/A           | 27.9%         |
| $\eta$ difference    | +0.3%    | 0             | +3.9%         |

#### 6.3. Chapter Summary

A low voltage swing MOSFET gate drive technique is proposed in this chapter for enhancing the efficiency characteristics of high frequency monolithic DC-DC converters. An analysis of the power characteristics of a standard switching DC-DC converter topology, a buck converter, is provided. Closed form expressions for the total power dissipation of a low swing buck converter are proposed. A range of design parameters is evaluated, permitting the development of a design space for full integration of active and passive devices onto the same die for a target CMOS technology.



Fig. 6.8. A comparison of the total transistor width (including the widths of the transistors within the gate drivers) of the low swing (LS) and the full swing (FS) buck converter circuits with the highest efficiency characteristics for different tapering factors.

The effect of reducing the MOSFET gate voltage swings is explored with the proposed circuit model. The optimum gate voltage swings of the power MOSFETs that maximize efficiency are shown to be lower than the standard full voltage swing. An efficiency of 84.1% is demonstrated for a voltage conversion from 1.8 volts to 0.9 volts with a full swing monolithic buck converter operating at 102 MHz assuming a 0.18 µm CMOS technology. It is shown that the power dissipation of a low swing buck converter is reduced by 27.9% as compared to this full swing maximum efficiency configuration by increasing the ground voltage of the power PMOS drivers to 0.64 volts and lowering the power supply voltage of the power NMOS drivers to 1.13 volts. The maximum efficiency achieved with a low swing DC-DC converter is 88%, 3.9 percentage points higher than that achieved with a full swing DC-DC converter.

# Chapter 7

# **High Input Voltage Step-Down DC-DC Converters for Integration in a Low Voltage CMOS Process**

Microprocessors, with higher power consumption and lower supply voltages, demand greater amounts of current from external power supplies with each new technology generation. Voltages significantly higher than current board level voltages will become necessary to efficiently deliver greater levels of power to future high performance integrated circuits [31]. Distributing power at a higher voltage to the input pads of an integrated circuit reduces the supply current. At a reduced supply current, the resistive voltage drops and parasitic power dissipation of the off-chip power distribution network are reduced, thereby enhancing the energy efficiency and quality of the distributed voltage [30], [31], [101]. Once the required energy reaches the input pads of a microprocessor, a lower supply voltage for the microprocessor circuitry can be generated by a monolithic DC-DC converter on the same die as the microprocessor, as shown in Fig. 7.1.

As discussed in Chapters 5 and 6, monolithic DC-DC conversion on the same die as the load provides several desirable aspects. In a typical non-integrated switching DC-DC converter (as shown in Fig. 7.2), significant energy is dissipated in the parasitic impedances of the circuit board interconnect and among the discrete components of the regulator [30], [31]. As microprocessor current demands increase, the energy losses of the off-chip power generation increase, further degrading the efficiency of the DC-DC converters. Integrating both the active and passive devices of a DC-DC converter onto the same die as a microprocessor improves energy efficiency, enhances the quality of the voltage regulation, and decreases the number of I/O pads dedicated to power delivery on the microprocessor die. Furthermore, by employing an integrated circuit technology, the reliability of the voltage conversion circuitry can be enhanced, area can be reduced, and the overall cost of the DC-DC converter can be decreased as compared to a discrete DC-DC converter [30], [31].



Fig. 7.1. High voltage off-chip power delivery and on-chip DC-DC conversion.

Due to the advantages of high voltage power delivery on a circuit board and monolithic DC-DC conversion, next generation low voltage and high power microprocessors are likely to require high input voltage, large step down DC-DC converters monolithically integrated onto the same die. The voltage conversion ratios attainable with standard non-isolated switching DC-DC converter circuits are, however, limited due to MOSFET reliability issues. In a standard buck converter circuit, as shown in Fig. 7.2, the input voltage V<sub>DD1</sub> is limited to the maximum voltage V<sub>max</sub> that can be directly applied across the terminals of a MOSFET in a specific fabrication technology. Provided that a DC-DC converter is integrated onto the same die as a microprocessor (fabricated in a low voltage deep submicrometer CMOS technology), the range of input voltages that can be applied to a standard DC-DC converter circuit is further reduced. A standard non-isolated switching DC-DC converter topology such as the standard buck converter circuit shown in Fig. 7.2 is, therefore, not suitable for future high performance integrated circuits. High efficiency monolithic switching DC-DC converters that can generate very low operating voltages from a significantly higher board level distribution voltage in a scaled nanometer CMOS technology are highly desirable.



Fig. 7.2. Input voltage constraint in an off-chip buck converter circuit ( $V_{DD1} \leq V_{max}$ ).

A cascode bridge circuit that can be used in a monolithic switching DC-DC converter providing a high voltage conversion ratio is described in this chapter. This circuit can also be used as an I/O buffer to interface with circuits operating at significantly different voltages. The cascode circuit, when used as part of a voltage regulator, ensures that the voltages across the terminals of all of the MOSFETs in a DC-DC converter are maintained within the limits imposed by the available low voltage CMOS technologies. With the proposed circuit technique, high-to-low non-isolated switching DC-DC converters have been designed based on a 0.18 μm CMOS technology. An efficiency of 79.6% is demonstrated for a voltage conversion from 5.4 volts to 0.9 volts while supplying 250 mA of DC current.

The chapter is organized as follows. The proposed cascode bridge circuit is described in Section 7.1. The operation and simulation results of the two voltage converters based on the cascode bridge circuit technique are described in Section 7.2. A summary of the research results presented in this chapter is provided in Section 7.3.

#### 7.1. Cascode Bridge Circuit

A cascode bridge circuit is described in this section. The circuit can operate at input voltages higher than the maximum voltage ( $V_{max}$ ) that can be applied directly across the terminals of a MOSFET in a deep submicrometer low voltage CMOS technology. The proposed cascode bridge circuit is shown in Fig. 7.3.

The cascode bridge circuit generates an output signal swinging between ground and  $V_{DD1}$  ( $V_{DD1} > V_{max}$ ) from an input control signal swinging between ground and  $V_{DD4}$ . The cascode bridge circuit guarantees that the voltages applied between the gate-to-source, gate-to-drain, gate-to-body, drain-to-body, and source-to-body terminals of the MOSFETs do not exceed the maximum voltage difference  $V_{max}$  permitted in a CMOS technology. As shown in Fig. 7.3, the input supply voltage  $V_{DD1}$  can be as high as three times  $V_{max}$  while complying with steady state voltage constraints across the terminals of all of the MOSFETs.

In Fig. 7.3,  $V_{DD1} = 3V_{max}$ ,  $V_{DD3} = 2V_{max}$ , and  $V_{DD4} = V_{max}$ . The number of inverters that drive Node<sub>8</sub> is even while the number of inverters that drive Node<sub>6</sub> and Node<sub>10</sub> is odd. The proposed circuit behaves in the following manner. When the input control signal transitions low, Node<sub>8</sub> is pulled down to  $V_{max}$ . Node<sub>6</sub> is pulled up to  $3V_{max}$ , turning off  $P_1$ . Node<sub>10</sub> is pulled up to  $V_{max}$ , turning on  $N_1$ . The output transitions low through  $N_3$ ,  $N_2$ , and  $N_1$ . Node<sub>2</sub> and Node<sub>1</sub> are discharged to  $V_{max} + |V_{tp}|$  and  $2V_{max} + |V_{tp}|$ , respectively.

When the input control signal transitions high, Node<sub>8</sub> is pulled up to $2V_{max}$. Node<sub>10</sub> is pulled down to ground, cutting off N<sub>1</sub>. Node<sub>6</sub> is pulled down to $2V_{max}$, turning on P<sub>1</sub>. The output is pulled up to $3V_{max}$ through P<sub>1</sub>, P<sub>2</sub>, and P<sub>3</sub>. Node<sub>4</sub> and Node<sub>5</sub> are charged to $2V_{max} - V_{tn}$ and $V_{max} - V_{tn}$, respectively. The source and body terminals of P<sub>2</sub>, P<sub>3</sub>, N<sub>2</sub>, and N<sub>3</sub> are shorted to ensure that the maximum permitted source-to-body and drain-to-body junction reverse bias voltages and the maximum permitted gate-to-body voltage are not exceeded. With the proposed circuit technique, voltage differences between the terminals of all of the MOSFETs satisfy the steady-state voltage constraints dictated by a low voltage process technology while operating at high input and output voltages.
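The steady-state node voltages described above can be tabulated and checked against $V_{max}$. The sketch below is illustrative only. The connectivity in `STACK` is an assumption inferred from the quoted node voltages (gates of P<sub>2</sub> and N<sub>2</sub> tied to $V_{DD3}$ and $V_{DD4}$, gates of P<sub>3</sub> and N<sub>3</sub> driven by Node<sub>8</sub>) and is not taken from Fig. 7.3. The 0.5 volt threshold magnitudes, the grounded Node<sub>4</sub> and Node<sub>5</sub> in the output-low state, and all function names are also assumptions. Only gate-to-source and gate-to-drain differences are checked; junction and drain-to-source limits are not modeled.

```python
V_MAX = 1.8        # maximum terminal voltage of the 0.18 um process (from the text)
V_TN = V_TP = 0.5  # threshold voltage magnitudes: illustrative assumption

VDD1, VDD3, VDD4 = 3 * V_MAX, 2 * V_MAX, V_MAX

# Assumed connectivity (gate, source, drain) of the six stacked MOSFETs.
STACK = {
    "P1": ("node6", "vdd1", "node1"),
    "P2": ("vdd3", "node1", "node2"),
    "P3": ("node8", "node2", "out"),
    "N3": ("node8", "node4", "out"),
    "N2": ("vdd4", "node5", "node4"),
    "N1": ("node10", "gnd", "node5"),
}


def node_voltages(input_high):
    """Steady-state node voltages for each input state, as quoted in the text."""
    rails = {"vdd1": VDD1, "vdd3": VDD3, "vdd4": VDD4, "gnd": 0.0}
    if input_high:  # output pulled up to 3 * V_MAX
        state = {"node6": 2 * V_MAX, "node8": 2 * V_MAX, "node10": 0.0,
                 "node1": VDD1, "node2": VDD1, "out": VDD1,
                 "node4": 2 * V_MAX - V_TN, "node5": V_MAX - V_TN}
    else:           # output pulled down to ground
        state = {"node6": 3 * V_MAX, "node8": V_MAX, "node10": V_MAX,
                 "node1": 2 * V_MAX + V_TP, "node2": V_MAX + V_TP, "out": 0.0,
                 "node4": 0.0, "node5": 0.0}
    return {**rails, **state}


def worst_gate_stress(input_high):
    """Largest gate-to-source or gate-to-drain magnitude in the stack."""
    v = node_voltages(input_high)
    worst = 0.0
    for gate, source, drain in STACK.values():
        worst = max(worst, abs(v[gate] - v[source]), abs(v[gate] - v[drain]))
    return worst


for level in (False, True):
    stress = worst_gate_stress(level)
    print(f"input_high={level}: worst gate stress = {stress:.2f} V")
    assert stress <= V_MAX + 1e-9  # small tolerance for floating point rounding
```

With these assumed values, the largest gate-to-source or gate-to-drain difference equals $V_{max}$ in both states, even though the output swings over $3V_{max}$.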



Fig. 7.3. Cascode bridge circuit operating at an input supply voltage of  $V_{DD1} = 3V_{max}$  ( $V_{DD3} = 2V_{max}$  and  $V_{DD4} = V_{max}$ ).

#### 7.2. Large Step-Down Non-Isolated Switching DC-DC Converter

A step-down DC-DC converter based on the cascode bridge circuit is presented in this section. The operation of the DC-DC converter circuit is described in Section 7.2.1. Simulation results characterizing the maximum efficiency circuit configurations are presented in Section 7.2.2. A charge recycling mechanism in the cascode bridge circuit that significantly reduces the energy overhead of the proposed circuit technique is discussed in Section 7.2.3.

#### 7.2.1. Operation of the Cascode DC-DC Converter

A high-to-low DC-DC converter operating at an input supply voltage of $3V_{max}$ is shown in Fig. 7.4. The DC-DC converter circuit operates in the following manner. The pull-up ($P_1$, $P_2$, and $P_3$) and pull-down ($N_1$, $N_2$, and $N_3$) network transistors produce an AC signal at Node<sub>3</sub> by a switching action controlled by a pulse width modulator. The AC signal at Node<sub>3</sub> is applied to a second order low pass filter composed of an inductor and a capacitor. The low pass filter passes the DC component of the AC signal at Node<sub>3</sub> to the output. A small amount of high frequency harmonics (assuming the filter corner frequency is much smaller than the switching frequency $f_s$ of the DC-DC converter) generated by the switching action of the MOSFETs also reaches the output due to the non-ideal characteristics of the output filter.



Fig. 7.4. Proposed DC-DC converter circuit operating at an input supply voltage of  $V_{DD1} = 3V_{max}$  ( $V_{DD3} = 2V_{max}$ ,  $V_{DD4} = V_{max}$ , and  $V_{DD2} < V_{DD1}$ ).

The output voltage  $V_{DD2}(t)$  is

$$V_{DD2}(t) = V_{DD2} + V_{ripple}(t), \tag{7.1}$$

where  $V_{DD2}$  is the DC component of the output voltage and  $V_{ripple}(t)$  is the voltage ripple waveform observed at the output due to the nonideal characteristics of the output filter.

The DC component of the output voltage is

$$V_{DD2} = \frac{1}{T_s} \int_{0}^{T_s} V_s(t) dt = DV_{DD1}, \tag{7.2}$$

where $V_s(t)$ is the AC signal generated at Node<sub>3</sub> and $T_s$, $D$, and $V_{DD1}$ are the period, duty cycle, and amplitude, respectively, of $V_s(t)$. As given by (7.2), any positive DC output voltage less than $V_{DD1}$ can be generated by the proposed DC-DC converter by varying the switching duty cycle $D$ of the pull-up and pull-down network transistors.
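As a worked example of (7.2), the ideal duty cycles for a 0.9 volt output from 5.4 volt and 4.5 volt input supplies are given below. The subscripts 6:1 and 5:1 are labels introduced here for the two conversion ratios, and the calculation neglects the voltage drops across the MOSFETs and the filter inductor, so the duty cycle of a practical circuit is expected to be somewhat higher.

$$D = \frac{V_{DD2}}{V_{DD1}}, \qquad D_{6:1} = \frac{0.9}{5.4} \approx 0.167, \qquad D_{5:1} = \frac{0.9}{4.5} = 0.2$$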

#### 7.2.2. Efficiency Characteristics

Two high-to-low DC-DC converters have been designed based on the cascode bridge circuit. The maximum voltage that can be applied across the terminals of a MOSFET ( $V_{max}$ ) for a specific 0.18 µm CMOS technology is 1.8 volts. The DC-DC converter shown in Fig. 7.4 provides 5.4 volts ( $3V_{max}$ ) to 0.9 volts ( $V_{max}$ /2) conversion while supplying 250 mA per phase DC current to the load.

Another DC-DC converter circuit has been designed for 4.5 volts ($2.5V_{max}$) to 0.9 volts ($V_{max}/2$) conversion using a circuit topology similar to that shown in Fig. 7.4. For the $2.5V_{max}$ to $V_{max}/2$ conversion, $V_{DD3}$ and $V_{DD4}$ are $1.7V_{max}$ and $0.8V_{max}$, respectively, in order to enhance the gate drive of $P_1$, $P_2$, and $P_3$ and to further reduce the voltage stresses across the terminals of $N_1$, $N_2$, and $N_3$.

Both DC-DC converters have been simulated assuming a 0.18 µm CMOS technology. Circuit parameters are optimized to maximize efficiency while satisfying the output voltage and current requirements. The optimized circuit configurations offering the highest efficiency are listed in Table 7.1.

TABLE 7.1 CIRCUIT CHARACTERISTICS OF THE MAXIMUM EFFICIENCY DC-DC CONVERTERS

| Conversion Ratio            | $2.5V_{max} \rightarrow V_{max}/2$ (5:1)                                  | $3V_{max} \rightarrow V_{max}/2$ (6:1) |
|-----------------------------|---------------------------------------------------------------------------|-------------------------------------|
| V <sub>DD1</sub> (volts)    | 4.5                                                                       | 5.4                                 |
| V <sub>DD2</sub> (volts)    | 0.9                                                                       | 0.9                                 |
| V <sub>DD3</sub> (volts)    | 3.0                                                                       | 3.6                                 |
| V <sub>DD4</sub> (volts)    | 1.5                                                                       | 1.8                                 |
| I <sub>out</sub> (mA)       | 250                                                                       | 250                                 |
| Max η (%)                   | 79.4                                                                      | 79.6                                |
| f <sub>s</sub> (MHz)        | 97                                                                        | 97                                  |
| C (nF)                      | 3                                                                         | 3                                   |
| L (nH)                      | 14.8                                                                      | 15.5                                |
| W <sub>N1</sub> (mm)        | 5.2                                                                       | 5.3                                 |
| W <sub>P1</sub> (mm)        | 7.2                                                                       | 4.8                                 |
| Total Transistor Width (mm) | 41.3                                                                      | 34.1                                |
| I <sub>VDD3</sub> (μA)      | 7.6                                                                       | -178                                |
| I <sub>VDD4</sub> (μA)      | 205                                                                       | -186                                |

As listed in Table 7.1, an efficiency of 79.6% is achieved with the proposed DC-DC converter circuit for 5.4 volts to 0.9 volts conversion. The circuit operates at a switching frequency of 97 MHz. The filter capacitor and inductor of this maximum efficiency circuit configuration are 3 nF and 15.5 nH, respectively. Similarly, an efficiency of 79.4% is observed for 4.5 volts to 0.9 volts conversion. When the input supply voltage $V_{DD1}$ is reduced from 5.4 volts to 4.5 volts, the parasitic series resistances of the MOSFETs increase, raising the parasitic energy dissipation within the DC-DC converter. The efficiency achievable with the proposed DC-DC converter circuit is, therefore, slightly reduced when the conversion ratio is decreased from 6:1 to 5:1.
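The efficiencies in Table 7.1 can be translated into input power and dissipated power. The following sketch assumes that the efficiency is defined as $P_{out}/P_{in}$ with $P_{out} = V_{DD2} I_{out}$; the function name `input_power_and_loss` is hypothetical.

```python
def input_power_and_loss(v_out, i_out, efficiency):
    """Return (input power, dissipated power) in watts for a given efficiency."""
    p_out = v_out * i_out        # power delivered to the load
    p_in = p_out / efficiency    # efficiency assumed to be P_out / P_in
    return p_in, p_in - p_out


# Output voltage, load current, and efficiencies taken from Table 7.1.
for label, eta in (("5.4 V to 0.9 V", 0.796), ("4.5 V to 0.9 V", 0.794)):
    p_in, p_loss = input_power_and_loss(v_out=0.9, i_out=0.250, efficiency=eta)
    print(f"{label}: P_in = {p_in * 1e3:.1f} mW, P_loss = {p_loss * 1e3:.1f} mW")
```

Under this definition, both converters deliver 225 mW to the load and dissipate slightly less than 60 mW, and the 0.2 point difference in efficiency corresponds to a difference of less than 1 mW in dissipated power.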

#### 7.2.3. Charge Recycling Mechanism

These high efficiencies achieved for such high voltage conversion ratios (6:1 and 5:1) are attributed to a charge recycling mechanism that exists in the cascode bridge circuit. During any state changing transition of the pulse width modulator output (irrespective of the direction of the transition), a portion of the charge required by the inverters to drive Node<sub>8</sub> originates from discharging the parasitic capacitances of the gate drivers of $P_1$ rather than from the power supply $V_{DD3}$. Similarly, a significant portion of the charge required by the gate drivers of $N_1$ for a low-to-high output transition originates from discharging the output parasitic capacitances of the gate drivers of $P_3$ and $N_3$ rather than from the power supply $V_{DD4}$. Most of the charge drawn from $V_{DD1}$ during the low-to-high output transition of the buffers driving $P_1$ is initially recycled for use inside the buffers driving Node<sub>8</sub>. This charge is eventually recycled for use within the buffers driving $N_1$ before finally being discharged to ground.

As listed in Table 7.1, the average currents drawn from $V_{DD3}$ and $V_{DD4}$ are significantly smaller than the load current. The energy overhead of the two extra reference voltages required to properly operate the cascode bridge circuit is, therefore, small. $V_{DD3}$ and $V_{DD4}$ can be generated by simple integrated linear regulators without significantly affecting the overall efficiency of the DC-DC converters. For 5.4 volts to 0.9 volts conversion, the average currents supplied by $V_{DD3}$ and $V_{DD4}$ are negative, meaning that the two extra power supplies essentially sink rather than supply current. For 4.5 volts to 0.9 volts conversion, the average currents supplied by $V_{DD3}$ and $V_{DD4}$ are 7.6 $\mu$A and 205 $\mu$A, respectively. The primary purpose of $V_{DD3}$ and $V_{DD4}$ is, therefore, to maintain the voltages at Node<sub>7</sub> and Node<sub>9</sub> at $2V_{max}$ and $V_{max}$ ($1.7V_{max}$ and $0.8V_{max}$ for 4.5 volts to 0.9 volts conversion), respectively, rather than supplying current to the switching gate drivers.
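The size of this overhead can be estimated from the values in Table 7.1 for the 4.5 volts to 0.9 volts conversion. As a rough sketch, assume that $V_{DD3}$ and $V_{DD4}$ are ideal sources at 3.0 volts and 1.5 volts, so that any losses in the regulators generating these voltages are neglected (the symbols $P_{ref}$ and $P_{out}$ are introduced here for illustration):

$$P_{ref} = (3.0)(7.6\ \mu\text{A}) + (1.5)(205\ \mu\text{A}) \approx 0.33\ \text{mW}, \qquad \frac{P_{ref}}{P_{out}} \approx \frac{0.33\ \text{mW}}{(0.9)(250\ \text{mA})} \approx 0.15\%$$

Under these assumptions, the power drawn from the two reference supplies is a fraction of a percent of the 225 mW delivered to the load.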

#### 7.3. Chapter Summary

A cascode bridge circuit for use in a monolithic switching DC-DC converter with a high voltage conversion ratio is described in this chapter. The circuit can also be used as an I/O buffer to interface circuits operating at significantly different voltages without creating any MOSFET reliability issues due to the high voltage stress. The presented circuit, when used as part of a voltage regulator, ensures that the voltages across the terminals of all of the MOSFETs in a monolithic DC-DC converter are maintained within the limits imposed by available low voltage CMOS technologies.

High-to-low DC-DC converters have been designed based on the cascode bridge circuit. Reliable operation of the DC-DC converters operating at an input supply voltage up to three times as high as the maximum voltage  $(V_{max})$  that can be directly applied across the terminals of a MOSFET is verified assuming a 0.18  $\mu$ m CMOS technology. The energy overhead of the presented circuit technique is low due to a charge recycling mechanism in the MOSFET gate drivers. An efficiency of 79.6% is demonstrated for a voltage conversion from 5.4 volts to 0.9 volts while supplying 250 mA of DC current.

## **Chapter 8**

# Signal Transfer in Integrated Circuits with Multiple Supply Voltages

As discussed in Chapter 3, in a multiple supply voltage circuit, blocks that are required to operate at high speed utilize a higher supply voltage while those blocks for which speed is less critical operate at a lower supply voltage. Moreover, due to timing constraints, circuits operating at different supply voltages can exist on the non-critical delay paths in a multiple supply voltage circuit. When a low swing signal drives a CMOS circuit supplied by full rail supply and ground voltages, static DC power is dissipated as the transistors in both the pull-up and pull-down networks are simultaneously turned on. The output voltage swing of a static CMOS gate driven by low swing input signals also degrades. In order to transfer signals among these circuits operating at different voltage levels, specialized voltage interface circuits are required as illustrated in Fig. 8.1 [32].



Fig. 8.1. Signal transfer between circuit blocks in a multiple supply voltage integrated circuit.

Another application which requires specialized voltage level converters is the transmission and reception of low swing signals along long interconnects. At each new IC generation, the relative amount of interconnect increases due to the greater number of transistors and the larger die size. In many recent systems, charging and discharging these interconnect lines can require more than 50% of the total power consumed on-chip [125], [126]. In certain programmable logic devices, more than 90% of the total power consumption is due to the interconnect wires [125]. As described in [124]-[126], decreasing the signal voltage swing on the interconnect can significantly lower the power consumed.
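The dependence of the interconnect power on the signal swing can be illustrated with a first-order switching power estimate. The sketch below is an assumption rather than a model from [124]-[126]: it uses $P = \alpha C V^2 f$, assumes that the line driver is supplied from the reduced voltage, and borrows the 3.3 volt and 1.8 volt levels of Section 8.2 as example values. The wire capacitance, clock frequency, and function name are hypothetical.

```python
def switching_power(c_wire, v_swing, f_clk, activity=1.0):
    """First-order dynamic power of a wire driven rail to rail at v_swing."""
    return activity * c_wire * v_swing ** 2 * f_clk


C_WIRE = 1e-12  # 1 pF interconnect capacitance (illustrative)
F_CLK = 100e6   # 100 MHz switching frequency (illustrative)

p_full = switching_power(C_WIRE, 3.3, F_CLK)
p_low = switching_power(C_WIRE, 1.8, F_CLK)
saving = 1.0 - p_low / p_full
print(f"full swing: {p_full * 1e3:.3f} mW, low swing: {p_low * 1e3:.3f} mW")
print(f"saving: {saving:.1%}")
```

With these example values, lowering the swing from 3.3 volts to 1.8 volts reduces the estimated switching power by approximately 70%, independent of the assumed capacitance and frequency.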

A low swing interconnect architecture [126] is shown in Fig. 8.2. In this scheme, the circuit blocks operate at a high voltage for high throughput, while a low voltage swing signal is transmitted along the interconnect to decrease the power consumption. Voltage level converters are placed at the driver and receiver ends of this low swing interconnect architecture to change the voltage swing.



Fig. 8.2. Circuit architecture for low swing interconnect.

A level converter circuit must consume very low power in order to fully exploit the reduced power achieved by lowering the voltage. In order to not degrade the circuit operating speed, the voltage interface circuit must convert the input signal swing to the desired output signal swing with minimum delay [122], [124]. A simple CMOS interface circuit composed of two cascaded inverters is a standard circuit approach for converting voltage levels [122]-[126]. This circuit suffers from static power consumption and a non-full rail output voltage swing when converting a low voltage swing input to a high voltage swing output (such as the receiver end shown in Fig. 8.2) [32], [123], [126]. Specialized circuits are therefore required to efficiently convert voltage levels.
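The origin of the static power consumption can be illustrated with the voltage levels used in Section 8.2. As a sketch, consider a receiver inverter supplied by 3.3 volts and driven by a 1.8 volt swing input. The symbols $V_{SG,P}$, $V_{DD,rx}$, and $V_{in,high}$ are introduced here for illustration, and the PMOS threshold voltage magnitude $|V_{tp}|$ is assumed to be well below 1.5 volts:

$$V_{SG,P} = V_{DD,rx} - V_{in,high} = 3.3 - 1.8 = 1.5\ \text{volts} > |V_{tp}|$$

With the input at its high level, the NMOS transistor is on while the PMOS transistor is not cut off. A static current, therefore, flows from the supply to ground, and the output low level is degraded.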

A bi-directional CMOS voltage interface circuit that drives high capacitive loads to full swing at high speed while consuming no static DC power is presented in this chapter. The propagation delay, power consumption, and power efficiency characteristics of the proposed voltage interface circuit are compared to other interface circuits described in the literature [122], [123], [125], [126]. The proposed voltage interface circuit offers significant power savings and lower propagation delay as compared to these circuits.

This chapter is organized as follows. Operation of the proposed interface circuit is described in Section 8.1. Simulation results and a comparison with other converter circuits are presented in Section 8.2. Results from experimental test circuits are presented in Section 8.3. A summary of the key points presented in this chapter is provided in Section 8.4.

#### 8.1. A High Speed and Low Power Voltage Interface Circuit

The voltage interface circuit proposed in this chapter is shown in Fig. 8.3. The circuit provides bi-directional voltage level conversion. Therefore, without any change in circuit configuration, the interface circuit can be used at both the driver and receiver ends of a low voltage swing circuit architecture (see Fig. 8.2) to convert voltage levels from high to low and low to high.

In the proposed interface circuit,  $P_1$  is isolated from the input to minimize both the static power consumption and the propagation delay. As the pull-up and pull-down networks are never simultaneously on, the proposed voltage interface circuit consumes no static DC power while driving high capacitive loads to full swing  $(V_{DD2})$  at high speed.

In this circuit, only $I_1$ is supplied by $V_{DD1}$. The rest of the circuit (to the right of the demarcation line) is supplied by $V_{DD2}$. The circuit operates in the following manner. With a $0 \to 1$ transition at the input, node<sub>2</sub> is discharged through $N_1$. $P_2$ ensures that $P_1$ is cut off, and $I_2$ ensures that $P_3$ is cut off during the output transition, so that the short-circuit power consumption and output transition time are minimized. When node<sub>2</sub> becomes sufficiently low, the output transitions high. With a $1 \to 0$ transition at the input, node<sub>1</sub> goes high. Node<sub>4</sub> is pulled down to ground through $N_2$ and $N_3$ ($N_2$ is on before the input signal changes). As node<sub>4</sub> is discharged to ground, $P_1$ turns on, charging node<sub>2</sub>. When node<sub>2</sub> is sufficiently high, the output signal transitions low. There is a negative feedback path from node<sub>3</sub> to node<sub>4</sub> to node<sub>2</sub> through $I_2$, $P_4$, and $P_1$. $P_3$ preserves the output state after $P_1$ is cut off through the feedback path.



Fig. 8.3. The proposed voltage interface circuit.

#### 8.2. Voltage Interface Circuit Simulation Results

The voltage interface circuit proposed here is compared to selected voltage interface circuits published in the literature [122], [123], [125], [126]. These circuits are referred to by acronyms derived from the first letters of the last names of the authors who proposed the circuits. The circuit proposed in [122] (SF), the circuit proposed in [123] (CQ), and the circuit proposed here (KSF) are non-inverting while the asymmetric level converter circuit introduced in [125] (ZGR), and the symmetric level converter circuit introduced in [126] (NIITA) are inverting. To produce a fair comparison, an inverter is added to the output stages of ZGR and NIITA. The output stage inverter of each voltage interface circuit is sized the same.

Simulations are performed for a 0.18 µm CMOS technology. The two voltage levels are 1.8 volts and 3.3 volts. The simulations have only been carried out for level conversion from low to high since CQ, ZGR, and NIITA have been designed specifically for low-swing-to-high-swing voltage conversion. The input signal applied to each interface circuit is a 1 MHz square wave signal with a 1.8 volt swing and a 50% duty cycle. The input to output propagation delay is calculated from 50% of the input swing to 50% of the output swing. The average delay is the arithmetic mean of the high-to-low and low-to-high propagation delays. The average power consumption is calculated for a full cycle of the input waveform.
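As an illustration of the delay metric defined above, the following minimal Python sketch measures the 50%-to-50% propagation delays from sampled waveforms and averages them. The waveform samples, the function name `crossing_time`, and the use of linear interpolation between samples are assumptions introduced for illustration; they are not taken from the simulations reported in this chapter.

```python
def crossing_time(t, v, level, rising):
    """Return the first time at which v crosses `level` in the given direction.

    Linear interpolation between adjacent samples is assumed.
    """
    for i in range(len(t) - 1):
        a, b = v[i], v[i + 1]
        crosses_up = rising and (a < level <= b)
        crosses_down = (not rising) and (a > level >= b)
        if crosses_up or crosses_down:
            return t[i] + (level - a) * (t[i + 1] - t[i]) / (b - a)
    raise ValueError("no crossing found")


# Hypothetical piecewise-linear waveforms (time in ns, voltage in volts).
# The input swings 0 V to 1.8 V; the non-inverting output swings 0 V to 3.3 V.
t_ns = [0.0, 1.0, 3.0, 10.0, 11.0, 12.0, 20.0]
v_in = [0.0, 1.8, 1.8, 1.8, 0.0, 0.0, 0.0]
v_out = [0.0, 0.0, 3.3, 3.3, 3.3, 0.0, 0.0]

in_mid = 0.5 * 1.8   # 50% of the input swing
out_mid = 0.5 * 3.3  # 50% of the output swing

# Low-to-high delay: input rising edge to output rising edge.
t_plh = crossing_time(t_ns, v_out, out_mid, True) - crossing_time(t_ns, v_in, in_mid, True)
# High-to-low delay: input falling edge to output falling edge.
t_phl = crossing_time(t_ns, v_out, out_mid, False) - crossing_time(t_ns, v_in, in_mid, False)

# The average delay is the arithmetic mean of the two propagation delays.
t_avg = 0.5 * (t_plh + t_phl)
print(t_plh, t_phl, t_avg)
```

For an inverting circuit, the output edge direction passed to `crossing_time` would be reversed relative to the input edge.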

Each circuit is optimized to drive a 15 pF load. The load at the output of each interface circuit is swept from 1 pF to 15 pF in order to evaluate the delay and power characteristics. The average propagation delay versus load capacitance characteristics for each of the circuits are shown in Fig. 8.4. The average power consumption versus load capacitance characteristics are shown in Fig. 8.5.

The voltage interface circuit proposed here exhibits the minimum conversion delay among the target interface circuits. As shown in Fig. 8.4, KSF is 3.6 times faster than CQ, 1.9 times faster than SF, 1.2 times faster than ZGR, and 1.9 times faster than NIITA for a 1 pF load capacitance. The propagation delay of ZGR approaches the propagation delay of KSF with increasing load capacitance. However, ZGR displays poor power characteristics as compared to KSF.

The high speed operation of KSF produces no power penalty. Rather, as shown in Fig. 8.5, the proposed voltage interface circuit offers a significant power savings. KSF reduces the average power consumption by up to 57% as compared to CQ, by up to 24% as compared to SF, by up to 95% as compared to ZGR, and by up to 12% as compared to NIITA.



Fig. 8.4. Average delay versus load capacitance.



Fig. 8.5. Average power versus load capacitance.

To better understand the power characteristics of these voltage interface circuits, the power efficiency (defined as the ratio of the power delivered to the load to the total power consumed by the circuit) characteristics are shown in Fig. 8.6. The normalized area, maximum frequency of full swing operation (MFSO), and internal power consumption (excluding the power delivered to the load,  $C_L = 1$  pF) of each target circuit are listed in Table 8.1. The circuit area is evaluated assuming the area is proportional to the total transistor width. The area of each circuit is normalized with respect to the smallest circuit (NIITA). MFSO is defined as the maximum input signal frequency at which a full swing signal is observable at the output for a 1 pF load capacitance.



Fig. 8.6. Power efficiency versus load capacitance.

As shown in Fig. 8.6, the internal losses of the KSF circuit are quite small. The power efficiency of KSF ranges from 89.3% to 99.4% as the load is increased from 1 pF to 15 pF. The power efficiency of KSF is 10.3% higher than the power efficiency of NIITA for a 1 pF load. As the load capacitance is increased, the power efficiency of KSF and NIITA both improve and approach each other ($\sim$1% difference) since the internal losses of both circuits become negligible as compared to the power delivered to the load. However, the internal power loss of KSF is significantly lower than the internal power loss of NIITA over the entire range of load capacitances (55% lower for $C_L = 1$ pF and 47% lower for $C_L = 15$ pF).
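The power efficiency definition can be checked against the internal power values listed in Table 8.1 with a short calculation. The sketch below assumes that the power delivered to the load equals the full swing switching power $C_L V_{DD2}^2 f$ (with $C_L = 1$ pF, $V_{DD2} = 3.3$ volts, and $f = 1$ MHz); this expression and the function names are assumptions made for illustration rather than a statement of how the efficiency curves in Fig. 8.6 were produced.

```python
def load_power_uw(c_load_pf, v_swing, f_mhz):
    """Assumed load power C*V^2*f; pF * V^2 * MHz gives microwatts directly."""
    return c_load_pf * v_swing ** 2 * f_mhz


def efficiency(p_load, p_internal):
    """Ratio of the power delivered to the load to the total power consumed."""
    return p_load / (p_load + p_internal)


# Average internal power consumption (microwatts) from Table 8.1, C_L = 1 pF.
internal_uw = {"SF": 4.5, "CQ": 17.8, "ZGR": 257.1, "NIITA": 2.9, "KSF": 1.3}

p_load = load_power_uw(1.0, 3.3, 1.0)  # 1 pF load, 3.3 V swing, 1 MHz input
eff = {name: efficiency(p_load, p_int) for name, p_int in internal_uw.items()}

for name, value in eff.items():
    print(f"{name}: {100.0 * value:.1f}%")
```

Under this assumption, the calculation gives approximately 89.3% for KSF and approximately 79.0% for NIITA at a 1 pF load, in line with the KSF efficiency and the roughly 10% difference quoted above.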

The power consumed by each circuit increases linearly with the load capacitance (see Fig. 8.5). The internal losses of CQ and SF are primarily due to the short-circuit current at the output stage during the output signal transition. As the load capacitance increases, the output transition requires additional time, increasing the short-circuit current. Therefore, the slopes of the CQ and SF power curves are higher as compared to the other circuits. The worsening short-circuit power loss of SF degrades the efficiency as the load increases above 5 pF (see Fig. 8.6). ZGR suffers from significant static DC power loss when the input signal is high, therefore ZGR has the lowest power efficiency (the highest internal power loss).

TABLE 8.1 NORMALIZED AREA, MFSO, AND AVERAGE INTERNAL POWER CONSUMPTION OF EACH VOLTAGE INTERFACE CIRCUIT ( $C_L = 1$  PF)

| Circuit     | Area<br>(normalized) | MFSO<br>(MHz) | Power<br>(μW) |
|-------------|----------------------|---------------|---------------|
| SF [122]    | 2.8                  | 240           | 4.5           |
| CQ [123]    | 2.1                  | 200           | 17.8          |
| ZGR [125]   | 1.6                  | 590           | 257.1         |
| NIITA [126] | 1.0                  | 380           | 2.9           |
| KSF [32]    | 1.3                  | 610           | 1.3           |

As listed in Table 8.1, the proposed voltage interface circuit KSF occupies a small amount of area (second smallest) and offers the highest operating frequency range. KSF is operational up to an input frequency of 610 MHz (when driving a 1 pF output load). The MFSO is not directly related to the average delay shown in Fig. 8.4 since the MFSO is determined by the longest input to output full rail delay (rising or falling) of each circuit.

#### 8.3. Experimental Results

The interface circuit has been fabricated in a 3 µm CMOS technology. A microphotograph of the circuit is shown in Fig. 8.7.



Fig. 8.7. Microphotograph of the interface circuit.

The circuit has been experimentally evaluated with 5 volt and 10 volt power supplies. To verify the bi-directional operation of the circuit, the circuit has been evaluated for both low-to-high and high-to-low voltage interfaces. The experimental results are listed in Table 8.2. The waveforms obtained from the circuit tests are shown in Fig. 8.8 (the time axis is 500 ns/division, and the voltage axis is 5 volts/division).

The functional operation of the proposed interface circuit has also been experimentally verified. The propagation delays listed in Table 8.2 are higher than the simulation results (see Fig. 8.4) due to the voltage level [(1.8 volts and 3.3 volts) versus (5 volts and 10 volts)] and feature size (3 $\mu$m versus 0.18 $\mu$m) differences.



Fig. 8.8. Experimentally derived input and output voltage waveforms of the proposed voltage interface circuit. (a)  $10 \text{ V} \rightarrow 5 \text{ V}$  interface. (b)  $5 \text{ V} \rightarrow 10 \text{ V}$  interface.

TABLE 8.2 EXPERIMENTALLY MEASURED TEST RESULTS

| Voltage<br>Levels                      | Output $1 \rightarrow 0 \text{ (ns)}$ | Output $0 \rightarrow 1 \text{ (ns)}$ |
|----------------------------------------|---------------------------------------|---------------------------------------|
| $10 \text{ V} \rightarrow 5 \text{ V}$ | 190                                   | 80                                    |
| $5 \text{ V} \rightarrow 10 \text{ V}$ | 120                                   | 70                                    |

As listed in Table 8.2, the high-to-low propagation delay is longer than the low-to-high propagation delay for both the 5 V $\rightarrow$ 10 V and 10 V $\rightarrow$ 5 V interfaces. The critical node that determines the output transition time is node<sub>2</sub>. After a 0 $\rightarrow$ 1 transition at the input, the time to discharge node<sub>2</sub> depends only upon the response time of N<sub>1</sub>. Alternatively, after a 1 $\rightarrow$ 0 transition at the input, the time to charge node<sub>2</sub> depends upon the delay along the path I<sub>1</sub>, N<sub>3</sub>, N<sub>2</sub>, and P<sub>1</sub>, resulting in a longer delay.

#### 8.4. Chapter Summary

A bi-directional CMOS voltage interface circuit for signal transfer between circuits operating at different voltage levels is presented in this chapter. The circuit can also be used at the driving and receiving ends of long interconnect lines so as to lower the power consumption by propagating a smaller voltage swing signal along the line. Up to a 3.6 times delay improvement and up to a 95% power reduction are observed as compared to previously published schemes. The proposed voltage interface circuit operates at high speed while consuming no static DC power.

## Chapter 9

# Domino Logic with Variable Threshold Voltage Keeper

Domino logic circuit techniques are extensively applied in high performance microprocessors due to the superior speed and area characteristics of domino CMOS circuits as compared to static CMOS circuits [33], [34]. High speed operation of domino logic circuits is primarily due to the lower noise margins of domino circuits as compared to static gates. This desirable property of a lower noise margin, however, makes domino logic circuits highly sensitive to noise as compared to static gates. As on-chip noise becomes more severe with technology scaling and increasing operating frequencies, error free operation of domino logic circuits has become a major challenge [33], [34], [128]-[130].

Threshold voltage reduction accompanies supply voltage scaling, providing enhanced speed while maintaining dynamic power consumption within acceptable levels in each new integrated circuit technology generation. Scaling the threshold voltage, however, degrades the noise immunity of domino logic gates [33], [34]. Moreover, exponentially increasing subthreshold leakage currents with reduced threshold voltages have become an important issue threatening the reliable operation of deep submicrometer (DSM) dynamic circuits [33], [34], [128]-[130].

In a standard domino logic gate, a feedback keeper is employed to maintain the state of the dynamic node against coupling noise, charge sharing, and subthreshold leakage current. The keeper transistor is fully turned on at the beginning of the evaluation phase. Provided that the necessary input combination to discharge the dynamic node is applied, the keeper and pull-down network transistors compete to determine the logical state of the dynamic node. This contention between the keeper and the pull-down network transistors degrades the circuit speed and power characteristics. The keeper transistor is typically sized smaller than the pull-down network transistors in order to minimize the delay and power degradation due to the keeper contention current. A small keeper, however, cannot provide the necessary noise immunity for reliable operation in an increasingly noisy and noise sensitive on-chip environment [128]-[130]. There is, therefore, a tradeoff between reliability and high speed/energy efficient operation in domino logic circuits.

A variable threshold voltage keeper circuit technique is proposed in this chapter for simultaneous power reduction and speed enhancement of domino logic circuits. The current drive of the keeper transistor is dynamically adjusted with the proposed circuit technique. The threshold voltage of the keeper transistor is modified during circuit operation to reduce the contention current without sacrificing noise immunity. The variable threshold voltage keeper circuit technique is shown to enhance circuit evaluation speed by up to 60% while reducing power dissipation by 35% as compared to a standard domino logic circuit. The keeper size can be increased while preserving the same delay or power characteristics as compared to a standard domino circuit since the contention current is reduced with the proposed technique. The proposed domino logic circuit technique offers 14.1%, 8.9%, or 11.9% higher noise immunity under the same delay, power, or power-delay product conditions, respectively, as compared to a standard domino logic circuit technique. Forward body biasing the keeper transistor is also proposed for improved noise immunity as compared to a standard domino circuit with the same keeper size. It is shown that by applying forward and reverse body bias circuit techniques, the noise immunity and evaluation speed of domino logic circuits are both enhanced.

Challenges in the design of standard domino logic (SD) circuits are reviewed in Section 9.1. The operation of the domino logic with a variable threshold voltage keeper (DVTVK) circuit technique is described in Section 9.2. Simulation results characterizing the delay, power, and noise immunity of the DVTVK technique as compared to SD are presented in Section 9.3. A dynamically forward body biased keeper circuit technique for enhanced noise immunity is described in Section 9.4. The research results presented in this chapter are summarized in Section 9.5.

### 9.1. Standard Domino Logic Circuits

Performance critical paths in high performance integrated circuits are often implemented with domino logic circuits. Although domino logic circuit techniques are preferable in high speed circuits, the reliability of domino circuits is seriously degraded in nanometer technologies. The operating principles of domino logic circuits are reviewed in this section. Reliability issues threatening the correct operation of domino logic circuits together with some promising solutions recently proposed in the literature are reviewed. The basic operation of a standard domino logic circuit is described in Section 9.1.1. The noise immunity, signal delay, and energy dissipation tradeoffs in domino logic circuits are discussed in Section 9.1.2.

### 9.1.1. Operation of Standard Domino Logic (SD) Circuits

A standard footed domino gate is shown in Fig. 9.1a. Domino circuits behave in the following manner. When the clock signal is low, the domino logic circuit is in the precharge phase. During this phase, the dynamic node is charged to  $V_{DD1}$  by the pull-up transistor. The output transitions low, turning on the keeper transistor. When the clock transitions high, the circuit enters the evaluation phase. In this phase, provided that the necessary input combination to discharge the dynamic node is applied, the circuit evaluates and the dynamic node is discharged to ground. If the circuit does not evaluate in the evaluation phase, the high state of the dynamic node is preserved against coupling noise, charge sharing, and subthreshold leakage current by the keeper transistor until the pull-up transistor is turned on at the beginning of the following precharge phase.

The foot transistor (see Fig. 9.1) controlled by the clock signal divides the operation of a domino logic circuit into two distinct phases independent of the timing of the input signals. The isolation of the pull-down network from ground in the precharge phase eases the relative timing of the input and clock signals in cascaded multi-stage footed domino circuits. If the necessary input combination to discharge the dynamic node is applied during the precharge phase, the pull-down transistors cannot alter the state of the dynamic node as the pull-down path to ground is blocked by the foot transistor.



Fig. 9.1. Domino gates with standard keeper transistors. (a) Standard footed domino gate. (b) Standard clock-delayed footless domino logic circuit.

The foot transistor is a switch in series between the pull-down network and ground. This transistor has a non-zero resistance and parasitic capacitance that degrades the evaluation speed of a domino circuit. The foot transistor is typically sized significantly larger than the pull-down network transistors to minimize this speed degradation. Increasing the size of the foot transistor, however, increases the power dissipation since the foot transistor switches every clock cycle. Provided that the clock signal is appropriately delayed, the foot transistors can be omitted in a cascaded multistage domino circuit (as shown in Fig. 9.1b), reducing both the circuit evaluation delay and the power dissipation. The clock signal is intentionally delayed from one stage to the next stage in order to ensure that no short-circuit current path from the power supply to ground exists (formed by the pull-up and pull-down network transistors being simultaneously turned on). The clock signal driving a footless domino gate is delayed to transition low only after the previous stage domino gates are all precharged and the inputs to the footless domino gate are all low. Similarly, the inputs to a footless domino gate should transition high only after the clock signal at the gate transitions high and the evaluation phase begins [127]. Although more strict timing of the input and clock signals is required, the overall delay and power characteristics of a footless domino circuit are enhanced as compared to a standard footed domino circuit. Footless domino circuits are, therefore, increasingly popular in high speed integrated circuits [127]. Since the clock signal driving each domino gate is delayed, a multistage footless domino circuit is often categorized as a clock-delayed or delayed-reset domino circuit. Note that a first stage domino gate in a multi-stage clock-delayed domino circuit is typically footed as shown in Fig. 9.1b.

### 9.1.2. Noise Immunity, Delay, and Energy Tradeoffs in Domino Logic Circuits

As described in Section 9.1.1, the keeper transistor is fully turned on as the output goes low during the precharge phase. When the clock signal transitions high, the pull-up transistor turns off and the keeper transistor provides the only conductive path between the dynamic node and $V_{DD1}$, preserving the logical state of the dynamic node in the evaluation phase. Provided that the necessary input combination to discharge the dynamic node is applied during the evaluation phase, the keeper transistor opposes the evaluation of the inputs, degrading the speed and power characteristics of standard domino logic circuits. Current provided by the keeper transistor to charge the dynamic node while the pull-down network transistors are attempting to discharge the dynamic node is called contention current.

The effects of the keeper transistor on the noise immunity, evaluation delay, and power characteristics of domino logic circuits are evaluated assuming a $0.18~\mu m$ CMOS technology. The low noise margin (NML) is the noise immunity metric used in this chapter. The NML is defined as

$$NML = V_{IL} - V_{OL}, \tag{9.1}$$

where  $V_{IL}$  is the input low voltage defined as the smaller of the DC input voltages on the voltage transfer characteristic (VTC) at which the rate of change of the dynamic node voltage with respect to the input voltage is equal to one (the unity gain point on the VTC).  $V_{OL}$  is the output low voltage.
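The definition of the NML can be illustrated numerically. The following minimal Python sketch locates the unity gain point on a sampled VTC and evaluates (9.1). The VTC used here is a synthetic hyperbolic tangent curve chosen only for illustration and is not simulation data from this chapter. The sketch also assumes that the dynamic node voltage falls as the input rises, so the magnitude of the slope is compared to one, and that $V_{OL}$ can be read as the dynamic node voltage with the input at the supply voltage. The function name and all numerical parameters are hypothetical.

```python
import math


def unity_gain_input_low(vin, vout):
    """Return the smaller input voltage at which |dVout/dVin| reaches one."""
    mids, gains = [], []
    for i in range(len(vin) - 1):
        dvin = vin[i + 1] - vin[i]
        # Slope magnitude on each segment, assigned to the segment midpoint.
        gains.append(abs((vout[i + 1] - vout[i]) / dvin))
        mids.append(0.5 * (vin[i] + vin[i + 1]))
    for j in range(len(gains) - 1):
        if gains[j] < 1.0 <= gains[j + 1]:
            # Linear interpolation between the two midpoints around gain = 1.
            frac = (1.0 - gains[j]) / (gains[j + 1] - gains[j])
            return mids[j] + frac * (mids[j + 1] - mids[j])
    raise ValueError("no unity gain point found")


# Synthetic VTC (assumption): 1.8 V supply, switching threshold at 0.6 V.
VDD = 1.8
vin = [i * 0.001 for i in range(1801)]
vout = [0.5 * VDD * (1.0 - math.tanh(10.0 * (v - 0.6))) for v in vin]

v_il = unity_gain_input_low(vin, vout)  # input low voltage
v_ol = vout[-1]                         # assumed output low voltage
nml = v_il - v_ol                       # equation (9.1)
print(round(v_il, 4), round(nml, 4))
```

With a simulated VTC, the `vin` and `vout` lists would be replaced by the swept input voltage and the corresponding dynamic node voltage.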

Simulation results for four input standard footless domino AND and OR gates are shown in Fig. 9.2. For comparison, simulation results of domino logic circuits without a keeper are also included in Fig. 9.2. All of the transistors other than the keeper transistor are sized the same. The effect of the keeper transistor on the circuit delay and noise immunity characteristics varies depending upon the gate input excitation. The simulations of the first group of circuits (NML1, Delay1, and Power1 shown in Fig. 9.2) are based on the assumption that the input or noise signals couple only at a single gate input while the other gate inputs are connected either to ground (for the OR gates) or to V<sub>DD</sub> (for the AND gates). Additional simulations (NML2, Delay2, and Power2 shown in Fig. 9.2) are produced assuming that all of the gate inputs are excited simultaneously by the same input or noise signal.



Fig. 9.2. Comparison of the normalized noise immunity, evaluation delay, and power characteristics of standard footless domino logic circuits with different keeper sizes. (a) Effect of the increased keeper size on the circuit characteristics of a four input domino AND gate. (b) Effect of the increased keeper size on the circuit characteristics of a four input domino OR gate. NML 1, Delay 1, and Power 1: only one input is excited while the other inputs are either grounded (for the OR gates) or connected to  $V_{DD}$  (for the AND gates). NML 2, Delay 2, and Power 2: All four inputs are excited with the same input or noise signal.

As shown in Fig. 9.2a, when the input or noise signal is applied to only one input while the other gate inputs are connected to $V_{DD}$ (NML1, Delay1, and Power1 shown in Fig. 9.2a), the addition of a keeper one quarter the size of a pull-down transistor degrades the evaluation speed and power by 16% and 14%, respectively, as compared to a four input domino AND gate without a keeper. Increasing the keeper size from 0.25 to 1 increases NML1 by 163%. The increased keeper size, however, also increases the delay and power dissipation by 190% and 132%, respectively. When all of the gate inputs are excited (NML2, Delay2, and Power2 shown in Fig. 9.2a), the NML2, delay, and power are increased by 104%, 177%, and 125%, respectively, by increasing the keeper size from 0.25 to 1.

When only one input signal is excited while the other three input signals are grounded in a four input domino OR gate, the addition of a keeper half the size of a pull-down network transistor degrades the power and delay by 18% and 16%, respectively, as compared to a standard domino circuit without a keeper (as shown in Fig. 9.2b). Increasing the keeper size from 0.5 to 2 increases the noise immunity, delay, and power by 119%, 104%, and 118%, respectively. When all of the gate inputs are excited by the same noise or input signal, the effect of the keeper current on both the circuit performance and reliability is reduced. Increasing the keeper size from 0.5 to 2, therefore, improves the NML by only 24%. The delay and power are increased by 40% and 67%, respectively.

As displayed in Fig. 9.2, from a circuit performance and energy efficiency point of view, the keeper should be sized as small as possible (or preferably omitted as in earlier domino logic circuits). On the contrary, from a noise immunity and operation reliability point of view, the keeper size should be as large as possible while guaranteeing functionality for a worst case delay input combination. There is, therefore, a tradeoff between high noise immunity and high speed/energy efficient operation of domino logic gates [128]-[130].

In order to manage these conflicting requirements (a strong keeper for high noise immunity and a weak keeper for high speed), a variable strength keeper scheme (a conditional keeper technique) was first proposed by Alvandpour [128]. Two keeper transistors are employed in this scheme. One of the keeper transistors is sized small in order to reduce the contention current while the other keeper transistor is sized larger for high noise immunity. The larger keeper transistor is conditionally turned on if the dynamic node is not discharged during the evaluation phase. The weak keeper offers limited noise immunity, improving the evaluation speed during the worst case evaluation delay while the strong keeper offers good robustness to noise and leakage during the rest of the evaluation phase [129]. The primary drawback of this technique is that a delay element and a conditional keeper control circuit are required for each domino gate, increasing the area and energy overhead of the conditional keeper circuits. A similar technique with a single keeper transistor which is cut off at the beginning of the evaluation phase has been proposed in [130]. The dynamic node, without any conductive path to the power supply, floats at the beginning of the evaluation phase. Although the contention current is reduced with the technique proposed in [130], reliable operation cannot be maintained in an increasingly noisy and noise sensitive on-chip environment. It is assumed with the domino circuit techniques proposed in [128] and [130] that the timing of the clock and input signals driving the domino gates is well known, permitting the worst case evaluation delay to be accurately estimated. The effectiveness of both techniques in reducing the delay and power of domino logic circuits depends upon the accurate estimation of the worst case evaluation delay [129]. If the worst case evaluation delay is underestimated, the conditional keeper can be turned on before the evaluation is completed (before the dynamic node is fully discharged), causing a contention current on par with the current produced by a standard domino keeper transistor. Alternatively, if the worst case evaluation delay is overestimated, the circuit is exposed to noise with little noise immunity for an extended amount of time, thereby degrading reliability.

A variable threshold voltage keeper circuit technique is described in this chapter for simultaneously reducing power, enhancing speed, and improving noise immunity in domino logic circuits. The current drive of the keeper transistor is adjusted by dynamically body biasing the keeper. The threshold voltage of the keeper transistor is modified during circuit operation to reduce the contention current without sacrificing noise immunity. Similar to the conditional keeper [128] and high speed domino [130] techniques, it is assumed that the worst case evaluation delay of the domino circuits can be accurately predicted. The operation of the proposed domino logic with a variable threshold voltage keeper (DVTVK) circuit technique is described in Section 9.2.

### 9.2. Domino Logic with Variable Threshold Voltage Keeper

The DVTVK circuit technique is introduced in Section 9.2.1. The threshold voltage of the keeper is dynamically modified during circuit operation by changing the body bias voltage of the keeper. Operation of the body bias generator is described in Section 9.2.2.

### 9.2.1. Variable Threshold Voltage Keeper

A K input domino OR gate based on the variable threshold voltage keeper circuit technique is shown in Fig. 9.3. A representative waveform that characterizes the operation of the circuit is shown in Fig. 9.4.

The DVTVK circuit operates in the following manner. When the clock is low, the pull-up transistor is on and the dynamic node is charged to $V_{DD1}$. The substrate of the keeper is charged to $V_{DD2}$ ($V_{DD2} > V_{DD1}$) by the body bias generator, increasing the keeper threshold voltage. The value of the high threshold voltage (high-$V_t$) of the keeper is determined by the reverse body bias voltage ($V_{DD2} - V_{DD1}$) applied to the source-to-substrate p-n junction of the keeper. The current sourced by the high-$V_t$ keeper is reduced, lowering the contention current when the evaluation phase begins. A reduction in the current drive of the keeper does not degrade the noise immunity during precharge as the dynamic node voltage is maintained during this phase by the pull-up transistor rather than by the keeper.
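The dependence of the keeper threshold voltage on the reverse body bias can be illustrated with the standard first-order body effect expression. This expression is not stated in this chapter and is included only as a sketch; the symbols $V_{tp0}$ (zero body bias threshold voltage), $\gamma$ (body effect coefficient), and $\phi_F$ (magnitude of the Fermi potential) are assumptions introduced here. For a p-channel keeper with the source connected to $V_{DD1}$,

$$|V_{tp}| = |V_{tp0}| + \gamma \left( \sqrt{2\phi_F + V_{BS}} - \sqrt{2\phi_F} \right),$$

where $V_{BS}$ is the body-to-source voltage of the keeper. With the body at $V_{DD2}$, $V_{BS} = V_{DD2} - V_{DD1} > 0$ and $|V_{tp}|$ is raised above $|V_{tp0}|$. With the body at $V_{DD1}$, $V_{BS} = 0$ and the threshold voltage returns to the zero body bias level $|V_{tp0}|$.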



Fig. 9.3. A k input domino OR gate with a variable threshold voltage keeper.

When the clock goes high (the evaluation phase), the pull-up transistor is cutoff and only the high-$V_t$ keeper current contends with the current from the evaluation path transistor(s). Provided that the appropriate input combination that discharges the dynamic node is applied in the evaluation phase, the contention current due to the high-$V_t$ keeper is significantly reduced as compared to standard domino logic. After a delay determined by the worst case evaluation delay of the domino gate, the body bias voltage of the keeper is reduced to $V_{DD1}$, zero biasing the source-to-substrate p-n junction of the keeper. The threshold voltage of the keeper is lowered to the zero body bias level, thereby increasing the keeper current. The DVTVK keeper then has the same threshold voltage as a standard domino keeper, offering the same noise immunity during the remaining portion of the evaluation phase (assuming the SD and DVTVK keepers are the same size).



Fig. 9.4. Waveforms that characterize the operation of the variable threshold voltage keeper circuit technique.

### 9.2.2. Dynamic Body Bias Generator

The proposed dynamic body bias generator (DBBG) is shown in Fig. 9.5. The DBBG produces an output signal swinging between  $V_{DD1}$  and  $V_{DD2}$  from an input signal swinging between ground and  $V_{DD1}$ . The DBBG generates the proper body bias voltages for the keeper with an appropriate delay, ensuring that the contention current is reduced without sacrificing noise immunity.



Fig. 9.5. Body bias generator circuit.

The operation of the DBBG is controlled by the clock signal that also controls the operational phases of the domino logic circuit. When the clock goes low, Node<sub>2</sub> is discharged through N<sub>2</sub>, turning on P<sub>1</sub> and P<sub>3</sub>. P<sub>2</sub> and P<sub>4</sub> are cutoff and the body bias voltage is increased to V<sub>DD2</sub>. When the clock goes high, the domino logic enters the evaluation phase. Node<sub>1</sub> is discharged through N<sub>1</sub>, turning on P<sub>2</sub> and P<sub>4</sub>. P<sub>1</sub> and P<sub>3</sub> are cutoff. The voltage at Node<sub>3</sub> is maintained at V<sub>DD1</sub> through P<sub>4</sub>. During this stage, the DBBG must ensure that the keeper current is increased to the low-V<sub>t</sub> level to maintain higher noise immunity if the dynamic node is not discharged by the evaluation path transistors. After a delay determined by the worst case evaluation delay of the domino gate, the body bias voltage is reduced to V<sub>DD1</sub>. Hence, with a time delay t<sub>d</sub> after the clock edge, the threshold voltage of the keeper is reduced to the zero body bias level, increasing the keeper current. During the remaining portion of the evaluation phase, the noise immunity characteristics of the SD and DVTVK circuit techniques are identical.
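The timing of the body bias voltage described above can be summarized with a simple behavioral model. The following minimal Python sketch returns the keeper body voltage as a function of the clock phase and the time elapsed since the rising clock edge. It is an idealized abstraction that ignores the transition times of the DBBG, and the function name and the numerical values of $V_{DD1}$, $V_{DD2}$, and $t_d$ are hypothetical.

```python
def keeper_body_bias(clock_high, t_since_rising_edge, t_d, vdd1, vdd2):
    """Idealized keeper body voltage produced by the body bias generator.

    clock_high:           True during evaluation, False during precharge.
    t_since_rising_edge:  time elapsed since the clock went high.
    t_d:                  delay set by the worst case evaluation delay.
    """
    if not clock_high:
        # Precharge: reverse body bias, high threshold voltage keeper.
        return vdd2
    if t_since_rising_edge < t_d:
        # Start of evaluation: the keeper is still reverse body biased,
        # so the contention current is reduced.
        return vdd2
    # Remainder of evaluation: zero body bias, full strength keeper.
    return vdd1


# Hypothetical values (volts and nanoseconds), for illustration only.
VDD1, VDD2, T_D = 1.8, 2.4, 0.2

for clock_high, t in [(False, 0.0), (True, 0.1), (True, 0.3)]:
    v_body = keeper_body_bias(clock_high, t, T_D, VDD1, VDD2)
    print(clock_high, t, v_body, "reverse bias =", round(v_body - VDD1, 3))
```

The reverse bias printed in the last column is $V_{DD2} - V_{DD1}$ while the keeper is weakened and zero during the remaining portion of the evaluation phase.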

The proposed dynamic body bias generator assumes two supply voltages,  $V_{DD1}$  and  $V_{DD2}$ , where  $V_{DD1} < V_{DD2}$ . The delay and power savings can be improved by increasing  $V_{DD2}$  as compared to  $V_{DD1}$ . This change, however, also degrades the noise immunity characteristics of the domino circuit at the beginning of the evaluation phase. The appropriate reverse body bias voltage applied to the keeper is determined by the target delay/power objectives while satisfying the lowest acceptable noise immunity requirements during the worst case evaluation delay of the domino gate. The highest bias voltages that can be applied across the source-to-substrate p-n junction and the gate oxide of a MOSFET for a specific technology are other factors that determine  $V_{DD2}$ .

#### 9.3. Simulation Results

As discussed in Section 9.1, the worst case evaluation delay of a wide domino OR gate occurs when only one input is excited while the other inputs are grounded. Similarly, the worst case evaluation delay in a domino gate with stacked pull-down transistors (e.g., an AND-OR or an AND gate) occurs when all of the inputs in the critical pull-down path are excited by the same input signal while all of the other inputs are grounded. The worst case evaluation delay determines the clock speed of a domino circuit while the target clock speed determines the size of a keeper. Speed and power characteristics of the domino logic circuits are evaluated for the set of worst case input vectors. While evaluating the noise immunity, the same noise signal is applied to all of the test circuit inputs as this situation represents the worst case noise condition.

The SD and DVTVK circuit techniques are evaluated for two different test circuits assuming a 0.18 µm CMOS technology. Simulation results of a multiple-output domino carry generator implemented with the proposed DVTVK circuit technique are presented in Section 9.3.1. The proposed DVTVK circuit technique is also applied to a chain of footless domino OR gates. The simulation results of the clock delayed domino OR gates (COR) with the proposed DVTVK circuit technique are presented in Section 9.3.2. The effect of gate sizing on the delay and power characteristics of the proposed DVTVK circuit technique is discussed in Section 9.3.3.

# 9.3.1. Multiple-Output Domino Carry Generator with Variable Threshold Voltage Keeper

A four-bit multiple-output domino carry generator (CG) implemented with the proposed variable threshold voltage keeper circuit technique (CG-DVTVK) is shown in Fig. 9.6. A description of the multiple-output domino circuit technique is presented in [133]. The CG circuit has four dynamic nodes. Each dynamic node of the CG can be discharged independently by asserting the generate (G) input of the corresponding node. The critical path of the CG circuit is along the N<sub>5</sub>-N<sub>9</sub> path. The worst case evaluation delay of the CG occurs while discharging the fourth dynamic node (Dynamic<sub>4</sub>) through the critical path. During evaluation of the delay and power characteristics, the propagate inputs (P<sub>1</sub>-P<sub>4</sub>) and C<sub>in</sub> are asserted while the generate inputs (G<sub>1</sub>-G<sub>4</sub>) are grounded. While evaluating the noise immunity, all of the inputs are excited by the same noise signal. A 1 GHz clock with a 50% duty cycle is applied to the circuits. All of the common transistors in the SD and DVTVK test circuits are sized the same.



Fig. 9.6. A four-bit multiple-output domino carry generator of a carry lookahead adder implemented with the variable threshold voltage keeper circuit technique.  $W_{N2} = 2W_{N1}/3$ ,  $W_{N3} = 2W_{N1}/4$ ,  $W_{N4} = 2W_{N1}/5$ ,  $W_{N5}$ ,  $W_{N6}$ ,  $W_{N7}$ ,  $W_{N8}$ ,  $W_{N9} = 2W_{N1}$ .

In order to determine an appropriate reverse body bias voltage to be applied to the keeper, the delay, power, power-delay product (PDP), and noise immunity characteristics of CG-DVTVK are evaluated by varying  $V_{DD2}$  (for a keeper to critical path effective transistor width ratio (KPR) of 2.2). The normalized delay, power, PDP, and NML of CG-DVTVK as compared to the standard domino carry generator (CG-SD) are shown in Fig. 9.7. The evaluation delay and power dissipation are reduced by increasing  $V_{DD2}$  as compared to  $V_{DD1}$ . Increasing  $V_{DD2}$ , however, also degrades the noise immunity characteristics of the domino circuit at the beginning of the evaluation phase.



Fig. 9.7. Variation of the power-delay product (PDP), delay, power, and noise margin low (NML) characteristics of CG-DVTVK with  $V_{DD2}$ . Values are normalized to those of a standard domino (SD) carry generator circuit with the same size transistors (KPR = 2.2).

As shown in Fig. 9.7, the degradation in noise immunity is 2% for a reverse body bias voltage of 0.3 volts while the delay and power savings are 4% and 1%, respectively. Increasing the reverse body bias voltage of the keeper transistor to 1.8 volts ($V_{DD2} = 3.6$ volts), the delay and power savings are increased to 60% and 35%, respectively, while the degradation in noise immunity at the beginning of the evaluation phase increases to 11%. It is assumed that applying a supply voltage of up to 3.6 volts to the body bias generator does not create any MOSFET gate oxide related reliability problems in the target CMOS technology. It is also (arbitrarily) assumed that a degradation of the noise margin by 11% at the beginning of the evaluation phase is acceptable. In the following analysis, $V_{DD1}$ and $V_{DD2}$ are 1.8 volts and 3.6 volts, respectively.

Simulation results characterizing the delay and power gains achievable with the DVTVK circuit technique for a same size keeper as compared to SD are analyzed in Section 9.3.1.1. Since the contention current is significantly reduced with the proposed variable threshold voltage keeper circuit technique, the size of the keeper transistor can be increased to improve the noise immunity without degrading the delay and power characteristics as compared to a standard domino logic circuit. The improvement in noise immunity offered by the DVTVK technique under the same delay, power, or power-delay product conditions as compared to SD is presented in Section 9.3.1.2.

# 9.3.1.1. Improved Delay and Power Characteristics with Comparable Noise Immunity

The keeper width is a multiple of the equivalent width of the pull-down critical path and is varied to evaluate the delay, power, and noise immunity characteristics. The evaluation delay, power, power-delay product (PDP), and NML of the SD and DVTVK circuits as a function of the keeper to critical path effective transistor width ratio (KPR) are shown in Fig. 9.8. Provided that the input vector combination that produces the worst case evaluation delay is applied, the fourth dynamic node of the SD circuit cannot be fully discharged during the entire evaluation phase for KPR values above 2.2 due to the high contention current in standard domino logic circuits.

A KPR of 2.2 is, therefore, the largest value that is considered in this analysis. The gain in delay, power, and PDP achieved by the proposed technique is listed in Table 9.1.



Fig. 9.8. SD and DVTVK simulation results for different keeper to critical path equivalent transistor width ratios (KPR). (a) Evaluation delay versus KPR. (b) Power dissipation versus KPR. (c) Noise margin versus KPR. (d) Power delay product versus KPR.

The proposed variable threshold voltage keeper circuit technique is effective for enhancing the evaluation speed of domino logic circuits. As listed in Table 9.1, DVTVK improves the evaluation delay by 60% as compared to SD (for a KPR = 2.2). As shown in Fig. 9.8a, the effectiveness of the proposed technique increases with larger keeper size as the degradation in circuit speed becomes more severe due to increased contention current. The enhancement in circuit speed of DVTVK as compared to SD reduces to 8% as the KPR is decreased to 0.6. As shown in Fig. 9.8b, the proposed circuit technique also lowers the power consumption for a wide range of keeper sizes. As listed in Table 9.1, DVTVK reduces the power by 35% as compared to SD (for a KPR = 2.2). As the keeper size is decreased, the effect of the keeper contention current on the evaluation delay and power dissipation becomes smaller. The reduction in power, therefore, diminishes with decreasing keeper size. Due to the energy overhead of the dynamic body bias generator circuit, the power consumption of DVTVK is 13% greater than SD when the KPR is reduced to 0.6.

TABLE 9.1

A COMPARISON OF THE EVALUATION DELAY, POWER DISSIPATION, POWER-DELAY PRODUCT (PDP), AND NML (FOR MAXIMUM REVERSE BODY BIASED KEEPER) OF SD AND DVTVK CIRCUIT TECHNIQUES FOR KPR = 2.2

|           | Evaluation<br>Delay (ps) | Power<br>(μW) | PDP<br>(fJ) | NML<br>(mV) |
|-----------|--------------------------|---------------|-------------|-------------|
| SD        | 291                      | 2625          | 764         | 478         |
| DVTVK     | 116                      | 1717          | 199         | 427         |
| Reduction | 60%                      | 35%           | 74%         | -11%        |
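The entries of Table 9.1 can be cross-checked with a few lines of arithmetic. The following is only a minimal sketch: the numbers are copied from the table, while the function names `pdp_fj` and `reduction_percent` are hypothetical helpers introduced here for illustration. It recomputes the power-delay product as delay times power and the percentage reductions relative to SD.

```python
# Values copied from Table 9.1 (KPR = 2.2, maximum reverse body biased keeper).
TABLE_9_1 = {
    "SD":    {"delay_ps": 291, "power_uW": 2625, "pdp_fJ": 764, "nml_mV": 478},
    "DVTVK": {"delay_ps": 116, "power_uW": 1717, "pdp_fJ": 199, "nml_mV": 427},
}


def pdp_fj(delay_ps, power_uw):
    """Power-delay product in fJ (1 ps * 1 uW = 1e-18 J = 1e-3 fJ)."""
    return delay_ps * power_uw * 1e-3


def reduction_percent(sd_value, dvtvk_value):
    """Reduction of a metric relative to SD, in percent (negative = increase)."""
    return 100.0 * (sd_value - dvtvk_value) / sd_value


# Recompute the PDP column from the delay and power columns.
recomputed_pdp = {
    name: pdp_fj(row["delay_ps"], row["power_uW"]) for name, row in TABLE_9_1.items()
}

# Recompute the "Reduction" row of the table for every metric.
reductions = {
    key: reduction_percent(TABLE_9_1["SD"][key], TABLE_9_1["DVTVK"][key])
    for key in TABLE_9_1["SD"]
}

for name, value in recomputed_pdp.items():
    print(f"{name}: PDP = {value:.1f} fJ")
for key, value in reductions.items():
    print(f"{key}: reduction = {value:.1f}%")
```

Note that the NML entry is a reduction in noise margin, which is why Table 9.1 lists it with a negative sign in a row that otherwise reports improvements.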

The power-delay product (PDP) of the circuits is also illustrated in Fig. 9.8d to better compare the effect of the proposed variable threshold voltage keeper circuit technique on circuit performance and energy dissipation. SD has a higher PDP as compared to DVTVK for values of KPR greater than 0.8. As listed in Table 9.1, DVTVK lowers the PDP by 74% as compared to SD for a KPR of 2.2.

Another important metric for domino circuits is the noise immunity. The proposed circuit technique degrades the noise immunity as compared to SD, although only at the beginning of the evaluation phase. This degradation occurs for a brief amount of time until the threshold voltage of the keeper is lowered for increased noise immunity. The time delay (t<sub>D</sub>) at the beginning of the evaluation phase, after which the keeper current drive is increased to the low-V<sub>t</sub> level, is determined by the worst case evaluation delay of the domino gate. The degradation in noise immunity changes between 8% and 11% under maximum reverse body bias conditions as the KPR is increased from 0.6 to 2.2. As shown in Fig. 9.8c, the noise immunity of DVTVK is identical to the noise immunity of SD whenever a zero body bias is applied to the keeper.

### 9.3.1.2. Improved Noise Immunity with Comparable Delay or Power Characteristics

The DVTVK circuit technique is shown to offer significant delay and power savings for the same size keeper as compared to SD. Because of the high contention current in standard domino logic circuits, the circuit evaluation delay and power increase significantly with increased keeper size. As explained in Section 9.1, the huge speed and energy penalty incurred to increase the noise immunity in standard domino logic circuits is due to the static keeper current drive in the evaluation phase. As shown in Fig. 9.8, the NML of SD and zero body biased DVTVK increases by 34% as the KPR is increased from 0.6 to 2.2. The adverse effect of increased keeper size on the delay and power characteristics is significantly lower for DVTVK as compared to SD. As shown in Fig. 9.8, the evaluation delay and power dissipation of SD (DVTVK) are increased by 3.8 (1.6) times and 2.6 (1.5) times, respectively, for a 34% noise immunity improvement as the KPR is increased from 0.6 to 2.2. The PDP of SD (DVTVK) increases 10 (2.5) times for a KPR of 2.2 as compared to a KPR of 0.6.
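As a consistency check on these factors, the growth in PDP is the product of the growth in delay and the growth in power. The worked arithmetic below uses only the rounded factors quoted above, so small discrepancies with the quoted PDP factors are expected from rounding:

$$\frac{PDP_{KPR=2.2}}{PDP_{KPR=0.6}} = \frac{t_{KPR=2.2}}{t_{KPR=0.6}} \cdot \frac{P_{KPR=2.2}}{P_{KPR=0.6}}$$

$$\text{SD: } 3.8 \times 2.6 \approx 9.9 \approx 10, \qquad \text{DVTVK: } 1.6 \times 1.5 = 2.4 \approx 2.5$$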

Since the contention current is significantly reduced with the proposed variable threshold voltage keeper technique, the width of the keeper transistor in a DVTVK circuit can be increased without degrading the delay and power characteristics as compared to a standard domino logic circuit. DVTVK, therefore, offers higher noise immunity as compared to SD under the same delay, power, or power-delay product conditions. The KPR of DVTVK is fixed at 2.2 (the highest value considered during the analysis). The SD keeper size is reduced to lower the contention current, offering the same delay, power, or PDP as compared to DVTVK. The improvements in NML of DVTVK as compared to SD (both under the maximum reverse body biased and zero body biased DVTVK keeper conditions) are listed in Table 9.2. The KPR values of SD required for the same delay, power dissipation, or PDP characteristics as compared to the DVTVK circuit technique are also listed in Table 9.2.

TABLE 9.2

ACHIEVABLE IMPROVEMENT IN NML WITH THE DVTVK CIRCUIT TECHNIQUE AS COMPARED TO SD WHILE MAINTAINING EQUAL DELAY, POWER DISSIPATION, OR PDP (KPR OF DVTVK IS 2.2)

|            | SD-KPR | NML improvement as compared to SD<br>(zero body bias) | NML improvement as compared to SD<br>(reverse body bias) |
|------------|--------|--------------------------------------------|--------------------------|
| Same Delay | 1.34   | 14.1%                                      | 1.9%                     |
| Same Power | 1.63   | 8.9%                                       | -2.7%                    |
| Same PDP   | 1.45   | 11.9%                                      | 0.0%                     |

As listed in Table 9.2, the NML of DVTVK is 14.1% higher as compared to SD (zero body biased keeper) when the SD keeper is sized for comparable evaluation speed. Since the keeper transistor in the CG-DVTVK circuit is sized 64% larger than the keeper in CG-SD, the noise immunity of CG-DVTVK is higher as compared to CG-SD even at the beginning of the evaluation phase when the keeper threshold voltage is increased by reverse body biasing the keeper. Under the same power dissipation conditions, the NML of DVTVK with zero body biased keeper improves by 8.9% as compared to SD. When the power-delay products of DVTVK and SD are maintained the same, the DVTVK (with zero body biased keeper) offers an 11.9% higher NML as compared to SD.
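The 64% figure follows directly from the ratio of the two KPR values. The short sketch below repeats the calculation for all three rows of Table 9.2. It assumes that the effective width of the critical pull-down path is the same in both circuits (the common transistors are sized the same), so that the ratio of KPR values equals the ratio of keeper widths; the function name `keeper_upsizing_percent` is a hypothetical helper.

```python
# KPR of the DVTVK circuit is fixed at 2.2; SD-KPR values are from Table 9.2.
DVTVK_KPR = 2.2
SD_KPR = {"same delay": 1.34, "same power": 1.63, "same PDP": 1.45}


def keeper_upsizing_percent(dvtvk_kpr, sd_kpr):
    """How much wider the DVTVK keeper is than the SD keeper, in percent.

    Assumes both circuits share the same critical-path effective width,
    so the KPR ratio equals the keeper width ratio.
    """
    return 100.0 * (dvtvk_kpr / sd_kpr - 1.0)


upsizing = {
    condition: keeper_upsizing_percent(DVTVK_KPR, kpr)
    for condition, kpr in SD_KPR.items()
}

for condition, percent in upsizing.items():
    print(f"{condition}: DVTVK keeper is {percent:.0f}% wider than the SD keeper")
```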

#### 9.3.2. Clock-Delayed Domino Logic with Variable Threshold Voltage Keeper

As discussed in Section 9.1, footless domino logic circuits have better speed and power characteristics as compared to footed domino logic circuits. Cascaded footless domino logic circuits, however, require careful timing of the clock and input signals. When the DVTVK circuit technique is applied to a clock-delayed footless domino circuit, the body bias signals should be delayed with respect to the input signals at each footless domino stage. Appropriate timing of the body bias signal is crucial for maximizing the delay and power gains without sacrificing noise immunity with the proposed circuit technique. The proposed DVTVK circuit technique is applied to cascaded footless domino OR gates as shown in Fig. 9.9. A three stage chain of eight input domino OR gates with a fan-out of three (COR) is investigated.

A body bias signal that swings between $V_{DD1}$ and $V_{DD2}$ from a clock signal that swings between ground and $V_{DD1}$ is generated in the first stage of a clock-delayed domino circuit. The substrates of the keepers within the domino gates in the following stages are driven by cascaded inverters supplied by $V_{DD1}$ and $V_{DD2}$ (as shown in Fig. 9.9). The delay and driving strength of these inverters are adjusted in each domino stage to maintain the correct timing of the body bias signals. The clock and body bias signals are delayed at each footless domino stage, maximizing the delay and power gains with the proposed variable threshold voltage keeper circuit technique.



Fig. 9.9. Clock delayed domino logic with the variable threshold voltage keeper circuit technique.

The keeper width is a multiple of the width of a pull-down network transistor (all of the NMOS transistors in a pull-down path are sized the same) and is varied to evaluate the delay, power, and noise immunity characteristics of the chain of domino logic circuits with variable threshold voltage keeper (COR-DVTVK) and the chain of domino logic circuits with standard keeper (COR-SD). A 1 GHz clock with a 50% duty cycle is applied to the circuits. All of the common transistors in the SD and DVTVK test circuits are sized the same. Each domino gate at the third stage drives a 10 fF load. The evaluation delay, power, and PDP savings of COR-DVTVK as compared to COR-SD for different keeper sizes are listed in Table 9.3.

As listed in Table 9.3, DVTVK improves the evaluation delay, power, and PDP by 6.9%, 0.6%, and 7.5%, respectively, as compared to SD for a KPR = 0.6. The effectiveness of the proposed technique increases with larger keeper size as the degradation in circuit speed and power characteristics becomes more severe due to increased keeper contention. The enhancements in circuit speed, power, and PDP of DVTVK as compared to SD are 43.4%, 37.2%, and 64.4%, respectively, for a KPR of 2.2. The degradation in noise immunity (NML) varies between 5.9% and 6.5% as the KPR is varied between 0.6 and 2.2.

TABLE 9.3

DELAY, POWER, AND PDP SAVINGS OF COR-DVTVK AS COMPARED TO COR-SD WITH DIFFERENT KEEPER SIZES

|     | Percentage improvement as compared to SD |       |      |      |
|-----|------------------------------------------|-------|------|------|
| KPR | Delay                                    | Power | PDP  | NML  |
| 0.6 | 6.9                                      | 0.6   | 7.5  | -6.1 |
| 0.8 | 9.9                                      | 3.2   | 12.8 | -5.9 |
| 1.0 | 12.3                                     | 5.7   | 17.3 | -5.9 |
| 1.2 | 15.8                                     | 8.8   | 23.2 | -6.0 |
| 1.4 | 19.3                                     | 12.7  | 29.5 | -6.0 |
| 1.6 | 23.3                                     | 16.8  | 36.2 | -6.1 |
| 1.8 | 28.6                                     | 21.9  | 44.2 | -6.2 |
| 2.0 | 35.0                                     | 28.5  | 53.5 | -6.5 |
| 2.2 | 43.4                                     | 37.2  | 64.4 | -6.4 |
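Because the PDP is the product of delay and power, the PDP column of Table 9.3 should follow from the delay and power columns. The sketch below checks this relation against the tabulated values. The data are copied from Table 9.3; `pdp_saving_percent` is a hypothetical helper, and the 0.1 percentage point tolerance mentioned in the usage note is an assumption that allows for rounding in the table.

```python
# Rows of Table 9.3: (KPR, delay saving %, power saving %, PDP saving %).
TABLE_9_3 = [
    (0.6, 6.9, 0.6, 7.5),
    (0.8, 9.9, 3.2, 12.8),
    (1.0, 12.3, 5.7, 17.3),
    (1.2, 15.8, 8.8, 23.2),
    (1.4, 19.3, 12.7, 29.5),
    (1.6, 23.3, 16.8, 36.2),
    (1.8, 28.6, 21.9, 44.2),
    (2.0, 35.0, 28.5, 53.5),
    (2.2, 43.4, 37.2, 64.4),
]


def pdp_saving_percent(delay_saving, power_saving):
    """PDP saving implied by independent delay and power savings (in percent).

    PDP = delay * power, so the remaining fractions multiply:
    1 - s_pdp = (1 - s_delay) * (1 - s_power).
    """
    remaining = (1.0 - delay_saving / 100.0) * (1.0 - power_saving / 100.0)
    return 100.0 * (1.0 - remaining)


# Largest gap between the recomputed and the tabulated PDP saving.
max_error = max(
    abs(pdp_saving_percent(delay, power) - pdp)
    for _, delay, power, pdp in TABLE_9_3
)

for kpr, delay, power, pdp in TABLE_9_3:
    print(f"KPR {kpr}: recomputed {pdp_saving_percent(delay, power):.1f}%, listed {pdp}%")
```

With the tabulated values, the recomputed and listed PDP savings agree to within 0.1 percentage points for every keeper size.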

Similar to CG-DVTVK, the keeper transistors in a COR-DVTVK circuit can be sized larger, offering higher noise immunity with the same delay and power characteristics as compared to a standard domino logic circuit. The keeper transistors of COR-DVTVK and COR-SD are sized for the same delay, power, or PDP characteristics. The improvements in the NML of COR-DVTVK as compared to COR-SD (both under the maximum reverse body biased and zero body biased DVTVK keeper conditions) are listed in Table 9.4. COR-DVTVK offers 8.1% higher noise immunity as compared to SD with the same evaluation speed. The larger size of the COR-DVTVK keeper compensates for the reduced gate overdrive ($|V_{GS} - V_{tp}|$) of the keeper transistor at the beginning of the evaluation phase when the keeper is reverse body biased. The noise margins of COR-DVTVK with reverse body biased keeper and COR-SD for the same evaluation delay are, therefore, equal.

TABLE 9.4

ACHIEVABLE IMPROVEMENT IN NML WITH THE DVTVK CIRCUIT TECHNIQUE AS COMPARED TO SD WHILE MAINTAINING EQUAL DELAY, POWER DISSIPATION, OR PDP (KPR OF DVTVK IS 2.2)

|            | SD-KPR | NML improvement as compared to SD<br>(zero body bias) | NML improvement as compared to SD<br>(reverse body bias) |
|------------|--------|--------------------------------------------|--------------------------|
| Same Delay | 1.45   | 8.1%                                       | 0.0%                     |
| Same Power | 1.61   | 6.1%                                       | -1.8%                    |
| Same PDP   | 1.52   | 7.2%                                       | -0.8%                    |

### 9.3.3. Impact of Gate Size on the Energy Overhead of the Dynamic Body Bias Generator

It is assumed that each of the carry generator outputs (in Section 9.3.1) and the third stage footless domino OR gate outputs (in Section 9.3.2) drive a 10 fF load. The transistors in the domino logic circuits have been sized to operate with a 1 GHz clock with a 50% duty cycle. In Fig. 9.6,  $W_{N1} = 25W_{min}$  and  $W_{pull-up} = 8W_{min}$ . In Fig. 9.9,  $W_{pull-down} = 10W_{min}$  and  $W_{pull-up} = 9W_{min}$ . In the body bias generators,  $P_1$ ,  $P_2$ ,  $P_3$ ,  $P_4$ ,  $N_1$ ,  $N_2$ , and the transistors within  $I_1$  are minimum sized ( $L = L_{min}$  and  $W = W_{min}$ ) while the size and number of inverters have been adjusted to appropriately delay the body bias signals. The DVTVK circuit technique increases the area by 2.3% to 2.8% and 3% to 2.6% as compared to CG-SD and COR-SD, respectively, for  $0.6 \le KPR \le 2.2$ . For increasing keeper size, the delay elements (the inverters) are resized to strengthen the body bias signal while most of the transistors forming the DBBG are minimum size. The energy savings due to the reduced contention current as compared to a standard domino circuit typically exceeds the additional energy dissipated by the body bias generator.

The effect of reducing the output load capacitance on the delay and power characteristics of the proposed DVTVK circuit technique is evaluated in this section for a four-bit multiple-output domino carry generator (CG) and cascaded three stage eight input clock-delayed domino OR gates (COR). The load capacitance is scaled from 10 fF to 2 fF while maintaining a clock frequency of 1 GHz. The savings in the delay, power, and PDP of the CG-DVTVK and COR-DVTVK circuits vary with the load capacitance as shown in Fig. 9.10 (KPR = 2.2).

The DBBG is used only to drive the substrate of the keeper transistors in the domino logic circuits. Most of the transistors in a DBBG are, therefore, minimum sized even for a high output load capacitance. The energy overhead of the DBBG becomes more significant as the pull-up, pull-down, and the output inverter transistors of the domino logic circuits are scaled together with the load capacitance. As shown in Fig. 9.10, the power savings are, therefore, reduced as the output load capacitance is decreased. The degradation in the power savings of the CG is more significant as compared to COR at small load capacitances. This behavior is explained by the same DBBG being shared by several OR gates in the second and third stages of COR-DVTVK, reducing the overall energy overhead of the DBBG circuits. At high loads, however, the power savings of CG-DVTVK and COR-DVTVK are similar. The speed enhancement by the proposed DVTVK technique is primarily dependent on the relative size of the pull-down network transistors and the keeper. The effectiveness of the DVTVK circuit technique for improving the delay characteristics as compared to SD is, therefore, relatively insensitive to the load capacitance as shown in Fig. 9.10 (for the same keeper to pull-down network transistor width ratio).



Fig. 9.10. Variation of the delay, power, and PDP savings of the CG-DVTVK and COR-DVTVK circuits with the output load capacitance as compared to CG-SD and COR-SD, respectively (KPR = 2.2).

### 9.4. Domino Logic with Forward and Reverse Body Biased Keeper

Reverse body biasing the keeper at the beginning of the evaluation phase is effective for simultaneously improving the speed and power characteristics of domino logic circuits. The keeper transistor is zero body biased after the worst case evaluation delay in order to not sacrifice noise immunity with the variable threshold voltage keeper circuit technique.

Alternatively, forward body biasing the keeper after the worst case evaluation delay is proposed in this section to improve the noise immunity characteristics as compared to standard domino. The threshold voltage of a forward body biased MOSFET is reduced, increasing the conduction current as compared to a zero body biased transistor with the same physical dimensions. Forward body biasing the keeper, therefore, improves the noise immunity characteristics as compared to a standard domino logic circuit with the same keeper size. The proposed DVTVK circuit technique with a forward and reverse body biased keeper is applied to cascaded footless domino OR gates. Simulation results for the COR-DVTVK with a forward body biased keeper are presented in Section 9.4.1. Technology scaling characteristics of the reverse and forward body bias techniques applied to a keeper transistor are discussed in Section 9.4.2.
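The dependence of the keeper threshold voltage on the body bias can be illustrated with the standard long-channel body-effect expression. This is the textbook first-order form, given here as an assumption for illustration rather than the device model used in the simulations; $V_{tp0}$ (zero-bias threshold voltage), $\gamma$ (body-effect coefficient), and $\phi_F$ (Fermi potential) are generic symbols not defined in this chapter. For a PMOS keeper with body-to-source voltage $V_{BS}$,

$$|V_{tp}| = |V_{tp0}| + \gamma \left( \sqrt{2|\phi_F| + V_{BS}} - \sqrt{2|\phi_F|} \right)$$

A reverse body bias ($V_{BS} > 0$) raises $|V_{tp}|$ and weakens the keeper, while a forward body bias ($V_{BS} < 0$) lowers $|V_{tp}|$ and strengthens it. The square-root dependence also indicates why the threshold voltage shift per volt of bias diminishes at large reverse bias.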

# 9.4.1. Clock-Delayed Domino Logic with Forward and Reverse Body Biased Keeper

A three stage chain of eight input domino OR gates with a fan-out of three (COR) is simulated assuming a 0.18 $\mu m$ CMOS technology. The only difference in the dynamic body bias generator (DBBG) of the domino circuit with a forward biased keeper is that $V_{DD1}$ (as shown in Figs. 9.5 and 9.9) is replaced by a smaller supply voltage $V_{DD3}$ ($V_{DD3} < V_{DD1}$). A body bias signal that swings between $V_{DD3}$ and $V_{DD2}$ from a clock signal that swings between ground and $V_{DD1}$ is generated in the first stage of the clock-delayed domino circuit. The substrates of the keepers within the domino logic gates in the following stages are driven by cascaded inverters supplied by $V_{DD3}$ and $V_{DD2}$. An eight input footless domino OR gate with a forward body biased keeper is shown in Fig. 9.11.
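The supply levels implied by a given keeper body bias can be sketched as follows. The sketch assumes a PMOS keeper whose source is tied to $V_{DD1}$ = 1.8 volts, so that a body voltage above $V_{DD1}$ is a reverse bias and a body voltage below $V_{DD1}$ is a forward bias. The helper names are hypothetical, and the relation $V_{DD3} = V_{DD1} - V_{FBB}$ is inferred from the description above rather than stated explicitly in the text.

```python
V_DD1 = 1.8  # volts; assumed keeper source voltage (core supply)


def body_voltage_for_reverse_bias(v_source, v_rbb):
    """Body voltage of a PMOS keeper for a reverse body bias of v_rbb volts."""
    return v_source + v_rbb


def body_voltage_for_forward_bias(v_source, v_fbb):
    """Body voltage of a PMOS keeper for a forward body bias of v_fbb volts."""
    return v_source - v_fbb


# High level of the body bias signal: 1.8 V reverse bias, as used in Section 9.3.
V_DD2 = body_voltage_for_reverse_bias(V_DD1, 1.8)

# Low level of the body bias signal: 0.6 V forward bias (assumed example value).
V_DD3 = body_voltage_for_forward_bias(V_DD1, 0.6)

print(f"V_DD2 = {V_DD2:.1f} V, V_DD3 = {V_DD3:.1f} V")
print(f"Body bias signal swing = {V_DD2 - V_DD3:.1f} V")
```

Under these assumptions the body bias signal swings over a wider range than in the zero body biased case, since the low level drops from $V_{DD1}$ to $V_{DD3}$ while the high level stays at $V_{DD2}$.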



Fig. 9.11. An eight input footless domino OR gate with a forward body biased keeper.

When a keeper transistor is forward body biased, the source-to-body and drain-to-body p-n junctions produce diode currents as illustrated in Fig. 9.11. The forward body bias voltage that can be applied to a MOSFET is limited due to these diode currents. The diode current through the drain-to-body p-n junction (I<sub>diode2</sub>) opposes the drain current (I<sub>drain</sub>) of a keeper transistor. I<sub>diode2</sub> attempts to discharge the dynamic node while I<sub>drain</sub> is charging the node. The drain-to-substrate current, therefore, reduces the net current supplied by the keeper to maintain the state of the dynamic node. The noise margin is greater at forward body bias voltages where the improvement in the keeper drain current due to the reduced threshold voltage dominates the increased drain-to-body junction current. For strongly forward body biased keepers, I<sub>diode2</sub> lowers (clamps) the voltage of the dynamic node. At room temperature, the DC operating point of the dynamic node when all of the pull-down transistors are cut off (ideal noiseless condition) is reduced by more than 5% for forward body bias voltages higher than 700 mV. The noise immunity can, therefore, be reduced, provided that the body diode is strongly turned on at high FBB voltages.

The noise immunity criterion used in this section is similar to the criterion described in [129]. The variation in the noise immunity characteristics of an eight input footless domino OR gate with the body bias voltage applied to the keeper transistor is shown in Fig. 9.12, for two different noise coupling scenarios. All of the values are normalized to the standard zero body biased keeper case. As shown in Fig. 9.12, increasing the forward body bias voltage towards 700 mV enhances the noise immunity. For a forward body bias voltage of 700 mV, the enhancement in noise immunity varies between 3.8% (noise couples to all of the inputs) and 11.2% (noise couples to only one input) as compared to a standard domino logic circuit with the same size transistors (KPR = 2.2). As the forward body bias voltage is increased beyond 700 mV, the body diodes are strongly turned on, degrading the noise immunity.

A FBB voltage of 700 mV provides the highest enhancement in the noise immunity characteristics at room temperature. For FBB voltages beyond 600 mV, however, the power overhead of the DVTVK circuit technique significantly increases due to the high diode currents. The variation of the savings in delay, power, and PDP of COR-DVTVK as compared to COR-SD with 500 and 600 mV FBB for two different KPR values is illustrated in Fig. 9.13. The improvement in delay, power, PDP, and NML of the DVTVK circuit technique as compared to SD for a forward body bias voltage of 600 mV with two different keeper sizes is listed in Table 9.5.

The speed enhancement of the DVTVK circuit technique is primarily dependent on the reverse body bias voltage applied to the keeper at the beginning of the evaluation phase. For a  $V_{DD2}$  of 3.6 volts, therefore, the delay savings of the proposed DVTVK circuit is similar to the delay savings reported in Section 9.3. As shown in Fig. 9.13, the improvement in the delay of the DVTVK circuit technique is approximately 43% under the 500 mV and 600 mV FBB conditions.



Fig. 9.12. Variation of COR-DVTVK noise margins with the forward body bias for KPR = 1 and KPR = 2.2. The noise margins are normalized to the zero body biased keeper condition. NML-ONE: noise couples to one input while all of the other inputs are grounded. NML-ALL: noise couples to all of the inputs.

The power overhead of the DVTVK circuit technique increases when the keeper is forward body biased due to the junction diode currents and the increased voltage swing of the DBBG and keeper substrate (from  $V_{DD1} \rightarrow V_{DD2}$  to  $V_{DD3} \rightarrow V_{DD2}$ ). As listed in Table 9.5, the power savings of the DVTVK circuit technique is reduced to 28.3% as the forward body bias voltage is increased to 600 mV (KPR = 2.2 and load = 10 fF). Similar to the analysis described in Section 9.3, for smaller keeper sizes, the effect of the keeper contention current on the evaluation delay and power dissipation is less. The reduction in delay is, therefore, lower and the power savings is smaller with decreased keeper size. As the KPR is reduced to 1, the savings in delay and PDP are reduced to 12.3% and 4.5%, respectively. Since the energy overhead of the DVTVK circuit technique increases when the keeper is forward body biased, the power dissipation of DVTVK is 8.9% higher as compared to SD for a KPR = 1 when the keeper transistor is forward body biased by 600 mV.



Fig. 9.13. Variation of the savings in delay, power, and PDP of COR-DVTVK as compared to COR-SD with a forward body bias applied to the keeper for two different keeper sizes. {Delay1, Power1, PDP1}  $\rightarrow$  KPR = 1. {Delay2.2, Power2.2, PDP2.2}  $\rightarrow$  KPR = 2.2.

TABLE 9.5

DELAY, POWER, POWER-DELAY PRODUCT (PDP), AND NML SAVINGS OF COR-DVTVK AS COMPARED TO COR-SD (WITH A FORWARD BODY BIAS VOLTAGE OF 0.6 VOLTS)

|     | Improvement (%) |       |      |         |         |
|-----|-----------------|-------|------|---------|---------|
| KPR | Delay           | Power | PDP  | NML-ALL | NML-ONE |
| 1   | 12.3            | -8.9  | 4.5  | 2.4     | 6.8     |
| 2.2 | 43.4            | 28.3  | 59.4 | 3.5     | 10.2    |

For a FBB of 600 mV and KPR = 2.2, the enhancement in noise immunity varies between 3.5% (noise couples to all of the inputs) and 10.2% (noise couples to one input). For a KPR = 1, the range of enhancement in the noise immunity under a 600 mV FBB condition is between 2.4% and 6.8%.

# 9.4.2. Technology Scaling Characteristics of the Reverse and Forward Body Bias Techniques Applied to a Keeper Transistor

Dynamically adjusting the current drive of the keeper transistors in a domino logic circuit is proposed in this chapter. The threshold voltage of a keeper transistor is modified during circuit operation by body biasing the keeper transistor. More general schemes have been proposed in the literature for body biasing all of the transistors in order to enhance speed (by lowering the threshold voltage of the transistors), to reduce active power (by lowering both the supply and threshold voltages while maintaining the same speed as compared to a high threshold voltage circuit), to decrease active and standby leakage current (by increasing the threshold voltage of the transistors in the idle portions of a circuit), or to control the within-die and die-to-die threshold voltage variations (by adaptive body biasing) [74], [75], [77], [131], [132]. In a circuit where the body bias voltages of all of the transistors are modified, the power and current demand of the body bias generator can become significant [77]. A dynamic body bias generator is proposed in this chapter to drive only the keeper transistors in a domino logic circuit. The power and current demand of the body bias generator for the variable threshold voltage keeper circuit technique is, therefore, small.

Reverse body biasing is typically applied to reduce the subthreshold leakage current (I<sub>off</sub>) when a circuit is idle [74], [75]. There is an exponential relationship between the subthreshold leakage current and threshold voltage of a MOSFET. Reverse body biasing a transistor increases the threshold voltage, thereby reducing the subthreshold leakage current. Increasing the reverse body bias voltage, however, also increases the band-to-band tunnelling current in the source-to-substrate and drain-to-substrate p-n junctions. At high reverse body bias voltages, the increased band-to-band tunnelling current becomes comparable to the reduced subthreshold leakage current. There is, therefore, an optimum reverse body bias voltage (limited by the increased band-to-band tunnelling currents) that can be applied to a transistor to reduce the total leakage current [74], [75]. Reverse body biasing the keeper transistor is proposed in this chapter in order to reduce the active mode conduction current (I<sub>drain</sub> when the keeper is on) rather than the subthreshold leakage current (I<sub>off</sub> when the keeper is off). The maximum reverse body bias that can be applied to a keeper transistor is, therefore, not limited by the increased band-to-band tunnelling current in the DVTVK circuit technique.

The maximum voltage that can be applied across the gate oxide of a MOSFET is another factor that limits the reverse body bias voltage. Due to the scaling of the gate oxide thickness, the maximum reverse body bias voltage that can be applied to a keeper can be reduced in future nanometer technology generations. The savings in delay and power of the variable threshold voltage keeper circuit technique as compared to standard domino are reduced at lower keeper reverse body bias voltages as discussed in Section 9.3.1.

The effectiveness of reverse body biasing is reduced with technology scaling due to increasing short-channel and decreasing body effects [74], [75]. Forward body biasing (FBB) has often been proposed as an alternative to reverse body biasing [74], [131]. FBB enhances body effect while reducing short-channel effects. FBB is expected to become more effective for controlling the threshold voltage of MOSFETs fabricated in future nanometer process technologies as the supply to threshold voltage ratio decreases with technology scaling [74], [77]. FBB, however, produces diode currents through the source-to-substrate and drain-to-substrate p-n junctions. These diode currents can become comparable to the drain current of a keeper transistor at low drain-to-source voltages provided the forward body bias voltage is increased beyond a specific value dependent on the junction temperature (700 mV at room temperature). The diode currents degrade the DC operating voltage of the dynamic node even when all of the pull-down transistors are turned off. The diode currents also increase the power overhead of the DVTVK circuit technique. The increased diode currents, therefore, limit the maximum forward body bias voltage that can be applied to a keeper transistor for enhanced noise immunity.

### 9.5. Chapter Summary

A high speed, low power domino logic circuit technique is described in this chapter. The circuit technique dynamically changes the threshold voltage of the keeper with a specific delay after the beginning of each operational phase (evaluation and precharge) of the domino circuit by varying the body bias voltage of the keeper transistor. The keeper contention current is reduced by increasing the keeper threshold voltage by applying a reverse body bias to the keeper at the beginning of the evaluation phase. Similarly, the degradation in noise immunity of DVTVK as compared to SD is avoided by reducing the keeper threshold voltage to the zero body bias level after a delay greater than the worst case evaluation delay of a domino logic circuit. Significant speed enhancements and power reductions are achieved when the keeper is sized for increased noise immunity.

The DVTVK and SD circuit techniques are compared in terms of the evaluation delay and power dissipation assuming the DVTVK and SD circuits have the same keeper size. The DVTVK technique operates at up to a 60% higher speed while consuming 35% less power as compared to SD. DVTVK also reduces the PDP by up to 74% as compared to SD. A temporary degradation in the noise immunity of DVTVK as compared to SD of less than 11% is observed when the keeper of the DVTVK is reverse body biased.
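The three figures quoted above are mutually consistent under a simple reading. The sketch below assumes that the 60% speed figure corresponds to a 60% reduction in evaluation delay and that the maximum delay and power savings occur for the same circuit; neither assumption is stated explicitly in this summary.

```python
def pdp_reduction(delay_reduction, power_reduction):
    """Fractional reduction in power-delay product for given fractional
    reductions in delay and power (PDP = power * delay)."""
    return 1.0 - (1.0 - delay_reduction) * (1.0 - power_reduction)


# Assumed reading of the summary figures: 60% lower delay, 35% lower power.
print(f"PDP reduction: {pdp_reduction(0.60, 0.35):.0%}")  # prints "PDP reduction: 74%"
```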

Since the contention current is significantly reduced with the presented variable threshold voltage keeper technique, the keeper transistor in a DVTVK circuit can be sized larger, offering higher noise immunity with the same delay and power characteristics as compared to a standard domino logic circuit. The DVTVK and SD circuit techniques are compared in terms of the noise immunity that the two circuit techniques offer with the same evaluation delay, power dissipation, or power-delay product characteristics. For the same evaluation delay characteristics, DVTVK (with a zero biased keeper) offers 14.1% higher noise immunity as compared to SD. Under the same power dissipation conditions, DVTVK (with a zero biased keeper) increases the noise immunity by 8.9% as compared to SD. Similarly, under the same PDP conditions, DVTVK (with a zero biased keeper) offers 11.9% higher noise immunity as compared to SD.

Forward body biasing the keeper transistor is also proposed to improve the noise immunity as compared to a standard domino circuit with the same keeper size. By applying a forward body bias of 600 mV to a keeper transistor, the noise immunity is enhanced by up to 10.2%. Dynamically forward and reverse body biasing the keeper transistor simultaneously enhances the noise immunity, evaluation speed, power dissipation, and PDP characteristics of a domino logic circuit.

### Chapter 10

# **Subthreshold Leakage Current Characteristics of Dynamic Circuits**

Subthreshold leakage power is expected to dominate the total power consumption of a CMOS circuit in the near future as depicted in Fig. 10.1 [5], [21], [29], [33]-[37]. Energy efficient circuit techniques aimed at lowering leakage currents are, therefore, highly desirable. The subthreshold leakage current of a domino logic circuit can vary dramatically with the voltage state of the dynamic and output nodes. The dynamic node voltage dependent asymmetry of the subthreshold leakage current characteristics of dual threshold voltage domino gates was first noted by Kao [136]. Based on this asymmetry, several circuit techniques that place dual threshold voltage domino logic circuits into a low leakage state have been proposed in [34], [130], [136], and [142].



Fig. 10.1. Power trends of high performance microprocessors.

A quantitative study of the subthreshold leakage current characteristics of standard low threshold voltage (low- $V_t$ ) or dual threshold voltage (dual- $V_t$ ) domino logic circuits, however, has to date not been presented in the literature. The node voltage dependent subthreshold leakage current characteristics of domino logic circuits are examined in this chapter. Different subthreshold leakage current conduction paths which occur during different dynamic and output node voltage states are identified. It is shown that a discharged dynamic node is preferable for reducing leakage current in a dual- $V_t$  circuit. Alternatively, a charged dynamic node is preferred for lower subthreshold leakage energy in a standard low- $V_t$  domino logic circuit with stacked pull-down devices, such as an AND gate.

Noise immunity issues in dual-V<sub>t</sub> domino logic circuits have been ignored in [136]. Provided that a dual-V<sub>t</sub> CMOS technology is employed, the noise immunity of domino logic circuits can be significantly degraded, affecting the reliability. A brief discussion of noise immunity related issues in dual-V<sub>t</sub> domino circuits is provided in [130]. A dual-V<sub>t</sub> domino logic circuit technique based on low-V<sub>t</sub> keeper transistors is proposed to maintain a noise immunity similar to standard low-V<sub>t</sub> domino logic circuits [130].

A discussion of the effect of dual-V<sub>t</sub> CMOS technologies on the noise immunity characteristics of domino logic circuits is provided in this chapter. Two different dual-V<sub>t</sub> domino logic circuit techniques that maintain similar noise immunity as compared to standard low-V<sub>t</sub> circuits are evaluated. Both keeper and output inverter sizing is required in a dual-V<sub>t</sub> domino logic circuit with a high threshold voltage (high-V<sub>t</sub>) keeper transistor in order to provide similar noise immunity as compared to a standard low-V<sub>t</sub> domino logic circuit. As an alternative technique, a dual-V<sub>t</sub> circuit technique based on low-V<sub>t</sub> keeper transistors is also considered in this chapter. Under similar noise immunity conditions as compared to standard low-V<sub>t</sub> domino logic circuits, the savings in subthreshold leakage energy achieved by the dual-V<sub>t</sub> circuit technique with a high-V<sub>t</sub> keeper is 5.7 to 10.9 times higher as compared to the savings offered by the dual-V<sub>t</sub> circuit technique with a low-V<sub>t</sub> keeper.

Under similar noise immunity conditions, the subthreshold leakage current of dual-V<sub>t</sub> domino logic circuits with a high-V<sub>t</sub> keeper at a low dynamic node voltage is 224 to 235 times smaller as compared to low-V<sub>t</sub> domino logic circuits with a low dynamic node voltage. Alternatively, as compared to low-V<sub>t</sub> domino logic circuits with a high dynamic node voltage, the subthreshold leakage current of dual-V<sub>t</sub> domino logic circuits with a high-V<sub>t</sub> keeper at a low dynamic node voltage is 89 to 3079 times smaller.

The chapter is organized as follows. The node voltage state dependence of the subthreshold leakage current characteristics of different domino logic circuits is described in Section 10.1. The effect of a dual-V<sub>t</sub> CMOS technology on the noise immunity characteristics of domino logic circuits is discussed in Section 10.2. The active mode delay and power dissipation of dual-V<sub>t</sub> domino logic circuits are presented in Section 10.3. The effect of the difference between the high and low threshold voltages provided in a dual-V<sub>t</sub> CMOS technology on the speed and power characteristics of the dual-V<sub>t</sub> domino logic circuit technique is evaluated in Section 10.4. The research results presented in this chapter are summarized in Section 10.5.

## 10.1. State Dependent Subthreshold Leakage Current Characteristics

A dual- $V_t$  domino logic circuit is shown in Fig. 10.2. The high- $V_t$  transistors are represented in Fig. 10.2 by a thick line in the channel region. The critical signal transitions that determine the delay of a domino logic circuit occur along the evaluation path. In a dual- $V_t$  domino circuit, therefore, all of the transistors that can be activated during the evaluation phase have a low- $V_t$ . The precharge phase transitions are typically not critical for the speed of a domino logic circuit. In order to exploit the excessive slack of the precharge paths, those transistors that are active during the precharge phase have a high- $V_t$ .

The node voltage dependence of the subthreshold leakage current characteristics of various dual-$V_t$ and low-$V_t$ domino logic circuits is evaluated in this section, assuming a 0.18 $\mu$m CMOS technology ($V_{tnlow} = |V_{tplow}| = 200$ mV, $V_{tnhigh} = |V_{tphigh}| = 500$ mV, and $T = 110^{\circ}$C). The variation of the subthreshold current conduction paths with the node voltages in a low-V<sub>t</sub> and dual-V<sub>t</sub> domino logic circuit is shown in Figs. 10.3 and 10.4, respectively.



Fig. 10.2. A dual-V<sub>t</sub> domino logic circuit.

Clock gating is an effective method for lowering the dynamic switching power in the unused portions of an integrated circuit. Moreover, when the clock is gated high, the pull-up transistor is turned off, ensuring that no short-circuit current conduction path exists between the power supply and ground (provided that the inputs are high). In this chapter, therefore, it is assumed that the clock is gated high in an idle domino logic circuit. The dynamic node is cyclically charged every clock period. Therefore, provided that the inputs are low after the clocks are gated, the dynamic node is maintained high during the idle mode, as illustrated in Figs. 10.3a and 10.4a. Alternatively, provided that the inputs are high after the clocks are gated, the dynamic node is discharged through the pull-down network transistors and the output transitions high, as shown in Figs. 10.3b and 10.4b. The subthreshold leakage current of a domino logic circuit varies dramatically between these two different states of the dynamic and output nodes, as shown in Fig. 10.5.



Fig. 10.3. Variation of the subthreshold leakage current conduction paths with the state of the dynamic and output nodes in a two input standard low- $V_t$  domino AND gate. (a) High (H) dynamic node voltage state. (b) Low (L) dynamic node voltage state. LVK: Low- $V_t$  keeper transistor. LVPU: Low- $V_t$  pull-up transistor. LVN: Low- $V_t$  NMOS transistor.



Fig. 10.4. Variation of the subthreshold leakage current conduction paths with the node voltages in a two input dual- $V_t$  domino AND gate. (a) High (H) dynamic node voltage. (b) Low (L) dynamic node voltage. HVK: High- $V_t$  keeper transistor. HVPU: High- $V_t$  pull-up transistor. HVN: High- $V_t$  NMOS transistor.



Fig. 10.5. Comparison of the subthreshold leakage current of low-V<sub>t</sub> and dual-V<sub>t</sub> domino logic circuits for the two states of the dynamic node. The leakage current of each gate is normalized to the leakage current of the corresponding low-V<sub>t</sub> gate with a high (H) dynamic node voltage. L: low dynamic node voltage. AND2, AND4, AND6, and AND8: 2, 4, 6, and 8 input, respectively, domino AND gates. OR2, OR4, and OR8: 2, 4, and 8 input, respectively, domino OR gates. MUX16: 16-bit domino multiplexer.

When the dynamic node voltage is high (the inputs are low), the total subthreshold leakage current of a domino gate is

$$I_{subthreshold-H} = I_{Leak-PD} + I_{Leak-P}, \tag{10.1}$$

where  $I_{Leak-PD}$  and  $I_{Leak-P}$  are the subthreshold leakage currents through the low-V<sub>t</sub> pull-down and output inverter pull-up transistors, respectively.

Alternatively, when the dynamic node voltage is low (the inputs are high), the total subthreshold leakage current of a low-V<sub>t</sub> domino gate is

$$I_{subthreshold-L} = I_{Leak-LVPU} + I_{Leak-LVK} + I_{Leak-LVN}, \tag{10.2}$$

where  $I_{Leak-LVPU}$ ,  $I_{Leak-LVK}$ , and  $I_{Leak-LVN}$  are the subthreshold leakage currents through the low-V<sub>t</sub> pull-up, keeper, and output inverter pull-down transistors, respectively.

The subthreshold leakage current through a stack of transistors is orders of magnitude smaller than the subthreshold leakage current through a single transistor [143]. When the inputs are low (the dynamic node is high), $I_{Leak-PD}$ decreases as more stacked devices are added to the pull-down network. Similarly, as the number of parallel pull-down paths is reduced, $I_{Leak-PD}$ decreases. Alternatively, when the inputs are high (the dynamic node is low), the subthreshold leakage current through the pull-up transistor increases as more stacked devices or parallel discharge paths are added to the pull-down network (due to the increasing width of the pull-up transistor required to drive the increased parasitic capacitance at the dynamic node). $I_{subthreshold-L}$ is higher than $I_{subthreshold-H}$ for a two input low-V<sub>t</sub> AND gate. As more stacked devices are added to an AND gate, $I_{subthreshold-H}$ decreases while $I_{subthreshold-L}$ further increases. For a low-V<sub>t</sub> domino AND gate, therefore, a high dynamic node voltage is preferred for producing a lower subthreshold leakage current. Alternatively, $I_{subthreshold-H}$ is higher than $I_{subthreshold-L}$ for a two input OR gate. As more parallel discharge paths are added to the pull-down network, both $I_{subthreshold-H}$ and $I_{subthreshold-L}$ increase. Since the increase in $I_{subthreshold-L}$ is smaller than the increase in $I_{subthreshold-H}$, a low dynamic node voltage is preferred for reduced subthreshold leakage current in wide fan-in OR types of gates.

As shown in Fig. 10.5, a low dynamic node voltage state produces a 2.8 to 13.2 times smaller subthreshold leakage current as compared to a high dynamic node voltage state in a low-$V_t$ domino circuit with parallel pull-down network paths, such as two, four, and eight input OR gates and a 16-bit multiplexer. Alternatively, in low-$V_t$ domino logic circuits with stacked pull-down network transistors, such as two, four, six, and eight input AND gates, a low dynamic node voltage state produces a 13.3% (AND2) to 153% (AND8) higher subthreshold leakage current as compared to a high dynamic node voltage state.

While the subthreshold leakage current characteristics of low- $V_t$  and dual- $V_t$  circuits are similar for a high dynamic node voltage state, the subthreshold leakage current characteristics of the two circuit techniques are dramatically different for a low dynamic node voltage state. When the inputs are high and the dynamic node voltage is low, the total subthreshold leakage current of a dual- $V_t$  domino gate is

$$I_{subthreshold-L} = I_{Leak-HVPU} + I_{Leak-HVK} + I_{Leak-HVN}, \tag{10.3}$$

where  $I_{Leak-HVPU}$ ,  $I_{Leak-HVK}$ , and  $I_{Leak-HVN}$  are the subthreshold leakage currents through the high-V<sub>t</sub> pull-up, keeper, and output inverter NMOS pull-down transistors, respectively.  $I_{Leak-HVPU}$ ,  $I_{Leak-HVK}$ , and  $I_{Leak-HVN}$  are orders of magnitude smaller than  $I_{Leak-LVPU}$ ,  $I_{Leak-LVK}$ , and  $I_{Leak-LVN}$ , respectively. Therefore, provided that the dynamic node is discharged in a domino logic circuit, the subthreshold leakage current can be significantly reduced by employing a dual-V<sub>t</sub> CMOS technology, as shown in Fig. 10.4. The subthreshold leakage current of dual-V<sub>t</sub> domino logic circuits with a low dynamic node voltage is 257 (MUX16) to 293 times (AND2, OR2, and OR4) smaller as compared to low-V<sub>t</sub> domino logic circuits with a low dynamic node voltage. Alternatively, as compared to low-V<sub>t</sub> domino logic circuits with a high dynamic node voltage, the subthreshold leakage current of dual-V<sub>t</sub> domino logic circuits with a low dynamic node voltage is 103 (AND8) to 3719 times (OR8) smaller.

## 10.2. Noise Immunity

The noise immunity of low-$V_t$ and dual-$V_t$ domino logic circuits is evaluated in this section. The noise immunity criterion used in this chapter is similar to the criterion described in [129]. The noise margin is the voltage amplitude of the DC noise signal applied to the inputs that produces a signal with the same amplitude at the output of a domino logic circuit, assuming a 1 GHz clock with a 50% duty cycle.
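The noise margin criterion can be read as a fixed point problem: find the input noise amplitude at which the output disturbance has the same amplitude. The sketch below illustrates that search only. The function `example_transfer` is a hypothetical stand-in for the input-to-output noise characteristic, which in practice would be obtained from circuit simulation; its shape, the 1.8 V supply, and the parameter names are assumptions.

```python
def noise_margin(transfer, v_max, steps=1000, tol=1e-6):
    """Smallest positive input noise amplitude v for which transfer(v) == v.

    Scans upward for the first interval where the output amplitude rises
    from below the input amplitude to at or above it, then bisects.
    """
    prev = v_max / steps
    for i in range(2, steps + 1):
        cur = v_max * i / steps
        if transfer(prev) < prev and transfer(cur) >= cur:
            lo, hi = prev, cur
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if transfer(mid) < mid:
                    lo = mid
                else:
                    hi = mid
            return 0.5 * (lo + hi)
        prev = cur
    return None  # no crossing found within (0, v_max]


def example_transfer(v_in, vdd=1.8, v_switch=0.6, sharpness=12):
    """Hypothetical output noise amplitude versus input noise amplitude."""
    x = (v_in / v_switch) ** sharpness
    return vdd * x / (1.0 + x)


if __name__ == "__main__":
    nm = noise_margin(example_transfer, v_max=1.8)
    print(f"noise margin of the example characteristic: {nm:.3f} V")
```

A stronger keeper or a higher output inverter gain threshold would shift the characteristic so that the crossing occurs at a larger input amplitude, which is how the comparisons in this section are expressed.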

The degradation in noise immunity due to employing dual-V<sub>t</sub> transistors is illustrated in Fig. 10.6. The drain current of a high-V<sub>t</sub> keeper transistor is reduced as compared to a low-V<sub>t</sub> keeper transistor with the same physical size. A dual-V<sub>t</sub> domino logic circuit with a high-V<sub>t</sub> keeper transistor, therefore, has lower noise immunity as compared to a standard low threshold voltage domino logic circuit. As illustrated in Fig. 10.6, the noise immunity of a dual-V<sub>t</sub> domino logic circuit with a high-V<sub>t</sub> keeper transistor (HVK) is reduced by 10% (MUX16) to 12.6% (AND2 and OR2) as compared to a low-V<sub>t</sub> domino logic circuit with the same size transistors.

The degradation in the noise immunity characteristics in a dual-V<sub>t</sub> circuit can be compensated by employing a low-V<sub>t</sub> keeper transistor rather than a high-V<sub>t</sub> keeper transistor. The noise immunity characteristics of dual-V<sub>t</sub> domino logic circuits with a low-V<sub>t</sub> keeper transistor (LVK) are illustrated in Fig. 10.6. Replacing a high-V<sub>t</sub> keeper transistor with a low-V<sub>t</sub> keeper transistor, as shown in Fig. 10.6, is not sufficient to fully compensate for the noise immunity degradation of a dual-V<sub>t</sub> domino logic circuit. The noise immunity depends not only on the physical size and threshold voltage of the keeper transistor but also on the gain of the output inverter. Since the low-V<sub>t</sub> NMOS pull-down transistor inside the output inverter of a low-V<sub>t</sub> domino logic circuit is replaced by a high-V<sub>t</sub> transistor in a dual-V<sub>t</sub> domino logic circuit (see Fig. 10.4), the high-to-low gain of the output inverter is reduced, further degrading the noise immunity. The noise immunity of a dual-V<sub>t</sub> domino logic circuit with a low-V<sub>t</sub> keeper transistor is 3.8% (AND4) to 6.3% (MUX16) lower as compared to a low-V<sub>t</sub> domino logic circuit with the same size transistors, as shown in Fig. 10.6.

One circuit technique for maintaining the same noise immunity as compared to a low-$V_t$ circuit is to employ a low-$V_t$ keeper while increasing the size of the high-$V_t$ pull-down transistor within the output inverter, thereby enhancing the high-to-low output gain and noise immunity. Another circuit technique to compensate the degradation in noise immunity is to increase the width of the high-$V_t$ keeper and the high-$V_t$ NMOS pull-down transistor within the output inverter. In this chapter, both techniques are applied to domino logic circuits to enhance noise immunity. A comparison of the subthreshold leakage current characteristics of dual-$V_t$ domino with a high-$V_t$ keeper (dual-$V_t$-HVK), dual-$V_t$ domino with a low-$V_t$ keeper (dual-$V_t$-LVK), and low-$V_t$ domino logic circuit techniques while providing similar noise immunity characteristics is shown in Fig. 10.7.



Fig. 10.6. Comparison of the noise immunity of low- $V_t$  and dual- $V_t$  domino logic circuits with the same size transistors. The noise margin of each gate is normalized to the noise margin of the corresponding low- $V_t$  gate. HVK: high- $V_t$  keeper. LVK: low- $V_t$  keeper.

Increasing the physical size of the keeper and the pull-down transistor within the output inverter increases both $I_{Leak-HVK}$ and $I_{Leak-HVN}$, thereby degrading the reduction in subthreshold leakage current achieved at a low dynamic node voltage state. As shown in Fig. 10.7, under similar noise immunity conditions, the subthreshold leakage current of dual-V<sub>t</sub> domino logic circuits with a high-V<sub>t</sub> keeper and a low dynamic node voltage is 224 (AND8) to 235 times (MUX16) smaller as compared to low-V<sub>t</sub> domino logic circuits with a low dynamic node voltage. Alternatively, as compared to low-V<sub>t</sub> domino logic circuits with a high dynamic node voltage, the subthreshold leakage current of dual-V<sub>t</sub> domino logic circuits with a high-V<sub>t</sub> keeper and a low dynamic node voltage is 89 (AND8) to 3079 times (OR8) smaller.



Fig. 10.7. Comparison of the subthreshold leakage current of low- $V_t$  and dual- $V_t$  domino logic circuits for the two states of the dynamic node (under similar noise immunity conditions). The leakage current of each gate is normalized to the leakage current of the corresponding low- $V_t$  gate with a high dynamic node voltage (H). L: low dynamic node voltage. Dual- $V_t$ -HVK: dual- $V_t$  domino with high- $V_t$  keeper. Dual- $V_t$ -LVK: dual- $V_t$  domino with low- $V_t$  keeper. Low- $V_t$ : standard low- $V_t$  domino circuit.

For the high voltage state of the dynamic node, the subthreshold leakage current characteristics of dual-$V_t$ domino circuits are similar to low-$V_t$ circuits. When the dynamic node voltage is low, the subthreshold leakage current of a dual-$V_t$ domino logic circuit is significantly increased, provided that a low-$V_t$ keeper rather than a high-$V_t$ keeper transistor is employed. The subthreshold leakage current conduction paths within a dual-$V_t$ domino logic circuit with a low-$V_t$ keeper at a low dynamic node voltage are shown in Fig. 10.8. When the dynamic node voltage is low, the total subthreshold leakage current of a dual-$V_t$ domino gate with a low-$V_t$ keeper is

$$I_{subthreshold-L} = I_{Leak-HVPU} + I_{Leak-LVK} + I_{Leak-HVN}, \tag{10.4}$$

where  $I_{Leak-LVK}$  is the subthreshold leakage current through a low-V<sub>t</sub> keeper transistor.
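The bookkeeping in Eqs. (10.1) through (10.4) can be summarized in a few lines of code. The following sketch is illustrative only: the per-transistor currents and the high-$V_t$ to low-$V_t$ leakage ratio `HIGH_VT_FACTOR` are hypothetical placeholders rather than simulation results from this chapter, and the keeper and inverter resizing needed for similar noise immunity is not modeled.

```python
from dataclasses import dataclass


@dataclass
class GateLeakage:
    """Subthreshold leakage of each transistor group in one domino gate (A)."""
    pull_down: float      # I_Leak-PD: low-Vt pull-down network
    inverter_pmos: float  # I_Leak-P: output inverter pull-up transistor
    pull_up: float        # I_Leak-LVPU or I_Leak-HVPU: precharge pull-up transistor
    keeper: float         # I_Leak-LVK or I_Leak-HVK: keeper transistor
    inverter_nmos: float  # I_Leak-LVN or I_Leak-HVN: output inverter pull-down transistor


def leakage_high_node(gate):
    """Eq. (10.1): dynamic node high, inputs low."""
    return gate.pull_down + gate.inverter_pmos


def leakage_low_node(gate):
    """Eqs. (10.2)-(10.4): dynamic node low, inputs high."""
    return gate.pull_up + gate.keeper + gate.inverter_nmos


def preferred_state(gate):
    """Return 'H' or 'L', whichever dynamic node state leaks less."""
    return "H" if leakage_high_node(gate) <= leakage_low_node(gate) else "L"


# Hypothetical values for a small AND-type gate.
HIGH_VT_FACTOR = 1.0 / 300.0  # assumed high-Vt to low-Vt leakage ratio per device
low_vt = GateLeakage(pull_down=2e-9, inverter_pmos=40e-9,
                     pull_up=20e-9, keeper=5e-9, inverter_nmos=25e-9)
dual_vt_hvk = GateLeakage(pull_down=2e-9, inverter_pmos=40e-9,
                          pull_up=20e-9 * HIGH_VT_FACTOR,
                          keeper=5e-9 * HIGH_VT_FACTOR,
                          inverter_nmos=25e-9 * HIGH_VT_FACTOR)
dual_vt_lvk = GateLeakage(pull_down=2e-9, inverter_pmos=40e-9,
                          pull_up=20e-9 * HIGH_VT_FACTOR,
                          keeper=5e-9,  # low-Vt keeper, Eq. (10.4)
                          inverter_nmos=25e-9 * HIGH_VT_FACTOR)

for name, gate in [("low-Vt", low_vt), ("dual-Vt-HVK", dual_vt_hvk),
                   ("dual-Vt-LVK", dual_vt_lvk)]:
    print(f"{name}: H = {leakage_high_node(gate):.2e} A, "
          f"L = {leakage_low_node(gate):.2e} A, "
          f"preferred state = {preferred_state(gate)}")
```

With these placeholder values the low-$V_t$ gate prefers a high dynamic node while both dual-$V_t$ variants prefer a low dynamic node, and the low-$V_t$ keeper term dominates the low-state leakage of the dual-$V_t$-LVK gate, in line with the qualitative argument of this section.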



Fig. 10.8. Subthreshold leakage current conduction paths for the low (L) voltage state of the dynamic node in a dual- $V_t$  domino AND gate with a low- $V_t$  keeper. LVK: Low- $V_t$  keeper transistor. HVPU: High- $V_t$  pull-up transistor. HVN: High- $V_t$  NMOS transistor.

Under similar noise immunity conditions, the subthreshold leakage current of dual-$V_t$ domino logic circuits with a low-$V_t$ keeper at a low dynamic node voltage is 21 (AND2, OR2, and OR4) to 41 times (MUX16) smaller as compared to low-$V_t$ domino logic circuits with a low dynamic node voltage. Alternatively, as compared to low-$V_t$ domino logic circuits with a high dynamic node voltage, the subthreshold leakage current of dual-$V_t$ domino logic circuits with a low-$V_t$ keeper at a low dynamic node voltage is 14 (AND4, AND6, and AND8) to 503 times (MUX16) smaller. Since $I_{Leak-LVK}$ is higher than $I_{Leak-HVK}$, the subthreshold leakage current of dual-$V_t$ domino circuits with a low-$V_t$ keeper is 5.7 (MUX16) to 10.9 times (OR4) higher than the subthreshold leakage current of dual-$V_t$ domino logic circuits with a high-$V_t$ keeper, under similar noise immunity conditions.

### 10.3. Power and Delay Characteristics During the Active Mode

The evaluation delay, precharge delay, and power consumption of dual- $V_t$  and low- $V_t$  domino logic circuits are evaluated in this section. The evaluation and precharge delay of example domino circuits are shown in Figs. 10.9 and 10.10, respectively. The power consumption characteristics of dual- $V_t$  and low- $V_t$  domino logic circuits are illustrated in Fig. 10.11.

As shown in Figs. 10.9 and 10.11, dual- $V_t$  domino logic circuits have reduced evaluation delay and power consumption as compared to low- $V_t$  domino logic circuits with the same size transistors. The enhancement in the delay and power characteristics is primarily due to the reduced contention current [33] of a high- $V_t$  keeper transistor and the increased low-to-high gain of the output inverter.

In dual-V<sub>t</sub> circuits, provided that the high-V<sub>t</sub> keeper and the output inverter are sized to maintain a similar noise immunity (SNI) as compared to low-V<sub>t</sub> circuits, except for the eight input AND gate, the evaluation delay is greater as compared to the low-V<sub>t</sub> circuits. The degradation in the evaluation speed is less than 11.8% (OR4). The increase in the precharge delay of the dual-V<sub>t</sub> domino circuits with high-V<sub>t</sub> keepers is less than 23.4% (OR8) as compared to the low-V<sub>t</sub> circuits. The increase in the active mode power consumption is less than 5.1% (AND2). For the six and eight input AND gates and the 16-bit multiplexer, the dual-V<sub>t</sub> domino logic circuit technique reduces the power consumed during both the active and standby modes while providing a similar noise immunity as compared to the low-V<sub>t</sub> circuit technique.



Fig. 10.9. Comparison of the evaluation delay of domino logic circuits. The evaluation delay of each gate is normalized to the delay of the corresponding low- $V_t$  gate. SNI: same noise immunity.



Fig. 10.10. Comparison of the precharge delay of domino logic circuits. The precharge delay of each gate is normalized to the precharge delay of the corresponding low- $V_t$  gate. SNI: same noise immunity.



Fig. 10.11. Comparison of the power consumption of domino logic circuits during the active mode. The power consumed by each gate is normalized to the power consumption of the corresponding low- $V_t$  gate. SNI: same noise immunity.

## 10.4. Dual Threshold Voltage CMOS Technology

The difference between the high and low threshold voltages ( $\Delta V_t$ ) is assumed in Sections 10.1, 10.2, and 10.3 to be 300 mV ( $V_{tnlow} = |V_{tplow}| = 200$  mV and  $V_{tnhigh} = |V_{tphigh}| = 500$  mV). The available high and low threshold voltages vary dramatically among different dual threshold voltage CMOS technologies [34], [35], [52], [142], [144]. The effect of the threshold voltages in a dual- $V_t$  CMOS technology on the speed and power characteristics of the dual- $V_t$  domino logic circuit technique is evaluated in this section. Under similar noise immunity conditions, the subthreshold leakage current, evaluation delay, precharge delay, and active mode power dissipation of the low- $V_t$  and dual- $V_t$  domino circuits are evaluated for three different sets of high and low threshold voltages (Tech<sub>1</sub>: { $V_{tnlow} = |V_{tplow}| = 200$  mV and  $V_{tnhigh} = |V_{tphigh}| = 300$  mV}, Tech<sub>2</sub>: { $V_{tnlow} = |V_{tplow}| = 200$  mV and  $V_{tnhigh} = |V_{tphigh}| = 400$  mV}, and Tech<sub>3</sub>: { $V_{tnlow} = |V_{tplow}| = 200$  mV and  $V_{tnhigh} = |V_{tphigh}| = 500$  mV}).
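For orientation, the three technology sets can be tabulated together with a first-order, per-device estimate of how the leakage ratio between equally sized low-$V_t$ and high-$V_t$ transistors scales with $\Delta V_t$. The threshold voltages are taken from the text; the subthreshold swing is a hypothetical value, and the gate-level ratios reported in this chapter also depend on transistor sizing and on which devices leak in each state, so this estimate is not expected to reproduce them.

```python
# Threshold voltages in mV as listed in the text: (low, high).
TECHNOLOGIES = {
    "Tech1": (200, 300),
    "Tech2": (200, 400),
    "Tech3": (200, 500),
}

ASSUMED_SWING_MV = 120.0  # hypothetical subthreshold swing (mV/decade)


def delta_vt_mv(v_low, v_high):
    """Difference between the high and low threshold voltages (mV)."""
    return v_high - v_low


def first_order_leakage_ratio(dvt_mv, swing_mv=ASSUMED_SWING_MV):
    """Leakage of a low-Vt device divided by that of an equally sized
    high-Vt device, assuming a purely exponential dependence on Vt."""
    return 10.0 ** (dvt_mv / swing_mv)


for name, (v_low, v_high) in TECHNOLOGIES.items():
    dvt = delta_vt_mv(v_low, v_high)
    ratio = first_order_leakage_ratio(dvt)
    print(f"{name}: delta Vt = {dvt} mV, first-order leakage ratio ~ {ratio:.0f}x")
```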

For all three dual threshold voltage CMOS technologies, dual- $V_t$  domino logic circuits based on a high- $V_t$  keeper transistor consume less subthreshold leakage energy as compared to dual- $V_t$  domino logic circuits based on a low- $V_t$  keeper, under similar noise immunity conditions. In this section, therefore, a comparison of the electrical characteristics of only the dual- $V_t$  domino logic circuits based on a high- $V_t$  keeper transistor and standard low- $V_t$  domino logic circuits is presented.

The range of savings in subthreshold leakage current provided by the dual-$V_t$ domino logic circuit technique for three different sets of dual threshold voltages is shown in Fig. 10.12. For Tech<sub>3</sub>, the subthreshold leakage current of the dual-$V_t$ domino logic circuits at a low dynamic node voltage is 224X (AND8) to 235X (MUX16) smaller as compared to the low-$V_t$ domino logic circuits at a low dynamic node voltage state. For a smaller $\Delta V_t$, the difference between $I_{Leak-HVPU}$, $I_{Leak-HVK}$, and $I_{Leak-HVN}$, and $I_{Leak-LVPU}$, $I_{Leak-LVK}$, and $I_{Leak-LVN}$, respectively, is also smaller. The achievable savings in subthreshold leakage energy is, therefore, reduced with the decreased difference in the dual threshold voltages. As illustrated in Fig. 10.12, when the difference between the high and low threshold voltages is scaled to 100 mV (Tech<sub>1</sub>), the subthreshold leakage current of the dual-$V_t$ domino logic circuits at a low dynamic node voltage.

As compared to the low- $V_t$  domino logic circuits at a high dynamic node voltage state, the subthreshold leakage current of the dual- $V_t$  domino logic circuits at a low dynamic node voltage state is 89X (AND8) to 3079X (OR8) smaller for Tech<sub>3</sub>. When  $\Delta V_t$  is decreased to 100 mV (Tech<sub>1</sub>), the subthreshold leakage current of the dual- $V_t$  domino logic circuits is 3X (AND8) to 99X (OR8) smaller as compared to the low- $V_t$  domino logic circuits with a high dynamic node voltage. For all three dual threshold voltage technologies, the effectiveness of the dual- $V_t$  circuit technique for reducing subthreshold leakage current is greater in wide fan-in OR and AND-OR types of gates.



Fig. 10.12. The range of savings in subthreshold leakage current provided by the dual-V<sub>t</sub> domino logic circuit technique as compared to the standard low-V<sub>t</sub> domino logic circuit technique for three different sets of dual threshold voltages. Min\_L and Max\_L: minimum and maximum, respectively, of the reduction in subthreshold leakage current as compared to the low-V<sub>t</sub> domino logic circuits at a low dynamic node voltage state. Min\_H and Max\_H: minimum and maximum, respectively, of the reduction in subthreshold leakage current as compared to the low-V<sub>t</sub> domino logic circuits at a high dynamic node voltage state.

The range of difference in the evaluation delay of the dual- $V_t$  circuits as compared to the low- $V_t$  circuits is shown in Fig. 10.13. All of the delay differences are evaluated as a per cent of the delay of the corresponding low- $V_t$  gate. A negative difference indicates a higher evaluation speed as compared to a low- $V_t$  circuit. For Tech<sub>3</sub>, the difference between the evaluation delay of the dual- $V_t$  and low- $V_t$  domino logic circuits varies between -0.8% (AND8) and 11.8% (OR4). When  $\Delta V_t$  is reduced to 100 mV, the difference between the evaluation delays varies between -1.7% (AND8) and 1.9% (OR8).
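The normalization used for the delay and power comparisons in this section can be written as a one-line function. The sketch below uses made-up delay values purely to show the sign convention; they are not measurements from this chapter.

```python
def percent_difference(dual_vt_value, low_vt_value):
    """Difference of a dual-Vt metric relative to the corresponding low-Vt
    gate, in per cent. Negative values mean the dual-Vt circuit is
    faster (for delay) or consumes less (for power)."""
    return 100.0 * (dual_vt_value - low_vt_value) / low_vt_value


# Hypothetical evaluation delays in picoseconds.
low_vt_delay_ps = 100.0
for label, dual_vt_delay_ps in [("faster dual-Vt gate", 99.2),
                                ("slower dual-Vt gate", 111.8)]:
    diff = percent_difference(dual_vt_delay_ps, low_vt_delay_ps)
    print(f"{label}: {diff:+.1f}%")
```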



Fig. 10.13. The range of difference in evaluation delay for the dual- $V_t$  circuits as compared to the low- $V_t$  domino logic circuits for three different sets of dual threshold voltages. A negative difference indicates a smaller evaluation delay as compared to a low- $V_t$  circuit. Min: minimum difference in evaluation delay as compared to the low- $V_t$  domino logic circuits. Max: maximum difference in evaluation delay as compared to the low- $V_t$  domino logic circuits.

The range of difference in the precharge delay for dual-V<sub>t</sub> circuits as compared to low-V<sub>t</sub> circuits is shown in Fig. 10.14. For Tech<sub>3</sub>, the difference between the precharge delay of the dual-V<sub>t</sub> and low-V<sub>t</sub> domino logic circuits varies between 11.4% (AND8) and 23.3% (OR8). For Tech<sub>1</sub>, the difference of the precharge delay varies between 5.7% (AND8) and 10.6% (OR2).

The range of difference between the power consumed by the dual-$V_t$ and low-$V_t$ circuits during the active mode of operation is shown in Fig. 10.15. All of the power differences are evaluated as a per cent of the power consumption of the corresponding low-$V_t$ gate. A negative difference indicates lower power consumption as compared to a low-$V_t$ circuit. For Tech<sub>3</sub>, the difference between the power consumed by the dual-$V_t$ and low-$V_t$ domino logic circuits varies between -3.9% (AND8) and 5.1% (AND2). For the six and eight input AND gates and the 16-bit multiplexer, the dual-$V_t$ domino logic circuit technique reduces the power consumption during the active mode as well as the standby mode. When $\Delta V_t$ is reduced to 100 mV, the difference in the power consumption varies between -1.9% (AND8) and 0% (AND2). For Tech<sub>1</sub>, the dual-$V_t$ domino logic circuit technique reduces the power consumption during the active mode as well as the standby mode as compared to the low-$V_t$ circuit technique for all of the domino gates (except for the two input AND gate for which the active power consumption of the low-$V_t$ and dual-$V_t$ circuit techniques is similar).



Fig. 10.14. The range of difference in precharge delay between the dual- $V_t$  and low- $V_t$  domino logic circuits for three different sets of dual threshold voltages. Min: minimum difference in precharge delay as compared to low- $V_t$  domino logic circuits. Max: maximum difference in precharge delay as compared to low- $V_t$  domino logic circuits.



Fig. 10.15. The range of difference in the power consumption (during the active mode) of the dual- $V_t$  and low- $V_t$  domino logic circuits for three different sets of dual threshold voltages. A negative difference indicates smaller power consumption as compared to a low- $V_t$  circuit. Min: minimum difference in power consumption as compared to low- $V_t$  domino logic circuits. Max: maximum difference in power consumption as compared to low- $V_t$  domino logic circuits.

## 10.5. Chapter Summary

The node voltage dependent subthreshold leakage current characteristics of domino logic circuits are examined in this chapter. A discharged dynamic node is preferred for reducing the leakage current in a dual-V<sub>t</sub> domino logic circuit. Alternatively, a charged dynamic node is preferable for smaller subthreshold leakage energy in a standard low-V<sub>t</sub> domino logic circuit with stacked pull-down devices, such as an AND gate.

Proper keeper and output inverter sizes are required in a dual- $V_t$  domino logic circuit with a high- $V_t$  keeper in order to maintain a similar noise immunity as compared to a standard low- $V_t$  domino logic circuit. As an alternative dual- $V_t$  domino technique for enhanced noise immunity, the effect of a low threshold voltage keeper transistor on the leakage current characteristics is also evaluated. Under similar noise immunity conditions as compared to standard low- $V_t$  domino logic circuits, the savings in subthreshold leakage energy offered by dual- $V_t$  domino circuits with a high- $V_t$  keeper is 5.7 to 10.9 times greater as compared to the savings in leakage current offered by dual- $V_t$  domino circuits with a low- $V_t$  keeper.

Under similar noise immunity conditions, the subthreshold leakage current of dual-V<sub>t</sub> domino logic circuits with a low dynamic node voltage is 224 to 235 times smaller as compared to low-V<sub>t</sub> domino logic circuits with a low dynamic node voltage. Alternatively, as compared to low-V<sub>t</sub> domino logic circuits with a high dynamic node voltage, the subthreshold leakage current of dual-V<sub>t</sub> domino logic circuits with a low dynamic node voltage is 89 to 3079 times smaller. The degradation in the precharge and evaluation speed of dual-V<sub>t</sub> domino circuits is less than 23.4% and 11.8%, respectively, as compared to standard low-V<sub>t</sub> domino circuits. The increase in active mode power consumption is less than 5.1%.

The effect of the difference of the high and low threshold voltages provided in a dual- $V_t$  CMOS technology on the speed and power characteristics of the dual- $V_t$  domino logic circuit technique is also evaluated. The dual- $V_t$  domino logic circuit technique provides a significant savings in subthreshold leakage energy down to a 100 mV difference between the high and low threshold voltages in a 0.18  $\mu$ m CMOS technology.

# **Chapter 11**

# Sleep Switch Dual Threshold Voltage Domino Logic with Reduced Standby Leakage Current

The subthreshold leakage current characteristics of domino logic circuits are examined in Chapter 10. A dual threshold voltage domino logic circuit consumes significantly lower subthreshold leakage energy at a low dynamic node voltage state as compared to a high dynamic node voltage state. The dynamic node voltage dependent asymmetry of the subthreshold leakage current characteristics of dual threshold voltage (dual-V<sub>t</sub>) domino gates was first noted by Kao [136]. A circuit technique to exploit this asymmetry has also been presented in [136]. Gating all of the inputs of the first stage of a domino pipeline is proposed in order to force dual-V<sub>t</sub> domino gates into a low leakage sleep state [136].

The energy and delay overhead for entering and leaving the sleep mode, however, has not been addressed in [136]. Due to the additional gates at the inputs, significant dynamic switching energy is consumed to activate the sleep mode with the technique described in [136]. Additional energy is dissipated to precharge all of the dynamic nodes while reactivating a domino logic circuit at the end of an idle period. In order to justify the use of additional circuitry to place a dual-V<sub>t</sub> circuit into a low leakage state, the total energy consumed to enter and leave the standby mode must be significantly less than the savings in standby leakage energy. Gating all of the inputs of the first stage of a domino circuit in a domino pipeline also increases the circuit area and active mode power. Furthermore, the circuit performance during the active mode is degraded due to the additional gates at the inputs. A circuit technique with low delay and energy overhead for placing a dual-V<sub>t</sub> domino logic circuit into a low leakage state is, therefore, highly desirable.

A circuit technique is proposed in this chapter for lowering the subthreshold leakage energy consumption of domino logic circuits. The proposed circuit technique employs sleep switches and a dual threshold voltage CMOS technology in order to place an idle domino logic circuit into a low leakage state. An eight bit domino carry lookahead adder has been designed based on this circuit technique. The sleep switch circuit technique reduces the leakage energy by up to 830 times as compared to a standard low threshold voltage (low-V<sub>t</sub>) domino circuit. The dual threshold voltage domino adder enters and leaves the sleep mode within a single clock cycle. The sleep switch circuit technique enhances the effectiveness of a dual-V<sub>t</sub> CMOS technology to reduce the subthreshold leakage current by strongly turning off all of the high threshold voltage (high-V<sub>t</sub>) transistors, independent of the input vector. The sleep switch circuit technique lowers the subthreshold leakage energy by up to 714 times as compared to a standard dual-V<sub>t</sub> domino logic circuit. The energy overhead of the circuit technique is low, permitting the sleep transistors to be activated during idle periods as short as 57 clock cycles so as to reduce the total power consumption.

Previously published leakage control techniques applicable to dual-$V_t$ domino logic circuits are discussed in Section 11.1. The operation of the sleep switch dual-$V_t$ domino logic circuit technique is described in Section 11.2. Simulation results characterizing the standby leakage energy and active mode delay and power of the sleep switch technique as compared to standard dual-$V_t$ and low-$V_t$ domino circuits are presented in Section 11.3. The noise immunity characteristics of the sleep switch dual threshold voltage domino logic circuit technique are evaluated in Section 11.4. The research results presented in this chapter are summarized in Section 11.5.

## 11.1. Previously Published Sleep Mode Circuit Techniques

Standard low-$V_t$ and dual-$V_t$ domino logic circuits are shown in Fig. 11.1. As discussed in Chapter 10, if all of the high-$V_t$ transistors are cutoff in a dual-$V_t$ domino logic circuit, the leakage current is significantly reduced as compared to a low-$V_t$ circuit. The clock is gated high, cutting off the high-$V_t$ pullup transistors when a domino logic circuit is idle. In a standard dual-$V_t$ domino logic circuit, the modes of operation of the remaining portion of the high-$V_t$ transistors (other than the pullup transistors) are determined by the input vectors applied after the clock is gated high.



Fig. 11.1. Standard domino logic circuits. (a) Standard low- $V_t$  domino logic circuit. (b) Standard dual- $V_t$  domino logic circuit. High- $V_t$  transistors are symbolically represented by a thick line in the channel region.

Subthreshold leakage current exponentially decreases with increasing threshold voltage. The leakage current of a cutoff high-V<sub>t</sub> transistor is orders of magnitude lower as compared to a low-V<sub>t</sub> transistor [29], [145]. Assuming a subthreshold slope of 85 mV/decade (a typical number in current CMOS technologies [52], [54]), an 85 mV increase in the threshold voltage of a transistor reduces the subthreshold leakage current by ten times. Leakage currents in a dual-V<sub>t</sub> circuit can be reduced by employing a greater number of high-V<sub>t</sub> transistors [145]. Unless all of the high-V<sub>t</sub> transistors are strongly cutoff, the potential savings in energy from the dual-V<sub>t</sub> domino circuit technique cannot be fully exploited. Circuit techniques to place a domino logic circuit into a low leakage state regardless of the input vectors and the initial circuit node voltage states (before the clock is gated) are desirable. Dual-V<sub>t</sub> domino logic circuit techniques with different standby control mechanisms have been proposed in the literature [34], [35], [130], [136], [141], [142].
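The relationship between the subthreshold slope and the leakage reduction quoted above can be illustrated with a short calculation. The following is a minimal sketch, assuming a first-order model in which the leakage current of a single cutoff transistor falls by one decade for each increase of the threshold voltage equal to the subthreshold slope. The function name and the list of threshold voltage differences are chosen here for illustration only. The sketch ignores drain induced barrier lowering, body bias, and circuit topology, and therefore does not predict the gate level results reported in this chapter.

```python
def leakage_reduction_factor(delta_vt_mv: float,
                             slope_mv_per_decade: float = 85.0) -> float:
    """First-order leakage reduction of one cutoff transistor.

    delta_vt_mv: increase in threshold voltage (mV).
    slope_mv_per_decade: subthreshold slope (mV/decade); 85 mV/decade is the
        typical value quoted in the text.
    """
    # One decade of leakage reduction per 'slope' millivolts of added V_t.
    return 10.0 ** (delta_vt_mv / slope_mv_per_decade)


if __name__ == "__main__":
    # 85 mV reproduces the tenfold reduction stated in the text. 100 mV and
    # 300 mV correspond to the threshold voltage differences considered in
    # Chapters 10 and 11 (illustrative use only).
    for delta in (85.0, 100.0, 300.0):
        factor = leakage_reduction_factor(delta)
        print(f"delta V_t = {delta:5.1f} mV -> leakage reduced ~{factor:.0f}x")
```

The exponential dependence is the reason why a cutoff high-V<sub>t</sub> transistor contributes a negligible leakage current, and why the savings are lost when a high-V<sub>t</sub> transistor is left in the strong inversion region.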

A dual-V<sub>t</sub> circuit technique was proposed in [136] to reduce the leakage current in domino pipelines. The dual-V<sub>t</sub> circuit technique described in [136] requires the input signals of the first stage circuits in a domino pipeline to be gated. After forcing the first stage of the domino gates to evaluate and discharge, the domino gates of the subsequent stages in the pipeline also evaluate and discharge in a domino fashion. The technique proposed in [136], however, is ineffective in placing a circuit into a low leakage state if some of the domino gates in a cascaded domino logic circuit require inverted signals (such as an XOR domino gate generating a sum bit at the output stage of a domino carry lookahead adder). Most domino logic circuits cannot be placed into a minimum leakage state in which all of the high-V<sub>t</sub> transistors are strongly cutoff simply by gating the input vectors of the first stage of a domino circuit. The technique proposed in [136] also requires a significant dynamic switching energy overhead for activating the sleep mode due to the additional gates at the inputs. The dual-V<sub>t</sub> domino circuit proposed in [136] only offers a savings in energy if the circuit stays idle for a long time. Furthermore, gating all of the inputs of the first stage of a domino circuit in a domino pipeline increases the circuit area and active mode power. The circuit performance during the active mode is also degraded due to the additional gates at the inputs.

An alternative dual-V<sub>t</sub> technique was proposed in [130] to reduce the dynamic power, propagation delay, and area overhead as compared to the technique proposed in [136]. Although the delay and area overhead is reduced by the technique proposed in [130], the energy consumed during the standby mode is higher as compared to the circuit proposed in [136]. This increased standby energy is primarily due to the NMOS transistor inside the output inverter of the domino gates in the first stage of each domino pipeline not being completely turned off and because the keeper has a low-V<sub>t</sub> in the technique described in [130]. Dual-V<sub>t</sub> domino logic circuits based on a low-V<sub>t</sub> keeper transistor consume significantly higher subthreshold leakage energy as compared to dual-V<sub>t</sub> circuit techniques based on a high-V<sub>t</sub> keeper transistor (see Chapter 10 for a quantitative comparison of the two circuit techniques).

The approach of utilizing the leakage currents of the pulldown path transistors was proposed in [142] in order to place a dual-V<sub>t</sub> domino logic circuit into a low leakage state. High-V<sub>t</sub> switches are employed in series with the keeper and the NMOS transistor of the output inverter in a domino circuit. When the circuit is active, these high-V<sub>t</sub> switches are on and the circuit operates similarly to a standard dual-V<sub>t</sub> circuit. When the circuit is idle, the high-V<sub>t</sub> series switches are cutoff by a sleep signal, isolating the dynamic node from the power supply. The floating dynamic node slowly discharges due to the leakage current of the transistors along the pulldown path. The high-V<sub>t</sub> switch in series with the NMOS transistor of the output inverter ensures that no short-circuit power is consumed during the slow discharge of the dynamic node. A high-V<sub>t</sub> series transistor at the output inverter, however, degrades the precharge delay. Furthermore, a high-V<sub>t</sub> transistor in series with a keeper degrades the noise immunity. To minimize the degradation in noise immunity and precharge delay, the size of these series switches should be increased. Wider series transistors, however, increase the energy overhead of activating the standby leakage control mechanism. Increasing the series transistor size also increases the area overhead of this technique. Another disadvantage of this technique is the low speed of the proposed mechanism for placing a circuit into a low leakage state. The circuit technique proposed in [142], therefore, may not be feasible for fine-grain leakage reduction during short idle periods (a few tens to hundreds of clock cycles) in high performance integrated circuits.

## 11.2. Dual Threshold Voltage Domino Logic Employing Sleep Switches

A low energy and delay overhead circuit technique is presented in this chapter to lower the subthreshold leakage currents in an idle domino logic circuit. The circuit technique employs sleep switches to place a dual- $V_t$  domino logic circuit into a low leakage state within a single clock cycle. A domino logic circuit based on the sleep switch dual- $V_t$  circuit technique is shown in Fig. 11.2.



Fig. 11.2. Sleep switch dual- $V_t$  domino logic circuit technique. High- $V_t$  transistors are symbolically represented by a thick line in the channel region.

A high-V<sub>t</sub> NMOS switch is added to the dynamic node of a domino circuit as shown in Fig. 11.2. The operation of this transistor is controlled by a separate sleep signal. During the active mode of operation, the sleep signal is set low, the sleep switch is cutoff, and the proposed dual-$V_t$ circuit operates as a standard dual-$V_t$ domino circuit. During the standby mode of operation, the clock signal is maintained high, turning off the high-$V_t$ pull-up transistor of each domino gate. The sleep signal transitions high, turning on the sleep switch. The dynamic node of the domino gate is discharged through the sleep switch, thereby turning off the high-$V_t$ NMOS transistor within the output inverter. The output transitions high, cutting off the high-$V_t$ keeper. Following the low-to-high transition of the output of a sleep switch dual-$V_t$ domino gate, the subsequent gates (fed by the non-inverting signals) also evaluate and discharge in a domino fashion. After the node voltages settle to a steady state, all of the high-$V_t$ transistors are strongly cutoff, significantly reducing the subthreshold leakage current. Note that this technique, requiring no additional gating on the input signals while strongly turning off all of the high-$V_t$ transistors within a single clock cycle, is significantly more power, delay, and area efficient as compared to the techniques proposed in [130], [136], and [142].
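The sequence of events described above can be summarized with a logic level model of a single gate. The following is a minimal sketch, assuming ideal logic levels and a gate that has reached steady state. The names `GateState` and `settle` are hypothetical and are introduced only for this illustration. The model does not capture delay, leakage, or noise, and treats an asserted sleep signal during precharge as an error because that combination would produce a short-circuit current.

```python
from dataclasses import dataclass


@dataclass
class GateState:
    dynamic_node: int        # 1 = charged, 0 = discharged
    output: int              # output of the static inverter
    precharge_on: bool       # high-Vt PMOS pullup conducts when the clock is low
    keeper_on: bool          # high-Vt PMOS keeper conducts when the output is low
    inverter_nmos_on: bool   # high-Vt NMOS conducts when the dynamic node is high

    def all_high_vt_off(self) -> bool:
        """True when every high-Vt transistor in the gate is cut off."""
        return not (self.precharge_on or self.keeper_on or self.inverter_nmos_on)


def settle(clock: int, sleep: int, pulldown_conducts: bool,
           previous_dynamic: int = 1) -> GateState:
    """Steady-state logic levels of one sleep switch dual-Vt domino gate."""
    if clock == 0:
        if sleep == 1:
            raise ValueError("sleep must be low while the gate is precharging")
        dyn = 1                      # precharge phase
    elif sleep == 1 or pulldown_conducts:
        dyn = 0                      # discharged by the sleep switch or the inputs
    else:
        dyn = previous_dynamic       # held by the keeper
    out = 1 - dyn
    return GateState(dynamic_node=dyn, output=out, precharge_on=(clock == 0),
                     keeper_on=(out == 0), inverter_nmos_on=(dyn == 1))


if __name__ == "__main__":
    # Standby: clock gated high, sleep asserted, inputs do not discharge the node.
    print(settle(clock=1, sleep=1, pulldown_conducts=False))
    # Same inputs without the sleep switch: the node stays charged.
    print(settle(clock=1, sleep=0, pulldown_conducts=False))
```

With the clock gated high and the sleep signal asserted, the model reports that all three high-V<sub>t</sub> devices are off regardless of the input vector, which is the property exploited by the technique.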

## 11.3. Simulation Results

Eight input clock-delayed domino carry lookahead adders based on the low- $V_t$ , standard dual- $V_t$ , and sleep switch circuit techniques are evaluated assuming a 0.18  $\mu$ m CMOS technology ( $V_{tnlow} = |V_{tplow}| = 200 \text{ mV}$ ,  $V_{tnhigh} = |V_{tphigh}| = 500 \text{ mV}$ , and  $T = 110^{\circ}$ C). The block diagram of a clock-delayed domino carry lookahead adder based on the sleep switch dual- $V_t$  circuit technique is shown in Fig. 11.3. Each sum output drives a capacitive load of 10 fF. A 1 GHz clock with a 50% duty cycle is applied to the domino logic circuits. All of the common transistors in the sleep switch and standard dual- $V_t$  adders are sized the same.

In the sleep switch adder, all of the propagate (P), generate (G), and sum (S) domino gates have sleep switches. When the domino adder is idle, the dynamic nodes of the P and G domino gates (in the first stage propagate and generate (PG) block) are forced to discharge via sleep switches. The domino gates within the lookahead carry (C) block do not have sleep switches. Following the low-to-high transition of the outputs of the P and G gates, the domino gates within the carry block also evaluate and discharge in a domino fashion. Some of the signals originating from the PG and C blocks are inverted before being fed into the sum block (see Fig. 11.3). The domino logic circuits within the sum block, therefore, also require sleep switches in order to place the circuits into a low leakage state.
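The reason the sum block requires sleep switches while the carry block does not can be illustrated with a simplified chain model. The following is a minimal sketch, assuming that each stage is reduced to a single input gate that discharges when the input is high, and that the primary input is low (the case in which the inputs alone do not discharge the first stage). The type `Stage` and the function `standby_dynamic_nodes` are hypothetical names introduced for this illustration and do not represent the actual adder netlist.

```python
from typing import List, Tuple

# (has_sleep_switch, input_is_inverted) for each stage of the chain
Stage = Tuple[bool, bool]


def standby_dynamic_nodes(stages: List[Stage], first_input: int = 0) -> List[int]:
    """Dynamic node state (1 = charged, 0 = discharged) of each stage in standby.

    The clock is assumed to be gated high and the sleep signal asserted.
    """
    nodes = []
    signal = first_input
    for has_sleep, inverted in stages:
        gate_input = 1 - signal if inverted else signal
        # A stage discharges through the sleep switch or through the pulldown
        # network when the (possibly inverted) input is high.
        dyn = 0 if (has_sleep or gate_input == 1) else 1
        nodes.append(dyn)
        signal = 1 - dyn  # output of the static inverter of the stage
    return nodes


if __name__ == "__main__":
    # PG stage with a sleep switch followed by two non-inverting stages.
    print(standby_dynamic_nodes([(True, False), (False, False), (False, False)]))
    # Last stage fed by an inverted signal and lacking a sleep switch.
    print(standby_dynamic_nodes([(True, False), (False, False), (False, True)]))
    # Same chain with a sleep switch added to the last stage.
    print(standby_dynamic_nodes([(True, False), (False, False), (True, True)]))
```

In the second case the last stage remains charged, corresponding to the high leakage state, while adding a sleep switch to the stage fed by the inverted signal restores the low leakage state for the entire chain.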

The input vectors applied to an adder are listed in Table 11.1. The leakage characteristics of the circuits are evaluated for six input vectors,  $V_0$  to  $V_5$ .  $C_{out}$  (S<sub>8</sub>) is evaluated through the critical path of the carry chain within the carry block for the input vector  $V_2$  (V<sub>3</sub>). The delay and active mode power are calculated for  $V_2$  and  $V_3$ .



Fig. 11.3. Block diagram of a clock-delayed domino carry lookahead adder with the sleep switch dual- $V_t$  circuit technique.

TABLE 11.1
INPUT VECTORS APPLIED TO AN ADDER

|   | $V_0$ | $V_1$ | $V_2$ | $V_3$ | $V_4$ | $V_5$ |
|---|-------|-------|-------|-------|-------|-------|
| A | 0     | 0     | 1     | 1     | 255   | 255   |
| B | 0     | 255   | 255   |       | 255   | 0     |

The sleep switch circuit technique significantly reduces the subthreshold leakage current as compared to both low-V<sub>t</sub> and standard dual-V<sub>t</sub> circuits. The standby leakage energy characteristics of the adders based on the standard dual-V<sub>t</sub>, low-V<sub>t</sub>, and sleep switch circuit techniques are presented in Section 11.3.1. The subthreshold leakage current characteristics of a standard domino logic circuit display a strong dependence on the input vectors when the circuit is at a high dynamic node voltage state. The stack effect on the subthreshold leakage current characteristics of a domino logic circuit at a high dynamic node voltage state is described in Section 11.3.2. The sleep switch circuit technique also enhances the active mode delay and power characteristics as compared to a low-V<sub>t</sub> circuit. The active mode delay and power characteristics of the circuit technique are discussed in Section 11.3.3. The sleep/wake-up delay and energy overhead are presented in Section 11.3.4.

#### 11.3.1. Subthreshold Leakage Energy Reduction

The standby leakage energy characteristics of the low- $V_t$ , standard dual- $V_t$ , and sleep switch dual- $V_t$  adders are evaluated in this section. When a low- $V_t$ , standard dual- $V_t$ , or sleep switch domino logic circuit is idle, the clock is gated high. In a sleep switch circuit, the sleep transistors are activated after clock gating. The leakage energy consumption (per clock cycle) of the low- $V_t$ , standard dual- $V_t$ , and sleep switch adders is shown in Fig. 11.4.

The leakage energy of a standard dual-$V_t$ circuit is reduced by 1.2 X to 2.8 X as compared to a low-$V_t$ circuit. The standby leakage energy of the standard low-$V_t$ and dual-$V_t$ circuits is dependent on the applied input vector after the clock signal is gated high. The dynamic nodes of all of the domino logic circuits are precharged when the clock is low. After the clock transitions high, a portion of these domino gates evaluates and discharges provided that a necessary input combination to discharge the dynamic node is applied. A high dynamic node voltage state is typically the highest leakage state for a dual-$V_t$ domino logic gate since all of the high-$V_t$ transistors (other than the pullup transistors) operate in the strong inversion region. As discussed in Section 11.1, the advantages of a dual-$V_t$ CMOS technology for reducing the leakage current are maximized when all of the high-$V_t$ transistors are strongly cutoff during the idle mode. For $V_0$, the dynamic nodes of all of the domino logic gates of a standard dual-$V_t$ adder are maintained high during the idle mode. The $V_0$ vector, therefore, produces the maximum leakage current in a standard dual-$V_t$ adder. For $V_0$, the subthreshold leakage current is produced by the low-$V_t$ transistors rather than the high-$V_t$ transistors in a standard dual-$V_t$ adder. The subthreshold leakage current of the domino gates within the standard low-$V_t$ and dual-$V_t$ adders is, therefore, similar for $V_0$. The small difference between the subthreshold leakage current characteristics of the standard low-$V_t$ and dual-$V_t$ adders for $V_0$ is caused by the reduced leakage of the dual-$V_t$ delay elements in a standard dual-$V_t$ domino adder.



Fig. 11.4. A comparison of the leakage energy (per clock cycle) of the adder circuits with the low- $V_t$ , standard dual- $V_t$ , and sleep switch circuit techniques for six different input vectors.

As shown in Fig. 11.4, the proposed sleep switch circuit technique minimizes the leakage energy for all of the input vectors as compared to both the low-$V_t$ and standard dual-$V_t$ circuits. Activating the sleep transistors places all of the domino gates into a low leakage state for any given input vector. The reduction in leakage energy offered by the sleep switch circuit technique varies between 461 X ($V_2$) and 830 X ($V_0$) as compared to a low-$V_t$ adder. The proposed circuit technique enhances the effectiveness of a dual-$V_t$ CMOS technology to reduce the subthreshold leakage current by cutting off all of the high-$V_t$ transistors. An adder based on the sleep switch circuit technique dissipates 167 X ($V_2$) to 714 X ($V_0$) lower leakage energy as compared to a standard dual-$V_t$ adder.

#### 11.3.2. Stack Effect in Domino Logic Circuits

For  $V_1$  and  $V_5$ , the dynamic nodes of the generate and carry gates are maintained high while the propagate and sum gates evaluate and discharge in a standard domino adder. In a standard dual- $V_t$  adder, the subthreshold leakage current in the propagate and sum gates is 2415 X and 1149 X, respectively, smaller for both  $V_1$  and  $V_5$  as compared to  $V_0$ . Similarly, in a standard low- $V_t$  adder, the subthreshold leakage current in the propagate and sum gates is 3.3 X and 1.7 X, respectively, smaller for  $V_1$  and  $V_5$  as compared to  $V_0$ . Despite this significant reduction in the subthreshold leakage current of the propagate and sum gates, the second and third highest leakage currents in standard low- $V_t$  and dual- $V_t$  domino logic circuits are observed for  $V_1$  and  $V_5$ , respectively, as shown in Fig. 11.4. The subthreshold leakage current of the generate and carry gates in standard dual- $V_t$  and low- $V_t$  adders approximately doubles for  $V_1$  and  $V_5$  as compared to  $V_0$ . This significant increase in subthreshold leakage current with input vector for a high dynamic node voltage state of the generate and carry gates is caused by the stack effect [143], [146].

Input vectors applied after clock gating determine the transistors that produce subthreshold leakage current together with the voltage state of the dynamic node. As discussed previously, a high dynamic node voltage state is typically the highest leakage state in a dual-V<sub>t</sub> domino logic gate. A variation of the subthreshold leakage current sources in the pulldown network of a standard dual-V<sub>t</sub> domino (generate) gate with the input vector for a high voltage state of the dynamic node is shown in Fig. 11.5. The dynamic node voltage in the generate and carry gates is maintained high for three different input vectors, $V_0$, $V_1$, and $V_5$, as shown in Figs. 11.5a, 11.5b, and 11.5c, respectively.



Fig. 11.5. Variation of subthreshold leakage current conduction paths with input vector for a high voltage state at the dynamic node in a standard dual- $V_t$  domino logic circuit. (a) Sources of subthreshold leakage current for  $V_0$ . (b) Sources of subthreshold leakage current for  $V_1$ . (c) Sources of subthreshold leakage current for  $V_5$ . H: high. L: low.

For  $V_0$ , both  $N_1$  and  $N_2$  operate in the weak inversion region. The voltage of Node1 rises until the subthreshold leakage currents through  $N_1$  and  $N_2$  are equal (in the steady state condition). The total subthreshold leakage current at steady state is

$$I_{subthreshold-H} = I_{Leak-PD} + I_{Leak-P1}, \tag{11.1}$$

$$I_{Leak-PD} = I_{Leak-N1-V0} = I_{Leak-N2-V0}, \tag{11.2}$$

where $I_{Leak-P1}$ is the subthreshold leakage current through the low-V<sub>t</sub> pullup transistor within the output inverter. $I_{Leak-N1-V0}$ and $I_{Leak-N2-V0}$ are the subthreshold leakage currents through N<sub>1</sub> and N<sub>2</sub>, respectively, for V<sub>0</sub>.

For  $V_1$ ,  $N_1$  operates in the weak inversion region. Alternatively,  $N_2$  operates in the strong inversion region. The total subthreshold leakage current at steady state is

$$I_{subthreshold-H} = I_{Leak-N1-V1} + I_{Leak-P1}, \tag{11.3}$$

where $I_{Leak-N1-V1}$ is the subthreshold leakage current through $N_1$ for $V_1$.

For  $V_5$ ,  $N_2$  operates in the weak inversion region while  $N_1$  is turned on (strong inversion). The total subthreshold leakage current at steady state is

$$I_{subthreshold-H} = I_{Leak-N2-V5} + I_{Leak-P1}, \tag{11.4}$$

where  $I_{Leak-N2-V5}$  is the subthreshold leakage current through N<sub>2</sub> for V<sub>5</sub>.

For $V_0$, $N_1$ and $N_2$ are cutoff. A steady state voltage is reached when the voltage at Node<sub>1</sub> rises to approximately 41 mV above ground, equalizing the subthreshold leakage currents through $N_1$ and $N_2$. The subthreshold leakage current through a stack of cutoff transistors is significantly smaller than the subthreshold leakage current through a single cutoff transistor [143], [146]. The subthreshold leakage current of a MOSFET is exponentially dependent on the threshold, gate-to-source, and drain-to-source voltages [39], [48]. The subthreshold leakage current through $N_1$ exponentially decreases for $V_0$ as compared to $V_1$, due to increased threshold voltage (reverse body bias), negative gate-to-source voltage, and lower drain-to-source voltage. For $V_1$, the voltages of Node<sub>1</sub> and Node<sub>2</sub> are both at approximately the ground level since $N_2$ operates in the strong inversion region. This condition eliminates the reverse body bias and negative gate-to-source voltage while increasing the drain-to-source voltage of $N_1$. $I_{Leak-N1-V1}$ is, therefore, higher as compared to $I_{Leak-N1-V0}$. The total subthreshold leakage current of the generate gates is 2.3 X higher for $V_1$ as compared to $V_0$. Similarly, the subthreshold leakage current of the carry gates is 1.7 X higher for $V_1$ as compared to $V_0$. For $V_5$, $N_1$ is on while $N_2$ is cutoff. The voltage at Node<sub>1</sub> is high, increasing the drain-to-source voltage of $N_2$. $I_{Leak-N2-V5}$ is, therefore, higher than $I_{Leak-N2-V0}$, increasing the total subthreshold leakage current of the generate gates by 2.1 X. Similarly, the subthreshold leakage current of the carry gates is 1.7 X higher for $V_5$ as compared to $V_0$. The subthreshold leakage currents in the standard low-$V_t$ carry and generate gates also significantly decrease for $V_0$ as compared to $V_1$ and $V_5$, due to the stack effect.
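The steady state condition for two stacked cutoff transistors can be illustrated numerically. The following is a minimal sketch, assuming a generic first-order subthreshold current model with a linearized body effect and a drain induced barrier lowering term. All of the model parameters (`i0`, `n`, `eta`, `gamma`) and the 1.8 V supply voltage are hypothetical values chosen for illustration. Only the 200 mV low threshold voltage and the 110°C temperature are taken from this chapter. Since the parameters are not fitted to the simulated technology, the intermediate node voltage produced by the sketch is not expected to match the approximately 41 mV reported above.

```python
import math

K_B_OVER_Q = 8.617e-5  # Boltzmann constant over electron charge (V/K)


def subthreshold_current(vgs: float, vds: float, vsb: float, *,
                         i0: float = 1e-6, vt0: float = 0.2, n: float = 1.4,
                         eta: float = 0.08, gamma: float = 0.2,
                         temp_k: float = 383.15) -> float:
    """Generic first-order subthreshold current (hypothetical parameters).

    The threshold voltage rises with reverse body bias (gamma * vsb) and
    falls with drain-to-source voltage (eta * vds).
    """
    v_thermal = K_B_OVER_Q * temp_k
    vt_eff = vt0 + gamma * vsb - eta * vds
    return (i0 * math.exp((vgs - vt_eff) / (n * v_thermal))
            * (1.0 - math.exp(-vds / v_thermal)))


def stack_node_voltage(vdd: float, tol: float = 1e-9) -> float:
    """Bisection for the node voltage that equalizes the two leakage currents.

    The upper transistor has its drain at vdd and its source at the node.
    The lower transistor has its drain at the node and its source grounded.
    Both gates are grounded.
    """
    lo, hi = 0.0, vdd
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        i_top = subthreshold_current(vgs=-mid, vds=vdd - mid, vsb=mid)
        i_bottom = subthreshold_current(vgs=0.0, vds=mid, vsb=0.0)
        if i_top > i_bottom:
            lo = mid   # node still charging: the top device supplies more current
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    vdd = 1.8  # assumed supply voltage
    v_node = stack_node_voltage(vdd)
    i_stack = subthreshold_current(vgs=0.0, vds=v_node, vsb=0.0)
    i_single = subthreshold_current(vgs=0.0, vds=vdd, vsb=0.0)
    print(f"intermediate node voltage: {v_node * 1e3:.1f} mV")
    print(f"single / stack leakage ratio: {i_single / i_stack:.1f}")
```

The bisection converges because the current through the upper transistor decreases monotonically with the intermediate node voltage while the current through the lower transistor increases. The three mechanisms named in the text (reverse body bias, negative gate-to-source voltage, and reduced drain-to-source voltage) appear as the `gamma * vsb`, `vgs=-mid`, and `eta * vds` terms, respectively.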

#### 11.3.3. Delay and Power Reduction in the Active Mode

The active mode delay, power, and power delay product (PDP) of low- $V_t$ , standard dual- $V_t$ , and sleep switch adders are shown in Fig. 11.6. The delay and power characteristics of a standard dual- $V_t$  adder are similar to the sleep switch adder. The sleep switch circuit technique enhances the evaluation speed by 12% and 21%, for  $V_2$  and  $V_3$ , respectively, as compared to a low- $V_t$  adder. The enhancement in speed with the proposed circuit technique is primarily due to the reduced contention current [33] of a high- $V_t$  keeper (see Chapter 9 for a detailed discussion of the contention current).

The sleep switch circuit technique also reduces the active mode power consumption as compared to a low-$V_t$ circuit. The power consumption is reduced by 14.4% and 14.6% for the input vectors $V_2$ and $V_3$, respectively, as compared to a low-$V_t$ adder. A portion of the savings in active mode power is due to the reduced contention current of the high-$V_t$ keeper transistor in a dual-$V_t$ circuit (see Figs. 11.1b and 11.2). Another important factor that reduces the power consumption of a dual-V<sub>t</sub> circuit is the lower power consumption in dual-V<sub>t</sub> delay elements.



Fig. 11.6. A comparison of the delay, power, and power delay product (PDP) of adder circuits with low- $V_t$ , standard dual- $V_t$ , and sleep switch circuit techniques for the input vectors  $V_2$  and  $V_3$ .

#### 11.3.4. Sleep/Wake-up Delay and Energy Overhead

When a sleep switch domino logic circuit is idle, the clock is gated high. The sleep signal should be applied after the low-to-high edge of the clock signal propagates to the gates in the last stage of a clock-delayed domino logic circuit. Activating the sleep switches after the low-to-high transition of the clock ensures that no short-circuit power is consumed while entering the sleep mode. Activating the sleep switches forces all of the domino gates to a low dynamic node voltage state. After the node voltages settle, all of the high-V<sub>t</sub> transistors are strongly cut off, minimizing the subthreshold leakage currents with the proposed sleep switch circuit technique. Less than a clock period is required (depending upon the input vector, from 829 ps to 850 ps after the clock is gated) for the adder circuit to be placed in a low leakage state. Before the end of an idle mode, the sleep signal transitions low, cutting off all of the sleep switches. Disabling the sleep transistors before activating the clock is important in order to avoid short-circuit currents while leaving the idle mode. The clock is reactivated and all of the dynamic nodes are recharged to activate (wake-up) a sleeping domino circuit. The duration of reactivation is equal to the precharge time of a domino circuit. An adder circuit, therefore, is able to enter and leave the standby mode within a single clock cycle with the proposed circuit technique.

The energy overhead to enter and leave the sleep mode with the sleep switch technique is also evaluated. Activating the sleep switches to place a dual-V<sub>t</sub> domino logic circuit into standby mode requires a specific amount of energy. Additional energy is dissipated at the end of an idle period while precharging the dynamic nodes in order to reactivate a domino logic circuit. Depending upon the input vectors, some or none of the dynamic nodes in the low-V<sub>t</sub> and standard dual-V<sub>t</sub> circuits are discharged during the sleep mode. Alternatively, all of the dynamic nodes in a sleep switch domino logic circuit are discharged during the sleep mode, independent of the input vectors. The activation energy required by the sleep switch circuit technique is, therefore, higher than the low-V<sub>t</sub> and standard dual-V<sub>t</sub> circuit techniques. In order to justify the proposed sleep switch circuit technique to force a dual-V<sub>t</sub> circuit into a low leakage state, the total energy consumed to enter and leave the sleep mode must be less than the total savings in standby leakage energy.

The cumulative energy dissipated in the standby mode by the low-$V_t$ and sleep switch adders is shown in Fig. 11.7. It is assumed that the junction temperature does not significantly change for the duration of the standby mode. The leakage energy per cycle is assumed to be constant. The cumulative energy of a low-$V_t$ domino circuit is only affected by the subthreshold leakage current during the standby mode. Alternatively, both the cumulative leakage energy and the energy overhead of entering and leaving the sleep mode are included in the energy characteristics of the sleep switch circuit. The total energy overhead of the sleep switch circuit technique is independent of the duration of the idle mode. The energy overhead for employing the sleep switch circuit technique is dissipated even if a domino circuit remains in the standby mode for only a single clock cycle. The total energy overhead of the proposed technique (composed of the energy dissipated in order to activate the sleep transistors while entering the sleep mode and disable the sleep transistors and reactivate the domino gates after the standby mode is over) is included as an energy step in the first cycle of the standby mode (see Fig. 11.7). Similar to the low-V<sub>t</sub> energy characteristics, after the first clock cycle, the sleep switch circuit energy is only due to the subthreshold leakage current. Since the standby leakage energy of a sleep switch circuit is significantly lower (up to 830 times) than a low-V<sub>t</sub> circuit, the sleep switch energy characteristics have a much smaller slope as compared to the energy characteristics of the low-V<sub>t</sub> adder (see Fig. 11.7). A specific amount of time in the idle mode, also dependent upon the input vectors, is necessary for the cumulative leakage energy of a low-V<sub>t</sub> circuit to exceed the cumulative energy of a sleep switch circuit.

The intersection of the sleep switch and low-$V_t$ cumulative energy characteristics is evaluated to determine the necessary minimum duration of the sleep mode of operation such that the sleep switch circuit technique offers a net savings in energy as compared to a low-$V_t$ circuit. As shown in Fig. 11.7, the cumulative standby energy of the low-$V_t$ and sleep switch circuits exhibits different behavior depending upon the input vectors. The leakage current of a low-$V_t$ adder is smallest for $V_2$ and highest for $V_0$ (see Fig. 11.4). Alternatively, the leakage current of a sleep switch adder is virtually independent of the input vectors. Depending upon the input vectors, the energy overhead of the sleep switch scheme changes. For $V_0$, none of the dynamic nodes of a low-$V_t$ circuit are discharged during the standby mode. Alternatively, all of the dynamic nodes are discharged in a sleep switch circuit. The relative energy overhead of the sleep switch circuit technique required to charge the dynamic nodes to reactivate the circuit (to transition from standby mode to active mode) is, therefore, highest for $V_0$. As shown in Fig. 11.7, a minimum of 42 clock cycles is required for the proposed sleep switch circuit technique to provide a net savings in energy as compared to a low-V<sub>t</sub> circuit during the standby mode.
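The intersection argument above can be sketched numerically. The following is a minimal illustration, assuming (as in the text) a constant leakage energy per cycle and a one-time energy overhead charged in the first cycle. The function name and the numerical values are hypothetical and are not taken from the simulation data.

```python
import math

def break_even_cycles(overhead_energy, leak_ref_per_cycle, leak_sleep_per_cycle):
    """Smallest whole number of idle cycles after which the sleep switch
    circuit has dissipated less cumulative energy than the reference circuit.

    Cumulative energy after n idle cycles (constant leakage per cycle):
        reference circuit:    n * leak_ref_per_cycle
        sleep switch circuit: overhead_energy + n * leak_sleep_per_cycle
    """
    saving_per_cycle = leak_ref_per_cycle - leak_sleep_per_cycle
    if saving_per_cycle <= 0:
        raise ValueError("the sleep mode must leak less than the reference circuit")
    # A net savings requires n * saving_per_cycle > overhead_energy (strictly).
    return math.floor(overhead_energy / saving_per_cycle) + 1

# Hypothetical values in arbitrary energy units (not from the thesis).
print(break_even_cycles(overhead_energy=100.0,
                        leak_ref_per_cycle=3.0,
                        leak_sleep_per_cycle=0.5))
```

Because the overhead is independent of the idle duration, a larger overhead or a smaller per-cycle leakage gap moves the intersection to a longer idle period, which is the input vector dependence visible in Fig. 11.7.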



Fig. 11.7. Cumulative standby energy dissipation of the low-V<sub>t</sub> and sleep switch adders for three different input vectors.

As discussed previously, a standard dual- $V_t$  circuit offers a savings in leakage current of 1.2 X to 2.8 X as compared to a low- $V_t$  circuit. The energy savings of a standard dual- $V_t$  domino circuit originates from the selective replacement of a group of high leakage low- $V_t$  transistors with a group of low leakage high- $V_t$  transistors. Unlike the proposed sleep switch circuit technique, a standard dual- $V_t$  circuit does not introduce any energy overhead in order to reduce the standby leakage current. Although the leakage energy of a sleep switch circuit is significantly reduced as compared to a standard dual- $V_t$  circuit, the non-negligible energy overhead of the proposed circuit technique must also be assessed to accurately compare the energy characteristics of the two circuit techniques. The cumulative energy dissipated during standby mode by the sleep switch and standard dual- $V_t$  adders is shown in Fig. 11.8.



Fig. 11.8. Cumulative standby energy dissipation of the sleep switch and standard dual- $V_t$  adders for three different input vectors.

The (step) change in energy of the sleep switch characteristics during the first cycle represents the energy overhead for activating the sleep switches (to enter the sleep mode) and for deactivating the sleep switches and recharging the domino gates (to exit the sleep mode). Since the sleep switch circuit technique reduces the standby leakage energy by 167 X to 714 X as compared to a standard dual-V<sub>t</sub> circuit, the sleep switch characteristics have a significantly smaller slope after the first cycle as compared to the energy characteristics of a standard dual-V<sub>t</sub> adder. As discussed previously, V<sub>0</sub> produces the highest leakage state in a standard dual-V<sub>t</sub> circuit. Alternatively, the leakage energy of a standard dual-V<sub>t</sub> adder is lowest for V<sub>2</sub>. No input combination exists that can place a standard dual-V<sub>t</sub> adder into a lower leakage state as compared to a sleep switch dual-V<sub>t</sub> adder. Circuit techniques based on the application of a selected input vector to place a circuit into a low leakage state (such as the techniques described in [86] and [136]) are, therefore, ineffective for minimizing the leakage of the domino adder discussed in this chapter.

As shown in Fig. 11.8, a minimum of 57 clock cycles is required for the proposed sleep switch circuit technique to provide a net savings in energy as compared to a standard dual-V<sub>t</sub> circuit during the standby mode. Although the leakage energy of the standard dual-V<sub>t</sub> domino adder is 167 X to 714 X higher as compared to the sleep switch adder, a standard dual-V<sub>t</sub> circuit technique is preferable in those applications with idle periods shorter than 57 clock cycles.

## 11.4. Noise Immunity Compensation

As discussed in Chapter 9, in a standard domino logic gate, a feedback keeper is employed to maintain the state of the dynamic node against coupling noise, charge sharing, and subthreshold leakage current. In a dual-$V_t$ domino logic circuit, the keeper transistor has a high-$V_t$ (see Figs. 11.1b and 11.2). As discussed in Chapter 10, the current supplied by a high-$V_t$ keeper to preserve the state of a dynamic node is reduced, thereby degrading the noise immunity as compared to a low-$V_t$ circuit. The degradation of noise immunity varies for different blocks within an adder.

During evaluation of the noise immunity characteristics, the same noise signal is coupled to all of the inputs of a domino logic circuit as this situation represents the worst case noise condition. In sleep switch circuits, the noise is also assumed to couple to the gates of the sleep transistors. The noise margin criterion used in this section is similar to the criterion described in [129]. The noise immunity is the voltage amplitude of the DC noise signal that produces a signal with the same amplitude at the output of a domino logic circuit, assuming a 1 GHz clock with a 50% duty cycle. The average degradation in noise immunity for the propagate, generate, carry, and sum domino logic gates is listed in Table 11.2. The degradation in noise immunity of the sleep switch domino logic gates varies between 11.3% and 14.9% as compared to the low-V<sub>t</sub> circuits. The effect of the sleep switches on the noise immunity characteristics of the dual-V<sub>t</sub> domino logic gates is very small. Sleep switch P and S gates have similar noise immunity as compared to the standard dual-V<sub>t</sub> P and S circuits. The noise immunity degradation in the sleep switch G gates as compared to the standard dual-V<sub>t</sub> G gates is less than 1.7%.

TABLE 11.2 DEGRADATION IN NOISE IMMUNITY OF STANDARD DUAL-V $_{\rm t}$  AND SLEEP SWITCH ADDERS AS COMPARED TO THE LOW-V $_{\rm t}$  ADDER WITH SAME SIZE TRANSISTORS

|                                     | Gate                         | Propagate | Generate | Carry | Sum   |
|-------------------------------------|------------------------------|-----------|----------|-------|-------|
| Average Reduction in Noise Immunity | Standard Dual-V <sub>t</sub> | 12.2%     | 14.0%    | 12.9% | 11.3% |
|                                     | Sleep Switch                 | 12.2%     | 14.9%    | 12.9% | 11.3% |

Both keeper and output inverter sizing are required in a dual-V<sub>t</sub> domino logic circuit with a high-V<sub>t</sub> keeper transistor in order to provide similar noise immunity as compared to a standard low-V<sub>t</sub> domino logic circuit (see Chapter 10 for a detailed discussion). An alternative technique for enhanced noise immunity is to employ a low-$V_t$ keeper transistor in a dual-$V_t$ domino circuit. Unless the output inverter is resized, a dual-V<sub>t</sub> domino circuit with a low-V<sub>t</sub> keeper transistor is not capable of providing noise immunity comparable to a low-V<sub>t</sub> domino logic circuit. Under similar noise immunity conditions as compared to the standard low-V<sub>t</sub> domino logic circuits, the subthreshold leakage energy savings offered by a dual-V<sub>t</sub> circuit technique based on a high-V<sub>t</sub> keeper is significantly higher as compared to the leakage savings offered by a dual-V<sub>t</sub> circuit technique based on a low-V<sub>t</sub> keeper. In this section, therefore, the high-V<sub>t</sub> keeper and output inverter pulldown transistor widths of each sleep switch and standard dual-V<sub>t</sub> domino gate are increased so as to maintain a similar noise immunity as compared to standard low-V<sub>t</sub> gates. A comparison of the subthreshold leakage energy (per clock cycle) of the low-V<sub>t</sub>, standard dual-V<sub>t</sub>, and sleep switch dual-V<sub>t</sub> domino adders under similar noise immunity conditions for different input vectors is shown in Fig. 11.9. The normalized leakage energy consumption of the low-$V_t$, standard dual-$V_t$, and sleep switch adders under similar and degraded noise immunity conditions is listed in Table 11.3.



Fig. 11.9. Under similar noise immunity conditions, a comparison of the leakage energy (per clock cycle) of the adder circuits with the low-V<sub>t</sub>, standard dual-V<sub>t</sub>, and sleep switch circuit techniques for six different input vectors.

The subthreshold leakage energy consumed by a standard dual-V<sub>t</sub> domino adder is determined by the subthreshold leakage current of the domino gates which are at a high dynamic node voltage state. When the dynamic node voltage is high, the subthreshold leakage current characteristics are virtually independent of the width of the keeper and output inverter pulldown transistors (see Chapter 10). Keeper and output inverter sizing for enhanced noise immunity, therefore, has very little effect on the subthreshold leakage energy consumed by a standard dual-V<sub>t</sub> adder, as can be seen by comparing Figs. 11.4 and 11.9.

The dynamic nodes of all of the domino gates in a sleep switch circuit are maintained in a low voltage state. In a dual-V<sub>t</sub> domino logic circuit at a low dynamic node voltage state, the subthreshold leakage current strongly depends on the width of the keeper and output inverter pulldown transistors (see Chapter 10). The subthreshold leakage current in a sleep switch circuit, therefore, increases after keeper and output inverter sizing, degrading the savings in subthreshold leakage energy as listed in Table 11.3. The subthreshold leakage current in the sleep switch adder is 366 X to 660 X smaller as compared to the low-V<sub>t</sub> adder under similar noise immunity conditions. The subthreshold leakage energy dissipation of the sleep switch dual-V<sub>t</sub> adder is 132 X to 565 X smaller as compared to the standard dual-V<sub>t</sub> adder with similar noise immunity characteristics.

TABLE 11.3 A COMPARISON OF NORMALIZED SUBTHRESHOLD LEAKAGE ENERGY OF LOW-V $_{\rm t}$ , STANDARD DUAL-V $_{\rm t}$ , AND SLEEP SWITCH ADDERS UNDER SIMILAR AND DEGRADED NOISE IMMUNITY CONDITIONS

|                                                 |                              | $V_0$ | $V_1$ | $V_2$ | $V_3$ | $V_4$ | $V_5$ |
|-------------------------------------------------|------------------------------|-------|-------|-------|-------|-------|-------|
| Similar Transistor Size Degraded Noise Immunity | Standard Low-V <sub>t</sub>  | 830   | 802   | 461   | 473   | 622   | 790   |
|                                                 | Standard Dual-V <sub>t</sub> | 714   | 546   | 167   | 187   | 355   | 535   |
|                                                 | Sleep Switch                 | 1     | 1     | 1     | 1     | 1     | 1     |
| Transistor Sizing Similar Noise Immunity        | Standard Low-V <sub>t</sub>  | 660   | 637   | 366   | 376   | 495   | 628   |
|                                                 | Standard Dual-V <sub>t</sub> | 565   | 434   | 132   | 148   | 281   | 425   |
|                                                 | Sleep Switch                 | 1     | 1     | 1     | 1     | 1     | 1     |
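The ranges quoted in the text follow directly from the normalized entries of Table 11.3. As a small consistency check (a sketch only; the variable and function names are illustrative), the leakage reduction of the standard dual-V<sub>t</sub> adder relative to the low-V<sub>t</sub> adder can be recomputed from the two normalized rows:

```python
# Normalized subthreshold leakage energy from Table 11.3 (sleep switch = 1),
# listed in the order V0..V5.
LOW_VT = {
    "degraded": [830, 802, 461, 473, 622, 790],   # similar transistor size
    "similar":  [660, 637, 366, 376, 495, 628],   # sized for similar noise immunity
}
DUAL_VT = {
    "degraded": [714, 546, 167, 187, 355, 535],
    "similar":  [565, 434, 132, 148, 281, 425],
}

def ratio_range(numerators, denominators):
    """Return the (min, max) of the element-wise ratios."""
    ratios = [n / d for n, d in zip(numerators, denominators)]
    return min(ratios), max(ratios)

for condition in ("degraded", "similar"):
    lo, hi = ratio_range(LOW_VT[condition], DUAL_VT[condition])
    print(f"{condition}: dual-Vt leaks {lo:.2f} X to {hi:.2f} X less than low-Vt")
    # Each row is normalized to the sleep switch adder, so its min and max
    # are the sleep switch savings relative to that circuit.
    print(f"  sleep switch vs dual-Vt: {min(DUAL_VT[condition])} X to "
          f"{max(DUAL_VT[condition])} X")
```

For the degraded noise immunity rows the ratios round to the 1.2 X to 2.8 X range stated earlier for the standard dual-V<sub>t</sub> circuit.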

The cumulative energy dissipated in the standby mode by the low-$V_t$ and sleep switch adders under similar noise immunity conditions is shown in Fig. 11.10. The cumulative energy dissipated in the standby mode by the standard dual-V<sub>t</sub> and sleep switch dual-V<sub>t</sub> adders providing similar noise immunity characteristics is shown in Fig. 11.11. The minimum duration of the idle mode required to provide a net savings in the total energy dissipation increases due to the higher subthreshold leakage currents in the sleep switch domino adder after transistor sizing. As shown in Fig. 11.10, a minimum of 47 clock cycles is required for the sleep switch circuit to provide a net savings in energy as compared to a standard low-V<sub>t</sub> adder during the idle mode. Similarly, as shown in Fig. 11.11, a minimum of 69 clock cycles is required to provide a net savings in total standby energy consumption as compared to a standard dual-V<sub>t</sub> domino adder. The minimum number of clock cycles required for the sleep switch circuit to provide a net savings in total energy during the idle mode, for both similar transistor sizing (degraded noise immunity) and similar noise immunity conditions (increased keeper and output inverter pulldown transistor size), is listed in Table 11.4.



Fig. 11.10. Under similar noise immunity conditions, cumulative standby energy dissipation of the low- $V_t$  and sleep switch adders for three different input vectors.



Fig. 11.11. Under similar noise immunity conditions, cumulative standby energy dissipation of the sleep switch and standard dual-V<sub>t</sub> adders for three different input vectors.

## 11.5. Chapter Summary

A circuit technique is presented in this chapter for reducing the standby leakage energy consumption of domino logic circuits. The described circuit technique employs sleep switches and a dual threshold voltage CMOS technology in order to place an idle domino logic circuit into a low leakage state, without degrading the delay and power characteristics during the active mode. A dual threshold voltage domino circuit enters and leaves the sleep mode within a single clock cycle with the sleep switch circuit technique.

The sleep switch circuit technique reduces the leakage energy by up to 830 times as compared to a standard low-V<sub>t</sub> circuit. The circuit technique also reduces the active mode delay and power by up to 21% and 14.6%, respectively, as compared to a low-$V_t$ circuit.

TABLE 11.4 MINIMUM DURATION OF THE IDLE MODE REQUIRED FOR THE SLEEP SWITCH CIRCUIT TECHNIQUE TO PROVIDE A NET SAVINGS IN STANDBY ENERGY AS COMPARED TO THE STANDARD LOW-V<sub>t</sub> AND DUAL-V<sub>t</sub> ADDERS UNDER SIMILAR AND DEGRADED NOISE IMMUNITY CONDITIONS

| Minimum Number of Clock Cycles Required | Similar Transistor Size Degraded Noise Immunity |                              | Transistor Sizing Similar Noise Immunity |                              |
|--------|-----------------------------|------------------------------|-----------------------------|------------------------------|
| Vector | Standard Low-V<sub>t</sub>  | Standard Dual-V<sub>t</sub>  | Standard Low-V<sub>t</sub>  | Standard Dual-V<sub>t</sub>  |
| $V_0$  | 42                          | 51                           | 47                          | 57                           |
| $V_1$  | 20                          | 37                           | 25                          | 41                           |
| $V_2$  | 6                           | 57                           | 16                          | 69                           |
| $V_3$  | 9                           | 57                           | 19                          | 68                           |
| $V_4$  | 26                          | 56                           | 33                          | 61                           |
| $V_5$  | 23                          | 42                           | 29                          | 47                           |
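Since the input vector present at the start of an idle period is generally not known in advance, the worst case entry of each column of Table 11.4 is the relevant design number. The following sketch extracts those worst case values. The column grouping (the first two columns for similar transistor size, the last two for similar noise immunity) is inferred from the surrounding text, and the names are illustrative.

```python
# Minimum idle cycles from Table 11.4, per input vector.
# Tuple order: (low-Vt, dual-Vt) with similar transistor size, followed by
# (low-Vt, dual-Vt) after sizing for similar noise immunity (assumed grouping).
MIN_CYCLES = {
    "V0": (42, 51, 47, 57),
    "V1": (20, 37, 25, 41),
    "V2": (6, 57, 16, 69),
    "V3": (9, 57, 19, 68),
    "V4": (26, 56, 33, 61),
    "V5": (23, 42, 29, 47),
}
COLUMNS = ("low_vt_degraded", "dual_vt_degraded",
           "low_vt_similar", "dual_vt_similar")

def worst_case(table):
    """Largest break-even idle duration over all input vectors, per column."""
    return {name: max(row[i] for row in table.values())
            for i, name in enumerate(COLUMNS)}

def always_saves_energy(idle_cycles, column, table=MIN_CYCLES):
    """True if the idle period meets the break-even point for every vector."""
    return idle_cycles >= worst_case(table)[column]

print(worst_case(MIN_CYCLES))
```

The worst case values reproduce the 42, 57, 47, and 69 clock cycle figures quoted in the text.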

Existing techniques based on the application of a selected input vector to place a dual-$V_t$ circuit into a low leakage state are ineffective for minimizing the subthreshold leakage currents in multiple stage domino circuits with inverted internal signals. The sleep switch circuit technique exploits the full effectiveness of a dual-$V_t$ CMOS technology to reduce subthreshold leakage current by strongly turning off all of the high-$V_t$ transistors, independent of the input signals. The sleep switch circuit technique reduces the leakage energy by up to 714 times as compared to a standard dual-$V_t$ circuit. The energy overhead of the circuit technique is low, justifying the use of the proposed sleep scheme during idle periods as short as 57 clock cycles in order to reduce standby leakage energy.

The noise immunity of the circuit blocks within a dual-V<sub>t</sub> domino adder is degraded by up to 14.9% as compared to a low-V<sub>t</sub> domino adder. The keepers and output inverter pulldown transistors are sized to provide similar noise immunity as compared to a low-V<sub>t</sub> adder. Under similar noise immunity conditions, the subthreshold leakage energy consumed by a sleep switch adder is up to 660 times and 565 times smaller as compared to a standard low-V<sub>t</sub> and dual-V<sub>t</sub> adder, respectively. A minimum of 47 and 69 clock cycles is required for the sleep switch circuit to provide a net savings in total energy consumption during the idle mode while providing similar noise immunity characteristics during the active mode as compared to a standard low-V<sub>t</sub> and dual-V<sub>t</sub> domino adder, respectively.

# Chapter 12

# **Low Swing Domino Logic**

The low swing circuit technique has become an attractive method to reduce power in high performance integrated circuits. This technique has primarily been applied to I/O drivers and long interconnects [126]. As discussed in Chapters 2 and 8, static CMOS circuits driven by low swing input signals dissipate excessive static DC power while displaying poor delay characteristics. Specialized voltage interface circuits are therefore required to transfer signals between static CMOS circuits operating at different voltage levels [32], [126]. The circuit delay and complexity of low swing static CMOS circuits increase while the power reduction attained by lowering the node voltages is reduced due to these voltage interface circuits. Low swing circuit techniques are, therefore, rarely applied to modify the voltage swing of signals driving static CMOS gates. Low swing circuit techniques, however, can be effective in domino logic circuits. In a domino gate, the input signals are applied only to the NMOS transistors in the pull-down path, while a single pull-up PMOS transistor is driven by a separate clock signal. A low swing signal that transitions between ground and a second sufficiently high voltage level to effectively turn on an NMOS transistor does not produce malfunctions or static DC power consumption in domino logic circuits.

A low swing domino logic circuit is described in this chapter to reduce power consumption without degrading noise immunity. The voltage swings at the internal nodes of a domino logic circuit are reduced in order to lower the dynamic switching power. The low swing concept is also applied to a domino circuit keeper to further reduce the power consumption while enhancing speed. A simple and efficient circuit technique is described for a dual threshold voltage (dual-V<sub>t</sub>) implementation of the proposed low swing circuits. Significant reductions in standby mode leakage power without incurring a delay penalty in the active mode are observed as compared to completely low threshold voltage (low-V<sub>t</sub>) circuits.

Challenges in the design of reliable domino logic circuits together with active and standby power reduction techniques applicable to domino logic circuits are reviewed in Section 12.1. The operation of the proposed low swing circuits and related simulation results characterizing the delay, power, and noise immunity are described in Section 12.2. Proposed dual-V<sub>t</sub> low swing domino logic circuits and related simulation results are presented in Section 12.3. The research results presented in this chapter are summarized in Section 12.4.

## 12.1. Power Reduction Techniques in Domino Logic Circuits

Domino logic circuit techniques have been extensively applied in recent high performance microprocessors due to the superior speed and area characteristics of domino circuits as compared to static CMOS circuits [135] - [137]. A standard domino logic circuit with a keeper (SDK) is shown in Fig. 12.1a. The voltage on the dynamic node can be degraded due to charge sharing, coupling noise, and/or charge leakage [134], [138]. Since the dynamic node is not actively driven, the state of the dynamic node cannot be recovered once the output is erroneously switched. Threshold voltage scaling is extensively applied to domino logic circuits with reduced supply voltage in order to preserve the speed advantages as compared to static CMOS circuits. Hence, domino logic circuits become increasingly sensitive to noise as the supply and threshold voltages are scaled [21], [130], [138], [141]. The voltage transfer characteristics of SDK for different threshold voltages are shown in Fig. 12.1b, illustrating the decreased noise immunity of domino logic gates with reduced threshold voltages. In addition to the increased sensitivity of domino logic circuits to noise, the effect of coupling noise on reliable circuit operation increases with reduced feature size, increased interconnect aspect ratios, and higher circuit operating frequencies [134], [138]. Error free operation of deep submicrometer domino logic circuits has, therefore, become a significant challenge.

Most of the recently proposed low power domino logic circuit techniques (see Chapter 9 for a more detailed discussion of these techniques) either temporarily or permanently reduce the current strength of a keeper transistor in order to lower the power dissipation [128]-[130], [140]. These circuit techniques, therefore, typically sacrifice noise immunity for delay and/or power savings. Since the error free operation of domino logic circuits in scaled CMOS technologies is endangered, low power domino logic techniques should also address the noise issues in modern high performance integrated circuits. A low swing domino logic circuit technique is described in this chapter to reduce active power consumption without degrading noise immunity. The noise immunity characteristics of the proposed low swing circuits are evaluated in detail.



Fig. 12.1. Domino logic circuit and voltage transfer characteristics (VTC). (a) Standard domino logic circuit with a keeper (SDK). (b) VTC of SDK for various threshold voltages.

A low swing domino logic circuit technique is proposed in [139]. The circuit technique proposed in [139] requires integrated capacitors in order to modify the voltage swing at the output of a domino gate. These large capacitors impose a significant area overhead, making the low swing domino logic circuit technique proposed in [139] impractical. The low swing domino logic circuits proposed in this chapter have a significantly lower area overhead as compared to the circuits proposed in [139].

In [139], a low swing domino logic circuit without a keeper is compared to a standard full swing domino logic circuit without a keeper in terms of power and delay, without addressing the noise issues. The low power circuits proposed in [139] are not effective in increasingly noisy high performance integrated circuits. The noise immunity characteristics of the low swing domino logic circuits presented in this chapter are compared to standard domino. It is shown that the low swing domino logic circuits presented in this chapter offer significant dynamic switching power savings without degrading the noise immunity as compared to standard domino logic circuits.

As discussed in Chapter 2, the dynamic switching power consumption typically dominates the total power consumption in an active circuit. Alternatively, in an idle circuit, subthreshold leakage is the primary source of power consumption. Leakage power is a significant source of energy dissipation in portable systems with extended idle periods. As constant field scaling and threshold voltage reduction trends continue, the leakage power is expected to exceed the dynamic switching power [21], [29]. The leakage power is expected to become a significant portion of the total power dissipation of high performance circuits [2], [21]. Leakage current reduction techniques are required in both high performance and low power/portable integrated circuits.

The use of multiple threshold voltages in CMOS circuits (MTCMOS) has been proposed in [29] to reduce the subthreshold leakage current. The circuit operation is divided into active and standby modes of operation. It is shown that the leakage currents of a circuit can be reduced by placing the circuit into a controlled standby mode when the circuit is idle. However, as discussed in Chapter 3, MTCMOS degrades the circuit performance due to the additional high threshold voltage (high-$V_t$) switches between the logic circuit and the power supplies. A similar technique for leakage reduction is the dual-$V_t$ technique. A dual-$V_t$ circuit is divided into critical and non-critical paths, where high-$V_t$ transistors are used on non-critical paths while low-$V_t$ transistors are used on the critical paths [21], [29], [86]. In dual-$V_t$ circuits, the threshold voltages of existing transistors are modified (there is no need for additional high-$V_t$ switches), and the high-$V_t$ transistors are used only on the non-critical paths. Dual-$V_t$ techniques, therefore, reduce the leakage power while incurring a smaller speed penalty as compared to the MTCMOS technique [86]. A detailed discussion of the dual-$V_t$ and MTCMOS circuit techniques applied to static and dynamic CMOS circuits is given in Chapters 3 and 10, respectively.

Application of dual-V<sub>t</sub> techniques to domino logic is particularly attractive because of the fixed transitions in domino circuits during the precharge and evaluation phases [34]. Dual-V<sub>t</sub> domino logic circuits have been proposed in [34], [86], [130], [141]. In this chapter, a simple low swing dual threshold voltage domino logic circuit technique is proposed. Significant reductions in standby leakage power and improved delay characteristics are observed as compared to standard low-V<sub>t</sub> circuits.

## 12.2. Low Swing Domino Logic

Low swing circuit techniques are applied to domino logic circuits in order to reduce the dynamic switching power. The voltage swings at the internal nodes of domino logic circuits are modified. The first proposed low swing domino logic with a fully driven keeper circuit technique is introduced in Section 12.2.1. A second proposed domino logic circuit technique reduces both the voltage swing at the output node and at the gate of the keeper. The operation of the second proposed low swing domino logic with a weak keeper circuit technique is described in Section 12.2.2. Simulation results are presented in Section 12.2.3.

#### 12.2.1. Low Swing Domino Logic with Fully Driven Keeper (LSDFDK)

A domino gate based on the proposed low swing domino logic with a fully driven keeper (LSDFDK) circuit technique is shown in Fig. 12.2. The LSDFDK circuit technique reduces the voltage swing at the output node by using the NMOS transistor (N6) as a pull-up. The output voltage swings between ground and $V_{DD} - V_{tn}$. The keeper (P2) is driven with a full swing signal for improved noise immunity.



Fig. 12.2. The proposed low swing domino logic with fully driven keeper (LSDFDK) circuit technique.

#### 12.2.2. Low Swing Domino Logic with Weakly Driven Keeper (LSDWDK)

A reduced keeper gate drive technique is proposed in [140] to improve the delay and power characteristics of domino circuits while maintaining robustness against noise. This technique reduces the contention current by lowering the gate voltage of the keeper transistor. A second low swing domino logic circuit, proposed here and shown in Fig. 12.3, utilizes a similar weak keeper.

The weak keeper is critical in low swing circuits since the effects of the contention current on the evaluation delay and switching power are worse due to the reduced gate drive of the pull-down network transistors. LSDWDK produces two different voltage swings. The output voltage swing is between ground and $V_{DD} - V_{tn}$. The gate voltage swing of the keeper (P2) is also modified by using the PMOS transistor P4 for pull-down. The gate voltage of P2 swings between $|V_{tp}|$ and $V_{DD}$ (assuming $|V_{tp}| \leq V_{tn}$). This voltage swing reduces the contention current as compared to LSDFDK, thereby lowering the evaluation delay and dynamic power. The tradeoff is a reduced noise margin due to the weaker keeper transistor.
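The effect of the two reduced voltage swings can be estimated to first order. The sketch below assumes that a node charged by a swing $V_{swing}$ from a supply $V_{DD}$ draws an energy of $C V_{DD} V_{swing}$ per transition, and it ignores the body effect on the NMOS pull-up. The 1.8 V supply and the 10 fF node capacitance are hypothetical values chosen for illustration. The 200 mV thresholds are the values assumed in the simulations of this chapter.

```python
def output_swing(v_dd, v_tn):
    """Output high level when an NMOS pull-up is used (body effect ignored)."""
    return v_dd - v_tn

def supply_energy(c_node, v_dd, v_swing):
    """First-order energy drawn from the supply to charge c_node by v_swing."""
    return c_node * v_dd * v_swing

def keeper_gate_overdrive(v_dd, v_tp_abs, weakly_driven):
    """Overdrive (V_SG - |V_tp|) of the PMOS keeper when it is turned on.

    A fully driven keeper gate is pulled to ground. A weakly driven keeper
    gate is pulled down through a PMOS transistor and stops at |V_tp|.
    """
    gate_low = v_tp_abs if weakly_driven else 0.0
    return (v_dd - gate_low) - v_tp_abs

V_DD = 1.8       # hypothetical supply voltage, in volts
V_TN = 0.2       # NMOS threshold voltage assumed in the simulations
V_TP_ABS = 0.2   # PMOS threshold voltage magnitude assumed in the simulations
C_NODE = 10e-15  # hypothetical node capacitance, in farads

full = supply_energy(C_NODE, V_DD, V_DD)
low = supply_energy(C_NODE, V_DD, output_swing(V_DD, V_TN))
print(f"low swing / full swing switching energy: {low / full:.3f}")
print(f"keeper overdrive, fully driven:  {keeper_gate_overdrive(V_DD, V_TP_ABS, False):.2f} V")
print(f"keeper overdrive, weakly driven: {keeper_gate_overdrive(V_DD, V_TP_ABS, True):.2f} V")
```

Under these assumptions the switching energy scales as $(V_{DD} - V_{tn})/V_{DD}$, and the weakly driven keeper loses an additional $|V_{tp}|$ of gate overdrive, which is the source of both the reduced contention current and the reduced noise margin.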



Fig. 12.3. The proposed low swing domino logic with weakly driven keeper (LSDWDK) circuit technique.

### 12.3. Simulation Results

The SDK, LSDFDK, and LSDWDK circuit techniques are evaluated for a three stage pipeline (see Fig. 12.4) composed of four-input AND gates assuming a 0.18 $\mu$m CMOS technology. Domino AND gates based on the proposed low swing domino logic circuit techniques are shown in Fig. 12.5. $V_{tn}$ and $|V_{tp}|$ are assumed to be 200 mV. Each AND gate drives the four inputs of the following stage AND gate (the inputs of each AND gate are tied together and driven by the same signal). The third stage of the LSDFDK and LSDWDK pipelines is assumed to be a four-input SDK AND gate to recover the full swing signal at the output of the pipeline. A 1 GHz clock signal with a 50% duty cycle is applied to each pipeline.



Fig. 12.4. Three stage pipeline of four input domino AND gates.



Fig. 12.5. Four input domino AND gates based on the proposed low swing domino logic circuit techniques. (a) LSDFDK AND gate. (b) LSDWDK AND gate.

The size of the transistors in the pull-down network is critical for improving the evaluation delay of the domino logic circuits. The width of the keeper is minimum ($W_{min}$) for each circuit. The equivalent width of the pull-down network (PNEW) is a multiple of the keeper width and is varied to evaluate the delay, power, and noise immunity tradeoffs. The evaluation delay is calculated from 50% of the signal swing applied at the input of the first stage AND gate to 50% of the signal swing observed at the output of the third stage AND gate within each pipeline. To evaluate the noise immunity, the noise signal is assumed to be a square wave with a 450 ps duration. The maximum tolerable noise amplitude (MTNA) is defined as the signal amplitude at the input of the first stage AND gate that induces a 10% drop (as compared to $V_{DD}$) in the voltage at the dynamic node of the second stage AND gate. The pull-down NMOS transistors and the foot transistor (N5) are the same size. The active power, evaluation delay, and MTNA for each of these three domino circuits are shown in Fig. 12.6. Normalized results (for PNEW = 3) are listed in Table 12.1.
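The 50%-to-50% delay definition above can be expressed as a small post-processing routine over sampled waveforms. The following is a minimal sketch, not the measurement setup used for the reported results; the function names and the synthetic waveforms are hypothetical and serve only to illustrate the definition.

```python
def crossing_time(times, volts, level):
    """Time of the first crossing of `level`, using linear interpolation between samples."""
    for i in range(len(times) - 1):
        v0, v1 = volts[i], volts[i + 1]
        if v0 != v1 and min(v0, v1) <= level <= max(v0, v1):
            t0, t1 = times[i], times[i + 1]
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def evaluation_delay(times, v_in, v_out, in_swing, out_swing):
    """Delay from 50% of the input swing to 50% of the output swing."""
    t_in = crossing_time(times, v_in, 0.5 * in_swing)
    t_out = crossing_time(times, v_out, 0.5 * out_swing)
    return t_out - t_in


# Synthetic example (times in ps, voltages in mV); these are not simulation data.
t = [0, 100, 200, 300, 400]
v_first_stage_input = [0, 1800, 1800, 1800, 1800]
v_third_stage_output = [0, 0, 0, 1800, 1800]
print(evaluation_delay(t, v_first_stage_input, v_third_stage_output, 1800, 1800))
```

Passing the input and output swings separately keeps the routine usable when the two nodes do not share the same swing, as is the case at the internal nodes of the low swing pipelines.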

The simulation results show that the proposed low swing circuit techniques are effective for lowering the power consumption of domino logic circuits. As shown in Fig. 12.6a, LSDFDK reduces the power consumption by up to 9.4% as compared to SDK with increasing PNEW. LSDWDK offers additional power savings since the contention current is decreased by weakening the keeper (reduced current drive for the same size keeper as compared to both LSDFDK and SDK). LSDWDK reduces the power consumption by up to 12.4% as compared to SDK. The power savings of both LSDWDK and LSDFDK increase as compared to SDK with increasing PNEW. For all three circuits, the active power consumption increases as the size of the pull-down network increases.

Increased PNEW reduces the evaluation delay due to the increased current drive of the pull-down network. However, as shown in Fig. 12.6b, both LSDWDK and LSDFDK sacrifice some speed for reduced power. As listed in Table 12.1, the evaluation delay is 46% higher for LSDFDK and 38% higher for LSDWDK as compared to SDK (for PNEW = 3). LSDWDK offers enhanced delay characteristics as compared to LSDFDK due to the reduced contention current. As shown in Fig. 12.6b, the LSDWDK evaluation delay is up to 8.6% lower than the LSDFDK evaluation delay.



Fig. 12.6. Simulation results for different pull-down network transistor sizes for a constant keeper size ( $W_{keeper} = W_{min}$ ). (a) Power versus pull-down network equivalent width (PNEW). (b) Evaluation time versus PNEW. (c) Maximum tolerable noise amplitude (MTNA) versus PNEW.

Another tradeoff for increased performance of each circuit with increasing PNEW is reduced noise immunity. As shown in Fig. 12.6c, the maximum tolerable noise amplitude decreases with increasing PNEW. LSDFDK not only lowers the power consumption but also displays higher noise immunity as compared to SDK. This behavior is due to the noise suppressing effect of the NMOS transistor providing the pull-up at the output (N6 in Fig. 12.2) as the noise signal is transferred to the following gates. The MTNA of LSDFDK is up to 2.6% higher than the MTNA of SDK, and up to 10.9% higher than the MTNA of LSDWDK. Since the keeper of LSDWDK is weak, the MTNA of LSDWDK is 8.7% less than the MTNA of SDK for PNEW = 1.2. With increasing PNEW, the relative effect of the keeper on the noise immunity of the domino circuits is reduced. The difference between the MTNA of LSDWDK and SDK is therefore reduced from 8.7% to 2.1% as the PNEW increases from 1.2 to 3. Similarly, the MTNA advantage of LSDFDK as compared to SDK increases from 1.3% to 2.6% as the PNEW increases from 1.2 to 3. As shown in Fig. 12.6, with increasing PNEW, the power advantages of both LSDWDK and LSDFDK increase as compared to SDK while the evaluation times of all three circuits become more similar. Therefore, low swing domino logic circuits are expected to become more attractive as the pull-down network is scaled for higher performance.

TABLE 12.1
NORMALIZED DYNAMIC POWER, EVALUATION DELAY, AND MTNA (PNEW = 3)

|        | Power | Delay | MTNA |
|--------|-------|-------|------|
| SDK    | 1.00  | 1.00  | 1.00 |
| LSDFDK | 0.91  | 1.46  | 1.03 |
| LSDWDK | 0.88  | 1.38  | 0.98 |

### 12.4. Dual Threshold Voltage Low Swing Domino Logic

The proposed low swing domino circuits have been shown to be effective in reducing the power consumed during the active mode of operation. The standby mode power characteristics of the proposed circuits, however, are comparable to standard domino. In this section, a circuit technique to reduce standby leakage current in the proposed low swing domino logic circuits is presented. The sleep switch dual threshold voltage domino logic circuit technique described in Chapter 11 is applied to the proposed low swing domino logic circuits.

The low swing dual-$V_t$ domino circuits with the fully driven keeper and weakly driven keeper circuit techniques are shown in Figs. 12.7 and 12.8, respectively. The high-$V_t$ transistors are illustrated with a bold line in the channel area in Figs. 12.7 and 12.8. A high-$V_t$ NMOS switch is added to the first stage of each pipeline circuit with the proposed circuit technique. The operation of this transistor is controlled by a separate sleep signal. During the active mode of operation, the sleep signal is set low, the sleep transistor is cut off, and the proposed low swing circuits operate as explained in Section 12.2. During the standby mode of operation, the sleep signal transitions high, turning on the sleep transistor. The dynamic node of the first domino gate is discharged through the sleep transistor. As a result, the output of the first stage gate transitions high, cutting off the keeper and causing the following gates to evaluate in a domino fashion. During the standby mode of operation, the clock signal is maintained high, turning off the pull-up transistor of each domino gate. After the node voltages settle to a steady state, all of the high-$V_t$ transistors in the proposed circuits are strongly cut off, reducing the leakage current.

The circuits shown in Figs. 12.7 and 12.8 are evaluated for both active and standby modes of operation. The effects of modifying the threshold voltage distribution of the transistors on the power and performance characteristics of the circuits are examined. LSDFDK and LSDWDK are evaluated for high threshold voltages only, low threshold voltages only, and dual threshold voltages. The same transistor sizes and circuit configurations are used for all three threshold voltage distributions. The low-$V_t$ is assumed to be 200 mV and the high-$V_t$ is assumed to be 400 mV. The PNEW is 2.4. The standby mode leakage power and active mode total power of the proposed low swing circuits are listed in Table 12.2. The evaluation delay and MTNA of the proposed low swing circuits are listed in Table 12.3.



Fig. 12.7. Dual threshold voltage implementation of a low swing domino logic circuit with a fully driven keeper (LSDFDK). High-V<sub>t</sub> transistors are illustrated with a bold line in the channel area.

These results demonstrate that the dual threshold voltage technique is a powerful method to simultaneously reduce the standby mode leakage power, active mode total power, and evaluation delay of the proposed low swing domino logic circuits as compared to standard low-$V_t$ circuits. As listed in Table 12.2, the standby power of the dual-$V_t$ LSDWDK is 237 times smaller than that of the low-$V_t$ LSDWDK. Similarly, dual-$V_t$ LSDFDK consumes 235 times less leakage power as compared to low-$V_t$ LSDFDK operating in the standby mode. Another advantage of the proposed dual-$V_t$ circuits is the reduced active mode total power. This behavior is primarily caused by the weaker high-$V_t$ pull-up transistors, P1 and P2 (reduced contention current). As listed in Table 12.2, low-$V_t$ LSDWDK consumes 1.7% more active power than dual-$V_t$ LSDWDK. Similarly, the active power consumption of the low-$V_t$ LSDFDK is 2.1% higher than the power consumption of the dual-$V_t$ LSDFDK.



Fig. 12.8. Dual threshold voltage implementation of a low swing domino logic circuit with a weakly driven keeper (LSDWDK). High-V<sub>t</sub> transistors are illustrated with a bold line in the channel area.

Another advantage of the dual-$V_t$ implementation is reduced evaluation delay. The dual-$V_t$ technique slightly improves the evaluation time of both LSDWDK and LSDFDK as compared to the low-$V_t$ circuits. This behavior is again caused by the reduced contention current produced by the weaker high-$V_t$ pull-up transistors.

The primary drawback of the dual- $V_t$  circuits as compared to the low- $V_t$  circuits is reduced noise immunity. As listed in Table 12.3, MTNA is reduced by 4.3% (2.3%) for dual- $V_t$  LSDWDK (LSDFDK) as compared to low- $V_t$  LSDWDK (LSDFDK). This behavior is primarily caused by the reduced current drive of the high- $V_t$  keeper.

TABLE 12.2
STANDBY MODE LEAKAGE POWER AND ACTIVE MODE TOTAL POWER
FOR DIFFERENT THRESHOLD VOLTAGE DISTRIBUTIONS

| $V_t$ Distribution | Leakage Power (nW), LSDWDK | Leakage Power (nW), LSDFDK | Active Power (μW), LSDWDK | Active Power (μW), LSDFDK |
|--------------------|----------------------------|----------------------------|---------------------------|---------------------------|
| Low-$V_t$          | 180.80                     | 264.70                     | 402.3                     | 413.0                     |
| Dual-$V_t$         | 0.76                       | 1.12                       | 395.7                     | 404.4                     |
| High-$V_t$         | 0.73                       | 1.10                       | 341.3                     | 348.4                     |

TABLE 12.3
EVALUATION DELAY AND MTNA
FOR DIFFERENT THRESHOLD VOLTAGE DISTRIBUTIONS

| $V_t$ Distribution | Evaluation Delay (ps), LSDWDK | Evaluation Delay (ps), LSDFDK | MTNA (mV), LSDWDK | MTNA (mV), LSDFDK |
|--------------------|-------------------------------|-------------------------------|-------------------|-------------------|
| Low-$V_t$          | 216                           | 231                           | 488               | 520               |
| Dual-$V_t$         | 212                           | 229                           | 467               | 513               |
| High-$V_t$         | 355                           | 400                           | 685               | 732               |

The high-$V_t$ circuits improve both the active and standby power characteristics as compared to the low-$V_t$ and dual-$V_t$ circuits. High-$V_t$ LSDWDK (LSDFDK) offers an approximately 13.7% (13.8%) savings in active power as compared to the dual-$V_t$ LSDWDK (LSDFDK). The difference between the leakage power of the dual-$V_t$ implementation and the high-$V_t$ implementation of LSDWDK (LSDFDK) is 3.8% (2.5%). Another significant advantage of the high-$V_t$ circuits is increased noise immunity. As listed in Table 12.3, high-$V_t$ LSDWDK (LSDFDK) increases the MTNA by 46.7% (42.7%) as compared to dual-$V_t$ LSDWDK (LSDFDK). These significant power savings and high noise immunity, however, come with an increased evaluation delay due to the reduced current drive of the evaluation path transistors. The high-$V_t$ LSDWDK (LSDFDK) increases the evaluation delay by 67.5% (74.7%) as compared to the dual-$V_t$ LSDWDK (LSDFDK).
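Several of the comparisons quoted in this section can be recomputed directly from the entries of Tables 12.2 and 12.3. The sketch below is only an arithmetic cross-check; the dictionary layout and function names are illustrative assumptions. Because the tabulated values are rounded, the recomputed figures agree with the quoted ones only approximately.

```python
# Values transcribed from Tables 12.2 and 12.3, keyed by (V_t distribution, circuit).
LEAKAGE_NW = {("low", "LSDWDK"): 180.80, ("low", "LSDFDK"): 264.70,
              ("dual", "LSDWDK"): 0.76, ("dual", "LSDFDK"): 1.12,
              ("high", "LSDWDK"): 0.73, ("high", "LSDFDK"): 1.10}
ACTIVE_UW = {("low", "LSDWDK"): 402.3, ("low", "LSDFDK"): 413.0,
             ("dual", "LSDWDK"): 395.7, ("dual", "LSDFDK"): 404.4,
             ("high", "LSDWDK"): 341.3, ("high", "LSDFDK"): 348.4}
DELAY_PS = {("low", "LSDWDK"): 216, ("low", "LSDFDK"): 231,
            ("dual", "LSDWDK"): 212, ("dual", "LSDFDK"): 229,
            ("high", "LSDWDK"): 355, ("high", "LSDFDK"): 400}
MTNA_MV = {("low", "LSDWDK"): 488, ("low", "LSDFDK"): 520,
           ("dual", "LSDWDK"): 467, ("dual", "LSDFDK"): 513,
           ("high", "LSDWDK"): 685, ("high", "LSDFDK"): 732}


def percent_change(new, ref):
    """Change of `new` relative to `ref`, in percent."""
    return 100.0 * (new - ref) / ref


def summarize(circuit):
    """Recompute selected comparisons for one circuit from the tabulated values."""
    low, dual, high = ("low", circuit), ("dual", circuit), ("high", circuit)
    return {
        "leakage_reduction_factor_low_to_dual": LEAKAGE_NW[low] / LEAKAGE_NW[dual],
        "active_power_low_vs_dual_pct": percent_change(ACTIVE_UW[low], ACTIVE_UW[dual]),
        "delay_high_vs_dual_pct": percent_change(DELAY_PS[high], DELAY_PS[dual]),
        "mtna_high_vs_dual_pct": percent_change(MTNA_MV[high], MTNA_MV[dual]),
    }


for circuit in ("LSDWDK", "LSDFDK"):
    print(circuit, {k: round(v, 1) for k, v in summarize(circuit).items()})
```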

### 12.5. Chapter Summary

Low swing domino logic circuits with weakly driven keepers and fully driven keepers are presented in this chapter for reducing the dynamic switching power without degrading the noise immunity. LSDFDK is shown to consume up to 9.4% less active power and tolerate up to 2.6% more noise as compared to SDK. The active power is further reduced by weakening the keeper which also improves the evaluation delay due to reduced contention current. LSDWDK is shown to reduce the active power consumption by up to 12.4% as compared to SDK while improving the evaluation delay by up to 8.6% as compared to LSDFDK.

A dual-$V_t$ version of the proposed low swing domino circuits is also described for controlled standby mode circuit operation in order to reduce subthreshold leakage current. The standby leakage power of LSDWDK (LSDFDK) is reduced by a factor of 237 (235) for a dual-$V_t$ circuit as compared to a low-$V_t$ circuit. Moreover, the proposed dual-$V_t$ technique reduces the active power consumption of LSDWDK (LSDFDK) by 1.6% (2.1%) as compared to a low-$V_t$ circuit without incurring a delay penalty.

Dual-V<sub>t</sub> LSDWDK is the proper choice for those applications in which power and speed are both important because of the low active and standby power consumption and relatively low evaluation delay. High-V<sub>t</sub> LSDWDK is proposed for those applications in which power and noise immunity are the primary concerns and speed is a secondary issue. High-V<sub>t</sub> LSDWDK further reduces the leakage power by another 3.8% and the active power by another 13.7% while improving the noise immunity by 46.7% as compared to dual-V<sub>t</sub> LSDWDK.

# Chapter 13

# **Conclusions**

The fundamental enabling force behind the evolution of integrated circuits is the advancement in fabrication technologies that simultaneously permit scaling the minimum feature size of the device and interconnect and increasing die area. The increasing number of transistors per integrated circuit with each new process technology generation offers greater opportunity for enhanced circuit performance and functionality. The propagation delay through individual circuit elements is reduced as the physical dimensions are scaled. Technology scaling related enhancements coupled with advances in circuit structures and microarchitectures have significantly increased the performance of integrated circuits. The price for these performance and functional enhancements has traditionally been increased design complexity and higher power consumption.

Batteries provide the energy required for the operation of portable devices. Battery technologies evolve at a much slower pace than integrated circuit technologies. Due to the lack of a low cost battery technology with higher energy density characteristics as compared to the Li-ion technology, enhancing the performance and functionality in portable devices becomes increasingly challenging as the power consumption increases with each new technology generation. In portable, embedded, and desktop integrated systems, the energy efficient generation of the increasing levels of power is an important requirement for enhancing the system level energy efficiency. Another important consequence of the increasing power consumption is the degradation in the voltage quality and reliability of the power distribution networks. Due to the scaling of the supply voltages coupled with the increase in power consumption, the supply currents significantly increase, producing metal migration and voltage fluctuation in the power distribution networks. A further consequence of the increasing power consumption is the increasing die temperature gradients and the formation of local hot spots, which degrade the reliability and performance of high performance integrated circuits. After the power is consumed for producing a logic function, the released energy in the form of heat needs to be appropriately dissipated in order to maintain the die temperature. As the power density increases with the increasing power consumption and the device and interconnect density, it becomes increasingly challenging to maintain the die temperature using the traditional low cost cooling techniques based on heat sinks and air flow fans.

The generation, distribution, and dissipation of power are now at the forefront of problems in the design of high performance integrated circuits. Several techniques for designing low power and high speed integrated circuits are introduced in this dissertation. Techniques for supply and threshold voltage scaling that lower power consumption or enhance device reliability without degrading circuit speed are described.

The dominant component of power consumption in CMOS circuits is dynamic switching power. Dynamic switching power is more than quadratically reduced by lowering the supply voltage. The other significant components of power consumption in current CMOS technologies such as the short-circuit, subthreshold leakage, and gate oxide leakage power are also significantly reduced at lower supply voltages. Scaling the supply voltage, therefore, is an effective strategy for lowering the power consumed by CMOS circuits. As the supply voltage is reduced, however, the circuit speed degrades due to reduced transistor currents. Systems with multiple supply voltages, by selectively scaling the supply voltages along non-critical paths, can minimize the speed penalty for reducing the power consumption.
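The supply voltage dependence referred to above can be made explicit with the standard first-order expression for dynamic switching power. The symbols $\alpha$ (switching activity factor), $C_L$ (switched load capacitance), and $f$ (clock frequency) are notation assumed here for illustration rather than taken from this chapter:

$$P_{dynamic} = \alpha \, C_L \, V_{DD}^{2} \, f$$

At a fixed clock frequency, this expression yields a quadratic reduction in power as $V_{DD}$ is lowered. One way to read the stronger than quadratic dependence is that the achievable clock frequency also falls as $V_{DD}$ is reduced, so that both $V_{DD}^{2}$ and $f$ decrease together.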

A primary issue in multiple supply voltage CMOS circuits is the generation of the multiple voltages. The energy and area overhead of the additional power supplies in multiple supply voltage CMOS integrated circuits is an important concern. The additional DC-DC converters must consume a small amount of energy in order to increase the energy savings attained by a multiple supply voltage CMOS circuit technique. High efficiency DC-DC conversion techniques with small area and good output voltage regulation characteristics are developed in this dissertation. Monolithic DC-DC conversion on the same die as the load provides several desirable aspects. The specific load of interest in this dissertation is a microprocessor. Integrating both the active and passive devices of a DC-DC converter onto the same die as a microprocessor increases energy efficiency, enhances the quality of the voltage regulation, and decreases the number of I/O pads dedicated for power delivery. Furthermore, by employing an integrated circuit technology, the reliability of the voltage conversion circuitry can be enhanced, the area can be reduced, and the overall cost of the DC-DC converter can be decreased as compared to a traditional discrete DC-DC converter.

An analysis of the power characteristics of a standard switching DC-DC converter topology, a buck converter, is presented in Chapter 5. An accurate parasitic circuit model is described for determining the optimum circuit configuration with the maximum efficiency. With this model, a closed form expression for the total power consumption of a buck converter is proposed. An on-chip DC-DC converter is analyzed over a wide range of design parameters, permitting the development of a design space for full integration of the active and passive devices of a DC-DC converter on the same die. Full integration of a high efficiency buck converter on the same die as a dual supply voltage microprocessor is demonstrated to be feasible.

Two major challenges for a monolithic switching DC-DC converter are the area occupied by the integrated filter capacitor and the effect of the parasitic impedance characteristics of the integrated inductor on the overall efficiency characteristics of a switching DC-DC converter. A high switching frequency is the key design parameter that enables the integration of a high efficiency buck converter on the same die as a dual supply voltage microprocessor. An optimum switching frequency and inductor current ripple pair that maximizes the efficiency of a buck converter is shown to exist for a target technology. The global maximum efficiency is 92% at a switching frequency of 114 MHz for a voltage conversion from 1.2 volts to 0.9 volts while supplying 9.5 amperes of DC current, assuming an 80 nm CMOS technology. The required filter capacitance and inductance at this operating point are 2083 nF and 104 pH, respectively. An efficiency of 88.4% is demonstrated at a switching frequency of 477 MHz when the filter capacitance is reduced to 100 nF due to tight area constraints on a microprocessor die. The area occupied by the buck converter is 12.6 mm<sup>2</sup> and is dominated by the area of the integrated filter capacitor. The analytic model for the converter efficiency is within 2.4% of the simulation results at the target design point.
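To put the maximum efficiency operating point into absolute terms, the power drawn from the input supply and the power dissipated in the converter can be derived from the quoted voltages, load current, and efficiency. The following is a minimal sketch; the function name is hypothetical, and the duty cycle relation $D = V_{out}/V_{in}$ assumes an ideal lossless buck converter in continuous conduction, which is a simplification of the parasitic model described in this dissertation.

```python
def buck_power_budget(v_in, v_out, i_out, efficiency):
    """Power accounting for a step-down converter at a single operating point."""
    p_out = v_out * i_out        # power delivered to the load (W)
    p_in = p_out / efficiency    # power drawn from the input supply (W)
    return {
        # Ideal lossless buck converter in continuous conduction (assumption).
        "ideal_duty_cycle": v_out / v_in,
        "p_out_w": p_out,
        "p_in_w": p_in,
        "p_loss_w": p_in - p_out,
    }


# Operating point quoted in the text: 1.2 V to 0.9 V, 9.5 A, 92% efficiency.
budget = buck_power_budget(v_in=1.2, v_out=0.9, i_out=9.5, efficiency=0.92)
print({name: round(value, 3) for name, value in budget.items()})
```

With these inputs, roughly 8.55 W is delivered to the load and approximately 0.74 W is dissipated within the converter.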

For those high switching frequencies at which monolithic integration becomes feasible, the energy dissipated by the power MOSFETs and gate drivers dominates the total losses of the DC-DC converter. The efficiency can, therefore, be improved by applying a variety of MOSFET power reduction techniques. A low swing MOSFET gate drive technique is proposed in this dissertation that improves the efficiency of a DC-DC converter. A new circuit model for low swing circuit optimization is also presented. The gate voltages and driver transistor sizes are included as independent parameters in the proposed model. The optimum gate voltage swing of a power MOSFET that maximizes efficiency is shown to be lower than a standard full voltage swing. Lowering the input and output voltage swing of power MOSFET gate drivers is demonstrated to be effective for enhancing the efficiency characteristics of a DC-DC converter.

High voltage power delivery on a circuit board and monolithic DC-DC conversion provide enhanced voltage regulation quality and higher energy efficiency in the power generation and distribution network. Next generation low voltage and high power microprocessors are likely to require high input voltage, large step-down DC-DC converters monolithically integrated onto the same die. The voltage conversion ratios attainable with standard non-isolated switching DC-DC converter circuits are limited due to MOSFET reliability issues. Provided that a DC-DC converter is integrated onto the same die as a microprocessor (fabricated in a low voltage nanometer CMOS technology), the range of input voltages that can be applied to a standard DC-DC converter circuit is further reduced. A standard non-isolated switching DC-DC converter topology such as the standard buck converter circuit is, therefore, not suitable for future high performance integrated circuits. High efficiency monolithic switching DC-DC converters that can generate very low operating voltages from a significantly higher board level distribution voltage in a scaled nanometer CMOS technology are highly desirable.

A new step-down DC-DC converter topology based on a cascode buffer is presented in this dissertation for integration in a low voltage CMOS process. The cascode bridge circuit ensures that the voltages across the terminals of all of the MOSFETs in a DC-DC converter are maintained within the limits imposed by available low voltage CMOS technologies. Reliable operation of the DC-DC converters operating at an input supply voltage up to three times as high as the maximum voltage that can be directly applied across the terminals of a MOSFET is verified assuming a 0.18 µm CMOS technology. The energy overhead of the proposed circuit technique is low due to a charge recycling mechanism in the MOSFET gate drivers. An efficiency of 79.6% is demonstrated for a voltage conversion from 5.4 volts to 0.9 volts while supplying 250 mA of DC current.

Another important issue in multiple supply voltage circuits is high speed and full voltage swing signal transfer among the different voltage domains. Signal transfer among the circuitry operating at different voltage levels requires specialized voltage interface circuits. The power and delay overhead of voltage interface circuits is a primary issue in multiple supply voltage CMOS circuits. A bi-directional CMOS voltage interface circuit that drives high capacitive loads to full swing at high speed while consuming no static DC power is presented in this dissertation. The propagation delay, power consumption, and power efficiency characteristics of the proposed voltage interface circuit are compared to other interface circuits described in the literature. Up to a 3.6 times delay improvement and up to a 95% power reduction are observed as compared to previously published schemes. The speed and power characteristics of the presented voltage interface circuit have also been verified by experimental test circuits.

An alternative technique for reducing the impact of supply voltage scaling on circuit performance is scaling the threshold voltages. Threshold voltage scaling has accelerated together with scaling of the supply voltages during the past decade. At reduced threshold voltages, subthreshold leakage currents increase exponentially. Supply voltage scaling, when coupled with threshold voltage reduction, therefore increases the leakage power while lowering the dynamic switching power. Similar to standard single supply voltage CMOS circuits, single threshold voltage CMOS circuits also suffer from excessive energy consumption in order to achieve a target throughput. The standard approach based on scaling the threshold voltage of an entire circuit so as to achieve a target signal propagation speed along a small number of critical delay paths is an inefficient method for enhancing performance.
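The exponential dependence noted above can be summarized with a common first-order model of the off-state subthreshold current. The subthreshold swing $S$ (in mV per decade of current) and the form below are standard textbook notation assumed here for illustration, not a model taken from this dissertation:

$$I_{sub} \propto 10^{-V_t / S}, \qquad \frac{I_{sub}(V_t - \Delta V_t)}{I_{sub}(V_t)} = 10^{\Delta V_t / S}$$

As an illustration under an assumed swing of $S = 100$ mV/decade, each 100 mV reduction in the threshold voltage increases the subthreshold leakage current by a factor of ten.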

A dynamic threshold voltage scaling technique can mitigate the deleterious side effects of threshold voltage scaling. Variable threshold voltage CMOS circuit techniques utilize the body terminal of the transistors to dynamically adjust the transistor currents during circuit operation. Dynamic threshold voltage scaling is typically used for reducing the subthreshold leakage current in the idle portions while enhancing the speed of the active portions of an integrated circuit.

A new use of the variable threshold voltage CMOS circuit technique is proposed in this dissertation for enhanced noise immunity in domino logic circuits. A body bias circuit technique that provides significant enhancements in several electrical characteristics of domino logic circuits is presented in Chapter 9. The use of standard domino logic circuits will become impractical within a few technology generations due to noise immunity issues. The proposed variable threshold voltage keeper circuit technique can extend the lifetime of domino logic circuits in high performance integrated circuits beyond the sub-100 nm CMOS technology generation. The technique is based on selectively applying reverse and forward body bias circuit techniques to a domino logic circuit. The circuit technique enhances the noise immunity of domino logic circuits without degrading the speed and power characteristics. By adjusting the current strength of a keeper transistor through a body bias, the electrical characteristics of a domino logic circuit are dynamically shifted towards either high speed/low power operation or higher noise immunity. Due to the dynamic nature of the technique (which automatically adjusts the keeper strength with respect to the changing speed, power, and noise immunity requirements of a domino logic circuit for different operational modes), the noise immunity, power, and speed characteristics can all be simultaneously enhanced.

Another technique for mitigating the deleterious effects of threshold voltage scaling is multiple threshold voltage CMOS. Multiple threshold voltage CMOS circuits offer decreased subthreshold leakage currents and enhanced performance by selectively lowering the threshold voltages along the speed critical paths.

A specific multiple threshold voltage CMOS circuit technique is presented in this dissertation for application to high speed dynamic circuits. A quantitative study of the subthreshold leakage current characteristics of standard low threshold voltage and dual threshold voltage domino logic circuits is provided in Chapter 10. Different subthreshold leakage current conduction paths occur during different dynamic and output node voltage states. It is shown that a discharged dynamic node is preferable for reducing leakage current in a dual-V<sub>t</sub> circuit. Alternatively, a charged dynamic node is preferred for lower subthreshold leakage energy in a standard low-V<sub>t</sub> domino logic circuit with stacked pull-down devices, such as an AND gate.

Provided that a dual-V<sub>t</sub> CMOS technology is employed, the noise immunity of domino logic circuits can be significantly degraded, affecting the reliability. Two different dual-V<sub>t</sub> domino logic circuit techniques can be employed in order to maintain similar noise immunity as compared to standard low-V<sub>t</sub> circuits. Both keeper and output inverter sizing are required in a dual-V<sub>t</sub> domino logic circuit with a high threshold voltage (high-V<sub>t</sub>) keeper transistor in order to provide similar noise immunity as compared to a standard low-V<sub>t</sub> domino logic circuit. An alternative dual-V<sub>t</sub> domino logic circuit technique utilizes a low-V<sub>t</sub> keeper transistor for enhanced noise immunity. Under similar noise immunity conditions as compared to standard low-V<sub>t</sub> domino logic circuits, the savings in subthreshold leakage energy achieved by the dual-V<sub>t</sub> circuit technique with a high-V<sub>t</sub> keeper is 5.7 to 10.9 times higher as compared to the savings offered by the dual-V<sub>t</sub> circuit technique with a low-V<sub>t</sub> keeper. The effectiveness of the dual threshold voltage domino logic circuit technique in providing a significant savings in subthreshold leakage energy is verified down to a 100 mV difference between the high and low threshold voltages in a 0.18 μm CMOS technology.

A sleep switch dual threshold voltage domino logic circuit technique exploits the dynamic node voltage dependent asymmetry of the subthreshold leakage current characteristics of domino logic circuits (see Chapter 11). Existing techniques based on the application of a selected input vector to place a dual threshold voltage circuit into a low leakage state are shown to be ineffective in minimizing subthreshold leakage current in multiple stage domino circuits with inverted internal signals. The sleep switch circuit technique exploits the full effectiveness of a dual threshold voltage CMOS technology to reduce subthreshold leakage current by strongly turning off all of the high threshold voltage transistors, independent of the input signals. The energy overhead of the circuit technique is low, justifying the use of the proposed sleep scheme by providing significant savings in total energy consumption during short idle periods.

Certain challenges in reducing power consumption while enhancing speed and maintaining reliability in CMOS integrated circuits have been highlighted in this dissertation. Several enabling technologies have been described for achieving higher energy efficiency and enhanced reliability at the circuit and system levels. Examination of current integrated systems reveals inefficiencies at all levels of the design hierarchy. These inefficiencies are primarily due to constraints imposed by the system complexity which is directly reflected in the organization and hierarchy within a company. The complexity of the systems, in return, is imposed by the production, distribution, and consumption dynamics of the market. The enormous complexity of current integrated circuits composed of hundreds of millions of transistors is shaped by the market demand for ever increasing performance and functionality and the time-to-market constraints. This market pressure has traditionally left very little space for innovation and optimization with each new technology generation. The quantitatively impressive enhancements in speed and application variety of integrated circuits over the years have historically been managed by ignoring the increasing cost of energy and the degradation in reliability with each new product generation. While the additional market value created by enhanced speed (both clock speed and design turnaround) and functionality has been the primary focus, additional costs due to increasing power consumption and complexity have typically been neglected due to the steadily increasing revenues. Continuing this same historical path will not be possible because, at this point in history, the increasing cost imposed by greater levels of power consumption and degraded reliability is likely to negate much of the additional value produced from the enhanced circuit speed and functionality. Introducing a new product which satisfies both customer expectations for higher speed and more functionality and vendor expectations for higher revenues will become infeasible based on existing company marketing policies which only trust in the notion of speed at all costs.

The push for moving from a mainstream design approach and engaging in another technique, as demonstrated throughout the history of the semiconductor industry, is always imposed by the requirements for cost effectiveness. An approach which claims to provide a solution to a recognized problem can only survive in a market environment if the approach is cost effective. Due to lagging battery technologies, increasing cost of cooling, and decreasing yield (caused by the device, circuit, and system level degradation of reliability), the author strongly believes that the end of the road for current mainstream speed-centric CMOS design techniques is quickly approaching. Low power and reliability concerns will dominate at all levels of the design hierarchy as the end of this speed-centric road that has been traveled for approximately four decades approaches. Low power and reliable integrated circuit and system design will develop into an increasingly exciting field full of opportunities. The research presented in this dissertation can be considered as a prelude to a larger discussion of the many possible opportunities for moving energy efficiency and reliability of nanometer semiconductor technologies to higher levels.

# Chapter 14

## **Future Research**

Some interesting research activities for the near future are briefly discussed in this chapter. The scaling of standard planar MOSFETs is expected to become infeasible within a few technology generations. Research activities for modeling and characterizing novel nanometer devices are described in Section 14.1. Another area of research activity is energy efficient computing. Future research directions for developing low power and high speed CMOS integrated circuit techniques are discussed in Section 14.2. A third group of future research activities is related to enhancing the reliability of integrated circuits. The increasing necessity for developing circuit techniques that can enhance robustness against noise and parameter variations in CMOS integrated circuits is described in Section 14.3.

## 14.1. Nanometer Devices

The scaling of planar silicon devices has been continuing for approximately five decades. However, due to fabrication related difficulties and degradation in device electrical characteristics caused by short-channel effects, scaling standard planar MOSFETs is expected to slow down within the next decade ($L_{min} \approx 10$ nm). Today, integrated circuit technologies are shifting from bulk silicon to silicon-on-insulator (SOI) as the new industry standard [149]-[152]. SOI devices have enhanced short-channel characteristics as compared to standard bulk silicon devices. Moreover, SOI devices typically operate at considerably higher speeds due to lower junction capacitances as compared to bulk silicon devices [149], [150]. Multiple gate variations of SOI based technologies, as shown in Fig. 14.1, are likely to be widely used within the next decade [147], [152]. A gradual shift towards devices based on novel materials, such as carbon nanotubes, is likely to be observed towards the end of the next decade (2020) [153], [154]. The modeling and characterization of these novel devices will become critical in the development of high speed, energy efficient, and reliable circuits and systems. Nanometer semiconductor device modeling and characterization will be a primary research focus. Accurate semiconductor device models for use in circuit design and analysis will become highly desirable.



Fig. 14.1. Triple gate and gate-all-around (quadruple gate) MOSFETs.

## 14.2. Energy Efficiency in CMOS Circuits

The price for performance and the functional enhancement of integrated circuits has traditionally been greater design and manufacturing complexity and higher power consumption. As stressed throughout this dissertation, the generation, distribution, and dissipation of power are at the forefront of current problems faced by integrated circuit designers. In order to continue the historical trend of reducing the unit cost of a circuit while simultaneously enhancing performance and functionality, radical changes are required in the manner in which integrated circuits are designed. Higher performance at all costs is no longer an option. Energy efficient semiconductor devices, circuit techniques, and microarchitectures are necessary to maintain the pace of expansion that the semiconductor industry has enjoyed over the past forty years. Developing circuit techniques for energy efficient computing will continue to be a primary research focus. Future research activities related to multiple supply voltage CMOS circuits are described in Section 14.2.1. Enabling technologies for realizing dynamic voltage and frequency scaling integrated circuits are presented in Section 14.2.2. The advantages and challenges of dividing an integrated circuit into multiple domains for individually optimizing the supply voltage and clock frequency within each circuit domain are described in Section 14.2.3. Future research activities for simultaneously reducing both subthreshold and gate oxide leakage currents in nanometer CMOS technologies are presented in Section 14.2.4.

### 14.2.1. Multiple Supply Voltage CMOS Circuits

As discussed in Chapters 3 to 6, the energy and area overhead of the additional power supplies is a primary concern in multiple supply voltage CMOS circuits. Due to the advantages of high voltage power delivery on a circuit board and monolithic DC-DC conversion, next generation low voltage and high power microprocessors are likely to require high input voltage, large step down DC-DC converters monolithically integrated onto the same die. High efficiency monolithic switching DC-DC converters that can generate low operating voltages from a significantly higher board level voltage in a scaled nanometer CMOS technology will become increasingly desirable. Developing high efficiency and high switching frequency monolithic DC-DC conversion techniques will be an important research objective. A critical technology to enable monolithic DC-DC conversion is an integrated inductor technology. High efficiency integrated inductor technologies will also be critical for enabling the energy efficient system-on-chip platforms in which the RF circuits will be integrated with the digital and analog integrated circuits. The design and characterization of high quality integrated inductors based on magnetic materials will be an important research area.

The voltage assignment process in multiple supply voltage CMOS circuits will need to be automated. Developing computer-aided design tools for partitioning circuits into multiple voltage domains will be an interesting research task. Another important issue in multiple supply voltage CMOS circuits is the power and delay overhead of voltage interface circuits required for signal transfer among different circuit blocks operating at different voltages. Developing low power, high speed voltage level converters for use in multiple supply voltage CMOS integrated circuits with varying voltage constraints will be another important research topic.

### 14.2.2. Dynamic Supply Voltage and Frequency Scaling

The computational load in a microprocessor varies with time. Applications in a typical microprocessor tend to produce peak performance requirements followed by idle periods. Dynamic voltage scaling techniques exploit variations in the computational workload by dynamically modifying the supply voltage and clock frequency. The primary objective of a dynamic voltage scaling technique is to provide high throughput during the execution of the computation-intensive tasks while saving energy at other times by adequately lowering the supply voltage and operating speed. Dynamic voltage scaling techniques can also be used to reduce the effect of die-to-die parameter variations on the circuit characteristics such as the clock frequency, active mode power, and standby leakage current.
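The tradeoff exploited by dynamic voltage scaling can be illustrated with a first-order model. The sketch below assumes that the dynamic switching energy per clock cycle is $C V_{DD}^2$ and that the maximum clock frequency follows the alpha-power law, $f \propto (V_{DD} - V_t)^{\alpha} / V_{DD}$. The function names and the values of the threshold voltage, $\alpha$, and the capacitance are illustrative assumptions rather than data from this dissertation.

```python
def max_frequency(vdd, vt=0.3, alpha=1.3, k=1.0):
    """Relative maximum clock frequency from the alpha-power law (assumed model)."""
    if vdd <= vt:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return k * (vdd - vt) ** alpha / vdd


def energy_per_cycle(vdd, c_eff=1.0):
    """Dynamic switching energy per clock cycle, E = C * VDD^2 (arbitrary units)."""
    return c_eff * vdd ** 2


def dvs_savings(vdd_high, vdd_low):
    """Return (relative speed, relative energy per cycle) at vdd_low versus vdd_high."""
    speed = max_frequency(vdd_low) / max_frequency(vdd_high)
    energy = energy_per_cycle(vdd_low) / energy_per_cycle(vdd_high)
    return speed, energy


if __name__ == "__main__":
    # Hypothetical high performance and low energy operating voltages.
    speed, energy = dvs_savings(vdd_high=1.2, vdd_low=0.8)
    print(f"relative speed:  {speed:.2f}")
    print(f"relative energy: {energy:.2f}")
```

Under these assumptions the energy per cycle falls quadratically with the supply voltage while the achievable frequency falls more slowly, which is why lowering the voltage during periods of low computational load is attractive. Leakage energy and the overhead of the voltage regulator are not modeled.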

In a microprocessor with dynamic frequency scaling, a monolithic, digital, and programmable clock generator is desirable. The clock generator must have high noise immunity, increased tolerance to process variations, and low power consumption. A digitally controlled low power clock generator with robust and high speed dynamic frequency scaling characteristics for integration onto the same die as a dynamic voltage and frequency scaled microprocessor will be an important research objective.

Integrating a dynamic voltage scaling DC-DC converter on the same die as the load provides several desirable aspects in dynamic voltage and frequency scaling circuits. In addition to the challenge of achieving high efficiency in a limited die area, another challenge unique to dynamic voltage scaling DC-DC converters is the switching speed of the output voltage. Enhancing the output switching speed and maintaining a low output voltage ripple are two competing requirements in dynamic voltage scaling DC-DC converters. The development of a monolithic DC-DC converter with a high speed voltage scaling capability will be an important research objective.

### 14.2.3. Circuits with Multiple Voltage and Clock Domains

Partitioning an integrated circuit into multiple circuit blocks can significantly enhance the effectiveness of dynamic voltage and frequency scaling techniques [155], [156]. Rather than determining an optimum supply voltage for an entire integrated circuit for a specific workload, the supply voltage and operating frequency can be optimized independently within each circuit block. This approach can be seen as a combination of globally asynchronous, locally synchronous (GALS) circuits and dynamic voltage and frequency scaling techniques. In addition to multiple clock domains that operate in a typical GALS-based circuit, different circuit blocks operating at different supply voltages exist in a multiple voltage and clock domain integrated circuit, as shown in Fig. 14.2. In addition to lowering the power consumption, a multiple voltage and clock domain circuit can also significantly reduce the complexity of the on-chip synchronization circuitry.

A GALS microprocessor contains several independently clocked synchronous circuit blocks. Within each circuit block (or clock domain), the clock distribution network is less complex due to the lower number of transistors and the smaller physical area. The tolerable clock skew and jitter within a clock domain can be achieved by adjusting the size and number of clock domains within a GALS-based integrated circuit. A GALS-based microprocessor is skew independent at the global level since communication among the clock domains does not require a global clock reference. The clock domains within a GALS-based integrated circuit communicate via asynchronous protocols. In addition to reducing the power, jitter, and skew of the clock distribution network, GALS offers the opportunity to independently optimize the operating voltage and frequency within each clock domain. Since different clock domains communicate asynchronously, the clock frequency and supply voltage of each clock domain can be dynamically adjusted, satisfying the throughput needs of the system while minimizing the energy consumed by each circuit block.
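The benefit of optimizing each domain independently can be sketched with a small numerical comparison. The example below assumes a first-order dynamic power model, $P = C V_{DD}^2 f$, and a hypothetical table of operating points. The domain names, capacitances, and frequency requirements are invented for illustration, and the overhead of the asynchronous interfaces and voltage level converters is ignored.

```python
# Hypothetical operating points: (supply voltage in V, maximum clock frequency in GHz),
# sorted by increasing supply voltage.
OPERATING_POINTS = [(0.7, 1.0), (0.9, 2.0), (1.1, 3.0), (1.3, 4.0)]


def lowest_voltage_for(freq_ghz):
    """Pick the lowest supply voltage whose maximum frequency meets the demand."""
    for vdd, fmax in OPERATING_POINTS:
        if fmax >= freq_ghz:
            return vdd
    raise ValueError("no operating point satisfies the requested frequency")


def dynamic_power(c_eff, vdd, freq_ghz):
    """First-order dynamic power, P = C * VDD^2 * f (arbitrary units)."""
    return c_eff * vdd ** 2 * freq_ghz


def compare(domains):
    """domains maps a name to (effective capacitance, required frequency in GHz).

    Returns (power with one global domain, power with per-domain scaling).
    """
    # Single domain: every block runs at the voltage and frequency of the fastest block.
    f_global = max(f for _, f in domains.values())
    v_global = lowest_voltage_for(f_global)
    p_single = sum(dynamic_power(c, v_global, f_global) for c, _ in domains.values())
    # Multiple domains: each block runs at its own lowest sufficient operating point.
    p_multi = sum(dynamic_power(c, lowest_voltage_for(f), f) for c, f in domains.values())
    return p_single, p_multi


if __name__ == "__main__":
    example = {"fetch": (1.0, 4.0), "floating_point": (2.0, 1.0), "cache": (1.5, 2.0)}
    p_single, p_multi = compare(example)
    print(f"single domain: {p_single:.2f}, multiple domains: {p_multi:.2f}")
```

The comparison is only meaningful when the workload of the blocks differs. If every block requires the peak frequency, both configurations consume the same dynamic power in this model.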



Fig. 14.2. An integrated circuit with multiple voltage and clock domains.

A significant issue in a GALS system is communication among different clock domains. Evaluating the energy overhead for generating and distributing the control signals to manage the signal transfer among regions with different clock references will be a topic of interest. Developing low power and high speed asynchronous communication (handshaking) protocols and circuitry will become necessary. The development of a design methodology for partitioning an IC into multiple clock domains with dynamic voltage and frequency scaling is an important research problem. Tools to synthesize asynchronous communication networks among the synchronous blocks will be another important research area. Digitally controlled clock generators and dynamic voltage scaling DC-DC converters for application to the clock domains of a GALS-based microprocessor are another interesting research topic.

### 14.2.4. Leakage Current Reduction Techniques

As the aggressive scaling of threshold voltages continues in order to enhance speed, subthreshold leakage power is expected by the end of this decade to dominate the total power consumption of a CMOS circuit. Energy efficient circuit techniques aimed at lowering leakage currents are, therefore, highly desirable. The development of subthreshold leakage current reduction techniques will continue to be an important research interest.

A primary and immediate challenge for scaling MOSFETs is imposed by the gate dielectric thickness. Scaling the gate oxide thickness is crucial to enhancing the performance of semiconductor devices. The quantum mechanical tunneling of carriers increases exponentially with decreasing insulator layer thickness. Gate oxide tunneling current is expected to significantly increase the energy consumption while degrading the reliability characteristics of future deeply scaled CMOS technologies. As discussed in Chapter 2, the dielectric thickness can be increased while maintaining a high capacitive coupling between the gate and the channel provided that silicon dioxide (SiO<sub>2</sub>) is replaced with a higher dielectric constant (high-k) material. Hafnium dioxide (HfO<sub>2</sub>) and zirconium dioxide (ZrO<sub>2</sub>) are the two most likely high-k materials to replace SiO<sub>2</sub> in the future. However, due to a number of process related difficulties, such as the low quality of the interface between these new materials and silicon which causes a degradation of the surface mobility, the transition to these new dielectric materials is expected to be slow (possibly after the 65 nm technology generation in 2007 [157]).
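The relationship between the physical thickness of a high-k film and the SiO<sub>2</sub> film that it replaces can be sketched through the equivalent oxide thickness (EOT): two films provide the same gate capacitance per unit area when $t_{high\text{-}k} = EOT \cdot k_{high\text{-}k} / k_{SiO_2}$. The snippet below is a minimal illustration. The relative permittivity used for the high-k material is an assumed round number, since reported values depend on the deposition process.

```python
K_SIO2 = 3.9  # relative permittivity of silicon dioxide


def physical_thickness(eot_nm, k_high):
    """Physical thickness (nm) of a high-k film that provides the same gate
    capacitance per unit area as an SiO2 film of thickness eot_nm."""
    return eot_nm * k_high / K_SIO2


if __name__ == "__main__":
    # Assumed permittivity of 20 for a hafnium-based dielectric (illustrative value).
    t = physical_thickness(eot_nm=1.0, k_high=20.0)
    print(f"physical thickness for 1 nm EOT: {t:.2f} nm")
```

A physically thicker film presents a wider tunneling barrier, which is the reason that a high-k dielectric can lower the gate tunneling current at the same capacitive coupling.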

Recent research shows that specific circuit techniques that are effective for reducing subthreshold leakage current can increase the gate oxide leakage current [157]. Developing leakage reduction techniques that consider both subthreshold and gate oxide leakage currents in scaled nanometer CMOS integrated circuits will become an increasingly important research topic as the aggressive scaling of the gate insulator thickness and threshold voltages is expected to continue.

## 14.3. Reliability in CMOS Circuits

CMOS integrated circuits have become more sensitive to noise while on-chip noise levels continue to rise with each new technology generation. The reliability of CMOS integrated circuits has degraded due to scaling the device and interconnect dimensions and the on-chip voltage levels. Future research activities for modeling various sources of on-chip noise while developing noise suppression techniques in CMOS integrated circuits are described in Section 14.3.1. The growing importance of robust circuit techniques that can tolerate significant die-to-die and within-die parameter fluctuations is presented in Section 14.3.2. The challenges of developing robust on-chip clock generators based on phase-locked loops that can tolerate high on-chip noise and significant parameter variations are discussed in Section 14.3.3.

### 14.3.1. On-Chip Noise and Immunity Issues in CMOS Integrated Circuits

Error free operation of CMOS circuits has become increasingly challenging as integrated circuit technologies evolve. Characterizing the various sources of noise in high performance integrated circuits will be a primary research focus. Circuit techniques that can suppress on-chip noise will become critical. Generating design guidelines for enhancing the noise immunity of CMOS integrated circuits at reduced feature sizes and lower supply and threshold voltages will be an important research emphasis.



Fig. 14.3. Effect of technology scaling on the physical geometries of interconnect lines.

An important source of noise in CMOS integrated circuits is interconnect coupling noise [158], [159]. Due to increasing device densities, interconnect lines are physically closer with each new technology generation, as shown in Fig. 14.3. The resistance of the interconnect lines increases since the width of the interconnect lines is reduced as part of the technology scaling process. Due to the increasing resistance of the interconnect lines, supply voltage fluctuations across a die have become significant. The increasing interconnect resistance also increases the parasitic power dissipation. In order to limit the increased resistance of the interconnect lines, the height of the interconnect lines is scaled at a much smaller rate as compared to the width with each new technology generation. The aspect ratio, therefore, increases significantly, thereby increasing the coupling capacitance between adjacent interconnect lines on the same metal layer. Noise generated on a quiescent line due to the coupling capacitance between the quiescent line (victim line) and a nearby switching line (aggressor line) can cause erroneous transitions, excessive power consumption, and malfunctions in those circuits driven by the victim line.
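The effect of scaling the width and spacing faster than the height can be checked with a short calculation. The sketch below uses $R = \rho L / (W H)$ for the line resistance and a parallel-plate approximation, $C_c \approx \varepsilon_0 \varepsilon_r H L / S$, for the sidewall coupling capacitance, which neglects fringing fields. The dimensions, scaling factors, and material constants are illustrative assumptions.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m


def wire_resistance(rho, length, width, height):
    """Resistance of a rectangular wire, R = rho * L / (W * H)."""
    return rho * length / (width * height)


def coupling_capacitance(eps_r, length, height, spacing):
    """Parallel-plate estimate of the sidewall capacitance between two
    adjacent lines on the same metal layer (fringing fields neglected)."""
    return EPS0 * eps_r * height * length / spacing


def scale(width, height, spacing, s_lateral, s_vertical):
    """Scale the width and spacing by s_lateral and the height by s_vertical."""
    return width * s_lateral, height * s_vertical, spacing * s_lateral


if __name__ == "__main__":
    rho, eps_r, length = 1.7e-8, 3.0, 1e-3   # assumed copper-like wire, 1 mm long
    w0, h0, s0 = 0.2e-6, 0.4e-6, 0.2e-6      # assumed starting geometry
    w1, h1, s1 = scale(w0, h0, s0, s_lateral=0.7, s_vertical=0.9)
    r_ratio = wire_resistance(rho, length, w1, h1) / wire_resistance(rho, length, w0, h0)
    c_ratio = (coupling_capacitance(eps_r, length, h1, s1)
               / coupling_capacitance(eps_r, length, h0, s0))
    print(f"aspect ratio: {h0 / w0:.2f} -> {h1 / w1:.2f}")
    print(f"resistance ratio: {r_ratio:.2f}, coupling capacitance ratio: {c_ratio:.2f}")
```

With these assumed factors both the resistance and the coupling capacitance of a line of fixed length increase after scaling, consistent with the trend described above.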

Developing shielding techniques and guidelines for suppressing coupling noise in CMOS integrated circuits is an important research requirement. Due to limitations imposed by the die area, noise suppression based on shielding and wire spacing is non-trivial. An alternative technique for suppressing coupling noise is inserting repeaters along the long interconnect lines. Since an inverter is essentially a low pass filter with high noise immunity, repeater insertion can significantly suppress the generation and propagation of coupling noise on long interconnect lines. Repeater insertion techniques to optimize long interconnect lines for delay, power, and coupling noise suppression are highly desirable.

Another important source of noise in CMOS integrated circuits is power supply noise. Power supply noise has both low frequency and high frequency components [158]. The low frequency component of the power supply noise is due to the resistive IR drops within the package and along the on-chip power grid. Alternatively, the high frequency components of the power supply noise are due to the inductance of the package and the on-chip power grid. As both the supply current and the slew rates increase with each new technology generation, simultaneous switching noise (L di/dt noise) on the power supply and ground lines will also increase [160]. The performance and reliability of CMOS circuits supplied by these noisy power and ground lines degrade. In addition to simultaneous switching noise, noise caused by the mutual inductance between a signal wire and the current return paths will also become an increasingly important issue as the width of the interconnect lines in the upper metal layers and clock frequency are increased. Modeling and characterization of the resistive and inductive power supply noise will be an important research topic. Design guidelines that suppress both the generation and transmission of resistive and inductive noise within the power distribution grids will need to be developed.
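The two components of power supply noise can be estimated to first order as a resistive drop, $I R$, and an inductive drop, $L \, di/dt$. The sketch below evaluates both terms for a lumped model of the power delivery path. The function name and the current, resistance, inductance, and transition time values are hypothetical and are chosen only to show the calculation.

```python
def supply_noise(i_avg, r_path, l_path, delta_i, delta_t):
    """Return (resistive IR drop, inductive L di/dt drop) in volts for a
    lumped power delivery path.

    i_avg   : average supply current (A)
    r_path  : effective resistance of the package and power grid (ohm)
    l_path  : effective inductance of the package and power grid (H)
    delta_i : change in supply current during a switching event (A)
    delta_t : duration of the current transition (s)
    """
    ir_drop = i_avg * r_path
    ldidt_drop = l_path * delta_i / delta_t
    return ir_drop, ldidt_drop


if __name__ == "__main__":
    ir, ldidt = supply_noise(i_avg=50.0, r_path=1e-3, l_path=5e-12,
                             delta_i=10.0, delta_t=1e-9)
    print(f"IR drop: {ir * 1e3:.1f} mV, L di/dt drop: {ldidt * 1e3:.1f} mV")
```

Because the inductive term is inversely proportional to the transition time, faster current transients increase the high frequency component of the noise even when the average current is unchanged.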

A monolithic switching DC-DC converter can produce significant noise on a microprocessor die, as shown in Fig. 14.4. Modeling and evaluating substrate noise generated by a monolithic DC-DC converter will be a future research objective. Similarly, on-chip clock generators inject considerable amounts of noise into the substrate. Moreover, the clock distribution network can act as an aggressor to the surrounding interconnect lines. Evaluating the on-chip clock generator and distribution network related noise will be another interesting research topic. Noise suppression methodologies based on shielding and repeater insertion, aimed at reducing the noise induced by the on-chip clock distribution network on the surrounding interconnect lines, will be an important research problem.



Fig. 14.4. Various sources of noise in a microprocessor.

### 14.3.2. Parameter Variations

A significant issue with threshold voltage and device scaling is the increasing effect of die-to-die and within-die parameter variations on the speed and power dissipation characteristics of CMOS integrated circuits. Die-to-die and within-die fluctuations of the critical dimensions (gate length, gate oxide thickness, and junction depletion width) effectively increase with technology scaling. Moreover, the sensitivity of the threshold voltage to variations in the critical dimensions is greater due to increasing short-channel effects as the gate length and threshold voltage are both reduced with technology scaling. Process variations cause integrated circuits to exhibit different speed and power characteristics. The electrical characteristics of a CMOS circuit fabricated in a deep submicrometer process technology become increasingly probabilistic. The number of dies that satisfy a target clock frequency and maximum power dissipation constraint is reduced, degrading the yield. The increasing cost of fabricating deep submicrometer integrated circuits is, therefore, further aggravated by lower yields caused by greater process variations. Process parameter, supply voltage, and temperature variation tolerant integrated circuit design will be an important research area.
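The link between parameter variations and yield can be illustrated with a simple Monte Carlo experiment. The sketch below assumes that the maximum clock frequency of each die is normally distributed around a nominal value and counts the dies that meet a target frequency. The distribution, the nominal and target frequencies, and the sigma values are assumptions, and the maximum power dissipation constraint is not modeled.

```python
import random


def frequency_yield(sigma, f_nominal=3.0, f_target=2.8, n_dies=10000, seed=1):
    """Fraction of simulated dies whose maximum frequency (GHz) meets f_target.

    sigma is the relative standard deviation of the die frequency. The seed is
    fixed so that repeated runs draw the same random samples.
    """
    rng = random.Random(seed)
    passing = 0
    for _ in range(n_dies):
        f_die = f_nominal * (1.0 + sigma * rng.gauss(0.0, 1.0))
        if f_die >= f_target:
            passing += 1
    return passing / n_dies


if __name__ == "__main__":
    for sigma in (0.03, 0.06, 0.10):
        print(f"sigma = {sigma:.2f}: yield = {frequency_yield(sigma):.3f}")
```

When the target frequency lies below the nominal frequency, a wider distribution places more dies below the target, so the simulated yield drops as sigma grows.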

### 14.3.3. On-Chip Clock Generation

Phase locked loops (PLLs) are typically employed in high performance microprocessors for phase synchronization and frequency multiplication of an external clock. PLLs have low phase noise, clock skew, and jitter characteristics attractive for clock generation in a high performance microprocessor. The analog circuitry within a PLL is sensitive to process variations and noise. On-chip noise levels in high performance microprocessors increase with technology scaling and higher operating frequencies. Furthermore, the analog circuits within a PLL are not suitable for low voltage operation. Integrating a PLL within a high performance microprocessor, therefore, becomes increasingly difficult with technology scaling. Reliable monolithic PLL clock generation with high tolerance to process, temperature, and supply voltage variations is an important research objective. PLLs operating at multiple supply voltages are also a topic of interest.

# **Bibliography**

- [1] M. T. Bohr, "Nanotechnology Goals and Challenges for Electronic Applications," *IEEE Transactions on Nanotechnology*, Vol. 1, No. 1, pp. 56-62, March 2002.
- [2] R. Ronen *et al.*, "Coming Challenges in Microarchitecture and Architecture," *Proceedings of the IEEE*, Vol. 89, No. 3, pp. 325-340, March 2001.
- [3] S. Borkar, "Design Challenges of Technology Scaling," *IEEE Micro*, Vol. 19, pp. 23-29, July/August 1999.
- [4] K. Roy and S. C. Prasad, "Low-Power CMOS VLSI Circuit Design," *John Wiley & Sons, Inc.*, 2000.
- [5] S. Borkar, "Obeying Moore's Law Beyond 0.18 Micron," *Proceedings of the IEEE International ASIC/SOC Conference*, pp. 26-31, September 2000.
- [6] D. M. Brooks *et al.*, "Power-Aware Microarchitecture: Design and Modeling Challenges for Next Generation Microprocessors," *IEEE Micro*, Vol. 20, pp. 26-44, November/December 2000.
- [7] M. J. Flynn, P. Hung, and K. W. Rudd, "Deep Submicron Microprocessor Design Issues," *IEEE Micro*, Vol. 19, pp. 11-22, July/August 1999.
- [8] O. Takahashi, S. H. Dhong, P. Hofstee, and J. Silberman, "High-Speed, Power-Conscious Circuit Design Techniques for High-Performance Computing," *Proceedings of the IEEE International Symposium on VLSI Technology, Systems, and Applications*, pp. 279-282, December 2001.
- [9] A. P. Chandrakasan and R. W. Brodersen, "Low Power CMOS Digital Design," *Kluwer Academic Publishers*, 1995.

- [10] S. H. Gunther, F. Binns, D. M. Carmean, and J. C. Hall, "Managing the Impact of Increasing Microprocessor Power Consumption," *Intel Technology Journal*, Q1 Issue, pp. 1-9, February 2001.
- [11] A. Slawsby, "Taking Charge: Trends in Mobile Device Power Consumption," *Intel Corporation Internal Documents and Presentations #27514*, pp. 1-13, June 2002.
- [12] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-Power CMOS Digital Design," *IEEE Journal of Solid-State Circuits*, Vol. 27, No. 4, pp. 473-484, April 1992.
- [13] G. Moore, "Cramming More Components onto Integrated Circuits," *Electronics*, Vol. 38, No. 8, pp. 114-117, April 1965.
- [14] G. E. Moore, "Progress in Digital Integrated Electronics," *Proceedings of the IEEE International Electron Devices Meeting*, pp. 11-13, December 1975.
- [15] Y. Leblebici, "Design Considerations for CMOS Digital Circuits with Improved Hot-Carrier Reliability," *IEEE Journal of Solid-State Circuits*, Vol. 31, No. 7, pp. 1014-1024, July 1996.
- [16] P. P. Gelsinger, P. A. Gargini, G. H. Parker, and A. Y. C. Yu, "Microprocessors Circa 2000," *IEEE Spectrum*, pp. 43-47, October 1989.
- [17] G. Sery, S. Borkar, and V. De, "Life is CMOS: Why Chase the Life After," *Proceedings of the IEEE/ACM Design Automation Conference*, pp. 78-83, June 2002.
- [18] J. R. Pfiester, J. D. Shott, and J. D. Meindl, "Performance Limits of CMOS ULSI," *IEEE Journal of Solid-State Circuits*, Vol. SC-20, No. 1, pp. 253-263, February 1985.
- [19] L. Chang *et al.*, "Moore's Law Lives On," *IEEE Circuits and Devices Magazine*, Vol. 19, No. 1, pp. 35-42, January 2003.

- [20] T. Klein, "Technology and Performance of Integrated Complementary MOS Circuits," *IEEE Journal of Solid-State Circuits*, Vol. SC-4, No. 3, pp. 122-130, June 1969.
- [21] S. Borkar, "Low Power Design Challenges for the Decade," *Proceedings of the IEEE/ACM Asia and South Pacific Design Automation Conference*, pp. 293-296, June 2001.
- [22] Intel Pentium 4 Processor Thermal Design Guide, Intel Corporation Press, 2002.
- [23] D. Liu and C. Svensson, "Trading Speed for Low Power by Choice of Supply and Threshold Voltages," *IEEE Journal of Solid-State Circuits*, Vol. 28, No. 1, pp. 10-17, January 1993.
- [24] W. S. Song and L. A. Glasser, "Power Distribution Techniques for VLSI Circuits," *IEEE Journal of Solid-State Circuits*, Vol. SC-21, No. 1, pp. 150-156, February 1986.
- [25] A. V. Mezhiba and E. G. Friedman, "Inductive Properties of High-Performance Power Distribution Grids," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 10, No. 6, pp. 762-776, December 2002.
- [26] V. Kursun, S. G. Narendra, V. K. De, and E. G. Friedman, "Efficiency Analysis of a High Frequency Buck Converter for On-Chip Integration with a Dual-V<sub>DD</sub> Microprocessor," *Proceedings of the European Solid-State Circuits Conference*, pp. 743-746, September 2002.
- [27] R. Gonzalez, B. M. Gordon, and M. A. Horowitz, "Supply and Threshold Voltage Scaling for Low Power CMOS," *IEEE Journal of Solid-State Circuits*, Vol. 32, No. 8, pp. 1210-1216, August 1997.

- [28] K. Usami *et al.*, "Automated Low-Power Technique Exploiting Multiple Supply Voltages Applied to a Media Processor," *IEEE Journal of Solid-State Circuits*, Vol. 33, No. 3, pp. 463-472, March 1998.
- [29] S. Mutoh *et al.*, "1-V Power Supply High-Speed Digital Circuit Technology with Multithreshold-Voltage CMOS," *IEEE Journal of Solid-State Circuits*, Vol. 30, No. 8, pp. 847-854, August 1995.
- [30] V. Kursun, S. G. Narendra, V. K. De, and E. G. Friedman, "Analysis of Buck Converters for On-Chip Integration with a Dual Supply Voltage Microprocessor," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 11, No. 3, pp. 514-522, June 2003.
- [31] V. Kursun, S. G. Narendra, V. K. De, and E. G. Friedman, "Monolithic DC-DC Converter Analysis and MOSFET Gate Voltage Optimization," *Proceedings of the IEEE International Symposium on Quality Electronic Design*, pp. 279-284, March 2003.
- [32] V. Kursun, R. M. Secareanu, and E. G. Friedman, "CMOS Voltage Interface Circuit for Low Power Systems," *Proceedings of the IEEE International Symposium on Circuits and Systems*, Vol. 3, pp. 667-670, May 2002.
- [33] V. Kursun and E. G. Friedman, "Domino Logic with Variable Threshold Voltage Keeper," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 11, No. 6, pp. 1080-1093, December 2003.
- [34] V. Kursun and E. G. Friedman, "Low Swing Dual Threshold Voltage Domino Logic," *Proceedings of the ACM/SIGDA Great Lakes Symposium on VLSI*, pp. 47-52, April 2002.

- [35] S. Dropsho, V. Kursun, D. H. Albonesi, S. Dwarkadas, and E. G. Friedman, "Managing Static Leakage Energy in Microprocessor Functional Units," *Proceedings of the IEEE/ACM International Symposium on Microarchitecture*, pp. 321-332, November 2002.
- [36] A. P. Chandrakasan and R. W. Brodersen, "Minimizing Power Consumption in Digital CMOS Circuits," *Proceedings of the IEEE*, Vol. 83, No. 4, pp. 498-523, April 1995.
- [37] A. Chandrakasan, W. J. Bowhill, and F. Fox, "Design of High-Performance Microprocessor Circuits," *The Institute of Electrical and Electronics Engineers, Inc., New York*, 2001.
- [38] J. W. Nilsson, "Electric Circuits," Addison-Wesley Publishing Company, 1994.
- [39] W. Liu *et al.*, "BSIM4.2.0 MOSFET Model Users' Manual," University of California, Berkeley, 2000.
- [40] D. J. Frank *et al.*, "Device Scaling Limits of Si MOSFETs and Their Application Dependencies," *Proceedings of the IEEE*, Vol. 89, No. 3, pp. 259-288, March 2001.
- [41] Y. Taur, C. H. Wann, and D. J. Frank, "25 nm CMOS Design Considerations," *Proceedings of the IEEE International Electron Devices Meeting*, pp. 789-792, December 1998.
- [42] Y. Taur, "CMOS Scaling Beyond 0.1 µm: How Far Can It Go?," *Proceedings of the IEEE International Symposium on VLSI Technology, Systems, and Applications*, pp. 6-9, June 1999.
- [43] D. J. Frank, Y. Taur, and H-S. P. Wong, "Future Prospects for Si CMOS Technology," *Proceedings of the IEEE Annual Device Research Conference*, pp. 18-21, June 1999.

- [44] T. Grotjohn and B. Hoefflinger, "A Parametric Short-Channel MOS Transistor Model for Subthreshold and Strong Inversion Current," *IEEE Journal of Solid-State Circuits*, Vol. SC-19, No. 1, pp. 100-112, February 1984.
- [45] T. Toyabe and S. Asai, "Analytical Models of Threshold Voltage and Breakdown Voltage of Short-Channel MOSFET's Derived from Two-Dimensional Analysis," *IEEE Journal of Solid-State Circuits*, Vol. SC-14, No. 2, pp. 375-383, April 1979.
- [46] Y-S. Lin *et al.*, "Leakage Scaling in Deep Submicron CMOS for SoC," *IEEE Transactions on Electron Devices*, Vol. 49, No. 6, pp. 1034-1041, June 2002.
- [47] W-L. Zhang, L-L. Tian, and Z-L. Yang, "Unified Deep-Submicron MOSFET Model for Circuit Simulation," *Proceedings of the IEEE International Conference on Solid-State and Integrated Circuit Technology*, pp. 439-442, October 1998.
- [48] A. Ferre and J. Figueras, "Characterization of Leakage Power in CMOS Technologies," *Proceedings of the IEEE International Conference on Electronics, Circuits and Systems*, Vol. 2, pp. 185-188, September 1998.
- [49] B. J. Sheu, D. L. Scharfetter, P-K. Ko, and M-C. Jeng, "BSIM: Berkeley Short-Channel IGFET Model for MOS Transistors," *IEEE Journal of Solid-State Circuits*, Vol. SC-22, No. 4, pp. 558-566, August 1987.
- [50] S. Narendra, V. De, S. Borkar, D. Antoniadis, and A. Chandrakasan, "Full-chip Sub-threshold Leakage Power Prediction Model for sub-0.18 µm CMOS," *Proceedings of the IEEE International Symposium on Low Power Electronics and Design*, pp. 19-23, August 2002.
- [51] T. Ghani *et al.*, "100 nm Gate Length High Performance / Low Power CMOS Transistor Structure," *Proceedings of the IEEE International Electron Devices Meeting*, pp. 415-418, December 1999.

- [52] S. Thompson *et al.*, "An Enhanced 130 nm Generation Logic Technology Featuring 60 nm Transistors Optimized for High Performance and Low Power at 0.7-1.4 V," *Proceedings of the IEEE International Electron Devices Meeting*, pp. 257-260, December 2001.
- [53] J. Cai *et al.*, "Supply Voltage Strategies for Minimizing the Power of CMOS Processors," *Proceedings of the IEEE International Symposium on VLSI Technology*, pp. 102-103, June 2002.
- [54] S. Thompson *et al.*, "A 90 nm Logic Technology Featuring 50 nm Strained Silicon Channel Transistors, 7 Layers of Cu Interconnects, Low k ILD, and 1 µm<sup>2</sup> SRAM Cell," *Proceedings of the IEEE International Electron Devices Meeting*, pp. 61-64, December 2002.
- [55] T. Ghani *et al.*, "Scaling Challenges and Device Design Requirements for High Performance Sub-50 nm Gate Length Planar CMOS Transistors," *Proceedings of the IEEE International Symposium on VLSI Technology*, pp. 174-175, June 2000.
- [56] J. D. Plummer and P. B. Griffin, "Material and Process Limits in Silicon VLSI Technology," *Proceedings of the IEEE*, Vol. 89, No. 3, pp. 240-258, March 2001.
- [57] K. M. Cao *et al.*, "BSIM4 Gate Leakage Model Including Source-Drain Partition," *Proceedings of the IEEE International Electron Devices Meeting*, pp. 815-818, December 2000.
- [58] B. P. Linder *et al.*, "Voltage Dependence of Hard Breakdown Growth and the Reliability Implication in Thin Dielectrics," *IEEE Electron Device Letters*, Vol. 23, No. 11, pp. 661-663, November 2002.
- [59] B. P. Linder *et al.*, "Growth and Scaling of Oxide Conduction after Breakdown," *Proceedings of the IEEE International Reliability Physics Symposium*, pp. 402-405, March-April 2003.

- [60] N. R. Mohapatra, M. P. Desai, S. G. Narendra, and V. R. Rao, "The Effect of High-K Gate Dielectrics on Deep Submicrometer CMOS Device and Circuit Performance," *IEEE Transactions on Electron Devices*, Vol. 49, No. 5, pp. 826-831, May 2002.
- [61] H. J. M. Veendrick, "Short-Circuit Dissipation of Static CMOS Circuitry and Its Impact on the Design of Buffer Circuits," *IEEE Journal of Solid-State Circuits*, Vol. SC-19, No. 4, pp. 468-473, August 1984.
- [62] K. Nose and T. Sakurai, "Analysis and Future Trend of Short-Circuit Power," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. 19, No. 9, pp. 1023-1030, September 2000.
- [63] S. Narendra, D. Antoniadis, and V. De, "Impact of Using Adaptive Body Bias to Compensate Die-to-Die V<sub>t</sub> Variation on Within-Die V<sub>t</sub> Variation," *Proceedings of the IEEE International Symposium on Low Power Electronics and Design*, pp. 229-232, August 1999.
- [64] K. A. Bowman, S. G. Duvall, and J. D. Meindl, "Impact of Die-to-Die and Within-Die Parameter Fluctuations on the Maximum Clock Frequency Distribution for Gigascale Integration," *IEEE Journal of Solid-State Circuits*, Vol. 37, No. 2, pp. 183-190, February 2002.
- [65] K. Shimohigashi and K. Seki, "Low-Voltage ULSI Design," *IEEE Journal of Solid-State Circuits*, Vol. 28, No. 4, pp. 408-413, April 1993.
- [66] S-W. Sun and P. G. Y. Tsui, "Limitation of CMOS Supply-Voltage Scaling by MOSFET Threshold-Voltage Variation," *IEEE Journal of Solid-State Circuits*, Vol. 30, No. 8, pp. 947-949, August 1995.
- [67] K. Chen and C. Hu, "Performance and V<sub>dd</sub> Scaling in Deep Submicrometer CMOS," *IEEE Journal of Solid-State Circuits*, Vol. 33, No. 10, pp. 1586-1589, October 1998.

- [68] T. D. Burd and R. W. Brodersen, *Energy Efficient Microprocessor Design*, Kluwer Academic Publishers, 2002.
- [69] K. J. Nowka *et al.*, "A 32-bit PowerPC System-on-a-Chip With Support for Dynamic Voltage Scaling and Dynamic Frequency Scaling," *IEEE Journal of Solid-State Circuits*, Vol. 37, No. 11, pp. 1441-1447, November 2002.
- [70] T. D. Burd, T. A. Pering, A. J. Stratakos, and R. W. Brodersen, "A Dynamic Voltage Scaled Microprocessor System," *IEEE Journal of Solid-State Circuits*, Vol. 35, No. 11, pp. 1571-1580, November 2000.
- [71] T. D. Burd and R. W. Brodersen, "Design Issues for Dynamic Voltage Scaling," *Proceedings of the IEEE International Symposium on Low Power Electronics and Design*, pp. 9-14, July 2000.
- [72] T. Kuroda *et al.*, "A 0.9-V, 150-MHz, 10-mW, 4mm<sup>2</sup>, 2-D Discrete Cosine Transform Core Processor with Variable Threshold-Voltage (VT) Scheme," *IEEE Journal of Solid-State Circuits*, Vol. 31, No. 11, pp. 1770-1779, November 1996.
- [73] M. Miyazaki, G. Ono, and K. Ishibashi, "A Delay Distribution Squeezing Scheme with Speed-Adaptive Threshold-Voltage CMOS (SA-Vt CMOS) for Low Voltage LSIs," *Proceedings of the IEEE International Symposium on Low Power Electronics and Design*, pp. 48-53, July 1998.
- [74] S-F. Huang *et al.*, "Scalability and Biasing Strategy for CMOS with Active Well Bias," *Proceedings of the IEEE International Symposium on VLSI Technology*, pp. 107-108, June 2001.
- [75] A. Keshavarzi *et al.*, "Technology Scaling Behavior of Optimum Reverse Body Bias for Standby Leakage Power Reduction in CMOS IC's," *Proceedings of the IEEE International Symposium on Low Power Electronics and Design*, pp. 252-254, July 1999.

- [76] A. Keshavarzi *et al.*, "Effectiveness of Reverse Body Bias for Leakage Control in Scaled Dual Vt CMOS ICs," *Proceedings of the IEEE International Symposium on Low Power Electronics and Design*, pp. 207-212, August 2001.
- [77] C. Wann *et al.*, "CMOS with Active Well Bias for Low-Power and RF/Analog Applications," *Proceedings of the IEEE International Symposium on VLSI Technology*, pp. 158-159, June 2000.
- [78] S. Narendra *et al.*, "Forward Body Bias for Microprocessors in 130-nm Technology Generation and Beyond," *IEEE Journal of Solid-State Circuits*, Vol. 38, No. 5, pp. 696-701, May 2003.
- [79] M. Miyazaki, G. Ono, and K. Ishibashi, "A 1.2-GIPS/W Microprocessor Using Speed-Adaptive Threshold-Voltage CMOS With Forward Bias," *IEEE Journal of Solid-State Circuits*, Vol. 37, No. 2, pp. 210-217, February 2002.
- [80] J. Tschanz *et al.*, "Adaptive Body Bias for Reducing Impacts of Die-to-Die and Within-Die Parameter Variations on Microprocessor Frequency and Leakage," *IEEE Journal of Solid-State Circuits*, Vol. 37, No. 11, pp. 1396-1402, November 2002.
- [81] K. Nose *et al.*, "V<sub>TH</sub>-Hopping Scheme to Reduce Subthreshold Leakage for Low-Power Processors," *IEEE Journal of Solid-State Circuits*, Vol. 37, No. 3, pp. 413-419, March 2002.
- [82] N. Lindert, T. Sugii, S. Tang, and C. Hu, "Dynamic Threshold Pass-Transistor Logic for Improved Delay at Lower Power Supply Voltages," *IEEE Journal of Solid-State Circuits*, Vol. 34, No. 1, pp. 85-89, January 1999.
- [83] S. Mutoh *et al.*, "A 1-V Multithreshold-Voltage CMOS Digital Signal Processor for Mobile Phone Application," *IEEE Journal of Solid-State Circuits*, Vol. 31, No. 11, pp. 1795-1802, November 1996.

- [84] T. Sakata, K. Itoh, M. Horiguchi, and M. Aoki, "Subthreshold-Current Reduction Circuits for Multi-Gigabit DRAM's," *IEEE Journal of Solid-State Circuits*, Vol. 29, No. 7, pp. 761-769, July 1994.
- [85] S. Mutoh, S. Shigematsu, Y. Gotoh, and S. Konaka, "Design Method of MTCMOS Power Switch for Low-Voltage High-Speed LSIs," *Proceedings of the IEEE Asia and South Pacific Design Automation Conference*, pp. 113-116, January 1999.
- [86] J. T. Kao and A. Chandrakasan, "Dual-Threshold Voltage Techniques for Low-Power Digital Circuits," *IEEE Journal of Solid-State Circuits*, Vol. 35, No. 7, pp. 1009-1018, July 2000.
- [87] M. Eisele, J. Berthold, D. Schmitt-Landsiedel, and R. Mahnkopf, "The Impact of Intra-Die Device Parameter Variations on Path Delays and on the Design for Yield of Low Voltage Digital Circuits," *Proceedings of the IEEE International Symposium on Low Power Electronics and Design*, pp. 237-242, August 1996.
- [88] D. Takashima *et al.*, "Standby/Active Mode Logic for Sub-1-V Operating ULSI Memory," *IEEE Journal of Solid-State Circuits*, Vol. 29, No. 4, pp. 441-447, April 1994.
- [89] L. Su *et al.*, "A High-Performance Sub-0.25 µm CMOS Technology with Multiple Thresholds and Copper Interconnects," *Proceedings of the IEEE International Symposium on VLSI Technology*, pp. 18-19, June 1998.
- [90] T. McPherson *et al.*, "760 MHz G6 S/390 Microprocessor Exploiting Multiple Vt and Copper Interconnects," *Proceedings of the IEEE International Solid-State Circuits Conference*, pp. 96-97, February 2000.
- [91] T. Sakurai and A. R. Newton, "A Simple MOSFET Model for Circuit Analysis," *IEEE Transactions on Electron Devices*, Vol. 38, No. 4, pp. 887-894, April 1991.

- [92] M. M. Khellah and M. I. Elmasry, "Power Minimization of High-Performance Submicron CMOS Circuits Using a Dual-V<sub>dd</sub> Dual-V<sub>th</sub> (DVDV) Approach," *Proceedings of the IEEE International Symposium on Low Power Electronics and Design*, pp. 106-108, June 1999.
- [93] T. Kuroda *et al.*, "Variable Supply-Voltage Scheme for Low-Power High-Speed CMOS Digital Design," *IEEE Journal of Solid-State Circuits*, Vol. 33, No. 3, pp. 454-462, March 1998.
- [94] M. Takahashi *et al.*, "A 60-mW MPEG4 Video Codec Using Clustered Voltage Scaling with Variable Supply-Voltage Scheme," *IEEE Journal of Solid-State Circuits*, Vol. 33, No. 11, pp. 1772-1780, November 1998.
- [95] T. Kuroda and M. Hamada, "Low-Power CMOS Digital Design with Dual Embedded Adaptive Power Supplies," *IEEE Journal of Solid-State Circuits*, Vol. 35, No. 4, pp. 652-655, April 2000.
- [96] M. Hamada *et al.*, "A Top-Down Low Power Design Technique Using Clustered Voltage Scaling with Variable Supply-Voltage Scheme," *Proceedings of the IEEE Custom Integrated Circuits Conference*, pp. 495-498, May 1998.
- [97] J. T. Kao, M. Miyazaki, and A. P. Chandrakasan, "A 175-mV Multiply-Accumulate Unit Using an Adaptive Supply Voltage and Body Bias Architecture," *IEEE Journal of Solid-State Circuits*, Vol. 37, No. 11, pp. 1545-1554, November 2002.
- [98] J. Tschanz, S. Narendra, R. Nair, and V. De, "Effectiveness of Adaptive Supply Voltage and Body Bias for Reducing Impact of Parameter Variations in Low Power and High Performance Microprocessors," *IEEE Journal of Solid-State Circuits*, Vol. 38, No. 5, pp. 826-829, May 2003.

- [99] H. Soeleman, K. Roy, and B. C. Paul, "Robust Subthreshold Logic for Ultra-Low Power Operation," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 9, No. 1, pp. 90-99, February 2001.
- [100] Y. Tsividis, *Operation and Modeling of the MOS Transistor*, The McGraw-Hill Companies, Inc., 1999.
- [101] Y. Panov and M. M. Jovanovic, "Design and Performance Evaluation of Low-Voltage/High-Current DC/DC On-Board Modules," *IEEE Transactions on Power Electronics*, Vol. 16, No. 1, pp. 26-33, January 2001.
- [102] R. W. Erickson and D. Maksimovic, *Fundamentals of Power Electronics*, Kluwer Academic Publishers, 2001.
- [103] T. Furuyama, Y. Watanabe, T. Ohsawa, and S. Watanabe, "A New On-Chip Voltage Converter for Submicrometer High-Density DRAM's," *IEEE Journal of Solid-State Circuits*, Vol. SC-22, No. 3, pp. 437-440, June 1987.
- [104] D. Takashima *et al.*, "Low-Power On-Chip Supply Voltage Conversion Scheme for Ultrahigh-Density DRAM's," *IEEE Journal of Solid-State Circuits*, Vol. 28, No. 4, pp. 504-509, April 1993.
- [105] T. Ooishi *et al.*, "A Mixed-Mode Voltage Down Converter with Impedance Adjustment Circuitry for Low-Voltage High-Frequency Memories," *IEEE Journal of Solid-State Circuits*, Vol. 31, No. 4, pp. 575-585, April 1996.
- [106] T. Endoh, K. Sunaga, H. Sakuraba, and F. Masuoka, "An On-Chip 96.5% Current Efficiency CMOS Linear Regulator Using a Flexible Control Technique of Output Current," *IEEE Journal of Solid-State Circuits*, Vol. 36, No. 1, pp. 34-38, January 2001.
- [107] C-C. Wang and J-C. Wu, "Efficiency Improvement in Charge Pump Circuits," *IEEE Journal of Solid-State Circuits*, Vol. 32, No. 6, pp. 852-860, June 1997.

- [108] B. Arntzen and D. Maksimovic, "Switched-Capacitor DC/DC Converters with Resonant Gate Drive," *IEEE Transactions on Power Electronics*, Vol. 13, No. 5, pp. 892-902, September 1998.
- [109] D. Maksimovic and S. Dhar, "Switched-Capacitor DC-DC Converters for Low-Power On-Chip Applications," *Proceedings of the IEEE Power Electronics Specialists Conference*, pp. 54-59, June 1999.
- [110] R. Blanchard and P. E. Thibodeau, "The Design of a High Efficiency, Low Voltage Power Supply Using MOSFET Synchronous Rectification and Current Mode Control," *Proceedings of the IEEE Power Electronics Specialists Conference*, pp. 355-361, June 1985.
- [111] R. S. Kagan and M. Chi, "Improving Power Supply Efficiency with MOSFET Synchronous Rectifiers," *Proceedings of the International Solid-State Power Conversion Conference*, pp. D4.1-D4.9, July 1982.
- [112] S. K. Reynolds, "A DC-DC Converter for Short-Channel CMOS Technologies," *IEEE Journal of Solid-State Circuits*, Vol. 32, No. 1, pp. 111-113, January 1997.
- [113] A. Stratakos, S. R. Sanders, and R. W. Brodersen, "A Low-Voltage CMOS DC-DC Converter for a Portable Battery-Operated System," *Proceedings of the IEEE Power Electronics Specialists Conference*, pp. 619-626, April 1994.
- [114] B. Arbetter and D. Maksimovic, "DC-DC Converter with Fast Transient Response and High Efficiency for Low-Voltage Microprocessor Loads," *Proceedings of the IEEE Applied Power Electronics Conference*, pp. 156-162, April 1998.
- [115] B. Arbetter and D. Maksimovic, "Control Method for Low-Voltage DC Power Supply in Battery-Powered Systems with Power Management," *Proceedings of the IEEE Power Electronics Specialists Conference*, pp. 1198-1204, April 1997.

- [116] S. H. Weinberg, "A Novel Lossless Resonant MOSFET Driver," *Proceedings of the IEEE Power Electronics Specialists Conference*, pp. 1003-1010, 1992.
- [117] D. Maksimovic, "A MOS Gate Drive with Resonant Transitions," *Proceedings of the IEEE Power Electronics Specialists Conference*, pp. 527-532, April 1991.
- [118] P. E. Gronowski *et al.*, "High-Performance Microprocessor Design," *IEEE Journal of Solid-State Circuits*, Vol. 33, No. 5, pp. 676-686, May 1998.
- [119] D. Gardner, A. M. Crawford, and S. Wang, "High Frequency (GHz) and Low Resistance Integrated Inductors Using Magnetic Materials," *Proceedings of the IEEE International Interconnect Technology Conference*, pp. 101-103, June 2001.
- [120] B. S. Cherkauer and E. G. Friedman, "A Unified Design Methodology for CMOS Tapered Buffers," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 3, No. 1, pp. 99-111, March 1995.
- [121] D. Gardner, Intel Corporation, Components Research, Santa Clara, California, personal communication, 2001.
- [122] R. M. Secareanu and E. G. Friedman, "A Universal CMOS Voltage Interface Circuit," *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 1242-1245, May 1999.
- [123] J. S. Caravella and J. H. Quigley, "Three Volt to Five Volt CMOS Interface Circuit with Device Leakage Limited DC Power Dissipation," *Proceedings of the IEEE ASIC Conference*, pp. 448-451, September 1993.
- [124] R. Golshan and B. Haroun, "A Novel Reduced Swing CMOS Bus Interface Circuit for High Speed Low Power VLSI Systems," *Proceedings of the IEEE International Symposium on Circuits and Systems*, Vol. 4, pp. 351-354, June 1994.

- [125] H. Zhang, V. George, and J. M. Rabaey, "Low-Swing On-Chip Signaling Techniques: Effectiveness and Robustness," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 8, No. 3, pp. 264-272, June 2000.
- [126] Y. Nakagome *et al.*, "Sub 1-V Swing Internal Bus Architecture for Future Low-Power ULSI's," *IEEE Journal of Solid-State Circuits*, Vol. 28, No. 4, pp. 414-419, April 1993.
- [127] K. J. Nowka and T. Galambos, "Circuit Design Techniques for a Gigahertz Integer Microprocessor," *Proceedings of the IEEE International Conference on Computer Design*, pp. 11-16, October 1998.
- [128] A. Alvandpour, P. Larsson-Edefors, and C. Svensson, "A Leakage Tolerant Multi-Phase Keeper for Wide Domino Circuits," *Proceedings of the IEEE International Conference on Electronics, Circuits and Systems*, pp. 209-212, September 1999.
- [129] A. Alvandpour, R. K. Krishnamurthy, K. Soumyanath, and S. Y. Borkar, "A Sub-130-nm Conditional Keeper Technique," *IEEE Journal of Solid-State Circuits*, Vol. 37, No. 5, pp. 633-638, May 2002.
- [130] M. W. Allam, M. H. Anis, and M. I. Elmasry, "High-Speed Dynamic Logic Styles for Scaled-Down CMOS and MTCMOS Technologies," *Proceedings of the IEEE International Symposium on Low Power Electronics and Design*, pp. 155-160, July 2000.
- [131] A. Keshavarzi, S. Narendra, B. Bloechel, S. Borkar, and V. De, "Forward Body Bias for Microprocessors in 130nm Technology Generation and Beyond," *Proceedings of the IEEE International Symposium on VLSI Circuits*, pp. 312-315, June 2002.

- [132] J. Tschanz, S. Narendra, R. Nair, and V. De, "Effectiveness of Adaptive Supply Voltage and Body Bias for Reducing the Impact of Parameter Variations in Low Power and High Performance Microprocessors," *Proceedings of the IEEE International Symposium on VLSI Circuits*, pp. 310-311, June 2002.
- [133] I. S. Hwang and A. L. Fisher, "Ultrafast Compact 32-bit CMOS Adders in Multiple-Output Domino Logic," *IEEE Journal of Solid-State Circuits*, Vol. 24, No. 2, pp. 358-369, April 1989.
- [134] P. Srivastava, A. Pua, and L. Welch, "Issues in the Design of Domino Logic Circuits," *Proceedings of the IEEE Great Lakes Symposium on VLSI*, pp. 108-112, February 1998.
- [135] S. Rusu and G. Singer, "The First IA-64 Microprocessor," *IEEE Journal of Solid-State Circuits*, Vol. 35, No. 11, pp. 1539-1544, November 2000.
- [136] J. Kao, "Dual Threshold Voltage Domino Logic," *Proceedings of the European Solid-State Circuits Conference*, pp. 118-121, September 1999.
- [137] J. Silberman *et al.*, "A 1.0-GHz Single-Issue 64-Bit PowerPC Integer Processor," *IEEE Journal of Solid-State Circuits*, Vol. 33, No. 11, pp. 1600-1608, November 1998.
- [138] G. Balamurugan and N. R. Shanbhag, "Energy-efficient Dynamic Circuit Design in the Presence of Crosstalk Noise," *Proceedings of the IEEE International Symposium on Low Power Electronics and Design*, pp. 24-29, August 1999.
- [139] A. Rjoub, O. Koufopavlou, and S. Nikolaidis, "Low-Power/Low Swing Domino CMOS Logic," *Proceedings of the IEEE International Symposium on Circuits and Systems*, Vol. 2, pp. 13-16, May 1998.

- [140] S. Shieh, J. Wang, and Y. Yeh, "A Contention-Alleviated Static Keeper For High-Performance Domino Logic Circuits," *Proceedings of the IEEE International Conference on Electronics, Circuits, and Systems*, Vol. 2, pp. 707-710, September 2001.
- [141] S. Jung, S. Yoo, K. Kim, and S. Kang, "Skew-Tolerant High-Speed (STHS) Domino Logic," *Proceedings of the IEEE International Symposium on Circuits and Systems*, Vol. 4, pp. 154-157, May 2001.
- [142] S. Heo and K. Asanovic, "Leakage-Biased Domino Circuits for Dynamic Fine-Grain Leakage Reduction," *Proceedings of the IEEE International Symposium on VLSI Circuits*, pp. 316-319, June 2002.
- [143] Y. Ye, S. Borkar, and V. De, "A New Technique for Standby Leakage Reduction in High-Performance Circuits," *Proceedings of the IEEE International Symposium on VLSI Circuits*, pp. 40-41, June 1998.
- [144] C. C. Wang, P. M. Lee, and K. L. Chen, "An SRAM Design Using Dual Threshold Voltage Transistors and Low-Power Quenchers," *IEEE Journal of Solid-State Circuits*, Vol. 38, No. 10, pp. 1712-1720, October 2003.
- [145] R. K. Krishnamurthy, A. Alvandpour, V. De, and S. Borkar, "High-performance and Low-power Challenges for Sub-70nm Microprocessor Circuits," *Proceedings of the IEEE Custom Integrated Circuits Conference*, pp. 125-128, May 2002.
- [146] S. Narendra *et al.*, "Scaling of Stack Effect and Its Application for Leakage Reduction," *Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design*, pp. 195-200, August 2001.
- [147] J-T. Park and J-P. Colinge, "Multiple-Gate SOI MOSFETs: Device Design Guidelines," *IEEE Transactions on Electron Devices*, Vol. 49, No. 12, pp. 2222-2229, December 2002.

- [148] D. W. Greve, *Field Effect Devices and Applications*, New Jersey: Prentice-Hall, Inc., 1998.
- [149] D. K. Schroder, "Low Power Silicon Devices," in *The Encyclopedia of Materials: Science and Technology* (K. H. J. Buschow, R. W. Cahn, M. C. Flemings, B. Ilschner, E. J. Kramer, and S. Mahajan, eds.), Elsevier, 2001.
- [150] C. T. Chuang, P. F. Lu, and C. J. Anderson, "SOI for Digital CMOS VLSI: Design Considerations and Advances," *Proceedings of the IEEE*, Vol. 86, No. 4, pp. 689-720, April 1998.
- [151] M. Y. Hammad and D. K. Schroder, "Analytical Modeling of the Partially-Depleted SOI MOSFET," *IEEE Transactions on Electron Devices*, Vol. 48, No. 2, pp. 252-258, February 2001.
- [152] Y. Liu *et al.*, "Systematic Electrical Characteristics of Ideal Rectangular Cross Section Si-Fin Channel Double-Gate MOSFETs Fabricated by a Wet Process," *IEEE Transactions on Nanotechnology*, Vol. 2, No. 4, pp. 198-204, December 2003.
- [153] J. Xu, "Nanotube Electronics: Non-CMOS Routes," *Proceedings of the IEEE*, Vol. 91, No. 11, pp. 1819-1829, November 2003.
- [154] P. Avouris *et al.*, "Carbon Nanotube Electronics," *Proceedings of the IEEE*, Vol. 91, No. 11, pp. 1772-1784, November 2003.
- [155] G. Magklis *et al.*, "Dynamic Frequency and Voltage Scaling for a Multiple-Clock Domain Microprocessor," *IEEE Micro*, Vol. 23, No. 6, pp. 62-68, November-December 2003.
- [156] G. Semeraro *et al.*, "Energy-Efficient Processor Design Using Multiple Clock Domains with Dynamic Voltage and Frequency Scaling," *Proceedings of the IEEE International Symposium on High-Performance Computer Architecture*, pp. 29-40, February 2002.

- [157] D. Lee, D. Blaauw, and D. Sylvester, "Gate Oxide Leakage Current Analysis and Reduction for VLSI Circuits," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 12, No. 2, pp. 155-166, February 2004.
- [158] R. Kumar, "Interconnect and Noise Immunity Design for the Pentium 4 Processor," *Intel Technology Journal*, Q1, pp. 1-12, 2001.
- [159] K. L. Shepard and V. Narayanan, "Conquering Noise in Deep-Submicron Digital ICs," *IEEE Design and Test of Computers*, Vol. 15, No. 1, pp. 51-62, January-March 1998.
- [160] K. T. Tang and E. G. Friedman, "Simultaneous Switching Noise in On-Chip CMOS Power Distribution Networks," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 10, No. 4, pp. 487-493, August 2002.

# **Appendix A: Patents**

- 1. V. Kursun and E. G. Friedman, "Domino Logic with Variable Threshold Voltage Keeper," U.S. Patent Pending.
- 2. V. Kursun and E. G. Friedman, "Dual Threshold Voltage and Low Swing Domino Logic Circuits," U.S. Patent Pending.
- 3. P. Hazucha, G. Schrom, T. Karnik, V. Kursun, and S. G. Narendra, "DC-DC Converters Utilizing Transformers with Segmented Windings," U.S. Patent Pending.
- 4. D. S. Gardner, V. Kursun, and S. Narendra, "Fully Integrated DC to DC Converter Utilizing On-Chip Inductors with High Frequency Magnetic Materials," U.S. Patent Pending.
- 5. V. Kursun, S. Narendra, G. Schrom, P. Hazucha, T. Karnik, and V. De, "High Voltage CMOS Driver Circuits," U.S. Patent Pending.

# **Appendix B: Publications**

### **Authored Book**

1. V. Kursun and E. G. Friedman, *Multiple Supply and Threshold Voltage CMOS Circuits*, New York: John Wiley & Sons, Inc. (in preparation).

### **Journal Publications**

- 1. V. Kursun, S. G. Narendra, V. K. De, and E. G. Friedman, "Low Voltage Swing Monolithic DC-DC Conversion," *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, Vol. 51, No. 5, May 2004.
- 2. V. Kursun and E. G. Friedman, "Sleep Switch Dual Threshold Voltage Domino Logic with Reduced Standby Leakage Current," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 12, No. 5, May 2004.
- 3. D. H. Albonesi, R. Balasubramonian, S. G. Dropsho, S. Dwarkadas, E. G. Friedman, M. C. Huang, V. Kursun, G. Magklis, M. L. Scott, G. Semeraro, P. Bose, A. Buyuktosunoglu, P. W. Cook, and S. E. Schuster, "Dynamically Tuning Processor Resources with Adaptive Processing," *IEEE Computer*, Vol. 36, No. 12, pp. 49-58, December 2003.
- 4. V. Kursun and E. G. Friedman, "Domino Logic with Variable Threshold Voltage Keeper," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 11, No. 6, pp. 1080-1093, December 2003.

- 5. V. Kursun, S. G. Narendra, V. K. De, and E. G. Friedman, "Analysis of Buck Converters for On-Chip Integration with a Dual Supply Voltage Microprocessor," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 11, No. 3, pp. 514-522, June 2003.

### **Journal Papers in Review**

- 6. V. Kursun and E. G. Friedman, "Tradeoffs in Dual Threshold Voltage Domino Logic," *IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications* (in review).
- 7. V. Kursun, S. G. Narendra, V. K. De, and E. G. Friedman, "Cascode Monolithic DC-DC Converter for Reliable Operation at High Input Voltages," *International Journal of Analog Integrated Circuits and Signal Processing* (in review).

### **Conference Publications**

- 8. V. Kursun and E. G. Friedman, "Energy Efficient Dual Threshold Voltage Dynamic Circuits Employing Sleep Switches to Minimize Subthreshold Leakage," *Proceedings of the IEEE International Symposium on Circuits and Systems*, May 2004.
- 9. V. Kursun and E. G. Friedman, "Forward Body Biased Keeper for Enhanced Noise Immunity in Domino Logic Circuits," *Proceedings of the IEEE International Symposium on Circuits and Systems*, May 2004.
- 10. V. Kursun and E. G. Friedman, "Node Voltage Dependent Subthreshold Leakage Current Characteristics of Dynamic Circuits," *Proceedings of the IEEE/ACM International Symposium on Quality Electronic Design*, pp. 104-109, March 2004.

- 11. V. Kursun, S. G. Narendra, V. K. De, and E. G. Friedman, "High Input Voltage Step-Down DC-DC Converters for Integration in a Low Voltage CMOS Process," *Proceedings of the IEEE/ACM International Symposium on Quality Electronic Design*, pp. 517-521, March 2004.
- 12. V. Kursun and E. G. Friedman, "Speed and Noise Immunity Enhanced Low Power Dynamic Circuits," *Technical Digest of the Semiconductor Research Corporation (SRC) TECHCON*, August 2003.
- 13. V. Kursun, S. G. Narendra, V. K. De, and E. G. Friedman, "Monolithic DC-DC Converter Analysis and MOSFET Gate Voltage Optimization," *Proceedings of the IEEE/ACM International Symposium on Quality Electronic Design*, pp. 279-284, March 2003.
- 14. S. Dropsho, V. Kursun, D. H. Albonesi, S. Dwarkadas, and E. G. Friedman, "Managing Static Leakage Energy in Microprocessor Functional Units," *Proceedings of the IEEE/ACM International Symposium on Microarchitecture*, pp. 321-332, November 2002.
- 15. V. Kursun and E. G. Friedman, "Variable Threshold Voltage Keeper for Contention Reduction in Dynamic Circuits," *Proceedings of the IEEE International ASIC/SOC Conference*, pp. 314-318, September 2002.
- 16. V. Kursun, S. G. Narendra, V. K. De, and E. G. Friedman, "Efficiency Analysis of a High Frequency Buck Converter for On-Chip Integration with a Dual-V<sub>DD</sub> Microprocessor," *Proceedings of the European Solid-State Circuits Conference*, pp. 743-746, September 2002.
- 17. V. Kursun and E. G. Friedman, "Domino Logic with Dynamic Body Biased Keeper," *Proceedings of the European Solid-State Circuits Conference*, pp. 675-678, September 2002.

- 18. V. Kursun, R. M. Secareanu, and E. G. Friedman, "CMOS Voltage Interface Circuit for Low Power Systems," *Proceedings of the IEEE International Symposium on Circuits and Systems*, Vol. 3, pp. 667-670, May 2002.
- 19. V. Kursun and E. G. Friedman, "Low Swing Dual Threshold Voltage Domino Logic," *Proceedings of the ACM/SIGDA Great Lakes Symposium on VLSI*, pp. 47-52, April 2002.
- 20. V. Kursun, R. M. Secareanu, and E. G. Friedman, "Low Power CMOS Bi-Directional Voltage Converter," *Proceedings of the IEEE EDS/CAS Activities in Western New York Conference*, pp. 6-7, November 2001.

### **Conference Paper in Review**

21. V. Kursun, G. Schrom, S. G. Narendra, V. K. De, and E. G. Friedman, "Cascode Buffer for Monolithic Voltage Conversion Operating at High Input Supply Voltages," *Proceedings of the IEEE International SOC Conference*, September 2004 (in review).