

# **Power Delivery in High Current 3-D Systems**

by

Kan Xu

Submitted in Partial Fulfillment

of the

Requirements for the Degree  
Doctor of Philosophy

Supervised by  
Professor Eby G. Friedman

Department of Electrical and Computer Engineering  
Arts, Sciences and Engineering  
Edmund A. Hajim School of Engineering and Applied Sciences

University of Rochester  
Rochester, New York  
2020

# Dedication

This work is dedicated to my parents, Jianping Guo and Huisheng Xu.

# Table of Contents

|     |     |     |
|-----|-----|-----|
|  | Biographical Sketch | ix |
|  | Acknowledgments | xii |
|  | Abstract | xv |
|  | Contributors and Funding Sources | xvii |
|  | List of Tables | xviii |
|  | List of Figures | xx |
| <b>1</b> | <b>Introduction</b> | <b>1</b> |
| 1.1 | Industrial revolutions and computer systems | 3 |
| 1.2 | Power delivery network within a computer system | 7 |
| 1.3 | Outline | 13 |
| <b>2</b> | <b>Power Delivery Networks in High Performance 2-D and 3-D Systems</b> | <b>21</b> |
| 2.1 | Power delivery network for high performance processors | 24 |
| 2.1.1 | On-chip power noise | 24 |
| 2.1.2 | Breakdown of on-chip multilayer power delivery networks | 27 |
| 2.1.3 | Modeling and efficient simulation of power grid | 34 |
| 2.2 | Power delivery networks for 2.5-D and 3-D systems | 39 |
| 2.2.1 | 2.5-D power delivery network | 42 |
| 2.2.2 | 3-D power delivery network | 44 |
| 2.2.3 | Power and ground TSV | 47 |
| 2.3 | Summary | 53 |
| <b>3</b> | <b>Challenges in High Current 2-D and 3-D systems</b> | <b>56</b> |
| 3.1 | High current challenges at the PCB and package levels | 61 |
| 3.1.1 | “Last inch” power loss | 63 |
| 3.1.2 | Electromigration challenges | 66 |
| 3.1.3 | Advanced cooling systems | 70 |
| 3.2 | High current challenges of on-chip 2-D and 3-D power networks | 72 |
| 3.3 | Summary | 74 |
| <b>4</b> | <b>Power Noise in Advanced FinFET Technology Nodes</b> | <b>76</b> |
| 4.1 | Standard cell-based power network | 79 |
| 4.1.1 | Hierarchy of power grids | 81 |
| 4.1.2 | Standard cell based power rails | 82 |
| 4.2 | Circuit model | 83 |
| 4.2.1 | Load model | 84 |
| 4.2.2 | Rail model | 86 |
| 4.2.3 | Striping of power rail | 87 |
| 4.3 | Characterization of power noise | 89 |
| 4.3.1 | Power noise components | 90 |
| 4.3.2 | Different technology nodes | 94 |
| 4.4 | Power noise suppression | 96 |
| 4.4.1 | Additional global power metal layers | 96 |
| 4.4.2 | Stripes technique | 98 |
| 4.4.3 | Graphene interconnects | 99 |
| 4.4.4 | Scaling of local power rails | 101 |
| 4.4.5 | Metalization schemes for advanced technology nodes | 103 |
| 4.5 | Summary | 104 |
| <b>5</b> | <b>EMI Challenge in 2.5-D System with High Voltage VRs</b> | <b>106</b> |
| 5.1 | LLC resonant converter | 109 |
| 5.1.1 | Sinusoidal current generation | 109 |
| 5.1.2 | Operation of the LLC resonant converter | 112 |
| 5.1.3 | Performance evaluation | 113 |
| 5.2 | Performance degradation due to high step down ratio | 116 |
| 5.3 | LLC resonant converter with distributed topology | 119 |
| 5.4 | Near field EMI in SiP environment | 124 |
| 5.4.1 | EMI background | 125 |
| 5.4.2 | EMI evaluation setup in SiP environment | 126 |
| 5.4.3 | Simulation results and analysis | 129 |
| 5.5 | Summary | 134 |
| <b>6</b> | <b>Power Noise and EMI in VR Top and Bottom Placements</b> | <b>135</b> |
| 6.1 | Top and bottom placement | 138 |
| 6.2 | Package design specifications | 141 |
| 6.3 | EMI and power noise evaluation | 144 |
| 6.4 | Package layer comparison | 152 |
| 6.5 | Summary | 155 |
| <b>7</b> | <b>Insertion Loss Due to Placement of Multiple Waveguide Crossings</b> | <b>157</b> |
| 7.1 | Previous Work | 161 |
| 7.2 | Placement of multiple waveguide crossings | 163 |
| 7.2.1 | Example one | 164 |
| 7.2.2 | Example two | 166 |
| 7.2.3 | Example three | 169 |
| 7.3 | Discussion | 170 |
| 7.4 | Case Study of an 8 x 8 GWOR ONoC Router | 173 |
| 7.5 | Summary | 176 |
| <b>8</b> | <b>Design Guidelines for RDL-Based Power Networks</b> | <b>177</b> |
| 8.1 | Background and previous work | 181 |
| 8.2 | RDL with different 3-D manufacturing methods | 184 |
| 8.2.1 | RDL within different TSV fabrication processes | 184 |
| 8.2.2 | RDL with different 3-D stacking topologies | 192 |
| 8.3 | Grid-Based RDL in 3-D ICs | 199 |
| 8.3.1 | Grid-based P/G RDL model | 200 |
| 8.3.2 | Comparison between grid-based RDL and P2P RDL | 204 |
| 8.4 | Summary | 211 |
| <b>9</b> | <b>Parasitic Impedance Aware Power Delivery for Voltage Stacked Systems</b> | <b>213</b> |
| 9.1 | Background and previous work | 215 |
| 9.1.1 | Challenges of load imbalances in voltage stacked systems | 216 |
| 9.1.2 | Existing work on mitigating load imbalances | 223 |
| 9.2 | Performance degradation due to parasitic impedances | 225 |
| 9.3 | Power delivery network of voltage stacked differential power processing systems | 229 |
| 9.3.1 | Power delivery network of voltage stacked systems | 230 |
| 9.3.2 | Power delivery network for DPP systems | 235 |
| 9.3.3 | Resonant converter-based stack-to-bus topology | 240 |
| 9.4 | Tile-based power delivery network for voltage stacked systems | 243 |
| 9.5 | Summary | 250 |
| <b>10</b> | <b>Conclusions</b> | <b>251</b> |
| <b>11</b> | <b>Future Work</b> | <b>256</b> |
| 11.1 | Comparison between stack-to-bus and stack-to-stack topologies for voltage stacked systems | 257 |
| 11.2 | Combination of voltage stacking within 3-D ICs | 262 |
|  | <b>Bibliography</b> | <b>267</b> |

# Biographical Sketch

Kan Xu was born in Anyang, Henan Province, China. He received the Bachelor of Science degree in electrical engineering from North China University of Water Resources and Electric Power, Zhengzhou, China, in 2012, and the Master of Science degree in electrical and computer engineering from the University of Rochester, Rochester, New York, in 2015. He interned with the Power Integrity Team at Google Inc., Mountain View, California, in 2019. He is currently completing the Ph.D. degree in electrical engineering at the University of Rochester, Rochester, New York, under the supervision of Prof. Eby G. Friedman. His current research interests include on-chip and package level power delivery networks for high current HPC systems, voltage stacking for high power VLSI systems, 3-D integration, and optical waveguides.

The following publications are a result of work conducted during his doctoral study.

## **Journal papers**

1. K. Xu, B. Vaisband, G. Sizikov, X. Li and E. G. Friedman, “EMI Suppression With Distributed LLC Resonant Converter for High-Voltage VR-on-Package,” *IEEE Transactions on Components, Packaging and Manufacturing Technology*, pp. 263 - 271, February 2020.

2. R. Bairamkulov, K. Xu, M. Popovich, J. S. Ochoa, V. Srinivas and E. G. Friedman, “Power Delivery Exploration Methodology based on Constrained Optimization,” *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. 39, No. 9, September 2020 (in press).
3. K. Xu, B. Vaisband, G. Sizikov, X. Li and E. G. Friedman, “Power Noise and Near-Field EMI of High-Current System-in-Package With VR Top and Bottom Placements,” *IEEE Transactions on Components, Packaging and Manufacturing Technology*, Vol. 9, No. 4, pp. 712 - 718, April 2019.
4. K. Xu, R. Patel, P. Raghavan and E. G. Friedman, “Exploratory Design of On-Chip Power Delivery for 14, 10, and 7 nm and Beyond FinFET ICs,” *Integration, the VLSI Journal*, Vol. 61, pp. 11 - 19, March 2018.
5. K. Xu and E. G. Friedman, “Insertion Loss Due to Placement of Multiple Waveguide Crossings,” (in submission).
6. K. Xu and E. G. Friedman, “Design Guidelines for RDL-Based Power Networks,” (in submission).
7. K. Xu and E. G. Friedman, “Parasitic Impedance of Power Delivery Networks within Voltage Stacked Systems,” (in submission).

## **Conference papers**

1. K. Xu, M. Popovich, G. Sizikov and E. G. Friedman, “Distributed Port Assignment for Extraction of Power Delivery Networks,” *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 1 - 5, October 2020.
2. K. Xu and E. G. Friedman, “Challenges in High Current On-Chip Voltage Stacked System,” *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 1 - 5, October 2020.
3. R. Bairamkulov, K. Xu, M. Popovich, J. S. Ochoa, V. Srinivas and E. G. Friedman, “Versatile Framework for Power Delivery Exploration,” *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 1 - 5, May 2018.
4. K. Xu, B. Vaisband, G. Sizikov, X. Li and E. G. Friedman, “Distributed Sinusoidal Resonant Converter with High Step-Down Ratio,” *Proceedings of the IEEE International Conference on Electrical Performance of Electronic Packaging and Systems*, pp. 1 - 3, October 2017.
5. R. Patel, K. Xu, E. G. Friedman and P. Raghavan, “Exploratory Power Noise Models of Standard Cell 14, 10, and 7 nm FinFET ICs,” *Proceedings of the ACM Great Lakes Symposium on VLSI*, pp. 233 - 238, May 2016.

# Acknowledgments

It has been an amazing journey for me, where a number of people have played significant roles in the completion of this dissertation. I thank everyone who has helped and supported me through my Ph.D. experience.

First and foremost, I would like to express my sincere gratitude to Professor Eby G. Friedman, my advisor, mentor, and guide through this challenging journey. Thank you for your technical supervision and your patient, individualized guidance, without which I would have struggled far more in conducting research and completing my Ph.D. Through the countless short but intelligent talks you gave, you taught me passion, stoicism, critical thinking, and how to approach problems, which has not only helped my research but will also benefit me for the rest of my life. I have learned so much from you that I lack the words to properly express my appreciation.

I first came to the University of Rochester as a master's student, not knowing what my plans were afterwards. But I am the luckiest; you generously offered me the opportunity of this journey with your guidance, which, as it turns out, has shaped who I am, what I value, and how I see this world. For that, I will forever appreciate what you have provided.

I would like to thank the members of my committee, Professor Engin Ipek, Professor Selcuk Kose, and Professor Yuhao Zhu, for your valuable feedback and suggestions throughout my Ph.D. research. I really appreciate your time and effort on my dissertation. I would also like to thank Professor Jianhui Zhong for serving as the chairperson of my defense committee.

I would like to express my gratitude to my external committee member, Dr. Mikhail Popovich, for providing the precious internship opportunity and supporting me during my Ph.D. work. Thank you for the fruitful conversations, and for guiding me through a more than wonderful internship experience. I can never emphasize enough how much my internship experience helped me in the transition to my next journey.

I thank the Department of Electrical and Computer Engineering at the University of Rochester for the extensive help and support I received from the administrative and technical staff. Special thanks go to the Graduate Administrator, Michele Foster, for providing valuable suggestions that helped me get through my first semester in Rochester.

Thank you to my dear members of the High Performance VLSI/IC Design and Analysis Laboratory: Inna, Ravi, Alex, Boris, Avi, Gleb, Rassul, Abdo, Tahereh, Nurzhan, and Ana, for the inspiration, guidance, support, comfort, and kindness I received from all of you. The enriching and enjoyable time we spent together will always be cherished. In addition, I thank RuthAnn Williams for the joy and laughter she constantly brings to our lab. Your interesting stories as well as refreshing perspectives always ease our daily routine of research.

I would like to thank my friends for always standing by my side throughout this journey. Whether by regularly checking on me, joining me on weekend getaways, backing me up during a tough moment, or making a heartwarming phone call from the other side of the Pacific, you have supported me more than you think. It is my good fortune to always have all of you in my life. An incomplete list: Bo Shi, BWB, CC, Fan Zi, Feng, Gen Duo, Jia Yi, Ju Xiong, and Piao Ge. Special thanks to my girlfriend VK for being my No. 1 fan girl and one of my biggest supporters throughout this journey. You've taught me so much about myself and made me a better man.

Last but not least, I would like to thank my parents for your unconditional love, which is the foundation for everything great that has ever happened to me. You will always be the sweet, sweet harbor where I can find peace in a storm, and courage to continue my voyage. For that, I will forever be grateful.

# Abstract

Although CMOS scaling has slowed, the demand for greater performance and heterogeneous integration has yet to end. Greater performance generally leads to higher current demand in high performance computing (HPC) systems. Three-dimensional (3-D) integrated circuits are a natural platform for heterogeneous integration. A 3-D HPC system, however, suffers from challenging design issues in the power delivery system due to high current demand and vertical integration.

The dissertation starts by addressing two primary challenges, power noise and electromigration, within high current on-chip power systems. An exploratory model of an on-chip power grid and several on-chip metalization schemes are proposed to mitigate power noise. It is observed that the effectiveness of different metalization schemes in suppressing power noise varies significantly among different advanced technology nodes. To further address the issue of electromigration, voltage stacking is exploited. Parasitic impedance-aware load balancing circuits are proposed to manage load variations across different layers.

Moreover, two critical parameters at the package and board levels in 2.5-D power systems, electromagnetic interference (EMI) and “last inch” power loss, are discussed. A novel resonant converter with a distributed topology is proposed to mitigate EMI. Two VR-on-package topologies, VR top and VR bottom placement, are described. The impedance characteristics, EMI, power noise, and power loss of these two VR-on-package topologies are compared.

To fully exploit the potential of 3-D HPC systems, current paths within 3-D integrated power systems are explored. A 3-D redistribution layer (RDL), affecting both the vertical and 2-D current paths, is introduced. The effects of through silicon via (TSV) manufacturing processes and stacking topologies on RDLs are presented. A novel grid-based RDL topology is also proposed to support high current 3-D power systems while exhibiting low power noise. The advantages of a grid-based RDL for a nonuniform TSV distribution are also discussed.

3-D HPC systems are an excellent candidate to achieve high performance as well as heterogeneity. Vertical current path-aware design methodologies are critical to support reliable, power efficient, and high current 3-D power systems. This dissertation provides insight and solutions regarding the challenges of high current 3-D power systems.

# Contributors and Funding Sources

This work was supervised by a dissertation committee consisting of Professor Eby G. Friedman (advisor), Professor Engin Ipek, and Professor Selcuk Kose of the Department of Electrical and Computer Engineering, Professor Yuhao Zhu of the Department of Computer Science, and Doctor Mikhail Popovich of Google Inc. All of the work described in this dissertation was completed independently by the student.

This research is supported in part by the National Science Foundation under Grant No. CCF-1716091, Singapore Ministry of Education under Grant No. MOE2019-T2-2-075, and by grants from Cisco Systems, Qualcomm, and Google.

# List of Tables

|     |     |     |
|-----|-----|-----|
| 3.1 | Power density of HPC systems | 71 |
| 5.1 | EMI characteristics of distributed LLC resonant converter with different number of branches | 132 |
| 5.2 | Comparison of single branch LLC resonant converter and distributed LLC resonant converter | 133 |
| 6.1 | Design specifications of packages supporting VR top and bottom placement topologies | 143 |
| 6.2 | IR drop and power loss of package in VR top and bottom placement topologies. | 149 |
| 6.3 | $L \ di/dt$ comparison between the VR top and bottom topologies. | 152 |
| 7.1 | Interconnect loss of different components in a silicon photonic system [202] | 160 |
| 7.2 | Comparison of 8 x 8 GWOR router. | 175 |
| 8.1 | Comparison of different TSV fabrication methods. | 193 |
| 8.2 | Comparison of face-to-back and back-to-face 3-D stacking topologies. | 193 |
| 8.3 | 2-D power grid specifications [33]. | 201 |
| 8.4 | P/G TSV specifications [227]. | 202 |
| 8.5 | P/G RDL specifications [227]. | 204 |
| 9.1 | Specifications of the load imbalance analysis process. | 218 |
| 9.2 | Specifications of the on-chip power network [33]. | 233 |
| 9.3 | Specification of a resonant converter-based DPP system with a stack-to-bus topology. | 241 |
| 9.4 | Simulation results of a stack-to-bus DPP system, which utilizes the tile-based power delivery system with ten layer power planes. | 249 |

# List of Figures

|     |     |     |
|-----|-----|-----|
| 1.1 | Power plants generate electricity which is delivered to customers over transmission and distribution power lines. | 4 |
| 1.2 | Evolution of computer systems from the first analog computer (Antikythera mechanism) to large scale data centers [4, 8, 12]. | 5 |
| 1.3 | Power delivery network within a computer system. a) Cross sectional view, and b) top view of hierarchical power delivery network from VR to on-chip load [20, 21] | 8 |
| 1.4 | A multi-layer interdigitated mesh structured on-chip power grid. The power and ground lines are, respectively, in orange and blue colors. | 11 |
| 1.5 | Overview of 3-D power delivery network. a) Panoramic view of power/ground TSVs and 2-D power distribution cell within a 3-D IC, and b) related circuit model [25]. | 12 |
| 2.1 | Hierarchy of a power delivery network from the PCB to on-chip within a high performance computing system. | 21 |
| 2.2 | Circuit model of a power delivery network from the PCB to on-chip. | 25 |
| 2.3 | 13 layer metalization stack in Intel 14 nm process [41]. | 28 |
| 2.4 | Global power grid with interdigitated power and ground metal line [43]. | 30 |
| 2.5 | Standard cell-based local power and ground rails [33]. | 31 |
| 2.6 | Structure of an on-chip multilayer power delivery network, including the hierarchy of the global power grid, local power rails, and via stacks: a) planar view, and b) cross-sectional view. | 33 |
| 2.7 | Two-dimensional distributed model of a two-layer mesh structured power grid. | 36 |
| 2.8 | Cross-sectional view of Xilinx 2.5-D FPGA [80]. | 42 |
| 2.9 | Sectional view of VR-on-package with two PoL converters placed next to an IC. The solid and dashed arrows depict the current path, respectively, between the ball grid array (BGA) power pins and the VRs and between the VRs and the IC. | 43 |
| 2.10 | TSV-based 3-D IC. | 45 |
| 2.11 | Distributed model of an N-layer power delivery network in a 3-D IC [83]. | 46 |
| 2.12 | Electrical parameters characterizing a cylinder shaped TSV [90]. | 47 |
| 2.13 | Geometric parameters in array of P/G TSVs. | 50 |
| 2.14 | High resolution scanning electron microscope (SEM) images of the etching process of a TSV: a) straight etched TSV hole, and b) tapered etched TSV hole [98]. | 52 |
| 3.1 | Number of cores in high performance ICs by AMD, Intel, Nvidia, and IBM, including high end desktop processors, server processors, and high end GPUs [21,82,101–115]. | 57 |
| 3.2 | Throughput (in TFLOPS) of modern high end GPUs developed by AMD and Nvidia, and tensor processing units (TPUs) developed by Google [21,82,101–115]. | 58 |
| 3.3 | Power consumption of modern high performance processors developed by AMD, Nvidia, Intel, IBM, and Google [21,82,101–115]. | 59 |
| 3.4 | Resistive path (dashed line) between the VR and on-chip load across the board and package | 62 |
| 3.5 | VR-on-package where the VR is moved from the PCB to the package. | 64 |
| 3.6 | Radiated EMI pollutes the surrounding electromagnetic environment. | 65 |
| 3.7 | 16 core, four layer voltage stacking system. | 68 |
| 3.8 | Area comparison of 2-D IC with 3-D IC [90]. | 73 |
| 4.1 | Interconnect resistivity of different materials versus line width. | 77 |
| 4.2 | Topology of a standard cell based power network: a) planar view, b) profile view. | 80 |
| 4.3  | Model of power network                                                                                                                                                                              | 84  |
| 4.4  | Circuit models and physical structure of striping between the local power rails. a) comprehensive circuit model, b) $R_{branch}$ approximated circuit model, and c) physical structure of a stripe. | 88  |
| 4.5  | Comparison between on-chip and off-chip power noise in 14, 10, and 7 nm technology nodes.                                                                                                           | 91  |
| 4.6  | Components of on-chip power noise in 14, 10, and 7 nm technology nodes.                                                                                                                             | 92  |
| 4.7  | Local peak power noise in 14 nm, 10 nm, and 7 nm technologies with increasing clock frequency.                                                                                                      | 93  |
| 4.8  | Per cent decrease in performance of average power noise of a five stage ring oscillator in 14 nm, 10 nm, and 7 nm technologies normalized to an N14 ring oscillator.                                | 95  |
| 4.9  | Degradation in global power noise versus additional global power metal layers                                                                                                                       | 97  |
| 4.10 | Effect of track stripe count and stripe width on normalized power noise, a) 14 nm, b) 10 nm, and c) 7 nm technologies. A 3.6 GHz frequency is assumed.                                              | 98  |
| 4.11 | Peak power noise in GNRs, FLG, and copper power grids with increasing clock frequencies in 7 nm technology.                                                                                         | 100 |

|                                                                                                                                                    |     |
|----------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| 4.12 Peak noise in 3X, 4X, and 5X minimum metal pitch interconnect scaling scenarios with increasing clock frequencies in 7 nm technology. . . . . | 102 |
| 4.13 Total power noise in 14, 10, and 7 nm technology nodes for four different scenarios. . . . .                                                  | 104 |
| 5.1 Quasi-sinusoidal current generation circuit within a resonant converter. (a) Sinusoidal current generation mechanism, (b) $LC$ tank circuit, and (c) voltage across $LC$ tank and quasi-sinusoidal current. . . . . | 110 |
| 5.2 Full bridge isolated LLC resonant converter . . . . .                                                                                          | 112 |
| 5.3 Response of the LLC resonant converter to a change in the load. (a) Static load, and (b) dynamic load. . . . .                                 | 114 |
| 5.4 Performance degradation of a high turns ratio converter. . . . .                                                                               | 117 |
| 5.5 Working principle of basic transformer . . . . .                                                                                               | 118 |
| 5.6 LLC resonant converter with distributed topology . . . . .                                                                                     | 120 |
| 5.7 Waveforms characterizing performance of distributed LLC resonant converter . . . . .                                                           | 121 |
| 5.8 Power loss components for an eight branch distributed LLC resonant converter. . . . .                                                          | 123 |
| 5.9 Package with a digital IC and two PoL converters. (a) Top view, and (b) side view. . . . . | 127 |

|                                                                                                                                                                                                                                                                                                        |     |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| 5.10 Intensity of electric field across package. (a) Single branch LLC resonant converter, and (b) distributed LLC resonant converter. . . . .                                                                                                                                                         | 130 |
| 5.11 Comparison of 3 meter far field EMI, (a) standard resonant converter, and (b) distributed resonant converter. . . . .                                                                                                                                                                             | 131 |
| 5.12 Comparison between far field radiation of distributed LLC resonant converter at 3 meters and the CISPR 22 standard. . . . .                                                                                                                                                                       | 132 |
| 6.1 Board mounted VR. Note the resistive path (dashed line) between the VR and on-chip load . . . . .                                                                                                                                                                                                  | 136 |
| 6.2 Sectional view of VR-on-package with two PoL converters placed next to the IC. The solid and dashed arrows depict the current path, respectively, between the ball grid array (BGA) power pins and the VRs, and between the VRs and the IC. The VRs are placed on (a) top, and (b) bottom. . . . . | 139 |
| 6.3 VR-on-package design and evaluation flow . . . . .                                                                                                                                                                                                                                                 | 142 |
| 6.4 Intensity of electric field across package. (a) VR top placement topology, and (b) VR bottom placement topology. . . . .                                                                                                                                                                           | 146 |
| 6.5 Variation of IR drop and power loss with different number of core layers in the VR top topology . . . . .                                                                                                                                                                                          | 147 |
| 6.6 Power delivery model for evaluating $Ldi/dt$ noise of VR top and bottom topologies . . . . .                                                                                                                                                                                                       | 151 |

|                                                                                                                                                                                              |     |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| 6.7 Comparison between VR top and bottom placement with number of package layers ranging from 16 to 28 layers. (a) EMI, (b) worst case IR drop, and (c) power loss [193]. . . . .            | 154 |
| 7.1 Waveguide crossing, (a) a 2 x 2 $\lambda$ -router, which consists of waveguides and micro-ring resonators, and (b) signal routing within a complex photonic system. . . . .              | 159 |
| 7.2 Direct waveguide crossing types. (a) Single mode crossing, (b) multi-mode interference-based crossing, (c) elliptical crossing, and (d) four fold symmetric elliptical crossing. . . . . | 161 |
| 7.3 Input light source. (a) Time domain, and (b) frequency domain. . . .                                                                                                                     | 165 |
| 7.4 Effect of the placement of a single waveguide crossing on the signal loss. (a) Experimental setup, and (b) FDTD simulations. . . . .                                                     | 166 |
| 7.5 Effect of the placement of two waveguide crossings on signal loss. (a) Experimental setup, and (b) FDTD simulations. . . . .                                                             | 167 |
| 7.6 Effect of the placement of multiple waveguide crossings on signal loss. (a) Experimental setup, and (b) FDTD simulations. . . . .                                                        | 168 |
| 7.7 8x8 GWOR router. (a) Demonstration of router topology with waveguide and micro-ring resonators, and (b) FDTD simulation setup for worst case scenario. . . . .                           | 174 |

|     |                                                                                                                                                                                                                                                                                              |     |
|-----|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| 8.1 | Current path within a 3-D power distribution network consisting of vertical current paths through the P/G TSVs and horizontal current paths within each 2-D IC. . . . .                                                                                                                      | 178 |
| 8.2 | RDL as an interface between a P/G TSV and an adjacent P/G TSV, and between a P/G TSV and a 2-D power grid. (a) The location of the RDL within a 3-D IC between two adjacent layers; and (b) a zoom-in of the RDL, where the RDL supports both horizontal and vertical current paths. . . . . | 180 |
| 8.3 | RDL as an interface between the IC and package. . . . .                                                                                                                                                                                                                                      | 182 |
| 8.4 | Type A TSV and current path between the TSV and load. (a) Cross sectional view, and (b) lumped circuit model. . . . .                                                                                                                                                                        | 186 |
| 8.5 | Cross sectional view of type two RDL for type A TSVs. The type two RDL connects the bond pad of the power grid in layer N to the P/G TSVs in layer N+1. . . . .                                                                                                                              | 189 |
| 8.6 | Type B TSV and current path between TSV and load. (a) Cross sectional view, and (b) lumped circuit model. . . . .                                                                                                                                                                            | 191 |
| 8.7 | Cross sectional view of current path and P/G RDL for type B TSVs, (a) back-to-face stacking topology, and (b) face-to-back stacking topology. .                                                                                                                                              | 195 |
| 8.8 | Circuit model of current path and P/G RDL for type B TSVs, (a) back-to-face stacking topology, and (b) face-to-back stacking topology. .                                                                                                                                                     | 195 |

|                                                                                                                                                                                                                                                                                                                         |     |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| 8.9 Comparison of voltage drop at the current source of layer three within a seven layer 3-D power network with a back-to-face and face-to-back topology. (a) Load increases in the adjacent layer, and (b) P/G TSV resistance increases within layer three. . . . .                                                    | 197 |
| 8.10 Power grid with a two layer mesh structure. . . . .                                                                                                                                                                                                                                                                | 201 |
| 8.11 Model of the P/G RDL connecting the P/G TSVs to the 2-D power grid. (a) Direct P2P RDL, and (b) grid-based RDL. . . . .                                                                                                                                                                                            | 203 |
| 8.12 Distribution topologies of P/G TSV for comparison of voltage drop between P2P RDL and grid-based RDL. (a) 100 TSVs with uniform distribution, (b) 50 TSVs with uniform distribution, (c) 20 TSVs with uniform distribution, and (d) 20 TSVs with uneven distribution. . . .                                        | 206 |
| 8.13 Variation in voltage drop in 3-D power networks with fewer P/G TSVs for grid-based and P2P P/G RDL. (a) 100 TSVs with grid-based P/G RDL, (b) 100 TSVs with P2P RDL, (c) 50 TSVs with grid-based RDL, (d) 50 TSVs with P2P P/G RDL, (e) 20 TSVs with grid-based P/G RDL, and (f) 20 TSVs with P2P P/G RDL. . . . . | 208 |
| 8.14 Comparison of voltage drop between the grid-based and P2P RDL with uneven P/G TSV distribution. (a) 20 unevenly distributed P/G TSVs with grid-based RDL, and (b) 20 unevenly distributed P/G TSVs with P2P RDL. . . . .                                                                                           | 209 |

|                                                                                                                                                                                                                                                           |     |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| 8.15 Comparison of the voltage drop between the grid-based and P2P RDL with increasing load current. (a) Grid-based RDL, and (b) P2P RDL.                                                                                                                 | 210 |
| 8.16 Comparison of the highest voltage drop in the grid-based RDL and P2P RDL for five different scenarios. . . . .                                                                                                                                       | 210 |
| 9.1 16 core, four layer voltage stacked system. . . . .                                                                                                                                                                                                   | 217 |
| 9.2 Variation in the voltage levels for different activity factors for a non-stacked system, two layer voltage stacked system, and four layer voltage stacked system. . . . .                                                                             | 219 |
| 9.3 Voltage droop as a function of decoupling capacitance and transient current during a load imbalance, (a) 10 ns, and (b) 5 ns. The decoupling capacitance ranges from 0.42 to 4.2 $\mu$ F and the transient current ranges from 0.5 to 2 A/ns. . . . . | 222 |
| 9.4 Symmetric ladder topology switched capacitor converter utilized in a four layer voltage stacked system. . . . .                                                                                                                                       | 225 |
| 9.5 Voltage droop after a 10% load imbalance within a four layer voltage stacked system with a switched capacitor ladder converter. . . . .                                                                                                               | 227 |

|      |                                                                                                                                                                                                                                                                                                                                |     |
|------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| 9.6  | Voltage drop considering the effects of the parasitic impedances within a power delivery network. (a) Parasitic resistance increases from 0 to 5 mΩ while the parasitic inductance is assumed negligible; and (b) parasitic inductance increases from 0 to 80 pH while the parasitic resistance is assumed negligible. . . . . | 229 |
| 9.7  | On-chip current path among different cores. a) Regular non-stacked four core system, and b) four layer voltage stacked system. . . . .                                                                                                                                                                                         | 231 |
| 9.8  | Current density model, a) power distribution cell-based regular non-stacked power grid, and b) voltage stacked power grid with cross-core current path. . . . .                                                                                                                                                                | 234 |
| 9.9  | Two VR topologies utilizing DPP in a voltage stacked system. (a) Stack-to-bus topology, where n DC/DC converters are required for an n layer voltage stacked system, and (b) stack-to-stack topology, where n-1 DC/DC converters are required for an n layer voltage stacked system.                                           | 236 |
| 9.10 | The current path and parasitic impedances of the power delivery system within (a) stack-to-bus topology, and (b) stack-to-stack topology. . . . .                                                                                                                                                                              | 239 |
| 9.11 | Resonant converter-based DPP system with stack-to-bus topology in a four layer voltage stacked system. . . . .                                                                                                                                                                                                                 | 241 |





# Chapter 1

## Introduction

The first recorded observation of an electrical phenomenon (around 600 BCE) is attributed to Thales of Miletus, who noted that rubbing fur on various substances, such as amber, would produce an attraction between the two materials [1]. Thales believed that this phenomenon was due to a magnetic force generated by the amber, which he thought contained a soul [2]. In 1600, William Gilbert provided the first scientific explanation of this phenomenon, which is now called static electricity. He distinguished this phenomenon from magnetism and coined the word *electricus* to describe it [3]. Since then, the study and exploration of electrical phenomena, and the development and engineering of novel electrical systems and devices, have not stopped. Huge power plants, power substations, and electric power grids were developed across the entire country during the electrification era. During the digital era, nanometer scale transistors have been developed and utilized for integrated circuits (ICs) within computer systems.

The Antikythera mechanism (developed in 60 BCE), which was designed to calculate astronomical positions, is generally considered the oldest computer in human history [4]. Between the late 19th and early 20th centuries, analog computers started to bloom, allowing a user to model or simulate a complex system with a known physical system [5]. These early analog computers were mechanical in nature. The classic example of an early analog computer is the differential analyzer, invented by Vannevar Bush in the 1930's, which used mechanical shafts and gears [6]. Not until 1942 was the first fully electronic analog computer developed by Helmut Hölzer [7], thanks to the development of electronics. The first electronic digital computer, the ENIAC, was developed in 1945 by the US government and can currently be viewed at the University of Pennsylvania [8]. Thanks to the development of integrated circuit technology, computers evolved into the microprocessor era. The successful integration of semiconductor technology into microprocessors started the modern computer era.

The history of the second to the fourth industrial revolution is described in Section 1.1. The effect of computer systems on the third and fourth industrial revolutions is also discussed. Power dissipation plays an important role in the development of computer systems. An overview of power delivery systems within a computer is presented in Section 1.2. An outline of the dissertation is provided in Section 1.3.

## 1.1 Industrial revolutions and computer systems

The second industrial revolution, also referred to as the technological revolution, began in the 1870's. This industrial transition provided an excellent opportunity for the development of electrical systems [9]. With the invention of the telephone, incandescent lamp, electric motor, the successful development of AC power stations, and many other important electrical systems, human beings entered the electrification era [10]. The electrical power grid, a combined transmission and distribution network, played an important role in this new electrification era, delivering stable, high quality electrical energy from the power plants to millions of households and factories, as illustrated in Figure 1.1. The development of the electrical power grid enabled the evolution from local generation, steam powered factories to centralized generation, electricity powered factories. This process enabled industry to produce greater output at lower cost, leading to the success of this industrial revolution.

Figure 1.1: Power plants generate electricity which is delivered to customers over transmission and distribution power lines.

In 1930, the term “electronics” was introduced as a discipline of electrical engineering. Electronics deals with electrical circuits composed of active electrical components such as vacuum tubes, diodes, and so on. Not long after the first electronic digital computer, the first transistor was developed in 1947 by John Bardeen, Walter Brattain, and William Shockley at Bell Labs [11]. The three men probably did not know that they started a new era, the third industrial revolution, also referred to as the digital revolution. Information today is generated, transferred, processed, and stored in a digital form. Eleven years after the successful development of the first transistor, the IC was invented and the semiconductor industry was born. The computer industry started to grow rapidly thanks to the development of the semiconductor industry, as illustrated in Figure 1.2.

Figure 1.2: Evolution of computer systems from the first analog computer (Antikythera mechanism) to large scale data centers [4, 8, 12].

Computers exhibit higher performance, a smaller footprint, and lower cost than their predecessors due to the development of integrated circuit semiconductor technology. It was not until the 1970's that the personal computer was introduced and the digital revolution began to change the life of common people. Early in the development of the microprocessor, speed was the primary design objective. In the 2000's, with the assistance of the Internet, smart phones started to break the communications barrier among people and between people and information, revolutionizing daily life anytime and anywhere. Systems-on-chip (SoC), as the processor inside mobile smart phones and other smart portable devices, exhibit quite strict requirements for power, making power the design criterion of fundamental importance [13]. Note that power, as a primary design objective, did not attract significant attention during the early development of microprocessor technology. It was not until the 2000's that power became a challenging issue, becoming the primary bottleneck to performance and area [14]. Multicore microprocessors and power minimization techniques have recently been developed to manage this power challenge [15].

Today, the fourth industrial revolution is upon us. Although the definition of the fourth industrial revolution has not yet been widely agreed upon, it is marked by the development of emerging technologies across a number of fields such as artificial intelligence (AI), the Internet of things (IoT), autonomous vehicles, robotics, 5G wireless technologies, the brain computer interface (BCI), and so on. The IC industry will continue to play an important role in this fourth industrial revolution, facing both novel and similar challenges in the advancement of technology, such as the ever challenging issue of low power.

For example, it is projected that the number of IoT devices will exceed 20 billion by 2020 [16]. IoT devices typically exhibit the following characteristics: (1) small physical area, (2) communications capability, (3) sensing/actuation modality, and (4) low energy consumption [17]. IoT places extreme limitations on area, reliability, cost, and, importantly, power consumption.

As important technologies of the fourth industrial revolution, AI and machine learning applications are starting to push the limits of computational power in high performance processors. Existing low power techniques have not prevented the growth in power consumption in high performance processors [18]. Data centers for cloud computing are being rapidly developed across the world, consuming significant energy. Global data centers consumed roughly 416 terawatt hours of electricity in 2016, more than the electricity consumed by the United Kingdom [16].

The increase in power consumption leads to higher current demand for ICs, challenging the reliability of the power delivery network and the power efficiency of the computer system. Innovative technologies, design methodologies, and integrated platforms across the device, circuit, architectural, and system levels are greatly desired to tackle this high current challenge either by reducing the current demand of the ICs or by developing ICs that can support high currents. Moreover, 3-D ICs that exploit the vertical dimension provide a platform for heterogeneous integration and greater performance regardless of the CMOS technology node [19]. The dense nature of vertical stacking however makes 3-D ICs highly sensitive to high power, preventing 3-D technology from being used in many high current and high performance applications. The full advantages of 3-D integration can only be exploited if high current 3-D systems are enabled.

The power delivery network within a computer is a complex hierarchical system. Each component within the power delivery network affects the reliability and efficiency of the entire system. An overview of power delivery systems within a computer is therefore presented in the following section.

## **1.2 Power delivery network within a computer system**

Similar to the electric power grid that generates, transfers, and distributes electricity to many millions of end users, a power delivery network is required within a computer system, as illustrated in Figure 1.3. This power delivery network spans from the voltage regulator module (VRM) to the on-chip load, including the physical structures and passive/active devices within the board, package, and IC. A VRM generates power for the system, providing sufficient current at a stable voltage to the load. The power planes within the board transfer power from the VRM to the microprocessor with low loss. The physical structure and devices between the board and on-chip load distribute the power, effectively transferring current from the off-chip power pins to the on-chip load with low energy consumption and small voltage drops.

Figure 1.3: Power delivery network within a computer system. a) Cross sectional view, and b) top view of hierarchical power delivery network from VR to on-chip load [20,21].

As a basic component within the power delivery network, a VRM converts a high DC voltage (for instance, 12 volts) to an on-chip voltage (for instance, 0.8 volts), supporting the proper operation of the microprocessor. The VRM also regulates the output voltage under load variations within the microprocessor. Variations in transient current demand and reduced noise margins have made the VRM design process quite challenging. Important characteristics of a VRM include the form factor, power efficiency, power density, step-down ratio, regulation capability, electromagnetic compatibility, as well as many other attributes.

Three classic topologies of a VRM have been developed, including switching mode power supplies (SMPS), switched capacitor converters, and linear regulators. SMPS requires a large inductor and capacitor to store and transfer power from the input to the load where the time that the passive circuit elements are connected to the input and output are carefully managed. SMPS exhibits high power efficiency and load regulation, making this topology effective for microprocessors. SMPS however requires significant area due to the large inductor, making on-chip integration quite challenging. A switched capacitor converter uses capacitors to store and transfer power. Due to trench capacitor technology, integration of large capacitors within a small area has become more feasible, making a switch capacitor converter a useful candidate for on-chip integration. The load regulation capability is however degraded in switched capacitor converters. Alternatively, a linear regulator uses a variable resistor to divide the input voltage, generating and regulating a steady output voltage.

The small area and excellent load regulation characteristics make linear regulators a good candidate for on-chip integration, although the efficiency is limited to the ratio of the output voltage to the input voltage.
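The efficiency bound of a linear regulator can be made concrete with a minimal sketch. The function name and the 1.0 V intermediate rail are illustrative assumptions rather than values from this work, and quiescent current is ignored, so the result is only an upper bound.

```python
def ldo_max_efficiency(v_in: float, v_out: float) -> float:
    """Upper bound on the efficiency of a linear regulator.

    The load current also flows through the series pass element, so at best
    P_out / P_in = (v_out * i) / (v_in * i) = v_out / v_in.
    Quiescent current is ignored (assumption), so real efficiency is lower.
    """
    if v_in <= 0 or not 0 < v_out <= v_in:
        raise ValueError("require 0 < v_out <= v_in")
    return v_out / v_in


if __name__ == "__main__":
    # 12 V and 0.8 V are the example levels used in the text;
    # the 1.0 V input is a hypothetical intermediate rail.
    for v_in in (12.0, 1.0):
        bound = ldo_max_efficiency(v_in, 0.8)
        print(f"{v_in:4.1f} V -> 0.8 V: efficiency <= {bound:.1%}")
```

The bound suggests why a linear regulator is better suited to a small step down from a nearby rail than to direct conversion from a 12 volt input.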

Due to heterogeneous systems integration and reductions in power noise margin, multiple voltage domains and fine grained, fast power management are required. A centralized on-board VRM, far from the on-chip load, is therefore no longer sufficient. Historically, the VRM was located on the printed circuit board due to the large footprint, leading to a large parasitic impedance and significant power loss between the VRM and the on-chip load [13]. On-package and on-chip voltage regulation has therefore been developed to provide higher power efficiency and to enable fast load regulation required for dynamic voltage scaling (DVS) systems. On-chip buck converters are utilized by Intel as a fully integrated voltage regulator (FIVR) in the Haswell processor with advanced packaging technology in which the bulky inductors are also integrated [22]. Distributed linear low dropout regulators (LDO), a type of linear regulator, have been integrated within the IBM POWER8 processor, where core level dynamic voltage frequency scaling (DVFS) is achieved [23].

An efficient power delivery network that provides high quality power to the on-chip load has become a primary concern of the IC design process, in which the on-chip power grid plays a critical role. The on-chip power grid refers to the metal layers that transfer current from the controlled collapse chip connection (C4) to the on-chip transistors. The on-chip power grid is the final physical structure that passes current before reaching the load. As illustrated in Figure 1.4, a multi-layer, interdigitated mesh structure is generally utilized in the power grid within high performance ICs to provide significant robustness and lower impedance. With increasing clock frequencies and on-chip transient currents, the inductive power noise within the power grid has become non-negligible [24]. The power grid is typically located within the top metal layers and connected to the local, lower power rails through via stacks. On-chip interconnect scaling has led to thinner and narrower local power rails and via stacks.

Figure 1.4: A multi-layer interdigitated mesh structured on-chip power grid. The power and ground lines are, respectively, in orange and blue colors.

The resistivity of the metal line has increased significantly with interconnect scaling. As a result, the local power network is highly resistive, leading to significant resistive power noise. Inductive and resistive power noise together challenge the development of reliable power delivery networks and the proper functionality of the ICs. A breakdown of the power noise based on the hierarchical structure of the on-chip power network is therefore desirable. Suppression schemes targeting different power noise components are also needed [13].

Increases in the performance of microprocessors have slowed and become more costly due to the greater difficulty in scaling CMOS technology. As a platform to enhance performance, multiple ICs are stacked together in the third dimension. Three-dimensional ICs have attracted significant attention within both academia and industry [26–28]. As illustrated in Figure 1.5, the on-chip power delivery network of a 3-D IC is highly sophisticated due to the introduction of power and ground (P/G) through silicon vias (TSVs), the key structure which delivers power vertically to the individual layers. Despite the successful development of 3-D products such as 3-D DRAM, high bandwidth memory, and field programmable gate arrays (FPGAs), 3-D ICs suffer from power integrity issues and low power efficiency within the 3-D power delivery network. To enable high power applications of 3-D ICs, improved understanding of the distribution of current within the power delivery network across multiple layers is required. Innovative technologies and design methodologies across circuit and architectural levels that support high current within 3-D power delivery networks are highly desirable to fully exploit the advantages provided by 3-D integration.

Figure 1.5: Overview of 3-D power delivery network. a) Panoramic view of power/ground TSVs and 2-D power distribution cell within a 3-D IC, and b) related circuit model [25].

### 1.3 Outline

The importance and challenges of developing a reliable power delivery network, supporting high current demand, are presented in this dissertation. In Chapter 2, a breakdown of the physical structure of an on-chip power delivery network within a 2-D IC is presented. The hierarchical power delivery network is divided into a global power grid, power via stacks, and local power rails. Lumped and distributed models of the power delivery network are described in this chapter. Accurate and efficient simulation engines, supporting large scale power networks, are reviewed. The power delivery networks within 2.5-D and 3-D systems are also presented in this chapter. As a key component of a 3-D power delivery network, P/G TSVs require accurate models supporting different circumstances. Closed-form expressions, accurately capturing the impedance of the P/G TSVs in a large array, are presented.

At the system level, processors composed of a large number of cores and providing high throughput are highly desirable for high performance computing (HPC) systems. At the transistor level, dynamic and leakage power consumption do not decrease with standard CMOS scaling. As a result, power consumption continues to increase, leading to significant on-chip current. The challenges of developing a power delivery network, supporting high currents, are therefore discussed in Chapter 3. One challenge, the last inch power loss, is the undesirable power loss from the board to the IC before reaching the on-chip load, which can be significant due to the high current demands of HPC systems. Another classic issue is electromigration, an old problem within the IC community, which challenges the reliability of the power delivery network within HPC systems. Novel materials, mitigating electromigration, are reviewed in this chapter. An architectural technique, voltage stacking, is also introduced, reducing on-chip current demand by adding layers sharing the same current within a voltage stacked system.

To explore the effects of high current on the power network of advanced technology nodes, several power noise suppression methods utilizing different on-chip metalization schemes are proposed in Chapter 4. An exploratory modeling methodology is presented to estimate power noise in advanced technology nodes. The models are evaluated for the 14, 10, and 7 nm FinFET technology nodes. Power noise is composed of three parts, noise related to the global power grids, via stacks, and local power rails, based on the hierarchical nature of the power distribution network. The components of the on-chip power noise for the 14, 10, and 7 nm technology nodes are compared. Different on-chip metalization schemes are proposed to suppress power noise, including additional global power metal layers, metal stripes, graphene interconnect, and deeply scaling the local power rails. The efficiency of these suppression techniques on different power noise components and technology nodes is also discussed.

One method to mitigate the last inch power loss is to shorten the "last inch" distance by bringing the point-of-load (PoL) converter close to the processor. A voltage regulator (VR)-on-package topology using system-in-package (SiP) technology can effectively reduce the last inch power loss by moving the PoL converter from the board to the package. To further decrease the last inch power loss, a high step-down ratio within the PoL converter is utilized. In this way, high voltage transmission is achieved between the PoL converter and processor, leading to less current passing through the power delivery network. Electromagnetic interference (EMI) however has become more challenging due to the close proximity of the components within an SiP environment and the higher voltage levels within the package. A distributed resonant converter that supports a high step-down ratio and low EMI is described in Chapter 5. The converter exhibits a stable load voltage with less than 4% ripple and a fast transient response of less than 1 $\mu$s. More than 3X lower EMI in the distributed resonant converter is demonstrated as compared with a single branch resonant converter using the same step-down ratio.
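The benefit of high voltage transmission follows from the $I^2R$ dependence of resistive loss. The following is a minimal sketch, assuming a fixed load power, a lossless converter, and a hypothetical 1 m$\Omega$ path resistance and 100 W load. None of these values, nor the function name, are taken from the dissertation.

```python
def distribution_loss(p_load: float, v_bus: float, r_path: float) -> float:
    """Resistive (I^2 R) loss in a path delivering p_load watts at v_bus volts.

    Assumes the full load power crosses the path at voltage v_bus and that
    any downstream conversion is lossless (both are simplifying assumptions).
    """
    if p_load < 0 or v_bus <= 0 or r_path < 0:
        raise ValueError("require p_load >= 0, v_bus > 0, r_path >= 0")
    current = p_load / v_bus
    return current ** 2 * r_path


if __name__ == "__main__":
    P_LOAD = 100.0   # watts, hypothetical
    R_PATH = 1e-3    # ohms, hypothetical
    # 0.8 V and 12 V are the example voltage levels used in the text.
    for v_bus in (0.8, 12.0):
        loss = distribution_loss(P_LOAD, v_bus, R_PATH)
        print(f"{v_bus:5.1f} V bus: {P_LOAD / v_bus:7.2f} A, loss {loss:.4f} W")
```

Under these assumptions, the loss scales with the inverse square of the transmission voltage, so moving the high step-down conversion closer to the load shortens the portion of the path that carries the full load current.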

As previously mentioned, a VR-on-package is a promising topology which supports high voltage transmission within a printed circuit board and package, leading to lower distribution loss. A study of VR top and bottom placement topologies within a system-in-package environment in terms of EMI and power integrity is presented in Chapter 6. This comparative analysis targets a specific server package application, and evaluation of the power integrity focuses on the power delivery network between the VR and IC. The VR top placement topology exhibits more than 3X less EMI and a 15.3% lower worst case IR drop as compared with the VR bottom placement topology. The tradeoffs are however a larger power loss within the package and higher cost of the package due to the greater number of package layers required by the VR top placement topology. The VR top placement topology exhibits 52.6% higher power loss than the VR bottom placement topology.

Insights and design guidelines to mitigate high current challenges within HPC systems are presented in this dissertation. Due to the high bandwidth and power efficiency characteristics, silicon photonics, as a candidate for HPC systems, has attracted significant attention from both industry and academia. In a silicon photonic system, hundreds of photonic components are connected by waveguides, inevitably forming waveguide crossings within a limited area. Insertion loss due to the placement of these multiple waveguide crossings is discussed in Chapter 7. Existing metrics for determining the total signal loss are shown to be inaccurate since these metrics ignore the location of the waveguide crossings. Three experiments with different waveguide crossing locations are evaluated using 3-D finite difference time domain analysis. It is observed that the location of a waveguide crossing, the relative location of two crossings, and the total number of crossings within a system affect the loss per crossing. Bloch wave and multimode interference are subsequently introduced to explain this phenomenon.

3-D ICs exploit the vertical dimension, providing a promising method to extend scaling. The previously mentioned high current issues are however more challenging in 3-D ICs than in 2-D ICs, preventing the application of a 3-D platform to HPC systems. As a key factor affecting current distribution in 3-D ICs, the redistribution layer (RDL) has to date received little attention from the research community. Oversimplified assumptions regarding RDLs are made such as a perfect TSV-to-TSV match between adjacent layers, or simple metal stripe connections between the TSVs and 2-D power grid. A comprehensive evaluation of RDLs with different TSV fabrication methods and 3-D stacking topologies is provided in Chapter 8. A circuit model of an RDL is described to provide insight into the effects of two types of RDLs on intra-layer and inter-layer power distribution networks. A novel grid-based RDL topology is also proposed to suppress voltage drops in 3-D power networks, supporting fewer TSVs and high current demand. It is also observed that a grid-based RDL topology is an excellent candidate for managing certain issues such as nonuniform TSV distribution and TSV mismatch.

Multiple methods have been introduced to alleviate high current issues without reducing the demand for current. Voltage stacking has become a topic of growing interest as a technique to reduce on-chip current. The challenge of load imbalances within voltage stacked systems is however significant. Existing work to manage load imbalances focuses on two directions: (1) utilizing circuit, architectural, or scheduling methods to reduce the likelihood or magnitude of a load imbalance, and (2) given specific load imbalance profiles, utilizing a power converter to mitigate voltage variations due to load imbalances. The power delivery network for voltage stacked systems, despite being critical, has to date received minimal attention. The challenges of a voltage stacked power network to support multiple voltage domains, high current demand, and exotic load balancing circuits are explored in Chapter 9. It is observed that the parasitic impedance of the power network in voltage stacked systems significantly affects the power integrity as compared with conventional 2-D power networks. Characterization of the parasitic impedance within the power network is therefore discussed for both stack-to-bus and stack-to-stack topologies. A case study of a power network for voltage stacked systems is also described in this chapter.

A summary and some concluding remarks are offered in Chapter 10. An important research topic, power delivery in high current 3-D systems, is introduced in this dissertation. The trend and challenges of increasing current in HPC systems are demonstrated. The research explores power integrity issues within 2-D, 2.5-D, and 3-D power delivery systems, considering issues such as power noise, EMI, and last inch power loss. To mitigate the effects of high current on power integrity, certain circuit design methodologies, physical design guidelines, 2.5-D integration topologies, and 3-D power systems have been proposed. A serially connected stacked circuit topology is shown to significantly reduce the current flowing through a power delivery network. This research provides insight into the challenges of applying a stacked topology to high current applications. Design guidelines for the power delivery system of a stacked topology have also been proposed.

Lastly, several potential research directions are suggested in Chapter 11. One possible research direction is a topology for a high current voltage stacked system utilizing a buck-boost converter-based current balancing topology. Extending the research described in Chapter 9, a comparison of the performance, efficiency, and overhead of this new topology with a bus-to-stack topology would be useful. Another possible research path is the analysis of the feasibility of a heterogeneous 3-D voltage stacked system. Performance improvements from 3-D voltage stacking as compared to 2-D voltage stacking by mitigating load imbalances are also a topic of potential interest.

## Chapter 2

# Power Delivery Networks in High Performance 2-D and 3-D Systems

Aggressive device scaling and increasing circuit complexity produce higher power dissipation in processors, leading to significant current transferring from the power source to the on-chip load. Resistive power noise $\Delta V_R = IR$ and inductive power noise $\Delta V_L = L dI/dt$ therefore develop across the entire hierarchy of the power delivery network. A cross-sectional view of the power delivery network of a high performance processor is illustrated in Figure 2.1. The hierarchy of the power delivery network spans from the point-of-load (PoL) voltage regulator (VR) to the on-chip load, including the printed circuit board (PCB), package, and integrated circuit (IC) [13]. The hierarchy of a power delivery network consists of active power regulator/converter circuits, passive devices, for example, decoupling capacitors, and a physical structure transferring current from the power source to the on-chip load. These physical structures include power planes within the PCB and package, on-chip power grids, and power/ground through silicon vias (P/G TSVs) within a three-dimensional integrated circuit (3-D IC).

Figure 2.1: Hierarchy of a power delivery network from the PCB to on-chip within a high performance computing system.

As illustrated in Figure 2.1, a PoL voltage regulator is the source of power for a power delivery network, converting a medium level DC voltage (for instance, 12 volts) to an on-chip power supply voltage (for instance, 0.8 volts). The PoL voltage regulator is connected to the power planes within a PCB to transfer current to the package. Decoupling capacitors are generally placed on a PCB to compensate for transient current variations due to on-chip load variations. The current subsequently transfers to the package through a ball grid array (BGA) providing the interface between the PCB and package. The power delivery network of the package consists of multiple power planes and vias that transfer and evenly distribute current between the BGA and the controlled collapse chip connection (C4) bumps [29]. For a high performance system, small decoupling capacitors (not shown in Figure 2.1) are integrated within the package to reduce on-chip simultaneous switching ($L dI/dt$) noise. The structure of the package within a 2.5-D system can be significantly complex due to transient changes in current flow and multiple voltage domains within a package. This phenomenon is explained in Chapter 3. The current transfers to the IC through C4 bumps connecting the package to the IC, where the current is further distributed to the active loads through one or more on-chip power grids. In a 3-D system, the power grid within each tier is connected to the adjacent power grids through P/G TSVs.

The structure of the power delivery network in conventional 2-D, advanced 2.5-D, and TSV-based 3-D systems is reviewed in this chapter. The structure of a conventional power delivery network within a 2-D system is described in Section 2.1, followed by recent research on fast power grid simulation. The structure of power delivery networks within 2.5-D and 3-D systems is introduced in Section 2.2, including current challenges and research highlights. A summary of the chapter is provided in Section 2.3.

## 2.1 Power delivery network for high performance processors

Due to the parasitic impedance within a power delivery network, resistive and inductive power noise develops, affecting the quality of power at the on-chip loads [30–32]. The reduction in power noise margin and the continuing increase in on-chip current lead to a challenging design process for achieving a low noise and robust power delivery network. Power noise suppression schemes need to be integrated into the IC design process. Important design issues affecting the power delivery network, such as area and performance, have to be considered during the early exploratory design stage [33]. Mechanisms for generating on-chip power noise are introduced in Section 2.1.1. The structure of an on-chip multilayer power grid for high performance processors is presented in Section 2.1.2. Fast simulation of power delivery networks for static and transient analysis is described in Section 2.1.3.

### 2.1.1 On-chip power noise

A circuit model of the hierarchical power delivery network shown in Figure 2.1 is illustrated in Figure 2.2. The model consists of a lumped model of the parasitic impedance from the voltage regulator to on-chip [13]. $R$, $L$, and $C$ are, respectively, the parasitic resistance, inductance, and decoupling capacitance within each level of a power delivery network. $V_{DD}$ and $V_{SS}$ are, respectively, the high and low voltage of a PoL voltage regulator, and $V_{DD} + \Delta V_{DD}$ and $V_{SS} + \Delta V_{SS}$ are, respectively, the power and ground voltage across the load. Current passing through the parasitic impedance of the power delivery network produces voltage variations, $\Delta V_{DD}$ and $\Delta V_{SS}$, referred to as power noise. The power noise consists of two types. One type is due to the parasitic resistance of the power distribution network, referred to as $IR$ drop or resistive power noise, which is proportional to the magnitude of the current and the parasitic resistance. The other type is due to the time variant current, which flows through the parasitic inductance. This noise component is referred to as $L \frac{dI}{dt}$, simultaneous switching noise (SSN), or inductive noise, which is proportional to the rate of change of the transient current [34]. With continuing increase in operating frequencies and current demand of next generation processors [35], $L dI/dt$ noise is becoming more significant [24, 36].

Figure 2.2: Circuit model of a power delivery network from the PCB to on-chip.
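As a rough worked example of the two noise components, consider hypothetical values chosen only for illustration and not taken from any measured system: an average current $I = 50$ A through a parasitic resistance $R = 0.5$ m$\Omega$, and a current step of $10$ A in $1$ ns through a parasitic inductance $L = 2$ pH.

$$\Delta V_R = I R = 50\,\mathrm{A} \times 0.5\,\mathrm{m\Omega} = 25\,\mathrm{mV}$$

$$\Delta V_L = L \frac{dI}{dt} = 2\,\mathrm{pH} \times \frac{10\,\mathrm{A}}{1\,\mathrm{ns}} = 20\,\mathrm{mV}$$

Under these assumed values, halving the transition time to $0.5$ ns doubles $\Delta V_L$ to $40$ mV while $\Delta V_R$ is unchanged, which is consistent with the inductive component growing as switching times decrease.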

The increase in current demand in high performance processors leads to higher average and transient currents. The high average currents in a power delivery network produce large  $IR$  drops. High transient currents, alternatively, increase  $L dI/dt$  noise as the switching times decrease. Power noise can lower performance and produce faulty circuit operation. The effects of power noise on signal delay uncertainty, on-chip clock jitter, and gate reliability are discussed below.

In CMOS technology, the PMOS and NMOS transistors are, respectively, connected to the power and ground rails within an on-chip power delivery network. Supply voltage variations affect the gate and source of the MOS transistor, changing the drain current of the MOS transistor. Variations in the drain current subsequently affect the device propagation delay. The propagation delay of the on-chip signal is therefore affected by power supply voltage variations [37], causing delay uncertainty. Signal delay uncertainty, particularly along the critical path, limits the maximum operating frequency of a processor. Similarly, when the victims of power supply variations are the clock generation and distribution network rather than the logic gates, power noise can increase clock jitter and clock skew [15], decreasing the operational frequency of a processor [38]. The gate oxide thickness decreases with technology scaling, producing faster device switching. This scaling trend makes the reliability of the gate oxide more challenging. Voltage fluctuations across the gate oxide, increased by power supply noise, can increase the gate voltage [39]. In this way, long term reliability and functionality of the gate oxide degrade [40].

### 2.1.2 Breakdown of on-chip multilayer power delivery networks

To maintain high performance and correct functionality of an integrated circuit, the power supply voltage should be regulated within specific noise margins [13]. The noise margin decreases with the scaling of CMOS technology, while the current demand of high performance processors continues to increase. Reducing the parasitic impedance of the power delivery network is therefore critical to suppress power supply noise. A breakdown of the physical structure of a power delivery network is important in terms of quantifying the parasitic impedance and providing intuitive tradeoffs between power noise and performance. Although each level within a hierarchical power delivery network contributes power noise, the on-chip power delivery network is the most critical level due to the complex structure of the on-chip metalization and proximity to the loads, as compared with the relatively simple structure of the power delivery network within the PCB and package.

Aggressive device scaling and increasing circuit complexity require significant signal routing resources, leading to an increase in the number of on-chip metal layers.



Figure 2.3: 13 layer metalization stack in Intel 14 nm process [41].

A cross-sectional view of the on-chip metalization stack in the Intel 14 nm process is illustrated in Figure 2.3. In total, 13 metal layers are provided for on-chip signal routing and power delivery, including the global, intermediate, and local metal layers. The hierarchical power delivery network within a modern processor is a multilayer structure, spanning from the top global layers to the bottom local metal layers. An on-chip multilayer power delivery network consists of a global power grid, local power and ground rails, and via stacks connecting the global power grid to the local power rails.

#### 2.1.2.1 Global power grid

Depending upon the application, several structures are used to deliver power within the global metal layers, including routed networks, irregular mesh structured networks, grid structured networks, power and ground planes, and cascaded power and ground rings [13, 42]. A grid structure is generally utilized for high performance processors, as illustrated in Figure 2.4. The darker and lighter lines refer to, respectively, the power and ground metal lines. A typical global power grid for high performance ICs utilizes two layers of orthogonal metal lines to form a mesh structure. Adding global metal layers decreases the grid impedance, reducing the power noise within the global power grid. The total number of on-chip metal layers is however limited by the technology. A mesh structure increases the reliability and robustness of a power network due to the multiple redundant paths. The mesh structure also reduces the resistance and parasitic capacitance of the power grid.

The extraction of the effective resistance of a global power grid is necessary in on-chip power noise analysis, decoupling capacitance allocation, and power dissipation estimation. A closed-form expression of the resistance of a uniform mesh, assuming the vertical and horizontal unit resistances are the same, has been previously developed [44]. Inspired by this work, a closed-form expression has also been developed for the effective resistance between the intersections within a two layer resistive mesh where the horizontal and vertical unit resistances are different [45]. The impedance of the global power grid typically exhibits low resistance and high inductance due to the mesh structure [33]. The inductive properties of the global power grid are an important research topic. Three different grid types, grouped, interdigitated, and paired, are compared in [46] in terms of the effective inductance of the global power grid. The inductance variation across a wide frequency spectrum with different metal line thicknesses is investigated within these three different grid types. The inductance variation with different grid specifications such as grid length, pitch, and number of lines is also described in this work.

Figure 2.4: Global power grid with interdigitated power and ground metal lines [43].
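The effective resistance between two intersections of a finite mesh can also be estimated numerically. The sketch below is not the closed-form expression of [44] or [45]; it is a plain Gauss-Seidel nodal solve, substituted here for illustration, in which 1 A is injected at node `a`, node `b` is held at 0 V, and the voltage at `a` is read back. The function name, grid size, and unit resistances are assumptions.

```python
def effective_resistance(rows, cols, r_h, r_v, a, b, tol=1e-10, max_iter=100_000):
    """Effective resistance (ohms) between nodes a and b of a resistive mesh.

    rows, cols : mesh dimensions (at least two nodes in total)
    r_h, r_v   : unit resistance of each horizontal / vertical segment
    a, b       : distinct nodes given as (row, col) tuples
    """
    if a == b:
        raise ValueError("a and b must be distinct nodes")
    g_h, g_v = 1.0 / r_h, 1.0 / r_v
    neighbors = ((0, 1, g_h), (0, -1, g_h), (1, 0, g_v), (-1, 0, g_v))
    v = [[0.0] * cols for _ in range(rows)]
    for _ in range(max_iter):
        largest_change = 0.0
        for i in range(rows):
            for j in range(cols):
                if (i, j) == b:
                    continue  # reference node stays at 0 V
                g_sum = 0.0
                acc = 1.0 if (i, j) == a else 0.0  # 1 A injected at node a
                for di, dj, g in neighbors:
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        g_sum += g
                        acc += g * v[ni][nj]
                updated = acc / g_sum  # Kirchhoff's current law at node (i, j)
                largest_change = max(largest_change, abs(updated - v[i][j]))
                v[i][j] = updated
        if largest_change < tol:
            break
    return v[a[0]][a[1]]  # volts per injected ampere


if __name__ == "__main__":
    # Hypothetical 7 x 7 mesh with different horizontal and vertical unit resistances.
    r = effective_resistance(7, 7, r_h=1.0, r_v=2.0, a=(3, 3), b=(3, 4))
    print(f"effective resistance between adjacent center nodes: {r:.4f} ohm")
```

An iterative solve of this kind scales poorly with mesh size, which is one reason closed-form expressions are attractive for early estimates.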

#### 2.1.2.2 Local power rails

Notably, billions of on-chip transistors are powered by the local power rails within the standard cell structure rather than directly from the global power grid. The local power rails within a standard cell structure are illustrated in Figure 2.5. Logic gates with the same height, determined by the standard cell library of a specific technology process, are placed between the local power and ground rails. The power and ground rails with the standard cells in between are referred to as a power track. The devices within the same track are powered by the same power and ground rails.

Figure 2.5: Standard cell-based local power and ground rails [33].

The power rail impedances are dominated by the metal resistance and decoupling capacitance inserted along the power rails. On-chip power noise is caused by current switching on the track rails with the greatest contribution from the clocked gates and buffers [47, 48]. The local power noise is contributed by the $IR$ drop within the power rails when multiple loads simultaneously switch. Reducing the resistance of the local power rails is therefore an effective approach to mitigate resistive power noise [33]. As illustrated in Figure 2.5, the height of the standard cell, width of the local power rails, and number of power tracks determine the footprint of the IC. The standard cell height is normally determined by the technology process. The width of the power rails, however, depends upon the IC specifications and power noise margin. Early impedance characterization and power noise analysis are typically performed to evaluate different metalization schemes for the local power rails, providing intuitive tradeoffs among IC performance, area, and power noise.

#### 2.1.2.3 Power via stacks

The global power grid is generally on the top two metal layers and the local power rails are on the bottom metal layer, as illustrated in Figure 2.6(a). Via stacks, spanning all of the metal layers in between, are the physical structure connecting the global power grid to the local power rails, as illustrated in Figure 2.6(b). A via is the vertical interconnect between adjacent metal layers. The via is assumed to be shaped like a cylinder with a metallic barrier layer, where the diameter is the same as the width of the adjacent power line. As technology is scaled, the resistance of the via increases significantly due to the smaller cross sectional area and highly resistive metallic barrier of the via. With an increase in the number of on-chip metal layers, the length of the via stack also increases, leading to an even greater resistance. The via stack is therefore an important structure within an on-chip power delivery network, affecting the power noise and parasitic impedance extraction processes [33].
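The dependence of via resistance on cross sectional area and stack length can be illustrated with a first-order estimate. This is a sketch under stated assumptions rather than an extraction method from this work: each via is treated as a uniform cylinder, only the core inside the barrier layer is assumed to conduct, and landing pads and contact resistance are ignored. The function names and all dimensions are hypothetical.

```python
import math


def via_resistance(rho, height, diameter, barrier=0.0):
    """Resistance (ohms) of one cylindrical via, R = rho * height / area.

    rho      : resistivity of the conducting core (ohm * m)
    height   : via height (m)
    diameter : drawn via diameter (m)
    barrier  : barrier layer thickness (m); assumed non-conducting here
    """
    core = diameter - 2.0 * barrier
    if rho <= 0 or height <= 0 or core <= 0:
        raise ValueError("rho, height, and conducting core must be positive")
    area = math.pi * (core / 2.0) ** 2
    return rho * height / area


def via_stack_resistance(vias):
    """Series resistance of a via stack; vias holds (rho, height, diameter, barrier)."""
    return sum(via_resistance(*via) for via in vias)


if __name__ == "__main__":
    RHO = 1.7e-8  # ohm * m, roughly bulk copper; scaled interconnect is more resistive
    wide = (RHO, 50e-9, 30e-9, 2e-9)    # hypothetical dimensions
    narrow = (RHO, 50e-9, 20e-9, 2e-9)  # same barrier, smaller diameter
    print(f"single wide via  : {via_resistance(*wide):.2f} ohm")
    print(f"single narrow via: {via_resistance(*narrow):.2f} ohm")
    print(f"stack of 8 narrow: {via_stack_resistance([narrow] * 8):.2f} ohm")
```

Because the barrier thickness does not shrink with the via diameter in this sketch, the conducting area falls faster than the drawn area, compounding the increase in resistance.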



Figure 2.6: Structure of an on-chip multilayer power delivery network, including the hierarchy of the global power grid, local power rails, and via stacks: a) planar view, and b) cross-sectional view.

### 2.1.3 Modeling and efficient simulation of power grid

The development, optimization, and pre-silicon verification of a power delivery network require a reliable circuit model for power grid simulation and power noise analysis [13]. An accurate and efficient power grid simulation engine is also required to support the analysis of modern power grids composed of billions of nodes. A one-dimensional lumped model and a two-dimensional distributed model for power delivery networks are described and compared in Section 2.1.3.1. Methodologies for the efficient simulation of large scale power grids are provided in Section 2.1.3.2.

#### 2.1.3.1 Power grid model

As illustrated in Figure 2.2, a one-dimensional lumped circuit model of a power delivery network is often used to characterize a power grid [49]. Each level within the hierarchy is modeled as a pair of power and ground conductors and a decoupling capacitor, spanning from the PCB to on-chip. This model is sufficient to capture the global characteristics of the power delivery system over a wide frequency range [13]. The  $IR$  drop and  $L \frac{dI}{dt}$  noise, from each individual hierarchy, can be evaluated and compared. The model can also be used for system level impedance characterization and target impedance evaluation for each hierarchical level within a power delivery network.
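A per-level noise breakdown with the lumped model can be sketched as follows. The level names, parasitic values, and load figures are hypothetical, decoupling capacitors are ignored for simplicity, and the target impedance uses the commonly quoted definition (allowed ripple voltage divided by load current), which is assumed here rather than taken from this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """One hierarchical level of the lumped model (hypothetical values)."""
    name: str
    r: float  # parasitic resistance in ohms
    l: float  # parasitic inductance in henries


def noise_by_level(levels, i_avg, di_dt):
    """Map each level name to (IR drop, L dI/dt noise) in volts."""
    return {lv.name: (i_avg * lv.r, lv.l * di_dt) for lv in levels}


def target_impedance(v_dd, ripple, i_load):
    """Target impedance in ohms: allowed ripple voltage over load current."""
    if i_load <= 0:
        raise ValueError("i_load must be positive")
    return v_dd * ripple / i_load


if __name__ == "__main__":
    levels = [
        Level("board", r=0.2e-3, l=20e-12),
        Level("package", r=0.1e-3, l=5e-12),
        Level("on-chip", r=0.3e-3, l=0.5e-12),
    ]
    # Assumed operating point: 100 A average current, 10 A step in 1 ns.
    for name, (v_r, v_l) in noise_by_level(levels, 100.0, 10.0 / 1e-9).items():
        print(f"{name:8s} IR = {v_r * 1e3:6.1f} mV   L dI/dt = {v_l * 1e3:6.1f} mV")
    z = target_impedance(v_dd=0.8, ripple=0.05, i_load=100.0)
    print(f"target impedance: {z * 1e3:.2f} mohm")
```

Ignoring the decoupling capacitors overstates the inductive noise of the upstream levels, so the breakdown should be read as a pessimistic first estimate.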

Despite the simplicity and wide use of a one-dimensional lumped model, this topology fails to capture the local characteristics within each hierarchy. As described in Section 2.1.2, modern on-chip power delivery networks consist of complex physical structures spanning entire metal layers and billions of loads with time variant current. By simplifying such a complex system to a one-dimensional lumped model, this circuit model is often not sufficiently accurate for on-chip power noise evaluation and optimization. For example, a sudden increase in current demand of an on-chip block can produce significant $IR$ drop as well as $L \frac{dI}{dt}$ noise, producing power noise above a specified noise margin. This undesirable high power noise can lead to severe reliability issues and circuit malfunction. This local phenomenon, however, is not accurately captured by a one-dimensional circuit model.

Based on the physical structure of a mesh power grid, a two-dimensional distributed model has been developed [50], as illustrated in Figure 2.7. Due to the symmetric characteristics of the power and ground delivery networks, only the power network is illustrated in the figure. The two-layer mesh structure is represented by horizontal and vertical resistors, which are determined by the technology and on-chip metalization scheme. The inductance is not considered in this two-dimensional model since the on-chip inductance is much smaller than the inductance of the package and C4 bumps. Note that the vias connecting the two metal layers are also neglected in this model. The load is modeled as a time variant current source connected to the intersection of the horizontal and vertical resistors. The location and current profile of the current sources are determined by the physical characteristics. The same power delivery network can be modeled at different spatial resolutions. The spatial resolution is determined by the size of the circuit, the target accuracy of the power delivery network analysis process, and the computational requirements. A two-dimensional model supports accurate power noise analysis and impedance characterization, including local behavior. The tradeoff is, however, significantly higher computational cost and additional physical design information.

Figure 2.7: Two-dimensional distributed model of a two-layer mesh structured power grid.
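
As a minimal sketch of this two-dimensional resistive model, the following Python code assembles the nodal conductance matrix of an $N \times N$ mesh and solves for the DC node voltages. The function name `solve_mesh_ir_drop`, the uniform segment resistance, the ideal supply pads, and all numerical values are assumptions made for illustration and are not taken from [50].

```python
import numpy as np


def solve_mesh_ir_drop(n, r_seg, v_dd, loads, pads):
    """Solve the DC node voltages of an n x n resistive mesh.

    n     : number of rows and columns (assumed square grid)
    r_seg : resistance of each grid segment in ohms (assumed uniform)
    v_dd  : supply voltage applied at the pad nodes in volts
    loads : dict mapping (row, col) to the current drawn in amperes
    pads  : list of (row, col) nodes tied to an ideal v_dd supply
    """
    g = 1.0 / r_seg
    num = n * n
    G = np.zeros((num, num))

    def idx(r, c):
        return r * n + c

    # Stamp each horizontal and vertical resistor into the conductance matrix.
    for r in range(n):
        for c in range(n):
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < n and cc < n:
                    a, b = idx(r, c), idx(rr, cc)
                    G[a, a] += g
                    G[b, b] += g
                    G[a, b] -= g
                    G[b, a] -= g

    # Load currents leave the grid, so they appear as negative injections.
    i_vec = np.zeros(num)
    for (r, c), current in loads.items():
        i_vec[idx(r, c)] -= current

    # Pad nodes are fixed at v_dd; solve only for the remaining nodes.
    pad_idx = [idx(r, c) for r, c in pads]
    pad_set = set(pad_idx)
    free = [k for k in range(num) if k not in pad_set]
    v = np.full(num, float(v_dd))
    rhs = i_vec[free] - G[np.ix_(free, pad_idx)] @ v[pad_idx]
    v[free] = np.linalg.solve(G[np.ix_(free, free)], rhs)
    return v.reshape(n, n)


# Illustrative values only: 5 x 5 grid, corner pads, one load at the center.
voltages = solve_mesh_ir_drop(
    n=5, r_seg=0.05, v_dd=1.0,
    loads={(2, 2): 0.5},
    pads=[(0, 0), (0, 4), (4, 0), (4, 4)],
)
print("Worst-case IR drop (V):", 1.0 - voltages.min())
```

A dense matrix is used here only for readability. A practical grid would require a sparse representation, which is the topic of the following section.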

### 2.1.3.2 Efficient simulation of large scale power grid

Due to the scaling of technologies and increasing on-chip circuit complexity, the performance and reliability of a power delivery network degrade [51]. A more accurate model of a power delivery network, supporting highly accurate power noise analysis, is therefore required for power delivery network design and verification. Higher spatial resolution in a two-dimensional model of a power delivery network leads to greater accuracy [50]. The computational cost, however, increases significantly with higher spatial resolution. For example, consider a two-dimensional model of a power delivery network consisting of $N$ rows and $N$ columns. The total number of nodes is $N^2$, and the resulting matrix to solve the power delivery network is $N^2 \times N^2$ [52]. A power delivery network of a modern processor is extremely large, composed of billions of nodes and resistors. Accurate simulation of a large power delivery network consumes significant time and computational resources [53–55].
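
The growth in matrix size can be made concrete with a short calculation. The sketch below counts the nodes, the entries of the full $N^2 \times N^2$ matrix, and the number of nonzero entries for a uniform $N \times N$ mesh. The helper `mesh_matrix_stats` and the assumption of 8-byte matrix entries are illustrative and do not originate from [52].

```python
def mesh_matrix_stats(n):
    """Return (nodes, dense_entries, nonzeros) for an n x n resistive mesh."""
    nodes = n * n
    dense_entries = nodes * nodes          # full N^2 x N^2 matrix
    segments = 2 * n * (n - 1)             # horizontal plus vertical resistors
    nonzeros = nodes + 2 * segments        # diagonal plus two entries per resistor
    return nodes, dense_entries, nonzeros


for n in (10, 100, 1000):
    nodes, dense, nnz = mesh_matrix_stats(n)
    dense_gb = dense * 8 / 1e9             # assumes 8-byte floating point entries
    print(f"N={n}: nodes={nodes}, dense entries={dense} "
          f"({dense_gb:.3g} GB), nonzeros={nnz}")
```

The number of nonzero entries grows only linearly with the number of nodes, whereas the full matrix grows quadratically. This sparsity is what the solvers discussed next exploit.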

Many methods have been proposed to efficiently simulate large scale power delivery networks, following two primary approaches. The first approach applies linear algebraic techniques to exploit the sparse nature of the power grid, achieving improved simulation time and low memory requirements [56–63]. Many techniques have been developed, such as the Krylov-subspace method [57], hierarchical and macro-modeling methods [64], random walks [59], and domain decomposition [65]. Geometric multigrid-like techniques have been exploited to enhance the efficiency of the power grid analysis process [66]. To investigate irregular power grids, algebraic multigrid based techniques have also been developed [60, 67]. By utilizing the algebraic multigrid preconditioned conjugate gradient method in DC analysis, a practical power grid of 60 million nodes is solved at 0.01 mV accuracy in 170 seconds with two quad-core Intel Xeon E5506 CPUs at 2.13 GHz and 21.89 GB memory [56]. The transient analysis of a power grid is essential to capture simultaneous switching noise [37] and enhances accuracy when evaluating the effects of the inductance and capacitance on the power delivery network. Transient analysis is, however, more challenging in terms of computational time and resources [54]. Many efficient transient simulation solvers for power grid analysis have been proposed to increase simulation speed with low error and memory requirements [61–63].
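
To illustrate the preconditioned conjugate gradient iteration, a minimal sketch is provided below. Note that a simple Jacobi (diagonal) preconditioner is substituted for the algebraic multigrid preconditioner of [56], and a small dense matrix is used instead of a sparse one. The function names, the pad resistance `r_pad`, and all numerical values are assumptions chosen for illustration.

```python
import numpy as np


def build_grid_system(n, r_seg, r_pad, v_dd, pads, loads):
    """Return (G, b) for an n x n mesh whose pads connect to v_dd through r_pad."""
    g_seg, g_pad = 1.0 / r_seg, 1.0 / r_pad
    G = np.zeros((n * n, n * n))
    b = np.zeros(n * n)
    for r in range(n):
        for c in range(n):
            a = r * n + c
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < n and cc < n:
                    k = rr * n + cc
                    G[a, a] += g_seg
                    G[k, k] += g_seg
                    G[a, k] -= g_seg
                    G[k, a] -= g_seg
    for r, c in pads:
        G[r * n + c, r * n + c] += g_pad   # pad conductance to the supply
        b[r * n + c] += g_pad * v_dd       # Norton equivalent of the supply
    for (r, c), current in loads.items():
        b[r * n + c] -= current            # load current drawn from the node
    return G, b


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient for a symmetric positive definite A."""
    m_inv = 1.0 / np.diag(A)               # inverse of the diagonal preconditioner
    x = np.zeros_like(b)
    r = b - A @ x
    z = m_inv * r
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k + 1
        z = m_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter


# Illustrative values only: 8 x 8 grid, two pads, two loads.
G, b = build_grid_system(8, r_seg=0.05, r_pad=0.01, v_dd=1.0,
                         pads=[(0, 0), (7, 7)],
                         loads={(3, 4): 0.3, (5, 2): 0.2})
v, iterations = pcg(G, b)
print("Iterations:", iterations, "minimum node voltage (V):", v.min())
```

Each iteration requires only one matrix-vector product, which is inexpensive when the matrix is stored in a sparse format. The quality of the preconditioner determines the number of iterations, which is why multigrid preconditioners are attractive for very large grids.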

Another primary approach is to develop an accurate closed-form expression characterizing the effective resistance of a power grid to provide fast $IR$ drop analysis. Closed-form expressions have been developed to efficiently characterize the effective resistance between specific nodes within a two-layer mesh structured power grid [44,45]. These closed-form expressions have been developed to estimate the maximum on-chip $IR$ drop, assuming uniform current distribution [68]. In [55], the closed-form expressions describe the voltage drop at any node within a power distribution network with non-uniform current loads and voltage supplies. Faster simulation speed is achieved as compared with algorithms such as random walk and second order iterative methods.

## 2.2 Power delivery networks for 2.5-D and 3-D systems

The increase in performance of processors relies heavily on CMOS technology scaling, which faces the increasing difficulty of device scaling, reliability issues in deeply scaled devices, and the interconnect overhead brought by higher circuit complexity and aggressive metalization schemes [69]. The rate of performance improvement in modern processors has therefore decreased, and further improvement has become more costly. As a realistic technology candidate to extend Moore's Law by alleviating the bottleneck of global interconnect latency and growth in IC area, three-dimensional ICs have attracted significant attention within both academia and industry [19]. As one of several 3-D fabrication styles, TSV-based 3-D ICs exhibit several advantages [70]. Vertical integration in 3-D ICs yields smaller area and higher levels of integration. Short, vertical TSVs lead to both higher system performance and lower power dissipation [71]. 3-D stacked layers also make heterogeneous integration possible, enhancing the functionality of modern systems [17].

2.5-D technologies have been developed as an intermediate step towards 3-D systems since many challenges must be overcome before 3-D systems are widely used. 2.5-D integration, which originated from multichip module technology, faces fewer challenges and exhibits a lower cost than 3-D systems, and has therefore become a viable solution to significantly increase on-chip bandwidth [72]. By utilizing an interposer technology, a signal and power routing layer between the package and IC, a 2.5-D system supports high I/O densities up to $10^6/\text{cm}^2$, as compared with ceramic or organic packages, which typically provide I/O densities of $10^3/\text{cm}^2$ [73]. Moreover, the interconnect pitch of the interposer is on the order of $5\ \mu\text{m}$, which is an order of magnitude smaller than for ceramic packages [74]. Due to the advantages of 2.5-D systems, the fabrication process has made significant progress, allowing 2.5-D systems to evolve into a stand-alone manufacturing technology rather than a short-term intermediate technology on the path to fully integrated 3-D systems [19].

Several 3-D IC products have been developed, based on TSV technology, over the past few years [75–78]. Mass production of 3-D DRAM has been achieved by Samsung, targeting server farm applications, exhibiting a doubling in performance with half the power consumption [75]. In [76], a prototype 3-D wide I/O DRAM system, with multilayer DRAM die stacks on top of an SOC, has been introduced, targeting low power mobile applications. A new architecture for DRAM, utilizing 3-D technology to integrate heterogeneous dies, was introduced to increase the density and bandwidth of DRAM [77]. Hybrid memory cubes (HMC) have been integrated into advanced, high performance Intel server processors such as the Intel Knights Landing. High bandwidth memory (HBM), first introduced in [78], has attracted significant attention within industry due to the superior bandwidth and low cost as compared with other 3-D memory contenders. HBM is utilized in high end graphics processing units (GPUs) from Nvidia and AMD with an advanced interposer-based 2.5-D technology [79].

Despite the mass production of 3-D memory and the integration of 3-D memory with high performance CPUs and GPUs, 3-D ICs suffer from power integrity issues and low power efficiency within the 3-D power delivery network due to the high currents transferring vertically across the stacked planes. 3-D ICs for high performance computation, particularly where the current demand is significant, remain a challenging technology from the perspective of power delivery. Consequently, current 3-D ICs are limited to low power applications [75–78]. It is therefore necessary to evaluate the power delivery network within 2.5-D and 3-D systems to further explore challenges in high power applications. The structure of the power delivery network within a 2.5-D and 3-D system is therefore discussed, respectively, in Sections 2.2.1 and 2.2.2.



Figure 2.8: Cross-sectional view of Xilinx 2.5-D FPGA [80].

### 2.2.1 2.5-D power delivery network

Due to improvements in on-chip bandwidth and relatively low cost as compared with TSV-based 3-D ICs, interposer-based 2.5-D systems have become increasingly well accepted. Examples include field programmable gate arrays (FPGAs) and high end GPUs. The Xilinx 2.5-D FPGA is illustrated in Figure 2.8. A passive silicon interposer is placed between the package and IC. Multiple FPGA dies are placed side by side on top of the silicon interposer. The dies are interconnected through metal lines, also called redistribution layers (RDL), within the interposer for inter-chip communication. The vertical connections between the package and interposer are achieved with fine pitch TSVs. From the perspective of power delivery, an interposer-based 2.5-D technology is not significantly different from a conventional 2-D power delivery network.

A system-in-package (SiP), another approach to 2.5-D systems, has also attracted significant attention from both academia and industry. By integrating multiple ICs within the same package, less system area, shorter signal delay, and lower system power loss are achieved. An SiP is used in the high end Intel Xeon Phi server lineup, Knights Landing [81], to integrate a high performance computing die with multiple HMCs within the same package. In the TPU [82], the Google machine learning accelerator, an SiP supports a VR-on-package topology, where the voltage regulator is moved from the PCB to the package to achieve lower system power loss due to the close proximity between the on-chip load and the VR.

In the VR-on-package topology, in contrast, the power delivery system can be significantly different from a conventional 2-D power network. A cross-sectional view of a VR-on-package is illustrated in Figure 2.9. The VR-on-package system consists of four parts: (1) an IC, (2) two VRs placed next to the IC, (3) a package that supports a system-in-package environment, and (4) decoupling capacitors placed on the bottom side of the package (not shown in the figure). The package includes the power delivery network between the ball grid array (BGA) power pins and the VR, and the RDL (shown as redistribution layers in Figure 2.9) between the VR and the controlled collapse chip connection (C4) power pins. A high voltage is initially transferred from the BGA power pins to the VRs through the power network within the package, as the solid arrow indicates. After the voltage is converted to 0.8 volts, the current is transferred from the VR to on-chip through the RDL, as indicated by the dashed arrow shown in Figure 2.9.

Figure 2.9: Cross-sectional view of VR-on-package with two PoL converters placed next to an IC. The solid and dashed arrows depict the current path, respectively, between the ball grid array (BGA) power pins and the VRs and between the VRs and the IC.

### 2.2.2 3-D power delivery network

3-D ICs exploit the vertical dimension, providing a promising method to extend scaling. Several 3-D fabrication styles exist such as wire bonding, contactless, and TSV-based 3-D integration [71]. As compared with other 3-D technologies, TSV-based 3-D ICs exhibit advantages such as smaller area, heterogeneous integration, and a dedicated substrate for each layer. TSV-based 3-D ICs are a leading candidate for vertical integration and are therefore a primary focus of this dissertation.

In a TSV-based 3-D IC, multiple layers of planar devices are stacked in the vertical dimension, as shown in Figure 2.10. Communication between these layers is achieved with fine grained TSVs and microbumps between layers. In the TSV-based approach, dedicated layers are used for different types of devices and systems, such as processors, memories, and analog and/or RF circuits. Note that TSV-based 3-D ICs facilitate heterogeneous integration because each layer can be separately fabricated and optimized using a preferable process for each layer.

Since multiple layers are stacked through TSVs and microbumps in 3-D ICs, the current is transferred from the C4 bumps to each layer through P/G TSVs, as illustrated in Figure 2.10. Each functional layer within a 3-D IC is designed and individually optimized. The power delivery network of each layer is therefore similar to a 2-D power network, as described in Section 2.1.2. The most significant difference between a 2-D and 3-D power delivery network is the P/G TSVs. The P/G TSVs play an important role in the construction of a 3-D power network. In addition to transferring current vertically to the upper layers, P/G TSVs also deliver current horizontally to each layer specific 2-D power network. The P/G TSVs are therefore key components in the electrical model of a 3-D power network.

Figure 2.10: TSV-based 3-D IC.

The model of a 3-D power network includes the electrical characteristics of the decoupling capacitors [84,85] and TSVs [86,87]. The power and ground grids can be represented as either distributed or lumped. The physical characteristics also play an important role in the model of a 3-D power network, such as the TSV fabrication process (via-first, via-middle, or via-last) [88], the substrate [85,88], and the effects of the unique current paths within TSV-based 3-D circuits [89].

A model of an N-layer power delivery network is illustrated in Figure 2.11. The model includes both the resistive and inductive components of a conventional 2-D power network. The inductance and resistance of the TSVs are also included. The capacitance of the TSV is ignored due to the decoupling capacitances being much larger than the TSV capacitances [83]. A single current load is shown in the figure, reducing the computational complexity of the model. Additional current loads are included across the entire power delivery network to represent the current demands of the different circuits. Single current sources can be placed at each node of a power grid depending upon the current demands of each circuit block.

Figure 2.11: Distributed model of an N-layer power delivery network in a 3-D IC [83].

### 2.2.3 Power and ground TSV

As previously mentioned, a detailed electrical model of the P/G TSVs is necessary to accurately evaluate a multi-layer power delivery network in 3-D ICs. Fabricating a TSV is, however, a complex process composed of dozens of steps, requiring state-of-the-art techniques and equipment. A through silicon via is the vertical metal connection passing through the silicon wafer between two adjacent layers within a 3-D IC [91]. Different TSVs are categorized based on the TSV geometry, functionality, distribution, and fabrication technology. In terms of geometry, a TSV can be categorized as a core, coaxial, cylinder, cubic, or conical TSV [91]. In terms of TSV functionality, a TSV can be categorized as a signal, thermal, or P/G TSV [19]. In terms of TSV distribution, a group of TSVs can be categorized as clustered, lined, grouped, uniformly distributed, or hexagonal [92, 93]. In terms of TSV fabrication, a TSV can be categorized as via-first, via-middle, or via-last [88].

Figure 2.12: Electrical parameters characterizing a cylinder shaped TSV [90].

A circuit model of a cylinder TSV is shown in Figure 2.12. Closed-form expressions for the electrical parameters, shown in Figure 2.12, are provided in [94], and are used to estimate power noise in 3-D systems. The magnitude of these electrical parameters is based on the following expressions. The equivalent resistance of a TSV is [94]

$$Res = \frac{1}{\sigma_w} \cdot \frac{L}{\pi R^2} , \quad (2.1)$$

where  $R$  and  $L$  are, respectively, the radius and length of a TSV.  $\sigma_w$  is the conductivity of the TSV materials (e.g., copper or tungsten).
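
As a brief numerical illustration of (2.1), the sketch below evaluates the resistance of a cylindrical TSV. The function name, the TSV dimensions, and the conductivity value assumed for copper are hypothetical choices for illustration and are not taken from [94].

```python
import math


def tsv_resistance(length_m, radius_m, conductivity_s_per_m):
    """DC resistance of a cylindrical TSV following (2.1), in ohms."""
    cross_section = math.pi * radius_m ** 2
    return length_m / (conductivity_s_per_m * cross_section)


# Assumed example: 50 um long, 2.5 um radius TSV with an assumed
# copper conductivity of 5.8e7 S/m.
r_tsv = tsv_resistance(50e-6, 2.5e-6, 5.8e7)
print(f"TSV resistance: {r_tsv * 1e3:.1f} milliohms")
```

The resistance scales linearly with the TSV length and inversely with the square of the radius, so halving the radius increases the resistance by a factor of four.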

The equivalent inductance of a TSV at DC is [94]

$$L_{self} = \alpha \frac{\mu_0}{2\pi} \left[ \ln\left(\frac{L + \sqrt{R^2 + L^2}}{R}\right) L + R - \sqrt{R^2 + L^2} + \frac{L}{4} \right] , \quad (2.2)$$

$$L_{mutual} = \beta \frac{\mu_0}{2\pi} \left[ \ln\left(\frac{L + \sqrt{P^2 + L^2}}{P}\right) L + P - \sqrt{P^2 + L^2} \right] , \quad (2.3)$$

$$\alpha = 1 - e^{\frac{-4.3L}{D}} , \quad (2.4)$$

$$\beta = 1 . \quad (2.5)$$

The equivalent inductance of a TSV for high frequency operation is [94]

$$L_{self} = \alpha \frac{\mu_0}{2\pi} \left[ \ln\left(\frac{2L}{R}\right) - 1 \right] L , \quad (2.6)$$

$$L_{mutual} = \beta \frac{\mu_0}{2\pi} \left[ \ln\left(\frac{L + \sqrt{P^2 + L^2}}{P}\right) L + P - \sqrt{P^2 + L^2} \right] , \quad (2.7)$$

$$\alpha = 0.94 + 0.52e^{-10\left|\frac{L}{D}-1\right|} , \quad (2.8)$$

$$\beta = 0.1535 \ln \frac{L}{D} + 0.592 , \quad (2.9)$$

where $L_{self}$ is the self-inductance of a single TSV. $L_{mutual}$ is the mutual inductance between two adjacent TSVs. $L$, $R$, $D$, and $P$ are, respectively, the length, radius, diameter, and pitch of a TSV, as illustrated in Figure 2.13. The TSV pitch refers to the distance between the centers of two adjacent TSVs. DC refers to the frequency range below 200 MHz, and high frequency refers to the frequency range beyond 800 MHz. $\mu_0$ is the permeability of free space, which is assumed here to be $4\pi \times 10^{-7}$ H/m. The coupling capacitance $C_C$ between each TSV pair is [20]

Figure 2.13: Geometric parameters in array of P/G TSVs.

$$C_C = 0.4\alpha\beta\gamma \frac{\epsilon_{Si}}{S} \pi D L , \quad (2.10)$$

$$\alpha = 0.225 \ln(0.97 \frac{L}{D}) + 0.53 , \quad (2.11)$$

$$\beta = 0.5711 \left( \frac{L}{D} \right)^{-0.988} \ln(S_{gnd_{\mu m}}) + (0.85 - e^{1.3 - \frac{L}{D}}) , \quad (2.12)$$

$$\gamma = 1 , \quad (2.13)$$

where  $L$  and  $D$  are, respectively, the length and diameter of the TSV.  $\epsilon_{Si}$  is the permittivity of the Si substrate. The TSV separation  $S$  is expressed as  $S = P - D$  (see Figure 2.11).
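
The DC inductance expressions (2.2) to (2.5) can be evaluated directly, as sketched below. The function names and the example TSV dimensions are assumptions for illustration only. The high frequency expressions and the coupling capacitance are omitted from this sketch.

```python
import math

MU_0 = 4 * math.pi * 1e-7  # permeability of free space in H/m


def tsv_self_inductance_dc(length, radius):
    """DC self-inductance of a cylindrical TSV following (2.2) and (2.4)."""
    diameter = 2 * radius
    alpha = 1 - math.exp(-4.3 * length / diameter)
    root = math.sqrt(radius ** 2 + length ** 2)
    bracket = (math.log((length + root) / radius) * length
               + radius - root + length / 4)
    return alpha * MU_0 / (2 * math.pi) * bracket


def tsv_mutual_inductance_dc(length, pitch):
    """DC mutual inductance between two TSVs following (2.3) and (2.5)."""
    beta = 1.0
    root = math.sqrt(pitch ** 2 + length ** 2)
    bracket = math.log((length + root) / pitch) * length + pitch - root
    return beta * MU_0 / (2 * math.pi) * bracket


# Assumed example geometry: 50 um length, 2.5 um radius, 10 um pitch.
l_self = tsv_self_inductance_dc(50e-6, 2.5e-6)
l_mutual = tsv_mutual_inductance_dc(50e-6, 10e-6)
print(f"L_self = {l_self * 1e12:.1f} pH, L_mutual = {l_mutual * 1e12:.1f} pH")
```

For this assumed geometry, both inductances are on the order of tens of picohenries, and the mutual inductance decreases as the pitch increases.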

In [94], a cylindrical shape for a TSV is assumed in the development of the closed-form expressions. Due to the imperfect manufacturing process of TSVs, a tapering effect is inevitable, particularly in high TSV aspect ratios. Also, many benefits of tapering have been reported [95], such as enhanced manufacturability, a smaller keep out zone (the surrounding area where the devices are affected by TSV induced mechanical and thermal stress), and better balance between power and thermal distribution within 3-D ICs [96]. Closed-form expressions have therefore also been developed to characterize the impedance of tapered shape TSVs [97].

A practical P/G TSV structure is a large array of tens of thousands of TSVs, as shown in Figure 2.13. A paired TSV model is therefore insufficient to completely characterize a single TSV within a large array of TSVs. The TSVs around a reference TSV (the TSV within a paired model) affect the electromagnetic field in dense TSV networks, changing the electrical parameters such as the resistance, capacitance, and inductance between a TSV pair. Several distribution topologies exist including single, clustered, and distributed. Due to long distance inductive effects within a large TSV array, the distribution topology of the P/G TSVs affects the ambient electromagnetic field [99], leading to variations in the TSV impedance. A distribution topology aware closed-form expression has therefore been developed [25] to capture the effective inductance of a TSV within a large TSV array. For an $m \times n$ distributed array, the equivalent inductance is

Figure 2.14: High resolution scanning electron microscope (SEM) images of the etching process of a TSV: a) straight etched TSV hole, and b) tapered etched TSV hole [98].

$$L_{eq} = L_{self} + 4(-1)^{m+n} \beta \frac{\mu_0}{2\pi} \sum_{i=1}^m \sum_{j=0}^n \left[ \ln \left( \frac{\frac{L}{P} + \sqrt{\frac{L^2}{P^2} + m^2 + n^2}}{\sqrt{m^2 + n^2}} \right) \left( \frac{L}{P} \right) + \sqrt{m^2 + n^2} - \sqrt{\frac{L^2}{P^2} + m^2 + n^2} \right] P, \quad (2.14)$$

where the individual parameters are described in (2.2)-(2.9).

## 2.3 Summary

A robust power delivery network with high efficiency is the key issue in enabling high power, high performance computing systems. With aggressive device scaling and increasing circuit complexity, on-chip power noise has become a challenging issue, leading to significant degradation in performance. Advanced on-chip metalization schemes are therefore required to suppress power noise with increasing current demand.

With the greater number of on-chip metal layers, the structure of the power delivery network has become highly complex. A breakdown of the individual components within an on-chip power network is described in this chapter. The global power grid, local power rail, and via stacks are critical in terms of exploring different metalization schemes during early stages of the design process. A thorough understanding of the structure of each component of the power delivery network is necessary. The local power rails and via stacks have, however, historically received little attention.

An accurate and computationally efficient distributed model is critical for the development, optimization, and verification of high performance power delivery networks. Lumped one-dimensional and distributed two-dimensional circuit models of a power delivery network are described in this chapter. Despite the computational efficiency of a lumped model, a distributed model is preferred due to the higher accuracy and the capability to capture local characteristics within a power network. With the increasing complexity of a distributed model, dedicated circuit solvers and algorithms are required to reduce the computational time and memory resources. Novel algorithms have also been developed with enhanced computational performance, assuming a grid structured power delivery network.

2.5-D and 3-D integration, as practical techniques to extend Moore's Law, have been investigated by both academia and industry. By utilizing an interposer, a 2.5-D system supports high I/O density, and faces fewer challenges and lower cost as compared with 3-D systems. The power delivery bottleneck has made it challenging to provide high power in high performance 2.5-D and 3-D systems. As the key component in 3-D power networks, the TSV impedance characteristics should be accurately extracted, considering different distribution topologies and manufacturing processes.

Although several 3-D IC products have been developed over the past few years, these systems are limited to low power applications. The development of power delivery networks for 2.5-D and 3-D systems, supporting high currents, is therefore an important objective.

## Chapter 3

# Challenges in High Current 2-D and 3-D systems

The recent rapid growth of artificial intelligence (AI), machine learning, and the internet-of-things (IoT), and the consistent requirements for higher speed scientific computation and cloud computing all heavily rely on high performance computing (HPC). Consider the training process for deep learning, where image classification of a single picture with AlexNet consumes 720 MFLOPs [100]. As a result, the system complexity and processing power of high performance ICs continue to increase. Over the past few years, the number of cores and throughput of high performance ICs have significantly increased. The number of cores in recent high performance processors developed by AMD, Intel, IBM, and Nvidia is illustrated in Figure 3.1. High end desktop processors approached twenty cores by the end of 2018. For high end server processors and GPUs, heavily utilized in HPC applications, the number of cores can easily exceed 50. The IRDS also predicts a continuing increase in the number of cores, where the number of cores in CPUs and GPUs will reach, respectively, 114 and 288 by 2033 [18]. Furthermore, the throughput of high end GPUs continues to increase, as illustrated in Figure 3.2. Due to the utilization of 3-D memory technologies in high end GPUs, the bandwidth has also significantly increased, leading to a large improvement in throughput over the past few years. With high bandwidth memory (HBM), the tensor processing units (TPUs) developed by Google achieve a throughput of 45 TFLOPS.



Figure 3.1: Number of cores in high performance ICs by AMD, Intel, Nvidia, and IBM, including high end desktop processors, server processors, and high end GPUs [21, 82, 101–115].

At the device and circuit levels, the total power consumption of an integrated circuit can be described by

$$P_{total} = P_{dynamic} + P_{short\_circuit} + P_{static} , \quad (3.1)$$

where the dynamic and leakage power dominate the total power consumption [69]. Dynamic power is due to charging and discharging the parasitic and load capacitances. The increase in the throughput of the processor and the slowdown in scaling the power supply voltage and load capacitance have led to an increase in the dynamic power consumption. Historically, the leakage power consumption has been insignificant as compared with the dynamic power consumption. In recent years, aggressive CMOS device scaling has led to a significant decrease in the gate oxide thickness and channel length, which has contributed to a significant increase in gate and channel leakage currents. Combined with the large increase in the number of on-chip transistors, leakage power has become a primary component of the total power dissipated in modern integrated circuits.

Figure 3.2: Throughput (in TFLOPS) of modern high end GPUs developed by AMD and Nvidia, and tensor processing units (TPUs) developed by Google [21,82,101–115].

Low power techniques have been developed to reduce the power consumed by high performance processors. For example, dynamic frequency scaling (DFS) reduces the dynamic power consumption by lowering the clock frequency when a circuit is moderately active or a fast response is not required [15]. Subthreshold circuits [116] and near threshold circuits [117] can reduce the dynamic power consumption by lowering the power supply voltage at the cost of lower performance. Furthermore, to limit leakage currents, advanced materials and technologies have been developed, such as high-k dielectrics [118] and FinFET technology [119]. Another effective technique to lower leakage currents is power gating [120]. The principle is to disconnect a circuit from the power supply when the circuit is idle.

Figure 3.3: Power consumption of modern high performance processors developed by AMD, Nvidia, Intel, IBM, and Google [21, 82, 101–115].

Due to the requirement for a larger number of cores and higher processor throughput, the power consumption of an HPC system continues to increase despite the already high power consumption, challenging system performance and reliability [18]. The power consumption of recently developed CPUs, GPUs, and ASICs for HPC systems is illustrated in Figure 3.3. The high performance processors used for data center and AI accelerator applications dissipate particularly high levels of power, which can exceed 300 watts. The IRDS predicts increasing power consumption in high performance processors, illustrated as the dashed line shown in Figure 3.3. Moreover, with the scaling of the power supply voltage and the demand for greater power, high current HPC systems are being considered. Kiloampere processors are expected in the near future. High currents flowing within an HPC system lead to challenging issues within the power delivery network from the printed circuit board (PCB) to on-chip, which limit the performance of HPC systems.

The challenges in power delivery networks caused by these high current HPC systems are discussed in this chapter. The high current challenges of delivering power at the PCB and package levels are described in Section 3.1, followed by recent research to alleviate this issue. The challenges of on-chip 2-D and 3-D power delivery are discussed in Section 3.2. A summary of the chapter is provided in Section 3.3.

### **3.1 High current challenges at the PCB and package levels**

As described in Chapter 2, the hierarchy of the power delivery network spans from the PoL voltage regulator to on-chip, including the PCB, package, and IC. As illustrated in Figure 3.4, current is passed from the voltage regulator to on-chip through a resistive/inductive path. This path includes the power planes within the PCB and package, and the power pins between each hierarchy of the power networks; for example, the ball grid array (BGA) and controlled collapse chip connections (C4). Due to the high current within an HPC system and the complex characteristics of the power delivery system, the power loss before reaching on-chip, electromigration, and thermal issues have become challenging topics. The “last inch” power loss and significant $IR$ drop due to the high current within an HPC system are described in Section 3.1.1, followed by the introduction of 2.5-D systems to alleviate this issue. The challenge of electromagnetic interference (EMI) in 2.5-D systems is also discussed. The challenges of electromigration within a BGA and C4 due to the high currents are introduced in Section 3.1.2. Voltage stacking, as a technique to reduce current, is also discussed in this section. Thermal effects within the PCB and package levels are described in Section 3.1.3. Advanced cooling solutions for HPC systems are also introduced.



Figure 3.4: Resistive path (dashed line) between the VR and on-chip load across the board and package

### 3.1.1 “Last inch” power loss

The last inch power loss in HPC systems has recently attracted attention from both academia and industry [29]. The term refers to the undesirable power loss before reaching the on-chip load due to the high currents passing through the resistive path within the board and package, as illustrated in Figure 3.4. The last inch power loss is

$$P_{last\_{inch}} = I_{average}^2 R_{resistive\_{path}} , \quad (3.2)$$

where  $I_{average}$  and  $R_{resistive\_{path}}$  refer, respectively, to the average current flowing from the VR to the on-chip load and the resistance of the current path. The complex characteristics of the resistive path are discussed in Chapter 2, where the PCB and package power planes, vias connecting adjacent power planes, and power BGA and C4 all contribute to the total resistance. In modern HPC systems, this resistance can range anywhere from 400 to 900  $\mu\Omega$ , which may seem negligible. The power loss however grows quadratically with the current, as described in (3.2). Consider an HPC system consuming 250 watts, and a supply voltage of 0.8 volts. The average current traveling through the resistive path is 312 amperes, leading to a significant power loss between 38.9 and 87.6 watts depending upon the parasitic resistance of the path. Furthermore, with the increasing current in HPC systems, as illustrated in Figure 3.3, the power loss is expected to grow significantly to where high performance computing will become power inefficient.
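
The arithmetic in this example can be reproduced with a few lines of Python. The function name `last_inch_loss` is hypothetical, and the inputs are the values quoted above. Note that the text rounds the current down to 312 amperes; without rounding, the current is 312.5 amperes and the losses are slightly higher, approximately 39.1 and 87.9 watts.

```python
def last_inch_loss(power_w, v_supply, r_path_ohm):
    """Return (average current, I^2 R loss) for the resistive path, per (3.2)."""
    i_average = power_w / v_supply
    return i_average, i_average ** 2 * r_path_ohm


# Values from the example: 250 W load, 0.8 V supply, 400 to 900 micro-ohms.
for r_path in (400e-6, 900e-6):
    current, loss = last_inch_loss(250.0, 0.8, r_path)
    print(f"R = {r_path * 1e6:.0f} uOhm: I = {current:.1f} A, "
          f"loss = {loss:.1f} W ({100 * loss / 250.0:.1f}% of the load power)")
```

Since the loss grows with the square of the current, doubling the load power at a fixed supply voltage quadruples the last inch power loss.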

Two directions exist to reduce the last inch power loss. One approach is to reduce the current flowing through the resistive path, as described in the following section. The other approach is to reduce the parasitic resistance. By exploiting advanced system-in-package topologies, a VR-on-package is an effective candidate to reduce the parasitic resistance along the current paths. A VR-on-package refers to a 2.5-D system, where the PoL voltage regulator is placed on the same package as the processor, as illustrated in Figure 3.5. By moving the VR from the PCB to the package, the original resistive path is shorter due to the close distance between the VR and the on-chip load. In addition, the voltage across the resistive path between the PCB and package is increased from 0.8 to 12 volts while passing less current. As a result, the last inch power loss is lower with a VR-on-package topology. As compared with an on-chip VR, a VR-on-package topology does not require die area and supports the integration of large passive circuit elements within the VR.

Figure 3.5: VR-on-package where the VR is moved from the PCB to the package.
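
The benefit of raising the voltage along the board-level path can be sketched numerically. The example below compares the current and $I^2R$ loss in the same resistive segment when power is delivered at 0.8 volts and at 12 volts. The function name, the 300 $\mu\Omega$ segment resistance, the 250 watt load, and the assumption of an ideal (lossless) converter are all illustrative assumptions rather than values from this work.

```python
def path_loss(power_w, v_path, r_path_ohm):
    """Return (current, I^2 R loss) for power delivered at v_path through r_path_ohm."""
    current = power_w / v_path
    return current, current ** 2 * r_path_ohm


POWER_W = 250.0        # assumed load power
R_BOARD_OHM = 300e-6   # assumed resistance of the board-level segment

i_low, loss_low = path_loss(POWER_W, 0.8, R_BOARD_OHM)    # VR on the PCB
i_high, loss_high = path_loss(POWER_W, 12.0, R_BOARD_OHM)  # VR on the package

print(f"0.8 V path: I = {i_low:.1f} A, loss = {loss_low:.2f} W")
print(f"12 V path:  I = {i_high:.1f} A, loss = {loss_high:.3f} W")
print(f"Loss reduction factor: {loss_low / loss_high:.0f}")
```

For the same delivered power, the current falls by the voltage ratio of 15 and the resistive loss in that segment falls by the square of the ratio. The conversion loss within the VR, which is ignored here, would reduce the net benefit in practice.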

As the operating frequency of a resonant converter increases, EMI has become a major concern, particularly in a VR-on-package environment where the converters are placed near the sensitive circuits [121–125]. EMI is the phenomenon where an electromagnetic (EM) disturbance is generated by the surrounding electronic devices, affecting the operation of the electrical circuits through a conducted or radiated path. Conducted EMI, also called coupling noise, consists of inductive and capacitive coupling through an electrical path. Radiated EMI is the undesired EM radiation generated by high frequency electronic devices without an electrical path. Not only may EMI affect the operation of the circuits within a package, EMI can also pollute the surrounding electromagnetic environment, which is crucial for wireless devices.



Figure 3.6: Radiated EMI pollutes the surrounding electromagnetic environment.

Shielding techniques [122–124] have been developed to reduce radiated EMI. The radiated EMI passing through the power delivery network within the package is, however, not affected by these shielding techniques. New techniques that reduce radiated EMI generation are therefore highly desirable in high current SiP systems.

### 3.1.2 Electromigration challenges

Electromigration, a long-standing problem within the IC community [126], refers to the transport of material caused by the gradual movement of the ions within a conductor due to momentum transfer between conducting electrons and diffusing metal atoms. Due to the high temperature and current density in the metal lines, electromigration is a well known reliability issue in on-chip power delivery networks. The mean time to failure (MTTF), described by Black's equation [126], is widely used to quantify the electromigration reliability of a conductor,

$$MTTF = \frac{A}{J^n} \cdot \exp\left(\frac{E_a}{K \cdot T}\right) , \quad (3.3)$$

where $A$ is a constant based on the cross-sectional area of the conductor, $J$ is the current density, $E_a$ is the activation energy of the conductor material, $K$ is the Boltzmann constant, $T$ is temperature, and $n$ is the scaling factor. The current density, conductor material, and temperature play an important role in determining the MTTF. With the increase in power consumption in HPC systems, as predicted by the IRDS, and scaling of the on-chip metalization system, the current density within the metal line is increasing significantly, leading to challenging electromigration issues in power delivery networks.
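The sensitivity of (3.3) to current density and temperature can be explored with a short sketch. The constant $A$ cancels when the MTTF is expressed relative to a reference operating point. The default values `ea_ev = 0.9` and `n = 2.0`, as well as the example temperatures, are illustrative assumptions rather than values from this work.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def relative_mttf(j_ratio, temp_k, ref_temp_k, ea_ev=0.9, n=2.0):
    """MTTF relative to a reference operating point, from Black's equation.

    j_ratio is J / J_ref. The constant A cancels in the ratio.
    ea_ev and n are illustrative defaults, not values from the text.
    """
    current_term = j_ratio ** (-n)
    thermal_term = math.exp(
        ea_ev / BOLTZMANN_EV_PER_K * (1.0 / temp_k - 1.0 / ref_temp_k)
    )
    return current_term * thermal_term

# Doubling the current density at constant temperature (with n = 2).
print(relative_mttf(2.0, 378.0, 378.0))
# Raising the temperature from 378.15 K to 398.15 K at constant current density.
print(relative_mttf(1.0, 398.15, 378.15))
```

With $n = 2$, doubling the current density reduces the MTTF to one quarter of the reference value, while the exponential term makes the MTTF strongly dependent on temperature.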

Temperature is a critical factor in HPC systems. The process for removing heat (the thermal removal issue) as well as advanced cooling systems are introduced in Section 3.1.3. As described by (3.3), one way to relieve electromigration is by using enhanced or novel materials for the on-chip metal lines [127–131]. Bamboo interconnects have been introduced to alleviate electromigration by reducing the width of the interconnect, where the width is smaller than the grain size [127]. In this way, the grain boundary becomes perpendicular to the interconnect, degrading the grain diffusion process. In a traditional dual damascene fabrication process, hydrogenated amorphous silicon carbide (a-SiCx:H) is used as a material for metal capping [128]. Cobalt tungsten phosphide (CoWP) has been introduced as a metal capping and surface coating material [128], which has a higher activation energy $E_a$ as compared with a-SiCx:H. A one hundred fold increase in MTTF has been achieved [128]. Different alloys for solder balls and under bump metal have been proposed to alleviate electromigration in BGAs [129]. A high current carrying and highly reliable area array interconnect based on a direct Cu-to-Cu connection has been introduced [130]. A metal pitch of 100 $\mu\text{m}$ with a high current density of $10^6\,\text{A/cm}^2$ has been claimed. A one hundred fold increase in current carrying capacity can be achieved by carbon nanotube copper composites [131], offering the opportunity to replace a copper based metal line with carbon nanotubes or carbon nanoribbons. These advanced materials alleviate electromigration, although they are difficult to integrate with existing CMOS fabrication technologies.

From (3.3), another method to alleviate electromigration is to reduce the current density of the conductors within the power delivery network. One way to reduce current density, assuming constant on-chip current demand, is to increase the die area. Increasing the die area is, however, undesirable since a smaller footprint is generally preferred. An interposer based MCM-GPU architecture has been proposed [132], where multiple GPU modules are integrated within the same package. With a larger package size and lower current density, the MCM-GPU is 45.5% faster than the largest monolithic GPU.

Figure 3.7: 16 core, four layer voltage stacking system.

Another way to reduce current density is to lower the on-chip current demand by reusing current through voltage stacking. Voltage stacking, also referred to as charge recycling [133] and multi-story power delivery networks [134], is a circuit and architectural level technique that vertically connects multiple voltage domains in series [133–136]. In this way, charge flowing through one voltage domain can be “recycled” within the following voltage domains. A high input voltage and low on-chip current are achieved. An example of a 16 core, four layer voltage stacking system is illustrated in Figure 3.7, where the 16 core processor is divided into four voltage domains. Each voltage domain includes four cores and shares the same voltage level, where $V_1 = V_2 = V_3 = V_4$. The processor cores within the same voltage domain, for example within layer 1, are connected in parallel. In contrast, processor cores in different voltage domains are connected in series, ensuring that the same amount of current flows from layer 1 to layer 4.

Assuming a constant power consumption, an $n$-layer voltage stacking system can ideally reduce the on-chip current demand by a factor of $n$, which reduces the $IR$ drop by a factor of $n$ and the power loss by a factor of $n^2$. Voltage stacking is therefore a useful technique to alleviate electromigration in HPC systems, reducing the metal resources required by the power network and power pinouts. Ideally, the current passing through each stacked layer is the same. In practice, load or current imbalances exist across the stacked layers. These load imbalances lead to voltage variations across the $n$-layer voltage domains, challenging system performance and reliability. Extensive research has been conducted to address this load imbalance issue [133–136]. A fully integrated on-chip push-pull switched capacitor converter has been developed to regulate the voltage when load imbalances occur [136]. A hybrid voltage stacking system is proposed in [133], where an off-chip VRM combined with an on-chip IVR is utilized to address load imbalances. In this system, 82.4 mm<sup>2</sup> of on-chip area is reported to be dedicated to the IVRs [133]. Die unfolding is proposed in [135], where the circuit blocks are grouped into two voltage domains based on a power consumption profile. Load imbalances are alleviated and the system is less dependent on the work load schedule. Solutions to address significant load imbalances within high power voltage stacking applications are highly desirable.
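The ideal scaling with the number of stacked layers can be sketched as follows. The example assumes perfectly balanced layers, reuses the 250 W and 0.8 V figures from the last inch example, and takes a hypothetical delivery path resistance of 400 $\mu\Omega$. The function `stacked_delivery` is illustrative only.

```python
def stacked_delivery(total_power_w, layer_voltage_v, n_layers, path_resistance_ohm):
    """Ideal n-layer voltage stack with perfectly balanced layers.

    Returns (input voltage, supply current, IR drop, I^2 * R loss)
    for the delivery path feeding the stack.
    """
    input_voltage_v = n_layers * layer_voltage_v  # voltage domains connected in series
    current_a = total_power_w / input_voltage_v   # the same current passes through each layer
    ir_drop_v = current_a * path_resistance_ohm
    loss_w = current_a ** 2 * path_resistance_ohm
    return input_voltage_v, current_a, ir_drop_v, loss_w

flat = stacked_delivery(250, 0.8, 1, 400e-6)     # conventional, single voltage domain
stacked = stacked_delivery(250, 0.8, 4, 400e-6)  # four layer stack

print(f"current reduction: {flat[1] / stacked[1]:.0f}x")
print(f"IR drop reduction: {flat[2] / stacked[2]:.0f}x")
print(f"loss reduction:    {flat[3] / stacked[3]:.0f}x")
```

Any load imbalance between layers breaks the equal current assumption, which is why the regulation schemes described above are needed in practice.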

### 3.1.3 Advanced cooling systems

As discussed in Section 3.1.2, temperature has a significant effect on electromigration behavior. As noted in (3.3), a higher temperature reduces the mean time to failure of a conductor, lowering system reliability. Furthermore, a high on-chip temperature can increase the resistance of the interconnect, producing a larger delay, greater power noise, and more power loss. Heat removal is therefore critical to maintain high performance, power efficient HPC systems. Air cooling systems are widely used in HPC data centers, where a large, expensive infrastructure provides significant air flow. The energy cost of the air cooling system is not small, ranging from 25% to 60% of the total energy consumed by a data center; in 2016, data centers accounted for 2% of the total energy consumption of the United States. Due to the low heat capacity of air, an air cooling system exhibits low power efficiency, and can only support small area and low power densities.

Due to the high power consumption of modern HPC processors, high power density is an issue, challenging heat removal systems. The power density of HPC processors developed by Intel, Nvidia, and Google is listed in Table 3.1. Higher power densities are expected due to the greater power consumed in HPC systems. With the higher power density and limited heat removal capabilities of air cooling systems, dark silicon has become an important approach to mitigate increasing current demand. By exploiting the high heat capacity of certain engineered fluids, liquid cooling has also attracted significant attention to address this severe heat issue. A two phase immersion cooling system proposed by 3M has been developed [137], where multiple HPC motherboards are immersed within a tank filled with a low boiling point engineered fluid. The heat generated on-chip turns the fluid into vapor, which rises to the top and condenses on a coil lid condenser. In this way, the fluid passively circulates within the tank. High power efficiency can be achieved with a liquid cooled system with a power usage effectiveness (PUE) $< 1.02$, as compared with a PUE around 2.0 in air cooled systems [138]. A ten fold increase in area density and more than a six fold increase in power density can also be achieved with advanced liquid cooling systems.

Table 3.1: Power density of HPC systems

|                                             | TPU   | i9 7900X | Xeon Phi | Nvidia Titan |
|---------------------------------------------|-------|----------|----------|--------------|
| Package area (mm<sup>2</sup>)               | 3,969 | 2,295    | 4,636    | 2,184        |
| IC area (mm<sup>2</sup>)                    | 1,064 | 360      | 646      | 815          |
| Power (W)                                   | 500   | 140      | 300      | 300          |
| Package power density (W/mm<sup>2</sup>)    | 0.13  | 0.06     | 0.06     | 0.14         |
| IC power density (W/mm<sup>2</sup>)         | 0.47  | 0.39     | 0.46     | 0.37         |
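The power density entries of Table 3.1 follow directly from dividing the listed power by the package or IC area. The short sketch below recomputes these entries from the table data. The dictionary layout and the function name `power_density` are illustrative.

```python
# (package area in mm^2, IC area in mm^2, power in W), as listed in Table 3.1
processors = {
    "TPU": (3969, 1064, 500),
    "i9 7900X": (2295, 360, 140),
    "Xeon Phi": (4636, 646, 300),
    "Nvidia Titan": (2184, 815, 300),
}

def power_density(power_w, area_mm2):
    """Average power density in W/mm^2."""
    return power_w / area_mm2

for name, (pkg_area, ic_area, power) in processors.items():
    print(
        f"{name:13s} package {power_density(power, pkg_area):.2f} W/mm^2, "
        f"IC {power_density(power, ic_area):.2f} W/mm^2"
    )
```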

## 3.2 High current challenges of on-chip 2-D and 3-D power networks

The high current demand of HPC systems results in reliability and performance issues in PCBs and packages as well as the on-chip power delivery network. With scaling of the on-chip interconnect, the cross sectional area of the local power metal line is also scaled, quadratically increasing the resistance. The resistivity of copper, used in traditional on-chip interconnects, sharply increases as the metal line pitch decreases [139]. Furthermore, the increasing current demand in HPC power networks leads to significant $IR$ drops. Replacing the copper interconnect with a lower resistivity material is one way to reduce on-chip power noise. Due to the excellent conductivity of both heat and electricity, and the negative temperature coefficient of carbon-based graphene, few layer graphene (FLG) and graphene nanoribbons (GNRs) have been considered as alternative materials for the on-chip interconnects [140, 141]. Developing an effective metalization scheme for the global power grid, via stacks, and local power rails is another way to reduce the on-chip power noise [33]. Poorly designed or overdesigned power networks can either damage the reliability or decrease the performance of the integrated circuits.

As discussed in Chapter 2, 3-D integrated circuits are a strong candidate for heterogeneous integration. The 3-D platform further enables scaling by alleviating the latency bottleneck of the global interconnect. 3-D ICs are however more sensitive to higher current requirements as compared with 2-D ICs due to the smaller footprint.



Figure 3.8: Area comparison of 2-D IC with 3-D IC [90].

Assume  $N$  square modules within a 2-D IC and a 3-D IC with the area of each square module as 1 unit. As illustrated in Figure 3.8, for a 3-D IC with  $N$  layers, the area of a 3-D IC is  $1/N$  of the area of a 2-D IC, significantly increasing the on-chip current density. Developing a power delivery network for 3-D ICs based on carbon nanotube (CNT) TSVs is one way to alleviate high on-chip current densities by exploiting the significantly higher current capacity and heat transport capability of CNTs.

## 3.3 Summary

At the system level, processors with a large number of cores and high processing speed are highly desirable for high performance computing. The IRDS predicts a fourteen and three fold increase, respectively, in the number of cores and throughput in HPC systems. At the transistor level, dynamic and leakage power consumption do not scale with standard CMOS scaling, requiring advanced low power techniques. As a result, power consumption will continue to increase in HPC systems, leading to significant current flowing within the power delivery system.

High current HPC systems introduce challenging design and reliability issues at the PCB, package, and on-chip levels. At the PCB and package levels, the last inch power loss has become a major concern, limiting system power efficiency. A VR-on-package is a useful candidate to address this issue; however, this topology suffers from significant EMI. Techniques to reduce EMI generation are highly desirable. Electromigration within the BGA and C4 is also a major system reliability issue due to the high current levels. Voltage stacking is an attractive method to address this issue by significantly reducing the on-chip current demand. A high power, voltage stacking scheme that considers load imbalances is highly desirable.

## Chapter 4

# Power Noise in Advanced FinFET Technology Nodes

The increasing demand for high density, high performance integrated circuits leads to aggressive technology scaling, enabling billions of transistors [142]. Due to the area and leakage current advantages as compared to planar CMOS, FinFETs have become the standard CMOS structure as technology is scaled below the 22 nm technology node [143]. While significant research effort is focused on deeply scaled transistors and emerging technologies, the RC interconnect impedance is challenging performance improvements brought by technology scaling. The parasitic capacitance of the local metal lines is lower due to the adoption of low-k dielectrics and air gap interconnects [144]. The significantly increasing resistance of the local interconnects has, however, become the dominant limitation to performance improvements despite faster devices and greater levels of integration [145]. Scaling the cross sectional area of the local interconnects quadratically increases the resistance. The resistivity of copper, used in traditional on-chip interconnects, sharply increases as the metal line pitch decreases [139], as illustrated in Figure 4.1. In this case, the local power network is also highly resistive, leading to significant on-chip power noise.

Replacing copper interconnect with a lower resistivity interconnect material is one way to reduce the effects of the “resistivity wall.” Silver is one such material, with a bulk resistivity lower than that of copper. Due to the excellent conductivity of both heat and electricity, and the negative temperature coefficient of carbon-based graphene, few layer graphene (FLG) and graphene nanoribbons (GNRs) have been considered as alternative materials for on-chip interconnects [140, 141]. Graphene has been listed in the technology roadmap from ITRS 2015 [145] and by many industrial research centers [146]. The thin film resistivity of three materials, silver, FLG, and GNRs, has been investigated, respectively, in [147–149]. A comparison of the resistivity of different materials with interconnect width scaling is also illustrated in Figure 4.1. The thin film resistivity of silver increases significantly at 50 nm, and eventually becomes larger than that of copper as the metal line pitch is scaled to 10 nm. By intercalating FLG with ferric chloride, a sheet resistance of $8.8\ \Omega/\square$ has been reported [149]. Based on the thickness of five layer graphene, the resistivity of FLG is lower than that of copper, particularly when the metal line pitch is small, making FLG a promising material for a highly resistive local power network. GNRs exhibit a higher resistivity, comparable to copper, when the metal line pitch is small (from 40 nm to 10 nm).

Figure 4.1: Interconnect resistivity of different materials versus line width.

The “resistivity wall” phenomenon also leads to a more challenging power network design process due to the significantly resistive local power and ground rails. Reliable and energy efficient power distribution networks are necessary in high performance computing systems [69]. Decreasing supply voltages lead to smaller noise margins. Higher current densities and clock frequencies increase both resistive and inductive power supply noise. Moreover, a primary source of power noise is the highly resistive local power metal lines and vias between adjacent metal layers in advanced technology nodes. Poorly designed or overdesigned power networks either damage the reliability or decrease the performance of integrated circuits. Early assessment of the effects of the structure and material of the power networks supports tradeoffs among power noise, performance, and technology choice.

The rest of this chapter is organized as follows. The structure of a typical standard cell based power distribution network is presented in Section 4.1. A modeling approach is discussed in Section 4.2. The components of power noise in advanced technology nodes are described in Section 4.3. Power noise suppression methods are presented in Section 4.4, followed by some conclusions in Section 4.5.

## 4.1 Standard cell-based power network

The structure and impedance characteristics of power grids are presented in Section 4.1.1. The topology of a standard cell circuit influences the design of the power network, and therefore an overview of the structure is provided in Section 4.1.2.



Figure 4.2: Topology of a standard cell based power network: a) planar view, b) profile view.

### 4.1.1 Hierarchy of power grids

The resistance of the power metal lines is affected by the structure of the power grids. An on-chip power grid is a hierarchical structure consisting of a global interdigitated mesh, local power and ground rails, and a via stack connecting the global power grid to the local power rails, as illustrated in Figure 4.2. A typical global power grid for high performance ICs uses two layers of orthogonal metal lines to form a mesh structure, as illustrated in Figure 4.2(a). Adding global metal layers decreases the grid impedance, reducing power noise in the global power grid. The total number of on-chip metal layers is, however, limited by the technology. A mesh structure increases the reliability and robustness of a power network due to the multiple redundant paths. The mesh structure also reduces the resistance and parasitic capacitance of the power grids. Each metal layer in a mesh consists of parallel P/G pairs separated from adjacent pairs by tens of micrometers [145]. The pitch of each adjacent P/G pair is a design tradeoff between the power distribution network and signal/clock routing. The global power network typically exhibits low resistance and significant inductance due to the mesh structure. As the current density and clock frequency increase with technology scaling, inductive $L di/dt$ noise becomes comparable to the resistive noise [24, 150]. The power noise contributed by the global power grid should therefore be carefully evaluated to satisfy the strict power noise requirements in advanced technology nodes.

### 4.1.2 Standard cell based power rails

An individual standard cell track is structured as a row with a substrate region patterned between the power and ground rails, as illustrated in Figure 4.2(a). Gates within a cell library are structured to fit within a constant height track with transistors patterned within the substrate. The height of a standard cell is typically controlled by lithographic limits introduced by double and quadruple patterning processes [47]. Standard cell gates are mirrored to ensure that two tracks share a common power rail, doubling the effective current load on the line. After the gates are placed, the interconnections are routed among the internal gates, constraining the available metal resources. The power rail impedances are dominated by the metal resistance and decoupling capacitance. On-chip power noise is caused by current switching on the track rails with the greatest contribution arising from the clocked gates and buffers [48]. Most notably, local power noise is contributed by the *IR* drop within the power rails when multiple loads simultaneously switch. Reducing the resistance of the local power network is therefore an effective approach to mitigate resistive power noise. Early impedance characterization and power noise analysis can be used to evaluate different metallization schemes and material alternatives in advanced technology nodes. The local power rails are typically not connected to each other to alleviate routing congestion in local metal layers. The global power grid is connected to the local power rails by a via stack, as illustrated in Figure 4.2(b). The size and resistance of these vias are determined by the overlap area between metal lines and the thickness of the metallic barrier. The via is assumed to be cylinder shaped with a layer of metallic barrier, where the diameter is the same as the width of the adjacent power line. Notably, as technology is scaled, the resistance of the via increases significantly due to the smaller cross sectional area and highly resistive metallic barrier of the via. The impedance characteristics of the on-chip power network affect the power noise generated in the three different parts of a power network.

## 4.2 Circuit model

The overall grid model consists of a load model, a local rail model, and a global mesh model, as illustrated in Figure 4.3. Due to the symmetric characteristics of power distribution networks, only the  $V_{DD}$  portion of the power network is illustrated in Figure 4.3. The digital load is modeled as a current source. The local power rail is modeled as a system composed of distributed resistors and capacitors. The global grid is modeled as an interdigitated mesh with the parameters described in [151]. The mesh size is based on the space between the pads. The model considers the physical area, supply current, and stage delay for each technology node (14, 10, and 7 nm). The load models, track rail, and stripes across the power rails are discussed in the following sections.

### 4.2.1 Load model

The peak power noise is dependent on the clock network [152]. The load model is based on the current demands of a register and adjacent gates within a standard cell track. A model of an interdigitated power and ground distribution network is discussed in [43]; however, only a global power network is considered. On-chip power noise in a high performance system-on-chip based IC is evaluated in [153]. A lumped model is utilized where the load is modeled at the block level. A distributed on-die power grid model is introduced in [154], where the on-die power noise is dependent on the microarchitecture and current profile within different blocks. An individual load on a track rail is modeled as a current source with a triangular load characteristic [155, 156].

Those gates are spatially adjacent to the register and are likely to switch at approximately the same time as the register, thereby contributing to the local current.



Figure 4.3: Model of power network

If an adjacent gate at the load switches before the track rail is recharged to the supply voltage, the magnitude of the noise increases [157]. If the gate does not switch before the voltage is restored to $V_{DD}$, the gate does not contribute to the peak noise [13, 158]. Recharging determines the noise window ($t_{window}$): the loads that switch within the window are summed, and the gates that switch outside of the window are ignored. The noise window, set by the recharge time of a track rail, is approximated by three $RC$ time constants,

$$t_{window} \approx 3 \frac{N_{cell}^2}{4} R_{cell} (C_{cell} + C_{decap}), \quad (4.1)$$

where $N_{cell}$ is the number of cells between each P/G pair, $R_{cell}$ and $C_{cell}$ are, respectively, the resistance and capacitance of the track rail within a standard cell, and $C_{decap}$ is the decoupling capacitance per cell. No additional decoupling capacitors are considered in this work. The placement and optimization of decoupling capacitors have been investigated in [157, 159].
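A minimal sketch of (4.1) is given below. The numerical values are hypothetical and chosen only to illustrate the quadratic dependence of the noise window on the number of cells between each P/G pair.

```python
def noise_window(n_cell, r_cell_ohm, c_cell_f, c_decap_f):
    """Noise window of (4.1): three RC time constants of the track rail."""
    return 3.0 * (n_cell ** 2 / 4.0) * r_cell_ohm * (c_cell_f + c_decap_f)

# Hypothetical values, for illustration only.
t_window = noise_window(n_cell=40, r_cell_ohm=2.0, c_cell_f=0.1e-15, c_decap_f=0.4e-15)
t_window_wide = noise_window(n_cell=80, r_cell_ohm=2.0, c_cell_f=0.1e-15, c_decap_f=0.4e-15)

print(f"t_window (40 cells) = {t_window * 1e12:.2f} ps")
print(f"t_window (80 cells) = {t_window_wide * 1e12:.2f} ps")
```

Doubling the spacing between P/G pairs quadruples the noise window, so more gates switch before the rail is recharged.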

Only those adjacent logic gates that switch within the noise window contribute to the peak power noise. The delay of the adjacent gates is approximated by the delay of an inverter. The load current is

$$I_{Load} = \frac{2 \alpha t_{window}}{t_{inv1}} I_{inv1} + 2 I_{inv4}, \quad (4.2)$$

where  $t_{window}$  is the noise window,  $t_{inv1}$  and  $I_{inv1}$  are, respectively, the delay and peak current of a 1X inverter,  $I_{inv4}$  is the peak current of a 4X inverter, and  $\alpha$  is the switching factor of the circuit. Note that the intention of the load model is not to precisely emulate billions of load changes across the entire power network, but rather to mimic the peak power noise under realistic conditions when detailed block or load information is not available.
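The load current of (4.2) can be evaluated with a few lines of code. In this sketch, the second term uses the peak current of a 4X inverter, as defined in the text. All numerical values are hypothetical placeholders rather than characterized device data.

```python
def load_current(alpha, t_window_s, t_inv1_s, i_inv1_a, i_inv4_a):
    """Peak load current of (4.2).

    The first term scales the 1X inverter peak current by the switching factor
    and the ratio of the noise window to the 1X inverter delay. The second term
    adds twice the peak current of a 4X inverter.
    """
    return (2.0 * alpha * t_window_s / t_inv1_s) * i_inv1_a + 2.0 * i_inv4_a

# Hypothetical values, for illustration only.
i_load = load_current(
    alpha=0.1, t_window_s=10e-12, t_inv1_s=5e-12, i_inv1_a=20e-6, i_inv4_a=80e-6
)
print(f"I_load = {i_load * 1e6:.1f} uA")
```

A longer noise window or a higher switching factor increases only the first term, which reflects the additional adjacent gates that switch before the rail is recharged.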

### 4.2.2 Rail model

Each local rail is modeled as a distributed resistor-capacitor with multiple loads, with the length of the rail determined by the space between two P/G pairs in the global power network. At least one load is placed at the center of the rail to model a single register assuming the worst case position. The number of loads and the space between loads are determined by the target clock frequency. An individual logic gate is modeled with an inverter delay ( $t_{inv}$ ) where the logic depth ( $D$ ) at a target frequency ( $f_{clock}$ ) is

$$D = \frac{1}{f_{clock}t_{inv}(1 + U)}, \quad (4.3)$$

where $U$ is the delay uncertainty. The logic depth is the number of gates between adjacent loads on a rail. The width of an inverter is used to estimate the size of a standard cell. The physical distance between loads on a local rail is therefore known. Based on this assumption, the total number of active loads and the impedance between each active load can be estimated. The logic depth is also used to determine the decoupling capacitance,

$$C_{decap} = C_{gate}(1 - \beta)D, \quad (4.4)$$

where  $C_{gate}$  is the gate capacitance of an inverter, and  $\beta$  is the fill factor of the standard cell layout. The fill factor, the fraction of silicon area occupied by the standard cells, is a common metric for characterizing the efficiency of standard cell circuits [160].
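The relationship between clock frequency, logic depth, and decoupling capacitance in (4.3) and (4.4) can be sketched as follows. The inverter delay, delay uncertainty, gate capacitance, and fill factor used here are hypothetical values for illustration only.

```python
def logic_depth(f_clock_hz, t_inv_s, uncertainty):
    """Logic depth D of (4.3): inverter delays per clock period, derated by U."""
    return 1.0 / (f_clock_hz * t_inv_s * (1.0 + uncertainty))

def decoupling_capacitance(c_gate_f, fill_factor, depth):
    """Decoupling capacitance of (4.4)."""
    return c_gate_f * (1.0 - fill_factor) * depth

# Hypothetical values, for illustration only.
depth = logic_depth(f_clock_hz=2e9, t_inv_s=10e-12, uncertainty=0.25)
c_decap = decoupling_capacitance(c_gate_f=0.1e-15, fill_factor=0.7, depth=depth)

print(f"D = {depth:.1f} gates, C_decap = {c_decap * 1e15:.2f} fF")
```

A higher target clock frequency lowers the logic depth, which places the loads closer together on the rail and reduces the decoupling capacitance available between loads.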

### 4.2.3 Striping of power rail

Each track rail is typically distinct. Recently, however, low impedance connections between adjacent track rails have been used to reduce the local rail resistance and any associated power noise, as illustrated in Figure 4.4(a). These connections between the power and ground rails, called stripes, ensure that loads on the adjacent rails interact. For any interaction, however, the worst case power noise of a single local power rail (described as local power noise in the following section) occurs when the power rail is not connected with striping. The greatest reduction in power noise from striping occurs when the adjacent rails are not affected by simultaneously switching signals. These two conditions, therefore, bound the noise generated by a circuit.

Figure 4.4: Circuit models and physical structure of striping between the local power rails. a) comprehensive circuit model, b) $R_{branch}$ approximated circuit model, and c) physical structure of a stripe.

The number of interacting rails is determined by approximating a set of rails as a resistive tree, as illustrated in Figure 4.4(b). The resistance from the center load to the edge of the track rail is

$$R_{branch}(x) = R_v + a^x R + \left( \frac{1}{a^x R} + \frac{1}{R_{branch}(x+1)} \right)^{-1}, \quad (4.5)$$

where  $R_v$  is the resistance of a stripe,  $x$  is the number of additional branches, and  $a$  is the scaling factor of the resistance. As  $x$  increases, the error decreases. Note that (4.5) is used to estimate the maximum number of rails that minimizes the error. A distributed resistance across the rail is included in the model.
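The recursive form of (4.5) can be evaluated numerically once the recursion is truncated. The sketch below assumes, as a termination condition not stated in the text, that the branch beyond the last modeled rail is an open circuit. The parameter values are hypothetical. The example shows how the estimate at the center load changes as additional branches are included.

```python
def r_branch(x, max_branch, r_stripe, r_rail, a):
    """Evaluate the recursion of (4.5), truncated after branch index max_branch.

    Truncation treats R_branch(max_branch + 1) as an open circuit, an
    assumption made here only to terminate the recursion.
    """
    segment = (a ** x) * r_rail
    if x >= max_branch:
        downstream_conductance = 0.0  # open circuit beyond the last modeled branch
    else:
        downstream_conductance = 1.0 / r_branch(x + 1, max_branch, r_stripe, r_rail, a)
    parallel = 1.0 / (1.0 / segment + downstream_conductance)
    return r_stripe + segment + parallel

# Hypothetical values: R_v = 0.5 ohm, R = 1 ohm, a = 1.
estimates = [r_branch(0, m, r_stripe=0.5, r_rail=1.0, a=1.0) for m in range(6)]
for m, value in enumerate(estimates):
    print(f"{m} additional branches: R_branch(0) = {value:.5f} ohm")
```

For these values, the change between successive estimates shrinks rapidly, which is consistent with the observation that the error decreases as $x$ increases.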

The proposed power methodology produces a general circuit model to evaluate peak power noise during the early exploratory design stage when floorplan and placement information is unavailable. This power network model is not intended to be integrated within a power network synthesis and optimization flow [161–163] or to compete with fast simulation algorithms within power network solvers which support many billions of nodes [56, 164].

## 4.3 Characterization of power noise

The model has been evaluated for power networks in 14 (N14), 10 (N10), and 7 nm (N07) CMOS FinFET technologies with a clock frequency ranging from DC to 5 GHz. The global power grid dimensions are based on the 14 nm technology node. The pitch of the global grid is subsequently linearly scaled to N10 and N07. Model generation and simulation are based on MATLAB and Cadence Spectre. The components of power noise in advanced technology nodes are discussed in Section 4.3.1. A comparison of the local power noise for different technology nodes is provided in Section 4.3.2.

### 4.3.1 Power noise components

The power noise is assumed to be the peak power noise due to simultaneously switching loads. The total on-chip power noise is the voltage variation from the power pad to the local  $V_{DD}$ . The entire power distribution network structure, including the global power grids, local power rails, and via stacks, contributes to the power noise. To comprehensively evaluate the power noise from the perspective of technology scaling, the total power noise considers the hierarchy of the power network. The global power noise consists of  $IR$  and  $L di/dt$  noise introduced by the mesh grid and high density transient currents. The local power noise is due to the highly resistive power rails and is dominant in advanced technology nodes. The via stack power noise is due to  $IR$  drops across stacked vias connecting the global power grid to the local power rails. The resistance of each local via is significant. The resistance of a via between metal 1 and metal 2 in the 10 nm technology node can reach  $30 \Omega$  [145].

The package inductance is important not only at the package and board levels but also at the IC level. A comparison between on-chip and off-chip power noise for different technology nodes is illustrated in Figure 4.5. Note that the assumed package impedance is based on [154]. The power noise is averaged across clock frequencies ranging from DC to 5 GHz.

The total on-chip power noise ranges from 14.2% to 18.5% in 14, 10, and 7 nm technology nodes with a trend of increasing power noise with technology scaling, although the 10 nm node exhibits lower power noise than the other two nodes (see Figure 4.5). The reason is that, in N10, the reduction in power noise in the global power grid is larger than the increase in power noise in the local power rails and via stacks.

The distribution of the three power noise components vary with technology scaling,



Figure 4.5: Comparison between on-chip and off-chip power noise in 14, 10, and 7 nm technology nodes.

as illustrated in Figure 4.6. The power noise is averaged across clock frequencies ranging from DC to 5 GHz. The via stack power noise and local power noise exhibit the same trend of increasing noise as technology scales due to the significant resistance of the vias and local power rails. The global power noise, however, decreases by 4.6% and 4.0%, respectively, in N10 and N07 as compared with N14, where the global power noise is 10.8%. This reduction in global power noise is attributed to the lower resistance and inductance of the global power grid in N10 and N07, a result of the decreasing global power/ground dimensions. Notably, the global power noise of 10.8% in N14 dominates the total power noise, as compared with 6.2% and 6.8%, respectively, in N10 and N07. A



Figure 4.6: Components of on-chip power noise in 14, 10, and 7 nm technology nodes.

metalization scheme which reduces the global power noise is therefore preferable in N14. For N07, local power noise is the largest contributor to the total power noise, indicating that methods to reduce local power noise are needed in N07. The effects of graphene interconnects, an effective method to mitigate local power noise, are discussed in Section 4.4.3.



Figure 4.7: Local peak power noise in 14 nm, 10 nm, and 7 nm technologies with increasing clock frequency.

### 4.3.2 Different technology nodes

The local  $V_{DD}$  rails exhibit a peak power noise that ranges from 3% to 10% of  $V_{DD}$  with a trend of increasing power noise with technology scaling. As the clock frequency supported by the track increases, the power noise increases in discrete steps, as illustrated in Figure 4.7. Each step is due to the larger number of loads that simultaneously switch on a track rail, which corresponds to a relative decrease in logic depth. Local noise levels also increase with each technology node, although the magnitude of the noise is strongly dependent on the clock frequency and number of loads per rail. At lower frequencies with only a single load switching per rail, N10 and N07 exhibit, respectively, power noise increases of 0.7% and 1.8% as compared to N14. At higher frequencies with two loads per rail, the power noise increases, respectively, by 1.8% and 4.1%. This behavior is expected as the width of a standard cell gate is proportionally larger with scaled technologies, producing a larger track resistance per cell.

To measure the effects of power noise on circuit performance, a five stage ring oscillator (RO) is driven with power noise injected into both the power and ground rails. The per cent reduction in ring oscillator frequency is depicted in Figure 4.8. As the power noise increases with frequency, the performance of the ring oscillator decreases. As expected, the RO performance increases with each technology generation and drops discretely with increasing clock frequency. Notably, the magnitude of the decrease in oscillator frequency is much higher in N07 than in N10 and N14, indicative of the higher sensitivity to power noise with device scaling. At frequencies above 3 GHz, the performance of the N07 ring oscillator drops below the performance of the N10 ring oscillator operating at a lower clock frequency. Intuitively, the delay of an N07 circuit degrades, losing the advantages of scaling. Maintaining the same performance requires a proportionally smaller P/G pitch that is more aggressive than a linearly scaled grid.



Figure 4.8: Per cent decrease in performance due to average power noise of a five stage ring oscillator in 14 nm, 10 nm, and 7 nm technologies normalized to an N14 ring oscillator.

## 4.4 Power noise suppression

Methods to suppress power noise in power distribution networks are discussed in this section, including additional global power metal layers, stripes, graphene interconnect, and local interconnect scaling. The effectiveness of additional power metal layers to reduce global power noise is presented in Section 4.4.1. The striping technique is discussed in Section 4.4.2. Reductions in power noise due to graphene are evaluated in Section 4.4.3. The scaling scenario for local power rails affects the local and via stack power noise, which is discussed in Section 4.4.4. A preferable metalization scheme for different technology nodes to reduce the total power noise is discussed in Section 4.4.5.

### 4.4.1 Additional global power metal layers

As technology is scaled, the number of on-chip metal layers increases, resulting in multiple metal layers available for the global power network [165]. Adding metal layers to the global power grid introduces more paths for the current to flow, lowering the grid impedance. In this section, reductions in global power noise due to adding layers are evaluated for different advanced technology nodes.

These additional power metal layers are oriented orthogonal to the adjacent metal layers to lower inductive coupling, thereby producing a mesh structure [36]. The size of the metal line and the pitch between adjacent metal lines are assumed the same.

As expected, global power noise decreases as more global metal layers are added, as illustrated in Figure 4.9. The rate of global power noise reduction however decreases with an increasing number of additional global layers. Further increasing the number of dedicated layers is therefore not an efficient way to reduce global power noise. Note in Figure 4.9 that the baseline of the global power grid is two layers. The greatest reduction in global power noise for N14, N10, and N07 is, respectively, 8.1%, 4.6%, and 5.2% when an additional six metal layers are dedicated to the global power grid. Adding power



Figure 4.9: Reduction in global power noise versus additional global power metal layers.

layers is shown to be more advantageous in N14 as compared with N10 and N07 where global power noise is less significant.

### 4.4.2 Stripes technique

One method to reduce local power noise is applying multiple stripes to adjacent track rails. As a primary component of on-chip power noise, local power noise becomes dominant in the N07 node. To reduce local power noise, an individual track rail can be connected to the adjacent rails through multiple stripes, each with a variable width. The noise exhibited by a 3.6 GHz circuit with striping for variable width and count is illustrated in Figure 4.10. For reference, the peak noise of a 3.6 GHz circuit without striping for the N14, N10, and N07 technology nodes is, respectively, 4.6%, 5.7%, and 7.1%. The stripe count is the number of stripes per track rail, and the stripe width is the pitch of a stripe with additional vias. The stripe count and stripe width are both normalized to the minimum metal pitch of the technology node.



Figure 4.10: Effect of track stripe count and stripe width on normalized power noise, a) 14 nm, b) 10 nm, and c) 7 nm technologies. A 3.6 GHz frequency is assumed.

Introducing striping reduces power noise by almost a factor of two for each technology node, with a slight increase in noise reduction with each technology generation. The maximum stripe width and count, with nine stripes at a stripe width of ten, is impractical in conventional circuits for any technology node. In these cases, ten cells are between each stripe, and each stripe is approximately the size of four inverter cells. These additional interconnects cause significant routing congestion and area overhead.

Much benefit, however, can be achieved with wide stripes. A single stripe with a stripe width of ten reduces the power noise by almost a third for N14, N10, and N07. This reduction in noise is due to the relatively large resistance of the via for each stripe. As the stripe width increases, additional vias can be added, reducing the effective resistance of the stripe, thereby lowering the resistance of the path to the power supply. At stripe counts greater than five, there are diminishing returns on the reduction in power noise. In this case, a stripe width above six reduces much of the power noise without incurring excessive overhead.
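The benefit of a wider stripe can be illustrated with a simple parallel resistance estimate. The following sketch assumes that all of the vias within a stripe are identical and connected in parallel. The 30 Ω per via value is borrowed from the metal 1 to metal 2 via resistance quoted for the 10 nm node and is used here only as an illustrative number; the via counts are hypothetical and are not taken from the simulated power grids.

```python
def effective_via_resistance(r_via_ohm, n_vias):
    """Resistance of n_vias identical vias connected in parallel."""
    if n_vias < 1:
        raise ValueError("at least one via is required")
    return r_via_ohm / n_vias


R_VIA = 30.0  # ohms per via, illustrative value only

# Hypothetical via counts: a wider stripe leaves room for more vias.
for n_vias in (1, 2, 5, 10):
    r_eff = effective_via_resistance(R_VIA, n_vias)
    print(f"{n_vias:2d} vias -> {r_eff:5.2f} ohm")
```

Under these assumptions, the effective via resistance falls inversely with the number of vias, which is consistent with the diminishing returns observed at large stripe widths.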

### 4.4.3 Graphene interconnects

Another method to reduce power noise is exploiting lower resistivity material in power grids to reduce the effects of the “resistivity wall.” As illustrated in Figure 4.1, the resistivity of GNRs is comparable to copper, and the resistivity of FLG is lower than copper in deeply scaled metal lines [148]. Although integrating graphene with CMOS technology is not yet practical, graphene as an interconnect replacement significantly reduces power noise.

Power noise is evaluated for the 7 nm technology with five stripes across the power ground rails for three different materials. The resistivity of GNRs is extracted from experimental data based on the local interconnect width used in 7 nm technology [148]. The resistivity of FLG is determined based on the sheet resistance reported in [149] and the thickness of typical five layer graphene. The third material is copper. As illustrated in Figure 4.11, a large difference in power noise between FLG and



Figure 4.11: Peak power noise in GNRs, FLG, and copper power grids with increasing clock frequencies in 7 nm technology.

copper is exhibited since the difference in resistivity is significant in 7 nm interconnect technology. A 59.1% reduction in peak noise is achieved with FLG as compared with copper. The bottleneck is the vias between two adjacent metal layers, which are highly resistive in advanced technology nodes as compared to the resistance of the metal lines.

### 4.4.4 Scaling of local power rails

For a global mesh structured power grid, the pitch of each power/ground pair decreases with technology scaling. The width of the global power/ground interconnect is however fixed in advanced technology nodes to prevent an increase in the impedance of the global power grid. Widening the global power grid significantly increases on-chip area while also introducing a larger parasitic capacitance between adjacent metal layers. For interdigitated local power rails, the pitch of the adjacent power and ground rail is proportional to the gate pitch to match the standard cell height for each technology node. The width of the local power rail is proportional to the minimum metal pitch of each technology. This scaling process is a primary source of power noise due to IR drops.

In evaluating power noise, the width of the local power rail is set to three times the minimum metal pitch of each technology. A tradeoff should be considered between the physical area and the impedance characteristics of the local power rails to satisfy power noise budgets in advanced technology nodes. A smaller standard cell height allows the on-chip area of the local power rails to be increased while maintaining performance improvements. As illustrated in Figure 4.12, a 32.4% reduction in peak noise is exhibited after increasing the power rail width from three times to five times the minimum metal pitch in a 7 nm technology. As compared with Figure 4.11, the reduction in power noise with larger power rail widths is lower than that achieved by exploiting new interconnect materials. Changing the metal width is however more practical since this change does not require novel fabrication and integration technologies. Increasing the



Figure 4.12: Peak noise in 3X, 4X, and 5X minimum metal pitch interconnect scaling scenarios with increasing clock frequencies in 7 nm technology.

width of the local power rail degrades performance due to the large area overhead of the local metal layer.

### 4.4.5 Metalization schemes for advanced technology nodes

In this section, power noise is compared for four different scenarios (A: baseline case, B: adding two extra metal layers for the global power network, C: increasing the local metal line width to five times the minimum metal pitch, D: adding five stripes to each power track). For reference, in the baseline case, the power network utilizes two metal layers for the global grid. The local metal width is three times the minimum metal pitch without striping.

Comparing scenarios B, C, and D with A, the total power noise in N14, N10, and N07 is suppressed. The greatest reduction in power noise is, respectively, 5.1% (case B), 4.6% (case C), and 1.1% (case D), as illustrated in Figure 4.13. An additional two power layers significantly decrease power noise in N14 but have limited effect on N10 and N07, indicating that adding global power layers is preferable for reducing power noise in N14. A 5X metal line width achieves the greatest reduction in power noise (5.1%) as compared with adding five stripes (3.0%) in 7 nm technology. A wider local metal line more efficiently suppresses power noise in N07 than the striping technique. Wider metal lines reduce the local power noise as well as the via stack power noise due to the larger vias. As a result, widening the metal line has the greatest potential to reduce the total power noise in 7 nm technology. Comparing scenarios B, C, and D for N10, a 5X metal line width is less efficient in suppressing power noise than the five stripes technique. Widening the metal line is less advantageous due to the relatively high resistance via stack in N10 as compared with N07.

## 4.5 Summary

An exploratory modeling methodology is proposed for assessing power noise in standard cell digital circuits. Models are discussed for 14, 10, and 7 nm technologies to evaluate noise trends. Local resistive noise is shown to increase with technology



Figure 4.13: Total power noise in 14, 10, and 7 nm technology nodes for four different scenarios.

scaling and starts to dominate the total power noise at the 7 nm node. The effects of local stripes are evaluated on power grids, exhibiting a 2X reduction in local power noise. Adding global power metal layers is an effective method to reduce global power noise. Exploiting new materials in on-chip interconnect exhibits good potential to lower power noise in advanced technology nodes. Tradeoffs between power noise and performance need to be carefully considered when scaling the width of the local power rails. In 14 nm technology, providing additional global metal layers is preferable for lowering power noise, where a 30.6% reduction in power noise is exhibited. Below 10 nm, local power rails with wider metal lines are effective in suppressing local and via stack power noise.

## Chapter 5

# EMI Challenge in 2.5-D System with High Voltage VRs

Compact systems-in-package (SiP) and three-dimensional (3-D) integrated circuits (ICs) exhibiting small form factors are primary platforms for modern applications such as portable wireless devices and the Internet of Things (IoT). Integrating small passive components is therefore essential. These small passive elements along with a fast transient response require on-chip or in-package power conversion to operate at high frequencies [166]. A conventional switching power converter, for example, a pulse width modulation converter, suffers from switching losses at high frequencies, limiting utilization in modern high performance applications [13, 167]. A resonant converter is a power converter topology based on tuning a resonant network to a specific frequency [29, 168]. Due to zero-current switching (ZCS) and zero-voltage switching (ZVS) properties, resonant converters are considered a promising alternative in high frequency applications, as well as integrated power conversion [169]. By tuning the capacitor and inductor of the  $LC$  tank, both the current and voltage waveforms across the switches of the resonant converter can be controlled. The switches change state when the current or voltage across the switch is zero, ideally eliminating the switching losses and mitigating electromagnetic interference (EMI) generated by the converter [170].

An important parameter of a resonant converter is the step-down ratio. The input voltage of a point-of-load (PoL) converter is dependent upon the topology of the power supply architecture. In a centralized power architecture (CPA) system, a two stage step-down structure is typically utilized to achieve a high step-down ratio [171]. A high DC voltage (for instance, 48 volts), generated from an AC-to-DC converter, is initially converted to a medium level DC voltage (for instance, 12 volts). A PoL converter subsequently transfers the medium level voltage to an on-chip voltage (for instance, 0.8 volts). Due to increasing current demands and lower on-chip supply voltages, a CPA suffers from significant distribution losses. A distributed power architecture (DPA) has therefore become a widely used power distribution topology [172, 173]. In DPA systems, the high DC voltage is directly converted to the load voltage using a PoL converter to eliminate local distribution losses caused by low voltage transmission. With on-chip voltage scaling to 0.8 volts and lower voltages, high step-down ratio converters for DPA systems are required [172]. Directly increasing the turns ratio of a transformer can achieve high step-down ratio conversion; the power efficiency however drops due to the significant parasitic impedances. Although a DPA system can reduce distribution losses between the AC-to-DC converter and the PoL converter, the power loss due to the parasitic resistance between the PoL converter and the load can still be substantial, particularly in high current demand applications. SiP is a promising technology to mitigate the power loss between the PoL converter and the load. By placing a PoL converter within the package [169], distribution losses are further reduced due to the high voltage transmission from the AC-to-DC converter to the load.
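The advantage of high voltage transmission can be illustrated with a simple $I^2R$ estimate. The sketch below assumes a purely resistive distribution path and a fixed load power; the 100 watt load and the 1 mΩ path resistance are hypothetical values chosen only for illustration, while the 48, 12, and 0.8 volt levels are the example voltages mentioned above.

```python
def distribution_loss(p_load_w, v_bus, r_path_ohm):
    """I^2 R loss in a resistive path that carries the current
    required to deliver p_load_w at a bus voltage of v_bus."""
    i_bus = p_load_w / v_bus
    return i_bus ** 2 * r_path_ohm


P_LOAD = 100.0  # watts, assumed load power
R_PATH = 1e-3   # ohms, assumed resistance of the distribution path

for v_bus in (48.0, 12.0, 0.8):
    loss = distribution_loss(P_LOAD, v_bus, R_PATH)
    print(f"{v_bus:5.1f} V bus: {loss:8.4f} W dissipated in the path")
```

For a fixed load power and path resistance, the loss scales with $1/V^2$. Under these assumptions, transmitting at 48 volts rather than 12 volts reduces the loss in the same path by a factor of 16.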

As the operating frequency of a resonant converter increases, EMI has become a major concern, particularly in an SiP environment where the converters are placed near the sensitive circuits [121–125]. Not only may EMI affect the proper function of the digital circuits within a package, EMI can also pollute the surrounding electromagnetic environment which is crucial for wireless devices and IoT. EMI is affected by the waveform profile and magnitude of the current flowing through the converter. The harmonic behavior associated with the high frequency current and voltage waveforms is a major source of EMI. A sinusoidal current waveform exhibits few harmonics (ideally only the primary harmonic), significantly mitigating EMI [174, 175]. Based on this concept, a novel high step-down ratio, low EMI LLC resonant converter with a distributed topology is proposed in this chapter.

The rest of the chapter is organized as follows. The operation and performance characteristics of an LLC resonant converter are described in Section 5.1. The performance degradation of the LLC resonant converter caused by the high turns ratio is discussed in Section 5.2. In Section 5.3, a novel distributed LLC resonant converter is introduced. The performance and power efficiency of this converter are also presented. In Section 5.4, near field simulations of the EMI of the proposed distributed resonant converter are described. Some conclusions are offered in Section 5.5.

## 5.1 LLC resonant converter

The ZCS and ZVS properties of resonant converters with quasi-sinusoidal current waveforms provide a narrow EMI spectrum and high power efficiency [167]. A PoL LLC resonant converter topology is composed of four stages: a sinusoidal current generator, an isolated power converter, a rectifier, and an *LC* filter. The generation of sinusoidal current in the resonant converter is described in Subsection 5.1.1. The operation of an LLC resonant converter is presented in Subsection 5.1.2, followed in Subsection 5.1.3 by an evaluation of the performance characteristics.

### 5.1.1 Sinusoidal current generation

The first stage of an LLC resonant converter generates the sinusoidal current. The input of a PoL converter is a high DC voltage. This DC voltage is transformed into a sinusoidal current. As illustrated in Figure 5.1(a), a series *LC* tank connected to a



Figure 5.1: Quasi-sinusoidal current generation circuit within a resonant converter. (a) Sinusoidal current generation mechanism, (b)  $LC$  tank circuit, and (c) voltage across  $LC$  tank and quasi-sinusoidal current.

pulse voltage source generates a sinusoidal current. The resonant frequency is  $f_{res} = 1/(2\pi\sqrt{LC})$ , where  $L$  and  $C$  are, respectively, the resonant inductor and capacitor within the  $LC$  tank. To achieve a specific operating frequency of the converter,  $L$  and  $C$  are selected to match the resonant frequency of the  $LC$  tank.
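The selection of  $L$  and  $C$  can be sketched as a short calculation based on the resonant frequency expression. In the example below, a 2 MHz target frequency is used, and the 100 nF resonant capacitor is an assumed value chosen only for illustration; the component values of the evaluated converter are not specified here.

```python
import math


def resonant_frequency(l_henry, c_farad):
    """Resonant frequency of a series LC tank, f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))


def inductance_for(f_res_hz, c_farad):
    """Inductance that places the tank resonance at f_res_hz for a given C."""
    return 1.0 / ((2.0 * math.pi * f_res_hz) ** 2 * c_farad)


F_TARGET = 2e6   # Hz, example target frequency
C_RES = 100e-9   # farads, assumed resonant capacitor

l_res = inductance_for(F_TARGET, C_RES)
print(f"L = {l_res * 1e9:.1f} nH")
print(f"check: f_res = {resonant_frequency(l_res, C_RES) / 1e6:.3f} MHz")
```

Any pair of  $L$  and  $C$  with the same product yields the same resonant frequency; the choice between pairs affects the characteristic impedance of the tank and is not addressed by this sketch.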

A full bridge configuration is adopted to transform the DC voltage into a pulse voltage across the  $LC$  tank, as illustrated in Figure 5.1(b). The switch pairs,  $S_1$  and  $S_4$  (first pair), and  $S_2$  and  $S_3$  (second pair), are alternately turned on and off. In the first half cycle of the input pulse, where  $S_1$  and  $S_4$  are open and  $S_2$  and  $S_3$  are closed, the voltage across the  $LC$  tank  $V_{LC}$  is set to  $V_{dc}$ . In the second half cycle,  $S_2$  and  $S_3$  are open and  $S_1$  and  $S_4$  are closed, and  $V_{LC}$  is set to  $-V_{dc}$ . An alternating pulse voltage is generated across the  $LC$  tank, ranging from  $+V_{dc}$  to  $-V_{dc}$ . If the switching frequency matches the resonant frequency of the  $LC$  tank, a sinusoidal current is generated. Note that  $V_{dc}$  originates from the previous DC-to-DC or AC-to-DC conversion stage. In a typical CPA, the DC voltage is converted into a voltage ranging from 10 volts to 15 volts before reaching the PoL converter.

The sinusoidal current generation topology has been evaluated in Cadence Virtuoso assuming a 12 volt DC input voltage. The converter operates at a frequency of 2 MHz.  $L$  and  $C$  are selected to maintain a 2 MHz resonant frequency. A quasi-sinusoidal current waveform generated by the  $LC$  tank is illustrated in Figure 5.1(c). The frequency of the sinusoidal current is the same as the resonant frequency. The impedance of the passive resonant elements cancels at the resonant frequency; the amplitude of the current is therefore based on the parasitic resistance of the wire and coil windings. By limiting these parasitic resistances, a quasi-sinusoidal current is achieved.
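The filtering action of the  $LC$  tank can be illustrated with a steady-state harmonic estimate. The sketch below is not the Cadence Virtuoso setup described above; it assumes an ideal ±$V_{dc}$ square wave with a 50% duty cycle driving a linear series  $RLC$  branch, and ignores the transformer and rectifier. The 100 nF capacitor and the 0.5 Ω parasitic resistance are hypothetical values.

```python
import math


def square_wave_harmonic(v_dc, n):
    """Amplitude of the n-th harmonic of a +/-v_dc square wave (50% duty)."""
    if n % 2 == 0:
        return 0.0  # even harmonics vanish for a symmetric square wave
    return 4.0 * v_dc / (n * math.pi)


def series_rlc_impedance(r, l, c, f):
    """Complex impedance of a series RLC branch at frequency f."""
    w = 2.0 * math.pi * f
    return complex(r, w * l - 1.0 / (w * c))


def harmonic_currents(v_dc, f_sw, r, l, c, orders=(1, 3, 5, 7)):
    """Steady-state current amplitude of each listed harmonic."""
    return {
        n: square_wave_harmonic(v_dc, n)
        / abs(series_rlc_impedance(r, l, c, n * f_sw))
        for n in orders
    }


V_DC = 12.0     # volts, input voltage from the text
F_SW = 2e6      # Hz, switching frequency equal to the resonant frequency
C_RES = 100e-9  # farads, assumed
L_RES = 1.0 / ((2.0 * math.pi * F_SW) ** 2 * C_RES)  # tuned to F_SW
R_PAR = 0.5     # ohms, assumed parasitic resistance

currents = harmonic_currents(V_DC, F_SW, R_PAR, L_RES, C_RES)
for n, amp in currents.items():
    print(f"harmonic {n}: {amp:6.2f} A ({amp / currents[1]:.1%} of fundamental)")
```

At resonance, the reactances cancel and the fundamental current is limited only by the resistance, whereas the higher harmonics are attenuated by the reactance of the tank. A larger parasitic resistance lowers the fundamental without significantly changing the harmonics, which increases the relative distortion under these assumptions.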

### 5.1.2 Operation of the LLC resonant converter

As illustrated in Figure 5.2, the proposed LLC resonant converter consists of four parts: (1) a primary stage  $LC$  tank, (2) a transformer, (3) a rectifier in the secondary stage, and (4) an  $LC$  filter. The primary stage  $LC$  tank generates sinusoidal current, as described in the previous subsection. Voltage conversion and isolation are achieved by a step-down transformer. The step-down ratio determines the turns ratio of the transformer. A transformer efficiency of 97% is assumed for the proposed converter [167]. A full wave rectifier in the secondary stage of the converter transforms the AC current from the transformer into a DC current flowing to the load. Switches within the resonant converter are based on power MOSFETs, as illustrated



Figure 5.2: Full bridge isolated LLC resonant converter

in Figure 5.2. An  $LC$  filter removes any high frequency harmonics originating from the imperfect sinusoidal current as well as stabilizes the output voltage.

The step-down transformer transfers the sinusoidal current from the primary stage ( $I_{res}$ ) to the branches of the secondary stage ( $I_{branch1/2}$ ). The magnitude of  $I_{branch1/2}$  is determined from both the load and the turns ratio of the transformer. The switches  $S_5$ ,  $S_6$ ,  $S_7$ , and  $S_8$  in the rectifier stage control two current paths, as illustrated in Figure 5.2. When switches  $S_6$  and  $S_7$  turn on, the first half cycle of the sinusoidal current enters branch 1, flows through the load, and returns to the transformer in branch 2. When switches  $S_5$  and  $S_8$  turn on, the second half cycle of the sinusoidal current enters branch 2, flows through the load, and returns to the transformer in branch 1. The switching pairs,  $S_6$  and  $S_7$ , and  $S_5$  and  $S_8$ , alternately turn the two current paths on and off at the resonant frequency of the primary  $LC$  tank stage. The load current ( $I_{load}$ ) is therefore a positive half cycle sinusoidal current at twice the resonant frequency, as illustrated in Figure 5.2. Due to the output capacitor  $C_{out}$ , the output voltage across the load exhibits less ripple, proportional to the current flowing through the load.
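The rectified load current can be characterized numerically. The sketch below assumes an ideal full wave rectifier and an ideal sinusoidal branch current with a hypothetical 10 ampere peak. The sketch compares the numerical mean of the rectified waveform with the analytic value  $2 I_{peak}/\pi$  and confirms that the rectified waveform repeats every half period, corresponding to twice the resonant frequency.

```python
import math


def rectified_average(i_peak, samples=100_000):
    """Numerical mean of |i_peak * sin(theta)| over one full period,
    evaluated with the midpoint rule."""
    total = 0.0
    for k in range(samples):
        theta = 2.0 * math.pi * (k + 0.5) / samples
        total += abs(i_peak * math.sin(theta))
    return total / samples


I_PEAK = 10.0  # amperes, assumed peak of the sinusoidal branch current

numeric = rectified_average(I_PEAK)
analytic = 2.0 * I_PEAK / math.pi
print(f"numerical mean: {numeric:.4f} A, analytic 2*I_peak/pi: {analytic:.4f} A")

# |sin(theta)| repeats every pi radians, i.e., at twice the input frequency.
theta = 0.7
print(math.isclose(abs(math.sin(theta)), abs(math.sin(theta + math.pi))))
```

The mean value corresponds to the DC component delivered to the load, while the remaining ripple at twice the resonant frequency is attenuated by the output filter.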

### 5.1.3 Performance evaluation

The full bridge LLC resonant converter is evaluated in Cadence Virtuoso. A 12 volt DC input voltage from an intermediate bus converter and a 0.8 volt DC output



Figure 5.3: Response of the LLC resonant converter to a change in the load. (a) Static load, and (b) dynamic load.

voltage are assumed, where 0.8 volts is a typical voltage level in advanced CMOS technologies [145]. An ideal transformer with a turns ratio of 15 achieves the step-down conversion. The load is modeled as a resistor. The magnitude of the load resistance is dependent on the current demand.

A positive half cycle sinusoidal current flows to the load ( $I_{load}$ ), as illustrated in Figure 5.3(a). The output voltage is stable around 0.8 volts, and exhibits less than 4% ripple. The response of the converter to a change in the load is shown in Figure 5.3(b). At  $t = 200 \mu s$ , a second resistive load is included at the output in parallel with the original load to model the increase in load current. The waveform shown in the top of Figure 5.3(b) illustrates a change in the current within the second load. The bottom waveform is the transient response of the output voltage to an increase in the load current. After several cycles, the output voltage settles to a lower voltage due to the large voltage across the parasitic resistance of the secondary stage. A fast transient response of less than  $1 \mu s$  to changes in the load is achieved with this topology.

## 5.2 Performance degradation due to high step down ratio

A transformer with a high turns ratio is required for high step-down ratio conversion in DPA systems (for example, from 55 volts to 0.8 volts). The effects of the parasitic impedance within the secondary stage however become significant with an increasing turns ratio, producing a distorted current waveform within the converter. As compared with a sinusoidal current waveform, a distorted current waveform produces greater EMI. Two case studies with large input voltages and a high step-down ratio have been evaluated. The performance is compared to the converter described in Section 5.1. The input voltages in these two case studies are 22 volts and 55 volts. The output voltage in both cases is 0.8 volts. The turns ratio of each transformer is selected to match the step-down ratio of the converters. Other circuit parameters such as the  $LC$  tank, parasitic impedance, switches, and load behavior are the same as the converter described in Section 5.1.

The current flowing through branch 1 ( $I_{branch1}$ ) in each of the converters is illustrated in Figure 5.4. Current spikes are observed in both cases when the branch current waveform crosses zero. The magnitude of the current spikes increases with a higher turns ratio of the transformer, as illustrated in Figure 5.4. In the case of the 55 volt input, the current waveform is significantly distorted, no longer maintaining a sinusoidal shape. Note that the current flows through the branch after the switches are turned off due to the body diode within the power MOSFET, providing a path for reverse current to flow. Large current spikes of 90 amperes produce greater EMI, resulting in a less robust, noisy system. Moreover, a high voltage across the circuit elements is induced by the current spikes, potentially causing device failure.

The effect of the turns ratio of a transformer on the behavior of a converter is illustrated in Figure 5.5. An ideal transformer is assumed, where

$$\frac{V_1}{V_2} = \frac{I_2}{I_1}, \quad (5.1)$$



Figure 5.4: Performance degradation of a high turns ratio converter.

is maintained. The voltage across the primary winding is

$$V_1 = \frac{N^2 R_{out}}{N^2 R_{out} + R_{in}} V_{in}. \quad (5.2)$$

The current through the primary winding is

$$I_1 = \frac{V_{in}}{N^2 R_{out} + R_{in}}. \quad (5.3)$$

The output resistance and parasitic impedance of the rectifier stage degrade the operation of the primary  $LC$  tank. The resistance in the secondary stage contributes to the total parasitic resistance within the  $LC$  tank in the sinusoidal current generation stage. From (5.2) and (5.3), where  $N$  is the turns ratio of the transformer, the parasitic impedance seen by the primary stage is  $N^2$  times larger than the actual resistance. The output impedance, therefore, has a greater effect on the  $LC$  tank than the parasitic impedance of the primary stage. With increasing  $N$ , the effect of the parasitic impedance on the secondary stage grows



Figure 5.5: Working principle of basic transformer

quadratically, leading to a distorted current waveform within the *LC* tank. The ZCS characteristics cannot be maintained due to distortion of the sinusoidal waveform, leading to current spikes in the rectifier stage. An LLC resonant converter with a high turns ratio is therefore a challenging requirement. Nonetheless, a high step-down ratio is required for DPA PoL converters due to the high input and low on-chip output voltage levels. A converter with a small turns ratio transformer and high step-down ratio is therefore highly desirable.
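The  $N^2$  dependence in (5.2) and (5.3) can be illustrated numerically. The sketch below assumes an ideal transformer and treats  $R_{out}$  and  $R_{in}$  as lumped resistances on, respectively, the secondary and primary sides, which is how these symbols appear to be used in (5.2) and (5.3). The turns ratios correspond to stepping 12, 22, and 55 volts down to 0.8 volts, and the 1 mΩ secondary resistance is a hypothetical value.

```python
def reflected_resistance(n, r_secondary):
    """Secondary side resistance as seen from the primary side."""
    return n ** 2 * r_secondary


def primary_voltage(v_in, n, r_out, r_in):
    """Voltage across the primary winding, following (5.2)."""
    r_ref = reflected_resistance(n, r_out)
    return r_ref / (r_ref + r_in) * v_in


def primary_current(v_in, n, r_out, r_in):
    """Current through the primary winding, following (5.3)."""
    return v_in / (reflected_resistance(n, r_out) + r_in)


V_OUT = 0.8         # volts, output voltage from the text
R_SECONDARY = 1e-3  # ohms, assumed secondary side parasitic resistance

for v_in in (12.0, 22.0, 55.0):
    n = v_in / V_OUT  # ideal turns ratio matching the step-down ratio
    r_ref = reflected_resistance(n, R_SECONDARY)
    print(f"V_in = {v_in:4.1f} V, N = {n:6.2f}: reflected resistance = {r_ref:.4f} ohm")
```

Under these assumptions, the same secondary side resistance appears roughly 21 times larger from the primary side of the 55 volt converter than from the primary side of the 12 volt converter.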

## 5.3 LLC resonant converter with distributed topology

To maintain high step-down ratio conversion, a scalable distributed topology for the LLC resonant converter is proposed. As illustrated in Figure 5.6, a distributed topology is achieved by connecting the primary stages of multiple branches in series, while the secondary stages are connected in parallel to the load. A lower voltage drop is exhibited in this configuration across the primary winding of each branch. The secondary stage of each branch is connected to the load without contention from the other branches. The output of the secondary stage of each branch is connected to the power/ground pins of the resonant converter, and current flows from the power pins to the power delivery network within the package. The secondary stage voltage of the distributed topology is similar to the voltage of a single branch LLC resonant converter. The primary stage voltage in the distributed topology is however reduced due to the voltage divider across all of the primary stages connected in series. Increasing the number of branches within the distributed topology further reduces the turns ratio of the transformer within the LLC resonant converter. High step-down ratio conversion is therefore achieved using multiple transformers with a low turns ratio.

The topology of the distributed LLC resonant converter is evaluated for high step-down ratio PoL conversion. The distributed topology consists of eight branches. Each branch exhibits a turns ratio of 8.5. An input voltage of 55 volts is converted to an output voltage of 0.8 volts. The passive  $LC$  elements, parasitic impedances, switches,



Figure 5.6: LLC resonant converter with distributed topology

and load behavior are identical for each branch. The performance is illustrated in Figure 5.7. The output voltage is stable around 0.8 volts with less than 3% ripple. Lower distortion of the sinusoidal current and smaller current spikes are achieved as compared to a single branch high turns ratio converter. A reduction of 90% in the magnitude of the current spikes is achieved by the distributed topology. Reductions in EMI levels are discussed in the following section.
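The relationship between the number of branches and the turns ratio of each transformer can be sketched with an ideal estimate. The example below assumes that the input voltage divides equally across the series connected primary stages and that the transformers are ideal; the function name is hypothetical.

```python
def branch_turns_ratio(v_in, v_out, n_branches):
    """Ideal turns ratio of each transformer when v_in divides equally
    across n_branches series connected primary stages."""
    if n_branches < 1:
        raise ValueError("at least one branch is required")
    return (v_in / n_branches) / v_out


V_IN = 55.0   # volts, input voltage from the text
V_OUT = 0.8   # volts, output voltage from the text

for n_branches in (1, 2, 4, 8):
    ratio = branch_turns_ratio(V_IN, V_OUT, n_branches)
    print(f"{n_branches} branch(es): turns ratio per transformer = {ratio:.2f}")
```

With eight branches, this ideal estimate gives a turns ratio of approximately 8.6 per transformer, close to the 8.5 used in the evaluation, as compared with 68.75 for a single branch converter. The evaluated value need not match the ideal estimate exactly since the parasitic voltage drops are ignored here.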

Power loss breakdown models of converters have been well researched [176–179].



Figure 5.7: Waveforms characterizing performance of distributed LLC resonant converter

Four components are considered in this chapter for the loss breakdown process: (1) primary side power MOSFETs, (2) transformer, (3) secondary side power MOSFETs, and (4) parasitic impedances. Note that a low  $R_{DS}$  MOSFET reduces the loss within the MOSFETs. PSPICE with the MOSFET model is used to determine both the switching and static power loss. The transformer loss, typically composed of core and winding losses, is a significant component of the total power loss of the converter [179]. The transformer loss varies significantly with different design specifications such as the core size, winding size, and winding material. One objective of this chapter is to evaluate the effects of EMI on the proposed resonant converter. Since the design of the transformer is out of the scope of this chapter, a 97% transformer efficiency is assumed for the power loss breakdown analysis.

The distribution of the power loss of each component in the eight branch distributed LLC resonant converter is illustrated in Figure 5.8. The transformers and MOSFETs within each of the branches are included. The secondary side MOSFETs contribute most of the total power loss due to the high currents flowing through the secondary stage and the large number of MOSFETs to support the eight branches. The maximum power efficiency of the distributed LLC resonant converter is 89.8%, as compared to a 91.7% maximum power efficiency in a single branch LLC resonant converter with the same step-down ratio. The distributed converter therefore exhibits greater total power losses as compared with the single branch resonant converter. Due

to the low turns ratio, the energy loss of each branch in the distributed topology is however less than that of the single branch resonant converter. The degradation in power efficiency in the distributed topology is small. The slightly reduced power efficiency is compensated by the high step-down ratio conversion in DPA systems, leading to comparable or greater system level power efficiency. Due to the lower turns ratio, the current waveform is also less distorted, producing lower EMI.
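One way to interpret the system level argument is sketched below. The sketch assumes (this is not stated in the text) that a lower step-down ratio converter would require an additional upstream conversion stage in a DPA system, and computes the upstream efficiency at which the two options break even. Only the two maximum efficiencies are taken from the text; the function names are hypothetical.

```python
# Maximum efficiencies reported in the text.
ETA_DISTRIBUTED = 0.898  # eight branch distributed LLC resonant converter
ETA_SINGLE = 0.917       # single branch LLC resonant converter


def cascaded_efficiency(*stages: float) -> float:
    """Overall efficiency of conversion stages connected in cascade."""
    eta = 1.0
    for stage in stages:
        eta *= stage
    return eta


def break_even_upstream_efficiency(eta_direct: float, eta_downstream: float) -> float:
    """Upstream stage efficiency at which a two stage path equals a direct path.

    Assumption: the two stage path is an upstream converter followed by
    the downstream converter, with no other losses.
    """
    return eta_direct / eta_downstream


eta_break_even = break_even_upstream_efficiency(ETA_DISTRIBUTED, ETA_SINGLE)
print(f"break-even upstream efficiency: {eta_break_even:.4f}")
print(f"check: {cascaded_efficiency(eta_break_even, ETA_SINGLE):.3f}")
```

Under this assumption, any upstream stage less efficient than the break-even value would make the direct high step-down ratio path the more efficient option at the system level.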

Due to the additional transformers and switches in the secondary stage, the area of the distributed LLC resonant converter increases linearly with the number of branches. The increase in area is dependent on the type of transformer and switch. Due to the lower current flowing through each branch within the distributed resonant



Figure 5.8: Power loss components for an eight branch distributed LLC resonant converter.

converter, a smaller switch is utilized. A smaller transformer is also used due to the lower turns ratio. The area of each branch within the distributed LLC resonant converter is therefore less than that of the single branch resonant converter. In one case study, the area of an eight branch distributed LLC resonant converter increased by 84.1% as compared with a single branch LLC resonant converter.
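The case study figure can be translated into an approximate area per branch. The following sketch normalizes the single branch converter area to one and assumes that the total area is shared evenly among the branches, which is a simplification since any shared circuitry is ignored.

```python
# Figures from the case study in the text; area is normalized to the
# single branch converter. Even sharing among branches is an assumption.
AREA_INCREASE = 0.841  # 84.1% increase for the eight branch converter
N_BRANCHES = 8


def per_branch_area(area_increase: float, n_branches: int) -> float:
    """Average normalized area of one branch of the distributed converter."""
    total_area = 1.0 + area_increase
    return total_area / n_branches


branch_area = per_branch_area(AREA_INCREASE, N_BRANCHES)
print(f"average area per branch: {branch_area:.3f} of the single branch converter")
```

Each branch therefore occupies, on average, less than a quarter of the single branch converter area in this case study.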

Note that the distributed converter topology is highly scalable. The number of branches is dependent on the circuit area, specific voltages, and performance requirements. This distributed topology is particularly advantageous when high step-down ratio conversion is required, as in DPA systems. Due to the EMI mitigation characteristics, as described in the following section, the distributed topology is highly applicable to noisy environments, such as systems-in-package, IoT, and wireless applications.

## 5.4 Near field EMI in SiP environment

Although a high step-down ratio PoL converter-in-package is a promising technology, high EMI levels have become significant, degrading the system level signals and compromising power integrity. Due to the close proximity in an SiP environment between the source of EMI and the victims, and the high voltages in a high step-down ratio converter-in-package, EMI within a package has become a significant issue in SiP-based PoL converters. Full wave electromagnetic (EM) simulations

are therefore described in this section to evaluate the EMI characteristics of the proposed distributed LLC resonant converter. EMI fundamentals are briefly summarized in Subsection 5.4.1. The SiP environment used to evaluate the EMI characteristics of the proposed distributed LLC resonant converter is described in Subsection 5.4.2, followed by the simulation setup. Simulation results are discussed in Subsection 5.4.3.

### 5.4.1 EMI background

EMI is the phenomenon where an EM disturbance generated by surrounding electronic devices affects the proper function of electrical circuits through a conducted or radiated path. Conducted EMI, also called coupling noise, consists of inductive and capacitive coupling through an electrical path. Conducted EMI filters have been widely used to reduce conducted EMI [180, 181]. Since the primary focus in this chapter is to evaluate how the resonant converter affects the electromagnetic environment within an SiP, conducted EMI, albeit important, is not the focus of this chapter. Radiated EMI is the undesired EM radiation generated by high frequency electronic devices without an electrical path. Due to the high operating frequency of the proposed distributed LLC resonant converter, radiated EMI is the primary issue in this converter.

The effects of EMI can be evaluated as two parts, the aggressor and the victim. Any electrical device or passive element can behave as an antenna, radiating an EM

wave to the surrounding environment. This structure is considered to be an EMI aggressor. The higher the frequency and the larger the antenna size, the stronger the EM radiation. Due to the high frequency, high power characteristics and large metal traces transferring high voltages, the PoL converter-in-package is considered here as an EMI aggressor. The entire system including the digital ICs and PoL converters is treated as an EMI victim. The surrounding environment outside the package is not considered; therefore, the near field range is the focus of this chapter.

### 5.4.2 EMI evaluation setup in SiP environment

The evaluation of package level EMI has recently drawn significant attention from the research community [121–124, 182, 183]. In [124], a thin micro-electro-mechanical system (MEMS) package is used to evaluate different EMI suppression methods. By redesigning the microbumps, adding ground vias, and applying a metal coating on the input/output ports, lower EMI is achieved. In [122], a near field EMI evaluation methodology has been developed for mobile DRAM applications. In [123], cavity resonance, as an EMI contributor, has been evaluated with a full wave solver. The effects on EMI within individual elements and possible EMI vulnerable spots in a package are also discussed. To the authors' best knowledge, no work has been published on evaluating EMI in an SiP environment where the PoL converters are treated as EMI aggressors.

The package utilized to evaluate EMI has been designed within an SiP environment, as illustrated in Figure 5.9. All of the power planes and via connections for the power distribution network exist within the package. No signal routing however exists within the package. As illustrated in Figure 5.9(a), three components, a digital



Figure 5.9: Package with a digital IC and two PoL converters. (a) Top view, and (b) side view.

IC and two PoL converters, are placed on the top side of the package, highlighted by the solid rectangles. Since the intended application of the SiP is high performance computing, the rectangle labeled as the digital IC can include a silicon interposer and high bandwidth memory. Alternatively, the rectangle labeled as a PoL converter represents the entire package of the proposed resonant converter, including the switching devices, transformer, and power/ground pinout. The circular pads shown in the figure are connection pins between the package and digital IC, and the PoL converters. The dashed circles illustrate the location of the via stack which transfers 55 volts from the PCB to the PoL converter. A side view of the SiP system is shown in Figure 5.9(b). Dedicated power planes transfer current from the PoL converters to the digital IC after the 55 volts are converted into 0.8 volts.

As previously mentioned, a source of radiated EMI behaves as an antenna. A large metal trace passing a high frequency, high voltage signal is a strong EMI aggressor. In the following simulation, the via stack, which transfers 55 volts to the PoL converter, is treated as an EMI aggressor. The metal trace within the PoL converter is not considered an EMI aggressor since the voltage on the metal trace within the PoL converter is only 0.8 volts after the conversion process. A metal coating can also be used to reduce EMI radiation around the PoL converter [124]. The EMI intensity is also related to the characteristics of the signal passing through the EMI aggressor (the EMI source) [184]. To compare the EMI levels of the proposed distributed LLC

resonant converter with the single branch LLC resonant converter, the current and voltage characteristics are extracted from Cadence Virtuoso simulations, as described in Section 5.3. A near field EMI simulation (using ANSYS SIwave [185]) is subsequently conducted for the entire package. The far field radiation of the PoL converter is also provided, comparing the EMI level with the International Special Committee on Radio Interference (CISPR) 22 standard, which is widely used for radiated and conducted EMI.

### 5.4.3 Simulation results and analysis

As previously mentioned, simulation of the near field EMI is conducted in Ansys SIwave, where the near field is evaluated on the surface of a cuboid that completely encloses the SiP. The offset of the X, Y, and Z axes of the cuboid is 3 mm. The observation surface illustrated in Figure 5.10 is the top surface of the cuboid.

The EMI level across the entire package, characterized as electric field intensity, is illustrated in Figure 5.10 as a contour plot. A layout of the metal layer and components of the package is also illustrated in Figure 5.10, depicting the near field distribution within the package. A darker color implies a higher EMI level. The closer to the EMI aggressor, the higher the EMI. The highest electric field intensity, around the via stack, is 740 V/m where the injected EMI is from the single branch LLC resonant converter, as illustrated in Figure 5.10(a). The highest electric field intensity,




Figure 5.10: Intensity of electric field across package. (a) Single branch LLC resonant converter, and (b) distributed LLC resonant converter.

around the via stack, is 210 V/m where the EMI source is the distributed LLC resonant converter, as illustrated in Figure 5.10(b). The distributed LLC resonant converter therefore exhibits more than 3X lower EMI than the single branch LLC resonant converter.

The 3 meter emission patterns of the standard resonant converter and distributed resonant converter are illustrated in Figure 5.11. A darker color implies a higher emission level. The emission pattern of the standard resonant converter (see Figure 5.11(a)) is similar to the distributed resonant converter (see Figure 5.11(b)) due to the identical operating frequency and package structure. The magnitude of the far field emission from the distributed resonant converter is lower than the standard resonant converter, matching the results in the near field EMI evaluation.

The far field radiation of the proposed distributed LLC resonant converter at a



Figure 5.11: Comparison of 3 meter far field EMI. (a) Standard resonant converter, and (b) distributed resonant converter.



Figure 5.12: Comparison between far field radiation of distributed LLC resonant converter at 3 meters and the CISPR 22 standard.

distance of 3 meters is illustrated in Figure 5.12. The frequency ranges from 0 to 500 MHz. A zoom-in around the resonant frequency of 2 MHz is also illustrated in Figure 5.12 to provide sufficient resolution. The highest field intensity at 3 meters, 42 dBuV/m, is observed at 2 MHz, the resonant frequency of the converter. The CISPR 22 class B radiated EMI limit at 3 meters (40 dBuV/m) is also illustrated in Figure 5.12 as the dashed line. The far field EMI is below 40 dBuV/m across the

Table 5.1: EMI characteristics of distributed LLC resonant converter with different number of branches

| Branch number | 1       | 2       | 4       | 8       |
|---------------|---------|---------|---------|---------|
| EMI level     | 740 V/m | 560 V/m | 384 V/m | 210 V/m |

entire spectrum except for the resonant frequency at 2 MHz. Note that far field EMI is not the focus of this chapter. Shielding techniques can reduce the far field EMI [124]. Near field EMI is however difficult to eliminate due to the compact nature of a system-in-package.
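The margin between the peak emission and the limit can be expressed in linear units. The following sketch uses only the two levels quoted in the text (42 dBuV/m and 40 dBuV/m) and the standard definition of dBuV/m as 20 log10 of the field strength in uV/m; the function name is illustrative.

```python
# Levels quoted in the text, both at a distance of 3 meters.
PEAK_DBUV_PER_M = 42.0   # highest far field intensity, observed at 2 MHz
LIMIT_DBUV_PER_M = 40.0  # CISPR 22 class B limit cited in the text


def dbuv_to_uv(level_db: float) -> float:
    """Convert a level in dBuV/m into a field strength in uV/m."""
    return 10.0 ** (level_db / 20.0)


margin_db = PEAK_DBUV_PER_M - LIMIT_DBUV_PER_M
peak_uv = dbuv_to_uv(PEAK_DBUV_PER_M)
limit_uv = dbuv_to_uv(LIMIT_DBUV_PER_M)

print(f"peak:  {peak_uv:.1f} uV/m")
print(f"limit: {limit_uv:.1f} uV/m")
print(f"excess: {margin_db:.1f} dB ({peak_uv / limit_uv:.2f}x in field strength)")
```

The 2 dB excess at the resonant frequency corresponds to a field strength roughly 26% above the limit.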

A comparison of near field EMI among distributed resonant converters with different branch numbers is listed in Table 5.1. A performance comparison between the single branch LLC resonant converter and distributed LLC resonant converter is listed in Table 5.2. Although the power efficiency of the distributed LLC resonant converter is lower than the single branch LLC resonant converter, the distributed LLC resonant converter utilizes a much higher step-down ratio. Importantly, the distributed LLC resonant converter exhibits significantly lower EMI as compared with the single branch LLC resonant converter. The high step-down ratio and low EMI characteristics of the proposed distributed LLC resonant converter make the distributed LLC resonant converter a promising candidate for PoL conversion in an SiP environment.
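The trend in Table 5.1 can be summarized as a reduction factor relative to the single branch converter. The following sketch uses only the peak electric field values listed in Table 5.1; expressing the reduction in decibels is an added convenience and is not part of the original analysis.

```python
import math

# Peak near field intensity (V/m) from Table 5.1, keyed by branch number.
EMI_V_PER_M = {1: 740.0, 2: 560.0, 4: 384.0, 8: 210.0}


def reduction_factor(branches: int) -> float:
    """Peak field of the single branch converter divided by the peak field
    of a distributed converter with the given number of branches."""
    return EMI_V_PER_M[1] / EMI_V_PER_M[branches]


def reduction_db(branches: int) -> float:
    """The same reduction expressed in decibels (field quantity)."""
    return 20.0 * math.log10(reduction_factor(branches))


for n in sorted(EMI_V_PER_M):
    print(f"{n} branch(es): {reduction_factor(n):.2f}x lower, {reduction_db(n):.1f} dB")
```

The eight branch converter exhibits a peak field approximately 3.5 times lower than the single branch converter, consistent with the "more than 3X" reduction reported above.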

Table 5.2: Comparison of single branch LLC resonant converter and distributed LLC resonant converter

|                      | $V_{in}$ | $V_{out}$ | Step down ratio | Turns ratio | EMI level | Ripple | Power efficiency |
|----------------------|----------|-----------|-----------------|-------------|-----------|--------|------------------|
| Single branch        | 12 V     | 0.8 V     | 15              | 15          | 740 V/m   | 4%     | 91.7%            |
| Distributed topology | 55 V     | 0.8 V     | 68.75           | 8.5         | 210 V/m   | 3%     | 89.8%            |

## 5.5 Summary

An LLC resonant converter operating at high frequencies to provide PoL DC-DC conversion is described in this chapter. The converter exhibits a stable load voltage with less than 4% ripple and a fast transient response of less than $1\,\mu\text{s}$. Distortion of the sinusoidal waveform produces a 90 ampere current spike due to the high turns ratio of the transformer. A distributed LLC resonant converter that supports high step-down conversion is therefore proposed. A stable voltage with less than 3% ripple is achieved. A reduction of nearly 90% in the magnitude of the current spikes is also achieved as compared to a single branch LLC resonant converter with a similar step-down ratio. More than 3X lower EMI in the distributed LLC resonant converter is demonstrated as compared with the single branch LLC resonant converter utilizing the same step-down ratio. The distributed converter topology is also highly scalable, and compatible with standard power conversion architectures.

## Chapter 6

# Power Noise and EMI in VR Top and Bottom Placements

Due to the desire for a small form factor and higher integration levels, systems-in-packages (SiP) have drawn significant attention from both the industrial and academic communities, targeting modern applications such as portable wireless systems and Internet of Things (IoT) devices. With SiP technology, voltage regulators (VR)-on-package have become increasingly popular for power delivery networks [13], supporting fast transient response and lower distribution loss [186]. The fast transient response is due to the close distance between the VR and integrated circuit (IC), which is key to enabling fine grained dynamic voltage frequency scaling. A point-of-load (PoL) VR is traditionally board mounted, supporting voltage conversion and regulation for the on-chip load through the power distribution network of the printed circuit board (PCB) and the package, as illustrated in Figure 6.1. As on-chip voltages scale below 0.8 volts, significant current flows through the resistive path, shown as

the dashed line in Figure 6.1. Greater power loss therefore occurs before reaching the on-chip load. Assuming a constant transmission power, a higher voltage leads to a lower current flowing through the transmission path. Resistive loss is therefore reduced. By moving the VR module from the PCB to the package, a high voltage (12 volts in the case study) is applied to the resistive path. Power loss due to the resistive path is therefore lower as compared with a VR-on-PCB topology. Moreover, higher



Figure 6.1: Board mounted VR. Note the resistive path (dashed line) between the VR and on-chip load

voltage transmission between the PCB/package interface requires less ball grid array (BGA) resources for the power distribution network, supporting a higher on-chip signal bandwidth.
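The benefit of transmitting power at a higher voltage follows from the quadratic dependence of resistive loss on current. The following is a minimal sketch for a fixed transmitted power; the path resistance of 1 mOhm is a hypothetical placeholder, the load current of 150 amperes is borrowed from the case study later in this chapter, and converter losses are ignored.

```python
V_LOW = 0.8    # on-chip supply voltage (V), from the text
V_HIGH = 12.0  # voltage applied to the resistive path in the case study (V)
I_LOAD = 150.0          # load current at 0.8 V (A), from the later case study
R_PATH_OHM = 1.0e-3     # hypothetical path resistance, for illustration only


def resistive_loss(power_w: float, voltage_v: float, resistance_ohm: float) -> float:
    """I^2 R loss when a fixed power is transmitted at a given voltage."""
    current = power_w / voltage_v
    return current ** 2 * resistance_ohm


power = V_LOW * I_LOAD  # transmitted power, held constant in both cases
loss_low = resistive_loss(power, V_LOW, R_PATH_OHM)
loss_high = resistive_loss(power, V_HIGH, R_PATH_OHM)
loss_ratio = loss_low / loss_high

print(f"loss at {V_LOW} V: {loss_low:.2f} W")
print(f"loss at {V_HIGH} V: {loss_high:.2f} W")
print(f"ratio: {loss_ratio:.0f}x")
```

The ratio depends only on the square of the voltage ratio, and is therefore independent of the placeholder resistance.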

Distribution losses in PCBs and packages are reduced in VR-on-packages due to the high voltage transmission path between the PCB and package. The power distribution network of the package, between the VR and on-chip load, needs to satisfy power integrity requirements. Overdesign of the redistribution layers may significantly increase package costs, while underdesign can lead to high IR drops, offsetting the power efficiency benefits of a VR-on-package.

As the operating frequency of the VR increases, electromagnetic interference (EMI) is another important concern, particularly in an SiP environment where the converters are placed close to the sensitive circuits [121, 122]. Not only may EMI affect the proper function of the digital circuits within the package, but EMI can also pollute the surrounding electromagnetic environment, which is crucial for wireless devices and IoT sensors. The process in which EMI is affected by the current profile flowing through a package and PoL converter is described in [29]. The physical design and topology of the VR-on-package also affect system EMI levels [123]. A comparison of the power integrity and EMI between the VR top and bottom placement within an SiP environment is therefore presented in this chapter.

The rest of the chapter is organized as follows. Two VR-on-package topologies,

VR top and bottom placement, are introduced and compared in Section 6.1. The design flow of the power distribution network of the package is summarized in Section 6.2. The design specifications of the two topologies of the VR-on-package are also described. In Section 6.3, the power integrity and near field EMI of the VR-on-package system are discussed. In Section 6.4, VR top and bottom placement topologies are compared in terms of EMI, IR drop, and power loss with the same number of layers in the package, ranging from 16 to 28 layers. Some conclusions are offered in Section 6.5.

## 6.1 Top and bottom placement

The topology of the VR-on-package determines the location of the VRs, the physical distance between the VRs and IC, and the design specifications of the power distribution network connecting the VRs to the IC. The power integrity and EMI performance can therefore vary with different topologies. The two topologies of a VR-on-package considered in this work, VR top and bottom placement, are illustrated in Figure 6.2. More complex system-in-package topologies [187] exploiting IC or package stacking technology [19] are not considered here.

As illustrated in Figure 6.2, the VR-on-package system consists of four parts: (1) an IC, (2) two VRs placed next to the IC, (3) a package that supports a system-in-package environment, and (4) decoupling capacitors placed on the bottom side of the



Figure 6.2: Sectional view of VR-on-package with two PoL converters placed next to the IC. The solid and dashed arrows depict the current path, respectively, between the ball grid array (BGA) power pins and the VRs, and between the VRs and the IC. The VRs are placed on (a) top, and (b) bottom.

package (not shown in the figure). Two voltage domains exist within the package power network. One domain is a high voltage (12 volt) power network, as illustrated by the red dashed box. The high voltage power network connects the ball grid array (BGA) power pins and the VRs, where low current is transferred from the BGA power pins to the VRs, as the thin arrow indicates. The other power domain is a low voltage (0.8 volt) power network, as illustrated by the black dashed box. A low voltage power

network connects the VRs and IC, where high current is transferred from the VRs to the on-chip power network through the 0.8 volt power network, as the thick arrow indicates. Note that in the VR top placement topology, the 12 volt power network is composed of multiple via stacks, connecting the BGA power pins directly to the VRs, as illustrated in Figure 6.2(a). Alternatively, a larger power plane is used in the VR bottom placement topology for the 12 volt power network, as illustrated in Figure 6.2(b). Since VR and IC design processes are not the focus of this chapter, only the pinout of these two modules is assumed to affect the package design characteristics.

As illustrated in Figure 6.2, the major difference between these two topologies is the location of the VRs. In Figure 6.2(b), not only does the location of the VRs affect the pinout topology of the BGA, but the VRs also occupy the bottom side of the package where the decoupling capacitors are located. The other major difference is the RDL between the VR and IC. In Figure 6.2(a), current is transferred horizontally from the VR to the IC once the voltage is converted to 0.8 volts. Note in Figure 6.2(b), the current is transferred vertically from the bottom to the top of the package.

By moving the VR from the top to the bottom of the package, the die size is no longer limited by the two VRs. The VR bottom placement topology can therefore support larger die sizes. Moreover, a less resistive RDL is achieved in the VR bottom placement topology due to the short vertical distance between the VRs and the IC.

Alternatively, the size of the power plane connecting the BGA to the VR (see Figure 6.2(b)) is much larger than the via stacks shown in Figure 6.2(a). These power networks, transferring a high voltage, can behave as an antenna, radiating an electromagnetic (EM) wave to the surrounding environment. The larger the physical size, the stronger the EM radiation. EMI is therefore a greater concern in the VR bottom placement topology. A quantitative comparison in terms of power integrity and EMI between these two topologies is therefore valuable for package design guidelines and characterization. The difference between these two topologies leads to different design principles and power integrity and EMI characteristics, which are discussed, respectively, in Sections 6.2 and 6.3.

## 6.2 Package design specifications

Packages that support VR top and bottom placement topologies are evaluated based on the design flow illustrated in Figure 6.3. The design flow includes (1) block diagram and schematic capture in Cadence OrCAD [188], (2) package layout design in Allegro [189], and (3) DC IR and EMI evaluation in Ansys SIwave [185]. Five modules are included in the schematic design in OrCAD: (1) IC, (2) VRs, (3) BGA, (4) decoupling capacitors, and (5) test signals. The footprint and pinout topology of each module are created to generate a netlist. The netlist is imported into Allegro, where the floorplan of each module, RDL, and test signal routing path is described.

Note that although the BGA module is included in the package design, the following EMI and DC IR evaluation is focused on the package without considering the BGA, package lid, and heat sink.

VR top and bottom placement topologies follow different design principles due to the differences in physical structure. A case study evaluating these two topologies is therefore conducted, targeting a server package application. The design specifications



Figure 6.3: VR-on-package design and evaluation flow

Table 6.1: Design specifications of packages supporting VR top and bottom placement topologies

|                                                 | VR top  | VR bottom |
|-------------------------------------------------|---------|-----------|
| Total metal layers                              | 28      | 16        |
| Package core layers                             | 14      | 2         |
| Package thickness ( $\mu\text{m}$ )             | 2,025   | 1,930     |
| Package size (mm)                               | 62 x 62 | 62 x 62   |
| Decoupling capacitance ( $\mu\text{F}$ )        | 2,892   | 2,880     |
| 12 volt power plane footprint ( $\text{mm}^2$ ) | 4       | 28.2      |
| Number of BGA pins                              | 2,780   | 2,780     |

of these two topologies are listed in Table 6.1. Note that the focus of this chapter is the design of the power network of the package to support a VR-on-package. In general, the IC, package, and VR are designed separately within an industrial design flow. Co-design of the package, IC, and VRs is out of the scope of this chapter. The pinout of the IC and VR is therefore assumed to be fixed in this chapter. The size of the package, constrained by the size of IC and VR, is therefore also assumed fixed.

Notably, the total number of metal layers in the VR top topology is 28, as compared with 16 layers in the VR bottom topology. The current path between the VR and IC in the VR top topology is in the horizontal direction, which is much longer than the vertical path through the package in the VR bottom topology. The relatively long current path within the VR top topology leads to a more resistive RDL, increasing the IR drops within the system. Multiple power plane stacking is therefore necessary to reduce the resistance of the RDL. The central core layers of the package

are utilized for the RDL due to the greater metal thickness. In the VR top placement topology, most of the decoupling capacitors are placed on the bottom of the package under the IC. Due to the area congestion at the bottom of the package in the VR bottom placement topology, some decoupling capacitors are placed on top of the package surrounding the IC. The number of BGA pins is maintained constant in these two VR topologies. The pinout topology of the BGA is however slightly different due to the change in location of the VRs.

## 6.3 EMI and power noise evaluation

As previously mentioned, EMI can be a major issue in SiP systems, particularly in VR-on-packages, where high voltage is transferred through the package [190, 191]. Power distribution networks that pass high frequency and high voltage signals can behave as an antenna, effectively becoming an EMI aggressor. In the following analysis, the power distribution network connecting the BGA power pins and VRs, which is 12 volts, is treated as an EMI aggressor. In this work, the VRs are not considered as an EMI aggressor since the voltage on the metal trace within the VR module is 0.8 volts. A metal coating can also be used to reduce EMI radiation around the VRs [124].

EMI is also related to the characteristics of the signal passing through the aggressor. To eliminate the effect of the signal, the same voltage and current profiles are

applied to both VR topologies. The initial current and voltage characteristics are extracted from [29], where the operating frequency of the resonant converter is 2 MHz. The spectrum of the near field simulation therefore ranges from 0 to 2.1 MHz, where the entire spectrum is divided into two regions. One region ranges from 0 to 1.9 MHz (100 frequencies). The other region ranges from 1.9 to 2.1 MHz (100 frequencies) to provide sufficient resolution around the solution frequency of 2 MHz. A near field EMI simulation using SIwave is conducted for the entire package.
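The two region frequency sweep can be sketched as follows. Linear spacing with both endpoints included in each region is an assumption, since the text specifies only the range and the number of frequencies per region; the helper function is illustrative and is not an SIwave interface.

```python
def linear_sweep(start_hz: float, stop_hz: float, count: int) -> list[float]:
    """Evenly spaced frequencies from start to stop, endpoints included."""
    step = (stop_hz - start_hz) / (count - 1)
    return [start_hz + i * step for i in range(count)]


# Region boundaries and point counts from the text.
coarse = linear_sweep(0.0, 1.9e6, 100)  # 0 to 1.9 MHz
fine = linear_sweep(1.9e6, 2.1e6, 100)  # 1.9 to 2.1 MHz, around 2 MHz

coarse_step = coarse[1] - coarse[0]
fine_step = fine[1] - fine[0]

print(f"coarse region step: {coarse_step / 1e3:.2f} kHz")
print(f"fine region step:   {fine_step / 1e3:.2f} kHz")
print(f"resolution gain:    {coarse_step / fine_step:.1f}x")
```

Under these assumptions, the frequency resolution near 2 MHz is 9.5 times finer than in the remainder of the spectrum, with the same number of simulated frequencies in each region.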

The EMI level in terms of electric field intensity across the entire package for the two VR topologies is illustrated in Figure 6.4 as a contour plot. The observation surfaces are 3 mm above the top metal layer and beneath the bottom metal layer of the package for, respectively, the VR top and bottom placement topologies. The layout of the metal layer and components of the package are also illustrated in Figure 6.4 depicting the near field distribution with respect to the package. Note that the layout shown in Figure 6.4(a) is the top metal layer and the electrical components on top of the package, whereas the layout shown in Figure 6.4(b) is the bottom metal layer and the electrical components at the bottom of the package. Consider, as an example, the VR top placement shown in Figure 6.4(a). Three components, a digital IC and two PoL converters, are placed on the top side of the package, highlighted by the solid rectangles. The circular pads shown in the figure are connections between the package and digital IC, and the PoL converters. The dashed circles illustrate the



Figure 6.4: Intensity of electric field across package. (a) VR top placement topology, and (b) VR bottom placement topology.

location of the via stack which transfers 12 volts from the PCB to the PoL converters. In the VR bottom placement, the VRs are placed on the opposite side of the digital IC. Only the VRs are therefore illustrated in Figure 6.4(b).

For the scale of the field distribution illustrated in Figure 6.4, a darker shade implies a higher EMI level. As expected, the highest EMI level is exhibited around the high voltage power network in both VR topologies. The highest electric field intensity is 210 V/m in the VR top placement topology, as illustrated in Figure 6.4(a). The highest electric field intensity is 701 V/m in the VR bottom placement topology, as



Figure 6.5: Variation of IR drop and power loss with different number of core layers in the VR top topology

illustrated in Figure 6.4(b). More than 3X higher EMI is therefore exhibited in the VR bottom placement topology as compared with the VR top placement topology due to the larger power network connecting the BGA power pins to the VRs in the VR bottom topology.

Another important characteristic of a package is power integrity. Due to the large horizontal distance between the VR and IC, a greater IR drop is expected in the VR top placement topology, as described in Section 6.1. The VRs are modeled as voltage sources supplying 0.8 volts to the IC. The VR pins are grouped into  $V_{dd}$  and  $V_{ss}$ , the two voltage sources. The IC is modeled as a current load draining a constant 150 amperes through the power and ground package pins. The 150 amperes is assumed to be evenly distributed among the power/ground pins of the IC module. The DC IR simulation is subsequently conducted in Ansys SIwave.

The package resistance, the worst case IR drop (the greatest IR drop between the VR and the IC power pins), and the power loss through the package are listed in Table 6.2. The power loss in the VR bottom placement topology is lower than the VR top placement topology due to the short vertical distance between the VR and IC in the VR bottom topology. The vertical distance between the VR and IC in the VR bottom topology is the thickness of the package, about 2 mm, as listed in Table 6.1. The horizontal distance between the VR and IC in the VR top placement topology is however half of the width of the package, about 3 centimeters, as listed in Table 6.1.

A dedicated RDL is therefore added to the VR top placement topology to reduce the package resistance. The tradeoff is the additional RDL within the package.

As listed in Table 6.2, the extracted package resistance matches the worst case IR drop. Notably, the worst case IR drop in the VR bottom placement topology is larger than the VR top placement topology, which is unexpected given the shorter vertical current path described in Section 6.1. This effect occurs since the effective resistance of the power network of the package in the VR bottom topology is greater than in the VR top topology. Due to the pinout mismatch between the VRs and IC in the VR bottom topology, current flows horizontally within the power plane, leading to current crowding. The effective resistance of the power network within the package in the VR bottom topology is therefore greater due to this current crowding effect.
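The agreement between the extracted resistance and the worst case IR drop can be verified with Ohm's law. The following sketch uses the 150 ampere load and the values listed in Table 6.2, and also reproduces the relative comparison row of the table; the function names are illustrative.

```python
I_LOAD_A = 150.0  # constant load current drawn by the IC (A)

# Values from Table 6.2.
R_TOP_OHM = 33.3e-6     # package resistance, VR top
R_BOTTOM_OHM = 39.3e-6  # package resistance, VR bottom
LOSS_TOP_W = 0.58       # power loss, VR top
LOSS_BOTTOM_W = 0.38    # power loss, VR bottom


def ir_drop_mv(current_a: float, resistance_ohm: float) -> float:
    """IR drop in millivolts across a lumped resistance."""
    return current_a * resistance_ohm * 1e3


def relative_difference_pct(top: float, bottom: float) -> float:
    """Comparison used in Table 6.2: (top - bottom) / bottom, in percent."""
    return (top - bottom) / bottom * 100.0


drop_top = ir_drop_mv(I_LOAD_A, R_TOP_OHM)
drop_bottom = ir_drop_mv(I_LOAD_A, R_BOTTOM_OHM)

print(f"VR top IR drop:    {drop_top:.2f} mV")
print(f"VR bottom IR drop: {drop_bottom:.2f} mV")
print(f"resistance difference: {relative_difference_pct(R_TOP_OHM, R_BOTTOM_OHM):.1f}%")
print(f"power loss difference: {relative_difference_pct(LOSS_TOP_W, LOSS_BOTTOM_W):.1f}%")
```

The computed drops round to the 5.0 mV and 5.9 mV listed in Table 6.2, and the relative differences match the comparison row.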

Table 6.2: IR drop and power loss of package in VR top and bottom placement topologies.

|                                    | Package resistance | Worst case IR drop | Power loss |
|------------------------------------|--------------------|--------------------|------------|
| (i) VR top                         | 33.3 $\mu\Omega$   | 5.0 mV             | 0.58 W     |
| (ii) VR bottom                     | 39.3 $\mu\Omega$   | 5.9 mV             | 0.38 W     |
| Comparison, $\frac{(i)-(ii)}{(ii)}$ | -15.3%            | -15.3%             | 52.6%      |

In the VR bottom topology, two VRs are connected to the bottom layer of the package, transferring current from the PCB to the power network of the package. Alternatively, the IC is connected to the top layer of the package, receiving current from the power network of the package. The pinout mismatch refers to the situation in which the power and ground (P/G) pins on top of the package do not overlap with the P/G pins at the bottom. This structure is due to the independent development of the package, IC, and VRs within different industrial design flows. The horizontal current in our work refers to the undesired current, flowing horizontally from the P/G vias to the P/G vias within the adjacent layer through the power plane within the package. Significant current flows over a long horizontal distance, leading to a highly resistive package. The power network for the package in the VR bottom topology ensures that the horizontal current flows gradually across multiple power planes to alleviate current crowding. The mismatch between the pinout of the VRs and IC is therefore an important issue in SiP power integrity, particularly for the VR bottom placement topology.

As discussed in the previous section, the core layers play an important role in reducing the resistance of an RDL. Additional core layers are utilized in the VR top topology to overcome the high resistance of the horizontal current paths. A case study has been included here to quantitatively evaluate the effects of the number of core layers on IR drop and power loss within a package. As listed in Table 6.1, 14 and 2 core layers are used, respectively, in the VR top and bottom topologies. In this case study, packages with 2, 4, 6, 8, 10, and 12 core layers are considered. Note that the core layers include both a power and ground plane, and the number of core layers is therefore even. The variation of IR drop and power loss in terms of the number of core layers is illustrated in Figure 6.5. With the same number of core layers as the VR bottom topology, the VR top topology exhibits much greater IR drop and power losses.

Figure 6.6: Power delivery model for evaluating $L di/dt$ noise of VR top and bottom topologies

$L di/dt$ noise is another important power integrity characteristic of a package. A case study is therefore described to evaluate the $L di/dt$ noise of both the VR top and bottom topologies. As illustrated in Figure 6.6, a power delivery model is described based on the power delivery model of the Intel 850 Chipset Platform [192]. The current variation profile utilized in this case study is extracted from [192], as well as the on-chip resistance and capacitance. The package resistance $R_p$ and inductance $L_p$ are extracted from the package described in the previous sections. The values of $R_p$ and $L_p$ are listed, respectively, in Tables 6.2 and 6.3, as illustrated by the gray column. The $L di/dt$ noise of the VR top and bottom topologies is listed in Table 6.3 for a specific current load profile and decoupling capacitance. The placement and size of the decoupling capacitors are based on [13, 157, 159].
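A first-order estimate of the inductive noise can be obtained directly from the package inductance and the maximum current slew rate. The sketch below is only an illustration: it ignores the resistive drop, the on-chip capacitance, and the decoupling capacitors of the model in Figure 6.6, so it is not expected to reproduce the simulated noise values exactly. The inductance and $di/dt$ values are those listed in Table 6.3, and the function name `l_didt_noise` is hypothetical.

```python
# Values copied from Table 6.3: package inductance (H) and maximum di/dt (A/s).
MAX_DIDT_A_PER_S = 385e6  # 385 A/us

package_inductance_h = {"VR top": 32e-12, "VR bottom": 11e-12}


def l_didt_noise(inductance_h, didt_a_per_s=MAX_DIDT_A_PER_S):
    """Magnitude of the first-order inductive noise, L * di/dt, in volts."""
    return inductance_h * didt_a_per_s


if __name__ == "__main__":
    estimates = {name: l_didt_noise(l) for name, l in package_inductance_h.items()}
    for name, noise_v in estimates.items():
        print(f"{name}: L di/dt estimate = {noise_v * 1e3:.2f} mV")

    # The ratio between the two topologies depends only on the inductance ratio.
    ratio = estimates["VR top"] / estimates["VR bottom"]
    print(f"VR top / VR bottom noise ratio = {ratio:.2f}")
```

In this simplified view, the ratio of the noise in the two topologies depends only on the ratio of the package inductances, which is why the lower inductance of the VR bottom topology is advantageous.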

## 6.4 Package layer comparison

Table 6.3: $L di/dt$ comparison between the VR top and bottom topologies.

|           | Package inductance | Max $di/dt$    | $L di/dt$ noise |
|-----------|--------------------|----------------|-----------------|
| VR top    | 32 pH              | 385 A/$\mu$s   | -13.8 mV        |
| VR bottom | 11 pH              | 385 A/$\mu$s   | -4.7 mV         |

As previously mentioned, the total number of metal layers in the VR top placement is 28, as compared with 16 layers in the VR bottom placement. The number of layers plays an important role in reducing the resistance of an RDL. Additional metal layers are therefore utilized in the VR top placement topology to overcome the high resistance of the horizontal current path. A case study quantitatively characterizes the effects of the additional package layers on the IR drop and power loss within the package of the VR top placement. As listed in Table 6.1, 28 and 16 metal layers are used, respectively, in the VR top and bottom topologies. Packages with 16, 18, 20, 22, 24, 26, and 28 metal layers are considered in this case study. The variation of IR drop and power loss in terms of the number of package layers is illustrated in Figure 6.5. The power loss within the VR top placement is significantly lower with additional package layers. An 89.5% reduction in power loss is achieved with 28 package layers as compared with 16 package layers. The worst case IR drop is also reduced with additional package layers. The additional package layers are however less effective in reducing the IR drop than in reducing the power loss.

The EMI level, IR drop, and power loss of the VR top and bottom topologies with, respectively, 28 and 16 layers are compared in this section. Another case study compares the VR top and bottom placement with the same number of package layers, ranging from 16 to 28 layers. Similar to the previous case study, the number of layers in the package is varied by adding or removing core layers. A comparison of the VR top and bottom placement in terms of EMI, IR drop, and power loss is illustrated in Figure 6.7 with 16 to 28 package layers. The EMI level of the VR top placement is much smaller than the VR bottom placement, as illustrated in Figure 6.7(a). Note that the EMI level slightly decreases with additional package layers in the VR top placement, whereas the VR bottom placement exhibits an almost constant EMI level with a different number of layers. This behavior occurs because the additional package layers do not affect the electric field beneath the package. With fewer layers, the IR drop within the VR top placement is greater than the VR bottom placement due to the horizontal current path, as illustrated in Figure 6.7(b). Additional layers lower the IR drop within the VR top placement. The IR drop within the VR top placement is eventually lower than the VR bottom placement as the number of layers increases. The power loss within the package is lower in both the VR top and bottom placement with additional package layers. Additional package layers are more effective in reducing the power loss in the VR top placement than in the VR bottom placement, as illustrated in Figure 6.7(c).

Figure 6.7: Comparison between VR top and bottom placement with number of package layers ranging from 16 to 28 layers. (a) EMI, (b) worst case IR drop, and (c) power loss [193].

The EMI level, IR drop, and power loss, depending upon the VR placement topology, play an important role in the performance and power efficiency of the VR-on-package system. Alternatively, the number of package layers significantly affects the package cost [194]. Tradeoffs between the VR topology and number of package layers should therefore be carefully considered. For a system with strict EMI requirements, VR top placement should be utilized. Additional package layers should also be considered to reduce the power loss if the current demand is high. Alternatively, for a system which does not have strict EMI requirements, the VR bottom topology should be utilized due to the advantages of lower IR drop and power loss with fewer package layers.

## 6.5 Summary

A comparison between VR top and bottom placement within an SiP, targeting a specific high performance server application, is provided in this chapter. The EMI, power integrity, and power loss of a 28 layer VR top and a 16 layer VR bottom topology are evaluated. The VR top topology exhibits lower worst case IR drop and much lower EMI as compared with the VR bottom topology. The tradeoffs are however a larger power loss and higher cost. The effect of the number of package layers on EMI, power integrity, and power loss within the VR top and bottom placements is also discussed. The worst case IR drop and package power loss of the VR top placement topology decrease with a greater number of layers, while the EMI, worst case IR drop, and power loss in the VR bottom placement topology remain similar with a greater number of package layers.

## Chapter 7

# Insertion Loss Due to Placement of Multiple Waveguide Crossings

Multi-core systems, such as a multi-core CPU, high performance GPU, and machine learning accelerator, are primary elements in high performance computing (HPC) due to the high data rates. Core-to-core and core-to-memory communication limits the performance of these multi-core systems. To achieve high bandwidth, increasingly complex interconnect networks are used, where additional metal layers are commonly utilized for signal routing. Furthermore, these complex interconnect networks together with higher clock frequencies have made interconnect coupling noise a significant issue in electrical circuits. The metal interconnect also dissipates significant dynamic power, leading to greater power consumption, hitting the “power wall.”

A promising technology to alleviate these bottlenecks of IC speed and power consumption is the optical network-on-chip (ONoC), which has attracted significant attention from both academia and industry due to the high bandwidth, power efficiency, and CMOS compatibility [195–197]. The interconnect network within an ONoC primarily consists of waveguides, as a medium to pass light, and optical routers, which direct the light signal to different circuits [198, 199]. Consider a $2 \times 2$ ring resonator-based $\lambda$-router as an example, which is the elemental component of a large size $\lambda$-router and generic wavelength-routed optical router (GWOR) [198]. A $2 \times 2$ ring resonator-based $\lambda$-router consists of two waveguides crossing each other and two ring resonators. The ring resonators exhibit a specific resonant frequency which can be affected by the electrical devices surrounding the rings [196, 197]. As illustrated in Figure 7.1(a), when the electrical device is off, the ring resonators do not interfere with the light passing through the waveguide crossings. When the electrical device is on, the electrical device modifies the resonant frequency of the rings, ensuring that the light couples to the ring resonators, changing the direction of the light.

Figure 7.1: Waveguide crossing, (a) a $2 \times 2$ $\lambda$-router, which consists of waveguides and micro-ring resonators, and (b) signal routing within a complex photonic system.

Table 7.1: Interconnect loss of different components in a silicon photonic system [202]

| Components  | Loss value |
|-------------|------------|
| Crossing    | 0.52 dB    |
| Bending     | 0.005 dB   |
| Drop        | 0.013 dB   |
| Propagation | 1.5 dB/cm  |

A waveguide crossing structure, a unique structure in silicon photonic systems, is formed in a $2 \times 2$ $\lambda$-router, as illustrated in Figure 7.1(a). These crossing structures do not exist in traditional electrical interconnect. Vias and additional metal layers are used to avoid crossing electrical signals. Light can however pass through a waveguide crossing, as the light signal is confined within the same waveguide after passing through a crossing, albeit with some insertion loss. A large number of crossings exist within the routers of an ONoC system. For example, 73 crossings are observed in an 8 x 8 $\lambda$-router [200]. Waveguide crossings are inevitable outside the routers due to the growth of integration complexity of silicon photonic systems [201], where light signals are routed within a single waveguide layer, as illustrated in Figure 7.1(b). Moreover, the contribution of the insertion loss due to these waveguide crossings to the total insertion loss within the silicon photonic system can be much greater than the other components. The propagation and bending loss of the waveguide are much smaller than the crossing loss, as listed in Table 7.1. In a 16 x 16 $\lambda$-router, the waveguide crossings contribute 38.25 dB of insertion loss, where the insertion loss of the overall system is 44 dB [202]. An ONoC place and route tool minimizing the insertion loss of the waveguide crossings is therefore required.

The rest of the chapter is organized as follows. Prior work on single waveguide crossing optimization and related algorithms to reduce the total number of crossings in a silicon photonic system is discussed in Section 7.1. The effect of the placement of multiple crossings on the total insertion loss is discussed in Section 7.2. The cause of this phenomenon is discussed in Section 7.3. A case study of a waveguide crossing placement within an 8 x 8 GWOR router is described in Section 7.4. Some conclusions are offered in Section 7.5.

## 7.1 Previous Work

Low loss, low crosstalk waveguide crossings have been discussed, including direct crossings within the same layer and multi-layer waveguide crossings [203–206]. Direct waveguide crossings refer to crossings fabricated within a single layer and without any additional photonic devices. As illustrated in Figure 7.2, direct crossings include single mode, multimode interference-based, elliptical, and four fold symmetric elliptical crossings [203–205].

Figure 7.2: Direct waveguide crossing types. (a) Single mode crossing, (b) multimode interference-based crossing, (c) elliptical crossing, and (d) four fold symmetric elliptical crossing.

Due to the CMOS compatibility and relatively simple fabrication of direct crossings, significant improvements have been achieved over the past few years in fabricating direct crossings [199, 204, 207]. A multimode interference-based waveguide crossing with a lateral taper structure is proposed in [204]. CMOS compatibility is achieved with an insertion loss of 0.1 dB per crossing. Waveguide crossing arrays have also been recently developed [199, 207] by allocating multiple crossings next to each other with a constant pitch, achieving ultra-low insertion loss per crossing. The effect of crossing placement on routing ONoC systems has however not been discussed.

Modern complex silicon photonic systems with millions of transistors and hundreds of photonic components are currently in development [201]. Thousands of photonic components will soon be integrated into silicon-based photonic systems, where waveguide crossings can contribute significant loss, reducing the system power efficiency. Furthermore, traditional electrical routing tools do not support crossing structures. A waveguide crossing-aware routing tool is therefore highly desirable to support the development of complex silicon photonic systems.

A place and route tool for ONoC routers has been proposed [200, 202], where significantly lower insertion loss is achieved by minimizing the number of waveguide crossings within the router given a loss weight ratio between the crossing and propagation loss. The total insertion loss of the ONoC  $Loss_{tot}$  is

$$Loss_{tot} = Loss_p + Loss_b + Loss_d + Loss_c, \quad (7.1)$$

where  $Loss_p$ ,  $Loss_b$ ,  $Loss_d$ , and  $Loss_c$  are, respectively, the insertion loss of propagation, bending, drop, and crossings [200].  $Loss_c$  is

$$Loss_c = 0.52\,\text{dB} \cdot N_c, \quad (7.2)$$

where the insertion loss per crossing is assumed to be 0.52 dB, and $N_c$ is the total number of crossings in an ONoC. The effect of the placement of multiple crossings on the crossing loss is however not considered within routing algorithms. The loss due to waveguide crossings, as determined by (7.2), is therefore not accurate. An enhanced routing algorithm considering the effect of waveguide crossing placement is therefore highly desirable.
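The linear loss model of (7.1) and (7.2) can be sketched in a few lines. The per-component values are those listed in Table 7.1. Treating the bending and drop terms as a per-event loss multiplied by a count is an assumption made here by analogy with (7.2), and the function name and the example path are hypothetical.

```python
# Per-component losses copied from Table 7.1.
CROSSING_LOSS_DB = 0.52
BENDING_LOSS_DB = 0.005
DROP_LOSS_DB = 0.013
PROPAGATION_LOSS_DB_PER_CM = 1.5


def total_insertion_loss_db(length_cm, n_bends, n_drops, n_crossings):
    """Linear loss model: Loss_tot = Loss_p + Loss_b + Loss_d + Loss_c."""
    loss_p = PROPAGATION_LOSS_DB_PER_CM * length_cm
    # Assumption: bending and drop losses scale with the number of events,
    # in the same way as the crossing loss in (7.2).
    loss_b = BENDING_LOSS_DB * n_bends
    loss_d = DROP_LOSS_DB * n_drops
    loss_c = CROSSING_LOSS_DB * n_crossings
    return loss_p + loss_b + loss_d + loss_c


if __name__ == "__main__":
    # Hypothetical path: 1 cm long, 4 bends, 1 drop, 10 crossings.
    loss = total_insertion_loss_db(length_cm=1.0, n_bends=4, n_drops=1, n_crossings=10)
    print(f"Total insertion loss = {loss:.3f} dB")
```

In this model the crossing term depends only on the count $N_c$, so two layouts with the same number of crossings always receive the same crossing loss regardless of where the crossings are placed.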

## 7.2 Placement of multiple waveguide crossings

In existing routing algorithms, the total signal loss of a waveguide system is based on (7.1) and (7.2). Finite-difference time-domain (FDTD) simulations or experimental measurements however do not support this loss model. In this section, three examples of waveguide crossings are evaluated. The signal loss of a waveguide is shown here to not increase linearly with the number of waveguide crossings; furthermore, the location of the waveguide crossing within a waveguide also affects the total signal loss. The effect of the location of the waveguide crossings on signal loss has however not been thoroughly discussed in the literature.

To demonstrate that the placement of the waveguide crossings affects the signal loss of a waveguide system, three experiments in OptiFDTD [208], a 3-D FDTD simulation tool, are described here. The waveguide system within the three experiments consists of a long horizontal waveguide for passing light and several short vertical waveguides to create crossings (see Figure 7.4(a)). The length of the long and short waveguides is, respectively, $24 \mu\text{m}$ and $3 \mu\text{m}$. The width of the long and short waveguides is $0.5 \mu\text{m}$. A light source, a Gaussian modulated continuous wave, is injected from the left side of the long waveguide. The time and frequency domain characteristics of the light source are illustrated in Figure 7.3. Two observation points are included within the long waveguide to capture the intensity of the light signal entering and departing the waveguide system. The location of the two observation points is fixed to ensure the same waveguide system for all three experiments. The signal loss of the waveguide system is therefore determined by comparing the signal intensity at these two observation points.

### 7.2.1 Example one

Figure 7.3: Input light source. (a) Time domain, and (b) frequency domain.

In experiment one, the effect of the location of a single waveguide crossing on the total signal loss is evaluated. The simulation setup in OptiFDTD is illustrated in Figure 7.4(a). The long horizontal waveguide is placed at the center of the signal light path. Two observation points are located at, respectively, $x = 1.5 \mu\text{m}$ and $x = 11.5 \mu\text{m}$. A Gaussian modulated continuous wave with a $1.4 \mu\text{m}$ wavelength is utilized as the input signal light, denoted as the strip illustrated in Figure 7.4(a). In this experiment, six scenarios are considered with six different vertical waveguide locations, $x = 2.5 \mu\text{m}$, $x = 4 \mu\text{m}$, $x = 5.5 \mu\text{m}$, $x = 7 \mu\text{m}$, $x = 8.5 \mu\text{m}$, and $x = 10 \mu\text{m}$. 3-D FDTD simulations are conducted for each location of the vertical waveguide. The normalized signal intensity at each observation point and the signal loss of the entire waveguide system are illustrated in Figure 7.4(b). The signal loss of the waveguide system ranges from 1 dB to 1.2 dB for the six scenarios, where both the length of the waveguide and the number of crossings are maintained constant. Varying the location of the waveguide crossings leads to a change in the signal loss of the system.



Figure 7.4: Effect of the placement of a single waveguide crossing on the signal loss. (a) Experimental setup, and (b) FDTD simulations.

### 7.2.2 Example two

The effect of the relative location of two waveguide crossings on the total signal loss is evaluated in experiment two. The simulation setup in OptiFDTD is illustrated in Figure 7.5(a). The setup for the horizontal waveguide, observation points, and input signal light are the same as in experiment one. Two vertical waveguides are evenly placed between the two observation points along the horizontal waveguide, as illustrated in Figure 7.5(a). Four scenarios are considered in this experiment based on four different separations between the two waveguide crossings, 2 $\mu\text{m}$, 4 $\mu\text{m}$, 6 $\mu\text{m}$, and 8 $\mu\text{m}$. Similar to the first experiment, 3-D FDTD simulations are conducted for each scenario. The normalized signal intensity at each of the observation points and the signal loss of the overall waveguide system are illustrated in Figure 7.5(b). The signal loss of the overall waveguide system ranges from 1.8 dB to 2.3 dB for the four scenarios, where both the length of the waveguide and the number of crossings are maintained constant. Varying the relative location between the two waveguide crossings leads to a change in the signal loss of the system.

Figure 7.5: Effect of the placement of two waveguide crossings on signal loss. (a) Experimental setup, and (b) FDTD simulations.

Figure 7.6: Effect of the placement of multiple waveguide crossings on signal loss. (a) Experimental setup, and (b) FDTD simulations.

### 7.2.3 Example three

The effect of the number of waveguide crossings on the total signal loss and average loss per crossing is evaluated in experiment three. The simulation setup in OptiFDTD is illustrated in Figure 7.6(a). The setup for the horizontal waveguide, observation points, and input signal light are the same as in the previous experiments. Multiple vertical waveguides are evenly allocated between the two observation points along the horizontal waveguide, as illustrated in Figure 7.6(a). Ten scenarios are considered in this experiment, with the number of vertical waveguides allocated between the two observation points ranging from 1 to 10. 3-D FDTD simulations are conducted for each scenario. The signal loss of the entire waveguide system and the loss per waveguide crossing with a different number of crossings are illustrated in Figure 7.6(b). The signal loss of the waveguide system ranges from 1.2 dB to 8.7 dB for the ten scenarios. As illustrated in Figure 7.6(b), the loss per crossing is not constant for a different number of waveguide crossings although the light source and overall distance are the same. In fact, the loss per crossing varies with the number of crossings in the waveguide system, ranging from 0.78 dB to 1.18 dB. The variation in the total number of crossings in the waveguide system therefore leads to a change in the loss per crossing. In the following section, multimode interference and Bloch waves are introduced to explain this phenomenon. From a system level perspective, the standard assumption that the total signal loss is the number of waveguide crossings multiplied by a constant loss per crossing is not accurate. Future work in terms of loss analysis and optimization considering the placement of the waveguide crossings is therefore desirable to precisely capture the total signal loss of a complex photonic system.

## 7.3 Discussion

For a single waveguide crossing, this phenomenon can be explained by multimode interference (MMI) [209]. As illustrated in Figure 7.2(b), the size of a crossing area is larger in a multimode interference-based crossing to support multiple modes. The self-imaging principle in MMI is widely used in waveguide crossings to minimize insertion loss [210–212]. Based on the principle of self-imaging in MMI, each mode supported by the waveguide propagates with a characteristic phase velocity. The image of the input field therefore periodically evolves along the multimode waveguide. For self-imaging, the propagation constant $\beta_n$ of mode $n$ is

$$\beta_n = \beta_0 - n(n + 2)\pi/3L_\pi, \quad (7.3)$$

where  $\beta_0$  is the propagation constant of the fundamental mode, and  $L_\pi$  is the beat length of the self-imaging process [210]. One can further show that

$$\beta_n = \beta_0 \sqrt{1 + \frac{K_{T0}^2 - K_{Tn}^2}{\beta_0^2}}, \quad (7.4)$$

where

$$K_{Tn} = (n + 1)\pi/W_{en} \quad (7.5)$$

is the wave number of mode  $n$ , and  $W_{en}$  is the effective width of the MMI for the  $n^{th}$  mode [213]. For a symmetric MMI waveguide supporting only three modes, the self-imaging condition is

$$\beta_n - \beta_0 = 2\pi N/L_{MMI}, \quad (7.6)$$

where  $N$  is an integer, and  $L_{MMI}$  is the length of the MMI waveguide [199].
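The relation between (7.3) and (7.5) can be sketched as follows. This is a minimal derivation under two assumptions that are not stated in the text: the effective width is the same for all guided modes, $W_{en} \approx W_e$, and the paraxial condition $K_{Tn}^2 \ll \beta_0^2$ holds. The symbols $k_0$ (free space wave number) and $n_r$ (effective index of the multimode region) are introduced here only for illustration. Starting from the dispersion relation of the multimode region,

$$\beta_n^2 = k_0^2 n_r^2 - K_{Tn}^2 \quad \Rightarrow \quad \beta_n^2 - \beta_0^2 = K_{T0}^2 - K_{Tn}^2,$$

a first-order expansion of the square root gives

$$\beta_n = \beta_0 \sqrt{1 + \frac{K_{T0}^2 - K_{Tn}^2}{\beta_0^2}} \approx \beta_0 + \frac{K_{T0}^2 - K_{Tn}^2}{2\beta_0}.$$

With $K_{Tn} = (n + 1)\pi/W_e$,

$$K_{Tn}^2 - K_{T0}^2 = \frac{\left[(n + 1)^2 - 1\right]\pi^2}{W_e^2} = \frac{n(n + 2)\pi^2}{W_e^2},$$

so that

$$\beta_0 - \beta_n \approx \frac{n(n + 2)\pi^2}{2\beta_0 W_e^2} = \frac{n(n + 2)\pi}{3L_\pi}, \qquad L_\pi = \frac{2\beta_0 W_e^2}{3\pi}.$$

Under these assumptions the spacing of the propagation constants grows as $n(n + 2)$, which is the property that allows the input field to be reproduced periodically along the multimode region.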

A rule of thumb in the design and optimization of multimode interference-based waveguide crossings is to control the convergence of the mode at the cross-sectional area to minimize scattering. One objective is to ensure that the mode evolution is symmetric before and after the crossing area [212]. Placement of the crossings with a different pitch, for example, breaks the symmetric characteristics of the mode evolution, leading to additional crossing loss. Although no additional multimode interference area exists around the crossings in the experiments, as described in Section 7.2, similar reasoning can be applied. Higher order modes are excited within the waveguide crossing due to the loss of confinement of the light. The evolution of the mode is therefore affected by the distance to the next waveguide crossing, leading to a variation in the loss per crossing. Thus, the location of an individual crossing and the distance between crossings affect the loss per crossing, respectively, in experiments one and two.

In a waveguide crossing array, where multiple waveguide crossings are placed along a long waveguide with a constant pitch, as described in experiment three in Section 7.2, the effect of the placement of the waveguide crossings on the insertion loss per crossing is explained by a Bloch wave. As shown in Figure 7.6, multiple periodic structures are produced by the waveguide crossing arrays with a different number of crossings and a variable pitch. A Bloch wave is formed within the waveguide crossing array, where the energy of the light depends upon the Bloch mode. Similar to the mode within a waveguide crossing, a Bloch mode can be tuned by varying the geometric parameters of the waveguide system, for example, the spacing between crossings. A low loss Bloch mode is achieved in [199, 207] by changing the pitch of the waveguide crossings within a crossing array. Scattering is significantly suppressed through the low loss Bloch mode, leading to ultra-low loss per crossing. The same concept has also been discussed in the context of free space beam waveguides [214]. Waveguide crossing arrays with a different number of crossings and line pitch, as described in experiment three, also exhibit a different loss per crossing.

## 7.4 Case Study of an 8 x 8 GWOR ONoC Router

The effect of the placement of the crossings on the insertion loss is discussed in previous sections. Neglecting this effect within existing ONoC routing algorithms, as well as the lack of insight into existing low loss waveguide crossing paths, can lead to overdesigned waveguide crossings and therefore lower system power efficiency. To demonstrate the effect of this phenomenon on a practical ONoC circuit and routing algorithm, a case study of an 8 x 8 GWOR router is described in this section.

A router is a key element in an ONoC, requiring a large number of waveguide crossings. An 8 x 8 micro-ring GWOR router is considered here. Multiple topologies of micro-ring routers exist in ONoC systems, including a crossbar, hitless router, GWOR, and  $\lambda$ -router [198]. A GWOR router is considered in this work due to the advantage of utilizing fewer micro-rings and evenly distributed waveguide crossings as compared with other topologies such as a  $\lambda$ -router [215]. A comparison of different router topologies is not the intention of this chapter.

A schematic of an 8 x 8 GWOR router is illustrated in Figure 7.7(a). Each of the eight ports contains an input and output port depicted, respectively, by $I$ and $O$. The ports are placed at the periphery of the router for enhanced routability and scalability. Micro-ring resonators select and change the direction of the light, routing the light signals to different ports. Six wavelengths, from $\lambda_1$ to $\lambda_6$, are required to achieve full routability in an 8 x 8 GWOR router. Consider the path from $I_6$ to $O_2$. The light signal enters the router from $I_6$ and propagates along the waveguide until the signal reaches the waveguide crossing next to micro-ring $\lambda_3$. The micro-ring is activated to couple the light signal to the waveguide perpendicular to the original waveguide, arriving at port $O_2$.

Figure 7.7: 8 x 8 GWOR router. (a) Demonstration of router topology with waveguide and micro-ring resonators, and (b) FDTD simulation setup for worst case scenario.

Table 7.2: Comparison of 8 x 8 GWOR routers.

|                 | PROTON [202] | Manual [215] | Case Study Router |
|-----------------|--------------|--------------|-------------------|
| Crossing number | 38           | 72           | 24                |
| Max loss (dB)   | 8.4          | 21.4         | 1.1               |

The transmission loss of the light signal passing through the waveguide crossings along a path within the router is evaluated. The worst case scenario, the path from $I_3$ to $O_4$, is considered in this case study due to the large number of waveguide crossings. The simulation setup is the same as discussed in Section 7.2. The pitch between adjacent waveguide crossings is 10 $\mu\text{m}$, including the space for the micro-rings between crossings, as illustrated in Figure 7.7(b). The micro-rings are inactive. Two observation points, at ports $I_3$ and $O_4$, are included within the waveguide to determine the transmission loss through this path. The horizontal power transmission relative to the input wave is evaluated, yielding a transmission ratio of 78%. Increasing the waveguide crossing pitch from 10 $\mu\text{m}$ to 12 $\mu\text{m}$ increases the transmission ratio to 80%. The insertion loss can therefore be reduced by placing the waveguide crossings at a slightly different pitch, although the total length of the waveguide is increased. Existing routing algorithms would instead predict that the same number of waveguide crossings with a longer waveguide leads to greater loss.
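The transmission ratios reported for the two pitches can be converted into insertion loss in decibels. The following is a minimal sketch using the standard power ratio definition of the decibel; the function name `insertion_loss_db` is illustrative.

```python
import math


def insertion_loss_db(transmission_ratio):
    """Insertion loss in dB for a power transmission ratio between 0 and 1."""
    return -10.0 * math.log10(transmission_ratio)


if __name__ == "__main__":
    # Transmission ratios reported for the worst case path of the case study.
    loss_10um = insertion_loss_db(0.78)  # 10 um crossing pitch
    loss_12um = insertion_loss_db(0.80)  # 12 um crossing pitch
    print(f"10 um pitch: {loss_10um:.2f} dB")
    print(f"12 um pitch: {loss_12um:.2f} dB")
    print(f"Reduction:   {loss_10um - loss_12um:.2f} dB")
```

A transmission ratio of 78% corresponds to approximately 1.08 dB, consistent with the maximum loss of 1.1 dB listed for the case study router in Table 7.2.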

A comparison among different 8 x 8 GWOR routers is listed in Table 7.2, including an automatically generated router [202], a manually designed router [215], and the 8 x 8 GWOR router discussed in this case study. The case study router presented here exhibits the smallest number of crossings due to the characteristics of a GWOR [198], leading to smaller loss. Since the focus of the case study is waveguide crossing placement, the worst case scenario does not consider bending and drop loss, which are included in the other two routers [202, 215]. This difference further explains why the maximum loss of the case study router is much smaller than the other two routers.

## 7.5 Summary

The effect of the placement of multiple waveguide crossings on system level signal loss is discussed in this chapter. Three experiments with different waveguide crossing placement scenarios are evaluated to determine the total signal loss. It is observed that the location of a single waveguide crossing, the relative location of two crossings, and the total number of crossings can affect the loss per crossing. The location of the waveguide crossings therefore affects the total system loss. This phenomenon is explained by the Bloch wave and the self-imaging principle of multimode interference. The worst case insertion loss of an 8 x 8 GWOR router is reduced by changing the waveguide crossing pitch. Optimization algorithms that consider signal loss due to waveguide crossing placement are therefore necessary for the development of future photonic systems.

## Chapter 8

# Design Guidelines for RDL-Based Power Networks

Improvements in microprocessors have slowed and become more costly due to the increasing challenges in scaling CMOS technology [33]. By exploiting the vertical direction, three-dimensional integrated circuits (3-D ICs) provide a promising solution to extend scaling [19]. Vertical integration in 3-D ICs achieves a smaller die as well as significantly shorter global interconnects, leading to higher system performance and lower power dissipation. Multiple forms of 3-D ICs exist, including monolithic, contactless, and through silicon via (TSV) based topologies. TSVs are the key technology in TSV-based 3-D ICs, where signal communication and power distribution between layers are achieved with, respectively, signal TSVs and power/ground (P/G) TSVs. 3-D ICs are also a natural platform for heterogeneous integration, where layers can be individually designed, optimized, and fabricated with different semiconductor processes [17].



Figure 8.1: Current path within a 3-D power distribution network consisting of vertical current paths through the P/G TSVs and horizontal current paths within each 2-D IC.

Despite the successful commercial development of 3-D products such as 3-D DRAM memory, high bandwidth memory, and field programmable gate arrays (FPGAs) [75–78], 3-D ICs suffer from power integrity issues within the 3-D power delivery systems [25, 216, 217]. Due to the introduction of P/G TSVs, the current path within a complex 3-D system can be highly sophisticated, affecting both the reliability and performance of 3-D power delivery systems. As discussed in Chapter 2, a model of a 3-D power delivery system consists of a conventional 2-D power grid and P/G TSVs, which are serially connected to adjacent 2-D power grids. Two current paths therefore exist within a 3-D power delivery system, as illustrated in Figure 8.1. As highlighted by the dashed arrow in Figure 8.1, the horizontal current path designates the power network transferring current from a nearby P/G TSV within a 2-D power grid. The magnitude and path of this current are set by the on-chip current demand of this 2-D layer. Alternatively, the vertical current path transfers current from the bottom P/G TSVs to the upper P/G TSVs, as highlighted by the thick arrow shown in Figure 8.1. This vertical current path is achieved by the serially connected P/G TSVs between layers. The magnitude of the vertical current within a specific P/G TSV is set by the total current demand of all of the layers above the TSV.
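The dependence of the vertical current on the stacked loads can be written compactly. The expression below is a sketch under stated assumptions rather than a result taken from the analysis: the symbols $I_{TSV}^{n}$ and $I_{load}^{k}$ are introduced here only for illustration, the layers are numbered from the bottom of an $m$ layer stack, and the P/G TSVs of layer $n$ are assumed to also supply layer $n$ itself.

$$I_{TSV}^{n} = \sum_{k=n}^{m} I_{load}^{k}$$

If every layer draws the same current $I_{load}$, this sum reduces to $(m-n+1)I_{load}$, which is the factor multiplying $R_{TSV}$ in (8.1) and (8.2).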

The development of a power delivery network within a 3-D IC is highly challenging due to these vertical current paths. One of the key factors affecting the current path is the P/G TSVs [84–88, 217–222]. Another key factor is the redistribution layer (RDL) within the 3-D power delivery network. This topic has however to date received little attention from the research community [223–225]. An RDL within a 3-D power delivery system is illustrated in Figure 8.2, where a face-to-back stacked topology is depicted. As illustrated in Figure 8.2(a), the RDL is between the substrate of layer N and the metal layer of layer N+1. As illustrated in Figure 8.2(b), the RDL behaves as the interface between the P/G TSVs and adjacent P/G TSVs, and between the P/G



Figure 8.2: RDL as an interface between a P/G TSV and an adjacent P/G TSV, and between a P/G TSV and a 2-D power grid. (a) The location of the RDL within a 3-D IC between two adjacent layers; and (b) a zoom-in of the RDL, where the RDL supports both horizontal and vertical current paths.

TSVs and the 2-D power grid, which support, respectively, the vertical path and horizontal path. The RDL is therefore a critical component in the power delivery system within a 3-D IC. An RDL plays an even more impactful role within heterogeneous 3-D systems, where each layer is individually designed, optimized, and fabricated. A comprehensive evaluation of RDLs with different 3-D manufacturing processes as well as the effect of an RDL on the power delivery system is described in this chapter. In addition, an analysis of a grid-based RDL and design guidelines for RDLs to tackle power integrity challenges in heterogeneous 3-D systems are discussed.

The rest of the chapter is organized as follows. Prior work related to RDLs in 3-D ICs is discussed in Section 8.1. Technical background characterizing an RDL is also introduced. The effect of different 3-D integration topologies and TSV fabrication technologies on an RDL is discussed in Section 8.2. In Section 8.3, the effect of a grid-based RDL on the power integrity of a 3-D system is discussed. Multiple scenarios are introduced to demonstrate the advantages of a grid-based RDL for heterogeneous 3-D systems. Some conclusions are offered in Section 8.4.

## 8.1 Background and previous work

In conventional 2-D ICs, the RDL is effectively within the top metal layer, above the back-end-of-line (BEOL) and beneath the controlled collapse chip connection (C4) bumps, as illustrated in Figure 8.3. Note that the RDL is a separate fabrication process, which is after the BEOL and prior to the bonding process [226]. The bond pads and C4 bumps are connected by the RDL, and the C4 bumps are connected to the



Figure 8.3: RDL as an interface between the IC and package.

bond pads on the package. The RDL is the interface between the IC and package. In general, within an industrial design flow, the IC and package are designed separately [193]. Any mismatches between the package pinout and on-chip pinout can therefore be corrected by the RDL as a routing layer between the package and IC [226].

In a 2.5-D system, the RDL is generally known as the metal lines within the silicon interposer between the IC and package, as discussed in Chapter 2. As illustrated in Figure 2.8, the vertical connections between the package and ICs are achieved with fine pitch TSVs. In addition, the ICs are interconnected through the RDL within the interposer for inter-chip communication. The RDL is the interface between multiple ICs within a 2.5-D system.

RDLs in 3-D integrated systems have not attracted significant attention from either industry or academia. Little work has been published on RDLs [223–225, 227], and no clear definition of a 3-D RDL exists in the literature. A 3-D RDL is defined in this work as: 1) the metal lines connecting the TSVs within layer N to the TSVs within layer N+1, and 2) the metal lines connecting the TSVs within layer N to the on-chip interconnect within layer N. An RDL is briefly mentioned in a 3-D integrated environment [228–230]. However, in these examples [228–230], the RDL is between the bottom layer and the package, which is effectively a 2-D RDL. In [225, 231], a novel layer bonding technique is proposed for via-first 3-D integration, where a damascene patterned metal/adhesive RDL is achieved within the bonding layer. In [224], two methods for fabricating an RDL in TSV-based 3-D ICs are introduced. A comparison between these two methods as well as some fabrication guidelines are also provided. In [223], the signal integrity of 3-D interconnect is discussed with different TSV fabrication techniques. A physical model of a TSV-RDL-bump interconnect is provided, followed by extraction of the impedance characteristics. In [227], the RDL between the C4 and micro-bumps is investigated in a two layer 3-D system. An IR drop analysis of different RDL schemes and TSV fabrication technologies is also included. Interactions between inter-layer currents due to the RDL are however not considered in [223, 227].

Note that the focus of this chapter is on RDLs for power delivery systems rather than for signal routing. Optimization and modeling of 3-D power networks have been investigated, but these works [83–89, 221, 222] do not consider power/ground (P/G) RDL networks, leading to poorly designed or overdesigned 3-D power networks. A comprehensive analysis of the necessity and functionality of the RDL in 3-D power networks is therefore provided in this chapter. Intra-layer and inter-layer current transfer within an RDL for 3-D power delivery is discussed here for the first time.

## 8.2 RDL with different 3-D manufacturing methods

As discussed in the previous section, the P/G RDL in a 3-D IC is the interface between the P/G TSVs and adjacent 3-D layers, and between the TSVs and the 2-D power grid within the same 3-D layer. The TSVs therefore significantly affect the RDL. Multiple TSV fabrication processes exist, whose effect on the P/G RDL is discussed in Section 8.2.1. The effect of 3-D stacking topologies on the P/G RDL is reviewed in Section 8.2.2.

### 8.2.1 RDL within different TSV fabrication processes

Based on the TSV fabrication process, four different TSV types exist: via-first, via-middle, backside via-last, and frontside via-last. Depending upon the connection of the P/G TSVs to the 2-D power grid, these four TSV types are grouped into two categories. Via-first, via-middle, and backside via-last TSVs are described as a type A TSV, where the P/G TSVs are connected to the 2-D power grid through the BEOL layers. A frontside via-last TSV is described as a type B TSV, where the P/G TSVs are connected to the 2-D power grid through an additional RDL.

#### 8.2.1.1 RDL within type A TSV fabrication process

In the via-first TSV method, TSVs are fabricated before the front-end-of-line (FEOL) process. Via-first TSVs are connected to the top metal layer by the subsequent FEOL and BEOL processes. As illustrated in Figure 8.4(a), the TSV does not pass through the metal layer, saving on-chip metal resources. To endure the high temperatures during the FEOL process, polysilicon is typically utilized as the fill material for the via-first TSVs, leading to high resistance TSV paths. Via-middle TSVs are fabricated after the FEOL and prior to the BEOL process. Similar to the via-first method, the TSVs rely on the BEOL process to connect to the top metal layer. The choice of fill material for the via-middle method is however relaxed due to the lower temperature during the BEOL process. Tungsten is typically utilized, which exhibits a higher conductivity as compared with polysilicon.

Alternatively, in the via-last method, TSVs are fabricated after the BEOL process. A higher conductivity material, for example, copper, can therefore be utilized as the fill material for the TSVs. Depending upon whether the TSV etching process starts from the front or the back of the wafer, two types of via-last methods exist: backside via-last and frontside via-last. In the backside via-last method [232, 233], the etching starts



Figure 8.4: Type A TSV and current path between the TSV and load. (a) Cross sectional view, and (b) lumped circuit model.

from the silicon substrate and ends at the bottom layer of the BEOL, for example, M1. The TSVs are therefore quite similar to the via-first and via-middle methods in terms of the connection between the TSV and the power grid, as illustrated in Figure 8.4(a). These TSVs, for convenience, are described in this chapter as type A TSVs.

Although the fabrication process, physical size, and material of the TSVs vary significantly among the type A TSVs, the current paths are quite similar, as illustrated in Figure 8.4(a). The current path between the via-first TSV and the load is depicted. Note that the current path from the load to the ground network is identical to the path within the power network, and is therefore not illustrated in Figure 8.4(a). As highlighted by the solid arrow line, current from layer N-1 initially passes through the P/G TSVs in layer N. Due to the connection between the P/G TSV and the BEOL metal lines, current is transferred from the TSVs to the global power grid through a via stack [33]. The current is subsequently distributed to the local circuits through the power network within layer N. Alternative current paths may exist [89], as illustrated by the dashed arrow line in Figure 8.4(a). In these alternative paths, current is directly transferred from the P/G TSVs to the local power metal lines, for example, M2 or M3. The current is subsequently transferred to the local circuits, bypassing the via stacks and global power grid. This alternative current path is a physical design option, which does not apply to all 3-D systems [89]. In addition, alternative current paths utilizing local power metal lines exhibit significant resistance due to interconnect scaling and the relatively large pitch of the P/G TSVs [234].

A lumped circuit model of a single layer of a 3-D power network with a type A TSV is discussed here. The current path within a single layer is illustrated in Figure 8.4(b), and can be applied to other layers. The P/G TSVs, 2-D power grid, and load are included in the circuit model, as illustrated in Figure 8.4(b). $R_{TSV}$, $R_{stack}$, $R_{MT}$, $R_{grid}$, and $R_{M1}$ represent, respectively, the resistance of the P/G TSV, via stacks between the TSV and the top metal layer, top metal line connecting the via stacks to the 2-D power grid, 2-D power grid, and alternative current paths within the lower metal layer. The voltage drop within a single layer of a 3-D power network with a type A TSV follows from this circuit model. The voltage drop on layer $n$ within an $m$ layer 3-D IC is

$$V_{drop}^{n,m} = I_{load}\left[(m-n+1)R_{TSV} + \frac{R_{M1}R_{stack}(m-n+1) + R_{M1}(R_{MT} + R_{grid})}{R_{M1} + R_{MT} + R_{grid} + R_{stack}}\right], \quad (8.1)$$

where the current demand of each layer is $I_{load}$. Based on (8.1), a high current $I_{load}(m-n+1)$ passes through the $R_{stack}$ and $R_{TSV}$ path.
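As a minimal sketch, (8.1) can be transcribed directly into a short Python function to explore how the voltage drop changes with the layer index. The function name `vdrop_type_a` and all of the numerical values are placeholders chosen for illustration; they are not taken from this chapter.

```python
def vdrop_type_a(n, m, i_load, r_tsv, r_stack, r_mt, r_grid, r_m1):
    """Voltage drop on layer n of an m layer stack, transcribed from (8.1)."""
    k = m - n + 1  # factor (m - n + 1) in (8.1)
    numerator = r_m1 * r_stack * k + r_m1 * (r_mt + r_grid)
    denominator = r_m1 + r_mt + r_grid + r_stack
    return i_load * (k * r_tsv + numerator / denominator)


# Placeholder values (amperes and ohms), assumed only for this example.
params = dict(i_load=0.1, r_tsv=0.01, r_stack=0.05,
              r_mt=0.02, r_grid=0.1, r_m1=1.0)

for layer in (1, 2, 3):
    drop = vdrop_type_a(layer, 3, **params)
    print(f"layer {layer}: {drop * 1e3:.3f} mV")
```

For the top layer ($n = m$), the fraction in (8.1) reduces to $R_{M1}$ in parallel with $R_{stack} + R_{MT} + R_{grid}$, which provides a quick sanity check on the transcription.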

Two types of 3-D RDLs, including the connection between the TSVs and 2-D power grid and between the TSVs and adjacent TSVs, are named, respectively, a type one and type two RDL. In type A TSVs, a dedicated type one RDL is not required as the via stacks within the BEOL connect the TSVs to the global power grid, as illustrated in Figure 8.4. A type two RDL connects the 2-D power grid in layer N to the P/G TSVs in layer N+1, as illustrated in Figure 8.5. Due to the individual manufacturing process of each 3-D layer, mismatches between the power grid in layer N and the P/G TSVs in layer N+1 exist. A type two RDL is therefore required in general 3-D ICs. This issue has been overlooked in the literature, where a perfect match between the power grid pad and the TSV distribution is assumed [75–78].

#### 8.2.1.2 RDL within type B TSV fabrication process

Alternatively, in the frontside via-last method [235], the etching starts from the top metal layer of the BEOL and the TSV passes through the entire layer, creating metal routing blockages, as illustrated in Figure 8.6. Due to the unique fabrication process, these TSVs are named here as type B TSVs. Note that type B TSVs are fabricated after the BEOL and passivation processes are completed. In addition,



Figure 8.5: Cross sectional view of type two RDL for type A TSVs. The type two RDL connects the bond pad of the power grid in layer N to the P/G TSVs in layer N+1.

type B TSVs pass through the entire layer, including both FEOL and BEOL. No metal line connections therefore exist between the TSVs and the global power grid, as illustrated in Figure 8.6(a). Also note that the alternative current path in the type A TSVs does not exist in type B TSVs. This difference is due to the insulation layer process during the TSV fabrication stage [19]. No metal connection therefore exists between the TSVs and the local metal power lines.

As highlighted by the solid arrow line shown in Figure 8.6(a), current is vertically transferred from layer N-1 to layer N through the P/G TSVs. The current is subsequently transferred from the P/G TSVs to the global power grid, as illustrated by the dashed line shown in Figure 8.6(a). As previously discussed, no BEOL metal lines exist to support this horizontal current path. A dedicated type one RDL above the passivation layer is therefore required to connect the P/G TSVs to the global power grid. Note that this requirement is one of the major differences between type A and type B TSVs. Similar to type A TSVs, the current is distributed to the loads through the power network within layer N, as highlighted by the solid arrow line shown in Figure 8.6(a).

A lumped circuit model of a single layer within a 3-D power network with type B TSVs is described here. The current path within a single layer is illustrated in Figure 8.6(b), and can be applied to other layers. $R_{TSV}$, $R_{RDL_1}$, and $R_{grid}$ represent, respectively, the resistance of the type B TSV, the type one RDL connecting a TSV



Figure 8.6: Type B TSV and current path between TSV and load. (a) Cross sectional view, and (b) lumped circuit model.

to a 2-D power grid, and the 2-D power grid. Note that $R_{stack}$ and $R_{MT}$ in a type A TSV do not exist in a type B TSV, and are replaced, respectively, by $R_{TSV}$ and $R_{RDL_1}$. The voltage drop within a single layer of a 3-D power network with a type B TSV on layer $n$ within an $m$ layer 3-D IC is

$$V_{drop}^{n,m} = I_{load}[(m - n + 1)R_{TSV} + R_{RDL_1} + R_{grid}], \quad (8.2)$$

where the current demand of each layer is $I_{load}$. Based on (8.2), the high current $I_{load}(m - n + 1)$ passes only through the $R_{TSV}$ path.
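The following sketch evaluates (8.2) and separates the contribution of the TSV from the contribution of the type one RDL and power grid. The function name `vdrop_type_b` and the numerical values are assumptions made only for illustration.

```python
def vdrop_type_b(n, m, i_load, r_tsv, r_rdl1, r_grid):
    """Voltage drop on layer n of an m layer stack, transcribed from (8.2).

    Returns the total drop and its TSV and RDL/grid components.
    """
    k = m - n + 1                       # factor (m - n + 1) in (8.2)
    tsv_part = i_load * k * r_tsv       # only this term scales with k
    rdl_grid_part = i_load * (r_rdl1 + r_grid)
    return tsv_part + rdl_grid_part, tsv_part, rdl_grid_part


# Placeholder values (amperes and ohms), assumed only for this example.
for layer in (1, 2, 3):
    total, tsv_part, rdl_grid_part = vdrop_type_b(
        layer, 3, i_load=0.1, r_tsv=0.01, r_rdl1=0.03, r_grid=0.1)
    print(f"layer {layer}: total {total * 1e3:.2f} mV, "
          f"TSV {tsv_part * 1e3:.2f} mV, RDL+grid {rdl_grid_part * 1e3:.2f} mV")
```

Only the TSV term changes with the layer index in this sketch, reflecting the observation that the high current in (8.2) is confined to the $R_{TSV}$ path.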

A type two RDL is also required to support TSV-to-TSV connections between adjacent layers that exhibit a TSV mismatch. A comparison of specifications among via-first, via-middle, backside via-last, and frontside via-last TSVs is listed in Table 8.1, including the TSV type, RDL type, and TSV to power grid connection. As highlighted in Table 8.1, only the frontside via-last TSV requires both types of RDLs. A circuit model is described in the following section for both types of RDLs and includes the interface between adjacent 3-D layers.

### 8.2.2 RDL with different 3-D stacking topologies

In Section 8.2.1, a back-to-face stacking topology is assumed in the discussion of the effect of the TSV fabrication technology on the P/G RDL. A back-to-face topology is widely used in 3-D ICs due to the ability to scale multi-layer stacks [19, 88]. Other topologies exist in 3-D ICs, including face-to-back, face-to-face, and back-to-back [19]. These stacking topologies can also affect the current path and P/G RDL in 3-D ICs. Since both the face-to-face and back-to-back topologies can only be applied to a two

Table 8.1: Comparison of different TSV fabrication methods.

| Type of TSV fabrication | TSV type A | TSV type B | Type one RDL | Type two RDL | TSV-grid connection | Etching side | Passing layer       | Resistivity |
|-------------------------|------------|------------|--------------|--------------|---------------------|--------------|---------------------|-------------|
| Via-first               | ✓          |            |              | ✓            | BEOL                | Substrate    | Substrate           | High        |
| Via-middle              | ✓          |            |              | ✓            | BEOL                | Substrate    | Substrate           | Medium      |
| Backside via-last       | ✓          |            |              | ✓            | BEOL                | Substrate    | Substrate           | Low         |
| Frontside via-last      |            | ✓          | ✓            | ✓            | Type one RDL        | Metal        | Substrate and metal | Low         |

Table 8.2: Comparison of the back-to-face and face-to-back 3-D stacking topologies.

|                                           | Back-to-face                     | Face-to-back                      |
|-------------------------------------------|----------------------------------|-----------------------------------|
| Supported TSV type                        | Type A and B                     | Type A and B                      |
| Type one RDL connection                   | TSV (layer n) to grid (layer n)  | TSV (layer n) to grid (layer n+1) |
| Type two RDL connection                   | TSV (layer n) to TSV (layer n+1) | TSV (layer n) to TSV (layer n+1)  |
| Number of TSV layers in an M layer 3-D IC | M                                | M-1                               |

layer 3-D IC, these topologies are not considered here. The face-to-back topology, alternatively, is a popular topology, utilized in many 3-D ICs [75–78].

Consider an example of a type B TSV. A comparison of the P/G RDL between the back-to-face and face-to-back topology is illustrated in Figure 8.7. Both type one and two RDLs are required in these two topologies, where the connections to the RDLs are somewhat different. A type two RDL connects the TSV of layer N to the TSV of layer N+1 in both topologies. A type one RDL, alternatively, connects the TSV of layer N to the power grid of layer N and layer N+1, respectively, in the back-to-face and face-to-back topologies, as listed in Table 8.2. The current flowing to the loads in layer N is therefore transferred from the P/G TSV within layer N in the back-to-face topology. In contrast, the same current is transferred from the P/G TSV within layer N-1 in the face-to-back topology. Note that this difference in current paths leads to different design methodologies and optimization processes for these two types of topologies, a topic which has been ignored in the literature. For example, the design parameters of a P/G TSV within layer N, such as the physical size, number, and distribution style, should be chosen based on the power consumption and load distribution in layer N+1 in a face-to-back topology.

A circuit model of the current path and P/G RDL within different 3-D stacking topologies is based on the single layer circuit model illustrated in Figures 8.4(b) and 8.6(b). A type two RDL,  $R_{RDL2}$ , as the interface between  $layer_n$  and  $layer_{n+1}$ , is



Figure 8.7: Cross sectional view of current path and P/G RDL for type B TSVs, (a) back-to-face stacking topology, and (b) face-to-back stacking topology.



Figure 8.8: Circuit model of current path and P/G RDL for type B TSVs, (a) back-to-face stacking topology, and (b) face-to-back stacking topology.

illustrated in Figure 8.8(a). The dashed lines represent the border among  $layer_n$ ,  $layer_{n+1}$ , and  $layer_{RDL}$ . Note that a flip chip technology is assumed in the circuit

model, where the current passes from the bottom layer to the top layer. To evaluate the effect of the 3-D stacking topology on the power noise, a seven layer 3-D power network is considered. In both topologies, the bottom layer is connected to a 0.8 volt  $V_{DD}$ . Circuit parameters,  $R_{RDL1}$ ,  $R_{RDL2}$ ,  $R_{TSV}$ ,  $R_{grid}$ , and  $Load$ , are the same for the two topologies. The load current is modeled as a DC current source.

The voltage drop at layer three is evaluated for the two topologies, as illustrated in Figure 8.9. The solid and dashed lines represent, respectively, the voltage drop in the back-to-face and face-to-back topologies. Increasing the load current in a layer adjacent to layer three from 10 mA to 500 mA, while maintaining a constant load current in the other layers, increases the voltage drop, as illustrated in Figure 8.9(a). Note that a higher voltage drop is observed in the back-to-face topology since one additional layer of P/G TSVs is required for the back-to-face topology, as listed in Table 8.2. The back-to-face topology is more sensitive to load variations in the adjacent layer as a greater change in the voltage drop is observed as compared with the face-to-back topology. The effect of the resistance of the P/G TSV on the voltage drop of layer three is illustrated in Figure 8.9(b). A 35% increase in voltage drop is observed in the back-to-face topology due to $R_{TSV}$ in layer three. In contrast, the voltage drop does not change in the face-to-back topology. This constant voltage drop is due to the fundamental difference between the current paths of these two 3-D stacking topologies. Consider an example of $layer_n$. The current flows to $Load_n$,



Figure 8.9: Comparison of voltage drop at the current source of layer three within a seven layer 3-D power network with a back-to-face and face-to-back topology. (a) Load increases in the adjacent layer, and (b) P/G TSV resistance increases within layer three.

passing through $R_{TSV}$ in layer $n$ within the back-to-face topology. Alternatively, the current flows to $Load_n$, passing through $R_{TSV}$ in layer $n-1$ rather than $R_{TSV}$ in layer $n$ within the face-to-back topology. In the face-to-back topology, the P/G TSV in a specific layer does not directly affect the voltage drop in this layer, but rather affects the voltage drop in the layer above. The back-to-face topology therefore supports a self-contained layer design process.
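The difference between the two current paths can be sketched with a one-dimensional resistive ladder driven by DC current sources. The code below is an illustrative model under stated assumptions and is not the circuit simulated for Figure 8.9: the function names and all numerical values are hypothetical, each layer uses the same $R_{RDL1}$, $R_{RDL2}$, and $R_{grid}$, and the bottom layer of the face-to-back stack is assumed to be supplied without passing through a TSV.

```python
def cumulative_current(loads, j):
    """Total load current of layers j..M (1-indexed); zero above the top layer."""
    return sum(loads[j - 1:])


def drop_back_to_face(k, loads, r_tsv, r_rdl1, r_rdl2, r_grid):
    """IR drop at the load of layer k when TSV j supplies layers j..M."""
    tsv = sum(r_tsv[j - 1] * cumulative_current(loads, j)
              for j in range(1, k + 1))
    # The type two RDL above layer j carries the current of layers j+1..M.
    rdl2 = sum(r_rdl2 * cumulative_current(loads, j + 1) for j in range(1, k))
    return tsv + rdl2 + (r_rdl1 + r_grid) * loads[k - 1]


def drop_face_to_back(k, loads, r_tsv, r_rdl1, r_rdl2, r_grid):
    """IR drop at the load of layer k when TSV j supplies layers j+1..M."""
    tsv = sum(r_tsv[j - 1] * cumulative_current(loads, j + 1)
              for j in range(1, k))
    rdl2 = sum(r_rdl2 * cumulative_current(loads, j + 2)
               for j in range(1, k - 1))
    return tsv + rdl2 + (r_rdl1 + r_grid) * loads[k - 1]


# Placeholder seven layer stack (amperes and ohms), assumed for illustration.
loads = [0.1] * 7
shared = dict(r_rdl1=0.03, r_rdl2=0.02, r_grid=0.1)
nominal = [0.01] * 7
raised = list(nominal)
raised[2] = 0.1  # increase only the TSV resistance of layer three

for name, fn in (("back-to-face", drop_back_to_face),
                 ("face-to-back", drop_face_to_back)):
    before = fn(3, loads, nominal, **shared)
    after = fn(3, loads, raised, **shared)
    print(f"{name}: {before * 1e3:.1f} mV -> {after * 1e3:.1f} mV")
```

In this sketch, raising the TSV resistance of layer three changes the drop at layer three only in the back-to-face case, since the face-to-back load of layer three is supplied through the TSVs of the layers below.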

Both the TSV fabrication method and the 3-D stacking topology significantly affect the current path and P/G RDL in a 3-D IC. To evaluate the proposed grid-based RDL and compare with existing RDL structures, type B TSVs and a back-to-face topology are assumed. Due to the higher conductivity of the fill material and relatively mature fabrication process, the via-last TSV method is preferred for the P/G TSVs in 3-D systems. Within the via-last TSV method, the frontside via-last TSV, a type B TSV, is assumed here since the type one RDL is not used in the backside via-last TSV method. A more general RDL analysis, including both type one and two RDLs, is appropriate for heterogeneous 3-D systems. Significant current passes through $R_{stack}$ to the upper layers in a type A TSV, as shown in (8.1). $R_{stack}$ is the on-chip BEOL, which does not support high current, leading to high power noise and electromigration issues. Although an additional layer of TSVs is required in the back-to-face topology as compared with the face-to-back topology, the back-to-face topology is preferred due to the advantages of separate TSV design processes for the different layers.

Note that existing work has demonstrated fabricated 3-D systems without discussing the function and effect of the P/G RDL [75–78]. Oversimplified assumptions regarding the RDL have however been made. For example, a perfect match of the P/G TSV distribution between adjacent layers is assumed [75–78], where the P/G TSV bumps in layer N perfectly overlap with the TSV bumps in layer N+1. The type two RDL, discussed above, is therefore neglected. In addition, a type one P/G RDL is also neglected by either assuming type A TSVs or by simplifying the connection as a point-to-point metal line [217]. These assumptions are either unrealistic or do not support high performance heterogeneous 3-D systems.

## 8.3 Grid-based RDL in 3-D ICs

The current path and P/G RDL of a 3-D power network with different TSV fabrication methods and 3-D stacking topologies are reviewed in the previous section. One-dimensional lumped circuit models are described to demonstrate the effect of the P/G RDL on different 3-D systems. Although a lumped model is sufficient to provide insight into the functionality and effect of the P/G RDL, the distributed nature of the power network is lost. A two-dimensional distributed circuit model of a 3-D power network is therefore preferred to characterize a P/G RDL [13]. A resistive grid-based 3-D power network as a platform to evaluate different P/G RDL topologies is described in Section 8.3.1. A novel grid-based RDL is also introduced in this section. A comparison between the proposed RDL and existing RDLs is provided in Section 8.3.2. The advantages of a grid-based RDL in scenarios such as a nonuniform P/G TSV distribution and high current demand are also discussed in this section.

### 8.3.1 Grid-based P/G RDL model

The power network of a 3-D IC is divided into three parts: the 2-D power grid of each layer, the P/G TSVs connecting adjacent layers, and the P/G RDL. A comprehensive model of a 2-D power grid consists of the global power grid, via stacks, and local power rails [33]. As the focus of this chapter is the interaction among layers due to the RDL, the 2-D power distribution network is treated as a simple two layer mesh structure, as illustrated in Figure 8.10. Adjacent power and ground metal lines are grouped to form a P/G pair, reducing the power grid inductance and saving on-chip area for the P/G TSVs. A classic two-dimensional distributed model is illustrated in Figure 2.7. Each node is connected to a DC load within the distributed model to eliminate the effects of temporal load variations on the RDL analysis process. The DC loads are also assumed to be identical to eliminate the effects of variations in the location of the load on the RDL analysis process. The specifications of the 2-D power grid are summarized in Table 8.3. The parameters of the circuit model are based on these specifications. Note that the on-chip decoupling capacitance and metal line inductance are not considered in this model since the focus here is on the DC behavior.
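A two-dimensional resistive mesh with an identical DC load at every node can be solved with a few lines of nodal analysis. The sketch below uses a pure Python Gauss-Seidel iteration as a stand-in for a circuit simulator; it is not the simulation flow used in this work. The function names, the 10 x 10 mesh size, the tap locations, and the component values are all assumptions chosen for illustration, and each tap represents a connection from a supply node to $V_{dd}$ through a single resistor.

```python
def solve_grid(n, r_seg, i_node, taps, r_tap, vdd=0.8,
               tol=1e-10, max_iter=20000):
    """Node voltages of an n x n resistive mesh with a DC load at every node.

    taps is a set of (x, y) nodes connected to vdd through r_tap.
    """
    g_seg, g_tap = 1.0 / r_seg, 1.0 / r_tap
    v = [[vdd] * n for _ in range(n)]
    for _ in range(max_iter):
        change = 0.0
        for y in range(n):
            for x in range(n):
                g_sum, gv = 0.0, 0.0
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < n and 0 <= ny < n:
                        g_sum += g_seg
                        gv += g_seg * v[ny][nx]
                if (x, y) in taps:
                    g_sum += g_tap
                    gv += g_tap * vdd
                # Kirchhoff's current law at the node, solved for its voltage.
                new_v = (gv - i_node) / g_sum
                change = max(change, abs(new_v - v[y][x]))
                v[y][x] = new_v
        if change < tol:
            break
    return v


def max_drop(v, vdd=0.8):
    """Largest IR drop across all nodes of the mesh."""
    return vdd - min(min(row) for row in v)


# Placeholder scenario: the same mesh supplied by two and by four taps.
sparse = {(2, 2), (7, 7)}
dense = sparse | {(2, 7), (7, 2)}
for label, taps in (("2 taps", sparse), ("4 taps", dense)):
    v = solve_grid(10, r_seg=0.34, i_node=1e-3, taps=taps, r_tap=0.011)
    print(f"{label}: worst case drop {max_drop(v) * 1e3:.2f} mV")
```

The same routine can be reused for any tap pattern, which is convenient for comparing uniform and uneven supply distributions on a small mesh before moving to a full circuit simulation.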

A comprehensive TSV model considering coupling capacitance and inductance is discussed in [94]. In this chapter, the P/G TSV model is a simple resistor since the focus here is not the P/G TSVs but rather the P/G RDL. Note that within the TSV model, the barrier and seed layer are neglected as the current carried by these layers



Figure 8.10: Power grid with a two layer mesh structure.

Table 8.3: 2-D power grid specifications [33].

| Specs                              | Value              |
|------------------------------------|--------------------|
| 2-D IC size ( $\mu\text{m}$ )      | 1,000 x 1,000      |
| P/G pitch ( $\mu\text{m}$ )        | 2                  |
| P/G pair pitch ( $\mu\text{m}$ )   | 50                 |
| Metal line width ( $\mu\text{m}$ ) | 1                  |
| Metal line depth ( $\mu\text{m}$ ) | 2.5                |
| Metal layer number                 | 2                  |
| Cu conductivity (S/m)              | $5.88 \times 10^7$ |
| Average power consumption (W)      | 2                  |
| $V_{dd}$ (V)                       | 0.8                |

is negligible. Uniform distribution of the P/G TSVs is assumed across the entire 3-D system. The power TSVs and ground TSVs are distributed in an interdigitated manner, as illustrated in Figure 8.3. The specification of the P/G TSVs is listed in

Table 8.4: P/G TSV specifications [227].

| Specs                           | Value              |
|---------------------------------|--------------------|
| TSV length ( $\mu\text{m}$ )    | 50                 |
| TSV diameter ( $\mu\text{m}$ )  | 10                 |
| P/G TSV pitch ( $\mu\text{m}$ ) | 50                 |
| P/P TSV pitch ( $\mu\text{m}$ ) | 70.7               |
| TSV type                        | Frontside via-last |
| 3-D layer number                | 3                  |
| Cu conductivity (S/m)           | $5.88 \times 10^7$ |

Table 8.4. The number of layers in the 3-D power network is assumed to be three. The model is however scalable to a higher number of layers.
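Since the P/G TSV is modeled as a simple resistor, a value for this resistor can be estimated from the dimensions in Table 8.4. The sketch below assumes a solid copper cylinder with uniform current density and neglects the barrier and seed layers; the function name `cylinder_resistance` is introduced only for this example.

```python
import math


def cylinder_resistance(length_m, diameter_m, sigma):
    """DC resistance of a solid cylinder with uniform current density."""
    area = math.pi * (diameter_m / 2.0) ** 2
    return length_m / (sigma * area)


# Dimensions and conductivity from Table 8.4.
r_tsv = cylinder_resistance(length_m=50e-6, diameter_m=10e-6, sigma=5.88e7)
print(f"R_TSV = {r_tsv * 1e3:.2f} mOhm")

# Diagonal of a square lattice with a 50 um P/G TSV pitch.
diagonal_um = 50.0 * math.sqrt(2.0)
print(f"lattice diagonal = {diagonal_um:.1f} um")
```

Under these assumptions, the resistance of a single TSV is approximately 10.8 mΩ. The diagonal of a square lattice with a 50 µm pitch is approximately 70.7 µm, which matches the P/P TSV pitch listed in Table 8.4 if power and ground TSVs alternate on such a lattice.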

A direct point-to-point (P2P) RDL is described in [217, 227], where a metal stripe directly connects the P/G TSV to an adjacent power metal line within a 2-D grid, as illustrated in Figure 8.11(a). Note that only the  $V_{dd}$  side of the power network is illustrated in this model as  $V_{dd}$  and  $G_{nd}$  of the power network in a 3-D IC are symmetric [19]. The ground TSVs and ground RDL are within the blank area, as illustrated in Figure 8.11(a). As discussed in the previous section, current flows from the P/G TSVs to the loads within each layer through the P/G RDL and 2-D power grid. Multiple 2-D power grid topologies have been proposed to manage the high current demand and to reduce power noise [151]. The same high current also passes through the P/G RDL. The direct P2P RDL, only relying on the metal stripe, leads



Figure 8.11: Model of the P/G RDL connecting the P/G TSVs to the 2-D power grid. (a) Direct P2P RDL, and (b) grid-based RDL.

to high power noise and electromigration, particularly when the number of P/G TSVs is insufficient or the distribution of the P/G TSVs is uneven.

A grid-based P/G RDL is therefore proposed. As illustrated in Figure 8.11(b), a two layer P/G RDL, each layer oriented orthogonally, forms a mesh structure similar to a 2-D power grid. The RDL layers are placed above the metal layers of the power grid, connecting the P/G TSVs to the power grid. Vertical vias are formed where

Table 8.5: P/G RDL specifications [227].

| Specs                                | Value              |
|--------------------------------------|--------------------|
| RDL line width ( $\mu\text{m}$ )     | 6                  |
| RDL line thickness ( $\mu\text{m}$ ) | 3.5                |
| RDL line pitch ( $\mu\text{m}$ )     | 50                 |
| RDL layer number                     | 2                  |
| Cu conductivity (S/m)                | $5.88 \times 10^7$ |

the RDL crosses the power grid. The grid-based P/G RDL can be modeled as an independent system. The inputs to the system are the connections between the P/G TSV and the RDL metal layer, where the number and location of the inputs are dependent on the number and distribution of the P/G TSVs within the 3-D layer. The circuit model of the RDL system, similar to the 2-D power grid, is a classic two-dimensional resistive grid. The specification of the P/G RDL grid is listed in Table 8.5. The outputs of the system are the connections between the 2-D power grid and the RDL metal layer. Note that the number and location of the outputs are fixed, which are set by the specifications of the 2-D power grid.
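The resistive grid model described above can be sketched in a few lines of Python. This is a minimal illustration rather than the Spectre model used in this chapter: the segment resistance follows from the line geometry in Table 8.5, while the grid size, supply voltage, per-node load current, and TSV locations are hypothetical values chosen only for demonstration. Via resistance and the underlying 2-D power grid are neglected, and the function names are illustrative.

```python
# Sketch of a grid-based P/G RDL as a two-dimensional resistive mesh.
# Line geometry is from Table 8.5; everything else is a hypothetical example.

SIGMA_CU = 5.88e7        # Cu conductivity (S/m), Table 8.5
LINE_WIDTH = 6e-6        # RDL line width (m), Table 8.5
LINE_THICKNESS = 3.5e-6  # RDL line thickness (m), Table 8.5
LINE_PITCH = 50e-6       # RDL line pitch (m), Table 8.5


def segment_resistance():
    """Resistance of one RDL segment between adjacent crossings: l / (sigma * w * t)."""
    return LINE_PITCH / (SIGMA_CU * LINE_WIDTH * LINE_THICKNESS)


def solve_rdl_grid(n, tsv_nodes, load_current, vdd, tol=1e-12, max_iter=100000):
    """Gauss-Seidel nodal solution of an n x n resistive mesh.

    tsv_nodes: set of (row, col) nodes held at vdd (inputs from the P/G TSVs).
    Every other node sinks load_current amperes (outputs to the 2-D power grid).
    Returns the node voltages as a list of rows.
    """
    r_seg = segment_resistance()
    v = [[vdd] * n for _ in range(n)]
    for _ in range(max_iter):
        worst = 0.0
        for r in range(n):
            for c in range(n):
                if (r, c) in tsv_nodes:
                    continue  # node voltage is fixed by the TSV
                candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                nbrs = [v[rr][cc] for rr, cc in candidates
                        if 0 <= rr < n and 0 <= cc < n]
                # KCL: sum((v - v_nbr) / r_seg) = -load_current
                new = (sum(nbrs) - load_current * r_seg) / len(nbrs)
                worst = max(worst, abs(new - v[r][c]))
                v[r][c] = new
        if worst < tol:
            break
    return v


if __name__ == "__main__":
    # Hypothetical example: 8 x 8 mesh, 10 mA per node, 1 V supply.
    corners = {(0, 0), (0, 7), (7, 0), (7, 7)}
    grid = solve_rdl_grid(8, corners, 0.01, 1.0, tol=1e-10)
    worst_drop = 1.0 - min(min(row) for row in grid)
    print(f"segment resistance: {segment_resistance() * 1e3:.1f} mOhm")
    print(f"worst case drop:    {worst_drop * 1e3:.2f} mV")
```

Moving or adding entries in `tsv_nodes` changes the inputs of the system, mirroring the dependence on the number and distribution of the P/G TSVs, while the output nodes remain fixed.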

### 8.3.2 Comparison between grid-based RDL and P2P RDL

A comprehensive circuit model of a 3-D power network is described here, combining the previously described models of the 2-D power grid, P/G TSV, and P/G RDL. A three layer 3-D power network is assumed. The number and magnitude of the resistors within the model of the 2-D power grid are from Table 8.3. As discussed in Section 8.2, frontside via-last TSVs and a back-to-face stacking topology are assumed as well as a $10 \times 10$ uniform TSV distribution. Each load connected to the 2-D power grid is also connected to a nearby P/G TSV within the range of the pitch of a P/G pair. The resistance of the TSV is from Table 8.4. A grid-based structure is initially only applied to the type one RDL to evaluate the effects of an RDL on a single 2-D power grid. The type two RDL is therefore neglected, assuming perfect P/G TSV mapping between adjacent layers. The model generation and simulation are conducted in Cadence Spectre. A comparison of the voltage drop on layer two between the P2P RDL and a grid-based RDL is provided.

A type one RDL is the interface between the P/G TSVs and the 2-D power grid. The distribution topology and number of TSVs can therefore affect the performance of the P2P and grid-based RDL. To evaluate the effect of the number of TSVs on the RDL performance, three simulation scenarios are considered, where the TSV distribution topology is assumed uniform and the number of TSVs is, respectively, 20, 50, and 100. The location of the power TSVs within the different scenarios is illustrated in Figure 8.12. Note that only the power TSVs are illustrated. Only half of the total number of TSVs is therefore shown in Figure 8.12. To evaluate the effect of the TSV distribution topology, another scenario is considered, where the TSV distribution is assumed to be uneven and the number of TSVs is 20. Two



Figure 8.12: Distribution topologies of P/G TSV for comparison of voltage drop between P2P RDL and grid-based RDL. (a) 100 TSVs with uniform distribution, (b) 50 TSVs with uniform distribution, (c) 20 TSVs with uniform distribution, and (d) 20 TSVs with uneven distribution.

groups of TSVs are assigned, respectively, to the top left and bottom right corner, as illustrated in Figure 8.12(d).

The voltage drop at each input of the current load in the second layer of the 3-D power network is illustrated in Figure 8.13. Each tile within the figure represents a current load. The shade of the tile illustrates the voltage level of the connected current load, where the darker shade represents a lower voltage level and the lighter shade represents a higher voltage level. A comparison between the grid-based and P2P RDL

with 100 P/G TSVs is illustrated in Figures 8.13(a) and (b). The voltage drop with the grid-based P/G RDL is quite low, exhibiting a maximum voltage drop of 11.3 mV, as compared with the P2P RDL with a maximum voltage drop of 47.1 mV. The larger voltage drop on the right side of Figure 8.13(b) is due to the connection of the TSV to the adjacent metal line on the left side of the P2P RDL, as illustrated in Figure 8.11. The voltage drop increases significantly as the number of P/G TSVs decreases to 50 and 20, as illustrated in Figures 8.13(c), (d), (e), and (f). In Figures 8.13(d) and (f), the loads without a direct connection to the P2P RDL exhibit a much higher voltage drop than with the grid-based RDL, with a maximum voltage drop of, respectively, 93.1 mV and 163 mV. Alternatively, the voltage drop of the loads connected to the grid-based RDL is within the 5% noise margin even with 20 TSVs, as illustrated in Figure 8.13(e). This trend is due to the nature of the grid structure, producing a lower resistance as compared with a simple metal stripe in the P2P RDL. For the same number of P/G TSVs and distribution topology, the grid-based RDL produces a much lower voltage drop within the 3-D power network.

In the previous analysis, a uniform distribution of the P/G TSVs is assumed. Although the uniform distribution is preferable to suppress the worst case voltage drop in 3-D power networks [25], the uniform distribution may not be a practical design choice due to area constraints. A 3-D power network with an uneven TSV distribution is therefore considered to evaluate the effect on the performance of the



Figure 8.13: Variation in voltage drop in 3-D power networks with fewer P/G TSVs for grid-based and P2P P/G RDL. (a) 100 TSVs with grid-based P/G RDL, (b) 100 TSVs with P2P RDL, (c) 50 TSVs with grid-based RDL, (d) 50 TSVs with P2P P/G RDL, (e) 20 TSVs with grid-based P/G RDL, and (f) 20 TSVs with P2P P/G RDL.

P2P and grid-based RDLs. In this case study, P/G TSVs are distributed at the top left and bottom right corner of the IC, as illustrated in Figure 8.12(d). The



Figure 8.14: Comparison of voltage drop between the grid-based and P2P RDL with uneven P/G TSV distribution. (a) 20 unevenly distributed P/G TSVs with grid-based RDL, and (b) 20 unevenly distributed P/G TSVs with P2P RDL.

total number of TSVs is 20. The simulation results are illustrated in Figure 8.14. A significant increase in voltage drop is observed at locations without P/G TSVs in both the P2P and grid-based RDLs. The grid-based and P2P RDL exhibit the highest voltage drop of, respectively, 84 mV and 331.1 mV. The mesh structure of the grid-based RDL also balances any voltage variations across the IC due to the uneven TSV distribution. A standard deviation of 0.021 is observed in Figure 8.14(a) as compared with a standard deviation of 0.078 in Figure 8.14(b).

To evaluate the effects of the P/G RDL in a high current environment, a high current 3-D power network is also considered. The average power consumption of each layer is increased from 2 watts to 10 watts. 100 P/G TSVs and a uniform distribution are assumed. The simulation results are illustrated in Figure 8.15. The voltage drop increases significantly in this scenario as compared with the 2 watt scenario (see



Figure 8.15: Comparison of the voltage drop between the grid-based and P2P RDL with increasing load current. (a) Grid-based RDL, and (b) P2P RDL.



Figure 8.16: Comparison of the highest voltage drop in the grid-based RDL and P2P RDL for five different scenarios.

Figures 8.13(a) and (b)). The P2P RDL cannot support high current 3-D power networks despite the large number of available P/G TSVs. The grid-based RDL is therefore preferable for high current 3-D power networks.

The worst case voltage drop within a grid-based and P2P RDL is compared for the five previously discussed power network scenarios. For the P2P RDL, the worst case voltage drop is larger than the 5% noise margin in all scenarios. The P2P RDL is not a feasible choice for 3-D power networks with few TSVs, unevenly distributed TSVs, or high power consumption. For the grid-based RDL, the worst case voltage drop is within the 5% noise margin for the uniform TSV distribution topologies. An uneven distribution significantly affects the voltage drop, increasing the worst case voltage drop by 2X. The worst case voltage drop in the scenario of 20 unevenly distributed TSVs utilizing the grid-based RDL is 84 mV, which is lower than the scenario with 50 uniformly distributed TSVs utilizing the P2P RDL. The grid-based RDL can therefore reduce the required number of P/G TSVs in 3-D power networks while better tolerating any area constraints. In addition, the grid-based RDL supports much higher currents with less overhead as compared with the P2P RDL. The worst case voltage drop in the scenario of high power consumption within a grid-based RDL is 56.6 mV, which is only slightly higher than the P2P RDL with 5X lower power consumption.

## 8.4 Summary

The importance of utilizing a P/G RDL for high current 3-D power networks is discussed in this chapter. A discussion of P/G RDLs is lacking in the literature, and the little material that exists utilizes oversimplified assumptions such as a perfect TSV-to-TSV match between adjacent layers or simple metal stripe connections between the TSVs and the 2-D power grid. These assumptions are not practical in complex high current 3-D ICs. For the first time, a practical definition of a 3-D RDL is provided.

Based on the functionality of the P/G RDL, two types of RDLs are introduced. The method of fabricating a TSV and the 3-D stacking topology can affect the impedance and performance characteristics of the P/G RDL. A circuit model of the P/G RDL is therefore described with different TSV and stacking strategies. A novel grid-based P/G RDL is also proposed and compared with a point-to-point RDL. It is observed that a grid-based RDL significantly suppresses the voltage drop in 3-D power networks, supporting fewer P/G TSVs and higher current demand. A grid-based RDL can also support a nonuniform TSV distribution, alleviating possible area constraints. The grid-based RDL is an effective candidate for high current, heterogeneous 3-D systems.

## Chapter 9

# Parasitic Impedance Aware Power Delivery for Voltage Stacked Systems

Over the past few years, the number of cores and throughput of high performance integrated circuits (ICs) have significantly increased [82, 101, 105, 110, 113]. The higher throughput of the processor and the slowdown in scaling the power supply voltage have led to much greater dynamic power consumption [13]. Aggressive CMOS device scaling has greatly increased both gate and channel leakage currents, which have become the primary component of the total power dissipated in modern ICs [69]. As a result, the power consumption of a high performance computing system continues to increase. The power consumption of recent high performance CPUs, GPUs, and ASICs can exceed 200 watts [82, 101, 105, 110, 113]. The IRDS predicts further increases in power consumption in high performance processors [18].

As discussed in Chapter 3, high current challenges exist in high performance computing systems, including high power noise, electromigration, and low efficiency. By serially connecting multiple power domains, voltage stacking can significantly reduce the current flowing through a power delivery network, alleviating these challenges. Voltage stacking, also referred to as charge recycling [133] and multistory power delivery networks [134], has recently drawn significant attention from both the industrial and academic communities [133–136, 236–240]. Voltage stacked systems however face a significant challenge; load imbalances across different layers can lead to voltage variations within each power domain.

One method to mitigate load imbalances is to develop more balanced stacked systems, reducing the magnitude and frequency of the load imbalances. Circuit, architecture, and scheduling techniques can achieve more balanced stacked systems. Load imbalances however cannot be eliminated with this method. Another technique is to mitigate the effects of load imbalances by utilizing a push-pull voltage regulator [238]. A push-pull regulator is necessary as a load imbalance may require the regulator to pull current from the load. The voltage level of each stack is regulated within a target noise margin, supporting a range of load imbalances within a voltage stacked system. Another key factor that affects performance is the power delivery network within the voltage stacked system. Due to the unique serial connection of a voltage stacked system, the parasitic impedance of the power delivery system plays an important role. The power delivery system connects multiple layers within a stacked system and provides current to the push-pull regulators, affecting the performance of the regulators. The power delivery network is therefore a critical component within a voltage stacked system.

The rest of the chapter is organized as follows. Prior work related to power delivery networks within voltage stacked systems is discussed in Section 9.1. A background of the challenges within voltage stacked systems is also provided. Performance degradation of a voltage stacked system when ignoring the parasitic impedances within the power delivery network is discussed in Section 9.2. In Section 9.3, power delivery networks for voltage stacked systems, which consider these parasitic impedances, are discussed. The parasitic impedance of the power delivery system within two converter topologies, stack-to-bus and stack-to-stack, is also discussed and compared. A tile-based power delivery network design methodology for high current voltage stacked systems is presented in Section 9.4, supporting multiple power domains and low parasitic impedances. Some conclusions are offered in Section 9.5.

## 9.1 Background and previous work

The power delivery network within a voltage stacked system is highly complex. A review of previous work in the field of voltage stacked systems is provided in Section 9.1.1. The challenges of load imbalances with conventional power delivery systems are also discussed. Existing work on developing balanced stacks and current balancing converters is reviewed in Section 9.1.2.

### **9.1.1 Challenges of load imbalances in voltage stacked systems**

Voltage stacking is a circuit and architectural level technique that serially connects multiple voltage domains. In this way, charge flowing through one voltage domain can be “recycled” within the following voltage domains. A high input voltage and lower on-chip current are therefore achieved, managing electromigration constraints, distribution losses, and thermal hotspots caused by high on-chip currents. The high input voltage leads to higher system efficiency due to the high voltage transmission [193]. Assuming constant power consumption, an $n$-layer voltage stacked system can ideally reduce the on-chip current demand to $1/n$ of the non-stacked current, reducing the IR drop by a factor of $n$ and the distribution loss by a factor of $n^2$. Voltage stacking is therefore a useful technique to alleviate electromigration in HPC systems, while requiring fewer metal resources for the power network and I/O.
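The ideal scaling with the number of layers can be checked with a short calculation. The sketch below assumes that the total load power and the per-layer voltage are held constant and that the same distribution resistance carries the stack current; the 0.8 V and 160 A operating point is taken from Table 9.1, while the 1 mΩ resistance and the function name are hypothetical.

```python
# Ideal n-layer voltage stacking at constant total power (illustrative sketch).

def stacked_scaling(n_layers, total_power, v_layer, r_pdn):
    """Return the stack current, IR drop, and distribution loss for n_layers."""
    current = total_power / (n_layers * v_layer)  # the input voltage is n * v_layer
    return {
        "current": current,                # amperes
        "ir_drop": current * r_pdn,        # volts
        "loss": current ** 2 * r_pdn,      # watts
    }


TOTAL_POWER = 0.8 * 160.0  # watts, from the 0.8 V, 160 A non-stacked system
R_PDN = 1e-3               # ohms, hypothetical distribution resistance

baseline = stacked_scaling(1, TOTAL_POWER, 0.8, R_PDN)
for n in (1, 2, 4):
    s = stacked_scaling(n, TOTAL_POWER, 0.8, R_PDN)
    print(n,
          round(baseline["current"] / s["current"], 3),
          round(baseline["ir_drop"] / s["ir_drop"], 3),
          round(baseline["loss"] / s["loss"], 3))
```

Each printed row lists the number of layers followed by the reduction factors of the current, the IR drop, and the distribution loss relative to the non-stacked system.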

An example of a 16 core, four layer voltage stacked system is illustrated in Figure 9.1, where a 16 core processor is divided into four voltage domains. Each voltage domain includes four cores and ideally shares the same voltage level, where $V_1 = V_2 = V_3 = V_4$. The processor cores within the same voltage domain, for example within layer 1, are connected in parallel. Alternatively, the processor cores in different voltage domains are connected in series, ensuring that the same current flows from layer 1 to layer 4. Note that the term layer, in this chapter, refers to the core or cores sharing the same voltage domain within a voltage stacked system. A stack referenced in other work is the same as a layer here.

Figure 9.1: 16 core, four layer voltage stacked system.

#### 9.1.1.1 A case study of load imbalances

The current passing through each stacked layer is ideally the same. In practice, load or current imbalances exist across the stacked layers. These load imbalances lead to voltage variations across the n-layer voltage domains, challenging system performance and reliability. To evaluate the effects of load imbalances on the power noise in a voltage stacked system, a comparative case study is described here. Three scenarios, a regular power network, a two layer voltage stacked system, and a four layer voltage stacked system, are considered. The input voltage is modeled as a DC voltage source. The power delivery network is modeled as serially cascaded RL branches and parallel RLC branches, as illustrated in Figure 6.6. The on-chip load is modeled as a resistor  $R_{load}$ , where the load activity factor is varied by changing  $R_{load}$ .

The specifications of the load imbalance analysis process are listed in Table 9.1. The input voltage $V_{in}$ of a regular system is 0.8 volts, the same as the nominal voltage $V_{nom}$ for the on-chip load. The input voltage of the two layer and four layer voltage stacked systems is, respectively, 1.6 volts and 3.2 volts. The average current flowing to the on-chip load $I_{load}$ is accordingly less, respectively, 80 amperes and 40 amperes. Note

Table 9.1: Specifications of the load imbalance analysis process.

|                             | $V_{in}$ (V) | $V_{nom}$ (V) | $I_{load}$ (A) | $R_{load}$ ( $m\Omega$ ) |
|-----------------------------|--------------|---------------|----------------|--------------------------|
| Regular system              | 0.8          | 0.8           | 160            | 5                        |
| Two layer voltage stacking  | 1.6          | 0.8           | 80             | 10                       |
| Four layer voltage stacking | 3.2          | 0.8           | 40             | 20                       |



Figure 9.2: Variation in the voltage levels for different activity factors for a non-stacked system, two layer voltage stacked system, and four layer voltage stacked system.

that  $R_{load}$  of each layer in a voltage stacked system is the same when the loads are balanced. The load imbalances occur when differences in the activity factor between layers exist. Layer-to-layer regulation is not considered in this case study.

A comparison of the voltage levels within a non-stacked system, a two layer voltage stacked system, and a four layer voltage stacked system with different activity factors is illustrated in Figure 9.2. With no change in the activity factor, the voltage level of the three systems remains at a nominal voltage since the loads are balanced. If the activity factor of one of the layers in a voltage stacked system increases, the voltage in that layer becomes lower. The first group shown in Figure 9.2 illustrates the voltage level of a non-stacked system while the second and third groups represent,

respectively, the two layer and four layer voltage stacked systems. The voltage drop in a voltage stacked system is more sensitive to changes in the activity factor than in a non-stacked system. This behavior occurs because the parasitic impedance of the power delivery network produces a voltage drop in both the non-stacked and voltage stacked systems while voltage division among the layers also occurs in the voltage stacked systems. A lower $R_{load}$ due to the higher activity factor in a certain layer leads to a lower voltage in that layer as compared with the other layers. A larger voltage drop is observed in a voltage stacked system with more layers. Power noise suppression is therefore necessary in voltage stacked systems to alleviate the effects of load imbalances.
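The voltage division among the layers can be illustrated with an ideal resistive divider. The sketch below uses the operating points of Table 9.1 but neglects the parasitic impedance of the power delivery network, so it does not reproduce the simulated results of Figure 9.2. It also assumes that an increase in the activity factor of $x$ scales the load conductance of that layer by $(1 + x)$; this mapping and the function names are assumptions made for illustration.

```python
# Ideal voltage division in an unregulated voltage stack (parasitics neglected).

def layer_voltages(v_in, r_loads):
    """Voltage across each series-connected load resistor."""
    total = sum(r_loads)
    return [v_in * r / total for r in r_loads]


def imbalanced_stack(n_layers, activity_increase, v_nom=0.8, i_total=160.0):
    """Layer voltages when only the first layer changes activity factor.

    v_nom and i_total follow Table 9.1; the activity-to-resistance mapping
    R = R_balanced / (1 + activity_increase) is an assumption of this sketch.
    """
    i_stack = i_total / n_layers          # 160 A, 80 A, or 40 A
    r_balanced = v_nom / i_stack          # 5, 10, or 20 milliohms
    r_loads = [r_balanced / (1.0 + activity_increase)]
    r_loads += [r_balanced] * (n_layers - 1)
    return layer_voltages(n_layers * v_nom, r_loads)


for n in (1, 2, 4):
    volts = imbalanced_stack(n, 0.10)
    print(n, [round(v, 4) for v in volts])
```

In this ideal model the non-stacked system exhibits no change, since the single load is connected directly to the source, while the active layer of a stacked system drops below the nominal voltage and the remaining layers rise above the nominal voltage.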

#### **9.1.1.2 Limitations of decoupling capacitors**

Methods for suppressing power noise in non-voltage stacked systems include decoupling capacitors, voltage regulation circuits, and power delivery network design optimization. Conventional voltage regulators are not helpful in a voltage stacked system. Certain individual layers (for example, $V_2$) can exhibit power noise while the voltage across all of the layers ($V_1 + V_2 + V_3 + V_4$) remains stable. Placing decoupling capacitors within each layer can be an effective method to reduce power noise within a layer. To consider the effects of the decoupling capacitors on load balancing, a case study is described. The intention of this case study is to evaluate whether a high performance IC can satisfy a target noise margin for a few hundred nanoseconds after a load imbalance occurs and before an off-chip regulator can respond.

A four layer voltage stacked system is evaluated in this case study. The decoupling capacitor [13] is

$$C_{decap} = \frac{P}{V^2 f} \frac{1 - \alpha}{\alpha}, \quad (9.1)$$

where  $P$  is the dynamic power consumption of an IC, assumed in this case study to be 60 watts.  $f$  is the operating frequency of the IC, assumed to be 2 GHz.  $\alpha$  is the switching factor of the IC, assumed to be 10%.  $V$  is the nominal voltage, 0.8 volts in this case study. The intrinsic on-chip decoupling capacitor is 0.42  $\mu$ F based on (9.1). To evaluate the effects of the decoupling capacitor on the load balance in a worst case scenario,  $I_{load}$  is assumed to be 40 amperes and only one layer exhibits a change in activity factor. A transient current of 0.3 A/ns is applied [241]. A range of  $di/dt$  from 0.5 A/ns to 2 A/ns is assumed in this case study.
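As a quick check of (9.1), the intrinsic decoupling capacitance can be evaluated with the values stated above. The function name is illustrative.

```python
# Evaluate (9.1) with the case study values: P = 60 W, V = 0.8 V,
# f = 2 GHz, and a switching factor of 10%.

def intrinsic_decap(power, v_nom, freq, alpha):
    """C_decap = P / (V^2 f) * (1 - alpha) / alpha, in farads."""
    return power / (v_nom ** 2 * freq) * (1.0 - alpha) / alpha


c_decap = intrinsic_decap(power=60.0, v_nom=0.8, freq=2e9, alpha=0.10)
print(f"{c_decap * 1e6:.2f} uF")
```

The result rounds to the 0.42 $\mu$F intrinsic capacitance used in this case study.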

The variation of the voltage droop in the voltage stacked system as a function of decoupling capacitance and transient current during a load imbalance is illustrated in Figure 9.3. The responses shown in Figures 9.3(a) and 9.3(b) are, respectively, 5 ns and 10 ns after the load imbalance occurs. At 5 ns, the intrinsic on-chip decoupling capacitor is sufficient to maintain the voltage droop below the 5% margin with a transient current ranging from 0.5 to 0.9 A/ns, as illustrated in Figure 9.3(a).



Figure 9.3: Voltage droop as a function of decoupling capacitance and transient current during a load imbalance, (a) 5 ns, and (b) 10 ns. The decoupling capacitance ranges from 0.42 to 4.2 $\mu$F and the transient current ranges from 0.5 to 2 A/ns.

Alternatively, to ensure that the voltage droop at 10 ns is below the 5% margin, a minimum 2.52 $\mu$F decoupling capacitor is required and the transient current cannot be larger than 0.9 A/ns, as illustrated in Figure 9.3(b). A 2.52 $\mu$F capacitance is excessively large for an on-chip decoupling capacitor utilizing conventional on-chip capacitor technologies. Due to the inevitable parasitic impedance and greater distance from the on-chip load [157], package and PCB level decoupling capacitors cannot respond sufficiently fast to moderate the power noise. It is therefore impractical to rely on decoupling capacitors to manage power noise. Dedicated load balancing methods for voltage stacked systems are therefore preferable.

### 9.1.2 Existing work on mitigating load imbalances

As previously discussed, one method to mitigate load imbalances is to develop a more balanced stacked system. *CoreUnfolding*, a microarchitecture-level technique, is proposed in [135] to minimize the frequency and magnitude of the load imbalances. The magnitude and correlation of the power among different functional units are exploited, partitioning these units into a more balanced two layer voltage stacked system. In addition, an optimization framework for logic partitioning of a two layer voltage stacked system is proposed in [236]. A well balanced current profile is achieved, leading to a 2X improvement in battery lifetime.

A perfectly balanced voltage stacked system is however not possible. A different method is therefore required. One approach is to utilize a voltage regulator to manage the effects of the load imbalances on the power noise. A fully integrated on-chip push-pull switched capacitor converter is proposed in [136] to regulate the voltage across each layer when load imbalances occur. A four layer voltage stacked system dissipating 17.5 mW is achieved. A hybrid voltage stacked system is proposed in [133], where an off-chip voltage regulator module (VRM) combined with an on-chip integrated voltage regulator is utilized to address load imbalances. It is reported that 82.4 mm<sup>2</sup> of on-chip area is dedicated to the integrated voltage regulators [133]. A high efficiency, high power density fully integrated switched capacitor converter is proposed in [242], supporting a two layer voltage stacked system with on-chip trench capacitor technology. Significant work has focused on board level push-pull converters to support voltage stacked systems [237–239].

Board level voltage stacking does not mitigate certain high current issues such as thermal challenges and electromigration. On-chip level voltage stacking is however an excellent candidate to resolve these high current issues. The complex nature of the on-chip power delivery network and the current paths within an on-chip voltage stacked system has however not been considered in existing work. Moreover, due to limited on-chip area, off-chip converters are required to manage these load imbalances. Due to these parasitic effects and complex current paths, the power delivery system plays a critical role in a voltage stacked system.

## 9.2 Performance degradation due to parasitic impedances

As discussed in Section 9.1, decoupling capacitors cannot sufficiently suppress power noise. A voltage regulator is therefore required for voltage stacked systems. Due to the relatively low efficiency of low-dropout regulators and the large area overhead of buck converters, a switched capacitor converter is considered here [243]. A symmetric ladder topology switched capacitor converter [243] is utilized in the four layer voltage stacked system, as illustrated in Figure 9.4. Five voltage levels are provided by the converter, $V_{in}$, $V_{up}$, $V_{mid}$, $V_{low}$, and $V_{Gnd}$, dividing an IC into four layers. Each layer is connected between two adjacent voltage levels, forming four voltage domains. The load of each layer is modeled as a resistor in parallel with a decoupling capacitor. The nominal voltage for each layer is assumed to be 0.8 volts. $V_{in}$ is 3.2 volts since an equal $V_{DD}$ is assumed for each of the four layers.

Six capacitors are connected to different voltage domains through switches, as



Figure 9.4: Symmetric ladder topology switched capacitor converter utilized in a four layer voltage stacked system.

illustrated in Figure 9.4. Note that these capacitors are flying capacitors since the capacitors are not directly connected to ground. Depending upon the operation of the switches, a flying capacitor, $C_1$ for example, is connected to the first or the second voltage domain. The switches are divided into two groups, $S_1$ and $S_2$, which are alternately turned on and off. When the load is balanced, the current flowing through each layer is identical; the current passes through each of the four layers without flowing through the flying capacitors. Alternatively, when the load is unbalanced, the activity factor of each layer is different. Current also flows through the flying capacitors and switches. The flying capacitors charge and discharge depending upon the direction of the current. In this way, charge is passed among the four layers, regulating the voltage of each layer when a load imbalance occurs.

A simulation of this symmetric ladder switched capacitor converter, based on Cadence Virtuoso, is described where the PSPICE transistor model is used to characterize the switches. As discussed in the previous section, the converter must respond within 10 ns after the load imbalance occurs and before the voltage droop reaches the target 5% noise margin. The switching frequency of the circuit is therefore 100 MHz. The total flying capacitance is $4.5 \mu\text{F}$, which is practical for on-chip integration with a deep trench capacitor technology or off-chip capacitors [242]. The decoupling capacitor is $0.42 \mu\text{F}$, which is evenly distributed among the four layers. A 10% increase



Figure 9.5: Voltage droop after a 10% load imbalance within a four layer voltage stacked system with a switched capacitor ladder converter.

in the activity factor models the load imbalance. To produce the worst case load imbalance scenario, the activity factor of only one layer changes while the activity factor of the other layers remains unchanged.

The voltage droop after the load imbalance occurs is illustrated in Figure 9.5. The voltage levels across the four layers are illustrated, respectively, as $V_1$, $V_2$, $V_3$, and $V_4$. The voltage level of the four layers is initially stable at 0.8 volts due to the balanced load condition. The load imbalance occurs at $2.5 \mu\text{s}$, where $V_1$ and $V_2$ exhibit a voltage drop while $V_3$ and $V_4$ exhibit a rise in voltage. The first layer with a changing activity factor exhibits a maximum voltage drop of about 10 mV, which is 20X lower than the voltage drop without regulation or decoupling capacitors, as described in Section 9.1.1.1. In a voltage stacked system without regulation, only the layer with a higher activity factor exhibits a voltage drop; the remaining layers exhibit a rise in voltage. As illustrated in Figure 9.5, $V_2$ also exhibits a drop in voltage. This behavior occurs because the switched capacitor converter does not regulate a specific layer but rather naturally balances the load among all four layers [244].

Note that the parasitic impedance of the power network connecting adjacent layers is not considered in the previous simulation. This parasitic impedance however exists within a voltage stacked system. To evaluate the effects of the parasitic impedances, a parasitic resistance and inductance are added at the connection between adjacent layers, as illustrated in Figure 9.4. The effects of the parasitic impedances on the voltage drop are illustrated in Figure 9.6. As shown in Figure 9.6(a), the voltage drop significantly increases with increasing parasitic resistance. This large voltage drop is due to the high current flowing through the parasitic resistance within the power delivery network. The parasitic inductance also leads to a greater voltage drop, as illustrated in Figure 9.6(b), although not as significant as the parasitic resistance. The high switching frequency of the switched capacitor converter and the parasitic inductance lead to greater $L\,di/dt$ noise. Despite this critical parasitic effect on a voltage stacked system, this topic has not yet been discussed in the literature.



Figure 9.6: Voltage drop considering the effects of the parasitic impedances within a power delivery network. (a) Parasitic resistance increases from 0 to 5 mΩ while the parasitic inductance is assumed negligible; and (b) parasitic inductance increases from 0 to 80 pH while the parasitic resistance is assumed negligible.
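A first-order estimate indicates why the parasitic resistance dominates. The sketch below is not the simulation of Figure 9.6; it simply multiplies the 40 A stack current of Table 9.1 by the upper bound of the swept resistance, and multiplies the upper bound of the swept inductance by a placeholder current slew rate. The 0.5 A/ns slew rate is borrowed from the load transient range of Section 9.1.1.2 purely as an assumption, since the switching transient of the converter is not specified here. The function names are illustrative.

```python
# First-order estimates of the drop across inter-layer parasitic impedances.

def ir_drop(i_stack, r_parasitic):
    """Static drop across a series parasitic resistance (volts)."""
    return i_stack * r_parasitic


def ldi_dt_noise(l_parasitic, di_dt):
    """Inductive noise across a series parasitic inductance (volts)."""
    return l_parasitic * di_dt


I_STACK = 40.0       # amperes, four layer system in Table 9.1
R_MAX = 5e-3         # ohms, upper bound of the sweep in Figure 9.6(a)
L_MAX = 80e-12       # henries, upper bound of the sweep in Figure 9.6(b)
DI_DT = 0.5e9        # amperes per second, assumed placeholder (0.5 A/ns)

print(f"IR drop:     {ir_drop(I_STACK, R_MAX) * 1e3:.0f} mV")
print(f"L di/dt:     {ldi_dt_noise(L_MAX, DI_DT) * 1e3:.0f} mV")
```

Under these assumptions the resistive term is several times larger than the inductive term, in line with the qualitative trend described above.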

## 9.3 Power delivery network of voltage stacked differential power processing systems

Due to the serially stacked layers, the power delivery network supporting a high current voltage stacked system is a complex system with multiple voltage domains. A conventional power delivery network within a voltage stacked system can produce either an unrealistic design specification or significant parasitic impedances, damaging system reliability and efficiency. A power delivery network dedicated to voltage stacked systems as well as load balancing circuits, which considers the multi-domain characteristics and the effects of the parasitic impedances, is highly desirable. Such a power delivery network for voltage stacked systems is presented in Section 9.3.1. The primary differences between a conventional power network and a voltage stacked power network are reviewed. The power delivery network connecting the layers to the load balancing circuits, a differential power processing (DPP) system, is introduced in Section 9.3.2. The effects of the topology of the DPP system on the power delivery network are also discussed. In Section 9.3.3, a voltage stacked system with a load balancing converter utilizing a stack-to-bus topology is introduced. The effects of the parasitic impedances on the power delivery system are also reviewed.

### 9.3.1 Power delivery network of voltage stacked systems

The power delivery networks should consider the current paths, providing low resistance and reliable paths to distribute the current to the on-chip loads. Due to the serial connection between each layer, the current distribution paths in a voltage stacked system are different from a regular system. A comparison of the current paths between a four core regular non-stacked system and a four layer voltage stacked system is illustrated in Figure 9.7. In a non-stacked system, the current distribution is dominated by the current flowing from the bumps to the on-chip load, as illustrated in Figure 9.7(a). The current transferred from a bump is distributed within the power distribution cell [24]. A power distribution cell is a circular area within a power grid surrounding a power bump, where the on-chip load within this area draws current from this bump.

In a voltage stacked system, the current flows through the serially connected power network, producing a cross-core current path, as illustrated in Figure 9.7(b).



Figure 9.7: On-chip current path among different cores. a) Regular non-stacked four core system, and b) four layer voltage stacked system.

A header and footer layer are defined here as, respectively, the first and last layer along the current path. The current flows from the header layer (for example, core 1) to the footer layer (core 4), passing through the intermediate layers (cores 2 and 3), as illustrated in Figure 9.7(b). Due to this cross-core current path, the power bumps, connected to the package power plane, only exist in the footer and header layers of a voltage stacked system. No power I/Os exist within the intermediate layers connecting to the power plane of the package. Assuming the same bump pitch as a non-stacked four core system, only 1/4 of the power bumps are required in a voltage stacked system, producing the same current density within each power bump. The bump resources and area can therefore be saved for high bandwidth signals. The cross-core current path however produces a large voltage drop across the parasitic impedances within the power network, degrading the performance of the voltage regulator within a voltage stacked system. Note that this cross-core current path does not exist in a non-stacked power delivery network, where current is only distributed within the power distribution cell. This situation occurs since the power delivery networks within a non-stacked system are not serially connected.
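The 1/4 bump count can be checked with a short calculation. The sketch below uses the per-core values listed in Table 9.2 (1 W at 0.8 V, 1,000 µm core, 50 µm bump pitch). The function name `bump_current`, the assumption of perfectly balanced loads, and the bump count of $(1000/50)^2$ per core are illustrative assumptions rather than part of the original analysis.

```python
# Supply current and power bump count: non-stacked vs. voltage stacked.
# Assumes perfectly balanced loads and one power bump per pitch x pitch area.

def bump_current(n_cores, p_core_w, vdd_v, bumps_per_core, stacked):
    """Return (supply voltage, supply current, power bumps, current per bump)."""
    i_core = p_core_w / vdd_v
    if stacked:
        # Cores are in series: the same current flows through every layer,
        # and only one layer's worth of power bumps is needed.
        v_supply = n_cores * vdd_v
        i_supply = i_core
        n_bumps = bumps_per_core
    else:
        # Cores are in parallel: the currents add, and each core has bumps.
        v_supply = vdd_v
        i_supply = n_cores * i_core
        n_bumps = n_cores * bumps_per_core
    return v_supply, i_supply, n_bumps, i_supply / n_bumps

BUMPS_PER_CORE = (1000 // 50) ** 2   # core size / bump pitch, squared

flat = bump_current(4, 1.0, 0.8, BUMPS_PER_CORE, stacked=False)
stack = bump_current(4, 1.0, 0.8, BUMPS_PER_CORE, stacked=True)

for label, (v, i, n, i_bump) in (("non-stacked", flat), ("stacked", stack)):
    print(f"{label:12s}: {v:.1f} V, {i:.2f} A, {n} bumps, "
          f"{i_bump * 1e3:.3f} mA per bump")
```

Under these assumptions, the stacked system draws one quarter of the current through one quarter of the power bumps, so the current per bump is unchanged.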

To evaluate the effects of the cross-core current path on the power delivery system, a comparison of the current density between a regular power network and a voltage stacked power network is presented. The specifications of the on-chip power grid are listed in Table 9.2. This specification applies to both the non-stacked and voltage stacked power networks.

Table 9.2: Specifications of the on-chip power network [33].

| Specs                                  | Value              |
|----------------------------------------|--------------------|
| Core size ($\mu\text{m}$)              | 1,000 x 1,000      |
| Power bump pitch ($\mu\text{m}$)       | 50                 |
| Power metal line pitch ($\mu\text{m}$) | 50                 |
| Metal line width ($\mu\text{m}$)       | 1                  |
| Metal line depth ($\mu\text{m}$)       | 2.5                |
| Power grid layer number                | 2                  |
| Cu conductivity (S/m)                  | $5.88 \times 10^7$ |
| $V_{dd}$ (V)                           | 0.8                |
| Average power consumption (W)          | 1                  |

The current density models of a regular and voltage stacked power network are illustrated in Figure 9.8. In a regular non-stacked power grid, the current flow is based on the power distribution cell, a circular area with a diameter equal to the pitch of the power bump, as illustrated in Figure 9.8(a). In an ideal case, where the current loads are evenly distributed across the IC, the current from the package to the on-chip loads is limited to the power cell, producing a low impedance path. Alternatively, in a voltage stacked system, the cross-core current passes from the header layer to the intermediate layers through the same power grid, as illustrated in Figure 9.8(b). The distance of this current path is however significantly longer than with a regular non-stacked power grid.

The current density is

$$J = \frac{I_{total}}{N \cdot A}, \quad (9.2)$$



Figure 9.8: Current density model, a) power distribution cell-based regular non-stacked power grid, and b) voltage stacked power grid with cross-core current path.

where $I_{total}$ and $A$ are, respectively, the total current flowing through the cross-core path and the cross sectional area of the power metal line. $I_{total}$ and $A$ are the same in both the non-stacked and voltage stacked power grids. $N$ is the total number of metal lines supporting the cross-core current path. In a non-stacked power grid, $N$ is $2 \times N_{bump}$, where $N_{bump}$ is the number of power bumps. Alternatively, in a voltage stacked power grid, $N$ is the number of metal lines connecting adjacent layers. The current density of a non-stacked and a voltage stacked power network is, respectively, 625 A/mm$^2$ and 25,000 A/mm$^2$. The voltage stacked power network exhibits a 40X higher current density as compared with a non-stacked power network, which is impractical due to electromigration constraints, as suggested by (3.3).
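The two current densities follow from (9.2) and the values in Table 9.2. The sketch below is a minimal check under two assumptions that are not stated explicitly in the text: the number of power bumps per core is taken as $(1000/50)^2 = 400$, and the number of metal lines connecting adjacent layers is taken as the core width divided by the metal line pitch, $1000/50 = 20$. The function name `current_density` is illustrative.

```python
# Current density from (9.2): J = I_total / (N * A), using Table 9.2 values.

def current_density(i_total_a, n_lines, area_mm2):
    """Current density in A/mm^2 for N parallel metal lines of area A."""
    return i_total_a / (n_lines * area_mm2)

I_TOTAL_A = 1.0 / 0.8                 # average power / Vdd = 1.25 A
AREA_MM2 = 1.0e-3 * 2.5e-3            # 1 um x 2.5 um line, in mm^2

N_BUMP = (1000 // 50) ** 2            # assumed: one bump per 50 um x 50 um
N_NON_STACKED = 2 * N_BUMP            # N = 2 x N_bump, as stated in the text
N_STACKED = 1000 // 50                # assumed: core width / metal line pitch

j_non_stacked = current_density(I_TOTAL_A, N_NON_STACKED, AREA_MM2)
j_stacked = current_density(I_TOTAL_A, N_STACKED, AREA_MM2)

print(f"non-stacked: {j_non_stacked:,.0f} A/mm^2")
print(f"stacked:     {j_stacked:,.0f} A/mm^2")
print(f"ratio:       {j_stacked / j_non_stacked:.0f}X")
```

With these assumptions, the sketch reproduces the 625 A/mm$^2$ and 25,000 A/mm$^2$ values and the 40X ratio reported above.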

The DC voltage drop due to the cross-core current path is

$$V_{drop} = J \cdot A \cdot R_{unit} \cdot D_{eff}, \quad (9.3)$$

where $R_{unit}$ is the unit resistance of the metal line, and $D_{eff}$ is the effective distance of this current path. $D_{eff}$ within the two power grids is, respectively, equal to the radius of the power cell and the size of the layer. With the specifications listed in Table 9.2, the effective distance is 40X longer in the voltage stacked power grid. Combined with the 40X higher current density, this longer path leads to a 1,600X increase in voltage drop in a voltage stacked power grid as compared with a non-stacked power grid. Observe that a conventional on-chip power grid is not designed to consider cross-core current paths. Due to the significantly greater current density and voltage drop, a dedicated power delivery network is necessary for voltage stacked systems.
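The 1,600X ratio can be verified from (9.3). In the sketch below, the unit resistance is derived from the Cu conductivity and the line cross section in Table 9.2, $R_{unit} = 1/(\sigma A)$, which is an assumption about how $R_{unit}$ is obtained. The absolute voltages therefore serve only as rough indications, while the ratio is independent of $R_{unit}$. The function name `dc_drop` is illustrative.

```python
# DC voltage drop from (9.3): V_drop = J * A * R_unit * D_eff (SI units).

SIGMA_S_PER_M = 5.88e7                  # Cu conductivity, Table 9.2
AREA_M2 = 1.0e-6 * 2.5e-6               # 1 um x 2.5 um metal line
R_UNIT_OHM_PER_M = 1.0 / (SIGMA_S_PER_M * AREA_M2)   # assumed derivation

def dc_drop(j_a_per_m2, d_eff_m):
    """Voltage drop along one metal line carrying J * A over distance D_eff."""
    return j_a_per_m2 * AREA_M2 * R_UNIT_OHM_PER_M * d_eff_m

J_NON_STACKED = 625.0 * 1e6             # 625 A/mm^2 in A/m^2
J_STACKED = 25000.0 * 1e6               # 25,000 A/mm^2 in A/m^2

D_NON_STACKED = 25e-6                   # radius of the power cell (pitch / 2)
D_STACKED = 1000e-6                     # size of the layer

v_non_stacked = dc_drop(J_NON_STACKED, D_NON_STACKED)
v_stacked = dc_drop(J_STACKED, D_STACKED)

print(f"non-stacked drop: {v_non_stacked * 1e3:.3f} mV")
print(f"stacked drop:     {v_stacked * 1e3:.1f} mV")
print(f"ratio:            {v_stacked / v_non_stacked:.0f}X")
```

The ratio is the product of the 40X increase in current density and the 40X increase in effective distance.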

### 9.3.2 Power delivery network for DPP systems

As discussed in Section 9.1, load balancing circuits are critical in achieving a target power noise. By processing the mismatched power between loads, differential power processing (DPP) [245] is an effective technique to regulate different layers within a voltage stacked system. Two converter topologies exist in DPP, stack-to-stack and stack-to-bus. The power delivery network connecting layers to the DPP converter varies significantly between these two topologies. The parasitic characteristics of these two topologies are therefore quite different. An exploration of the effects of the parasitic impedances on these two topologies is discussed here. Note that the power delivery network described in this section is different from the power network discussed in the previous section. The focus of this section is on the power delivery system within the DPP system, which is between the voltage stacked system and the load balancing circuits. The current path between the stacked layers and the converters is the primary issue. The focus of Section 9.3.1 is on the power delivery network between stacked layers, where the current path between layers is the primary issue.

Figure 9.9: Two VR topologies utilizing DPP in a voltage stacked system. (a) Stack-to-bus topology, where $n$ DC/DC converters are required for an $n$ layer voltage stacked system, and (b) stack-to-stack topology, where $n-1$ DC/DC converters are required for an $n$ layer voltage stacked system.

The topology of a DPP system is a critical factor affecting the operational behavior and performance of load balancing techniques. Multiple topologies for DPP can be applied to voltage stacked systems depending upon how the inputs and outputs of the converters are connected to the layers. Stack-to-bus and stack-to-stack are two common topologies for DPP, as illustrated in Figure 9.9. In the stack-to-bus topology (see Figure 9.9(a)), $n$ DC/DC converters are required for an $n$ layer voltage stacked system. The input of each DC/DC converter is connected to the same power bus, while the output of each DC/DC converter is connected to a nearby layer. In the stack-to-stack topology (see Figure 9.9(b)), $n-1$ DC/DC converters are connected between a nearby layer and two adjacent layers. Multiple types of converters, including a buck-boost converter and a transformer-based converter, can be utilized as the DC/DC converter within these topologies. For example, a multi-phase buck-boost converter can provide DC/DC conversion in a stack-to-stack topology [237, 238]. By varying the duty cycle of the buck-boost converters, voltage variations due to load imbalances can be regulated.

The power delivery network and the effects of the parasitic impedances play an essential role within a DPP system, greatly affecting the capability of the voltage regulator. Due to the multi-domain characteristics of a voltage stacked system and the complex nature of a DPP system, the power delivery network of the overall system can be quite complex. The current path and parasitic impedances of a power delivery system within a stack-to-bus and stack-to-stack topology are illustrated in Figure 9.10. Five voltage domains, P1, G1 (P2), G2 (P3), G3 (P4), and G4, exist in a four layer voltage stacked system with both DPP topologies. The ground net G of the upper layer is the power net P of the adjacent lower layer, as illustrated in Figure 9.10.
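The naming of the voltage domains can be enumerated for an arbitrary number of layers. The sketch below is illustrative: the function name `voltage_domains` is hypothetical, the nominal voltages assume perfectly balanced layers each operating at $V_{dd}$ = 0.8 V (Table 9.2), and the last ground net is taken as the 0 V reference.

```python
# Enumerate the voltage domains of an n layer voltage stacked system.
# The ground net of layer k is the power net of layer k + 1.

def voltage_domains(n_layers, vdd_v):
    """Return a list of (net name, nominal voltage) from top to bottom."""
    domains = []
    for k in range(n_layers + 1):
        if k == 0:
            name = "P1"                       # power net of the first layer
        elif k == n_layers:
            name = f"G{k}"                    # ground net of the last layer
        else:
            name = f"G{k} (P{k + 1})"         # shared by layers k and k + 1
        # Assumes balanced loads, so each layer drops exactly vdd_v.
        domains.append((name, (n_layers - k) * vdd_v))
    return domains

for name, volts in voltage_domains(4, 0.8):
    print(f"{name:8s} {volts:.1f} V")
```

An $n$ layer system therefore contains $n + 1$ voltage domains, of which $n - 1$ are shared between two adjacent layers.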

The effects of the parasitic impedances on these two DPP topologies are quite different. A zoom-in of net G1 (P2) within the stack-to-bus and stack-to-stack topologies is illustrated, respectively, in Figures 9.10(a) and 9.10(b). The parasitic impedances within the power delivery system are shown, where S-S and S-VR represent, respectively, the parasitic impedance between adjacent stacked layers and the parasitic impedance between the layers and converters. The current flowing through S-S and S-VR is highlighted by an arrow line. As illustrated in Figure 9.10(a), in a stack-to-bus topology, the converter  $n$  is only connected to layer  $n$ . A self-contained current loop is formed within layer  $n$ , where the current flows from converter  $n$  to layer  $n$  and returns to converter  $n$ . The current regulated by converter  $n$  does not flow to the other layers. This self-contained current loop does not exist in a stack-to-stack topology.

In a stack-to-stack topology, the current from converter $n$ can either flow to layer $n$ or layer $n+1$ depending upon the load imbalance scenarios. In addition, multiple converters, including converter $n-1$, converter $n$, and converter $n+1$, are connected to layer $n$, as illustrated in Figure 9.10(b). The voltage level of layer $n$ is therefore determined by the interactions among the three converters, leading to a complex control system for the DPP. A multi-input multi-output (MIMO) control system is required to coordinate the regulation of each layer, increasing the design complexity as well as the cost of the voltage stacked system. Moreover, the current produced by the converter flows through the S-S parasitic impedance, producing a significant voltage drop in the stack-to-stack topology.

Figure 9.10: The current path and parasitic impedances of the power delivery system within (a) stack-to-bus topology, and (b) stack-to-stack topology.

### 9.3.3 Resonant converter-based stack-to-bus topology

The power delivery network of a voltage stacked DPP system is discussed in the previous section. A case study of the parasitic characterization of these systems considering both power delivery systems is provided. Due to certain advantages such as a relatively straightforward regulation scheme, well characterized current paths, and a self-contained current loop, a stack-to-bus DPP system is considered here. As illustrated in Figure 9.11, the DC/DC converter is based on a transformer-based resonant converter [246], where the input of the converter is connected to the system power bus and the output is connected to the stacked layer. Four converters are required to support a four layer voltage stacked system.

The design specifications of the DPP system are listed in Table 9.3. Since a transformer-based resonant converter is considered here, on-chip integration of this DPP system is not practical. An off-chip DPP system is therefore assumed. The switching frequency of the converter is accordingly decreased as compared with the on-chip switched capacitor converter described in Section 9.1. The S-VR parasitic impedance should be carefully considered. Due to the off-chip DPP integration, the S-VR parasitic impedance is no longer negligible. Due to the self-contained current loop in the stack-to-bus topology, the S-S parasitic impedance exhibits little effect on the power noise. This behavior is due to the differential current generated from converter $n$, which is limited within layer $n$ without passing through the S-S parasitic impedance. A methodology for designing the power delivery system is introduced in the following section.

Figure 9.11: Resonant converter-based DPP system with stack-to-bus topology in a four layer voltage stacked system.

Table 9.3: Specification of a resonant converter-based DPP system with a stack-to-bus topology.

| Specs                       | Value         |
|-----------------------------|---------------|
| Frequency                   | 2 MHz         |
| Parasitic resistance (S-VR) | 0.6 mΩ        |
| Parasitic inductance (S-VR) | 1.6 pH        |
| Parasitic resistance (S-S)  | 1.2 mΩ        |
| Parasitic inductance (S-S)  | 8 pH          |
| Load imbalance              | 8 - 80 A      |
| di/dt                       | 7.2 A/ns      |
| On-chip de-cap              | 4 x 5 $\mu$F  |

The layers within the voltage stacked system are modeled as a resistive load with an 8 ampere load current. To model a worst case load imbalance scenario, the load current is increased from 8 to 80 amperes on layer two while the load current on the remaining layers is maintained constant. To suppress the power noise due to this load imbalance, a voltage-controlled oscillator is utilized within each resonant converter. By varying the switching frequency of the resonant converter around the nominal frequency, the voltage across each layer is regulated.

To evaluate the effects of the parasitic impedances of the DPP power network, the range of parasitic resistance and inductance is explored. The limits of the parasitic impedance are set by the assumption that the DPP system is integrated at the package level, forming a system-in-package. The ranges of parasitic resistance and inductance are, respectively, 0.1 to 5 mΩ and 0 to 80 pH, common values for a system-in-package [193]. The voltage drop as a function of the parasitic impedance is illustrated in Figure 9.12. The region with the lightest shading illustrates the parasitic impedance that satisfies the noise margin, assuming a 5% noise margin with $V_{DD}$ equal to 0.8 volts. To ensure the power noise is within the target noise margin, the parasitic resistance and inductance are maintained below, respectively, 1 mΩ and 20 pH. Note that the parasitic inductance exhibits a significant effect on the voltage drop when the parasitic resistance is low [247]. The parasitic resistance has a dominant effect on the voltage drop once the magnitude of the parasitic resistance exceeds 1 mΩ.

Figure 9.12: Voltage drop as a function of parasitic resistance and inductance between a stacked layer and a converter. The parasitic resistance ranges from 0.1 to 5 mΩ. The parasitic inductance ranges from 0 to 50 pH.
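The noise margin and the reported limits on the parasitic impedance can be expressed as a simple design check. The sketch below is not the simulation behind Figure 9.12. It treats the reported limits (1 mΩ and 20 pH) as a rectangular acceptance region, which is a simplifying assumption since the actual region in the figure may have a different shape. The function names are illustrative.

```python
# Design check against the limits reported for the S-VR parasitic impedance.
# The rectangular acceptance region is a simplification of Figure 9.12.

V_DD_V = 0.8                 # supply voltage per layer
MARGIN_FRACTION = 0.05       # 5% noise margin
R_MAX_MOHM = 1.0             # reported resistance limit
L_MAX_PH = 20.0              # reported inductance limit

def noise_margin_v(vdd_v, fraction):
    """Allowed voltage drop in volts."""
    return vdd_v * fraction

def within_reported_limits(r_mohm, l_ph):
    """True if both parasitics are below the limits reported in the text."""
    return r_mohm < R_MAX_MOHM and l_ph < L_MAX_PH

print(f"noise margin: {noise_margin_v(V_DD_V, MARGIN_FRACTION) * 1e3:.0f} mV")

# S-VR values listed in Table 9.3, and the upper corner of the explored range.
for r_mohm, l_ph in ((0.6, 1.6), (5.0, 80.0)):
    status = "within" if within_reported_limits(r_mohm, l_ph) else "outside"
    print(f"R = {r_mohm} mOhm, L = {l_ph} pH -> {status} the reported limits")
```

The S-VR values listed in Table 9.3 fall within these limits, while the upper corner of the explored range does not.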

## 9.4 Tile-based power delivery network for voltage stacked systems

High current challenges can be significantly mitigated by voltage stacking. The power delivery system within a voltage stacked DPP system is however highly challenging. The major challenges include: 1) the complex nature of a multi-domain power delivery network to support a multi-layer voltage stacked system, as illustrated in Figure 9.10; 2) the cross-core current path due to the serial connection of the power network, as discussed in Section 9.3.1; and 3) the effect of the parasitic impedance within the power network of a DPP system, as discussed in Section 9.3.2. A methodology for designing a tile-based power delivery system is proposed here for voltage stacked DPP systems, targeting these challenges.

Figure 9.13: A four layer voltage stacked system with a stack-to-bus DPP system.

This tile-based power delivery system is intended for a voltage stacked DPP system. Package level DPP integration is assumed, where the power planes within the package are utilized to construct a tile-based power delivery network. Consider a four layer voltage stacked system with a stack-to-bus DPP system, as illustrated in Figure 9.13. Each square block represents one layer, connected to a DC/DC converter. Four layers are oriented to form a square shape, a practical shape for an IC. The current flow within the voltage stacked DPP system is highlighted by the arrow lines.

Figure 9.14: Cross-sectional view of a tile-based power delivery network within a package, including the P1, G1, and G2 nets.

As discussed in Section 9.3.1, conventional on-chip metal lines cannot manage the high currents being transferred from layer to layer. To manage the cross-core current path, package resources are allocated for this current path in the tile-based power delivery system. A cross-sectional view of the proposed tile-based power delivery network is illustrated in Figure 9.14. The location of the cross-section is highlighted by the vertical surface. The power planes and vias connecting the VR, layer one, and layer two as well as the current flow path are illustrated.

To manage the challenges of a multi-domain power network, a two layer interdigitated power plane is proposed for the tile-based power network. The power domains are isolated from each other. Note that two layers are the minimum number of power planes for this tile-based power delivery network, where a two layer system is assumed as an example for the following discussion. Consider the P1, G1, and G2 power domains, as illustrated in Figure 9.14. The P1 power plane is beneath DC/DC converter 1 and layer 1. The G1 power plane is one layer above the P1 power plane, occupying the area beneath DC/DC 1, layer 1, and layer 2. The G2 power plane is one layer beneath the P1 power plane, occupying the area beneath layer 2. The P1 and G2 power domains are separated by the dashed line. Only two layers are required for the five power domains, P1, G1, G2, G3, and G4. The power planes for the different nets are separated by the square shape of each layer within the voltage stacked system, forming a tile shaped power plane.

Figure 9.15: Decomposition of the tile-based power delivery system for each power net.

A top view of the tile-based power network as well as a decomposition of each power net are illustrated in Figure 9.15. Four DC/DC converters and five power nets are included. The grey squares, numbered as 1, 2, 3, and 4, represent the four layers in the voltage stacked system. The hatched squares, numbered as 5, 6, 7, and 8, represent the four DC/DC converters in the DPP system. The DC/DC converters are oriented such that each converter is next to the connected layer. The five tiles with increasing grey level represent, respectively, power nets P1, G1, G2, G3, and G4. The numbers on the tiles indicate the horizontal location of each tile. Consider an example of power net P1. The numbers 5 and 1 on this tile indicate that the two squares within the tile are beneath converter 1 and layer 1, which are numbered, respectively, 5 and 1. The glowing effect around the tiles indicates the vertical location of each tile. A square without the glowing effect indicates that the tile is located at the bottom layer of the two layer power planes, while a square with the glowing effect indicates that the tile is located at the top layer of the two layer power planes, as illustrated in Figure 9.14. Each number has one square with the glowing effect and one square without the glowing effect, indicating that, for each converter square or stacked layer square, two power plane layers exist beneath the square. As discussed in Section 9.3.2, the parasitic impedance between the stacked layer and converter produces significant power noise. To manage this challenge, the proposed tile-based power delivery network is designed to be scalable to a higher number of power planes, reducing the overall parasitic impedance of the DPP system.
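The property that every square has exactly one top plane and one bottom plane beneath it can be checked with a small data structure. The sketch below is one possible reading of the tile arrangement rather than a description of Figure 9.15. It assumes that each net covers the layers and converters to which the net is electrically connected in the stack-to-bus topology, and that the nets alternate between the bottom and top planes starting with P1 on the bottom plane. The function names `tile_nets` and `planes_under_each_square` are hypothetical.

```python
# Sketch of a two plane tile assignment for an n layer stack-to-bus system.
# Squares 1..n are the stacked layers; squares n+1..2n are the converters.

def tile_nets(n_layers):
    """Return {net: {"plane": str, "squares": [int, ...]}}."""
    nets = {}
    for k in range(n_layers + 1):
        name = "P1" if k == 0 else f"G{k}"
        # Assumption: nets alternate planes so adjacent nets never collide.
        plane = "bottom" if k % 2 == 0 else "top"
        squares = []
        # Net k is the ground of layer k and the power of layer k + 1.
        for layer in (k, k + 1):
            if 1 <= layer <= n_layers:
                squares += [layer, n_layers + layer]   # layer and its converter
        nets[name] = {"plane": plane, "squares": squares}
    return nets

def planes_under_each_square(nets, n_layers):
    """Return {square: [planes beneath that square]}."""
    under = {square: [] for square in range(1, 2 * n_layers + 1)}
    for net in nets.values():
        for square in net["squares"]:
            under[square].append(net["plane"])
    return under

nets = tile_nets(4)
for name, info in nets.items():
    print(f"{name}: {info['plane']:6s} plane, squares {info['squares']}")

under = planes_under_each_square(nets, 4)
assert all(sorted(p) == ["bottom", "top"] for p in under.values())
```

In this reading, net P1 covers squares 1 and 5, matching the example in the text, and the five nets fit within two planes since any two nets sharing a square are assigned to different planes.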

Consider an example of the G1 net utilizing the tile-based power delivery network. Three current paths flow within this power domain, as illustrated in Figure 9.16: 1) the current loop between converter 2 and layer 2 for regulation, 2) the current flowing from layer 2 to layer 3, and 3) the current loop between converter 3 and layer 3 for regulation. The tile-based power network matches quite well with the circuit model of the stack-to-bus topology, as discussed in Section 9.3.2. In this case, parasitic extraction based on a tile-based power delivery system can be directly integrated into the circuit simulation and verification processes. The parasitic impedance is based on the square shape, which can be extracted by an EM solver [248]. Note that other power delivery systems that do not follow the proposed tile-based design methodology may introduce additional parasitic impedances that are not characterized by the circuit model. The intuitive net separation and tile shape ease the design process and characterization of the parasitic impedances, making this tile-based design methodology a useful method for exploring different power delivery systems within a voltage stacked system. This design methodology can also be applied to the stack-to-stack topology.

Figure 9.16: Tile-based power network matching the circuit model of the stack-to-bus topology.

Table 9.4: Simulation results of a stack-to-bus DPP system, which utilizes the tile-based power delivery system with ten layer power planes.

| Parameters                      | Value      |
|---------------------------------|------------|
| Number of package power planes  | 10         |
| Worst case $V_{drop}$           | 46 mV      |
| DC $V_{drop}$                   | 30 mV      |
| Ripple                          | 1 mV       |
| Settling time                   | 10 $\mu$s  |



Figure 9.17: Maximum voltage drop as a function of the number of power planes in the tile-based power delivery system.

Simulations of the proposed tile-based power delivery network are provided here. The simulation framework discussed in Section 9.3.3 is utilized. The parasitic extraction is based on [248]. Ten layers are assumed within the power plane of the package. Ten layers is a practical number as a high number of layers within a package is typical for a high performance computing system [193]. The simulation results are listed in Table 9.4. A maximum voltage drop of 46 mV is exhibited for a load imbalance ranging from 8 to 80 A for a stack-to-bus DPP system. The maximum voltage drop significantly increases with fewer power planes, as illustrated in Figure 9.17. Depending upon the package technology, the voltage drop can be further reduced by utilizing additional power planes. Note that the layer number shown in Figure 9.17 includes all of the power networks for the voltage stacked system. No additional power planes are required.

## 9.5 Summary

The challenge of load imbalances in a voltage stacked high current system is discussed in this chapter. The power delivery system for a voltage stacked DPP system is a primary issue. It is observed that the on-chip power grid cannot manage the cross-core current paths within a voltage stacked system. Two topologies, stack-to-stack and stack-to-bus, for DPP systems are evaluated, demonstrating a challenging power delivery network design process due to the effects of the parasitic impedances. Targeting these challenges, a design methodology for a tile-based power delivery network is proposed. It is observed that this tile-based power delivery network is effective in mitigating certain challenges such as the complex nature of a multi-domain power network, cross-core current paths flowing between layers, and the effects of the parasitic impedances on the DPP system. The tile-based design methodology also provides an intuitive and efficient process for characterizing the impedances within a power delivery network in voltage stacked systems.

# Chapter 10

## Conclusions

Although CMOS scaling has slowed, the demand for higher integration, greater performance, and more diverse functionality in high performance computing (HPC) systems has yet to end. By utilizing advanced packaging techniques to exploit the vertical dimension, 2.5-D and 3-D systems have effectively extended scaling. The vertical current paths and dense nature of 3-D stacking however make 2.5-D and 3-D systems prone to power integrity issues, leading to challenges in the design of the power delivery system. The increasing speed and throughput of HPC systems along with the slowdown of CMOS voltage scaling have led to higher power consumption in HPC systems. This higher power consumption leads to greater on-chip current demand, damaging the reliability and power efficiency of already challenged power delivery networks in 2.5-D and 3-D systems. In this dissertation, several primary power integrity challenges in 2-D, 2.5-D, and 3-D systems, such as power noise, electromigration, EMI, and last inch power loss, are addressed across a wide range of abstraction levels, unlocking the many advantages of 2.5-D and 3-D integration.

A 2-D IC is the basic component of more complex 2.5-D and 3-D systems. It is therefore important to evaluate the effects of high current demand on power delivery systems in 2-D ICs before considering 2.5-D and 3-D systems. One of the most critical parameters affecting the reliability of a 2-D power delivery system is power noise. Trends in power noise within each component of a power grid in advanced technologies are reviewed. The on-chip power grid produces power noise differently based on the metallization scheme in the power delivery system. The effects of multiple on-chip metallization schemes on suppressing power noise are evaluated, including additional metal layers for the global power grid, stripes, wider local power rails, and advanced materials. The effectiveness of different metallization schemes on reducing power noise in different technology nodes varies significantly. It is observed that wider local power rails are an effective method to suppress on-chip power noise in advanced technology nodes when high current is required.

In addition to the challenges of on-chip power noise, 2-D power systems suffer from electromigration at the package and board level due to the high current. One effective approach to mitigate this issue is to reduce the current flowing into the power grid while maintaining the same performance. By utilizing serially connected power delivery networks, voltage stacking significantly reduces the current flowing through the package and board without affecting the digital logic. A special power converter is required to manage load imbalances within a voltage stacked system. The challenges of a voltage stacked power network include multiple voltage domains, high current demand, and exotic power converters. These topics are discussed in this dissertation. A case study of a power network for voltage stacked systems is also provided.

By utilizing advanced package technologies to bring the components physically closer, 2.5-D systems provide greater integration and are more achievable as compared with 3-D systems. In addition to conventional power integrity issues in 2-D power grids, the close proximity leads to new challenges within a 2.5-D power system at the package and board levels. Consider an example of a 2.5-D system, a VR-on-package topology, where the PoL converters are located close to the processor to reduce the “last inch power loss.” EMI is generally not a critical factor in 2-D power grids; however, EMI can pose significant concerns in the reliability of 2.5-D systems. A distributed resonant converter that supports a high step-down ratio and low EMI is therefore proposed in this dissertation. More than 3X lower EMI in the distributed resonant converter is achieved as compared with a conventional resonant converter. Since a 2.5-D system is a heterogeneous system with multiple components, placement becomes another critical design tradeoff. VR-top and -bottom topologies are described in this dissertation in which EMI, power integrity, and power efficiency of these two VR-on-package topologies are compared. It is observed that the VR-top placement topology exhibits lower EMI and IR drop as compared with the VR-bottom placement topology while exhibiting higher power loss.

To exploit the full potential of the vertical dimension, the challenges of high current 3-D integrated systems need to be addressed. Specific facets introduced in a 3-D power delivery system are the unique vertical current paths, and interactions between these vertical paths and conventional 2-D current paths. A 3-D RDL behaves as an interface between the P/G TSVs within adjacent layers, and between the P/G TSVs and the 2-D power grid. A comprehensive analysis of the functionality, interactions between the 2-D grid and P/G TSVs, and different topologies in an RDL is also provided in this dissertation. A novel grid-based RDL topology is proposed to achieve reliable vertical and horizontal current paths within a 3-D power delivery network. A grid-based RDL is proposed to satisfy high current demand and noise margins within a practical distribution of P/G TSVs. It is also observed that a grid-based RDL topology is an excellent candidate to support a nonuniform TSV distribution and TSV mismatches, easing constraints on the placement of the TSVs within high current 3-D systems.

Reliable and efficient power delivery networks to support high current in 2-D, 2.5-D, and 3-D systems are a key element of a high performance system. Three directions are considered here to mitigate these high current challenges in HPC systems: 1) low power methodologies at the device, circuit, and architectural levels to reduce on-chip power consumption, 2) circuit techniques, topology improvements, and physical design optimization to mitigate the effects of high current on the power delivery systems, and 3) novel topologies of power delivery systems to reduce the overall current. The second and third research directions are the focus of this dissertation since these two approaches are primary issues in the development of power delivery systems. The first research direction also remains important. The primary objective of this dissertation is to introduce the importance as well as the challenges in high current 3-D power networks and to provide insight and guidelines to produce effective solutions to these important technical challenges.

# Chapter 11

## Future Work

Power consumption will continue to increase in high performance computing systems, leading to significant current flowing through the power delivery network. High currents challenge the entire hierarchy of the power distribution network, ranging from the board, package, and on-chip, across all IC platforms, including 2-D, 2.5-D, and 3-D technologies. High current issues, for example, power noise, electromigration, and electromagnetic interference, are discussed in the previous chapters. Research to mitigate these high current issues is also described in this dissertation from the perspective of power delivery systems.

As a technique to reduce on-chip current demand, voltage stacking has recently received significant attention in both conventional 2-D ICs and 3-D platforms. Possible future work includes applying different converter topologies to a voltage stacked environment, as compared with the converter topology described in Chapter 9. 3-D integration suffers significantly from thermal issues due to the small form factor and high on-chip current demand. One promising solution is applying voltage stacking to 3-D systems to simultaneously mitigate the thermal and high current challenges.

The rest of this chapter is organized as follows. As an alternative technique to balance power in voltage stacked systems, a buck converter-based stack-to-stack topology is described in Section 11.1. The challenges of utilizing this topology in differential power processing are also discussed. The potential advantages of applying voltage stacking to three-dimensional systems are discussed in Section 11.2.

## **11.1 Comparison between stack-to-bus and stack-to-stack topologies for voltage stacked systems**

The challenges of on-chip voltage stacking are significant, where load imbalances across different layers lead to voltage variations within each power domain. These voltage variations have become a critical issue, challenging system performance. Differential power processing (DPP) [245], providing only the difference between the load current of adjacent domains, is one way to tackle load imbalances in voltage stacked systems. By utilizing DPP, multiple current balancing techniques have been developed (see Chapter 9) with on-chip decoupling capacitors, switched-capacitor converters, and resonant converters. A buck converter based DPP has also recently been proposed [237]. The topology of a DPP system is an essential factor affecting the working principles and performance of current balancing techniques. Multiple topologies for DPP can be applied to voltage stacked systems, depending upon the connections between the inputs and outputs of the stacks.

Stack-to-bus and stack-to-stack are two common topologies for DPP [245], as illustrated in Figure 11.1. In a stack-to-bus topology (see Figure 11.1(a)),  $n$  DC/DC converters are required for an  $n$  layer voltage stacked system. The input of each DC/DC converter is connected to the same system power bus, while the output of each DC/DC converter is connected to the nearby stack. In the stack-to-stack topology (see Figure 11.1(b)),  $n-1$  DC/DC converters are connected between the nearby stack and adjacent stacks. A multi-phase buck converter is one way to achieve a DC/DC converter in this topology [237, 238]. By varying the duty cycle of the buck converter and coordinating with adjacent buck converters, voltage variations due to load imbalances can be regulated.
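The converter counts and the differential currents handled by DPP can be illustrated with a minimal sketch. The function names and the example load currents are hypothetical and do not come from the dissertation. The sketch only applies Kirchhoff's current law at each intermediate node of an ideal series stack; how the mismatch current is shared among individual converters depends on the topology and is not modeled here.

```python
def converter_count(n_layers, topology):
    """Number of DC/DC converters for an n layer voltage stacked system.

    The counts follow the text: n converters for stack-to-bus,
    n - 1 converters for stack-to-stack.
    """
    if n_layers < 2:
        raise ValueError("a voltage stack needs at least two layers")
    if topology == "stack-to-bus":
        return n_layers
    if topology == "stack-to-stack":
        return n_layers - 1
    raise ValueError(f"unknown topology: {topology}")


def node_mismatch_currents(load_currents):
    """Mismatch current at each intermediate node of a series stack.

    load_currents[i] is the load current of layer i, listed from the top
    layer to the bottom layer. At the node between layer i and layer i + 1,
    the difference I_i - I_(i+1) must be sourced or sunk by the DPP
    converters (assumption: ideal stack, steady state).
    """
    return [upper - lower
            for upper, lower in zip(load_currents, load_currents[1:])]


if __name__ == "__main__":
    loads = [1.0, 0.8, 1.2, 1.0]  # hypothetical load currents in amperes
    print(converter_count(len(loads), "stack-to-bus"))
    print(converter_count(len(loads), "stack-to-stack"))
    print(node_mismatch_currents(loads))
```

With perfectly balanced loads, every mismatch current is zero and the converters process no power, which is the motivation for processing only the differences.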

Figure 11.1: Differential power processing to balance loads across different layers within a voltage stacked system. (a) Stack-to-bus topology, where $n$ DC/DC converters are required for an $n$ layer voltage stacked system, and (b) stack-to-stack topology, where $n-1$ DC/DC converters are required for an $n$ layer voltage stacked system.

In the stack-to-bus topology, each DC/DC converter (e.g., DC/DC 2 in Figure 11.1(a)) is connected to a nearby stack (Stack 2 in Figure 11.1(a)), which regulates the voltage across this stack. Independent current loops are formed between each stack (e.g., Stack 2) and the nearby DC/DC converter (DC/DC 2), without interacting with other stacks or DC/DC converters. In the stack-to-stack topology, alternatively, each DC/DC converter (e.g., DC/DC 2 in Figure 11.1(b)) is connected to a nearby stack (Stack 2 in Figure 11.1(b)) and adjacent stacks (Stack 1 and 3), as illustrated in Figure 11.1(b). Current flows between certain DC/DC converters (e.g., DC/DC 2) and adjacent stacks (e.g., Stack 1 and 3). Moreover, each DC/DC converter is also connected to adjacent DC/DC converters, permitting current to flow between DC/DC converters in the stack-to-stack topology. The voltage across each stack is regulated by the DC/DC converter directly connected to the stack as well as adjacent DC/DC converters indirectly connected to the stack.



Figure 11.2: MIMO control system within a buck converter-based voltage stacked system utilizing a four layer stack-to-stack topology.

The voltage across each stack depends upon the duty cycle of each of the buck converters within the stack-to-stack topology, leading to a multi-input multi-output (MIMO) control system. Consider, as an example, a buck converter-based voltage stacked system utilizing a four layer stack-to-stack topology. A schematic of the MIMO control system is illustrated in Figure 11.2. $V_1$, $V_2$, $V_3$, and $V_4$ represent the voltage across each stack. $D_1$, $D_2$, and $D_3$ represent the duty cycle of each buck converter within the stack-to-stack topology. In addition, a system level buck converter is connected to the voltage stacked system, where $V_{in}$, $V_{out}$, and $D$ are, respectively, the input voltage, output voltage, and duty cycle. Assuming $V_{in}$ is constant, four inputs exist, $V_{out}$, $V_{in_1}$, $V_{in_2}$, and $V_{in_3}$, where

$$V_{out} = V_{in} \cdot D, \quad (11.1)$$

and

$$V_{in_i} = V_{in} \cdot D_i. \quad (11.2)$$

The four outputs are the stack voltages,  $V_1$ ,  $V_2$ ,  $V_3$ , and  $V_4$ , regulated by the MIMO system. Each output within the MIMO system is affected by the four inputs. For example,  $V_1$  is

$$V_1 = \frac{\alpha_1 \alpha_2 \alpha_3}{1 + \alpha_3 + \alpha_2 \alpha_3 + \alpha_1 \alpha_2 \alpha_3} D \cdot V_{in}, \quad (11.3)$$

where

$$\alpha_i = D_i / (1 - D_i). \quad (11.4)$$

A control mechanism is therefore required to determine the correct duty cycle of each buck converter, adapting to different load imbalance scenarios in a voltage stacked system [238].
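The dependence of the stack voltages on the duty cycles can be sketched numerically. The following is a minimal sketch, not the control mechanism of [238]. It assumes ideal, lossless converters in steady state, and it assumes that adjacent stack voltages are related by $V_i = \alpha_i V_{i+1}$. This relation is not stated in the text; it is chosen here because, together with $V_1 + V_2 + V_3 + V_4 = V_{out}$, it reproduces (11.3) for $V_1$. The function name `stack_voltages` and the numerical values are hypothetical.

```python
def stack_voltages(v_in, d_sys, duties):
    """Ideal steady state stack voltages [V_1, ..., V_n].

    v_in   : input voltage of the system level buck converter
    d_sys  : duty cycle D of the system level buck converter
    duties : duty cycles [D_1, ..., D_(n-1)] of the stack-to-stack converters

    Assumption (not from the source): V_i = alpha_i * V_(i+1), which
    reproduces Eq. (11.3) for V_1 in the four layer case.
    """
    if not 0.0 < d_sys <= 1.0:
        raise ValueError("system duty cycle must lie in (0, 1]")
    if any(not 0.0 < d < 1.0 for d in duties):
        raise ValueError("each converter duty cycle must lie in (0, 1)")

    v_out = v_in * d_sys                       # Eq. (11.1)
    alphas = [d / (1.0 - d) for d in duties]   # Eq. (11.4)

    # Weight of each stack relative to the last stack V_n:
    # w_n = 1 and w_i = alpha_i * w_(i+1).
    weights = [1.0]
    for alpha in reversed(alphas):
        weights.append(alpha * weights[-1])
    weights.reverse()

    total = sum(weights)  # 1 + a3 + a2*a3 + a1*a2*a3 for four layers
    return [v_out * w / total for w in weights]


if __name__ == "__main__":
    # Equal duty cycles of 0.5 give alpha_i = 1, so the stacks share V_out equally.
    print(stack_voltages(v_in=8.0, d_sys=0.5, duties=[0.5, 0.5, 0.5]))
    # Changing a single duty cycle shifts every stack voltage (MIMO coupling).
    print(stack_voltages(v_in=8.0, d_sys=0.5, duties=[0.6, 0.5, 0.5]))
```

Under these assumptions, changing one duty cycle changes all four stack voltages, which is why the duty cycles cannot be tuned independently.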

This MIMO control mechanism does not exist in traditional buck converters, requiring research to achieve a balanced load within a voltage stacked system utilizing a stack-to-stack topology. The response time of this control mechanism is an essential issue characterizing the MIMO control mechanism. In addition, the power efficiency of the control system should be carefully considered. The classic tradeoff between speed and power efficiency becomes more challenging in a stack-to-stack topology within a voltage stacked system due to the complex power domain and current flow paths. A comprehensive comparison between the stack-to-bus and stack-to-stack topology within voltage stacked systems in terms of speed, power efficiency, parasitic impedances, and physical design complexity requires further study.

## **11.2 Combination of voltage stacking within 3-D ICs**

The successful integration of 3-D memory with high performance CPU/GPU has recently been achieved [75–78]. High current is however a bottleneck that prevents the application of a 3-D platform to HPC systems, as mentioned in Chapter 2. In addition to the correlation between the significant current demand of 3-D systems and thermal challenges, 3-D ICs exhibit a significant shortage of power/ground pinouts and on-chip metal resources due to the smaller footprint. Circuit and architectural level methods that reduce on-chip current demand while maintaining high performance are required to support a 3-D platform.

Figure 11.3: 3-D platform with voltage stacking technology. (a) A four layer voltage stacked system, where each layer is stacked above the other layer to form a 3-D system; (b) a four layer voltage stacked system placed above a layer of trench capacitors, forming a 3-D system; and (c) combination of (a) and (b), a 3-D system with the trench capacitors and voltage stacked system integrated into a single 3-D structure.

Figure 11.4: Current path within a power distribution network of a 3-D IC consisting of a vertical current path through the P/G TSVs and a horizontal current path within the conventional power grid within each 2-D IC.

Voltage stacking is an effective technology to reduce current demand, mitigating high current and thermal issues in 3-D ICs. Voltage stacking provides a scalable solution for delivering power in 3-D ICs, where the current density of each layer is maintained constant regardless of the number of layers within a 3-D IC. Voltage stacking within a 3-D IC is illustrated in Figure 11.3(a). A 3-D IC is a natural platform for a voltage stacked system, where each physical layer is assigned a voltage domain. Since the current is intended to be the same across all of the layers within a 3-D IC, the number and size of the P/G TSVs are likely to be similar across all of the layers, easing the fabrication effort.
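The scaling argument can be illustrated with a short arithmetic sketch. It assumes perfectly balanced layers and ideal, lossless power delivery. The function name and the numerical values (four layers, 50 W per layer, 1 V per domain) are hypothetical and are not results from the dissertation.

```python
def supply_current(power_per_layer_w, vdd, n_layers, stacked):
    """Current drawn from the supply by an n layer 3-D system.

    Conventional delivery: all layers are supplied in parallel at vdd,
    so the layer currents add.
    Voltage stacking: the layers are connected in series across
    n_layers * vdd, so the same current is reused by every layer
    (assumption: perfectly balanced loads).
    """
    layer_current = power_per_layer_w / vdd
    if stacked:
        return layer_current
    return n_layers * layer_current


if __name__ == "__main__":
    n, p, v = 4, 50.0, 1.0  # hypothetical example values
    print("parallel supply current [A]:", supply_current(p, v, n, stacked=False))
    print("stacked supply current  [A]:", supply_current(p, v, n, stacked=True))
    print("stacked supply voltage  [V]:", n * v)
```

Under these assumptions, the current through the P/G TSVs of a voltage stacked system does not grow with the number of layers, whereas the parallel supply current grows linearly.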

As discussed in Chapter 8, two current paths exist within a voltage stacked system, which includes a local power distribution network within each layer and a dedicated power delivery network passing current from layer to layer. This dedicated power delivery network forms horizontal current paths, which do not exist in conventional non-voltage stacked systems. Modifications of initial power delivery networks are therefore required to support the horizontal current paths in 2-D voltage stacked systems. In a 3-D IC, the built-in vertical current path through the P/G TSVs (see Figure 11.4) can support the flow of current from layer to layer within a voltage stacked system. Little modification is therefore required to support voltage stacking in 3-D ICs. Unlike traditional 3-D ICs, the P/G TSVs are no longer connected to the P/G TSVs in the adjacent layers. Instead, the P/G TSVs only connect the 2-D power network to the current layer and the layer below. Due to these special characteristics of 3-D voltage stacked systems, research objectives include the development of a circuit model of the power network within a 3-D voltage stacked system and the exploration and development of novel topologies and design guidelines for P/G TSVs that support 3-D voltage stacked systems.

Due to load imbalances in voltage stacking, voltage regulation between layers is required. On-chip push-pull voltage regulators have recently been proposed to mitigate load imbalances [237–239, 242]. A switched capacitor converter with a trench capacitor technology has been proposed, leading to a high current density, $2.3 \text{ A/mm}^2$, two orders of magnitude larger than in conventional on-chip switched capacitor converters [242]. Trench capacitor technology however remains a challenge to integrate on-chip. A heterogeneous 3-D system can be an effective platform for on-chip trench capacitors, as illustrated in Figure 11.3(b). A dedicated layer for the trench capacitor is placed beneath the voltage stacked IC, supporting high current density on-chip regulation. By combining the structures illustrated in Figures 11.3(a) and 11.3(b), a 3-D system with an interdigitated voltage stacked layer and trench capacitor layer can be developed, as illustrated in Figure 11.3(c). A research objective is therefore to evaluate the feasibility of developing heterogeneous 3-D voltage stacked systems. Performance improvements due to 3-D voltage stacked systems as compared to 2-D voltage stacked systems in terms of mitigating load imbalances should also be evaluated.

# Bibliography

- [1] Sten Hellström, *ESD - The Scourge of Electronics*, Springer, 1998.
- [2] Aristotle, *De Anima (On the Soul)*, 350 BC.
- [3] W. Gilbert and A. Dowling, *De Magnete*, 1600.
- [4] T. Freeth *et al.*, “Decoding the Ancient Greek Astronomical Calculator Known as the Antikythera Mechanism,” *Nature*, Vol. 444, pp. 587–591, November 2006.
- [5] C. Care, *Technology for Modelling: Electrical Analogies, Engineering Practice, and the Development of Analogue Computing*, Springer, 2010.
- [6] L. Owens, J. M. Nyce, and P. Kahn (Eds), *From Memex to Hypertext*, Academic Press Professional, Inc., 1991.
- [7] J. E. Tomayko, “Helmut Hoelzer’s Fully Electronic Analog Computer,” *Annals of the History of Computing*, Vol. 7, No. 3, pp. 227–240, July 1985.
- [8] A. W. Burks, “Electronic Computing Circuits of the ENIAC,” *Proceedings of the IRE*, Vol. 35, No. 8, pp. 756–767, August 1947.
- [9] B. F. Ronalds, “Francis Ronalds (1788–1873): The First Electrical Engineer?,” *Proceedings of the IEEE*, Vol. 104, No. 7, pp. 1489–1498, July 2016.
- [10] W. D. Devine, “From Shafts to Wires: Historical Perspective on Electrification,” *The Journal of Economic History*, Vol. 43, No. 2, pp. 347–372, June 1983.
- [11] J. Bardeen and W. H. Brattain, “The Transistor, a Semi-Conductor Triode,” *Physical Review*, Vol. 74, pp. 230–231, July 1948.

- [12] F. Faggin, “The Birth of the Microprocessor,” *Byte*, pp. 145–150, March 1992.
- [13] I. P. Vaisband, R. Jakushokas, M. Popovich, A. V. Mezhiba, S. Kose, and E. G. Friedman, *On-Chip Power Delivery and Management, 4th Edition*, Springer International Publishing, 2016.
- [14] G. Taylor, “Energy Efficient Circuit Design and the Future of Power Delivery,” *Keynote at the IEEE Conference on Electrical Performance of Electronic Packaging and Systems*, October 2009.
- [15] J. M. Rabaey and M. Pedram, *Low Power Design Methodologies*, Springer Publishers, 1996.
- [16] Forbes, “<https://www.forbes.com/>,” 2017.
- [17] B. Vaisband and E. G. Friedman, “Heterogeneous 3-D ICs as a Platform for Hybrid Energy Harvesting in IoT Systems,” *Future Generation Computer Systems*, Vol. 87, pp. 152–158, October 2018.
- [18] IRDS Roadmap Teams, *International Roadmap for Devices and Systems*, 2017.
- [19] V. F. Pavlidis, I. Savidis, and E. G. Friedman, *Three-Dimensional Integrated Circuit Design, 2nd Edition*, Morgan Kaufmann Publishing, 2017.
- [20] H. Mujtaba, “<https://wccftech.com/>,” 2017.
- [21] AMD, “<https://www.amd.com/en/products/cpu/amd-ryzen-threadripper-2990wx>,” 2018.
- [22] E. A. Burton *et al.*, “FIVR: Fully Integrated Voltage Regulators on 4th Generation Intel Core SoCs,” *Proceedings of the IEEE Applied Power Electronics Conference and Exposition*, pp. 432–439, March 2014.
- [23] E. J. Fluhr *et al.*, “5.1 POWER8: a 12-Core Server-Class Processor in 22nm SOI with 7.6Tb/s Off-Chip Bandwidth,” *Proceedings of the IEEE International Solid-State Circuits Conference*, pp. 96–97, February 2014.
- [24] A. V. Mezhiba and E. G. Friedman, “Scaling Trends of On-Chip Power Distribution Noise,” *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 12, No. 4, pp. 386–394, April 2004.

- [25] K. Xu and E. G. Friedman, “Scaling Trends of Power Noise in 3-D ICs,” *Integration, the VLSI Journal*, Vol. 51, pp. 139–148, September 2015.
- [26] R. S. Patti, “Three-Dimensional Integrated Circuits and the Future of System-on-Chip Designs,” *Proceedings of the IEEE*, Vol. 94, No. 6, pp. 1214–1224, June 2006.
- [27] G. Katti, M. Stucchi, K. De Meyer, and W. Dehaene, “Electrical Modeling and Characterization of Through Silicon via for Three-Dimensional ICs,” *IEEE Transactions on Electron Devices*, Vol. 57, No. 1, pp. 256–262, January 2010.
- [28] A. W. Topol *et al.*, “Three-Dimensional Integrated Circuits,” *IBM Journal of Research and Development*, Vol. 50, No. 4.5, pp. 491–506, July 2006.
- [29] K. Xu, B. Vaisband, G. Sizikov, X. Li, and E. G. Friedman, “EMI Suppression With Distributed LLC Resonant Converter for High-Voltage VR-on-Package,” *IEEE Transactions on Components, Packaging and Manufacturing Technology*, Vol. 10, No. 2, pp. 263–271, 2020.
- [30] K. T. Tang and E. G. Friedman, “Estimation of Transient Voltage Fluctuations in The CMOS-Based Power Distribution Networks,” *Proceedings of the IEEE International Symposium on Circuits and Systems*, Vol. 5, pp. 463–466, May 2001.
- [31] K. T. Tang and E. G. Friedman, “On-Chip Delta I Noise in The Power Distribution Networks of High Speed CMOS Integrated Circuits,” *Proceedings of the IEEE International ASIC/SOC Conference*, pp. 53–57, September 2000.
- [32] K. T. Tang and E. G. Friedman, “Simultaneous Switching Noise in On-Chip CMOS Power Distribution Networks,” *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 10, No. 4, pp. 487–493, August 2002.
- [33] K. Xu, R. Patel, P. Raghavan, and E. G. Friedman, “Exploratory Design of On-Chip Power Delivery for 14, 10, and 7 nm and Beyond FinFET ICs,” *Integration, the VLSI Journal*, Vol. 61, pp. 11–19, March 2018.

- [34] M. Popovich, E. G. Friedman, R. Secareanu, and O. L. Hartin, “Efficient Distributed On-Chip Decoupling Capacitors for Nanoscale ICs,” *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 16, No. 12, pp. 1717–1721, December 2008.
- [35] ITRS Technology Working Groups, “International Technology Roadmap for Semiconductors,” 2014.
- [36] A. V. Mezhiba and E. G. Friedman, “Impedance Characteristics of Power Distribution Grids in Nanoscale Integrated Circuits,” *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 12, No. 11, pp. 1148–1155, November 2004.
- [37] K. T. Tang and E. G. Friedman, “Delay Uncertainty Due to On-Chip Simultaneous Switching Noise in High Performance CMOS Integrated Circuits,” *Proceedings of the IEEE Workshop on Signal Processing Systems*, pp. 633–642, October 2000.
- [38] M. Saint-Laurent and M. Swaminathan, “Impact of Power-Supply Noise on Timing in High Frequency Microprocessors,” *IEEE Transactions on Advanced Packaging*, Vol. 27, No. 1, pp. 135–144, February 2004.
- [39] L. Smith, “Reliability and Performance Tradeoffs in the Design of On-Chip Power Delivery and Interconnects,” *Proceedings of the IEEE Topical Meeting on Electrical Performance of Electronic Packaging*, pp. 49–52, November 1999.
- [40] A. W. Strong *et al.*, *Reliability Wearout Mechanisms in Advanced CMOS Technologies*, Wiley Hoboken Publishing, 2006.
- [41] Chipworks, “Intel 14 nm Generation Tri-Gate Core M-5Y10 Broadwell Processor Technical Analysis Reports,” 2014. [Online]. Available: <http://www.chipworks.com>.
- [42] A. Chandrakasan, W. J. Bowhill, and F. Fox (Eds.), *Design of High-Performance Microprocessor Circuits*, IEEE Press, 2000.

- [43] R. Jakushokas and E. G. Friedman, “Inductance Model of Interdigitated Power and Ground Distribution Networks,” *IEEE Transactions on Circuits and Systems II: Express Briefs*, Vol. 56, No. 7, pp. 585–589, July 2009.
- [44] G. Venezian, “On the Resistance Between Two Points on a Grid,” *American Journal of Physics*, Vol. 62, No. 11, pp. 1000–1004, November 1994.
- [45] S. Kose and E. G. Friedman, “Effective Resistance of a Two Layer Mesh,” *IEEE Transactions on Circuits and Systems II: Express Briefs*, Vol. 58, No. 11, pp. 739–743, November 2011.
- [46] A. V. Mezhiba and E. G. Friedman, “Inductive Properties of High-Performance Power Distribution Grids,” *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 10, No. 6, pp. 762–776, December 2002.
- [47] K. Vaidyanathan *et al.*, “Design and Manufacturability Tradeoffs in Unidirectional and Bidirectional Standard Cell Layouts in 14 nm Node,” *Proceedings of the Society of Photo-Optical Instrumentation Engineers*, Vol. 8327, March 2012.
- [48] S. Lin and N. Chang, “Challenges in Power-Ground Integrity,” *Proceedings of the IEEE/ACM International Conference on Computer-Aided Design*, pp. 651–654, November 2001.
- [49] R. Evans and M. Tsuk, “Modeling and Measurement of a High-Performance Computer Power Distribution System,” *IEEE Transactions on Components, Packaging, and Manufacturing Technology*, Vol. 17, No. 4, pp. 467–471, November 1994.
- [50] H. H. Chen and J. S. Neely, “Interconnect and Circuit Modeling Techniques for Full-Chip Power Supply Noise Analysis,” *IEEE Transactions on Components, Packaging, and Manufacturing Technology*, Vol. 21, No. 3, pp. 209–215, August 1998.
- [51] R. Berridge *et al.*, “IBM POWER6 Microprocessor Physical Design and Design Methodology,” *IBM Journal of Research and Development*, Vol. 51, No. 6, pp. 685–714, November 2007.

- [52] Y. Zhong and M. D. F. Wong, “Fast Algorithm for IR Drop Analysis in Large Power Distribution Grid,” *Proceedings of the IEEE/ACM International Conference on Computer-Aided Design*, pp. 351–357, November 2005.
- [53] Z. Li, R. Balasubramanian, F. Liu, and S. Nassif, “2011 TAU Power Grid Simulation Contest: Benchmark Suite and Results,” *Proceedings of the IEEE/ACM International Conference on Computer-Aided Design*, pp. 478–481, November 2011.
- [54] Z. Li, R. Balasubramanian, F. Liu, and S. Nassif, “2012 TAU Power Grid Simulation Contest: Benchmark Suite and Results,” *Proceedings of the IEEE/ACM International Conference on Computer-Aided Design*, pp. 643–646, November 2012.
- [55] S. Kose and E. G. Friedman, “Efficient Algorithms for Fast IR Drop Analysis Exploiting Locality,” *Integration, the VLSI Journal*, Vol. 45, No. 2, pp. 149–161, March 2012.
- [56] J. Yang, Z. Li, Y. Cai, and Q. Zhou, “PowerRush: a Linear Simulator for Power Grid,” *Proceedings of the IEEE/ACM International Conference on Computer-Aided Design*, pp. 482–487, November 2011.
- [57] T. Chen and C. Chen, “Efficient Large-Scale Power Grid Analysis Based on Preconditioned Krylov-Subspace Iterative Methods,” *Proceedings of the IEEE/ACM Design Automation Conference*, pp. 559–562, June 2001.
- [58] M. Zhao *et al.*, “Hierarchical Analysis of Power Distribution Networks,” *Proceedings of the IEEE/ACM Design Automation Conference*, pp. 150–155, June 2000.
- [59] H. Qian, S. R. Nassif, and S. S. Sapatnekar, “Power Grid Analysis Using Random Walks,” *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. 24, No. 8, pp. 1204–1224, August 2005.
- [60] J. N. Kozhaya, S. R. Nassif, and F. N. Najm, “A Multigrid-Like Technique for Power Grid Analysis,” *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. 21, No. 10, pp. 1148–1160, October 2002.

- [61] J. Yang, Z. Li, Y. Cai, and Q. Zhou, “PowerRush: Efficient Transient Simulation for Power Grid Analysis,” *Proceedings of the IEEE/ACM International Conference on Computer-Aided Design*, pp. 653–659, November 2012.
- [62] T. Yu and M. D. F. Wong, “PGT-SOLVER: An Efficient Solver for Power Grid Transient Analysis,” *Proceedings of the IEEE/ACM International Conference on Computer-Aided Design*, pp. 647–652, November 2012.
- [63] X. Xiong and J. Wang, “Parallel Forward and Back Substitution for Efficient Power Grid Simulation,” *Proceedings of the IEEE/ACM International Conference on Computer-Aided Design*, pp. 660–663, November 2012.
- [64] M. Zhao, R. V. Panda, S. S. Sapatnekar, and D. Blaauw, “Hierarchical Analysis of Power Distribution Networks,” *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. 21, No. 2, pp. 159–168, February 2002.
- [65] K. Sun, Q. Zhou, K. Mohanram, and D. C. Sorensen, “Parallel Domain Decomposition for Simulation of Large-Scale Power Grids,” *Proceedings of the IEEE/ACM International Conference on Computer-Aided Design*, pp. 54–59, November 2007.
- [66] S. R. Nassif and J. N. Kozhaya, “Fast Power Grid Simulation,” *Proceedings of the IEEE/ACM Design Automation Conference*, pp. 156–161, June 2000.
- [67] H. Su, E. Acar, and S. R. Nassif, “Power Grid Reduction Based on Algebraic Multigrid Principles,” *Proceedings of the IEEE/ACM Design Automation Conference*, pp. 109–112, June 2003.
- [68] K. Shakeri and J. D. Meindl, “Compact Physical IR-Drop Models for Chip/Package Co-Design of Gigascale Integration (GSI),” *IEEE Transactions on Electron Devices*, Vol. 52, No. 6, pp. 1087–1096, June 2005.
- [69] E. Salman and E. G. Friedman, *High Performance Integrated Circuit Design*, McGraw-Hill Publishers, 2012.
- [70] K. Banerjee, S. J. Souri, P. Kapur, and K. C. Saraswat, “3-D ICs: a Novel Chip Design for Improving Deep-Submicrometer Interconnect Performance and Systems-on-Chip Integration,” *Proceedings of the IEEE*, Vol. 89, No. 5, pp. 602–633, May 2001.
- [71] J. Lu, “3-D Hyperintegration and Packaging Technologies for Micro-Nano Systems,” *Proceedings of the IEEE*, Vol. 97, No. 1, pp. 18–30, January 2009.
- [72] J. U. Knickerbocker *et al.*, “Development of Next-Generation System-on-Package (SoP) Technology Based on Silicon Carriers with Fine-Pitch Chip Interconnection,” *IBM Journal of Research and Development*, Vol. 49, No. 4/5, pp. 725–753, July 2005.
- [73] J. H. Lau *et al.*, “Low-Cost TSH (Through-Silicon Hole) Interposers for 3D IC Integration,” *Proceedings of the IEEE Electronic Components and Technology Conference*, pp. 290–296, May 2014.
- [74] J. H. Lau, “Evolution, Challenge, and Outlook of TSV, 3D IC Integration and 3D Silicon Integration,” *Proceedings of the IEEE International Symposium on Advanced Packaging Materials*, pp. 462–488, October 2011.
- [75] U. Kang *et al.*, “8 Gb 3-D DDR3 DRAM Using Through-Silicon-Via Technology,” *IEEE Journal of Solid-State Circuits*, Vol. 45, No. 1, pp. 111–119, January 2010.
- [76] J. Kim *et al.*, “A 1.2 V 12.8 GB/s 2 Gb Mobile Wide-I/O DRAM With $4 \times 128$ I/Os Using TSV Based Stacking,” *IEEE Journal of Solid-State Circuits*, Vol. 47, No. 1, pp. 107–116, January 2012.
- [77] J. Jeddeloh and B. Keeth, “Hybrid Memory Cube New DRAM Architecture Increases Density and Performance,” *Proceedings of the IEEE Symposium on VLSI Technology*, pp. 87–88, June 2012.
- [78] D. U. Lee *et al.*, “A 1.2V 8Gb 8-channel 128GB/s High-Bandwidth Memory (HBM) Stacked DRAM with Effective Microbump I/O Test Methods Using 29nm Process and TSV,” *Proceedings of the IEEE International Solid-State Circuits Conference*, pp. 432–433, February 2014.
- [79] J. W. Poulton *et al.*, “A 0.54 pJ/b 20 Gb/s Ground-Referenced Single-Ended Short-Reach Serial Link in 28 nm CMOS for Advanced Packaging Applications,” *IEEE Journal of Solid-State Circuits*, Vol. 48, No. 12, pp. 3206–3218, December 2013.

- [80] P. Dorsey, “Xilinx Stacked Silicon Interconnect Technology Delivers Breakthrough FPGA Capacity, Bandwidth, and Power Efficiency,” *Xilinx White Paper: Virtex-7 FPGAs*, 2010.
- [81] A. Sodani, “Knights Landing (KNL): 2nd Generation Intel Xeon Phi Processor,” *Proceedings of the HotChips*, pp. 1–24, August 2015.
- [82] T. Paul, “<https://www.nextplatform.com/2018/05/10/tearing-apart-googles-tpu-3-0-ai-coprocessor/>,” 2018.
- [83] A. Shayan *et al.*, “3D Power Distribution Network Co-Design for Nanoscale Stacked Silicon ICs,” *Proceedings of the IEEE Symposium on Electrical Performance of Electronic Packaging*, pp. 11–14, October 2008.
- [84] P. Zhou, K. Sridharan, and S. S. Sapatnekar, “Optimizing Decoupling Capacitors in 3D Circuits for Power Grid Integrity,” *IEEE Design Test of Computers*, Vol. 26, No. 5, pp. 15–25, September 2009.
- [85] K. Kim, J. S. Pak, H. Lee, and J. Kim, “Effects of On-Chip Decoupling Capacitors and Silicon Substrate On Power Distribution Networks in TSV-based 3D-ICs,” *Proceedings of the IEEE Electronic Components and Technology Conference*, pp. 690–697, May 2012.
- [86] N. H. Khan, S. Reda, and S. Hassoun, “Early Estimation of TSV Area for Power Delivery in 3-D Integrated Circuits,” *Proceedings of the IEEE International 3D Systems Integration Conference*, pp. 1–6, November 2010.
- [87] Q. Wu and T. Zhang, “Design Techniques to Facilitate Processor Power Delivery in 3-D Processor-DRAM Integrated Systems,” *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 19, No. 9, pp. 1655–1666, September 2011.
- [88] S. M. Satheesh and E. Salman, “Power Distribution in TSV-Based 3-D Processor-Memory Stacks,” *IEEE Journal on Emerging and Selected Topics in Circuits and Systems*, Vol. 2, No. 4, pp. 692–703, December 2012.

- [89] V. F. Pavlidis and G. D. Micheli, “Power Distribution Paths in 3-D ICs,” *Proceedings of the ACM Great Lakes Symposium on VLSI*, pp. 263–268, May 2009.
- [90] S. Ge and E. G. Friedman, “Data Bus Swizzling in TSV-Based Three-Dimensional Integrated Circuits,” *Microelectronics Journal*, Vol. 44, No. 8, pp. 696–705, August 2013.
- [91] N. H. Khan, S. M. Alam, and S. Hassoun, “Power Delivery Design for 3-D ICs Using Different Through-Silicon Via (TSV) Technologies,” *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 19, No. 4, pp. 647–658, April 2011.
- [92] J. S. Pak *et al.*, “PDN Impedance Modeling and Analysis of 3D TSV IC by Using Proposed P/G TSV Array Model Based on Separated P/G TSV and Chip-PDN Models,” *IEEE Transactions on Components, Packaging and Manufacturing Technology*, Vol. 1, No. 2, pp. 208–219, February 2011.
- [93] B. Vaisband and E. G. Friedman, “Hexagonal TSV Bundle Topology for 3-D ICs,” *IEEE Transactions on Circuits and Systems II: Express Briefs*, Vol. 64, No. 1, pp. 11–15, January 2017.
- [94] I. Savidis and E. G. Friedman, “Closed-Form Expressions of 3-D Via Resistance, Inductance, and Capacitance,” *IEEE Transactions on Electron Devices*, Vol. 56, No. 9, pp. 1873–1881, September 2009.
- [95] Y. Liang and Y. Li, “Closed-Form Expressions for the Resistance and the Inductance of Different Profiles of Through-Silicon Vias,” *IEEE Electron Device Letters*, Vol. 32, No. 3, pp. 393–395, March 2011.
- [96] A. Todri *et al.*, “A Study of Tapered 3-D TSVs for Power and Thermal Integrity,” *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 21, No. 2, pp. 306–319, February 2013.
- [97] J. Su, F. Wang, and W. Zhang, “Capacitance Expressions and Electrical Characterization of Tapered Through-Silicon Vias for 3-D ICs,” *IEEE Transactions on Components, Packaging and Manufacturing Technology*, Vol. 5, No. 10, pp. 1488–1496, October 2015.

- [98] N. Ranganathan *et al.*, “The Development of a Tapered Silicon Micro-Micromachining Process for 3D Microsystems Packaging,” *Journal of Micromechanics and Microengineering*, Vol. 18, No. 11, October 2008.
- [99] R. Weerasekera *et al.*, “Compact Modelling of Through-Silicon Vias (TSVs) in Three-Dimensional (3-D) Integrated Circuits,” *Proceedings of the IEEE International Conference on 3D System Integration*, pp. 1–8, September 2009.
- [100] GitHub, “<https://github.com/albanie/convnet-burden>,” 2018.
- [101] Radeon Technologies Group (AMD), “<https://www.amd.com/en>,” 2017.
- [102] AMD, “<https://www.amd.com/en/processors/server-white-papers>,” 2018.
- [103] AMD, “<https://www.amd.com/en/products/cpu/amd-ryzen-threadripper-1950x>,” 2018.
- [104] AMD, “<https://www.amd.com/en/ryzen-7>,” 2018.
- [105] Intel ARK (Product Specs), “<https://ark.intel.com/products/95830/Intel-Xeon-Phi-Processor-7295-16GB-1.50-GHz-72-core>,” 2018.
- [106] Intel ARK (Product Specs), “<https://ark.intel.com/products/95828/Intel-Xeon-Phi-Processor-7235-16GB-1.30-GHz-64-core>,” 2018.
- [107] Intel ARK (Product Specs), “<https://ark.intel.com/products/series/186673/9th-Generation-Intel-Core-i9-Processors>,” 2018.
- [108] Intel ARK (Product Specs), “<https://ark.intel.com/products/series/122593/8th-Generation-Intel-Core-i7-Processors>,” 2018.
- [109] Intel ARK (Product Specs), “<https://ark.intel.com/products/series/134902/9th-Generation-Intel-Core-i5-Processors>,” 2018.
- [110] Nvidia, “<https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf>,” 2018.
- [111] R. Kalla, B. Sinharoy, W. J. Starke, and M. Floyd, “Power7: IBM’s Next-Generation Server Processor,” *IEEE Micro*, Vol. 30, No. 2, pp. 7–15, March 2010.

- [112] B. Sinharoy *et al.*, “IBM POWER8 Processor Core Microarchitecture,” *IBM Journal of Research and Development*, Vol. 59, No. 1, pp. 2:1–2:21, January 2015.
- [113] S. K. Sadasivam, B. W. Thompto, R. Kalla, and W. J. Starke, “IBM Power9 Processor Architecture,” *IEEE Micro*, Vol. 37, No. 2, pp. 40–51, March 2017.
- [114] AMD, “<https://www.amd.com/en/products/graphics/radeon-rx-580>,” 2018.
- [115] AMD, “<https://www.amd.com/en-us/products/graphics/desktop/r9>,” 2018.
- [116] S. Jain *et al.*, “A 280mV-to-1.2V Wide-Operating-Range IA-32 Processor in 32nm CMOS,” *Proceedings of the IEEE International Solid-State Circuits Conference*, pp. 66–68, February 2012.
- [117] R. G. Dreslinski *et al.*, “Near-Threshold Computing: Reclaiming Moore’s Law Through Energy Efficient Integrated Circuits,” *Proceedings of the IEEE*, Vol. 98, No. 2, pp. 253–266, February 2010.
- [118] K. Mistry *et al.*, “A 45nm Logic Technology with High-k+Metal Gate Transistors, Strained Silicon, 9 Cu Interconnect Layers, 193nm Dry Patterning, and 100% Pb-free Packaging,” *Proceedings of the IEEE International Electron Devices Meeting*, pp. 247–250, December 2007.
- [119] D. Ha *et al.*, “Molybdenum Gate HfO<sub>2</sub> CMOS FinFET Technology,” *Proceedings of the IEEE International Electron Devices Meeting*, pp. 643–646, December 2004.
- [120] D. Flynn, R. Aitken, A. Gibbons, and K. Shi, *Low Power Methodology Manual For System-on-Chip Design*, Springer Publishers, 2007.
- [121] D. Shin *et al.*, “Quantified Design Guides for the Reduction of Radiated Emissions in Package-Level Power Distribution Networks,” *IEEE Transactions on Electromagnetic Compatibility*, Vol. 59, No. 2, pp. 468–480, April 2017.
- [122] J. S. Youn *et al.*, “Chip and Package-Level Wideband EMI Analysis for Mobile DRAM Devices,” *Proceedings of the DesignCon*, pp. 1–14, January 2016.

- [123] N. Kim *et al.*, “Package-Level Electromagnetic Interference Analysis,” *Proceedings of the IEEE Electronic Components and Technology Conference*, pp. 2119–2123, May 2014.
- [124] C. H. Ko, H. L. Lee, and C. H. Wang, “The EMI Suppression of Ultra Thin MEMS Microphone Package,” *Proceedings of the IEEE International Microsystems Packaging Assembly and Circuits Technology Conference*, pp. 1–3, October 2010.
- [125] S. Kose, E. G. Friedman, R. M. Secareanu, and O. Hartin, “Current Profile of a Microcontroller to Determine Electromagnetic Emissions,” *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 2650–2653, May 2013.
- [126] J. R. Black, “Electromigration—A Brief Survey and Some Recent Results,” *IEEE Transactions on Electron Devices*, Vol. 16, No. 4, pp. 338–347, April 1969.
- [127] J. Lienig, “Electromigration and its Impact on Physical Design in Future Technologies,” *Proceedings of the ACM International Symposium on Physical Design*, pp. 33–40, March 2013.
- [128] C.-K. Hu *et al.*, “Reduced Cu Interface Diffusion by CoWP Surface Coating,” *Microelectronic Engineering*, Vol. 70, No. 2, pp. 406–411, November 2003.
- [129] A. Syed *et al.*, “Electromigration Reliability and Current Carrying Capacity of Various WLCSP Interconnect Structures,” *Proceedings of the IEEE Electronic Components and Technology Conference*, pp. 714–724, May 2013.
- [130] S. A. Khan *et al.*, “High Current-Carrying and Highly-Reliable 30 μm Diameter Cu-Cu Area-Array Interconnections Without Solder,” *Proceedings of the IEEE Electronic Components and Technology Conference*, pp. 577–582, May 2012.
- [131] C. Subramaniam *et al.*, “One Hundred Fold Increase in Current Carrying Capacity in a Carbon Nanotube Copper Composite,” *Nature Communications*, Vol. 4, No. 1, pp. 2202, July 2013.

- [132] A. Arunkumar *et al.*, “MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability,” *Proceedings of the Annual International Symposium on Computer Architecture*, pp. 320–332, June 2017.
- [133] A. Zou *et al.*, “Efficient and Reliable Power Delivery in Voltage-Stacked Many-Core System with Hybrid Charge-Recycling Regulators,” *Proceedings of the IEEE/ACM Design Automation Conference*, pp. 43:1–43:6, June 2018.
- [134] Q. Zhang, L. Lai, M. Gottscho, and P. Gupta, “Multi-Story Power Distribution Networks for GPUs,” *Proceedings of the Design, Automation and Test in Europe Conference*, pp. 451–456, March 2016.
- [135] E. K. Ardestani *et al.*, “Managing Mismatches in Voltage Stacking with Core-Unfolding,” *ACM Transactions on Architecture and Code Optimization*, Vol. 12, No. 4, pp. 43:1–43:26, November 2015.
- [136] T. Tong *et al.*, “A Fully Integrated Reconfigurable Switched-Capacitor DC-DC Converter With Four Stacked Output Channels for Voltage Stacking Applications,” *IEEE Journal of Solid-State Circuits*, Vol. 51, No. 9, pp. 2142–2152, September 2016.
- [137] 3M Novec, “<https://www.3m.com/3M/en-US/novec-us/>,” 2014.
- [138] 3M Novec, “<https://multimedia.3m.com/mws/media/1127920O/2-phase-immersion-cooling-a-revolution-in-data-center-efficiency.pdf>,” 2014.
- [139] W. Steinhögl *et al.*, “Comprehensive Study of the Resistivity of Copper Wires with Lateral Dimensions of 100 nm and Smaller,” *Journal of Applied Physics*, Vol. 97, No. 2, pp. 023706-1–023706-7, January 2005.
- [140] P. Pasanen *et al.*, “Graphene for Future Electronics,” *Physica Scripta*, Vol. 2012, No. T146, pp. 014025, January 2012.
- [141] J. S. Moon and D. K. Gaskill, “Graphene: Its Fundamentals to Future Applications,” *IEEE Transactions on Microwave Theory and Techniques*, Vol. 59, No. 10, pp. 2702–2708, October 2011.

- [142] K. Schuegraf *et al.*, “Semiconductor Logic Technology Innovation to Achieve Sub-10 nm Manufacturing,” *IEEE Journal of the Electron Devices Society*, Vol. 1, No. 3, pp. 66–75, March 2013.
- [143] J. Whitehouse and E. John, “Leakage and Delay Analysis in FinFET Array Multiplier Circuits,” *Proceedings of the IEEE International Midwest Symposium on Circuits and Systems*, pp. 909–912, August 2014.
- [144] S. Natarajan *et al.*, “A 14nm Logic Technology Featuring 2nd-Generation FinFET, Air-Gapped Interconnects, Self-Aligned Double Patterning and a $0.0588\ \mu\text{m}^2$ SRAM Cell Size,” *Proceedings of the IEEE International Electron Devices Meeting*, pp. 3.7.1–3.7.3, December 2014.
- [145] ITRS Technology Working Groups, “International Technology Roadmap for Semiconductors,” 2015.
- [146] A. C. Ferrari *et al.*, “Science and Technology Roadmap for Graphene, Related Two-Dimensional Crystals, and Hybrid Systems,” *Nanoscale*, Vol. 7, pp. 4598–4810, March 2015.
- [147] F. Lacy, “Developing a Theoretical Relationship between Electrical Resistivity, Temperature, and Film Thickness for Conductors,” *Nanoscale Research Letters*, Vol. 6, No. 1, pp. 1–14, December 2011.
- [148] R. Murali *et al.*, “Resistivity of Graphene Nanoribbon Interconnects,” *IEEE Electron Device Letters*, Vol. 30, No. 6, pp. 611–613, June 2009.
- [149] I. Khrapach *et al.*, “Novel Highly Conductive and Transparent Graphene-Based Conductors,” *Advanced Materials*, Vol. 24, No. 21, pp. 2844–2849, June 2012.
- [150] N. Srivastava, X. Qi, and K. Banerjee, “Impact of On-Chip Inductance On Power Distribution Network Design for Nanometer Scale Integrated Circuits,” *Proceedings of the IEEE International Symposium on Quality Electronic Design*, pp. 346–351, March 2005.
- [151] R. Jakushokas and E. G. Friedman, “Multi-Layer Interdigitated Power Distribution Networks,” *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 19, No. 5, pp. 774–786, May 2011.

- [152] I. S. Kourtev, B. Taskin, and E. G. Friedman, *Timing Optimization Through Clock Skew Scheduling*, Springer International Publishing, 2009.
- [153] M. Popovich, E. G. Friedman, R. Secareanu, and O. L. Hartin, “On-Chip Power Noise Reduction Techniques in High Performance SoC-Based Integrated Circuits,” *Proceedings of the IEEE International SOC Conference*, pp. 309–312, September 2005.
- [154] M. S. Gupta *et al.*, “Understanding Voltage Variations in Chip Multiprocessors using a Distributed Power-Delivery Network,” *Proceedings of the IEEE Design, Automation and Test in Europe*, pp. 1–6, April 2007.
- [155] E. Salman, E. G. Friedman, R. M. Secareanu, and O. L. Hartin, “Worst Case Power/Ground Noise Estimation Using an Equivalent Transition Time for Resonance,” *IEEE Transactions on Circuits and Systems I: Regular Papers*, Vol. 56, No. 5, pp. 997–1004, May 2009.
- [156] R. Patel, P. Raghavan, and E. G. Friedman, “Power Noise in 14, 10, and 7 nm FinFET CMOS Technologies,” *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 37–40, May 2016.
- [157] M. Popovich, M. Sotman, A. Kolodny, and E. G. Friedman, “Effective Radii of On-Chip Decoupling Capacitors,” *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 16, No. 7, pp. 894–907, July 2008.
- [158] R. Jakushokas, M. Popovich, A. V. Mezhiba, S. Kose, and E. G. Friedman, *Power Distribution Networks with On-Chip Decoupling Capacitors, 2nd Edition*, Springer International Publishing, 2011.
- [159] A. Todri, M. Marek-Sadowska, F. Maire, and C. Matheron, “A Study of Decoupling Capacitor Effectiveness in Power and Ground Grid Networks,” *Proceedings of the IEEE International Symposium on Quality Electronic Design*, pp. 653–658, March 2009.
- [160] X. Yang, B. K. Choi, and M. Sarrafzadeh, “Routability-Driven White Space Allocation for Fixed-Die Standard-Cell Placement,” *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. 22, No. 4, pp. 410–419, April 2003.

- [161] S. S. Y. Liu *et al.*, “Effective Power Network Prototyping via Statistical-Based Clustering and Sequential Linear Programming,” *Proceedings of the IEEE Conference on Design, Automation and Test in Europe*, pp. 1701–1706, March 2013.
- [162] C. C. Huang *et al.*, “Improving Power Delivery Network Design by Practical Methodologies,” *Proceedings of the IEEE International Conference on Computer Design*, pp. 237–242, October 2014.
- [163] L. T. Wang, Y. W. Chang, and K. T. Cheng, *Electronic Design Automation: Synthesis, Verification, and Test*, Morgan Kaufmann Publishing, 2009.
- [164] Z. Zeng, T. Xu, Z. Feng, and P. Li, “Fast Static Analysis of Power Grids: Algorithms and Implementations,” *Proceedings of the IEEE International Conference on Computer-Aided Design*, pp. 488–493, November 2011.
- [165] C. H. Lin *et al.*, “High Performance 14nm SOI FinFET CMOS Technology with $0.0174\ \mu\text{m}^2$ Embedded DRAM and 15 Levels of Cu Metallization,” *Proceedings of the IEEE International Electron Devices Meeting*, pp. 3.8.1–3.8.3, December 2014.
- [166] R. C. N. Pilawa-Podgurski *et al.*, “Very High Frequency Resonant Boost Converters,” *Proceedings of the IEEE Power Electronics Specialists Conference*, Vol. 24, pp. 2718–2724, June 2007.
- [167] R. W. Erickson and D. Maksimovic, *Fundamentals of Power Electronics*, Springer Science & Business Media, 2007.
- [168] P. Clarke, “Self-Commutated Thyristor DC-to-DC Converter,” *IEEE Transactions on Magnetics*, Vol. 6, No. 1, pp. 10–15, March 1970.
- [169] D. J. Perreault *et al.*, “Opportunities and Challenges in Very High Frequency Power Conversion,” *Proceedings of the IEEE Applied Power Electronics Conference and Exposition*, pp. 1–14, February 2009.
- [170] A. Isurin and A. Cook, “Cost Effective Resonant DC-DC Converter for Hi-Power and Wide Load Range Operation,” *Proceedings of the IEEE International Symposium on Industrial Electronics*, Vol. 2, pp. 1014–1018, July 2006.

- [171] K. I. Hwu, W. Z. Jiang, and Y. T. Yau, “Ultrahigh Step-Down Converter,” *IEEE Transactions on Power Electronics*, Vol. 30, No. 6, pp. 3262–3274, June 2015.
- [172] L. Brush, “Distributed Power Architecture Demand Characteristics,” *Proceedings of the IEEE Applied Power Electronics Conference and Exposition*, Vol. 1, pp. 342–345, February 2004.
- [173] H. Huang, “Coordination of Design Issues in the Intermediate Bus Architecture,” *Proceedings of the IEEE Applied Power Electronics Conference and Exposition*, Vol. 1, pp. 169–175, March 2005.
- [174] Y. S. Lee and G. T. Cheng, “Quasi-Resonant Zero-Current-Switching Bidirectional Converter for Battery Equalization Applications,” *IEEE Transactions on Power Electronics*, Vol. 21, No. 5, pp. 1213–1224, September 2006.
- [175] R. L. Steigerwald, “A Comparison of Half-Bridge Resonant Converter Topologies,” *Proceedings of the IEEE Applied Power Electronics*, pp. 135–144, March 1987.
- [176] F. Krismer and J. W. Kolar, “Accurate Power Loss Model Derivation of a High-Current Dual Active Bridge Converter for an Automotive Application,” *IEEE Transactions on Industrial Electronics*, Vol. 57, No. 3, pp. 881–891, March 2010.
- [177] K. Venkatachalam, C. R. Sullivan, T. Abdallah, and H. Tacca, “Accurate Prediction of Ferrite Core Loss with Nonsinusoidal Waveforms Using Only Steinmetz Parameters,” *Proceedings of the IEEE Computers in Power Electronics*, pp. 36–41, June 2002.
- [178] R. Yu *et al.*, “Computer-Aided Design and Optimization of High-Efficiency LLC Series Resonant Converter,” *IEEE Transactions on Power Electronics*, Vol. 27, No. 7, pp. 3243–3256, July 2012.
- [179] F. Xue, R. Yu, and A. Q. Huang, “A 98.3% Efficient GaN Isolated Bidirectional DC-DC Converter for DC Microgrid Energy Storage System Applications,” *IEEE Transactions on Industrial Electronics*, Vol. 64, No. 11, pp. 9094–9103, November 2017.

- [180] X. Huang *et al.*, “Conducted EMI Analysis and Filter Design for MHz Active Clamp Flyback Front-End Converter,” *Proceedings of the IEEE Applied Power Electronics Conference and Exposition*, pp. 1534–1540, March 2016.
- [181] H. I. Hsieh, J. S. Li, and D. Chen, “Effects of X Capacitors on EMI Filter Effectiveness,” *IEEE Transactions on Industrial Electronics*, Vol. 55, No. 2, pp. 949–955, February 2008.
- [182] Y. Xiong and Z. Yan, “EMI and PI Analysis of Analog Board,” *Proceedings of the IEEE International Symposium on Microwave, Antenna, Propagation and EMC Technologies for Wireless Communications*, pp. 171–175, October 2013.
- [183] S. Hayashi and M. Yamada, “EMI-Noise Analysis under ASIC Design Environment,” *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. 19, No. 11, pp. 1337–1346, November 2000.
- [184] T. Sudo, H. Sasaki, N. Masuda, and J. L. Drewniak, “Electromagnetic Interference (EMI) of System-On-Package (SOP),” *IEEE Transactions on Advanced Packaging*, Vol. 27, No. 2, pp. 304–314, May 2004.
- [185] Ansys SIwave, “<http://www.ansys.com/products/electronics/ansys-siwave>,” 2017.
- [186] N. Sturcken *et al.*, “A 2.5D Integrated Voltage Regulator using Coupled-Magnetic-Core Inductors on Silicon Interposer,” *IEEE Journal of Solid-State Circuits*, Vol. 48, No. 1, pp. 244–254, January 2013.
- [187] A. Fontanelli, “System-in-Package Technology: Opportunities and Challenges,” *Proceedings of the IEEE International Symposium on Quality Electronic Design*, pp. 589–593, March 2008.
- [188] Cadence OrCAD, “<https://www.ema-eda.com/products/cadence-orcad>,” 2018.
- [189] Cadence Allegro, “<https://www.cadence.com/content/cadence-www/global/en-US/home/tools/pcb-design-and-analysis/pcb-layout/allegro-pcb-designer.html>,” 2018.

- [190] Y. Shu, X. Wei, X. Yu, and C. Liu, “Effects of Grounded-Lid Apertures for Package-Level Electromagnetic Interference (EMI) Shielding,” *Proceedings of the IEEE International Symposium on Electromagnetic Compatibility Signal/Power Integrity*, pp. 345–348, August 2017.
- [191] A. Khoshniat and R. Abhari, “Suppression of Radiated Electromagnetic Emissions Using Absorbing Frequency Selective Surfaces,” *Proceedings of the IEEE Conference on Electrical Performance of Electronic Packaging and Systems*, pp. 1–3, October 2017.
- [192] Intel Pentium 4 Processor in the 423 pin package/Intel 850 Chipset Platform, Intel, February 2002.
- [193] K. Xu, B. Vaisband, G. Sizikov, X. Li, and E. G. Friedman, “Power Noise and Near-Field EMI of High-Current System-in-Package With VR Top and Bottom Placements,” *IEEE Transactions on Components, Packaging and Manufacturing Technology*, Vol. 9, No. 4, pp. 712–718, April 2019.
- [194] C. Palesko and A. Lujan, “Cost Comparison of Fan-out Wafer-Level Packaging to Fan-out Panel-Based Packaging,” *Proceedings of the International Symposium on Microelectronics*, Vol. 2016, No. 1, pp. 000180–000184, August 2016.
- [195] S. Werner, J. Navaridas, and M. Lujan, “A Survey on Optical Network-On-Chip Architectures,” *ACM Computing Surveys*, Vol. 50, No. 6, pp. 1–37, December 2017.
- [196] A. Shacham, K. Bergman, and L. P. Carloni, “On the Design of a Photonic Network-on-Chip,” *Proceedings of the IEEE Symposium on Networks-on-Chip*, pp. 53–64, May 2007.
- [197] J. Chan, G. Hendry, A. Biberman, and K. Bergman, “Architectural Exploration of Chip-Scale Photonic Interconnection Network Designs Using Physical-Layer Analysis,” *Journal of Lightwave Technology*, Vol. 28, No. 9, pp. 1305–1315, May 2010.
- [198] X. Tan *et al.*, “On a Scalable, Non-Blocking Optical Router for Photonic Networks-on-Chip Designs,” *Proceedings of the Symposium on Photonics and Optoelectronics*, pp. 1–4, May 2011.

- [199] Y. Zhang *et al.*, “Ultralow-Loss Silicon Waveguide Crossing Using Bloch Modes in Index-Engineered Cascaded Multimode-Interference Couplers,” *Optics Letters*, Vol. 38, No. 18, pp. 3608–3611, September 2013.
- [200] A. Boos, L. Ramini, U. Schlichtmann, and D. Bertozzi, “PROTON: An Automatic Place-and-Route Tool for Optical Networks-on-Chip,” *Proceedings of the IEEE/ACM International Conference on Computer-Aided Design*, pp. 138–145, November 2013.
- [201] C. Sun *et al.*, “Single-Chip Microprocessor that Communicates Directly Using Light,” *Nature*, Vol. 528, pp. 534–538, December 2015.
- [202] A. V. Beuningen and U. Schlichtmann, “PLATON: A Force-Directed Placement Algorithm for 3D Optical Networks-on-Chip,” *Proceedings of the ACM International Symposium on Physical Design*, pp. 27–34, April 2016.
- [203] W. Bogaerts, P. Dumon, D. V. Thourhout, and R. Baets, “Low-Loss, Low-Cross-Talk Crossings for Silicon-on-Insulator Nanophotonic Waveguides,” *Optics Letters*, Vol. 32, No. 19, pp. 2801–2803, October 2007.
- [204] Y. Luo *et al.*, “Low-Loss Low-Crosstalk Silicon Rib Waveguide Crossing with Tapered Multimode-Interference Design,” *Proceedings of the International Conference on Group IV Photonics*, pp. 150–152, August 2012.
- [205] Y. Zhang *et al.*, “A CMOS-Compatible, Low-Loss, and Low-Crosstalk Silicon Waveguide Crossing,” *IEEE Photonics Technology Letters*, Vol. 25, No. 5, pp. 422–425, March 2013.
- [206] A. V. Tsarev, “Efficient Silicon Wire Waveguide Crossing with Negligible Loss and Crosstalk,” *Optics Express*, Vol. 19, No. 15, pp. 13732–13737, July 2011.
- [207] Y. Liu, J. M. Shainline, X. Zeng, and M. A. Popović, “Ultra-Low-Loss CMOS-Compatible Waveguide Crossing Arrays based on Multimode Bloch Waves and Imaginary Coupling,” *Optics Letters*, Vol. 39, No. 2, pp. 335–338, January 2014.
- [208] OptiFDTD, “<https://optiwave.com/optifDTD-overview/>,” 2018.

- [209] R. Ulrich and T. Kamiya, “Resolution of Self-Images in Planar Optical Waveguides,” *Journal of the Optical Society of America*, Vol. 68, No. 5, pp. 583–592, May 1978.
- [210] L. B. Soldano and E. C. M. Pennings, “Optical Multi-Mode Interference Devices based on Self-Imaging: Principles and Applications,” *Journal of Lightwave Technology*, Vol. 13, No. 4, pp. 615–627, April 1995.
- [211] H. Chen and A. W. Poon, “Low-Loss Multimode-Interference-Based Crossings for Silicon Wire Waveguides,” *IEEE Photonics Technology Letters*, Vol. 18, No. 21, pp. 2260–2262, November 2006.
- [212] Y. Ma *et al.*, “Ultralow Loss Single Layer Submicron Silicon Waveguide Crossing for SOI Optical Interconnect,” *Optics Express*, Vol. 21, No. 24, pp. 29374–29382, December 2013.
- [213] J. Z. Huang, R. Scarmozzino, and R. M. Osgood, “A New Design Approach to Large Input/Output Number Multimode Interference Couplers and Its Application to Low-Crosstalk WDM Routers,” *IEEE Photonics Technology Letters*, Vol. 10, No. 9, pp. 1292–1294, September 1998.
- [214] G. Goubau and F. Schwering, “On the Guided Propagation of Electromagnetic Wave Beams,” *IRE Transactions on Antennas and Propagation*, Vol. 9, No. 3, pp. 248–256, May 1961.
- [215] L. Ramini, D. Bertozzi, and L. P. Carloni, “Engineering a Bandwidth-Scalable Optical Layer for a 3D Multi-Core Processor with Awareness of Layout Constraints,” *Proceedings of the IEEE/ACM International Symposium on Networks-on-Chip*, pp. 185–192, May 2012.
- [216] M. Jung and S. K. Lim, “A Study of IR-Drop Noise Issues in 3D ICs with Through-Silicon-Vias,” *Proceedings of the IEEE International 3D Systems Integration Conference*, pp. 1–7, November 2010.
- [217] Z. Li *et al.*, “Thermal-Aware P/G TSV Planning for IR Drop Reduction in 3D ICs,” *Integration*, Vol. 46, No. 1, pp. 1–9, January 2013.

- [218] S. Wang, F. Firouzi, F. Oboril, and M. B. Tahoori, “Deadspace-Aware Power/Ground TSV Planning in 3D Floorplanning,” *Proceedings of the IEEE International Conference on IC Design Technology*, pp. 1–4, June 2015.
- [219] H. He, Z. Xu, X. Gu, and J. Lu, “Power Delivery Modeling for 3D Systems with Non-Uniform TSV Distribution,” *Proceedings of the IEEE Electronic Components and Technology Conference*, pp. 1115–1121, May 2013.
- [220] M. B. Healy and S. K. Lim, “Distributed TSV Topology for 3-D Power-Supply Networks,” *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 20, No. 11, pp. 2066–2079, November 2012.
- [221] Y. Satomi, K. Hachiya, T. Kanamoto, and A. Kurokawa, “Optimization of Full-Chip Power Distribution Networks in 3D ICs,” *Proceedings of the IEEE International Conference on Integrated Circuits and Microsystems*, pp. 134–138, November 2018.
- [222] Y. Wang *et al.*, “HS3-DPG: Hierarchical Simulation for 3-D P/G Network,” *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 23, No. 10, pp. 2307–2311, October 2015.
- [223] T. Sung, K. Chiang, D. Lee, and M. Ma, “Electrical Analyses of TSV-RDL-Bump of Interposers for High-Speed 3D IC Integration,” *Proceedings of the IEEE Electronic Components and Technology Conference*, pp. 865–870, May 2012.
- [224] J. Lau *et al.*, “Redistribution Layers (RDLs) for 2.5D/3D IC Integration,” *Journal of Microelectronics and Electronic Packaging*, Vol. 11, No. 1, pp. 16–24, January 2014.
- [225] J. J. McMahon, J. Q. Lu, and R. J. Gutmann, “Wafer Bonding of Damascene-Patterned Metal/Adhesive Redistribution Layers for Via-First Three-Dimensional (3D) Interconnect,” *Proceedings of the IEEE Electronic Components and Technology Conference*, pp. 331–336, May 2005.
- [226] J. H. Lau *et al.*, “Fan-Out Wafer-Level Packaging for Heterogeneous Integration,” *IEEE Transactions on Components, Packaging and Manufacturing Technology*, Vol. 8, No. 9, pp. 1544–1560, July 2018.

- [227] M. Li, P. Periasamy, K. N. Tu, and S. S. Iyer, “Optimized Power Delivery for 3D IC Technology Using Grind Side Redistribution Layers,” *Proceedings of the IEEE Electronic Components and Technology Conference*, pp. 2449–2454, May 2016.
- [228] H. Chen, H. Lin, Z. Wang, and T. Hwang, “A New Architecture for Power Network in 3D IC,” *Proceedings of ACM Design, Automation and Test in Europe Conference and Exhibition*, pp. 1–6, March 2011.
- [229] S. W. Yoon *et al.*, “3D TSV Processes and its Assembly/Packaging Technology,” *Proceedings of the IEEE International Conference on 3D System Integration*, pp. 1–5, September 2009.
- [230] M. Thomason and G. Girvna, “TSV Front End Design, Integration, and Process Development Unique Cell Design and Process Integration,” *Proceedings of the IEEE Advanced Semiconductor Manufacturing Conference*, pp. 337–341, May 2017.
- [231] J. Lu, “3-D Hyperintegration and Packaging Technologies for Micro-Nano Systems,” *Proceedings of the IEEE*, Vol. 97, No. 1, pp. 18–30, January 2009.
- [232] E. Chen *et al.*, “Fine-Pitch Backside Via-Last TSV Process with Optimization on Temporary Glue and Bonding Conditions,” *Proceedings of the IEEE Electronic Components and Technology Conference*, pp. 1811–1814, May 2013.
- [233] S. Chen *et al.*, “Implementation of Memory Stacking on Logic Controller by Using 3DIC 300mm Backside TSV Process Integration,” *Proceedings of the IEEE International Symposium on VLSI Technology, Systems and Application*, pp. 1–2, April 2016.
- [234] M. O. Hossen *et al.*, “Power Delivery Network (PDN) Modeling for Backside-PDN Configurations With Buried Power Rails and $\mu$TSVs,” *IEEE Transactions on Electron Devices*, Vol. 67, No. 1, pp. 11–17, January 2020.
- [235] R. Puschmann *et al.*, “Via Last Technology for Direct Stacking of Processor and Flash,” *Proceedings of the IEEE Electronic Components and Technology Conference*, pp. 1327–1332, May 2012.

- [236] K. Blutman *et al.*, “Logic Design Partitioning for Stacked Power Domains,” *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 25, No. 11, pp. 3045–3056, November 2017.
- [237] C. Schaef and J. T. Stauth, “Efficient Voltage Regulation for Microprocessor Cores Stacked in Vertical Voltage Domains,” *IEEE Transactions on Power Electronics*, Vol. 31, No. 2, pp. 1795–1808, February 2016.
- [238] K. Kesarwani, C. Schaef, C. R. Sullivan, and J. T. Stauth, “A Multi-Level Ladder Converter Supporting Vertically-Stacked Digital Voltage Domains,” *Proceedings of the IEEE Applied Power Electronics Conference*, pp. 429–434, March 2013.
- [239] P. S. Shenoy and P. T. Krein, “Differential Power Processing for DC Systems,” *IEEE Transactions on Power Electronics*, Vol. 28, No. 4, pp. 1795–1806, April 2013.
- [240] K. T. Zhan *et al.*, “Serial Power Supply Circuit, Virtual Digital Coin Mining Machine and Computer Server,” China Patent CN105045364A, 2016.
- [241] Intel Pentium 4 Processor in the 423 pin package/Intel 850 Chipset Platform, “<http://happytrees.org/files/chips/datasheets/datasheet-Intel-Pentium-4-423-pin-1.30,1.40,1.50,1.60,1.70,1.80,1.90,2GHz.pdf>,” 2001.
- [242] L. Chang *et al.*, “A Fully-Integrated Switched-Capacitor 2:1 Voltage Converter with Regulation Capability and 90% Efficiency at 2.3A/mm<sup>2</sup>,” *Proceedings of the IEEE Symposium on VLSI Circuits*, pp. 55–56, June 2010.
- [243] I. Vaisband, M. Saadat, and B. Murmann, “A Closed-Loop Reconfigurable Switched-Capacitor DC-DC Converter for Sub-mW Energy Harvesting Applications,” *IEEE Transactions on Circuits and Systems I: Regular Papers*, Vol. 62, No. 2, pp. 385–394, February 2015.
- [244] L. Delmas, G. Gateau, T. A. Meynard, and H. Foch, “Stacked Multicell Converter (SMC): Control and Natural Balancing,” *Proceedings of the IEEE Power Electronics Specialists Conference*, Vol. 2, pp. 689–694, June 2002.

- [245] H. Jeong, H. Lee, Y. Liu, and K. A. Kim, “Review of Differential Power Processing Converter Techniques for Photovoltaic Applications,” *IEEE Transactions on Energy Conversion*, Vol. 34, No. 1, pp. 351–360, March 2019.
- [246] K. Xu, B. Vaisband, G. Sizikov, X. Li, and E. G. Friedman, “Distributed Sinusoidal Resonant Converter with High Step-Down Ratio,” *Proceedings of the IEEE Conference on Electrical Performance of Electronic Packaging and Systems*, pp. 1–3, October 2017.
- [247] M. A. El-Moursy and E. G. Friedman, “Shielding Effect of On-Chip Interconnect Inductance,” *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Vol. 13, No. 3, pp. 396–400, March 2005.
- [248] K. Xu, M. Popovich, G. Sizikov, and E. G. Friedman, “Distributed Port Assignment for Extraction of Power Delivery Networks,” *Proceedings of the IEEE International Symposium on Circuits and Systems*, pp. 1–4, October 2020.