



# The New Power Perspective: Realistic Workloads, Real Results

Xiaoming Li, Synopsys

#### Agenda



- Why Power Matters
- Power Verification using Emulation
- Results

# Power Consumption Drives Competitiveness



Peak and Average Power are Key Design Concerns



#### Low Power Remains the #1 Verification Issue (Synopsys Global User Survey 2020)





Meeting Dynamic Power Requirements Is Becoming More Difficult


# **Peak Power Events Are Critical**



#### Peak power events are driven by actual software workloads

Emulation: billion-cycle software-driven tests



#### Running SW Workloads to Find Power Bugs





Small tests do not expose realistic, workload-driven power bugs

- Real firmware and OS are needed during pre-silicon testing
- Must use emulation and verify power over millions or billions of cycles
- Pre-silicon power verification enables debug that is not possible with actual silicon

# How Is Power Calculated?

Power Analysis Requires Waveforms, Technology Library and Signal Delay Data

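#### Total Power

$$P_{\text{total}} = \underbrace{P_{\text{switching}} + P_{\text{internal}} + P_{\text{leakage}}}_{\text{logic cell power}} + P_{\text{clock tree}} + P_{\text{memory}}$$

Switching power depends on net capacitance, frequency, and voltage.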
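As a rough hand calculation for the switching term, the textbook estimate for one net is $P_{\text{switching}} = \tfrac{1}{2} C V^2 \cdot r_{\text{toggle}}$, where $r_{\text{toggle}}$ is the number of toggles per second. The Python sketch below applies that estimate and then sums the terms of the total-power equation. It is for intuition only: the function names are hypothetical, and it is not how ZeBu Empower or PrimePower compute power (they also draw internal and leakage power from the .lib tables).

```python
def switching_power(cap_farads: float, vdd_volts: float, toggle_rate_hz: float) -> float:
    """Textbook switching power of one net: 0.5 * C * V^2 per toggle, times toggles per second."""
    return 0.5 * cap_farads * vdd_volts ** 2 * toggle_rate_hz


def total_power(switching: float, internal: float, leakage: float,
                clock_tree: float = 0.0, memory: float = 0.0) -> float:
    """Total power = logic cell power (switching + internal + leakage) + clock tree + memory."""
    logic_cell = switching + internal + leakage
    return logic_cell + clock_tree + memory


if __name__ == "__main__":
    # Hypothetical net: 10 fF load, 0.8 V supply, 200 million toggles per second.
    p_sw = switching_power(10e-15, 0.8, 2e8)
    print(f"switching power = {p_sw:.3e} W")  # switching power = 6.400e-07 W
```

Because switching power scales with $V^2$ and with toggle rate, realistic activity from long software workloads matters as much as the capacitance data.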

#### Average Power

- Needs the number of toggles and the total duration at 0 and at 1 for each signal

#### Cycle Power

- Needs 0-delay waveforms for all signals over millions of cycles

#### Signoff Power

- Needs waveforms for all signals with accurate delays
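The average-power inputs (toggle count and time spent at 0 and 1) can be derived from a 0-delay waveform. A minimal Python sketch, assuming each signal's waveform is available as sorted (time, value) transitions with 0/1 values only (no X or Z); the T0/T1/TC names mirror the fields of a SAIF file, but `activity_stats` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def activity_stats(transitions: List[Tuple[int, int]], end_time: int) -> Dict[str, float]:
    """Summarize a 0-delay waveform given as sorted (time, value) pairs.

    The first pair gives the initial value at the start time. Returns time spent
    at 0 (T0), time spent at 1 (T1), toggle count (TC), and toggles per time unit.
    """
    t0 = t1 = toggles = 0
    for i, (time, value) in enumerate(transitions):
        next_time = transitions[i + 1][0] if i + 1 < len(transitions) else end_time
        duration = next_time - time
        if value == 0:
            t0 += duration
        else:
            t1 += duration
        # Count a toggle only when the value actually changes.
        if i > 0 and value != transitions[i - 1][1]:
            toggles += 1
    span = end_time - transitions[0][0]
    return {"T0": t0, "T1": t1, "TC": toggles, "toggle_rate": toggles / span}


if __name__ == "__main__":
    wave = [(0, 0), (10, 1), (25, 0), (40, 1)]
    print(activity_stats(wave, 50))
```

Average power only needs these aggregates, whereas cycle power keeps the full per-cycle waveform so that peaks can be located in time.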

#### Data Formats

- SAIF (Switching Activity Interchange Format): switching activity file
- .lib: technology library with internal and leakage power for logic cells and memories
- SPEF (Standard Parasitic Exchange Format): net capacitance file
- SDF (Standard Delay Format): net delay file

# Synopsys Software-Driven Low Power Solution



End-to-End low power solution from architecture to signoff



# ZeBu Empower: Fastest Power Emulator for HW-SW Power Verification

#### Key Benefits

- Large designs, realistic workloads, multiple iterations per day
- Actionable power profiling for dynamic and leakage power
- Power-critical blocks and vectors fed into signoff analysis



Hardware and Software Architected for Maximum Compute Throughput

#### Software-Driven SoC Power Analysis




Identify peak power with real stimulus: zoom in from billions of cycles $\rightarrow$ thousands of cycles
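Zooming from billions of cycles down to thousands amounts to locating the window with the highest sustained power in a per-cycle power profile. The Python sketch below shows one simple way to do that with a sliding window; `peak_window` is a hypothetical helper that assumes the profile fits in memory as a list, and it does not describe how ZeBu Empower implements its zoom.

```python
from typing import List, Tuple


def peak_window(cycle_power: List[float], window: int) -> Tuple[int, float]:
    """Return (start_cycle, average_power) of the window with the highest average power.

    Uses a running sum, so the scan is O(n) regardless of window size.
    """
    if not 0 < window <= len(cycle_power):
        raise ValueError("window must be between 1 and len(cycle_power)")
    running = sum(cycle_power[:window])
    best_sum, best_start = running, 0
    for start in range(1, len(cycle_power) - window + 1):
        # Slide the window one cycle: add the new cycle, drop the oldest one.
        running += cycle_power[start + window - 1] - cycle_power[start - 1]
        if running > best_sum:
            best_sum, best_start = running, start
    return best_start, best_sum / window


if __name__ == "__main__":
    profile = [1.0, 2.0, 5.0, 4.0, 1.0, 1.0]
    print(peak_window(profile, 2))  # (2, 4.5)
```

For very long runs, the same idea could be applied in stages: first on coarse per-interval averages, then on the individual cycles inside the winning interval.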

# ZeBu Empower Power Analysis: RTL Cycle, RTL-2-Gate, Gate







## ZeBu Empower Power Estimation: **Tcl Shell & Average/Peak Power**





| Flow Step              | ZeBu Empower Command                         |
|------------------------|----------------------------------------------|
| Set All Required .db Files | <pre>set link_library "tech.db mem.db"</pre> |
| Read all Netlist Files | <pre>read_verilog design.v; link</pre>       |
| Read Constraints       | read_sdc                                     |
| Read Parasitic Data    | read_parasitics dut.spef                     |
| Read Activity File     | read_stimulus -file dut.ztdb                 |
| Calculate Power        | compute_power                                |
| Report Power           | report_power                                 |

| Power Group   | Internal<br>Power | Switching<br>Power | Leakage<br>Power | Total<br>Power | (%)      | Peak<br>Power | Peak<br>Time |
|---------------|-------------------|--------------------|------------------|----------------|----------|---------------|--------------|
| clock network | 8.899e-04         | 0.000e+00          | 0.000e+00        | 8.899e-04      | (48.18%) | 8.901e-04     | 71640        |
| register      | 9.527e-06         | 3.696e-06          | 3.760e-04        | 3.892e-04      | (21.07%) | 4.396e-04     | 77280        |
| combinational | 5.495e-05         | 4.844e-05          | 4.645e-04        | 5.679e-04      | (30.75%) | 1.114e-03     | 11280        |
| sequential    | 0.000e+00         | 0.000e+00          | 0.000e+00        | 0.000e+00      | (0.00%)  | 0.000e+00     | N/A          |
| memory        | 0.000e+00         | 0.000e+00          | 0.000e+00        | 0.000e+00      | (0.00%)  | 0.000e+00     | N/A          |
| io pad        | 0.000e+00         | 0.000e+00          | 0.000e+00        | 0.000e+00      | (0.00%)  | 0.000e+00     | N/A          |
| black_box     | 0.000e+00         | 0.000e+00          | 0.000e+00        | 0.000e+00      | (0.00%)  | 0.000e+00     | N/A          |

- Net Switching Power = 5.214e-05 (2.82%)
- Cell Internal Power = 9.544e-04 (51.67%)
- Cell Leakage Power = 8.405e-04 (45.51%)
- Total Power = 1.847e-03 (100.00%)
- Peak Power = 2.434e-03
- Peak Time = 11280

Standard Tcl Shell Commands Compatible with PrimePower
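As a quick sanity check on a report like this, each group's total should equal internal + switching + leakage, and the (%) column is that total divided by the grand total. The Python sketch below recomputes both from the numbers in the report above; `group_shares` is a hypothetical helper, and the units are whatever the tool reported.

```python
# Per-group (internal, switching, leakage) values copied from the report above.
GROUPS = {
    "clock network": (8.899e-04, 0.000e+00, 0.000e+00),
    "register": (9.527e-06, 3.696e-06, 3.760e-04),
    "combinational": (5.495e-05, 4.844e-05, 4.645e-04),
}
REPORTED_TOTAL = 1.847e-03


def group_shares(groups, grand_total):
    """Return {group: (recomputed total, share of grand total in %)}."""
    shares = {}
    for name, (internal, switching, leakage) in groups.items():
        total = internal + switching + leakage
        shares[name] = (total, 100.0 * total / grand_total)
    return shares


if __name__ == "__main__":
    for name, (total, pct) in group_shares(GROUPS, REPORTED_TOTAL).items():
        print(f"{name:14s} total={total:.3e}  ({pct:.2f}%)")
```

The recomputed shares (48.18%, 21.07%, 30.75%) match the (%) column, which confirms that the group columns were reconstructed in the right order.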

Tcl debug shell: debug design, debug power



WSH time-based power table for group `/or1200_cpu` (columns per group: leakage, internal, switching, total; xunit: 1ns; yunit: uW):

| Time (xkey) | G1.C1 Leakage | G1.C2 Internal | G1.C3 Switching | G1.C4 Total  |
|-------------|---------------|----------------|-----------------|--------------|
| 100         | 8.304661e+02  | 3.958796e+02   | 6.054502e+00    | 1.232400e+03 |
| 110         | 8.302657e+02  | 9.023951e+02   | 2.466496e+01    | 1.757326e+03 |
| 120         | 8.300638e+02  | 9.084959e+02   | 3.577724e+01    | 1.774337e+03 |
| 130         | 8.304145e+02  | 8.917342e+02   | 6.054502e+00    | 1.728203e+03 |
| 140         | 8.301953e+02  | 9.040037e+02   | 2.506173e+01    | 1.759261e+03 |
|     |              |              |              |              |

Standard Tcl analysis shell, Average + Peak Power Reports
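A time-based table like the one above is enough to derive both average and peak power. A minimal Python sketch, assuming the samples are equally spaced (10 ns apart here) so that a plain mean equals the time average; `summarize` is a hypothetical helper and the five rows are only the excerpt shown on the slide.

```python
# Rows of the time-based table above: time -> (leakage, internal, switching, total).
ROWS = {
    100: (8.304661e+02, 3.958796e+02, 6.054502e+00, 1.232400e+03),
    110: (8.302657e+02, 9.023951e+02, 2.466496e+01, 1.757326e+03),
    120: (8.300638e+02, 9.084959e+02, 3.577724e+01, 1.774337e+03),
    130: (8.304145e+02, 8.917342e+02, 6.054502e+00, 1.728203e+03),
    140: (8.301953e+02, 9.040037e+02, 2.506173e+01, 1.759261e+03),
}


def summarize(rows):
    """Return (average total, peak total, peak time) over equally spaced samples."""
    totals = {t: cols[3] for t, cols in rows.items()}
    peak_time = max(totals, key=totals.get)
    average = sum(totals.values()) / len(totals)
    return average, totals[peak_time], peak_time


if __name__ == "__main__":
    # Each total should match leakage + internal + switching to within rounding.
    for t, (leak, internal, switching, total) in ROWS.items():
        assert abs(leak + internal + switching - total) < 1e-2, t
    avg, peak, when = summarize(ROWS)
    print(f"average={avg:.2f} peak={peak:.2f} at t={when}")
```

Note that leakage stays nearly constant across samples while internal and switching power move with activity, which is why the peak is driven by the workload.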

#### Read Parasitic Data / Net Annotation



Shanghai, May 26, 2021

#### `wsh> report_parasitic_annotation`

Info: Total 14,315 unique nets found, missing capacitance annotations 4(0.03%).

Info: Cap Unit: 0.001 pF, Data Source: SPEF(99.97%), WLM(0.03%)

Info: Wire cap stats: sum = 491.34, avg = 0.03, min = 0.54, max = 45.98

Info: Total cap stats: sum = 26008.90, avg = 1.82, min = 0.00, max = 1373.96

Info: SPEF Annotation Summary

| Nets Driven by   | Annotated (%)  | Not Annotated<br>Loadless (%) | Not Annotated<br>Loaded (%) | Total  |
|------------------|----------------|-------------------------------|-----------------------------|--------|
| Primary Input    | 385(99.74%)    | 0(0%)                         | 1(0.26%)                    | 386    |
| IO Pads          | 0(0%)          | 0(0%)                         | 0(0%)                       | 0      |
| Black Box        | 0(0%)          | 0(0%)                         | 0(0%)                       | 0      |
| Memory           | 0(0%)          | 0(0%)                         | 0(0%)                       | 0      |
| Register         | 2,943(99.97%)  | 0(0%)                         | 1(0.03%)                    | 2,944  |
| Latch            | 0(0%)          | 0(0%)                         | 0(0%)                       | 0      |
| Other Sequential | 0(0%)          | 0(0%)                         | 0(0%)                       | 0      |
| Clock Gate       | 0(0%)          | 0(0%)                         | 0(0%)                       | 0      |
| Combinational    | 10,983(99.98%) | 0(0%)                         | 2(0.02%)                    | 10,985 |
| Total            | 14,311(99.97%) | 0(0%)                         | 4(0.03%)                    | 14,315 |

All nets (primary input, register, combinational, ...) should be annotated. The goal is 0% not annotated; if that is not met, debug ...
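The same "0% missing" goal applies to the SPEF, activity, and cell-computation summaries in this section. A small Python sketch, assuming the per-category counts have already been extracted from the report text (parsing is not shown); `not_annotated_pct` and `check_coverage` are hypothetical helpers that flag the categories needing debug.

```python
def not_annotated_pct(annotated: int, total: int) -> float:
    """Percentage of items (nets, drivers, or cells) that are missing annotation."""
    if total == 0:
        return 0.0
    return 100.0 * (total - annotated) / total


def check_coverage(rows):
    """rows: {category: (annotated, total)}. Returns categories that miss the 0% goal."""
    return {name: round(not_annotated_pct(a, t), 2)
            for name, (a, t) in rows.items() if a < t}


if __name__ == "__main__":
    spef_rows = {  # counts from the SPEF annotation summary above
        "Primary Input": (385, 386),
        "Register": (2943, 2944),
        "Combinational": (10983, 10985),
    }
    print(check_coverage(spef_rows))
```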

#### **Read Stimulus / Net Annotation**



`wsh> report_activity_annotation -list_not_annotated`

Info: Processing -root /or1200\_cpu, -stim\_id /wsdb/stim1 ...

Info: Activity Annotation: -root /or1200\_cpu, -stim\_id /wsdb/stim1

Info: Checking for drivers with missing waveform annotations.

Info: Total 2,277 essential drivers found, missing waveforms 0(0%).

BEG: Waveform Annotation Summary

| Nets Driven by   | From Activity<br>File (%) | From<br>Constants (%) | Not Annotated<br>Loadless (%) | Not Annotated<br>Loaded (%) | Total (%)     |
|------------------|---------------------------|-----------------------|-------------------------------|-----------------------------|---------------|
| Primary Input    | 386(99.48%)               | 0(0%)                 | 2(0.52%)                      | 0(0%)                       | 388(17.03%)   |
| IO Pads          | 0(0%)                     | 0(0%)                 | 0(0%)                         | 0(0%)                       | 0(0%)         |
| Black Box        | 0(0%)                     | 0(0%)                 | 0(0%)                         | 0(0%)                       | 0(0%)         |
| Memory           | 0(0%)                     | 0(0%)                 | 0(0%)                         | 0(0%)                       | 0(0%)         |
| Register         | 1,891(100%)               | 0(0%)                 | 0(0%)                         | 0(0%)                       | 1,891(82.97%) |
| Latch            | 0(0%)                     | 0(0%)                 | 0(0%)                         | 0(0%)                       | 0(0%)         |
| Other Sequential | 0(0%)                     | 0(0%)                 | 0(0%)                         | 0(0%)                       | 0(0%)         |
| Clock Gate       | 0(0%)                     | 0(0%)                 | 0(0%)                         | 0(0%)                       | 0(0%)         |
| Combinational    | 0(0%)                     | 0(0%)                 | 0(0%)                         | 0(0%)                       | 0(0%)         |
| Empty Modules    | 0(0%)                     | 0(0%)                 | 0(0%)                         | 0(0%)                       | 0(0%)         |
| Total            | 2,277(99.91%)             | 0(0%)                 | 2(0.09%)                      | 0(0%)                       | 2,279(100%)   |

Essential signals are sequential outputs, memory outputs, and port inputs (not combinational outputs). The goal is 0% not annotated; if that is not met, debug ...
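Only essential signals need waveforms because, in a 0-delay cycle model, every combinational net is a function of those signals within the same cycle. A minimal Python sketch of that idea: the netlist format and the `OPS` gate table are hypothetical stand-ins (a real flow would take gate functions from the linked netlist and libraries), and combinational values and toggles are recomputed from the essential signals alone.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical gate table: each op maps a list of 0/1 input values to an output value.
OPS: Dict[str, Callable[[List[int]], int]] = {
    "AND": lambda v: int(all(v)),
    "OR": lambda v: int(any(v)),
    "XOR": lambda v: sum(v) % 2,
    "NOT": lambda v: 1 - v[0],
}


def propagate(essential: Dict[str, int],
              gates: Dict[str, Tuple[str, List[str]]]) -> Dict[str, int]:
    """Evaluate combinational nets from essential-signal values (0-delay, one cycle).

    `gates` maps an output net to (op, input nets) and must be in topological order.
    """
    values = dict(essential)
    for net, (op, inputs) in gates.items():
        values[net] = OPS[op]([values[i] for i in inputs])
    return values


def toggle_counts(cycles: List[Dict[str, int]],
                  gates: Dict[str, Tuple[str, List[str]]]) -> Dict[str, int]:
    """Count per-net toggles across cycles, given only essential signals per cycle."""
    counts: Dict[str, int] = {}
    prev: Dict[str, int] = {}
    for essential in cycles:
        cur = propagate(essential, gates)
        for net, value in cur.items():
            counts.setdefault(net, 0)
            if net in prev and prev[net] != value:
                counts[net] += 1
        prev = cur
    return counts


if __name__ == "__main__":
    gates = {"n1": ("AND", ["q0", "q1"]), "n2": ("XOR", ["n1", "pi"])}
    cycles = [
        {"q0": 0, "q1": 1, "pi": 0},
        {"q0": 1, "q1": 1, "pi": 0},
        {"q0": 1, "q1": 0, "pi": 1},
    ]
    print(toggle_counts(cycles, gates))
```

This is also why the annotation summary above reports 0 combinational drivers: their activity is derived rather than read from the activity file.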

#### **Compute Power / Cell Computed**





Info: Total cells: 12,192

Info: Total computed: 12,192(100%)

| Power Group   | Power<br>Computed (%) | Power<br>Not Computed (%) | Total  |
|---------------|-----------------------|---------------------------|--------|
| clock network | 0(0%)                 | 0(0%)                     | 0      |
| register      | 1,891(100%)           | 0(0%)                     | 1,891  |
| combinational | 10,301(100%)          | 0(0%)                     | 10,301 |
| sequential    | 0(0%)                 | 0(0%)                     | 0      |
| memory        | 0(0%)                 | 0(0%)                     | 0      |
| io_pad        | 0(0%)                 | 0(0%)                     | 0      |
| black_box     | 0(0%)                 | 0(0%)                     | 0      |
| Total         | 12,192(100%)          | 0(0%)                     | 12,192 |

All cells (clock network, register, combinational, memory, ...) should have power computed. The goal is 0% not computed; if that is not met, debug ...

# Local Customer Use Case



# Bring-up

#### Effort

- The flow is simple and clear
- Existing PrimePower scripts can be easily reused
- Set-up for a new project generally takes about one day

#### Speed

- Typical TAT is about 2 hours for designs with tens of millions of gates
- Typical TAT is about 12 hours for designs with hundreds of millions of gates
- Native PC farm support: more farm resources mean shorter TAT
- Multiple iterations per day
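One common way to turn more farm resources into shorter TAT is to give each job its own contiguous range of cycles and merge the results afterwards. A sketch under that assumption: `partition_cycles` and `combine_averages` are hypothetical helpers and say nothing about how ZeBu Empower actually distributes work across the farm.

```python
from typing import List, Tuple


def partition_cycles(start: int, end: int, workers: int) -> List[Tuple[int, int]]:
    """Split the half-open cycle range [start, end) into up to `workers` contiguous chunks.

    Chunk sizes differ by at most one cycle.
    """
    if workers < 1 or end <= start:
        raise ValueError("need at least one worker and a non-empty range")
    base, extra = divmod(end - start, workers)
    chunks, lo = [], start
    for i in range(workers):
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # more workers than cycles
        chunks.append((lo, lo + size))
        lo += size
    return chunks


def combine_averages(chunks: List[Tuple[int, int]], averages: List[float]) -> float:
    """Merge per-chunk average power into one average, weighting by chunk length."""
    weighted = sum((hi - lo) * avg for (lo, hi), avg in zip(chunks, averages))
    return weighted / sum(hi - lo for lo, hi in chunks)


if __name__ == "__main__":
    chunks = partition_cycles(0, 10, 3)
    print(chunks)  # [(0, 4), (4, 7), (7, 10)]
    print(combine_averages(chunks, [2.0, 1.0, 1.0]))  # 1.4
```

Averages must be weighted by chunk length when merged, while the overall peak is simply the maximum of the per-chunk peaks.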

This is the first time complex power analysis has been performed with a real software workload at the pre-silicon stage over millions of cycles, with DFS and clock gating features enabled.

#### Local Customer Use Case



#### Average power <2% deviation compared with PrimePower

#### PrimePower

| Power Group   | Internal<br>Power | Switching<br>Power | Leakage<br>Power | Total<br>Power | (%)      |
|---------------|-------------------|--------------------|------------------|----------------|----------|
| clock_network | 2.0498            | …                  | 0.0175           | …              | (63.55%) |
| register      | 3.709e-03         | …                  | 0.1036           | 0.1118         | (2.78%)  |
| combinational | 3.798e-03         | 9.097e-03          | 0.2764           | 0.2893         | …        |
| sequential    | 0.0000            | …                  | 3.689e-04        | …              | …        |
| memory        | …                 | 5.761e-08          | 0.1353           | …              | (26.46%) |
| io_pad        | …                 | 0.0000             | 0.0000           | …              | (0.00%)  |
| black_box     | …                 | 0.0000             | 0.0000           | …              | …        |

- Net Switching Power = … (12.47%)
- Cell Internal Power = 2.9861 (74.27%)
- Cell Leakage Power = 0.5332 (13.26%)
- Total Power = 4.0208 (100.00%)

#### ZeBu Empower

| Power Group   | Internal<br>Power | Switching<br>Power | Leakage<br>Power | Total<br>Power | (%)      | Peak<br>Power |
|---------------|-------------------|--------------------|------------------|----------------|----------|---------------|
| clock_network | 2.027e+00         | 4.867e-01          | 1.758e-02        | 2.531e+00      | (63.37%) | 3.588e+00     |
| register      | 3.697e-03         | 4.472e-03          | 1.041e-01        | 1.122e-01      | (2.81%)  | 1.250e-01     |
| combinational | 3.810e-03         | 9.084e-03          | 2.764e-01        | 2.893e-01      | (7.24%)  | 3.085e-01     |
| sequential    | 4.498e-06         | 2.558e-07          | 1.101e-05        | 1.577e-05      | (0.00%)  | 2.044e-05     |
| memory        | 9.264e-01         | 6.744e-08          | 1.353e-01        | 1.062e+00      | (26.58%) | 1.701e+00     |
| io_pad        | 0.000e+00         | 0.000e+00          | 0.000e+00        | 0.000e+00      | (0.00%)  | 0.000e+00     |
| black_box     | 0.000e+00         | 0.000e+00          | 0.000e+00        | 0.000e+00      | (0.00%)  | 0.000e+00     |

- Net Switching Power = 5.002e-01
- Cell Internal Power = 2.961e+00
- Cell Leakage Power = 5.334e-01
- Total Power = 3.994e+00 (100.00%)

The ZeBu Empower report also lists per-group peak times between 180459561.984 and 180459596.616 (N/A for io_pad and black_box).
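The deviation figure in the slide title is a relative difference taken with PrimePower as the reference. A minimal Python sketch of that calculation using the total power from the two reports above; `relative_deviation` is a hypothetical helper.

```python
def relative_deviation(reference: float, candidate: float) -> float:
    """(candidate - reference) / reference, e.g. (ZeBu Empower - PP) / PP."""
    if reference == 0:
        raise ValueError("reference power must be non-zero")
    return (candidate - reference) / reference


if __name__ == "__main__":
    # Total average power from the PrimePower and ZeBu Empower reports above.
    pp_total, zebu_total = 4.0208, 3.994
    dev = relative_deviation(pp_total, zebu_total)
    print(f"{100 * dev:.2f}%")  # -0.67%
    # Group totals that are legible in both reports.
    for name, pp, zebu in [("register", 0.1118, 0.1122), ("combinational", 0.2893, 0.2893)]:
        print(name, f"{100 * relative_deviation(pp, zebu):+.2f}%")
```

A total deviation of about -0.67% is consistent with the "<2% deviation" claim for average power.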

#### Local Customer Use Case



#### Cycle power <5% deviation compared with PrimePower

Figure: per-sample relative deviation (ZeBu Empower - PP)/PP

| PP (W) | ZeBu Empower (W) | (ZeBu Empower - PP)/PP |
|--------|------------------|------------------------|
| 4.53   | 4.44             | -0.01911766            |
| 4.53   | 4.44             | -0.019269536           |
| 4.56   | 4.47             | -0.019126535           |
| 4.53   | 4.44             | -0.01888543            |
| 4.56   | 4.50             | -0.012141009           |
| 4.57   | 4.49             | -0.01855186            |
| 4.56   | 4.49             | -0.014354386           |
| 4.55   | 4.46             | -0.019007692           |
| 4.55   | 4.48             | -0.014616044           |
| 4.55   | 4.47             | -0.018207253           |
| 4.56   | 4.47             | -0.018641009           |
| 4.54   | 4.45             | -0.019657489           |
| 4.53   | 4.44             | -0.019637307           |
| 4.53   | 4.45             | -0.018675055           |
| 4.53   | 4.45             | -0.018703753           |
| 4.53   | 4.44             | -0.020122737           |
| 4.52   | 4.47             | -0.010213938           |
| 4.53   | 4.44             | -0.02002362            |
| 4.56   | 4.49             | -0.014760965           |
| 4.53   | 4.45             | -0.018122958           |
| 4.54   | 4.51             | -0.007194053           |
| 4.56   | 4.47             | -0.019219737           |
| 4.55   | 4.48             | -0.015076264           |
| 4.54   | 4.45             | -0.019574449           |
| 4.55   | 4.48             | -0.014661758           |
| 4.55   | 4.46             | -0.019503077           |
| 4.56   | 4.47             | -0.019928289           |
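To check the "<5%" claim against the table, it is enough to take the worst absolute value of the ratio column. The ratios were presumably computed from unrounded power values (recomputing them from the two-decimal W columns gives slightly different numbers), so the Python sketch below uses the ratio column directly; `summarize_deviation` and its 5% default threshold are hypothetical choices mirroring the slide title.

```python
# (ZeBu Empower - PP) / PP column from the table above, in table order.
DEVIATIONS = [
    -0.01911766, -0.019269536, -0.019126535, -0.01888543, -0.012141009,
    -0.01855186, -0.014354386, -0.019007692, -0.014616044, -0.018207253,
    -0.018641009, -0.019657489, -0.019637307, -0.018675055, -0.018703753,
    -0.020122737, -0.010213938, -0.02002362, -0.014760965, -0.018122958,
    -0.007194053, -0.019219737, -0.015076264, -0.019574449, -0.014661758,
    -0.019503077, -0.019928289,
]


def summarize_deviation(devs, threshold=0.05):
    """Return (worst absolute deviation, mean deviation, all within threshold?)."""
    worst = max(abs(d) for d in devs)
    mean = sum(devs) / len(devs)
    return worst, mean, worst < threshold


if __name__ == "__main__":
    worst, mean, ok = summarize_deviation(DEVIATIONS)
    print(f"worst={100 * worst:.2f}% mean={100 * mean:.2f}% within 5%: {ok}")
```

Every listed sample is within about 2.0% of PrimePower, and all deviations have the same sign, which suggests a small systematic offset rather than random error.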

#### ZeBu Empower - Multiple Turns per Day



- 3h TAT (daily TAT not possible before): Major US processor company, GPU design, 5M cycles, 4.8MG, 8 CPUs
- 12h TAT (not possible before): China AI startup, AI design, 1.5M cycles, 300MG, 150 CPUs
- 1.1h TAT (using only 14GB/CPU): Major US processor company, GPU design, 2.6M cycles, 5MG, 24 CPUs
- 2h TAT (good QoR for exploration): Leading IP provider, GPU design, 0.7M cycles, 28MG, 8 CPUs



## **Thank You**