

## Agile Digital Chip Design with SpinalHDL and Cocotb

WanZheng Weng, DatenLord

## About Us

DatenLord aims to break cloud barrier by deeply integrating hardware and software to build a unified storage-access mechanism to provide high-performance and secure storage support for applications across clouds.

We are using advanced HDLs, like Bluespec and SpinalHDL, and new verification tools, like Cocotb, to develop highperformance RDMA network interface card.

GitHub: https://github.com/datenlord; Website: https://datenlord.github.io/;







02 Desig

- Introduction -

Why we need agile hardware design?

A Scala package featured in efficient hardware design.

### Design Using SpinalHDL

## 03 Verification with Cocotb

#### - SpinalHDL -

#### - Cocotb -

Python-based testbench environment for verifying RTL designs



# Introduction ➢ Why Agile Chip Design?





Why Agile Chip Design?

Emerging Applications, like LLM and Crypto, demand huge computation power;





> Moore Law and Dennard Scaling slows down, hard to improve performance further;



Why Agile Chip Design?

- > Adopt domain-specific architecture to gain better performance-power ratio:
  - Al accelerator;
  - Smart NIC;
  - Cryptography;
  - Video coding/encoding;
- > The architecture of chip becomes more diverse and design complexity surges;
- Democratize chip design for more companies and designers;











## Design with SpinalHDL

What is SpinalHDL? Why SpinalHDL?



## 

What's SpinalHDL?

- > An efficient HCL(Hardware Construction Language) embedded in Scala;
- > A Scala package providing APIs to describe circuits and generate Verilog codes;
- $\succ$  Problems with Verilog/VHDL:
  - Poor parameterization ability;
  - Loose error checking;
  - Tedious net connections;





Place and Route



#### What's SpinalHDL?

#### ➢ Not HLS;

- Same description granularity as traditional HDLs;
- SpinalHDL is a super set of Verilog/VHDL;

| SpinalHDL elements                   | System Verilog element            | category                |                                            |
|--------------------------------------|-----------------------------------|-------------------------|--------------------------------------------|
| in/out                               | input/output                      | I/O                     | General Digital Circuit:                   |
| Bundle                               | interface                         | I/O                     |                                            |
| ClockDomain                          | sensitivity list and always block | clock domain            | - Clock Domain;                            |
| signals(Bool,Bits,UInt,Sint and Vec) | wire/reg/logic and its modifiers  | signals and connections | •                                          |
| connection(:=)                       | assignment(=, <=)                 | signals and connections | - Combinational paths;                     |
| when/elseWhen/others                 | if/else if/else                   | combinational           | <ul> <li>Sequential parts;</li> </ul>      |
| switch/is/default                    | case/default                      | combinational           | <ul> <li>Signals and connection</li> </ul> |
| logical operators                    | logical operators                 | combinational           | - I/Ŏ;                                     |
| relational operators                 | relational operators              | combinational           |                                            |
| arithmetic operators                 | arithmetic operators              | combinational           |                                            |
| registers                            | -                                 | sequential              |                                            |
| memories                             | -                                 | sequential              |                                            |



#### ditional HDLs; VHDL;

## ns; tions;

Why SpinalHDL? - Expressiveness

Utilize advanced language features of Scala to describe circuits:

- Object Oriented;
- Recursion;
- Collection types and methods;
- Function programming;





### Why SpinalHDL? - Expressiveness

- > A Simple Demo Adder Tree:
  - Implemented through recursion
  - reduceBalanceTree method

```
class AdderTree extends Component {
    val io = new Bundle {
      val inputs = in Vec(Bits(4 bits), 16)
      val output = out Bits(4 bits)
    io.output := io.inputs.reduceBalanceTree(_ + _)
      it (inputs.length == 2) {
         inputs(0) + inputs(1)
       else {
         val splitIdx = inputs.length/2
         val (firstHalf, secondHalf) = inputs.splitAt(splitIdx)
         adderTree(firstHalf) + adderTree(secondHalf)
    io.output := adderTree(io.inputs)
DatenLord
```



Why SpinalHDL? - Expressiveness

#### > Other Regular Circuits:

- FFT(Fast Fourier Transformation)
- Bitonic sorter







Why SpinalHDL? - Reliability

- > More accurate model:
  - Registers in Verilog:
    - always block + sensitive list;
    - if else;
    - blocking/non-blocking assignments;
  - For SpinalHDL, reg is embedded in language level:
    - Reg/RegNext/RegInit;
    - ClockDomain;
  - SpinalHDL provides richer finer types for signal:
     Bits / UInt / SInt / Vec





### Why SpinalHDL? - Reliability

- ➤ Early DRCs:
  - Semantic problems could be found by the type system;
  - Problems introduced by implicit width expansion or reduction;
  - Latches could be found before generating Verilog/VHDL code;

Separate Design and Simulation Elements:

- Verilog can be divided into synthesizable and unsynthesizable;
- It is confused to separate these two parts;
- No needs to consider "synthesizable" in SpinalHDL;



Why SpinalHDL? - Reusability

- Object Oriented and Strong Parameterization:
  - Handle relations and restrictions between parameters;
  - Optimize architecture for a specific set of parameters;
- > Community and tools:
  - Reuse Java packages;
  - Convenient build tools sbt/mill, intelligent IDE;
- > Abundant Libraries of Circuit Components:
  - FIFO, RAM/ROM, Counter;
  - Bus: AXI/APB/AHB;
  - VexRiscv, NaxRiscv and SoC framework



```
val cpu = new VexRiscv(
  //Provide a configuration instance
  config = VexRiscvConfig(
    //Provide a list of plugins which will futher add tl
    plugins = List(
      new IBusSimplePlugin(
        resetVector = 0x000000001,
        cmdForkOnSecondStage = true,
        cmdForkPersistence = true
      ),
      new DBusSimplePlugin(
        catchAddressMisaligned = false,
        catchAccessFault = false
     new DecoderSimplePlugin(
        catchIllegalInstruction = false
      new RegFilePlugin(
        regFileReadyKind = Plugin.SYNC,
        zeroBoot = true
     new IntAluPlugin,
      new SrcPlugin(
        separatedAddSub = false,
        executeInsertion = false
     new LightShifterPlugin,
      new HazardSimplePlugin(
        bypassExecute
                                = false,
        bypassMemory
                                = false,
        bypassWriteBack
                                = false,
        bypassWriteBackBuffer
                                = false
     new BranchPlugin(
        earlyBranch = false,
       catchAddressMisaligned = false
     new YamlPlugin("cpu0.yaml")
```

## Verification with Cocotb

What is Cocotb?
Why Cocotb?
A Specific Cocotb Demo



03

What's Cocotb?

> A bridge/interface between Python testbench and RTL simulator; > Enable you write hardware testbench like software in Python;





Components in Cocotb Verification:

- RTL simulator providing VHPI/VPI/GPI; lacksquare
- Hardware Design in Verilog/VHDL;
- Python testbench using Cocotb API;

### Why Cocotb?

- > In the perspective of language itself:
  - more productive and succinct than VHDL/SV;
  - abundant language features: OO, FP;
  - Easy to learn and master;
  - Popular language: easy to find engineer;

Example: Big Integer Multiplication in Crypto

- Simplest C++ implementation uses 40 lines;
- Only 1 line in Python;



```
1 /* big_int_arith.h */
2 #ifndef BIG_INT_ARITH_H
3 #define BIG_INT_ARITH_H
4 #define MAX_SIZE 10
5 typedef struct big_int{
      int width;
      int numbers[MAX_SIZE];
8 } big_int;
9 big_int big_int_init(char input[]);
10 big_int big_int_add(big_int op1, big_int op2);
11 #endif
12 /* big_int_arith.c */
13 #include "big_int_arith.h"
14 #include <string.h>
15 big_int big_int_init(char input[]){
      big_int res;
      int len = strlen(input);
      res.width = len;
      if(len > MAX_SIZE) {res.width = MAX_SIZE;}
19
      for(int i=0; i<MAX_SIZE; i++){</pre>
20
          if(i < res.width){ res.numbers[i] = input[len-1-i] - '0';}</pre>
21
           else{ res.numbers[i] = 0;}
22
      }
23
      return res;
25 }
26
27 big_int big_int_add(big_int op1, big_int op2){
      int max = op1.width;
      if (max < op2.width) max = op2.width;
29
      for(int i; i< MAX_SIZE-1; i++){</pre>
30
           op1.numbers[i] += op2.numbers[i];
31
           op1.numbers[i+1] += op1.numbers[i]/10;
32
           op1.numbers[i] = op1.numbers[i]%10;
33
      7
34
      op1.numbers[MAX_SIZE-1] += op2.numbers[MAX_SIZE-1];
35
      op1.numbers[MAX_SIZE-1] = op1.numbers[MAX_SIZE-1]%10;
36
      if (max == MAX_SIZE) { op1.width = max; }
37
      else{op1.width = (op1.numbers[max]==0)?max:max+1;}
38
      return op1;
39
40
```

1 # python implements addition of integers with any bit width
2 c = a + b



### Why Cocotb?

- > In the perspective of Community:
  - Prosperous open-source community;
  - Abundant and diverse libraries;
  - Easy-to-use package management;
  - Reduce workload and ensure correctness;



#### Package Stack for Cocotb-based Verification





### A Specific Cocotb Demo

- Design Under Test:
  - Generate UDP/IP/Ethernet packet generator with support for RoCE;
  - Additional ICRC is appended in the tail of IP packet;
  - Takes in payload stream and header information;





### erator with support for RoCE; I of IP packet; nformation;

### A Specific Cocotb Demo

- > Testbench Architecture:

  - gen random pkt: generate random UDP/IP/Ethernet packet using scapy;
  - drive dut input: drive input ports by calling corresponding function;
  - check dut output: receives and check reference packets;





- Sink/Source: abstract valid-ready ports into a python function using cocotb-axi;

#### A Specific Cocotb Demo

#### > Key Codes:

- gen random pkt: generate random UDP/IP/Ethernet packet using scapy;
- drive dut input: drive input ports by calling corresponding function;
- check dut output: receives and check reference packets;

```
dut_I await self.udp_ip_meta_src.send(udp_ip_meta)
ref_H await self.mac meta src.send(mac meta)
     await self.data_stream_src.send(frame)
if du
    , self.ref_buffer.put(raw(packet))
    ref_packet_hex = ref_packet.hex("-")
    self.log.error(f"DUT Packet {case_idx}: {dut_packet_hex}")
    self.log.error(f"REF Packet {case_idx}: {ref_packet_hex}")
    Ether(dut_packet).show()
    Ether(ref_packet).show()
assert raw(dut_packet) == raw(ref_packet), f"Test Case {case_idx} Fail"
self.log.info(f"Pass testcase {case_idx}")
```



### **Related Links**

- Datenlord GitHub: https://github.com/datenlord;
- SpinalHDL: https://github.com/SpinalHDL/SpinalHDL;
- Cocotb: https://www.cocotb.org/;
- blue-ethernet: https://github.com/wengwz/blue-ethernet;



## Thanks

