Projects
1) Intel 16nm CNN Accelerator ASIC
Tape-out Project / SystemVerilog / Cadence & Synopsys EDA Tools
- Contributed to every stage of the RTL-to-GDS design flow, culminating in a successful tape-out on Intel’s 16nm technology.
- Implemented an Eyeriss V1-based architecture featuring energy-efficient row-stationary data flow, reducing data movement and power consumption while supporting convolution, max pooling, and ReLU.
- Led RTL design and verification; combined directed and randomized testing for comprehensive verification and automated the test flows with Python scripts.
- Achieved a maximum clock frequency of 1 GHz.
- Used Cadence and Synopsys EDA tools for synthesis, P&R, DRC, and LVS.
2) Tomasulo 32-bit Out-of-Order Execution CPU
VHDL / ModelSim / MIPS
- Developed key features including branch prediction, speculative execution, and memory disambiguation.
- Implemented modules such as the Issue Unit, Re-order Buffer (ROB), Free Register List (FRL), Store Buffer, Store Address Buffer, and a 2-stage Dispatch Unit.
- Developed a copy-free checkpoint (CFC) technique optimized for FPGA, enabling efficient restoration of the Front-End Register Alias Table (FRAT) during branch misprediction events.
3) Advanced eXtensible Interface (AXI) Interconnect
Verilog / ModelSim
- Developed an AXI protocol bridge in Verilog that packetizes read/write transactions for a multi-master, multi-slave SoC.
- Used reorder buffers to keep each master’s reads and writes in order while allowing out-of-order completion across different masters and memories.
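A minimal behavioral sketch of the per-master reorder-buffer idea, written in Python rather than the project's Verilog. The class name `ReorderBuffer`, its methods, and the integer tags are hypothetical and chosen only for illustration; this is not the project's actual RTL.

```python
from collections import deque


class ReorderBuffer:
    """Behavioral model of one master's reorder buffer (illustrative only).

    Responses may complete in any order, but they are released to the
    master strictly in the order the requests were issued.
    """

    def __init__(self):
        self._pending = deque()  # transaction tags, oldest first
        self._done = {}          # tag -> response data, filled on completion

    def issue(self, tag):
        """Record a new outstanding transaction in issue order."""
        self._pending.append(tag)

    def complete(self, tag, data):
        """Store a response that came back, possibly out of order."""
        self._done[tag] = data

    def drain(self):
        """Release responses from the head while the oldest one is ready."""
        released = []
        while self._pending and self._pending[0] in self._done:
            tag = self._pending.popleft()
            released.append((tag, self._done.pop(tag)))
        return released


def demo():
    """Issue tags 0..2, complete them as 2, 0, 1, and collect each drain."""
    rob = ReorderBuffer()
    for tag in (0, 1, 2):
        rob.issue(tag)
    steps = []
    for tag, data in ((2, "c"), (0, "a"), (1, "b")):
        rob.complete(tag, data)
        steps.append(rob.drain())
    return steps
```

With one such buffer per master, each master sees its own responses in issue order, while responses belonging to different masters can still return in any relative order.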
4) PCIe (Physical Layer) Implementation
Verilog / ModelSim
- Implemented 8b/10b encoding for PCIe interfaces, ensuring DC balance to improve data integrity and link stability.
- Completed an Elastic Buffer to handle data transfer across different clock domains.
- Developed De-Skew FIFO to eliminate lane-to-lane skew, ensuring high-speed data integrity and synchronization across multiple lanes.