





# SCATTER: <u>A</u>lgorithm-<u>C</u>ircuit Co-<u>S</u>parse Photonic Accelerator with <u>T</u>hermal-<u>T</u>olerant, Power-<u>E</u>fficient In-situ Light <u>R</u>edistribution

<u>Dennis Yin<sup>1</sup></u>, Nicholas Gangi<sup>2</sup>, Meng Zhang<sup>2</sup>, <u>Jiaqi Gu<sup>1</sup></u>, Jeff Zhang<sup>1</sup>, Rena Huang<sup>2</sup>

<sup>1</sup>Arizona State University, School of Electrical, Computer and Energy Engineering; <sup>2</sup>Rensselaer Polytechnic Institute

<u>ziangyin@asu.edu</u> | <u>jiaqigu@asu.edu</u> | <u>scopex-asu.github.io</u>





Source: https://openai.com/blog/ai-and-compute/

Source: https://spectrum.ieee.org/nvidias-next-gpu-shows-that-transformers-are-transforming-ai

# **Electrical Computing vs Photonic Computing**



# **Photonic AI System is Booming**

#### **Photonic AI Trends in Academia**

#### **Foundry / EPDA Support in Industry**



# **SCATTER Advantages Over Other Types of PTCs**



- Lacks universality
- Phase/Thermal sensitive
- Need large device spacing to reduce crosstalk
- Analog device/circuit noise
- Large on-chip area cost
- Power-consuming E-O conversion

Z. Yin, N. Gangi, M. Zhang, J. Zhang, R. Huang, J. Gu, "SCATTER: Algorithm-Circuit Co-Sparse Photonic Accelerator with Thermal-Tolerant, Power-Efficient In-situ Light Redistribution," *ACM/IEEE ICCAD*, 2024.



- Lacks universality → Universal full-range PTC
- Phase/Thermal sensitive
- Need large device spacing to reduce crosstalk
- Analog device/circuit noise
- Large on-chip area cost
- Power-consuming E-O conversion




# **Universal Full Range Photonic Tensor Core**

- Full-range dot-product engine
  - 1 × 2 MZI power splitter
  - Balanced photodetectors enable differential output
  - Combining the two designs above enables full-range weight representation

$$I_{out} = \cos(\Delta\phi + \phi_b)\,x, \quad \phi_b = \frac{\pi}{2}, \quad \Delta\phi \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$$

$$W_{ij} = \cos(\Delta\phi + \phi_b) \in [-1, 1]$$
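The full-range encoding above can be checked numerically. The following is a minimal sketch, assuming an idealized, noise-free device in which each weight is realized exactly as $\cos(\Delta\phi + \phi_b)$ with $\phi_b = \pi/2$. The function names `weight_to_phase` and `dot_product` are hypothetical and are not part of the SCATTER codebase or TorchONN.

```python
import numpy as np

PHI_B = np.pi / 2  # bias phase phi_b from the slide


def weight_to_phase(w):
    """Map target weights in [-1, 1] to phase shifts in [-pi/2, pi/2].

    Since cos(dphi + pi/2) = -sin(dphi), solving for dphi gives
    dphi = -arcsin(w). Hypothetical helper for illustration only.
    """
    w = np.clip(np.asarray(w, dtype=float), -1.0, 1.0)
    return -np.arcsin(w)


def dot_product(x, w):
    """Idealized full-range dot product: sum_j cos(dphi_j + phi_b) * x_j."""
    dphi = weight_to_phase(w)
    return float(np.sum(np.cos(dphi + PHI_B) * np.asarray(x, dtype=float)))


# Signed weights, including the extreme value -1, are all representable.
x = np.array([0.2, 0.5, 1.0])
w = np.array([-1.0, 0.25, 0.8])
phases = weight_to_phase(w)
result = dot_product(x, w)
```

Under these assumptions `result` agrees with the ordinary signed dot product of `x` and `w`, which is the sense in which the representation is full-range.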



- Lacks universality → Universal full-range PTC
- Phase/Thermal sensitive → Incoherent tensor core and symmetrical placement
- Need large device spacing to reduce crosstalk
- Analog device/circuit noise
- Large on-chip area cost
- Power-consuming E-O conversion



#### **Phase/Thermal-Insensitive Photonic Tensor Core**






- Thermal-Insensitive Design
  - Broadband device
  - Symmetrical device placement
  - Phase errors on the two arms partially cancel out
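The cancellation argument can be illustrated with a toy model. This is a sketch under a stated assumption: the realized weight depends only on the phase difference between the two arms, so any drift common to both arms drops out and only the mismatch remains. The function `mzi_weight` and its drift parameters are hypothetical names for illustration, not the paper's device model.

```python
import numpy as np

PHI_B = np.pi / 2  # bias phase, as in the full-range weight encoding


def mzi_weight(phi_upper, phi_lower, drift_upper=0.0, drift_lower=0.0):
    """Weight of a two-arm device that depends only on the arm phase difference.

    drift_upper / drift_lower model thermally induced phase errors on each arm
    (hypothetical parameters for this sketch).
    """
    dphi = (phi_upper + drift_upper) - (phi_lower + drift_lower)
    return np.cos(dphi + PHI_B)


ideal = mzi_weight(0.3, 0.0)
# Symmetric placement: both arms see the same drift, which cancels completely.
common = mzi_weight(0.3, 0.0, drift_upper=0.10, drift_lower=0.10)
# Nearly matched drift: only the 0.02 rad mismatch perturbs the weight.
partial = mzi_weight(0.3, 0.0, drift_upper=0.10, drift_lower=0.08)
# Unmatched drift on one arm only: the full 0.10 rad error shows up.
one_sided = mzi_weight(0.3, 0.0, drift_upper=0.10, drift_lower=0.0)
```

In this model the error under partially matched drift is smaller than under one-sided drift, which is the motivation for placing the two arms symmetrically.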




How to further reduce the crosstalk with a compact layout?



- Lacks universality → Universal full-range PTC
- Phase/Thermal sensitive → Incoherent tensor core and symmetrical placement
- Need large device spacing to reduce crosstalk → Row pruning and output gating
- Analog device/circuit noise
- Large on-chip area cost
- Power-consuming E-O conversion





# **Row Pruning + Output Gating to Reduce Crosstalk**

- Row-wise *interleaved* structural pruning + output gating
  - Reduce the error induced by crosstalk
  - Save the power of unused TIA/ADCs
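The idea of interleaved row pruning with output gating can be sketched in a few lines. This is only an illustration, assuming a simple keep-every-other-row pattern; the actual pruning pattern and gating circuit in SCATTER may differ, and crosstalk itself is not modeled here. The names `interleaved_row_mask`, `gated_matvec`, `keep_every`, and `offset` are hypothetical.

```python
import numpy as np


def interleaved_row_mask(num_rows, keep_every=2, offset=0):
    """Boolean mask that keeps one row out of every `keep_every` rows.

    Kept rows are never adjacent (for keep_every >= 2), so active devices
    end up spaced further apart on the layout.
    """
    mask = np.zeros(num_rows, dtype=bool)
    mask[offset::keep_every] = True
    return mask


def gated_matvec(W, x, row_mask):
    """Matrix-vector product in which pruned rows are gated at the output.

    Gated rows contribute exactly zero, modeling a powered-down TIA/ADC.
    """
    y = np.zeros(W.shape[0])
    y[row_mask] = W[row_mask] @ x
    return y


W = np.arange(16, dtype=float).reshape(4, 4)
x = np.ones(4)
mask = interleaved_row_mask(4)        # rows 0 and 2 stay active
y = gated_matvec(W, x, mask)          # rows 1 and 3 read out as zero
active_readouts = int(mask.sum())     # number of TIA/ADC channels left on
```

Here half of the readout channels are switched off, and the surviving rows are separated by a pruned row, which is the mechanism the slide credits with lowering crosstalk-induced error.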







[Figure: row-pruned tensor core with gated TIA/ADC outputs; device spacing $5\,\mu m$]



# **Column Pruning + Input Gating?**




- Lacks universality → Universal full-range PTC
- Phase/Thermal sensitive → Incoherent tensor core and symmetrical placement
- Need large device spacing to reduce crosstalk → Row pruning and output gating
- Analog device/circuit noise → Light redistribution
- Large on-chip area cost
- Power-consuming E-O conversion






# **Light Redistribution to Reduce Leakage and PD Error**



#### **Crosstalk and Power-Aware Dynamic Sparse Training**













- Lacks universality → Universal full-range PTC
- Phase/Thermal sensitive → Incoherent tensor core and symmetrical placement
- Need large device spacing to reduce crosstalk → Row pruning and output gating
- Analog device/circuit noise → Light redistribution and input gating
- Large on-chip area cost → Customized device and structural pruning
- Power-consuming E-O conversion → Device gating and hybrid eoDAC





# **Single-Link Chip Testing and Error Calibration**





Testing vs. simulation: 0.8% to 3% relative square error, 6- to 7-bit resolution

[Courtesy: Prof. Rena Huang's group for testing]

# **Application Evaluation and Efficiency Comparison**







Open-Source TorchONN Toolchain

Automating optical AI hardware design toward productivity

# Thank you! Q & A?

SCATTER: <u>A</u>lgorithm-<u>C</u>ircuit Co-<u>S</u>parse Photonic Accelerator with <u>T</u>hermal-<u>T</u>olerant, Power-<u>E</u>fficient In-situ Light <u>R</u>edistribution

> Ziang Yin<sup>1</sup>, Nicholas Gangi<sup>2</sup>, Meng Zhang<sup>2</sup>, Jeff Zhang<sup>1</sup>, Rena Huang<sup>2</sup>, Jiaqi Gu<sup>1†</sup> <sup>1</sup>Arizona State University, <sup>2</sup>Rensselaer Polytechnic Institute




