



# Test, Repair and Reliability Challenges of Chip-let Interconnects

Dr. Sreejit Chakravarty, IEEE Fellow, Distinguished Engineer, Ampere Computing, Santa Clara, CA



### Agenda

- Background and Motivation
- Packaging Technology and Chip-Let Interconnect Defects
- Chip-Let Interconnect Reliability Issues





### CHIPLET BASED SoC EXAMPLES



SoC example using multiple interconnect technologies

- Chip-let based design
  - Spans multiple market segments
  - Uses multiple packaging technologies





### AI and Chip-Let Based Design









#### AI and Chip-Let Based Design

#### IEEE Spectrum: AMD MI300



To get everything to line up, the IOD chiplets had to be made as mirrors of each other, and the accelerator (XCD) and compute (CCD) chiplets had to be rotated. [Source: AMD]





### CHIPLET INTERCONNECT TREND: Test Implication

| Feature                               | Feature Trend                                             | Test Implication                                                                                                                                                                                                                              |
|---------------------------------------|-----------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Density                               | 100µm → 50µm → $\leq$ 10µm<br>[35-25µm in the market]     | <ul> <li>Higher defect level</li> <li>DPPM impact and increasing repair need</li> </ul>                                                                                                                                                       |
| Technology Change | µBump to Hybrid (direct) and combinations thereof | <ul><li>Defect profile change</li><li>Repair support change</li></ul> |
| Chiplet interconnect<br>count per SoC | 100s to 200,000 and counting<br>[Increasing very rapidly] | <ul> <li>DFT insertion and verification complexity increase</li> <li>Test time increase</li> <li>Greater need for repair to improve yield</li> <li>Need for in-field repair</li> <li>Repair cost (muxing logic, repair fusing cost)</li> </ul> |
| Frequency | 100s of MHz to 3.5+ GHz and counting | <ul> <li>Greater need to test for marginalities, in addition to defects</li> </ul> |





### Agenda

✓ Background and Motivation

- Packaging Technology and Chip-Let Interconnect Defects
- Chip-Let Interconnect Reliability Issues




### Advanced Packaging Intro [Source: UCIe 1.1]

- Figure 1-4. Advanced Package interface: Example 1. Die-1, Die-0 and Die-2 connected through silicon bridges (e.g. EMIB) in the package substrate.
- Figure 1-5. Advanced Package interface: Example 2. Die-1, Die-0 and Die-2 on an interposer (e.g. CoWoS) above the package substrate.
- Figure 1-6. Advanced Package interface: Example 3. Die-1, Die-0 and Die-2 on a fanout organic interposer (e.g., FOCoS-B) with silicon bridges, above the package substrate.

Packaging used is based on power, form factor, size, cost etc.






### Silicon Interposer Usage Examples









### µBump based Interconnects







### µBUMP DEFECT PROFILE







### µBUMP DEFECT PROFILE



- Vcc/Vss shorts are quite significant
  - Cannot be addressed during high volume manufacturing
  - Addressed through physical design rules
- Defects targetable during High Volume Manufacturing (HVM) tests
  - ~1/3 shorts between signal and Vcc/Vss
  - ~1/3 shorts between two signal lines
  - Slightly less than 1/3 multiple-line shorts
  - ~5% opens
- Shorts between clock and signal
- Shorts across clusters
- Coupling failures

Not covered by HBM

Not covered by UCIe

#### **Potential source of Test Escapes**
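As a quick consistency check on the HVM-targetable shares listed above, the sketch below assumes the four categories are exhaustive, so the "slightly less than 1/3" share of multiple-line shorts is whatever remains after the other three. The function name and dictionary keys are illustrative, not from the slides.

```python
from fractions import Fraction


def hvm_defect_shares():
    """Approximate shares of HVM-targetable uBump defects.

    Assumption: the four categories sum to 100%, so the multiple-line
    share is derived as the remainder rather than measured.
    """
    shares = {
        "signal_to_power_shorts": Fraction(1, 3),  # ~1/3 per the slide
        "two_signal_shorts": Fraction(1, 3),       # ~1/3 per the slide
        "opens": Fraction(5, 100),                 # ~5% per the slide
    }
    shares["multi_line_shorts"] = 1 - sum(shares.values())
    return shares


if __name__ == "__main__":
    for name, share in hvm_defect_shares().items():
        print(f"{name}: {float(share):.1%}")
```

Under that assumption the remainder is 17/60, roughly 28%, which agrees with "slightly less than 1/3".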



### µBump Repair Idea, Two Redundant Chip-Let Interconnects

Figure labels: Chip-Let Interconnect, Redundant Chip-Let Interconnect, Mux Select, 3:1 Mux, 2:1 Mux





- Alternate placements of redundant lanes are possible
- Clock and signal lanes can be shared/not-shared
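One way to picture the repair idea above is shift-based lane remapping: with two redundant lanes at the end of a group, each logical signal can be steered to its own lane or one of the next two, which is what a 3:1 mux per signal would allow. The sketch below is an illustration under that assumption, not the scheme from the slides; `assign_lanes` and `mux_selects` are hypothetical names.

```python
def assign_lanes(n_signals, n_redundant, faulty):
    """Map each logical signal to a working physical lane.

    Physical lanes are numbered 0 .. n_signals + n_redundant - 1 and the
    redundant lanes are assumed to sit at the end of the group.
    Returns a list of physical lane indices, or None if there are more
    faulty lanes than redundant ones.
    """
    n_physical = n_signals + n_redundant
    good = [lane for lane in range(n_physical) if lane not in faulty]
    if len(good) < n_signals:
        return None
    return good[:n_signals]


def mux_selects(assignment):
    """Shift (mux select) per logical signal: 0 means 'use own lane'."""
    return [physical - logical for logical, physical in enumerate(assignment)]


if __name__ == "__main__":
    lanes = assign_lanes(n_signals=4, n_redundant=2, faulty={1, 3})
    print(lanes, mux_selects(lanes))
```

Because signals keep their order, a shift never exceeds the number of redundant lanes, so the mux width is set by the redundancy count.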









### Hybrid Bonding

Source: EDM 2021 Intel Paper





### HYBRID BONDING DEFECT PROFILE



- Contaminants are the primary defect source
  - Surface near contaminant is concave/convex

- Clustered Open Defects
- Not addressed by either UCIe or HBM



**Broken Inter** 



### Hybrid Repair Idea for Clustered Opens





Redundant Block

**Block Definition** 



Each block is a collection of signals, Vcc, Vdd

Redundancy is built into the layout, primarily for Vcc and Vdd, sometimes for clocks

Don't add repair muxing logic to Vcc, Vdd, Clk [Saves area]

- Assume: Simple Buffer Tx/Rx
- Can SerDes Tx/Rx be used for very high-density interconnects? [<10µm]
- High repair-logic cost. Contained by reducing block size and sacrificing yield.
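The block-level repair idea can be sketched as follows: find which blocks a cluster of opens touches, and if the number of affected blocks does not exceed the number of redundant blocks, shift the remaining blocks toward the redundant one. This is a minimal sketch that assumes a one-dimensional lane numbering and redundant blocks at the end; a real bond-pad array is two-dimensional, and `plan_block_repair` is a hypothetical helper.

```python
def plan_block_repair(n_blocks, block_size, open_lanes, n_redundant_blocks=1):
    """Return the physical block used by each logical block, or None.

    Assumptions: lanes are numbered linearly, block b owns lanes
    [b * block_size, (b + 1) * block_size), and redundant blocks are the
    last n_redundant_blocks physical blocks.
    """
    n_physical = n_blocks + n_redundant_blocks
    bad_blocks = {lane // block_size for lane in open_lanes}
    if any(b < 0 or b >= n_physical for b in bad_blocks):
        raise ValueError("open lane index outside the physical array")
    if len(bad_blocks) > n_redundant_blocks:
        return None  # cluster touches more blocks than can be replaced
    good = [b for b in range(n_physical) if b not in bad_blocks]
    return good[:n_blocks]


if __name__ == "__main__":
    # Cluster inside block 2: blocks 2 and 3 shift by one position.
    print(plan_block_repair(4, 8, {17, 18, 19}))
    # Cluster straddling blocks 1 and 2: not repairable with one spare.
    print(plan_block_repair(4, 8, {15, 16}))
```

The second call shows why block size matters: a small cluster that straddles a block boundary consumes two blocks' worth of redundancy.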



### Hybrid Repair Idea for Clustered Opens

#### **Challenges**











### Agenda

✓ Background and Motivation

✓ Packaging Technology and Chip-Let Interconnect Defects

- Chip-Let Interconnect Reliability Issues



### Chip-let Interconnect Reliability Issues



- Typically, chip-let interconnect regions show up as hot spots in the SoC thermal map
- Why?
  - Power dissipation per unit area is high
    - Chiplet Tx/Rx run at a higher voltage
    - Capacitive load is high
  - Die shoreline area density is very high
- Impact of chiplet interconnects running hot
  - Chip-Let Interconnect defect aging
  - Device aging
- Aging leads to field failures
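A first-order way to see why higher voltage, high capacitive load and dense shoreline combine into a hot spot is the standard CMOS dynamic-power approximation. It is a textbook model, not a formula from the slides, and all symbols below are illustrative assumptions:

$$
P_{\text{lane}} \approx \alpha \, C_{\text{load}} \, V^{2} f,
\qquad
\text{power density} \approx \frac{N \, P_{\text{lane}}}{A}
$$

Here $\alpha$ is the switching activity, $C_{\text{load}}$ the capacitive load per lane, $V$ the Tx/Rx supply voltage, $f$ the toggle frequency, and $N$ the number of lanes packed into shoreline area $A$. Power grows with the square of the voltage and linearly with load and frequency, while a dense shoreline raises $N/A$.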





### Marginal Defects and Defect Aging: µBump







### Marginal Defects and Defect Aging: Hybrid Bonding

Redundant Block





- Direction of block shift for repair
- Contaminant
- Defective ("fully" open) Chip-Let Interconnect
- Marginally defective (partially open) Chip-Let Interconnect

**Latent defects:** not detected at time-0 but will fail over time



Expansion of the contaminant and surface warping



"Fully Open"



### Device Aging of Chip-Let Interconnects



- Running Hot leads to device and/or defect aging
- Over time, latent defects will show up as field failures
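To give a feel for how strongly temperature can accelerate aging, the sketch below uses the Arrhenius model, a common reliability approximation that the slides do not themselves cite. The activation energy and temperatures in the example are placeholders chosen for illustration, not measured values for any interconnect technology.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(ea_ev, t_ref_c, t_hot_c):
    """Aging acceleration at t_hot_c relative to t_ref_c (Arrhenius model).

    ea_ev is an assumed activation energy in eV; the right value depends
    on the failure mechanism and must come from characterization data.
    """
    t_ref_k = t_ref_c + 273.15
    t_hot_k = t_hot_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_ref_k - 1.0 / t_hot_k)
    return math.exp(exponent)


if __name__ == "__main__":
    # Hypothetical numbers: 0.7 eV, hot spot at 105 C vs. 85 C elsewhere.
    print(f"{arrhenius_acceleration(0.7, 85.0, 105.0):.2f}x faster aging")
```

Under this model a hot spot ages faster than cooler logic on the same die, which is why latent defects in the interconnect region may surface in the field first.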



### **Potential Mitigation Techniques**



- Frequency adjustment using Temperature Monitors
  - Slow down clock frequency to cool down the interconnect area
  - Very challenging to add Temp Monitors in such a dense area of the die
- Monitoring Signal Parameters
  - To identify if the interconnect is reaching its performance limits
  - Clock-jitter and jitter compensation
  - Clock duty cycle
  - Clock-data skews
  - These IPs have their own area overhead
  - Predicting failure from the collected data needs a data analysis infrastructure
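The prediction step mentioned above can be as simple as fitting a trend to a monitored parameter and extrapolating to its limit. The sketch below fits a least-squares line to samples of some parameter (for example clock-data skew) and estimates the time remaining until it crosses an upper limit. The linear-drift assumption, the function names and the sample values are all illustrative; real monitor data may call for a different model.

```python
def fit_line(times, values):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def time_to_limit(times, values, upper_limit):
    """Estimated time after the last sample until the fit hits upper_limit.

    Returns 0.0 if the fitted value is already at or above the limit and
    None if the parameter is not drifting upward.
    """
    slope, intercept = fit_line(times, values)
    latest = slope * times[-1] + intercept
    if latest >= upper_limit:
        return 0.0
    if slope <= 0:
        return None
    return (upper_limit - intercept) / slope - times[-1]


if __name__ == "__main__":
    hours = [0, 1, 2, 3]
    skew_ps = [1.0, 2.0, 3.0, 4.0]  # made-up samples
    print(time_to_limit(hours, skew_ps, upper_limit=10.0))
```

An estimate like this could be used to schedule a repair before the lane actually fails, at the cost of the monitoring IP and the data infrastructure noted above.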





### **Potential Mitigation Techniques**

- Test and Repair on boot-up/power-up/power-down
  - Feasible for products that are turned on and off frequently: automotive applications, smaller devices
    - The number of interconnects is relatively small
  - Not effective for large servers and AI engines used in data centers, AI model build etc.
    - These systems are used for very long running jobs
    - Large time interval between successive boot-ups
    - The number of interconnects is large
- Periodic Test and Repair
  - Periodically pause data transfer across chip-lets
  - Exercise the test and repair infrastructure to fix any issues or impending issues
  - Impacts performance
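The performance impact of periodic test and repair can be estimated with simple arithmetic: if the link is paused for a fixed test window once per period, the fraction of time available for data is one minus the ratio of the two. The sketch below assumes the test window is counted inside the period and ignores any drain or retraining time around the pause; the names and numbers are illustrative.

```python
def link_availability(test_time_s, period_s):
    """Fraction of time the link carries data under periodic test.

    Assumes one test window of test_time_s seconds in every period_s
    seconds, with no extra drain or retraining time around the pause.
    """
    if period_s <= 0 or not 0 <= test_time_s <= period_s:
        raise ValueError("require 0 <= test_time_s <= period_s and period_s > 0")
    return 1.0 - test_time_s / period_s


if __name__ == "__main__":
    # Hypothetical: a 1 ms test every 100 ms.
    print(f"{link_availability(0.001, 0.1):.1%} of link time remains")
```

Lengthening the period reduces the overhead but also lengthens the worst-case time a newly failed lane goes unnoticed, so the period is a trade-off between performance and detection latency.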





## Thank you! Q&A

Test, Repair and Reliability Challenges of Chip-Let Interconnects