# CMS Binary Chip status

#### TUPO - 30/3/2010

#### Outline

- intro + target specifications
- front end design, simulated performance
- current layout status
- system aspects
  - powering
  - module concepts, modularity issues
  - GBT interface
- summary

Mark Raymond

## **CBC** context

CBC targeted at phase II outer tracker region

 $r > \sim 50 \text{ cm}$ 

assumed instrumented by short strips

~ 2.5 / 5 cm

CBC under design since March 2009

Lawrence Jones (RAL engineer)

planned submission May 2010



## architectural choice

eventually converged on **binary un-sparsified** architecture

"digital APV" concept abandoned – unsparsified data volume too high

#### some advantages:

- simpler on-chip functionality should offer lowest possible FE power
- simpler readout system on-detector
  - occupancy independent data volume
  - no requirement for data buffering, data concentrating, or time-stamping
- synchronous, easy to identify upset chips
- easier to scale to future technologies
- easier to keep dynamic power variations small (no bursts of activity)
- simpler FE module design (fewer chips)

#### **CBC** main functional blocks

- fast front end amplifier 20 nsec peaking
- comparator with programmable threshold
- 256 deep pipeline (6.4 us)
- 32 deep buffer for triggered events
- output mux and driver (SLVS)
- fast (SLVS) and slow (I2C) control interfaces
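The pipeline depth and the quoted 6.4 us are linked by the sampling period. A minimal sketch, assuming one pipeline cell is written per 25 ns bunch crossing (the 25 ns period is an assumption here, it is not stated on this slide):

```python
# Assumed bunch-crossing period (40 MHz sampling clock); not quoted on the slide.
BX_PERIOD_NS = 25.0


def pipeline_latency_us(depth: int, bx_period_ns: float = BX_PERIOD_NS) -> float:
    """Maximum trigger latency covered by a pipeline of `depth` cells, in us."""
    return depth * bx_period_ns / 1000.0


if __name__ == "__main__":
    print(f"256 deep pipeline covers {pipeline_latency_us(256):.1f} us")
```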

[Figure: CBC block diagram. FE amp, comparator (V<sub>th</sub>), 256 deep pipeline + 32 deep buffer, digital MUX; pipeline control, fast control, slow control, bias generator, test pulse]

#### CBC – CMS Binary Chip

## **CBC** target specs

#### sensor related

- signal polarity: **both**
- coupling: DC (or AC)
- DC leakage: < 1 μA
- charge collection: < 10 ns
- strip pitch: > 60 μm

#### front end and comparator

- pulse shape: 20 nsec peaking time
- time-walk: < 16 nsec
- overload recovery: < 2.5 usec
- noise: < 1000e for 5 pF and 1 $\mu$A

#### digital

- latency: up to 256
- event buffering: up to 32
- attention to SEU tolerance

#### power

- supplies: 1.2 Volts analog, up to 1.2 Volts digital
- consumption: < 0.5 mW/channel for $C_{SENSOR}$ of 5 pF
- rejection: as good as possible (expect to be powered by DC-DC converters)



## design strategy

#### **SLHC environment**

CMS tracker at SLHC will operate at v. low temperatures maybe as low as -30 -> -40 degrees (but will still want to test and run chips and modules at room temperatures)

#### simulation conditions

specs should be met at -20 -> -40 deg. for all process corners

can accept some relaxation at room temperature – e.g. don't require full range of leakage current compensation at higher temperatures

front end should run at VDDA=1.1 V to provide headroom for LDO in supply rail to improve PSR

will present a short summary of simulated front end performance here

for more detail see design review talk:

http://icva.hep.ph.ic.ac.uk/~dmray/CBC\_documentation/frontend\_design\_review\_Oct\_09.pdf

### front end schematic



## front end - preamp

switches and T- network in feedback allow polarity switching

simulated preamp output pulse shapes below for:

- all process corners, 0 and 1 uA leakage
- T = -40 degrees
- signal 2 -> 8 fC in 1 fC steps (6 pF sensor capacitance)









\*http://icva.hep.ph.ic.ac.uk/~dmray/CBC\_documentation/frontend\_design\_review\_Oct\_09.pdf

### postamp output

op-amp based

AC coupled to preamp (any DC shifts due to I<sub>leak</sub> removed)

variable current through 16k output resistor allows DC adjustment of level to comparator (fine tuning)

8-bit precision on every channel: 0 - 200 mV, 0.8 mV lsb resolution (c.f. $\sim$ 50 mV / fC at postamp O/P)
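The quoted lsb can be checked against the postamp gain to see how fine the per-channel adjustment is in charge terms. A rough sketch, assuming an ideal 8-bit DAC with 256 equal steps over the 200 mV range and taking the gain as exactly 50 mV / fC (both simplifications; function names are illustrative):

```python
ELECTRON_CHARGE_FC = 1.602e-4  # 1 e = 1.602e-19 C = 1.602e-4 fC


def dac_lsb_mv(full_scale_mv: float = 200.0, bits: int = 8) -> float:
    """Step size of an ideal DAC spanning `full_scale_mv` in 2**bits steps."""
    return full_scale_mv / 2 ** bits


def lsb_in_electrons(lsb_mv: float, gain_mv_per_fc: float = 50.0) -> float:
    """Input-referred charge equivalent of one DAC step, in electrons."""
    return lsb_mv / gain_mv_per_fc / ELECTRON_CHARGE_FC


if __name__ == "__main__":
    lsb = dac_lsb_mv()
    print(f"lsb = {lsb:.2f} mV, about {lsb_in_electrons(lsb):.0f} electrons")
```

Under these assumptions one step corresponds to roughly 100 electrons, about a tenth of the 1000e noise target.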





### noise at postamp output



preamp input device power varied with C<sub>added</sub> to maintain pulse shape

note: power in figures above includes preamp and postamp

noise within spec. (doesn't vary much (< ~10%) with process variation)

### overload recovery at postamp output





T = -40 & +40, I<sub>leak</sub> = 0.5 uA, preamp Cin = 6 pF, all process corners

4 pC injected at t = 50 ns, 2.5 fC injected at t = 2.5 us

recovery spec. comfortably met (< 2.5 usec)



## time-walk at comparator O/P

dependence of comparator fire time on signal size must be less than 1 BX

#### **ATLAS specification**

 $\leq$  16 ns time difference between comparator output edges for input signals of 1.25 fC and 10 fC, for a threshold setting of 1 fC

(spec. defined for 300 µm sensors)







### some other bits





Pipeline control logic

SEU mitigation techniques employed

Three 8 bit counters (2 Gray code, 1 binary) which sequence the Write and Trigger Pointers. Latency separation check included.
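A property of Gray code that makes it attractive for pointer counters is that exactly one bit changes per increment, including at wrap-around. The sketch below illustrates that property together with a hypothetical latency separation check; the function names and the modulo comparison are illustrative assumptions, not the CBC logic itself.

```python
DEPTH = 256  # 8-bit pointers address a 256 deep pipeline


def to_gray(n: int) -> int:
    """Convert a binary count to its reflected Gray code."""
    return n ^ (n >> 1)


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which two code words differ."""
    return bin(a ^ b).count("1")


def latency_ok(write_ptr: int, trigger_ptr: int, latency: int,
               depth: int = DEPTH) -> bool:
    """True if the write pointer leads the trigger pointer by `latency` cells."""
    return (write_ptr - trigger_ptr) % depth == latency


if __name__ == "__main__":
    single_bit = all(
        bits_changed(to_gray(i), to_gray((i + 1) % DEPTH)) == 1
        for i in range(DEPTH)
    )
    print("one bit changes per step:", single_bit)
    print("separation check:", latency_ok(5, 205, 56))
```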





### powering

would like to explore some options

switched capacitor DC-DC (CERN)

- converts 2.5 V -> 1.25 V
- might be needed if tracker I/P voltage (~12 V) cannot be converted to ~1.25 V in one stage
- could use 1.25 V to provide CBC digital rail



### module concepts

#### hybrid, bonding, PA issues

CBC prototype will be "conventional" layout: 128 channels, effective pitch 50 um, wire-bondable

=> no special test setup preparations, generally easier to test

but 50 um not well matched to sensor pitch ~ 110 um

would like to avoid separate PA's in future system

=> pitch adaption somewhere - where?

hybrid? - possible but need space to fanout

would help if CBC input pad pitch better matched to sensor ~ 100 um pitch

#### other issues

would module design benefit from 256 (2 x 128 channels back-to-back)?

should we be looking at bump-bonding for chip/hybrid/sensor connection?



re-examine these issues for next iteration

### 128 -> 256 back-to-back?

can share some functionality e.g. power, control but not much else

reduced area, cheaper

fewer pads available on 256 version

less flexible

overall power consumption probably not much different

#### conclusions

256 back-to-back probably not impossible

but detailed study may yet reveal difficulties

bump-bond option will increase height by factor ~ 2, plus many other implications -> significant changes to layout

128 is still the best prototype unit for now



0201 capacitors

## modularity issues

need multiples of 4 x 128 channel chips to combine onto 80 Mbps e-links

GBT takes 40 e-links



need to be close to (on or not much under, not over) 40 link boundaries to make efficient use of GBT

### sLHC strips readout system



binary unsparsified output frame format similar to APV (just hits, not analog values)

CBC provides output data at 20 Mbps

keep data frame ~7  $\mu$ s (must be less than average L1 separation)

=> data from 4 CBCs combined onto one 80 Mbps lane, onto one GBT input

40 x 80 Mbps lanes combined on 3.2 Gb/s off-detector fibre (up to 160 CBCs / fibre)

link power ~ 20% overall channel power (assuming fully populated 2 W link)
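The numbers on this slide can be cross-checked with simple arithmetic. The sketch below assumes the 140-bit frame (128 hit bits plus a header of about 12 bits) used in the data-volume estimate later in this talk, and the 0.5 mW / channel power target; comparing the 2 W link to the front end power alone is one possible reading of "overall channel power".

```python
FRAME_BITS = 140                # 128 hit bits + ~12 header bits (assumed frame length)
CBC_RATE_MBPS = 20.0            # CBC output data rate
CHIPS_PER_LANE = 4              # 4 x 20 Mbps -> one 80 Mbps lane
LANES_PER_GBT = 40              # 40 x 80 Mbps -> 3.2 Gb/s
CHANNELS_PER_CHIP = 128
FE_POWER_MW_PER_CHANNEL = 0.5   # target spec, not a measured value
LINK_POWER_W = 2.0              # fully populated link


def frame_time_us(bits: int = FRAME_BITS, rate_mbps: float = CBC_RATE_MBPS) -> float:
    """Readout time of one frame: bits / (Mbit/s) gives microseconds."""
    return bits / rate_mbps


def chips_per_fibre() -> int:
    """Number of CBCs served by one fully used GBT fibre."""
    return CHIPS_PER_LANE * LANES_PER_GBT


def link_to_fe_power_ratio() -> float:
    """Link power relative to the front end power of the chips it serves."""
    fe_power_w = chips_per_fibre() * CHANNELS_PER_CHIP * FE_POWER_MW_PER_CHANNEL / 1000.0
    return LINK_POWER_W / fe_power_w


if __name__ == "__main__":
    print(f"frame time: {frame_time_us():.1f} us")
    print(f"CBCs per fibre: {chips_per_fibre()}")
    print(f"link / FE power: {link_to_fe_power_ratio():.1%}")
```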

#### combining chips output data using CERN e-port IP core



→ 80 Mbps

use e-port (e-link) to communicate with GBT (CERN IP core)

- automatically takes care of synchronization
- also has receive data path, but we will probably not use it (plan to use I<sup>2</sup>C bus for slow control)

all FE chips produce 20 Mbps output data frame

CBC1 programmed to be master

- combines pairs of CBC data streams into 2 x 40 Mbps (compatible with e-port requirements)
- e-port combines 40 Mbps streams to produce 80 Mbps
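The two-stage combination (4 x 20 Mbps -> 2 x 40 Mbps -> 80 Mbps) can be pictured as repeated 2:1 multiplexing. The sketch below assumes simple bit-by-bit interleaving; the actual multiplexing order used by the master chip and the e-port is not given in this talk, so this is an illustration only.

```python
from typing import List, Sequence


def interleave(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Bit-interleave two equal-length streams into one at twice the rate."""
    if len(a) != len(b):
        raise ValueError("streams must have equal length")
    out: List[int] = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out


def combine_four(s0: Sequence[int], s1: Sequence[int],
                 s2: Sequence[int], s3: Sequence[int]) -> List[int]:
    """Combine four 20 Mbps streams into one 80 Mbps stream in two stages."""
    lane_a = interleave(s0, s1)        # master stage: 2 x 20 -> 40 Mbps
    lane_b = interleave(s2, s3)        # master stage: 2 x 20 -> 40 Mbps
    return interleave(lane_a, lane_b)  # e-port stage: 2 x 40 -> 80 Mbps


if __name__ == "__main__":
    # Label each source stream with its own index to make the ordering visible.
    print(combine_four([0, 0], [1, 1], [2, 2], [3, 3]))
```

In the same time window the output carries four times as many bits as any single input, which is the 20 -> 80 Mbps rate increase.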

will probably not implement e-port in CBC May '10 submission

- some aspects still under development

but nevertheless a clear route to provide the CBC->GBT system interface in the future

note: single lines shown outside chips but assume all differential SLVS using CERN SLVS interface driver/receiver

### alternative scheme

put e-port in separate GBT interface chip

another chip on hybrid, but could offer some future flexibility

. . . an option to consider in the future

[Figure: CBC1 chips (20 Mbps each) feeding a separate GBT interface chip containing the e-port (40 MHz, 40 Mbps internal streams, 80 Mbps output)]

incorporates circuitry to combine CBC data streams

### test structures

lots of space available

test structures to include will be:

one complete dummy channel with buffered signals along chain (top edge of chip)

e.g. preamp, postamp and comp. O/P's

pads on bias generator outputs

allows direct measurement and/or over-ride

other test structures

. . . .

e.g. individual components and arrays



#### summary

CBC development at advanced stage

- most of chip design complete
- front end meets all specs for simulated performance: -40 -> +40, all process corners, VDD = 1.1 V (minor restrictions for leakage current tolerance if collecting holes)

May submission – CERN MPW

- => plenty of chips
- expect back ~ September

system issues

- compatible with DC-DC powering schemes
- some options to study (with/without switched cap. DC-DC and/or LDO)
- route to compatibility with GBT system seems clear: can incorporate e-link in future

future?

- CBC targeted at phase II – now a long way away
- final chip/system may not look much like today's concepts

will learn a lot from this prototype

- gain some experience with 130 nm
- lots of functionality and performance issues to study
- useful information for future chip and system developments

### extra

### **CBC** power estimate



#### 0.5 mW / channel seems like an achievable target (c.f. 2.7 mW for APV25)

- digital is biggest uncertainty, and maybe largest contributor
- hope to improve estimate as design progresses
- can consider running at lower voltage (dig. power ~ V<sup>2</sup>) => extra contingency
  - e.g. 1.2 -> 0.85 V: power consumption halved
- will keep power rails separate on chip to keep option open

#### using numbers above: 128 chan. chip needs ~ 20 mA analogue, ~30 mA digital
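Both the V<sup>2</sup> scaling and the current budget can be checked numerically. A small sketch, assuming digital power scales purely as V<sup>2</sup> and that both rails sit at 1.2 V for the budget check (the rail voltages used for the quoted currents are not stated here, so that is an assumption):

```python
def scaled_digital_power(p_nominal: float, v_nominal: float, v_new: float) -> float:
    """Digital power after a supply change, assuming P is proportional to V**2."""
    return p_nominal * (v_new / v_nominal) ** 2


def chip_power_mw(i_analog_ma: float, i_digital_ma: float,
                  v_analog: float, v_digital: float) -> float:
    """Total chip power in mW from rail currents (mA) and voltages (V)."""
    return i_analog_ma * v_analog + i_digital_ma * v_digital


if __name__ == "__main__":
    ratio = scaled_digital_power(1.0, 1.2, 0.85)
    print(f"digital power at 0.85 V relative to 1.2 V: {ratio:.2f}")

    total = chip_power_mw(20.0, 30.0, 1.2, 1.2)
    print(f"chip power: {total:.0f} mW, {total / 128:.2f} mW / channel")
```

With these assumptions the 20 mA + 30 mA budget lands just under the 0.5 mW / channel target.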

### front end PSR without LDO supply



#### time domain picture

measured noise waveform added to VDD rail supplying FE circuit

sampled scope data for Enpirion "quiet" converter provided by Aachen

but x10 to (artificially) make it noisier

~ 80 mV pk-pk

1 fC normal signal completely swamped by noise

**Ref:** http://indico.cern.ch/getFile.py/access?contribId=24&sessionId=0&resId=0&materialId=slides&confId=47293

### front end PSR with LDO supply



measured **x10** (80 mV pk-pk) noise waveform now added to LDO Vin

LDO loaded by single CBC frontend + 25 mA extra dummy load

1 fC signal at postamp O/P now appears

postamp O/P noise just visible

~ 125e pk-pk

## matching CBC modules to GBT links

Duccio's FNAL workshop talk – October '09

| Tag                  | OB_L1   | OB_L2   | OB_L3   | OB_L4   | EC_R1   | EC_R2   | EC_R3   | EC_R4   | EC_R5   | EC_R6   | EC_R7   |
|----------------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
| Type                 | rphi    | rphi    | rphi    | rphi    | rphi    | rphi    | rphi    | rphi    | rphi    | rphi    | rphi    |
| Area (mm²)           | 8475.8  | 8475.8  | 8475.8  | 8475.8  | 8475.8  | 8475.8  | 8475.8  | 8475.8  | 8475.8  | 8475.8  | 8475.8  |
| Occup (max/av) (%)   | 2.7/2.6 | 3.2/3.0 | 1.9/1.8 | 0.8/0.8 | 3.9/3.5 | 2.8/2.5 | 2.2/2.0 | 3.3/3.0 | 2.6/2.4 | 2.0/1.8 | 1.7/1.5 |
| Pitch (min/max) (μm) | 110     | 110     | 110     | 110     | 110     | 110     | 110     | 110     | 110     | 110     | 110     |
| Segments x Chips     | 4x6     | 2x6     | 2x6     | 2x6     | 4x6     | 4x6     | 4x6     | 2x6     | 2x6     | 2x6     | 2x6     |
| Strip length (mm)    | 24.9    | 49.8    | 49.8    | 49.8    | 24.9    | 24.9    | 24.9    | 49.8    | 49.8    | 49.8    | 49.8    |
| Chan/Sensor          | 3072    | 1536    | 1536    | 1536    | 3072    | 3072    | 3072    | 1536    | 1536    | 1536    | 1536    |
| N. mod               | 960     | 1248    | 1536    | 2016    | 400     | 480     | 560     | 600     | 680     | 760     | 800     |
| Channels (M)         | 2.95    | 1.92    | 2.36    | 3.1     | 1.23    | 1.47    | 1.72    | 0.92    | 1.04    | 1.17    | 1.23    |
| Power (kW)           | 1.5     | 1       | 1.2     | 1.5     | 0.6     | 0.7     | 0.9     | 0.5     | 0.5     | 0.6     | 0.6     |
| N of GBTs            | 160     | 104     | 128     | 168     | 80      | 80      | 120     | 60      | 60      | 80      | 80      |
| GBT power (kW)       | 0.3     | 0.2     | 0.3     | 0.3     | 0.2     | 0.2     | 0.2     | 0.1     | 0.1     | 0.2     | 0.2     |

| Barrel layer             | OB_L1 | OB_L2 | OB_L3 | OB_L4 |
|--------------------------|-------|-------|-------|-------|
| ~ radius [cm]            | 50    | 65    | 85    | 110   |
| circumference [cm]       | 314   | 408   | 534   | 692   |
| rods per ½ barrel        | 40    | 52    | 64    | 84    |
| 128 channel chips/module | 24    | 12    | 12    | 12    |
| chips / rod              | 288   | 144   | 144   | 144   |
| 80 Mbps lanes / rod      | 72    | 36    | 36    | 36    |
| (40 lane) GBTs / rod     |       |       |       |       |

- module dimensions 10 x 8.5 cm<sup>2</sup> (z x rphi)
- 1/2 barrel 120 cm long => 12 modules / 1/2 barrel rod
- GBT lanes not fully utilised, but integer no. of GBTs per rod is presumably optimal
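The number of 40-lane GBTs per rod follows from the chips / rod figures once an integer number of GBTs per rod is required. A sketch of that arithmetic, assuming 4 chips per 80 Mbps lane and rounding up to whole GBTs (the rounding rule is an assumption based on the remark about integer numbers of GBTs per rod):

```python
import math
from typing import Tuple

CHIPS_PER_LANE = 4   # four 20 Mbps CBCs share one 80 Mbps lane
LANES_PER_GBT = 40

# chips / rod for the four barrel layers, taken from the table
CHIPS_PER_ROD = {"OB_L1": 288, "OB_L2": 144, "OB_L3": 144, "OB_L4": 144}


def gbts_per_rod(chips: int) -> Tuple[int, int, float]:
    """Return (lanes, whole GBTs needed, fraction of GBT lanes used)."""
    lanes = math.ceil(chips / CHIPS_PER_LANE)
    gbts = math.ceil(lanes / LANES_PER_GBT)
    utilisation = lanes / (gbts * LANES_PER_GBT)
    return lanes, gbts, utilisation


if __name__ == "__main__":
    for layer, chips in CHIPS_PER_ROD.items():
        lanes, gbts, used = gbts_per_rod(chips)
        print(f"{layer}: {lanes} lanes, {gbts} GBT(s), {used:.0%} of lanes used")
```

Under these assumptions each rod sits just under a 40-lane boundary, which is the efficient side of the boundary to be on.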

#### unsparsified binary advantages

data volume known - no trigger-to-trigger variations - occupancy independent

=> simpler readout system on-detector

FE chips -> GBT -> link

no extra data buffering and concentrating chips in the system

=> simpler system off-detector too I think

known and unchanging origin and volume of data – must simplify processing

synchronous system - all FE chips doing same thing at same time - easy to emulate externally

no need to timestamp on front end

easy to spot upset chips (pipe address wrong)

likely to be lowest power FE chip architecture

analog FE + comparator followed by simple digital pipeline and off-chip mux

no ADC, no analog pipeline readout

easier to scale designs to even finer feature processes

(analog pipelines using gate capacitance probably only just possible in 0.13)

easy to keep dynamic power variations small or negligible

simpler FE module design (fewer chips)

#### unsparsified binary disadvantages

no pulse height information

binary worse for position resolution

common mode immunity – (short strips will help – less pickup)

larger data volume => more off-detector links

#### data volumes

unsparsified data volume / 128 channel chip = 128 bits + header (say 12) = 140 bits

sparsified data volume / hit

- 8 bit chip address
- 7 bit channel address (x no. of channels above threshold)
- 8 bit timestamp

occupancy 0.8%, 1 hit, data volume = 23 bits

occupancy 4%, 5 hits, data volume = 51 bits

so factor ~ 3-6 (depending on occupancy) inefficient assuming 100% link bandwidth utilised

factor ~ 1.5-3 inefficient assuming 50% link bandwidth utilised
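The inefficiency factors can be reproduced from the bit counts above. A minimal sketch, taking the sparsified format as one chip address and one timestamp per event plus one channel address per hit, and modelling partial link utilisation as a simple scaling of the sparsified volume (that last step is an assumption about how the 50% case is meant):

```python
def unsparsified_bits(channels: int = 128, header: int = 12) -> int:
    """Fixed frame size per chip per trigger."""
    return channels + header


def sparsified_bits(hits: int, chip_addr: int = 8, chan_addr: int = 7,
                    timestamp: int = 8) -> int:
    """Frame size per chip per trigger when only hit channels are sent."""
    return chip_addr + timestamp + hits * chan_addr


def inefficiency(hits: int, link_utilisation: float = 1.0) -> float:
    """Ratio of unsparsified to sparsified volume at a given link utilisation."""
    return unsparsified_bits() * link_utilisation / sparsified_bits(hits)


if __name__ == "__main__":
    for hits in (1, 5):
        print(f"{hits} hit(s): {sparsified_bits(hits)} bits, "
              f"factor {inefficiency(hits):.1f} at 100%, "
              f"{inefficiency(hits, 0.5):.1f} at 50%")
```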

(do we need to provide for heavy ion running in CMS at SLHC?)