CALICE MAPS Interim Design Review 1, Part 2, RAL, 25/01/07
==========================================================
Present: Andy Clark, Jamie Crooks, Paul Dauncey, Matt Noy, Marcel
 Stanitzki, Konstantin Stefanov, Renato Turchetta

Minutes: Paul


Overview: The width required (and hence dead area) for the logic columns
 will not be known until the layout is complete. The width should be
 rounded up to a multiple of the pixel size so that it represents an
 integer number of dead pixels. This is likely to be 200mu = 4 pixels.

 The issue of reflecting neighbouring sectors and hence lumping the dead
 areas for two logic columns together was discussed. The optimal
 arrangement is not clear; an EM shower will have a transverse size ~9mm
 and so is roughly of the same order as the live areas ~2mm. It was felt
 it would be better to have random inefficiencies so they do not depend
 on where the shower occurred, which would argue for not reflecting the
 layout for neighbouring sectors, but translating it. However, it might
 be that the logic columns need e.g. ~3.5 pixels space, so combining
 them could be done in 7 rather than 8 pixels, reducing the dead area.
 Unless this latter happens to occur, translating the designs would
 be preferred.

 Each sector will handle 42 pixels which will be subdivided into seven
 groups of six neighbouring pixels in terms of the memory readout. These
 require six bits for the six pixels and three for the group label. The
 timestamp will have 13 bits (up to 8k bunch crossings), which was
 considered reasonable, although 14 bits (up to 16k bunch crossings)
 would have definitely been safe. This means each memory location stores
 6+3+13 = 22 bits. There will be a total of 19 memory locations (i.e.
 possible hits) for each sector of 42 pixels. The addresses for the
 seven groups of pixels are 1-7, with 0 reserved for no valid pixel. The
 addresses will have to be set externally, cycling from 0 over all seven
 valid values and back to zero at ~50MHz, i.e. between each bunch
 crossing. The second sensor design should have this implemented on the
 sensor.
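
 As an illustration of the 22-bit memory word described above, the
 following is a minimal sketch of packing and unpacking one memory
 location. The field widths are taken from these minutes; the bit
 ordering (timestamp in the upper bits, then group address, then pixel
 pattern) and the function names are assumptions made for the example
 only and were not specified in the review.

```python
# Field widths from the minutes; the ordering of the fields is an assumption.
PATTERN_BITS = 6     # one bit per pixel in a group of six
GROUP_BITS = 3       # group address 1-7, with 0 reserved for "no valid pixel"
TIMESTAMP_BITS = 13  # up to 8192 bunch crossings


def pack_hit(timestamp, group, pattern):
    """Pack one memory location into a 22-bit integer (hypothetical layout)."""
    if not 0 <= timestamp < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp does not fit in 13 bits")
    if not 1 <= group <= 7:
        raise ValueError("valid group addresses are 1-7")
    if not 0 <= pattern < (1 << PATTERN_BITS):
        raise ValueError("pixel pattern does not fit in 6 bits")
    return (timestamp << (GROUP_BITS + PATTERN_BITS)) | (group << PATTERN_BITS) | pattern


def unpack_hit(word):
    """Inverse of pack_hit: return (timestamp, group, pattern)."""
    pattern = word & ((1 << PATTERN_BITS) - 1)
    group = (word >> PATTERN_BITS) & ((1 << GROUP_BITS) - 1)
    timestamp = word >> (GROUP_BITS + PATTERN_BITS)
    return timestamp, group, pattern
```

 For example, unpack_hit(pack_hit(100, 3, 0b000101)) returns the same
 three values that were packed.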

 The mask register shown in the slides is now obsolete; it is now
 implemented in the individual pixels.

 There is a global "force hit" input signal which is tracked over the
 whole sensor, which is simply OR'ed with the normal pixel output.
 This is downstream of the mask and so is not influenced by it, i.e.
 masking will have no effect and all pixels will respond to this signal.
 The signal can be changed at any time, specifically during a bunch
 train.
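
 The logic described above can be sketched as a simple boolean model.
 The function and argument names are hypothetical and only illustrate
 that the force hit is combined after the mask.

```python
def pixel_output(comparator_fired, masked, force_hit):
    """Illustrative model of one pixel's output to the logic column."""
    # The mask is applied to the normal pixel output first.
    masked_hit = comparator_fired and not masked
    # The global force-hit signal is ORed in downstream of the mask,
    # so a masked pixel still responds to it.
    return masked_hit or force_hit
```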

 The configuration data (mask and trim DAC setting) are loaded into a
 serial shift register 168 bits at a time (one per pixel column) and
 then the whole SR is parallel-loaded into the pixels, before being
 refilled and the process repeated. However, the parallel load is not
 destructive so the same SR data can be loaded multiple times. This
 would in principle be possible during a bunch train. However, the mask
 and trim bits are not loaded separately so any pattern sent would 
 disrupt the trim DAC settings, reducing the utility of this. It seemed
 more useful to keep to the usual pattern of loading the configuration
 data before the bunch train and, if desired, having short bunch trains.

 There is a memory overflow flag per 84x84 pixel bank, for a total of
 four flags in all, which go directly to output pins on the sensor.

 The 22 bits saved in the memory are combined with another 9 bits (the
 "row encoder") which identify the 42-pixel sector at readout, resulting
 in a 31 bit word. The row encoder bits are hardcoded into each logic
 column row. All 9 bits are not strictly needed for the first sensor as
 there are only 168 sectors in each column, which would need 8 bits to
 label. However, the second sensor will have 5/2 times the size in both
 dimensions (i.e. 420x420 pixels total compared with 168x168 pixels)
 giving 420 sectors per column, which will require 9 bits. Hence 9 bits
 have been reserved already. The data from the four columns cannot be
 distinguished internally; this information must be provided by the
 external readout control.
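
 The bit counts quoted above can be checked with a short sketch. The
 placement of the row encoder in the upper 9 bits of the 31-bit word is
 an assumption for illustration; the minutes do not state the ordering.

```python
ROW_BITS = 9      # reserved row encoder width
MEMORY_BITS = 22  # bits stored per memory location


def label_bits(n_sectors):
    """Minimum number of bits needed to label n_sectors distinct sectors."""
    return (n_sectors - 1).bit_length()


def readout_word(row, memory_word):
    """Build a 31-bit readout word (assumed layout: row encoder on top)."""
    if not 0 <= row < (1 << ROW_BITS):
        raise ValueError("row encoder value does not fit in 9 bits")
    if not 0 <= memory_word < (1 << MEMORY_BITS):
        raise ValueError("memory word does not fit in 22 bits")
    return (row << MEMORY_BITS) | memory_word
```

 Here label_bits(168) gives 8 and label_bits(420) gives 9, matching the
 numbers for the first and second sensors.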

 The total memory in each logic column is then 22x19x168 = 70224 bits,
 which gives a total over the whole sensor, with four columns, of
 280896 bits, i.e. ~35kBytes. However, on readout, the extra 9 bits
 which make the words up to 31 bits (rounded externally to 4 bytes)
 would then result in a maximum data volume per sensor of ~51kBytes.
 The equivalent for the second sensor will be (5/2)^2 = 25/4 larger and
 so is ~319kBytes.
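
 The arithmetic above can be reproduced as follows. All input numbers
 are from these minutes; the variable names are for illustration only.

```python
BITS_PER_LOCATION = 22      # 6 pixel bits + 3 group bits + 13 timestamp bits
LOCATIONS_PER_SECTOR = 19   # memory locations per 42-pixel sector
SECTORS_PER_COLUMN = 168
LOGIC_COLUMNS = 4
READOUT_BYTES_PER_WORD = 4  # 31-bit word rounded externally to 4 bytes

# On-sensor memory.
bits_per_column = BITS_PER_LOCATION * LOCATIONS_PER_SECTOR * SECTORS_PER_COLUMN
total_bits = bits_per_column * LOGIC_COLUMNS
total_bytes = total_bits // 8

# Maximum data volume on readout, when every location is filled.
words_per_sensor = LOCATIONS_PER_SECTOR * SECTORS_PER_COLUMN * LOGIC_COLUMNS
readout_bytes = words_per_sensor * READOUT_BYTES_PER_WORD

# Second sensor: 5/2 larger in both dimensions, so 25/4 larger in area.
second_sensor_readout_bytes = readout_bytes * 25 // 4

print(bits_per_column, total_bits, total_bytes)
print(readout_bytes, second_sensor_readout_bytes)
```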


Logic simulations: The comments refer to pages in the document used in
 the review.

 Pg 5: All the transistors work at 1.8V except for a few special cases,
 such as writing to the SRAM, where the signal has to overpower the SRAM
 transistor settings. These are the "HV logic" which will be at 3.3V.

 Pg 7: The variation within a sensor of the monostable capacitors is
 likely to be much smaller than the sensor-to-sensor variations, so the
 monostable times are likely to be much more similar than the study here
 implies. Taking the plot on pg 10 at face value, then a bias setting
 would be needed which ensures the shortest monostables are just large
 enough to be guaranteed to be hit for one bunch crossing tick, i.e.
 150ns. However, the average length would then be ~10% longer and the
 maximum ~20% longer. These would give rates of ~10% and 20% of double
 hits, respectively, hence giving an average of ~10% double hits. Note,
 the ratio of double to single hits will allow the monostable length for
 each pixel to be measured.
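
 A minimal sketch of how the monostable length could be extracted from
 the double-hit counts is given below. It assumes that hits arrive with
 a uniformly random phase relative to the bunch crossing clock and that
 the monostable length lies between one and two ticks, so that the
 fraction of double hits equals the fractional excess over one tick.
 This simple model is an assumption made for illustration, not a result
 from the simulations; the function names are hypothetical.

```python
TICK_NS = 150.0  # bunch crossing tick quoted in the minutes


def double_hit_fraction(monostable_ns, tick_ns=TICK_NS):
    """Expected fraction of hits seen in two consecutive ticks.

    Valid only for tick_ns <= monostable_ns < 2 * tick_ns (assumed model).
    """
    if not tick_ns <= monostable_ns < 2 * tick_ns:
        raise ValueError("model only covers lengths of one to two ticks")
    return monostable_ns / tick_ns - 1.0


def monostable_length_ns(n_single, n_double, tick_ns=TICK_NS):
    """Invert the model: estimate the length from single and double hit counts."""
    fraction = n_double / (n_single + n_double)
    return tick_ns * (1.0 + fraction)
```

 With this model, a pixel showing 10 double hits and 90 single hits
 would be assigned a monostable length about 10% above one tick.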

 Pg 8: The monostable length can be adjusted by the current biases.
 There are two biases, one for each type of pixel (shaper and sampler).

 Pg 9: The monostable inverter is current limited to 5uA. This is done
 because otherwise there could potentially be a large total current 
 (~100mA for tens of ns) if they all switched at the same time. This
 would be most likely to occur at power-up, hence the "power-on-reset"
 here. In addition, the monostables will have a separate power supply.
 It was thought that having the return path via the substrate would not
 be sufficient and so there should also be a separate ground.

 Pg 12: The table shows even 2.5V is not ideal in some process corners
 so 3.3V will now be used for this logic. The numbers show that the
 SRAM is not reliably loaded when driving ~10 sectors simultaneously, at
 least at 2.5V, as each SRAM is potentially being switched 22x8 times
 on each bunch crossing. There is now a driver for every four sectors.

 Pg 16: The edge variation seen is ~20ns. This was thought to set a
 (somewhat conservative) limit on the readout clock speed of 5MHz. Also,
 the multiplexing results on pg 19 show a 25ns rise/fall time, which
 implies a limit at around 20MHz, which would result in the same 5MHz
 readout speed per column, so this seems a reasonable value to assume.

 Pg 20: The race condition seemed risky for the sake of saving one clock
 line, so a third clock line will be added to remove this issue. The
 extra input will be implemented as phi_3-bar as it could then be
 approximated by wiring phi_2 to the input if desired.

 Pg 24: The SRAM acts as a stack (i.e. a LIFO). There is no counter
 of the number of locations; the pointer is the only item which keeps
 track of the occupancy. Hence, the total number of hits following
 memory overflow cannot be recorded.

 The issue of glitches on the phi_2 clock was raised; this could cause
 corruption of the SRAM if they occurred. This needs to be simulated to
 check the sensitivity.

 The 19th hit which fills the memory in a sector will not cause the
 overflow condition to be set; a following 20th hit will do this. The
 overflow status is not recorded in the output data per sector. The bit
 can be read from the output pin but this is common to each 84x84 pixel
 array and so does not uniquely locate the sector(s) which caused it.
 Although it would be clearer to get the overflow bit per sector, it
 was not thought essential. Once the 19th hit is recorded, the pixel is
 then dead, irrespective of whether a hit actually occurred later or not.
 Hence, any pixels with 19 hits should count towards inefficiency from
 the timestamp following that of the 19th hit.
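
 The stack and overflow behaviour described above can be modelled with
 the following sketch. The class and method names are hypothetical; only
 the depth of 19, the LIFO ordering and the overflow being raised by the
 20th hit are taken from these minutes.

```python
class SectorMemory:
    """Behavioural model of one sector's hit memory (illustrative only)."""

    DEPTH = 19  # memory locations per 42-pixel sector

    def __init__(self):
        self.stack = []        # the list length plays the role of the pointer
        self.overflow = False  # set only when a hit arrives with memory full

    def write(self, word):
        """Store a hit; return False if it was lost because memory is full."""
        if len(self.stack) < self.DEPTH:
            self.stack.append(word)
            return True
        # The 20th and any later hits set the overflow flag and are lost;
        # they are not counted, as there is no separate hit counter.
        self.overflow = True
        return False

    def read_all(self):
        """Read out the stored words, last in first out."""
        words = []
        while self.stack:
            words.append(self.stack.pop())
        return words
```

 Writing 19 words leaves the overflow flag clear; a 20th write sets it,
 and read_all() then returns the 19 stored words in reverse order.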

 Pg 29: Since both the rising and falling edges of the HOLD signal are
 used, it must run at 3MHz rather than 6MHz as stated, i.e. half
 the bunch crossing clock.

 Pg 30: A ~4kHz signal is currently required so as to avoid the
 possibility of the internal nodes of the latch-hold circuit floating up
 to the transistor transition point (and hence generating a power surge)
 when not in use. This was thought to be undesirable and a pulldown
 should be added to prevent this and so remove the need for the 4kHz
 clock.

 The leak time was found to be long (~0.5ms) compared to the clock times
 being used but the simulation of this had not been done at high
 temperatures; checking this at 50C should be done.

 Pg 33: The 100kHz minimum is assumed for the SR parallel load clock as
 it is limited by the time to serial load the SR with 168 bits. The
 latter is assumed to run at ~100MHz, which would give a maximum
 parallel load rate of 600kHz. At 100kHz for the parallel load, then the
 5x168 = 840 loads will take ~8ms, which is acceptable; there is no
 major constraint on the time to load configuration data.
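
 The timing estimate above can be reproduced with the short sketch
 below. The clock rates and counts are those quoted in these minutes;
 the variable names are for illustration only.

```python
SERIAL_CLOCK_HZ = 100e6   # assumed serial load clock for the shift register
SR_BITS = 168             # bits per serial load (one per pixel column)
N_LOADS = 5 * 168         # parallel loads needed to configure the sensor
PARALLEL_LOAD_HZ = 100e3  # parallel load clock assumed in the review

# One parallel load needs a full serial fill of the shift register first.
max_parallel_rate_hz = SERIAL_CLOCK_HZ / SR_BITS

# Total time to load all configuration data at the assumed parallel rate.
total_load_time_s = N_LOADS / PARALLEL_LOAD_HZ

print(f"max parallel load rate ~{max_parallel_rate_hz / 1e3:.0f} kHz")
print(f"total configuration time ~{total_load_time_s * 1e3:.1f} ms")
```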

 Pg 48: Note the comment in the overview (above); the first sensor needs
 eight bits to label the sector, not seven as stated here.

 Pg 50: The address lines must not move much relative to the clocks over
 the whole length of the logic columns. Hence, a check should be done to
 see if the address propagation time is as close as possible to the
 clock propagation time, in terms of the buffering and RC loads on the
 lines.

 There was a question on whether the simulation contains enough stray
 capacitance for these checks. It may be necessary to do a full
 simulation of the propagation times.


Top level schematic: The schematic does not yet include any test
 structures. Jamie is intending to add at least a monostable and some
 number of pixels in an array, with access to the analogue signals.
 These would require the trim DAC and mask configuration settings to
 be wired to input pins and hence set "by hand" externally.

 All the I/O signals are currently single-ended, either 2.5V or 3.3V.
 There are no LVDS converters in the library but they could be made from
 basic components with some work. It was decided that given the short
 time left, it was better to leave the signals as is and do any
 necessary conversion externally if/when required.