-
Notifications
You must be signed in to change notification settings - Fork 1
/
NOTES.txt
464 lines (346 loc) · 16.9 KB
/
NOTES.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
SPDX-FileCopyrightText: (c) 2020 Tristan Gingold <[email protected]>
SPDX-License-Identifier: Apache-2.0
Mini-blog from a non-native English speaker...
I decided to participate to the Open Source Shuttle Program (by
efabless and google) because this was a unique opportunity to design
my own ASIC.
Although I do hardware design, I didn't have a design to be tapped
out. So I have to create a new design. Due to the time constraints
between the announce (June 2020) and the deadline (November 2020),
this couldn't be a complex design from a logic point of view. And the
purpose of the shuttle program is mostly experimental.
I quickly decided to do a TDC and FD design. TDC stands for Time to
Data Converter, which is a fancy name for simply timestamping a pulse
with high precision (much more precise than the clock). FD stands for
Fine Delay and this is the opposite of TDC: generating a pulse at a
given time (again, higher precision than the clock).
The main advantages of this design are:
* Not a very complex design (again). This simplifies the verification
a lot, and in case of any errors (DRC or process), it should be
possible to use the design partially. Although the process is
mature, we are using open source tools which aren't as heavily
tested.
* Allows to analyze the process. In particular, how long it takes for
a pulse to propagate through a certain gate (the delay gate). And
this could be studied at various PVT (Process, Voltage, Temperature)
points.
* This is not a design that could be written the same way for an FPGA.
On an FPGA, you have already defined and fixed complex gates, so all
you can do is to play with certain pathes (like carry lines).
And of course, I also wanted to test in real conditions the synthesis
feature of GHDL. So the design will be written in VHDL. As the
toolchain for the shuttle program is verilog only, the design will be
first synthesized and the result will be a verilog netlist. Then this
netlist will be the entry point of the toolchain. This will make the
integration almost transparent.
August
------
The openlane tool chain wasn't available when the shuttle project was
announced. So I had time to think about my design and to start to
write it.
When the first version of openlane was released (late July), it was
time to try it. The first test didn't go smooth. The design was too
small and hit unusual constraints.
Hopefully, I get a quick answer from the forum and I was able to
generate the gdsII file which decribes the masks of the results.
I had to play with the parameters and in particular adjust the density.
The final result doesn't only contain your gates. There are a lot of
extra gates:
* decap: decoupling capacitors which provides current in case of peaks
* tap: small cells that power the bulk and wells.
* fill: cells that fill any hole so that abutment is continuous
* diodes: cells needed to protect other cells from charges accumulated
during fabrication. They are needed only for long wires, but the
rules are complex. The toolchain will automatically insert diodes,
but there must be room for them.
In addition, openlane also creates a power grid. If the pitch of the grid
is so large than the grid doesn't fit on your design, the toolchain fails
(with a cryptic error message).
I was also able to create a design from VHDL files. I cheat a little
bit to make the process easy: I first synthesize the VHDL sources but
generate a simple verilog as an output and then I feed openlane with
the verilog file (which is a netlist).
The main part of a TDC is the tap line: a long chain of delays and
flip-flop. But how long is the tap line ? Well, it depends on the
propagation time in the delay element, on the wires and on the
frequency.
* For the delay element, I first need to choose it. The obvious
choice would be a not gate. But I didn't want to deal initially
with the inverted output. So, the second obvious choice would be a
buf gate. Until I realized there is also a clkbuf gate. What is the
difference ? Simple: the clkbuf gate is balanced, so the propagation
time of a 0 is the same as the propagation time of a 1. That's the
gate to use.
* What is the propagation of the a clkbuf gate ? This information is
written in the .lib (liberty) file, but it is not easy to find it.
It is also given by the static timing analysis (STA) tool. The result
is about 100ps
* What is the frequency ? It wasn't fixed so I estimated a frequency of
about 200Mhz. This means a 5ns period so I need at least 50 taps.
Later some figures appeared: the max frequency could be 50Mhz (so a
period of 20ns), which means 200 taps.
* What about the wires ? I don't know how important to the delay they
are. In the first openlane release, the static analysis tool was applied
only before routing, which means that wires length was not considered (or
maybe just estimated). The tool team planned to use a parasistic
extractor that compute wires capacity and resistance.
I had a look at that tool (SPEF_EXTRACTOR) before it was integrated.
It was very interesting to me: I didn't know anything about parasistic
extraction and about the various files (LEF, DEF, SPEF). And the tool
was a small (about 700 lines) python script. So I started to read the
source code, and also started to refactor it to make it more efficient
and more understandable. And I read about the files format.
Just when I started to integrate the tool in the openlane toolchain,
a new release of openlane was published. Depressing moment. They
were faster than me! But the tool was simply integrated, without the
refactoring work. I restarted to refactor and submitted several PR.
The submission went very well, which pushed me to continue.
The conclusion on wires is simple: almost no impact on timing. Except on
my personnal timing!
September: OSU library
----------------------
At the end of September, there was the Fossi dial-up event about OSU cells.
I initially thought that would be the least interesting talk, because
I am not an analog designer. But the talk was very accessible, very
understandable and maybe the most interesting one. Because I didn't know
a lot about standard cells.
To sum up, the OSU team has designed a standard cells library for the
skywater 130 process. And that's almost all. But the library is very
interesting. It is small, so you aren't lost by the huge number of cells.
The cell library of the skywater foundary is much larger and it is not
always easy to understand the purpose of each cell. Also the OSU library
has a pdf file that quickly document each cell.
During the talk, they explained the OSU library must be faster, as it was
designed to be. However, it is not possible to use it with openlane, there is
no integration.
I started to integrate the OSU library. Most of the work consists in
adjusting the scripts of the open_pdk project to have a standard place of
the library files so that openlane can found them easily. That's not very
complex.
Then I wanted to use the OSU library in a design. After fixing some
simple issues, I quickly got weird error message from tools or even
crashes.
So, this is not plug and play! This requires a better understanding.
The first issue was an incoherence between the LIB and the LEF files.
The LIB file contains timing characteristics of each library cell
while the LEF file gives an abstract layout (details of lower layers
are not present). Weirdly, for two cells, the name of the pins were
different in the two files. Some tools were confused. The easy way
was to change the name in the LIB file, so that it matches the other
ones.
A second issue was with access point (AP). This are the locations in
the cells of the pins. At least one of the tool, the placer, needs to
have them aligned on the grid. This required parameter adjustements.
Unfortunately, in the second delivery of OSU library, the pins were
only aligned vertically and deliberatly not aligned horizontally.
The last issue was resolution in DEF and LEF files. The units used
by OSU were 1/100 microns, while the foundary unit is the nanometer.
One of the tool generated incorrect outputs and needed a small change.
This was changed in the second delivery of OSU in November
This second OSU delivery was very different from the first one: the
cell names were completely changed and now follows the naming
convention of the foundary library. The layout of the files was also
changed so all the scripts have to adapt.
I was able to create a macro using the 18T_hs variant of the library,
but it had many DRC issues. Looking closer, they look like real
issues, admitted by the designers. So I won't be able to use those
macros in my design.
October: generating blocks
--------------------------
I started to write some HDL code for my design, but quickly enough
I realized that the place and route result was irregular for my
timing elements (the tap line and the delay line).
So my idea was simple: let's create a very regular placement for
them. But I haven't seen an easy way for that. So create a
simple tool that generates a small macro for these elements.
That was a very fun idea, as I had to understand the files format.
That's also one of my favourite approach: be able to read the files.
So I started to write a python script (see in the tools/ directory)
to directly generate the DEF file of a tap line. Then use this
file as an input to the OpenLANE chain to generate a GDS file.
[ now: manual place, use API, use magic ]
[ So naive: cannot deal with too many configs ]
November: the whole design
--------------------------
[ Clocking ]
[ Power estimation ]
[ Next iteration: better precision and accuracy, bursts, best techno, best layout ]
[ Test vectors ]
[ bug magic: cell with FN orient ]
[ crisis time: macros issues, drc, lvs, antenna]
[ Incorrect cells ]
[ antenna vs LVS ]
[ STA and memory ]
[ DRC issues ]
Magic
-----
I planned to have many hard blocks (macros in openlane language).
Fine, but you have to place them, through a file that indicates the
position. I would prefer a more graphical way as I need to take into
account the distance between macros (they cannot overlap!), and
distance with input pad.
Magic is the graphical tool of the tool chain. It can be used to draw
cells and can also be used to edit or view a full design. It it also
the tool to do DRC and write the final GDS. My idea was simple:
* start from an almost empty design (to ease graphical display)
* run the floorplan phase (at least the ioplacer)
* edit the result (the DEF file) with magic
* graphically place the macros
* save the result (a .mag file)
* convert the result to the macro_placement.cfg file
* Or better, write a tcl script for that.
As I want to place the macros near the pins, I have to select the
pins. The caravel design has 38 gpios (mprj_io[37:0]). The pin
mprj[11:0] are multiplexed with JTAG, UART, SPI, FLASH or IRQ. So I
don't plan to use them. I still can use 26 gpios. For each gpio, the
user design has 3 pins: io_in (for the input), io_out (for the output)
and io_oeb (to enable the output).
This was a nice idea on the paper but it doesn't work well. In magic, you
cannot move cells directly with the mouse, only with keys. Or I didn't find
how. But there is no default key to move the cells, you need to enter
a command. Due to lack of time, I didn't investigate further. I simply
look at the DEF file, look at the pin place and use that number in the macro
placement file. To be investigated later. Also might worth to check if
klayout might help.
Complaints
----------------
GHDL: A missing warning (comparison always false/true) makes me lost a
couple of hours. Direct instantiation could be a blackbox. You should
give a name to the cells when possible.
Yosys: Non-fully simplified output for assign makes me write
work-arounds. This has been fixed. Also, I have seen weird
optimization in a structural design (elimination of conb_1) that was
worked-around by adding a keep attribute.
skywater: Your sky130_fd_sc_hd__clkdlybuf4s15_1 cell doesn't pass DRC
so we cannot use it.
OpenSTA: Memory requirement is too high. This almost froze my linux
machine (Ok, only 8 GB in a docker container).
io_place: Your configuration must accept comment, you must be more
flexible (allow holes, allow to specify pitch). May need more
warning.
manual_macro_place: Need a lot of time to correctly align macro.
tapcell: The output is not always valid (in case of holes due to
macro).
CTS: Too slow. We need to handle macros.
placement: Sometimes we need to manually place cells. Needs more
flexibility. How to place DFF near pins ?
pdn: I am not sure I really fully understand the config file...
All the straps must be considered as pins (and not only the first two).
fastroute/tritonRoute: Sometimes the decision is really stupid.
Required manual obstruction.
magic: DRC is slow (and was buggy), GDS output is slow. Would be nice
if I was able to edit the DEF file. The GUI is so 80's, but that's OK!
For placing macros, rastnet would be nice.
flow.tcl: the whole output must be logged, inclusing stderr. Colors
are evil.
netgen: There are reports I haven't understood (circuits are OK but
not top pins). I am not sure I can trust LVS about power and that's
very annoying as that's the main value.
diode insertion: Doesn't work. All the strategies fail.
iverilog: A little bit too slow.
Conclusion
----------
Lot's of fun, lot's of work.
At the end I created something, with some level of confidence.
It is not exactely what I initially planned to do. Less features.
I had to learn the tools, how they work, what they can do, what they
cannot do.
I had to learn the constraints of designing.
Using macros was a good idea: they are small and thefore they can be
quickly harden (a few minutes). So I can iterate. Divide and conquer.
But macros have constraints: minimum size so that they can be
connected to the power strips.
We have work. We have to improve the tools. We have to write
tutorials and tips.
After this fun experience, I am ready for the next iteration!
Thanks to Efabless and Google.
Annex
-----
Some random notes...
26 gpios:
37, 36: (far) internals FD + TDC
35, 34, 33, 32: (far)
31, 29, 27, 25: (best, left):
30, 28, 26: (good, left):
24: (far, left)
23, 21, 19, 18, 16, 15: (best, top)
22, 20, 17: (good, top)
14, 13: (far, right)
12: (best, right)
TDCs - fd_sc_hd:
*2 + ref: internals
*1: hori x1
*1 + ref: hori x1
*1: hori x2
*1 + ref: hori x2
*1 + ref: hori x4
*1 + ref: hori x8
*1 + ref: x? dly: clkbuf_1
*1 + ref: x? dly: clkbuf_2
*1 + ref: x? dly: clkbuf_4
1 + ref: x? dly: clkbuf_8
1 + ref: x? dly: clkbuf_16
*1 + ref: x? dly: clkdlybuf4s18
*1 + ref: x? dly: clkdlybuf4s25
*1 + ref: x? dly: clkdlybuf4s50
1 + ref: x? dly: not
1 + ref: x? dly: clknot
TDCs (hd, hs, ls, ms, osu_sc_t18): [x5]
1 + ref: x16 + clkbuf - with scan - very long (1000 taps): for 10Mhz
1 + ref: x2 + clkbuf + scan
TODO: enable for ref TDCs and FDs.
TODO: detect unhandled macro in placement
Powers:
First h strap: first_row_Y - rail_width/2 + strap_offset
10880 - 480/2 + 16650 = 27290
(first_row_y = bottom_margin * site_height = 4 * 2720)
site: 460x2720
TRACKS X 1700 DO 859 STEP 3400 LAYER met5 ;
ROW ROW_0 unithd 5520 10880 FS DO 6323 BY 1 STEP 460 0 ;
Params:
Type: stdcell, grid
Stdcell Rails
Layer: met1 - width: 0.480 pitch: 2.720 offset: 0.000
Straps
Layer: met4 - width: 1.600 pitch: 153.600 offset: 16.320
Layer: met5 - width: 1.600 pitch: 153.180 offset: 16.650
Connect: {met1 met4} {met4 met5}
- VPWR + NET VPWR + DIRECTION INPUT + USE SIGNAL
+ LAYER met5 ( -1454290 -800 ) ( 1454290 800 )
+ FIXED ( 1459810 27290 ) N + SPECIAL ;
NEW met5 1600 + SHAPE STRIPE ( 5520 793190 ) ( 2914100 793190 )
NEW met5 1600 + SHAPE STRIPE ( 5520 640010 ) ( 2914100 640010 )
NEW met5 1600 + SHAPE STRIPE ( 5520 486830 ) ( 2914100 486830 )
NEW met5 1600 + SHAPE STRIPE ( 5520 333650 ) ( 2914100 333650 )
NEW met5 1600 + SHAPE STRIPE ( 5520 180470 ) ( 2914100 180470 )
NEW met5 1600 + SHAPE STRIPE ( 5520 27290 ) ( 2914100 27290 )
- VGND + NET VGND + DIRECTION INPUT + USE SIGNAL
+ LAYER met5 ( -1454290 -800 ) ( 1454290 800 )
+ FIXED ( 1459810 103880 ) N + SPECIAL ;
NEW met5 1600 + SHAPE STRIPE ( 5520 716600 ) ( 2914100 716600 )
NEW met5 1600 + SHAPE STRIPE ( 5520 563420 ) ( 2914100 563420 )
NEW met5 1600 + SHAPE STRIPE ( 5520 410240 ) ( 2914100 410240 )
NEW met5 1600 + SHAPE STRIPE ( 5520 257060 ) ( 2914100 257060 )
NEW met5 1600 + SHAPE STRIPE ( 5520 103880 ) ( 2914100 103880 )
Macros placement:
in pdn: look for met5 VPWR layers (bottom one), select one strip (eg: 486.830)
# in macro: look for bottom met5 VPWR: 8010
in macro.lef: look for VPWR RECT, compute middle: (8.010 + 6.410)/2 = 7.210
substract: 486.830 - 7.210 = 479.620
ghdl improvements
* name of cells
* blackbox
* labels
* split records
tools improvements:
* place io (distance, groups)
* visualize nets
* visualize timing issues
* visualize congestion
FD:
fd_hd
fd_hs
fd_ms
fd_ls
+ inline ?
+ hd with cdly25