Modern AVRs and overclocking

This is written in the context of modern AVRs, though much of it applies to classic AVRs as well. This document was originally excised from a planned response to an issue where it was largely off topic, but it contained enough information to be of value, hence needed to be retained somewhere. A quick note on classic AVRs: it's known that at room temperature, ATtiny841s (rated 16 MHz) will do 20 no problem (and 16 from a "tuned" oscillator), and the tiny1634, rated for 12, can do 16. I have not seen how far I can push the old ones.

The new parts have an advantage for overclockers* in that, except for the EA-series, there is some way to generate clocks on-chip that significantly exceed the spec'ed maximum even at 5V, while the 8 MHz internal oscillator on classics has no hope of that - even on parts like the tiny85 and 861, where the PLL kicks the clock up to 64 MHz nominal and then divides by 4. According to the datasheet, the PLL "saturates" at around 85 MHz, giving you a whopping 1.25 MHz of overclocking (and the speed is probably poorly controlled). Among classics, if I were to bet on which ones overclock best, I'd point at the PB parts - 328PB, 168PB, 88PB, 324PB - just because they were the last parts released and are most likely to use a more advanced process. On the other hand, they don't support full swing crystal drive, which makes them less amenable to overclocking with a crystal. (In general, crystals don't seem to be very popular at Atmel/Microchip - I believe this predates the buyout - for some reason; as soon as the internal oscillators were good enough to get away without a crystal and still have a working UART, they ditched external crystal support. Only the DB, DD, and EA since then have had external high frequency crystal support. The tiny 0, 1, and 2-series, mega 0-series, DA, and EB do not.)

* - I'd like to say that the improvements in the ISA help you not need to overclock (certainly the Dx's voltage-independent max clock speed does if you're running at low voltage), but I'm not certain they do. Identical (machine/asm) code will run faster on AVRxt in any non-contrived case, as only a single instruction got slower, while many of the most common instructions are faster. They picked the instructions to juice up well (unsurprising - they can profile application code far better than I can).

AVRs are well known to (at least at room temperature) overclock extremely well.

A large part of this is likely because they are designed to function at temperatures of up to 125C - as in, "continue to function long term". 125C is pretty hot; these parts would be well within spec operating from the bottom of a pot of boiling water (assuming they were properly waterproofed, and so on). This is often a big deal in industrial applications ("put the controller for the hot thing near the hot thing"). Most individuals looking at microcontrollers, however, really don't care if their device will run while submerged in boiling water (pro tip: when you make the digitally controlled coffee maker, the trick is to put the circuit board outside of the water tank). We're overwhelmingly likely to be using it at temperatures of... maybe 125 F max? More commonly in the narrow range between 20 and 35C or so? 125 F is only about 52 C, and if we assume that Fmax(T) is continuous and decreases as T increases, we must conclude that if the chip can run at 20 or 24 MHz (or 10, or 4) at 125 C, at 25 C it must be able to run faster, right? Otherwise, either temperature does not affect the performance of the chip (we know that to be false), or there are regions where heating the chip up more would make it more stable - and while that's not impossible, it is rather implausible.

In any event, the point is, AVRs overclock real good; this was what I found to typically work:

  • Classic AVRs have been reported stable at speeds in the low 30 MHz range (rated 20). Likely some specimen variation is in play. This datapoint is second hand.
  • Tiny 0/1 - up to 30 sometimes works, or 32 from an external clock. Internal oscillators usually go to around 32. Tiny 2 - up to 32 usually works, and internal oscillators go up to 36-ish. (Note: typical maximum oscillator speed was extrapolated using Excel's quadratic line of best fit, which fits the measured speeds nearly perfectly, as only a single part made it through the top of its cal range successfully, and it had an anomalously slow oscillator.)
  • AVR Dx - up to 32 MHz on the internal oscillator virtually always works. Up to 40, or even 48, is possible from a crystal (specimen variation is seen here).

General trends

  • You can almost always get higher clock speeds while remaining stable with an external clock.
  • If you're going to overclock, you want to get the extended temperature range parts - they're the "good" chips.
  • Be sure you have a well decoupled, stable power supply. It should be at the nominal voltage for the highest rated speed (this does not matter for Dx, except possibly at very low voltages - the Dx core runs at 1.something volts, produced from an on-chip regulator).

STOP!! Before you begin, remember that this dark sorcery is not to be used for personal or corporate gain, but only in places where arbitrary failure modes are no more than an embarrassment. Your holiday lighting? Sure, why not. Your car's collision avoidance mechanism? Your product delivered to a demanding customer? Maaaybe not.

Overclocking Dx-series

On AVR Dx-series parts, overclocking is achieved in two ways:

  • Changing the frequency-select bitfield of the internal HF oscillator (CLKCTRL.OSCHFCTRLA). On the Dx-series, above 4 MHz, the granularity of this is 4 MHz, going through 4, 8, 12, 16, 20, and 24 MHz according to the spec. In reality, there are two "secret" speeds not mentioned in the datasheet: setting the bitfield to a value 1 or 2 higher than the 24 MHz setting results in an operating frequency of 28 or 32 MHz respectively. Most parts can do either of these no problem at room temp (see the sketch after this list)!

  • For larger overclocks, the tuning is of little use, due to its limited range. However, these parts can all take an external clock, and the DB and DD parts can also use a crystal. I am aware of crystals working reliably on most E-spec (extended temp) parts at 40 or even 48 MHz! External clocks always work better and always have for overclocking AVRs, and 48 MHz has been found stable on at least one E-spec DB (but not on an I-spec one; the sample size is very small, however).
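
As a concrete illustration of the first method, here is a minimal sketch (not a drop-in recipe) of selecting the undocumented 32 MHz setting on a DA/DB. It assumes the usual avr-libc names (CLKCTRL.OSCHFCTRLA, the _PROTECTED_WRITE() macro) and that the frequency-select field occupies bits 5:2 of OSCHFCTRLA; check your part's header, as some header versions spell the field FREQSEL rather than FRQSEL.

```c
#include <avr/io.h>

// Run the internal HF oscillator at the undocumented 32 MHz setting.
// 0x9 in the frequency-select field = 24 MHz per the datasheet;
// 0xA and 0xB are the two "secret" settings (roughly 28 and 32 MHz).
static inline void run_at_32MHz_internal(void) {
  _PROTECTED_WRITE(CLKCTRL.OSCHFCTRLA, (uint8_t)(0x0B << 2));
}
```

Note that writing the whole register like this also clears RUNSTDBY and AUTOTUNE; if you care about either, read-modify-write just the frequency field instead.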

A 2:1 speedup, if actually stable, is a BFD, and I have found a whole bunch of baller things that are right at the edge of what I can do with hand-tuned ASM.

Overclocking tinyAVR parts

On tinyAVR, the internal oscillator is incredibly flexible, and is the primary method used to overclock: 1-series parts will do 24-25 no problem and most will do 30, though they fall apart above that. With solid supply rails and an external clock, though, they can be pushed to 32. 2-series parts will usually work fine on the internal oscillator at 32, and collapse in the mid-30s, just about at the top of the calibration range for the oscillator. This epic compliance of the internal oscillator makes these MUCH more fun to overclock :-) The datasheet makes a point of warning users not to change the calibration by large amounts all at once. This is not new. However, study of the arcane code written by wizards of that sort has indicated that there is a "trick" to get around this, which I use successfully: simply follow the write to the cal register immediately with a NOP. The source of this voodoo practice is the widely used digispark-alikes, which run at 16.5 MHz (classic AVRs, 8.0 MHz nominal, passed through a PLL that multiplies by 8 and divides by 4, tuned upwards for a base F_CPU of 8.25 MHz, for a net multiply by 2 yielding 16.5 MHz, which is better for USB on the marginal oscillator of classic AVRs).

The theoretical grounding of that practice is convincing, provided that the assumptions it rests upon are correct (though this is not known to be the case). The most frequent form of incorrect execution is 1 bits in the result being cleared to 0; this was noticed immediately when doing overclocking trials on tinyAVRs (random garbage values can also happen, though I can't rule out the possibility that those came from an intermediate value experiencing 1->0 errors), so the voodoo solution appears to be supported. If the "no-zero-to-one-errors" conjecture is true, and if the assumption that abrupt-clock-change-induced errors are similar to overclock-induced errors is valid, that would make a NOP clearly the correct thing to put after a large change to the clock speed - or possibly two NOPs (not a _NOP2(), which is an rjmp .+0, a great way to get a 2-cycle nop in a single instruction word and commonly used in cycle-counting time-critical code, but _NOP(); _NOP();, which is actually a pair of 0x0000 nop instructions). Not only does a nop not do anything, the fact that the opcode is 0x0000 means that even if the instruction fetch glitches, as long as it can only clear 1 bits to 0, the nop cannot be transformed into anything else. However, neither of these assumptions is known to be true.
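
To make the trick concrete, here is a minimal sketch under the following assumptions: a tinyAVR 0/1-series, where the oscillator cal register is CLKCTRL.OSC20MCALIBA (check your part's header for the exact name), written through the usual _PROTECTED_WRITE() macro.

```c
#include <avr/io.h>

// Write a new oscillator calibration value, followed immediately by nops.
// A nop's opcode is 0x0000, so even if the fetch right after the speed
// change glitches and 1 bits drop to 0, it still executes as a nop.
static inline void set_osc_cal(uint8_t newcal) {
  _PROTECTED_WRITE(CLKCTRL.OSC20MCALIBA, newcal);
  __asm__ __volatile__("nop");
  __asm__ __volatile__("nop");
}
```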

More about how crazy the tinyAVR internal osc is, and what tools could make overclocking more useful

Modern tinyAVRs are more interesting to overclock than Dx (on Dx, the cal is pretty worthless: the granularity is too large to really trim the oscillator accurately, hence autotune is of limited value, and there are so few steps that they can't swing the speed far enough to expose new practical clock speeds). Of course you can overclock a Dx to 32 MHz just by setting a value 2 higher than the value for 24 MHz, and that typically works at room temp (and that's as far as it goes - after that the last 4 settings just repeat), and I've got parts that run at 48 MHz, fully twice the spec! On tinies, though, the internal oscillator is nuts: 64 steps from 4/8ths of nominal up to 13/8ths of nominal on the 1-series, and 128 steps from around 5/8ths through 15/8ths of nominal on the 2-series.
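
For a sense of what those tuning ranges mean in MHz, here is a back-of-envelope helper. It naively assumes frequency varies linearly across the cal range (the real curve is closer to quadratic, per the fitting note earlier), so treat it as illustration only.

```c
#include <stdio.h>

// 1-series: 64 cal steps spanning roughly 4/8 to 13/8 of the nominal speed.
// (The 2-series spans roughly 5/8 to 15/8 of nominal over 128 steps.)
static double tiny1_cal_to_mhz(double nominal_mhz, int cal /* 0..63 */) {
  double lo = nominal_mhz * 4.0 / 8.0;
  double hi = nominal_mhz * 13.0 / 8.0;
  return lo + (hi - lo) * cal / 63.0;
}

int main(void) {
  // With 20 MHz nominal selected, the top of the 1-series range is ~32.5 MHz,
  // which lines up with the ~32 MHz ceiling observed above.
  printf("cal=63 -> %.1f MHz\n", tiny1_cal_to_mhz(20.0, 63));
  return 0;
}
```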

I have seen only one part that would reproducibly run the cal routine all the way up to the maximum, with 20 MHz selected, while remaining stable enough to run my tuning sketch without apparent errors (a couple of others would occasionally make it through, but got some errors, which means they could not be used at this speed); this part also had an unusually slow internal oscillator, such that it simply couldn't reach high enough speeds to malfunction at room temperature. The transition from no apparent errors to visible errors happening very often occurs over a change of less than 1 MHz, but that still means that these parts all have several cal settings at which they are struggling to various extents. That would be an interesting laboratory for exploring the behavior of AVRs that fail to execute instructions correctly due to exceeded operating conditions. An ideal investigation, IMO, would need to determine which instructions are most sensitive. I'm imagining using a part known to be in the struggling regime at a certain cal, T, and V, starting at normal speed, then running a test function in asm for each instruction. Part of this could be procedurally generated - and should be, as a good test would be quite long - where you'd go to inline assembly, push everything, ldi a start value into some registers, ldi the new speed and the CCP value, then out ccp, sts new cal, nop nop, then a long sequence of the same instruction with minor changes, as in the sketch below.
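
Here is a hand-shortened miniature of that skeleton, under the same assumptions as the set_osc_cal() sketch earlier; the real test would be procedurally generated, with the full run of 256 subi's and randomized registers and immediates, and the cal values passed in are placeholders.

```c
#include <avr/io.h>
#include <stdint.h>

// As sketched earlier: protected write to the cal register, then 0x0000 nops.
static inline void set_osc_cal(uint8_t newcal) {
  _PROTECTED_WRITE(CLKCTRL.OSC20MCALIBA, newcal);
  __asm__ __volatile__("nop \n\t nop");
}

// Run a short burst of subi's at a suspect cal value, then drop back to a
// known-good speed before the result is stored or compared.
uint8_t subi_probe(uint8_t start, uint8_t suspect_cal, uint8_t safe_cal) {
  uint8_t val = start;
  set_osc_cal(suspect_cal);          // jump to the cal value under test
  __asm__ __volatile__(
    "subi %0, 0x11 \n\t"             // stand-in for the long generated run of
    "subi %0, 0x23 \n\t"             // the instruction being probed
    "subi %0, 0x45 \n\t"
    : "+d" (val));
  set_osc_cal(safe_cal);             // back in clean territory
  return val;                        // compare with the value computed in-spec
}
```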

For example, starting with 4 bytes of data to mangle, you then have a sequence of subi's. Declare that there shalt be 256 subis, with all possible immediate values. Each data byte could be the destination of 64 of those (probably distributed randomly), with the order of the immediate values randomized as well. Then turn the CPU speed back down - you should be running in clean territory now - store that 4-byte value to memory, and restore all the registers; but pop them into r0, compare with the value in their home register, and then mov them there. If any register you didn't use has changed, that's something to investigate. For example, if you targeted registers r20, r21, r22, r23, and then looked at what wound up in those vs. what did at an in-spec clock speed, and found that r20 was different, and your register scan revealed that r16 was wrongly changed, and you then saw that the two were wrong by the same amount, and confirmed that yes, that amount is one of the numbers you tried to subi from r20, you would suspect that a 1-bit in the register field of the subi opcode was misread as a 0, and the operation was otherwise carried out successfully. On the other hand, if you saw no damage to other registers, saw evidence of only a single error, but the value was not one of the ones subtracted from that byte, nor the sum of any two (note: the converse is not true for a single sample, as the chance of a false positive is pretty high), you would instead suspect that the immediate value or the result itself was what got mangled; if the error was a power of two off, you would strongly suspect that. Multiple runs under the same conditions, using several randomly generated sequences, each run multiple times, would reveal whether the errors were distributed randomly across the opcode, or whether there was a correlation between the opcode for that instruction (that is, between the operands) and the chance of error (both are plausible a priori). This could be repeated for each instruction (procedural generation of the asm, like I said, is a must). At the end you would be in a position to determine:

  1. What instructions are the most likely to fail to execute?
  2. What instruction operands are most likely to be misinterpreted? (Or does it not matter?) Then, if the same process were performed on a few specimens, you'd learn the most important things:
  3. Whether the same instructions are the weakest ones on all devices, or whether they vary part-to-part (I suspect it is the same instructions).
  4. Whether that can be put into practice to develop a single short routine, composed of a large number of sensitive insns, which gives a trustworthy answer to the question "Is the chip stable at the current operating conditions?"

With such a tool in hand, an expanse of uncharted territory is ripe for exploration: you could run up the clock speed like a tuning sketch does, recognize when it had started struggling, and make a 3-dimensional plot of Fmax against temperature and voltage. Finally, if enough data were taken at one of the "speed grade" voltages over varying temperature, you would likely get a plot that you could fit a curve to and extrapolate to the manufacturer's specified max speed, thus revealing how much headroom they designed for. From there you could likely synthesize an h(V, T, F_CPU) which would indicate how comfortable the chip is running in those circumstances; that is, for all V and T you would know Fmax(V, T), so the headroom would be h(V, T, F_CPU) = 1 - F_CPU/Fmax(V, T). Now, Fmax(V, T) would of course have to have specimen-dependent constants. The number of such constants (assuming they're not exactly correlated) would be the number of points on the V, T plane at which you'd need to measure Fmax for any given specimen in order to predict its headroom over all conditions.
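
A minimal sketch of what that headroom function might look like, assuming (purely hypothetically) a fitted Fmax(V, T) model with a handful of specimen-dependent constants; the model form and every number below are invented for illustration.

```c
#include <stdio.h>

// Hypothetical fitted model: Fmax (MHz) rises with voltage and falls with
// temperature; k0, k1, k2 would be the specimen-dependent fit constants.
static double fmax_of(double v, double t, double k0, double k1, double k2) {
  return k0 + k1 * v - k2 * t;
}

// h(V, T, F_CPU) = 1 - F_CPU / Fmax(V, T); positive means headroom remains.
static double headroom(double v, double t, double f_cpu,
                       double k0, double k1, double k2) {
  return 1.0 - f_cpu / fmax_of(v, t, k0, k1, k2);
}

int main(void) {
  // e.g. 32 MHz at 5.0 V and 25 C with made-up constants -> roughly 15% headroom.
  printf("h = %.2f\n", headroom(5.0, 25.0, 32.0, 20.0, 4.0, 0.1));
  return 0;
}
```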

Are temperature grades still real?

Since the AVR DA-series was released, newly released parts have not had the temperature grade marked on them. This is of course very unfortunate (Microchip support will tell you the grade if you give them the lot number), since it opens the door to easier misconduct by turnkey PCBA manufacturers (often based in China, where unethical cost cutting is enthusiastically embraced). If my application required operation at 125C, and I wanted to be certain that the products were using the parts I specified (for example, to ensure that I couldn't be held liable for negligence), I would now have to write down all the almost-unreadable lot numbers, send them to Microchip support, and have them tell me what temperature spec they are. Assuming Microchip answers these honestly (which it likely does if it's putting things in writing, since they don't want to be held liable themselves, and they have every incentive to help call out the fraudsters pocketing profits that could be theirs), this is still a lot more work than just verifying that the letter on the part is the correct one. The letters could still be faked (though E and I are less effective in that regard than the old Atmel system, which used F, N and U for the temperature grades, since you can turn an I (as written on parts) into an E by just adding three short lines, while turning an N or U into an F requires erasing lines).

THIS CHANGE SUCKS! What on earth were they thinking?!

What could be their motivation here? Did they:

  1. Want to support fraudulent activity? To what end? This seems implausible, because the dishonest assembly house would be taking money right out of Microchip's revenue by turning what should have been E-spec sales into I-spec ones.
  2. Dislike their customers and want to make their lives harder? To what end? This seems contrary to their interests as well.
  3. Mindlessly follow an inflexible product marking doctrine? This could be excluded or supported by looking at other Microchip products and seeing if they marked different grades. This is plausible; Microchip is a big company, and this is the sort of thing big companies do.
  4. Realize that they're both the same damned chips, and that it saves money not to mark them differently, while not wanting to stop charging extra for a high temp version?

It thus stands to reason that, if Microchip's process technology had bested Atmel's by a sufficient extent, they might be producing only E-spec-capable parts without having to try. Since they'd still want to be able to charge more for as many parts as they could, without the prices scaring off the cost-sensitive customers, they might want to continue selling parts in two different grades, even though each lot would be homogeneous in terms of temperature grade. How could we tell the two scenarios apart?

  • A difference in price between the temperature grades which varied significantly between different parts would suggest the price difference was not due to process differences (which would result in P_E = P_I * Y_I/Y_E, where Y_I and Y_E are the yields of the two grades). For a given process node, yield can in turn be approximated by Y = Y_process^A, where A is the normalized die size and Y_process is the yield for a die of unit size, which depends on the temperature grade. So if we define U = Y_process,I / Y_process,E, the first equality becomes P_E = P_I * U^A. That is, there should be a very clear relationship - for parts made on the same process - between the premium as a fraction of price and the size of the die. And if the processes were not identical, P_E = P_I * U^A * C, where C is a constant equal to the ratio of the costs of the two production processes. Since the flash and RAM are a large portion of the die, we would then expect that the smallest-flash versions would never carry a larger premium on the high-spec variant than the large-flash versions: when A is smaller, U^A * C is smaller. With a sack of files (the tool, not the digital kind) and sacrificial chips, one could experimentally characterize the normalized die area for a given chip design across the flash sizes. That would allow you to make predictions for the normalized premium on the E/F-spec parts as a function of U. If we assume that U >= 1, then in the extreme case of U = 1 the premium would be constant across die sizes. If U > 1, the premium would get larger as the die got larger - and from how quickly it grows with die size, we could say something about how large U actually is.
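
To make the shape of that model concrete, here is a toy calculation of P_E = P_I * U^A * C across three hypothetical die sizes; every number is invented purely to illustrate how the premium would scale with die area, not a real price or yield.

```c
#include <stdio.h>
#include <math.h>

int main(void) {
  const double P_I = 1.00;             // I-spec price, normalized
  const double U   = 1.10;             // hypothetical per-area yield ratio (I over E)
  const double C   = 1.00;             // assume identical processes for this illustration
  const double A[] = {0.5, 1.0, 2.0};  // normalized die areas (small/medium/large flash)

  for (int i = 0; i < 3; i++) {
    double P_E = P_I * pow(U, A[i]) * C;
    printf("A = %.1f -> P_E = %.3f (premium %.1f%%)\n",
           A[i], P_E, 100.0 * (P_E / P_I - 1.0));
  }
  return 0;
}
```

Under U = 1 the premium would be flat across flash sizes; under U > 1 it grows with die area, which is exactly the signature the bullet above proposes looking for.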