What change caused the largest reduction in code size? #264

AJ528 · 2024-04-26T20:12:21Z

AJ528
Apr 26, 2024

Hello,

After discovering this repository, I was inspired to spin my own version of printf functions with similar (but not identical) goals:

1. It should be a simple and well-documented printf function.
2. It's coded using 32-bit numbers almost everywhere (I'm designing it to run on Cortex-M microcontrollers).
3. It should support integer conversions but no floating point (b, c, d, i, p, s, u, X, x, and % are supported).
4. It should be as small as possible.

I feel like I've accomplished goals 1-3, but I am struggling a bit with goal 4. I've tried both plugging my code into the Godbolt Compiler Explorer and iterating locally: making a change, recompiling, and looking at the size and the generated code. In both situations I'm not seeing any optimizations I can make that affect the code size by more than about 10 bytes.

When nanoprintf was being developed, were there any minor changes made that had an outsized effect on the compiled code size? Any low-hanging fruit I might have overlooked? Otherwise, were there any specialized programs or compiler options that helped you reduce the size?

Obviously, I could make drastic changes to my code or rewrite it in assembly to reduce the size, but that would go against goal 1. (also, in that situation I would just use nanoprintf!)

charlesnicholson · 2024-04-26T20:41:22Z

charlesnicholson
Apr 26, 2024
Maintainer

Yup, it gets really hard to keep things really tiny in C, and as soon as you drop into ASM you hit portability issues. Off the top of my head, there wasn't a single silver bullet; it was as you say: just constantly pasting the code into godbolt and studying the asm.

Big wins:

- Making all of the internal helper functions static and never taking their addresses. That allows the compiler to go crazy with inlining, which removes function prologue / epilogue work and doesn't waste bytes on argument passing.
- Using sibling-call optimization in npf_putc_cnt to increment the output buffer cursor for free, instead of having to increment the output buffer cursor at every point in npf_vpprintf where I write a byte. (sibling-call structure means that there's no generated function prologue / epilogue code for the npf_putc_cnt function, it's "free")
- Working with reversed data: it's very small + simple to turn the number 1234 into the string "4321", but requires length-preprocessing or post-write reversal code to turn the number 1234 into the string "1234". So, all of the conversion helper functions simply leave the string data reversed in the buffer, and the final printing code walks backwards through the array (free since the length is now known) to print each character.
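
To make the reversed-data point concrete, here is a minimal sketch of the idea. It is not nanoprintf's code: `utoa_rev` and `format_u32` are hypothetical names invented for illustration, and the buffer sizes are assumptions chosen for a 32-bit value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from nanoprintf): writes the digits of `value`
   into `buf` least-significant digit first and returns how many were written.
   `buf` needs room for 32 characters (a 32-bit value printed in base 2). */
static size_t utoa_rev(uint32_t value, unsigned base, char *buf) {
  assert(base >= 2 && base <= 16);
  size_t n = 0;
  do {
    unsigned digit = value % base;
    buf[n++] = (char)(digit < 10 ? '0' + digit : 'a' + (digit - 10));
    value /= base;
  } while (value != 0);
  return n; /* the length falls out of the conversion loop */
}

/* Walks the reversed digits back to front, so the caller sees "1234".
   `out` must hold at least 33 bytes. Returns the string length. */
static size_t format_u32(uint32_t value, unsigned base, char *out) {
  char rev[32];
  size_t n = utoa_rev(value, base, rev);
  for (size_t i = 0; i < n; ++i) {
    out[i] = rev[n - 1 - i];
  }
  out[n] = '\0';
  return n;
}
```

The conversion loop never needs to know the digit count in advance, and the only "reversal" is the direction of the final output loop.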

There are also a bunch of micro-optimization wins: lines 402-413 look like buggy nonsense, but they're a way to express logic that can't otherwise be done in C, in a way that let me add the feature without adding a single byte of object code, because the compiler "understood" it (https://github.com/charlesnicholson/nanoprintf/pull/245/files). I also got back a bunch of bytes by using array indices instead of pointers, and also by expanding out the various fields of npf_format_spec_t from bits to standalone values. It increased the size of the structure, but let the compiler do simple reads instead of masking and shifting.
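
As a rough illustration of the bits-versus-standalone-values tradeoff, the sketch below shows the same data in both layouts. The struct names and fields are hypothetical and are not the real npf_format_spec_t; whether the expanded form actually saves code depends on the compiler and target, so it is worth checking the generated asm.

```c
#include <assert.h>
#include <stdint.h>

/* Packed layout (hypothetical): a smaller struct, but reading a field
   typically costs a load plus a mask/shift or bitfield-extract instruction. */
typedef struct {
  unsigned left_justified : 1;
  unsigned leading_zero_pad : 1;
  unsigned field_width : 14;
} spec_packed_t;

/* Expanded layout (hypothetical): a larger struct, but each field is its own
   byte or word, so a read is typically a single plain load. */
typedef struct {
  uint8_t left_justified;
  uint8_t leading_zero_pad;
  int field_width;
} spec_expanded_t;

/* Same logic against both layouts: how many pad characters are needed. */
static int pad_count_packed(const spec_packed_t *s, int len) {
  assert(len >= 0);
  int width = (int)s->field_width;
  return width > len ? width - len : 0;
}

static int pad_count_expanded(const spec_expanded_t *s, int len) {
  assert(len >= 0);
  return s->field_width > len ? s->field_width - len : 0;
}
```

Both functions return the same results; the difference only shows up in the size of the struct and in the instructions the compiler emits for each field access.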

Ultimately though, it's the result of a few years of staring at the assembly, coming up with ideas, trying them, and seeing what happens. I don't believe it's possible to make code like this "pretty" or "simple" while maintaining the tiny size and the feature/flag matrix intact. The code started off closer to your first goal, but then I just kept trading simplicity away for big size reductions, and ended up here. I don't have the hubris to claim that nanoprintf is optimal at all; I'd love to see a smaller version that's just as full-featured! But, it's definitely not simple to get there, no matter what your approach :)

I do agree that nanoprintf is ugly and near-unreadable, and this project is the only time I've made these kinds of tradeoffs (intentionally, anyway...)

2 replies

AJ528 Apr 26, 2024
Author

Thanks for the quick reply!

I'm glad to hear I have the right approach of "change code -> study asm output -> repeat". I've run into enough situations where I made my life hard by not using the right tool, so it's nice to hear from someone who has gone before and learn what they discovered.

I was aware of the benefits of static functions and inlining, so I was already using those in my code. But I've never heard of sibling-call optimization. After a bit of googling, it sounds like a way to get 2 functions for the price of one as long as their required stack manipulations are the same. That's really cool!
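
For anyone else who hasn't met the term, a minimal sketch of the pattern might look like the following. The names (`counting_putc`, `counting_ctx_t`, `buf_putc`) are made up for illustration and are not nanoprintf's; the point is that the wrapper's last action is a call with a matching signature, which an optimizing compiler may turn into a plain jump. That transformation is up to the compiler and optimization level, so it is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*putc_fn)(int c, void *ctx);

/* Hypothetical counting wrapper around a user-supplied output callback. */
typedef struct {
  putc_fn pc; /* the real output callback */
  void *ctx;  /* context for the real callback */
  int n;      /* characters written so far */
} counting_ctx_t;

/* Bumps the counter, then hands off to the real callback. Because the call
   is the last thing the function does and nothing happens after it returns,
   the compiler is free to emit it as a tail (sibling) call. */
static void counting_putc(int c, void *ctx) {
  counting_ctx_t *cc = (counting_ctx_t *)ctx;
  ++cc->n;
  cc->pc(c, cc->ctx);
}

/* Example sink used only for this sketch: appends to a fixed buffer. */
typedef struct {
  char *dst;
  size_t len;
  size_t cap;
} buf_ctx_t;

static void buf_putc(int c, void *ctx) {
  buf_ctx_t *b = (buf_ctx_t *)ctx;
  assert(b->dst != NULL);
  if (b->len < b->cap) {
    b->dst[b->len++] = (char)c;
  }
}
```

With this shape, the formatting code calls `counting_putc` everywhere and never has to touch the counter itself.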

Working with reversed data is something I discovered during the writing of my printf function. That really does greatly simplify some logic and streamline the whole process. However, I wasn't able to include the padding while the number was reversed. I created a 34 byte char array to store the reversed number (and any sign), but since the padding could potentially be much larger, I only add that in when I'm printing back to the original buffer.
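
Roughly, the approach described above looks like the sketch below: the sign and digits go into a small reversed scratch buffer, and the padding is only generated while writing to the destination. This is a simplified illustration, not the actual mprintf code; `format_i32_width` is a hypothetical name, and it handles only decimal output with space padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: formats `value` in decimal, right-justified in a field
   of `width` characters. `out` must hold max(width, 11) + 1 bytes.
   Returns the number of characters written, excluding the terminator. */
static size_t format_i32_width(int32_t value, size_t width, char *out) {
  assert(out != NULL);
  char rev[34]; /* scratch for reversed digits plus sign; padding never goes here */
  size_t n = 0;

  /* Negate in unsigned arithmetic so INT32_MIN does not overflow. */
  uint32_t mag = value < 0 ? 0u - (uint32_t)value : (uint32_t)value;
  do {
    rev[n++] = (char)('0' + mag % 10u);
    mag /= 10u;
  } while (mag != 0);
  if (value < 0) {
    rev[n++] = '-'; /* the sign is stored last, so it is emitted first */
  }

  size_t len = 0;
  /* Padding is produced directly in the destination, however wide it is. */
  for (size_t pad = width > n ? width - n : 0; pad > 0; --pad) {
    out[len++] = ' ';
  }
  /* Walk the scratch buffer backwards to restore the natural order. */
  while (n > 0) {
    out[len++] = rev[--n];
  }
  out[len] = '\0';
  return len;
}
```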

> I also got back a bunch of bytes by using array indices instead of pointers, and also by expanding out the various fields of npf_format_spec_t from bits to standalone values. It increased the size of the structure, but let the compiler do simple reads instead of masking and shifting.

I'm surprised to hear that! I was actually just considering swapping my array indices for pointers because I thought the opposite would be true. But it does make sense that could reduce overall size by simplifying access.

> I do agree that nanoprintf is ugly and near-unreadable, and this project is the only time I've made these kinds of tradeoffs (intentionally, anyway...)

I'm not sure I'd go so far as to say it's "ugly"...I would say it's highly optimized for a specific goal, and I think that's really cool! I'm not sure I could ever make something this small while maintaining your list of constraints, so I'm quite impressed.

After looking at my code some more, it seems like a decent chunk of the size is coming from the string reversal and padding logic, but I don't see any obvious optimizations there. I also looked at nanoprintf some more and realized that when it's compiled to support field width format specifiers it also jumps in size from 850 bytes to 1300 bytes, so that made me feel better haha.

While my version of printf has fewer features (I think it's similar to nanoprintf compiled with FIELD_WIDTH_FORMAT and BINARY_FORMAT enabled), I have been able to get it down to 1354 bytes! You are welcome to check it out here: https://github.com/AJ528/mprintf

Thank you for the advice and pointing me towards some new optimizations I can learn. I really appreciate it!

charlesnicholson Apr 27, 2024
Maintainer

Nice! IIRC ~100-200 bytes of nanoprintf with precision + field-width enabled come from the various rules that printf puts in place: if you have this flag and that flag, this one wins. If you have this flag and this flag, ignore them both, etc etc etc. Additionally, some amount of error checking was required so that the entire format-specifier string would just be ignored if the format was invalid. Support for "star" args took up another 50-100 bytes. :(
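
To give a feel for the "this flag beats that flag" rules, here is a small sketch of three of the interactions the C standard specifies for printf: `+` overrides space, `-` overrides `0`, and a precision on an integer conversion overrides `0`. The `flags_t` struct and `resolve_flags` function are hypothetical and are not how nanoprintf stores or resolves its flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parsed-flag record for one conversion specification. */
typedef struct {
  bool left_justify;  /* '-' */
  bool zero_pad;      /* '0' */
  bool plus_sign;     /* '+' */
  bool space_sign;    /* ' ' */
  bool has_precision; /* a ".N" precision was given */
} flags_t;

/* Applies the standard's precedence rules in place. */
static void resolve_flags(flags_t *f, bool is_integer_conv) {
  assert(f != NULL);
  if (f->plus_sign) {
    f->space_sign = false; /* "+" and " " together: " " is ignored */
  }
  if (f->left_justify) {
    f->zero_pad = false; /* "-" and "0" together: "0" is ignored */
  }
  if (is_integer_conv && f->has_precision) {
    f->zero_pad = false; /* integer conversion with precision: "0" is ignored */
  }
}
```

Each rule is only a line or two, but every one of them is a compare and a branch that ends up in the binary.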

Writeback (the %n conversion) is generally acknowledged as a bad idea, but it's in the spec, and precision also exists for integer and string types. These are weird things that I don't generally use, and it hurt seeing the binary size go up as I brought nanoprintf into compliance :)

In general, I'd say "look for duplication, try deleting it, and see if it helps!" When I add new features, I see if there's any way to "golf" (or pervert) existing code into supporting the new feature, or I'll try to weave it in in parallel, like the binary value support stuff.

Good luck!
