-
Don't : You first need to have measurable performance goals ("as fast as you can" is not an acceptable goal). If you hit these goals, go do something with better business value.
-
Don't ... yet : It's much easier and cheaper to fix the problem with hardware. Get a faster CPU, faster network ... Developer time & money are the most expensive resources in software development. Also note that optimized code is much harder to maintain.
-
Profile before optimizing : Bottlenecks will surprise you. Don't guess where the code spends its time; use a profiler and see.
-
Algorithms & Data structures Rule : They will usually give you much better performance than any other trick.
-
Know thy Hardware : CPU affinity, CPU cache, memory, latency numbers ... For example: cache-oblivious algorithms.
-
Include performance in your process : Design & code reviews, run & compare benchmarks on CI ...
-
Memory Allocation : Avoid allocations where possible (see the design of io.Reader). Pre-allocate if you already know the size. Be careful of small slices that keep large amounts of memory alive (`s := make([]int, 1000000)[:3]`).
-
`defer` might slow you down : However, consider the advantages.
-
Strings are immutable : Every concatenation allocates a new string. Use bytes.Buffer or strings.Builder when building a string piece by piece.
-
Know when a goroutine is going to stop : Avoid goroutine leaks. Use context for cancellation/timeouts.
-
Cgo calls are expensive : Group them together in one `cgo` call.
-
Channels can be slower than `sync.Mutex` : However, they are much easier to work with.
-
Interface calls are more expensive than struct calls : You can extract the concrete value from the interface first. However, the code becomes less generic.
-
Use `go run -gcflags='-m -l'` : You'll see what escapes to the heap (`-m` prints the compiler's optimization decisions, `-l` disables inlining).
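The memory allocation tip in the list above can be illustrated with a minimal sketch. The function names `squares` and `firstN` are made up for this example. It shows pre-allocating when the size is known, and copying a small slice out of a large one so the large backing array is no longer held.

```go
package main

import "fmt"

// squares pre-allocates the result because the final length is known up front.
// (Hypothetical helper, for illustration only.)
func squares(n int) []int {
	out := make([]int, 0, n) // one allocation instead of repeated growth in append
	for i := 0; i < n; i++ {
		out = append(out, i*i)
	}
	return out
}

// firstN copies the first n elements of src into a new slice, so the result
// does not share (and does not keep alive) src's backing array.
// (Hypothetical helper, for illustration only.)
func firstN(src []int, n int) []int {
	out := make([]int, n)
	copy(out, src[:n])
	return out
}

func main() {
	big := make([]int, 1000000)
	view := big[:3]         // still references the whole 1,000,000-element array
	small := firstN(big, 3) // owns a 3-element array only

	fmt.Println(cap(view), cap(small)) // 1000000 3

	s := squares(10)
	fmt.Println(len(s), cap(s)) // 10 10
}
```

Whether the copy is worth it depends on how long the small slice lives compared to the large one. Measure before changing code.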
- So you wanna go fast
- Performance tuning workshop.
- Quick look at some compiler optimization
- Performance Mantras
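For the tip "Know when a goroutine is going to stop", here is a minimal sketch of cancellation with `context`. The names `produce` and `take` are hypothetical. The point is that the producer goroutine always has a way out through `ctx.Done()`, so it does not leak once the consumer stops reading.

```go
package main

import (
	"context"
	"fmt"
)

// produce sends increasing integers on the returned channel until ctx is
// cancelled. Without the ctx.Done() case, the goroutine would block forever
// on the send once nobody is receiving.
func produce(ctx context.Context) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for i := 0; ; i++ {
			select {
			case out <- i:
			case <-ctx.Done():
				return
			}
		}
	}()
	return out
}

// take reads n values from a producer and then cancels it.
func take(n int) []int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // signals the producer goroutine to stop when we return

	ch := produce(ctx)
	vals := make([]int, 0, n)
	for i := 0; i < n; i++ {
		vals = append(vals, <-ch)
	}
	return vals
}

func main() {
	fmt.Println(take(3)) // [0 1 2]
}
```

For a time limit instead of explicit cancellation, `context.WithTimeout` can replace `context.WithCancel` with the same `ctx.Done()` check in the goroutine.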
{::comment}
-
Don't do it : Can we avoid doing the calculation at all? For example: Do we need to parse the input or just pass it as-is?
-
Do it, but don't do it again : Can we use memoization/caching? Parse objects once at the "edges" and use the parsed objects internally.
-
Do it less : Do we need to run this every millisecond? Can every second work? Can we use only a subset of the data?
-
Do it later : Can we make this API call async?
-
Do it when they're not looking : Can we run the calculation in the background while doing another task?
-
Do it concurrently : Will concurrency help here? Consider Amdahl's law.
-
Do it cheaper : Can we use a map here instead of a slice? Research available algorithms and data structures and know their complexity. Test them on your data.
{:/comment}