-
Notifications
You must be signed in to change notification settings - Fork 0
/
fastfirst.slide
317 lines (179 loc) · 11.7 KB
/
fastfirst.slide
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
Writing Fast Code the First Time
Brian Ketelsen
@bketelsen
* Writing Code To Be Fast The First Time
In this section we’ll build on our learnings from profiling and benchmarking and show advanced techniques to create code that is fast the first time. You’ll learn how to minimize the garbage created in your application, how to build reusable pools of expensive resources, and other advanced tricks to make your applications faster and avoid problems before they happen.
- Arrays and Slices
- Make GC Faster by Creating Less Garbage
- Pools
- Avoiding Unnecessary Conversions
- Use the Stack
* Memory management and GC tuning
* Memory management and GC tuning
Go is a garbage collected language. This is a design principle, it will not change.
As a garbage collected language, the performance of Go programs is often determined by their interaction with the garbage collector.
The Go GC favors lower latency over maximum throughput; it moves some of the allocation cost to the mutator to reduce the cost of cleanup later.
Next to your choice of algorithms, memory consumption is the most important factor that determines the performance and scalability of your application.
This section discusses the operation of the garbage collector, how to measure the memory usage of your program and strategies for lowering memory usage if garbage collector performance is a bottleneck.
* Garbage collector design
The design of the Go GC has changed over the years
- Go 1.0, stop the world mark sweep collector based heavily on tcmalloc.
- Go 1.3, fully precise collector, wouldn't mistake big numbers on the heap for pointers.
- Go 1.5, new GC design, focusing on _latency_ over _throughput_.
- Go 1.6, GC improvements, handling larger heaps with lower latency.
- Go 1.7, small GC improvements, mainly refactoring.
- Go 1.8, ROC collector
* Garbage collector world view
The purpose of a garbage collector is to present the illusion that there is an infinite amount of memory.
You may disagree with this statement, but this is the base assumption of how garbage collector designers think.
# Include McCarthy quote about infinite free store.
A stop the world, mark sweep GC is the most efficient in terms of total run time; good for batch processing, simulation, etc.
The Go GC favors _lower_latency_ over _maximum_throughput_; it moves some of the allocation cost to the mutator to reduce the cost of cleanup later.
The Go GC is designed for low latency servers and interactive applications.
* Garbage collector monitoring
A simple way to obtain a general idea of how hard the garbage collector is working is to enable the output of GC logging.
These stats are always collected, but normally suppressed, you can enable their display by setting the `GODEBUG` environment variable.
% env GODEBUG=gctrace=1 godoc -http=:8080
gc 1 @0.017s 8%: 0.021+3.2+0.10+0.15+0.86 ms clock, 0.043+3.2+0+2.2/0.002/0.009+1.7 ms cpu, 5->6->1 MB, 4 MB goal, 4 P
gc 2 @0.026s 12%: 0.11+4.9+0.12+1.6+0.54 ms clock, 0.23+4.9+0+3.0/0.50/0+1.0 ms cpu, 4->6->3 MB, 6 MB goal, 4 P
gc 3 @0.035s 14%: 0.031+3.3+0.76+0.17+0.28 ms clock, 0.093+3.3+0+2.7/0.012/0+0.84 ms cpu, 4->5->3 MB, 3 MB goal, 4 P
gc 4 @0.042s 17%: 0.067+5.1+0.15+0.29+0.95 ms clock, 0.20+5.1+0+3.0/0/0.070+2.8 ms cpu, 4->5->4 MB, 4 MB goal, 4 P
gc 5 @0.051s 21%: 0.029+5.6+0.33+0.62+1.5 ms clock, 0.11+5.6+0+3.3/0.006/0.002+6.0 ms cpu, 5->6->4 MB, 5 MB goal, 4 P
gc 6 @0.061s 23%: 0.080+7.6+0.17+0.22+0.45 ms clock, 0.32+7.6+0+5.4/0.001/0.11+1.8 ms cpu, 6->6->5 MB, 7 MB goal, 4 P
gc 7 @0.071s 25%: 0.59+5.9+0.017+0.15+0.96 ms clock, 2.3+5.9+0+3.8/0.004/0.042+3.8 ms cpu, 6->8->6 MB, 8 MB goal, 4 P
The trace output gives a general measure of GC activity.
DEMO: Show `godoc` with `GODEBUG=gctrace=1` enabled
* Garbage collector monitoring (cont.)
Using `GODEBUG=gctrace=1` is good when you _know_ there is a problem, but for general telemetry on your Go application I recommend the `net/http/pprof` interface.
import _ "net/http/pprof"
Importing the `net/http/pprof` package will register a handler at `/debug/pprof` with various runtime metrics, including:
- A list of all the running goroutines, `/debug/pprof/heap?debug=1`.
- A report on the memory allocation statistics, `/debug/pprof/heap?debug=1`.
*Warning*: `net/http/pprof` will register itself with your default `http.ServeMux`.
Be careful as this will be visible if you use `http.ListenAndServe(address,`nil)`.
DEMO: `godoc`-http=:8080`, show `/debug/pprof`.
* Garbage collector tuning
The Go runtime provides one environment variable to tune the GC, `GOGC`.
The formula for GOGC is as follows.
goal = reachable * (1 + GOGC/100)
For example, if we currently have a 256mb heap, and `GOGC=100` (the default), when the heap fills up it will grow to
512mb = 256mb * (1 + 100/100)
- Values of `GOGC` greater than 100 causes the heap to grow faster, reducing the pressure on the GC.
- Values of `GOGC` less than 100 cause the heap to grow slowly, increasing the pressure on the GC.
The default value of 100 is _just a guide_. you should choose your own value _after_profiling_your_application_with_production_loads_.
* Reduce allocations
Make sure your APIs allow the caller to reduce the amount of garbage generated.
Consider these two Read methods
func (r *Reader) Read() ([]byte, error)
func (r *Reader) Read(buf []byte) (int, error)
The first Read method takes no arguments and returns some data as a `[]byte`. The second takes a `[]byte` buffer and returns the amount of bytes read.
The first Read method will _always_ allocate a buffer, putting pressure on the GC. The second fills the buffer it was given.
.link https://golang.org/pkg/io/#ReadAtLeast
.link https://golang.org/pkg/io/#ReadFull
* Reduce allocations (cont.)
In this example, `Conn.Loop` can run forever without generating garbage.
.code fastfirst/includes/ioloop.go /START OMIT/,/END OMIT/
* strings and []bytes
In Go `string` values are immutable, `[]byte` are mutable.
Most programs prefer to work `string`, but most IO is done with `[]byte`.
Avoid `[]byte` to string conversions wherever possible, this normally means picking one representation, either a `string` or a `[]byte` for a value. Often this will be `[]byte` if you read the data from the network or disk.
The [[https://golang.org/pkg/bytes/][`bytes`]] package contains many of the same operations—`Split`, `Compare`, `HasPrefix`, `Trim`, etc—as the [[https://golang.org/pkg/strings/][`strings`]] package.
Under the hood `strings` uses same assembly primitives as the `bytes` package.
* Using []byte as a map key
It is very common to use a `string` as a map key, but often you have a `[]byte`.
The compiler implements a specific optimisation for this case
var m map[string]string
v, ok := m[string(bytes)]
This will avoid the conversion of the byte slice to a string for the map lookup. This is very specific, it won't work if you do something like
key := string(bytes)
val, ok := m[key]
* Avoid string concatenation
Go strings are immutable. Concatenating two strings generates a third.
Avoid string concatenation by appending into a `[]byte` buffer.
_Before:_
s := request.ID
s += " " + client.Address().String()
s += " " + time.Now().String()
return s
_After:_
b := make([]byte, 0, 40) // guess
b = append(b, request.ID...)
b = append(b, ' ')
b = append(b, addr.String()...)
b = append(b, ' ')
b = time.Now().AppendFormat(b, "2006-01-02 15:04:05.999999999 -0700 MST")
return string(b)
DEMO: `go`test`-bench=.`./concat` // b.ReportAllocs() shows memory allocations
* Preallocate slices if the length is known
Append is convenient, but wasteful.
Slices grow by doubling up to 1024 elements, then by approximately 25% after that. What is the capacity of `b` after we append one more item to it?
.play fastfirst/includes/grow.go /START OMIT/,/END OMIT/
If you use the append pattern you could be copying a lot of data and creating a lot of garbage.
If know know the length of the slice beforehand, then pre-allocate the target to avoid copying and to make sure the target is exactly the right size.
* Preallocate slices if the length is known (cont.)
_Before:_
var s []string
for _, v := range fn() {
s = append(s, v)
}
return s
_After:_
vals := fn()
s := make([]string, len(vals))
for i, v := range vals {
s[i] = v
}
return s
* Slice Tidbit
Slice Optimization:
//s := []string{} // not this
var s []string // this!
for t := range things {
s = append(s, t)
}
Creating `s` without initializing it will never allocate memory if there is nothing in the `things` collection.
Creating `s` with an initialization will *always* allocate memory. Optimize for the best case and use the `var s []string` form.
* Using sync.Pool
The `sync` package comes with a `sync.Pool` type which is used to reuse common objects.
`sync.Pool` has no fixed size or maximum capacity. You add to it and take from it until a GC happens, then it is emptied unconditionally.
.code fastfirst/includes/pool.go /START OMIT/,/END OMIT/
*Warning*: `sync.Pool` is not a cache. It can and will be emptied _at_any_time_.
Do not place important items in a `sync.Pool`, they will be discarded.
* Rolling Your Own Pool
It's easier to make a safe pool using buffered channels.
Start with a type that represents your pool:
.code fastfirst/includes/pool/pool.go /START OMIT/,/END OMIT/
* Rolling Your Own Pool
Then create a New() function that creates a new pool
.code fastfirst/includes/pool/pool.go /STARTNEW OMIT/,/ENDNEW OMIT/
The NewPool() function takes an integer parameter that represents the maximum size of the pool. That way the caller can control the size of the pool.
* Rolling Your Own Pool
To take an object from the pool, use Borrow():
.code fastfirst/includes/pool/pool.go /STARTBORROW OMIT/,/ENDBORROW OMIT/
Borrow uses all the properties of channels that we love. It tries to read a from the channel, but creates a new object if there's nothing to read.
* Rolling Your Own Pool
To return an object to the pool, use Return(), preferably in a `defer`:
.code fastfirst/includes/pool/pool.go /STARTRETURN OMIT/,/ENDRETURN OMIT/
Return tries to send the object back on the channel, and discards it if it can't, because the pool is at max capacity.
* Rolling Your Own Pool
Where does this make sense?
- Things that have costly setup, or large memory footprint
- Encoders/Decoders with their own internal buffer
- Rate-limiting the number of outbound connections to an external service
* Use the Stack, Luke
Memory that is allocated on the heap is more costly to collect. Memory that is allocated on the stack is less costly and easier to collect.
In the current compilers, if a variable has its address taken, that variable is
a candidate for allocation on the heap. However, a basic escape analysis
recognizes some cases when such variables will not live past the return from
the function and can reside on the stack.
.link https://golang.org/doc/faq
* Use the Stack, Luke
You can help keep things on the stack by avoiding making pointers when they aren't needed. It's less pressure on the garbage collector to pass a value to a function *if*the*fuction*won't*be*mutating*the*value*.
You can't control this 100%. But you can sway things in your favor by avoiding pointers unless they're needed.
* Use the Stack, Luke
You can see what the compiler will do in a specific instance by calling the `go` compiler with `-gcflags '-m'` which will print verbose output of escape analysis information.
.link fastfirst/includes/gogrep.txt
* Exercises
src/fastfirst/exercises/concat
Run go test -bench=. and benchmark the various concatenation strategies to see the difference between string manipulation and byte buffers.