chore(redactors): memory consumption improvements #1332

Merged

Conversation

@banjoh (Member) commented Sep 11, 2023

Description, Motivation and Context

Redactor memory optimisations

  • Reduce the buffer size allocated for the io reader scanner - this was the smoking gun
  • Prefer working with []byte over strings. Strings are immutable and lead to more memory allocations
  • Use a regex cache to store compiled regular expressions. This led to only a small reduction in clock cycles, but it is still a worthwhile optimisation (a minimal sketch of such a cache follows this list)
  • Use bufio.Scanner where possible. Reading raw bytes until a newline is reached risks using up too much heap memory, leading to OOMs. This is possible when a large enough file without newlines is read. With bufio.Scanner, there is a 10MB limit
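A minimal sketch of what such a regex cache could look like (the identifiers regexCache and compileRegex are illustrative, not the actual names used in this PR):

package redact

import (
	"regexp"
	"sync"
)

// Hypothetical cache of compiled expressions, guarded by a mutex since
// redactors run concurrently across files.
var (
	regexCacheMu sync.Mutex
	regexCache   = map[string]*regexp.Regexp{}
)

// compileRegex returns a cached compiled regexp, compiling and storing it on first use.
func compileRegex(pattern string) (*regexp.Regexp, error) {
	regexCacheMu.Lock()
	defer regexCacheMu.Unlock()

	if re, ok := regexCache[pattern]; ok {
		return re, nil
	}

	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	regexCache[pattern] = re
	return re, nil
}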

Fixes: #1331

Checklist

  • New and existing tests pass locally with introduced changes.
  • Tests for the changes have been added (for bug fixes / features)
  • The commit message(s) are informative and highlight any breaking changes
  • Any documentation required has been added/updated. For changes to https://troubleshoot.sh/ create a PR here

Does this PR introduce a breaking change?

  • Yes
  • No

@banjoh banjoh requested review from a team as code owners September 11, 2023 17:51
@banjoh banjoh marked this pull request as draft September 11, 2023 17:52
@banjoh (Member, Author) commented Sep 13, 2023

Improvements analysis

Environment used

  • Decent Mac laptop
  • Empty k3d cluster
  • Spec collecting cluster resources only
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: sample
spec:
  collectors:
    - clusterResources: {}

Commands used

  • support-bundle spec.yaml --debug --memprofile=mem.prof
  • go tool pprof -http=":8001" -alloc_space mem.prof

Results

I ran these tests several times to reduce the error margin caused by other resource consumption on my laptop.

  • Original run with code in the main branch
    (screenshot: orig)
  • Run after reducing the buffer size to 4096 (4KB). This is the default size used by other std libraries such as bufio
    (screenshot: with_buffer_limit)
  • Run after the string to []byte data type change
    (screenshot: final)

Notes

  • The buffer size change made the largest impact (a sketch of the bounded scanner buffer follows these notes)
  • The string to []byte change reduced memory consumption by about 11-14% after a number of runs. The sample above shows a 13% reduction (137MB -> 119MB of total memory)
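A hedged sketch of the bounded-buffer scanning described above, using the 4KB initial buffer and the 10MB cap discussed in this PR (the constant and function names are illustrative):

package redact

import (
	"bufio"
	"io"
)

const (
	initialBufSize = 4096             // 4KB starting buffer, the same default bufio uses
	maxBufSize     = 10 * 1024 * 1024 // 10MB cap on any single line
)

// scanLines processes input one line at a time with a bounded buffer.
func scanLines(r io.Reader, fn func(line []byte) error) error {
	scanner := bufio.NewScanner(r)
	scanner.Buffer(make([]byte, initialBufSize), maxBufSize)

	for scanner.Scan() {
		if err := fn(scanner.Bytes()); err != nil {
			return err
		}
	}
	// Err returns bufio.ErrTooLong if a single line exceeds maxBufSize.
	return scanner.Err()
}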

@banjoh banjoh force-pushed the em/redactors-memory-consumption branch from 202acc1 to a53acb4 Compare September 13, 2023 12:35
@banjoh banjoh marked this pull request as ready for review September 13, 2023 14:04
@diamonwiggins (Member) commented Sep 13, 2023

@banjoh Working to get myself educated on these changes, so apologies in advance if none of the following makes sense :) A few questions so far:

  1. With regards to bufio.Scanner vs. bufio.Reader, could we have accomplished the same result by keeping bufio.Reader, but just lowering the buffer size? Is the major difference here that bufio.Scanner delimits at every newline and thus there's more of an implicit limit?

Use bufio.Scanner where possible. Reading raw bytes until a newline is reached risks using up too much heap memory, leading to OOMs. This is possible when a large enough file without newlines is read. With bufio.Scanner, there is a 10MB limit

Just to make sure I understand, if there's a large file without newlines it'll fall back to the buffer size, same as bufio.Reader, correct? I wonder if what's most important here is having the correct buffer size set so as to optimize for memory consumption in our use case, instead of relying on newlines? Would that result in the most complete solution here without any edge cases?

Also, you mentioned:

Prefer working with []byte over strings. Strings are immutable and lead to more memory allocations

With Reader you wouldn't need to cast from a string into bytes to begin with right?

  2. You mentioned that bufio.Reader was the smoking gun, but it doesn't seem like we've had changes to that code for a while? Any details on why we're just seeing this now, or has this been an unresolved problem for a while?

  3. How is current performance of redaction impacted by these changes? Are we knowingly sacrificing performance here in order to lower memory consumption? Was this a case of the increased memory consumption not being worth the potential increase in speed of redaction, or were we leaking memory?

@arcolife (Contributor) commented Sep 14, 2023

FWIW, I see these pprof cumulative figures on M2 pro:

pre-fix (main):

  • bufio.(*Scanner).Text: 12.01GB
  • redact.readLine: 11.22GB
  • SingleLineRedactor: 24.64GB
  • MultiLineRedactor: 17.11GB
# ./bin/support-bundle test.yaml --debug --memprofile=mem.prof
 Collecting support bundle ⠇ [apiserver-audit-logs] Running host collector...
support-bundle-2023-09-14T03_21_59.tar.gz

============ Collectors summary =============
Succeeded (S), eXcluded (X), Failed (F)
=============================================
apiserver-audit-logs (S) : 359ms

============ Redactors summary =============
Host collectors : 17,366ms

============= Analyzers summary =============
Succeeded (S), eXcluded (X), Failed (F)
=============================================
No analyzers executed

Duration: 17,726ms

# go tool pprof -alloc_space mem.prof 
# (pprof) top
Showing nodes accounting for 41.38GB, 99.05% of 41.78GB total
Dropped 193 nodes (cum <= 0.21GB)
Showing top 10 nodes out of 19
      flat  flat%   sum%        cum   cum%
   17.31GB 41.43% 41.43%    17.31GB 41.43%  strings.(*Builder).grow
   12.01GB 28.75% 70.18%    12.01GB 28.75%  bufio.(*Scanner).Text (inline)
   11.22GB 26.86% 97.04%    11.22GB 26.86%  github.com/replicatedhq/troubleshoot/pkg/redact.readLine
    0.37GB  0.88% 97.93%     0.37GB  0.88%  fmt.(*buffer).writeString (inline)
    0.27GB  0.64% 98.56%    24.64GB 58.98%  github.com/replicatedhq/troubleshoot/pkg/redact.(*SingleLineRedactor).Redact.func1
    0.14GB  0.32% 98.88%     0.28GB  0.68%  regexp.(*Regexp).ReplaceAllString
    0.07GB  0.17% 99.05%    17.11GB 40.95%  github.com/replicatedhq/troubleshoot/pkg/redact.(*MultiLineRedactor).Redact.func1
         0     0% 99.05%     0.37GB  0.88%  fmt.(*fmt).fmtS
         0     0% 99.05%     0.37GB  0.88%  fmt.(*fmt).padString
         0     0% 99.05%     0.37GB  0.88%  fmt.(*pp).doPrintf

post-fix (feature branch):

  • SingleLineRedactor: 12.14GB
  • MultiLineRedactor: 12.33GB
  • redact.readLine / getNextTwoLines: 5.85GB
# ./bin/support-bundle test.yaml --debug --memprofile=mem.prof
 Collecting support bundle ⠼ [apiserver-audit-logs] Running host collector...
support-bundle-2023-09-14T03_08_42.tar.gz

============ Collectors summary =============
Succeeded (S), eXcluded (X), Failed (F)
=============================================
apiserver-audit-logs (S) : 303ms

============ Redactors summary =============
Host collectors : 16,986ms

============= Analyzers summary =============
Succeeded (S), eXcluded (X), Failed (F)
=============================================
No analyzers executed

Duration: 17,290ms

# go tool pprof -alloc_space mem.prof 
# (pprof) top
Showing nodes accounting for 24.22GB, 98.91% of 24.49GB total
Dropped 132 nodes (cum <= 0.12GB)
Showing top 10 nodes out of 13
      flat  flat%   sum%        cum   cum%
   17.10GB 69.81% 69.81%    17.17GB 70.10%  bytes.ToLower
    5.85GB 23.91% 93.71%     5.85GB 23.91%  github.com/replicatedhq/troubleshoot/pkg/redact.readLine
    0.71GB  2.90% 96.62%    12.33GB 50.36%  github.com/replicatedhq/troubleshoot/pkg/redact.(*MultiLineRedactor).Redact.func1
    0.41GB  1.69% 98.31%     0.41GB  1.69%  regexp.(*bitState).push (inline)
    0.15GB   0.6% 98.91%     0.15GB   0.6%  regexp.(*Regexp).replaceAll
         0     0% 98.91%    12.14GB 49.58%  github.com/replicatedhq/troubleshoot/pkg/redact.(*SingleLineRedactor).Redact.func1
         0     0% 98.91%     5.85GB 23.91%  github.com/replicatedhq/troubleshoot/pkg/redact.getNextTwoLines
         0     0% 98.91%     0.54GB  2.22%  regexp.(*Regexp).Match (inline)
         0     0% 98.91%     0.15GB   0.6%  regexp.(*Regexp).ReplaceAll
         0     0% 98.91%     0.53GB  2.18%  regexp.(*Regexp).backtrack

In either case:

  • The process' memory jumped by only a couple hundred MBs
  • Duration is similar pre/post fix
  • On a c2-standard-8 GCP instance, the duration is about 37,000ms

Data used:

# cat test.yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: default
spec:
  hostCollectors:
    - copy:
        collectorName: "apiserver-audit-logs"
        path: ./log/k8s-audit*
# ls log
-rw-------@  1 arcolife  staff   100M Sep 14 02:29 k8s-audit-2023-09-06T07-32-52.651.log
-rw-------@  1 arcolife  staff   100M Sep 14 02:29 k8s-audit-2023-09-06T10-52-44.261.log
-rw-------@  1 arcolife  staff   100M Sep 14 02:29 k8s-audit-2023-09-06T14-07-35.432.log
-rw-------@  1 arcolife  staff   100M Sep 14 02:29 k8s-audit-2023-09-06T17-28-43.581.log
-rw-------@  1 arcolife  staff   100M Sep 14 02:29 k8s-audit-2023-09-07T07-14-41.214.log
-rw-------@  1 arcolife  staff   100M Sep 14 02:29 k8s-audit-2023-09-07T13-30-36.751.log
-rw-------@  1 arcolife  staff    49M Sep 14 02:29 k8s-audit.log

@DexterYan (Member) commented

I did this one on an AWS EC2 machine with 4 CPUs and 16GB RAM, using the following spec:

apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: default
spec:
  hostCollectors:
    - copy:
        collectorName: "apiserver-audit-logs"
        path: /var/log/apiserver/k8s-audit*

main branch:
(screenshot)
Process peak usage:
(screenshot)
mem.prof: main-mem.prof.txt

improved branch:
(screenshot)
Process peak usage:
(screenshot)
mem.prof: improved-mem.prof.txt

@DexterYan (Member) commented

I think we need to read a whole line without breaking it up for the regex, otherwise the regex search is not precise. Reading line by line with bufio.Scanner helps with that. However, a large file on a single line is quite difficult to handle in the current approach.

@banjoh (Member, Author) commented Sep 14, 2023

@diamonwiggins

With regards to bufio.Scanner vs. bufio.Reader, could we have accomplished the same result by keeping bufio.Reader, but just lowering the buffer size? Is the major difference here that bufio.Scanner delimits at every newline and thus there's more of an implicit limit?

The difference between bufio.Scanner and bufio.Reader is that bufio.Scanner accepts a maximum size its internal buffer can grow to. This gives us the ability to cap consumption when a large file has no newlines breaking it up. With bufio.Reader we could achieve the same by counting how much data we read ourselves, but that is exactly what bufio.Scanner already does.

Just to make sure I understand, if there's a large file without newlines it'll fall back to the buffer size same as bufio.Reader correct?

No. Reading such a file will result in an error along the lines of a line in '<somepath>' file exceeded 10485760 bytes max length. That maximum is hardcoded to 10MB. Minified JSON files, for example, have no newlines.
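A minimal sketch of how that cap can surface as an error, assuming the error wrapping is done around bufio.ErrTooLong (the function name and exact message wrapping are illustrative, not the actual code in this PR):

package redact

import (
	"bufio"
	"errors"
	"fmt"
	"io"
)

// maxLineLen mirrors the 10MB cap discussed above; 10 * 1024 * 1024 == 10485760.
const maxLineLen = 10 * 1024 * 1024

// scanFile is illustrative only: a newline-free file larger than the cap makes
// the scanner fail with bufio.ErrTooLong, which is then wrapped with the file path.
func scanFile(r io.Reader, path string) error {
	scanner := bufio.NewScanner(r)
	scanner.Buffer(make([]byte, 4096), maxLineLen)

	for scanner.Scan() {
		// redaction of scanner.Bytes() would happen here
	}

	if err := scanner.Err(); err != nil {
		if errors.Is(err, bufio.ErrTooLong) {
			return fmt.Errorf("a line in '%s' file exceeded %d bytes max length", path, maxLineLen)
		}
		return err
	}
	return nil
}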

I wonder if what's most important here is having the correct buffer size set as to optimize for memory consumption in our use case instead of relying on newlines?

Our redaction implementation is designed to process one line at a time. This optimization addresses huge single-line inputs.

With Reader you wouldn't need to cast from a string into bytes to begin with right?

No. The previous implementation converts bytes to strings, which leads to new string objects being created. I'm proposing using the bytes as they are read.
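A hedged illustration of that difference (the pattern and replacement below are made up for the example): operating on scanner.Bytes() keeps the data as []byte, while scanner.Text() plus the string-based regexp APIs allocate new strings for every line.

package redact

import (
	"bufio"
	"regexp"
)

var redactRe = regexp.MustCompile(`password=\S+`) // illustrative pattern

// redactLineString allocates: Text() copies the scanned bytes into a new string,
// and ReplaceAllString builds yet another string for the result.
func redactLineString(scanner *bufio.Scanner) string {
	line := scanner.Text()
	return redactRe.ReplaceAllString(line, "password=REDACTED")
}

// redactLineBytes stays in []byte: the scanned slice is used directly
// (it is only valid until the next Scan call) and the byte-oriented regexp
// API is used, avoiding the intermediate string copies.
func redactLineBytes(scanner *bufio.Scanner) []byte {
	line := scanner.Bytes()
	return redactRe.ReplaceAll(line, []byte("password=REDACTED"))
}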

You mentioned that bufio.Reader was the smoking gun

I meant the buffer created here. This line gets called multiple times, and each time it allocates 1MB of memory. Using a bundle with 100 files, 1800MBs will be allocated "instantly" (depending on how fast the goroutines start up).

How is current performance of redaction impacted by these changes? Are we knowingly sacrificing performance here in order to lower memory consumption?

In my tests, it doesn't look like the performance is getting impacted. I can provide some measurements here.

Was this a case of the increased memory consumption not being worth the potential increase in speed of redaction, or were we leaking memory?

We were a bit too greedy with the memory we reserved as buffers for data read in from files.

@diamonwiggins (Member) commented

Our redaction implementation is designed to process one line at a time. This optimization addresses huge single-line inputs.

Thanks for the explanation on everything. This in particular was helpful to understand.

diamonwiggins
diamonwiggins previously approved these changes Sep 14, 2023
laverya
laverya previously approved these changes Sep 14, 2023

// io.WriteString would be nicer, but scanner strips new lines
fmt.Fprintf(writer, "%s\n", clean)
_, err = writer.Write(append(clean, '\n')) // Append newline since scanner strips it

I think if the line is really long and you use append(line, '\n') you will allocate up to another 10MB. Wouldn't it just be better to write twice?

Suggested change
_, err = writer.Write(append(clean, '\n')) // Append newline since scanner strips it
_, err = writer.Write(clean)
_, err = writer.Write([]byte{'\n'}) // Append newline since scanner strips it

This goes for all occurrences of append below.

	if err != nil {
		return
	}
	_, err = writer.Write(append(clean, '\n'))

Should we also change it to write twice here?


_, err = writer.Write(append(clean, '\n')) // Append newline since scanner strips it

Also here, maybe we need to change this to write twice.

@arcolife (Contributor) commented

Thus far, confirming smoke tests ✅ for replicatedhq/kots#4031 with the latest fixes.

@diamonwiggins diamonwiggins merged commit 86279b4 into replicatedhq:main Sep 15, 2023
22 checks passed

Successfully merging this pull request may close these issues.

Redactors in v0.71.1 consume too much memory