Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Go frontend memory optimisation #64

Merged
merged 3 commits into from
Jul 30, 2024
Merged

Conversation

pandurangpatil
Copy link
Member

  1. goastgen version upgrade, which fixes exclusion rule handling for path as well and graceful handling of mod file parsing in the even module name is not present. (Earlier changes to include packages with include regex, didn't filter all the packages and had a limitation to include all sub folders/packages if root package is being used. However, that wasn't the case. Hence, we introduced a mechanism to check the entire package folder path with goastgen with a separate flag -includePacakges and updated integration.)
  2. Changed parsing of AST json inside AST Creator itself.
  3. Made changes in download dependency processing to generate AST only for the used packages and subsequently only processing used packages. There was a change required inside goastgen utility to support the -include input option.

Restructured the code to isolate the common handling for main source code pre-processing to build the cache, downloaded dependencies and AST creation.

With this change, we now have created separate traits which handle, main source code pre-processing, processing of downloaded dependencies for caching and AST Creation along with their respective passes.

With this, we get the following benefits.

  1. Clarity in the code as the processing is now isolated in different passes and traits.
  2. Earlier we held the AstCreator instances for each file at the beginning of CPG generation while pre-processing the AST for building the metadata cache which can be used and referred to by the AST creator to resolve the types. Resuing that instance at the last stage inside AstCreationPass. Which blocks large amounts of memory which is noticeable with a big code base. With this change, we are creating an AstCreator instance and destroying it once the pass is done with its work.

Earlier, if we pass the directory path which contains multiple go modules. Processing was done with all the .go files mapped to the single go.mod file.

With this change, we have segregated the processing by first isolating all the files mapped to respective go.mod. This will make sure to clean up the memory footprint after every module is being processed. However, this will increase the processing time when used with a download dependency as it will process all the go.mod files by identifying and processing used dependencies.

tests covering multiple module scenarios

Original PRS

joernio#4724
joernio#4703
joernio#4678
joernio#4668
joernio#4475

1. `goastgen` version upgrade, which fixes exclusion rule handling for
path as well and graceful handling of mod file parsing in the even
module name is not present. (Earlier changes to include packages with
include regex, didn't filter all the packages and had a limitation to
include all sub folders/packages if root package is being used. However,
that wasn't the case. Hence, we introduced a mechanism to check the
entire package folder path with goastgen with a separate flag
-includePacakges and updated integration.)
2. Changed parsing of AST json inside AST Creator itself.
3. Made changes in download dependency processing to generate AST only
for the used packages and subsequently only processing used packages.
There was a change required inside `goastgen` utility to support the
`-include` input option.

Restructured the code to isolate the common handling for main source
code pre-processing to build the cache, downloaded dependencies and AST
creation.

With this change, we now have created separate traits which handle, main
source code pre-processing, processing of downloaded dependencies for
caching and AST Creation along with their respective passes.

With this, we get the following benefits.

1. Clarity in the code as the processing is now isolated in different
passes and traits.
2. Earlier we held the AstCreator instances for each file at the
beginning of CPG generation while pre-processing the AST for building
the metadata cache which can be used and referred to by the AST creator
to resolve the types. Resuing that instance at the last stage inside
AstCreationPass. Which blocks large amounts of memory which is
noticeable with a big code base. With this change, we are creating an
AstCreator instance and destroying it once the pass is done with its
work.

Earlier, if we pass the directory path which contains multiple go
modules. Processing was done with all the .go files mapped to the single
go.mod file.

With this change, we have segregated the processing by first isolating
all the files mapped to respective go.mod. This will make sure to clean
up the memory footprint after every module is being processed. However,
this will increase the processing time when used with a download
dependency as it will process all the go.mod files by identifying and
processing used dependencies.

tests covering multiple module scenarios

Original PRS

joernio#4724
joernio#4703
joernio#4678
joernio#4668
joernio#4475
Copy link
Member

@khemrajrathore khemrajrathore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pandurangpatil pandurangpatil merged commit 7ad4a4e into master Jul 30, 2024
15 checks passed
@pandurangpatil pandurangpatil deleted the privado-fork-go-opt branch July 30, 2024 10:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants