java.util.ArrayList$Itr error when using the orchestration pipeline #745

Closed
BraisVQ opened this issue Oct 25, 2021 · 15 comments · Fixed by #750 or #756
Labels
bug Something isn't working

Comments

@BraisVQ
Contributor

BraisVQ commented Oct 25, 2021

Describe the bug
We have a project that uses a huge amount of resources for its Jenkins master instance (20 GB of memory, with JVM options -Xms10g -Xmx18g) in order to avoid this error, but it still appears from time to time when running developer previews.
I have only seen this when document generation is enabled.

To Reproduce
Steps to reproduce the behavior:

  1. Make use of the orchestration pipeline to deploy some components and enable document generation
  2. Keep running developer previews without increasing Jenkins resources until you see this error

Expected behavior
The execution of the orchestration pipeline with document generation should not fail

Screenshots
Resource usage when the error happened:
[screenshot "captura"]

Log where the error took place:
[screenshot "captura2"]

Another log where the error happened:
[screenshot "image"]

Affected version (please complete the following information):

  • OpenShift: 3.11 and 4.7
  • OpenDevStack: 3.x and 4.x
@BraisVQ BraisVQ added the bug Something isn't working label Oct 25, 2021
@BraisVQ
Contributor Author

BraisVQ commented Oct 28, 2021

In a new project, we got this error on just the second developer preview run.
In this new project I have deployed only 2 Java backends and Spock as e2e.

@metmajer
Member

@braisvq1996 this is an effect of Jenkins running out of memory. Don’t forget to also adjust the Java heap size on the Jenkins master, in addition to increasing memory on the pod itself.

@BraisVQ
Contributor Author

BraisVQ commented Oct 28, 2021

Yeah, the JAVA_MAX_HEAP_PARAM for the Jenkins container where it failed was -Xms10g -Xmx18g. The container has a 20 GB memory limit.
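
To confirm that heap settings such as the ones quoted above actually reach the JVM, one can ask the runtime directly. This is a minimal sketch in plain Java; the class name `HeapReport` is hypothetical and not part of Jenkins or the ODS shared library.

```java
// Hypothetical helper for illustration only; not part of Jenkins or ODS.
class HeapReport {

    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use (reflects -Xmx when set), in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently in use: allocated heap minus the free part of it, in MiB. */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB):  " + maxHeapMiB());
        System.out.println("used heap (MiB): " + usedHeapMiB());
    }
}
```

If the reported maximum is far below the container limit, the heap option is probably not being picked up. If it is close to the limit and the error still occurs, the cause is more likely in the pipeline code than in the sizing.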

@clemensutschig
Member

clemensutschig commented Oct 29, 2021

@braisvq1996 - can you repro this? We have to fix this - there is no way a Jenkins should need 20 (!) gigs ...
I took a very brief look at the codebase, and where this shows up ..

a) component pipeline: https://github.com/opendevstack/ods-jenkins-shared-library/blob/master/src/org/ods/component/Context.groovy#L528

This is super easy to refactor into its own method and flag with NonCPS

b) finalize ods component: https://github.com/opendevstack/ods-jenkins-shared-library/blob/master/src/org/ods/orchestration/phases/FinalizeOdsComponent.groovy#L73

this is super weird as this runs on an agent I believe - and there is no functional loop anywhere ... (correction: there is, but could this really be it?) : https://github.com/opendevstack/ods-jenkins-shared-library/blob/master/src/org/ods/orchestration/FinalizeStage.groovy#L65 which is the call stack parent of the above call
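
For context on why a loop is suspected at all: `java.util.ArrayList$Itr` is the private iterator class of `ArrayList`, and it does not implement `Serializable`. The sketch below is plain Java, not pipeline code, and `IteratorSerializationDemo` is a hypothetical name. It shows that a list can be serialized while a live iterator over it cannot. The assumption is that the pipeline hits the same condition when Jenkins persists program state while an iterator is still in scope.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; it only illustrates standard Java serialization rules.
class IteratorSerializationDemo {

    /** Tries to serialize the value; returns the failure message, or null on success. */
    static String serializationFailure(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return null;
        } catch (NotSerializableException e) {
            // The message names the class that could not be serialized.
            return e.getMessage();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> repos = new ArrayList<>(List.of("backend-a", "backend-b", "e2e-spock"));
        Iterator<String> it = repos.iterator();

        System.out.println("list:     " + serializationFailure(repos));
        System.out.println("iterator: " + serializationFailure(it));
    }
}
```

The list itself serializes, while the iterator is rejected with a `NotSerializableException` naming `java.util.ArrayList$Itr`. Any `for (x in list)` or `.each { }` style loop keeps such an iterator alive for the duration of the loop body.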

@clemensutschig
Member

There must be a memory leak somewhere as well btw ...

@clemensutschig
Member

OK, more digging - I think we have to refactor:
https://github.com/opendevstack/ods-jenkins-shared-library/blob/master/src/org/ods/orchestration/FinalizeStage.groovy#L132-L1976

to use classical for loops - or get them really into NonCPS annotated methods...
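
A rough model of why classical index-based loops help, written in plain Java rather than pipeline Groovy: when the loop state consists only of the list and an `int` index, it can be written out and read back at any point. The names `ResumableLoopDemo`, `LoopState` and `checkpoint` are hypothetical. The checkpoint merely simulates state persistence and is not how Jenkins implements it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a loop whose state survives serialization.
class ResumableLoopDemo {

    /** Loop state built only from serializable parts: lists and an int index. */
    static class LoopState implements Serializable {
        private static final long serialVersionUID = 1L;
        final ArrayList<String> items;
        final ArrayList<String> processed = new ArrayList<>();
        int index;

        LoopState(List<String> items) {
            this.items = new ArrayList<>(items);
        }
    }

    /** Simulates a checkpoint: serialize the state and read it back as a new object. */
    static LoopState checkpoint(LoopState state) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(state);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (LoopState) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("loop state is not serializable", e);
        }
    }

    /** Index-based loop that checkpoints after every element and resumes from the copy. */
    static List<String> run(List<String> items) {
        LoopState state = new LoopState(items);
        while (state.index < state.items.size()) {
            state.processed.add("done:" + state.items.get(state.index));
            state.index++;
            state = checkpoint(state);
        }
        return state.processed;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("backend-a", "backend-b", "e2e-spock")));
    }
}
```

The other option mentioned above, moving the iteration into a NonCPS-annotated method, takes the opposite approach: the iterator still exists, but the method runs to completion without a point at which its state would need to be saved.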

@clemensutschig
Member

plan of action:
go down to max 4gig with repos for code and test - and find every little serialization problem :)

@clemensutschig
Member

clemensutschig commented Oct 29, 2021

@metmajer - we uncovered two issues, but were able to run 3 repos (2 ods-code / 1 e2e spock) with all docs and it worked (with 4gig and 3 requested :))

@clemensutschig
Member

@braisvq1996 - as discussed, please test a couple more runs of this combo - so we really see whether there is more to fix.
Also, I would add stories and tests etc ... so we get more traction now on the levadoc case - and see what we have missed there.

@metmajer - fyi

@clemensutschig
Member

clemensutschig commented Nov 2, 2021

we are stuck with the component pipeline (we get the $Itr - seriously at the script.sh!) @michaelsauter any ideas?

https://github.com/opendevstack/ods-jenkins-shared-library/pull/750/files#diff-c24ba2bec77d315b094da0b3e035f5319f02d74481ad7b7eef102ae4e4442a87R520

after fixing all sorts of issues - we are really stuck here ... (jenkins bug?)

@clemensutschig clemensutschig linked a pull request Nov 2, 2021 that will close this issue
@michaelsauter
Member

I looked at the source code right now but I do not understand how that line could produce an exception that java.util.ArrayList$Itr is not serialisable. Which list is used there? Are you sure the exception is from that line - and that there isn't an issue in reporting the exception?

@clemensutschig
Member

clemensutschig commented Nov 2, 2021

@michaelsauter - it's coming from that line (we have now surrounded it with a try {} block, and it's definitely coming from there)
https://github.com/opendevstack/ods-jenkins-shared-library/pull/750/files#diff-c24ba2bec77d315b094da0b3e035f5319f02d74481ad7b7eef102ae4e4442a87R520

the callstack - is coming from
https://github.com/opendevstack/ods-jenkins-shared-library/pull/750/files#diff-c24ba2bec77d315b094da0b3e035f5319f02d74481ad7b7eef102ae4e4442a87R167

I am inclined to believe this to be a jenkins bug ... but I cannot pinpoint it (unfortunately) ..

@clemensutschig
Member

so - with a while & sleep it seems to work ... :(

Tmrw we'll stick even more stories++ into jira to see if there is more we missed ...

@clemensutschig clemensutschig linked a pull request Nov 5, 2021 that will close this issue
@BraisVQ
Contributor Author

BraisVQ commented May 14, 2024

With the changes in #980 and opendevstack/ods-core#1217 we no longer have memory issues with Jenkins, so the issue can be closed.

@BraisVQ BraisVQ closed this as completed May 14, 2024