System unavailable: Multiple windows machines currently offline #2493
Comments
Note: The Jenkins logs have repeated messages regarding a failed JNLP connection attempt from several of the Windows Azure systems:
Potentially caused by Jenkins thinking the agent is already connected, so it's not quite clear why the agent is trying to connect again. The one here was listed as offline after a disconnect, but could be brought back online again, so we'll see if this continues.
|
Still happening on these three systems:
|
For test-azure-win2012r2-x64-1, I have changed its work directory to |
test-azure-win2012r2-x64-3 is fine, but is low on disk space. I've changed its Jenkins workspace to E:\jenkins, which has 127G free. |
Unable to connect to the following: test-ibmcloud-win2012r2-x64-1. They're unpingable too. They might have to be rebooted via their respective vendor consoles @sxa |
test-azure-win2016-x64-1 - low on disk space. I cannot find anything in its workspace folder of significant size to delete. It has a D: drive, but it only has 16G on it. Not enough to be used as a workspace imo |
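For the disk space checks discussed above, here is a minimal sketch of how free space on a candidate workspace drive could be checked before moving a workspace there. The function names and the threshold are illustrative assumptions, not anything taken from the infrastructure scripts.

```python
import shutil


def free_gib(path: str) -> float:
    """Return the free space, in GiB, on the volume that contains `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / (1024 ** 3)


def has_room(path: str, min_free_gib: float) -> bool:
    """True if the volume holding `path` has at least `min_free_gib` GiB free.

    `min_free_gib` is a hypothetical threshold; pick whatever the jobs
    on that machine actually need.
    """
    return free_gib(path) >= min_free_gib
```

On a Windows agent this could be called as `has_room("E:\\", 50)` to decide whether a drive is a sensible workspace location.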
If it's low on space keep it offline for now |
@AdamBrousseau Are you able to initiate a restart of these two systems for us? |
I've triggered a restart of test-equinix-win2012r2-x64-1 |
rebooted, online.
test-ibmcloud-win2012r2-x64-1
build-ibmcloud-win2012r2-x64-2
|
Thanks Adam!
|
test-ibmcloud-win2012r2-x64-1 is down again, and unpingable. build-ibmcloud-win2012r2-x64-2 is still up |
FYI I'm no longer seeing the already connected messages so at least for now that seems to have been resolved by things that have changed in the last day. |
@AdamBrousseau Can you see if test-ibmcloud-win2012r2-x64-1 can be recovered again please? We might need to keep an eye on that one - bring it up, look in the event log, then try a reboot and verify whether it comes back again. Ping me on slack when you want to do it so I can try and catch it in case it disappears of its own accord again. |
rebooted, online.
|
It's back and has survived a subsequent reboot to apply security updates 👍 |
Although build-ibmcloud-win2012r2-x64-2 seems to have gone offline again :-( Have pinged Adam who will try and recover it again. We should take a close look at the event log when it's back to see if there's any obvious cause |
both build-azure-win2012r2-x64-1 and build-ibmcloud-win2012r2-x64-1 went offline |
Took a few tries but it's back online. Feel free to ping me on slack if I don't respond fast enough here. |
All but test-equinix-win2012r2-x64-1 are back online. The ibmcloud machines need to be restarted every now and then. Thankfully we have @AdamBrousseau to do it, but it would be nice to understand why they are unstable and find a solution |
@Haroon-Khel Can you see anything in the event log on the ibmcloud systems? Those machines have historically been stable enough for us. |
Current offline systems:
|
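A list of offline agents can also be pulled from Jenkins rather than compiled by hand. This is a rough sketch that assumes the JSON returned by the controller's `/computer/api/json` endpoint has a `computer` array whose entries carry `displayName` and `offline` fields. Fetching the payload (and any authentication) is left out, and the function name is hypothetical.

```python
import json
from typing import List


def offline_agents(payload: str) -> List[str]:
    """Return the sorted display names of agents marked offline.

    `payload` is assumed to be the JSON text from /computer/api/json.
    Entries without an `offline` field are treated as online.
    """
    data = json.loads(payload)
    return sorted(
        c["displayName"]
        for c in data.get("computer", [])
        if c.get("offline")
    )
```

Filtering the result on a substring such as `win` would narrow it to the Windows machines tracked in this issue.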
In test-azure-win2012r2-x64-1, its C:\Users\jenkins.test-2012r2-1\AppData\Local\Temp directory contains some beefy data files. @sxa Have you seen files like these before?
|
They were not created recently
|
Same on test-azure-win2012r2-x64-3. Some of these files are more recent than the others
|
And on test-azure-win2019-x64-1
|
Almost certainly from test cases. There are definitely ones that create files of that sort of size outside the workspace (See #2448 for an issue with that on AIX when |
|
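To track down files like these on other machines, something along the following lines could be run against a Temp directory. It is only a sketch: the 100 MiB default threshold and the function name are assumptions, and it reports files without deleting anything.

```python
import os


def largest_files(root, min_bytes=100 * 1024 * 1024, limit=20):
    """List the biggest files under `root`, largest first.

    Returns up to `limit` tuples of (size_in_bytes, mtime, path) for
    files of at least `min_bytes`. Files that cannot be stat'ed (for
    example, locked or removed mid-scan) are skipped.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue
            if st.st_size >= min_bytes:
                found.append((st.st_size, st.st_mtime, full))
    found.sort(reverse=True)
    return found[:limit]
```

The modification time in each tuple helps tell old leftovers from files a currently running test may still be using.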
Please put the system name in the title of this issue. Anything in https://ci.adoptopenjdk.net/label/ci.role.test&&hw.arch.x86&&sw.os.windows/
Link to any log file showing the problem: Same as above :-)
Please describe the issue: Jenkins agent not started on several machines (may also affect some build ones too)