Adding tmate action to CI for debugging #3138

Open
wants to merge 11 commits into
base: develop

pshriwise
Contributor

Description

This adds tmate to our CI in a mode that will only generate a connection if the CI action has failed. The connection will remain open for 15 minutes to allow someone with access to the action output to connect to the CI machine (once connected, the action will remain live until the user exits the terminal session).
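For reference, the failure-only mode described above could look roughly like the step below. This is a minimal sketch, not the exact diff in this PR: the step name is illustrative, and using `timeout-minutes` to enforce the 15-minute window is an assumption.

```yaml
# Hypothetical step, placed after the build/test steps of a job in ci.yml
- name: Open tmate session on failure   # illustrative name
  if: ${{ failure() }}                  # run only when an earlier step in the job failed
  uses: mxschmitt/action-tmate@v3
  timeout-minutes: 15                   # assumption: the 15-minute window is set this way
```

The SSH connection string is printed in the step's log, so anyone who can read the action output can connect.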

Alternative options for deploying tmate are discussed in the issue below.

Fixes #3137

@ahnaf-tahmid-chowdhury
Contributor

This could be helpful for investigating PR #3087.

@pshriwise
Contributor Author

Okay, I think I've found a solution I'm happy enough with here, even if it's a little less elegant than I'd like. Contributors can now push a commit whose message contains [gha-debug] to their PR to start a detached tmate session. All steps of the workflow still execute, and the session then provides an SSH command so one can log in and do whatever problem solving is needed (default timeout is 10 min).

GitHub Actions surprisingly doesn't make it easy to get the latest commit information from the github.event context for every event that triggers a workflow. For the pull_request event, the repo has to be checked out with more depth so that the merge commit's parents can be extracted and the correct commit message used to decide whether the tmate session should be enabled.
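To illustrate the commit-message lookup, here is a minimal sketch of how it might be done. It is not necessarily what this PR implements: the step name, the `fetch-depth` value, and the heredoc delimiter are assumptions, while `COMMIT_MESSAGE` matches the variable used in the workflow condition.

```yaml
steps:
  - uses: actions/checkout@v4
    with:
      fetch-depth: 2   # assumption: enough history to reach the merge commit's parents

  - name: Record the triggering commit message   # illustrative name
    shell: bash
    run: |
      if [ "${{ github.event_name }}" = "pull_request" ]; then
        # On pull_request, HEAD is a synthetic merge commit;
        # its second parent is the head commit of the PR branch.
        msg=$(git log -1 --format=%B HEAD^2)
      else
        msg=$(git log -1 --format=%B HEAD)
      fi
      # Commit messages can span several lines, so use the
      # delimiter form when writing to GITHUB_ENV.
      {
        echo "COMMIT_MESSAGE<<EOF_COMMIT_MESSAGE"
        echo "$msg"
        echo "EOF_COMMIT_MESSAGE"
      } >> "$GITHUB_ENV"
```

Later steps in the same job can then test `env.COMMIT_MESSAGE` in their `if:` conditions.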

@pshriwise
Contributor Author

Just smoothed out one final snag here: The tmate action was causing jobs to fail when timing out, but I wasn't a fan of that. The continue-on-error flag for that step did the trick, so CI runs that trigger the tmate debug session with [gha-debug] in their commit messages should pass checks if the tests pass. This should keep PRs from requiring another CI session if the tests pass with the tmate session enabled.
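Put together, the step might look something like the sketch below. The `if`, `uses`, `continue-on-error`, and `timeout-minutes` values are the ones under review in this PR; the step name and the `detached` input are assumptions based on the detached session described earlier.

```yaml
- name: Optional tmate debug session   # illustrative name
  if: ${{ contains(env.COMMIT_MESSAGE, '[gha-debug]') }}
  uses: mxschmitt/action-tmate@v3
  continue-on-error: true   # a timed-out session should not fail the job
  timeout-minutes: 10
  with:
    detached: true          # assumption: lets the remaining workflow steps run
```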

Here's an example of this working in my fork: https://github.com/pshriwise/openmc/actions/runs/11335356157

I'll note that the errors do still appear in the summary, but aren't really of consequence.

continue-on-error: true
if: ${{ contains(env.COMMIT_MESSAGE, '[gha-debug]') }}
uses: mxschmitt/action-tmate@v3
timeout-minutes: 10
Contributor

Doesn't that mean the session will end after 10 min? I have also noticed that when the log is huge, we can't scroll the window in the tmate session. Is there an alternative?

Contributor Author

This does mean the session will close after 10 minutes, yes. I don't think we want CI running indefinitely. This seemed like a reasonable window for someone to follow the progress of CI and log into the session.

The terminal session is inside of a tool called tmux. You can scroll up higher in the terminal output, but it requires a few extra keystrokes (Ctrl+B, [ if memory serves).

Contributor

Can you please double-check this? I was testing this workflow in my branch and found that the session gets closed even while I am logged in. Since we are using a specific commit to run this, I think we can increase the time limit to 1 h.

Contributor Author

Hrmm I hadn't experienced that. Let me confirm and I can increase the time limit.

Contributor

@paulromano left a comment

As is, this will require a PR author to be aware of the "[gha-debug]" special string and make a commit with that. I personally don't see how this is much easier than letting people be aware of tmate in general and having them add the few lines in ci.yml since either way they will have to make an extra commit. I would prefer the latter for simplicity.

@pshriwise
Contributor Author

> As is, this will require a PR author to be aware of the "[gha-debug]" special string and make a commit with that. I personally don't see how this is much easier than letting people be aware of tmate in general and having them add the few lines in ci.yml since either way they will have to make an extra commit. I would prefer the latter for simplicity.

Hrmm, that's a good point about awareness. It's something of an easter egg at the moment. I think there are ways to address that, though, by adding notes to the testing section of the documentation and to the pull request template.

I'd love for more people to be aware of the tmate action for sure since it's so useful, and, yes, it's true that a commit is required either way, but what you've outlined above requires a change to ci.yml that will have to be removed later before the PR is merged (if the author and reviewer remember). It just feels more noisy to me. I think it will also save contributors time determining:

  • where the ci.yml file is,
  • where in the ci.yml file they should place the tmate lines,
  • and what options for the tmate action they should use

as well as maintainers' time providing that explanation if needed, instead of "run `git commit -m "[gha-debug]" --allow-empty` and watch CI for a login line".

At the end of the day, it's up to you, but I wouldn't mind saving some cycles for CI debugging when it's needed.

Successfully merging this pull request may close these issues.

Addition of entry point for debugging failed CI runs
3 participants