Understanding Jenkins

Introduction

The homepage for our Jenkins tests is https://cmssdt.cern.ch/jenkins/view/DMWM/ which is behind the CERN SSO service.

There are two tests that ultimately decide what Jenkins thinks about your pull request: DMWM-PR-test and DMWM-WMAgent-TestAll. Each of these tests or "builds" starts other builds that test specific aspects.

DMWM-PR-test starts DMWM-PR-unittests, DMWM-WMCore-PR-pylint, and DMWM-WMCore-PR-27. DMWM-PR-unittests in turn starts 10 jobs, each of which tests a "slice" of the unit tests, since running them all in one process would take too long. When a slice finishes, it feeds its results up to DMWM-PR-unittests. DMWM-PR-unittests, DMWM-WMCore-PR-pylint, and DMWM-WMCore-PR-27 then feed their results up to DMWM-PR-test, which does a final analysis and decides whether your PR passes or fails.
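To make the idea of a "slice" concrete, here is a minimal sketch of one way a list of test files could be divided among 10 jobs. It is only an illustration: the function name slice_tests and the round-robin assignment are assumptions, not a description of how the Jenkins jobs actually split the tests.

```python
def slice_tests(test_files, n_slices, slice_index):
    """Return the test files assigned to one slice.

    Hypothetical helper: files are sorted so every job sees the same order,
    then dealt out round-robin so the slices end up about the same size.
    """
    if not 0 <= slice_index < n_slices:
        raise ValueError("slice_index must be in range(n_slices)")
    return sorted(test_files)[slice_index::n_slices]


if __name__ == "__main__":
    # Made-up file names, purely for illustration.
    files = ["test_%02d.py" % i for i in range(25)]
    for index in range(10):
        print(index, slice_tests(files, 10, index))
```

Every file lands in exactly one slice, so the union of the slices is the full test suite.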

How to interpret Jenkins's comments on your pull request

These sub-builds allow Jenkins to break down your pull request into three different checks:

Unit tests

Jenkins runs every unit test we have in the code base (currently about 1100) and compares the results with the most recent baseline run of the unit tests against the master branch (this should happen twice per day). It reports on the differences it finds. Your PR fails if you add a failing unit test or cause any previously succeeding unit test to fail. Unit tests that are known to be unstable (listed in test/etc/UnstableTests.txt) are reported on, but won't trigger a failure of your pull request.
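The comparison rule above can be sketched in a few lines. This is a simplified model, not the code Jenkins runs: the function name, the "ok"/"fail" outcome strings, and the dictionary layout are all assumptions made for the example.

```python
def compare_unit_tests(baseline, pr_results, unstable):
    """Compare PR test outcomes against the baseline run.

    baseline, pr_results: dict mapping test name -> "ok" or "fail"
    unstable: set of test names known to be unstable
    (All of these structures are hypothetical.)
    """
    new_failures = []
    unstable_failures = []
    for name, outcome in sorted(pr_results.items()):
        if outcome != "fail":
            continue
        if name in unstable:
            # Reported on, but never fails the PR.
            unstable_failures.append(name)
        elif baseline.get(name) != "fail":
            # Either a newly added failing test (not in the baseline)
            # or a test that succeeded in the baseline and now fails.
            new_failures.append(name)
    return {
        "passed": not new_failures,
        "new_failures": new_failures,
        "unstable_failures": unstable_failures,
    }
```

A test that already failed in the baseline does not count against the PR in this sketch, since the rule only covers new failing tests and tests that previously succeeded.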

If you do have a failure and want to see the unit test output:

Click on "details" in the merge status box or on the link in the message "Beginning pylint and unit tests" that is posted on your PR.
Click on Test Result
Find the failing test you are interested in and click on it. This will give you all the info that nose captured while it was running your test.

pylint code quality checks

Jenkins runs a full pylint check on both the master (or other) branch you are pulling towards and your proposed branch. It fails your PR under certain circumstances:

You add a new file with a pylint score less than 8.0
You modify a file and the pylint score is less than 8.0

In other cases it reports on a file without failing the PR:

You add a new file
You change a file such that its score is less than it was before or you increase the number of warnings or errors in the file
You change a file that has easy-to-fix errors like unused arguments, variables, or imports (only these warnings are displayed)
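The pylint rules above boil down to a small decision function. The sketch below is one reading of those rules under stated assumptions: the function name, its parameters, and the "fail"/"report"/"quiet" labels are invented for illustration and are not taken from the Jenkins job itself.

```python
PYLINT_THRESHOLD = 8.0  # score below which a touched file fails the PR


def pylint_verdict(new_score, old_score=None, new_issues=0, old_issues=0,
                   easy_fixes=0):
    """Classify one file touched by a PR.

    old_score is None for a newly added file. new_issues/old_issues count
    warnings plus errors after/before the change; easy_fixes counts things
    like unused arguments, variables, or imports. (Hypothetical signature.)
    """
    if new_score < PYLINT_THRESHOLD:
        return "fail"      # new or modified file scoring under 8.0
    if old_score is None:
        return "report"    # new files are always reported on
    if new_score < old_score or new_issues > old_issues:
        return "report"    # the file got worse, but is still above 8.0
    if easy_fixes > 0:
        return "report"    # only the easy-to-fix warnings are shown
    return "quiet"
```

For example, pylint_verdict(9.1, old_score=9.5) gives "report": the score dropped but stayed above the threshold, so the PR is not failed.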

python future code quality checks

How to debug Jenkins when it doesn't produce results

In principle, none of these sub-jobs is supposed to fail. It doesn't work out that way, though:

Any of them can fail because they can't install the WMAgent RPMs from cmsweb
Any of them can fail because they can't contact GitHub
Unit test slices can fail because they get stuck or take too long, in which case Jenkins kills them
PR-27 can fail if the proposed code introduces a python syntax error. That never happens because everyone runs their code, right?

Whenever one of these failures occurs, the job that started the failing sub-job also fails. When that happens, DMWM-PR-test fails and is restarted (up to four times). It is not restarted if it determines that your code was bad, only if there is a problem with the infrastructure. If Jenkins posts something about pylint and unit tests being good or bad, the build "succeeded" in the sense that it ran to completion. If instead it says it is starting tests again, the build did not complete.
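The restart behavior can be pictured as a small loop. This is a rough sketch only: run_build, the outcome strings, and the reading of "up to four times" as four restarts after the first attempt are assumptions, not the actual Jenkins configuration.

```python
MAX_RESTARTS = 4  # assumed: four restarts on top of the initial run


def run_with_restarts(run_build, max_restarts=MAX_RESTARTS):
    """Run a build, restarting it only on infrastructure failures.

    run_build is a hypothetical callable returning either
    "infrastructure_failure" or any other string, which stands for a
    posted verdict (good or bad) about the code itself.
    Returns (final_outcome, number_of_attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        outcome = run_build()
        # A verdict about the code, good or bad, is never retried.
        if outcome != "infrastructure_failure":
            return outcome, attempts
        # Give up once the restart budget is used.
        if attempts > max_restarts:
            return outcome, attempts
```

The point of the design is that a verdict about your code stops the loop immediately, while an infrastructure problem gets another chance.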

Ok, so let's look back at that top level page: https://cmssdt.cern.ch/jenkins/view/DMWM/ You may see DMWM-PR-test as red and with stormy skies (previous builds failed). You shouldn't see the other tests mentioned with anything but green and sunny skies.

Now, you can click on DMWM-PR-test and see the history of tests on our pull requests. On the left you will see a list of pull requests and Jenkins build numbers. If you see a little green arrow in that column, that means Jenkins had to automatically restart the pull request for one of the reasons above. If not, that build was started because of a new commit or someone saying "test this please" on the pull request.

Now let's say you want to see why the unit tests were unable to complete.

Click on the build # next to the red dot
Click on console log (these two steps can be combined with a dropdown)
Scroll down to the bottom and you will see something like:

Waiting for the completion of DMWM-PR-unittests
DMWM-PR-unittests #642 completed. Result was FAILURE

This tells you the build # of the sub-build (642). Click on that link, not the generic one for all DMWM-PR-unittests
Scroll to the bottom. You will see the status of each slice. Yellow is OK: the slice ran to completion, even though some tests failed. Red is not: the slice itself did not complete. Find the red one and click on that
Click on console output (again these can be combined using the drop down)
Usually at the end you will see the problem. It should be one of the causes listed above.
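If you have saved a console log as text, the sub-build number can also be pulled out with a short script. The sketch below assumes the "completed. Result was" lines look exactly like the sample in the steps above; the function name and the idea of saving the log to a string are assumptions for the example.

```python
import re

# Matches lines such as:
#   DMWM-PR-unittests #642 completed. Result was FAILURE
COMPLETED = re.compile(
    r"^(?P<job>\S+) #(?P<number>\d+) completed\. Result was (?P<result>\w+)$"
)


def failed_sub_builds(console_text):
    """Return (job name, build number) for each sub-build reported as FAILURE."""
    failures = []
    for line in console_text.splitlines():
        match = COMPLETED.match(line.strip())
        if match and match.group("result") == "FAILURE":
            failures.append((match.group("job"), int(match.group("number"))))
    return failures


if __name__ == "__main__":
    sample = (
        "Waiting for the completion of DMWM-PR-unittests\n"
        "DMWM-PR-unittests #642 completed. Result was FAILURE\n"
    )
    print(failed_sub_builds(sample))
```

The build number it returns is the one to follow, rather than the generic link for the job.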

The steps to diagnose pylint or "27" (the python 2.7 future checker) are a little easier since they don't have slices. And in the "27" case at the end you might find something like this:

RefactoringTool: There was 1 error:
RefactoringTool: Can't parse src/python/CRABInterface/HTCondorDataWorkflow.py: ParseError: bad input: type=8, value=u')', context=('', (643, 107))

This means that line 643 of the proposed version of HTCondorDataWorkflow.py had a syntax error.
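A message like this can be picked apart mechanically. The following is a minimal sketch that assumes the error line has the same shape as the example above; the function name is made up, and treating the second number in the context tuple as a column is an assumption.

```python
import re

# The trailing "(643, 107))" is read as (line, column); the column
# interpretation is an assumption.
PARSE_ERROR = re.compile(
    r"RefactoringTool: Can't parse (?P<path>\S+): ParseError: "
    r".*\((?P<line>\d+), (?P<column>\d+)\)\)\s*$"
)


def locate_parse_error(log_line):
    """Return (path, line, column) from a "Can't parse" line, or None."""
    match = PARSE_ERROR.search(log_line)
    if match is None:
        return None
    return match.group("path"), int(match.group("line")), int(match.group("column"))


if __name__ == "__main__":
    message = (
        "RefactoringTool: Can't parse "
        "src/python/CRABInterface/HTCondorDataWorkflow.py: "
        "ParseError: bad input: type=8, value=u')', context=('', (643, 107))"
    )
    print(locate_parse_error(message))
```

Lines that do not match the pattern, such as the "There was 1 error:" summary line, return None.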
