Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to validate commit headers. #6630

Merged
merged 1 commit into from
Nov 25, 2024
Merged

Conversation

mbien
Copy link
Member

@mbien mbien commented Oct 28, 2023

Back when I setup the paperwork job, I didn't know that gh can generate patch files yet. The problem was to figure out what commits are actually new in the PR. I gave it a few attempts but got distracted with more important things.

this adds:

  • simple script to check if all commits have proper email addresses
  • looks for empty or single word subject messages
  • can warn about missing blank line after subject (disabled)
  • logs errors/warnings to the github actions summary page, fails on error

paperwork job:

  • the paperwork job will now no longer prevent the github workspace from being cleaned up on failure

output in workflow summary:

image

output in job log:

image

@mbien mbien added CI continuous integration changes ci:no-build [ci] disable CI pipeline labels Oct 28, 2023
@mbien mbien requested a review from ebarboni October 28, 2023 00:29
@mbien mbien force-pushed the ci-email-checker branch 2 times, most recently from 0770006 to 127107a Compare October 28, 2023 20:35
@mbien mbien force-pushed the ci-email-checker branch 4 times, most recently from d17b7ab to a3f84da Compare December 15, 2023 16:18
@ebarboni
Copy link
Contributor

well late :D will need rebase.

@mbien
Copy link
Member Author

mbien commented Sep 26, 2024

rebased, removed string templates and fixed a bug

@mbien mbien removed the ci:no-build [ci] disable CI pipeline label Sep 26, 2024
.github/workflows/main.yml Outdated Show resolved Hide resolved
@ebarboni
Copy link
Contributor

LGTM

@mbien
Copy link
Member Author

mbien commented Sep 27, 2024

removed the test commits and updated the error messages a bit

@mbien mbien marked this pull request as ready for review September 27, 2024 20:58
@mbien
Copy link
Member Author

mbien commented Nov 3, 2024

disabled warnings, kept the errors. Noticed another recent PR (#7776) with invalid mail while testing.

java --enable-preview .github/scripts/CommitHeaderChecker.java https://github.com/apache/netbeans/pull/7776                                                               
checking PR patch file...
::error::invalid email in commit 1 'From: Adam Sotona <[email protected]>'
1 commit(s) checked

@mbien mbien force-pushed the ci-email-checker branch 2 times, most recently from 7599da2 to 341da28 Compare November 3, 2024 23:34
Comment on lines -2609 to -2625
# last job depends on everything so that it is forced to run last even if a long job fails early
# cleanup job depends on everything so that it is forced to run last even if a long job fails early.
# 'paperwork' is left out intentionally, since it doesn't run unit tests (hopefully doesn't need restarts)
# and shouldn't prevent cleanup on validation failure - which might be common during dev time
cleanup:
name: Cleanup Workflow Artifacts
needs:
- base-build
- commit-validation
- paperwork
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is new: the paperwork job will no longer prevent CI pipeline artifact cleanup. This could otherwise create a situation where workspaces are never cleaned up (due to failing paperwork job) until shortly before PR integration - which wouldn't be ideal.

@mbien mbien requested a review from neilcsmith-net November 4, 2024 01:21
Copy link
Contributor

@matthiasblaesing matthiasblaesing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I the idea, we still need to keep in mind, that this might produce false positives, but that should be rare.

I verified, that this works by rewriting it into plain java. I will not veto this, but I think it is an error to use experimental features just because. I see no benefit from using JDK 23 features here. The gatherers and the unfamiliar format make this hard to read for me.

FWIW is the JDK 21 Version (this would be JDK 11 compatible if HttpClient would not be explicitly closed):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpClient.Redirect;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse.BodyHandlers;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// checks commit headers for valid author, email and commit msg formatting
// java CommitHeaderChecker.java https://github.com/apache/netbeans/pull/${{ github.event.pull_request.number }}

public class scratch {
    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 1 || !args[0].startsWith("https://github.com/")) {
            throw new IllegalArgumentException("PR URL expected");
        }

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(args[0] + ".patch"))
                .timeout(Duration.ofSeconds(10))
                .build();

        System.out.println("checking PR patch file...");

        List<String> headerbuffer = new ArrayList<>(5);

        int index = 0;
        boolean allHeadersOk = true;

        try (HttpClient client = HttpClient.newBuilder()
                .followRedirects(Redirect.NORMAL).build()) {

            try(InputStream is = client.send(request, BodyHandlers.ofInputStream()).body();
                    InputStreamReader isr = new InputStreamReader(is, StandardCharsets.UTF_8);
                    BufferedReader br = new BufferedReader(isr)
            ) {

                for(String line = br.readLine(); line != null; line = br.readLine()) {
                    headerbuffer.add(line);
                    if(headerbuffer.size() > 5) {
                        headerbuffer.remove(0);
                    }
                    if (headerbuffer.size() == 5
                            && headerbuffer.get(0).matches("From [a-fA-F0-9]+.*") // Commit hint
                            && headerbuffer.get(1).startsWith("From: ")
                            && headerbuffer.get(2).startsWith("Date: ")
                            && headerbuffer.get(3).startsWith("Subject: ")) {
                        index++;
                        boolean headerResult = checkCommit(index, headerbuffer.get(1), headerbuffer.get(2), headerbuffer.get(3), headerbuffer.get(4));
                        allHeadersOk &= headerResult;
                    }
                }
            }
        }

        System.out.println(index + " commit(s) checked");
        System.exit(allHeadersOk ? 0 : 1);
    }

    static boolean checkCommit(int index, String from, String subject, String date, String blank) {
        return checkNameAndEmail(index, from)
                & checkSubject(index, subject)
                & checkBlankLineAfterSubject(index, blank);
    }

    static boolean checkNameAndEmail(int i, String from) {
        // From: Duke <[email protected]>
        int start = from.indexOf('<');
        int end = from.indexOf('>');

        String mail = end > start ? from.substring(start + 1, end) : "";
        String author = start > 6 ? from.substring(6, start).strip() : "";

        boolean green = true;
        if (mail.isBlank() || !mail.contains("@") || mail.contains("noreply") || mail.contains("localhost")) {
            System.out.println("::error::invalid email in commit " + i + " '" + from + "'");
            green = false;
        }
        // single word author -> probably the nickname/account name/root etc
        // mime encoded is probably fine, since gh account names can't be encoded
        if (author.isBlank() || (!author.contains(" ") && !(author.startsWith("=?") && author.endsWith("?=")))) {
            System.out.println("::error::invalid author in commit " + i + " '" + author + "' (full name?)");
            green = false;
        }
        return green;
    }

    // https://mirrors.edge.kernel.org/pub/software/scm/git/docs/git-commit.html#_discussion
    static boolean checkSubject(int i, String subject) {
        // Subject: [PATCH] msg
        subject = subject.substring(subject.indexOf(']') + 1).strip();
        // single word subjects are likely not intended or should be squashed before merge
        if (!subject.contains(" ")) {
            System.out.println("::error::invalid subject in commit " + i + " '" + subject + "'");
            return false;
        }
        return true;
    }

    // there should be a blank line after the subject line, some subjects can overflow though.
    static boolean checkBlankLineAfterSubject(int i, String blank) {
    // disabled since this would produce too many warnings due to overflowing subject lines
    //    if (!blank.isBlank()) {
    //        println("::warning::blank line after subject recommended in commit " + i + " (is subject over 50 char limit?)");
    ////        return false;
    //    }
        return true;
    }
}

@mbien
Copy link
Member Author

mbien commented Nov 4, 2024

the stream-gatherers API is final in JDK 24 - I would be available to update the script in the event if any changes are needed. CI JDK for this script is locked to JDK 23 right now and should continue working no matter what so it doesn't create a must-update deadline before our CI breaks.

I think the new single-file-program preview features do make especially sense for CI scripts since CI is very flexible in locking something to specific releases. Risk is close to 0 too. I do also use those CI scripts to actually test (sometimes improve) NetBeans support for them - which is a positive side effect I believe. (e.g #7605 only happened because of this script)

I the idea, we still need to keep in mind, that this might produce false positives, but that should be rare.

yes, but I am more worried about the opposite case: that it misses cases and committers might start completely relying on it without going through the pre-merge checklist.

@mbien mbien added the ci:no-build [ci] disable CI pipeline label Nov 22, 2024
@mbien
Copy link
Member Author

mbien commented Nov 22, 2024

updates:

 - simple script to check if PR commits have proper email addresses
 - looks for empty or single word subject messages
 - can warn about missing blank line after subject (disabled)
 - logs errors/warnings to the github actions summary page, fails on
   error

paperwork job:

 - the paperwork job will now no longer prevent the github workspace
   from being cleaned on failure
@mbien
Copy link
Member Author

mbien commented Nov 22, 2024

added a few examples for regression testing, planning to merge this soon

@mbien mbien merged commit 3bd8e2d into apache:master Nov 25, 2024
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci:no-build [ci] disable CI pipeline CI continuous integration changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants