Run bids-validator #5

Draft
wants to merge 5 commits into
base: main
Conversation

kousu
Member

@kousu kousu commented Feb 20, 2023

Fixes #2

Follows #7

@kousu
Member Author

kousu commented Feb 20, 2023

Some issues:

  • it prints absolute paths; on GitHub Actions or DroneCI that's fine because everything runs in ephemeral containers, but since we're trying not to be ephemeral, maybe we want to avoid that
  • it doesn't handle git-lfs. I don't think it does anyway. Maybe it does if git-lfs is installed because git-lfs is just a lot more ergonomic and automatic?
  • we should run the output through a colorizer maybe?
  • should we strip the ^Hs generated by the various progress bars out? (this is similar to colorizing)
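
For that last point, a minimal sketch of what stripping could look like. The helper name `strip_progress` is hypothetical, and it assumes the progress bars only redraw with backspace (^H) and carriage return (^M); anything using cursor-movement escapes would need a real terminal filter instead.

```sh
#!/usr/bin/env bash
# Hypothetical helper: delete backspace (^H) and carriage-return (^M)
# characters that progress bars use to redraw a line in place.
# Note: only the control characters are dropped; overwritten text stays.
strip_progress() {
  tr -d '\b\r'
}

# Example input imitating a progress bar that rewrites itself.
printf 'fetching 10%%\b\b\b50%%\rdone\n' | strip_progress
```

This runs as a plain filter, so it could sit in the same pipe as a colorizer without buffering the whole log.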

Also I haven't actually tested plugging it into Gitea properly yet: my test so far is just running ./worker directly, not running bids-hook as a daemon.

@kousu kousu requested a review from mguaypaq February 20, 2023 22:04
@kousu
Member Author

kousu commented Feb 22, 2023

I've tried out the worker with Gitea:

  1. I edited my Gitea's app.ini to set webhook.ALLOWED_HOST_LIST = 127.0.0.1

    This is tricky to automate because it needs to account for the fact that this is a list.

  2. I ran Gitea locally and opened http://localhost:3000/api/swagger to help me generate the API calls needed to deploy bids-hook; I clicked "Log in" and filled in my password.

  3. I used POST /api/v1/admin/hooks to make a new webhook; this is for gitea->bids-hook communication
    This involves making up a secret.

    curl -X 'POST' \
      'http://localhost:3000/api/v1/admin/hooks' \
      -H 'accept: application/json' \
      -H 'authorization: Basic XXXXXXXXXXXXX=' \
      -H 'Content-Type: application/json' \
      -d '{
      "active": true,
      "authorization_header": "",
      "branch_filter": "string",
      "config": {
          "content_type": "json",
          "url": "http://127.0.0.1:2845/bids-hook"
      },
      "events": [
        "push"
      ],
      "type": "gitea"
    }'
    
    {
      "id": 2,
      "type": "gitea",
      "config": {
        "content_type": "json",
        "url": "http://127.0.0.1:2845/bids-hook"
      },
      "events": [
        "push"
      ],
      "authorization_header": "",
      "active": true,
      "updated_at": "2023-02-22T16:50:36-05:00",
      "created_at": "2023-02-22T16:50:36-05:00"
    }
    

    I'm not sure what the authorization_header is for? Do we need that?

  4. I used POST /api/v1/users/kousu/tokens to create a token; this is for bids-hook->gitea communication

    curl -X 'POST' \
      'http://localhost:3000/api/v1/users/kousu/tokens' \
      -H 'accept: application/json' \
      -H 'authorization: Basic XXXXXXXXXXXXX=' \
      -H 'Content-Type: application/json' \
      -d '{
      "name": "bids-hook",
      "scopes": [
        "repo"
      ]
    }'
    

    return: 201 CREATED

    {
      "id": 2,
      "name": "bids-hook",
      "sha1": "14d3554c1ac7fe4f25348d32bc4b8e410bf7b638",
      "token_last_eight": "0bf7b638",
      "scopes": null
    }
    

    This returns a token rather than a secret. The two are different: a token is just a password, whereas the webhook secret is used as a signing key.

  5. Manually edited ./start to write in the two secrets

  6. mkdir gitea/custom/public/

  7. Run ./start

    p115628@joplin:~/src/neurogitea/bids-hook$ ./start 
    2023/02/22 17:11:31 main: reading config from environment
    2023/02/22 17:11:31 main: starting worker
    2023/02/22 17:11:31 main: listening on "http://127.0.0.1:2845/bids-hook"
    

Then I pushed to a repo I already had. The extremely appealing part of this design: I didn't have to log into bids-hook, I didn't have to "activate" (i.e. install webhooks) each repo, I didn't have to write in a script (like .drone.yml) to do everything. It Just Worked:

2023/02/22 17:12:05 router: got request for "POST" "/bids-hook"
2023/02/22 17:12:05 postHandler: accepted job {"kousu" "spine-generic-processed" "0a11e3d7e6659e0ce33248332a647af9e6975170" "90fa9a89-5e50-4bd4-834d-d42655e1ee8e"}
2023/02/22 17:12:05 worker: starting job {"kousu" "spine-generic-processed" "0a11e3d7e6659e0ce33248332a647af9e6975170" "90fa9a89-5e50-4bd4-834d-d42655e1ee8e"}

Screenshot_20230222_171252

It hung for a while. I inspected and could see it was lagging at running git-annex-smudge, of course:

p115628@joplin:~/src/neurogitea/test/spine-generic-processed2$ pstree -ap $USER
ssh,1103383

sshd,1103416

sshd,2125589
  ├─bash,2125590
  │   ├─gitea,2127455
  │   │   ├─{gitea},2127456
  │   │   ├─{gitea},2127457
  │   │   ├─{gitea},2127458
  │   │   ├─{gitea},2127459
  │   │   ├─{gitea},2127460
  │   │   ├─{gitea},2127461
  │   │   ├─{gitea},2127462
  │   │   ├─{gitea},2127463
  │   │   ├─{gitea},2127464
  │   │   ├─{gitea},2127465
  │   │   ├─{gitea},2127466
  │   │   ├─{gitea},2127467
  │   │   ├─{gitea},2127468
  │   │   ├─{gitea},2127469
  │   │   ├─{gitea},2127470
  │   │   ├─{gitea},2127471
  │   │   ├─{gitea},2127472
  │   │   ├─{gitea},2127473
  │   │   ├─{gitea},2127474
  │   │   ├─{gitea},2127475
  │   │   ├─{gitea},2127476
  │   │   ├─{gitea},2127477
  │   │   ├─{gitea},2127478
  │   │   ├─{gitea},2127479
  │   │   ├─{gitea},2127484
  │   │   ├─{gitea},2127485
  │   │   ├─{gitea},2127486
  │   │   ├─{gitea},2127487
  │   │   ├─{gitea},2127488
  │   │   ├─{gitea},2127497
  │   │   ├─{gitea},2127498
  │   │   ├─{gitea},2127499
  │   │   ├─{gitea},2127500
  │   │   ├─{gitea},2127501
  │   │   ├─{gitea},2127502
  │   │   ├─{gitea},2127503
  │   │   ├─{gitea},2127504
  │   │   ├─{gitea},2127505
  │   │   ├─{gitea},2127506
  │   │   ├─{gitea},2127507
  │   │   ├─{gitea},2127508
  │   │   ├─{gitea},2127509
  │   │   ├─{gitea},2127510
  │   │   └─{gitea},2127511
  │   └─vi,2126990 custom/conf/app.ini
  ├─bash,2125622
  ├─bash,2125905
  │   └─pstree,2128953 -ap p115628
  └─bash,2126736
      ├─start,2128178 ./start
      │   └─bids-hook,2128179
      │       ├─worker,2128471 ./worker
      │       │   └─worker,2128482 ./worker
      │       │       └─worker,2128644 ./worker
      │       │           └─git,2128931 annex copy --from origin
      │       │               └─git-annex,2128932 copy --from origin
      │       │                   ├─(git,2128941)
      │       │                   ├─(git,2128942)
      │       │                   ├─(git,2128945)
      │       │                   ├─(git,2128946)
      │       │                   ├─git,2128947 --git-dir=.git --work-tree=. --literal-pathspecs cat-file ...
      │       │                   ├─git,2128948 --git-dir=.git --work-tree=. --literal-pathspecs cat-file ...
      │       │                   ├─git,2128949 --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch
      │       │                   ├─git,2128950 --git-dir=.git --work-tree=. --literal-pathspecs cat-file ...
      │       │                   ├─{git-annex},2128933
      │       │                   ├─{git-annex},2128934
      │       │                   ├─{git-annex},2128935
      │       │                   ├─{git-annex},2128937
      │       │                   └─{git-annex},2128951
      │       ├─{bids-hook},2128180
      │       ├─{bids-hook},2128181
      │       ├─{bids-hook},2128182
      │       ├─{bids-hook},2128183
      │       └─{bids-hook},2128184
      └─vi,2126792 start

When it finally got to bids-validator, it finished in a couple seconds, and failed:

Screenshot_20230222_171548

clicking the red X showed me why:

Screenshot_20230222_171634

🎉

@kousu
Member Author

kousu commented Feb 22, 2023

Deployment is more complicated than I would like still, but it's not really more complicated than

https://github.com/neuropoly/computers/blob/98a219cf8b0987cd35d00f40f8d1661b9bff4e7c/ansible/neuropoly.data.yml#L29-L171

(though some of that will need to stay: generating the ci-admin user is going to be necessary since we need to be able to make API calls)

@kousu kousu changed the title from "[WIP] Run bids-validator" to "Run bids-validator" Feb 25, 2023
@kousu kousu changed the base branch from main to output-subdirs February 25, 2023 04:40
Subdirs are simply named by taking the prefix of the input UUID.

This is a common technique to prevent the chance of overflowing filesystem
limits on the number of files allowed in a single directory.

e.g.

Before:
 - /srv/gitea/custom/public/90fa9a89-5e50-4bd4-834d-d42655e1ee8e.html
 - http://127.0.0.1:3000/assets/90fa9a89-5e50-4bd4-834d-d42655e1ee8e.html

After:
 - /srv/gitea/custom/public/bids-validator/84/b1/84b10e1c-b188-41e2-a92d-fb6ded9c6889.html
 - http://127.0.0.1:3000/assets/bids-validator/84/b1/84b10e1c-b188-41e2-a92d-fb6ded9c6889.html
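
As an illustration only (the real logic lives in bids-hook itself, not in shell), the mapping from a UUID to its sharded path can be sketched like this. The function name `result_path` is hypothetical; the layout is taken from the "After" example above.

```sh
#!/usr/bin/env bash
# Hypothetical helper: map a job UUID to its sharded results path,
# using the first two pairs of characters as nested directory names.
result_path() {
  uuid=$1
  first=$(printf '%s' "$uuid" | cut -c1-2)
  second=$(printf '%s' "$uuid" | cut -c3-4)
  printf 'bids-validator/%s/%s/%s.html\n' "$first" "$second" "$uuid"
}

result_path 84b10e1c-b188-41e2-a92d-fb6ded9c6889
# prints bids-validator/84/b1/84b10e1c-b188-41e2-a92d-fb6ded9c6889.html
```
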
To test:

- set up a gitea install at ../gitea/ and run `./start` (if elsewhere, run `GITEA_APP_PATH=path/to/gitea ./start`)
- make a repo in it; upload some git-annex (or not git-annex) files
- commit and push to the test repo
Base automatically changed from output-subdirs to main February 28, 2023 18:56
Similar to results, see 5d3f078.

* While we're at it, create the log folder if need be, instead of
  erroring out if it doesn't exist already.

* As a result, we no longer need to import io/fs.

* Uniformize permissions to 0750 and 0640 too.
* I added documentation for the script's contract/API in the script
  itself, for people who want to modify it.

* I restored the ERR trap, but managed to still save an exit status
  using the following idiom:

  ```sh
  some_command && STATUS=$? || STATUS=$?
  ```

* I added a temp dir cleanup trap on EXIT, and checked that the
  commands in it don't cause ERR traps, and don't affect the
  script's exit status.

* I removed the $BH_UUID from the temp dir name, because that's
  mildly sensitive info, and the temp dir name is world-readable.

* I removed `--verbose` from `bids-validator .` because it can
  generate so many lines that it's hard to find the actual warning
  types in the output.

* With an extra dependency on `jq`, I added a way to check whether
  `bids-validator` returns success with or without warnings.

* I added an `ansifilter` step to turn the terminal color escape
  sequences and characters like `<`, `&` into valid HTML.
  (I guess we should similarly sanitize $BH_USER, etc.?)

* I made the script exit status conform to its contract/API.
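
To show why the exit-status idiom coexists with the ERR trap, here is a minimal self-contained sketch. `some_command` is a stand-in that always fails with status 2, and the trap body is made up for the demonstration; neither is taken from the worker script.

```sh
#!/usr/bin/env bash
set -eE
trap 'echo "ERR trap fired" >&2; exit 3' ERR

# Stand-in for a command that fails with a meaningful exit status.
some_command() { return 2; }

# Because the command is part of an && / || list (and is not the last
# command in it), its failure neither fires the ERR trap nor triggers
# `set -e`. Exactly one of the two assignments runs and records the
# real exit status.
some_command && STATUS=$? || STATUS=$?
echo "STATUS=${STATUS}"
```

The script keeps running after the failure, with STATUS holding 2, and the trap stays armed for every other command.
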
Hopefully this name is more inviting for people who want to mod it.
@mguaypaq
Member

mguaypaq commented Mar 2, 2023

Thanks for researching the best steps to use for the git checkout!

I revamped the worker script, and tested it in the following scenarios:

  • A non-annexed repository.
  • An annexed repository.
  • A correct .bids-validator-config.json file.
  • An incorrect .bids-validator-config.json file (which resulted in this upstream pull request).

After removing a bunch of the bugs I put in, I got it to show me yellow dots, green checkmarks, red exes, yellow exclamations, and red exclamations, along with results pages and log files.

One question is: do we want to bother with showing warnings as yellow marks? If so, maybe we should supply an example .bids-validator-config.json file in a template so that people can easily ignore the warnings they don't care about.

Another question is, do we want to configure the webhook to ignore the branches git-annex and synced/*?
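
If the filtering ended up on our side rather than in the webhook configuration, a rough sketch could look like the following. It assumes the ref name from the push event is available as a string, which is not something the worker currently receives; `should_skip_ref` is a hypothetical name.

```sh
#!/usr/bin/env bash
# Hypothetical filter: succeed (return 0) when a pushed ref should be
# skipped because it is git-annex bookkeeping rather than dataset content.
should_skip_ref() {
  case "$1" in
    refs/heads/git-annex | refs/heads/synced/*) return 0 ;;
    *) return 1 ;;
  esac
}

for ref in refs/heads/main refs/heads/git-annex refs/heads/synced/main; do
  if should_skip_ref "$ref"; then
    echo "skip     $ref"
  else
    echo "validate $ref"
  fi
done
```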

@mguaypaq
Member

mguaypaq commented Mar 2, 2023

(@kousu I can't add you as a reviewer because you're the PR author, but I would love to hear your comments.)

@kousu
Member Author

kousu commented Mar 2, 2023

Ah wonderful. Thanks for trying out all those cases.

I don't suppose you kept screenshots of each case? I'd like to look over the log files. Maybe do you have a repo with branches for each of the cases you tested I could pull somewhere?

> One question is: do we want to bother with showing warnings as yellow marks?

Yes, definitely, that's super handy. I guess you're worried that it will train people to ignore it though?

I'm unhappy that BIDS is sort of 99 standards in one, so that it needs to be the kind of standard where people often ignore the warnings they don't care about. Ideally our datasets conform to the baseline BIDS, and we fix the data, not the warnings.

But either way including .bids-validator-config.json in our final dataset template is a good idea -- I was already mulling that.

> configure the webhook to ignore the branches git-annex and synced/*?

mmmmmyesssss but I think it'd be better to solve this in gitea itself: neuropoly/gitea#4.

I wish this script didn't have to know about git-annex.

> I would love to hear your comments

I'll look over the worker and give thoughts now!

Comment on lines 106 to 126
OUTPUT=$(bids-validator .) && STATUS=$? || STATUS=$?
WARNINGS=$(bids-validator . --json | jq '.issues.warnings | length') || true
cat 1>&2 <<EOF
STATUS=${STATUS}
WARNINGS=${WARNINGS}
EOF

echo 1>&2 '# formatting output'
HTML=$(echo "$OUTPUT" | ansifilter --html --fragment --line-numbers --anchors=self)

# Produce the HTML result page on stdout.
cat <<EOF
<!DOCTYPE html>
<html lang=en>
<title>bids-validator results for ${BH_USER}/${BH_REPO}@${BH_COMMIT}</title>
<p>Here are the bids-validator results for ${BH_USER}/${BH_REPO}@${BH_COMMIT}:</p>
<pre>
${HTML}
</pre>
</html>
EOF
Member Author

@kousu kousu Mar 2, 2023

This buffers the output and runs it through the shell. Is echo "$OUTPUT" safe? What happens if there are NULs in the output? Weird Unicode? What happens if the data is huge and overflows RAM? It's a lot safer to pipe:

cat <<EOF
<!DOCTYPE html>
<html lang=en>
<title>bids-validator results for ${BH_USER}/${BH_REPO}@${BH_COMMIT}</title>
<p>Here are the bids-validator results for ${BH_USER}/${BH_REPO}@${BH_COMMIT}:</p>
<pre>
EOF

(bids-validator . | ansifilter --html --fragment --line-numbers --anchors=self) && STATUS=$? || STATUS=$?

cat <<EOF
</pre>
</html>
EOF

(pipes will block if they're out of space until they're not out of space)

That was why I was using nested subshells -- you can grab their exit codes while still processing data 'live', piped direct to stdout.
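
One caveat with the piped version: by default a pipeline's exit status is that of its last command, so STATUS would reflect the filter rather than the validator. A minimal sketch of how `set -o pipefail` changes that, using stand-in functions (`fake_validator` and `fake_filter` are hypothetical placeholders, not the real commands):

```sh
#!/usr/bin/env bash
# Stand-ins: the "validator" prints a line and fails with status 1,
# the "filter" passes its input through and always succeeds.
fake_validator() { echo "1 error found"; return 1; }
fake_filter() { cat; }

# Default behaviour: the pipeline reports the filter's status (0).
(fake_validator | fake_filter) && PLAIN=$? || PLAIN=$?

# With pipefail set inside the subshell, a failing stage is reported.
(set -o pipefail; fake_validator | fake_filter) && STATUS=$? || STATUS=$?

echo "PLAIN=${PLAIN} STATUS=${STATUS}" >&2
```

Setting pipefail inside the subshell keeps the option from leaking into the rest of the script. `${PIPESTATUS[0]}` is another bash option when only the first stage matters.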

Member

TIL that bash drops null bytes when doing command substitutions. At least it prints out a warning that this is the case! I thought with enough quoting it would be safe (apart from the issue that trailing newlines are discarded when doing $(...)), but I was insufficiently paranoid. This is a good lesson to learn, thanks.

But since pipes are safe, let's use them, yes. Maybe wrapped in a function, if we want to be nice about formatting the script.
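
For the record, a tiny demonstration of both effects (dropped null bytes and stripped trailing newlines), compared with a pipe that keeps every byte. The six-byte input is made up for the example.

```sh
#!/usr/bin/env bash
# Command substitution drops NUL bytes (recent bash versions warn about
# it on stderr) and strips all trailing newlines.
captured=$(printf 'a\0b\n\n\n')
echo "input: 6 bytes, captured: ${#captured} bytes"

# A pipe preserves every byte, including the NUL and the newlines.
piped=$(printf 'a\0b\n\n\n' | wc -c | tr -d ' ')
echo "piped: ${piped} bytes"
```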

EOF

(
WORKDIR="$(mktemp -d --tmpdir bids-hook."$BH_UUID".XXXXXXXXXXX)"
Member Author

It doesn't change much either way, but why do you think BH_UUID is sensitive?

Member

@mguaypaq mguaypaq Mar 9, 2023

My thinking is that the BH_UUID is the only "secret" which is needed to view someone else's bids-validator output, because Gitea doesn't have any way to authenticate visitors of the static assets. And this output can contain a bunch of metadata about subjects, etc.

I guess we have bigger problems if someone is logged into the server anyway, but just doing ls /tmp at the right time, while logged in as any user, would reveal the BH_UUID for the running jobs, and then you can visit the corresponding webpage with the bids-validator output.
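
A minimal sketch of the alternative, assuming nothing about the rest of the worker: the temp dir name carries only a random suffix, and an EXIT trap cleans it up. The `bids-hook.` prefix matches the existing name; the template form used here is the portable one rather than `--tmpdir`.

```sh
#!/usr/bin/env bash
set -e

# The random suffix is the only variable part of the name, so listing the
# parent directory reveals nothing about the job being processed.
WORKDIR=$(mktemp -d "${TMPDIR:-/tmp}/bids-hook.XXXXXXXXXX")

cleanup() {
  # `|| true` keeps a failed removal from changing the script's exit status.
  rm -rf -- "$WORKDIR" || true
}
trap cleanup EXIT

echo "working in a private temp dir" >&2
```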

Comment on lines 57 to 58
GITEA_REPO=$(realpath --canonicalize-existing \
"${GITEA_REPOSITORY_ROOT}/${BH_USER}/${BH_REPO}.git")
Member Author

I wanted to be more transparent in the output, which meant showing a relative path to the user, maybe.

I just did a quick test and git clone canonicalizes local paths, but git remote add doesn't. So... yeah, I guess we need this either way; I guess I didn't test with GITEA_REPOSITORY_ROOT set to a relative path. And we'll just have to live with exposing full paths. Drone does, after all (but Drone mostly works in ephemeral Docker containers).

Member

I see. I actually ran my tests with relative paths, and everything failed, that's why I put this in. And since I wasn't trying to show this path to users, only in the admin logs, I hadn't thought about showing a pretty path. Good point.

Comment on lines 41 to 102
if git ls-remote --exit-code "$GITEA_REPO" "refs/heads/git-annex" >/dev/null; then
ANNEXED=1
else
ANNEXED=""
fi

if [ -n "$ANNEXED" ]; then
set -x
# If this is a git-annex repository, we need to get the contents.
if git ls-remote --exit-code origin refs/heads/git-annex >/dev/null; then
echo '# getting git-annexed files'
# this reduces copies; always overrides annex.hardlink even if that is set system-wide
git config annex.thin true
# make sure we don't corrupt origin accidentally
git config remote.origin.annex-readonly true
git config annex.private true # XXX this doesn't do anything until git-annex 10
git annex init
git annex dead here # this is like annex.private, but has to be run
git annex sync --only-annex --no-content # grab the git-annex branch (since we did a shallow clone above)
git annex copy --from origin # NB: using copy --from origin and not git annex get to ensure we're validating the contents of origin and not any special remotes
else
set -x
# grab the git-annex branch (since we did a shallow clone above)
git annex sync --only-annex --no-content
# NB: using 'copy --from origin' and not 'git annex get' to ensure we're
# validating the contents of origin and not any special remotes
git annex copy --from origin
fi
Member Author

I think users should see all these commands.

Drone inserts a clone step into every pipeline by default (unless you turn it off). It appears like this, with each input command written as + CMD (from set -x, or maybe from emulating it) and followed by its output (stderr and stdout and all):

Build #68440 1 1 - go-gitea_gitea - Drone CI

I think it's easier to understand set -x's output ("+ git remote add ${GITEA_REPO} && git fetch ...") than printed comments ("# cloning ${GITEA_REPO}"). It's longer, but when things go wrong -- which is the main time people read these logs -- people will need to see what commands were actually run. I want that to show up in the final output HTML. Our curators need a chance to see what went wrong without asking their sysadmin to dig out the logfile and help them debug git-annex every time. That has no chance of scaling up, especially since I hope eventually we have enough installs and neurogitea is reliable enough that there can be sysadmins who do not need to be intimately familiar with git-annex 🙏

So, I think:

  • set -x is important (which implies 2>&1'ing this section, maybe the whole script)

  • errors in any of these commands count as exit code 1, not exit code 3

    • I'll grant that 'git: command not found' or 'git-annex: not found', maybe '${GITEA_REPO}: not found', should theoretically be exit code 3 since that's a server misconfiguration. But any other error the user needs to see. And since it's hard to tease apart which is which, we should just dump all errors here to exit code 2.

      A very common error for curators will be some variant of running git annex copy --to but forgetting to git annex sync (or worse: force pushing to git-annex), or otherwise getting git-annex out of sync with the branch under test. That is the kind of error that curators need to get feedback on. Arguably we should have a separate webhook to check git-annex health, but even if we did the error should also show up when trying to run this; on other CI systems, you would expect a corrupted version control database to show up as an error when trying to run the tests.

Member

I see now that I was working from a completely different mental model of who should see what, sorry! But your points make a lot of sense.

Comment on lines 47 to 89
if [ -n "$ANNEXED" ]; then
set -x
# If this is a git-annex repository, we need to get the contents.
if git ls-remote --exit-code origin refs/heads/git-annex >/dev/null; then
echo '# getting git-annexed files'
Member Author

This was an exception to my own rule that everyone should see everything. I originally had it written this way, but with set -x turned on, and it turned out that ls-remote is pretty noisy, whether it succeeds or fails, so I decided users could live without it. It's not something anyone would normally run anyway.

By the way, can you think of a better way to detect if a repo is annexed? git annex init contacts the remote and reads its config, looking for annex.uuid, but I don't know how to do that from the command line without actually running git annex init and by then it's too late.

cat <<"EOF"
echo 1>&2 '# running bids-validator'
OUTPUT=$(bids-validator .) && STATUS=$? || STATUS=$?
WARNINGS=$(bids-validator . --json | jq '.issues.warnings | length') || true
Member Author

It's disappointing to have to run this twice. bids-validator is pretty quick, even on large datasets, but I don't trust it to stay that way.

Is there another flag where maybe we could get the warnings routed...elsewhere?

Or can we patch bids-validator so that it exits:

  • 0 if no issues exist
  • 1 if issues exist and contain errors
  • 2 if issues exist but contain only warnings

Does it already do that?

Member

> Does it already do that?

I don't think so. Looking at the source, here is the meaning of each exit status for bids-validator:

  • Exit 3 for internal errors (the equivalent of a traceback).
  • Exit 2 for problems opening the dataset directory.
  • Exit 1 if the .bids-validator-config.json is invalid or the dataset contains "error"-level issues.
  • Exit 0 otherwise, including if there are "warning"-level issues.
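
Given those exit statuses, status 0 alone cannot distinguish a clean run from a warnings-only run, which is why a warning count is needed as a second input. A rough sketch of combining the two follows; the function name `classify_result` and the result labels are made up, and it does nothing about the cost of running the validator twice.

```sh
#!/usr/bin/env bash
# Hypothetical mapping from (exit status, warning count) to a result label.
# The exit-status meanings follow the list above; the labels are invented.
classify_result() {
  # $1 = bids-validator exit status, $2 = warning count (may be empty)
  case "$1" in
    0)
      if [ "${2:-0}" -gt 0 ]; then echo "warning"; else echo "success"; fi
      ;;
    1) echo "failure" ;;
    *) echo "error" ;;
  esac
}

classify_result 0 0   # prints success
classify_result 0 3   # prints warning
classify_result 1 5   # prints failure
classify_result 2 ""  # prints error
```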

fi

bids-validator --verbose .
Member Author

@kousu kousu Mar 2, 2023

I prefer having verbose on.

I see that we do this in

https://github.com/ivadomed/ivadomed/blob/e101ebea632683d67deab3c50dd6b372207de2a9/.github/workflows/run_tests.yml#L112

but not in

https://github.com/spinalcordtoolbox/spinalcordtoolbox/blob/f3d7ad0c78b1ae3232d14ca060f3893d71444803/.ci.sh#L40

So maybe it's a cultural thing.

My argument is that: the CI machine is a batch job. You let it run and wait for it to bake (or boil). If something goes wrong, you don't want to redo its work because that's potentially expensive. So if we're going to output any errors/warnings we should output all of them.

We can help out the cognitive load by colouring errors in red, and warnings in yellow (which ansifilter does now!).

And with bids-validator in particular, not having --verbose makes it quit early when listing issues, which means the most direct route people will try to follow to fix them will be: fix the ones on the screen, commit, push, wait, look at the new list of failures. In that case what people should do, I guess, is see that CI failed, ignore the rest of its output, run bids-validator --verbose locally on their dataset until it passes, then commit and push. But people often will just try to take a shortcut and not do that. And also CI/dev are subtly different environments (for example: if somehow the annexed data is corrupted/out of sync/not committed, dev could pass but CI might fail, and CI is there to catch that kind of mistake).

Member

I think I'm swayed by your point that bids-validator will quit early without --verbose, which I didn't know.

But for the record, my main reason came from recent experience with --verbose being counter-productive: for example, the output for the "Checking BIDS compliance" step in spine-generic/data-multi-subject is literally more than 10,000 lines. No amount of colour highlighting can make that pleasant. And at least in my browser, GitHub's UI looks like it's doing some kind of fancy loading and unloading of the content, which breaks text search. If I scroll to the top, I can find the first warning; if I scroll to the bottom, I can find the third warning; but I have not yet been able to find the second warning buried in that mountain of output.

@mguaypaq
Member

mguaypaq commented Mar 9, 2023

> I don't suppose you kept screenshots of each case? I'd like to look over the log files. Maybe do you have a repo with branches for each of the cases you tested I could pull somewhere?

I don't have screenshots, and I'm not sure which log file is which case, sorry. I should get in the habit of taking screenshots. For the repos I basically used spine-generic/data-single-subject, with most of the subjects deleted so that it would be a small example. That was for the git-annexed version, and then I just copied the files (including the real content) to a new repo for the non-git-annex tests.

git checkout "$BH_COMMIT"

# If this is a git-annex repository, we need to get the contents.
if git ls-remote --exit-code origin refs/heads/git-annex >/dev/null; then
Member

Since we know the repository is local anyway, we could instead check directly for a valid annex.uuid in the config:

Suggested change
- if git ls-remote --exit-code origin refs/heads/git-annex >/dev/null; then
+ if git -C "$GITEA_REPO" config --get annex.uuid '^[[:xdigit:]]{8}(-[[:xdigit:]]{4}){3}-[[:xdigit:]]{12}$' >/dev/null; then
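
The pattern in that suggestion can be tried in isolation. This sketch applies the same extended regular expression with `grep -E` instead of `git config --get`, so it needs no repository; `is_uuid` is a hypothetical helper name.

```sh
#!/usr/bin/env bash
# Same extended regular expression as in the suggestion, applied with
# grep -E to a single value instead of through `git config --get`.
uuid_re='^[[:xdigit:]]{8}(-[[:xdigit:]]{4}){3}-[[:xdigit:]]{12}$'

is_uuid() {
  printf '%s\n' "$1" | grep -Eq -- "$uuid_re"
}

is_uuid 90fa9a89-5e50-4bd4-834d-d42655e1ee8e && echo "looks like a UUID"
is_uuid "" || echo "empty value rejected"
```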

@mguaypaq
Member

> > configure the webhook to ignore the branches git-annex and synced/*?
>
> mmmmmyesssss but I think it'd be better to solve this in gitea itself: neuropoly/gitea#4.

That would be great, yes. But also here is some silliness that I'm recording for future reference:

None of these suggestions is really serious.

Successfully merging this pull request may close these issues.

Write an actual worker script
2 participants