nightlies and link validation failing because of repository.apache.org blockage #1585

raboof · 2024-12-16T11:15:57Z

Our nightlies and link validation sometimes fail when they run on a GitHub Actions runner that is blocked from repository.apache.org.

Infra seems open to creating per-project buckets for the abuse thresholds, but we'd have to add a header to the requests to identify ourselves.

Looks like this would depend on coursier/coursier#1203

pjfanning · 2024-12-16T11:19:59Z

I wonder if we could try ordering the resolvers in sbt.

I've seen failures where we get issues loading 3rd party jars because our sbt setup seems to check repository.apache.org before checking maven central. Ideally, repository.apache.org should be checked last.
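As a rough sketch of what that ordering could look like in `build.sbt`: this assumes sbt 1.x and that the Apache repository in question is the snapshots repository at the URL shown, neither of which is confirmed above. The value name `apacheSnapshots` is just illustrative. Whether the resolution backend preserves this order would still need to be verified.

```scala
// build.sbt (sketch only; assumes sbt 1.x)

// Assumption: this is the Apache repository the build needs.
// The resolver name and the value name are illustrative.
val apacheSnapshots =
  "apache-snapshots" at "https://repository.apache.org/content/repositories/snapshots/"

// Spell out the complete lookup order instead of relying on how sbt
// combines `resolvers` with its defaults: local repository first,
// then Maven Central, then repository.apache.org last.
externalResolvers := Seq(
  Resolver.defaultLocal,
  Resolver.DefaultMavenRepository,
  apacheSnapshots
)
```

Running `show externalResolvers` in the sbt shell prints the resulting list, which is a quick way to check the order that a given project actually ends up with.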

Humbedooh · 2024-12-16T11:59:44Z

https://brettporter.wordpress.com/2009/06/16/configuring-maven-http-connections/ suggests you can set a custom user agent header for the requests. We could make use of this, if we come up with a standard format for denoting ASF projects. This would allow us to tailor rules to both be more lenient in these cases, as well as debug which projects or builds are causing issues.

raboof · 2024-12-16T13:17:44Z

> I wonder if we could try ordering the resolvers in sbt.
>
> I've seen failures where we get issues loading 3rd party jars because our sbt setup seems to check repository.apache.org before checking maven central. Ideally, repository.apache.org should be checked last.

I agree that would be a good thing to keep an eye on. 'Normal' CI builds shouldn't reference repository.a.o at all, though, right? And even when including repository.apache.org, I think sbt should use Maven Central first regardless of what additional things we put into resolvers (e.g. sbt/sbt#1138)

> https://brettporter.wordpress.com/2009/06/16/configuring-maven-http-connections/ suggests you can set a custom user agent header for the requests

Yes (or arbitrary other headers). Pekko uses sbt instead of mvn to access the Maven repository, though, so that'd need a separate change.

pjfanning · 2024-12-31T10:43:09Z

@raboof one source of strain that we put on repository.apache.org is from https://github.com/pjfanning/sbt-pekko-build

This has logic to find the latest snapshot versions by scraping pages served by repository.apache.org.

raboof · 2025-01-03T10:26:16Z

We haven't seen GitHub Actions runners get blocked by the "too many 404's on repository.apache.org" rule since apache/ranger#435 was merged. I have now (ack'ed by infra) removed all those bans.

That should help, but GitHub Actions runners are still being banned for Bugzilla scraping (> 800req/hr to show_bug.cgi). I guess we should look into whether those are 'real' scrapers or some misconfigured job somewhere as well.

raboof · 2025-01-03T10:51:06Z

looks like this might be bingbot, filed https://issues.apache.org/jira/browse/INFRA-26405 to get a robots.txt in place

pjfanning mentioned this issue Dec 16, 2024

stop running link validator on every PR commit #1595

Closed
