
[KIEKER-1335] Crawler checking for dead links on the web site? #2817

Open

rju opened this issue Nov 12, 2024 · 12 comments

rju commented Nov 12, 2024

JIRA Issue: KIEKER-1335 Crawler checking for dead links on the web site?
Original Reporter: André van Hoorn


Maybe some free online services exist? Otherwise, this should be easy to implement with wget or curl.
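A minimal sketch of the curl variant, assuming the URLs to check are collected in a hypothetical urls.txt (one URL per line):

```bash
#!/usr/bin/env bash
# Report every URL from urls.txt that does not answer with HTTP 2xx/3xx.
while read -r url; do
  # -s: silent, -L: follow redirects, -o /dev/null: discard the body,
  # -w '%{http_code}': print only the final status code,
  # --max-time 10: do not hang on dead hosts
  code=$(curl -sL -o /dev/null --max-time 10 -w '%{http_code}' "$url")
  case "$code" in
    2*|3*) ;;                       # link is alive
    *) echo "DEAD ($code): $url" ;;
  esac
done < urls.txt
```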


rju commented Nov 12, 2024

author Jan Waller -- Wed, 19 Dec 2012 21:45:42 +0100

Maybe the Google Webtools can do this as well? (See KIEKER-718, status: Done.)


rju commented Nov 12, 2024

author André van Hoorn -- Tue, 15 Jan 2013 16:30:39 +0100

Replying to [jwa|comment:1]:
> Maybe the Google Webtools can do this as well? (See KIEKER-718, status: Done.)

Google admin tools seem to check for dead internal links. However, external links are not checked (e.g., the broken TrustSoft link we had).


rju commented Nov 12, 2024

author André van Hoorn -- Tue, 15 Jan 2013 16:31:12 +0100

Replying to [avh|comment:2]:
> Replying to [jwa|comment:1]:
> > Maybe the Google Webtools can do this as well? (See KIEKER-718, status: Done.)
>
> Google admin tools seem to check for dead internal links. However, external links are not checked (e.g., the broken TrustSoft link we had).

But we should check it ...


rju commented Nov 12, 2024

author André van Hoorn -- Mon, 20 Jan 2014 10:16:23 +0100

Replying to KIEKER-1335 (Accepted):
> Maybe some free online services exist? Otherwise should be easy to implement with wget or curl.

Another option we are now using in our group is the `linkchecker` tool, which is also included in the Ubuntu distro.

Example output obtained by calling `linkchecker http://kieker.uni-kiel.de/trac/` is attached to this ticket as kiekerurl.log (cancelled after some time). Thanks to Teerat.

It could make sense to use it for both:
1. http://kieker-monitoring.net
2. http://trac.kieker-monitoring.net

Would, of course, need some further configuration. How about running it as a Jenkins job?
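A possible invocation for both sites, as a rough sketch (the recursion depth and thread count are placeholders to be tuned):

```bash
# --check-extern also validates outgoing (external) links,
# -r limits the recursion depth, -t the number of parallel requests.
linkchecker --check-extern -r 5 -t 4 http://kieker-monitoring.net      > kieker-site.log
linkchecker --check-extern -r 5 -t 4 http://trac.kieker-monitoring.net > kieker-trac.log
```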


rju commented Nov 12, 2024

author André van Hoorn -- Mon, 20 Jan 2014 13:49:19 +0100

Replying to [avh|comment:6]:
> Replying to KIEKER-1335 Accepted :
> > Maybe some free online services exist? Otherwise should be easy to implement with wget or curl.
>
> Another option we are now using in our group is the `linkchecker` tool, which is also included in the Ubuntu distro.
>
> Example output obtained by calling `linkchecker http://kieker.uni-kiel.de/trac/` is attached to this ticket as kiekerurl.log (cancelled after some time). Thanks to Teerat.
>
> It could make sense to use it for both:
> 1. http://kieker-monitoring.net
> 2. http://trac.kieker-monitoring.net
>
> Would, of course, need some further configuration. How about running it as a Jenkins job?

  • ignore `robots.txt`; we want to check these pages (see the sketch below)
  • think times between requests (avoid DoS)
  • which parts of Trac to include (tickets?)
  • <add>
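A wget-based sketch covering the points above; the --reject-regex pattern for excluding individual Trac tickets is illustrative:

```bash
# -e robots=off        : ignore robots.txt; we do want those pages checked
# --wait/--random-wait : think time between requests, to avoid DoS-like load
# --reject-regex       : example pattern excluding individual Trac tickets
wget --spider -r -l 5 -e robots=off --wait=1 --random-wait \
     --reject-regex '/trac/ticket/' -o deadlinks.log \
     http://trac.kieker-monitoring.net
# with --spider -r, wget lists any broken links at the end of the log
tail -n 20 deadlinks.log
```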


rju commented Nov 12, 2024

author nils-christian -- Mon, 23 Feb 2015 16:54:50 +0100

Delegate to the next available HiWi (student assistant).


rju commented Nov 12, 2024

author nils-christian -- Wed, 18 Mar 2015 23:10:51 +0100

1.12?


rju commented Nov 12, 2024

author André van Hoorn -- Fri, 4 Sep 2015 02:30:55 +0200

Thomas: interested?


rju commented Nov 12, 2024

author Thomas F. Düllmann -- Tue, 8 Sep 2015 16:07:27 +0200

There is a web-based tool by the W3C, https://validator.w3.org/checklink, which seems to do exactly this.
For a script in Jenkins, something like this could be useful: http://memory.psych.mun.ca/tech/snippets/wget.shtml.
The downside of the wget script is that it does not show the page on which the invalid link is located; it only reports which link did not work.


rju commented Nov 12, 2024

author André van Hoorn -- Wed, 6 Sep 2017 09:42:25 +0200

Would be nice to have a Jenkins job for this.
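As a sketch, such a job could boil down to a single shell build step. Assuming LinkChecker is installed on the build node, it returns a non-zero exit code when it finds broken links, which would mark the build as failed; the log file name is arbitrary.

```bash
#!/usr/bin/env bash
# Shell step for a Jenkins job: run the check, keep the log as a build
# artifact, and propagate LinkChecker's exit code through the pipe.
set -o pipefail
linkchecker --check-extern -r 5 http://kieker-monitoring.net | tee linkchecker.log
```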


rju commented Nov 12, 2024

author André van Hoorn -- Wed, 6 Sep 2017 09:47:08 +0200

For instance via https://validator.w3.org/checklink?uri=http%3A%2F%2Fkieker-monitoring.net&hide_type=all&recursive=on&depth=-1&check=Check, and then parse the results.


rju commented Nov 12, 2024

author Thomas F. Düllmann -- Mon, 11 Sep 2017 17:15:51 +0200

By running

```bash
wget -O linkchecker.html -r "https://validator.w3.org/checklink?uri=http%3A%2F%2Fkieker-monitoring.net%2F&summary=on&hide_type=all&recursive=on&depth=-1&check=Check"
```

one gets a file like this: linkchecker.html
This could be a first step towards having it as a build artifact.
Later on, it would be an option to parse that file to see how many warnings/errors are present and to use them, similar to our thresholds for PMD/FindBugs/Checkstyle, to make sure there are no more issues than allowed.
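A threshold check over the downloaded report could look roughly like the sketch below; the grep pattern is an assumption about the report's markup and would need to be adapted to the actual checklink HTML.

```bash
#!/usr/bin/env bash
# Fail the build if the checklink report mentions more problems than allowed.
MAX_ERRORS=0
# grep -c prints the match count; it exits non-zero when nothing matches,
# so || true keeps the command from aborting the script in that case.
errors=$(grep -ci 'broken' linkchecker.html || true)
if [ "$errors" -gt "$MAX_ERRORS" ]; then
  echo "Link check failed: $errors problem(s) found (allowed: $MAX_ERRORS)"
  exit 1
fi
```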
