Fall back to urlcanon when URL parsing fails
Java's URL parser has become stricter over time, so some URLs that older versions accepted are now rejected.
ato committed Aug 30, 2024
1 parent 51df5d0 commit 95c01d7
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion src/outbackcdx/UrlCanonicalizer.java
@@ -18,6 +18,8 @@
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 
+import org.netpreserve.urlcanon.Canonicalizer;
+import org.netpreserve.urlcanon.ParsedUrl;
 import org.snakeyaml.engine.v2.api.Load;
 import org.snakeyaml.engine.v2.api.LoadSettings;
 
@@ -370,7 +372,10 @@ public static String canonicalize(String rawUrl) {
         try {
             return canonicalize(makeUrl(rawUrl)).toString();
         } catch (MalformedURLException e) {
-            return rawUrl;
+            // if Java's URL parser rejected the input fallback urlcanon instead
+            ParsedUrl parsedUrl = ParsedUrl.parseUrl(rawUrl);
+            Canonicalizer.AGGRESSIVE.canonicalize(parsedUrl);
+            return parsedUrl.toString();
         }
     }
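
The control flow of this change can be illustrated with a JDK-only sketch: try the strict parser first, and only when it throws `MalformedURLException` hand the raw string to a more lenient routine. Everything named here is an assumption for illustration: `FallbackSketch`, `strictCanonicalize`, and `lenientCanonicalize` are hypothetical, and the lenient step (trim and lowercase) is a crude stand-in for urlcanon, not a description of what `Canonicalizer.AGGRESSIVE` does.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Locale;

// Hypothetical sketch of the "strict first, lenient on failure" pattern.
class FallbackSketch {

    // Strict path: relies on java.net.URL, which throws MalformedURLException
    // for inputs it cannot parse (for example, a string with no protocol).
    static String strictCanonicalize(String rawUrl) throws MalformedURLException {
        URL url = new URL(rawUrl);
        StringBuilder out = new StringBuilder();
        out.append(url.getProtocol().toLowerCase(Locale.ROOT)).append("://");
        out.append(url.getHost().toLowerCase(Locale.ROOT));
        if (url.getPort() != -1) {
            out.append(':').append(url.getPort());
        }
        out.append(url.getFile()); // path plus query string, case preserved
        return out.toString();
    }

    // Lenient path: an assumed placeholder for a tolerant parser such as urlcanon.
    static String lenientCanonicalize(String rawUrl) {
        return rawUrl.trim().toLowerCase(Locale.ROOT);
    }

    // Mirrors the shape of the patched method: never return the input
    // untouched just because the strict parser rejected it.
    static String canonicalize(String rawUrl) {
        try {
            return strictCanonicalize(rawUrl);
        } catch (MalformedURLException e) {
            return lenientCanonicalize(rawUrl);
        }
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("HTTP://Example.COM/Path?Q=1"));
        System.out.println(canonicalize("Example.COM/Path")); // no protocol, takes the fallback
    }
}
```

The point of the design is that a rejected URL still gets some canonical form, so lookups for it remain consistent instead of depending on how the raw string happened to be written.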
