GH-127090: Fix urllib.response.addinfourl.url value for opened file: URIs #127091

Open
wants to merge 7 commits into
base: main

Contributor

@barneygale barneygale commented Nov 21, 2024

The canonical file: URL (as generated by pathname2url()) is now used as the url attribute of the returned addinfourl object. The addinfourl.url attribute now reflects the resolved URL for both file: and http[s]: URLs.

The original `file:` URL that was passed to `urlopen()` is now used as the
`url` attribute of the returned `addinfourl` object. The `addinfourl.url`
attribute *always* reflects the original `file:`, `data:` or `ftp:` URL
now.
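
A minimal sketch of how the attribute can be inspected, assuming a temporary file and a URL built with `pathlib.Path.as_uri()` (both chosen for illustration, not taken from the patch). The exact string reported by `response.url` depends on the Python version and on whether this change is applied, so no expected output is shown.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen


def open_file_url(path):
    """Open a local file via urlopen() and return (requested URL, response.url, body)."""
    # as_uri() needs an absolute path; resolve() guarantees that.
    uri = Path(path).resolve().as_uri()
    with urlopen(uri) as response:
        # response is an addinfourl instance for file: URLs.
        return uri, response.url, response.read()


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file used only for this illustration.
    target = Path(tmp) / "example.txt"
    target.write_bytes(b"hello")
    requested, reported, body = open_file_url(target)
    print("requested:", requested)
    print("reported: ", reported)
```

Comparing `requested` with `reported` shows whether the running interpreter echoes the URL it was given or a rewritten form of it.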
@barneygale barneygale changed the title from "GH-127090: Fix urllib.request.addinfourl.url value for opened file: URIs" to "GH-127090: Fix urllib.response.addinfourl.url value for opened file: URIs" Nov 21, 2024
@barneygale barneygale marked this pull request as ready for review November 21, 2024 06:29
@barneygale barneygale added the "needs backport to 3.12" (bug and security fixes) and "needs backport to 3.13" (bugs and security fixes) labels Nov 21, 2024
Member

@serhiy-storchaka serhiy-storchaka left a comment

LGTM in general, but there are not enough tests.

I think that you can add checks for url in test_file_notexists and maybe in test_basic and test_copy in urlretrieve_FileTests.

Member

@serhiy-storchaka serhiy-storchaka left a comment

On the other hand, the current code removes the port and fragment from the URL, whereas the changed code preserves them. I do not know which is more correct.

For history of the current code see 18e4dd7 and 6057ba1 (bpo-8656/gh-52902).

@barneygale
Contributor Author

Thank you Serhiy, that's very helpful.

I'll mark this PR as a draft for now. Once #127125 and #127194 land, I'll adjust this patch to call pathname2url() to canonicalise the URL, which will strip the port and fragment, restoring previous behaviour. It looks like urlopen() canonicalises HTTP URLs, and I think it makes sense to do so here.
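
For illustration, a small sketch of the round trip between a filesystem path and the URL path produced by `pathname2url()`. The temporary-directory path and the file name are hypothetical, and the exact form of the generated URL path varies between Python versions, so it is printed rather than asserted here.

```python
import os
import tempfile
from urllib.request import pathname2url, url2pathname

# Hypothetical path used only for illustration; the file need not exist.
path = os.path.join(tempfile.gettempdir(), "a b.txt")

url_path = pathname2url(path)        # percent-encodes the space as %20
round_trip = url2pathname(url_path)  # decodes back to a local path

print("URL path:  ", url_path)
print("round trip:", round_trip)
```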

@barneygale barneygale marked this pull request as draft November 23, 2024 13:55
@barneygale barneygale removed the "needs backport to 3.12" (bug and security fixes) and "needs backport to 3.13" (bugs and security fixes) labels Nov 25, 2024
@barneygale barneygale marked this pull request as ready for review November 25, 2024 21:39
@barneygale
Contributor Author

barneygale commented Nov 25, 2024

> I think that you can add checks for url in test_file_notexists and maybe in test_basic and test_copy in urlretrieve_FileTests.

I've expanded test_file_notexists, but the urlretrieve() tests aren't relevant here - they use a different codepath for handling local files (with its own set of problems...)

> On the other hand, the current code removes the port and fragment from the URL, whereas the changed code preserves them. I do not know which is more correct.

The existing code raises URLError if a port is given, and that's tested in test_urllib2.HandlerTests.test_file. I've slightly expanded that test to also cover fragment stripping.
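
A rough sketch of the port case described above, assuming a temporary file and a hand-built `file://localhost:80/...` URL. The helper `open_with_port` is hypothetical and is not part of the test suite or the patch.

```python
import tempfile
from pathlib import Path
from urllib.error import URLError
from urllib.request import urlopen


def open_with_port(path, port=80):
    """Try to open a local file via a file: URL whose authority carries a port."""
    # as_uri() yields "file:///..."; keep only the path part after "file://".
    url_path = Path(path).resolve().as_uri()[len("file://"):]
    url = f"file://localhost:{port}{url_path}"
    try:
        with urlopen(url) as response:
            return url, response.url
    except URLError as exc:
        # Return the exception so the caller can inspect the failure.
        return url, exc


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file used only for this illustration.
    target = Path(tmp) / "example.txt"
    target.write_bytes(b"hello")
    url, outcome = open_with_port(target)
    print(url, "->", type(outcome).__name__)
```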
