-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dont leak ip address #310
Dont leak ip address #310
Conversation
This reverts commit 0911477. urllib3 is needed to postprocess/parse/URL for sanitization and privacy purpose (bibanon#192)
We fix this by carefully redacting the IP address in the JSON fields known to contain it.
Not all heroes wear capes. Will review tomorrow and if everything looks good, we'll ship at as soon as possible. Thanks for putting the time into this. |
flake8 error. Do you have example test upload from this branch? edit: Do we need to lock that dep version? Can that be lifted? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this dep version locked?
I have questions @drzraf
When you answer these three questions I'll merge. |
No, I just reverted a previous commit for sake of commit history/revision tracking/reviewing process. I can remove the version lock if you prefer so.
It's not clear to me what do you want and why. A modified file? It easy to generate one: #!/usr/bin/python3
import json, sys
from tubeup.utils import strip_ip_from_meta
with open(sys.argv[1], 'r') as f:
_, new_meta = strip_ip_from_meta(json.load(f))
print(json.dumps(new_meta))
# Run with test.py tests/test_tubeup_rootdir/downloads/Mountain_3_-_Video_Background_HD_1080p-6iRV8liah8A.info.json I'm attaching a modified file if that's what you meant: a.json.gz
It's safe because:
Let's keep in mind that, overall, these files contains metadata related to the downloaded video. Some are very useful, other are garbage. In the case of these URL, then often contain expiring tokens/query parameters making them unreachable after a specific amount of time: they are in no way stable or canonical resources/identifier (some suggested to even drop them entirely). BTW, Google could still identify uploaders based on the temporary tokens parameters which this PR doesn't currently redact. tl;dr : These are mostly unimportant metadata but reasonable efforts have been provided to ensure safely obfuscation so that no side-effect is expected on Youtube and there is very very low probability a significant side-effect happened to the URLs from another provider. |
Useless because most people upload from Linux where they don't normally log in, or use a separate cookie file. Edit: Thanks for this I'll cut a new version later. |
I was asking for a link to a example upload where you tested this PR. Beyond just the test. I wanted to inspect the results of what it does. |
Your PR broke Tubeup. Expect a hard reset on the repo, and extreme scrutiny and testing standards for your PRs in the future. |
Would you mind explaining what did it broke? |
This is a lot of effort to not show me a URL. That PR put me through so much work, I tried to lift then re-added the urllib3 version lock, pushed new versions to pypi. I had to yank 3 releases due to that PR then hard reset Github I couldn't revert your PR due to me doing a bunch of other work after it, which is now all gone. Personally I'm not seeing a security issue from the IP in a VOD URL. Use a VPS. |
My main question remains : what's wrong with the PR? I obviously assume good faith. I also assume only technical implementations details are at stake here (unlike in #311). Otherwise, I suggest to reopen the PR and offer details about the problems it may have cause, so that I revamp and improve it. thx |
He just wants the URL of a successful upload with this patch. I'd assume your code is right, but only works for YouTube and breaks other sites. |
No it broke everything. |
@mrpapersonic If you want this PR in, I want it verified it works. So since the author cannot just give me an example URL, if you want this in make sure it works first. Because me and #313 saw how it was broken |
Hi, you haven't enabled the ability to make issues on your fork so I'm doing this here. Every time a item is uploaded to Archive.org the "Scanner" tag on the item lists the script name and version:
BinAnon is requesting you change this, you can make it "tubeup-drzraf" or whatever you want, but not our Scanner ID. Your fork divirges from ours in function significantly, specifically with open yt-dlp flags that allow for non-standard output and we don't want staff or future historians confusing the two. This is for automation of fixing items by IA staff. Additionally you list a link to our PyPi repo. Please also change that if you're going to fork. Please change this ASAP. Thanks. |
Long needed, long requested feature to stop leaking contributors/uploaders IP addresses.
#145, #146, #192
Here finally implemented by carefully parsing the URL as suggested previously.