[omm] Hash api implementation first draft #1355

Merged
merged 4 commits into from
Sep 12, 2023

Conversation

@Dcallies Dcallies commented Sep 8, 2023

Summary

Use the storage interfaces to look up the currently configured SignalTypes and ContentTypes, then download the content at the given URL and hash it.
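
A rough sketch of that flow, for orientation only. Everything named here is hypothetical: `CONFIGURED_SIGNALS` stands in for whatever the storage interfaces return, `hash_url` and the injected `fetch` callable are illustrative, and MD5 is a placeholder hasher (the endpoint in this PR returns a PDQ hash for photos).

```python
import hashlib
from typing import Callable, Dict

Hasher = Callable[[bytes], str]

# Hypothetical stand-in for the configured signal types: maps a content type
# name to {signal name: hashing function}. MD5 is only a placeholder here.
CONFIGURED_SIGNALS: Dict[str, Dict[str, Hasher]] = {
    "photo": {"md5_standin": lambda data: hashlib.md5(data).hexdigest()},
}


def hash_url(
    url: str, content_type: str, fetch: Callable[[str], bytes]
) -> Dict[str, str]:
    """Download `url` via `fetch`, then hash the bytes with every signal
    type configured for `content_type`."""
    signals = CONFIGURED_SIGNALS.get(content_type)
    if signals is None:
        raise ValueError(f"unsupported content_type: {content_type!r}")
    data = fetch(url)
    # Same shape as the endpoint's response: signal name -> hash string.
    return {name: hasher(data) for name, hasher in signals.items()}
```

Passing the downloader in as `fetch` keeps the sketch testable without network access, e.g. `hash_url("https://example.com/a.jpg", "photo", fetch=lambda u: b"hello")`.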

Test Plan

curl 'localhost:5000/h/hash?'\
'url=https://github.com/facebook/ThreatExchange/blob/main/pdq/data/bridge-mods/aaa-orig.jpg?raw=true&'\
'content_type=photo'

Response:

{
  "pdq": "f8f8f0cee0f4a84f06370a22038f63f0b36e2ed596621e1d33e6b39c4e9c9b22"
}
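
The curl request above passes the media URL unencoded. That appears to work here because `content_type` follows the first `&`, so `?raw=true` stays inside the `url` value, but any further `&` in a media URL would be split off as a separate parameter. A minimal sketch of building the same request with the inner URL percent-encoded, using only the standard library; `build_hash_request` is a hypothetical helper, not part of this PR.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_hash_request(base: str, media_url: str, content_type: str) -> str:
    """Return `base` with `url` and `content_type` as encoded query params."""
    # urlencode percent-encodes '?', '&' and '=' inside the media URL, so the
    # server receives it as a single `url` value.
    query = urlencode({"url": media_url, "content_type": content_type})
    return f"{base}?{query}"


MEDIA_URL = (
    "https://github.com/facebook/ThreatExchange/blob/main/"
    "pdq/data/bridge-mods/aaa-orig.jpg?raw=true"
)
request_url = build_hash_request("http://localhost:5000/h/hash", MEDIA_URL, "photo")

# Round-trip check: decoding the query recovers the original media URL.
params = parse_qs(urlsplit(request_url).query)
```

With curl, `--get --data-urlencode 'url=...'` does the same encoding.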

@Dcallies Dcallies added the hma Items related to the hasher-matcher-actioner system label Sep 8, 2023
@github-actions github-actions bot removed the hma Items related to the hasher-matcher-actioner system label Sep 8, 2023
    # the devcontainer and getting an error, just override the env
    app.config.from_pyfile("/workspace/.devcontainer/omm_config.py")
else:
    raise RuntimeError("No flask config given - try populating OMM_CONFIG env")
Contributor

Why was this needed?

Contributor Author

I tried running the flask CLI just to get a feel for it and hit an error. I spent ~10 minutes learning that this env variable needed to be populated, then realized that for development it will always be this same value.

I changed the CLI default from throwing an exception to loading this config.
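
For readers following along, a minimal sketch of that fallback order as a standalone function. The diff fragment does not show the condition guarding the devcontainer branch, so the file-existence check below is an assumption, and `resolve_config_path` is a hypothetical name; only the `OMM_CONFIG` variable, the devcontainer path, and the error message come from the diff.

```python
import os
from typing import Callable, Mapping

DEVCONTAINER_CONFIG = "/workspace/.devcontainer/omm_config.py"


def resolve_config_path(
    environ: Mapping[str, str],
    exists: Callable[[str], bool] = os.path.isfile,
) -> str:
    """Pick the config file: explicit env var first, devcontainer default next."""
    configured = environ.get("OMM_CONFIG")
    if configured:
        return configured
    # Assumption: fall back only when the devcontainer file is actually there.
    if exists(DEVCONTAINER_CONFIG):
        return DEVCONTAINER_CONFIG
    raise RuntimeError("No flask config given - try populating OMM_CONFIG env")
```

The returned path could then be handed to `app.config.from_pyfile(...)`, as in the diff.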

@@ -10,18 +26,65 @@ def hash_media():
Fetch content and return its hash.
TODO: implement
"""

content_type = _parse_request_content_type()
Contributor

How about we get the content type from the response headers (below) and simplify the interface?
download_resp.headers['content-type']

Contributor Author

Works for me - sometimes the content type cannot be determined from the URL, should we provide an optional param in that case?
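
One possible shape for the two ideas in this thread, sketched under assumptions: detect from the response's `Content-Type` header, and let an optional caller-supplied value win when present. `pick_content_type` and the `MIME_TO_CONTENT_TYPE` mapping are hypothetical; `photo` is the name used in the test plan, and `video` is assumed by analogy.

```python
from typing import Optional

# Assumed mapping from the MIME top-level type to a content type name.
MIME_TO_CONTENT_TYPE = {"image": "photo", "video": "video"}


def pick_content_type(
    header_value: Optional[str], override: Optional[str] = None
) -> str:
    """Prefer the explicit `override`; otherwise infer from the header."""
    if override:
        return override
    if header_value:
        # Drop parameters such as "; charset=..." and normalize case.
        mime = header_value.split(";", 1)[0].strip().lower()
        top_level = mime.split("/", 1)[0]
        if top_level in MIME_TO_CONTENT_TYPE:
            return MIME_TO_CONTENT_TYPE[top_level]
    raise ValueError("could not determine content type; pass content_type explicitly")
```

A server that answers `application/octet-stream` would hit the error path, which is the case the optional param is meant to cover.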

@Dcallies Dcallies merged commit cc04f20 into main Sep 12, 2023
4 of 5 checks passed
@Dcallies Dcallies deleted the hash_shim branch September 12, 2023 20:41
@Dcallies
Contributor Author

Discussed offline, doing content detection in a followup and removing as an argument for now.

3 participants