Autocorrect misidentified Files from remote sources after Exif/pronom extraction #278

Open
DiegoPino opened this issue Sep 28, 2023 · 0 comments
Assignees
Labels
Configuration Danger Mr Robinson Things so core to us that need extra care. Please submit automated tests? Digital Preservation enhancement New feature or request Events and Subscriber JSON Postprocessors Drupal Plugins that do stuff with JSON data question Further information is requested
Milestone

Comments

@DiegoPino
Member

DiegoPino commented Sep 28, 2023

What?

I should have come up with this before, but here I am, just a boy standing in front of an application/octet-stream, knowing it is a TIFF (but it still does not love me?)

How?

Especially when dealing with remote Files and ill-configured HTTP servers, we have ended up with Files being ingested via AMI and identified/routed to as:document because the headers were absent and we did not even have an extension. But once the file was persisted and saved, and exif/pronom etc. kicked in, we could get the real format from inside (and more precisely than we could ever have gotten by just fetching and downloading). And all this wonderful tech metadata is there and stored. The issue is that the Drupal File entity is already created and the file is in its final position (probably in S3://, using either just a dot, no extension, or stuff like .bin).

But we have three last-minute "signs", things we can act on (in the analogy of the boy standing in front, let's say these are orange flowers and chocolate-covered cherries as a response to a smile).

  • We know we have an application/octet-stream at the dr:mimetype level
  • AND We don't have a real extension. (c'mon, .bin is not real)
  • AND We know flv:exif && pronom inferred from signature mimetype are telling us a different story
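As a rough sketch of how those three signs could be combined into a single check: the function and constant names below are hypothetical, the inputs are flattened into plain arguments rather than read from the real metadata structure, and treating only an empty extension or .bin as "not real" is an assumption. It also assumes one non-generic answer from either exif or pronom is enough to count as "a different story".

```python
from pathlib import PurePosixPath
from typing import Optional

GENERIC_MIMETYPE = "application/octet-stream"
# Assumption: only a missing extension or ".bin" counts as "not a real extension".
PLACEHOLDER_EXTENSIONS = {"", "bin"}


def has_real_extension(filename: str) -> bool:
    """Return True when the file name ends in something other than a placeholder."""
    suffix = PurePosixPath(filename).suffix.lstrip(".").lower()
    return suffix not in PLACEHOLDER_EXTENSIONS


def should_autocorrect(
    stored_mimetype: str,
    filename: str,
    exif_mimetype: Optional[str],
    pronom_mimetype: Optional[str],
) -> bool:
    """All three signs must hold before a last attempt is worth making."""
    # Sign 3: exif and/or pronom report something more specific than the generic type.
    inferred = {m for m in (exif_mimetype, pronom_mimetype) if m}
    inferred.discard(GENERIC_MIMETYPE)
    return (
        stored_mimetype == GENERIC_MIMETYPE  # sign 1: generic dr:mimetype
        and not has_real_extension(filename)  # sign 2: no real extension
        and bool(inferred)
    )
```

A stricter variant could require exif and pronom to agree with each other before acting, which might be the safer default for something this close to the core.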

Based on that we could kick off a "save the night and dance at least once" action that, under these conditions, makes a last attempt: it deduces the right extension, renames the file and its S3 path (cheaper than deleting/re-ingesting), edits the File entity by adding the real exif and changing the URL to the source (same size, same checksum), and moves the file to its right place under as:image or as:audio, who knows. Now the question: should this be a setting? Or are the "signs" enough to try one more time and get flowers, at midnight, at the gas station (or from someone's front yard)?
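The "last attempt" could be split into a pure planning step and the actual rename/move. Below is a minimal sketch of the planning half only, under stated assumptions: the MIME-type-to-extension table, the CorrectionPlan type and plan_correction are all hypothetical, and anything that is not an image or audio is assumed to stay under as:document. Nothing here touches S3 or the Drupal File entity.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup; a real one would need to cover far more formats.
EXTENSION_BY_MIMETYPE = {
    "image/tiff": "tiff",
    "image/jpeg": "jpg",
    "audio/mpeg": "mp3",
    "application/pdf": "pdf",
}
# Assumption: route by top-level media type, falling back to as:document.
AS_KEY_BY_TOP_LEVEL_TYPE = {"image": "as:image", "audio": "as:audio"}


@dataclass(frozen=True)
class CorrectionPlan:
    new_path: str
    new_mimetype: str
    as_key: str


def plan_correction(current_path: str, inferred_mimetype: str) -> Optional[CorrectionPlan]:
    """Work out the corrected path, MIME type and as: key without changing anything."""
    extension = EXTENSION_BY_MIMETYPE.get(inferred_mimetype)
    if extension is None:
        return None  # unknown format: leave the file where it is
    # Plain string handling keeps the "s3://" scheme intact.
    directory, separator, name = current_path.rpartition("/")
    # Drop a trailing dot or placeholder extension such as ".bin".
    stem = (name.rsplit(".", 1)[0] if "." in name else name) or name
    top_level_type = inferred_mimetype.split("/", 1)[0]
    return CorrectionPlan(
        new_path=f"{directory}{separator}{stem}.{extension}",
        new_mimetype=inferred_mimetype,
        as_key=AS_KEY_BY_TOP_LEVEL_TYPE.get(top_level_type, "as:document"),
    )
```

Keeping the plan separate from its execution would make the risky part (rename, File entity edit, re-routing) easy to cover with automated tests, and would let a setting decide whether the plan is applied or only reported.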

@alliomeria ping. This would be just great, because we could re-process (simply save) ADOs that have this issue instead of patching?

@DiegoPino DiegoPino added enhancement New feature or request question Further information is requested JSON Postprocessors Drupal Plugins that do stuff with JSON data Events and Subscriber Digital Preservation Configuration Danger Mr Robinson Things so core to us that need extra care. Please submit automated tests? labels Sep 28, 2023
@DiegoPino DiegoPino added this to the 1.2.0 milestone Sep 28, 2023
@DiegoPino DiegoPino self-assigned this Sep 28, 2023