Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Orphaned media rows created by subject location update #3835

Open
lcjohnso opened this issue Apr 26, 2022 · 1 comment
Open

Orphaned media rows created by subject location update #3835

lcjohnso opened this issue Apr 26, 2022 · 1 comment

Comments

@lcjohnso
Copy link
Member

While pushing an update to add_location() on the Python Client (see zooniverse/panoptes-python-client#240), I identified unexpected behavior related to how a subject's locations blob is handled on update / edit: any Zooniverse-hosted media items that are already associated to a subject at the time of the update are orphaned in the media table and rewritten as external links in new entries.

As I discuss in this comment, the source of this behavior is that any location that is submitted via update request as a hash (key = mime type, value = src/URL) are treated as external links by this code in the Subject model. This is the default situation for any existing media items -- including Zooniverse-hosted media on Azure.

The good thing: when media entries are orphaned, the underlying media files are not cleaned up. That's desired behavior because otherwise the newly-minted media "external link" media items would be dead links.

The downside: editing subject locations leads to orphaned media rows, and there are now media items treated as external links even when Zooniverse is hosting the data.

Other consideration: I've discussed with a few folks potential ways to improve the API behavior; here are a few details (among others) that make this complex:

  1. Note that it is good that the Client returns the valid media_url (as opposed to the raw media src) so that project teams can readily access the hosted files without having to guess or (re)construct the URL. However, this makes it more difficult to differentiate between external links and hosted files, even if that difference is clear from the media table's src value.
  2. There are cases where media items might only need to be edited to change ordering rather than adding or changing src values or number of frames. In those cases the only thing that needs to change is the media entries metadata value (which is a hash: index: 0 or similar), but this case makes the creation of a "alter location"/"add frame" end point less appealing.

For now, this is not desired behavior, but the current behavior is workable.

@camallen @zwolf Let me know if you have any further thoughts on this issue, but for now I'll plan for this to remain as-is.

@camallen
Copy link
Contributor

camallen commented Apr 27, 2022

After reading the background issues and the Python client and API source it appears like the API is behaving correctly as it respects the payload URL format and coverts to using external media resource links. I'd note that the API implementation is limited as we never envisioned the re-use media resource use case.

What I'm not sure about is the client choosing the implementation of sending the existing Media URLs, I think this was chosen as it's the only information the client has to 're-use' the existing media uploaded media resources.

What I think we really wanted available for the client is the ability to send exiting location media resource IDs combined as well as the ability to send remote URLs and or new media resource files, however that doesn't exist in the API and the location media IDs aren't serialized on the subject resource.

example API payload structure to allow reusing existing media IDs

{
  subjects: {
    locations: [ 
      '12',  # <- existing media resource ID
      'image/jpeg', # <- new mime content for creating new media resource
      { 'image/jpeg': 'https://example.com/file.jpeg' } # <- remote URL
    ]
  }
}

example client for re-using existing media resource IDs

# reuse an existing subject location media resource 
subject.add_location(3)
# add a local file
subject.add_location('my_file.jpeg')
# add an external link
subject.add_location({ 'image/jpeg': 'https://example.com/file.jpeg' })

However we don't serialize the media resource IDs in the subject response, rather we only serialize the resource media locations mime type and URL. Any change to this forma may be a breaking one and needs to be researched to ensure we don't break existing code or we add a new optional field to serialize this information to avoid breaking the existing behaviours.

# current subject locations response object
"locations": [
  { "image/jpeg": "https://panoptes-uploads.zooniverse.org/subject_location/6f76602f-603b-4106-ba50-2003181daedb.jpeg" }
]

# required subject locations response object (possible breaking change to the API response object format!)
"locations": [
  { 
    "media_id": 3, 
    "image/jpeg": "https://panoptes-uploads.zooniverse.org/subject_location/6f76602f-603b-4106-ba50-2003181daedb.jpeg" 
  }
]

TBH - i'm not really phased if the current implementation stands and the client uses the "convert to remote links" with the orphan existing media resources side effect. Due to the limited use of this feature, it may be easier to punt any work on changing to the API to allow re-using existing media resources.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants