URLs for attachments are stored with expiry token at end of URL #252

Open
elrayle opened this issue Mar 10, 2020 · 3 comments

elrayle commented Mar 10, 2020

Background

URLs are being saved after attachment upload in the form...

https://_BUCKET_URL_/uploads/spotlight/attachment/file/_FILEID_/_FILENAME_?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIOIH5APZJOF4GQNA%2F20200309%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200309T205909Z&X-Amz-Expires=900&X-Amz-SignedHeaders=host&X-Amz-Signature=730558b83e40924a38a2b48f97e70e6885b705cd17c27c1593698eabc0dd0063

They should be stored without the expiry token in the form...

https://_BUCKET_URL_/uploads/spotlight/attachment/file/_FILEID_/_FILENAME_

With the expiry token removed, access is determined by the ACL associated with the file.
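As a minimal sketch of the transformation described above, the following Ruby helpers detect and strip the token. They are hypothetical helpers, not Spotlight or PR #251 code. They assume an S3 Signature Version 4 presigned URL can be recognized by `X-Amz-Algorithm=` in its query string. The sample bucket host, file id, and filename are made up.

```ruby
# Hypothetical helpers (not Spotlight code): detect and strip the AWS
# Signature Version 4 query string that S3 appends to presigned URLs.
SIGV4_MARKER = 'X-Amz-Algorithm='

# True when everything after the first '?' contains the SigV4 marker.
def expiry_token?(url)
  _base, query = url.split('?', 2)
  !query.nil? && query.include?(SIGV4_MARKER)
end

# Drops everything from the '?' to the end of the URL, but only for presigned
# URLs; any other URL is returned unchanged.
def strip_expiry_token(url)
  expiry_token?(url) ? url.split('?', 2).first : url
end

signed = 'https://example-bucket.s3.amazonaws.com/uploads/spotlight/attachment/file/42/photo.jpg' \
         '?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=900&X-Amz-Signature=abc123'

puts expiry_token?(signed)
puts strip_expiry_token(signed)
```

This sketch changes only URLs whose query string carries the signature, so an unrelated query such as `?v=2` would survive. The actual fix in Spotlight may take a different approach.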

Existing Patch

PR #251 created a patch to remove everything from the question mark to the end of the URL. This fixes the display of the images, but the image URLs are still stored in the database with the expiry token.

To Reproduce

  • edit a page
  • click + to add a widget
  • select Uploaded Item Row
  • add an image
  • save the page
  • right-click the image on the page and open it in a new tab
  • click on the URL in the address field and see whether it has the expiry token appended

NOTE: The value stored in the spotlight_attachments table includes only the FILENAME. The full URL is stored in the content field of the page that uses the attachment, which is in the spotlight_pages table.

Acceptance Requirements

This issue will be fixed when the expiry tokens are no longer saved on the attachment URL. Once fixed, any attachments that already have the expiry token saved as part of the URL will need to be manually adjusted to remove the expiry token.

Manual Repair

You can find all pages containing a URL with the expiry token by searching via MySQL with...

select id, type, title, slug, content from spotlight_pages where content like '%X-Amz-Algorithm%';

You can copy the content to an editor and remove everything from the ? to the end of each affected URL. To set the corrected content with an SQL UPDATE statement, you will also have to escape the following characters:

  • replace ' with ''
  • replace \ with \\
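The manual steps above could also be scripted. This sketch rests on several assumptions. The helper names are hypothetical. It assumes each presigned query string ends at the next double quote or whitespace in the stored content, as it would inside a JSON string. It relies on MySQL's default treatment of backslash escapes in single-quoted string literals. The page id and content are made up.

```ruby
# Hypothetical helpers for the manual repair; not part of Spotlight.
# Assumes a presigned query string ends at the next double quote or
# whitespace, which holds when the URL sits inside a JSON string.
PRESIGNED_QUERY = /\?[^"\s]*X-Amz-Algorithm=[^"\s]*/

# Removes every presigned query string from a page's raw content text.
def strip_expiry_tokens(content)
  content.gsub(PRESIGNED_QUERY, '')
end

# Quotes a value as a single-quoted MySQL string literal, replacing \ with \\
# and ' with ''. The block form of gsub keeps the replacement literal
# (a replacement *string* would treat backslashes as escape sequences).
def mysql_quote(value)
  escaped = value.gsub('\\') { '\\\\' }.gsub("'") { "''" }
  "'#{escaped}'"
end

# Builds the UPDATE for one page; Integer() raises on strings that are not
# integers, so the id cannot smuggle in extra SQL.
def repair_statement(page_id, content)
  cleaned = mysql_quote(strip_expiry_tokens(content))
  "UPDATE spotlight_pages SET content = #{cleaned} WHERE id = #{Integer(page_id)};"
end

# Made-up content; '\u0026' stays literal here, as a JSON encoder may escape '&'.
raw = '[{"url":"https://b.example/f/1/a.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256\u0026X-Amz-Expires=900","caption":"It\'s here"}]'
puts repair_statement(7, raw)
```

Review each generated statement before running it. Hand-built escaping is error-prone, and a parameterized query from a database client avoids it entirely.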

Related Work

PR #251 patch s3 access to remove expiry token from attachment URLs

elrayle added the bug label Apr 2, 2020
chrisrlc (Collaborator) commented

select count(*) from spotlight_pages where content like '%X-Amz-Algorithm%';
+----------+
| count(*) |
+----------+
|      125 |
+----------+
1 row in set (0.07 sec)

Looks like there are still pages with these URLs. We can certainly do cleanup, but it looks like this might still be occurring. The most recently created page with an attachment upload that matches this format is from earlier this month. Not sure yet what the user-facing impact is.

chrisrlc (Collaborator) commented

Seems like this bug is coming back (or just never left?). From the user's perspective, this doesn't seem to cause issues except when viewing an uploaded image with the "View Larger" button.

I've fixed it for the uploaded images that are using the View Larger button here: https://exhibits.library.cornell.edu/blackprint-WIP/about/behind-the-scenes

Fixed by ssh-ing into the prod server and opening a Rails console:

page = Spotlight::Page.find_by_slug('behind-the-scenes')
page.content
# Found the item in the content array that had the problem data, in this case with index 7
# Loop through all the urls with issues, example:
page.content[7].item['file_0']['url'] = page.content[7].item['file_0']['url'].gsub(/\?.*$/, '')
# Then save when done, the following looks strange but triggers the actual model update:
page.content = page.content
page.save!
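For the remaining pages, the per-index edits above could be generalized by walking the whole content structure. This sketch rests on several assumptions. `strip_presigned_urls!` is a hypothetical helper. It operates on plain Arrays and Hashes, for example `JSON.parse` applied to the raw `content` column. The sample block layout only imitates the `item['file_0']['url']` path seen above; it is not Spotlight's exact schema.

```ruby
require 'json'

# Hypothetical helper (not part of Spotlight): recursively walks parsed page
# content (nested Arrays and Hashes) and strips the presigned query string from
# every "url" value in place. Returns the number of URLs it changed.
def strip_presigned_urls!(node)
  case node
  when Array
    node.sum { |child| strip_presigned_urls!(child) }
  when Hash
    node.sum do |key, value|
      if key == 'url' && value.is_a?(String)
        base, query = value.split('?', 2)
        next 0 unless query&.include?('X-Amz-Algorithm=')

        node[key] = base # updating an existing key while iterating is allowed
        1
      else
        strip_presigned_urls!(value)
      end
    end
  else
    0
  end
end

# Made-up content that imitates the item['file_0']['url'] path from the console session.
content = JSON.parse(<<~JSON)
  [{"type":"uploaded_items","data":{"item":{
    "file_0":{"url":"https://b.example/f/1/a.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=900"},
    "file_1":{"url":"https://b.example/f/2/b.jpg"}}}}]
JSON

changed = strip_presigned_urls!(content)
puts "#{changed} URL(s) cleaned"
```

Writing the cleaned structure back still depends on how Spotlight serializes page content. The reassign-then-`save!` step from the console session above is one way that has worked.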

But there are still 141 pages to go:
select p.id, p.title, p.type, p.slug, e.slug from spotlight_pages p join spotlight_exhibits e on e.id = p.exhibit_id where p.content like '%X-Amz-Algorithm%' order by e.created_at desc;

Also, is this always an issue for uploaded images? It seems like it was fixed previously, so what changed since then? Needs more investigation.
