Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-work image resizing #2101

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Re-work image resizing #2101

wants to merge 6 commits into from

Conversation

audiodude
Copy link
Member

@audiodude audiodude commented Nov 14, 2024

Re-work image sizing algorithm. It now enforces a maximum image width of 320px.

This value is chosen because it is a reasonable size on both mobile and desktop, but saves bandwidth compared to "full size" images (which was what the code might have been grabbing from data-data-original-file-src before). This should give us a nice tradeoff between efficiency and quality. We can always adjust this value later, of course, but the logic for retrieving and resizing the image URLs remains the same.

Of note, it takes the smallest of:

  • The src attribute set on the <span> itself that will become the image
  • The data-data-original-file-src attribute, of the original size the image was in the article
  • 320px, provided it can look at one of the previous two URLs and find a URL that is hackable to set the image size.

Tests have been added/adjusted.

EDIT:

Additionally, based on feedback in this thread, I have modified the algorithm to refrain from scaling any dimension smaller than 320px. So for large panorama images such as in the enwiki Paris article, the image does not get destroyed because it is so wide (because scaling to a width of 320px leaves a height of 50px)

… of 320px, assuming it can use URL hacking to generate the proper URL to download
@audiodude
Copy link
Member Author

Fixes #1925. Fixes #2071.

@audiodude
Copy link
Member Author

With this change, we see the following:

14272	output/wikipedia_bm_all_maxi_2024-11.113.zim
16536	output/wikipedia_bm_all_maxi_2024-11.114.zim

Scraping of BM with these settings results in a ZIM that is about 2 MB, or 15% bigger.

@audiodude
Copy link
Member Author

With image max size set to 280px, we save another half megabyte. Compared to the 1.13 ZIM, the size increases by about 11%.

14272	wikipedia_bm_all_maxi_2024-11.113.zim
15880	wikipedia_bm_all_maxi_2024-11.114-280.zim
16536	wikipedia_bm_all_maxi_2024-11.114.zim

@audiodude
Copy link
Member Author

audiodude commented Nov 14, 2024

By the way, I spent way too long looking through the repo here and even posted a phabricator ticket, but my intuition is that they were just setting max sizes and we should do the same.

@Jaifroid
Copy link
Collaborator

Just checking, the current max appears to be 264px (looking at full English Wikipedia). I quite like the larger 320px, and it is something users have been requesting... I suppose this needs weighing up carefully. 15% on ~100GB doesn't seem like too high a price to pay to me...

@audiodude
Copy link
Member Author

Just checking, the current max appears to be 264px (looking at full English Wikipedia). I quite like the larger 320px, and it is something users have been requesting... I suppose this needs weighing up carefully. 15% on ~100GB doesn't seem like too high a price to pay to me...

How did you find that value? Just by finding some 264 width images and assuming there's nothing bigger?

Copy link

codecov bot commented Nov 14, 2024

Codecov Report

Attention: Patch coverage is 93.02326% with 3 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@7d95c2b). Learn more about missing BASE report.
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/renderers/wikimedia-mobile.renderer.ts 93.02% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2101   +/-   ##
=======================================
  Coverage        ?   74.71%           
=======================================
  Files           ?       41           
  Lines           ?     3188           
  Branches        ?      703           
=======================================
  Hits            ?     2382           
  Misses          ?      686           
  Partials        ?      120           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@Jaifroid
Copy link
Collaborator

How did you find that value? Just by finding some 264 width images and assuming there's nothing bigger?

Oh sorry, I did that a bit fast, and having checked, I've found some at 280px down the right-hand side, so no, no guarantee that that's the max, more an average! There are some much wider images centred in some pages and we'll have to be careful not to clobber those. For example here in the "Paris" article:

image

@audiodude
Copy link
Member Author

Okay, when investigating the Paris panorama, I found a bug, which I fixed and added a test for above. Now the sizes look like this:

14272	output/wikipedia_bm_all_maxi_2024-11.113.zim
14532	output/wikipedia_bm_all_maxi_2024-11.114-280.zim
15364	output/wikipedia_bm_all_maxi_2024-11.114-320.zim

That's a 1.8% increase for 280px width images, and a 7.65% increase for 320px width images. These numbers have gotten smaller because we're catching more images and squishing them to our max size.

@audiodude
Copy link
Member Author

And of course, as you predicted @Jaifroid, we get this:

image

Looks like we need to add logic to only scale in the dominant dimension.

@audiodude
Copy link
Member Author

Okay, added logic for scaling the smallest dimension to 320px, while keeping the aspect ratio.

This, unsurprisingly, gives us the biggest ZIM yet (.114A-320 is the current build)

14272	wikipedia_bm_all_maxi_2024-11.113.zim
14532	wikipedia_bm_all_maxi_2024-11.114-280.zim
15364	wikipedia_bm_all_maxi_2024-11.114-320.zim
15640	wikipedia_bm_all_maxi_2024-11.114A-320.zim

And

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants