[ BUG ] Incorrect division into files when the limit is exceeded #327
Comments
It looks like this issue is If so, I nearly have a PR for enforcing size limits. I still need to write unit tests, modify the spillover logic in the simple sitemap class, and no-op the size calculation when the limit is not enabled. The calculation has a slight performance impact: the strings of each item must be UTF-8 encoded to measure the item's size in bytes before it is written to the map, so that the write can be rejected if it would put the map over the size limit.
@huntharo that would be awesome
@derduher - What are your thoughts on how to handle the overflow? The most precise way to implement this is to set the byte limit to exactly 50 MB, compute the size of each item when writing it (done), and reject the write if that specific item would cause an overflow. Ideally the caller would catch that rejection, close the file without the item, open a new file, and write the item again to the new file. However, it seems that throwing an exception from SitemapStream.write, or passing an error to the callback in SitemapStream.write, puts the Transform into a state where it cannot be cleanly closed and flushed. Before digging deeply to figure out whether this throw-then-close approach can work... there are other options:
I do not love either of these, but I'm not sure we can throw an error from a Transform and still leave it in a completely usable state.
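For reference, the size check itself can be sketched independently of the stream. This is only a rough illustration, not the library's code: `byteLength`, `Limits`, and `splitIntoFiles` are hypothetical names, and the items are assumed to be already serialized to XML strings. Because the check happens before the write, nothing has to be rejected after the fact and no Transform is left in an error state.

```typescript
// Hypothetical helpers for illustration; none of this is the sitemap library's API.
const encoder = new TextEncoder();

/** Size of a string in bytes once it is UTF-8 encoded. */
function byteLength(text: string): number {
  return encoder.encode(text).length;
}

interface Limits {
  maxItems: number; // e.g. 50000 URLs per file
  maxBytes: number; // e.g. 52428800 bytes (50 MB) per file, uncompressed
  overheadBytes: number; // bytes reserved for the XML header and closing tag
}

/**
 * Splits pre-serialized items into files so that no file exceeds either the
 * item-count limit or the byte limit. A new file is started before any item
 * that would overflow the current one.
 */
function splitIntoFiles(items: string[], limits: Limits): string[][] {
  const files: string[][] = [];
  let current: string[] = [];
  let currentBytes = limits.overheadBytes;

  for (const item of items) {
    const itemBytes = byteLength(item);
    if (limits.overheadBytes + itemBytes > limits.maxBytes) {
      throw new Error("A single item exceeds the per-file byte limit");
    }
    const wouldOverflow =
      current.length >= limits.maxItems ||
      currentBytes + itemBytes > limits.maxBytes;
    if (wouldOverflow) {
      // Close the current file without the item and start a new one.
      files.push(current);
      current = [];
      currentBytes = limits.overheadBytes;
    }
    current.push(item);
    currentBytes += itemBytes;
  }
  if (current.length > 0) files.push(current);
  return files;
}
```

The measurement is done on the uncompressed UTF-8 bytes, so the same check would apply whether or not the output is gzipped afterwards.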
The An example of how to use that to rotate files if an item would cause the size limit to be exceeded (this works even when writing to gzipped files, where it is otherwise hard to prevent the last write that would cause an overflow):
For example, I set a limit of 50,000 links, as stated in the documentation (https://www.sitemaps.org/protocol.html). The sitemap is split into files by the number of links as expected, but my project has a lot of locales, and the map looks like this:
As a result, after splitting, each file is larger than 50 MB. It seems that the limit also needs to take the size of the generated file into account.
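To illustrate why the link count alone is not enough, here is a rough estimate. The entry format is an assumption (one `xhtml:link` alternate per locale, with a made-up `?lang=` URL scheme), since the actual map is not shown above, and `urlEntry` and `maxEntriesWithinByteLimit` are hypothetical names. With 50,000 entries in a 50 MB file, each entry can average only about 1 KB, which a handful of locale alternates quickly exceeds.

```typescript
// Limits from the sitemaps.org protocol: 50,000 URLs and 50 MB uncompressed.
const SITEMAP_MAX_URLS = 50000;
const SITEMAP_MAX_BYTES = 50 * 1024 * 1024; // 52,428,800 bytes

const utf8 = new TextEncoder();

/** Builds one <url> entry with an alternate link per locale (assumed format). */
function urlEntry(loc: string, locales: string[]): string {
  const alternates = locales
    .map(
      (locale) =>
        `<xhtml:link rel="alternate" hreflang="${locale}" href="${loc}?lang=${locale}"/>`
    )
    .join("");
  return `<url><loc>${loc}</loc>${alternates}</url>`;
}

/** How many copies of an entry of this size fit under the byte limit. */
function maxEntriesWithinByteLimit(entry: string, maxBytes: number): number {
  const entryBytes = utf8.encode(entry).length;
  return Math.floor(maxBytes / entryBytes);
}

// Hypothetical project with 30 two-letter locale codes ("aa", "ab", ...).
const locales = Array.from({ length: 30 }, (_, i) =>
  "a" + String.fromCharCode(97 + (i % 26))
);
const entry = urlEntry("https://example.com/page", locales);
const fits = maxEntriesWithinByteLimit(entry, SITEMAP_MAX_BYTES);

// If `fits` is below SITEMAP_MAX_URLS, a file holding 50,000 such entries
// exceeds 50 MB even though the link-count limit is respected.
console.log(fits < SITEMAP_MAX_URLS);
```

This ignores the XML header and any other per-entry fields, so real files would hit the byte limit even sooner.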