-
Notifications
You must be signed in to change notification settings - Fork 225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
resize2fs corrupts extents on ext4 filesystem (offline shrinking) #146
Comments
The ext4 image is available at https://github.com/viavi-ab/e2fsprogs/blob/master/ab_test/sanitized_e2.img.zst (~90MB). The script that reproduces the bug is sanitized_reproducer (requires unzstd to extract the raw image; uncompressed image is a 1.8TB sparse file). Attached are the output of debugfs |
Hi, I believe that this is connected to #145, I'm not a maintainer and don't have working solution, but maybe you could take a look at it. |
Hi @antmat, I think I've found the bug and am putting together a patch. |
Hi! Unfortunately I'm not a maintainer of this project - I'm just experiencing related issue. I believe @tytso could help. Maybe the right way is to apply patch on kernel mailing list - I don't know ¯_(ツ)_/¯ |
This patch modifies the logic of ext2fs_extent_set_bmap() so that: - Handling of "only block in extent" special case is removed - This affects files during a shrink with resize2fs; for example, when a file has an extent of 100 blocks and all 100 are relocated, the current (unpatched) code does this: - (ref.: inode_scan_and_fix() -> ext2fs_block_iterate3() -> process_block() which remaps each block and returns BLOCK_CHANGED to ext2fs_block_iterate3(), which then calls ext2fs_extent_set_bmap() to update the extent tree). - A new extent is inserted for the same lblk (because the pblk has moved), and the original extent is pushed back/shrunk by 1 block (start pblk is incremented by 1, e_len reduced by 1). - All other blocks in the extent (except the last one) are appended to the newly-inserted extent, and the original extent is pushed back/shrunk accordingly (one block at a time). - When it's time to process the last remaining block, the code notices an extent of e_len = 1 (the last one remaining with the old pblk) and "optimizes" this by rewriting it in place (instead of appending it to the new extent). - The end result is that *every* file that has an extent with e_len > 1 is *always* split into two extents (the second one consisting of a single block). - I think the intention here was to make the code more efficient by making the update in place, but the way resize2fs calls this function practically causes the number of extents to double up during a shrink; in my testcase a large file with ~8300 extents before the resize ends up with ~8200 after resize (compared with ~15k extents with the unpatched code). - More preference is given to growing the previous extent's end, by swapping two if blocks - This helps compacting the extents during resize2fs (swapping the blocks is necessary because of the removal above). - When an existing extent is shrunk and its e_len becomes zero, it is deleted. - I am not clear if an extent with length == 0 is valid; judging from https://www.mail-archive.com/[email protected]/msg20364.html, probably not. - Without this change, "extent.e_len += (extent.e_lblk - start)" in ext2fs_extent_fix_parents() may overflow and throw the end of the extent way outside the size of the filesystem. Check with the e2 image referenced in this bug. In summary, this patch fixes tytso#146 and optimizes the allocation of extents after a shrinking resize2fs. Signed-off-by: Andrea Biardi <[email protected]>
Apologies @antmat , I thought you were involved in the project. I emailed @tytso directly. |
Hello maintainers, @tytso, is there any update on this bug? it's been open for over a year, there's a reproducer and a patch, is there a chance to see the fix in the next e2fsprogs release? |
I have come across a situation where resize2fs corrupts an ext4 filesytem and I can reproduce it reliably.
I have captured an image file with "e2image -r" prior to the resize and have put together a script that reproduces the bug on 1.47.0 (I can't seem to upload the large file to github, so I have cloned this repo and committed my test files at https://github.com/viavi-ab/e2fsprogs/tree/master/ab_test).
The ext4 image that I'm resizing is from a working server that, prior to the resize, had no filesystem problems.
This is an offline resize (shrink) of a large (2TB) filesystem containing a single (very large) sparse file scattered across ~8000 extents (the original image was sanitized, all other files removed, and then zerofree'ed for better compression and to hide data that I'm not allowed to share).
My script goes like this:
The filesystem passes the first e2fsck, resize2fs reports no problems, yet the second e2fsck complains (amongst other things) about:
Bug does not depend on the presence of other files or the amount of free space in my image, but seems to be related to the particular dispersion of extents of that particular file.
I think what is happening is that as resize2fs is relocating data blocks, it somehow trips on some of the extents resulting in the situation shown above (note how the two logical blocks are duplicated; also, the large physical block numbers were once within the limits of the filesystem, but after resize they have become invalid).
I may be wrong, but looking at the extents allocated to the file, it seems like it has two particular interior extents (level 1) that, after resize, end up with no leaf extents (level 2).
Bug was initially found on e2fsprogs 1.45.6 (stock ubuntu 22 package) but can be reproduced on fresh binaries compiled from the latest e2fsprogs source (1.47.0).
Rearranging the extents of the file inside the image seems to work around the problem; for example, any of these will allow resize2fs to do its work without corrupting the filesystem:
If the bug is in the way resize2fs handles the extents, however, neither of these completely rules out the possibility of corrupting the filesystem anyway.
The text was updated successfully, but these errors were encountered: