* e2fsck bug? - Sparse files corrupt after "e2fsck -E bmap2extent".
@ 2017-05-22 23:20 Marc Thomas
  2017-05-23 15:41 ` Theodore Ts'o
  0 siblings, 1 reply; 5+ messages in thread
From: Marc Thomas @ 2017-05-22 23:20 UTC (permalink / raw)
  To: linux-ext4

Hi All,

I hope this is the correct place for ext4 / e2fsprogs bug reports. I
think this is a bug.

As per the title, I've discovered sparse files appear corrupt after
running "e2fsck -E bmap2extent" on a filesystem I'm migrating.
Tested with kernel.org 4.10.15 and 4.11.1, using e2fsprogs-1.43.4 on an
x86_64 system.

In short, I'm carrying out a data migration from a snapshot of an ext3
filesystem to new storage, then expanding the filesystem, converting (in
place) to ext4 (as per
https://ext4.wiki.kernel.org/index.php/UpgradeToExt4), before
de-fragmenting the new filesystem with e4defrag.

As part of the testing, I created md5sums of all the source files. After
a "dry run" of the migration, a minority of files (approx. 30 out of
1095578 inodes) were found to be corrupt (i.e. the md5sum had changed).

I repeated the process a second time, checking the md5sums after each
step of the migration. It appears the corruption occurs after running
"e2fsck -E bmap2extent -fy" on the newly converted ext4 filesystem.
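
A per-step checksum comparison like the one described above could be
scripted roughly as follows. This is only a sketch: the function names
make_manifest and check_manifest are hypothetical (they do not come from
this thread), and it assumes GNU findutils and coreutils. Keep the
manifest outside the tree being checksummed so it does not checksum
itself.

```shell
#!/bin/bash
# Hypothetical helpers for verifying file contents after each migration step.

# make_manifest ROOT MANIFEST
# Record an md5sum for every regular file under ROOT, using paths relative
# to ROOT so the manifest stays valid if the filesystem is mounted elsewhere.
make_manifest() {
    local root=$1 manifest=$2
    ( cd "$root" && find . -type f -print0 | sort -z | xargs -0 -r md5sum -b ) > "$manifest"
}

# check_manifest ROOT MANIFEST
# Re-read every file listed in MANIFEST. With --quiet only mismatches are
# printed; the exit status is non-zero if any file changed.
check_manifest() {
    local root=$1 manifest=$2
    ( cd "$root" && md5sum -c --quiet - ) < "$manifest"
}
```

Running make_manifest once on the source and check_manifest after each
step (copy, resize, tune2fs, e2fsck, e4defrag) narrows a mismatch down to
the step that introduced it.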

The one thing I can find in common is all the affected files are sparse
files.

Here are a few example files:

linux-2.6.30/arch/x86/kernel/acpi/realmode/wakeup.bin: FAILED
linux-2.6.30/arch/x86/boot/compressed/vmlinux.bin: FAILED
marc.old/.mozilla/firefox/ecyfs7l9.default/Cache/_CACHE_003_: FAILED
Linux4.x/linux-4.10.12/arch/x86/realmode/rm/realmode.bin: FAILED
Linux4.x/linux-4.10.1/arch/x86/realmode/rm/realmode.bin: FAILED

They are all definitely sparse files, e.g.:

$ ls -l Linux4.x/linux-4.10.1/arch/x86/realmode/rm/realmode.bin
-rwxr-xr-x 1 marc users 21080 Mar  8 12:03 Linux4.x/linux-4.10.1/arch/x86/realmode/rm/realmode.bin

$ du -k Linux4.x/linux-4.10.1/arch/x86/realmode/rm/realmode.bin
16      Linux4.x/linux-4.10.1/arch/x86/realmode/rm/realmode.bin

$ du --apparent -k Linux4.x/linux-4.10.1/arch/x86/realmode/rm/realmode.bin
21      Linux4.x/linux-4.10.1/arch/x86/realmode/rm/realmode.bin
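
The same ls/du comparison can be automated. The sketch below is an
assumption-laden illustration, not something from this thread: is_sparse
is a hypothetical helper name, and it relies on GNU stat's %s (apparent
size), %b (allocated blocks) and %B (bytes per reported block). Note that
filesystems with inline data or compression can also report fewer
allocated bytes than the apparent size, so treat a match as a hint only.

```shell
#!/bin/bash
# is_sparse FILE
# Succeeds when FILE occupies fewer bytes on disk than its apparent size,
# which is the same check as comparing "du -k" with "du --apparent -k".
is_sparse() {
    local size blocks bsize
    read -r size blocks bsize < <(stat -c '%s %b %B' -- "$1")
    (( blocks * bsize < size ))
}
```

For the realmode.bin example above, 16 KiB allocated against an apparent
size of 21080 bytes satisfies the test. To list candidates in a tree:

    find . -type f -print0 | while IFS= read -r -d '' f; do
        is_sparse "$f" && printf '%s\n' "$f"
    done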


Would you agree this is a bug? As I understand it, reading from an
"unpopulated" region of a sparse file should return all zeros - so the
md5sum should be the same before and after migration.
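
That expectation is easy to check on any filesystem. The following is a
minimal sketch, assuming GNU dd, cmp and cp; the function name
demo_hole_reads_zero and the file names are made up for illustration.

```shell
#!/bin/bash
# demo_hole_reads_zero DIR
# Build a file with data in block 0 and block 32 (4k blocks), leaving the
# blocks in between unwritten, then confirm the unwritten range reads back
# as zeros and that a fully allocated copy has the same checksum.
demo_hole_reads_zero() {
    local dir=$1
    printf 'data\n' > "$dir/src"
    dd if="$dir/src" of="$dir/sparse" bs=4k status=none
    dd if="$dir/src" of="$dir/sparse" bs=4k seek=32 conv=notrunc status=none
    # Blocks 1..31 were never written: compare them byte for byte with /dev/zero.
    cmp -n $((31 * 4096)) -i 4096:0 "$dir/sparse" /dev/zero || return 1
    # A dense copy holds the same bytes, so the checksum must not change.
    cp --sparse=never "$dir/sparse" "$dir/dense"
    [ "$(md5sum < "$dir/sparse")" = "$(md5sum < "$dir/dense")" ]
}
```

A checksum depends only on the bytes returned by read(), not on which
blocks are allocated, so any change in md5sum means the file's readable
contents changed.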

I'm happy to provide more details and do testing if required. I have an
offline copy of the source filesystem so can run the migration again.
Let me know if there's anything you need.

Thanks & Kind Regards,
Marc


* Re: e2fsck bug? - Sparse files corrupt after "e2fsck -E bmap2extent".
  2017-05-22 23:20 e2fsck bug? - Sparse files corrupt after "e2fsck -E bmap2extent" Marc Thomas
@ 2017-05-23 15:41 ` Theodore Ts'o
  2017-05-23 16:25   ` Darrick J. Wong
  2017-05-23 17:41   ` Marc Thomas
  0 siblings, 2 replies; 5+ messages in thread
From: Theodore Ts'o @ 2017-05-23 15:41 UTC (permalink / raw)
  To: Marc Thomas; +Cc: linux-ext4

On Tue, May 23, 2017 at 12:20:59AM +0100, Marc Thomas wrote:
> Hi All,
> 
> Would you agree this is a bug? As I understand it, reading from an
> "unpopulated" region of a sparse file should return all zeros - so the
> md5sum should be the same before and after migration.

I agree that sparse files should be properly handled after bmap2extent
conversion.  This code hasn't received that much use or testing, but
I've tried to replicate the problem and I haven't succeeded.  This
script works for me:

----- cut here -----
#!/bin/bash

rm -f /tmp/foo.img
mke2fs -t ext3 -Fq /tmp/foo.img 65536
mount -o loop /tmp/foo.img /mnt
dd if=/etc/motd of=/mnt/test bs=4k >& /dev/null
dd if=/etc/motd of=/mnt/test conv=notrunc seek=32 bs=4k >& /dev/null
dd if=/etc/motd of=/mnt/test2 bs=4k >& /dev/null
dd if=/etc/motd of=/mnt/test2 conv=notrunc seek=1024 bs=4k >& /dev/null
lsattr /mnt/test*
md5sum /mnt/test* > /mnt/MD5SUMS
umount /mnt
e2fsck -fy -E bmap2extent /tmp/foo.img
mount -o loop /tmp/foo.img /mnt
md5sum -c /mnt/MD5SUMS
lsattr /mnt/test*
umount /mnt
----- cut here -----

Maybe you can come up with a simple repro case that fails for you?

Thanks,

						- Ted
						


* Re: e2fsck bug? - Sparse files corrupt after "e2fsck -E bmap2extent".
  2017-05-23 15:41 ` Theodore Ts'o
@ 2017-05-23 16:25   ` Darrick J. Wong
  2017-05-23 17:41   ` Marc Thomas
  1 sibling, 0 replies; 5+ messages in thread
From: Darrick J. Wong @ 2017-05-23 16:25 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Marc Thomas, linux-ext4

On Tue, May 23, 2017 at 11:41:52AM -0400, Theodore Ts'o wrote:
> On Tue, May 23, 2017 at 12:20:59AM +0100, Marc Thomas wrote:
> > Hi All,
> > 
> > Would you agree this is a bug? As I understand it, reading from an
> > "unpopulated" region of a sparse file should return all zeros - so the
> > md5sum should be the same before and after migration.
> 
> I agree that sparse files should be properly handled after bmap2extent
> conversion.  This code hasn't received that much use or testing, but
> I've tried to replicate the problem and I haven't succeeded.  This
> script works for me:
> 
> ----- cut here -----
> #!/bin/bash
> 
> rm -f /tmp/foo.img
> mke2fs -t ext3 -Fq /tmp/foo.img 65536
> mount -o loop /tmp/foo.img /mnt
> dd if=/etc/motd of=/mnt/test bs=4k >& /dev/null
> dd if=/etc/motd of=/mnt/test conv=notrunc seek=32 bs=4k >& /dev/null
> dd if=/etc/motd of=/mnt/test2 bs=4k >& /dev/null
> dd if=/etc/motd of=/mnt/test2 conv=notrunc seek=1024 bs=4k >& /dev/null
> lsattr /mnt/test*
> md5sum /mnt/test* > /mnt/MD5SUMS
> umount /mnt
> e2fsck -fy -E bmap2extent /tmp/foo.img
> mount -o loop /tmp/foo.img /mnt
> md5sum -c /mnt/MD5SUMS
> lsattr /mnt/test*
> umount /mnt
> ----- cut here -----
> 
> Maybe you can come up with a simple repro case that fails for you?

Or take an e2image of one of the affected filesystems so that the
developers can directly reproduce your error case. :)

e2image -Q /dev/sda1 /some/qcow/dump/file

(Preferably xz the dumpfile before uploading...)

--D



* Re: e2fsck bug? - Sparse files corrupt after "e2fsck -E bmap2extent".
  2017-05-23 15:41 ` Theodore Ts'o
  2017-05-23 16:25   ` Darrick J. Wong
@ 2017-05-23 17:41   ` Marc Thomas
  2017-05-23 18:42     ` Darrick J. Wong
  1 sibling, 1 reply; 5+ messages in thread
From: Marc Thomas @ 2017-05-23 17:41 UTC (permalink / raw)
  To: linux-ext4; +Cc: Theodore Ts'o

Hi Ted,

Firstly, thank you for taking the time to look into this.

I have modified your script into a test case which fails for me every
time. I noticed that a Linux kernel build creates a file which exhibits
the problem (arch/x86/realmode/rm/realmode.bin). I think any kernel
version 3.10.x on x86_64 ought to work. I've also included a few extra
steps which I'm doing as part of the ext3 to ext4 conversion, and added
the "-b" (binary) flag to md5sum.

Here's the script. I'm not sure if it counts as a "simple" repro! :)


----- cut here -----
#!/bin/bash

# Create a fresh 5G ext3 image and mount it.
rm -f /tmp/foo.img
mke2fs -t ext3 -Fq /tmp/foo.img 5G
mount -o loop /tmp/foo.img /mnt
# Ted's sparse test files: data in block 0 plus one block at a later offset.
dd if=/etc/motd of=/mnt/test bs=4k >& /dev/null
dd if=/etc/motd of=/mnt/test conv=notrunc seek=32 bs=4k >& /dev/null
dd if=/etc/motd of=/mnt/test2 bs=4k >& /dev/null
dd if=/etc/motd of=/mnt/test2 conv=notrunc seek=1024 bs=4k >& /dev/null
md5sum -b /mnt/test* > /mnt/MD5SUMS
lsattr /mnt/test*
# Build a kernel on the filesystem; the build creates the sparse realmode.bin.
( cd /mnt
xz -dc ~marc/LinuxStuff/Linux4.x/linux-4.11.1.tar.xz | tar xvf -
cd linux-4.11.1
make defconfig
time make -j 15 ) > /dev/null 2>&1
md5sum -b /mnt/linux-4.11.1/arch/x86/realmode/rm/realmode.bin >> /mnt/MD5SUMS
lsattr /mnt/linux-4.11.1/arch/x86/realmode/rm/realmode.bin
umount /mnt
# Enable the ext4 features, fsck, then convert block-mapped files to extents.
tune2fs -O extents,uninit_bg,dir_index /tmp/foo.img
e2fsck -fy /tmp/foo.img
e2fsck -fy -E bmap2extent /tmp/foo.img
# Remount and verify the checksums recorded before the conversion.
mount -o loop /tmp/foo.img /mnt
md5sum -c /mnt/MD5SUMS
lsattr /mnt/test*
lsattr /mnt/linux-4.11.1/arch/x86/realmode/rm/realmode.bin
umount /mnt
----- cut here -----


The output of the script is:

time ./marc2.sh
------------------- /mnt/test
------------------- /mnt/test2
------------------- /mnt/linux-4.11.1/arch/x86/realmode/rm/realmode.bin
tune2fs 1.43.4 (31-Jan-2017)
e2fsck 1.43.4 (31-Jan-2017)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/tmp/foo.img: 71184/327680 files (6.4% non-contiguous), 338237/1310720 blocks
e2fsck 1.43.4 (31-Jan-2017)
Pass 1: Checking inodes, blocks, and sizes
Pass 1E: Optimizing extent trees
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/tmp/foo.img: ***** FILE SYSTEM WAS MODIFIED *****
/tmp/foo.img: 71184/327680 files (6.4% non-contiguous), 334994/1310720 blocks
/mnt/test: OK
/mnt/test2: OK
/mnt/linux-4.11.1/arch/x86/realmode/rm/realmode.bin: FAILED
md5sum: WARNING: 1 computed checksum did NOT match
--------------e---- /mnt/test
--------------e---- /mnt/test2
--------------e---- /mnt/linux-4.11.1/arch/x86/realmode/rm/realmode.bin

real    1m46.829s
user    17m41.611s
sys     1m39.268s

Thanks & Kind Regards,
Marc


* Re: e2fsck bug? - Sparse files corrupt after "e2fsck -E bmap2extent".
  2017-05-23 17:41   ` Marc Thomas
@ 2017-05-23 18:42     ` Darrick J. Wong
  0 siblings, 0 replies; 5+ messages in thread
From: Darrick J. Wong @ 2017-05-23 18:42 UTC (permalink / raw)
  To: Marc Thomas; +Cc: linux-ext4, Theodore Ts'o

On Tue, May 23, 2017 at 06:41:48PM +0100, Marc Thomas wrote:
> Hi Ted,
> 
> Firstly, thank-you for taking the time to look into this.
> 
> I have modified your script with a "test-case" which fails for me every
> time, as I noticed there is a file created by a Linux kernel build which
> exhibits the problem (arch/x86/realmode/rm/realmode.bin) I think any
> kernel version 3.10.x on x86_64 ought to work. I've also included a few
> extra steps which I'm doing as part of the ext3 to ext4 conversion, and
> added the "-b" (binary) flag to md5sum.
> 
> Here's the script. I'm not sure if it counts as a "simple" repro! :)

Aha!  There is a bug in e2fsck's bmap2extent converter that merges bmap
records that are physically but not logically contiguous.  I will
produce a patch + testcase shortly.

--D
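
To illustrate the merge condition described above: two block mappings may
only be folded into one extent when they are adjacent both logically
(file offset) and physically (on disk). The sketch below is NOT e2fsck's
code; merge_mappings is a hypothetical helper and the block numbers in
the example are invented purely to show the failure mode.

```shell
#!/bin/bash
# merge_mappings
# Reads "logical physical" block pairs, sorted by logical block, on stdin
# and prints one "logical physical length" line per resulting extent.
merge_mappings() {
    local lblk pblk start_l=0 start_p=0 len=0
    while read -r lblk pblk; do
        # Extend the current extent only if BOTH block numbers continue it.
        if (( len > 0 && lblk == start_l + len && pblk == start_p + len )); then
            len=$((len + 1))
        else
            if (( len > 0 )); then echo "$start_l $start_p $len"; fi
            start_l=$lblk start_p=$pblk len=1
        fi
    done
    if (( len > 0 )); then echo "$start_l $start_p $len"; fi
}
```

For a sparse file with logical blocks 0, 1, 3 and 5 stored in physical
blocks 100 to 103:

    printf '0 100\n1 101\n3 102\n5 103\n' | merge_mappings

prints "0 100 2", "3 102 1" and "5 103 1". A merge that tested only
physical contiguity would instead emit a single extent "0 100 4", so
reading logical block 2 (a hole) would return the data in physical block
102 rather than zeros, which would change the file's checksum.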


