All of lore.kernel.org
 help / color / mirror / Atom feed
* Trouble mounting metadata_csum ext4 filesystems with v4.7.x after c9274d891869880648c4ee9365df3ecc7ba2e285: not enough inode bytes checksummed?
@ 2016-09-19 14:19 Nix
  2016-09-20  5:52 ` Darrick J. Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Nix @ 2016-09-19 14:19 UTC (permalink / raw)
  To: Linux FS Maling List; +Cc: daeho.jeong, Darrick J. Wong

So I ran into spurious metadata corruption warnings in v4.7.2 due to the
problem fixed by c9274d8. I applied an early version of the fix,
rebooted, and oh dear root filesystem mount failure with invalid
checksum errors.

The problem persists in v4.7.4, as seen here in qemu emulation on a raw
image dd'ed directly from the thing that won't boot, with a couple of
printk()s:

# mount /dev/vda /new-root/
[    8.124692] EXT4-fs (vda): couldn't mount as ext3 due to feature incompatibilities
[    8.126977] EXT4-fs (vda): couldn't mount as ext2 due to feature incompatibilities
[    9.017980] Inode size 256 > good old size 128; fits in inode: 0
[    8.134897] inode 8: provided: 5c50l; calculated: 36e1i
[    8.135098] EXT4-fs error (device vda): ext4_iget:4479: inode #8: comm mount: checksum invalid
[    8.138992] EXT4-fs (vda): no journal found
[    8.165744] UDF-fs: warning (device vda): udf_fill_super: No partition found (2)
mount: mounting /dev/vda on /new-root/ failed: Invalid argument

I added a bit of printking to show the failure of the journal inode
checksum to pass muster. e2fsck (from e2fsprogs 1.43.1-14) is quite
happy with this filesystem. Reverting c9274d8 makes everything happy
again (well, it does bring the original bug back, which is a rather
serious one, but other than that...):

[    9.823032] EXT4-fs (vda): couldn't mount as ext3 due to feature incompatibilities
[    9.824647] EXT4-fs (vda): couldn't mount as ext2 due to feature incompatibilities
[    9.832593] inode 8: provided: 5c50l; calculated: 5c50i
[    9.839253] inode 2: provided: d6ea92e9l; calculated: d6ea92e9i
[    9.846947] EXT4-fs (vda): mounted filesystem with ordered data mode. Opts: (null)

So c9274d8 is clearly messing up the calculation of the checksum.

The problem becomes more evident if we add more printk()s to the old
code, so we can see what region is being checksummed:

# mount /dev/vda /new-root
[    6.827297] inode 8: unadjusted csum of 256 bytes with seed a5df92a7: 449a5c50
[    6.827596] adjusted csum: 5c50
[    6.835993] inode 2: unadjusted csum of 256 bytes with seed 759c6c33: d6ea92e9
[    6.836173] adjusted csum: d6ea92e9
[    6.844801] EXT4-fs (vda): mounted filesystem with ordered data mode. Opts:

and the new:

[   11.098013] inode 8: csum of first 124 bytes with seed a5df92a7: f375b663
[   11.098205] inode 8: added csum of 2 dummy_csum bytes with seed a5df92a7: 20cfebcb
[   11.098420] inode 8: added csum of 2 bytes from offset 126 -- 128 to existing: d79e7432
[   11.098646] inode 8: > GOOD_OLD_INODE_SIZE; added csum of 2 bytes from 128 -- 130 to existing: d10936e1
[   11.098890] 8: adjusted csum: 36e1
[   11.099133] EXT4-fs error (device vda): ext4_iget:4483: inode #8: comm mount: checksum invalid

We are not checksumming enough bytes! We used to checksum the entire
256-byte inode: now, we checksum only 130 bytes of it, which isn't even
enough to cover the 28-byte extra_isize on this filesystem and is more
or less guaranteed to give the wrong answer. I'd fix the problem, but I
frankly can't see how the new code is meant to be equivalent to the old
code in any sense -- most particularly what the stuff around dummy_csum
is meant to do -- so I thought it better to let the people who wrote it
fix it :)

tune2fs output for this filesystem, particularly the extra_isize and
inode size fields are likely relevant:

tune2fs 1.43.1 (08-Jun-2016)
Filesystem volume name:   root
Last mounted on:          /
Filesystem UUID:          6c0f7fa7-d6c2-4054-bff3-3a878460bdc7
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              65536
Block count:              262144
Reserved block count:     13107
Free blocks:              227009
Free inodes:              59499
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Reserved GDT blocks:      63
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
RAID stripe width:        16
Flex block group size:    64
Filesystem created:       Tue May 26 21:28:46 2009
Last mount time:          Sun Sep 18 23:34:41 2016
Last write time:          Mon Sep 19 13:51:59 2016
Mount count:              0
Maximum mount count:      36
Last checked:             Mon Sep 19 13:51:59 2016
Check interval:           15552000 (6 months)
Next check after:         Sat Mar 18 12:51:59 2017
Lifetime writes:          16 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      f1da2da0-057e-4ba0-a021-3d56db5b24ab
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0x92acf115

This is an old, upgraded fs from before the days of checksums, but even
so I'm surprised that with a 256-byte inode and no xattrs in use,
EXT4_FITS_IN_INODE is false. Maybe the extra_isize isn't big enough?

An lzipped e2image of the problematic filesystem is available from
<http://www.esperi.org.uk/~nix/temporary/csum-corruption.img.lz>:
I have verified that the problem recurs with this image.

I can also replicate the problem in literally seconds if you need more
debugging output.


... The mystery is why this isn't going wrong anywhere else: I have
metadata_csum working on every fs on a bunch of other ext4-using
systems, and indeed on every filesystem on this machine as well, as long
as c9274d8 is not applied. Many of them are similarly upgraded pre-csum
fses with the same inode size and extra_isize, but they work...

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Trouble mounting metadata_csum ext4 filesystems with v4.7.x after c9274d891869880648c4ee9365df3ecc7ba2e285: not enough inode bytes checksummed?
  2016-09-19 14:19 Trouble mounting metadata_csum ext4 filesystems with v4.7.x after c9274d891869880648c4ee9365df3ecc7ba2e285: not enough inode bytes checksummed? Nix
@ 2016-09-20  5:52 ` Darrick J. Wong
  2017-01-05 11:09   ` Nick Alcock
  2017-01-15 17:34   ` Nick Alcock
       [not found] ` <CGME20160920055236epcas1p191f8fd252e18f762fa3ad9704f5e77e2@epcas1p1.samsung.com>
  2016-09-24 20:20 ` Nix
  2 siblings, 2 replies; 6+ messages in thread
From: Darrick J. Wong @ 2016-09-20  5:52 UTC (permalink / raw)
  To: Nix, Theodore Ts'o; +Cc: Linux FS Maling List, daeho.jeong, linux-ext4

[cc Ted and the ext4 list]

On Mon, Sep 19, 2016 at 03:19:03PM +0100, Nix wrote:
> So I ran into spurious metadata corruption warnings in v4.7.2 due to the
> problem fixed by c9274d8. I applied an early version of the fix,
> rebooted, and oh dear root filesystem mount failure with invalid
> checksum errors.
> 
> The problem persists in v4.7.4, as seen here in qemu emulation on a raw
> image dd'ed directly from the thing that won't boot, with a couple of
> printk()s:
> 
> # mount /dev/vda /new-root/
> [    8.124692] EXT4-fs (vda): couldn't mount as ext3 due to feature incompatibilities
> [    8.126977] EXT4-fs (vda): couldn't mount as ext2 due to feature incompatibilities
> [    9.017980] Inode size 256 > good old size 128; fits in inode: 0
> [    8.134897] inode 8: provided: 5c50l; calculated: 36e1i
> [    8.135098] EXT4-fs error (device vda): ext4_iget:4479: inode #8: comm mount: checksum invalid
> [    8.138992] EXT4-fs (vda): no journal found
> [    8.165744] UDF-fs: warning (device vda): udf_fill_super: No partition found (2)
> mount: mounting /dev/vda on /new-root/ failed: Invalid argument
> 
> I added a bit of printking to show the failure of the journal inode
> checksum to pass muster. e2fsck (from e2fsprogs 1.43.1-14) is quite
> happy with this filesystem. Reverting c9274d8 makes everything happy
> again (well, it does bring the original bug back, which is a rather
> serious one, but other than that...):
> 
> [    9.823032] EXT4-fs (vda): couldn't mount as ext3 due to feature incompatibilities
> [    9.824647] EXT4-fs (vda): couldn't mount as ext2 due to feature incompatibilities
> [    9.832593] inode 8: provided: 5c50l; calculated: 5c50i
> [    9.839253] inode 2: provided: d6ea92e9l; calculated: d6ea92e9i
> [    9.846947] EXT4-fs (vda): mounted filesystem with ordered data mode. Opts: (null)
> 
> So c9274d8 is clearly messing up the calculation of the checksum.
> 
> The problem becomes more evident if we add more printk()s to the old
> code, so we can see what region is being checksummed:
> 
> # mount /dev/vda /new-root
> [    6.827297] inode 8: unadjusted csum of 256 bytes with seed a5df92a7: 449a5c50
> [    6.827596] adjusted csum: 5c50
> [    6.835993] inode 2: unadjusted csum of 256 bytes with seed 759c6c33: d6ea92e9
> [    6.836173] adjusted csum: d6ea92e9
> [    6.844801] EXT4-fs (vda): mounted filesystem with ordered data mode. Opts:
> 
> and the new:
> 
> [   11.098013] inode 8: csum of first 124 bytes with seed a5df92a7: f375b663
> [   11.098205] inode 8: added csum of 2 dummy_csum bytes with seed a5df92a7: 20cfebcb
> [   11.098420] inode 8: added csum of 2 bytes from offset 126 -- 128 to existing: d79e7432
> [   11.098646] inode 8: > GOOD_OLD_INODE_SIZE; added csum of 2 bytes from 128 -- 130 to existing: d10936e1
> [   11.098890] 8: adjusted csum: 36e1
> [   11.099133] EXT4-fs error (device vda): ext4_iget:4483: inode #8: comm mount: checksum invalid
> 
> We are not checksumming enough bytes! We used to checksum the entire
> 256-byte inode: now, we checksum only 130 bytes of it, which isn't even
> enough to cover the 28-byte extra_isize on this filesystem and is more
> or less guaranteed to give the wrong answer. I'd fix the problem, but I
> frankly can't see how the new code is meant to be equivalent to the old
> code in any sense -- most particularly what the stuff around dummy_csum
> is meant to do -- so I thought it better to let the people who wrote it
> fix it :)

Sh*t, I missed this during the review.  So your filesystem image (thank
you!) had this to say:

debugfs> stats
Inode size:               256
debugfs> stat <8>
Size of extra inode fields: 0

Basically, a 128-byte inode inside a filesystem that allocated 256 bytes
for each inode.  As you point out, the old code would checksum the entire
allocated space, whether or not the inode core used it.  Obviously, you
want this since inline extended attributes live in that space:

csum = ext4_chksum(sbi, ei->i_csum_seed, (__u8 *)raw,
		   EXT4_INODE_SIZE(inode->i_sb));

The new code, on the other hand, carefully checksums around the
i_checksum fields and only bothers to checksum the space between the end
of i_checksum_hi and the end of the allocated space if the inode core is
big enough to store i_checksum_hi.  Since we allocated 256 bytes for
each inode, we checksum the first two bytes after byte 128
(EXT4_GOOD_OLD_INODE_SIZE), but then we see that i_extra_size == 0 so we
never bother to checksum anything after that.  This is of course wrong
since we no longer checksum the xattr space and we've deviated from the
pre-4.7.4 (documented) on-disk format.

*FORTUNATELY* since the root inode fails the read verifier, the file (in
this case the root dir) can't be modified and therefore nothing has been
corrupted.  Especially fortunate for you, the fs won't mount, reducing
the chances that any serious damage has occurred.

I /think/ the fix in this case is to hoist the last ext4_chksum call
out of the EXT4_FITS_IN_INODE blob:

if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) {
	offset = offsetof(struct ext4_inode, i_checksum_hi);
	csum = ext4_chksum(sbi, csum, (__u8 *)raw +
			   EXT4_GOOD_OLD_INODE_SIZE,
			   offset - EXT4_GOOD_OLD_INODE_SIZE);
	if (EXT4_FITS_IN_INODE(raw, ei, i_checksum_hi)) {
		csum = ext4_chksum(sbi, csum, (__u8 *)&dummy_csum,
				   csum_size);
		offset += csum_size;
	}
	csum = ext4_chksum(sbi, csum, (__u8 *)raw + offset,
			   EXT4_INODE_SIZE(inode->i_sb) - offset);
}

Can you give that a try?

> tune2fs output for this filesystem, particularly the extra_isize and
> inode size fields are likely relevant:
> 
> tune2fs 1.43.1 (08-Jun-2016)
> Filesystem volume name:   root
> Last mounted on:          /
> Filesystem UUID:          6c0f7fa7-d6c2-4054-bff3-3a878460bdc7
> Filesystem magic number:  0xEF53
> Filesystem revision #:    1 (dynamic)
> Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
> Filesystem flags:         signed_directory_hash
> Default mount options:    (none)
> Filesystem state:         clean
> Errors behavior:          Continue
> Filesystem OS type:       Linux
> Inode count:              65536
> Block count:              262144
> Reserved block count:     13107
> Free blocks:              227009
> Free inodes:              59499
> First block:              0
> Block size:               4096
> Fragment size:            4096
> Group descriptor size:    64
> Reserved GDT blocks:      63
> Blocks per group:         32768
> Fragments per group:      32768
> Inodes per group:         8192
> Inode blocks per group:   512
> RAID stripe width:        16
> Flex block group size:    64
> Filesystem created:       Tue May 26 21:28:46 2009
> Last mount time:          Sun Sep 18 23:34:41 2016
> Last write time:          Mon Sep 19 13:51:59 2016
> Mount count:              0
> Maximum mount count:      36
> Last checked:             Mon Sep 19 13:51:59 2016
> Check interval:           15552000 (6 months)
> Next check after:         Sat Mar 18 12:51:59 2017
> Lifetime writes:          16 GB
> Reserved blocks uid:      0 (user root)
> Reserved blocks gid:      0 (group root)
> First inode:              11
> Inode size:               256
> Required extra isize:     28
> Desired extra isize:      28
> Journal inode:            8
> Default directory hash:   half_md4
> Directory Hash Seed:      f1da2da0-057e-4ba0-a021-3d56db5b24ab
> Journal backup:           inode blocks
> Checksum type:            crc32c
> Checksum:                 0x92acf115
> 
> This is an old, upgraded fs from before the days of checksums, but even
> so I'm surprised that with a 256-byte inode and no xattrs in use,
> EXT4_FITS_IN_INODE is false. Maybe the extra_isize isn't big enough?

It's zero, so yes. :)

> An lzipped e2image of the problematic filesystem is available from
> <http://www.esperi.org.uk/~nix/temporary/csum-corruption.img.lz>:
> I have verified that the problem recurs with this image.
> 
> I can also replicate the problem in literally seconds if you need more
> debugging output.
> 
> 
> ... The mystery is why this isn't going wrong anywhere else: I have
> metadata_csum working on every fs on a bunch of other ext4-using
> systems, and indeed on every filesystem on this machine as well, as long
> as c9274d8 is not applied. Many of them are similarly upgraded pre-csum
> fses with the same inode size and extra_isize, but they work...

Hard to say, but this bug only affects inodes with 0 < i_extra_size <= 4
i.e. 128 < inode-core-size <= 130.  Maybe an old ext3 fs that was
created with 256 bytes per inode but inode-core-size of 128?

Uck.  Sorry about this mess.

--D

> 
> -- 
> NULL && (void)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Trouble mounting metadata_csum ext4 filesystems with v4.7.x after c9274d891869880648c4ee9365df3ecc7ba2e285: not enough inode bytes checksummed?
       [not found] ` <CGME20160920055236epcas1p191f8fd252e18f762fa3ad9704f5e77e2@epcas1p1.samsung.com>
@ 2016-09-20 13:35   ` 정대호
  0 siblings, 0 replies; 6+ messages in thread
From: 정대호 @ 2016-09-20 13:35 UTC (permalink / raw)
  To: Darrick J. Wong, Nix, Theodore Ts'o
  Cc: Linux FS Maling List, 정대호, linux-ext4

[-- Attachment #1: Type: text/plain, Size: 2408 bytes --]

Hi, Sorry to bother you.

> Basically, a 128-byte inode inside a filesystem that allocated 256 bytes
> for each inode.  As you point out, the old code would checksum the entire
> allocated space, whether or not the inode core used it.  Obviously, you
> want this since inline extended attributes live in that space:
 
> csum = ext4_chksum(sbi, ei->i_csum_seed, (__u8 *)raw,
>                   EXT4_INODE_SIZE(inode->i_sb));
 
> The new code, on the other hand, carefully checksums around the
> i_checksum fields and only bothers to checksum the space between the end
> of i_checksum_hi and the end of the allocated space if the inode core is
> big enough to store i_checksum_hi.  Since we allocated 256 bytes for
> each inode, we checksum the first two bytes after byte 128
> (EXT4_GOOD_OLD_INODE_SIZE), but then we see that i_extra_size == 0 so we
> never bother to checksum anything after that.  This is of course wrong
> since we no longer checksum the xattr space and we've deviated from the
> pre-4.7.4 (documented) on-disk format.

Oops. I had overlooked the case of that i_extra_size is less than 4.
I misunderstood the previous Darrick's codes calculating inode checksum value.
Sorry about that. :-(
 
> if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) {
>        offset = offsetof(struct ext4_inode, i_checksum_hi);
>        csum = ext4_chksum(sbi, csum, (__u8 *)raw +
>                           EXT4_GOOD_OLD_INODE_SIZE,
>                           offset - EXT4_GOOD_OLD_INODE_SIZE);
>        if (EXT4_FITS_IN_INODE(raw, ei, i_checksum_hi)) {
>                csum = ext4_chksum(sbi, csum, (__u8 *)&dummy_csum,
>                                   csum_size);
>                offset += csum_size;
>        }
>        csum = ext4_chksum(sbi, csum, (__u8 *)raw + offset,
>                           EXT4_INODE_SIZE(inode->i_sb) - offset);
> }
> 
> Can you give that a try?

Darrick, your modification looks good enough to me.

Thank you, guys.

 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Trouble mounting metadata_csum ext4 filesystems with v4.7.x after c9274d891869880648c4ee9365df3ecc7ba2e285: not enough inode bytes checksummed?
  2016-09-19 14:19 Trouble mounting metadata_csum ext4 filesystems with v4.7.x after c9274d891869880648c4ee9365df3ecc7ba2e285: not enough inode bytes checksummed? Nix
  2016-09-20  5:52 ` Darrick J. Wong
       [not found] ` <CGME20160920055236epcas1p191f8fd252e18f762fa3ad9704f5e77e2@epcas1p1.samsung.com>
@ 2016-09-24 20:20 ` Nix
  2 siblings, 0 replies; 6+ messages in thread
From: Nix @ 2016-09-24 20:20 UTC (permalink / raw)
  To: linux-ext4, tytso, adilger.kernel; +Cc: daeho.jeong

[Resent with a few more people Cc:ed in and email addresses corrected,
 since I made the mistake of using the addresses on the original
 commits, which are in some cases no longer valid or simply the wrong
 list entirely.]

So I ran into spurious metadata corruption warnings in v4.7.2 due to the
problem fixed by c9274d8. I applied an early version of the fix,
rebooted, and oh dear root filesystem mount failure with invalid
checksum errors.

The problem persists in v4.7.4, as seen here in qemu emulation on a raw
image dd'ed directly from the thing that won't boot, with a couple of
printk()s:

# mount /dev/vda /new-root/
[    8.124692] EXT4-fs (vda): couldn't mount as ext3 due to feature incompatibilities
[    8.126977] EXT4-fs (vda): couldn't mount as ext2 due to feature incompatibilities
[    9.017980] Inode size 256 > good old size 128; fits in inode: 0
[    8.134897] inode 8: provided: 5c50l; calculated: 36e1i
[    8.135098] EXT4-fs error (device vda): ext4_iget:4479: inode #8: comm mount: checksum invalid
[    8.138992] EXT4-fs (vda): no journal found
[    8.165744] UDF-fs: warning (device vda): udf_fill_super: No partition found (2)
mount: mounting /dev/vda on /new-root/ failed: Invalid argument

I added a bit of printking to show the failure of the journal inode
checksum to pass muster. e2fsck (from e2fsprogs 1.43.1-14) is quite
happy with this filesystem. Reverting c9274d8 makes everything happy
again (well, it does bring the original bug back, which is a rather
serious one, but other than that...):

[    9.823032] EXT4-fs (vda): couldn't mount as ext3 due to feature incompatibilities
[    9.824647] EXT4-fs (vda): couldn't mount as ext2 due to feature incompatibilities
[    9.832593] inode 8: provided: 5c50l; calculated: 5c50i
[    9.839253] inode 2: provided: d6ea92e9l; calculated: d6ea92e9i
[    9.846947] EXT4-fs (vda): mounted filesystem with ordered data mode. Opts: (null)

So c9274d8 is clearly messing up the calculation of the checksum.

The problem becomes more evident if we add more printk()s to the old
code, so we can see what region is being checksummed:

# mount /dev/vda /new-root
[    6.827297] inode 8: unadjusted csum of 256 bytes with seed a5df92a7: 449a5c50
[    6.827596] adjusted csum: 5c50
[    6.835993] inode 2: unadjusted csum of 256 bytes with seed 759c6c33: d6ea92e9
[    6.836173] adjusted csum: d6ea92e9
[    6.844801] EXT4-fs (vda): mounted filesystem with ordered data mode. Opts:

and the new:

[   11.098013] inode 8: csum of first 124 bytes with seed a5df92a7: f375b663
[   11.098205] inode 8: added csum of 2 dummy_csum bytes with seed a5df92a7: 20cfebcb
[   11.098420] inode 8: added csum of 2 bytes from offset 126 -- 128 to existing: d79e7432
[   11.098646] inode 8: > GOOD_OLD_INODE_SIZE; added csum of 2 bytes from 128 -- 130 to existing: d10936e1
[   11.098890] 8: adjusted csum: 36e1
[   11.099133] EXT4-fs error (device vda): ext4_iget:4483: inode #8: comm mount: checksum invalid

We are not checksumming enough bytes! We used to checksum the entire
256-byte inode: now, we checksum only 130 bytes of it, which isn't even
enough to cover the 28-byte extra_isize on this filesystem and is more
or less guaranteed to give the wrong answer. I'd fix the problem, but I
frankly can't see how the new code is meant to be equivalent to the old
code in any sense -- most particularly what the stuff around dummy_csum
is meant to do -- so I thought it better to let the people who wrote it
fix it :)

tune2fs output for this filesystem, particularly the extra_isize and
inode size fields are likely relevant:

tune2fs 1.43.1 (08-Jun-2016)
Filesystem volume name:   root
Last mounted on:          /
Filesystem UUID:          6c0f7fa7-d6c2-4054-bff3-3a878460bdc7
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              65536
Block count:              262144
Reserved block count:     13107
Free blocks:              227009
Free inodes:              59499
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Reserved GDT blocks:      63
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
RAID stripe width:        16
Flex block group size:    64
Filesystem created:       Tue May 26 21:28:46 2009
Last mount time:          Sun Sep 18 23:34:41 2016
Last write time:          Mon Sep 19 13:51:59 2016
Mount count:              0
Maximum mount count:      36
Last checked:             Mon Sep 19 13:51:59 2016
Check interval:           15552000 (6 months)
Next check after:         Sat Mar 18 12:51:59 2017
Lifetime writes:          16 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      f1da2da0-057e-4ba0-a021-3d56db5b24ab
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0x92acf115

This is an old, upgraded fs from before the days of checksums, but even
so I'm surprised that with a 256-byte inode and no xattrs in use,
EXT4_FITS_IN_INODE is false. Maybe the extra_isize isn't big enough?

An lzipped e2image of the problematic filesystem is available from
<http://www.esperi.org.uk/~nix/temporary/csum-corruption.img.lz>:
I have verified that the problem recurs with this image.

I can also replicate the problem in literally seconds if you need more
debugging output.


... The mystery is why this isn't going wrong anywhere else: I have
metadata_csum working on every fs on a bunch of other ext4-using
systems, and indeed on every filesystem on this machine as well, as long
as c9274d8 is not applied. Many of them are similarly upgraded pre-csum
fses with the same inode size and extra_isize, but they work...

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Trouble mounting metadata_csum ext4 filesystems with v4.7.x after c9274d891869880648c4ee9365df3ecc7ba2e285: not enough inode bytes checksummed?
  2016-09-20  5:52 ` Darrick J. Wong
@ 2017-01-05 11:09   ` Nick Alcock
  2017-01-15 17:34   ` Nick Alcock
  1 sibling, 0 replies; 6+ messages in thread
From: Nick Alcock @ 2017-01-05 11:09 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Theodore Ts'o, Linux FS Maling List, daeho.jeong, linux-ext4

[Aside: This got misfiltered into a mailbox with a typo in the name,
 more or less electronic oblivion. I just recovered it while preparing
 to migrate to notmuch. Sorry for the huge delay.]

On 20 Sep 2016, Darrick J. Wong stated:

> [cc Ted and the ext4 list]
>
> On Mon, Sep 19, 2016 at 03:19:03PM +0100, Nix wrote:
>> We are not checksumming enough bytes! We used to checksum the entire
>> 256-byte inode: now, we checksum only 130 bytes of it, which isn't even
>> enough to cover the 28-byte extra_isize on this filesystem and is more
>> or less guaranteed to give the wrong answer. I'd fix the problem, but I
>> frankly can't see how the new code is meant to be equivalent to the old
>> code in any sense -- most particularly what the stuff around dummy_csum
>> is meant to do -- so I thought it better to let the people who wrote it
>> fix it :)
[...]
> Sh*t, I missed this during the review.  So your filesystem image (thank
> you!) had this to say:

e2image is *such* a good program. :)

> debugfs> stats
> Inode size:               256
> debugfs> stat <8>
> Size of extra inode fields: 0
>
> Basically, a 128-byte inode inside a filesystem that allocated 256 bytes
> for each inode.

This was due to the change in mke2fs defaults catching me out, long ago:
I was assuming a 128-byte inode was the default, but it wasn't, or I'd
have changed things and avoided wasting the space...

I tried using inline data to save the space again (again, in the v4.7.x
period) and had a bit of a disaster a few weeks later: running dovecot's
configure with the srcdir and objdir on an inline data filesystem leads
to an oops shortly afterwards and massive data corruption on remount.
I'll whip up a replication case for this fairly soon. I suspect shared
writable mmap is being evil again.

> never bother to checksum anything after that.  This is of course wrong
> since we no longer checksum the xattr space and we've deviated from the
> pre-4.7.4 (documented) on-disk format.

... and of course that'll lead to checksum failures, and thanks to
metadata checksums this is instantly obvious! :)

> *FORTUNATELY* since the root inode fails the read verifier, the file (in
> this case the root dir) can't be modified and therefore nothing has been
> corrupted.  Especially fortunate for you, the fs won't mount, reducing
> the chances that any serious damage has occurred.

Even if it had, nothing too bad would result. There's a reason I do
hourly-and-daily backups, and keep new features switched well and truly
off on the backup filesystem!

> I /think/ the fix in this case is to hoist the last ext4_chksum call
> out of the EXT4_FITS_IN_INODE blob:

I'll give that a try this weekend. Sorry for the huge delay!

>> ... The mystery is why this isn't going wrong anywhere else: I have
>> metadata_csum working on every fs on a bunch of other ext4-using
>> systems, and indeed on every filesystem on this machine as well, as long
>> as c9274d8 is not applied. Many of them are similarly upgraded pre-csum
>> fses with the same inode size and extra_isize, but they work...
>
> Hard to say, but this bug only affects inodes with 0 < i_extra_size <= 4
> i.e. 128 < inode-core-size <= 130.  Maybe an old ext3 fs that was
> created with 256 bytes per inode but inode-core-size of 128?

It was ext4 from the start, created with -i 128, inode_size = 256.

> Uck.  Sorry about this mess.

Sorry I overlooked your rapid response for so long!

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Trouble mounting metadata_csum ext4 filesystems with v4.7.x after c9274d891869880648c4ee9365df3ecc7ba2e285: not enough inode bytes checksummed?
  2016-09-20  5:52 ` Darrick J. Wong
  2017-01-05 11:09   ` Nick Alcock
@ 2017-01-15 17:34   ` Nick Alcock
  1 sibling, 0 replies; 6+ messages in thread
From: Nick Alcock @ 2017-01-15 17:34 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Theodore Ts'o, Linux FS Maling List, daeho.jeong, linux-ext4

On 20 Sep 2016, Darrick J. Wong said:
> I /think/ the fix in this case is to hoist the last ext4_chksum call
> out of the EXT4_FITS_IN_INODE blob:
>
> if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) {
> 	offset = offsetof(struct ext4_inode, i_checksum_hi);
> 	csum = ext4_chksum(sbi, csum, (__u8 *)raw +
> 			   EXT4_GOOD_OLD_INODE_SIZE,
> 			   offset - EXT4_GOOD_OLD_INODE_SIZE);
> 	if (EXT4_FITS_IN_INODE(raw, ei, i_checksum_hi)) {
> 		csum = ext4_chksum(sbi, csum, (__u8 *)&dummy_csum,
> 				   csum_size);
> 		offset += csum_size;
> 	}
> 	csum = ext4_chksum(sbi, csum, (__u8 *)raw + offset,
> 			   EXT4_INODE_SIZE(inode->i_sb) - offset);
> }
>
> Can you give that a try?

Months too late, I can finally confirm that the patch below fixes it for
me (tested in 4.9.4), and I'm running with metadata csums on everywhere
again. :)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 27e4348..ed79c6e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -71,10 +71,10 @@ static __u32 ext4_inode_csum(struct inode *inode, struct ext4_inode *raw,
 			csum = ext4_chksum(sbi, csum, (__u8 *)&dummy_csum,
 					   csum_size);
 			offset += csum_size;
-			csum = ext4_chksum(sbi, csum, (__u8 *)raw + offset,
-					   EXT4_INODE_SIZE(inode->i_sb) -
-					   offset);
 		}
+		csum = ext4_chksum(sbi, csum, (__u8 *)raw + offset,
+				   EXT4_INODE_SIZE(inode->i_sb) -
+				   offset);
 	}
 
 	return csum;


^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-01-15 17:34 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-19 14:19 Trouble mounting metadata_csum ext4 filesystems with v4.7.x after c9274d891869880648c4ee9365df3ecc7ba2e285: not enough inode bytes checksummed? Nix
2016-09-20  5:52 ` Darrick J. Wong
2017-01-05 11:09   ` Nick Alcock
2017-01-15 17:34   ` Nick Alcock
     [not found] ` <CGME20160920055236epcas1p191f8fd252e18f762fa3ad9704f5e77e2@epcas1p1.samsung.com>
2016-09-20 13:35   ` 정대호
2016-09-24 20:20 ` Nix

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.