linux-ext4.vger.kernel.org archive mirror
* e2fsprogs: setting UUID with tune2fs corrupts an ext4 fs image
@ 2019-12-05  9:20 Viliam Lejcik
  0 siblings, 0 replies; 6+ messages in thread
From: Viliam Lejcik @ 2019-12-05  9:20 UTC (permalink / raw)
  To: linux-ext4

We provide a custom Linux distribution, based on yocto-project (poky 2.6.1). With bitbake we've built an image, which becomes corrupted during installation to the SSD of the embedded device. We're setting the filesystem UUID (not partition UUID) using tune2fs, so the bootloader can find it. We noticed this problem because we found a directory that couldn't be read.

| root@board:~# dir /var/lib/opkg/info
| ls: reading directory '/var/lib/opkg/info': Bad message
| total 0

This behavior can be reproduced on an ext4 fs image, so there's no need to run it on the device.

First, let's check that the image has been built correctly:

| root@board:~# fsck.ext4 -fn core-image.ext4
| e2fsck 1.44.1 (24-Mar-2018)
| Pass 1: Checking inodes, blocks, and sizes
| Pass 2: Checking directory structure
| Pass 3: Checking directory connectivity
| Pass 4: Checking reference counts
| Pass 5: Checking group summary information
| core-image.ext4: 13417/85344 files (0.6% non-contiguous), 250575/340060 blocks

Then we set the filesystem UUID (a random one for this example) with tune2fs:

| root@board:~# tune2fs -U random core-image.ext4
| tune2fs 1.44.1 (24-Mar-2018)
| Setting UUID on a checksummed filesystem could take some time.
| Proceed anyway (or wait 5 seconds to proceed) ? (y,N) y
|
| This operation requires a freshly checked filesystem.
|
| Please run e2fsck -fD on the filesystem.

This means that on a checksummed fs all metadata blocks have to be rewritten (the 'metadata_csum' feature flag is set in the superblock), and that this rewrite failed partway through (the fs state in the superblock is now 'not clean'). We can fix it with fsck:

| root@board:~# fsck.ext4 -fy core-image.ext4
| e2fsck 1.44.1 (24-Mar-2018)
| Pass 1: Checking inodes, blocks, and sizes
| Pass 2: Checking directory structure
| Problem in HTREE directory inode 177: internal node fails checksum.
| Clear HTree index? yes
|
| Pass 3: Checking directory connectivity
| Pass 3A: Optimizing directories
| Pass 4: Checking reference counts
| Pass 5: Checking group summary information
|
| core-image.ext4: ***** FILE SYSTEM WAS MODIFIED *****
| core-image.ext4: 13417/85344 files (0.6% non-contiguous), 250575/340060 blocks

If I rerun tune2fs on the same fixed image, it corrupts it again.

Let's take a closer look at the corrupted inode 177: it is the path /var/lib/opkg/info/, and there are 2712 files under it. Here is its HTREE structure:

| root@board:~# debugfs -R "htree_dump /var/lib/opkg/info" core-image.ext4
| Root node dump:
|  Reserved zero: 0
|  Hash Version: 1
|  Info length: 8
|  Indirect levels: 1
|  Flags: 0
| Number of entries (count): 1
| Number of entries (limit): 123
| Checksum: 0x8dc1e2db
| Entry #0: Hash 0x00000000, block 127
|
| Entry #0: Hash 0x00000000, block 127
| Number of entries (count): 126
| Number of entries (limit): 126
| Checksum: 0x9e54b5c7
| Entry #0: Hash 0x00000000, block 1
| Entry #1: Hash 0x01bddbe0, block 2
| ...
| Entry #124: Hash 0xfd55ab30, block 125
| Entry #125: Hash 0xffa96492, block 126
|
| Entry #0: Hash 0x00000000, block 1
| Reading directory block 1, phys 17863
| 1650 0x000f9ee0-bc4ace72 (52) perl-module-tap-parser-sourcehandler.list
| 1228 0x001d06e8-ada99897 (40) perl-module-net-servent.control
| ...
| 2762 0xff77b492-a9b98e31 (228) perl-module-json-pp.control
| leaf block checksum: 0xeccb004d
| Entry #125: Hash 0xffa96492, block 126
| Reading directory block 126, phys 247938
| 1390 0xffa96492-73841561 (36) lmsensors-sensors.control
| 1022 0xffaf73b4-a6f75b1b (976) perl-module-bytes.control
| leaf block checksum: 0x0f9c8092
| ---------------------

The problem for tune2fs is the "Number of entries", when count==limit (126). In this case it fails within the following 'if' statement:
https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/tree/misc/tune2fs.c#n544

It then prints an error, sets the 'not clean' fs state in the superblock, and exits. All fsck does is recompute the checksums and set the 'clean' fs state. It doesn't change the number of entries, so count and limit both stay the same (126). That's why rerunning tune2fs corrupts the fs again.
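
To find which htree nodes are in this count==limit state, the htree_dump output can be filtered. The following is only a minimal sketch: full_htree_nodes is a hypothetical helper name, and it assumes the "Number of entries (count)/(limit)" line format shown in the debugfs output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of e2fsprogs): reads the output of
#   debugfs -R "htree_dump <dir>" <image>
# on stdin and prints one line for every htree node whose entry count
# equals its limit, i.e. the condition described above.
full_htree_nodes() {
    awk '
        # Remember the most recent "count" value; the number is the last field.
        /Number of entries \(count\):/ { count = $NF }
        # The "limit" line follows its "count" line, so compare the pair here.
        /Number of entries \(limit\):/ {
            limit = $NF
            if (count == limit)
                printf "full node: count=%d limit=%d\n", count, limit
        }
    '
}

# Example use against the image from this report (requires debugfs):
#   debugfs -R "htree_dump /var/lib/opkg/info" core-image.ext4 | full_htree_nodes
```

For the dump above, this would report the interior node (126/126) but not the root node (1/123).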

And here is the question: how should it behave correctly? Which tool is responsible for this issue?
- tune2fs - should it ignore the 'if' statement? (I tried to comment it out and tune2fs then did its job successfully, proved with fsck),
- fsck - should it rebuild the dir, as stated in the comment above the 'if' statement? (htree block is full then rebuild the dir),
- mkfs - should it not build the image with full number of entries? (count==limit).

This issue is not tied to the e2fsprogs version we use (1.44.1): I compiled and tried versions 1.43 through 1.45.4, and they all behave the same way. I also tried generating other images (such as core-image-minimal), but none of them led to the corruption. If needed, I can give you access to the corrupted image for further investigation.

I'm not an expert in ext4, so I'd appreciate any advice. Thank you.

BR,
Vilo


Confidentiality Notice: This e-mail is privileged and confidential and for the use of the addressee only. Should you have received this e-mail in error please notify us by replying directly to the sender or by sending a message to info@kistler.com. Unauthorised dissemination, disclosure or copying of the contents of this e-mail, or any similar action, is prohibited.


* RE: e2fsprogs: setting UUID with tune2fs corrupts an ext4 fs image
  2019-12-06  3:51 ` Theodore Y. Ts'o
@ 2019-12-09 14:44   ` Viliam Lejcik
  0 siblings, 0 replies; 6+ messages in thread
From: Viliam Lejcik @ 2019-12-09 14:44 UTC (permalink / raw)
  To: linux-ext4

Hi Theodore,

Thank you for your analysis. As you suggested, we finally decided to increase the block size to 4K, and the problem seems to be resolved.

By default, yocto/bitbake does not configure the block size, so mkfs.ext4 used its default value of 1K. We fixed it as shown here (see the last line):
https://git.yoctoproject.org/cgit/cgit.cgi/meta-qcom/tree/conf/machine/dragonboard-820c.conf?h=master#n26

BR,
Vilo



* Re: e2fsprogs: setting UUID with tune2fs corrupts an ext4 fs image
  2019-12-05 12:36 Viliam Lejcik
  2019-12-05 15:48 ` Theodore Y. Ts'o
@ 2019-12-06  3:51 ` Theodore Y. Ts'o
  2019-12-09 14:44   ` Viliam Lejcik
  1 sibling, 1 reply; 6+ messages in thread
From: Theodore Y. Ts'o @ 2019-12-06  3:51 UTC (permalink / raw)
  To: Viliam Lejcik; +Cc: linux-ext4

On Thu, Dec 05, 2019 at 12:36:35PM +0000, Viliam Lejcik wrote:
> 
> The problem for tune2fs is "Number of entries", when count==limit
> (126). In this case it fails within the following 'if' statement:
> https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/tree/misc/tune2fs.c#n544
> 
> Then it prints out error, sets 'not clean' fs state in superblock,
> and exits. What fsck does, it recomputes checksums, sets 'clean' fs
> state, and that's all. It doesn't change number of entries,
> count+limit stays the same (126). So that's why rerunning tune2fs
> corrupts the fs again.

So what's going on is that the code in question was originally
designed for file systems with a 4k block size, and for the case of
*adding* a checksum to a directory which didn't already have a
checksum "tail" reserved.  In that case, you have to recreate the
htree index for that directory.  Since tune2fs didn't want to deal
with that corner case, it would throw up its hands and tell the user
to run e2fsck -fD.  Since the UUID had already been changed, and the
checksum seed was based on the UUID, e2fsck would report corruptions,
but that was actually what was expected.  Unfortunately the message
printed by tune2fs is super-confusing, and the logic for checking for
this case is faulty in that (a) it doesn't take into account the block
size, and (b) it doesn't take into account whether there is a checksum
"tail" already present for that particular htree directory.

Most people don't see this because they are using file systems with 4k
block sizes, and it's much less likely they will run into that
situation, since the interior node fanout is significantly larger with
4k blocks.  (Is there a reason you are using a 1k block size?  This
adds quite a bit of inefficiency to the file system, and while it does
reduce internal fragmentation, bytes are quite cheap these days, and
it's probably not worth it if you care about performance at all to use
a 1k block size instead of a 4k block size.)
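
The fanout difference can be put into numbers. This is a rough sketch only, and the layout constants in it are assumptions that do not come from this thread: 8-byte htree entries, an 8-byte fake dirent header in interior nodes, 12-byte "." and ".." entries plus 8 bytes of info in the root, and an 8-byte checksum tail when metadata_csum is on. The function names are made up for illustration. With those assumptions the 1k results match the limits of 123 and 126 in the htree_dump output from the report.

```shell
#!/bin/sh
# Assumed htree layout (see the note above): every entry is 8 bytes,
# and metadata_csum reserves an 8-byte tail at the end of each node.

# dx_node_limit BLOCKSIZE CSUM: max entries in an interior htree node.
dx_node_limit() {
    space=$(( $1 - 8 ))                 # 8-byte fake dirent header
    if [ "$2" -eq 1 ]; then
        space=$(( space - 8 ))          # checksum tail
    fi
    echo $(( space / 8 ))
}

# dx_root_limit BLOCKSIZE CSUM: max entries in the htree root node.
dx_root_limit() {
    space=$(( $1 - 12 - 12 - 8 ))       # "." and ".." dirents + root info
    if [ "$2" -eq 1 ]; then
        space=$(( space - 8 ))          # checksum tail
    fi
    echo $(( space / 8 ))
}

for bs in 1024 4096; do
    echo "block size $bs: root limit $(dx_root_limit "$bs" 1), node limit $(dx_node_limit "$bs" 1)"
done
# prints:
#   block size 1024: root limit 123, node limit 126
#   block size 4096: root limit 507, node limit 510
```

So an interior node holds roughly four times as many entries with 4k blocks, which is why a completely full node is much rarer there.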

The workaround I would suggest, assuming you are using a kernel which
is 4.4 or newer (and in 2019, you really should be), is to turn on the
metadata_csum_seed feature, either when the file system is originally
formatted, or using "tune2fs -O metadata_csum_seed".  This allows you
to change the UUID without needing to rewrite all of the metadata
blocks, which is faster, works while the file system is mounted, and
avoids the bug in tune2fs.

So using the test file system you sent me, this works just fine:

% tune2fs -O metadata_csum_seed -U random  core-image.ext4
tune2fs 1.45.4 (23-Sep-2019)
% e2fsck -fy !$
e2fsck -fy core-image.ext4
e2fsck 1.45.4 (23-Sep-2019)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
core-image.ext4: 13237/89408 files (0.6% non-contiguous), 249888/357064 blocks
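
In an install script it may be worth confirming that the feature is really set before changing the UUID. The following is a small sketch under stated assumptions: has_feature is a hypothetical helper, and it relies on the "Filesystem features:" line that "tune2fs -l" and "dumpe2fs -h" print.

```shell
#!/bin/sh
# Hypothetical helper (not part of e2fsprogs): reads `tune2fs -l <image>`
# or `dumpe2fs -h <image>` output on stdin and exits 0 if feature $1 is
# listed on the "Filesystem features:" line, 1 otherwise.
has_feature() {
    awk -v want="$1" '
        /^Filesystem features:/ {
            # Fields 1 and 2 are the label; feature names start at field 3.
            # Whole-field comparison keeps metadata_csum from matching
            # metadata_csum_seed.
            for (i = 3; i <= NF; i++)
                if ($i == want)
                    found = 1
        }
        END { exit !found }
    '
}

# Example use (requires tune2fs):
#   if tune2fs -l core-image.ext4 | has_feature metadata_csum_seed; then
#       tune2fs -U random core-image.ext4
#   fi
```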

cheers,

					- Ted


* RE: e2fsprogs: setting UUID with tune2fs corrupts an ext4 fs image
@ 2019-12-05 16:51 Viliam Lejcik
  0 siblings, 0 replies; 6+ messages in thread
From: Viliam Lejcik @ 2019-12-05 16:51 UTC (permalink / raw)
  To: linux-ext4

Hi Theodore,

Thank you for quick response. You can download the sample image here:
https://smartfile.kistler.com/link/Oggq7B33BaI/

It was slightly modified (I deleted some proprietary stuff), as it is difficult to build a sample image which leads to this problem. So you may find the issue on another inode, but if you run the commands as stated in the report, you'll be able to reproduce the issue, for sure.

BR,
Vilo



* Re: e2fsprogs: setting UUID with tune2fs corrupts an ext4 fs image
  2019-12-05 12:36 Viliam Lejcik
@ 2019-12-05 15:48 ` Theodore Y. Ts'o
  2019-12-06  3:51 ` Theodore Y. Ts'o
  1 sibling, 0 replies; 6+ messages in thread
From: Theodore Y. Ts'o @ 2019-12-05 15:48 UTC (permalink / raw)
  To: Viliam Lejcik; +Cc: linux-ext4

On Thu, Dec 05, 2019 at 12:36:35PM +0000, Viliam Lejcik wrote:
> 
> This behavior can be reproduced on an ext4 fs image, so there's no need to run it on the device.

Where can we download or otherwise obtain the image where you are
seeing the problem?

Thanks,

					- Ted


