linux-ext4.vger.kernel.org archive mirror
* RE: e2fsprogs: setting UUID with tune2fs corrupts an ext4 fs image
@ 2019-12-05 16:51 Viliam Lejcik
  0 siblings, 0 replies; 6+ messages in thread
From: Viliam Lejcik @ 2019-12-05 16:51 UTC (permalink / raw)
  To: linux-ext4

Hi Theodore,

Thank you for the quick response. You can download the sample image here:
https://smartfile.kistler.com/link/Oggq7B33BaI/

It was slightly modified (I deleted some proprietary content), since it is difficult to build a sample image from scratch that triggers this problem. You may therefore see the issue on a different inode, but if you run the commands as stated in the report, you will be able to reproduce it.

BR,
Vilo


Confidentiality Notice: This e-mail is privileged and confidential and for the use of the addressee only. Should you have received this e-mail in error please notify us by replying directly to the sender or by sending a message to info@kistler.com. Unauthorised dissemination, disclosure or copying of the contents of this e-mail, or any similar action, is prohibited.
-----Original Message-----
From: Theodore Y. Ts'o <tytso@mit.edu>
Sent: Thursday, 5. December 2019 16:48
To: Lejcik Viliam <Viliam.Lejcik@kistler.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: e2fsprogs: setting UUID with tune2fs corrupts an ext4 fs image

On Thu, Dec 05, 2019 at 12:36:35PM +0000, Viliam Lejcik wrote:
>
> This behavior can be reproduced on an ext4 fs image, so there's no need to run it on the device.

Where can we download or otherwise obtain the image where you are seeing the problem?

Thanks,

- Ted


* e2fsprogs: setting UUID with tune2fs corrupts an ext4 fs image
@ 2019-12-05 12:36 Viliam Lejcik
  2019-12-05 15:48 ` Theodore Y. Ts'o
  2019-12-06  3:51 ` Theodore Y. Ts'o
  0 siblings, 2 replies; 6+ messages in thread
From: Viliam Lejcik @ 2019-12-05 12:36 UTC (permalink / raw)
  To: linux-ext4

Hi all,

We provide a custom Linux distribution based on the Yocto Project (poky 2.6.1). With bitbake we built an image that becomes corrupted during installation to the SSD of the embedded device. During installation we set the filesystem UUID (not the partition UUID) using tune2fs, so that the bootloader can find it. We noticed the problem because we found a directory that could not be read.

| root@board:~# dir /var/lib/opkg/info
| ls: reading directory '/var/lib/opkg/info': Bad message
| total 0

This behavior can be reproduced on an ext4 fs image, so there's no need to run it on the device.

First, let's check that the image was built correctly:

| root@board:~# fsck.ext4 -fn core-image.ext4
| e2fsck 1.44.1 (24-Mar-2018)
| Pass 1: Checking inodes, blocks, and sizes
| Pass 2: Checking directory structure
| Pass 3: Checking directory connectivity
| Pass 4: Checking reference counts
| Pass 5: Checking group summary information
| core-image.ext4: 13417/85344 files (0.6% non-contiguous), 250575/340060 blocks

Then we set the filesystem UUID (a random one for this example) with tune2fs:

| root@board:~# tune2fs -U random core-image.ext4
| tune2fs 1.44.1 (24-Mar-2018)
| Setting UUID on a checksummed filesystem could take some time.
| Proceed anyway (or wait 5 seconds to proceed) ? (y,N) y
|
| This operation requires a freshly checked filesystem.
|
| Please run e2fsck -fD on the filesystem.

The message indicates that on a checksummed fs (the 'metadata_csum' feature flag is set in the superblock) all metadata blocks have to be rewritten. That rewrite failed partway through, leaving the 'not clean' state set in the superblock. We can fix it with fsck:

| root@board:~# fsck.ext4 -fy core-image.ext4
| e2fsck 1.44.1 (24-Mar-2018)
| Pass 1: Checking inodes, blocks, and sizes
| Pass 2: Checking directory structure
| Problem in HTREE directory inode 177: internal node fails checksum.
| Clear HTree index? yes
|
| Pass 3: Checking directory connectivity
| Pass 3A: Optimizing directories
| Pass 4: Checking reference counts
| Pass 5: Checking group summary information
|
| core-image.ext4: ***** FILE SYSTEM WAS MODIFIED *****
| core-image.ext4: 13417/85344 files (0.6% non-contiguous), 250575/340060 blocks

If I rerun tune2fs on the same fixed image, it corrupts it again.
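
The sequence above can be scripted for repeated testing. Below is a minimal Python sketch, assuming the e2fsprogs tools are on PATH and that the tune2fs confirmation prompt reads its answer from standard input. The helper names repro_commands and run_repro are hypothetical, and only the flags already shown in this report are used:

```python
import subprocess

def repro_commands(image):
    """Return the command lines from the report, in the order they were run."""
    return [
        ["fsck.ext4", "-fn", image],         # read-only check: image starts clean
        ["tune2fs", "-U", "random", image],  # set a random filesystem UUID
        ["fsck.ext4", "-fy", image],         # repair what tune2fs left behind
        ["tune2fs", "-U", "random", image],  # second attempt on the repaired image
    ]

def run_repro(image):
    """Run each step and collect (command, exit status) pairs."""
    results = []
    for cmd in repro_commands(image):
        # Assumption: "y" on stdin answers the "Proceed anyway" prompt.
        proc = subprocess.run(cmd, input="y\n", text=True)
        results.append((" ".join(cmd), proc.returncode))
    return results
```

Calling run_repro("core-image.ext4") on a copy of the image should print the same tool output as the transcripts above; work on a copy, since the steps modify the file.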

Let's take a closer look at the corrupted inode 177. It is the directory /var/lib/opkg/info/, which contains 2712 files. Here is its HTREE structure:

| root@board:~# debugfs -R "htree_dump /var/lib/opkg/info" core-image.ext4
| Root node dump:
|  Reserved zero: 0
|  Hash Version: 1
|  Info length: 8
|  Indirect levels: 1
|  Flags: 0
| Number of entries (count): 1
| Number of entries (limit): 123
| Checksum: 0x8dc1e2db
| Entry #0: Hash 0x00000000, block 127
|
| Entry #0: Hash 0x00000000, block 127
| Number of entries (count): 126
| Number of entries (limit): 126
| Checksum: 0x9e54b5c7
| Entry #0: Hash 0x00000000, block 1
| Entry #1: Hash 0x01bddbe0, block 2
| ...
| Entry #124: Hash 0xfd55ab30, block 125
| Entry #125: Hash 0xffa96492, block 126
|
| Entry #0: Hash 0x00000000, block 1
| Reading directory block 1, phys 17863
| 1650 0x000f9ee0-bc4ace72 (52) perl-module-tap-parser-sourcehandler.list
| 1228 0x001d06e8-ada99897 (40) perl-module-net-servent.control
| ...
| 2762 0xff77b492-a9b98e31 (228) perl-module-json-pp.control
| leaf block checksum: 0xeccb004d
| Entry #125: Hash 0xffa96492, block 126
| Reading directory block 126, phys 247938
| 1390 0xffa96492-73841561 (36) lmsensors-sensors.control
| 1022 0xffaf73b4-a6f75b1b (976) perl-module-bytes.control
| leaf block checksum: 0x0f9c8092
| ---------------------
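
To check other images for index nodes like the second one in this dump, where count equals limit, the htree_dump output can be scanned mechanically. Below is a small Python sketch that relies only on the two "Number of entries" lines as they appear above; the function name full_index_nodes is hypothetical, and the sample text is copied from the dump with the "| " quoting removed:

```python
import re

_COUNT = re.compile(r"Number of entries \(count\):\s*(\d+)")
_LIMIT = re.compile(r"Number of entries \(limit\):\s*(\d+)")

def full_index_nodes(dump_text):
    """Return (count, limit) for every index node whose count equals its limit."""
    full = []
    count = None
    for line in dump_text.splitlines():
        m = _COUNT.search(line)
        if m:
            count = int(m.group(1))
            continue
        m = _LIMIT.search(line)
        if m and count is not None:
            limit = int(m.group(1))
            if count == limit:
                full.append((count, limit))
            count = None  # wait for the next count/limit pair
    return full

sample = """\
Number of entries (count): 1
Number of entries (limit): 123
Checksum: 0x8dc1e2db
Entry #0: Hash 0x00000000, block 127
Number of entries (count): 126
Number of entries (limit): 126
Checksum: 0x9e54b5c7
"""
print(full_index_nodes(sample))  # → [(126, 126)]
```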

For tune2fs, the problem is the "Number of entries" when count==limit (126). In that case it takes the branch in the following 'if' statement:
https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/tree/misc/tune2fs.c#n544
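
For reference, the two limits in the dump are consistent with the usual htree layout arithmetic. The sketch below assumes a 1024-byte block size (not stated in this report, but it is the value that reproduces 123 and 126), 8-byte index entries, an 8-byte checksum tail when metadata_csum is enabled, and the 8-byte root info shown as "Info length: 8". The function names are hypothetical:

```python
DX_ENTRY_SIZE = 8  # hash (4 bytes) + block number (4 bytes)
DX_TAIL_SIZE = 8   # reserved word + checksum, present only with metadata_csum

def dx_node_limit(block_size, csum=True):
    """Entry slots in an interior index block (after an 8-byte fake dirent)."""
    space = block_size - 8
    if csum:
        space -= DX_TAIL_SIZE
    return space // DX_ENTRY_SIZE

def dx_root_limit(block_size, info_len=8, csum=True):
    """Entry slots in the root block (after '.', '..' and the root info)."""
    space = block_size - 12 - 12 - info_len
    if csum:
        space -= DX_TAIL_SIZE
    return space // DX_ENTRY_SIZE

print(dx_root_limit(1024), dx_node_limit(1024))  # → 123 126
```

Under these assumptions, a limit of 126 already leaves room for the checksum tail; without the tail the same interior block would hold 127 entries.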

It then prints the error, sets the 'not clean' state in the superblock, and exits. All fsck does is recompute the checksums and set the state back to 'clean'. It does not change the number of entries: count and limit both stay at 126. That is why rerunning tune2fs corrupts the fs again.
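
The cycle described here can be written down as a toy model. This Python sketch is not the tune2fs or e2fsck code; the class and function names are hypothetical stand-ins that encode only the behavior observed in this report:

```python
from dataclasses import dataclass

@dataclass
class IndexNode:
    count: int
    limit: int
    checksum_ok: bool = True

def tune2fs_set_uuid(node):
    """Toy stand-in for the UUID change: bail out when the node is full."""
    if node.count == node.limit:
        node.checksum_ok = False  # rewrite abandoned, fs marked 'not clean'
        return "needs fsck"
    return "ok"

def fsck_repair(node):
    """Toy stand-in for the repair: fix the checksum, leave count and limit alone."""
    node.checksum_ok = True

node = IndexNode(count=126, limit=126)
history = []
for _ in range(3):
    history.append(tune2fs_set_uuid(node))
    fsck_repair(node)
print(history)  # → ['needs fsck', 'needs fsck', 'needs fsck']
```

Because the repair step never changes count or limit, every later attempt hits the same branch.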

So here is the question: what is the correct behavior, and which tool is responsible for this issue?
- tune2fs: should it skip the 'if' statement? (I tried commenting it out; tune2fs then did its job successfully, which I verified with fsck.)
- fsck: should it rebuild the dir, as the comment above the 'if' statement says? (If the htree block is full, then rebuild the dir.)
- mkfs: should it avoid building an image with a full index node (count==limit)?

This issue is not tied to the e2fsprogs version used (1.44.1): I compiled and tried versions 1.43 through 1.45.4, and they all behave the same way. I also tried generating other images (such as core-image-minimal), but none of them led to the corruption. If needed, I can give you access to the corrupted image for further investigation.

I'm not an expert in ext4, so I'd appreciate any advice. Thank you.

BR,
Vilo



end of thread, other threads:[~2019-12-09 14:44 UTC | newest]

Thread overview: 6+ messages
2019-12-05 16:51 e2fsprogs: setting UUID with tune2fs corrupts an ext4 fs image Viliam Lejcik
  -- strict thread matches above, loose matches on Subject: below --
2019-12-05 12:36 Viliam Lejcik
2019-12-05 15:48 ` Theodore Y. Ts'o
2019-12-06  3:51 ` Theodore Y. Ts'o
2019-12-09 14:44   ` Viliam Lejcik
2019-12-05  9:20 Viliam Lejcik
