linux-kernel.vger.kernel.org archive mirror
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Josh Triplett <josh@joshtriplett.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Jan Kara <jack@suse.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-ext4@vger.kernel.org
Subject: Re: ext4 regression in v5.9-rc2 from e7bfb5c9bb3d on ro fs with overlapped bitmaps
Date: Mon, 5 Oct 2020 10:36:39 -0700
Message-ID: <20201005173639.GA2311765@magnolia>
In-Reply-To: <20201005081454.GA493107@localhost>

On Mon, Oct 05, 2020 at 01:14:54AM -0700, Josh Triplett wrote:
> Ran into an ext4 regression when testing upgrades to 5.9-rc kernels:
> 
> Commit e7bfb5c9bb3d ("ext4: handle add_system_zone() failure in
> ext4_setup_system_zone()") breaks mounting of read-only ext4 filesystems
> with intentionally overlapping bitmap blocks.
> 
> On an always-read-only filesystem explicitly marked with
> EXT4_FEATURE_RO_COMPAT_SHARED_BLOCKS, prior to that commit, it's safe to
> point all the block and inode bitmaps to a single block

LOL, WHAT?

I didn't know shared blocks applied to fs metadata.  I thought that
"shared" only applied to file extent maps being able to share physical
blocks.

Could /somebody/ please document the ondisk format changes that are
associated with this feature?

> of all 1s,
> because a read-only filesystem will never allocate or free any blocks or
> inodes.

All 1s?  So the inode bitmap says that every inode table slot is in use,
even if the inode record itself says it isn't?  What does e2fsck -n
think about that kind of metadata inconsistency?
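
For concreteness: in an ext4 bitmap a set bit marks a block or inode as in use, so the free count a tool derives from a bitmap is the number of zero bits. A rough sketch of that counting follows. `count_free_bits` is a hypothetical helper, not something e2fsprogs ships, and it counts every bit in the file it is given, including any padding past the group's last inode.

```shell
# Hypothetical helper, not part of e2fsprogs: count the zero ("free") bits
# in a file that holds one raw bitmap block.
count_free_bits() {
    # od prints every byte as an unsigned decimal; awk tests its 8 bits.
    od -An -v -tu1 "$1" | awk '
        {
            for (f = 1; f <= NF; f++) {
                b = $f
                for (i = 0; i < 8; i++) {
                    if (b % 2 == 0) nfree++
                    b = int(b / 2)
                }
            }
        }
        END { print nfree + 0 }'
}

# A 4096-byte block of 0xFF bytes has no zero bits, so nothing looks free.
ones=$(mktemp)
dd if=/dev/zero bs=4096 count=1 2>/dev/null | LC_ALL=C tr '\000' '\377' > "$ones"
count_free_bits "$ones"
rm -f "$ones"
```

With an all-1s bitmap the helper reports zero free bits, whatever the inode records themselves say.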

Hmm, let's try.

$ truncate -s 300m /tmp/a.img
$ mke2fs -T ext4 -O shared_blocks /tmp/a.img -d /tmp/ -F
mke2fs 1.46~WIP-2020-10-04 (4-Oct-2020)
Invalid filesystem option set: shared_blocks

Oookay.  So that's not how you create these shared block ext4s,
apparently...

$ mke2fs -T ext4 /tmp/a.img -F
mke2fs 1.46~WIP-2020-10-04 (4-Oct-2020)
Discarding device blocks: done
Creating filesystem with 76800 4k blocks and 19200 inodes
Filesystem UUID: 0a763191-89ca-49bc-9dc6-bf2986009ad9
Superblock backups stored on blocks:
        32768

Allocating group tables: done
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

$ debugfs -w /tmp/a.img
debugfs 1.45.6 (20-Mar-2020)
debugfs:  features shared_blocks
Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum shared_blocks
debugfs:  set_bg 1 inode_bitmap 42
debugfs:  set_bg 1 block_bitmap 39
debugfs:  stats
 Group  0: block bitmap at 39, inode bitmap at 42, inode table at 45
           31517 free blocks, 6389 free inodes, 2 used directories, 6389 unused inodes
           [Checksum 0xda06]
 Group  1: block bitmap at 39, inode bitmap at 42, inode table at 445
           28633 free blocks, 6400 free inodes, 0 used directories, 6400 unused inodes
           [Inode not init, Checksum 0x2e69]
$ xfs_io -c "pwrite -S 0xFF $((39 * 4096)) 4096" /tmp/a.img
$ xfs_io -c "pwrite -S 0xFF $((42 * 4096)) 4096" /tmp/a.img

Ok, now we have a shared blocks fs where BG 0 and BG 1 share bitmaps,
and the shared bitmap blocks are filled with all 1s.
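
The pwrite offsets above are just block number times block size (4k, per the mke2fs output). A small sketch of that arithmetic, plus a way to confirm a block really is all 0xFF afterwards. Both `block_offset` and `block_is_all_ones` are hypothetical helpers written for this illustration.

```shell
# Hypothetical helper: byte offset of block $1 for block size $2.
block_offset() {
    echo $(( $1 * $2 ))
}

# Hypothetical helper: succeed only if block $2 of image $1 (block size $3)
# was read in full and contains nothing but 0xFF bytes.
block_is_all_ones() {
    img=$1 blk=$2 bs=$3
    total=$(dd if="$img" bs="$bs" skip="$blk" count=1 2>/dev/null | wc -c)
    # Delete every 0xFF byte; whatever is left is a byte with a zero bit.
    other=$(dd if="$img" bs="$bs" skip="$blk" count=1 2>/dev/null |
        LC_ALL=C tr -d '\377' | wc -c)
    [ "$total" -eq "$bs" ] && [ "$other" -eq 0 ]
}

block_offset 39 4096   # 159744, the block bitmap offset used above
block_offset 42 4096   # 172032, the inode bitmap offset used above

# Assumed usage against the image built above:
# block_is_all_ones /tmp/a.img 39 4096 && echo "block 39 is all 1s"
```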

$ e2fsck -n /tmp/a.img 
e2fsck 1.45.6 (20-Mar-2020)
ext2fs_check_desc: Corrupt group descriptor: bad block for block bitmap
e2fsck: Group descriptors look bad... trying backup blocks...
/tmp/a.img was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -(1251--32767)
Fix? no

Free blocks count wrong for group #0 (31517, counted=0).
Fix? no

Free blocks count wrong (71414, counted=39897).
Fix? no

Inode bitmap differences:  -(12--6400)
Fix? no

Free inodes count wrong for group #0 (6389, counted=0).
Fix? no

Free inodes count wrong (19189, counted=12800).
Fix? no

Padding at end of inode bitmap is not set. Fix? no

Inode bitmap differences: Group 0 inode bitmap does not match checksum.
IGNORED.
Group 1 inode bitmap does not match checksum.
IGNORED.
Group 2 inode bitmap does not match checksum.
IGNORED.
Block bitmap differences: Group 0 block bitmap does not match checksum.
IGNORED.

/tmp/a.img: ********** WARNING: Filesystem still has errors **********

/tmp/a.img: 11/19200 files (0.0% non-contiguous), 5386/76800 blocks

Sooooo... are you shipping ext4 images with an undocumented ondisk
format variation that looks like inconsistency to the standard tools?

--D

> 
> However, after that commit, the block validity check rejects such
> filesystems with -EUCLEAN and "failed to initialize system zone (-117)".
> This causes systems that previously worked correctly to fail when
> upgrading to v5.9-rc2 or later.
> 
> This was obviously a bugfix, and I'm not suggesting that it should be
> reverted; it looks like this effectively worked by accident before,
> because the block_validity check wasn't fully functional. However, this
> does break real systems, and I'd like to get some kind of regression fix
> in before 5.9 final if possible. I think it would suffice to make
> block_validity default to false if and only if
> EXT4_FEATURE_RO_COMPAT_SHARED_BLOCKS is set.
> 
> Does that seem like a reasonable fix?
> 
> Here's a quick sketch of a patch, which I've tested and confirmed to
> work:
> 
> ----- 8< -----
> Subject: [PATCH] Fix ext4 regression in v5.9-rc2 on ro fs with overlapped bitmaps
> 
> Commit e7bfb5c9bb3d ("ext4: handle add_system_zone() failure in
> ext4_setup_system_zone()") breaks mounting of read-only ext4 filesystems
> with intentionally overlapping bitmap blocks.
> 
> On an always-read-only filesystem explicitly marked with
> EXT4_FEATURE_RO_COMPAT_SHARED_BLOCKS, prior to that commit, it's safe to
> point all the block and inode bitmaps to a single block of all 1s,
> because a read-only filesystem will never allocate or free any blocks or
> inodes.
> 
> However, after that commit, the block validity check rejects such
> filesystems with -EUCLEAN and "failed to initialize system zone (-117)".
> This causes systems that previously worked correctly to fail when
> upgrading to v5.9-rc2 or later.
> 
> Fix this by defaulting block_validity to off when
> EXT4_FEATURE_RO_COMPAT_SHARED_BLOCKS is set.
> 
> Signed-off-by: Josh Triplett <josh@joshtriplett.org>
> Fixes: e7bfb5c9bb3d ("ext4: handle add_system_zone() failure in ext4_setup_system_zone()")
> ---
>  fs/ext4/ext4.h  | 2 ++
>  fs/ext4/super.c | 3 ++-
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 523e00d7b392..7874028fa864 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1834,6 +1834,7 @@ static inline bool ext4_verity_in_progress(struct inode *inode)
>  #define EXT4_FEATURE_RO_COMPAT_METADATA_CSUM	0x0400
>  #define EXT4_FEATURE_RO_COMPAT_READONLY		0x1000
>  #define EXT4_FEATURE_RO_COMPAT_PROJECT		0x2000
> +#define EXT4_FEATURE_RO_COMPAT_SHARED_BLOCKS	0x4000
>  #define EXT4_FEATURE_RO_COMPAT_VERITY		0x8000
>  
>  #define EXT4_FEATURE_INCOMPAT_COMPRESSION	0x0001
> @@ -1930,6 +1931,7 @@ EXT4_FEATURE_RO_COMPAT_FUNCS(bigalloc,		BIGALLOC)
>  EXT4_FEATURE_RO_COMPAT_FUNCS(metadata_csum,	METADATA_CSUM)
>  EXT4_FEATURE_RO_COMPAT_FUNCS(readonly,		READONLY)
>  EXT4_FEATURE_RO_COMPAT_FUNCS(project,		PROJECT)
> +EXT4_FEATURE_RO_COMPAT_FUNCS(shared_blocks,	SHARED_BLOCKS)
>  EXT4_FEATURE_RO_COMPAT_FUNCS(verity,		VERITY)
>  
>  EXT4_FEATURE_INCOMPAT_FUNCS(compression,	COMPRESSION)
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index ea425b49b345..f57a7e966e44 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -3954,7 +3954,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>  	else
>  		set_opt(sb, ERRORS_RO);
>  	/* block_validity enabled by default; disable with noblock_validity */
> -	set_opt(sb, BLOCK_VALIDITY);
> +	if (!ext4_has_feature_shared_blocks(sb))
> +		set_opt(sb, BLOCK_VALIDITY);
>  	if (def_mount_opts & EXT4_DEFM_DISCARD)
>  		set_opt(sb, DISCARD);
>  
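
The rejection described in the quoted report comes down to an overlap test between metadata block ranges. The sketch below is not the kernel code (that lives behind ext4_setup_system_zone() and add_system_zone()); `ranges_overlap` and `check_zones` are hypothetical names, and the only value borrowed from the thread is 117, the EUCLEAN error number shown in the quoted message.

```shell
# Hypothetical helper: succeed if block ranges [s1, s1+n1) and [s2, s2+n2)
# intersect. Arguments: s1 n1 s2 n2.
ranges_overlap() {
    [ "$1" -lt $(( $3 + $4 )) ] && [ "$3" -lt $(( $1 + $2 )) ]
}

# Hypothetical check: read "start count" pairs from stdin and return 117
# (EUCLEAN on Linux) as soon as a new range intersects an earlier one.
check_zones() {
    seen=""
    while read -r start count; do
        [ -n "$start" ] || continue
        for z in $seen; do
            if ranges_overlap "$start" "$count" "${z%:*}" "${z#*:}"; then
                echo "zone at $start overlaps zone at ${z%:*}" >&2
                return 117
            fi
        done
        seen="$seen $start:$count"
    done
    return 0
}

# Block bitmap locations from the debugfs stats above: both groups use block 39.
printf '39 1\n39 1\n' | check_zones || echo "rejected with $?"
```

Under this model, any image where two groups point at the same bitmap block is refused once the overlap check actually runs, which is the behaviour change the report describes.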
