* [PATCH v2 0/3] ext4: fix some bugs in online resize
@ 2022-11-16  7:27 Baokun Li
  2022-11-16  7:28 ` [PATCH v2 1/3] ext4: fix bad checksum after " Baokun Li
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Baokun Li @ 2022-11-16  7:27 UTC
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
	yukuai3, libaokun1

V1->V2:
    Replace s_first_data_block with ext4_group_first_block_no() in patch 3
    to avoid a type warning. (Reported-by: kernel test robot <lkp@intel.com>)

Baokun Li (3):
  ext4: fix bad checksum after online resize
  ext4: fix corrupt backup group descriptors after online resize
  ext4: fix corruption when online resizing a 1K bigalloc fs

 fs/ext4/resize.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

-- 
2.31.1



* [PATCH v2 1/3] ext4: fix bad checksum after online resize
  2022-11-16  7:27 [PATCH v2 0/3] ext4: fix some bugs in online resize Baokun Li
@ 2022-11-16  7:28 ` Baokun Li
  2022-11-16  7:28 ` [PATCH v2 2/3] ext4: fix corrupt backup group descriptors " Baokun Li
  2022-11-16  7:28 ` [PATCH v2 3/3] ext4: fix corruption when online resizing a 1K bigalloc fs Baokun Li
  2 siblings, 0 replies; 9+ messages in thread
From: Baokun Li @ 2022-11-16  7:28 UTC
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
	yukuai3, libaokun1, Darrick J . Wong

When online resizing is performed twice in a row, the second resize
reports the error message "Superblock checksum does not match
superblock". Here's the reproducer:

	mkfs.ext4 -F /dev/sdb 100M
	mount /dev/sdb /tmp/test
	resize2fs /dev/sdb 5G
	resize2fs /dev/sdb 6G

To solve this issue, move the checksum update to after
es->s_overhead_clusters is updated.
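
The ordering problem can be illustrated with a minimal user-space sketch.
It is not the kernel code: struct toy_super, toy_csum() and both update
helpers are hypothetical names, and the hash is a placeholder rather than
the crc32c that ext4 uses. It only shows that a checksum computed before
the last covered field is written no longer matches the block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the on-disk superblock; not the real ext4 layout. */
struct toy_super {
	uint32_t blocks_count;
	uint32_t overhead_clusters;
	uint32_t checksum;	/* covers every byte before this field */
};

/* Placeholder hash (not crc32c) over the bytes preceding 'checksum'. */
static uint32_t toy_csum(const struct toy_super *es)
{
	const unsigned char *p = (const unsigned char *)es;
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < offsetof(struct toy_super, checksum); i++)
		sum = ((sum << 5) | (sum >> 27)) ^ p[i];
	return sum;
}

/*
 * Old order: checksum first, late field update second.
 * Returns 1 if the stored checksum still matches the contents.
 */
static int toy_update_old_order(struct toy_super *es, uint32_t overhead)
{
	assert(es != NULL);
	es->checksum = toy_csum(es);
	es->overhead_clusters = overhead;	/* stored checksum is now stale */
	return es->checksum == toy_csum(es);
}

/*
 * New order: update every covered field, then compute the checksum.
 * Returns 1 if the stored checksum matches the contents.
 */
static int toy_update_new_order(struct toy_super *es, uint32_t overhead)
{
	assert(es != NULL);
	es->overhead_clusters = overhead;
	es->checksum = toy_csum(es);
	return es->checksum == toy_csum(es);
}
```

In this model the old order leaves a mismatching checksum whenever the
overhead value changes, while the new order always leaves a matching one.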

Fixes: 026d0d27c488 ("ext4: reduce computation of overhead during resize")
Fixes: de394a86658f ("ext4: update s_overhead_clusters in the superblock during an on-line resize")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/resize.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 46b87ffeb304..cb99b410c9fa 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1476,8 +1476,6 @@ static void ext4_update_super(struct super_block *sb,
 	 * active. */
 	ext4_r_blocks_count_set(es, ext4_r_blocks_count(es) +
 				reserved_blocks);
-	ext4_superblock_csum_set(sb);
-	unlock_buffer(sbi->s_sbh);
 
 	/* Update the free space counts */
 	percpu_counter_add(&sbi->s_freeclusters_counter,
@@ -1513,6 +1511,8 @@ static void ext4_update_super(struct super_block *sb,
 		ext4_calculate_overhead(sb);
 	es->s_overhead_clusters = cpu_to_le32(sbi->s_overhead);
 
+	ext4_superblock_csum_set(sb);
+	unlock_buffer(sbi->s_sbh);
 	if (test_opt(sb, DEBUG))
 		printk(KERN_DEBUG "EXT4-fs: added group %u:"
 		       "%llu blocks(%llu free %llu reserved)\n", flex_gd->count,
-- 
2.31.1



* [PATCH v2 2/3] ext4: fix corrupt backup group descriptors after online resize
  2022-11-16  7:27 [PATCH v2 0/3] ext4: fix some bugs in online resize Baokun Li
  2022-11-16  7:28 ` [PATCH v2 1/3] ext4: fix bad checksum after " Baokun Li
@ 2022-11-16  7:28 ` Baokun Li
  2022-11-16 11:49   ` Jan Kara
  2022-11-16  7:28 ` [PATCH v2 3/3] ext4: fix corruption when online resizing a 1K bigalloc fs Baokun Li
  2 siblings, 1 reply; 9+ messages in thread
From: Baokun Li @ 2022-11-16  7:28 UTC
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
	yukuai3, libaokun1

In commit 9a8c5b0d0615 ("ext4: update the backup superblock's at the end
of the online resize"), it is assumed that update_backups() only updates
backup superblocks, so each b_data is treated as a backup superblock when
updating its s_block_group_nr and s_checksum. However, update_backups()
also updates the backup group descriptors, so the backup group descriptors
are corrupted.

The above commit fixes the problem of an invalid checksum in the backup
superblock. The root cause of that problem is that ext4_update_super()
does not set the checksum correctly, which has been fixed in the previous
patch ("ext4: fix bad checksum after online resize"). Therefore, roll back
some of the modifications in the above commit.
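
The corruption can be sketched in user space as follows. This is an
illustration under assumptions, not the kernel code: struct toy_super, the
64-byte field offset and both helpers are hypothetical, and only the
s_block_group_nr write is modelled (the code being removed also writes
s_checksum). The point is that patching every copied block as if it were
a superblock changes bytes in a block that holds group descriptors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_SIZE 1024

/* Toy superblock layout; the field offset is illustrative, not ext4's. */
struct toy_super {
	uint8_t  leading[64];
	uint16_t block_group_nr;
};

/* Copy 'data' into a backup block, then patch it as if it were a superblock. */
static void toy_copy_and_patch(uint8_t *dst, const uint8_t *data,
			       size_t size, uint16_t group)
{
	assert(size <= TOY_BLOCK_SIZE);
	memcpy(dst, data, size);
	memset(dst + size, 0, TOY_BLOCK_SIZE - size);
	/* Unconditional patch: fine for a superblock, corrupting otherwise. */
	memcpy(dst + offsetof(struct toy_super, block_group_nr),
	       &group, sizeof(group));
}

/* Count the bytes in dst[0..size) that no longer match the source data. */
static size_t toy_changed_bytes(const uint8_t *dst, const uint8_t *data,
				size_t size)
{
	size_t i, n = 0;

	for (i = 0; i < size; i++)
		n += dst[i] != data[i];
	return n;
}
```

Feeding a block of group descriptor data through toy_copy_and_patch()
leaves a backup that differs from its source, which is the corruption
described above.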

Fixes: 9a8c5b0d0615 ("ext4: update the backup superblock's at the end of the online resize")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/ext4/resize.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index cb99b410c9fa..32fbfc173571 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1158,7 +1158,6 @@ static void update_backups(struct super_block *sb, sector_t blk_off, char *data,
 	while (group < sbi->s_groups_count) {
 		struct buffer_head *bh;
 		ext4_fsblk_t backup_block;
-		struct ext4_super_block *es;
 
 		/* Out of journal space, and can't get more - abort - so sad */
 		err = ext4_resize_ensure_credits_batch(handle, 1);
@@ -1187,10 +1186,6 @@ static void update_backups(struct super_block *sb, sector_t blk_off, char *data,
 		memcpy(bh->b_data, data, size);
 		if (rest)
 			memset(bh->b_data + size, 0, rest);
-		es = (struct ext4_super_block *) bh->b_data;
-		es->s_block_group_nr = cpu_to_le16(group);
-		if (ext4_has_metadata_csum(sb))
-			es->s_checksum = ext4_superblock_csum(sb, es);
 		set_buffer_uptodate(bh);
 		unlock_buffer(bh);
 		err = ext4_handle_dirty_metadata(handle, NULL, bh);
-- 
2.31.1



* [PATCH v2 3/3] ext4: fix corruption when online resizing a 1K bigalloc fs
  2022-11-16  7:27 [PATCH v2 0/3] ext4: fix some bugs in online resize Baokun Li
  2022-11-16  7:28 ` [PATCH v2 1/3] ext4: fix bad checksum after " Baokun Li
  2022-11-16  7:28 ` [PATCH v2 2/3] ext4: fix corrupt backup group descriptors " Baokun Li
@ 2022-11-16  7:28 ` Baokun Li
  2022-11-16 11:56   ` Jan Kara
  2 siblings, 1 reply; 9+ messages in thread
From: Baokun Li @ 2022-11-16  7:28 UTC
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, ritesh.list, linux-kernel, yi.zhang,
	yukuai3, libaokun1

When a backup superblock is updated in update_backups(), the primary
superblock's offset in the group (that is, sbi->s_sbh->b_blocknr) is used
as the backup superblock's offset in its group. However, when the block
size is 1K and bigalloc is enabled, the two offsets are not equal. This
causes the backup group descriptors to be overwritten by the superblock
in update_backups(). Moreover, if meta_bg is enabled, the file system will
be corrupted because this feature uses backup group descriptors.

To solve this issue, we use a more accurate ext4_group_first_block_no() as
the offset of the backup superblock in its group.
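
The offset mismatch can be sketched with a little arithmetic. This is a
simplified model, not the kernel code: both helpers are hypothetical, the
backup location is modelled as group * blocks_per_group + blk_off, and the
numbers used with it (primary superblock buffer at block 1, first data
block 0, 8192 blocks per group) are assumptions chosen only to illustrate
a 1K bigalloc layout.

```c
#include <assert.h>
#include <stdint.h>

/* First block of 'group', modelled as group * bpg + first_data_block. */
static uint64_t toy_group_first_block(uint64_t group, uint64_t bpg,
				      uint64_t first_data_block)
{
	assert(bpg > 0);
	return group * bpg + first_data_block;
}

/* Block that receives the backup copy when blk_off is the in-group offset. */
static uint64_t toy_backup_block(uint64_t group, uint64_t bpg,
				 uint64_t blk_off)
{
	assert(bpg > 0);
	return group * bpg + blk_off;
}
```

Under these assumptions, passing the primary superblock's block number (1)
as blk_off puts the copy for group 1 one block past the start of the
group, where the model expects the backup group descriptors to begin.
Passing toy_group_first_block(0, ...) instead puts it on the group's first
block.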

Fixes: d77147ff443b ("ext4: add support for online resizing with bigalloc")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
V1->V2:
    Replace s_first_data_block with ext4_group_first_block_no() to avoid
    a type warning. (Reported-by: kernel test robot <lkp@intel.com>)

 fs/ext4/resize.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 32fbfc173571..98e544c2f97d 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1591,8 +1591,8 @@ static int ext4_flex_group_add(struct super_block *sb,
 		int meta_bg = ext4_has_feature_meta_bg(sb);
 		sector_t old_gdb = 0;
 
-		update_backups(sb, sbi->s_sbh->b_blocknr, (char *)es,
-			       sizeof(struct ext4_super_block), 0);
+		update_backups(sb, ext4_group_first_block_no(sb, 0),
+			       (char *)es, sizeof(struct ext4_super_block), 0);
 		for (; gdb_num <= gdb_num_end; gdb_num++) {
 			struct buffer_head *gdb_bh;
 
@@ -1803,7 +1803,7 @@ static int ext4_group_extend_no_check(struct super_block *sb,
 		if (test_opt(sb, DEBUG))
 			printk(KERN_DEBUG "EXT4-fs: extended group to %llu "
 			       "blocks\n", ext4_blocks_count(es));
-		update_backups(sb, EXT4_SB(sb)->s_sbh->b_blocknr,
+		update_backups(sb, ext4_group_first_block_no(sb, 0),
 			       (char *)es, sizeof(struct ext4_super_block), 0);
 	}
 	return err;
-- 
2.31.1



* Re: [PATCH v2 2/3] ext4: fix corrupt backup group descriptors after online resize
  2022-11-16  7:28 ` [PATCH v2 2/3] ext4: fix corrupt backup group descriptors " Baokun Li
@ 2022-11-16 11:49   ` Jan Kara
  2022-11-16 13:14     ` Baokun Li
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2022-11-16 11:49 UTC
  To: Baokun Li
  Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
	linux-kernel, yi.zhang, yukuai3

On Wed 16-11-22 15:28:01, Baokun Li wrote:
> In commit 9a8c5b0d0615 ("ext4: update the backup superblock's at the end
> of the online resize"), it is assumed that update_backups() only updates
> backup superblocks, so each b_data is treated as a backup superblock to
> update its s_block_group_nr and s_checksum. However, update_backups()
> also updates the backup group descriptors, which causes the backup group
> descriptors to be corrupted.
> 
> The above commit fixes the problem of invalid checksum of the backup
> superblock. The root cause of this problem is that the checksum of
> ext4_update_super() is not set correctly. This problem has been fixed
> in the previous patch ("ext4: fix bad checksum after online resize").
> Therefore, roll back some modifications in the above commit.
> 
> Fixes: 9a8c5b0d0615 ("ext4: update the backup superblock's at the end of the online resize")
> Signed-off-by: Baokun Li <libaokun1@huawei.com>

So I agree commit 9a8c5b0d0615 is broken and does corrupt group
descriptors. However I don't see how PATCH 1/3 in this series would fix all
the problems commit 9a8c5b0d0615 is trying to fix. In particular checksums
on backup superblocks will not be properly set by the resize code AFAICT.

								Honza

> ---
>  fs/ext4/resize.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> index cb99b410c9fa..32fbfc173571 100644
> --- a/fs/ext4/resize.c
> +++ b/fs/ext4/resize.c
> @@ -1158,7 +1158,6 @@ static void update_backups(struct super_block *sb, sector_t blk_off, char *data,
>  	while (group < sbi->s_groups_count) {
>  		struct buffer_head *bh;
>  		ext4_fsblk_t backup_block;
> -		struct ext4_super_block *es;
>  
>  		/* Out of journal space, and can't get more - abort - so sad */
>  		err = ext4_resize_ensure_credits_batch(handle, 1);
> @@ -1187,10 +1186,6 @@ static void update_backups(struct super_block *sb, sector_t blk_off, char *data,
>  		memcpy(bh->b_data, data, size);
>  		if (rest)
>  			memset(bh->b_data + size, 0, rest);
> -		es = (struct ext4_super_block *) bh->b_data;
> -		es->s_block_group_nr = cpu_to_le16(group);
> -		if (ext4_has_metadata_csum(sb))
> -			es->s_checksum = ext4_superblock_csum(sb, es);
>  		set_buffer_uptodate(bh);
>  		unlock_buffer(bh);
>  		err = ext4_handle_dirty_metadata(handle, NULL, bh);
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2 3/3] ext4: fix corruption when online resizing a 1K bigalloc fs
  2022-11-16  7:28 ` [PATCH v2 3/3] ext4: fix corruption when online resizing a 1K bigalloc fs Baokun Li
@ 2022-11-16 11:56   ` Jan Kara
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Kara @ 2022-11-16 11:56 UTC
  To: Baokun Li
  Cc: linux-ext4, tytso, adilger.kernel, jack, ritesh.list,
	linux-kernel, yi.zhang, yukuai3

On Wed 16-11-22 15:28:02, Baokun Li wrote:
> When a backup superblock is updated in update_backups(), the primary
> superblock's offset in the group (that is, sbi->s_sbh->b_blocknr) is used
> as the backup superblock's offset in its group. However, when the block
> size is 1K and bigalloc is enabled, the two offsets are not equal. This
> causes the backup group descriptors to be overwritten by the superblock
> in update_backups(). Moreover, if meta_bg is enabled, the file system will
> be corrupted because this feature uses backup group descriptors.
> 
> To solve this issue, we use a more accurate ext4_group_first_block_no() as
> the offset of the backup superblock in its group.
> 
> Fixes: d77147ff443b ("ext4: add support for online resizing with bigalloc")
> Signed-off-by: Baokun Li <libaokun1@huawei.com>

The patch looks good to me. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
> V1->V2:
>     Replace s_first_data_block with ext4_group_first_block_no() to avoid
>     a type warning. (Reported-by: kernel test robot <lkp@intel.com>)
> 
>  fs/ext4/resize.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> index 32fbfc173571..98e544c2f97d 100644
> --- a/fs/ext4/resize.c
> +++ b/fs/ext4/resize.c
> @@ -1591,8 +1591,8 @@ static int ext4_flex_group_add(struct super_block *sb,
>  		int meta_bg = ext4_has_feature_meta_bg(sb);
>  		sector_t old_gdb = 0;
>  
> -		update_backups(sb, sbi->s_sbh->b_blocknr, (char *)es,
> -			       sizeof(struct ext4_super_block), 0);
> +		update_backups(sb, ext4_group_first_block_no(sb, 0),
> +			       (char *)es, sizeof(struct ext4_super_block), 0);
>  		for (; gdb_num <= gdb_num_end; gdb_num++) {
>  			struct buffer_head *gdb_bh;
>  
> @@ -1803,7 +1803,7 @@ static int ext4_group_extend_no_check(struct super_block *sb,
>  		if (test_opt(sb, DEBUG))
>  			printk(KERN_DEBUG "EXT4-fs: extended group to %llu "
>  			       "blocks\n", ext4_blocks_count(es));
> -		update_backups(sb, EXT4_SB(sb)->s_sbh->b_blocknr,
> +		update_backups(sb, ext4_group_first_block_no(sb, 0),
>  			       (char *)es, sizeof(struct ext4_super_block), 0);
>  	}
>  	return err;
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2 2/3] ext4: fix corrupt backup group descriptors after online resize
  2022-11-16 11:49   ` Jan Kara
@ 2022-11-16 13:14     ` Baokun Li
  2022-11-16 15:26       ` Jan Kara
  0 siblings, 1 reply; 9+ messages in thread
From: Baokun Li @ 2022-11-16 13:14 UTC
  To: Jan Kara
  Cc: linux-ext4, tytso, adilger.kernel, ritesh.list, linux-kernel,
	yi.zhang, yukuai3, Baokun Li

On 2022/11/16 19:49, Jan Kara wrote:
> On Wed 16-11-22 15:28:01, Baokun Li wrote:
>> In commit 9a8c5b0d0615 ("ext4: update the backup superblock's at the end
>> of the online resize"), it is assumed that update_backups() only updates
>> backup superblocks, so each b_data is treated as a backup superblock to
>> update its s_block_group_nr and s_checksum. However, update_backups()
>> also updates the backup group descriptors, which causes the backup group
>> descriptors to be corrupted.
>>
>> The above commit fixes the problem of invalid checksum of the backup
>> superblock. The root cause of this problem is that the checksum of
>> ext4_update_super() is not set correctly. This problem has been fixed
>> in the previous patch ("ext4: fix bad checksum after online resize").
>> Therefore, roll back some modifications in the above commit.
>>
>> Fixes: 9a8c5b0d0615 ("ext4: update the backup superblock's at the end of the online resize")
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> So I agree commit 9a8c5b0d0615 is broken and does corrupt group
> descriptors. However I don't see how PATCH 1/3 in this series would fix all
> the problems commit 9a8c5b0d0615 is trying to fix. In particular checksums
> on backup superblocks will not be properly set by the resize code AFAICT.
>
> 								Honza
I didn't realize these two issues were the same until I looked into the
problem in PATCH 3/3 and found that commit 9a8c5b0d0615 introduced a
similar problem. I then found that the backup superblock is copied
directly from the primary superblock, so if the backup superblock is
faulty, the primary superblock must be faulty too. That led me to patch 1,
which fixes the primary superblock problem. I verified this by rolling
back commit 9a8c5b0d0615 and found that patch 1 did fix the problem.

Only ext4_flex_group_add() and ext4_group_extend_no_check() call
update_backups() to update the backup superblock. Both of these functions
correctly set the checksum of the primary superblock. The backup superblocks
that are copied from them are also correct.

In ext4_flex_group_add(), we only update the backup superblocks if there
were no earlier errors, which means ext4_update_super() must already have
updated the checksum before update_backups() runs. The previous problem
was that s_overhead_clusters was modified after the checksum had been
updated in ext4_update_super(), so the checksums of both the primary and
the backup superblocks were incorrect. PATCH 1/3 fixes this, so the
checksum is now set correctly in ext4_flex_group_add().

The same is true in ext4_group_extend_no_check(): we only update the
backup superblocks if there are no errors, and we call
ext4_superblock_csum_set() to update the checksum before updating the
backup superblocks. Therefore, the checksum is set correctly in
ext4_group_extend_no_check() as well.

I think we only need to ensure that the checksum is correct by the time
the buffer lock of sbi->s_sbh is released. The checksum should therefore
already be correct before update_backups() takes the buffer lock. Also,
update_backups() copies the entire superblock, checksum included, so we
don't need to reset it.
>> ---
>>   fs/ext4/resize.c | 5 -----
>>   1 file changed, 5 deletions(-)
>>
>> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
>> index cb99b410c9fa..32fbfc173571 100644
>> --- a/fs/ext4/resize.c
>> +++ b/fs/ext4/resize.c
>> @@ -1158,7 +1158,6 @@ static void update_backups(struct super_block *sb, sector_t blk_off, char *data,
>>   	while (group < sbi->s_groups_count) {
>>   		struct buffer_head *bh;
>>   		ext4_fsblk_t backup_block;
>> -		struct ext4_super_block *es;
>>   
>>   		/* Out of journal space, and can't get more - abort - so sad */
>>   		err = ext4_resize_ensure_credits_batch(handle, 1);
>> @@ -1187,10 +1186,6 @@ static void update_backups(struct super_block *sb, sector_t blk_off, char *data,
>>   		memcpy(bh->b_data, data, size);
>>   		if (rest)
>>   			memset(bh->b_data + size, 0, rest);
>> -		es = (struct ext4_super_block *) bh->b_data;
>> -		es->s_block_group_nr = cpu_to_le16(group);
>> -		if (ext4_has_metadata_csum(sb))
>> -			es->s_checksum = ext4_superblock_csum(sb, es);
>>   		set_buffer_uptodate(bh);
>>   		unlock_buffer(bh);
>>   		err = ext4_handle_dirty_metadata(handle, NULL, bh);
>> -- 
>> 2.31.1
>>
Thank you for your review!
-- 
With Best Regards,
Baokun Li



* Re: [PATCH v2 2/3] ext4: fix corrupt backup group descriptors after online resize
  2022-11-16 13:14     ` Baokun Li
@ 2022-11-16 15:26       ` Jan Kara
  2022-11-17  1:37         ` Baokun Li
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2022-11-16 15:26 UTC
  To: Baokun Li
  Cc: Jan Kara, linux-ext4, tytso, adilger.kernel, ritesh.list,
	linux-kernel, yi.zhang, yukuai3

On Wed 16-11-22 21:14:16, Baokun Li wrote:
> On 2022/11/16 19:49, Jan Kara wrote:
> > On Wed 16-11-22 15:28:01, Baokun Li wrote:
> > > In commit 9a8c5b0d0615 ("ext4: update the backup superblock's at the end
> > > of the online resize"), it is assumed that update_backups() only updates
> > > backup superblocks, so each b_data is treated as a backup superblock to
> > > update its s_block_group_nr and s_checksum. However, update_backups()
> > > also updates the backup group descriptors, which causes the backup group
> > > descriptors to be corrupted.
> > > 
> > > The above commit fixes the problem of invalid checksum of the backup
> > > superblock. The root cause of this problem is that the checksum of
> > > ext4_update_super() is not set correctly. This problem has been fixed
> > > in the previous patch ("ext4: fix bad checksum after online resize").
> > > Therefore, roll back some modifications in the above commit.
> > > 
> > > Fixes: 9a8c5b0d0615 ("ext4: update the backup superblock's at the end of the online resize")
> > > Signed-off-by: Baokun Li <libaokun1@huawei.com>
> > So I agree commit 9a8c5b0d0615 is broken and does corrupt group
> > descriptors. However I don't see how PATCH 1/3 in this series would fix all
> > the problems commit 9a8c5b0d0615 is trying to fix. In particular checksums
> > on backup superblocks will not be properly set by the resize code AFAICT.
> > 
> > 								Honza
> I didn't find these two issues to be the same until I researched the
> problem in PATCH 3/3 and found that commit 9a8c5b0d0615 introduced a
> similar problem. Then, it is found that the backup superblock is directly
> copied from the primary superblock. If the backup superblock is faulty,
> the primary superblock must be faulty. In this case, patch 1 that fixes
> the primary superblock problem is thought of. So by rolling back commit
> 9a8c5b0d0615 to verify, I found that patch 1 did fix the problem.
> 
> Only ext4_flex_group_add() and ext4_group_extend_no_check() call
> update_backups() to update the backup superblock. Both of these functions
> correctly set the checksum of the primary superblock. The backup superblocks
> that are copied from them are also correct.
> 
> In ext4_flex_group_add(), we only update the backup superblock if there
> are no previous errors, indicating that we must have updated the checksum
> in ext4_update_super() before executing update_backups(). The previous
> problem was that after we updated the checksum in ext4_update_super(), we
> modified s_overhead_clusters, so the checksums for both the primary and
> backup superblocks were incorrect. This problem has been fixed in
> PATCH 1/3, so checksum is set correctly in ext4_flex_group_add().
> 
> The same is true in ext4_group_extend_no_check(), we only update the backup
> superblock if there are no errors, and we execute ext4_superblock_csum_set()
> to update the checksum before updating the backup superblock. Therefore,
> checksum is correctly set in ext4_group_extend_no_check().
> 
> I think we only need to ensure that the checksum is set correctly when
> the buffer lock of sbi->s_sbh is unlocked. Therefore, the checksum should
> be correct before update_backups() holds the buffer lock. Also, in
> update_backups() we copy the entire superblock completely, and the
> checksum is unchanged, so we don't need to reset it.

So I agree the checksum should be matching but the backup superblock should
have also s_block_group_nr set properly and after updating that we need to
recalculate the checksum as well.

								Honza
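
To make the point concrete, here is a minimal user-space sketch of the
idea. It is only an illustration and not a proposed patch: struct
toy_super, toy_csum(), toy_write_backup() and the is_super flag are all
hypothetical, and the hash is a placeholder rather than crc32c. The copy
is patched with its group number and a fresh checksum only when it is a
superblock, so group descriptor blocks are copied untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for the on-disk superblock; not the real ext4 layout. */
struct toy_super {
	uint32_t blocks_count;
	uint16_t block_group_nr;
	uint16_t pad;
	uint32_t checksum;	/* covers every byte before this field */
};

/* Placeholder hash (not crc32c) over the bytes preceding 'checksum'. */
static uint32_t toy_csum(const struct toy_super *es)
{
	const unsigned char *p = (const unsigned char *)es;
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < offsetof(struct toy_super, checksum); i++)
		sum = ((sum << 5) | (sum >> 27)) ^ p[i];
	return sum;
}

/* Copy 'data' into a backup block; patch it only when it is a superblock. */
static void toy_write_backup(void *dst, const void *data, size_t size,
			     int is_super, uint16_t group)
{
	assert(!is_super || size >= sizeof(struct toy_super));
	memcpy(dst, data, size);
	if (is_super) {
		struct toy_super es;

		memcpy(&es, dst, sizeof(es));
		es.block_group_nr = group;	/* per-backup field */
		es.checksum = toy_csum(&es);	/* recompute after the change */
		memcpy(dst, &es, sizeof(es));
	}
}
```

With is_super set, the backup carries its own group number and a checksum
that matches it. With is_super clear, the destination stays a byte-for-byte
copy of the source.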

> > > ---
> > >   fs/ext4/resize.c | 5 -----
> > >   1 file changed, 5 deletions(-)
> > > 
> > > diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> > > index cb99b410c9fa..32fbfc173571 100644
> > > --- a/fs/ext4/resize.c
> > > +++ b/fs/ext4/resize.c
> > > @@ -1158,7 +1158,6 @@ static void update_backups(struct super_block *sb, sector_t blk_off, char *data,
> > >   	while (group < sbi->s_groups_count) {
> > >   		struct buffer_head *bh;
> > >   		ext4_fsblk_t backup_block;
> > > -		struct ext4_super_block *es;
> > >   		/* Out of journal space, and can't get more - abort - so sad */
> > >   		err = ext4_resize_ensure_credits_batch(handle, 1);
> > > @@ -1187,10 +1186,6 @@ static void update_backups(struct super_block *sb, sector_t blk_off, char *data,
> > >   		memcpy(bh->b_data, data, size);
> > >   		if (rest)
> > >   			memset(bh->b_data + size, 0, rest);
> > > -		es = (struct ext4_super_block *) bh->b_data;
> > > -		es->s_block_group_nr = cpu_to_le16(group);
> > > -		if (ext4_has_metadata_csum(sb))
> > > -			es->s_checksum = ext4_superblock_csum(sb, es);
> > >   		set_buffer_uptodate(bh);
> > >   		unlock_buffer(bh);
> > >   		err = ext4_handle_dirty_metadata(handle, NULL, bh);
> > > -- 
> > > 2.31.1
> > > 
> Thank you for your review!
> -- 
> With Best Regards,
> Baokun Li
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2 2/3] ext4: fix corrupt backup group descriptors after online resize
  2022-11-16 15:26       ` Jan Kara
@ 2022-11-17  1:37         ` Baokun Li
  0 siblings, 0 replies; 9+ messages in thread
From: Baokun Li @ 2022-11-17  1:37 UTC
  To: Jan Kara
  Cc: linux-ext4, tytso, adilger.kernel, ritesh.list, linux-kernel,
	yi.zhang, yukuai3

On 2022/11/16 23:26, Jan Kara wrote:
> On Wed 16-11-22 21:14:16, Baokun Li wrote:
>> On 2022/11/16 19:49, Jan Kara wrote:
>>> On Wed 16-11-22 15:28:01, Baokun Li wrote:
>>>> In commit 9a8c5b0d0615 ("ext4: update the backup superblock's at the end
>>>> of the online resize"), it is assumed that update_backups() only updates
>>>> backup superblocks, so each b_data is treated as a backup superblock to
>>>> update its s_block_group_nr and s_checksum. However, update_backups()
>>>> also updates the backup group descriptors, which causes the backup group
>>>> descriptors to be corrupted.
>>>>
>>>> The above commit fixes the problem of invalid checksum of the backup
>>>> superblock. The root cause of this problem is that the checksum of
>>>> ext4_update_super() is not set correctly. This problem has been fixed
>>>> in the previous patch ("ext4: fix bad checksum after online resize").
>>>> Therefore, roll back some modifications in the above commit.
>>>>
>>>> Fixes: 9a8c5b0d0615 ("ext4: update the backup superblock's at the end of the online resize")
>>>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>>> So I agree commit 9a8c5b0d0615 is broken and does corrupt group
>>> descriptors. However I don't see how PATCH 1/3 in this series would fix all
>>> the problems commit 9a8c5b0d0615 is trying to fix. In particular checksums
>>> on backup superblocks will not be properly set by the resize code AFAICT.
>>>
>>> 								Honza
>> I didn't find these two issues to be the same until I researched the
>> problem in PATCH 3/3 and found that commit 9a8c5b0d0615 introduced a
>> similar problem. Then, it is found that the backup superblock is directly
>> copied from the primary superblock. If the backup superblock is faulty,
>> the primary superblock must be faulty. In this case, patch 1 that fixes
>> the primary superblock problem is thought of. So by rolling back commit
>> 9a8c5b0d0615 to verify, I found that patch 1 did fix the problem.
>>
>> Only ext4_flex_group_add() and ext4_group_extend_no_check() call
>> update_backups() to update the backup superblock. Both of these functions
>> correctly set the checksum of the primary superblock. The backup superblocks
>> that are copied from them are also correct.
>>
>> In ext4_flex_group_add(), we only update the backup superblock if there
>> are no previous errors, indicating that we must have updated the checksum
>> in ext4_update_super() before executing update_backups(). The previous
>> problem was that after we updated the checksum in ext4_update_super(), we
>> modified s_overhead_clusters, so the checksums for both the primary and
>> backup superblocks were incorrect. This problem has been fixed in
>> PATCH 1/3, so checksum is set correctly in ext4_flex_group_add().
>>
>> The same is true in ext4_group_extend_no_check(), we only update the backup
>> superblock if there are no errors, and we execute ext4_superblock_csum_set()
>> to update the checksum before updating the backup superblock. Therefore,
>> checksum is correctly set in ext4_group_extend_no_check().
>>
>> I think we only need to ensure that the checksum is set correctly when
>> the buffer lock of sbi->s_sbh is unlocked. Therefore, the checksum should
>> be correct before update_backups() holds the buffer lock. Also, in
>> update_backups() we copy the entire superblock completely, and the
>> checksum is unchanged, so we don't need to reset it.
> So I agree the checksum should be matching but the backup superblock should
> have also s_block_group_nr set properly and after updating that we need to
> recalculate the checksum as well.
>
> 								Honza

Totally agree!

I will try to fix this in a better way in V3.

>>>> ---
>>>>    fs/ext4/resize.c | 5 -----
>>>>    1 file changed, 5 deletions(-)
>>>>
>>>>
Thank you for your review!
-- 
With Best Regards,
Baokun Li


end of thread, other threads:[~2022-11-17  1:37 UTC | newest]

Thread overview: 9+ messages
2022-11-16  7:27 [PATCH v2 0/3] ext4: fix some bugs in online resize Baokun Li
2022-11-16  7:28 ` [PATCH v2 1/3] ext4: fix bad checksum after " Baokun Li
2022-11-16  7:28 ` [PATCH v2 2/3] ext4: fix corrupt backup group descriptors " Baokun Li
2022-11-16 11:49   ` Jan Kara
2022-11-16 13:14     ` Baokun Li
2022-11-16 15:26       ` Jan Kara
2022-11-17  1:37         ` Baokun Li
2022-11-16  7:28 ` [PATCH v2 3/3] ext4: fix corruption when online resizing a 1K bigalloc fs Baokun Li
2022-11-16 11:56   ` Jan Kara
