linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] ext4: fix a inode checksum error
@ 2021-08-10 14:27 Zhang Yi
  2021-08-10 14:27 ` [PATCH 1/3] ext4: move inode eio simulation behind io completeion Zhang Yi
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Zhang Yi @ 2021-08-10 14:27 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

We find a checksum error and a inode corruption problem while doing
stress test, this 3 patches address to fix them.

 - Checksum error

    EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid

 - Inode corruption

    e2fsck 1.46.0 (29-Jan-2020)
    Pass 1: Checking inodes, blocks, and sizes
    Pass 2: Checking directory structure
    Entry 'foo' in / (2) has deleted/unused inode 17.  Clear<y>? yes
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    Inode bitmap differences:  -17
    Fix<y>? yes
    Free inodes count wrong for group #0 (32750, counted=32751).
    Fix<y>? yes
    Free inodes count wrong (32750, counted=32751).
    Fix<y>? yes

The first patch is relate to the error simulation, and the second patch
is just a cleanup patch, which are prepare to do the fix. The last patch
fix these two issue.

Thanks,
Yi.



Zhang Yi (3):
  ext4: move inode eio simulation behind io completeion
  ext4: remove an unnecessary if statement in __ext4_get_inode_loc()
  ext4: prevent getting empty inode buffer

 fs/ext4/inode.c | 177 +++++++++++++++++++++++++-----------------------
 1 file changed, 94 insertions(+), 83 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] ext4: move inode eio simulation behind io completeion
  2021-08-10 14:27 [PATCH 0/3] ext4: fix a inode checksum error Zhang Yi
@ 2021-08-10 14:27 ` Zhang Yi
  2021-08-13 12:55   ` Jan Kara
  2021-08-10 14:27 ` [PATCH 2/3] ext4: remove an unnecessary if statement in __ext4_get_inode_loc() Zhang Yi
  2021-08-10 14:27 ` [PATCH 3/3] ext4: prevent getting empty inode buffer Zhang Yi
  2 siblings, 1 reply; 11+ messages in thread
From: Zhang Yi @ 2021-08-10 14:27 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

No EIO simulation is required if the buffer is uptodate, so move the
simulation behind read bio completeion just like inode/block bitmap
simulation does.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/inode.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d8de607849df..eb2526a35254 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4330,8 +4330,6 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
 	bh = sb_getblk(sb, block);
 	if (unlikely(!bh))
 		return -ENOMEM;
-	if (ext4_simulate_fail(sb, EXT4_SIM_INODE_EIO))
-		goto simulate_eio;
 	if (!buffer_uptodate(bh)) {
 		lock_buffer(bh);
 
@@ -4418,8 +4416,8 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
 		ext4_read_bh_nowait(bh, REQ_META | REQ_PRIO, NULL);
 		blk_finish_plug(&plug);
 		wait_on_buffer(bh);
+		ext4_simulate_fail_bh(sb, bh, EXT4_SIM_INODE_EIO);
 		if (!buffer_uptodate(bh)) {
-		simulate_eio:
 			if (ret_block)
 				*ret_block = block;
 			brelse(bh);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] ext4: remove an unnecessary if statement in __ext4_get_inode_loc()
  2021-08-10 14:27 [PATCH 0/3] ext4: fix a inode checksum error Zhang Yi
  2021-08-10 14:27 ` [PATCH 1/3] ext4: move inode eio simulation behind io completeion Zhang Yi
@ 2021-08-10 14:27 ` Zhang Yi
  2021-08-13 13:00   ` Jan Kara
  2021-08-10 14:27 ` [PATCH 3/3] ext4: prevent getting empty inode buffer Zhang Yi
  2 siblings, 1 reply; 11+ messages in thread
From: Zhang Yi @ 2021-08-10 14:27 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

The "if (!buffer_uptodate(bh))" hunk covered almost the whole code after
getting buffer in __ext4_get_inode_loc() which seems unnecessary, remove
it and switch to check ext4_buffer_uptodate(), it simplify code and make
it more readable.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/inode.c | 162 +++++++++++++++++++++++-------------------------
 1 file changed, 78 insertions(+), 84 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index eb2526a35254..eae1b2d0b550 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4330,99 +4330,93 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
 	bh = sb_getblk(sb, block);
 	if (unlikely(!bh))
 		return -ENOMEM;
-	if (!buffer_uptodate(bh)) {
-		lock_buffer(bh);
-
-		if (ext4_buffer_uptodate(bh)) {
-			/* someone brought it uptodate while we waited */
-			unlock_buffer(bh);
-			goto has_buffer;
-		}
+	if (ext4_buffer_uptodate(bh))
+		goto has_buffer;
 
-		/*
-		 * If we have all information of the inode in memory and this
-		 * is the only valid inode in the block, we need not read the
-		 * block.
-		 */
-		if (in_mem) {
-			struct buffer_head *bitmap_bh;
-			int i, start;
+	lock_buffer(bh);
+	/*
+	 * If we have all information of the inode in memory and this
+	 * is the only valid inode in the block, we need not read the
+	 * block.
+	 */
+	if (in_mem) {
+		struct buffer_head *bitmap_bh;
+		int i, start;
 
-			start = inode_offset & ~(inodes_per_block - 1);
+		start = inode_offset & ~(inodes_per_block - 1);
 
-			/* Is the inode bitmap in cache? */
-			bitmap_bh = sb_getblk(sb, ext4_inode_bitmap(sb, gdp));
-			if (unlikely(!bitmap_bh))
-				goto make_io;
+		/* Is the inode bitmap in cache? */
+		bitmap_bh = sb_getblk(sb, ext4_inode_bitmap(sb, gdp));
+		if (unlikely(!bitmap_bh))
+			goto make_io;
 
-			/*
-			 * If the inode bitmap isn't in cache then the
-			 * optimisation may end up performing two reads instead
-			 * of one, so skip it.
-			 */
-			if (!buffer_uptodate(bitmap_bh)) {
-				brelse(bitmap_bh);
-				goto make_io;
-			}
-			for (i = start; i < start + inodes_per_block; i++) {
-				if (i == inode_offset)
-					continue;
-				if (ext4_test_bit(i, bitmap_bh->b_data))
-					break;
-			}
+		/*
+		 * If the inode bitmap isn't in cache then the
+		 * optimisation may end up performing two reads instead
+		 * of one, so skip it.
+		 */
+		if (!buffer_uptodate(bitmap_bh)) {
 			brelse(bitmap_bh);
-			if (i == start + inodes_per_block) {
-				/* all other inodes are free, so skip I/O */
-				memset(bh->b_data, 0, bh->b_size);
-				set_buffer_uptodate(bh);
-				unlock_buffer(bh);
-				goto has_buffer;
-			}
+			goto make_io;
+		}
+		for (i = start; i < start + inodes_per_block; i++) {
+			if (i == inode_offset)
+				continue;
+			if (ext4_test_bit(i, bitmap_bh->b_data))
+				break;
 		}
+		brelse(bitmap_bh);
+		if (i == start + inodes_per_block) {
+			/* all other inodes are free, so skip I/O */
+			memset(bh->b_data, 0, bh->b_size);
+			set_buffer_uptodate(bh);
+			unlock_buffer(bh);
+			goto has_buffer;
+		}
+	}
 
 make_io:
-		/*
-		 * If we need to do any I/O, try to pre-readahead extra
-		 * blocks from the inode table.
-		 */
-		blk_start_plug(&plug);
-		if (EXT4_SB(sb)->s_inode_readahead_blks) {
-			ext4_fsblk_t b, end, table;
-			unsigned num;
-			__u32 ra_blks = EXT4_SB(sb)->s_inode_readahead_blks;
-
-			table = ext4_inode_table(sb, gdp);
-			/* s_inode_readahead_blks is always a power of 2 */
-			b = block & ~((ext4_fsblk_t) ra_blks - 1);
-			if (table > b)
-				b = table;
-			end = b + ra_blks;
-			num = EXT4_INODES_PER_GROUP(sb);
-			if (ext4_has_group_desc_csum(sb))
-				num -= ext4_itable_unused_count(sb, gdp);
-			table += num / inodes_per_block;
-			if (end > table)
-				end = table;
-			while (b <= end)
-				ext4_sb_breadahead_unmovable(sb, b++);
-		}
+	/*
+	 * If we need to do any I/O, try to pre-readahead extra
+	 * blocks from the inode table.
+	 */
+	blk_start_plug(&plug);
+	if (EXT4_SB(sb)->s_inode_readahead_blks) {
+		ext4_fsblk_t b, end, table;
+		unsigned num;
+		__u32 ra_blks = EXT4_SB(sb)->s_inode_readahead_blks;
+
+		table = ext4_inode_table(sb, gdp);
+		/* s_inode_readahead_blks is always a power of 2 */
+		b = block & ~((ext4_fsblk_t) ra_blks - 1);
+		if (table > b)
+			b = table;
+		end = b + ra_blks;
+		num = EXT4_INODES_PER_GROUP(sb);
+		if (ext4_has_group_desc_csum(sb))
+			num -= ext4_itable_unused_count(sb, gdp);
+		table += num / inodes_per_block;
+		if (end > table)
+			end = table;
+		while (b <= end)
+			ext4_sb_breadahead_unmovable(sb, b++);
+	}
 
-		/*
-		 * There are other valid inodes in the buffer, this inode
-		 * has in-inode xattrs, or we don't have this inode in memory.
-		 * Read the block from disk.
-		 */
-		trace_ext4_load_inode(sb, ino);
-		ext4_read_bh_nowait(bh, REQ_META | REQ_PRIO, NULL);
-		blk_finish_plug(&plug);
-		wait_on_buffer(bh);
-		ext4_simulate_fail_bh(sb, bh, EXT4_SIM_INODE_EIO);
-		if (!buffer_uptodate(bh)) {
-			if (ret_block)
-				*ret_block = block;
-			brelse(bh);
-			return -EIO;
-		}
+	/*
+	 * There are other valid inodes in the buffer, this inode
+	 * has in-inode xattrs, or we don't have this inode in memory.
+	 * Read the block from disk.
+	 */
+	trace_ext4_load_inode(sb, ino);
+	ext4_read_bh_nowait(bh, REQ_META | REQ_PRIO, NULL);
+	blk_finish_plug(&plug);
+	wait_on_buffer(bh);
+	ext4_simulate_fail_bh(sb, bh, EXT4_SIM_INODE_EIO);
+	if (!buffer_uptodate(bh)) {
+		if (ret_block)
+			*ret_block = block;
+		brelse(bh);
+		return -EIO;
 	}
 has_buffer:
 	iloc->bh = bh;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] ext4: prevent getting empty inode buffer
  2021-08-10 14:27 [PATCH 0/3] ext4: fix a inode checksum error Zhang Yi
  2021-08-10 14:27 ` [PATCH 1/3] ext4: move inode eio simulation behind io completeion Zhang Yi
  2021-08-10 14:27 ` [PATCH 2/3] ext4: remove an unnecessary if statement in __ext4_get_inode_loc() Zhang Yi
@ 2021-08-10 14:27 ` Zhang Yi
  2021-08-13 13:44   ` Jan Kara
  2 siblings, 1 reply; 11+ messages in thread
From: Zhang Yi @ 2021-08-10 14:27 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

In ext4_get_inode_loc(), we may skip IO and get an zero && uptodate
inode buffer when the inode monopolize an inode block for performance
reason. For most cases, ext4_mark_iloc_dirty() will fill the inode
buffer to make it fine, but we could miss this call if something bad
happened. Finally, __ext4_get_inode_loc_noinmem() may probably get an
empty inode buffer and trigger ext4 error.

For example, if we remove a nonexistent xattr on inode A,
ext4_xattr_set_handle() will return ENODATA before invoking
ext4_mark_iloc_dirty(), it will left an uptodate but zero buffer. We
will get checksum error message in ext4_iget() when getting inode again.

  EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid

Even worse, if we allocate another inode B at the same inode block, it
will corrupt the inode A on disk when write back inode B.

So this patch clear uptodate flag and mark buffer new if we get an empty
buffer, clear it after we fill inode data or making read IO.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/inode.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index eae1b2d0b550..1f36289e9ef6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4292,6 +4292,18 @@ int ext4_truncate(struct inode *inode)
 	return err;
 }
 
+static void ext4_end_inode_loc_read(struct buffer_head *bh, int uptodate)
+{
+	if (buffer_new(bh))
+		clear_buffer_new(bh);
+	if (uptodate)
+		set_buffer_uptodate(bh);
+	else
+		clear_buffer_uptodate(bh);
+	unlock_buffer(bh);
+	put_bh(bh);
+}
+
 /*
  * ext4_get_inode_loc returns with an extra refcount against the inode's
  * underlying buffer_head on success. If 'in_mem' is true, we have all
@@ -4367,9 +4379,11 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
 		}
 		brelse(bitmap_bh);
 		if (i == start + inodes_per_block) {
-			/* all other inodes are free, so skip I/O */
-			memset(bh->b_data, 0, bh->b_size);
-			set_buffer_uptodate(bh);
+			if (!buffer_new(bh)) {
+				/* all other inodes are free, so skip I/O */
+				memset(bh->b_data, 0, bh->b_size);
+				set_buffer_new(bh);
+			}
 			unlock_buffer(bh);
 			goto has_buffer;
 		}
@@ -4408,7 +4422,7 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
 	 * Read the block from disk.
 	 */
 	trace_ext4_load_inode(sb, ino);
-	ext4_read_bh_nowait(bh, REQ_META | REQ_PRIO, NULL);
+	ext4_read_bh_nowait(bh, REQ_META | REQ_PRIO, ext4_end_inode_loc_read);
 	blk_finish_plug(&plug);
 	wait_on_buffer(bh);
 	ext4_simulate_fail_bh(sb, bh, EXT4_SIM_INODE_EIO);
@@ -5132,6 +5146,11 @@ static int ext4_do_update_inode(handle_t *handle,
 	if (err)
 		goto out_brelse;
 	ext4_clear_inode_state(inode, EXT4_STATE_NEW);
+	if (buffer_new(bh)) {
+		clear_buffer_new(bh);
+		set_buffer_uptodate(bh);
+	}
+
 	if (set_large_file) {
 		BUFFER_TRACE(EXT4_SB(sb)->s_sbh, "get write access");
 		err = ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] ext4: move inode eio simulation behind io completeion
  2021-08-10 14:27 ` [PATCH 1/3] ext4: move inode eio simulation behind io completeion Zhang Yi
@ 2021-08-13 12:55   ` Jan Kara
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Kara @ 2021-08-13 12:55 UTC (permalink / raw)
  To: Zhang Yi; +Cc: linux-ext4, tytso, adilger.kernel, jack, yukuai3

On Tue 10-08-21 22:27:20, Zhang Yi wrote:
> No EIO simulation is required if the buffer is uptodate, so move the
> simulation behind read bio completeion just like inode/block bitmap
> simulation does.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Makes sense. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/inode.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index d8de607849df..eb2526a35254 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4330,8 +4330,6 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
>  	bh = sb_getblk(sb, block);
>  	if (unlikely(!bh))
>  		return -ENOMEM;
> -	if (ext4_simulate_fail(sb, EXT4_SIM_INODE_EIO))
> -		goto simulate_eio;
>  	if (!buffer_uptodate(bh)) {
>  		lock_buffer(bh);
>  
> @@ -4418,8 +4416,8 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
>  		ext4_read_bh_nowait(bh, REQ_META | REQ_PRIO, NULL);
>  		blk_finish_plug(&plug);
>  		wait_on_buffer(bh);
> +		ext4_simulate_fail_bh(sb, bh, EXT4_SIM_INODE_EIO);
>  		if (!buffer_uptodate(bh)) {
> -		simulate_eio:
>  			if (ret_block)
>  				*ret_block = block;
>  			brelse(bh);
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] ext4: remove an unnecessary if statement in __ext4_get_inode_loc()
  2021-08-10 14:27 ` [PATCH 2/3] ext4: remove an unnecessary if statement in __ext4_get_inode_loc() Zhang Yi
@ 2021-08-13 13:00   ` Jan Kara
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Kara @ 2021-08-13 13:00 UTC (permalink / raw)
  To: Zhang Yi; +Cc: linux-ext4, tytso, adilger.kernel, jack, yukuai3

On Tue 10-08-21 22:27:21, Zhang Yi wrote:
> The "if (!buffer_uptodate(bh))" hunk covered almost the whole code after
> getting buffer in __ext4_get_inode_loc() which seems unnecessary, remove
> it and switch to check ext4_buffer_uptodate(), it simplify code and make
> it more readable.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/inode.c | 162 +++++++++++++++++++++++-------------------------
>  1 file changed, 78 insertions(+), 84 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index eb2526a35254..eae1b2d0b550 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4330,99 +4330,93 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
>  	bh = sb_getblk(sb, block);
>  	if (unlikely(!bh))
>  		return -ENOMEM;
> -	if (!buffer_uptodate(bh)) {
> -		lock_buffer(bh);
> -
> -		if (ext4_buffer_uptodate(bh)) {
> -			/* someone brought it uptodate while we waited */
> -			unlock_buffer(bh);
> -			goto has_buffer;
> -		}
> +	if (ext4_buffer_uptodate(bh))
> +		goto has_buffer;
>  
> -		/*
> -		 * If we have all information of the inode in memory and this
> -		 * is the only valid inode in the block, we need not read the
> -		 * block.
> -		 */
> -		if (in_mem) {
> -			struct buffer_head *bitmap_bh;
> -			int i, start;
> +	lock_buffer(bh);
> +	/*
> +	 * If we have all information of the inode in memory and this
> +	 * is the only valid inode in the block, we need not read the
> +	 * block.
> +	 */
> +	if (in_mem) {
> +		struct buffer_head *bitmap_bh;
> +		int i, start;
>  
> -			start = inode_offset & ~(inodes_per_block - 1);
> +		start = inode_offset & ~(inodes_per_block - 1);
>  
> -			/* Is the inode bitmap in cache? */
> -			bitmap_bh = sb_getblk(sb, ext4_inode_bitmap(sb, gdp));
> -			if (unlikely(!bitmap_bh))
> -				goto make_io;
> +		/* Is the inode bitmap in cache? */
> +		bitmap_bh = sb_getblk(sb, ext4_inode_bitmap(sb, gdp));
> +		if (unlikely(!bitmap_bh))
> +			goto make_io;
>  
> -			/*
> -			 * If the inode bitmap isn't in cache then the
> -			 * optimisation may end up performing two reads instead
> -			 * of one, so skip it.
> -			 */
> -			if (!buffer_uptodate(bitmap_bh)) {
> -				brelse(bitmap_bh);
> -				goto make_io;
> -			}
> -			for (i = start; i < start + inodes_per_block; i++) {
> -				if (i == inode_offset)
> -					continue;
> -				if (ext4_test_bit(i, bitmap_bh->b_data))
> -					break;
> -			}
> +		/*
> +		 * If the inode bitmap isn't in cache then the
> +		 * optimisation may end up performing two reads instead
> +		 * of one, so skip it.
> +		 */
> +		if (!buffer_uptodate(bitmap_bh)) {
>  			brelse(bitmap_bh);
> -			if (i == start + inodes_per_block) {
> -				/* all other inodes are free, so skip I/O */
> -				memset(bh->b_data, 0, bh->b_size);
> -				set_buffer_uptodate(bh);
> -				unlock_buffer(bh);
> -				goto has_buffer;
> -			}
> +			goto make_io;
> +		}
> +		for (i = start; i < start + inodes_per_block; i++) {
> +			if (i == inode_offset)
> +				continue;
> +			if (ext4_test_bit(i, bitmap_bh->b_data))
> +				break;
>  		}
> +		brelse(bitmap_bh);
> +		if (i == start + inodes_per_block) {
> +			/* all other inodes are free, so skip I/O */
> +			memset(bh->b_data, 0, bh->b_size);
> +			set_buffer_uptodate(bh);
> +			unlock_buffer(bh);
> +			goto has_buffer;
> +		}
> +	}
>  
>  make_io:
> -		/*
> -		 * If we need to do any I/O, try to pre-readahead extra
> -		 * blocks from the inode table.
> -		 */
> -		blk_start_plug(&plug);
> -		if (EXT4_SB(sb)->s_inode_readahead_blks) {
> -			ext4_fsblk_t b, end, table;
> -			unsigned num;
> -			__u32 ra_blks = EXT4_SB(sb)->s_inode_readahead_blks;
> -
> -			table = ext4_inode_table(sb, gdp);
> -			/* s_inode_readahead_blks is always a power of 2 */
> -			b = block & ~((ext4_fsblk_t) ra_blks - 1);
> -			if (table > b)
> -				b = table;
> -			end = b + ra_blks;
> -			num = EXT4_INODES_PER_GROUP(sb);
> -			if (ext4_has_group_desc_csum(sb))
> -				num -= ext4_itable_unused_count(sb, gdp);
> -			table += num / inodes_per_block;
> -			if (end > table)
> -				end = table;
> -			while (b <= end)
> -				ext4_sb_breadahead_unmovable(sb, b++);
> -		}
> +	/*
> +	 * If we need to do any I/O, try to pre-readahead extra
> +	 * blocks from the inode table.
> +	 */
> +	blk_start_plug(&plug);
> +	if (EXT4_SB(sb)->s_inode_readahead_blks) {
> +		ext4_fsblk_t b, end, table;
> +		unsigned num;
> +		__u32 ra_blks = EXT4_SB(sb)->s_inode_readahead_blks;
> +
> +		table = ext4_inode_table(sb, gdp);
> +		/* s_inode_readahead_blks is always a power of 2 */
> +		b = block & ~((ext4_fsblk_t) ra_blks - 1);
> +		if (table > b)
> +			b = table;
> +		end = b + ra_blks;
> +		num = EXT4_INODES_PER_GROUP(sb);
> +		if (ext4_has_group_desc_csum(sb))
> +			num -= ext4_itable_unused_count(sb, gdp);
> +		table += num / inodes_per_block;
> +		if (end > table)
> +			end = table;
> +		while (b <= end)
> +			ext4_sb_breadahead_unmovable(sb, b++);
> +	}
>  
> -		/*
> -		 * There are other valid inodes in the buffer, this inode
> -		 * has in-inode xattrs, or we don't have this inode in memory.
> -		 * Read the block from disk.
> -		 */
> -		trace_ext4_load_inode(sb, ino);
> -		ext4_read_bh_nowait(bh, REQ_META | REQ_PRIO, NULL);
> -		blk_finish_plug(&plug);
> -		wait_on_buffer(bh);
> -		ext4_simulate_fail_bh(sb, bh, EXT4_SIM_INODE_EIO);
> -		if (!buffer_uptodate(bh)) {
> -			if (ret_block)
> -				*ret_block = block;
> -			brelse(bh);
> -			return -EIO;
> -		}
> +	/*
> +	 * There are other valid inodes in the buffer, this inode
> +	 * has in-inode xattrs, or we don't have this inode in memory.
> +	 * Read the block from disk.
> +	 */
> +	trace_ext4_load_inode(sb, ino);
> +	ext4_read_bh_nowait(bh, REQ_META | REQ_PRIO, NULL);
> +	blk_finish_plug(&plug);
> +	wait_on_buffer(bh);
> +	ext4_simulate_fail_bh(sb, bh, EXT4_SIM_INODE_EIO);
> +	if (!buffer_uptodate(bh)) {
> +		if (ret_block)
> +			*ret_block = block;
> +		brelse(bh);
> +		return -EIO;
>  	}
>  has_buffer:
>  	iloc->bh = bh;
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] ext4: prevent getting empty inode buffer
  2021-08-10 14:27 ` [PATCH 3/3] ext4: prevent getting empty inode buffer Zhang Yi
@ 2021-08-13 13:44   ` Jan Kara
  2021-08-16 14:29     ` Zhang Yi
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kara @ 2021-08-13 13:44 UTC (permalink / raw)
  To: Zhang Yi; +Cc: linux-ext4, tytso, adilger.kernel, jack, yukuai3

On Tue 10-08-21 22:27:22, Zhang Yi wrote:
> In ext4_get_inode_loc(), we may skip IO and get an zero && uptodate
> inode buffer when the inode monopolize an inode block for performance
> reason. For most cases, ext4_mark_iloc_dirty() will fill the inode
> buffer to make it fine, but we could miss this call if something bad
> happened. Finally, __ext4_get_inode_loc_noinmem() may probably get an
> empty inode buffer and trigger ext4 error.
> 
> For example, if we remove a nonexistent xattr on inode A,
> ext4_xattr_set_handle() will return ENODATA before invoking
> ext4_mark_iloc_dirty(), it will left an uptodate but zero buffer. We
> will get checksum error message in ext4_iget() when getting inode again.
> 
>   EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid
> 
> Even worse, if we allocate another inode B at the same inode block, it
> will corrupt the inode A on disk when write back inode B.
> 
> So this patch clear uptodate flag and mark buffer new if we get an empty
> buffer, clear it after we fill inode data or making read IO.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Thanks for the fix! Really good catch! The patch looks correct but
honestly, I'm not very happy about the special buffer_new handling. It
looks correct but I'm a bit uneasy that e.g. the block device code can
access this buffer and manipulate its state. Cannot we instead e.g. check
whether the buffer is uptodate in ext4_mark_iloc_dirty(), if not, lock it,
if still not uptodate, zero it, mark as uptodate, unlock it and then call
ext4_do_update_inode()? That would seem like a bit more foolproof solution
to me. Basically the fact that the buffer is not uptodate in
ext4_mark_iloc_dirty() would mean that nobody else is past
__ext4_get_inode_loc() for another inode in that buffer and so zeroing is
safe.

								Honza

> ---
>  fs/ext4/inode.c | 27 +++++++++++++++++++++++----
>  1 file changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index eae1b2d0b550..1f36289e9ef6 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4292,6 +4292,18 @@ int ext4_truncate(struct inode *inode)
>  	return err;
>  }
>  
> +static void ext4_end_inode_loc_read(struct buffer_head *bh, int uptodate)
> +{
> +	if (buffer_new(bh))
> +		clear_buffer_new(bh);
> +	if (uptodate)
> +		set_buffer_uptodate(bh);
> +	else
> +		clear_buffer_uptodate(bh);
> +	unlock_buffer(bh);
> +	put_bh(bh);
> +}
> +
>  /*
>   * ext4_get_inode_loc returns with an extra refcount against the inode's
>   * underlying buffer_head on success. If 'in_mem' is true, we have all
> @@ -4367,9 +4379,11 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
>  		}
>  		brelse(bitmap_bh);
>  		if (i == start + inodes_per_block) {
> -			/* all other inodes are free, so skip I/O */
> -			memset(bh->b_data, 0, bh->b_size);
> -			set_buffer_uptodate(bh);
> +			if (!buffer_new(bh)) {
> +				/* all other inodes are free, so skip I/O */
> +				memset(bh->b_data, 0, bh->b_size);
> +				set_buffer_new(bh);
> +			}
>  			unlock_buffer(bh);
>  			goto has_buffer;
>  		}
> @@ -4408,7 +4422,7 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
>  	 * Read the block from disk.
>  	 */
>  	trace_ext4_load_inode(sb, ino);
> -	ext4_read_bh_nowait(bh, REQ_META | REQ_PRIO, NULL);
> +	ext4_read_bh_nowait(bh, REQ_META | REQ_PRIO, ext4_end_inode_loc_read);
>  	blk_finish_plug(&plug);
>  	wait_on_buffer(bh);
>  	ext4_simulate_fail_bh(sb, bh, EXT4_SIM_INODE_EIO);
> @@ -5132,6 +5146,11 @@ static int ext4_do_update_inode(handle_t *handle,
>  	if (err)
>  		goto out_brelse;
>  	ext4_clear_inode_state(inode, EXT4_STATE_NEW);
> +	if (buffer_new(bh)) {
> +		clear_buffer_new(bh);
> +		set_buffer_uptodate(bh);
> +	}
> +
>  	if (set_large_file) {
>  		BUFFER_TRACE(EXT4_SB(sb)->s_sbh, "get write access");
>  		err = ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh);
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] ext4: prevent getting empty inode buffer
  2021-08-13 13:44   ` Jan Kara
@ 2021-08-16 14:29     ` Zhang Yi
  2021-08-16 17:14       ` Jan Kara
  0 siblings, 1 reply; 11+ messages in thread
From: Zhang Yi @ 2021-08-16 14:29 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, tytso, adilger.kernel, yukuai3

On 2021/8/13 21:44, Jan Kara wrote:
> On Tue 10-08-21 22:27:22, Zhang Yi wrote:
>> In ext4_get_inode_loc(), we may skip IO and get an zero && uptodate
>> inode buffer when the inode monopolize an inode block for performance
>> reason. For most cases, ext4_mark_iloc_dirty() will fill the inode
>> buffer to make it fine, but we could miss this call if something bad
>> happened. Finally, __ext4_get_inode_loc_noinmem() may probably get an
>> empty inode buffer and trigger ext4 error.
>>
>> For example, if we remove a nonexistent xattr on inode A,
>> ext4_xattr_set_handle() will return ENODATA before invoking
>> ext4_mark_iloc_dirty(), it will left an uptodate but zero buffer. We
>> will get checksum error message in ext4_iget() when getting inode again.
>>
>>   EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid
>>
>> Even worse, if we allocate another inode B at the same inode block, it
>> will corrupt the inode A on disk when write back inode B.
>>
>> So this patch clear uptodate flag and mark buffer new if we get an empty
>> buffer, clear it after we fill inode data or making read IO.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> 
> Thanks for the fix! Really good catch! The patch looks correct but
> honestly, I'm not very happy about the special buffer_new handling. It
> looks correct but I'm a bit uneasy that e.g. the block device code can
> access this buffer and manipulate its state. Cannot we instead e.g. check
> whether the buffer is uptodate in ext4_mark_iloc_dirty(), if not, lock it,
> if still not uptodate, zero it, mark as uptodate, unlock it and then call
> ext4_do_update_inode()? That would seem like a bit more foolproof solution
> to me. Basically the fact that the buffer is not uptodate in
> ext4_mark_iloc_dirty() would mean that nobody else is past
> __ext4_get_inode_loc() for another inode in that buffer and so zeroing is
> safe.
> 

Thanks for your suggestion! I understand what you're concerned and your
approach looks fine except mark buffer uptodate just behind zero buffer
in ext4_mark_iloc_dirty(). Because I think (1) if ext4_do_update_inode()
return error before filling the inode, it will still left an uptodate
but zero buffer, and it's not easy to handle the error path. (2) it is
still not conform the semantic of buffer uptodate because it it not
contain an uptodate inode information. How about move mark as uptodate
into ext4_do_update_inode(), something like that(not tested)?

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index eae1b2d0b550..99ccba8d47c6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4368,8 +4368,6 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
                brelse(bitmap_bh);
                if (i == start + inodes_per_block) {
                        /* all other inodes are free, so skip I/O */
-                       memset(bh->b_data, 0, bh->b_size);
-                       set_buffer_uptodate(bh);
                        unlock_buffer(bh);
                        goto has_buffer;
                }
@@ -5132,6 +5130,9 @@ static int ext4_do_update_inode(handle_t *handle,
        if (err)
                goto out_brelse;
        ext4_clear_inode_state(inode, EXT4_STATE_NEW);
+       if (!buffer_uptodate(bh))
+               set_buffer_uptodate(bh);
+
        if (set_large_file) {
                BUFFER_TRACE(EXT4_SB(sb)->s_sbh, "get write access");
                err = ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh);
@@ -5712,6 +5713,13 @@ int ext4_mark_iloc_dirty(handle_t *handle,
        /* the do_update_inode consumes one bh->b_count */
        get_bh(iloc->bh);

+       if (!buffer_uptodate(bh)) {
+               lock_buffer(iloc->bh);
+               if (!buffer_uptodate(iloc->bh))
+                       memset(iloc->bh->b_data, 0, iloc->bh->b_size);
+               unlock_buffer(iloc->bh);
+       }
+
        /* ext4_do_update_inode() does jbd2_journal_dirty_metadata */
        err = ext4_do_update_inode(handle, inode, iloc);
        put_bh(iloc->bh);


Thanks,
Yi.

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] ext4: prevent getting empty inode buffer
  2021-08-16 14:29     ` Zhang Yi
@ 2021-08-16 17:14       ` Jan Kara
  2021-08-18 12:15         ` Zhang Yi
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kara @ 2021-08-16 17:14 UTC (permalink / raw)
  To: Zhang Yi; +Cc: Jan Kara, linux-ext4, tytso, adilger.kernel, yukuai3

On Mon 16-08-21 22:29:01, Zhang Yi wrote:
> On 2021/8/13 21:44, Jan Kara wrote:
> > On Tue 10-08-21 22:27:22, Zhang Yi wrote:
> >> In ext4_get_inode_loc(), we may skip IO and get an zero && uptodate
> >> inode buffer when the inode monopolize an inode block for performance
> >> reason. For most cases, ext4_mark_iloc_dirty() will fill the inode
> >> buffer to make it fine, but we could miss this call if something bad
> >> happened. Finally, __ext4_get_inode_loc_noinmem() may probably get an
> >> empty inode buffer and trigger ext4 error.
> >>
> >> For example, if we remove a nonexistent xattr on inode A,
> >> ext4_xattr_set_handle() will return ENODATA before invoking
> >> ext4_mark_iloc_dirty(), it will left an uptodate but zero buffer. We
> >> will get checksum error message in ext4_iget() when getting inode again.
> >>
> >>   EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid
> >>
> >> Even worse, if we allocate another inode B at the same inode block, it
> >> will corrupt the inode A on disk when write back inode B.
> >>
> >> So this patch clear uptodate flag and mark buffer new if we get an empty
> >> buffer, clear it after we fill inode data or making read IO.
> >>
> >> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> > 
> > Thanks for the fix! Really good catch! The patch looks correct but
> > honestly, I'm not very happy about the special buffer_new handling. It
> > looks correct but I'm a bit uneasy that e.g. the block device code can
> > access this buffer and manipulate its state. Cannot we instead e.g. check
> > whether the buffer is uptodate in ext4_mark_iloc_dirty(), if not, lock it,
> > if still not uptodate, zero it, mark as uptodate, unlock it and then call
> > ext4_do_update_inode()? That would seem like a bit more foolproof solution
> > to me. Basically the fact that the buffer is not uptodate in
> > ext4_mark_iloc_dirty() would mean that nobody else is past
> > __ext4_get_inode_loc() for another inode in that buffer and so zeroing is
> > safe.
> > 
> 
> Thanks for your suggestion! I understand what you're concerned and your
> approach looks fine except mark buffer uptodate just behind zero buffer
> in ext4_mark_iloc_dirty(). Because I think (1) if ext4_do_update_inode()
> return error before filling the inode, it will still left an uptodate
> but zero buffer, and it's not easy to handle the error path. (2) it is
> still not conform the semantic of buffer uptodate because it it not
> contain an uptodate inode information. How about move mark as uptodate
> into ext4_do_update_inode(), something like that(not tested)?

OK, but this way could loading of buffer from the disk race with
ext4_do_update_inode() and overwrite its updates? You have to have buffer
uptodate before you start modifying it or you have to keep the buffer
locked all the time while you are updating it to avoid such races.

Luckily the only place where ext4_do_update_inode() can fail before copying
data to the buffer is due to ext4_inode_blocks_set() which should never
happen because we set s_maxsize so that i_blocks cannot overflow. So maybe
we can just get rid of that case and keep the uptodate setting with the
zeroing?

								Honza

> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index eae1b2d0b550..99ccba8d47c6 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4368,8 +4368,6 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
>                 brelse(bitmap_bh);
>                 if (i == start + inodes_per_block) {
>                         /* all other inodes are free, so skip I/O */
> -                       memset(bh->b_data, 0, bh->b_size);
> -                       set_buffer_uptodate(bh);
>                         unlock_buffer(bh);
>                         goto has_buffer;
>                 }
> @@ -5132,6 +5130,9 @@ static int ext4_do_update_inode(handle_t *handle,
>         if (err)
>                 goto out_brelse;
>         ext4_clear_inode_state(inode, EXT4_STATE_NEW);
> +       if (!buffer_uptodate(bh))
> +               set_buffer_uptodate(bh);
> +
>         if (set_large_file) {
>                 BUFFER_TRACE(EXT4_SB(sb)->s_sbh, "get write access");
>                 err = ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh);
> @@ -5712,6 +5713,13 @@ int ext4_mark_iloc_dirty(handle_t *handle,
>         /* the do_update_inode consumes one bh->b_count */
>         get_bh(iloc->bh);
> 
> +       if (!buffer_uptodate(bh)) {
> +               lock_buffer(iloc->bh);
> +               if (!buffer_uptodate(iloc->bh))
> +                       memset(iloc->bh->b_data, 0, iloc->bh->b_size);
> +               unlock_buffer(iloc->bh);
> +       }
> +
>         /* ext4_do_update_inode() does jbd2_journal_dirty_metadata */
>         err = ext4_do_update_inode(handle, inode, iloc);
>         put_bh(iloc->bh);
> 
> 
> Thanks,
> Yi.
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] ext4: prevent getting empty inode buffer
  2021-08-16 17:14       ` Jan Kara
@ 2021-08-18 12:15         ` Zhang Yi
  2021-08-18 13:11           ` Jan Kara
  0 siblings, 1 reply; 11+ messages in thread
From: Zhang Yi @ 2021-08-18 12:15 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, tytso, adilger.kernel, yukuai3

On 2021/8/17 1:14, Jan Kara wrote:
> On Mon 16-08-21 22:29:01, Zhang Yi wrote:
>> On 2021/8/13 21:44, Jan Kara wrote:
>>> On Tue 10-08-21 22:27:22, Zhang Yi wrote:
>>>> In ext4_get_inode_loc(), we may skip IO and get an zero && uptodate
>>>> inode buffer when the inode monopolize an inode block for performance
>>>> reason. For most cases, ext4_mark_iloc_dirty() will fill the inode
>>>> buffer to make it fine, but we could miss this call if something bad
>>>> happened. Finally, __ext4_get_inode_loc_noinmem() may probably get an
>>>> empty inode buffer and trigger ext4 error.
>>>>
>>>> For example, if we remove a nonexistent xattr on inode A,
>>>> ext4_xattr_set_handle() will return ENODATA before invoking
>>>> ext4_mark_iloc_dirty(), it will left an uptodate but zero buffer. We
>>>> will get checksum error message in ext4_iget() when getting inode again.
>>>>
>>>>   EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid
>>>>
>>>> Even worse, if we allocate another inode B at the same inode block, it
>>>> will corrupt the inode A on disk when write back inode B.
>>>>
>>>> So this patch clear uptodate flag and mark buffer new if we get an empty
>>>> buffer, clear it after we fill inode data or making read IO.
>>>>
>>>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>>>
>>> Thanks for the fix! Really good catch! The patch looks correct but
>>> honestly, I'm not very happy about the special buffer_new handling. It
>>> looks correct but I'm a bit uneasy that e.g. the block device code can
>>> access this buffer and manipulate its state. Cannot we instead e.g. check
>>> whether the buffer is uptodate in ext4_mark_iloc_dirty(), if not, lock it,
>>> if still not uptodate, zero it, mark as uptodate, unlock it and then call
>>> ext4_do_update_inode()? That would seem like a bit more foolproof solution
>>> to me. Basically the fact that the buffer is not uptodate in
>>> ext4_mark_iloc_dirty() would mean that nobody else is past
>>> __ext4_get_inode_loc() for another inode in that buffer and so zeroing is
>>> safe.
>>>
>>
>> Thanks for your suggestion! I understand what you're concerned and your
>> approach looks fine except mark buffer uptodate just behind zero buffer
>> in ext4_mark_iloc_dirty(). Because I think (1) if ext4_do_update_inode()
>> return error before filling the inode, it will still left an uptodate
>> but zero buffer, and it's not easy to handle the error path. (2) it is
>> still not conform the semantic of buffer uptodate because it it not
>> contain an uptodate inode information. How about move mark as uptodate
>> into ext4_do_update_inode(), something like that(not tested)?
> 
> OK, but this way could loading of buffer from the disk race with
> ext4_do_update_inode() and overwrite its updates? You have to have buffer
> uptodate before you start modifying it or you have to keep the buffer
> locked all the time while you are updating it to avoid such races.

Indeed.

> 
> Luckily the only place where ext4_do_update_inode() can fail before copying
> data to the buffer is due to ext4_inode_blocks_set() which should never
> happen because we set s_maxsize so that i_blocks cannot overflow. So maybe
> we can just get rid of that case and keep the uptodate setting with the
> zeroing?
> 

It's fine, Let's fix it this way now.(But I guess it's fragile because we have
to prevent modify ext4_do_update_inode() return before filling data into inode
buffer cautiously in the future.)

BTW, could we also add a patch to just remove the ext4_has_feature_huge_file()
check in ext4_inode_blocks_set() or move it to ext4_mark_iloc_dirty() before
ext4_mark_iloc_dirty()? Or else we may get confused and have to add comments to
explain it.

Thanks,
Yi.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] ext4: prevent getting empty inode buffer
  2021-08-18 12:15         ` Zhang Yi
@ 2021-08-18 13:11           ` Jan Kara
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Kara @ 2021-08-18 13:11 UTC (permalink / raw)
  To: Zhang Yi; +Cc: Jan Kara, linux-ext4, tytso, adilger.kernel, yukuai3

On Wed 18-08-21 20:15:59, Zhang Yi wrote:
> On 2021/8/17 1:14, Jan Kara wrote:
> > On Mon 16-08-21 22:29:01, Zhang Yi wrote:
> >> On 2021/8/13 21:44, Jan Kara wrote:
> >>> On Tue 10-08-21 22:27:22, Zhang Yi wrote:
> >>>> In ext4_get_inode_loc(), we may skip IO and get an zero && uptodate
> >>>> inode buffer when the inode monopolize an inode block for performance
> >>>> reason. For most cases, ext4_mark_iloc_dirty() will fill the inode
> >>>> buffer to make it fine, but we could miss this call if something bad
> >>>> happened. Finally, __ext4_get_inode_loc_noinmem() may probably get an
> >>>> empty inode buffer and trigger ext4 error.
> >>>>
> >>>> For example, if we remove a nonexistent xattr on inode A,
> >>>> ext4_xattr_set_handle() will return ENODATA before invoking
> >>>> ext4_mark_iloc_dirty(), it will left an uptodate but zero buffer. We
> >>>> will get checksum error message in ext4_iget() when getting inode again.
> >>>>
> >>>>   EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid
> >>>>
> >>>> Even worse, if we allocate another inode B at the same inode block, it
> >>>> will corrupt the inode A on disk when write back inode B.
> >>>>
> >>>> So this patch clear uptodate flag and mark buffer new if we get an empty
> >>>> buffer, clear it after we fill inode data or making read IO.
> >>>>
> >>>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> >>>
> >>> Thanks for the fix! Really good catch! The patch looks correct but
> >>> honestly, I'm not very happy about the special buffer_new handling. It
> >>> looks correct but I'm a bit uneasy that e.g. the block device code can
> >>> access this buffer and manipulate its state. Cannot we instead e.g. check
> >>> whether the buffer is uptodate in ext4_mark_iloc_dirty(), if not, lock it,
> >>> if still not uptodate, zero it, mark as uptodate, unlock it and then call
> >>> ext4_do_update_inode()? That would seem like a bit more foolproof solution
> >>> to me. Basically the fact that the buffer is not uptodate in
> >>> ext4_mark_iloc_dirty() would mean that nobody else is past
> >>> __ext4_get_inode_loc() for another inode in that buffer and so zeroing is
> >>> safe.
> >>>
> >>
> >> Thanks for your suggestion! I understand what you're concerned and your
> >> approach looks fine except mark buffer uptodate just behind zero buffer
> >> in ext4_mark_iloc_dirty(). Because I think (1) if ext4_do_update_inode()
> >> return error before filling the inode, it will still left an uptodate
> >> but zero buffer, and it's not easy to handle the error path. (2) it is
> >> still not conform the semantic of buffer uptodate because it it not
> >> contain an uptodate inode information. How about move mark as uptodate
> >> into ext4_do_update_inode(), something like that(not tested)?
> > 
> > OK, but this way could loading of buffer from the disk race with
> > ext4_do_update_inode() and overwrite its updates? You have to have buffer
> > uptodate before you start modifying it or you have to keep the buffer
> > locked all the time while you are updating it to avoid such races.
> 
> Indeed.
> 
> > 
> > Luckily the only place where ext4_do_update_inode() can fail before copying
> > data to the buffer is due to ext4_inode_blocks_set() which should never
> > happen because we set s_maxsize so that i_blocks cannot overflow. So maybe
> > we can just get rid of that case and keep the uptodate setting with the
> > zeroing?
> > 
> 
> It's fine, Let's fix it this way now.(But I guess it's fragile because we
> have to prevent modify ext4_do_update_inode() return before filling data
> into inode buffer cautiously in the future.)

I guess can have "bring buffer uptodate" code in ext4_do_update_inode() to
have it closer to the code filling the buffer with correct data and comment
there that once we mark the buffer as uptodate, we rely on filling in the
correct information.

> BTW, could we also add a patch to just remove the
> ext4_has_feature_huge_file() check in ext4_inode_blocks_set() or move it
> to ext4_mark_iloc_dirty() before ext4_mark_iloc_dirty()? Or else we may
> get confused and have to add comments to explain it.

Yes, I would transform that check to WARN_ON_ONCE() with a comment that
sb->s_maxbytes should not have allowed this.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-08-18 13:11 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-10 14:27 [PATCH 0/3] ext4: fix a inode checksum error Zhang Yi
2021-08-10 14:27 ` [PATCH 1/3] ext4: move inode eio simulation behind io completeion Zhang Yi
2021-08-13 12:55   ` Jan Kara
2021-08-10 14:27 ` [PATCH 2/3] ext4: remove an unnecessary if statement in __ext4_get_inode_loc() Zhang Yi
2021-08-13 13:00   ` Jan Kara
2021-08-10 14:27 ` [PATCH 3/3] ext4: prevent getting empty inode buffer Zhang Yi
2021-08-13 13:44   ` Jan Kara
2021-08-16 14:29     ` Zhang Yi
2021-08-16 17:14       ` Jan Kara
2021-08-18 12:15         ` Zhang Yi
2021-08-18 13:11           ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).