linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4] ext4: Fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
@ 2021-05-06 14:10 Ye Bin
  2021-06-03  1:47 ` Theodore Ts'o
  0 siblings, 1 reply; 2+ messages in thread
From: Ye Bin @ 2021-05-06 14:10 UTC (permalink / raw)
  To: tytso, jack, adilger.kernel, linux-ext4, linux-kernel; +Cc: Ye Bin

We got follow bug_on when run fsstress with injecting IO fault:
[130747.323114] kernel BUG at fs/ext4/extents_status.c:762!
[130747.323117] Internal error: Oops - BUG: 0 [#1] SMP
......
[130747.334329] Call trace:
[130747.334553]  ext4_es_cache_extent+0x150/0x168 [ext4]
[130747.334975]  ext4_cache_extents+0x64/0xe8 [ext4]
[130747.335368]  ext4_find_extent+0x300/0x330 [ext4]
[130747.335759]  ext4_ext_map_blocks+0x74/0x1178 [ext4]
[130747.336179]  ext4_map_blocks+0x2f4/0x5f0 [ext4]
[130747.336567]  ext4_mpage_readpages+0x4a8/0x7a8 [ext4]
[130747.336995]  ext4_readpage+0x54/0x100 [ext4]
[130747.337359]  generic_file_buffered_read+0x410/0xae8
[130747.337767]  generic_file_read_iter+0x114/0x190
[130747.338152]  ext4_file_read_iter+0x5c/0x140 [ext4]
[130747.338556]  __vfs_read+0x11c/0x188
[130747.338851]  vfs_read+0x94/0x150
[130747.339110]  ksys_read+0x74/0xf0

If call ext4_ext_insert_extent failed but new extent already inserted, we just
update "ex->ee_len = orig_ex.ee_len", this will lead to extent overlap, then
cause bug on when cache extent.
If call ext4_ext_insert_extent failed don't update ex->ee_len with old value.
Maybe there will lead to block leak, but it can be fixed by fsck later.

After we fixed above issue with v2 patch, but we got the same issue.
ext4_split_extent_at:
{
        ......
        err = ext4_ext_insert_extent(handle, inode, ppath, &newex, flags);
        if (err == -ENOSPC && (EXT4_EXT_MAY_ZEROOUT & split_flag)) {
            ......
            ext4_ext_try_to_merge(handle, inode, path, ex); ->step(1)
            err = ext4_ext_dirty(handle, inode, path + path->p_depth); ->step(2)
            if (err)
                goto fix_extent_len;
        ......
        }
        ......
fix_extent_len:
        ex->ee_len = orig_ex.ee_len; ->step(3)
        ......
}
If step(1) have been merged, but step(2) dirty extent failed, then go to
fix_extent_len label to fix ex->ee_len with orig_ex.ee_len. But "ex" may not be
old one, will cause overwritten. Then will trigger the same issue as previous.
If step(2) failed, just return error, don't fix ex->ee_len with old value.

This patch's modification is according to Jan Kara's suggestion in V3 patch:
("https://patchwork.ozlabs.org/project/linux-ext4/patch/20210428085158.3728201-1-yebin10@huawei.com/")
"I see. Now I understand your patch. Honestly, seeing how fragile is trying
to fix extent tree after split has failed in the middle, I would probably
go even further and make sure we fix the tree properly in case of ENOSPC
and EDQUOT (those are easily user triggerable).  Anything else indicates a
HW problem or fs corruption so I'd rather leave the extent tree as is and
don't try to fix it (which also means we will not create overlapping
extents)."

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/extents.c | 43 +++++++++++++++++++++++--------------------
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 77c84d6f1af6..cbf37b2cf871 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3206,7 +3206,10 @@ static int ext4_split_extent_at(handle_t *handle,
 		ext4_ext_mark_unwritten(ex2);
 
 	err = ext4_ext_insert_extent(handle, inode, ppath, &newex, flags);
-	if (err == -ENOSPC && (EXT4_EXT_MAY_ZEROOUT & split_flag)) {
+	if (err != -ENOSPC && err != -EDQUOT)
+		goto out;
+
+	if (EXT4_EXT_MAY_ZEROOUT & split_flag) {
 		if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) {
 			if (split_flag & EXT4_EXT_DATA_VALID1) {
 				err = ext4_ext_zeroout(inode, ex2);
@@ -3232,25 +3235,22 @@ static int ext4_split_extent_at(handle_t *handle,
 					      ext4_ext_pblock(&orig_ex));
 		}
 
-		if (err)
-			goto fix_extent_len;
-		/* update the extent length and mark as initialized */
-		ex->ee_len = cpu_to_le16(ee_len);
-		ext4_ext_try_to_merge(handle, inode, path, ex);
-		err = ext4_ext_dirty(handle, inode, path + path->p_depth);
-		if (err)
-			goto fix_extent_len;
-
-		/* update extent status tree */
-		err = ext4_zeroout_es(inode, &zero_ex);
-
-		goto out;
-	} else if (err)
-		goto fix_extent_len;
-
-out:
-	ext4_ext_show_leaf(inode, path);
-	return err;
+		if (!err) {
+			/* update the extent length and mark as initialized */
+			ex->ee_len = cpu_to_le16(ee_len);
+			ext4_ext_try_to_merge(handle, inode, path, ex);
+			err = ext4_ext_dirty(handle, inode, path + path->p_depth);
+			if (!err)
+				/* update extent status tree */
+				err = ext4_zeroout_es(inode, &zero_ex);
+			/* If we failed at this point, we don't know in which
+			 * state the extent tree exactly is so don't try to fix
+			 * length of the original extent as it may do even more
+			 * damage.
+			 */
+			goto out;
+		}
+	}
 
 fix_extent_len:
 	ex->ee_len = orig_ex.ee_len;
@@ -3260,6 +3260,9 @@ static int ext4_split_extent_at(handle_t *handle,
 	 */
 	ext4_ext_dirty(handle, inode, path + path->p_depth);
 	return err;
+out:
+	ext4_ext_show_leaf(inode, path);
+	return err;
 }
 
 /*
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH v4] ext4: Fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
  2021-05-06 14:10 [PATCH v4] ext4: Fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed Ye Bin
@ 2021-06-03  1:47 ` Theodore Ts'o
  0 siblings, 0 replies; 2+ messages in thread
From: Theodore Ts'o @ 2021-06-03  1:47 UTC (permalink / raw)
  To: Ye Bin; +Cc: jack, adilger.kernel, linux-ext4, linux-kernel

On Thu, May 06, 2021 at 10:10:42PM +0800, Ye Bin wrote:
> We got follow bug_on when run fsstress with injecting IO fault:
> [130747.323114] kernel BUG at fs/ext4/extents_status.c:762!
> [130747.323117] Internal error: Oops - BUG: 0 [#1] SMP
> ......
> [130747.334329] Call trace:
> [130747.334553]  ext4_es_cache_extent+0x150/0x168 [ext4]
> [130747.334975]  ext4_cache_extents+0x64/0xe8 [ext4]
> [130747.335368]  ext4_find_extent+0x300/0x330 [ext4]
> [130747.335759]  ext4_ext_map_blocks+0x74/0x1178 [ext4]
> [130747.336179]  ext4_map_blocks+0x2f4/0x5f0 [ext4]
> [130747.336567]  ext4_mpage_readpages+0x4a8/0x7a8 [ext4]
> [130747.336995]  ext4_readpage+0x54/0x100 [ext4]
> [130747.337359]  generic_file_buffered_read+0x410/0xae8
> [130747.337767]  generic_file_read_iter+0x114/0x190
> [130747.338152]  ext4_file_read_iter+0x5c/0x140 [ext4]
> [130747.338556]  __vfs_read+0x11c/0x188
> [130747.338851]  vfs_read+0x94/0x150
> [130747.339110]  ksys_read+0x74/0xf0
> 
> If call ext4_ext_insert_extent failed but new extent already inserted, we just
> update "ex->ee_len = orig_ex.ee_len", this will lead to extent overlap, then
> cause bug on when cache extent.
> If call ext4_ext_insert_extent failed don't update ex->ee_len with old value.
> Maybe there will lead to block leak, but it can be fixed by fsck later.
> 
> After we fixed above issue with v2 patch, but we got the same issue.
> ext4_split_extent_at:
> {
>         ......
>         err = ext4_ext_insert_extent(handle, inode, ppath, &newex, flags);
>         if (err == -ENOSPC && (EXT4_EXT_MAY_ZEROOUT & split_flag)) {
>             ......
>             ext4_ext_try_to_merge(handle, inode, path, ex); ->step(1)
>             err = ext4_ext_dirty(handle, inode, path + path->p_depth); ->step(2)
>             if (err)
>                 goto fix_extent_len;
>         ......
>         }
>         ......
> fix_extent_len:
>         ex->ee_len = orig_ex.ee_len; ->step(3)
>         ......
> }
> If step(1) have been merged, but step(2) dirty extent failed, then go to
> fix_extent_len label to fix ex->ee_len with orig_ex.ee_len. But "ex" may not be
> old one, will cause overwritten. Then will trigger the same issue as previous.
> If step(2) failed, just return error, don't fix ex->ee_len with old value.
> 
> This patch's modification is according to Jan Kara's suggestion in V3 patch:
> ("https://patchwork.ozlabs.org/project/linux-ext4/patch/20210428085158.3728201-1-yebin10@huawei.com/")
> "I see. Now I understand your patch. Honestly, seeing how fragile is trying
> to fix extent tree after split has failed in the middle, I would probably
> go even further and make sure we fix the tree properly in case of ENOSPC
> and EDQUOT (those are easily user triggerable).  Anything else indicates a
> HW problem or fs corruption so I'd rather leave the extent tree as is and
> don't try to fix it (which also means we will not create overlapping
> extents)."
> 
> Signed-off-by: Ye Bin <yebin10@huawei.com>
> Reviewed-by: Jan Kara <jack@suse.cz>

Applied, thanks.

					- Ted

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-06-03  1:47 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-06 14:10 [PATCH v4] ext4: Fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed Ye Bin
2021-06-03  1:47 ` Theodore Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).