Linux-ext4 Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] ext4: Fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
@ 2021-03-25  2:29 Ye Bin
  2021-04-06  3:34 ` Theodore Ts'o
  0 siblings, 1 reply; 5+ messages in thread
From: Ye Bin @ 2021-03-25  2:29 UTC (permalink / raw)
  To: tytso, adilger.kernel, linux-ext4, linux-kernel; +Cc: Ye Bin

We got follow bug_on:
[130747.323114] kernel BUG at fs/ext4/extents_status.c:762!
[130747.323117] Internal error: Oops - BUG: 0 [#1] SMP
......
[130747.334329] Call trace:
[130747.334553]  ext4_es_cache_extent+0x150/0x168 [ext4]
[130747.334975]  ext4_cache_extents+0x64/0xe8 [ext4]
[130747.335368]  ext4_find_extent+0x300/0x330 [ext4]
[130747.335759]  ext4_ext_map_blocks+0x74/0x1178 [ext4]
[130747.336179]  ext4_map_blocks+0x2f4/0x5f0 [ext4]
[130747.336567]  ext4_mpage_readpages+0x4a8/0x7a8 [ext4]
[130747.336995]  ext4_readpage+0x54/0x100 [ext4]
[130747.337359]  generic_file_buffered_read+0x410/0xae8
[130747.337767]  generic_file_read_iter+0x114/0x190
[130747.338152]  ext4_file_read_iter+0x5c/0x140 [ext4]
[130747.338556]  __vfs_read+0x11c/0x188
[130747.338851]  vfs_read+0x94/0x150
[130747.339110]  ksys_read+0x74/0xf0

If call ext4_ext_insert_extent failed but new extent already inserted, we just
update "ex->ee_len = orig_ex.ee_len", this will lead to extent overlap, then
cause bug on when cache extent.
If call ext4_ext_insert_extent failed don't update ex->ee_len with old value.
Maybe there will lead to block leak, but it can be fixed by fsck later.

Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 fs/ext4/extents.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 77c84d6f1af6..970eb2dfcc46 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3246,7 +3246,7 @@ static int ext4_split_extent_at(handle_t *handle,
 
 		goto out;
 	} else if (err)
-		goto fix_extent_len;
+		goto err;
 
 out:
 	ext4_ext_show_leaf(inode, path);
@@ -3254,6 +3254,7 @@ static int ext4_split_extent_at(handle_t *handle,
 
 fix_extent_len:
 	ex->ee_len = orig_ex.ee_len;
+err:
 	/*
 	 * Ignore ext4_ext_dirty return value since we are already in error path
 	 * and err is a non-zero error code.
-- 
2.25.4


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] ext4: Fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
  2021-03-25  2:29 [PATCH] ext4: Fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed Ye Bin
@ 2021-04-06  3:34 ` Theodore Ts'o
  2021-04-07  1:41   ` yebin
  0 siblings, 1 reply; 5+ messages in thread
From: Theodore Ts'o @ 2021-04-06  3:34 UTC (permalink / raw)
  To: Ye Bin; +Cc: adilger.kernel, linux-ext4, linux-kernel

On Thu, Mar 25, 2021 at 10:29:25AM +0800, Ye Bin wrote:
> We got follow bug_on:
> [130747.323114] kernel BUG at fs/ext4/extents_status.c:762!
> [130747.323117] Internal error: Oops - BUG: 0 [#1] SMP
> ......
> [130747.334329] Call trace:
> [130747.334553]  ext4_es_cache_extent+0x150/0x168 [ext4]
> [130747.334975]  ext4_cache_extents+0x64/0xe8 [ext4]
> [130747.335368]  ext4_find_extent+0x300/0x330 [ext4]
> [130747.335759]  ext4_ext_map_blocks+0x74/0x1178 [ext4]
> [130747.336179]  ext4_map_blocks+0x2f4/0x5f0 [ext4]
> [130747.336567]  ext4_mpage_readpages+0x4a8/0x7a8 [ext4]
> [130747.336995]  ext4_readpage+0x54/0x100 [ext4]
> [130747.337359]  generic_file_buffered_read+0x410/0xae8
> [130747.337767]  generic_file_read_iter+0x114/0x190
> [130747.338152]  ext4_file_read_iter+0x5c/0x140 [ext4]
> [130747.338556]  __vfs_read+0x11c/0x188
> [130747.338851]  vfs_read+0x94/0x150
> [130747.339110]  ksys_read+0x74/0xf0
> 
> If call ext4_ext_insert_extent failed but new extent already inserted, we just
> update "ex->ee_len = orig_ex.ee_len", this will lead to extent overlap, then
> cause bug on when cache extent.

How did this happen in the first place?  It sounds like if the extent
was already inserted, that would be casue there was an on-disk file
system corruption, no?

In that case, shouldn't we call ext4_error() to declare the file
system has an inconsistency, so it can be fixed by fsck?

> If call ext4_ext_insert_extent failed don't update ex->ee_len with old value.
> Maybe there will lead to block leak, but it can be fixed by fsck later.

  	      	   	   	       - Ted

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] ext4: Fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
  2021-04-06  3:34 ` Theodore Ts'o
@ 2021-04-07  1:41   ` yebin
  2021-04-09  1:47     ` Theodore Ts'o
  0 siblings, 1 reply; 5+ messages in thread
From: yebin @ 2021-04-07  1:41 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: adilger.kernel, linux-ext4, linux-kernel



On 2021/4/6 11:34, Theodore Ts'o wrote:
> On Thu, Mar 25, 2021 at 10:29:25AM +0800, Ye Bin wrote:
>> We got follow bug_on:
>> [130747.323114] kernel BUG at fs/ext4/extents_status.c:762!
>> [130747.323117] Internal error: Oops - BUG: 0 [#1] SMP
>> ......
>> [130747.334329] Call trace:
>> [130747.334553]  ext4_es_cache_extent+0x150/0x168 [ext4]
>> [130747.334975]  ext4_cache_extents+0x64/0xe8 [ext4]
>> [130747.335368]  ext4_find_extent+0x300/0x330 [ext4]
>> [130747.335759]  ext4_ext_map_blocks+0x74/0x1178 [ext4]
>> [130747.336179]  ext4_map_blocks+0x2f4/0x5f0 [ext4]
>> [130747.336567]  ext4_mpage_readpages+0x4a8/0x7a8 [ext4]
>> [130747.336995]  ext4_readpage+0x54/0x100 [ext4]
>> [130747.337359]  generic_file_buffered_read+0x410/0xae8
>> [130747.337767]  generic_file_read_iter+0x114/0x190
>> [130747.338152]  ext4_file_read_iter+0x5c/0x140 [ext4]
>> [130747.338556]  __vfs_read+0x11c/0x188
>> [130747.338851]  vfs_read+0x94/0x150
>> [130747.339110]  ksys_read+0x74/0xf0
>>
>> If call ext4_ext_insert_extent failed but new extent already inserted, we just
>> update "ex->ee_len = orig_ex.ee_len", this will lead to extent overlap, then
>> cause bug on when cache extent.
> How did this happen in the first place?  It sounds like if the extent
> was already inserted, that would be casue there was an on-disk file
> system corruption, no?
>
> In that case, shouldn't we call ext4_error() to declare the file
> system has an inconsistency, so it can be fixed by fsck?
We inject IO fault when runing  fsstress,  JBD detect IO error then 
trigger JBD abort. At the same time,
if ext4_ext_insert_extent already insert new exntent then call 
ext4_ext_dirty to dirty metadata , but
JBD already aborted ,  ext4_ext_dirty will return error.
In ext4_ext_dirty function call  ext4_ext_check_inode check extent if 
ok, if not, trigger BUG_ON and
also print extent detail information. I caught follow call trace and 
extent information:
=================================================================================
[ 7597.436071] inode 1808 ext4_ext_check_inode
[ 7597.436074] inode[1808]: [4] lblk=76 len=5 orig_len=32773 prev=0
[ 7597.436077] inode[1808]: [3] lblk=81 len=25 orig_len=25 prev=81
[ 7597.436080] inode[1808]: [2] lblk=106 len=130 orig_len=32898 prev=106
[ 7597.436084] inode[1808]: [1] lblk=228 len=8 orig_len=32776 prev=236
[ 7597.436181] ------------[ cut here ]------------
[ 7597.436183] kernel BUG at fs/ext4/extents.c:221!
[ 7597.436216] invalid opcode: 0000 [#1] SMP KASAN
[ 7597.436219] Buffer I/O error on dev sda, logical block 131072, lost 
sync page write
[ 7597.436224] CPU: 7 PID: 5157 Comm: kworker/u16:1 Not tainted 
4.19.95-00020-gb095ded9163d-dirty #186
[ 7597.436230] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 
04/01/2014
[ 7597.436242] Workqueue: writeback wb_workfn (flush-8:16)
[ 7597.436248] RIP: 0010:__ext4_ext_dirty.cold+0x2ee/0x308
[ 7597.436251] Code: ff 48 89 df e8 c7 e1 d0 ff e9 48 ff ff ff 48 c7 c7 
80 d1 f7 ae e8 87 45 84 ff 48 83 05 ee 3b ca 07 01 48 83 05 c6 3c ca 07 
01 <0f> 04
[ 7597.436253] RSP: 0018:ffff88838bcaf050 EFLAGS: 00010202
[ 7597.436256] RAX: 0000000000000000 RBX: ffff888364b67783 RCX: 
0000000000000000
[ 7597.436258] RDX: 0000000000000001 RSI: 0000000000000008 RDI: 
ffffed1071795dfd
[ 7597.436259] RBP: 000000000000003b R08: 0000000000000000 R09: 
0000000000000001
[ 7597.436261] R10: ffffed10744fbd6e R11: ffff8883a27deb77 R12: 
dffffc0000000000
[ 7597.436262] R13: dffffc0000000000 R14: ffff888364b673f0 R15: 
ffff888364b6730c
[ 7597.436264] FS:  0000000000000000(0000) GS:ffff8883a27c0000(0000) 
knlGS:0000000000000000
[ 7597.436266] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7597.436267] CR2: 00007fca7a249000 CR3: 000000039ce75000 CR4: 
00000000000006e0
[ 7597.436270] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[ 7597.436271] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
0000000000000400
[ 7597.436275] JBD2: Error -5 detected when updating journal superblock 
for sda-8.
[ 7597.436277] Call Trace:
[ 7597.436294]  ext4_split_extent_at+0x785/0xc80
[ 7597.466767]  ext4_split_extent.isra.0+0x2af/0x550
[ 7597.472503]  ext4_ext_convert_to_initialized+0x4b4/0x2510
[ 7597.536999]  ext4_ext_map_blocks+0x37ef/0x50c0
[ 7597.542467]  ext4_map_blocks+0x685/0x1910
[ 7597.544107]  ext4_writepages+0x1809/0x32f0
[ 7597.547546]  do_writepages+0x7f/0x1b0
[ 7597.548619]  __writeback_single_inode+0xc6/0xc40
[ 7597.549203]  writeback_sb_inodes+0x49f/0xd60
[ 7597.550748]  wb_writeback+0x252/0x9d0
[ 7597.552335]  wb_workfn+0x309/0xe60
[ 7597.557029]  process_one_work+0x70b/0x1610
[ 7597.557566]  worker_thread+0x5a9/0x1060
[ 7597.559091]  kthread+0x35e/0x430
[ 7597.560601]  ret_from_fork+0x1f/0x30
=================================================================================
Obviously, the follow extent is  overlap:
[ 7597.436080] inode[1808]: [2] lblk=106 len=130 orig_len=32898 prev=106
[ 7597.436084] inode[1808]: [1] lblk=228 len=8 orig_len=32776 prev=236

I think this wouldn't  cause file system inconsistency, as error data 
will not be stored  in disk.

>
>> If call ext4_ext_insert_extent failed don't update ex->ee_len with old value.
>> Maybe there will lead to block leak, but it can be fixed by fsck later.
>    	      	   	   	       - Ted
> .
>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] ext4: Fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
  2021-04-07  1:41   ` yebin
@ 2021-04-09  1:47     ` Theodore Ts'o
  2021-04-20 14:05       ` yebin
  0 siblings, 1 reply; 5+ messages in thread
From: Theodore Ts'o @ 2021-04-09  1:47 UTC (permalink / raw)
  To: yebin; +Cc: adilger.kernel, linux-ext4, linux-kernel

On Wed, Apr 07, 2021 at 09:41:57AM +0800, yebin wrote:
> > > If call ext4_ext_insert_extent failed but new extent already inserted, we just
> > > update "ex->ee_len = orig_ex.ee_len", this will lead to extent overlap, then
> > > cause bug on when cache extent.
> > How did this happen in the first place?  It sounds like if the extent
> > was already inserted, that would be casue there was an on-disk file
> > system corruption, no?
> > 
> > In that case, shouldn't we call ext4_error() to declare the file
> > system has an inconsistency, so it can be fixed by fsck?
> We inject IO fault when runing  fsstress,  JBD detect IO error then trigger
> JBD abort.  At the same time,
> if ext4_ext_insert_extent already insert new exntent then call
> ext4_ext_dirty to dirty metadata , but
> JBD already aborted ,  ext4_ext_dirty will return error.
> In ext4_ext_dirty function call  ext4_ext_check_inode check extent if ok, if
> not, trigger BUG_ON and
> also print extent detail information.

In this particular case, skipping the "ex->ee_len = orig_ex.ee_len"
may avoid the BUG_ON.  But it's not clear that this is always the
right thing to do.  The fundamental question is what should we do we
run into an error while we are in the middle of making changes to
on-disk and in-memory data structures?

In the ideal world, we should undo the changes that we were in the
middle of making before we return an error.  That way, the semantics
are very clear; on success, the function has made the requested change
to the file system.  If the function returns an error, then no changes
should be made.

That was the reasoning behind resetting ex->ee_len to orig_ex.ee_len
in the fix_extent_len inside ext4_split_extent_at().  Unofrtunately,
ext4_ext_insert_extent() does *not* always follow this convention, and
that's because it would be extremely difficult for it to do so --- the
mutations that it makes can be quite complex, including potentially
increasing the height of the extent tree.

However, I don't think your fix is by any means the ideal one, because
the much more common way that ext4_ext_insert_extent() is when it
needs to insert a new leaf node, or need to increase the height of the
extent tree --- and in it returns an ENOSPC failure.  In that case, it
won't have made any changes changes in the extent tree, and so having
ext4_split_extent_at() undo the change to ex->ee_len is the right
thing to do.

Having blocks get leaked when there is an ENOSPC failure, requiring
fsck to be run --- and without giving the user any warning that this
has happened is *not* a good way to fail.  So I don't think the
proposed patch is the right way to go.

A better way to go would be to teach ext4_ext_insert_extent() so if
there is a late failure, that it unwinds the leaf node back to its
original state (at least from a semantic value).  Since the extent
leaf node could have been split, and/or adjacent extent entries may
have been merged, what it would need to do is to remember the starting
block number and length, and make whatever changes are necessaries to
the extent entries in that leaf node corresponding to that starting
block number and length.

If you don't want to do that, then a "do no harm" fix would be
something like this:

	...
	} else if (err == -EROFS) {
		return err;
	} else if (err)
		goto fix_extent_len;

So in the journal abort case, when err is set to EROFS, we don't try
to reset the length, since in theory the file system is read-only
already anyway.  However, in the ENOSPC case, we won't end up silently
leaking blocks that will be lost until the user somehow decides to run
fsck.

There are still times when this doesn't get things completely right
(e.g., what if we get a late ENOMEM error versus an early ENOMEM
failure), where the only real fix is to make ext4_ext_insert_extent()
obey the convention that if it returns an error, it must not result in
any user-visible state change.

Cheers,

						- Ted

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] ext4: Fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
  2021-04-09  1:47     ` Theodore Ts'o
@ 2021-04-20 14:05       ` yebin
  0 siblings, 0 replies; 5+ messages in thread
From: yebin @ 2021-04-20 14:05 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: adilger.kernel, linux-ext4, linux-kernel



On 2021/4/9 9:47, Theodore Ts'o wrote:
> On Wed, Apr 07, 2021 at 09:41:57AM +0800, yebin wrote:
>>>> If call ext4_ext_insert_extent failed but new extent already inserted, we just
>>>> update "ex->ee_len = orig_ex.ee_len", this will lead to extent overlap, then
>>>> cause bug on when cache extent.
>>> How did this happen in the first place?  It sounds like if the extent
>>> was already inserted, that would be casue there was an on-disk file
>>> system corruption, no?
>>>
>>> In that case, shouldn't we call ext4_error() to declare the file
>>> system has an inconsistency, so it can be fixed by fsck?
>> We inject IO fault when runing  fsstress,  JBD detect IO error then trigger
>> JBD abort.  At the same time,
>> if ext4_ext_insert_extent already insert new exntent then call
>> ext4_ext_dirty to dirty metadata , but
>> JBD already aborted ,  ext4_ext_dirty will return error.
>> In ext4_ext_dirty function call  ext4_ext_check_inode check extent if ok, if
>> not, trigger BUG_ON and
>> also print extent detail information.
> In this particular case, skipping the "ex->ee_len = orig_ex.ee_len"
> may avoid the BUG_ON.  But it's not clear that this is always the
> right thing to do.  The fundamental question is what should we do we
> run into an error while we are in the middle of making changes to
> on-disk and in-memory data structures?
>
> In the ideal world, we should undo the changes that we were in the
> middle of making before we return an error.  That way, the semantics
> are very clear; on success, the function has made the requested change
> to the file system.  If the function returns an error, then no changes
> should be made.
>
> That was the reasoning behind resetting ex->ee_len to orig_ex.ee_len
> in the fix_extent_len inside ext4_split_extent_at().  Unofrtunately,
> ext4_ext_insert_extent() does *not* always follow this convention, and
> that's because it would be extremely difficult for it to do so --- the
> mutations that it makes can be quite complex, including potentially
> increasing the height of the extent tree.
>
> However, I don't think your fix is by any means the ideal one, because
> the much more common way that ext4_ext_insert_extent() is when it
> needs to insert a new leaf node, or need to increase the height of the
> extent tree --- and in it returns an ENOSPC failure.  In that case, it
> won't have made any changes changes in the extent tree, and so having
> ext4_split_extent_at() undo the change to ex->ee_len is the right
> thing to do.
>
> Having blocks get leaked when there is an ENOSPC failure, requiring
> fsck to be run --- and without giving the user any warning that this
> has happened is *not* a good way to fail.  So I don't think the
> proposed patch is the right way to go.
>
> A better way to go would be to teach ext4_ext_insert_extent() so if
> there is a late failure, that it unwinds the leaf node back to its
> original state (at least from a semantic value).  Since the extent
> leaf node could have been split, and/or adjacent extent entries may
> have been merged, what it would need to do is to remember the starting
> block number and length, and make whatever changes are necessaries to
> the extent entries in that leaf node corresponding to that starting
> block number and length.
>
> If you don't want to do that, then a "do no harm" fix would be
> something like this:
>
> 	...
> 	} else if (err == -EROFS) {
> 		return err;
> 	} else if (err)
> 		goto fix_extent_len;
Thanks  for your advice. I will send v2 patch.
>
> So in the journal abort case, when err is set to EROFS, we don't try
> to reset the length, since in theory the file system is read-only
> already anyway.  However, in the ENOSPC case, we won't end up silently
> leaking blocks that will be lost until the user somehow decides to run
> fsck.
>
> There are still times when this doesn't get things completely right
> (e.g., what if we get a late ENOMEM error versus an early ENOMEM
> failure), where the only real fix is to make ext4_ext_insert_extent()
> obey the convention that if it returns an error, it must not result in
> any user-visible state change.
>
> Cheers,
>
> 						- Ted
> .


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, back to index

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-25  2:29 [PATCH] ext4: Fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed Ye Bin
2021-04-06  3:34 ` Theodore Ts'o
2021-04-07  1:41   ` yebin
2021-04-09  1:47     ` Theodore Ts'o
2021-04-20 14:05       ` yebin

Linux-ext4 Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-ext4/0 linux-ext4/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-ext4 linux-ext4/ https://lore.kernel.org/linux-ext4 \
		linux-ext4@vger.kernel.org
	public-inbox-index linux-ext4

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-ext4


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git