[PATCH] Btrfs: clean up resources during umount after trans is aborted

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] Btrfs: clean up resources during umount after trans is aborted
@ 2018-03-30 22:11 Liu Bo
  2018-04-05 16:11 ` David Sterba
  2018-04-17 13:34 ` David Sterba
  0 siblings, 2 replies; 6+ messages in thread
From: Liu Bo @ 2018-03-30 22:11 UTC (permalink / raw)
  To: linux-btrfs

Currently if some fatal errors occur, like all IO get -EIO, resources
would be cleaned up when
a) transaction is being committed or
b) BTRFS_FS_STATE_ERROR is set

However, in some rare cases, resources may be left alone after transaction
gets aborted and umount may run into some ASSERT(), e.g.
ASSERT(list_empty(&block_group->dirty_list));

For case a), in btrfs_commit_transaciton(), there're several places at the
beginning where we just call btrfs_end_transaction() without cleaning up
resources.  For case b), it is possible that the trans handle doesn't have
any dirty stuff, then only trans hanlde is marked as aborted while
BTRFS_FS_STATE_ERROR is not set, so resources remain in memory.

This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that
all resources won't stay in memory after umount.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
---
 fs/btrfs/disk-io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 21f34ad..643bd69 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3735,7 +3735,8 @@ void close_ctree(struct btrfs_fs_info *fs_info)
 			btrfs_err(fs_info, "commit super ret %d", ret);
 	}

-	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
+	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state) ||
+	    test_bit(BTRFS_FS_STATE_TRANS_ABORTED, &fs_info->fs_state))
 		btrfs_error_commit_super(fs_info);

 	kthread_stop(fs_info->transaction_kthread);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] Btrfs: clean up resources during umount after trans is aborted
  2018-03-30 22:11 [PATCH] Btrfs: clean up resources during umount after trans is aborted Liu Bo
@ 2018-04-05 16:11 ` David Sterba
  2018-04-05 18:45   ` Liu Bo
  2018-04-17 13:34 ` David Sterba
  1 sibling, 1 reply; 6+ messages in thread
From: David Sterba @ 2018-04-05 16:11 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Sat, Mar 31, 2018 at 06:11:56AM +0800, Liu Bo wrote:
> Currently if some fatal errors occur, like all IO get -EIO, resources
> would be cleaned up when
> a) transaction is being committed or
> b) BTRFS_FS_STATE_ERROR is set
> 
> However, in some rare cases, resources may be left alone after transaction
> gets aborted and umount may run into some ASSERT(), e.g.
> ASSERT(list_empty(&block_group->dirty_list));
> 
> For case a), in btrfs_commit_transaciton(), there're several places at the
> beginning where we just call btrfs_end_transaction() without cleaning up
> resources.  For case b), it is possible that the trans handle doesn't have
> any dirty stuff, then only trans hanlde is marked as aborted while
> BTRFS_FS_STATE_ERROR is not set, so resources remain in memory.
> 
> This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that
> all resources won't stay in memory after umount.
> 
> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>

Is it possible that the following stactrace could be caused by the
missing check? It roughly matches what you describe (ie. close_ctree and
unreleased resources). This is from generic/475, that does some error
injection:

[16991.455178] WARNING: CPU: 6 PID: 23518 at fs/btrfs/extent-tree.c:9896 btrfs_free_block_groups+0x2c8/0x420 [btrfs]

[16991.621105]  close_ctree+0x114/0x2d0 [btrfs]
[16991.625482]  generic_shutdown_super+0x6c/0x120
[16991.630025]  kill_anon_super+0xe/0x20
[16991.633820]  btrfs_kill_super+0x13/0x100 [btrfs]
[16991.638550]  deactivate_locked_super+0x3f/0x70
[16991.643332]  cleanup_mnt+0x3b/0x70
[16991.646889]  task_work_run+0x89/0xa0
[16991.650565]  exit_to_usermode_loop+0x79/0xa3
[16991.654985]  do_syscall_64+0xe9/0x110
[16991.658841]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Btrfs: clean up resources during umount after trans is aborted
  2018-04-05 16:11 ` David Sterba
@ 2018-04-05 18:45   ` Liu Bo
  2018-04-06 13:16     ` David Sterba
  0 siblings, 1 reply; 6+ messages in thread
From: Liu Bo @ 2018-04-05 18:45 UTC (permalink / raw)
  To: David Sterba, Liu Bo, linux-btrfs

On Thu, Apr 5, 2018 at 9:11 AM, David Sterba <dsterba@suse.cz> wrote:
> On Sat, Mar 31, 2018 at 06:11:56AM +0800, Liu Bo wrote:
>> Currently if some fatal errors occur, like all IO get -EIO, resources
>> would be cleaned up when
>> a) transaction is being committed or
>> b) BTRFS_FS_STATE_ERROR is set
>>
>> However, in some rare cases, resources may be left alone after transaction
>> gets aborted and umount may run into some ASSERT(), e.g.
>> ASSERT(list_empty(&block_group->dirty_list));
>>
>> For case a), in btrfs_commit_transaciton(), there're several places at the
>> beginning where we just call btrfs_end_transaction() without cleaning up
>> resources.  For case b), it is possible that the trans handle doesn't have
>> any dirty stuff, then only trans hanlde is marked as aborted while
>> BTRFS_FS_STATE_ERROR is not set, so resources remain in memory.
>>
>> This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that
>> all resources won't stay in memory after umount.
>>
>> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
>
> Is it possible that the following stactrace could be caused by the
> missing check? It roughly matches what you describe (ie. close_ctree and
> unreleased resources). This is from generic/475, that does some error
> injection:
>
> [16991.455178] WARNING: CPU: 6 PID: 23518 at fs/btrfs/extent-tree.c:9896 btrfs_free_block_groups+0x2c8/0x420 [btrfs]
>

Hmm...I don't think so, while running 475, the one I got pretty stable is
ASSERT(list_empty(&block_group->dirty_list));

And I did see this warning a few times, but I thought that was due to
the new flag (ZERO) of fallocate for which we had fixes from Filipe,
not sure if they've been merged?

Anyway, let me double check.

thanks,
liubo

> [16991.621105]  close_ctree+0x114/0x2d0 [btrfs]
> [16991.625482]  generic_shutdown_super+0x6c/0x120
> [16991.630025]  kill_anon_super+0xe/0x20
> [16991.633820]  btrfs_kill_super+0x13/0x100 [btrfs]
> [16991.638550]  deactivate_locked_super+0x3f/0x70
> [16991.643332]  cleanup_mnt+0x3b/0x70
> [16991.646889]  task_work_run+0x89/0xa0
> [16991.650565]  exit_to_usermode_loop+0x79/0xa3
> [16991.654985]  do_syscall_64+0xe9/0x110
> [16991.658841]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Btrfs: clean up resources during umount after trans is aborted
  2018-04-05 18:45   ` Liu Bo
@ 2018-04-06 13:16     ` David Sterba
  0 siblings, 0 replies; 6+ messages in thread
From: David Sterba @ 2018-04-06 13:16 UTC (permalink / raw)
  To: Liu Bo; +Cc: David Sterba, Liu Bo, linux-btrfs

On Thu, Apr 05, 2018 at 11:45:55AM -0700, Liu Bo wrote:
> On Thu, Apr 5, 2018 at 9:11 AM, David Sterba <dsterba@suse.cz> wrote:
> > On Sat, Mar 31, 2018 at 06:11:56AM +0800, Liu Bo wrote:
> >> Currently if some fatal errors occur, like all IO get -EIO, resources
> >> would be cleaned up when
> >> a) transaction is being committed or
> >> b) BTRFS_FS_STATE_ERROR is set
> >>
> >> However, in some rare cases, resources may be left alone after transaction
> >> gets aborted and umount may run into some ASSERT(), e.g.
> >> ASSERT(list_empty(&block_group->dirty_list));
> >>
> >> For case a), in btrfs_commit_transaciton(), there're several places at the
> >> beginning where we just call btrfs_end_transaction() without cleaning up
> >> resources.  For case b), it is possible that the trans handle doesn't have
> >> any dirty stuff, then only trans hanlde is marked as aborted while
> >> BTRFS_FS_STATE_ERROR is not set, so resources remain in memory.
> >>
> >> This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that
> >> all resources won't stay in memory after umount.
> >>
> >> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> >
> > Is it possible that the following stactrace could be caused by the
> > missing check? It roughly matches what you describe (ie. close_ctree and
> > unreleased resources). This is from generic/475, that does some error
> > injection:
> >
> > [16991.455178] WARNING: CPU: 6 PID: 23518 at fs/btrfs/extent-tree.c:9896 btrfs_free_block_groups+0x2c8/0x420 [btrfs]
> >
> 
> Hmm...I don't think so, while running 475, the one I got pretty stable is
> ASSERT(list_empty(&block_group->dirty_list));

There's a number of things that 475 catches so this might depend on
timing, memory, disks etc.

> And I did see this warning a few times, but I thought that was due to
> the new flag (ZERO) of fallocate for which we had fixes from Filipe,
> not sure if they've been merged?

Merged to 4.15:

* f27451f22996687 Btrfs: add support for fallocate's zero range operation
* 9f13ce743b1bd4e Btrfs: fix missing inode i_size update after zero range operation

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Btrfs: clean up resources during umount after trans is aborted
  2018-03-30 22:11 [PATCH] Btrfs: clean up resources during umount after trans is aborted Liu Bo
  2018-04-05 16:11 ` David Sterba
@ 2018-04-17 13:34 ` David Sterba
  2018-04-17 17:14   ` Liu Bo
  1 sibling, 1 reply; 6+ messages in thread
From: David Sterba @ 2018-04-17 13:34 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Sat, Mar 31, 2018 at 06:11:56AM +0800, Liu Bo wrote:
> Currently if some fatal errors occur, like all IO get -EIO, resources
> would be cleaned up when
> a) transaction is being committed or
> b) BTRFS_FS_STATE_ERROR is set
> 
> However, in some rare cases, resources may be left alone after transaction
> gets aborted and umount may run into some ASSERT(), e.g.
> ASSERT(list_empty(&block_group->dirty_list));
> 
> For case a), in btrfs_commit_transaciton(), there're several places at the
> beginning where we just call btrfs_end_transaction() without cleaning up
> resources.  For case b), it is possible that the trans handle doesn't have
> any dirty stuff, then only trans hanlde is marked as aborted while
> BTRFS_FS_STATE_ERROR is not set, so resources remain in memory.

I have some doubts here. The flag BTRFS_FS_STATE_TRANS_ABORTED was added
to avoid duplicate warnings when the transaction is aborted for the
first time. And nothing else.

BTRFS_FS_STATE_ERROR should track the overall state and is checked in
several places if a post-error cleanup is necessary.

Calling abort should set BTRFS_FS_STATE_ERROR immediatelly so any caller
up the callchaing can potentially cleanup.

> This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that
> all resources won't stay in memory after umount.

The question is why BTRFS_FS_STATE_ERROR is not set but we still got
here and the filesystem is in an error state.

You're not specific about the rare cases where the resources are left
after abort and the umount is about to start. I think the rare cases
need to be identified and understood otherwise this seems like papering
over the problem.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Btrfs: clean up resources during umount after trans is aborted
  2018-04-17 13:34 ` David Sterba
@ 2018-04-17 17:14   ` Liu Bo
  0 siblings, 0 replies; 6+ messages in thread
From: Liu Bo @ 2018-04-17 17:14 UTC (permalink / raw)
  To: dsterba, linux-btrfs

On Tue, Apr 17, 2018 at 03:34:04PM +0200, David Sterba wrote:
> On Sat, Mar 31, 2018 at 06:11:56AM +0800, Liu Bo wrote:
> > Currently if some fatal errors occur, like all IO get -EIO, resources
> > would be cleaned up when
> > a) transaction is being committed or
> > b) BTRFS_FS_STATE_ERROR is set
> > 
> > However, in some rare cases, resources may be left alone after transaction
> > gets aborted and umount may run into some ASSERT(), e.g.
> > ASSERT(list_empty(&block_group->dirty_list));
> > 
> > For case a), in btrfs_commit_transaciton(), there're several places at the
> > beginning where we just call btrfs_end_transaction() without cleaning up
> > resources.  For case b), it is possible that the trans handle doesn't have
> > any dirty stuff, then only trans hanlde is marked as aborted while
> > BTRFS_FS_STATE_ERROR is not set, so resources remain in memory.
> 
> I have some doubts here. The flag BTRFS_FS_STATE_TRANS_ABORTED was added
> to avoid duplicate warnings when the transaction is aborted for the
> first time. And nothing else.
> 
> BTRFS_FS_STATE_ERROR should track the overall state and is checked in
> several places if a post-error cleanup is necessary.
> 
> Calling abort should set BTRFS_FS_STATE_ERROR immediatelly so any caller
> up the callchaing can potentially cleanup.
> 
> > This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that
> > all resources won't stay in memory after umount.
> 
> The question is why BTRFS_FS_STATE_ERROR is not set but we still got
> here and the filesystem is in an error state.
> 
> You're not specific about the rare cases where the resources are left
> after abort and the umount is about to start. I think the rare cases
> need to be identified and understood otherwise this seems like papering
> over the problem.

Yes, that makes sense.

btrfs_commit_transaction()
   -> btrfs_run_delayed_refs() # the very 1st one
      -> __btrfs_run_delayed_refs() # bail out with errors e.g. -EIO, -ENOSPC
      -> btrfs_abort_transaction()
         -> # set BTRFS_FS_STATE_TRANS_ABORTED
	 -> __btrfs_abort_transaction() # !trans->dirty and no new_bgs,
	                                   then return without setting
					   BTRFS_FS_STATA_ERROR

While trans handle is aborted, cur_trans is not aborted, so that other
threads would join this @cur_trans and it's likely that they would go
thru the same process.  Eventually umount is called,

close_ctree()
   -> btrfs_commit_super()
       -> join_transaction()
       -> btrfs_commit_transaction() # bail out due to the same reason
   -> ...  # btrfs_error_commit_super is not called due to STATE_ERROR not being set.


So as we can see, if btrfs_commit_transaction() enters COMMIT phrase,
then cleanup_transaction would handle the rest well, otherwise we'd
get this.

thanks,
-liubo

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-04-17 17:14 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-30 22:11 [PATCH] Btrfs: clean up resources during umount after trans is aborted Liu Bo
2018-04-05 16:11 ` David Sterba
2018-04-05 18:45   ` Liu Bo
2018-04-06 13:16     ` David Sterba
2018-04-17 13:34 ` David Sterba
2018-04-17 17:14   ` Liu Bo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.