* [PATCH v2 0/2] btrfs: qgroup: Fix deadlock where btrfs_qgroup_wait_for_completion() waits for never-queued work
@ 2020-02-07  5:38 Qu Wenruo
  2020-02-07  5:38 ` [PATCH v2 1/2] btrfs: qgroup: Ensure qgroup_rescan_running is only set when the worker is at least queued Qu Wenruo
  2020-02-07  5:38 ` [PATCH v2 2/2] btrfs: qgroup: Remove the unnecessary spin lock for qgroup_rescan_running|queued Qu Wenruo
  0 siblings, 2 replies; 6+ messages in thread
From: Qu Wenruo @ 2020-02-07  5:38 UTC (permalink / raw)
  To: linux-btrfs

There is a long-standing report about btrfs hanging at unmount time,
waiting for the qgroup rescan.

Jeff submitted a patch for that, but it was never merged:
https://patchwork.kernel.org/patch/10376585/

After re-digging into the case, although Jeff's fix can solve the
problem, the claimed racy cause doesn't look correct to me.

After all, close_ctree() waits for the qgroup rescan before destroying
the related work queues. Thus as long as the work is queued, we can
finish the wait without a problem.
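
For reference, a heavily trimmed sketch of that ordering in
close_ctree() (paraphrased for illustration, not the literal source;
only the two calls relevant to this series are shown):

  void close_ctree(struct btrfs_fs_info *fs_info)
  {
          /*
           * Wait for a possibly running rescan worker. This returns
           * early only when qgroup_rescan_running is false.
           */
          btrfs_qgroup_wait_for_completion(fs_info, false);

          /* ... */

          /* The work queues are destroyed only after the wait. */
          btrfs_stop_all_workers(fs_info);

          /* ... */
  }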

Digging further into the bug, it looks like the deadlock is indeed
possible, and Jeff is right about the wait-for-never-queued-work part.
But the racy part doesn't look possible; the deadlock should only
happen after something else has already gone wrong.

Now with a proper cause analysis, we can craft a much smaller and thus
better fix (and since I'm the one who backports these, smaller is
always better).

Changelog:
v2:
- Change the subject
  It's not about a race. I got confused by the initial patch.

- Change the cause analysis
  No race is needed. Also add analysis of all qgroup_rescan_init()
  callers to ensure no fixes are missing.
  Note that qgroup_rescan_init() uses the BTRFS_QGROUP_STATUS_FLAG_RESCAN
  flag to determine whether there is a conflicting rescan, thus it's
  not affected by the timing change.

- Split the spinlock cleanup into another patch

Qu Wenruo (2):
  btrfs: qgroup: Ensure qgroup_rescan_running is only set when the
    worker is at least queued
  btrfs: qgroup: Remove the unnecessary spin lock for
    qgroup_rescan_running|queued

 fs/btrfs/qgroup.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

-- 
2.25.0



* [PATCH v2 1/2] btrfs: qgroup: Ensure qgroup_rescan_running is only set when the worker is at least queued
  2020-02-07  5:38 [PATCH v2 0/2] btrfs: qgroup: Fix deadlock where btrfs_qgroup_wait_for_completion() waits for never-queued work Qu Wenruo
@ 2020-02-07  5:38 ` Qu Wenruo
  2020-02-07  5:38 ` [PATCH v2 2/2] btrfs: qgroup: Remove the unnecessary spin lock for qgroup_rescan_running|queued Qu Wenruo
  1 sibling, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2020-02-07  5:38 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Jeff Mahoney

[BUG]
There are reports about btrfs hanging forever at unmount, with the
following call trace:
  INFO: task umount:4631 blocked for more than 491 seconds.
        Tainted: G               X  5.3.8-2-default #1
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  umount          D    0  4631   3337 0x00000000
  Call Trace:
  ([<00000000174adf7a>] __schedule+0x342/0x748)
   [<00000000174ae3ca>] schedule+0x4a/0xd8
   [<00000000174b1f08>] schedule_timeout+0x218/0x420
   [<00000000174af10c>] wait_for_common+0x104/0x1d8
   [<000003ff804d6994>] btrfs_qgroup_wait_for_completion+0x84/0xb0 [btrfs]
   [<000003ff8044a616>] close_ctree+0x4e/0x380 [btrfs]
   [<0000000016fa3136>] generic_shutdown_super+0x8e/0x158
   [<0000000016fa34d6>] kill_anon_super+0x26/0x40
   [<000003ff8041ba88>] btrfs_kill_super+0x28/0xc8 [btrfs]
   [<0000000016fa39f8>] deactivate_locked_super+0x68/0x98
   [<0000000016fcb198>] cleanup_mnt+0xc0/0x140
   [<0000000016d6a846>] task_work_run+0xc6/0x110
   [<0000000016d04f76>] do_notify_resume+0xae/0xb8
   [<00000000174b30ae>] system_call+0xe2/0x2c8

[CAUSE]
The problem happens when qgroup_rescan_init() has been called but the
worker is never queued, which is mostly caused by error handling.

	Qgroup ioctl thread		|	Unmount thread
----------------------------------------+-----------------------------------
					|
btrfs_qgroup_rescan()			|
|- qgroup_rescan_init()			|
|  |- qgroup_rescan_running = true;	|
|					|
|- trans = btrfs_join_transaction()	|
|  Some error happened			|
|					|
|- btrfs_qgroup_rescan() returns error	|
   But qgroup_rescan_running == true;	|
					| close_ctree()
					| |- btrfs_qgroup_wait_for_completion()
					|    |- running == true;
					|    |- wait_for_completion();

btrfs_qgroup_rescan_worker() is never queued, thus nothing will ever
wake up close_ctree() and we get a deadlock.
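
Expressed as code, the pre-patch flow looks roughly like this (a
condensed sketch of fs/btrfs/qgroup.c before this patch, with
unrelated details and error cleanup dropped):

  static int qgroup_rescan_init(struct btrfs_fs_info *fs_info, ...)
  {
          /* ... */
          /* Pre-patch: set long before anything is queued. */
          fs_info->qgroup_rescan_running = true;
          /* ... */
          return 0;
  }

  int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
  {
          struct btrfs_trans_handle *trans;
          int ret;

          ret = qgroup_rescan_init(fs_info, 0, 1);
          if (ret)
                  return ret;

          trans = btrfs_join_transaction(fs_info->fs_root);
          if (IS_ERR(trans))
                  /*
                   * Returning here leaves qgroup_rescan_running true
                   * with qgroup_rescan_work never queued, so a later
                   * btrfs_qgroup_wait_for_completion() waits forever.
                   */
                  return PTR_ERR(trans);
          /* ... */
          btrfs_queue_work(fs_info->qgroup_rescan_workers,
                           &fs_info->qgroup_rescan_work);
          return 0;
  }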

All involved qgroup_rescan_init() callers are:
- btrfs_qgroup_rescan()
  The example above. It's possible to trigger the deadlock when an
  error happens.

- btrfs_quota_enable()
  Not possible. The work is queued right after qgroup_rescan_init().

- btrfs_read_qgroup_config()
  It's possible to trigger the deadlock. It only initializes the work;
  the queueing happens later in btrfs_qgroup_rescan_resume().
  Thus if an error happens in between, the deadlock is possible (see
  the sketch below).

We shouldn't set fs_info->qgroup_rescan_running in qgroup_rescan_init()
alone, as at that stage we haven't yet submitted the qgroup rescan
worker to run.
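
The btrfs_read_qgroup_config() window, sketched (a simplified
mount-time flow for illustration, not the literal call chain):

  /* During mount: */
  btrfs_read_qgroup_config(fs_info);
          /* May call qgroup_rescan_init(), which (pre-patch) already
           * sets qgroup_rescan_running = true. */

  /* Any failure here aborts the mount with the flag still set. */

  btrfs_qgroup_rescan_resume(fs_info);
          /* Only here does btrfs_queue_work() actually run. */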

[FIX]
This patch fixes the problem by setting qgroup_rescan_running
immediately before queueing the work, so that the rescan work is
guaranteed to be queued whenever we wait for it.

Fixes: 8d9eddad194 ("Btrfs: fix qgroup rescan worker initialization")
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
[ Change the subject and the cause analysis, use a smaller fix ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Changelog:
v2:
- Change the subject
  It's not about a race. I got confused by the initial patch.

- Change the cause analysis
  No race is needed. Also add analysis of all qgroup_rescan_init()
  callers to ensure no fixes are missing.
  Note that qgroup_rescan_init() uses the BTRFS_QGROUP_STATUS_FLAG_RESCAN
  flag to determine whether there is a conflicting rescan, thus it's
  not affected by the timing change.

- Split the spinlock cleanup into another patch

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/qgroup.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index d4282e12f2a6..812f51f67903 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1030,6 +1030,7 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
 	ret = qgroup_rescan_init(fs_info, 0, 1);
 	if (!ret) {
 	        qgroup_rescan_zero_tracking(fs_info);
+		fs_info->qgroup_rescan_running = true;
 	        btrfs_queue_work(fs_info->qgroup_rescan_workers,
 	                         &fs_info->qgroup_rescan_work);
 	}
@@ -3272,7 +3273,6 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
 		sizeof(fs_info->qgroup_rescan_progress));
 	fs_info->qgroup_rescan_progress.objectid = progress_objectid;
 	init_completion(&fs_info->qgroup_rescan_completion);
-	fs_info->qgroup_rescan_running = true;
 
 	spin_unlock(&fs_info->qgroup_lock);
 	mutex_unlock(&fs_info->qgroup_rescan_lock);
@@ -3335,8 +3335,11 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
 
 	qgroup_rescan_zero_tracking(fs_info);
 
+	mutex_lock(&fs_info->qgroup_rescan_lock);
+	fs_info->qgroup_rescan_running = true;
 	btrfs_queue_work(fs_info->qgroup_rescan_workers,
 			 &fs_info->qgroup_rescan_work);
+	mutex_unlock(&fs_info->qgroup_rescan_lock);
 
 	return 0;
 }
@@ -3372,9 +3375,13 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
 void
 btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
 {
-	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
+	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) {
+		mutex_lock(&fs_info->qgroup_rescan_lock);
+		fs_info->qgroup_rescan_running = true;
 		btrfs_queue_work(fs_info->qgroup_rescan_workers,
 				 &fs_info->qgroup_rescan_work);
+		mutex_unlock(&fs_info->qgroup_rescan_lock);
+	}
 }
 
 /*
-- 
2.25.0



* [PATCH v2 2/2] btrfs: qgroup: Remove the unnecessary spin lock for qgroup_rescan_running|queued
  2020-02-07  5:38 [PATCH v2 0/2] btrfs: qgroup: Fix deadlock where btrfs_qgroup_wait_for_completion() waits for never-queued work Qu Wenruo
  2020-02-07  5:38 ` [PATCH v2 1/2] btrfs: qgroup: Ensure qgroup_rescan_running is only set when the worker is at least queued Qu Wenruo
@ 2020-02-07  5:38 ` Qu Wenruo
  2020-02-24 16:45   ` David Sterba
  1 sibling, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2020-02-07  5:38 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Josef Bacik

Those two members are both protected by
btrfs_fs_info::qgroup_rescan_lock, thus no need for the extra spinlock.

Suggested-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Changelog:
v2:
- New patch split in v2
---
 fs/btrfs/qgroup.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 812f51f67903..e07d6a6b2049 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3247,7 +3247,6 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
 	}
 
 	mutex_lock(&fs_info->qgroup_rescan_lock);
-	spin_lock(&fs_info->qgroup_lock);
 
 	if (init_flags) {
 		if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) {
@@ -3262,7 +3261,6 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
 		}
 
 		if (ret) {
-			spin_unlock(&fs_info->qgroup_lock);
 			mutex_unlock(&fs_info->qgroup_rescan_lock);
 			return ret;
 		}
@@ -3273,8 +3271,6 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
 		sizeof(fs_info->qgroup_rescan_progress));
 	fs_info->qgroup_rescan_progress.objectid = progress_objectid;
 	init_completion(&fs_info->qgroup_rescan_completion);
-
-	spin_unlock(&fs_info->qgroup_lock);
 	mutex_unlock(&fs_info->qgroup_rescan_lock);
 
 	btrfs_init_work(&fs_info->qgroup_rescan_work,
@@ -3351,9 +3347,7 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
 	int ret = 0;
 
 	mutex_lock(&fs_info->qgroup_rescan_lock);
-	spin_lock(&fs_info->qgroup_lock);
 	running = fs_info->qgroup_rescan_running;
-	spin_unlock(&fs_info->qgroup_lock);
 	mutex_unlock(&fs_info->qgroup_rescan_lock);
 
 	if (!running)
-- 
2.25.0



* Re: [PATCH v2 2/2] btrfs: qgroup: Remove the unnecessary spin lock for qgroup_rescan_running|queued
  2020-02-07  5:38 ` [PATCH v2 2/2] btrfs: qgroup: Remove the unnecessary spin lock for qgroup_rescan_running|queued Qu Wenruo
@ 2020-02-24 16:45   ` David Sterba
  2020-02-24 23:44     ` Qu Wenruo
  0 siblings, 1 reply; 6+ messages in thread
From: David Sterba @ 2020-02-24 16:45 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Josef Bacik

On Fri, Feb 07, 2020 at 01:38:21PM +0800, Qu Wenruo wrote:
> Those two members are both protected by
> btrfs_fs_info::qgroup_rescan_lock, thus no need for the extra spinlock.

Two members refers to btrfs_fs_info::qgroup_rescan_lock and what else?
By the subject it's something 'queued' but I can't find what it's
referring to.


* Re: [PATCH v2 2/2] btrfs: qgroup: Remove the unnecessary spin lock for qgroup_rescan_running|queued
  2020-02-24 16:45   ` David Sterba
@ 2020-02-24 23:44     ` Qu Wenruo
  2020-02-25 16:49       ` David Sterba
  0 siblings, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2020-02-24 23:44 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs, Josef Bacik


On 2020/2/25 12:45 AM, David Sterba wrote:
> On Fri, Feb 07, 2020 at 01:38:21PM +0800, Qu Wenruo wrote:
>> Those two members are both protected by
>> btrfs_fs_info::qgroup_rescan_lock, thus no need for the extra spinlock.
> 
> Two members refers to btrfs_fs_info::qgroup_rescan_lock and what else?
> By the subject it's something 'queued' but I can't find what it's
> referring to.
> 
My bad, with the latest version, there is only qgroup_rescan_running, no
qgroup_rescan_queued.

Just one member now.

Do I need to resend the patch with commit message updated?

Thanks,
Qu




* Re: [PATCH v2 2/2] btrfs: qgroup: Remove the unnecessary spin lock for qgroup_rescan_running|queued
  2020-02-24 23:44     ` Qu Wenruo
@ 2020-02-25 16:49       ` David Sterba
  0 siblings, 0 replies; 6+ messages in thread
From: David Sterba @ 2020-02-25 16:49 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs, Josef Bacik

On Tue, Feb 25, 2020 at 07:44:12AM +0800, Qu Wenruo wrote:
> 
> 
> On 2020/2/25 12:45 AM, David Sterba wrote:
> > On Fri, Feb 07, 2020 at 01:38:21PM +0800, Qu Wenruo wrote:
> >> Those two members are both protected by
> >> btrfs_fs_info::qgroup_rescan_lock, thus no need for the extra spinlock.
> > 
> > Two members refers to btrfs_fs_info::qgroup_rescan_lock and what else?
> > By the subject it's something 'queued' but I can't find what it's
> > referring to.
> > 
> My bad, with the latest version, there is only qgroup_rescan_running, no
> qgroup_rescan_queued.
> 
> Just one member now.
> 
> Do I need to resend the patch with commit message updated?

Not needed, I only wanted to make sure I'm not missing something. I'll
update the changelog and push it to misc-next.


Thread overview: 6+ messages
2020-02-07  5:38 [PATCH v2 0/2] btrfs: qgroup: Fix deadlock where btrfs_qgroup_wait_for_completion() waits for never-queued work Qu Wenruo
2020-02-07  5:38 ` [PATCH v2 1/2] btrfs: qgroup: Ensure qgroup_rescan_running is only set when the worker is at least queued Qu Wenruo
2020-02-07  5:38 ` [PATCH v2 2/2] btrfs: qgroup: Remove the unnecessary spin lock for qgroup_rescan_running|queued Qu Wenruo
2020-02-24 16:45   ` David Sterba
2020-02-24 23:44     ` Qu Wenruo
2020-02-25 16:49       ` David Sterba
