Linux-BTRFS Archive on lore.kernel.org
* [PATCH v3 0/3] btrfs: qgroup rescan races (part 1)
@ 2018-05-02 21:11 jeffm
  2018-05-02 21:11 ` [PATCH 1/3] btrfs: qgroups, fix rescan worker running races jeffm
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: jeffm @ 2018-05-02 21:11 UTC (permalink / raw)
  To: dsterba, linux-btrfs; +Cc: Jeff Mahoney

From: Jeff Mahoney <jeffm@suse.com>

Hi Dave -

Here's the updated patchset for the rescan races.  This fixes the issue
where we'd try to start multiple workers.  It introduces a new "ready"
bool that we set during initialization and clear while queuing the worker.
The queuer is also now responsible for most of the initialization.

I have a separate patch set started that gets rid of the racy mess surrounding
the rescan worker startup.  We can handle it in btrfs_run_qgroups and
just set a flag to start it everywhere else.

-Jeff

---

Jeff Mahoney (3):
  btrfs: qgroups, fix rescan worker running races
  btrfs: qgroups, remove unnecessary memset before btrfs_init_work
  btrfs: qgroup, don't try to insert status item after ENOMEM in rescan
    worker

 fs/btrfs/async-thread.c |   1 +
 fs/btrfs/ctree.h        |   2 +
 fs/btrfs/qgroup.c       | 100 +++++++++++++++++++++++++++---------------------
 3 files changed, 60 insertions(+), 43 deletions(-)

-- 
2.12.3


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-05-02 21:11 [PATCH v3 0/3] btrfs: qgroup rescan races (part 1) jeffm
@ 2018-05-02 21:11 ` jeffm
  2018-05-03  7:24   ` Nikolay Borisov
                     ` (2 more replies)
  2018-05-02 21:11 ` [PATCH 2/3] btrfs: qgroups, remove unnecessary memset before btrfs_init_work jeffm
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 17+ messages in thread
From: jeffm @ 2018-05-02 21:11 UTC (permalink / raw)
  To: dsterba, linux-btrfs; +Cc: Jeff Mahoney

From: Jeff Mahoney <jeffm@suse.com>

Commit 8d9eddad194 ("Btrfs: fix qgroup rescan worker initialization")
fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
ended up reintroducing the hang-on-unmount bug that had been addressed
by the commit it intended to fix.

The race this time is between qgroup_rescan_init setting
->qgroup_rescan_running = true and the worker starting.  There are
many scenarios where we initialize the worker and never start it.  The
completion btrfs_ioctl_quota_rescan_wait waits for will never come.
This can happen even without involving error handling, since mounting
the file system read-only returns between initializing the worker and
queueing it.

The right place to set the flag is when we're queuing the worker.  The
flag really just means that btrfs_ioctl_quota_rescan_wait should wait
for a completion.

Since the BTRFS_QGROUP_STATUS_FLAG_RESCAN flag is overloaded to
refer to both runtime behavior and on-disk state, we introduce a new
fs_info->qgroup_rescan_ready flag to indicate that we're initialized
and waiting to start.

This patch introduces a new helper, queue_rescan_worker, that handles
most of the initialization, the two flags, and queuing the worker,
including races with unmount.

While we're at it, ->qgroup_rescan_running is protected only by the
->qgroup_rescan_lock mutex.  btrfs_ioctl_quota_rescan_wait doesn't need
to take the spinlock too.

Fixes: 8d9eddad194 ("Btrfs: fix qgroup rescan worker initialization")
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
---
 fs/btrfs/ctree.h  |  2 ++
 fs/btrfs/qgroup.c | 94 +++++++++++++++++++++++++++++++++----------------------
 2 files changed, 58 insertions(+), 38 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index da308774b8a4..4003498bb714 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1045,6 +1045,8 @@ struct btrfs_fs_info {
 	struct btrfs_workqueue *qgroup_rescan_workers;
 	struct completion qgroup_rescan_completion;
 	struct btrfs_work qgroup_rescan_work;
+	/* qgroup rescan worker is running or queued to run */
+	bool qgroup_rescan_ready;
 	bool qgroup_rescan_running;	/* protected by qgroup_rescan_lock */
 
 	/* filesystem state */
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index aa259d6986e1..466744741873 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -101,6 +101,7 @@ static int
 qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
 		   int init_flags);
 static void qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info);
+static void btrfs_qgroup_rescan_worker(struct btrfs_work *work);
 
 /* must be called with qgroup_ioctl_lock held */
 static struct btrfs_qgroup *find_qgroup_rb(struct btrfs_fs_info *fs_info,
@@ -2072,6 +2073,46 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
+static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
+{
+	mutex_lock(&fs_info->qgroup_rescan_lock);
+	if (btrfs_fs_closing(fs_info)) {
+		mutex_unlock(&fs_info->qgroup_rescan_lock);
+		return;
+	}
+
+	if (WARN_ON(!fs_info->qgroup_rescan_ready)) {
+		btrfs_warn(fs_info, "rescan worker not ready");
+		mutex_unlock(&fs_info->qgroup_rescan_lock);
+		return;
+	}
+	fs_info->qgroup_rescan_ready = false;
+
+	if (WARN_ON(fs_info->qgroup_rescan_running)) {
+		btrfs_warn(fs_info, "rescan worker already queued");
+		mutex_unlock(&fs_info->qgroup_rescan_lock);
+		return;
+	}
+
+	/*
+	 * Being queued is enough for btrfs_qgroup_wait_for_completion
+	 * to need to wait.
+	 */
+	fs_info->qgroup_rescan_running = true;
+	init_completion(&fs_info->qgroup_rescan_completion);
+	mutex_unlock(&fs_info->qgroup_rescan_lock);
+
+	memset(&fs_info->qgroup_rescan_work, 0,
+	       sizeof(fs_info->qgroup_rescan_work));
+
+	btrfs_init_work(&fs_info->qgroup_rescan_work,
+			btrfs_qgroup_rescan_helper,
+			btrfs_qgroup_rescan_worker, NULL, NULL);
+
+	btrfs_queue_work(fs_info->qgroup_rescan_workers,
+			 &fs_info->qgroup_rescan_work);
+}
+
 /*
  * called from commit_transaction. Writes all changed qgroups to disk.
  */
@@ -2123,8 +2164,7 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 		ret = qgroup_rescan_init(fs_info, 0, 1);
 		if (!ret) {
 			qgroup_rescan_zero_tracking(fs_info);
-			btrfs_queue_work(fs_info->qgroup_rescan_workers,
-					 &fs_info->qgroup_rescan_work);
+			queue_rescan_worker(fs_info);
 		}
 		ret = 0;
 	}
@@ -2607,6 +2647,10 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
 	if (!path)
 		goto out;
 
+	mutex_lock(&fs_info->qgroup_rescan_lock);
+	fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
+	mutex_unlock(&fs_info->qgroup_rescan_lock);
+
 	err = 0;
 	while (!err && !btrfs_fs_closing(fs_info)) {
 		trans = btrfs_start_transaction(fs_info->fs_root, 0);
@@ -2685,47 +2729,27 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
 {
 	int ret = 0;
 
-	if (!init_flags &&
-	    (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) ||
-	     !(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))) {
+	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
 		ret = -EINVAL;
 		goto err;
 	}
 
 	mutex_lock(&fs_info->qgroup_rescan_lock);
-	spin_lock(&fs_info->qgroup_lock);
-
-	if (init_flags) {
-		if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
-			ret = -EINPROGRESS;
-		else if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))
-			ret = -EINVAL;
-
-		if (ret) {
-			spin_unlock(&fs_info->qgroup_lock);
-			mutex_unlock(&fs_info->qgroup_rescan_lock);
-			goto err;
-		}
-		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
+	if (fs_info->qgroup_rescan_ready || fs_info->qgroup_rescan_running) {
+		mutex_unlock(&fs_info->qgroup_rescan_lock);
+		ret = -EINPROGRESS;
+		goto err;
 	}
 
 	memset(&fs_info->qgroup_rescan_progress, 0,
 		sizeof(fs_info->qgroup_rescan_progress));
 	fs_info->qgroup_rescan_progress.objectid = progress_objectid;
-	init_completion(&fs_info->qgroup_rescan_completion);
-	fs_info->qgroup_rescan_running = true;
+	fs_info->qgroup_rescan_ready = true;
 
-	spin_unlock(&fs_info->qgroup_lock);
 	mutex_unlock(&fs_info->qgroup_rescan_lock);
 
-	memset(&fs_info->qgroup_rescan_work, 0,
-	       sizeof(fs_info->qgroup_rescan_work));
-	btrfs_init_work(&fs_info->qgroup_rescan_work,
-			btrfs_qgroup_rescan_helper,
-			btrfs_qgroup_rescan_worker, NULL, NULL);
-
-	if (ret) {
 err:
+	if (ret) {
 		btrfs_info(fs_info, "qgroup_rescan_init failed with %d", ret);
 		return ret;
 	}
@@ -2785,9 +2809,7 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
 
 	qgroup_rescan_zero_tracking(fs_info);
 
-	btrfs_queue_work(fs_info->qgroup_rescan_workers,
-			 &fs_info->qgroup_rescan_work);
-
+	queue_rescan_worker(fs_info);
 	return 0;
 }
 
@@ -2798,9 +2820,7 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
 	int ret = 0;
 
 	mutex_lock(&fs_info->qgroup_rescan_lock);
-	spin_lock(&fs_info->qgroup_lock);
 	running = fs_info->qgroup_rescan_running;
-	spin_unlock(&fs_info->qgroup_lock);
 	mutex_unlock(&fs_info->qgroup_rescan_lock);
 
 	if (!running)
@@ -2819,12 +2839,10 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
  * this is only called from open_ctree where we're still single threaded, thus
  * locking is omitted here.
  */
-void
-btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
+void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
 {
 	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
-		btrfs_queue_work(fs_info->qgroup_rescan_workers,
-				 &fs_info->qgroup_rescan_work);
+		queue_rescan_worker(fs_info);
 }
 
 /*
-- 
2.12.3



* [PATCH 2/3] btrfs: qgroups, remove unnecessary memset before btrfs_init_work
  2018-05-02 21:11 [PATCH v3 0/3] btrfs: qgroup rescan races (part 1) jeffm
  2018-05-02 21:11 ` [PATCH 1/3] btrfs: qgroups, fix rescan worker running races jeffm
@ 2018-05-02 21:11 ` jeffm
  2018-05-02 21:11 ` [PATCH 3/3] btrfs: qgroup, don't try to insert status item after ENOMEM in rescan worker jeffm
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: jeffm @ 2018-05-02 21:11 UTC (permalink / raw)
  To: dsterba, linux-btrfs; +Cc: Jeff Mahoney

From: Jeff Mahoney <jeffm@suse.com>

btrfs_init_work clears the work struct except for ->wq, so the memset
before calling btrfs_init_work in queue_rescan_worker is unnecessary.

We'll also initialize ->wq in btrfs_init_work so that it's obvious.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
---
 fs/btrfs/async-thread.c | 1 +
 fs/btrfs/qgroup.c       | 3 ---
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index d5540749f0e5..c614fb7b9b9d 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -354,6 +354,7 @@ void btrfs_init_work(struct btrfs_work *work, btrfs_work_func_t uniq_func,
 	INIT_WORK(&work->normal_work, uniq_func);
 	INIT_LIST_HEAD(&work->ordered_list);
 	work->flags = 0;
+	work->wq = NULL;
 }
 
 static inline void __btrfs_queue_work(struct __btrfs_workqueue *wq,
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 466744741873..3d47700c6a30 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2102,9 +2102,6 @@ static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
 	init_completion(&fs_info->qgroup_rescan_completion);
 	mutex_unlock(&fs_info->qgroup_rescan_lock);
 
-	memset(&fs_info->qgroup_rescan_work, 0,
-	       sizeof(fs_info->qgroup_rescan_work));
-
 	btrfs_init_work(&fs_info->qgroup_rescan_work,
 			btrfs_qgroup_rescan_helper,
 			btrfs_qgroup_rescan_worker, NULL, NULL);
-- 
2.12.3



* [PATCH 3/3] btrfs: qgroup, don't try to insert status item after ENOMEM in rescan worker
  2018-05-02 21:11 [PATCH v3 0/3] btrfs: qgroup rescan races (part 1) jeffm
  2018-05-02 21:11 ` [PATCH 1/3] btrfs: qgroups, fix rescan worker running races jeffm
  2018-05-02 21:11 ` [PATCH 2/3] btrfs: qgroups, remove unnecessary memset before btrfs_init_work jeffm
@ 2018-05-02 21:11 ` jeffm
  2018-05-03  6:23 ` [PATCH v3 0/3] btrfs: qgroup rescan races (part 1) Nikolay Borisov
  2019-11-28  3:28 ` Qu Wenruo
  4 siblings, 0 replies; 17+ messages in thread
From: jeffm @ 2018-05-02 21:11 UTC (permalink / raw)
  To: dsterba, linux-btrfs; +Cc: Jeff Mahoney

From: Jeff Mahoney <jeffm@suse.com>

If we fail to allocate memory for a path, don't bother trying to
insert the qgroup status item.  We haven't done anything yet and the
insertion would fail too.  Just print an error and be done with it.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
---
 fs/btrfs/qgroup.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 3d47700c6a30..44d5e3da835a 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2666,7 +2666,6 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
 			btrfs_end_transaction(trans);
 	}
 
-out:
 	btrfs_free_path(path);
 
 	mutex_lock(&fs_info->qgroup_rescan_lock);
@@ -2702,13 +2701,13 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
 
 	if (btrfs_fs_closing(fs_info)) {
 		btrfs_info(fs_info, "qgroup scan paused");
-	} else if (err >= 0) {
+		err = 0;
+	} else if (err >= 0)
 		btrfs_info(fs_info, "qgroup scan completed%s",
 			err > 0 ? " (inconsistency flag cleared)" : "");
-	} else {
+out:
+	if (err < 0)
 		btrfs_err(fs_info, "qgroup scan failed with %d", err);
-	}
-
 done:
 	mutex_lock(&fs_info->qgroup_rescan_lock);
 	fs_info->qgroup_rescan_running = false;
-- 
2.12.3



* Re: [PATCH v3 0/3] btrfs: qgroup rescan races (part 1)
  2018-05-02 21:11 [PATCH v3 0/3] btrfs: qgroup rescan races (part 1) jeffm
                   ` (2 preceding siblings ...)
  2018-05-02 21:11 ` [PATCH 3/3] btrfs: qgroup, don't try to insert status item after ENOMEM in rescan worker jeffm
@ 2018-05-03  6:23 ` Nikolay Borisov
  2018-05-03 22:27   ` Jeff Mahoney
  2019-11-28  3:28 ` Qu Wenruo
  4 siblings, 1 reply; 17+ messages in thread
From: Nikolay Borisov @ 2018-05-03  6:23 UTC (permalink / raw)
  To: jeffm, dsterba, linux-btrfs



On  3.05.2018 00:11, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> Hi Dave -
> 
> Here's the updated patchset for the rescan races.  This fixes the issue
> where we'd try to start multiple workers.  It introduces a new "ready"
> bool that we set during initialization and clear while queuing the worker.
> The queuer is also now responsible for most of the initialization.
> 
> I have a separate patch set started that gets rid of the racy mess surrounding
> the rescan worker startup.  We can handle it in btrfs_run_qgroups and
> just set a flag to start it everywhere else.
I'd be interested in seeing those patches. Some time ago I sent a
patch which cleaned up the way qgroup rescan was initiated. It was done
from "btrfs_run_qgroups" and I think this is messy. Whatever we do, we
ought to have well-defined semantics for when a qgroup rescan is run;
preferably we shouldn't be conflating rescan + run (unless there is a
_really_ good reason to do so). In the past that rescan path was used
only during qgroup enabling.
> 
> -Jeff
> 
> ---
> 
> Jeff Mahoney (3):
>   btrfs: qgroups, fix rescan worker running races
>   btrfs: qgroups, remove unnecessary memset before btrfs_init_work
>   btrfs: qgroup, don't try to insert status item after ENOMEM in rescan
>     worker
> 
>  fs/btrfs/async-thread.c |   1 +
>  fs/btrfs/ctree.h        |   2 +
>  fs/btrfs/qgroup.c       | 100 +++++++++++++++++++++++++++---------------------
>  3 files changed, 60 insertions(+), 43 deletions(-)
> 


* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-05-02 21:11 ` [PATCH 1/3] btrfs: qgroups, fix rescan worker running races jeffm
@ 2018-05-03  7:24   ` Nikolay Borisov
  2018-05-03 13:39     ` Jeff Mahoney
  2018-05-10 19:49   ` Jeff Mahoney
  2018-05-10 23:04   ` Jeff Mahoney
  2 siblings, 1 reply; 17+ messages in thread
From: Nikolay Borisov @ 2018-05-03  7:24 UTC (permalink / raw)
  To: jeffm, dsterba, linux-btrfs



On  3.05.2018 00:11, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> Commit 8d9eddad194 (Btrfs: fix qgroup rescan worker initialization)
> fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
> ended up reintroducing the hang-on-unmount bug that the commit it
> intended to fix addressed.
> 
> The race this time is between qgroup_rescan_init setting
> ->qgroup_rescan_running = true and the worker starting.  There are
> many scenarios where we initialize the worker and never start it.  The
> completion btrfs_ioctl_quota_rescan_wait waits for will never come.
> This can happen even without involving error handling, since mounting
> the file system read-only returns between initializing the worker and
> queueing it.
> 
> The right place to do it is when we're queuing the worker.  The flag
> really just means that btrfs_ioctl_quota_rescan_wait should wait for
> a completion.
> 
> Since the BTRFS_QGROUP_STATUS_FLAG_RESCAN flag is overloaded to
> refer to both runtime behavior and on-disk state, we introduce a new
> fs_info->qgroup_rescan_ready to indicate that we're initialized and
> waiting to start.

Am I correct in my understanding that this qgroup_rescan_ready flag is
used to avoid qgroup_rescan_init being called AFTER it has already been
called but BEFORE queue_rescan_worker has run? Why wasn't the initial
version of this patch without this flag sufficient?

> 
> This patch introduces a new helper, queue_rescan_worker, that handles
> most of the initialization, the two flags, and queuing the worker,
> including races with unmount.
> 
> While we're at it, ->qgroup_rescan_running is protected only by the
> ->qgroup_rescan_lock mutex.  btrfs_ioctl_quota_rescan_wait doesn't need
> to take the spinlock too.
> 
> Fixes: 8d9eddad194 (Btrfs: fix qgroup rescan worker initialization)
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> ---
>  fs/btrfs/ctree.h  |  2 ++
>  fs/btrfs/qgroup.c | 94 +++++++++++++++++++++++++++++++++----------------------
>  2 files changed, 58 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index da308774b8a4..4003498bb714 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1045,6 +1045,8 @@ struct btrfs_fs_info {
>  	struct btrfs_workqueue *qgroup_rescan_workers;
>  	struct completion qgroup_rescan_completion;
>  	struct btrfs_work qgroup_rescan_work;
> +	/* qgroup rescan worker is running or queued to run */
> +	bool qgroup_rescan_ready;
>  	bool qgroup_rescan_running;	/* protected by qgroup_rescan_lock */
>  
>  	/* filesystem state */
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index aa259d6986e1..466744741873 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -101,6 +101,7 @@ static int
>  qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>  		   int init_flags);
>  static void qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info);
> +static void btrfs_qgroup_rescan_worker(struct btrfs_work *work);
>  
>  /* must be called with qgroup_ioctl_lock held */
>  static struct btrfs_qgroup *find_qgroup_rb(struct btrfs_fs_info *fs_info,
> @@ -2072,6 +2073,46 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
>  	return ret;
>  }
>  
> +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
> +{
> +	mutex_lock(&fs_info->qgroup_rescan_lock);
> +	if (btrfs_fs_closing(fs_info)) {
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +
> +	if (WARN_ON(!fs_info->qgroup_rescan_ready)) {
> +		btrfs_warn(fs_info, "rescan worker not ready");
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +	fs_info->qgroup_rescan_ready = false;
> +
> +	if (WARN_ON(fs_info->qgroup_rescan_running)) {
> +		btrfs_warn(fs_info, "rescan worker already queued");
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +
> +	/*
> +	 * Being queued is enough for btrfs_qgroup_wait_for_completion
> +	 * to need to wait.
> +	 */
> +	fs_info->qgroup_rescan_running = true;
> +	init_completion(&fs_info->qgroup_rescan_completion);
> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
> +
> +	memset(&fs_info->qgroup_rescan_work, 0,
> +	       sizeof(fs_info->qgroup_rescan_work));
> +
> +	btrfs_init_work(&fs_info->qgroup_rescan_work,
> +			btrfs_qgroup_rescan_helper,
> +			btrfs_qgroup_rescan_worker, NULL, NULL);
> +
> +	btrfs_queue_work(fs_info->qgroup_rescan_workers,
> +			 &fs_info->qgroup_rescan_work);
> +}
> +
>  /*
>   * called from commit_transaction. Writes all changed qgroups to disk.
>   */
> @@ -2123,8 +2164,7 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
>  		ret = qgroup_rescan_init(fs_info, 0, 1);
>  		if (!ret) {
>  			qgroup_rescan_zero_tracking(fs_info);
> -			btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -					 &fs_info->qgroup_rescan_work);
> +			queue_rescan_worker(fs_info);
>  		}
>  		ret = 0;
>  	}
> @@ -2607,6 +2647,10 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>  	if (!path)
>  		goto out;
>  
> +	mutex_lock(&fs_info->qgroup_rescan_lock);
> +	fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
> +
>  	err = 0;
>  	while (!err && !btrfs_fs_closing(fs_info)) {
>  		trans = btrfs_start_transaction(fs_info->fs_root, 0);
> @@ -2685,47 +2729,27 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>  {
>  	int ret = 0;
>  
> -	if (!init_flags &&
> -	    (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) ||
> -	     !(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))) {
> +	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
>  		ret = -EINVAL;
>  		goto err;
>  	}
>  
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
> -	spin_lock(&fs_info->qgroup_lock);
> -
> -	if (init_flags) {
> -		if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
> -			ret = -EINPROGRESS;
> -		else if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))
> -			ret = -EINVAL;
> -
> -		if (ret) {
> -			spin_unlock(&fs_info->qgroup_lock);
> -			mutex_unlock(&fs_info->qgroup_rescan_lock);
> -			goto err;
> -		}
> -		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
> +	if (fs_info->qgroup_rescan_ready || fs_info->qgroup_rescan_running) {
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		ret = -EINPROGRESS;
> +		goto err;
>  	}
>  
>  	memset(&fs_info->qgroup_rescan_progress, 0,
>  		sizeof(fs_info->qgroup_rescan_progress));
>  	fs_info->qgroup_rescan_progress.objectid = progress_objectid;
> -	init_completion(&fs_info->qgroup_rescan_completion);
> -	fs_info->qgroup_rescan_running = true;
> +	fs_info->qgroup_rescan_ready = true;
>  
> -	spin_unlock(&fs_info->qgroup_lock);
>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>  
> -	memset(&fs_info->qgroup_rescan_work, 0,
> -	       sizeof(fs_info->qgroup_rescan_work));
> -	btrfs_init_work(&fs_info->qgroup_rescan_work,
> -			btrfs_qgroup_rescan_helper,
> -			btrfs_qgroup_rescan_worker, NULL, NULL);
> -
> -	if (ret) {
>  err:
> +	if (ret) {
>  		btrfs_info(fs_info, "qgroup_rescan_init failed with %d", ret);
>  		return ret;
>  	}
> @@ -2785,9 +2809,7 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
>  
>  	qgroup_rescan_zero_tracking(fs_info);
>  
> -	btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -			 &fs_info->qgroup_rescan_work);
> -
> +	queue_rescan_worker(fs_info);
>  	return 0;
>  }
>  
> @@ -2798,9 +2820,7 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>  	int ret = 0;
>  
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
> -	spin_lock(&fs_info->qgroup_lock);
>  	running = fs_info->qgroup_rescan_running;
> -	spin_unlock(&fs_info->qgroup_lock);
>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>  
>  	if (!running)
> @@ -2819,12 +2839,10 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>   * this is only called from open_ctree where we're still single threaded, thus
>   * locking is omitted here.
>   */
> -void
> -btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
> +void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
>  {
>  	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
> -		btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -				 &fs_info->qgroup_rescan_work);
> +		queue_rescan_worker(fs_info);
>  }
>  
>  /*
> 


* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-05-03  7:24   ` Nikolay Borisov
@ 2018-05-03 13:39     ` Jeff Mahoney
  2018-05-03 15:52       ` Nikolay Borisov
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Mahoney @ 2018-05-03 13:39 UTC (permalink / raw)
  To: Nikolay Borisov, dsterba, linux-btrfs

On 5/3/18 3:24 AM, Nikolay Borisov wrote:
> 
> 
> On  3.05.2018 00:11, jeffm@suse.com wrote:
>> From: Jeff Mahoney <jeffm@suse.com>
>>
>> Commit 8d9eddad194 (Btrfs: fix qgroup rescan worker initialization)
>> fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
>> ended up reintroducing the hang-on-unmount bug that the commit it
>> intended to fix addressed.
>>
>> The race this time is between qgroup_rescan_init setting
>> ->qgroup_rescan_running = true and the worker starting.  There are
>> many scenarios where we initialize the worker and never start it.  The
>> completion btrfs_ioctl_quota_rescan_wait waits for will never come.
>> This can happen even without involving error handling, since mounting
>> the file system read-only returns between initializing the worker and
>> queueing it.
>>
>> The right place to do it is when we're queuing the worker.  The flag
>> really just means that btrfs_ioctl_quota_rescan_wait should wait for
>> a completion.
>>
>> Since the BTRFS_QGROUP_STATUS_FLAG_RESCAN flag is overloaded to
>> refer to both runtime behavior and on-disk state, we introduce a new
>> fs_info->qgroup_rescan_ready to indicate that we're initialized and
>> waiting to start.
> 
> Am I correct in my understanding that this qgroup_rescan_ready flag is
> used to avoid qgroup_rescan_init being called AFTER it has already been
> called but BEFORE queue_rescan_worker ? Why wasn't the initial version
> of this patch without this flag sufficient?

No, the race is between clearing the BTRFS_QGROUP_STATUS_FLAG_RESCAN
flag near the end of the worker and clearing the running flag.  The
rescan lock is dropped in between, so btrfs_rescan_init will let a new
rescan request in while we update the status item on disk.  We wouldn't
have queued another worker since that's what the warning catches, but if
there were already tasks waiting for completion, they wouldn't have been
woken since the wait queue list would be reinitialized.  There's no way
to reorder clearing the flag without changing how we handle
->qgroup_flags.  I plan on doing that separately.  This was just meant
to be the simple fix.

That we can use the ready variable to also ensure that we don't let
qgroup_rescan_init be called twice without running the rescan is a nice
bonus.

-Jeff

>>
>> This patch introduces a new helper, queue_rescan_worker, that handles
>> most of the initialization, the two flags, and queuing the worker,
>> including races with unmount.
>>
>> While we're at it, ->qgroup_rescan_running is protected only by the
>> ->qgroup_rescan_lock mutex.  btrfs_ioctl_quota_rescan_wait doesn't need
>> to take the spinlock too.
>>
>> Fixes: 8d9eddad194 (Btrfs: fix qgroup rescan worker initialization)
>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>> ---
>>  fs/btrfs/ctree.h  |  2 ++
>>  fs/btrfs/qgroup.c | 94 +++++++++++++++++++++++++++++++++----------------------
>>  2 files changed, 58 insertions(+), 38 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index da308774b8a4..4003498bb714 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -1045,6 +1045,8 @@ struct btrfs_fs_info {
>>  	struct btrfs_workqueue *qgroup_rescan_workers;
>>  	struct completion qgroup_rescan_completion;
>>  	struct btrfs_work qgroup_rescan_work;
>> +	/* qgroup rescan worker is running or queued to run */
>> +	bool qgroup_rescan_ready;
>>  	bool qgroup_rescan_running;	/* protected by qgroup_rescan_lock */
>>  
>>  	/* filesystem state */
>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>> index aa259d6986e1..466744741873 100644
>> --- a/fs/btrfs/qgroup.c
>> +++ b/fs/btrfs/qgroup.c
>> @@ -101,6 +101,7 @@ static int
>>  qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>>  		   int init_flags);
>>  static void qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info);
>> +static void btrfs_qgroup_rescan_worker(struct btrfs_work *work);
>>  
>>  /* must be called with qgroup_ioctl_lock held */
>>  static struct btrfs_qgroup *find_qgroup_rb(struct btrfs_fs_info *fs_info,
>> @@ -2072,6 +2073,46 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
>>  	return ret;
>>  }
>>  
>> +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
>> +{
>> +	mutex_lock(&fs_info->qgroup_rescan_lock);
>> +	if (btrfs_fs_closing(fs_info)) {
>> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
>> +		return;
>> +	}
>> +
>> +	if (WARN_ON(!fs_info->qgroup_rescan_ready)) {
>> +		btrfs_warn(fs_info, "rescan worker not ready");
>> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
>> +		return;
>> +	}
>> +	fs_info->qgroup_rescan_ready = false;
>> +
>> +	if (WARN_ON(fs_info->qgroup_rescan_running)) {
>> +		btrfs_warn(fs_info, "rescan worker already queued");
>> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
>> +		return;
>> +	}
>> +
>> +	/*
>> +	 * Being queued is enough for btrfs_qgroup_wait_for_completion
>> +	 * to need to wait.
>> +	 */
>> +	fs_info->qgroup_rescan_running = true;
>> +	init_completion(&fs_info->qgroup_rescan_completion);
>> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
>> +
>> +	memset(&fs_info->qgroup_rescan_work, 0,
>> +	       sizeof(fs_info->qgroup_rescan_work));
>> +
>> +	btrfs_init_work(&fs_info->qgroup_rescan_work,
>> +			btrfs_qgroup_rescan_helper,
>> +			btrfs_qgroup_rescan_worker, NULL, NULL);
>> +
>> +	btrfs_queue_work(fs_info->qgroup_rescan_workers,
>> +			 &fs_info->qgroup_rescan_work);
>> +}
>> +
>>  /*
>>   * called from commit_transaction. Writes all changed qgroups to disk.
>>   */
>> @@ -2123,8 +2164,7 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
>>  		ret = qgroup_rescan_init(fs_info, 0, 1);
>>  		if (!ret) {
>>  			qgroup_rescan_zero_tracking(fs_info);
>> -			btrfs_queue_work(fs_info->qgroup_rescan_workers,
>> -					 &fs_info->qgroup_rescan_work);
>> +			queue_rescan_worker(fs_info);
>>  		}
>>  		ret = 0;
>>  	}
>> @@ -2607,6 +2647,10 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>>  	if (!path)
>>  		goto out;
>>  
>> +	mutex_lock(&fs_info->qgroup_rescan_lock);
>> +	fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
>> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
>> +
>>  	err = 0;
>>  	while (!err && !btrfs_fs_closing(fs_info)) {
>>  		trans = btrfs_start_transaction(fs_info->fs_root, 0);
>> @@ -2685,47 +2729,27 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>>  {
>>  	int ret = 0;
>>  
>> -	if (!init_flags &&
>> -	    (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) ||
>> -	     !(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))) {
>> +	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
>>  		ret = -EINVAL;
>>  		goto err;
>>  	}
>>  
>>  	mutex_lock(&fs_info->qgroup_rescan_lock);
>> -	spin_lock(&fs_info->qgroup_lock);
>> -
>> -	if (init_flags) {
>> -		if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
>> -			ret = -EINPROGRESS;
>> -		else if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))
>> -			ret = -EINVAL;
>> -
>> -		if (ret) {
>> -			spin_unlock(&fs_info->qgroup_lock);
>> -			mutex_unlock(&fs_info->qgroup_rescan_lock);
>> -			goto err;
>> -		}
>> -		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
>> +	if (fs_info->qgroup_rescan_ready || fs_info->qgroup_rescan_running) {
>> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
>> +		ret = -EINPROGRESS;
>> +		goto err;
>>  	}
>>  
>>  	memset(&fs_info->qgroup_rescan_progress, 0,
>>  		sizeof(fs_info->qgroup_rescan_progress));
>>  	fs_info->qgroup_rescan_progress.objectid = progress_objectid;
>> -	init_completion(&fs_info->qgroup_rescan_completion);
>> -	fs_info->qgroup_rescan_running = true;
>> +	fs_info->qgroup_rescan_ready = true;
>>  
>> -	spin_unlock(&fs_info->qgroup_lock);
>>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>>  
>> -	memset(&fs_info->qgroup_rescan_work, 0,
>> -	       sizeof(fs_info->qgroup_rescan_work));
>> -	btrfs_init_work(&fs_info->qgroup_rescan_work,
>> -			btrfs_qgroup_rescan_helper,
>> -			btrfs_qgroup_rescan_worker, NULL, NULL);
>> -
>> -	if (ret) {
>>  err:
>> +	if (ret) {
>>  		btrfs_info(fs_info, "qgroup_rescan_init failed with %d", ret);
>>  		return ret;
>>  	}
>> @@ -2785,9 +2809,7 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
>>  
>>  	qgroup_rescan_zero_tracking(fs_info);
>>  
>> -	btrfs_queue_work(fs_info->qgroup_rescan_workers,
>> -			 &fs_info->qgroup_rescan_work);
>> -
>> +	queue_rescan_worker(fs_info);
>>  	return 0;
>>  }
>>  
>> @@ -2798,9 +2820,7 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>>  	int ret = 0;
>>  
>>  	mutex_lock(&fs_info->qgroup_rescan_lock);
>> -	spin_lock(&fs_info->qgroup_lock);
>>  	running = fs_info->qgroup_rescan_running;
>> -	spin_unlock(&fs_info->qgroup_lock);
>>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>>  
>>  	if (!running)
>> @@ -2819,12 +2839,10 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>>   * this is only called from open_ctree where we're still single threaded, thus
>>   * locking is omitted here.
>>   */
>> -void
>> -btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
>> +void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
>>  {
>>  	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
>> -		btrfs_queue_work(fs_info->qgroup_rescan_workers,
>> -				 &fs_info->qgroup_rescan_work);
>> +		queue_rescan_worker(fs_info);
>>  }
>>  
>>  /*
>>


-- 
Jeff Mahoney
SUSE Labs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-05-03 13:39     ` Jeff Mahoney
@ 2018-05-03 15:52       ` Nikolay Borisov
  2018-05-03 15:57         ` Jeff Mahoney
  0 siblings, 1 reply; 17+ messages in thread
From: Nikolay Borisov @ 2018-05-03 15:52 UTC (permalink / raw)
  To: Jeff Mahoney, dsterba, linux-btrfs



On  3.05.2018 16:39, Jeff Mahoney wrote:
> On 5/3/18 3:24 AM, Nikolay Borisov wrote:
>>
>>
>> On  3.05.2018 00:11, jeffm@suse.com wrote:
>>> From: Jeff Mahoney <jeffm@suse.com>
>>>
>>> Commit 8d9eddad194 (Btrfs: fix qgroup rescan worker initialization)
>>> fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
>>> ended up reintroducing the hang-on-unmount bug that the commit it
>>> intended to fix addressed.
>>>
>>> The race this time is between qgroup_rescan_init setting
>>> ->qgroup_rescan_running = true and the worker starting.  There are
>>> many scenarios where we initialize the worker and never start it.  The
>>> completion btrfs_ioctl_quota_rescan_wait waits for will never come.
>>> This can happen even without involving error handling, since mounting
>>> the file system read-only returns between initializing the worker and
>>> queueing it.
>>>
>>> The right place to do it is when we're queuing the worker.  The flag
>>> really just means that btrfs_ioctl_quota_rescan_wait should wait for
>>> a completion.
>>>
>>> Since the BTRFS_QGROUP_STATUS_FLAG_RESCAN flag is overloaded to
>>> refer to both runtime behavior and on-disk state, we introduce a new
>>> fs_info->qgroup_rescan_ready to indicate that we're initialized and
>>> waiting to start.
>>
>> Am I correct in my understanding that this qgroup_rescan_ready flag is
>> used to avoid qgroup_rescan_init being called AFTER it has already been
>> called but BEFORE queue_rescan_worker ? Why wasn't the initial version
>> of this patch without this flag sufficient?
> 
> No, the race is between clearing the BTRFS_QGROUP_STATUS_FLAG_RESCAN
> flag near the end of the worker and clearing the running flag.  The
> rescan lock is dropped in between, so btrfs_rescan_init will let a new
> rescan request in while we update the status item on disk.  We wouldn't
> have queued another worker since that's what the warning catches, but if
> there were already tasks waiting for completion, they wouldn't have been
> woken since the wait queue list would be reinitialized.  There's no way
> to reorder clearing the flag without changing how we handle
> ->qgroup_flags.  I plan on doing that separately.  This was just meant
> to be the simple fix.

Great, I think some of this information should go into the change log,
explaining what the symptoms of the race condition are.

> 
> That we can use the ready variable to also ensure that we don't let
> qgroup_rescan_init be called twice without running the rescan is a nice
> bonus.
> 
> -Jeff
> 
>>>
>>> This patch introduces a new helper, queue_rescan_worker, that handles
>>> most of the initialization, the two flags, and queuing the worker,
>>> including races with unmount.
>>>
>>> While we're at it, ->qgroup_rescan_running is protected only by the
>>> ->qgroup_rescan_mutex.  btrfs_ioctl_quota_rescan_wait doesn't need
>>> to take the spinlock too.
>>>
>>> Fixes: 8d9eddad194 (Btrfs: fix qgroup rescan worker initialization)
>>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>>> ---
>>>  fs/btrfs/ctree.h  |  2 ++
>>>  fs/btrfs/qgroup.c | 94 +++++++++++++++++++++++++++++++++----------------------
>>>  2 files changed, 58 insertions(+), 38 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>> index da308774b8a4..4003498bb714 100644
>>> --- a/fs/btrfs/ctree.h
>>> +++ b/fs/btrfs/ctree.h
>>> @@ -1045,6 +1045,8 @@ struct btrfs_fs_info {
>>>  	struct btrfs_workqueue *qgroup_rescan_workers;
>>>  	struct completion qgroup_rescan_completion;
>>>  	struct btrfs_work qgroup_rescan_work;
>>> +	/* qgroup rescan worker is running or queued to run */
>>> +	bool qgroup_rescan_ready;
>>>  	bool qgroup_rescan_running;	/* protected by qgroup_rescan_lock */
>>>  
>>>  	/* filesystem state */
>>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>>> index aa259d6986e1..466744741873 100644
>>> --- a/fs/btrfs/qgroup.c
>>> +++ b/fs/btrfs/qgroup.c
>>> @@ -101,6 +101,7 @@ static int
>>>  qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>>>  		   int init_flags);
>>>  static void qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info);
>>> +static void btrfs_qgroup_rescan_worker(struct btrfs_work *work);
>>>  
>>>  /* must be called with qgroup_ioctl_lock held */
>>>  static struct btrfs_qgroup *find_qgroup_rb(struct btrfs_fs_info *fs_info,
>>> @@ -2072,6 +2073,46 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
>>>  	return ret;
>>>  }
>>>  
>>> +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
>>> +{
>>> +	mutex_lock(&fs_info->qgroup_rescan_lock);
>>> +	if (btrfs_fs_closing(fs_info)) {
>>> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
>>> +		return;
>>> +	}
>>> +
>>> +	if (WARN_ON(!fs_info->qgroup_rescan_ready)) {
>>> +		btrfs_warn(fs_info, "rescan worker not ready");
>>> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
>>> +		return;
>>> +	}
>>> +	fs_info->qgroup_rescan_ready = false;
>>> +
>>> +	if (WARN_ON(fs_info->qgroup_rescan_running)) {
>>> +		btrfs_warn(fs_info, "rescan worker already queued");
>>> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
>>> +		return;
>>> +	}
>>> +
>>> +	/*
>>> +	 * Being queued is enough for btrfs_qgroup_wait_for_completion
>>> +	 * to need to wait.
>>> +	 */
>>> +	fs_info->qgroup_rescan_running = true;
>>> +	init_completion(&fs_info->qgroup_rescan_completion);
>>> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
>>> +
>>> +	memset(&fs_info->qgroup_rescan_work, 0,
>>> +	       sizeof(fs_info->qgroup_rescan_work));
>>> +
>>> +	btrfs_init_work(&fs_info->qgroup_rescan_work,
>>> +			btrfs_qgroup_rescan_helper,
>>> +			btrfs_qgroup_rescan_worker, NULL, NULL);
>>> +
>>> +	btrfs_queue_work(fs_info->qgroup_rescan_workers,
>>> +			 &fs_info->qgroup_rescan_work);
>>> +}
>>> +
>>>  /*
>>>   * called from commit_transaction. Writes all changed qgroups to disk.
>>>   */
>>> @@ -2123,8 +2164,7 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
>>>  		ret = qgroup_rescan_init(fs_info, 0, 1);
>>>  		if (!ret) {
>>>  			qgroup_rescan_zero_tracking(fs_info);
>>> -			btrfs_queue_work(fs_info->qgroup_rescan_workers,
>>> -					 &fs_info->qgroup_rescan_work);
>>> +			queue_rescan_worker(fs_info);
>>>  		}
>>>  		ret = 0;
>>>  	}
>>> @@ -2607,6 +2647,10 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>>>  	if (!path)
>>>  		goto out;
>>>  
>>> +	mutex_lock(&fs_info->qgroup_rescan_lock);
>>> +	fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
>>> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
>>> +
>>>  	err = 0;
>>>  	while (!err && !btrfs_fs_closing(fs_info)) {
>>>  		trans = btrfs_start_transaction(fs_info->fs_root, 0);
>>> @@ -2685,47 +2729,27 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>>>  {
>>>  	int ret = 0;
>>>  
>>> -	if (!init_flags &&
>>> -	    (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) ||
>>> -	     !(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))) {
>>> +	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
>>>  		ret = -EINVAL;
>>>  		goto err;
>>>  	}
>>>  
>>>  	mutex_lock(&fs_info->qgroup_rescan_lock);
>>> -	spin_lock(&fs_info->qgroup_lock);
>>> -
>>> -	if (init_flags) {
>>> -		if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
>>> -			ret = -EINPROGRESS;
>>> -		else if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))
>>> -			ret = -EINVAL;
>>> -
>>> -		if (ret) {
>>> -			spin_unlock(&fs_info->qgroup_lock);
>>> -			mutex_unlock(&fs_info->qgroup_rescan_lock);
>>> -			goto err;
>>> -		}
>>> -		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
>>> +	if (fs_info->qgroup_rescan_ready || fs_info->qgroup_rescan_running) {
>>> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
>>> +		ret = -EINPROGRESS;
>>> +		goto err;
>>>  	}
>>>  
>>>  	memset(&fs_info->qgroup_rescan_progress, 0,
>>>  		sizeof(fs_info->qgroup_rescan_progress));
>>>  	fs_info->qgroup_rescan_progress.objectid = progress_objectid;
>>> -	init_completion(&fs_info->qgroup_rescan_completion);
>>> -	fs_info->qgroup_rescan_running = true;
>>> +	fs_info->qgroup_rescan_ready = true;
>>>  
>>> -	spin_unlock(&fs_info->qgroup_lock);
>>>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>>>  
>>> -	memset(&fs_info->qgroup_rescan_work, 0,
>>> -	       sizeof(fs_info->qgroup_rescan_work));
>>> -	btrfs_init_work(&fs_info->qgroup_rescan_work,
>>> -			btrfs_qgroup_rescan_helper,
>>> -			btrfs_qgroup_rescan_worker, NULL, NULL);
>>> -
>>> -	if (ret) {
>>>  err:
>>> +	if (ret) {
>>>  		btrfs_info(fs_info, "qgroup_rescan_init failed with %d", ret);
>>>  		return ret;
>>>  	}
>>> @@ -2785,9 +2809,7 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
>>>  
>>>  	qgroup_rescan_zero_tracking(fs_info);
>>>  
>>> -	btrfs_queue_work(fs_info->qgroup_rescan_workers,
>>> -			 &fs_info->qgroup_rescan_work);
>>> -
>>> +	queue_rescan_worker(fs_info);
>>>  	return 0;
>>>  }
>>>  
>>> @@ -2798,9 +2820,7 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>>>  	int ret = 0;
>>>  
>>>  	mutex_lock(&fs_info->qgroup_rescan_lock);
>>> -	spin_lock(&fs_info->qgroup_lock);
>>>  	running = fs_info->qgroup_rescan_running;
>>> -	spin_unlock(&fs_info->qgroup_lock);
>>>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>>>  
>>>  	if (!running)
>>> @@ -2819,12 +2839,10 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>>>   * this is only called from open_ctree where we're still single threaded, thus
>>>   * locking is omitted here.
>>>   */
>>> -void
>>> -btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
>>> +void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
>>>  {
>>>  	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
>>> -		btrfs_queue_work(fs_info->qgroup_rescan_workers,
>>> -				 &fs_info->qgroup_rescan_work);
>>> +		queue_rescan_worker(fs_info);
>>>  }
>>>  
>>>  /*
>>>
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-05-03 15:52       ` Nikolay Borisov
@ 2018-05-03 15:57         ` Jeff Mahoney
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff Mahoney @ 2018-05-03 15:57 UTC (permalink / raw)
  To: Nikolay Borisov, dsterba, linux-btrfs


On 5/3/18 11:52 AM, Nikolay Borisov wrote:
> 
> 
> On  3.05.2018 16:39, Jeff Mahoney wrote:
>> On 5/3/18 3:24 AM, Nikolay Borisov wrote:
>>>
>>>
>>> On  3.05.2018 00:11, jeffm@suse.com wrote:
>>>> From: Jeff Mahoney <jeffm@suse.com>
>>>>
>>>> Commit 8d9eddad194 (Btrfs: fix qgroup rescan worker initialization)
>>>> fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
>>>> ended up reintroducing the hang-on-unmount bug that the commit it
>>>> intended to fix addressed.
>>>>
>>>> The race this time is between qgroup_rescan_init setting
>>>> ->qgroup_rescan_running = true and the worker starting.  There are
>>>> many scenarios where we initialize the worker and never start it.  The
>>>> completion btrfs_ioctl_quota_rescan_wait waits for will never come.
>>>> This can happen even without involving error handling, since mounting
>>>> the file system read-only returns between initializing the worker and
>>>> queueing it.
>>>>
>>>> The right place to do it is when we're queuing the worker.  The flag
>>>> really just means that btrfs_ioctl_quota_rescan_wait should wait for
>>>> a completion.
>>>>
>>>> Since the BTRFS_QGROUP_STATUS_FLAG_RESCAN flag is overloaded to
>>>> refer to both runtime behavior and on-disk state, we introduce a new
>>>> fs_info->qgroup_rescan_ready to indicate that we're initialized and
>>>> waiting to start.
>>>
>>> Am I correct in my understanding that this qgroup_rescan_ready flag is
>>> used to avoid qgroup_rescan_init being called AFTER it has already been
>>> called but BEFORE queue_rescan_worker ? Why wasn't the initial version
>>> of this patch without this flag sufficient?
>>
>> No, the race is between clearing the BTRFS_QGROUP_STATUS_FLAG_RESCAN
>> flag near the end of the worker and clearing the running flag.  The
>> rescan lock is dropped in between, so btrfs_rescan_init will let a new
>> rescan request in while we update the status item on disk.  We wouldn't
>> have queued another worker since that's what the warning catches, but if
>> there were already tasks waiting for completion, they wouldn't have been
>> woken since the wait queue list would be reinitialized.  There's no way
>> to reorder clearing the flag without changing how we handle
>> ->qgroup_flags.  I plan on doing that separately.  This was just meant
>> to be the simple fix.
> 
> Great, I think some of this information should go into the change log,
> explaining what the symptoms of the race condition are.

You're right.  I was treating it as a race that my patch introduced, but it
didn't.  It just complained about it.

-Jeff

-- 
Jeff Mahoney
SUSE Labs




* Re: [PATCH v3 0/3] btrfs: qgroup rescan races (part 1)
  2018-05-03  6:23 ` [PATCH v3 0/3] btrfs: qgroup rescan races (part 1) Nikolay Borisov
@ 2018-05-03 22:27   ` Jeff Mahoney
  2018-05-04  5:59     ` Nikolay Borisov
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Mahoney @ 2018-05-03 22:27 UTC (permalink / raw)
  To: Nikolay Borisov, dsterba, linux-btrfs


On 5/3/18 2:23 AM, Nikolay Borisov wrote:
> 
> 
> On  3.05.2018 00:11, jeffm@suse.com wrote:
>> From: Jeff Mahoney <jeffm@suse.com>
>>
>> Hi Dave -
>>
>> Here's the updated patchset for the rescan races.  This fixes the issue
>> where we'd try to start multiple workers.  It introduces a new "ready"
>> bool that we set during initialization and clear while queuing the worker.
>> The queuer is also now responsible for most of the initialization.
>>
>> I have a separate patch set started that gets rid of the racy mess surrounding
>> the rescan worker startup.  We can handle it in btrfs_run_qgroups and
>> just set a flag to start it everywhere else.
> I'd be interested in seeing those patches. Some time ago I sent a
> patch which cleaned up the way the qgroup rescan was initiated. It was done
> from "btrfs_run_qgroups", and I think this is messy. Whatever we do, we
> ought to have well-defined semantics for when qgroup rescans are run;
> preferably we shouldn't be conflating rescan + run (unless there is a
> _really_ good reason to do so). In the past the rescan from there was used
> only during qgroup enabling.

I think btrfs_run_qgroups is the place to do it.  Here's why:

2773 int
2774 btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
2775 {
2776         int ret = 0;
2777         struct btrfs_trans_handle *trans;
2778
2779         ret = qgroup_rescan_init(fs_info, 0, 1);
2780         if (ret)
2781                 return ret;
2782
2783         /*
2784          * We have set the rescan_progress to 0, which means no more
2785          * delayed refs will be accounted by btrfs_qgroup_account_ref.
2786          * However, btrfs_qgroup_account_ref may be right after its call
2787          * to btrfs_find_all_roots, in which case it would still do the
2788          * accounting.
2789          * To solve this, we're committing the transaction, which will
2790          * ensure we run all delayed refs and only after that, we are
2791          * going to clear all tracking information for a clean start.
2792          */
2793
2794         trans = btrfs_join_transaction(fs_info->fs_root);
2795         if (IS_ERR(trans)) {
2796                 fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
2797                 return PTR_ERR(trans);
2798         }
2799         ret = btrfs_commit_transaction(trans);
2800         if (ret) {
2801                 fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
2802                 return ret;
2803         }
2804
2805         qgroup_rescan_zero_tracking(fs_info);
2806
2807         queue_rescan_worker(fs_info);
2808         return 0;
2809 }

The delayed ref race should exist anywhere we initiate a rescan outside of
initially enabling qgroups.  We already zero the tracking and queue the rescan
worker in btrfs_run_qgroups when we enable qgroups.  Why not just always
queue the worker there so that initialization and execution have a clear starting point?

There are a few other races I'd like to fix as well.  We call btrfs_run_qgroups
directly from btrfs_ioctl_qgroup_assign, which is buggy since
btrfs_add_qgroup_relation only checks to see if the quota_root exists.  It will
exist as soon as btrfs_quota_enable runs but we won't have committed the
transaction yet.  The call will end up enabling quotas in the middle of a transaction.

-Jeff

-- 
Jeff Mahoney
SUSE Labs




* Re: [PATCH v3 0/3] btrfs: qgroup rescan races (part 1)
  2018-05-03 22:27   ` Jeff Mahoney
@ 2018-05-04  5:59     ` Nikolay Borisov
  2018-05-04 13:32       ` Jeff Mahoney
  0 siblings, 1 reply; 17+ messages in thread
From: Nikolay Borisov @ 2018-05-04  5:59 UTC (permalink / raw)
  To: Jeff Mahoney, dsterba, linux-btrfs



On  4.05.2018 01:27, Jeff Mahoney wrote:
> On 5/3/18 2:23 AM, Nikolay Borisov wrote:
>>
>>
>> On  3.05.2018 00:11, jeffm@suse.com wrote:
>>> From: Jeff Mahoney <jeffm@suse.com>
>>>
>>> Hi Dave -
>>>
>>> Here's the updated patchset for the rescan races.  This fixes the issue
>>> where we'd try to start multiple workers.  It introduces a new "ready"
>>> bool that we set during initialization and clear while queuing the worker.
>>> The queuer is also now responsible for most of the initialization.
>>>
>>> I have a separate patch set started that gets rid of the racy mess surrounding
>>> the rescan worker startup.  We can handle it in btrfs_run_qgroups and
>>> just set a flag to start it everywhere else.
>> I'd be interested in seeing those patches. Some time ago I sent a
>> patch which cleaned up the way the qgroup rescan was initiated. It was done
>> from "btrfs_run_qgroups", and I think this is messy. Whatever we do, we
>> ought to have well-defined semantics for when qgroup rescans are run;
>> preferably we shouldn't be conflating rescan + run (unless there is a
>> _really_ good reason to do so). In the past the rescan from there was used
>> only during qgroup enabling.
> 
> I think btrfs_run_qgroups is the place to do it.  Here's why:
> 
> 2773 int
> 2774 btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
> 2775 {
> 2776         int ret = 0;
> 2777         struct btrfs_trans_handle *trans;
> 2778
> 2779         ret = qgroup_rescan_init(fs_info, 0, 1);
> 2780         if (ret)
> 2781                 return ret;
> 2782
> 2783         /*
> 2784          * We have set the rescan_progress to 0, which means no more
> 2785          * delayed refs will be accounted by btrfs_qgroup_account_ref.
> 2786          * However, btrfs_qgroup_account_ref may be right after its call
> 2787          * to btrfs_find_all_roots, in which case it would still do the
> 2788          * accounting.
> 2789          * To solve this, we're committing the transaction, which will
> 2790          * ensure we run all delayed refs and only after that, we are
> 2791          * going to clear all tracking information for a clean start.
> 2792          */
> 2793
> 2794         trans = btrfs_join_transaction(fs_info->fs_root);
> 2795         if (IS_ERR(trans)) {
> 2796                 fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
> 2797                 return PTR_ERR(trans);
> 2798         }
> 2799         ret = btrfs_commit_transaction(trans);
> 2800         if (ret) {
> 2801                 fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
> 2802                 return ret;
> 2803         }
> 2804
> 2805         qgroup_rescan_zero_tracking(fs_info);
> 2806
> 2807         queue_rescan_worker(fs_info);
> 2808         return 0;
> 2809 }
> 
> The delayed ref race should exist anywhere we initiate a rescan outside of
> initially enabling qgroups.  We already zero the tracking and queue the rescan
> worker in btrfs_run_qgroups when we enable qgroups.  Why not just always
> queue the worker there so that initialization and execution have a clear starting point?

This is no longer true in upstream as of commit 5d23515be669 ("btrfs:
Move qgroup rescan on quota enable to btrfs_quota_enable"). Hence my
asking about this. I guess if we make it unconditional it won't increase
the complexity, but the original code, which was only run during qgroup
enable, was rather iffy. I just don't want to repeat that.



> There are a few other races I'd like to fix as well.  We call btrfs_run_qgroups
> directly from btrfs_ioctl_qgroup_assign, which is buggy since
> btrfs_add_qgroup_relation only checks to see if the quota_root exists.  It will
> exist as soon as btrfs_quota_enable runs but we won't have committed the
> transaction yet.  The call will end up enabling quotas in the middle of a transaction.
> 
> -Jeff
> 


* Re: [PATCH v3 0/3] btrfs: qgroup rescan races (part 1)
  2018-05-04  5:59     ` Nikolay Borisov
@ 2018-05-04 13:32       ` Jeff Mahoney
  2018-05-04 13:41         ` Nikolay Borisov
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Mahoney @ 2018-05-04 13:32 UTC (permalink / raw)
  To: Nikolay Borisov, dsterba, linux-btrfs


On 5/4/18 1:59 AM, Nikolay Borisov wrote:
> 
> 
> On  4.05.2018 01:27, Jeff Mahoney wrote:
>> On 5/3/18 2:23 AM, Nikolay Borisov wrote:
>>>
>>>
>>> On  3.05.2018 00:11, jeffm@suse.com wrote:
>>>> From: Jeff Mahoney <jeffm@suse.com>
>>>>
>>>> Hi Dave -
>>>>
>>>> Here's the updated patchset for the rescan races.  This fixes the issue
>>>> where we'd try to start multiple workers.  It introduces a new "ready"
>>>> bool that we set during initialization and clear while queuing the worker.
>>>> The queuer is also now responsible for most of the initialization.
>>>>
>>>> I have a separate patch set started that gets rid of the racy mess surrounding
>>>> the rescan worker startup.  We can handle it in btrfs_run_qgroups and
>>>> just set a flag to start it everywhere else.
>>> I'd be interested in seeing those patches. Some time ago I sent a
>>> patch which cleaned up the way the qgroup rescan was initiated. It was done
>>> from "btrfs_run_qgroups", and I think this is messy. Whatever we do, we
>>> ought to have well-defined semantics for when qgroup rescans are run;
>>> preferably we shouldn't be conflating rescan + run (unless there is a
>>> _really_ good reason to do so). In the past the rescan from there was used
>>> only during qgroup enabling.
>>
>> I think btrfs_run_qgroups is the place to do it.  Here's why:
>>
>> 2773 int
>> 2774 btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
>> 2775 {
>> 2776         int ret = 0;
>> 2777         struct btrfs_trans_handle *trans;
>> 2778
>> 2779         ret = qgroup_rescan_init(fs_info, 0, 1);
>> 2780         if (ret)
>> 2781                 return ret;
>> 2782
>> 2783         /*
>> 2784          * We have set the rescan_progress to 0, which means no more
>> 2785          * delayed refs will be accounted by btrfs_qgroup_account_ref.
>> 2786          * However, btrfs_qgroup_account_ref may be right after its call
>> 2787          * to btrfs_find_all_roots, in which case it would still do the
>> 2788          * accounting.
>> 2789          * To solve this, we're committing the transaction, which will
>> 2790          * ensure we run all delayed refs and only after that, we are
>> 2791          * going to clear all tracking information for a clean start.
>> 2792          */
>> 2793
>> 2794         trans = btrfs_join_transaction(fs_info->fs_root);
>> 2795         if (IS_ERR(trans)) {
>> 2796                 fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
>> 2797                 return PTR_ERR(trans);
>> 2798         }
>> 2799         ret = btrfs_commit_transaction(trans);
>> 2800         if (ret) {
>> 2801                 fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
>> 2802                 return ret;
>> 2803         }
>> 2804
>> 2805         qgroup_rescan_zero_tracking(fs_info);
>> 2806
>> 2807         queue_rescan_worker(fs_info);
>> 2808         return 0;
>> 2809 }
>>
>> The delayed ref race should exist anywhere we initiate a rescan outside of
>> initially enabling qgroups.  We already zero the tracking and queue the rescan
>> worker in btrfs_run_qgroups when we enable qgroups.  Why not just always
>> queue the worker there so that initialization and execution have a clear starting point?
> 
> This is no longer true in upstream as of commit 5d23515be669 ("btrfs:
> Move qgroup rescan on quota enable to btrfs_quota_enable"). Hence my
> asking about this. I guess if we make it unconditional it won't increase
> the complexity, but the original code, which was only run during qgroup
> enable, was rather iffy. I just don't want to repeat that.

Ah, ok.  My repo is still using v4.16.  How does this work with the race
that is described in btrfs_qgroup_rescan?

-Jeff

-- 
Jeff Mahoney
SUSE Labs




* Re: [PATCH v3 0/3] btrfs: qgroup rescan races (part 1)
  2018-05-04 13:32       ` Jeff Mahoney
@ 2018-05-04 13:41         ` Nikolay Borisov
  0 siblings, 0 replies; 17+ messages in thread
From: Nikolay Borisov @ 2018-05-04 13:41 UTC (permalink / raw)
  To: Jeff Mahoney, dsterba, linux-btrfs



On  4.05.2018 16:32, Jeff Mahoney wrote:
> On 5/4/18 1:59 AM, Nikolay Borisov wrote:
>>
>>
>> On  4.05.2018 01:27, Jeff Mahoney wrote:
>>> On 5/3/18 2:23 AM, Nikolay Borisov wrote:
>>>>
>>>>
>>>> On  3.05.2018 00:11, jeffm@suse.com wrote:
>>>>> From: Jeff Mahoney <jeffm@suse.com>
>>>>>
>>>>> Hi Dave -
>>>>>
>>>>> Here's the updated patchset for the rescan races.  This fixes the issue
>>>>> where we'd try to start multiple workers.  It introduces a new "ready"
>>>>> bool that we set during initialization and clear while queuing the worker.
>>>>> The queuer is also now responsible for most of the initialization.
>>>>>
>>>>> I have a separate patch set started that gets rid of the racy mess surrounding
>>>>> the rescan worker startup.  We can handle it in btrfs_run_qgroups and
>>>>> just set a flag to start it everywhere else.
>>>> I'd be interested in seeing those patches. Some time ago I sent a
>>>> patch which cleaned up the way the qgroup rescan was initiated. It was done
>>>> from "btrfs_run_qgroups", and I think this is messy. Whatever we do, we
>>>> ought to have well-defined semantics for when qgroup rescans are run;
>>>> preferably we shouldn't be conflating rescan + run (unless there is a
>>>> _really_ good reason to do so). In the past the rescan from there was used
>>>> only during qgroup enabling.
>>>
>>> I think btrfs_run_qgroups is the place to do it.  Here's why:
>>>
>>> 2773 int
>>> 2774 btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
>>> 2775 {
>>> 2776         int ret = 0;
>>> 2777         struct btrfs_trans_handle *trans;
>>> 2778
>>> 2779         ret = qgroup_rescan_init(fs_info, 0, 1);
>>> 2780         if (ret)
>>> 2781                 return ret;
>>> 2782
>>> 2783         /*
>>> 2784          * We have set the rescan_progress to 0, which means no more
>>> 2785          * delayed refs will be accounted by btrfs_qgroup_account_ref.
>>> 2786          * However, btrfs_qgroup_account_ref may be right after its call
>>> 2787          * to btrfs_find_all_roots, in which case it would still do the
>>> 2788          * accounting.
>>> 2789          * To solve this, we're committing the transaction, which will
>>> 2790          * ensure we run all delayed refs and only after that, we are
>>> 2791          * going to clear all tracking information for a clean start.
>>> 2792          */
>>> 2793
>>> 2794         trans = btrfs_join_transaction(fs_info->fs_root);
>>> 2795         if (IS_ERR(trans)) {
>>> 2796                 fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
>>> 2797                 return PTR_ERR(trans);
>>> 2798         }
>>> 2799         ret = btrfs_commit_transaction(trans);
>>> 2800         if (ret) {
>>> 2801                 fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
>>> 2802                 return ret;
>>> 2803         }
>>> 2804
>>> 2805         qgroup_rescan_zero_tracking(fs_info);
>>> 2806
>>> 2807         queue_rescan_worker(fs_info);
>>> 2808         return 0;
>>> 2809 }
>>>
>>> The delayed ref race should exist anywhere we initiate a rescan outside of
>>> initially enabling qgroups.  We already zero the tracking and queue the rescan
>>> worker in btrfs_run_qgroups when we enable qgroups.  Why not just always
>>> queue the worker there so that initialization and execution have a clear starting point?
>>
>> This is no longer true in upstream as of commit 5d23515be669 ("btrfs:
>> Move qgroup rescan on quota enable to btrfs_quota_enable"). Hence my
>> asking about this. I guess if we make it unconditional it won't increase
>> the complexity, but the original code which was only run during qgroup
>> enable was rather iffy. I just don't want to repeat this.
> 
> Ah, ok.  My repo is still using v4.16.  How does this work with the race
> that is described in btrfs_qgroup_rescan?

TBH I didn't even consider it. It seems the qgroups code is just a
minefield ;\. So the original code only ever queued the rescan from
btrfs_run_qgroups if we were enabling qgroups, i.e. once. So I just moved
the code to queue the scan during the ioctl (btrfs_quota_enable)
execution. Prior to my patch it seems that the rescan following qgroup
enable was triggered during the first transaction commit.

> 
> -Jeff
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-05-02 21:11 ` [PATCH 1/3] btrfs: qgroups, fix rescan worker running races jeffm
  2018-05-03  7:24   ` Nikolay Borisov
@ 2018-05-10 19:49   ` Jeff Mahoney
  2018-05-10 23:04   ` Jeff Mahoney
  2 siblings, 0 replies; 17+ messages in thread
From: Jeff Mahoney @ 2018-05-10 19:49 UTC (permalink / raw)
  To: dsterba, linux-btrfs


On 5/2/18 5:11 PM, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> Commit 8d9eddad194 (Btrfs: fix qgroup rescan worker initialization)
> fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
> ended up reintroducing the hang-on-unmount bug that the commit it
> intended to fix addressed.
> 
> The race this time is between qgroup_rescan_init setting
> ->qgroup_rescan_running = true and the worker starting.  There are
> many scenarios where we initialize the worker and never start it.  The
> completion btrfs_ioctl_quota_rescan_wait waits for will never come.
> This can happen even without involving error handling, since mounting
> the file system read-only returns between initializing the worker and
> queueing it.
> 
> The right place to do it is when we're queuing the worker.  The flag
> really just means that btrfs_ioctl_quota_rescan_wait should wait for
> a completion.
> 
> Since the BTRFS_QGROUP_STATUS_FLAG_RESCAN flag is overloaded to
> refer to both runtime behavior and on-disk state, we introduce a new
> fs_info->qgroup_rescan_ready to indicate that we're initialized and
> waiting to start.
> 
> This patch introduces a new helper, queue_rescan_worker, that handles
> most of the initialization, the two flags, and queuing the worker,
> including races with unmount.
> 
> While we're at it, ->qgroup_rescan_running is protected only by the
> ->qgroup_rescan_mutex.  btrfs_ioctl_quota_rescan_wait doesn't need
> to take the spinlock too.
> 
> Fixes: 8d9eddad194 (Btrfs: fix qgroup rescan worker initialization)
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> ---
>  fs/btrfs/ctree.h  |  2 ++
>  fs/btrfs/qgroup.c | 94 +++++++++++++++++++++++++++++++++----------------------
>  2 files changed, 58 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index da308774b8a4..4003498bb714 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1045,6 +1045,8 @@ struct btrfs_fs_info {
>  	struct btrfs_workqueue *qgroup_rescan_workers;
>  	struct completion qgroup_rescan_completion;
>  	struct btrfs_work qgroup_rescan_work;
> +	/* qgroup rescan worker is running or queued to run */
> +	bool qgroup_rescan_ready;
>  	bool qgroup_rescan_running;	/* protected by qgroup_rescan_lock */
>  
>  	/* filesystem state */
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index aa259d6986e1..466744741873 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -101,6 +101,7 @@ static int
>  qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>  		   int init_flags);
>  static void qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info);
> +static void btrfs_qgroup_rescan_worker(struct btrfs_work *work);
>  
>  /* must be called with qgroup_ioctl_lock held */
>  static struct btrfs_qgroup *find_qgroup_rb(struct btrfs_fs_info *fs_info,
> @@ -2072,6 +2073,46 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
>  	return ret;
>  }
>  
> +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
> +{
> +	mutex_lock(&fs_info->qgroup_rescan_lock);
> +	if (btrfs_fs_closing(fs_info)) {
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +
> +	if (WARN_ON(!fs_info->qgroup_rescan_ready)) {
> +		btrfs_warn(fs_info, "rescan worker not ready");
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +	fs_info->qgroup_rescan_ready = false;
> +
> +	if (WARN_ON(fs_info->qgroup_rescan_running)) {
> +		btrfs_warn(fs_info, "rescan worker already queued");
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +
> +	/*
> +	 * Being queued is enough for btrfs_qgroup_wait_for_completion
> +	 * to need to wait.
> +	 */
> +	fs_info->qgroup_rescan_running = true;
> +	init_completion(&fs_info->qgroup_rescan_completion);
> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
> +
> +	memset(&fs_info->qgroup_rescan_work, 0,
> +	       sizeof(fs_info->qgroup_rescan_work));
> +
> +	btrfs_init_work(&fs_info->qgroup_rescan_work,
> +			btrfs_qgroup_rescan_helper,
> +			btrfs_qgroup_rescan_worker, NULL, NULL);
> +
> +	btrfs_queue_work(fs_info->qgroup_rescan_workers,
> +			 &fs_info->qgroup_rescan_work);
> +}
> +
>  /*
>   * called from commit_transaction. Writes all changed qgroups to disk.
>   */
> @@ -2123,8 +2164,7 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
>  		ret = qgroup_rescan_init(fs_info, 0, 1);
>  		if (!ret) {
>  			qgroup_rescan_zero_tracking(fs_info);
> -			btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -					 &fs_info->qgroup_rescan_work);
> +			queue_rescan_worker(fs_info);
>  		}
>  		ret = 0;
>  	}
> @@ -2607,6 +2647,10 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>  	if (!path)
>  		goto out;
>  
> +	mutex_lock(&fs_info->qgroup_rescan_lock);
> +	fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
> +
>  	err = 0;
>  	while (!err && !btrfs_fs_closing(fs_info)) {
>  		trans = btrfs_start_transaction(fs_info->fs_root, 0);
> @@ -2685,47 +2729,27 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>  {
>  	int ret = 0;
>  
> -	if (!init_flags &&
> -	    (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) ||
> -	     !(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))) {
> +	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
>  		ret = -EINVAL;
>  		goto err;
>  	}
>  
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
> -	spin_lock(&fs_info->qgroup_lock);
> -
> -	if (init_flags) {
> -		if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
> -			ret = -EINPROGRESS;
> -		else if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))
> -			ret = -EINVAL;
> -
> -		if (ret) {
> -			spin_unlock(&fs_info->qgroup_lock);
> -			mutex_unlock(&fs_info->qgroup_rescan_lock);
> -			goto err;
> -		}
> -		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
> +	if (fs_info->qgroup_rescan_ready || fs_info->qgroup_rescan_running) {
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		ret = -EINPROGRESS;
> +		goto err;
>  	}
>  
>  	memset(&fs_info->qgroup_rescan_progress, 0,
>  		sizeof(fs_info->qgroup_rescan_progress));
>  	fs_info->qgroup_rescan_progress.objectid = progress_objectid;
> -	init_completion(&fs_info->qgroup_rescan_completion);
> -	fs_info->qgroup_rescan_running = true;
> +	fs_info->qgroup_rescan_ready = true;
>  
> -	spin_unlock(&fs_info->qgroup_lock);
>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>  
> -	memset(&fs_info->qgroup_rescan_work, 0,
> -	       sizeof(fs_info->qgroup_rescan_work));
> -	btrfs_init_work(&fs_info->qgroup_rescan_work,
> -			btrfs_qgroup_rescan_helper,
> -			btrfs_qgroup_rescan_worker, NULL, NULL);
> -
> -	if (ret) {
>  err:
> +	if (ret) {
>  		btrfs_info(fs_info, "qgroup_rescan_init failed with %d", ret);
>  		return ret;
>  	}
> @@ -2785,9 +2809,7 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
>  
>  	qgroup_rescan_zero_tracking(fs_info);
>  
> -	btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -			 &fs_info->qgroup_rescan_work);
> -
> +	queue_rescan_worker(fs_info);
>  	return 0;
>  }
>  
> @@ -2798,9 +2820,7 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>  	int ret = 0;
>  
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
> -	spin_lock(&fs_info->qgroup_lock);
>  	running = fs_info->qgroup_rescan_running;
> -	spin_unlock(&fs_info->qgroup_lock);
>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>  
>  	if (!running)
> @@ -2819,12 +2839,10 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>   * this is only called from open_ctree where we're still single threaded, thus
>   * locking is omitted here.
>   */
> -void
> -btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
> +void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
>  {
>  	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)

This check will never be true since the worker is now responsible for
setting it.

-Jeff

> -		btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -				 &fs_info->qgroup_rescan_work);
> +		queue_rescan_worker(fs_info);
>  }
>  
>  /*
> 


-- 
Jeff Mahoney
SUSE Labs




* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-05-02 21:11 ` [PATCH 1/3] btrfs: qgroups, fix rescan worker running races jeffm
  2018-05-03  7:24   ` Nikolay Borisov
  2018-05-10 19:49   ` Jeff Mahoney
@ 2018-05-10 23:04   ` Jeff Mahoney
  2 siblings, 0 replies; 17+ messages in thread
From: Jeff Mahoney @ 2018-05-10 23:04 UTC (permalink / raw)
  To: dsterba, linux-btrfs


On 5/2/18 5:11 PM, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> Commit 8d9eddad194 (Btrfs: fix qgroup rescan worker initialization)
> fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
> ended up reintroducing the hang-on-unmount bug that the commit it
> intended to fix addressed.
> 
> The race this time is between qgroup_rescan_init setting
> ->qgroup_rescan_running = true and the worker starting.  There are
> many scenarios where we initialize the worker and never start it.  The
> completion btrfs_ioctl_quota_rescan_wait waits for will never come.
> This can happen even without involving error handling, since mounting
> the file system read-only returns between initializing the worker and
> queueing it.
> 
> The right place to do it is when we're queuing the worker.  The flag
> really just means that btrfs_ioctl_quota_rescan_wait should wait for
> a completion.
> 
> Since the BTRFS_QGROUP_STATUS_FLAG_RESCAN flag is overloaded to
> refer to both runtime behavior and on-disk state, we introduce a new
> fs_info->qgroup_rescan_ready to indicate that we're initialized and
> waiting to start.
> 
> This patch introduces a new helper, queue_rescan_worker, that handles
> most of the initialization, the two flags, and queuing the worker,
> including races with unmount.
> 
> While we're at it, ->qgroup_rescan_running is protected only by the
> ->qgroup_rescan_mutex.  btrfs_ioctl_quota_rescan_wait doesn't need
> to take the spinlock too.
> 
> Fixes: 8d9eddad194 (Btrfs: fix qgroup rescan worker initialization)
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> ---
>  fs/btrfs/ctree.h  |  2 ++
>  fs/btrfs/qgroup.c | 94 +++++++++++++++++++++++++++++++++----------------------
>  2 files changed, 58 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index da308774b8a4..4003498bb714 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1045,6 +1045,8 @@ struct btrfs_fs_info {
>  	struct btrfs_workqueue *qgroup_rescan_workers;
>  	struct completion qgroup_rescan_completion;
>  	struct btrfs_work qgroup_rescan_work;
> +	/* qgroup rescan worker is running or queued to run */
> +	bool qgroup_rescan_ready;
>  	bool qgroup_rescan_running;	/* protected by qgroup_rescan_lock */
>  
>  	/* filesystem state */
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index aa259d6986e1..466744741873 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -101,6 +101,7 @@ static int
>  qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>  		   int init_flags);
>  static void qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info);
> +static void btrfs_qgroup_rescan_worker(struct btrfs_work *work);
>  
>  /* must be called with qgroup_ioctl_lock held */
>  static struct btrfs_qgroup *find_qgroup_rb(struct btrfs_fs_info *fs_info,
> @@ -2072,6 +2073,46 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
>  	return ret;
>  }
>  
> +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
> +{
> +	mutex_lock(&fs_info->qgroup_rescan_lock);
> +	if (btrfs_fs_closing(fs_info)) {
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +
> +	if (WARN_ON(!fs_info->qgroup_rescan_ready)) {
> +		btrfs_warn(fs_info, "rescan worker not ready");
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +	fs_info->qgroup_rescan_ready = false;
> +
> +	if (WARN_ON(fs_info->qgroup_rescan_running)) {
> +		btrfs_warn(fs_info, "rescan worker already queued");
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +
> +	/*
> +	 * Being queued is enough for btrfs_qgroup_wait_for_completion
> +	 * to need to wait.
> +	 */
> +	fs_info->qgroup_rescan_running = true;
> +	init_completion(&fs_info->qgroup_rescan_completion);
> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
> +
> +	memset(&fs_info->qgroup_rescan_work, 0,
> +	       sizeof(fs_info->qgroup_rescan_work));
> +
> +	btrfs_init_work(&fs_info->qgroup_rescan_work,
> +			btrfs_qgroup_rescan_helper,
> +			btrfs_qgroup_rescan_worker, NULL, NULL);
> +
> +	btrfs_queue_work(fs_info->qgroup_rescan_workers,
> +			 &fs_info->qgroup_rescan_work);
> +}
> +
>  /*
>   * called from commit_transaction. Writes all changed qgroups to disk.
>   */
> @@ -2123,8 +2164,7 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
>  		ret = qgroup_rescan_init(fs_info, 0, 1);
>  		if (!ret) {
>  			qgroup_rescan_zero_tracking(fs_info);
> -			btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -					 &fs_info->qgroup_rescan_work);
> +			queue_rescan_worker(fs_info);
>  		}
>  		ret = 0;
>  	}
> @@ -2607,6 +2647,10 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>  	if (!path)
>  		goto out;
>  
> +	mutex_lock(&fs_info->qgroup_rescan_lock);
> +	fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
> +
>  	err = 0;
>  	while (!err && !btrfs_fs_closing(fs_info)) {
>  		trans = btrfs_start_transaction(fs_info->fs_root, 0);
> @@ -2685,47 +2729,27 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>  {
>  	int ret = 0;
>  
> -	if (!init_flags &&
> -	    (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) ||
> -	     !(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))) {
> +	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
>  		ret = -EINVAL;
>  		goto err;
>  	}
>  
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
> -	spin_lock(&fs_info->qgroup_lock);
> -
> -	if (init_flags) {
> -		if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
> -			ret = -EINPROGRESS;
> -		else if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))
> -			ret = -EINVAL;
> -
> -		if (ret) {
> -			spin_unlock(&fs_info->qgroup_lock);
> -			mutex_unlock(&fs_info->qgroup_rescan_lock);
> -			goto err;
> -		}
> -		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
> +	if (fs_info->qgroup_rescan_ready || fs_info->qgroup_rescan_running) {
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		ret = -EINPROGRESS;
> +		goto err;
>  	}

Without checking these flags when deciding whether we want to do
accounting, we'll end up doing the accounting until we start the rescan
thread.  This may not matter normally, but I'm working on a patch, as a
workaround for 824d8dff8846533c9f1f9b1eabb0c03959e989ca, that does all
qgroup accounting for entire trees in single transactions.

-Jeff

>  	memset(&fs_info->qgroup_rescan_progress, 0,
>  		sizeof(fs_info->qgroup_rescan_progress));
>  	fs_info->qgroup_rescan_progress.objectid = progress_objectid;
> -	init_completion(&fs_info->qgroup_rescan_completion);
> -	fs_info->qgroup_rescan_running = true;
> +	fs_info->qgroup_rescan_ready = true;
>  
> -	spin_unlock(&fs_info->qgroup_lock);
>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>  
> -	memset(&fs_info->qgroup_rescan_work, 0,
> -	       sizeof(fs_info->qgroup_rescan_work));
> -	btrfs_init_work(&fs_info->qgroup_rescan_work,
> -			btrfs_qgroup_rescan_helper,
> -			btrfs_qgroup_rescan_worker, NULL, NULL);
> -
> -	if (ret) {
>  err:
> +	if (ret) {
>  		btrfs_info(fs_info, "qgroup_rescan_init failed with %d", ret);
>  		return ret;
>  	}
> @@ -2785,9 +2809,7 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
>  
>  	qgroup_rescan_zero_tracking(fs_info);
>  
> -	btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -			 &fs_info->qgroup_rescan_work);
> -
> +	queue_rescan_worker(fs_info);
>  	return 0;
>  }
>  
> @@ -2798,9 +2820,7 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>  	int ret = 0;
>  
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
> -	spin_lock(&fs_info->qgroup_lock);
>  	running = fs_info->qgroup_rescan_running;
> -	spin_unlock(&fs_info->qgroup_lock);
>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>  
>  	if (!running)
> @@ -2819,12 +2839,10 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>   * this is only called from open_ctree where we're still single threaded, thus
>   * locking is omitted here.
>   */
> -void
> -btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
> +void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
>  {
>  	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
> -		btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -				 &fs_info->qgroup_rescan_work);
> +		queue_rescan_worker(fs_info);
>  }
>  
>  /*
> 


-- 
Jeff Mahoney
SUSE Labs




* Re: [PATCH v3 0/3] btrfs: qgroup rescan races (part 1)
  2018-05-02 21:11 [PATCH v3 0/3] btrfs: qgroup rescan races (part 1) jeffm
                   ` (3 preceding siblings ...)
  2018-05-03  6:23 ` [PATCH v3 0/3] btrfs: qgroup rescan races (part 1) Nikolay Borisov
@ 2019-11-28  3:28 ` Qu Wenruo
  2019-12-03 19:32   ` David Sterba
  4 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2019-11-28  3:28 UTC (permalink / raw)
  To: jeffm, dsterba, linux-btrfs


Any feedback or update on this patchset?

It looks like the first patch is going to fix a bug where btrfs is unable
to unmount the fs due to a rescan deadlock.

Thanks,
Qu

On 2018/5/3 5:11 AM, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> Hi Dave -
> 
> Here's the updated patchset for the rescan races.  This fixes the issue
> where we'd try to start multiple workers.  It introduces a new "ready"
> bool that we set during initialization and clear while queuing the worker.
> The queuer is also now responsible for most of the initialization.
> 
> I have a separate patch set start that gets rid of the racy mess surrounding
> the rescan worker startup.  We can handle it in btrfs_run_qgroups and
> just set a flag to start it everywhere else.
> 
> -Jeff
> 
> ---
> 
> Jeff Mahoney (3):
>   btrfs: qgroups, fix rescan worker running races
>   btrfs: qgroups, remove unnecessary memset before btrfs_init_work
>   btrfs: qgroup, don't try to insert status item after ENOMEM in rescan
>     worker
> 
>  fs/btrfs/async-thread.c |   1 +
>  fs/btrfs/ctree.h        |   2 +
>  fs/btrfs/qgroup.c       | 100 +++++++++++++++++++++++++++---------------------
>  3 files changed, 60 insertions(+), 43 deletions(-)
> 




* Re: [PATCH v3 0/3] btrfs: qgroup rescan races (part 1)
  2019-11-28  3:28 ` Qu Wenruo
@ 2019-12-03 19:32   ` David Sterba
  0 siblings, 0 replies; 17+ messages in thread
From: David Sterba @ 2019-12-03 19:32 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: jeffm, dsterba, linux-btrfs

On Thu, Nov 28, 2019 at 11:28:58AM +0800, Qu Wenruo wrote:
> Any feedback and update on this patchset?
> 
> It looks like the first patch is going to fix a bug of btrfs unable to
> unmount the fs due to deadlock rescan.

The patchset was posted 1.5 years ago, so it would be better to resend.  I
think there were some changes in the quota rescan code, and there were also
some replies to the thread, so a fresh iteration can continue the
discussion.  Thanks.



Thread overview: 17+ messages
2018-05-02 21:11 [PATCH v3 0/3] btrfs: qgroup rescan races (part 1) jeffm
2018-05-02 21:11 ` [PATCH 1/3] btrfs: qgroups, fix rescan worker running races jeffm
2018-05-03  7:24   ` Nikolay Borisov
2018-05-03 13:39     ` Jeff Mahoney
2018-05-03 15:52       ` Nikolay Borisov
2018-05-03 15:57         ` Jeff Mahoney
2018-05-10 19:49   ` Jeff Mahoney
2018-05-10 23:04   ` Jeff Mahoney
2018-05-02 21:11 ` [PATCH 2/3] btrfs: qgroups, remove unnecessary memset before btrfs_init_work jeffm
2018-05-02 21:11 ` [PATCH 3/3] btrfs: qgroup, don't try to insert status item after ENOMEM in rescan worker jeffm
2018-05-03  6:23 ` [PATCH v3 0/3] btrfs: qgroup rescan races (part 1) Nikolay Borisov
2018-05-03 22:27   ` Jeff Mahoney
2018-05-04  5:59     ` Nikolay Borisov
2018-05-04 13:32       ` Jeff Mahoney
2018-05-04 13:41         ` Nikolay Borisov
2019-11-28  3:28 ` Qu Wenruo
2019-12-03 19:32   ` David Sterba
