* [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
@ 2018-04-26 19:23 jeffm
  2018-04-26 19:23 ` [PATCH 2/3] btrfs: qgroups, remove unnecessary memset before btrfs_init_work jeffm
                   ` (6 more replies)
  0 siblings, 7 replies; 32+ messages in thread
From: jeffm @ 2018-04-26 19:23 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Jeff Mahoney

From: Jeff Mahoney <jeffm@suse.com>

Commit d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
ended up reintroducing the hang-on-unmount bug that the commit it
intended to fix addressed.

The race this time is between qgroup_rescan_init setting
->qgroup_rescan_running = true and the worker starting.  There are
many scenarios where we initialize the worker and never start it.  The
completion btrfs_ioctl_quota_rescan_wait waits for will never come.
This can happen even without involving error handling, since mounting
the file system read-only returns between initializing the worker and
queueing it.

The right place to do it is when we're queuing the worker.  The flag
really just means that btrfs_ioctl_quota_rescan_wait should wait for
a completion.

This patch introduces a new helper, queue_rescan_worker, that handles
the ->qgroup_rescan_running flag, including any races with umount.
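The ordering can be sketched in userspace (a minimal illustrative model with
pthreads, not the kernel implementation; the function names only loosely
mirror the btrfs helpers): the flag flips to true only at queue time, so a
waiter that runs after a bare init sees no pending work and returns instead
of blocking on a completion that will never come.

```c
/*
 * Userspace sketch of the fixed ordering (illustrative, not kernel code).
 * The flag is set only when the work is really queued, so a waiter that
 * runs after a bare init (e.g. a read-only mount) returns at once.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t rescan_lock = PTHREAD_MUTEX_INITIALIZER;
static bool rescan_running;             /* protected by rescan_lock */

/* Mirrors qgroup_rescan_init() after the patch: no flag update here. */
static void rescan_init(void)
{
	/* set up progress tracking, init the completion, ... */
}

/* Mirrors queue_rescan_worker(): the flag and the queueing stay together. */
static void queue_rescan_worker(void)
{
	pthread_mutex_lock(&rescan_lock);
	rescan_running = true;
	pthread_mutex_unlock(&rescan_lock);
	/* btrfs_queue_work(...) would happen here */
}

/* Mirrors btrfs_qgroup_wait_for_completion(): wait only if queued. */
static int wait_for_rescan(void)
{
	bool running;

	pthread_mutex_lock(&rescan_lock);
	running = rescan_running;
	pthread_mutex_unlock(&rescan_lock);

	if (!running)
		return 0;	/* nothing queued, nothing to wait for */
	/* wait_for_completion(&rescan_completion) would happen here */
	return 1;
}
```

With the pre-patch ordering, rescan_init() itself would have set the flag,
and a waiter arriving before (or without) queue_rescan_worker() would block
forever.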

While we're at it, ->qgroup_rescan_running is protected only by the
->qgroup_rescan_mutex.  btrfs_ioctl_quota_rescan_wait doesn't need
to take the spinlock too.

Fixes: d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
---
 fs/btrfs/ctree.h  |  1 +
 fs/btrfs/qgroup.c | 40 ++++++++++++++++++++++++++++------------
 2 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index da308774b8a4..dbba615f4d0f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1045,6 +1045,7 @@ struct btrfs_fs_info {
 	struct btrfs_workqueue *qgroup_rescan_workers;
 	struct completion qgroup_rescan_completion;
 	struct btrfs_work qgroup_rescan_work;
+	/* qgroup rescan worker is running or queued to run */
 	bool qgroup_rescan_running;	/* protected by qgroup_rescan_lock */
 
 	/* filesystem state */
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index aa259d6986e1..be491b6c020a 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2072,6 +2072,30 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
+static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
+{
+	mutex_lock(&fs_info->qgroup_rescan_lock);
+	if (btrfs_fs_closing(fs_info)) {
+		mutex_unlock(&fs_info->qgroup_rescan_lock);
+		return;
+	}
+	if (WARN_ON(fs_info->qgroup_rescan_running)) {
+		btrfs_warn(fs_info, "rescan worker already queued");
+		mutex_unlock(&fs_info->qgroup_rescan_lock);
+		return;
+	}
+
+	/*
+	 * Being queued is enough for btrfs_qgroup_wait_for_completion
+	 * to need to wait.
+	 */
+	fs_info->qgroup_rescan_running = true;
+	mutex_unlock(&fs_info->qgroup_rescan_lock);
+
+	btrfs_queue_work(fs_info->qgroup_rescan_workers,
+			 &fs_info->qgroup_rescan_work);
+}
+
 /*
  * called from commit_transaction. Writes all changed qgroups to disk.
  */
@@ -2123,8 +2147,7 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 		ret = qgroup_rescan_init(fs_info, 0, 1);
 		if (!ret) {
 			qgroup_rescan_zero_tracking(fs_info);
-			btrfs_queue_work(fs_info->qgroup_rescan_workers,
-					 &fs_info->qgroup_rescan_work);
+			queue_rescan_worker(fs_info);
 		}
 		ret = 0;
 	}
@@ -2713,7 +2736,6 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
 		sizeof(fs_info->qgroup_rescan_progress));
 	fs_info->qgroup_rescan_progress.objectid = progress_objectid;
 	init_completion(&fs_info->qgroup_rescan_completion);
-	fs_info->qgroup_rescan_running = true;
 
 	spin_unlock(&fs_info->qgroup_lock);
 	mutex_unlock(&fs_info->qgroup_rescan_lock);
@@ -2785,9 +2807,7 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
 
 	qgroup_rescan_zero_tracking(fs_info);
 
-	btrfs_queue_work(fs_info->qgroup_rescan_workers,
-			 &fs_info->qgroup_rescan_work);
-
+	queue_rescan_worker(fs_info);
 	return 0;
 }
 
@@ -2798,9 +2818,7 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
 	int ret = 0;
 
 	mutex_lock(&fs_info->qgroup_rescan_lock);
-	spin_lock(&fs_info->qgroup_lock);
 	running = fs_info->qgroup_rescan_running;
-	spin_unlock(&fs_info->qgroup_lock);
 	mutex_unlock(&fs_info->qgroup_rescan_lock);
 
 	if (!running)
@@ -2819,12 +2837,10 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
  * this is only called from open_ctree where we're still single threaded, thus
  * locking is omitted here.
  */
-void
-btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
+void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
 {
 	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
-		btrfs_queue_work(fs_info->qgroup_rescan_workers,
-				 &fs_info->qgroup_rescan_work);
+		queue_rescan_worker(fs_info);
 }
 
 /*
-- 
2.12.3


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/3] btrfs: qgroups, remove unnecessary memset before btrfs_init_work
  2018-04-26 19:23 [PATCH 1/3] btrfs: qgroups, fix rescan worker running races jeffm
@ 2018-04-26 19:23 ` jeffm
  2018-04-26 20:37   ` Nikolay Borisov
  2018-04-26 19:23 ` [PATCH 3/3] btrfs: qgroup, don't try to insert status item after ENOMEM in rescan worker jeffm
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 32+ messages in thread
From: jeffm @ 2018-04-26 19:23 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Jeff Mahoney

From: Jeff Mahoney <jeffm@suse.com>

btrfs_init_work clears the work struct except for ->wq, so the memset
before calling btrfs_init_work in qgroup_rescan_init is unnecessary.

We'll also initialize ->wq in btrfs_init_work so that it's obvious.
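Why the memset is redundant can be shown with a userspace sketch
(illustrative struct and names, not the kernel definitions): once the init
helper assigns every member, including ->wq, initializing from dirty memory
gives the same result as initializing from zeroed memory.

```c
/* Illustrative model (not the kernel structs): init_work() assigns every
 * member, so a memset beforehand cannot change the outcome. */
#include <assert.h>
#include <stddef.h>

struct work {
	void (*func)(struct work *);
	void *wq;
	unsigned long flags;
};

static void work_fn(struct work *w)
{
	(void)w;
}

static void init_work(struct work *w, void (*func)(struct work *))
{
	w->func = func;
	w->flags = 0;
	w->wq = NULL;	/* the patch adds this so no member is skipped */
}
```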

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
---
 fs/btrfs/async-thread.c | 1 +
 fs/btrfs/qgroup.c       | 2 --
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index d5540749f0e5..c614fb7b9b9d 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -354,6 +354,7 @@ void btrfs_init_work(struct btrfs_work *work, btrfs_work_func_t uniq_func,
 	INIT_WORK(&work->normal_work, uniq_func);
 	INIT_LIST_HEAD(&work->ordered_list);
 	work->flags = 0;
+	work->wq = NULL;
 }
 
 static inline void __btrfs_queue_work(struct __btrfs_workqueue *wq,
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index be491b6c020a..8de423a0c7e3 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2740,8 +2740,6 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
 	spin_unlock(&fs_info->qgroup_lock);
 	mutex_unlock(&fs_info->qgroup_rescan_lock);
 
-	memset(&fs_info->qgroup_rescan_work, 0,
-	       sizeof(fs_info->qgroup_rescan_work));
 	btrfs_init_work(&fs_info->qgroup_rescan_work,
 			btrfs_qgroup_rescan_helper,
 			btrfs_qgroup_rescan_worker, NULL, NULL);
-- 
2.12.3



* [PATCH 3/3] btrfs: qgroup, don't try to insert status item after ENOMEM in rescan worker
  2018-04-26 19:23 [PATCH 1/3] btrfs: qgroups, fix rescan worker running races jeffm
  2018-04-26 19:23 ` [PATCH 2/3] btrfs: qgroups, remove unnecessary memset before btrfs_init_work jeffm
@ 2018-04-26 19:23 ` jeffm
  2018-04-26 20:39   ` Nikolay Borisov
  2018-04-27  8:42 ` [PATCH 1/3] btrfs: qgroups, fix rescan worker running races Nikolay Borisov
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 32+ messages in thread
From: jeffm @ 2018-04-26 19:23 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Jeff Mahoney

From: Jeff Mahoney <jeffm@suse.com>

If we fail to allocate memory for a path, don't bother trying to
insert the qgroup status item.  We haven't done anything yet and it'll
fail also.  Just print an error and be done with it.
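The shape of the fix, sketched in userspace (illustrative names and a
malloc stand-in for the path allocation, not the kernel code): when the very
first allocation fails there is nothing to roll forward, so the worker
reports the error and bails rather than attempting an update that would fail
for the same reason.

```c
/* Userspace sketch of the early-bailout pattern (not the kernel code). */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Stands in for update_qgroup_status_item(): it needs the path to work. */
static int update_status_item(void *path)
{
	if (!path)
		return -ENOMEM;	/* would fail for the same reason */
	/* ... search the quota tree and write the status item ... */
	return 0;
}

static int rescan_worker(int simulate_enomem)
{
	void *path = simulate_enomem ? NULL : malloc(64);
	int err;

	if (!path) {
		/* nothing has been done yet: just report and bail */
		fprintf(stderr, "qgroup scan failed with %d\n", -ENOMEM);
		return -ENOMEM;
	}

	err = update_status_item(path);
	free(path);
	return err;
}
```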

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
---
 fs/btrfs/qgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 8de423a0c7e3..4c0978bce5b9 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2648,7 +2648,6 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
 			btrfs_end_transaction(trans);
 	}
 
-out:
 	btrfs_free_path(path);
 
 	mutex_lock(&fs_info->qgroup_rescan_lock);
@@ -2688,6 +2687,7 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
 		btrfs_info(fs_info, "qgroup scan completed%s",
 			err > 0 ? " (inconsistency flag cleared)" : "");
 	} else {
+out:
 		btrfs_err(fs_info, "qgroup scan failed with %d", err);
 	}
 
-- 
2.12.3



* Re: [PATCH 2/3] btrfs: qgroups, remove unnecessary memset before btrfs_init_work
  2018-04-26 19:23 ` [PATCH 2/3] btrfs: qgroups, remove unnecessary memset before btrfs_init_work jeffm
@ 2018-04-26 20:37   ` Nikolay Borisov
  0 siblings, 0 replies; 32+ messages in thread
From: Nikolay Borisov @ 2018-04-26 20:37 UTC (permalink / raw)
  To: jeffm, linux-btrfs



On 26.04.2018 22:23, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> btrfs_init_work clears the work struct except for ->wq, so the memset
> before calling btrfs_init_work in qgroup_rescan_init is unnecessary.
> 
> We'll also initialize ->wq in btrfs_init_work so that it's obvious.
> 
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/async-thread.c | 1 +
>  fs/btrfs/qgroup.c       | 2 --
>  2 files changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
> index d5540749f0e5..c614fb7b9b9d 100644
> --- a/fs/btrfs/async-thread.c
> +++ b/fs/btrfs/async-thread.c
> @@ -354,6 +354,7 @@ void btrfs_init_work(struct btrfs_work *work, btrfs_work_func_t uniq_func,
>  	INIT_WORK(&work->normal_work, uniq_func);
>  	INIT_LIST_HEAD(&work->ordered_list);
>  	work->flags = 0;
> +	work->wq = NULL;
>  }
>  
>  static inline void __btrfs_queue_work(struct __btrfs_workqueue *wq,
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index be491b6c020a..8de423a0c7e3 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2740,8 +2740,6 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>  	spin_unlock(&fs_info->qgroup_lock);
>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>  
> -	memset(&fs_info->qgroup_rescan_work, 0,
> -	       sizeof(fs_info->qgroup_rescan_work));
>  	btrfs_init_work(&fs_info->qgroup_rescan_work,
>  			btrfs_qgroup_rescan_helper,
>  			btrfs_qgroup_rescan_worker, NULL, NULL);
> 


* Re: [PATCH 3/3] btrfs: qgroup, don't try to insert status item after ENOMEM in rescan worker
  2018-04-26 19:23 ` [PATCH 3/3] btrfs: qgroup, don't try to insert status item after ENOMEM in rescan worker jeffm
@ 2018-04-26 20:39   ` Nikolay Borisov
  2018-04-27 15:44     ` David Sterba
  0 siblings, 1 reply; 32+ messages in thread
From: Nikolay Borisov @ 2018-04-26 20:39 UTC (permalink / raw)
  To: jeffm, linux-btrfs



On 26.04.2018 22:23, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> If we fail to allocate memory for a path, don't bother trying to
> insert the qgroup status item.  We haven't done anything yet and it'll
> fail also.  Just print an error and be done with it.
> 
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>

nit: So the code is correct; however, having the out label there is
really ugly. What about, on path alloc failure, just doing the print in
the if branch followed by a goto done?


Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/qgroup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 8de423a0c7e3..4c0978bce5b9 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2648,7 +2648,6 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>  			btrfs_end_transaction(trans);
>  	}
>  
> -out:
>  	btrfs_free_path(path);
>  
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
> @@ -2688,6 +2687,7 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>  		btrfs_info(fs_info, "qgroup scan completed%s",
>  			err > 0 ? " (inconsistency flag cleared)" : "");
>  	} else {
> +out:
>  		btrfs_err(fs_info, "qgroup scan failed with %d", err);
>  	}
>  
> 


* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-04-26 19:23 [PATCH 1/3] btrfs: qgroups, fix rescan worker running races jeffm
  2018-04-26 19:23 ` [PATCH 2/3] btrfs: qgroups, remove unnecessary memset before btrfs_init_work jeffm
  2018-04-26 19:23 ` [PATCH 3/3] btrfs: qgroup, don't try to insert status item after ENOMEM in rescan worker jeffm
@ 2018-04-27  8:42 ` Nikolay Borisov
  2018-04-27  8:48 ` Filipe Manana
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 32+ messages in thread
From: Nikolay Borisov @ 2018-04-27  8:42 UTC (permalink / raw)
  To: jeffm, linux-btrfs



On 26.04.2018 22:23, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> Commit d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
> fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
> ended up reintroducing the hang-on-unmount bug that the commit it
> intended to fix addressed.
> 
> The race this time is between qgroup_rescan_init setting
> ->qgroup_rescan_running = true and the worker starting.  There are
> many scenarios where we initialize the worker and never start it.  The
> completion btrfs_ioctl_quota_rescan_wait waits for will never come.
> This can happen even without involving error handling, since mounting
> the file system read-only returns between initializing the worker and
> queueing it.
> 
> The right place to do it is when we're queuing the worker.  The flag
> really just means that btrfs_ioctl_quota_rescan_wait should wait for
> a completion.
> 
> This patch introduces a new helper, queue_rescan_worker, that handles
> the ->qgroup_rescan_running flag, including any races with umount.
> 
> While we're at it, ->qgroup_rescan_running is protected only by the
> ->qgroup_rescan_mutex.  btrfs_ioctl_quota_rescan_wait doesn't need
> to take the spinlock too.
> 
> Fixes: d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>


LGTM.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/ctree.h  |  1 +
>  fs/btrfs/qgroup.c | 40 ++++++++++++++++++++++++++++------------
>  2 files changed, 29 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index da308774b8a4..dbba615f4d0f 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1045,6 +1045,7 @@ struct btrfs_fs_info {
>  	struct btrfs_workqueue *qgroup_rescan_workers;
>  	struct completion qgroup_rescan_completion;
>  	struct btrfs_work qgroup_rescan_work;
> +	/* qgroup rescan worker is running or queued to run */
>  	bool qgroup_rescan_running;	/* protected by qgroup_rescan_lock */
>  
>  	/* filesystem state */
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index aa259d6986e1..be491b6c020a 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2072,6 +2072,30 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
>  	return ret;
>  }
>  
> +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
> +{
> +	mutex_lock(&fs_info->qgroup_rescan_lock);
> +	if (btrfs_fs_closing(fs_info)) {
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +	if (WARN_ON(fs_info->qgroup_rescan_running)) {
> +		btrfs_warn(fs_info, "rescan worker already queued");
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +
> +	/*
> +	 * Being queued is enough for btrfs_qgroup_wait_for_completion
> +	 * to need to wait.
> +	 */
> +	fs_info->qgroup_rescan_running = true;
> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
> +
> +	btrfs_queue_work(fs_info->qgroup_rescan_workers,
> +			 &fs_info->qgroup_rescan_work);
> +}
> +
>  /*
>   * called from commit_transaction. Writes all changed qgroups to disk.
>   */
> @@ -2123,8 +2147,7 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
>  		ret = qgroup_rescan_init(fs_info, 0, 1);
>  		if (!ret) {
>  			qgroup_rescan_zero_tracking(fs_info);
> -			btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -					 &fs_info->qgroup_rescan_work);
> +			queue_rescan_worker(fs_info);
>  		}

So here it's not possible to race, since if qgroup_rescan_init returns 0
then we are guaranteed to queue the rescan.

>  		ret = 0;
>  	}
> @@ -2713,7 +2736,6 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>  		sizeof(fs_info->qgroup_rescan_progress));
>  	fs_info->qgroup_rescan_progress.objectid = progress_objectid;
>  	init_completion(&fs_info->qgroup_rescan_completion);
> -	fs_info->qgroup_rescan_running = true;
>  
>  	spin_unlock(&fs_info->qgroup_lock);
>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
> @@ -2785,9 +2807,7 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
>  
>  	qgroup_rescan_zero_tracking(fs_info);
>  
> -	btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -			 &fs_info->qgroup_rescan_work);
> -
> +	queue_rescan_worker(fs_info);

Which leaves this to be the only problematic case, in case transaction
joining/commit fails, right?

>  	return 0;
>  }
>  
> @@ -2798,9 +2818,7 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>  	int ret = 0;
>  
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
> -	spin_lock(&fs_info->qgroup_lock);
>  	running = fs_info->qgroup_rescan_running;
> -	spin_unlock(&fs_info->qgroup_lock);
>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>  
>  	if (!running)
> @@ -2819,12 +2837,10 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>   * this is only called from open_ctree where we're still single threaded, thus
>   * locking is omitted here.
>   */
> -void
> -btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
> +void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
>  {
>  	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
> -		btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -				 &fs_info->qgroup_rescan_work);
> +		queue_rescan_worker(fs_info);
>  }
>  
>  /*
> 


* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-04-26 19:23 [PATCH 1/3] btrfs: qgroups, fix rescan worker running races jeffm
                   ` (2 preceding siblings ...)
  2018-04-27  8:42 ` [PATCH 1/3] btrfs: qgroups, fix rescan worker running races Nikolay Borisov
@ 2018-04-27  8:48 ` Filipe Manana
  2018-04-27 16:00   ` Jeff Mahoney
  2018-04-27 15:56 ` David Sterba
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 32+ messages in thread
From: Filipe Manana @ 2018-04-27  8:48 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: linux-btrfs

On Thu, Apr 26, 2018 at 8:23 PM,  <jeffm@suse.com> wrote:
> From: Jeff Mahoney <jeffm@suse.com>
>
> Commit d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
> fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
> ended up reintroducing the hang-on-unmount bug that the commit it
> intended to fix addressed.
>
> The race this time is between qgroup_rescan_init setting
> ->qgroup_rescan_running = true and the worker starting.  There are
> many scenarios where we initialize the worker and never start it.  The
> completion btrfs_ioctl_quota_rescan_wait waits for will never come.
> This can happen even without involving error handling, since mounting
> the file system read-only returns between initializing the worker and
> queueing it.
>
> The right place to do it is when we're queuing the worker.  The flag
> really just means that btrfs_ioctl_quota_rescan_wait should wait for
> a completion.
>
> This patch introduces a new helper, queue_rescan_worker, that handles
> the ->qgroup_rescan_running flag, including any races with umount.
>
> While we're at it, ->qgroup_rescan_running is protected only by the
> ->qgroup_rescan_mutex.  btrfs_ioctl_quota_rescan_wait doesn't need
> to take the spinlock too.
>
> Fixes: d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)

The commit id and subjects don't match:

commit d2c609b834d62f1e91f1635a27dca29f7806d3d6
Author: Jeff Mahoney <jeffm@suse.com>
Date:   Mon Aug 15 12:10:33 2016 -0400

    btrfs: properly track when rescan worker is running


> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> ---
>  fs/btrfs/ctree.h  |  1 +
>  fs/btrfs/qgroup.c | 40 ++++++++++++++++++++++++++++------------
>  2 files changed, 29 insertions(+), 12 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index da308774b8a4..dbba615f4d0f 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1045,6 +1045,7 @@ struct btrfs_fs_info {
>         struct btrfs_workqueue *qgroup_rescan_workers;
>         struct completion qgroup_rescan_completion;
>         struct btrfs_work qgroup_rescan_work;
> +       /* qgroup rescan worker is running or queued to run */
>         bool qgroup_rescan_running;     /* protected by qgroup_rescan_lock */
>
>         /* filesystem state */
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index aa259d6986e1..be491b6c020a 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2072,6 +2072,30 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
>         return ret;
>  }
>
> +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
> +{
> +       mutex_lock(&fs_info->qgroup_rescan_lock);
> +       if (btrfs_fs_closing(fs_info)) {
> +               mutex_unlock(&fs_info->qgroup_rescan_lock);
> +               return;
> +       }
> +       if (WARN_ON(fs_info->qgroup_rescan_running)) {
> +               btrfs_warn(fs_info, "rescan worker already queued");
> +               mutex_unlock(&fs_info->qgroup_rescan_lock);
> +               return;
> +       }
> +
> +       /*
> +        * Being queued is enough for btrfs_qgroup_wait_for_completion
> +        * to need to wait.
> +        */
> +       fs_info->qgroup_rescan_running = true;
> +       mutex_unlock(&fs_info->qgroup_rescan_lock);
> +
> +       btrfs_queue_work(fs_info->qgroup_rescan_workers,
> +                        &fs_info->qgroup_rescan_work);
> +}
> +
>  /*
>   * called from commit_transaction. Writes all changed qgroups to disk.
>   */
> @@ -2123,8 +2147,7 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
>                 ret = qgroup_rescan_init(fs_info, 0, 1);
>                 if (!ret) {
>                         qgroup_rescan_zero_tracking(fs_info);
> -                       btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -                                        &fs_info->qgroup_rescan_work);
> +                       queue_rescan_worker(fs_info);
>                 }
>                 ret = 0;
>         }
> @@ -2713,7 +2736,6 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>                 sizeof(fs_info->qgroup_rescan_progress));
>         fs_info->qgroup_rescan_progress.objectid = progress_objectid;
>         init_completion(&fs_info->qgroup_rescan_completion);
> -       fs_info->qgroup_rescan_running = true;
>
>         spin_unlock(&fs_info->qgroup_lock);
>         mutex_unlock(&fs_info->qgroup_rescan_lock);
> @@ -2785,9 +2807,7 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
>
>         qgroup_rescan_zero_tracking(fs_info);
>
> -       btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -                        &fs_info->qgroup_rescan_work);
> -
> +       queue_rescan_worker(fs_info);
>         return 0;
>  }
>
> @@ -2798,9 +2818,7 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>         int ret = 0;
>
>         mutex_lock(&fs_info->qgroup_rescan_lock);
> -       spin_lock(&fs_info->qgroup_lock);
>         running = fs_info->qgroup_rescan_running;
> -       spin_unlock(&fs_info->qgroup_lock);
>         mutex_unlock(&fs_info->qgroup_rescan_lock);
>
>         if (!running)
> @@ -2819,12 +2837,10 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>   * this is only called from open_ctree where we're still single threaded, thus
>   * locking is omitted here.
>   */
> -void
> -btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
> +void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
>  {
>         if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
> -               btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -                                &fs_info->qgroup_rescan_work);
> +               queue_rescan_worker(fs_info);
>  }
>
>  /*
> --
> 2.12.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


* Re: [PATCH 3/3] btrfs: qgroup, don't try to insert status item after ENOMEM in rescan worker
  2018-04-26 20:39   ` Nikolay Borisov
@ 2018-04-27 15:44     ` David Sterba
  2018-04-27 16:08       ` Jeff Mahoney
  0 siblings, 1 reply; 32+ messages in thread
From: David Sterba @ 2018-04-27 15:44 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: jeffm, linux-btrfs

On Thu, Apr 26, 2018 at 11:39:50PM +0300, Nikolay Borisov wrote:
> On 26.04.2018 22:23, jeffm@suse.com wrote:
> > From: Jeff Mahoney <jeffm@suse.com>
> > 
> > If we fail to allocate memory for a path, don't bother trying to
> > insert the qgroup status item.  We haven't done anything yet and it'll
> > fail also.  Just print an error and be done with it.
> > 
> > Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> 
> nit: So the code is correct; however, having the out label there is
> really ugly. What about, on path alloc failure, just doing the print in
> the if branch followed by a goto done?

Yeah, I don't like jumping to the inner blocks either. I saw this in the
qgroup code so we should clean it up and not add new instances.

In this case, only the path allocation failure jumps to the out label,
so printing the message and then jump to done makes sense to me.
However, the message would have to be duplicated in the end, and I don't
see a better way without further restructuring the code.
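One way such a restructuring could look, sketched in userspace (a sketch
with illustrative names, not the merged code): move the scan body into a
helper that returns the error, so the caller keeps a plain if/else, the
failure message is printed exactly once, and no label lands inside the else
block.

```c
/* Sketch of a label-free structure (illustrative names, userspace). */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* All early exits become plain returns instead of goto targets. */
static int do_rescan(int simulate_enomem)
{
	void *path = simulate_enomem ? NULL : malloc(64);

	if (!path)
		return -ENOMEM;
	/* ... scan extents, update the status item ... */
	free(path);
	return 0;
}

static void rescan_worker(int simulate_enomem)
{
	int err = do_rescan(simulate_enomem);

	if (err)
		fprintf(stderr, "qgroup scan failed with %d\n", err);
	else
		printf("qgroup scan completed\n");
}
```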


* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-04-26 19:23 [PATCH 1/3] btrfs: qgroups, fix rescan worker running races jeffm
                   ` (3 preceding siblings ...)
  2018-04-27  8:48 ` Filipe Manana
@ 2018-04-27 15:56 ` David Sterba
  2018-04-27 16:02   ` Jeff Mahoney
  2018-04-27 19:28   ` Noah Massey
  2018-04-30  6:20 ` Qu Wenruo
  2018-05-02 10:29 ` David Sterba
  6 siblings, 2 replies; 32+ messages in thread
From: David Sterba @ 2018-04-27 15:56 UTC (permalink / raw)
  To: jeffm; +Cc: linux-btrfs

On Thu, Apr 26, 2018 at 03:23:49PM -0400, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> Commit d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
> fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
> ended up reintroducing the hang-on-unmount bug that the commit it
> intended to fix addressed.
> 
> The race this time is between qgroup_rescan_init setting
> ->qgroup_rescan_running = true and the worker starting.  There are
> many scenarios where we initialize the worker and never start it.  The
> completion btrfs_ioctl_quota_rescan_wait waits for will never come.
> This can happen even without involving error handling, since mounting
> the file system read-only returns between initializing the worker and
> queueing it.
> 
> The right place to do it is when we're queuing the worker.  The flag
> really just means that btrfs_ioctl_quota_rescan_wait should wait for
> a completion.
> 
> This patch introduces a new helper, queue_rescan_worker, that handles
> the ->qgroup_rescan_running flag, including any races with umount.
> 
> While we're at it, ->qgroup_rescan_running is protected only by the
> ->qgroup_rescan_mutex.  btrfs_ioctl_quota_rescan_wait doesn't need
> to take the spinlock too.
> 
> Fixes: d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>

I've added this to misc-next as I'd like to push it to the next rc. The
Fixes tag is fixed.

> +	/* qgroup rescan worker is running or queued to run */
>  	bool qgroup_rescan_running;	/* protected by qgroup_rescan_lock */

Comments merged.

>  	/* filesystem state */
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index aa259d6986e1..be491b6c020a 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2072,6 +2072,30 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
>  	return ret;
>  }
>  
> +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
> +{

And this had to be moved upwards as there was earlier use of
btrfs_queue_work that matched following the hunk.

> +}
> +
>  /*
>   * called from commit_transaction. Writes all changed qgroups to disk.
>   */
> @@ -2123,8 +2147,7 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
>  		ret = qgroup_rescan_init(fs_info, 0, 1);
>  		if (!ret) {
>  			qgroup_rescan_zero_tracking(fs_info);
> -			btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -					 &fs_info->qgroup_rescan_work);
> +			queue_rescan_worker(fs_info);
>  		}
>  		ret = 0;
>  	}


* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-04-27  8:48 ` Filipe Manana
@ 2018-04-27 16:00   ` Jeff Mahoney
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff Mahoney @ 2018-04-27 16:00 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On 4/27/18 4:48 AM, Filipe Manana wrote:
> On Thu, Apr 26, 2018 at 8:23 PM,  <jeffm@suse.com> wrote:
>> From: Jeff Mahoney <jeffm@suse.com>
>>
>> Commit d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
>> fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
>> ended up reintroducing the hang-on-unmount bug that the commit it
>> intended to fix addressed.
>>
>> The race this time is between qgroup_rescan_init setting
>> ->qgroup_rescan_running = true and the worker starting.  There are
>> many scenarios where we initialize the worker and never start it.  The
>> completion btrfs_ioctl_quota_rescan_wait waits for will never come.
>> This can happen even without involving error handling, since mounting
>> the file system read-only returns between initializing the worker and
>> queueing it.
>>
>> The right place to do it is when we're queuing the worker.  The flag
>> really just means that btrfs_ioctl_quota_rescan_wait should wait for
>> a completion.
>>
>> This patch introduces a new helper, queue_rescan_worker, that handles
>> the ->qgroup_rescan_running flag, including any races with umount.
>>
>> While we're at it, ->qgroup_rescan_running is protected only by the
>> ->qgroup_rescan_mutex.  btrfs_ioctl_quota_rescan_wait doesn't need
>> to take the spinlock too.
>>
>> Fixes: d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
> 
> The commit id and subjects don't match:
> 
> commit d2c609b834d62f1e91f1635a27dca29f7806d3d6
> Author: Jeff Mahoney <jeffm@suse.com>
> Date:   Mon Aug 15 12:10:33 2016 -0400
> 
>     btrfs: properly track when rescan worker is running
> 


Thanks.  Fixed.

-Jeff

-- 
Jeff Mahoney
SUSE Labs


* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-04-27 15:56 ` David Sterba
@ 2018-04-27 16:02   ` Jeff Mahoney
  2018-04-27 16:40     ` David Sterba
  2018-04-27 19:28   ` Noah Massey
  1 sibling, 1 reply; 32+ messages in thread
From: Jeff Mahoney @ 2018-04-27 16:02 UTC (permalink / raw)
  To: dsterba, linux-btrfs

On 4/27/18 11:56 AM, David Sterba wrote:
> On Thu, Apr 26, 2018 at 03:23:49PM -0400, jeffm@suse.com wrote:
>> From: Jeff Mahoney <jeffm@suse.com>
>>
>> Commit d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
>> fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
>> ended up reintroducing the hang-on-unmount bug that the commit it
>> intended to fix addressed.
>>
>> The race this time is between qgroup_rescan_init setting
>> ->qgroup_rescan_running = true and the worker starting.  There are
>> many scenarios where we initialize the worker and never start it.  The
>> completion btrfs_ioctl_quota_rescan_wait waits for will never come.
>> This can happen even without involving error handling, since mounting
>> the file system read-only returns between initializing the worker and
>> queueing it.
>>
>> The right place to do it is when we're queuing the worker.  The flag
>> really just means that btrfs_ioctl_quota_rescan_wait should wait for
>> a completion.
>>
>> This patch introduces a new helper, queue_rescan_worker, that handles
>> the ->qgroup_rescan_running flag, including any races with umount.
>>
>> While we're at it, ->qgroup_rescan_running is protected only by the
>> ->qgroup_rescan_mutex.  btrfs_ioctl_quota_rescan_wait doesn't need
>> to take the spinlock too.
>>
>> Fixes: d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> 
> I've added this to misc-next as I'd like to push it to the next rc. The
> Fixes is fixed.
> 
>> +	/* qgroup rescan worker is running or queued to run */
>>  	bool qgroup_rescan_running;	/* protected by qgroup_rescan_lock */
> 
> Comments merged.

Thanks.

>>  	/* filesystem state */
>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>> index aa259d6986e1..be491b6c020a 100644
>> --- a/fs/btrfs/qgroup.c
>> +++ b/fs/btrfs/qgroup.c
>> @@ -2072,6 +2072,30 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
>>  	return ret;
>>  }
>>  
>> +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
>> +{
> 
> And this had to be moved upwards as there was earlier use of
> btrfs_queue_work that matched following the hunk.

Weird.  That must be exactly the kind of mismerge artifact that we were
talking about the other day.  In my tree it's in the right spot.

-Jeff

-- 
Jeff Mahoney
SUSE Labs


* Re: [PATCH 3/3] btrfs: qgroup, don't try to insert status item after ENOMEM in rescan worker
  2018-04-27 15:44     ` David Sterba
@ 2018-04-27 16:08       ` Jeff Mahoney
  2018-04-27 16:11         ` [PATCH v2] " Jeff Mahoney
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff Mahoney @ 2018-04-27 16:08 UTC (permalink / raw)
  To: dsterba, Nikolay Borisov, linux-btrfs

On 4/27/18 11:44 AM, David Sterba wrote:
> On Thu, Apr 26, 2018 at 11:39:50PM +0300, Nikolay Borisov wrote:
>> On 26.04.2018 22:23, jeffm@suse.com wrote:
>>> From: Jeff Mahoney <jeffm@suse.com>
>>>
>>> If we fail to allocate memory for a path, don't bother trying to
>>> insert the qgroup status item.  We haven't done anything yet and it'll
>>> fail also.  Just print an error and be done with it.
>>>
>>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>>
>> nit: So the code is correct; however, having the out label there is
>> really ugly. What about, on path alloc failure, just having the print in
>> the if branch and doing a goto done?
> 
> Yeah, I don't like jumping to the inner blocks either. I saw this in the
> qgroup code so we should clean it up and not add new instances.
> 
> In this case, only the path allocation failure jumps to the out label,
> so printing the message and then jumping to done makes sense to me.
> However, the message would have to be duplicated in the end, and I don't
> see a better way without further restructuring the code.
> 

It doesn't require major surgery.  The else can be disconnected.

-Jeff

-- 
Jeff Mahoney
SUSE Labs


* [PATCH v2] btrfs: qgroup, don't try to insert status item after ENOMEM in rescan worker
  2018-04-27 16:08       ` Jeff Mahoney
@ 2018-04-27 16:11         ` Jeff Mahoney
  2018-04-27 16:34           ` David Sterba
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff Mahoney @ 2018-04-27 16:11 UTC (permalink / raw)
  To: dsterba, Nikolay Borisov, linux-btrfs

If we fail to allocate memory for a path, don't bother trying to
insert the qgroup status item.  We haven't done anything yet and it'll
fail also.  Just print an error and be done with it.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
---
 fs/btrfs/qgroup.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 8de423a0c7e3..b795bad54705 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2648,7 +2648,6 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
 			btrfs_end_transaction(trans);
 	}
 
-out:
 	btrfs_free_path(path);
 
 	mutex_lock(&fs_info->qgroup_rescan_lock);
@@ -2684,13 +2683,13 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
 
 	if (btrfs_fs_closing(fs_info)) {
 		btrfs_info(fs_info, "qgroup scan paused");
-	} else if (err >= 0) {
+		err = 0;
+	} else if (err >= 0)
 		btrfs_info(fs_info, "qgroup scan completed%s",
 			err > 0 ? " (inconsistency flag cleared)" : "");
-	} else {
+out:
+	if (err < 0)
 		btrfs_err(fs_info, "qgroup scan failed with %d", err);
-	}
-
 done:
 	mutex_lock(&fs_info->qgroup_rescan_lock);
 	fs_info->qgroup_rescan_running = false;
-- 
2.12.3



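The control flow after this restructuring can be modeled in stand-alone user-space C. This is only a sketch with hypothetical names (`rescan_worker_tail`, `last_msg`, `path_alloc_failed`), not the kernel function itself: on path-allocation failure it jumps straight to `out` so only the failure message fires, while the `err = 0` reset in the closing branch keeps "paused" and "failed" mutually exclusive.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Records the last message "logged", standing in for btrfs_info/btrfs_err. */
static char last_msg[80];

/* Models the tail of btrfs_qgroup_rescan_worker() after the v2 patch. */
static void rescan_worker_tail(int err, bool fs_closing, bool path_alloc_failed)
{
	if (path_alloc_failed) {
		err = -12;	/* -ENOMEM: nothing was done, skip the status item */
		goto out;
	}

	/* ... scan work and status-item update elided ... */

	if (fs_closing) {
		snprintf(last_msg, sizeof(last_msg), "qgroup scan paused");
		err = 0;	/* suppress the "failed" message below */
	} else if (err >= 0)
		snprintf(last_msg, sizeof(last_msg), "qgroup scan completed%s",
			 err > 0 ? " (inconsistency flag cleared)" : "");
out:
	if (err < 0)
		snprintf(last_msg, sizeof(last_msg),
			 "qgroup scan failed with %d", err);
}
```

With `err` reset to 0 on closing, a paused scan never also reports a failure, which is the point David's review below confirms.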

* Re: [PATCH v2] btrfs: qgroup, don't try to insert status item after ENOMEM in rescan worker
  2018-04-27 16:11         ` [PATCH v2] " Jeff Mahoney
@ 2018-04-27 16:34           ` David Sterba
  0 siblings, 0 replies; 32+ messages in thread
From: David Sterba @ 2018-04-27 16:34 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: dsterba, Nikolay Borisov, linux-btrfs

On Fri, Apr 27, 2018 at 12:11:00PM -0400, Jeff Mahoney wrote:
> If we fail to allocate memory for a path, don't bother trying to
> insert the qgroup status item.  We haven't done anything yet and it'll
> fail also.  Just print an error and be done with it.
> 
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> ---
>  fs/btrfs/qgroup.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 8de423a0c7e3..b795bad54705 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2648,7 +2648,6 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>  			btrfs_end_transaction(trans);
>  	}
>  
> -out:
>  	btrfs_free_path(path);
>  
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
> @@ -2684,13 +2683,13 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>  
>  	if (btrfs_fs_closing(fs_info)) {
>  		btrfs_info(fs_info, "qgroup scan paused");
> -	} else if (err >= 0) {
> +		err = 0;
> +	} else if (err >= 0)
>  		btrfs_info(fs_info, "qgroup scan completed%s",
>  			err > 0 ? " (inconsistency flag cleared)" : "");
> -	} else {
> +out:
> +	if (err < 0)
>  		btrfs_err(fs_info, "qgroup scan failed with %d", err);

Ah right, with the err = 0 in the fs_closing check we won't see both
messages reported, "qgroup scan paused" and "qgroup scan failed with %d".

Reviewed-by: David Sterba <dsterba@suse.com>

> -	}
> -
>  done:
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
>  	fs_info->qgroup_rescan_running = false;
> -- 
> 2.12.3
> 
> 
> --


* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-04-27 16:02   ` Jeff Mahoney
@ 2018-04-27 16:40     ` David Sterba
  2018-04-27 19:32       ` Jeff Mahoney
  0 siblings, 1 reply; 32+ messages in thread
From: David Sterba @ 2018-04-27 16:40 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: dsterba, linux-btrfs

On Fri, Apr 27, 2018 at 12:02:13PM -0400, Jeff Mahoney wrote:
> >> +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
> >> +{
> > 
> > And this had to be moved upwards as there was earlier use of
> > btrfs_queue_work that matched following the hunk.
> 
> Weird.  That must be exactly the kind of mismerge artifact that we were
> talking about the other day.  In my tree it's in the right spot.

I've tried current master, upcoming pull request queue (misc-4.17, one
non-conflicting patch) and current misc-next. None of them applies the
patch cleanly and the function is still added after the first use, so
this would not compile.

The result can be found in
https://github.com/kdave/btrfs-devel/commits/ext/jeffm/qgroup-fixes


* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-04-27 15:56 ` David Sterba
  2018-04-27 16:02   ` Jeff Mahoney
@ 2018-04-27 19:28   ` Noah Massey
  2018-04-28 17:10     ` David Sterba
  1 sibling, 1 reply; 32+ messages in thread
From: Noah Massey @ 2018-04-27 19:28 UTC (permalink / raw)
  To: David Sterba, Jeff Mahoney, linux-btrfs

On Fri, Apr 27, 2018 at 11:56 AM, David Sterba <dsterba@suse.cz> wrote:
> On Thu, Apr 26, 2018 at 03:23:49PM -0400, jeffm@suse.com wrote:
>> From: Jeff Mahoney <jeffm@suse.com>
>>
>> Commit d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
>> fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
>> ended up reintroducing the hang-on-unmount bug that the commit it
>> intended to fix addressed.
>>
>> The race this time is between qgroup_rescan_init setting
>> ->qgroup_rescan_running = true and the worker starting.  There are
>> many scenarios where we initialize the worker and never start it.  The
>> completion btrfs_ioctl_quota_rescan_wait waits for will never come.
>> This can happen even without involving error handling, since mounting
>> the file system read-only returns between initializing the worker and
>> queueing it.
>>
>> The right place to do it is when we're queuing the worker.  The flag
>> really just means that btrfs_ioctl_quota_rescan_wait should wait for
>> a completion.
>>
>> This patch introduces a new helper, queue_rescan_worker, that handles
>> the ->qgroup_rescan_running flag, including any races with umount.
>>
>> While we're at it, ->qgroup_rescan_running is protected only by the
>> ->qgroup_rescan_mutex.  btrfs_ioctl_quota_rescan_wait doesn't need
>> to take the spinlock too.
>>
>> Fixes: d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>
> I've added this to misc-next as I'd like to push it to the next rc. The
> Fixes is fixed.
>

I don't see it pushed to misc-next yet, but based on f89fbcd776, could
you update the reference in the first line of the commit to match the
Fixes line?

Thanks,
Noah


* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-04-27 16:40     ` David Sterba
@ 2018-04-27 19:32       ` Jeff Mahoney
  2018-04-28 17:09         ` David Sterba
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff Mahoney @ 2018-04-27 19:32 UTC (permalink / raw)
  To: dsterba, linux-btrfs

On 4/27/18 12:40 PM, David Sterba wrote:
> On Fri, Apr 27, 2018 at 12:02:13PM -0400, Jeff Mahoney wrote:
>>>> +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
>>>> +{
>>>
>>> And this had to be moved upwards as there was earlier use of
>>> btrfs_queue_work that matched following the hunk.
>>
>> Weird.  That must be exactly the kind of mismerge artifact that we were
>> talking about the other day.  In my tree it's in the right spot.
> 
> I've tried current master, upcoming pull request queue (misc-4.17, one
> nonc-onflicting patch) and current misc-next. None of them applies the
> patch cleanly and the function is still added after the first use, so
> this would not compile.
> 
> The result can be found in
> https://github.com/kdave/btrfs-devel/commits/ext/jeffm/qgroup-fixes
> 

Thanks.  The "Fixes" is incorrect there.  I had the right commit message
but not the right commit id.  It should be:

8d9eddad1946 (Btrfs: fix qgroup rescan worker initialization)

-Jeff

-- 
Jeff Mahoney
SUSE Labs


* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-04-27 19:32       ` Jeff Mahoney
@ 2018-04-28 17:09         ` David Sterba
  0 siblings, 0 replies; 32+ messages in thread
From: David Sterba @ 2018-04-28 17:09 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: dsterba, linux-btrfs

On Fri, Apr 27, 2018 at 03:32:14PM -0400, Jeff Mahoney wrote:
> On 4/27/18 12:40 PM, David Sterba wrote:
> > On Fri, Apr 27, 2018 at 12:02:13PM -0400, Jeff Mahoney wrote:
> >>>> +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
> >>>> +{
> >>>
> >>> And this had to be moved upwards as there was earlier use of
> >>> btrfs_queue_work that matched following the hunk.
> >>
> >> Weird.  That must be exactly the kind of mismerge artifact that we were
> >> talking about the other day.  In my tree it's in the right spot.
> > 
> > I've tried current master, upcoming pull request queue (misc-4.17, one
> > nonc-onflicting patch) and current misc-next. None of them applies the
> > patch cleanly and the function is still added after the first use, so
> > this would not compile.
> > 
> > The result can be found in
> > https://github.com/kdave/btrfs-devel/commits/ext/jeffm/qgroup-fixes
> > 
> 
> Thanks.  The "Fixes" is incorrect there.  I had the right commit message
> but not the right commit id.  It should be:
> 
> 8d9eddad1946 (Btrfs: fix qgroup rescan worker initialization)

I've updated the wrong part, subject instead of the commit id. Now
fixed.



* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-04-27 19:28   ` Noah Massey
@ 2018-04-28 17:10     ` David Sterba
  0 siblings, 0 replies; 32+ messages in thread
From: David Sterba @ 2018-04-28 17:10 UTC (permalink / raw)
  To: Noah Massey; +Cc: David Sterba, Jeff Mahoney, linux-btrfs

On Fri, Apr 27, 2018 at 03:28:44PM -0400, Noah Massey wrote:
> On Fri, Apr 27, 2018 at 11:56 AM, David Sterba <dsterba@suse.cz> wrote:
> > On Thu, Apr 26, 2018 at 03:23:49PM -0400, jeffm@suse.com wrote:
> >> From: Jeff Mahoney <jeffm@suse.com>
> >>
> >> Commit d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
...
> >>
> >> Fixes: d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
> >> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> >
> > I've added this to misc-next as I'd like to push it to the next rc. The
> > Fixes is fixed.
> >
> 
> I don't see it pushed to misc-next yet, but based on f89fbcd776, could
> you update the reference in the first line of the commit to match the
> Fixes line?

Fixed, thanks for the notice.


* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-04-26 19:23 [PATCH 1/3] btrfs: qgroups, fix rescan worker running races jeffm
                   ` (4 preceding siblings ...)
  2018-04-27 15:56 ` David Sterba
@ 2018-04-30  6:20 ` Qu Wenruo
  2018-04-30 14:07   ` Jeff Mahoney
  2018-05-02 10:29 ` David Sterba
  6 siblings, 1 reply; 32+ messages in thread
From: Qu Wenruo @ 2018-04-30  6:20 UTC (permalink / raw)
  To: jeffm, linux-btrfs





On 2018-04-27 03:23, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> Commit d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
> fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
> ended up reintroducing the hang-on-unmount bug that the commit it
> intended to fix addressed.
> 
> The race this time is between qgroup_rescan_init setting
> ->qgroup_rescan_running = true and the worker starting.  There are
> many scenarios where we initialize the worker and never start it.  The
> completion btrfs_ioctl_quota_rescan_wait waits for will never come.
> This can happen even without involving error handling, since mounting
> the file system read-only returns between initializing the worker and
> queueing it.
> 
> The right place to do it is when we're queuing the worker.  The flag
> really just means that btrfs_ioctl_quota_rescan_wait should wait for
> a completion.
> 
> This patch introduces a new helper, queue_rescan_worker, that handles
> the ->qgroup_rescan_running flag, including any races with umount.
> 
> While we're at it, ->qgroup_rescan_running is protected only by the
> ->qgroup_rescan_mutex.  btrfs_ioctl_quota_rescan_wait doesn't need
> to take the spinlock too.
> 
> Fixes: d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>

A little off-topic (thanks, Nikolay, for reporting this): sometimes
btrfs/017 can report qgroup corruption, and it turns out it's related
to a rescan race, which accounts existing tree blocks twice.
(Once by btrfs quota enable, again by btrfs quota rescan -w)

Would this patch help in such case?

Thanks,
Qu

> ---
>  fs/btrfs/ctree.h  |  1 +
>  fs/btrfs/qgroup.c | 40 ++++++++++++++++++++++++++++------------
>  2 files changed, 29 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index da308774b8a4..dbba615f4d0f 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1045,6 +1045,7 @@ struct btrfs_fs_info {
>  	struct btrfs_workqueue *qgroup_rescan_workers;
>  	struct completion qgroup_rescan_completion;
>  	struct btrfs_work qgroup_rescan_work;
> +	/* qgroup rescan worker is running or queued to run */
>  	bool qgroup_rescan_running;	/* protected by qgroup_rescan_lock */
>  
>  	/* filesystem state */
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index aa259d6986e1..be491b6c020a 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2072,6 +2072,30 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
>  	return ret;
>  }
>  
> +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
> +{
> +	mutex_lock(&fs_info->qgroup_rescan_lock);
> +	if (btrfs_fs_closing(fs_info)) {
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +	if (WARN_ON(fs_info->qgroup_rescan_running)) {
> +		btrfs_warn(fs_info, "rescan worker already queued");
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +
> +	/*
> +	 * Being queued is enough for btrfs_qgroup_wait_for_completion
> +	 * to need to wait.
> +	 */
> +	fs_info->qgroup_rescan_running = true;
> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
> +
> +	btrfs_queue_work(fs_info->qgroup_rescan_workers,
> +			 &fs_info->qgroup_rescan_work);
> +}
> +
>  /*
>   * called from commit_transaction. Writes all changed qgroups to disk.
>   */
> @@ -2123,8 +2147,7 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
>  		ret = qgroup_rescan_init(fs_info, 0, 1);
>  		if (!ret) {
>  			qgroup_rescan_zero_tracking(fs_info);
> -			btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -					 &fs_info->qgroup_rescan_work);
> +			queue_rescan_worker(fs_info);
>  		}
>  		ret = 0;
>  	}
> @@ -2713,7 +2736,6 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>  		sizeof(fs_info->qgroup_rescan_progress));
>  	fs_info->qgroup_rescan_progress.objectid = progress_objectid;
>  	init_completion(&fs_info->qgroup_rescan_completion);
> -	fs_info->qgroup_rescan_running = true;
>  
>  	spin_unlock(&fs_info->qgroup_lock);
>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
> @@ -2785,9 +2807,7 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
>  
>  	qgroup_rescan_zero_tracking(fs_info);
>  
> -	btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -			 &fs_info->qgroup_rescan_work);
> -
> +	queue_rescan_worker(fs_info);
>  	return 0;
>  }
>  
> @@ -2798,9 +2818,7 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>  	int ret = 0;
>  
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
> -	spin_lock(&fs_info->qgroup_lock);
>  	running = fs_info->qgroup_rescan_running;
> -	spin_unlock(&fs_info->qgroup_lock);
>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>  
>  	if (!running)
> @@ -2819,12 +2837,10 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>   * this is only called from open_ctree where we're still single threaded, thus
>   * locking is omitted here.
>   */
> -void
> -btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
> +void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
>  {
>  	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
> -		btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -				 &fs_info->qgroup_rescan_work);
> +		queue_rescan_worker(fs_info);
>  }
>  
>  /*
> 



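As a reading aid, the semantics of the helper quoted above (set the running flag under the same mutex the waiter takes; refuse to queue during umount or when already queued) can be sketched in stand-alone user-space C, with a pthread mutex standing in for the kernel mutex and a hypothetical `queued` counter standing in for btrfs_queue_work():

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t rescan_lock = PTHREAD_MUTEX_INITIALIZER;
static bool fs_closing;		/* models btrfs_fs_closing(fs_info) */
static bool rescan_running;	/* models fs_info->qgroup_rescan_running */
static int queued;		/* counts stand-in btrfs_queue_work() calls */

static void queue_rescan_worker(void)
{
	pthread_mutex_lock(&rescan_lock);
	if (fs_closing || rescan_running) {
		/* umount in progress, or worker already queued/running */
		pthread_mutex_unlock(&rescan_lock);
		return;
	}
	/* Being queued is already enough for a waiter to need to wait. */
	rescan_running = true;
	pthread_mutex_unlock(&rescan_lock);
	queued++;		/* models btrfs_queue_work() */
}

/* Models btrfs_qgroup_wait_for_completion()'s check: the mutex alone
 * suffices, no extra spinlock needed. */
static bool wait_needed(void)
{
	bool running;

	pthread_mutex_lock(&rescan_lock);
	running = rescan_running;
	pthread_mutex_unlock(&rescan_lock);
	return running;
}
```

The design point is that the flag now means "a completion is coming", so it is set at queue time, not at init time.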

* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-04-30  6:20 ` Qu Wenruo
@ 2018-04-30 14:07   ` Jeff Mahoney
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff Mahoney @ 2018-04-30 14:07 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 4/30/18 2:20 AM, Qu Wenruo wrote:
> 
> 
On 2018-04-27 03:23, jeffm@suse.com wrote:
>> From: Jeff Mahoney <jeffm@suse.com>
>>
>> Commit d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
>> fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
>> ended up reintroducing the hang-on-unmount bug that the commit it
>> intended to fix addressed.
>>
>> The race this time is between qgroup_rescan_init setting
>> ->qgroup_rescan_running = true and the worker starting.  There are
>> many scenarios where we initialize the worker and never start it.  The
>> completion btrfs_ioctl_quota_rescan_wait waits for will never come.
>> This can happen even without involving error handling, since mounting
>> the file system read-only returns between initializing the worker and
>> queueing it.
>>
>> The right place to do it is when we're queuing the worker.  The flag
>> really just means that btrfs_ioctl_quota_rescan_wait should wait for
>> a completion.
>>
>> This patch introduces a new helper, queue_rescan_worker, that handles
>> the ->qgroup_rescan_running flag, including any races with umount.
>>
>> While we're at it, ->qgroup_rescan_running is protected only by the
>> ->qgroup_rescan_mutex.  btrfs_ioctl_quota_rescan_wait doesn't need
>> to take the spinlock too.
>>
>> Fixes: d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> 
> A little off-topic (thanks, Nikolay, for reporting this): sometimes
> btrfs/017 can report qgroup corruption, and it turns out it's related
> to a rescan race, which accounts existing tree blocks twice.
> (Once by btrfs quota enable, again by btrfs quota rescan -w)
> 
> Would this patch help in such case?

It shouldn't.  This only fixes races between the rescan worker getting
initialized and running vs waiting for it to complete.

-Jeff

>>  fs/btrfs/ctree.h  |  1 +
>>  fs/btrfs/qgroup.c | 40 ++++++++++++++++++++++++++++------------
>>  2 files changed, 29 insertions(+), 12 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index da308774b8a4..dbba615f4d0f 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -1045,6 +1045,7 @@ struct btrfs_fs_info {
>>  	struct btrfs_workqueue *qgroup_rescan_workers;
>>  	struct completion qgroup_rescan_completion;
>>  	struct btrfs_work qgroup_rescan_work;
>> +	/* qgroup rescan worker is running or queued to run */
>>  	bool qgroup_rescan_running;	/* protected by qgroup_rescan_lock */
>>  
>>  	/* filesystem state */
>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>> index aa259d6986e1..be491b6c020a 100644
>> --- a/fs/btrfs/qgroup.c
>> +++ b/fs/btrfs/qgroup.c
>> @@ -2072,6 +2072,30 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
>>  	return ret;
>>  }
>>  
>> +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
>> +{
>> +	mutex_lock(&fs_info->qgroup_rescan_lock);
>> +	if (btrfs_fs_closing(fs_info)) {
>> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
>> +		return;
>> +	}
>> +	if (WARN_ON(fs_info->qgroup_rescan_running)) {
>> +		btrfs_warn(fs_info, "rescan worker already queued");
>> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
>> +		return;
>> +	}
>> +
>> +	/*
>> +	 * Being queued is enough for btrfs_qgroup_wait_for_completion
>> +	 * to need to wait.
>> +	 */
>> +	fs_info->qgroup_rescan_running = true;
>> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
>> +
>> +	btrfs_queue_work(fs_info->qgroup_rescan_workers,
>> +			 &fs_info->qgroup_rescan_work);
>> +}
>> +
>>  /*
>>   * called from commit_transaction. Writes all changed qgroups to disk.
>>   */
>> @@ -2123,8 +2147,7 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
>>  		ret = qgroup_rescan_init(fs_info, 0, 1);
>>  		if (!ret) {
>>  			qgroup_rescan_zero_tracking(fs_info);
>> -			btrfs_queue_work(fs_info->qgroup_rescan_workers,
>> -					 &fs_info->qgroup_rescan_work);
>> +			queue_rescan_worker(fs_info);
>>  		}
>>  		ret = 0;
>>  	}
>> @@ -2713,7 +2736,6 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>>  		sizeof(fs_info->qgroup_rescan_progress));
>>  	fs_info->qgroup_rescan_progress.objectid = progress_objectid;
>>  	init_completion(&fs_info->qgroup_rescan_completion);
>> -	fs_info->qgroup_rescan_running = true;
>>  
>>  	spin_unlock(&fs_info->qgroup_lock);
>>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>> @@ -2785,9 +2807,7 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
>>  
>>  	qgroup_rescan_zero_tracking(fs_info);
>>  
>> -	btrfs_queue_work(fs_info->qgroup_rescan_workers,
>> -			 &fs_info->qgroup_rescan_work);
>> -
>> +	queue_rescan_worker(fs_info);
>>  	return 0;
>>  }
>>  
>> @@ -2798,9 +2818,7 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>>  	int ret = 0;
>>  
>>  	mutex_lock(&fs_info->qgroup_rescan_lock);
>> -	spin_lock(&fs_info->qgroup_lock);
>>  	running = fs_info->qgroup_rescan_running;
>> -	spin_unlock(&fs_info->qgroup_lock);
>>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>>  
>>  	if (!running)
>> @@ -2819,12 +2837,10 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>>   * this is only called from open_ctree where we're still single threaded, thus
>>   * locking is omitted here.
>>   */
>> -void
>> -btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
>> +void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
>>  {
>>  	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
>> -		btrfs_queue_work(fs_info->qgroup_rescan_workers,
>> -				 &fs_info->qgroup_rescan_work);
>> +		queue_rescan_worker(fs_info);
>>  }
>>  
>>  /*
>>
> 


-- 
Jeff Mahoney
SUSE Labs


* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-04-26 19:23 [PATCH 1/3] btrfs: qgroups, fix rescan worker running races jeffm
                   ` (5 preceding siblings ...)
  2018-04-30  6:20 ` Qu Wenruo
@ 2018-05-02 10:29 ` David Sterba
  2018-05-02 13:15   ` David Sterba
  6 siblings, 1 reply; 32+ messages in thread
From: David Sterba @ 2018-05-02 10:29 UTC (permalink / raw)
  To: jeffm; +Cc: linux-btrfs

On Thu, Apr 26, 2018 at 03:23:49PM -0400, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
> +{
> +	mutex_lock(&fs_info->qgroup_rescan_lock);
> +	if (btrfs_fs_closing(fs_info)) {
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +	if (WARN_ON(fs_info->qgroup_rescan_running)) {

The warning is quite noisy; I see it after tests btrfs/017, 022, 124,
139, 153. Is it necessary for non-debugging builds?

The tested branch was full for-next so it could be your patchset
interacting with other fixes, but the warning noise level question still
stands.

> +		btrfs_warn(fs_info, "rescan worker already queued");
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +
> +	/*
> +	 * Being queued is enough for btrfs_qgroup_wait_for_completion
> +	 * to need to wait.
> +	 */
> +	fs_info->qgroup_rescan_running = true;
> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
> +
> +	btrfs_queue_work(fs_info->qgroup_rescan_workers,
> +			 &fs_info->qgroup_rescan_work);
> +}


* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-05-02 10:29 ` David Sterba
@ 2018-05-02 13:15   ` David Sterba
  2018-05-02 13:58     ` Jeff Mahoney
  0 siblings, 1 reply; 32+ messages in thread
From: David Sterba @ 2018-05-02 13:15 UTC (permalink / raw)
  To: dsterba, jeffm, linux-btrfs

On Wed, May 02, 2018 at 12:29:28PM +0200, David Sterba wrote:
> On Thu, Apr 26, 2018 at 03:23:49PM -0400, jeffm@suse.com wrote:
> > From: Jeff Mahoney <jeffm@suse.com>
> > +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
> > +{
> > +	mutex_lock(&fs_info->qgroup_rescan_lock);
> > +	if (btrfs_fs_closing(fs_info)) {
> > +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> > +		return;
> > +	}
> > +	if (WARN_ON(fs_info->qgroup_rescan_running)) {
> 
> The warning is quite noisy; I see it after tests btrfs/017, 022, 124,
> 139, 153. Is it necessary for non-debugging builds?
> 
> The tested branch was full for-next so it could be your patchset
> interacting with other fixes, but the warning noise level question still
> stands.

So it must be something with the rest of misc-next or for-next patches,
the current for-4.17 queue does not show the warning at all, and the patch is ok
for merge.


* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-05-02 13:15   ` David Sterba
@ 2018-05-02 13:58     ` Jeff Mahoney
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff Mahoney @ 2018-05-02 13:58 UTC (permalink / raw)
  To: dsterba, linux-btrfs

On 5/2/18 9:15 AM, David Sterba wrote:
> On Wed, May 02, 2018 at 12:29:28PM +0200, David Sterba wrote:
>> On Thu, Apr 26, 2018 at 03:23:49PM -0400, jeffm@suse.com wrote:
>>> From: Jeff Mahoney <jeffm@suse.com>
>>> +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
>>> +{
>>> +	mutex_lock(&fs_info->qgroup_rescan_lock);
>>> +	if (btrfs_fs_closing(fs_info)) {
>>> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
>>> +		return;
>>> +	}
>>> +	if (WARN_ON(fs_info->qgroup_rescan_running)) {
>>
>> The warning is quite noisy; I see it after tests btrfs/017, 022, 124,
>> 139, 153. Is it necessary for non-debugging builds?
>>
>> The tested branch was full for-next so it could be your patchset
>> interacting with other fixes, but the warning noise level question still
>> stands.
> 
> So it must be something with the rest of misc-next or for-next patches,
> the current for-4.17 queue does not show the warning at all, and the patch is ok
> for merge.
>
You might have something that causes it to be more noisy but it looks
like it should be possible to hit on 4.16.  The warning is supposed to
detect and complain about multiple rescan threads starting.  What I
think it's doing here is (correctly) identifying a different race: at
the end of btrfs_qgroup_rescan_worker, we clear the rescan status flag,
drop the lock, commit the status item transaction, and then update
->qgroup_rescan_running.  If a rescan is requested before the lock is
reacquired, we'll try to start it up and then hit that warning.

So, the warning is doing its job.  Please hold off on merging this patch.
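A minimal stand-alone C model of that window (hypothetical names; not kernel code) makes it concrete: the worker clears the on-disk rescan status flag before it clears ->qgroup_rescan_running, so a rescan requested in between trips the WARN_ON.

```c
#include <stdbool.h>

static bool status_flag;	/* models BTRFS_QGROUP_STATUS_FLAG_RESCAN */
static bool running;		/* models fs_info->qgroup_rescan_running */
static int warn_count;		/* counts WARN_ON() hits */

/* Models queue_rescan_worker() handling a rescan request. */
static void request_rescan(void)
{
	if (running) {
		warn_count++;	/* WARN_ON(fs_info->qgroup_rescan_running) */
		return;
	}
	running = true;
	status_flag = true;
}

/* The worker's tail, split at the point where the lock is dropped. */
static void worker_clear_status_flag(void)
{
	status_flag = false;	/* status item committed, lock dropped ... */
}

static void worker_clear_running(void)
{
	running = false;	/* ... and only later is the flag updated */
}
```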

IMO the root cause is overloading fs_info->qgroup_flags to correspond to
the on-disk item and control runtime behavior.  I've been meaning to fix
that for a while, so I'll do that now.

-Jeff

-- 
Jeff Mahoney
SUSE Labs


* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-05-02 21:11 ` [PATCH 1/3] btrfs: qgroups, fix rescan worker running races jeffm
                     ` (2 preceding siblings ...)
  2018-05-10 23:04   ` Jeff Mahoney
@ 2020-01-16  6:41   ` Qu Wenruo
  3 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2020-01-16  6:41 UTC (permalink / raw)
  To: jeffm, dsterba, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 8591 bytes --]

Hi Jeff,

Would you like to share more info about how the race happened?

A stack dump, race graph, or even just a reproducer would help greatly.
Especially since I suspect the latest qgroup rescan code is no longer
affected, but without a firm reproducer or race cause I can't rule it
out completely.

Thanks,
Qu

On 2018/5/3 5:11 AM, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> Commit 8d9eddad194 (Btrfs: fix qgroup rescan worker initialization)
> fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
> ended up reintroducing the hang-on-unmount bug that the commit it
> intended to fix addressed.
> 
> The race this time is between qgroup_rescan_init setting
> ->qgroup_rescan_running = true and the worker starting.  There are
> many scenarios where we initialize the worker and never start it.  The
> completion btrfs_ioctl_quota_rescan_wait waits for will never come.
> This can happen even without involving error handling, since mounting
> the file system read-only returns between initializing the worker and
> queueing it.
> 
> The right place to do it is when we're queuing the worker.  The flag
> really just means that btrfs_ioctl_quota_rescan_wait should wait for
> a completion.
> 
> Since the BTRFS_QGROUP_STATUS_FLAG_RESCAN flag is overloaded to
> refer to both runtime behavior and on-disk state, we introduce a new
> fs_info->qgroup_rescan_ready to indicate that we're initialized and
> waiting to start.
> 
> This patch introduces a new helper, queue_rescan_worker, that handles
> most of the initialization, the two flags, and queuing the worker,
> including races with unmount.
> 
> While we're at it, ->qgroup_rescan_running is protected only by the
> ->qgroup_rescan_mutex.  btrfs_ioctl_quota_rescan_wait doesn't need
> to take the spinlock too.
> 
> Fixes: 8d9eddad194 (Btrfs: fix qgroup rescan worker initialization)
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> ---
>  fs/btrfs/ctree.h  |  2 ++
>  fs/btrfs/qgroup.c | 94 +++++++++++++++++++++++++++++++++----------------------
>  2 files changed, 58 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index da308774b8a4..4003498bb714 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1045,6 +1045,8 @@ struct btrfs_fs_info {
>  	struct btrfs_workqueue *qgroup_rescan_workers;
>  	struct completion qgroup_rescan_completion;
>  	struct btrfs_work qgroup_rescan_work;
> +	/* qgroup rescan worker is running or queued to run */
> +	bool qgroup_rescan_ready;
>  	bool qgroup_rescan_running;	/* protected by qgroup_rescan_lock */
>  
>  	/* filesystem state */
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index aa259d6986e1..466744741873 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -101,6 +101,7 @@ static int
>  qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>  		   int init_flags);
>  static void qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info);
> +static void btrfs_qgroup_rescan_worker(struct btrfs_work *work);
>  
>  /* must be called with qgroup_ioctl_lock held */
>  static struct btrfs_qgroup *find_qgroup_rb(struct btrfs_fs_info *fs_info,
> @@ -2072,6 +2073,46 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
>  	return ret;
>  }
>  
> +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
> +{
> +	mutex_lock(&fs_info->qgroup_rescan_lock);
> +	if (btrfs_fs_closing(fs_info)) {
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +
> +	if (WARN_ON(!fs_info->qgroup_rescan_ready)) {
> +		btrfs_warn(fs_info, "rescan worker not ready");
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +	fs_info->qgroup_rescan_ready = false;
> +
> +	if (WARN_ON(fs_info->qgroup_rescan_running)) {
> +		btrfs_warn(fs_info, "rescan worker already queued");
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +
> +	/*
> +	 * Being queued is enough for btrfs_qgroup_wait_for_completion
> +	 * to need to wait.
> +	 */
> +	fs_info->qgroup_rescan_running = true;
> +	init_completion(&fs_info->qgroup_rescan_completion);
> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
> +
> +	memset(&fs_info->qgroup_rescan_work, 0,
> +	       sizeof(fs_info->qgroup_rescan_work));
> +
> +	btrfs_init_work(&fs_info->qgroup_rescan_work,
> +			btrfs_qgroup_rescan_helper,
> +			btrfs_qgroup_rescan_worker, NULL, NULL);
> +
> +	btrfs_queue_work(fs_info->qgroup_rescan_workers,
> +			 &fs_info->qgroup_rescan_work);
> +}
> +
>  /*
>   * called from commit_transaction. Writes all changed qgroups to disk.
>   */
> @@ -2123,8 +2164,7 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
>  		ret = qgroup_rescan_init(fs_info, 0, 1);
>  		if (!ret) {
>  			qgroup_rescan_zero_tracking(fs_info);
> -			btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -					 &fs_info->qgroup_rescan_work);
> +			queue_rescan_worker(fs_info);
>  		}
>  		ret = 0;
>  	}
> @@ -2607,6 +2647,10 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>  	if (!path)
>  		goto out;
>  
> +	mutex_lock(&fs_info->qgroup_rescan_lock);
> +	fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
> +
>  	err = 0;
>  	while (!err && !btrfs_fs_closing(fs_info)) {
>  		trans = btrfs_start_transaction(fs_info->fs_root, 0);
> @@ -2685,47 +2729,27 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>  {
>  	int ret = 0;
>  
> -	if (!init_flags &&
> -	    (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) ||
> -	     !(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))) {
> +	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
>  		ret = -EINVAL;
>  		goto err;
>  	}
>  
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
> -	spin_lock(&fs_info->qgroup_lock);
> -
> -	if (init_flags) {
> -		if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
> -			ret = -EINPROGRESS;
> -		else if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))
> -			ret = -EINVAL;
> -
> -		if (ret) {
> -			spin_unlock(&fs_info->qgroup_lock);
> -			mutex_unlock(&fs_info->qgroup_rescan_lock);
> -			goto err;
> -		}
> -		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
> +	if (fs_info->qgroup_rescan_ready || fs_info->qgroup_rescan_running) {
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		ret = -EINPROGRESS;
> +		goto err;
>  	}
>  
>  	memset(&fs_info->qgroup_rescan_progress, 0,
>  		sizeof(fs_info->qgroup_rescan_progress));
>  	fs_info->qgroup_rescan_progress.objectid = progress_objectid;
> -	init_completion(&fs_info->qgroup_rescan_completion);
> -	fs_info->qgroup_rescan_running = true;
> +	fs_info->qgroup_rescan_ready = true;
>  
> -	spin_unlock(&fs_info->qgroup_lock);
>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>  
> -	memset(&fs_info->qgroup_rescan_work, 0,
> -	       sizeof(fs_info->qgroup_rescan_work));
> -	btrfs_init_work(&fs_info->qgroup_rescan_work,
> -			btrfs_qgroup_rescan_helper,
> -			btrfs_qgroup_rescan_worker, NULL, NULL);
> -
> -	if (ret) {
>  err:
> +	if (ret) {
>  		btrfs_info(fs_info, "qgroup_rescan_init failed with %d", ret);
>  		return ret;
>  	}
> @@ -2785,9 +2809,7 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
>  
>  	qgroup_rescan_zero_tracking(fs_info);
>  
> -	btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -			 &fs_info->qgroup_rescan_work);
> -
> +	queue_rescan_worker(fs_info);
>  	return 0;
>  }
>  
> @@ -2798,9 +2820,7 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>  	int ret = 0;
>  
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
> -	spin_lock(&fs_info->qgroup_lock);
>  	running = fs_info->qgroup_rescan_running;
> -	spin_unlock(&fs_info->qgroup_lock);
>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>  
>  	if (!running)
> @@ -2819,12 +2839,10 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>   * this is only called from open_ctree where we're still single threaded, thus
>   * locking is omitted here.
>   */
> -void
> -btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
> +void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
>  {
>  	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
> -		btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -				 &fs_info->qgroup_rescan_work);
> +		queue_rescan_worker(fs_info);
>  }
>  
>  /*
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-05-02 21:11 ` [PATCH 1/3] btrfs: qgroups, fix rescan worker running races jeffm
  2018-05-03  7:24   ` Nikolay Borisov
  2018-05-10 19:49   ` Jeff Mahoney
@ 2018-05-10 23:04   ` Jeff Mahoney
  2020-01-16  6:41   ` Qu Wenruo
  3 siblings, 0 replies; 32+ messages in thread
From: Jeff Mahoney @ 2018-05-10 23:04 UTC (permalink / raw)
  To: dsterba, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 8643 bytes --]

On 5/2/18 5:11 PM, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> Commit 8d9eddad194 (Btrfs: fix qgroup rescan worker initialization)
> fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
> ended up reintroducing the hang-on-unmount bug that the commit it
> intended to fix addressed.
> 
> The race this time is between qgroup_rescan_init setting
> ->qgroup_rescan_running = true and the worker starting.  There are
> many scenarios where we initialize the worker and never start it.  The
> completion btrfs_ioctl_quota_rescan_wait waits for will never come.
> This can happen even without involving error handling, since mounting
> the file system read-only returns between initializing the worker and
> queueing it.
> 
> The right place to do it is when we're queuing the worker.  The flag
> really just means that btrfs_ioctl_quota_rescan_wait should wait for
> a completion.
> 
> Since the BTRFS_QGROUP_STATUS_FLAG_RESCAN flag is overloaded to
> refer to both runtime behavior and on-disk state, we introduce a new
> fs_info->qgroup_rescan_ready to indicate that we're initialized and
> waiting to start.
> 
> This patch introduces a new helper, queue_rescan_worker, that handles
> most of the initialization, the two flags, and queuing the worker,
> including races with unmount.
> 
> While we're at it, ->qgroup_rescan_running is protected only by the
> ->qgroup_rescan_mutex.  btrfs_ioctl_quota_rescan_wait doesn't need
> to take the spinlock too.
> 
> Fixes: 8d9eddad194 (Btrfs: fix qgroup rescan worker initialization)
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> ---
>  fs/btrfs/ctree.h  |  2 ++
>  fs/btrfs/qgroup.c | 94 +++++++++++++++++++++++++++++++++----------------------
>  2 files changed, 58 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index da308774b8a4..4003498bb714 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1045,6 +1045,8 @@ struct btrfs_fs_info {
>  	struct btrfs_workqueue *qgroup_rescan_workers;
>  	struct completion qgroup_rescan_completion;
>  	struct btrfs_work qgroup_rescan_work;
> +	/* qgroup rescan worker is running or queued to run */
> +	bool qgroup_rescan_ready;
>  	bool qgroup_rescan_running;	/* protected by qgroup_rescan_lock */
>  
>  	/* filesystem state */
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index aa259d6986e1..466744741873 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -101,6 +101,7 @@ static int
>  qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>  		   int init_flags);
>  static void qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info);
> +static void btrfs_qgroup_rescan_worker(struct btrfs_work *work);
>  
>  /* must be called with qgroup_ioctl_lock held */
>  static struct btrfs_qgroup *find_qgroup_rb(struct btrfs_fs_info *fs_info,
> @@ -2072,6 +2073,46 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
>  	return ret;
>  }
>  
> +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
> +{
> +	mutex_lock(&fs_info->qgroup_rescan_lock);
> +	if (btrfs_fs_closing(fs_info)) {
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +
> +	if (WARN_ON(!fs_info->qgroup_rescan_ready)) {
> +		btrfs_warn(fs_info, "rescan worker not ready");
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +	fs_info->qgroup_rescan_ready = false;
> +
> +	if (WARN_ON(fs_info->qgroup_rescan_running)) {
> +		btrfs_warn(fs_info, "rescan worker already queued");
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +
> +	/*
> +	 * Being queued is enough for btrfs_qgroup_wait_for_completion
> +	 * to need to wait.
> +	 */
> +	fs_info->qgroup_rescan_running = true;
> +	init_completion(&fs_info->qgroup_rescan_completion);
> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
> +
> +	memset(&fs_info->qgroup_rescan_work, 0,
> +	       sizeof(fs_info->qgroup_rescan_work));
> +
> +	btrfs_init_work(&fs_info->qgroup_rescan_work,
> +			btrfs_qgroup_rescan_helper,
> +			btrfs_qgroup_rescan_worker, NULL, NULL);
> +
> +	btrfs_queue_work(fs_info->qgroup_rescan_workers,
> +			 &fs_info->qgroup_rescan_work);
> +}
> +
>  /*
>   * called from commit_transaction. Writes all changed qgroups to disk.
>   */
> @@ -2123,8 +2164,7 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
>  		ret = qgroup_rescan_init(fs_info, 0, 1);
>  		if (!ret) {
>  			qgroup_rescan_zero_tracking(fs_info);
> -			btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -					 &fs_info->qgroup_rescan_work);
> +			queue_rescan_worker(fs_info);
>  		}
>  		ret = 0;
>  	}
> @@ -2607,6 +2647,10 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>  	if (!path)
>  		goto out;
>  
> +	mutex_lock(&fs_info->qgroup_rescan_lock);
> +	fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
> +
>  	err = 0;
>  	while (!err && !btrfs_fs_closing(fs_info)) {
>  		trans = btrfs_start_transaction(fs_info->fs_root, 0);
> @@ -2685,47 +2729,27 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>  {
>  	int ret = 0;
>  
> -	if (!init_flags &&
> -	    (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) ||
> -	     !(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))) {
> +	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
>  		ret = -EINVAL;
>  		goto err;
>  	}
>  
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
> -	spin_lock(&fs_info->qgroup_lock);
> -
> -	if (init_flags) {
> -		if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
> -			ret = -EINPROGRESS;
> -		else if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))
> -			ret = -EINVAL;
> -
> -		if (ret) {
> -			spin_unlock(&fs_info->qgroup_lock);
> -			mutex_unlock(&fs_info->qgroup_rescan_lock);
> -			goto err;
> -		}
> -		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
> +	if (fs_info->qgroup_rescan_ready || fs_info->qgroup_rescan_running) {
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		ret = -EINPROGRESS;
> +		goto err;
>  	}

Without checking for these flags when deciding whether we want to do
accounting, we'll end up doing the accounting until we start the rescan
thread.  This may not matter normally, but I'm working on a patch, as a
workaround to 824d8dff8846533c9f1f9b1eabb0c03959e989ca, that does all
qgroup accounting for entire trees in single transactions.

-Jeff

>  	memset(&fs_info->qgroup_rescan_progress, 0,
>  		sizeof(fs_info->qgroup_rescan_progress));
>  	fs_info->qgroup_rescan_progress.objectid = progress_objectid;
> -	init_completion(&fs_info->qgroup_rescan_completion);
> -	fs_info->qgroup_rescan_running = true;
> +	fs_info->qgroup_rescan_ready = true;
>  
> -	spin_unlock(&fs_info->qgroup_lock);
>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>  
> -	memset(&fs_info->qgroup_rescan_work, 0,
> -	       sizeof(fs_info->qgroup_rescan_work));
> -	btrfs_init_work(&fs_info->qgroup_rescan_work,
> -			btrfs_qgroup_rescan_helper,
> -			btrfs_qgroup_rescan_worker, NULL, NULL);
> -
> -	if (ret) {
>  err:
> +	if (ret) {
>  		btrfs_info(fs_info, "qgroup_rescan_init failed with %d", ret);
>  		return ret;
>  	}
> @@ -2785,9 +2809,7 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
>  
>  	qgroup_rescan_zero_tracking(fs_info);
>  
> -	btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -			 &fs_info->qgroup_rescan_work);
> -
> +	queue_rescan_worker(fs_info);
>  	return 0;
>  }
>  
> @@ -2798,9 +2820,7 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>  	int ret = 0;
>  
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
> -	spin_lock(&fs_info->qgroup_lock);
>  	running = fs_info->qgroup_rescan_running;
> -	spin_unlock(&fs_info->qgroup_lock);
>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>  
>  	if (!running)
> @@ -2819,12 +2839,10 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>   * this is only called from open_ctree where we're still single threaded, thus
>   * locking is omitted here.
>   */
> -void
> -btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
> +void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
>  {
>  	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
> -		btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -				 &fs_info->qgroup_rescan_work);
> +		queue_rescan_worker(fs_info);
>  }
>  
>  /*
> 


-- 
Jeff Mahoney
SUSE Labs


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-05-02 21:11 ` [PATCH 1/3] btrfs: qgroups, fix rescan worker running races jeffm
  2018-05-03  7:24   ` Nikolay Borisov
@ 2018-05-10 19:49   ` Jeff Mahoney
  2018-05-10 23:04   ` Jeff Mahoney
  2020-01-16  6:41   ` Qu Wenruo
  3 siblings, 0 replies; 32+ messages in thread
From: Jeff Mahoney @ 2018-05-10 19:49 UTC (permalink / raw)
  To: dsterba, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 8391 bytes --]

On 5/2/18 5:11 PM, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> Commit 8d9eddad194 (Btrfs: fix qgroup rescan worker initialization)
> fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
> ended up reintroducing the hang-on-unmount bug that the commit it
> intended to fix addressed.
> 
> The race this time is between qgroup_rescan_init setting
> ->qgroup_rescan_running = true and the worker starting.  There are
> many scenarios where we initialize the worker and never start it.  The
> completion btrfs_ioctl_quota_rescan_wait waits for will never come.
> This can happen even without involving error handling, since mounting
> the file system read-only returns between initializing the worker and
> queueing it.
> 
> The right place to do it is when we're queuing the worker.  The flag
> really just means that btrfs_ioctl_quota_rescan_wait should wait for
> a completion.
> 
> Since the BTRFS_QGROUP_STATUS_FLAG_RESCAN flag is overloaded to
> refer to both runtime behavior and on-disk state, we introduce a new
> fs_info->qgroup_rescan_ready to indicate that we're initialized and
> waiting to start.
> 
> This patch introduces a new helper, queue_rescan_worker, that handles
> most of the initialization, the two flags, and queuing the worker,
> including races with unmount.
> 
> While we're at it, ->qgroup_rescan_running is protected only by the
> ->qgroup_rescan_mutex.  btrfs_ioctl_quota_rescan_wait doesn't need
> to take the spinlock too.
> 
> Fixes: 8d9eddad194 (Btrfs: fix qgroup rescan worker initialization)
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> ---
>  fs/btrfs/ctree.h  |  2 ++
>  fs/btrfs/qgroup.c | 94 +++++++++++++++++++++++++++++++++----------------------
>  2 files changed, 58 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index da308774b8a4..4003498bb714 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1045,6 +1045,8 @@ struct btrfs_fs_info {
>  	struct btrfs_workqueue *qgroup_rescan_workers;
>  	struct completion qgroup_rescan_completion;
>  	struct btrfs_work qgroup_rescan_work;
> +	/* qgroup rescan worker is running or queued to run */
> +	bool qgroup_rescan_ready;
>  	bool qgroup_rescan_running;	/* protected by qgroup_rescan_lock */
>  
>  	/* filesystem state */
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index aa259d6986e1..466744741873 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -101,6 +101,7 @@ static int
>  qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>  		   int init_flags);
>  static void qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info);
> +static void btrfs_qgroup_rescan_worker(struct btrfs_work *work);
>  
>  /* must be called with qgroup_ioctl_lock held */
>  static struct btrfs_qgroup *find_qgroup_rb(struct btrfs_fs_info *fs_info,
> @@ -2072,6 +2073,46 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
>  	return ret;
>  }
>  
> +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
> +{
> +	mutex_lock(&fs_info->qgroup_rescan_lock);
> +	if (btrfs_fs_closing(fs_info)) {
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +
> +	if (WARN_ON(!fs_info->qgroup_rescan_ready)) {
> +		btrfs_warn(fs_info, "rescan worker not ready");
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +	fs_info->qgroup_rescan_ready = false;
> +
> +	if (WARN_ON(fs_info->qgroup_rescan_running)) {
> +		btrfs_warn(fs_info, "rescan worker already queued");
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +
> +	/*
> +	 * Being queued is enough for btrfs_qgroup_wait_for_completion
> +	 * to need to wait.
> +	 */
> +	fs_info->qgroup_rescan_running = true;
> +	init_completion(&fs_info->qgroup_rescan_completion);
> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
> +
> +	memset(&fs_info->qgroup_rescan_work, 0,
> +	       sizeof(fs_info->qgroup_rescan_work));
> +
> +	btrfs_init_work(&fs_info->qgroup_rescan_work,
> +			btrfs_qgroup_rescan_helper,
> +			btrfs_qgroup_rescan_worker, NULL, NULL);
> +
> +	btrfs_queue_work(fs_info->qgroup_rescan_workers,
> +			 &fs_info->qgroup_rescan_work);
> +}
> +
>  /*
>   * called from commit_transaction. Writes all changed qgroups to disk.
>   */
> @@ -2123,8 +2164,7 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
>  		ret = qgroup_rescan_init(fs_info, 0, 1);
>  		if (!ret) {
>  			qgroup_rescan_zero_tracking(fs_info);
> -			btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -					 &fs_info->qgroup_rescan_work);
> +			queue_rescan_worker(fs_info);
>  		}
>  		ret = 0;
>  	}
> @@ -2607,6 +2647,10 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>  	if (!path)
>  		goto out;
>  
> +	mutex_lock(&fs_info->qgroup_rescan_lock);
> +	fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
> +
>  	err = 0;
>  	while (!err && !btrfs_fs_closing(fs_info)) {
>  		trans = btrfs_start_transaction(fs_info->fs_root, 0);
> @@ -2685,47 +2729,27 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>  {
>  	int ret = 0;
>  
> -	if (!init_flags &&
> -	    (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) ||
> -	     !(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))) {
> +	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
>  		ret = -EINVAL;
>  		goto err;
>  	}
>  
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
> -	spin_lock(&fs_info->qgroup_lock);
> -
> -	if (init_flags) {
> -		if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
> -			ret = -EINPROGRESS;
> -		else if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))
> -			ret = -EINVAL;
> -
> -		if (ret) {
> -			spin_unlock(&fs_info->qgroup_lock);
> -			mutex_unlock(&fs_info->qgroup_rescan_lock);
> -			goto err;
> -		}
> -		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
> +	if (fs_info->qgroup_rescan_ready || fs_info->qgroup_rescan_running) {
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		ret = -EINPROGRESS;
> +		goto err;
>  	}
>  
>  	memset(&fs_info->qgroup_rescan_progress, 0,
>  		sizeof(fs_info->qgroup_rescan_progress));
>  	fs_info->qgroup_rescan_progress.objectid = progress_objectid;
> -	init_completion(&fs_info->qgroup_rescan_completion);
> -	fs_info->qgroup_rescan_running = true;
> +	fs_info->qgroup_rescan_ready = true;
>  
> -	spin_unlock(&fs_info->qgroup_lock);
>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>  
> -	memset(&fs_info->qgroup_rescan_work, 0,
> -	       sizeof(fs_info->qgroup_rescan_work));
> -	btrfs_init_work(&fs_info->qgroup_rescan_work,
> -			btrfs_qgroup_rescan_helper,
> -			btrfs_qgroup_rescan_worker, NULL, NULL);
> -
> -	if (ret) {
>  err:
> +	if (ret) {
>  		btrfs_info(fs_info, "qgroup_rescan_init failed with %d", ret);
>  		return ret;
>  	}
> @@ -2785,9 +2809,7 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
>  
>  	qgroup_rescan_zero_tracking(fs_info);
>  
> -	btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -			 &fs_info->qgroup_rescan_work);
> -
> +	queue_rescan_worker(fs_info);
>  	return 0;
>  }
>  
> @@ -2798,9 +2820,7 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>  	int ret = 0;
>  
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
> -	spin_lock(&fs_info->qgroup_lock);
>  	running = fs_info->qgroup_rescan_running;
> -	spin_unlock(&fs_info->qgroup_lock);
>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>  
>  	if (!running)
> @@ -2819,12 +2839,10 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>   * this is only called from open_ctree where we're still single threaded, thus
>   * locking is omitted here.
>   */
> -void
> -btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
> +void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
>  {
>  	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)

This check will never be true since the worker is now responsible for
setting it.

-Jeff

> -		btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -				 &fs_info->qgroup_rescan_work);
> +		queue_rescan_worker(fs_info);
>  }
>  
>  /*
> 


-- 
Jeff Mahoney
SUSE Labs


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-05-03 15:52       ` Nikolay Borisov
@ 2018-05-03 15:57         ` Jeff Mahoney
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff Mahoney @ 2018-05-03 15:57 UTC (permalink / raw)
  To: Nikolay Borisov, dsterba, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2662 bytes --]

On 5/3/18 11:52 AM, Nikolay Borisov wrote:
> 
> 
> On  3.05.2018 16:39, Jeff Mahoney wrote:
>> On 5/3/18 3:24 AM, Nikolay Borisov wrote:
>>>
>>>
>>> On  3.05.2018 00:11, jeffm@suse.com wrote:
>>>> From: Jeff Mahoney <jeffm@suse.com>
>>>>
>>>> Commit 8d9eddad194 (Btrfs: fix qgroup rescan worker initialization)
>>>> fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
>>>> ended up reintroducing the hang-on-unmount bug that the commit it
>>>> intended to fix addressed.
>>>>
>>>> The race this time is between qgroup_rescan_init setting
>>>> ->qgroup_rescan_running = true and the worker starting.  There are
>>>> many scenarios where we initialize the worker and never start it.  The
>>>> completion btrfs_ioctl_quota_rescan_wait waits for will never come.
>>>> This can happen even without involving error handling, since mounting
>>>> the file system read-only returns between initializing the worker and
>>>> queueing it.
>>>>
>>>> The right place to do it is when we're queuing the worker.  The flag
>>>> really just means that btrfs_ioctl_quota_rescan_wait should wait for
>>>> a completion.
>>>>
>>>> Since the BTRFS_QGROUP_STATUS_FLAG_RESCAN flag is overloaded to
>>>> refer to both runtime behavior and on-disk state, we introduce a new
>>>> fs_info->qgroup_rescan_ready to indicate that we're initialized and
>>>> waiting to start.
>>>
>>> Am I correct in my understanding that this qgroup_rescan_ready flag is
>>> used to avoid qgroup_rescan_init being called AFTER it has already been
>>> called but BEFORE queue_rescan_worker ? Why wasn't the initial version
>>> of this patch without this flag sufficient?
>>
>> No, the race is between clearing the BTRFS_QGROUP_STATUS_FLAG_RESCAN
>> flag near the end of the worker and clearing the running flag.  The
>> rescan lock is dropped in between, so btrfs_rescan_init will let a new
>> rescan request in while we update the status item on disk.  We wouldn't
>> have queued another worker since that's what the warning catches, but if
>> there were already tasks waiting for completion, they wouldn't have been
>> woken since the wait queue list would be reinitialized.  There's no way
>> to reorder clearing the flag without changing how we handle
>> ->qgroup_flags.  I plan on doing that separately.  This was just meant
>> to be the simple fix.
> 
> Great, I think some of this information should go into the change log,
> explaining what the symptoms of the race condition are.

You're right.  I was treating it as a race that my patch introduced, but
it didn't.  It just complained about it.

-Jeff

-- 
Jeff Mahoney
SUSE Labs


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-05-03 13:39     ` Jeff Mahoney
@ 2018-05-03 15:52       ` Nikolay Borisov
  2018-05-03 15:57         ` Jeff Mahoney
  0 siblings, 1 reply; 32+ messages in thread
From: Nikolay Borisov @ 2018-05-03 15:52 UTC (permalink / raw)
  To: Jeff Mahoney, dsterba, linux-btrfs



On  3.05.2018 16:39, Jeff Mahoney wrote:
> On 5/3/18 3:24 AM, Nikolay Borisov wrote:
>>
>>
>> On  3.05.2018 00:11, jeffm@suse.com wrote:
>>> From: Jeff Mahoney <jeffm@suse.com>
>>>
>>> Commit 8d9eddad194 (Btrfs: fix qgroup rescan worker initialization)
>>> fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
>>> ended up reintroducing the hang-on-unmount bug that the commit it
>>> intended to fix addressed.
>>>
>>> The race this time is between qgroup_rescan_init setting
>>> ->qgroup_rescan_running = true and the worker starting.  There are
>>> many scenarios where we initialize the worker and never start it.  The
>>> completion btrfs_ioctl_quota_rescan_wait waits for will never come.
>>> This can happen even without involving error handling, since mounting
>>> the file system read-only returns between initializing the worker and
>>> queueing it.
>>>
>>> The right place to do it is when we're queuing the worker.  The flag
>>> really just means that btrfs_ioctl_quota_rescan_wait should wait for
>>> a completion.
>>>
>>> Since the BTRFS_QGROUP_STATUS_FLAG_RESCAN flag is overloaded to
>>> refer to both runtime behavior and on-disk state, we introduce a new
>>> fs_info->qgroup_rescan_ready to indicate that we're initialized and
>>> waiting to start.
>>
>> Am I correct in my understanding that this qgroup_rescan_ready flag is
>> used to avoid qgroup_rescan_init being called AFTER it has already been
>> called but BEFORE queue_rescan_worker ? Why wasn't the initial version
>> of this patch without this flag sufficient?
> 
> No, the race is between clearing the BTRFS_QGROUP_STATUS_FLAG_RESCAN
> flag near the end of the worker and clearing the running flag.  The
> rescan lock is dropped in between, so qgroup_rescan_init will let a new
> rescan request in while we update the status item on disk.  We wouldn't
> have queued another worker since that's what the warning catches, but if
> there were already tasks waiting for completion, they wouldn't have been
> woken since the wait queue list would be reinitialized.  There's no way
> to reorder clearing the flag without changing how we handle
> ->qgroup_flags.  I plan on doing that separately.  This was just meant
> to be the simple fix.

Great, I think some of this information should go into the change log
to explain what the symptoms of the race condition are.

> 
> That we can use the ready variable to also ensure that we don't let
> qgroup_rescan_init be called twice without running the rescan is a nice
> bonus.
> 
> -Jeff
> 
>>>
>>> This patch introduces a new helper, queue_rescan_worker, that handles
>>> most of the initialization, the two flags, and queuing the worker,
>>> including races with unmount.
>>>
>>> While we're at it, ->qgroup_rescan_running is protected only by the
>>> ->qgroup_rescan_mutex.  btrfs_ioctl_quota_rescan_wait doesn't need
>>> to take the spinlock too.
>>>
>>> Fixes: d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
>>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>>> ---
>>>  fs/btrfs/ctree.h  |  2 ++
>>>  fs/btrfs/qgroup.c | 94 +++++++++++++++++++++++++++++++++----------------------
>>>  2 files changed, 58 insertions(+), 38 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>> index da308774b8a4..4003498bb714 100644
>>> --- a/fs/btrfs/ctree.h
>>> +++ b/fs/btrfs/ctree.h
>>> @@ -1045,6 +1045,8 @@ struct btrfs_fs_info {
>>>  	struct btrfs_workqueue *qgroup_rescan_workers;
>>>  	struct completion qgroup_rescan_completion;
>>>  	struct btrfs_work qgroup_rescan_work;
>>> +	/* qgroup rescan worker is running or queued to run */
>>> +	bool qgroup_rescan_ready;
>>>  	bool qgroup_rescan_running;	/* protected by qgroup_rescan_lock */
>>>  
>>>  	/* filesystem state */
>>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>>> index aa259d6986e1..466744741873 100644
>>> --- a/fs/btrfs/qgroup.c
>>> +++ b/fs/btrfs/qgroup.c
>>> @@ -101,6 +101,7 @@ static int
>>>  qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>>>  		   int init_flags);
>>>  static void qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info);
>>> +static void btrfs_qgroup_rescan_worker(struct btrfs_work *work);
>>>  
>>>  /* must be called with qgroup_ioctl_lock held */
>>>  static struct btrfs_qgroup *find_qgroup_rb(struct btrfs_fs_info *fs_info,
>>> @@ -2072,6 +2073,46 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
>>>  	return ret;
>>>  }
>>>  
>>> +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
>>> +{
>>> +	mutex_lock(&fs_info->qgroup_rescan_lock);
>>> +	if (btrfs_fs_closing(fs_info)) {
>>> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
>>> +		return;
>>> +	}
>>> +
>>> +	if (WARN_ON(!fs_info->qgroup_rescan_ready)) {
>>> +		btrfs_warn(fs_info, "rescan worker not ready");
>>> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
>>> +		return;
>>> +	}
>>> +	fs_info->qgroup_rescan_ready = false;
>>> +
>>> +	if (WARN_ON(fs_info->qgroup_rescan_running)) {
>>> +		btrfs_warn(fs_info, "rescan worker already queued");
>>> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
>>> +		return;
>>> +	}
>>> +
>>> +	/*
>>> +	 * Being queued is enough for btrfs_qgroup_wait_for_completion
>>> +	 * to need to wait.
>>> +	 */
>>> +	fs_info->qgroup_rescan_running = true;
>>> +	init_completion(&fs_info->qgroup_rescan_completion);
>>> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
>>> +
>>> +	memset(&fs_info->qgroup_rescan_work, 0,
>>> +	       sizeof(fs_info->qgroup_rescan_work));
>>> +
>>> +	btrfs_init_work(&fs_info->qgroup_rescan_work,
>>> +			btrfs_qgroup_rescan_helper,
>>> +			btrfs_qgroup_rescan_worker, NULL, NULL);
>>> +
>>> +	btrfs_queue_work(fs_info->qgroup_rescan_workers,
>>> +			 &fs_info->qgroup_rescan_work);
>>> +}
>>> +
>>>  /*
>>>   * called from commit_transaction. Writes all changed qgroups to disk.
>>>   */
>>> @@ -2123,8 +2164,7 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
>>>  		ret = qgroup_rescan_init(fs_info, 0, 1);
>>>  		if (!ret) {
>>>  			qgroup_rescan_zero_tracking(fs_info);
>>> -			btrfs_queue_work(fs_info->qgroup_rescan_workers,
>>> -					 &fs_info->qgroup_rescan_work);
>>> +			queue_rescan_worker(fs_info);
>>>  		}
>>>  		ret = 0;
>>>  	}
>>> @@ -2607,6 +2647,10 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>>>  	if (!path)
>>>  		goto out;
>>>  
>>> +	mutex_lock(&fs_info->qgroup_rescan_lock);
>>> +	fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
>>> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
>>> +
>>>  	err = 0;
>>>  	while (!err && !btrfs_fs_closing(fs_info)) {
>>>  		trans = btrfs_start_transaction(fs_info->fs_root, 0);
>>> @@ -2685,47 +2729,27 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>>>  {
>>>  	int ret = 0;
>>>  
>>> -	if (!init_flags &&
>>> -	    (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) ||
>>> -	     !(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))) {
>>> +	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
>>>  		ret = -EINVAL;
>>>  		goto err;
>>>  	}
>>>  
>>>  	mutex_lock(&fs_info->qgroup_rescan_lock);
>>> -	spin_lock(&fs_info->qgroup_lock);
>>> -
>>> -	if (init_flags) {
>>> -		if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
>>> -			ret = -EINPROGRESS;
>>> -		else if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))
>>> -			ret = -EINVAL;
>>> -
>>> -		if (ret) {
>>> -			spin_unlock(&fs_info->qgroup_lock);
>>> -			mutex_unlock(&fs_info->qgroup_rescan_lock);
>>> -			goto err;
>>> -		}
>>> -		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
>>> +	if (fs_info->qgroup_rescan_ready || fs_info->qgroup_rescan_running) {
>>> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
>>> +		ret = -EINPROGRESS;
>>> +		goto err;
>>>  	}
>>>  
>>>  	memset(&fs_info->qgroup_rescan_progress, 0,
>>>  		sizeof(fs_info->qgroup_rescan_progress));
>>>  	fs_info->qgroup_rescan_progress.objectid = progress_objectid;
>>> -	init_completion(&fs_info->qgroup_rescan_completion);
>>> -	fs_info->qgroup_rescan_running = true;
>>> +	fs_info->qgroup_rescan_ready = true;
>>>  
>>> -	spin_unlock(&fs_info->qgroup_lock);
>>>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>>>  
>>> -	memset(&fs_info->qgroup_rescan_work, 0,
>>> -	       sizeof(fs_info->qgroup_rescan_work));
>>> -	btrfs_init_work(&fs_info->qgroup_rescan_work,
>>> -			btrfs_qgroup_rescan_helper,
>>> -			btrfs_qgroup_rescan_worker, NULL, NULL);
>>> -
>>> -	if (ret) {
>>>  err:
>>> +	if (ret) {
>>>  		btrfs_info(fs_info, "qgroup_rescan_init failed with %d", ret);
>>>  		return ret;
>>>  	}
>>> @@ -2785,9 +2809,7 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
>>>  
>>>  	qgroup_rescan_zero_tracking(fs_info);
>>>  
>>> -	btrfs_queue_work(fs_info->qgroup_rescan_workers,
>>> -			 &fs_info->qgroup_rescan_work);
>>> -
>>> +	queue_rescan_worker(fs_info);
>>>  	return 0;
>>>  }
>>>  
>>> @@ -2798,9 +2820,7 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>>>  	int ret = 0;
>>>  
>>>  	mutex_lock(&fs_info->qgroup_rescan_lock);
>>> -	spin_lock(&fs_info->qgroup_lock);
>>>  	running = fs_info->qgroup_rescan_running;
>>> -	spin_unlock(&fs_info->qgroup_lock);
>>>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>>>  
>>>  	if (!running)
>>> @@ -2819,12 +2839,10 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>>>   * this is only called from open_ctree where we're still single threaded, thus
>>>   * locking is omitted here.
>>>   */
>>> -void
>>> -btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
>>> +void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
>>>  {
>>>  	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
>>> -		btrfs_queue_work(fs_info->qgroup_rescan_workers,
>>> -				 &fs_info->qgroup_rescan_work);
>>> +		queue_rescan_worker(fs_info);
>>>  }
>>>  
>>>  /*
>>>
> 
> 


* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-05-03  7:24   ` Nikolay Borisov
@ 2018-05-03 13:39     ` Jeff Mahoney
  2018-05-03 15:52       ` Nikolay Borisov
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff Mahoney @ 2018-05-03 13:39 UTC (permalink / raw)
  To: Nikolay Borisov, dsterba, linux-btrfs

On 5/3/18 3:24 AM, Nikolay Borisov wrote:
> 
> 
> On  3.05.2018 00:11, jeffm@suse.com wrote:
>> From: Jeff Mahoney <jeffm@suse.com>
>>
>> Commit d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
>> fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
>> ended up reintroducing the hang-on-unmount bug that the commit it
>> intended to fix addressed.
>>
>> The race this time is between qgroup_rescan_init setting
>> ->qgroup_rescan_running = true and the worker starting.  There are
>> many scenarios where we initialize the worker and never start it.  The
>> completion btrfs_ioctl_quota_rescan_wait waits for will never come.
>> This can happen even without involving error handling, since mounting
>> the file system read-only returns between initializing the worker and
>> queueing it.
>>
>> The right place to do it is when we're queuing the worker.  The flag
>> really just means that btrfs_ioctl_quota_rescan_wait should wait for
>> a completion.
>>
>> Since the BTRFS_QGROUP_STATUS_FLAG_RESCAN flag is overloaded to
>> refer to both runtime behavior and on-disk state, we introduce a new
>> fs_info->qgroup_rescan_ready to indicate that we're initialized and
>> waiting to start.
> 
> Am I correct in my understanding that this qgroup_rescan_ready flag is
> used to avoid qgroup_rescan_init being called AFTER it has already been
> called but BEFORE queue_rescan_worker ? Why wasn't the initial version
> of this patch without this flag sufficient?

No, the race is between clearing the BTRFS_QGROUP_STATUS_FLAG_RESCAN
flag near the end of the worker and clearing the running flag.  The
rescan lock is dropped in between, so qgroup_rescan_init will let a new
rescan request in while we update the status item on disk.  We wouldn't
have queued another worker since that's what the warning catches, but if
there were already tasks waiting for completion, they wouldn't have been
woken since the wait queue list would be reinitialized.  There's no way
to reorder clearing the flag without changing how we handle
->qgroup_flags.  I plan on doing that separately.  This was just meant
to be the simple fix.

That we can use the ready variable to also ensure that we don't let
qgroup_rescan_init be called twice without running the rescan is a nice
bonus.

-Jeff

>>
>> This patch introduces a new helper, queue_rescan_worker, that handles
>> most of the initialization, the two flags, and queuing the worker,
>> including races with unmount.
>>
>> While we're at it, ->qgroup_rescan_running is protected only by the
>> ->qgroup_rescan_mutex.  btrfs_ioctl_quota_rescan_wait doesn't need
>> to take the spinlock too.
>>
>> Fixes: d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>> ---
>>  fs/btrfs/ctree.h  |  2 ++
>>  fs/btrfs/qgroup.c | 94 +++++++++++++++++++++++++++++++++----------------------
>>  2 files changed, 58 insertions(+), 38 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index da308774b8a4..4003498bb714 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -1045,6 +1045,8 @@ struct btrfs_fs_info {
>>  	struct btrfs_workqueue *qgroup_rescan_workers;
>>  	struct completion qgroup_rescan_completion;
>>  	struct btrfs_work qgroup_rescan_work;
>> +	/* qgroup rescan worker is running or queued to run */
>> +	bool qgroup_rescan_ready;
>>  	bool qgroup_rescan_running;	/* protected by qgroup_rescan_lock */
>>  
>>  	/* filesystem state */
>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>> index aa259d6986e1..466744741873 100644
>> --- a/fs/btrfs/qgroup.c
>> +++ b/fs/btrfs/qgroup.c
>> @@ -101,6 +101,7 @@ static int
>>  qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>>  		   int init_flags);
>>  static void qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info);
>> +static void btrfs_qgroup_rescan_worker(struct btrfs_work *work);
>>  
>>  /* must be called with qgroup_ioctl_lock held */
>>  static struct btrfs_qgroup *find_qgroup_rb(struct btrfs_fs_info *fs_info,
>> @@ -2072,6 +2073,46 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
>>  	return ret;
>>  }
>>  
>> +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
>> +{
>> +	mutex_lock(&fs_info->qgroup_rescan_lock);
>> +	if (btrfs_fs_closing(fs_info)) {
>> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
>> +		return;
>> +	}
>> +
>> +	if (WARN_ON(!fs_info->qgroup_rescan_ready)) {
>> +		btrfs_warn(fs_info, "rescan worker not ready");
>> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
>> +		return;
>> +	}
>> +	fs_info->qgroup_rescan_ready = false;
>> +
>> +	if (WARN_ON(fs_info->qgroup_rescan_running)) {
>> +		btrfs_warn(fs_info, "rescan worker already queued");
>> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
>> +		return;
>> +	}
>> +
>> +	/*
>> +	 * Being queued is enough for btrfs_qgroup_wait_for_completion
>> +	 * to need to wait.
>> +	 */
>> +	fs_info->qgroup_rescan_running = true;
>> +	init_completion(&fs_info->qgroup_rescan_completion);
>> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
>> +
>> +	memset(&fs_info->qgroup_rescan_work, 0,
>> +	       sizeof(fs_info->qgroup_rescan_work));
>> +
>> +	btrfs_init_work(&fs_info->qgroup_rescan_work,
>> +			btrfs_qgroup_rescan_helper,
>> +			btrfs_qgroup_rescan_worker, NULL, NULL);
>> +
>> +	btrfs_queue_work(fs_info->qgroup_rescan_workers,
>> +			 &fs_info->qgroup_rescan_work);
>> +}
>> +
>>  /*
>>   * called from commit_transaction. Writes all changed qgroups to disk.
>>   */
>> @@ -2123,8 +2164,7 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
>>  		ret = qgroup_rescan_init(fs_info, 0, 1);
>>  		if (!ret) {
>>  			qgroup_rescan_zero_tracking(fs_info);
>> -			btrfs_queue_work(fs_info->qgroup_rescan_workers,
>> -					 &fs_info->qgroup_rescan_work);
>> +			queue_rescan_worker(fs_info);
>>  		}
>>  		ret = 0;
>>  	}
>> @@ -2607,6 +2647,10 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>>  	if (!path)
>>  		goto out;
>>  
>> +	mutex_lock(&fs_info->qgroup_rescan_lock);
>> +	fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
>> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
>> +
>>  	err = 0;
>>  	while (!err && !btrfs_fs_closing(fs_info)) {
>>  		trans = btrfs_start_transaction(fs_info->fs_root, 0);
>> @@ -2685,47 +2729,27 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>>  {
>>  	int ret = 0;
>>  
>> -	if (!init_flags &&
>> -	    (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) ||
>> -	     !(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))) {
>> +	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
>>  		ret = -EINVAL;
>>  		goto err;
>>  	}
>>  
>>  	mutex_lock(&fs_info->qgroup_rescan_lock);
>> -	spin_lock(&fs_info->qgroup_lock);
>> -
>> -	if (init_flags) {
>> -		if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
>> -			ret = -EINPROGRESS;
>> -		else if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))
>> -			ret = -EINVAL;
>> -
>> -		if (ret) {
>> -			spin_unlock(&fs_info->qgroup_lock);
>> -			mutex_unlock(&fs_info->qgroup_rescan_lock);
>> -			goto err;
>> -		}
>> -		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
>> +	if (fs_info->qgroup_rescan_ready || fs_info->qgroup_rescan_running) {
>> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
>> +		ret = -EINPROGRESS;
>> +		goto err;
>>  	}
>>  
>>  	memset(&fs_info->qgroup_rescan_progress, 0,
>>  		sizeof(fs_info->qgroup_rescan_progress));
>>  	fs_info->qgroup_rescan_progress.objectid = progress_objectid;
>> -	init_completion(&fs_info->qgroup_rescan_completion);
>> -	fs_info->qgroup_rescan_running = true;
>> +	fs_info->qgroup_rescan_ready = true;
>>  
>> -	spin_unlock(&fs_info->qgroup_lock);
>>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>>  
>> -	memset(&fs_info->qgroup_rescan_work, 0,
>> -	       sizeof(fs_info->qgroup_rescan_work));
>> -	btrfs_init_work(&fs_info->qgroup_rescan_work,
>> -			btrfs_qgroup_rescan_helper,
>> -			btrfs_qgroup_rescan_worker, NULL, NULL);
>> -
>> -	if (ret) {
>>  err:
>> +	if (ret) {
>>  		btrfs_info(fs_info, "qgroup_rescan_init failed with %d", ret);
>>  		return ret;
>>  	}
>> @@ -2785,9 +2809,7 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
>>  
>>  	qgroup_rescan_zero_tracking(fs_info);
>>  
>> -	btrfs_queue_work(fs_info->qgroup_rescan_workers,
>> -			 &fs_info->qgroup_rescan_work);
>> -
>> +	queue_rescan_worker(fs_info);
>>  	return 0;
>>  }
>>  
>> @@ -2798,9 +2820,7 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>>  	int ret = 0;
>>  
>>  	mutex_lock(&fs_info->qgroup_rescan_lock);
>> -	spin_lock(&fs_info->qgroup_lock);
>>  	running = fs_info->qgroup_rescan_running;
>> -	spin_unlock(&fs_info->qgroup_lock);
>>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>>  
>>  	if (!running)
>> @@ -2819,12 +2839,10 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>>   * this is only called from open_ctree where we're still single threaded, thus
>>   * locking is omitted here.
>>   */
>> -void
>> -btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
>> +void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
>>  {
>>  	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
>> -		btrfs_queue_work(fs_info->qgroup_rescan_workers,
>> -				 &fs_info->qgroup_rescan_work);
>> +		queue_rescan_worker(fs_info);
>>  }
>>  
>>  /*
>>


-- 
Jeff Mahoney
SUSE Labs


* Re: [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-05-02 21:11 ` [PATCH 1/3] btrfs: qgroups, fix rescan worker running races jeffm
@ 2018-05-03  7:24   ` Nikolay Borisov
  2018-05-03 13:39     ` Jeff Mahoney
  2018-05-10 19:49   ` Jeff Mahoney
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 32+ messages in thread
From: Nikolay Borisov @ 2018-05-03  7:24 UTC (permalink / raw)
  To: jeffm, dsterba, linux-btrfs



On  3.05.2018 00:11, jeffm@suse.com wrote:
> From: Jeff Mahoney <jeffm@suse.com>
> 
> Commit d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
> fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
> ended up reintroducing the hang-on-unmount bug that the commit it
> intended to fix addressed.
> 
> The race this time is between qgroup_rescan_init setting
> ->qgroup_rescan_running = true and the worker starting.  There are
> many scenarios where we initialize the worker and never start it.  The
> completion btrfs_ioctl_quota_rescan_wait waits for will never come.
> This can happen even without involving error handling, since mounting
> the file system read-only returns between initializing the worker and
> queueing it.
> 
> The right place to do it is when we're queuing the worker.  The flag
> really just means that btrfs_ioctl_quota_rescan_wait should wait for
> a completion.
> 
> Since the BTRFS_QGROUP_STATUS_FLAG_RESCAN flag is overloaded to
> refer to both runtime behavior and on-disk state, we introduce a new
> fs_info->qgroup_rescan_ready to indicate that we're initialized and
> waiting to start.

Am I correct in my understanding that this qgroup_rescan_ready flag is
used to avoid qgroup_rescan_init being called AFTER it has already been
called but BEFORE queue_rescan_worker ? Why wasn't the initial version
of this patch without this flag sufficient?

> 
> This patch introduces a new helper, queue_rescan_worker, that handles
> most of the initialization, the two flags, and queuing the worker,
> including races with unmount.
> 
> While we're at it, ->qgroup_rescan_running is protected only by the
> ->qgroup_rescan_mutex.  btrfs_ioctl_quota_rescan_wait doesn't need
> to take the spinlock too.
> 
> Fixes: d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> ---
>  fs/btrfs/ctree.h  |  2 ++
>  fs/btrfs/qgroup.c | 94 +++++++++++++++++++++++++++++++++----------------------
>  2 files changed, 58 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index da308774b8a4..4003498bb714 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1045,6 +1045,8 @@ struct btrfs_fs_info {
>  	struct btrfs_workqueue *qgroup_rescan_workers;
>  	struct completion qgroup_rescan_completion;
>  	struct btrfs_work qgroup_rescan_work;
> +	/* qgroup rescan worker is running or queued to run */
> +	bool qgroup_rescan_ready;
>  	bool qgroup_rescan_running;	/* protected by qgroup_rescan_lock */
>  
>  	/* filesystem state */
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index aa259d6986e1..466744741873 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -101,6 +101,7 @@ static int
>  qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>  		   int init_flags);
>  static void qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info);
> +static void btrfs_qgroup_rescan_worker(struct btrfs_work *work);
>  
>  /* must be called with qgroup_ioctl_lock held */
>  static struct btrfs_qgroup *find_qgroup_rb(struct btrfs_fs_info *fs_info,
> @@ -2072,6 +2073,46 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
>  	return ret;
>  }
>  
> +static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
> +{
> +	mutex_lock(&fs_info->qgroup_rescan_lock);
> +	if (btrfs_fs_closing(fs_info)) {
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +
> +	if (WARN_ON(!fs_info->qgroup_rescan_ready)) {
> +		btrfs_warn(fs_info, "rescan worker not ready");
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +	fs_info->qgroup_rescan_ready = false;
> +
> +	if (WARN_ON(fs_info->qgroup_rescan_running)) {
> +		btrfs_warn(fs_info, "rescan worker already queued");
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		return;
> +	}
> +
> +	/*
> +	 * Being queued is enough for btrfs_qgroup_wait_for_completion
> +	 * to need to wait.
> +	 */
> +	fs_info->qgroup_rescan_running = true;
> +	init_completion(&fs_info->qgroup_rescan_completion);
> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
> +
> +	memset(&fs_info->qgroup_rescan_work, 0,
> +	       sizeof(fs_info->qgroup_rescan_work));
> +
> +	btrfs_init_work(&fs_info->qgroup_rescan_work,
> +			btrfs_qgroup_rescan_helper,
> +			btrfs_qgroup_rescan_worker, NULL, NULL);
> +
> +	btrfs_queue_work(fs_info->qgroup_rescan_workers,
> +			 &fs_info->qgroup_rescan_work);
> +}
> +
>  /*
>   * called from commit_transaction. Writes all changed qgroups to disk.
>   */
> @@ -2123,8 +2164,7 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
>  		ret = qgroup_rescan_init(fs_info, 0, 1);
>  		if (!ret) {
>  			qgroup_rescan_zero_tracking(fs_info);
> -			btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -					 &fs_info->qgroup_rescan_work);
> +			queue_rescan_worker(fs_info);
>  		}
>  		ret = 0;
>  	}
> @@ -2607,6 +2647,10 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>  	if (!path)
>  		goto out;
>  
> +	mutex_lock(&fs_info->qgroup_rescan_lock);
> +	fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
> +
>  	err = 0;
>  	while (!err && !btrfs_fs_closing(fs_info)) {
>  		trans = btrfs_start_transaction(fs_info->fs_root, 0);
> @@ -2685,47 +2729,27 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>  {
>  	int ret = 0;
>  
> -	if (!init_flags &&
> -	    (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) ||
> -	     !(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))) {
> +	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
>  		ret = -EINVAL;
>  		goto err;
>  	}
>  
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
> -	spin_lock(&fs_info->qgroup_lock);
> -
> -	if (init_flags) {
> -		if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
> -			ret = -EINPROGRESS;
> -		else if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))
> -			ret = -EINVAL;
> -
> -		if (ret) {
> -			spin_unlock(&fs_info->qgroup_lock);
> -			mutex_unlock(&fs_info->qgroup_rescan_lock);
> -			goto err;
> -		}
> -		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
> +	if (fs_info->qgroup_rescan_ready || fs_info->qgroup_rescan_running) {
> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
> +		ret = -EINPROGRESS;
> +		goto err;
>  	}
>  
>  	memset(&fs_info->qgroup_rescan_progress, 0,
>  		sizeof(fs_info->qgroup_rescan_progress));
>  	fs_info->qgroup_rescan_progress.objectid = progress_objectid;
> -	init_completion(&fs_info->qgroup_rescan_completion);
> -	fs_info->qgroup_rescan_running = true;
> +	fs_info->qgroup_rescan_ready = true;
>  
> -	spin_unlock(&fs_info->qgroup_lock);
>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>  
> -	memset(&fs_info->qgroup_rescan_work, 0,
> -	       sizeof(fs_info->qgroup_rescan_work));
> -	btrfs_init_work(&fs_info->qgroup_rescan_work,
> -			btrfs_qgroup_rescan_helper,
> -			btrfs_qgroup_rescan_worker, NULL, NULL);
> -
> -	if (ret) {
>  err:
> +	if (ret) {
>  		btrfs_info(fs_info, "qgroup_rescan_init failed with %d", ret);
>  		return ret;
>  	}
> @@ -2785,9 +2809,7 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
>  
>  	qgroup_rescan_zero_tracking(fs_info);
>  
> -	btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -			 &fs_info->qgroup_rescan_work);
> -
> +	queue_rescan_worker(fs_info);
>  	return 0;
>  }
>  
> @@ -2798,9 +2820,7 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>  	int ret = 0;
>  
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
> -	spin_lock(&fs_info->qgroup_lock);
>  	running = fs_info->qgroup_rescan_running;
> -	spin_unlock(&fs_info->qgroup_lock);
>  	mutex_unlock(&fs_info->qgroup_rescan_lock);
>  
>  	if (!running)
> @@ -2819,12 +2839,10 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
>   * this is only called from open_ctree where we're still single threaded, thus
>   * locking is omitted here.
>   */
> -void
> -btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
> +void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
>  {
>  	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
> -		btrfs_queue_work(fs_info->qgroup_rescan_workers,
> -				 &fs_info->qgroup_rescan_work);
> +		queue_rescan_worker(fs_info);
>  }
>  
>  /*
> 


* [PATCH 1/3] btrfs: qgroups, fix rescan worker running races
  2018-05-02 21:11 [PATCH v3 0/3] btrfs: qgroup rescan races (part 1) jeffm
@ 2018-05-02 21:11 ` jeffm
  2018-05-03  7:24   ` Nikolay Borisov
                     ` (3 more replies)
  0 siblings, 4 replies; 32+ messages in thread
From: jeffm @ 2018-05-02 21:11 UTC (permalink / raw)
  To: dsterba, linux-btrfs; +Cc: Jeff Mahoney

From: Jeff Mahoney <jeffm@suse.com>

Commit d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
fixed the issue with BTRFS_IOC_QUOTA_RESCAN_WAIT being racy, but
ended up reintroducing the hang-on-unmount bug that the commit it
intended to fix addressed.

The race this time is between qgroup_rescan_init setting
->qgroup_rescan_running = true and the worker starting.  There are
many scenarios where we initialize the worker and never start it.  The
completion btrfs_ioctl_quota_rescan_wait waits for will never come.
This can happen even without involving error handling, since mounting
the file system read-only returns between initializing the worker and
queueing it.

The right place to do it is when we're queuing the worker.  The flag
really just means that btrfs_ioctl_quota_rescan_wait should wait for
a completion.

Since the BTRFS_QGROUP_STATUS_FLAG_RESCAN flag is overloaded to
refer to both runtime behavior and on-disk state, we introduce a new
fs_info->qgroup_rescan_ready to indicate that we're initialized and
waiting to start.

This patch introduces a new helper, queue_rescan_worker, that handles
most of the initialization, the two flags, and queuing the worker,
including races with unmount.

While we're at it, ->qgroup_rescan_running is protected only by the
->qgroup_rescan_mutex.  btrfs_ioctl_quota_rescan_wait doesn't need
to take the spinlock too.

Fixes: d2c609b834d6 (Btrfs: fix qgroup rescan worker initialization)
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
---
 fs/btrfs/ctree.h  |  2 ++
 fs/btrfs/qgroup.c | 94 +++++++++++++++++++++++++++++++++----------------------
 2 files changed, 58 insertions(+), 38 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index da308774b8a4..4003498bb714 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1045,6 +1045,8 @@ struct btrfs_fs_info {
 	struct btrfs_workqueue *qgroup_rescan_workers;
 	struct completion qgroup_rescan_completion;
 	struct btrfs_work qgroup_rescan_work;
+	/* qgroup rescan worker is running or queued to run */
+	bool qgroup_rescan_ready;
 	bool qgroup_rescan_running;	/* protected by qgroup_rescan_lock */
 
 	/* filesystem state */
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index aa259d6986e1..466744741873 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -101,6 +101,7 @@ static int
 qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
 		   int init_flags);
 static void qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info);
+static void btrfs_qgroup_rescan_worker(struct btrfs_work *work);
 
 /* must be called with qgroup_ioctl_lock held */
 static struct btrfs_qgroup *find_qgroup_rb(struct btrfs_fs_info *fs_info,
@@ -2072,6 +2073,46 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
+static void queue_rescan_worker(struct btrfs_fs_info *fs_info)
+{
+	mutex_lock(&fs_info->qgroup_rescan_lock);
+	if (btrfs_fs_closing(fs_info)) {
+		mutex_unlock(&fs_info->qgroup_rescan_lock);
+		return;
+	}
+
+	if (WARN_ON(!fs_info->qgroup_rescan_ready)) {
+		btrfs_warn(fs_info, "rescan worker not ready");
+		mutex_unlock(&fs_info->qgroup_rescan_lock);
+		return;
+	}
+	fs_info->qgroup_rescan_ready = false;
+
+	if (WARN_ON(fs_info->qgroup_rescan_running)) {
+		btrfs_warn(fs_info, "rescan worker already queued");
+		mutex_unlock(&fs_info->qgroup_rescan_lock);
+		return;
+	}
+
+	/*
+	 * Being queued is enough for btrfs_qgroup_wait_for_completion
+	 * to need to wait.
+	 */
+	fs_info->qgroup_rescan_running = true;
+	init_completion(&fs_info->qgroup_rescan_completion);
+	mutex_unlock(&fs_info->qgroup_rescan_lock);
+
+	memset(&fs_info->qgroup_rescan_work, 0,
+	       sizeof(fs_info->qgroup_rescan_work));
+
+	btrfs_init_work(&fs_info->qgroup_rescan_work,
+			btrfs_qgroup_rescan_helper,
+			btrfs_qgroup_rescan_worker, NULL, NULL);
+
+	btrfs_queue_work(fs_info->qgroup_rescan_workers,
+			 &fs_info->qgroup_rescan_work);
+}
+
 /*
  * called from commit_transaction. Writes all changed qgroups to disk.
  */
@@ -2123,8 +2164,7 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 		ret = qgroup_rescan_init(fs_info, 0, 1);
 		if (!ret) {
 			qgroup_rescan_zero_tracking(fs_info);
-			btrfs_queue_work(fs_info->qgroup_rescan_workers,
-					 &fs_info->qgroup_rescan_work);
+			queue_rescan_worker(fs_info);
 		}
 		ret = 0;
 	}
@@ -2607,6 +2647,10 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
 	if (!path)
 		goto out;
 
+	mutex_lock(&fs_info->qgroup_rescan_lock);
+	fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
+	mutex_unlock(&fs_info->qgroup_rescan_lock);
+
 	err = 0;
 	while (!err && !btrfs_fs_closing(fs_info)) {
 		trans = btrfs_start_transaction(fs_info->fs_root, 0);
@@ -2685,47 +2729,27 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
 {
 	int ret = 0;
 
-	if (!init_flags &&
-	    (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) ||
-	     !(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))) {
+	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
 		ret = -EINVAL;
 		goto err;
 	}
 
 	mutex_lock(&fs_info->qgroup_rescan_lock);
-	spin_lock(&fs_info->qgroup_lock);
-
-	if (init_flags) {
-		if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
-			ret = -EINPROGRESS;
-		else if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))
-			ret = -EINVAL;
-
-		if (ret) {
-			spin_unlock(&fs_info->qgroup_lock);
-			mutex_unlock(&fs_info->qgroup_rescan_lock);
-			goto err;
-		}
-		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
+	if (fs_info->qgroup_rescan_ready || fs_info->qgroup_rescan_running) {
+		mutex_unlock(&fs_info->qgroup_rescan_lock);
+		ret = -EINPROGRESS;
+		goto err;
 	}
 
 	memset(&fs_info->qgroup_rescan_progress, 0,
 		sizeof(fs_info->qgroup_rescan_progress));
 	fs_info->qgroup_rescan_progress.objectid = progress_objectid;
-	init_completion(&fs_info->qgroup_rescan_completion);
-	fs_info->qgroup_rescan_running = true;
+	fs_info->qgroup_rescan_ready = true;
 
-	spin_unlock(&fs_info->qgroup_lock);
 	mutex_unlock(&fs_info->qgroup_rescan_lock);
 
-	memset(&fs_info->qgroup_rescan_work, 0,
-	       sizeof(fs_info->qgroup_rescan_work));
-	btrfs_init_work(&fs_info->qgroup_rescan_work,
-			btrfs_qgroup_rescan_helper,
-			btrfs_qgroup_rescan_worker, NULL, NULL);
-
-	if (ret) {
 err:
+	if (ret) {
 		btrfs_info(fs_info, "qgroup_rescan_init failed with %d", ret);
 		return ret;
 	}
@@ -2785,9 +2809,7 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
 
 	qgroup_rescan_zero_tracking(fs_info);
 
-	btrfs_queue_work(fs_info->qgroup_rescan_workers,
-			 &fs_info->qgroup_rescan_work);
-
+	queue_rescan_worker(fs_info);
 	return 0;
 }
 
@@ -2798,9 +2820,7 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
 	int ret = 0;
 
 	mutex_lock(&fs_info->qgroup_rescan_lock);
-	spin_lock(&fs_info->qgroup_lock);
 	running = fs_info->qgroup_rescan_running;
-	spin_unlock(&fs_info->qgroup_lock);
 	mutex_unlock(&fs_info->qgroup_rescan_lock);
 
 	if (!running)
@@ -2819,12 +2839,10 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
  * this is only called from open_ctree where we're still single threaded, thus
  * locking is omitted here.
  */
-void
-btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
+void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
 {
 	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
-		btrfs_queue_work(fs_info->qgroup_rescan_workers,
-				 &fs_info->qgroup_rescan_work);
+		queue_rescan_worker(fs_info);
 }
 
 /*
-- 
2.12.3


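Not part of the patch: the queueing discipline it introduces, where the "running" flag is flipped in the same critical section that commits to queueing the worker, so a waiter can never observe the flag set without a worker that will eventually signal the completion, can be sketched in userspace. All names below are hypothetical; a pthread condition variable stands in for the kernel completion, and pthread_create stands in for btrfs_queue_work.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the qgroup rescan state in btrfs_fs_info. */
struct rescan_ctl {
	pthread_mutex_t lock;		/* plays the role of qgroup_rescan_lock */
	bool running;			/* plays the role of qgroup_rescan_running */
	bool done;			/* completion has been signaled */
	pthread_cond_t completion;	/* stands in for qgroup_rescan_completion */
};

static struct rescan_ctl ctl = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.completion = PTHREAD_COND_INITIALIZER,
};

static void *rescan_worker(void *arg)
{
	(void)arg;
	/* ... the actual scan would happen here ... */
	pthread_mutex_lock(&ctl.lock);
	ctl.running = false;
	ctl.done = true;
	pthread_cond_broadcast(&ctl.completion);
	pthread_mutex_unlock(&ctl.lock);
	return NULL;
}

/*
 * The flag is set only once we have decided to queue, in the same
 * critical section, so "running" always implies a pending completion.
 */
static int queue_rescan_worker(pthread_t *thr)
{
	int ret;

	pthread_mutex_lock(&ctl.lock);
	if (ctl.running) {
		pthread_mutex_unlock(&ctl.lock);
		return -1;	/* already queued */
	}
	ctl.running = true;
	ctl.done = false;
	pthread_mutex_unlock(&ctl.lock);

	ret = pthread_create(thr, NULL, rescan_worker, NULL);
	if (ret) {
		/* queueing failed: undo the flag so waiters don't hang */
		pthread_mutex_lock(&ctl.lock);
		ctl.running = false;
		pthread_mutex_unlock(&ctl.lock);
	}
	return ret;
}

/* Blocks only if a worker was actually queued, mirroring the wait ioctl. */
static void wait_for_completion(void)
{
	pthread_mutex_lock(&ctl.lock);
	while (ctl.running)
		pthread_cond_wait(&ctl.completion, &ctl.lock);
	pthread_mutex_unlock(&ctl.lock);
}
```

In the pre-patch code the equivalent of "running = true" happened at init time, before anything was queued; a mount path that initialized but never queued left waiters blocked forever, which is the hang this sketch's pairing avoids.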

2018-04-26 19:23 [PATCH 1/3] btrfs: qgroups, fix rescan worker running races jeffm
2018-04-26 19:23 ` [PATCH 2/3] btrfs: qgroups, remove unnecessary memset before btrfs_init_work jeffm
2018-04-26 20:37   ` Nikolay Borisov
2018-04-26 19:23 ` [PATCH 3/3] btrfs: qgroup, don't try to insert status item after ENOMEM in rescan worker jeffm
2018-04-26 20:39   ` Nikolay Borisov
2018-04-27 15:44     ` David Sterba
2018-04-27 16:08       ` Jeff Mahoney
2018-04-27 16:11         ` [PATCH v2] " Jeff Mahoney
2018-04-27 16:34           ` David Sterba
2018-04-27  8:42 ` [PATCH 1/3] btrfs: qgroups, fix rescan worker running races Nikolay Borisov
2018-04-27  8:48 ` Filipe Manana
2018-04-27 16:00   ` Jeff Mahoney
2018-04-27 15:56 ` David Sterba
2018-04-27 16:02   ` Jeff Mahoney
2018-04-27 16:40     ` David Sterba
2018-04-27 19:32       ` Jeff Mahoney
2018-04-28 17:09         ` David Sterba
2018-04-27 19:28   ` Noah Massey
2018-04-28 17:10     ` David Sterba
2018-04-30  6:20 ` Qu Wenruo
2018-04-30 14:07   ` Jeff Mahoney
2018-05-02 10:29 ` David Sterba
2018-05-02 13:15   ` David Sterba
2018-05-02 13:58     ` Jeff Mahoney
2018-05-02 21:11 [PATCH v3 0/3] btrfs: qgroup rescan races (part 1) jeffm
2018-05-02 21:11 ` [PATCH 1/3] btrfs: qgroups, fix rescan worker running races jeffm
2018-05-03  7:24   ` Nikolay Borisov
2018-05-03 13:39     ` Jeff Mahoney
2018-05-03 15:52       ` Nikolay Borisov
2018-05-03 15:57         ` Jeff Mahoney
2018-05-10 19:49   ` Jeff Mahoney
2018-05-10 23:04   ` Jeff Mahoney
2020-01-16  6:41   ` Qu Wenruo