Linux-BTRFS Archive on lore.kernel.org
 help / Atom feed
* [PATCH 0/2] Scrub allocations vs reclaim fix
@ 2018-12-04 15:11 David Sterba
  2018-12-04 15:11 ` [PATCH 1/2] btrfs: scrub: pass fs_info to scrub_setup_ctx David Sterba
  2018-12-04 15:11 ` [PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex David Sterba
  0 siblings, 2 replies; 6+ messages in thread
From: David Sterba @ 2018-12-04 15:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: fdmanana, David Sterba

I've instrumented the scrub alloctions to check for the GFP_NOFS context
and whether the device_list_mutex is not held. This caught one
allocation, and I've used the instrumented code to verify that Filipe's
scrub/reclaim fix is correct.

There are some debugging helpers need that live in the generic code so
I'm posting only results. The patches are in branch dev/locks-not-held
in my devel repos.

David Sterba (2):
  btrfs: scrub: pass fs_info to scrub_setup_ctx
  btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex

 fs/btrfs/scrub.c | 37 +++++++++++++++++++++----------------
 1 file changed, 21 insertions(+), 16 deletions(-)

-- 
2.19.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] btrfs: scrub: pass fs_info to scrub_setup_ctx
  2018-12-04 15:11 [PATCH 0/2] Scrub allocations vs reclaim fix David Sterba
@ 2018-12-04 15:11 ` David Sterba
  2018-12-04 15:15   ` Nikolay Borisov
  2018-12-04 15:11 ` [PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex David Sterba
  1 sibling, 1 reply; 6+ messages in thread
From: David Sterba @ 2018-12-04 15:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: fdmanana, David Sterba

We can pass fs_info directly as this is the only member of btrfs_device
that's bing used inside scrub_setup_ctx.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/scrub.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index bbd1b36f4918..ffcab263e057 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -578,12 +578,11 @@ static void scrub_put_ctx(struct scrub_ctx *sctx)
 		scrub_free_ctx(sctx);
 }
 
-static noinline_for_stack
-struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace)
+static noinline_for_stack struct scrub_ctx *scrub_setup_ctx(
+		struct btrfs_fs_info *fs_info, int is_dev_replace)
 {
 	struct scrub_ctx *sctx;
 	int		i;
-	struct btrfs_fs_info *fs_info = dev->fs_info;
 
 	sctx = kzalloc(sizeof(*sctx), GFP_KERNEL);
 	if (!sctx)
@@ -592,7 +591,7 @@ struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace)
 	sctx->is_dev_replace = is_dev_replace;
 	sctx->pages_per_rd_bio = SCRUB_PAGES_PER_RD_BIO;
 	sctx->curr = -1;
-	sctx->fs_info = dev->fs_info;
+	sctx->fs_info = fs_info;
 	for (i = 0; i < SCRUB_BIOS_PER_SCTX; ++i) {
 		struct scrub_bio *sbio;
 
@@ -3878,7 +3877,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 		return ret;
 	}
 
-	sctx = scrub_setup_ctx(dev, is_dev_replace);
+	sctx = scrub_setup_ctx(fs_info, is_dev_replace);
 	if (IS_ERR(sctx)) {
 		mutex_unlock(&fs_info->scrub_lock);
 		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
-- 
2.19.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex
  2018-12-04 15:11 [PATCH 0/2] Scrub allocations vs reclaim fix David Sterba
  2018-12-04 15:11 ` [PATCH 1/2] btrfs: scrub: pass fs_info to scrub_setup_ctx David Sterba
@ 2018-12-04 15:11 ` David Sterba
  2018-12-04 15:22   ` Nikolay Borisov
  1 sibling, 1 reply; 6+ messages in thread
From: David Sterba @ 2018-12-04 15:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: fdmanana, David Sterba

The scrub context is allocated with GFP_KERNEL and called from
btrfs_scrub_dev under the fs_info::device_list_mutex. This is not safe
regarding reclaim that could try to flush filesystem data in order to
get the memory. And the device_list_mutex is held during superblock
commit, so this would cause a lockup.

Move the alocation and initialization before any changes that require
the mutex.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/scrub.c | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ffcab263e057..051d14c9f013 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3834,13 +3834,18 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 		return -EINVAL;
 	}
 
+	/* Allocate outside of device_list_mutex */
+	sctx = scrub_setup_ctx(fs_info, is_dev_replace);
+	if (IS_ERR(sctx))
+		return PTR_ERR(sctx);
 
 	mutex_lock(&fs_info->fs_devices->device_list_mutex);
 	dev = btrfs_find_device(fs_info, devid, NULL, NULL);
 	if (!dev || (test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state) &&
 		     !is_dev_replace)) {
 		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
-		return -ENODEV;
+		ret = -ENODEV;
+		goto out_free_ctx;
 	}
 
 	if (!is_dev_replace && !readonly &&
@@ -3848,7 +3853,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
 		btrfs_err_in_rcu(fs_info, "scrub: device %s is not writable",
 				rcu_str_deref(dev->name));
-		return -EROFS;
+		ret = -EROFS;
+		goto out_free_ctx;
 	}
 
 	mutex_lock(&fs_info->scrub_lock);
@@ -3856,7 +3862,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 	    test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &dev->dev_state)) {
 		mutex_unlock(&fs_info->scrub_lock);
 		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
-		return -EIO;
+		ret = -EIO;
+		goto out_free_ctx;
 	}
 
 	btrfs_dev_replace_read_lock(&fs_info->dev_replace);
@@ -3866,7 +3873,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 		btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
 		mutex_unlock(&fs_info->scrub_lock);
 		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
-		return -EINPROGRESS;
+		ret = -EINPROGRESS;
+		goto out_free_ctx;
 	}
 	btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
 
@@ -3874,16 +3882,9 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 	if (ret) {
 		mutex_unlock(&fs_info->scrub_lock);
 		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
-		return ret;
+		goto out_free_ctx;
 	}
 
-	sctx = scrub_setup_ctx(fs_info, is_dev_replace);
-	if (IS_ERR(sctx)) {
-		mutex_unlock(&fs_info->scrub_lock);
-		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
-		scrub_workers_put(fs_info);
-		return PTR_ERR(sctx);
-	}
 	sctx->readonly = readonly;
 	dev->scrub_ctx = sctx;
 	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
@@ -3936,6 +3937,11 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 
 	scrub_put_ctx(sctx);
 
+	return ret;
+
+out_free_ctx:
+	scrub_free_ctx(sctx);
+
 	return ret;
 }
 
-- 
2.19.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] btrfs: scrub: pass fs_info to scrub_setup_ctx
  2018-12-04 15:11 ` [PATCH 1/2] btrfs: scrub: pass fs_info to scrub_setup_ctx David Sterba
@ 2018-12-04 15:15   ` Nikolay Borisov
  0 siblings, 0 replies; 6+ messages in thread
From: Nikolay Borisov @ 2018-12-04 15:15 UTC (permalink / raw)
  To: David Sterba, linux-btrfs; +Cc: fdmanana



On 4.12.18 г. 17:11 ч., David Sterba wrote:
> We can pass fs_info directly as this is the only member of btrfs_device
> that's bing used inside scrub_setup_ctx.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/scrub.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index bbd1b36f4918..ffcab263e057 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -578,12 +578,11 @@ static void scrub_put_ctx(struct scrub_ctx *sctx)
>  		scrub_free_ctx(sctx);
>  }
>  
> -static noinline_for_stack
> -struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace)
> +static noinline_for_stack struct scrub_ctx *scrub_setup_ctx(
> +		struct btrfs_fs_info *fs_info, int is_dev_replace)
>  {
>  	struct scrub_ctx *sctx;
>  	int		i;
> -	struct btrfs_fs_info *fs_info = dev->fs_info;
>  
>  	sctx = kzalloc(sizeof(*sctx), GFP_KERNEL);
>  	if (!sctx)
> @@ -592,7 +591,7 @@ struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace)
>  	sctx->is_dev_replace = is_dev_replace;
>  	sctx->pages_per_rd_bio = SCRUB_PAGES_PER_RD_BIO;
>  	sctx->curr = -1;
> -	sctx->fs_info = dev->fs_info;
> +	sctx->fs_info = fs_info;
>  	for (i = 0; i < SCRUB_BIOS_PER_SCTX; ++i) {
>  		struct scrub_bio *sbio;
>  
> @@ -3878,7 +3877,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
>  		return ret;
>  	}
>  
> -	sctx = scrub_setup_ctx(dev, is_dev_replace);
> +	sctx = scrub_setup_ctx(fs_info, is_dev_replace);
>  	if (IS_ERR(sctx)) {
>  		mutex_unlock(&fs_info->scrub_lock);
>  		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex
  2018-12-04 15:11 ` [PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex David Sterba
@ 2018-12-04 15:22   ` Nikolay Borisov
  2018-12-05  0:17     ` David Sterba
  0 siblings, 1 reply; 6+ messages in thread
From: Nikolay Borisov @ 2018-12-04 15:22 UTC (permalink / raw)
  To: David Sterba, linux-btrfs; +Cc: fdmanana



On 4.12.18 г. 17:11 ч., David Sterba wrote:
> The scrub context is allocated with GFP_KERNEL and called from
> btrfs_scrub_dev under the fs_info::device_list_mutex. This is not safe
> regarding reclaim that could try to flush filesystem data in order to
> get the memory. And the device_list_mutex is held during superblock
> commit, so this would cause a lockup.
> 
> Move the alocation and initialization before any changes that require
> the mutex.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  fs/btrfs/scrub.c | 30 ++++++++++++++++++------------
>  1 file changed, 18 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index ffcab263e057..051d14c9f013 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -3834,13 +3834,18 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
>  		return -EINVAL;
>  	}
>  
> +	/* Allocate outside of device_list_mutex */
> +	sctx = scrub_setup_ctx(fs_info, is_dev_replace);
> +	if (IS_ERR(sctx))
> +		return PTR_ERR(sctx);
>  
>  	mutex_lock(&fs_info->fs_devices->device_list_mutex);
>  	dev = btrfs_find_device(fs_info, devid, NULL, NULL);
>  	if (!dev || (test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state) &&
>  		     !is_dev_replace)) {
>  		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> -		return -ENODEV;
> +		ret = -ENODEV;
> +		goto out_free_ctx;
>  	}
>  
>  	if (!is_dev_replace && !readonly &&
> @@ -3848,7 +3853,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
>  		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>  		btrfs_err_in_rcu(fs_info, "scrub: device %s is not writable",
>  				rcu_str_deref(dev->name));
> -		return -EROFS;
> +		ret = -EROFS;
> +		goto out_free_ctx;
>  	}
>  
>  	mutex_lock(&fs_info->scrub_lock);
> @@ -3856,7 +3862,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
>  	    test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &dev->dev_state)) {
>  		mutex_unlock(&fs_info->scrub_lock);
>  		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> -		return -EIO;
> +		ret = -EIO;
> +		goto out_free_ctx;
>  	}
>  
>  	btrfs_dev_replace_read_lock(&fs_info->dev_replace);
> @@ -3866,7 +3873,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
>  		btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
>  		mutex_unlock(&fs_info->scrub_lock);
>  		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> -		return -EINPROGRESS;
> +		ret = -EINPROGRESS;
> +		goto out_free_ctx;
>  	}
>  	btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
>  
> @@ -3874,16 +3882,9 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
>  	if (ret) {
>  		mutex_unlock(&fs_info->scrub_lock);
>  		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> -		return ret;
> +		goto out_free_ctx;

Don't we suffer the same issue when calling scrub_workers_get since in
it we do btrfs_alloc_workqueue which also calls kzalloc with GFP_KERNEL?


>  	}
>  
> -	sctx = scrub_setup_ctx(fs_info, is_dev_replace);
> -	if (IS_ERR(sctx)) {
> -		mutex_unlock(&fs_info->scrub_lock);
> -		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> -		scrub_workers_put(fs_info);
> -		return PTR_ERR(sctx);
> -	}
>  	sctx->readonly = readonly;
>  	dev->scrub_ctx = sctx;
>  	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> @@ -3936,6 +3937,11 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
>  
>  	scrub_put_ctx(sctx);
>  
> +	return ret;
> +
> +out_free_ctx:
> +	scrub_free_ctx(sctx);
> +
>  	return ret;
>  }
>  
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex
  2018-12-04 15:22   ` Nikolay Borisov
@ 2018-12-05  0:17     ` David Sterba
  0 siblings, 0 replies; 6+ messages in thread
From: David Sterba @ 2018-12-05  0:17 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: David Sterba, linux-btrfs, fdmanana

On Tue, Dec 04, 2018 at 05:22:19PM +0200, Nikolay Borisov wrote:
> > @@ -3874,16 +3882,9 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
> >  	if (ret) {
> >  		mutex_unlock(&fs_info->scrub_lock);
> >  		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> > -		return ret;
> > +		goto out_free_ctx;
> 
> Don't we suffer the same issue when calling scrub_workers_get since in
> it we do btrfs_alloc_workqueue which also calls kzalloc with GFP_KERNEL?

Yes, that's right. I instrumented only the allocations in scrub.c to see
if the nofs and lock_not_held assertions work at all so this one did not
get caught directly.

As scrub_workers_get still needs the scrub_lock, fixing it by moving
does not work and would need more restructuring.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, back to index

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-04 15:11 [PATCH 0/2] Scrub allocations vs reclaim fix David Sterba
2018-12-04 15:11 ` [PATCH 1/2] btrfs: scrub: pass fs_info to scrub_setup_ctx David Sterba
2018-12-04 15:15   ` Nikolay Borisov
2018-12-04 15:11 ` [PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex David Sterba
2018-12-04 15:22   ` Nikolay Borisov
2018-12-05  0:17     ` David Sterba

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable: git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org linux-btrfs@archiver.kernel.org
	public-inbox-index linux-btrfs


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/ public-inbox