* [PATCH 0/2] Scrub allocations vs reclaim fix
@ 2018-12-04 15:11 David Sterba
2018-12-04 15:11 ` [PATCH 1/2] btrfs: scrub: pass fs_info to scrub_setup_ctx David Sterba
2018-12-04 15:11 ` [PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex David Sterba
0 siblings, 2 replies; 6+ messages in thread
From: David Sterba @ 2018-12-04 15:11 UTC (permalink / raw)
To: linux-btrfs; +Cc: fdmanana, David Sterba
I've instrumented the scrub alloctions to check for the GFP_NOFS context
and whether the device_list_mutex is not held. This caught one
allocation, and I've used the instrumented code to verify that Filipe's
scrub/reclaim fix is correct.
There are some debugging helpers need that live in the generic code so
I'm posting only results. The patches are in branch dev/locks-not-held
in my devel repos.
David Sterba (2):
btrfs: scrub: pass fs_info to scrub_setup_ctx
btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex
fs/btrfs/scrub.c | 37 +++++++++++++++++++++----------------
1 file changed, 21 insertions(+), 16 deletions(-)
--
2.19.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/2] btrfs: scrub: pass fs_info to scrub_setup_ctx
2018-12-04 15:11 [PATCH 0/2] Scrub allocations vs reclaim fix David Sterba
@ 2018-12-04 15:11 ` David Sterba
2018-12-04 15:15 ` Nikolay Borisov
2018-12-04 15:11 ` [PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex David Sterba
1 sibling, 1 reply; 6+ messages in thread
From: David Sterba @ 2018-12-04 15:11 UTC (permalink / raw)
To: linux-btrfs; +Cc: fdmanana, David Sterba
We can pass fs_info directly as this is the only member of btrfs_device
that's bing used inside scrub_setup_ctx.
Signed-off-by: David Sterba <dsterba@suse.com>
---
fs/btrfs/scrub.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index bbd1b36f4918..ffcab263e057 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -578,12 +578,11 @@ static void scrub_put_ctx(struct scrub_ctx *sctx)
scrub_free_ctx(sctx);
}
-static noinline_for_stack
-struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace)
+static noinline_for_stack struct scrub_ctx *scrub_setup_ctx(
+ struct btrfs_fs_info *fs_info, int is_dev_replace)
{
struct scrub_ctx *sctx;
int i;
- struct btrfs_fs_info *fs_info = dev->fs_info;
sctx = kzalloc(sizeof(*sctx), GFP_KERNEL);
if (!sctx)
@@ -592,7 +591,7 @@ struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace)
sctx->is_dev_replace = is_dev_replace;
sctx->pages_per_rd_bio = SCRUB_PAGES_PER_RD_BIO;
sctx->curr = -1;
- sctx->fs_info = dev->fs_info;
+ sctx->fs_info = fs_info;
for (i = 0; i < SCRUB_BIOS_PER_SCTX; ++i) {
struct scrub_bio *sbio;
@@ -3878,7 +3877,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
return ret;
}
- sctx = scrub_setup_ctx(dev, is_dev_replace);
+ sctx = scrub_setup_ctx(fs_info, is_dev_replace);
if (IS_ERR(sctx)) {
mutex_unlock(&fs_info->scrub_lock);
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
--
2.19.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex
2018-12-04 15:11 [PATCH 0/2] Scrub allocations vs reclaim fix David Sterba
2018-12-04 15:11 ` [PATCH 1/2] btrfs: scrub: pass fs_info to scrub_setup_ctx David Sterba
@ 2018-12-04 15:11 ` David Sterba
2018-12-04 15:22 ` Nikolay Borisov
1 sibling, 1 reply; 6+ messages in thread
From: David Sterba @ 2018-12-04 15:11 UTC (permalink / raw)
To: linux-btrfs; +Cc: fdmanana, David Sterba
The scrub context is allocated with GFP_KERNEL and called from
btrfs_scrub_dev under the fs_info::device_list_mutex. This is not safe
regarding reclaim that could try to flush filesystem data in order to
get the memory. And the device_list_mutex is held during superblock
commit, so this would cause a lockup.
Move the alocation and initialization before any changes that require
the mutex.
Signed-off-by: David Sterba <dsterba@suse.com>
---
fs/btrfs/scrub.c | 30 ++++++++++++++++++------------
1 file changed, 18 insertions(+), 12 deletions(-)
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ffcab263e057..051d14c9f013 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3834,13 +3834,18 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
return -EINVAL;
}
+ /* Allocate outside of device_list_mutex */
+ sctx = scrub_setup_ctx(fs_info, is_dev_replace);
+ if (IS_ERR(sctx))
+ return PTR_ERR(sctx);
mutex_lock(&fs_info->fs_devices->device_list_mutex);
dev = btrfs_find_device(fs_info, devid, NULL, NULL);
if (!dev || (test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state) &&
!is_dev_replace)) {
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
- return -ENODEV;
+ ret = -ENODEV;
+ goto out_free_ctx;
}
if (!is_dev_replace && !readonly &&
@@ -3848,7 +3853,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
btrfs_err_in_rcu(fs_info, "scrub: device %s is not writable",
rcu_str_deref(dev->name));
- return -EROFS;
+ ret = -EROFS;
+ goto out_free_ctx;
}
mutex_lock(&fs_info->scrub_lock);
@@ -3856,7 +3862,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &dev->dev_state)) {
mutex_unlock(&fs_info->scrub_lock);
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
- return -EIO;
+ ret = -EIO;
+ goto out_free_ctx;
}
btrfs_dev_replace_read_lock(&fs_info->dev_replace);
@@ -3866,7 +3873,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
mutex_unlock(&fs_info->scrub_lock);
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
- return -EINPROGRESS;
+ ret = -EINPROGRESS;
+ goto out_free_ctx;
}
btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
@@ -3874,16 +3882,9 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
if (ret) {
mutex_unlock(&fs_info->scrub_lock);
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
- return ret;
+ goto out_free_ctx;
}
- sctx = scrub_setup_ctx(fs_info, is_dev_replace);
- if (IS_ERR(sctx)) {
- mutex_unlock(&fs_info->scrub_lock);
- mutex_unlock(&fs_info->fs_devices->device_list_mutex);
- scrub_workers_put(fs_info);
- return PTR_ERR(sctx);
- }
sctx->readonly = readonly;
dev->scrub_ctx = sctx;
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
@@ -3936,6 +3937,11 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
scrub_put_ctx(sctx);
+ return ret;
+
+out_free_ctx:
+ scrub_free_ctx(sctx);
+
return ret;
}
--
2.19.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] btrfs: scrub: pass fs_info to scrub_setup_ctx
2018-12-04 15:11 ` [PATCH 1/2] btrfs: scrub: pass fs_info to scrub_setup_ctx David Sterba
@ 2018-12-04 15:15 ` Nikolay Borisov
0 siblings, 0 replies; 6+ messages in thread
From: Nikolay Borisov @ 2018-12-04 15:15 UTC (permalink / raw)
To: David Sterba, linux-btrfs; +Cc: fdmanana
On 4.12.18 г. 17:11 ч., David Sterba wrote:
> We can pass fs_info directly as this is the only member of btrfs_device
> that's bing used inside scrub_setup_ctx.
>
> Signed-off-by: David Sterba <dsterba@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> ---
> fs/btrfs/scrub.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index bbd1b36f4918..ffcab263e057 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -578,12 +578,11 @@ static void scrub_put_ctx(struct scrub_ctx *sctx)
> scrub_free_ctx(sctx);
> }
>
> -static noinline_for_stack
> -struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace)
> +static noinline_for_stack struct scrub_ctx *scrub_setup_ctx(
> + struct btrfs_fs_info *fs_info, int is_dev_replace)
> {
> struct scrub_ctx *sctx;
> int i;
> - struct btrfs_fs_info *fs_info = dev->fs_info;
>
> sctx = kzalloc(sizeof(*sctx), GFP_KERNEL);
> if (!sctx)
> @@ -592,7 +591,7 @@ struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace)
> sctx->is_dev_replace = is_dev_replace;
> sctx->pages_per_rd_bio = SCRUB_PAGES_PER_RD_BIO;
> sctx->curr = -1;
> - sctx->fs_info = dev->fs_info;
> + sctx->fs_info = fs_info;
> for (i = 0; i < SCRUB_BIOS_PER_SCTX; ++i) {
> struct scrub_bio *sbio;
>
> @@ -3878,7 +3877,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
> return ret;
> }
>
> - sctx = scrub_setup_ctx(dev, is_dev_replace);
> + sctx = scrub_setup_ctx(fs_info, is_dev_replace);
> if (IS_ERR(sctx)) {
> mutex_unlock(&fs_info->scrub_lock);
> mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex
2018-12-04 15:11 ` [PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex David Sterba
@ 2018-12-04 15:22 ` Nikolay Borisov
2018-12-05 0:17 ` David Sterba
0 siblings, 1 reply; 6+ messages in thread
From: Nikolay Borisov @ 2018-12-04 15:22 UTC (permalink / raw)
To: David Sterba, linux-btrfs; +Cc: fdmanana
On 4.12.18 г. 17:11 ч., David Sterba wrote:
> The scrub context is allocated with GFP_KERNEL and called from
> btrfs_scrub_dev under the fs_info::device_list_mutex. This is not safe
> regarding reclaim that could try to flush filesystem data in order to
> get the memory. And the device_list_mutex is held during superblock
> commit, so this would cause a lockup.
>
> Move the alocation and initialization before any changes that require
> the mutex.
>
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
> fs/btrfs/scrub.c | 30 ++++++++++++++++++------------
> 1 file changed, 18 insertions(+), 12 deletions(-)
>
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index ffcab263e057..051d14c9f013 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -3834,13 +3834,18 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
> return -EINVAL;
> }
>
> + /* Allocate outside of device_list_mutex */
> + sctx = scrub_setup_ctx(fs_info, is_dev_replace);
> + if (IS_ERR(sctx))
> + return PTR_ERR(sctx);
>
> mutex_lock(&fs_info->fs_devices->device_list_mutex);
> dev = btrfs_find_device(fs_info, devid, NULL, NULL);
> if (!dev || (test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state) &&
> !is_dev_replace)) {
> mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> - return -ENODEV;
> + ret = -ENODEV;
> + goto out_free_ctx;
> }
>
> if (!is_dev_replace && !readonly &&
> @@ -3848,7 +3853,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
> mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> btrfs_err_in_rcu(fs_info, "scrub: device %s is not writable",
> rcu_str_deref(dev->name));
> - return -EROFS;
> + ret = -EROFS;
> + goto out_free_ctx;
> }
>
> mutex_lock(&fs_info->scrub_lock);
> @@ -3856,7 +3862,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
> test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &dev->dev_state)) {
> mutex_unlock(&fs_info->scrub_lock);
> mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> - return -EIO;
> + ret = -EIO;
> + goto out_free_ctx;
> }
>
> btrfs_dev_replace_read_lock(&fs_info->dev_replace);
> @@ -3866,7 +3873,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
> btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
> mutex_unlock(&fs_info->scrub_lock);
> mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> - return -EINPROGRESS;
> + ret = -EINPROGRESS;
> + goto out_free_ctx;
> }
> btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
>
> @@ -3874,16 +3882,9 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
> if (ret) {
> mutex_unlock(&fs_info->scrub_lock);
> mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> - return ret;
> + goto out_free_ctx;
Don't we suffer the same issue when calling scrub_workers_get since in
it we do btrfs_alloc_workqueue which also calls kzalloc with GFP_KERNEL?
> }
>
> - sctx = scrub_setup_ctx(fs_info, is_dev_replace);
> - if (IS_ERR(sctx)) {
> - mutex_unlock(&fs_info->scrub_lock);
> - mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> - scrub_workers_put(fs_info);
> - return PTR_ERR(sctx);
> - }
> sctx->readonly = readonly;
> dev->scrub_ctx = sctx;
> mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> @@ -3936,6 +3937,11 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
>
> scrub_put_ctx(sctx);
>
> + return ret;
> +
> +out_free_ctx:
> + scrub_free_ctx(sctx);
> +
> return ret;
> }
>
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex
2018-12-04 15:22 ` Nikolay Borisov
@ 2018-12-05 0:17 ` David Sterba
0 siblings, 0 replies; 6+ messages in thread
From: David Sterba @ 2018-12-05 0:17 UTC (permalink / raw)
To: Nikolay Borisov; +Cc: David Sterba, linux-btrfs, fdmanana
On Tue, Dec 04, 2018 at 05:22:19PM +0200, Nikolay Borisov wrote:
> > @@ -3874,16 +3882,9 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
> > if (ret) {
> > mutex_unlock(&fs_info->scrub_lock);
> > mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> > - return ret;
> > + goto out_free_ctx;
>
> Don't we suffer the same issue when calling scrub_workers_get since in
> it we do btrfs_alloc_workqueue which also calls kzalloc with GFP_KERNEL?
Yes, that's right. I instrumented only the allocations in scrub.c to see
if the nofs and lock_not_held assertions work at all so this one did not
get caught directly.
As scrub_workers_get still needs the scrub_lock, fixing it by moving
does not work and would need more restructuring.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2018-12-05 0:18 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-04 15:11 [PATCH 0/2] Scrub allocations vs reclaim fix David Sterba
2018-12-04 15:11 ` [PATCH 1/2] btrfs: scrub: pass fs_info to scrub_setup_ctx David Sterba
2018-12-04 15:15 ` Nikolay Borisov
2018-12-04 15:11 ` [PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex David Sterba
2018-12-04 15:22 ` Nikolay Borisov
2018-12-05 0:17 ` David Sterba
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).