* [PATCH 0/2] Scrub allocations vs reclaim fix @ 2018-12-04 15:11 David Sterba 2018-12-04 15:11 ` [PATCH 1/2] btrfs: scrub: pass fs_info to scrub_setup_ctx David Sterba 2018-12-04 15:11 ` [PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex David Sterba 0 siblings, 2 replies; 6+ messages in thread From: David Sterba @ 2018-12-04 15:11 UTC (permalink / raw) To: linux-btrfs; +Cc: fdmanana, David Sterba I've instrumented the scrub alloctions to check for the GFP_NOFS context and whether the device_list_mutex is not held. This caught one allocation, and I've used the instrumented code to verify that Filipe's scrub/reclaim fix is correct. There are some debugging helpers need that live in the generic code so I'm posting only results. The patches are in branch dev/locks-not-held in my devel repos. David Sterba (2): btrfs: scrub: pass fs_info to scrub_setup_ctx btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex fs/btrfs/scrub.c | 37 +++++++++++++++++++++---------------- 1 file changed, 21 insertions(+), 16 deletions(-) -- 2.19.1 ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/2] btrfs: scrub: pass fs_info to scrub_setup_ctx 2018-12-04 15:11 [PATCH 0/2] Scrub allocations vs reclaim fix David Sterba @ 2018-12-04 15:11 ` David Sterba 2018-12-04 15:15 ` Nikolay Borisov 2018-12-04 15:11 ` [PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex David Sterba 1 sibling, 1 reply; 6+ messages in thread From: David Sterba @ 2018-12-04 15:11 UTC (permalink / raw) To: linux-btrfs; +Cc: fdmanana, David Sterba We can pass fs_info directly as this is the only member of btrfs_device that's bing used inside scrub_setup_ctx. Signed-off-by: David Sterba <dsterba@suse.com> --- fs/btrfs/scrub.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index bbd1b36f4918..ffcab263e057 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -578,12 +578,11 @@ static void scrub_put_ctx(struct scrub_ctx *sctx) scrub_free_ctx(sctx); } -static noinline_for_stack -struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace) +static noinline_for_stack struct scrub_ctx *scrub_setup_ctx( + struct btrfs_fs_info *fs_info, int is_dev_replace) { struct scrub_ctx *sctx; int i; - struct btrfs_fs_info *fs_info = dev->fs_info; sctx = kzalloc(sizeof(*sctx), GFP_KERNEL); if (!sctx) @@ -592,7 +591,7 @@ struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace) sctx->is_dev_replace = is_dev_replace; sctx->pages_per_rd_bio = SCRUB_PAGES_PER_RD_BIO; sctx->curr = -1; - sctx->fs_info = dev->fs_info; + sctx->fs_info = fs_info; for (i = 0; i < SCRUB_BIOS_PER_SCTX; ++i) { struct scrub_bio *sbio; @@ -3878,7 +3877,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start, return ret; } - sctx = scrub_setup_ctx(dev, is_dev_replace); + sctx = scrub_setup_ctx(fs_info, is_dev_replace); if (IS_ERR(sctx)) { mutex_unlock(&fs_info->scrub_lock); mutex_unlock(&fs_info->fs_devices->device_list_mutex); -- 2.19.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] btrfs: scrub: pass fs_info to scrub_setup_ctx 2018-12-04 15:11 ` [PATCH 1/2] btrfs: scrub: pass fs_info to scrub_setup_ctx David Sterba @ 2018-12-04 15:15 ` Nikolay Borisov 0 siblings, 0 replies; 6+ messages in thread From: Nikolay Borisov @ 2018-12-04 15:15 UTC (permalink / raw) To: David Sterba, linux-btrfs; +Cc: fdmanana On 4.12.18 г. 17:11 ч., David Sterba wrote: > We can pass fs_info directly as this is the only member of btrfs_device > that's bing used inside scrub_setup_ctx. > > Signed-off-by: David Sterba <dsterba@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> > --- > fs/btrfs/scrub.c | 9 ++++----- > 1 file changed, 4 insertions(+), 5 deletions(-) > > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c > index bbd1b36f4918..ffcab263e057 100644 > --- a/fs/btrfs/scrub.c > +++ b/fs/btrfs/scrub.c > @@ -578,12 +578,11 @@ static void scrub_put_ctx(struct scrub_ctx *sctx) > scrub_free_ctx(sctx); > } > > -static noinline_for_stack > -struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace) > +static noinline_for_stack struct scrub_ctx *scrub_setup_ctx( > + struct btrfs_fs_info *fs_info, int is_dev_replace) > { > struct scrub_ctx *sctx; > int i; > - struct btrfs_fs_info *fs_info = dev->fs_info; > > sctx = kzalloc(sizeof(*sctx), GFP_KERNEL); > if (!sctx) > @@ -592,7 +591,7 @@ struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace) > sctx->is_dev_replace = is_dev_replace; > sctx->pages_per_rd_bio = SCRUB_PAGES_PER_RD_BIO; > sctx->curr = -1; > - sctx->fs_info = dev->fs_info; > + sctx->fs_info = fs_info; > for (i = 0; i < SCRUB_BIOS_PER_SCTX; ++i) { > struct scrub_bio *sbio; > > @@ -3878,7 +3877,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start, > return ret; > } > > - sctx = scrub_setup_ctx(dev, is_dev_replace); > + sctx = scrub_setup_ctx(fs_info, is_dev_replace); > if (IS_ERR(sctx)) { > mutex_unlock(&fs_info->scrub_lock); > mutex_unlock(&fs_info->fs_devices->device_list_mutex); > ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex 2018-12-04 15:11 [PATCH 0/2] Scrub allocations vs reclaim fix David Sterba 2018-12-04 15:11 ` [PATCH 1/2] btrfs: scrub: pass fs_info to scrub_setup_ctx David Sterba @ 2018-12-04 15:11 ` David Sterba 2018-12-04 15:22 ` Nikolay Borisov 1 sibling, 1 reply; 6+ messages in thread From: David Sterba @ 2018-12-04 15:11 UTC (permalink / raw) To: linux-btrfs; +Cc: fdmanana, David Sterba The scrub context is allocated with GFP_KERNEL and called from btrfs_scrub_dev under the fs_info::device_list_mutex. This is not safe regarding reclaim that could try to flush filesystem data in order to get the memory. And the device_list_mutex is held during superblock commit, so this would cause a lockup. Move the alocation and initialization before any changes that require the mutex. Signed-off-by: David Sterba <dsterba@suse.com> --- fs/btrfs/scrub.c | 30 ++++++++++++++++++------------ 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index ffcab263e057..051d14c9f013 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -3834,13 +3834,18 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start, return -EINVAL; } + /* Allocate outside of device_list_mutex */ + sctx = scrub_setup_ctx(fs_info, is_dev_replace); + if (IS_ERR(sctx)) + return PTR_ERR(sctx); mutex_lock(&fs_info->fs_devices->device_list_mutex); dev = btrfs_find_device(fs_info, devid, NULL, NULL); if (!dev || (test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state) && !is_dev_replace)) { mutex_unlock(&fs_info->fs_devices->device_list_mutex); - return -ENODEV; + ret = -ENODEV; + goto out_free_ctx; } if (!is_dev_replace && !readonly && @@ -3848,7 +3853,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start, mutex_unlock(&fs_info->fs_devices->device_list_mutex); btrfs_err_in_rcu(fs_info, "scrub: device %s is not writable", rcu_str_deref(dev->name)); - return -EROFS; + ret = -EROFS; + goto out_free_ctx; } mutex_lock(&fs_info->scrub_lock); @@ -3856,7 +3862,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start, test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &dev->dev_state)) { mutex_unlock(&fs_info->scrub_lock); mutex_unlock(&fs_info->fs_devices->device_list_mutex); - return -EIO; + ret = -EIO; + goto out_free_ctx; } btrfs_dev_replace_read_lock(&fs_info->dev_replace); @@ -3866,7 +3873,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start, btrfs_dev_replace_read_unlock(&fs_info->dev_replace); mutex_unlock(&fs_info->scrub_lock); mutex_unlock(&fs_info->fs_devices->device_list_mutex); - return -EINPROGRESS; + ret = -EINPROGRESS; + goto out_free_ctx; } btrfs_dev_replace_read_unlock(&fs_info->dev_replace); @@ -3874,16 +3882,9 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start, if (ret) { mutex_unlock(&fs_info->scrub_lock); mutex_unlock(&fs_info->fs_devices->device_list_mutex); - return ret; + goto out_free_ctx; } - sctx = scrub_setup_ctx(fs_info, is_dev_replace); - if (IS_ERR(sctx)) { - mutex_unlock(&fs_info->scrub_lock); - mutex_unlock(&fs_info->fs_devices->device_list_mutex); - scrub_workers_put(fs_info); - return PTR_ERR(sctx); - } sctx->readonly = readonly; dev->scrub_ctx = sctx; mutex_unlock(&fs_info->fs_devices->device_list_mutex); @@ -3936,6 +3937,11 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start, scrub_put_ctx(sctx); + return ret; + +out_free_ctx: + scrub_free_ctx(sctx); + return ret; } -- 2.19.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex 2018-12-04 15:11 ` [PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex David Sterba @ 2018-12-04 15:22 ` Nikolay Borisov 2018-12-05 0:17 ` David Sterba 0 siblings, 1 reply; 6+ messages in thread From: Nikolay Borisov @ 2018-12-04 15:22 UTC (permalink / raw) To: David Sterba, linux-btrfs; +Cc: fdmanana On 4.12.18 г. 17:11 ч., David Sterba wrote: > The scrub context is allocated with GFP_KERNEL and called from > btrfs_scrub_dev under the fs_info::device_list_mutex. This is not safe > regarding reclaim that could try to flush filesystem data in order to > get the memory. And the device_list_mutex is held during superblock > commit, so this would cause a lockup. > > Move the alocation and initialization before any changes that require > the mutex. > > Signed-off-by: David Sterba <dsterba@suse.com> > --- > fs/btrfs/scrub.c | 30 ++++++++++++++++++------------ > 1 file changed, 18 insertions(+), 12 deletions(-) > > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c > index ffcab263e057..051d14c9f013 100644 > --- a/fs/btrfs/scrub.c > +++ b/fs/btrfs/scrub.c > @@ -3834,13 +3834,18 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start, > return -EINVAL; > } > > + /* Allocate outside of device_list_mutex */ > + sctx = scrub_setup_ctx(fs_info, is_dev_replace); > + if (IS_ERR(sctx)) > + return PTR_ERR(sctx); > > mutex_lock(&fs_info->fs_devices->device_list_mutex); > dev = btrfs_find_device(fs_info, devid, NULL, NULL); > if (!dev || (test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state) && > !is_dev_replace)) { > mutex_unlock(&fs_info->fs_devices->device_list_mutex); > - return -ENODEV; > + ret = -ENODEV; > + goto out_free_ctx; > } > > if (!is_dev_replace && !readonly && > @@ -3848,7 +3853,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start, > mutex_unlock(&fs_info->fs_devices->device_list_mutex); > btrfs_err_in_rcu(fs_info, "scrub: device %s is not writable", > rcu_str_deref(dev->name)); > - return -EROFS; > + ret = -EROFS; > + goto out_free_ctx; > } > > mutex_lock(&fs_info->scrub_lock); > @@ -3856,7 +3862,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start, > test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &dev->dev_state)) { > mutex_unlock(&fs_info->scrub_lock); > mutex_unlock(&fs_info->fs_devices->device_list_mutex); > - return -EIO; > + ret = -EIO; > + goto out_free_ctx; > } > > btrfs_dev_replace_read_lock(&fs_info->dev_replace); > @@ -3866,7 +3873,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start, > btrfs_dev_replace_read_unlock(&fs_info->dev_replace); > mutex_unlock(&fs_info->scrub_lock); > mutex_unlock(&fs_info->fs_devices->device_list_mutex); > - return -EINPROGRESS; > + ret = -EINPROGRESS; > + goto out_free_ctx; > } > btrfs_dev_replace_read_unlock(&fs_info->dev_replace); > > @@ -3874,16 +3882,9 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start, > if (ret) { > mutex_unlock(&fs_info->scrub_lock); > mutex_unlock(&fs_info->fs_devices->device_list_mutex); > - return ret; > + goto out_free_ctx; Don't we suffer the same issue when calling scrub_workers_get since in it we do btrfs_alloc_workqueue which also calls kzalloc with GFP_KERNEL? > } > > - sctx = scrub_setup_ctx(fs_info, is_dev_replace); > - if (IS_ERR(sctx)) { > - mutex_unlock(&fs_info->scrub_lock); > - mutex_unlock(&fs_info->fs_devices->device_list_mutex); > - scrub_workers_put(fs_info); > - return PTR_ERR(sctx); > - } > sctx->readonly = readonly; > dev->scrub_ctx = sctx; > mutex_unlock(&fs_info->fs_devices->device_list_mutex); > @@ -3936,6 +3937,11 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start, > > scrub_put_ctx(sctx); > > + return ret; > + > +out_free_ctx: > + scrub_free_ctx(sctx); > + > return ret; > } > > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex 2018-12-04 15:22 ` Nikolay Borisov @ 2018-12-05 0:17 ` David Sterba 0 siblings, 0 replies; 6+ messages in thread From: David Sterba @ 2018-12-05 0:17 UTC (permalink / raw) To: Nikolay Borisov; +Cc: David Sterba, linux-btrfs, fdmanana On Tue, Dec 04, 2018 at 05:22:19PM +0200, Nikolay Borisov wrote: > > @@ -3874,16 +3882,9 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start, > > if (ret) { > > mutex_unlock(&fs_info->scrub_lock); > > mutex_unlock(&fs_info->fs_devices->device_list_mutex); > > - return ret; > > + goto out_free_ctx; > > Don't we suffer the same issue when calling scrub_workers_get since in > it we do btrfs_alloc_workqueue which also calls kzalloc with GFP_KERNEL? Yes, that's right. I instrumented only the allocations in scrub.c to see if the nofs and lock_not_held assertions work at all so this one did not get caught directly. As scrub_workers_get still needs the scrub_lock, fixing it by moving does not work and would need more restructuring. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2018-12-05 0:18 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-12-04 15:11 [PATCH 0/2] Scrub allocations vs reclaim fix David Sterba 2018-12-04 15:11 ` [PATCH 1/2] btrfs: scrub: pass fs_info to scrub_setup_ctx David Sterba 2018-12-04 15:15 ` Nikolay Borisov 2018-12-04 15:11 ` [PATCH 2/2] btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex David Sterba 2018-12-04 15:22 ` Nikolay Borisov 2018-12-05 0:17 ` David Sterba
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).