* [PULL] Btrfs for 4.7, part 2 @ 2016-05-26 9:27 David Sterba 2016-05-27 0:14 ` Chris Mason 0 siblings, 1 reply; 21+ messages in thread From: David Sterba @ 2016-05-26 9:27 UTC (permalink / raw) To: clm; +Cc: linux-btrfs, David Sterba Hi, please pull a few more patches that did not go to pull #1 for 4.7, minor cleanups and fixes. Thanks. The following changes since commit c315ef8d9db7f1a0ebd023a395ebdfde1c68057e: Merge branch 'for-chris-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.7 (2016-05-17 14:43:19 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-chris for you to fetch changes up to 4c6143dd497901e3537dc4324dc203dfda442009: Merge branch 'dev/comp-workspaces' into for-chris-4.7-20160525 (2016-05-25 22:51:04 +0200) ---------------------------------------------------------------- David Sterba (17): btrfs: rename and document compression workspace members btrfs: preallocate compression workspaces btrfs: make find_workspace always succeed btrfs: make find_workspace warn if there are no workspaces btrfs: sink gfp parameter to set_extent_bits btrfs: sink gfp parameter to clear_extent_bits btrfs: sink gfp parameter to clear_record_extent_bits btrfs: sink gfp parameter to clear_extent_dirty btrfs: sink gfp parameter to set_extent_delalloc btrfs: sink gfp parameter to set_extent_defrag btrfs: sink gfp parameter to set_extent_new btrfs: sink gfp parameter to set_record_extent_bits btrfs: untangle gotos a bit in __set_extent_bit btrfs: untangle gotos a bit in __clear_extent_bit btrfs: untangle gotos a bit in convert_extent_bit btrfs: make state preallocation more speculative in __set_extent_bit btrfs: sink gfp parameter to convert_extent_bit Liu Bo (2): Btrfs: free sys_array eb as soon as possible Btrfs: fix unexpected return value of fiemap Nicholas D Steeves (1): btrfs: fix string and comment grammatical issues and typos Zhao Lei (1): btrfs: scrub: Set bbio to NULL before calling btrfs_map_block ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PULL] Btrfs for 4.7, part 2 2016-05-26 9:27 [PULL] Btrfs for 4.7, part 2 David Sterba @ 2016-05-27 0:14 ` Chris Mason 2016-05-27 11:18 ` David Sterba 0 siblings, 1 reply; 21+ messages in thread From: Chris Mason @ 2016-05-27 0:14 UTC (permalink / raw) To: David Sterba; +Cc: linux-btrfs On Thu, May 26, 2016 at 11:27:06AM +0200, David Sterba wrote: > Hi, > > please pull a few more patches that did not go to pull #1 for 4.7, minor > cleanups and fixes. Thanks. Thanks Dave! Trying to figure out why we're failing btrfs/011, but I don't see how it could be related to this bunch. I'll nail it down. -chris ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PULL] Btrfs for 4.7, part 2 2016-05-27 0:14 ` Chris Mason @ 2016-05-27 11:18 ` David Sterba 2016-05-27 14:35 ` Chris Mason 0 siblings, 1 reply; 21+ messages in thread From: David Sterba @ 2016-05-27 11:18 UTC (permalink / raw) To: Chris Mason, David Sterba, linux-btrfs On Thu, May 26, 2016 at 08:14:14PM -0400, Chris Mason wrote: > On Thu, May 26, 2016 at 11:27:06AM +0200, David Sterba wrote: > > Hi, > > > > please pull a few more patches that did not go to pull #1 for 4.7, minor > > cleanups and fixes. Thanks. > > Thanks Dave! Trying to figure out why we're failing btrfs/011, but I > don't see how it could be related to this bunch. I'll nail it down. 011 passes here, there are some unrelated soft-failures (mismatching output with new progs). I'm now testing a branch without "btrfs: scrub: Set bbio to NULL before calling btrfs_map_block", that seems to be the only likely offender. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PULL] Btrfs for 4.7, part 2 2016-05-27 11:18 ` David Sterba @ 2016-05-27 14:35 ` Chris Mason 2016-05-27 15:42 ` Chris Mason 0 siblings, 1 reply; 21+ messages in thread From: Chris Mason @ 2016-05-27 14:35 UTC (permalink / raw) To: dsterba, David Sterba, linux-btrfs On Fri, May 27, 2016 at 01:18:22PM +0200, David Sterba wrote: > On Thu, May 26, 2016 at 08:14:14PM -0400, Chris Mason wrote: > > On Thu, May 26, 2016 at 11:27:06AM +0200, David Sterba wrote: > > > Hi, > > > > > > please pull a few more patches that did not go to pull #1 for 4.7, minor > > > cleanups and fixes. Thanks. > > > > Thanks Dave! Trying to figure out why we're failing btrfs/011, but I > > don't see how it could be related to this bunch. I'll nail it down. > > 011 passes here, there are some unrelated soft-failures (mismatching > output with new progs). I'm now testing a branch without "btrfs: scrub: > Set bbio to NULL before calling btrfs_map_block", that seems to be the > only likely offender. I'm getting errors from btrfs fi show -d, after the very last round of device replaces. A little extra debugging: bytenr mismatch, want=4332716032, have=0 ERROR: cannot read chunk root ERROR reading /dev/vdh failed /dev/vdh Which is cute because the very next command we run fscks /dev/vdh and succeeds. So the page cache is stale and this isn't related to any of our patches. -chris ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PULL] Btrfs for 4.7, part 2 2016-05-27 14:35 ` Chris Mason @ 2016-05-27 15:42 ` Chris Mason 2016-05-28 5:14 ` Anand Jain 0 siblings, 1 reply; 21+ messages in thread From: Chris Mason @ 2016-05-27 15:42 UTC (permalink / raw) To: dsterba, David Sterba, linux-btrfs On Fri, May 27, 2016 at 10:35:27AM -0400, Chris Mason wrote: > On Fri, May 27, 2016 at 01:18:22PM +0200, David Sterba wrote: > > On Thu, May 26, 2016 at 08:14:14PM -0400, Chris Mason wrote: > > > On Thu, May 26, 2016 at 11:27:06AM +0200, David Sterba wrote: > > > > Hi, > > > > > > > > please pull a few more patches that did not go to pull #1 for 4.7, minor > > > > cleanups and fixes. Thanks. > > > > > > Thanks Dave! Trying to figure out why we're failing btrfs/011, but I > > > don't see how it could be related to this bunch. I'll nail it down. > > > > 011 passes here, there are some unrelated soft-failures (mismatching > > output with new progs). I'm now testing a branch without "btrfs: scrub: > > Set bbio to NULL before calling btrfs_map_block", that seems to be the > > only likely offender. > > I'm getting errors from btrfs fi show -d, after the very last round of > device replaces. A little extra debugging: > > bytenr mismatch, want=4332716032, have=0 > ERROR: cannot read chunk root > ERROR reading /dev/vdh > failed /dev/vdh > > Which is cute because the very next command we run fscks /dev/vdh and > succeeds. > > So the page cache is stale and this isn't related to any of our patches. close_ctree() calls into btrfs_close_devices(), which calls btrfs_close_one_device(), which uses: call_rcu(&device->rcu, free_device); close_ctree() also does an rcu_barrier() to make sure and wait for free_device() to finish. But, free_device() just puts the work into schedule_work(), so we don't know for sure the blkdev_put is done when we exit. It's been this way for a while, so its not holding up my pull request to Linus. But I'll fix it up. -chris ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PULL] Btrfs for 4.7, part 2 2016-05-27 15:42 ` Chris Mason @ 2016-05-28 5:14 ` Anand Jain 2016-05-29 12:21 ` Chris Mason 0 siblings, 1 reply; 21+ messages in thread From: Anand Jain @ 2016-05-28 5:14 UTC (permalink / raw) To: Chris Mason, dsterba, David Sterba, linux-btrfs On 05/27/2016 11:42 PM, Chris Mason wrote: > On Fri, May 27, 2016 at 10:35:27AM -0400, Chris Mason wrote: >> On Fri, May 27, 2016 at 01:18:22PM +0200, David Sterba wrote: >>> On Thu, May 26, 2016 at 08:14:14PM -0400, Chris Mason wrote: >>>> On Thu, May 26, 2016 at 11:27:06AM +0200, David Sterba wrote: >>>>> Hi, >>>>> >>>>> please pull a few more patches that did not go to pull #1 for 4.7, minor >>>>> cleanups and fixes. Thanks. >>>> >>>> Thanks Dave! Trying to figure out why we're failing btrfs/011, but I >>>> don't see how it could be related to this bunch. I'll nail it down. >>> >>> 011 passes here, there are some unrelated soft-failures (mismatching >>> output with new progs). I'm now testing a branch without "btrfs: scrub: >>> Set bbio to NULL before calling btrfs_map_block", that seems to be the >>> only likely offender. >> >> I'm getting errors from btrfs fi show -d, after the very last round of >> device replaces. A little extra debugging: >> >> bytenr mismatch, want=4332716032, have=0 >> ERROR: cannot read chunk root >> ERROR reading /dev/vdh >> failed /dev/vdh >> >> Which is cute because the very next command we run fscks /dev/vdh and >> succeeds. Checked the code paths both btrfs fi show -d and btrfs check, both are calling flush during relative open_ctree in progs. However the flush is called after we have read superblock. That means the read_superblock during 'show' cli (only) will read superblock without flush, and 'check' won't, because 011 calls 'check' after 'show'. But it still does not explain the above error, which is during open_ctree not at read superblock. Remains strange case as of now. Also. I can't reproduce. >> So the page cache is stale and this isn't related to any of our patches. > > close_ctree() calls into btrfs_close_devices(), which calls > btrfs_close_one_device(), which uses: > > call_rcu(&device->rcu, free_device); > > close_ctree() also does an rcu_barrier() to make sure and wait for > free_device() to finish. > > But, free_device() just puts the work into schedule_work(), so we don't > know for sure the blkdev_put is done when we exit. Right, saw that before. Any idea why its like that ? Or if it should be fixed? > It's been this way for a while, so its not holding up my pull request to > Linus. But I'll fix it up. Yes. Its been like that. Thanks, Anand > -chris > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PULL] Btrfs for 4.7, part 2 2016-05-28 5:14 ` Anand Jain @ 2016-05-29 12:21 ` Chris Mason 2016-06-14 10:52 ` Anand Jain 0 siblings, 1 reply; 21+ messages in thread From: Chris Mason @ 2016-05-29 12:21 UTC (permalink / raw) To: Anand Jain; +Cc: dsterba, David Sterba, linux-btrfs On Sat, May 28, 2016 at 01:14:13PM +0800, Anand Jain wrote: > > >On 05/27/2016 11:42 PM, Chris Mason wrote: >>>I'm getting errors from btrfs fi show -d, after the very last round of >>>device replaces. A little extra debugging: >>> >>>bytenr mismatch, want=4332716032, have=0 >>>ERROR: cannot read chunk root >>>ERROR reading /dev/vdh >>>failed /dev/vdh >>> >>>Which is cute because the very next command we run fscks /dev/vdh and >>>succeeds. > >Checked the code paths both btrfs fi show -d and btrfs check, >both are calling flush during relative open_ctree in progs. > >However the flush is called after we have read superblock. That >means the read_superblock during 'show' cli (only) will read superblock >without flush, and 'check' won't, because 011 calls 'check' after >'show'. But it still does not explain the above error, which is >during open_ctree not at read superblock. Remains strange case as >of now. It's because we're just not done writing it out yet when btrfs fi show is run. I think replace is special here. > >Also. I can't reproduce. > I'm in a relatively new test rig using kvm, which probably explains why I haven't seen it before. You can probably make it easier by adding a sleep inside the actual __free_device() func. >>>So the page cache is stale and this isn't related to any of our patches. >> >>close_ctree() calls into btrfs_close_devices(), which calls >>btrfs_close_one_device(), which uses: >> >>call_rcu(&device->rcu, free_device); >> >>close_ctree() also does an rcu_barrier() to make sure and wait for >>free_device() to finish. >> >>But, free_device() just puts the work into schedule_work(), so we don't >>know for sure the blkdev_put is done when we exit. > > Right, saw that before. Any idea why its like that ? Or if it > should be fixed? It's just trying to limit the work that is done from call_rcu, and it should definitely be fixed. It might cause EBUSY or other problems. Probably easiest to add a counter or completion object that gets changed by the __free_device function. -chris ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PULL] Btrfs for 4.7, part 2 2016-05-29 12:21 ` Chris Mason @ 2016-06-14 10:52 ` Anand Jain 2016-06-14 10:55 ` [PATCH 1/2] btrfs: reorg btrfs_close_one_device() Anand Jain 0 siblings, 1 reply; 21+ messages in thread From: Anand Jain @ 2016-06-14 10:52 UTC (permalink / raw) To: Chris Mason, dsterba, David Sterba, linux-btrfs Chris, Sorry for the delay due to vacation. more below.. On 05/29/2016 08:21 PM, Chris Mason wrote: > On Sat, May 28, 2016 at 01:14:13PM +0800, Anand Jain wrote: >> >> >> On 05/27/2016 11:42 PM, Chris Mason wrote: >>>> I'm getting errors from btrfs fi show -d, after the very last round of >>>> device replaces. A little extra debugging: >>>> >>>> bytenr mismatch, want=4332716032, have=0 >>>> ERROR: cannot read chunk root >>>> ERROR reading /dev/vdh >>>> failed /dev/vdh >>>> >>>> Which is cute because the very next command we run fscks /dev/vdh and >>>> succeeds. >> >> Checked the code paths both btrfs fi show -d and btrfs check, >> both are calling flush during relative open_ctree in progs. >> >> However the flush is called after we have read superblock. That >> means the read_superblock during 'show' cli (only) will read superblock >> without flush, and 'check' won't, because 011 calls 'check' after >> 'show'. But it still does not explain the above error, which is >> during open_ctree not at read superblock. Remains strange case as >> of now. > > It's because we're just not done writing it out yet when btrfs fi show > is run. > I think replace is special here. > >> >> Also. I can't reproduce. >> > > I'm in a relatively new test rig using kvm, which probably explains why > I haven't seen it before. You can probably make it easier by adding > a sleep inside the actual __free_device() func. > >>>> So the page cache is stale and this isn't related to any of our >>>> patches. >>> >>> close_ctree() calls into btrfs_close_devices(), which calls >>> btrfs_close_one_device(), which uses: >>> >>> call_rcu(&device->rcu, free_device); >>> >>> close_ctree() also does an rcu_barrier() to make sure and wait for >>> free_device() to finish. >>> >>> But, free_device() just puts the work into schedule_work(), so we don't >>> know for sure the blkdev_put is done when we exit. >> >> Right, saw that before. Any idea why its like that ? Or if it >> should be fixed? > > It's just trying to limit the work that is done from call_rcu, and it > should > definitely be fixed. It might cause EBUSY or other problems. Probably > easiest to add a counter or completion object that gets changed by the > __free_device function. yes indeed sleep made the problem to reproduce, Also looks like this problem was identified by below commit before, however the fix wasn't correct. ---- commit bc178622d40d87e75abc131007342429c9b03351 btrfs: use rcu_barrier() to wait for bdev puts at unmount :: Adding an rcu_barrier() to btrfs_close_devices() causes unmount to wait until all blkdev_put()s are done, and the device is truly free once unmount complet ---- As free_devces() spinoff __free_device() to make the actual bdev put we need to wait on __free_device(). But rcu_barrier() just waits for free_device() to complete, so at the end of rcu_barrier() the blkdev_put() may not be completed. Wrote a new fix as in the patches, [PATH 2/2] btrfs: wait for bdev put For review comments. Thanks, -Anand > -chris ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 1/2] btrfs: reorg btrfs_close_one_device() 2016-06-14 10:52 ` Anand Jain @ 2016-06-14 10:55 ` Anand Jain 2016-06-14 10:55 ` [PATCH 2/2] btrfs: wait for bdev put Anand Jain 0 siblings, 1 reply; 21+ messages in thread From: Anand Jain @ 2016-06-14 10:55 UTC (permalink / raw) To: linux-btrfs; +Cc: clm Moves closer to the caller and removes declaration Signed-off-by: Anand Jain <anand.jain@oracle.com> --- fs/btrfs/volumes.c | 71 +++++++++++++++++++++++++++--------------------------- 1 file changed, 35 insertions(+), 36 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index a637e99e4c6b..a4e8d48acd4b 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -140,7 +140,6 @@ static int btrfs_relocate_sys_chunks(struct btrfs_root *root); static void __btrfs_reset_dev_stats(struct btrfs_device *dev); static void btrfs_dev_stat_print_on_error(struct btrfs_device *dev); static void btrfs_dev_stat_print_on_load(struct btrfs_device *device); -static void btrfs_close_one_device(struct btrfs_device *device); DEFINE_MUTEX(uuid_mutex); static LIST_HEAD(fs_uuids); @@ -853,6 +852,41 @@ static void free_device(struct rcu_head *head) schedule_work(&device->rcu_work); } +static void btrfs_close_one_device(struct btrfs_device *device) +{ + struct btrfs_fs_devices *fs_devices = device->fs_devices; + struct btrfs_device *new_device; + struct rcu_string *name; + + if (device->bdev) + fs_devices->open_devices--; + + if (device->writeable && + device->devid != BTRFS_DEV_REPLACE_DEVID) { + list_del_init(&device->dev_alloc_list); + fs_devices->rw_devices--; + } + + if (device->missing) + fs_devices->missing_devices--; + + new_device = btrfs_alloc_device(NULL, &device->devid, + device->uuid); + BUG_ON(IS_ERR(new_device)); /* -ENOMEM */ + + /* Safe because we are under uuid_mutex */ + if (device->name) { + name = rcu_string_strdup(device->name->str, GFP_NOFS); + BUG_ON(!name); /* -ENOMEM */ + rcu_assign_pointer(new_device->name, name); + } + + list_replace_rcu(&device->dev_list, &new_device->dev_list); + new_device->fs_devices = device->fs_devices; + + call_rcu(&device->rcu, free_device); +} + static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices) { struct btrfs_device *device, *tmp; @@ -7054,38 +7088,3 @@ void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info) fs_devices = fs_devices->seed; } } - -static void btrfs_close_one_device(struct btrfs_device *device) -{ - struct btrfs_fs_devices *fs_devices = device->fs_devices; - struct btrfs_device *new_device; - struct rcu_string *name; - - if (device->bdev) - fs_devices->open_devices--; - - if (device->writeable && - device->devid != BTRFS_DEV_REPLACE_DEVID) { - list_del_init(&device->dev_alloc_list); - fs_devices->rw_devices--; - } - - if (device->missing) - fs_devices->missing_devices--; - - new_device = btrfs_alloc_device(NULL, &device->devid, - device->uuid); - BUG_ON(IS_ERR(new_device)); /* -ENOMEM */ - - /* Safe because we are under uuid_mutex */ - if (device->name) { - name = rcu_string_strdup(device->name->str, GFP_NOFS); - BUG_ON(!name); /* -ENOMEM */ - rcu_assign_pointer(new_device->name, name); - } - - list_replace_rcu(&device->dev_list, &new_device->dev_list); - new_device->fs_devices = device->fs_devices; - - call_rcu(&device->rcu, free_device); -} -- 2.7.0 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 2/2] btrfs: wait for bdev put 2016-06-14 10:55 ` [PATCH 1/2] btrfs: reorg btrfs_close_one_device() Anand Jain @ 2016-06-14 10:55 ` Anand Jain 2016-06-18 16:34 ` Holger Hoffstätte ` (2 more replies) 0 siblings, 3 replies; 21+ messages in thread From: Anand Jain @ 2016-06-14 10:55 UTC (permalink / raw) To: linux-btrfs; +Cc: clm Further to the previous commit bc178622d40d87e75abc131007342429c9b03351 btrfs: use rcu_barrier() to wait for bdev puts at unmount Since free_device() spinoff __free_device() the rcu_barrier() for call_rcu(&device->rcu, free_device); didn't help. This patch reverts changes by bc178622d40d87e75abc131007342429c9b03351 and implement a method to wait on the __free_device() by using a new bdev_closing member in struct btrfs_device. Signed-off-by: Anand Jain <anand.jain@oracle.com> [rework: bc178622d40d87e75abc131007342429c9b03351] --- fs/btrfs/volumes.c | 44 ++++++++++++++++++++++++++++++++++++++------ fs/btrfs/volumes.h | 1 + 2 files changed, 39 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index a4e8d48acd4b..404ce1daebb1 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -27,6 +27,7 @@ #include <linux/raid/pq.h> #include <linux/semaphore.h> #include <linux/uuid.h> +#include <linux/delay.h> #include <asm/div64.h> #include "ctree.h" #include "extent_map.h" @@ -254,6 +255,17 @@ static struct btrfs_device *__alloc_device(void) return dev; } +static int is_device_closing(struct list_head *head) +{ + struct btrfs_device *dev; + + list_for_each_entry(dev, head, dev_list) { + if (dev->bdev_closing) + return 1; + } + return 0; +} + static noinline struct btrfs_device *__find_device(struct list_head *head, u64 devid, u8 *uuid) { @@ -832,12 +844,22 @@ again: static void __free_device(struct work_struct *work) { struct btrfs_device *device; + struct btrfs_device *new_device_addr; device = container_of(work, struct btrfs_device, rcu_work); if (device->bdev) blkdev_put(device->bdev, device->mode); + /* + * If we are coming here from btrfs_close_one_device() + * then it allocates a new device structure for the same + * devid, so find device again with the devid + */ + new_device_addr = __find_device(&device->fs_devices->devices, + device->devid, NULL); + + new_device_addr->bdev_closing = 0; rcu_string_free(device->name); kfree(device); } @@ -884,6 +906,12 @@ static void btrfs_close_one_device(struct btrfs_device *device) list_replace_rcu(&device->dev_list, &new_device->dev_list); new_device->fs_devices = device->fs_devices; + /* + * So to wait for kworkers to finish all blkdev_puts, + * so device is really free when umount is done. + */ + new_device->bdev_closing = 1; + call_rcu(&device->rcu, free_device); } @@ -912,6 +940,7 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices) { struct btrfs_fs_devices *seed_devices = NULL; int ret; + int retry_cnt = 5; mutex_lock(&uuid_mutex); ret = __btrfs_close_devices(fs_devices); @@ -927,12 +956,15 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices) __btrfs_close_devices(fs_devices); free_fs_devices(fs_devices); } - /* - * Wait for rcu kworkers under __btrfs_close_devices - * to finish all blkdev_puts so device is really - * free when umount is done. - */ - rcu_barrier(); + + while (is_device_closing(&fs_devices->devices) && + --retry_cnt) { + mdelay(1000); //1 sec + } + + if (!(retry_cnt > 0)) + printk(KERN_WARNING "BTRFS: %pU bdev_put didn't complete, giving up\n", + fs_devices->fsid); return ret; } diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 0ac90f8d85bd..945e49f5e17d 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -150,6 +150,7 @@ struct btrfs_device { /* Counter to record the change of device stats */ atomic_t dev_stats_ccnt; atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX]; + int bdev_closing; }; /* -- 2.7.0 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 2/2] btrfs: wait for bdev put 2016-06-14 10:55 ` [PATCH 2/2] btrfs: wait for bdev put Anand Jain @ 2016-06-18 16:34 ` Holger Hoffstätte 2016-06-20 8:33 ` Anand Jain 2016-06-21 10:24 ` [PATCH v2 " Anand Jain 2016-06-23 12:54 ` [PATCH v3 2/2] btrfs: make sure device is synced before return Anand Jain 2 siblings, 1 reply; 21+ messages in thread From: Holger Hoffstätte @ 2016-06-18 16:34 UTC (permalink / raw) To: linux-btrfs On Tue, 14 Jun 2016 18:55:26 +0800, Anand Jain wrote: > Further to the previous commit > bc178622d40d87e75abc131007342429c9b03351 > btrfs: use rcu_barrier() to wait for bdev puts at unmount > > Since free_device() spinoff __free_device() the rcu_barrier() for > call_rcu(&device->rcu, free_device); > didn't help. > > This patch reverts changes by > bc178622d40d87e75abc131007342429c9b03351 > and implement a method to wait on the __free_device() by using > a new bdev_closing member in struct btrfs_device. > > Signed-off-by: Anand Jain <anand.jain@oracle.com> > [rework: bc178622d40d87e75abc131007342429c9b03351] > --- > fs/btrfs/volumes.c | 44 ++++++++++++++++++++++++++++++++++++++------ > fs/btrfs/volumes.h | 1 + > 2 files changed, 39 insertions(+), 6 deletions(-) > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index a4e8d48acd4b..404ce1daebb1 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -27,6 +27,7 @@ > #include <linux/raid/pq.h> > #include <linux/semaphore.h> > #include <linux/uuid.h> > +#include <linux/delay.h> > #include <asm/div64.h> > #include "ctree.h" > #include "extent_map.h" > @@ -254,6 +255,17 @@ static struct btrfs_device *__alloc_device(void) > return dev; > } > > +static int is_device_closing(struct list_head *head) > +{ > + struct btrfs_device *dev; > + > + list_for_each_entry(dev, head, dev_list) { > + if (dev->bdev_closing) > + return 1; > + } > + return 0; > +} > + > static noinline struct btrfs_device *__find_device(struct list_head *head, > u64 devid, u8 *uuid) > { > @@ -832,12 +844,22 @@ again: > static void __free_device(struct work_struct *work) > { > struct btrfs_device *device; > + struct btrfs_device *new_device_addr; > > device = container_of(work, struct btrfs_device, rcu_work); > > if (device->bdev) > blkdev_put(device->bdev, device->mode); > > + /* > + * If we are coming here from btrfs_close_one_device() > + * then it allocates a new device structure for the same > + * devid, so find device again with the devid > + */ > + new_device_addr = __find_device(&device->fs_devices->devices, > + device->devid, NULL); > + > + new_device_addr->bdev_closing = 0; > rcu_string_free(device->name); > kfree(device); > } > @@ -884,6 +906,12 @@ static void btrfs_close_one_device(struct btrfs_device *device) > list_replace_rcu(&device->dev_list, &new_device->dev_list); > new_device->fs_devices = device->fs_devices; > > + /* > + * So to wait for kworkers to finish all blkdev_puts, > + * so device is really free when umount is done. > + */ > + new_device->bdev_closing = 1; > + > call_rcu(&device->rcu, free_device); > } > > @@ -912,6 +940,7 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices) > { > struct btrfs_fs_devices *seed_devices = NULL; > int ret; > + int retry_cnt = 5; > > mutex_lock(&uuid_mutex); > ret = __btrfs_close_devices(fs_devices); > @@ -927,12 +956,15 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices) > __btrfs_close_devices(fs_devices); > free_fs_devices(fs_devices); > } > - /* > - * Wait for rcu kworkers under __btrfs_close_devices > - * to finish all blkdev_puts so device is really > - * free when umount is done. > - */ > - rcu_barrier(); > + > + while (is_device_closing(&fs_devices->devices) && > + --retry_cnt) { > + mdelay(1000); //1 sec > + } > + > + if (!(retry_cnt > 0)) > + printk(KERN_WARNING "BTRFS: %pU bdev_put didn't complete, giving up\n", > + fs_devices->fsid); > return ret; > } > > diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h > index 0ac90f8d85bd..945e49f5e17d 100644 > --- a/fs/btrfs/volumes.h > +++ b/fs/btrfs/volumes.h > @@ -150,6 +150,7 @@ struct btrfs_device { > /* Counter to record the change of device stats */ > atomic_t dev_stats_ccnt; > atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX]; > + int bdev_closing; > }; > > /* > -- > 2.7.0 I gave this a try and somehow it seems to make unmounting worse: it now always takes ~5s (max retry time) and I see the warning every time. Without the patch unmounting a single volume (disk) is much faster (1-2s), without problems. Any ideas? cheers, Holger ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/2] btrfs: wait for bdev put 2016-06-18 16:34 ` Holger Hoffstätte @ 2016-06-20 8:33 ` Anand Jain 0 siblings, 0 replies; 21+ messages in thread From: Anand Jain @ 2016-06-20 8:33 UTC (permalink / raw) To: Holger Hoffstätte, linux-btrfs On 06/19/2016 12:34 AM, Holger Hoffstätte wrote: > On Tue, 14 Jun 2016 18:55:26 +0800, Anand Jain wrote: > >> Further to the previous commit >> bc178622d40d87e75abc131007342429c9b03351 >> btrfs: use rcu_barrier() to wait for bdev puts at unmount >> >> Since free_device() spinoff __free_device() the rcu_barrier() for >> call_rcu(&device->rcu, free_device); >> didn't help. >> >> This patch reverts changes by >> bc178622d40d87e75abc131007342429c9b03351 >> and implement a method to wait on the __free_device() by using >> a new bdev_closing member in struct btrfs_device. >> >> Signed-off-by: Anand Jain <anand.jain@oracle.com> >> [rework: bc178622d40d87e75abc131007342429c9b03351] >> --- >> fs/btrfs/volumes.c | 44 ++++++++++++++++++++++++++++++++++++++------ >> fs/btrfs/volumes.h | 1 + >> 2 files changed, 39 insertions(+), 6 deletions(-) >> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c >> index a4e8d48acd4b..404ce1daebb1 100644 >> --- a/fs/btrfs/volumes.c >> +++ b/fs/btrfs/volumes.c >> @@ -27,6 +27,7 @@ >> #include <linux/raid/pq.h> >> #include <linux/semaphore.h> >> #include <linux/uuid.h> >> +#include <linux/delay.h> >> #include <asm/div64.h> >> #include "ctree.h" >> #include "extent_map.h" >> @@ -254,6 +255,17 @@ static struct btrfs_device *__alloc_device(void) >> return dev; >> } >> >> +static int is_device_closing(struct list_head *head) >> +{ >> + struct btrfs_device *dev; >> + >> + list_for_each_entry(dev, head, dev_list) { >> + if (dev->bdev_closing) >> + return 1; >> + } >> + return 0; >> +} >> + >> static noinline struct btrfs_device *__find_device(struct list_head *head, >> u64 devid, u8 *uuid) >> { >> @@ -832,12 +844,22 @@ again: >> static void __free_device(struct work_struct *work) >> { >> struct btrfs_device *device; >> + struct btrfs_device *new_device_addr; >> >> device = container_of(work, struct btrfs_device, rcu_work); >> >> if (device->bdev) >> blkdev_put(device->bdev, device->mode); >> >> + /* >> + * If we are coming here from btrfs_close_one_device() >> + * then it allocates a new device structure for the same >> + * devid, so find device again with the devid >> + */ >> + new_device_addr = __find_device(&device->fs_devices->devices, >> + device->devid, NULL); >> + >> + new_device_addr->bdev_closing = 0; >> rcu_string_free(device->name); >> kfree(device); >> } >> @@ -884,6 +906,12 @@ static void btrfs_close_one_device(struct btrfs_device *device) >> list_replace_rcu(&device->dev_list, &new_device->dev_list); >> new_device->fs_devices = device->fs_devices; >> >> + /* >> + * So to wait for kworkers to finish all blkdev_puts, >> + * so device is really free when umount is done. >> + */ >> + new_device->bdev_closing = 1; >> + >> call_rcu(&device->rcu, free_device); >> } >> >> @@ -912,6 +940,7 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices) >> { >> struct btrfs_fs_devices *seed_devices = NULL; >> int ret; >> + int retry_cnt = 5; >> >> mutex_lock(&uuid_mutex); >> ret = __btrfs_close_devices(fs_devices); >> @@ -927,12 +956,15 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices) >> __btrfs_close_devices(fs_devices); >> free_fs_devices(fs_devices); >> } >> - /* >> - * Wait for rcu kworkers under __btrfs_close_devices >> - * to finish all blkdev_puts so device is really >> - * free when umount is done. >> - */ >> - rcu_barrier(); >> + >> + while (is_device_closing(&fs_devices->devices) && >> + --retry_cnt) { >> + mdelay(1000); //1 sec >> + } >> + >> + if (!(retry_cnt > 0)) >> + printk(KERN_WARNING "BTRFS: %pU bdev_put didn't complete, giving up\n", >> + fs_devices->fsid); >> return ret; >> } >> >> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h >> index 0ac90f8d85bd..945e49f5e17d 100644 >> --- a/fs/btrfs/volumes.h >> +++ b/fs/btrfs/volumes.h >> @@ -150,6 +150,7 @@ struct btrfs_device { >> /* Counter to record the change of device stats */ >> atomic_t dev_stats_ccnt; >> atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX]; >> + int bdev_closing; >> }; >> >> /* >> -- >> 2.7.0 > > I gave this a try and somehow it seems to make unmounting worse: > it now always takes ~5s (max retry time) and I see the warning every > time. Without the patch unmounting a single volume (disk) is much > faster (1-2s), without problems. Thanks Holger, for testing. It depends on long the blkdev_put() will take, originally unmount thread didn't wait for it to complete, so it was faster, but had other problem as explained. Thanks, Anand > Any ideas? > cheers, > Holger ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v2 2/2] btrfs: wait for bdev put 2016-06-14 10:55 ` [PATCH 2/2] btrfs: wait for bdev put Anand Jain 2016-06-18 16:34 ` Holger Hoffstätte @ 2016-06-21 10:24 ` Anand Jain 2016-06-21 11:46 ` Holger Hoffstätte 2016-06-21 13:00 ` Chris Mason 2016-06-23 12:54 ` [PATCH v3 2/2] btrfs: make sure device is synced before return Anand Jain 2 siblings, 2 replies; 21+ messages in thread From: Anand Jain @ 2016-06-21 10:24 UTC (permalink / raw) To: linux-btrfs; +Cc: holger, clm, dsterba, xiaolong.ye From: Anand Jain <Anand.Jain@oracle.com> Further to the commit bc178622d40d87e75abc131007342429c9b03351 btrfs: use rcu_barrier() to wait for bdev puts at unmount This patch implements a method to time wait on the __free_device() which actually does the bdev put. This is needed as the user space running 'btrfs fi show -d' immediately after the replace and unmount, is still reading older information from the device. mail-archive.com/linux-btrfs@vger.kernel.org/msg54188.html Signed-off-by: Anand Jain <anand.jain@oracle.com> [updates: bc178622d40d87e75abc131007342429c9b03351] --- v2: Also to make sure bdev_closing is set it needs rcu_barrier(), restored rcu_barrier(). fs/btrfs/volumes.c | 45 +++++++++++++++++++++++++++++++++++++++++++-- fs/btrfs/volumes.h | 1 + 2 files changed, 44 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 604daf315669..ef61c34cafbf 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -27,6 +27,7 @@ #include <linux/raid/pq.h> #include <linux/semaphore.h> #include <linux/uuid.h> +#include <linux/delay.h> #include <asm/div64.h> #include "ctree.h" #include "extent_map.h" @@ -254,6 +255,17 @@ static struct btrfs_device *__alloc_device(void) return dev; } +static int is_device_closing(struct list_head *head) +{ + struct btrfs_device *dev; + + list_for_each_entry(dev, head, dev_list) { + if (dev->bdev_closing) + return 1; + } + return 0; +} + static noinline struct btrfs_device *__find_device(struct list_head *head, u64 devid, u8 *uuid) { @@ -832,12 +844,22 @@ again: static void __free_device(struct work_struct *work) { struct btrfs_device *device; + struct btrfs_device *new_device_addr; device = container_of(work, struct btrfs_device, rcu_work); if (device->bdev) blkdev_put(device->bdev, device->mode); + /* + * If we are coming here from btrfs_close_one_device() + * then it allocates a new device structure for the same + * devid, so find device again with the devid + */ + new_device_addr = __find_device(&device->fs_devices->devices, + device->devid, NULL); + + new_device_addr->bdev_closing = 0; rcu_string_free(device->name); kfree(device); } @@ -884,6 +906,12 @@ static void btrfs_close_one_device(struct btrfs_device *device) list_replace_rcu(&device->dev_list, &new_device->dev_list); new_device->fs_devices = device->fs_devices; + /* + * So to wait for kworkers to finish all blkdev_puts, + * so device is really free when umount is done. + */ + new_device->bdev_closing = 1; + call_rcu(&device->rcu, free_device); } @@ -912,6 +940,7 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices) { struct btrfs_fs_devices *seed_devices = NULL; int ret; + int retry_cnt = 5; mutex_lock(&uuid_mutex); ret = __btrfs_close_devices(fs_devices); @@ -929,10 +958,22 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices) } /* * Wait for rcu kworkers under __btrfs_close_devices - * to finish all blkdev_puts so device is really - * free when umount is done. + * to finish all free_device() */ rcu_barrier(); + + /* + * Wait for a grace period so that __free_device() + * will actaully do the device close. + */ + while (is_device_closing(&fs_devices->devices) && + --retry_cnt) { + mdelay(1000); //1 sec + } + + if (!(retry_cnt > 0)) + printk(KERN_WARNING "BTRFS: %pU bdev_put didn't complete, giving up\n", + fs_devices->fsid); return ret; } diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 0ac90f8d85bd..945e49f5e17d 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -150,6 +150,7 @@ struct btrfs_device { /* Counter to record the change of device stats */ atomic_t dev_stats_ccnt; atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX]; + int bdev_closing; }; /* -- 2.7.0 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v2 2/2] btrfs: wait for bdev put 2016-06-21 10:24 ` [PATCH v2 " Anand Jain @ 2016-06-21 11:46 ` Holger Hoffstätte 2016-06-21 13:00 ` Chris Mason 1 sibling, 0 replies; 21+ messages in thread From: Holger Hoffstätte @ 2016-06-21 11:46 UTC (permalink / raw) To: Anand Jain, linux-btrfs; +Cc: clm, dsterba, xiaolong.ye On 06/21/16 12:24, Anand Jain wrote: > From: Anand Jain <Anand.Jain@oracle.com> > > Further to the commit > bc178622d40d87e75abc131007342429c9b03351 > btrfs: use rcu_barrier() to wait for bdev puts at unmount > > This patch implements a method to time wait on the __free_device() > which actually does the bdev put. This is needed as the user space > running 'btrfs fi show -d' immediately after the replace and > unmount, is still reading older information from the device. > > mail-archive.com/linux-btrfs@vger.kernel.org/msg54188.html > > Signed-off-by: Anand Jain <anand.jain@oracle.com> > [updates: bc178622d40d87e75abc131007342429c9b03351] > --- > v2: Also to make sure bdev_closing is set it needs rcu_barrier(), > restored rcu_barrier(). Looks like this one works reliably again. ;) Tested with a slow disk, no long unmounts or timeout messages. Tested-by: Holger Hoffstätte <holger.hoffstaette@applied-asynchrony.com> thanks! Holger ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 2/2] btrfs: wait for bdev put 2016-06-21 10:24 ` [PATCH v2 " Anand Jain 2016-06-21 11:46 ` Holger Hoffstätte @ 2016-06-21 13:00 ` Chris Mason 2016-06-22 10:18 ` Anand Jain 1 sibling, 1 reply; 21+ messages in thread From: Chris Mason @ 2016-06-21 13:00 UTC (permalink / raw) To: Anand Jain, linux-btrfs; +Cc: holger, dsterba, xiaolong.ye On 06/21/2016 06:24 AM, Anand Jain wrote: > From: Anand Jain <Anand.Jain@oracle.com> > > Further to the commit > bc178622d40d87e75abc131007342429c9b03351 > btrfs: use rcu_barrier() to wait for bdev puts at unmount > > This patch implements a method to time wait on the __free_device() > which actually does the bdev put. This is needed as the user space > running 'btrfs fi show -d' immediately after the replace and > unmount, is still reading older information from the device. Thanks for working on this Anand. Since it looks like blkdev_put can deadlock against us, can we please switch to making sure we fully flush the outstanding IO? It's probably enough to do a sync_blockdev() call before we allow the unmount to finish, but we can toss in an invalidate_bdev for good measure. Then we can get rid of the mdelay loop completely, which seems pretty error prone to me. Thanks! -chris ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 2/2] btrfs: wait for bdev put 2016-06-21 13:00 ` Chris Mason @ 2016-06-22 10:18 ` Anand Jain 2016-06-22 21:47 ` Chris Mason 0 siblings, 1 reply; 21+ messages in thread From: Anand Jain @ 2016-06-22 10:18 UTC (permalink / raw) To: Chris Mason, linux-btrfs; +Cc: holger, dsterba, xiaolong.ye Thanks for the review Chris. On 06/21/2016 09:00 PM, Chris Mason wrote: > On 06/21/2016 06:24 AM, Anand Jain wrote: >> From: Anand Jain <Anand.Jain@oracle.com> >> >> Further to the commit >> bc178622d40d87e75abc131007342429c9b03351 >> btrfs: use rcu_barrier() to wait for bdev puts at unmount >> >> This patch implements a method to time wait on the __free_device() >> which actually does the bdev put. This is needed as the user space >> running 'btrfs fi show -d' immediately after the replace and >> unmount, is still reading older information from the device. > > Thanks for working on this Anand. Since it looks like blkdev_put can > deadlock against us, can we please switch to making sure we fully flush > the outstanding IO? It's probably enough to do a sync_blockdev() call > before we allow the unmount to finish, but we can toss in an > invalidate_bdev for good measure. ------------ # git diff diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 604daf315669..e0ad29d6fe9a 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -870,6 +870,11 @@ static void btrfs_close_one_device(struct btrfs_device *device) if (device->missing) fs_devices->missing_devices--; + if (device->bdev && device->writeable) { + sync_blockdev(device->bdev); + invalidate_bdev(device->bdev); + } + new_device = btrfs_alloc_device(NULL, &device->devid, device->uuid); BUG_ON(IS_ERR(new_device)); /* -ENOMEM */ ----------- However, theoretically still there might be a problem - at the end of unmount, if the device exclusive open is not actually closed, then there might be a race with another program which is trying to open the device in exclusive mode. Like for eg: unmount /btrfs; fsck /dev/X and here fsck might fail to open the device if it wins the race. > Then we can get rid of the mdelay loop completely, which seems pretty > error prone to me. Yes, the code got little complex here (and also when sysfs fixes were wrote) as struct btrfs_device is getting migrated to a new struct btrfs_device at unmount, which I don't think was necessary? Thanks, Anand > Thanks! > > -chris > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v2 2/2] btrfs: wait for bdev put 2016-06-22 10:18 ` Anand Jain @ 2016-06-22 21:47 ` Chris Mason 2016-06-23 13:07 ` Anand Jain 0 siblings, 1 reply; 21+ messages in thread From: Chris Mason @ 2016-06-22 21:47 UTC (permalink / raw) To: Anand Jain; +Cc: linux-btrfs, holger, dsterba, xiaolong.ye On Wed, Jun 22, 2016 at 6:18 AM, Anand Jain <anand.jain@oracle.com> wrote: > > > Thanks for the review Chris. > > On 06/21/2016 09:00 PM, Chris Mason wrote: >> On 06/21/2016 06:24 AM, Anand Jain wrote: >>> From: Anand Jain <Anand.Jain@oracle.com> >>> >>> Further to the commit >>> bc178622d40d87e75abc131007342429c9b03351 >>> btrfs: use rcu_barrier() to wait for bdev puts at unmount >>> >>> This patch implements a method to time wait on the __free_device() >>> which actually does the bdev put. This is needed as the user space >>> running 'btrfs fi show -d' immediately after the replace and >>> unmount, is still reading older information from the device. >> >> Thanks for working on this Anand. Since it looks like blkdev_put can >> deadlock against us, can we please switch to making sure we fully >> flush >> the outstanding IO? It's probably enough to do a sync_blockdev() >> call >> before we allow the unmount to finish, but we can toss in an >> invalidate_bdev for good measure. > > > ------------ > # git diff > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index 604daf315669..e0ad29d6fe9a 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -870,6 +870,11 @@ static void btrfs_close_one_device(struct > btrfs_device *device) > if (device->missing) > fs_devices->missing_devices--; > > + if (device->bdev && device->writeable) { > + sync_blockdev(device->bdev); > + invalidate_bdev(device->bdev); > + } > + > new_device = btrfs_alloc_device(NULL, &device->devid, > device->uuid); > BUG_ON(IS_ERR(new_device)); /* -ENOMEM */ > ----------- > > > However, theoretically still there might be a problem - at the end of > unmount, if the device exclusive open is not actually closed, then > there might be a race with another program which is trying to open > the device in exclusive mode. Like for eg: > unmount /btrfs; fsck /dev/X > and here fsck might fail to open the device if it wins the race. This true, but at least we know he'll have up to date buffers if he does manage to open the device. With the generic code, the blkdev_put happens after the super is gone, so I'm not sure we can completely fix this from inside our callback. -chris ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 2/2] btrfs: wait for bdev put 2016-06-22 21:47 ` Chris Mason @ 2016-06-23 13:07 ` Anand Jain 0 siblings, 0 replies; 21+ messages in thread From: Anand Jain @ 2016-06-23 13:07 UTC (permalink / raw) To: Chris Mason; +Cc: linux-btrfs, holger, dsterba, xiaolong.ye On 06/23/2016 05:47 AM, Chris Mason wrote: > > > On Wed, Jun 22, 2016 at 6:18 AM, Anand Jain <anand.jain@oracle.com> wrote: >> >> >> Thanks for the review Chris. >> >> On 06/21/2016 09:00 PM, Chris Mason wrote: >>> On 06/21/2016 06:24 AM, Anand Jain wrote: >>>> From: Anand Jain <Anand.Jain@oracle.com> >>>> >>>> Further to the commit >>>> bc178622d40d87e75abc131007342429c9b03351 >>>> btrfs: use rcu_barrier() to wait for bdev puts at unmount >>>> >>>> This patch implements a method to time wait on the __free_device() >>>> which actually does the bdev put. This is needed as the user space >>>> running 'btrfs fi show -d' immediately after the replace and >>>> unmount, is still reading older information from the device. >>> >>> Thanks for working on this Anand. Since it looks like blkdev_put can >>> deadlock against us, can we please switch to making sure we fully flush >>> the outstanding IO? It's probably enough to do a sync_blockdev() call >>> before we allow the unmount to finish, but we can toss in an >>> invalidate_bdev for good measure. >> >> >> ------------ >> # git diff >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c >> index 604daf315669..e0ad29d6fe9a 100644 >> --- a/fs/btrfs/volumes.c >> +++ b/fs/btrfs/volumes.c >> @@ -870,6 +870,11 @@ static void btrfs_close_one_device(struct >> btrfs_device *device) >> if (device->missing) >> fs_devices->missing_devices--; >> >> + if (device->bdev && device->writeable) { >> + sync_blockdev(device->bdev); >> + invalidate_bdev(device->bdev); >> + } >> + >> new_device = btrfs_alloc_device(NULL, &device->devid, >> device->uuid); >> BUG_ON(IS_ERR(new_device)); /* -ENOMEM */ >> ----------- >> >> >> However, theoretically still there might be a problem - at the end of >> unmount, if the device exclusive open is not actually closed, then >> there might be a race with another program which is trying to open >> the device in exclusive mode. Like for eg: >> unmount /btrfs; fsck /dev/X >> and here fsck might fail to open the device if it wins the race. > > This true, but at least we know he'll have up to date buffers if he does > manage to open the device. > > With the generic code, the blkdev_put happens after the super is gone, > so I'm not sure we can completely fix this from inside our callback. Makes sense, sent out v3 with title (btrfs: make sure device is synced before return) Also sent out RFC patch btrfs: make sure device is synced before return where I have tried not to background blkdev_put(), which seems to be better, it works fine per fstests. Thanks, Anand > -chris > > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v3 2/2] btrfs: make sure device is synced before return 2016-06-14 10:55 ` [PATCH 2/2] btrfs: wait for bdev put Anand Jain 2016-06-18 16:34 ` Holger Hoffstätte 2016-06-21 10:24 ` [PATCH v2 " Anand Jain @ 2016-06-23 12:54 ` Anand Jain 2016-06-23 14:27 ` Chris Mason 2016-07-08 14:13 ` David Sterba 2 siblings, 2 replies; 21+ messages in thread From: Anand Jain @ 2016-06-23 12:54 UTC (permalink / raw) To: linux-btrfs; +Cc: dsterba, holger, clm, xiaolong.ye An inconsistent behavior due to stale reads from the disk was reported mail-archive.com/linux-btrfs@vger.kernel.org/msg54188.html This patch will make sure devices are synced before return in the unmount thread. Signed-off-by: Anand Jain <anand.jain@oracle.com> --- v2: Also to make sure bdev_closing is set it needs rcu_barrier(), restored rcu_barrier(). v3: Removed the complicated timed-wait on blkdev_put code, but make synced and invalidated before umount is returned. So sorry that I have to change the title of this patch as well (was: btrfs: wait for bdev put). fs/btrfs/volumes.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 604daf315669..f741ade130a4 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -870,6 +870,11 @@ static void btrfs_close_one_device(struct btrfs_device *device) if (device->missing) fs_devices->missing_devices--; + if (device->bdev && device->writeable) { + sync_blockdev(device->bdev); + invalidate_bdev(device->bdev); + } + new_device = btrfs_alloc_device(NULL, &device->devid, device->uuid); BUG_ON(IS_ERR(new_device)); /* -ENOMEM */ -- 2.7.0 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v3 2/2] btrfs: make sure device is synced before return 2016-06-23 12:54 ` [PATCH v3 2/2] btrfs: make sure device is synced before return Anand Jain @ 2016-06-23 14:27 ` Chris Mason 2016-07-08 14:13 ` David Sterba 1 sibling, 0 replies; 21+ messages in thread From: Chris Mason @ 2016-06-23 14:27 UTC (permalink / raw) To: Anand Jain, linux-btrfs; +Cc: dsterba, holger, xiaolong.ye On 06/23/2016 08:54 AM, Anand Jain wrote: > An inconsistent behavior due to stale reads from the > disk was reported > > mail-archive.com/linux-btrfs@vger.kernel.org/msg54188.html > > This patch will make sure devices are synced before > return in the unmount thread. > > Signed-off-by: Anand Jain <anand.jain@oracle.com> > --- > v2: Also to make sure bdev_closing is set it needs rcu_barrier(), > restored rcu_barrier(). > v3: Removed the complicated timed-wait on blkdev_put code, > but make synced and invalidated before umount is returned. > So sorry that I have to change the title of this patch > as well (was: btrfs: wait for bdev put). > > fs/btrfs/volumes.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index 604daf315669..f741ade130a4 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -870,6 +870,11 @@ static void btrfs_close_one_device(struct btrfs_device *device) > if (device->missing) > fs_devices->missing_devices--; > > + if (device->bdev && device->writeable) { > + sync_blockdev(device->bdev); > + invalidate_bdev(device->bdev); > + } > + > new_device = btrfs_alloc_device(NULL, &device->devid, > device->uuid); > BUG_ON(IS_ERR(new_device)); /* -ENOMEM */ > This should do it,I'll check with the xfstest I was able to reliably get errors on. Thanks! -chris ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 2/2] btrfs: make sure device is synced before return 2016-06-23 12:54 ` [PATCH v3 2/2] btrfs: make sure device is synced before return Anand Jain 2016-06-23 14:27 ` Chris Mason @ 2016-07-08 14:13 ` David Sterba 1 sibling, 0 replies; 21+ messages in thread From: David Sterba @ 2016-07-08 14:13 UTC (permalink / raw) To: Anand Jain; +Cc: linux-btrfs, dsterba, holger, clm, xiaolong.ye On Thu, Jun 23, 2016 at 08:54:07PM +0800, Anand Jain wrote: > An inconsistent behavior due to stale reads from the > disk was reported > > mail-archive.com/linux-btrfs@vger.kernel.org/msg54188.html > > This patch will make sure devices are synced before > return in the unmount thread. > > Signed-off-by: Anand Jain <anand.jain@oracle.com> Added to for-next. ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2016-07-08 14:13 UTC | newest] Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-05-26 9:27 [PULL] Btrfs for 4.7, part 2 David Sterba 2016-05-27 0:14 ` Chris Mason 2016-05-27 11:18 ` David Sterba 2016-05-27 14:35 ` Chris Mason 2016-05-27 15:42 ` Chris Mason 2016-05-28 5:14 ` Anand Jain 2016-05-29 12:21 ` Chris Mason 2016-06-14 10:52 ` Anand Jain 2016-06-14 10:55 ` [PATCH 1/2] btrfs: reorg btrfs_close_one_device() Anand Jain 2016-06-14 10:55 ` [PATCH 2/2] btrfs: wait for bdev put Anand Jain 2016-06-18 16:34 ` Holger Hoffstätte 2016-06-20 8:33 ` Anand Jain 2016-06-21 10:24 ` [PATCH v2 " Anand Jain 2016-06-21 11:46 ` Holger Hoffstätte 2016-06-21 13:00 ` Chris Mason 2016-06-22 10:18 ` Anand Jain 2016-06-22 21:47 ` Chris Mason 2016-06-23 13:07 ` Anand Jain 2016-06-23 12:54 ` [PATCH v3 2/2] btrfs: make sure device is synced before return Anand Jain 2016-06-23 14:27 ` Chris Mason 2016-07-08 14:13 ` David Sterba
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.