All of lore.kernel.org
 help / color / mirror / Atom feed
* [PULL] Btrfs for 4.7, part 2
@ 2016-05-26  9:27 David Sterba
  2016-05-27  0:14 ` Chris Mason
  0 siblings, 1 reply; 21+ messages in thread
From: David Sterba @ 2016-05-26  9:27 UTC (permalink / raw)
  To: clm; +Cc: linux-btrfs, David Sterba

Hi,

please pull a few more patches that did not go to pull #1 for 4.7, minor
cleanups and fixes. Thanks.


The following changes since commit c315ef8d9db7f1a0ebd023a395ebdfde1c68057e:

  Merge branch 'for-chris-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.7 (2016-05-17 14:43:19 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-chris

for you to fetch changes up to 4c6143dd497901e3537dc4324dc203dfda442009:

  Merge branch 'dev/comp-workspaces' into for-chris-4.7-20160525 (2016-05-25 22:51:04 +0200)

----------------------------------------------------------------
David Sterba (17):
  btrfs: rename and document compression workspace members
  btrfs: preallocate compression workspaces
  btrfs: make find_workspace always succeed
  btrfs: make find_workspace warn if there are no workspaces
  btrfs: sink gfp parameter to set_extent_bits
  btrfs: sink gfp parameter to clear_extent_bits
  btrfs: sink gfp parameter to clear_record_extent_bits
  btrfs: sink gfp parameter to clear_extent_dirty
  btrfs: sink gfp parameter to set_extent_delalloc
  btrfs: sink gfp parameter to set_extent_defrag
  btrfs: sink gfp parameter to set_extent_new
  btrfs: sink gfp parameter to set_record_extent_bits
  btrfs: untangle gotos a bit in __set_extent_bit
  btrfs: untangle gotos a bit in __clear_extent_bit
  btrfs: untangle gotos a bit in convert_extent_bit
  btrfs: make state preallocation more speculative in __set_extent_bit
  btrfs: sink gfp parameter to convert_extent_bit

Liu Bo (2):
  Btrfs: free sys_array eb as soon as possible
  Btrfs: fix unexpected return value of fiemap

Nicholas D Steeves (1):
  btrfs: fix string and comment grammatical issues and typos

Zhao Lei (1):
  btrfs: scrub: Set bbio to NULL before calling btrfs_map_block


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PULL] Btrfs for 4.7, part 2
  2016-05-26  9:27 [PULL] Btrfs for 4.7, part 2 David Sterba
@ 2016-05-27  0:14 ` Chris Mason
  2016-05-27 11:18   ` David Sterba
  0 siblings, 1 reply; 21+ messages in thread
From: Chris Mason @ 2016-05-27  0:14 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs

On Thu, May 26, 2016 at 11:27:06AM +0200, David Sterba wrote:
> Hi,
> 
> please pull a few more patches that did not go to pull #1 for 4.7, minor
> cleanups and fixes. Thanks.

Thanks Dave!  Trying to figure out why we're failing btrfs/011, but I
don't see how it could be related to this bunch.  I'll nail it down.

-chris

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PULL] Btrfs for 4.7, part 2
  2016-05-27  0:14 ` Chris Mason
@ 2016-05-27 11:18   ` David Sterba
  2016-05-27 14:35     ` Chris Mason
  0 siblings, 1 reply; 21+ messages in thread
From: David Sterba @ 2016-05-27 11:18 UTC (permalink / raw)
  To: Chris Mason, David Sterba, linux-btrfs

On Thu, May 26, 2016 at 08:14:14PM -0400, Chris Mason wrote:
> On Thu, May 26, 2016 at 11:27:06AM +0200, David Sterba wrote:
> > Hi,
> > 
> > please pull a few more patches that did not go to pull #1 for 4.7, minor
> > cleanups and fixes. Thanks.
> 
> Thanks Dave!  Trying to figure out why we're failing btrfs/011, but I
> don't see how it could be related to this bunch.  I'll nail it down.

011 passes here, there are some unrelated soft-failures (mismatching
output with new progs). I'm now testing a branch without "btrfs: scrub:
Set bbio to NULL before calling btrfs_map_block", that seems to be the
only likely offender.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PULL] Btrfs for 4.7, part 2
  2016-05-27 11:18   ` David Sterba
@ 2016-05-27 14:35     ` Chris Mason
  2016-05-27 15:42       ` Chris Mason
  0 siblings, 1 reply; 21+ messages in thread
From: Chris Mason @ 2016-05-27 14:35 UTC (permalink / raw)
  To: dsterba, David Sterba, linux-btrfs

On Fri, May 27, 2016 at 01:18:22PM +0200, David Sterba wrote:
> On Thu, May 26, 2016 at 08:14:14PM -0400, Chris Mason wrote:
> > On Thu, May 26, 2016 at 11:27:06AM +0200, David Sterba wrote:
> > > Hi,
> > > 
> > > please pull a few more patches that did not go to pull #1 for 4.7, minor
> > > cleanups and fixes. Thanks.
> > 
> > Thanks Dave!  Trying to figure out why we're failing btrfs/011, but I
> > don't see how it could be related to this bunch.  I'll nail it down.
> 
> 011 passes here, there are some unrelated soft-failures (mismatching
> output with new progs). I'm now testing a branch without "btrfs: scrub:
> Set bbio to NULL before calling btrfs_map_block", that seems to be the
> only likely offender.

I'm getting errors from btrfs fi show -d, after the very last round of
device replaces.  A little extra debugging:

bytenr mismatch, want=4332716032, have=0
ERROR: cannot read chunk root
ERROR reading /dev/vdh
failed /dev/vdh

Which is cute because the very next command we run fscks /dev/vdh and
succeeds.

So the page cache is stale and this isn't related to any of our patches.

-chris


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PULL] Btrfs for 4.7, part 2
  2016-05-27 14:35     ` Chris Mason
@ 2016-05-27 15:42       ` Chris Mason
  2016-05-28  5:14         ` Anand Jain
  0 siblings, 1 reply; 21+ messages in thread
From: Chris Mason @ 2016-05-27 15:42 UTC (permalink / raw)
  To: dsterba, David Sterba, linux-btrfs

On Fri, May 27, 2016 at 10:35:27AM -0400, Chris Mason wrote:
> On Fri, May 27, 2016 at 01:18:22PM +0200, David Sterba wrote:
> > On Thu, May 26, 2016 at 08:14:14PM -0400, Chris Mason wrote:
> > > On Thu, May 26, 2016 at 11:27:06AM +0200, David Sterba wrote:
> > > > Hi,
> > > > 
> > > > please pull a few more patches that did not go to pull #1 for 4.7, minor
> > > > cleanups and fixes. Thanks.
> > > 
> > > Thanks Dave!  Trying to figure out why we're failing btrfs/011, but I
> > > don't see how it could be related to this bunch.  I'll nail it down.
> > 
> > 011 passes here, there are some unrelated soft-failures (mismatching
> > output with new progs). I'm now testing a branch without "btrfs: scrub:
> > Set bbio to NULL before calling btrfs_map_block", that seems to be the
> > only likely offender.
> 
> I'm getting errors from btrfs fi show -d, after the very last round of
> device replaces.  A little extra debugging:
> 
> bytenr mismatch, want=4332716032, have=0
> ERROR: cannot read chunk root
> ERROR reading /dev/vdh
> failed /dev/vdh
> 
> Which is cute because the very next command we run fscks /dev/vdh and
> succeeds.
> 
> So the page cache is stale and this isn't related to any of our patches.

close_ctree() calls into btrfs_close_devices(), which calls
btrfs_close_one_device(), which uses:

call_rcu(&device->rcu, free_device);

close_ctree() also does an rcu_barrier() to make sure and wait for
free_device() to finish.

But, free_device() just puts the work into schedule_work(), so we don't
know for sure the blkdev_put is done when we exit.

It's been this way for a while, so its not holding up my pull request to
Linus.  But I'll fix it up.

-chris


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PULL] Btrfs for 4.7, part 2
  2016-05-27 15:42       ` Chris Mason
@ 2016-05-28  5:14         ` Anand Jain
  2016-05-29 12:21           ` Chris Mason
  0 siblings, 1 reply; 21+ messages in thread
From: Anand Jain @ 2016-05-28  5:14 UTC (permalink / raw)
  To: Chris Mason, dsterba, David Sterba, linux-btrfs



On 05/27/2016 11:42 PM, Chris Mason wrote:
> On Fri, May 27, 2016 at 10:35:27AM -0400, Chris Mason wrote:
>> On Fri, May 27, 2016 at 01:18:22PM +0200, David Sterba wrote:
>>> On Thu, May 26, 2016 at 08:14:14PM -0400, Chris Mason wrote:
>>>> On Thu, May 26, 2016 at 11:27:06AM +0200, David Sterba wrote:
>>>>> Hi,
>>>>>
>>>>> please pull a few more patches that did not go to pull #1 for 4.7, minor
>>>>> cleanups and fixes. Thanks.
>>>>
>>>> Thanks Dave!  Trying to figure out why we're failing btrfs/011, but I
>>>> don't see how it could be related to this bunch.  I'll nail it down.
>>>
>>> 011 passes here, there are some unrelated soft-failures (mismatching
>>> output with new progs). I'm now testing a branch without "btrfs: scrub:
>>> Set bbio to NULL before calling btrfs_map_block", that seems to be the
>>> only likely offender.
>>
>> I'm getting errors from btrfs fi show -d, after the very last round of
>> device replaces.  A little extra debugging:
>>
>> bytenr mismatch, want=4332716032, have=0
>> ERROR: cannot read chunk root
>> ERROR reading /dev/vdh
>> failed /dev/vdh
 >>
>> Which is cute because the very next command we run fscks /dev/vdh and
>> succeeds.

Checked the code paths both btrfs fi show -d and btrfs check,
both are calling flush during relative open_ctree in progs.

However the flush is called after we have read superblock. That
means the read_superblock during 'show' cli (only) will read superblock
without flush, and 'check' won't, because 011 calls 'check' after
'show'. But it still does not explain the above error, which is
during open_ctree not at read superblock. Remains strange case as
of now.

Also. I can't reproduce.

>> So the page cache is stale and this isn't related to any of our patches.
>
> close_ctree() calls into btrfs_close_devices(), which calls
> btrfs_close_one_device(), which uses:
>
> call_rcu(&device->rcu, free_device);
>
> close_ctree() also does an rcu_barrier() to make sure and wait for
> free_device() to finish.
>
> But, free_device() just puts the work into schedule_work(), so we don't
> know for sure the blkdev_put is done when we exit.

  Right, saw that before. Any idea why its like that ? Or if it
  should be fixed?

> It's been this way for a while, so its not holding up my pull request to
> Linus.  But I'll fix it up.

  Yes. Its been like that.

Thanks, Anand


> -chris
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PULL] Btrfs for 4.7, part 2
  2016-05-28  5:14         ` Anand Jain
@ 2016-05-29 12:21           ` Chris Mason
  2016-06-14 10:52             ` Anand Jain
  0 siblings, 1 reply; 21+ messages in thread
From: Chris Mason @ 2016-05-29 12:21 UTC (permalink / raw)
  To: Anand Jain; +Cc: dsterba, David Sterba, linux-btrfs

On Sat, May 28, 2016 at 01:14:13PM +0800, Anand Jain wrote:
>
>
>On 05/27/2016 11:42 PM, Chris Mason wrote:
>>>I'm getting errors from btrfs fi show -d, after the very last round of
>>>device replaces.  A little extra debugging:
>>>
>>>bytenr mismatch, want=4332716032, have=0
>>>ERROR: cannot read chunk root
>>>ERROR reading /dev/vdh
>>>failed /dev/vdh
>>>
>>>Which is cute because the very next command we run fscks /dev/vdh and
>>>succeeds.
>
>Checked the code paths both btrfs fi show -d and btrfs check,
>both are calling flush during relative open_ctree in progs.
>
>However the flush is called after we have read superblock. That
>means the read_superblock during 'show' cli (only) will read superblock
>without flush, and 'check' won't, because 011 calls 'check' after
>'show'. But it still does not explain the above error, which is
>during open_ctree not at read superblock. Remains strange case as
>of now.

It's because we're just not done writing it out yet when btrfs fi show is run.
I think replace is special here.

>
>Also. I can't reproduce.
>

I'm in a relatively new test rig using kvm, which probably explains why
I haven't seen it before.  You can probably make it easier by adding
a sleep inside the actual __free_device() func.

>>>So the page cache is stale and this isn't related to any of our patches.
>>
>>close_ctree() calls into btrfs_close_devices(), which calls
>>btrfs_close_one_device(), which uses:
>>
>>call_rcu(&device->rcu, free_device);
>>
>>close_ctree() also does an rcu_barrier() to make sure and wait for
>>free_device() to finish.
>>
>>But, free_device() just puts the work into schedule_work(), so we don't
>>know for sure the blkdev_put is done when we exit.
>
> Right, saw that before. Any idea why its like that ? Or if it
> should be fixed?

It's just trying to limit the work that is done from call_rcu, and it should
definitely be fixed.  It might cause EBUSY or other problems.  Probably
easiest to add a counter or completion object that gets changed by the
__free_device function.

-chris

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PULL] Btrfs for 4.7, part 2
  2016-05-29 12:21           ` Chris Mason
@ 2016-06-14 10:52             ` Anand Jain
  2016-06-14 10:55               ` [PATCH 1/2] btrfs: reorg btrfs_close_one_device() Anand Jain
  0 siblings, 1 reply; 21+ messages in thread
From: Anand Jain @ 2016-06-14 10:52 UTC (permalink / raw)
  To: Chris Mason, dsterba, David Sterba, linux-btrfs


Chris,

   Sorry for the delay due to vacation.

more below..

On 05/29/2016 08:21 PM, Chris Mason wrote:
> On Sat, May 28, 2016 at 01:14:13PM +0800, Anand Jain wrote:
>>
>>
>> On 05/27/2016 11:42 PM, Chris Mason wrote:
>>>> I'm getting errors from btrfs fi show -d, after the very last round of
>>>> device replaces.  A little extra debugging:
>>>>
>>>> bytenr mismatch, want=4332716032, have=0
>>>> ERROR: cannot read chunk root
>>>> ERROR reading /dev/vdh
>>>> failed /dev/vdh
>>>>
>>>> Which is cute because the very next command we run fscks /dev/vdh and
>>>> succeeds.
>>
>> Checked the code paths both btrfs fi show -d and btrfs check,
>> both are calling flush during relative open_ctree in progs.
>>
>> However the flush is called after we have read superblock. That
>> means the read_superblock during 'show' cli (only) will read superblock
>> without flush, and 'check' won't, because 011 calls 'check' after
>> 'show'. But it still does not explain the above error, which is
>> during open_ctree not at read superblock. Remains strange case as
>> of now.
>
> It's because we're just not done writing it out yet when btrfs fi show
> is run.
> I think replace is special here.
>
>>
>> Also. I can't reproduce.
>>
>
> I'm in a relatively new test rig using kvm, which probably explains why
> I haven't seen it before.  You can probably make it easier by adding
> a sleep inside the actual __free_device() func.
>
>>>> So the page cache is stale and this isn't related to any of our
>>>> patches.
>>>
>>> close_ctree() calls into btrfs_close_devices(), which calls
>>> btrfs_close_one_device(), which uses:
>>>
>>> call_rcu(&device->rcu, free_device);
>>>
>>> close_ctree() also does an rcu_barrier() to make sure and wait for
>>> free_device() to finish.
>>>
>>> But, free_device() just puts the work into schedule_work(), so we don't
>>> know for sure the blkdev_put is done when we exit.
>>
>> Right, saw that before. Any idea why its like that ? Or if it
>> should be fixed?
>
> It's just trying to limit the work that is done from call_rcu, and it
> should
> definitely be fixed.  It might cause EBUSY or other problems.  Probably
> easiest to add a counter or completion object that gets changed by the
> __free_device function.


yes indeed sleep made the problem to reproduce,

Also looks like this problem was identified by below
commit before, however the fix wasn't correct.
    ----
      commit bc178622d40d87e75abc131007342429c9b03351
      btrfs: use rcu_barrier() to wait for bdev puts at unmount

      ::
      Adding an rcu_barrier() to btrfs_close_devices() causes unmount
      to wait
      until all blkdev_put()s are done, and the device is truly free once
      unmount complet
    ----

  As free_devces() spinoff __free_device() to make the actual
  bdev put we need to wait on __free_device(). But rcu_barrier()
  just waits for free_device() to complete, so at the end of
  rcu_barrier() the blkdev_put()  may not be completed.


  Wrote a new fix as in the patches,
   [PATH 2/2] btrfs: wait for bdev put

  For review comments.


Thanks, -Anand

> -chris

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/2] btrfs: reorg btrfs_close_one_device()
  2016-06-14 10:52             ` Anand Jain
@ 2016-06-14 10:55               ` Anand Jain
  2016-06-14 10:55                 ` [PATCH 2/2] btrfs: wait for bdev put Anand Jain
  0 siblings, 1 reply; 21+ messages in thread
From: Anand Jain @ 2016-06-14 10:55 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm

Moves closer to the caller and removes declaration

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/volumes.c | 71 +++++++++++++++++++++++++++---------------------------
 1 file changed, 35 insertions(+), 36 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a637e99e4c6b..a4e8d48acd4b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -140,7 +140,6 @@ static int btrfs_relocate_sys_chunks(struct btrfs_root *root);
 static void __btrfs_reset_dev_stats(struct btrfs_device *dev);
 static void btrfs_dev_stat_print_on_error(struct btrfs_device *dev);
 static void btrfs_dev_stat_print_on_load(struct btrfs_device *device);
-static void btrfs_close_one_device(struct btrfs_device *device);
 
 DEFINE_MUTEX(uuid_mutex);
 static LIST_HEAD(fs_uuids);
@@ -853,6 +852,41 @@ static void free_device(struct rcu_head *head)
 	schedule_work(&device->rcu_work);
 }
 
+static void btrfs_close_one_device(struct btrfs_device *device)
+{
+	struct btrfs_fs_devices *fs_devices = device->fs_devices;
+	struct btrfs_device *new_device;
+	struct rcu_string *name;
+
+	if (device->bdev)
+		fs_devices->open_devices--;
+
+	if (device->writeable &&
+	    device->devid != BTRFS_DEV_REPLACE_DEVID) {
+		list_del_init(&device->dev_alloc_list);
+		fs_devices->rw_devices--;
+	}
+
+	if (device->missing)
+		fs_devices->missing_devices--;
+
+	new_device = btrfs_alloc_device(NULL, &device->devid,
+					device->uuid);
+	BUG_ON(IS_ERR(new_device)); /* -ENOMEM */
+
+	/* Safe because we are under uuid_mutex */
+	if (device->name) {
+		name = rcu_string_strdup(device->name->str, GFP_NOFS);
+		BUG_ON(!name); /* -ENOMEM */
+		rcu_assign_pointer(new_device->name, name);
+	}
+
+	list_replace_rcu(&device->dev_list, &new_device->dev_list);
+	new_device->fs_devices = device->fs_devices;
+
+	call_rcu(&device->rcu, free_device);
+}
+
 static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
 {
 	struct btrfs_device *device, *tmp;
@@ -7054,38 +7088,3 @@ void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info)
 		fs_devices = fs_devices->seed;
 	}
 }
-
-static void btrfs_close_one_device(struct btrfs_device *device)
-{
-	struct btrfs_fs_devices *fs_devices = device->fs_devices;
-	struct btrfs_device *new_device;
-	struct rcu_string *name;
-
-	if (device->bdev)
-		fs_devices->open_devices--;
-
-	if (device->writeable &&
-	    device->devid != BTRFS_DEV_REPLACE_DEVID) {
-		list_del_init(&device->dev_alloc_list);
-		fs_devices->rw_devices--;
-	}
-
-	if (device->missing)
-		fs_devices->missing_devices--;
-
-	new_device = btrfs_alloc_device(NULL, &device->devid,
-					device->uuid);
-	BUG_ON(IS_ERR(new_device)); /* -ENOMEM */
-
-	/* Safe because we are under uuid_mutex */
-	if (device->name) {
-		name = rcu_string_strdup(device->name->str, GFP_NOFS);
-		BUG_ON(!name); /* -ENOMEM */
-		rcu_assign_pointer(new_device->name, name);
-	}
-
-	list_replace_rcu(&device->dev_list, &new_device->dev_list);
-	new_device->fs_devices = device->fs_devices;
-
-	call_rcu(&device->rcu, free_device);
-}
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/2] btrfs: wait for bdev put
  2016-06-14 10:55               ` [PATCH 1/2] btrfs: reorg btrfs_close_one_device() Anand Jain
@ 2016-06-14 10:55                 ` Anand Jain
  2016-06-18 16:34                   ` Holger Hoffstätte
                                     ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Anand Jain @ 2016-06-14 10:55 UTC (permalink / raw)
  To: linux-btrfs; +Cc: clm

Further to the previous commit
 bc178622d40d87e75abc131007342429c9b03351
 btrfs: use rcu_barrier() to wait for bdev puts at unmount

Since free_device() spinoff __free_device() the rcu_barrier() for
      call_rcu(&device->rcu, free_device);
didn't help.

This patch reverts changes by
 bc178622d40d87e75abc131007342429c9b03351
and implement a method to wait on the __free_device() by using
a new bdev_closing member in struct btrfs_device.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
[rework: bc178622d40d87e75abc131007342429c9b03351]
---
 fs/btrfs/volumes.c | 44 ++++++++++++++++++++++++++++++++++++++------
 fs/btrfs/volumes.h |  1 +
 2 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a4e8d48acd4b..404ce1daebb1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -27,6 +27,7 @@
 #include <linux/raid/pq.h>
 #include <linux/semaphore.h>
 #include <linux/uuid.h>
+#include <linux/delay.h>
 #include <asm/div64.h>
 #include "ctree.h"
 #include "extent_map.h"
@@ -254,6 +255,17 @@ static struct btrfs_device *__alloc_device(void)
 	return dev;
 }
 
+static int is_device_closing(struct list_head *head)
+{
+	struct btrfs_device *dev;
+
+	list_for_each_entry(dev, head, dev_list) {
+		if (dev->bdev_closing)
+			return 1;
+	}
+	return 0;
+}
+
 static noinline struct btrfs_device *__find_device(struct list_head *head,
 						   u64 devid, u8 *uuid)
 {
@@ -832,12 +844,22 @@ again:
 static void __free_device(struct work_struct *work)
 {
 	struct btrfs_device *device;
+	struct btrfs_device *new_device_addr;
 
 	device = container_of(work, struct btrfs_device, rcu_work);
 
 	if (device->bdev)
 		blkdev_put(device->bdev, device->mode);
 
+	/*
+	 * If we are coming here from btrfs_close_one_device()
+	 * then it allocates a new device structure for the same
+	 * devid, so find device again with the devid
+	 */
+	new_device_addr = __find_device(&device->fs_devices->devices,
+						device->devid, NULL);
+
+	new_device_addr->bdev_closing = 0;
 	rcu_string_free(device->name);
 	kfree(device);
 }
@@ -884,6 +906,12 @@ static void btrfs_close_one_device(struct btrfs_device *device)
 	list_replace_rcu(&device->dev_list, &new_device->dev_list);
 	new_device->fs_devices = device->fs_devices;
 
+	/*
+	 * So to wait for kworkers to finish all blkdev_puts,
+	 * so device is really free when umount is done.
+	 */
+	new_device->bdev_closing = 1;
+
 	call_rcu(&device->rcu, free_device);
 }
 
@@ -912,6 +940,7 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
 {
 	struct btrfs_fs_devices *seed_devices = NULL;
 	int ret;
+	int retry_cnt = 5;
 
 	mutex_lock(&uuid_mutex);
 	ret = __btrfs_close_devices(fs_devices);
@@ -927,12 +956,15 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
 		__btrfs_close_devices(fs_devices);
 		free_fs_devices(fs_devices);
 	}
-	/*
-	 * Wait for rcu kworkers under __btrfs_close_devices
-	 * to finish all blkdev_puts so device is really
-	 * free when umount is done.
-	 */
-	rcu_barrier();
+
+	while (is_device_closing(&fs_devices->devices) &&
+						--retry_cnt) {
+		mdelay(1000); //1 sec
+	}
+
+	if (!(retry_cnt > 0))
+		printk(KERN_WARNING "BTRFS: %pU bdev_put didn't complete, giving up\n",
+			fs_devices->fsid);
 	return ret;
 }
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 0ac90f8d85bd..945e49f5e17d 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -150,6 +150,7 @@ struct btrfs_device {
 	/* Counter to record the change of device stats */
 	atomic_t dev_stats_ccnt;
 	atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX];
+	int bdev_closing;
 };
 
 /*
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] btrfs: wait for bdev put
  2016-06-14 10:55                 ` [PATCH 2/2] btrfs: wait for bdev put Anand Jain
@ 2016-06-18 16:34                   ` Holger Hoffstätte
  2016-06-20  8:33                     ` Anand Jain
  2016-06-21 10:24                   ` [PATCH v2 " Anand Jain
  2016-06-23 12:54                   ` [PATCH v3 2/2] btrfs: make sure device is synced before return Anand Jain
  2 siblings, 1 reply; 21+ messages in thread
From: Holger Hoffstätte @ 2016-06-18 16:34 UTC (permalink / raw)
  To: linux-btrfs

On Tue, 14 Jun 2016 18:55:26 +0800, Anand Jain wrote:

> Further to the previous commit
>  bc178622d40d87e75abc131007342429c9b03351
>  btrfs: use rcu_barrier() to wait for bdev puts at unmount
> 
> Since free_device() spinoff __free_device() the rcu_barrier() for
>       call_rcu(&device->rcu, free_device);
> didn't help.
> 
> This patch reverts changes by
>  bc178622d40d87e75abc131007342429c9b03351
> and implement a method to wait on the __free_device() by using
> a new bdev_closing member in struct btrfs_device.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> [rework: bc178622d40d87e75abc131007342429c9b03351]
> ---
>  fs/btrfs/volumes.c | 44 ++++++++++++++++++++++++++++++++++++++------
>  fs/btrfs/volumes.h |  1 +
>  2 files changed, 39 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index a4e8d48acd4b..404ce1daebb1 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -27,6 +27,7 @@
>  #include <linux/raid/pq.h>
>  #include <linux/semaphore.h>
>  #include <linux/uuid.h>
> +#include <linux/delay.h>
>  #include <asm/div64.h>
>  #include "ctree.h"
>  #include "extent_map.h"
> @@ -254,6 +255,17 @@ static struct btrfs_device *__alloc_device(void)
>  	return dev;
>  }
>  
> +static int is_device_closing(struct list_head *head)
> +{
> +	struct btrfs_device *dev;
> +
> +	list_for_each_entry(dev, head, dev_list) {
> +		if (dev->bdev_closing)
> +			return 1;
> +	}
> +	return 0;
> +}
> +
>  static noinline struct btrfs_device *__find_device(struct list_head *head,
>  						   u64 devid, u8 *uuid)
>  {
> @@ -832,12 +844,22 @@ again:
>  static void __free_device(struct work_struct *work)
>  {
>  	struct btrfs_device *device;
> +	struct btrfs_device *new_device_addr;
>  
>  	device = container_of(work, struct btrfs_device, rcu_work);
>  
>  	if (device->bdev)
>  		blkdev_put(device->bdev, device->mode);
>  
> +	/*
> +	 * If we are coming here from btrfs_close_one_device()
> +	 * then it allocates a new device structure for the same
> +	 * devid, so find device again with the devid
> +	 */
> +	new_device_addr = __find_device(&device->fs_devices->devices,
> +						device->devid, NULL);
> +
> +	new_device_addr->bdev_closing = 0;
>  	rcu_string_free(device->name);
>  	kfree(device);
>  }
> @@ -884,6 +906,12 @@ static void btrfs_close_one_device(struct btrfs_device *device)
>  	list_replace_rcu(&device->dev_list, &new_device->dev_list);
>  	new_device->fs_devices = device->fs_devices;
>  
> +	/*
> +	 * So to wait for kworkers to finish all blkdev_puts,
> +	 * so device is really free when umount is done.
> +	 */
> +	new_device->bdev_closing = 1;
> +
>  	call_rcu(&device->rcu, free_device);
>  }
>  
> @@ -912,6 +940,7 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
>  {
>  	struct btrfs_fs_devices *seed_devices = NULL;
>  	int ret;
> +	int retry_cnt = 5;
>  
>  	mutex_lock(&uuid_mutex);
>  	ret = __btrfs_close_devices(fs_devices);
> @@ -927,12 +956,15 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
>  		__btrfs_close_devices(fs_devices);
>  		free_fs_devices(fs_devices);
>  	}
> -	/*
> -	 * Wait for rcu kworkers under __btrfs_close_devices
> -	 * to finish all blkdev_puts so device is really
> -	 * free when umount is done.
> -	 */
> -	rcu_barrier();
> +
> +	while (is_device_closing(&fs_devices->devices) &&
> +						--retry_cnt) {
> +		mdelay(1000); //1 sec
> +	}
> +
> +	if (!(retry_cnt > 0))
> +		printk(KERN_WARNING "BTRFS: %pU bdev_put didn't complete, giving up\n",
> +			fs_devices->fsid);
>  	return ret;
>  }
>  
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 0ac90f8d85bd..945e49f5e17d 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -150,6 +150,7 @@ struct btrfs_device {
>  	/* Counter to record the change of device stats */
>  	atomic_t dev_stats_ccnt;
>  	atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX];
> +	int bdev_closing;
>  };
>  
>  /*
> -- 
> 2.7.0

I gave this a try and somehow it seems to make unmounting worse:
it now always takes ~5s (max retry time) and I see the warning every
time. Without the patch unmounting a single volume (disk) is much
faster (1-2s), without problems.

Any ideas?

cheers,
Holger


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] btrfs: wait for bdev put
  2016-06-18 16:34                   ` Holger Hoffstätte
@ 2016-06-20  8:33                     ` Anand Jain
  0 siblings, 0 replies; 21+ messages in thread
From: Anand Jain @ 2016-06-20  8:33 UTC (permalink / raw)
  To: Holger Hoffstätte, linux-btrfs



On 06/19/2016 12:34 AM, Holger Hoffstätte wrote:
> On Tue, 14 Jun 2016 18:55:26 +0800, Anand Jain wrote:
>
>> Further to the previous commit
>>  bc178622d40d87e75abc131007342429c9b03351
>>  btrfs: use rcu_barrier() to wait for bdev puts at unmount
>>
>> Since free_device() spinoff __free_device() the rcu_barrier() for
>>       call_rcu(&device->rcu, free_device);
>> didn't help.
>>
>> This patch reverts changes by
>>  bc178622d40d87e75abc131007342429c9b03351
>> and implement a method to wait on the __free_device() by using
>> a new bdev_closing member in struct btrfs_device.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> [rework: bc178622d40d87e75abc131007342429c9b03351]
>> ---
>>  fs/btrfs/volumes.c | 44 ++++++++++++++++++++++++++++++++++++++------
>>  fs/btrfs/volumes.h |  1 +
>>  2 files changed, 39 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index a4e8d48acd4b..404ce1daebb1 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -27,6 +27,7 @@
>>  #include <linux/raid/pq.h>
>>  #include <linux/semaphore.h>
>>  #include <linux/uuid.h>
>> +#include <linux/delay.h>
>>  #include <asm/div64.h>
>>  #include "ctree.h"
>>  #include "extent_map.h"
>> @@ -254,6 +255,17 @@ static struct btrfs_device *__alloc_device(void)
>>  	return dev;
>>  }
>>
>> +static int is_device_closing(struct list_head *head)
>> +{
>> +	struct btrfs_device *dev;
>> +
>> +	list_for_each_entry(dev, head, dev_list) {
>> +		if (dev->bdev_closing)
>> +			return 1;
>> +	}
>> +	return 0;
>> +}
>> +
>>  static noinline struct btrfs_device *__find_device(struct list_head *head,
>>  						   u64 devid, u8 *uuid)
>>  {
>> @@ -832,12 +844,22 @@ again:
>>  static void __free_device(struct work_struct *work)
>>  {
>>  	struct btrfs_device *device;
>> +	struct btrfs_device *new_device_addr;
>>
>>  	device = container_of(work, struct btrfs_device, rcu_work);
>>
>>  	if (device->bdev)
>>  		blkdev_put(device->bdev, device->mode);
>>
>> +	/*
>> +	 * If we are coming here from btrfs_close_one_device()
>> +	 * then it allocates a new device structure for the same
>> +	 * devid, so find device again with the devid
>> +	 */
>> +	new_device_addr = __find_device(&device->fs_devices->devices,
>> +						device->devid, NULL);
>> +
>> +	new_device_addr->bdev_closing = 0;
>>  	rcu_string_free(device->name);
>>  	kfree(device);
>>  }
>> @@ -884,6 +906,12 @@ static void btrfs_close_one_device(struct btrfs_device *device)
>>  	list_replace_rcu(&device->dev_list, &new_device->dev_list);
>>  	new_device->fs_devices = device->fs_devices;
>>
>> +	/*
>> +	 * So to wait for kworkers to finish all blkdev_puts,
>> +	 * so device is really free when umount is done.
>> +	 */
>> +	new_device->bdev_closing = 1;
>> +
>>  	call_rcu(&device->rcu, free_device);
>>  }
>>
>> @@ -912,6 +940,7 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
>>  {
>>  	struct btrfs_fs_devices *seed_devices = NULL;
>>  	int ret;
>> +	int retry_cnt = 5;
>>
>>  	mutex_lock(&uuid_mutex);
>>  	ret = __btrfs_close_devices(fs_devices);
>> @@ -927,12 +956,15 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
>>  		__btrfs_close_devices(fs_devices);
>>  		free_fs_devices(fs_devices);
>>  	}
>> -	/*
>> -	 * Wait for rcu kworkers under __btrfs_close_devices
>> -	 * to finish all blkdev_puts so device is really
>> -	 * free when umount is done.
>> -	 */
>> -	rcu_barrier();
>> +
>> +	while (is_device_closing(&fs_devices->devices) &&
>> +						--retry_cnt) {
>> +		mdelay(1000); //1 sec
>> +	}
>> +
>> +	if (!(retry_cnt > 0))
>> +		printk(KERN_WARNING "BTRFS: %pU bdev_put didn't complete, giving up\n",
>> +			fs_devices->fsid);
>>  	return ret;
>>  }
>>
>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>> index 0ac90f8d85bd..945e49f5e17d 100644
>> --- a/fs/btrfs/volumes.h
>> +++ b/fs/btrfs/volumes.h
>> @@ -150,6 +150,7 @@ struct btrfs_device {
>>  	/* Counter to record the change of device stats */
>>  	atomic_t dev_stats_ccnt;
>>  	atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX];
>> +	int bdev_closing;
>>  };
>>
>>  /*
>> --
>> 2.7.0
>
> I gave this a try and somehow it seems to make unmounting worse:
> it now always takes ~5s (max retry time) and I see the warning every
> time. Without the patch unmounting a single volume (disk) is much
> faster (1-2s), without problems.

  Thanks Holger, for testing.
  It depends on long the blkdev_put() will take, originally unmount
  thread didn't wait for it to complete, so it was faster, but had
  other problem as explained.

Thanks, Anand

> Any ideas?


> cheers,
> Holger



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 2/2] btrfs: wait for bdev put
  2016-06-14 10:55                 ` [PATCH 2/2] btrfs: wait for bdev put Anand Jain
  2016-06-18 16:34                   ` Holger Hoffstätte
@ 2016-06-21 10:24                   ` Anand Jain
  2016-06-21 11:46                     ` Holger Hoffstätte
  2016-06-21 13:00                     ` Chris Mason
  2016-06-23 12:54                   ` [PATCH v3 2/2] btrfs: make sure device is synced before return Anand Jain
  2 siblings, 2 replies; 21+ messages in thread
From: Anand Jain @ 2016-06-21 10:24 UTC (permalink / raw)
  To: linux-btrfs; +Cc: holger, clm, dsterba, xiaolong.ye

From: Anand Jain <Anand.Jain@oracle.com>

Further to the commit
     bc178622d40d87e75abc131007342429c9b03351
     btrfs: use rcu_barrier() to wait for bdev puts at unmount

This patch implements a method to time wait on the __free_device()
which actually does the bdev put. This is needed as the user space
running 'btrfs fi show -d' immediately after the replace and
unmount, is still reading older information from the device.

 mail-archive.com/linux-btrfs@vger.kernel.org/msg54188.html

Signed-off-by: Anand Jain <anand.jain@oracle.com>
[updates: bc178622d40d87e75abc131007342429c9b03351]
---
v2: Also to make sure bdev_closing is set it needs rcu_barrier(),
    restored rcu_barrier().

 fs/btrfs/volumes.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/volumes.h |  1 +
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 604daf315669..ef61c34cafbf 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -27,6 +27,7 @@
 #include <linux/raid/pq.h>
 #include <linux/semaphore.h>
 #include <linux/uuid.h>
+#include <linux/delay.h>
 #include <asm/div64.h>
 #include "ctree.h"
 #include "extent_map.h"
@@ -254,6 +255,17 @@ static struct btrfs_device *__alloc_device(void)
 	return dev;
 }
 
+static int is_device_closing(struct list_head *head)
+{
+	struct btrfs_device *dev;
+
+	list_for_each_entry(dev, head, dev_list) {
+		if (dev->bdev_closing)
+			return 1;
+	}
+	return 0;
+}
+
 static noinline struct btrfs_device *__find_device(struct list_head *head,
 						   u64 devid, u8 *uuid)
 {
@@ -832,12 +844,22 @@ again:
 static void __free_device(struct work_struct *work)
 {
 	struct btrfs_device *device;
+	struct btrfs_device *new_device_addr;
 
 	device = container_of(work, struct btrfs_device, rcu_work);
 
 	if (device->bdev)
 		blkdev_put(device->bdev, device->mode);
 
+	/*
+	 * If we are coming here from btrfs_close_one_device()
+	 * then it allocates a new device structure for the same
+	 * devid, so find device again with the devid
+	 */
+	new_device_addr = __find_device(&device->fs_devices->devices,
+						device->devid, NULL);
+
+	new_device_addr->bdev_closing = 0;
 	rcu_string_free(device->name);
 	kfree(device);
 }
@@ -884,6 +906,12 @@ static void btrfs_close_one_device(struct btrfs_device *device)
 	list_replace_rcu(&device->dev_list, &new_device->dev_list);
 	new_device->fs_devices = device->fs_devices;
 
+	/*
+	 * So to wait for kworkers to finish all blkdev_puts,
+	 * so device is really free when umount is done.
+	 */
+	new_device->bdev_closing = 1;
+
 	call_rcu(&device->rcu, free_device);
 }
 
@@ -912,6 +940,7 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
 {
 	struct btrfs_fs_devices *seed_devices = NULL;
 	int ret;
+	int retry_cnt = 5;
 
 	mutex_lock(&uuid_mutex);
 	ret = __btrfs_close_devices(fs_devices);
@@ -929,10 +958,22 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
 	}
 	/*
 	 * Wait for rcu kworkers under __btrfs_close_devices
-	 * to finish all blkdev_puts so device is really
-	 * free when umount is done.
+	 * to finish all free_device()
 	 */
 	rcu_barrier();
+
+	/*
+	 * Wait for a grace period so that __free_device()
+	 * will actaully do the device close.
+	 */
+	while (is_device_closing(&fs_devices->devices) &&
+						--retry_cnt) {
+		mdelay(1000); //1 sec
+	}
+
+	if (!(retry_cnt > 0))
+		printk(KERN_WARNING "BTRFS: %pU bdev_put didn't complete, giving up\n",
+			fs_devices->fsid);
 	return ret;
 }
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 0ac90f8d85bd..945e49f5e17d 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -150,6 +150,7 @@ struct btrfs_device {
 	/* Counter to record the change of device stats */
 	atomic_t dev_stats_ccnt;
 	atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX];
+	int bdev_closing;
 };
 
 /*
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/2] btrfs: wait for bdev put
  2016-06-21 10:24                   ` [PATCH v2 " Anand Jain
@ 2016-06-21 11:46                     ` Holger Hoffstätte
  2016-06-21 13:00                     ` Chris Mason
  1 sibling, 0 replies; 21+ messages in thread
From: Holger Hoffstätte @ 2016-06-21 11:46 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs; +Cc: clm, dsterba, xiaolong.ye

On 06/21/16 12:24, Anand Jain wrote:
> From: Anand Jain <Anand.Jain@oracle.com>
> 
> Further to the commit
>      bc178622d40d87e75abc131007342429c9b03351
>      btrfs: use rcu_barrier() to wait for bdev puts at unmount
> 
> This patch implements a method to time wait on the __free_device()
> which actually does the bdev put. This is needed as the user space
> running 'btrfs fi show -d' immediately after the replace and
> unmount, is still reading older information from the device.
> 
>  mail-archive.com/linux-btrfs@vger.kernel.org/msg54188.html
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> [updates: bc178622d40d87e75abc131007342429c9b03351]
> ---
> v2: Also to make sure bdev_closing is set it needs rcu_barrier(),
>     restored rcu_barrier().

Looks like this one works reliably again. ;)
Tested with a slow disk, no long unmounts or timeout messages.

Tested-by: Holger Hoffstätte <holger.hoffstaette@applied-asynchrony.com>

thanks!
Holger


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/2] btrfs: wait for bdev put
  2016-06-21 10:24                   ` [PATCH v2 " Anand Jain
  2016-06-21 11:46                     ` Holger Hoffstätte
@ 2016-06-21 13:00                     ` Chris Mason
  2016-06-22 10:18                       ` Anand Jain
  1 sibling, 1 reply; 21+ messages in thread
From: Chris Mason @ 2016-06-21 13:00 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs; +Cc: holger, dsterba, xiaolong.ye

On 06/21/2016 06:24 AM, Anand Jain wrote:
> From: Anand Jain <Anand.Jain@oracle.com>
>
> Further to the commit
>       bc178622d40d87e75abc131007342429c9b03351
>       btrfs: use rcu_barrier() to wait for bdev puts at unmount
>
> This patch implements a method to time wait on the __free_device()
> which actually does the bdev put. This is needed as the user space
> running 'btrfs fi show -d' immediately after the replace and
> unmount, is still reading older information from the device.

Thanks for working on this Anand.  Since it looks like blkdev_put can 
deadlock against us, can we please switch to making sure we fully flush 
the outstanding IO?  It's probably enough to do a sync_blockdev() call 
before we allow the unmount to finish, but we can toss in an 
invalidate_bdev for good measure.

Then we can get rid of the mdelay loop completely, which seems pretty 
error prone to me.

Thanks!

-chris


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/2] btrfs: wait for bdev put
  2016-06-21 13:00                     ` Chris Mason
@ 2016-06-22 10:18                       ` Anand Jain
  2016-06-22 21:47                         ` Chris Mason
  0 siblings, 1 reply; 21+ messages in thread
From: Anand Jain @ 2016-06-22 10:18 UTC (permalink / raw)
  To: Chris Mason, linux-btrfs; +Cc: holger, dsterba, xiaolong.ye



  Thanks for the review Chris.

On 06/21/2016 09:00 PM, Chris Mason wrote:
> On 06/21/2016 06:24 AM, Anand Jain wrote:
>> From: Anand Jain <Anand.Jain@oracle.com>
>>
>> Further to the commit
>>       bc178622d40d87e75abc131007342429c9b03351
>>       btrfs: use rcu_barrier() to wait for bdev puts at unmount
>>
>> This patch implements a method to time wait on the __free_device()
>> which actually does the bdev put. This is needed as the user space
>> running 'btrfs fi show -d' immediately after the replace and
>> unmount, is still reading older information from the device.
>
> Thanks for working on this Anand.  Since it looks like blkdev_put can
> deadlock against us, can we please switch to making sure we fully flush
> the outstanding IO?  It's probably enough to do a sync_blockdev() call
> before we allow the unmount to finish, but we can toss in an
> invalidate_bdev for good measure.


------------
# git diff
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 604daf315669..e0ad29d6fe9a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -870,6 +870,11 @@ static void btrfs_close_one_device(struct 
btrfs_device *device)
         if (device->missing)
                 fs_devices->missing_devices--;

+       if (device->bdev && device->writeable) {
+               sync_blockdev(device->bdev);
+               invalidate_bdev(device->bdev);
+       }
+
         new_device = btrfs_alloc_device(NULL, &device->devid,
                                         device->uuid);
         BUG_ON(IS_ERR(new_device)); /* -ENOMEM */
-----------


  However, theoretically still there might be a problem - at the end of
  unmount, if the device exclusive open is not actually closed, then
  there might be a race with another program which is trying to open
  the device in exclusive mode. Like for eg:
       unmount /btrfs; fsck /dev/X
  and here fsck might fail to open the device if it wins the race.


> Then we can get rid of the mdelay loop completely, which seems pretty
> error prone to me.

  Yes, the code got little complex here (and also when sysfs fixes
  were wrote) as struct btrfs_device is getting migrated to a new
  struct btrfs_device at unmount, which I don't think was necessary?


Thanks, Anand


> Thanks!
>
> -chris
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/2] btrfs: wait for bdev put
  2016-06-22 10:18                       ` Anand Jain
@ 2016-06-22 21:47                         ` Chris Mason
  2016-06-23 13:07                           ` Anand Jain
  0 siblings, 1 reply; 21+ messages in thread
From: Chris Mason @ 2016-06-22 21:47 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, holger, dsterba, xiaolong.ye



On Wed, Jun 22, 2016 at 6:18 AM, Anand Jain <anand.jain@oracle.com> 
wrote:
> 
> 
>  Thanks for the review Chris.
> 
> On 06/21/2016 09:00 PM, Chris Mason wrote:
>> On 06/21/2016 06:24 AM, Anand Jain wrote:
>>> From: Anand Jain <Anand.Jain@oracle.com>
>>> 
>>> Further to the commit
>>>       bc178622d40d87e75abc131007342429c9b03351
>>>       btrfs: use rcu_barrier() to wait for bdev puts at unmount
>>> 
>>> This patch implements a method to time wait on the __free_device()
>>> which actually does the bdev put. This is needed as the user space
>>> running 'btrfs fi show -d' immediately after the replace and
>>> unmount, is still reading older information from the device.
>> 
>> Thanks for working on this Anand.  Since it looks like blkdev_put can
>> deadlock against us, can we please switch to making sure we fully 
>> flush
>> the outstanding IO?  It's probably enough to do a sync_blockdev() 
>> call
>> before we allow the unmount to finish, but we can toss in an
>> invalidate_bdev for good measure.
> 
> 
> ------------
> # git diff
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 604daf315669..e0ad29d6fe9a 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -870,6 +870,11 @@ static void btrfs_close_one_device(struct 
> btrfs_device *device)
>         if (device->missing)
>                 fs_devices->missing_devices--;
> 
> +       if (device->bdev && device->writeable) {
> +               sync_blockdev(device->bdev);
> +               invalidate_bdev(device->bdev);
> +       }
> +
>         new_device = btrfs_alloc_device(NULL, &device->devid,
>                                         device->uuid);
>         BUG_ON(IS_ERR(new_device)); /* -ENOMEM */
> -----------
> 
> 
>  However, theoretically still there might be a problem - at the end of
>  unmount, if the device exclusive open is not actually closed, then
>  there might be a race with another program which is trying to open
>  the device in exclusive mode. Like for eg:
>       unmount /btrfs; fsck /dev/X
>  and here fsck might fail to open the device if it wins the race.

This true, but at least we know he'll have up to date buffers if he 
does manage to open the device.

With the generic code, the blkdev_put happens after the super is gone, 
so I'm not sure we can completely fix this from inside our callback.

-chris




^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v3 2/2] btrfs: make sure device is synced before return
  2016-06-14 10:55                 ` [PATCH 2/2] btrfs: wait for bdev put Anand Jain
  2016-06-18 16:34                   ` Holger Hoffstätte
  2016-06-21 10:24                   ` [PATCH v2 " Anand Jain
@ 2016-06-23 12:54                   ` Anand Jain
  2016-06-23 14:27                     ` Chris Mason
  2016-07-08 14:13                     ` David Sterba
  2 siblings, 2 replies; 21+ messages in thread
From: Anand Jain @ 2016-06-23 12:54 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, holger, clm, xiaolong.ye

An inconsistent behavior due to stale reads from the
disk was reported

  mail-archive.com/linux-btrfs@vger.kernel.org/msg54188.html

This patch will make sure devices are synced before
return in the unmount thread.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 v2: Also to make sure bdev_closing is set it needs rcu_barrier(),
     restored rcu_barrier().
 v3: Removed the complicated timed-wait on blkdev_put code,
     but make synced and invalidated before umount is returned.
     So sorry that I have to change the title of this patch
     as well (was: btrfs: wait for bdev put).

 fs/btrfs/volumes.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 604daf315669..f741ade130a4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -870,6 +870,11 @@ static void btrfs_close_one_device(struct btrfs_device *device)
 	if (device->missing)
 		fs_devices->missing_devices--;
 
+	if (device->bdev && device->writeable) {
+		sync_blockdev(device->bdev);
+		invalidate_bdev(device->bdev);
+	}
+
 	new_device = btrfs_alloc_device(NULL, &device->devid,
 					device->uuid);
 	BUG_ON(IS_ERR(new_device)); /* -ENOMEM */
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/2] btrfs: wait for bdev put
  2016-06-22 21:47                         ` Chris Mason
@ 2016-06-23 13:07                           ` Anand Jain
  0 siblings, 0 replies; 21+ messages in thread
From: Anand Jain @ 2016-06-23 13:07 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs, holger, dsterba, xiaolong.ye



On 06/23/2016 05:47 AM, Chris Mason wrote:
>
>
> On Wed, Jun 22, 2016 at 6:18 AM, Anand Jain <anand.jain@oracle.com> wrote:
>>
>>
>>  Thanks for the review Chris.
>>
>> On 06/21/2016 09:00 PM, Chris Mason wrote:
>>> On 06/21/2016 06:24 AM, Anand Jain wrote:
>>>> From: Anand Jain <Anand.Jain@oracle.com>
>>>>
>>>> Further to the commit
>>>>       bc178622d40d87e75abc131007342429c9b03351
>>>>       btrfs: use rcu_barrier() to wait for bdev puts at unmount
>>>>
>>>> This patch implements a method to time wait on the __free_device()
>>>> which actually does the bdev put. This is needed as the user space
>>>> running 'btrfs fi show -d' immediately after the replace and
>>>> unmount, is still reading older information from the device.
>>>
>>> Thanks for working on this Anand.  Since it looks like blkdev_put can
>>> deadlock against us, can we please switch to making sure we fully flush
>>> the outstanding IO?  It's probably enough to do a sync_blockdev() call
>>> before we allow the unmount to finish, but we can toss in an
>>> invalidate_bdev for good measure.
>>
>>
>> ------------
>> # git diff
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 604daf315669..e0ad29d6fe9a 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -870,6 +870,11 @@ static void btrfs_close_one_device(struct
>> btrfs_device *device)
>>         if (device->missing)
>>                 fs_devices->missing_devices--;
>>
>> +       if (device->bdev && device->writeable) {
>> +               sync_blockdev(device->bdev);
>> +               invalidate_bdev(device->bdev);
>> +       }
>> +
>>         new_device = btrfs_alloc_device(NULL, &device->devid,
>>                                         device->uuid);
>>         BUG_ON(IS_ERR(new_device)); /* -ENOMEM */
>> -----------
>>
>>
>>  However, theoretically still there might be a problem - at the end of
>>  unmount, if the device exclusive open is not actually closed, then
>>  there might be a race with another program which is trying to open
>>  the device in exclusive mode. Like for eg:
>>       unmount /btrfs; fsck /dev/X
>>  and here fsck might fail to open the device if it wins the race.
>
> This true, but at least we know he'll have up to date buffers if he does
> manage to open the device.
>
> With the generic code, the blkdev_put happens after the super is gone,
> so I'm not sure we can completely fix this from inside our callback.

  Makes sense, sent out v3 with title
   (btrfs: make sure device is synced before return)

  Also sent out RFC patch
     btrfs: make sure device is synced before return
  where I have tried not to background blkdev_put(),
  which seems to be better, it works fine per fstests.


Thanks, Anand


> -chris
>
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/2] btrfs: make sure device is synced before return
  2016-06-23 12:54                   ` [PATCH v3 2/2] btrfs: make sure device is synced before return Anand Jain
@ 2016-06-23 14:27                     ` Chris Mason
  2016-07-08 14:13                     ` David Sterba
  1 sibling, 0 replies; 21+ messages in thread
From: Chris Mason @ 2016-06-23 14:27 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs; +Cc: dsterba, holger, xiaolong.ye



On 06/23/2016 08:54 AM, Anand Jain wrote:
> An inconsistent behavior due to stale reads from the
> disk was reported
>
>   mail-archive.com/linux-btrfs@vger.kernel.org/msg54188.html
>
> This patch will make sure devices are synced before
> return in the unmount thread.
>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  v2: Also to make sure bdev_closing is set it needs rcu_barrier(),
>      restored rcu_barrier().
>  v3: Removed the complicated timed-wait on blkdev_put code,
>      but make synced and invalidated before umount is returned.
>      So sorry that I have to change the title of this patch
>      as well (was: btrfs: wait for bdev put).
>
>  fs/btrfs/volumes.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 604daf315669..f741ade130a4 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -870,6 +870,11 @@ static void btrfs_close_one_device(struct btrfs_device *device)
>  	if (device->missing)
>  		fs_devices->missing_devices--;
>
> +	if (device->bdev && device->writeable) {
> +		sync_blockdev(device->bdev);
> +		invalidate_bdev(device->bdev);
> +	}
> +
>  	new_device = btrfs_alloc_device(NULL, &device->devid,
>  					device->uuid);
>  	BUG_ON(IS_ERR(new_device)); /* -ENOMEM */
>

This should do it,I'll check with the xfstest I was able to reliably get 
errors on.

Thanks!

-chris

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/2] btrfs: make sure device is synced before return
  2016-06-23 12:54                   ` [PATCH v3 2/2] btrfs: make sure device is synced before return Anand Jain
  2016-06-23 14:27                     ` Chris Mason
@ 2016-07-08 14:13                     ` David Sterba
  1 sibling, 0 replies; 21+ messages in thread
From: David Sterba @ 2016-07-08 14:13 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, dsterba, holger, clm, xiaolong.ye

On Thu, Jun 23, 2016 at 08:54:07PM +0800, Anand Jain wrote:
> An inconsistent behavior due to stale reads from the
> disk was reported
> 
>   mail-archive.com/linux-btrfs@vger.kernel.org/msg54188.html
> 
> This patch will make sure devices are synced before
> return in the unmount thread.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>

Added to for-next.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2016-07-08 14:13 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-26  9:27 [PULL] Btrfs for 4.7, part 2 David Sterba
2016-05-27  0:14 ` Chris Mason
2016-05-27 11:18   ` David Sterba
2016-05-27 14:35     ` Chris Mason
2016-05-27 15:42       ` Chris Mason
2016-05-28  5:14         ` Anand Jain
2016-05-29 12:21           ` Chris Mason
2016-06-14 10:52             ` Anand Jain
2016-06-14 10:55               ` [PATCH 1/2] btrfs: reorg btrfs_close_one_device() Anand Jain
2016-06-14 10:55                 ` [PATCH 2/2] btrfs: wait for bdev put Anand Jain
2016-06-18 16:34                   ` Holger Hoffstätte
2016-06-20  8:33                     ` Anand Jain
2016-06-21 10:24                   ` [PATCH v2 " Anand Jain
2016-06-21 11:46                     ` Holger Hoffstätte
2016-06-21 13:00                     ` Chris Mason
2016-06-22 10:18                       ` Anand Jain
2016-06-22 21:47                         ` Chris Mason
2016-06-23 13:07                           ` Anand Jain
2016-06-23 12:54                   ` [PATCH v3 2/2] btrfs: make sure device is synced before return Anand Jain
2016-06-23 14:27                     ` Chris Mason
2016-07-08 14:13                     ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.