All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Btrfs: free btrfs_device in place
@ 2017-10-10 21:51 Liu Bo
  2017-10-11  6:56 ` Anand Jain
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Liu Bo @ 2017-10-10 21:51 UTC (permalink / raw)
  To: linux-btrfs

It's pointless to defer it to a kthread helper as we're not under any
special context.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/volumes.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d983cea..4a72c45 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -836,26 +836,16 @@ void btrfs_close_extra_devices(struct btrfs_fs_devices *fs_devices, int step)
 	mutex_unlock(&uuid_mutex);
 }
 
-static void __free_device(struct work_struct *work)
+static void free_device(struct rcu_head *head)
 {
 	struct btrfs_device *device;
 
-	device = container_of(work, struct btrfs_device, rcu_work);
+	device = container_of(head, struct btrfs_device, rcu);
 	rcu_string_free(device->name);
 	bio_put(device->flush_bio);
 	kfree(device);
 }
 
-static void free_device(struct rcu_head *head)
-{
-	struct btrfs_device *device;
-
-	device = container_of(head, struct btrfs_device, rcu);
-
-	INIT_WORK(&device->rcu_work, __free_device);
-	schedule_work(&device->rcu_work);
-}
-
 static void btrfs_close_bdev(struct btrfs_device *device)
 {
 	if (device->bdev && device->writeable) {
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] Btrfs: free btrfs_device in place
  2017-10-10 21:51 [PATCH] Btrfs: free btrfs_device in place Liu Bo
@ 2017-10-11  6:56 ` Anand Jain
  2017-10-11 17:54 ` David Sterba
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Anand Jain @ 2017-10-11  6:56 UTC (permalink / raw)
  To: Liu Bo, linux-btrfs, linux_lkml_grp



On 10/11/2017 05:51 AM, Liu Bo wrote:
> It's pointless to defer it to a kthread helper as we're not under any
> special context.

  Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand

> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>   fs/btrfs/volumes.c | 14 ++------------
>   1 file changed, 2 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index d983cea..4a72c45 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -836,26 +836,16 @@ void btrfs_close_extra_devices(struct btrfs_fs_devices *fs_devices, int step)
>   	mutex_unlock(&uuid_mutex);
>   }
>   
> -static void __free_device(struct work_struct *work)
> +static void free_device(struct rcu_head *head)
>   {
>   	struct btrfs_device *device;
>   
> -	device = container_of(work, struct btrfs_device, rcu_work);
> +	device = container_of(head, struct btrfs_device, rcu);
>   	rcu_string_free(device->name);
>   	bio_put(device->flush_bio);
>   	kfree(device);
>   }
>   
> -static void free_device(struct rcu_head *head)
> -{
> -	struct btrfs_device *device;
> -
> -	device = container_of(head, struct btrfs_device, rcu);
> -
> -	INIT_WORK(&device->rcu_work, __free_device);
> -	schedule_work(&device->rcu_work);
> -}
> -
>   static void btrfs_close_bdev(struct btrfs_device *device)
>   {
>   	if (device->bdev && device->writeable) {
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Btrfs: free btrfs_device in place
  2017-10-11 17:54 ` David Sterba
@ 2017-10-11 17:21   ` Liu Bo
  0 siblings, 0 replies; 8+ messages in thread
From: Liu Bo @ 2017-10-11 17:21 UTC (permalink / raw)
  To: dsterba, linux-btrfs

On Wed, Oct 11, 2017 at 07:54:04PM +0200, David Sterba wrote:
> On Tue, Oct 10, 2017 at 03:51:03PM -0600, Liu Bo wrote:
> > It's pointless to defer it to a kthread helper as we're not under any
> > special context.
> 
> I agree the doubly deferred freeing is pointless. It's a weird mix of
> RCU and workques and understanding all the interactions turned out to be
> hard, last time I tried.
> 
> The RCU stuff needs the rcu_barriers, and the callback can be served
> from any process context. While the workqueus have their dedicated
> kthreads.
> 
> Calling free_device() is quick, it just adds the work to the queue and
> returns. This makes __btrfs_close_devices/btrfs_rm_device/... and all
> other callers fast, at the cost that there must be some explicit barrier
> or waiting done when we want to make sure all the device resources have
> been freed.
> 
> I can't say the quick return is wrong, but it makes the device lifetime
> hard to understand. The device freeing callback (__free_device) is
> lightweight, but also calls "rcu_free" for the device name.
>

I did check the history, at the time that this mix was introduced, it
was not under from a non-sleep context, so I'm not sure why it needs
to be deferred.

rcu_barrier() was needed because we used to do blkdev_put() in this
free_device(), which means the bdev's refcnt is held until the rcu
callback runs.  Later we found this is unnecessary because rcu
protected resources and blkdev_put() can be done separated.

> I have WIP patches to clean up the rcu and locking around devices and
> actually document the rules, but with unreviewed pile in the mailinglist
> I can't tell when this is going to land. If you want to simplify at
> least the device freeing, please go on, and explain in the changelog
> that it's not going to break anything. The hand-wavy sentence is not
> what I'd expect :)

I'll update the changelog to clarify the concern.

Thanks,

-liubo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Btrfs: free btrfs_device in place
  2017-10-10 21:51 [PATCH] Btrfs: free btrfs_device in place Liu Bo
  2017-10-11  6:56 ` Anand Jain
@ 2017-10-11 17:54 ` David Sterba
  2017-10-11 17:21   ` Liu Bo
  2017-10-11 18:13 ` [PATCH v2] " Liu Bo
  2017-10-24  5:02 ` [PATCH v3] " Liu Bo
  3 siblings, 1 reply; 8+ messages in thread
From: David Sterba @ 2017-10-11 17:54 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Tue, Oct 10, 2017 at 03:51:03PM -0600, Liu Bo wrote:
> It's pointless to defer it to a kthread helper as we're not under any
> special context.

I agree the doubly deferred freeing is pointless. It's a weird mix of
RCU and workques and understanding all the interactions turned out to be
hard, last time I tried.

The RCU stuff needs the rcu_barriers, and the callback can be served
from any process context. While the workqueus have their dedicated
kthreads.

Calling free_device() is quick, it just adds the work to the queue and
returns. This makes __btrfs_close_devices/btrfs_rm_device/... and all
other callers fast, at the cost that there must be some explicit barrier
or waiting done when we want to make sure all the device resources have
been freed.

I can't say the quick return is wrong, but it makes the device lifetime
hard to understand. The device freeing callback (__free_device) is
lightweight, but also calls "rcu_free" for the device name.

I have WIP patches to clean up the rcu and locking around devices and
actually document the rules, but with unreviewed pile in the mailinglist
I can't tell when this is going to land. If you want to simplify at
least the device freeing, please go on, and explain in the changelog
that it's not going to break anything. The hand-wavy sentence is not
what I'd expect :)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2] Btrfs: free btrfs_device in place
  2017-10-10 21:51 [PATCH] Btrfs: free btrfs_device in place Liu Bo
  2017-10-11  6:56 ` Anand Jain
  2017-10-11 17:54 ` David Sterba
@ 2017-10-11 18:13 ` Liu Bo
  2017-10-20 17:52   ` David Sterba
  2017-10-24  5:02 ` [PATCH v3] " Liu Bo
  3 siblings, 1 reply; 8+ messages in thread
From: Liu Bo @ 2017-10-11 18:13 UTC (permalink / raw)
  To: linux-btrfs

It's pointless to defer it to a kthread helper as we're not under any
special context.

A bit more about device's lifetime, filesystems use an exclusive way
to access devices and hold a reference count on the devices in use.
Now that we've run blkdev_put() in btrfs_close_bdev(), device->bdev's
lifetime ends at btrfs_close_bdev(), not free_device(), and %bdev is
the only thing that others who need to access it really care about.

Since free_device() only frees the resources of 'struct btrfs_device',
this change won't result in the problem, ie. others like mkfs and md
are unable to access the device immediately after we do umount.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
---

v2: Clarify the lifetime of device and device->bdev respectively and
    clear the concern about raising the 'device is in use' problem.

 fs/btrfs/volumes.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d983cea..4a72c45 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -836,26 +836,16 @@ void btrfs_close_extra_devices(struct btrfs_fs_devices *fs_devices, int step)
 	mutex_unlock(&uuid_mutex);
 }
 
-static void __free_device(struct work_struct *work)
+static void free_device(struct rcu_head *head)
 {
 	struct btrfs_device *device;
 
-	device = container_of(work, struct btrfs_device, rcu_work);
+	device = container_of(head, struct btrfs_device, rcu);
 	rcu_string_free(device->name);
 	bio_put(device->flush_bio);
 	kfree(device);
 }
 
-static void free_device(struct rcu_head *head)
-{
-	struct btrfs_device *device;
-
-	device = container_of(head, struct btrfs_device, rcu);
-
-	INIT_WORK(&device->rcu_work, __free_device);
-	schedule_work(&device->rcu_work);
-}
-
 static void btrfs_close_bdev(struct btrfs_device *device)
 {
 	if (device->bdev && device->writeable) {
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] Btrfs: free btrfs_device in place
  2017-10-11 18:13 ` [PATCH v2] " Liu Bo
@ 2017-10-20 17:52   ` David Sterba
  0 siblings, 0 replies; 8+ messages in thread
From: David Sterba @ 2017-10-20 17:52 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Wed, Oct 11, 2017 at 12:13:30PM -0600, Liu Bo wrote:
> It's pointless to defer it to a kthread helper as we're not under any
> special context.
> 
> A bit more about device's lifetime, filesystems use an exclusive way
> to access devices and hold a reference count on the devices in use.
> Now that we've run blkdev_put() in btrfs_close_bdev(), device->bdev's
> lifetime ends at btrfs_close_bdev(), not free_device(), and %bdev is
> the only thing that others who need to access it really care about.
> 
> Since free_device() only frees the resources of 'struct btrfs_device',
> this change won't result in the problem, ie. others like mkfs and md
> are unable to access the device immediately after we do umount.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> Reviewed-by: Anand Jain <anand.jain@oracle.com>
> ---
> 
> v2: Clarify the lifetime of device and device->bdev respectively and
>     clear the concern about raising the 'device is in use' problem.

I found my version of the patch, the diff is the same, plus we can now remove
the rcu_work hook

--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -133,7 +133,6 @@ struct btrfs_device {

        struct btrfs_work work;
        struct rcu_head rcu;
-       struct work_struct rcu_work;

        /* readahead state */
        spinlock_t reada_lock;

The changelog is outdated and still mentions the blkdev_put, that's now
obsolete. I'd still want to keep the commit ids for reference. Can you please
merge it to your changelog and resend? Thanks.

"
    Commit "Btrfs: using rcu lock in the reader side of devices list"
    (1f78160ce1b1b8e657e2248118c4d91f881763f0) introdced RCU freeing for
    device structures. In a way that is questionable to say at least.

    One of the strange points is the freeing of device: schedule an RCU
    callback and from that schedule a workqueue callback that actually drops
    the bdev and frees the structure.

    Elswehere in the code we checkpoint the RCU with call_rcu or
    rcu_barrier, although with the extra call to workqueu, there might be
    some pending operations still unfinished. This can lead to a busy device
    after umount, similar to what commit "btrfs: use rcu_barrier() to wait
    for bdev puts at unmount" (bc178622d40d87e75abc131007342429c9b03351)
    fixed.

    Drop the extra step and don't schedule the work. We can free directly
    from the RCU callback. While this means we might do more work on in the
    RCU callback context, ie. calling blkdev_put, this is not an issue. The
    device call_rcu is always preceded by sync and invalidation of the block
    device so we're left only with the lightweight operations.
"

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v3] Btrfs: free btrfs_device in place
  2017-10-10 21:51 [PATCH] Btrfs: free btrfs_device in place Liu Bo
                   ` (2 preceding siblings ...)
  2017-10-11 18:13 ` [PATCH v2] " Liu Bo
@ 2017-10-24  5:02 ` Liu Bo
  2017-10-25 13:13   ` David Sterba
  3 siblings, 1 reply; 8+ messages in thread
From: Liu Bo @ 2017-10-24  5:02 UTC (permalink / raw)
  To: linux-btrfs

It's pointless to defer it to a kthread helper as we're not under a
special context.

For reference, commit 1f78160ce1b1 ("Btrfs: using rcu lock in the
reader side of devices list") introduced RCU freeing for device
structures.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
---

v3: - Enhance changelog with commit id which introduced this for future
      reference.
    - Now we can remove %rcu_work.

v2: - Clarify the lifetime of device and device->bdev respectively and
      clear the concern about raising the 'device is in use' problem.

 fs/btrfs/volumes.c | 14 ++------------
 fs/btrfs/volumes.h |  1 -
 2 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d983cea..4a72c45 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -836,26 +836,16 @@ void btrfs_close_extra_devices(struct btrfs_fs_devices *fs_devices, int step)
 	mutex_unlock(&uuid_mutex);
 }
 
-static void __free_device(struct work_struct *work)
+static void free_device(struct rcu_head *head)
 {
 	struct btrfs_device *device;
 
-	device = container_of(work, struct btrfs_device, rcu_work);
+	device = container_of(head, struct btrfs_device, rcu);
 	rcu_string_free(device->name);
 	bio_put(device->flush_bio);
 	kfree(device);
 }
 
-static void free_device(struct rcu_head *head)
-{
-	struct btrfs_device *device;
-
-	device = container_of(head, struct btrfs_device, rcu);
-
-	INIT_WORK(&device->rcu_work, __free_device);
-	schedule_work(&device->rcu_work);
-}
-
 static void btrfs_close_bdev(struct btrfs_device *device)
 {
 	if (device->bdev && device->writeable) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 6108fdf..f60c535 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -133,7 +133,6 @@ struct btrfs_device {
 
 	struct btrfs_work work;
 	struct rcu_head rcu;
-	struct work_struct rcu_work;
 
 	/* readahead state */
 	spinlock_t reada_lock;
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] Btrfs: free btrfs_device in place
  2017-10-24  5:02 ` [PATCH v3] " Liu Bo
@ 2017-10-25 13:13   ` David Sterba
  0 siblings, 0 replies; 8+ messages in thread
From: David Sterba @ 2017-10-25 13:13 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Mon, Oct 23, 2017 at 11:02:54PM -0600, Liu Bo wrote:
> It's pointless to defer it to a kthread helper as we're not under a
> special context.
> 
> For reference, commit 1f78160ce1b1 ("Btrfs: using rcu lock in the
> reader side of devices list") introduced RCU freeing for device
> structures.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> Reviewed-by: Anand Jain <anand.jain@oracle.com>

Reviewed-by: David Sterba <dsterba@suse.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-10-25 13:14 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-10 21:51 [PATCH] Btrfs: free btrfs_device in place Liu Bo
2017-10-11  6:56 ` Anand Jain
2017-10-11 17:54 ` David Sterba
2017-10-11 17:21   ` Liu Bo
2017-10-11 18:13 ` [PATCH v2] " Liu Bo
2017-10-20 17:52   ` David Sterba
2017-10-24  5:02 ` [PATCH v3] " Liu Bo
2017-10-25 13:13   ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.