linux-bcache.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] Live resize of backing device
@ 2022-08-03 10:05 Andrea Tomassetti
  2022-08-04 14:32 ` Coly Li
  2022-08-05 19:38 ` Eric Wheeler
  0 siblings, 2 replies; 29+ messages in thread
From: Andrea Tomassetti @ 2022-08-03 10:05 UTC (permalink / raw)
  To: Coly Li; +Cc: Kent Overstreet, linux-bcache

Hi Coly,
In one of our previous emails you said that
> Currently bcache doesn’t support cache or backing device resize

I was investigating this point and I actually found a solution. I
briefly tested it and it seems to work fine.
Basically what I'm doing is:
  1. Check if there's any discrepancy between the nr of sectors
reported by the bcache backing device (holder) and the nr of sectors
reported by its parent (slave).
  2. If the number of sectors of the two devices are not the same,
then call set_capacity_and_notify on the bcache device.
  3. From user space, depending on the fs used, grow the fs with some
utility (e.g. xfs_growfs)

This works without any need of unmounting the mounted fs nor stopping
the bcache backing device.

So my question is: am I missing something? Can this live resize cause
some problems (e.g. data loss)? Would it be useful if I sent a patch
for this?

Kind regards,
Andrea

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC] Live resize of backing device
  2022-08-03 10:05 [RFC] Live resize of backing device Andrea Tomassetti
@ 2022-08-04 14:32 ` Coly Li
  2022-08-05 19:38 ` Eric Wheeler
  1 sibling, 0 replies; 29+ messages in thread
From: Coly Li @ 2022-08-04 14:32 UTC (permalink / raw)
  To: Andrea Tomassetti; +Cc: Kent Overstreet, linux-bcache



> On Aug 3, 2022, at 18:05, Andrea Tomassetti <andrea.tomassetti-opensource@devo.com> wrote:
> 
> Hi Coly,
> In one of our previous emails you said that
>> Currently bcache doesn’t support cache or backing device resize
> 
> I was investigating this point and I actually found a solution. I
> briefly tested it and it seems to work fine.
> Basically what I'm doing is:
>  1. Check if there's any discrepancy between the nr of sectors
> reported by the bcache backing device (holder) and the nr of sectors
> reported by its parent (slave).
>  2. If the number of sectors of the two devices are not the same,
> then call set_capacity_and_notify on the bcache device.
>  3. From user space, depending on the fs used, grow the fs with some
> utility (e.g. xfs_growfs)
> 
> This works without any need of unmounting the mounted fs nor stopping
> the bcache backing device.
> 
> So my question is: am I missing something? Can this live resize cause
> some problems (e.g. data loss)? Would it be useful if I send a patch
> on this?

Hi Andrea,

It sounds good. You may look at bcache_device_init() to see how the following items are initialized:
- d->nr_stripes
- d->stripe_sectors_dirty
- d->full_dirty_stripes

And you may check calc_cached_dev_sectors() to see how this item is initialized:
- c->cached_dev_sectors

All the above items are writeback related. When they are updated for the increased size, you should use proper locks to avoid potential races with the places where they are referenced in the writeback code flow (and maybe somewhere else too).

The overall idea should work; I’d like to see your tested patch :-)

Coly Li

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC] Live resize of backing device
  2022-08-03 10:05 [RFC] Live resize of backing device Andrea Tomassetti
  2022-08-04 14:32 ` Coly Li
@ 2022-08-05 19:38 ` Eric Wheeler
  2022-09-06 13:22   ` Andrea Tomassetti
  1 sibling, 1 reply; 29+ messages in thread
From: Eric Wheeler @ 2022-08-05 19:38 UTC (permalink / raw)
  To: Andrea Tomassetti; +Cc: Coly Li, Kent Overstreet, linux-bcache

[-- Attachment #1: Type: text/plain, Size: 2220 bytes --]

On Wed, 3 Aug 2022, Andrea Tomassetti wrote:
> Hi Coly,
> In one of our previous emails you said that
> > Currently bcache doesn’t support cache or backing device resize
> 
> I was investigating this point and I actually found a solution. I
> briefly tested it and it seems to work fine.
> Basically what I'm doing is:
>   1. Check if there's any discrepancy between the nr of sectors
> reported by the bcache backing device (holder) and the nr of sectors
> reported by its parent (slave).
>   2. If the number of sectors of the two devices are not the same,
> then call set_capacity_and_notify on the bcache device.
>   3. From user space, depending on the fs used, grow the fs with some
> utility (e.g. xfs_growfs)
> 
> This works without any need of unmounting the mounted fs nor stopping
> the bcache backing device.
 
Well done! +1, would love to see a patch for this!

 
> So my question is: am I missing something? Can this live resize cause 
> some problems (e.g. data loss)? Would it be useful if I send a patch on 
> this?

A while ago we looked into doing this.  Here is the summary of our 
findings; not sure if there are any other considerations:

  1. Create a sysfs file like /sys/block/bcache0/bcache/resize to trigger 
     resize on echo 1 >.
  2. Refactor the set_capacity() bits from  bcache_device_init() into its 
     own function.
  3. Put locks around bcache_device.full_dirty_stripes and 
     bcache_device.stripe_sectors_dirty.  Re-alloc+copy+free and zero the 
     new bytes at the end.  Grep where bcache_device.full_dirty_stripes is 
     used and make sure it is locked appropriately, probably in the 
     writeback thread, maybe other places too.

The cachedevs don't know anything about the bdev size, so (according to 
Kent) they will "just work" by referencing new offsets that were 
previously beyond the disk. (This is basically the same as resizing the 
bdev and then unregister/re-register which is how we resize bdevs now.)

As for resizing a cachedev, I've not looked at all---not sure about that 
one.  We always detach, resize, make-bcache and re-attach the new cache.  
Maybe it is similarly simple, but I haven't looked.


--
Eric Wheeler



> 
> Kind regards,
> Andrea
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC] Live resize of backing device
  2022-08-05 19:38 ` Eric Wheeler
@ 2022-09-06 13:22   ` Andrea Tomassetti
  2022-09-08  8:32     ` Andrea Tomassetti
  0 siblings, 1 reply; 29+ messages in thread
From: Andrea Tomassetti @ 2022-09-06 13:22 UTC (permalink / raw)
  To: Coly Li; +Cc: Eric Wheeler, Kent Overstreet, linux-bcache

Hi Coly,
I finally have some time to work on the patch. I already have a first
prototype that seems to work but, before sending it, I would like to
ask you two questions:
  1. Inspecting the code, I found that the only lock mechanism is the
writeback_lock semaphore. Am I correct?
  2. How can I effectively test my patch? So far I'm doing something like this:
     a. make-bcache --writeback -B /dev/vdb -C /dev/vdc
     b. mkfs.xfs /dev/bcache0
     c. dd if=/dev/random of=/mnt/random bs=1M count=1000
     d. md5sum /mnt/random | tee /mnt/random.md5
     e. live resize the disk and repeat c. and d.
     f. umount/reboot/remount and check that the md5 hashes are correct

Any suggestions?

Thank you very much,
Andrea

On Fri, Aug 5, 2022 at 9:38 PM Eric Wheeler <bcache@lists.ewheeler.net> wrote:
>
> On Wed, 3 Aug 2022, Andrea Tomassetti wrote:
> > Hi Coly,
> > In one of our previous emails you said that
> > > Currently bcache doesn’t support cache or backing device resize
> >
> > I was investigating this point and I actually found a solution. I
> > briefly tested it and it seems to work fine.
> > Basically what I'm doing is:
> >   1. Check if there's any discrepancy between the nr of sectors
> > reported by the bcache backing device (holder) and the nr of sectors
> > reported by its parent (slave).
> >   2. If the number of sectors of the two devices are not the same,
> > then call set_capacity_and_notify on the bcache device.
> >   3. From user space, depending on the fs used, grow the fs with some
> > utility (e.g. xfs_growfs)
> >
> > This works without any need of unmounting the mounted fs nor stopping
> > the bcache backing device.
>
> Well done! +1, would love to see a patch for this!
>
>
> > So my question is: am I missing something? Can this live resize cause
> > some problems (e.g. data loss)? Would it be useful if I send a patch on
> > this?
>
> A while a go we looked into doing this.  Here is the summary of our
> findings, not sure if there are any other considerations:
>
>   1. Create a sysfs file like /sys/block/bcache0/bcache/resize to trigger
>      resize on echo 1 >.
>   2. Refactor the set_capacity() bits from  bcache_device_init() into its
>      own function.
>   3. Put locks around bcache_device.full_dirty_stripes and
>      bcache_device.stripe_sectors_dirty.  Re-alloc+copy+free and zero the
>      new bytes at the end.  Grep where bcache_device.full_dirty_stripes is
>      used and make sure it is locked appropriately, probably in the
>      writeback thread, maybe other places too.
>
> The cachedev's don't know anything about the bdev size, so (according to
> Kent) they will "just work" by referencing new offsets that were
> previously beyond the disk. (This is basically the same as resizing the
> bdev and then unregister/re-register which is how we resize bdevs now.)
>
> As for resizing a cachedev, I've not looked at all---not sure about that
> one.  We always detach, resize, make-bcache and re-attach the new cache.
> Maybe it is similarly simple, but haven't looked.
>
>
> --
> Eric Wheeler
>
>
>
> >
> > Kind regards,
> > Andrea
> >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC] Live resize of backing device
  2022-09-06 13:22   ` Andrea Tomassetti
@ 2022-09-08  8:32     ` Andrea Tomassetti
  2022-09-19 11:42       ` Andrea Tomassetti
  2022-12-30 10:40       ` Coly Li
  0 siblings, 2 replies; 29+ messages in thread
From: Andrea Tomassetti @ 2022-09-08  8:32 UTC (permalink / raw)
  To: Coly Li; +Cc: Eric Wheeler, Kent Overstreet, linux-bcache

 From 59787372cf21af0b79e895578ae05b6586dfeb09 Mon Sep 17 00:00:00 2001
From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
Date: Thu, 8 Sep 2022 09:47:55 +0200
Subject: [PATCH] bcache: Add support for live resize of backing devices

Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
---
Hi Coly,
Here is the first version of the patch. There are some points I noted down
that I would like to discuss with you:
  - I found it pretty convenient to hook the call to the newly added function
    inside `register_bcache`. In fact, every time (at least from my
    understanding) a disk changes size, it will trigger a new probe and,
    thus, `register_bcache` will be triggered. The only inconvenience
    is that, on success, the function will output
    `error: capacity changed` even though it's not really an error.
  - I'm using `kvrealloc`, introduced in kernel version 5.15, to resize
    `stripe_sectors_dirty` and `full_dirty_stripes`. It shouldn't be a
    problem, right?
  - There is some reused code between this new function and
    `bcache_device_init`. Maybe I can move `const size_t max_stripes` to
    a broader scope or make it a macro, what do you think?

Thank you very much,
Andrea

  drivers/md/bcache/super.c | 75 ++++++++++++++++++++++++++++++++++++++-
  1 file changed, 74 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index ba3909bb6bea..9a77caf2a18f 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2443,6 +2443,76 @@ static bool bch_is_open(dev_t dev)
  	return bch_is_open_cache(dev) || bch_is_open_backing(dev);
  }

+static bool bch_update_capacity(dev_t dev)
+{
+	const size_t max_stripes = min_t(size_t, INT_MAX,
+					 SIZE_MAX / sizeof(atomic_t));
+
+	uint64_t n, n_old;
+	int nr_stripes_old;
+	bool res = false;
+
+	struct bcache_device *d;
+	struct cache_set *c, *tc;
+	struct cached_dev *dcp, *t, *dc = NULL;
+
+	uint64_t parent_nr_sectors;
+
+	list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
+		list_for_each_entry_safe(dcp, t, &c->cached_devs, list)
+			if (dcp->bdev->bd_dev == dev) {
+				dc = dcp;
+				goto dc_found;
+			}
+
+dc_found:
+	if (!dc)
+		return false;
+
+	parent_nr_sectors = bdev_nr_sectors(dc->bdev) - dc->sb.data_offset;
+
+	if (parent_nr_sectors == bdev_nr_sectors(dc->disk.disk->part0))
+		return false;
+
+	if (!set_capacity_and_notify(dc->disk.disk, parent_nr_sectors))
+		return false;
+
+	d = &dc->disk;
+
+	/* Force cached device sectors re-calc */
+	calc_cached_dev_sectors(d->c);
+
+	/* Block writeback thread */
+	down_write(&dc->writeback_lock);
+	nr_stripes_old = d->nr_stripes;
+	n = DIV_ROUND_UP_ULL(parent_nr_sectors, d->stripe_size);
+	if (!n || n > max_stripes) {
+		pr_err("nr_stripes too large or invalid: %llu (start sector beyond end of disk?)\n",
+			n);
+		goto unblock_and_exit;
+	}
+	d->nr_stripes = n;
+
+	n = d->nr_stripes * sizeof(atomic_t);
+	n_old = nr_stripes_old * sizeof(atomic_t);
+	d->stripe_sectors_dirty = kvrealloc(d->stripe_sectors_dirty, n_old,
+		n, GFP_KERNEL);
+	if (!d->stripe_sectors_dirty)
+		goto unblock_and_exit;
+
+	n = BITS_TO_LONGS(d->nr_stripes) * sizeof(unsigned long);
+	n_old = BITS_TO_LONGS(nr_stripes_old) * sizeof(unsigned long);
+	d->full_dirty_stripes = kvrealloc(d->full_dirty_stripes, n_old, n, GFP_KERNEL);
+	if (!d->full_dirty_stripes)
+		goto unblock_and_exit;
+
+	res = true;
+
+unblock_and_exit:
+	up_write(&dc->writeback_lock);
+	return res;
+}
+
  struct async_reg_args {
  	struct delayed_work reg_work;
  	char *path;
@@ -2569,7 +2639,10 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
  			mutex_lock(&bch_register_lock);
  			if (lookup_bdev(strim(path), &dev) == 0 &&
  			    bch_is_open(dev))
-				err = "device already registered";
+				if (bch_update_capacity(dev))
+					err = "capacity changed";
+				else
+					err = "device already registered";
  			else
  				err = "device busy";
  			mutex_unlock(&bch_register_lock);
--
2.37.3



On 6/9/22 15:22, Andrea Tomassetti wrote:
> Hi Coly,
> I have finally some time to work on the patch. I already have a first
> prototype that seems to work but, before sending it, I would like to
> ask you two questions:
>    1. Inspecting the code, I found that the only lock mechanism is the
> writeback_lock semaphore. Am I correct?
>    2. How can I effectively test my patch? So far I'm doing something like this:
>       a. make-bcache --writeback -B /dev/vdb -C /dev/vdc
>       b. mkfs.xfs /dev/bcache0
>       c. dd if=/dev/random of=/mnt/random bs=1M count=1000
>       d. md5sum /mnt/random | tee /mnt/random.md5
>       e. live resize the disk and repeat c. and d.
>       f. umount/reboot/remount and check that the md5 hashes are correct
> 
> Any suggestions?
> 
> Thank you very much,
> Andrea
> 
> On Fri, Aug 5, 2022 at 9:38 PM Eric Wheeler <bcache@lists.ewheeler.net> wrote:
>>
>> On Wed, 3 Aug 2022, Andrea Tomassetti wrote:
>>> Hi Coly,
>>> In one of our previous emails you said that
>>>> Currently bcache doesn’t support cache or backing device resize
>>>
>>> I was investigating this point and I actually found a solution. I
>>> briefly tested it and it seems to work fine.
>>> Basically what I'm doing is:
>>>    1. Check if there's any discrepancy between the nr of sectors
>>> reported by the bcache backing device (holder) and the nr of sectors
>>> reported by its parent (slave).
>>>    2. If the number of sectors of the two devices are not the same,
>>> then call set_capacity_and_notify on the bcache device.
>>>    3. From user space, depending on the fs used, grow the fs with some
>>> utility (e.g. xfs_growfs)
>>>
>>> This works without any need of unmounting the mounted fs nor stopping
>>> the bcache backing device.
>>
>> Well done! +1, would love to see a patch for this!
>>
>>
>>> So my question is: am I missing something? Can this live resize cause
>>> some problems (e.g. data loss)? Would it be useful if I send a patch on
>>> this?
>>
>> A while a go we looked into doing this.  Here is the summary of our
>> findings, not sure if there are any other considerations:
>>
>>    1. Create a sysfs file like /sys/block/bcache0/bcache/resize to trigger
>>       resize on echo 1 >.
>>    2. Refactor the set_capacity() bits from  bcache_device_init() into its
>>       own function.
>>    3. Put locks around bcache_device.full_dirty_stripes and
>>       bcache_device.stripe_sectors_dirty.  Re-alloc+copy+free and zero the
>>       new bytes at the end.  Grep where bcache_device.full_dirty_stripes is
>>       used and make sure it is locked appropriately, probably in the
>>       writeback thread, maybe other places too.
>>
>> The cachedev's don't know anything about the bdev size, so (according to
>> Kent) they will "just work" by referencing new offsets that were
>> previously beyond the disk. (This is basically the same as resizing the
>> bdev and then unregister/re-register which is how we resize bdevs now.)
>>
>> As for resizing a cachedev, I've not looked at all---not sure about that
>> one.  We always detach, resize, make-bcache and re-attach the new cache.
>> Maybe it is similarly simple, but haven't looked.
>>
>>
>> --
>> Eric Wheeler
>>
>>
>>
>>>
>>> Kind regards,
>>> Andrea
>>>

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RFC] Live resize of backing device
  2022-09-08  8:32     ` Andrea Tomassetti
@ 2022-09-19 11:42       ` Andrea Tomassetti
  2022-09-19 12:16         ` Coly Li
  2022-12-30 10:40       ` Coly Li
  1 sibling, 1 reply; 29+ messages in thread
From: Andrea Tomassetti @ 2022-09-19 11:42 UTC (permalink / raw)
  To: Coly Li; +Cc: Eric Wheeler, Kent Overstreet, linux-bcache

Hi Coly,
have you had any time to take a look at this? Would you prefer that I 
send the patch as a separate thread?

Thank you very much,
Andrea

On 8/9/22 10:32, Andrea Tomassetti wrote:
>  From 59787372cf21af0b79e895578ae05b6586dfeb09 Mon Sep 17 00:00:00 2001
> From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> Date: Thu, 8 Sep 2022 09:47:55 +0200
> Subject: [PATCH] bcache: Add support for live resize of backing devices
> 
> Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> ---
> Hi Coly,
> Here is the first version of the patch. There are some points I noted down
> that I would like to discuss with you:
>   - I found it pretty convenient to hook the call of the new added function
>     inside the `register_bcache`. In fact, every time (at least from my
>     understandings) a disk changes size, it will trigger a new probe and,
>     thus, `register_bcache` will be triggered. The only inconvenient
>     is that, in case of success, the function will output
>     `error: capacity changed` even if it's not really an error.
>   - I'm using `kvrealloc`, introduced in kernel version 5.15, to resize
>     `stripe_sectors_dirty` and `full_dirty_stripes`. It shouldn't be a
>     problem, right?
>   - There is some reused code between this new function and
>     `bcache_device_init`. Maybe I can move `const size_t max_stripes` to
>     a broader scope or make it a macro, what do you think?
> 
> Thank you very much,
> Andrea
> 
>   drivers/md/bcache/super.c | 75 ++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 74 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index ba3909bb6bea..9a77caf2a18f 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -2443,6 +2443,76 @@ static bool bch_is_open(dev_t dev)
>       return bch_is_open_cache(dev) || bch_is_open_backing(dev);
>   }
> 
> +static bool bch_update_capacity(dev_t dev)
> +{
> +    const size_t max_stripes = min_t(size_t, INT_MAX,
> +                     SIZE_MAX / sizeof(atomic_t));
> +
> +    uint64_t n, n_old;
> +    int nr_stripes_old;
> +    bool res = false;
> +
> +    struct bcache_device *d;
> +    struct cache_set *c, *tc;
> +    struct cached_dev *dcp, *t, *dc = NULL;
> +
> +    uint64_t parent_nr_sectors;
> +
> +    list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
> +        list_for_each_entry_safe(dcp, t, &c->cached_devs, list)
> +            if (dcp->bdev->bd_dev == dev) {
> +                dc = dcp;
> +                goto dc_found;
> +            }
> +
> +dc_found:
> +    if (!dc)
> +        return false;
> +
> +    parent_nr_sectors = bdev_nr_sectors(dc->bdev) - dc->sb.data_offset;
> +
> +    if (parent_nr_sectors == bdev_nr_sectors(dc->disk.disk->part0))
> +        return false;
> +
> +    if (!set_capacity_and_notify(dc->disk.disk, parent_nr_sectors))
> +        return false;
> +
> +    d = &dc->disk;
> +
> +    /* Force cached device sectors re-calc */
> +    calc_cached_dev_sectors(d->c);
> +
> +    /* Block writeback thread */
> +    down_write(&dc->writeback_lock);
> +    nr_stripes_old = d->nr_stripes;
> +    n = DIV_ROUND_UP_ULL(parent_nr_sectors, d->stripe_size);
> +    if (!n || n > max_stripes) {
> +        pr_err("nr_stripes too large or invalid: %llu (start sector 
> beyond end of disk?)\n",
> +            n);
> +        goto unblock_and_exit;
> +    }
> +    d->nr_stripes = n;
> +
> +    n = d->nr_stripes * sizeof(atomic_t);
> +    n_old = nr_stripes_old * sizeof(atomic_t);
> +    d->stripe_sectors_dirty = kvrealloc(d->stripe_sectors_dirty, n_old,
> +        n, GFP_KERNEL);
> +    if (!d->stripe_sectors_dirty)
> +        goto unblock_and_exit;
> +
> +    n = BITS_TO_LONGS(d->nr_stripes) * sizeof(unsigned long);
> +    n_old = BITS_TO_LONGS(nr_stripes_old) * sizeof(unsigned long);
> +    d->full_dirty_stripes = kvrealloc(d->full_dirty_stripes, n_old, n, 
> GFP_KERNEL);
> +    if (!d->full_dirty_stripes)
> +        goto unblock_and_exit;
> +
> +    res = true;
> +
> +unblock_and_exit:
> +    up_write(&dc->writeback_lock);
> +    return res;
> +}
> +
>   struct async_reg_args {
>       struct delayed_work reg_work;
>       char *path;
> @@ -2569,7 +2639,10 @@ static ssize_t register_bcache(struct kobject *k, 
> struct kobj_attribute *attr,
>               mutex_lock(&bch_register_lock);
>               if (lookup_bdev(strim(path), &dev) == 0 &&
>                   bch_is_open(dev))
> -                err = "device already registered";
> +                if (bch_update_capacity(dev))
> +                    err = "capacity changed";
> +                else
> +                    err = "device already registered";
>               else
>                   err = "device busy";
>               mutex_unlock(&bch_register_lock);
> -- 
> 2.37.3
> 
> 
> 
> On 6/9/22 15:22, Andrea Tomassetti wrote:
>> Hi Coly,
>> I have finally some time to work on the patch. I already have a first
>> prototype that seems to work but, before sending it, I would like to
>> ask you two questions:
>>    1. Inspecting the code, I found that the only lock mechanism is the
>> writeback_lock semaphore. Am I correct?
>>    2. How can I effectively test my patch? So far I'm doing something 
>> like this:
>>       a. make-bcache --writeback -B /dev/vdb -C /dev/vdc
>>       b. mkfs.xfs /dev/bcache0
>>       c. dd if=/dev/random of=/mnt/random bs=1M count=1000
>>       d. md5sum /mnt/random | tee /mnt/random.md5
>>       e. live resize the disk and repeat c. and d.
>>       f. umount/reboot/remount and check that the md5 hashes are correct
>>
>> Any suggestions?
>>
>> Thank you very much,
>> Andrea
>>
>> On Fri, Aug 5, 2022 at 9:38 PM Eric Wheeler 
>> <bcache@lists.ewheeler.net> wrote:
>>>
>>> On Wed, 3 Aug 2022, Andrea Tomassetti wrote:
>>>> Hi Coly,
>>>> In one of our previous emails you said that
>>>>> Currently bcache doesn’t support cache or backing device resize
>>>>
>>>> I was investigating this point and I actually found a solution. I
>>>> briefly tested it and it seems to work fine.
>>>> Basically what I'm doing is:
>>>>    1. Check if there's any discrepancy between the nr of sectors
>>>> reported by the bcache backing device (holder) and the nr of sectors
>>>> reported by its parent (slave).
>>>>    2. If the number of sectors of the two devices are not the same,
>>>> then call set_capacity_and_notify on the bcache device.
>>>>    3. From user space, depending on the fs used, grow the fs with some
>>>> utility (e.g. xfs_growfs)
>>>>
>>>> This works without any need of unmounting the mounted fs nor stopping
>>>> the bcache backing device.
>>>
>>> Well done! +1, would love to see a patch for this!
>>>
>>>
>>>> So my question is: am I missing something? Can this live resize cause
>>>> some problems (e.g. data loss)? Would it be useful if I send a patch on
>>>> this?
>>>
>>> A while a go we looked into doing this.  Here is the summary of our
>>> findings, not sure if there are any other considerations:
>>>
>>>    1. Create a sysfs file like /sys/block/bcache0/bcache/resize to 
>>> trigger
>>>       resize on echo 1 >.
>>>    2. Refactor the set_capacity() bits from  bcache_device_init() 
>>> into its
>>>       own function.
>>>    3. Put locks around bcache_device.full_dirty_stripes and
>>>       bcache_device.stripe_sectors_dirty.  Re-alloc+copy+free and 
>>> zero the
>>>       new bytes at the end.  Grep where 
>>> bcache_device.full_dirty_stripes is
>>>       used and make sure it is locked appropriately, probably in the
>>>       writeback thread, maybe other places too.
>>>
>>> The cachedev's don't know anything about the bdev size, so (according to
>>> Kent) they will "just work" by referencing new offsets that were
>>> previously beyond the disk. (This is basically the same as resizing the
>>> bdev and then unregister/re-register which is how we resize bdevs now.)
>>>
>>> As for resizing a cachedev, I've not looked at all---not sure about that
>>> one.  We always detach, resize, make-bcache and re-attach the new cache.
>>> Maybe it is similarly simple, but haven't looked.
>>>
>>>
>>> -- 
>>> Eric Wheeler
>>>
>>>
>>>
>>>>
>>>> Kind regards,
>>>> Andrea
>>>>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC] Live resize of backing device
  2022-09-19 11:42       ` Andrea Tomassetti
@ 2022-09-19 12:16         ` Coly Li
  2022-12-09  8:57           ` Andrea Tomassetti
  0 siblings, 1 reply; 29+ messages in thread
From: Coly Li @ 2022-09-19 12:16 UTC (permalink / raw)
  To: Andrea Tomassetti; +Cc: Eric Wheeler, Kent Overstreet, linux-bcache



> On Sep 19, 2022, at 19:42, Andrea Tomassetti <andrea.tomassetti-opensource@devo.com> wrote:
> 
> Hi Coly,
> have you had any time to take a look at this? Do you prefer if I send the patch as a separate thread?
> 
> Thank you very much,
> Andrea


Yes, it is in my queue; as soon as I finish the tasks on hand, I will take a look at it.

Thanks.

Coly Li


> 
> On 8/9/22 10:32, Andrea Tomassetti wrote:
>> From 59787372cf21af0b79e895578ae05b6586dfeb09 Mon Sep 17 00:00:00 2001
>> From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
>> Date: Thu, 8 Sep 2022 09:47:55 +0200
>> Subject: [PATCH] bcache: Add support for live resize of backing devices
>> Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
>> ---
>> Hi Coly,
>> Here is the first version of the patch. There are some points I noted down
>> that I would like to discuss with you:
>>  - I found it pretty convenient to hook the call of the new added function
>>    inside the `register_bcache`. In fact, every time (at least from my
>>    understandings) a disk changes size, it will trigger a new probe and,
>>    thus, `register_bcache` will be triggered. The only inconvenient
>>    is that, in case of success, the function will output
>>    `error: capacity changed` even if it's not really an error.
>>  - I'm using `kvrealloc`, introduced in kernel version 5.15, to resize
>>    `stripe_sectors_dirty` and `full_dirty_stripes`. It shouldn't be a
>>    problem, right?
>>  - There is some reused code between this new function and
>>    `bcache_device_init`. Maybe I can move `const size_t max_stripes` to
>>    a broader scope or make it a macro, what do you think?
>> Thank you very much,
>> Andrea
>>  drivers/md/bcache/super.c | 75 ++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 74 insertions(+), 1 deletion(-)
>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
>> index ba3909bb6bea..9a77caf2a18f 100644
>> --- a/drivers/md/bcache/super.c
>> +++ b/drivers/md/bcache/super.c
>> @@ -2443,6 +2443,76 @@ static bool bch_is_open(dev_t dev)
>>      return bch_is_open_cache(dev) || bch_is_open_backing(dev);
>>  }
>> +static bool bch_update_capacity(dev_t dev)
>> +{
>> +    const size_t max_stripes = min_t(size_t, INT_MAX,
>> +                     SIZE_MAX / sizeof(atomic_t));
>> +
>> +    uint64_t n, n_old;
>> +    int nr_stripes_old;
>> +    bool res = false;
>> +
>> +    struct bcache_device *d;
>> +    struct cache_set *c, *tc;
>> +    struct cached_dev *dcp, *t, *dc = NULL;
>> +
>> +    uint64_t parent_nr_sectors;
>> +
>> +    list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
>> +        list_for_each_entry_safe(dcp, t, &c->cached_devs, list)
>> +            if (dcp->bdev->bd_dev == dev) {
>> +                dc = dcp;
>> +                goto dc_found;
>> +            }
>> +
>> +dc_found:
>> +    if (!dc)
>> +        return false;
>> +
>> +    parent_nr_sectors = bdev_nr_sectors(dc->bdev) - dc->sb.data_offset;
>> +
>> +    if (parent_nr_sectors == bdev_nr_sectors(dc->disk.disk->part0))
>> +        return false;
>> +
>> +    if (!set_capacity_and_notify(dc->disk.disk, parent_nr_sectors))
>> +        return false;
>> +
>> +    d = &dc->disk;
>> +
>> +    /* Force cached device sectors re-calc */
>> +    calc_cached_dev_sectors(d->c);
>> +
>> +    /* Block writeback thread */
>> +    down_write(&dc->writeback_lock);
>> +    nr_stripes_old = d->nr_stripes;
>> +    n = DIV_ROUND_UP_ULL(parent_nr_sectors, d->stripe_size);
>> +    if (!n || n > max_stripes) {
>> +        pr_err("nr_stripes too large or invalid: %llu (start sector beyond end of disk?)\n",
>> +            n);
>> +        goto unblock_and_exit;
>> +    }
>> +    d->nr_stripes = n;
>> +
>> +    n = d->nr_stripes * sizeof(atomic_t);
>> +    n_old = nr_stripes_old * sizeof(atomic_t);
>> +    d->stripe_sectors_dirty = kvrealloc(d->stripe_sectors_dirty, n_old,
>> +        n, GFP_KERNEL);
>> +    if (!d->stripe_sectors_dirty)
>> +        goto unblock_and_exit;
>> +
>> +    n = BITS_TO_LONGS(d->nr_stripes) * sizeof(unsigned long);
>> +    n_old = BITS_TO_LONGS(nr_stripes_old) * sizeof(unsigned long);
>> +    d->full_dirty_stripes = kvrealloc(d->full_dirty_stripes, n_old, n, GFP_KERNEL);
>> +    if (!d->full_dirty_stripes)
>> +        goto unblock_and_exit;
>> +
>> +    res = true;
>> +
>> +unblock_and_exit:
>> +    up_write(&dc->writeback_lock);
>> +    return res;
>> +}
>> +
>>  struct async_reg_args {
>>      struct delayed_work reg_work;
>>      char *path;
>> @@ -2569,7 +2639,10 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
>>              mutex_lock(&bch_register_lock);
>>              if (lookup_bdev(strim(path), &dev) == 0 &&
>>                  bch_is_open(dev))
>> -                err = "device already registered";
>> +                if (bch_update_capacity(dev))
>> +                    err = "capacity changed";
>> +                else
>> +                    err = "device already registered";
>>              else
>>                  err = "device busy";
>>              mutex_unlock(&bch_register_lock);
>> -- 
>> 2.37.3
>> On 6/9/22 15:22, Andrea Tomassetti wrote:
>>> Hi Coly,
>>> I have finally some time to work on the patch. I already have a first
>>> prototype that seems to work but, before sending it, I would like to
>>> ask you two questions:
>>>    1. Inspecting the code, I found that the only lock mechanism is the
>>> writeback_lock semaphore. Am I correct?
>>>    2. How can I effectively test my patch? So far I'm doing something like this:
>>>       a. make-bcache --writeback -B /dev/vdb -C /dev/vdc
>>>       b. mkfs.xfs /dev/bcache0
>>>       c. dd if=/dev/random of=/mnt/random bs=1M count=1000
>>>       d. md5sum /mnt/random | tee /mnt/random.md5
>>>       e. live resize the disk and repeat c. and d.
>>>       f. umount/reboot/remount and check that the md5 hashes are correct
>>> 
>>> Any suggestions?
>>> 
>>> Thank you very much,
>>> Andrea
>>> 
>>> On Fri, Aug 5, 2022 at 9:38 PM Eric Wheeler <bcache@lists.ewheeler.net> wrote:
>>>> 
>>>> On Wed, 3 Aug 2022, Andrea Tomassetti wrote:
>>>>> Hi Coly,
>>>>> In one of our previous emails you said that
>>>>>> Currently bcache doesn’t support cache or backing device resize
>>>>> 
>>>>> I was investigating this point and I actually found a solution. I
>>>>> briefly tested it and it seems to work fine.
>>>>> Basically what I'm doing is:
>>>>>    1. Check if there's any discrepancy between the nr of sectors
>>>>> reported by the bcache backing device (holder) and the nr of sectors
>>>>> reported by its parent (slave).
>>>>>    2. If the number of sectors of the two devices are not the same,
>>>>> then call set_capacity_and_notify on the bcache device.
>>>>>    3. From user space, depending on the fs used, grow the fs with some
>>>>> utility (e.g. xfs_growfs)
>>>>> 
>>>>> This works without any need to unmount the mounted fs or stop
>>>>> the bcache backing device.
>>>> 
>>>> Well done! +1, would love to see a patch for this!
>>>> 
>>>> 
>>>>> So my question is: am I missing something? Can this live resize cause
>>>>> some problems (e.g. data loss)? Would it be useful if I send a patch on
>>>>> this?
>>>> 
>>>> A while ago we looked into doing this.  Here is the summary of our
>>>> findings; not sure if there are any other considerations:
>>>> 
>>>>    1. Create a sysfs file like /sys/block/bcache0/bcache/resize to trigger
>>>>       resize on echo 1 >.
>>>>    2. Refactor the set_capacity() bits from  bcache_device_init() into its
>>>>       own function.
>>>>    3. Put locks around bcache_device.full_dirty_stripes and
>>>>       bcache_device.stripe_sectors_dirty.  Re-alloc+copy+free and zero the
>>>>       new bytes at the end.  Grep where bcache_device.full_dirty_stripes is
>>>>       used and make sure it is locked appropriately, probably in the
>>>>       writeback thread, maybe other places too.
>>>> 
>>>> The cachedevs don't know anything about the bdev size, so (according to
>>>> Kent) they will "just work" by referencing new offsets that were
>>>> previously beyond the disk. (This is basically the same as resizing the
>>>> bdev and then unregister/re-register which is how we resize bdevs now.)
>>>> 
>>>> As for resizing a cachedev, I've not looked at all---not sure about that
>>>> one.  We always detach, resize, make-bcache and re-attach the new cache.
>>>> Maybe it is similarly simple, but haven't looked.
>>>> 
>>>> 
>>>> -- 
>>>> Eric Wheeler
>>>> 
>>>> 
>>>> 
>>>>> 
>>>>> Kind regards,
>>>>> Andrea
>>>>> 



* Re: [RFC] Live resize of backing device
  2022-09-19 12:16         ` Coly Li
@ 2022-12-09  8:57           ` Andrea Tomassetti
  2022-12-09  9:36             ` Coly Li
  0 siblings, 1 reply; 29+ messages in thread
From: Andrea Tomassetti @ 2022-12-09  8:57 UTC (permalink / raw)
  To: Coly Li; +Cc: Eric Wheeler, Kent Overstreet, linux-bcache

Hi Coly,
just a kind reminder for this patch.

Thank you very much,
Andrea

On Mon, Sep 19, 2022 at 2:17 PM Coly Li <colyli@suse.de> wrote:
>
>
>
> > 2022年9月19日 19:42,Andrea Tomassetti <andrea.tomassetti-opensource@devo.com> 写道:
> >
> > Hi Coly,
> > have you had any time to take a look at this? Would you prefer that I send the patch as a separate thread?
> >
> > Thank you very much,
> > Andrea
>
>
> Yes, it is in my queue; as soon as I finish the tasks on hand, I will take a look at it.
>
> Thanks.
>
> Coly Li
>
>
> >
> > On 8/9/22 10:32, Andrea Tomassetti wrote:
> >> From 59787372cf21af0b79e895578ae05b6586dfeb09 Mon Sep 17 00:00:00 2001
> >> From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> >> Date: Thu, 8 Sep 2022 09:47:55 +0200
> >> Subject: [PATCH] bcache: Add support for live resize of backing devices
> >> Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> >> ---
> >> Hi Coly,
> >> Here is the first version of the patch. There are some points I noted down
> >> that I would like to discuss with you:
> >>  - I found it pretty convenient to hook the call of the newly added function
> >>    inside the `register_bcache`. In fact, every time (at least from my
> >>    understanding) a disk changes size, it will trigger a new probe and,
> >>    thus, `register_bcache` will be triggered. The only inconvenience
> >>    is that, in case of success, the function will output
> >>    `error: capacity changed` even if it's not really an error.
> >>  - I'm using `kvrealloc`, introduced in kernel version 5.15, to resize
> >>    `stripe_sectors_dirty` and `full_dirty_stripes`. It shouldn't be a
> >>    problem, right?
> >>  - There is some reused code between this new function and
> >>    `bcache_device_init`. Maybe I can move `const size_t max_stripes` to
> >>    a broader scope or make it a macro, what do you think?
> >> Thank you very much,
> >> Andrea
> >>  drivers/md/bcache/super.c | 75 ++++++++++++++++++++++++++++++++++++++-
> >>  1 file changed, 74 insertions(+), 1 deletion(-)
> >> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> >> index ba3909bb6bea..9a77caf2a18f 100644
> >> --- a/drivers/md/bcache/super.c
> >> +++ b/drivers/md/bcache/super.c
> >> @@ -2443,6 +2443,76 @@ static bool bch_is_open(dev_t dev)
> >>      return bch_is_open_cache(dev) || bch_is_open_backing(dev);
> >>  }
> >> +static bool bch_update_capacity(dev_t dev)
> >> +{
> >> +    const size_t max_stripes = min_t(size_t, INT_MAX,
> >> +                     SIZE_MAX / sizeof(atomic_t));
> >> +
> >> +    uint64_t n, n_old;
> >> +    int nr_stripes_old;
> >> +    bool res = false;
> >> +
> >> +    struct bcache_device *d;
> >> +    struct cache_set *c, *tc;
> >> +    struct cached_dev *dcp, *t, *dc = NULL;
> >> +
> >> +    uint64_t parent_nr_sectors;
> >> +
> >> +    list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
> >> +        list_for_each_entry_safe(dcp, t, &c->cached_devs, list)
> >> +            if (dcp->bdev->bd_dev == dev) {
> >> +                dc = dcp;
> >> +                goto dc_found;
> >> +            }
> >> +
> >> +dc_found:
> >> +    if (!dc)
> >> +        return false;
> >> +
> >> +    parent_nr_sectors = bdev_nr_sectors(dc->bdev) - dc->sb.data_offset;
> >> +
> >> +    if (parent_nr_sectors == bdev_nr_sectors(dc->disk.disk->part0))
> >> +        return false;
> >> +
> >> +    if (!set_capacity_and_notify(dc->disk.disk, parent_nr_sectors))
> >> +        return false;
> >> +
> >> +    d = &dc->disk;
> >> +
> >> +    /* Force cached device sectors re-calc */
> >> +    calc_cached_dev_sectors(d->c);
> >> +
> >> +    /* Block writeback thread */
> >> +    down_write(&dc->writeback_lock);
> >> +    nr_stripes_old = d->nr_stripes;
> >> +    n = DIV_ROUND_UP_ULL(parent_nr_sectors, d->stripe_size);
> >> +    if (!n || n > max_stripes) {
> >> +        pr_err("nr_stripes too large or invalid: %llu (start sector beyond end of disk?)\n",
> >> +            n);
> >> +        goto unblock_and_exit;
> >> +    }
> >> +    d->nr_stripes = n;
> >> +
> >> +    n = d->nr_stripes * sizeof(atomic_t);
> >> +    n_old = nr_stripes_old * sizeof(atomic_t);
> >> +    d->stripe_sectors_dirty = kvrealloc(d->stripe_sectors_dirty, n_old,
> >> +        n, GFP_KERNEL);
> >> +    if (!d->stripe_sectors_dirty)
> >> +        goto unblock_and_exit;
> >> +
> >> +    n = BITS_TO_LONGS(d->nr_stripes) * sizeof(unsigned long);
> >> +    n_old = BITS_TO_LONGS(nr_stripes_old) * sizeof(unsigned long);
> >> +    d->full_dirty_stripes = kvrealloc(d->full_dirty_stripes, n_old, n, GFP_KERNEL);
> >> +    if (!d->full_dirty_stripes)
> >> +        goto unblock_and_exit;
> >> +
> >> +    res = true;
> >> +
> >> +unblock_and_exit:
> >> +    up_write(&dc->writeback_lock);
> >> +    return res;
> >> +}
> >> +
> >>  struct async_reg_args {
> >>      struct delayed_work reg_work;
> >>      char *path;
> >> @@ -2569,7 +2639,10 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
> >>              mutex_lock(&bch_register_lock);
> >>              if (lookup_bdev(strim(path), &dev) == 0 &&
> >>                  bch_is_open(dev))
> >> -                err = "device already registered";
> >> +                if (bch_update_capacity(dev))
> >> +                    err = "capacity changed";
> >> +                else
> >> +                    err = "device already registered";
> >>              else
> >>                  err = "device busy";
> >>              mutex_unlock(&bch_register_lock);
> >> --
> >> 2.37.3
> >> On 6/9/22 15:22, Andrea Tomassetti wrote:
> >>> Hi Coly,
> >>> I finally have some time to work on the patch. I already have a first
> >>> prototype that seems to work but, before sending it, I would like to
> >>> ask you two questions:
> >>>    1. Inspecting the code, I found that the only lock mechanism is the
> >>> writeback_lock semaphore. Am I correct?
> >>>    2. How can I effectively test my patch? So far I'm doing something like this:
> >>>       a. make-bcache --writeback -B /dev/vdb -C /dev/vdc
> >>>       b. mkfs.xfs /dev/bcache0
> >>>       c. dd if=/dev/random of=/mnt/random bs=1M count=1000
> >>>       d. md5sum /mnt/random | tee /mnt/random.md5
> >>>       e. live resize the disk and repeat c. and d.
> >>>       f. umount/reboot/remount and check that the md5 hashes are correct
> >>>
> >>> Any suggestions?
> >>>
> >>> Thank you very much,
> >>> Andrea
> >>>
> >>> On Fri, Aug 5, 2022 at 9:38 PM Eric Wheeler <bcache@lists.ewheeler.net> wrote:
> >>>>
> >>>> On Wed, 3 Aug 2022, Andrea Tomassetti wrote:
> >>>>> Hi Coly,
> >>>>> In one of our previous emails you said that
> >>>>>> Currently bcache doesn’t support cache or backing device resize
> >>>>>
> >>>>> I was investigating this point and I actually found a solution. I
> >>>>> briefly tested it and it seems to work fine.
> >>>>> Basically what I'm doing is:
> >>>>>    1. Check if there's any discrepancy between the nr of sectors
> >>>>> reported by the bcache backing device (holder) and the nr of sectors
> >>>>> reported by its parent (slave).
> >>>>>    2. If the number of sectors of the two devices are not the same,
> >>>>> then call set_capacity_and_notify on the bcache device.
> >>>>>    3. From user space, depending on the fs used, grow the fs with some
> >>>>> utility (e.g. xfs_growfs)
> >>>>>
> >>>>> This works without any need to unmount the mounted fs or stop
> >>>>> the bcache backing device.
> >>>>
> >>>> Well done! +1, would love to see a patch for this!
> >>>>
> >>>>
> >>>>> So my question is: am I missing something? Can this live resize cause
> >>>>> some problems (e.g. data loss)? Would it be useful if I send a patch on
> >>>>> this?
> >>>>
> >>>> A while ago we looked into doing this.  Here is the summary of our
> >>>> findings; not sure if there are any other considerations:
> >>>>
> >>>>    1. Create a sysfs file like /sys/block/bcache0/bcache/resize to trigger
> >>>>       resize on echo 1 >.
> >>>>    2. Refactor the set_capacity() bits from  bcache_device_init() into its
> >>>>       own function.
> >>>>    3. Put locks around bcache_device.full_dirty_stripes and
> >>>>       bcache_device.stripe_sectors_dirty.  Re-alloc+copy+free and zero the
> >>>>       new bytes at the end.  Grep where bcache_device.full_dirty_stripes is
> >>>>       used and make sure it is locked appropriately, probably in the
> >>>>       writeback thread, maybe other places too.
> >>>>
> >>>> The cachedevs don't know anything about the bdev size, so (according to
> >>>> Kent) they will "just work" by referencing new offsets that were
> >>>> previously beyond the disk. (This is basically the same as resizing the
> >>>> bdev and then unregister/re-register which is how we resize bdevs now.)
> >>>>
> >>>> As for resizing a cachedev, I've not looked at all---not sure about that
> >>>> one.  We always detach, resize, make-bcache and re-attach the new cache.
> >>>> Maybe it is similarly simple, but haven't looked.
> >>>>
> >>>>
> >>>> --
> >>>> Eric Wheeler
> >>>>
> >>>>
> >>>>
> >>>>>
> >>>>> Kind regards,
> >>>>> Andrea
> >>>>>
>


* Re: [RFC] Live resize of backing device
  2022-12-09  8:57           ` Andrea Tomassetti
@ 2022-12-09  9:36             ` Coly Li
  0 siblings, 0 replies; 29+ messages in thread
From: Coly Li @ 2022-12-09  9:36 UTC (permalink / raw)
  To: Andrea Tomassetti; +Cc: Eric Wheeler, Kent Overstreet, linux-bcache

Hi Andrea,

I hope to have time later this month, and will respond to you then.

Thanks.

Coly Li



> 2022年12月9日 16:57,Andrea Tomassetti <andrea.tomassetti-opensource@devo.com> 写道:
> 
> Hi Coly,
> just a kind reminder for this patch.
> 
> Thank you very much,
> Andrea
> 
> On Mon, Sep 19, 2022 at 2:17 PM Coly Li <colyli@suse.de> wrote:
>> 
>> 
>> 
>>> 2022年9月19日 19:42,Andrea Tomassetti <andrea.tomassetti-opensource@devo.com> 写道:
>>> 
>>> Hi Coly,
>>> have you had any time to take a look at this? Would you prefer that I send the patch as a separate thread?
>>> 
>>> Thank you very much,
>>> Andrea
>> 
>> 
>> Yes, it is in my queue; as soon as I finish the tasks on hand, I will take a look at it.
>> 
>> Thanks.
>> 
>> Coly Li
>> 
>> 
>>> 
>>> On 8/9/22 10:32, Andrea Tomassetti wrote:
>>>> From 59787372cf21af0b79e895578ae05b6586dfeb09 Mon Sep 17 00:00:00 2001
>>>> From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
>>>> Date: Thu, 8 Sep 2022 09:47:55 +0200
>>>> Subject: [PATCH] bcache: Add support for live resize of backing devices
>>>> Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
>>>> ---
>>>> Hi Coly,
>>>> Here is the first version of the patch. There are some points I noted down
>>>> that I would like to discuss with you:
>>>> - I found it pretty convenient to hook the call of the newly added function
>>>>   inside the `register_bcache`. In fact, every time (at least from my
>>>>   understanding) a disk changes size, it will trigger a new probe and,
>>>>   thus, `register_bcache` will be triggered. The only inconvenience
>>>>   is that, in case of success, the function will output
>>>>   `error: capacity changed` even if it's not really an error.
>>>> - I'm using `kvrealloc`, introduced in kernel version 5.15, to resize
>>>>   `stripe_sectors_dirty` and `full_dirty_stripes`. It shouldn't be a
>>>>   problem, right?
>>>> - There is some reused code between this new function and
>>>>   `bcache_device_init`. Maybe I can move `const size_t max_stripes` to
>>>>   a broader scope or make it a macro, what do you think?
>>>> Thank you very much,
>>>> Andrea
>>>> drivers/md/bcache/super.c | 75 ++++++++++++++++++++++++++++++++++++++-
>>>> 1 file changed, 74 insertions(+), 1 deletion(-)
>>>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
>>>> index ba3909bb6bea..9a77caf2a18f 100644
>>>> --- a/drivers/md/bcache/super.c
>>>> +++ b/drivers/md/bcache/super.c
>>>> @@ -2443,6 +2443,76 @@ static bool bch_is_open(dev_t dev)
>>>>     return bch_is_open_cache(dev) || bch_is_open_backing(dev);
>>>> }
>>>> +static bool bch_update_capacity(dev_t dev)
>>>> +{
>>>> +    const size_t max_stripes = min_t(size_t, INT_MAX,
>>>> +                     SIZE_MAX / sizeof(atomic_t));
>>>> +
>>>> +    uint64_t n, n_old;
>>>> +    int nr_stripes_old;
>>>> +    bool res = false;
>>>> +
>>>> +    struct bcache_device *d;
>>>> +    struct cache_set *c, *tc;
>>>> +    struct cached_dev *dcp, *t, *dc = NULL;
>>>> +
>>>> +    uint64_t parent_nr_sectors;
>>>> +
>>>> +    list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
>>>> +        list_for_each_entry_safe(dcp, t, &c->cached_devs, list)
>>>> +            if (dcp->bdev->bd_dev == dev) {
>>>> +                dc = dcp;
>>>> +                goto dc_found;
>>>> +            }
>>>> +
>>>> +dc_found:
>>>> +    if (!dc)
>>>> +        return false;
>>>> +
>>>> +    parent_nr_sectors = bdev_nr_sectors(dc->bdev) - dc->sb.data_offset;
>>>> +
>>>> +    if (parent_nr_sectors == bdev_nr_sectors(dc->disk.disk->part0))
>>>> +        return false;
>>>> +
>>>> +    if (!set_capacity_and_notify(dc->disk.disk, parent_nr_sectors))
>>>> +        return false;
>>>> +
>>>> +    d = &dc->disk;
>>>> +
>>>> +    /* Force cached device sectors re-calc */
>>>> +    calc_cached_dev_sectors(d->c);
>>>> +
>>>> +    /* Block writeback thread */
>>>> +    down_write(&dc->writeback_lock);
>>>> +    nr_stripes_old = d->nr_stripes;
>>>> +    n = DIV_ROUND_UP_ULL(parent_nr_sectors, d->stripe_size);
>>>> +    if (!n || n > max_stripes) {
>>>> +        pr_err("nr_stripes too large or invalid: %llu (start sector beyond end of disk?)\n",
>>>> +            n);
>>>> +        goto unblock_and_exit;
>>>> +    }
>>>> +    d->nr_stripes = n;
>>>> +
>>>> +    n = d->nr_stripes * sizeof(atomic_t);
>>>> +    n_old = nr_stripes_old * sizeof(atomic_t);
>>>> +    d->stripe_sectors_dirty = kvrealloc(d->stripe_sectors_dirty, n_old,
>>>> +        n, GFP_KERNEL);
>>>> +    if (!d->stripe_sectors_dirty)
>>>> +        goto unblock_and_exit;
>>>> +
>>>> +    n = BITS_TO_LONGS(d->nr_stripes) * sizeof(unsigned long);
>>>> +    n_old = BITS_TO_LONGS(nr_stripes_old) * sizeof(unsigned long);
>>>> +    d->full_dirty_stripes = kvrealloc(d->full_dirty_stripes, n_old, n, GFP_KERNEL);
>>>> +    if (!d->full_dirty_stripes)
>>>> +        goto unblock_and_exit;
>>>> +
>>>> +    res = true;
>>>> +
>>>> +unblock_and_exit:
>>>> +    up_write(&dc->writeback_lock);
>>>> +    return res;
>>>> +}
>>>> +
>>>> struct async_reg_args {
>>>>     struct delayed_work reg_work;
>>>>     char *path;
>>>> @@ -2569,7 +2639,10 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
>>>>             mutex_lock(&bch_register_lock);
>>>>             if (lookup_bdev(strim(path), &dev) == 0 &&
>>>>                 bch_is_open(dev))
>>>> -                err = "device already registered";
>>>> +                if (bch_update_capacity(dev))
>>>> +                    err = "capacity changed";
>>>> +                else
>>>> +                    err = "device already registered";
>>>>             else
>>>>                 err = "device busy";
>>>>             mutex_unlock(&bch_register_lock);
>>>> --
>>>> 2.37.3
>>>> On 6/9/22 15:22, Andrea Tomassetti wrote:
>>>>> Hi Coly,
>>>>> I finally have some time to work on the patch. I already have a first
>>>>> prototype that seems to work but, before sending it, I would like to
>>>>> ask you two questions:
>>>>>   1. Inspecting the code, I found that the only lock mechanism is the
>>>>> writeback_lock semaphore. Am I correct?
>>>>>   2. How can I effectively test my patch? So far I'm doing something like this:
>>>>>      a. make-bcache --writeback -B /dev/vdb -C /dev/vdc
>>>>>      b. mkfs.xfs /dev/bcache0
>>>>>      c. dd if=/dev/random of=/mnt/random bs=1M count=1000
>>>>>      d. md5sum /mnt/random | tee /mnt/random.md5
>>>>>      e. live resize the disk and repeat c. and d.
>>>>>      f. umount/reboot/remount and check that the md5 hashes are correct
>>>>> 
>>>>> Any suggestions?
>>>>> 
>>>>> Thank you very much,
>>>>> Andrea
>>>>> 
>>>>> On Fri, Aug 5, 2022 at 9:38 PM Eric Wheeler <bcache@lists.ewheeler.net> wrote:
>>>>>> 
>>>>>> On Wed, 3 Aug 2022, Andrea Tomassetti wrote:
>>>>>>> Hi Coly,
>>>>>>> In one of our previous emails you said that
>>>>>>>> Currently bcache doesn’t support cache or backing device resize
>>>>>>> 
>>>>>>> I was investigating this point and I actually found a solution. I
>>>>>>> briefly tested it and it seems to work fine.
>>>>>>> Basically what I'm doing is:
>>>>>>>   1. Check if there's any discrepancy between the nr of sectors
>>>>>>> reported by the bcache backing device (holder) and the nr of sectors
>>>>>>> reported by its parent (slave).
>>>>>>>   2. If the number of sectors of the two devices are not the same,
>>>>>>> then call set_capacity_and_notify on the bcache device.
>>>>>>>   3. From user space, depending on the fs used, grow the fs with some
>>>>>>> utility (e.g. xfs_growfs)
>>>>>>> 
>>>>>>> This works without any need to unmount the mounted fs or stop
>>>>>>> the bcache backing device.
>>>>>> 
>>>>>> Well done! +1, would love to see a patch for this!
>>>>>> 
>>>>>> 
>>>>>>> So my question is: am I missing something? Can this live resize cause
>>>>>>> some problems (e.g. data loss)? Would it be useful if I send a patch on
>>>>>>> this?
>>>>>> 
>>>>>> A while ago we looked into doing this.  Here is the summary of our
>>>>>> findings; not sure if there are any other considerations:
>>>>>> 
>>>>>>   1. Create a sysfs file like /sys/block/bcache0/bcache/resize to trigger
>>>>>>      resize on echo 1 >.
>>>>>>   2. Refactor the set_capacity() bits from  bcache_device_init() into its
>>>>>>      own function.
>>>>>>   3. Put locks around bcache_device.full_dirty_stripes and
>>>>>>      bcache_device.stripe_sectors_dirty.  Re-alloc+copy+free and zero the
>>>>>>      new bytes at the end.  Grep where bcache_device.full_dirty_stripes is
>>>>>>      used and make sure it is locked appropriately, probably in the
>>>>>>      writeback thread, maybe other places too.
>>>>>> 
>>>>>> The cachedevs don't know anything about the bdev size, so (according to
>>>>>> Kent) they will "just work" by referencing new offsets that were
>>>>>> previously beyond the disk. (This is basically the same as resizing the
>>>>>> bdev and then unregister/re-register which is how we resize bdevs now.)
>>>>>> 
>>>>>> As for resizing a cachedev, I've not looked at all---not sure about that
>>>>>> one.  We always detach, resize, make-bcache and re-attach the new cache.
>>>>>> Maybe it is similarly simple, but haven't looked.
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Eric Wheeler
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> Kind regards,
>>>>>>> Andrea
>>>>>>> 
>> 



* Re: [RFC] Live resize of backing device
  2022-09-08  8:32     ` Andrea Tomassetti
  2022-09-19 11:42       ` Andrea Tomassetti
@ 2022-12-30 10:40       ` Coly Li
  2023-01-11 16:01         ` Andrea Tomassetti
  1 sibling, 1 reply; 29+ messages in thread
From: Coly Li @ 2022-12-30 10:40 UTC (permalink / raw)
  To: Andrea Tomassetti; +Cc: Eric Wheeler, Kent Overstreet, linux-bcache

On 9/8/22 4:32 PM, Andrea Tomassetti wrote:
> From 59787372cf21af0b79e895578ae05b6586dfeb09 Mon Sep 17 00:00:00 2001
> From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> Date: Thu, 8 Sep 2022 09:47:55 +0200
> Subject: [PATCH] bcache: Add support for live resize of backing devices
>
> Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>

Hi Andrea,

I have just recovered from Omicron and am able to reply to email. Let me place
my comments inline.


> ---
> Hi Coly,
> Here is the first version of the patch. There are some points I noted down
> that I would like to discuss with you:
>  - I found it pretty convenient to hook the call of the newly added function
>    inside the `register_bcache`. In fact, every time (at least from my
>    understanding) a disk changes size, it will trigger a new probe and,
>    thus, `register_bcache` will be triggered. The only inconvenience
>    is that, in case of success, the function will output

The resize should be triggered manually, not automatically.

You may create a sysfs file under the cached device's directory, name it 
as "extend_size" or something else you think better.

Then the sysadmin may extend the cached device size explicitly on a 
predictable time.
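Such a hook might look roughly like the following sketch, loosely following the attribute style of drivers/md/bcache/sysfs.c. This is illustrative pseudocode only, not tested code: the attribute name and the bch_update_capacity() call are assumptions taken from this thread.

```c
/* Sketch only: declare a write-only "extend_size" file for the
 * cached device ... */
write_attribute(extend_size);

/* ... and, in the cached device's store path, trigger the resize
 * when the sysadmin writes to it: */
if (attr == &sysfs_extend_size)
	bch_update_capacity(dc->bdev->bd_dev);
```

With this, the sysadmin runs the resize at an explicit, predictable time, e.g. `echo 1 > /sys/block/bcache0/bcache/extend_size`.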

> `error: capacity changed` even if it's not really an error.
>  - I'm using `kvrealloc`, introduced in kernel version 5.15, to resize
>    `stripe_sectors_dirty` and `full_dirty_stripes`. It shouldn't be a
>    problem, right?
>  - There is some reused code between this new function and
>    `bcache_device_init`. Maybe I can move `const size_t max_stripes` to
>    a broader scope or make it a macro, what do you think?
>
> Thank you very much,
> Andrea
>
>  drivers/md/bcache/super.c | 75 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 74 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index ba3909bb6bea..9a77caf2a18f 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -2443,6 +2443,76 @@ static bool bch_is_open(dev_t dev)
>      return bch_is_open_cache(dev) || bch_is_open_backing(dev);
>  }
>
> +static bool bch_update_capacity(dev_t dev)
> +{
> +    const size_t max_stripes = min_t(size_t, INT_MAX,
> +                     SIZE_MAX / sizeof(atomic_t));
> +
> +    uint64_t n, n_old;
> +    int nr_stripes_old;
> +    bool res = false;
> +
> +    struct bcache_device *d;
> +    struct cache_set *c, *tc;
> +    struct cached_dev *dcp, *t, *dc = NULL;
> +
> +    uint64_t parent_nr_sectors;
> +
> +    list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
> +        list_for_each_entry_safe(dcp, t, &c->cached_devs, list)
> +            if (dcp->bdev->bd_dev == dev) {
> +                dc = dcp;
> +                goto dc_found;
> +            }
> +
> +dc_found:
> +    if (!dc)
> +        return false;
> +
> +    parent_nr_sectors = bdev_nr_sectors(dc->bdev) - dc->sb.data_offset;
> +
> +    if (parent_nr_sectors == bdev_nr_sectors(dc->disk.disk->part0))
> +        return false;
> +

The above code only handles the case where the whole disk is used as the
cached device. If a partition of a hard drive is used as the cache device,
and there is other data after this partition, that condition should be
handled as well. So far I am fine with only extending the size when the
whole hard drive is used as the cached device, but more code is necessary
to check for, and only permit, size extension in that condition.

> +    if (!set_capacity_and_notify(dc->disk.disk, parent_nr_sectors))
> +        return false;

The above code should run only after everything else is set.


> +
> +    d = &dc->disk;
> +
> +    /* Force cached device sectors re-calc */
> +    calc_cached_dev_sectors(d->c);

Here c->cached_dev_sectors might be changed; if any of the following
steps fails, it should be restored to its previous value.


> +
> +    /* Block writeback thread */
> +    down_write(&dc->writeback_lock);
> +    nr_stripes_old = d->nr_stripes;
> +    n = DIV_ROUND_UP_ULL(parent_nr_sectors, d->stripe_size);
> +    if (!n || n > max_stripes) {
> > +        pr_err("nr_stripes too large or invalid: %llu (start sector beyond end of disk?)\n",
> +            n);
> +        goto unblock_and_exit;
> +    }
> +    d->nr_stripes = n;
> +
> +    n = d->nr_stripes * sizeof(atomic_t);
> +    n_old = nr_stripes_old * sizeof(atomic_t);
> +    d->stripe_sectors_dirty = kvrealloc(d->stripe_sectors_dirty, n_old,
> +        n, GFP_KERNEL);
> +    if (!d->stripe_sectors_dirty)
> +        goto unblock_and_exit;
> +
> +    n = BITS_TO_LONGS(d->nr_stripes) * sizeof(unsigned long);
> +    n_old = BITS_TO_LONGS(nr_stripes_old) * sizeof(unsigned long);
> > +    d->full_dirty_stripes = kvrealloc(d->full_dirty_stripes, n_old, n, GFP_KERNEL);
> +    if (!d->full_dirty_stripes)
> +        goto unblock_and_exit;

If kvrealloc() fails and NULL is assigned to d->full_dirty_stripes, the previous
array should be restored.

> +
> +    res = true;
> +
> +unblock_and_exit:
> +    up_write(&dc->writeback_lock);
> +    return res;
> +}
> +

I didn't test the patch; at first glance, I feel the failure
handling should restore all previous values; otherwise the cache device
may be left in an inconsistent state when the size extension fails.


>  struct async_reg_args {
>      struct delayed_work reg_work;
>      char *path;
> @@ -2569,7 +2639,10 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
>              mutex_lock(&bch_register_lock);
>              if (lookup_bdev(strim(path), &dev) == 0 &&
>                  bch_is_open(dev))
> -                err = "device already registered";
> +                if (bch_update_capacity(dev))
> +                    err = "capacity changed";
> +                else
> +                    err = "device already registered";


As I said, it should be a separate write-only sysfs file under the cached
device's directory.


> else
>                  err = "device busy";
>              mutex_unlock(&bch_register_lock);
> -- 
> 2.37.3
>



* Re: [RFC] Live resize of backing device
  2022-12-30 10:40       ` Coly Li
@ 2023-01-11 16:01         ` Andrea Tomassetti
  2023-01-17 13:08           ` Error messages with kernel 6.1.[56] Pierre Juhen
  2023-01-17 16:18           ` [RFC] Live resize of backing device Coly Li
  0 siblings, 2 replies; 29+ messages in thread
From: Andrea Tomassetti @ 2023-01-11 16:01 UTC (permalink / raw)
  To: Coly Li; +Cc: Eric Wheeler, Kent Overstreet, linux-bcache

Hi Coly,
thank you for taking the time to reply, I really hope you are feeling
better now.

On Fri, Dec 30, 2022 at 11:41 AM Coly Li <colyli@suse.de> wrote:
>
> On 9/8/22 4:32 PM, Andrea Tomassetti wrote:
> > From 59787372cf21af0b79e895578ae05b6586dfeb09 Mon Sep 17 00:00:00 2001
> > From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> > Date: Thu, 8 Sep 2022 09:47:55 +0200
> > Subject: [PATCH] bcache: Add support for live resize of backing devices
> >
> > Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
>
> Hi Andrea,
>
> I have just recovered from Omicron and am able to reply to email. Let me place
> my comments inline.
>
>
> > ---
> > Hi Coly,
> > Here is the first version of the patch. There are some points I noted down
> > that I would like to discuss with you:
> >  - I found it pretty convenient to hook the call of the newly added function
> >    inside the `register_bcache`. In fact, every time (at least from my
> >    understanding) a disk changes size, it will trigger a new probe and,
> >    thus, `register_bcache` will be triggered. The only inconvenience
> >    is that, in case of success, the function will output
>
> The resize should be triggered manually, not automatically.
>
> You may create a sysfs file under the cached device's directory, name it
> as "extend_size" or something else you think better.
>
> Then the sysadmin may extend the cached device size explicitly on a
> predictable time.
>
> > `error: capacity changed` even if it's not really an error.
> >  - I'm using `kvrealloc`, introduced in kernel version 5.15, to resize
> >    `stripe_sectors_dirty` and `full_dirty_stripes`. It shouldn't be a
> >    problem, right?
> >  - There is some reused code between this new function and
> >    `bcache_device_init`. Maybe I can move `const size_t max_stripes` to
> >    a broader scope or make it a macro, what do you think?
> >
> > Thank you very much,
> > Andrea
> >
> >  drivers/md/bcache/super.c | 75 ++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 74 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> > index ba3909bb6bea..9a77caf2a18f 100644
> > --- a/drivers/md/bcache/super.c
> > +++ b/drivers/md/bcache/super.c
> > @@ -2443,6 +2443,76 @@ static bool bch_is_open(dev_t dev)
> >      return bch_is_open_cache(dev) || bch_is_open_backing(dev);
> >  }
> >
> > +static bool bch_update_capacity(dev_t dev)
> > +{
> > +    const size_t max_stripes = min_t(size_t, INT_MAX,
> > +                     SIZE_MAX / sizeof(atomic_t));
> > +
> > +    uint64_t n, n_old;
> > +    int nr_stripes_old;
> > +    bool res = false;
> > +
> > +    struct bcache_device *d;
> > +    struct cache_set *c, *tc;
> > +    struct cached_dev *dcp, *t, *dc = NULL;
> > +
> > +    uint64_t parent_nr_sectors;
> > +
> > +    list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
> > +        list_for_each_entry_safe(dcp, t, &c->cached_devs, list)
> > +            if (dcp->bdev->bd_dev == dev) {
> > +                dc = dcp;
> > +                goto dc_found;
> > +            }
> > +
> > +dc_found:
> > +    if (!dc)
> > +        return false;
> > +
> > +    parent_nr_sectors = bdev_nr_sectors(dc->bdev) - dc->sb.data_offset;
> > +
> > +    if (parent_nr_sectors == bdev_nr_sectors(dc->disk.disk->part0))
> > +        return false;
> > +
>
> The above code only handles the case where the whole disk is used as
> the cached device. If a partition of a hard drive is used as the cached
> device, and there is other data after this partition, that condition
> should be handled as well. So far I am fine with only extending the
> size when the whole hard drive is used as the cached device, but more
> code is necessary to check for, and only permit, size extension in
> that condition.
I don't know if there's a misalignment here, so let me be more clear:
this patch is intended to support the live resize of *backing
devices*; is this what you mean by "cached device"?
When I was working on the patch I didn't consider the possibility of
live resizing the cache devices, but it should be trivial.
So, as far as I understand, a partition cannot be used as a backing
device, correct? The whole disk should be used as a backing device;
that's why I'm checking, and that's why this check should be correct.
Am I wrong?

>
> > +    if (!set_capacity_and_notify(dc->disk.disk, parent_nr_sectors))
> > +        return false;
>
> The above code should only be run once everything else has been set up.
>
>
> > +
> > +    d = &dc->disk;
> > +
> > +    /* Force cached device sectors re-calc */
> > +    calc_cached_dev_sectors(d->c);
>
> Here c->cached_dev_sectors might be changed; if any of the following
> steps fails, it should be restored to its previous value.
>
>
> > +
> > +    /* Block writeback thread */
> > +    down_write(&dc->writeback_lock);
> > +    nr_stripes_old = d->nr_stripes;
> > +    n = DIV_ROUND_UP_ULL(parent_nr_sectors, d->stripe_size);
> > +    if (!n || n > max_stripes) {
> > +        pr_err("nr_stripes too large or invalid: %llu (start sector beyond end of disk?)\n",
> > +            n);
> > +        goto unblock_and_exit;
> > +    }
> > +    d->nr_stripes = n;
> > +
> > +    n = d->nr_stripes * sizeof(atomic_t);
> > +    n_old = nr_stripes_old * sizeof(atomic_t);
> > +    d->stripe_sectors_dirty = kvrealloc(d->stripe_sectors_dirty, n_old,
> > +        n, GFP_KERNEL);
> > +    if (!d->stripe_sectors_dirty)
> > +        goto unblock_and_exit;
> > +
> > +    n = BITS_TO_LONGS(d->nr_stripes) * sizeof(unsigned long);
> > +    n_old = BITS_TO_LONGS(nr_stripes_old) * sizeof(unsigned long);
> > +    d->full_dirty_stripes = kvrealloc(d->full_dirty_stripes, n_old,
> > +        n, GFP_KERNEL);
> > +    if (!d->full_dirty_stripes)
> > +        goto unblock_and_exit;
>
> If kvrealloc() fails and NULL is assigned to d->full_dirty_stripes,
> the previous array should be restored.
>
> > +
> > +    res = true;
> > +
> > +unblock_and_exit:
> > +    up_write(&dc->writeback_lock);
> > +    return res;
> > +}
> > +
>
> I didn't test the patch, but at first glance I feel the failure
> handling should restore all previous values; otherwise the cache
> device may be left in an inconsistent state when the size extension
> fails.
>
Completely agree with you on this point. I've changed it locally and,
as soon as we agree on the other points, I'll send a newer version of
the patch.
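The hazard is the classic `p = realloc(p, n)` mistake: `kvrealloc()`,
like userspace `realloc()`, leaves the old allocation intact when it
fails, so assigning its NULL return directly to
`d->stripe_sectors_dirty` or `d->full_dirty_stripes` discards (and
leaks) the old array. A minimal userspace sketch of the safe shape
(`safe_grow` is a hypothetical helper for illustration, not bcache
code):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Grow *buf from old_n to new_n bytes without losing the old array on
 * failure: assign through a temporary, so on error the caller still
 * holds the previous allocation. realloc(), like kvrealloc(), leaves
 * the old block untouched when it returns NULL.
 */
int safe_grow(void **buf, size_t old_n, size_t new_n)
{
    void *n = realloc(*buf, new_n);

    if (!n)
        return -1;  /* failure: *buf still points at the old array */

    if (new_n > old_n)
        memset((char *)n + old_n, 0, new_n - old_n); /* zero new tail */
    *buf = n;
    return 0;
}
```

In the patch this would mean keeping the old pointers (and the old
nr_stripes) in locals until both kvrealloc() calls have succeeded.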
>
> >  struct async_reg_args {
> >      struct delayed_work reg_work;
> >      char *path;
> > @@ -2569,7 +2639,10 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
> >              mutex_lock(&bch_register_lock);
> >              if (lookup_bdev(strim(path), &dev) == 0 &&
> >                  bch_is_open(dev))
> > -                err = "device already registered";
> > +                if (bch_update_capacity(dev))
> > +                    err = "capacity changed";
> > +                else
> > +                    err = "device already registered";
>
>
> As I said, it should be a separate write-only sysfs file under the
> cached device's directory.
Can I ask why you don't like the automatic resize approach? Why should
the resize be manual?
Someone already has to trigger the resize of the underlying block
device, so shouldn't that be explicit enough?

Andrea
>
>
> > else
> >                  err = "device busy";
> >              mutex_unlock(&bch_register_lock);
> > --
> > 2.37.3
> >
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Error messages with kernel 6.1.[56]
  2023-01-11 16:01         ` Andrea Tomassetti
@ 2023-01-17 13:08           ` Pierre Juhen
  2023-01-17 16:08             ` Coly Li
  2023-01-17 16:18           ` [RFC] Live resize of backing device Coly Li
  1 sibling, 1 reply; 29+ messages in thread
From: Pierre Juhen @ 2023-01-17 13:08 UTC (permalink / raw)
  To: linux-bcache

Hi bcache-list,

I wish you all the best for 2023.

I use fedora (36 for the moment).

Since migrating to the 6.1 kernel, I get the following messages at boot
time:

    janv. 17 07:20:50 pierre.juhen kernel: memcpy: detected
    field-spanning write (size 264) of single field "&i->j" at
    drivers/md/bcache/journal.c:152 (size 240)
    janv. 17 07:20:50 pierre.juhen kernel: WARNING: CPU: 0 PID: 650 at
    drivers/md/bcache/journal.c:152 journal_read_bucket+0x3d0/0x480 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel: Modules linked in:
    bcachefjes(-) raid0 crct10dif_pclmul crc32_pclmul crc32c_intel
    polyval_clmulni polyval_generic nvme ghash_clmulni_intel
    sha512_ssse3 nvme_core sp5100_>
    janv. 17 07:20:50 pierre.juhen kernel: CPU: 0 PID: 650 Comm:
    bcache-register Not tainted 6.1.6-100.fc36.x86_64 #1
    janv. 17 07:20:50 pierre.juhen kernel: RIP:
    0010:journal_read_bucket+0x3d0/0x480 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  ? bch_btree_exit+0x20/0x20
    [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  bch_journal_read+0x69/0x2f0
    [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      register_bcache+0x12c9/0x1bf0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel: memcpy: detected
    field-spanning write (size 24) of single field "&b->key" at
    drivers/md/bcache/btree.c:939 (size 16)
    janv. 17 07:20:50 pierre.juhen kernel: WARNING: CPU: 0 PID: 650 at
    drivers/md/bcache/btree.c:939 mca_alloc+0x421/0x4c0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel: Modules linked in: raid1
    fjes(-) bcacheraid0 crct10dif_pclmul crc32_pclmul crc32c_intel
    polyval_clmulni polyval_generic nvme ghash_clmulni_intel
    sha512_ssse3 nvme_core s>
    janv. 17 07:20:50 pierre.juhen kernel: CPU: 0 PID: 650 Comm:
    bcache-register Tainted: G        W          6.1.6-100.fc36.x86_64 #1
    janv. 17 07:20:50 pierre.juhen kernel: RIP:
    0010:mca_alloc+0x421/0x4c0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      bch_btree_node_get.part.0+0x109/0x2b0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      register_bcache+0x16ed/0x1bf0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel: memcpy: detected
    field-spanning write (size 24) of single field "&c->uuid_bucket" at
    drivers/md/bcache/super.c:465 (size 16)
    janv. 17 07:20:50 pierre.juhen kernel: WARNING: CPU: 0 PID: 650 at
    drivers/md/bcache/super.c:465 register_bcache+0x1b39/0x1bf0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel: Modules linked in: raid1
    fjes(-) bcacheraid0 crct10dif_pclmul crc32_pclmul crc32c_intel
    polyval_clmulni polyval_generic nvme ghash_clmulni_intel
    sha512_ssse3 nvme_core s>
    janv. 17 07:20:50 pierre.juhen kernel: CPU: 0 PID: 650 Comm:
    bcache-register Tainted: G        W          6.1.6-100.fc36.x86_64 #1
    janv. 17 07:20:50 pierre.juhen kernel: RIP:
    0010:register_bcache+0x1b39/0x1bf0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel: memcpy: detected
    field-spanning write (size 24) of single field "&temp.key" at
    drivers/md/bcache/extents.c:428 (size 16)
    janv. 17 07:20:50 pierre.juhen kernel: WARNING: CPU: 9 PID: 650 at
    drivers/md/bcache/extents.c:428 bch_extent_insert_fixup+0x54e/0x560
    [bcache]
    janv. 17 07:20:50 pierre.juhen kernel: Modules linked in: fjes(-)
    hid_logitech_hidpp(+) raid1 bcacheraid0 crct10dif_pclmul
    crc32_pclmul crc32c_intel polyval_clmulni polyval_generic nvme
    ghash_clmulni_intel sh>
    janv. 17 07:20:50 pierre.juhen kernel: CPU: 9 PID: 650 Comm:
    bcache-register Tainted: G        W          6.1.6-100.fc36.x86_64 #1
    janv. 17 07:20:50 pierre.juhen kernel: RIP:
    0010:bch_extent_insert_fixup+0x54e/0x560 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      bch_btree_insert_key+0xc5/0x260 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  btree_insert_key+0x4c/0xd0
    [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      bch_btree_insert_keys+0x3e/0x290 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  ?
    __bch_btree_ptr_invalid+0x5b/0xc0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      bch_btree_insert_node+0x143/0x3a0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  ?
    bch_btree_insert_check_key+0x150/0x150 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  btree_insert_fn+0x20/0x40
    [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      bch_btree_map_nodes_recurse+0xf0/0x170 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      __bch_btree_map_nodes+0x1dd/0x1f0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  ?
    bch_btree_insert_check_key+0x150/0x150 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  bch_btree_insert+0xcd/0x110
    [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      bch_journal_replay+0xe6/0x1f0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      register_bcache+0x1918/0x1bf0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel: memcpy: detected
    field-spanning write (size 24) of single field "&k.key" at
    drivers/md/bcache/btree.c:371 (size 16)
    janv. 17 07:20:50 pierre.juhen kernel: WARNING: CPU: 9 PID: 650 at
    drivers/md/bcache/btree.c:371 __bch_btree_node_write+0x59d/0x5d0
    [bcache]
    janv. 17 07:20:50 pierre.juhen kernel: Modules linked in: fjes(-)
    hid_logitech_hidpp(+) raid1 bcacheraid0 crct10dif_pclmul
    crc32_pclmul crc32c_intel polyval_clmulni polyval_generic nvme
    ghash_clmulni_intel sh>
    janv. 17 07:20:50 pierre.juhen kernel: CPU: 9 PID: 650 Comm:
    bcache-register Tainted: G        W          6.1.6-100.fc36.x86_64 #1
    janv. 17 07:20:50 pierre.juhen kernel: RIP:
    0010:__bch_btree_node_write+0x59d/0x5d0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  ?
    bch_btree_insert_keys+0x48/0x290 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      bch_btree_insert_node+0x32b/0x3a0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  ?
    bch_btree_insert_check_key+0x150/0x150 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  btree_insert_fn+0x20/0x40
    [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      bch_btree_map_nodes_recurse+0xf0/0x170 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      __bch_btree_map_nodes+0x1dd/0x1f0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  ?
    bch_btree_insert_check_key+0x150/0x150 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  bch_btree_insert+0xcd/0x110
    [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      bch_journal_replay+0xe6/0x1f0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      register_bcache+0x1918/0x1bf0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel: bcache: bch_journal_replay()
    journal replay done, 3591 keys in 304 entries, seq 6243591
    janv. 17 07:20:50 pierre.juhen kernel: bcache: register_cache()
    registered cache device nvme0n1p3
    janv. 17 07:20:50 pierre.juhen kernel: memcpy: detected
    field-spanning write (size 24) of single field
    "&w->data->btree_root" at drivers/md/bcache/journal.c:778 (size 16)
    janv. 17 07:20:50 pierre.juhen kernel: WARNING: CPU: 0 PID: 700 at
    drivers/md/bcache/journal.c:778 journal_write_unlocked+0x4f6/0x560
    [bcache]
    janv. 17 07:20:50 pierre.juhen kernel: Modules linked in: fjes(-)
    hid_logitech_hidpp(+) raid1 bcacheraid0 crct10dif_pclmul
    crc32_pclmul crc32c_intel polyval_clmulni polyval_generic nvme
    ghash_clmulni_intel sh>
    janv. 17 07:20:50 pierre.juhen kernel: CPU: 0 PID: 700 Comm:
    bcache_allocato Tainted: G        W          6.1.6-100.fc36.x86_64 #1
    janv. 17 07:20:50 pierre.juhen kernel: RIP:
    0010:journal_write_unlocked+0x4f6/0x560 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  bch_journal+0x302/0x370 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  bch_journal_meta+0x38/0x50
    [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  bch_prio_write+0x3af/0x4c0
    [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      bch_allocator_thread+0x199/0xcd0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  ?
    bch_invalidate_one_bucket+0x80/0x80 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel: memcpy: detected
    field-spanning write (size 24) of single field
    "&w->data->uuid_bucket" at drivers/md/bcache/journal.c:779 (size 16)
    janv. 17 07:20:50 pierre.juhen kernel: WARNING: CPU: 0 PID: 700 at
    drivers/md/bcache/journal.c:779 journal_write_unlocked+0x545/0x560
    [bcache]
    janv. 17 07:20:50 pierre.juhen kernel: Modules linked in: fjes(-)
    hid_logitech_hidpp(+) raid1 bcacheraid0 crct10dif_pclmul
    crc32_pclmul crc32c_intel polyval_clmulni polyval_generic nvme
    ghash_clmulni_intel sh>
    janv. 17 07:20:50 pierre.juhen kernel: CPU: 0 PID: 700 Comm:
    bcache_allocato Tainted: G        W          6.1.6-100.fc36.x86_64 #1
    janv. 17 07:20:50 pierre.juhen kernel: RIP:
    0010:journal_write_unlocked+0x545/0x560 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  bch_journal+0x302/0x370 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  bch_journal_meta+0x38/0x50
    [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  bch_prio_write+0x3af/0x4c0
    [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      bch_allocator_thread+0x199/0xcd0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  ?
    bch_invalidate_one_bucket+0x80/0x80 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel: bcache: register_bdev()
    registered backing device md126
    janv. 17 07:20:50 pierre.juhen kernel: memcpy: detected
    field-spanning write (size 24) of single field "&c->uuid_bucket" at
    drivers/md/bcache/super.c:520 (size 16)
    janv. 17 07:20:50 pierre.juhen kernel: WARNING: CPU: 11 PID: 703 at
    drivers/md/bcache/super.c:520 __uuid_write+0x1ad/0x1c0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel: Modules linked in:
    hid_logitech_hidpp raid1 bcacheraid0 crct10dif_pclmul crc32_pclmul
    crc32c_intel polyval_clmulni polyval_generic nvme
    ghash_clmulni_intel sha512_ssse3 >
    janv. 17 07:20:50 pierre.juhen kernel: CPU: 11 PID: 703 Comm:
    bcache-register Tainted: G        W          6.1.6-100.fc36.x86_64 #1
    janv. 17 07:20:50 pierre.juhen kernel: RIP:
    0010:__uuid_write+0x1ad/0x1c0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  ? bch_btree_exit+0x20/0x20
    [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      bch_cached_dev_attach+0x442/0x560 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      register_bcache.cold+0x26c/0x3e4 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel: bcache:
    bch_cached_dev_attach() Caching md126 as bcache0 on set
    4e3b28d9-0795-4f91-af97-f8429b325481
    janv. 17 07:20:50 pierre.juhen kernel: memcpy: detected
    field-spanning write (size 24) of single field "&w->key" at
    drivers/md/bcache/btree.c:2618 (size 16)
    janv. 17 07:20:50 pierre.juhen kernel: WARNING: CPU: 0 PID: 706 at
    drivers/md/bcache/btree.c:2618 refill_keybuf_fn+0x220/0x230 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel: Modules linked in:
    hid_logitech_hidpp raid1 bcacheraid0 crct10dif_pclmul crc32_pclmul
    crc32c_intel polyval_clmulni polyval_generic nvme
    ghash_clmulni_intel sha512_ssse3 >
    janv. 17 07:20:50 pierre.juhen kernel: CPU: 0 PID: 706 Comm:
    bcache_writebac Tainted: G        W          6.1.6-100.fc36.x86_64 #1
    janv. 17 07:20:50 pierre.juhen kernel: RIP:
    0010:refill_keybuf_fn+0x220/0x230 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      bch_btree_map_keys_recurse+0x60/0x180 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  ?
    bch_btree_gc_finish+0x390/0x390 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      bch_btree_map_keys_recurse+0xe6/0x180 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  ?
    bch_btree_gc_finish+0x390/0x390 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  ?
    idle_counter_exceeded+0x50/0x50 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      bch_btree_map_keys+0x1dd/0x1f0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  ?
    bch_btree_gc_finish+0x390/0x390 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  ?
    idle_counter_exceeded+0x50/0x50 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  bch_refill_keybuf+0xba/0x1e0
    [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  ?
    idle_counter_exceeded+0x50/0x50 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:
      bch_writeback_thread+0x31a/0x5b0 [bcache]
    janv. 17 07:20:50 pierre.juhen kernel:  ? read_dirty+0x450/0x450
    [bcache]
    janv. 17 07:20:52 pierre.juhen dracut-initqueue[783]: Scanning
    devices bcache0 md127 sdc1  for LVM volume groups
    janv. 17 07:20:52 pierre.juhen kernel: memcpy: detected
    field-spanning write (size 24) of single field "&ret->key" at
    drivers/md/bcache/alloc.c:586 (size 16)
    janv. 17 07:20:52 pierre.juhen kernel: WARNING: CPU: 5 PID: 870 at
    drivers/md/bcache/alloc.c:586 bch_alloc_sectors+0x4ad/0x500 [bcache]
    janv. 17 07:20:52 pierre.juhen kernel: Modules linked in:
    nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(POE) nvidia(POE)
    hid_logitech_hidpp raid1 bcacheraid0 crct10dif_pclmul crc32_pclmul
    crc32c_intel polyva>
    janv. 17 07:20:52 pierre.juhen kernel: RIP:
    0010:bch_alloc_sectors+0x4ad/0x500 [bcache]
    janv. 17 07:20:52 pierre.juhen kernel:
      bch_data_insert_start+0x150/0x4c0 [bcache]
    janv. 17 07:20:52 pierre.juhen kernel:
      cached_dev_submit_bio+0xb1d/0xd70 [bcache]

Thank you,

Regards,

Pierre


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Error messages with kernel 6.1.[56]
  2023-01-17 13:08           ` Error messages with kernel 6.1.[56] Pierre Juhen
@ 2023-01-17 16:08             ` Coly Li
  0 siblings, 0 replies; 29+ messages in thread
From: Coly Li @ 2023-01-17 16:08 UTC (permalink / raw)
  To: Pierre Juhen; +Cc: linux-bcache



> On 17 Jan 2023, at 21:08, Pierre Juhen <pierre.juhen@orange.fr> wrote:
> 
> Hi bcache-list,
> 
> I wish you all the best for 2023.
> 
> I use fedora (36 for the moment).
> 
> Since migrating to the 6.1 kernel, I get the following messages at boot time:
> 

This was reported, and fixed by Kees Cook with the following patches:

https://lore.kernel.org/lkml/20230106045327.never.413-kees@kernel.org/
https://lore.kernel.org/lkml/20230106053153.never.999-kees@kernel.org/
https://lore.kernel.org/lkml/20230106060229.never.047-kees@kernel.org/
https://lore.kernel.org/lkml/20230106061659.never.817-kees@kernel.org/

The patches are not upstream yet, maybe soon.
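In the meantime, the warnings themselves do not indicate corruption:
they come from the hardened memcpy() bounds checking
(CONFIG_FORTIFY_SOURCE), which flags copies larger than the
destination *field* even when the surrounding struct layout makes the
overrun intentional, as bcache's key-followed-by-inline-pointers
layout does. A self-contained userspace analogue (hypothetical struct
names, not the real bcache definitions):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical analogue of the layout involved -- NOT the real bcache
 * structs: a fixed-size key header intentionally followed by inline
 * data, so header + data are copied as one contiguous blob.
 */
struct key {
    uint64_t high;
    uint64_t low;
};

struct node {
    struct key key;     /* writes intentionally continue past here... */
    uint64_t ptrs[3];   /* ...into this trailing array */
};

size_t copy_keyblob(struct node *dst, const void *src, size_t nptr)
{
    size_t bytes = sizeof(struct key) + nptr * sizeof(uint64_t);

    /*
     * The kernel code effectively does memcpy(&dst->key, src, bytes).
     * CONFIG_FORTIFY_SOURCE only sees sizeof(dst->key) == 16 bytes of
     * destination, so a 40-byte copy is reported as a "field-spanning
     * write" even though the layout makes it safe. Copying via the
     * enclosing object behaves identically and gives the checker the
     * true bound:
     */
    memcpy(dst, src, bytes);
    return bytes;
}
```

Roughly, the linked fixes make the destination size the checker sees
match the bytes actually written, or annotate the copy as intentionally
spanning, without changing behaviour.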

Just for your information.

Coly Li


>   janv. 17 07:20:50 pierre.juhen kernel: memcpy: detected
>   field-spanning write (size 264) of single field "&i->j" at
>   drivers/md/bcache/journal.c:152 (size 240)
>   janv. 17 07:20:50 pierre.juhen kernel: WARNING: CPU: 0 PID: 650 at
>   drivers/md/bcache/journal.c:152 journal_read_bucket+0x3d0/0x480 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel: Modules linked in:
>   bcachefjes(-) raid0 crct10dif_pclmul crc32_pclmul crc32c_intel
>   polyval_clmulni polyval_generic nvme ghash_clmulni_intel
>   sha512_ssse3 nvme_core sp5100_>
>   janv. 17 07:20:50 pierre.juhen kernel: CPU: 0 PID: 650 Comm:
>   bcache-register Not tainted 6.1.6-100.fc36.x86_64 #1
>   janv. 17 07:20:50 pierre.juhen kernel: RIP:
>   0010:journal_read_bucket+0x3d0/0x480 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  ? bch_btree_exit+0x20/0x20
>   [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  bch_journal_read+0x69/0x2f0
>   [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     register_bcache+0x12c9/0x1bf0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel: memcpy: detected
>   field-spanning write (size 24) of single field "&b->key" at
>   drivers/md/bcache/btree.c:939 (size 16)
>   janv. 17 07:20:50 pierre.juhen kernel: WARNING: CPU: 0 PID: 650 at
>   drivers/md/bcache/btree.c:939 mca_alloc+0x421/0x4c0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel: Modules linked in: raid1
>   fjes(-) bcacheraid0 crct10dif_pclmul crc32_pclmul crc32c_intel
>   polyval_clmulni polyval_generic nvme ghash_clmulni_intel
>   sha512_ssse3 nvme_core s>
>   janv. 17 07:20:50 pierre.juhen kernel: CPU: 0 PID: 650 Comm:
>   bcache-register Tainted: G        W          6.1.6-100.fc36.x86_64 #1
>   janv. 17 07:20:50 pierre.juhen kernel: RIP:
>   0010:mca_alloc+0x421/0x4c0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     bch_btree_node_get.part.0+0x109/0x2b0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     register_bcache+0x16ed/0x1bf0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel: memcpy: detected
>   field-spanning write (size 24) of single field "&c->uuid_bucket" at
>   drivers/md/bcache/super.c:465 (size 16)
>   janv. 17 07:20:50 pierre.juhen kernel: WARNING: CPU: 0 PID: 650 at
>   drivers/md/bcache/super.c:465 register_bcache+0x1b39/0x1bf0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel: Modules linked in: raid1
>   fjes(-) bcacheraid0 crct10dif_pclmul crc32_pclmul crc32c_intel
>   polyval_clmulni polyval_generic nvme ghash_clmulni_intel
>   sha512_ssse3 nvme_core s>
>   janv. 17 07:20:50 pierre.juhen kernel: CPU: 0 PID: 650 Comm:
>   bcache-register Tainted: G        W          6.1.6-100.fc36.x86_64 #1
>   janv. 17 07:20:50 pierre.juhen kernel: RIP:
>   0010:register_bcache+0x1b39/0x1bf0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel: memcpy: detected
>   field-spanning write (size 24) of single field "&temp.key" at
>   drivers/md/bcache/extents.c:428 (size 16)
>   janv. 17 07:20:50 pierre.juhen kernel: WARNING: CPU: 9 PID: 650 at
>   drivers/md/bcache/extents.c:428 bch_extent_insert_fixup+0x54e/0x560
>   [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel: Modules linked in: fjes(-)
>   hid_logitech_hidpp(+) raid1 bcacheraid0 crct10dif_pclmul
>   crc32_pclmul crc32c_intel polyval_clmulni polyval_generic nvme
>   ghash_clmulni_intel sh>
>   janv. 17 07:20:50 pierre.juhen kernel: CPU: 9 PID: 650 Comm:
>   bcache-register Tainted: G        W          6.1.6-100.fc36.x86_64 #1
>   janv. 17 07:20:50 pierre.juhen kernel: RIP:
>   0010:bch_extent_insert_fixup+0x54e/0x560 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     bch_btree_insert_key+0xc5/0x260 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  btree_insert_key+0x4c/0xd0
>   [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     bch_btree_insert_keys+0x3e/0x290 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  ?
>   __bch_btree_ptr_invalid+0x5b/0xc0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     bch_btree_insert_node+0x143/0x3a0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  ?
>   bch_btree_insert_check_key+0x150/0x150 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  btree_insert_fn+0x20/0x40
>   [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     bch_btree_map_nodes_recurse+0xf0/0x170 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     __bch_btree_map_nodes+0x1dd/0x1f0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  ?
>   bch_btree_insert_check_key+0x150/0x150 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  bch_btree_insert+0xcd/0x110
>   [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     bch_journal_replay+0xe6/0x1f0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     register_bcache+0x1918/0x1bf0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel: memcpy: detected
>   field-spanning write (size 24) of single field "&k.key" at
>   drivers/md/bcache/btree.c:371 (size 16)
>   janv. 17 07:20:50 pierre.juhen kernel: WARNING: CPU: 9 PID: 650 at
>   drivers/md/bcache/btree.c:371 __bch_btree_node_write+0x59d/0x5d0
>   [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel: Modules linked in: fjes(-)
>   hid_logitech_hidpp(+) raid1 bcacheraid0 crct10dif_pclmul
>   crc32_pclmul crc32c_intel polyval_clmulni polyval_generic nvme
>   ghash_clmulni_intel sh>
>   janv. 17 07:20:50 pierre.juhen kernel: CPU: 9 PID: 650 Comm:
>   bcache-register Tainted: G        W          6.1.6-100.fc36.x86_64 #1
>   janv. 17 07:20:50 pierre.juhen kernel: RIP:
>   0010:__bch_btree_node_write+0x59d/0x5d0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  ?
>   bch_btree_insert_keys+0x48/0x290 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     bch_btree_insert_node+0x32b/0x3a0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  ?
>   bch_btree_insert_check_key+0x150/0x150 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  btree_insert_fn+0x20/0x40
>   [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     bch_btree_map_nodes_recurse+0xf0/0x170 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     __bch_btree_map_nodes+0x1dd/0x1f0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  ?
>   bch_btree_insert_check_key+0x150/0x150 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  bch_btree_insert+0xcd/0x110
>   [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     bch_journal_replay+0xe6/0x1f0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     register_bcache+0x1918/0x1bf0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel: bcache: bch_journal_replay()
>   journal replay done, 3591 keys in 304 entries, seq 6243591
>   janv. 17 07:20:50 pierre.juhen kernel: bcache: register_cache()
>   registered cache device nvme0n1p3
>   janv. 17 07:20:50 pierre.juhen kernel: memcpy: detected
>   field-spanning write (size 24) of single field
>   "&w->data->btree_root" at drivers/md/bcache/journal.c:778 (size 16)
>   janv. 17 07:20:50 pierre.juhen kernel: WARNING: CPU: 0 PID: 700 at
>   drivers/md/bcache/journal.c:778 journal_write_unlocked+0x4f6/0x560
>   [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel: Modules linked in: fjes(-)
>   hid_logitech_hidpp(+) raid1 bcacheraid0 crct10dif_pclmul
>   crc32_pclmul crc32c_intel polyval_clmulni polyval_generic nvme
>   ghash_clmulni_intel sh>
>   janv. 17 07:20:50 pierre.juhen kernel: CPU: 0 PID: 700 Comm:
>   bcache_allocato Tainted: G        W          6.1.6-100.fc36.x86_64 #1
>   janv. 17 07:20:50 pierre.juhen kernel: RIP:
>   0010:journal_write_unlocked+0x4f6/0x560 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  bch_journal+0x302/0x370 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  bch_journal_meta+0x38/0x50
>   [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  bch_prio_write+0x3af/0x4c0
>   [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     bch_allocator_thread+0x199/0xcd0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  ?
>   bch_invalidate_one_bucket+0x80/0x80 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel: memcpy: detected
>   field-spanning write (size 24) of single field
>   "&w->data->uuid_bucket" at drivers/md/bcache/journal.c:779 (size 16)
>   janv. 17 07:20:50 pierre.juhen kernel: WARNING: CPU: 0 PID: 700 at
>   drivers/md/bcache/journal.c:779 journal_write_unlocked+0x545/0x560
>   [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel: Modules linked in: fjes(-)
>   hid_logitech_hidpp(+) raid1 bcacheraid0 crct10dif_pclmul
>   crc32_pclmul crc32c_intel polyval_clmulni polyval_generic nvme
>   ghash_clmulni_intel sh>
>   janv. 17 07:20:50 pierre.juhen kernel: CPU: 0 PID: 700 Comm:
>   bcache_allocato Tainted: G        W          6.1.6-100.fc36.x86_64 #1
>   janv. 17 07:20:50 pierre.juhen kernel: RIP:
>   0010:journal_write_unlocked+0x545/0x560 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  bch_journal+0x302/0x370 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  bch_journal_meta+0x38/0x50
>   [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  bch_prio_write+0x3af/0x4c0
>   [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     bch_allocator_thread+0x199/0xcd0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  ?
>   bch_invalidate_one_bucket+0x80/0x80 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel: bcache: register_bdev()
>   registered backing device md126
>   janv. 17 07:20:50 pierre.juhen kernel: memcpy: detected
>   field-spanning write (size 24) of single field "&c->uuid_bucket" at
>   drivers/md/bcache/super.c:520 (size 16)
>   janv. 17 07:20:50 pierre.juhen kernel: WARNING: CPU: 11 PID: 703 at
>   drivers/md/bcache/super.c:520 __uuid_write+0x1ad/0x1c0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel: Modules linked in:
>   hid_logitech_hidpp raid1 bcacheraid0 crct10dif_pclmul crc32_pclmul
>   crc32c_intel polyval_clmulni polyval_generic nvme
>   ghash_clmulni_intel sha512_ssse3 >
>   janv. 17 07:20:50 pierre.juhen kernel: CPU: 11 PID: 703 Comm:
>   bcache-register Tainted: G        W          6.1.6-100.fc36.x86_64 #1
>   janv. 17 07:20:50 pierre.juhen kernel: RIP:
>   0010:__uuid_write+0x1ad/0x1c0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  ? bch_btree_exit+0x20/0x20
>   [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     bch_cached_dev_attach+0x442/0x560 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     register_bcache.cold+0x26c/0x3e4 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel: bcache:
>   bch_cached_dev_attach() Caching md126 as bcache0 on set
>   4e3b28d9-0795-4f91-af97-f8429b325481
>   janv. 17 07:20:50 pierre.juhen kernel: memcpy: detected
>   field-spanning write (size 24) of single field "&w->key" at
>   drivers/md/bcache/btree.c:2618 (size 16)
>   janv. 17 07:20:50 pierre.juhen kernel: WARNING: CPU: 0 PID: 706 at
>   drivers/md/bcache/btree.c:2618 refill_keybuf_fn+0x220/0x230 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel: Modules linked in:
>   hid_logitech_hidpp raid1 bcacheraid0 crct10dif_pclmul crc32_pclmul
>   crc32c_intel polyval_clmulni polyval_generic nvme
>   ghash_clmulni_intel sha512_ssse3 >
>   janv. 17 07:20:50 pierre.juhen kernel: CPU: 0 PID: 706 Comm:
>   bcache_writebac Tainted: G        W          6.1.6-100.fc36.x86_64 #1
>   janv. 17 07:20:50 pierre.juhen kernel: RIP:
>   0010:refill_keybuf_fn+0x220/0x230 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     bch_btree_map_keys_recurse+0x60/0x180 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  ?
>   bch_btree_gc_finish+0x390/0x390 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     bch_btree_map_keys_recurse+0xe6/0x180 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  ?
>   bch_btree_gc_finish+0x390/0x390 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  ?
>   idle_counter_exceeded+0x50/0x50 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     bch_btree_map_keys+0x1dd/0x1f0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  ?
>   bch_btree_gc_finish+0x390/0x390 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  ?
>   idle_counter_exceeded+0x50/0x50 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  bch_refill_keybuf+0xba/0x1e0
>   [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  ?
>   idle_counter_exceeded+0x50/0x50 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:
>     bch_writeback_thread+0x31a/0x5b0 [bcache]
>   janv. 17 07:20:50 pierre.juhen kernel:  ? read_dirty+0x450/0x450
>   [bcache]
>   janv. 17 07:20:52 pierre.juhen dracut-initqueue[783]: Scanning
>   devices bcache0 md127 sdc1  for LVM volume groups
>   janv. 17 07:20:52 pierre.juhen kernel: memcpy: detected
>   field-spanning write (size 24) of single field "&ret->key" at
>   drivers/md/bcache/alloc.c:586 (size 16)
>   janv. 17 07:20:52 pierre.juhen kernel: WARNING: CPU: 5 PID: 870 at
>   drivers/md/bcache/alloc.c:586 bch_alloc_sectors+0x4ad/0x500 [bcache]
>   janv. 17 07:20:52 pierre.juhen kernel: Modules linked in:
>   nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(POE) nvidia(POE)
>   hid_logitech_hidpp raid1 bcache raid0 crct10dif_pclmul crc32_pclmul
>   crc32c_intel polyva>
>   janv. 17 07:20:52 pierre.juhen kernel: RIP:
>   0010:bch_alloc_sectors+0x4ad/0x500 [bcache]
>   janv. 17 07:20:52 pierre.juhen kernel:
>     bch_data_insert_start+0x150/0x4c0 [bcache]
>   janv. 17 07:20:52 pierre.juhen kernel:
>     cached_dev_submit_bio+0xb1d/0xd70 [bcache]
> 
> Thank you,
> 
> Regards,
> 
> Pierre
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC] Live resize of backing device
  2023-01-11 16:01         ` Andrea Tomassetti
  2023-01-17 13:08           ` Error messages with kernel 6.1.[56] Pierre Juhen
@ 2023-01-17 16:18           ` Coly Li
  2023-01-25 10:07             ` Andrea Tomassetti
  1 sibling, 1 reply; 29+ messages in thread
From: Coly Li @ 2023-01-17 16:18 UTC (permalink / raw)
  To: Andrea Tomassetti; +Cc: Eric Wheeler, Kent Overstreet, linux-bcache



> 2023年1月12日 00:01,Andrea Tomassetti <andrea.tomassetti-opensource@devo.com> 写道:
> 
> Hi Coly,
> thank you for taking the time to reply, I really hope you are feeling
> better now.

Thanks. But the recovery is really slow, and my responses may not be timely. I was told it might be better after one month; I hope that is true.

> 
> On Fri, Dec 30, 2022 at 11:41 AM Coly Li <colyli@suse.de> wrote:
>> 
>> On 9/8/22 4:32 PM, Andrea Tomassetti wrote:
>>> From 59787372cf21af0b79e895578ae05b6586dfeb09 Mon Sep 17 00:00:00 2001
>>> From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
>>> Date: Thu, 8 Sep 2022 09:47:55 +0200
>>> Subject: [PATCH] bcache: Add support for live resize of backing devices
>>> 
>>> Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
>> 
>> Hi Andrea,
>> 
>> I have just recovered from Omicron and am able to reply to email. Let me
>> place my comments inline.
>> 
>> 
>>> ---
>>> Hi Coly,
>>> Here is the first version of the patch. There are some points I noted
>>> down
>>> that I would like to discuss with you:
>>> - I found it pretty convenient to hook the call of the newly added
>>> function
>>>   inside `register_bcache`. In fact, every time (at least from my
>>>   understanding) a disk changes size, it will trigger a new probe and,
>>>   thus, `register_bcache` will be triggered. The only inconvenience
>>>   is that, in case of success, the function will output
>> 
>> The resize should be triggered manually, not done automatically.
>> 
>> You may create a sysfs file under the cached device's directory and name
>> it "extend_size" or something else you think is better.
>> 
>> Then the sysadmin may extend the cached device size explicitly at a
>> predictable time.
>> 
>>> `error: capacity changed` even if it's not really an error.
>>> - I'm using `kvrealloc`, introduced in kernel version 5.15, to resize
>>>   `stripe_sectors_dirty` and `full_dirty_stripes`. It shouldn't be a
>>>   problem, right?
>>> - There is some reused code between this new function and
>>>   `bcache_device_init`. Maybe I can move `const size_t max_stripes` to
>>>   a broader scope or make it a macro, what do you think?
>>> 
>>> Thank you very much,
>>> Andrea
>>> 
>>> drivers/md/bcache/super.c | 75 ++++++++++++++++++++++++++++++++++++++-
>>> 1 file changed, 74 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
>>> index ba3909bb6bea..9a77caf2a18f 100644
>>> --- a/drivers/md/bcache/super.c
>>> +++ b/drivers/md/bcache/super.c
>>> @@ -2443,6 +2443,76 @@ static bool bch_is_open(dev_t dev)
>>>     return bch_is_open_cache(dev) || bch_is_open_backing(dev);
>>> }
>>> 
>>> +static bool bch_update_capacity(dev_t dev)
>>> +{
>>> +    const size_t max_stripes = min_t(size_t, INT_MAX,
>>> +                     SIZE_MAX / sizeof(atomic_t));
>>> +
>>> +    uint64_t n, n_old;
>>> +    int nr_stripes_old;
>>> +    bool res = false;
>>> +
>>> +    struct bcache_device *d;
>>> +    struct cache_set *c, *tc;
>>> +    struct cached_dev *dcp, *t, *dc = NULL;
>>> +
>>> +    uint64_t parent_nr_sectors;
>>> +
>>> +    list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
>>> +        list_for_each_entry_safe(dcp, t, &c->cached_devs, list)
>>> +            if (dcp->bdev->bd_dev == dev) {
>>> +                dc = dcp;
>>> +                goto dc_found;
>>> +            }
>>> +
>>> +dc_found:
>>> +    if (!dc)
>>> +        return false;
>>> +
>>> +    parent_nr_sectors = bdev_nr_sectors(dc->bdev) - dc->sb.data_offset;
>>> +
>>> +    if (parent_nr_sectors == bdev_nr_sectors(dc->disk.disk->part0))
>>> +        return false;
>>> +
>> 
>> The above code only handles the case where a whole disk is used as the
>> cached device. If a partition of a hard drive is used as the cached
>> device, and there is other data after this partition, that condition
>> should be handled as well. So far I am fine with only extending the size
>> when the whole hard drive is used as the cached device, but more code is
>> necessary to check for that condition and only permit the size extension
>> there.
> I don't understand if there's a misalignment here, so let me be more
> clear: this patch is intended to support the live resize of *backing
> devices*; is this what you mean by "cached device"?

Yes, backing device is cached device.


> When I was working on the patch I didn't consider the possibility of
> live resizing the cache devices, but it should be trivial.
> So, as far as I understand, a partition cannot be used as a backing
> device, correct? The whole disk should be used as a backing device,
> that's why I'm checking and that's why this check should be correct.
> Am I wrong?

Yes, a partition can be used as a backing (cached) device.
That is to say, if a file system is formatted on top of the cached device, that file system cannot be directly accessed via the partition. It has to be accessed via the bcache device, e.g. /dev/bcache0.


> 
>> 
>>> +    if (!set_capacity_and_notify(dc->disk.disk, parent_nr_sectors))
>>> +        return false;
>> 
>> The above code should only be done after everything else has been set.
>> 
>> 
>>> +
>>> +    d = &dc->disk;
>>> +
>>> +    /* Force cached device sectors re-calc */
>>> +    calc_cached_dev_sectors(d->c);
>> 
>> Here c->cached_dev_sectors might be changed; if any of the following
>> steps fails, it should be restored to its previous value.
>> 
>> 
>>> +
>>> +    /* Block writeback thread */
>>> +    down_write(&dc->writeback_lock);
>>> +    nr_stripes_old = d->nr_stripes;
>>> +    n = DIV_ROUND_UP_ULL(parent_nr_sectors, d->stripe_size);
>>> +    if (!n || n > max_stripes) {
>>> +        pr_err("nr_stripes too large or invalid: %llu (start sector
>>> beyond end of disk?)\n",
>>> +            n);
>>> +        goto unblock_and_exit;
>>> +    }
>>> +    d->nr_stripes = n;
>>> +
>>> +    n = d->nr_stripes * sizeof(atomic_t);
>>> +    n_old = nr_stripes_old * sizeof(atomic_t);
>>> +    d->stripe_sectors_dirty = kvrealloc(d->stripe_sectors_dirty, n_old,
>>> +        n, GFP_KERNEL);
>>> +    if (!d->stripe_sectors_dirty)
>>> +        goto unblock_and_exit;
>>> +
>>> +    n = BITS_TO_LONGS(d->nr_stripes) * sizeof(unsigned long);
>>> +    n_old = BITS_TO_LONGS(nr_stripes_old) * sizeof(unsigned long);
>>> +    d->full_dirty_stripes = kvrealloc(d->full_dirty_stripes, n_old,
>>> n, GFP_KERNEL);
>>> +    if (!d->full_dirty_stripes)
>>> +        goto unblock_and_exit;
>> 
>> If kvrealloc() fails and NULL is set to d->full_dirty_stripes, the
>> previous array should be restored.
>> 
>>> +
>>> +    res = true;
>>> +
>>> +unblock_and_exit:
>>> +    up_write(&dc->writeback_lock);
>>> +    return res;
>>> +}
>>> +
>> 
>> I didn't test the patch, but at first glance I feel the failure
>> handling should restore all previous values; otherwise the cached device
>> may be in an inconsistent state when the size extension fails.
>> 
> Completely agree with you on this point. I changed it locally and, as
> soon as we agree on the other points I'll send a newer version of the
> patch.
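The restore-on-failure behavior asked for in the review can be sketched in plain userspace C. This is only an illustration of the allocate-new/commit/free-old pattern: `toy_dev` and `toy_resize` are invented names, and `calloc`/`memcpy` stand in for the kernel's `kvrealloc` and `atomic_t` arrays, so none of this is the actual bcache code.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DIV_ROUND_UP_ULL(n, d) (((n) + (d) - 1) / (d))
#define BITS_TO_LONGS(nr) DIV_ROUND_UP_ULL((uint64_t)(nr), 8 * sizeof(unsigned long))

/* Toy stand-in for struct bcache_device: just the fields being resized. */
struct toy_dev {
	uint64_t stripe_size;           /* sectors per stripe */
	unsigned int nr_stripes;
	int *stripe_sectors_dirty;      /* an atomic_t array in the kernel */
	unsigned long *full_dirty_stripes;
};

/*
 * Grow both per-stripe arrays for a new device size. Nothing in *d is
 * modified until every allocation has succeeded, so a failure leaves
 * the device state exactly as it was.
 */
int toy_resize(struct toy_dev *d, uint64_t new_sectors)
{
	uint64_t n = d->stripe_size ? DIV_ROUND_UP_ULL(new_sectors, d->stripe_size) : 0;
	int *dirty;
	unsigned long *full;

	if (!n || n > INT_MAX)
		return -1;
	if (n < d->nr_stripes)
		return -1;              /* grow-only, like the live-resize case */

	dirty = calloc(n, sizeof(*dirty));
	full = calloc(BITS_TO_LONGS(n), sizeof(*full));
	if (!dirty || !full) {
		free(dirty);            /* free(NULL) is a no-op */
		free(full);
		return -1;              /* old arrays left untouched */
	}

	/* carry over the old contents, then commit and free the old arrays */
	memcpy(dirty, d->stripe_sectors_dirty, d->nr_stripes * sizeof(*dirty));
	memcpy(full, d->full_dirty_stripes,
	       BITS_TO_LONGS(d->nr_stripes) * sizeof(*full));

	free(d->stripe_sectors_dirty);
	free(d->full_dirty_stripes);
	d->stripe_sectors_dirty = dirty;
	d->full_dirty_stripes = full;
	d->nr_stripes = n;
	return 0;
}
```

Allocating the new arrays before touching any device state means a failed allocation leaves `nr_stripes` and both old pointers exactly as they were, which is the consistency property the review asks for; a `kvrealloc`-based version can achieve the same, but it must never overwrite the old pointer with a possibly-NULL return value.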
>> 
>>> struct async_reg_args {
>>>     struct delayed_work reg_work;
>>>     char *path;
>>> @@ -2569,7 +2639,10 @@ static ssize_t register_bcache(struct kobject
>>> *k, struct kobj_attribute *attr,
>>>             mutex_lock(&bch_register_lock);
>>>             if (lookup_bdev(strim(path), &dev) == 0 &&
>>>                 bch_is_open(dev))
>>> -                err = "device already registered";
>>> +                if (bch_update_capacity(dev))
>>> +                    err = "capacity changed";
>>> +                else
>>> +                    err = "device already registered";
>> 
>> 
>> As I said, it should be a separate write-only sysfs file under the
>> cached device's directory.
> Can I ask why you don't like the automatic resize way? Why should the
> resize be manual?

Most system administrators don’t like such silent automatic things. They want to extend the size explicitly, especially when there are other dependencies in their configurations.


> Someone needs to trigger the block device resize, so shouldn't that be enough?

Yes, an explicit write operation on a sysfs file to trigger the resize. Then we can permit people to extend the size explicitly and avoid changing the current behavior.

Thanks.

Coly Li

> 
> Andrea
>> 
>> 
>>> else
>>>                 err = "device busy";
>>>             mutex_unlock(&bch_register_lock);
>>> --
>>> 2.37.3
>>> 
>> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC] Live resize of backing device
  2023-01-17 16:18           ` [RFC] Live resize of backing device Coly Li
@ 2023-01-25 10:07             ` Andrea Tomassetti
  2023-01-25 17:59               ` Coly Li
  2023-01-27  2:53               ` [RFC] Live resize of bcache " Eric Wheeler
  0 siblings, 2 replies; 29+ messages in thread
From: Andrea Tomassetti @ 2023-01-25 10:07 UTC (permalink / raw)
  To: Coly Li; +Cc: Eric Wheeler, Kent Overstreet, linux-bcache

On Tue, Jan 17, 2023 at 5:18 PM Coly Li <colyli@suse.de> wrote:
>
>
>
> > 2023年1月12日 00:01,Andrea Tomassetti <andrea.tomassetti-opensource@devo.com> 写道:
> >
> > Hi Coly,
> > thank you for taking the time to reply, I really hope you are feeling
> > better now.
>
> Thanks. But the recovery is really slow, and my responses may not be timely. I was told it might be better after one month; I hope that is true.
>
> >
> > On Fri, Dec 30, 2022 at 11:41 AM Coly Li <colyli@suse.de> wrote:
> >>
> >> On 9/8/22 4:32 PM, Andrea Tomassetti wrote:
> >>> From 59787372cf21af0b79e895578ae05b6586dfeb09 Mon Sep 17 00:00:00 2001
> >>> From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> >>> Date: Thu, 8 Sep 2022 09:47:55 +0200
> >>> Subject: [PATCH] bcache: Add support for live resize of backing devices
> >>>
> >>> Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> >>
> >> Hi Andrea,
> >>
> >> I have just recovered from Omicron and am able to reply to email. Let me
> >> place my comments inline.
> >>
> >>
> >>> ---
> >>> Hi Coly,
> >>> Here is the first version of the patch. There are some points I noted
> >>> down
> >>> that I would like to discuss with you:
> >>> - I found it pretty convenient to hook the call of the newly added
> >>> function
> >>>   inside `register_bcache`. In fact, every time (at least from my
> >>>   understanding) a disk changes size, it will trigger a new probe and,
> >>>   thus, `register_bcache` will be triggered. The only inconvenience
> >>>   is that, in case of success, the function will output
> >>
> >> The resize should be triggered manually, not done automatically.
> >>
> >> You may create a sysfs file under the cached device's directory and name
> >> it "extend_size" or something else you think is better.
> >>
> >> Then the sysadmin may extend the cached device size explicitly at a
> >> predictable time.
> >>
> >>> `error: capacity changed` even if it's not really an error.
> >>> - I'm using `kvrealloc`, introduced in kernel version 5.15, to resize
> >>>   `stripe_sectors_dirty` and `full_dirty_stripes`. It shouldn't be a
> >>>   problem, right?
> >>> - There is some reused code between this new function and
> >>>   `bcache_device_init`. Maybe I can move `const size_t max_stripes` to
> >>>   a broader scope or make it a macro, what do you think?
> >>>
> >>> Thank you very much,
> >>> Andrea
> >>>
> >>> drivers/md/bcache/super.c | 75 ++++++++++++++++++++++++++++++++++++++-
> >>> 1 file changed, 74 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> >>> index ba3909bb6bea..9a77caf2a18f 100644
> >>> --- a/drivers/md/bcache/super.c
> >>> +++ b/drivers/md/bcache/super.c
> >>> @@ -2443,6 +2443,76 @@ static bool bch_is_open(dev_t dev)
> >>>     return bch_is_open_cache(dev) || bch_is_open_backing(dev);
> >>> }
> >>>
> >>> +static bool bch_update_capacity(dev_t dev)
> >>> +{
> >>> +    const size_t max_stripes = min_t(size_t, INT_MAX,
> >>> +                     SIZE_MAX / sizeof(atomic_t));
> >>> +
> >>> +    uint64_t n, n_old;
> >>> +    int nr_stripes_old;
> >>> +    bool res = false;
> >>> +
> >>> +    struct bcache_device *d;
> >>> +    struct cache_set *c, *tc;
> >>> +    struct cached_dev *dcp, *t, *dc = NULL;
> >>> +
> >>> +    uint64_t parent_nr_sectors;
> >>> +
> >>> +    list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
> >>> +        list_for_each_entry_safe(dcp, t, &c->cached_devs, list)
> >>> +            if (dcp->bdev->bd_dev == dev) {
> >>> +                dc = dcp;
> >>> +                goto dc_found;
> >>> +            }
> >>> +
> >>> +dc_found:
> >>> +    if (!dc)
> >>> +        return false;
> >>> +
> >>> +    parent_nr_sectors = bdev_nr_sectors(dc->bdev) - dc->sb.data_offset;
> >>> +
> >>> +    if (parent_nr_sectors == bdev_nr_sectors(dc->disk.disk->part0))
> >>> +        return false;
> >>> +
> >>
> >> The above code only handles the case where a whole disk is used as the
> >> cached device. If a partition of a hard drive is used as the cached
> >> device, and there is other data after this partition, that condition
> >> should be handled as well. So far I am fine with only extending the size
> >> when the whole hard drive is used as the cached device, but more code is
> >> necessary to check for that condition and only permit the size extension
> >> there.
> > I don't understand if there's a misalignment here, so let me be more
> > clear: this patch is intended to support the live resize of *backing
> > devices*; is this what you mean by "cached device"?
>
> Yes, backing device is cached device.
>
>
> > When I was working on the patch I didn't consider the possibility of
> > live resizing the cache devices, but it should be trivial.
> > So, as far as I understand, a partition cannot be used as a backing
> > device, correct? The whole disk should be used as a backing device,
> > that's why I'm checking and that's why this check should be correct.
> > Am I wrong?
>
> Yes, a partition can be used as a backing (cached) device.
> That is to say, if a file system is formatted on top of the cached device, that file system cannot be directly accessed via the partition. It has to be accessed via the bcache device, e.g. /dev/bcache0.
>
>
> >
> >>
> >>> +    if (!set_capacity_and_notify(dc->disk.disk, parent_nr_sectors))
> >>> +        return false;
> >>
> >> The above code should only be done after everything else has been set.
> >>
> >>
> >>> +
> >>> +    d = &dc->disk;
> >>> +
> >>> +    /* Force cached device sectors re-calc */
> >>> +    calc_cached_dev_sectors(d->c);
> >>
> >> Here c->cached_dev_sectors might be changed; if any of the following
> >> steps fails, it should be restored to its previous value.
> >>
> >>
> >>> +
> >>> +    /* Block writeback thread */
> >>> +    down_write(&dc->writeback_lock);
> >>> +    nr_stripes_old = d->nr_stripes;
> >>> +    n = DIV_ROUND_UP_ULL(parent_nr_sectors, d->stripe_size);
> >>> +    if (!n || n > max_stripes) {
> >>> +        pr_err("nr_stripes too large or invalid: %llu (start sector
> >>> beyond end of disk?)\n",
> >>> +            n);
> >>> +        goto unblock_and_exit;
> >>> +    }
> >>> +    d->nr_stripes = n;
> >>> +
> >>> +    n = d->nr_stripes * sizeof(atomic_t);
> >>> +    n_old = nr_stripes_old * sizeof(atomic_t);
> >>> +    d->stripe_sectors_dirty = kvrealloc(d->stripe_sectors_dirty, n_old,
> >>> +        n, GFP_KERNEL);
> >>> +    if (!d->stripe_sectors_dirty)
> >>> +        goto unblock_and_exit;
> >>> +
> >>> +    n = BITS_TO_LONGS(d->nr_stripes) * sizeof(unsigned long);
> >>> +    n_old = BITS_TO_LONGS(nr_stripes_old) * sizeof(unsigned long);
> >>> +    d->full_dirty_stripes = kvrealloc(d->full_dirty_stripes, n_old,
> >>> n, GFP_KERNEL);
> >>> +    if (!d->full_dirty_stripes)
> >>> +        goto unblock_and_exit;
> >>
> >> If kvrealloc() fails and NULL is set to d->full_dirty_stripes, the
> >> previous array should be restored.
> >>
> >>> +
> >>> +    res = true;
> >>> +
> >>> +unblock_and_exit:
> >>> +    up_write(&dc->writeback_lock);
> >>> +    return res;
> >>> +}
> >>> +
> >>
> >> I didn't test the patch, but at first glance I feel the failure
> >> handling should restore all previous values; otherwise the cached device
> >> may be in an inconsistent state when the size extension fails.
> >>
> > Completely agree with you on this point. I changed it locally and, as
> > soon as we agree on the other points I'll send a newer version of the
> > patch.
> >>
> >>> struct async_reg_args {
> >>>     struct delayed_work reg_work;
> >>>     char *path;
> >>> @@ -2569,7 +2639,10 @@ static ssize_t register_bcache(struct kobject
> >>> *k, struct kobj_attribute *attr,
> >>>             mutex_lock(&bch_register_lock);
> >>>             if (lookup_bdev(strim(path), &dev) == 0 &&
> >>>                 bch_is_open(dev))
> >>> -                err = "device already registered";
> >>> +                if (bch_update_capacity(dev))
> >>> +                    err = "capacity changed";
> >>> +                else
> >>> +                    err = "device already registered";
> >>
> >>
> >> As I said, it should be a separate write-only sysfs file under the
> >> cached device's directory.
> > Can I ask why you don't like the automatic resize way? Why should the
> > resize be manual?
>
> Most system administrators don’t like such silent automatic things. They want to extend the size explicitly, especially when there are other dependencies in their configurations.
>
What I was trying to say is that, in order to resize a block device, a
manual command should be executed. So, this is already a "non-silent"
automatic thing.
Moreover, if the block device has a FS on it, the FS needs to be
manually grown with some special utilities, e.g. xfs_growfs. So,
again, another non-silent automatic step. Don't you agree?
For example, to resize a qcow device attached to a VM I'm manually
doing a `virsh blockresize`. As soon as I issue that command, the
virtio_blk driver inside the VM detects the disk size change and calls
the `set_capacity_and_notify` function. Why then should bcache behave
differently?

If you're concerned that this could somehow break behavioural
compatibility with older versions of the driver, could we
protect this automatic discovery with an optional parameter? Would that
be an option you would take into account?

Thank you very much,
Andrea
>
> > Someone needs to trigger the block device resize, so shouldn't that be enough?
>
> Yes, an explicit write operation on a sysfs file to trigger the resize. Then we can permit people to extend the size explicitly and avoid changing the current behavior.
>
> Thanks.
>
> Coly Li
>
> >
> > Andrea
> >>
> >>
> >>> else
> >>>                 err = "device busy";
> >>>             mutex_unlock(&bch_register_lock);
> >>> --
> >>> 2.37.3
> >>>
> >>
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC] Live resize of backing device
  2023-01-25 10:07             ` Andrea Tomassetti
@ 2023-01-25 17:59               ` Coly Li
  2023-01-27 12:44                 ` Andrea Tomassetti
  2023-01-27  2:53               ` [RFC] Live resize of bcache " Eric Wheeler
  1 sibling, 1 reply; 29+ messages in thread
From: Coly Li @ 2023-01-25 17:59 UTC (permalink / raw)
  To: Andrea Tomassetti; +Cc: Eric Wheeler, Kent Overstreet, linux-bcache



> 2023年1月25日 18:07,Andrea Tomassetti <andrea.tomassetti-opensource@devo.com> 写道:
> 
> On Tue, Jan 17, 2023 at 5:18 PM Coly Li <colyli@suse.de> wrote:
>>> 

>>>> 
>>>>> struct async_reg_args {
>>>>>    struct delayed_work reg_work;
>>>>>    char *path;
>>>>> @@ -2569,7 +2639,10 @@ static ssize_t register_bcache(struct kobject
>>>>> *k, struct kobj_attribute *attr,
>>>>>            mutex_lock(&bch_register_lock);
>>>>>            if (lookup_bdev(strim(path), &dev) == 0 &&
>>>>>                bch_is_open(dev))
>>>>> -                err = "device already registered";
>>>>> +                if (bch_update_capacity(dev))
>>>>> +                    err = "capacity changed";
>>>>> +                else
>>>>> +                    err = "device already registered";
>>>> 
>>>> 
>>>> As I said, it should be a separate write-only sysfs file under the
>>>> cached device's directory.
>>> Can I ask why you don't like the automatic resize way? Why should the
>>> resize be manual?
>> 
>> Most system administrators don’t like such silent automatic things. They want to extend the size explicitly, especially when there are other dependencies in their configurations.
>> 
> What I was trying to say is that, in order to resize a block device, a
> manual command should be executed. So, this is already a "non-silent"
> automatic thing.
> Moreover, if the block device has a FS on it, the FS needs to be
> manually grown with some special utilities, e.g. xfs_growfs. So,
> again, another non-silent automatic step. Don't you agree?
> For example, to resize a qcow device attached to a VM I'm manually
> doing a `virsh blockresize`. As soon as I issue that command, the
> virtio_blk driver inside the VM detects the disk size change and calls
> the `set_capacity_and_notify` function. Why then should bcache behave
> differently?

The above VM example makes sense; I am almost convinced.

> 
> If you're concerned that this can somehow break the
> behaviour-compatibility with older versions of the driver, can we
> protect this automatic discovery with an optional parameter? Will this
> be an option you will take into account?

Then let’s set the optional sysfs file aside for the moment. Once you feel the patch is ready for me to test, please notify me with detailed steps to reproduce your testing.
At that time, during my testing, let’s discuss whether an extra option is necessary; for now, just keep your idea of automatically resizing the cached device.

Thanks for your detailed explanation.

Coly Li


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC] Live resize of bcache backing device
  2023-01-25 10:07             ` Andrea Tomassetti
  2023-01-25 17:59               ` Coly Li
@ 2023-01-27  2:53               ` Eric Wheeler
  1 sibling, 0 replies; 29+ messages in thread
From: Eric Wheeler @ 2023-01-27  2:53 UTC (permalink / raw)
  To: Andrea Tomassetti; +Cc: Coly Li, Kent Overstreet, linux-bcache, linux-block

[-- Attachment #1: Type: text/plain, Size: 9032 bytes --]

On Wed, 25 Jan 2023, Andrea Tomassetti wrote:
> On Tue, Jan 17, 2023 at 5:18 PM Coly Li <colyli@suse.de> wrote:
> > > 2023年1月12日 00:01,Andrea Tomassetti <andrea.tomassetti-opensource@devo.com> 写道:
> > > On Fri, Dec 30, 2022 at 11:41 AM Coly Li <colyli@suse.de> wrote:
> > >> On 9/8/22 4:32 PM, Andrea Tomassetti wrote:
> > >>> From 59787372cf21af0b79e895578ae05b6586dfeb09 Mon Sep 17 00:00:00 2001
> > >>> From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> > >>> Date: Thu, 8 Sep 2022 09:47:55 +0200
> > >>> Subject: [PATCH] bcache: Add support for live resize of backing devices
> > >>>
> > >>> Here is the first version of the patch. There are some points I noted
> > >>> down
> > >>> that I would like to discuss with you:
> > >>> - I found it pretty convenient to hook the call of the newly added
> > >>> function
> > >>>   inside `register_bcache`. In fact, every time (at least from my
> > >>>   understanding) a disk changes size, it will trigger a new probe and,
> > >>>   thus, `register_bcache` will be triggered. The only inconvenience
> > >>>   is that, in case of success, the function will output
> > >>
> > >> The resize should be triggered manually, not done automatically.
> > >>
> > >> You may create a sysfs file under the cached device's directory and name
> > >> it "extend_size" or something else you think is better.

This is the way it works for other blockdevs:

	echo 1 > /sys/class/block/sdX/device/rescan

... but of course bcache doesn't have a "device" folder.  Maybe a sysfs 
flag named "rescan" or "resize" would be appropriate:

	echo 1 > /sys/block/bcache0/bcache/resize

> > >> Then the sysadmin may extend the cached device size explicitly at a
> > >> predictable time.
> > >>>
> > >>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> > >>> index ba3909bb6bea..9a77caf2a18f 100644
> > >>> --- a/drivers/md/bcache/super.c
> > >>> +++ b/drivers/md/bcache/super.c
> > >>> @@ -2443,6 +2443,76 @@ static bool bch_is_open(dev_t dev)
> > >>>     return bch_is_open_cache(dev) || bch_is_open_backing(dev);
> > >>> }
> > >>>
> > >>> +static bool bch_update_capacity(dev_t dev)
> > >>> +{
> > >>> +    const size_t max_stripes = min_t(size_t, INT_MAX,
> > >>> +                     SIZE_MAX / sizeof(atomic_t));
> > >>> +
> > >>> +    uint64_t n, n_old;
> > >>> +    int nr_stripes_old;
> > >>> +    bool res = false;
> > >>> +
> > >>> +    struct bcache_device *d;
> > >>> +    struct cache_set *c, *tc;
> > >>> +    struct cached_dev *dcp, *t, *dc = NULL;
> > >>> +
> > >>> +    uint64_t parent_nr_sectors;
> > >>> +
> > >>> +    list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
> > >>> +        list_for_each_entry_safe(dcp, t, &c->cached_devs, list)
> > >>> +            if (dcp->bdev->bd_dev == dev) {
> > >>> +                dc = dcp;
> > >>> +                goto dc_found;
> > >>> +            }
> > >>> +
> > >>> +dc_found:
> > >>> +    if (!dc)
> > >>> +        return false;
> > >>> +
> > >>> +    parent_nr_sectors = bdev_nr_sectors(dc->bdev) - dc->sb.data_offset;
> > >>> +
> > >>> +    if (parent_nr_sectors == bdev_nr_sectors(dc->disk.disk->part0))
> > >>> +        return false;
> > >>> +
> > >>
> > >> The above code only handles the case where a whole disk is used as the
> > >> cached device. If a partition of a hard drive is used as the cached
> > >> device, and there is other data after this partition, that condition
> > >> should be handled as well. So far I am fine with only extending the size
> > >> when the whole hard drive is used as the cached device, but more code is
> > >> necessary to check for that condition and only permit the size extension
> > >> there.

Is it even possible to hot-resize a partition that is in use?  I usually 
get an error if a filesystem from sdb is mounted (or, presumably, if a 
partition is in use as a backing device by bcache):

	# blockdev --rereadpt /dev/sdb
	blockdev: ioctl error on BLKRRPART: Device or resource busy

(I've always wanted a feature in Linux's partition code to support
hot-grow for non-overlapping partition changes that keep the original
starting sector, like when you grow the last partition, but I
digress...)

> > > I don't understand if there's a misalignment here, so let me be more
> > > clear: this patch is intended to support the live resize of *backing
> > > devices*; is this what you mean by "cached device"?
> >
> > Yes, backing device is cached device.
> >
> >
> > > When I was working on the patch I didn't consider the possibility of
> > > live resizing the cache devices, but it should be trivial.
> > > So, as far as I understand, a partition cannot be used as a backing
> > > device, correct? The whole disk should be used as a backing device,
> > > that's why I'm checking and that's why this check should be correct.
> > > Am I wrong?
> >
> > Yes, a partition can be used as a backing (cached) device. That is to 
> > say, if a file system is formatted on top of the cached device, that file 
> > system cannot be directly accessed via the partition. It has to be 
> > accessed via the bcache device, e.g. /dev/bcache0.

> > >> As I said, it should be a separate write-only sysfs file under the
> > >> cached device's directory.
>
> > > Can I ask why you don't like the automatic resize way? Why should the
> > > resize be manual?
> >
> > Most system administrators don’t like such silent automatic 
> > things. They want to extend the size explicitly, especially when there 
> > are other dependencies in their configurations.

+1, I'm one of those administrators...

> What I was trying to say is that, in order to resize a block device, a
> manual command should be executed. So, this is already a "non-silent"
> automatic thing.

In some cases the "manual command" to which you refer may take weeks:

Let's say you have a big hardware RAID5 and add a disk.  After the resize 
completes the RAID controller reports a new disk size to the OS:

	kernel: sd 2:0:0:1: [sdb] 503296888 512-byte logical blocks: (257 GB/239 GiB)
	kernel: sdb: detected capacity change from 150317101056 to 257688006656

Are you suggesting that if `sdb` is the backing device to `bcache0` that 
`bcache0` should be resized immediately when `sdb` detects a capacity 
change?

I would recommend against it:  while I can't control when the RAID
reconstruction finishes (since it takes 2 weeks for us to grow our
arrays), I would like to control when `bcache0` grows in case there is a
bug that triggers an IO hang (or worse).  At least then we can grow
`bcache0` in a maintenance window in case there is a "surprise".

IMHO, this should be controllable by the admin and should be a _manual_ 
procedure (i.e., echo 1 > resize)

> Moreover, if the block device has a FS on it, the FS needs to be
> manually grown with some special utilities, e.g. xfs_growfs.

I've never seen a block dev resize call userspace commands from the kernel 
(like xfs_growfs or resize2fs).  Does that exist?  Again, to avoid
surprises, IMHO this should be a udev thing if someone wants it.
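For admins who do want automation without baking the policy into the kernel, a udev rule could bridge the two: the kernel already emits a change uevent with `RESIZE=1` when a disk's capacity changes (that is what `set_capacity_and_notify` reports), and the rule could poke the proposed manual trigger. This is a hypothetical sketch only — the `/sys/block/bcache0/bcache/resize` attribute does not exist yet, and the match would need to name the actual backing device:

```
# /etc/udev/rules.d/99-bcache-resize.rules (hypothetical)
# When the backing device (here sdb) reports a capacity change,
# write the proposed one-shot trigger so the resize still goes
# through the explicit sysfs interface.
ACTION=="change", KERNEL=="sdb", ENV{RESIZE}=="1", \
    RUN+="/bin/sh -c 'echo 1 > /sys/block/bcache0/bcache/resize'"
```

That keeps the default behavior manual while letting a site opt in to automation from userspace.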

> So, again, another non-silent automatic step. Don't you agree? For 
> example, to resize a qcow device attached to a VM I'm manually doing a 
> `virsh blockresize`. As soon as I issue that command, the virtio_blk 
> driver inside the VM detects the disk size change and calls the 
> `set_capacity_and_notify` function.

Yes, that is expected and desired:

> Why then should bcache behave differently?

Because a VM presents itself as hardware.  When you `virsh blockresize` 
from _outside_ of the VM, then the result is effectively the same as a 
RAID reconstruction completing and notifying the OS as in my example 
above.

However, bcache _should_ call `set_capacity_and_notify` when
  echo 1 > /sys/fs/bcache0/bcache/resize
happens, but I'm not convinced that it is a good idea to auto-resize 
`bcache0` when the backing device `sdb` grows.

> If you're concerned that this could somehow break behavioural 
> compatibility with older versions of the driver, could we 
> protect this automatic discovery with an optional parameter? Would that 
> be an option you would take into account?

I like the idea of supporting it as a feature for those who might want it, 
but please default it off for the sanity of admins who like more control.  

Maybe /sys/fs/bcache0/bcache/resize can be tri-state sysfs attribute:

	echo 0 > /sys/fs/bcache0/bcache/resize # default
	echo 1 > /sys/fs/bcache0/bcache/resize # one-shot, resets to 0
	echo 2 > /sys/fs/bcache0/bcache/resize # auto-resize on bdev change
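The tri-state semantics are small enough to prototype as a userspace state machine before writing any sysfs code. A hedged sketch: the attribute values follow Eric's proposal, but `toy_bdev`, `resize_store`, and `resize_allowed` are invented helper names, not an existing bcache interface.

```c
#include <stdbool.h>
#include <string.h>

/* 0 = off (default), 1 = one-shot, 2 = always auto-resize */
enum resize_mode { RESIZE_OFF = 0, RESIZE_ONESHOT = 1, RESIZE_AUTO = 2 };

struct toy_bdev {
	enum resize_mode resize;
};

/* What a sysfs "resize" store handler might accept. */
int resize_store(struct toy_bdev *b, const char *buf)
{
	if (!strcmp(buf, "0"))
		b->resize = RESIZE_OFF;
	else if (!strcmp(buf, "1"))
		b->resize = RESIZE_ONESHOT;
	else if (!strcmp(buf, "2"))
		b->resize = RESIZE_AUTO;
	else
		return -1;	/* would be -EINVAL in the kernel */
	return 0;
}

/*
 * Called when the backing device reports a capacity change.
 * Returns true if the bcache device should grow now; a one-shot
 * request resets itself to 0 after firing, per the proposal above.
 */
bool resize_allowed(struct toy_bdev *b)
{
	if (b->resize == RESIZE_OFF)
		return false;
	if (b->resize == RESIZE_ONESHOT)
		b->resize = RESIZE_OFF;
	return true;	/* caller would do set_capacity_and_notify() */
}
```

The one-shot value gives the admin the "grow now, then go back to manual" behavior, while mode 2 reproduces the automatic behavior of the patch for sites that want it.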

-Eric

> 
> Thank you very much,
> Andrea
> >
> > > Someone needs to trigger the block device resize, so shouldn't that be enough?
> >
> > Yes, an explicit write operation on a sysfs file to trigger the resize. Then we can permit people to extend the size explicitly and avoid changing the current behavior.
> >
> > Thanks.
> >
> > Coly Li
> >
> > >
> > > Andrea
> > >>
> > >>
> > >>> else
> > >>>                 err = "device busy";
> > >>>             mutex_unlock(&bch_register_lock);
> > >>> --
> > >>> 2.37.3
> > >>>
> > >>
> >
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC] Live resize of backing device
  2023-01-25 17:59               ` Coly Li
@ 2023-01-27 12:44                 ` Andrea Tomassetti
  2023-01-27 22:40                   ` Eric Wheeler
                                     ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Andrea Tomassetti @ 2023-01-27 12:44 UTC (permalink / raw)
  To: Coly Li; +Cc: Eric Wheeler, Kent Overstreet, linux-bcache

 From 83f490ec8e81c840bdaf69e66021d661751975f2 Mon Sep 17 00:00:00 2001
From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
Date: Thu, 8 Sep 2022 09:47:55 +0200
Subject: [PATCH v2] bcache: Add support for live resize of backing devices

Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
---
Hi Coly,
this is the second version of the patch. As you correctly pointed out,
I implemented roll-back functionality in case of error.
I'm testing this functionality using a QEMU/KVM VM via libvirt.
Here are the steps:
   1. make-bcache --writeback -B /dev/vdb -C /dev/vdc
   2. mkfs.xfs /dev/bcache0
   3. mount /dev/bcache0 /mnt
   4. dd if=/dev/random of=/mnt/random0 bs=1M count=1000
   5. md5sum /mnt/random0 | tee /mnt/random0.md5
   6. [HOST] virsh blockresize <vm-name> --path <disk-path> --size <new-size>
   7. xfs_growfs /dev/bcache0
   8. Repeat steps 4 and 5 with a different file name (e.g. random1.md5)
   9. umount/reboot/remount and check that the md5 hashes are correct with
         md5sum -c /mnt/random?.md5

  drivers/md/bcache/super.c | 84 ++++++++++++++++++++++++++++++++++++++-
  1 file changed, 83 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index ba3909bb6bea..1435a3f605f8 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2443,6 +2443,85 @@ static bool bch_is_open(dev_t dev)
  	return bch_is_open_cache(dev) || bch_is_open_backing(dev);
  }

+static bool bch_update_capacity(dev_t dev)
+{
+	const size_t max_stripes = min_t(size_t, INT_MAX,
+					 SIZE_MAX / sizeof(atomic_t));
+
+	uint64_t n, n_old, orig_cached_sectors = 0;
+	void *tmp_realloc;
+
+	int nr_stripes_old;
+	bool res = false;
+
+	struct bcache_device *d;
+	struct cache_set *c, *tc;
+	struct cached_dev *dcp, *t, *dc = NULL;
+
+	uint64_t parent_nr_sectors;
+
+	list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
+		list_for_each_entry_safe(dcp, t, &c->cached_devs, list)
+			if (dcp->bdev->bd_dev == dev) {
+				dc = dcp;
+				goto dc_found;
+			}
+
+dc_found:
+	if (!dc)
+		return false;
+
+	parent_nr_sectors = bdev_nr_sectors(dc->bdev) - dc->sb.data_offset;
+
+	if (parent_nr_sectors == bdev_nr_sectors(dc->disk.disk->part0))
+		return false;
+
+	d = &dc->disk;
+	orig_cached_sectors = d->c->cached_dev_sectors;
+
+	/* Force cached device sectors re-calc */
+	calc_cached_dev_sectors(d->c);
+
+	/* Block writeback thread */
+	down_write(&dc->writeback_lock);
+	nr_stripes_old = d->nr_stripes;
+	n = DIV_ROUND_UP_ULL(parent_nr_sectors, d->stripe_size);
+	if (!n || n > max_stripes) {
+		pr_err("nr_stripes too large or invalid: %llu (start sector beyond end of disk?)\n",
+			n);
+		goto restore_dev_sectors;
+	}
+	d->nr_stripes = n;
+
+	n = d->nr_stripes * sizeof(atomic_t);
+	n_old = nr_stripes_old * sizeof(atomic_t);
+	tmp_realloc = kvrealloc(d->stripe_sectors_dirty, n_old,
+		n, GFP_KERNEL);
+	if (!tmp_realloc)
+		goto restore_nr_stripes;
+
+	d->stripe_sectors_dirty = (atomic_t *) tmp_realloc;
+
+	n = BITS_TO_LONGS(d->nr_stripes) * sizeof(unsigned long);
+	n_old = BITS_TO_LONGS(nr_stripes_old) * sizeof(unsigned long);
+	tmp_realloc = kvrealloc(d->full_dirty_stripes, n_old, n, GFP_KERNEL);
+	if (!tmp_realloc)
+		goto restore_nr_stripes;
+
+	d->full_dirty_stripes = (unsigned long *) tmp_realloc;
+
+	if ((res = set_capacity_and_notify(dc->disk.disk, parent_nr_sectors)))
+		goto unblock_and_exit;
+
+restore_nr_stripes:
+	d->nr_stripes = nr_stripes_old;
+restore_dev_sectors:
+	d->c->cached_dev_sectors = orig_cached_sectors;
+unblock_and_exit:
+	up_write(&dc->writeback_lock);
+	return res;
+}
+
  struct async_reg_args {
  	struct delayed_work reg_work;
  	char *path;
@@ -2569,7 +2648,10 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
  			mutex_lock(&bch_register_lock);
  			if (lookup_bdev(strim(path), &dev) == 0 &&
  			    bch_is_open(dev))
-				err = "device already registered";
+				if (bch_update_capacity(dev))
+					err = "capacity changed";
+				else
+					err = "device already registered";
  			else
  				err = "device busy";
  			mutex_unlock(&bch_register_lock);
--
2.39.0



On 25/1/23 18:59, Coly Li wrote:
> 
> 
>> 2023年1月25日 18:07,Andrea Tomassetti <andrea.tomassetti-opensource@devo.com> 写道:
>>
>> On Tue, Jan 17, 2023 at 5:18 PM Coly Li <colyli@suse.de> wrote:
>>>>
> 
>>>>>
>>>>>> struct async_reg_args {
>>>>>>     struct delayed_work reg_work;
>>>>>>     char *path;
>>>>>> @@ -2569,7 +2639,10 @@ static ssize_t register_bcache(struct kobject
>>>>>> *k, struct kobj_attribute *attr,
>>>>>>             mutex_lock(&bch_register_lock);
>>>>>>             if (lookup_bdev(strim(path), &dev) == 0 &&
>>>>>>                 bch_is_open(dev))
>>>>>> -                err = "device already registered";
>>>>>> +                if (bch_update_capacity(dev))
>>>>>> +                    err = "capacity changed";
>>>>>> +                else
>>>>>> +                    err = "device already registered";
>>>>>
>>>>>
>>>>> As I said, it should be a separated write-only sysfile under the cache
>>>>> device's directory.
>>>> Can I ask why you don't like the automatic resize way? Why should the
>>>> resize be manual?
>>>
>>> Most system administrators don’t like such silent automatic behavior. They want to extend the size explicitly, especially when there are other dependencies in their configurations.
>>>
>> What I was trying to say is that, in order to resize a block device, a
>> manual command should be executed. So, this is already a "non-silent"
>> automatic thing.
>> Moreover, if the block device has a FS on it, the FS needs to be
>> manually grown with some special utilities, e.g. xfs_growfs. So,
>> again, another non-silent automatic step. Don't you agree?
>> For example, to resize a qcow device attached to a VM I'm manually
>> doing a `virsh blockresize`. As soon as I issue that command, the
>> virtio_blk driver inside the VM detects the disk size change and calls
>> the `set_capacity_and_notify` function. Why then should bcache behave
>> differently?
> 
> The above VM example makes sense, I am almost convinced.
> 
>>
>> If you're concerned that this can somehow break the
>> behaviour-compatibility with older versions of the driver, can we
>> protect this automatic discovery with an optional parameter? Will this
>> be an option you will take into account?
> 
> Then let’s forget the optional sysfs file for the moment. Once you feel the patch is ready for me to test, please notify me with detailed steps to redo your testing.
> At that time, during my testing, let’s discuss whether an extra option is necessary; for now, just keep your idea of automatically resizing the cached device.
> 
> Thanks for your detailed explanation.
> 
> Coly Li
> 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RFC] Live resize of backing device
  2023-01-27 12:44                 ` Andrea Tomassetti
@ 2023-01-27 22:40                   ` Eric Wheeler
  2023-01-31 10:20                     ` Andrea Tomassetti
  2023-02-02 17:18                   ` Coly Li
  2023-02-19  9:39                   ` Coly Li
  2 siblings, 1 reply; 29+ messages in thread
From: Eric Wheeler @ 2023-01-27 22:40 UTC (permalink / raw)
  To: Andrea Tomassetti; +Cc: Coly Li, Kent Overstreet, linux-bcache

[-- Attachment #1: Type: text/plain, Size: 8019 bytes --]

On Fri, 27 Jan 2023, Andrea Tomassetti wrote:

> From 83f490ec8e81c840bdaf69e66021d661751975f2 Mon Sep 17 00:00:00 2001
> From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> Date: Thu, 8 Sep 2022 09:47:55 +0200
> Subject: [PATCH v2] bcache: Add support for live resize of backing devices
> 
> Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> ---
> Hi Coly,
> this is the second version of the patch. As you correctly pointed out,
> I implemented roll-back functionalities in case of error.
> I'm testing this funcionality using QEMU/KVM vm via libvirt.
> Here the steps:
>   1. make-bcache --writeback -B /dev/vdb -C /dev/vdc
>   2. mkfs.xfs /dev/bcache0
>   3. mount /dev/bcache0 /mnt
>   3. dd if=/dev/random of=/mnt/random0 bs=1M count=1000
>   4. md5sum /mnt/random0 | tee /mnt/random0.md5
>   5. [HOST] virsh blockresize <vm-name> --path <disk-path> --size 
> <new-size>
>   6. xfs_growfs /dev/bcache0
>   6. Repeat steps 3 and 4 with a different file name (e.g. random1.md5)
>   7. umount/reboot/remount and check that the md5 hashes are correct with
>         md5sum -c /mnt/random?.md5
> 
>  drivers/md/bcache/super.c | 84 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 83 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index ba3909bb6bea..1435a3f605f8 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -2443,6 +2443,85 @@ static bool bch_is_open(dev_t dev)
>  	return bch_is_open_cache(dev) || bch_is_open_backing(dev);
>  }
> 
> +static bool bch_update_capacity(dev_t dev)
> +{
> +	const size_t max_stripes = min_t(size_t, INT_MAX,
> +					 SIZE_MAX / sizeof(atomic_t));
> +
> +	uint64_t n, n_old, orig_cached_sectors = 0;
> +	void *tmp_realloc;
> +
> +	int nr_stripes_old;
> +	bool res = false;
> +
> +	struct bcache_device *d;
> +	struct cache_set *c, *tc;
> +	struct cached_dev *dcp, *t, *dc = NULL;
> +
> +	uint64_t parent_nr_sectors;
> +
> +	list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
> +		list_for_each_entry_safe(dcp, t, &c->cached_devs, list)
> +			if (dcp->bdev->bd_dev == dev) {
> +				dc = dcp;
> +				goto dc_found;
> +			}
> +
> +dc_found:
> +	if (!dc)
> +		return false;
> +
> +	parent_nr_sectors = bdev_nr_sectors(dc->bdev) - dc->sb.data_offset;
> +
> +	if (parent_nr_sectors == bdev_nr_sectors(dc->disk.disk->part0))
> +		return false;
> +
> +	d = &dc->disk;
> +	orig_cached_sectors = d->c->cached_dev_sectors;
> +
> +	/* Force cached device sectors re-calc */
> +	calc_cached_dev_sectors(d->c);
> +
> +	/* Block writeback thread */
> +	down_write(&dc->writeback_lock);
> +	nr_stripes_old = d->nr_stripes;
> +	n = DIV_ROUND_UP_ULL(parent_nr_sectors, d->stripe_size);
> +	if (!n || n > max_stripes) {
> +		pr_err("nr_stripes too large or invalid: %llu (start sector
> beyond end of disk?)\n",
> +			n);
> +		goto restore_dev_sectors;
> +	}
> +	d->nr_stripes = n;

It looks like there is some overlap between the new bch_update_capacity() 
function and the existing bcache_device_init() function:

	https://github.com/torvalds/linux/blob/master/drivers/md/bcache/super.c#L909

IMHO, it would be ideal if bch_update_capacity() could also handle the 
uninitialized state of full_dirty_stripes/stripe_sectors_dirty (and any 
related bits) at bdev registration time so bcache_device_init() can call 
bch_update_capacity().  Thus, bch_update_capacity() would replace the 
set_capacity() call in bcache_device_init().

If a call to bch_update_capacity() can set the size at registration or 
resize as necessary then it would remove duplicate code and keep 
initialization in only one place.

I'll defer to what Coly thinks is best, just my $0.02 

Cheers,


--
Eric Wheeler



> +
> +	n = d->nr_stripes * sizeof(atomic_t);
> +	n_old = nr_stripes_old * sizeof(atomic_t);
> +	tmp_realloc = kvrealloc(d->stripe_sectors_dirty, n_old,
> +		n, GFP_KERNEL);
> +	if (!tmp_realloc)
> +		goto restore_nr_stripes;
> +
> +	d->stripe_sectors_dirty = (atomic_t *) tmp_realloc;
> +
> +	n = BITS_TO_LONGS(d->nr_stripes) * sizeof(unsigned long);
> +	n_old = BITS_TO_LONGS(nr_stripes_old) * sizeof(unsigned long);
> +	tmp_realloc = kvrealloc(d->full_dirty_stripes, n_old, n, GFP_KERNEL);
> +	if (!tmp_realloc)
> +		goto restore_nr_stripes;
> +
> +	d->full_dirty_stripes = (unsigned long *) tmp_realloc;
> +
> +	if ((res = set_capacity_and_notify(dc->disk.disk, parent_nr_sectors)))
> +		goto unblock_and_exit;
> +
> +restore_nr_stripes:
> +	d->nr_stripes = nr_stripes_old;
> +restore_dev_sectors:
> +	d->c->cached_dev_sectors = orig_cached_sectors;
> +unblock_and_exit:
> +	up_write(&dc->writeback_lock);
> +	return res;
> +}
> +
>  struct async_reg_args {
>  	struct delayed_work reg_work;
>  	char *path;
> @@ -2569,7 +2648,10 @@ static ssize_t register_bcache(struct kobject *k,
> struct kobj_attribute *attr,
>  			mutex_lock(&bch_register_lock);
>  			if (lookup_bdev(strim(path), &dev) == 0 &&
>  			    bch_is_open(dev))
> -				err = "device already registered";
> +				if (bch_update_capacity(dev))
> +					err = "capacity changed";
> +				else
> +					err = "device already registered";
>  			else
>  				err = "device busy";
>  			mutex_unlock(&bch_register_lock);
> --
> 2.39.0
> 
> 
> 
> On 25/1/23 18:59, Coly Li wrote:
> > 
> > 
> >> 2023年1月25日 18:07,Andrea Tomassetti
> >> <andrea.tomassetti-opensource@devo.com> 写道:
> >>
> >> On Tue, Jan 17, 2023 at 5:18 PM Coly Li <colyli@suse.de> wrote:
> >>>>
> > 
> >>>>>
> >>>>>> struct async_reg_args {
> >>>>>>     struct delayed_work reg_work;
> >>>>>>     char *path;
> >>>>>> @@ -2569,7 +2639,10 @@ static ssize_t register_bcache(struct kobject
> >>>>>> *k, struct kobj_attribute *attr,
> >>>>>>             mutex_lock(&bch_register_lock);
> >>>>>>             if (lookup_bdev(strim(path), &dev) == 0 &&
> >>>>>>                 bch_is_open(dev))
> >>>>>> -                err = "device already registered";
> >>>>>> +                if (bch_update_capacity(dev))
> >>>>>> +                    err = "capacity changed";
> >>>>>> +                else
> >>>>>> +                    err = "device already registered";
> >>>>>
> >>>>>
> >>>>> As I said, it should be a separated write-only sysfile under the cache
> >>>>> device's directory.
> >>>> Can I ask why you don't like the automatic resize way? Why should the
> >>>> resize be manual?
> >>>
> >>> Most of system administrators don’t like such silently automatic things.
> >>> They want to extend the size explicitly, especially when there is other
> >>> dependences in their configurations.
> >>>
> >> What I was trying to say is that, in order to resize a block device, a
> >> manual command should be executed. So, this is already a "non-silent"
> >> automatic thing.
> >> Moreover, if the block device has a FS on it, the FS needs to be
> >> manually grown with some special utilities, e.g. xfs_growfs. So,
> >> again, another non-silent automatic step. Don't you agree?
> >> For example, to resize a qcow device attached to a VM I'm manually
> >> doing a `virsh blockresize`. As soon as I issue that command, the
> >> virtio_blk driver inside the VM detects the disk size change and calls
> >> the `set_capacity_and_notify` function. Why then should bcache behave
> >> differently?
> > 
> > The above VM example makes sense, I am almost convinced.
> > 
> >>
> >> If you're concerned that this can somehow break the
> >> behaviour-compatibility with older versions of the driver, can we
> >> protect this automatic discovery with an optional parameter? Will this
> >> be an option you will take into account?
> > 
> > Then let’s forget the option sysfs at this moment. Once you feel the patch
> > is ready for me to testing, please notice me with detailed steps to redo
> > your testing.
> > At that time during my testing, let’s discuss whether an extra option is
> > necesssary, for now just keep your idea as automatically resize the cached
> > device.
> > 
> > Thanks for your detailed explanation.
> > 
> > Coly Li
> > 
> 
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC] Live resize of backing device
  2023-01-27 22:40                   ` Eric Wheeler
@ 2023-01-31 10:20                     ` Andrea Tomassetti
  0 siblings, 0 replies; 29+ messages in thread
From: Andrea Tomassetti @ 2023-01-31 10:20 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Coly Li, Kent Overstreet, linux-bcache

Hi Eric,
thank you for your two cents.


On Fri, Jan 27, 2023 at 11:40 PM Eric Wheeler <bcache@lists.ewheeler.net> wrote:
>
> On Fri, 27 Jan 2023, Andrea Tomassetti wrote:
>
> > From 83f490ec8e81c840bdaf69e66021d661751975f2 Mon Sep 17 00:00:00 2001
> > From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> > Date: Thu, 8 Sep 2022 09:47:55 +0200
> > Subject: [PATCH v2] bcache: Add support for live resize of backing devices
> >
> > Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> > ---
> > Hi Coly,
> > this is the second version of the patch. As you correctly pointed out,
> > I implemented roll-back functionalities in case of error.
> > I'm testing this funcionality using QEMU/KVM vm via libvirt.
> > Here the steps:
> >   1. make-bcache --writeback -B /dev/vdb -C /dev/vdc
> >   2. mkfs.xfs /dev/bcache0
> >   3. mount /dev/bcache0 /mnt
> >   3. dd if=/dev/random of=/mnt/random0 bs=1M count=1000
> >   4. md5sum /mnt/random0 | tee /mnt/random0.md5
> >   5. [HOST] virsh blockresize <vm-name> --path <disk-path> --size
> > <new-size>
> >   6. xfs_growfs /dev/bcache0
> >   6. Repeat steps 3 and 4 with a different file name (e.g. random1.md5)
> >   7. umount/reboot/remount and check that the md5 hashes are correct with
> >         md5sum -c /mnt/random?.md5
> >
> >  drivers/md/bcache/super.c | 84 ++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 83 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> > index ba3909bb6bea..1435a3f605f8 100644
> > --- a/drivers/md/bcache/super.c
> > +++ b/drivers/md/bcache/super.c
> > @@ -2443,6 +2443,85 @@ static bool bch_is_open(dev_t dev)
> >       return bch_is_open_cache(dev) || bch_is_open_backing(dev);
> >  }
> >
> > +static bool bch_update_capacity(dev_t dev)
> > +{
> > +     const size_t max_stripes = min_t(size_t, INT_MAX,
> > +                                      SIZE_MAX / sizeof(atomic_t));
> > +
> > +     uint64_t n, n_old, orig_cached_sectors = 0;
> > +     void *tmp_realloc;
> > +
> > +     int nr_stripes_old;
> > +     bool res = false;
> > +
> > +     struct bcache_device *d;
> > +     struct cache_set *c, *tc;
> > +     struct cached_dev *dcp, *t, *dc = NULL;
> > +
> > +     uint64_t parent_nr_sectors;
> > +
> > +     list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
> > +             list_for_each_entry_safe(dcp, t, &c->cached_devs, list)
> > +                     if (dcp->bdev->bd_dev == dev) {
> > +                             dc = dcp;
> > +                             goto dc_found;
> > +                     }
> > +
> > +dc_found:
> > +     if (!dc)
> > +             return false;
> > +
> > +     parent_nr_sectors = bdev_nr_sectors(dc->bdev) - dc->sb.data_offset;
> > +
> > +     if (parent_nr_sectors == bdev_nr_sectors(dc->disk.disk->part0))
> > +             return false;
> > +
> > +     d = &dc->disk;
> > +     orig_cached_sectors = d->c->cached_dev_sectors;
> > +
> > +     /* Force cached device sectors re-calc */
> > +     calc_cached_dev_sectors(d->c);
> > +
> > +     /* Block writeback thread */
> > +     down_write(&dc->writeback_lock);
> > +     nr_stripes_old = d->nr_stripes;
> > +     n = DIV_ROUND_UP_ULL(parent_nr_sectors, d->stripe_size);
> > +     if (!n || n > max_stripes) {
> > +             pr_err("nr_stripes too large or invalid: %llu (start sector
> > beyond end of disk?)\n",
> > +                     n);
> > +             goto restore_dev_sectors;
> > +     }
> > +     d->nr_stripes = n;
>
> It looks like there is some overlap between the new bch_update_capacity()
> function and the existing bcache_device_init() function:

This is something I already pointed out in the first version of my
patch, some emails ago:

  - There is some reused code between this new function and
    `bcache_device_init`. Maybe I can move `const size_t max_stripes` to
    a broader scope or make it a macro, what do you think?

>
>         https://github.com/torvalds/linux/blob/master/drivers/md/bcache/super.c#L909
>
> IMHO, it would be ideal if bch_update_capacity() could also handle the
> uninitialized state of full_dirty_stripes/stripe_sectors_dirty (and any
> related bits) at bdev registration time so bcache_device_init() can call
> bch_update_capacity().  Thus, bch_update_capacity() would replace the
> set_capacity() call in bcache_device_init().
>
Nice idea

Regards,
Andrea

> If a call to bch_update_capacity() can set the size at registration or
> resize as necessary then it would remove duplicate code and keep
> initialization in only one place.
>
> I'll defer to what Coly thinks is best, just my $0.02
>
> Cheers,
>
>
> --
> Eric Wheeler
>
>
>
> > +
> > +     n = d->nr_stripes * sizeof(atomic_t);
> > +     n_old = nr_stripes_old * sizeof(atomic_t);
> > +     tmp_realloc = kvrealloc(d->stripe_sectors_dirty, n_old,
> > +             n, GFP_KERNEL);
> > +     if (!tmp_realloc)
> > +             goto restore_nr_stripes;
> > +
> > +     d->stripe_sectors_dirty = (atomic_t *) tmp_realloc;
> > +
> > +     n = BITS_TO_LONGS(d->nr_stripes) * sizeof(unsigned long);
> > +     n_old = BITS_TO_LONGS(nr_stripes_old) * sizeof(unsigned long);
> > +     tmp_realloc = kvrealloc(d->full_dirty_stripes, n_old, n, GFP_KERNEL);
> > +     if (!tmp_realloc)
> > +             goto restore_nr_stripes;
> > +
> > +     d->full_dirty_stripes = (unsigned long *) tmp_realloc;
> > +
> > +     if ((res = set_capacity_and_notify(dc->disk.disk, parent_nr_sectors)))
> > +             goto unblock_and_exit;
> > +
> > +restore_nr_stripes:
> > +     d->nr_stripes = nr_stripes_old;
> > +restore_dev_sectors:
> > +     d->c->cached_dev_sectors = orig_cached_sectors;
> > +unblock_and_exit:
> > +     up_write(&dc->writeback_lock);
> > +     return res;
> > +}
> > +
> >  struct async_reg_args {
> >       struct delayed_work reg_work;
> >       char *path;
> > @@ -2569,7 +2648,10 @@ static ssize_t register_bcache(struct kobject *k,
> > struct kobj_attribute *attr,
> >                       mutex_lock(&bch_register_lock);
> >                       if (lookup_bdev(strim(path), &dev) == 0 &&
> >                           bch_is_open(dev))
> > -                             err = "device already registered";
> > +                             if (bch_update_capacity(dev))
> > +                                     err = "capacity changed";
> > +                             else
> > +                                     err = "device already registered";
> >                       else
> >                               err = "device busy";
> >                       mutex_unlock(&bch_register_lock);
> > --
> > 2.39.0
> >
> >
> >
> > On 25/1/23 18:59, Coly Li wrote:
> > >
> > >
> > >> 2023年1月25日 18:07,Andrea Tomassetti
> > >> <andrea.tomassetti-opensource@devo.com> 写道:
> > >>
> > >> On Tue, Jan 17, 2023 at 5:18 PM Coly Li <colyli@suse.de> wrote:
> > >>>>
> > >
> > >>>>>
> > >>>>>> struct async_reg_args {
> > >>>>>>     struct delayed_work reg_work;
> > >>>>>>     char *path;
> > >>>>>> @@ -2569,7 +2639,10 @@ static ssize_t register_bcache(struct kobject
> > >>>>>> *k, struct kobj_attribute *attr,
> > >>>>>>             mutex_lock(&bch_register_lock);
> > >>>>>>             if (lookup_bdev(strim(path), &dev) == 0 &&
> > >>>>>>                 bch_is_open(dev))
> > >>>>>> -                err = "device already registered";
> > >>>>>> +                if (bch_update_capacity(dev))
> > >>>>>> +                    err = "capacity changed";
> > >>>>>> +                else
> > >>>>>> +                    err = "device already registered";
> > >>>>>
> > >>>>>
> > >>>>> As I said, it should be a separated write-only sysfile under the cache
> > >>>>> device's directory.
> > >>>> Can I ask why you don't like the automatic resize way? Why should the
> > >>>> resize be manual?
> > >>>
> > >>> Most of system administrators don’t like such silently automatic things.
> > >>> They want to extend the size explicitly, especially when there is other
> > >>> dependences in their configurations.
> > >>>
> > >> What I was trying to say is that, in order to resize a block device, a
> > >> manual command should be executed. So, this is already a "non-silent"
> > >> automatic thing.
> > >> Moreover, if the block device has a FS on it, the FS needs to be
> > >> manually grown with some special utilities, e.g. xfs_growfs. So,
> > >> again, another non-silent automatic step. Don't you agree?
> > >> For example, to resize a qcow device attached to a VM I'm manually
> > >> doing a `virsh blockresize`. As soon as I issue that command, the
> > >> virtio_blk driver inside the VM detects the disk size change and calls
> > >> the `set_capacity_and_notify` function. Why then should bcache behave
> > >> differently?
> > >
> > > The above VM example makes sense, I am almost convinced.
> > >
> > >>
> > >> If you're concerned that this can somehow break the
> > >> behaviour-compatibility with older versions of the driver, can we
> > >> protect this automatic discovery with an optional parameter? Will this
> > >> be an option you will take into account?
> > >
> > > Then let’s forget the option sysfs at this moment. Once you feel the patch
> > > is ready for me to testing, please notice me with detailed steps to redo
> > > your testing.
> > > At that time during my testing, let’s discuss whether an extra option is
> > > necesssary, for now just keep your idea as automatically resize the cached
> > > device.
> > >
> > > Thanks for your detailed explanation.
> > >
> > > Coly Li
> > >
> >
> >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC] Live resize of backing device
  2023-01-27 12:44                 ` Andrea Tomassetti
  2023-01-27 22:40                   ` Eric Wheeler
@ 2023-02-02 17:18                   ` Coly Li
  2023-02-02 20:48                     ` Eric Wheeler
  2023-02-19  9:39                   ` Coly Li
  2 siblings, 1 reply; 29+ messages in thread
From: Coly Li @ 2023-02-02 17:18 UTC (permalink / raw)
  To: Andrea Tomassetti; +Cc: Eric Wheeler, Kent Overstreet, linux-bcache



> 2023年1月27日 20:44,Andrea Tomassetti <andrea.tomassetti-opensource@devo.com> 写道:
> 
> From 83f490ec8e81c840bdaf69e66021d661751975f2 Mon Sep 17 00:00:00 2001
> From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> Date: Thu, 8 Sep 2022 09:47:55 +0200
> Subject: [PATCH v2] bcache: Add support for live resize of backing devices
> 
> Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> ---
> Hi Coly,
> this is the second version of the patch. As you correctly pointed out,
> I implemented roll-back functionalities in case of error.
> I'm testing this funcionality using QEMU/KVM vm via libvirt.
> Here the steps:
>  1. make-bcache --writeback -B /dev/vdb -C /dev/vdc
>  2. mkfs.xfs /dev/bcache0
>  3. mount /dev/bcache0 /mnt
>  3. dd if=/dev/random of=/mnt/random0 bs=1M count=1000
>  4. md5sum /mnt/random0 | tee /mnt/random0.md5
>  5. [HOST] virsh blockresize <vm-name> --path <disk-path> --size <new-size>
>  6. xfs_growfs /dev/bcache0
>  6. Repeat steps 3 and 4 with a different file name (e.g. random1.md5)
>  7. umount/reboot/remount and check that the md5 hashes are correct with
>        md5sum -c /mnt/random?.md5

[snipped]

Hi Andrea,

For the above step 5, could you provide a specific command line for me to reproduce?

I tried to resize the disk via virsh from 400G to 800G, but the disk resize doesn’t happen online; I have to stop and restart the virtual machine to see the resized disk.

Thanks.

Coly Li


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC] Live resize of backing device
  2023-02-02 17:18                   ` Coly Li
@ 2023-02-02 20:48                     ` Eric Wheeler
  2023-02-03  2:41                       ` Coly Li
  0 siblings, 1 reply; 29+ messages in thread
From: Eric Wheeler @ 2023-02-02 20:48 UTC (permalink / raw)
  To: Coly Li; +Cc: Andrea Tomassetti, Kent Overstreet, linux-bcache

[-- Attachment #1: Type: text/plain, Size: 2346 bytes --]

On Fri, 3 Feb 2023, Coly Li wrote:
> > 2023年1月27日 20:44,Andrea Tomassetti <andrea.tomassetti-opensource@devo.com> 写道:
> > From 83f490ec8e81c840bdaf69e66021d661751975f2 Mon Sep 17 00:00:00 2001
> > From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> > Date: Thu, 8 Sep 2022 09:47:55 +0200
> > Subject: [PATCH v2] bcache: Add support for live resize of backing devices
> > 
> > Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> > ---
> > Hi Coly,
> > this is the second version of the patch. As you correctly pointed out,
> > I implemented roll-back functionalities in case of error.
> > I'm testing this funcionality using QEMU/KVM vm via libvirt.
> > Here the steps:
> >  1. make-bcache --writeback -B /dev/vdb -C /dev/vdc
> >  2. mkfs.xfs /dev/bcache0
> >  3. mount /dev/bcache0 /mnt
> >  3. dd if=/dev/random of=/mnt/random0 bs=1M count=1000
> >  4. md5sum /mnt/random0 | tee /mnt/random0.md5
> >  5. [HOST] virsh blockresize <vm-name> --path <disk-path> --size <new-size>
> >  6. xfs_growfs /dev/bcache0
> >  6. Repeat steps 3 and 4 with a different file name (e.g. random1.md5)
> >  7. umount/reboot/remount and check that the md5 hashes are correct with
> >        md5sum -c /mnt/random?.md5
> 
> [snipped]
> 
> Hi Andrea,
> 
> For the above step 5, could you provide a specific command line for me to reproduce?
> 
> I tried to resize the disk by virsh from 400G to 800G, the disk resize 
> doesn’t happen online, I have to stop and restart the virtual machine 
> and see the resized disk.

Coly, 

The `virsh blockresize` command only informs the VM that the size has 
changed. You will need to resize the disk itself using `lvresize` or 
whatever tool matches your VM's backing device. If the backing device 
for the virtual machine is a file, then you could use `truncate -s <size> 
/file/path` before issuing `virsh blockresize`.

Another possibility: are you using virtio-blk as shown in his example, or 
a different virtual driver like virtio-scsi?

If the VM doesn't detect it, you might need to do this if it's a 
virtio-scsi disk on the inside:

	for i in /sys/class/scsi_host/*/scan; do echo - - - > $i; done

If you're using virtio-blk (vda) then it should be immediate.  I don't 
think IDE will work at all, and not sure about nvme.

--
Eric Wheeler

	 
> Thanks.
> 
> Coly Li
> 
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC] Live resize of backing device
  2023-02-02 20:48                     ` Eric Wheeler
@ 2023-02-03  2:41                       ` Coly Li
  0 siblings, 0 replies; 29+ messages in thread
From: Coly Li @ 2023-02-03  2:41 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Andrea Tomassetti, Kent Overstreet, linux-bcache



> 2023年2月3日 04:48,Eric Wheeler <bcache@lists.ewheeler.net> 写道:
> 
> On Fri, 3 Feb 2023, Coly Li wrote:
>>> 2023年1月27日 20:44,Andrea Tomassetti <andrea.tomassetti-opensource@devo.com> 写道:
>>> From 83f490ec8e81c840bdaf69e66021d661751975f2 Mon Sep 17 00:00:00 2001
>>> From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
>>> Date: Thu, 8 Sep 2022 09:47:55 +0200
>>> Subject: [PATCH v2] bcache: Add support for live resize of backing devices
>>> 
>>> Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
>>> ---
>>> Hi Coly,
>>> this is the second version of the patch. As you correctly pointed out,
>>> I implemented roll-back functionality in case of error.
>>> I'm testing this functionality using a QEMU/KVM VM via libvirt.
>>> Here are the steps:
>>> 1. make-bcache --writeback -B /dev/vdb -C /dev/vdc
>>> 2. mkfs.xfs /dev/bcache0
>>> 3. mount /dev/bcache0 /mnt
>>> 4. dd if=/dev/random of=/mnt/random0 bs=1M count=1000
>>> 5. md5sum /mnt/random0 | tee /mnt/random0.md5
>>> 6. [HOST] virsh blockresize <vm-name> --path <disk-path> --size <new-size>
>>> 7. xfs_growfs /dev/bcache0
>>> 8. Repeat steps 4 and 5 with a different file name (e.g. random1.md5)
>>> 9. umount/reboot/remount and check that the md5 hashes are correct with
>>>       md5sum -c /mnt/random?.md5
>> 
>> [snipped]
>> 
>> Hi Andrea,
>> 
>> For the above step 5, could you provide a specific command line for me to reproduce?
>> 
>> I tried to resize the disk with virsh from 400G to 800G, but the disk 
>> resize doesn't happen online; I have to stop and restart the virtual 
>> machine to see the resized disk.
> 
> Coly, 
> 
> The `virsh blockresize` command only informs the VM that the size has 
> changed. You will need to resize the disk itself using `lvresize` or 
> whatever your VM's backing device is. If your backing device for the 
> virtual machine is a file, then you could use `truncate -s <size> 
> /file/path` before issuing `virsh blockresize`.
> 
> Another possibility: are you using virtio-blk as shown in his example, or 
> a different virtual driver like virtio-scsi?
> 

Yes, it was a virtio-scsi drive.


> If the VM doesn't detect the new size, you might need to do this if it's a 
> virtio-scsi disk on the inside:
> 
> for i in /sys/class/scsi_host/*/scan; do echo - - - > $i; done
> 
> If you're using virtio-blk (vda) then it should be immediate.  I don't 
> think IDE will work at all, and I'm not sure about NVMe.

Sure, now I have switched to a virtio-blk drive and I can see the backing device size extend immediately after the blockresize command in virsh.

Thanks.

Coly Li


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC] Live resize of backing device
  2023-01-27 12:44                 ` Andrea Tomassetti
  2023-01-27 22:40                   ` Eric Wheeler
  2023-02-02 17:18                   ` Coly Li
@ 2023-02-19  9:39                   ` Coly Li
  2023-02-20  8:27                     ` mingzhe
  2 siblings, 1 reply; 29+ messages in thread
From: Coly Li @ 2023-02-19  9:39 UTC (permalink / raw)
  To: Andrea Tomassetti; +Cc: Eric Wheeler, Kent Overstreet, linux-bcache



> On Jan 27, 2023, at 20:44, Andrea Tomassetti <andrea.tomassetti-opensource@devo.com> wrote:
> 
> From 83f490ec8e81c840bdaf69e66021d661751975f2 Mon Sep 17 00:00:00 2001
> From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> Date: Thu, 8 Sep 2022 09:47:55 +0200
> Subject: [PATCH v2] bcache: Add support for live resize of backing devices
> 
> Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>

Hi Andrea,

I am fine with this patch and have added it to my test queue now. Do you have an updated version (e.g. with more code refinement or an added commit log)? If so, I can update my local version.

BTW, it would be better if the patch were sent out as a separate email.

Thanks.

Coly Li



> ---
> Hi Coly,
> this is the second version of the patch. As you correctly pointed out,
> I implemented roll-back functionality in case of error.
> I'm testing this functionality using a QEMU/KVM VM via libvirt.
> Here are the steps:
>  1. make-bcache --writeback -B /dev/vdb -C /dev/vdc
>  2. mkfs.xfs /dev/bcache0
>  3. mount /dev/bcache0 /mnt
>  4. dd if=/dev/random of=/mnt/random0 bs=1M count=1000
>  5. md5sum /mnt/random0 | tee /mnt/random0.md5
>  6. [HOST] virsh blockresize <vm-name> --path <disk-path> --size <new-size>
>  7. xfs_growfs /dev/bcache0
>  8. Repeat steps 4 and 5 with a different file name (e.g. random1.md5)
>  9. umount/reboot/remount and check that the md5 hashes are correct with
>        md5sum -c /mnt/random?.md5
> 
> drivers/md/bcache/super.c | 84 ++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 83 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index ba3909bb6bea..1435a3f605f8 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
>> 

[snipped]


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC] Live resize of backing device
  2023-02-19  9:39                   ` Coly Li
@ 2023-02-20  8:27                     ` mingzhe
  2023-02-20 12:29                       ` Coly Li
  2023-02-27 22:08                       ` Eric Wheeler
  0 siblings, 2 replies; 29+ messages in thread
From: mingzhe @ 2023-02-20  8:27 UTC (permalink / raw)
  To: Coly Li, Andrea Tomassetti; +Cc: Eric Wheeler, Kent Overstreet, linux-bcache



On 2023/2/19 17:39, Coly Li wrote:
> 
> 
>> On Jan 27, 2023, at 20:44, Andrea Tomassetti <andrea.tomassetti-opensource@devo.com> wrote:
>>
>>  From 83f490ec8e81c840bdaf69e66021d661751975f2 Mon Sep 17 00:00:00 2001
>> From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
>> Date: Thu, 8 Sep 2022 09:47:55 +0200
>> Subject: [PATCH v2] bcache: Add support for live resize of backing devices
>>
>> Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> 
> Hi Andrea,
> 
> I am fine with this patch and have added it to my test queue now. Do you have an updated version (e.g. with more code refinement or an added commit log)? If so, I can update my local version.
> 
> BTW, it would be better if the patch were sent out as a separate email.
> 
> Thanks.
> 
> Coly Li
> 
Hi, Coly

I posted some patchsets about online resize.

-[PATCH v5 1/3] bcache: add dirty_data in struct bcache_device
-[PATCH v5 2/3] bcache: allocate stripe memory when 
partial_stripes_expensive is true
-[PATCH v5 3/3] bcache: support online resizing of cached_dev

There are some differences:
1. Create /sys/block/bcache0/bcache/size in sysfs to trigger resize
2. Allocate stripe memory only if partial_stripes_expensive is true
3. Simplify bcache_dev_sectors_dirty()

Since the bcache superblock uses some sectors, the actual space of the 
bcache device is smaller than that of the backing device. In order to 
provide a bcache device with a user-specified size, we need to create a 
backing device with more space and then resize bcache. So it is 
essential that the resize operation can specify a size.
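The size accounting described above can be sketched numerically. This is a minimal sketch: the 16-sector (8 KiB) data offset is the usual make-bcache default and the 400 GiB figure is only an example, so treat both as assumptions.

```shell
#!/bin/sh
# With the superblock/data offset reserved at the front of the backing
# device, the usable bcache device is slightly smaller than the backing.
backing_bytes=$((400 * 1024 * 1024 * 1024))   # example: 400 GiB backing device
data_offset_sectors=16                        # assumed default 8 KiB data offset
usable_sectors=$(( backing_bytes / 512 - data_offset_sectors ))
echo "backing: $((backing_bytes / 512)) sectors, bcache device: $usable_sectors sectors"
```

This is why, to expose an exact user-visible size, the backing device has to be made slightly larger and the bcache device then resized to the requested value.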




> 
> 
>> ---
>> Hi Coly,
>> this is the second version of the patch. As you correctly pointed out,
>> I implemented roll-back functionalities in case of error.
>> I'm testing this funcionality using QEMU/KVM vm via libvirt.
>> Here the steps:
>>   1. make-bcache --writeback -B /dev/vdb -C /dev/vdc
>>   2. mkfs.xfs /dev/bcache0
>>   3. mount /dev/bcache0 /mnt
>>   3. dd if=/dev/random of=/mnt/random0 bs=1M count=1000
>>   4. md5sum /mnt/random0 | tee /mnt/random0.md5
>>   5. [HOST] virsh blockresize <vm-name> --path <disk-path> --size <new-size>
>>   6. xfs_growfs /dev/bcache0
>>   6. Repeat steps 3 and 4 with a different file name (e.g. random1.md5)
>>   7. umount/reboot/remount and check that the md5 hashes are correct with
>>         md5sum -c /mnt/random?.md5
>>
>> drivers/md/bcache/super.c | 84 ++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 83 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
>> index ba3909bb6bea..1435a3f605f8 100644
>> --- a/drivers/md/bcache/super.c
>> +++ b/drivers/md/bcache/super.c
>>>
> 
> [snipped]
> 
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC] Live resize of backing device
  2023-02-20  8:27                     ` mingzhe
@ 2023-02-20 12:29                       ` Coly Li
  2023-02-22  8:42                         ` Andrea Tomassetti
  2023-02-27 22:08                       ` Eric Wheeler
  1 sibling, 1 reply; 29+ messages in thread
From: Coly Li @ 2023-02-20 12:29 UTC (permalink / raw)
  To: mingzhe; +Cc: Andrea Tomassetti, Eric Wheeler, Kent Overstreet, linux-bcache



> On Feb 20, 2023, at 16:27, mingzhe <mingzhe.zou@easystack.cn> wrote:
> 
> 
> 
> On 2023/2/19 17:39, Coly Li wrote:
>>> On Jan 27, 2023, at 20:44, Andrea Tomassetti <andrea.tomassetti-opensource@devo.com> wrote:
>>> 
>>> From 83f490ec8e81c840bdaf69e66021d661751975f2 Mon Sep 17 00:00:00 2001
>>> From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
>>> Date: Thu, 8 Sep 2022 09:47:55 +0200
>>> Subject: [PATCH v2] bcache: Add support for live resize of backing devices
>>> 
>>> Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
>> Hi Andrea,
>> I am fine with this patch and have added it to my test queue now. Do you have an updated version (e.g. with more code refinement or an added commit log)? If so, I can update my local version.
>> BTW, it would be better if the patch were sent out as a separate email.
>> Thanks.
>> Coly Li
> Hi, Coly
> 
> I posted some patchsets about online resize.
> 
> -[PATCH v5 1/3] bcache: add dirty_data in struct bcache_device
> -[PATCH v5 2/3] bcache: allocate stripe memory when partial_stripes_expensive is true
> -[PATCH v5 3/3] bcache: support online resizing of cached_dev
> 
> There are some differences:
> 1. Create /sys/block/bcache0/bcache/size in sysfs to trigger resize
> 2. Allocate stripe memory only if partial_stripes_expensive is true
> 3. Simplify bcache_dev_sectors_dirty()
> 
> Since the bcache superblock uses some sectors, the actual space of the bcache device is smaller than that of the backing device. In order to provide a bcache device with a user-specified size, we need to create a backing device with more space and then resize bcache. So it is essential that the resize operation can specify a size.
> 
> 

Yes, they are in my for-test queue already. I will test both and make a choice.

Thanks.

Coly Li

[snipped]


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC] Live resize of backing device
  2023-02-20 12:29                       ` Coly Li
@ 2023-02-22  8:42                         ` Andrea Tomassetti
  0 siblings, 0 replies; 29+ messages in thread
From: Andrea Tomassetti @ 2023-02-22  8:42 UTC (permalink / raw)
  To: Coly Li; +Cc: mingzhe, Eric Wheeler, Kent Overstreet, linux-bcache

Hi Coly,

On Mon, Feb 20, 2023 at 1:30 PM Coly Li <colyli@suse.de> wrote:
>
>
>
> > On Feb 20, 2023, at 16:27, mingzhe <mingzhe.zou@easystack.cn> wrote:
> >
> >
> >
> > On 2023/2/19 17:39, Coly Li wrote:
> >>> On Jan 27, 2023, at 20:44, Andrea Tomassetti <andrea.tomassetti-opensource@devo.com> wrote:
> >>>
> >>> From 83f490ec8e81c840bdaf69e66021d661751975f2 Mon Sep 17 00:00:00 2001
> >>> From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> >>> Date: Thu, 8 Sep 2022 09:47:55 +0200
> >>> Subject: [PATCH v2] bcache: Add support for live resize of backing devices
> >>>
> >>> Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> >> Hi Andrea,
> >> I am fine with this patch and have added it to my test queue now. Do you have an updated version (e.g. with more code refinement or an added commit log)? If so, I can update my local version.
Thank you very much. I appreciate it. I don't have any other version;
there's just some code in common with `bcache_device_init`, but I
couldn't find an elegant way to avoid copy-pasting those few lines.

> >> BTW, it would be better if the patch were sent out as a separate email.
I'll send it right now on a separate thread.

Thank you,
Andrea

> >> Thanks.
> >> Coly Li
> > Hi, Coly
> >
> > I posted some patchsets about online resize.
> >
> > -[PATCH v5 1/3] bcache: add dirty_data in struct bcache_device
> > -[PATCH v5 2/3] bcache: allocate stripe memory when partial_stripes_expensive is true
> > -[PATCH v5 3/3] bcache: support online resizing of cached_dev
> >
> > There are some differences:
> > 1. Create /sys/block/bcache0/bcache/size in sysfs to trigger resize
> > 2. Allocate stripe memory only if partial_stripes_expensive is true
> > 3. Simplify bcache_dev_sectors_dirty()
> >
> > Since the bcache superblock uses some sectors, the actual space of the bcache device is smaller than that of the backing device. In order to provide a bcache device with a user-specified size, we need to create a backing device with more space and then resize bcache. So it is essential that the resize operation can specify a size.
> >
> >
>
> Yes, they are in my for-test queue already. I will test both and make a choice.
>
> Thanks.
>
> Coly Li
>
> [snipped]
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC] Live resize of backing device
  2023-02-20  8:27                     ` mingzhe
  2023-02-20 12:29                       ` Coly Li
@ 2023-02-27 22:08                       ` Eric Wheeler
  2023-02-28  2:46                         ` mingzhe
  1 sibling, 1 reply; 29+ messages in thread
From: Eric Wheeler @ 2023-02-27 22:08 UTC (permalink / raw)
  To: mingzhe
  Cc: Coly Li, Andrea Tomassetti, Eric Wheeler, Kent Overstreet, linux-bcache

[-- Attachment #1: Type: text/plain, Size: 2636 bytes --]

On Mon, 20 Feb 2023, mingzhe wrote:
> On 2023/2/19 17:39, Coly Li wrote:
> >> Subject: [PATCH v2] bcache: Add support for live resize of backing devices
> >>
> >> Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> > 
> > Hi Andrea,
> > 
> > I am fine with this patch and have added it to my test queue now. Do you
> > have an updated version (e.g. with more code refinement or an added commit
> > log)? If so, I can update my local version.
> > 
> > 
> Hi, Coly
> 
> I posted some patchsets about online resize.
> 
> -[PATCH v5 1/3] bcache: add dirty_data in struct bcache_device
> -[PATCH v5 2/3] bcache: allocate stripe memory when partial_stripes_expensive
> is true
> -[PATCH v5 3/3] bcache: support online resizing of cached_dev
> 
> There are some differences:
> 1. Create /sys/block/bcache0/bcache/size in sysfs to trigger resize

Can the final version name the sysfs entry "resize"? "size" sounds 
like you are setting a specific size, not triggering a resize.
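A purely hypothetical usage sketch of such a trigger, assuming Eric's suggested node name `/sys/block/bcache0/bcache/resize` (the merged interface may differ, and the device names are placeholders):

```shell
#!/bin/sh
# Hypothetical: detect a grown backing device and poke the proposed
# sysfs trigger node. "sdb" stands in for whatever device appears
# under /sys/block/bcache0/slaves on a real system.
backing=sdb
holder_sectors=$(cat /sys/class/block/bcache0/size 2>/dev/null || echo 0)
slave_sectors=$(cat /sys/class/block/"$backing"/size 2>/dev/null || echo 0)
if [ "$slave_sectors" -gt "$holder_sectors" ] \
   && [ -w /sys/block/bcache0/bcache/resize ]; then
    # Proposed trigger from this thread; hypothetical until merged.
    echo 1 > /sys/block/bcache0/bcache/resize
fi
```

A trigger named this way stays unambiguous even if a later variant also accepts an explicit size, as mingzhe's patchset proposes.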

--
Eric Wheeler



> 2. Allocate stripe memory only if partial_stripes_expensive is true
> 3. Simplify bcache_dev_sectors_dirty()
> 
> Since the bcache superblock uses some sectors, the actual space of the bcache
> device is smaller than the backing. In order to provide a bcache device with a
> user-specified size, we need to create a backing device with a larger space,
> and then resize bcache. So resize can specify the size is very necessary.
> 
> 
> 
> 
> > 
> > 
> >> ---
> >> Hi Coly,
> >> this is the second version of the patch. As you correctly pointed out,
> >> I implemented roll-back functionality in case of error.
> >> I'm testing this functionality using a QEMU/KVM VM via libvirt.
> >> Here are the steps:
> >>   1. make-bcache --writeback -B /dev/vdb -C /dev/vdc
> >>   2. mkfs.xfs /dev/bcache0
> >>   3. mount /dev/bcache0 /mnt
> >>   4. dd if=/dev/random of=/mnt/random0 bs=1M count=1000
> >>   5. md5sum /mnt/random0 | tee /mnt/random0.md5
> >>   6. [HOST] virsh blockresize <vm-name> --path <disk-path> --size
> >>   <new-size>
> >>   7. xfs_growfs /dev/bcache0
> >>   8. Repeat steps 4 and 5 with a different file name (e.g. random1.md5)
> >>   9. umount/reboot/remount and check that the md5 hashes are correct with
> >>         md5sum -c /mnt/random?.md5
> >>
> >> drivers/md/bcache/super.c | 84 ++++++++++++++++++++++++++++++++++++++-
> >> 1 file changed, 83 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> >> index ba3909bb6bea..1435a3f605f8 100644
> >> --- a/drivers/md/bcache/super.c
> >> +++ b/drivers/md/bcache/super.c
> >>>
> > 
> > [snipped]
> > 
> > 
> 
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC] Live resize of backing device
  2023-02-27 22:08                       ` Eric Wheeler
@ 2023-02-28  2:46                         ` mingzhe
  0 siblings, 0 replies; 29+ messages in thread
From: mingzhe @ 2023-02-28  2:46 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Coly Li, Andrea Tomassetti, Kent Overstreet, linux-bcache



On 2023/2/28 06:08, Eric Wheeler wrote:
> On Mon, 20 Feb 2023, mingzhe wrote:
>> On 2023/2/19 17:39, Coly Li wrote:
>>>> Subject: [PATCH v2] bcache: Add support for live resize of backing devices
>>>>
>>>> Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
>>>
>>> Hi Andrea,
>>>
>>> I am fine with this patch and have added it to my test queue now. Do you
>>> have an updated version (e.g. with more code refinement or an added commit
>>> log)? If so, I can update my local version.
>>>
>>>
>> Hi, Coly
>>
>> I posted some patchsets about online resize.
>>
>> -[PATCH v5 1/3] bcache: add dirty_data in struct bcache_device
>> -[PATCH v5 2/3] bcache: allocate stripe memory when partial_stripes_expensive
>> is true
>> -[PATCH v5 3/3] bcache: support online resizing of cached_dev
>>
>> There are some differences:
>> 1. Create /sys/block/bcache0/bcache/size in sysfs to trigger resize
> 
> Can the final version name the sysfs entry "resize"? "size" sounds
> like you are setting a specific size, not triggering a resize.
> 
> --
> Eric Wheeler
> 
OK. I will post an updated version.

mingzhe
> 
> 
>> 2. Allocate stripe memory only if partial_stripes_expensive is true
>> 3. Simplify bcache_dev_sectors_dirty()
>>
>> Since the bcache superblock uses some sectors, the actual space of the bcache
>> device is smaller than that of the backing device. In order to provide a
>> bcache device with a user-specified size, we need to create a backing device
>> with more space and then resize bcache. So it is essential that the resize
>> operation can specify a size.
>>
>>
>>
>>
>>>
>>>
>>>> ---
>>>> Hi Coly,
>>>> this is the second version of the patch. As you correctly pointed out,
>>>> I implemented roll-back functionality in case of error.
>>>> I'm testing this functionality using a QEMU/KVM VM via libvirt.
>>>> Here are the steps:
>>>>    1. make-bcache --writeback -B /dev/vdb -C /dev/vdc
>>>>    2. mkfs.xfs /dev/bcache0
>>>>    3. mount /dev/bcache0 /mnt
>>>>    4. dd if=/dev/random of=/mnt/random0 bs=1M count=1000
>>>>    5. md5sum /mnt/random0 | tee /mnt/random0.md5
>>>>    6. [HOST] virsh blockresize <vm-name> --path <disk-path> --size
>>>>    <new-size>
>>>>    7. xfs_growfs /dev/bcache0
>>>>    8. Repeat steps 4 and 5 with a different file name (e.g. random1.md5)
>>>>    9. umount/reboot/remount and check that the md5 hashes are correct with
>>>>          md5sum -c /mnt/random?.md5
>>>>
>>>> drivers/md/bcache/super.c | 84 ++++++++++++++++++++++++++++++++++++++-
>>>> 1 file changed, 83 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
>>>> index ba3909bb6bea..1435a3f605f8 100644
>>>> --- a/drivers/md/bcache/super.c
>>>> +++ b/drivers/md/bcache/super.c
>>>>>
>>>
>>> [snipped]
>>>
>>>
>>

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2023-02-28  2:46 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-03 10:05 [RFC] Live resize of backing device Andrea Tomassetti
2022-08-04 14:32 ` Coly Li
2022-08-05 19:38 ` Eric Wheeler
2022-09-06 13:22   ` Andrea Tomassetti
2022-09-08  8:32     ` Andrea Tomassetti
2022-09-19 11:42       ` Andrea Tomassetti
2022-09-19 12:16         ` Coly Li
2022-12-09  8:57           ` Andrea Tomassetti
2022-12-09  9:36             ` Coly Li
2022-12-30 10:40       ` Coly Li
2023-01-11 16:01         ` Andrea Tomassetti
2023-01-17 13:08           ` Error messages with kernel 6.1.[56] Pierre Juhen
2023-01-17 16:08             ` Coly Li
2023-01-17 16:18           ` [RFC] Live resize of backing device Coly Li
2023-01-25 10:07             ` Andrea Tomassetti
2023-01-25 17:59               ` Coly Li
2023-01-27 12:44                 ` Andrea Tomassetti
2023-01-27 22:40                   ` Eric Wheeler
2023-01-31 10:20                     ` Andrea Tomassetti
2023-02-02 17:18                   ` Coly Li
2023-02-02 20:48                     ` Eric Wheeler
2023-02-03  2:41                       ` Coly Li
2023-02-19  9:39                   ` Coly Li
2023-02-20  8:27                     ` mingzhe
2023-02-20 12:29                       ` Coly Li
2023-02-22  8:42                         ` Andrea Tomassetti
2023-02-27 22:08                       ` Eric Wheeler
2023-02-28  2:46                         ` mingzhe
2023-01-27  2:53               ` [RFC] Live resize of bcache " Eric Wheeler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).