From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
To: Coly Li <colyli@suse.de>
Cc: Eric Wheeler <bcache@lists.ewheeler.net>,
Kent Overstreet <kent.overstreet@gmail.com>,
linux-bcache@vger.kernel.org
Subject: Re: [RFC] Live resize of backing device
Date: Mon, 19 Sep 2022 13:42:15 +0200 [thread overview]
Message-ID: <14c2bdbd-e4ae-a5d1-3947-6ea6dc29f0bc@devo.com> (raw)
In-Reply-To: <4ddb082f-cefc-644e-2ccf-56d41207ecd3@devo.com>
Hi Coly,
have you had any time to take a look at this? Do you prefer if I send
the patch as a separate thread?
Thank you very much,
Andrea
On 8/9/22 10:32, Andrea Tomassetti wrote:
> From 59787372cf21af0b79e895578ae05b6586dfeb09 Mon Sep 17 00:00:00 2001
> From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> Date: Thu, 8 Sep 2022 09:47:55 +0200
> Subject: [PATCH] bcache: Add support for live resize of backing devices
>
> Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
> ---
> Hi Coly,
> Here is the first version of the patch. There are some points I noted down
> that I would like to discuss with you:
> - I found it pretty convenient to hook the call of the new added function
> inside the `register_bcache`. In fact, every time (at least from my
> understandings) a disk changes size, it will trigger a new probe and,
> thus, `register_bcache` will be triggered. The only inconvenient
> is that, in case of success, the function will output
> `error: capacity changed` even if it's not really an error.
> - I'm using `kvrealloc`, introduced in kernel version 5.15, to resize
> `stripe_sectors_dirty` and `full_dirty_stripes`. It shouldn't be a
> problem, right?
> - There is some reused code between this new function and
> `bcache_device_init`. Maybe I can move `const size_t max_stripes` to
> a broader scope or make it a macro, what do you think?
>
> Thank you very much,
> Andrea
>
> drivers/md/bcache/super.c | 75 ++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 74 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index ba3909bb6bea..9a77caf2a18f 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -2443,6 +2443,76 @@ static bool bch_is_open(dev_t dev)
> return bch_is_open_cache(dev) || bch_is_open_backing(dev);
> }
>
> +static bool bch_update_capacity(dev_t dev)
> +{
> + const size_t max_stripes = min_t(size_t, INT_MAX,
> + SIZE_MAX / sizeof(atomic_t));
> +
> + uint64_t n, n_old;
> + int nr_stripes_old;
> + bool res = false;
> +
> + struct bcache_device *d;
> + struct cache_set *c, *tc;
> + struct cached_dev *dcp, *t, *dc = NULL;
> +
> + uint64_t parent_nr_sectors;
> +
> + list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
> + list_for_each_entry_safe(dcp, t, &c->cached_devs, list)
> + if (dcp->bdev->bd_dev == dev) {
> + dc = dcp;
> + goto dc_found;
> + }
> +
> +dc_found:
> + if (!dc)
> + return false;
> +
> + parent_nr_sectors = bdev_nr_sectors(dc->bdev) - dc->sb.data_offset;
> +
> + if (parent_nr_sectors == bdev_nr_sectors(dc->disk.disk->part0))
> + return false;
> +
> + if (!set_capacity_and_notify(dc->disk.disk, parent_nr_sectors))
> + return false;
> +
> + d = &dc->disk;
> +
> + /* Force cached device sectors re-calc */
> + calc_cached_dev_sectors(d->c);
> +
> + /* Block writeback thread */
> + down_write(&dc->writeback_lock);
> + nr_stripes_old = d->nr_stripes;
> + n = DIV_ROUND_UP_ULL(parent_nr_sectors, d->stripe_size);
> + if (!n || n > max_stripes) {
> + pr_err("nr_stripes too large or invalid: %llu (start sector
> beyond end of disk?)\n",
> + n);
> + goto unblock_and_exit;
> + }
> + d->nr_stripes = n;
> +
> + n = d->nr_stripes * sizeof(atomic_t);
> + n_old = nr_stripes_old * sizeof(atomic_t);
> + d->stripe_sectors_dirty = kvrealloc(d->stripe_sectors_dirty, n_old,
> + n, GFP_KERNEL);
> + if (!d->stripe_sectors_dirty)
> + goto unblock_and_exit;
> +
> + n = BITS_TO_LONGS(d->nr_stripes) * sizeof(unsigned long);
> + n_old = BITS_TO_LONGS(nr_stripes_old) * sizeof(unsigned long);
> + d->full_dirty_stripes = kvrealloc(d->full_dirty_stripes, n_old, n,
> GFP_KERNEL);
> + if (!d->full_dirty_stripes)
> + goto unblock_and_exit;
> +
> + res = true;
> +
> +unblock_and_exit:
> + up_write(&dc->writeback_lock);
> + return res;
> +}
> +
> struct async_reg_args {
> struct delayed_work reg_work;
> char *path;
> @@ -2569,7 +2639,10 @@ static ssize_t register_bcache(struct kobject *k,
> struct kobj_attribute *attr,
> mutex_lock(&bch_register_lock);
> if (lookup_bdev(strim(path), &dev) == 0 &&
> bch_is_open(dev))
> - err = "device already registered";
> + if (bch_update_capacity(dev))
> + err = "capacity changed";
> + else
> + err = "device already registered";
> else
> err = "device busy";
> mutex_unlock(&bch_register_lock);
> --
> 2.37.3
>
>
>
> On 6/9/22 15:22, Andrea Tomassetti wrote:
>> Hi Coly,
>> I have finally some time to work on the patch. I already have a first
>> prototype that seems to work but, before sending it, I would like to
>> ask you two questions:
>> 1. Inspecting the code, I found that the only lock mechanism is the
>> writeback_lock semaphore. Am I correct?
>> 2. How can I effectively test my patch? So far I'm doing something
>> like this:
>> a. make-bcache --writeback -B /dev/vdb -C /dev/vdc
>> b. mkfs.xfs /dev/bcache0
>> c. dd if=/dev/random of=/mnt/random bs=1M count=1000
>> d. md5sum /mnt/random | tee /mnt/random.md5
>> e. live resize the disk and repeat c. and d.
>> f. umount/reboot/remount and check that the md5 hashes are correct
>>
>> Any suggestions?
>>
>> Thank you very much,
>> Andrea
>>
>> On Fri, Aug 5, 2022 at 9:38 PM Eric Wheeler
>> <bcache@lists.ewheeler.net> wrote:
>>>
>>> On Wed, 3 Aug 2022, Andrea Tomassetti wrote:
>>>> Hi Coly,
>>>> In one of our previous emails you said that
>>>>> Currently bcache doesn’t support cache or backing device resize
>>>>
>>>> I was investigating this point and I actually found a solution. I
>>>> briefly tested it and it seems to work fine.
>>>> Basically what I'm doing is:
>>>> 1. Check if there's any discrepancy between the nr of sectors
>>>> reported by the bcache backing device (holder) and the nr of sectors
>>>> reported by its parent (slave).
>>>> 2. If the number of sectors of the two devices are not the same,
>>>> then call set_capacity_and_notify on the bcache device.
>>>> 3. From user space, depending on the fs used, grow the fs with some
>>>> utility (e.g. xfs_growfs)
>>>>
>>>> This works without any need of unmounting the mounted fs nor stopping
>>>> the bcache backing device.
>>>
>>> Well done! +1, would love to see a patch for this!
>>>
>>>
>>>> So my question is: am I missing something? Can this live resize cause
>>>> some problems (e.g. data loss)? Would it be useful if I send a patch on
>>>> this?
>>>
>>> A while a go we looked into doing this. Here is the summary of our
>>> findings, not sure if there are any other considerations:
>>>
>>> 1. Create a sysfs file like /sys/block/bcache0/bcache/resize to
>>> trigger
>>> resize on echo 1 >.
>>> 2. Refactor the set_capacity() bits from bcache_device_init()
>>> into its
>>> own function.
>>> 3. Put locks around bcache_device.full_dirty_stripes and
>>> bcache_device.stripe_sectors_dirty. Re-alloc+copy+free and
>>> zero the
>>> new bytes at the end. Grep where
>>> bcache_device.full_dirty_stripes is
>>> used and make sure it is locked appropriately, probably in the
>>> writeback thread, maybe other places too.
>>>
>>> The cachedev's don't know anything about the bdev size, so (according to
>>> Kent) they will "just work" by referencing new offsets that were
>>> previously beyond the disk. (This is basically the same as resizing the
>>> bdev and then unregister/re-register which is how we resize bdevs now.)
>>>
>>> As for resizing a cachedev, I've not looked at all---not sure about that
>>> one. We always detach, resize, make-bcache and re-attach the new cache.
>>> Maybe it is similarly simple, but haven't looked.
>>>
>>>
>>> --
>>> Eric Wheeler
>>>
>>>
>>>
>>>>
>>>> Kind regards,
>>>> Andrea
>>>>
next prev parent reply other threads:[~2022-09-19 11:42 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-03 10:05 [RFC] Live resize of backing device Andrea Tomassetti
2022-08-04 14:32 ` Coly Li
2022-08-05 19:38 ` Eric Wheeler
2022-09-06 13:22 ` Andrea Tomassetti
2022-09-08 8:32 ` Andrea Tomassetti
2022-09-19 11:42 ` Andrea Tomassetti [this message]
2022-09-19 12:16 ` Coly Li
2022-12-09 8:57 ` Andrea Tomassetti
2022-12-09 9:36 ` Coly Li
2022-12-30 10:40 ` Coly Li
2023-01-11 16:01 ` Andrea Tomassetti
2023-01-17 13:08 ` Error messages with kernel 6.1.[56] Pierre Juhen
2023-01-17 16:08 ` Coly Li
2023-01-17 16:18 ` [RFC] Live resize of backing device Coly Li
2023-01-25 10:07 ` Andrea Tomassetti
2023-01-25 17:59 ` Coly Li
2023-01-27 12:44 ` Andrea Tomassetti
2023-01-27 22:40 ` Eric Wheeler
2023-01-31 10:20 ` Andrea Tomassetti
2023-02-02 17:18 ` Coly Li
2023-02-02 20:48 ` Eric Wheeler
2023-02-03 2:41 ` Coly Li
2023-02-19 9:39 ` Coly Li
2023-02-20 8:27 ` mingzhe
2023-02-20 12:29 ` Coly Li
2023-02-22 8:42 ` Andrea Tomassetti
2023-02-27 22:08 ` Eric Wheeler
2023-02-28 2:46 ` mingzhe
2023-01-27 2:53 ` [RFC] Live resize of bcache " Eric Wheeler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=14c2bdbd-e4ae-a5d1-3947-6ea6dc29f0bc@devo.com \
--to=andrea.tomassetti-opensource@devo.com \
--cc=bcache@lists.ewheeler.net \
--cc=colyli@suse.de \
--cc=kent.overstreet@gmail.com \
--cc=linux-bcache@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).