linux-bcache.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Coly Li <colyli@suse.de>
To: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
Cc: Eric Wheeler <bcache@lists.ewheeler.net>,
	Kent Overstreet <kent.overstreet@gmail.com>,
	linux-bcache@vger.kernel.org
Subject: Re: [RFC] Live resize of backing device
Date: Fri, 9 Dec 2022 17:36:54 +0800	[thread overview]
Message-ID: <BF11120A-1148-427E-966A-A46F6002FCC4@suse.de> (raw)
In-Reply-To: <CAHykVA4tz_WxmYae9i+TUPLmEsUpJ0rxMV_cs=4st0voM=Oo9w@mail.gmail.com>

Hi Andrea,

Hope I can have time late in this month, and response you.

Thanks.

Coly Li



> 2022年12月9日 16:57,Andrea Tomassetti <andrea.tomassetti-opensource@devo.com> 写道:
> 
> Hi Coly,
> just a kind reminder for this patch.
> 
> Thank you very much,
> Andrea
> 
> On Mon, Sep 19, 2022 at 2:17 PM Coly Li <colyli@suse.de> wrote:
>> 
>> 
>> 
>>> 2022年9月19日 19:42,Andrea Tomassetti <andrea.tomassetti-opensource@devo.com> 写道:
>>> 
>>> Hi Coly,
>>> have you had any time to take a look at this? Do you prefer if I send the patch as a separate thread?
>>> 
>>> Thank you very much,
>>> Andrea
>> 
>> 
>> Yes, it is on my queue, just after I finish my tasks on hand, I will take a look on it.
>> 
>> Thanks.
>> 
>> Coly Li
>> 
>> 
>>> 
>>> On 8/9/22 10:32, Andrea Tomassetti wrote:
>>>> From 59787372cf21af0b79e895578ae05b6586dfeb09 Mon Sep 17 00:00:00 2001
>>>> From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
>>>> Date: Thu, 8 Sep 2022 09:47:55 +0200
>>>> Subject: [PATCH] bcache: Add support for live resize of backing devices
>>>> Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
>>>> ---
>>>> Hi Coly,
>>>> Here is the first version of the patch. There are some points I noted down
>>>> that I would like to discuss with you:
>>>> - I found it pretty convenient to hook the call of the new added function
>>>>   inside the `register_bcache`. In fact, every time (at least from my
>>>>   understandings) a disk changes size, it will trigger a new probe and,
>>>>   thus, `register_bcache` will be triggered. The only inconvenient
>>>>   is that, in case of success, the function will output
>>>>   `error: capacity changed` even if it's not really an error.
>>>> - I'm using `kvrealloc`, introduced in kernel version 5.15, to resize
>>>>   `stripe_sectors_dirty` and `full_dirty_stripes`. It shouldn't be a
>>>>   problem, right?
>>>> - There is some reused code between this new function and
>>>>   `bcache_device_init`. Maybe I can move `const size_t max_stripes` to
>>>>   a broader scope or make it a macro, what do you think?
>>>> Thank you very much,
>>>> Andrea
>>>> drivers/md/bcache/super.c | 75 ++++++++++++++++++++++++++++++++++++++-
>>>> 1 file changed, 74 insertions(+), 1 deletion(-)
>>>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
>>>> index ba3909bb6bea..9a77caf2a18f 100644
>>>> --- a/drivers/md/bcache/super.c
>>>> +++ b/drivers/md/bcache/super.c
>>>> @@ -2443,6 +2443,76 @@ static bool bch_is_open(dev_t dev)
>>>>     return bch_is_open_cache(dev) || bch_is_open_backing(dev);
>>>> }
>>>> +static bool bch_update_capacity(dev_t dev)
>>>> +{
>>>> +    const size_t max_stripes = min_t(size_t, INT_MAX,
>>>> +                     SIZE_MAX / sizeof(atomic_t));
>>>> +
>>>> +    uint64_t n, n_old;
>>>> +    int nr_stripes_old;
>>>> +    bool res = false;
>>>> +
>>>> +    struct bcache_device *d;
>>>> +    struct cache_set *c, *tc;
>>>> +    struct cached_dev *dcp, *t, *dc = NULL;
>>>> +
>>>> +    uint64_t parent_nr_sectors;
>>>> +
>>>> +    list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
>>>> +        list_for_each_entry_safe(dcp, t, &c->cached_devs, list)
>>>> +            if (dcp->bdev->bd_dev == dev) {
>>>> +                dc = dcp;
>>>> +                goto dc_found;
>>>> +            }
>>>> +
>>>> +dc_found:
>>>> +    if (!dc)
>>>> +        return false;
>>>> +
>>>> +    parent_nr_sectors = bdev_nr_sectors(dc->bdev) - dc->sb.data_offset;
>>>> +
>>>> +    if (parent_nr_sectors == bdev_nr_sectors(dc->disk.disk->part0))
>>>> +        return false;
>>>> +
>>>> +    if (!set_capacity_and_notify(dc->disk.disk, parent_nr_sectors))
>>>> +        return false;
>>>> +
>>>> +    d = &dc->disk;
>>>> +
>>>> +    /* Force cached device sectors re-calc */
>>>> +    calc_cached_dev_sectors(d->c);
>>>> +
>>>> +    /* Block writeback thread */
>>>> +    down_write(&dc->writeback_lock);
>>>> +    nr_stripes_old = d->nr_stripes;
>>>> +    n = DIV_ROUND_UP_ULL(parent_nr_sectors, d->stripe_size);
>>>> +    if (!n || n > max_stripes) {
>>>> +        pr_err("nr_stripes too large or invalid: %llu (start sector beyond end of disk?)\n",
>>>> +            n);
>>>> +        goto unblock_and_exit;
>>>> +    }
>>>> +    d->nr_stripes = n;
>>>> +
>>>> +    n = d->nr_stripes * sizeof(atomic_t);
>>>> +    n_old = nr_stripes_old * sizeof(atomic_t);
>>>> +    d->stripe_sectors_dirty = kvrealloc(d->stripe_sectors_dirty, n_old,
>>>> +        n, GFP_KERNEL);
>>>> +    if (!d->stripe_sectors_dirty)
>>>> +        goto unblock_and_exit;
>>>> +
>>>> +    n = BITS_TO_LONGS(d->nr_stripes) * sizeof(unsigned long);
>>>> +    n_old = BITS_TO_LONGS(nr_stripes_old) * sizeof(unsigned long);
>>>> +    d->full_dirty_stripes = kvrealloc(d->full_dirty_stripes, n_old, n, GFP_KERNEL);
>>>> +    if (!d->full_dirty_stripes)
>>>> +        goto unblock_and_exit;
>>>> +
>>>> +    res = true;
>>>> +
>>>> +unblock_and_exit:
>>>> +    up_write(&dc->writeback_lock);
>>>> +    return res;
>>>> +}
>>>> +
>>>> struct async_reg_args {
>>>>     struct delayed_work reg_work;
>>>>     char *path;
>>>> @@ -2569,7 +2639,10 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
>>>>             mutex_lock(&bch_register_lock);
>>>>             if (lookup_bdev(strim(path), &dev) == 0 &&
>>>>                 bch_is_open(dev))
>>>> -                err = "device already registered";
>>>> +                if (bch_update_capacity(dev))
>>>> +                    err = "capacity changed";
>>>> +                else
>>>> +                    err = "device already registered";
>>>>             else
>>>>                 err = "device busy";
>>>>             mutex_unlock(&bch_register_lock);
>>>> --
>>>> 2.37.3
>>>> On 6/9/22 15:22, Andrea Tomassetti wrote:
>>>>> Hi Coly,
>>>>> I have finally some time to work on the patch. I already have a first
>>>>> prototype that seems to work but, before sending it, I would like to
>>>>> ask you two questions:
>>>>>   1. Inspecting the code, I found that the only lock mechanism is the
>>>>> writeback_lock semaphore. Am I correct?
>>>>>   2. How can I effectively test my patch? So far I'm doing something like this:
>>>>>      a. make-bcache --writeback -B /dev/vdb -C /dev/vdc
>>>>>      b. mkfs.xfs /dev/bcache0
>>>>>      c. dd if=/dev/random of=/mnt/random bs=1M count=1000
>>>>>      d. md5sum /mnt/random | tee /mnt/random.md5
>>>>>      e. live resize the disk and repeat c. and d.
>>>>>      f. umount/reboot/remount and check that the md5 hashes are correct
>>>>> 
>>>>> Any suggestions?
>>>>> 
>>>>> Thank you very much,
>>>>> Andrea
>>>>> 
>>>>> On Fri, Aug 5, 2022 at 9:38 PM Eric Wheeler <bcache@lists.ewheeler.net> wrote:
>>>>>> 
>>>>>> On Wed, 3 Aug 2022, Andrea Tomassetti wrote:
>>>>>>> Hi Coly,
>>>>>>> In one of our previous emails you said that
>>>>>>>> Currently bcache doesn’t support cache or backing device resize
>>>>>>> 
>>>>>>> I was investigating this point and I actually found a solution. I
>>>>>>> briefly tested it and it seems to work fine.
>>>>>>> Basically what I'm doing is:
>>>>>>>   1. Check if there's any discrepancy between the nr of sectors
>>>>>>> reported by the bcache backing device (holder) and the nr of sectors
>>>>>>> reported by its parent (slave).
>>>>>>>   2. If the number of sectors of the two devices are not the same,
>>>>>>> then call set_capacity_and_notify on the bcache device.
>>>>>>>   3. From user space, depending on the fs used, grow the fs with some
>>>>>>> utility (e.g. xfs_growfs)
>>>>>>> 
>>>>>>> This works without any need of unmounting the mounted fs nor stopping
>>>>>>> the bcache backing device.
>>>>>> 
>>>>>> Well done! +1, would love to see a patch for this!
>>>>>> 
>>>>>> 
>>>>>>> So my question is: am I missing something? Can this live resize cause
>>>>>>> some problems (e.g. data loss)? Would it be useful if I send a patch on
>>>>>>> this?
>>>>>> 
>>>>>> A while a go we looked into doing this.  Here is the summary of our
>>>>>> findings, not sure if there are any other considerations:
>>>>>> 
>>>>>>   1. Create a sysfs file like /sys/block/bcache0/bcache/resize to trigger
>>>>>>      resize on echo 1 >.
>>>>>>   2. Refactor the set_capacity() bits from  bcache_device_init() into its
>>>>>>      own function.
>>>>>>   3. Put locks around bcache_device.full_dirty_stripes and
>>>>>>      bcache_device.stripe_sectors_dirty.  Re-alloc+copy+free and zero the
>>>>>>      new bytes at the end.  Grep where bcache_device.full_dirty_stripes is
>>>>>>      used and make sure it is locked appropriately, probably in the
>>>>>>      writeback thread, maybe other places too.
>>>>>> 
>>>>>> The cachedev's don't know anything about the bdev size, so (according to
>>>>>> Kent) they will "just work" by referencing new offsets that were
>>>>>> previously beyond the disk. (This is basically the same as resizing the
>>>>>> bdev and then unregister/re-register which is how we resize bdevs now.)
>>>>>> 
>>>>>> As for resizing a cachedev, I've not looked at all---not sure about that
>>>>>> one.  We always detach, resize, make-bcache and re-attach the new cache.
>>>>>> Maybe it is similarly simple, but haven't looked.
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Eric Wheeler
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> Kind regards,
>>>>>>> Andrea
>>>>>>> 
>> 


  reply	other threads:[~2022-12-09  9:37 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-03 10:05 [RFC] Live resize of backing device Andrea Tomassetti
2022-08-04 14:32 ` Coly Li
2022-08-05 19:38 ` Eric Wheeler
2022-09-06 13:22   ` Andrea Tomassetti
2022-09-08  8:32     ` Andrea Tomassetti
2022-09-19 11:42       ` Andrea Tomassetti
2022-09-19 12:16         ` Coly Li
2022-12-09  8:57           ` Andrea Tomassetti
2022-12-09  9:36             ` Coly Li [this message]
2022-12-30 10:40       ` Coly Li
2023-01-11 16:01         ` Andrea Tomassetti
2023-01-17 13:08           ` Error messages with kernel 6.1.[56] Pierre Juhen
2023-01-17 16:08             ` Coly Li
2023-01-17 16:18           ` [RFC] Live resize of backing device Coly Li
2023-01-25 10:07             ` Andrea Tomassetti
2023-01-25 17:59               ` Coly Li
2023-01-27 12:44                 ` Andrea Tomassetti
2023-01-27 22:40                   ` Eric Wheeler
2023-01-31 10:20                     ` Andrea Tomassetti
2023-02-02 17:18                   ` Coly Li
2023-02-02 20:48                     ` Eric Wheeler
2023-02-03  2:41                       ` Coly Li
2023-02-19  9:39                   ` Coly Li
2023-02-20  8:27                     ` mingzhe
2023-02-20 12:29                       ` Coly Li
2023-02-22  8:42                         ` Andrea Tomassetti
2023-02-27 22:08                       ` Eric Wheeler
2023-02-28  2:46                         ` mingzhe
2023-01-27  2:53               ` [RFC] Live resize of bcache " Eric Wheeler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BF11120A-1148-427E-966A-A46F6002FCC4@suse.de \
    --to=colyli@suse.de \
    --cc=andrea.tomassetti-opensource@devo.com \
    --cc=bcache@lists.ewheeler.net \
    --cc=kent.overstreet@gmail.com \
    --cc=linux-bcache@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).