All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anand Jain <anand.jain@oracle.com>
To: kreijack@inwind.it, linux-btrfs@vger.kernel.org
Cc: josef@toxicpanda.com, Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Subject: Re: [RFC][PATCH] btrfs: sysfs for chunk layout hint
Date: Fri, 19 Feb 2021 11:06:20 +0800	[thread overview]
Message-ID: <793269fa-32f2-5631-8557-f1961bcf06df@oracle.com> (raw)
In-Reply-To: <dade948e-7e66-4081-6ea1-f84a4dd6a11a@libero.it>

On 19/02/2021 03:11, Goffredo Baroncelli wrote:
> On 2/18/21 6:20 PM, Anand Jain wrote:
>> btrfs_chunk_alloc() uses dev_alloc_list to allocate new chunks. The
>> function's stack leading to btrfs_cmp_device_info() sorts the
>> dev_alloc_list in the descending order of unallocated space. This
>> sorting helps to maximize the filesystem space.
>>
>> But, there might be other types of preferences when allocating the
>> chunks. For example, allocation by device latency, with which the
>> metadata could go to the device with the least latency.
>>
>> This patch is a preparatory patch and makes the existing allocation
>> layout a configurable parameter using sysfs, as shown below.
>>
>> cd /sys/fs/btrfs/863c787e-fdbd-49ca-a0ea-22f36934ff1f
>> cat chunk_layout_data
>> [size]
>> cat chunk_layout_metadata
>> [size]
>>
>> We could add more chunk allocation types by adding to the list in
>> enum btrfs_chunk_layout{ }.
> 
> 
> Hi Anand,
> 
> I like the idea. My patches set (allocation hint), provides a similar 
> functionality.
> However I used a "static" priority (each disks has an user-defined tag 
> which give a priority for the allocation).
> 

Hi Goffredo,

This patch is a framework to add more types of allocation hints as
needed, and does not introduce any new allocation hints. The current
default allocation hint is called 'size' and retained here.

Not to confuse.  Add more allocations hints on top of this framework.
The latency above was just an example.

The allocation hint I mainly wanted was sequential (by devid). And this
framework helped to add it. As I mentioned below, I needed sequential
allocation hint for read_policy testing. If it helps, I can send that
patch too. But I don't know if there can be any practical use of it.

Thanks, Anand


> In any case the logic is always the same: until now 
> btrfs_cmp_device_info() sorts the
> devices on the basis of two criterion:
> - the max_avail
> - the total_vail
 >> I added another criterion, that I called "alloc_hint" that decides the
> sorting.
> 
> I go even further, because Zygo asked me to add a flag which may exclude 
> some disks.[*]
> 
> If we don't want to consider the idea to combine different criteria 
> (like order by
> speed AND latency AND space AND ... ) which increase the management 
> complexity, the
> main differences is that I used an "arbitrary user defined criterion", 
> instead you may
> suggest to use some specific performance index of the disks (like speed, 
> latency...).
> 

  Yes. Idea looks good to me.

> But in the end, because the latency or the speed are a "static" 
> attribute (they should not change
> during the life of the disks, or these change quite slowly) the results 
> don't change.

> So my concern is that my approach (which basically stores in the disk 
> the priority) and your one (allow
> to change the priority criterion) are quite overlapped but not 
> completely and difficult
> to combine.

  Umm. Yes that's the drawback of the dynamic latency determination,
  static latency as in your patches should be ok and straight forward.

Thanks, Anand


> 
> BR
> G.Baroncelli
> 
> 
> [*] Zygo has an user case for this, however I fear the complexity that 
> it adds from
> a "free space report" point of view...
> 
>>
>> This is only a preparatory patch. The parameter is only an in-memory
>> as of now. A persistent disk structure can be added on top of this
>> when we have a consensus.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>> This + sequential chunk layout hint (experimental) (patch not yet sent)
>> helped me get consistent performance numbers for read_policy pid.
>> As chunk layout hint is not set at mkfs, a balance after setting the
>> desired chunk layout hint is needed.
>>
>>   fs/btrfs/ctree.h   |  3 ++
>>   fs/btrfs/disk-io.c |  3 ++
>>   fs/btrfs/sysfs.c   | 98 ++++++++++++++++++++++++++++++++++++++++++++++
>>   fs/btrfs/volumes.c |  4 +-
>>   fs/btrfs/volumes.h | 10 +++++
>>   5 files changed, 117 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 3bc00aed13b2..c37bd2d7f5d4 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -993,6 +993,9 @@ struct btrfs_fs_info {
>>       spinlock_t eb_leak_lock;
>>       struct list_head allocated_ebs;
>>   #endif
>> +
>> +    int chunk_layout_data;
>> +    int chunk_layout_metadata;
>>   };
>>   static inline struct btrfs_fs_info *btrfs_sb(struct super_block *sb)
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index c2576c5fe62e..c81f95339a35 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -2890,6 +2890,9 @@ void btrfs_init_fs_info(struct btrfs_fs_info 
>> *fs_info)
>>       fs_info->swapfile_pins = RB_ROOT;
>>       fs_info->send_in_progress = 0;
>> +
>> +    fs_info->chunk_layout_data = BTRFS_CHUNK_LAYOUT_SIZE;
>> +    fs_info->chunk_layout_metadata = BTRFS_CHUNK_LAYOUT_SIZE;
>>   }
>>   static int init_mount_fs_info(struct btrfs_fs_info *fs_info, struct 
>> super_block *sb)
>> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
>> index 30e1cfcaa925..788784b1ed44 100644
>> --- a/fs/btrfs/sysfs.c
>> +++ b/fs/btrfs/sysfs.c
>> @@ -967,6 +967,102 @@ static ssize_t btrfs_read_policy_store(struct 
>> kobject *kobj,
>>   }
>>   BTRFS_ATTR_RW(, read_policy, btrfs_read_policy_show, 
>> btrfs_read_policy_store);
>> +static const char * const btrfs_chunk_layout_name[] = { "size" };
>> +
>> +static ssize_t btrfs_chunk_layout_data_show(struct kobject *kobj,
>> +                        struct kobj_attribute *a, char *buf)
>> +{
>> +    struct btrfs_fs_info *fs_info = to_fs_info(kobj);
>> +    ssize_t ret = 0;
>> +    int i;
>> +
>> +    for (i = 0; i < BTRFS_NR_CHUNK_LAYOUT; i++) {
>> +        if (fs_info->chunk_layout_data == i)
>> +            ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%s[%s]",
>> +                     (ret == 0 ? "" : " "),
>> +                     btrfs_chunk_layout_name[i]);
>> +        else
>> +            ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%s%s",
>> +                     (ret == 0 ? "" : " "),
>> +                     btrfs_chunk_layout_name[i]);
>> +    }
>> +
>> +    ret += scnprintf(buf + ret, PAGE_SIZE - ret, "\n");
>> +
>> +    return ret;
>> +}
>> +
>> +static ssize_t btrfs_chunk_layout_data_store(struct kobject *kobj,
>> +                         struct kobj_attribute *a,
>> +                         const char *buf, size_t len)
>> +{
>> +    struct btrfs_fs_info *fs_info = to_fs_info(kobj);
>> +    int i;
>> +
>> +    for (i = 0; i < BTRFS_NR_CHUNK_LAYOUT; i++) {
>> +        if (strmatch(buf, btrfs_chunk_layout_name[i])) {
>> +            if (i != fs_info->chunk_layout_data) {
>> +                fs_info->chunk_layout_data = i;
>> +                btrfs_info(fs_info, "chunk_layout_data set to '%s'",
>> +                       btrfs_chunk_layout_name[i]);
>> +            }
>> +            return len;
>> +        }
>> +    }
>> +
>> +    return -EINVAL;
>> +}
>> +BTRFS_ATTR_RW(, chunk_layout_data, btrfs_chunk_layout_data_show,
>> +          btrfs_chunk_layout_data_store);
>> +
>> +static ssize_t btrfs_chunk_layout_metadata_show(struct kobject *kobj,
>> +                        struct kobj_attribute *a,
>> +                        char *buf)
>> +{
>> +    struct btrfs_fs_info *fs_info = to_fs_info(kobj);
>> +    ssize_t ret = 0;
>> +    int i;
>> +
>> +    for (i = 0; i < BTRFS_NR_CHUNK_LAYOUT; i++) {
>> +        if (fs_info->chunk_layout_metadata == i)
>> +            ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%s[%s]",
>> +                     (ret == 0 ? "" : " "),
>> +                     btrfs_chunk_layout_name[i]);
>> +        else
>> +            ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%s%s",
>> +                     (ret == 0 ? "" : " "),
>> +                     btrfs_chunk_layout_name[i]);
>> +    }
>> +
>> +    ret += scnprintf(buf + ret, PAGE_SIZE - ret, "\n");
>> +
>> +    return ret;
>> +}
>> +
>> +static ssize_t btrfs_chunk_layout_metadata_store(struct kobject *kobj,
>> +                         struct kobj_attribute *a,
>> +                         const char *buf, size_t len)
>> +{
>> +    struct btrfs_fs_info *fs_info = to_fs_info(kobj);
>> +    int i;
>> +
>> +    for (i = 0; i < BTRFS_NR_CHUNK_LAYOUT; i++) {
>> +        if (strmatch(buf, btrfs_chunk_layout_name[i])) {
>> +            if (i != fs_info->chunk_layout_metadata) {
>> +                fs_info->chunk_layout_metadata = i;
>> +                btrfs_info(fs_info,
>> +                       "chunk_layout_metadata set to '%s'",
>> +                       btrfs_chunk_layout_name[i]);
>> +            }
>> +            return len;
>> +        }
>> +    }
>> +
>> +    return -EINVAL;
>> +}
>> +BTRFS_ATTR_RW(, chunk_layout_metadata, btrfs_chunk_layout_metadata_show,
>> +          btrfs_chunk_layout_metadata_store);
>> +
>>   static const struct attribute *btrfs_attrs[] = {
>>       BTRFS_ATTR_PTR(, label),
>>       BTRFS_ATTR_PTR(, nodesize),
>> @@ -978,6 +1074,8 @@ static const struct attribute *btrfs_attrs[] = {
>>       BTRFS_ATTR_PTR(, exclusive_operation),
>>       BTRFS_ATTR_PTR(, generation),
>>       BTRFS_ATTR_PTR(, read_policy),
>> +    BTRFS_ATTR_PTR(, chunk_layout_data),
>> +    BTRFS_ATTR_PTR(, chunk_layout_metadata),
>>       NULL,
>>   };
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index d1ba160ef73b..2223c4263d4a 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -5097,7 +5097,9 @@ static int gather_device_info(struct 
>> btrfs_fs_devices *fs_devices,
>>       ctl->ndevs = ndevs;
>>       /*
>> -     * now sort the devices by hole size / available space
>> +     * Now sort the devices by hole size / available space.
>> +     * This sort helps to pick device(s) with larger space.
>> +     * That is BTRFS_CHUNK_LAYOUT_SIZE.
>>        */
>>       sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
>>            btrfs_cmp_device_info, NULL);
>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>> index d0a90dc7fc03..b514d09f4ba8 100644
>> --- a/fs/btrfs/volumes.h
>> +++ b/fs/btrfs/volumes.h
>> @@ -218,6 +218,16 @@ enum btrfs_chunk_allocation_policy {
>>       BTRFS_CHUNK_ALLOC_ZONED,
>>   };
>> +/*
>> + * If we have more than the required number of the devices for striping,
>> + * chunk_layout let us know which device to use.
>> + */
>> +enum btrfs_chunk_layout {
>> +    /* Use in the order of the size of the unallocated space on the 
>> device */
>> +    BTRFS_CHUNK_LAYOUT_SIZE,
>> +    BTRFS_NR_CHUNK_LAYOUT,
>> +};
>> +
>>   /*
>>    * Read policies for mirrored block group profiles, read picks the 
>> stripe based
>>    * on these policies.
>>
> 
> 


      reply	other threads:[~2021-02-19  3:08 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-18 17:20 [RFC][PATCH] btrfs: sysfs for chunk layout hint Anand Jain
2021-02-18 19:11 ` Goffredo Baroncelli
2021-02-19  3:06   ` Anand Jain [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=793269fa-32f2-5631-8557-f1961bcf06df@oracle.com \
    --to=anand.jain@oracle.com \
    --cc=ce3g8jdj@umail.furryterror.org \
    --cc=josef@toxicpanda.com \
    --cc=kreijack@inwind.it \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.