From: Hans van Kranenburg <hans@knorrie.org>
To: Goffredo Baroncelli <kreijack@libero.it>, linux-btrfs@vger.kernel.org
Cc: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>,
Josef Bacik <josef@toxicpanda.com>,
David Sterba <dsterba@suse.cz>,
Sinnamohideen Shafeeq <shafeeqs@panasas.com>,
Goffredo Baroncelli <kreijack@inwind.it>
Subject: Re: [PATCH 4/4] btrfs: add allocator_hint mode
Date: Fri, 17 Dec 2021 16:58:36 +0100 [thread overview]
Message-ID: <0fbfde93-3a00-7f20-8891-dd0fa798676e@knorrie.org> (raw)
In-Reply-To: <bf30502eb53ea2c1c05c2ae96c3788d3e327d59e.1635089352.git.kreijack@inwind.it>
Hi,
On 10/24/21 5:31 PM, Goffredo Baroncelli wrote:
> From: Goffredo Baroncelli <kreijack@inwind.it>
>
> When this mode is enabled, the chunk allocation policy is modified as
> follow.
>
> Each disk may have a different tag:
> - BTRFS_DEV_ALLOCATION_PREFERRED_METADATA
> - BTRFS_DEV_ALLOCATION_METADATA_ONLY
> - BTRFS_DEV_ALLOCATION_DATA_ONLY
> - BTRFS_DEV_ALLOCATION_PREFERRED_DATA (default)
>
> Where:
> - ALLOCATION_PREFERRED_X means that it is preferred to use this disk for
> the X chunk type (the other type may be allowed when the space is low)
> - ALLOCATION_X_ONLY means that it is used *only* for the X chunk type.
> This means also that it is a preferred choice.
>
> Each time the allocator allocates a chunk of type X , first it takes the
> disks tagged as ALLOCATION_X_ONLY or ALLOCATION_PREFERRED_X; if the space
> is not enough, it uses also the disks tagged as ALLOCATION_METADATA_ONLY;
> if the space is not enough, it uses also the other disks, with the
> exception of the one marked as ALLOCATION_PREFERRED_Y, where Y the other
> type of chunk (i.e. not X).
I read this last paragraph for 5 times now, I think, but I still have no
idea what's going on here. Can you rephrase it? It's more complex than
the actual code below.
What the above text tells me is that if I start filling up my disks, it
will happily write DATA to METADATA_ONLY disks when the DATA ones are
full? And if the METADATA_ONLY disks are full with DATA also, it will
not write DATA into PREFERRED_METADATA disks.
I don't think that's what the code actually does. At least, I hope not.
Hans
> Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
> ---
> fs/btrfs/volumes.c | 98 +++++++++++++++++++++++++++++++++++++++++++++-
> fs/btrfs/volumes.h | 1 +
> 2 files changed, 98 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 8ac99771f43c..7ee9c6e7bd44 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -153,6 +153,20 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
> },
> };
>
> +#define BTRFS_DEV_ALLOCATION_MASK ((1ULL << \
> + BTRFS_DEV_ALLOCATION_MASK_BIT_COUNT) - 1)
> +#define BTRFS_DEV_ALLOCATION_MASK_COUNT (1ULL << \
> + BTRFS_DEV_ALLOCATION_MASK_BIT_COUNT)
> +
> +static const char alloc_hint_map[BTRFS_DEV_ALLOCATION_MASK_COUNT] = {
> + [BTRFS_DEV_ALLOCATION_DATA_ONLY] = -1,
> + [BTRFS_DEV_ALLOCATION_PREFERRED_DATA] = 0,
> + [BTRFS_DEV_ALLOCATION_PREFERRED_METADATA] = 1,
> + [BTRFS_DEV_ALLOCATION_METADATA_ONLY] = 2,
> + /* the other values are set to 0 */
> +};
> +
> +
> const char *btrfs_bg_type_to_raid_name(u64 flags)
> {
> const int index = btrfs_bg_flags_to_raid_index(flags);
> @@ -4997,13 +5011,18 @@ static int btrfs_add_system_chunk(struct btrfs_fs_info *fs_info,
> }
>
> /*
> - * sort the devices in descending order by max_avail, total_avail
> + * sort the devices in descending order by alloc_hint,
> + * max_avail, total_avail
> */
> static int btrfs_cmp_device_info(const void *a, const void *b)
> {
> const struct btrfs_device_info *di_a = a;
> const struct btrfs_device_info *di_b = b;
>
> + if (di_a->alloc_hint > di_b->alloc_hint)
> + return -1;
> + if (di_a->alloc_hint < di_b->alloc_hint)
> + return 1;
> if (di_a->max_avail > di_b->max_avail)
> return -1;
> if (di_a->max_avail < di_b->max_avail)
> @@ -5166,6 +5185,8 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
> int ndevs = 0;
> u64 max_avail;
> u64 dev_offset;
> + int hint;
> + int i;
>
> /*
> * in the first pass through the devices list, we gather information
> @@ -5218,16 +5239,91 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
> devices_info[ndevs].max_avail = max_avail;
> devices_info[ndevs].total_avail = total_avail;
> devices_info[ndevs].dev = device;
> +
> + if ((ctl->type & BTRFS_BLOCK_GROUP_DATA) &&
> + (ctl->type & BTRFS_BLOCK_GROUP_METADATA)) {
> + /*
> + * if mixed bg set all the alloc_hint
> + * fields to the same value, so the sorting
> + * is not affected
> + */
> + devices_info[ndevs].alloc_hint = 0;
> + } else if (ctl->type & BTRFS_BLOCK_GROUP_DATA) {
> + hint = device->type & BTRFS_DEV_ALLOCATION_MASK;
> +
> + /*
> + * skip BTRFS_DEV_METADATA_ONLY disks
> + */
> + if (hint == BTRFS_DEV_ALLOCATION_METADATA_ONLY)
> + continue;
> + /*
> + * if a data chunk must be allocated,
> + * sort also by hint (data disk
> + * higher priority)
> + */
> + devices_info[ndevs].alloc_hint = -alloc_hint_map[hint];
> + } else { /* BTRFS_BLOCK_GROUP_METADATA */
> + hint = device->type & BTRFS_DEV_ALLOCATION_MASK;
> +
> + /*
> + * skip BTRFS_DEV_DATA_ONLY disks
> + */
> + if (hint == BTRFS_DEV_ALLOCATION_DATA_ONLY)
> + continue;
> + /*
> + * if a data chunk must be allocated,
> + * sort also by hint (metadata hint
> + * higher priority)
> + */
> + devices_info[ndevs].alloc_hint = alloc_hint_map[hint];
> + }
> +
> ++ndevs;
> }
> ctl->ndevs = ndevs;
>
> + /*
> + * no devices available
> + */
> + if (!ndevs)
> + return 0;
> +
> /*
> * now sort the devices by hole size / available space
> */
> sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
> btrfs_cmp_device_info, NULL);
>
> + /*
> + * select the minimum set of disks grouped by hint that
> + * can host the chunk
> + */
> + ndevs = 0;
> + while (ndevs < ctl->ndevs) {
> + hint = devices_info[ndevs++].alloc_hint;
> + while (ndevs < ctl->ndevs &&
> + devices_info[ndevs].alloc_hint == hint)
> + ndevs++;
> + if (ndevs >= ctl->devs_min)
> + break;
> + }
> +
> + BUG_ON(ndevs > ctl->ndevs);
> + ctl->ndevs = ndevs;
> +
> + /*
> + * the next layers require the devices_info ordered by
> + * max_avail. If we are returing two (or more) different
> + * group of alloc_hint, this is not always true. So sort
> + * these gain.
> + */
> +
> + for (i = 0 ; i < ndevs ; i++)
> + devices_info[i].alloc_hint = 0;
> +
> + sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
> + btrfs_cmp_device_info, NULL);
> +
> return 0;
> }
>
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index b8250f29df6e..37eb37b533c5 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -369,6 +369,7 @@ struct btrfs_device_info {
> u64 dev_offset;
> u64 max_avail;
> u64 total_avail;
> + int alloc_hint;
> };
>
> struct btrfs_raid_attr {
>
next prev parent reply other threads:[~2021-12-17 16:05 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-24 15:31 [RFC][V8][PATCH 0/5] btrfs: allocation_hint mode Goffredo Baroncelli
2021-10-24 15:31 ` [PATCH 1/4] btrfs: add flags to give an hint to the chunk allocator Goffredo Baroncelli
2021-10-24 15:31 ` [PATCH 2/4] btrfs: export dev_item.type in /sys/fs/btrfs/<uuid>/devinfo/<devid>/type Goffredo Baroncelli
2021-10-24 15:31 ` [PATCH 3/4] btrfs: change the DEV_ITEM 'type' field via sysfs Goffredo Baroncelli
2021-10-24 15:31 ` [PATCH 4/4] btrfs: add allocator_hint mode Goffredo Baroncelli
2021-12-17 15:58 ` Hans van Kranenburg [this message]
2021-12-17 18:28 ` Goffredo Baroncelli
2021-12-17 19:41 ` Zygo Blaxell
2021-12-18 9:07 ` Goffredo Baroncelli
2021-12-18 22:48 ` Zygo Blaxell
2021-12-19 0:03 ` Graham Cobb
2021-12-19 2:30 ` Zygo Blaxell
2021-12-13 9:39 ` [RFC][V8][PATCH 0/5] btrfs: allocation_hint mode Paul Jones
2021-12-13 19:54 ` Goffredo Baroncelli
2021-12-13 21:15 ` Josef Bacik
2021-12-13 22:49 ` Zygo Blaxell
2021-12-14 14:31 ` Josef Bacik
2021-12-14 19:03 ` Goffredo Baroncelli
2021-12-14 20:04 ` Zygo Blaxell
2021-12-14 20:34 ` Josef Bacik
2021-12-14 20:41 ` Goffredo Baroncelli
2021-12-15 13:58 ` Josef Bacik
2021-12-15 18:53 ` Goffredo Baroncelli
2021-12-16 0:56 ` Josef Bacik
2021-12-17 5:40 ` Zygo Blaxell
2021-12-17 14:48 ` Josef Bacik
2021-12-17 16:31 ` Zygo Blaxell
2021-12-17 18:08 ` Goffredo Baroncelli
2021-12-16 2:30 ` Paul Jones
2021-12-14 1:03 ` Sinnamohideen, Shafeeq
2021-12-14 18:53 ` Goffredo Baroncelli
2021-12-14 20:35 ` Josef Bacik
[not found] <cover.1614028083.git.kreijack@inwind.it>
2021-02-22 21:19 ` [PATCH 4/4] btrfs: add allocator_hint mode Goffredo Baroncelli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0fbfde93-3a00-7f20-8891-dd0fa798676e@knorrie.org \
--to=hans@knorrie.org \
--cc=ce3g8jdj@umail.furryterror.org \
--cc=dsterba@suse.cz \
--cc=josef@toxicpanda.com \
--cc=kreijack@inwind.it \
--cc=kreijack@libero.it \
--cc=linux-btrfs@vger.kernel.org \
--cc=shafeeqs@panasas.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).