From: Goffredo Baroncelli <kreijack@libero.it>
To: Hans van Kranenburg <hans@knorrie.org>, linux-btrfs@vger.kernel.org
Cc: Michael <mclaud@roznica.com.ua>, Hugo Mills <hugo@carfax.org.uk>,
Martin Svec <martin.svec@zoner.cz>,
Wang Yugui <wangyugui@e16-tech.com>,
Paul Jones <paul@pauljones.id.au>,
Adam Borowski <kilobyte@angband.pl>,
Zygo Blaxell <ce3g8jdj@umail.furryterror.org>,
Goffredo Baroncelli <kreijack@inwind.it>
Subject: Re: [PATCH 4/4] btrfs: add preferred_metadata mode
Date: Fri, 29 May 2020 18:26:38 +0200
Message-ID: <83e19781-1733-47bb-dc07-876ca82e94c1@libero.it>
In-Reply-To: <61d2188a-290c-5d6f-ec32-6cacd3f63ce8@knorrie.org>
On 5/29/20 12:02 AM, Hans van Kranenburg wrote:
> Hi,
>
> On 5/28/20 8:34 PM, Goffredo Baroncelli wrote:
>> From: Goffredo Baroncelli <kreijack@inwind.it>
>>
>> When this mode is enabled,
>
> The commit message does not mention if this is either only a convenience
> during development and testing of the feature to be able to quickly turn
> it on/off, or if you intend to have this into the final change set.
Good question. IMHO, for the initial development phase it is useful to have
preferred_metadata as an opt-in. Later we could reverse the logic and
enable preferred_metadata by default. Of course, we would then need a
no-preferred_metadata flag to opt out.
>
>> the allocation policy of the chunk
>> is so modified:
>> - allocation of metadata chunk: priority is given to preferred_metadata
>> disks.
>> - allocation of data chunk: priority is given to a non preferred_metadata
>> disk.
>>
>> When a striped profile is involved (like RAID0,5,6), the logic
>> is a bit more complex. If there are enough disks, the data profiles
>> are stored on the non preferred_metadata disks; instead the metadata
>> profiles are stored on the preferred_metadata disk.
>> If the disks are not enough, then the profile is allocated on all
>> the disks.
>>
>> Example: assuming that sda, sdb, sdc are ssd disks, and sde, sdf are
>> non preferred_metadata ones.
>> A data profile raid6, will be stored on sda, sdb, sdc, sde, sdf (sde
>> and sdf are not enough to host a raid6 profile).
>> A metadata profile raid6, will be stored on sda, sdb, sdc (these
>> are enough to host a raid6 profile).
>>
>> To enable this mode pass -o dedicated_metadata at mount time.
>
> Is it dedicated_metadata or preferred_metadata?
It was a copy&paste error. It should be preferred_metadata.
>
>> Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
>> ---
>> fs/btrfs/ctree.h | 1 +
>> fs/btrfs/super.c | 8 +++++
>> fs/btrfs/volumes.c | 89 ++++++++++++++++++++++++++++++++++++++++++++--
>> fs/btrfs/volumes.h | 1 +
>> 4 files changed, 97 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 03ea7370aea7..779760fd27b1 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -1239,6 +1239,7 @@ static inline u32 BTRFS_MAX_XATTR_SIZE(const struct btrfs_fs_info *info)
>> #define BTRFS_MOUNT_NOLOGREPLAY (1 << 27)
>> #define BTRFS_MOUNT_REF_VERIFY (1 << 28)
>> #define BTRFS_MOUNT_DISCARD_ASYNC (1 << 29)
>> +#define BTRFS_MOUNT_PREFERRED_METADATA (1 << 30)
>>
>> #define BTRFS_DEFAULT_COMMIT_INTERVAL (30)
>> #define BTRFS_DEFAULT_MAX_INLINE (2048)
>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> index 438ecba26557..80700dc9dcf8 100644
>> --- a/fs/btrfs/super.c
>> +++ b/fs/btrfs/super.c
>> @@ -359,6 +359,7 @@ enum {
>> #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>> Opt_ref_verify,
>> #endif
>> + Opt_preferred_metadata,
>> Opt_err,
>> };
>>
>> @@ -430,6 +431,7 @@ static const match_table_t tokens = {
>> #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>> {Opt_ref_verify, "ref_verify"},
>> #endif
>> + {Opt_preferred_metadata, "preferred_metadata"},
>> {Opt_err, NULL},
>> };
>>
>> @@ -881,6 +883,10 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
>> btrfs_set_opt(info->mount_opt, REF_VERIFY);
>> break;
>> #endif
>> + case Opt_preferred_metadata:
>> + btrfs_set_and_info(info, PREFERRED_METADATA,
>> + "enabling preferred_metadata");
>> + break;
>> case Opt_err:
>> btrfs_err(info, "unrecognized mount option '%s'", p);
>> ret = -EINVAL;
>> @@ -1403,6 +1409,8 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
>> #endif
>> if (btrfs_test_opt(info, REF_VERIFY))
>> seq_puts(seq, ",ref_verify");
>> + if (btrfs_test_opt(info, PREFERRED_METADATA))
>> + seq_puts(seq, ",preferred_metadata");
>> seq_printf(seq, ",subvolid=%llu",
>> BTRFS_I(d_inode(dentry))->root->root_key.objectid);
>> seq_puts(seq, ",subvol=");
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 5265f54c2931..c68efb15e473 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -4770,6 +4770,56 @@ static int btrfs_cmp_device_info(const void *a, const void *b)
>> return 0;
>> }
>>
>> +/*
>> + * sort the devices in descending order by preferred_metadata,
>> + * max_avail, total_avail
>> + */
>> +static int btrfs_cmp_device_info_metadata(const void *a, const void *b)
>> +{
>> + const struct btrfs_device_info *di_a = a;
>> + const struct btrfs_device_info *di_b = b;
>> +
>> + /* metadata -> preferred_metadata first */
>> + if (di_a->preferred_metadata && !di_b->preferred_metadata)
>> + return -1;
>> + if (!di_a->preferred_metadata && di_b->preferred_metadata)
>> + return 1;
>> + if (di_a->max_avail > di_b->max_avail)
>> + return -1;
>> + if (di_a->max_avail < di_b->max_avail)
>> + return 1;
>> + if (di_a->total_avail > di_b->total_avail)
>> + return -1;
>> + if (di_a->total_avail < di_b->total_avail)
>> + return 1;
>> + return 0;
>> +}
>> +
>> +/*
>> + * sort the devices in descending order by !preferred_metadata,
>> + * max_avail, total_avail
>> + */
>> +static int btrfs_cmp_device_info_data(const void *a, const void *b)
>> +{
>> + const struct btrfs_device_info *di_a = a;
>> + const struct btrfs_device_info *di_b = b;
>> +
>> + /* data -> preferred_metadata last */
>> + if (di_a->preferred_metadata && !di_b->preferred_metadata)
>> + return 1;
>> + if (!di_a->preferred_metadata && di_b->preferred_metadata)
>> + return -1;
>> + if (di_a->max_avail > di_b->max_avail)
>> + return -1;
>> + if (di_a->max_avail < di_b->max_avail)
>> + return 1;
>> + if (di_a->total_avail > di_b->total_avail)
>> + return -1;
>> + if (di_a->total_avail < di_b->total_avail)
>> + return 1;
>> + return 0;
>> +}
>> +
>> static void check_raid56_incompat_flag(struct btrfs_fs_info *info, u64 type)
>> {
>> if (!(type & BTRFS_BLOCK_GROUP_RAID56_MASK))
>> @@ -4885,6 +4935,7 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
>> int ndevs = 0;
>> u64 max_avail;
>> u64 dev_offset;
>> + int nr_preferred_metadata = 0;
>>
>> /*
>> * in the first pass through the devices list, we gather information
>> @@ -4937,15 +4988,49 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
>> devices_info[ndevs].max_avail = max_avail;
>> devices_info[ndevs].total_avail = total_avail;
>> devices_info[ndevs].dev = device;
>> + devices_info[ndevs].preferred_metadata = !!(device->type &
>> + BTRFS_DEV_PREFERRED_METADATA);
>> + if (devices_info[ndevs].preferred_metadata)
>> + nr_preferred_metadata++;
>> ++ndevs;
>> }
>> ctl->ndevs = ndevs;
>>
>> + BUG_ON(nr_preferred_metadata > ndevs);
>> /*
>> * now sort the devices by hole size / available space
>> */
>> - sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
>> - btrfs_cmp_device_info, NULL);
>> + if (((ctl->type & BTRFS_BLOCK_GROUP_DATA) &&
>> + (ctl->type & BTRFS_BLOCK_GROUP_METADATA)) ||
>> + !btrfs_test_opt(info, PREFERRED_METADATA)) {
>> + /* mixed bg or PREFERRED_METADATA not set */
>> + sort(devices_info, ctl->ndevs, sizeof(struct btrfs_device_info),
>> + btrfs_cmp_device_info, NULL);
>> + } else {
>> + /*
>> + * if PREFERRED_METADATA is set, sort the device considering
>> + * also the kind (preferred_metadata or not). Limit the
>> + * availables devices to the ones of the same kind, to avoid
>> + * that a striped profile, like raid5, spreads to all kind of
>> + * devices.
>> + * It is allowed to use different kinds of devices if the ones
>> + * of the same kind are not enough alone.
>> + */
>> + if (ctl->type & BTRFS_BLOCK_GROUP_DATA) {
>> + int nr_data = ctl->ndevs - nr_preferred_metadata;
>> + sort(devices_info, ctl->ndevs,
>> + sizeof(struct btrfs_device_info),
>> + btrfs_cmp_device_info_data, NULL);
>> + if (nr_data >= ctl->devs_min)
>> + ctl->ndevs = nr_data;
>> + } else { /* non data -> metadata and system */
>> + sort(devices_info, ctl->ndevs,
>> + sizeof(struct btrfs_device_info),
>> + btrfs_cmp_device_info_metadata, NULL);
>> + if (nr_preferred_metadata >= ctl->devs_min)
>> + ctl->ndevs = nr_preferred_metadata;
>> + }
>> + }
>>
>> return 0;
>> }
>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>> index 0ac5bf2b95e6..d39c3b0e7569 100644
>> --- a/fs/btrfs/volumes.h
>> +++ b/fs/btrfs/volumes.h
>> @@ -347,6 +347,7 @@ struct btrfs_device_info {
>> u64 dev_offset;
>> u64 max_avail;
>> u64 total_avail;
>> + int preferred_metadata:1;
>> };
>>
>> struct btrfs_raid_attr {
>>
>
--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5