From: Goffredo Baroncelli <kreijack@libero.it>
To: Hans van Kranenburg <hans@knorrie.org>, linux-btrfs@vger.kernel.org
Cc: Michael <mclaud@roznica.com.ua>, Hugo Mills <hugo@carfax.org.uk>,
	Martin Svec <martin.svec@zoner.cz>,
	Wang Yugui <wangyugui@e16-tech.com>,
	Paul Jones <paul@pauljones.id.au>,
	Adam Borowski <kilobyte@angband.pl>,
	Zygo Blaxell <ce3g8jdj@umail.furryterror.org>,
	Goffredo Baroncelli <kreijack@inwind.it>
Subject: Re: [PATCH 4/4] btrfs: add preferred_metadata mode
Date: Fri, 29 May 2020 18:26:38 +0200
Message-ID: <83e19781-1733-47bb-dc07-876ca82e94c1@libero.it>
In-Reply-To: <61d2188a-290c-5d6f-ec32-6cacd3f63ce8@knorrie.org>

On 5/29/20 12:02 AM, Hans van Kranenburg wrote:
> Hi,
> 
> On 5/28/20 8:34 PM, Goffredo Baroncelli wrote:
>> From: Goffredo Baroncelli <kreijack@inwind.it>
>>
>> When this mode is enabled,
> 
> The commit message does not mention if this is either only a convenience
> during development and testing of the feature to be able to quickly turn
> it on/off, or if you intend to have this into the final change set.

Good question. For the initial development phase I think it is useful to
have preferred_metadata as an opt-in. Later we could reverse the logic and
default to preferred_metadata. Of course we would then need a
no-preferred_metadata flag (opt-out).
> 
>> the chunk allocation policy
>> is modified as follows:
>> - allocation of metadata chunk: priority is given to preferred_metadata
>>    disks.
>> - allocation of data chunk: priority is given to a non preferred_metadata
>>    disk.
>>
>> When a striped profile is involved (like RAID0,5,6), the logic
>> is a bit more complex. If there are enough disks, the data profiles
>> are stored on the non preferred_metadata disks; instead the metadata
>> profiles are stored on the preferred_metadata disk.
>> If the disks are not enough, then the profile is allocated on all
>> the disks.
>>
>> Example: assuming that sda, sdb, sdc are preferred_metadata (ssd) disks,
>> and sde, sdf are non preferred_metadata ones.
>> A raid6 data profile will be stored on sda, sdb, sdc, sde, sdf (sde
>> and sdf alone are not enough to host a raid6 profile).
>> A raid6 metadata profile will be stored on sda, sdb, sdc (these
>> are enough to host a raid6 profile).
>>
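
To make the rule above concrete, here is a minimal userspace sketch (not
the kernel code) of how the number of usable devices is chosen. The struct
alloc_ctl and the function limit_ndevs() are hypothetical names used only
for illustration; the sketch also assumes that raid6 needs at least 3
devices (devs_min = 3) and it ignores mixed block groups.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the per-chunk allocation state. */
struct alloc_ctl {
	bool is_data;	/* true: data chunk, false: metadata/system chunk */
	int devs_min;	/* minimum number of devices required by the profile */
	int ndevs;	/* devices with free space, of any kind */
};

/*
 * Return how many devices the allocator may use: only the devices of the
 * matching kind when they satisfy devs_min, otherwise all the devices.
 */
int limit_ndevs(const struct alloc_ctl *ctl, int nr_preferred_metadata)
{
	int same_kind;

	/* Mirrors the sanity check done by the patch. */
	assert(nr_preferred_metadata <= ctl->ndevs);

	same_kind = ctl->is_data ? ctl->ndevs - nr_preferred_metadata
				 : nr_preferred_metadata;

	return same_kind >= ctl->devs_min ? same_kind : ctl->ndevs;
}
```

With the five disks of the example (three of them preferred_metadata), a
raid6 data chunk gets all 5 devices because the 2 non preferred_metadata
disks are fewer than devs_min, while a raid6 metadata chunk is limited to
the 3 preferred_metadata disks.
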
>> To enable this mode pass -o dedicated_metadata at mount time.
> 
> Is it dedicated_metadata or preferred_metadata?

It was a copy&paste error. It should be preferred_metadata.
> 
>> Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
>> ---
>>   fs/btrfs/ctree.h   |  1 +
>>   fs/btrfs/super.c   |  8 +++++
>>   fs/btrfs/volumes.c | 89 ++++++++++++++++++++++++++++++++++++++++++++--
>>   fs/btrfs/volumes.h |  1 +
>>   4 files changed, 97 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 03ea7370aea7..779760fd27b1 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -1239,6 +1239,7 @@ static inline u32 BTRFS_MAX_XATTR_SIZE(const struct btrfs_fs_info *info)
>>   #define BTRFS_MOUNT_NOLOGREPLAY		(1 << 27)
>>   #define BTRFS_MOUNT_REF_VERIFY		(1 << 28)
>>   #define BTRFS_MOUNT_DISCARD_ASYNC	(1 << 29)
>> +#define BTRFS_MOUNT_PREFERRED_METADATA	(1 << 30)
>>   
>>   #define BTRFS_DEFAULT_COMMIT_INTERVAL	(30)
>>   #define BTRFS_DEFAULT_MAX_INLINE	(2048)
>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> index 438ecba26557..80700dc9dcf8 100644
>> --- a/fs/btrfs/super.c
>> +++ b/fs/btrfs/super.c
>> @@ -359,6 +359,7 @@ enum {
>>   #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>>   	Opt_ref_verify,
>>   #endif
>> +	Opt_preferred_metadata,
>>   	Opt_err,
>>   };
>>   
>> @@ -430,6 +431,7 @@ static const match_table_t tokens = {
>>   #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>>   	{Opt_ref_verify, "ref_verify"},
>>   #endif
>> +	{Opt_preferred_metadata, "preferred_metadata"},
>>   	{Opt_err, NULL},
>>   };
>>   
>> @@ -881,6 +883,10 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
>>   			btrfs_set_opt(info->mount_opt, REF_VERIFY);
>>   			break;
>>   #endif
>> +		case Opt_preferred_metadata:
>> +			btrfs_set_and_info(info, PREFERRED_METADATA,
>> +					"enabling preferred_metadata");
>> +			break;
>>   		case Opt_err:
>>   			btrfs_err(info, "unrecognized mount option '%s'", p);
>>   			ret = -EINVAL;
>> @@ -1403,6 +1409,8 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
>>   #endif
>>   	if (btrfs_test_opt(info, REF_VERIFY))
>>   		seq_puts(seq, ",ref_verify");
>> +	if (btrfs_test_opt(info, PREFERRED_METADATA))
>> +		seq_puts(seq, ",preferred_metadata");
>>   	seq_printf(seq, ",subvolid=%llu",
>>   		  BTRFS_I(d_inode(dentry))->root->root_key.objectid);
>>   	seq_puts(seq, ",subvol=");
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 5265f54c2931..c68efb15e473 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -4770,6 +4770,56 @@ static int btrfs_cmp_device_info(const void *a, const void *b)
>>   	return 0;
>>   }
>>   
>> +/*
>> + * sort the devices in descending order by preferred_metadata,
>> + * max_avail, total_avail
>> + */
>> +static int btrfs_cmp_device_info_metadata(const void *a, const void *b)
>> +{
>> +	const struct btrfs_device_info *di_a = a;
>> +	const struct btrfs_device_info *di_b = b;
>> +
>> +	/* metadata -> preferred_metadata first */
>> +	if (di_a->preferred_metadata && !di_b->preferred_metadata)
>> +		return -1;
>> +	if (!di_a->preferred_metadata && di_b->preferred_metadata)
>> +		return 1;
>> +	if (di_a->max_avail > di_b->max_avail)
>> +		return -1;
>> +	if (di_a->max_avail < di_b->max_avail)
>> +		return 1;
>> +	if (di_a->total_avail > di_b->total_avail)
>> +		return -1;
>> +	if (di_a->total_avail < di_b->total_avail)
>> +		return 1;
>> +	return 0;
>> +}
>> +
>> +/*
>> + * sort the devices in descending order by !preferred_metadata,
>> + * max_avail, total_avail
>> + */
>> +static int btrfs_cmp_device_info_data(const void *a, const void *b)
>> +{
>> +	const struct btrfs_device_info *di_a = a;
>> +	const struct btrfs_device_info *di_b = b;
>> +
>> +	/* data -> preferred_metadata last */
>> +	if (di_a->preferred_metadata && !di_b->preferred_metadata)
>> +		return 1;
>> +	if (!di_a->preferred_metadata && di_b->preferred_metadata)
>> +		return -1;
>> +	if (di_a->max_avail > di_b->max_avail)
>> +		return -1;
>> +	if (di_a->max_avail < di_b->max_avail)
>> +		return 1;
>> +	if (di_a->total_avail > di_b->total_avail)
>> +		return -1;
>> +	if (di_a->total_avail < di_b->total_avail)
>> +		return 1;
>> +	return 0;
>> +}
>> +
>>   static void check_raid56_incompat_flag(struct btrfs_fs_info *info, u64 type)
>>   {
>>   	if (!(type & BTRFS_BLOCK_GROUP_RAID56_MASK))
>> @@ -4885,6 +4935,7 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
>>   	int ndevs = 0;
>>   	u64 max_avail;
>>   	u64 dev_offset;
>> +	int nr_preferred_metadata = 0;
>>   
>>   	/*
>>   	 * in the first pass through the devices list, we gather information
>> @@ -4937,15 +4988,49 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
>>   		devices_info[ndevs].max_avail = max_avail;
>>   		devices_info[ndevs].total_avail = total_avail;
>>   		devices_info[ndevs].dev = device;
>> +		devices_info[ndevs].preferred_metadata = !!(device->type &
>> +			BTRFS_DEV_PREFERRED_METADATA);
>> +		if (devices_info[ndevs].preferred_metadata)
>> +			nr_preferred_metadata++;
>>   		++ndevs;
>>   	}
>>   	ctl->ndevs = ndevs;
>>   
>> +	BUG_ON(nr_preferred_metadata > ndevs);
>>   	/*
>>   	 * now sort the devices by hole size / available space
>>   	 */
>> -	sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
>> -	     btrfs_cmp_device_info, NULL);
>> +	if (((ctl->type & BTRFS_BLOCK_GROUP_DATA) &&
>> +	     (ctl->type & BTRFS_BLOCK_GROUP_METADATA)) ||
>> +	    !btrfs_test_opt(info, PREFERRED_METADATA)) {
>> +		/* mixed bg or PREFERRED_METADATA not set */
>> +		sort(devices_info, ctl->ndevs, sizeof(struct btrfs_device_info),
>> +			     btrfs_cmp_device_info, NULL);
>> +	} else {
>> +		/*
>> +		 * if PREFERRED_METADATA is set, sort the device considering
>> +		 * also the kind (preferred_metadata or not). Limit the
>> +		 * available devices to the ones of the same kind, to avoid
>> +		 * that a striped profile, like raid5, spreads to all kinds of
>> +		 * devices.
>> +		 * It is allowed to use different kinds of devices if the ones
>> +		 * of the same kind are not enough alone.
>> +		 */
>> +		if (ctl->type & BTRFS_BLOCK_GROUP_DATA) {
>> +			int nr_data = ctl->ndevs - nr_preferred_metadata;
>> +			sort(devices_info, ctl->ndevs,
>> +				     sizeof(struct btrfs_device_info),
>> +				     btrfs_cmp_device_info_data, NULL);
>> +			if (nr_data >= ctl->devs_min)
>> +				ctl->ndevs = nr_data;
>> +		} else { /* non data -> metadata and system */
>> +			sort(devices_info, ctl->ndevs,
>> +				     sizeof(struct btrfs_device_info),
>> +				     btrfs_cmp_device_info_metadata, NULL);
>> +			if (nr_preferred_metadata >= ctl->devs_min)
>> +				ctl->ndevs = nr_preferred_metadata;
>> +		}
>> +	}
>>   
>>   	return 0;
>>   }
>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>> index 0ac5bf2b95e6..d39c3b0e7569 100644
>> --- a/fs/btrfs/volumes.h
>> +++ b/fs/btrfs/volumes.h
>> @@ -347,6 +347,7 @@ struct btrfs_device_info {
>>   	u64 dev_offset;
>>   	u64 max_avail;
>>   	u64 total_avail;
>> +	int preferred_metadata:1;
>>   };
>>   
>>   struct btrfs_raid_attr {
>>
> 
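
For reference, the ordering produced by the data comparator can be tried
in userspace with a small sketch like the following. It is only an
illustration under some assumptions: struct dev_info is a hypothetical,
trimmed-down stand-in for struct btrfs_device_info, and qsort() replaces
the kernel sort().

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for struct btrfs_device_info. */
struct dev_info {
	const char *name;
	unsigned long long max_avail;
	unsigned long long total_avail;
	bool preferred_metadata;
};

/*
 * Data chunks: non preferred_metadata devices first, then descending
 * order by max_avail and total_avail.
 */
int cmp_data(const void *a, const void *b)
{
	const struct dev_info *da = a;
	const struct dev_info *db = b;

	if (da->preferred_metadata != db->preferred_metadata)
		return da->preferred_metadata ? 1 : -1;
	if (da->max_avail != db->max_avail)
		return da->max_avail > db->max_avail ? -1 : 1;
	if (da->total_avail != db->total_avail)
		return da->total_avail > db->total_avail ? -1 : 1;
	return 0;
}

/* Sort in place into the order a data chunk allocation would see. */
void sort_for_data(struct dev_info *devs, size_t n)
{
	qsort(devs, n, sizeof(*devs), cmp_data);
}
```

The metadata comparator is the same with the first test inverted, so that
the preferred_metadata devices come first.
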


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
