From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 26 Jan 2022 18:38:38 +0100
From: David Sterba
To: Anand Jain
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 2/2] btrfs: create chunk device type aware
Message-ID: <20220126173838.GE14046@twin.jikos.cz>
Reply-To: dsterba@suse.cz
In-Reply-To: <4ac12d6661470b18e1145f98c355bc1a93ebf214.1642518245.git.anand.jain@oracle.com>
References: <4ac12d6661470b18e1145f98c355bc1a93ebf214.1642518245.git.anand.jain@oracle.com>
X-Mailing-List: linux-btrfs@vger.kernel.org

On Tue, Jan 18, 2022 at 11:18:02PM +0800, Anand Jain wrote:
> Mixed device types configuration exist. Most commonly, HDD mixed with
> SSD/NVME device types. This use case prefers that the data chunk
> allocates on HDD and the metadata chunk allocates on the SSD/NVME.
>
> As of now, in the function gather_device_info() called from
> btrfs_create_chunk(), we sort the devices based on unallocated space
> only.

Yes, and this guarantees maximizing the used space for the raid
profiles.

> After this patch, this function will check for mixed device types. And
> will sort the devices based on enum btrfs_device_types. That is, sort if
> the allocation type is metadata and reverse-sort if the allocation type
> is data.

And this changes the above, because a small fast device can be depleted
first.

> The advantage of this method is that data/metadata allocation distribution
> based on the device type happens automatically without any manual
> configuration.

Yeah, but the default behaviour may not be suitable for all users so
some policy will have to be done anyway.
I vaguely remember some comments regarding mixed setups, along the lines
of "if there's a fast flash device I'd rather see ENOSPC and either
delete files or add more devices than to let everything work but with
the risk of storing metadata on the slow devices."

> Signed-off-by: Anand Jain
> ---
>  fs/btrfs/volumes.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index da3d6d0f5bc3..77fba78555d7 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5060,6 +5060,37 @@ static int btrfs_add_system_chunk(struct btrfs_fs_info *fs_info,
>  	return 0;
>  }
>
> +/*
> + * Sort the devices in its ascending order of latency value.
> + */
> +static int btrfs_cmp_device_latency(const void *a, const void *b)
> +{
> +	const struct btrfs_device_info *di_a = a;
> +	const struct btrfs_device_info *di_b = b;
> +	struct btrfs_device *dev_a = di_a->dev;
> +	struct btrfs_device *dev_b = di_b->dev;
> +
> +	if (dev_a->dev_type > dev_b->dev_type)
> +		return 1;
> +	if (dev_a->dev_type < dev_b->dev_type)
> +		return -1;
> +	return 0;
> +}
> +
> +static int btrfs_cmp_device_rev_latency(const void *a, const void *b)
> +{
> +	const struct btrfs_device_info *di_a = a;
> +	const struct btrfs_device_info *di_b = b;
> +	struct btrfs_device *dev_a = di_a->dev;
> +	struct btrfs_device *dev_b = di_b->dev;
> +
> +	if (dev_a->dev_type > dev_b->dev_type)
> +		return -1;
> +	if (dev_a->dev_type < dev_b->dev_type)
> +		return 1;
> +	return 0;
> +}
> +
>  /*
>   * sort the devices in descending order by max_avail, total_avail
>   */
> @@ -5292,6 +5323,20 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
>  	sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
>  	     btrfs_cmp_device_info, NULL);
>
> +	/*
> +	 * Sort devices by their latency. Ascending order of latency for
> +	 * metadata and descending order of latency for the data chunks for
> +	 * mixed device types.
> +	 */
> +	if (fs_devices->mixed_dev_types) {
> +		if (ctl->type & BTRFS_BLOCK_GROUP_DATA)
> +			sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
> +			     btrfs_cmp_device_rev_latency, NULL);
> +		else
> +			sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
> +			     btrfs_cmp_device_latency, NULL);

In case there are mixed devices the sort happens twice, and because
sort() as implemented in the kernel is not stable, even devices with the
same amount of data can get reordered wildly. The remaining space is
still a factor we need to take into account to avoid ENOSPC on the
chunk level.