linux-btrfs.vger.kernel.org archive mirror
From: Goffredo Baroncelli <kreijack@libero.it>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: Hans van Kranenburg <hans@knorrie.org>,
	linux-btrfs@vger.kernel.org, Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.cz>,
	Sinnamohideen Shafeeq <shafeeqs@panasas.com>
Subject: Re: [PATCH 4/4] btrfs: add allocator_hint mode
Date: Sat, 18 Dec 2021 10:07:18 +0100
Message-ID: <5afe9f17-d171-c4e5-84f0-24f9a7fa250f@libero.it>
In-Reply-To: <YbzoA6n8D7jT7y/F@hungrycats.org>

On 12/17/21 20:41, Zygo Blaxell wrote:
> On Fri, Dec 17, 2021 at 07:28:28PM +0100, Goffredo Baroncelli wrote:
>> On 12/17/21 16:58, Hans van Kranenburg wrote:
[...]
>> -----------------------------
>> The chunk allocation policy is modified as follows.
>>
>> Each disk may have one of the following tags:
>> - BTRFS_DEV_ALLOCATION_PREFERRED_METADATA
>> - BTRFS_DEV_ALLOCATION_METADATA_ONLY
>> - BTRFS_DEV_ALLOCATION_DATA_ONLY
>> - BTRFS_DEV_ALLOCATION_PREFERRED_DATA (default)
> 
> Is it too late to rename these?  The order of the words is inconsistent
> and the English usage is a bit odd.
> 
> I'd much rather have:
> 
>> - BTRFS_DEV_ALLOCATION_PREFER_METADATA
>> - BTRFS_DEV_ALLOCATION_ONLY_METADATA
>> - BTRFS_DEV_ALLOCATION_ONLY_DATA
>> - BTRFS_DEV_ALLOCATION_PREFER_DATA (default)
> 
> English speakers would say "[I/we/you] prefer X" or "X [is] preferred".
> 
> or
> 
>> - BTRFS_DEV_ALLOCATION_METADATA_PREFERRED
>> - BTRFS_DEV_ALLOCATION_METADATA_ONLY
>> - BTRFS_DEV_ALLOCATION_DATA_ONLY
>> - BTRFS_DEV_ALLOCATION_DATA_PREFERRED (default)
> 
> I keep typing "data_preferred" and "only_data" when it's really
> "preferred_data" and "data_only" because they're not consistent.
> 

Sorry, but the last sentence is unclear to me :-)

Anyway, I prefer
BTRFS_DEV_ALLOCATION_METADATA_PREFERRED
BTRFS_DEV_ALLOCATION_METADATA_ONLY
[...]

because it seems more consistent to me.



>> During a *mixed data/metadata* chunk allocation, BTRFS works as
>> usual.
>>
>> During a *data* chunk allocation, space is searched first on the disks
>> tagged BTRFS_DEV_ALLOCATION_DATA_ONLY and
>> BTRFS_DEV_ALLOCATION_PREFERRED_DATA. If no space is found, or the space
>> found is not enough (e.g. in raid5, only two disks are available), then
>> the disks tagged BTRFS_DEV_ALLOCATION_PREFERRED_METADATA are evaluated
>> as well. If the space is still not sufficient, -ENOSPC is raised.
>> A disk tagged with BTRFS_DEV_ALLOCATION_METADATA_ONLY is never considered
>> for a data BG allocation.
>>
>> During a *metadata* chunk allocation, space is searched first on the disks
>> tagged BTRFS_DEV_ALLOCATION_METADATA_ONLY and
>> BTRFS_DEV_ALLOCATION_PREFERRED_METADATA. If no space is found, or the
>> space found is not enough (e.g. in raid5, only two disks are available),
>> then the disks tagged BTRFS_DEV_ALLOCATION_PREFERRED_DATA are considered
>> as well. If the space is still not sufficient, -ENOSPC is raised.
>> A disk tagged with BTRFS_DEV_ALLOCATION_DATA_ONLY is never considered
>> for a metadata BG allocation.
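
The two-tier search described above can be modelled in a few lines. This is
only a rough Python sketch, not the kernel code from the patch: the Hint
enum, the pick_devices() helper, the min_devices parameter and the device
names are all made up for illustration, and the enum values are not the
on-disk ones. It also treats the _ONLY and PREFERRED tags of the first tier
as equal, which is what the text says but may be simpler than what the
allocator really does.

```python
import errno
from enum import Enum, auto


class Hint(Enum):
    # Illustrative tags only; the values are NOT the on-disk ones.
    PREFERRED_DATA = auto()
    DATA_ONLY = auto()
    PREFERRED_METADATA = auto()
    METADATA_ONLY = auto()


# For each chunk type: (tags searched first, tags used only as a fallback).
TIERS = {
    "data": ({Hint.DATA_ONLY, Hint.PREFERRED_DATA},
             {Hint.PREFERRED_METADATA}),
    "metadata": ({Hint.METADATA_ONLY, Hint.PREFERRED_METADATA},
                 {Hint.PREFERRED_DATA}),
}


def pick_devices(kind, devices, min_devices):
    """Return the names of the devices a chunk of type `kind` may use.

    devices: dict mapping name -> (Hint, free_bytes)
    min_devices: hypothetical minimum device count required by the profile
    """
    first, fallback = TIERS[kind]
    usable = [name for name, (hint, free) in devices.items()
              if hint in first and free > 0]
    if len(usable) < min_devices:
        # Not enough first-choice devices: widen the search to the fallback tier.
        usable += [name for name, (hint, free) in devices.items()
                   if hint in fallback and free > 0]
    if len(usable) < min_devices:
        # The opposite _ONLY tag is never considered, so give up here.
        raise OSError(errno.ENOSPC, "not enough devices for this chunk")
    return usable


devices = {
    "hdd1": (Hint.PREFERRED_DATA, 100),
    "hdd2": (Hint.DATA_ONLY, 100),
    "ssd1": (Hint.PREFERRED_METADATA, 50),
    "nvme": (Hint.METADATA_ONLY, 50),
}
print(pick_devices("data", devices, 2))  # first tier is enough
print(pick_devices("data", devices, 3))  # falls back to ssd1, never to nvme
```

Asking for four devices for a data chunk in this setup raises ENOSPC,
because the METADATA_ONLY device is never a candidate.
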
>>
>> By default the disks are tagged as BTRFS_DEV_ALLOCATION_PREFERRED_DATA,
>> so the default behavior is unchanged. If the user prefers to store the
>> metadata on the faster disks (e.g. the SSD), they can tag these with
>> BTRFS_DEV_ALLOCATION_PREFERRED_METADATA: in this case the data BGs go to
>> the BTRFS_DEV_ALLOCATION_PREFERRED_DATA disks and the metadata BGs to the
>> others, as long as there is enough space. Only when one set of disks is
>> full is the other one used.
>>
>> WARNING: if the user tags a disk with BTRFS_DEV_ALLOCATION_DATA_ONLY,
>> this disk will never be used for allocating metadata, which increases
>> the likelihood of exhausting the metadata space.
> 
> This WARNING is not correct.  We use a combination of METADATA_ONLY and
> DATA_ONLY preferences to exclude data allocations from metadata devices,
> reducing the likelihood of exhausting the metadata space all the way
> to zero.  We do have to provide correctly-sized metadata devices, but
> SSDs come in powers-of-2 sizes, so we just bump up to the next power of
> two or add another SSD to the filesystem every time a metadata device
> goes over 50%.
> 
> Metadata-only devices completely eliminate our need to do other
> workarounds like data balances to reclaim unallocated space for metadata.
> 
> _PREFERRED devices are the problematic case.  Since no space is
> exclusively reserved for metadata, it means you have to do maintenance
> data balances as the filesystem fills up because you will be constantly
> getting data clogging up your metadata devices.


> 
> There is a use case for a mix of _PREFERRED and _ONLY devices:  a system
> with NVMe, SSD, and HDD might want to have the SSD use DATA_PREFERRED or
> METADATA_PREFERRED while the NVMe and HDD use METADATA_ONLY and DATA_ONLY
> respectively.  But this use case is not a very good match for what the
> implementation does--we'd want to separate device selection ("can I use
> this device for metadata, ever?") from ordering ("which devices should
> I use for metadata first?").
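
To make the "selection vs. ordering" split concrete, here is a small Python
sketch of what such a model could look like. Nothing like this exists in the
patch: the DevicePolicy fields, the rank numbers and the order_for() helper
are hypothetical and only illustrate keeping "may this device be used?"
separate from "which device is tried first?".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DevicePolicy:
    # Hypothetical per-device policy; not a structure from the patch.
    name: str
    allow_data: bool       # selection: may data ever land here?
    allow_metadata: bool   # selection: may metadata ever land here?
    data_rank: int         # ordering: lower rank is tried first
    metadata_rank: int


def order_for(kind, devices):
    """Return device names usable for `kind`, in the order they are tried."""
    allowed = [d for d in devices if getattr(d, f"allow_{kind}")]
    ranked = sorted(allowed, key=lambda d: getattr(d, f"{kind}_rank"))
    return [d.name for d in ranked]


# The NVMe + SSD + HDD example: NVMe holds only metadata, HDD only data,
# and the SSD accepts both but is second choice for either.
devices = [
    DevicePolicy("nvme", allow_data=False, allow_metadata=True,
                 data_rank=0, metadata_rank=0),
    DevicePolicy("ssd", allow_data=True, allow_metadata=True,
                 data_rank=1, metadata_rank=1),
    DevicePolicy("hdd", allow_data=True, allow_metadata=False,
                 data_rank=0, metadata_rank=0),
]
print(order_for("metadata", devices))
print(order_for("data", devices))
```

With the four tags of the patch, both questions are answered by a single
value per device, which is why the three-tier case does not map cleanly.
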
> 
> To keep things simple I'd say that use case is out of scope, and recommend
> not mixing _PREFERRED and _ONLY in the same filesystem.  Either explicitly
> allocate everything with _ONLY, or mark every device _PREFERRED one way
> or the other, but don't use both _ONLY and _PREFERRED at the same time
> unless you really know what you're doing.

In what way would METADATA_ONLY + DATA_PREFERRED be more dangerous than
METADATA_ONLY + DATA_ONLY?

In fact, I see two main, distinct use cases:
- I want to put my metadata on an SSD for performance reasons:
	METADATA_PREFERRED + DATA_PREFERRED
    is the most conservative approach
- I want to protect the metadata BG space from exhaustion (assuming that
   a typical disk today is far larger than the total size of the metadata BGs):
	METADATA_ONLY + X
   is a valid approach
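
A toy simulation may help to compare the two cases. It is a rough Python
sketch under strong assumptions, not the allocator from the patch: chunks
are one unit in size and single-device, the tag names are plain strings, and
simulate() and the device names are invented for the example.

```python
def simulate(hints, free, requests):
    """Place one-unit chunks and return the device used for each request.

    hints: name -> tag string such as "METADATA_ONLY" or "DATA_PREFERRED"
    free: name -> free units
    requests: sequence of "DATA" / "METADATA"
    A result of None stands for -ENOSPC.
    """
    free = dict(free)  # work on a copy, leave the caller's dict alone
    placed = []
    for kind in requests:
        other = "DATA" if kind == "METADATA" else "METADATA"
        # First tier: own _ONLY and _PREFERRED; fallback: the other _PREFERRED.
        tiers = [{f"{kind}_ONLY", f"{kind}_PREFERRED"}, {f"{other}_PREFERRED"}]
        target = None
        for tier in tiers:
            candidates = [d for d in free if hints[d] in tier and free[d] > 0]
            if candidates:
                target = max(candidates, key=lambda d: free[d])
                break
        if target is not None:
            free[target] -= 1
        placed.append(target)
    return placed


free = {"ssd": 2, "hdd": 2}
requests = ["DATA"] * 3 + ["METADATA"] * 2

# Case 1: performance only; data may spill onto the SSD.
preferred = simulate({"ssd": "METADATA_PREFERRED", "hdd": "DATA_PREFERRED"},
                     free, requests)
# Case 2: the SSD is reserved for metadata.
reserved = simulate({"ssd": "METADATA_ONLY", "hdd": "DATA_PREFERRED"},
                    free, requests)
print(preferred)  # ['hdd', 'hdd', 'ssd', 'ssd', None]
print(reserved)   # ['hdd', 'hdd', None, 'ssd', 'ssd']
```

In the first case the third data chunk lands on the SSD and the last
metadata chunk fails; in the second case the data allocation fails instead
and both metadata chunks still fit.
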




[...]

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
