From: Goffredo Baroncelli <kreijack@libero.it>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: Hans van Kranenburg <hans@knorrie.org>,
	linux-btrfs@vger.kernel.org, Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.cz>,
	Sinnamohideen Shafeeq <shafeeqs@panasas.com>
Subject: Re: [PATCH 4/4] btrfs: add allocator_hint mode
Date: Sat, 18 Dec 2021 10:07:18 +0100
Message-ID: <5afe9f17-d171-c4e5-84f0-24f9a7fa250f@libero.it>
In-Reply-To: <YbzoA6n8D7jT7y/F@hungrycats.org>

On 12/17/21 20:41, Zygo Blaxell wrote:
> On Fri, Dec 17, 2021 at 07:28:28PM +0100, Goffredo Baroncelli wrote:
>> On 12/17/21 16:58, Hans van Kranenburg wrote:
[...]
>> -----------------------------
>> The chunk allocation policy is modified as follows.
>>
>> Each disk may have one of the following tags:
>> - BTRFS_DEV_ALLOCATION_PREFERRED_METADATA
>> - BTRFS_DEV_ALLOCATION_METADATA_ONLY
>> - BTRFS_DEV_ALLOCATION_DATA_ONLY
>> - BTRFS_DEV_ALLOCATION_PREFERRED_DATA (default)
> 
> Is it too late to rename these?  The order of the words is inconsistent
> and the English usage is a bit odd.
> 
> I'd much rather have:
> 
>> - BTRFS_DEV_ALLOCATION_PREFER_METADATA
>> - BTRFS_DEV_ALLOCATION_ONLY_METADATA
>> - BTRFS_DEV_ALLOCATION_ONLY_DATA
>> - BTRFS_DEV_ALLOCATION_PREFER_DATA (default)
> 
> English speakers would say "[I/we/you] prefer X" or "X [is] preferred".
> 
> or
> 
>> - BTRFS_DEV_ALLOCATION_METADATA_PREFERRED
>> - BTRFS_DEV_ALLOCATION_METADATA_ONLY
>> - BTRFS_DEV_ALLOCATION_DATA_ONLY
>> - BTRFS_DEV_ALLOCATION_DATA_PREFERRED (default)
> 
> I keep typing "data_preferred" and "only_data" when it's really
> "preferred_data" and "data_only" because they're not consistent.
> 

Sorry, but the last sentence is unclear to me :-)

Anyway, I prefer
BTRFS_DEV_ALLOCATION_METADATA_PREFERRED
BTRFS_DEV_ALLOCATION_METADATA_ONLY
[...]

because it seems more consistent to me.



>> During a *mixed data/metadata* chunk allocation, BTRFS works as
>> usual.
>>
>> During a *data* chunk allocation, space is searched for first in the
>> BTRFS_DEV_ALLOCATION_DATA_ONLY and BTRFS_DEV_ALLOCATION_PREFERRED_DATA
>> tagged disks. If no space is found or the space found is not enough (e.g.
>> in raid5, only two disks are available), then the disks tagged
>> BTRFS_DEV_ALLOCATION_PREFERRED_METADATA are evaluated as well. If even
>> then the space is not sufficient, -ENOSPC is raised.
>> A disk tagged with BTRFS_DEV_ALLOCATION_METADATA_ONLY is never considered
>> for a data BG allocation.
>>
>> During a *metadata* chunk allocation, space is searched for first in the
>> BTRFS_DEV_ALLOCATION_METADATA_ONLY and BTRFS_DEV_ALLOCATION_PREFERRED_METADATA
>> tagged disks. If no space is found or the space found is not enough (e.g.
>> in raid5, only two disks are available), then the disks tagged
>> BTRFS_DEV_ALLOCATION_PREFERRED_DATA are considered as well. If even
>> then the space is not sufficient, -ENOSPC is raised.
>> A disk tagged with BTRFS_DEV_ALLOCATION_DATA_ONLY is never considered
>> for a metadata BG allocation.
>>
>> By default the disks are tagged as BTRFS_DEV_ALLOCATION_PREFERRED_DATA,
>> so the default behavior is unchanged. If the user prefers to store the
>> metadata on the faster disks (e.g. the SSD), they can tag these with
>> BTRFS_DEV_ALLOCATION_PREFERRED_METADATA: in this case the data BGs go to
>> the BTRFS_DEV_ALLOCATION_PREFERRED_DATA disks and the metadata BGs to the
>> others, as long as there is enough space. Only when one set of disks is
>> full is the other one used.
>>
>> WARNING: if the user tags a disk with BTRFS_DEV_ALLOCATION_DATA_ONLY,
>> that disk will never be used for allocating metadata, which increases
>> the likelihood of exhausting the metadata space.
> 
> This WARNING is not correct.  We use a combination of METADATA_ONLY and
> DATA_ONLY preferences to exclude data allocations from metadata devices,
> reducing the likelihood of exhausting the metadata space all the way
> to zero.  We do have to provide correctly-sized metadata devices, but
> SSDs come in powers-of-2 sizes, so we just bump up to the next power of
> two or add another SSD to the filesystem every time a metadata device
> goes over 50%.
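As a rough illustration of the sizing rule of thumb quoted above, here is a minimal C sketch. It assumes the total metadata-only capacity and its current usage are known in bytes, and it models only the "bump up to the next power of two" option (not "add another SSD"). The function names are hypothetical; nothing like this exists in the patch or in btrfs itself.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two strictly greater than x (x must be < 2^63). */
uint64_t next_pow2_above(uint64_t x)
{
	uint64_t p = 1;

	assert(x < (UINT64_C(1) << 63));
	while (p <= x)
		p <<= 1;
	return p;
}

/*
 * Hypothetical rule of thumb: keep the metadata-only capacity as it is
 * while usage is at or below 50%; once usage goes over 50%, grow the
 * capacity to the next power of two.
 */
uint64_t metadata_target_size(uint64_t capacity, uint64_t used)
{
	if (used <= capacity / 2)
		return capacity;
	return next_pow2_above(capacity);
}
```

For example, a 1 TiB metadata device holding 600 GiB would be grown to 2 TiB, while one holding 400 GiB would be left alone.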
> 
> Metadata-only devices completely eliminate our need to do other
> workarounds like data balances to reclaim unallocated space for metadata.
> 
> _PREFERRED devices are the problematic case.  Since no space is
> exclusively reserved for metadata, it means you have to do maintenance
> data balances as the filesystem fills up because you will be constantly
> getting data clogging up your metadata devices.
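The two-pass search described in the quoted patch text can be sketched in C as follows. This is an illustration of the described policy, not the patch code: the enum names and values, `alloc_pass()`, `select_devices()` and the `min_devs` parameter are hypothetical, and free space is ignored (every device is assumed to have room for a new chunk).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the four tags; the values are illustrative. */
enum alloc_hint {
	HINT_PREFERRED_DATA,		/* default */
	HINT_PREFERRED_METADATA,
	HINT_DATA_ONLY,
	HINT_METADATA_ONLY,
};

enum chunk_kind { CHUNK_DATA, CHUNK_METADATA, CHUNK_MIXED };

/*
 * Pass in which a device with tag `hint` becomes a candidate for a chunk
 * of the given kind: 0 = searched first, 1 = fallback, -1 = never.
 */
int alloc_pass(enum alloc_hint hint, enum chunk_kind kind)
{
	if (kind == CHUNK_MIXED)
		return 0;	/* mixed chunks ignore the hints */

	switch (hint) {
	case HINT_DATA_ONLY:
		return kind == CHUNK_DATA ? 0 : -1;
	case HINT_METADATA_ONLY:
		return kind == CHUNK_METADATA ? 0 : -1;
	case HINT_PREFERRED_DATA:
		return kind == CHUNK_DATA ? 0 : 1;
	case HINT_PREFERRED_METADATA:
		return kind == CHUNK_METADATA ? 0 : 1;
	}
	return -1;
}

/*
 * Collect candidate device indexes into `out` (room for ndev entries).
 * Fallback (pass 1) devices are added only when pass 0 alone cannot
 * supply the `min_devs` devices the profile needs.
 */
int select_devices(const enum alloc_hint *hints, int ndev,
		   enum chunk_kind kind, int min_devs, int *out)
{
	int n = 0;

	assert(min_devs > 0);
	for (int pass = 0; pass <= 1; pass++) {
		for (int i = 0; i < ndev; i++)
			if (alloc_pass(hints[i], kind) == pass)
				out[n++] = i;
		if (n >= min_devs)
			return n;
	}
	return -ENOSPC;
}
```

With one METADATA_ONLY, one PREFERRED_METADATA and two PREFERRED_DATA devices, a data chunk needing three devices takes both PREFERRED_DATA devices plus the PREFERRED_METADATA one, and never the METADATA_ONLY device.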


> 
> There is a use case for a mix of _PREFERRED and _ONLY devices:  a system
> with NVMe, SSD, and HDD might want to have the SSD use DATA_PREFERRED or
> METADATA_PREFERRED while the NVMe and HDD use METADATA_ONLY and DATA_ONLY
> respectively.  But this use case is not a very good match for what the
> implementation does--we'd want to separate device selection ("can I use
> this device for metadata, ever?") from ordering ("which devices should
> I use for metadata first?").
> 
> To keep things simple I'd say that use case is out of scope, and recommend
> not mixing _PREFERRED and _ONLY in the same filesystem.  Either explicitly
> allocate everything with _ONLY, or mark every device _PREFERRED one way
> or the other, but don't use both _ONLY and _PREFERRED at the same time
> unless you really know what you're doing.
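The suggestion above of separating device selection from ordering could be modeled roughly as below. This is only a sketch of the idea under stated assumptions: `struct dev_policy`, its fields and `pick_devices()` are hypothetical names, not part of the patch; ranks are assumed to be non-negative integers; and free space is ignored.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical per-device policy: selection and ordering kept separate. */
struct dev_policy {
	bool allow_data;	/* can this device ever hold data? */
	bool allow_metadata;	/* can this device ever hold metadata? */
	int data_rank;		/* >= 0; lower ranks are tried first for data */
	int metadata_rank;	/* >= 0; lower ranks are tried first for metadata */
};

/*
 * Add allowed devices to `out` one rank group at a time, lowest rank
 * first, until at least `min_devs` have been collected. Disallowed devices
 * are never added, whatever their rank.
 */
int pick_devices(const struct dev_policy *devs, int ndev, bool metadata,
		 int min_devs, int *out)
{
	int n = 0, rank = -1;

	assert(min_devs > 0);
	for (;;) {
		int next = INT_MAX;
		bool found = false;

		/* Find the smallest rank above `rank` among allowed devices. */
		for (int i = 0; i < ndev; i++) {
			bool ok = metadata ? devs[i].allow_metadata : devs[i].allow_data;
			int r = metadata ? devs[i].metadata_rank : devs[i].data_rank;

			if (ok && r > rank && r <= next) {
				next = r;
				found = true;
			}
		}
		if (!found)
			return -ENOSPC;

		/* Add every allowed device in that rank group. */
		for (int i = 0; i < ndev; i++) {
			bool ok = metadata ? devs[i].allow_metadata : devs[i].allow_data;
			int r = metadata ? devs[i].metadata_rank : devs[i].data_rank;

			if (ok && r == next)
				out[n++] = i;
		}
		if (n >= min_devs)
			return n;
		rank = next;
	}
}
```

In the NVMe/SSD/HDD example, the NVMe would allow only metadata, the HDD only data, and the SSD both at a higher (later) rank, so it is used for either kind only after the dedicated devices.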

In what way would METADATA_ONLY + DATA_PREFERRED be more dangerous than
METADATA_ONLY + DATA_ONLY?

In fact, I see two main use cases:
- I want to put my metadata on an SSD for performance reasons:
	METADATA_PREFERRED + DATA_PREFERRED
  is the most conservative approach.
- I want to protect the metadata BG space from exhaustion (assuming that
  a typical modern disk is far larger than the total size of the metadata BGs):
	METADATA_ONLY + X
  is a valid approach.




[...]

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
