From: Graham Cobb <g.btrfs@cobb.uk.net>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>, kreijack@inwind.it
Cc: Hans van Kranenburg <hans@knorrie.org>,
	linux-btrfs@vger.kernel.org, Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.cz>,
	Sinnamohideen Shafeeq <shafeeqs@panasas.com>
Subject: Re: [PATCH 4/4] btrfs: add allocator_hint mode
Date: Sun, 19 Dec 2021 00:03:32 +0000
Message-ID: <4e18eff2-fca1-bde2-b942-159f89569f0f@cobb.uk.net>
In-Reply-To: <Yb5lSevjq3eURuYB@hungrycats.org>

On 18/12/2021 22:48, Zygo Blaxell wrote:
> On Sat, Dec 18, 2021 at 10:07:18AM +0100, Goffredo Baroncelli wrote:
>> On 12/17/21 20:41, Zygo Blaxell wrote:
>>> On Fri, Dec 17, 2021 at 07:28:28PM +0100, Goffredo Baroncelli wrote:
>>>> On 12/17/21 16:58, Hans van Kranenburg wrote:
>> [...]
>>>> -----------------------------
>>>> The chunk allocation policy is modified as follow.
>>>>
>>>> Each disk may have one of the following tags:
>>>> - BTRFS_DEV_ALLOCATION_PREFERRED_METADATA
>>>> - BTRFS_DEV_ALLOCATION_METADATA_ONLY
>>>> - BTRFS_DEV_ALLOCATION_DATA_ONLY
>>>> - BTRFS_DEV_ALLOCATION_PREFERRED_DATA (default)
>>>
>>> Is it too late to rename these?  The order of the words is inconsistent
>>> and the English usage is a bit odd.
>>>
>>> I'd much rather have:
>>>
>>>> - BTRFS_DEV_ALLOCATION_PREFER_METADATA
>>>> - BTRFS_DEV_ALLOCATION_ONLY_METADATA
>>>> - BTRFS_DEV_ALLOCATION_ONLY_DATA
>>>> - BTRFS_DEV_ALLOCATION_PREFER_DATA (default)
>>>
>>> English speakers would say "[I/we/you] prefer X" or "X [is] preferred".
>>>
>>> or
>>>
>>>> - BTRFS_DEV_ALLOCATION_METADATA_PREFERRED
>>>> - BTRFS_DEV_ALLOCATION_METADATA_ONLY
>>>> - BTRFS_DEV_ALLOCATION_DATA_ONLY
>>>> - BTRFS_DEV_ALLOCATION_DATA_PREFERRED (default)
>>>
>>> I keep typing "data_preferred" and "only_data" when it's really
>>> "preferred_data" and "data_only" because they're not consistent.
>>>
>>
>> Sorry but it is unclear to me the last sentence :-)
>>
>> Anyway I prefer
>> BTRFS_DEV_ALLOCATION_METADATA_PREFERRED
>> BTRFS_DEV_ALLOCATION_METADATA_ONLY
>> [...]
>>
>> Because it seems to me more consistent
> 
> Sounds good.
> 
>>> There is a use case for a mix of _PREFERRED and _ONLY devices:  a system
>>> with NVMe, SSD, and HDD might want to have the SSD use DATA_PREFERRED or
>>> METADATA_PREFERRED while the NVMe and HDD use METADATA_ONLY and DATA_ONLY
>>> respectively.  But this use case is not a very good match for what the
>>> implementation does--we'd want to separate device selection ("can I use
>>> this device for metadata, ever?") from ordering ("which devices should
>>> I use for metadata first?").
>>>
>>> To keep things simple I'd say that use case is out of scope, and recommend
>>> not mixing _PREFERRED and _ONLY in the same filesystem.  Either explicitly
>>> allocate everything with _ONLY, or mark every device _PREFERRED one way
>>> or the other, but don't use both _ONLY and _PREFERRED at the same time
>>> unless you really know what you're doing.
>>
>> In what METADATA_ONLY + DATA_PREFERRED would be more dangerous than
>> METADATA_ONLY + DATA_ONLY ?
> 
> If capacity is our first priority, we use METADATA_PREFERRED
> and DATA_PREFERRED (everything can be allocated everywhere, we try
> the highest performance but fall back).
> 
> If performance is our first priority, we use METADATA_ONLY and DATA_ONLY
> (so we never have to balance which would reduce performance) or
> METADATA_PREFERRED and DATA_ONLY (so we have more capacity, but get
> lower performance because we must balance data in some cases, but not
> as low as any combination of options with DATA_PREFERRED).

I think it would be a mistake to think that your performance and
capacity use cases are the only ones others will care about.

Your analysis misses a third option for priority: resilience. I have a
nearline backup server. It stores a lot of data but it is almost
entirely write-only. My priority is to be able to get at most of the
data quickly if I need it sometime - it isn't critical for any specific
piece of data as I have additional, slower backups, but I want to be
able to restore as much as possible from this server for speed and
convenience. To keep as much nearline backup as possible, I keep data in
SINGLE and metadata in RAID1. Fine - I can do that today.

However, in normal use the main activity is btrfs receive of many mostly
unchanged subvolumes every day. So, what I do today is have a large data
disk and a second small disk for the RAID1 copy of the metadata. I want
to keep data off that second disk. With this patch, I expect to set the
metadata disk as METADATA_ONLY and the data disk as DATA_PREFERRED.
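
A minimal sketch of that expectation, written as a toy Python model and
not taken from the patch: the RANK table, the ordering within it, the
device names and the candidate_devices() helper are all assumptions made
for illustration, and free space is ignored entirely.

```python
# Hypothetical model of the hint semantics discussed in this thread;
# this is NOT the kernel implementation.
METADATA_ONLY, METADATA_PREFERRED, DATA_PREFERRED, DATA_ONLY = range(4)

# Lower rank = tried first. A hint missing from a table means
# "never use this device for that chunk type" (assumed ordering).
RANK = {
    "metadata": {METADATA_ONLY: 0, METADATA_PREFERRED: 1, DATA_PREFERRED: 2},
    "data": {DATA_ONLY: 0, DATA_PREFERRED: 1, METADATA_PREFERRED: 2},
}


def candidate_devices(devices, chunk_type, copies):
    """Pick `copies` device names for one chunk, or None if too few are eligible."""
    rank = RANK[chunk_type]
    # Selection: drop devices whose hint forbids this chunk type.
    eligible = [(rank[hint], name) for name, hint in devices.items() if hint in rank]
    # Ordering: most-preferred devices first.
    eligible.sort()
    if len(eligible) < copies:
        return None
    return [name for _, name in eligible[:copies]]


# The backup server described above: one large data disk, one small metadata disk.
devices = {"big_hdd": DATA_PREFERRED, "small_ssd": METADATA_ONLY}
metadata_devs = candidate_devices(devices, "metadata", copies=2)  # RAID1 metadata
data_devs = candidate_devices(devices, "data", copies=1)          # SINGLE data
```

In this model the RAID1 metadata chunk lands on both disks, because the
DATA_PREFERRED disk is still eligible as a fallback for metadata, while
data chunks can only ever go to the large disk.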

Of course I would *also* like btrfs to read metadata mostly from the
RAID1 copy on the fast metadata disk. This patch does not address that,
but I hope one day there will be a separate option for it.

I think the proposed settings are a useful step and will allow some
experimentation and learning with different scenarios. They certainly
aren't the answer to all allocation problems but I would like to see
them available as soon as possible.

Graham
