All of lore.kernel.org
 help / color / mirror / Atom feed
From: Goffredo Baroncelli <kreijack@inwind.it>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: Josef Bacik <josef@toxicpanda.com>, linux-btrfs@vger.kernel.org
Subject: Re: [RFC][PATCH V5] btrfs: preferred_metadata: preferred device for metadata
Date: Fri, 22 Jan 2021 19:31:18 +0100	[thread overview]
Message-ID: <89ee4ab9-64cd-b093-92d2-02eee4997250@inwind.it> (raw)
In-Reply-To: <20210121185400.GH28049@hungrycats.org>

On 1/21/21 7:54 PM, Zygo Blaxell wrote:
> On Thu, Jan 21, 2021 at 07:16:05PM +0100, Goffredo Baroncelli wrote:
>> On 1/20/21 5:02 PM, Josef Bacik wrote:
>>> On 1/17/21 1:54 PM, Goffredo Baroncelli wrote:
>>>>
>>>> Hi all,
>>>>
>>>> This is an RFC; I wrote this patch because I find the idea interesting
>>>> even though it adds more complication to the chunk allocator.
>>>>
>>>> The basic idea is to store the metadata chunk in the fasters disks.
>>>> The fasters disk are marked by the "preferred_metadata" flag.
>>>>
>>>> BTRFS when allocate a new metadata/system chunk, selects the
>>>> "preferred_metadata" disks, otherwise it selectes the non
>>>> "preferred_metadata" disks. The intial patch allowed to use the other
>>>> kind of disk in case a set is full.
>>>>
>>>> This patches set is based on v5.11-rc2.
>>>>
>>>> For now, the only user of this patch that I am aware is Zygo.
>>>> However he asked to further constraint the allocation: i.e. avoid to
>>>> allocated metadata on a not "preferred_metadata"
>>>> disk. So I extended the patch adding 4 modes to operate.
>>>>
>>>> This is enabled passing the option "preferred_metadata=<mode>" at
>>>> mount time.
>>>>
>>>
>>> I'll echo Zygo's hatred for mount options.  The more complicated policy decisions belong in properties and sysfs knobs, not mount options.
>>>
>> I tend to agree. However adding a filesystem property can be done in a second time. I don't think that this a problem. However I prefer to make the patch smaller.
>>
>> Anyway I have to point out that we need a way to change the allocation
>> policy without changing the metadata otherwise we risk to be in the
>> loop of exhausting metadata space: - how we can increase the space for
>> metadata if we don't have space for metadata but I need to allocate
>> few block of metadata....
>>
>> What I mean is that even if we store the setting as filesystem
>> properties (and definitely we have to do), we need a way to override
>> in an emergency scenario.
> 
> There are no new issues introduced by this change, thus no requirement
> for a mount option to deal with new issues.
> 
> The same issue comes up when changing RAID profile, or removing devices,
> or when existing devices simply fill up.  Part of the solution is the
> global reserve, which ensures we can always create a transaction to modify
> a few metadata pages.
> 
> Part of the solution is a run-time check to ensure we have min_devs for
> active RAID profiles whenever we change a device policy to reject data
> or metadata (see btrfs_check_raid_min_devices).  This is currently
> implemented for the device remove ioctl, and a similar check will
> be needed for the device property set ioctl for preferred_metadata.
> That part is missing in v5 of this patch and will have to be added,
> though even now it works most of the time without.
> 
> v5 is also missing changes to the df statvfs code to deal with metadata
> only devices.  At this stage it's an RFC patch, so that's OK, but it
> will also need to be fixed.  We presume these will be addressed in future
> versions.  Again, it works now, but 'df' will give the wrong number.
> 
> None of the above requirements is addressed by a mount option, and
> the mount option adds new requirements that we don't want.
> 
>>> And then for the properties themselves, presumably we'll want to
>> add other FS wide properties in the future.  I'm not against adding
>> new actual keys and items to the tree itself, but is there a way
>> we could use our existing property infrastructure that we use for
>> compression, and simply store the xattrs in the tree root?  It looks
>> like we're just toggling a policy decision, and we don't actually
>> need the other properties in the item you've created, so why not
>> just a btrfs.preferred_metadata property with the value stored in it,
>> dropped into the tree_root so it can be read on mount?  Thanks,
>>
>> What if the root subvolume is not mounted ?
> 
> Same as device add or remove--if the filesystem isn't mounted, you can't
> make any changes.

I am referring to a case where a subvolume (id != 5) is mounted but not the root one (id=5).
The point is that (e.g.) in the current implementation you can use

$ sudo btrfs property set /dev/vde preferred_metadata 1

in any case where the filesystem is mounted (doesn't matter which
subvvolume).

I like the Josef idea: instead to develop a new api to retrive/change/list/store/delete/create
some setting, we could use the xattr api which already exists.

# adding properties
$ sudo setfattr -n "btrfs.metadata_preferred_disk=aabbcc" -v "mode=1"  /
$ sudo setfattr -n "btrfs.metadata_preferred_disk=aabbee" -v "mode=1"  /
$ sudo setfattr -n "btrfs.metadata_preferred" -v "1"  /

# listing properties and their values
$ sudo getfattr -d /
getfattr: Removing leading '/' from absolute path names
# file: .
btrfs.metadata_preferred_disk\075aabbcc="mode=1"
btrfs.metadata_preferred_disk\075aabbee="mode=1"
btrfs.metadata_preferred="1"

# removing a properties
$ sudo setfattr -x "btrfs.metadata_preferred_disk=aabbcc" /

However the xattr requires an inode to attach the .. xattrs. But when we are talking about
filesystem properties, which is the right inode ? The only stable inode is the '.' of
the root subvolume. The other inodes may be deleted so are not suitable to store per
filesystem properties.

So the point is: what happens if the root subvolume is not mounted ?

Of course we can tweak the xattr api to deal this case, but doing so is not
so different than developing a new api, which (I think) is not what Josef suggested.

> 
> Note that all the required properties are per-device, so really you just
> need any open FD on the filesystem.  (I think Josef didn't read that far
> down).
> 
> The per-device policy storage can go in dev_root (tree 4) along with the
> device stats data, if we don't want to use btrfs_device::type.  You'd still
> need an ioctl to get to it.
> 
> Or maybe I'm misreading Josef here, and his idea is to make the per-device
> configuration a string blob that can be set by putting an xattr on the
> root subvol?  I'm not sure that's better, but it'll work.
> 
>> Yes we can create a further
>> api to store/retrive this kind of metadata without mounting the root
>> subvolume, but doing so in what it would be different than adding a
>> key to the root fs like the default subvolume ioctl does ?
> 
>>>
>>> Josef
>>
>>
>> -- 
>> gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
>> Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
>>


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

  reply	other threads:[~2021-01-22 18:32 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-17 18:54 [RFC][PATCH V5] btrfs: preferred_metadata: preferred device for metadata Goffredo Baroncelli
2021-01-17 18:54 ` [PATCH 1/5] Add an ioctl to set the device properties Goffredo Baroncelli
2021-01-17 18:54 ` [PATCH 2/5] Add flags for dedicated metadata disks Goffredo Baroncelli
2021-01-17 18:54 ` [PATCH 3/5] Export dev_item.type in sysfs /sys/fs/btrfs/<uuid>/devinfo/<devid>/type Goffredo Baroncelli
2021-01-17 18:54 ` [PATCH 4/5] btrfs: add preferred_metadata option Goffredo Baroncelli
2021-01-17 18:54 ` [PATCH 5/5] btrfs: add preferred_metadata mode mount option Goffredo Baroncelli
2021-01-18  3:05   ` kernel test robot
2021-01-18  3:05     ` kernel test robot
2021-01-19 23:12 ` [RFC][PATCH V5] btrfs: preferred_metadata: preferred device for metadata Zygo Blaxell
2021-01-21  8:31   ` Martin Svec
2021-01-20 16:02 ` Josef Bacik
2021-01-20 16:15   ` Johannes Thumshirn
2021-01-20 16:17     ` Josef Bacik
2021-01-20 16:20   ` Zygo Blaxell
2021-01-21 18:16   ` Goffredo Baroncelli
2021-01-21 18:54     ` Zygo Blaxell
2021-01-22 18:31       ` Goffredo Baroncelli [this message]
2021-01-22 22:42         ` Zygo Blaxell
2021-01-23 14:55           ` Graham Cobb
2021-01-23 17:21             ` Zygo Blaxell
2021-01-23 17:44               ` Graham Cobb
2021-01-24  4:00                 ` Zygo Blaxell
2021-01-24 20:05                 ` Goffredo Baroncelli
2021-01-25 15:21 ` Josef Bacik
2023-01-15 17:00   ` Goffredo Baroncelli
2023-01-15 17:05     ` Goffredo Baroncelli
2023-01-16  8:20       ` Paul Jones

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=89ee4ab9-64cd-b093-92d2-02eee4997250@inwind.it \
    --to=kreijack@inwind.it \
    --cc=ce3g8jdj@umail.furryterror.org \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.