linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Goffredo Baroncelli <kreijack@inwind.it>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: Josef Bacik <josef@toxicpanda.com>, linux-btrfs@vger.kernel.org
Subject: Re: [RFC][PATCH V5] btrfs: preferred_metadata: preferred device for metadata
Date: Fri, 22 Jan 2021 19:31:18 +0100	[thread overview]
Message-ID: <89ee4ab9-64cd-b093-92d2-02eee4997250@inwind.it> (raw)
In-Reply-To: <20210121185400.GH28049@hungrycats.org>

On 1/21/21 7:54 PM, Zygo Blaxell wrote:
> On Thu, Jan 21, 2021 at 07:16:05PM +0100, Goffredo Baroncelli wrote:
>> On 1/20/21 5:02 PM, Josef Bacik wrote:
>>> On 1/17/21 1:54 PM, Goffredo Baroncelli wrote:
>>>>
>>>> Hi all,
>>>>
>>>> This is an RFC; I wrote this patch because I find the idea interesting
>>>> even though it adds more complication to the chunk allocator.
>>>>
>>>> The basic idea is to store the metadata chunk in the fasters disks.
>>>> The fasters disk are marked by the "preferred_metadata" flag.
>>>>
>>>> BTRFS when allocate a new metadata/system chunk, selects the
>>>> "preferred_metadata" disks, otherwise it selectes the non
>>>> "preferred_metadata" disks. The intial patch allowed to use the other
>>>> kind of disk in case a set is full.
>>>>
>>>> This patches set is based on v5.11-rc2.
>>>>
>>>> For now, the only user of this patch that I am aware is Zygo.
>>>> However he asked to further constraint the allocation: i.e. avoid to
>>>> allocated metadata on a not "preferred_metadata"
>>>> disk. So I extended the patch adding 4 modes to operate.
>>>>
>>>> This is enabled passing the option "preferred_metadata=<mode>" at
>>>> mount time.
>>>>
>>>
>>> I'll echo Zygo's hatred for mount options.  The more complicated policy decisions belong in properties and sysfs knobs, not mount options.
>>>
>> I tend to agree. However adding a filesystem property can be done in a second time. I don't think that this a problem. However I prefer to make the patch smaller.
>>
>> Anyway I have to point out that we need a way to change the allocation
>> policy without changing the metadata otherwise we risk to be in the
>> loop of exhausting metadata space: - how we can increase the space for
>> metadata if we don't have space for metadata but I need to allocate
>> few block of metadata....
>>
>> What I mean is that even if we store the setting as filesystem
>> properties (and definitely we have to do), we need a way to override
>> in an emergency scenario.
> 
> There are no new issues introduced by this change, thus no requirement
> for a mount option to deal with new issues.
> 
> The same issue comes up when changing RAID profile, or removing devices,
> or when existing devices simply fill up.  Part of the solution is the
> global reserve, which ensures we can always create a transaction to modify
> a few metadata pages.
> 
> Part of the solution is a run-time check to ensure we have min_devs for
> active RAID profiles whenever we change a device policy to reject data
> or metadata (see btrfs_check_raid_min_devices).  This is currently
> implemented for the device remove ioctl, and a similar check will
> be needed for the device property set ioctl for preferred_metadata.
> That part is missing in v5 of this patch and will have to be added,
> though even now it works most of the time without.
> 
> v5 is also missing changes to the df statvfs code to deal with metadata
> only devices.  At this stage it's an RFC patch, so that's OK, but it
> will also need to be fixed.  We presume these will be addressed in future
> versions.  Again, it works now, but 'df' will give the wrong number.
> 
> None of the above requirements is addressed by a mount option, and
> the mount option adds new requirements that we don't want.
> 
>>> And then for the properties themselves, presumably we'll want to
>> add other FS wide properties in the future.  I'm not against adding
>> new actual keys and items to the tree itself, but is there a way
>> we could use our existing property infrastructure that we use for
>> compression, and simply store the xattrs in the tree root?  It looks
>> like we're just toggling a policy decision, and we don't actually
>> need the other properties in the item you've created, so why not
>> just a btrfs.preferred_metadata property with the value stored in it,
>> dropped into the tree_root so it can be read on mount?  Thanks,
>>
>> What if the root subvolume is not mounted ?
> 
> Same as device add or remove--if the filesystem isn't mounted, you can't
> make any changes.

I am referring to a case where a subvolume (id != 5) is mounted but not the root one (id=5).
The point is that (e.g.) in the current implementation you can use

$ sudo btrfs property set /dev/vde preferred_metadata 1

in any case where the filesystem is mounted (doesn't matter which
subvvolume).

I like the Josef idea: instead to develop a new api to retrive/change/list/store/delete/create
some setting, we could use the xattr api which already exists.

# adding properties
$ sudo setfattr -n "btrfs.metadata_preferred_disk=aabbcc" -v "mode=1"  /
$ sudo setfattr -n "btrfs.metadata_preferred_disk=aabbee" -v "mode=1"  /
$ sudo setfattr -n "btrfs.metadata_preferred" -v "1"  /

# listing properties and their values
$ sudo getfattr -d /
getfattr: Removing leading '/' from absolute path names
# file: .
btrfs.metadata_preferred_disk\075aabbcc="mode=1"
btrfs.metadata_preferred_disk\075aabbee="mode=1"
btrfs.metadata_preferred="1"

# removing a properties
$ sudo setfattr -x "btrfs.metadata_preferred_disk=aabbcc" /

However the xattr requires an inode to attach the .. xattrs. But when we are talking about
filesystem properties, which is the right inode ? The only stable inode is the '.' of
the root subvolume. The other inodes may be deleted so are not suitable to store per
filesystem properties.

So the point is: what happens if the root subvolume is not mounted ?

Of course we can tweak the xattr api to deal this case, but doing so is not
so different than developing a new api, which (I think) is not what Josef suggested.

> 
> Note that all the required properties are per-device, so really you just
> need any open FD on the filesystem.  (I think Josef didn't read that far
> down).
> 
> The per-device policy storage can go in dev_root (tree 4) along with the
> device stats data, if we don't want to use btrfs_device::type.  You'd still
> need an ioctl to get to it.
> 
> Or maybe I'm misreading Josef here, and his idea is to make the per-device
> configuration a string blob that can be set by putting an xattr on the
> root subvol?  I'm not sure that's better, but it'll work.
> 
>> Yes we can create a further
>> api to store/retrive this kind of metadata without mounting the root
>> subvolume, but doing so in what it would be different than adding a
>> key to the root fs like the default subvolume ioctl does ?
> 
>>>
>>> Josef
>>
>>
>> -- 
>> gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
>> Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
>>


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

  reply	other threads:[~2021-01-22 18:32 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-17 18:54 [RFC][PATCH V5] btrfs: preferred_metadata: preferred device for metadata Goffredo Baroncelli
2021-01-17 18:54 ` [PATCH 1/5] Add an ioctl to set the device properties Goffredo Baroncelli
2021-01-17 18:54 ` [PATCH 2/5] Add flags for dedicated metadata disks Goffredo Baroncelli
2021-01-17 18:54 ` [PATCH 3/5] Export dev_item.type in sysfs /sys/fs/btrfs/<uuid>/devinfo/<devid>/type Goffredo Baroncelli
2021-01-17 18:54 ` [PATCH 4/5] btrfs: add preferred_metadata option Goffredo Baroncelli
2021-01-17 18:54 ` [PATCH 5/5] btrfs: add preferred_metadata mode mount option Goffredo Baroncelli
2021-01-18  3:05   ` kernel test robot
2021-01-19 23:12 ` [RFC][PATCH V5] btrfs: preferred_metadata: preferred device for metadata Zygo Blaxell
2021-01-21  8:31   ` Martin Svec
2021-01-20 16:02 ` Josef Bacik
2021-01-20 16:15   ` Johannes Thumshirn
2021-01-20 16:17     ` Josef Bacik
2021-01-20 16:20   ` Zygo Blaxell
2021-01-21 18:16   ` Goffredo Baroncelli
2021-01-21 18:54     ` Zygo Blaxell
2021-01-22 18:31       ` Goffredo Baroncelli [this message]
2021-01-22 22:42         ` Zygo Blaxell
2021-01-23 14:55           ` Graham Cobb
2021-01-23 17:21             ` Zygo Blaxell
2021-01-23 17:44               ` Graham Cobb
2021-01-24  4:00                 ` Zygo Blaxell
2021-01-24 20:05                 ` Goffredo Baroncelli
2021-01-25 15:21 ` Josef Bacik
2023-01-15 17:00   ` Goffredo Baroncelli
2023-01-15 17:05     ` Goffredo Baroncelli
2023-01-16  8:20       ` Paul Jones

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=89ee4ab9-64cd-b093-92d2-02eee4997250@inwind.it \
    --to=kreijack@inwind.it \
    --cc=ce3g8jdj@umail.furryterror.org \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).