All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Sterba <dsterba@suse.cz>
To: Qu Wenruo <wqu@suse.com>
Cc: Boris Burkov <boris@bur.io>, Qu Wenruo <quwenruo.btrfs@gmx.com>,
	linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 3/9] btrfs: track original extent subvol in a new inline ref
Date: Wed, 10 May 2023 01:58:44 +0200	[thread overview]
Message-ID: <20230509235844.GO32559@twin.jikos.cz> (raw)
In-Reply-To: <e92837ae-0a14-21ee-1d2e-699165391ff2@suse.com>

On Fri, May 05, 2023 at 05:49:00AM +0800, Qu Wenruo wrote:
> On 2023/5/5 00:17, Boris Burkov wrote:
> > On Wed, May 03, 2023 at 11:17:12AM +0800, Qu Wenruo wrote:
> >> On 2023/5/3 08:59, Boris Burkov wrote:
> >>> In order to implement simple quota groups, we need to be able to
> >>> associate a data extent with the subvolume that created it. Once you
> >>> account for reflink, this information cannot be recovered without
> >>> explicitly storing it. Options for storing it are:
> >>> - a new key/item
> >>> - a new extent inline ref item
> >>>
> >>> The former is backwards compatible, but wastes space, the latter is
> >>> incompat, but is efficient in space and reuses the existing inline ref
> >>> machinery, while only abusing it a tiny amount -- specifically, the new
> >>> item is not a ref, per-se.
> >>
> >> Even we introduce new extent tree items, we can still mark the fs compat_ro.
> >>
> >> As long as we don't do any writes, we can still read the fs without any
> >> compatibility problem, and the enable/disable should be addressed by
> >> btrfstune/mkfs anyway.
> > 
> > Unfortunately, I don't believe compat_ro is possible with this design.
> > Because of how inline ref items are implemented, there is a lot of code
> > that makes assumptions about the extent item size, and the inline ref
> > item size based on their type. The best example that definitely breaks
> > things rather than maybe just warning is check_extent in tree-checker.c
> 
> IIRC if it's compat_ro, older kernel would reject the block group items 
> read.
> 
> If we expand that behavior to reject the whole extent tree, it can stay 
> compat_ro.
> Although you may need to do extra backports.
> 
> > 
> > With a new unparseable ref item inserted in the sequence of refs, that
> > code will either overflow or detect padding. The size calculation comes
> > up 0, etc. Perhaps there is a clever way to trick it, but I have not
> > seen it yet.
> > 
> > I was able to make it compat_ro by introducing an entirely new item for
> > the owner ref, but that comes with a per extent disk usage tradeoff that
> > is fairly steep for storing just a single u64.
> 
> If it's only to glue the original ref to an extent, I guess a new key 
> without an item would be enough.
> Although that's still quite expensive.

I consider allocating a new key as a high cost, it's worth for new
feature like verity or encryption where we require a fine grained
tracking of some new information. The number space is 255 values wide
and there are some ranges that are relatively ordered so the placement
in the logical b-tree space is good. We still have enough free values
but the gaps get smaller each time so I'd rather consider other options
first.

One drawback with features defined by keys is that it can't be easily
seen from superblock if the feature is present or not. Like extended
refs, no holes, lzo/zstd compressed extents. We always need to add the
compat bit. In case we would add a new key just to store little data and
still need to add the incompat bit it's time to think again if we could
get away with just the incompat bit. With some loss of backward
compatibility.

Right now I don't know what would be the best way forward but I'm
leaning more towards less backward compatibility and saving space in
structures. We get new incompat features "regularly" and people move to
newer kernels eventually after some period where we have time to iron
out bugs and explore the use case.

The simple quotas should fill the gap that qgroups can't so it makes
sense and people have been asking for something like that.

  reply	other threads:[~2023-05-10  0:04 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-03  0:59 [PATCH RFC 0/9] btrfs: simple quotas Boris Burkov
2023-05-03  0:59 ` [PATCH 1/9] btrfs: simple quotas mode Boris Burkov
2023-05-03  2:38   ` Qu Wenruo
2023-05-03  0:59 ` [PATCH 2/9] btrfs: new function for recording simple quota usage Boris Burkov
2023-05-03  0:59 ` [PATCH 3/9] btrfs: track original extent subvol in a new inline ref Boris Burkov
2023-05-03  3:17   ` Qu Wenruo
2023-05-04 16:17     ` Boris Burkov
2023-05-04 21:49       ` Qu Wenruo
2023-05-09 23:58         ` David Sterba [this message]
2023-05-10 16:57     ` David Sterba
2023-05-11  0:07       ` Qu Wenruo
2023-05-13  6:31       ` Qu Wenruo
2023-05-03  0:59 ` [PATCH 4/9] btrfs: track metadata owning root in delayed refs Boris Burkov
2023-05-03  0:59 ` [PATCH 5/9] btrfs: record simple quota deltas Boris Burkov
2023-05-03  0:59 ` [PATCH 6/9] btrfs: auto hierarchy for simple qgroups of nested subvols Boris Burkov
2023-05-03  4:47   ` Qu Wenruo
2023-05-04 16:34     ` Boris Burkov
2023-05-04 21:55       ` Qu Wenruo
2023-05-03  0:59 ` [PATCH 7/9] btrfs: check generation when recording simple quota delta Boris Burkov
2023-05-03  0:59 ` [PATCH 8/9] btrfs: expose the qgroup mode via sysfs Boris Burkov
2023-05-03  0:59 ` [PATCH 9/9] btrfs: free qgroup rsv on io failure Boris Burkov
2023-05-05  4:13 ` [PATCH RFC 0/9] btrfs: simple quotas Anand Jain
2023-05-10  1:09   ` Boris Burkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230509235844.GO32559@twin.jikos.cz \
    --to=dsterba@suse.cz \
    --cc=boris@bur.io \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo.btrfs@gmx.com \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.