linux-xfs.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Eric Biggers <ebiggers@kernel.org>
Cc: Andrey Albershteyn <aalbersh@redhat.com>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC PATCH 00/11] fs-verity support for XFS
Date: Thu, 15 Dec 2022 10:06:32 +1100	[thread overview]
Message-ID: <20221214230632.GA1971568@dread.disaster.area> (raw)
In-Reply-To: <Y5ltzp6yeMo1oDSk@sol.localdomain>

On Tue, Dec 13, 2022 at 10:31:42PM -0800, Eric Biggers wrote:
> On Wed, Dec 14, 2022 at 09:11:39AM +1100, Dave Chinner wrote:
> > On Tue, Dec 13, 2022 at 12:50:28PM -0800, Eric Biggers wrote:
> > > On Tue, Dec 13, 2022 at 06:29:24PM +0100, Andrey Albershteyn wrote:
> > > > Not yet implemented:
> > > > - No pre-fetching of Merkle tree pages in the
> > > >   read_merkle_tree_page()
> > > 
> > > This would be helpful, but not essential.
> > > 
> > > > - No marking of already verified Merkle tree pages (each read, the
> > > >   whole tree is verified).
> > 
> > Ah, I wasn't aware that this was missing.
> > 
> > > 
> > > This is essential to have, IMO.
> > > 
> > > You *could* do what btrfs does, where it caches the Merkle tree pages in the
> > > inode's page cache past i_size, even though btrfs stores the Merkle tree
> > > separately from the file data on-disk.
> > >
> > > However, I'd guess that the other XFS developers would have an aversion to that
> > > approach, even though it would not affect the on-disk storage.
> > 
> > Yup, on an architectural level it just seems wrong to cache secure
> > verification metadata in the same user accessible address space as
> > the data it verifies.
> > 
> > > The alternatives would be to create a separate in-memory-only inode for the
> > > cache, or to build a custom cache with its own shrinker.
> > 
> > The merkle tree blocks are cached in the XFS buffer cache.
> > 
> > Andrey, could we just add a new flag to xfs_buf->b_flags to
> > indicate that the buffer contains verified merkle tree records?
> > i.e. if the flag is not set after we've read the buffer, we verify
> > the buffer and set the flag on the cached buffer; if the flag is
> > already set, we can skip the verification?
> 
> Well, my proposal at
> https://lore.kernel.org/r/20221028224539.171818-2-ebiggers@kernel.org is to keep
> tracking the "verified" status at the individual Merkle tree block level, by
> adding a bitmap fsverity_info::hash_block_verified.  That is part of the
> fs/verity/ infrastructure, and all filesystems would be able to use it.

Yeah, I had a look at that rewrite of the verification code last
night - I get the gist of what it is doing, but a single patch of
that complexity is largely impossible to review sanely...

Correct me if I'm wrong, but won't using a bitmap with 1 bit per
verified block cause problems with contiguous memory allocation
pretty quickly? i.e. a 64kB bitmap only tracks 512k blocks, which is
only 2GB of merkle tree data. Hence at file sizes of 100+GB, the
bitmap would have to be kvmalloc()d to guarantee allocation will
succeed.
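
As a back-of-the-envelope check of those numbers (a sketch in plain
userspace C, not kernel code; 4kB tree blocks assumed):

```c
#include <stdint.h>

/* One bit per 4kB Merkle tree block: for 2GB of tree data that is
 * 524288 blocks, i.e. a 64kB bitmap - already an order-4 contiguous
 * allocation, hence the kvmalloc() concern above. */
static uint64_t bitmap_bytes_for_merkle(uint64_t merkle_bytes)
{
	uint64_t blocks = merkle_bytes / 4096;	/* 4kB tree blocks */

	return (blocks + 7) / 8;		/* one bit per block */
}
```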

I'm not really worried about the bitmap memory usage, just that it
handles large contiguous allocations sanely. I suspect we may
eventually need a sparse bitmap (e.g. the old btrfs bit-radix
implementation) to track verification in really large files
efficiently.

> However, since it's necessary to re-verify blocks that have been evicted and
> then re-instantiated, my patch also repurposes PG_checked as an indicator for
> whether the Merkle tree pages are newly instantiated.  For a "non-page-cache
> cache", that part would need to be replaced with something equivalent.

Which we could get as a boolean state from the XFS buffer cache
fairly easily - did we find the buffer in cache, or was it read from
disk...
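
The b_flags idea above could look something like this sketch - the
flag name and the stripped-down struct are invented for illustration,
not the real xfs_buf definitions:

```c
#include <stdbool.h>

#define XBF_VERITY_CHECKED	(1u << 0)	/* invented flag bit */

/* Stand-in for struct xfs_buf; real fields elided. */
struct xfs_buf {
	unsigned int	b_flags;
};

/* Returns true if the caller must (re)verify the buffer contents,
 * i.e. the buffer was newly read from disk or never verified. */
static bool merkle_buf_needs_verify(struct xfs_buf *bp)
{
	return !(bp->b_flags & XBF_VERITY_CHECKED);
}

/* Called after the merkle tree records in the buffer pass
 * verification; subsequent cache hits can then skip the work. */
static void merkle_buf_mark_verified(struct xfs_buf *bp)
{
	bp->b_flags |= XBF_VERITY_CHECKED;
}
```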

> A different approach would be to make it so that every time a page (or "cache
> buffer", to call it something more generic) of N Merkle tree blocks is read,
> then all N of those blocks are verified immediately.  Then there would be no
> need to track the "verified" status of individual blocks.

That won't work with XFS - merkle tree blocks are not contiguous in
the attribute b-tree so there is no efficient "sequential bulk read"
option available. The xattr structure is largely chosen because it
allows for fast, deterministic single merkle tree block
operations....

> My concerns with that approach are:
> 
>   * Most data reads only need a single Merkle tree block at the deepest level.

Yup, see above. :)

>     If at least N tree blocks were verified any time that any were verified at
>     all, that would make the worst-case read latency worse.

*nod*

>   * It's possible that the parents of N tree blocks are split across a cache
>     buffer.  Thus, while N blocks can't have more than N parents, and in
>     practice would just have 1-2, those 2 parents could be split into two
>     separate cache buffers, with a total length of 2*N.  Verifying all of those
>     would really increase the worst-case latency as well.
> 
> So I'm thinking that tracking the "verified" status of tree blocks individually
> is the right way to go.  But I'd appreciate any other thoughts on this.

I think that having the fsverity code track verified indexes itself
is much more flexible and self-contained, and the right way to go
about it.

The other issue is that verify_page() assumes that it can drop the
reference to the cached object itself - the caller actually owns the
reference to the object, not the verify_page() code. Hence if we are
passing opaque buffers to verify_page() rather than page cache pages, we
need a ->drop_block method that gets called instead of put_page().
This will allow the filesystem to hold a reference to the merkle
tree block data while the verification occurs, ensuring that they
don't get reclaimed by memory pressure whilst still in use...
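
A minimal shape for that callback might be the following - the names
are invented for illustration, not the actual fs/verity API:

```c
#include <stddef.h>

/* Instead of the verity core calling put_page() on an object it
 * never owned, the filesystem supplies a release callback. */
struct fsverity_blk_ops {
	void (*drop_block)(void *fs_private, void *block_data);
};

struct merkle_block_ref {
	void	*data;		/* pointer into the fs's cached buffer */
	void	*fs_private;	/* e.g. the xfs_buf holding the data */
};

/* The fs keeps its reference alive across verification and drops it
 * here, so memory pressure can't reclaim the block mid-hash. */
static void fsverity_release_block(const struct fsverity_blk_ops *ops,
				   struct merkle_block_ref *ref)
{
	ops->drop_block(ref->fs_private, ref->data);
}

/* Demo callback: just count the drops. */
static int demo_drops;
static void demo_drop(void *fs_private, void *block_data)
{
	(void)fs_private;
	(void)block_data;
	demo_drops++;
}
```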

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
