* [PATCH v4 00/25] fs-verity support for XFS
@ 2024-02-12 16:57 Andrey Albershteyn
  2024-02-12 16:57 ` [PATCH v4 01/25] fsverity: remove hash page spin lock Andrey Albershteyn
                   ` (24 more replies)
  0 siblings, 25 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:57 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn

Hi all,

Here's v4 of my patchset adding fs-verity support to XFS.

This implementation uses extended attributes to store fs-verity
metadata. The Merkle tree blocks are stored in the remote extended
attributes. The names are offsets into the tree.
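
To make the "names are offsets" point concrete, here is a minimal userspace
sketch of how a byte offset into the Merkle tree could be turned into a
fixed-size binary xattr name and back. The struct and function names are
hypothetical, and the 8-byte big-endian layout is an assumption made for
illustration; it is not taken from the patches in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical xattr name for one Merkle tree block: the byte offset of
 * the block within the tree, stored big-endian. The layout is an
 * assumption for illustration only.
 */
struct merkle_key {
	uint8_t be_offset[8];
};

/* Encode a tree offset into a binary xattr name. */
static void merkle_key_from_offset(struct merkle_key *key, uint64_t offset)
{
	assert(key != NULL);
	for (size_t i = 0; i < 8; i++)
		key->be_offset[i] = (uint8_t)(offset >> (56 - 8 * i));
}

/* Decode the tree offset from a binary xattr name. */
static uint64_t merkle_key_to_offset(const struct merkle_key *key)
{
	uint64_t offset = 0;

	assert(key != NULL);
	for (size_t i = 0; i < 8; i++)
		offset = (offset << 8) | key->be_offset[i];
	return offset;
}
```

A name like this is not a NUL-terminated string, which is why the series
carries the parent pointer patches that add binary xattr names.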

A few key points of this patchset:
- fs-verity can work with Merkle tree block-based caching (xfs) and
  page-based caching (ext4, f2fs, btrfs)
- iomap does fs-verity verification. Filesystem has to provide
  workqueue only.
- In XFS, fs-verity metadata is stored in extended attributes
- New global XFS workqueue for verification processing
- Inodes with fs-verity have new on-disk diflag
- xfs_attr_get() can return a buffer with an extended attribute
- xfs_buf can allocate double space for Merkle tree blocks. Part of
  the space is used to store the extended attribute data without
  leaf headers
- xfs_buf tracks verified status of merkle tree blocks

The patchset consists of six parts:
- [1]: fs-verity spinlock removal pending in fsverity/for-next
- [2..4]: Parent pointers adding binary xattr names
- [5]: Expose FS_XFLAG_VERITY for fs-verity files
- [6..9]: Changes to fs-verity core
- [10]: Integrate fs-verity to iomap
- [11..25]: Add fs-verity support to XFS

Testing:
The patchset is tested with xfstests -g quick on xfs_1k, xfs_4k,
xfs_1k_quota, xfs_4k_quota, ext4_4k, and ext4_4k_quota, with
KMEMLEAK and KASAN enabled. More testing is on the way.

Changes from V3:
- redone changes to fs-verity core as previous version had an issue
  on ext4
- add blocks invalidation interface to fs-verity
- move memory ordering primitives out of block status check to fs
  read block function
- add fs-verity verification to iomap instead of general post read
  processing
Changes from V2:
- FS_XFLAG_VERITY extended attribute flag
- Change fs-verity to use Merkle tree blocks instead of expecting
  PAGE references from filesystem
- Change approach in iomap to filesystem provided bio_set and
  submit_io instead of just callouts to filesystem
- Add possibility for xfs_buf to allocate more space for fs-verity
  extended attributes
- Make xfs_attr module copy fs-verity blocks inside the xfs_buf,
  so XFS can get data without leaf headers
- Add Merkle tree removal for error path
- Make scrub aware of new dinode flag
Changes from V1:
- Added parent pointer patches for easier testing
- Many issues and refactoring points fixed from the V1 review
- Adjusted for recent changes in fs-verity core (folios, non-4k)
- Dropped disabling of large folios
- Completely new fsverity patches (fix, callout, log_blocksize)
- Change approach to verification in iomap to the same one as in
  write path. Callouts to fs instead of direct fs-verity use.
- New XFS workqueue for post read folio verification
- xfs_attr_get() can return underlying xfs_buf
- xfs_bufs are marked with XBF_VERITY_CHECKED to track verified
  blocks

kernel:
[1]: https://github.com/alberand/linux/tree/fsverity-v4

xfsprogs:
[2]: https://github.com/alberand/xfsprogs/tree/fsverity-v4

xfstests:
[3]: https://github.com/alberand/xfstests/tree/fsverity-v4

v1:
[4]: https://lore.kernel.org/linux-xfs/20221213172935.680971-1-aalbersh@redhat.com/

v2:
[5]: https://lore.kernel.org/linux-xfs/20230404145319.2057051-1-aalbersh@redhat.com/

v3:
[6]: https://lore.kernel.org/all/20231006184922.252188-1-aalbersh@redhat.com/

fs-verity:
[7]: https://www.kernel.org/doc/html/latest/filesystems/fsverity.html

Thanks,
Andrey

Allison Henderson (3):
  xfs: add parent pointer support to attribute code
  xfs: define parent pointer ondisk extended attribute format
  xfs: add parent pointer validator functions

Andrey Albershteyn (22):
  fsverity: remove hash page spin lock
  fs: add FS_XFLAG_VERITY for verity files
  fsverity: pass log_blocksize to end_enable_verity()
  fsverity: support block-based Merkle tree caching
  fsverity: calculate readahead in bytes instead of pages
  fsverity: add tracepoints
  iomap: integrate fsverity verification into iomap's read path
  xfs: add XBF_VERITY_SEEN xfs_buf flag
  xfs: add XFS_DA_OP_BUFFER to make xfs_attr_get() return buffer
  xfs: introduce workqueue for post read IO work
  xfs: add attribute type for fs-verity
  xfs: make xfs_buf_get() to take XBF_* flags
  xfs: add XBF_DOUBLE_ALLOC to increase size of the buffer
  xfs: add fs-verity ro-compat flag
  xfs: add inode on-disk VERITY flag
  xfs: initialize fs-verity on file open and cleanup on inode
    destruction
  xfs: don't allow to enable DAX on fs-verity sealed inode
  xfs: disable direct read path for fs-verity files
  xfs: add fs-verity support
  xfs: make scrub aware of verity dinode flag
  xfs: add fs-verity ioctls
  xfs: enable ro-compat fs-verity flag

 Documentation/filesystems/fsverity.rst |  12 +
 fs/btrfs/verity.c                      |   4 +-
 fs/erofs/data.c                        |   4 +-
 fs/ext4/verity.c                       |   3 +-
 fs/f2fs/verity.c                       |   3 +-
 fs/gfs2/aops.c                         |   4 +-
 fs/ioctl.c                             |  11 +
 fs/iomap/buffered-io.c                 | 102 +++++++-
 fs/verity/enable.c                     |   9 +-
 fs/verity/fsverity_private.h           |  30 ++-
 fs/verity/init.c                       |   1 +
 fs/verity/open.c                       |   9 +-
 fs/verity/read_metadata.c              |  48 ++--
 fs/verity/signature.c                  |   2 +
 fs/verity/verify.c                     | 315 +++++++++++++++-------
 fs/xfs/Makefile                        |   2 +
 fs/xfs/libxfs/xfs_attr.c               |  31 ++-
 fs/xfs/libxfs/xfs_attr.h               |   3 +-
 fs/xfs/libxfs/xfs_attr_leaf.c          |  24 +-
 fs/xfs/libxfs/xfs_attr_remote.c        |  39 ++-
 fs/xfs/libxfs/xfs_da_btree.h           |   5 +-
 fs/xfs/libxfs/xfs_da_format.h          |  68 ++++-
 fs/xfs/libxfs/xfs_format.h             |  14 +-
 fs/xfs/libxfs/xfs_log_format.h         |   2 +
 fs/xfs/libxfs/xfs_ondisk.h             |   4 +
 fs/xfs/libxfs/xfs_parent.c             | 113 ++++++++
 fs/xfs/libxfs/xfs_parent.h             |  19 ++
 fs/xfs/libxfs/xfs_sb.c                 |   4 +-
 fs/xfs/scrub/attr.c                    |   4 +-
 fs/xfs/xfs_aops.c                      |  15 +-
 fs/xfs/xfs_attr_item.c                 |   6 +-
 fs/xfs/xfs_attr_list.c                 |  14 +-
 fs/xfs/xfs_buf.c                       |   6 +-
 fs/xfs/xfs_buf.h                       |  23 +-
 fs/xfs/xfs_file.c                      |  23 +-
 fs/xfs/xfs_inode.c                     |   2 +
 fs/xfs/xfs_inode.h                     |   3 +-
 fs/xfs/xfs_ioctl.c                     |  22 ++
 fs/xfs/xfs_iops.c                      |   4 +
 fs/xfs/xfs_linux.h                     |   1 +
 fs/xfs/xfs_mount.h                     |   3 +
 fs/xfs/xfs_super.c                     |  19 ++
 fs/xfs/xfs_trace.h                     |   4 +-
 fs/xfs/xfs_verity.c                    | 348 +++++++++++++++++++++++++
 fs/xfs/xfs_verity.h                    |  33 +++
 fs/xfs/xfs_xattr.c                     |  10 +
 fs/zonefs/file.c                       |   4 +-
 include/linux/fsverity.h               |  73 +++++-
 include/linux/iomap.h                  |   6 +-
 include/trace/events/fsverity.h        | 184 +++++++++++++
 include/uapi/linux/fs.h                |   1 +
 51 files changed, 1494 insertions(+), 199 deletions(-)
 create mode 100644 fs/xfs/libxfs/xfs_parent.c
 create mode 100644 fs/xfs/libxfs/xfs_parent.h
 create mode 100644 fs/xfs/xfs_verity.c
 create mode 100644 fs/xfs/xfs_verity.h
 create mode 100644 include/trace/events/fsverity.h

-- 
2.42.0


* [PATCH v4 01/25] fsverity: remove hash page spin lock
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
@ 2024-02-12 16:57 ` Andrey Albershteyn
  2024-02-12 16:57 ` [PATCH v4 02/25] xfs: add parent pointer support to attribute code Andrey Albershteyn
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:57 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn, Eric Biggers

The spin lock is not necessary here as it can be replaced with a
memory barrier, which should perform better.

When Merkle tree block size differs from page size, in
is_hash_block_verified() two things are modified during check - a
bitmap and PG_checked flag of the page.

Each bit in the bitmap represents the verification status of one
Merkle tree block. The PG_checked flag tells whether the page was just
re-instantiated or was already in the pagecache. Both of these states
are shared between verification threads. A page which was
re-instantiated cannot have already verified blocks (bit set in bitmap).

The spin lock was used to allow only one thread to modify both of
these states and keep the order of operations. The only requirement
here is that PG_checked is set strictly after the bitmap is updated.
This way other threads which see that PG_checked=1 (page cached)
know that the bitmap is up-to-date. Otherwise, if PG_checked is set
before the bitmap is cleared, other threads can see bit=1 and therefore
will not perform verification of that Merkle tree block.

However, there's still the case when one thread is setting a bit in
verify_data_block() and another thread is clearing it in
is_hash_block_verified(). This can happen if two threads get to the
!PageChecked branch and one of the threads is rescheduled before
resetting the bitmap. This is fine as at worst blocks are
re-verified in each thread.
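
The ordering requirement above can be modelled in userspace with C11
atomics. This is only a sketch under stated assumptions: the struct and
function names are hypothetical, release/acquire stores and loads stand in
for the kernel's smp_wmb()/smp_rmb() pairing, and a single word models the
bitmap bits belonging to one hash page.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_BLOCKS 32u	/* hash blocks tracked by this toy model */

/* Hypothetical userspace model of one hash page and its bitmap bits. */
struct hash_page_model {
	atomic_ulong verified_bitmap;	/* one bit per hash block */
	atomic_bool checked;		/* stands in for PG_checked */
};

/* Lockless check, mirroring the shape of is_hash_block_verified(). */
static bool model_is_block_verified(struct hash_page_model *p, unsigned int idx)
{
	assert(idx < MODEL_BLOCKS);

	/* Acquire pairs with the release store below. */
	if (atomic_load_explicit(&p->checked, memory_order_acquire)) {
		unsigned long bits = atomic_load_explicit(&p->verified_bitmap,
							  memory_order_relaxed);
		return (bits & (1UL << idx)) != 0;
	}

	/*
	 * Page looks new: drop stale bits first, then publish the flag.
	 * Several threads may run this section; repeating it at worst
	 * causes a block to be verified again.
	 */
	atomic_store_explicit(&p->verified_bitmap, 0, memory_order_relaxed);
	atomic_store_explicit(&p->checked, true, memory_order_release);
	return false;
}

/* Record a successful verification of block idx. */
static void model_mark_verified(struct hash_page_model *p, unsigned int idx)
{
	assert(idx < MODEL_BLOCKS);
	atomic_fetch_or_explicit(&p->verified_bitmap, 1UL << idx,
				 memory_order_relaxed);
}
```

A thread that observes checked == true through the acquire load is
guaranteed to also observe the bitmap clearing that preceded the release
store, which is the property the spin lock used to provide.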

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/verity/fsverity_private.h |  1 -
 fs/verity/open.c             |  1 -
 fs/verity/verify.c           | 48 ++++++++++++++++++------------------
 3 files changed, 24 insertions(+), 26 deletions(-)

diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index a6a6b2749241..b3506f56e180 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -69,7 +69,6 @@ struct fsverity_info {
 	u8 file_digest[FS_VERITY_MAX_DIGEST_SIZE];
 	const struct inode *inode;
 	unsigned long *hash_block_verified;
-	spinlock_t hash_page_init_lock;
 };
 
 #define FS_VERITY_MAX_SIGNATURE_SIZE	(FS_VERITY_MAX_DESCRIPTOR_SIZE - \
diff --git a/fs/verity/open.c b/fs/verity/open.c
index 6c31a871b84b..fdeb95eca3af 100644
--- a/fs/verity/open.c
+++ b/fs/verity/open.c
@@ -239,7 +239,6 @@ struct fsverity_info *fsverity_create_info(const struct inode *inode,
 			err = -ENOMEM;
 			goto fail;
 		}
-		spin_lock_init(&vi->hash_page_init_lock);
 	}
 
 	return vi;
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index 904ccd7e8e16..4fcad0825a12 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -19,7 +19,6 @@ static struct workqueue_struct *fsverity_read_workqueue;
 static bool is_hash_block_verified(struct fsverity_info *vi, struct page *hpage,
 				   unsigned long hblock_idx)
 {
-	bool verified;
 	unsigned int blocks_per_page;
 	unsigned int i;
 
@@ -43,12 +42,20 @@ static bool is_hash_block_verified(struct fsverity_info *vi, struct page *hpage,
 	 * re-instantiated from the backing storage are re-verified.  To do
 	 * this, we use PG_checked again, but now it doesn't really mean
 	 * "checked".  Instead, now it just serves as an indicator for whether
-	 * the hash page is newly instantiated or not.
+	 * the hash page is newly instantiated or not.  If the page is new, as
+	 * indicated by PG_checked=0, we clear the bitmap bits for the page's
+	 * blocks since they are untrustworthy, then set PG_checked=1.
+	 * Otherwise we return the bitmap bit for the requested block.
 	 *
-	 * The first thread that sees PG_checked=0 must clear the corresponding
-	 * bitmap bits, then set PG_checked=1.  This requires a spinlock.  To
-	 * avoid having to take this spinlock in the common case of
-	 * PG_checked=1, we start with an opportunistic lockless read.
+	 * Multiple threads may execute this code concurrently on the same page.
+	 * This is safe because we use memory barriers to ensure that if a
+	 * thread sees PG_checked=1, then it also sees the associated bitmap
+	 * clearing to have occurred.  Also, all writes and their corresponding
+	 * reads are atomic, and all writes are safe to repeat in the event that
+	 * multiple threads get into the PG_checked=0 section.  (Clearing a
+	 * bitmap bit again at worst causes a hash block to be verified
+	 * redundantly.  That event should be very rare, so it's not worth using
+	 * a lock to avoid.  Setting PG_checked again has no effect.)
 	 */
 	if (PageChecked(hpage)) {
 		/*
@@ -58,24 +65,17 @@ static bool is_hash_block_verified(struct fsverity_info *vi, struct page *hpage,
 		smp_rmb();
 		return test_bit(hblock_idx, vi->hash_block_verified);
 	}
-	spin_lock(&vi->hash_page_init_lock);
-	if (PageChecked(hpage)) {
-		verified = test_bit(hblock_idx, vi->hash_block_verified);
-	} else {
-		blocks_per_page = vi->tree_params.blocks_per_page;
-		hblock_idx = round_down(hblock_idx, blocks_per_page);
-		for (i = 0; i < blocks_per_page; i++)
-			clear_bit(hblock_idx + i, vi->hash_block_verified);
-		/*
-		 * A write memory barrier is needed here to give RELEASE
-		 * semantics to the below SetPageChecked() operation.
-		 */
-		smp_wmb();
-		SetPageChecked(hpage);
-		verified = false;
-	}
-	spin_unlock(&vi->hash_page_init_lock);
-	return verified;
+	blocks_per_page = vi->tree_params.blocks_per_page;
+	hblock_idx = round_down(hblock_idx, blocks_per_page);
+	for (i = 0; i < blocks_per_page; i++)
+		clear_bit(hblock_idx + i, vi->hash_block_verified);
+	/*
+	 * A write memory barrier is needed here to give RELEASE semantics to
+	 * the below SetPageChecked() operation.
+	 */
+	smp_wmb();
+	SetPageChecked(hpage);
+	return false;
 }
 
 /*
-- 
2.42.0


* [PATCH v4 02/25] xfs: add parent pointer support to attribute code
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
  2024-02-12 16:57 ` [PATCH v4 01/25] fsverity: remove hash page spin lock Andrey Albershteyn
@ 2024-02-12 16:57 ` Andrey Albershteyn
  2024-02-12 16:58 ` [PATCH v4 03/25] xfs: define parent pointer ondisk extended attribute format Andrey Albershteyn
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:57 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Allison Henderson, Mark Tinguely, Dave Chinner

From: Allison Henderson <allison.henderson@oracle.com>

Add the new parent attribute type. XFS_ATTR_PARENT is used only for parent pointer
entries; it uses reserved blocks like XFS_ATTR_ROOT.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c       | 3 ++-
 fs/xfs/libxfs/xfs_da_format.h  | 5 ++++-
 fs/xfs/libxfs/xfs_log_format.h | 1 +
 fs/xfs/scrub/attr.c            | 2 +-
 fs/xfs/xfs_trace.h             | 3 ++-
 5 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index e965a48e7db9..1292ab043b4f 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -924,7 +924,8 @@ xfs_attr_set(
 	struct xfs_inode	*dp = args->dp;
 	struct xfs_mount	*mp = dp->i_mount;
 	struct xfs_trans_res	tres;
-	bool			rsvd = (args->attr_filter & XFS_ATTR_ROOT);
+	bool			rsvd = (args->attr_filter & (XFS_ATTR_ROOT |
+							     XFS_ATTR_PARENT));
 	int			error, local;
 	int			rmt_blks = 0;
 	unsigned int		total;
diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 24f9d1461f9a..18e8c7d44ab8 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -703,12 +703,15 @@ struct xfs_attr3_leafblock {
 #define	XFS_ATTR_LOCAL_BIT	0	/* attr is stored locally */
 #define	XFS_ATTR_ROOT_BIT	1	/* limit access to trusted attrs */
 #define	XFS_ATTR_SECURE_BIT	2	/* limit access to secure attrs */
+#define	XFS_ATTR_PARENT_BIT	3	/* parent pointer attrs */
 #define	XFS_ATTR_INCOMPLETE_BIT	7	/* attr in middle of create/delete */
 #define XFS_ATTR_LOCAL		(1u << XFS_ATTR_LOCAL_BIT)
 #define XFS_ATTR_ROOT		(1u << XFS_ATTR_ROOT_BIT)
 #define XFS_ATTR_SECURE		(1u << XFS_ATTR_SECURE_BIT)
+#define XFS_ATTR_PARENT		(1u << XFS_ATTR_PARENT_BIT)
 #define XFS_ATTR_INCOMPLETE	(1u << XFS_ATTR_INCOMPLETE_BIT)
-#define XFS_ATTR_NSP_ONDISK_MASK	(XFS_ATTR_ROOT | XFS_ATTR_SECURE)
+#define XFS_ATTR_NSP_ONDISK_MASK \
+			(XFS_ATTR_ROOT | XFS_ATTR_SECURE | XFS_ATTR_PARENT)
 
 /*
  * Alignment for namelist and valuelist entries (since they are mixed
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index 269573c82808..eb7406c6ea41 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -972,6 +972,7 @@ struct xfs_icreate_log {
  */
 #define XFS_ATTRI_FILTER_MASK		(XFS_ATTR_ROOT | \
 					 XFS_ATTR_SECURE | \
+					 XFS_ATTR_PARENT | \
 					 XFS_ATTR_INCOMPLETE)
 
 /*
diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index 83c7feb38714..49f91cc85a65 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -494,7 +494,7 @@ xchk_xattr_rec(
 	/* Retrieve the entry and check it. */
 	hash = be32_to_cpu(ent->hashval);
 	badflags = ~(XFS_ATTR_LOCAL | XFS_ATTR_ROOT | XFS_ATTR_SECURE |
-			XFS_ATTR_INCOMPLETE);
+			XFS_ATTR_INCOMPLETE | XFS_ATTR_PARENT);
 	if ((ent->flags & badflags) != 0)
 		xchk_da_set_corrupt(ds, level);
 	if (ent->flags & XFS_ATTR_LOCAL) {
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 0984a1c884c7..07e8a69f8e56 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -83,7 +83,8 @@ struct xfs_perag;
 #define XFS_ATTR_FILTER_FLAGS \
 	{ XFS_ATTR_ROOT,	"ROOT" }, \
 	{ XFS_ATTR_SECURE,	"SECURE" }, \
-	{ XFS_ATTR_INCOMPLETE,	"INCOMPLETE" }
+	{ XFS_ATTR_INCOMPLETE,	"INCOMPLETE" }, \
+	{ XFS_ATTR_PARENT,	"PARENT" }
 
 DECLARE_EVENT_CLASS(xfs_attr_list_class,
 	TP_PROTO(struct xfs_attr_list_context *ctx),
-- 
2.42.0


* [PATCH v4 03/25] xfs: define parent pointer ondisk extended attribute format
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
  2024-02-12 16:57 ` [PATCH v4 01/25] fsverity: remove hash page spin lock Andrey Albershteyn
  2024-02-12 16:57 ` [PATCH v4 02/25] xfs: add parent pointer support to attribute code Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  2024-02-12 16:58 ` [PATCH v4 04/25] xfs: add parent pointer validator functions Andrey Albershteyn
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Allison Henderson, Dave Chinner

From: Allison Henderson <allison.henderson@oracle.com>

We need to define the parent pointer attribute format before we start
adding support for it into all the code that needs to use it. The EA
format we will use encodes the following information:

        name={parent inode #, parent inode generation, dirent namehash}
        value={dirent name}

The inode/gen gives all the information we need to reliably identify the
parent without requiring child->parent lock ordering, and allows
userspace to do pathname component level reconstruction without the
kernel ever needing to verify the parent itself as part of ioctl calls.
Storing the dirent name hash in the key reduces hash collisions if a
file is hardlinked multiple times in the same directory.

By using the NVLOOKUP mode in the extended attribute code to match
parent pointers using both the xattr name and value, we can identify the
exact parent pointer EA we need to modify/remove in rename/unlink
operations without searching the entire EA space.

By storing the dirent name, we have enough information to be able to
validate and reconstruct damaged directory trees.  Earlier iterations of
this patchset encoded the directory offset in the parent pointer key,
but this format required repair to keep that in sync across directory
rebuilds, which is unnecessary complexity.
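
As an illustration of the key layout described above, the following
userspace sketch packs the three fields into a 16-byte big-endian record.
The type and helper names are hypothetical, byte arrays stand in for the
kernel's __be64/__be32 types, and the name hash is taken as a caller-supplied
value rather than computed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the on-disk name record: all fields big-endian. */
struct pptr_name_rec {
	uint8_t p_ino[8];	/* parent inode number */
	uint8_t p_gen[4];	/* parent inode generation */
	uint8_t p_namehash[4];	/* hash of the dirent name */
};

static void put_be32(uint8_t *dst, uint32_t v)
{
	for (size_t i = 0; i < 4; i++)
		dst[i] = (uint8_t)(v >> (24 - 8 * i));
}

static void put_be64(uint8_t *dst, uint64_t v)
{
	for (size_t i = 0; i < 8; i++)
		dst[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Build the xattr name; the dirent name itself goes in the xattr value. */
static void pptr_name_rec_init(struct pptr_name_rec *rec, uint64_t ino,
			       uint32_t gen, uint32_t namehash)
{
	assert(rec != NULL);
	put_be64(rec->p_ino, ino);
	put_be32(rec->p_gen, gen);
	put_be32(rec->p_namehash, namehash);
}
```

Because the record has a fixed size and no string terminator, it can
contain zero bytes anywhere, so it has to be handled as a binary xattr name.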

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: replace diroffset with the namehash in the pptr key]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_da_format.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 18e8c7d44ab8..e5eacfe75021 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -867,4 +867,24 @@ static inline unsigned int xfs_dir2_dirblock_bytes(struct xfs_sb *sbp)
 xfs_failaddr_t xfs_da3_blkinfo_verify(struct xfs_buf *bp,
 				      struct xfs_da3_blkinfo *hdr3);
 
+/*
+ * Parent pointer attribute format definition
+ *
+ * The xattr name encodes the parent inode number, generation and the crc32c
+ * hash of the dirent name.
+ *
+ * The xattr value contains the dirent name.
+ */
+struct xfs_parent_name_rec {
+	__be64	p_ino;
+	__be32	p_gen;
+	__be32	p_namehash;
+};
+
+/*
+ * Maximum size of the dirent name that can be stored in a parent pointer.
+ * This matches the maximum dirent name length.
+ */
+#define XFS_PARENT_DIRENT_NAME_MAX_SIZE		(MAXNAMELEN - 1)
+
 #endif /* __XFS_DA_FORMAT_H__ */
-- 
2.42.0


* [PATCH v4 04/25] xfs: add parent pointer validator functions
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
                   ` (2 preceding siblings ...)
  2024-02-12 16:58 ` [PATCH v4 03/25] xfs: define parent pointer ondisk extended attribute format Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  2024-02-12 16:58 ` [PATCH v4 05/25] fs: add FS_XFLAG_VERITY for verity files Andrey Albershteyn
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Allison Henderson

From: Allison Henderson <allison.henderson@oracle.com>

Attribute names of parent pointers are not strings.  So we need to
modify attr_namecheck to verify parent pointer records when the
XFS_ATTR_PARENT flag is set.  At the same time, we need to validate attr
values during log recovery if the xattr is really a parent pointer.
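
The checks this patch adds can be summarised by the standalone sketch below.
It is a simplified userspace model, not the kernel code: the function names
and the ATTR_* macros are local stand-ins, the flag bit positions and the
16-byte record size follow the definitions earlier in this series, and the
255-byte limit assumes MAXNAMELEN is 256.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the attr flag bits defined earlier in the series. */
#define ATTR_ROOT	(1u << 1)
#define ATTR_SECURE	(1u << 2)
#define ATTR_PARENT	(1u << 3)
#define ATTR_INCOMPLETE	(1u << 7)
#define ATTR_NSP_MASK	(ATTR_ROOT | ATTR_SECURE | ATTR_PARENT)

#define PPTR_NAME_REC_SIZE	16u	/* __be64 + __be32 + __be32 */
#define PPTR_VALUE_MAX		255u	/* assumes MAXNAMELEN == 256 */

static_assert(PPTR_NAME_REC_SIZE == 8 + 4 + 4, "name record is three fields");

/* Portable population count, in place of the kernel's hweight32(). */
static unsigned int count_bits(uint32_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Name check: right namespace, not incomplete, exact record size. */
static bool pptr_namecheck(size_t reclen, unsigned int attr_flags)
{
	if (!(attr_flags & ATTR_PARENT))
		return false;
	if (attr_flags & ATTR_INCOMPLETE)
		return false;
	if (reclen != PPTR_NAME_REC_SIZE)
		return false;
	/* Only one namespace bit may be set. */
	return count_bits(attr_flags & ATTR_NSP_MASK) <= 1;
}

/* Value check: the dirent name must be present and of legal length. */
static bool pptr_valuecheck(const void *value, size_t valuelen)
{
	if (valuelen == 0 || valuelen > PPTR_VALUE_MAX)
		return false;
	return value != NULL;
}
```

The point of passing the flags down is that the same name bytes are judged
differently depending on the namespace: a 16-byte binary record is valid
only when the parent flag is set.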

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: move functions to xfs_parent.c, adjust for new disk format]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/Makefile               |   1 +
 fs/xfs/libxfs/xfs_attr.c      |  10 ++-
 fs/xfs/libxfs/xfs_attr.h      |   3 +-
 fs/xfs/libxfs/xfs_da_format.h |   8 +++
 fs/xfs/libxfs/xfs_parent.c    | 113 ++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_parent.h    |  19 ++++++
 fs/xfs/scrub/attr.c           |   2 +-
 fs/xfs/xfs_attr_item.c        |   6 +-
 fs/xfs/xfs_attr_list.c        |  14 +++--
 9 files changed, 165 insertions(+), 11 deletions(-)
 create mode 100644 fs/xfs/libxfs/xfs_parent.c
 create mode 100644 fs/xfs/libxfs/xfs_parent.h

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index fbe3cdc79036..8be90c685b0b 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -41,6 +41,7 @@ xfs-y				+= $(addprefix libxfs/, \
 				   xfs_inode_buf.o \
 				   xfs_log_rlimit.o \
 				   xfs_ag_resv.o \
+				   xfs_parent.o \
 				   xfs_rmap.o \
 				   xfs_rmap_btree.o \
 				   xfs_refcount.o \
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 1292ab043b4f..f9846df41669 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -26,6 +26,7 @@
 #include "xfs_trace.h"
 #include "xfs_attr_item.h"
 #include "xfs_xattr.h"
+#include "xfs_parent.h"
 
 struct kmem_cache		*xfs_attr_intent_cache;
 
@@ -1514,9 +1515,14 @@ xfs_attr_node_get(
 /* Returns true if the attribute entry name is valid. */
 bool
 xfs_attr_namecheck(
-	const void	*name,
-	size_t		length)
+	struct xfs_mount	*mp,
+	const void		*name,
+	size_t			length,
+	unsigned int		flags)
 {
+	if (flags & XFS_ATTR_PARENT)
+		return xfs_parent_namecheck(mp, name, length, flags);
+
 	/*
 	 * MAXNAMELEN includes the trailing null, but (name/length) leave it
 	 * out, so use >= for the length check.
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 81be9b3e4004..92711c8d2a9f 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -547,7 +547,8 @@ int xfs_attr_get(struct xfs_da_args *args);
 int xfs_attr_set(struct xfs_da_args *args);
 int xfs_attr_set_iter(struct xfs_attr_intent *attr);
 int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
-bool xfs_attr_namecheck(const void *name, size_t length);
+bool xfs_attr_namecheck(struct xfs_mount *mp, const void *name, size_t length,
+		unsigned int flags);
 int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
 void xfs_init_attr_trans(struct xfs_da_args *args, struct xfs_trans_res *tres,
 			 unsigned int *total);
diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index e5eacfe75021..1b79c4de90bc 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -746,6 +746,14 @@ xfs_attr3_leaf_name(xfs_attr_leafblock_t *leafp, int idx)
 	return &((char *)leafp)[be16_to_cpu(entries[idx].nameidx)];
 }
 
+static inline int
+xfs_attr3_leaf_flags(xfs_attr_leafblock_t *leafp, int idx)
+{
+	struct xfs_attr_leaf_entry *entries = xfs_attr3_leaf_entryp(leafp);
+
+	return entries[idx].flags;
+}
+
 static inline xfs_attr_leaf_name_remote_t *
 xfs_attr3_leaf_name_remote(xfs_attr_leafblock_t *leafp, int idx)
 {
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
new file mode 100644
index 000000000000..1d45f926c13a
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -0,0 +1,113 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022-2024 Oracle.
+ * All rights reserved.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_format.h"
+#include "xfs_da_format.h"
+#include "xfs_log_format.h"
+#include "xfs_shared.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_inode.h"
+#include "xfs_error.h"
+#include "xfs_trace.h"
+#include "xfs_trans.h"
+#include "xfs_da_btree.h"
+#include "xfs_attr.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_priv.h"
+#include "xfs_attr_sf.h"
+#include "xfs_bmap.h"
+#include "xfs_defer.h"
+#include "xfs_log.h"
+#include "xfs_xattr.h"
+#include "xfs_parent.h"
+#include "xfs_trans_space.h"
+
+/*
+ * Parent pointer attribute handling.
+ *
+ * Because the attribute value is a filename component, it will never be longer
+ * than 255 bytes. This means the attribute will always be a local format
+ * attribute as it is xfs_attr_leaf_entsize_local_max() for v5 filesystems will
+ * always be larger than this (max is 75% of block size).
+ *
+ * Creating a new parent attribute will always create a new attribute - there
+ * should never, ever be an existing attribute in the tree for a new inode.
+ * ENOSPC behavior is problematic - creating the inode without the parent
+ * pointer is effectively a corruption, so we allow parent attribute creation
+ * to dip into the reserve block pool to avoid unexpected ENOSPC errors from
+ * occurring.
+ */
+
+/* Return true if parent pointer EA name is valid. */
+bool
+xfs_parent_namecheck(
+	struct xfs_mount			*mp,
+	const struct xfs_parent_name_rec	*rec,
+	size_t					reclen,
+	unsigned int				attr_flags)
+{
+	if (!(attr_flags & XFS_ATTR_PARENT))
+		return false;
+
+	/* pptr updates use logged xattrs, so we should never see this flag */
+	if (attr_flags & XFS_ATTR_INCOMPLETE)
+		return false;
+
+	if (reclen != sizeof(struct xfs_parent_name_rec))
+		return false;
+
+	/* Only one namespace bit allowed. */
+	if (hweight32(attr_flags & XFS_ATTR_NSP_ONDISK_MASK) > 1)
+		return false;
+
+	return true;
+}
+
+/* Return true if parent pointer EA value is valid. */
+bool
+xfs_parent_valuecheck(
+	struct xfs_mount		*mp,
+	const void			*value,
+	size_t				valuelen)
+{
+	if (valuelen == 0 || valuelen > XFS_PARENT_DIRENT_NAME_MAX_SIZE)
+		return false;
+
+	if (value == NULL)
+		return false;
+
+	return true;
+}
+
+/* Return true if the ondisk parent pointer is consistent. */
+bool
+xfs_parent_hashcheck(
+	struct xfs_mount		*mp,
+	const struct xfs_parent_name_rec *rec,
+	const void			*value,
+	size_t				valuelen)
+{
+	struct xfs_name			dname = {
+		.name			= value,
+		.len			= valuelen,
+	};
+	xfs_ino_t			p_ino;
+
+	/* Valid dirent name? */
+	if (!xfs_dir2_namecheck(value, valuelen))
+		return false;
+
+	/* Valid inode number? */
+	p_ino = be64_to_cpu(rec->p_ino);
+	if (!xfs_verify_dir_ino(mp, p_ino))
+		return false;
+
+	/* Namehash matches name? */
+	return be32_to_cpu(rec->p_namehash) == xfs_dir2_hashname(mp, &dname);
+}
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
new file mode 100644
index 000000000000..fcfeddb645f6
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022-2024 Oracle.
+ * All Rights Reserved.
+ */
+#ifndef	__XFS_PARENT_H__
+#define	__XFS_PARENT_H__
+
+/* Metadata validators */
+bool xfs_parent_namecheck(struct xfs_mount *mp,
+		const struct xfs_parent_name_rec *rec, size_t reclen,
+		unsigned int attr_flags);
+bool xfs_parent_valuecheck(struct xfs_mount *mp, const void *value,
+		size_t valuelen);
+bool xfs_parent_hashcheck(struct xfs_mount *mp,
+		const struct xfs_parent_name_rec *rec, const void *value,
+		size_t valuelen);
+
+#endif /* __XFS_PARENT_H__ */
diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index 49f91cc85a65..9a1f59f7b5a4 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -195,7 +195,7 @@ xchk_xattr_listent(
 	}
 
 	/* Does this name make sense? */
-	if (!xfs_attr_namecheck(name, namelen)) {
+	if (!xfs_attr_namecheck(sx->sc->mp, name, namelen, flags)) {
 		xchk_fblock_set_corrupt(sx->sc, XFS_ATTR_FORK, args.blkno);
 		goto fail_xref;
 	}
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 9e02111bd890..6f6eeaaa9010 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -588,7 +588,8 @@ xfs_attr_recover_work(
 	 */
 	attrp = &attrip->attri_format;
 	if (!xfs_attri_validate(mp, attrp) ||
-	    !xfs_attr_namecheck(nv->name.i_addr, nv->name.i_len))
+	    !xfs_attr_namecheck(mp, nv->name.i_addr, nv->name.i_len,
+				attrp->alfi_attr_filter))
 		return -EFSCORRUPTED;
 
 	attr = xfs_attri_recover_work(mp, dfp, attrp, &ip, nv);
@@ -728,7 +729,8 @@ xlog_recover_attri_commit_pass2(
 		return -EFSCORRUPTED;
 	}
 
-	if (!xfs_attr_namecheck(attr_name, attri_formatp->alfi_name_len)) {
+	if (!xfs_attr_namecheck(mp, attr_name, attri_formatp->alfi_name_len,
+				attri_formatp->alfi_attr_filter)) {
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 				item->ri_buf[1].i_addr, item->ri_buf[1].i_len);
 		return -EFSCORRUPTED;
diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
index e368ad671e26..1521ca2f0ce3 100644
--- a/fs/xfs/xfs_attr_list.c
+++ b/fs/xfs/xfs_attr_list.c
@@ -58,6 +58,7 @@ xfs_attr_shortform_list(
 	struct xfs_attr_sf_sort		*sbuf, *sbp;
 	struct xfs_attr_sf_hdr		*sf = dp->i_af.if_data;
 	struct xfs_attr_sf_entry	*sfe;
+	struct xfs_mount		*mp = dp->i_mount;
 	int				sbsize, nsbuf, count, i;
 	int				error = 0;
 
@@ -81,8 +82,9 @@ xfs_attr_shortform_list(
 	     (dp->i_af.if_bytes + sf->count * 16) < context->bufsize)) {
 		for (i = 0, sfe = xfs_attr_sf_firstentry(sf); i < sf->count; i++) {
 			if (XFS_IS_CORRUPT(context->dp->i_mount,
-					   !xfs_attr_namecheck(sfe->nameval,
-							       sfe->namelen)))
+					   !xfs_attr_namecheck(mp, sfe->nameval,
+							       sfe->namelen,
+							       sfe->flags)))
 				return -EFSCORRUPTED;
 			context->put_listent(context,
 					     sfe->flags,
@@ -173,8 +175,9 @@ xfs_attr_shortform_list(
 			cursor->offset = 0;
 		}
 		if (XFS_IS_CORRUPT(context->dp->i_mount,
-				   !xfs_attr_namecheck(sbp->name,
-						       sbp->namelen))) {
+				   !xfs_attr_namecheck(mp, sbp->name,
+						       sbp->namelen,
+						       sbp->flags))) {
 			error = -EFSCORRUPTED;
 			goto out;
 		}
@@ -464,7 +467,8 @@ xfs_attr3_leaf_list_int(
 		}
 
 		if (XFS_IS_CORRUPT(context->dp->i_mount,
-				   !xfs_attr_namecheck(name, namelen)))
+				   !xfs_attr_namecheck(mp, name, namelen,
+						       entry->flags)))
 			return -EFSCORRUPTED;
 		context->put_listent(context, entry->flags,
 					      name, namelen, valuelen);
-- 
2.42.0



* [PATCH v4 05/25] fs: add FS_XFLAG_VERITY for verity files
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
                   ` (3 preceding siblings ...)
  2024-02-12 16:58 ` [PATCH v4 04/25] xfs: add parent pointer validator functions Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  2024-02-23  4:23   ` Eric Biggers
  2024-02-12 16:58 ` [PATCH v4 06/25] fsverity: pass log_blocksize to end_enable_verity() Andrey Albershteyn
                   ` (19 subsequent siblings)
  24 siblings, 1 reply; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn

Add the FS_XFLAG_VERITY extended flag, which is reported for inodes with
fs-verity enabled.
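
For illustration only (not part of the patch): a minimal user-space
sketch of how a tool might test for the new flag through
FS_IOC_FSGETXATTR. The helper names xflags_has_verity() and
file_has_verity() are hypothetical, and the fallback #define simply
reuses the 0x00020000 value from this patch for headers that do not
carry it yet.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

/* Value taken from this patch; older uapi headers do not define it. */
#ifndef FS_XFLAG_VERITY
#define FS_XFLAG_VERITY 0x00020000
#endif

/* Hypothetical helper: does an fsx_xflags word carry the verity bit? */
static bool xflags_has_verity(unsigned int xflags)
{
	return (xflags & FS_XFLAG_VERITY) != 0;
}

/*
 * Hypothetical helper: returns 1 if @path reports FS_XFLAG_VERITY,
 * 0 if it does not, and -1 if the open or the ioctl fails.
 */
static int file_has_verity(const char *path)
{
	struct fsxattr fsx;
	int fd, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	ret = ioctl(fd, FS_IOC_FSGETXATTR, &fsx);
	close(fd);
	if (ret < 0)
		return -1;
	return xflags_has_verity(fsx.fsx_xflags);
}
```

The flag is read-only from this interface: with this patch, passing it
to FS_IOC_FSSETXATTR is rejected with -EINVAL.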

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 Documentation/filesystems/fsverity.rst | 12 ++++++++++++
 fs/ioctl.c                             | 11 +++++++++++
 include/uapi/linux/fs.h                |  1 +
 3 files changed, 24 insertions(+)

diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst
index 13e4b18e5dbb..19e59e87999e 100644
--- a/Documentation/filesystems/fsverity.rst
+++ b/Documentation/filesystems/fsverity.rst
@@ -326,6 +326,18 @@ the file has fs-verity enabled.  This can perform better than
 FS_IOC_GETFLAGS and FS_IOC_MEASURE_VERITY because it doesn't require
 opening the file, and opening verity files can be expensive.
 
+FS_IOC_FSGETXATTR
+-----------------
+
+Since Linux v6.9, the FS_XFLAG_VERITY (0x00020000) file attribute is set for
+verity files. The attribute can be observed via lsattr.
+
+    [root@vm:~]# lsattr /mnt/test/foo
+    --------------------V- /mnt/test/foo
+
+Note that this attribute cannot be set with FS_IOC_FSSETXATTR as enabling verity
+requires input parameters. See FS_IOC_ENABLE_VERITY.
+
 .. _accessing_verity_files:
 
 Accessing verity files
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 76cf22ac97d7..38c00e47c069 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -481,6 +481,8 @@ void fileattr_fill_xflags(struct fileattr *fa, u32 xflags)
 		fa->flags |= FS_DAX_FL;
 	if (fa->fsx_xflags & FS_XFLAG_PROJINHERIT)
 		fa->flags |= FS_PROJINHERIT_FL;
+	if (fa->fsx_xflags & FS_XFLAG_VERITY)
+		fa->flags |= FS_VERITY_FL;
 }
 EXPORT_SYMBOL(fileattr_fill_xflags);
 
@@ -511,6 +513,8 @@ void fileattr_fill_flags(struct fileattr *fa, u32 flags)
 		fa->fsx_xflags |= FS_XFLAG_DAX;
 	if (fa->flags & FS_PROJINHERIT_FL)
 		fa->fsx_xflags |= FS_XFLAG_PROJINHERIT;
+	if (fa->flags & FS_VERITY_FL)
+		fa->fsx_xflags |= FS_XFLAG_VERITY;
 }
 EXPORT_SYMBOL(fileattr_fill_flags);
 
@@ -641,6 +645,13 @@ static int fileattr_set_prepare(struct inode *inode,
 	    !(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)))
 		return -EINVAL;
 
+	/*
+	 * Verity cannot be set through FS_IOC_FSSETXATTR/FS_IOC_SETFLAGS.
+	 * See FS_IOC_ENABLE_VERITY
+	 */
+	if (fa->fsx_xflags & FS_XFLAG_VERITY)
+		return -EINVAL;
+
 	/* Extent size hints of zero turn off the flags. */
 	if (fa->fsx_extsize == 0)
 		fa->fsx_xflags &= ~(FS_XFLAG_EXTSIZE | FS_XFLAG_EXTSZINHERIT);
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 48ad69f7722e..6e63ea832d4f 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -140,6 +140,7 @@ struct fsxattr {
 #define FS_XFLAG_FILESTREAM	0x00004000	/* use filestream allocator */
 #define FS_XFLAG_DAX		0x00008000	/* use DAX for IO */
 #define FS_XFLAG_COWEXTSIZE	0x00010000	/* CoW extent size allocator hint */
+#define FS_XFLAG_VERITY		0x00020000	/* fs-verity sealed inode */
 #define FS_XFLAG_HASATTR	0x80000000	/* no DIFLAG for this	*/
 
 /* the read-only stuff doesn't really belong here, but any other place is
-- 
2.42.0



* [PATCH v4 06/25] fsverity: pass log_blocksize to end_enable_verity()
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
                   ` (4 preceding siblings ...)
  2024-02-12 16:58 ` [PATCH v4 05/25] fs: add FS_XFLAG_VERITY for verity files Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  2024-02-15 21:45   ` Dave Chinner
  2024-02-23  4:26   ` Eric Biggers
  2024-02-12 16:58 ` [PATCH v4 07/25] fsverity: support block-based Merkle tree caching Andrey Albershteyn
                   ` (18 subsequent siblings)
  24 siblings, 2 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn

XFS will need to know the Merkle tree block size to remove the tree in
case of an error. The size is needed to calculate the offsets of
particular Merkle tree blocks.
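
For illustration only (not part of the patch): a small sketch of the
offset arithmetic this refers to. Given the tree size and the block
size, each Merkle tree block starts at index << log2(block size). The
function names are hypothetical, and how XFS actually walks the tree on
rollback is not shown in this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* log2 of a power-of-two block size (e.g. 4096 -> 12). */
static unsigned int log2_blocksize(unsigned int block_size)
{
	unsigned int log = 0;

	while ((1u << log) < block_size)
		log++;
	return log;
}

/*
 * Store the byte offset of each Merkle tree block in @offsets (at most
 * @max entries) and return the total number of blocks in the tree.
 */
static uint64_t merkle_block_offsets(uint64_t tree_size,
				     unsigned int block_size,
				     uint64_t *offsets, size_t max)
{
	unsigned int log = log2_blocksize(block_size);
	uint64_t nblocks = (tree_size + block_size - 1) >> log;
	uint64_t i;

	for (i = 0; i < nblocks && i < max; i++)
		offsets[i] = i << log;
	return nblocks;
}
```

A cleanup path could then visit every offset and remove whatever the
filesystem stored under it.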

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/btrfs/verity.c        | 4 +++-
 fs/ext4/verity.c         | 3 ++-
 fs/f2fs/verity.c         | 3 ++-
 fs/verity/enable.c       | 6 ++++--
 include/linux/fsverity.h | 4 +++-
 5 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/verity.c b/fs/btrfs/verity.c
index 66e2270b0dae..84e9b1480241 100644
--- a/fs/btrfs/verity.c
+++ b/fs/btrfs/verity.c
@@ -621,6 +621,7 @@ static int btrfs_begin_enable_verity(struct file *filp)
  * @desc:              verity descriptor to write out (NULL in error conditions)
  * @desc_size:         size of the verity descriptor (variable with signatures)
  * @merkle_tree_size:  size of the merkle tree in bytes
+ * @tree_blocksize:    size of the Merkle tree block
  *
  * If desc is null, then VFS is signaling an error occurred during verity
  * enable, and we should try to rollback. Otherwise, attempt to finish verity.
@@ -628,7 +629,8 @@ static int btrfs_begin_enable_verity(struct file *filp)
  * Returns 0 on success, negative error code on error.
  */
 static int btrfs_end_enable_verity(struct file *filp, const void *desc,
-				   size_t desc_size, u64 merkle_tree_size)
+				   size_t desc_size, u64 merkle_tree_size,
+				   unsigned int tree_blocksize)
 {
 	struct btrfs_inode *inode = BTRFS_I(file_inode(filp));
 	int ret = 0;
diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
index 2f37e1ea3955..da2095a81349 100644
--- a/fs/ext4/verity.c
+++ b/fs/ext4/verity.c
@@ -189,7 +189,8 @@ static int ext4_write_verity_descriptor(struct inode *inode, const void *desc,
 }
 
 static int ext4_end_enable_verity(struct file *filp, const void *desc,
-				  size_t desc_size, u64 merkle_tree_size)
+				  size_t desc_size, u64 merkle_tree_size,
+				  unsigned int tree_blocksize)
 {
 	struct inode *inode = file_inode(filp);
 	const int credits = 2; /* superblock and inode for ext4_orphan_del() */
diff --git a/fs/f2fs/verity.c b/fs/f2fs/verity.c
index 4fc95f353a7a..b4461b9f47a3 100644
--- a/fs/f2fs/verity.c
+++ b/fs/f2fs/verity.c
@@ -144,7 +144,8 @@ static int f2fs_begin_enable_verity(struct file *filp)
 }
 
 static int f2fs_end_enable_verity(struct file *filp, const void *desc,
-				  size_t desc_size, u64 merkle_tree_size)
+				  size_t desc_size, u64 merkle_tree_size,
+				  unsigned int tree_blocksize)
 {
 	struct inode *inode = file_inode(filp);
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
diff --git a/fs/verity/enable.c b/fs/verity/enable.c
index c284f46d1b53..04e060880b79 100644
--- a/fs/verity/enable.c
+++ b/fs/verity/enable.c
@@ -274,7 +274,8 @@ static int enable_verity(struct file *filp,
 	 * Serialized with ->begin_enable_verity() by the inode lock.
 	 */
 	inode_lock(inode);
-	err = vops->end_enable_verity(filp, desc, desc_size, params.tree_size);
+	err = vops->end_enable_verity(filp, desc, desc_size, params.tree_size,
+				      params.block_size);
 	inode_unlock(inode);
 	if (err) {
 		fsverity_err(inode, "%ps() failed with err %d",
@@ -300,7 +301,8 @@ static int enable_verity(struct file *filp,
 
 rollback:
 	inode_lock(inode);
-	(void)vops->end_enable_verity(filp, NULL, 0, params.tree_size);
+	(void)vops->end_enable_verity(filp, NULL, 0, params.tree_size,
+				      params.block_size);
 	inode_unlock(inode);
 	goto out;
 }
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 1eb7eae580be..ab7b0772899b 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -51,6 +51,7 @@ struct fsverity_operations {
 	 * @desc: the verity descriptor to write, or NULL on failure
 	 * @desc_size: size of verity descriptor, or 0 on failure
 	 * @merkle_tree_size: total bytes the Merkle tree took up
+	 * @tree_blocksize: size of the Merkle tree block
 	 *
 	 * If desc == NULL, then enabling verity failed and the filesystem only
 	 * must do any necessary cleanups.  Else, it must also store the given
@@ -65,7 +66,8 @@ struct fsverity_operations {
 	 * Return: 0 on success, -errno on failure
 	 */
 	int (*end_enable_verity)(struct file *filp, const void *desc,
-				 size_t desc_size, u64 merkle_tree_size);
+				 size_t desc_size, u64 merkle_tree_size,
+				 unsigned int tree_blocksize);
 
 	/**
 	 * Get the verity descriptor of the given inode.
-- 
2.42.0



* [PATCH v4 07/25] fsverity: support block-based Merkle tree caching
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
                   ` (5 preceding siblings ...)
  2024-02-12 16:58 ` [PATCH v4 06/25] fsverity: pass log_blocksize to end_enable_verity() Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  2024-02-23  5:24   ` Eric Biggers
  2024-02-12 16:58 ` [PATCH v4 08/25] fsverity: calculate readahead in bytes instead of pages Andrey Albershteyn
                   ` (17 subsequent siblings)
  24 siblings, 1 reply; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn

In the current implementation, fs-verity expects the filesystem to
provide PAGEs filled with Merkle tree blocks. When fs-verity is done
processing the blocks, the reference to the PAGE is dropped. This
doesn't fit well with the way XFS manages its memory.

To allow XFS to integrate fs-verity, this patch changes the fs-verity
verification code to take Merkle tree blocks instead of PAGE
references. It then adds a thin compatibility layer to work with both
approaches. This way ext4, f2fs, and btrfs are still able to pass
PAGE references, and XFS can pass references to Merkle tree blocks
stored in XFS's buffer infrastructure.

Another addition is the invalidation functions, which tell fs-verity
to mark part of the Merkle tree as not verified. The filesystem uses
these functions to tell fs-verity to invalidate blocks which were
evicted from memory.

Depending on the Merkle tree block size, fs-verity uses either a bitmap
or the PG_checked flag to track the "verified" status of the blocks.
With Merkle tree block caching (XFS) there is no PAGE to flag as
verified, so fs-verity always uses the bitmap to track verified blocks
for filesystems which use block caching.
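
For illustration only (not part of the patch): a user-space model of
the bitmap tracking and range invalidation described above. All names
are hypothetical. The kernel code uses atomic set_bit()/clear_bit() on
vi->hash_block_verified, while this sketch uses plain, non-atomic
bit operations.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical model of the per-inode "verified" bitmap. */
struct verified_map {
	unsigned long *bits;
	uint64_t tree_size;
	unsigned int log_blocksize;
};

/* Allocate one zeroed bit per Merkle tree block. */
static int verified_map_init(struct verified_map *m, uint64_t tree_size,
			     unsigned int log_blocksize)
{
	uint64_t nblocks = (tree_size + (1ull << log_blocksize) - 1) >>
			   log_blocksize;

	m->tree_size = tree_size;
	m->log_blocksize = log_blocksize;
	m->bits = calloc((nblocks + BITS_PER_WORD - 1) / BITS_PER_WORD,
			 sizeof(unsigned long));
	return m->bits ? 0 : -1;
}

static void mark_verified(struct verified_map *m, uint64_t idx)
{
	m->bits[idx / BITS_PER_WORD] |= 1ul << (idx % BITS_PER_WORD);
}

static bool is_verified(const struct verified_map *m, uint64_t idx)
{
	return (m->bits[idx / BITS_PER_WORD] &
		(1ul << (idx % BITS_PER_WORD))) != 0;
}

/* Clear "verified" for the blocks covered by [offset, offset + size). */
static int invalidate_range(struct verified_map *m, uint64_t offset,
			    uint64_t size)
{
	uint64_t index = offset >> m->log_blocksize;
	uint64_t blocks = size >> m->log_blocksize;
	uint64_t i;

	if (offset + size > m->tree_size)
		return -1;	/* range reaches beyond the tree */
	for (i = 0; i < blocks; i++)
		m->bits[(index + i) / BITS_PER_WORD] &=
			~(1ul << ((index + i) % BITS_PER_WORD));
	return 0;
}
```

A block whose bit was cleared is simply verified again the next time it
is read, so a redundant invalidation costs time but not correctness.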

As the verification function now works only with blocks, the memory
barriers used for verified status updates are moved from
is_hash_block_verified() to fsverity_invalidate_page/range().
Depending on block or page caching, fs-verity clears bits in the bitmap
based on PG_checked or on a call out from the filesystem.

Furthermore, this patch adds fsverity_drop_block(), which allows the
filesystem to do additional processing on verified pages instead of
just dropping a reference. XFS will use this for internal buffer cache
manipulation in later patches. btrfs, ext4, and f2fs just drop the
reference.
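
For illustration only (not part of the patch): a user-space model of
the optional callback pattern behind fsverity_drop_block(). The types
fake_page, fake_buf, and verity_ops are hypothetical stand-ins for
struct page, an XFS buffer, and struct fsverity_operations.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the fields of struct fsverity_blockbuf from this patch. */
struct blockbuf {
	void *kaddr;
	unsigned int size;
	bool verified;
	void *context;
};

/* Hypothetical stand-in for struct page: a refcount and PG_checked. */
struct fake_page {
	int refcount;
	bool checked;
};

/* Hypothetical stand-in for a filesystem-private buffer. */
struct fake_buf {
	bool released;
};

/* Hypothetical stand-in for the relevant part of the operations. */
struct verity_ops {
	void (*drop_block)(struct blockbuf *block);
};

/* What a block-caching filesystem might do in its own callback. */
static void fs_drop_block(struct blockbuf *block)
{
	struct fake_buf *buf = block->context;

	buf->released = true;
}

/* Use the filesystem callback if set, else fall back to page handling. */
static void verity_drop_block(const struct verity_ops *ops,
			      struct blockbuf *block)
{
	if (ops->drop_block) {
		ops->drop_block(block);
	} else {
		struct fake_page *page = block->context;

		if (block->verified)
			page->checked = true;
		page->refcount--;
	}
}
```

Keeping the fallback in the core means page-based filesystems do not
need to implement the callback at all.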

As btrfs, ext4, and f2fs return a page with Merkle tree blocks, this
patch also adds fsverity_read_merkle_tree_block(), which wraps
addressing of blocks within the page.
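
For illustration only (not part of the patch): the addressing the
wrapper performs, modeled in user space. A byte position in the Merkle
tree is split into a page index and an offset within that page. The
EX_PAGE_SHIFT value of 12 (4 KiB pages) and the names below are
assumptions made for this sketch.

```c
#include <stdint.h>

/* Assumed 4 KiB pages for this sketch. */
#define EX_PAGE_SHIFT 12
#define EX_PAGE_SIZE (1u << EX_PAGE_SHIFT)

/* Hypothetical result type: where a block lives in the page cache. */
struct block_location {
	uint64_t page_index;
	unsigned int offset_in_page;
};

/* Split a byte position in the tree into page index and in-page offset. */
static struct block_location locate_block(uint64_t pos)
{
	struct block_location loc = {
		.page_index = pos >> EX_PAGE_SHIFT,
		.offset_in_page = (unsigned int)(pos & (EX_PAGE_SIZE - 1)),
	};

	return loc;
}
```

With 1 KiB Merkle tree blocks, for example, the block at byte position
5120 lives in page 1 at offset 1024.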

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/verity/fsverity_private.h |  27 ++++
 fs/verity/open.c             |   8 +-
 fs/verity/read_metadata.c    |  48 +++---
 fs/verity/verify.c           | 280 ++++++++++++++++++++++++-----------
 include/linux/fsverity.h     |  69 +++++++++
 5 files changed, 316 insertions(+), 116 deletions(-)

diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index b3506f56e180..72ac1cdd9e63 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -154,4 +154,31 @@ static inline void fsverity_init_signature(void)
 
 void __init fsverity_init_workqueue(void);
 
+/**
+ * fsverity_drop_block() - drop block obtained with ->read_merkle_tree_block()
+ * @inode: inode in use for verification or metadata reading
+ * @block: block to be dropped
+ *
+ * Calls back into the filesystem if ->drop_block() is set; otherwise drops the
+ * page reference held in block->context.
+ */
+void fsverity_drop_block(struct inode *inode,
+			 struct fsverity_blockbuf *block);
+
+/**
+ * fsverity_read_merkle_tree_block() - read a Merkle tree block
+ * @inode: inode in use for verification or metadata reading
+ * @pos: byte offset of the block within the Merkle tree
+ * @block: block to read
+ * @log_blocksize: log2 of the Merkle tree block size
+ * @num_ra_pages: number of pages to readahead, may be ignored
+ *
+ * Uses ->read_merkle_tree_block() if set, else ->read_merkle_tree_page().
+ */
+int fsverity_read_merkle_tree_block(struct inode *inode,
+				    u64 pos,
+				    struct fsverity_blockbuf *block,
+				    unsigned int log_blocksize,
+				    unsigned long num_ra_pages);
+
 #endif /* _FSVERITY_PRIVATE_H */
diff --git a/fs/verity/open.c b/fs/verity/open.c
index fdeb95eca3af..6e6922b4b014 100644
--- a/fs/verity/open.c
+++ b/fs/verity/open.c
@@ -213,7 +213,13 @@ struct fsverity_info *fsverity_create_info(const struct inode *inode,
 	if (err)
 		goto fail;
 
-	if (vi->tree_params.block_size != PAGE_SIZE) {
+	/*
+	 * If fs passes Merkle tree blocks to fs-verity (e.g. XFS), then
+	 * fs-verity should use hash_block_verified bitmap as there's no page
+	 * to mark it with PG_checked.
+	 */
+	if (vi->tree_params.block_size != PAGE_SIZE ||
+			inode->i_sb->s_vop->read_merkle_tree_block) {
 		/*
 		 * When the Merkle tree block size and page size differ, we use
 		 * a bitmap to keep track of which hash blocks have been
diff --git a/fs/verity/read_metadata.c b/fs/verity/read_metadata.c
index f58432772d9e..7e153356e7bc 100644
--- a/fs/verity/read_metadata.c
+++ b/fs/verity/read_metadata.c
@@ -16,9 +16,10 @@ static int fsverity_read_merkle_tree(struct inode *inode,
 				     const struct fsverity_info *vi,
 				     void __user *buf, u64 offset, int length)
 {
-	const struct fsverity_operations *vops = inode->i_sb->s_vop;
 	u64 end_offset;
-	unsigned int offs_in_page;
+	unsigned int offs_in_block;
+	const unsigned int block_size = vi->tree_params.block_size;
+	const u8 log_blocksize = vi->tree_params.log_blocksize;
 	pgoff_t index, last_index;
 	int retval = 0;
 	int err = 0;
@@ -26,42 +27,39 @@ static int fsverity_read_merkle_tree(struct inode *inode,
 	end_offset = min(offset + length, vi->tree_params.tree_size);
 	if (offset >= end_offset)
 		return 0;
-	offs_in_page = offset_in_page(offset);
-	last_index = (end_offset - 1) >> PAGE_SHIFT;
+	offs_in_block = offset & (block_size - 1);
+	last_index = (end_offset - 1) >> log_blocksize;
 
 	/*
-	 * Iterate through each Merkle tree page in the requested range and copy
-	 * the requested portion to userspace.  Note that the Merkle tree block
-	 * size isn't important here, as we are returning a byte stream; i.e.,
-	 * we can just work with pages even if the tree block size != PAGE_SIZE.
+	 * Iterate through each Merkle tree block in the requested range and
+	 * copy the requested portion to userspace. Note that we are returning
+	 * a byte stream, so PAGE_SIZE & block_size are not important here.
 	 */
-	for (index = offset >> PAGE_SHIFT; index <= last_index; index++) {
+	for (index = offset >> log_blocksize; index <= last_index; index++) {
 		unsigned long num_ra_pages =
 			min_t(unsigned long, last_index - index + 1,
 			      inode->i_sb->s_bdi->io_pages);
 		unsigned int bytes_to_copy = min_t(u64, end_offset - offset,
-						   PAGE_SIZE - offs_in_page);
-		struct page *page;
-		const void *virt;
+						   block_size - offs_in_block);
+		struct fsverity_blockbuf block;
 
-		page = vops->read_merkle_tree_page(inode, index, num_ra_pages);
-		if (IS_ERR(page)) {
-			err = PTR_ERR(page);
-			fsverity_err(inode,
-				     "Error %d reading Merkle tree page %lu",
-				     err, index);
+		block.size = block_size;
+		if (fsverity_read_merkle_tree_block(inode,
+					index << log_blocksize,
+					&block, log_blocksize,
+					num_ra_pages)) {
+			fsverity_drop_block(inode, &block);
+			err = -EIO;
 			break;
 		}
 
-		virt = kmap_local_page(page);
-		if (copy_to_user(buf, virt + offs_in_page, bytes_to_copy)) {
-			kunmap_local(virt);
-			put_page(page);
+		if (copy_to_user(buf, block.kaddr + offs_in_block, bytes_to_copy)) {
+			fsverity_drop_block(inode, &block);
 			err = -EFAULT;
 			break;
 		}
-		kunmap_local(virt);
-		put_page(page);
+		fsverity_drop_block(inode, &block);
+		block.kaddr = NULL;
 
 		retval += bytes_to_copy;
 		buf += bytes_to_copy;
@@ -72,7 +70,7 @@ static int fsverity_read_merkle_tree(struct inode *inode,
 			break;
 		}
 		cond_resched();
-		offs_in_page = 0;
+		offs_in_block = 0;
 	}
 	return retval ? retval : err;
 }
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index 4fcad0825a12..414ec3321fe6 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -13,69 +13,18 @@
 static struct workqueue_struct *fsverity_read_workqueue;
 
 /*
- * Returns true if the hash block with index @hblock_idx in the tree, located in
- * @hpage, has already been verified.
+ * Returns true if the hash block with index @hblock_idx in the tree has
+ * already been verified.
  */
-static bool is_hash_block_verified(struct fsverity_info *vi, struct page *hpage,
+static bool is_hash_block_verified(struct fsverity_info *vi,
+				   struct fsverity_blockbuf *block,
 				   unsigned long hblock_idx)
 {
-	unsigned int blocks_per_page;
-	unsigned int i;
-
-	/*
-	 * When the Merkle tree block size and page size are the same, then the
-	 * ->hash_block_verified bitmap isn't allocated, and we use PG_checked
-	 * to directly indicate whether the page's block has been verified.
-	 *
-	 * Using PG_checked also guarantees that we re-verify hash pages that
-	 * get evicted and re-instantiated from the backing storage, as new
-	 * pages always start out with PG_checked cleared.
-	 */
+	/* Merkle tree block size == PAGE_SIZE */
 	if (!vi->hash_block_verified)
-		return PageChecked(hpage);
+		return block->verified;
 
-	/*
-	 * When the Merkle tree block size and page size differ, we use a bitmap
-	 * to indicate whether each hash block has been verified.
-	 *
-	 * However, we still need to ensure that hash pages that get evicted and
-	 * re-instantiated from the backing storage are re-verified.  To do
-	 * this, we use PG_checked again, but now it doesn't really mean
-	 * "checked".  Instead, now it just serves as an indicator for whether
-	 * the hash page is newly instantiated or not.  If the page is new, as
-	 * indicated by PG_checked=0, we clear the bitmap bits for the page's
-	 * blocks since they are untrustworthy, then set PG_checked=1.
-	 * Otherwise we return the bitmap bit for the requested block.
-	 *
-	 * Multiple threads may execute this code concurrently on the same page.
-	 * This is safe because we use memory barriers to ensure that if a
-	 * thread sees PG_checked=1, then it also sees the associated bitmap
-	 * clearing to have occurred.  Also, all writes and their corresponding
-	 * reads are atomic, and all writes are safe to repeat in the event that
-	 * multiple threads get into the PG_checked=0 section.  (Clearing a
-	 * bitmap bit again at worst causes a hash block to be verified
-	 * redundantly.  That event should be very rare, so it's not worth using
-	 * a lock to avoid.  Setting PG_checked again has no effect.)
-	 */
-	if (PageChecked(hpage)) {
-		/*
-		 * A read memory barrier is needed here to give ACQUIRE
-		 * semantics to the above PageChecked() test.
-		 */
-		smp_rmb();
-		return test_bit(hblock_idx, vi->hash_block_verified);
-	}
-	blocks_per_page = vi->tree_params.blocks_per_page;
-	hblock_idx = round_down(hblock_idx, blocks_per_page);
-	for (i = 0; i < blocks_per_page; i++)
-		clear_bit(hblock_idx + i, vi->hash_block_verified);
-	/*
-	 * A write memory barrier is needed here to give RELEASE semantics to
-	 * the below SetPageChecked() operation.
-	 */
-	smp_wmb();
-	SetPageChecked(hpage);
-	return false;
+	return test_bit(hblock_idx, vi->hash_block_verified);
 }
 
 /*
@@ -95,15 +44,15 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 	const struct merkle_tree_params *params = &vi->tree_params;
 	const unsigned int hsize = params->digest_size;
 	int level;
+	int err;
+	int num_ra_pages;
 	u8 _want_hash[FS_VERITY_MAX_DIGEST_SIZE];
 	const u8 *want_hash;
 	u8 real_hash[FS_VERITY_MAX_DIGEST_SIZE];
 	/* The hash blocks that are traversed, indexed by level */
 	struct {
-		/* Page containing the hash block */
-		struct page *page;
-		/* Mapped address of the hash block (will be within @page) */
-		const void *addr;
+		/* Buffer containing the hash block */
+		struct fsverity_blockbuf block;
 		/* Index of the hash block in the tree overall */
 		unsigned long index;
 		/* Byte offset of the wanted hash relative to @addr */
@@ -144,10 +93,8 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		unsigned long next_hidx;
 		unsigned long hblock_idx;
 		pgoff_t hpage_idx;
-		unsigned int hblock_offset_in_page;
 		unsigned int hoffset;
-		struct page *hpage;
-		const void *haddr;
+		struct fsverity_blockbuf *block = &hblocks[level].block;
 
 		/*
 		 * The index of the block in the current level; also the index
@@ -161,33 +108,27 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		/* Index of the hash page in the tree overall */
 		hpage_idx = hblock_idx >> params->log_blocks_per_page;
 
-		/* Byte offset of the hash block within the page */
-		hblock_offset_in_page =
-			(hblock_idx << params->log_blocksize) & ~PAGE_MASK;
-
 		/* Byte offset of the hash within the block */
 		hoffset = (hidx << params->log_digestsize) &
 			  (params->block_size - 1);
 
-		hpage = inode->i_sb->s_vop->read_merkle_tree_page(inode,
-				hpage_idx, level == 0 ? min(max_ra_pages,
-					params->tree_pages - hpage_idx) : 0);
-		if (IS_ERR(hpage)) {
+		num_ra_pages = level == 0 ?
+			min(max_ra_pages, params->tree_pages - hpage_idx) : 0;
+		err = fsverity_read_merkle_tree_block(
+			inode, hblock_idx << params->log_blocksize, block,
+			params->log_blocksize, num_ra_pages);
+		if (err) {
 			fsverity_err(inode,
-				     "Error %ld reading Merkle tree page %lu",
-				     PTR_ERR(hpage), hpage_idx);
+				     "Error %d reading Merkle tree block %lu",
+				     err, hblock_idx);
 			goto error;
 		}
-		haddr = kmap_local_page(hpage) + hblock_offset_in_page;
-		if (is_hash_block_verified(vi, hpage, hblock_idx)) {
-			memcpy(_want_hash, haddr + hoffset, hsize);
+		if (is_hash_block_verified(vi, block, hblock_idx)) {
+			memcpy(_want_hash, block->kaddr + hoffset, hsize);
 			want_hash = _want_hash;
-			kunmap_local(haddr);
-			put_page(hpage);
+			fsverity_drop_block(inode, block);
 			goto descend;
 		}
-		hblocks[level].page = hpage;
-		hblocks[level].addr = haddr;
 		hblocks[level].index = hblock_idx;
 		hblocks[level].hoffset = hoffset;
 		hidx = next_hidx;
@@ -197,8 +138,8 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 descend:
 	/* Descend the tree verifying hash blocks. */
 	for (; level > 0; level--) {
-		struct page *hpage = hblocks[level - 1].page;
-		const void *haddr = hblocks[level - 1].addr;
+		struct fsverity_blockbuf *block = &hblocks[level - 1].block;
+		const void *haddr = block->kaddr;
 		unsigned long hblock_idx = hblocks[level - 1].index;
 		unsigned int hoffset = hblocks[level - 1].hoffset;
 
@@ -213,12 +154,10 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		 */
 		if (vi->hash_block_verified)
 			set_bit(hblock_idx, vi->hash_block_verified);
-		else
-			SetPageChecked(hpage);
+		block->verified = true;
 		memcpy(_want_hash, haddr + hoffset, hsize);
 		want_hash = _want_hash;
-		kunmap_local(haddr);
-		put_page(hpage);
+		fsverity_drop_block(inode, block);
 	}
 
 	/* Finally, verify the data block. */
@@ -236,8 +175,7 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		     params->hash_alg->name, hsize, real_hash);
 error:
 	for (; level > 0; level--) {
-		kunmap_local(hblocks[level - 1].addr);
-		put_page(hblocks[level - 1].page);
+		fsverity_drop_block(inode, &hblocks[level - 1].block);
 	}
 	return false;
 }
@@ -362,3 +300,165 @@ void __init fsverity_init_workqueue(void)
 	if (!fsverity_read_workqueue)
 		panic("failed to allocate fsverity_read_queue");
 }
+
+/**
+ * fsverity_invalidate_range() - invalidate range of Merkle tree blocks
+ * @inode: inode to which these Merkle tree blocks belong
+ * @offset: offset into the Merkle tree
+ * @size: number of bytes to invalidate starting from @offset
+ *
+ * This function invalidates/clears "verified" state of all Merkle tree blocks
+ * in the Merkle tree within the range starting from 'offset' to 'offset + size'.
+ *
+ * Note! As this function clears fs-verity bitmap and can be run from multiple
+ * threads simultaneously, filesystem has to take care of operation ordering
+ * while invalidating Merkle tree and caching it. See fsverity_invalidate_page()
+ * as reference.
+ */
+void fsverity_invalidate_range(struct inode *inode, loff_t offset,
+		size_t size)
+{
+	struct fsverity_info *vi = inode->i_verity_info;
+	const unsigned int log_blocksize = vi->tree_params.log_blocksize;
+	unsigned int i;
+	pgoff_t index = offset >> log_blocksize;
+	unsigned int blocks = size >> log_blocksize;
+
+	if (offset + size > vi->tree_params.tree_size) {
+		fsverity_err(inode,
+"Trying to invalidate beyond Merkle tree (tree %llu, offset %lld, size %zu)",
+			     vi->tree_params.tree_size, offset, size);
+		return;
+	}
+
+	for (i = 0; i < blocks; i++)
+		clear_bit(index + i, vi->hash_block_verified);
+}
+EXPORT_SYMBOL_GPL(fsverity_invalidate_range);
+
+/* fsverity_invalidate_page() - invalidate Merkle tree blocks in the page
+ * @inode: inode to which these Merkle tree blocks belong
+ * @page: page which contains blocks which need to be invalidated
+ * @index: index of the first Merkle tree block in the page
+ *
+ * This function invalidates "verified" state of all Merkle tree blocks within
+ * the 'page'.
+ *
+ * When the Merkle tree block size and page size are the same, then the
+ * ->hash_block_verified bitmap isn't allocated, and we use PG_checked
+ * to directly indicate whether the page's block has been verified. This
+ * function does nothing in this case as page is invalidated by evicting from
+ * the memory.
+ *
+ * Using PG_checked also guarantees that we re-verify hash pages that
+ * get evicted and re-instantiated from the backing storage, as new
+ * pages always start out with PG_checked cleared.
+ */
+void fsverity_invalidate_page(struct inode *inode, struct page *page,
+		pgoff_t index)
+{
+	unsigned int blocks_per_page;
+	struct fsverity_info *vi = inode->i_verity_info;
+	const unsigned int log_blocksize = vi->tree_params.log_blocksize;
+
+	/*
+	 * If bitmap is not allocated, that means that fs-verity uses PG_checked
+	 * to track verification status of the blocks.
+	 */
+	if (!vi->hash_block_verified)
+		return;
+
+	/*
+	 * When the Merkle tree block size and page size differ, we use a bitmap
+	 * to indicate whether each hash block has been verified.
+	 *
+	 * However, we still need to ensure that hash pages that get evicted and
+	 * re-instantiated from the backing storage are re-verified.  To do
+	 * this, we use PG_checked again, but now it doesn't really mean
+	 * "checked".  Instead, now it just serves as an indicator for whether
+	 * the hash page is newly instantiated or not.  If the page is new, as
+	 * indicated by PG_checked=0, we clear the bitmap bits for the page's
+	 * blocks since they are untrustworthy, then set PG_checked=1.
+	 *
+	 * Multiple threads may execute this code concurrently on the same page.
+	 * This is safe because we use memory barriers to ensure that if a
+	 * thread sees PG_checked=1, then it also sees the associated bitmap
+	 * clearing to have occurred.  Also, all writes and their corresponding
+	 * reads are atomic, and all writes are safe to repeat in the event that
+	 * multiple threads get into the PG_checked=0 section.  (Clearing a
+	 * bitmap bit again at worst causes a hash block to be verified
+	 * redundantly.  That event should be very rare, so it's not worth using
+	 * a lock to avoid.  Setting PG_checked again has no effect.)
+	 */
+	if (PageChecked(page)) {
+		/*
+		 * A read memory barrier is needed here to give ACQUIRE
+		 * semantics to the above PageChecked() test.
+		 */
+		smp_rmb();
+		return;
+	}
+
+	blocks_per_page = vi->tree_params.blocks_per_page;
+	index = round_down(index, blocks_per_page);
+	fsverity_invalidate_range(inode, index << log_blocksize, PAGE_SIZE);
+	/*
+	 * A write memory barrier is needed here to give RELEASE
+	 * semantics to the below SetPageChecked() operation.
+	 */
+	smp_wmb();
+	SetPageChecked(page);
+}
+
+void fsverity_drop_block(struct inode *inode,
+		struct fsverity_blockbuf *block)
+{
+	if (inode->i_sb->s_vop->drop_block)
+		inode->i_sb->s_vop->drop_block(block);
+	else {
+		struct page *page = (struct page *)block->context;
+
+		/* Merkle tree block size == PAGE_SIZE; */
+		if (block->verified)
+			SetPageChecked(page);
+
+		kunmap_local(block->kaddr);
+		put_page(page);
+	}
+}
+
+int fsverity_read_merkle_tree_block(struct inode *inode,
+					u64 pos,
+					struct fsverity_blockbuf *block,
+					unsigned int log_blocksize,
+					unsigned long num_ra_pages)
+{
+	struct page *page;
+	int err = 0;
+	unsigned long index = pos >> PAGE_SHIFT;
+
+	if (inode->i_sb->s_vop->read_merkle_tree_block)
+		return inode->i_sb->s_vop->read_merkle_tree_block(
+			inode, pos, block, log_blocksize, num_ra_pages);
+
+	page = inode->i_sb->s_vop->read_merkle_tree_page(
+			inode, index, num_ra_pages);
+	if (IS_ERR(page)) {
+		err = PTR_ERR(page);
+		fsverity_err(inode,
+			     "Error %d reading Merkle tree page %lu",
+			     err, index);
+		return PTR_ERR(page);
+	}
+
+	fsverity_invalidate_page(inode, page, index);
+	/*
+	 * For the block size == PAGE_SIZE case, set ->verified from PG_checked,
+	 * which indicates whether the block in the page has been verified.
+	 */
+	block->verified = PageChecked(page);
+	block->kaddr = kmap_local_page(page) + (pos & (PAGE_SIZE - 1));
+	block->context = page;
+
+	return 0;
+}
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index ab7b0772899b..fb2d4fccec0c 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -26,6 +26,36 @@
 /* Arbitrary limit to bound the kmalloc() size.  Can be changed. */
 #define FS_VERITY_MAX_DESCRIPTOR_SIZE	16384
 
+/**
+ * struct fsverity_blockbuf - Merkle Tree block
+ * @kaddr: virtual address of the block's data
+ * @size: buffer size
+ * @verified: true if block is verified against Merkle tree
+ * @context: filesystem private context
+ *
+ * Buffer containing a single Merkle tree block. These buffers are passed
+ *  - to the filesystem, when fs-verity is building/writing the Merkle tree,
+ *  - from the filesystem, when fs-verity is reading the Merkle tree from disk.
+ * The filesystem sets kaddr together with size to point to the memory that
+ * contains the Merkle tree block. fs-verity does the same when a Merkle tree
+ * block needs to be written to disk.
+ *
+ * While reading the tree, fs-verity calls ->read_merkle_tree_block followed by
+ * ->drop_block to let the filesystem know that the memory can be freed.
+ *
+ * When the Merkle tree block size == PAGE_SIZE, fs-verity sets the verified
+ * flag to true if the block in the buffer has been verified.
+ *
+ * The context is optional. The filesystem can use this field to pass state
+ * from ->read_merkle_tree_block to ->drop_block.
+ */
+struct fsverity_blockbuf {
+	void *kaddr;
+	unsigned int size;
+	bool verified;
+	void *context;
+};
+
 /* Verity operations for filesystems */
 struct fsverity_operations {
 
@@ -107,6 +137,32 @@ struct fsverity_operations {
 					      pgoff_t index,
 					      unsigned long num_ra_pages);
 
+	/**
+	 * Read a Merkle tree block of the given inode.
+	 * @inode: the inode
+	 * @pos: byte offset of the block within the Merkle tree
+	 * @block: block buffer for filesystem to point it to the block
+	 * @log_blocksize: size of the expected block
+	 * @num_ra_pages: The number of pages with blocks that should be
+	 *		  prefetched starting at @index if the page at @index
+	 *		  isn't already cached.  Implementations may ignore this
+	 *		  argument; it's only a performance optimization.
+	 *
+	 * This can be called at any time on an open verity file.  It may be
+	 * called by multiple processes concurrently.
+	 *
+	 * As the filesystem caches the blocks, this function needs to tell
+	 * fs-verity which blocks are no longer valid (were evicted from memory)
+	 * by calling fsverity_invalidate_range().
+	 *
+	 * Return: 0 on success, -errno on failure
+	 */
+	int (*read_merkle_tree_block)(struct inode *inode,
+				      u64 pos,
+				      struct fsverity_blockbuf *block,
+				      unsigned int log_blocksize,
+				      unsigned long num_ra_pages);
+
 	/**
 	 * Write a Merkle tree block to the given inode.
 	 *
@@ -122,6 +178,16 @@ struct fsverity_operations {
 	 */
 	int (*write_merkle_tree_block)(struct inode *inode, const void *buf,
 				       u64 pos, unsigned int size);
+
+	/**
+	 * Release the reference to a Merkle tree block
+	 *
+	 * @block: the block to release
+	 *
+	 * This is called when fs-verity is done with a block obtained with
+	 * ->read_merkle_tree_block().
+	 */
+	void (*drop_block)(struct fsverity_blockbuf *block);
 };
 
 #ifdef CONFIG_FS_VERITY
@@ -175,6 +241,9 @@ int fsverity_ioctl_read_metadata(struct file *filp, const void __user *uarg);
 bool fsverity_verify_blocks(struct folio *folio, size_t len, size_t offset);
 void fsverity_verify_bio(struct bio *bio);
 void fsverity_enqueue_verify_work(struct work_struct *work);
+void fsverity_invalidate_range(struct inode *inode, loff_t offset, size_t size);
+void fsverity_invalidate_page(struct inode *inode, struct page *page,
+		pgoff_t index);
 
 #else /* !CONFIG_FS_VERITY */
 
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 08/25] fsverity: calculate readahead in bytes instead of pages
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
                   ` (6 preceding siblings ...)
  2024-02-12 16:58 ` [PATCH v4 07/25] fsverity: support block-based Merkle tree caching Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  2024-02-23  5:29   ` Eric Biggers
  2024-02-12 16:58 ` [PATCH v4 09/25] fsverity: add tracepoints Andrey Albershteyn
                   ` (16 subsequent siblings)
  24 siblings, 1 reply; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn

Change the readahead unit from pages to bytes, as fs-verity now
mainly works with blocks instead of pages.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
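A userspace sketch of the byte-based readahead arithmetic this patch
switches to, for illustration only. The sketch_* names and the 4 KiB
page size are assumptions made for the example; the kernel code uses
PAGE_SHIFT and the fields of struct merkle_tree_params instead.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel takes PAGE_SHIFT from the arch. */
#define SKETCH_PAGE_SHIFT 12

/* Readahead budget for a data readahead bio: a quarter of the bio size. */
static uint64_t sketch_max_ra_bytes(uint64_t bio_size)
{
	return bio_size >> 2;
}

/* Clamp the budget so readahead never runs past the end of the tree. */
static uint64_t sketch_ra_bytes(uint64_t max_ra_bytes, uint64_t tree_size,
				uint64_t block_offset)
{
	uint64_t remaining;

	assert(block_offset <= tree_size);
	remaining = tree_size - block_offset;
	return max_ra_bytes < remaining ? max_ra_bytes : remaining;
}

/* Page-based ->read_merkle_tree_page() callers still get whole pages. */
static unsigned long sketch_ra_pages(uint64_t ra_bytes)
{
	return (unsigned long)(ra_bytes >> SKETCH_PAGE_SHIFT);
}
```

Note that a byte budget smaller than one page rounds down to zero pages
in the page-based fallback, so small readahead requests become no-ops
for filesystems that only implement ->read_merkle_tree_page().
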
 fs/verity/fsverity_private.h |  4 ++--
 fs/verity/verify.c           | 41 +++++++++++++++++++-----------------
 include/linux/fsverity.h     |  6 +++---
 3 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index 72ac1cdd9e63..2bf1f94d437c 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -170,7 +170,7 @@ void fsverity_drop_block(struct inode *inode,
  * @inode: inode in use for verification or metadata reading
  * @pos: byte offset of the block within the Merkle tree
  * @block: block to read
- * @num_ra_pages: number of pages to readahead, may be ignored
+ * @ra_bytes: number of bytes to readahead, may be ignored
  *
  * Depending on fs implementation use read_merkle_tree_block() or
  * read_merkle_tree_page() to read blocks.
@@ -179,6 +179,6 @@ int fsverity_read_merkle_tree_block(struct inode *inode,
 				    u64 pos,
 				    struct fsverity_blockbuf *block,
 				    unsigned int log_blocksize,
-				    unsigned long num_ra_pages);
+				    u64 ra_bytes);
 
 #endif /* _FSVERITY_PRIVATE_H */
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index 414ec3321fe6..6f4ff420c075 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -39,13 +39,12 @@ static bool is_hash_block_verified(struct fsverity_info *vi,
  */
 static bool
 verify_data_block(struct inode *inode, struct fsverity_info *vi,
-		  const void *data, u64 data_pos, unsigned long max_ra_pages)
+		  const void *data, u64 data_pos, u64 max_ra_bytes)
 {
 	const struct merkle_tree_params *params = &vi->tree_params;
 	const unsigned int hsize = params->digest_size;
 	int level;
 	int err;
-	int num_ra_pages;
 	u8 _want_hash[FS_VERITY_MAX_DIGEST_SIZE];
 	const u8 *want_hash;
 	u8 real_hash[FS_VERITY_MAX_DIGEST_SIZE];
@@ -92,9 +91,11 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 	for (level = 0; level < params->num_levels; level++) {
 		unsigned long next_hidx;
 		unsigned long hblock_idx;
-		pgoff_t hpage_idx;
 		unsigned int hoffset;
 		struct fsverity_blockbuf *block = &hblocks[level].block;
+		u64 block_offset;
+		u64 ra_bytes = 0;
+		u64 tree_size;
 
 		/*
 		 * The index of the block in the current level; also the index
@@ -105,18 +106,20 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		/* Index of the hash block in the tree overall */
 		hblock_idx = params->level_start[level] + next_hidx;
 
-		/* Index of the hash page in the tree overall */
-		hpage_idx = hblock_idx >> params->log_blocks_per_page;
+		/* Offset of the Merkle tree block into the tree */
+		block_offset = hblock_idx << params->log_blocksize;
 
 		/* Byte offset of the hash within the block */
 		hoffset = (hidx << params->log_digestsize) &
 			  (params->block_size - 1);
 
-		num_ra_pages = level == 0 ?
-			min(max_ra_pages, params->tree_pages - hpage_idx) : 0;
+		if (level == 0) {
+			tree_size = params->tree_pages << PAGE_SHIFT;
+			ra_bytes = min(max_ra_bytes, (tree_size - block_offset));
+		}
 		err = fsverity_read_merkle_tree_block(
-			inode, hblock_idx << params->log_blocksize, block,
-			params->log_blocksize, num_ra_pages);
+			inode, block_offset, block,
+			params->log_blocksize, ra_bytes);
 		if (err) {
 			fsverity_err(inode,
 				     "Error %d reading Merkle tree block %lu",
@@ -182,7 +185,7 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 
 static bool
 verify_data_blocks(struct folio *data_folio, size_t len, size_t offset,
-		   unsigned long max_ra_pages)
+		   u64 max_ra_bytes)
 {
 	struct inode *inode = data_folio->mapping->host;
 	struct fsverity_info *vi = inode->i_verity_info;
@@ -200,7 +203,7 @@ verify_data_blocks(struct folio *data_folio, size_t len, size_t offset,
 
 		data = kmap_local_folio(data_folio, offset);
 		valid = verify_data_block(inode, vi, data, pos + offset,
-					  max_ra_pages);
+					  max_ra_bytes);
 		kunmap_local(data);
 		if (!valid)
 			return false;
@@ -246,24 +249,24 @@ EXPORT_SYMBOL_GPL(fsverity_verify_blocks);
 void fsverity_verify_bio(struct bio *bio)
 {
 	struct folio_iter fi;
-	unsigned long max_ra_pages = 0;
+	u64 max_ra_bytes = 0;
 
 	if (bio->bi_opf & REQ_RAHEAD) {
 		/*
 		 * If this bio is for data readahead, then we also do readahead
 		 * of the first (largest) level of the Merkle tree.  Namely,
-		 * when a Merkle tree page is read, we also try to piggy-back on
-		 * some additional pages -- up to 1/4 the number of data pages.
+		 * when a Merkle tree block is read, we also try to piggy-back on
+		 * some additional bytes -- up to 1/4 of the data size.
 		 *
 		 * This improves sequential read performance, as it greatly
 		 * reduces the number of I/O requests made to the Merkle tree.
 		 */
-		max_ra_pages = bio->bi_iter.bi_size >> (PAGE_SHIFT + 2);
+		max_ra_bytes = bio->bi_iter.bi_size >> 2;
 	}
 
 	bio_for_each_folio_all(fi, bio) {
 		if (!verify_data_blocks(fi.folio, fi.length, fi.offset,
-					max_ra_pages)) {
+					max_ra_bytes)) {
 			bio->bi_status = BLK_STS_IOERR;
 			break;
 		}
@@ -431,7 +434,7 @@ int fsverity_read_merkle_tree_block(struct inode *inode,
 					u64 pos,
 					struct fsverity_blockbuf *block,
 					unsigned int log_blocksize,
-					unsigned long num_ra_pages)
+					u64 ra_bytes)
 {
 	struct page *page;
 	int err = 0;
@@ -439,10 +442,10 @@ int fsverity_read_merkle_tree_block(struct inode *inode,
 
 	if (inode->i_sb->s_vop->read_merkle_tree_block)
 		return inode->i_sb->s_vop->read_merkle_tree_block(
-			inode, pos, block, log_blocksize, num_ra_pages);
+			inode, pos, block, log_blocksize, ra_bytes);
 
 	page = inode->i_sb->s_vop->read_merkle_tree_page(
-			inode, index, num_ra_pages);
+			inode, index, (ra_bytes >> PAGE_SHIFT));
 	if (IS_ERR(page)) {
 		err = PTR_ERR(page);
 		fsverity_err(inode,
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index fb2d4fccec0c..7bb0e044c44e 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -143,8 +143,8 @@ struct fsverity_operations {
 	 * @pos: byte offset of the block within the Merkle tree
 	 * @block: block buffer for filesystem to point it to the block
 	 * @log_blocksize: size of the expected block
-	 * @num_ra_pages: The number of pages with blocks that should be
-	 *		  prefetched starting at @index if the page at @index
+	 * @ra_bytes: The number of bytes that should be
+	 *		  prefetched starting at @pos if the data at @pos
 	 *		  isn't already cached.  Implementations may ignore this
 	 *		  argument; it's only a performance optimization.
 	 *
@@ -161,7 +161,7 @@ struct fsverity_operations {
 				      u64 pos,
 				      struct fsverity_blockbuf *block,
 				      unsigned int log_blocksize,
-				      unsigned long num_ra_pages);
+				      u64 ra_bytes);
 
 	/**
 	 * Write a Merkle tree block to the given inode.
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 09/25] fsverity: add tracepoints
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
                   ` (7 preceding siblings ...)
  2024-02-12 16:58 ` [PATCH v4 08/25] fsverity: calculate readahead in bytes instead of pages Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  2024-02-23  5:31   ` Eric Biggers
  2024-02-12 16:58 ` [PATCH v4 10/25] iomap: integrate fsverity verification into iomap's read path Andrey Albershteyn
                   ` (15 subsequent siblings)
  24 siblings, 1 reply; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn

fs-verity previously had debug printk calls, but they were removed.
This patch adds tracepoints in the same places where the printk calls
were used (with a few additional ones).

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
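A userspace sketch of how two of the new events derive what they print,
for illustration only. The constants are copied from the header added
below; the sketch_* helpers are hypothetical and are not part of the
patch. Both direction constants are non-zero bit flags, so the decoder
compares against the named constants rather than against zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values copied from include/trace/events/fsverity.h in this patch. */
#define FSVERITY_TRACE_DIR_ASCEND	(1ul << 0)
#define FSVERITY_TRACE_DIR_DESCEND	(1ul << 1)
#define FSVERITY_HASH_SHOWN_LEN		20

/* Hypothetical helper: map a direction flag to the label shown in a trace. */
static const char *sketch_dir_name(unsigned long direction)
{
	if (direction == FSVERITY_TRACE_DIR_ASCEND)
		return "ascend";
	if (direction == FSVERITY_TRACE_DIR_DESCEND)
		return "descend";
	return "unknown";
}

/* Hypothetical helper: number of signature bytes the trace line prints. */
static size_t sketch_sig_shown_len(size_t sig_size)
{
	return sig_size > FSVERITY_HASH_SHOWN_LEN ?
		FSVERITY_HASH_SHOWN_LEN : sig_size;
}
```
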
 fs/verity/enable.c              |   3 +
 fs/verity/fsverity_private.h    |   2 +
 fs/verity/init.c                |   1 +
 fs/verity/signature.c           |   2 +
 fs/verity/verify.c              |  10 ++
 include/trace/events/fsverity.h | 184 ++++++++++++++++++++++++++++++++
 6 files changed, 202 insertions(+)
 create mode 100644 include/trace/events/fsverity.h

diff --git a/fs/verity/enable.c b/fs/verity/enable.c
index 04e060880b79..945eba0092ab 100644
--- a/fs/verity/enable.c
+++ b/fs/verity/enable.c
@@ -227,6 +227,8 @@ static int enable_verity(struct file *filp,
 	if (err)
 		goto out;
 
+	trace_fsverity_enable(inode, desc, &params);
+
 	/*
 	 * Start enabling verity on this file, serialized by the inode lock.
 	 * Fail if verity is already enabled or is already being enabled.
@@ -255,6 +257,7 @@ static int enable_verity(struct file *filp,
 		fsverity_err(inode, "Error %d building Merkle tree", err);
 		goto rollback;
 	}
+	trace_fsverity_tree_done(inode, desc, &params);
 
 	/*
 	 * Create the fsverity_info.  Don't bother trying to save work by
diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index 2bf1f94d437c..4ac9786235b5 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -181,4 +181,6 @@ int fsverity_read_merkle_tree_block(struct inode *inode,
 				    unsigned int log_blocksize,
 				    u64 ra_bytes);
 
+#include <trace/events/fsverity.h>
+
 #endif /* _FSVERITY_PRIVATE_H */
diff --git a/fs/verity/init.c b/fs/verity/init.c
index cb2c9aac61ed..3769d2dc9e3b 100644
--- a/fs/verity/init.c
+++ b/fs/verity/init.c
@@ -5,6 +5,7 @@
  * Copyright 2019 Google LLC
  */
 
+#define CREATE_TRACE_POINTS
 #include "fsverity_private.h"
 
 #include <linux/ratelimit.h>
diff --git a/fs/verity/signature.c b/fs/verity/signature.c
index 90c07573dd77..c1f08bb32ed1 100644
--- a/fs/verity/signature.c
+++ b/fs/verity/signature.c
@@ -53,6 +53,8 @@ int fsverity_verify_signature(const struct fsverity_info *vi,
 	struct fsverity_formatted_digest *d;
 	int err;
 
+	trace_fsverity_verify_signature(inode, signature, sig_size);
+
 	if (sig_size == 0) {
 		if (fsverity_require_signatures) {
 			fsverity_err(inode,
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index 6f4ff420c075..4375b0cd176e 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -57,6 +57,7 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		/* Byte offset of the wanted hash relative to @addr */
 		unsigned int hoffset;
 	} hblocks[FS_VERITY_MAX_LEVELS];
+	trace_fsverity_verify_block(inode, data_pos);
 	/*
 	 * The index of the previous level's block within that level; also the
 	 * index of that block's hash within the current level.
@@ -129,6 +130,9 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		if (is_hash_block_verified(vi, block, hblock_idx)) {
 			memcpy(_want_hash, block->kaddr + hoffset, hsize);
 			want_hash = _want_hash;
+			trace_fsverity_merkle_tree_block_verified(inode,
+					hblock_idx,
+					FSVERITY_TRACE_DIR_ASCEND);
 			fsverity_drop_block(inode, block);
 			goto descend;
 		}
@@ -160,6 +164,8 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		block->verified = true;
 		memcpy(_want_hash, haddr + hoffset, hsize);
 		want_hash = _want_hash;
+		trace_fsverity_merkle_tree_block_verified(inode, hblock_idx,
+				FSVERITY_TRACE_DIR_DESCEND);
 		fsverity_drop_block(inode, block);
 	}
 
@@ -334,6 +340,8 @@ void fsverity_invalidate_range(struct inode *inode, loff_t offset,
 		return;
 	}
 
+	trace_fsverity_invalidate_blocks(inode, index, blocks);
+
 	for (i = 0; i < blocks; i++)
 		clear_bit(index + i, vi->hash_block_verified);
 }
@@ -440,6 +448,8 @@ int fsverity_read_merkle_tree_block(struct inode *inode,
 	int err = 0;
 	unsigned long index = pos >> PAGE_SHIFT;
 
+	trace_fsverity_read_merkle_tree_block(inode, pos, log_blocksize);
+
 	if (inode->i_sb->s_vop->read_merkle_tree_block)
 		return inode->i_sb->s_vop->read_merkle_tree_block(
 			inode, pos, block, log_blocksize, ra_bytes);
diff --git a/include/trace/events/fsverity.h b/include/trace/events/fsverity.h
new file mode 100644
index 000000000000..3cc429d21443
--- /dev/null
+++ b/include/trace/events/fsverity.h
@@ -0,0 +1,184 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM fsverity
+
+#if !defined(_TRACE_FSVERITY_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_FSVERITY_H
+
+#include <linux/tracepoint.h>
+
+struct fsverity_descriptor;
+struct merkle_tree_params;
+struct fsverity_info;
+
+#define FSVERITY_TRACE_DIR_ASCEND	(1ul << 0)
+#define FSVERITY_TRACE_DIR_DESCEND	(1ul << 1)
+#define FSVERITY_HASH_SHOWN_LEN		20
+
+TRACE_EVENT(fsverity_enable,
+	TP_PROTO(struct inode *inode, struct fsverity_descriptor *desc,
+		struct merkle_tree_params *params),
+	TP_ARGS(inode, desc, params),
+	TP_STRUCT__entry(
+		__field(ino_t, ino)
+		__field(u64, data_size)
+		__field(unsigned int, block_size)
+		__field(unsigned int, num_levels)
+		__field(u64, tree_size)
+	),
+	TP_fast_assign(
+		__entry->ino = inode->i_ino;
+		__entry->data_size = desc->data_size;
+		__entry->block_size = params->block_size;
+		__entry->num_levels = params->num_levels;
+		__entry->tree_size = params->tree_size;
+	),
+	TP_printk("ino %lu data size %llu tree size %llu block size %u levels %u",
+		(unsigned long) __entry->ino,
+		__entry->data_size,
+		__entry->tree_size,
+		__entry->block_size,
+		__entry->num_levels)
+);
+
+TRACE_EVENT(fsverity_tree_done,
+	TP_PROTO(struct inode *inode, struct fsverity_descriptor *desc,
+		struct merkle_tree_params *params),
+	TP_ARGS(inode, desc, params),
+	TP_STRUCT__entry(
+		__field(ino_t, ino)
+		__field(unsigned int, levels)
+		__field(unsigned int, tree_blocks)
+		__field(u64, tree_size)
+		__array(u8, tree_hash, 64)
+	),
+	TP_fast_assign(
+		__entry->ino = inode->i_ino;
+		__entry->levels = params->num_levels;
+		__entry->tree_blocks =
+			params->tree_size >> params->log_blocksize;
+		__entry->tree_size = params->tree_size;
+		memcpy(__entry->tree_hash, desc->root_hash, 64);
+	),
+	TP_printk("ino %lu levels %u tree_blocks %u tree_size %llu root_hash %s",
+		(unsigned long) __entry->ino,
+		__entry->levels,
+		__entry->tree_blocks,
+		__entry->tree_size,
+		__print_hex(__entry->tree_hash, 64))
+);
+
+TRACE_EVENT(fsverity_verify_block,
+	TP_PROTO(struct inode *inode, u64 offset),
+	TP_ARGS(inode, offset),
+	TP_STRUCT__entry(
+		__field(ino_t, ino)
+		__field(u64, offset)
+		__field(unsigned int, block_size)
+	),
+	TP_fast_assign(
+		__entry->ino = inode->i_ino;
+		__entry->offset = offset;
+		__entry->block_size =
+			inode->i_verity_info->tree_params.block_size;
+	),
+	TP_printk("ino %lu data offset %llu data block size %u",
+		(unsigned long) __entry->ino,
+		__entry->offset,
+		__entry->block_size)
+);
+
+TRACE_EVENT(fsverity_merkle_tree_block_verified,
+	TP_PROTO(struct inode *inode, u64 index, u8 direction),
+	TP_ARGS(inode, index, direction),
+	TP_STRUCT__entry(
+		__field(ino_t, ino)
+		__field(u64, index)
+		__field(u8, direction)
+	),
+	TP_fast_assign(
+		__entry->ino = inode->i_ino;
+		__entry->index = index;
+		__entry->direction = direction;
+	),
+	TP_printk("ino %lu block index %llu %s",
+		(unsigned long) __entry->ino,
+		__entry->index,
+		__entry->direction == FSVERITY_TRACE_DIR_ASCEND ? "ascend" : "descend")
+);
+
+TRACE_EVENT(fsverity_invalidate_blocks,
+	TP_PROTO(struct inode *inode, u64 index, size_t blocks),
+	TP_ARGS(inode, index, blocks),
+	TP_STRUCT__entry(
+		__field(ino_t, ino)
+		__field(unsigned int, block_size)
+		__field(u64, offset)
+		__field(u64, index)
+		__field(size_t, blocks)
+	),
+	TP_fast_assign(
+		__entry->ino = inode->i_ino;
+		__entry->block_size = inode->i_verity_info->tree_params.log_blocksize;
+		__entry->offset = index << __entry->block_size;
+		__entry->index = index;
+		__entry->blocks = blocks;
+	),
+	TP_printk("ino %lu tree offset %llu block index %llu num blocks %zu",
+		(unsigned long) __entry->ino,
+		__entry->offset,
+		__entry->index,
+		__entry->blocks)
+);
+
+TRACE_EVENT(fsverity_read_merkle_tree_block,
+	TP_PROTO(struct inode *inode, u64 offset, unsigned int log_blocksize),
+	TP_ARGS(inode, offset, log_blocksize),
+	TP_STRUCT__entry(
+		__field(ino_t, ino)
+		__field(u64, offset)
+		__field(u64, index)
+		__field(unsigned int, block_size)
+	),
+	TP_fast_assign(
+		__entry->ino = inode->i_ino;
+		__entry->offset = offset;
+		__entry->index = offset >> log_blocksize;
+		__entry->block_size = 1 << log_blocksize;
+	),
+	TP_printk("ino %lu tree offset %llu block index %llu block size %u",
+		(unsigned long) __entry->ino,
+		__entry->offset,
+		__entry->index,
+		__entry->block_size)
+);
+
+TRACE_EVENT(fsverity_verify_signature,
+	TP_PROTO(const struct inode *inode, const u8 *signature, size_t sig_size),
+	TP_ARGS(inode, signature, sig_size),
+	TP_STRUCT__entry(
+		__field(ino_t, ino)
+		__dynamic_array(u8, signature, sig_size)
+		__field(size_t, sig_size)
+		__field(size_t, sig_size_show)
+	),
+	TP_fast_assign(
+		__entry->ino = inode->i_ino;
+		memcpy(__get_dynamic_array(signature), signature, sig_size);
+		__entry->sig_size = sig_size;
+		__entry->sig_size_show = (sig_size > FSVERITY_HASH_SHOWN_LEN ?
+			FSVERITY_HASH_SHOWN_LEN : sig_size);
+	),
+	TP_printk("ino %lu sig_size %zu %s%s%s",
+		(unsigned long) __entry->ino,
+		__entry->sig_size,
+		(__entry->sig_size ? "sig " : ""),
+		__print_hex(__get_dynamic_array(signature),
+			__entry->sig_size_show),
+		(__entry->sig_size ? "..." : ""))
+);
+
+#endif /* _TRACE_FSVERITY_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 10/25] iomap: integrate fsverity verification into iomap's read path
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
                   ` (8 preceding siblings ...)
  2024-02-12 16:58 ` [PATCH v4 09/25] fsverity: add tracepoints Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  2024-02-12 16:58 ` [PATCH v4 11/25] xfs: add XBF_VERITY_SEEN xfs_buf flag Andrey Albershteyn
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn, Christoph Hellwig

This patch adds fsverity verification into iomap's read path. After
a bio's I/O completes, the data is verified against fsverity's Merkle
tree. The verification work is done in a separate workqueue.

Even though fsverity can create its own workqueue, this patch allows
filesystems to pass any workqueue for fs-verity verification work
items. This is handy for XFS as fsverity's high priority global
workqueue isn't the best fit (potential livelock, global
cross-filesystem queue).

On the read path, a struct iomap_fsverity_bio (a work item plus the
bio) is allocated in place of a plain bio if FS_VERITY is enabled and
fs-verity is active on the inode.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
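A userspace sketch of the "work item stored side by side with the bio"
layout, for illustration only. All sketch_* types and functions are
stand-ins invented for this example. In the patch itself the wrapper is
struct iomap_fsverity_bio, it is allocated from a bioset whose front
padding is offsetof(struct iomap_fsverity_bio, bio), and the work item
is handed to queue_work() on the workqueue stored in bio->bi_private.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins; the real types live in the kernel. */
struct sketch_work {
	void (*func)(struct sketch_work *work);
};

struct sketch_bio {
	void *bi_private;	/* the workqueue pointer in the patch */
	int verified;		/* set where fsverity_verify_bio() would run */
};

/* Same layout idea as struct iomap_fsverity_bio: work in front of bio. */
struct sketch_fsverity_bio {
	struct sketch_work work;
	struct sketch_bio bio;
};

#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Runs later from the workqueue: verify, then finish the read. */
static void sketch_verify_work(struct sketch_work *work)
{
	struct sketch_fsverity_bio *fbio =
		sketch_container_of(work, struct sketch_fsverity_bio, work);

	fbio->bio.verified = 1;
}

/* bi_end_io stand-in: recover the wrapper and arm the work item. */
static struct sketch_work *sketch_end_io(struct sketch_bio *bio)
{
	struct sketch_fsverity_bio *fbio;

	assert(bio != NULL);
	fbio = sketch_container_of(bio, struct sketch_fsverity_bio, bio);
	fbio->work.func = sketch_verify_work;
	return &fbio->work;	/* the kernel would queue_work() this */
}
```

Because the work item lives in the same allocation as the bio, the
completion handler needs no extra allocation in interrupt context; it
only needs the bio pointer to find the work item.
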
 fs/erofs/data.c        |   4 +-
 fs/gfs2/aops.c         |   4 +-
 fs/iomap/buffered-io.c | 102 ++++++++++++++++++++++++++++++++++++-----
 fs/xfs/xfs_aops.c      |   4 +-
 fs/zonefs/file.c       |   4 +-
 include/linux/iomap.h  |   6 ++-
 6 files changed, 103 insertions(+), 21 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index c98aeda8abb2..462917830b50 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -356,12 +356,12 @@ int erofs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
  */
 static int erofs_read_folio(struct file *file, struct folio *folio)
 {
-	return iomap_read_folio(folio, &erofs_iomap_ops);
+	return iomap_read_folio(folio, &erofs_iomap_ops, NULL);
 }
 
 static void erofs_readahead(struct readahead_control *rac)
 {
-	return iomap_readahead(rac, &erofs_iomap_ops);
+	return iomap_readahead(rac, &erofs_iomap_ops, NULL);
 }
 
 static sector_t erofs_bmap(struct address_space *mapping, sector_t block)
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 974aca9c8ea8..ede423796125 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -452,7 +452,7 @@ static int gfs2_read_folio(struct file *file, struct folio *folio)
 
 	if (!gfs2_is_jdata(ip) ||
 	    (i_blocksize(inode) == PAGE_SIZE && !folio_buffers(folio))) {
-		error = iomap_read_folio(folio, &gfs2_iomap_ops);
+		error = iomap_read_folio(folio, &gfs2_iomap_ops, NULL);
 	} else if (gfs2_is_stuffed(ip)) {
 		error = stuffed_read_folio(ip, folio);
 	} else {
@@ -527,7 +527,7 @@ static void gfs2_readahead(struct readahead_control *rac)
 	else if (gfs2_is_jdata(ip))
 		mpage_readahead(rac, gfs2_block_map);
 	else
-		iomap_readahead(rac, &gfs2_iomap_ops);
+		iomap_readahead(rac, &gfs2_iomap_ops, NULL);
 }
 
 /**
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 093c4515b22a..719c3dec9652 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -6,6 +6,7 @@
 #include <linux/module.h>
 #include <linux/compiler.h>
 #include <linux/fs.h>
+#include <linux/fsverity.h>
 #include <linux/iomap.h>
 #include <linux/pagemap.h>
 #include <linux/uio.h>
@@ -289,6 +290,7 @@ struct iomap_readpage_ctx {
 	bool			cur_folio_in_bio;
 	struct bio		*bio;
 	struct readahead_control *rac;
+	struct workqueue_struct	*wq;
 };
 
 /**
@@ -330,6 +332,57 @@ static inline bool iomap_block_needs_zeroing(const struct iomap_iter *iter,
 		pos >= i_size_read(iter->inode);
 }
 
+#ifdef CONFIG_FS_VERITY
+struct iomap_fsverity_bio {
+	struct work_struct	work;
+	struct bio		bio;
+};
+static struct bio_set iomap_fsverity_bioset;
+
+static void
+iomap_read_fsverify_end_io_work(struct work_struct *work)
+{
+	struct iomap_fsverity_bio *fbio =
+		container_of(work, struct iomap_fsverity_bio, work);
+
+	fsverity_verify_bio(&fbio->bio);
+	iomap_read_end_io(&fbio->bio);
+}
+
+static void
+iomap_read_fsverity_end_io(struct bio *bio)
+{
+	struct iomap_fsverity_bio *fbio =
+		container_of(bio, struct iomap_fsverity_bio, bio);
+
+	INIT_WORK(&fbio->work, iomap_read_fsverify_end_io_work);
+	queue_work(bio->bi_private, &fbio->work);
+}
+#endif /* CONFIG_FS_VERITY */
+
+static struct bio *iomap_read_bio_alloc(struct inode *inode,
+		struct block_device *bdev, int nr_vecs, gfp_t gfp,
+		struct workqueue_struct *wq)
+{
+	struct bio *bio;
+
+#ifdef CONFIG_FS_VERITY
+	if (fsverity_active(inode)) {
+		bio = bio_alloc_bioset(bdev, nr_vecs, REQ_OP_READ, gfp,
+					&iomap_fsverity_bioset);
+		if (bio) {
+			bio->bi_private = wq;
+			bio->bi_end_io = iomap_read_fsverity_end_io;
+		}
+		return bio;
+	}
+#endif
+	bio = bio_alloc(bdev, nr_vecs, REQ_OP_READ, gfp);
+	if (bio)
+		bio->bi_end_io = iomap_read_end_io;
+	return bio;
+}
+
 static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 		struct iomap_readpage_ctx *ctx, loff_t offset)
 {
@@ -353,6 +406,12 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 
 	if (iomap_block_needs_zeroing(iter, pos)) {
 		folio_zero_range(folio, poff, plen);
+		if (fsverity_active(iter->inode) &&
+		    !fsverity_verify_blocks(folio, plen, poff)) {
+			folio_set_error(folio);
+			goto done;
+		}
+
 		iomap_set_range_uptodate(folio, poff, plen);
 		goto done;
 	}
@@ -370,28 +429,29 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 	    !bio_add_folio(ctx->bio, folio, plen, poff)) {
 		gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
 		gfp_t orig_gfp = gfp;
-		unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
 
 		if (ctx->bio)
 			submit_bio(ctx->bio);
 
 		if (ctx->rac) /* same as readahead_gfp_mask */
 			gfp |= __GFP_NORETRY | __GFP_NOWARN;
-		ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
-				     REQ_OP_READ, gfp);
+
+		ctx->bio = iomap_read_bio_alloc(iter->inode, iomap->bdev,
+				bio_max_segs(DIV_ROUND_UP(length, PAGE_SIZE)),
+				gfp, ctx->wq);
+
 		/*
 		 * If the bio_alloc fails, try it again for a single page to
 		 * avoid having to deal with partial page reads.  This emulates
 		 * what do_mpage_read_folio does.
 		 */
 		if (!ctx->bio) {
-			ctx->bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ,
-					     orig_gfp);
+			ctx->bio = iomap_read_bio_alloc(iter->inode,
+					iomap->bdev, 1, orig_gfp, ctx->wq);
 		}
 		if (ctx->rac)
 			ctx->bio->bi_opf |= REQ_RAHEAD;
 		ctx->bio->bi_iter.bi_sector = sector;
-		ctx->bio->bi_end_io = iomap_read_end_io;
 		bio_add_folio_nofail(ctx->bio, folio, plen, poff);
 	}
 
@@ -405,7 +465,8 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 	return pos - orig_pos + plen;
 }
 
-int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
+int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops,
+		struct workqueue_struct *wq)
 {
 	struct iomap_iter iter = {
 		.inode		= folio->mapping->host,
@@ -414,6 +475,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
 	};
 	struct iomap_readpage_ctx ctx = {
 		.cur_folio	= folio,
+		.wq		= wq,
 	};
 	int ret;
 
@@ -471,6 +533,7 @@ static loff_t iomap_readahead_iter(const struct iomap_iter *iter,
  * iomap_readahead - Attempt to read pages from a file.
  * @rac: Describes the pages to be read.
  * @ops: The operations vector for the filesystem.
+ * @wq: Workqueue for post-I/O processing (only needed for fsverity)
  *
  * This function is for filesystems to call to implement their readahead
  * address_space operation.
@@ -482,7 +545,8 @@ static loff_t iomap_readahead_iter(const struct iomap_iter *iter,
  * function is called with memalloc_nofs set, so allocations will not cause
  * the filesystem to be reentered.
  */
-void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
+void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops,
+		struct workqueue_struct *wq)
 {
 	struct iomap_iter iter = {
 		.inode	= rac->mapping->host,
@@ -491,6 +555,7 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
 	};
 	struct iomap_readpage_ctx ctx = {
 		.rac	= rac,
+		.wq	= wq,
 	};
 
 	trace_iomap_readahead(rac->mapping->host, readahead_count(rac));
@@ -1996,10 +2061,25 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
 }
 EXPORT_SYMBOL_GPL(iomap_writepages);
 
+#define IOMAP_POOL_SIZE		(4 * (PAGE_SIZE / SECTOR_SIZE))
+
 static int __init iomap_init(void)
 {
-	return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
-			   offsetof(struct iomap_ioend, io_inline_bio),
-			   BIOSET_NEED_BVECS);
+	int error;
+
+	error = bioset_init(&iomap_ioend_bioset, IOMAP_POOL_SIZE,
+			    offsetof(struct iomap_ioend, io_inline_bio),
+			    BIOSET_NEED_BVECS);
+#ifdef CONFIG_FS_VERITY
+	if (error)
+		return error;
+
+	error = bioset_init(&iomap_fsverity_bioset, IOMAP_POOL_SIZE,
+			    offsetof(struct iomap_fsverity_bio, bio),
+			    BIOSET_NEED_BVECS);
+	if (error)
+		bioset_exit(&iomap_ioend_bioset);
+#endif
+	return error;
 }
 fs_initcall(iomap_init);
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 813f85156b0c..7a6627404160 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -553,14 +553,14 @@ xfs_vm_read_folio(
 	struct file		*unused,
 	struct folio		*folio)
 {
-	return iomap_read_folio(folio, &xfs_read_iomap_ops);
+	return iomap_read_folio(folio, &xfs_read_iomap_ops, NULL);
 }
 
 STATIC void
 xfs_vm_readahead(
 	struct readahead_control	*rac)
 {
-	iomap_readahead(rac, &xfs_read_iomap_ops);
+	iomap_readahead(rac, &xfs_read_iomap_ops, NULL);
 }
 
 static int
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 6ab2318a9c8e..d7a166bf15ac 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -112,12 +112,12 @@ static const struct iomap_ops zonefs_write_iomap_ops = {
 
 static int zonefs_read_folio(struct file *unused, struct folio *folio)
 {
-	return iomap_read_folio(folio, &zonefs_read_iomap_ops);
+	return iomap_read_folio(folio, &zonefs_read_iomap_ops, NULL);
 }
 
 static void zonefs_readahead(struct readahead_control *rac)
 {
-	iomap_readahead(rac, &zonefs_read_iomap_ops);
+	iomap_readahead(rac, &zonefs_read_iomap_ops, NULL);
 }
 
 /*
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 96dd0acbba44..c7522eb3a8ea 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -262,8 +262,10 @@ int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
 		struct iomap *iomap, loff_t pos, loff_t length, ssize_t written,
 		int (*punch)(struct inode *inode, loff_t pos, loff_t length));
 
-int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops);
-void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
+int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops,
+		struct workqueue_struct *wq);
+void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops,
+		struct workqueue_struct *wq);
 bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
 struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len);
 bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
-- 
2.42.0



* [PATCH v4 11/25] xfs: add XBF_VERITY_SEEN xfs_buf flag
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
                   ` (9 preceding siblings ...)
  2024-02-12 16:58 ` [PATCH v4 10/25] iomap: integrate fsverity verification into iomap's read path Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  2024-02-12 16:58 ` [PATCH v4 12/25] xfs: add XFS_DA_OP_BUFFER to make xfs_attr_get() return buffer Andrey Albershteyn
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn

One of the essential ideas of fs-verity is that pages which are
already verified don't need to be re-verified while they are still in
the page cache.

XFS stores Merkle tree blocks in extended file attributes. When read,
the extended attribute data is put into an xfs_buf.

fs-verity uses the PG_checked flag to track the status of the blocks
in a page. This flag can have two meanings: it tells whether the page
was re-instantiated, or it tells that the only block in the page is
verified.

However, in XFS, the data in the buffer is not aligned with the
xfs_buf pages and we don't have a reference to these pages. Moreover,
these pages are released when the value is copied out in the xfs_attr
code. In other words, we cannot directly mark the underlying xfs_buf's
pages as verified, as fs-verity does for other filesystems.

One way to track that these pages were processed by fs-verity is to
mark the buffer as verified instead. If the buffer is evicted, the
incore XBF_VERITY_SEEN flag is lost. When the xattr is read again,
xfs_attr_get() returns a new buffer without the flag. The xfs_buf's
flag is then used to tell fs-verity whether this buffer was cached or
not.

The second state indicated by PG_checked is whether the only block in
the page is verified. This is not the case for XFS, as there could be
multiple blocks in a single buffer (page size 64k, block size 4k).
This is handled by the fs-verity bitmap. fs-verity always uses the
bitmap for XFS regardless of the Merkle tree block size.

The meaning of the flag is that the value of the extended attribute
in the buffer has been processed by fs-verity.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
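The tracking scheme from the commit message can be illustrated with a
small userspace sketch. It is not kernel code: demo_buf, demo_cache,
demo_read(), demo_evict() and demo_needs_verify() are hypothetical
names, and a one-slot cache stands in for the xfs_buf cache. It only
shows that a flag kept on the cached buffer disappears on eviction,
so the next read is treated as unverified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_VERITY_SEEN	(1u << 8)	/* mirrors XBF_VERITY_SEEN, demo only */

/* Hypothetical stand-in for an xfs_buf: only the flags word matters here. */
struct demo_buf {
	unsigned int	flags;
};

/* Hypothetical one-slot cache standing in for the xfs_buf cache. */
static struct demo_buf *demo_cache;

/* Return the cached buffer, or a freshly allocated one with no flags set. */
struct demo_buf *demo_read(void)
{
	if (!demo_cache) {
		demo_cache = calloc(1, sizeof(*demo_cache));
		assert(demo_cache != NULL);
	}
	return demo_cache;
}

/* Eviction frees the buffer, so the in-core flag is lost with it. */
void demo_evict(void)
{
	free(demo_cache);
	demo_cache = NULL;
}

/*
 * Return true if the block has to be verified now, false if the buffer
 * stayed in cache since it was last processed.
 */
bool demo_needs_verify(struct demo_buf *bp)
{
	if (bp->flags & DEMO_VERITY_SEEN)
		return false;
	bp->flags |= DEMO_VERITY_SEEN;
	return true;
}
```

In this sketch a second demo_read() without an eviction in between
returns the same buffer with the flag still set, while a read after
demo_evict() returns a clean buffer that needs verification again.
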
 fs/xfs/xfs_buf.h | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index b470de08a46c..8f418f726592 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -24,14 +24,15 @@ struct xfs_buf;
 
 #define XFS_BUF_DADDR_NULL	((xfs_daddr_t) (-1LL))
 
-#define XBF_READ	 (1u << 0) /* buffer intended for reading from device */
-#define XBF_WRITE	 (1u << 1) /* buffer intended for writing to device */
-#define XBF_READ_AHEAD	 (1u << 2) /* asynchronous read-ahead */
-#define XBF_NO_IOACCT	 (1u << 3) /* bypass I/O accounting (non-LRU bufs) */
-#define XBF_ASYNC	 (1u << 4) /* initiator will not wait for completion */
-#define XBF_DONE	 (1u << 5) /* all pages in the buffer uptodate */
-#define XBF_STALE	 (1u << 6) /* buffer has been staled, do not find it */
-#define XBF_WRITE_FAIL	 (1u << 7) /* async writes have failed on this buffer */
+#define XBF_READ		(1u << 0) /* buffer intended for reading from device */
+#define XBF_WRITE		(1u << 1) /* buffer intended for writing to device */
+#define XBF_READ_AHEAD		(1u << 2) /* asynchronous read-ahead */
+#define XBF_NO_IOACCT		(1u << 3) /* bypass I/O accounting (non-LRU bufs) */
+#define XBF_ASYNC		(1u << 4) /* initiator will not wait for completion */
+#define XBF_DONE		(1u << 5) /* all pages in the buffer uptodate */
+#define XBF_STALE		(1u << 6) /* buffer has been staled, do not find it */
+#define XBF_WRITE_FAIL		(1u << 7) /* async writes have failed on this buffer */
+#define XBF_VERITY_SEEN		(1u << 8) /* buffer was processed by fs-verity */
 
 /* buffer type flags for write callbacks */
 #define _XBF_INODES	 (1u << 16)/* inode buffer */
@@ -65,6 +66,7 @@ typedef unsigned int xfs_buf_flags_t;
 	{ XBF_DONE,		"DONE" }, \
 	{ XBF_STALE,		"STALE" }, \
 	{ XBF_WRITE_FAIL,	"WRITE_FAIL" }, \
+	{ XBF_VERITY_SEEN,	"VERITY_SEEN" }, \
 	{ _XBF_INODES,		"INODES" }, \
 	{ _XBF_DQUOTS,		"DQUOTS" }, \
 	{ _XBF_LOGRECOVERY,	"LOG_RECOVERY" }, \
-- 
2.42.0



* [PATCH v4 12/25] xfs: add XFS_DA_OP_BUFFER to make xfs_attr_get() return buffer
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
                   ` (10 preceding siblings ...)
  2024-02-12 16:58 ` [PATCH v4 11/25] xfs: add XBF_VERITY_SEEN xfs_buf flag Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  2024-02-12 16:58 ` [PATCH v4 13/25] xfs: introduce workqueue for post read IO work Andrey Albershteyn
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn

With the XBF_VERITY_SEEN flag on xfs_buf, XFS can track which buffers
contain verified Merkle tree blocks. However, we also need to expose
the buffer to pass a reference to the underlying page to fs-verity.

This patch adds XFS_DA_OP_BUFFER to tell xfs_attr_get() to
xfs_buf_hold() the underlying buffer and return it as
xfs_da_args->bp. The caller must then xfs_buf_rele() the buffer.
Therefore, XFS holds a reference to the xfs_buf while fs-verity is
verifying the xfs_buf's content.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
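The ownership rule described in the commit message (the lookup hands
an extra reference to the caller, who must drop it) can be sketched
in plain C. This is a simplified userspace model, not the XFS code:
demo_buf, demo_args, DEMO_OP_BUFFER and the demo_* helpers are
hypothetical stand-ins for xfs_buf, xfs_da_args, XFS_DA_OP_BUFFER and
xfs_buf_hold()/xfs_buf_rele().

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_OP_BUFFER	(1u << 9)	/* mirrors XFS_DA_OP_BUFFER, demo only */

/* Hypothetical refcounted buffer. */
struct demo_buf {
	int		refcount;
};

/* Hypothetical lookup arguments: flags in, optional buffer out. */
struct demo_args {
	unsigned int	op_flags;
	struct demo_buf	*bp;
};

void demo_buf_hold(struct demo_buf *bp)
{
	bp->refcount++;
}

void demo_buf_rele(struct demo_buf *bp)
{
	assert(bp->refcount > 0);
	bp->refcount--;
}

/*
 * Look up a value stored in @bp. The lookup holds its own reference
 * while it works. With DEMO_OP_BUFFER set it takes one more reference
 * and returns the buffer in args->bp; that reference belongs to the
 * caller, who must drop it with demo_buf_rele().
 */
int demo_attr_get(struct demo_buf *bp, struct demo_args *args)
{
	demo_buf_hold(bp);		/* reference used during the lookup */
	args->bp = NULL;
	if (args->op_flags & DEMO_OP_BUFFER) {
		demo_buf_hold(bp);	/* reference handed to the caller */
		args->bp = bp;
	}
	demo_buf_rele(bp);		/* lookup is done with its reference */
	return 0;
}
```

A caller that sets DEMO_OP_BUFFER sees the refcount raised by one
after the call and restores it with a single demo_buf_rele(); a
caller that does not set the flag gets args->bp == NULL and owes
nothing.
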
 fs/xfs/libxfs/xfs_attr.c        |  5 ++++-
 fs/xfs/libxfs/xfs_attr_leaf.c   |  7 +++++++
 fs/xfs/libxfs/xfs_attr_remote.c | 13 +++++++++++--
 fs/xfs/libxfs/xfs_da_btree.h    |  5 ++++-
 4 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index f9846df41669..8e3138af4a5f 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -252,6 +252,8 @@ xfs_attr_get_ilocked(
  * If the attribute is found, but exceeds the size limit set by the caller in
  * args->valuelen, return -ERANGE with the size of the attribute that was found
  * in args->valuelen.
+ *
+ * With XFS_DA_OP_BUFFER set, the caller has to release the buffer args->bp.
  */
 int
 xfs_attr_get(
@@ -270,7 +272,8 @@ xfs_attr_get(
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
 
 	/* Entirely possible to look up a name which doesn't exist */
-	args->op_flags = XFS_DA_OP_OKNOENT;
+	args->op_flags = XFS_DA_OP_OKNOENT |
+					(args->op_flags & XFS_DA_OP_BUFFER);
 
 	lock_mode = xfs_ilock_attr_map_shared(args->dp);
 	error = xfs_attr_get_ilocked(args);
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index 6374bf107242..51aa5d5df76c 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -2449,6 +2449,13 @@ xfs_attr3_leaf_getvalue(
 		name_loc = xfs_attr3_leaf_name_local(leaf, args->index);
 		ASSERT(name_loc->namelen == args->namelen);
 		ASSERT(memcmp(args->name, name_loc->nameval, args->namelen) == 0);
+
+		/* must be released by the caller */
+		if (args->op_flags & XFS_DA_OP_BUFFER) {
+			xfs_buf_hold(bp);
+			args->bp = bp;
+		}
+
 		return xfs_attr_copy_value(args,
 					&name_loc->nameval[args->namelen],
 					be16_to_cpu(name_loc->valuelen));
diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index d440393b40eb..72908e0e1c86 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -424,9 +424,18 @@ xfs_attr_rmtval_get(
 			error = xfs_attr_rmtval_copyout(mp, bp, args->dp->i_ino,
 							&offset, &valuelen,
 							&dst);
-			xfs_buf_relse(bp);
-			if (error)
+			xfs_buf_unlock(bp);
+			/* must be released by the caller */
+			if (args->op_flags & XFS_DA_OP_BUFFER)
+				args->bp = bp;
+			else
+				xfs_buf_rele(bp);
+
+			if (error) {
+				if (args->op_flags & XFS_DA_OP_BUFFER)
+					xfs_buf_rele(args->bp);
 				return error;
+			}
 
 			/* roll attribute extent map forwards */
 			lblkno += map[i].br_blockcount;
diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
index 706baf36e175..1534f4102a47 100644
--- a/fs/xfs/libxfs/xfs_da_btree.h
+++ b/fs/xfs/libxfs/xfs_da_btree.h
@@ -59,6 +59,7 @@ typedef struct xfs_da_args {
 	uint8_t		filetype;	/* filetype of inode for directories */
 	void		*value;		/* set of bytes (maybe contain NULLs) */
 	int		valuelen;	/* length of value */
+	struct xfs_buf	*bp;		/* OUT: xfs_buf which contains the attr */
 	unsigned int	attr_filter;	/* XFS_ATTR_{ROOT,SECURE,INCOMPLETE} */
 	unsigned int	attr_flags;	/* XATTR_{CREATE,REPLACE} */
 	xfs_dahash_t	hashval;	/* hash value of name */
@@ -93,6 +94,7 @@ typedef struct xfs_da_args {
 #define XFS_DA_OP_REMOVE	(1u << 6) /* this is a remove operation */
 #define XFS_DA_OP_RECOVERY	(1u << 7) /* Log recovery operation */
 #define XFS_DA_OP_LOGGED	(1u << 8) /* Use intent items to track op */
+#define XFS_DA_OP_BUFFER	(1u << 9) /* Return underlying buffer */
 
 #define XFS_DA_OP_FLAGS \
 	{ XFS_DA_OP_JUSTCHECK,	"JUSTCHECK" }, \
@@ -103,7 +105,8 @@ typedef struct xfs_da_args {
 	{ XFS_DA_OP_NOTIME,	"NOTIME" }, \
 	{ XFS_DA_OP_REMOVE,	"REMOVE" }, \
 	{ XFS_DA_OP_RECOVERY,	"RECOVERY" }, \
-	{ XFS_DA_OP_LOGGED,	"LOGGED" }
+	{ XFS_DA_OP_LOGGED,	"LOGGED" }, \
+	{ XFS_DA_OP_BUFFER,	"BUFFER" }
 
 /*
  * Storage for holding state during Btree searches and split/join ops.
-- 
2.42.0



* [PATCH v4 13/25] xfs: introduce workqueue for post read IO work
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
                   ` (11 preceding siblings ...)
  2024-02-12 16:58 ` [PATCH v4 12/25] xfs: add XFS_DA_OP_BUFFER to make xfs_attr_get() return buffer Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  2024-02-15 22:11   ` Dave Chinner
  2024-02-12 16:58 ` [PATCH v4 14/25] xfs: add attribute type for fs-verity Andrey Albershteyn
                   ` (11 subsequent siblings)
  24 siblings, 1 reply; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn

As noted by Dave there are two problems with using fs-verity's
workqueue in XFS:

1. High priority workqueues are used within XFS to ensure that data
   IO completion cannot stall processing of journal IO completions.
   Hence using a WQ_HIGHPRI workqueue directly in the user data IO
   path is a potential filesystem livelock/deadlock vector.

2. The fsverity workqueue is global - it creates a cross-filesystem
   contention point.

This patch adds a per-filesystem, per-cpu workqueue for fsverity
work.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/xfs_aops.c  | 15 +++++++++++++--
 fs/xfs/xfs_linux.h |  1 +
 fs/xfs/xfs_mount.h |  1 +
 fs/xfs/xfs_super.c |  9 +++++++++
 4 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 7a6627404160..70e444c151b2 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -548,19 +548,30 @@ xfs_vm_bmap(
 	return iomap_bmap(mapping, block, &xfs_read_iomap_ops);
 }
 
+static inline struct workqueue_struct *
+xfs_fsverity_wq(
+	struct address_space	*mapping)
+{
+	if (fsverity_active(mapping->host))
+		return XFS_I(mapping->host)->i_mount->m_postread_workqueue;
+	return NULL;
+}
+
 STATIC int
 xfs_vm_read_folio(
 	struct file		*unused,
 	struct folio		*folio)
 {
-	return iomap_read_folio(folio, &xfs_read_iomap_ops, NULL);
+	return iomap_read_folio(folio, &xfs_read_iomap_ops,
+				xfs_fsverity_wq(folio->mapping));
 }
 
 STATIC void
 xfs_vm_readahead(
 	struct readahead_control	*rac)
 {
-	iomap_readahead(rac, &xfs_read_iomap_ops, NULL);
+	iomap_readahead(rac, &xfs_read_iomap_ops,
+			xfs_fsverity_wq(rac->mapping));
 }
 
 static int
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index d7873e0360f0..9c76e025b5d8 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -64,6 +64,7 @@ typedef __u32			xfs_nlink_t;
 #include <linux/xattr.h>
 #include <linux/mnt_idmapping.h>
 #include <linux/debugfs.h>
+#include <linux/fsverity.h>
 
 #include <asm/page.h>
 #include <asm/div64.h>
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 503fe3c7edbf..f64bf75f50d6 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -109,6 +109,7 @@ typedef struct xfs_mount {
 	struct xfs_mru_cache	*m_filestream;  /* per-mount filestream data */
 	struct workqueue_struct *m_buf_workqueue;
 	struct workqueue_struct	*m_unwritten_workqueue;
+	struct workqueue_struct	*m_postread_workqueue;
 	struct workqueue_struct	*m_reclaim_workqueue;
 	struct workqueue_struct	*m_sync_workqueue;
 	struct workqueue_struct *m_blockgc_wq;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 5a2512d20bd0..b2b6c1f24c42 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -553,6 +553,12 @@ xfs_init_mount_workqueues(
 	if (!mp->m_unwritten_workqueue)
 		goto out_destroy_buf;
 
+	mp->m_postread_workqueue = alloc_workqueue("xfs-pread/%s",
+			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
+			0, mp->m_super->s_id);
+	if (!mp->m_postread_workqueue)
+		goto out_destroy_postread;
+
 	mp->m_reclaim_workqueue = alloc_workqueue("xfs-reclaim/%s",
 			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
 			0, mp->m_super->s_id);
@@ -586,6 +592,8 @@ xfs_init_mount_workqueues(
 	destroy_workqueue(mp->m_reclaim_workqueue);
 out_destroy_unwritten:
 	destroy_workqueue(mp->m_unwritten_workqueue);
+out_destroy_postread:
+	destroy_workqueue(mp->m_postread_workqueue);
 out_destroy_buf:
 	destroy_workqueue(mp->m_buf_workqueue);
 out:
@@ -601,6 +609,7 @@ xfs_destroy_mount_workqueues(
 	destroy_workqueue(mp->m_inodegc_wq);
 	destroy_workqueue(mp->m_reclaim_workqueue);
 	destroy_workqueue(mp->m_unwritten_workqueue);
+	destroy_workqueue(mp->m_postread_workqueue);
 	destroy_workqueue(mp->m_buf_workqueue);
 }
 
-- 
2.42.0



* [PATCH v4 14/25] xfs: add attribute type for fs-verity
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
                   ` (12 preceding siblings ...)
  2024-02-12 16:58 ` [PATCH v4 13/25] xfs: introduce workqueue for post read IO work Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  2024-02-12 16:58 ` [PATCH v4 15/25] xfs: make xfs_buf_get() to take XBF_* flags Andrey Albershteyn
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn

The Merkle tree blocks and descriptor are stored in the extended
attributes of the inode. Add a new attribute type for fs-verity
metadata. Add XFS_ATTR_INTERNAL_MASK to skip parent pointer and
fs-verity attributes, as those are only for internal use. While we're
at it, add a few comments in relevant places noting that internally
visible attributes are not supposed to be handled via the interface
defined in xfs_xattr.c.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
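The effect of XFS_ATTR_INTERNAL_MASK on attribute listing can be
sketched as follows. This is a userspace illustration only: the bit
positions are the ones defined in this patch, while demo_attr,
demo_count_listed() and the attribute names used with it are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bit positions as defined in xfs_da_format.h by this patch. */
#define DEMO_ATTR_ROOT		(1u << 1)
#define DEMO_ATTR_SECURE	(1u << 2)
#define DEMO_ATTR_PARENT	(1u << 3)
#define DEMO_ATTR_VERITY	(1u << 4)

/* Internal attributes which are not exposed to the user. */
#define DEMO_ATTR_INTERNAL_MASK	(DEMO_ATTR_PARENT | DEMO_ATTR_VERITY)

/* Hypothetical in-memory view of one attribute entry. */
struct demo_attr {
	const char	*name;
	unsigned int	flags;
};

/* Count the entries a listing would report, skipping internal ones. */
size_t demo_count_listed(const struct demo_attr *attrs, size_t n)
{
	size_t		listed = 0;

	assert(attrs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (attrs[i].flags & DEMO_ATTR_INTERNAL_MASK)
			continue;	/* parent pointer or fs-verity attr */
		listed++;
	}
	return listed;
}
```

With one user attribute, one trusted (ROOT) attribute, one parent
pointer and one Merkle tree block, only the first two are counted.
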
 fs/xfs/libxfs/xfs_da_format.h  | 10 +++++++++-
 fs/xfs/libxfs/xfs_log_format.h |  1 +
 fs/xfs/xfs_ioctl.c             |  5 +++++
 fs/xfs/xfs_trace.h             |  3 ++-
 fs/xfs/xfs_xattr.c             | 10 ++++++++++
 5 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 1b79c4de90bc..05b82e5b64fa 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -704,14 +704,22 @@ struct xfs_attr3_leafblock {
 #define	XFS_ATTR_ROOT_BIT	1	/* limit access to trusted attrs */
 #define	XFS_ATTR_SECURE_BIT	2	/* limit access to secure attrs */
 #define	XFS_ATTR_PARENT_BIT	3	/* parent pointer attrs */
+#define	XFS_ATTR_VERITY_BIT	4	/* verity merkle tree and descriptor */
 #define	XFS_ATTR_INCOMPLETE_BIT	7	/* attr in middle of create/delete */
 #define XFS_ATTR_LOCAL		(1u << XFS_ATTR_LOCAL_BIT)
 #define XFS_ATTR_ROOT		(1u << XFS_ATTR_ROOT_BIT)
 #define XFS_ATTR_SECURE		(1u << XFS_ATTR_SECURE_BIT)
 #define XFS_ATTR_PARENT		(1u << XFS_ATTR_PARENT_BIT)
+#define XFS_ATTR_VERITY		(1u << XFS_ATTR_VERITY_BIT)
 #define XFS_ATTR_INCOMPLETE	(1u << XFS_ATTR_INCOMPLETE_BIT)
 #define XFS_ATTR_NSP_ONDISK_MASK \
-			(XFS_ATTR_ROOT | XFS_ATTR_SECURE | XFS_ATTR_PARENT)
+			(XFS_ATTR_ROOT | XFS_ATTR_SECURE | XFS_ATTR_PARENT | \
+			 XFS_ATTR_VERITY)
+
+/*
+ * Internal attributes not exposed to the user
+ */
+#define XFS_ATTR_INTERNAL_MASK (XFS_ATTR_PARENT | XFS_ATTR_VERITY)
 
 /*
  * Alignment for namelist and valuelist entries (since they are mixed
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index eb7406c6ea41..8bc83d9645fe 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -973,6 +973,7 @@ struct xfs_icreate_log {
 #define XFS_ATTRI_FILTER_MASK		(XFS_ATTR_ROOT | \
 					 XFS_ATTR_SECURE | \
 					 XFS_ATTR_PARENT | \
+					 XFS_ATTR_VERITY | \
 					 XFS_ATTR_INCOMPLETE)
 
 /*
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index f02b6e558af5..048d83acda0a 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -352,6 +352,11 @@ static unsigned int
 xfs_attr_filter(
 	u32			ioc_flags)
 {
+	/*
+	 * Only externally visible attributes should be specified here.
+	 * Internally used attributes (such as parent pointers or fs-verity)
+	 * should not be exposed to userspace.
+	 */
 	if (ioc_flags & XFS_IOC_ATTR_ROOT)
 		return XFS_ATTR_ROOT;
 	if (ioc_flags & XFS_IOC_ATTR_SECURE)
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 07e8a69f8e56..0dd78a43c1f1 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -84,7 +84,8 @@ struct xfs_perag;
 	{ XFS_ATTR_ROOT,	"ROOT" }, \
 	{ XFS_ATTR_SECURE,	"SECURE" }, \
 	{ XFS_ATTR_INCOMPLETE,	"INCOMPLETE" }, \
-	{ XFS_ATTR_PARENT,	"PARENT" }
+	{ XFS_ATTR_PARENT,	"PARENT" }, \
+	{ XFS_ATTR_VERITY,	"VERITY" }
 
 DECLARE_EVENT_CLASS(xfs_attr_list_class,
 	TP_PROTO(struct xfs_attr_list_context *ctx),
diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index 364104e1b38a..e4c88dde4e44 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -20,6 +20,13 @@
 
 #include <linux/posix_acl_xattr.h>
 
+/*
+ * This file defines the interface for working with externally visible
+ * extended attributes, such as those in the user, system or security
+ * namespaces. This interface should not be used for internally used
+ * attributes (consider xfs_attr.c).
+ */
+
 /*
  * Get permission to use log-assisted atomic exchange of file extents.
  *
@@ -244,6 +251,9 @@ xfs_xattr_put_listent(
 
 	ASSERT(context->count >= 0);
 
+	if (flags & XFS_ATTR_INTERNAL_MASK)
+		return;
+
 	if (flags & XFS_ATTR_ROOT) {
 #ifdef CONFIG_XFS_POSIX_ACL
 		if (namelen == SGI_ACL_FILE_SIZE &&
-- 
2.42.0



* [PATCH v4 15/25] xfs: make xfs_buf_get() to take XBF_* flags
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
                   ` (13 preceding siblings ...)
  2024-02-12 16:58 ` [PATCH v4 14/25] xfs: add attribute type for fs-verity Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  2024-02-12 16:58 ` [PATCH v4 16/25] xfs: add XBF_DOUBLE_ALLOC to increase size of the buffer Andrey Albershteyn
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn

Allow passing XBF_* buffer flags through xfs_buf_get(). This will let
fs-verity specify a flag for an increased buffer size.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/libxfs/xfs_attr_remote.c | 2 +-
 fs/xfs/libxfs/xfs_sb.c          | 2 +-
 fs/xfs/xfs_buf.h                | 3 ++-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index 72908e0e1c86..5762135dc2a6 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -521,7 +521,7 @@ xfs_attr_rmtval_set_value(
 		dblkno = XFS_FSB_TO_DADDR(mp, map.br_startblock),
 		dblkcnt = XFS_FSB_TO_BB(mp, map.br_blockcount);
 
-		error = xfs_buf_get(mp->m_ddev_targp, dblkno, dblkcnt, &bp);
+		error = xfs_buf_get(mp->m_ddev_targp, dblkno, dblkcnt, 0, &bp);
 		if (error)
 			return error;
 		bp->b_ops = &xfs_attr3_rmt_buf_ops;
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 5bb6e2bd6dee..f08108c9a297 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -1100,7 +1100,7 @@ xfs_update_secondary_sbs(
 
 		error = xfs_buf_get(mp->m_ddev_targp,
 				 XFS_AG_DADDR(mp, pag->pag_agno, XFS_SB_DADDR),
-				 XFS_FSS_TO_BB(mp, 1), &bp);
+				 XFS_FSS_TO_BB(mp, 1), 0, &bp);
 		/*
 		 * If we get an error reading or writing alternate superblocks,
 		 * continue.  xfs_repair chooses the "best" superblock based
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 8f418f726592..80566ee444f8 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -245,11 +245,12 @@ xfs_buf_get(
 	struct xfs_buftarg	*target,
 	xfs_daddr_t		blkno,
 	size_t			numblks,
+	xfs_buf_flags_t		flags,
 	struct xfs_buf		**bpp)
 {
 	DEFINE_SINGLE_BUF_MAP(map, blkno, numblks);
 
-	return xfs_buf_get_map(target, &map, 1, 0, bpp);
+	return xfs_buf_get_map(target, &map, 1, flags, bpp);
 }
 
 static inline int
-- 
2.42.0



* [PATCH v4 16/25] xfs: add XBF_DOUBLE_ALLOC to increase size of the buffer
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
                   ` (14 preceding siblings ...)
  2024-02-12 16:58 ` [PATCH v4 15/25] xfs: make xfs_buf_get() to take XBF_* flags Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  2024-02-12 16:58 ` [PATCH v4 17/25] xfs: add fs-verity ro-compat flag Andrey Albershteyn
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn

For fs-verity integration, XFS needs to supply the kernel addresses
(kaddrs) of Merkle tree blocks to the fs-verity core and track which
blocks are already verified. One way to track the verified status is
to set an xfs_buf flag (the previously added XBF_VERITY_SEEN). When
the xfs_buf is evicted from memory we lose the verified status.
Otherwise, fs-verity hits the xfs_buf which is still in cache and
contains already verified blocks.

However, the leaf blocks which are read into the xfs_buf contain leaf
headers. xfs_attr_get() allocates new pages and copies out the data
without the headers. Those newly allocated pages with extended
attribute data are not attached to the buffer anymore.

Add a new XBF_DOUBLE_ALLOC flag which makes xfs_buf allocate twice
the memory for the buffer. The additional memory is used for a copy
of the attribute data without any headers. Also, make
xfs_attr_rmtval_get() copy the data into the buffer itself if XFS
asked for an fs-verity block.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
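The layout described in the commit message (on-disk image in the
first half of the allocation, header-free copy in the second half)
can be sketched in userspace C. The block and header sizes, demo_buf
and the demo_* functions are hypothetical and chosen only to keep the
example small; the real code works on xfs_buf pages and
xfs_attr_rmtval_copyout().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_BLOCK_SIZE	16	/* hypothetical block size */
#define DEMO_HDR_SIZE	4	/* hypothetical per-block header size */

/* Hypothetical buffer: data holds 2 * length bytes. */
struct demo_buf {
	size_t		length;	/* size of the on-disk image in bytes */
	unsigned char	*data;
};

/* Allocate twice the space needed for @nblocks, zero-filled. */
int demo_buf_alloc_double(struct demo_buf *bp, size_t nblocks)
{
	assert(nblocks > 0);
	bp->length = nblocks * DEMO_BLOCK_SIZE;
	bp->data = calloc(2, bp->length);
	return bp->data ? 0 : -1;
}

/*
 * Copy @valuelen bytes of payload from the first half of the buffer,
 * skipping each block's header, into the second half. Returns the
 * address of the header-free copy.
 */
unsigned char *demo_copyout(struct demo_buf *bp, size_t valuelen)
{
	const size_t	payload = DEMO_BLOCK_SIZE - DEMO_HDR_SIZE;
	unsigned char	*dst = bp->data + bp->length;
	size_t		copied = 0;

	assert(valuelen <= bp->length / DEMO_BLOCK_SIZE * payload);
	for (size_t off = 0; copied < valuelen; off += DEMO_BLOCK_SIZE) {
		size_t	chunk = payload;

		if (chunk > valuelen - copied)
			chunk = valuelen - copied;
		memcpy(dst + copied, bp->data + off + DEMO_HDR_SIZE, chunk);
		copied += chunk;
	}
	return dst;
}
```

Because the copy lives inside the same allocation, the address that
is handed out stays valid for as long as the buffer itself is held,
which is the property the patch relies on.
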
 fs/xfs/libxfs/xfs_attr_remote.c | 26 ++++++++++++++++++++++++--
 fs/xfs/xfs_buf.c                |  6 +++++-
 fs/xfs/xfs_buf.h                |  2 ++
 3 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index 5762135dc2a6..1d32041412cc 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -392,12 +392,22 @@ xfs_attr_rmtval_get(
 	int			blkcnt = args->rmtblkcnt;
 	int			i;
 	int			offset = 0;
+	int			flags = 0;
+	void			*addr;
 
 	trace_xfs_attr_rmtval_get(args);
 
 	ASSERT(args->valuelen != 0);
 	ASSERT(args->rmtvaluelen == args->valuelen);
 
+	/*
+	 * We also check for _OP_BUFFER as we want to trigger on
+	 * verity blocks only, not on verity_descriptor
+	 */
+	if (args->attr_filter & XFS_ATTR_VERITY &&
+			args->op_flags & XFS_DA_OP_BUFFER)
+		flags = XBF_DOUBLE_ALLOC;
+
 	valuelen = args->rmtvaluelen;
 	while (valuelen > 0) {
 		nmap = ATTR_RMTVALUE_MAPSIZE;
@@ -417,10 +427,21 @@ xfs_attr_rmtval_get(
 			dblkno = XFS_FSB_TO_DADDR(mp, map[i].br_startblock);
 			dblkcnt = XFS_FSB_TO_BB(mp, map[i].br_blockcount);
 			error = xfs_buf_read(mp->m_ddev_targp, dblkno, dblkcnt,
-					0, &bp, &xfs_attr3_rmt_buf_ops);
+					flags, &bp, &xfs_attr3_rmt_buf_ops);
 			if (error)
 				return error;
 
+			/*
+			 * For fs-verity we allocated more space. That space is
+			 * filled with the same xattr data but without leaf
+			 * headers. Point args->value to that data
+			 */
+			if (flags & XBF_DOUBLE_ALLOC) {
+				addr = xfs_buf_offset(bp, BBTOB(bp->b_length));
+				args->value = addr;
+				dst = addr;
+			}
+
 			error = xfs_attr_rmtval_copyout(mp, bp, args->dp->i_ino,
 							&offset, &valuelen,
 							&dst);
@@ -521,7 +542,8 @@ xfs_attr_rmtval_set_value(
 		dblkno = XFS_FSB_TO_DADDR(mp, map.br_startblock),
 		dblkcnt = XFS_FSB_TO_BB(mp, map.br_blockcount);
 
-		error = xfs_buf_get(mp->m_ddev_targp, dblkno, dblkcnt, 0, &bp);
+		error = xfs_buf_get(mp->m_ddev_targp, dblkno, dblkcnt,
+				XBF_DOUBLE_ALLOC, &bp);
 		if (error)
 			return error;
 		bp->b_ops = &xfs_attr3_rmt_buf_ops;
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 8e5bd50d29fe..2645e64f2439 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -328,6 +328,9 @@ xfs_buf_alloc_kmem(
 	xfs_km_flags_t	kmflag_mask = KM_NOFS;
 	size_t		size = BBTOB(bp->b_length);
 
+	if (flags & XBF_DOUBLE_ALLOC)
+		size *= 2;
+
 	/* Assure zeroed buffer for non-read cases. */
 	if (!(flags & XBF_READ))
 		kmflag_mask |= KM_ZERO;
@@ -358,6 +361,7 @@ xfs_buf_alloc_pages(
 {
 	gfp_t		gfp_mask = __GFP_NOWARN;
 	long		filled = 0;
+	int		mul = (bp->b_flags & XBF_DOUBLE_ALLOC) ? 2 : 1;
 
 	if (flags & XBF_READ_AHEAD)
 		gfp_mask |= __GFP_NORETRY;
@@ -365,7 +369,7 @@ xfs_buf_alloc_pages(
 		gfp_mask |= GFP_NOFS;
 
 	/* Make sure that we have a page list */
-	bp->b_page_count = DIV_ROUND_UP(BBTOB(bp->b_length), PAGE_SIZE);
+	bp->b_page_count = DIV_ROUND_UP(BBTOB(bp->b_length*mul), PAGE_SIZE);
 	if (bp->b_page_count <= XB_PAGES) {
 		bp->b_pages = bp->b_page_array;
 	} else {
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 80566ee444f8..8ca8760c401e 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -33,6 +33,7 @@ struct xfs_buf;
 #define XBF_STALE		(1u << 6) /* buffer has been staled, do not find it */
 #define XBF_WRITE_FAIL		(1u << 7) /* async writes have failed on this buffer */
 #define XBF_VERITY_SEEN		(1u << 8) /* buffer was processed by fs-verity */
+#define XBF_DOUBLE_ALLOC	(1u << 9) /* double allocated space */
 
 /* buffer type flags for write callbacks */
 #define _XBF_INODES	 (1u << 16)/* inode buffer */
@@ -67,6 +68,7 @@ typedef unsigned int xfs_buf_flags_t;
 	{ XBF_STALE,		"STALE" }, \
 	{ XBF_WRITE_FAIL,	"WRITE_FAIL" }, \
 	{ XBF_VERITY_SEEN,	"VERITY_SEEN" }, \
+	{ XBF_DOUBLE_ALLOC,	"DOUBLE_ALLOC" }, \
 	{ _XBF_INODES,		"INODES" }, \
 	{ _XBF_DQUOTS,		"DQUOTS" }, \
 	{ _XBF_LOGRECOVERY,	"LOG_RECOVERY" }, \
-- 
2.42.0



* [PATCH v4 17/25] xfs: add fs-verity ro-compat flag
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
                   ` (15 preceding siblings ...)
  2024-02-12 16:58 ` [PATCH v4 16/25] xfs: add XBF_DOUBLE_ALLOC to increase size of the buffer Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  2024-02-12 16:58 ` [PATCH v4 18/25] xfs: add inode on-disk VERITY flag Andrey Albershteyn
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn

To mark inodes with fs-verity enabled, the new XFS_DIFLAG2_VERITY
flag will be added in a later patch. This requires a ro-compat flag
to let older kernels know that a filesystem with fs-verity cannot be
modified.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
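How a ro-compat bit keeps older kernels from modifying the filesystem
can be sketched as a mask check. This is a simplified userspace
model: the bit values follow xfs_format.h, while demo_mount_mode(),
the enum and demo_check_verity_compat() are hypothetical and leave
out everything else a real mount does.

```c
#include <assert.h>
#include <stdint.h>

/* Read-only compatible feature bits, values as in xfs_format.h. */
#define DEMO_RO_COMPAT_FINOBT	(1u << 0)
#define DEMO_RO_COMPAT_RMAPBT	(1u << 1)
#define DEMO_RO_COMPAT_REFLINK	(1u << 2)
#define DEMO_RO_COMPAT_INOBTCNT	(1u << 3)
#define DEMO_RO_COMPAT_VERITY	(1u << 4)

enum demo_mount_mode {
	DEMO_MOUNT_RW,		/* all ro-compat bits understood */
	DEMO_MOUNT_RO_ONLY,	/* unknown bits set: read-only mount only */
};

/*
 * @sb_ro_compat: ro-compat bits found in the superblock
 * @known:        ro-compat bits this kernel understands
 */
enum demo_mount_mode demo_mount_mode(uint32_t sb_ro_compat, uint32_t known)
{
	if (sb_ro_compat & ~known)
		return DEMO_MOUNT_RO_ONLY;
	return DEMO_MOUNT_RW;
}

/* Usage: a kernel without the VERITY bit versus one with it. */
void demo_check_verity_compat(void)
{
	uint32_t	old_kernel = DEMO_RO_COMPAT_FINOBT |
				     DEMO_RO_COMPAT_RMAPBT |
				     DEMO_RO_COMPAT_REFLINK |
				     DEMO_RO_COMPAT_INOBTCNT;
	uint32_t	new_kernel = old_kernel | DEMO_RO_COMPAT_VERITY;
	uint32_t	sb = DEMO_RO_COMPAT_FINOBT | DEMO_RO_COMPAT_VERITY;

	assert(demo_mount_mode(sb, old_kernel) == DEMO_MOUNT_RO_ONLY);
	assert(demo_mount_mode(sb, new_kernel) == DEMO_MOUNT_RW);
}
```
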
 fs/xfs/libxfs/xfs_format.h | 1 +
 fs/xfs/libxfs/xfs_sb.c     | 2 ++
 fs/xfs/xfs_mount.h         | 2 ++
 3 files changed, 5 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 382ab1e71c0b..e36718c93539 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -353,6 +353,7 @@ xfs_sb_has_compat_feature(
 #define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
 #define XFS_SB_FEAT_RO_COMPAT_REFLINK  (1 << 2)		/* reflinked files */
 #define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3)		/* inobt block counts */
+#define XFS_SB_FEAT_RO_COMPAT_VERITY   (1 << 4)		/* fs-verity */
 #define XFS_SB_FEAT_RO_COMPAT_ALL \
 		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
 		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index f08108c9a297..dcb6b15714b1 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -163,6 +163,8 @@ xfs_sb_version_to_features(
 		features |= XFS_FEAT_REFLINK;
 	if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
 		features |= XFS_FEAT_INOBTCNT;
+	if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_VERITY)
+		features |= XFS_FEAT_VERITY;
 	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_FTYPE)
 		features |= XFS_FEAT_FTYPE;
 	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_SPINODES)
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index f64bf75f50d6..5de007989b71 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -290,6 +290,7 @@ typedef struct xfs_mount {
 #define XFS_FEAT_BIGTIME	(1ULL << 24)	/* large timestamps */
 #define XFS_FEAT_NEEDSREPAIR	(1ULL << 25)	/* needs xfs_repair */
 #define XFS_FEAT_NREXT64	(1ULL << 26)	/* large extent counters */
+#define XFS_FEAT_VERITY		(1ULL << 27)	/* fs-verity */
 
 /* Mount features */
 #define XFS_FEAT_NOATTR2	(1ULL << 48)	/* disable attr2 creation */
@@ -353,6 +354,7 @@ __XFS_HAS_FEAT(inobtcounts, INOBTCNT)
 __XFS_HAS_FEAT(bigtime, BIGTIME)
 __XFS_HAS_FEAT(needsrepair, NEEDSREPAIR)
 __XFS_HAS_FEAT(large_extent_counts, NREXT64)
+__XFS_HAS_FEAT(verity, VERITY)
 
 /*
  * Mount features
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 18/25] xfs: add inode on-disk VERITY flag
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
                   ` (16 preceding siblings ...)
  2024-02-12 16:58 ` [PATCH v4 17/25] xfs: add fs-verity ro-compat flag Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  2024-02-12 16:58 ` [PATCH v4 19/25] xfs: initialize fs-verity on file open and cleanup on inode destruction Andrey Albershteyn
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn

Add a flag to mark inodes which have fs-verity enabled on them (i.e.
the descriptor exists and the tree is built).

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/libxfs/xfs_format.h | 4 +++-
 fs/xfs/xfs_inode.c         | 2 ++
 fs/xfs/xfs_iops.c          | 2 ++
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index e36718c93539..ea78b595aa97 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1086,16 +1086,18 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
 #define XFS_DIFLAG2_COWEXTSIZE_BIT   2  /* copy on write extent size hint */
 #define XFS_DIFLAG2_BIGTIME_BIT	3	/* big timestamps */
 #define XFS_DIFLAG2_NREXT64_BIT 4	/* large extent counters */
+#define XFS_DIFLAG2_VERITY_BIT	5	/* inode sealed by fsverity */
 
 #define XFS_DIFLAG2_DAX		(1 << XFS_DIFLAG2_DAX_BIT)
 #define XFS_DIFLAG2_REFLINK     (1 << XFS_DIFLAG2_REFLINK_BIT)
 #define XFS_DIFLAG2_COWEXTSIZE  (1 << XFS_DIFLAG2_COWEXTSIZE_BIT)
 #define XFS_DIFLAG2_BIGTIME	(1 << XFS_DIFLAG2_BIGTIME_BIT)
 #define XFS_DIFLAG2_NREXT64	(1 << XFS_DIFLAG2_NREXT64_BIT)
+#define XFS_DIFLAG2_VERITY	(1 << XFS_DIFLAG2_VERITY_BIT)
 
 #define XFS_DIFLAG2_ANY \
 	(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE | \
-	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64)
+	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | XFS_DIFLAG2_VERITY)
 
 static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
 {
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 1fd94958aa97..6289a0c49780 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -629,6 +629,8 @@ xfs_ip2xflags(
 			flags |= FS_XFLAG_DAX;
 		if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
 			flags |= FS_XFLAG_COWEXTSIZE;
+		if (ip->i_diflags2 & XFS_DIFLAG2_VERITY)
+			flags |= FS_XFLAG_VERITY;
 	}
 
 	if (xfs_inode_has_attr_fork(ip))
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index a0d77f5f512e..8972274b8bc0 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1242,6 +1242,8 @@ xfs_diflags_to_iflags(
 		flags |= S_NOATIME;
 	if (init && xfs_inode_should_enable_dax(ip))
 		flags |= S_DAX;
+	if (xflags & FS_XFLAG_VERITY)
+		flags |= S_VERITY;
 
 	/*
 	 * S_DAX can only be set during inode initialization and is never set by
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 19/25] xfs: initialize fs-verity on file open and cleanup on inode destruction
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
                   ` (17 preceding siblings ...)
  2024-02-12 16:58 ` [PATCH v4 18/25] xfs: add inode on-disk VERITY flag Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  2024-02-12 16:58 ` [PATCH v4 20/25] xfs: don't allow to enable DAX on fs-verity sealed inode Andrey Albershteyn
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn

fs-verity will read and attach metadata (not the tree itself) from
disk for those inodes which already have fs-verity enabled.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/xfs_file.c  | 8 ++++++++
 fs/xfs/xfs_super.c | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e33e5e13b95f..ed36cd088926 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -31,6 +31,7 @@
 #include <linux/mman.h>
 #include <linux/fadvise.h>
 #include <linux/mount.h>
+#include <linux/fsverity.h>
 
 static const struct vm_operations_struct xfs_file_vm_ops;
 
@@ -1228,10 +1229,17 @@ xfs_file_open(
 	struct inode	*inode,
 	struct file	*file)
 {
+	int		error = 0;
+
 	if (xfs_is_shutdown(XFS_M(inode->i_sb)))
 		return -EIO;
 	file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC |
 			FMODE_DIO_PARALLEL_WRITE | FMODE_CAN_ODIRECT;
+
+	error = fsverity_file_open(inode, file);
+	if (error)
+		return error;
+
 	return generic_file_open(inode, file);
 }
 
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index b2b6c1f24c42..4737101edab9 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -48,6 +48,7 @@
 #include <linux/magic.h>
 #include <linux/fs_context.h>
 #include <linux/fs_parser.h>
+#include <linux/fsverity.h>
 
 static const struct super_operations xfs_super_operations;
 
@@ -672,6 +673,7 @@ xfs_fs_destroy_inode(
 	ASSERT(!rwsem_is_locked(&inode->i_rwsem));
 	XFS_STATS_INC(ip->i_mount, vn_rele);
 	XFS_STATS_INC(ip->i_mount, vn_remove);
+	fsverity_cleanup_inode(inode);
 	xfs_inode_mark_reclaimable(ip);
 }
 
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 20/25] xfs: don't allow to enable DAX on fs-verity sealed inode
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
                   ` (18 preceding siblings ...)
  2024-02-12 16:58 ` [PATCH v4 19/25] xfs: initialize fs-verity on file open and cleanup on inode destruction Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  2024-02-12 16:58 ` [PATCH v4 21/25] xfs: disable direct read path for fs-verity files Andrey Albershteyn
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn

fs-verity doesn't support DAX. Forbid the filesystem from enabling DAX
on inodes which already have fs-verity enabled. The opposite case is
checked when enabling fs-verity: it won't be enabled if DAX is.
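
The two directions of this exclusion can be sketched as a small
userspace model (an illustration only, not the kernel code). The struct
model_inode, the enum dax_mode and both function names are hypothetical;
the order of checks follows xfs_inode_should_enable_dax() as shown in
the hunk below, and the -EINVAL result follows the IS_DAX() check done
when fs-verity is being enabled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mount-wide DAX setting. */
enum dax_mode { DAX_NEVER, DAX_ALWAYS, DAX_INODE };

/* Hypothetical stand-in for the inode state the checks look at. */
struct model_inode {
	bool supports_dax;	/* inode could use DAX at all */
	bool diflag_dax;	/* per-inode DAX flag is set */
	bool diflag_verity;	/* fs-verity is enabled on the inode */
};

/* DAX side: a verity inode never gets DAX, whatever the mount says. */
static bool should_enable_dax(const struct model_inode *ip, enum dax_mode mode)
{
	if (mode == DAX_NEVER)
		return false;
	if (!ip->supports_dax)
		return false;
	if (ip->diflag_verity)
		return false;
	if (mode == DAX_ALWAYS)
		return true;
	return ip->diflag_dax;
}

/* Verity side: enabling fs-verity is refused while DAX is active. */
static int begin_enable_verity(bool is_dax)
{
	return is_dax ? -EINVAL : 0;
}
```

Placing the verity check before the "always" check is what makes the
per-inode restriction win over a mount-wide DAX setting.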

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/xfs_iops.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 8972274b8bc0..4cf6b317d018 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1214,6 +1214,8 @@ xfs_inode_should_enable_dax(
 		return false;
 	if (!xfs_inode_supports_dax(ip))
 		return false;
+	if (ip->i_diflags2 & XFS_DIFLAG2_VERITY)
+		return false;
 	if (xfs_has_dax_always(ip->i_mount))
 		return true;
 	if (ip->i_diflags2 & XFS_DIFLAG2_DAX)
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 21/25] xfs: disable direct read path for fs-verity files
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
                   ` (19 preceding siblings ...)
  2024-02-12 16:58 ` [PATCH v4 20/25] xfs: don't allow to enable DAX on fs-verity sealed inode Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  2024-02-12 16:58 ` [PATCH v4 22/25] xfs: add fs-verity support Andrey Albershteyn
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn

The direct I/O path is not supported on verity files. Attempts to use
the direct I/O path on such files should fall back to the buffered I/O
path.
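
The intended dispatch can be sketched as follows (a simplified model,
not the kernel function). MODEL_IOCB_DIRECT, enum read_path and
pick_read_path() are hypothetical names; the real code tests IOCB_DIRECT
in iocb->ki_flags and calls fsverity_active().

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_IOCB_DIRECT	(1u << 0)	/* hypothetical flag bit */

enum read_path { READ_DAX, READ_DIRECT, READ_BUFFERED };

/*
 * Choose the read path. When a direct read on a verity file falls back
 * to the buffered path, the direct flag is cleared so that later code
 * does not treat the request as direct I/O.
 */
static enum read_path pick_read_path(bool is_dax, bool verity_active,
				     unsigned int *ki_flags)
{
	if (is_dax)
		return READ_DAX;
	if ((*ki_flags & MODEL_IOCB_DIRECT) && !verity_active)
		return READ_DIRECT;

	*ki_flags &= ~MODEL_IOCB_DIRECT;
	return READ_BUFFERED;
}
```

Clearing the flag for plain buffered reads as well is harmless, since it
was not set in that case.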

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/xfs_file.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index ed36cd088926..011c311efe22 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -281,7 +281,8 @@ xfs_file_dax_read(
 	struct kiocb		*iocb,
 	struct iov_iter		*to)
 {
-	struct xfs_inode	*ip = XFS_I(iocb->ki_filp->f_mapping->host);
+	struct inode		*inode = iocb->ki_filp->f_mapping->host;
+	struct xfs_inode	*ip = XFS_I(inode);
 	ssize_t			ret = 0;
 
 	trace_xfs_file_dax_read(iocb, to);
@@ -334,10 +335,18 @@ xfs_file_read_iter(
 
 	if (IS_DAX(inode))
 		ret = xfs_file_dax_read(iocb, to);
-	else if (iocb->ki_flags & IOCB_DIRECT)
+	else if (iocb->ki_flags & IOCB_DIRECT && !fsverity_active(inode))
 		ret = xfs_file_dio_read(iocb, to);
-	else
+	else {
+		/*
+		 * If fs-verity is enabled, direct reads also fall back to
+		 * the buffered read path. In that case IOCB_DIRECT is still
+		 * set and needs to be cleared (see
+		 * generic_file_read_iter()).
+		 */
+		iocb->ki_flags &= ~IOCB_DIRECT;
 		ret = xfs_file_buffered_read(iocb, to);
+	}
 
 	if (ret > 0)
 		XFS_STATS_ADD(mp, xs_read_bytes, ret);
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 22/25] xfs: add fs-verity support
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
                   ` (20 preceding siblings ...)
  2024-02-12 16:58 ` [PATCH v4 21/25] xfs: disable direct read path for fs-verity files Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  2024-02-12 16:58 ` [PATCH v4 23/25] xfs: make scrub aware of verity dinode flag Andrey Albershteyn
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn

Add integration with fs-verity. XFS stores fs-verity metadata in
extended file attributes. The metadata consists of the verity
descriptor and the Merkle tree blocks.

The descriptor is stored under the "vdesc" extended attribute. The
Merkle tree blocks are stored under binary names which are offsets
into the Merkle tree.
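
As an illustration of the naming scheme (a userspace sketch, not the
code from this patch), a tree offset can be turned into an 8-byte
big-endian attribute name and back. The helper names and MERKLE_KEY_LEN
are hypothetical; in the patch the same job is done with cpu_to_be64()
and be64_to_cpu() on struct xfs_fsverity_merkle_key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MERKLE_KEY_LEN		8	/* size of a 64-bit offset */
#define DESCRIPTOR_NAME		"vdesc"

/* Store the tree offset as a big-endian 64-bit attribute name. */
static void merkle_key_encode(unsigned char name[MERKLE_KEY_LEN], uint64_t pos)
{
	for (int i = 0; i < MERKLE_KEY_LEN; i++)
		name[i] = (unsigned char)(pos >> (56 - 8 * i));
}

/* Recover the tree offset from an attribute name. */
static uint64_t merkle_key_decode(const unsigned char name[MERKLE_KEY_LEN])
{
	uint64_t pos = 0;

	for (int i = 0; i < MERKLE_KEY_LEN; i++)
		pos = (pos << 8) | name[i];
	return pos;
}

/* A verity name is either a Merkle key or the descriptor name. */
static bool verity_name_len_ok(size_t length)
{
	return length == MERKLE_KEY_LEN || length == strlen(DESCRIPTOR_NAME);
}
```

The two name kinds have different lengths (8 and 5 bytes), which is why
a length check alone is enough to tell a valid verity name from an
invalid one.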

When fs-verity is being enabled on an inode, the
XFS_IVERITY_CONSTRUCTION flag is set, meaning that the Merkle tree is
being built. The initialization ends with storing the verity
descriptor and setting the inode on-disk flag (XFS_DIFLAG2_VERITY).
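
The life cycle described above can be modelled roughly like this (a
sketch under simplifying assumptions, not the kernel implementation).
The struct and function names are hypothetical; error handling, locking
and removal of a partially written tree are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the verity-related inode state. */
struct model_inode {
	bool constructing;	/* in-memory: tree is being built */
	bool has_descriptor;	/* "vdesc" attribute is stored */
	bool diflag_verity;	/* on-disk flag is set */
};

/* Start construction; only one builder per inode is allowed. */
static int model_begin_enable(struct model_inode *ip)
{
	if (ip->constructing)
		return -EBUSY;
	ip->constructing = true;
	return 0;
}

/*
 * Finish construction. On success the descriptor is stored first and
 * the on-disk flag is set last; on failure the on-disk flag stays
 * clear. The in-memory construction flag is cleared either way.
 */
static void model_end_enable(struct model_inode *ip, bool build_ok)
{
	if (build_ok) {
		ip->has_descriptor = true;
		ip->diflag_verity = true;
	}
	ip->constructing = false;
}
```

Setting the on-disk flag only after the descriptor is stored means an
inode is never marked as verity-enabled without its metadata.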

Verification on read is done in the iomap read path.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/Makefile                 |   1 +
 fs/xfs/libxfs/xfs_attr.c        |  13 ++
 fs/xfs/libxfs/xfs_attr_leaf.c   |  17 +-
 fs/xfs/libxfs/xfs_attr_remote.c |   8 +-
 fs/xfs/libxfs/xfs_da_format.h   |  27 +++
 fs/xfs/libxfs/xfs_ondisk.h      |   4 +
 fs/xfs/xfs_inode.h              |   3 +-
 fs/xfs/xfs_super.c              |   8 +
 fs/xfs/xfs_verity.c             | 348 ++++++++++++++++++++++++++++++++
 fs/xfs/xfs_verity.h             |  33 +++
 10 files changed, 455 insertions(+), 7 deletions(-)
 create mode 100644 fs/xfs/xfs_verity.c
 create mode 100644 fs/xfs/xfs_verity.h

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 8be90c685b0b..207a64f47a71 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -131,6 +131,7 @@ xfs-$(CONFIG_XFS_POSIX_ACL)	+= xfs_acl.o
 xfs-$(CONFIG_SYSCTL)		+= xfs_sysctl.o
 xfs-$(CONFIG_COMPAT)		+= xfs_ioctl32.o
 xfs-$(CONFIG_EXPORTFS_BLOCK_OPS)	+= xfs_pnfs.o
+xfs-$(CONFIG_FS_VERITY)		+= xfs_verity.o
 
 # notify failure
 ifeq ($(CONFIG_MEMORY_FAILURE),y)
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 8e3138af4a5f..21ad25bddd5d 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -27,6 +27,7 @@
 #include "xfs_attr_item.h"
 #include "xfs_xattr.h"
 #include "xfs_parent.h"
+#include "xfs_verity.h"
 
 struct kmem_cache		*xfs_attr_intent_cache;
 
@@ -1526,6 +1527,18 @@ xfs_attr_namecheck(
 	if (flags & XFS_ATTR_PARENT)
 		return xfs_parent_namecheck(mp, name, length, flags);
 
+	if (flags & XFS_ATTR_VERITY) {
+		/* Merkle tree pages are stored under u64 indexes */
+		if (length == sizeof(struct xfs_fsverity_merkle_key))
+			return true;
+
+		/* Verity descriptor blocks are held in a named attribute. */
+		if (length == XFS_VERITY_DESCRIPTOR_NAME_LEN)
+			return true;
+
+		return false;
+	}
+
 	/*
 	 * MAXNAMELEN includes the trailing null, but (name/length) leave it
 	 * out, so use >= for the length check.
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index 51aa5d5df76c..28274d57ba9b 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -29,6 +29,7 @@
 #include "xfs_log.h"
 #include "xfs_ag.h"
 #include "xfs_errortag.h"
+#include "xfs_verity.h"
 
 
 /*
@@ -518,7 +519,12 @@ xfs_attr_copy_value(
 		return -ERANGE;
 	}
 
-	if (!args->value) {
+	/*
+	 * We don't want to allocate memory for fs-verity Merkle tree blocks
+	 * (fs-verity descriptor is fine though). They will be stored in
+	 * underlying xfs_buf
+	 */
+	if (!args->value && !xfs_verity_merkle_block(args)) {
 		args->value = kvmalloc(valuelen, GFP_KERNEL | __GFP_NOLOCKDEP);
 		if (!args->value)
 			return -ENOMEM;
@@ -537,7 +543,14 @@ xfs_attr_copy_value(
 	 */
 	if (!value)
 		return -EINVAL;
-	memcpy(args->value, value, valuelen);
+	/*
+	 * Don't copy the Merkle tree block into args->value: it should stay
+	 * in the xfs_buf, and no memory was allocated for args->value.
+	 */
+	if (xfs_verity_merkle_block(args))
+		args->value = value;
+	else
+		memcpy(args->value, value, valuelen);
 	return 0;
 }
 
diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index 1d32041412cc..dafb27fb3527 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -22,6 +22,7 @@
 #include "xfs_attr_remote.h"
 #include "xfs_trace.h"
 #include "xfs_error.h"
+#include "xfs_verity.h"
 
 #define ATTR_RMTVALUE_MAPSIZE	1	/* # of map entries at once */
 
@@ -401,11 +402,10 @@ xfs_attr_rmtval_get(
 	ASSERT(args->rmtvaluelen == args->valuelen);
 
 	/*
-	 * We also check for _OP_BUFFER as we want to trigger on
-	 * verity blocks only, not on verity_descriptor
+	 * For fs-verity we want additional space in the xfs_buf. This space is
+	 * used to copy xattr value without leaf headers (crc header).
 	 */
-	if (args->attr_filter & XFS_ATTR_VERITY &&
-			args->op_flags & XFS_DA_OP_BUFFER)
+	if (xfs_verity_merkle_block(args))
 		flags = XBF_DOUBLE_ALLOC;
 
 	valuelen = args->rmtvaluelen;
diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 05b82e5b64fa..4d28a64f8cd7 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -903,4 +903,31 @@ struct xfs_parent_name_rec {
  */
 #define XFS_PARENT_DIRENT_NAME_MAX_SIZE		(MAXNAMELEN - 1)
 
+/*
+ * fs-verity attribute name format
+ *
+ * Merkle tree blocks are stored in extended attributes of the inode. The
+ * attribute names are offsets into the Merkle tree.
+ */
+struct xfs_fsverity_merkle_key {
+	__be64 merkleoff;
+};
+
+static inline void
+xfs_fsverity_merkle_key_to_disk(struct xfs_fsverity_merkle_key *key, loff_t pos)
+{
+	key->merkleoff = cpu_to_be64(pos);
+}
+
+static inline loff_t
+xfs_fsverity_name_to_block_offset(unsigned char *name)
+{
+	struct xfs_fsverity_merkle_key key = {
+		.merkleoff = *(__be64 *)name
+	};
+	loff_t offset = be64_to_cpu(key.merkleoff);
+
+	return offset;
+}
+
 #endif /* __XFS_DA_FORMAT_H__ */
diff --git a/fs/xfs/libxfs/xfs_ondisk.h b/fs/xfs/libxfs/xfs_ondisk.h
index 81885a6a028e..39209943c474 100644
--- a/fs/xfs/libxfs/xfs_ondisk.h
+++ b/fs/xfs/libxfs/xfs_ondisk.h
@@ -194,6 +194,10 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_VALUE(XFS_DQ_BIGTIME_EXPIRY_MIN << XFS_DQ_BIGTIME_SHIFT, 4);
 	XFS_CHECK_VALUE(XFS_DQ_BIGTIME_EXPIRY_MAX << XFS_DQ_BIGTIME_SHIFT,
 			16299260424LL);
+
+	/* fs-verity descriptor xattr name */
+	XFS_CHECK_VALUE(strlen(XFS_VERITY_DESCRIPTOR_NAME),
+			XFS_VERITY_DESCRIPTOR_NAME_LEN);
 }
 
 #endif /* __XFS_ONDISK_H */
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 97f63bacd4c2..97fa5155fcba 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -342,7 +342,8 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
  * inactivation completes, both flags will be cleared and the inode is a
  * plain old IRECLAIMABLE inode.
  */
-#define XFS_INACTIVATING	(1 << 13)
+#define XFS_INACTIVATING		(1 << 13)
+#define XFS_IVERITY_CONSTRUCTION	(1 << 14) /* merkle tree construction */
 
 /* Quotacheck is running but inode has not been added to quota counts. */
 #define XFS_IQUOTAUNCHECKED	(1 << 14)
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 4737101edab9..3bb4dba3f1ca 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -30,6 +30,7 @@
 #include "xfs_filestream.h"
 #include "xfs_quota.h"
 #include "xfs_sysfs.h"
+#include "xfs_verity.h"
 #include "xfs_ondisk.h"
 #include "xfs_rmap_item.h"
 #include "xfs_refcount_item.h"
@@ -1531,6 +1532,9 @@ xfs_fs_fill_super(
 	sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP | QTYPE_MASK_PRJ;
 #endif
 	sb->s_op = &xfs_super_operations;
+#ifdef CONFIG_FS_VERITY
+	sb->s_vop = &xfs_verity_ops;
+#endif
 
 	/*
 	 * Delay mount work if the debug hook is set. This is debug
@@ -1740,6 +1744,10 @@ xfs_fs_fill_super(
 		goto out_filestream_unmount;
 	}
 
+	if (xfs_has_verity(mp))
+		xfs_alert(mp,
+	"EXPERIMENTAL fs-verity feature in use. Use at your own risk!");
+
 	error = xfs_mountfs(mp);
 	if (error)
 		goto out_filestream_unmount;
diff --git a/fs/xfs/xfs_verity.c b/fs/xfs/xfs_verity.c
new file mode 100644
index 000000000000..dfa05cf6518c
--- /dev/null
+++ b/fs/xfs/xfs_verity.c
@@ -0,0 +1,348 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023 Red Hat, Inc.
+ */
+#include "xfs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_inode.h"
+#include "xfs_log_format.h"
+#include "xfs_attr.h"
+#include "xfs_verity.h"
+#include "xfs_bmap_util.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_attr_leaf.h"
+
+/*
+ * Make fs-verity invalidate verified status of Merkle tree block
+ */
+static void
+xfs_verity_put_listent(
+	struct xfs_attr_list_context	*context,
+	int				flags,
+	unsigned char			*name,
+	int				namelen,
+	int				valuelen)
+{
+	/*
+	 * Verity descriptor is smaller than 1024; verity block min size is
+	 * 1024. Exclude verity descriptor
+	 */
+	if (valuelen < 1024)
+		return;
+
+	fsverity_invalidate_range(VFS_I(context->dp),
+				  xfs_fsverity_name_to_block_offset(name),
+				  valuelen);
+}
+
+/*
+ * Iterate over extended attributes in the bp to invalidate Merkle tree blocks
+ */
+static int
+xfs_invalidate_blocks(
+	struct xfs_inode	*ip,
+	struct xfs_buf		*bp)
+{
+	struct xfs_attr_list_context context;
+
+	memset(&context, 0, sizeof(context));
+	context.dp = ip;
+	context.resynch = 0;
+	context.buffer = NULL;
+	context.bufsize = 0;
+	context.firstu = 0;
+	context.attr_filter = XFS_ATTR_VERITY;
+	context.put_listent = xfs_verity_put_listent;
+
+	return xfs_attr3_leaf_list_int(bp, &context);
+}
+
+static int
+xfs_get_verity_descriptor(
+	struct inode		*inode,
+	void			*buf,
+	size_t			buf_size)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+	int			error = 0;
+	struct xfs_da_args	args = {
+		.dp		= ip,
+		.attr_filter	= XFS_ATTR_VERITY,
+		.name		= (const uint8_t *)XFS_VERITY_DESCRIPTOR_NAME,
+		.namelen	= XFS_VERITY_DESCRIPTOR_NAME_LEN,
+		.value		= buf,
+		.valuelen	= buf_size,
+	};
+
+	/*
+	 * The fact that (returned attribute size) == (provided buf_size) is
+	 * checked by xfs_attr_copy_value() (returns -ERANGE)
+	 */
+	error = xfs_attr_get(&args);
+	if (error)
+		return error;
+
+	return args.valuelen;
+}
+
+static int
+xfs_begin_enable_verity(
+	struct file	    *filp)
+{
+	struct inode	    *inode = file_inode(filp);
+	struct xfs_inode    *ip = XFS_I(inode);
+	int		    error = 0;
+
+	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
+
+	if (IS_DAX(inode))
+		return -EINVAL;
+
+	if (xfs_iflags_test_and_set(ip, XFS_IVERITY_CONSTRUCTION))
+		return -EBUSY;
+
+	return error;
+}
+
+static int
+xfs_drop_merkle_tree(
+	struct xfs_inode		*ip,
+	u64				merkle_tree_size,
+	unsigned int			tree_blocksize)
+{
+	struct xfs_fsverity_merkle_key	name;
+	int				error = 0;
+	u64				offset = 0;
+	struct xfs_da_args		args = {
+		.dp			= ip,
+		.whichfork		= XFS_ATTR_FORK,
+		.attr_filter		= XFS_ATTR_VERITY,
+		.op_flags		= XFS_DA_OP_REMOVE,
+		.namelen		= sizeof(struct xfs_fsverity_merkle_key),
+		/* NULL value make xfs_attr_set remove the attr */
+		.value			= NULL,
+	};
+
+	if (!merkle_tree_size)
+		return 0;
+
+	args.name = (const uint8_t *)&name.merkleoff;
+	for (offset = 0; offset < merkle_tree_size; offset += tree_blocksize) {
+		xfs_fsverity_merkle_key_to_disk(&name, offset);
+		error = xfs_attr_set(&args);
+		if (error)
+			return error;
+	}
+
+	args.name = (const uint8_t *)XFS_VERITY_DESCRIPTOR_NAME;
+	args.namelen = XFS_VERITY_DESCRIPTOR_NAME_LEN;
+	error = xfs_attr_set(&args);
+
+	return error;
+}
+
+static int
+xfs_end_enable_verity(
+	struct file		*filp,
+	const void		*desc,
+	size_t			desc_size,
+	u64			merkle_tree_size,
+	unsigned int		tree_blocksize)
+{
+	struct inode		*inode = file_inode(filp);
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_trans	*tp;
+	struct xfs_da_args	args = {
+		.dp		= ip,
+		.whichfork	= XFS_ATTR_FORK,
+		.attr_filter	= XFS_ATTR_VERITY,
+		.attr_flags	= XATTR_CREATE,
+		.name		= (const uint8_t *)XFS_VERITY_DESCRIPTOR_NAME,
+		.namelen	= XFS_VERITY_DESCRIPTOR_NAME_LEN,
+		.value		= (void *)desc,
+		.valuelen	= desc_size,
+	};
+	int			error = 0;
+
+	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
+
+	/* fs-verity failed, just cleanup */
+	if (desc == NULL)
+		goto out;
+
+	error = xfs_attr_set(&args);
+	if (error)
+		goto out;
+
+	/* Set fsverity inode flag */
+	error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_ichange,
+			0, 0, false, &tp);
+	if (error)
+		goto out;
+
+	/*
+	 * Ensure that we've persisted the verity information before we enable
+	 * it on the inode and tell the caller we have sealed the inode.
+	 */
+	ip->i_diflags2 |= XFS_DIFLAG2_VERITY;
+
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	xfs_trans_set_sync(tp);
+
+	error = xfs_trans_commit(tp);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	if (!error)
+		inode->i_flags |= S_VERITY;
+
+out:
+	if (error)
+		WARN_ON_ONCE(xfs_drop_merkle_tree(ip, merkle_tree_size,
+						  tree_blocksize));
+
+	xfs_iflags_clear(ip, XFS_IVERITY_CONSTRUCTION);
+	return error;
+}
+
+static int
+xfs_read_merkle_tree_block(
+	struct inode			*inode,
+	u64				pos,
+	struct fsverity_blockbuf	*block,
+	unsigned int			log_blocksize,
+	u64				ra_bytes)
+{
+	struct xfs_inode		*ip = XFS_I(inode);
+	struct xfs_fsverity_merkle_key	name;
+	int				error = 0;
+	struct xfs_da_args		args = {
+		.dp			= ip,
+		.attr_filter		= XFS_ATTR_VERITY,
+		.op_flags		= XFS_DA_OP_BUFFER,
+		.namelen		= sizeof(struct xfs_fsverity_merkle_key),
+		.valuelen		= (1 << log_blocksize),
+	};
+	xfs_fsverity_merkle_key_to_disk(&name, pos);
+	args.name = (const uint8_t *)&name.merkleoff;
+
+	error = xfs_attr_get(&args);
+	if (error)
+		goto out;
+
+	if (!args.valuelen)
+		return -ENODATA;
+
+	/*
+	 * The detailed reasoning for the memory barriers below is the same
+	 * as described in fsverity_invalidate_page(). Memory barriers are used
+	 * to order the clearing of the bitmap in fsverity_invalidate_range()
+	 * against the setting of the XBF_VERITY_SEEN flag. As XFS uses neither
+	 * pages to store the blocks nor PG_checked, that function cannot be
+	 * used directly.
+	 */
+	if (!(args.bp->b_flags & XBF_VERITY_SEEN)) {
+		/*
+		 * A read memory barrier is needed here to give ACQUIRE
+		 * semantics to the above check.
+		 */
+		smp_rmb();
+		/*
+		 * fs-verity cannot tell whether the buffer was evicted from
+		 * memory, so make it invalidate the verified status of all
+		 * blocks in the buffer.
+		 *
+		 * Single extended attribute can contain multiple Merkle tree
+		 * blocks:
+		 * - leaf with inline data -> invalidate all blocks in the leaf
+		 * - remote value -> invalidate single block
+		 *
+		 * For example, leaf on 64k system with 4k/1k filesystem will
+		 * contain multiple Merkle tree blocks.
+		 *
+		 * Only remote value buffers would have XBF_DOUBLE_ALLOC flag
+		 */
+		if (args.bp->b_flags & XBF_DOUBLE_ALLOC)
+			fsverity_invalidate_range(inode, pos, args.valuelen);
+		else {
+			error = xfs_invalidate_blocks(ip, args.bp);
+			if (error)
+				goto out;
+		}
+	}
+
+	/*
+	 * A write memory barrier is needed here to give RELEASE
+	 * semantics to the below flag.
+	 */
+	smp_wmb();
+	args.bp->b_flags |= XBF_VERITY_SEEN;
+
+	block->kaddr = args.value;
+	block->size = args.valuelen;
+	block->context = args.bp;
+
+	return error;
+
+out:
+	kmem_free(args.value);
+	if (args.bp)
+		xfs_buf_rele(args.bp);
+	return error;
+}
+
+static int
+xfs_write_merkle_tree_block(
+	struct inode		*inode,
+	const void		*buf,
+	u64			pos,
+	unsigned int		size)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_fsverity_merkle_key	name;
+	struct xfs_da_args	args = {
+		.dp		= ip,
+		.whichfork	= XFS_ATTR_FORK,
+		.attr_filter	= XFS_ATTR_VERITY,
+		.attr_flags	= XATTR_CREATE,
+		.namelen	= sizeof(struct xfs_fsverity_merkle_key),
+		.value		= (void *)buf,
+		.valuelen	= size,
+	};
+
+	xfs_fsverity_merkle_key_to_disk(&name, pos);
+	args.name = (const uint8_t *)&name.merkleoff;
+
+	return xfs_attr_set(&args);
+}
+
+static void
+xfs_drop_block(
+	struct fsverity_blockbuf	*block)
+{
+	struct xfs_buf			*bp;
+
+	ASSERT(block != NULL);
+	bp = (struct xfs_buf *)block->context;
+
+	ASSERT(bp->b_flags & XBF_VERITY_SEEN);
+
+	xfs_buf_rele(bp);
+
+	kunmap_local(block->kaddr);
+}
+
+const struct fsverity_operations xfs_verity_ops = {
+	.begin_enable_verity		= &xfs_begin_enable_verity,
+	.end_enable_verity		= &xfs_end_enable_verity,
+	.get_verity_descriptor		= &xfs_get_verity_descriptor,
+	.read_merkle_tree_block		= &xfs_read_merkle_tree_block,
+	.write_merkle_tree_block	= &xfs_write_merkle_tree_block,
+	.drop_block			= &xfs_drop_block,
+};
diff --git a/fs/xfs/xfs_verity.h b/fs/xfs/xfs_verity.h
new file mode 100644
index 000000000000..0de6c66fdb1a
--- /dev/null
+++ b/fs/xfs/xfs_verity.h
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2022 Red Hat, Inc.
+ */
+#ifndef __XFS_VERITY_H__
+#define __XFS_VERITY_H__
+
+#include "xfs.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include <linux/fsverity.h>
+
+#define XFS_VERITY_DESCRIPTOR_NAME "vdesc"
+#define XFS_VERITY_DESCRIPTOR_NAME_LEN 5
+
+static inline bool
+xfs_verity_merkle_block(
+		struct xfs_da_args *args)
+{
+	if (!(args->attr_filter & XFS_ATTR_VERITY))
+		return false;
+
+	if (!(args->op_flags & XFS_DA_OP_BUFFER))
+		return false;
+
+	return true;
+}
+
+#ifdef CONFIG_FS_VERITY
+extern const struct fsverity_operations xfs_verity_ops;
+#endif	/* CONFIG_FS_VERITY */
+
+#endif	/* __XFS_VERITY_H__ */
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 23/25] xfs: make scrub aware of verity dinode flag
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
                   ` (21 preceding siblings ...)
  2024-02-12 16:58 ` [PATCH v4 22/25] xfs: add fs-verity support Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  2024-02-12 16:58 ` [PATCH v4 24/25] xfs: add fs-verity ioctls Andrey Albershteyn
  2024-02-12 16:58 ` [PATCH v4 25/25] xfs: enable ro-compat fs-verity flag Andrey Albershteyn
  24 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn

fs-verity adds a new inode flag which causes scrub to fail, as scrub
does not know about it yet.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/attr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index 9a1f59f7b5a4..ae4227cb55ec 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -494,7 +494,7 @@ xchk_xattr_rec(
 	/* Retrieve the entry and check it. */
 	hash = be32_to_cpu(ent->hashval);
 	badflags = ~(XFS_ATTR_LOCAL | XFS_ATTR_ROOT | XFS_ATTR_SECURE |
-			XFS_ATTR_INCOMPLETE | XFS_ATTR_PARENT);
+			XFS_ATTR_INCOMPLETE | XFS_ATTR_PARENT | XFS_ATTR_VERITY);
 	if ((ent->flags & badflags) != 0)
 		xchk_da_set_corrupt(ds, level);
 	if (ent->flags & XFS_ATTR_LOCAL) {
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 24/25] xfs: add fs-verity ioctls
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
                   ` (22 preceding siblings ...)
  2024-02-12 16:58 ` [PATCH v4 23/25] xfs: make scrub aware of verity dinode flag Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  2024-02-12 16:58 ` [PATCH v4 25/25] xfs: enable ro-compat fs-verity flag Andrey Albershteyn
  24 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn

Add fs-verity ioctls to enable fs-verity, dump metadata (descriptor and
Merkle tree pages) and obtain a file's digest.
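
For reference, a userspace caller might enable fs-verity roughly as
sketched below. This uses the generic FS_IOC_ENABLE_VERITY interface
from <linux/fsverity.h>; the helper name enable_verity_sha256() and the
choice of SHA-256 with a 4096-byte tree block size are assumptions made
for this example only.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fsverity.h>

/* Hypothetical helper: returns 0 on success or a negative errno. */
static int enable_verity_sha256(const char *path)
{
	struct fsverity_enable_arg arg;
	int fd, ret = 0;

	/* fs-verity is enabled through a read-only file descriptor. */
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	memset(&arg, 0, sizeof(arg));
	arg.version = 1;
	arg.hash_algorithm = FS_VERITY_HASH_ALG_SHA256;
	arg.block_size = 4096;	/* assumed Merkle tree block size */

	if (ioctl(fd, FS_IOC_ENABLE_VERITY, &arg) != 0)
		ret = -errno;

	close(fd);
	return ret;
}
```

With the checks added in this patch, such a call on an XFS filesystem
without the verity feature would be expected to fail with -EOPNOTSUPP.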

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/xfs_ioctl.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 048d83acda0a..5d64e11bf056 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -43,6 +43,7 @@
 #include <linux/mount.h>
 #include <linux/namei.h>
 #include <linux/fileattr.h>
+#include <linux/fsverity.h>
 
 /*
  * xfs_find_handle maps from userspace xfs_fsop_handlereq structure to
@@ -2174,6 +2175,22 @@ xfs_file_ioctl(
 		return error;
 	}
 
+	case FS_IOC_ENABLE_VERITY:
+		if (!xfs_has_verity(mp))
+			return -EOPNOTSUPP;
+		return fsverity_ioctl_enable(filp, (const void __user *)arg);
+
+	case FS_IOC_MEASURE_VERITY:
+		if (!xfs_has_verity(mp))
+			return -EOPNOTSUPP;
+		return fsverity_ioctl_measure(filp, (void __user *)arg);
+
+	case FS_IOC_READ_VERITY_METADATA:
+		if (!xfs_has_verity(mp))
+			return -EOPNOTSUPP;
+		return fsverity_ioctl_read_metadata(filp,
+						    (const void __user *)arg);
+
 	default:
 		return -ENOTTY;
 	}
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 25/25] xfs: enable ro-compat fs-verity flag
  2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
                   ` (23 preceding siblings ...)
  2024-02-12 16:58 ` [PATCH v4 24/25] xfs: add fs-verity ioctls Andrey Albershteyn
@ 2024-02-12 16:58 ` Andrey Albershteyn
  24 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-12 16:58 UTC (permalink / raw)
  To: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers
  Cc: Andrey Albershteyn

Finalize fs-verity integration in XFS by making the kernel aware of
fs-verity via the ro-compat flag.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/libxfs/xfs_format.h | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index ea78b595aa97..0cb5bf9142b7 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -355,10 +355,11 @@ xfs_sb_has_compat_feature(
 #define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3)		/* inobt block counts */
 #define XFS_SB_FEAT_RO_COMPAT_VERITY   (1 << 4)		/* fs-verity */
 #define XFS_SB_FEAT_RO_COMPAT_ALL \
-		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
-		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
-		 XFS_SB_FEAT_RO_COMPAT_REFLINK| \
-		 XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
+		(XFS_SB_FEAT_RO_COMPAT_FINOBT  | \
+		 XFS_SB_FEAT_RO_COMPAT_RMAPBT  | \
+		 XFS_SB_FEAT_RO_COMPAT_REFLINK | \
+		 XFS_SB_FEAT_RO_COMPAT_INOBTCNT| \
+		 XFS_SB_FEAT_RO_COMPAT_VERITY)
 #define XFS_SB_FEAT_RO_COMPAT_UNKNOWN	~XFS_SB_FEAT_RO_COMPAT_ALL
 static inline bool
 xfs_sb_has_ro_compat_feature(
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 06/25] fsverity: pass log_blocksize to end_enable_verity()
  2024-02-12 16:58 ` [PATCH v4 06/25] fsverity: pass log_blocksize to end_enable_verity() Andrey Albershteyn
@ 2024-02-15 21:45   ` Dave Chinner
  2024-02-16 16:18     ` Andrey Albershteyn
  2024-02-23  4:26   ` Eric Biggers
  1 sibling, 1 reply; 44+ messages in thread
From: Dave Chinner @ 2024-02-15 21:45 UTC (permalink / raw)
  To: Andrey Albershteyn
  Cc: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers

On Mon, Feb 12, 2024 at 05:58:03PM +0100, Andrey Albershteyn wrote:
> XFS will need to know log_blocksize to remove the tree in case of an
                        ^^^^^^^^^^^^^
tree blocksize?

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 13/25] xfs: introduce workqueue for post read IO work
  2024-02-12 16:58 ` [PATCH v4 13/25] xfs: introduce workqueue for post read IO work Andrey Albershteyn
@ 2024-02-15 22:11   ` Dave Chinner
  2024-02-16 16:29     ` Andrey Albershteyn
  0 siblings, 1 reply; 44+ messages in thread
From: Dave Chinner @ 2024-02-15 22:11 UTC (permalink / raw)
  To: Andrey Albershteyn
  Cc: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers

On Mon, Feb 12, 2024 at 05:58:10PM +0100, Andrey Albershteyn wrote:
> As noted by Dave there are two problems with using fs-verity's
> workqueue in XFS:
> 
> 1. High priority workqueues are used within XFS to ensure that data
>    IO completion cannot stall processing of journal IO completions.
>    Hence using a WQ_HIGHPRI workqueue directly in the user data IO
>    path is a potential filesystem livelock/deadlock vector.
> 
> 2. The fsverity workqueue is global - it creates a cross-filesystem
>    contention point.
> 
> This patch adds per-filesystem, per-cpu workqueue for fsverity
> work.
> 
> Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> ---
>  fs/xfs/xfs_aops.c  | 15 +++++++++++++--
>  fs/xfs/xfs_linux.h |  1 +
>  fs/xfs/xfs_mount.h |  1 +
>  fs/xfs/xfs_super.c |  9 +++++++++
>  4 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 7a6627404160..70e444c151b2 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -548,19 +548,30 @@ xfs_vm_bmap(
>  	return iomap_bmap(mapping, block, &xfs_read_iomap_ops);
>  }
>  
> +static inline struct workqueue_struct *
> +xfs_fsverity_wq(
> +	struct address_space	*mapping)
> +{
> +	if (fsverity_active(mapping->host))
> +		return XFS_I(mapping->host)->i_mount->m_postread_workqueue;
> +	return NULL;
> +}
> +
>  STATIC int
>  xfs_vm_read_folio(
>  	struct file		*unused,
>  	struct folio		*folio)
>  {
> -	return iomap_read_folio(folio, &xfs_read_iomap_ops, NULL);
> +	return iomap_read_folio(folio, &xfs_read_iomap_ops,
> +				xfs_fsverity_wq(folio->mapping));
>  }
>  
>  STATIC void
>  xfs_vm_readahead(
>  	struct readahead_control	*rac)
>  {
> -	iomap_readahead(rac, &xfs_read_iomap_ops, NULL);
> +	iomap_readahead(rac, &xfs_read_iomap_ops,
> +			xfs_fsverity_wq(rac->mapping));
>  }

Ok, now I see how this workqueue is specified. I just don't see
anything XFS specific about this, and it adds complexity to the
whole system by making XFS special.

Either the fsverity code provides a per-sb workqueue instance, or
we use the global fsverity workqueue. i.e. the filesystem itself
should not have to supply this, nor should it be plumbed into
generic iomap IO path.

We already do this with direct IO completion to use a
per-superblock workqueue for deferring write completions
(sb->s_dio_done_wq), so I think that is what we should be doing
here, too. i.e. a generic per-sb post-read workqueue.

That way iomap_read_bio_alloc() becomes:

+#ifdef CONFIG_FS_VERITY
+	if (fsverity_active(inode)) {
+		bio = bio_alloc_bioset(bdev, nr_vecs, REQ_OP_READ, gfp,
+					&iomap_fsverity_bioset);
+		if (bio) {
+			bio->bi_private = inode->i_sb->i_postread_wq;
+			bio->bi_end_io = iomap_read_fsverity_end_io;
+		}
+		return bio;
+	}

And we no longer need to pass a work queue through the IO stack.
This workqueue can be initialised when we first initialise fsverity
support for the superblock at mount time, and it would be relatively
trivial to convert all the fsverity filesystems to use this
mechanism, getting rid of the global workqueue altogether.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 06/25] fsverity: pass log_blocksize to end_enable_verity()
  2024-02-15 21:45   ` Dave Chinner
@ 2024-02-16 16:18     ` Andrey Albershteyn
  0 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-16 16:18 UTC (permalink / raw)
  To: Dave Chinner
  Cc: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers

On 2024-02-16 08:45:48, Dave Chinner wrote:
> On Mon, Feb 12, 2024 at 05:58:03PM +0100, Andrey Albershteyn wrote:
> > XFS will need to know log_blocksize to remove the tree in case of an
>                         ^^^^^^^^^^^^^
> tree blocksize?

Thanks, yes tree

-- 
- Andrey


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 13/25] xfs: introduce workqueue for post read IO work
  2024-02-15 22:11   ` Dave Chinner
@ 2024-02-16 16:29     ` Andrey Albershteyn
  0 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-16 16:29 UTC (permalink / raw)
  To: Dave Chinner
  Cc: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong, ebiggers

On 2024-02-16 09:11:43, Dave Chinner wrote:
> On Mon, Feb 12, 2024 at 05:58:10PM +0100, Andrey Albershteyn wrote:
> > As noted by Dave there are two problems with using fs-verity's
> > workqueue in XFS:
> > 
> > 1. High priority workqueues are used within XFS to ensure that data
> >    IO completion cannot stall processing of journal IO completions.
> >    Hence using a WQ_HIGHPRI workqueue directly in the user data IO
> >    path is a potential filesystem livelock/deadlock vector.
> > 
> > 2. The fsverity workqueue is global - it creates a cross-filesystem
> >    contention point.
> > 
> > This patch adds per-filesystem, per-cpu workqueue for fsverity
> > work.
> > 
> > Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> > ---
> >  fs/xfs/xfs_aops.c  | 15 +++++++++++++--
> >  fs/xfs/xfs_linux.h |  1 +
> >  fs/xfs/xfs_mount.h |  1 +
> >  fs/xfs/xfs_super.c |  9 +++++++++
> >  4 files changed, 24 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index 7a6627404160..70e444c151b2 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -548,19 +548,30 @@ xfs_vm_bmap(
> >  	return iomap_bmap(mapping, block, &xfs_read_iomap_ops);
> >  }
> >  
> > +static inline struct workqueue_struct *
> > +xfs_fsverity_wq(
> > +	struct address_space	*mapping)
> > +{
> > +	if (fsverity_active(mapping->host))
> > +		return XFS_I(mapping->host)->i_mount->m_postread_workqueue;
> > +	return NULL;
> > +}
> > +
> >  STATIC int
> >  xfs_vm_read_folio(
> >  	struct file		*unused,
> >  	struct folio		*folio)
> >  {
> > -	return iomap_read_folio(folio, &xfs_read_iomap_ops, NULL);
> > +	return iomap_read_folio(folio, &xfs_read_iomap_ops,
> > +				xfs_fsverity_wq(folio->mapping));
> >  }
> >  
> >  STATIC void
> >  xfs_vm_readahead(
> >  	struct readahead_control	*rac)
> >  {
> > -	iomap_readahead(rac, &xfs_read_iomap_ops, NULL);
> > +	iomap_readahead(rac, &xfs_read_iomap_ops,
> > +			xfs_fsverity_wq(rac->mapping));
> >  }
> 
> Ok, now I see how this workqueue is specified. I just don't see
> anything XFS specific about this, and it adds complexity to the
> whole system by making XFS special.
> 
> Either the fsverity code provides a per-sb workqueue instance, or
> we use the global fsverity workqueue. i.e. the filesystem itself
> should not have to supply this, nor should it be plumbed into
> generic iomap IO path.
> 
> We already do this with direct IO completion to use a
> per-superblock workqueue for deferring write completions
> (sb->s_dio_done_wq), so I think that is what we should be doing
> here, too. i.e. a generic per-sb post-read workqueue.
> 
> That way iomap_read_bio_alloc() becomes:
> 
> +#ifdef CONFIG_FS_VERITY
> +	if (fsverity_active(inode)) {
> +		bio = bio_alloc_bioset(bdev, nr_vecs, REQ_OP_READ, gfp,
> +					&iomap_fsverity_bioset);
> +		if (bio) {
> +			bio->bi_private = inode->i_sb->i_postread_wq;
> +			bio->bi_end_io = iomap_read_fsverity_end_io;
> +		}
> +		return bio;
> +	}
> 
> And we no longer need to pass a work queue through the IO stack.
> This workqueue can be initialised when we first initialise fsverity
> support for the superblock at mount time, and it would be relatively
> trivial to convert all the fsverity filesystems to use this
> mechanism, getting rid of the global workqueue altogether.

Thanks, haven't thought about that. I will change it.

-- 
- Andrey


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 05/25] fs: add FS_XFLAG_VERITY for verity files
  2024-02-12 16:58 ` [PATCH v4 05/25] fs: add FS_XFLAG_VERITY for verity files Andrey Albershteyn
@ 2024-02-23  4:23   ` Eric Biggers
  2024-02-23 12:55     ` Andrey Albershteyn
  0 siblings, 1 reply; 44+ messages in thread
From: Eric Biggers @ 2024-02-23  4:23 UTC (permalink / raw)
  To: Andrey Albershteyn
  Cc: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong

On Mon, Feb 12, 2024 at 05:58:02PM +0100, Andrey Albershteyn wrote:
> +FS_IOC_FSGETXATTR
> +-----------------
> +
> +Since Linux v6.9, FS_XFLAG_VERITY (0x00020000) file attribute is set for verity
> +files. The attribute can be observed via lsattr.
> +
> +    [root@vm:~]# lsattr /mnt/test/foo
> +    --------------------V- /mnt/test/foo
> +
> +Note that this attribute cannot be set with FS_IOC_FSSETXATTR as enabling verity
> +requires input parameters. See FS_IOC_ENABLE_VERITY.

The lsattr example is irrelevant and misleading because lsattr uses
FS_IOC_GETFLAGS, not FS_IOC_FSGETXATTR.

Also, I know that you titled the subsection "FS_IOC_FSGETXATTR", but the text
itself should make it super clear that FS_XFLAG_VERITY is only for
FS_IOC_FSGETXATTR, not FS_IOC_GETFLAGS.
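
For illustration, a userspace check through FS_IOC_FSGETXATTR could look
roughly like the sketch below.  The helper names are made up for this
example, and the fallback FS_XFLAG_VERITY value is the one proposed in this
patch, so installed uapi headers may not define it yet.  FS_IOC_GETFLAGS
callers such as lsattr see FS_VERITY_FL instead.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

/* Assumption: value taken from this patch; may be absent from old headers. */
#ifndef FS_XFLAG_VERITY
#define FS_XFLAG_VERITY 0x00020000
#endif

static_assert(sizeof(((struct fsxattr *)0)->fsx_xflags) == sizeof(uint32_t),
	      "fsx_xflags is expected to be 32 bits wide");

/* Hypothetical helper: test the verity bit in an fsx_xflags value. */
static int xflags_has_verity(uint32_t xflags)
{
	return (xflags & FS_XFLAG_VERITY) != 0;
}

/*
 * Hypothetical helper: returns 1 if the flag is set, 0 if it is clear,
 * and -1 (with errno set by ioctl) if the attributes cannot be read.
 */
static int fd_has_verity_xflag(int fd)
{
	struct fsxattr fsx;

	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
		return -1;
	return xflags_has_verity(fsx.fsx_xflags);
}
```

The flag is read-only from this interface; enabling verity still goes
through FS_IOC_ENABLE_VERITY.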

> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 48ad69f7722e..6e63ea832d4f 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -140,6 +140,7 @@ struct fsxattr {
>  #define FS_XFLAG_FILESTREAM	0x00004000	/* use filestream allocator */
>  #define FS_XFLAG_DAX		0x00008000	/* use DAX for IO */
>  #define FS_XFLAG_COWEXTSIZE	0x00010000	/* CoW extent size allocator hint */
> +#define FS_XFLAG_VERITY		0x00020000	/* fs-verity sealed inode */

There's currently nowhere in the documentation or code that uses the phrase
"fs-verity sealed inode".  It's instead called a verity file, or a file that has
fs-verity enabled.  We should try to avoid inconsistent terminology.

- Eric

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 06/25] fsverity: pass log_blocksize to end_enable_verity()
  2024-02-12 16:58 ` [PATCH v4 06/25] fsverity: pass log_blocksize to end_enable_verity() Andrey Albershteyn
  2024-02-15 21:45   ` Dave Chinner
@ 2024-02-23  4:26   ` Eric Biggers
  2024-02-23 13:02     ` Andrey Albershteyn
  1 sibling, 1 reply; 44+ messages in thread
From: Eric Biggers @ 2024-02-23  4:26 UTC (permalink / raw)
  To: Andrey Albershteyn
  Cc: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong

On Mon, Feb 12, 2024 at 05:58:03PM +0100, Andrey Albershteyn wrote:
> XFS will need to know log_blocksize to remove the tree in case of an

tree_blocksize

> diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
> index 1eb7eae580be..ab7b0772899b 100644
> --- a/include/linux/fsverity.h
> +++ b/include/linux/fsverity.h
> @@ -51,6 +51,7 @@ struct fsverity_operations {
>  	 * @desc: the verity descriptor to write, or NULL on failure
>  	 * @desc_size: size of verity descriptor, or 0 on failure
>  	 * @merkle_tree_size: total bytes the Merkle tree took up
> +	 * @tree_blocksize: size of the Merkle tree block

There may be many Merkle tree blocks, so it doesn't really make sense to write
"the Merkle tree block".  Maybe write "the Merkle tree block size".

Likewise in fs/btrfs/verity.c.

- Eric

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 07/25] fsverity: support block-based Merkle tree caching
  2024-02-12 16:58 ` [PATCH v4 07/25] fsverity: support block-based Merkle tree caching Andrey Albershteyn
@ 2024-02-23  5:24   ` Eric Biggers
  2024-02-23 16:02     ` Andrey Albershteyn
  0 siblings, 1 reply; 44+ messages in thread
From: Eric Biggers @ 2024-02-23  5:24 UTC (permalink / raw)
  To: Andrey Albershteyn
  Cc: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong

On Mon, Feb 12, 2024 at 05:58:04PM +0100, Andrey Albershteyn wrote:
> diff --git a/fs/verity/read_metadata.c b/fs/verity/read_metadata.c
> index f58432772d9e..7e153356e7bc 100644
> --- a/fs/verity/read_metadata.c
> +++ b/fs/verity/read_metadata.c
[...]
>  	/*
> -	 * Iterate through each Merkle tree page in the requested range and copy
> -	 * the requested portion to userspace.  Note that the Merkle tree block
> -	 * size isn't important here, as we are returning a byte stream; i.e.,
> -	 * we can just work with pages even if the tree block size != PAGE_SIZE.
> +	 * Iterate through each Merkle tree block in the requested range and
> +	 * copy the requested portion to userspace. Note that we are returning
> +	 * a byte stream, so PAGE_SIZE & block_size are not important here.

The block size *is* important here now, because this code is now working with
the data in blocks.  Maybe just delete the last sentence from the comment.

> +		fsverity_drop_block(inode, &block);
> +		block.kaddr = NULL;

Either the 'block.kaddr = NULL' should not be here, or it should be done
automatically in fsverity_drop_block().

> +		num_ra_pages = level == 0 ?
> +			min(max_ra_pages, params->tree_pages - hpage_idx) : 0;
> +		err = fsverity_read_merkle_tree_block(
> +			inode, hblock_idx << params->log_blocksize, block,
> +			params->log_blocksize, num_ra_pages);

'hblock_idx << params->log_blocksize' needs to be
'(u64)hblock_idx << params->log_blocksize'
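
The cast matters wherever the index type is narrower than 64 bits, because
the shift is evaluated in the index's own type before the result is
widened.  A minimal standalone sketch, using uint32_t as a stand-in for a
32-bit unsigned long (the function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Shift done in 32 bits: bits above bit 31 are lost before widening. */
static uint64_t hblock_pos_truncated(uint32_t hblock_idx,
				     unsigned int log_blocksize)
{
	assert(log_blocksize < 32);
	return hblock_idx << log_blocksize;
}

/* Widen first, then shift: the full 64-bit byte position survives. */
static uint64_t hblock_pos(uint32_t hblock_idx, unsigned int log_blocksize)
{
	assert(log_blocksize < 32);
	return (uint64_t)hblock_idx << log_blocksize;
}
```

With 4K blocks (log_blocksize = 12), block index 1 << 20 sits at byte
position 1 << 32; the truncated variant yields 0 for it.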

>  	for (; level > 0; level--) {
> -		kunmap_local(hblocks[level - 1].addr);
> -		put_page(hblocks[level - 1].page);
> +		fsverity_drop_block(inode, &hblocks[level - 1].block);
>  	}

Braces should be removed above

> +/**
> + * fsverity_invalidate_range() - invalidate range of Merkle tree blocks
> + * @inode: inode to which this Merkle tree blocks belong
> + * @offset: offset into the Merkle tree
> + * @size: number of bytes to invalidate starting from @offset

Maybe use @pos instead of @offset, to make it clear that it's in bytes.

But, what happens if the region passed is not Merkle tree block aligned?
Perhaps this function should operate on blocks, to avoid that case?
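
To make the alignment question concrete: a byte range that is not block
aligned still touches every block it overlaps, so a block-based interface
would have to round the start down and the end up.  A sketch of that
mapping, with hypothetical names that are not taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

struct block_range {
	uint64_t first;	/* index of the first block touched */
	uint64_t count;	/* number of blocks touched */
};

/* Map a byte range [pos, pos + size) to the tree blocks it overlaps. */
static struct block_range bytes_to_blocks(uint64_t pos, uint64_t size,
					  unsigned int log_blocksize)
{
	struct block_range r = { 0, 0 };
	uint64_t last;

	assert(log_blocksize < 64);
	if (size == 0)
		return r;

	last = pos + size - 1;		/* last byte in the range */
	r.first = pos >> log_blocksize;
	r.count = (last >> log_blocksize) - r.first + 1;
	return r;
}
```

An unaligned 4096-byte range starting at byte 100 covers two 4K blocks, so
invalidating "4096 bytes" would drop the verified state of both.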

> + * Note! As this function clears fs-verity bitmap and can be run from multiple
> + * threads simultaneously, filesystem has to take care of operation ordering
> + * while invalidating Merkle tree and caching it. See fsverity_invalidate_page()
> + * as reference.

I'm not sure what this means.  What specifically does the filesystem have to do?

> +/* fsverity_invalidate_page() - invalidate Merkle tree blocks in the page

Is this intended to be kerneldoc?  Kerneldoc comments start with /**

Also, this function is only used within fs/verity/verify.c itself.  So it should
be static, and it shouldn't be declared in include/linux/fsverity.h.

> + * @inode: inode to which this Merkle tree blocks belong
> + * @page: page which contains blocks which need to be invalidated
> + * @index: index of the first Merkle tree block in the page

The only value that is assigned to index is 'pos >> PAGE_SHIFT', which implies
it is in units of pages, not Merkle tree blocks.  Which is correct?

> + *
> + * This function invalidates "verified" state of all Merkle tree blocks within
> + * the 'page'.
> + *
> + * When the Merkle tree block size and page size are the same, then the
> + * ->hash_block_verified bitmap isn't allocated, and we use PG_checked
> + * to directly indicate whether the page's block has been verified. This
> + * function does nothing in this case as page is invalidated by evicting from
> + * the memory.
> + *
> + * Using PG_checked also guarantees that we re-verify hash pages that
> + * get evicted and re-instantiated from the backing storage, as new
> + * pages always start out with PG_checked cleared.

This comment duplicates information from the comment in the function itself.

> +void fsverity_drop_block(struct inode *inode,
> +		struct fsverity_blockbuf *block)
> +{
> +	if (inode->i_sb->s_vop->drop_block)
> +		inode->i_sb->s_vop->drop_block(block);
> +	else {
> +		struct page *page = (struct page *)block->context;
> +
> +		/* Merkle tree block size == PAGE_SIZE; */
> +		if (block->verified)
> +			SetPageChecked(page);
> +
> +		kunmap_local(block->kaddr);
> +		put_page(page);
> +	}
> +}

I don't think this is the logical place for the call to SetPageChecked().
verify_data_block() currently does:

        if (vi->hash_block_verified)
                set_bit(hblock_idx, vi->hash_block_verified);
        else
                SetPageChecked(page);

You're proposing moving the SetPageChecked() to fsverity_drop_block().  Why?  We
should try to do things in a consistent place.

Similarly, I don't see why is_hash_block_verified() shouldn't keep the
PageChecked().

If we just keep getting and setting PG_checked in the same places as we do
currently, then adding fsverity_blockbuf::verified wouldn't be necessary.

Maybe you intended to move the awareness of PG_checked out of fs/verity/ and
into the filesystems?  Your change in how PG_checked is get and set is sort of a
step towards that, but it doesn't complete it.  It doesn't make sense to leave
in this half-finished state.  IMO, keeping fs/verity/ aware of PG_checked is
fine for now.  It avoids the need for some indirect calls, which is nice.
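
As a rough userspace model of the existing split discussed above (a bitmap
when the tree block size differs from the page size, a per-page flag
otherwise), consider the sketch below.  All names are hypothetical
stand-ins, and it deliberately leaves out the handling of evicted and
re-read pages that the real code needs.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Stand-in for struct page; 'checked' plays the role of PG_checked. */
struct model_page {
	bool checked;
};

/* Stand-in for struct fsverity_info. */
struct model_info {
	/* NULL when the tree block size equals the page size. */
	unsigned long *hash_block_verified;
};

/* Record that a hash block was verified, in one place only. */
static void model_mark_verified(struct model_info *vi,
				struct model_page *page,
				unsigned long hblock_idx)
{
	assert(vi && page);
	if (vi->hash_block_verified)
		vi->hash_block_verified[hblock_idx / MODEL_BITS_PER_LONG] |=
			1UL << (hblock_idx % MODEL_BITS_PER_LONG);
	else
		page->checked = true;
}

/* Query the same state that model_mark_verified() records. */
static bool model_is_verified(const struct model_info *vi,
			      const struct model_page *page,
			      unsigned long hblock_idx)
{
	assert(vi && page);
	if (vi->hash_block_verified)
		return vi->hash_block_verified[hblock_idx / MODEL_BITS_PER_LONG] &
		       (1UL << (hblock_idx % MODEL_BITS_PER_LONG));
	return page->checked;
}
```

Keeping the getter and setter next to each other like this is what makes a
separate "verified" field in the block buffer redundant.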

> +/**
> + * struct fsverity_blockbuf - Merkle Tree block
> + * @kaddr: virtual address of the block's data
> + * @size: buffer size

Is "buffer size" different from block size?

> + * @verified: true if block is verified against Merkle tree

This field has confusing semantics, as it's not used by the filesystems but only
by fs/verity/ internally.  As per my feedback above, I don't think this field is
necessary.

> + * Buffer containing single Merkle Tree block. These buffers are passed
> + *  - to filesystem, when fs-verity is building/writing merkel tree,
> + *  - from filesystem, when fs-verity is reading merkle tree from a disk.
> + * Filesystems sets kaddr together with size to point to a memory which contains
> + * Merkle tree block. Same is done by fs-verity when Merkle tree is need to be
> + * written down to disk.

Writes actually still use fsverity_operations::write_merkle_tree_block(), which
does not use this struct.

> + * For Merkle tree block == PAGE_SIZE, fs-verity sets verified flag to true if
> + * block in the buffer was verified.

Again, I think we can do without this field.

> +	/**
> +	 * Read a Merkle tree block of the given inode.
> +	 * @inode: the inode
> +	 * @pos: byte offset of the block within the Merkle tree
> +	 * @block: block buffer for filesystem to point it to the block
> +	 * @log_blocksize: size of the expected block

Presumably @log_blocksize is the log2 of the size of the block?

> +	 * @num_ra_pages: The number of pages with blocks that should be
> +	 *		  prefetched starting at @index if the page at @index
> +	 *		  isn't already cached.  Implementations may ignore this
> +	 *		  argument; it's only a performance optimization.

There's no parameter named @index.

> +	 * As filesystem does caching of the blocks, this functions needs to tell
> +	 * fsverity which blocks are not valid anymore (were evicted from memory)
> +	 * by calling fsverity_invalidate_range().

This function only reads a single block, so what does this mean by "blocks"?

Since there's only one block being read, why isn't the validation status just
conveyed through a bool in fsverity_blockbuf?

> +	/**
> +	 * Release the reference to a Merkle tree block
> +	 *
> +	 * @page: the block to release

@block, not @page

- Eric

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 08/25] fsverity: calculate readahead in bytes instead of pages
  2024-02-12 16:58 ` [PATCH v4 08/25] fsverity: calculate readahead in bytes instead of pages Andrey Albershteyn
@ 2024-02-23  5:29   ` Eric Biggers
  0 siblings, 0 replies; 44+ messages in thread
From: Eric Biggers @ 2024-02-23  5:29 UTC (permalink / raw)
  To: Andrey Albershteyn
  Cc: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong

On Mon, Feb 12, 2024 at 05:58:05PM +0100, Andrey Albershteyn wrote:
> +		u64 block_offset;

'hblock_pos', to make it clear that it's in bytes, and that it's for the hash
block, not the data block.

> +		u64 ra_bytes = 0;
> +		u64 tree_size;
>  
>  		/*
>  		 * The index of the block in the current level; also the index
> @@ -105,18 +106,20 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
>  		/* Index of the hash block in the tree overall */
>  		hblock_idx = params->level_start[level] + next_hidx;
>  
> -		/* Index of the hash page in the tree overall */
> -		hpage_idx = hblock_idx >> params->log_blocks_per_page;
> +		/* Offset of the Merkle tree block into the tree */
> +		block_offset = hblock_idx << params->log_blocksize;
>  
>  		/* Byte offset of the hash within the block */
>  		hoffset = (hidx << params->log_digestsize) &
>  			  (params->block_size - 1);
>  
> -		num_ra_pages = level == 0 ?
> -			min(max_ra_pages, params->tree_pages - hpage_idx) : 0;
> +		if (level == 0) {
> +			tree_size = params->tree_pages << PAGE_SHIFT;
> +			ra_bytes = min(max_ra_bytes, (tree_size - block_offset));
> +		}

How about:

		ra_bytes = level == 0 ?
			min(max_ra_bytes, params->tree_size - hblock_pos) : 0;
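
A standalone sketch of that clamp, assuming tree_size and hblock_pos are
both byte counts and hblock_pos never exceeds tree_size (the helper names
are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Readahead is only requested for level 0 hash blocks, and never past
 * the end of the Merkle tree.
 */
static uint64_t merkle_ra_bytes(int level, uint64_t max_ra_bytes,
				uint64_t tree_size, uint64_t hblock_pos)
{
	assert(hblock_pos <= tree_size);	/* guards the subtraction */
	return level == 0 ?
		min_u64(max_ra_bytes, tree_size - hblock_pos) : 0;
}
```

Near the end of the tree the result shrinks to the bytes that remain, which
is what the tree_pages based version computed in page units.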

- Eric

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 09/25] fsverity: add tracepoints
  2024-02-12 16:58 ` [PATCH v4 09/25] fsverity: add tracepoints Andrey Albershteyn
@ 2024-02-23  5:31   ` Eric Biggers
  2024-02-23 13:23     ` Andrey Albershteyn
  0 siblings, 1 reply; 44+ messages in thread
From: Eric Biggers @ 2024-02-23  5:31 UTC (permalink / raw)
  To: Andrey Albershteyn
  Cc: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong

On Mon, Feb 12, 2024 at 05:58:06PM +0100, Andrey Albershteyn wrote:
> fs-verity previously had debug printk but it was removed. This patch
> adds trace points to the same places where printk were used (with a
> few additional ones).

Are all of these actually useful?  There's a maintenance cost to adding all of
these.

- Eric

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 05/25] fs: add FS_XFLAG_VERITY for verity files
  2024-02-23  4:23   ` Eric Biggers
@ 2024-02-23 12:55     ` Andrey Albershteyn
  2024-02-23 17:59       ` Eric Biggers
  0 siblings, 1 reply; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-23 12:55 UTC (permalink / raw)
  To: Eric Biggers; +Cc: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong

On 2024-02-22 20:23:04, Eric Biggers wrote:
> On Mon, Feb 12, 2024 at 05:58:02PM +0100, Andrey Albershteyn wrote:
> > +FS_IOC_FSGETXATTR
> > +-----------------
> > +
> > +Since Linux v6.9, FS_XFLAG_VERITY (0x00020000) file attribute is set for verity
> > +files. The attribute can be observed via lsattr.
> > +
> > +    [root@vm:~]# lsattr /mnt/test/foo
> > +    --------------------V- /mnt/test/foo
> > +
> > +Note that this attribute cannot be set with FS_IOC_FSSETXATTR as enabling verity
> > +requires input parameters. See FS_IOC_ENABLE_VERITY.
> 
> The lsattr example is irrelevant and misleading because lsattr uses
> FS_IOC_GETFLAGS, not FS_IOC_FSGETXATTR.
> 
> Also, I know that you titled the subsection "FS_IOC_FSGETXATTR", but the text
> itself should make it super clear that FS_XFLAG_VERITY is only for
> FS_IOC_FSGETXATTR, not FS_IOC_GETFLAGS.

Sure, I will remove the example. Would something like this be clear
enough?

    FS_IOC_FSGETXATTR
    -----------------

    Since Linux v6.9, FS_XFLAG_VERITY (0x00020000) file attribute is set for verity
    files. This attribute can be checked with FS_IOC_FSGETXATTR ioctl. Note that
    this attribute cannot be set with FS_IOC_FSSETXATTR as enabling verity requires
    input parameters. See FS_IOC_ENABLE_VERITY.

> > diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> > index 48ad69f7722e..6e63ea832d4f 100644
> > --- a/include/uapi/linux/fs.h
> > +++ b/include/uapi/linux/fs.h
> > @@ -140,6 +140,7 @@ struct fsxattr {
> >  #define FS_XFLAG_FILESTREAM	0x00004000	/* use filestream allocator */
> >  #define FS_XFLAG_DAX		0x00008000	/* use DAX for IO */
> >  #define FS_XFLAG_COWEXTSIZE	0x00010000	/* CoW extent size allocator hint */
> > +#define FS_XFLAG_VERITY		0x00020000	/* fs-verity sealed inode */
> 
> There's currently nowhere in the documentation or code that uses the phrase
> "fs-verity sealed inode".  It's instead called a verity file, or a file that has
> fs-verity enabled.  We should try to avoid inconsistent terminology.

Oops, missed this one. Thanks!

-- 
- Andrey


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 06/25] fsverity: pass log_blocksize to end_enable_verity()
  2024-02-23  4:26   ` Eric Biggers
@ 2024-02-23 13:02     ` Andrey Albershteyn
  0 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-23 13:02 UTC (permalink / raw)
  To: Eric Biggers; +Cc: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong

> There may be many Merkle tree blocks, so it doesn't really make sense to write
> "the Merkle tree block".  Maybe write "the Merkle tree block size".
> 
> Likewise in fs/btrfs/verity.c.

Right, thanks!

-- 
- Andrey


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 09/25] fsverity: add tracepoints
  2024-02-23  5:31   ` Eric Biggers
@ 2024-02-23 13:23     ` Andrey Albershteyn
  2024-02-23 18:27       ` Eric Biggers
  0 siblings, 1 reply; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-23 13:23 UTC (permalink / raw)
  To: Eric Biggers; +Cc: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong

On 2024-02-22 21:31:56, Eric Biggers wrote:
> On Mon, Feb 12, 2024 at 05:58:06PM +0100, Andrey Albershteyn wrote:
> > fs-verity previously had debug printk but it was removed. This patch
> > adds trace points to the same places where printk were used (with a
> > few additional ones).
> 
> Are all of these actually useful?  There's a maintenance cost to adding all of
> these.
> 

Well, they were useful for me while testing/working on this
patchset. Especially combining -e xfs -e fsverity was quite good for
checking correctness and debugging with xfstests tests. They
could probably be handy if something breaks.

Or do you mean whether each of them is useful? The ones which I added to
signature verification probably aren't as useful as the others; my
intention in adding them was to also cover these code paths.

-- 
- Andrey


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 07/25] fsverity: support block-based Merkle tree caching
  2024-02-23  5:24   ` Eric Biggers
@ 2024-02-23 16:02     ` Andrey Albershteyn
  2024-02-23 18:07       ` Eric Biggers
  0 siblings, 1 reply; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-23 16:02 UTC (permalink / raw)
  To: Eric Biggers; +Cc: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong

On 2024-02-22 21:24:59, Eric Biggers wrote:
> On Mon, Feb 12, 2024 at 05:58:04PM +0100, Andrey Albershteyn wrote:
> > diff --git a/fs/verity/read_metadata.c b/fs/verity/read_metadata.c
> > index f58432772d9e..7e153356e7bc 100644
> > --- a/fs/verity/read_metadata.c
> > +++ b/fs/verity/read_metadata.c
> [...]
> >  	/*
> > -	 * Iterate through each Merkle tree page in the requested range and copy
> > -	 * the requested portion to userspace.  Note that the Merkle tree block
> > -	 * size isn't important here, as we are returning a byte stream; i.e.,
> > -	 * we can just work with pages even if the tree block size != PAGE_SIZE.
> > +	 * Iterate through each Merkle tree block in the requested range and
> > +	 * copy the requested portion to userspace. Note that we are returning
> > +	 * a byte stream, so PAGE_SIZE & block_size are not important here.
> 
> The block size *is* important here now, because this code is now working with
> the data in blocks.  Maybe just delete the last sentence from the comment.
> 
> > +		fsverity_drop_block(inode, &block);
> > +		block.kaddr = NULL;
> 
> Either the 'block.kaddr = NULL' should not be here, or it should be done
> automatically in fsverity_drop_block().
> 
> > +		num_ra_pages = level == 0 ?
> > +			min(max_ra_pages, params->tree_pages - hpage_idx) : 0;
> > +		err = fsverity_read_merkle_tree_block(
> > +			inode, hblock_idx << params->log_blocksize, block,
> > +			params->log_blocksize, num_ra_pages);
> 
> 'hblock_idx << params->log_blocksize' needs to be
> '(u64)hblock_idx << params->log_blocksize'
> 
> >  	for (; level > 0; level--) {
> > -		kunmap_local(hblocks[level - 1].addr);
> > -		put_page(hblocks[level - 1].page);
> > +		fsverity_drop_block(inode, &hblocks[level - 1].block);
> >  	}
> 
> Braces should be removed above
> 
> > +/**
> > + * fsverity_invalidate_range() - invalidate range of Merkle tree blocks
> > + * @inode: inode to which this Merkle tree blocks belong
> > + * @offset: offset into the Merkle tree
> > + * @size: number of bytes to invalidate starting from @offset
> 
> Maybe use @pos instead of @offset, to make it clear that it's in bytes.
> 
> But, what happens if the region passed is not Merkle tree block aligned?
> Perhaps this function should operate on blocks, to avoid that case?
> 
> > + * Note! As this function clears fs-verity bitmap and can be run from multiple
> > + * threads simultaneously, filesystem has to take care of operation ordering
> > + * while invalidating Merkle tree and caching it. See fsverity_invalidate_page()
> > + * as reference.
> 
> I'm not sure what this means.  What specifically does the filesystem have to do?
> 
> > +/* fsverity_invalidate_page() - invalidate Merkle tree blocks in the page
> 
> Is this intended to be kerneldoc?  Kerneldoc comments start with /**
> 
> Also, this function is only used within fs/verity/verify.c itself.  So it should
> be static, and it shouldn't be declared in include/linux/fsverity.h.
> 
> > + * @inode: inode to which this Merkle tree blocks belong
> > + * @page: page which contains blocks which need to be invalidated
> > + * @index: index of the first Merkle tree block in the page
> 
> The only value that is assigned to index is 'pos >> PAGE_SHIFT', which implies
> it is in units of pages, not Merkle tree blocks.  Which is correct?
> 
> > + *
> > + * This function invalidates "verified" state of all Merkle tree blocks within
> > + * the 'page'.
> > + *
> > + * When the Merkle tree block size and page size are the same, then the
> > + * ->hash_block_verified bitmap isn't allocated, and we use PG_checked
> > + * to directly indicate whether the page's block has been verified. This
> > + * function does nothing in this case as page is invalidated by evicting from
> > + * the memory.
> > + *
> > + * Using PG_checked also guarantees that we re-verify hash pages that
> > + * get evicted and re-instantiated from the backing storage, as new
> > + * pages always start out with PG_checked cleared.
> 
> This comment duplicates information from the comment in the function itself.
> 
> > +void fsverity_drop_block(struct inode *inode,
> > +		struct fsverity_blockbuf *block)
> > +{
> > +	if (inode->i_sb->s_vop->drop_block)
> > +		inode->i_sb->s_vop->drop_block(block);
> > +	else {
> > +		struct page *page = (struct page *)block->context;
> > +
> > +		/* Merkle tree block size == PAGE_SIZE; */
> > +		if (block->verified)
> > +			SetPageChecked(page);
> > +
> > +		kunmap_local(block->kaddr);
> > +		put_page(page);
> > +	}
> > +}
> 
> I don't think this is the logical place for the call to SetPageChecked().
> verity_data_block() currently does:
> 
>         if (vi->hash_block_verified)
>                 set_bit(hblock_idx, vi->hash_block_verified);
>         else
>                 SetPageChecked(page);
> 
> You're proposing moving the SetPageChecked() to fsverity_drop_block().  Why?  We
> should try to do things in a consistent place.
> 
> Similarly, I don't see why is_hash_block_verified() shouldn't keep the
> PageChecked().
> 
> If we just keep PG_checked be get and set in the same places it currently is,
> then adding fsverity_blockbuf::verified wouldn't be necessary.
> 
> Maybe you intended to move the awareness of PG_checked out of fs/verity/ and
> into the filesystems?

yes

> Your change in how PG_checked is get and set is sort of a
> step towards that, but it doesn't complete it.  It doesn't make sense to leave
> in this half-finished state.

What do you think is missing? I didn't want to make too many changes
to fs which already use fs-verity and completely change the
interface, just to shift page handling stuff to middle layer
functions. So yeah kinda "step towards" only :)

> IMO, keeping fs/verity/ aware of PG_checked is
> fine for now.  It avoids the need for some indirect calls, which is nice.

> > +/**
> > + * struct fsverity_blockbuf - Merkle Tree block
> > + * @kaddr: virtual address of the block's data
> > + * @size: buffer size
> 
> Is "buffer size" different from block size?
> 
> > + * @verified: true if block is verified against Merkle tree
> 
> This field has confusing semantics, as it's not used by the filesystems but only
> by fs/verity/ internally.  As per my feedback above, I don't think this field is
> necessary.
> 
> > + * Buffer containing single Merkle Tree block. These buffers are passed
> > + *  - to filesystem, when fs-verity is building/writing merkel tree,
> > + *  - from filesystem, when fs-verity is reading merkle tree from a disk.
> > + * Filesystems sets kaddr together with size to point to a memory which contains
> > + * Merkle tree block. Same is done by fs-verity when Merkle tree is need to be
> > + * written down to disk.
> 
> Writes actually still use fsverity_operations::write_merkle_tree_block(), which
> does not use this struct.
> 
> > + * For Merkle tree block == PAGE_SIZE, fs-verity sets verified flag to true if
> > + * block in the buffer was verified.
> 
> Again, I think we can do without this field.
> 
> > +	/**
> > +	 * Read a Merkle tree block of the given inode.
> > +	 * @inode: the inode
> > +	 * @pos: byte offset of the block within the Merkle tree
> > +	 * @block: block buffer for filesystem to point it to the block
> > +	 * @log_blocksize: size of the expected block
> 
> Presumably @log_blocksize is the log2 of the size of the block?
> 
> > +	 * @num_ra_pages: The number of pages with blocks that should be
> > +	 *		  prefetched starting at @index if the page at @index
> > +	 *		  isn't already cached.  Implementations may ignore this
> > +	 *		  argument; it's only a performance optimization.
> 
> There's no parameter named @index.
> 
> > +	 * As filesystem does caching of the blocks, this functions needs to tell
> > +	 * fsverity which blocks are not valid anymore (were evicted from memory)
> > +	 * by calling fsverity_invalidate_range().
> 
> This function only reads a single block, so what does this mean by "blocks"?
> 
> Since there's only one block being read, why isn't the validation status just
> conveyed through a bool in fsverity_blockbuf?

There's a case where XFS needs to invalidate multiple tree blocks
even though only a single one is requested, the same way ext4
invalidates all blocks in a page when the page is evicted. This
happens, for example, when the page size is 64k and the fs block size
is 4k. XFS then calls fsverity_invalidate_range() for all of those
blocks, not just for the requested one.

I can rephrase this comment.
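
To illustrate the arithmetic only, here is a rough userspace sketch. It
assumes the cached unit is a power-of-two sized buffer (64k in the
example above) and that the whole unit has to be invalidated at once.
struct blk_range, covering_range() and blocks_in_range() are
hypothetical names, not functions from the patchset.

```c
#include <stdint.h>

/* Byte range within the Merkle tree. */
struct blk_range {
	uint64_t pos;	/* start of the range, in bytes */
	uint64_t size;	/* length of the range, in bytes */
};

/*
 * Given the byte position of the requested block, return the range of
 * the whole cached unit that contains it. log_unitsize is the log2 of
 * the cached unit size (16 for 64k).
 */
static struct blk_range covering_range(uint64_t pos, unsigned int log_unitsize)
{
	struct blk_range r;

	r.size = 1ULL << log_unitsize;
	r.pos = pos & ~(r.size - 1);	/* round down to the unit boundary */
	return r;
}

/* Number of Merkle tree blocks covered by the range. */
static uint64_t blocks_in_range(struct blk_range r, unsigned int log_blocksize)
{
	return r.size >> log_blocksize;
}
```

So a request for the 4k block at position 5 * 4096 maps to the range
pos = 0, size = 65536, which covers 16 tree blocks. The pos and size of
that range are the kind of values that would be handed to
fsverity_invalidate_range().
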

> > +	/**
> > +	 * Release the reference to a Merkle tree block
> > +	 *
> > +	 * @page: the block to release
> 
> @block, not @page
> 
> - Eric
> 

Thanks for all the spotted mistakes, I will fix them.

-- 
- Andrey



* Re: [PATCH v4 05/25] fs: add FS_XFLAG_VERITY for verity files
  2024-02-23 12:55     ` Andrey Albershteyn
@ 2024-02-23 17:59       ` Eric Biggers
  0 siblings, 0 replies; 44+ messages in thread
From: Eric Biggers @ 2024-02-23 17:59 UTC (permalink / raw)
  To: Andrey Albershteyn
  Cc: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong

On Fri, Feb 23, 2024 at 01:55:21PM +0100, Andrey Albershteyn wrote:
> On 2024-02-22 20:23:04, Eric Biggers wrote:
> > On Mon, Feb 12, 2024 at 05:58:02PM +0100, Andrey Albershteyn wrote:
> > > +FS_IOC_FSGETXATTR
> > > +-----------------
> > > +
> > > +Since Linux v6.9, FS_XFLAG_VERITY (0x00020000) file attribute is set for verity
> > > +files. The attribute can be observed via lsattr.
> > > +
> > > +    [root@vm:~]# lsattr /mnt/test/foo
> > > +    --------------------V- /mnt/test/foo
> > > +
> > > +Note that this attribute cannot be set with FS_IOC_FSSETXATTR as enabling verity
> > > +requires input parameters. See FS_IOC_ENABLE_VERITY.
> > 
> > The lsattr example is irrelevant and misleading because lsattr uses
> > FS_IOC_GETFLAGS, not FS_IOC_FSGETXATTR.
> > 
> > Also, I know that you titled the subsection "FS_IOC_FSGETXATTR", but the text
> > itself should make it super clear that FS_XFLAG_VERITY is only for
> > FS_IOC_FSGETXATTR, not FS_IOC_GETFLAGS.
> 
> Sure, I will remove the example. Would something like this be clear
> enough?
> 
>     FS_IOC_FSGETXATTR
>     -----------------
> 
>     Since Linux v6.9, FS_XFLAG_VERITY (0x00020000) file attribute is set for verity
>     files. This attribute can be checked with FS_IOC_FSGETXATTR ioctl. Note that
>     this attribute cannot be set with FS_IOC_FSSETXATTR as enabling verity requires
>     input parameters. See FS_IOC_ENABLE_VERITY.

It's better, but I'd probably put FS_IOC_FSGETXATTR in the first sentence.
Like: Since Linux v6.9, the FS_IOC_FSGETXATTR ioctl sets FS_XFLAG_VERITY
(0x00020000) in the returned flags when the file has verity enabled.
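
A userspace check along those lines could look roughly like the sketch
below. It assumes a Linux system with <linux/fs.h>; the fallback define
uses the value quoted in the patch because older headers do not have
FS_XFLAG_VERITY. The helper names xflags_has_verity() and
file_has_verity() are made up for this example.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

#ifndef FS_XFLAG_VERITY
/* Value quoted in the patch; missing from older UAPI headers. */
#define FS_XFLAG_VERITY 0x00020000
#endif

/* Pure helper: test the verity bit in a set of fsx_xflags. */
static int xflags_has_verity(uint32_t xflags)
{
	return (xflags & FS_XFLAG_VERITY) != 0;
}

/*
 * Returns 1 if the file reports FS_XFLAG_VERITY, 0 if it does not,
 * and -1 if the file cannot be opened or the ioctl fails.
 */
static int file_has_verity(const char *path)
{
	struct fsxattr fsx;
	int fd, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, FS_IOC_FSGETXATTR, &fsx);
	close(fd);
	if (ret < 0)
		return -1;

	return xflags_has_verity(fsx.fsx_xflags);
}
```

Note that this goes through FS_IOC_FSGETXATTR, not FS_IOC_GETFLAGS, so
it is a different interface from the one lsattr uses.
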

- Eric


* Re: [PATCH v4 07/25] fsverity: support block-based Merkle tree caching
  2024-02-23 16:02     ` Andrey Albershteyn
@ 2024-02-23 18:07       ` Eric Biggers
  2024-02-24 14:10         ` Andrey Albershteyn
  0 siblings, 1 reply; 44+ messages in thread
From: Eric Biggers @ 2024-02-23 18:07 UTC (permalink / raw)
  To: Andrey Albershteyn
  Cc: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong

On Fri, Feb 23, 2024 at 05:02:45PM +0100, Andrey Albershteyn wrote:
> > > +void fsverity_drop_block(struct inode *inode,
> > > +		struct fsverity_blockbuf *block)
> > > +{
> > > +	if (inode->i_sb->s_vop->drop_block)
> > > +		inode->i_sb->s_vop->drop_block(block);
> > > +	else {
> > > +		struct page *page = (struct page *)block->context;
> > > +
> > > +		/* Merkle tree block size == PAGE_SIZE; */
> > > +		if (block->verified)
> > > +			SetPageChecked(page);
> > > +
> > > +		kunmap_local(block->kaddr);
> > > +		put_page(page);
> > > +	}
> > > +}
> > 
> > I don't think this is the logical place for the call to SetPageChecked().
> > verity_data_block() currently does:
> > 
> >         if (vi->hash_block_verified)
> >                 set_bit(hblock_idx, vi->hash_block_verified);
> >         else
> >                 SetPageChecked(page);
> > 
> > You're proposing moving the SetPageChecked() to fsverity_drop_block().  Why?  We
> > should try to do things in a consistent place.
> > 
> > Similarly, I don't see why is_hash_block_verified() shouldn't keep the
> > PageChecked().
> > 
> > If we just keep PG_checked be get and set in the same places it currently is,
> > then adding fsverity_blockbuf::verified wouldn't be necessary.
> > 
> > Maybe you intended to move the awareness of PG_checked out of fs/verity/ and
> > into the filesystems?
> 
> yes
> 
> > Your change in how PG_checked is get and set is sort of a
> > step towards that, but it doesn't complete it.  It doesn't make sense to leave
> > in this half-finished state.
> 
> What do you think is missing? I didn't want to make too many changes
> to fs which already use fs-verity and completely change the
> interface, just to shift page handling stuff to middle layer
> functions. So yeah kinda "step towards" only :)

In your patchset, PG_checked is get and set by fsverity_drop_block() and
fsverity_read_merkle_tree_block(), which are located in fs/verity/ and called by
other code in fs/verity/.  I don't see this as being a separate layer from the
rest of fs/verity/.  If it was done by the individual filesystems (e.g.
fs/ext4/) that would be different, but it's not.  I think keeping fs/verity/
aware of PG_checked is the right call, and it's not necessary to do the half-way
move that sort of moves it to a different place in the stack but not really.
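
To make the "one place" point concrete, here is a simplified userspace
model of the existing logic quoted above. It is only a sketch: the toy_*
types stand in for struct page and struct fsverity_info, a plain bool
stands in for PG_checked, and the real code has more details than this
model shows.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_BITS_PER_LONG (8 * sizeof(unsigned long))

/* Stand-in for struct page; 'checked' models PG_checked. */
struct toy_page {
	bool checked;
};

/*
 * Stand-in for struct fsverity_info. The bitmap is NULL when the
 * Merkle tree block size equals the page size.
 */
struct toy_verity_info {
	unsigned long *hash_block_verified;
};

/* Record that a hash block has been verified. */
static void toy_mark_verified(struct toy_verity_info *vi,
			      struct toy_page *page, unsigned long hblock_idx)
{
	if (vi->hash_block_verified)
		vi->hash_block_verified[hblock_idx / TOY_BITS_PER_LONG] |=
			1UL << (hblock_idx % TOY_BITS_PER_LONG);
	else
		page->checked = true;
}

/* Query the verified state from the same place it was recorded. */
static bool toy_is_verified(const struct toy_verity_info *vi,
			    const struct toy_page *page,
			    unsigned long hblock_idx)
{
	if (vi->hash_block_verified)
		return vi->hash_block_verified[hblock_idx / TOY_BITS_PER_LONG] &
			(1UL << (hblock_idx % TOY_BITS_PER_LONG));
	return page->checked;
}
```

Both the "set" and the "get" side pick between the bitmap and the page
flag in the same way, so no extra per-block "verified" field is needed
to carry the state between them.
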

- Eric


* Re: [PATCH v4 09/25] fsverity: add tracepoints
  2024-02-23 13:23     ` Andrey Albershteyn
@ 2024-02-23 18:27       ` Eric Biggers
  2024-02-26  2:24         ` Dave Chinner
  0 siblings, 1 reply; 44+ messages in thread
From: Eric Biggers @ 2024-02-23 18:27 UTC (permalink / raw)
  To: Andrey Albershteyn
  Cc: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong

On Fri, Feb 23, 2024 at 02:23:52PM +0100, Andrey Albershteyn wrote:
> On 2024-02-22 21:31:56, Eric Biggers wrote:
> > On Mon, Feb 12, 2024 at 05:58:06PM +0100, Andrey Albershteyn wrote:
> > > fs-verity previously had debug printk but it was removed. This patch
> > > adds trace points to the same places where printk were used (with a
> > > few additional ones).
> > 
> > Are all of these actually useful?  There's a maintenance cost to adding all of
> > these.
> > 
> 
> Well, they were useful for me while testing/working on this
> patchset. Especially combining -e xfs -e fsverity was quite good for
> checking correctness and debugging with xfstests tests. They're
> probably could be handy if something breaks.
> 
> Or you mean if each of them is useful? The ones which I added to
> signature verification probably aren't as useful as other; my
> intention adding them was to also cover these code paths.

Well, I'll have to maintain all of these, including reviewing them, keeping them
working as code gets refactored, and fixing any bugs that exist or may get
introduced later in them.  They also increase the icache footprint of the code.
I'd like to make sure that it will be worthwhile.  The pr_debug messages that I
had put in fs/verity/ originally were slightly useful when writing fs/verity/
originally, but after that I never really used them.  Instead I found they
actually made patching fs/verity/ a bit harder, since I had to make sure to keep
all the pr_debug statements updated as code changed around them.

Maybe I am an outlier and other people really do like having these tracepoints
around.  But I'd like to see a bit more feedback along those lines first.  If we
could keep them to a more minimal set, that would also be helpful.

- Eric


* Re: [PATCH v4 07/25] fsverity: support block-based Merkle tree caching
  2024-02-23 18:07       ` Eric Biggers
@ 2024-02-24 14:10         ` Andrey Albershteyn
  0 siblings, 0 replies; 44+ messages in thread
From: Andrey Albershteyn @ 2024-02-24 14:10 UTC (permalink / raw)
  To: Eric Biggers; +Cc: fsverity, linux-xfs, linux-fsdevel, chandan.babu, djwong

On 2024-02-23 10:07:32, Eric Biggers wrote:
> On Fri, Feb 23, 2024 at 05:02:45PM +0100, Andrey Albershteyn wrote:
> > > > +void fsverity_drop_block(struct inode *inode,
> > > > +		struct fsverity_blockbuf *block)
> > > > +{
> > > > +	if (inode->i_sb->s_vop->drop_block)
> > > > +		inode->i_sb->s_vop->drop_block(block);
> > > > +	else {
> > > > +		struct page *page = (struct page *)block->context;
> > > > +
> > > > +		/* Merkle tree block size == PAGE_SIZE; */
> > > > +		if (block->verified)
> > > > +			SetPageChecked(page);
> > > > +
> > > > +		kunmap_local(block->kaddr);
> > > > +		put_page(page);
> > > > +	}
> > > > +}
> > > 
> > > I don't think this is the logical place for the call to SetPageChecked().
> > > verity_data_block() currently does:
> > > 
> > >         if (vi->hash_block_verified)
> > >                 set_bit(hblock_idx, vi->hash_block_verified);
> > >         else
> > >                 SetPageChecked(page);
> > > 
> > > You're proposing moving the SetPageChecked() to fsverity_drop_block().  Why?  We
> > > should try to do things in a consistent place.
> > > 
> > > Similarly, I don't see why is_hash_block_verified() shouldn't keep the
> > > PageChecked().
> > > 
> > > If we just keep PG_checked be get and set in the same places it currently is,
> > > then adding fsverity_blockbuf::verified wouldn't be necessary.
> > > 
> > > Maybe you intended to move the awareness of PG_checked out of fs/verity/ and
> > > into the filesystems?
> > 
> > yes
> > 
> > > Your change in how PG_checked is get and set is sort of a
> > > step towards that, but it doesn't complete it.  It doesn't make sense to leave
> > > in this half-finished state.
> > 
> > What do you think is missing? I didn't want to make too many changes
> > to fs which already use fs-verity and completely change the
> > interface, just to shift page handling stuff to middle layer
> > functions. So yeah kinda "step towards" only :)
> 
> In your patchset, PG_checked is get and set by fsverity_drop_block() and
> fsverity_read_merkle_tree_block(), which are located in fs/verity/ and called by
> other code in fs/verity/.  I don't see this as being a separate layer from the
> rest of fs/verity/.  If it was done by the individual filesystems (e.g.
> fs/ext4/) that would be different, but it's not.  I think keeping fs/verity/
> aware of PG_checked is the right call, and it's not necessary to do the half-way
> move that sort of moves it to a different place in the stack but not really.
> 
> - Eric
> 

I see, thanks! I will move it back.

-- 
- Andrey



* Re: [PATCH v4 09/25] fsverity: add tracepoints
  2024-02-23 18:27       ` Eric Biggers
@ 2024-02-26  2:24         ` Dave Chinner
  0 siblings, 0 replies; 44+ messages in thread
From: Dave Chinner @ 2024-02-26  2:24 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Andrey Albershteyn, fsverity, linux-xfs, linux-fsdevel,
	chandan.babu, djwong

On Fri, Feb 23, 2024 at 10:27:35AM -0800, Eric Biggers wrote:
> On Fri, Feb 23, 2024 at 02:23:52PM +0100, Andrey Albershteyn wrote:
> > On 2024-02-22 21:31:56, Eric Biggers wrote:
> > > On Mon, Feb 12, 2024 at 05:58:06PM +0100, Andrey Albershteyn wrote:
> > > > fs-verity previously had debug printk but it was removed. This patch
> > > > adds trace points to the same places where printk were used (with a
> > > > few additional ones).
> > > 
> > > Are all of these actually useful?  There's a maintenance cost to adding all of
> > > these.
> > > 
> > 
> > Well, they were useful for me while testing/working on this
> > patchset. Especially combining -e xfs -e fsverity was quite good for
> > checking correctness and debugging with xfstests tests. They're
> > probably could be handy if something breaks.
> > 
> > Or you mean if each of them is useful? The ones which I added to
> > signature verification probably aren't as useful as other; my
> > intention adding them was to also cover these code paths.
> 
> Well, I'll have to maintain all of these, including reviewing them, keeping them
> working as code gets refactored, and fixing any bugs that exist or may get
> introduced later in them.  They also increase the icache footprint of the code.
> I'd like to make sure that it will be worthwhile.  The pr_debug messages that I
> had put in fs/verity/ originally were slightly useful when writing fs/verity/
> originally, but after that I never really used them.  Instead I found they
> actually made patching fs/verity/ a bit harder, since I had to make sure to keep
> all the pr_debug statements updated as code changed around them.

pr_debug is largely useless outside of code development purposes.

The value in tracepoints is that they are available for diagnosing
problems on production systems and should be thought of as such.
Yes, you can also use them to debug development code, but in that
environment they are no substitute for custom trace_printk() debug
output.

However, when you have extensive tracepoint coverage, the amount of
custom trace_printk() output you need to add to a kernel to debug an
issue ends up being limited, because most of the key state and
object changes in the code are already covered by tracepoints.

> Maybe I am an outlier and other people really do like having these tracepoints
> around.  But I'd like to see a bit more feedback along those lines first.  If we
> could keep them to a more minimal set, that would also be helpful.

For people who are used to subsystems with extensive tracepoint
coverage (like XFS), the lack of tracepoints in all the surrounding
code is jarring. It makes the rest of the system feel like a black
hole where detailed runtime introspection is almost completely
impossible without a *lot* of work.

Extensive tracepoints help everyone in the production support
and diagnosis chain understand what is going on by providing
easy-to-access runtime introspection for the code, i.e. they provide
benefit to far more people than just the one kernel developer who
enables pr_debug on the subsystem when developing new code...

-Dave.
-- 
Dave Chinner
david@fromorbit.com


Thread overview: 44+ messages
2024-02-12 16:57 [PATCH v4 00/25] fs-verity support for XFS Andrey Albershteyn
2024-02-12 16:57 ` [PATCH v4 01/25] fsverity: remove hash page spin lock Andrey Albershteyn
2024-02-12 16:57 ` [PATCH v4 02/25] xfs: add parent pointer support to attribute code Andrey Albershteyn
2024-02-12 16:58 ` [PATCH v4 03/25] xfs: define parent pointer ondisk extended attribute format Andrey Albershteyn
2024-02-12 16:58 ` [PATCH v4 04/25] xfs: add parent pointer validator functions Andrey Albershteyn
2024-02-12 16:58 ` [PATCH v4 05/25] fs: add FS_XFLAG_VERITY for verity files Andrey Albershteyn
2024-02-23  4:23   ` Eric Biggers
2024-02-23 12:55     ` Andrey Albershteyn
2024-02-23 17:59       ` Eric Biggers
2024-02-12 16:58 ` [PATCH v4 06/25] fsverity: pass log_blocksize to end_enable_verity() Andrey Albershteyn
2024-02-15 21:45   ` Dave Chinner
2024-02-16 16:18     ` Andrey Albershteyn
2024-02-23  4:26   ` Eric Biggers
2024-02-23 13:02     ` Andrey Albershteyn
2024-02-12 16:58 ` [PATCH v4 07/25] fsverity: support block-based Merkle tree caching Andrey Albershteyn
2024-02-23  5:24   ` Eric Biggers
2024-02-23 16:02     ` Andrey Albershteyn
2024-02-23 18:07       ` Eric Biggers
2024-02-24 14:10         ` Andrey Albershteyn
2024-02-12 16:58 ` [PATCH v4 08/25] fsverity: calculate readahead in bytes instead of pages Andrey Albershteyn
2024-02-23  5:29   ` Eric Biggers
2024-02-12 16:58 ` [PATCH v4 09/25] fsverity: add tracepoints Andrey Albershteyn
2024-02-23  5:31   ` Eric Biggers
2024-02-23 13:23     ` Andrey Albershteyn
2024-02-23 18:27       ` Eric Biggers
2024-02-26  2:24         ` Dave Chinner
2024-02-12 16:58 ` [PATCH v4 10/25] iomap: integrate fsverity verification into iomap's read path Andrey Albershteyn
2024-02-12 16:58 ` [PATCH v4 11/25] xfs: add XBF_VERITY_SEEN xfs_buf flag Andrey Albershteyn
2024-02-12 16:58 ` [PATCH v4 12/25] xfs: add XFS_DA_OP_BUFFER to make xfs_attr_get() return buffer Andrey Albershteyn
2024-02-12 16:58 ` [PATCH v4 13/25] xfs: introduce workqueue for post read IO work Andrey Albershteyn
2024-02-15 22:11   ` Dave Chinner
2024-02-16 16:29     ` Andrey Albershteyn
2024-02-12 16:58 ` [PATCH v4 14/25] xfs: add attribute type for fs-verity Andrey Albershteyn
2024-02-12 16:58 ` [PATCH v4 15/25] xfs: make xfs_buf_get() to take XBF_* flags Andrey Albershteyn
2024-02-12 16:58 ` [PATCH v4 16/25] xfs: add XBF_DOUBLE_ALLOC to increase size of the buffer Andrey Albershteyn
2024-02-12 16:58 ` [PATCH v4 17/25] xfs: add fs-verity ro-compat flag Andrey Albershteyn
2024-02-12 16:58 ` [PATCH v4 18/25] xfs: add inode on-disk VERITY flag Andrey Albershteyn
2024-02-12 16:58 ` [PATCH v4 19/25] xfs: initialize fs-verity on file open and cleanup on inode destruction Andrey Albershteyn
2024-02-12 16:58 ` [PATCH v4 20/25] xfs: don't allow to enable DAX on fs-verity sealsed inode Andrey Albershteyn
2024-02-12 16:58 ` [PATCH v4 21/25] xfs: disable direct read path for fs-verity files Andrey Albershteyn
2024-02-12 16:58 ` [PATCH v4 22/25] xfs: add fs-verity support Andrey Albershteyn
2024-02-12 16:58 ` [PATCH v4 23/25] xfs: make scrub aware of verity dinode flag Andrey Albershteyn
2024-02-12 16:58 ` [PATCH v4 24/25] xfs: add fs-verity ioctls Andrey Albershteyn
2024-02-12 16:58 ` [PATCH v4 25/25] xfs: enable ro-compat fs-verity flag Andrey Albershteyn
