* [RFC PATCH 00/11] fs-verity support for XFS
@ 2022-12-13 17:29 Andrey Albershteyn
  2022-12-13 17:29 ` [RFC PATCH 01/11] xfs: enable large folios in xfs_setup_inode() Andrey Albershteyn
                   ` (11 more replies)
  0 siblings, 12 replies; 51+ messages in thread
From: Andrey Albershteyn @ 2022-12-13 17:29 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel; +Cc: Andrey Albershteyn

Hi all,

This patchset introduces fs-verity [5] support for XFS. Unlike
ext4/f2fs, which store fs-verity metadata past EOF, this
implementation stores it in extended attributes. The Merkle tree
pages are stored as remote extended attribute values.
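For illustration, here is a user-space sketch (not code from the patchset; the helper name is hypothetical) of how a binary xattr name for a Merkle tree page can be derived from its index, encoded as a big-endian 64-bit integer:

```c
#include <stdint.h>

/*
 * Sketch only: the Merkle tree page at a given index is stored in a
 * remote xattr whose "name" is the page index in big-endian byte
 * order - hence the binary xattr names mentioned above.
 */
static void merkle_page_xattr_name(uint64_t index, unsigned char name[8])
{
	/* emit the most significant byte first */
	for (int i = 0; i < 8; i++)
		name[i] = (unsigned char)(index >> (56 - 8 * i));
}
```

Big-endian encoding keeps the names in index order when compared as byte strings.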

A few starting points:
- The xattr name of each Merkle tree page is binary
- fs-verity doesn't work with multi-page folios yet. Thus, they are
  disabled when fs-verity is enabled on an inode.
- The direct I/O path and DAX are disabled for inodes with fs-verity
- Pages are verified in iomap's read I/O path (offloaded to a
  workqueue)
- A new ro-compat flag is added, as inodes with fs-verity have a new
  on-disk diflag

Not yet implemented:
- No pre-fetching of Merkle tree pages in
  read_merkle_tree_page()
- No marking of already-verified Merkle tree pages (on each read,
  the whole tree is verified).

Preliminary testing:
- fstests 1k, 4k
- More in-depth testing is on the way :)

This patchset depends on Allison's Parent Pointer patchset [1],
which introduces binary names for extended attributes. Particularly,
patch "[PATCH v6 13/27] xfs: Add xfs_verify_pptr" [3] is needed.

The first patch moves the setting of the large folio support flag to
a more appropriate location - xfs_setup_inode(), where other flags
are set. The second one adds a wrapper which is used when an already
existing inode is sealed with fs-verity. The rest add fs-verity
support.

Allison's Parent Pointer patchset v6:
[1]: https://lore.kernel.org/linux-xfs/20221129211242.2689855-1-allison.henderson@oracle.com/

Allison's Parent Pointer branch:
[2]: https://github.com/allisonhenderson/xfs/tree/xfs_new_pptrsv6

Patch which adds handling of xattr binary names:
[3]: https://lore.kernel.org/linux-xfs/20221129211242.2689855-14-allison.henderson@oracle.com/

This patchset branch:
[4]: https://github.com/alberand/linux/tree/xfs-verity

fs-verity docs:
[5]: https://www.kernel.org/doc/html/latest/filesystems/fsverity.html

I'm looking forward to your comments.

Thanks!
Andrey

Andrey Albershteyn (11):
  xfs: enable large folios in xfs_setup_inode()
  pagemap: add mapping_clear_large_folios() wrapper
  xfs: add attribute type for fs-verity
  xfs: add fs-verity ro-compat flag
  xfs: add inode on-disk VERITY flag
  xfs: initialize fs-verity on file open and cleanup on inode
    destruction
  xfs: disable direct read path for fs-verity sealed files
  xfs: don't enable large folios on fs-verity sealed inode
  iomap: fs-verity verification on page read
  xfs: add fs-verity support
  xfs: add fs-verity ioctls

 fs/iomap/buffered-io.c         |  80 ++++++++++++-
 fs/xfs/Makefile                |   1 +
 fs/xfs/libxfs/xfs_attr.c       |   8 ++
 fs/xfs/libxfs/xfs_da_format.h  |   5 +-
 fs/xfs/libxfs/xfs_format.h     |  14 ++-
 fs/xfs/libxfs/xfs_log_format.h |   1 +
 fs/xfs/libxfs/xfs_sb.c         |   2 +
 fs/xfs/xfs_file.c              |  22 +++-
 fs/xfs/xfs_icache.c            |   2 -
 fs/xfs/xfs_inode.c             |   2 +
 fs/xfs/xfs_inode.h             |   1 +
 fs/xfs/xfs_ioctl.c             |  11 ++
 fs/xfs/xfs_iops.c              |   9 ++
 fs/xfs/xfs_mount.h             |   2 +
 fs/xfs/xfs_super.c             |  12 ++
 fs/xfs/xfs_trace.h             |   1 +
 fs/xfs/xfs_verity.c            | 203 +++++++++++++++++++++++++++++++++
 fs/xfs/xfs_verity.h            |  19 +++
 fs/xfs/xfs_xattr.c             |   3 +
 include/linux/iomap.h          |   5 +
 include/linux/pagemap.h        |   5 +
 21 files changed, 393 insertions(+), 15 deletions(-)
 create mode 100644 fs/xfs/xfs_verity.c
 create mode 100644 fs/xfs/xfs_verity.h

-- 
2.31.1


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [RFC PATCH 01/11] xfs: enable large folios in xfs_setup_inode()
  2022-12-13 17:29 [RFC PATCH 00/11] fs-verity support for XFS Andrey Albershteyn
@ 2022-12-13 17:29 ` Andrey Albershteyn
  2022-12-14  0:53   ` Dave Chinner
  2022-12-13 17:29 ` [RFC PATCH 02/11] pagemap: add mapping_clear_large_folios() wrapper Andrey Albershteyn
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 51+ messages in thread
From: Andrey Albershteyn @ 2022-12-13 17:29 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel; +Cc: Andrey Albershteyn

This is a more appropriate place to set the large folios flag, as
other mapping flags are set here. This will also allow conditionally
enabling large folios based on the inode's diflags (e.g. fs-verity).

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/xfs_icache.c | 2 --
 fs/xfs/xfs_iops.c   | 2 ++
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index f35e2cee52655..8679739160507 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -88,7 +88,6 @@ xfs_inode_alloc(
 	/* VFS doesn't initialise i_mode or i_state! */
 	VFS_I(ip)->i_mode = 0;
 	VFS_I(ip)->i_state = 0;
-	mapping_set_large_folios(VFS_I(ip)->i_mapping);
 
 	XFS_STATS_INC(mp, vn_active);
 	ASSERT(atomic_read(&ip->i_pincount) == 0);
@@ -323,7 +322,6 @@ xfs_reinit_inode(
 	inode->i_rdev = dev;
 	inode->i_uid = uid;
 	inode->i_gid = gid;
-	mapping_set_large_folios(inode->i_mapping);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 10a5e85f2a709..9c90cfcecabc2 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1292,6 +1292,8 @@ xfs_setup_inode(
 	gfp_mask = mapping_gfp_mask(inode->i_mapping);
 	mapping_set_gfp_mask(inode->i_mapping, (gfp_mask & ~(__GFP_FS)));
 
+	mapping_set_large_folios(inode->i_mapping);
+
 	/*
 	 * If there is no attribute fork no ACL can exist on this inode,
 	 * and it can't have any file capabilities attached to it either.
-- 
2.31.1



* [RFC PATCH 02/11] pagemap: add mapping_clear_large_folios() wrapper
  2022-12-13 17:29 [RFC PATCH 00/11] fs-verity support for XFS Andrey Albershteyn
  2022-12-13 17:29 ` [RFC PATCH 01/11] xfs: enable large folios in xfs_setup_inode() Andrey Albershteyn
@ 2022-12-13 17:29 ` Andrey Albershteyn
  2022-12-13 17:55   ` Matthew Wilcox
  2022-12-13 17:29 ` [RFC PATCH 03/11] xfs: add attribute type for fs-verity Andrey Albershteyn
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 51+ messages in thread
From: Andrey Albershteyn @ 2022-12-13 17:29 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel; +Cc: Andrey Albershteyn

Add a wrapper to clear a mapping's large folio flag. This is handy
for disabling large folios on already existing inodes (e.g. the
future XFS integration of fs-verity).

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 include/linux/pagemap.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index bbccb40442224..63ca600bdf8f7 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -306,6 +306,11 @@ static inline void mapping_set_large_folios(struct address_space *mapping)
 	__set_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
 }
 
+static inline void mapping_clear_large_folios(struct address_space *mapping)
+{
+	__clear_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
+}
+
 /*
  * Large folio support currently depends on THP.  These dependencies are
  * being worked on but are not yet fixed.
-- 
2.31.1



* [RFC PATCH 03/11] xfs: add attribute type for fs-verity
  2022-12-13 17:29 [RFC PATCH 00/11] fs-verity support for XFS Andrey Albershteyn
  2022-12-13 17:29 ` [RFC PATCH 01/11] xfs: enable large folios in xfs_setup_inode() Andrey Albershteyn
  2022-12-13 17:29 ` [RFC PATCH 02/11] pagemap: add mapping_clear_large_folios() wrapper Andrey Albershteyn
@ 2022-12-13 17:29 ` Andrey Albershteyn
  2022-12-13 17:43   ` Eric Sandeen
  2022-12-13 17:29 ` [RFC PATCH 04/11] xfs: add fs-verity ro-compat flag Andrey Albershteyn
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 51+ messages in thread
From: Andrey Albershteyn @ 2022-12-13 17:29 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel; +Cc: Andrey Albershteyn

The Merkle tree pages and descriptor are stored in the extended
attributes of the inode. Add a new attribute type for fs-verity
metadata. Skip fs-verity attributes in getfattr, as it cannot parse
binary page names.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/libxfs/xfs_da_format.h  | 5 ++++-
 fs/xfs/libxfs/xfs_log_format.h | 1 +
 fs/xfs/xfs_trace.h             | 1 +
 fs/xfs/xfs_xattr.c             | 3 +++
 4 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 75b13807145d1..778bf2b476618 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -689,14 +689,17 @@ struct xfs_attr3_leafblock {
 #define	XFS_ATTR_ROOT_BIT	1	/* limit access to trusted attrs */
 #define	XFS_ATTR_SECURE_BIT	2	/* limit access to secure attrs */
 #define	XFS_ATTR_PARENT_BIT	3	/* parent pointer attrs */
+#define	XFS_ATTR_VERITY_BIT	4	/* verity merkle tree and descriptor */
 #define	XFS_ATTR_INCOMPLETE_BIT	7	/* attr in middle of create/delete */
 #define XFS_ATTR_LOCAL		(1u << XFS_ATTR_LOCAL_BIT)
 #define XFS_ATTR_ROOT		(1u << XFS_ATTR_ROOT_BIT)
 #define XFS_ATTR_SECURE		(1u << XFS_ATTR_SECURE_BIT)
 #define XFS_ATTR_PARENT		(1u << XFS_ATTR_PARENT_BIT)
+#define XFS_ATTR_VERITY		(1u << XFS_ATTR_VERITY_BIT)
 #define XFS_ATTR_INCOMPLETE	(1u << XFS_ATTR_INCOMPLETE_BIT)
 #define XFS_ATTR_NSP_ONDISK_MASK \
-			(XFS_ATTR_ROOT | XFS_ATTR_SECURE | XFS_ATTR_PARENT)
+			(XFS_ATTR_ROOT | XFS_ATTR_SECURE | XFS_ATTR_PARENT | \
+			 XFS_ATTR_VERITY)
 
 /*
  * Alignment for namelist and valuelist entries (since they are mixed
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index 727b5a8580285..678eacb7925c9 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -968,6 +968,7 @@ struct xfs_icreate_log {
 #define XFS_ATTRI_FILTER_MASK		(XFS_ATTR_ROOT | \
 					 XFS_ATTR_SECURE | \
 					 XFS_ATTR_PARENT | \
+					 XFS_ATTR_VERITY | \
 					 XFS_ATTR_INCOMPLETE)
 
 /*
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 372d871bccc5e..5eceb259cc5f7 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -78,6 +78,7 @@ struct xfs_icwalk;
 #define XFS_ATTR_FILTER_FLAGS \
 	{ XFS_ATTR_ROOT,	"ROOT" }, \
 	{ XFS_ATTR_SECURE,	"SECURE" }, \
+	{ XFS_ATTR_VERITY,	"VERITY" }, \
 	{ XFS_ATTR_INCOMPLETE,	"INCOMPLETE" }
 
 DECLARE_EVENT_CLASS(xfs_attr_list_class,
diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index 5b57f6348d630..acbfa29d04af0 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -237,6 +237,9 @@ xfs_xattr_put_listent(
 	if (flags & XFS_ATTR_PARENT)
 		return;
 
+	if (flags & XFS_ATTR_VERITY)
+		return;
+
 	if (flags & XFS_ATTR_ROOT) {
 #ifdef CONFIG_XFS_POSIX_ACL
 		if (namelen == SGI_ACL_FILE_SIZE &&
-- 
2.31.1



* [RFC PATCH 04/11] xfs: add fs-verity ro-compat flag
  2022-12-13 17:29 [RFC PATCH 00/11] fs-verity support for XFS Andrey Albershteyn
                   ` (2 preceding siblings ...)
  2022-12-13 17:29 ` [RFC PATCH 03/11] xfs: add attribute type for fs-verity Andrey Albershteyn
@ 2022-12-13 17:29 ` Andrey Albershteyn
  2022-12-14  1:06   ` Dave Chinner
  2022-12-13 17:29 ` [RFC PATCH 05/11] xfs: add inode on-disk VERITY flag Andrey Albershteyn
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 51+ messages in thread
From: Andrey Albershteyn @ 2022-12-13 17:29 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel; +Cc: Andrey Albershteyn

To mark inodes sealed with fs-verity, a new XFS_DIFLAG2_VERITY flag
will be added in a further patch. This requires an ro-compat flag to
let older kernels know that a filesystem with fs-verity cannot be
modified.
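To illustrate the ro-compat semantics (a user-space sketch, not kernel code; macro names are shortened for the example): an older kernel that doesn't know the VERITY bit refuses a read-write mount when it sees any unknown ro-compat bit, but can still mount read-only.

```c
#include <stdint.h>
#include <stdbool.h>

/* Known ro-compat bits on an older kernel (VERITY bit absent). */
#define RO_COMPAT_FINOBT   (1u << 0)
#define RO_COMPAT_RMAPBT   (1u << 1)
#define RO_COMPAT_REFLINK  (1u << 2)
#define RO_COMPAT_INOBTCNT (1u << 3)
#define RO_COMPAT_ALL_OLD \
	(RO_COMPAT_FINOBT | RO_COMPAT_RMAPBT | \
	 RO_COMPAT_REFLINK | RO_COMPAT_INOBTCNT)

/* A read-write mount is allowed only if no unknown bits are set. */
static bool can_mount_rw_old_kernel(uint32_t sb_ro_compat)
{
	return (sb_ro_compat & ~RO_COMPAT_ALL_OLD) == 0;
}
```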

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/libxfs/xfs_format.h | 10 ++++++----
 fs/xfs/libxfs/xfs_sb.c     |  2 ++
 fs/xfs/xfs_mount.h         |  2 ++
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index f413819b2a8aa..2b76e646e6f14 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -353,11 +353,13 @@ xfs_sb_has_compat_feature(
 #define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
 #define XFS_SB_FEAT_RO_COMPAT_REFLINK  (1 << 2)		/* reflinked files */
 #define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3)		/* inobt block counts */
+#define XFS_SB_FEAT_RO_COMPAT_VERITY   (1 << 4)		/* fs-verity */
 #define XFS_SB_FEAT_RO_COMPAT_ALL \
-		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
-		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
-		 XFS_SB_FEAT_RO_COMPAT_REFLINK| \
-		 XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
+		(XFS_SB_FEAT_RO_COMPAT_FINOBT  | \
+		 XFS_SB_FEAT_RO_COMPAT_RMAPBT  | \
+		 XFS_SB_FEAT_RO_COMPAT_REFLINK | \
+		 XFS_SB_FEAT_RO_COMPAT_INOBTCNT| \
+		 XFS_SB_FEAT_RO_COMPAT_VERITY)
 #define XFS_SB_FEAT_RO_COMPAT_UNKNOWN	~XFS_SB_FEAT_RO_COMPAT_ALL
 static inline bool
 xfs_sb_has_ro_compat_feature(
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index a59bf09495b1d..5c975879f5664 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -161,6 +161,8 @@ xfs_sb_version_to_features(
 		features |= XFS_FEAT_REFLINK;
 	if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
 		features |= XFS_FEAT_INOBTCNT;
+	if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_VERITY)
+		features |= XFS_FEAT_VERITY;
 	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_FTYPE)
 		features |= XFS_FEAT_FTYPE;
 	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_SPINODES)
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 8aca2cc173ac1..3da28271011d1 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -279,6 +279,7 @@ typedef struct xfs_mount {
 #define XFS_FEAT_BIGTIME	(1ULL << 24)	/* large timestamps */
 #define XFS_FEAT_NEEDSREPAIR	(1ULL << 25)	/* needs xfs_repair */
 #define XFS_FEAT_NREXT64	(1ULL << 26)	/* large extent counters */
+#define XFS_FEAT_VERITY		(1ULL << 27)	/* fs-verity */
 
 /* Mount features */
 #define XFS_FEAT_NOATTR2	(1ULL << 48)	/* disable attr2 creation */
@@ -342,6 +343,7 @@ __XFS_HAS_FEAT(inobtcounts, INOBTCNT)
 __XFS_HAS_FEAT(bigtime, BIGTIME)
 __XFS_HAS_FEAT(needsrepair, NEEDSREPAIR)
 __XFS_HAS_FEAT(large_extent_counts, NREXT64)
+__XFS_HAS_FEAT(verity, VERITY)
 
 /*
  * Mount features
-- 
2.31.1



* [RFC PATCH 05/11] xfs: add inode on-disk VERITY flag
  2022-12-13 17:29 [RFC PATCH 00/11] fs-verity support for XFS Andrey Albershteyn
                   ` (3 preceding siblings ...)
  2022-12-13 17:29 ` [RFC PATCH 04/11] xfs: add fs-verity ro-compat flag Andrey Albershteyn
@ 2022-12-13 17:29 ` Andrey Albershteyn
  2022-12-14  1:29   ` Dave Chinner
  2022-12-13 17:29 ` [RFC PATCH 06/11] xfs: initialize fs-verity on file open and cleanup on inode destruction Andrey Albershteyn
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 51+ messages in thread
From: Andrey Albershteyn @ 2022-12-13 17:29 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel; +Cc: Andrey Albershteyn

Add a flag to mark inodes which have fs-verity enabled (i.e. the
descriptor exists and the tree is built).

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/libxfs/xfs_format.h | 4 +++-
 fs/xfs/xfs_inode.c         | 2 ++
 fs/xfs/xfs_iops.c          | 2 ++
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 2b76e646e6f14..6950a4ef19967 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1073,16 +1073,18 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
 #define XFS_DIFLAG2_COWEXTSIZE_BIT   2  /* copy on write extent size hint */
 #define XFS_DIFLAG2_BIGTIME_BIT	3	/* big timestamps */
 #define XFS_DIFLAG2_NREXT64_BIT 4	/* large extent counters */
+#define XFS_DIFLAG2_VERITY_BIT	5	/* inode sealed by fsverity */
 
 #define XFS_DIFLAG2_DAX		(1 << XFS_DIFLAG2_DAX_BIT)
 #define XFS_DIFLAG2_REFLINK     (1 << XFS_DIFLAG2_REFLINK_BIT)
 #define XFS_DIFLAG2_COWEXTSIZE  (1 << XFS_DIFLAG2_COWEXTSIZE_BIT)
 #define XFS_DIFLAG2_BIGTIME	(1 << XFS_DIFLAG2_BIGTIME_BIT)
 #define XFS_DIFLAG2_NREXT64	(1 << XFS_DIFLAG2_NREXT64_BIT)
+#define XFS_DIFLAG2_VERITY	(1 << XFS_DIFLAG2_VERITY_BIT)
 
 #define XFS_DIFLAG2_ANY \
 	(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE | \
-	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64)
+	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | XFS_DIFLAG2_VERITY)
 
 static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
 {
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index f08a2d5f96ad4..8d9c9697d3619 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -636,6 +636,8 @@ xfs_ip2xflags(
 			flags |= FS_XFLAG_DAX;
 		if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
 			flags |= FS_XFLAG_COWEXTSIZE;
+		if (ip->i_diflags2 & XFS_DIFLAG2_VERITY)
+			flags |= FS_VERITY_FL;
 	}
 
 	if (xfs_inode_has_attr_fork(ip))
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 9c90cfcecabc2..b229d25c1c3d6 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1236,6 +1236,8 @@ xfs_diflags_to_iflags(
 		flags |= S_NOATIME;
 	if (init && xfs_inode_should_enable_dax(ip))
 		flags |= S_DAX;
+	if (xflags & FS_VERITY_FL)
+		flags |= S_VERITY;
 
 	/*
 	 * S_DAX can only be set during inode initialization and is never set by
-- 
2.31.1



* [RFC PATCH 06/11] xfs: initialize fs-verity on file open and cleanup on inode destruction
  2022-12-13 17:29 [RFC PATCH 00/11] fs-verity support for XFS Andrey Albershteyn
                   ` (4 preceding siblings ...)
  2022-12-13 17:29 ` [RFC PATCH 05/11] xfs: add inode on-disk VERITY flag Andrey Albershteyn
@ 2022-12-13 17:29 ` Andrey Albershteyn
  2022-12-14  1:35   ` Dave Chinner
  2022-12-13 17:29 ` [RFC PATCH 07/11] xfs: disable direct read path for fs-verity sealed files Andrey Albershteyn
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 51+ messages in thread
From: Andrey Albershteyn @ 2022-12-13 17:29 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel; +Cc: Andrey Albershteyn

fs-verity will read and attach metadata (not the tree itself) from
disk for those inodes which already have fs-verity enabled.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/xfs_file.c  | 8 ++++++++
 fs/xfs/xfs_super.c | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 242165580e682..5eadd9a37c50e 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -32,6 +32,7 @@
 #include <linux/mman.h>
 #include <linux/fadvise.h>
 #include <linux/mount.h>
+#include <linux/fsverity.h>
 
 static const struct vm_operations_struct xfs_file_vm_ops;
 
@@ -1170,9 +1171,16 @@ xfs_file_open(
 	struct inode	*inode,
 	struct file	*file)
 {
+	int		error = 0;
+
 	if (xfs_is_shutdown(XFS_M(inode->i_sb)))
 		return -EIO;
 	file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC;
+
+	error = fsverity_file_open(inode, file);
+	if (error)
+		return error;
+
 	return generic_file_open(inode, file);
 }
 
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 8f1e9b9ed35d9..50c2c819ba940 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -45,6 +45,7 @@
 #include <linux/magic.h>
 #include <linux/fs_context.h>
 #include <linux/fs_parser.h>
+#include <linux/fsverity.h>
 
 static const struct super_operations xfs_super_operations;
 
@@ -647,6 +648,7 @@ xfs_fs_destroy_inode(
 	ASSERT(!rwsem_is_locked(&inode->i_rwsem));
 	XFS_STATS_INC(ip->i_mount, vn_rele);
 	XFS_STATS_INC(ip->i_mount, vn_remove);
+	fsverity_cleanup_inode(inode);
 	xfs_inode_mark_reclaimable(ip);
 }
 
-- 
2.31.1



* [RFC PATCH 07/11] xfs: disable direct read path for fs-verity sealed files
  2022-12-13 17:29 [RFC PATCH 00/11] fs-verity support for XFS Andrey Albershteyn
                   ` (5 preceding siblings ...)
  2022-12-13 17:29 ` [RFC PATCH 06/11] xfs: initialize fs-verity on file open and cleanup on inode destruction Andrey Albershteyn
@ 2022-12-13 17:29 ` Andrey Albershteyn
  2022-12-14  2:07   ` Dave Chinner
  2022-12-23 16:18   ` Christoph Hellwig
  2022-12-13 17:29 ` [RFC PATCH 08/11] xfs: don't enable large folios on fs-verity sealed inode Andrey Albershteyn
                   ` (4 subsequent siblings)
  11 siblings, 2 replies; 51+ messages in thread
From: Andrey Albershteyn @ 2022-12-13 17:29 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel; +Cc: Andrey Albershteyn

The direct I/O path is not supported on verity files. Attempts to use
the direct I/O path on such files should fall back to the buffered
I/O path.
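The decision logic can be sketched in user space as follows (illustrative only; the function and flag value are not from the kernel, which makes this choice in xfs_file_read_iter()):

```c
#include <stdbool.h>

#define IOCB_DIRECT (1 << 2)	/* flag value is illustrative */

/*
 * Direct read is taken only when the inode is not verity-sealed;
 * otherwise IOCB_DIRECT is cleared so the buffered path doesn't
 * see a stale direct-I/O hint. Returns 1 for direct, 0 for buffered.
 */
static int choose_read_path(int *ki_flags, bool verity_active)
{
	if ((*ki_flags & IOCB_DIRECT) && !verity_active)
		return 1;		/* direct read */
	*ki_flags &= ~IOCB_DIRECT;	/* fall back to buffered */
	return 0;
}
```

Clearing the flag matters because later code paths key off IOCB_DIRECT and must behave as a plain buffered read.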

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/xfs_file.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 5eadd9a37c50e..fb4181e38a19d 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -245,7 +245,8 @@ xfs_file_dax_read(
 	struct kiocb		*iocb,
 	struct iov_iter		*to)
 {
-	struct xfs_inode	*ip = XFS_I(iocb->ki_filp->f_mapping->host);
+	struct inode		*inode = iocb->ki_filp->f_mapping->host;
+	struct xfs_inode	*ip = XFS_I(inode);
 	ssize_t			ret = 0;
 
 	trace_xfs_file_dax_read(iocb, to);
@@ -298,10 +299,17 @@ xfs_file_read_iter(
 
 	if (IS_DAX(inode))
 		ret = xfs_file_dax_read(iocb, to);
-	else if (iocb->ki_flags & IOCB_DIRECT)
+	else if (iocb->ki_flags & IOCB_DIRECT && !fsverity_active(inode))
 		ret = xfs_file_dio_read(iocb, to);
-	else
+	else {
+		/*
+		 * In case fs-verity is enabled, we also fallback to the
+		 * buffered read from the direct read path. Therefore,
+		 * IOCB_DIRECT is set and need to be cleared
+		 */
+		iocb->ki_flags &= ~IOCB_DIRECT;
 		ret = xfs_file_buffered_read(iocb, to);
+	}
 
 	if (ret > 0)
 		XFS_STATS_ADD(mp, xs_read_bytes, ret);
-- 
2.31.1



* [RFC PATCH 08/11] xfs: don't enable large folios on fs-verity sealed inode
  2022-12-13 17:29 [RFC PATCH 00/11] fs-verity support for XFS Andrey Albershteyn
                   ` (6 preceding siblings ...)
  2022-12-13 17:29 ` [RFC PATCH 07/11] xfs: disable direct read path for fs-verity sealed files Andrey Albershteyn
@ 2022-12-13 17:29 ` Andrey Albershteyn
  2022-12-14  2:07   ` Dave Chinner
  2022-12-13 17:29 ` [RFC PATCH 09/11] iomap: fs-verity verification on page read Andrey Albershteyn
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 51+ messages in thread
From: Andrey Albershteyn @ 2022-12-13 17:29 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel; +Cc: Andrey Albershteyn

fs-verity doesn't work with large folios. Don't enable large folios
on those inodes which are already sealed with fs-verity (indicated
by a diflag).

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/xfs_iops.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index b229d25c1c3d6..a4c8db588690e 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1294,7 +1294,12 @@ xfs_setup_inode(
 	gfp_mask = mapping_gfp_mask(inode->i_mapping);
 	mapping_set_gfp_mask(inode->i_mapping, (gfp_mask & ~(__GFP_FS)));
 
-	mapping_set_large_folios(inode->i_mapping);
+	/*
+	 * As fs-verity doesn't support folios so far, we won't enable them on
+	 * sealed inodes
+	 */
+	if (!IS_VERITY(inode))
+		mapping_set_large_folios(inode->i_mapping);
 
 	/*
 	 * If there is no attribute fork no ACL can exist on this inode,
-- 
2.31.1



* [RFC PATCH 09/11] iomap: fs-verity verification on page read
  2022-12-13 17:29 [RFC PATCH 00/11] fs-verity support for XFS Andrey Albershteyn
                   ` (7 preceding siblings ...)
  2022-12-13 17:29 ` [RFC PATCH 08/11] xfs: don't enable large folios on fs-verity sealed inode Andrey Albershteyn
@ 2022-12-13 17:29 ` Andrey Albershteyn
  2022-12-13 19:02   ` Eric Biggers
  2022-12-14  5:43   ` Dave Chinner
  2022-12-13 17:29 ` [RFC PATCH 10/11] xfs: add fs-verity support Andrey Albershteyn
                   ` (2 subsequent siblings)
  11 siblings, 2 replies; 51+ messages in thread
From: Andrey Albershteyn @ 2022-12-13 17:29 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel; +Cc: Andrey Albershteyn

Add fs-verity page verification in the read I/O path. The
verification itself is offloaded to a workqueue (provided by
fs-verity).

The work_struct items are allocated from the bioset, side by side
with the bio being processed.

As inodes with fs-verity don't use large folios, we check only the
first page of the folio for errors (set by fs-verity if verification
failed).
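The "side by side" allocation relies on the usual container_of pattern: the bioset's front pad places the work_struct just before the embedded bio, so the completion handler can recover the whole context from the bio pointer alone. A minimal user-space sketch of that recovery (stand-in types, not the kernel structures):

```c
#include <stddef.h>

/* Minimal user-space stand-ins for the kernel types involved. */
struct work_struct { int pending; };
struct bio { int status; };

/* Mirrors struct iomap_readend: work item beside the embedded bio. */
struct readend {
	struct work_struct read_work;
	struct bio read_inline_bio;	/* must be last */
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* From the embedded bio, recover the surrounding readend context. */
static struct readend *readend_from_bio(struct bio *bio)
{
	return container_of(bio, struct readend, read_inline_bio);
}
```

This is why the bio member "MUST BE LAST": bio_alloc_bioset() hands out the bio at a fixed offset inside the front-padded allocation, and the inline bvecs follow it.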

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/iomap/buffered-io.c | 80 +++++++++++++++++++++++++++++++++++++++---
 include/linux/iomap.h  |  5 +++
 2 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 91ee0b308e13d..b7abc2f806cfc 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -17,6 +17,7 @@
 #include <linux/bio.h>
 #include <linux/sched/signal.h>
 #include <linux/migrate.h>
+#include <linux/fsverity.h>
 #include "trace.h"
 
 #include "../internal.h"
@@ -42,6 +43,7 @@ static inline struct iomap_page *to_iomap_page(struct folio *folio)
 }
 
 static struct bio_set iomap_ioend_bioset;
+static struct bio_set iomap_readend_bioset;
 
 static struct iomap_page *
 iomap_page_create(struct inode *inode, struct folio *folio, unsigned int flags)
@@ -189,9 +191,39 @@ static void iomap_read_end_io(struct bio *bio)
 	int error = blk_status_to_errno(bio->bi_status);
 	struct folio_iter fi;
 
-	bio_for_each_folio_all(fi, bio)
+	bio_for_each_folio_all(fi, bio) {
+		/*
+		 * As fs-verity doesn't work with multi-page folios, verity
+		 * inodes have large folios disabled (only single page folios
+		 * are used)
+		 */
+		if (!error)
+			error = PageError(folio_page(fi.folio, 0));
+
 		iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
+	}
+
 	bio_put(bio);
+	/* The iomap_readend has been freed by bio_put() */
+}
+
+static void iomap_read_work_end_io(
+	struct work_struct *work)
+{
+	struct iomap_readend *ctx =
+		container_of(work, struct iomap_readend, read_work);
+	struct bio *bio = &ctx->read_inline_bio;
+
+	fsverity_verify_bio(bio);
+	iomap_read_end_io(bio);
+}
+
+static void iomap_read_work_io(struct bio *bio)
+{
+	struct iomap_readend *ctx =
+		container_of(bio, struct iomap_readend, read_inline_bio);
+
+	fsverity_enqueue_verify_work(&ctx->read_work);
 }
 
 struct iomap_readpage_ctx {
@@ -264,6 +296,7 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 	loff_t orig_pos = pos;
 	size_t poff, plen;
 	sector_t sector;
+	struct iomap_readend *readend;
 
 	if (iomap->type == IOMAP_INLINE)
 		return iomap_read_inline_data(iter, folio);
@@ -276,7 +309,21 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 
 	if (iomap_block_needs_zeroing(iter, pos)) {
 		folio_zero_range(folio, poff, plen);
-		iomap_set_range_uptodate(folio, iop, poff, plen);
+		if (!fsverity_active(iter->inode)) {
+			iomap_set_range_uptodate(folio, iop, poff, plen);
+			goto done;
+		}
+
+		/*
+		 * As fs-verity doesn't work with folios sealed inodes have
+		 * multi-page folios disabled and we can check on first and only
+		 * page
+		 */
+		if (fsverity_verify_page(folio_page(folio, 0)))
+			iomap_set_range_uptodate(folio, iop, poff, plen);
+		else
+			folio_set_error(folio);
+
 		goto done;
 	}
 
@@ -297,8 +344,18 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 
 		if (ctx->rac) /* same as readahead_gfp_mask */
 			gfp |= __GFP_NORETRY | __GFP_NOWARN;
-		ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
+		if (fsverity_active(iter->inode)) {
+			ctx->bio = bio_alloc_bioset(iomap->bdev,
+					bio_max_segs(nr_vecs), REQ_OP_READ,
+					GFP_NOFS, &iomap_readend_bioset);
+			readend = container_of(ctx->bio,
+					struct iomap_readend,
+					read_inline_bio);
+			INIT_WORK(&readend->read_work, iomap_read_work_end_io);
+		} else {
+			ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
 				     REQ_OP_READ, gfp);
+		}
 		/*
 		 * If the bio_alloc fails, try it again for a single page to
 		 * avoid having to deal with partial page reads.  This emulates
@@ -311,7 +368,11 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 		if (ctx->rac)
 			ctx->bio->bi_opf |= REQ_RAHEAD;
 		ctx->bio->bi_iter.bi_sector = sector;
-		ctx->bio->bi_end_io = iomap_read_end_io;
+		if (fsverity_active(iter->inode))
+			ctx->bio->bi_end_io = iomap_read_work_io;
+		else
+			ctx->bio->bi_end_io = iomap_read_end_io;
+
 		bio_add_folio(ctx->bio, folio, plen, poff);
 	}
 
@@ -1546,6 +1607,17 @@ EXPORT_SYMBOL_GPL(iomap_writepages);
 
 static int __init iomap_init(void)
 {
+#ifdef CONFIG_FS_VERITY
+	int error = 0;
+
+	error = bioset_init(&iomap_readend_bioset,
+			   4 * (PAGE_SIZE / SECTOR_SIZE),
+			   offsetof(struct iomap_readend, read_inline_bio),
+			   BIOSET_NEED_BVECS);
+	if (error)
+		return error;
+#endif
+
 	return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
 			   offsetof(struct iomap_ioend, io_inline_bio),
 			   BIOSET_NEED_BVECS);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 238a03087e17e..dbdef159b20d7 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -264,6 +264,11 @@ struct iomap_ioend {
 	struct bio		io_inline_bio;	/* MUST BE LAST! */
 };
 
+struct iomap_readend {
+	struct work_struct	read_work;	/* post read work (fs-verity) */
+	struct bio		read_inline_bio;/* MUST BE LAST! */
+};
+
 struct iomap_writeback_ops {
 	/*
 	 * Required, maps the blocks so that writeback can be performed on
-- 
2.31.1



* [RFC PATCH 10/11] xfs: add fs-verity support
  2022-12-13 17:29 [RFC PATCH 00/11] fs-verity support for XFS Andrey Albershteyn
                   ` (8 preceding siblings ...)
  2022-12-13 17:29 ` [RFC PATCH 09/11] iomap: fs-verity verification on page read Andrey Albershteyn
@ 2022-12-13 17:29 ` Andrey Albershteyn
  2022-12-13 19:08   ` Eric Biggers
  2022-12-14  7:58   ` Dave Chinner
  2022-12-13 17:29 ` [RFC PATCH 11/11] xfs: add fs-verity ioctls Andrey Albershteyn
  2022-12-13 20:50 ` [RFC PATCH 00/11] fs-verity support for XFS Eric Biggers
  11 siblings, 2 replies; 51+ messages in thread
From: Andrey Albershteyn @ 2022-12-13 17:29 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel; +Cc: Andrey Albershteyn

Add integration with fs-verity. XFS stores fs-verity metadata in
extended attributes. The metadata consists of the verity descriptor
and the Merkle tree pages.

The descriptor is stored under the "verity_descriptor" extended
attribute. The Merkle tree pages are stored under binary indexes.

When fs-verity is enabled on an inode, the XFS_IVERITY flag is set,
meaning that the Merkle tree is being built. Then the pagecache is
flushed and large folios are disabled, as these aren't yet supported
by fs-verity. This is done in xfs_begin_enable_verity() to make sure
that fs-verity operations on the inode don't populate the cache with
large folios during the tree build. The initialization ends with
storing the verity descriptor and setting the inode's on-disk flag
(XFS_DIFLAG2_VERITY).

Also add a check that block size == PAGE_SIZE, as fs-verity doesn't
support other sizes yet.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/Makefile          |   1 +
 fs/xfs/libxfs/xfs_attr.c |   8 ++
 fs/xfs/xfs_inode.h       |   1 +
 fs/xfs/xfs_super.c       |  10 ++
 fs/xfs/xfs_verity.c      | 203 +++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_verity.h      |  19 ++++
 6 files changed, 242 insertions(+)
 create mode 100644 fs/xfs/xfs_verity.c
 create mode 100644 fs/xfs/xfs_verity.h

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 42d0496fdad7d..5afa8ae5b3b7f 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -131,6 +131,7 @@ xfs-$(CONFIG_XFS_POSIX_ACL)	+= xfs_acl.o
 xfs-$(CONFIG_SYSCTL)		+= xfs_sysctl.o
 xfs-$(CONFIG_COMPAT)		+= xfs_ioctl32.o
 xfs-$(CONFIG_EXPORTFS_BLOCK_OPS)	+= xfs_pnfs.o
+xfs-$(CONFIG_FS_VERITY)		+= xfs_verity.o
 
 # notify failure
 ifeq ($(CONFIG_MEMORY_FAILURE),y)
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 57080ea4c869b..42013fc99b76a 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -26,6 +26,7 @@
 #include "xfs_trace.h"
 #include "xfs_attr_item.h"
 #include "xfs_xattr.h"
+#include "xfs_verity.h"
 
 struct kmem_cache		*xfs_attr_intent_cache;
 
@@ -1632,6 +1633,13 @@ xfs_attr_namecheck(
 		return xfs_verify_pptr(mp, (struct xfs_parent_name_rec *)name);
 	}
 
+	if (flags & XFS_ATTR_VERITY) {
+		if (length != sizeof(__be64) &&
+				length != XFS_VERITY_DESCRIPTOR_NAME_LEN)
+			return false;
+		return true;
+	}
+
 	return xfs_str_attr_namecheck(name, length);
 }
 
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 5735de32beebd..070631adac572 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -325,6 +325,7 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
  * plain old IRECLAIMABLE inode.
  */
 #define XFS_INACTIVATING	(1 << 13)
+#define XFS_IVERITY		(1 << 14) /* merkle tree is in progress */
 
 /* All inode state flags related to inode reclaim. */
 #define XFS_ALL_IRECLAIM_FLAGS	(XFS_IRECLAIMABLE | \
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 50c2c819ba940..a3c89d2c06a8a 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -41,6 +41,7 @@
 #include "xfs_attr_item.h"
 #include "xfs_xattr.h"
 #include "xfs_iunlink_item.h"
+#include "xfs_verity.h"
 
 #include <linux/magic.h>
 #include <linux/fs_context.h>
@@ -1469,6 +1470,9 @@ xfs_fs_fill_super(
 	sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP | QTYPE_MASK_PRJ;
 #endif
 	sb->s_op = &xfs_super_operations;
+#ifdef CONFIG_FS_VERITY
+	sb->s_vop = &xfs_verity_ops;
+#endif
 
 	/*
 	 * Delay mount work if the debug hook is set. This is debug
@@ -1669,6 +1673,12 @@ xfs_fs_fill_super(
 		xfs_alert(mp,
 	"EXPERIMENTAL parent pointer feature enabled. Use at your own risk!");
 
+	if (xfs_has_verity(mp) && mp->m_super->s_blocksize != PAGE_SIZE) {
+		xfs_alert(mp,
+			"Cannot use fs-verity with block size != PAGE_SIZE");
+		goto out_filestream_unmount;
+	}
+
 	error = xfs_mountfs(mp);
 	if (error)
 		goto out_filestream_unmount;
diff --git a/fs/xfs/xfs_verity.c b/fs/xfs/xfs_verity.c
new file mode 100644
index 0000000000000..112a72d0b0ca7
--- /dev/null
+++ b/fs/xfs/xfs_verity.c
@@ -0,0 +1,203 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2022 Red Hat, Inc.
+ */
+#include "xfs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_inode.h"
+#include "xfs_attr.h"
+#include "xfs_verity.h"
+#include "xfs_bmap_util.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+
+static int
+xfs_get_verity_descriptor(
+	struct inode		*inode,
+	void			*buf,
+	size_t			buf_size)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+	int			error = 0;
+	struct xfs_da_args	args = {
+		.dp		= ip,
+		.attr_filter	= XFS_ATTR_VERITY,
+		.name		= (const uint8_t *)XFS_VERITY_DESCRIPTOR_NAME,
+		.namelen	= XFS_VERITY_DESCRIPTOR_NAME_LEN,
+		.valuelen	= buf_size,
+	};
+
+	error = xfs_attr_get(&args);
+	if (error)
+		return error;
+
+	if (buf_size == 0)
+		return args.valuelen;
+
+	if (args.valuelen > buf_size) {
+		kmem_free(args.value);
+		return -ERANGE;
+	}
+
+	memcpy(buf, args.value, buf_size);
+
+	kmem_free(args.value);
+	return args.valuelen;
+}
+
+static int
+xfs_begin_enable_verity(
+	struct file	    *filp)
+{
+	struct inode	    *inode = file_inode(filp);
+	struct xfs_inode    *ip = XFS_I(inode);
+	int		    error = 0;
+
+	if (IS_DAX(inode))
+		return -EINVAL;
+
+	if (xfs_iflags_test(ip, XFS_IVERITY))
+		return -EBUSY;
+	xfs_iflags_set(ip, XFS_IVERITY);
+
+	/*
+	 * As fs-verity doesn't support multi-page folios yet, flush everything
+	 * from page cache and disable it
+	 */
+	filemap_invalidate_lock(inode->i_mapping);
+
+	inode_dio_wait(inode);
+	error = xfs_flush_unmap_range(ip, 0, XFS_ISIZE(ip));
+	if (error)
+		goto out;
+	mapping_clear_large_folios(inode->i_mapping);
+
+out:
+	filemap_invalidate_unlock(inode->i_mapping);
+	if (error)
+		xfs_iflags_clear(ip, XFS_IVERITY);
+	return error;
+}
+
+static int
+xfs_end_enable_verity(
+	struct file		*filp,
+	const void		*desc,
+	size_t			desc_size,
+	u64			merkle_tree_size)
+{
+	struct inode		*inode = file_inode(filp);
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_trans	*tp;
+	struct xfs_da_args	args = {
+		.dp		= ip,
+		.whichfork	= XFS_ATTR_FORK,
+		.attr_filter	= XFS_ATTR_VERITY,
+		.attr_flags	= XATTR_CREATE,
+		.name		= (const uint8_t *)XFS_VERITY_DESCRIPTOR_NAME,
+		.namelen	= XFS_VERITY_DESCRIPTOR_NAME_LEN,
+		.value		= (void *)desc,
+		.valuelen	= desc_size,
+	};
+	int			error = 0;
+
+	/* fs-verity failed, just cleanup */
+	if (desc == NULL) {
+		mapping_set_large_folios(inode->i_mapping);
+		goto out;
+	}
+
+	error = xfs_attr_set(&args);
+	if (error)
+		goto out;
+
+	/* Set fsverity inode flag */
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 0, 0, 0, &tp);
+	if (error)
+		goto out;
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+
+	ip->i_diflags2 |= XFS_DIFLAG2_VERITY;
+	inode->i_flags |= S_VERITY;
+
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	error = xfs_trans_commit(tp);
+
+out:
+	if (error)
+		mapping_set_large_folios(inode->i_mapping);
+
+	xfs_iflags_clear(ip, XFS_IVERITY);
+	return error;
+}
+
+static struct page *
+xfs_read_merkle_tree_page(
+	struct inode		*inode,
+	pgoff_t			index,
+	unsigned long		num_ra_pages)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct page		*page;
+	__be64			name = cpu_to_be64(index);
+	struct xfs_da_args	args = {
+		.dp		= ip,
+		.attr_filter	= XFS_ATTR_VERITY,
+		.name		= (const uint8_t *)&name,
+		.namelen	= sizeof(__be64),
+		.valuelen	= PAGE_SIZE,
+	};
+	int			error = 0;
+
+	error = xfs_attr_get(&args);
+	if (error)
+		return ERR_PTR(-EFAULT);
+
+	page = alloc_page(GFP_KERNEL);
+	if (!page)
+		return ERR_PTR(-ENOMEM);
+
+	memcpy(page_address(page), args.value, args.valuelen);
+
+	kmem_free(args.value);
+	return page;
+}
+
+static int
+xfs_write_merkle_tree_block(
+	struct inode		*inode,
+	const void		*buf,
+	u64			index,
+	int			log_blocksize)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+	__be64			name = cpu_to_be64(index);
+	struct xfs_da_args	args = {
+		.dp		= ip,
+		.whichfork	= XFS_ATTR_FORK,
+		.attr_filter	= XFS_ATTR_VERITY,
+		.attr_flags	= XATTR_CREATE,
+		.name		= (const uint8_t *)&name,
+		.namelen	= sizeof(__be64),
+		.value		= (void *)buf,
+		.valuelen	= 1 << log_blocksize,
+	};
+
+	return xfs_attr_set(&args);
+}
+
+const struct fsverity_operations xfs_verity_ops = {
+	.begin_enable_verity = &xfs_begin_enable_verity,
+	.end_enable_verity = &xfs_end_enable_verity,
+	.get_verity_descriptor = &xfs_get_verity_descriptor,
+	.read_merkle_tree_page = &xfs_read_merkle_tree_page,
+	.write_merkle_tree_block = &xfs_write_merkle_tree_block,
+};
diff --git a/fs/xfs/xfs_verity.h b/fs/xfs/xfs_verity.h
new file mode 100644
index 0000000000000..ae5d87ca32a86
--- /dev/null
+++ b/fs/xfs/xfs_verity.h
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2022 Red Hat, Inc.
+ */
+#ifndef __XFS_VERITY_H__
+#define __XFS_VERITY_H__
+
+#include <linux/fsverity.h>
+
+#define XFS_VERITY_DESCRIPTOR_NAME "verity_descriptor"
+#define XFS_VERITY_DESCRIPTOR_NAME_LEN 17
+
+#ifdef CONFIG_FS_VERITY
+extern const struct fsverity_operations xfs_verity_ops;
+#else
+#define xfs_verity_ops NULL
+#endif	/* CONFIG_FS_VERITY */
+
+#endif	/* __XFS_VERITY_H__ */
-- 
2.31.1



* [RFC PATCH 11/11] xfs: add fs-verity ioctls
  2022-12-13 17:29 [RFC PATCH 00/11] fs-verity support for XFS Andrey Albershteyn
                   ` (9 preceding siblings ...)
  2022-12-13 17:29 ` [RFC PATCH 10/11] xfs: add fs-verity support Andrey Albershteyn
@ 2022-12-13 17:29 ` Andrey Albershteyn
  2022-12-13 20:50 ` [RFC PATCH 00/11] fs-verity support for XFS Eric Biggers
  11 siblings, 0 replies; 51+ messages in thread
From: Andrey Albershteyn @ 2022-12-13 17:29 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel; +Cc: Andrey Albershteyn

Add the fs-verity ioctls to enable verity, dump metadata (descriptor
and Merkle tree pages) and obtain a file's digest.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/xfs_ioctl.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 3cd46d030ccdc..dff3672c16140 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -43,6 +43,7 @@
 #include <linux/mount.h>
 #include <linux/namei.h>
 #include <linux/fileattr.h>
+#include <linux/fsverity.h>
 
 /*
  * xfs_find_handle maps from userspace xfs_fsop_handlereq structure to
@@ -2267,6 +2268,16 @@ xfs_file_ioctl(
 		return error;
 	}
 
+	case FS_IOC_ENABLE_VERITY:
+		return fsverity_ioctl_enable(filp, (const void __user *)arg);
+
+	case FS_IOC_MEASURE_VERITY:
+		return fsverity_ioctl_measure(filp, (void __user *)arg);
+
+	case FS_IOC_READ_VERITY_METADATA:
+		return fsverity_ioctl_read_metadata(filp,
+						    (const void __user *)arg);
+
 	default:
 		return -ENOTTY;
 	}
-- 
2.31.1



* Re: [RFC PATCH 03/11] xfs: add attribute type for fs-verity
  2022-12-13 17:29 ` [RFC PATCH 03/11] xfs: add attribute type for fs-verity Andrey Albershteyn
@ 2022-12-13 17:43   ` Eric Sandeen
  2022-12-14  1:03     ` Dave Chinner
  0 siblings, 1 reply; 51+ messages in thread
From: Eric Sandeen @ 2022-12-13 17:43 UTC (permalink / raw)
  To: Andrey Albershteyn, linux-xfs, linux-fsdevel

On 12/13/22 11:29 AM, Andrey Albershteyn wrote:
> The Merkle tree pages and descriptor are stored in the extended
> attributes of the inode. Add new attribute type for fs-verity
> metadata. Skip fs-verity attributes for getfattr as it can not parse
> binary page names.
> 
> Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>


>  DECLARE_EVENT_CLASS(xfs_attr_list_class,
> diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
> index 5b57f6348d630..acbfa29d04af0 100644
> --- a/fs/xfs/xfs_xattr.c
> +++ b/fs/xfs/xfs_xattr.c
> @@ -237,6 +237,9 @@ xfs_xattr_put_listent(
>  	if (flags & XFS_ATTR_PARENT)
>  		return;
>  
> +	if (flags & XFS_ATTR_VERITY)
> +		return;
> +

Just a nitpick, but now that there are already 2 cases like this, I wonder
if it would be wise to #define something like an XFS_ATTR_VISIBLE_MASK
(or maybe XFS_ATTR_INTERNAL_MASK) and use that to decide, rather than
testing each one individually?

Thanks,
-Eric



* Re: [RFC PATCH 02/11] pagemap: add mapping_clear_large_folios() wrapper
  2022-12-13 17:29 ` [RFC PATCH 02/11] pagemap: add mapping_clear_large_folios() wrapper Andrey Albershteyn
@ 2022-12-13 17:55   ` Matthew Wilcox
  2022-12-13 19:33     ` Eric Biggers
  2022-12-13 21:08     ` Dave Chinner
  0 siblings, 2 replies; 51+ messages in thread
From: Matthew Wilcox @ 2022-12-13 17:55 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 06:29:26PM +0100, Andrey Albershteyn wrote:
> Add wrapper to clear mapping's large folio flag. This is handy for
> disabling large folios on already existing inodes (e.g. future XFS
> integration of fs-verity).

I have two problems with this.  One is your use of __clear_bit().
We can use __set_bit() because it's done as part of initialisation.
As far as I can tell from your patches, mapping_clear_large_folios() is
called on a live inode, so you'd have to use clear_bit() to avoid races.

The second is that verity should obviously be enhanced to support
large folios (and for that matter, block sizes smaller than PAGE_SIZE).
Without that, this is just a toy or a prototype.  Disabling large folios
is not an option.

I'm happy to work with you to add support for large folios to verity.
It hasn't been high priority for me, but I'm now working on folio support
for bufferhead filesystems and this would probably fit in.

> Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> ---
>  include/linux/pagemap.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index bbccb40442224..63ca600bdf8f7 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -306,6 +306,11 @@ static inline void mapping_set_large_folios(struct address_space *mapping)
>  	__set_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
>  }
>  
> +static inline void mapping_clear_large_folios(struct address_space *mapping)
> +{
> +	__clear_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
> +}
> +
>  /*
>   * Large folio support currently depends on THP.  These dependencies are
>   * being worked on but are not yet fixed.
> -- 
> 2.31.1
> 


* Re: [RFC PATCH 09/11] iomap: fs-verity verification on page read
  2022-12-13 17:29 ` [RFC PATCH 09/11] iomap: fs-verity verification on page read Andrey Albershteyn
@ 2022-12-13 19:02   ` Eric Biggers
  2023-01-09 16:58     ` Andrey Albershteyn
  2022-12-14  5:43   ` Dave Chinner
  1 sibling, 1 reply; 51+ messages in thread
From: Eric Biggers @ 2022-12-13 19:02 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 06:29:33PM +0100, Andrey Albershteyn wrote:
> Add fs-verity page verification in read IO path. The verification
> itself is offloaded into workqueue (provided by fs-verity).
> 
> The work_struct items are allocated from bioset side by side with
> bio being processed.
> 
> As inodes with fs-verity doesn't use large folios we check only
> first page of the folio for errors (set by fs-verity if verification
> failed).
> 
> Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> ---
>  fs/iomap/buffered-io.c | 80 +++++++++++++++++++++++++++++++++++++++---
>  include/linux/iomap.h  |  5 +++
>  2 files changed, 81 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 91ee0b308e13d..b7abc2f806cfc 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -17,6 +17,7 @@
>  #include <linux/bio.h>
>  #include <linux/sched/signal.h>
>  #include <linux/migrate.h>
> +#include <linux/fsverity.h>
>  #include "trace.h"
>  
>  #include "../internal.h"
> @@ -42,6 +43,7 @@ static inline struct iomap_page *to_iomap_page(struct folio *folio)
>  }
>  
>  static struct bio_set iomap_ioend_bioset;
> +static struct bio_set iomap_readend_bioset;
>  
>  static struct iomap_page *
>  iomap_page_create(struct inode *inode, struct folio *folio, unsigned int flags)
> @@ -189,9 +191,39 @@ static void iomap_read_end_io(struct bio *bio)
>  	int error = blk_status_to_errno(bio->bi_status);
>  	struct folio_iter fi;
>  
> -	bio_for_each_folio_all(fi, bio)
> +	bio_for_each_folio_all(fi, bio) {
> +		/*
> +		 * As fs-verity doesn't work with multi-page folios, verity
> +		 * inodes have large folios disabled (only single page folios
> +		 * are used)
> +		 */
> +		if (!error)
> +			error = PageError(folio_page(fi.folio, 0));
> +
>  		iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
> +	}

fs/verity/ no longer uses PG_error to report errors.  See upstream commit
98dc08bae678 ("fsverity: stop using PG_error to track error status").

- Eric


* Re: [RFC PATCH 10/11] xfs: add fs-verity support
  2022-12-13 17:29 ` [RFC PATCH 10/11] xfs: add fs-verity support Andrey Albershteyn
@ 2022-12-13 19:08   ` Eric Biggers
  2022-12-13 19:22     ` Darrick J. Wong
  2022-12-13 20:33     ` Dave Chinner
  2022-12-14  7:58   ` Dave Chinner
  1 sibling, 2 replies; 51+ messages in thread
From: Eric Biggers @ 2022-12-13 19:08 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 06:29:34PM +0100, Andrey Albershteyn wrote:
> 
> Also add check that block size == PAGE_SIZE as fs-verity doesn't
> support different sizes yet.

That's coming with
https://lore.kernel.org/linux-fsdevel/20221028224539.171818-1-ebiggers@kernel.org/T/#u,
which I'll be resending soon and I hope to apply for 6.3.
Review and testing of that patchset, along with its associated xfstests update
(https://lore.kernel.org/fstests/20221211070704.341481-1-ebiggers@kernel.org/T/#u),
would be greatly appreciated.

Note, as proposed there will still be a limit of:

	merkle_tree_block_size <= fs_block_size <= page_size

Hopefully you don't need fs_block_size > page_size or
merkle_tree_block_size > fs_block_size?

- Eric


* Re: [RFC PATCH 10/11] xfs: add fs-verity support
  2022-12-13 19:08   ` Eric Biggers
@ 2022-12-13 19:22     ` Darrick J. Wong
  2022-12-13 20:13       ` Eric Biggers
  2022-12-13 20:33     ` Dave Chinner
  1 sibling, 1 reply; 51+ messages in thread
From: Darrick J. Wong @ 2022-12-13 19:22 UTC (permalink / raw)
  To: Eric Biggers; +Cc: Andrey Albershteyn, linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 11:08:45AM -0800, Eric Biggers wrote:
> On Tue, Dec 13, 2022 at 06:29:34PM +0100, Andrey Albershteyn wrote:
> > 
> > Also add check that block size == PAGE_SIZE as fs-verity doesn't
> > support different sizes yet.
> 
> That's coming with
> https://lore.kernel.org/linux-fsdevel/20221028224539.171818-1-ebiggers@kernel.org/T/#u,
> which I'll be resending soon and I hope to apply for 6.3.
> Review and testing of that patchset, along with its associated xfstests update
> (https://lore.kernel.org/fstests/20221211070704.341481-1-ebiggers@kernel.org/T/#u),
> would be greatly appreciated.
> 
> Note, as proposed there will still be a limit of:
> 
> 	merkle_tree_block_size <= fs_block_size <= page_size
> 
> Hopefully you don't need fs_block_size > page_size or

Not right this second, but large folios (and the ability to demand them)
are probably the last major piece that we need to support running
64k-fsblock filesystems on x86.

> merkle_tree_block_size > fs_block_size?

Doubtful.

--D

> 
> - Eric


* Re: [RFC PATCH 02/11] pagemap: add mapping_clear_large_folios() wrapper
  2022-12-13 17:55   ` Matthew Wilcox
@ 2022-12-13 19:33     ` Eric Biggers
  2022-12-13 21:10       ` Dave Chinner
  2022-12-13 21:08     ` Dave Chinner
  1 sibling, 1 reply; 51+ messages in thread
From: Eric Biggers @ 2022-12-13 19:33 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Andrey Albershteyn, linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 05:55:22PM +0000, Matthew Wilcox wrote:
> I'm happy to work with you to add support for large folios to verity.
> It hasn't been high priority for me, but I'm now working on folio support
> for bufferhead filesystems and this would probably fit in.

I'd be very interested to know what else is needed after commit 98dc08bae678
("fsverity: stop using PG_error to track error status") which is upstream now,
and
https://lore.kernel.org/linux-fsdevel/20221028224539.171818-1-ebiggers@kernel.org/T/#u
("fsverity: support for non-4K pages") which is planned for 6.3.

- Eric


* Re: [RFC PATCH 10/11] xfs: add fs-verity support
  2022-12-13 19:22     ` Darrick J. Wong
@ 2022-12-13 20:13       ` Eric Biggers
  0 siblings, 0 replies; 51+ messages in thread
From: Eric Biggers @ 2022-12-13 20:13 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Andrey Albershteyn, linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 11:22:31AM -0800, Darrick J. Wong wrote:
> On Tue, Dec 13, 2022 at 11:08:45AM -0800, Eric Biggers wrote:
> > On Tue, Dec 13, 2022 at 06:29:34PM +0100, Andrey Albershteyn wrote:
> > > 
> > > Also add check that block size == PAGE_SIZE as fs-verity doesn't
> > > support different sizes yet.
> > 
> > That's coming with
> > https://lore.kernel.org/linux-fsdevel/20221028224539.171818-1-ebiggers@kernel.org/T/#u,
> > which I'll be resending soon and I hope to apply for 6.3.
> > Review and testing of that patchset, along with its associated xfstests update
> > (https://lore.kernel.org/fstests/20221211070704.341481-1-ebiggers@kernel.org/T/#u),
> > would be greatly appreciated.
> > 
> > Note, as proposed there will still be a limit of:
> > 
> > 	merkle_tree_block_size <= fs_block_size <= page_size
> > 
> > Hopefully you don't need fs_block_size > page_size or
> 
> Not right this second, but large folios (and the ability to demand them)
> are probably the last major piece that we need to support running
> 64k-fsblock filesystems on x86.
> 
> > merkle_tree_block_size > fs_block_size?
> 
> Doubtful.
> 

Thanks for the info.  Actually, I think

	merkle_tree_block_size <= fs_block_size <= page_size

is wrong.  It should actually be

	merkle_tree_block_size <= min(fs_block_size, page_size)

So there shouldn't actually be a problem with fs_block_size > page_size
(assuming that the filesystem doesn't have unrelated problems with it).

merkle_tree_block_size <= page_size comes from the way that fs/verity/verify.c
works (again, talking about the version with my patchset "fsverity: support for
non-4K pages" applied, and not the current upstream version which has a stronger
assumption of merkle_tree_block_size == page_size).

merkle_tree_block_size <= fs_block_size comes from the fact that every time data
is verified, it must be Merkle tree block aligned.  Maybe even that is not
necessarily a problem, if the filesystem waits to collect a full page (or folio)
before verifying it.  But ext4 will do a FS block at a time.

- Eric


* Re: [RFC PATCH 10/11] xfs: add fs-verity support
  2022-12-13 19:08   ` Eric Biggers
  2022-12-13 19:22     ` Darrick J. Wong
@ 2022-12-13 20:33     ` Dave Chinner
  2022-12-13 20:39       ` Eric Biggers
  1 sibling, 1 reply; 51+ messages in thread
From: Dave Chinner @ 2022-12-13 20:33 UTC (permalink / raw)
  To: Eric Biggers; +Cc: Andrey Albershteyn, linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 11:08:45AM -0800, Eric Biggers wrote:
> On Tue, Dec 13, 2022 at 06:29:34PM +0100, Andrey Albershteyn wrote:
> > 
> > Also add check that block size == PAGE_SIZE as fs-verity doesn't
> > support different sizes yet.
> 
> That's coming with
> https://lore.kernel.org/linux-fsdevel/20221028224539.171818-1-ebiggers@kernel.org/T/#u,
> which I'll be resending soon and I hope to apply for 6.3.
> Review and testing of that patchset, along with its associated xfstests update
> (https://lore.kernel.org/fstests/20221211070704.341481-1-ebiggers@kernel.org/T/#u),
> would be greatly appreciated.
> 
> Note, as proposed there will still be a limit of:
> 
> 	merkle_tree_block_size <= fs_block_size <= page_size

> Hopefully you don't need fs_block_size > page_size or

Yes, we will.

This is back on my radar now that folios have settled down. It's
pretty trivial for XFS to do because we already support metadata
block sizes > filesystem block size. Here is an old prototype:

https://lore.kernel.org/linux-xfs/20181107063127.3902-1-david@fromorbit.com/

> merkle_tree_block_size > fs_block_size?

That's also a desirable addition.

XFS is using xattrs to hold merkle tree blocks so the merkle tree
storage is already independent of the filesystem block size and
page cache limitations. Being able to use 64kB merkle tree blocks
would be really handy for reducing the search depth and overall IO
footprint of really large files.

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC PATCH 10/11] xfs: add fs-verity support
  2022-12-13 20:33     ` Dave Chinner
@ 2022-12-13 20:39       ` Eric Biggers
  2022-12-13 21:40         ` Dave Chinner
  0 siblings, 1 reply; 51+ messages in thread
From: Eric Biggers @ 2022-12-13 20:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Andrey Albershteyn, linux-xfs, linux-fsdevel

On Wed, Dec 14, 2022 at 07:33:19AM +1100, Dave Chinner wrote:
> On Tue, Dec 13, 2022 at 11:08:45AM -0800, Eric Biggers wrote:
> > On Tue, Dec 13, 2022 at 06:29:34PM +0100, Andrey Albershteyn wrote:
> > > 
> > > Also add check that block size == PAGE_SIZE as fs-verity doesn't
> > > support different sizes yet.
> > 
> > That's coming with
> > https://lore.kernel.org/linux-fsdevel/20221028224539.171818-1-ebiggers@kernel.org/T/#u,
> > which I'll be resending soon and I hope to apply for 6.3.
> > Review and testing of that patchset, along with its associated xfstests update
> > (https://lore.kernel.org/fstests/20221211070704.341481-1-ebiggers@kernel.org/T/#u),
> > would be greatly appreciated.
> > 
> > Note, as proposed there will still be a limit of:
> > 
> > 	merkle_tree_block_size <= fs_block_size <= page_size
> 
> > Hopefully you don't need fs_block_size > page_size or
> 
> Yes, we will.
> 
> This back on my radar now that folios have settled down. It's
> pretty trivial for XFS to do because we already support metadata
> block sizes > filesystem block size. Here is an old prototype:
> 
> https://lore.kernel.org/linux-xfs/20181107063127.3902-1-david@fromorbit.com/

As per my follow-up response
(https://lore.kernel.org/r/Y5jc7P1ZeWHiTKRF@sol.localdomain),
I now think that wouldn't actually be a problem.

> > merkle_tree_block_size > fs_block_size?
> 
> That's also a desirable addition.
> 
> XFS is using xattrs to hold merkle tree blocks so the merkle tree
> storage is are already independent of the filesystem block size and
> page cache limitations. Being able to using 64kB merkle tree blocks
> would be really handy for reducing the search depth and overall IO
> footprint of really large files.

Well, the main problem is that using a Merkle tree block of 64K would mean that
you can never read less than 64K at a time.

- Eric


* Re: [RFC PATCH 00/11] fs-verity support for XFS
  2022-12-13 17:29 [RFC PATCH 00/11] fs-verity support for XFS Andrey Albershteyn
                   ` (10 preceding siblings ...)
  2022-12-13 17:29 ` [RFC PATCH 11/11] xfs: add fs-verity ioctls Andrey Albershteyn
@ 2022-12-13 20:50 ` Eric Biggers
  2022-12-13 22:11   ` Dave Chinner
  11 siblings, 1 reply; 51+ messages in thread
From: Eric Biggers @ 2022-12-13 20:50 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 06:29:24PM +0100, Andrey Albershteyn wrote:
> Not yet implemented:
> - No pre-fetching of Merkle tree pages in the
>   read_merkle_tree_page()

This would be helpful, but not essential.

> - No marking of already verified Merkle tree pages (each read, the
>   whole tree is verified).

This is essential to have, IMO.

You *could* do what btrfs does, where it caches the Merkle tree pages in the
inode's page cache past i_size, even though btrfs stores the Merkle tree
separately from the file data on-disk.

However, I'd guess that the other XFS developers would have an aversion to
that approach, even though it would not affect the on-disk storage.

The alternatives would be to create a separate in-memory-only inode for the
cache, or to build a custom cache with its own shrinker.

- Eric


* Re: [RFC PATCH 02/11] pagemap: add mapping_clear_large_folios() wrapper
  2022-12-13 17:55   ` Matthew Wilcox
  2022-12-13 19:33     ` Eric Biggers
@ 2022-12-13 21:08     ` Dave Chinner
  2023-01-09 16:34       ` Andrey Albershteyn
  1 sibling, 1 reply; 51+ messages in thread
From: Dave Chinner @ 2022-12-13 21:08 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Andrey Albershteyn, linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 05:55:22PM +0000, Matthew Wilcox wrote:
> On Tue, Dec 13, 2022 at 06:29:26PM +0100, Andrey Albershteyn wrote:
> > Add wrapper to clear mapping's large folio flag. This is handy for
> > disabling large folios on already existing inodes (e.g. future XFS
> > integration of fs-verity).
> 
> I have two problems with this.  One is your use of __clear_bit().
> We can use __set_bit() because it's done as part of initialisation.
> As far as I can tell from your patches, mapping_clear_large_folios() is
> called on a live inode, so you'd have to use clear_bit() to avoid races.

I think we can do without mapping_clear_large_folios() - we
already have precedent for this sort of mapping state change with
the DAX inode feature flag.  That is, we change the on-disk state in
the ioctl context, but we don't change the in-memory inode state.
Instead, we mark it I_DONTCACHE to get it turfed from memory with
expediency. Then when it is re-instantiated, we see the on-disk
state and then don't enable large mappings on that inode.

That will work just fine here, I think.

> The second is that verity should obviously be enhanced to support
> large folios (and for that matter, block sizes smaller than PAGE_SIZE).
> Without that, this is just a toy or a prototype.  Disabling large folios
> is not an option.

Disabling large folios is very much an option. Filesystems must opt
in to large mapping support, so they can also choose to opt out.
i.e. large mapping support is a filesystem policy decision, not a core
infrastructure decision. Hence how we disable large mappings for
fsverity-enabled inodes is open to discussion, but saying we can't
opt out of an optional feature is entirely nonsensical.

> I'm happy to work with you to add support for large folios to verity.
> It hasn't been high priority for me, but I'm now working on folio support
> for bufferhead filesystems and this would probably fit in.

Yes, we need fsverity to support multipage folios, but modifying
fsverity is outside the scope of initially enabling fsverity support
on XFS.  This patch set is about sorting out the on-disk format
changes and interfacing with the fsverity infrastructure to enable
the feature to be tested and verified.

Stuff like large mapping support in fsverity is a future concern,
not a show-stopper for initial feature support. We don't need every
bell and whistle in the initial merge....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC PATCH 02/11] pagemap: add mapping_clear_large_folios() wrapper
  2022-12-13 19:33     ` Eric Biggers
@ 2022-12-13 21:10       ` Dave Chinner
  2022-12-14  6:52         ` Eric Biggers
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Chinner @ 2022-12-13 21:10 UTC (permalink / raw)
  To: Eric Biggers; +Cc: Matthew Wilcox, Andrey Albershteyn, linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 11:33:54AM -0800, Eric Biggers wrote:
> On Tue, Dec 13, 2022 at 05:55:22PM +0000, Matthew Wilcox wrote:
> > I'm happy to work with you to add support for large folios to verity.
> > It hasn't been high priority for me, but I'm now working on folio support
> > for bufferhead filesystems and this would probably fit in.
> 
> I'd be very interested to know what else is needed after commit 98dc08bae678
> ("fsverity: stop using PG_error to track error status") which is upstream now,
> and
> https://lore.kernel.org/linux-fsdevel/20221028224539.171818-1-ebiggers@kernel.org/T/#u
> ("fsverity: support for non-4K pages") which is planned for 6.3.

Did you change the bio interfaces to iterate a bvec full of
variable sized folios, or does it still expect a bio to only have
PAGE_SIZE pages attached to it?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC PATCH 10/11] xfs: add fs-verity support
  2022-12-13 20:39       ` Eric Biggers
@ 2022-12-13 21:40         ` Dave Chinner
  0 siblings, 0 replies; 51+ messages in thread
From: Dave Chinner @ 2022-12-13 21:40 UTC (permalink / raw)
  To: Eric Biggers; +Cc: Andrey Albershteyn, linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 12:39:39PM -0800, Eric Biggers wrote:
> On Wed, Dec 14, 2022 at 07:33:19AM +1100, Dave Chinner wrote:
> > On Tue, Dec 13, 2022 at 11:08:45AM -0800, Eric Biggers wrote:
> > > On Tue, Dec 13, 2022 at 06:29:34PM +0100, Andrey Albershteyn wrote:
> > > > 
> > > > Also add check that block size == PAGE_SIZE as fs-verity doesn't
> > > > support different sizes yet.
> > > 
> > > That's coming with
> > > https://lore.kernel.org/linux-fsdevel/20221028224539.171818-1-ebiggers@kernel.org/T/#u,
> > > which I'll be resending soon and I hope to apply for 6.3.
> > > Review and testing of that patchset, along with its associated xfstests update
> > > (https://lore.kernel.org/fstests/20221211070704.341481-1-ebiggers@kernel.org/T/#u),
> > > would be greatly appreciated.
> > > 
> > > Note, as proposed there will still be a limit of:
> > > 
> > > 	merkle_tree_block_size <= fs_block_size <= page_size
> > 
> > > Hopefully you don't need fs_block_size > page_size or
> > 
> > Yes, we will.
> > 
> > This back on my radar now that folios have settled down. It's
> > pretty trivial for XFS to do because we already support metadata
> > block sizes > filesystem block size. Here is an old prototype:
> > 
> > https://lore.kernel.org/linux-xfs/20181107063127.3902-1-david@fromorbit.com/
> 
> As per my follow-up response
> (https://lore.kernel.org/r/Y5jc7P1ZeWHiTKRF@sol.localdomain),
> I now think that wouldn't actually be a problem.

Good to hear.

> > > merkle_tree_block_size > fs_block_size?
> > 
> > That's also a desirable addition.
> > 
XFS is using xattrs to hold merkle tree blocks, so the merkle tree
storage is already independent of the filesystem block size and
page cache limitations. Being able to use 64kB merkle tree blocks
would be really handy for reducing the search depth and overall IO
footprint of really large files.
> 
> Well, the main problem is that using a Merkle tree block of 64K would mean that
> you can never read less than 64K at a time.

Sure, but why does that matter?

The typical cost of a 64kB IO is only about 5% more than a
4kB IO, even on slow spinning storage.  However, we bring an order
of magnitude more data into the cache with that IO, so we can then
process more data before we have to go to disk again and take
another latency hit.

FYI, we have this large 64kB block size option for directories in
XFS already - you can have a 4kB block size filesystem with a 64kB
directory block size. The larger block size is a little slower for
small directories because they have higher per-leaf block CPU
processing overhead, but once you get to millions of records in a
single directory or really high sustained IO load, the larger block
size is *much* faster because the reduction in IO latency and search
efficiency more than makes up for the single block CPU processing
overhead...

The merkle tree is little different - once we get into TB scale
files, the merkle tree is indexing millions of individual records.
At this point, overall record lookup and IO efficiency dominates the
data access time, not the amount of data each individual IO
retrieves from disk.

Keep in mind that the block size used for the merkle tree would be a
filesystem choice. If we have the capability to support 64kB
merkle tree blocks, then XFS can make the choice of what block size
to use at the point where we are measuring the file because we know
how large the file is at that point. And because we're storing the
merkle tree blocks in xattrs, we know exactly what block size the
merkle tree data was stored in from the xattr metadata...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC PATCH 00/11] fs-verity support for XFS
  2022-12-13 20:50 ` [RFC PATCH 00/11] fs-verity support for XFS Eric Biggers
@ 2022-12-13 22:11   ` Dave Chinner
  2022-12-14  6:31     ` Eric Biggers
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Chinner @ 2022-12-13 22:11 UTC (permalink / raw)
  To: Eric Biggers; +Cc: Andrey Albershteyn, linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 12:50:28PM -0800, Eric Biggers wrote:
> On Tue, Dec 13, 2022 at 06:29:24PM +0100, Andrey Albershteyn wrote:
> > Not yet implemented:
> > - No pre-fetching of Merkle tree pages in the
> >   read_merkle_tree_page()
> 
> This would be helpful, but not essential.
> 
> > - No marking of already verified Merkle tree pages (each read, the
> >   whole tree is verified).

Ah, I wasn't aware that this was missing.

> 
> This is essential to have, IMO.
> 
> You *could* do what btrfs does, where it caches the Merkle tree pages in the
> inode's page cache past i_size, even though btrfs stores the Merkle tree
> separately from the file data on-disk.
>
> However, I'd guess that the other XFS developers would have an aversion to that
> approach, even though it would not affect the on-disk storage.

Yup, on an architectural level it just seems wrong to cache secure
verification metadata in the same user accessible address space as
the data it verifies.

> The alternatives would be to create a separate in-memory-only inode for the
> cache, or to build a custom cache with its own shrinker.

The merkle tree blocks are cached in the XFS buffer cache.

Andrey, could we just add a new flag to the xfs_buf->b_flags to
indicate that the buffer contains verified merkle tree records?
i.e. if it's not set after we've read the buffer, we need to verify
the buffer and set the flag on the verified buffer in cache; if it
is already set, we can skip the verification?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC PATCH 01/11] xfs: enable large folios in xfs_setup_inode()
  2022-12-13 17:29 ` [RFC PATCH 01/11] xfs: enable large folios in xfs_setup_inode() Andrey Albershteyn
@ 2022-12-14  0:53   ` Dave Chinner
  0 siblings, 0 replies; 51+ messages in thread
From: Dave Chinner @ 2022-12-14  0:53 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 06:29:25PM +0100, Andrey Albershteyn wrote:
> This is more appropriate place to set large folios flag as other
> mapping's flags are set here. This will also allow us to conditionally
> enable large folios based on inode's diflags (e.g. fs-verity).
> 
> Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> ---
>  fs/xfs/xfs_icache.c | 2 --
>  fs/xfs/xfs_iops.c   | 2 ++
>  2 files changed, 2 insertions(+), 2 deletions(-)

LGTM.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC PATCH 03/11] xfs: add attribute type for fs-verity
  2022-12-13 17:43   ` Eric Sandeen
@ 2022-12-14  1:03     ` Dave Chinner
  2023-01-09 16:37       ` Andrey Albershteyn
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Chinner @ 2022-12-14  1:03 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Andrey Albershteyn, linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 11:43:42AM -0600, Eric Sandeen wrote:
> On 12/13/22 11:29 AM, Andrey Albershteyn wrote:
> > The Merkle tree pages and descriptor are stored in the extended
> > attributes of the inode. Add new attribute type for fs-verity
> > metadata. Skip fs-verity attributes for getfattr as it cannot parse
> > binary page names.
> > 
> > Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> 
> 
> >  DECLARE_EVENT_CLASS(xfs_attr_list_class,
> > diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
> > index 5b57f6348d630..acbfa29d04af0 100644
> > --- a/fs/xfs/xfs_xattr.c
> > +++ b/fs/xfs/xfs_xattr.c
> > @@ -237,6 +237,9 @@ xfs_xattr_put_listent(
> >  	if (flags & XFS_ATTR_PARENT)
> >  		return;
> >  
> > +	if (flags & XFS_ATTR_VERITY)
> > +		return;
> > +
> 
> Just a nitpick, but now that there are already 2 cases like this, I wonder
> if it would be wise to #define something like an XFS_ATTR_VISIBLE_MASK
> (or maybe XFS_ATTR_INTERNAL_MASK) and use that to decide, rather than
> testing each one individually?

Seems like a good idea to me.

There's also a couple of other spots that a comment about internal
vs externally visible xattr namespaces might be appropriate. e.g
xfs_attr_filter() in the ioctl code should never have internal xattr
namespaces added to it, and similarly a comment at the top of
fs/xfs/xfs_xattr.c that the interfaces implemented in the file are
only for exposing externally visible xattr namespaces to users.

That way it becomes more obvious that we handle internal vs external
xattr namespaces very differently.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC PATCH 04/11] xfs: add fs-verity ro-compat flag
  2022-12-13 17:29 ` [RFC PATCH 04/11] xfs: add fs-verity ro-compat flag Andrey Albershteyn
@ 2022-12-14  1:06   ` Dave Chinner
  0 siblings, 0 replies; 51+ messages in thread
From: Dave Chinner @ 2022-12-14  1:06 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 06:29:28PM +0100, Andrey Albershteyn wrote:
> To mark inodes sealed with fs-verity the new XFS_DIFLAG2_VERITY flag
> will be added in further patch. This requires ro-compat flag to let
> older kernels know that fs with fs-verity can not be modified.
> 
> Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_format.h | 10 ++++++----
>  fs/xfs/libxfs/xfs_sb.c     |  2 ++
>  fs/xfs/xfs_mount.h         |  2 ++
>  3 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index f413819b2a8aa..2b76e646e6f14 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -353,11 +353,13 @@ xfs_sb_has_compat_feature(
>  #define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
>  #define XFS_SB_FEAT_RO_COMPAT_REFLINK  (1 << 2)		/* reflinked files */
>  #define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3)		/* inobt block counts */
> +#define XFS_SB_FEAT_RO_COMPAT_VERITY   (1 << 4)		/* fs-verity */

Yup, define this now, but ....

>  #define XFS_SB_FEAT_RO_COMPAT_ALL \
> -		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
> -		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
> -		 XFS_SB_FEAT_RO_COMPAT_REFLINK| \
> -		 XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
> +		(XFS_SB_FEAT_RO_COMPAT_FINOBT  | \
> +		 XFS_SB_FEAT_RO_COMPAT_RMAPBT  | \
> +		 XFS_SB_FEAT_RO_COMPAT_REFLINK | \
> +		 XFS_SB_FEAT_RO_COMPAT_INOBTCNT| \
> +		 XFS_SB_FEAT_RO_COMPAT_VERITY)

This hunk should be in a separate patch at the end of the series
so that VERITY enabled filesystems won't mount until all the
infrastructure is in place to handle it correctly. This means git
bisects won't land in the middle of the patchset where
VERITY is accepted as valid but the functionality doesn't work.

Otherwise looks good.

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC PATCH 05/11] xfs: add inode on-disk VERITY flag
  2022-12-13 17:29 ` [RFC PATCH 05/11] xfs: add inode on-disk VERITY flag Andrey Albershteyn
@ 2022-12-14  1:29   ` Dave Chinner
  2023-01-09 16:51     ` Andrey Albershteyn
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Chinner @ 2022-12-14  1:29 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 06:29:29PM +0100, Andrey Albershteyn wrote:
> Add flag to mark inodes which have fs-verity enabled on them (i.e.
> descriptor exist and tree is built).
.....
> 
>  static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
>  {
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index f08a2d5f96ad4..8d9c9697d3619 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -636,6 +636,8 @@ xfs_ip2xflags(
>  			flags |= FS_XFLAG_DAX;
>  		if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
>  			flags |= FS_XFLAG_COWEXTSIZE;
> +		if (ip->i_diflags2 & XFS_DIFLAG2_VERITY)
> +			flags |= FS_VERITY_FL;
>  	}

Ah, attribute flag confusion - easy to do. xflags (FS_XFLAG*) are a
different set of (extended) flags than the standard VFS inode flags
(FS_*_FL).

To place the verity enabled state in the extended flags, you would
need to define FS_XFLAG_VERITY in include/uapi/linux/fs.h. You'll
also need to add the conversion from FS_VERITY_FL to FS_XFLAG_VERITY
to fileattr_fill_flags(), and vice versa to fileattr_fill_xflags().

This will allow both the VFS inode flags UAPI and the
FS_IOC_FSGETXATTR extended flag API to see the inode has verity
enabled on it.

Once FS_XFLAG_VERITY is defined, changing the code in XFS to use it
directly instead of FS_VERITY_FL will result in everything working
correctly throughout the code.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC PATCH 06/11] xfs: initialize fs-verity on file open and cleanup on inode destruction
  2022-12-13 17:29 ` [RFC PATCH 06/11] xfs: initialize fs-verity on file open and cleanup on inode destruction Andrey Albershteyn
@ 2022-12-14  1:35   ` Dave Chinner
  2022-12-14  5:25     ` Eric Biggers
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Chinner @ 2022-12-14  1:35 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 06:29:30PM +0100, Andrey Albershteyn wrote:
> fs-verity will read and attach metadata (not the tree itself) from
> a disk for those inodes which already have fs-verity enabled.
> 
> Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> ---
>  fs/xfs/xfs_file.c  | 8 ++++++++
>  fs/xfs/xfs_super.c | 2 ++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 242165580e682..5eadd9a37c50e 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -32,6 +32,7 @@
>  #include <linux/mman.h>
>  #include <linux/fadvise.h>
>  #include <linux/mount.h>
> +#include <linux/fsverity.h>
>  
>  static const struct vm_operations_struct xfs_file_vm_ops;
>  
> @@ -1170,9 +1171,16 @@ xfs_file_open(
>  	struct inode	*inode,
>  	struct file	*file)
>  {
> +	int		error = 0;
> +
>  	if (xfs_is_shutdown(XFS_M(inode->i_sb)))
>  		return -EIO;
>  	file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC;
> +
> +	error = fsverity_file_open(inode, file);
> +	if (error)
> +		return error;

This is a hot path, so shouldn't we elide the function call
altogether if verity is not enabled on the inode? i.e:

	if (IS_VERITY(inode)) {
		error = fsverity_file_open(inode, file);
		if (error)
			return error;
	}

It doesn't really matter for a single file open, but when you're
opening a few million inodes every second the function call overhead
only to immediately return because IS_VERITY() is false adds up...

>  	return generic_file_open(inode, file);
>  }
>  
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 8f1e9b9ed35d9..50c2c819ba940 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -45,6 +45,7 @@
>  #include <linux/magic.h>
>  #include <linux/fs_context.h>
>  #include <linux/fs_parser.h>
> +#include <linux/fsverity.h>
>  
>  static const struct super_operations xfs_super_operations;
>  
> @@ -647,6 +648,7 @@ xfs_fs_destroy_inode(
>  	ASSERT(!rwsem_is_locked(&inode->i_rwsem));
>  	XFS_STATS_INC(ip->i_mount, vn_rele);
>  	XFS_STATS_INC(ip->i_mount, vn_remove);
> +	fsverity_cleanup_inode(inode);

Similarly, shouldn't this be:

	if (fsverity_active(inode))
		fsverity_cleanup_inode(inode);

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC PATCH 07/11] xfs: disable direct read path for fs-verity sealed files
  2022-12-13 17:29 ` [RFC PATCH 07/11] xfs: disable direct read path for fs-verity sealed files Andrey Albershteyn
@ 2022-12-14  2:07   ` Dave Chinner
  2022-12-14  5:44     ` Eric Biggers
  2022-12-23 16:18   ` Christoph Hellwig
  1 sibling, 1 reply; 51+ messages in thread
From: Dave Chinner @ 2022-12-14  2:07 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 06:29:31PM +0100, Andrey Albershteyn wrote:
> The direct path is not supported on verity files. Attempts to use direct
> I/O path on such files should fall back to buffered I/O path.
> 
> Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> ---
>  fs/xfs/xfs_file.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 5eadd9a37c50e..fb4181e38a19d 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -245,7 +245,8 @@ xfs_file_dax_read(
>  	struct kiocb		*iocb,
>  	struct iov_iter		*to)
>  {
> -	struct xfs_inode	*ip = XFS_I(iocb->ki_filp->f_mapping->host);
> +	struct inode		*inode = iocb->ki_filp->f_mapping->host;
> +	struct xfs_inode	*ip = XFS_I(inode);
>  	ssize_t			ret = 0;
>  
>  	trace_xfs_file_dax_read(iocb, to);
> @@ -298,10 +299,17 @@ xfs_file_read_iter(
>  
>  	if (IS_DAX(inode))
>  		ret = xfs_file_dax_read(iocb, to);

fsverity is supported on DAX?

Eric, I was under the impression that the DAX io path does not
support fsverity, but I can't see anything that prevents ext4 from
using fsverity on dax enabled filesystems. Does this work (is it
tested regularly?), or is the lack of checking simply an oversight
in that nobody thought to check DAX status when fsverity is enabled?

> -	else if (iocb->ki_flags & IOCB_DIRECT)
> +	else if (iocb->ki_flags & IOCB_DIRECT && !fsverity_active(inode))
>  		ret = xfs_file_dio_read(iocb, to);
> -	else
> +	else {
> +		/*
> +		 * In case fs-verity is enabled, we also fallback to the
> +		 * buffered read from the direct read path. Therefore,
> +		 * IOCB_DIRECT is set and need to be cleared
> +		 */
> +		iocb->ki_flags &= ~IOCB_DIRECT;
>  		ret = xfs_file_buffered_read(iocb, to);
> +	}

Is this IOCB_DIRECT avoidance a limitation of the XFS
implementation, or a generic limitation of the fsverity
infrastructure?

If it's a limitation of the fsverity infrastructure, then we
shouldn't be working around this in every single filesystem that
supports fsverity.  If all the major filesystems are having to check
fsverity_active() and clear IOCB_DIRECT on every single IOCB_DIRECT
IO that is issued on a fsverity inode, then shouldn't we just elide
IOCB_DIRECT from file->f_iocb_flags in the first place?

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC PATCH 08/11] xfs: don't enable large folios on fs-verity sealed inode
  2022-12-13 17:29 ` [RFC PATCH 08/11] xfs: don't enable large folios on fs-verity sealed inode Andrey Albershteyn
@ 2022-12-14  2:07   ` Dave Chinner
  0 siblings, 0 replies; 51+ messages in thread
From: Dave Chinner @ 2022-12-14  2:07 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 06:29:32PM +0100, Andrey Albershteyn wrote:
> fs-verity doesn't work with large folios. Don't enable large folios
> on those inode which are already sealed with fs-verity (indicated by
> diflag).
> 
> Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> ---
>  fs/xfs/xfs_iops.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index b229d25c1c3d6..a4c8db588690e 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1294,7 +1294,12 @@ xfs_setup_inode(
>  	gfp_mask = mapping_gfp_mask(inode->i_mapping);
>  	mapping_set_gfp_mask(inode->i_mapping, (gfp_mask & ~(__GFP_FS)));
>  
> -	mapping_set_large_folios(inode->i_mapping);
> +	/*
> +	 * As fs-verity doesn't support folios so far, we won't enable them on
> +	 * sealed inodes
> +	 */
> +	if (!IS_VERITY(inode))
> +		mapping_set_large_folios(inode->i_mapping);
>  
>  	/*
>  	 * If there is no attribute fork no ACL can exist on this inode,

Looks reasonable to me.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC PATCH 06/11] xfs: initialize fs-verity on file open and cleanup on inode destruction
  2022-12-14  1:35   ` Dave Chinner
@ 2022-12-14  5:25     ` Eric Biggers
  2022-12-14  8:18       ` Dave Chinner
  0 siblings, 1 reply; 51+ messages in thread
From: Eric Biggers @ 2022-12-14  5:25 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Andrey Albershteyn, linux-xfs, linux-fsdevel

On Wed, Dec 14, 2022 at 12:35:24PM +1100, Dave Chinner wrote:
> On Tue, Dec 13, 2022 at 06:29:30PM +0100, Andrey Albershteyn wrote:
> > fs-verity will read and attach metadata (not the tree itself) from
> > a disk for those inodes which already have fs-verity enabled.
> > 
> > Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> > ---
> >  fs/xfs/xfs_file.c  | 8 ++++++++
> >  fs/xfs/xfs_super.c | 2 ++
> >  2 files changed, 10 insertions(+)
> > 
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index 242165580e682..5eadd9a37c50e 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -32,6 +32,7 @@
> >  #include <linux/mman.h>
> >  #include <linux/fadvise.h>
> >  #include <linux/mount.h>
> > +#include <linux/fsverity.h>
> >  
> >  static const struct vm_operations_struct xfs_file_vm_ops;
> >  
> > @@ -1170,9 +1171,16 @@ xfs_file_open(
> >  	struct inode	*inode,
> >  	struct file	*file)
> >  {
> > +	int		error = 0;
> > +
> >  	if (xfs_is_shutdown(XFS_M(inode->i_sb)))
> >  		return -EIO;
> >  	file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC;
> > +
> > +	error = fsverity_file_open(inode, file);
> > +	if (error)
> > +		return error;
> 
> This is a hot path, so shouldn't we elide the function call
> altogether if verity is not enabled on the inode? i.e:
> 
> 	if (IS_VERITY(inode)) {
> 		error = fsverity_file_open(inode, file);
> 		if (error)
> 			return error;
> 	}
> 
> It doesn't really matter for a single file open, but when you're
> opening a few million inodes every second the function call overhead
> only to immediately return because IS_VERITY() is false adds up...
> 
> >  	return generic_file_open(inode, file);
> >  }
> >  
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 8f1e9b9ed35d9..50c2c819ba940 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -45,6 +45,7 @@
> >  #include <linux/magic.h>
> >  #include <linux/fs_context.h>
> >  #include <linux/fs_parser.h>
> > +#include <linux/fsverity.h>
> >  
> >  static const struct super_operations xfs_super_operations;
> >  
> > @@ -647,6 +648,7 @@ xfs_fs_destroy_inode(
> >  	ASSERT(!rwsem_is_locked(&inode->i_rwsem));
> >  	XFS_STATS_INC(ip->i_mount, vn_rele);
> >  	XFS_STATS_INC(ip->i_mount, vn_remove);
> > +	fsverity_cleanup_inode(inode);
> 
> Similarly, shouldn't this be:
> 
> 	if (fsverity_active(inode))
> 		fsverity_cleanup_inode(inode);
> 

If you actually want to do that, then we should instead make these functions
inline functions that do the "is anything needed?" check, then call a
double-underscored version that does the actual work.  Some of the fscrypt
functions are like that.  Then all filesystems would get the benefit.

Funnily enough, I had actually wanted to do that for fsverity_file_open()
originally, but Ted had preferred the simpler version.

Anyway, if this is something you want, I can change it to be that way.

- Eric


* Re: [RFC PATCH 09/11] iomap: fs-verity verification on page read
  2022-12-13 17:29 ` [RFC PATCH 09/11] iomap: fs-verity verification on page read Andrey Albershteyn
  2022-12-13 19:02   ` Eric Biggers
@ 2022-12-14  5:43   ` Dave Chinner
  1 sibling, 0 replies; 51+ messages in thread
From: Dave Chinner @ 2022-12-14  5:43 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 06:29:33PM +0100, Andrey Albershteyn wrote:
> Add fs-verity page verification in read IO path. The verification
> itself is offloaded into workqueue (provided by fs-verity).
> 
> The work_struct items are allocated from bioset side by side with
> bio being processed.
> 
> As inodes with fs-verity doesn't use large folios we check only
> first page of the folio for errors (set by fs-verity if verification
> failed).
> 
> Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> ---
>  fs/iomap/buffered-io.c | 80 +++++++++++++++++++++++++++++++++++++++---
>  include/linux/iomap.h  |  5 +++
>  2 files changed, 81 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 91ee0b308e13d..b7abc2f806cfc 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -17,6 +17,7 @@
>  #include <linux/bio.h>
>  #include <linux/sched/signal.h>
>  #include <linux/migrate.h>
> +#include <linux/fsverity.h>
>  #include "trace.h"
>  
>  #include "../internal.h"
> @@ -42,6 +43,7 @@ static inline struct iomap_page *to_iomap_page(struct folio *folio)
>  }
>  
>  static struct bio_set iomap_ioend_bioset;
> +static struct bio_set iomap_readend_bioset;
>  
>  static struct iomap_page *
>  iomap_page_create(struct inode *inode, struct folio *folio, unsigned int flags)
> @@ -189,9 +191,39 @@ static void iomap_read_end_io(struct bio *bio)
>  	int error = blk_status_to_errno(bio->bi_status);
>  	struct folio_iter fi;
>  
> -	bio_for_each_folio_all(fi, bio)
> +	bio_for_each_folio_all(fi, bio) {
> +		/*
> +		 * As fs-verity doesn't work with multi-page folios, verity
> +		 * inodes have large folios disabled (only single page folios
> +		 * are used)
> +		 */
> +		if (!error)
> +			error = PageError(folio_page(fi.folio, 0));
> +
>  		iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
> +	}
> +
>  	bio_put(bio);
> +	/* The iomap_readend has been freed by bio_put() */
> +}
> +
> +static void iomap_read_work_end_io(
> +	struct work_struct *work)
> +{
> +	struct iomap_readend *ctx =
> +		container_of(work, struct iomap_readend, read_work);
> +	struct bio *bio = &ctx->read_inline_bio;
> +
> +	fsverity_verify_bio(bio);
> +	iomap_read_end_io(bio);
> +}
> +
> +static void iomap_read_work_io(struct bio *bio)
> +{
> +	struct iomap_readend *ctx =
> +		container_of(bio, struct iomap_readend, read_inline_bio);
> +
> +	fsverity_enqueue_verify_work(&ctx->read_work);
>  }

Ok, so fsverity simple queues this to a global workqueue it has set
up as WQ_UNBOUND | WQ_HIGHPRI with, effectively, one worker thread
per CPU. This will work for simple cases and to get the basic
infrastructure in place, but there are problems here that we will
need to address.

1. High priority workqueues are used within XFS to ensure that data
IO completion cannot stall processing of journal IO completions. We
use them to prevent IO priority inversion deadlocks.

That is, journal IO completions use a WQ_HIGHPRI workqueue to ensure
that they are scheduled ahead of data IO completions as data IO
completions may require journal IO to complete before they can make
progress (e.g. to ensure transaction reservations in IO completion
can make progress).

Hence using a WQ_HIGHPRI workqueue directly in the user data IO path
is a potential filesystem livelock/deadlock vector. At best, it's
going to impact on the latency of journal commits and hence anything
that is running data integrity operations whilst fsverity read
operations are in progress.

The second thing is that the fsverity workqueue is global - it
creates a cross-filesystem contention point. That means a single
busy fsverity filesystem will create unpredictable IO latency for
filesystems that only use fsverity sparingly. i.e. heavy amounts of
fsverity work on one filesystem will impact the performance of
journal and other IO operations on all other active filesystems due
to fsverity using WQ_HIGHPRI.

This sort of workqueue setup is typically fine for a single user
device that has lots of otherwise idle CPU that can be co-opted to
create concurrency in IO completion work where there would otherwise
be none (e.g. an Android phone).

However, it is less than ideal for a highly concurrent application
using AIO or io_uring to push a couple of million IOPS through the
filesystem. In these situations, we don't want IO completion work to
be spread out because there is no idle CPU for them to be run on.
Instead, locality is king - we want them run on the same CPU that
already has the inode, bio, and other objects hot in the cache.

So, really, for XFS we want per-filesystem, locality preserving
per-cpu workqueues for fsverity work as we get IO concurrency (and
therefore fsverity work concurrency) from the application IO
patterns. And it avoids all potential issues with using WQ_HIGHPRI
for journal IO to avoid data IO completion deadlocks.


>  struct iomap_readpage_ctx {
> @@ -264,6 +296,7 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
>  	loff_t orig_pos = pos;
>  	size_t poff, plen;
>  	sector_t sector;
> +	struct iomap_readend *readend;
>  
>  	if (iomap->type == IOMAP_INLINE)
>  		return iomap_read_inline_data(iter, folio);
> @@ -276,7 +309,21 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
>  
>  	if (iomap_block_needs_zeroing(iter, pos)) {
>  		folio_zero_range(folio, poff, plen);
> -		iomap_set_range_uptodate(folio, iop, poff, plen);
> +		if (!fsverity_active(iter->inode)) {
> +			iomap_set_range_uptodate(folio, iop, poff, plen);
> +			goto done;
> +		}
> +
> +		/*
> +		 * As fs-verity doesn't work with folios sealed inodes have
> +		 * multi-page folios disabled and we can check on first and only
> +		 * page
> +		 */
> +		if (fsverity_verify_page(folio_page(folio, 0)))
> +			iomap_set_range_uptodate(folio, iop, poff, plen);
> +		else
> +			folio_set_error(folio);

This makes me wonder if this should be an iomap->page_ops method
rather than calling fsverity code directly. i.e. the filesystem
knows that fsverity is enabled on the inode during its
iomap_begin() callout, and that state *must not change* until the IO
is complete. Hence it can set an iomap flag saying
IOMAP_F_READ_VERIFY and add a page_ops vector with a "verify_folio"
callout. Then this code can do:

	if (iomap_block_needs_zeroing(iter, pos)) {
		folio_zero_range(folio, poff, plen);
		if (iomap->flags & IOMAP_F_READ_VERIFY) {
			if (!iomap->page_ops->verify_folio(folio, poff, plen)) {
				folio_set_error(folio);
				goto done;
			}
		}
		iomap_set_range_uptodate(folio, iop, poff, plen);
		goto done;
	}

> @@ -297,8 +344,18 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
>  
>  		if (ctx->rac) /* same as readahead_gfp_mask */
>  			gfp |= __GFP_NORETRY | __GFP_NOWARN;
> -		ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
> +		if (fsverity_active(iter->inode)) {
> +			ctx->bio = bio_alloc_bioset(iomap->bdev,
> +					bio_max_segs(nr_vecs), REQ_OP_READ,
> +					GFP_NOFS, &iomap_readend_bioset);
> +			readend = container_of(ctx->bio,
> +					struct iomap_readend,
> +					read_inline_bio);
> +			INIT_WORK(&readend->read_work, iomap_read_work_end_io);
> +		} else {
> +			ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
>  				     REQ_OP_READ, gfp);
> +		}
>  		/*
>  		 * If the bio_alloc fails, try it again for a single page to
>  		 * avoid having to deal with partial page reads.  This emulates
> @@ -311,7 +368,11 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
>  		if (ctx->rac)
>  			ctx->bio->bi_opf |= REQ_RAHEAD;
>  		ctx->bio->bi_iter.bi_sector = sector;
> -		ctx->bio->bi_end_io = iomap_read_end_io;
> +		if (fsverity_active(iter->inode))
> +			ctx->bio->bi_end_io = iomap_read_work_io;
> +		else
> +			ctx->bio->bi_end_io = iomap_read_end_io;


Hmmm. OK, why not just always use the iomap_readend_bioset, put
a flags field in it and then do this check in iomap_read_end_io()?

i.e.

struct iomap_read_ioend {
	bool			queue_work;
	struct work_struct	work;	/* post read work (fs-verity) */
	struct bio		inline_bio;/* MUST BE LAST! */
};


Then always allocate from the iomap_read_ioend_bioset, and do:

		ctx->bio->bi_end_io = iomap_read_end_io;
		if (iomap->flags & IOMAP_F_VERITY) {
			read_ioend->queue_work = true;
			INIT_WORK(&read_ioend->work, iomap_read_ioend_work);
		}

Then in iomap_read_end_io():

	if (read_ioend->queue_work) {
		fsverity_enqueue_verify_work(&read_ioend->work);
		return;
	}
	iomap_read_end_io(bio);

And if we put a fs supplied workqueue into the iomap as well, then
we have a mechanism for the filesystem to supply its own workqueue
for fsverity, fscrypt, or any other read-side post-IO data processing
functions filesystems care to add.
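Putting those pieces together, the completion-side dispatch might look
something like this (a sketch only: the `wq` field and the
`__iomap_read_end_io()` helper are illustrative names, not existing
iomap API; only the overall shape follows the proposal above):

```c
/*
 * Sketch of a generalised read-side post-processing hook.
 * The wq field and __iomap_read_end_io() helper are hypothetical;
 * the structure layout matches the proposal above.
 */
struct iomap_read_ioend {
	struct workqueue_struct	*wq;		/* fs-supplied, e.g. for fs-verity */
	bool			queue_work;	/* post-read processing needed? */
	struct work_struct	work;
	struct bio		inline_bio;	/* MUST BE LAST! */
};

static void iomap_read_end_io(struct bio *bio)
{
	struct iomap_read_ioend *ioend =
		container_of(bio, struct iomap_read_ioend, inline_bio);

	if (ioend->queue_work) {
		/* defer verification (fs-verity, fscrypt, ...) to the fs wq */
		queue_work(ioend->wq, &ioend->work);
		return;
	}
	__iomap_read_end_io(bio);	/* the normal in-line completion */
}
```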

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH 07/11] xfs: disable direct read path for fs-verity sealed files
  2022-12-14  2:07   ` Dave Chinner
@ 2022-12-14  5:44     ` Eric Biggers
  0 siblings, 0 replies; 51+ messages in thread
From: Eric Biggers @ 2022-12-14  5:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Andrey Albershteyn, linux-xfs, linux-fsdevel

On Wed, Dec 14, 2022 at 01:07:15PM +1100, Dave Chinner wrote:
> On Tue, Dec 13, 2022 at 06:29:31PM +0100, Andrey Albershteyn wrote:
> > The direct path is not supported on verity files. Attempts to use direct
> > I/O path on such files should fall back to buffered I/O path.
> > 
> > Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> > ---
> >  fs/xfs/xfs_file.c | 14 +++++++++++---
> >  1 file changed, 11 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index 5eadd9a37c50e..fb4181e38a19d 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -245,7 +245,8 @@ xfs_file_dax_read(
> >  	struct kiocb		*iocb,
> >  	struct iov_iter		*to)
> >  {
> > -	struct xfs_inode	*ip = XFS_I(iocb->ki_filp->f_mapping->host);
> > +	struct inode		*inode = iocb->ki_filp->f_mapping->host;
> > +	struct xfs_inode	*ip = XFS_I(inode);
> >  	ssize_t			ret = 0;
> >  
> >  	trace_xfs_file_dax_read(iocb, to);
> > @@ -298,10 +299,17 @@ xfs_file_read_iter(
> >  
> >  	if (IS_DAX(inode))
> >  		ret = xfs_file_dax_read(iocb, to);
> 
> fsverity is supported on DAX?
> 
> Eric, I was under the impression that the DAX io path does not
> support fsverity, but I can't see anything that prevents ext4 from
> using fsverity on dax enabled filesystems. Does this work (is it
> tested regularly?), or is the lack of checking simply an oversight
> in that nobody thought to check DAX status when fsverity is enabled?

DAX and fsverity are mutually exclusive.  ext4_set_inode_flags() doesn't set the
DAX flag if the inode already has the verity flag, and
ext4_begin_enable_verity() doesn't allow setting the verity flag if the inode
already has the DAX flag.

> 
> > -	else if (iocb->ki_flags & IOCB_DIRECT)
> > +	else if (iocb->ki_flags & IOCB_DIRECT && !fsverity_active(inode))
> >  		ret = xfs_file_dio_read(iocb, to);
> > -	else
> > +	else {
> > +		/*
> > +		 * In case fs-verity is enabled, we also fallback to the
> > +		 * buffered read from the direct read path. Therefore,
> > +		 * IOCB_DIRECT is set and need to be cleared
> > +		 */
> > +		iocb->ki_flags &= ~IOCB_DIRECT;
> >  		ret = xfs_file_buffered_read(iocb, to);
> > +	}
> 
> Is this IOCB_DIRECT avoidance a limitation of the XFS
> implementation, or a generic limitation of the fsverity
> infrastructure?
> 
> If it's a limitation of the fsverity infrastructure, then we
> shouldn't be working around this in every single filesystem that
> supports fsverity.  If all the major filesystems are having to check
> fsverity_active() and clear IOCB_DIRECT on every single IOCB_DIRECT
> IO that is issued on a fsverity inode, then shouldn't we just elide
> IOCB_DIRECT from file->f_iocb_flags in the first place?

It's mainly a filesystem limitation, not a fs/verity/ limitation.  However, the
functions in fs/verity/verify.c do assume that the data pages are page cache
pages.  To allow filesystems to support direct I/O on verity files, functions
that take the inode and file offset explicitly would need to be added.

Not setting IOCB_DIRECT in ->f_iocb_flags is an interesting idea.  I've been
trying not to add fscrypt and fsverity stuff to the core VFS syscall paths,
since only certain filesystems support these features, so it makes sense to
limit to the overhead (however minimal) to those filesystems only.  However,
since ->f_iocb_flags was recently added to cache iocb_flags(), it does look like
the VFS could check IS_VERITY() in iocb_flags() with minimal overhead.

A potential issue is that if a file is opened with O_DIRECT and then
FS_IOC_ENABLE_VERITY is run (either on that fd or on a different fd), then the
O_DIRECT fd would still exist -- with IOCB_DIRECT in ->f_iocb_flags.

The read-time check would be needed to correctly handle that case.
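For illustration, the VFS-side elision being discussed might look like
this (a sketch only: iocb_flags() is real VFS code, but the IS_VERITY()
clause is the hypothetical addition, and the surrounding flag
translations are elided):

```c
/*
 * Hypothetical change to iocb_flags(): don't surface IOCB_DIRECT
 * for verity inodes at open time. The IS_VERITY() clause is the
 * sketched addition, not mainline code.
 */
static inline int iocb_flags(struct file *file)
{
	int flags = 0;

	if ((file->f_flags & O_DIRECT) &&
	    !IS_VERITY(file_inode(file)))
		flags |= IOCB_DIRECT;
	/* ... remaining O_APPEND/O_DSYNC/etc. translations unchanged ... */
	return flags;
}
```

Even with this, an fd opened with O_DIRECT before FS_IOC_ENABLE_VERITY
runs would still carry IOCB_DIRECT, so the per-read check remains
necessary.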

- Eric

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH 00/11] fs-verity support for XFS
  2022-12-13 22:11   ` Dave Chinner
@ 2022-12-14  6:31     ` Eric Biggers
  2022-12-14 23:06       ` Dave Chinner
  0 siblings, 1 reply; 51+ messages in thread
From: Eric Biggers @ 2022-12-14  6:31 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Andrey Albershteyn, linux-xfs, linux-fsdevel

On Wed, Dec 14, 2022 at 09:11:39AM +1100, Dave Chinner wrote:
> On Tue, Dec 13, 2022 at 12:50:28PM -0800, Eric Biggers wrote:
> > On Tue, Dec 13, 2022 at 06:29:24PM +0100, Andrey Albershteyn wrote:
> > > Not yet implemented:
> > > - No pre-fetching of Merkle tree pages in the
> > >   read_merkle_tree_page()
> > 
> > This would be helpful, but not essential.
> > 
> > > - No marking of already verified Merkle tree pages (each read, the
> > >   whole tree is verified).
> 
> Ah, I wasn't aware that this was missing.
> 
> > 
> > This is essential to have, IMO.
> > 
> > You *could* do what btrfs does, where it caches the Merkle tree pages in the
> > inode's page cache past i_size, even though btrfs stores the Merkle tree
> > separately from the file data on-disk.
> >
> > However, I'd guess that the other XFS developers would have an aversion to that
> > approach, even though it would not affect the on-disk storage.
> 
> Yup, on an architectural level it just seems wrong to cache secure
> verification metadata in the same user accessible address space as
> the data it verifies.
> 
> > The alternatives would be to create a separate in-memory-only inode for the
> > cache, or to build a custom cache with its own shrinker.
> 
> The Merkle tree blocks are cached in the XFS buffer cache.
> 
> Andrey, could we just add a new flag to the xfs_buf->b_flags to
> indicate that the buffer contains verified merkle tree records?
> i.e. if it's not set after we've read the buffer, we verify the
> buffer and set the flag on the cached buffer; if the flag is
> already set, we can skip the verification?

Well, my proposal at
https://lore.kernel.org/r/20221028224539.171818-2-ebiggers@kernel.org is to keep
tracking the "verified" status at the individual Merkle tree block level, by
adding a bitmap fsverity_info::hash_block_verified.  That is part of the
fs/verity/ infrastructure, and all filesystems would be able to use it.

However, since it's necessary to re-verify blocks that have been evicted and
then re-instantiated, my patch also repurposes PG_checked as an indicator for
whether the Merkle tree pages are newly instantiated.  For a "non-page-cache
cache", that part would need to be replaced with something equivalent.

A different approach would be to make it so that every time a page (or "cache
buffer", to call it something more generic) of N Merkle tree blocks is read,
then all N of those blocks are verified immediately.  Then there would be no
need to track the "verified" status of individual blocks.

My concerns with that approach are:

  * Most data reads only need a single Merkle tree block at the deepest level.
    If at least N tree blocks were verified any time that any were verified at
    all, that would make the worst-case read latency worse.

  * It's possible that the parents of N tree blocks are split across a cache
    buffer boundary.  Thus, while N blocks can't have more than N parents, and in
    practice would just have 1-2, those 2 parents could be split into two
    separate cache buffers, with a total length of 2*N.  Verifying all of those
    would really increase the worst-case latency as well.

So I'm thinking that tracking the "verified" status of tree blocks individually
is the right way to go.  But I'd appreciate any other thoughts on this.

- Eric

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH 02/11] pagemap: add mapping_clear_large_folios() wrapper
  2022-12-13 21:10       ` Dave Chinner
@ 2022-12-14  6:52         ` Eric Biggers
  2022-12-14  8:12           ` Dave Chinner
  0 siblings, 1 reply; 51+ messages in thread
From: Eric Biggers @ 2022-12-14  6:52 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Matthew Wilcox, Andrey Albershteyn, linux-xfs, linux-fsdevel

On Wed, Dec 14, 2022 at 08:10:10AM +1100, Dave Chinner wrote:
> On Tue, Dec 13, 2022 at 11:33:54AM -0800, Eric Biggers wrote:
> > On Tue, Dec 13, 2022 at 05:55:22PM +0000, Matthew Wilcox wrote:
> > > I'm happy to work with you to add support for large folios to verity.
> > > It hasn't been high priority for me, but I'm now working on folio support
> > > for bufferhead filesystems and this would probably fit in.
> > 
> > I'd be very interested to know what else is needed after commit 98dc08bae678
> > ("fsverity: stop using PG_error to track error status") which is upstream now,
> > and
> > https://lore.kernel.org/linux-fsdevel/20221028224539.171818-1-ebiggers@kernel.org/T/#u
> > ("fsverity: support for non-4K pages") which is planned for 6.3.
> 
> Did you change the bio interfaces to iterate a bvec full of
> variable sized folios, or does it still expect a bio to only have
> PAGE_SIZE pages attached to it?
> 

You can take a look at fsverity_verify_bio() with
https://lore.kernel.org/r/20221028224539.171818-2-ebiggers@kernel.org applied.
It uses bio_for_each_segment_all() to iterate through the bio's segments.  For
each segment, it verifies each data block in the segment, assuming bv_len and
bv_offset are multiples of the Merkle tree block size.  The file position of
each data block is computed as '(page->index << PAGE_SHIFT) + bv_offset'.

I suppose the issue is going to be that only the first page of a folio actually
has an index.  Using bio_for_each_folio_all() would avoid this problem, I think?

- Eric

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH 10/11] xfs: add fs-verity support
  2022-12-13 17:29 ` [RFC PATCH 10/11] xfs: add fs-verity support Andrey Albershteyn
  2022-12-13 19:08   ` Eric Biggers
@ 2022-12-14  7:58   ` Dave Chinner
  1 sibling, 0 replies; 51+ messages in thread
From: Dave Chinner @ 2022-12-14  7:58 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 06:29:34PM +0100, Andrey Albershteyn wrote:
> Add integration with fs-verity. XFS stores fs-verity metadata in
> extended attributes. The metadata consists of the verity descriptor
> and Merkle tree pages.
> 
> The descriptor is stored under "verity_descriptor" extended
> attribute. The Merkle tree pages are stored under binary indexes.
> 
> When fs-verity is enabled on an inode, the XFS_IVERITY flag is set
> meaning that the Merkle tree is being built. Then, pagecache is
> flushed and large folios are disabled as these aren't yet supported
> by fs-verity. This is done in xfs_begin_enable_verity() to make sure
> that fs-verity operations on the inode don't populate cache with
> large folios during a tree build. The initialization ends with
> storing of verity descriptor and setting inode on-disk flag
> (XFS_DIFLAG2_VERITY).
> 
> Also add check that block size == PAGE_SIZE as fs-verity doesn't
> support different sizes yet.
> 
> Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> ---
>  fs/xfs/Makefile          |   1 +
>  fs/xfs/libxfs/xfs_attr.c |   8 ++
>  fs/xfs/xfs_inode.h       |   1 +
>  fs/xfs/xfs_super.c       |  10 ++
>  fs/xfs/xfs_verity.c      | 203 +++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_verity.h      |  19 ++++
>  6 files changed, 242 insertions(+)
>  create mode 100644 fs/xfs/xfs_verity.c
>  create mode 100644 fs/xfs/xfs_verity.h
> 
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index 42d0496fdad7d..5afa8ae5b3b7f 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -131,6 +131,7 @@ xfs-$(CONFIG_XFS_POSIX_ACL)	+= xfs_acl.o
>  xfs-$(CONFIG_SYSCTL)		+= xfs_sysctl.o
>  xfs-$(CONFIG_COMPAT)		+= xfs_ioctl32.o
>  xfs-$(CONFIG_EXPORTFS_BLOCK_OPS)	+= xfs_pnfs.o
> +xfs-$(CONFIG_FS_VERITY)		+= xfs_verity.o
>  
>  # notify failure
>  ifeq ($(CONFIG_MEMORY_FAILURE),y)
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 57080ea4c869b..42013fc99b76a 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -26,6 +26,7 @@
>  #include "xfs_trace.h"
>  #include "xfs_attr_item.h"
>  #include "xfs_xattr.h"
> +#include "xfs_verity.h"
>  
>  struct kmem_cache		*xfs_attr_intent_cache;
>  
> @@ -1632,6 +1633,13 @@ xfs_attr_namecheck(
>  		return xfs_verify_pptr(mp, (struct xfs_parent_name_rec *)name);
>  	}
>  
> +	if (flags & XFS_ATTR_VERITY) {
> +		if (length != sizeof(__be64) &&
> +				length != XFS_VERITY_DESCRIPTOR_NAME_LEN)

This needs a comment describing what it is checking as it is not
obvious from reading the code. I can grok what the
XFS_VERITY_DESCRIPTOR_NAME_LEN check is about, but the sizeof()
check is not obvious.

I also suspect it is also better to explicitly check for valid
values rather than invalid values. i.e.

		/* describe what name is 8 bytes in length */
		if (length == sizeof(__be64))
			return true;
		/* Verity descriptor blocks are held in a named attribute. */
		if (length == XFS_VERITY_DESCRIPTOR_NAME_LEN)
			return true;
		/* Not a valid verity attribute name length */
		return false;


> +			return false;
> +		return true;
> +	}
> +
>  	return xfs_str_attr_namecheck(name, length);
>  }
>  
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 5735de32beebd..070631adac572 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -325,6 +325,7 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
>   * plain old IRECLAIMABLE inode.
>   */
>  #define XFS_INACTIVATING	(1 << 13)
> +#define XFS_IVERITY		(1 << 14) /* merkle tree is in progress */

Does this flag mean the inode has verity information on it (as the
name suggests) or that the inode is currently being measured and the
merkle tree is being built (as the comment suggests)?
>  
>  /* All inode state flags related to inode reclaim. */
>  #define XFS_ALL_IRECLAIM_FLAGS	(XFS_IRECLAIMABLE | \
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 50c2c819ba940..a3c89d2c06a8a 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -41,6 +41,7 @@
>  #include "xfs_attr_item.h"
>  #include "xfs_xattr.h"
>  #include "xfs_iunlink_item.h"
> +#include "xfs_verity.h"
>  
>  #include <linux/magic.h>
>  #include <linux/fs_context.h>
> @@ -1469,6 +1470,9 @@ xfs_fs_fill_super(
>  	sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP | QTYPE_MASK_PRJ;
>  #endif
>  	sb->s_op = &xfs_super_operations;
> +#ifdef CONFIG_FS_VERITY
> +	sb->s_vop = &xfs_verity_ops;
> +#endif
>  
>  	/*
>  	 * Delay mount work if the debug hook is set. This is debug
> @@ -1669,6 +1673,12 @@ xfs_fs_fill_super(
>  		xfs_alert(mp,
>  	"EXPERIMENTAL parent pointer feature enabled. Use at your own risk!");
>  
> +	if (xfs_has_verity(mp) && mp->m_super->s_blocksize != PAGE_SIZE) {
> +		xfs_alert(mp,
> +			"Cannot use fs-verity with block size != PAGE_SIZE");
> +		goto out_filestream_unmount;
> +	}

Needs an EXPERIMENTAL warning to be emitted here, just like the
parent pointer feature check above.

> +
>  	error = xfs_mountfs(mp);
>  	if (error)
>  		goto out_filestream_unmount;
> diff --git a/fs/xfs/xfs_verity.c b/fs/xfs/xfs_verity.c
> new file mode 100644
> index 0000000000000..112a72d0b0ca7
> --- /dev/null
> +++ b/fs/xfs/xfs_verity.c
> @@ -0,0 +1,203 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2022 Red Hat, Inc.
> + */
> +#include "xfs.h"
> +#include "xfs_shared.h"
> +#include "xfs_format.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_inode.h"
> +#include "xfs_attr.h"
> +#include "xfs_verity.h"
> +#include "xfs_bmap_util.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +
> +static int
> +xfs_get_verity_descriptor(
> +	struct inode		*inode,
> +	void			*buf,
> +	size_t			buf_size)

So what is the API being implemented here? It passes in a NULL *buf
and buf_size = 0 to query the size of the verity descriptor?

And then allocates a buffer the size returned and calls again with
*buf of the right size and buf_size = the right size?

> +{
> +	struct xfs_inode	*ip = XFS_I(inode);
> +	int			error = 0;
> +	struct xfs_da_args	args = {
> +		.dp		= ip,
> +		.attr_filter	= XFS_ATTR_VERITY,
> +		.name		= (const uint8_t *)XFS_VERITY_DESCRIPTOR_NAME,
> +		.namelen	= XFS_VERITY_DESCRIPTOR_NAME_LEN,
> +		.valuelen	= buf_size,
> +	};
> +
> +	error = xfs_attr_get(&args);
> +	if (error)
> +		return error;
> +
> +	if (buf_size == 0)
> +		return args.valuelen;
> +
> +	if (args.valuelen > buf_size) {
> +		kmem_free(args.value);
> +		return -ERANGE;
> +	}

Ok, this won't happen. If the provided value length (buf_size) is
smaller than the attribute value length found, xfs_attr_copy_value()
will return -ERANGE and set args.valuelen to the actual attribute
size found. That will be caught by the initial error return, so
you would never have noticed the kmem_free(NULL) call here.

xfs_attr_copy_value() deals with attributes both larger and smaller
than the provided buffer correctly.

> +	memcpy(buf, args.value, buf_size);

However, this will overrun the allocated args.value buffer if the
attribute that was found is smaller than buf_size - the actual
length of the attribute value that is copied is written into
args.valuelen and it may be less than the size of the buffer
supplied....

> +
> +	kmem_free(args.value);
> +	return args.valuelen;
> +}

It seems this function can be rewritten as: 

{
	struct xfs_inode	*ip = XFS_I(inode);
	int			error = 0;
	struct xfs_da_args	args = {
		.dp		= ip,
		.attr_filter	= XFS_ATTR_VERITY,
		.name		= (const uint8_t *)XFS_VERITY_DESCRIPTOR_NAME,
		.namelen	= XFS_VERITY_DESCRIPTOR_NAME_LEN,
		.value		= buf,
		.valuelen	= buf_size,
	};

	error = xfs_attr_get(&args);
	if (error)
		return error;

	return args.valuelen;
}

And function as it does now.


> +
> +static int
> +xfs_begin_enable_verity(
> +	struct file	    *filp)
> +{
> +	struct inode	    *inode = file_inode(filp);
> +	struct xfs_inode    *ip = XFS_I(inode);
> +	int		    error = 0;

Hmmm. This code looks like it assumes the XFS_IOLOCK_EXCL is already
held - when was that lock taken? Can you add this:

	ASSERT(xfs_is_ilocked(ip, XFS_IOLOCK_EXCL));

> +
> +	if (IS_DAX(inode))
> +		return -EINVAL;

So fsverity is not compatible with DAX? What happens if we
turn on DAX after fsverity has been enabled on an inode? I didn't
see an exception to avoid IS_DAX() inodes with fsverity enabled
in the xfs_file_read_iter() code....

> +
> +	if (xfs_iflags_test(ip, XFS_IVERITY))
> +		return -EBUSY;
> +	xfs_iflags_set(ip, XFS_IVERITY);

Ah, so this is the "merkle tree being built" flag.

Can we name it XFS_IVERITY_CONSTRUCTION or something similar that
tells the reader that is a "work is in progress" flag rather than an
"inode has verity enabled" flag?


> +	/*
> +	 * As fs-verity doesn't support multi-page folios yet, flush everything
> +	 * from page cache and disable it
> +	 */
> +	filemap_invalidate_lock(inode->i_mapping);
> +
> +	inode_dio_wait(inode);

This only works to stop all IO if something is already holding
the XFS_IOLOCK_EXCL, hence my comment about the assert above.


> +	error = xfs_flush_unmap_range(ip, 0, XFS_ISIZE(ip));
> +	if (error)
> +		goto out;

	WARN_ON_ONCE(inode->i_mapping->nrpages != 0);

Ok, so by this point nothing can instantiate new pages in the page
cache, and the mapping should be empty. Which means there should be
no large pages in it at all.

Which means it is safe to remove large folio support from the
mapping at this point..

> +	mapping_clear_large_folios(inode->i_mapping);

If Willy doesn't want this wrapper to exist (and I can see why he
doesn't want it), just open code it with a clear comment detailing
that it will go away as the fsverity infrastructure is updated to
support multipage folios.

	/*
	 * This bit of nastiness will go away when fsverity support
	 * for multi-page folios is added.
	 */
	clear_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);


> +
> +out:
> +	filemap_invalidate_unlock(inode->i_mapping);
> +	if (error)
> +		xfs_iflags_clear(ip, XFS_IVERITY);
> +	return error;
> +}
> +
> +static int
> +xfs_end_enable_verity(
> +	struct file		*filp,
> +	const void		*desc,
> +	size_t			desc_size,
> +	u64			merkle_tree_size)
> +{
> +	struct inode		*inode = file_inode(filp);
> +	struct xfs_inode	*ip = XFS_I(inode);
> +	struct xfs_mount	*mp = ip->i_mount;
> +	struct xfs_trans	*tp;
> +	struct xfs_da_args	args = {
> +		.dp		= ip,
> +		.whichfork	= XFS_ATTR_FORK,
> +		.attr_filter	= XFS_ATTR_VERITY,
> +		.attr_flags	= XATTR_CREATE,
> +		.name		= (const uint8_t *)XFS_VERITY_DESCRIPTOR_NAME,
> +		.namelen	= XFS_VERITY_DESCRIPTOR_NAME_LEN,
> +		.value		= (void *)desc,
> +		.valuelen	= desc_size,
> +	};
> +	int			error = 0;

	ASSERT(xfs_is_ilocked(ip, XFS_IOLOCK_EXCL));

> +
> +	/* fs-verity failed, just cleanup */
> +	if (desc == NULL) {
> +		mapping_set_large_folios(inode->i_mapping);
> +		goto out;

I don't think it's safe to enable this while there may be page cache
operations running (i.e. through the page fault path). It also uses
__set_bit(), which is not guaranteed to be an atomic bit operation
(i.e. might be a RMW operation) and so isn't safe to perform on
variables that might be modified concurrently (as can happen with
mapping->flags in this context).

I think if verity measurement fails, the least of our worries is
re-enabling large folio support. Hence I'd just leave this out for
the moment - the disabling of large folios is really only a
temporary thing...

> +	}
> +
> +	error = xfs_attr_set(&args);
> +	if (error)
> +		goto out;
> +
> +	/* Set fsverity inode flag */
> +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 0, 0, 0, &tp);
> +	if (error)
> +		goto out;
> +
> +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> +	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);

xfs_trans_alloc_inode()?

> +
> +	ip->i_diflags2 |= XFS_DIFLAG2_VERITY;
> +	inode->i_flags |= S_VERITY;
> +
> +	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> +	error = xfs_trans_commit(tp);

I think this should be a data integrity commit here. We're setting
the on-disk flag and guaranteeing to the caller that the verity
information has been written and verity is enabled, so we should
really guarantee that it is persistent before returning to
the caller.

	/*
	 * Ensure that we've persisted the verity information before
	 * we enable it on the inode and tell the caller we have
	 * sealed the inode.
	 */
	ip->i_diflags2 |= XFS_DIFLAG2_VERITY;

	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
	xfs_trans_set_sync(tp);

	error = xfs_trans_commit(tp);

	if (!error)
		inode->i_flags |= S_VERITY;

> +
> +out:
> +	if (error)
> +		mapping_set_large_folios(inode->i_mapping);

See above. 

> +
> +	xfs_iflags_clear(ip, XFS_IVERITY);
> +	return error;
> +}
> +
> +static struct page *
> +xfs_read_merkle_tree_page(
> +	struct inode		*inode,
> +	pgoff_t			index,
> +	unsigned long		num_ra_pages)
> +{
> +	struct xfs_inode	*ip = XFS_I(inode);
> +	struct page		*page;
> +	__be64			name = cpu_to_be64(index);
> +	struct xfs_da_args	args = {
> +		.dp		= ip,
> +		.attr_filter	= XFS_ATTR_VERITY,
> +		.name		= (const uint8_t *)&name,
> +		.namelen	= sizeof(__be64),
> +		.valuelen	= PAGE_SIZE,
> +	};
> +	int			error = 0;
> +
> +	error = xfs_attr_get(&args);
> +	if (error)
> +		return ERR_PTR(-EFAULT);
> +
> +	page = alloc_page(GFP_KERNEL);
> +	if (!page)
> +		return ERR_PTR(-ENOMEM);
> +
> +	memcpy(page_address(page), args.value, args.valuelen);
> +
> +	kmem_free(args.value);
> +	return page;
> +}

Ok, this will work, but now I see why the Merkle tree needs
validation on every read - the validation code makes the assumption
that the same *physical page* for a given tree block is fed to it
over and over again. It doesn't appear to check that the data in the
page is unchanged, just checks for the PageChecked() flag being set.
Not sure how the verify_page algorithm will work with variable sized
multi-page folios in the page cache, but that's a future problem...

I suspect we want to have xfs_attr_get() return us the xfs_buf that
contains the attribute value rather than a copy of the data itself.
Then we can pull the backing page from the xfs_buf here, grab a
reference to it and hand it back to the caller. 

All we need is a callout when the page ref is dropped by
verify_page() so that we can release the reference to the xfs_buf
that owns the page....

> +static int
> +xfs_write_merkle_tree_block(
> +	struct inode		*inode,
> +	const void		*buf,
> +	u64			index,
> +	int			log_blocksize)

"log blocksize"?

What's the journal got to do with writing a merkle tree block? :)

Hmmmm. The index is in units of blocks, and the size is always
one block. So to convert both size and index to units of bytes, we
have to shift upwards by "log_blocksize".

Yup, everyone immediately converts index to a byte offset, and
length to a byte count.

Eric, can we get this interface converted to pass an offset for the
index and a buffer length for the contents of *buf in bytes as
that's how everyone uses it?

> +{
> +	struct xfs_inode	*ip = XFS_I(inode);
> +	__be64			name = cpu_to_be64(index);
> +	struct xfs_da_args	args = {
> +		.dp		= ip,
> +		.whichfork	= XFS_ATTR_FORK,
> +		.attr_filter	= XFS_ATTR_VERITY,
> +		.attr_flags	= XATTR_CREATE,
> +		.name		= (const uint8_t *)&name,
> +		.namelen	= sizeof(__be64),
> +		.value		= (void *)buf,
> +		.valuelen	= 1 << log_blocksize,
> +	};

I'd prefer to store the merkle tree block as a byte offset (i.e.
index << log_blocksize) rather than a block index that is dependent
on something else to define the object size. That way we can
actually validate that the entire merkle tree is present (e.g. when
scrubbing/repairing inodes) without having to know what the merkle
tree block size is.


> +
> +	return xfs_attr_set(&args);
> +}
> +
> +const struct fsverity_operations xfs_verity_ops = {
> +	.begin_enable_verity = &xfs_begin_enable_verity,
> +	.end_enable_verity = &xfs_end_enable_verity,
> +	.get_verity_descriptor = &xfs_get_verity_descriptor,
> +	.read_merkle_tree_page = &xfs_read_merkle_tree_page,
> +	.write_merkle_tree_block = &xfs_write_merkle_tree_block,
> +};
> diff --git a/fs/xfs/xfs_verity.h b/fs/xfs/xfs_verity.h
> new file mode 100644
> index 0000000000000..ae5d87ca32a86
> --- /dev/null
> +++ b/fs/xfs/xfs_verity.h
> @@ -0,0 +1,19 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2022 Red Hat, Inc.
> + */
> +#ifndef __XFS_VERITY_H__
> +#define __XFS_VERITY_H__
> +
> +#include <linux/fsverity.h>
> +
> +#define XFS_VERITY_DESCRIPTOR_NAME "verity_descriptor"
> +#define XFS_VERITY_DESCRIPTOR_NAME_LEN 17

Need to put a

	BUILD_BUG_ON(strlen(XFS_VERITY_DESCRIPTOR_NAME) !=
			XFS_VERITY_DESCRIPTOR_NAME_LEN);

somewhere in the code - there's a function that checks on-disk
structure sizes somewhere like this (in xfs_ondisk.c?)....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH 02/11] pagemap: add mapping_clear_large_folios() wrapper
  2022-12-14  6:52         ` Eric Biggers
@ 2022-12-14  8:12           ` Dave Chinner
  0 siblings, 0 replies; 51+ messages in thread
From: Dave Chinner @ 2022-12-14  8:12 UTC (permalink / raw)
  To: Eric Biggers; +Cc: Matthew Wilcox, Andrey Albershteyn, linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 10:52:11PM -0800, Eric Biggers wrote:
> On Wed, Dec 14, 2022 at 08:10:10AM +1100, Dave Chinner wrote:
> > On Tue, Dec 13, 2022 at 11:33:54AM -0800, Eric Biggers wrote:
> > > On Tue, Dec 13, 2022 at 05:55:22PM +0000, Matthew Wilcox wrote:
> > > > I'm happy to work with you to add support for large folios to verity.
> > > > It hasn't been high priority for me, but I'm now working on folio support
> > > > for bufferhead filesystems and this would probably fit in.
> > > 
> > > I'd be very interested to know what else is needed after commit 98dc08bae678
> > > ("fsverity: stop using PG_error to track error status") which is upstream now,
> > > and
> > > https://lore.kernel.org/linux-fsdevel/20221028224539.171818-1-ebiggers@kernel.org/T/#u
> > > ("fsverity: support for non-4K pages") which is planned for 6.3.
> > 
> > Did you change the bio interfaces to iterate a bvec full of
> > variable sized folios, or does it still expect a bio to only have
> > PAGE_SIZE pages attached to it?
> > 
> 
> You can take a look at fsverity_verify_bio() with
> https://lore.kernel.org/r/20221028224539.171818-2-ebiggers@kernel.org applied.
> It uses bio_for_each_segment_all() to iterate through the bio's segments.  For
> each segment, it verifies each data block in the segment, assuming bv_len and
> bv_offset are multiples of the Merkle tree block size.  The file position of
> each data block is computed as '(page->index << PAGE_SHIFT) + bv_offset'.

Right, that still has the issue that it thinks each segment is a
struct page, whereas the iomap code is putting a struct folio
(possibly a contiguous multi-page folio) into each segment. 

> I suppose the issue is going to be that only the first page of a folio actually
> has an index.  Using bio_for_each_folio_all() would avoid this problem, I think?

Yes, using bio_for_each_folio_all() is large folio aware. But then
you'll have to also pass the folio through the verification chain
for getting data offsets into the folio, etc.
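A rough sketch of what that folio-based loop could look like
(illustrative only: `verify_data_blocks()` is a hypothetical helper for
walking the Merkle-tree-block-sized chunks within one segment, and the
eventual patch may structure this differently):

```c
/*
 * Sketch: a large-folio-aware fsverity_verify_bio() iteration.
 * verify_data_blocks() is a hypothetical helper, not existing
 * fs/verity/ API.
 */
bool fsverity_verify_bio(struct bio *bio)
{
	struct inode *inode = bio_first_folio_all(bio)->mapping->host;
	struct folio_iter fi;

	bio_for_each_folio_all(fi, bio) {
		/* file position of this segment, valid for any folio size */
		loff_t pos = folio_pos(fi.folio) + fi.offset;

		if (!verify_data_blocks(inode, fi.folio, fi.length, pos))
			return false;
	}
	return true;
}
```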

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC PATCH 06/11] xfs: initialize fs-verity on file open and cleanup on inode destruction
  2022-12-14  5:25     ` Eric Biggers
@ 2022-12-14  8:18       ` Dave Chinner
  0 siblings, 0 replies; 51+ messages in thread
From: Dave Chinner @ 2022-12-14  8:18 UTC (permalink / raw)
  To: Eric Biggers; +Cc: Andrey Albershteyn, linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 09:25:38PM -0800, Eric Biggers wrote:
> On Wed, Dec 14, 2022 at 12:35:24PM +1100, Dave Chinner wrote:
> > On Tue, Dec 13, 2022 at 06:29:30PM +0100, Andrey Albershteyn wrote:
> > > fs-verity will read and attach metadata (not the tree itself) from
> > > a disk for those inodes which already have fs-verity enabled.
> > > 
> > > Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> > > ---
> > >  fs/xfs/xfs_file.c  | 8 ++++++++
> > >  fs/xfs/xfs_super.c | 2 ++
> > >  2 files changed, 10 insertions(+)
> > > 
> > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > > index 242165580e682..5eadd9a37c50e 100644
> > > --- a/fs/xfs/xfs_file.c
> > > +++ b/fs/xfs/xfs_file.c
> > > @@ -32,6 +32,7 @@
> > >  #include <linux/mman.h>
> > >  #include <linux/fadvise.h>
> > >  #include <linux/mount.h>
> > > +#include <linux/fsverity.h>
> > >  
> > >  static const struct vm_operations_struct xfs_file_vm_ops;
> > >  
> > > @@ -1170,9 +1171,16 @@ xfs_file_open(
> > >  	struct inode	*inode,
> > >  	struct file	*file)
> > >  {
> > > +	int		error = 0;
> > > +
> > >  	if (xfs_is_shutdown(XFS_M(inode->i_sb)))
> > >  		return -EIO;
> > >  	file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC;
> > > +
> > > +	error = fsverity_file_open(inode, file);
> > > +	if (error)
> > > +		return error;
> > 
> > This is a hot path, so shouldn't we elide the function call
> > altogether if verity is not enabled on the inode? i.e:
> > 
> > 	if (IS_VERITY(inode)) {
> > 		error = fsverity_file_open(inode, file);
> > 		if (error)
> > 			return error;
> > 	}
> > 
> > It doesn't really matter for a single file open, but when you're
> > opening a few million inodes every second the function call overhead
> > only to immediately return because IS_VERITY() is false adds up...
> > 
> > >  	return generic_file_open(inode, file);
> > >  }
> > >  
> > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > index 8f1e9b9ed35d9..50c2c819ba940 100644
> > > --- a/fs/xfs/xfs_super.c
> > > +++ b/fs/xfs/xfs_super.c
> > > @@ -45,6 +45,7 @@
> > >  #include <linux/magic.h>
> > >  #include <linux/fs_context.h>
> > >  #include <linux/fs_parser.h>
> > > +#include <linux/fsverity.h>
> > >  
> > >  static const struct super_operations xfs_super_operations;
> > >  
> > > @@ -647,6 +648,7 @@ xfs_fs_destroy_inode(
> > >  	ASSERT(!rwsem_is_locked(&inode->i_rwsem));
> > >  	XFS_STATS_INC(ip->i_mount, vn_rele);
> > >  	XFS_STATS_INC(ip->i_mount, vn_remove);
> > > +	fsverity_cleanup_inode(inode);
> > 
> > Similarly, shouldn't this be:
> > 
> > 	if (fsverity_active(inode))
> > 		fsverity_cleanup_inode(inode);
> > 
> 
> If you actually want to do that, then we should instead make these functions
> inline functions that do the "is anything needed?" check, then call a
> double-underscored version that does the actual work.  Some of the fscrypt
> functions are like that.  Then all filesystems would get the benefit.

Agreed, that's the right way to do it. :)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC PATCH 00/11] fs-verity support for XFS
  2022-12-14  6:31     ` Eric Biggers
@ 2022-12-14 23:06       ` Dave Chinner
  2022-12-15  6:47         ` Eric Biggers
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Chinner @ 2022-12-14 23:06 UTC (permalink / raw)
  To: Eric Biggers; +Cc: Andrey Albershteyn, linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 10:31:42PM -0800, Eric Biggers wrote:
> On Wed, Dec 14, 2022 at 09:11:39AM +1100, Dave Chinner wrote:
> > On Tue, Dec 13, 2022 at 12:50:28PM -0800, Eric Biggers wrote:
> > > On Tue, Dec 13, 2022 at 06:29:24PM +0100, Andrey Albershteyn wrote:
> > > > Not yet implemented:
> > > > - No pre-fetching of Merkle tree pages in the
> > > >   read_merkle_tree_page()
> > > 
> > > This would be helpful, but not essential.
> > > 
> > > > - No marking of already verified Merkle tree pages (each read, the
> > > >   whole tree is verified).
> > 
> > Ah, I wasn't aware that this was missing.
> > 
> > > 
> > > This is essential to have, IMO.
> > > 
> > > You *could* do what btrfs does, where it caches the Merkle tree pages in the
> > > inode's page cache past i_size, even though btrfs stores the Merkle tree
> > > separately from the file data on-disk.
> > >
> > > However, I'd guess that the other XFS developers would have an aversion to that
> > > approach, even though it would not affect the on-disk storage.
> > 
> > Yup, on an architectural level it just seems wrong to cache secure
> > verification metadata in the same user accessible address space as
> > the data it verifies.
> > 
> > > The alternatives would be to create a separate in-memory-only inode for the
> > > cache, or to build a custom cache with its own shrinker.
> > 
> > The merkle tree blocks are cached in the XFS buffer cache.
> > 
> > Andrey, could we just add a new flag to the xfs_buf->b_flags to
> > indicate that the buffer contains verified merkle tree records?
> > i.e. if the flag is not set after we've read the buffer, we verify
> > the buffer and set the flag; if it is already set, the cached buffer
> > has been verified and we can skip the verification?
> 
> Well, my proposal at
> https://lore.kernel.org/r/20221028224539.171818-2-ebiggers@kernel.org is to keep
> tracking the "verified" status at the individual Merkle tree block level, by
> adding a bitmap fsverity_info::hash_block_verified.  That is part of the
> fs/verity/ infrastructure, and all filesystems would be able to use it.

Yeah, I had a look at that rewrite of the verification code last
night - I get the gist of what it is doing, but a single patch of
that complexity is largely impossible to sanely review...

Correct me if I'm wrong, but won't using a bitmap with 1 bit per
verified block cause problems with contiguous memory allocation
pretty quickly? i.e. a 64kB bitmap only tracks 512k blocks, which is
only 2GB of merkle tree data. Hence at file sizes of 100+GB, the
bitmap would have to be kvmalloc()d to guarantee allocation will
succeed.

I'm not really worried about the bitmap memory usage, just that it
handles large contiguous allocations sanely. I suspect we may
eventually need a sparse bitmap (e.g. the old btrfs bit-radix
implementation) to track verification in really large files
efficiently.

> However, since it's necessary to re-verify blocks that have been evicted and
> then re-instantiated, my patch also repurposes PG_checked as an indicator for
> whether the Merkle tree pages are newly instantiated.  For a "non-page-cache
> cache", that part would need to be replaced with something equivalent.

Which we could get as a boolean state from the XFS buffer cache
fairly easily - did we find the buffer in cache, or was it read from
disk...

> A different aproach would be to make it so that every time a page (or "cache
> buffer", to call it something more generic) of N Merkle tree blocks is read,
> then all N of those blocks are verified immediately.  Then there would be no
> need to track the "verified" status of individual blocks.

That won't work with XFS - merkle tree blocks are not contiguous in
the attribute b-tree so there is no efficient "sequential bulk read"
option available. The xattr structure is largely chosen because it
allows for fast, deterministic single merkle tree block
operations....

> My concerns with that approach are:
> 
>   * Most data reads only need a single Merkle tree block at the deepest level.

Yup, see above. :)

>     If at least N tree blocks were verified any time that any were verified at
>     all, that would make the worst-case read latency worse.

*nod*

>   * It's possible that the parents of N tree blocks are split across a cache
>     buffer.  Thus, while N blocks can't have more than N parents, and in
>     practice would just have 1-2, those 2 parents could be split into two
>     separate cache buffers, with a total length of 2*N.  Verifying all of those
>     would really increase the worst-case latency as well.
> 
> So I'm thinking that tracking the "verified" status of tree blocks individually
> is the right way to go.  But I'd appreciate any other thoughts on this.

I think that having the fsverity code track verified indexes itself
is much more flexible and self-contained, and the right way to go
about it.

The other issue is that verify_page() assumes that it can drop the
reference to the cached object itself - the caller actually owns the
reference to the object, not the verify_page() code. Hence if we are
passing opaque buffers to verify_page() rather than page cache pages, we
need a ->drop_block method that gets called instead of put_page().
This will allow the filesystem to hold a reference to the merkle
tree block data while the verification occurs, ensuring that they
don't get reclaimed by memory pressure whilst still in use...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC PATCH 00/11] fs-verity support for XFS
  2022-12-14 23:06       ` Dave Chinner
@ 2022-12-15  6:47         ` Eric Biggers
  2022-12-15 20:57           ` Dave Chinner
  0 siblings, 1 reply; 51+ messages in thread
From: Eric Biggers @ 2022-12-15  6:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Andrey Albershteyn, linux-xfs, linux-fsdevel

On Thu, Dec 15, 2022 at 10:06:32AM +1100, Dave Chinner wrote:
> > Well, my proposal at
> > https://lore.kernel.org/r/20221028224539.171818-2-ebiggers@kernel.org is to keep
> > tracking the "verified" status at the individual Merkle tree block level, by
> > adding a bitmap fsverity_info::hash_block_verified.  That is part of the
> > fs/verity/ infrastructure, and all filesystems would be able to use it.
> 
> Yeah, I had a look at that rewrite of the verification code last
> night - I get the gist of what it is doing, but a single patch of
> that complexity is largely impossible to sanely review...

Thanks for taking a look at it.  It doesn't really lend itself to being split
up, unfortunately, but I'll see what I can do.

> Correct me if I'm wrong, but won't using a bitmap with 1 bit per
> verified block cause problems with contiguous memory allocation
> pretty quickly? i.e. a 64kB bitmap only tracks 512k blocks, which is
> only 2GB of merkle tree data. Hence at file sizes of 100+GB, the
> bitmap would have to be kvmalloc()d to guarantee allocation will
> succeed.
> 
> I'm not really worried about the bitmap memory usage, just that it
> handles large contiguous allocations sanely. I suspect we may
> eventually need a sparse bitmap (e.g. the old btrfs bit-radix
> implementation) to track verification in really large files
> efficiently.

Well, that's why my patch uses kvmalloc() to allocate the bitmap.

I did originally think it was going to have to be a sparse bitmap that ties into
the shrinker so that pages of it can be evicted.  But if you do the math, the
required bitmap size is only 1 / 2^22 the size of the file, assuming the Merkle
tree uses SHA-256 and 4K blocks.  So a 100MB file only needs a 24-byte bitmap,
and the bitmap for any file under 17GB fits in a 4K page.

My patch puts an arbitrary limit at a 1 MiB bitmap, which would be a 4.4TB file.

It's not ideal to say "4 TB Ought To Be Enough For Anybody".  But it does feel
that it's not currently worth the extra complexity and runtime overhead of
implementing a full-blown sparse bitmap with cache eviction support, when no one
currently has a use case for fsverity on files anywhere near that large.

> The other issue is that verify_page() assumes that it can drop the
> reference to the cached object itself - the caller actually owns the
> reference to the object, not the verify_page() code. Hence if we are
> passing opaque buffers to verify_page() rather than page cache pages, we
> need a ->drop_block method that gets called instead of put_page().
> This will allow the filesystem to hold a reference to the merkle
> tree block data while the verification occurs, ensuring that they
> don't get reclaimed by memory pressure whilst still in use...

Yes, probably the prototype of ->read_merkle_tree_page will need to change too.

- Eric


* Re: [RFC PATCH 00/11] fs-verity support for XFS
  2022-12-15  6:47         ` Eric Biggers
@ 2022-12-15 20:57           ` Dave Chinner
  2022-12-16  5:04             ` Eric Biggers
  0 siblings, 1 reply; 51+ messages in thread
From: Dave Chinner @ 2022-12-15 20:57 UTC (permalink / raw)
  To: Eric Biggers; +Cc: Andrey Albershteyn, linux-xfs, linux-fsdevel

On Wed, Dec 14, 2022 at 10:47:37PM -0800, Eric Biggers wrote:
> On Thu, Dec 15, 2022 at 10:06:32AM +1100, Dave Chinner wrote:
> > > Well, my proposal at
> > > https://lore.kernel.org/r/20221028224539.171818-2-ebiggers@kernel.org is to keep
> > > tracking the "verified" status at the individual Merkle tree block level, by
> > > adding a bitmap fsverity_info::hash_block_verified.  That is part of the
> > > fs/verity/ infrastructure, and all filesystems would be able to use it.
> > 
> > Yeah, I had a look at that rewrite of the verification code last
> > night - I get the gist of what it is doing, but a single patch of
> > that complexity is largely impossible to sanely review...
> 
> Thanks for taking a look at it.  It doesn't really lend itself to being split
> up, unfortunately, but I'll see what I can do.
> 
> > Correct me if I'm wrong, but won't using a bitmap with 1 bit per
> > verified block cause problems with contiguous memory allocation
> > pretty quickly? i.e. a 64kB bitmap only tracks 512k blocks, which is
> > only 2GB of merkle tree data. Hence at file sizes of 100+GB, the
> > bitmap would have to be kvmalloc()d to guarantee allocation will
> > succeed.
> > 
> > I'm not really worried about the bitmap memory usage, just that it
> > handles large contiguous allocations sanely. I suspect we may
> > eventually need a sparse bitmap (e.g. the old btrfs bit-radix
> > implementation) to track verification in really large files
> > efficiently.
> 
> Well, that's why my patch uses kvmalloc() to allocate the bitmap.
> 
> I did originally think it was going to have to be a sparse bitmap that ties into
> the shrinker so that pages of it can be evicted.  But if you do the math, the
> required bitmap size is only 1 / 2^22 the size of the file, assuming the Merkle
> tree uses SHA-256 and 4K blocks.  So a 100MB file only needs a 24-byte bitmap,
> and the bitmap for any file under 17GB fits in a 4K page.
> 
> My patch puts an arbitrary limit at a 1 MiB bitmap, which would be a 4.4TB file.
> 
> It's not ideal to say "4 TB Ought To Be Enough For Anybody".  But it does feel
> that it's not currently worth the extra complexity and runtime overhead of
> implementing a full-blown sparse bitmap with cache eviction support, when no one
> currently has a use case for fsverity on files anywhere near that large.

I think we can live with that for the moment, but I suspect that 4TB
filesize limit will become an issue sooner rather than later. What
will happen if someone tries to measure a file larger than this
limit? What's the failure mode?

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC PATCH 00/11] fs-verity support for XFS
  2022-12-15 20:57           ` Dave Chinner
@ 2022-12-16  5:04             ` Eric Biggers
  0 siblings, 0 replies; 51+ messages in thread
From: Eric Biggers @ 2022-12-16  5:04 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Andrey Albershteyn, linux-xfs, linux-fsdevel

On Fri, Dec 16, 2022 at 07:57:37AM +1100, Dave Chinner wrote:
> > 
> > Well, that's why my patch uses kvmalloc() to allocate the bitmap.
> > 
> > I did originally think it was going to have to be a sparse bitmap that ties into
> > the shrinker so that pages of it can be evicted.  But if you do the math, the
> > required bitmap size is only 1 / 2^22 the size of the file, assuming the Merkle
> > tree uses SHA-256 and 4K blocks.  So a 100MB file only needs a 24-byte bitmap,
> > and the bitmap for any file under 17GB fits in a 4K page.
> > 
> > My patch puts an arbitrary limit at a 1 MiB bitmap, which would be a 4.4TB file.
> > 
> > It's not ideal to say "4 TB Ought To Be Enough For Anybody".  But it does feel
> > that it's not currently worth the extra complexity and runtime overhead of
> > implementing a full-blown sparse bitmap with cache eviction support, when no one
> > currently has a use case for fsverity on files anywhere near that large.
> 
> I think we can live with that for the moment, but I suspect that 4TB
> filesize limit will become an issue sooner rather than later. What
> will happen if someone tries to measure a file larger than this
> limit? What's the failure mode?
> 

FS_IOC_ENABLE_VERITY will fail with EFBIG.

- Eric


* Re: [RFC PATCH 07/11] xfs: disable direct read path for fs-verity sealed files
  2022-12-13 17:29 ` [RFC PATCH 07/11] xfs: disable direct read path for fs-verity sealed files Andrey Albershteyn
  2022-12-14  2:07   ` Dave Chinner
@ 2022-12-23 16:18   ` Christoph Hellwig
  2023-01-09 17:23     ` Andrey Albershteyn
  1 sibling, 1 reply; 51+ messages in thread
From: Christoph Hellwig @ 2022-12-23 16:18 UTC (permalink / raw)
  To: Andrey Albershteyn; +Cc: linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 06:29:31PM +0100, Andrey Albershteyn wrote:
> The direct path is not supported on verity files. Attempts to use direct
> I/O path on such files should fall back to buffered I/O path.

Just curious: why?  What prevents us from running the hash function
over the user pages in direct I/O?


* Re: [RFC PATCH 02/11] pagemap: add mapping_clear_large_folios() wrapper
  2022-12-13 21:08     ` Dave Chinner
@ 2023-01-09 16:34       ` Andrey Albershteyn
  0 siblings, 0 replies; 51+ messages in thread
From: Andrey Albershteyn @ 2023-01-09 16:34 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Matthew Wilcox, linux-xfs, linux-fsdevel

On Wed, Dec 14, 2022 at 08:08:13AM +1100, Dave Chinner wrote:
> On Tue, Dec 13, 2022 at 05:55:22PM +0000, Matthew Wilcox wrote:
> > On Tue, Dec 13, 2022 at 06:29:26PM +0100, Andrey Albershteyn wrote:
> > > Add wrapper to clear mapping's large folio flag. This is handy for
> > > disabling large folios on already existing inodes (e.g. future XFS
> > > integration of fs-verity).
> > 
> > I have two problems with this.  One is your use of __clear_bit().
> > We can use __set_bit() because it's done as part of initialisation.
> > As far as I can tell from your patches, mapping_clear_large_folios() is
> > called on a live inode, so you'd have to use clear_bit() to avoid races.
> 
> I think we can do without mapping_clear_large_folios() - we
> already have precedence for this sort of mapping state change with
> the DAX inode feature flag.  That is, we change the on-disk state in
> the ioctl context, but we don't change the in-memory inode state.
> Instead, we mark it I_DONTCACHE to get it turfed from memory with
> expediency. Then when it is re-instantiated, we see the on-disk
> state and then don't enable large mappings on that inode.
> 
> That will work just fine here, I think.

Thanks for the suggestion, I will look into this. If it doesn't
work out I will stick to the large folio switch, if there are no
other objections. Either way I will remove the
mapping_clear_large_folios() wrapper so as not to encourage further
use of this approach.

> 
> > The second is that verity should obviously be enhanced to support
> > large folios (and for that matter, block sizes smaller than PAGE_SIZE).
> > Without that, this is just a toy or a prototype.  Disabling large folios
> > is not an option.
> 
> Disabling large folios is very much an option. Filesystems must opt
> in to large mapping support, so they can also choose to opt out.
> i.e. large mappings is a filesystem policy decision, not a core
> infrastructure decision. Hence how we disable large mappings
> for fsverity enabled inodes is open to discussion, but saying we
> can't opt out of an optional feature is entirely non-sensical.
> 
> > I'm happy to work with you to add support for large folios to verity.
> > It hasn't been high priority for me, but I'm now working on folio support
> > for bufferhead filesystems and this would probably fit in.
> 
> Yes, we need fsverity to support multipage folios, but modifying
> fsverity is outside the scope of initially enabling fsverity support
> on XFS.  This patch set is about sorting out the on-disk format
> changes and interfacing with the fsverity infrastructure to enable
> the feature to be tested and verified.
> 
> Stuff like large mapping support in fsverity is a future concern,
> not a show-stopper for initial feature support. We don't need every
> bell and whistle in the initial merge....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 

Thanks
Andrey



* Re: [RFC PATCH 03/11] xfs: add attribute type for fs-verity
  2022-12-14  1:03     ` Dave Chinner
@ 2023-01-09 16:37       ` Andrey Albershteyn
  0 siblings, 0 replies; 51+ messages in thread
From: Andrey Albershteyn @ 2023-01-09 16:37 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Eric Sandeen, linux-xfs, linux-fsdevel

On Wed, Dec 14, 2022 at 12:03:56PM +1100, Dave Chinner wrote:
> On Tue, Dec 13, 2022 at 11:43:42AM -0600, Eric Sandeen wrote:
> > On 12/13/22 11:29 AM, Andrey Albershteyn wrote:
> > > The Merkle tree pages and descriptor are stored in the extended
> > > attributes of the inode. Add new attribute type for fs-verity
> > > metadata. Skip fs-verity attributes for getfattr as it can not parse
> > > binary page names.
> > > 
> > > Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> > 
> > 
> > >  DECLARE_EVENT_CLASS(xfs_attr_list_class,
> > > diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
> > > index 5b57f6348d630..acbfa29d04af0 100644
> > > --- a/fs/xfs/xfs_xattr.c
> > > +++ b/fs/xfs/xfs_xattr.c
> > > @@ -237,6 +237,9 @@ xfs_xattr_put_listent(
> > >  	if (flags & XFS_ATTR_PARENT)
> > >  		return;
> > >  
> > > +	if (flags & XFS_ATTR_VERITY)
> > > +		return;
> > > +
> > 
> > Just a nitpick, but now that there are already 2 cases like this, I wonder
> > if it would be wise to #define something like an XFS_ATTR_VISIBLE_MASK
> > (or maybe XFS_ATTR_INTERNAL_MASK) and use that to decide, rather than
> > testing each one individually?
> 
> Seems like a good idea to me.

Agreed.

> 
> There's also a couple of other spots that a comment about internal
> vs externally visible xattr namespaces might be appropriate. e.g
> xfs_attr_filter() in the ioctl code should never have internal xattr
> namespaces added to it, and similarly a comment at the top of
> fs/xfs/xfs_xattr.c that the interfaces implemented in the file are
> only for exposing externally visible xattr namespaces to users.

Thanks, will describe that.

> 
> That way it becomes more obvious that we handle internal vs external
> xattr namespaces very differently.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 

-- 
- Andrey



* Re: [RFC PATCH 05/11] xfs: add inode on-disk VERITY flag
  2022-12-14  1:29   ` Dave Chinner
@ 2023-01-09 16:51     ` Andrey Albershteyn
  0 siblings, 0 replies; 51+ messages in thread
From: Andrey Albershteyn @ 2023-01-09 16:51 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, linux-fsdevel

On Wed, Dec 14, 2022 at 12:29:28PM +1100, Dave Chinner wrote:
> On Tue, Dec 13, 2022 at 06:29:29PM +0100, Andrey Albershteyn wrote:
> > Add flag to mark inodes which have fs-verity enabled on them (i.e.
> > descriptor exist and tree is built).
> .....
> > 
> >  static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
> >  {
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index f08a2d5f96ad4..8d9c9697d3619 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -636,6 +636,8 @@ xfs_ip2xflags(
> >  			flags |= FS_XFLAG_DAX;
> >  		if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
> >  			flags |= FS_XFLAG_COWEXTSIZE;
> > +		if (ip->i_diflags2 & XFS_DIFLAG2_VERITY)
> > +			flags |= FS_VERITY_FL;
> >  	}
> 
> Ah, attribute flag confusion - easy to do. xflags (FS_XFLAG*) are a
> different set of (extended) flags than the standard VFS inode flags
> (FS_*_FL).
> 
> To place the verity enabled state in the extended flags, you would
> need to define FS_XFLAG_VERITY in include/uapi/linux/fs.h. You'll
> also need to add the conversion from FS_VERITY_FL to FS_XFLAG_VERITY
> to fileattr_fill_flags() and vice versa to fileattr_fill_xflags()
> 
> This will allow both the VFS inode flags UAPI and the
> FS_IOC_FSGETXATTR extended flag API to see the inode has verity
> enabled on it.
> 
> Once FS_XFLAG_VERITY is defined, changing the code in XFS to use it
> directly instead of FS_VERITY_FL will result in everything working
> correctly throughout the code.

Oh I see, thanks for the explanation. They are truly confusing :( I
will adjust it as suggested.

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 

-- 
- Andrey



* Re: [RFC PATCH 09/11] iomap: fs-verity verification on page read
  2022-12-13 19:02   ` Eric Biggers
@ 2023-01-09 16:58     ` Andrey Albershteyn
  0 siblings, 0 replies; 51+ messages in thread
From: Andrey Albershteyn @ 2023-01-09 16:58 UTC (permalink / raw)
  To: Eric Biggers; +Cc: linux-xfs, linux-fsdevel

On Tue, Dec 13, 2022 at 11:02:59AM -0800, Eric Biggers wrote:
> On Tue, Dec 13, 2022 at 06:29:33PM +0100, Andrey Albershteyn wrote:
> > Add fs-verity page verification in read IO path. The verification
> > itself is offloaded into workqueue (provided by fs-verity).
> > 
> > The work_struct items are allocated from bioset side by side with
> > bio being processed.
> > 
> > As inodes with fs-verity don't use large folios, we check only the
> > first page of the folio for errors (set by fs-verity if verification
> > failed).
> > 
> > Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> > ---
> >  fs/iomap/buffered-io.c | 80 +++++++++++++++++++++++++++++++++++++++---
> >  include/linux/iomap.h  |  5 +++
> >  2 files changed, 81 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 91ee0b308e13d..b7abc2f806cfc 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -17,6 +17,7 @@
> >  #include <linux/bio.h>
> >  #include <linux/sched/signal.h>
> >  #include <linux/migrate.h>
> > +#include <linux/fsverity.h>
> >  #include "trace.h"
> >  
> >  #include "../internal.h"
> > @@ -42,6 +43,7 @@ static inline struct iomap_page *to_iomap_page(struct folio *folio)
> >  }
> >  
> >  static struct bio_set iomap_ioend_bioset;
> > +static struct bio_set iomap_readend_bioset;
> >  
> >  static struct iomap_page *
> >  iomap_page_create(struct inode *inode, struct folio *folio, unsigned int flags)
> > @@ -189,9 +191,39 @@ static void iomap_read_end_io(struct bio *bio)
> >  	int error = blk_status_to_errno(bio->bi_status);
> >  	struct folio_iter fi;
> >  
> > -	bio_for_each_folio_all(fi, bio)
> > +	bio_for_each_folio_all(fi, bio) {
> > +		/*
> > +		 * As fs-verity doesn't work with multi-page folios, verity
> > +		 * inodes have large folios disabled (only single page folios
> > +		 * are used)
> > +		 */
> > +		if (!error)
> > +			error = PageError(folio_page(fi.folio, 0));
> > +
> >  		iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
> > +	}
> 
> fs/verity/ no longer uses PG_error to report errors.  See upstream commit
> 98dc08bae678 ("fsverity: stop using PG_error to track error status").

Thanks! Missed that.

> 
> - Eric
> 

-- 
- Andrey



* Re: [RFC PATCH 07/11] xfs: disable direct read path for fs-verity sealed files
  2022-12-23 16:18   ` Christoph Hellwig
@ 2023-01-09 17:23     ` Andrey Albershteyn
  0 siblings, 0 replies; 51+ messages in thread
From: Andrey Albershteyn @ 2023-01-09 17:23 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel

On Fri, Dec 23, 2022 at 08:18:07AM -0800, Christoph Hellwig wrote:
> On Tue, Dec 13, 2022 at 06:29:31PM +0100, Andrey Albershteyn wrote:
> > The direct path is not supported on verity files. Attempts to use direct
> > I/O path on such files should fall back to buffered I/O path.
> 
> Just curious: why?  What prevents us from running the hash function
> over the user pages in direct I/O?
> 

Hmm, I'm not sure if there are any technical obstacles (probably
not), but as the fs-verity subsystem doesn't deal with direct I/O, I
went the same way.

-- 
- Andrey



end of thread, other threads:[~2023-01-09 17:24 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-13 17:29 [RFC PATCH 00/11] fs-verity support for XFS Andrey Albershteyn
2022-12-13 17:29 ` [RFC PATCH 01/11] xfs: enable large folios in xfs_setup_inode() Andrey Albershteyn
2022-12-14  0:53   ` Dave Chinner
2022-12-13 17:29 ` [RFC PATCH 02/11] pagemap: add mapping_clear_large_folios() wrapper Andrey Albershteyn
2022-12-13 17:55   ` Matthew Wilcox
2022-12-13 19:33     ` Eric Biggers
2022-12-13 21:10       ` Dave Chinner
2022-12-14  6:52         ` Eric Biggers
2022-12-14  8:12           ` Dave Chinner
2022-12-13 21:08     ` Dave Chinner
2023-01-09 16:34       ` Andrey Albershteyn
2022-12-13 17:29 ` [RFC PATCH 03/11] xfs: add attribute type for fs-verity Andrey Albershteyn
2022-12-13 17:43   ` Eric Sandeen
2022-12-14  1:03     ` Dave Chinner
2023-01-09 16:37       ` Andrey Albershteyn
2022-12-13 17:29 ` [RFC PATCH 04/11] xfs: add fs-verity ro-compat flag Andrey Albershteyn
2022-12-14  1:06   ` Dave Chinner
2022-12-13 17:29 ` [RFC PATCH 05/11] xfs: add inode on-disk VERITY flag Andrey Albershteyn
2022-12-14  1:29   ` Dave Chinner
2023-01-09 16:51     ` Andrey Albershteyn
2022-12-13 17:29 ` [RFC PATCH 06/11] xfs: initialize fs-verity on file open and cleanup on inode destruction Andrey Albershteyn
2022-12-14  1:35   ` Dave Chinner
2022-12-14  5:25     ` Eric Biggers
2022-12-14  8:18       ` Dave Chinner
2022-12-13 17:29 ` [RFC PATCH 07/11] xfs: disable direct read path for fs-verity sealed files Andrey Albershteyn
2022-12-14  2:07   ` Dave Chinner
2022-12-14  5:44     ` Eric Biggers
2022-12-23 16:18   ` Christoph Hellwig
2023-01-09 17:23     ` Andrey Albershteyn
2022-12-13 17:29 ` [RFC PATCH 08/11] xfs: don't enable large folios on fs-verity sealed inode Andrey Albershteyn
2022-12-14  2:07   ` Dave Chinner
2022-12-13 17:29 ` [RFC PATCH 09/11] iomap: fs-verity verification on page read Andrey Albershteyn
2022-12-13 19:02   ` Eric Biggers
2023-01-09 16:58     ` Andrey Albershteyn
2022-12-14  5:43   ` Dave Chinner
2022-12-13 17:29 ` [RFC PATCH 10/11] xfs: add fs-verity support Andrey Albershteyn
2022-12-13 19:08   ` Eric Biggers
2022-12-13 19:22     ` Darrick J. Wong
2022-12-13 20:13       ` Eric Biggers
2022-12-13 20:33     ` Dave Chinner
2022-12-13 20:39       ` Eric Biggers
2022-12-13 21:40         ` Dave Chinner
2022-12-14  7:58   ` Dave Chinner
2022-12-13 17:29 ` [RFC PATCH 11/11] xfs: add fs-verity ioctls Andrey Albershteyn
2022-12-13 20:50 ` [RFC PATCH 00/11] fs-verity support for XFS Eric Biggers
2022-12-13 22:11   ` Dave Chinner
2022-12-14  6:31     ` Eric Biggers
2022-12-14 23:06       ` Dave Chinner
2022-12-15  6:47         ` Eric Biggers
2022-12-15 20:57           ` Dave Chinner
2022-12-16  5:04             ` Eric Biggers
