linux-xfs.vger.kernel.org archive mirror
* [PATCHBOMB v5.6] fs-verity support for XFS
@ 2024-04-30  3:11 Darrick J. Wong
  2024-04-30  3:18 ` [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks Darrick J. Wong
                   ` (3 more replies)
  0 siblings, 4 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:11 UTC (permalink / raw)
  To: aalbersh, ebiggers
  Cc: linux-fsdevel, fsverity, linux-xfs, alexl, walters, Dave Chinner

Hi everyone,

Another month has gone by, so here's another RFC of fsverity support for
XFS.  I'm going to take off for a bit of R&R the week before and after
LSFMM, so I wanted to blast this out for everyone's enjoyment. ;)

I /think/ I've addressed Eric's feedback about the v5.5 patchset.  The
merkle tree cache has been moved to a per-AG rhashtable; XFS now uses
only a u64 merkle tree pos value in the merkle_read/drop paths to match
the merkle_write path; and online repair can now unwind broken verity
files that are otherwise unopenable.

The kernel series now takes advantage of all the xattr cleanups that hch
and I did for parent pointers.

Full versions are here:
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=fsverity
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=fsverity

--D


* [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks
  2024-04-30  3:11 [PATCHBOMB v5.6] fs-verity support for XFS Darrick J. Wong
@ 2024-04-30  3:18 ` Darrick J. Wong
  2024-04-30  3:19   ` [PATCH 01/18] fs: add FS_XFLAG_VERITY for verity files Darrick J. Wong
                     ` (17 more replies)
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                   ` (2 subsequent siblings)
  3 siblings, 18 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:18 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: Christoph Hellwig, linux-xfs, alexl, walters, fsverity, linux-fsdevel

Hi all,

I've split Andrey's fsverity patchset into two parts.  This first part
refactors fsverity to support per-block (instead of per-page) access to
merkle tree blocks, moves all filesystems to a per-superblock workqueue,
and enhances iomap to support validating readahead with fsverity data.
This will hopefully address everything that Eric Biggers noted in his
review of the v5 patchset.

To eliminate the requirement of using a verified bitmap, I added to the
fsverity_blockbuf object the ability to pass around verified bits so
that the underlying implementation can remember if the fsverity common
code actually validated a block.
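
As a rough illustration of the verified-bit idea, here is a userspace
sketch.  The struct and function names (blockbuf_sketch, ensure_verified,
demo_check_nonempty) are hypothetical and are not the kernel's
fsverity_blockbuf API; the point is only that the cache owner stores a
bit that the common code sets after a successful validation, so later
reads can skip the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cached merkle tree block. */
struct blockbuf_sketch {
	const uint8_t *data;	/* block contents */
	size_t size;		/* block size in bytes */
	bool verified;		/* set once the contents were validated */
};

/* Hypothetical validation callback; returns true if the block checks out. */
typedef bool (*block_check_fn)(const uint8_t *data, size_t size);

/* Trivial demo check: accept any non-empty block. */
bool demo_check_nonempty(const uint8_t *data, size_t size)
{
	return data != NULL && size > 0;
}

/*
 * Validate a block at most once.  *nr_checks counts how many times the
 * (presumably expensive) check callback actually ran.
 */
bool ensure_verified(struct blockbuf_sketch *buf, block_check_fn check,
		     unsigned int *nr_checks)
{
	assert(buf != NULL && check != NULL && nr_checks != NULL);

	if (buf->verified)
		return true;	/* remembered from an earlier validation */

	(*nr_checks)++;
	if (!check(buf->data, buf->size))
		return false;	/* bit stays clear, so a later call rechecks */

	buf->verified = true;
	return true;
}
```

Calling ensure_verified() twice on the same block runs the check only
once, which is the property that makes a separate verified bitmap
unnecessary in this sketch.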

To support cleaning up stale/dead merkle trees and online repair, I've
added a couple of patches to export enough of the merkle tree geometry
to XFS so that it can erase remnants of previous attempts to enable
verity.  I've also augmented it to share with XFS the hash of a
completely zeroed data block so that we can elide writing merkle leaves
for sparse regions of a file.  This might be useful for enabling
fsverity on gold master disk images.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=fsverity-by-block
---
Commits in this patchset:
 * fs: add FS_XFLAG_VERITY for verity files
 * fsverity: pass tree_blocksize to end_enable_verity()
 * fsverity: convert verification to use byte instead of page offsets
 * fsverity: support block-based Merkle tree caching
 * fsverity: pass the merkle tree block level to fsverity_read_merkle_tree_block
 * fsverity: add per-sb workqueue for post read processing
 * fsverity: add tracepoints
 * fsverity: pass the new tree size and block size to ->begin_enable_verity
 * fsverity: expose merkle tree geometry to callers
 * fsverity: box up the write_merkle_tree_block parameters too
 * fsverity: pass the zero-hash value to the implementation
 * fsverity: report validation errors back to the filesystem
 * fsverity: pass super_block to fsverity_enqueue_verify_work
 * ext4: use a per-superblock fsverity workqueue
 * f2fs: use a per-superblock fsverity workqueue
 * btrfs: use a per-superblock fsverity workqueue
 * fsverity: remove system-wide workqueue
 * iomap: integrate fs-verity verification into iomap's read path
---
 Documentation/filesystems/fsverity.rst |    8 +
 MAINTAINERS                            |    1 
 fs/btrfs/super.c                       |   14 ++
 fs/btrfs/verity.c                      |   13 +-
 fs/buffer.c                            |    7 +
 fs/ext4/readpage.c                     |    4 -
 fs/ext4/super.c                        |   11 ++
 fs/ext4/verity.c                       |   13 +-
 fs/f2fs/compress.c                     |    3 
 fs/f2fs/data.c                         |    2 
 fs/f2fs/super.c                        |   11 ++
 fs/f2fs/verity.c                       |   13 +-
 fs/ioctl.c                             |   11 ++
 fs/iomap/buffered-io.c                 |  133 +++++++++++++++++-
 fs/super.c                             |    3 
 fs/verity/enable.c                     |   20 ++-
 fs/verity/fsverity_private.h           |   13 ++
 fs/verity/init.c                       |    2 
 fs/verity/open.c                       |   61 ++++++++
 fs/verity/read_metadata.c              |   66 ++++-----
 fs/verity/verify.c                     |  232 +++++++++++++++++++++++---------
 include/linux/fs.h                     |    2 
 include/linux/fsverity.h               |  166 ++++++++++++++++++++++-
 include/linux/iomap.h                  |    5 +
 include/trace/events/fsverity.h        |  162 ++++++++++++++++++++++
 include/uapi/linux/fs.h                |    1 
 26 files changed, 835 insertions(+), 142 deletions(-)
 create mode 100644 include/trace/events/fsverity.h



* [PATCHSET v5.6 2/2] xfs: fs-verity support
  2024-04-30  3:11 [PATCHBOMB v5.6] fs-verity support for XFS Darrick J. Wong
  2024-04-30  3:18 ` [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks Darrick J. Wong
@ 2024-04-30  3:18 ` Darrick J. Wong
  2024-04-30  3:24   ` [PATCH 01/26] xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c Darrick J. Wong
                     ` (25 more replies)
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
  2024-04-30  3:19 ` [PATCHSET v5.6] fstests: fs-verity support for XFS Darrick J. Wong
  3 siblings, 26 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:18 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

Hi all,

This patchset adds support for fsverity to XFS.  In keeping with
Andrey's original design, XFS stores all fsverity metadata in the
extended attribute data.  However, I've made a few changes to the code:
First, it now caches merkle tree blocks directly instead of abusing the
buffer cache.  This reduces lookup overhead quite a bit, at a cost of
needing a new shrinker for cached merkle tree blocks.

To reduce the ondisk footprint further, I also made the verity
enablement code detect trailing zeroes whenever fsverity tells us to
write a buffer, and elide storing the zeroes.  To further reduce the
footprint of sparse files, I also skip writing merkle tree blocks if the
block contents are entirely hashes of zeroes.
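
A minimal userspace sketch of the two space-saving checks described
above, under stated assumptions: the helper names are hypothetical, the
zero digest is whatever hash fsverity reports for an all-zero data
block, and the block size is assumed to be a multiple of the digest
size.  This is not the code from the patches, only the idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the length of buf once trailing zero bytes are dropped. */
size_t trim_trailing_zeroes(const uint8_t *buf, size_t len)
{
	assert(buf != NULL || len == 0);

	while (len > 0 && buf[len - 1] == 0)
		len--;
	return len;
}

/*
 * Return true if every digest-sized slot in the merkle tree block equals
 * the digest of a zeroed data block, i.e. the block describes only
 * sparse or zeroed regions and need not be stored.
 */
bool block_is_all_zero_hashes(const uint8_t *block, size_t block_size,
			      const uint8_t *zero_digest, size_t digest_size)
{
	size_t off;

	assert(digest_size > 0 && block_size % digest_size == 0);

	for (off = 0; off < block_size; off += digest_size) {
		if (memcmp(block + off, zero_digest, digest_size) != 0)
			return false;
	}
	return true;
}
```

A writer could first call block_is_all_zero_hashes() and skip the block
entirely, and otherwise store only the first trim_trailing_zeroes()
bytes; the reader would then have to pad short or missing blocks back
out before handing them to fsverity.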

Next, I implemented more of the tooling around verity, such as debugger
support and as much fsck support as I can manage without knowing the
internal format of the fsverity information.  I also added support for
xfs_scrub to read fsverity files to validate the consistency of the data
against the merkle tree.

Finally, I added the ability for administrators to turn off fsverity,
which might help with recovering damaged data from an inconsistent file.

From Andrey Albershteyn:

Here's v5 of my patchset of adding fs-verity support to XFS.

This implementation uses extended attributes to store fs-verity
metadata. The Merkle tree blocks are stored in the remote extended
attributes. The names are offsets into the tree.

A few key points of this patchset:
- fs-verity can work with Merkle tree block-based caching (xfs) and
  page-based caching (ext4, f2fs, btrfs)
- iomap does fs-verity verification
- In XFS, fs-verity metadata is stored in extended attributes
- per-sb workqueue for verification processing
- Inodes with fs-verity have a new on-disk diflag
- xfs_attr_get() can return a buffer with an extended attribute
- xfs_buf can allocate double space for Merkle tree blocks. Part of
  the space is used to store the extended attribute data without
  leaf headers
- xfs_buf tracks verified status of merkle tree blocks

Testing:
The patchset is tested with xfstests -g verity on xfs_1k, xfs_4k,
xfs_1k_quota, xfs_4k_quota, ext4_4k, and ext4_4k_quota. With
KMEMLEAK and KASAN enabled. More testing on the way.

Changes from V4:
- Mainly fs-verity changes; removed unnecessary functions
- Replace XFS workqueue with per-sb workqueue created in
  fsverity_set_ops()
- Drop patch with readahead calculation in bytes
Changes from V3:
- redone changes to fs-verity core as previous version had an issue
  on ext4
- add blocks invalidation interface to fs-verity
- move memory ordering primitives out of block status check to fs
  read block function
- add fs-verity verification to iomap instead of general post read
  processing
Changes from V2:
- FS_XFLAG_VERITY extended attribute flag
- Change fs-verity to use Merkle tree blocks instead of expecting
  PAGE references from filesystem
- Change approach in iomap to filesystem provided bio_set and
  submit_io instead of just callouts to filesystem
- Add the possibility for xfs_buf to allocate more space for fs-verity
  extended attributes
- Make the xfs_attr module copy fs-verity blocks inside the xfs_buf,
  so XFS can get data without leaf headers
- Add Merkle tree removal for error path
- Make scrub aware of new dinode flag
Changes from V1:
- Added parent pointer patches for easier testing
- Many issues and refactoring points fixed from the V1 review
- Adjusted for recent changes in fs-verity core (folios, non-4k)
- Dropped disabling of large folios
- Completely new fsverity patches (fix, callout, log_blocksize)
- Change approach to verification in iomap to the same one as in
  write path. Callouts to fs instead of direct fs-verity use.
- New XFS workqueue for post read folio verification
- xfs_attr_get() can return underlying xfs_buf
- xfs_bufs are marked with XBF_VERITY_CHECKED to track verified
  blocks

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=fsverity

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=fsverity

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fsverity
---
Commits in this patchset:
 * xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c
 * xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function
 * xfs: create a helper to compute the blockcount of a max sized remote value
 * xfs: minor cleanups of xfs_attr3_rmt_blocks
 * xfs: use an empty transaction to protect xfs_attr_get from deadlocks
 * xfs: add attribute type for fs-verity
 * xfs: do not use xfs_attr3_rmt_hdr for remote verity value blocks
 * xfs: add fs-verity ro-compat flag
 * xfs: add inode on-disk VERITY flag
 * xfs: initialize fs-verity on file open and cleanup on inode destruction
 * xfs: don't allow to enable DAX on fs-verity sealed inode
 * xfs: disable direct read path for fs-verity files
 * xfs: widen flags argument to the xfs_iflags_* helpers
 * xfs: add fs-verity support
 * xfs: create a per-mount shrinker for verity inodes merkle tree blocks
 * xfs: shrink verity blob cache
 * xfs: don't store trailing zeroes of merkle tree blocks
 * xfs: use merkle tree offset as attr hash
 * xfs: don't bother storing merkle tree blocks for zeroed data blocks
 * xfs: add fs-verity ioctls
 * xfs: advertise fs-verity being available on filesystem
 * xfs: check and repair the verity inode flag state
 * xfs: teach online repair to evaluate fsverity xattrs
 * xfs: report verity failures through the health system
 * xfs: make it possible to disable fsverity
 * xfs: enable ro-compat fs-verity flag
---
 Documentation/filesystems/fsverity.rst |   10 
 fs/verity/enable.c                     |   50 ++
 fs/xfs/Makefile                        |    2 
 fs/xfs/libxfs/xfs_ag.h                 |    8 
 fs/xfs/libxfs/xfs_attr.c               |   35 +
 fs/xfs/libxfs/xfs_attr_leaf.c          |    5 
 fs/xfs/libxfs/xfs_attr_remote.c        |  199 +++++-
 fs/xfs/libxfs/xfs_attr_remote.h        |   12 
 fs/xfs/libxfs/xfs_da_format.h          |   55 ++
 fs/xfs/libxfs/xfs_format.h             |   15 
 fs/xfs/libxfs/xfs_fs.h                 |    2 
 fs/xfs/libxfs/xfs_health.h             |    4 
 fs/xfs/libxfs/xfs_inode_buf.c          |    8 
 fs/xfs/libxfs/xfs_inode_util.c         |    2 
 fs/xfs/libxfs/xfs_log_format.h         |    1 
 fs/xfs/libxfs/xfs_ondisk.h             |    5 
 fs/xfs/libxfs/xfs_sb.c                 |    4 
 fs/xfs/libxfs/xfs_shared.h             |    1 
 fs/xfs/libxfs/xfs_verity.c             |   74 ++
 fs/xfs/libxfs/xfs_verity.h             |   14 
 fs/xfs/scrub/attr.c                    |  145 +++++
 fs/xfs/scrub/attr.h                    |    6 
 fs/xfs/scrub/attr_repair.c             |   51 ++
 fs/xfs/scrub/common.c                  |   68 ++
 fs/xfs/scrub/common.h                  |    3 
 fs/xfs/scrub/inode.c                   |    7 
 fs/xfs/scrub/inode_repair.c            |   36 +
 fs/xfs/scrub/reap.c                    |    4 
 fs/xfs/scrub/trace.c                   |    1 
 fs/xfs/scrub/trace.h                   |   31 +
 fs/xfs/xfs_attr_inactive.c             |    2 
 fs/xfs/xfs_file.c                      |   23 +
 fs/xfs/xfs_fsops.c                     |    6 
 fs/xfs/xfs_fsverity.c                  |  997 ++++++++++++++++++++++++++++++++
 fs/xfs/xfs_fsverity.h                  |   32 +
 fs/xfs/xfs_health.c                    |    1 
 fs/xfs/xfs_icache.c                    |    4 
 fs/xfs/xfs_inode.h                     |   16 -
 fs/xfs/xfs_ioctl.c                     |   22 +
 fs/xfs/xfs_iops.c                      |    4 
 fs/xfs/xfs_mount.c                     |   10 
 fs/xfs/xfs_mount.h                     |    8 
 fs/xfs/xfs_super.c                     |   24 +
 fs/xfs/xfs_trace.c                     |    1 
 fs/xfs/xfs_trace.h                     |   85 +++
 include/linux/fsverity.h               |   24 +
 include/trace/events/fsverity.h        |   13 
 include/uapi/linux/fsverity.h          |    1 
 48 files changed, 2048 insertions(+), 83 deletions(-)
 create mode 100644 fs/xfs/libxfs/xfs_verity.c
 create mode 100644 fs/xfs/libxfs/xfs_verity.h
 create mode 100644 fs/xfs/xfs_fsverity.c
 create mode 100644 fs/xfs/xfs_fsverity.h



* [PATCHSET v5.6] xfsprogs: fs-verity support for XFS
  2024-04-30  3:11 [PATCHBOMB v5.6] fs-verity support for XFS Darrick J. Wong
  2024-04-30  3:18 ` [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks Darrick J. Wong
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
@ 2024-04-30  3:19 ` Darrick J. Wong
  2024-04-30  3:31   ` [PATCH 01/38] fs: add FS_XFLAG_VERITY for verity files Darrick J. Wong
                     ` (37 more replies)
  2024-04-30  3:19 ` [PATCHSET v5.6] fstests: fs-verity support for XFS Darrick J. Wong
  3 siblings, 38 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:19 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong
  Cc: Darrick J. Wong, linux-fsdevel, linux-xfs, fsverity

Hi all,

This patchset adds support for fsverity to XFS.  In keeping with
Andrey's original design, XFS stores all fsverity metadata in the
extended attribute data.  However, I've made a few changes to the code:
First, it now caches merkle tree blocks directly instead of abusing the
buffer cache.  This reduces lookup overhead quite a bit, at a cost of
needing a new shrinker for cached merkle tree blocks.

To reduce the ondisk footprint further, I also made the verity
enablement code detect trailing zeroes whenever fsverity tells us to
write a buffer, and elide storing the zeroes.  To further reduce the
footprint of sparse files, I also skip writing merkle tree blocks if the
block contents are entirely hashes of zeroes.

Next, I implemented more of the tooling around verity, such as debugger
support and as much fsck support as I can manage without knowing the
internal format of the fsverity information.  I also added support for
xfs_scrub to read fsverity files to validate the consistency of the data
against the merkle tree.

Finally, I added the ability for administrators to turn off fsverity,
which might help with recovering damaged data from an inconsistent file.

From Andrey Albershteyn:

Here's v5 of my patchset of adding fs-verity support to XFS.

This implementation uses extended attributes to store fs-verity
metadata. The Merkle tree blocks are stored in the remote extended
attributes. The names are offsets into the tree.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=fsverity

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=fsverity

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fsverity
---
Commits in this patchset:
 * fs: add FS_XFLAG_VERITY for verity files
 * xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c
 * xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function
 * xfs: create a helper to compute the blockcount of a max sized remote value
 * xfs: minor cleanups of xfs_attr3_rmt_blocks
 * xfs: use an empty transaction to protect xfs_attr_get from deadlocks
 * xfs: add attribute type for fs-verity
 * xfs: do not use xfs_attr3_rmt_hdr for remote verity value blocks
 * xfs: add fs-verity ro-compat flag
 * xfs: add inode on-disk VERITY flag
 * xfs: add fs-verity support
 * xfs: use merkle tree offset as attr hash
 * xfs: advertise fs-verity being available on filesystem
 * xfs: report verity failures through the health system
 * xfs: enable ro-compat fs-verity flag
 * libfrog: add fsverity to xfs_report_geom output
 * xfs_db: introduce attr_modify command
 * xfs_db: add ATTR_PARENT support to attr_modify command
 * xfs_db: make attr_set/remove/modify be able to handle fs-verity attrs
 * man: document attr_modify command
 * xfs_db: create hex string as a field type
 * xfs_db: dump verity features and metadata
 * xfs_db: dump merkle tree data
 * xfs_db: dump the verity descriptor
 * xfs_db: don't obfuscate verity xattrs
 * xfs_db: dump the inode verity flag
 * xfs_db: compute hashes of merkle tree blocks
 * xfs_repair: junk fsverity xattrs when unnecessary
 * xfs_repair: clear verity iflag when verity isn't supported
 * xfs_repair: handle verity remote attrs
 * xfs_repair: allow upgrading filesystems with verity
 * xfs_scrub: check verity file metadata
 * xfs_scrub: validate verity file contents when doing a media scan
 * xfs_scrub: use MADV_POPULATE_READ to check verity files
 * xfs_spaceman: report data corruption
 * xfs_io: report fsverity status via statx
 * xfs_io: create magic command to disable verity
 * mkfs.xfs: add verity parameter
---
 configure.ac                    |    1 
 db/Makefile                     |    4 
 db/attr.c                       |  222 +++++++++++++++++++++-
 db/attrset.c                    |  237 ++++++++++++++++++++++-
 db/attrshort.c                  |   68 +++++++
 db/field.c                      |   31 +++
 db/field.h                      |    4 
 db/fprint.c                     |   24 ++
 db/fprint.h                     |    2 
 db/hash.c                       |   21 ++
 db/inode.c                      |    3 
 db/metadump.c                   |   16 +-
 db/sb.c                         |    2 
 db/write.c                      |    2 
 db/write.h                      |    1 
 include/builddefs.in            |    1 
 include/libxfs.h                |    1 
 include/linux.h                 |    4 
 include/platform_defs.h         |   13 +
 include/xfs_mount.h             |    2 
 io/attr.c                       |    2 
 io/scrub.c                      |   47 +++++
 libfrog/fsgeom.c                |    6 -
 libxfs/Makefile                 |    6 -
 libxfs/libxfs_api_defs.h        |    3 
 libxfs/xfs_ag.h                 |    8 +
 libxfs/xfs_attr.c               |   35 +++
 libxfs/xfs_attr_leaf.c          |    5 
 libxfs/xfs_attr_remote.c        |  199 +++++++++++++++----
 libxfs/xfs_attr_remote.h        |   12 +
 libxfs/xfs_da_format.h          |   55 +++++
 libxfs/xfs_format.h             |   15 +
 libxfs/xfs_fs.h                 |    2 
 libxfs/xfs_health.h             |    4 
 libxfs/xfs_inode_buf.c          |    8 +
 libxfs/xfs_inode_util.c         |    2 
 libxfs/xfs_log_format.h         |    1 
 libxfs/xfs_ondisk.h             |    5 
 libxfs/xfs_sb.c                 |    4 
 libxfs/xfs_shared.h             |    1 
 libxfs/xfs_verity.c             |   74 +++++++
 libxfs/xfs_verity.h             |   14 +
 m4/package_libcdev.m4           |   18 ++
 man/man2/ioctl_xfs_bulkstat.2   |    3 
 man/man2/ioctl_xfs_fsgetxattr.2 |    3 
 man/man8/mkfs.xfs.8.in          |    6 +
 man/man8/xfs_admin.8            |    6 +
 man/man8/xfs_db.8               |   47 ++++-
 man/man8/xfs_io.8               |    7 +
 mkfs/lts_4.19.conf              |    1 
 mkfs/lts_5.10.conf              |    1 
 mkfs/lts_5.15.conf              |    1 
 mkfs/lts_5.4.conf               |    1 
 mkfs/lts_6.1.conf               |    1 
 mkfs/lts_6.6.conf               |    1 
 mkfs/xfs_mkfs.c                 |   25 ++
 repair/attr_repair.c            |   44 ++++
 repair/dinode.c                 |   28 +++
 repair/globals.c                |    1 
 repair/globals.h                |    1 
 repair/phase2.c                 |   24 ++
 repair/xfs_repair.c             |   11 +
 scrub/Makefile                  |    4 
 scrub/inodes.h                  |   22 ++
 scrub/phase5.c                  |  182 ++++++++++++++++++
 scrub/phase6.c                  |  402 +++++++++++++++++++++++++++++++++++++++
 spaceman/health.c               |    4 
 67 files changed, 1918 insertions(+), 93 deletions(-)
 create mode 100644 libxfs/xfs_verity.c
 create mode 100644 libxfs/xfs_verity.h



* [PATCHSET v5.6] fstests: fs-verity support for XFS
  2024-04-30  3:11 [PATCHBOMB v5.6] fs-verity support for XFS Darrick J. Wong
                   ` (2 preceding siblings ...)
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
@ 2024-04-30  3:19 ` Darrick J. Wong
  2024-04-30  3:41   ` [PATCH 1/6] common/verity: enable fsverity " Darrick J. Wong
                     ` (6 more replies)
  3 siblings, 7 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:19 UTC (permalink / raw)
  To: aalbersh, zlang, ebiggers, djwong
  Cc: Andrey Albershteyn, fsverity, linux-fsdevel, guan, linux-xfs, fstests

Hi all,

This patchset adds support for fsverity to XFS.  In keeping with
Andrey's original design, XFS stores all fsverity metadata in the
extended attribute data.  However, I've made a few changes to the code:
First, it now caches merkle tree blocks directly instead of abusing the
buffer cache.  This reduces lookup overhead quite a bit, at a cost of
needing a new shrinker for cached merkle tree blocks.

To reduce the ondisk footprint further, I also made the verity
enablement code detect trailing zeroes whenever fsverity tells us to
write a buffer, and elide storing the zeroes.  To further reduce the
footprint of sparse files, I also skip writing merkle tree blocks if the
block contents are entirely hashes of zeroes.

Next, I implemented more of the tooling around verity, such as debugger
support and as much fsck support as I can manage without knowing the
internal format of the fsverity information.  I also added support for
xfs_scrub to read fsverity files to validate the consistency of the data
against the merkle tree.

Finally, I added the ability for administrators to turn off fsverity,
which might help with recovering damaged data from an inconsistent file.

From Andrey Albershteyn:

Here's v5 of my patchset of adding fs-verity support to XFS.

This implementation uses extended attributes to store fs-verity
metadata. The Merkle tree blocks are stored in the remote extended
attributes. The names are offsets into the tree.

From Darrick J. Wong:

This v5.3 patchset builds upon v5.2 of Andrey's patchset to implement
fsverity for XFS.

The biggest thing that I didn't like in the v5 patchset is the abuse of
the data device's buffer cache to store the incore version of the merkle
tree blocks.  Not only do verity state flags end up in xfs_buf, but the
double-alloc flag wastes memory and doesn't remain internally consistent
if the xattrs shift around.

I replaced all of that with a per-inode xarray that indexes incore
merkle tree blocks.  For cache hits, this dramatically reduces the
amount of work that xfs has to do to feed fsverity.  The per-block
overhead is much lower (8 bytes instead of ~300 for xfs_bufs), and we no
longer have to entertain layering violations in the buffer cache.  I
also added a per-filesystem shrinker so that reclaim can cull cached
merkle tree blocks, starting with the leaf tree nodes.

I've also rolled in some changes recommended by the fsverity maintainer,
fixed some organization and naming problems in the xfs code, fixed a
collision in the xfs_inode iflags, and improved dead merkle tree cleanup
per the discussion of the v5 series.  At this point I'm happy enough
with this code to start integrating and testing it in my trees, so it's
time to send it out as a coherent patchset for comments.

For v5.3, I've added bits and pieces of online and offline repair
support, reduced the size of partially filled merkle tree blocks by
removing trailing zeroes, changed the xattr hash function to better
avoid collisions between merkle tree keys, made the fsverity
invalidation bitmap unnecessary, and made it so that we can save space
on sparse verity files by not storing merkle tree blocks that hash
totally zeroed data blocks.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=fsverity

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=fsverity

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fsverity
---
Commits in this patchset:
 * common/verity: enable fsverity for XFS
 * xfs/{021,122}: adapt to fsverity xattrs
 * xfs/122: adapt to fsverity
 * xfs: test xfs_scrub detection and correction of corrupt fsverity metadata
 * xfs: test disabling fsverity
 * common/populate: add verity files to populate xfs images
---
 common/populate    |   24 +++++++++
 common/verity      |   39 ++++++++++++++-
 tests/xfs/021      |    3 +
 tests/xfs/122.out  |    3 +
 tests/xfs/1880     |  135 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/1880.out |   37 ++++++++++++++
 tests/xfs/1881     |  111 +++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/1881.out |   28 +++++++++++
 8 files changed, 378 insertions(+), 2 deletions(-)
 create mode 100755 tests/xfs/1880
 create mode 100644 tests/xfs/1880.out
 create mode 100755 tests/xfs/1881
 create mode 100644 tests/xfs/1881.out



* [PATCH 01/18] fs: add FS_XFLAG_VERITY for verity files
  2024-04-30  3:18 ` [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks Darrick J. Wong
@ 2024-04-30  3:19   ` Darrick J. Wong
  2024-04-30  3:19   ` [PATCH 02/18] fsverity: pass tree_blocksize to end_enable_verity() Darrick J. Wong
                     ` (16 subsequent siblings)
  17 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:19 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Andrey Albershteyn <aalbersh@redhat.com>

Add the extended attribute flag FS_XFLAG_VERITY for inodes with
fs-verity enabled.
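
For readers who want to see what this looks like from userspace, here is
a hedged sketch of querying the flag with FS_IOC_FSGETXATTR.  The helper
names are hypothetical.  The fallback #define uses the value proposed by
this patch and is only needed when the installed headers do not know the
flag; a kernel without this patch would simply never set the bit.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

#ifndef FS_XFLAG_VERITY
#define FS_XFLAG_VERITY	0x00020000	/* value proposed by this patch */
#endif

/*
 * Return 1 if the file reports FS_XFLAG_VERITY, 0 if it does not, and
 * -1 on error with errno set by open() or ioctl().
 */
int file_has_verity_xflag(const char *path)
{
	struct fsxattr fsx;
	int fd, ret, saved_errno;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, FS_IOC_FSGETXATTR, &fsx);
	saved_errno = errno;
	close(fd);
	errno = saved_errno;	/* do not let close() clobber the error */
	if (ret < 0)
		return -1;

	return (fsx.fsx_xflags & FS_XFLAG_VERITY) ? 1 : 0;
}

/*
 * Same test the patch adds to fileattr_set_prepare(): nonzero if the
 * verity bit differs between the old and new xflags.
 */
int verity_xflag_changed(uint32_t old_xflags, uint32_t new_xflags)
{
	return ((old_xflags ^ new_xflags) & FS_XFLAG_VERITY) != 0;
}
```

Because the kernel rejects any FS_IOC_FSSETXATTR call for which the
second helper would return nonzero, the flag is effectively read-only
from this interface; enabling verity still goes through
FS_IOC_ENABLE_VERITY.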

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
[djwong: fix broken verity flag checks]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 Documentation/filesystems/fsverity.rst |    8 ++++++++
 fs/ioctl.c                             |   11 +++++++++++
 include/uapi/linux/fs.h                |    1 +
 3 files changed, 20 insertions(+)


diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst
index 13e4b18e5dbbb..887cdaf162a99 100644
--- a/Documentation/filesystems/fsverity.rst
+++ b/Documentation/filesystems/fsverity.rst
@@ -326,6 +326,14 @@ the file has fs-verity enabled.  This can perform better than
 FS_IOC_GETFLAGS and FS_IOC_MEASURE_VERITY because it doesn't require
 opening the file, and opening verity files can be expensive.
 
+FS_IOC_FSGETXATTR
+-----------------
+
+Since Linux v6.9, the FS_IOC_FSGETXATTR ioctl sets FS_XFLAG_VERITY (0x00020000)
+in the returned flags when the file has verity enabled. Note that this attribute
+cannot be set with FS_IOC_FSSETXATTR as enabling verity requires input
+parameters. See FS_IOC_ENABLE_VERITY.
+
 .. _accessing_verity_files:
 
 Accessing verity files
diff --git a/fs/ioctl.c b/fs/ioctl.c
index fb0628e680c40..d69d0feee4bc6 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -481,6 +481,8 @@ void fileattr_fill_xflags(struct fileattr *fa, u32 xflags)
 		fa->flags |= FS_DAX_FL;
 	if (fa->fsx_xflags & FS_XFLAG_PROJINHERIT)
 		fa->flags |= FS_PROJINHERIT_FL;
+	if (fa->fsx_xflags & FS_XFLAG_VERITY)
+		fa->flags |= FS_VERITY_FL;
 }
 EXPORT_SYMBOL(fileattr_fill_xflags);
 
@@ -511,6 +513,8 @@ void fileattr_fill_flags(struct fileattr *fa, u32 flags)
 		fa->fsx_xflags |= FS_XFLAG_DAX;
 	if (fa->flags & FS_PROJINHERIT_FL)
 		fa->fsx_xflags |= FS_XFLAG_PROJINHERIT;
+	if (fa->flags & FS_VERITY_FL)
+		fa->fsx_xflags |= FS_XFLAG_VERITY;
 }
 EXPORT_SYMBOL(fileattr_fill_flags);
 
@@ -641,6 +645,13 @@ static int fileattr_set_prepare(struct inode *inode,
 	    !(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)))
 		return -EINVAL;
 
+	/*
+	 * Verity cannot be changed through FS_IOC_FSSETXATTR/FS_IOC_SETFLAGS.
+	 * See FS_IOC_ENABLE_VERITY.
+	 */
+	if ((fa->fsx_xflags ^ old_ma->fsx_xflags) & FS_XFLAG_VERITY)
+		return -EINVAL;
+
 	/* Extent size hints of zero turn off the flags. */
 	if (fa->fsx_extsize == 0)
 		fa->fsx_xflags &= ~(FS_XFLAG_EXTSIZE | FS_XFLAG_EXTSZINHERIT);
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 45e4e64fd6643..101d1d71242c7 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -158,6 +158,7 @@ struct fsxattr {
 #define FS_XFLAG_FILESTREAM	0x00004000	/* use filestream allocator */
 #define FS_XFLAG_DAX		0x00008000	/* use DAX for IO */
 #define FS_XFLAG_COWEXTSIZE	0x00010000	/* CoW extent size allocator hint */
+#define FS_XFLAG_VERITY		0x00020000	/* fs-verity enabled */
 #define FS_XFLAG_HASATTR	0x80000000	/* no DIFLAG for this	*/
 
 /* the read-only stuff doesn't really belong here, but any other place is
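
For anyone who wants to poke at the new flag from userspace, here is a
minimal sketch of a caller.  The helper names are made up for
illustration, and the fallback #define only mirrors the value proposed in
this patch; installed uapi headers may not carry it yet.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

/* Value proposed by this patch; older uapi headers will not define it. */
#ifndef FS_XFLAG_VERITY
#define FS_XFLAG_VERITY 0x00020000
#endif

/* Hypothetical helper: test the verity bit in an fsx_xflags word. */
static int xflags_has_verity(uint32_t xflags)
{
	return (xflags & FS_XFLAG_VERITY) != 0;
}

/*
 * Hypothetical helper: returns 1 if @path reports FS_XFLAG_VERITY,
 * 0 if it does not, or a negative errno on failure.
 */
static int file_has_verity(const char *path)
{
	struct fsxattr fsx = { 0 };
	int fd, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
		ret = -errno;
	else
		ret = xflags_has_verity(fsx.fsx_xflags);

	close(fd);
	return ret;
}
```

Going by the fileattr_set_prepare() hunk above, an FS_IOC_FSSETXATTR call
that tries to flip this bit should fail with EINVAL, so a caller can only
read the flag this way.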


* [PATCH 02/18] fsverity: pass tree_blocksize to end_enable_verity()
  2024-04-30  3:18 ` [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks Darrick J. Wong
  2024-04-30  3:19   ` [PATCH 01/18] fs: add FS_XFLAG_VERITY for verity files Darrick J. Wong
@ 2024-04-30  3:19   ` Darrick J. Wong
  2024-04-30  3:20   ` [PATCH 03/18] fsverity: convert verification to use byte instead of page offsets Darrick J. Wong
                     ` (15 subsequent siblings)
  17 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:19 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Andrey Albershteyn <aalbersh@redhat.com>

XFS will need to know tree_blocksize to remove the tree in case of an
error. The size is needed to calculate offsets of particular Merkle
tree blocks.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: I put ebiggers' suggested changes in a separate patch]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
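To illustrate why the block size matters for cleanup: a rough userspace
sketch of walking every Merkle tree block offset, which is roughly what a
filesystem would need to do to remove a partially written tree.  This is
not the XFS code; for_each_tree_block(), walk_tree() and friends are
hypothetical names used only here.

```c
#include <stdint.h>

/* Hypothetical callback type: invoked once per Merkle tree block. */
typedef int (*tree_block_fn)(uint64_t pos, unsigned int size, void *priv);

/*
 * Call @fn with the byte offset of every tree block in
 * [0, merkle_tree_size).  Stops at the first nonzero return value.
 */
static int for_each_tree_block(uint64_t merkle_tree_size,
			       unsigned int tree_blocksize,
			       tree_block_fn fn, void *priv)
{
	uint64_t pos;
	int err;

	if (tree_blocksize == 0)
		return -1;

	for (pos = 0; pos < merkle_tree_size; pos += tree_blocksize) {
		err = fn(pos, tree_blocksize, priv);
		if (err)
			return err;
	}
	return 0;
}

/* Example callback state: count blocks and remember the last offset. */
struct walk_stats {
	unsigned int nr_blocks;
	uint64_t last_pos;
};

static int count_block(uint64_t pos, unsigned int size, void *priv)
{
	struct walk_stats *stats = priv;

	(void)size;
	stats->nr_blocks++;
	stats->last_pos = pos;
	return 0;
}

/* Convenience wrapper that runs the walk and returns the statistics. */
static struct walk_stats walk_tree(uint64_t merkle_tree_size,
				   unsigned int tree_blocksize)
{
	struct walk_stats stats = { 0, 0 };

	for_each_tree_block(merkle_tree_size, tree_blocksize,
			    count_block, &stats);
	return stats;
}
```

Without tree_blocksize the callee only knows the total size and cannot
tell where one block ends and the next begins.
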
 fs/btrfs/verity.c        |    4 +++-
 fs/ext4/verity.c         |    3 ++-
 fs/f2fs/verity.c         |    3 ++-
 fs/verity/enable.c       |    6 ++++--
 include/linux/fsverity.h |    4 +++-
 5 files changed, 14 insertions(+), 6 deletions(-)


diff --git a/fs/btrfs/verity.c b/fs/btrfs/verity.c
index 4042dd6437aef..647a22e07748e 100644
--- a/fs/btrfs/verity.c
+++ b/fs/btrfs/verity.c
@@ -620,6 +620,7 @@ static int btrfs_begin_enable_verity(struct file *filp)
  * @desc:              verity descriptor to write out (NULL in error conditions)
  * @desc_size:         size of the verity descriptor (variable with signatures)
  * @merkle_tree_size:  size of the merkle tree in bytes
+ * @tree_blocksize:    the Merkle tree block size
  *
  * If desc is null, then VFS is signaling an error occurred during verity
  * enable, and we should try to rollback. Otherwise, attempt to finish verity.
@@ -627,7 +628,8 @@ static int btrfs_begin_enable_verity(struct file *filp)
  * Returns 0 on success, negative error code on error.
  */
 static int btrfs_end_enable_verity(struct file *filp, const void *desc,
-				   size_t desc_size, u64 merkle_tree_size)
+				   size_t desc_size, u64 merkle_tree_size,
+				   unsigned int tree_blocksize)
 {
 	struct btrfs_inode *inode = BTRFS_I(file_inode(filp));
 	int ret = 0;
diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
index 2f37e1ea39551..da2095a813492 100644
--- a/fs/ext4/verity.c
+++ b/fs/ext4/verity.c
@@ -189,7 +189,8 @@ static int ext4_write_verity_descriptor(struct inode *inode, const void *desc,
 }
 
 static int ext4_end_enable_verity(struct file *filp, const void *desc,
-				  size_t desc_size, u64 merkle_tree_size)
+				  size_t desc_size, u64 merkle_tree_size,
+				  unsigned int tree_blocksize)
 {
 	struct inode *inode = file_inode(filp);
 	const int credits = 2; /* superblock and inode for ext4_orphan_del() */
diff --git a/fs/f2fs/verity.c b/fs/f2fs/verity.c
index f7bb0c54502c8..8fdac653ff8e8 100644
--- a/fs/f2fs/verity.c
+++ b/fs/f2fs/verity.c
@@ -144,7 +144,8 @@ static int f2fs_begin_enable_verity(struct file *filp)
 }
 
 static int f2fs_end_enable_verity(struct file *filp, const void *desc,
-				  size_t desc_size, u64 merkle_tree_size)
+				  size_t desc_size, u64 merkle_tree_size,
+				  unsigned int tree_blocksize)
 {
 	struct inode *inode = file_inode(filp);
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
diff --git a/fs/verity/enable.c b/fs/verity/enable.c
index c284f46d1b535..04e060880b792 100644
--- a/fs/verity/enable.c
+++ b/fs/verity/enable.c
@@ -274,7 +274,8 @@ static int enable_verity(struct file *filp,
 	 * Serialized with ->begin_enable_verity() by the inode lock.
 	 */
 	inode_lock(inode);
-	err = vops->end_enable_verity(filp, desc, desc_size, params.tree_size);
+	err = vops->end_enable_verity(filp, desc, desc_size, params.tree_size,
+				      params.block_size);
 	inode_unlock(inode);
 	if (err) {
 		fsverity_err(inode, "%ps() failed with err %d",
@@ -300,7 +301,8 @@ static int enable_verity(struct file *filp,
 
 rollback:
 	inode_lock(inode);
-	(void)vops->end_enable_verity(filp, NULL, 0, params.tree_size);
+	(void)vops->end_enable_verity(filp, NULL, 0, params.tree_size,
+				      params.block_size);
 	inode_unlock(inode);
 	goto out;
 }
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 1eb7eae580be7..ac58b19f23d32 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -51,6 +51,7 @@ struct fsverity_operations {
 	 * @desc: the verity descriptor to write, or NULL on failure
 	 * @desc_size: size of verity descriptor, or 0 on failure
 	 * @merkle_tree_size: total bytes the Merkle tree took up
+	 * @tree_blocksize: the Merkle tree block size
 	 *
 	 * If desc == NULL, then enabling verity failed and the filesystem only
 	 * must do any necessary cleanups.  Else, it must also store the given
@@ -65,7 +66,8 @@ struct fsverity_operations {
 	 * Return: 0 on success, -errno on failure
 	 */
 	int (*end_enable_verity)(struct file *filp, const void *desc,
-				 size_t desc_size, u64 merkle_tree_size);
+				 size_t desc_size, u64 merkle_tree_size,
+				 unsigned int tree_blocksize);
 
 	/**
 	 * Get the verity descriptor of the given inode.


* [PATCH 03/18] fsverity: convert verification to use byte instead of page offsets
  2024-04-30  3:18 ` [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks Darrick J. Wong
  2024-04-30  3:19   ` [PATCH 01/18] fs: add FS_XFLAG_VERITY for verity files Darrick J. Wong
  2024-04-30  3:19   ` [PATCH 02/18] fsverity: pass tree_blocksize to end_enable_verity() Darrick J. Wong
@ 2024-04-30  3:20   ` Darrick J. Wong
  2024-05-01  7:33     ` Christoph Hellwig
  2024-04-30  3:20   ` [PATCH 04/18] fsverity: support block-based Merkle tree caching Darrick J. Wong
                     ` (14 subsequent siblings)
  17 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:20 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

Convert all the hash verification code to use byte offsets instead of
page offsets so that fsverity can support implementations that supply
merkle tree information in units of merkle tree blocks instead of pages.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
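A small userspace sketch of the block-wise chunking that the converted
fsverity_read_merkle_tree() loop performs may help reviewers follow the
arithmetic.  It assumes the block size is a power of two, as fs-verity
requires; struct tree_chunk and split_into_block_chunks() are invented
for this note and do not exist in the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* One piece of a byte range that lies within a single tree block. */
struct tree_chunk {
	uint64_t block_pos;		/* byte offset of the block in the tree */
	unsigned int offs_in_block;	/* where the copy starts in the block */
	unsigned int bytes;		/* how many bytes to copy */
};

/*
 * Split [pos, pos + length), clamped to tree_size, into per-block chunks.
 * Returns the number of chunks written to @out (at most @max_out).
 */
static size_t split_into_block_chunks(uint64_t pos, uint64_t length,
				      uint64_t tree_size,
				      unsigned int block_size,
				      struct tree_chunk *out, size_t max_out)
{
	uint64_t end_pos = pos + length < tree_size ? pos + length : tree_size;
	unsigned int offs_in_block = (unsigned int)(pos & (block_size - 1));
	size_t n = 0;

	while (pos < end_pos && n < max_out) {
		uint64_t left_in_range = end_pos - pos;
		unsigned int left_in_block = block_size - offs_in_block;
		unsigned int bytes = left_in_range < left_in_block ?
				     (unsigned int)left_in_range : left_in_block;

		out[n].block_pos = pos - offs_in_block;
		out[n].offs_in_block = offs_in_block;
		out[n].bytes = bytes;
		n++;

		pos += bytes;
		/* Only the first chunk can start in the middle of a block. */
		offs_in_block = 0;
	}
	return n;
}
```

For example, a 5000-byte read at position 1000 with 4096-byte blocks
splits into 3096 bytes from the block at 0 and 1904 bytes from the block
at 4096.
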
 fs/verity/fsverity_private.h |    8 ++
 fs/verity/read_metadata.c    |   65 ++++++++-----------
 fs/verity/verify.c           |  145 ++++++++++++++++++++++++++++--------------
 include/linux/fsverity.h     |   19 ++++++
 4 files changed, 152 insertions(+), 85 deletions(-)
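
For the page-based fallback, a byte position has to be turned back into
a page index plus an offset within that page.  The sketch below shows
that mapping in plain userspace C; page_shift stands in for PAGE_SHIFT,
and struct page_loc, tree_pos_to_page() and hblock_pos() are names made
up for this note.

```c
#include <stdint.h>

/* Where a tree byte position lands in a page-granular cache. */
struct page_loc {
	uint64_t page_idx;		/* index of the page in the tree */
	unsigned int offset_in_page;	/* byte offset inside that page */
};

/* Byte offset of hash block @hblock_idx, as computed in verify_data_block(). */
static uint64_t hblock_pos(unsigned long hblock_idx, unsigned int log_blocksize)
{
	return (uint64_t)hblock_idx << log_blocksize;
}

/* Split a byte position into page index and in-page offset. */
static struct page_loc tree_pos_to_page(uint64_t pos, unsigned int page_shift)
{
	struct page_loc loc;

	loc.page_idx = pos >> page_shift;
	loc.offset_in_page = (unsigned int)(pos & ((1ULL << page_shift) - 1));
	return loc;
}
```

With 1024-byte blocks and 4096-byte pages, block 9 sits at byte 9216,
which is page 2 at offset 1024.  That is the same page the pre-patch code
selected with hblock_idx >> log_blocks_per_page.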


diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index b3506f56e180b..8a41e27413284 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -154,4 +154,12 @@ static inline void fsverity_init_signature(void)
 
 void __init fsverity_init_workqueue(void);
 
+int fsverity_read_merkle_tree_block(struct inode *inode,
+				    const struct merkle_tree_params *params,
+				    u64 pos, unsigned long ra_bytes,
+				    struct fsverity_blockbuf *block);
+
+void fsverity_drop_merkle_tree_block(struct inode *inode,
+				     struct fsverity_blockbuf *block);
+
 #endif /* _FSVERITY_PRIVATE_H */
diff --git a/fs/verity/read_metadata.c b/fs/verity/read_metadata.c
index f58432772d9ea..4011a02f5d32d 100644
--- a/fs/verity/read_metadata.c
+++ b/fs/verity/read_metadata.c
@@ -14,65 +14,54 @@
 
 static int fsverity_read_merkle_tree(struct inode *inode,
 				     const struct fsverity_info *vi,
-				     void __user *buf, u64 offset, int length)
+				     void __user *buf, u64 pos, int length)
 {
-	const struct fsverity_operations *vops = inode->i_sb->s_vop;
-	u64 end_offset;
-	unsigned int offs_in_page;
-	pgoff_t index, last_index;
+	const u64 end_pos = min(pos + length, vi->tree_params.tree_size);
+	struct backing_dev_info *bdi = inode->i_sb->s_bdi;
+	const u64 max_ra_bytes = min((u64)bdi->io_pages << PAGE_SHIFT,
+				     ULONG_MAX);
+	const struct merkle_tree_params *params = &vi->tree_params;
+	unsigned int offs_in_block = pos & (params->block_size - 1);
 	int retval = 0;
 	int err = 0;
 
-	end_offset = min(offset + length, vi->tree_params.tree_size);
-	if (offset >= end_offset)
-		return 0;
-	offs_in_page = offset_in_page(offset);
-	last_index = (end_offset - 1) >> PAGE_SHIFT;
-
 	/*
-	 * Iterate through each Merkle tree page in the requested range and copy
-	 * the requested portion to userspace.  Note that the Merkle tree block
-	 * size isn't important here, as we are returning a byte stream; i.e.,
-	 * we can just work with pages even if the tree block size != PAGE_SIZE.
+	 * Iterate through each Merkle tree block in the requested range and
+	 * copy the requested portion to userspace. Note that we are returning
+	 * a byte stream.
 	 */
-	for (index = offset >> PAGE_SHIFT; index <= last_index; index++) {
-		unsigned long num_ra_pages =
-			min_t(unsigned long, last_index - index + 1,
-			      inode->i_sb->s_bdi->io_pages);
-		unsigned int bytes_to_copy = min_t(u64, end_offset - offset,
-						   PAGE_SIZE - offs_in_page);
-		struct page *page;
-		const void *virt;
+	while (pos < end_pos) {
+		unsigned long ra_bytes;
+		unsigned int bytes_to_copy;
+		struct fsverity_blockbuf block = { };
 
-		page = vops->read_merkle_tree_page(inode, index, num_ra_pages);
-		if (IS_ERR(page)) {
-			err = PTR_ERR(page);
-			fsverity_err(inode,
-				     "Error %d reading Merkle tree page %lu",
-				     err, index);
+		ra_bytes = min_t(unsigned long, end_pos - pos, max_ra_bytes);
+		bytes_to_copy = min_t(u64, end_pos - pos,
+				      params->block_size - offs_in_block);
+
+		err = fsverity_read_merkle_tree_block(inode, &vi->tree_params,
+						      pos - offs_in_block,
+						      ra_bytes, &block);
+		if (err)
 			break;
-		}
 
-		virt = kmap_local_page(page);
-		if (copy_to_user(buf, virt + offs_in_page, bytes_to_copy)) {
-			kunmap_local(virt);
-			put_page(page);
+		if (copy_to_user(buf, block.kaddr + offs_in_block, bytes_to_copy)) {
+			fsverity_drop_merkle_tree_block(inode, &block);
 			err = -EFAULT;
 			break;
 		}
-		kunmap_local(virt);
-		put_page(page);
+		fsverity_drop_merkle_tree_block(inode, &block);
 
 		retval += bytes_to_copy;
 		buf += bytes_to_copy;
-		offset += bytes_to_copy;
+		pos += bytes_to_copy;
 
 		if (fatal_signal_pending(current))  {
 			err = -EINTR;
 			break;
 		}
 		cond_resched();
-		offs_in_page = 0;
+		offs_in_block = 0;
 	}
 	return retval ? retval : err;
 }
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index 4fcad0825a120..1c4a7c63c0a1c 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -13,12 +13,15 @@
 static struct workqueue_struct *fsverity_read_workqueue;
 
 /*
- * Returns true if the hash block with index @hblock_idx in the tree, located in
- * @hpage, has already been verified.
+ * Returns true if the hash @block with index @hblock_idx in the merkle tree
+ * for @inode has already been verified.
  */
-static bool is_hash_block_verified(struct fsverity_info *vi, struct page *hpage,
+static bool is_hash_block_verified(struct inode *inode,
+				   struct fsverity_blockbuf *block,
 				   unsigned long hblock_idx)
 {
+	struct fsverity_info *vi = inode->i_verity_info;
+	struct page *hpage = (struct page *)block->context;
 	unsigned int blocks_per_page;
 	unsigned int i;
 
@@ -90,20 +93,19 @@ static bool is_hash_block_verified(struct fsverity_info *vi, struct page *hpage,
  */
 static bool
 verify_data_block(struct inode *inode, struct fsverity_info *vi,
-		  const void *data, u64 data_pos, unsigned long max_ra_pages)
+		  const void *data, u64 data_pos, unsigned long max_ra_bytes)
 {
 	const struct merkle_tree_params *params = &vi->tree_params;
 	const unsigned int hsize = params->digest_size;
 	int level;
+	unsigned long ra_bytes;
 	u8 _want_hash[FS_VERITY_MAX_DIGEST_SIZE];
 	const u8 *want_hash;
 	u8 real_hash[FS_VERITY_MAX_DIGEST_SIZE];
 	/* The hash blocks that are traversed, indexed by level */
 	struct {
-		/* Page containing the hash block */
-		struct page *page;
-		/* Mapped address of the hash block (will be within @page) */
-		const void *addr;
+		/* Buffer containing the hash block */
+		struct fsverity_blockbuf block;
 		/* Index of the hash block in the tree overall */
 		unsigned long index;
 		/* Byte offset of the wanted hash relative to @addr */
@@ -143,11 +145,9 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 	for (level = 0; level < params->num_levels; level++) {
 		unsigned long next_hidx;
 		unsigned long hblock_idx;
-		pgoff_t hpage_idx;
-		unsigned int hblock_offset_in_page;
+		u64 hblock_pos;
 		unsigned int hoffset;
-		struct page *hpage;
-		const void *haddr;
+		struct fsverity_blockbuf *block = &hblocks[level].block;
 
 		/*
 		 * The index of the block in the current level; also the index
@@ -158,36 +158,29 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		/* Index of the hash block in the tree overall */
 		hblock_idx = params->level_start[level] + next_hidx;
 
-		/* Index of the hash page in the tree overall */
-		hpage_idx = hblock_idx >> params->log_blocks_per_page;
-
-		/* Byte offset of the hash block within the page */
-		hblock_offset_in_page =
-			(hblock_idx << params->log_blocksize) & ~PAGE_MASK;
+		/* Byte offset of the hash block in the tree overall */
+		hblock_pos = (u64)hblock_idx << params->log_blocksize;
 
 		/* Byte offset of the hash within the block */
 		hoffset = (hidx << params->log_digestsize) &
 			  (params->block_size - 1);
 
-		hpage = inode->i_sb->s_vop->read_merkle_tree_page(inode,
-				hpage_idx, level == 0 ? min(max_ra_pages,
-					params->tree_pages - hpage_idx) : 0);
-		if (IS_ERR(hpage)) {
-			fsverity_err(inode,
-				     "Error %ld reading Merkle tree page %lu",
-				     PTR_ERR(hpage), hpage_idx);
+		if (level == 0)
+			ra_bytes = min_t(u64, max_ra_bytes,
+					 params->tree_size - hblock_pos);
+		else
+			ra_bytes = 0;
+
+		if (fsverity_read_merkle_tree_block(inode, params, hblock_pos,
+						    ra_bytes, block) != 0)
 			goto error;
-		}
-		haddr = kmap_local_page(hpage) + hblock_offset_in_page;
-		if (is_hash_block_verified(vi, hpage, hblock_idx)) {
-			memcpy(_want_hash, haddr + hoffset, hsize);
+
+		if (is_hash_block_verified(inode, block, hblock_idx)) {
+			memcpy(_want_hash, block->kaddr + hoffset, hsize);
 			want_hash = _want_hash;
-			kunmap_local(haddr);
-			put_page(hpage);
+			fsverity_drop_merkle_tree_block(inode, block);
 			goto descend;
 		}
-		hblocks[level].page = hpage;
-		hblocks[level].addr = haddr;
 		hblocks[level].index = hblock_idx;
 		hblocks[level].hoffset = hoffset;
 		hidx = next_hidx;
@@ -197,8 +190,8 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 descend:
 	/* Descend the tree verifying hash blocks. */
 	for (; level > 0; level--) {
-		struct page *hpage = hblocks[level - 1].page;
-		const void *haddr = hblocks[level - 1].addr;
+		struct fsverity_blockbuf *block = &hblocks[level - 1].block;
+		const void *haddr = block->kaddr;
 		unsigned long hblock_idx = hblocks[level - 1].index;
 		unsigned int hoffset = hblocks[level - 1].hoffset;
 
@@ -214,11 +207,10 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		if (vi->hash_block_verified)
 			set_bit(hblock_idx, vi->hash_block_verified);
 		else
-			SetPageChecked(hpage);
+			SetPageChecked((struct page *)block->context);
 		memcpy(_want_hash, haddr + hoffset, hsize);
 		want_hash = _want_hash;
-		kunmap_local(haddr);
-		put_page(hpage);
+		fsverity_drop_merkle_tree_block(inode, block);
 	}
 
 	/* Finally, verify the data block. */
@@ -235,16 +227,14 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		     params->hash_alg->name, hsize, want_hash,
 		     params->hash_alg->name, hsize, real_hash);
 error:
-	for (; level > 0; level--) {
-		kunmap_local(hblocks[level - 1].addr);
-		put_page(hblocks[level - 1].page);
-	}
+	for (; level > 0; level--)
+		fsverity_drop_merkle_tree_block(inode, &hblocks[level - 1].block);
 	return false;
 }
 
 static bool
 verify_data_blocks(struct folio *data_folio, size_t len, size_t offset,
-		   unsigned long max_ra_pages)
+		   unsigned long max_ra_bytes)
 {
 	struct inode *inode = data_folio->mapping->host;
 	struct fsverity_info *vi = inode->i_verity_info;
@@ -262,7 +252,7 @@ verify_data_blocks(struct folio *data_folio, size_t len, size_t offset,
 
 		data = kmap_local_folio(data_folio, offset);
 		valid = verify_data_block(inode, vi, data, pos + offset,
-					  max_ra_pages);
+					  max_ra_bytes);
 		kunmap_local(data);
 		if (!valid)
 			return false;
@@ -308,7 +298,7 @@ EXPORT_SYMBOL_GPL(fsverity_verify_blocks);
 void fsverity_verify_bio(struct bio *bio)
 {
 	struct folio_iter fi;
-	unsigned long max_ra_pages = 0;
+	unsigned long max_ra_bytes = 0;
 
 	if (bio->bi_opf & REQ_RAHEAD) {
 		/*
@@ -320,12 +310,12 @@ void fsverity_verify_bio(struct bio *bio)
 		 * This improves sequential read performance, as it greatly
 		 * reduces the number of I/O requests made to the Merkle tree.
 		 */
-		max_ra_pages = bio->bi_iter.bi_size >> (PAGE_SHIFT + 2);
+		max_ra_bytes = bio->bi_iter.bi_size >> 2;
 	}
 
 	bio_for_each_folio_all(fi, bio) {
 		if (!verify_data_blocks(fi.folio, fi.length, fi.offset,
-					max_ra_pages)) {
+					max_ra_bytes)) {
 			bio->bi_status = BLK_STS_IOERR;
 			break;
 		}
@@ -362,3 +352,64 @@ void __init fsverity_init_workqueue(void)
 	if (!fsverity_read_workqueue)
 		panic("failed to allocate fsverity_read_queue");
 }
+
+/**
+ * fsverity_read_merkle_tree_block() - read Merkle tree block
+ * @inode: inode to which this Merkle tree block belongs
+ * @params: merkle tree parameters
+ * @pos: byte position within merkle tree
+ * @ra_bytes: try to read ahead this many bytes
+ * @block: block to be loaded
+ *
+ * This function loads data from a merkle tree.
+ */
+int fsverity_read_merkle_tree_block(struct inode *inode,
+				    const struct merkle_tree_params *params,
+				    u64 pos, unsigned long ra_bytes,
+				    struct fsverity_blockbuf *block)
+{
+	const struct fsverity_operations *vops = inode->i_sb->s_vop;
+	unsigned long page_idx;
+	struct page *page;
+	unsigned long index;
+	unsigned int offset_in_page;
+	int err;
+
+	block->pos = pos;
+	block->size = params->block_size;
+
+	index = pos >> params->log_blocksize;
+	page_idx = index >> params->log_blocks_per_page;
+	offset_in_page = pos & ~PAGE_MASK;
+
+	page = vops->read_merkle_tree_page(inode, page_idx,
+			ra_bytes >> PAGE_SHIFT);
+	if (IS_ERR(page)) {
+		err = PTR_ERR(page);
+		goto bad;
+	}
+
+	block->kaddr = kmap_local_page(page) + offset_in_page;
+	block->context = page;
+	return 0;
+bad:
+	fsverity_err(inode, "Error %d reading Merkle tree block %llu", err,
+			pos);
+	return err;
+}
+
+/**
+ * fsverity_drop_merkle_tree_block() - release resources acquired by
+ * fsverity_read_merkle_tree_block
+ *
+ * @inode: inode to which this Merkle tree block belongs
+ * @block: block to be released
+ */
+void fsverity_drop_merkle_tree_block(struct inode *inode,
+				     struct fsverity_blockbuf *block)
+{
+	kunmap_local(block->kaddr);
+	put_page((struct page *)block->context);
+	block->kaddr = NULL;
+	block->context = NULL;
+}
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index ac58b19f23d32..05f8e89e0f470 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -26,6 +26,25 @@
 /* Arbitrary limit to bound the kmalloc() size.  Can be changed. */
 #define FS_VERITY_MAX_DESCRIPTOR_SIZE	16384
 
+/**
+ * struct fsverity_blockbuf - Merkle Tree block buffer
+ * @context: filesystem private context
+ * @kaddr: virtual address of the block's data
+ * @pos: the position of the block in the Merkle tree (in bytes)
+ * @size: the Merkle tree block size
+ *
+ * Buffer containing a single Merkle Tree block.  When fs-verity wants to read
+ * merkle data from disk, it passes the filesystem a buffer with the @pos,
+ * @index, and @size fields filled out.  The filesystem sets @kaddr and
+ * @context.
+ */
+struct fsverity_blockbuf {
+	void *context;
+	void *kaddr;
+	loff_t pos;
+	unsigned int size;
+};
+
 /* Verity operations for filesystems */
 struct fsverity_operations {
 


* [PATCH 04/18] fsverity: support block-based Merkle tree caching
  2024-04-30  3:18 ` [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-04-30  3:20   ` [PATCH 03/18] fsverity: convert verification to use byte instead of page offsets Darrick J. Wong
@ 2024-04-30  3:20   ` Darrick J. Wong
  2024-05-01  7:36     ` Christoph Hellwig
  2024-04-30  3:20   ` [PATCH 05/18] fsverity: pass the merkle tree block level to fsverity_read_merkle_tree_block Darrick J. Wong
                     ` (13 subsequent siblings)
  17 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:20 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Andrey Albershteyn <aalbersh@redhat.com>

In the current implementation, fs-verity expects the filesystem to
provide pages filled with Merkle tree blocks.  When fs-verity is done
processing the blocks, the page reference is released.  This doesn't fit
well with the way XFS manages its memory.

To allow XFS to integrate fs-verity, this patch teaches the fs-verity
verification code to take Merkle tree blocks instead of page references.
This way ext4, f2fs, and btrfs can still pass page references, and XFS
can pass references to Merkle tree blocks stored in XFS's extended
attribute infrastructure.

To achieve this, the XFS implementation will implement its own incore
merkle tree block cache.  These blocks will be passed to fsverity when
it needs to read a merkle tree block, and dropped by fsverity when
validation completes.  The cache will keep track of whether or not a
given block has already been verified, which will improve performance on
random reads.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
[djwong: fix uninit err variable, remove dependency on bitmap, apply
 various suggestions from maintainer, tighten changelog]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
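To make the read/drop contract concrete, here is a toy userspace model
of a block cache that retains the verified bit across calls.  It is only
a sketch: the toy_* names, the fixed-size slot array and the zero-filled
"load" are all invented for this note and say nothing about how the XFS
cache is actually built.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in mirroring the fields of struct fsverity_blockbuf. */
struct toy_blockbuf {
	void *context;
	void *kaddr;
	int64_t pos;
	unsigned int size;
	unsigned int verified:1;
};

#define TOY_BLOCK_SIZE	1024
#define TOY_NR_SLOTS	8

struct toy_cached_block {
	int in_use;
	int verified;
	int64_t pos;
	unsigned char data[TOY_BLOCK_SIZE];
};

static struct toy_cached_block toy_cache[TOY_NR_SLOTS];

/* Look up (or "load") the block at block->pos and fill in the buffer. */
static int toy_read_merkle_tree_block(struct toy_blockbuf *block)
{
	struct toy_cached_block *slot = NULL, *free_slot = NULL;
	size_t i;

	for (i = 0; i < TOY_NR_SLOTS; i++) {
		if (toy_cache[i].in_use && toy_cache[i].pos == block->pos) {
			slot = &toy_cache[i];
			break;
		}
		if (!toy_cache[i].in_use && !free_slot)
			free_slot = &toy_cache[i];
	}

	if (!slot) {
		/* Cache miss: the block must be reported as unverified. */
		if (!free_slot)
			return -1;	/* a real cache would evict instead */
		slot = free_slot;
		slot->in_use = 1;
		slot->verified = 0;
		slot->pos = block->pos;
		memset(slot->data, 0, sizeof(slot->data));
	}

	block->kaddr = slot->data;
	block->context = slot;
	block->verified = slot->verified;
	return 0;
}

/* Release the buffer and remember a successful verification. */
static void toy_drop_merkle_tree_block(struct toy_blockbuf *block)
{
	struct toy_cached_block *slot = block->context;

	if (block->verified)
		slot->verified = 1;
	block->kaddr = NULL;
	block->context = NULL;
}
```

A second read of the same position after a verified drop comes back with
verified set, which is what lets is_hash_block_verified() short-circuit.
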
 fs/verity/open.c         |   22 +++++++++++++++-
 fs/verity/verify.c       |   41 +++++++++++++++++++++++++++--
 include/linux/fsverity.h |   64 +++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 120 insertions(+), 7 deletions(-)
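
The rule enforced in fsverity_create_info() is that a filesystem
provides either the page hook alone or both block hooks.  As a quick
illustration, the predicate below models that rule in userspace.  The
kernel only emits WARN_ON_ONCE for a bad combination rather than
returning a value, and struct toy_vops, toy_stub() and
toy_vops_consistent() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*toy_fn)(void);

/* Stand-in for the three hooks in struct fsverity_operations. */
struct toy_vops {
	toy_fn read_merkle_tree_page;
	toy_fn read_merkle_tree_block;
	toy_fn drop_merkle_tree_block;
};

/* Placeholder implementation used to populate a hook. */
static void toy_stub(void)
{
}

/* True if the hooks form one of the two supported combinations. */
static bool toy_vops_consistent(const struct toy_vops *v)
{
	if (v->read_merkle_tree_page)
		return !v->read_merkle_tree_block &&
		       !v->drop_merkle_tree_block;
	return v->read_merkle_tree_block && v->drop_merkle_tree_block;
}
```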


diff --git a/fs/verity/open.c b/fs/verity/open.c
index fdeb95eca3af3..4777130322866 100644
--- a/fs/verity/open.c
+++ b/fs/verity/open.c
@@ -180,9 +180,23 @@ static int compute_file_digest(const struct fsverity_hash_alg *hash_alg,
 struct fsverity_info *fsverity_create_info(const struct inode *inode,
 					   struct fsverity_descriptor *desc)
 {
+	const struct fsverity_operations *vops = inode->i_sb->s_vop;
 	struct fsverity_info *vi;
 	int err;
 
+	/*
+	 * If the filesystem implementation supplies Merkle tree content on a
+	 * per-block basis, it must implement both the read and drop functions.
+	 * If it supplies content on a per-page basis, neither should be
+	 * provided.
+	 */
+	if (vops->read_merkle_tree_page)
+		WARN_ON_ONCE(vops->read_merkle_tree_block != NULL ||
+			     vops->drop_merkle_tree_block != NULL);
+	else
+		WARN_ON_ONCE(vops->read_merkle_tree_block == NULL ||
+			     vops->drop_merkle_tree_block == NULL);
+
 	vi = kmem_cache_zalloc(fsverity_info_cachep, GFP_KERNEL);
 	if (!vi)
 		return ERR_PTR(-ENOMEM);
@@ -213,7 +227,13 @@ struct fsverity_info *fsverity_create_info(const struct inode *inode,
 	if (err)
 		goto fail;
 
-	if (vi->tree_params.block_size != PAGE_SIZE) {
+	/*
+	 * If the fs supplies Merkle tree content on a per-page basis and the
+	 * page size doesn't match the block size, fs-verity must use the
+	 * hash_block_verified bitmap instead of PG_checked.
+	 */
+	if (vops->read_merkle_tree_block == NULL &&
+	    vi->tree_params.block_size != PAGE_SIZE) {
 		/*
 		 * When the Merkle tree block size and page size differ, we use
 		 * a bitmap to keep track of which hash blocks have been
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index 1c4a7c63c0a1c..55ada2af290ac 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -20,11 +20,22 @@ static bool is_hash_block_verified(struct inode *inode,
 				   struct fsverity_blockbuf *block,
 				   unsigned long hblock_idx)
 {
+	const struct fsverity_operations *vops = inode->i_sb->s_vop;
 	struct fsverity_info *vi = inode->i_verity_info;
-	struct page *hpage = (struct page *)block->context;
+	struct page *hpage;
 	unsigned int blocks_per_page;
 	unsigned int i;
 
+	/*
+	 * If the filesystem supplies Merkle tree content on a per-block basis,
+	 * rely on the implementation to retain verified status.
+	 */
+	if (vops->read_merkle_tree_block)
+		return block->verified;
+
+	/* Otherwise, the filesystem uses page-based caching. */
+	hpage = (struct page *)block->context;
+
 	/*
 	 * When the Merkle tree block size and page size are the same, then the
 	 * ->hash_block_verified bitmap isn't allocated, and we use PG_checked
@@ -96,6 +107,7 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		  const void *data, u64 data_pos, unsigned long max_ra_bytes)
 {
 	const struct merkle_tree_params *params = &vi->tree_params;
+	const struct fsverity_operations *vops = inode->i_sb->s_vop;
 	const unsigned int hsize = params->digest_size;
 	int level;
 	unsigned long ra_bytes;
@@ -204,7 +216,9 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		 * idempotent, as the same hash block might be verified by
 		 * multiple threads concurrently.
 		 */
-		if (vi->hash_block_verified)
+		if (vops->read_merkle_tree_block)
+			block->verified = true;
+		else if (vi->hash_block_verified)
 			set_bit(hblock_idx, vi->hash_block_verified);
 		else
 			SetPageChecked((struct page *)block->context);
@@ -377,6 +391,19 @@ int fsverity_read_merkle_tree_block(struct inode *inode,
 
 	block->pos = pos;
 	block->size = params->block_size;
+	block->verified = false;
+
+	if (vops->read_merkle_tree_block) {
+		struct fsverity_readmerkle req = {
+			.inode = inode,
+			.ra_bytes = ra_bytes,
+		};
+
+		err = vops->read_merkle_tree_block(&req, block);
+		if (err)
+			goto bad;
+		return 0;
+	}
 
 	index = pos >> params->log_blocksize;
 	page_idx = index >> params->log_blocks_per_page;
@@ -408,8 +435,14 @@ int fsverity_read_merkle_tree_block(struct inode *inode,
 void fsverity_drop_merkle_tree_block(struct inode *inode,
 				     struct fsverity_blockbuf *block)
 {
-	kunmap_local(block->kaddr);
-	put_page((struct page *)block->context);
+	const struct fsverity_operations *vops = inode->i_sb->s_vop;
+
+	if (vops->drop_merkle_tree_block) {
+		vops->drop_merkle_tree_block(block);
+	} else {
+		kunmap_local(block->kaddr);
+		put_page((struct page *)block->context);
+	}
 	block->kaddr = NULL;
 	block->context = NULL;
 }
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 05f8e89e0f470..ad17f8553f9cf 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -32,17 +32,38 @@
  * @kaddr: virtual address of the block's data
  * @pos: the position of the block in the Merkle tree (in bytes)
  * @size: the Merkle tree block size
+ * @verified: has this buffer been validated?
  *
  * Buffer containing a single Merkle Tree block.  When fs-verity wants to read
  * merkle data from disk, it passes the filesystem a buffer with the @pos,
- * @index, and @size fields filled out.  The filesystem sets @kaddr and
- * @context.
+ * @index, and @size fields filled out.  The filesystem sets @kaddr, @context,
+ * and @verified.
+ *
+ * While reading the tree, fs-verity calls ->read_merkle_tree_block followed by
+ * ->drop_merkle_tree_block to tell the filesystem that memory can be freed.
+ *
+ * The context is optional.  The filesystem can use this field to pass state
+ * from ->read_merkle_tree_block through to ->drop_merkle_tree_block.
  */
 struct fsverity_blockbuf {
 	void *context;
 	void *kaddr;
 	loff_t pos;
 	unsigned int size;
+	unsigned int verified:1;
+};
+
+/**
+ * struct fsverity_readmerkle - Request to read a Merkle Tree block buffer
+ * @inode: the inode to read
+ * @ra_bytes: The number of bytes that should be prefetched starting at
+ *		@block->pos if that block isn't already cached.
+ *		Implementations may ignore this argument; it's only a
+ *		performance optimization.
+ */
+struct fsverity_readmerkle {
+	struct inode *inode;
+	unsigned long ra_bytes;
 };
 
 /* Verity operations for filesystems */
@@ -120,12 +141,35 @@ struct fsverity_operations {
 	 *
 	 * Note that this must retrieve a *page*, not necessarily a *block*.
 	 *
+	 * If this function is implemented, do not implement
+	 * ->read_merkle_tree_block or ->drop_merkle_tree_block.
+	 *
 	 * Return: the page on success, ERR_PTR() on failure
 	 */
 	struct page *(*read_merkle_tree_page)(struct inode *inode,
 					      pgoff_t index,
 					      unsigned long num_ra_pages);
 
+	/**
+	 * Read a Merkle tree block of the given inode.
+	 * @req: read request; see struct fsverity_readmerkle
+	 * @block: block buffer that the filesystem points at the block's data
+	 *
+	 * This can be called at any time on an open verity file.  It may be
+	 * called by multiple processes concurrently.
+	 *
+	 * Implementations may cache the @block->verified state in
+	 * ->drop_merkle_tree_block.  They must clear the @block->verified
+	 * flag for a cache miss.
+	 *
+	 * If this function is implemented, ->drop_merkle_tree_block must also
+	 * be implemented.
+	 *
+	 * Return: 0 on success, -errno on failure
+	 */
+	int (*read_merkle_tree_block)(const struct fsverity_readmerkle *req,
+				      struct fsverity_blockbuf *block);
+
 	/**
 	 * Write a Merkle tree block to the given inode.
 	 *
@@ -141,6 +185,22 @@ struct fsverity_operations {
 	 */
 	int (*write_merkle_tree_block)(struct inode *inode, const void *buf,
 				       u64 pos, unsigned int size);
+
+	/**
+	 * Release the reference to a Merkle tree block
+	 *
+	 * @block: the block to release
+	 *
+	 * This is called when fs-verity is done with a block obtained with
+	 * ->read_merkle_tree_block().
+	 *
+	 * Implementations should cache a @block->verified==1 state to avoid
+	 * unnecessary revalidations during later accesses.
+	 *
+	 * If this function is implemented, ->read_merkle_tree_block must also
+	 * be implemented.
+	 */
+	void (*drop_merkle_tree_block)(struct fsverity_blockbuf *block);
 };
 
 #ifdef CONFIG_FS_VERITY


* [PATCH 05/18] fsverity: pass the merkle tree block level to fsverity_read_merkle_tree_block
  2024-04-30  3:18 ` [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-04-30  3:20   ` [PATCH 04/18] fsverity: support block-based Merkle tree caching Darrick J. Wong
@ 2024-04-30  3:20   ` Darrick J. Wong
  2024-04-30  3:20   ` [PATCH 06/18] fsverity: add per-sb workqueue for post read processing Darrick J. Wong
                     ` (12 subsequent siblings)
  17 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:20 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

The XFS fsverity implementation will cache merkle tree blocks on its
own.  It would be great if the shrinker that will be associated with
this cache could guesstimate the amount of pain associated with
reclaiming a cached merkle tree block.  We can use the tree level of a
block as this guesstimate, so export this information if we have it.
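
As a rough illustration of the idea above, a cache shrinker could turn
the tree level into a reclaim cost.  The sketch below is plain userspace
C, not XFS code: reclaim_cost() and its weighting are hypothetical, and
only the level/num_levels values and the -1 streaming-read sentinel are
taken from this patch.

```c
/* Mirrors FSVERITY_STREAMING_READ from this patch. */
#define STREAMING_READ	(-1)

/*
 * Hypothetical cost heuristic: blocks nearer the root cover more data
 * blocks, so evicting them is assumed to cause more re-reads and
 * re-verification later.  Streaming reads carry no level information,
 * so they are treated as the cheapest to reclaim.
 */
static int reclaim_cost(int level, int num_levels)
{
	if (level == STREAMING_READ || num_levels <= 0)
		return 0;
	if (level < 0)
		return 0;
	if (level >= num_levels)
		level = num_levels - 1;	/* clamp bogus input */
	return level + 1;
}
```

With a three-level tree this ranks leaves (level 0) at cost 1 and the
topmost level at cost 3; a real shrinker would likely fold this into
whatever aging scheme it already uses.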

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/verity/fsverity_private.h |    2 +-
 fs/verity/read_metadata.c    |    1 +
 fs/verity/verify.c           |   11 ++++++++---
 include/linux/fsverity.h     |    7 +++++++
 4 files changed, 17 insertions(+), 4 deletions(-)


diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index 8a41e27413284..c1f82a0ea4cfa 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -156,7 +156,7 @@ void __init fsverity_init_workqueue(void);
 
 int fsverity_read_merkle_tree_block(struct inode *inode,
 				    const struct merkle_tree_params *params,
-				    u64 pos, unsigned long ra_bytes,
+				    int level, u64 pos, unsigned long ra_bytes,
 				    struct fsverity_blockbuf *block);
 
 void fsverity_drop_merkle_tree_block(struct inode *inode,
diff --git a/fs/verity/read_metadata.c b/fs/verity/read_metadata.c
index 4011a02f5d32d..3ec6230425d65 100644
--- a/fs/verity/read_metadata.c
+++ b/fs/verity/read_metadata.c
@@ -40,6 +40,7 @@ static int fsverity_read_merkle_tree(struct inode *inode,
 				      params->block_size - offs_in_block);
 
 		err = fsverity_read_merkle_tree_block(inode, &vi->tree_params,
+						      FSVERITY_STREAMING_READ,
 						      pos - offs_in_block,
 						      ra_bytes, &block);
 		if (err)
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index 55ada2af290ac..daf2057dbe839 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -183,8 +183,9 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		else
 			ra_bytes = 0;
 
-		if (fsverity_read_merkle_tree_block(inode, params, hblock_pos,
-						    ra_bytes, block) != 0)
+		if (fsverity_read_merkle_tree_block(inode, params, level,
+						    hblock_pos, ra_bytes,
+						    block) != 0)
 			goto error;
 
 		if (is_hash_block_verified(inode, block, hblock_idx)) {
@@ -371,6 +372,8 @@ void __init fsverity_init_workqueue(void)
  * fsverity_read_merkle_tree_block() - read Merkle tree block
  * @inode: inode to which this Merkle tree block belongs
  * @params: merkle tree parameters
+ * @level: expected level of the block; level 0 is the leaf level, and
+ * FSVERITY_STREAMING_READ (-1) means a streaming read
  * @pos: byte position within merkle tree
  * @ra_bytes: try to read ahead this many bytes
  * @block: block to be loaded
@@ -379,7 +382,7 @@ void __init fsverity_init_workqueue(void)
  */
 int fsverity_read_merkle_tree_block(struct inode *inode,
 				    const struct merkle_tree_params *params,
-				    u64 pos, unsigned long ra_bytes,
+				    int level, u64 pos, unsigned long ra_bytes,
 				    struct fsverity_blockbuf *block)
 {
 	const struct fsverity_operations *vops = inode->i_sb->s_vop;
@@ -396,6 +399,8 @@ int fsverity_read_merkle_tree_block(struct inode *inode,
 	if (vops->read_merkle_tree_block) {
 		struct fsverity_readmerkle req = {
 			.inode = inode,
+			.level = level,
+			.num_levels = params->num_levels,
 			.ra_bytes = ra_bytes,
 		};
 
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index ad17f8553f9cf..15bf33be99d79 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -56,6 +56,9 @@ struct fsverity_blockbuf {
 /**
  * struct fsverity_readmerkle - Request to read a Merkle Tree block buffer
  * @inode: the inode to read
+ * @level: expected level of the block; level 0 is the leaf level.
+ *		A value of FSVERITY_STREAMING_READ means a streaming read.
+ * @num_levels: number of levels in the tree total
  * @ra_bytes: The number of bytes that should be prefetched starting at pos
  *		if the page at @block->offset isn't already cached.
  *		Implementations may ignore this argument; it's only a
@@ -64,8 +67,12 @@ struct fsverity_blockbuf {
 struct fsverity_readmerkle {
 	struct inode *inode;
 	unsigned long ra_bytes;
+	int level;
+	int num_levels;
 };
 
+#define FSVERITY_STREAMING_READ	(-1)
+
 /* Verity operations for filesystems */
 struct fsverity_operations {
 



* [PATCH 06/18] fsverity: add per-sb workqueue for post read processing
  2024-04-30  3:18 ` [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-04-30  3:20   ` [PATCH 05/18] fsverity: pass the merkle tree block level to fsverity_read_merkle_tree_block Darrick J. Wong
@ 2024-04-30  3:20   ` Darrick J. Wong
  2024-04-30  3:21   ` [PATCH 07/18] fsverity: add tracepoints Darrick J. Wong
                     ` (11 subsequent siblings)
  17 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:20 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Andrey Albershteyn <aalbersh@redhat.com>

For XFS, fsverity's global workqueue is not really suitable due to:

1. High priority workqueues are used within XFS to ensure that data
   IO completion cannot stall processing of journal IO completions.
   Hence using a WQ_HIGHPRI workqueue directly in the user data IO
   path is a potential filesystem livelock/deadlock vector.

2. The fsverity workqueue is global - it creates a cross-filesystem
   contention point.

This patch adds a per-filesystem, per-cpu workqueue for fsverity
work. This allows iomap to queue verification work in the read path on
BIO completion.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
[djwong: make it clearer that this workqueue is for verity]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/super.c               |    3 +++
 fs/verity/verify.c       |   14 ++++++++++++++
 include/linux/fs.h       |    2 ++
 include/linux/fsverity.h |   18 ++++++++++++++++++
 4 files changed, 37 insertions(+)


diff --git a/fs/super.c b/fs/super.c
index 69ce6c6009684..7758188039554 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -37,6 +37,7 @@
 #include <linux/user_namespace.h>
 #include <linux/fs_context.h>
 #include <uapi/linux/mount.h>
+#include <linux/fsverity.h>
 #include "internal.h"
 
 static int thaw_super_locked(struct super_block *sb, enum freeze_holder who);
@@ -637,6 +638,8 @@ void generic_shutdown_super(struct super_block *sb)
 			sb->s_dio_done_wq = NULL;
 		}
 
+		fsverity_destroy_wq(sb);
+
 		if (sop->put_super)
 			sop->put_super(sb);
 
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index daf2057dbe839..cd0973c88cdba 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -339,6 +339,20 @@ void fsverity_verify_bio(struct bio *bio)
 EXPORT_SYMBOL_GPL(fsverity_verify_bio);
 #endif /* CONFIG_BLOCK */
 
+int fsverity_init_wq(struct super_block *sb, unsigned int wq_flags,
+		     int max_active)
+{
+	WARN_ON_ONCE(sb->s_verity_wq != NULL);
+
+	sb->s_verity_wq = alloc_workqueue("fsverity/%s", wq_flags, max_active,
+					  sb->s_id);
+	if (!sb->s_verity_wq)
+		return -ENOMEM;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(fsverity_init_wq);
+
 /**
  * fsverity_enqueue_verify_work() - enqueue work on the fs-verity workqueue
  * @work: the work to enqueue
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 95ef7228fd7ba..d2f51fdc62e44 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1232,6 +1232,8 @@ struct super_block {
 #endif
 #ifdef CONFIG_FS_VERITY
 	const struct fsverity_operations *s_vop;
+	/* Completion queue for post read verification */
+	struct workqueue_struct *s_verity_wq;
 #endif
 #if IS_ENABLED(CONFIG_UNICODE)
 	struct unicode_map *s_encoding;
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 15bf33be99d79..c3f04bc0166d3 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -262,6 +262,17 @@ bool fsverity_verify_blocks(struct folio *folio, size_t len, size_t offset);
 void fsverity_verify_bio(struct bio *bio);
 void fsverity_enqueue_verify_work(struct work_struct *work);
 
+int fsverity_init_wq(struct super_block *sb, unsigned int wq_flags,
+		       int max_active);
+
+static inline void fsverity_destroy_wq(struct super_block *sb)
+{
+	if (sb->s_verity_wq) {
+		destroy_workqueue(sb->s_verity_wq);
+		sb->s_verity_wq = NULL;
+	}
+}
+
 #else /* !CONFIG_FS_VERITY */
 
 static inline struct fsverity_info *fsverity_get_info(const struct inode *inode)
@@ -339,6 +350,13 @@ static inline void fsverity_enqueue_verify_work(struct work_struct *work)
 	WARN_ON_ONCE(1);
 }
 
+static inline int fsverity_init_wq(struct super_block *sb, unsigned int wq_flags, int max_active)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void fsverity_destroy_wq(struct super_block *sb) { }
+
 #endif	/* !CONFIG_FS_VERITY */
 
 static inline bool fsverity_verify_folio(struct folio *folio)



* [PATCH 07/18] fsverity: add tracepoints
  2024-04-30  3:18 ` [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-04-30  3:20   ` [PATCH 06/18] fsverity: add per-sb workqueue for post read processing Darrick J. Wong
@ 2024-04-30  3:21   ` Darrick J. Wong
  2024-04-30  3:21   ` [PATCH 08/18] fsverity: pass the new tree size and block size to ->begin_enable_verity Darrick J. Wong
                     ` (10 subsequent siblings)
  17 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:21 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Andrey Albershteyn <aalbersh@redhat.com>

fs-verity previously had debug printk calls, but they were removed. This
patch adds tracepoints in the same places where the printk calls were
used (plus a few additional ones).

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: fix formatting]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 MAINTAINERS                     |    1 
 fs/verity/enable.c              |    4 +
 fs/verity/fsverity_private.h    |    2 +
 fs/verity/init.c                |    1 
 fs/verity/verify.c              |    9 ++
 include/trace/events/fsverity.h |  143 +++++++++++++++++++++++++++++++++++++++
 6 files changed, 160 insertions(+)
 create mode 100644 include/trace/events/fsverity.h


diff --git a/MAINTAINERS b/MAINTAINERS
index f6dc90559341f..e5be0b47b93b3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8825,6 +8825,7 @@ T:	git https://git.kernel.org/pub/scm/fs/fsverity/linux.git
 F:	Documentation/filesystems/fsverity.rst
 F:	fs/verity/
 F:	include/linux/fsverity.h
+F:	include/trace/events/fsverity.h
 F:	include/uapi/linux/fsverity.h
 
 FT260 FTDI USB-HID TO I2C BRIDGE DRIVER
diff --git a/fs/verity/enable.c b/fs/verity/enable.c
index 04e060880b792..9f743f9160100 100644
--- a/fs/verity/enable.c
+++ b/fs/verity/enable.c
@@ -227,6 +227,8 @@ static int enable_verity(struct file *filp,
 	if (err)
 		goto out;
 
+	trace_fsverity_enable(inode, &params);
+
 	/*
 	 * Start enabling verity on this file, serialized by the inode lock.
 	 * Fail if verity is already enabled or is already being enabled.
@@ -269,6 +271,8 @@ static int enable_verity(struct file *filp,
 		goto rollback;
 	}
 
+	trace_fsverity_tree_done(inode, vi, &params);
+
 	/*
 	 * Tell the filesystem to finish enabling verity on the file.
 	 * Serialized with ->begin_enable_verity() by the inode lock.
diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index c1f82a0ea4cfa..c1a306fd1f9b4 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -162,4 +162,6 @@ int fsverity_read_merkle_tree_block(struct inode *inode,
 void fsverity_drop_merkle_tree_block(struct inode *inode,
 				     struct fsverity_blockbuf *block);
 
+#include <trace/events/fsverity.h>
+
 #endif /* _FSVERITY_PRIVATE_H */
diff --git a/fs/verity/init.c b/fs/verity/init.c
index cb2c9aac61ed0..3769d2dc9e3b4 100644
--- a/fs/verity/init.c
+++ b/fs/verity/init.c
@@ -5,6 +5,7 @@
  * Copyright 2019 Google LLC
  */
 
+#define CREATE_TRACE_POINTS
 #include "fsverity_private.h"
 
 #include <linux/ratelimit.h>
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index cd0973c88cdba..c4c5e1c082de5 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -123,6 +123,9 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		/* Byte offset of the wanted hash relative to @addr */
 		unsigned int hoffset;
 	} hblocks[FS_VERITY_MAX_LEVELS];
+
+	trace_fsverity_verify_data_block(inode, params, data_pos);
+
 	/*
 	 * The index of the previous level's block within that level; also the
 	 * index of that block's hash within the current level.
@@ -191,6 +194,9 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		if (is_hash_block_verified(inode, block, hblock_idx)) {
 			memcpy(_want_hash, block->kaddr + hoffset, hsize);
 			want_hash = _want_hash;
+			trace_fsverity_merkle_hit(inode, data_pos, hblock_idx,
+					level,
+					hoffset >> params->log_digestsize);
 			fsverity_drop_merkle_tree_block(inode, block);
 			goto descend;
 		}
@@ -225,6 +231,9 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 			SetPageChecked((struct page *)block->context);
 		memcpy(_want_hash, haddr + hoffset, hsize);
 		want_hash = _want_hash;
+		trace_fsverity_verify_merkle_block(inode,
+				block->pos >> params->log_blocksize,
+				level, hoffset >> params->log_digestsize);
 		fsverity_drop_merkle_tree_block(inode, block);
 	}
 
diff --git a/include/trace/events/fsverity.h b/include/trace/events/fsverity.h
new file mode 100644
index 0000000000000..dab220884b897
--- /dev/null
+++ b/include/trace/events/fsverity.h
@@ -0,0 +1,143 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM fsverity
+
+#if !defined(_TRACE_FSVERITY_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_FSVERITY_H
+
+#include <linux/tracepoint.h>
+
+struct fsverity_descriptor;
+struct merkle_tree_params;
+struct fsverity_info;
+
+TRACE_EVENT(fsverity_enable,
+	TP_PROTO(const struct inode *inode,
+		 const struct merkle_tree_params *params),
+	TP_ARGS(inode, params),
+	TP_STRUCT__entry(
+		__field(ino_t, ino)
+		__field(u64, data_size)
+		__field(unsigned int, block_size)
+		__field(unsigned int, num_levels)
+		__field(u64, tree_size)
+	),
+	TP_fast_assign(
+		__entry->ino = inode->i_ino;
+		__entry->data_size = i_size_read(inode);
+		__entry->block_size = params->block_size;
+		__entry->num_levels = params->num_levels;
+		__entry->tree_size = params->tree_size;
+	),
+	TP_printk("ino %lu data size %llu tree size %llu block size %u levels %u",
+		(unsigned long) __entry->ino,
+		__entry->data_size,
+		__entry->tree_size,
+		__entry->block_size,
+		__entry->num_levels)
+);
+
+TRACE_EVENT(fsverity_tree_done,
+	TP_PROTO(const struct inode *inode, const struct fsverity_info *vi,
+		 const struct merkle_tree_params *params),
+	TP_ARGS(inode, vi, params),
+	TP_STRUCT__entry(
+		__field(ino_t, ino)
+		__field(unsigned int, levels)
+		__field(unsigned int, block_size)
+		__field(u64, tree_size)
+		__dynamic_array(u8, root_hash, params->digest_size)
+		__dynamic_array(u8, file_digest, params->digest_size)
+	),
+	TP_fast_assign(
+		__entry->ino = inode->i_ino;
+		__entry->levels = params->num_levels;
+		__entry->block_size = params->block_size;
+		__entry->tree_size = params->tree_size;
+		memcpy(__get_dynamic_array(root_hash), vi->root_hash, __get_dynamic_array_len(root_hash));
+		memcpy(__get_dynamic_array(file_digest), vi->file_digest, __get_dynamic_array_len(file_digest));
+	),
+	TP_printk("ino %lu levels %u block_size %u tree_size %llu root_hash %s digest %s",
+		(unsigned long) __entry->ino,
+		__entry->levels,
+		__entry->block_size,
+		__entry->tree_size,
+		__print_hex_str(__get_dynamic_array(root_hash), __get_dynamic_array_len(root_hash)),
+		__print_hex_str(__get_dynamic_array(file_digest), __get_dynamic_array_len(file_digest)))
+);
+
+TRACE_EVENT(fsverity_verify_data_block,
+	TP_PROTO(const struct inode *inode,
+		 const struct merkle_tree_params *params,
+		 u64 data_pos),
+	TP_ARGS(inode, params, data_pos),
+	TP_STRUCT__entry(
+		__field(ino_t, ino)
+		__field(u64, data_pos)
+		__field(unsigned int, block_size)
+	),
+	TP_fast_assign(
+		__entry->ino = inode->i_ino;
+		__entry->data_pos = data_pos;
+		__entry->block_size = params->block_size;
+	),
+	TP_printk("ino %lu pos %llu merkle_blocksize %u",
+		(unsigned long) __entry->ino,
+		__entry->data_pos,
+		__entry->block_size)
+);
+
+TRACE_EVENT(fsverity_merkle_hit,
+	TP_PROTO(const struct inode *inode, u64 data_pos,
+		 unsigned long hblock_idx, unsigned int level,
+		 unsigned int hidx),
+	TP_ARGS(inode, data_pos, hblock_idx, level, hidx),
+	TP_STRUCT__entry(
+		__field(ino_t, ino)
+		__field(u64, data_pos)
+		__field(unsigned long, hblock_idx)
+		__field(unsigned int, level)
+		__field(unsigned int, hidx)
+	),
+	TP_fast_assign(
+		__entry->ino = inode->i_ino;
+		__entry->data_pos = data_pos;
+		__entry->hblock_idx = hblock_idx;
+		__entry->level = level;
+		__entry->hidx = hidx;
+	),
+	TP_printk("ino %lu data_pos %llu hblock_idx %lu level %u hidx %u",
+		(unsigned long) __entry->ino,
+		__entry->data_pos,
+		__entry->hblock_idx,
+		__entry->level,
+		__entry->hidx)
+);
+
+TRACE_EVENT(fsverity_verify_merkle_block,
+	TP_PROTO(const struct inode *inode, unsigned long index,
+		 unsigned int level, unsigned int hidx),
+	TP_ARGS(inode, index, level, hidx),
+	TP_STRUCT__entry(
+		__field(ino_t, ino)
+		__field(unsigned long, index)
+		__field(unsigned int, level)
+		__field(unsigned int, hidx)
+	),
+	TP_fast_assign(
+		__entry->ino = inode->i_ino;
+		__entry->index = index;
+		__entry->level = level;
+		__entry->hidx = hidx;
+	),
+	TP_printk("ino %lu index %lu level %u hidx %u",
+		(unsigned long) __entry->ino,
+		__entry->index,
+		__entry->level,
+		__entry->hidx)
+);
+
+#endif /* _TRACE_FSVERITY_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>



* [PATCH 08/18] fsverity: pass the new tree size and block size to ->begin_enable_verity
  2024-04-30  3:18 ` [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks Darrick J. Wong
                     ` (6 preceding siblings ...)
  2024-04-30  3:21   ` [PATCH 07/18] fsverity: add tracepoints Darrick J. Wong
@ 2024-04-30  3:21   ` Darrick J. Wong
  2024-04-30  3:21   ` [PATCH 09/18] fsverity: expose merkle tree geometry to callers Darrick J. Wong
                     ` (9 subsequent siblings)
  17 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:21 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

When starting up the process of enabling fsverity on a file, pass the
new size of the merkle tree and the merkle tree block size to the fs
implementation.  XFS will want this information later to try to clean
out a failed previous enablement attempt.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/btrfs/verity.c        |    3 ++-
 fs/ext4/verity.c         |    3 ++-
 fs/f2fs/verity.c         |    3 ++-
 fs/verity/enable.c       |    3 ++-
 include/linux/fsverity.h |    5 ++++-
 5 files changed, 12 insertions(+), 5 deletions(-)


diff --git a/fs/btrfs/verity.c b/fs/btrfs/verity.c
index 647a22e07748e..a3235571bf02d 100644
--- a/fs/btrfs/verity.c
+++ b/fs/btrfs/verity.c
@@ -578,7 +578,8 @@ static int finish_verity(struct btrfs_inode *inode, const void *desc,
  *
  * Returns 0 on success, negative error code on failure.
  */
-static int btrfs_begin_enable_verity(struct file *filp)
+static int btrfs_begin_enable_verity(struct file *filp, u64 merkle_tree_size,
+				     unsigned int tree_blocksize)
 {
 	struct btrfs_inode *inode = BTRFS_I(file_inode(filp));
 	struct btrfs_root *root = inode->root;
diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
index da2095a813492..a8ae8c912cb5d 100644
--- a/fs/ext4/verity.c
+++ b/fs/ext4/verity.c
@@ -99,7 +99,8 @@ static int pagecache_write(struct inode *inode, const void *buf, size_t count,
 	return 0;
 }
 
-static int ext4_begin_enable_verity(struct file *filp)
+static int ext4_begin_enable_verity(struct file *filp, u64 merkle_tree_size,
+				    unsigned int tree_blocksize)
 {
 	struct inode *inode = file_inode(filp);
 	const int credits = 2; /* superblock and inode for ext4_orphan_add() */
diff --git a/fs/f2fs/verity.c b/fs/f2fs/verity.c
index 8fdac653ff8e8..595d702c2c5c4 100644
--- a/fs/f2fs/verity.c
+++ b/fs/f2fs/verity.c
@@ -115,7 +115,8 @@ struct fsverity_descriptor_location {
 	__le64 pos;
 };
 
-static int f2fs_begin_enable_verity(struct file *filp)
+static int f2fs_begin_enable_verity(struct file *filp, u64 merkle_tree_size,
+				    unsigned int tree_blocksize)
 {
 	struct inode *inode = file_inode(filp);
 	int err;
diff --git a/fs/verity/enable.c b/fs/verity/enable.c
index 9f743f9160100..1d4a6de960149 100644
--- a/fs/verity/enable.c
+++ b/fs/verity/enable.c
@@ -237,7 +237,8 @@ static int enable_verity(struct file *filp,
 	if (IS_VERITY(inode))
 		err = -EEXIST;
 	else
-		err = vops->begin_enable_verity(filp);
+		err = vops->begin_enable_verity(filp, params.tree_size,
+				      params.block_size);
 	inode_unlock(inode);
 	if (err)
 		goto out;
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index c3f04bc0166d3..7c51d7cf835ec 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -80,6 +80,8 @@ struct fsverity_operations {
 	 * Begin enabling verity on the given file.
 	 *
 	 * @filp: a readonly file descriptor for the file
+	 * @merkle_tree_size: total bytes the Merkle tree will take up
+	 * @tree_blocksize: the Merkle tree block size
 	 *
 	 * The filesystem must do any needed filesystem-specific preparations
 	 * for enabling verity, e.g. evicting inline data.  It also must return
@@ -89,7 +91,8 @@ struct fsverity_operations {
 	 *
 	 * Return: 0 on success, -errno on failure
 	 */
-	int (*begin_enable_verity)(struct file *filp);
+	int (*begin_enable_verity)(struct file *filp, u64 merkle_tree_size,
+				   unsigned int tree_blocksize);
 
 	/**
 	 * End enabling verity on the given file.



* [PATCH 09/18] fsverity: expose merkle tree geometry to callers
  2024-04-30  3:18 ` [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks Darrick J. Wong
                     ` (7 preceding siblings ...)
  2024-04-30  3:21   ` [PATCH 08/18] fsverity: pass the new tree size and block size to ->begin_enable_verity Darrick J. Wong
@ 2024-04-30  3:21   ` Darrick J. Wong
  2024-04-30  3:22   ` [PATCH 10/18] fsverity: box up the write_merkle_tree_block parameters too Darrick J. Wong
                     ` (8 subsequent siblings)
  17 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:21 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

Create a function that will return selected information about the
geometry of the merkle tree.  Online fsck for XFS will need this piece
to perform basic checks of the merkle tree.
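
For readers wondering what "geometry" covers here, the sketch below
derives a Merkle tree's depth and total size from the file size, tree
block size and digest size, using the usual fs-verity layout rule (hash
each level into the next until a single block remains).  It is a
userspace illustration only; struct merkle_geom and merkle_geometry()
are hypothetical names, and overflow handling for very large sizes is
omitted.

```c
#include <stdint.h>

#define MAX_LEVELS 8	/* same limit as FS_VERITY_MAX_LEVELS */

struct merkle_geom {
	unsigned int num_levels;	/* 0 if the data fits in one block */
	uint64_t tree_size;		/* bytes used by all tree blocks */
};

/*
 * Assumes block_size and digest_size are nonzero and that
 * block_size >= 2 * digest_size.  Returns 0 on success, or -1 if the
 * tree would need more than MAX_LEVELS levels.
 */
static int merkle_geometry(uint64_t data_size, unsigned int block_size,
			   unsigned int digest_size, struct merkle_geom *g)
{
	uint64_t hashes_per_block = block_size / digest_size;
	uint64_t blocks = (data_size + block_size - 1) / block_size;

	g->num_levels = 0;
	g->tree_size = 0;

	/* Each pass computes how many hash blocks cover the level below. */
	while (blocks > 1) {
		if (g->num_levels >= MAX_LEVELS)
			return -1;
		blocks = (blocks + hashes_per_block - 1) / hashes_per_block;
		g->tree_size += blocks * block_size;
		g->num_levels++;
	}
	return 0;
}
```

For a 1 MiB file with 4096-byte tree blocks and 32-byte digests this
yields two levels and a 12288-byte tree (two leaf blocks plus one root
block), which is the sort of value a checker could compare against what
is actually stored.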

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/verity/open.c         |   32 ++++++++++++++++++++++++++++++++
 include/linux/fsverity.h |   10 ++++++++++
 2 files changed, 42 insertions(+)


diff --git a/fs/verity/open.c b/fs/verity/open.c
index 4777130322866..aa71a4d3cbff1 100644
--- a/fs/verity/open.c
+++ b/fs/verity/open.c
@@ -427,6 +427,38 @@ void __fsverity_cleanup_inode(struct inode *inode)
 }
 EXPORT_SYMBOL_GPL(__fsverity_cleanup_inode);
 
+/**
+ * fsverity_merkle_tree_geometry() - return Merkle tree geometry
+ * @inode: the inode to query
+ * @block_size: will be set to the size of a merkle tree block, in bytes
+ * @tree_size: will be set to the size of the merkle tree, in bytes
+ *
+ * Callers are not required to have opened the file.
+ *
+ * Return: 0 for success, -ENODATA if verity is not enabled, or any of the
+ * error codes that can result from loading verity information while opening a
+ * file.
+ */
+int fsverity_merkle_tree_geometry(struct inode *inode, unsigned int *block_size,
+				  u64 *tree_size)
+{
+	struct fsverity_info *vi;
+	int error;
+
+	if (!IS_VERITY(inode))
+		return -ENODATA;
+
+	error = ensure_verity_info(inode);
+	if (error)
+		return error;
+
+	vi = inode->i_verity_info;
+	*block_size = vi->tree_params.block_size;
+	*tree_size = vi->tree_params.tree_size;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(fsverity_merkle_tree_geometry);
+
 void __init fsverity_init_info_cache(void)
 {
 	fsverity_info_cachep = KMEM_CACHE_USERCOPY(
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 7c51d7cf835ec..a3a5b68bed0d3 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -243,6 +243,9 @@ int __fsverity_file_open(struct inode *inode, struct file *filp);
 int __fsverity_prepare_setattr(struct dentry *dentry, struct iattr *attr);
 void __fsverity_cleanup_inode(struct inode *inode);
 
+int fsverity_merkle_tree_geometry(struct inode *inode, unsigned int *block_size,
+				  u64 *tree_size);
+
 /**
  * fsverity_cleanup_inode() - free the inode's verity info, if present
  * @inode: an inode being evicted
@@ -326,6 +329,13 @@ static inline void fsverity_cleanup_inode(struct inode *inode)
 {
 }
 
+static inline int fsverity_merkle_tree_geometry(struct inode *inode,
+						unsigned int *block_size,
+						u64 *tree_size)
+{
+	return -EOPNOTSUPP;
+}
+
 /* read_metadata.c */
 
 static inline int fsverity_ioctl_read_metadata(struct file *filp,



* [PATCH 10/18] fsverity: box up the write_merkle_tree_block parameters too
  2024-04-30  3:18 ` [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks Darrick J. Wong
                     ` (8 preceding siblings ...)
  2024-04-30  3:21   ` [PATCH 09/18] fsverity: expose merkle tree geometry to callers Darrick J. Wong
@ 2024-04-30  3:22   ` Darrick J. Wong
  2024-04-30  3:22   ` [PATCH 11/18] fsverity: pass the zero-hash value to the implementation Darrick J. Wong
                     ` (7 subsequent siblings)
  17 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:22 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

Box up the tree write request parameters into a structure so that we can
add more in the next few patches.
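
The benefit of a request structure can be seen in a small standalone
sketch: a callback that takes a pointer to the request keeps the same
signature when members are added, and callers that use designated
initializers leave the new members zeroed.  struct write_req and
demo_write_block() are hypothetical stand-ins, not the fsverity types.

```c
#include <stddef.h>

/* Hypothetical stand-in for a request such as struct fsverity_writemerkle. */
struct write_req {
	void *inode;				/* the original, sole member */
	const unsigned char *zero_digest;	/* added later; NULL if unset */
	unsigned int digest_size;		/* added later; 0 if unset */
};

/*
 * A callback written before the extra members existed needs no changes:
 * it simply never looks at them.
 */
static int demo_write_block(const struct write_req *req, const void *buf,
			    unsigned long long pos, unsigned int size)
{
	(void)pos;
	if (!req || !buf || size == 0)
		return -1;
	return 0;
}
```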

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/btrfs/verity.c        |    6 ++++--
 fs/ext4/verity.c         |    7 +++++--
 fs/f2fs/verity.c         |    7 +++++--
 fs/verity/enable.c       |    5 ++++-
 include/linux/fsverity.h |   15 ++++++++++++---
 5 files changed, 30 insertions(+), 10 deletions(-)


diff --git a/fs/btrfs/verity.c b/fs/btrfs/verity.c
index a3235571bf02d..576547c0f9e54 100644
--- a/fs/btrfs/verity.c
+++ b/fs/btrfs/verity.c
@@ -790,9 +790,11 @@ static struct page *btrfs_read_merkle_tree_page(struct inode *inode,
  *
  * Returns 0 on success or negative error code on failure
  */
-static int btrfs_write_merkle_tree_block(struct inode *inode, const void *buf,
-					 u64 pos, unsigned int size)
+static int btrfs_write_merkle_tree_block(const struct fsverity_writemerkle *req,
+					 const void *buf, u64 pos,
+					 unsigned int size)
 {
+	struct inode *inode = req->inode;
 	loff_t merkle_pos = merkle_file_pos(inode);
 
 	if (merkle_pos < 0)
diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
index a8ae8c912cb5d..27eb2d51cce2f 100644
--- a/fs/ext4/verity.c
+++ b/fs/ext4/verity.c
@@ -382,9 +382,12 @@ static struct page *ext4_read_merkle_tree_page(struct inode *inode,
 	return folio_file_page(folio, index);
 }
 
-static int ext4_write_merkle_tree_block(struct inode *inode, const void *buf,
-					u64 pos, unsigned int size)
+static int ext4_write_merkle_tree_block(const struct fsverity_writemerkle *req,
+					const void *buf, u64 pos,
+					unsigned int size)
 {
+	struct inode *inode = req->inode;
+
 	pos += ext4_verity_metadata_pos(inode);
 
 	return pagecache_write(inode, buf, size, pos);
diff --git a/fs/f2fs/verity.c b/fs/f2fs/verity.c
index 595d702c2c5c4..f8d974818f3bb 100644
--- a/fs/f2fs/verity.c
+++ b/fs/f2fs/verity.c
@@ -279,9 +279,12 @@ static struct page *f2fs_read_merkle_tree_page(struct inode *inode,
 	return folio_file_page(folio, index);
 }
 
-static int f2fs_write_merkle_tree_block(struct inode *inode, const void *buf,
-					u64 pos, unsigned int size)
+static int f2fs_write_merkle_tree_block(const struct fsverity_writemerkle *req,
+					const void *buf, u64 pos,
+					unsigned int size)
 {
+	struct inode *inode = req->inode;
+
 	pos += f2fs_verity_metadata_pos(inode);
 
 	return pagecache_write(inode, buf, size, pos);
diff --git a/fs/verity/enable.c b/fs/verity/enable.c
index 1d4a6de960149..233b20fb12ff5 100644
--- a/fs/verity/enable.c
+++ b/fs/verity/enable.c
@@ -50,10 +50,13 @@ static int write_merkle_tree_block(struct inode *inode, const u8 *buf,
 				   unsigned long index,
 				   const struct merkle_tree_params *params)
 {
+	struct fsverity_writemerkle req = {
+		.inode = inode,
+	};
 	u64 pos = (u64)index << params->log_blocksize;
 	int err;
 
-	err = inode->i_sb->s_vop->write_merkle_tree_block(inode, buf, pos,
+	err = inode->i_sb->s_vop->write_merkle_tree_block(&req, buf, pos,
 							  params->block_size);
 	if (err)
 		fsverity_err(inode, "Error %d writing Merkle tree block %lu",
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index a3a5b68bed0d3..710006552804d 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -73,6 +73,14 @@ struct fsverity_readmerkle {
 
 #define FSVERITY_STREAMING_READ	(-1)
 
+/**
+ * struct fsverity_writemerkle - Request to write a Merkle Tree block buffer
+ * @inode: the inode to write
+ */
+struct fsverity_writemerkle {
+	struct inode *inode;
+};
+
 /* Verity operations for filesystems */
 struct fsverity_operations {
 
@@ -183,7 +191,7 @@ struct fsverity_operations {
 	/**
 	 * Write a Merkle tree block to the given inode.
 	 *
-	 * @inode: the inode for which the Merkle tree is being built
+	 * @req: write request; see struct fsverity_writemerkle
 	 * @buf: the Merkle tree block to write
 	 * @pos: the position of the block in the Merkle tree (in bytes)
 	 * @size: the Merkle tree block size (in bytes)
@@ -193,8 +201,9 @@ struct fsverity_operations {
 	 *
 	 * Return: 0 on success, -errno on failure
 	 */
-	int (*write_merkle_tree_block)(struct inode *inode, const void *buf,
-				       u64 pos, unsigned int size);
+	int (*write_merkle_tree_block)(const struct fsverity_writemerkle *req,
+				       const void *buf, u64 pos,
+				       unsigned int size);
 
 	/**
 	 * Release the reference to a Merkle tree block



* [PATCH 11/18] fsverity: pass the zero-hash value to the implementation
  2024-04-30  3:18 ` [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks Darrick J. Wong
                     ` (9 preceding siblings ...)
  2024-04-30  3:22   ` [PATCH 10/18] fsverity: box up the write_merkle_tree_block parameters too Darrick J. Wong
@ 2024-04-30  3:22   ` Darrick J. Wong
  2024-04-30  3:22   ` [PATCH 12/18] fsverity: report validation errors back to the filesystem Darrick J. Wong
                     ` (6 subsequent siblings)
  17 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:22 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

Compute the hash of one merkle tree block's worth of zeroes and pass a
reference to this to the merkle tree read and write functions.  A
filesystem implementation can decide to elide merkle tree blocks
containing only this hash and synthesize the contents at read time.

Let's pretend that there's a file containing six data blocks whose
merkle tree looks roughly like this:

root
 +--leaf0
 |   +--data0
 |   +--data1
 |   `--data2
 `--leaf1
     +--data3
     +--data4
     `--data5

If data[0-2] are sparse holes, then leaf0 will contain a repeating
sequence of @zero_digest.  Therefore, leaf0 need not be written to disk
because its contents can be synthesized.

A subsequent xfs patch will use this to reduce the size of the merkle
tree when dealing with sparse gold master disk images and the like.
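
As a rough userspace sketch of the elision idea described above (not
code from this series), a filesystem could test whether a block is just
repeated copies of @zero_digest before writing it, and rebuild such a
block on read.  The demo_* names are hypothetical, and the sketch
assumes the block is fully populated, i.e. block_size is an exact
multiple of digest_size with no trailing padding:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if @buf holds nothing but
 * back-to-back copies of @zero_digest, in which case the block could
 * be skipped on write.
 */
static bool demo_block_is_zero_digests(const unsigned char *buf,
				       size_t block_size,
				       const unsigned char *zero_digest,
				       size_t digest_size)
{
	size_t off;

	assert(digest_size > 0 && block_size % digest_size == 0);

	for (off = 0; off < block_size; off += digest_size) {
		if (memcmp(buf + off, zero_digest, digest_size) != 0)
			return false;
	}
	return true;
}

/*
 * Hypothetical helper: rebuild an elided block at read time by filling
 * @buf with repeated copies of @zero_digest.
 */
static void demo_synthesize_block(unsigned char *buf, size_t block_size,
				  const unsigned char *zero_digest,
				  size_t digest_size)
{
	size_t off;

	assert(digest_size > 0 && block_size % digest_size == 0);

	for (off = 0; off < block_size; off += digest_size)
		memcpy(buf + off, zero_digest, digest_size);
}
```

In the tree drawn above, leaf0 would pass the first check when
data[0-2] are holes, so only leaf1 and root would need to be stored.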

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/verity/enable.c           |    2 ++
 fs/verity/fsverity_private.h |    3 +++
 fs/verity/open.c             |    7 +++++++
 fs/verity/verify.c           |    2 ++
 include/linux/fsverity.h     |    8 ++++++++
 5 files changed, 22 insertions(+)


diff --git a/fs/verity/enable.c b/fs/verity/enable.c
index 233b20fb12ff5..8c6fe4b72b14e 100644
--- a/fs/verity/enable.c
+++ b/fs/verity/enable.c
@@ -52,6 +52,8 @@ static int write_merkle_tree_block(struct inode *inode, const u8 *buf,
 {
 	struct fsverity_writemerkle req = {
 		.inode = inode,
+		.zero_digest = params->zero_digest,
+		.digest_size = params->digest_size,
 	};
 	u64 pos = (u64)index << params->log_blocksize;
 	int err;
diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index c1a306fd1f9b4..20208425e56fc 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -47,6 +47,9 @@ struct merkle_tree_params {
 	u64 tree_size;			/* Merkle tree size in bytes */
 	unsigned long tree_pages;	/* Merkle tree size in pages */
 
+	/* the hash of a merkle block-sized buffer of zeroes */
+	u8 zero_digest[FS_VERITY_MAX_DIGEST_SIZE];
+
 	/*
 	 * Starting block index for each tree level, ordered from leaf level (0)
 	 * to root level ('num_levels - 1')
diff --git a/fs/verity/open.c b/fs/verity/open.c
index aa71a4d3cbff1..c9d858d99f4ac 100644
--- a/fs/verity/open.c
+++ b/fs/verity/open.c
@@ -144,6 +144,13 @@ int fsverity_init_merkle_tree_params(struct merkle_tree_params *params,
 		goto out_err;
 	}
 
+	err = fsverity_hash_block(params, inode, page_address(ZERO_PAGE(0)),
+				   params->zero_digest);
+	if (err) {
+		fsverity_err(inode, "Error %d computing zero digest", err);
+		goto out_err;
+	}
+
 	params->tree_size = offset << log_blocksize;
 	params->tree_pages = PAGE_ALIGN(params->tree_size) >> PAGE_SHIFT;
 	return 0;
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index c4c5e1c082de5..0782a69d89f26 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -425,6 +425,8 @@ int fsverity_read_merkle_tree_block(struct inode *inode,
 			.level = level,
 			.num_levels = params->num_levels,
 			.ra_bytes = ra_bytes,
+			.zero_digest = params->zero_digest,
+			.digest_size = params->digest_size,
 		};
 
 		err = vops->read_merkle_tree_block(&req, block);
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 710006552804d..dc8f85380b9c7 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -63,12 +63,16 @@ struct fsverity_blockbuf {
  *		if the page at @block->offset isn't already cached.
  *		Implementations may ignore this argument; it's only a
  *		performance optimization.
+ * @zero_digest: the hash of a merkle block-sized buffer of zeroes
+ * @digest_size: size of zero_digest, in bytes
  */
 struct fsverity_readmerkle {
 	struct inode *inode;
 	unsigned long ra_bytes;
 	int level;
 	int num_levels;
+	const u8 *zero_digest;
+	unsigned int digest_size;
 };
 
 #define FSVERITY_STREAMING_READ	(-1)
@@ -76,9 +80,13 @@ struct fsverity_readmerkle {
 /**
  * struct fsverity_writemerkle - Request to write a Merkle Tree block buffer
  * @inode: the inode to read
+ * @zero_digest: the hash of a merkle block-sized buffer of zeroes
+ * @digest_size: size of zero_digest, in bytes
  */
 struct fsverity_writemerkle {
 	struct inode *inode;
+	const u8 *zero_digest;
+	unsigned int digest_size;
 };
 
 /* Verity operations for filesystems */



* [PATCH 12/18] fsverity: report validation errors back to the filesystem
  2024-04-30  3:18 ` [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks Darrick J. Wong
                     ` (10 preceding siblings ...)
  2024-04-30  3:22   ` [PATCH 11/18] fsverity: pass the zero-hash value to the implementation Darrick J. Wong
@ 2024-04-30  3:22   ` Darrick J. Wong
  2024-04-30  3:22   ` [PATCH 13/18] fsverity: pass super_block to fsverity_enqueue_verify_work Darrick J. Wong
                     ` (5 subsequent siblings)
  17 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:22 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

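Add an optional ->file_corrupt method to fsverity_operations so that
validation errors can be reported back to the filesystem.  It is called,
along with a new tracepoint, when verify_data_block finds file data or a
Merkle tree block that does not match the expected hash.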
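
A minimal userspace model of the optional-hook pattern, assuming
simplified stand-ins for the kernel types (every demo_* name is
hypothetical and exists only for illustration).  The caller checks the
function pointer before calling it, so filesystems that do not
implement the hook are unaffected:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct inode; fields exist only for the demo. */
struct demo_inode {
	unsigned int	nr_corrupt;	/* how many reports were received */
	long long	last_pos;	/* file position of the last report */
	size_t		last_len;	/* length of the last report */
};

/* Simplified stand-in for struct fsverity_operations. */
struct demo_verity_ops {
	/* Optional; may be NULL if the filesystem does not care. */
	void (*file_corrupt)(struct demo_inode *inode, long long pos,
			     size_t len);
};

/* What a filesystem's hook might do: record the bad range. */
static void demo_fs_file_corrupt(struct demo_inode *inode, long long pos,
				 size_t len)
{
	inode->nr_corrupt++;
	inode->last_pos = pos;
	inode->last_len = len;
}

/* Model of the verifier side: report only if the hook is implemented. */
static void demo_report_corruption(const struct demo_verity_ops *vops,
				   struct demo_inode *inode, long long pos,
				   size_t len)
{
	assert(vops != NULL);

	if (vops->file_corrupt)
		vops->file_corrupt(inode, pos, len);
}
```

A real implementation would presumably feed the range into the
filesystem's own health reporting rather than a counter.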

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/verity/verify.c              |    3 +++
 include/linux/fsverity.h        |   14 ++++++++++++++
 include/trace/events/fsverity.h |   19 +++++++++++++++++++
 3 files changed, 36 insertions(+)


diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index 0782a69d89f26..2c1de3cdf24c8 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -250,6 +250,9 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		     data_pos, level - 1,
 		     params->hash_alg->name, hsize, want_hash,
 		     params->hash_alg->name, hsize, real_hash);
+	trace_fsverity_file_corrupt(inode, data_pos, params->block_size);
+	if (vops->file_corrupt)
+		vops->file_corrupt(inode, data_pos, params->block_size);
 error:
 	for (; level > 0; level--)
 		fsverity_drop_merkle_tree_block(inode, &hblocks[level - 1].block);
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index dc8f85380b9c7..6849c4e8268f8 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -228,6 +228,20 @@ struct fsverity_operations {
 	 * be implemented.
 	 */
 	void (*drop_merkle_tree_block)(struct fsverity_blockbuf *block);
+
+	/**
+	 * Notify the filesystem that file data is corrupt.
+	 *
+	 * @inode: the inode being validated
+	 * @pos: the file position of the invalid data
+	 * @len: the length of the invalid data
+	 *
+	 * This function is called when fs-verity detects that a portion of a
+	 * file's data is inconsistent with the Merkle tree, or a Merkle tree
+	 * block needed to validate the data is inconsistent with the level
+	 * above it.
+	 */
+	void (*file_corrupt)(struct inode *inode, loff_t pos, size_t len);
 };
 
 #ifdef CONFIG_FS_VERITY
diff --git a/include/trace/events/fsverity.h b/include/trace/events/fsverity.h
index dab220884b897..375fdddac6a99 100644
--- a/include/trace/events/fsverity.h
+++ b/include/trace/events/fsverity.h
@@ -137,6 +137,25 @@ TRACE_EVENT(fsverity_verify_merkle_block,
 		__entry->hidx)
 );
 
+TRACE_EVENT(fsverity_file_corrupt,
+	TP_PROTO(const struct inode *inode, loff_t pos, size_t len),
+	TP_ARGS(inode, pos, len),
+	TP_STRUCT__entry(
+		__field(ino_t, ino)
+		__field(loff_t, pos)
+		__field(size_t, len)
+	),
+	TP_fast_assign(
+		__entry->ino = inode->i_ino;
+		__entry->pos = pos;
+		__entry->len = len;
+	),
+	TP_printk("ino %lu pos %llu len %zu",
+		(unsigned long) __entry->ino,
+		__entry->pos,
+		__entry->len)
+);
+
 #endif /* _TRACE_FSVERITY_H */
 
 /* This part must be outside protection */



* [PATCH 13/18] fsverity: pass super_block to fsverity_enqueue_verify_work
  2024-04-30  3:18 ` [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks Darrick J. Wong
                     ` (11 preceding siblings ...)
  2024-04-30  3:22   ` [PATCH 12/18] fsverity: report validation errors back to the filesystem Darrick J. Wong
@ 2024-04-30  3:22   ` Darrick J. Wong
  2024-04-30  3:23   ` [PATCH 14/18] ext4: use a per-superblock fsverity workqueue Darrick J. Wong
                     ` (4 subsequent siblings)
  17 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:22 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

In preparation for having per-superblock fsverity workqueues, pass the
super_block object to fsverity_enqueue_verify_work.
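
For illustration only, the queue selection this patch introduces can be
modelled in userspace as below.  The demo_* types are hypothetical
stand-ins; the patch itself uses the GNU "?:" shorthand, which is
spelled out here as a plain ternary.  A later patch in this series
drops the fallback once every filesystem has its own workqueue:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for workqueue_struct and super_block. */
struct demo_wq {
	const char	*name;
};

struct demo_sb {
	struct demo_wq	*s_verity_wq;	/* NULL until the fs sets one up */
};

/* Models the old system-wide queue that serves as the fallback. */
static struct demo_wq demo_global_wq = { "fsverity_read_queue" };

/* Prefer the per-superblock queue; fall back to the global one. */
static struct demo_wq *demo_pick_wq(const struct demo_sb *sb)
{
	assert(sb != NULL);

	return sb->s_verity_wq ? sb->s_verity_wq : &demo_global_wq;
}
```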

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/buffer.c              |    7 +++++--
 fs/ext4/readpage.c       |    4 +++-
 fs/f2fs/compress.c       |    3 ++-
 fs/f2fs/data.c           |    2 +-
 fs/verity/verify.c       |    6 ++++--
 include/linux/fsverity.h |    6 ++++--
 6 files changed, 19 insertions(+), 9 deletions(-)


diff --git a/fs/buffer.c b/fs/buffer.c
index 4f73d23c2c469..b871fbc796e83 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -327,13 +327,15 @@ static void decrypt_bh(struct work_struct *work)
 	err = fscrypt_decrypt_pagecache_blocks(bh->b_folio, bh->b_size,
 					       bh_offset(bh));
 	if (err == 0 && need_fsverity(bh)) {
+		struct super_block *sb = bh->b_folio->mapping->host->i_sb;
+
 		/*
 		 * We use different work queues for decryption and for verity
 		 * because verity may require reading metadata pages that need
 		 * decryption, and we shouldn't recurse to the same workqueue.
 		 */
 		INIT_WORK(&ctx->work, verify_bh);
-		fsverity_enqueue_verify_work(&ctx->work);
+		fsverity_enqueue_verify_work(sb, &ctx->work);
 		return;
 	}
 	end_buffer_async_read(bh, err == 0);
@@ -362,7 +364,8 @@ static void end_buffer_async_read_io(struct buffer_head *bh, int uptodate)
 				fscrypt_enqueue_decrypt_work(&ctx->work);
 			} else {
 				INIT_WORK(&ctx->work, verify_bh);
-				fsverity_enqueue_verify_work(&ctx->work);
+				fsverity_enqueue_verify_work(inode->i_sb,
+							     &ctx->work);
 			}
 			return;
 		}
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index 21e8f0aebb3c6..d3915a3f5da7c 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -61,6 +61,7 @@ enum bio_post_read_step {
 
 struct bio_post_read_ctx {
 	struct bio *bio;
+	struct super_block *sb;
 	struct work_struct work;
 	unsigned int cur_step;
 	unsigned int enabled_steps;
@@ -132,7 +133,7 @@ static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
 	case STEP_VERITY:
 		if (ctx->enabled_steps & (1 << STEP_VERITY)) {
 			INIT_WORK(&ctx->work, verity_work);
-			fsverity_enqueue_verify_work(&ctx->work);
+			fsverity_enqueue_verify_work(ctx->sb, &ctx->work);
 			return;
 		}
 		ctx->cur_step++;
@@ -195,6 +196,7 @@ static void ext4_set_bio_post_read_ctx(struct bio *bio,
 			mempool_alloc(bio_post_read_ctx_pool, GFP_NOFS);
 
 		ctx->bio = bio;
+		ctx->sb = inode->i_sb;
 		ctx->enabled_steps = post_read_steps;
 		bio->bi_private = ctx;
 	}
diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
index 8892c82621414..efd0b0a3a2c37 100644
--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -1775,7 +1775,8 @@ void f2fs_decompress_end_io(struct decompress_io_ctx *dic, bool failed,
 		 * file, and these metadata pages may be compressed.
 		 */
 		INIT_WORK(&dic->verity_work, f2fs_verify_cluster);
-		fsverity_enqueue_verify_work(&dic->verity_work);
+		fsverity_enqueue_verify_work(dic->inode->i_sb,
+					     &dic->verity_work);
 		return;
 	}
 
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index d9494b5fc7c18..994339216a06e 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -221,7 +221,7 @@ static void f2fs_verify_and_finish_bio(struct bio *bio, bool in_task)
 
 	if (ctx && (ctx->enabled_steps & STEP_VERITY)) {
 		INIT_WORK(&ctx->work, f2fs_verify_bio);
-		fsverity_enqueue_verify_work(&ctx->work);
+		fsverity_enqueue_verify_work(ctx->sbi->sb, &ctx->work);
 	} else {
 		f2fs_finish_read_bio(bio, in_task);
 	}
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index 2c1de3cdf24c8..e1fab60303d6d 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -367,13 +367,15 @@ EXPORT_SYMBOL_GPL(fsverity_init_wq);
 
 /**
  * fsverity_enqueue_verify_work() - enqueue work on the fs-verity workqueue
+ * @sb: superblock for this filesystem
  * @work: the work to enqueue
  *
  * Enqueue verification work for asynchronous processing.
  */
-void fsverity_enqueue_verify_work(struct work_struct *work)
+void fsverity_enqueue_verify_work(struct super_block *sb,
+				  struct work_struct *work)
 {
-	queue_work(fsverity_read_workqueue, work);
+	queue_work(sb->s_verity_wq ?: fsverity_read_workqueue, work);
 }
 EXPORT_SYMBOL_GPL(fsverity_enqueue_verify_work);
 
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 6849c4e8268f8..1336f4b9011ea 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -297,7 +297,8 @@ int fsverity_ioctl_read_metadata(struct file *filp, const void __user *uarg);
 
 bool fsverity_verify_blocks(struct folio *folio, size_t len, size_t offset);
 void fsverity_verify_bio(struct bio *bio);
-void fsverity_enqueue_verify_work(struct work_struct *work);
+void fsverity_enqueue_verify_work(struct super_block *sb,
+				  struct work_struct *work);
 
 int fsverity_init_wq(struct super_block *sb, unsigned int wq_flags,
 		       int max_active);
@@ -389,7 +390,8 @@ static inline void fsverity_verify_bio(struct bio *bio)
 	WARN_ON_ONCE(1);
 }
 
-static inline void fsverity_enqueue_verify_work(struct work_struct *work)
+static inline void fsverity_enqueue_verify_work(struct super_block *sb,
+						struct work_struct *work)
 {
 	WARN_ON_ONCE(1);
 }



* [PATCH 14/18] ext4: use a per-superblock fsverity workqueue
  2024-04-30  3:18 ` [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks Darrick J. Wong
                     ` (12 preceding siblings ...)
  2024-04-30  3:22   ` [PATCH 13/18] fsverity: pass super_block to fsverity_enqueue_verify_work Darrick J. Wong
@ 2024-04-30  3:23   ` Darrick J. Wong
  2024-04-30  3:23   ` [PATCH 15/18] f2fs: " Darrick J. Wong
                     ` (3 subsequent siblings)
  17 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:23 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

Switch ext4 to use a per-sb fsverity workqueue instead of a systemwide
workqueue.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/ext4/super.c |   11 +++++++++++
 1 file changed, 11 insertions(+)


diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 044135796f2b6..d54c74c222999 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5332,6 +5332,17 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
 #endif
 #ifdef CONFIG_FS_VERITY
 	sb->s_vop = &ext4_verityops;
+	/*
+	 * Use a high-priority workqueue to prioritize verification work, which
+	 * blocks reads from completing, over regular application tasks.
+	 *
+	 * For performance reasons, don't use an unbound workqueue.  Using an
+	 * unbound workqueue for crypto operations causes excessive scheduler
+	 * latency on ARM64.
+	 */
+	err = fsverity_init_wq(sb, WQ_HIGHPRI, num_online_cpus());
+	if (err)
+		goto failed_mount3a;
 #endif
 #ifdef CONFIG_QUOTA
 	sb->dq_op = &ext4_quota_operations;



* [PATCH 15/18] f2fs: use a per-superblock fsverity workqueue
  2024-04-30  3:18 ` [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks Darrick J. Wong
                     ` (13 preceding siblings ...)
  2024-04-30  3:23   ` [PATCH 14/18] ext4: use a per-superblock fsverity workqueue Darrick J. Wong
@ 2024-04-30  3:23   ` Darrick J. Wong
  2024-04-30  3:23   ` [PATCH 16/18] btrfs: " Darrick J. Wong
                     ` (2 subsequent siblings)
  17 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:23 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

Switch f2fs to use a per-sb fsverity workqueue instead of a systemwide
workqueue.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/f2fs/super.c |   11 +++++++++++
 1 file changed, 11 insertions(+)


diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index a4bc26dfdb1af..06ac11bb2d214 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -4423,6 +4423,17 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 #endif
 #ifdef CONFIG_FS_VERITY
 	sb->s_vop = &f2fs_verityops;
+	/*
+	 * Use a high-priority workqueue to prioritize verification work, which
+	 * blocks reads from completing, over regular application tasks.
+	 *
+	 * For performance reasons, don't use an unbound workqueue.  Using an
+	 * unbound workqueue for crypto operations causes excessive scheduler
+	 * latency on ARM64.
+	 */
+	err = fsverity_init_wq(sb, WQ_HIGHPRI, num_online_cpus());
+	if (err)
+		goto free_bio_info;
 #endif
 	sb->s_xattr = f2fs_xattr_handlers;
 	sb->s_export_op = &f2fs_export_ops;



* [PATCH 16/18] btrfs: use a per-superblock fsverity workqueue
  2024-04-30  3:18 ` [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks Darrick J. Wong
                     ` (14 preceding siblings ...)
  2024-04-30  3:23   ` [PATCH 15/18] f2fs: " Darrick J. Wong
@ 2024-04-30  3:23   ` Darrick J. Wong
  2024-04-30  3:23   ` [PATCH 17/18] fsverity: remove system-wide workqueue Darrick J. Wong
  2024-04-30  3:24   ` [PATCH 18/18] iomap: integrate fs-verity verification into iomap's read path Darrick J. Wong
  17 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:23 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

Switch btrfs to use a per-sb fsverity workqueue instead of a systemwide
workqueue.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/btrfs/super.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)


diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 7e44ccaf348f2..937f0491c01e5 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -28,6 +28,7 @@
 #include <linux/btrfs.h>
 #include <linux/security.h>
 #include <linux/fs_parser.h>
+#include <linux/fsverity.h>
 #include "messages.h"
 #include "delayed-inode.h"
 #include "ctree.h"
@@ -924,6 +925,19 @@ static int btrfs_fill_super(struct super_block *sb,
 	sb->s_export_op = &btrfs_export_ops;
 #ifdef CONFIG_FS_VERITY
 	sb->s_vop = &btrfs_verityops;
+	/*
+	 * Use a high-priority workqueue to prioritize verification work, which
+	 * blocks reads from completing, over regular application tasks.
+	 *
+	 * For performance reasons, don't use an unbound workqueue.  Using an
+	 * unbound workqueue for crypto operations causes excessive scheduler
+	 * latency on ARM64.
+	 */
+	err = fsverity_init_wq(sb, WQ_HIGHPRI, num_online_cpus());
+	if (err) {
+		btrfs_err(fs_info, "fsverity_init_wq failed");
+		return err;
+	}
 #endif
 	sb->s_xattr = btrfs_xattr_handlers;
 	sb->s_time_gran = 1;



* [PATCH 17/18] fsverity: remove system-wide workqueue
  2024-04-30  3:18 ` [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks Darrick J. Wong
                     ` (15 preceding siblings ...)
  2024-04-30  3:23   ` [PATCH 16/18] btrfs: " Darrick J. Wong
@ 2024-04-30  3:23   ` Darrick J. Wong
  2024-04-30  3:24   ` [PATCH 18/18] iomap: integrate fs-verity verification into iomap's read path Darrick J. Wong
  17 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:23 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

Now that we've made the verity workqueue per-superblock, we don't need
the systemwide workqueue.  Get rid of the old implementation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/verity/fsverity_private.h |    2 --
 fs/verity/init.c             |    1 -
 fs/verity/verify.c           |   21 +--------------------
 3 files changed, 1 insertion(+), 23 deletions(-)


diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index 20208425e56fc..b6273615f76af 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -155,8 +155,6 @@ static inline void fsverity_init_signature(void)
 
 /* verify.c */
 
-void __init fsverity_init_workqueue(void);
-
 int fsverity_read_merkle_tree_block(struct inode *inode,
 				    const struct merkle_tree_params *params,
 				    int level, u64 pos, unsigned long ra_bytes,
diff --git a/fs/verity/init.c b/fs/verity/init.c
index 3769d2dc9e3b4..4663696c6996c 100644
--- a/fs/verity/init.c
+++ b/fs/verity/init.c
@@ -66,7 +66,6 @@ static int __init fsverity_init(void)
 {
 	fsverity_check_hash_algs();
 	fsverity_init_info_cache();
-	fsverity_init_workqueue();
 	fsverity_init_sysctl();
 	fsverity_init_signature();
 	fsverity_init_bpf();
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index e1fab60303d6d..a30eac895338e 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -10,8 +10,6 @@
 #include <crypto/hash.h>
 #include <linux/bio.h>
 
-static struct workqueue_struct *fsverity_read_workqueue;
-
 /*
  * Returns true if the hash @block with index @hblock_idx in the merkle tree
  * for @inode has already been verified.
@@ -375,27 +373,10 @@ EXPORT_SYMBOL_GPL(fsverity_init_wq);
 void fsverity_enqueue_verify_work(struct super_block *sb,
 				  struct work_struct *work)
 {
-	queue_work(sb->s_verity_wq ?: fsverity_read_workqueue, work);
+	queue_work(sb->s_verity_wq, work);
 }
 EXPORT_SYMBOL_GPL(fsverity_enqueue_verify_work);
 
-void __init fsverity_init_workqueue(void)
-{
-	/*
-	 * Use a high-priority workqueue to prioritize verification work, which
-	 * blocks reads from completing, over regular application tasks.
-	 *
-	 * For performance reasons, don't use an unbound workqueue.  Using an
-	 * unbound workqueue for crypto operations causes excessive scheduler
-	 * latency on ARM64.
-	 */
-	fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
-						  WQ_HIGHPRI,
-						  num_online_cpus());
-	if (!fsverity_read_workqueue)
-		panic("failed to allocate fsverity_read_queue");
-}
-
 /**
  * fsverity_read_merkle_tree_block() - read Merkle tree block
  * @inode: inode to which this Merkle tree block belongs



* [PATCH 18/18] iomap: integrate fs-verity verification into iomap's read path
  2024-04-30  3:18 ` [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks Darrick J. Wong
                     ` (16 preceding siblings ...)
  2024-04-30  3:23   ` [PATCH 17/18] fsverity: remove system-wide workqueue Darrick J. Wong
@ 2024-04-30  3:24   ` Darrick J. Wong
  2024-05-01  7:10     ` Christoph Hellwig
  17 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:24 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: Christoph Hellwig, linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Andrey Albershteyn <aalbersh@redhat.com>

Integrate fs-verity verification into iomap's read path.  After a read
bio completes, its data is verified against the file's fs-verity Merkle
tree.  Verification work is done in a separate, per-superblock
workqueue.

When fs-verity is active on an inode, read bios are allocated from a
dedicated bioset so that each bio is stored side by side with the
work_struct (struct iomap_fsverity_bio) used to queue its verification.
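
The side-by-side layout matters because the completion handlers only
receive a pointer to one embedded member (the bio, then the work item)
and must get back to the enclosing structure.  A userspace sketch of
that container_of() round trip, using hypothetical demo_* stand-ins
for the kernel types:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for work_struct and bio. */
struct demo_work {
	int	pending;
};

struct demo_bio {
	int	status;
};

/* Same shape as struct iomap_fsverity_bio: work item first, bio last. */
struct demo_fsverity_bio {
	struct demo_work	work;
	struct demo_bio		bio;
};

/* bio completion: recover the wrapper from the embedded bio. */
static struct demo_fsverity_bio *demo_fbio_from_bio(struct demo_bio *bio)
{
	assert(bio != NULL);
	return demo_container_of(bio, struct demo_fsverity_bio, bio);
}

/* workqueue callback: recover the wrapper from the embedded work item. */
static struct demo_fsverity_bio *demo_fbio_from_work(struct demo_work *work)
{
	assert(work != NULL);
	return demo_container_of(work, struct demo_fsverity_bio, work);
}
```

In the patch, the bioset is initialized with a front pad of
offsetof(struct iomap_fsverity_bio, bio), which is what reserves room
for the work item ahead of every bio allocated from it.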

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: fix doc warning]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/iomap/buffered-io.c |  133 +++++++++++++++++++++++++++++++++++++++++++++---
 include/linux/iomap.h  |    5 ++
 2 files changed, 131 insertions(+), 7 deletions(-)


diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 4e8e41c8b3c0e..0167f820914ff 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -6,6 +6,7 @@
 #include <linux/module.h>
 #include <linux/compiler.h>
 #include <linux/fs.h>
+#include <linux/fsverity.h>
 #include <linux/iomap.h>
 #include <linux/pagemap.h>
 #include <linux/uio.h>
@@ -23,6 +24,8 @@
 
 #define IOEND_BATCH_SIZE	4096
 
+#define IOMAP_POOL_SIZE		(4 * (PAGE_SIZE / SECTOR_SIZE))
+
 typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length);
 /*
  * Structure allocated for each folio to track per-block uptodate, dirty state
@@ -368,6 +371,111 @@ static inline bool iomap_block_needs_zeroing(const struct iomap_iter *iter,
 		pos >= i_size_read(iter->inode);
 }
 
+#ifdef CONFIG_FS_VERITY
+struct iomap_fsverity_bio {
+	struct work_struct	work;
+	struct bio		bio;
+};
+static struct bio_set *iomap_fsverity_bioset;
+
+static int iomap_fsverity_init_bioset(void)
+{
+	struct bio_set *bs, *old;
+	int error;
+
+	bs = kzalloc(sizeof(*bs), GFP_KERNEL);
+	if (!bs)
+		return -ENOMEM;
+
+	error = bioset_init(bs, IOMAP_POOL_SIZE,
+			    offsetof(struct iomap_fsverity_bio, bio),
+			    BIOSET_NEED_BVECS);
+	if (error) {
+		kfree(bs);
+		return error;
+	}
+
+	/*
+	 * This has to be atomic as readaheads can race to create the
+	 * bioset.  If someone set the pointer before us, we drop ours.
+	 */
+	old = cmpxchg(&iomap_fsverity_bioset, NULL, bs);
+	if (old) {
+		bioset_exit(bs);
+		kfree(bs);
+	}
+
+	return 0;
+}
+
+int iomap_init_fsverity(struct super_block *sb, unsigned int wq_flags,
+			int max_active)
+{
+	int ret;
+
+	if (!iomap_fsverity_bioset) {
+		ret = iomap_fsverity_init_bioset();
+		if (ret)
+			return ret;
+	}
+
+	return fsverity_init_wq(sb, wq_flags, max_active);
+}
+EXPORT_SYMBOL_GPL(iomap_init_fsverity);
+
+static void
+iomap_read_fsverify_end_io_work(struct work_struct *work)
+{
+	struct iomap_fsverity_bio *fbio =
+		container_of(work, struct iomap_fsverity_bio, work);
+
+	fsverity_verify_bio(&fbio->bio);
+	iomap_read_end_io(&fbio->bio);
+}
+
+static void
+iomap_read_fsverity_end_io(struct bio *bio)
+{
+	struct iomap_fsverity_bio *fbio =
+		container_of(bio, struct iomap_fsverity_bio, bio);
+
+	INIT_WORK(&fbio->work, iomap_read_fsverify_end_io_work);
+	queue_work(bio->bi_private, &fbio->work);
+}
+
+static struct bio *
+iomap_fsverity_read_bio_alloc(struct inode *inode, struct block_device *bdev,
+			    int nr_vecs, gfp_t gfp)
+{
+	struct bio *bio;
+
+	bio = bio_alloc_bioset(bdev, nr_vecs, REQ_OP_READ, gfp,
+			iomap_fsverity_bioset);
+	if (bio) {
+		bio->bi_private = inode->i_sb->s_verity_wq;
+		bio->bi_end_io = iomap_read_fsverity_end_io;
+	}
+	return bio;
+}
+#else
+# define iomap_fsverity_read_bio_alloc(...)	(NULL)
+# define iomap_fsverity_init_bioset(...)	(-EOPNOTSUPP)
+#endif /* CONFIG_FS_VERITY */
+
+static struct bio *iomap_read_bio_alloc(struct inode *inode,
+		struct block_device *bdev, int nr_vecs, gfp_t gfp)
+{
+	struct bio *bio;
+
+	if (fsverity_active(inode))
+		return iomap_fsverity_read_bio_alloc(inode, bdev, nr_vecs, gfp);
+
+	bio = bio_alloc(bdev, nr_vecs, REQ_OP_READ, gfp);
+	if (bio)
+		bio->bi_end_io = iomap_read_end_io;
+	return bio;
+}
+
 static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 		struct iomap_readpage_ctx *ctx, loff_t offset)
 {
@@ -380,6 +488,10 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 	size_t poff, plen;
 	sector_t sector;
 
+	/* Fail reads from broken fsverity files immediately. */
+	if (IS_VERITY(iter->inode) && !fsverity_active(iter->inode))
+		return -EIO;
+
 	if (iomap->type == IOMAP_INLINE)
 		return iomap_read_inline_data(iter, folio);
 
@@ -391,6 +503,12 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 
 	if (iomap_block_needs_zeroing(iter, pos)) {
 		folio_zero_range(folio, poff, plen);
+		if (fsverity_active(iter->inode) &&
+		    !fsverity_verify_blocks(folio, plen, poff)) {
+			folio_set_error(folio);
+			goto done;
+		}
+
 		iomap_set_range_uptodate(folio, poff, plen);
 		goto done;
 	}
@@ -408,28 +526,29 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 	    !bio_add_folio(ctx->bio, folio, plen, poff)) {
 		gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
 		gfp_t orig_gfp = gfp;
-		unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
 
 		if (ctx->bio)
 			submit_bio(ctx->bio);
 
 		if (ctx->rac) /* same as readahead_gfp_mask */
 			gfp |= __GFP_NORETRY | __GFP_NOWARN;
-		ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
-				     REQ_OP_READ, gfp);
+
+		ctx->bio = iomap_read_bio_alloc(iter->inode, iomap->bdev,
+				bio_max_segs(DIV_ROUND_UP(length, PAGE_SIZE)),
+				gfp);
+
 		/*
 		 * If the bio_alloc fails, try it again for a single page to
 		 * avoid having to deal with partial page reads.  This emulates
 		 * what do_mpage_read_folio does.
 		 */
 		if (!ctx->bio) {
-			ctx->bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ,
-					     orig_gfp);
+			ctx->bio = iomap_read_bio_alloc(iter->inode,
+					iomap->bdev, 1, orig_gfp);
 		}
 		if (ctx->rac)
 			ctx->bio->bi_opf |= REQ_RAHEAD;
 		ctx->bio->bi_iter.bi_sector = sector;
-		ctx->bio->bi_end_io = iomap_read_end_io;
 		bio_add_folio_nofail(ctx->bio, folio, plen, poff);
 	}
 
@@ -1987,7 +2106,7 @@ EXPORT_SYMBOL_GPL(iomap_writepages);
 
 static int __init iomap_init(void)
 {
-	return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
+	return bioset_init(&iomap_ioend_bioset, IOMAP_POOL_SIZE,
 			   offsetof(struct iomap_ioend, io_bio),
 			   BIOSET_NEED_BVECS);
 }
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 6fc1c858013d1..43ec614d64e87 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -256,6 +256,11 @@ static inline const struct iomap *iomap_iter_srcmap(const struct iomap_iter *i)
 	return &i->iomap;
 }
 
+#ifdef CONFIG_FS_VERITY
+int iomap_init_fsverity(struct super_block *sb, unsigned int wq_flags,
+			int max_active);
+#endif
+
 ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
 		const struct iomap_ops *ops);
 int iomap_file_buffered_write_punch_delalloc(struct inode *inode,



* [PATCH 01/26] xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
@ 2024-04-30  3:24   ` Darrick J. Wong
  2024-05-01  6:55     ` Christoph Hellwig
  2024-04-30  3:24   ` [PATCH 02/26] xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function Darrick J. Wong
                     ` (24 subsequent siblings)
  25 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:24 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

In the next few patches we're going to refactor the attr remote code so
that we can support headerless remote xattr values for storing merkle
tree blocks.  For now, let's change the code to use unsigned int to
describe quantities of bytes and blocks that cannot be negative.
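
As a worked example of the arithmetic in xfs_attr3_rmt_blocks, here is
a userspace sketch using unsigned quantities throughout.  The 56-byte
header size is an assumption made for illustration (it stands in for
sizeof(struct xfs_attr3_rmt_hdr)), and demo_attr3_rmt_blocks is a
hypothetical name, not a function in the patch:

```c
#include <assert.h>

/* Assumed per-block header size on CRC-enabled filesystems. */
#define DEMO_RMT_HDR_SIZE	56u

/*
 * Number of filesystem blocks needed to hold @attrlen bytes of remote
 * xattr value.  With CRCs enabled, every block loses header space, so
 * the divisor is the usable payload per block, not the block size.
 */
static unsigned int demo_attr3_rmt_blocks(unsigned int blocksize,
					  int has_crc,
					  unsigned int attrlen)
{
	assert(blocksize > DEMO_RMT_HDR_SIZE);

	if (has_crc) {
		unsigned int buflen = blocksize - DEMO_RMT_HDR_SIZE;

		return (attrlen + buflen - 1) / buflen;
	}

	/* No headers: plain round-up of bytes to blocks. */
	return (attrlen + blocksize - 1) / blocksize;
}
```

With 4096-byte blocks, a 64KiB value needs 17 blocks when each block
carries a header but only 16 when it does not.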

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/libxfs/xfs_attr_remote.c |   61 +++++++++++++++++++--------------------
 fs/xfs/libxfs/xfs_attr_remote.h |    2 +
 2 files changed, 31 insertions(+), 32 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index a8de9dc1e998a..1d44ab3e0a506 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -47,13 +47,13 @@
  * Each contiguous block has a header, so it is not just a simple attribute
  * length to FSB conversion.
  */
-int
+unsigned int
 xfs_attr3_rmt_blocks(
-	struct xfs_mount *mp,
-	int		attrlen)
+	struct xfs_mount	*mp,
+	unsigned int		attrlen)
 {
 	if (xfs_has_crc(mp)) {
-		int buflen = XFS_ATTR3_RMT_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
+		unsigned int buflen = XFS_ATTR3_RMT_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
 		return (attrlen + buflen - 1) / buflen;
 	}
 	return XFS_B_TO_FSB(mp, attrlen);
@@ -92,7 +92,6 @@ xfs_attr3_rmt_verify(
 	struct xfs_mount	*mp,
 	struct xfs_buf		*bp,
 	void			*ptr,
-	int			fsbsize,
 	xfs_daddr_t		bno)
 {
 	struct xfs_attr3_rmt_hdr *rmt = ptr;
@@ -103,7 +102,7 @@ xfs_attr3_rmt_verify(
 		return __this_address;
 	if (be64_to_cpu(rmt->rm_blkno) != bno)
 		return __this_address;
-	if (be32_to_cpu(rmt->rm_bytes) > fsbsize - sizeof(*rmt))
+	if (be32_to_cpu(rmt->rm_bytes) > mp->m_attr_geo->blksize - sizeof(*rmt))
 		return __this_address;
 	if (be32_to_cpu(rmt->rm_offset) +
 				be32_to_cpu(rmt->rm_bytes) > XFS_XATTR_SIZE_MAX)
@@ -122,9 +121,9 @@ __xfs_attr3_rmt_read_verify(
 {
 	struct xfs_mount *mp = bp->b_mount;
 	char		*ptr;
-	int		len;
+	unsigned int	len;
 	xfs_daddr_t	bno;
-	int		blksize = mp->m_attr_geo->blksize;
+	unsigned int	blksize = mp->m_attr_geo->blksize;
 
 	/* no verification of non-crc buffers */
 	if (!xfs_has_crc(mp))
@@ -141,7 +140,7 @@ __xfs_attr3_rmt_read_verify(
 			*failaddr = __this_address;
 			return -EFSBADCRC;
 		}
-		*failaddr = xfs_attr3_rmt_verify(mp, bp, ptr, blksize, bno);
+		*failaddr = xfs_attr3_rmt_verify(mp, bp, ptr, bno);
 		if (*failaddr)
 			return -EFSCORRUPTED;
 		len -= blksize;
@@ -186,7 +185,7 @@ xfs_attr3_rmt_write_verify(
 {
 	struct xfs_mount *mp = bp->b_mount;
 	xfs_failaddr_t	fa;
-	int		blksize = mp->m_attr_geo->blksize;
+	unsigned int	blksize = mp->m_attr_geo->blksize;
 	char		*ptr;
 	int		len;
 	xfs_daddr_t	bno;
@@ -203,7 +202,7 @@ xfs_attr3_rmt_write_verify(
 	while (len > 0) {
 		struct xfs_attr3_rmt_hdr *rmt = (struct xfs_attr3_rmt_hdr *)ptr;
 
-		fa = xfs_attr3_rmt_verify(mp, bp, ptr, blksize, bno);
+		fa = xfs_attr3_rmt_verify(mp, bp, ptr, bno);
 		if (fa) {
 			xfs_verifier_error(bp, -EFSCORRUPTED, fa);
 			return;
@@ -281,20 +280,20 @@ xfs_attr_rmtval_copyout(
 	struct xfs_buf		*bp,
 	struct xfs_inode	*dp,
 	xfs_ino_t		owner,
-	int			*offset,
-	int			*valuelen,
+	unsigned int		*offset,
+	unsigned int		*valuelen,
 	uint8_t			**dst)
 {
 	char			*src = bp->b_addr;
 	xfs_daddr_t		bno = xfs_buf_daddr(bp);
-	int			len = BBTOB(bp->b_length);
-	int			blksize = mp->m_attr_geo->blksize;
+	unsigned int		len = BBTOB(bp->b_length);
+	unsigned int		blksize = mp->m_attr_geo->blksize;
 
 	ASSERT(len >= blksize);
 
 	while (len > 0 && *valuelen > 0) {
-		int hdr_size = 0;
-		int byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, blksize);
+		unsigned int hdr_size = 0;
+		unsigned int byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, blksize);
 
 		byte_cnt = min(*valuelen, byte_cnt);
 
@@ -330,20 +329,20 @@ xfs_attr_rmtval_copyin(
 	struct xfs_mount *mp,
 	struct xfs_buf	*bp,
 	xfs_ino_t	ino,
-	int		*offset,
-	int		*valuelen,
+	unsigned int	*offset,
+	unsigned int	*valuelen,
 	uint8_t		**src)
 {
 	char		*dst = bp->b_addr;
 	xfs_daddr_t	bno = xfs_buf_daddr(bp);
-	int		len = BBTOB(bp->b_length);
-	int		blksize = mp->m_attr_geo->blksize;
+	unsigned int	len = BBTOB(bp->b_length);
+	unsigned int	blksize = mp->m_attr_geo->blksize;
 
 	ASSERT(len >= blksize);
 
 	while (len > 0 && *valuelen > 0) {
-		int hdr_size;
-		int byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, blksize);
+		unsigned int hdr_size;
+		unsigned int byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, blksize);
 
 		byte_cnt = min(*valuelen, byte_cnt);
 		hdr_size = xfs_attr3_rmt_hdr_set(mp, dst, ino, *offset,
@@ -389,12 +388,12 @@ xfs_attr_rmtval_get(
 	struct xfs_buf		*bp;
 	xfs_dablk_t		lblkno = args->rmtblkno;
 	uint8_t			*dst = args->value;
-	int			valuelen;
+	unsigned int		valuelen;
 	int			nmap;
 	int			error;
-	int			blkcnt = args->rmtblkcnt;
+	unsigned int		blkcnt = args->rmtblkcnt;
 	int			i;
-	int			offset = 0;
+	unsigned int		offset = 0;
 
 	trace_xfs_attr_rmtval_get(args);
 
@@ -452,7 +451,7 @@ xfs_attr_rmt_find_hole(
 	struct xfs_inode	*dp = args->dp;
 	struct xfs_mount	*mp = dp->i_mount;
 	int			error;
-	int			blkcnt;
+	unsigned int		blkcnt;
 	xfs_fileoff_t		lfileoff = 0;
 
 	/*
@@ -481,11 +480,11 @@ xfs_attr_rmtval_set_value(
 	struct xfs_bmbt_irec	map;
 	xfs_dablk_t		lblkno;
 	uint8_t			*src = args->value;
-	int			blkcnt;
-	int			valuelen;
+	unsigned int		blkcnt;
+	unsigned int		valuelen;
 	int			nmap;
 	int			error;
-	int			offset = 0;
+	unsigned int		offset = 0;
 
 	/*
 	 * Roll through the "value", copying the attribute value to the
@@ -645,7 +644,7 @@ xfs_attr_rmtval_invalidate(
 	struct xfs_da_args	*args)
 {
 	xfs_dablk_t		lblkno;
-	int			blkcnt;
+	unsigned int		blkcnt;
 	int			error;
 
 	/*
diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
index d097ec6c4dc35..c64b04f91cafd 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.h
+++ b/fs/xfs/libxfs/xfs_attr_remote.h
@@ -6,7 +6,7 @@
 #ifndef __XFS_ATTR_REMOTE_H__
 #define	__XFS_ATTR_REMOTE_H__
 
-int xfs_attr3_rmt_blocks(struct xfs_mount *mp, int attrlen);
+unsigned int xfs_attr3_rmt_blocks(struct xfs_mount *mp, unsigned int attrlen);
 
 int xfs_attr_rmtval_get(struct xfs_da_args *args);
 int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,



* [PATCH 02/26] xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
  2024-04-30  3:24   ` [PATCH 01/26] xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c Darrick J. Wong
@ 2024-04-30  3:24   ` Darrick J. Wong
  2024-05-01  6:55     ` Christoph Hellwig
  2024-04-30  3:24   ` [PATCH 03/26] xfs: create a helper to compute the blockcount of a max sized remote value Darrick J. Wong
                     ` (23 subsequent siblings)
  25 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:24 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

Turn the XFS_ATTR3_RMT_BUF_SPACE macro into a properly typechecked
function, and actually use the correct blocksize for extended attributes
(the attr geometry block size rather than the superblock block size).
The function cannot be static inline because xfsprogs userspace uses it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/libxfs/xfs_attr_remote.c |   19 ++++++++++++++++---
 fs/xfs/libxfs/xfs_da_format.h   |    4 +---
 2 files changed, 17 insertions(+), 6 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index 1d44ab3e0a506..626fb92d30296 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -43,6 +43,19 @@
  * the logging system and therefore never have a log item.
  */
 
+/* How many bytes can be stored in a remote value buffer? */
+inline unsigned int
+xfs_attr3_rmt_buf_space(
+	struct xfs_mount	*mp)
+{
+	unsigned int		blocksize = mp->m_attr_geo->blksize;
+
+	if (xfs_has_crc(mp))
+		return blocksize - sizeof(struct xfs_attr3_rmt_hdr);
+
+	return blocksize;
+}
+
 /*
  * Each contiguous block has a header, so it is not just a simple attribute
  * length to FSB conversion.
@@ -53,7 +66,7 @@ xfs_attr3_rmt_blocks(
 	unsigned int		attrlen)
 {
 	if (xfs_has_crc(mp)) {
-		unsigned int buflen = XFS_ATTR3_RMT_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
+		unsigned int buflen = xfs_attr3_rmt_buf_space(mp);
 		return (attrlen + buflen - 1) / buflen;
 	}
 	return XFS_B_TO_FSB(mp, attrlen);
@@ -293,7 +306,7 @@ xfs_attr_rmtval_copyout(
 
 	while (len > 0 && *valuelen > 0) {
 		unsigned int hdr_size = 0;
-		unsigned int byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, blksize);
+		unsigned int byte_cnt = xfs_attr3_rmt_buf_space(mp);
 
 		byte_cnt = min(*valuelen, byte_cnt);
 
@@ -342,7 +355,7 @@ xfs_attr_rmtval_copyin(
 
 	while (len > 0 && *valuelen > 0) {
 		unsigned int hdr_size;
-		unsigned int byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, blksize);
+		unsigned int byte_cnt = xfs_attr3_rmt_buf_space(mp);
 
 		byte_cnt = min(*valuelen, byte_cnt);
 		hdr_size = xfs_attr3_rmt_hdr_set(mp, dst, ino, *offset,
diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index ebde6eb1da65d..86de99e2f7570 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -880,9 +880,7 @@ struct xfs_attr3_rmt_hdr {
 
 #define XFS_ATTR3_RMT_CRC_OFF	offsetof(struct xfs_attr3_rmt_hdr, rm_crc)
 
-#define XFS_ATTR3_RMT_BUF_SPACE(mp, bufsize)	\
-	((bufsize) - (xfs_has_crc((mp)) ? \
-			sizeof(struct xfs_attr3_rmt_hdr) : 0))
+unsigned int xfs_attr3_rmt_buf_space(struct xfs_mount *mp);
 
 /* Number of bytes in a directory block. */
 static inline unsigned int xfs_dir2_dirblock_bytes(struct xfs_sb *sbp)



* [PATCH 03/26] xfs: create a helper to compute the blockcount of a max sized remote value
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
  2024-04-30  3:24   ` [PATCH 01/26] xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c Darrick J. Wong
  2024-04-30  3:24   ` [PATCH 02/26] xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function Darrick J. Wong
@ 2024-04-30  3:24   ` Darrick J. Wong
  2024-05-01  6:56     ` Christoph Hellwig
  2024-04-30  3:25   ` [PATCH 04/26] xfs: minor cleanups of xfs_attr3_rmt_blocks Darrick J. Wong
                     ` (22 subsequent siblings)
  25 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:24 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

Create a helper function to compute the number of fsblocks needed to
store a maximally-sized extended attribute value.
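
A rough userspace sketch of the arithmetic behind such a helper follows.
It assumes a 4096-byte attr block, a 56-byte remote value header on
CRC-enabled filesystems, and a 64 KiB maximum attr value; the sketch_
names and SKETCH_ constants are made up for illustration and are not
kernel symbols.

```c
#include <assert.h>

/* Assumed values for illustration only; the kernel derives these per mount. */
#define SKETCH_BLOCKSIZE	4096u	/* attr block size */
#define SKETCH_RMT_HDR_SIZE	56u	/* remote value header, CRC filesystems */
#define SKETCH_XATTR_SIZE_MAX	65536u	/* largest attr value (assumed 64 KiB) */

/* Round-up division: how many y-sized units are needed to hold x? */
static unsigned int sketch_howmany(unsigned int x, unsigned int y)
{
	assert(y > 0);
	return (x + y - 1) / y;
}

/* Blocks needed to store attrlen bytes, with or without per-block headers. */
static unsigned int sketch_rmt_blocks(int has_crc, unsigned int attrlen)
{
	unsigned int space = SKETCH_BLOCKSIZE;

	if (has_crc)
		space -= SKETCH_RMT_HDR_SIZE;
	return sketch_howmany(attrlen, space);
}

/* Blocks needed for the largest possible attr value. */
static unsigned int sketch_max_rmt_blocks(int has_crc)
{
	return sketch_rmt_blocks(has_crc, SKETCH_XATTR_SIZE_MAX);
}
```

Under those assumptions the maximum works out to 17 blocks with headers
and 16 without, which is why callers should not hardcode the count.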

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/libxfs/xfs_attr.c        |    2 +-
 fs/xfs/libxfs/xfs_attr_remote.h |    6 ++++++
 fs/xfs/scrub/reap.c             |    4 ++--
 3 files changed, 9 insertions(+), 3 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 867fe409f0027..b841096947acb 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -1039,7 +1039,7 @@ xfs_attr_set(
 		break;
 	case XFS_ATTRUPDATE_REMOVE:
 		XFS_STATS_INC(mp, xs_attr_remove);
-		rmt_blks = xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX);
+		rmt_blks = xfs_attr3_max_rmt_blocks(mp);
 		break;
 	}
 
diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
index c64b04f91cafd..e3c6c7d774bf9 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.h
+++ b/fs/xfs/libxfs/xfs_attr_remote.h
@@ -8,6 +8,12 @@
 
 unsigned int xfs_attr3_rmt_blocks(struct xfs_mount *mp, unsigned int attrlen);
 
+/* Number of rmt blocks needed to store the maximally sized attr value */
+static inline unsigned int xfs_attr3_max_rmt_blocks(struct xfs_mount *mp)
+{
+	return xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX);
+}
+
 int xfs_attr_rmtval_get(struct xfs_da_args *args);
 int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
 		xfs_buf_flags_t incore_flags);
diff --git a/fs/xfs/scrub/reap.c b/fs/xfs/scrub/reap.c
index b8166e19726a4..fbf4d248f0060 100644
--- a/fs/xfs/scrub/reap.c
+++ b/fs/xfs/scrub/reap.c
@@ -227,7 +227,7 @@ xrep_bufscan_max_sectors(
 	int			max_fsbs;
 
 	/* Remote xattr values are the largest buffers that we support. */
-	max_fsbs = xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX);
+	max_fsbs = xfs_attr3_max_rmt_blocks(mp);
 
 	return XFS_FSB_TO_BB(mp, min_t(xfs_extlen_t, fsblocks, max_fsbs));
 }
@@ -1070,7 +1070,7 @@ xreap_bmapi_binval(
 	 * of the next hole.
 	 */
 	off = imap->br_startoff + imap->br_blockcount;
-	max_off = off + xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX);
+	max_off = off + xfs_attr3_max_rmt_blocks(mp);
 	while (off < max_off) {
 		struct xfs_bmbt_irec	hmap;
 		int			nhmaps = 1;



* [PATCH 04/26] xfs: minor cleanups of xfs_attr3_rmt_blocks
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-04-30  3:24   ` [PATCH 03/26] xfs: create a helper to compute the blockcount of a max sized remote value Darrick J. Wong
@ 2024-04-30  3:25   ` Darrick J. Wong
  2024-05-01  6:56     ` Christoph Hellwig
  2024-04-30  3:25   ` [PATCH 05/26] xfs: use an empty transaction to protect xfs_attr_get from deadlocks Darrick J. Wong
                     ` (21 subsequent siblings)
  25 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:25 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

Clean up the type signature of this function since we don't have
negative attr lengths or block counts.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/libxfs/xfs_attr_remote.c |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index 626fb92d30296..0566733b6da45 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -56,19 +56,19 @@ xfs_attr3_rmt_buf_space(
 	return blocksize;
 }
 
-/*
- * Each contiguous block has a header, so it is not just a simple attribute
- * length to FSB conversion.
- */
+/* Compute number of fsblocks needed to store a remote attr value */
 unsigned int
 xfs_attr3_rmt_blocks(
 	struct xfs_mount	*mp,
 	unsigned int		attrlen)
 {
-	if (xfs_has_crc(mp)) {
-		unsigned int buflen = xfs_attr3_rmt_buf_space(mp);
-		return (attrlen + buflen - 1) / buflen;
-	}
+	/*
+	 * Each contiguous block has a header, so it is not just a simple
+	 * attribute length to FSB conversion.
+	 */
+	if (xfs_has_crc(mp))
+		return howmany(attrlen, xfs_attr3_rmt_buf_space(mp));
+
 	return XFS_B_TO_FSB(mp, attrlen);
 }
 



* [PATCH 05/26] xfs: use an empty transaction to protect xfs_attr_get from deadlocks
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-04-30  3:25   ` [PATCH 04/26] xfs: minor cleanups of xfs_attr3_rmt_blocks Darrick J. Wong
@ 2024-04-30  3:25   ` Darrick J. Wong
  2024-05-01  6:57     ` Christoph Hellwig
  2024-04-30  3:25   ` [PATCH 06/26] xfs: add attribute type for fs-verity Darrick J. Wong
                     ` (20 subsequent siblings)
  25 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:25 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

Wrap the xfs_attr_get_ilocked call in xfs_attr_get with an empty
transaction so that we cannot livelock the kernel if someone injects a
loop into the attr structure or the attr fork bmbt.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c |   21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index b841096947acb..e0be8d0c1ffdc 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -274,6 +274,8 @@ xfs_attr_get(
 
 	XFS_STATS_INC(args->dp->i_mount, xs_attr_get);
 
+	ASSERT(!args->trans);
+
 	if (xfs_is_shutdown(args->dp->i_mount))
 		return -EIO;
 
@@ -286,8 +288,27 @@ xfs_attr_get(
 	/* Entirely possible to look up a name which doesn't exist */
 	args->op_flags = XFS_DA_OP_OKNOENT;
 
+	error = xfs_trans_alloc_empty(args->dp->i_mount, &args->trans);
+	if (error)
+		return error;
+
 	lock_mode = xfs_ilock_attr_map_shared(args->dp);
+
+	/*
+	 * Make sure the attr fork iext tree is loaded.  Use the empty
+	 * transaction to load the bmbt so that we avoid livelocking on loops.
+	 */
+	if (xfs_inode_hasattr(args->dp)) {
+		error = xfs_iread_extents(args->trans, args->dp, XFS_ATTR_FORK);
+		if (error)
+			goto out_cancel;
+	}
+
 	error = xfs_attr_get_ilocked(args);
+
+out_cancel:
+	xfs_trans_cancel(args->trans);
+	args->trans = NULL;
 	xfs_iunlock(args->dp, lock_mode);
 
 	return error;



* [PATCH 06/26] xfs: add attribute type for fs-verity
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-04-30  3:25   ` [PATCH 05/26] xfs: use an empty transaction to protect xfs_attr_get from deadlocks Darrick J. Wong
@ 2024-04-30  3:25   ` Darrick J. Wong
  2024-04-30  3:25   ` [PATCH 07/26] xfs: do not use xfs_attr3_rmt_hdr for remote verity value blocks Darrick J. Wong
                     ` (19 subsequent siblings)
  25 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:25 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Andrey Albershteyn <aalbersh@redhat.com>

The Merkle tree blocks and descriptor are stored in the extended
attributes of the inode. Add a new attribute type for fs-verity
metadata. Add it to XFS_ATTR_PRIVATE_NSP_MASK so that parent pointer and
fs-verity attributes are skipped, as those are only for internal use.
While we're at it, add a few comments in relevant places that internally
visible attributes are not supposed to be handled via the interface
defined in xfs_xattr.c.
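
As a minimal sketch of how such a private-namespace mask can be tested,
the snippet below reuses the bit positions from the diff in this patch.
The SKETCH_ names and the sketch_attr_is_private() helper are
hypothetical, and whether the kernel checks the mask in exactly this way
is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions copied from the xfs_da_format.h hunk in this patch. */
#define SKETCH_ATTR_ROOT	(1u << 1)
#define SKETCH_ATTR_SECURE	(1u << 2)
#define SKETCH_ATTR_PARENT	(1u << 3)
#define SKETCH_ATTR_VERITY	(1u << 4)
#define SKETCH_ATTR_INCOMPLETE	(1u << 7)

/* Namespaces that are for internal use and not exposed to userspace. */
#define SKETCH_ATTR_PRIVATE_NSP_MASK	(SKETCH_ATTR_PARENT | \
					 SKETCH_ATTR_VERITY)

/* Hypothetical helper: does this attr filter name a private namespace? */
static bool sketch_attr_is_private(unsigned int attr_filter)
{
	return (attr_filter & SKETCH_ATTR_PRIVATE_NSP_MASK) != 0;
}
```

A caller on a user-facing xattr path could use a check like this to
refuse or skip parent pointer and verity attributes.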

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_da_format.h  |   11 ++++++++---
 fs/xfs/libxfs/xfs_log_format.h |    1 +
 fs/xfs/xfs_trace.h             |    3 ++-
 3 files changed, 11 insertions(+), 4 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 86de99e2f7570..27b9ad9f8b2e4 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -715,19 +715,23 @@ struct xfs_attr3_leafblock {
 #define	XFS_ATTR_ROOT_BIT	1	/* limit access to trusted attrs */
 #define	XFS_ATTR_SECURE_BIT	2	/* limit access to secure attrs */
 #define	XFS_ATTR_PARENT_BIT	3	/* parent pointer attrs */
+#define	XFS_ATTR_VERITY_BIT	4	/* verity merkle tree and descriptor */
 #define	XFS_ATTR_INCOMPLETE_BIT	7	/* attr in middle of create/delete */
 #define XFS_ATTR_LOCAL		(1u << XFS_ATTR_LOCAL_BIT)
 #define XFS_ATTR_ROOT		(1u << XFS_ATTR_ROOT_BIT)
 #define XFS_ATTR_SECURE		(1u << XFS_ATTR_SECURE_BIT)
 #define XFS_ATTR_PARENT		(1u << XFS_ATTR_PARENT_BIT)
+#define XFS_ATTR_VERITY		(1u << XFS_ATTR_VERITY_BIT)
 #define XFS_ATTR_INCOMPLETE	(1u << XFS_ATTR_INCOMPLETE_BIT)
 
 #define XFS_ATTR_NSP_ONDISK_MASK	(XFS_ATTR_ROOT | \
 					 XFS_ATTR_SECURE | \
-					 XFS_ATTR_PARENT)
+					 XFS_ATTR_PARENT | \
+					 XFS_ATTR_VERITY)
 
 /* Private attr namespaces not exposed to userspace */
-#define XFS_ATTR_PRIVATE_NSP_MASK	(XFS_ATTR_PARENT)
+#define XFS_ATTR_PRIVATE_NSP_MASK	(XFS_ATTR_PARENT | \
+					 XFS_ATTR_VERITY)
 
 #define XFS_ATTR_ONDISK_MASK	(XFS_ATTR_NSP_ONDISK_MASK | \
 				 XFS_ATTR_LOCAL | \
@@ -737,7 +741,8 @@ struct xfs_attr3_leafblock {
 	{ XFS_ATTR_LOCAL,	"local" }, \
 	{ XFS_ATTR_ROOT,	"root" }, \
 	{ XFS_ATTR_SECURE,	"secure" }, \
-	{ XFS_ATTR_PARENT,	"parent" }
+	{ XFS_ATTR_PARENT,	"parent" }, \
+	{ XFS_ATTR_VERITY,	"verity" }
 
 /*
  * Alignment for namelist and valuelist entries (since they are mixed
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index 0f194ae71b42c..4d11d6b7b1ad6 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -1052,6 +1052,7 @@ struct xfs_icreate_log {
 #define XFS_ATTRI_FILTER_MASK		(XFS_ATTR_ROOT | \
 					 XFS_ATTR_SECURE | \
 					 XFS_ATTR_PARENT | \
+					 XFS_ATTR_VERITY | \
 					 XFS_ATTR_INCOMPLETE)
 
 /*
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 990837afbf667..7116e7d9627d0 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -107,7 +107,8 @@ struct xfs_fsrefs;
 	{ XFS_ATTR_ROOT,	"ROOT" }, \
 	{ XFS_ATTR_SECURE,	"SECURE" }, \
 	{ XFS_ATTR_INCOMPLETE,	"INCOMPLETE" }, \
-	{ XFS_ATTR_PARENT,	"PARENT" }
+	{ XFS_ATTR_PARENT,	"PARENT" }, \
+	{ XFS_ATTR_VERITY,	"VERITY" }
 
 DECLARE_EVENT_CLASS(xfs_attr_list_class,
 	TP_PROTO(struct xfs_attr_list_context *ctx),



* [PATCH 07/26] xfs: do not use xfs_attr3_rmt_hdr for remote verity value blocks
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-04-30  3:25   ` [PATCH 06/26] xfs: add attribute type for fs-verity Darrick J. Wong
@ 2024-04-30  3:25   ` Darrick J. Wong
  2024-04-30  3:26   ` [PATCH 08/26] xfs: add fs-verity ro-compat flag Darrick J. Wong
                     ` (18 subsequent siblings)
  25 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:25 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

I enabled fsverity for a ~250MB file and noticed the following xattr
keys that got created for the merkle tree.  These two merkle tree blocks
are written out in ascending order:

nvlist[52].merkle_off = 0x111000
nvlist[53].valueblk = 0x222
nvlist[53].valuelen = 4096

nvlist[53].merkle_off = 0x112000
nvlist[54].valueblk = 0x224
nvlist[54].valuelen = 4096

Notice that while the valuelen is 4k, the block offset increases by two.
Curious, I then loaded up ablock 0x223:

hdr.magic = 0x5841524d
hdr.offset = 4040
hdr.bytes = 56
hdr.crc = 0xad1b8bd8 (correct)
hdr.uuid = 07d3f25c-e550-4118-8ff5-a45c017ba5ef
hdr.owner = 133
hdr.bno = 442144
hdr.lsn = 0xffffffffffffffff
data = <56 bytes of charns data>

Ugh!  Each 4k merkle tree block takes up two fsblocks due to the remote
value header that XFS puts at the start of each remote value block.
That header is 56 bytes long, which is exactly the length of the
spillover here.  This isn't good.
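
The spillover can be reproduced with a little arithmetic.  The sketch
below is a userspace illustration only, assuming 4096-byte attr blocks
and the 56-byte header described above; the sketch_ names are invented
and do not exist in the kernel.

```c
#include <assert.h>

#define SKETCH_BLOCKSIZE	4096u	/* assumed attr block size */
#define SKETCH_RMT_HDR_SIZE	56u	/* remote value header size */

/* Bytes of attr value that fit in one remote value block. */
static unsigned int sketch_payload(int has_header)
{
	if (!has_header)
		return SKETCH_BLOCKSIZE;
	return SKETCH_BLOCKSIZE - SKETCH_RMT_HDR_SIZE;
}

/* Number of blocks needed to hold valuelen bytes. */
static unsigned int sketch_blocks_for(unsigned int valuelen, int has_header)
{
	unsigned int payload = sketch_payload(has_header);

	assert(payload > 0);
	return (valuelen + payload - 1) / payload;
}

/* Bytes of value stored in the last block. */
static unsigned int sketch_tail_bytes(unsigned int valuelen, int has_header)
{
	unsigned int nblocks = sketch_blocks_for(valuelen, has_header);

	if (nblocks == 0)
		return 0;
	return valuelen - (nblocks - 1) * sketch_payload(has_header);
}
```

With a header, a 4096-byte value needs two blocks: 4040 bytes in the
first and 56 in the second, matching the hdr.offset and hdr.bytes values
in the dump above.  Without a header it fits in exactly one block.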

The first thing that I tried was enabling fsverity on a bunch of files,
extracting the merkle tree blocks one by one, and testing their
compressibility with gzip, zstd, and xz.  Merkle tree blocks are nearly
indistinguishable from random data, with the result that 99% of the
blocks I sampled got larger under compression.  So that's out.

Next I decided to try eliminating the xfs_attr3_rmt_hdr header, which
would make verity remote values align perfectly with filesystem blocks.
Because remote value blocks are written out with xfs_bwrite, the lsn
field isn't useful.  The merkle tree is itself a bunch of hashes of data
blocks or other merkle tree blocks, which means that a bitflip will
result in a verity failure somewhere in the file.  Hence we don't need
to store an explicit crc, and we could just XOR the ondisk merkle tree
contents with selected attributes.

In the end I decided to create a smaller header structure containing
only a magic, the fsuuid, the inode owner, and the ondisk block number.
These values get XORd into the beginning of the merkle tree block to
detect lost writes when we're writing remote XFS_ATTR_VERITY values to
disk, and XORd out when reading them back in.
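
The XOR scheme is self-inverse, which a small userspace sketch can show.
Everything here is hypothetical: the key layout, the placeholder magic
value, and the sketch_ names are invented for illustration, and the key
is built in host byte order rather than the big-endian ondisk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_KEY_LEN	36u	/* 4 magic + 8 owner + 8 blkno + 16 uuid */

/* Pack the identifying fields into a flat key (invented layout). */
static void sketch_make_key(uint8_t key[SKETCH_KEY_LEN], uint32_t magic,
		uint64_t owner, uint64_t blkno, const uint8_t uuid[16])
{
	memcpy(key, &magic, sizeof(magic));
	memcpy(key + 4, &owner, sizeof(owner));
	memcpy(key + 12, &blkno, sizeof(blkno));
	memcpy(key + 20, uuid, 16);
}

/* XOR the key into the start of the block; short blocks use a short key. */
static void sketch_transform(uint8_t *buf, size_t byte_cnt,
		const uint8_t key[SKETCH_KEY_LEN])
{
	size_t n = byte_cnt < SKETCH_KEY_LEN ? byte_cnt : SKETCH_KEY_LEN;
	size_t i;

	assert(buf && key);
	for (i = 0; i < n; i++)
		buf[i] ^= key[i];
}

/* Write with one owner, read with another; report whether data survived. */
static int sketch_roundtrip_ok(uint64_t write_owner, uint64_t read_owner)
{
	static const uint8_t uuid[16] = { 0x07, 0xd3, 0xf2, 0x5c };
	uint8_t key[SKETCH_KEY_LEN];
	uint8_t orig[64], buf[64];
	size_t i;

	for (i = 0; i < sizeof(orig); i++)
		orig[i] = (uint8_t)(i * 37 + 11);	/* stand-in for hashes */
	memcpy(buf, orig, sizeof(buf));

	/* 0x12345678 is a placeholder, not the real magic number. */
	sketch_make_key(key, 0x12345678u, write_owner, 442144, uuid);
	sketch_transform(buf, sizeof(buf), key);	/* XOR in on write */

	sketch_make_key(key, 0x12345678u, read_owner, 442144, uuid);
	sketch_transform(buf, sizeof(buf), key);	/* XOR out on read */

	return memcmp(buf, orig, sizeof(buf)) == 0;
}
```

When the same owner and block number are used on both sides, the block
comes back unchanged.  If a block belonging to another inode or disk
address is read instead, the XOR leaves the leading bytes scrambled, and
the merkle tree hash checks would then be expected to fail.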

With this format change applied, the fsverity space overhead is halved,
since each 4k merkle tree block now fits in one fsblock instead of two.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c        |    6 +-
 fs/xfs/libxfs/xfs_attr_leaf.c   |    5 +-
 fs/xfs/libxfs/xfs_attr_remote.c |  125 ++++++++++++++++++++++++++++++++++-----
 fs/xfs/libxfs/xfs_attr_remote.h |    8 ++
 fs/xfs/libxfs/xfs_da_format.h   |   22 +++++++
 fs/xfs/libxfs/xfs_ondisk.h      |    2 +
 fs/xfs/libxfs/xfs_shared.h      |    1 
 fs/xfs/xfs_attr_inactive.c      |    2 -
 8 files changed, 148 insertions(+), 23 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index e0be8d0c1ffdc..1b9d9ffb16833 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -342,7 +342,8 @@ xfs_attr_calc_size(
 		 * Out of line attribute, cannot double split, but
 		 * make room for the attribute value itself.
 		 */
-		uint	dblocks = xfs_attr3_rmt_blocks(mp, args->valuelen);
+		uint	dblocks = xfs_attr3_rmt_blocks(mp, args->attr_filter,
+						       args->valuelen);
 		nblks += dblocks;
 		nblks += XFS_NEXTENTADD_SPACE_RES(mp, dblocks, XFS_ATTR_FORK);
 	}
@@ -1056,7 +1057,8 @@ xfs_attr_set(
 		}
 
 		if (!local)
-			rmt_blks = xfs_attr3_rmt_blocks(mp, args->valuelen);
+			rmt_blks = xfs_attr3_rmt_blocks(mp, args->attr_filter,
+					args->valuelen);
 		break;
 	case XFS_ATTRUPDATE_REMOVE:
 		XFS_STATS_INC(mp, xs_attr_remove);
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index 6aaec1246c950..fd4a5ace52c64 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -1566,7 +1566,8 @@ xfs_attr3_leaf_add_work(
 		name_rmt->valuelen = 0;
 		name_rmt->valueblk = 0;
 		args->rmtblkno = 1;
-		args->rmtblkcnt = xfs_attr3_rmt_blocks(mp, args->valuelen);
+		args->rmtblkcnt = xfs_attr3_rmt_blocks(mp, args->attr_filter,
+				args->valuelen);
 		args->rmtvaluelen = args->valuelen;
 	}
 	xfs_trans_log_buf(args->trans, bp,
@@ -2501,6 +2502,7 @@ xfs_attr3_leaf_lookup_int(
 			args->rmtblkno = be32_to_cpu(name_rmt->valueblk);
 			args->rmtblkcnt = xfs_attr3_rmt_blocks(
 							args->dp->i_mount,
+							args->attr_filter,
 							args->rmtvaluelen);
 			return -EEXIST;
 		}
@@ -2549,6 +2551,7 @@ xfs_attr3_leaf_getvalue(
 	args->rmtvaluelen = be32_to_cpu(name_rmt->valuelen);
 	args->rmtblkno = be32_to_cpu(name_rmt->valueblk);
 	args->rmtblkcnt = xfs_attr3_rmt_blocks(args->dp->i_mount,
+					       args->attr_filter,
 					       args->rmtvaluelen);
 	return xfs_attr_copy_value(args, NULL, args->rmtvaluelen);
 }
diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index 0566733b6da45..6accc8ae46c45 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -43,14 +43,23 @@
  * the logging system and therefore never have a log item.
  */
 
+static inline bool
+xfs_attr3_rmt_has_header(
+	struct xfs_mount	*mp,
+	unsigned int		attrns)
+{
+	return xfs_has_crc(mp) && !(attrns & XFS_ATTR_VERITY);
+}
+
 /* How many bytes can be stored in a remote value buffer? */
 inline unsigned int
 xfs_attr3_rmt_buf_space(
-	struct xfs_mount	*mp)
+	struct xfs_mount	*mp,
+	unsigned int		attrns)
 {
 	unsigned int		blocksize = mp->m_attr_geo->blksize;
 
-	if (xfs_has_crc(mp))
+	if (xfs_attr3_rmt_has_header(mp, attrns))
 		return blocksize - sizeof(struct xfs_attr3_rmt_hdr);
 
 	return blocksize;
@@ -60,14 +69,15 @@ xfs_attr3_rmt_buf_space(
 unsigned int
 xfs_attr3_rmt_blocks(
 	struct xfs_mount	*mp,
+	unsigned int		attrns,
 	unsigned int		attrlen)
 {
 	/*
 	 * Each contiguous block has a header, so it is not just a simple
 	 * attribute length to FSB conversion.
 	 */
-	if (xfs_has_crc(mp))
-		return howmany(attrlen, xfs_attr3_rmt_buf_space(mp));
+	if (xfs_attr3_rmt_has_header(mp, attrns))
+		return howmany(attrlen, xfs_attr3_rmt_buf_space(mp, attrns));
 
 	return XFS_B_TO_FSB(mp, attrlen);
 }
@@ -248,6 +258,42 @@ const struct xfs_buf_ops xfs_attr3_rmt_buf_ops = {
 	.verify_struct = xfs_attr3_rmt_verify_struct,
 };
 
+static void
+xfs_attr3_rmtverity_read_verify(
+	struct xfs_buf	*bp)
+{
+}
+
+static xfs_failaddr_t
+xfs_attr3_rmtverity_verify_struct(
+	struct xfs_buf	*bp)
+{
+	return NULL;
+}
+
+static void
+xfs_attr3_rmtverity_write_verify(
+	struct xfs_buf	*bp)
+{
+}
+
+const struct xfs_buf_ops xfs_attr3_rmtverity_buf_ops = {
+	.name = "xfs_attr3_remote_verity",
+	.magic = { 0, 0 },
+	.verify_read = xfs_attr3_rmtverity_read_verify,
+	.verify_write = xfs_attr3_rmtverity_write_verify,
+	.verify_struct = xfs_attr3_rmtverity_verify_struct,
+};
+
+inline const struct xfs_buf_ops *
+xfs_attr3_remote_buf_ops(
+	unsigned int		attrns)
+{
+	if (attrns & XFS_ATTR_VERITY)
+		return &xfs_attr3_rmtverity_buf_ops;
+	return &xfs_attr3_rmt_buf_ops;
+}
+
 STATIC int
 xfs_attr3_rmt_hdr_set(
 	struct xfs_mount	*mp,
@@ -284,6 +330,40 @@ xfs_attr3_rmt_hdr_set(
 	return sizeof(struct xfs_attr3_rmt_hdr);
 }
 
+static void
+xfs_attr_rmtverity_transform(
+	struct xfs_buf		*bp,
+	xfs_ino_t		ino,
+	void			*buf,
+	unsigned int		byte_cnt)
+{
+	struct xfs_mount	*mp = bp->b_mount;
+	struct xfs_attr3_rmtverity_hdr	*hdr = buf;
+	char			*dst;
+	const char		*src;
+	unsigned int		i;
+
+	if (byte_cnt >= offsetofend(struct xfs_attr3_rmtverity_hdr, rmv_owner))
+		hdr->rmv_owner ^= cpu_to_be64(ino);
+
+	if (byte_cnt >= offsetofend(struct xfs_attr3_rmtverity_hdr, rmv_blkno))
+		hdr->rmv_blkno ^= cpu_to_be64(xfs_buf_daddr(bp));
+
+	if (byte_cnt >= offsetofend(struct xfs_attr3_rmtverity_hdr, rmv_magic))
+		hdr->rmv_magic ^= cpu_to_be32(XFS_ATTR3_RMTVERITY_MAGIC);
+
+	if (byte_cnt <= offsetof(struct xfs_attr3_rmtverity_hdr, rmv_uuid))
+		return;
+
+	byte_cnt -= offsetof(struct xfs_attr3_rmtverity_hdr, rmv_uuid);
+	byte_cnt = min(byte_cnt, sizeof(uuid_t));
+
+	dst = (void *)&hdr->rmv_uuid;
+	src = (void *)&mp->m_sb.sb_meta_uuid;
+	for (i = 0; i < byte_cnt; i++)
+		dst[i] ^= src[i];
+}
+
 /*
  * Helper functions to copy attribute data in and out of the one disk extents
  */
@@ -293,6 +373,7 @@ xfs_attr_rmtval_copyout(
 	struct xfs_buf		*bp,
 	struct xfs_inode	*dp,
 	xfs_ino_t		owner,
+	unsigned int		attrns,
 	unsigned int		*offset,
 	unsigned int		*valuelen,
 	uint8_t			**dst)
@@ -306,11 +387,11 @@ xfs_attr_rmtval_copyout(
 
 	while (len > 0 && *valuelen > 0) {
 		unsigned int hdr_size = 0;
-		unsigned int byte_cnt = xfs_attr3_rmt_buf_space(mp);
+		unsigned int byte_cnt = xfs_attr3_rmt_buf_space(mp, attrns);
 
 		byte_cnt = min(*valuelen, byte_cnt);
 
-		if (xfs_has_crc(mp)) {
+		if (xfs_attr3_rmt_has_header(mp, attrns)) {
 			if (xfs_attr3_rmt_hdr_ok(src, owner, *offset,
 						  byte_cnt, bno)) {
 				xfs_alert(mp,
@@ -324,6 +405,10 @@ xfs_attr_rmtval_copyout(
 
 		memcpy(*dst, src + hdr_size, byte_cnt);
 
+		if (attrns & XFS_ATTR_VERITY)
+			xfs_attr_rmtverity_transform(bp, dp->i_ino, *dst,
+					byte_cnt);
+
 		/* roll buffer forwards */
 		len -= blksize;
 		src += blksize;
@@ -342,6 +427,7 @@ xfs_attr_rmtval_copyin(
 	struct xfs_mount *mp,
 	struct xfs_buf	*bp,
 	xfs_ino_t	ino,
+	unsigned int	attrns,
 	unsigned int	*offset,
 	unsigned int	*valuelen,
 	uint8_t		**src)
@@ -354,15 +440,20 @@ xfs_attr_rmtval_copyin(
 	ASSERT(len >= blksize);
 
 	while (len > 0 && *valuelen > 0) {
-		unsigned int hdr_size;
-		unsigned int byte_cnt = xfs_attr3_rmt_buf_space(mp);
+		unsigned int hdr_size = 0;
+		unsigned int byte_cnt = xfs_attr3_rmt_buf_space(mp, attrns);
 
 		byte_cnt = min(*valuelen, byte_cnt);
-		hdr_size = xfs_attr3_rmt_hdr_set(mp, dst, ino, *offset,
-						 byte_cnt, bno);
+		if (xfs_attr3_rmt_has_header(mp, attrns))
+			hdr_size = xfs_attr3_rmt_hdr_set(mp, dst, ino, *offset,
+					byte_cnt, bno);
 
 		memcpy(dst + hdr_size, *src, byte_cnt);
 
+		if (attrns & XFS_ATTR_VERITY)
+			xfs_attr_rmtverity_transform(bp, ino, dst + hdr_size,
+					byte_cnt);
+
 		/*
 		 * If this is the last block, zero the remainder of it.
 		 * Check that we are actually the last block, too.
@@ -407,6 +498,7 @@ xfs_attr_rmtval_get(
 	unsigned int		blkcnt = args->rmtblkcnt;
 	int			i;
 	unsigned int		offset = 0;
+	const struct xfs_buf_ops *ops = xfs_attr3_remote_buf_ops(args->attr_filter);
 
 	trace_xfs_attr_rmtval_get(args);
 
@@ -432,14 +524,15 @@ xfs_attr_rmtval_get(
 			dblkno = XFS_FSB_TO_DADDR(mp, map[i].br_startblock);
 			dblkcnt = XFS_FSB_TO_BB(mp, map[i].br_blockcount);
 			error = xfs_buf_read(mp->m_ddev_targp, dblkno, dblkcnt,
-					0, &bp, &xfs_attr3_rmt_buf_ops);
+					0, &bp, ops);
 			if (xfs_metadata_is_sick(error))
 				xfs_dirattr_mark_sick(args->dp, XFS_ATTR_FORK);
 			if (error)
 				return error;
 
 			error = xfs_attr_rmtval_copyout(mp, bp, args->dp,
-					args->owner, &offset, &valuelen, &dst);
+					args->owner, args->attr_filter,
+					&offset, &valuelen, &dst);
 			xfs_buf_relse(bp);
 			if (error)
 				return error;
@@ -472,7 +565,7 @@ xfs_attr_rmt_find_hole(
 	 * straight byte to FSB conversion and have to take the header space
 	 * into account.
 	 */
-	blkcnt = xfs_attr3_rmt_blocks(mp, args->rmtvaluelen);
+	blkcnt = xfs_attr3_rmt_blocks(mp, args->attr_filter, args->rmtvaluelen);
 	error = xfs_bmap_first_unused(args->trans, args->dp, blkcnt, &lfileoff,
 						   XFS_ATTR_FORK);
 	if (error)
@@ -531,10 +624,10 @@ xfs_attr_rmtval_set_value(
 		error = xfs_buf_get(mp->m_ddev_targp, dblkno, dblkcnt, &bp);
 		if (error)
 			return error;
-		bp->b_ops = &xfs_attr3_rmt_buf_ops;
+		bp->b_ops = xfs_attr3_remote_buf_ops(args->attr_filter);
 
-		xfs_attr_rmtval_copyin(mp, bp, args->owner, &offset, &valuelen,
-				&src);
+		xfs_attr_rmtval_copyin(mp, bp, args->owner, args->attr_filter,
+				&offset, &valuelen, &src);
 
 		error = xfs_bwrite(bp);	/* GROT: NOTE: synchronous write */
 		xfs_buf_relse(bp);
diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
index e3c6c7d774bf9..344fea1b9b50e 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.h
+++ b/fs/xfs/libxfs/xfs_attr_remote.h
@@ -6,12 +6,13 @@
 #ifndef __XFS_ATTR_REMOTE_H__
 #define	__XFS_ATTR_REMOTE_H__
 
-unsigned int xfs_attr3_rmt_blocks(struct xfs_mount *mp, unsigned int attrlen);
+unsigned int xfs_attr3_rmt_blocks(struct xfs_mount *mp, unsigned int attrns,
+		unsigned int attrlen);
 
 /* Number of rmt blocks needed to store the maximally sized attr value */
 static inline unsigned int xfs_attr3_max_rmt_blocks(struct xfs_mount *mp)
 {
-	return xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX);
+	return xfs_attr3_rmt_blocks(mp, 0, XFS_XATTR_SIZE_MAX);
 }
 
 int xfs_attr_rmtval_get(struct xfs_da_args *args);
@@ -23,4 +24,7 @@ int xfs_attr_rmt_find_hole(struct xfs_da_args *args);
 int xfs_attr_rmtval_set_value(struct xfs_da_args *args);
 int xfs_attr_rmtval_set_blk(struct xfs_attr_intent *attr);
 int xfs_attr_rmtval_find_space(struct xfs_attr_intent *attr);
+
+const struct xfs_buf_ops *xfs_attr3_remote_buf_ops(unsigned int attrns);
+
 #endif /* __XFS_ATTR_REMOTE_H__ */
diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 27b9ad9f8b2e4..c84b94da3f321 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -885,7 +885,27 @@ struct xfs_attr3_rmt_hdr {
 
 #define XFS_ATTR3_RMT_CRC_OFF	offsetof(struct xfs_attr3_rmt_hdr, rm_crc)
 
-unsigned int xfs_attr3_rmt_buf_space(struct xfs_mount *mp);
+unsigned int xfs_attr3_rmt_buf_space(struct xfs_mount *mp, unsigned int attrns);
+
+/*
+ * XFS_ATTR_VERITY remote attribute block format definition
+ *
+ * fsverity stores blocks of a merkle tree in the extended attributes.  The
+ * size of each block is a power of two, so we'd like to reduce overhead by
+ * not storing a remote header at the start of each ondisk block.  Because
+ * merkle tree blocks are themselves hashes of other merkle tree or data
+ * blocks, we can detect bitflips without needing our own checksum.  Settle for
+ * XORing the owner, blkno, magic, and metauuid into the start of each ondisk
+ * merkle tree block.
+ */
+#define XFS_ATTR3_RMTVERITY_MAGIC	0x5955434B	/* YUCK */
+
+struct xfs_attr3_rmtverity_hdr {
+	__be64	rmv_owner;
+	__be64	rmv_blkno;
+	__be32	rmv_magic;
+	uuid_t	rmv_uuid;
+} __packed;
 
 /* Number of bytes in a directory block. */
 static inline unsigned int xfs_dir2_dirblock_bytes(struct xfs_sb *sbp)
diff --git a/fs/xfs/libxfs/xfs_ondisk.h b/fs/xfs/libxfs/xfs_ondisk.h
index 653ea6d643489..7a312aed23373 100644
--- a/fs/xfs/libxfs/xfs_ondisk.h
+++ b/fs/xfs/libxfs/xfs_ondisk.h
@@ -59,6 +59,7 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(struct xfs_attr3_leaf_hdr,	80);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_attr3_leafblock,	80);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_attr3_rmt_hdr,		56);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_attr3_rmtverity_hdr,	36);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_da3_blkinfo,		56);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_da3_intnode,		64);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_da3_node_hdr,		64);
@@ -207,6 +208,7 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_VALUE(XFS_DQ_BIGTIME_EXPIRY_MIN << XFS_DQ_BIGTIME_SHIFT, 4);
 	XFS_CHECK_VALUE(XFS_DQ_BIGTIME_EXPIRY_MAX << XFS_DQ_BIGTIME_SHIFT,
 			16299260424LL);
+
 }
 
 #endif /* __XFS_ONDISK_H */
diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index 40a4826603074..eb3a674fe1615 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -26,6 +26,7 @@ extern const struct xfs_buf_ops xfs_agfl_buf_ops;
 extern const struct xfs_buf_ops xfs_agi_buf_ops;
 extern const struct xfs_buf_ops xfs_attr3_leaf_buf_ops;
 extern const struct xfs_buf_ops xfs_attr3_rmt_buf_ops;
+extern const struct xfs_buf_ops xfs_attr3_rmtverity_buf_ops;
 extern const struct xfs_buf_ops xfs_bmbt_buf_ops;
 extern const struct xfs_buf_ops xfs_bnobt_buf_ops;
 extern const struct xfs_buf_ops xfs_cntbt_buf_ops;
diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
index 24fb12986a568..93fa78a230d04 100644
--- a/fs/xfs/xfs_attr_inactive.c
+++ b/fs/xfs/xfs_attr_inactive.c
@@ -110,7 +110,7 @@ xfs_attr3_leaf_inactive(
 		if (!name_rmt->valueblk)
 			continue;
 
-		blkcnt = xfs_attr3_rmt_blocks(dp->i_mount,
+		blkcnt = xfs_attr3_rmt_blocks(dp->i_mount, entry->flags,
 				be32_to_cpu(name_rmt->valuelen));
 		error = xfs_attr3_rmt_stale(dp,
 				be32_to_cpu(name_rmt->valueblk), blkcnt);


^ permalink raw reply related	[flat|nested] 159+ messages in thread
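
The format comment in xfs_da_format.h above describes XORing the owner, blkno, magic, and metauuid into the start of each ondisk merkle tree block instead of storing a header. The body of xfs_attr_rmtverity_transform() is not shown in this part of the thread, so the following is only a userspace sketch of that idea; every `sketch_` name is hypothetical, and the 36-byte layout is taken from the packed struct and its size check.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 8 (owner) + 8 (blkno) + 4 (magic) + 16 (uuid), per the size check above */
#define SKETCH_HDR_LEN		36
#define SKETCH_RMTVERITY_MAGIC	0x5955434BU	/* "YUCK", from the patch */

/* Store a 64-bit value in big-endian byte order. */
static void sketch_put_be64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Store a 32-bit value in big-endian byte order. */
static void sketch_put_be32(uint8_t *p, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		p[i] = (uint8_t)(v >> (24 - 8 * i));
}

/* Build the pseudo-header bytes: owner, blkno, magic, then uuid. */
void sketch_build_hdr(uint8_t hdr[SKETCH_HDR_LEN], uint64_t owner,
		uint64_t blkno, const uint8_t uuid[16])
{
	sketch_put_be64(hdr, owner);
	sketch_put_be64(hdr + 8, blkno);
	sketch_put_be32(hdr + 16, SKETCH_RMTVERITY_MAGIC);
	memcpy(hdr + 20, uuid, 16);
}

/*
 * XOR the pseudo-header into the start of a block.  XOR is its own
 * inverse, so the same call serves both the write and the read side.
 */
void sketch_xor_hdr(uint8_t *block, size_t len,
		const uint8_t hdr[SKETCH_HDR_LEN])
{
	size_t n = len < SKETCH_HDR_LEN ? len : SKETCH_HDR_LEN;

	for (size_t i = 0; i < n; i++)
		block[i] ^= hdr[i];
}
```

Because the transform is an involution, applying it with the wrong owner, block number, or uuid leaves the block scrambled, which the merkle tree hash check would then be expected to catch.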

* [PATCH 08/26] xfs: add fs-verity ro-compat flag
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (6 preceding siblings ...)
  2024-04-30  3:25   ` [PATCH 07/26] xfs: do not use xfs_attr3_rmt_hdr for remote verity value blocks Darrick J. Wong
@ 2024-04-30  3:26   ` Darrick J. Wong
  2024-04-30  3:26   ` [PATCH 09/26] xfs: add inode on-disk VERITY flag Darrick J. Wong
                     ` (17 subsequent siblings)
  25 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:26 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Andrey Albershteyn <aalbersh@redhat.com>

To mark inodes with fs-verity enabled, a new XFS_DIFLAG2_VERITY flag
will be added in a later patch. This requires an ro-compat flag so that
older kernels know that a filesystem with fs-verity cannot be modified.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_format.h |    1 +
 fs/xfs/libxfs/xfs_sb.c     |    2 ++
 fs/xfs/xfs_mount.h         |    2 ++
 3 files changed, 5 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index e9585ba12ded3..563f359f2f075 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -387,6 +387,7 @@ xfs_sb_has_compat_feature(
 #define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
 #define XFS_SB_FEAT_RO_COMPAT_REFLINK  (1 << 2)		/* reflinked files */
 #define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3)		/* inobt block counts */
+#define XFS_SB_FEAT_RO_COMPAT_VERITY   (1 << 4)		/* fs-verity */
 #define XFS_SB_FEAT_RO_COMPAT_ALL \
 		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
 		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index ad64647234f44..0bf5b4007afd8 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -167,6 +167,8 @@ xfs_sb_version_to_features(
 		features |= XFS_FEAT_REFLINK;
 	if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
 		features |= XFS_FEAT_INOBTCNT;
+	if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_VERITY)
+		features |= XFS_FEAT_VERITY;
 	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_FTYPE)
 		features |= XFS_FEAT_FTYPE;
 	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_SPINODES)
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index e44eef998477d..78284e91244a8 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -311,6 +311,7 @@ typedef struct xfs_mount {
 #define XFS_FEAT_EXCHANGE_RANGE	(1ULL << 27)	/* exchange range */
 #define XFS_FEAT_METADIR	(1ULL << 28)	/* metadata directory tree */
 #define XFS_FEAT_RTGROUPS	(1ULL << 29)	/* realtime groups */
+#define XFS_FEAT_VERITY		(1ULL << 30)	/* fs-verity */
 
 /* Mount features */
 #define XFS_FEAT_NOATTR2	(1ULL << 48)	/* disable attr2 creation */
@@ -377,6 +378,7 @@ __XFS_HAS_FEAT(large_extent_counts, NREXT64)
 __XFS_HAS_FEAT(exchange_range, EXCHANGE_RANGE)
 __XFS_HAS_FEAT(metadir, METADIR)
 __XFS_HAS_FEAT(rtgroups, RTGROUPS)
+__XFS_HAS_FEAT(verity, VERITY)
 
 static inline bool xfs_has_rtrmapbt(struct xfs_mount *mp)
 {


^ permalink raw reply related	[flat|nested] 159+ messages in thread
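
The ro-compat gating that patch 08 relies on can be sketched in userspace as follows. This is a simplified illustration, not the kernel's mount code: the bit values are copied from the xfs_format.h hunk above, while the `SKETCH_` masks and the function name are hypothetical. XFS_SB_FEAT_RO_COMPAT_FINOBT is left out because its value is not shown in this excerpt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as shown in the xfs_format.h hunk. */
#define RO_COMPAT_RMAPBT	(1u << 1)
#define RO_COMPAT_REFLINK	(1u << 2)
#define RO_COMPAT_INOBTCNT	(1u << 3)
#define RO_COMPAT_VERITY	(1u << 4)

/* Hypothetical "known feature" masks for an older and a newer kernel. */
#define SKETCH_OLD_KNOWN \
	(RO_COMPAT_RMAPBT | RO_COMPAT_REFLINK | RO_COMPAT_INOBTCNT)
#define SKETCH_NEW_KNOWN	(SKETCH_OLD_KNOWN | RO_COMPAT_VERITY)

/*
 * A kernel may mount read-write only if the superblock carries no
 * ro-compat bits outside the set that this kernel understands.
 */
bool sketch_can_mount_rw(uint32_t sb_ro_compat, uint32_t known_mask)
{
	return (sb_ro_compat & ~known_mask) == 0;
}
```

With these masks, a filesystem that has RO_COMPAT_VERITY set is writable under SKETCH_NEW_KNOWN but only mountable read-only under SKETCH_OLD_KNOWN, which is the behaviour the commit message asks for.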

* [PATCH 09/26] xfs: add inode on-disk VERITY flag
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (7 preceding siblings ...)
  2024-04-30  3:26   ` [PATCH 08/26] xfs: add fs-verity ro-compat flag Darrick J. Wong
@ 2024-04-30  3:26   ` Darrick J. Wong
  2024-04-30  3:26   ` [PATCH 10/26] xfs: initialize fs-verity on file open and cleanup on inode destruction Darrick J. Wong
                     ` (16 subsequent siblings)
  25 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:26 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Andrey Albershteyn <aalbersh@redhat.com>

Add a flag to mark inodes that have fs-verity enabled (i.e. the
descriptor exists and the tree is built).

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_format.h     |    5 ++++-
 fs/xfs/libxfs/xfs_inode_buf.c  |    8 ++++++++
 fs/xfs/libxfs/xfs_inode_util.c |    2 ++
 fs/xfs/xfs_iops.c              |    2 ++
 4 files changed, 16 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 563f359f2f075..810f2556762b0 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1190,6 +1190,7 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
 #define XFS_DIFLAG2_COWEXTSIZE_BIT   2  /* copy on write extent size hint */
 #define XFS_DIFLAG2_BIGTIME_BIT	3	/* big timestamps */
 #define XFS_DIFLAG2_NREXT64_BIT	4	/* large extent counters */
+#define XFS_DIFLAG2_VERITY_BIT	5	/* inode sealed by fsverity */
 #define XFS_DIFLAG2_METADIR_BIT	63	/* filesystem metadata */
 
 #define XFS_DIFLAG2_DAX		(1ULL << XFS_DIFLAG2_DAX_BIT)
@@ -1197,6 +1198,7 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
 #define XFS_DIFLAG2_COWEXTSIZE	(1ULL << XFS_DIFLAG2_COWEXTSIZE_BIT)
 #define XFS_DIFLAG2_BIGTIME	(1ULL << XFS_DIFLAG2_BIGTIME_BIT)
 #define XFS_DIFLAG2_NREXT64	(1ULL << XFS_DIFLAG2_NREXT64_BIT)
+#define XFS_DIFLAG2_VERITY	(1ULL << XFS_DIFLAG2_VERITY_BIT)
 
 /*
  * The inode contains filesystem metadata and can be found through the metadata
@@ -1225,7 +1227,8 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
 
 #define XFS_DIFLAG2_ANY \
 	(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE | \
-	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | XFS_DIFLAG2_METADIR)
+	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | XFS_DIFLAG2_METADIR | \
+	 XFS_DIFLAG2_VERITY)
 
 static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
 {
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index adc457da52ef0..dae0f27d3961b 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -695,6 +695,14 @@ xfs_dinode_verify(
 	    !xfs_has_rtreflink(mp))
 		return __this_address;
 
+	/* only regular files can have fsverity */
+	if (flags2 & XFS_DIFLAG2_VERITY) {
+		if (!xfs_has_verity(mp))
+			return __this_address;
+		if ((mode & S_IFMT) != S_IFREG)
+			return __this_address;
+	}
+
 	/* COW extent size hint validation */
 	fa = xfs_inode_validate_cowextsize(mp, be32_to_cpu(dip->di_cowextsize),
 			mode, flags, flags2);
diff --git a/fs/xfs/libxfs/xfs_inode_util.c b/fs/xfs/libxfs/xfs_inode_util.c
index a448e4a2a3e59..fcea20ad675e8 100644
--- a/fs/xfs/libxfs/xfs_inode_util.c
+++ b/fs/xfs/libxfs/xfs_inode_util.c
@@ -127,6 +127,8 @@ xfs_ip2xflags(
 			flags |= FS_XFLAG_DAX;
 		if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
 			flags |= FS_XFLAG_COWEXTSIZE;
+		if (ip->i_diflags2 & XFS_DIFLAG2_VERITY)
+			flags |= FS_XFLAG_VERITY;
 	}
 
 	if (xfs_inode_has_attr_fork(ip))
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index c22411a8ed16b..80e3c2a3c6dbf 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1291,6 +1291,8 @@ xfs_diflags_to_iflags(
 		flags |= S_NOATIME;
 	if (init && xfs_inode_should_enable_dax(ip))
 		flags |= S_DAX;
+	if (xflags & FS_XFLAG_VERITY)
+		flags |= S_VERITY;
 
 	/*
 	 * S_DAX can only be set during inode initialization and is never set by


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 10/26] xfs: initialize fs-verity on file open and cleanup on inode destruction
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (8 preceding siblings ...)
  2024-04-30  3:26   ` [PATCH 09/26] xfs: add inode on-disk VERITY flag Darrick J. Wong
@ 2024-04-30  3:26   ` Darrick J. Wong
  2024-04-30  3:26   ` [PATCH 11/26] xfs: don't allow to enable DAX on fs-verity sealed inode Darrick J. Wong
                     ` (15 subsequent siblings)
  25 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:26 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Andrey Albershteyn <aalbersh@redhat.com>

fs-verity will read and attach metadata (not the tree itself) from
disk for those inodes that already have fs-verity enabled.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_file.c  |    8 ++++++++
 fs/xfs/xfs_super.c |    2 ++
 2 files changed, 10 insertions(+)


diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 75ec4152ecafc..fe1f108aa6bff 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -33,6 +33,7 @@
 #include <linux/fadvise.h>
 #include <linux/mount.h>
 #include <linux/fsnotify.h>
+#include <linux/fsverity.h>
 
 static const struct vm_operations_struct xfs_file_vm_ops;
 
@@ -1477,10 +1478,17 @@ xfs_file_open(
 	struct inode	*inode,
 	struct file	*file)
 {
+	int		error;
+
 	if (xfs_is_shutdown(XFS_M(inode->i_sb)))
 		return -EIO;
 	file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC |
 			FMODE_DIO_PARALLEL_WRITE | FMODE_CAN_ODIRECT;
+
+	error = fsverity_file_open(inode, file);
+	if (error)
+		return error;
+
 	return generic_file_open(inode, file);
 }
 
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 8e2f263b444c6..72842d4f16c92 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -52,6 +52,7 @@
 #include <linux/magic.h>
 #include <linux/fs_context.h>
 #include <linux/fs_parser.h>
+#include <linux/fsverity.h>
 
 static const struct super_operations xfs_super_operations;
 
@@ -667,6 +668,7 @@ xfs_fs_destroy_inode(
 	ASSERT(!rwsem_is_locked(&inode->i_rwsem));
 	XFS_STATS_INC(ip->i_mount, vn_rele);
 	XFS_STATS_INC(ip->i_mount, vn_remove);
+	fsverity_cleanup_inode(inode);
 	xfs_inode_mark_reclaimable(ip);
 }
 


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 11/26] xfs: don't allow to enable DAX on fs-verity sealed inode
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (9 preceding siblings ...)
  2024-04-30  3:26   ` [PATCH 10/26] xfs: initialize fs-verity on file open and cleanup on inode destruction Darrick J. Wong
@ 2024-04-30  3:26   ` Darrick J. Wong
  2024-04-30  3:27   ` [PATCH 12/26] xfs: disable direct read path for fs-verity files Darrick J. Wong
                     ` (14 subsequent siblings)
  25 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:26 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Andrey Albershteyn <aalbersh@redhat.com>

fs-verity doesn't support DAX. Forbid the filesystem from enabling DAX
on inodes that already have fs-verity enabled. The opposite case is
checked when fs-verity is enabled: it won't be enabled if DAX is.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: fix typo in subject]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_iops.c |    2 ++
 1 file changed, 2 insertions(+)


diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 80e3c2a3c6dbf..2d65da94631c5 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1263,6 +1263,8 @@ xfs_inode_should_enable_dax(
 		return false;
 	if (!xfs_inode_supports_dax(ip))
 		return false;
+	if (ip->i_diflags2 & XFS_DIFLAG2_VERITY)
+		return false;
 	if (xfs_has_dax_always(ip->i_mount))
 		return true;
 	if (ip->i_diflags2 & XFS_DIFLAG2_DAX)


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 12/26] xfs: disable direct read path for fs-verity files
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (10 preceding siblings ...)
  2024-04-30  3:26   ` [PATCH 11/26] xfs: don't allow to enable DAX on fs-verity sealed inode Darrick J. Wong
@ 2024-04-30  3:27   ` Darrick J. Wong
  2024-04-30  3:27   ` [PATCH 13/26] xfs: widen flags argument to the xfs_iflags_* helpers Darrick J. Wong
                     ` (13 subsequent siblings)
  25 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:27 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Andrey Albershteyn <aalbersh@redhat.com>

The direct I/O path is not supported on verity files. Attempts to use
the direct I/O path on such files should fall back to the buffered I/O
path.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: fix braces]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_file.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)


diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index fe1f108aa6bff..2ab28c64373d6 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -269,7 +269,8 @@ xfs_file_dax_read(
 	struct kiocb		*iocb,
 	struct iov_iter		*to)
 {
-	struct xfs_inode	*ip = XFS_I(iocb->ki_filp->f_mapping->host);
+	struct inode		*inode = iocb->ki_filp->f_mapping->host;
+	struct xfs_inode	*ip = XFS_I(inode);
 	ssize_t			ret = 0;
 
 	trace_xfs_file_dax_read(iocb, to);
@@ -322,10 +323,18 @@ xfs_file_read_iter(
 
 	if (IS_DAX(inode))
 		ret = xfs_file_dax_read(iocb, to);
-	else if (iocb->ki_flags & IOCB_DIRECT)
+	else if ((iocb->ki_flags & IOCB_DIRECT) && !fsverity_active(inode))
 		ret = xfs_file_dio_read(iocb, to);
-	else
+	else {
+		/*
+		 * If fs-verity is enabled, direct reads also fall back to
+		 * the buffered read path.  In that case IOCB_DIRECT is still
+		 * set and needs to be cleared (see
+		 * generic_file_read_iter()).
+		 */
+		iocb->ki_flags &= ~IOCB_DIRECT;
 		ret = xfs_file_buffered_read(iocb, to);
+	}
 
 	if (ret > 0)
 		XFS_STATS_ADD(mp, xs_read_bytes, ret);


^ permalink raw reply related	[flat|nested] 159+ messages in thread
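
The read-path selection that patch 12 changes in xfs_file_read_iter() can be modelled outside the kernel like this. It is a sketch only: `SKETCH_IOCB_DIRECT` is a stand-in bit, not the kernel's IOCB_DIRECT value, and the enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Stand-in for IOCB_DIRECT; the real kernel value is not assumed here. */
#define SKETCH_IOCB_DIRECT	(1u << 0)

enum sketch_read_path {
	SKETCH_READ_DAX,
	SKETCH_READ_DIRECT,
	SKETCH_READ_BUFFERED,
};

/*
 * Choose a read path the way the patched function does: DAX first, then
 * direct I/O unless verity is active, otherwise buffered.  On the
 * buffered path the direct flag is cleared so that later code does not
 * treat the request as direct I/O.
 */
enum sketch_read_path sketch_pick_read_path(unsigned int *ki_flags,
		bool is_dax, bool verity_active)
{
	if (is_dax)
		return SKETCH_READ_DAX;
	if ((*ki_flags & SKETCH_IOCB_DIRECT) && !verity_active)
		return SKETCH_READ_DIRECT;

	*ki_flags &= ~SKETCH_IOCB_DIRECT;
	return SKETCH_READ_BUFFERED;
}
```

Clearing the flag is harmless for requests that were buffered to begin with, which is why the patch can do it unconditionally in the else branch.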

* [PATCH 13/26] xfs: widen flags argument to the xfs_iflags_* helpers
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (11 preceding siblings ...)
  2024-04-30  3:27   ` [PATCH 12/26] xfs: disable direct read path for fs-verity files Darrick J. Wong
@ 2024-04-30  3:27   ` Darrick J. Wong
  2024-05-01  6:54     ` Christoph Hellwig
  2024-04-30  3:27   ` [PATCH 14/26] xfs: add fs-verity support Darrick J. Wong
                     ` (12 subsequent siblings)
  25 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:27 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

xfs_inode.i_flags is an unsigned long, so make these helpers take that
as the flags argument instead of unsigned short.  This is needed for the
next patch.

While we're at it, remove the iflags variable from xfs_iget_cache_miss
because we no longer need it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/xfs_icache.c |    4 +---
 fs/xfs/xfs_inode.h  |   14 +++++++-------
 2 files changed, 8 insertions(+), 10 deletions(-)


diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index ad93df7a47c1c..f05c6510e94f4 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -616,7 +616,6 @@ xfs_iget_cache_miss(
 	struct xfs_inode	*ip;
 	int			error;
 	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, ino);
-	int			iflags;
 
 	ip = xfs_inode_alloc(mp, ino);
 	if (!ip)
@@ -696,13 +695,12 @@ xfs_iget_cache_miss(
 	 * memory barrier that ensures this detection works correctly at lookup
 	 * time.
 	 */
-	iflags = XFS_INEW;
 	if (flags & XFS_IGET_DONTCACHE)
 		d_mark_dontcache(VFS_I(ip));
 	ip->i_udquot = NULL;
 	ip->i_gdquot = NULL;
 	ip->i_pdquot = NULL;
-	xfs_iflags_set(ip, iflags);
+	xfs_iflags_set(ip, XFS_INEW);
 
 	/* insert the new inode */
 	spin_lock(&pag->pag_ici_lock);
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 247cff3d75fd7..503ea082dfac4 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -211,13 +211,13 @@ xfs_new_eof(struct xfs_inode *ip, xfs_fsize_t new_size)
  * i_flags helper functions
  */
 static inline void
-__xfs_iflags_set(xfs_inode_t *ip, unsigned short flags)
+__xfs_iflags_set(xfs_inode_t *ip, unsigned long flags)
 {
 	ip->i_flags |= flags;
 }
 
 static inline void
-xfs_iflags_set(xfs_inode_t *ip, unsigned short flags)
+xfs_iflags_set(xfs_inode_t *ip, unsigned long flags)
 {
 	spin_lock(&ip->i_flags_lock);
 	__xfs_iflags_set(ip, flags);
@@ -225,7 +225,7 @@ xfs_iflags_set(xfs_inode_t *ip, unsigned short flags)
 }
 
 static inline void
-xfs_iflags_clear(xfs_inode_t *ip, unsigned short flags)
+xfs_iflags_clear(xfs_inode_t *ip, unsigned long flags)
 {
 	spin_lock(&ip->i_flags_lock);
 	ip->i_flags &= ~flags;
@@ -233,13 +233,13 @@ xfs_iflags_clear(xfs_inode_t *ip, unsigned short flags)
 }
 
 static inline int
-__xfs_iflags_test(xfs_inode_t *ip, unsigned short flags)
+__xfs_iflags_test(xfs_inode_t *ip, unsigned long flags)
 {
 	return (ip->i_flags & flags);
 }
 
 static inline int
-xfs_iflags_test(xfs_inode_t *ip, unsigned short flags)
+xfs_iflags_test(xfs_inode_t *ip, unsigned long flags)
 {
 	int ret;
 	spin_lock(&ip->i_flags_lock);
@@ -249,7 +249,7 @@ xfs_iflags_test(xfs_inode_t *ip, unsigned short flags)
 }
 
 static inline int
-xfs_iflags_test_and_clear(xfs_inode_t *ip, unsigned short flags)
+xfs_iflags_test_and_clear(xfs_inode_t *ip, unsigned long flags)
 {
 	int ret;
 
@@ -262,7 +262,7 @@ xfs_iflags_test_and_clear(xfs_inode_t *ip, unsigned short flags)
 }
 
 static inline int
-xfs_iflags_test_and_set(xfs_inode_t *ip, unsigned short flags)
+xfs_iflags_test_and_set(xfs_inode_t *ip, unsigned long flags)
 {
 	int ret;
 


^ permalink raw reply related	[flat|nested] 159+ messages in thread
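
To illustrate why patch 13 widens the helper arguments, the sketch below contrasts a setter that takes unsigned short with one that takes unsigned long. It assumes a 16-bit unsigned short, as on the usual Linux ABIs, and uses a hypothetical flag at bit 16; the value of the flag introduced by the next patch is not shown in this excerpt.

```c
/* Hypothetical flag that does not fit in 16 bits. */
#define SKETCH_HIGH_FLAG	(1UL << 16)

/* Old-style helper: the argument is narrowed to unsigned short. */
unsigned long sketch_set_narrow(unsigned long i_flags, unsigned short flags)
{
	return i_flags | flags;
}

/* New-style helper: the argument is as wide as i_flags itself. */
unsigned long sketch_set_wide(unsigned long i_flags, unsigned long flags)
{
	return i_flags | flags;
}
```

Passing SKETCH_HIGH_FLAG to the narrow helper silently drops the bit during the implicit conversion, so the flag is never set; the wide helper preserves it.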

* [PATCH 14/26] xfs: add fs-verity support
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (12 preceding siblings ...)
  2024-04-30  3:27   ` [PATCH 13/26] xfs: widen flags argument to the xfs_iflags_* helpers Darrick J. Wong
@ 2024-04-30  3:27   ` Darrick J. Wong
  2024-04-30  3:28   ` [PATCH 15/26] xfs: create a per-mount shrinker for verity inodes merkle tree blocks Darrick J. Wong
                     ` (11 subsequent siblings)
  25 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:27 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Andrey Albershteyn <aalbersh@redhat.com>

Add integration with fs-verity. XFS stores fs-verity metadata in
extended file attributes. The metadata consists of the verity
descriptor and Merkle tree blocks.

The descriptor is stored under the "vdesc" extended attribute. The
Merkle tree blocks are stored under binary indexes, which are offsets
into the Merkle tree.

When fs-verity is enabled on an inode, the XFS_IVERITY_CONSTRUCTION
flag is set, meaning that the Merkle tree is being built. The
initialization ends with storing the verity descriptor and setting
the inode on-disk flag (XFS_DIFLAG2_VERITY).

Verification on read is done in the iomap read path.

Merkle tree blocks are indexed by a per-AG rhashtable to reduce the time
it takes to load a block from disk in a manner that doesn't bloat struct
xfs_inode.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: replace caching implementation with an xarray, other cleanups]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/Makefile               |    2 
 fs/xfs/libxfs/xfs_ag.h        |    8 
 fs/xfs/libxfs/xfs_attr.c      |    4 
 fs/xfs/libxfs/xfs_da_format.h |   14 +
 fs/xfs/libxfs/xfs_ondisk.h    |    3 
 fs/xfs/libxfs/xfs_verity.c    |   58 +++
 fs/xfs/libxfs/xfs_verity.h    |   13 +
 fs/xfs/xfs_fsops.c            |    6 
 fs/xfs/xfs_fsverity.c         |  758 +++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_fsverity.h         |   32 ++
 fs/xfs/xfs_inode.h            |    2 
 fs/xfs/xfs_mount.c            |   10 -
 fs/xfs/xfs_super.c            |   22 +
 fs/xfs/xfs_trace.c            |    1 
 fs/xfs/xfs_trace.h            |   39 ++
 15 files changed, 971 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/libxfs/xfs_verity.c
 create mode 100644 fs/xfs/libxfs/xfs_verity.h
 create mode 100644 fs/xfs/xfs_fsverity.c
 create mode 100644 fs/xfs/xfs_fsverity.h


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index f8e72e53d9ec5..34176ba4c77ef 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -57,6 +57,7 @@ xfs-y				+= $(addprefix libxfs/, \
 				   xfs_trans_resv.o \
 				   xfs_trans_space.o \
 				   xfs_types.o \
+				   xfs_verity.o \
 				   )
 # xfs_rtbitmap is shared with libxfs
 xfs-$(CONFIG_XFS_RT)		+= $(addprefix libxfs/, \
@@ -142,6 +143,7 @@ xfs-$(CONFIG_XFS_POSIX_ACL)	+= xfs_acl.o
 xfs-$(CONFIG_SYSCTL)		+= xfs_sysctl.o
 xfs-$(CONFIG_COMPAT)		+= xfs_ioctl32.o
 xfs-$(CONFIG_EXPORTFS_BLOCK_OPS)	+= xfs_pnfs.o
+xfs-$(CONFIG_FS_VERITY)		+= xfs_fsverity.o
 
 # notify failure
 ifeq ($(CONFIG_MEMORY_FAILURE),y)
diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
index 80bf8771ea2ac..792ce162312e7 100644
--- a/fs/xfs/libxfs/xfs_ag.h
+++ b/fs/xfs/libxfs/xfs_ag.h
@@ -123,6 +123,12 @@ struct xfs_perag {
 
 	/* Hook to feed rmapbt updates to an active online repair. */
 	struct xfs_hooks	pag_rmap_update_hooks;
+
+# ifdef CONFIG_FS_VERITY
+	/* per-inode merkle tree caches */
+	spinlock_t		pagi_merkle_lock;
+	struct rhashtable	pagi_merkle_blobs;
+# endif /* CONFIG_FS_VERITY */
 #endif /* __KERNEL__ */
 };
 
@@ -135,6 +141,7 @@ struct xfs_perag {
 #define XFS_AGSTATE_ALLOWS_INODES	3
 #define XFS_AGSTATE_AGFL_NEEDS_RESET	4
 #define XFS_AGSTATE_NOALLOC		5
+#define XFS_AGSTATE_MERKLE		6
 
 #define __XFS_AG_OPSTATE(name, NAME) \
 static inline bool xfs_perag_ ## name (struct xfs_perag *pag) \
@@ -148,6 +155,7 @@ __XFS_AG_OPSTATE(prefers_metadata, PREFERS_METADATA)
 __XFS_AG_OPSTATE(allows_inodes, ALLOWS_INODES)
 __XFS_AG_OPSTATE(agfl_needs_reset, AGFL_NEEDS_RESET)
 __XFS_AG_OPSTATE(prohibits_alloc, NOALLOC)
+__XFS_AG_OPSTATE(caches_merkle, MERKLE)
 
 void xfs_free_unused_perag_range(struct xfs_mount *mp, xfs_agnumber_t agstart,
 			xfs_agnumber_t agend);
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 1b9d9ffb16833..953a82d70223e 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -27,6 +27,7 @@
 #include "xfs_attr_item.h"
 #include "xfs_xattr.h"
 #include "xfs_parent.h"
+#include "xfs_verity.h"
 
 struct kmem_cache		*xfs_attr_intent_cache;
 
@@ -1619,6 +1620,9 @@ xfs_attr_namecheck(
 	if (!xfs_attr_check_namespace(attr_flags))
 		return false;
 
+	if (attr_flags & XFS_ATTR_VERITY)
+		return xfs_verity_namecheck(attr_flags, name, length);
+
 	/*
 	 * MAXNAMELEN includes the trailing null, but (name/length) leave it
 	 * out, so use >= for the length check.
diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index c84b94da3f321..43e9d1f00a4ab 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -929,4 +929,18 @@ struct xfs_parent_rec {
 	__be32	p_gen;
 } __packed;
 
+/*
+ * fs-verity attribute name format
+ *
+ * Merkle tree blocks are stored under extended attributes of the inode.  The
+ * names of the attributes are byte positions into the merkle data.
+ */
+struct xfs_merkle_key {
+	__be64	mk_pos;
+};
+
+/* ondisk xattr name used for the fsverity descriptor */
+#define XFS_VERITY_DESCRIPTOR_NAME	"vdesc"
+#define XFS_VERITY_DESCRIPTOR_NAME_LEN	(sizeof(XFS_VERITY_DESCRIPTOR_NAME) - 1)
+
 #endif /* __XFS_DA_FORMAT_H__ */
diff --git a/fs/xfs/libxfs/xfs_ondisk.h b/fs/xfs/libxfs/xfs_ondisk.h
index 7a312aed23373..03aaf508e4a49 100644
--- a/fs/xfs/libxfs/xfs_ondisk.h
+++ b/fs/xfs/libxfs/xfs_ondisk.h
@@ -209,6 +209,9 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_VALUE(XFS_DQ_BIGTIME_EXPIRY_MAX << XFS_DQ_BIGTIME_SHIFT,
 			16299260424LL);
 
+	/* fs-verity xattrs */
+	XFS_CHECK_STRUCT_SIZE(struct xfs_merkle_key,		8);
+	XFS_CHECK_VALUE(sizeof(XFS_VERITY_DESCRIPTOR_NAME),	6);
 }
 
 #endif /* __XFS_ONDISK_H */
diff --git a/fs/xfs/libxfs/xfs_verity.c b/fs/xfs/libxfs/xfs_verity.c
new file mode 100644
index 0000000000000..ff02c5c840b58
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_verity.c
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023 Red Hat, Inc.
+ */
+#include "xfs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_inode.h"
+#include "xfs_log_format.h"
+#include "xfs_attr.h"
+#include "xfs_verity.h"
+
+/* Set a merkle tree pos in preparation for setting merkle tree attrs. */
+void
+xfs_merkle_key_to_disk(
+	struct xfs_merkle_key	*key,
+	uint64_t		pos)
+{
+	key->mk_pos = cpu_to_be64(pos);
+}
+
+/* Retrieve the merkle tree pos from the attr data. */
+uint64_t
+xfs_merkle_key_from_disk(
+	const void		*attr_name,
+	int			namelen)
+{
+	const struct xfs_merkle_key *key = attr_name;
+
+	ASSERT(namelen == sizeof(struct xfs_merkle_key));
+
+	return be64_to_cpu(key->mk_pos);
+}
+
+/* Return true if verity attr name is valid. */
+bool
+xfs_verity_namecheck(
+	unsigned int		attr_flags,
+	const void		*name,
+	int			namelen)
+{
+	if (!(attr_flags & XFS_ATTR_VERITY))
+		return false;
+
+	/*
+	 * Merkle tree pages are stored under u64 indexes; verity descriptor
+	 * blocks are held in a named attribute.
+	 */
+	if (namelen != sizeof(struct xfs_merkle_key) &&
+	    namelen != XFS_VERITY_DESCRIPTOR_NAME_LEN)
+		return false;
+
+	return true;
+}
diff --git a/fs/xfs/libxfs/xfs_verity.h b/fs/xfs/libxfs/xfs_verity.h
new file mode 100644
index 0000000000000..5813665c5a01e
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_verity.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2022 Red Hat, Inc.
+ */
+#ifndef __XFS_VERITY_H__
+#define __XFS_VERITY_H__
+
+void xfs_merkle_key_to_disk(struct xfs_merkle_key *key, uint64_t pos);
+uint64_t xfs_merkle_key_from_disk(const void *attr_name, int namelen);
+bool xfs_verity_namecheck(unsigned int attr_flags, const void *name,
+		int namelen);
+
+#endif	/* __XFS_VERITY_H__ */
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index a2929a0e0367e..1187b1a33b76c 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -25,6 +25,7 @@
 #include "xfs_rtalloc.h"
 #include "xfs_rtrmap_btree.h"
 #include "xfs_rtrefcount_btree.h"
+#include "xfs_fsverity.h"
 
 /*
  * Write new AG headers to disk. Non-transactional, but need to be
@@ -155,6 +156,11 @@ xfs_growfs_data_private(
 		error = xfs_initialize_perag(mp, nagcount, nb, &nagimax);
 		if (error)
 			return error;
+		error = xfs_fsverity_growfs(mp, oagcount, nagcount);
+		if (error) {
+			xfs_free_unused_perag_range(mp, oagcount, nagcount);
+			return error;
+		}
 	} else if (nagcount < oagcount) {
 		/* TODO: shrinking the entire AGs hasn't yet completed */
 		return -EINVAL;
diff --git a/fs/xfs/xfs_fsverity.c b/fs/xfs/xfs_fsverity.c
new file mode 100644
index 0000000000000..e0f54acd4f786
--- /dev/null
+++ b/fs/xfs/xfs_fsverity.c
@@ -0,0 +1,758 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023 Red Hat, Inc.
+ */
+#include "xfs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_inode.h"
+#include "xfs_log_format.h"
+#include "xfs_attr.h"
+#include "xfs_verity.h"
+#include "xfs_bmap_util.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_attr_leaf.h"
+#include "xfs_trace.h"
+#include "xfs_quota.h"
+#include "xfs_ag.h"
+#include "xfs_fsverity.h"
+#include <linux/fsverity.h>
+
+/*
+ * Merkle Tree Block Cache
+ * =======================
+ *
+ * fsverity requires that the filesystem implement caching of ondisk merkle
+ * tree blocks.  XFS stores merkle tree blocks in the extended attribute data,
+ * which makes it important to keep copies in memory for as long as possible.
+ * This is performed by allocating the data blob structure defined below,
+ * passing the data portion of the blob to xfs_attr_get, and later caching the
+ * data blob via a per-ag hashtable.
+ *
+ * The cache structure indexes merkle tree blocks by the pos given to us by
+ * fsverity, which drastically reduces lookups.  First, it eliminates the need
+ * to walk the xattr structure to find the remote block containing the merkle
+ * tree block.  Second, it avoids the incore extent btree lookup that each
+ * access to a block in the xattr structure requires.
+ */
+struct xfs_merkle_blob {
+	struct rhash_head	rhash;
+	struct rcu_head		rcu;
+
+	struct xfs_merkle_bkey	key;
+
+	/* refcount of this item; the cache holds its own ref */
+	refcount_t		refcount;
+
+	unsigned long		flags;
+
+	/* Pointer to the merkle tree block, which is power-of-2 sized */
+	void			*data;
+};
+
+#define XFS_MERKLE_BLOB_VERIFIED_BIT	(0) /* fsverity validated this */
+
+static const struct rhashtable_params xfs_fsverity_merkle_hash_params = {
+	.key_len		= sizeof(struct xfs_merkle_bkey),
+	.key_offset		= offsetof(struct xfs_merkle_blob, key),
+	.head_offset		= offsetof(struct xfs_merkle_blob, rhash),
+	.automatic_shrinking	= true,
+};
+
+/*
+ * Allocate a merkle tree blob object to prepare for reading a merkle tree
+ * object from disk.
+ */
+static inline struct xfs_merkle_blob *
+xfs_merkle_blob_alloc(
+	struct xfs_inode	*ip,
+	u64			pos,
+	unsigned int		blocksize)
+{
+	struct xfs_merkle_blob	*mk;
+
+	mk = kmalloc(sizeof(struct xfs_merkle_blob), GFP_KERNEL);
+	if (!mk)
+		return NULL;
+
+	mk->data = kvzalloc(blocksize, GFP_KERNEL);
+	if (!mk->data) {
+		kfree(mk);
+		return NULL;
+	}
+
+	/* Caller owns this refcount. */
+	refcount_set(&mk->refcount, 1);
+	mk->flags = 0;
+	mk->key.ino = ip->i_ino;
+	mk->key.pos = pos;
+	return mk;
+}
+
+/* Actually free this blob. */
+static void
+xfs_merkle_blob_free(
+	struct callback_head	*cb)
+{
+	struct xfs_merkle_blob	*mk =
+		container_of(cb, struct xfs_merkle_blob, rcu);
+
+	kvfree(mk->data);
+	kfree(mk);
+}
+
+/* Release a reference to a merkle tree blob, freeing it on the last put. */
+static inline void
+xfs_merkle_blob_rele(
+	struct xfs_merkle_blob	*mk)
+{
+	if (refcount_dec_and_test(&mk->refcount))
+		call_rcu(&mk->rcu, xfs_merkle_blob_free);
+}
+
+/*
+ * Drop this merkle tree blob from the cache.  Caller must have a reference to
+ * the blob, which will be dropped at the end.
+ */
+static inline void
+xfs_merkle_blob_drop(
+	struct xfs_perag	*pag,
+	struct xfs_merkle_blob	*mk)
+{
+	/*
+	 * Remove the blob from the hash table and drop the cache's
+	 * ref to the blob handle.
+	 */
+	spin_lock(&pag->pagi_merkle_lock);
+	rhashtable_remove_fast(&pag->pagi_merkle_blobs, &mk->rhash,
+			xfs_fsverity_merkle_hash_params);
+	xfs_merkle_blob_rele(mk);
+	spin_unlock(&pag->pagi_merkle_lock);
+
+	/* Drop the reference that the caller passed in. */
+	xfs_merkle_blob_rele(mk);
+}
+
+/* Drop all the merkle tree blocks from this part of the cache. */
+STATIC void
+xfs_fsverity_drop_cache(
+	struct xfs_inode	*ip,
+	u64			tree_size,
+	unsigned int		block_size)
+{
+	struct xfs_merkle_bkey	key = {
+		.ino		= ip->i_ino,
+		.pos		= 0,
+	};
+	struct xfs_perag	*pag;
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_merkle_blob	*mk;
+	s64			freed = 0;
+
+	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
+	if (!pag)
+		return;
+
+	for (key.pos = 0; key.pos < tree_size; key.pos += block_size) {
+		/*
+		 * Try to grab the blob from the hash table and get our own
+		 * reference to the object.  If there's a blob handle but it
+		 * has zero refcount then we're racing with reclaim and can
+		 * move on.
+		 */
+		rcu_read_lock();
+		mk = rhashtable_lookup(&pag->pagi_merkle_blobs, &key,
+				xfs_fsverity_merkle_hash_params);
+		if (mk && !refcount_inc_not_zero(&mk->refcount))
+			mk = NULL;
+		rcu_read_unlock();
+
+		if (!mk)
+			continue;
+
+		trace_xfs_fsverity_cache_drop(mp, &mk->key, _RET_IP_);
+
+		xfs_merkle_blob_drop(pag, mk);
+		freed++;
+	}
+
+	xfs_perag_put(pag);
+}
+
+/*
+ * Drop all the merkle tree blocks out of the cache.  Caller must ensure that
+ * there are no active references to cache items.
+ */
+void
+xfs_fsverity_destroy_inode(
+	struct xfs_inode	*ip)
+{
+	u64			tree_size;
+	unsigned int		block_size;
+	int			error;
+
+	error = fsverity_merkle_tree_geometry(VFS_I(ip), &block_size,
+			&tree_size);
+	if (error)
+		return;
+
+	xfs_fsverity_drop_cache(ip, tree_size, block_size);
+}
+
+/* Return a cached merkle tree block, or NULL. */
+static struct xfs_merkle_blob *
+xfs_fsverity_cache_load(
+	struct xfs_inode	*ip,
+	u64			pos)
+{
+	struct xfs_merkle_bkey	key = {
+		.ino		= ip->i_ino,
+		.pos		= pos,
+	};
+	struct xfs_perag	*pag;
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_merkle_blob	*mk;
+
+	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
+	if (!pag)
+		return NULL;
+
+	rcu_read_lock();
+	mk = rhashtable_lookup(&pag->pagi_merkle_blobs, &key,
+			xfs_fsverity_merkle_hash_params);
+	if (mk && !refcount_inc_not_zero(&mk->refcount))
+		mk = NULL;
+	rcu_read_unlock();
+	xfs_perag_put(pag);
+
+	if (!mk) {
+		trace_xfs_fsverity_cache_miss(mp, &key, _RET_IP_);
+		return NULL;
+	}
+
+	trace_xfs_fsverity_cache_hit(mp, &mk->key, _RET_IP_);
+	return mk;
+}
+
+/*
+ * Try to store a merkle tree block in the cache with the given key.
+ *
+ * If the merkle tree block is not already in the cache, the given block @mk
+ * will be added to the cache and returned.  The caller retains its active
+ * reference to @mk.
+ *
+ * If there was already a merkle block in the cache, it will be returned to
+ * the caller with an active reference.  @mk will be untouched.
+ */
+static struct xfs_merkle_blob *
+xfs_fsverity_cache_store(
+	struct xfs_inode	*ip,
+	struct xfs_merkle_blob	*mk)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_merkle_blob	*old;
+	struct xfs_perag	*pag;
+
+	ASSERT(ip->i_ino == mk->key.ino);
+
+	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
+	if (!pag) {
+		ASSERT(pag);
+		return ERR_PTR(-EFSCORRUPTED);
+	}
+
+	spin_lock(&pag->pagi_merkle_lock);
+	old = rhashtable_lookup_get_insert_fast(&pag->pagi_merkle_blobs,
+			&mk->rhash, xfs_fsverity_merkle_hash_params);
+	if (IS_ERR(old)) {
+		spin_unlock(&pag->pagi_merkle_lock);
+		xfs_perag_put(pag);
+		return old;
+	}
+	if (!old) {
+		/*
+		 * There was no previous value.  @mk is now live in the cache.
+		 * Take an extra reference on behalf of the cache; the caller
+		 * keeps its own reference to @mk.
+		 */
+		refcount_inc(&mk->refcount);
+		spin_unlock(&pag->pagi_merkle_lock);
+		xfs_perag_put(pag);
+
+		trace_xfs_fsverity_cache_store(mp, &mk->key, _RET_IP_);
+		return mk;
+	}
+
+	/*
+	 * We obtained an active reference to a previous value in the cache.
+	 * Return it to the caller.
+	 */
+	refcount_inc(&old->refcount);
+	spin_unlock(&pag->pagi_merkle_lock);
+	xfs_perag_put(pag);
+
+	trace_xfs_fsverity_cache_reuse(mp, &old->key, _RET_IP_);
+	return old;
+}
+
+/* Set up fsverity for this mount. */
+int
+xfs_fsverity_mount(
+	struct xfs_mount	*mp)
+{
+	struct xfs_perag	*pag;
+	xfs_agnumber_t		agno;
+	int			error;
+
+	if (!xfs_has_verity(mp))
+		return 0;
+
+	for_each_perag(mp, agno, pag) {
+		spin_lock_init(&pag->pagi_merkle_lock);
+		error = rhashtable_init(&pag->pagi_merkle_blobs,
+				&xfs_fsverity_merkle_hash_params);
+		if (error) {
+			xfs_perag_rele(pag);
+			goto out_perag;
+		}
+		set_bit(XFS_AGSTATE_MERKLE, &pag->pag_opstate);
+	}
+
+	return 0;
+out_perag:
+	for_each_perag(mp, agno, pag) {
+		if (test_and_clear_bit(XFS_AGSTATE_MERKLE, &pag->pag_opstate))
+			rhashtable_destroy(&pag->pagi_merkle_blobs);
+	}
+
+	return error;
+}
+
+/* Set up new merkle tree caches for new AGs. */
+int
+xfs_fsverity_growfs(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		old_agcount,
+	xfs_agnumber_t		new_agcount)
+{
+	struct xfs_perag	*pag;
+	xfs_agnumber_t		agno;
+	int			error;
+
+	if (!xfs_has_verity(mp))
+		return 0;
+
+	agno = old_agcount;
+	for_each_perag_range(mp, agno, new_agcount - 1, pag) {
+		spin_lock_init(&pag->pagi_merkle_lock);
+		error = rhashtable_init(&pag->pagi_merkle_blobs,
+				&xfs_fsverity_merkle_hash_params);
+		if (error) {
+			xfs_perag_rele(pag);
+			goto out_perag;
+		}
+		set_bit(XFS_AGSTATE_MERKLE, &pag->pag_opstate);
+	}
+
+	return 0;
+out_perag:
+	agno = old_agcount;
+	for_each_perag_range(mp, agno, new_agcount - 1, pag) {
+		if (test_and_clear_bit(XFS_AGSTATE_MERKLE, &pag->pag_opstate))
+			rhashtable_destroy(&pag->pagi_merkle_blobs);
+	}
+
+	return error;
+}
+
+struct xfs_fsverity_umount {
+	struct xfs_mount	*mp;
+	s64			freed;
+};
+
+/* Destroy this blob that's still left over in the cache. */
+static void
+xfs_merkle_blob_destroy(
+	void			*ptr,
+	void			*arg)
+{
+	struct xfs_fsverity_umount *fu = arg;
+	struct xfs_merkle_blob	*mk = ptr;
+
+	trace_xfs_fsverity_cache_unmount(fu->mp, &mk->key, _RET_IP_);
+
+	xfs_merkle_blob_rele(ptr);
+	fu->freed++;
+}
+
+/* Tear down fsverity from this mount. */
+void
+xfs_fsverity_unmount(
+	struct xfs_mount	*mp)
+{
+	struct xfs_fsverity_umount fu = {
+		.mp		= mp,
+		.freed		= 0,
+	};
+	struct xfs_perag	*pag;
+	xfs_agnumber_t		agno;
+
+	if (!xfs_has_verity(mp))
+		return;
+
+	for_each_perag(mp, agno, pag) {
+		if (test_and_clear_bit(XFS_AGSTATE_MERKLE, &pag->pag_opstate))
+			rhashtable_free_and_destroy(&pag->pagi_merkle_blobs,
+					xfs_merkle_blob_destroy, &fu);
+	}
+}
+
+/*
+ * Initialize an args structure to load or store the fsverity descriptor.
+ * Caller must ensure @args is zeroed except for value and valuelen.
+ */
+static inline void
+xfs_fsverity_init_vdesc_args(
+	struct xfs_inode	*ip,
+	struct xfs_da_args	*args)
+{
+	args->geo = ip->i_mount->m_attr_geo;
+	args->whichfork = XFS_ATTR_FORK;
+	args->attr_filter = XFS_ATTR_VERITY;
+	args->op_flags = XFS_DA_OP_OKNOENT;
+	args->dp = ip;
+	args->owner = ip->i_ino;
+	args->name = XFS_VERITY_DESCRIPTOR_NAME;
+	args->namelen = XFS_VERITY_DESCRIPTOR_NAME_LEN;
+	xfs_attr_sethash(args);
+}
+
+/*
+ * Initialize an args structure to load or store a merkle tree block.
+ * Caller must ensure @args is zeroed except for value and valuelen.
+ */
+static inline void
+xfs_fsverity_init_merkle_args(
+	struct xfs_inode	*ip,
+	struct xfs_merkle_key	*key,
+	uint64_t		merkleoff,
+	struct xfs_da_args	*args)
+{
+	xfs_merkle_key_to_disk(key, merkleoff);
+	args->geo = ip->i_mount->m_attr_geo;
+	args->whichfork = XFS_ATTR_FORK;
+	args->attr_filter = XFS_ATTR_VERITY;
+	args->op_flags = XFS_DA_OP_OKNOENT;
+	args->dp = ip;
+	args->owner = ip->i_ino;
+	args->name = (const uint8_t *)key;
+	args->namelen = sizeof(struct xfs_merkle_key);
+	xfs_attr_sethash(args);
+}
+
+/* Delete the verity descriptor. */
+static int
+xfs_fsverity_delete_descriptor(
+	struct xfs_inode	*ip)
+{
+	struct xfs_da_args	args = { };
+
+	xfs_fsverity_init_vdesc_args(ip, &args);
+	return xfs_attr_set(&args, XFS_ATTRUPDATE_REMOVE, false);
+}
+
+/* Delete a merkle tree block. */
+static int
+xfs_fsverity_delete_merkle_block(
+	struct xfs_inode	*ip,
+	u64			pos)
+{
+	struct xfs_merkle_key	name;
+	struct xfs_da_args	args = { };
+
+	xfs_fsverity_init_merkle_args(ip, &name, pos, &args);
+	return xfs_attr_set(&args, XFS_ATTRUPDATE_REMOVE, false);
+}
+
+/* Retrieve the verity descriptor. */
+static int
+xfs_fsverity_get_descriptor(
+	struct inode		*inode,
+	void			*buf,
+	size_t			buf_size)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_da_args	args = {
+		.value		= buf,
+		.valuelen	= buf_size,
+	};
+	int			error = 0;
+
+	/*
+	 * xfs_attr_copy_value() returns -ERANGE if the stored descriptor is
+	 * larger than the provided buffer, so size mismatches are caught
+	 * there.  A missing descriptor is reported as a zero-length read so
+	 * that common fsverity code will complain.
+	 */
+	xfs_fsverity_init_vdesc_args(ip, &args);
+	error = xfs_attr_get(&args);
+	if (error == -ENOATTR)
+		return 0;
+	if (error)
+		return error;
+
+	return args.valuelen;
+}
+
+/*
+ * Clear out old fsverity metadata before we start building a new one.  This
+ * could happen if, say, we crashed while building fsverity data.
+ */
+static int
+xfs_fsverity_delete_stale_metadata(
+	struct xfs_inode	*ip,
+	u64			new_tree_size,
+	unsigned int		tree_blocksize)
+{
+	u64			pos;
+	int			error = 0;
+
+	/*
+	 * Delete merkle tree blocks in increasing pos order, starting at the
+	 * end of the new tree, until we don't find any more.  That ought to
+	 * be good enough for avoiding dead bloat without excessive runtime.
+	 */
+	for (pos = new_tree_size; !error; pos += tree_blocksize) {
+		if (fatal_signal_pending(current))
+			return -EINTR;
+		error = xfs_fsverity_delete_merkle_block(ip, pos);
+		if (error)
+			break;
+	}
+
+	return error != -ENOATTR ? error : 0;
+}
+
+/* Prepare to enable fsverity by clearing old metadata. */
+static int
+xfs_fsverity_begin_enable(
+	struct file		*filp,
+	u64			merkle_tree_size,
+	unsigned int		tree_blocksize)
+{
+	struct inode		*inode = file_inode(filp);
+	struct xfs_inode	*ip = XFS_I(inode);
+	int			error;
+
+	xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL);
+
+	if (IS_DAX(inode))
+		return -EINVAL;
+
+	if (xfs_iflags_test_and_set(ip, XFS_VERITY_CONSTRUCTION))
+		return -EBUSY;
+
+	error = xfs_qm_dqattach(ip);
+	if (error)
+		return error;
+
+	return xfs_fsverity_delete_stale_metadata(ip, merkle_tree_size,
+			tree_blocksize);
+}
+
+/* Try to remove all the fsverity metadata after a failed enablement. */
+static int
+xfs_fsverity_delete_metadata(
+	struct xfs_inode	*ip,
+	u64			merkle_tree_size,
+	unsigned int		tree_blocksize)
+{
+	u64			pos;
+	int			error;
+
+	if (!merkle_tree_size)
+		return 0;
+
+	for (pos = 0; pos < merkle_tree_size; pos += tree_blocksize) {
+		if (fatal_signal_pending(current))
+			return -EINTR;
+		error = xfs_fsverity_delete_merkle_block(ip, pos);
+		if (error == -ENOATTR)
+			error = 0;
+		if (error)
+			return error;
+	}
+
+	error = xfs_fsverity_delete_descriptor(ip);
+	return error != -ENOATTR ? error : 0;
+}
+
+/* Complete (or fail) the process of enabling fsverity. */
+static int
+xfs_fsverity_end_enable(
+	struct file		*filp,
+	const void		*desc,
+	size_t			desc_size,
+	u64			merkle_tree_size,
+	unsigned int		tree_blocksize)
+{
+	struct xfs_da_args	args = {
+		.value		= (void *)desc,
+		.valuelen	= desc_size,
+	};
+	struct inode		*inode = file_inode(filp);
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_trans	*tp;
+	int			error = 0;
+
+	xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL);
+
+	/* fs-verity failed, just cleanup */
+	if (desc == NULL)
+		goto out;
+
+	xfs_fsverity_init_vdesc_args(ip, &args);
+	error = xfs_attr_set(&args, XFS_ATTRUPDATE_UPSERT, false);
+	if (error)
+		goto out;
+
+	/* Set fsverity inode flag */
+	error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_ichange,
+			0, 0, false, &tp);
+	if (error)
+		goto out;
+
+	/*
+	 * Ensure that we've persisted the verity information before we enable
+	 * it on the inode and tell the caller we have sealed the inode.
+	 */
+	ip->i_diflags2 |= XFS_DIFLAG2_VERITY;
+
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	xfs_trans_set_sync(tp);
+
+	error = xfs_trans_commit(tp);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	if (!error)
+		inode->i_flags |= S_VERITY;
+
+out:
+	if (error) {
+		int	error2;
+
+		error2 = xfs_fsverity_delete_metadata(ip,
+				merkle_tree_size, tree_blocksize);
+		if (error2)
+			xfs_alert(ip->i_mount,
+ "ino 0x%llx failed to clean up new fsverity metadata, err %d",
+					ip->i_ino, error2);
+	}
+
+	xfs_iflags_clear(ip, XFS_VERITY_CONSTRUCTION);
+	return error;
+}
+
+/* Retrieve a merkle tree block. */
+static int
+xfs_fsverity_read_merkle(
+	const struct fsverity_readmerkle *req,
+	struct fsverity_blockbuf	*block)
+{
+	struct xfs_inode		*ip = XFS_I(req->inode);
+	struct xfs_merkle_key		name;
+	struct xfs_da_args		args = {
+		.valuelen		= block->size,
+	};
+	struct xfs_merkle_blob		*mk, *new_mk;
+	int				error;
+
+	/* Is the block already cached? */
+	mk = xfs_fsverity_cache_load(ip, block->pos);
+	if (mk)
+		goto out_hit;
+
+	new_mk = xfs_merkle_blob_alloc(ip, block->pos, block->size);
+	if (!new_mk)
+		return -ENOMEM;
+	args.value = new_mk->data;
+
+	/* Read the block in from disk and try to store it in the cache. */
+	xfs_fsverity_init_merkle_args(ip, &name, block->pos, &args);
+	error = xfs_attr_get(&args);
+	if (error)
+		goto out_new_mk;
+
+	mk = xfs_fsverity_cache_store(ip, new_mk);
+	if (IS_ERR(mk)) {
+		xfs_merkle_blob_rele(new_mk);
+		return PTR_ERR(mk);
+	}
+	if (mk != new_mk) {
+		/*
+		 * We raced with another thread to populate the cache and lost.
+		 * Free the new cache blob and continue with the existing one.
+		 */
+		xfs_merkle_blob_rele(new_mk);
+	}
+
+out_hit:
+	block->kaddr   = (void *)mk->data;
+	block->context = mk;
+	block->verified = test_bit(XFS_MERKLE_BLOB_VERIFIED_BIT, &mk->flags);
+
+	return 0;
+
+out_new_mk:
+	xfs_merkle_blob_rele(new_mk);
+	return error;
+}
+
+/* Write a merkle tree block. */
+static int
+xfs_fsverity_write_merkle(
+	const struct fsverity_writemerkle *req,
+	const void			*buf,
+	u64				pos,
+	unsigned int			size)
+{
+	struct inode			*inode = req->inode;
+	struct xfs_inode		*ip = XFS_I(inode);
+	struct xfs_merkle_key		name;
+	struct xfs_da_args		args = {
+		.value			= (void *)buf,
+		.valuelen		= size,
+	};
+
+	xfs_fsverity_init_merkle_args(ip, &name, pos, &args);
+	return xfs_attr_set(&args, XFS_ATTRUPDATE_UPSERT, false);
+}
+
+/* Drop a cached merkle tree block. */
+static void
+xfs_fsverity_drop_merkle(
+	struct fsverity_blockbuf	*block)
+{
+	struct xfs_merkle_blob		*mk = block->context;
+
+	if (block->verified)
+		set_bit(XFS_MERKLE_BLOB_VERIFIED_BIT, &mk->flags);
+	xfs_merkle_blob_rele(mk);
+	block->kaddr = NULL;
+	block->context = NULL;
+}
+
+const struct fsverity_operations xfs_fsverity_ops = {
+	.begin_enable_verity		= xfs_fsverity_begin_enable,
+	.end_enable_verity		= xfs_fsverity_end_enable,
+	.get_verity_descriptor		= xfs_fsverity_get_descriptor,
+	.read_merkle_tree_block		= xfs_fsverity_read_merkle,
+	.write_merkle_tree_block	= xfs_fsverity_write_merkle,
+	.drop_merkle_tree_block		= xfs_fsverity_drop_merkle,
+};
diff --git a/fs/xfs/xfs_fsverity.h b/fs/xfs/xfs_fsverity.h
new file mode 100644
index 0000000000000..9156244dce4fe
--- /dev/null
+++ b/fs/xfs/xfs_fsverity.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2022 Red Hat, Inc.
+ */
+#ifndef __XFS_FSVERITY_H__
+#define __XFS_FSVERITY_H__
+
+#ifdef CONFIG_FS_VERITY
+struct xfs_merkle_bkey {
+	/* inumber of the file */
+	xfs_ino_t		ino;
+
+	/* the position of the block in the Merkle tree (in bytes) */
+	u64			pos;
+};
+
+void xfs_fsverity_destroy_inode(struct xfs_inode *ip);
+
+int xfs_fsverity_mount(struct xfs_mount *mp);
+void xfs_fsverity_unmount(struct xfs_mount *mp);
+int xfs_fsverity_growfs(struct xfs_mount *mp, xfs_agnumber_t old_agcount,
+		xfs_agnumber_t new_agcount);
+
+extern const struct fsverity_operations xfs_fsverity_ops;
+#else
+# define xfs_fsverity_destroy_inode(ip)		((void)0)
+# define xfs_fsverity_mount(mp)			(0)
+# define xfs_fsverity_unmount(mp)		((void)0)
+# define xfs_fsverity_growfs(mp, o, n)		(0)
+#endif	/* CONFIG_FS_VERITY */
+
+#endif	/* __XFS_FSVERITY_H__ */
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 503ea082dfac4..a90ed25b14769 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -391,6 +391,8 @@ static inline bool xfs_inode_needs_cow_around(struct xfs_inode *ip)
  */
 #define XFS_IREMAPPING		(1U << 15)
 
+#define XFS_VERITY_CONSTRUCTION	(1U << 16) /* merkle tree construction */
+
 /* All inode state flags related to inode reclaim. */
 #define XFS_ALL_IRECLAIM_FLAGS	(XFS_IRECLAIMABLE | \
 				 XFS_IRECLAIM | \
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index b40c850d97f59..71942e46c7db4 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -38,6 +38,7 @@
 #include "xfs_rtgroup.h"
 #include "xfs_rtrmap_btree.h"
 #include "xfs_rtrefcount_btree.h"
+#include "xfs_fsverity.h"
 #include "scrub/stats.h"
 
 static DEFINE_MUTEX(xfs_uuid_table_mutex);
@@ -881,6 +882,10 @@ xfs_mountfs(
 	if (error)
 		goto out_fail_wait;
 
+	error = xfs_fsverity_mount(mp);
+	if (error)
+		goto out_inodegc_shrinker;
+
 	/*
 	 * Log's mount-time initialization. The first part of recovery can place
 	 * some items on the AIL, to be handled when recovery is finished or
@@ -891,7 +896,7 @@ xfs_mountfs(
 			      XFS_FSB_TO_BB(mp, sbp->sb_logblocks));
 	if (error) {
 		xfs_warn(mp, "log mount failed");
-		goto out_inodegc_shrinker;
+		goto out_fsverity;
 	}
 
 	/*
@@ -1103,6 +1108,8 @@ xfs_mountfs(
 	 */
 	xfs_unmount_flush_inodes(mp);
 	xfs_log_mount_cancel(mp);
+ out_fsverity:
+	xfs_fsverity_unmount(mp);
  out_inodegc_shrinker:
 	shrinker_free(mp->m_inodegc_shrinker);
  out_fail_wait:
@@ -1194,6 +1201,7 @@ xfs_unmountfs(
 #if defined(DEBUG)
 	xfs_errortag_clearall(mp);
 #endif
+	xfs_fsverity_unmount(mp);
 	shrinker_free(mp->m_inodegc_shrinker);
 	xfs_free_rtgroups(mp);
 	xfs_free_perag(mp);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 72842d4f16c92..24d67b710a1e9 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -30,6 +30,7 @@
 #include "xfs_filestream.h"
 #include "xfs_quota.h"
 #include "xfs_sysfs.h"
+#include "xfs_fsverity.h"
 #include "xfs_ondisk.h"
 #include "xfs_rmap_item.h"
 #include "xfs_refcount_item.h"
@@ -53,6 +54,7 @@
 #include <linux/fs_context.h>
 #include <linux/fs_parser.h>
 #include <linux/fsverity.h>
+#include <linux/iomap.h>
 
 static const struct super_operations xfs_super_operations;
 
@@ -668,6 +670,8 @@ xfs_fs_destroy_inode(
 	ASSERT(!rwsem_is_locked(&inode->i_rwsem));
 	XFS_STATS_INC(ip->i_mount, vn_rele);
 	XFS_STATS_INC(ip->i_mount, vn_remove);
+	if (fsverity_active(inode))
+		xfs_fsverity_destroy_inode(ip);
 	fsverity_cleanup_inode(inode);
 	xfs_inode_mark_reclaimable(ip);
 }
@@ -1524,6 +1528,9 @@ xfs_fs_fill_super(
 	sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP | QTYPE_MASK_PRJ;
 #endif
 	sb->s_op = &xfs_super_operations;
+#ifdef CONFIG_FS_VERITY
+	sb->s_vop = &xfs_fsverity_ops;
+#endif
 
 	/*
 	 * Delay mount work if the debug hook is set. This is debug
@@ -1769,10 +1776,25 @@ xfs_fs_fill_super(
 		xfs_warn(mp,
 	"EXPERIMENTAL parent pointer feature enabled. Use at your own risk!");
 
+	if (xfs_has_verity(mp))
+		xfs_warn(mp,
+	"EXPERIMENTAL fsverity feature in use. Use at your own risk!");
+
 	error = xfs_mountfs(mp);
 	if (error)
 		goto out_filestream_unmount;
 
+#ifdef CONFIG_FS_VERITY
+	/*
+	 * Don't use a high priority workqueue like the other fsverity
+	 * implementations because that will lead to conflicts with the xfs log
+	 * workqueue.
+	 */
+	error = iomap_init_fsverity(mp->m_super, 0, 0);
+	if (error)
+		goto out_unmount;
+#endif
+
 	root = igrab(VFS_I(mp->m_rootip));
 	if (!root) {
 		error = -ENOENT;
diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c
index b40f01cb0fe8d..9777360088897 100644
--- a/fs/xfs/xfs_trace.c
+++ b/fs/xfs/xfs_trace.c
@@ -47,6 +47,7 @@
 #include "xfs_rmap.h"
 #include "xfs_refcount.h"
 #include "xfs_fsrefs.h"
+#include "xfs_fsverity.h"
 
 static inline void
 xfs_rmapbt_crack_agno_opdev(
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 7116e7d9627d0..3e44d38fd871a 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -102,6 +102,7 @@ struct xfs_extent_free_item;
 struct xfs_rmap_intent;
 struct xfs_refcount_intent;
 struct xfs_fsrefs;
+struct xfs_merkle_bkey;
 
 #define XFS_ATTR_FILTER_FLAGS \
 	{ XFS_ATTR_ROOT,	"ROOT" }, \
@@ -5922,6 +5923,44 @@ TRACE_EVENT(xfs_growfs_check_rtgeom,
 );
 #endif /* CONFIG_XFS_RT */
 
+#ifdef CONFIG_FS_VERITY
+DECLARE_EVENT_CLASS(xfs_fsverity_cache_class,
+	TP_PROTO(struct xfs_mount *mp, const struct xfs_merkle_bkey *key,
+		 unsigned long caller_ip),
+	TP_ARGS(mp, key, caller_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(u64, pos)
+		__field(void *, caller_ip)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->ino = key->ino;
+		__entry->pos = key->pos;
+		__entry->caller_ip = (void *)caller_ip;
+	),
+	TP_printk("dev %d:%d ino 0x%llx pos 0x%llx caller %pS",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->pos,
+		  __entry->caller_ip)
+)
+
+#define DEFINE_XFS_FSVERITY_CACHE_EVENT(name) \
+DEFINE_EVENT(xfs_fsverity_cache_class, name, \
+	TP_PROTO(struct xfs_mount *mp, const struct xfs_merkle_bkey *key, \
+		 unsigned long caller_ip), \
+	TP_ARGS(mp, key, caller_ip))
+DEFINE_XFS_FSVERITY_CACHE_EVENT(xfs_fsverity_cache_miss);
+DEFINE_XFS_FSVERITY_CACHE_EVENT(xfs_fsverity_cache_hit);
+DEFINE_XFS_FSVERITY_CACHE_EVENT(xfs_fsverity_cache_reuse);
+DEFINE_XFS_FSVERITY_CACHE_EVENT(xfs_fsverity_cache_store);
+DEFINE_XFS_FSVERITY_CACHE_EVENT(xfs_fsverity_cache_drop);
+DEFINE_XFS_FSVERITY_CACHE_EVENT(xfs_fsverity_cache_unmount);
+DEFINE_XFS_FSVERITY_CACHE_EVENT(xfs_fsverity_cache_reclaim);
+#endif /* CONFIG_XFS_VERITY */
+
 #endif /* _TRACE_XFS_H */
 
 #undef TRACE_INCLUDE_PATH


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 15/26] xfs: create a per-mount shrinker for verity inodes merkle tree blocks
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (13 preceding siblings ...)
  2024-04-30  3:27   ` [PATCH 14/26] xfs: add fs-verity support Darrick J. Wong
@ 2024-04-30  3:28   ` Darrick J. Wong
  2024-04-30  3:28   ` [PATCH 16/26] xfs: shrink verity blob cache Darrick J. Wong
                     ` (10 subsequent siblings)
  25 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:28 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

Create a shrinker for an entire filesystem that will walk the inodes
looking for inodes that are caching merkle tree blocks, and invoke
shrink functions on that cache.  The actual details of shrinking merkle
tree caches are left for subsequent patches.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
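The accounting that feeds the shrinker's count callback can be modelled
in userspace.  What follows is a minimal sketch, not the kernel code:
struct model_mount and the model_*() helpers are hypothetical names
made up for illustration, a plain int64_t stands in for the percpu
counter m_verity_blocks, and MODEL_SHRINK_EMPTY simply mirrors the
kernel's SHRINK_EMPTY value.  It shows the counter going up when a blob
goes live in the cache, going down when blobs are dropped, and the
count callback reporting a clamped, non-negative total.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Same value the kernel uses for SHRINK_EMPTY. */
#define MODEL_SHRINK_EMPTY	(~0UL - 1)

/* Hypothetical stand-in for the xfs_mount fields this patch adds. */
struct model_mount {
	int		has_verity;	/* models xfs_has_verity(mp) */
	int64_t		verity_blocks;	/* models mp->m_verity_blocks */
};

/* A new blob went live in the cache: count it. */
static void model_cache_store(struct model_mount *mp)
{
	mp->verity_blocks += 1;
}

/* @freed blobs were removed from the cache: uncount them. */
static void model_cache_drop(struct model_mount *mp, int64_t freed)
{
	mp->verity_blocks -= freed;
}

/* Report how many blobs reclaim could ask the scan callback to free. */
static unsigned long model_shrinker_count(const struct model_mount *mp)
{
	int64_t	count;

	if (!mp->has_verity)
		return MODEL_SHRINK_EMPTY;

	/* Like percpu_counter_sum_positive(): never report a negative sum. */
	count = mp->verity_blocks < 0 ? 0 : mp->verity_blocks;

	/* Clamp to the return type, as min_t(u64, ULONG_MAX, count) does. */
	if ((uint64_t)count > ULONG_MAX)
		return ULONG_MAX;
	return (unsigned long)count;
}
```

In the patch itself the scan callback is still a stub that frees
nothing, so the count only tells reclaim how much is cached; the
following patch supplies the actual freeing.
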
 fs/xfs/xfs_fsverity.c |   58 +++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_mount.h    |    6 +++++
 fs/xfs/xfs_trace.h    |   20 +++++++++++++++++
 3 files changed, 84 insertions(+)


diff --git a/fs/xfs/xfs_fsverity.c b/fs/xfs/xfs_fsverity.c
index e0f54acd4f786..ae3d1bdac2876 100644
--- a/fs/xfs/xfs_fsverity.c
+++ b/fs/xfs/xfs_fsverity.c
@@ -21,6 +21,7 @@
 #include "xfs_quota.h"
 #include "xfs_ag.h"
 #include "xfs_fsverity.h"
+#include "xfs_icache.h"
 #include <linux/fsverity.h>
 
 /*
@@ -182,6 +183,7 @@ xfs_fsverity_drop_cache(
 	}
 
 	xfs_perag_put(pag);
+	percpu_counter_sub(&mp->m_verity_blocks, freed);
 }
 
 /*
@@ -283,6 +285,7 @@ xfs_fsverity_cache_store(
 		refcount_inc(&mk->refcount);
 		spin_unlock(&pag->pagi_merkle_lock);
 		xfs_perag_put(pag);
+		percpu_counter_add(&mp->m_verity_blocks, 1);
 
 		trace_xfs_fsverity_cache_store(mp, &mk->key, _RET_IP_);
 		return mk;
@@ -300,6 +303,38 @@ xfs_fsverity_cache_store(
 	return old;
 }
 
+/* Count the merkle tree blocks that we might be able to reclaim. */
+static unsigned long
+xfs_fsverity_shrinker_count(
+	struct shrinker		*shrink,
+	struct shrink_control	*sc)
+{
+	struct xfs_mount	*mp = shrink->private_data;
+	s64			count;
+
+	if (!xfs_has_verity(mp))
+		return SHRINK_EMPTY;
+
+	count = percpu_counter_sum_positive(&mp->m_verity_blocks);
+
+	trace_xfs_fsverity_shrinker_count(mp, count, _RET_IP_);
+	return min_t(u64, ULONG_MAX, count);
+}
+
+/* Actually try to reclaim merkle tree blocks. */
+static unsigned long
+xfs_fsverity_shrinker_scan(
+	struct shrinker		*shrink,
+	struct shrink_control	*sc)
+{
+	struct xfs_mount	*mp = shrink->private_data;
+
+	if (!xfs_has_verity(mp))
+		return SHRINK_STOP;
+
+	return 0;
+}
+
 /* Set up fsverity for this mount. */
 int
 xfs_fsverity_mount(
@@ -312,6 +347,10 @@ xfs_fsverity_mount(
 	if (!xfs_has_verity(mp))
 		return 0;
 
+	error = percpu_counter_init(&mp->m_verity_blocks, 0, GFP_KERNEL);
+	if (error)
+		return error;
+
 	for_each_perag(mp, agno, pag) {
 		spin_lock_init(&pag->pagi_merkle_lock);
 		error = rhashtable_init(&pag->pagi_merkle_blobs,
@@ -323,6 +362,20 @@ xfs_fsverity_mount(
 		set_bit(XFS_AGSTATE_MERKLE, &pag->pag_opstate);
 	}
 
+	mp->m_verity_shrinker = shrinker_alloc(0, "xfs-verity:%s",
+			mp->m_super->s_id);
+	if (!mp->m_verity_shrinker) {
+		error = -ENOMEM;
+		goto out_perag;
+	}
+
+	mp->m_verity_shrinker->count_objects = xfs_fsverity_shrinker_count;
+	mp->m_verity_shrinker->scan_objects = xfs_fsverity_shrinker_scan;
+	mp->m_verity_shrinker->seeks = 0;
+	mp->m_verity_shrinker->private_data = mp;
+
+	shrinker_register(mp->m_verity_shrinker);
+
 	return 0;
 out_perag:
 	for_each_perag(mp, agno, pag) {
@@ -405,11 +458,16 @@ xfs_fsverity_unmount(
 	if (!xfs_has_verity(mp))
 		return;
 
+	shrinker_free(mp->m_verity_shrinker);
+
 	for_each_perag(mp, agno, pag) {
 		if (test_and_clear_bit(XFS_AGSTATE_MERKLE, &pag->pag_opstate))
 			rhashtable_free_and_destroy(&pag->pagi_merkle_blobs,
 					xfs_merkle_blob_destroy, &fu);
 	}
+
+	ASSERT(percpu_counter_sum(&mp->m_verity_blocks) == fu.freed);
+	percpu_counter_destroy(&mp->m_verity_blocks);
 }
 
 /*
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 78284e91244a8..dd6d33deed030 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -271,6 +271,12 @@ typedef struct xfs_mount {
 
 	/* Hook to feed dirent updates to an active online repair. */
 	struct xfs_hooks	m_dir_update_hooks;
+
+#ifdef CONFIG_FS_VERITY
+	/* shrinker and cached blocks count for merkle trees */
+	struct shrinker		*m_verity_shrinker;
+	struct percpu_counter	m_verity_blocks;
+#endif
 } xfs_mount_t;
 
 #define M_IGEO(mp)		(&(mp)->m_ino_geo)
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 3e44d38fd871a..3810e20b9ee9b 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -5959,6 +5959,26 @@ DEFINE_XFS_FSVERITY_CACHE_EVENT(xfs_fsverity_cache_store);
 DEFINE_XFS_FSVERITY_CACHE_EVENT(xfs_fsverity_cache_drop);
 DEFINE_XFS_FSVERITY_CACHE_EVENT(xfs_fsverity_cache_unmount);
 DEFINE_XFS_FSVERITY_CACHE_EVENT(xfs_fsverity_cache_reclaim);
+
+TRACE_EVENT(xfs_fsverity_shrinker_count,
+	TP_PROTO(struct xfs_mount *mp, unsigned long long count,
+		 unsigned long caller_ip),
+	TP_ARGS(mp, count, caller_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned long long, count)
+		__field(void *, caller_ip)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->count = count;
+		__entry->caller_ip = (void *)caller_ip;
+	),
+	TP_printk("dev %d:%d count %llu caller %pS",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->count,
+		  __entry->caller_ip)
+)
 #endif /* CONFIG_XFS_VERITY */
 
 #endif /* _TRACE_XFS_H */


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 16/26] xfs: shrink verity blob cache
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (14 preceding siblings ...)
  2024-04-30  3:28   ` [PATCH 15/26] xfs: create a per-mount shrinker for verity inodes merkle tree blocks Darrick J. Wong
@ 2024-04-30  3:28   ` Darrick J. Wong
  2024-04-30  3:28   ` [PATCH 17/26] xfs: don't store trailing zeroes of merkle tree blocks Darrick J. Wong
                     ` (9 subsequent siblings)
  25 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:28 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

Add some shrinkers so that reclaim can free cached merkle tree blocks
when memory is tight.  We add a shrinkref variable to bias reclaim
against freeing the upper levels of the merkle tree in the hope of
maintaining read performance.
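
The second-chance behaviour described above can be sketched in
userspace C.  This is only an illustration under stated assumptions,
not the kernel code: add_unless() is a hypothetical stand-in for the
kernel's atomic_add_unless(), struct blob and the blob_*() helpers are
invented names, and higher level numbers are assumed to be closer to
the root of the tree.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's atomic_add_unless(). */
static bool add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load(v);

	while (cur != u) {
		/* On failure, cur is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}

/* Hypothetical cached item; only the shrinker bias is modelled here. */
struct blob {
	atomic_int	shrinkref;
};

/* A cache hit at tree level @level grants level + 1 second chances. */
static void blob_touch(struct blob *b, int level)
{
	assert(level >= 0);
	atomic_store(&b->shrinkref, level + 1);
}

/* Returns true if this shrinker pass may reclaim the blob. */
static bool blob_scan(struct blob *b)
{
	/* Spend one second chance if any remain; otherwise reclaim. */
	return !add_unless(&b->shrinkref, -1, 0);
}
```

In this model a blob touched at level 2 survives three scans and is
reclaimed on the fourth, whereas a blob that was never touched is
reclaimed on the first scan.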

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_fsverity.c |   89 ++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_trace.h    |   23 +++++++++++++
 2 files changed, 111 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/xfs_fsverity.c b/fs/xfs/xfs_fsverity.c
index ae3d1bdac2876..546c7ec6daadc 100644
--- a/fs/xfs/xfs_fsverity.c
+++ b/fs/xfs/xfs_fsverity.c
@@ -50,6 +50,9 @@ struct xfs_merkle_blob {
 	/* refcount of this item; the cache holds its own ref */
 	refcount_t		refcount;
 
+	/* number of times the shrinker should ignore this item */
+	atomic_t		shrinkref;
+
 	unsigned long		flags;
 
 	/* Pointer to the merkle tree block, which is power-of-2 sized */
@@ -89,6 +92,7 @@ xfs_merkle_blob_alloc(
 
 	/* Caller owns this refcount. */
 	refcount_set(&mk->refcount, 1);
+	atomic_set(&mk->shrinkref, 0);
 	mk->flags = 0;
 	mk->key.ino = ip->i_ino;
 	mk->key.pos = pos;
@@ -321,18 +325,94 @@ xfs_fsverity_shrinker_count(
 	return min_t(u64, ULONG_MAX, count);
 }
 
+struct xfs_fsverity_scan {
+	struct shrink_control	*sc;
+
+	unsigned long		scanned;
+	unsigned long		freed;
+};
+
+/* Reclaim inactive merkle tree blocks that have run out of second chances. */
+static void
+xfs_fsverity_perag_reclaim(
+	struct xfs_perag		*pag,
+	struct xfs_fsverity_scan	*vs)
+{
+	struct rhashtable_iter		iter;
+	struct xfs_mount		*mp = pag->pag_mount;
+	struct xfs_merkle_blob		*mk;
+	s64				freed = 0;
+
+	rhashtable_walk_enter(&pag->pagi_merkle_blobs, &iter);
+	rhashtable_walk_start(&iter);
+	while ((mk = rhashtable_walk_next(&iter)) != NULL) {
+		if (IS_ERR(mk))
+			continue;
+
+		/*
+		 * Tell the shrinker that we scanned this merkle tree block,
+		 * even if we don't remove it.
+		 */
+		vs->scanned++;
+		if (vs->sc->nr_to_scan-- == 0)
+			break;
+
+		/* Retain if there are active references */
+		if (refcount_read(&mk->refcount) > 1)
+			continue;
+
+		/* Ignore if the item still has lru refcount */
+		if (atomic_add_unless(&mk->shrinkref, -1, 0))
+			continue;
+
+		/*
+		 * Grab our own active reference to the blob handle.  If we
+		 * can't, then we're racing with a cache drop and can move on.
+		 */
+		if (!refcount_inc_not_zero(&mk->refcount))
+			continue;
+
+		rhashtable_walk_stop(&iter);
+
+		trace_xfs_fsverity_cache_reclaim(mp, &mk->key, _RET_IP_);
+
+		xfs_merkle_blob_drop(pag, mk);
+		freed++;
+
+		rhashtable_walk_start(&iter);
+	}
+	rhashtable_walk_stop(&iter);
+	rhashtable_walk_exit(&iter);
+
+	percpu_counter_sub(&mp->m_verity_blocks, freed);
+	vs->freed += freed;
+}
+
 /* Actually try to reclaim merkle tree blocks. */
 static unsigned long
 xfs_fsverity_shrinker_scan(
 	struct shrinker		*shrink,
 	struct shrink_control	*sc)
 {
+	struct xfs_fsverity_scan vs = { .sc = sc };
 	struct xfs_mount	*mp = shrink->private_data;
+	struct xfs_perag	*pag;
+	xfs_agnumber_t		agno;
 
 	if (!xfs_has_verity(mp))
 		return SHRINK_STOP;
 
-	return 0;
+	for_each_perag(mp, agno, pag) {
+		xfs_fsverity_perag_reclaim(pag, &vs);
+
+		if (sc->nr_to_scan == 0) {
+			xfs_perag_rele(pag);
+			break;
+		}
+	}
+
+	trace_xfs_fsverity_shrinker_scan(mp, vs.scanned, vs.freed, _RET_IP_);
+	return vs.freed;
 }
 
 /* Set up fsverity for this mount. */
@@ -765,6 +845,13 @@ xfs_fsverity_read_merkle(
 	block->context = mk;
 	block->verified = test_bit(XFS_MERKLE_BLOB_VERIFIED_BIT, &mk->flags);
 
+	/*
+	 * Prioritize keeping the root-adjacent levels cached if this isn't a
+	 * streaming read.
+	 */
+	if (req->level != FSVERITY_STREAMING_READ)
+		atomic_set(&mk->shrinkref, req->level + 1);
+
 	return 0;
 
 out_new_mk:
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 3810e20b9ee9b..21e8643e021eb 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -5979,6 +5979,29 @@ TRACE_EVENT(xfs_fsverity_shrinker_count,
 		  __entry->count,
 		  __entry->caller_ip)
 )
+
+TRACE_EVENT(xfs_fsverity_shrinker_scan,
+	TP_PROTO(struct xfs_mount *mp, unsigned long scanned,
+		 unsigned long freed, unsigned long caller_ip),
+	TP_ARGS(mp, scanned, freed, caller_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned long, scanned)
+		__field(unsigned long, freed)
+		__field(void *, caller_ip)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->scanned = scanned;
+		__entry->freed = freed;
+		__entry->caller_ip = (void *)caller_ip;
+	),
+	TP_printk("dev %d:%d scanned %lu freed %lu caller %pS",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->scanned,
+		  __entry->freed,
+		  __entry->caller_ip)
+)
 #endif /* CONFIG_XFS_VERITY */
 
 #endif /* _TRACE_XFS_H */


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 17/26] xfs: don't store trailing zeroes of merkle tree blocks
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (15 preceding siblings ...)
  2024-04-30  3:28   ` [PATCH 16/26] xfs: shrink verity blob cache Darrick J. Wong
@ 2024-04-30  3:28   ` Darrick J. Wong
  2024-04-30  3:28   ` [PATCH 18/26] xfs: use merkle tree offset as attr hash Darrick J. Wong
                     ` (8 subsequent siblings)
  25 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:28 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

As a minor optimization, don't store the trailing zeroes of merkle tree
blocks.  This reduces space consumption and copying overhead, though it
really only affects the rightmost blocks at each level of the tree.
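
The trimming rule can be sketched as standalone C.  The names
trimmed_len() and restore_block() are hypothetical, and the read-side
zero padding is an assumption about how a reader would undo the trim;
it is not shown in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return how many leading bytes of @buf must be stored so that
 * everything after them is zero.  At least one byte is always kept.
 */
static size_t trimmed_len(const unsigned char *buf, size_t size)
{
	assert(buf != NULL && size > 0);

	while (size > 1 && buf[size - 1] == 0)
		size--;
	return size;
}

/* Assumed read side: copy the stored bytes and zero-fill the rest. */
static void restore_block(unsigned char *dst, size_t blocksize,
			  const unsigned char *stored, size_t len)
{
	assert(len <= blocksize);

	memcpy(dst, stored, len);
	memset(dst + len, 0, blocksize - len);
}
```

A block that is entirely zero is stored as a single zero byte, so the
attribute still exists and a lookup does not fail.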

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/xfs_fsverity.c |   10 ++++++++++
 1 file changed, 10 insertions(+)


diff --git a/fs/xfs/xfs_fsverity.c b/fs/xfs/xfs_fsverity.c
index 546c7ec6daadc..f6c650e81cb26 100644
--- a/fs/xfs/xfs_fsverity.c
+++ b/fs/xfs/xfs_fsverity.c
@@ -874,6 +874,16 @@ xfs_fsverity_write_merkle(
 		.value			= (void *)buf,
 		.valuelen		= size,
 	};
+	const char			*p;
+
+	/*
+	 * Don't store trailing zeroes, except for the first byte, which we
+	 * need to avoid ENODATA errors in the merkle read path.
+	 */
+	p = buf + size - 1;
+	while (p >= (const char *)buf && *p == 0)
+		p--;
+	args.valuelen = max(1, p - (const char *)buf + 1);
 
 	xfs_fsverity_init_merkle_args(ip, &name, pos, &args);
 	return xfs_attr_set(&args, XFS_ATTRUPDATE_UPSERT, false);


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 18/26] xfs: use merkle tree offset as attr hash
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (16 preceding siblings ...)
  2024-04-30  3:28   ` [PATCH 17/26] xfs: don't store trailing zeroes of merkle tree blocks Darrick J. Wong
@ 2024-04-30  3:28   ` Darrick J. Wong
  2024-05-01  6:53     ` Christoph Hellwig
  2024-04-30  3:29   ` [PATCH 19/26] xfs: don't bother storing merkle tree blocks for zeroed data blocks Darrick J. Wong
                     ` (7 subsequent siblings)
  25 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:28 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

I was exploring the fsverity metadata with xfs_db after creating a 220MB
verity file, and I noticed the following in the debugger output:

entries[0-75] = [hashval,nameidx,incomplete,root,secure,local,parent,verity]
0:[0,4076,0,0,0,0,0,1]
1:[0,1472,0,0,0,1,0,1]
2:[0x800,4056,0,0,0,0,0,1]
3:[0x800,4036,0,0,0,0,0,1]
...
72:[0x12000,2716,0,0,0,0,0,1]
73:[0x12000,2696,0,0,0,0,0,1]
74:[0x12800,2676,0,0,0,0,0,1]
75:[0x12800,2656,0,0,0,0,0,1]
...
nvlist[0].merkle_off = 0x18000
nvlist[1].merkle_off = 0
nvlist[2].merkle_off = 0x19000
nvlist[3].merkle_off = 0x1000
...
nvlist[71].merkle_off = 0x5b000
nvlist[72].merkle_off = 0x44000
nvlist[73].merkle_off = 0x5c000
nvlist[74].merkle_off = 0x45000
nvlist[75].merkle_off = 0x5d000

Within just this attr leaf block, there are 76 attr entries, but only 38
distinct hash values.  There are 415 merkle tree blocks for this file,
but we already have hash collisions.  This isn't good performance from
the standard da hash function because we're mostly shifting and rolling
zeroes around.

However, we don't even have to do that much work -- the merkle tree
block keys are themselves u64 values.  Truncate that value to 32 bits
(the size of xfs_dahash_t) and use that for the hash.  We won't have any
collisions between merkle tree blocks until that tree grows to 2^32
blocks.  On a 4k block filesystem, we won't hit that unless the file
contains more than 2^49 bytes, assuming sha256.

As a side effect, the keys for merkle tree blocks get written out in
roughly sequential order, though I didn't observe any change in
performance.
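
For illustration, here is a userspace sketch of the hash as the diff
below computes it: the u64 offset shifted right by 10 and truncated to
32 bits.  The names verity_hash() and VERITY_HASH_SHIFT are
hypothetical stand-ins, and the alignment assert is an addition of this
sketch, not something the patch checks.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors XFS_VERITY_HASH_SHIFT: merkle blocks are at least 1k. */
#define VERITY_HASH_SHIFT	10

/* Hash a merkle tree block by its byte offset within the tree. */
static uint32_t verity_hash(uint64_t merkle_off)
{
	/* Offsets are assumed to be multiples of the 1k minimum. */
	assert((merkle_off & ((1ULL << VERITY_HASH_SHIFT) - 1)) == 0);

	/* The cast drops everything above the low 32 bits. */
	return (uint32_t)(merkle_off >> VERITY_HASH_SHIFT);
}
```

With 4k merkle blocks the hash wraps once the offset reaches 2^42,
which is 2^30 blocks.  Each block holds 128 sha256 digests and so
covers roughly 2^19 bytes of file data, which gives the 2^49 byte
figure quoted above.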

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/libxfs/xfs_attr.c      |    2 ++
 fs/xfs/libxfs/xfs_da_format.h |    6 ++++++
 fs/xfs/libxfs/xfs_verity.c    |   16 ++++++++++++++++
 fs/xfs/libxfs/xfs_verity.h    |    1 +
 4 files changed, 25 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 953a82d70223e..d21a743f90ea7 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -462,6 +462,8 @@ xfs_attr_hashval(
 
 	if (attr_flags & XFS_ATTR_PARENT)
 		return xfs_parent_hashattr(mp, name, namelen, value, valuelen);
+	if (attr_flags & XFS_ATTR_VERITY)
+		return xfs_verity_hashname(name, namelen);
 
 	return xfs_attr_hashname(name, namelen);
 }
diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 43e9d1f00a4ab..c95e8ca22daad 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -943,4 +943,10 @@ struct xfs_merkle_key {
 #define XFS_VERITY_DESCRIPTOR_NAME	"vdesc"
 #define XFS_VERITY_DESCRIPTOR_NAME_LEN	(sizeof(XFS_VERITY_DESCRIPTOR_NAME) - 1)
 
+/*
+ * Merkle tree blocks cannot be smaller than 1k in size, so the hash function
+ * can right-shift the merkle offset by this amount without losing anything.
+ */
+#define XFS_VERITY_HASH_SHIFT		(10)
+
 #endif /* __XFS_DA_FORMAT_H__ */
diff --git a/fs/xfs/libxfs/xfs_verity.c b/fs/xfs/libxfs/xfs_verity.c
index ff02c5c840b58..8c470014b915c 100644
--- a/fs/xfs/libxfs/xfs_verity.c
+++ b/fs/xfs/libxfs/xfs_verity.c
@@ -56,3 +56,19 @@ xfs_verity_namecheck(
 
 	return true;
 }
+
+/*
+ * Compute name hash for a verity attribute.  For merkle tree blocks, we want
+ * to use the merkle tree block offset as the hash value to avoid collisions
+ * between blocks unless the merkle tree becomes larger than 2^32 blocks.
+ */
+xfs_dahash_t
+xfs_verity_hashname(
+	const uint8_t		*name,
+	unsigned int		namelen)
+{
+	if (namelen != sizeof(struct xfs_merkle_key))
+		return xfs_attr_hashname(name, namelen);
+
+	return xfs_merkle_key_from_disk(name, namelen) >> XFS_VERITY_HASH_SHIFT;
+}
diff --git a/fs/xfs/libxfs/xfs_verity.h b/fs/xfs/libxfs/xfs_verity.h
index 5813665c5a01e..3d7485c511d58 100644
--- a/fs/xfs/libxfs/xfs_verity.h
+++ b/fs/xfs/libxfs/xfs_verity.h
@@ -9,5 +9,6 @@ void xfs_merkle_key_to_disk(struct xfs_merkle_key *key, uint64_t pos);
 uint64_t xfs_merkle_key_from_disk(const void *attr_name, int namelen);
 bool xfs_verity_namecheck(unsigned int attr_flags, const void *name,
 		int namelen);
+xfs_dahash_t xfs_verity_hashname(const uint8_t *name, unsigned int namelen);
 
 #endif	/* __XFS_VERITY_H__ */


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 19/26] xfs: don't bother storing merkle tree blocks for zeroed data blocks
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (17 preceding siblings ...)
  2024-04-30  3:28   ` [PATCH 18/26] xfs: use merkle tree offset as attr hash Darrick J. Wong
@ 2024-04-30  3:29   ` Darrick J. Wong
  2024-05-01  6:47     ` Christoph Hellwig
  2024-04-30  3:29   ` [PATCH 20/26] xfs: add fs-verity ioctls Darrick J. Wong
                     ` (6 subsequent siblings)
  25 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:29 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

Now that fsverity tells our merkle tree io functions about what a hash
of a data block full of zeroes looks like, we can use this information
to avoid writing out merkle tree blocks for sparse regions of the file.
For verified gold master images this can save quite a bit of overhead.
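
The elision test and the read-side synthesis can be sketched as
follows.  This is a simplified userspace model: the function names are
hypothetical, and the digest is passed in explicitly rather than taken
from a request structure as the diff below does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if @buf consists solely of copies of @zero_digest. */
static bool block_is_all_zero_digests(const unsigned char *buf, size_t size,
				      const unsigned char *zero_digest,
				      size_t digest_size)
{
	size_t i;

	assert(digest_size > 0 && size % digest_size == 0);

	for (i = 0; i < size; i += digest_size)
		if (memcmp(buf + i, zero_digest, digest_size) != 0)
			return false;
	return true;
}

/* Rebuild an elided block by repeating @zero_digest across @buf. */
static void synthesize_zero_block(unsigned char *buf, size_t size,
				  const unsigned char *zero_digest,
				  size_t digest_size)
{
	size_t i;

	assert(digest_size > 0 && size % digest_size == 0);

	for (i = 0; i < size; i += digest_size)
		memcpy(buf + i, zero_digest, digest_size);
}
```

A writer would skip any block for which the first function returns
true, and a reader that finds no stored block would call the second to
fill its buffer.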

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/xfs_fsverity.c |   29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/xfs_fsverity.c b/fs/xfs/xfs_fsverity.c
index f6c650e81cb26..e2de99272b7da 100644
--- a/fs/xfs/xfs_fsverity.c
+++ b/fs/xfs/xfs_fsverity.c
@@ -824,6 +824,20 @@ xfs_fsverity_read_merkle(
 	/* Read the block in from disk and try to store it in the cache. */
 	xfs_fsverity_init_merkle_args(ip, &name, block->pos, &args);
 	error = xfs_attr_get(&args);
+	if (error == -ENOATTR) {
+		u8		*p;
+		unsigned int	i;
+
+		/*
+		 * No attribute found.  Synthesize a buffer full of the zero
+		 * digests on the assumption that we elided them at write time.
+		 */
+		for (i = 0, p = new_mk->data;
+		     i < block->size;
+		     i += req->digest_size, p += req->digest_size)
+			memcpy(p, req->zero_digest, req->digest_size);
+		error = 0;
+	}
 	if (error)
 		goto out_new_mk;
 
@@ -875,10 +889,23 @@ xfs_fsverity_write_merkle(
 		.valuelen		= size,
 	};
 	const char			*p;
+	unsigned int			i;
+
+	/*
+	 * If this is a block full of hashes of zeroed blocks, don't bother
+	 * storing the block.  We can synthesize them later.
+	 */
+	for (i = 0, p = buf;
+	     i < size;
+	     i += req->digest_size, p += req->digest_size)
+		if (memcmp(p, req->zero_digest, req->digest_size))
+			break;
+	if (i == size)
+		return 0;
 
 	/*
 	 * Don't store trailing zeroes, except for the first byte, which we
-	 * need to avoid ENODATA errors in the merkle read path.
+	 * need to avoid confusion with elided blocks.
 	 */
 	p = buf + size - 1;
 	while (p >= (const char *)buf && *p == 0)


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 20/26] xfs: add fs-verity ioctls
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (18 preceding siblings ...)
  2024-04-30  3:29   ` [PATCH 19/26] xfs: don't bother storing merkle tree blocks for zeroed data blocks Darrick J. Wong
@ 2024-04-30  3:29   ` Darrick J. Wong
  2024-04-30  3:29   ` [PATCH 21/26] xfs: advertise fs-verity being available on filesystem Darrick J. Wong
                     ` (5 subsequent siblings)
  25 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:29 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Andrey Albershteyn <aalbersh@redhat.com>

Add fs-verity ioctls to enable verity, dump metadata (descriptor and
Merkle tree pages), and obtain a file's digest.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: remove unnecessary casting]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_ioctl.c |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)


diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 6eed1e52d3fde..b05930462f461 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -47,6 +47,7 @@
 #include <linux/fileattr.h>
 #include <linux/security.h>
 #include <linux/fsnotify.h>
+#include <linux/fsverity.h>
 
 /* Return 0 on success or positive error */
 int
@@ -1574,6 +1575,21 @@ xfs_file_ioctl(
 	case XFS_IOC_MAP_FREESP:
 		return xfs_ioc_map_freesp(filp, arg);
 
+	case FS_IOC_ENABLE_VERITY:
+		if (!xfs_has_verity(mp))
+			return -EOPNOTSUPP;
+		return fsverity_ioctl_enable(filp, arg);
+
+	case FS_IOC_MEASURE_VERITY:
+		if (!xfs_has_verity(mp))
+			return -EOPNOTSUPP;
+		return fsverity_ioctl_measure(filp, arg);
+
+	case FS_IOC_READ_VERITY_METADATA:
+		if (!xfs_has_verity(mp))
+			return -EOPNOTSUPP;
+		return fsverity_ioctl_read_metadata(filp, arg);
+
 	default:
 		return -ENOTTY;
 	}


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 21/26] xfs: advertise fs-verity being available on filesystem
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (19 preceding siblings ...)
  2024-04-30  3:29   ` [PATCH 20/26] xfs: add fs-verity ioctls Darrick J. Wong
@ 2024-04-30  3:29   ` Darrick J. Wong
  2024-04-30  3:29   ` [PATCH 22/26] xfs: check and repair the verity inode flag state Darrick J. Wong
                     ` (4 subsequent siblings)
  25 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:29 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

Advertise that this filesystem supports fsverity.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/libxfs/xfs_fs.h |    1 +
 fs/xfs/libxfs/xfs_sb.c |    2 ++
 2 files changed, 3 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index f9a6a678f1b45..edc019d89702d 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -246,6 +246,7 @@ typedef struct xfs_fsop_resblks {
 #define XFS_FSOP_GEOM_FLAGS_EXCHANGE_RANGE (1 << 24) /* exchange range */
 #define XFS_FSOP_GEOM_FLAGS_PARENT	(1 << 25) /* linux parent pointers */
 
+#define XFS_FSOP_GEOM_FLAGS_VERITY	(1U << 29) /* fs-verity */
 #define XFS_FSOP_GEOM_FLAGS_METADIR	(1U << 30) /* metadata directories */
 
 /*
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 0bf5b4007afd8..29fcbe24f33fd 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -1437,6 +1437,8 @@ xfs_fs_geometry(
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_EXCHANGE_RANGE;
 	if (xfs_has_metadir(mp))
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_METADIR;
+	if (xfs_has_verity(mp))
+		geo->flags |= XFS_FSOP_GEOM_FLAGS_VERITY;
 	geo->rtsectsize = sbp->sb_blocksize;
 	geo->dirblocksize = xfs_dir2_dirblock_bytes(sbp);
 


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 22/26] xfs: check and repair the verity inode flag state
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (20 preceding siblings ...)
  2024-04-30  3:29   ` [PATCH 21/26] xfs: advertise fs-verity being available on filesystem Darrick J. Wong
@ 2024-04-30  3:29   ` Darrick J. Wong
  2024-04-30  3:30   ` [PATCH 23/26] xfs: teach online repair to evaluate fsverity xattrs Darrick J. Wong
                     ` (3 subsequent siblings)
  25 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:29 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

If an inode has the incore verity iflag set, make sure that we can
actually activate fsverity on that inode.  If activation fails due to
a fsverity metadata validation error, clear the flag.  The usage model
for fsverity requires any program that cares about verity state to call
statx/getflags after opening the file to check that the flag is set, so
clearing the flag will not compromise that model.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/attr.c         |    7 ++++
 fs/xfs/scrub/common.c       |   68 +++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/common.h       |    3 ++
 fs/xfs/scrub/inode.c        |    7 ++++
 fs/xfs/scrub/inode_repair.c |   36 +++++++++++++++++++++++
 5 files changed, 121 insertions(+)


diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index 708334f9b2bd1..b1448832ae6ba 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -646,6 +646,13 @@ xchk_xattr(
 	if (!xfs_inode_hasattr(sc->ip))
 		return -ENOENT;
 
+	/*
+	 * If this is a verity file that won't activate, we cannot check the
+	 * merkle tree geometry.
+	 */
+	if (xchk_inode_verity_broken(sc->ip))
+		xchk_set_incomplete(sc);
+
 	/* Allocate memory for xattr checking. */
 	error = xchk_setup_xattr_buf(sc, 0);
 	if (error == -ENOMEM)
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index ee7355f4450a6..106e079aac71d 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -45,6 +45,8 @@
 #include "scrub/health.h"
 #include "scrub/tempfile.h"
 
+#include <linux/fsverity.h>
+
 /* Common code for the metadata scrubbers. */
 
 /*
@@ -1871,6 +1873,72 @@ xchk_inode_count_blocks(
 	return 0;
 }
 
+/*
+ * If this inode has S_VERITY set on it, read the merkle tree geometry, which
+ * will activate the incore fsverity context for this file.  If the activation
+ * fails with anything other than ENOMEM, the file is corrupt, which we can
+ * detect later with fsverity_active.
+ *
+ * Callers must hold the IOLOCK and must not hold the ILOCK of sc->ip because
+ * activation reads xattrs.  @blocksize and @treesize will be filled out with
+ * merkle tree geometry if they are not NULL pointers.
+ */
+int
+xchk_inode_setup_verity(
+	struct xfs_scrub	*sc,
+	unsigned int		*blocksize,
+	u64			*treesize)
+{
+	unsigned int		bs;
+	u64			ts;
+	int			error;
+
+	if (!IS_VERITY(VFS_I(sc->ip)))
+		return 0;
+
+	error = fsverity_merkle_tree_geometry(VFS_I(sc->ip), &bs, &ts);
+	switch (error) {
+	case 0:
+		/* fsverity is active; return tree geometry. */
+		if (blocksize)
+			*blocksize = bs;
+		if (treesize)
+			*treesize = ts;
+		break;
+	case -ENODATA:
+	case -EMSGSIZE:
+	case -EINVAL:
+	case -EFSCORRUPTED:
+	case -EFBIG:
+		/*
+		 * The nonzero errno codes above are the error codes that can
+		 * be returned from fsverity on metadata validation errors.
+		 * Set the geometry to zero.
+		 */
+		if (blocksize)
+			*blocksize = 0;
+		if (treesize)
+			*treesize = 0;
+		return 0;
+	default:
+		/* runtime errors */
+		return error;
+	}
+
+	return 0;
+}
+
+/*
+ * Is this a verity file that failed to activate?  Callers must have tried to
+ * activate fsverity via xchk_inode_setup_verity.
+ */
+bool
+xchk_inode_verity_broken(
+	struct xfs_inode	*ip)
+{
+	return IS_VERITY(VFS_I(ip)) && !fsverity_active(VFS_I(ip));
+}
+
 /* Complain about failures... */
 void
 xchk_whine(
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index f15038dd6dedc..673347d51f29f 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -302,5 +302,8 @@ int xchk_inode_is_allocated(struct xfs_scrub *sc, xfs_agino_t agino,
 		bool *inuse);
 int xchk_inode_count_blocks(struct xfs_scrub *sc, int whichfork,
 		xfs_extnum_t *nextents, xfs_filblks_t *count);
+int xchk_inode_setup_verity(struct xfs_scrub *sc, unsigned int *blocksize,
+		u64 *treesize);
+bool xchk_inode_verity_broken(struct xfs_inode *ip);
 
 #endif	/* __XFS_SCRUB_COMMON_H__ */
diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
index cb2530a93a001..91eab60947b12 100644
--- a/fs/xfs/scrub/inode.c
+++ b/fs/xfs/scrub/inode.c
@@ -36,6 +36,10 @@ xchk_prepare_iscrub(
 
 	xchk_ilock(sc, XFS_IOLOCK_EXCL);
 
+	error = xchk_inode_setup_verity(sc, NULL, NULL);
+	if (error)
+		return error;
+
 	error = xchk_trans_alloc(sc, 0);
 	if (error)
 		return error;
@@ -825,6 +829,9 @@ xchk_inode(
 	if (S_ISREG(VFS_I(sc->ip)->i_mode))
 		xchk_inode_check_reflink_iflag(sc, sc->ip->i_ino);
 
+	if (xchk_inode_verity_broken(sc->ip))
+		xchk_ino_set_corrupt(sc, sc->sm->sm_ino);
+
 	xchk_inode_check_unlinked(sc);
 
 	xchk_inode_xref(sc, sc->ip->i_ino, &di);
diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c
index fb8d1ba1f35c0..c990fd7483529 100644
--- a/fs/xfs/scrub/inode_repair.c
+++ b/fs/xfs/scrub/inode_repair.c
@@ -566,6 +566,8 @@ xrep_dinode_flags(
 		dip->di_nrext64_pad = 0;
 	else if (dip->di_version >= 3)
 		dip->di_v3_pad = 0;
+	if (!xfs_has_verity(mp) || !S_ISREG(mode))
+		flags2 &= ~XFS_DIFLAG2_VERITY;
 
 	if (flags2 & XFS_DIFLAG2_METADIR) {
 		xfs_failaddr_t	fa;
@@ -1589,6 +1591,10 @@ xrep_dinode_core(
 	if (iget_error)
 		return iget_error;
 
+	error = xchk_inode_setup_verity(sc, NULL, NULL);
+	if (error)
+		return error;
+
 	error = xchk_trans_alloc(sc, 0);
 	if (error)
 		return error;
@@ -2015,6 +2021,27 @@ xrep_inode_unlinked(
 	return 0;
 }
 
+/*
+ * If this file is a fsverity file, xchk_prepare_iscrub or xrep_dinode_core
+ * should have activated it.  If it's still not active, then there's something
+ * wrong with the verity descriptor and we should turn it off.
+ */
+STATIC int
+xrep_inode_verity(
+	struct xfs_scrub	*sc)
+{
+	struct inode		*inode = VFS_I(sc->ip);
+
+	if (xchk_inode_verity_broken(sc->ip)) {
+		sc->ip->i_diflags2 &= ~XFS_DIFLAG2_VERITY;
+		inode->i_flags &= ~S_VERITY;
+
+		xfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE);
+	}
+
+	return 0;
+}
+
 /* Repair an inode's fields. */
 int
 xrep_inode(
@@ -2064,6 +2091,15 @@ xrep_inode(
 			return error;
 	}
 
+	/*
+	 * Disable fsverity if it cannot be activated.  Activation failure
+	 * prohibits the file from being opened, so there cannot be another
+	 * program with an open fd to what it thinks is a verity file.
+	 */
+	error = xrep_inode_verity(sc);
+	if (error)
+		return error;
+
 	/* Reconnect incore unlinked list */
 	error = xrep_inode_unlinked(sc);
 	if (error)


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 23/26] xfs: teach online repair to evaluate fsverity xattrs
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (21 preceding siblings ...)
  2024-04-30  3:29   ` [PATCH 22/26] xfs: check and repair the verity inode flag state Darrick J. Wong
@ 2024-04-30  3:30   ` Darrick J. Wong
  2024-04-30  3:30   ` [PATCH 24/26] xfs: report verity failures through the health system Darrick J. Wong
                     ` (2 subsequent siblings)
  25 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:30 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

Teach online repair to check for unused fsverity metadata and purge it
on reconstruction.
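
The per-xattr decision made by the checker in the diff below can be
modelled as a small pure function.  This is a sketch under
assumptions: the enum, the function name and the 8-byte key length are
hypothetical, and the merkle offset is passed in already decoded
instead of being parsed from the attr name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum xattr_verdict { XV_OK, XV_CORRUPT, XV_PREEN };

#define MERKLE_KEY_LEN	8	/* assumption: size of the on-disk key */
#define DESC_NAME	"vdesc"
#define DESC_NAME_LEN	(sizeof(DESC_NAME) - 1)

/* Classify one verity xattr on a file whose verity flag is set. */
static enum xattr_verdict
check_verity_xattr(const unsigned char *name, size_t namelen,
		   uint64_t merkle_off, unsigned int valuelen,
		   unsigned int blocksize, uint64_t tree_size)
{
	assert(name != NULL);

	/* No geometry available, so nothing can be checked. */
	if (blocksize == 0)
		return XV_OK;

	if (namelen == DESC_NAME_LEN)
		return memcmp(name, DESC_NAME, namelen) ? XV_CORRUPT : XV_OK;
	if (namelen != MERKLE_KEY_LEN)
		return XV_CORRUPT;

	/* Oversized merkle tree blocks are not allowed. */
	if (valuelen > blocksize)
		return XV_CORRUPT;

	/* Blocks past the end of the tree are leftovers to clean up. */
	return merkle_off >= tree_size ? XV_PREEN : XV_OK;
}
```

A preen verdict marks metadata that is harmless but stale, which
repair can then purge.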

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/scrub/attr.c        |  138 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/attr.h        |    6 ++
 fs/xfs/scrub/attr_repair.c |   51 ++++++++++++++++
 fs/xfs/scrub/trace.c       |    1 
 fs/xfs/scrub/trace.h       |   31 ++++++++++
 5 files changed, 226 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index b1448832ae6ba..f5fd7424bad1a 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -18,6 +18,7 @@
 #include "xfs_attr_leaf.h"
 #include "xfs_attr_sf.h"
 #include "xfs_parent.h"
+#include "xfs_verity.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/dabtree.h"
@@ -126,6 +127,47 @@ xchk_setup_xattr_buf(
 	return 0;
 }
 
+#ifdef CONFIG_FS_VERITY
+/*
+ * Obtain merkle tree geometry information for a verity file so that we can
+ * perform sanity checks of the fsverity xattrs.
+ */
+STATIC int
+xchk_xattr_setup_verity(
+	struct xfs_scrub	*sc)
+{
+	struct xchk_xattr_buf	*ab;
+	int			error;
+
+	/*
+	 * Drop the ILOCK and the transaction because loading the fsverity
+	 * metadata will call into the xattr code.  S_VERITY is enabled with
+	 * IOLOCK_EXCL held, so it should not change here.
+	 */
+	xchk_iunlock(sc, XFS_ILOCK_EXCL);
+	xchk_trans_cancel(sc);
+
+	error = xchk_setup_xattr_buf(sc, 0);
+	if (error)
+		return error;
+
+	ab = sc->buf;
+	error = xchk_inode_setup_verity(sc, &ab->merkle_blocksize,
+			&ab->merkle_tree_size);
+	if (error)
+		return error;
+
+	error = xchk_trans_alloc(sc, 0);
+	if (error)
+		return error;
+
+	xchk_ilock(sc, XFS_ILOCK_EXCL);
+	return 0;
+}
+#else
+# define xchk_xattr_setup_verity(...)	(0)
+#endif /* CONFIG_FS_VERITY */
+
 /* Set us up to scrub an inode's extended attributes. */
 int
 xchk_setup_xattr(
@@ -150,9 +192,89 @@ xchk_setup_xattr(
 			return error;
 	}
 
-	return xchk_setup_inode_contents(sc, 0);
+	error = xchk_setup_inode_contents(sc, 0);
+	if (error)
+		return error;
+
+	if (IS_VERITY(VFS_I(sc->ip))) {
+		error = xchk_xattr_setup_verity(sc);
+		if (error)
+			return error;
+	}
+
+	return error;
 }
 
+#ifdef CONFIG_FS_VERITY
+/* Check the merkle tree xattrs. */
+STATIC void
+xchk_xattr_verity(
+	struct xfs_scrub		*sc,
+	xfs_dablk_t			blkno,
+	const unsigned char		*name,
+	unsigned int			namelen,
+	unsigned int			valuelen)
+{
+	struct xchk_xattr_buf		*ab = sc->buf;
+
+	/* Non-verity filesystems should never have verity xattrs. */
+	if (!xfs_has_verity(sc->mp)) {
+		xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, blkno);
+		return;
+	}
+
+	/*
+	 * Any verity metadata on a non-verity file are leftovers from a
+	 * previous attempt to enable verity.
+	 */
+	if (!IS_VERITY(VFS_I(sc->ip))) {
+		xchk_ino_set_preen(sc, sc->ip->i_ino);
+		return;
+	}
+
+	/* Zero blocksize occurs if we couldn't load the merkle tree data. */
+	if (ab->merkle_blocksize == 0)
+		return;
+
+	switch (namelen) {
+	case sizeof(struct xfs_merkle_key):
+		/* Oversized blocks are not allowed */
+		if (valuelen > ab->merkle_blocksize) {
+			xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, blkno);
+			return;
+		}
+		break;
+	case XFS_VERITY_DESCRIPTOR_NAME_LEN:
+		/* Has to match the descriptor xattr name */
+		if (memcmp(name, XFS_VERITY_DESCRIPTOR_NAME, namelen))
+			xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, blkno);
+		return;
+	default:
+		xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, blkno);
+		return;
+	}
+
+	/*
+	 * Merkle tree blocks beyond the end of the tree are leftovers from
+	 * a previous failed attempt to enable verity.
+	 */
+	if (xfs_merkle_key_from_disk(name, namelen) >= ab->merkle_tree_size)
+		xchk_ino_set_preen(sc, sc->ip->i_ino);
+}
+#else
+static void
+xchk_xattr_verity(
+	struct xfs_scrub		*sc,
+	xfs_dablk_t			blkno,
+	const unsigned char		*name,
+	unsigned int			namelen,
+	unsigned int			valuelen)
+{
+	/* Should never see verity xattrs when verity is not enabled. */
+	xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, blkno);
+}
+#endif /* CONFIG_FS_VERITY */
+
 /* Extended Attributes */
 
 /*
@@ -216,6 +338,13 @@ xchk_xattr_actor(
 		return -ECANCELED;
 	}
 
+	/* Check verity xattr geometry */
+	if (attr_flags & XFS_ATTR_VERITY) {
+		xchk_xattr_verity(sc, args.blkno, name, namelen, valuelen);
+		if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+			return -ECANCELED;
+	}
+
 	/*
 	 * Try to allocate enough memory to extract the attr value.  If that
 	 * doesn't work, return -EDEADLOCK as a signal to try again with a
@@ -653,6 +782,13 @@ xchk_xattr(
 	if (xchk_inode_verity_broken(sc->ip))
 		xchk_set_incomplete(sc);
 
+	/*
+	 * If this is a verity file that won't activate, we cannot check the
+	 * merkle tree geometry.
+	 */
+	if (xchk_inode_verity_broken(sc->ip))
+		xchk_set_incomplete(sc);
+
 	/* Allocate memory for xattr checking. */
 	error = xchk_setup_xattr_buf(sc, 0);
 	if (error == -ENOMEM)
diff --git a/fs/xfs/scrub/attr.h b/fs/xfs/scrub/attr.h
index 7db58af56646b..40b8c12384f55 100644
--- a/fs/xfs/scrub/attr.h
+++ b/fs/xfs/scrub/attr.h
@@ -22,6 +22,12 @@ struct xchk_xattr_buf {
 	/* Memory buffer used to extract xattr values. */
 	void			*value;
 	size_t			value_sz;
+
+#ifdef CONFIG_FS_VERITY
+	/* Geometry of the merkle tree attached to this verity file. */
+	u64			merkle_tree_size;
+	unsigned int		merkle_blocksize;
+#endif
 };
 
 bool xchk_xattr_set_map(struct xfs_scrub *sc, unsigned long *map,
diff --git a/fs/xfs/scrub/attr_repair.c b/fs/xfs/scrub/attr_repair.c
index c7eb94069cafc..ff38c563a090b 100644
--- a/fs/xfs/scrub/attr_repair.c
+++ b/fs/xfs/scrub/attr_repair.c
@@ -29,6 +29,7 @@
 #include "xfs_exchrange.h"
 #include "xfs_acl.h"
 #include "xfs_parent.h"
+#include "xfs_verity.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
@@ -155,6 +156,44 @@ xrep_setup_xattr(
 	return xrep_tempfile_create(sc, S_IFREG);
 }
 
+#ifdef CONFIG_FS_VERITY
+static int
+xrep_xattr_want_salvage_verity(
+	struct xrep_xattr	*rx,
+	const void		*name,
+	int			namelen,
+	int			valuelen)
+{
+	struct xchk_xattr_buf	*ab = rx->sc->buf;
+
+	if (!xfs_has_verity(rx->sc->mp))
+		return false;
+	if (!IS_VERITY(VFS_I(rx->sc->ip)))
+		return false;
+
+	switch (namelen) {
+	case sizeof(struct xfs_merkle_key):
+		/* Oversized blocks are not allowed */
+		if (valuelen > ab->merkle_blocksize)
+			return false;
+		break;
+	case XFS_VERITY_DESCRIPTOR_NAME_LEN:
+		/* Has to match the descriptor xattr name */
+		return !memcmp(name, XFS_VERITY_DESCRIPTOR_NAME, namelen);
+	default:
+		return false;
+	}
+
+	/*
+	 * Merkle tree blocks beyond the end of the tree are leftovers from
+	 * a previous failed attempt to enable verity.
+	 */
+	return xfs_merkle_key_from_disk(name, namelen) < ab->merkle_tree_size;
+}
+#else
+# define xrep_xattr_want_salvage_verity(...)	(false)
+#endif /* CONFIG_FS_VERITY */
+
 /*
  * Decide if we want to salvage this attribute.  We don't bother with
  * incomplete or oversized keys or values.  The @value parameter can be null
@@ -179,6 +218,9 @@ xrep_xattr_want_salvage(
 		return false;
 	if (attr_flags & XFS_ATTR_PARENT)
 		return xfs_parent_valuecheck(rx->sc->mp, value, valuelen);
+	if (attr_flags & XFS_ATTR_VERITY)
+		return xrep_xattr_want_salvage_verity(rx, name, namelen,
+				valuelen);
 
 	return true;
 }
@@ -212,6 +254,11 @@ xrep_xattr_salvage_key(
 
 		trace_xrep_xattr_salvage_pptr(rx->sc->ip, flags, name,
 				key.namelen, value, valuelen);
+	} else if (flags & XFS_ATTR_VERITY) {
+		key.namelen = namelen;
+
+		trace_xrep_xattr_salvage_verity(rx->sc->ip, flags, name,
+				key.namelen, value, valuelen);
 	} else {
 		while (i < namelen && name[i] != 0)
 			i++;
@@ -663,6 +710,10 @@ xrep_xattr_insert_rec(
 				ab->name, key->namelen, ab->value,
 				key->valuelen);
 		args.op_flags |= XFS_DA_OP_LOGGED;
+	} else if (key->flags & XFS_ATTR_VERITY) {
+		trace_xrep_xattr_insert_verity(rx->sc->ip, key->flags,
+				ab->name, key->namelen, ab->value,
+				key->valuelen);
 	} else {
 		trace_xrep_xattr_insert_rec(rx->sc->tempip, key->flags,
 				ab->name, key->namelen, key->valuelen);
diff --git a/fs/xfs/scrub/trace.c b/fs/xfs/scrub/trace.c
index 6d8acb2f63d8a..69c234f2a4b32 100644
--- a/fs/xfs/scrub/trace.c
+++ b/fs/xfs/scrub/trace.c
@@ -22,6 +22,7 @@
 #include "xfs_parent.h"
 #include "xfs_imeta.h"
 #include "xfs_rtgroup.h"
+#include "xfs_verity.h"
 #include "scrub/scrub.h"
 #include "scrub/xfile.h"
 #include "scrub/xfarray.h"
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index c43d02f9afade..c41598456dfcf 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -3072,6 +3072,37 @@ DEFINE_EVENT(xrep_pptr_salvage_class, name, \
 DEFINE_XREP_PPTR_SALVAGE_EVENT(xrep_xattr_salvage_pptr);
 DEFINE_XREP_PPTR_SALVAGE_EVENT(xrep_xattr_insert_pptr);
 
+DECLARE_EVENT_CLASS(xrep_verity_salvage_class,
+	TP_PROTO(struct xfs_inode *ip, unsigned int flags, const void *name,
+		 unsigned int namelen, const void *value, unsigned int valuelen),
+	TP_ARGS(ip, flags, name, namelen, value, valuelen),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(unsigned long long, merkle_off)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		if (namelen == sizeof(struct xfs_merkle_key))
+			__entry->merkle_off = xfs_merkle_key_from_disk(name,
+								namelen);
+		else
+			__entry->merkle_off = -1ULL;
+	),
+	TP_printk("dev %d:%d ino 0x%llx merkle_off 0x%llx",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->merkle_off)
+)
+#define DEFINE_XREP_VERITY_SALVAGE_EVENT(name) \
+DEFINE_EVENT(xrep_verity_salvage_class, name, \
+	TP_PROTO(struct xfs_inode *ip, unsigned int flags, const void *name, \
+		 unsigned int namelen, const void *value, unsigned int valuelen), \
+	TP_ARGS(ip, flags, name, namelen, value, valuelen))
+DEFINE_XREP_VERITY_SALVAGE_EVENT(xrep_xattr_salvage_verity);
+DEFINE_XREP_VERITY_SALVAGE_EVENT(xrep_xattr_insert_verity);
+
 TRACE_EVENT(xrep_xattr_class,
 	TP_PROTO(struct xfs_inode *ip, struct xfs_inode *arg_ip),
 	TP_ARGS(ip, arg_ip),
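
The geometry checks in xchk_xattr_verity() and
xrep_xattr_want_salvage_verity() apply the same decision order to each
verity xattr.  That order can be modeled in standalone userspace C.  This
is only a sketch, not kernel code: the 8-byte big-endian key layout, the
descriptor name "vdesc", and every demo_/DEMO_ identifier are assumptions
made for illustration, and the early checks (non-verity filesystem,
non-verity file, zero blocksize) are left out.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: a merkle key is a single big-endian 64-bit tree offset. */
#define DEMO_MERKLE_KEY_LEN		8
/* Hypothetical descriptor xattr name; the real one is defined elsewhere. */
#define DEMO_DESCRIPTOR_NAME		"vdesc"
#define DEMO_DESCRIPTOR_NAME_LEN	5

enum demo_verdict {
	DEMO_OK,	/* geometry looks plausible */
	DEMO_CORRUPT,	/* scrub would flag the attr fork as corrupt */
	DEMO_PREEN,	/* leftover from a failed enable attempt */
};

/* Decode the assumed big-endian offset stored in the xattr name. */
static uint64_t demo_key_from_disk(const unsigned char *name)
{
	uint64_t pos = 0;

	for (int i = 0; i < DEMO_MERKLE_KEY_LEN; i++)
		pos = (pos << 8) | name[i];
	return pos;
}

/* Classify one verity xattr by name length, value length, and offset. */
enum demo_verdict demo_check_verity_xattr(const unsigned char *name,
		unsigned int namelen, unsigned int valuelen,
		unsigned int merkle_blocksize, uint64_t merkle_tree_size)
{
	switch (namelen) {
	case DEMO_MERKLE_KEY_LEN:
		/* Oversized merkle tree blocks are not allowed. */
		if (valuelen > merkle_blocksize)
			return DEMO_CORRUPT;
		break;
	case DEMO_DESCRIPTOR_NAME_LEN:
		/* A name of this length must be the descriptor name. */
		if (memcmp(name, DEMO_DESCRIPTOR_NAME, namelen))
			return DEMO_CORRUPT;
		return DEMO_OK;
	default:
		return DEMO_CORRUPT;
	}

	/* Blocks past the end of the tree are stale, not corrupt. */
	if (demo_key_from_disk(name) >= merkle_tree_size)
		return DEMO_PREEN;
	return DEMO_OK;
}
```

The split between "corrupt" and "preen" mirrors the patch: a malformed
name or oversized value is treated as damage, whereas a well-formed block
beyond the tree size is treated as debris that repair can simply drop.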



* [PATCH 24/26] xfs: report verity failures through the health system
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (22 preceding siblings ...)
  2024-04-30  3:30   ` [PATCH 23/26] xfs: teach online repair to evaluate fsverity xattrs Darrick J. Wong
@ 2024-04-30  3:30   ` Darrick J. Wong
  2024-04-30  3:30   ` [PATCH 25/26] xfs: make it possible to disable fsverity Darrick J. Wong
  2024-04-30  3:30   ` [PATCH 26/26] xfs: enable ro-compat fs-verity flag Darrick J. Wong
  25 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:30 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

Record verity failures and report them through the health system.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/libxfs/xfs_fs.h     |    1 +
 fs/xfs/libxfs/xfs_health.h |    4 +++-
 fs/xfs/xfs_fsverity.c      |   11 +++++++++++
 fs/xfs/xfs_health.c        |    1 +
 4 files changed, 16 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index edc019d89702d..bc529d862af75 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -424,6 +424,7 @@ struct xfs_bulkstat {
 #define XFS_BS_SICK_SYMLINK	(1 << 6)  /* symbolic link remote target */
 #define XFS_BS_SICK_PARENT	(1 << 7)  /* parent pointers */
 #define XFS_BS_SICK_DIRTREE	(1 << 8)  /* directory tree structure */
+#define XFS_BS_SICK_DATA	(1 << 9)  /* file data */
 
 /*
  * Project quota id helpers (previously projid was 16bit only
diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
index 89b80e957917e..0f8533335e25f 100644
--- a/fs/xfs/libxfs/xfs_health.h
+++ b/fs/xfs/libxfs/xfs_health.h
@@ -105,6 +105,7 @@ struct xfs_rtgroup;
 /* Don't propagate sick status to ag health summary during inactivation */
 #define XFS_SICK_INO_FORGET	(1 << 12)
 #define XFS_SICK_INO_DIRTREE	(1 << 13)  /* directory tree structure */
+#define XFS_SICK_INO_DATA	(1 << 14)  /* file data */
 
 /* Primary evidence of health problems in a given group. */
 #define XFS_SICK_FS_PRIMARY	(XFS_SICK_FS_COUNTERS | \
@@ -143,7 +144,8 @@ struct xfs_rtgroup;
 				 XFS_SICK_INO_XATTR | \
 				 XFS_SICK_INO_SYMLINK | \
 				 XFS_SICK_INO_PARENT | \
-				 XFS_SICK_INO_DIRTREE)
+				 XFS_SICK_INO_DIRTREE | \
+				 XFS_SICK_INO_DATA)
 
 #define XFS_SICK_INO_ZAPPED	(XFS_SICK_INO_BMBTD_ZAPPED | \
 				 XFS_SICK_INO_BMBTA_ZAPPED | \
diff --git a/fs/xfs/xfs_fsverity.c b/fs/xfs/xfs_fsverity.c
index e2de99272b7da..87edf23954336 100644
--- a/fs/xfs/xfs_fsverity.c
+++ b/fs/xfs/xfs_fsverity.c
@@ -22,6 +22,7 @@
 #include "xfs_ag.h"
 #include "xfs_fsverity.h"
 #include "xfs_icache.h"
+#include "xfs_health.h"
 #include <linux/fsverity.h>
 
 /*
@@ -930,6 +931,15 @@ xfs_fsverity_drop_merkle(
 	block->context = NULL;
 }
 
+static void
+xfs_fsverity_file_corrupt(
+	struct inode		*inode,
+	loff_t			pos,
+	size_t			len)
+{
+	xfs_inode_mark_sick(XFS_I(inode), XFS_SICK_INO_DATA);
+}
+
 const struct fsverity_operations xfs_fsverity_ops = {
 	.begin_enable_verity		= xfs_fsverity_begin_enable,
 	.end_enable_verity		= xfs_fsverity_end_enable,
@@ -937,4 +947,5 @@ const struct fsverity_operations xfs_fsverity_ops = {
 	.read_merkle_tree_block		= xfs_fsverity_read_merkle,
 	.write_merkle_tree_block	= xfs_fsverity_write_merkle,
 	.drop_merkle_tree_block		= xfs_fsverity_drop_merkle,
+	.file_corrupt			= xfs_fsverity_file_corrupt,
 };
diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
index 33059d979857a..ce7385c207d37 100644
--- a/fs/xfs/xfs_health.c
+++ b/fs/xfs/xfs_health.c
@@ -591,6 +591,7 @@ static const struct ioctl_sick_map ino_map[] = {
 	{ XFS_SICK_INO_DIR_ZAPPED,	XFS_BS_SICK_DIR },
 	{ XFS_SICK_INO_SYMLINK_ZAPPED,	XFS_BS_SICK_SYMLINK },
 	{ XFS_SICK_INO_DIRTREE,	XFS_BS_SICK_DIRTREE },
+	{ XFS_SICK_INO_DATA,	XFS_BS_SICK_DATA },
 	{ 0, 0 },
 };
 

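The new ino_map[] entry is what makes XFS_SICK_INO_DATA visible to
userspace as XFS_BS_SICK_DATA.  A minimal userspace model of that
table-driven translation follows.  The bit values are copied from the
patch; the struct layout, the demo_ names, and the loop are assumptions
standing in for the kernel's own helper.

```c
/* Bit values copied from the patch. */
#define XFS_SICK_INO_DIRTREE	(1 << 13)
#define XFS_SICK_INO_DATA	(1 << 14)
#define XFS_BS_SICK_DIRTREE	(1 << 8)
#define XFS_BS_SICK_DATA	(1 << 9)

/* Hypothetical stand-in for the kernel's ioctl_sick_map. */
struct demo_sick_map {
	unsigned int	sick_mask;	/* internal health flag */
	unsigned int	ioctl_mask;	/* flag reported via bulkstat */
};

static const struct demo_sick_map demo_ino_map[] = {
	{ XFS_SICK_INO_DIRTREE,	XFS_BS_SICK_DIRTREE },
	{ XFS_SICK_INO_DATA,	XFS_BS_SICK_DATA },
	{ 0, 0 },	/* terminator */
};

/* Translate internal sick flags into the bulkstat-visible mask. */
unsigned int demo_sick_to_bulkstat(unsigned int sick)
{
	const struct demo_sick_map	*m;
	unsigned int			out = 0;

	for (m = demo_ino_map; m->sick_mask; m++) {
		if (sick & m->sick_mask)
			out |= m->ioctl_mask;
	}
	return out;
}
```

Internal flags with no table entry are simply not reported, which is why
the one-line addition to ino_map[] is required alongside the new #defines.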


* [PATCH 25/26] xfs: make it possible to disable fsverity
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (23 preceding siblings ...)
  2024-04-30  3:30   ` [PATCH 24/26] xfs: report verity failures through the health system Darrick J. Wong
@ 2024-04-30  3:30   ` Darrick J. Wong
  2024-05-01  6:48     ` Christoph Hellwig
  2024-04-30  3:30   ` [PATCH 26/26] xfs: enable ro-compat fs-verity flag Darrick J. Wong
  25 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:30 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Darrick J. Wong <djwong@kernel.org>

Create an experimental ioctl so that we can turn off fsverity.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 Documentation/filesystems/fsverity.rst |   10 ++++++
 fs/verity/enable.c                     |   50 ++++++++++++++++++++++++++++++++
 fs/xfs/xfs_fsverity.c                  |   46 +++++++++++++++++++++++++++++
 fs/xfs/xfs_ioctl.c                     |    6 ++++
 include/linux/fsverity.h               |   24 +++++++++++++++
 include/trace/events/fsverity.h        |   13 ++++++++
 include/uapi/linux/fsverity.h          |    1 +
 7 files changed, 150 insertions(+)


diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst
index 887cdaf162a99..dc688b2eda68d 100644
--- a/Documentation/filesystems/fsverity.rst
+++ b/Documentation/filesystems/fsverity.rst
@@ -189,6 +189,16 @@ FS_IOC_ENABLE_VERITY can fail with the following errors:
   caller's file descriptor, another open file descriptor, or the file
   reference held by a writable memory map.
 
+FS_IOC_DISABLE_VERITY
+---------------------
+
+The FS_IOC_DISABLE_VERITY ioctl disables fs-verity on a file.  It takes
+a file descriptor.
+
+FS_IOC_DISABLE_VERITY can fail with the following errors:
+
+- ``EOPNOTSUPP``: the filesystem does not support disabling fs-verity.
+
 FS_IOC_MEASURE_VERITY
 ---------------------
 
diff --git a/fs/verity/enable.c b/fs/verity/enable.c
index 8c6fe4b72b14e..adf8886f4ed29 100644
--- a/fs/verity/enable.c
+++ b/fs/verity/enable.c
@@ -415,3 +415,53 @@ int fsverity_ioctl_enable(struct file *filp, const void __user *uarg)
 	return err;
 }
 EXPORT_SYMBOL_GPL(fsverity_ioctl_enable);
+
+/**
+ * fsverity_ioctl_disable() - disable verity on a file
+ * @filp: file to disable verity on
+ *
+ * Disable fs-verity on a file.  See the "FS_IOC_DISABLE_VERITY" section of
+ * Documentation/filesystems/fsverity.rst for the documentation.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int fsverity_ioctl_disable(struct file *filp)
+{
+	struct inode *inode = file_inode(filp);
+	const struct fsverity_operations *vops = inode->i_sb->s_vop;
+	struct fsverity_info *vi;
+	u64 tree_size = 0;
+	unsigned int block_size = 0;
+	int err;
+
+	trace_fsverity_disable(inode);
+
+	inode_lock(inode);
+	if (!IS_VERITY(inode)) {
+		err = 0;
+		goto out_unlock;
+	}
+
+	if (!vops->disable_verity) {
+		err = -EOPNOTSUPP;
+		goto out_unlock;
+	}
+
+	vi = fsverity_get_info(inode);
+	if (vi) {
+		block_size = vi->tree_params.block_size;
+		tree_size = vi->tree_params.tree_size;
+	}
+
+	err = vops->disable_verity(filp, tree_size, block_size);
+	if (err)
+		goto out_unlock;
+
+	fsverity_cleanup_inode(inode);
+	inode_unlock(inode);
+	return 0;
+out_unlock:
+	inode_unlock(inode);
+	return err;
+}
+EXPORT_SYMBOL_GPL(fsverity_ioctl_disable);
diff --git a/fs/xfs/xfs_fsverity.c b/fs/xfs/xfs_fsverity.c
index 87edf23954336..184c3e14d581f 100644
--- a/fs/xfs/xfs_fsverity.c
+++ b/fs/xfs/xfs_fsverity.c
@@ -940,9 +940,55 @@ xfs_fsverity_file_corrupt(
 	xfs_inode_mark_sick(XFS_I(inode), XFS_SICK_INO_DATA);
 }
 
+/* Turn off fs-verity. */
+static int
+xfs_fsverity_disable(
+	struct file		*file,
+	u64			tree_size,
+	unsigned int		block_size)
+{
+	struct inode		*inode = file_inode(file);
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_trans	*tp;
+	int			error;
+
+	if (xfs_iflags_test(ip, XFS_VERITY_CONSTRUCTION))
+		return -EBUSY;
+
+	error = xfs_qm_dqattach(ip);
+	if (error)
+		return error;
+
+	xfs_fsverity_drop_cache(ip, tree_size, block_size);
+
+	/* Clear fsverity inode flag */
+	error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_ichange, 0, 0, false,
+			&tp);
+	if (error)
+		return error;
+
+	ip->i_diflags2 &= ~XFS_DIFLAG2_VERITY;
+
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	xfs_trans_set_sync(tp);
+
+	error = xfs_trans_commit(tp);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	if (error)
+		return error;
+
+	inode->i_flags &= ~S_VERITY;
+	fsverity_cleanup_inode(inode);
+
+	/* Remove the fsverity xattrs. */
+	return xfs_fsverity_delete_metadata(ip, tree_size, block_size);
+}
+
 const struct fsverity_operations xfs_fsverity_ops = {
 	.begin_enable_verity		= xfs_fsverity_begin_enable,
 	.end_enable_verity		= xfs_fsverity_end_enable,
+	.disable_verity			= xfs_fsverity_disable,
 	.get_verity_descriptor		= xfs_fsverity_get_descriptor,
 	.read_merkle_tree_block		= xfs_fsverity_read_merkle,
 	.write_merkle_tree_block	= xfs_fsverity_write_merkle,
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index b05930462f461..d71fc9e6b83eb 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -42,6 +42,7 @@
 #include "xfs_exchrange.h"
 #include "xfs_handle.h"
 #include "xfs_rtgroup.h"
+#include "xfs_fsverity.h"
 
 #include <linux/mount.h>
 #include <linux/fileattr.h>
@@ -1590,6 +1591,11 @@ xfs_file_ioctl(
 			return -EOPNOTSUPP;
 		return fsverity_ioctl_read_metadata(filp, arg);
 
+	case FS_IOC_DISABLE_VERITY:
+		if (!xfs_has_verity(mp))
+			return -EOPNOTSUPP;
+		return fsverity_ioctl_disable(filp);
+
 	default:
 		return -ENOTTY;
 	}
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index 1336f4b9011ea..e9f570f65ed54 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -135,6 +135,24 @@ struct fsverity_operations {
 				 size_t desc_size, u64 merkle_tree_size,
 				 unsigned int tree_blocksize);
 
+	/**
+	 * Disable verity on the given file.
+	 *
+	 * @filp: a readonly file descriptor for the file
+	 * @merkle_tree_size: total bytes the Merkle tree takes up
+	 * @tree_blocksize: the Merkle tree block size
+	 *
+	 * The filesystem must do any needed filesystem-specific preparations
+	 * for disabling verity, e.g. truncating the merkle tree.  It also must
+	 * return -EBUSY if verity is already being enabled on the given file.
+	 *
+	 * i_rwsem is held for write.
+	 *
+	 * Return: 0 on success, -errno on failure
+	 */
+	int (*disable_verity)(struct file *filp, u64 merkle_tree_size,
+			      unsigned int tree_blocksize);
+
 	/**
 	 * Get the verity descriptor of the given inode.
 	 *
@@ -260,6 +278,7 @@ static inline struct fsverity_info *fsverity_get_info(const struct inode *inode)
 /* enable.c */
 
 int fsverity_ioctl_enable(struct file *filp, const void __user *arg);
+int fsverity_ioctl_disable(struct file *filp);
 
 /* measure.c */
 
@@ -326,6 +345,11 @@ static inline int fsverity_ioctl_enable(struct file *filp,
 	return -EOPNOTSUPP;
 }
 
+static inline int fsverity_ioctl_disable(struct file *filp)
+{
+	return -EOPNOTSUPP;
+}
+
 /* measure.c */
 
 static inline int fsverity_ioctl_measure(struct file *filp, void __user *arg)
diff --git a/include/trace/events/fsverity.h b/include/trace/events/fsverity.h
index 375fdddac6a99..2678dd3249b32 100644
--- a/include/trace/events/fsverity.h
+++ b/include/trace/events/fsverity.h
@@ -37,6 +37,19 @@ TRACE_EVENT(fsverity_enable,
 		__entry->num_levels)
 );
 
+TRACE_EVENT(fsverity_disable,
+	TP_PROTO(const struct inode *inode),
+	TP_ARGS(inode),
+	TP_STRUCT__entry(
+		__field(ino_t, ino)
+	),
+	TP_fast_assign(
+		__entry->ino = inode->i_ino;
+	),
+	TP_printk("ino %lu",
+		(unsigned long) __entry->ino)
+);
+
 TRACE_EVENT(fsverity_tree_done,
 	TP_PROTO(const struct inode *inode, const struct fsverity_info *vi,
 		 const struct merkle_tree_params *params),
diff --git a/include/uapi/linux/fsverity.h b/include/uapi/linux/fsverity.h
index 15384e22e331e..73a5f83754792 100644
--- a/include/uapi/linux/fsverity.h
+++ b/include/uapi/linux/fsverity.h
@@ -99,5 +99,6 @@ struct fsverity_read_metadata_arg {
 #define FS_IOC_MEASURE_VERITY	_IOWR('f', 134, struct fsverity_digest)
 #define FS_IOC_READ_VERITY_METADATA \
 	_IOWR('f', 135, struct fsverity_read_metadata_arg)
+#define FS_IOC_DISABLE_VERITY	_IO('f', 136)
 
 #endif /* _UAPI_LINUX_FSVERITY_H */



* [PATCH 26/26] xfs: enable ro-compat fs-verity flag
  2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
                     ` (24 preceding siblings ...)
  2024-04-30  3:30   ` [PATCH 25/26] xfs: make it possible to disable fsverity Darrick J. Wong
@ 2024-04-30  3:30   ` Darrick J. Wong
  25 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:30 UTC (permalink / raw)
  To: aalbersh, ebiggers, djwong
  Cc: linux-xfs, alexl, walters, fsverity, linux-fsdevel

From: Andrey Albershteyn <aalbersh@redhat.com>

Finalize fs-verity integration in XFS by adding the fs-verity ro-compat
flag to the set of features that the kernel recognizes.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: add spaces]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_format.h |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 810f2556762b0..78a12705a88da 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -389,10 +389,11 @@ xfs_sb_has_compat_feature(
 #define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3)		/* inobt block counts */
 #define XFS_SB_FEAT_RO_COMPAT_VERITY   (1 << 4)		/* fs-verity */
 #define XFS_SB_FEAT_RO_COMPAT_ALL \
-		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
-		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
-		 XFS_SB_FEAT_RO_COMPAT_REFLINK| \
-		 XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
+		(XFS_SB_FEAT_RO_COMPAT_FINOBT   | \
+		 XFS_SB_FEAT_RO_COMPAT_RMAPBT   | \
+		 XFS_SB_FEAT_RO_COMPAT_REFLINK  | \
+		 XFS_SB_FEAT_RO_COMPAT_INOBTCNT | \
+		 XFS_SB_FEAT_RO_COMPAT_VERITY)
 #define XFS_SB_FEAT_RO_COMPAT_UNKNOWN	~XFS_SB_FEAT_RO_COMPAT_ALL
 static inline bool
 xfs_sb_has_ro_compat_feature(
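
The effect of adding the flag to XFS_SB_FEAT_RO_COMPAT_ALL can be shown
with a small standalone model.  The bit values match xfs_format.h; the
DEMO_ masks and demo_has_unknown_ro_compat() are hypothetical names used
only for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define XFS_SB_FEAT_RO_COMPAT_FINOBT	(1 << 0)
#define XFS_SB_FEAT_RO_COMPAT_RMAPBT	(1 << 1)
#define XFS_SB_FEAT_RO_COMPAT_REFLINK	(1 << 2)
#define XFS_SB_FEAT_RO_COMPAT_INOBTCNT	(1 << 3)
#define XFS_SB_FEAT_RO_COMPAT_VERITY	(1 << 4)

/* Known-feature mask before this patch (illustrative name). */
#define DEMO_RO_COMPAT_ALL_OLD \
		(XFS_SB_FEAT_RO_COMPAT_FINOBT   | \
		 XFS_SB_FEAT_RO_COMPAT_RMAPBT   | \
		 XFS_SB_FEAT_RO_COMPAT_REFLINK  | \
		 XFS_SB_FEAT_RO_COMPAT_INOBTCNT)

/* Known-feature mask after this patch (illustrative name). */
#define DEMO_RO_COMPAT_ALL_NEW \
		(DEMO_RO_COMPAT_ALL_OLD | XFS_SB_FEAT_RO_COMPAT_VERITY)

/* True if the superblock sets any ro-compat bit outside known_mask. */
bool demo_has_unknown_ro_compat(uint32_t sb_features, uint32_t known_mask)
{
	return (sb_features & ~known_mask) != 0;
}
```

With the old mask, a superblock carrying the verity bit counts as having
an unknown ro-compat feature, which restricts it to read-only mounts;
with the new mask the same superblock passes the check.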



* [PATCH 01/38] fs: add FS_XFLAG_VERITY for verity files
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
@ 2024-04-30  3:31   ` Darrick J. Wong
  2024-04-30  3:31   ` [PATCH 02/38] xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c Darrick J. Wong
                     ` (36 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:31 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Andrey Albershteyn <aalbersh@redhat.com>

Add the extended file attribute flag FS_XFLAG_VERITY for inodes with
fs-verity enabled.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
[djwong: fix broken verity flag checks]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 include/linux.h |    4 ++++
 1 file changed, 4 insertions(+)


diff --git a/include/linux.h b/include/linux.h
index 95a0deee2594..d98d387e88b0 100644
--- a/include/linux.h
+++ b/include/linux.h
@@ -249,6 +249,10 @@ struct fsxattr {
 #define FS_XFLAG_COWEXTSIZE	0x00010000	/* CoW extent size allocator hint */
 #endif
 
+#ifndef FS_XFLAG_VERITY
+#define FS_XFLAG_VERITY		0x00020000	/* fs-verity enabled */
+#endif
+
 /*
  * Reminder: anything added to this file will be compiled into downstream
  * userspace projects!



* [PATCH 02/38] xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
  2024-04-30  3:31   ` [PATCH 01/38] fs: add FS_XFLAG_VERITY for verity files Darrick J. Wong
@ 2024-04-30  3:31   ` Darrick J. Wong
  2024-04-30  3:31   ` [PATCH 03/38] xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function Darrick J. Wong
                     ` (35 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:31 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

In the next few patches we're going to refactor the attr remote code so
that we can support headerless remote xattr values for storing merkle
tree blocks.  For now, let's change the code to use unsigned int to
describe quantities of bytes and blocks that cannot be negative.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 libxfs/xfs_attr_remote.c |   61 +++++++++++++++++++++++-----------------------
 libxfs/xfs_attr_remote.h |    2 +-
 2 files changed, 31 insertions(+), 32 deletions(-)


diff --git a/libxfs/xfs_attr_remote.c b/libxfs/xfs_attr_remote.c
index eb15b272b80f..5f1b9810c5c8 100644
--- a/libxfs/xfs_attr_remote.c
+++ b/libxfs/xfs_attr_remote.c
@@ -46,13 +46,13 @@
  * Each contiguous block has a header, so it is not just a simple attribute
  * length to FSB conversion.
  */
-int
+unsigned int
 xfs_attr3_rmt_blocks(
-	struct xfs_mount *mp,
-	int		attrlen)
+	struct xfs_mount	*mp,
+	unsigned int		attrlen)
 {
 	if (xfs_has_crc(mp)) {
-		int buflen = XFS_ATTR3_RMT_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
+		unsigned int buflen = XFS_ATTR3_RMT_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
 		return (attrlen + buflen - 1) / buflen;
 	}
 	return XFS_B_TO_FSB(mp, attrlen);
@@ -91,7 +91,6 @@ xfs_attr3_rmt_verify(
 	struct xfs_mount	*mp,
 	struct xfs_buf		*bp,
 	void			*ptr,
-	int			fsbsize,
 	xfs_daddr_t		bno)
 {
 	struct xfs_attr3_rmt_hdr *rmt = ptr;
@@ -102,7 +101,7 @@ xfs_attr3_rmt_verify(
 		return __this_address;
 	if (be64_to_cpu(rmt->rm_blkno) != bno)
 		return __this_address;
-	if (be32_to_cpu(rmt->rm_bytes) > fsbsize - sizeof(*rmt))
+	if (be32_to_cpu(rmt->rm_bytes) > mp->m_attr_geo->blksize - sizeof(*rmt))
 		return __this_address;
 	if (be32_to_cpu(rmt->rm_offset) +
 				be32_to_cpu(rmt->rm_bytes) > XFS_XATTR_SIZE_MAX)
@@ -121,9 +120,9 @@ __xfs_attr3_rmt_read_verify(
 {
 	struct xfs_mount *mp = bp->b_mount;
 	char		*ptr;
-	int		len;
+	unsigned int	len;
 	xfs_daddr_t	bno;
-	int		blksize = mp->m_attr_geo->blksize;
+	unsigned int	blksize = mp->m_attr_geo->blksize;
 
 	/* no verification of non-crc buffers */
 	if (!xfs_has_crc(mp))
@@ -140,7 +139,7 @@ __xfs_attr3_rmt_read_verify(
 			*failaddr = __this_address;
 			return -EFSBADCRC;
 		}
-		*failaddr = xfs_attr3_rmt_verify(mp, bp, ptr, blksize, bno);
+		*failaddr = xfs_attr3_rmt_verify(mp, bp, ptr, bno);
 		if (*failaddr)
 			return -EFSCORRUPTED;
 		len -= blksize;
@@ -185,7 +184,7 @@ xfs_attr3_rmt_write_verify(
 {
 	struct xfs_mount *mp = bp->b_mount;
 	xfs_failaddr_t	fa;
-	int		blksize = mp->m_attr_geo->blksize;
+	unsigned int	blksize = mp->m_attr_geo->blksize;
 	char		*ptr;
 	int		len;
 	xfs_daddr_t	bno;
@@ -202,7 +201,7 @@ xfs_attr3_rmt_write_verify(
 	while (len > 0) {
 		struct xfs_attr3_rmt_hdr *rmt = (struct xfs_attr3_rmt_hdr *)ptr;
 
-		fa = xfs_attr3_rmt_verify(mp, bp, ptr, blksize, bno);
+		fa = xfs_attr3_rmt_verify(mp, bp, ptr, bno);
 		if (fa) {
 			xfs_verifier_error(bp, -EFSCORRUPTED, fa);
 			return;
@@ -280,20 +279,20 @@ xfs_attr_rmtval_copyout(
 	struct xfs_buf		*bp,
 	struct xfs_inode	*dp,
 	xfs_ino_t		owner,
-	int			*offset,
-	int			*valuelen,
+	unsigned int		*offset,
+	unsigned int		*valuelen,
 	uint8_t			**dst)
 {
 	char			*src = bp->b_addr;
 	xfs_daddr_t		bno = xfs_buf_daddr(bp);
-	int			len = BBTOB(bp->b_length);
-	int			blksize = mp->m_attr_geo->blksize;
+	unsigned int		len = BBTOB(bp->b_length);
+	unsigned int		blksize = mp->m_attr_geo->blksize;
 
 	ASSERT(len >= blksize);
 
 	while (len > 0 && *valuelen > 0) {
-		int hdr_size = 0;
-		int byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, blksize);
+		unsigned int hdr_size = 0;
+		unsigned int byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, blksize);
 
 		byte_cnt = min(*valuelen, byte_cnt);
 
@@ -329,20 +328,20 @@ xfs_attr_rmtval_copyin(
 	struct xfs_mount *mp,
 	struct xfs_buf	*bp,
 	xfs_ino_t	ino,
-	int		*offset,
-	int		*valuelen,
+	unsigned int	*offset,
+	unsigned int	*valuelen,
 	uint8_t		**src)
 {
 	char		*dst = bp->b_addr;
 	xfs_daddr_t	bno = xfs_buf_daddr(bp);
-	int		len = BBTOB(bp->b_length);
-	int		blksize = mp->m_attr_geo->blksize;
+	unsigned int	len = BBTOB(bp->b_length);
+	unsigned int	blksize = mp->m_attr_geo->blksize;
 
 	ASSERT(len >= blksize);
 
 	while (len > 0 && *valuelen > 0) {
-		int hdr_size;
-		int byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, blksize);
+		unsigned int hdr_size;
+		unsigned int byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, blksize);
 
 		byte_cnt = min(*valuelen, byte_cnt);
 		hdr_size = xfs_attr3_rmt_hdr_set(mp, dst, ino, *offset,
@@ -388,12 +387,12 @@ xfs_attr_rmtval_get(
 	struct xfs_buf		*bp;
 	xfs_dablk_t		lblkno = args->rmtblkno;
 	uint8_t			*dst = args->value;
-	int			valuelen;
+	unsigned int		valuelen;
 	int			nmap;
 	int			error;
-	int			blkcnt = args->rmtblkcnt;
+	unsigned int		blkcnt = args->rmtblkcnt;
 	int			i;
-	int			offset = 0;
+	unsigned int		offset = 0;
 
 	trace_xfs_attr_rmtval_get(args);
 
@@ -451,7 +450,7 @@ xfs_attr_rmt_find_hole(
 	struct xfs_inode	*dp = args->dp;
 	struct xfs_mount	*mp = dp->i_mount;
 	int			error;
-	int			blkcnt;
+	unsigned int		blkcnt;
 	xfs_fileoff_t		lfileoff = 0;
 
 	/*
@@ -480,11 +479,11 @@ xfs_attr_rmtval_set_value(
 	struct xfs_bmbt_irec	map;
 	xfs_dablk_t		lblkno;
 	uint8_t			*src = args->value;
-	int			blkcnt;
-	int			valuelen;
+	unsigned int		blkcnt;
+	unsigned int		valuelen;
 	int			nmap;
 	int			error;
-	int			offset = 0;
+	unsigned int		offset = 0;
 
 	/*
 	 * Roll through the "value", copying the attribute value to the
@@ -644,7 +643,7 @@ xfs_attr_rmtval_invalidate(
 	struct xfs_da_args	*args)
 {
 	xfs_dablk_t		lblkno;
-	int			blkcnt;
+	unsigned int		blkcnt;
 	int			error;
 
 	/*
diff --git a/libxfs/xfs_attr_remote.h b/libxfs/xfs_attr_remote.h
index d097ec6c4dc3..c64b04f91caf 100644
--- a/libxfs/xfs_attr_remote.h
+++ b/libxfs/xfs_attr_remote.h
@@ -6,7 +6,7 @@
 #ifndef __XFS_ATTR_REMOTE_H__
 #define	__XFS_ATTR_REMOTE_H__
 
-int xfs_attr3_rmt_blocks(struct xfs_mount *mp, int attrlen);
+unsigned int xfs_attr3_rmt_blocks(struct xfs_mount *mp, unsigned int attrlen);
 
 int xfs_attr_rmtval_get(struct xfs_da_args *args);
 int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
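
The block count computed by xfs_attr3_rmt_blocks() is a round-up division
over the usable bytes per block, and every quantity in it is naturally
non-negative.  A standalone sketch of the arithmetic is below.  The
56-byte header size is an assumption about sizeof(struct
xfs_attr3_rmt_hdr), and the demo_ helpers take the block size and CRC
setting as plain parameters instead of reading them from the mount.

```c
#include <stdbool.h>

/* Assumed size of the per-block remote value header on CRC filesystems. */
#define DEMO_RMT_HDR_SIZE	56

/* Bytes of attr value that fit in one remote value block. */
unsigned int demo_rmt_buf_space(unsigned int blksize, bool has_crc)
{
	return has_crc ? blksize - DEMO_RMT_HDR_SIZE : blksize;
}

/* Blocks needed to store attrlen bytes, rounding up. */
unsigned int demo_rmt_blocks(unsigned int attrlen, unsigned int blksize,
		bool has_crc)
{
	unsigned int	buflen = demo_rmt_buf_space(blksize, has_crc);

	return (attrlen + buflen - 1) / buflen;
}
```

Under these assumptions a 4096-byte value on a CRC filesystem with
4096-byte blocks needs two blocks, because the header leaves only 4040
usable bytes in each one.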



* [PATCH 03/38] xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
  2024-04-30  3:31   ` [PATCH 01/38] fs: add FS_XFLAG_VERITY for verity files Darrick J. Wong
  2024-04-30  3:31   ` [PATCH 02/38] xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c Darrick J. Wong
@ 2024-04-30  3:31   ` Darrick J. Wong
  2024-04-30  3:31   ` [PATCH 04/38] xfs: create a helper to compute the blockcount of a max sized remote value Darrick J. Wong
                     ` (34 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:31 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Turn this into a properly typechecked function, and actually use the
correct blocksize for extended attributes.  The function cannot be
static inline because xfsprogs userspace uses it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 db/attr.c                |    2 +-
 db/metadump.c            |    8 ++++----
 libxfs/xfs_attr_remote.c |   19 ++++++++++++++++---
 libxfs/xfs_da_format.h   |    4 +---
 4 files changed, 22 insertions(+), 11 deletions(-)


diff --git a/db/attr.c b/db/attr.c
index a83ee14d0791..0b1f498e457c 100644
--- a/db/attr.c
+++ b/db/attr.c
@@ -221,7 +221,7 @@ attr3_remote_data_count(
 
 	if (hdr->rm_magic != cpu_to_be32(XFS_ATTR3_RMT_MAGIC))
 		return 0;
-	buf_space = XFS_ATTR3_RMT_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
+	buf_space = xfs_attr3_rmt_buf_space(mp);
 	if (be32_to_cpu(hdr->rm_bytes) > buf_space)
 		return buf_space;
 	return be32_to_cpu(hdr->rm_bytes);
diff --git a/db/metadump.c b/db/metadump.c
index 90bec1467623..7337c716fc11 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -1748,7 +1748,7 @@ add_remote_vals(
 		attr_data.remote_vals[attr_data.remote_val_count] = blockidx;
 		attr_data.remote_val_count++;
 		blockidx++;
-		length -= XFS_ATTR3_RMT_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
+		length -= xfs_attr3_rmt_buf_space(mp);
 	}
 
 	if (attr_data.remote_val_count >= MAX_REMOTE_VALS) {
@@ -1785,8 +1785,8 @@ process_attr_block(
 			    attr_data.remote_vals[i] == offset)
 				/* Macros to handle both attr and attr3 */
 				memset(block +
-					(bs - XFS_ATTR3_RMT_BUF_SPACE(mp, bs)),
-				      'v', XFS_ATTR3_RMT_BUF_SPACE(mp, bs));
+					(bs - xfs_attr3_rmt_buf_space(mp)),
+				      'v', xfs_attr3_rmt_buf_space(mp));
 		}
 		return;
 	}
@@ -1798,7 +1798,7 @@ process_attr_block(
 	if (nentries == 0 ||
 	    nentries * sizeof(xfs_attr_leaf_entry_t) +
 			xfs_attr3_leaf_hdr_size(leaf) >
-				XFS_ATTR3_RMT_BUF_SPACE(mp, bs)) {
+				xfs_attr3_rmt_buf_space(mp)) {
 		if (metadump.show_warnings)
 			print_warning("invalid attr count in inode %llu",
 					(long long)metadump.cur_ino);
diff --git a/libxfs/xfs_attr_remote.c b/libxfs/xfs_attr_remote.c
index 5f1b9810c5c8..b98805bb5926 100644
--- a/libxfs/xfs_attr_remote.c
+++ b/libxfs/xfs_attr_remote.c
@@ -42,6 +42,19 @@
  * the logging system and therefore never have a log item.
  */
 
+/* How many bytes can be stored in a remote value buffer? */
+inline unsigned int
+xfs_attr3_rmt_buf_space(
+	struct xfs_mount	*mp)
+{
+	unsigned int		blocksize = mp->m_attr_geo->blksize;
+
+	if (xfs_has_crc(mp))
+		return blocksize - sizeof(struct xfs_attr3_rmt_hdr);
+
+	return blocksize;
+}
+
 /*
  * Each contiguous block has a header, so it is not just a simple attribute
  * length to FSB conversion.
@@ -52,7 +65,7 @@ xfs_attr3_rmt_blocks(
 	unsigned int		attrlen)
 {
 	if (xfs_has_crc(mp)) {
-		unsigned int buflen = XFS_ATTR3_RMT_BUF_SPACE(mp, mp->m_sb.sb_blocksize);
+		unsigned int buflen = xfs_attr3_rmt_buf_space(mp);
 		return (attrlen + buflen - 1) / buflen;
 	}
 	return XFS_B_TO_FSB(mp, attrlen);
@@ -292,7 +305,7 @@ xfs_attr_rmtval_copyout(
 
 	while (len > 0 && *valuelen > 0) {
 		unsigned int hdr_size = 0;
-		unsigned int byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, blksize);
+		unsigned int byte_cnt = xfs_attr3_rmt_buf_space(mp);
 
 		byte_cnt = min(*valuelen, byte_cnt);
 
@@ -341,7 +354,7 @@ xfs_attr_rmtval_copyin(
 
 	while (len > 0 && *valuelen > 0) {
 		unsigned int hdr_size;
-		unsigned int byte_cnt = XFS_ATTR3_RMT_BUF_SPACE(mp, blksize);
+		unsigned int byte_cnt = xfs_attr3_rmt_buf_space(mp);
 
 		byte_cnt = min(*valuelen, byte_cnt);
 		hdr_size = xfs_attr3_rmt_hdr_set(mp, dst, ino, *offset,
diff --git a/libxfs/xfs_da_format.h b/libxfs/xfs_da_format.h
index ebde6eb1da65..86de99e2f757 100644
--- a/libxfs/xfs_da_format.h
+++ b/libxfs/xfs_da_format.h
@@ -880,9 +880,7 @@ struct xfs_attr3_rmt_hdr {
 
 #define XFS_ATTR3_RMT_CRC_OFF	offsetof(struct xfs_attr3_rmt_hdr, rm_crc)
 
-#define XFS_ATTR3_RMT_BUF_SPACE(mp, bufsize)	\
-	((bufsize) - (xfs_has_crc((mp)) ? \
-			sizeof(struct xfs_attr3_rmt_hdr) : 0))
+unsigned int xfs_attr3_rmt_buf_space(struct xfs_mount *mp);
 
 /* Number of bytes in a directory block. */
 static inline unsigned int xfs_dir2_dirblock_bytes(struct xfs_sb *sbp)


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 04/38] xfs: create a helper to compute the blockcount of a max sized remote value
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-04-30  3:31   ` [PATCH 03/38] xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function Darrick J. Wong
@ 2024-04-30  3:31   ` Darrick J. Wong
  2024-04-30  3:32   ` [PATCH 05/38] xfs: minor cleanups of xfs_attr3_rmt_blocks Darrick J. Wong
                     ` (33 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:31 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Create a helper function to compute the number of fsblocks needed to
store a maximally-sized extended attribute value.
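
As a rough worked example of what the helper computes, the sketch below
redoes the arithmetic with plain integers.  The demo_* names are
hypothetical, and the 65536-byte maximum is my assumption about the value
of XFS_XATTR_SIZE_MAX; only the 56-byte remote header size comes from
this series.

```c
#include <assert.h>

#define DEMO_XATTR_SIZE_MAX	65536u	/* assumed XFS_XATTR_SIZE_MAX */
#define DEMO_RMT_HDR_SIZE	56u	/* sizeof(struct xfs_attr3_rmt_hdr) */

/* Round-up division. */
static unsigned int
demo_howmany(
	unsigned int		x,
	unsigned int		y)
{
	assert(y > 0);
	return (x + y - 1) / y;
}

/* Blocks needed for a remote value of attrlen bytes. */
unsigned int
demo_rmt_blocks(
	unsigned int		blocksize,
	int			has_crc,
	unsigned int		attrlen)
{
	/* On CRC filesystems every block loses header space. */
	if (has_crc)
		return demo_howmany(attrlen, blocksize - DEMO_RMT_HDR_SIZE);

	return demo_howmany(attrlen, blocksize);
}

/* Blocks needed for the largest possible attr value. */
unsigned int
demo_max_rmt_blocks(
	unsigned int		blocksize,
	int			has_crc)
{
	return demo_rmt_blocks(blocksize, has_crc, DEMO_XATTR_SIZE_MAX);
}
```

Under those assumptions, a 4k-block CRC filesystem needs 17 blocks for a
maximally sized value (4040 payload bytes per block), and 16 blocks
without the per-block header.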

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 libxfs/xfs_attr.c        |    2 +-
 libxfs/xfs_attr_remote.h |    6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)


diff --git a/libxfs/xfs_attr.c b/libxfs/xfs_attr.c
index 04cb39f31bdc..3058e609c514 100644
--- a/libxfs/xfs_attr.c
+++ b/libxfs/xfs_attr.c
@@ -1038,7 +1038,7 @@ xfs_attr_set(
 		break;
 	case XFS_ATTRUPDATE_REMOVE:
 		XFS_STATS_INC(mp, xs_attr_remove);
-		rmt_blks = xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX);
+		rmt_blks = xfs_attr3_max_rmt_blocks(mp);
 		break;
 	}
 
diff --git a/libxfs/xfs_attr_remote.h b/libxfs/xfs_attr_remote.h
index c64b04f91caf..e3c6c7d774bf 100644
--- a/libxfs/xfs_attr_remote.h
+++ b/libxfs/xfs_attr_remote.h
@@ -8,6 +8,12 @@
 
 unsigned int xfs_attr3_rmt_blocks(struct xfs_mount *mp, unsigned int attrlen);
 
+/* Number of rmt blocks needed to store the maximally sized attr value */
+static inline unsigned int xfs_attr3_max_rmt_blocks(struct xfs_mount *mp)
+{
+	return xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX);
+}
+
 int xfs_attr_rmtval_get(struct xfs_da_args *args);
 int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
 		xfs_buf_flags_t incore_flags);


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 05/38] xfs: minor cleanups of xfs_attr3_rmt_blocks
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-04-30  3:31   ` [PATCH 04/38] xfs: create a helper to compute the blockcount of a max sized remote value Darrick J. Wong
@ 2024-04-30  3:32   ` Darrick J. Wong
  2024-04-30  3:32   ` [PATCH 06/38] xfs: use an empty transaction to protect xfs_attr_get from deadlocks Darrick J. Wong
                     ` (32 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:32 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Clean up the type signature of this function since we don't have
negative attr lengths or block counts.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 libxfs/xfs_attr_remote.c |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)


diff --git a/libxfs/xfs_attr_remote.c b/libxfs/xfs_attr_remote.c
index b98805bb5926..f9c0da51a8fa 100644
--- a/libxfs/xfs_attr_remote.c
+++ b/libxfs/xfs_attr_remote.c
@@ -55,19 +55,19 @@ xfs_attr3_rmt_buf_space(
 	return blocksize;
 }
 
-/*
- * Each contiguous block has a header, so it is not just a simple attribute
- * length to FSB conversion.
- */
+/* Compute number of fsblocks needed to store a remote attr value */
 unsigned int
 xfs_attr3_rmt_blocks(
 	struct xfs_mount	*mp,
 	unsigned int		attrlen)
 {
-	if (xfs_has_crc(mp)) {
-		unsigned int buflen = xfs_attr3_rmt_buf_space(mp);
-		return (attrlen + buflen - 1) / buflen;
-	}
+	/*
+	 * Each contiguous block has a header, so it is not just a simple
+	 * attribute length to FSB conversion.
+	 */
+	if (xfs_has_crc(mp))
+		return howmany(attrlen, xfs_attr3_rmt_buf_space(mp));
+
 	return XFS_B_TO_FSB(mp, attrlen);
 }
 


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 06/38] xfs: use an empty transaction to protect xfs_attr_get from deadlocks
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-04-30  3:32   ` [PATCH 05/38] xfs: minor cleanups of xfs_attr3_rmt_blocks Darrick J. Wong
@ 2024-04-30  3:32   ` Darrick J. Wong
  2024-04-30  3:32   ` [PATCH 07/38] xfs: add attribute type for fs-verity Darrick J. Wong
                     ` (31 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:32 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Wrap the xfs_attr_get_ilocked call in xfs_attr_get with an empty
transaction so that we cannot livelock the kernel if someone injects a
loop into the attr structure or the attr fork bmbt.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_attr.c |   21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)


diff --git a/libxfs/xfs_attr.c b/libxfs/xfs_attr.c
index 3058e609c514..0a9fb396885e 100644
--- a/libxfs/xfs_attr.c
+++ b/libxfs/xfs_attr.c
@@ -273,6 +273,8 @@ xfs_attr_get(
 
 	XFS_STATS_INC(args->dp->i_mount, xs_attr_get);
 
+	ASSERT(!args->trans);
+
 	if (xfs_is_shutdown(args->dp->i_mount))
 		return -EIO;
 
@@ -285,8 +287,27 @@ xfs_attr_get(
 	/* Entirely possible to look up a name which doesn't exist */
 	args->op_flags = XFS_DA_OP_OKNOENT;
 
+	error = xfs_trans_alloc_empty(args->dp->i_mount, &args->trans);
+	if (error)
+		return error;
+
 	lock_mode = xfs_ilock_attr_map_shared(args->dp);
+
+	/*
+	 * Make sure the attr fork iext tree is loaded.  Use the empty
+	 * transaction to load the bmbt so that we avoid livelocking on loops.
+	 */
+	if (xfs_inode_hasattr(args->dp)) {
+		error = xfs_iread_extents(args->trans, args->dp, XFS_ATTR_FORK);
+		if (error)
+			goto out_cancel;
+	}
+
 	error = xfs_attr_get_ilocked(args);
+
+out_cancel:
+	xfs_trans_cancel(args->trans);
+	args->trans = NULL;
 	xfs_iunlock(args->dp, lock_mode);
 
 	return error;


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 07/38] xfs: add attribute type for fs-verity
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-04-30  3:32   ` [PATCH 06/38] xfs: use an empty transaction to protect xfs_attr_get from deadlocks Darrick J. Wong
@ 2024-04-30  3:32   ` Darrick J. Wong
  2024-04-30  3:32   ` [PATCH 08/38] xfs: do not use xfs_attr3_rmt_hdr for remote verity value blocks Darrick J. Wong
                     ` (30 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:32 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Andrey Albershteyn <aalbersh@redhat.com>

The Merkle tree blocks and descriptor are stored in the extended
attributes of the inode. Add a new attribute type for fs-verity
metadata. Add it to XFS_ATTR_PRIVATE_NSP_MASK so that parent pointer
and fs-verity attributes are skipped, as those are only for internal
use. While we're at it, add a few comments in relevant places noting
that internally visible attributes are not supposed to be handled via
the interface defined in xfs_xattr.c.
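
A minimal sketch of how the private namespace mask can be used to filter
attributes follows.  The bit positions are copied from the hunk below;
the DEMO_* names and the helper are hypothetical and only illustrate the
idea, they are not part of libxfs.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions copied from the xfs_da_format.h hunk in this patch. */
#define DEMO_ATTR_ROOT		(1u << 1)
#define DEMO_ATTR_SECURE	(1u << 2)
#define DEMO_ATTR_PARENT	(1u << 3)
#define DEMO_ATTR_VERITY	(1u << 4)

/* Namespaces that are never exposed to userspace. */
#define DEMO_ATTR_PRIVATE_NSP_MASK	(DEMO_ATTR_PARENT | \
					 DEMO_ATTR_VERITY)

/* Hypothetical helper: may this attr be shown through the xattr API? */
bool
demo_attr_visible_to_userspace(
	unsigned int		attr_flags)
{
	/* The ondisk attr flags fit in a single byte. */
	assert((attr_flags & ~0xffu) == 0);

	return (attr_flags & DEMO_ATTR_PRIVATE_NSP_MASK) == 0;
}
```

A listing routine could call such a helper on each entry and skip the
ones that return false, so that verity and parent pointer attributes
never leak into listxattr-style output.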

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_da_format.h  |   11 ++++++++---
 libxfs/xfs_log_format.h |    1 +
 2 files changed, 9 insertions(+), 3 deletions(-)


diff --git a/libxfs/xfs_da_format.h b/libxfs/xfs_da_format.h
index 86de99e2f757..27b9ad9f8b2e 100644
--- a/libxfs/xfs_da_format.h
+++ b/libxfs/xfs_da_format.h
@@ -715,19 +715,23 @@ struct xfs_attr3_leafblock {
 #define	XFS_ATTR_ROOT_BIT	1	/* limit access to trusted attrs */
 #define	XFS_ATTR_SECURE_BIT	2	/* limit access to secure attrs */
 #define	XFS_ATTR_PARENT_BIT	3	/* parent pointer attrs */
+#define	XFS_ATTR_VERITY_BIT	4	/* verity merkle tree and descriptor */
 #define	XFS_ATTR_INCOMPLETE_BIT	7	/* attr in middle of create/delete */
 #define XFS_ATTR_LOCAL		(1u << XFS_ATTR_LOCAL_BIT)
 #define XFS_ATTR_ROOT		(1u << XFS_ATTR_ROOT_BIT)
 #define XFS_ATTR_SECURE		(1u << XFS_ATTR_SECURE_BIT)
 #define XFS_ATTR_PARENT		(1u << XFS_ATTR_PARENT_BIT)
+#define XFS_ATTR_VERITY		(1u << XFS_ATTR_VERITY_BIT)
 #define XFS_ATTR_INCOMPLETE	(1u << XFS_ATTR_INCOMPLETE_BIT)
 
 #define XFS_ATTR_NSP_ONDISK_MASK	(XFS_ATTR_ROOT | \
 					 XFS_ATTR_SECURE | \
-					 XFS_ATTR_PARENT)
+					 XFS_ATTR_PARENT | \
+					 XFS_ATTR_VERITY)
 
 /* Private attr namespaces not exposed to userspace */
-#define XFS_ATTR_PRIVATE_NSP_MASK	(XFS_ATTR_PARENT)
+#define XFS_ATTR_PRIVATE_NSP_MASK	(XFS_ATTR_PARENT | \
+					 XFS_ATTR_VERITY)
 
 #define XFS_ATTR_ONDISK_MASK	(XFS_ATTR_NSP_ONDISK_MASK | \
 				 XFS_ATTR_LOCAL | \
@@ -737,7 +741,8 @@ struct xfs_attr3_leafblock {
 	{ XFS_ATTR_LOCAL,	"local" }, \
 	{ XFS_ATTR_ROOT,	"root" }, \
 	{ XFS_ATTR_SECURE,	"secure" }, \
-	{ XFS_ATTR_PARENT,	"parent" }
+	{ XFS_ATTR_PARENT,	"parent" }, \
+	{ XFS_ATTR_VERITY,	"verity" }
 
 /*
  * Alignment for namelist and valuelist entries (since they are mixed
diff --git a/libxfs/xfs_log_format.h b/libxfs/xfs_log_format.h
index 0f194ae71b42..4d11d6b7b1ad 100644
--- a/libxfs/xfs_log_format.h
+++ b/libxfs/xfs_log_format.h
@@ -1052,6 +1052,7 @@ struct xfs_icreate_log {
 #define XFS_ATTRI_FILTER_MASK		(XFS_ATTR_ROOT | \
 					 XFS_ATTR_SECURE | \
 					 XFS_ATTR_PARENT | \
+					 XFS_ATTR_VERITY | \
 					 XFS_ATTR_INCOMPLETE)
 
 /*


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 08/38] xfs: do not use xfs_attr3_rmt_hdr for remote verity value blocks
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (6 preceding siblings ...)
  2024-04-30  3:32   ` [PATCH 07/38] xfs: add attribute type for fs-verity Darrick J. Wong
@ 2024-04-30  3:32   ` Darrick J. Wong
  2024-04-30  3:33   ` [PATCH 09/38] xfs: add fs-verity ro-compat flag Darrick J. Wong
                     ` (29 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:32 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

I enabled fsverity for a ~250MB file and noticed the following xattr
keys that got created for the merkle tree.  These two merkle tree blocks
are written out in ascending order:

nvlist[52].merkle_off = 0x111000
nvlist[53].valueblk = 0x222
nvlist[53].valuelen = 4096

nvlist[53].merkle_off = 0x112000
nvlist[54].valueblk = 0x224
nvlist[54].valuelen = 4096

Notice that while the valuelen is 4k, the block offset increases by two.
Curious, I then loaded up ablock 0x223:

hdr.magic = 0x5841524d
hdr.offset = 4040
hdr.bytes = 56
hdr.crc = 0xad1b8bd8 (correct)
hdr.uuid = 07d3f25c-e550-4118-8ff5-a45c017ba5ef
hdr.owner = 133
hdr.bno = 442144
hdr.lsn = 0xffffffffffffffff
data = <56 bytes of charns data>

Ugh!  Each 4k merkle tree block takes up two fsblocks due to the remote
value header that XFS puts at the start of each remote value block.
That header is 56 bytes long, which is exactly the length of the
spillover here.  This isn't good.
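
The spillover arithmetic can be sketched as follows.  This is an
illustration with hypothetical demo_* helpers, not code from libxfs; it
only restates the numbers from the dump above (4096-byte value, 56-byte
header).

```c
#include <assert.h>

#define DEMO_RMT_HDR_SIZE	56u	/* sizeof(struct xfs_attr3_rmt_hdr) */

/* Blocks needed when every block gives up hdr_size bytes to a header. */
unsigned int
demo_blocks_needed(
	unsigned int		blocksize,
	unsigned int		hdr_size,
	unsigned int		valuelen)
{
	unsigned int		payload;

	assert(hdr_size < blocksize);
	payload = blocksize - hdr_size;

	return (valuelen + payload - 1) / payload;
}

/* Payload bytes that land in the final block of the value. */
unsigned int
demo_last_block_bytes(
	unsigned int		blocksize,
	unsigned int		hdr_size,
	unsigned int		valuelen)
{
	unsigned int		payload;
	unsigned int		rem;

	assert(hdr_size < blocksize);
	payload = blocksize - hdr_size;
	rem = valuelen % payload;

	if (rem)
		return rem;
	return valuelen ? payload : 0;
}
```

With a 56-byte header, a 4096-byte value needs two blocks and the second
one carries only 56 bytes, which matches hdr.offset = 4040 and
hdr.bytes = 56 in ablock 0x223.  With no header, the same value fits in
exactly one block.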

The first thing that I tried was enabling fsverity on a bunch of files,
extracting the merkle tree blocks one by one, and testing their
compressibility with gzip, zstd, and xz.  Merkle tree blocks are nearly
indistinguishable from random data, with the result that 99% of the
blocks I sampled got larger under compression.  So that's out.

Next I decided to try eliminating the xfs_attr3_rmt_hdr header, which
would make verity remote values align perfectly with filesystem blocks.
Because remote value blocks are written out with xfs_bwrite, the lsn
field isn't useful.  The merkle tree is itself a bunch of hashes of data
blocks or other merkle tree blocks, which means that a bitflip will
result in a verity failure somewhere in the file.  Hence we don't need
to store an explicit crc, and we could just XOR the ondisk merkle tree
contents with selected attributes.

In the end I decided to create a smaller header structure containing
only a magic, the fsuuid, the inode owner, and the ondisk block number.
These values get XORd into the beginning of the merkle tree block to
detect lost writes when we're writing remote XFS_ATTR_VERITY values to
disk, and XORd out when reading them back in.
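
To show why XORing works in both directions, here is a simplified,
self-contained sketch.  It is not the transform from this patch: the
demo_* names are made up, and unlike xfs_attr_rmtverity_transform it
masks short buffers byte by byte instead of skipping fields that do not
fit entirely.  The field order and the magic value are taken from the
xfs_attr3_rmtverity_hdr definition below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MASK_LEN	36	/* 8 owner + 8 blkno + 4 magic + 16 uuid */
#define DEMO_MAGIC	0x5955434BU	/* XFS_ATTR3_RMTVERITY_MAGIC */

/* Store the low nbytes bytes of v in big-endian byte order. */
static void
demo_put_be(
	uint8_t			*p,
	uint64_t		v,
	unsigned int		nbytes)
{
	unsigned int		i;

	for (i = 0; i < nbytes; i++)
		p[i] = (uint8_t)(v >> (8 * (nbytes - 1 - i)));
}

/*
 * XOR the identifying fields into the first bytes of a block.  XOR is
 * its own inverse, so the same call both masks and unmasks the data.
 */
void
demo_verity_transform(
	uint8_t			*buf,
	size_t			len,
	uint64_t		owner,
	uint64_t		blkno,
	const uint8_t		uuid[16])
{
	uint8_t			mask[DEMO_MASK_LEN];
	size_t			i;

	assert(buf != NULL);

	demo_put_be(mask, owner, 8);
	demo_put_be(mask + 8, blkno, 8);
	demo_put_be(mask + 16, DEMO_MAGIC, 4);
	memcpy(mask + 20, uuid, 16);

	for (i = 0; i < len && i < DEMO_MASK_LEN; i++)
		buf[i] ^= mask[i];
}
```

Applying the transform twice with the same owner, block number, and uuid
returns the original bytes.  If a block is read back with a different
owner or block number, the unmasked contents differ from what was
written, and the expectation is that the merkle tree hash check then
fails.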

With this format change applied, the fsverity overhead halves.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/attr.c                |    2 -
 db/metadump.c            |    8 +--
 include/platform_defs.h  |    9 +++
 libxfs/libxfs_api_defs.h |    1 
 libxfs/xfs_attr.c        |    6 +-
 libxfs/xfs_attr_leaf.c   |    5 +-
 libxfs/xfs_attr_remote.c |  125 ++++++++++++++++++++++++++++++++++++++++------
 libxfs/xfs_attr_remote.h |    8 ++-
 libxfs/xfs_da_format.h   |   22 ++++++++
 libxfs/xfs_ondisk.h      |    2 +
 libxfs/xfs_shared.h      |    1 
 11 files changed, 162 insertions(+), 27 deletions(-)


diff --git a/db/attr.c b/db/attr.c
index 0b1f498e457c..8e2bce7b7e02 100644
--- a/db/attr.c
+++ b/db/attr.c
@@ -221,7 +221,7 @@ attr3_remote_data_count(
 
 	if (hdr->rm_magic != cpu_to_be32(XFS_ATTR3_RMT_MAGIC))
 		return 0;
-	buf_space = xfs_attr3_rmt_buf_space(mp);
+	buf_space = xfs_attr3_rmt_buf_space(mp, 0);
 	if (be32_to_cpu(hdr->rm_bytes) > buf_space)
 		return buf_space;
 	return be32_to_cpu(hdr->rm_bytes);
diff --git a/db/metadump.c b/db/metadump.c
index 7337c716fc11..23defaee929f 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -1748,7 +1748,7 @@ add_remote_vals(
 		attr_data.remote_vals[attr_data.remote_val_count] = blockidx;
 		attr_data.remote_val_count++;
 		blockidx++;
-		length -= xfs_attr3_rmt_buf_space(mp);
+		length -= xfs_attr3_rmt_buf_space(mp, 0);
 	}
 
 	if (attr_data.remote_val_count >= MAX_REMOTE_VALS) {
@@ -1785,8 +1785,8 @@ process_attr_block(
 			    attr_data.remote_vals[i] == offset)
 				/* Macros to handle both attr and attr3 */
 				memset(block +
-					(bs - xfs_attr3_rmt_buf_space(mp)),
-				      'v', xfs_attr3_rmt_buf_space(mp));
+					(bs - xfs_attr3_rmt_buf_space(mp, 0)),
+				      'v', xfs_attr3_rmt_buf_space(mp, 0));
 		}
 		return;
 	}
@@ -1798,7 +1798,7 @@ process_attr_block(
 	if (nentries == 0 ||
 	    nentries * sizeof(xfs_attr_leaf_entry_t) +
 			xfs_attr3_leaf_hdr_size(leaf) >
-				xfs_attr3_rmt_buf_space(mp)) {
+				xfs_attr3_rmt_buf_space(mp, 0)) {
 		if (metadump.show_warnings)
 			print_warning("invalid attr count in inode %llu",
 					(long long)metadump.cur_ino);
diff --git a/include/platform_defs.h b/include/platform_defs.h
index c01d4c426746..9c28e2744a8d 100644
--- a/include/platform_defs.h
+++ b/include/platform_defs.h
@@ -121,6 +121,15 @@ static inline size_t __ab_c_size(size_t a, size_t b, size_t c)
 #define struct_size_t(type, member, count)					\
 	struct_size((type *)NULL, member, count)
 
+/**
+ * offsetofend() - Report the offset of the end of a struct field
+ *
+ * @TYPE: The type of the structure
+ * @MEMBER: The member within the structure to get the end offset of
+ */
+#define offsetofend(TYPE, MEMBER) \
+	(offsetof(TYPE, MEMBER)	+ sizeof_field(TYPE, MEMBER))
+
 /*
  * Add the pseudo keyword 'fallthrough' so case statement blocks
  * must end with any of these keywords:
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 19b7ecf5798d..1b6efac9290d 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -43,6 +43,7 @@
 
 #define xfs_attr3_leaf_hdr_from_disk	libxfs_attr3_leaf_hdr_from_disk
 #define xfs_attr3_leaf_read		libxfs_attr3_leaf_read
+#define xfs_attr3_remote_buf_ops	libxfs_attr3_remote_buf_ops
 #define xfs_attr_check_namespace	libxfs_attr_check_namespace
 #define xfs_attr_get			libxfs_attr_get
 #define xfs_attr_hashname		libxfs_attr_hashname
diff --git a/libxfs/xfs_attr.c b/libxfs/xfs_attr.c
index 0a9fb396885e..c2c411268904 100644
--- a/libxfs/xfs_attr.c
+++ b/libxfs/xfs_attr.c
@@ -341,7 +341,8 @@ xfs_attr_calc_size(
 		 * Out of line attribute, cannot double split, but
 		 * make room for the attribute value itself.
 		 */
-		uint	dblocks = xfs_attr3_rmt_blocks(mp, args->valuelen);
+		uint	dblocks = xfs_attr3_rmt_blocks(mp, args->attr_filter,
+						       args->valuelen);
 		nblks += dblocks;
 		nblks += XFS_NEXTENTADD_SPACE_RES(mp, dblocks, XFS_ATTR_FORK);
 	}
@@ -1055,7 +1056,8 @@ xfs_attr_set(
 		}
 
 		if (!local)
-			rmt_blks = xfs_attr3_rmt_blocks(mp, args->valuelen);
+			rmt_blks = xfs_attr3_rmt_blocks(mp, args->attr_filter,
+					args->valuelen);
 		break;
 	case XFS_ATTRUPDATE_REMOVE:
 		XFS_STATS_INC(mp, xs_attr_remove);
diff --git a/libxfs/xfs_attr_leaf.c b/libxfs/xfs_attr_leaf.c
index 97b71b6500bd..56db9f992fc7 100644
--- a/libxfs/xfs_attr_leaf.c
+++ b/libxfs/xfs_attr_leaf.c
@@ -1563,7 +1563,8 @@ xfs_attr3_leaf_add_work(
 		name_rmt->valuelen = 0;
 		name_rmt->valueblk = 0;
 		args->rmtblkno = 1;
-		args->rmtblkcnt = xfs_attr3_rmt_blocks(mp, args->valuelen);
+		args->rmtblkcnt = xfs_attr3_rmt_blocks(mp, args->attr_filter,
+				args->valuelen);
 		args->rmtvaluelen = args->valuelen;
 	}
 	xfs_trans_log_buf(args->trans, bp,
@@ -2498,6 +2499,7 @@ xfs_attr3_leaf_lookup_int(
 			args->rmtblkno = be32_to_cpu(name_rmt->valueblk);
 			args->rmtblkcnt = xfs_attr3_rmt_blocks(
 							args->dp->i_mount,
+							args->attr_filter,
 							args->rmtvaluelen);
 			return -EEXIST;
 		}
@@ -2546,6 +2548,7 @@ xfs_attr3_leaf_getvalue(
 	args->rmtvaluelen = be32_to_cpu(name_rmt->valuelen);
 	args->rmtblkno = be32_to_cpu(name_rmt->valueblk);
 	args->rmtblkcnt = xfs_attr3_rmt_blocks(args->dp->i_mount,
+					       args->attr_filter,
 					       args->rmtvaluelen);
 	return xfs_attr_copy_value(args, NULL, args->rmtvaluelen);
 }
diff --git a/libxfs/xfs_attr_remote.c b/libxfs/xfs_attr_remote.c
index f9c0da51a8fa..d9c3346f1f5c 100644
--- a/libxfs/xfs_attr_remote.c
+++ b/libxfs/xfs_attr_remote.c
@@ -42,14 +42,23 @@
  * the logging system and therefore never have a log item.
  */
 
+static inline bool
+xfs_attr3_rmt_has_header(
+	struct xfs_mount	*mp,
+	unsigned int		attrns)
+{
+	return xfs_has_crc(mp) && !(attrns & XFS_ATTR_VERITY);
+}
+
 /* How many bytes can be stored in a remote value buffer? */
 inline unsigned int
 xfs_attr3_rmt_buf_space(
-	struct xfs_mount	*mp)
+	struct xfs_mount	*mp,
+	unsigned int		attrns)
 {
 	unsigned int		blocksize = mp->m_attr_geo->blksize;
 
-	if (xfs_has_crc(mp))
+	if (xfs_attr3_rmt_has_header(mp, attrns))
 		return blocksize - sizeof(struct xfs_attr3_rmt_hdr);
 
 	return blocksize;
@@ -59,14 +68,15 @@ xfs_attr3_rmt_buf_space(
 unsigned int
 xfs_attr3_rmt_blocks(
 	struct xfs_mount	*mp,
+	unsigned int		attrns,
 	unsigned int		attrlen)
 {
 	/*
 	 * Each contiguous block has a header, so it is not just a simple
 	 * attribute length to FSB conversion.
 	 */
-	if (xfs_has_crc(mp))
-		return howmany(attrlen, xfs_attr3_rmt_buf_space(mp));
+	if (xfs_attr3_rmt_has_header(mp, attrns))
+		return howmany(attrlen, xfs_attr3_rmt_buf_space(mp, attrns));
 
 	return XFS_B_TO_FSB(mp, attrlen);
 }
@@ -247,6 +257,42 @@ const struct xfs_buf_ops xfs_attr3_rmt_buf_ops = {
 	.verify_struct = xfs_attr3_rmt_verify_struct,
 };
 
+static void
+xfs_attr3_rmtverity_read_verify(
+	struct xfs_buf	*bp)
+{
+}
+
+static xfs_failaddr_t
+xfs_attr3_rmtverity_verify_struct(
+	struct xfs_buf	*bp)
+{
+	return NULL;
+}
+
+static void
+xfs_attr3_rmtverity_write_verify(
+	struct xfs_buf	*bp)
+{
+}
+
+const struct xfs_buf_ops xfs_attr3_rmtverity_buf_ops = {
+	.name = "xfs_attr3_remote_verity",
+	.magic = { 0, 0 },
+	.verify_read = xfs_attr3_rmtverity_read_verify,
+	.verify_write = xfs_attr3_rmtverity_write_verify,
+	.verify_struct = xfs_attr3_rmtverity_verify_struct,
+};
+
+inline const struct xfs_buf_ops *
+xfs_attr3_remote_buf_ops(
+	unsigned int		attrns)
+{
+	if (attrns & XFS_ATTR_VERITY)
+		return &xfs_attr3_rmtverity_buf_ops;
+	return &xfs_attr3_rmt_buf_ops;
+}
+
 STATIC int
 xfs_attr3_rmt_hdr_set(
 	struct xfs_mount	*mp,
@@ -283,6 +329,40 @@ xfs_attr3_rmt_hdr_set(
 	return sizeof(struct xfs_attr3_rmt_hdr);
 }
 
+static void
+xfs_attr_rmtverity_transform(
+	struct xfs_buf		*bp,
+	xfs_ino_t		ino,
+	void			*buf,
+	unsigned int		byte_cnt)
+{
+	struct xfs_mount	*mp = bp->b_mount;
+	struct xfs_attr3_rmtverity_hdr	*hdr = buf;
+	char			*dst;
+	const char		*src;
+	unsigned int		i;
+
+	if (byte_cnt >= offsetofend(struct xfs_attr3_rmtverity_hdr, rmv_owner))
+		hdr->rmv_owner ^= cpu_to_be64(ino);
+
+	if (byte_cnt >= offsetofend(struct xfs_attr3_rmtverity_hdr, rmv_blkno))
+		hdr->rmv_blkno ^= cpu_to_be64(xfs_buf_daddr(bp));
+
+	if (byte_cnt >= offsetofend(struct xfs_attr3_rmtverity_hdr, rmv_magic))
+		hdr->rmv_magic ^= cpu_to_be32(XFS_ATTR3_RMTVERITY_MAGIC);
+
+	if (byte_cnt <= offsetof(struct xfs_attr3_rmtverity_hdr, rmv_uuid))
+		return;
+
+	byte_cnt -= offsetof(struct xfs_attr3_rmtverity_hdr, rmv_uuid);
+	byte_cnt = min(byte_cnt, sizeof(uuid_t));
+
+	dst = (void *)&hdr->rmv_uuid;
+	src = (void *)&mp->m_sb.sb_meta_uuid;
+	for (i = 0; i < byte_cnt; i++)
+		dst[i] ^= src[i];
+}
+
 /*
  * Helper functions to copy attribute data in and out of the one disk extents
  */
@@ -292,6 +372,7 @@ xfs_attr_rmtval_copyout(
 	struct xfs_buf		*bp,
 	struct xfs_inode	*dp,
 	xfs_ino_t		owner,
+	unsigned int		attrns,
 	unsigned int		*offset,
 	unsigned int		*valuelen,
 	uint8_t			**dst)
@@ -305,11 +386,11 @@ xfs_attr_rmtval_copyout(
 
 	while (len > 0 && *valuelen > 0) {
 		unsigned int hdr_size = 0;
-		unsigned int byte_cnt = xfs_attr3_rmt_buf_space(mp);
+		unsigned int byte_cnt = xfs_attr3_rmt_buf_space(mp, attrns);
 
 		byte_cnt = min(*valuelen, byte_cnt);
 
-		if (xfs_has_crc(mp)) {
+		if (xfs_attr3_rmt_has_header(mp, attrns)) {
 			if (xfs_attr3_rmt_hdr_ok(src, owner, *offset,
 						  byte_cnt, bno)) {
 				xfs_alert(mp,
@@ -323,6 +404,10 @@ xfs_attr_rmtval_copyout(
 
 		memcpy(*dst, src + hdr_size, byte_cnt);
 
+		if (attrns & XFS_ATTR_VERITY)
+			xfs_attr_rmtverity_transform(bp, dp->i_ino, *dst,
+					byte_cnt);
+
 		/* roll buffer forwards */
 		len -= blksize;
 		src += blksize;
@@ -341,6 +426,7 @@ xfs_attr_rmtval_copyin(
 	struct xfs_mount *mp,
 	struct xfs_buf	*bp,
 	xfs_ino_t	ino,
+	unsigned int	attrns,
 	unsigned int	*offset,
 	unsigned int	*valuelen,
 	uint8_t		**src)
@@ -353,15 +439,20 @@ xfs_attr_rmtval_copyin(
 	ASSERT(len >= blksize);
 
 	while (len > 0 && *valuelen > 0) {
-		unsigned int hdr_size;
-		unsigned int byte_cnt = xfs_attr3_rmt_buf_space(mp);
+		unsigned int hdr_size = 0;
+		unsigned int byte_cnt = xfs_attr3_rmt_buf_space(mp, attrns);
 
 		byte_cnt = min(*valuelen, byte_cnt);
-		hdr_size = xfs_attr3_rmt_hdr_set(mp, dst, ino, *offset,
-						 byte_cnt, bno);
+		if (xfs_attr3_rmt_has_header(mp, attrns))
+			hdr_size = xfs_attr3_rmt_hdr_set(mp, dst, ino, *offset,
+					byte_cnt, bno);
 
 		memcpy(dst + hdr_size, *src, byte_cnt);
 
+		if (attrns & XFS_ATTR_VERITY)
+			xfs_attr_rmtverity_transform(bp, ino, dst + hdr_size,
+					byte_cnt);
+
 		/*
 		 * If this is the last block, zero the remainder of it.
 		 * Check that we are actually the last block, too.
@@ -406,6 +497,7 @@ xfs_attr_rmtval_get(
 	unsigned int		blkcnt = args->rmtblkcnt;
 	int			i;
 	unsigned int		offset = 0;
+	const struct xfs_buf_ops *ops = xfs_attr3_remote_buf_ops(args->attr_filter);
 
 	trace_xfs_attr_rmtval_get(args);
 
@@ -431,14 +523,15 @@ xfs_attr_rmtval_get(
 			dblkno = XFS_FSB_TO_DADDR(mp, map[i].br_startblock);
 			dblkcnt = XFS_FSB_TO_BB(mp, map[i].br_blockcount);
 			error = xfs_buf_read(mp->m_ddev_targp, dblkno, dblkcnt,
-					0, &bp, &xfs_attr3_rmt_buf_ops);
+					0, &bp, ops);
 			if (xfs_metadata_is_sick(error))
 				xfs_dirattr_mark_sick(args->dp, XFS_ATTR_FORK);
 			if (error)
 				return error;
 
 			error = xfs_attr_rmtval_copyout(mp, bp, args->dp,
-					args->owner, &offset, &valuelen, &dst);
+					args->owner, args->attr_filter,
+					&offset, &valuelen, &dst);
 			xfs_buf_relse(bp);
 			if (error)
 				return error;
@@ -471,7 +564,7 @@ xfs_attr_rmt_find_hole(
 	 * straight byte to FSB conversion and have to take the header space
 	 * into account.
 	 */
-	blkcnt = xfs_attr3_rmt_blocks(mp, args->rmtvaluelen);
+	blkcnt = xfs_attr3_rmt_blocks(mp, args->attr_filter, args->rmtvaluelen);
 	error = xfs_bmap_first_unused(args->trans, args->dp, blkcnt, &lfileoff,
 						   XFS_ATTR_FORK);
 	if (error)
@@ -530,10 +623,10 @@ xfs_attr_rmtval_set_value(
 		error = xfs_buf_get(mp->m_ddev_targp, dblkno, dblkcnt, &bp);
 		if (error)
 			return error;
-		bp->b_ops = &xfs_attr3_rmt_buf_ops;
+		bp->b_ops = xfs_attr3_remote_buf_ops(args->attr_filter);
 
-		xfs_attr_rmtval_copyin(mp, bp, args->owner, &offset, &valuelen,
-				&src);
+		xfs_attr_rmtval_copyin(mp, bp, args->owner, args->attr_filter,
+				&offset, &valuelen, &src);
 
 		error = xfs_bwrite(bp);	/* GROT: NOTE: synchronous write */
 		xfs_buf_relse(bp);
diff --git a/libxfs/xfs_attr_remote.h b/libxfs/xfs_attr_remote.h
index e3c6c7d774bf..344fea1b9b50 100644
--- a/libxfs/xfs_attr_remote.h
+++ b/libxfs/xfs_attr_remote.h
@@ -6,12 +6,13 @@
 #ifndef __XFS_ATTR_REMOTE_H__
 #define	__XFS_ATTR_REMOTE_H__
 
-unsigned int xfs_attr3_rmt_blocks(struct xfs_mount *mp, unsigned int attrlen);
+unsigned int xfs_attr3_rmt_blocks(struct xfs_mount *mp, unsigned int attrns,
+		unsigned int attrlen);
 
 /* Number of rmt blocks needed to store the maximally sized attr value */
 static inline unsigned int xfs_attr3_max_rmt_blocks(struct xfs_mount *mp)
 {
-	return xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX);
+	return xfs_attr3_rmt_blocks(mp, 0, XFS_XATTR_SIZE_MAX);
 }
 
 int xfs_attr_rmtval_get(struct xfs_da_args *args);
@@ -23,4 +24,7 @@ int xfs_attr_rmt_find_hole(struct xfs_da_args *args);
 int xfs_attr_rmtval_set_value(struct xfs_da_args *args);
 int xfs_attr_rmtval_set_blk(struct xfs_attr_intent *attr);
 int xfs_attr_rmtval_find_space(struct xfs_attr_intent *attr);
+
+const struct xfs_buf_ops *xfs_attr3_remote_buf_ops(unsigned int attrns);
+
 #endif /* __XFS_ATTR_REMOTE_H__ */
diff --git a/libxfs/xfs_da_format.h b/libxfs/xfs_da_format.h
index 27b9ad9f8b2e..c84b94da3f32 100644
--- a/libxfs/xfs_da_format.h
+++ b/libxfs/xfs_da_format.h
@@ -885,7 +885,27 @@ struct xfs_attr3_rmt_hdr {
 
 #define XFS_ATTR3_RMT_CRC_OFF	offsetof(struct xfs_attr3_rmt_hdr, rm_crc)
 
-unsigned int xfs_attr3_rmt_buf_space(struct xfs_mount *mp);
+unsigned int xfs_attr3_rmt_buf_space(struct xfs_mount *mp, unsigned int attrns);
+
+/*
+ * XFS_ATTR_VERITY remote attribute block format definition
+ *
+ * fsverity stores blocks of a merkle tree in the extended attributes.  The
+ * size of these blocks is a power of two, so we'd like to reduce overhead by
+ * not storing a remote header at the start of each ondisk block.  Because
+ * merkle tree blocks are themselves hashes of other merkle tree or data
+ * blocks, we can detect bitflips without needing our own checksum.  Settle for
+ * XORing the owner, blkno, magic, and metauuid into the start of each ondisk
+ * merkle tree block.
+ */
+#define XFS_ATTR3_RMTVERITY_MAGIC	0x5955434B	/* YUCK */
+
+struct xfs_attr3_rmtverity_hdr {
+	__be64	rmv_owner;
+	__be64	rmv_blkno;
+	__be32	rmv_magic;
+	uuid_t	rmv_uuid;
+} __packed;
 
 /* Number of bytes in a directory block. */
 static inline unsigned int xfs_dir2_dirblock_bytes(struct xfs_sb *sbp)
diff --git a/libxfs/xfs_ondisk.h b/libxfs/xfs_ondisk.h
index 653ea6d64348..7a312aed2337 100644
--- a/libxfs/xfs_ondisk.h
+++ b/libxfs/xfs_ondisk.h
@@ -59,6 +59,7 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(struct xfs_attr3_leaf_hdr,	80);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_attr3_leafblock,	80);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_attr3_rmt_hdr,		56);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_attr3_rmtverity_hdr,	36);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_da3_blkinfo,		56);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_da3_intnode,		64);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_da3_node_hdr,		64);
@@ -207,6 +208,7 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_VALUE(XFS_DQ_BIGTIME_EXPIRY_MIN << XFS_DQ_BIGTIME_SHIFT, 4);
 	XFS_CHECK_VALUE(XFS_DQ_BIGTIME_EXPIRY_MAX << XFS_DQ_BIGTIME_SHIFT,
 			16299260424LL);
+
 }
 
 #endif /* __XFS_ONDISK_H */
diff --git a/libxfs/xfs_shared.h b/libxfs/xfs_shared.h
index 40a482660307..eb3a674fe161 100644
--- a/libxfs/xfs_shared.h
+++ b/libxfs/xfs_shared.h
@@ -26,6 +26,7 @@ extern const struct xfs_buf_ops xfs_agfl_buf_ops;
 extern const struct xfs_buf_ops xfs_agi_buf_ops;
 extern const struct xfs_buf_ops xfs_attr3_leaf_buf_ops;
 extern const struct xfs_buf_ops xfs_attr3_rmt_buf_ops;
+extern const struct xfs_buf_ops xfs_attr3_rmtverity_buf_ops;
 extern const struct xfs_buf_ops xfs_bmbt_buf_ops;
 extern const struct xfs_buf_ops xfs_bnobt_buf_ops;
 extern const struct xfs_buf_ops xfs_cntbt_buf_ops;


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 09/38] xfs: add fs-verity ro-compat flag
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (7 preceding siblings ...)
  2024-04-30  3:32   ` [PATCH 08/38] xfs: do not use xfs_attr3_rmt_hdr for remote verity value blocks Darrick J. Wong
@ 2024-04-30  3:33   ` Darrick J. Wong
  2024-04-30  3:33   ` [PATCH 10/38] xfs: add inode on-disk VERITY flag Darrick J. Wong
                     ` (28 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:33 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Andrey Albershteyn <aalbersh@redhat.com>

To mark inodes with fs-verity enabled, a new XFS_DIFLAG2_VERITY flag
will be added in a later patch. This requires a ro-compat flag so that
older kernels know that a filesystem with fs-verity cannot be modified.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
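For readers less familiar with ro-compat bits, here is a minimal
userspace sketch of the gating this flag relies on. The bit values
mirror the xfs_format.h definitions (FINOBT at bit 0 is not shown in
this hunk); the names can_mount(), has_unknown_ro_compat() and
OLD_KERNEL_RO_COMPAT are hypothetical and exist only for illustration.
The real check is xfs_sb_has_ro_compat_feature() against
XFS_SB_FEAT_RO_COMPAT_UNKNOWN.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined in libxfs/xfs_format.h. */
#define RO_COMPAT_FINOBT	(1u << 0)
#define RO_COMPAT_RMAPBT	(1u << 1)
#define RO_COMPAT_REFLINK	(1u << 2)
#define RO_COMPAT_INOBTCNT	(1u << 3)
#define RO_COMPAT_VERITY	(1u << 4)

/* Hypothetical: the ro-compat mask known to a pre-verity kernel. */
#define OLD_KERNEL_RO_COMPAT \
	(RO_COMPAT_FINOBT | RO_COMPAT_RMAPBT | \
	 RO_COMPAT_REFLINK | RO_COMPAT_INOBTCNT)

/* True if the superblock sets any ro-compat bit outside the known mask. */
static bool has_unknown_ro_compat(uint32_t sb_ro_compat, uint32_t known)
{
	return (sb_ro_compat & ~known) != 0;
}

/* Unknown ro-compat bits forbid read-write mounts but allow read-only. */
static bool can_mount(uint32_t sb_ro_compat, uint32_t known, bool readonly)
{
	if (!has_unknown_ro_compat(sb_ro_compat, known))
		return true;
	return readonly;
}
```

With these definitions, can_mount(RO_COMPAT_VERITY, OLD_KERNEL_RO_COMPAT,
false) is false, while the same call with readonly set to true succeeds.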
 include/xfs_mount.h |    2 ++
 libxfs/xfs_format.h |    1 +
 libxfs/xfs_sb.c     |    2 ++
 3 files changed, 5 insertions(+)


diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index c78266e602b2..d63fee5718f1 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -187,6 +187,7 @@ typedef struct xfs_mount {
 #define XFS_FEAT_EXCHANGE_RANGE	(1ULL << 27)	/* exchange range */
 #define XFS_FEAT_METADIR	(1ULL << 28)	/* metadata directory tree */
 #define XFS_FEAT_RTGROUPS	(1ULL << 29)	/* realtime groups */
+#define XFS_FEAT_VERITY		(1ULL << 30)	/* fs-verity */
 
 #define __XFS_HAS_FEAT(name, NAME) \
 static inline bool xfs_has_ ## name (struct xfs_mount *mp) \
@@ -234,6 +235,7 @@ __XFS_HAS_FEAT(large_extent_counts, NREXT64)
 __XFS_HAS_FEAT(exchange_range, EXCHANGE_RANGE)
 __XFS_HAS_FEAT(metadir, METADIR)
 __XFS_HAS_FEAT(rtgroups, RTGROUPS)
+__XFS_HAS_FEAT(verity, VERITY)
 
 static inline bool xfs_has_rtrmapbt(struct xfs_mount *mp)
 {
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index e9585ba12ded..563f359f2f07 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -387,6 +387,7 @@ xfs_sb_has_compat_feature(
 #define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
 #define XFS_SB_FEAT_RO_COMPAT_REFLINK  (1 << 2)		/* reflinked files */
 #define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3)		/* inobt block counts */
+#define XFS_SB_FEAT_RO_COMPAT_VERITY   (1 << 4)		/* fs-verity */
 #define XFS_SB_FEAT_RO_COMPAT_ALL \
 		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
 		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 2e5161c63b6b..f8902c4778da 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -164,6 +164,8 @@ xfs_sb_version_to_features(
 		features |= XFS_FEAT_REFLINK;
 	if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
 		features |= XFS_FEAT_INOBTCNT;
+	if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_VERITY)
+		features |= XFS_FEAT_VERITY;
 	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_FTYPE)
 		features |= XFS_FEAT_FTYPE;
 	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_SPINODES)



* [PATCH 10/38] xfs: add inode on-disk VERITY flag
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (8 preceding siblings ...)
  2024-04-30  3:33   ` [PATCH 09/38] xfs: add fs-verity ro-compat flag Darrick J. Wong
@ 2024-04-30  3:33   ` Darrick J. Wong
  2024-04-30  3:33   ` [PATCH 11/38] xfs: add fs-verity support Darrick J. Wong
                     ` (27 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:33 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Andrey Albershteyn <aalbersh@redhat.com>

Add a flag to mark inodes that have fs-verity enabled (i.e. the
descriptor exists and the tree is built).

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_format.h     |    5 ++++-
 libxfs/xfs_inode_buf.c  |    8 ++++++++
 libxfs/xfs_inode_util.c |    2 ++
 3 files changed, 14 insertions(+), 1 deletion(-)


diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 563f359f2f07..810f2556762b 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -1190,6 +1190,7 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
 #define XFS_DIFLAG2_COWEXTSIZE_BIT   2  /* copy on write extent size hint */
 #define XFS_DIFLAG2_BIGTIME_BIT	3	/* big timestamps */
 #define XFS_DIFLAG2_NREXT64_BIT	4	/* large extent counters */
+#define XFS_DIFLAG2_VERITY_BIT	5	/* inode sealed by fsverity */
 #define XFS_DIFLAG2_METADIR_BIT	63	/* filesystem metadata */
 
 #define XFS_DIFLAG2_DAX		(1ULL << XFS_DIFLAG2_DAX_BIT)
@@ -1197,6 +1198,7 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
 #define XFS_DIFLAG2_COWEXTSIZE	(1ULL << XFS_DIFLAG2_COWEXTSIZE_BIT)
 #define XFS_DIFLAG2_BIGTIME	(1ULL << XFS_DIFLAG2_BIGTIME_BIT)
 #define XFS_DIFLAG2_NREXT64	(1ULL << XFS_DIFLAG2_NREXT64_BIT)
+#define XFS_DIFLAG2_VERITY	(1ULL << XFS_DIFLAG2_VERITY_BIT)
 
 /*
  * The inode contains filesystem metadata and can be found through the metadata
@@ -1225,7 +1227,8 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
 
 #define XFS_DIFLAG2_ANY \
 	(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE | \
-	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | XFS_DIFLAG2_METADIR)
+	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | XFS_DIFLAG2_METADIR | \
+	 XFS_DIFLAG2_VERITY)
 
 static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
 {
diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index 085c128c5422..12872acc70c0 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -692,6 +692,14 @@ xfs_dinode_verify(
 	    !xfs_has_rtreflink(mp))
 		return __this_address;
 
+	/* only regular files can have fsverity */
+	if (flags2 & XFS_DIFLAG2_VERITY) {
+		if (!xfs_has_verity(mp))
+			return __this_address;
+		if ((mode & S_IFMT) != S_IFREG)
+			return __this_address;
+	}
+
 	/* COW extent size hint validation */
 	fa = xfs_inode_validate_cowextsize(mp, be32_to_cpu(dip->di_cowextsize),
 			mode, flags, flags2);
diff --git a/libxfs/xfs_inode_util.c b/libxfs/xfs_inode_util.c
index 432186283866..aba80a9769c3 100644
--- a/libxfs/xfs_inode_util.c
+++ b/libxfs/xfs_inode_util.c
@@ -124,6 +124,8 @@ xfs_ip2xflags(
 			flags |= FS_XFLAG_DAX;
 		if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
 			flags |= FS_XFLAG_COWEXTSIZE;
+		if (ip->i_diflags2 & XFS_DIFLAG2_VERITY)
+			flags |= FS_XFLAG_VERITY;
 	}
 
 	if (xfs_inode_has_attr_fork(ip))



* [PATCH 11/38] xfs: add fs-verity support
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (9 preceding siblings ...)
  2024-04-30  3:33   ` [PATCH 10/38] xfs: add inode on-disk VERITY flag Darrick J. Wong
@ 2024-04-30  3:33   ` Darrick J. Wong
  2024-04-30  3:34   ` [PATCH 12/38] xfs: use merkle tree offset as attr hash Darrick J. Wong
                     ` (26 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:33 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Andrey Albershteyn <aalbersh@redhat.com>

Add integration with fs-verity. XFS stores fs-verity metadata in
extended file attributes. The metadata consists of the verity
descriptor and the Merkle tree blocks.

The descriptor is stored under the "vdesc" extended attribute. The
Merkle tree blocks are stored under binary indexes, which are byte
offsets into the Merkle tree.

While fs-verity is being enabled on an inode, the
XFS_IVERITY_CONSTRUCTION flag is set, meaning that the Merkle tree is
being built. Initialization ends with storing the verity descriptor and
setting the on-disk inode flag (XFS_DIFLAG2_VERITY).

Verification on read is done in the iomap read path.

Merkle tree blocks are indexed by a per-AG rhashtable to reduce the time
it takes to load a block from disk in a manner that doesn't bloat struct
xfs_inode.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: replace caching implementation with an xarray, other cleanups]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
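To make the attr name format concrete, here is a userspace sketch of
the key encoding and the length-based name check. It open-codes the
big-endian conversion instead of using cpu_to_be64()/be64_to_cpu(), and
all names below are hypothetical stand-ins for the libxfs helpers added
in this patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MERKLE_KEY_LEN		8	/* sizeof(struct xfs_merkle_key) */
#define VERITY_DESC_NAME	"vdesc"
#define VERITY_DESC_NAME_LEN	(sizeof(VERITY_DESC_NAME) - 1)

/* Store the merkle tree byte position as 8 big-endian bytes. */
static void merkle_key_to_disk(uint8_t key[MERKLE_KEY_LEN], uint64_t pos)
{
	for (int i = 0; i < MERKLE_KEY_LEN; i++)
		key[i] = (uint8_t)(pos >> (8 * (MERKLE_KEY_LEN - 1 - i)));
}

/* Recover the merkle tree byte position from an ondisk attr name. */
static uint64_t merkle_key_from_disk(const uint8_t key[MERKLE_KEY_LEN])
{
	uint64_t pos = 0;

	for (int i = 0; i < MERKLE_KEY_LEN; i++)
		pos = (pos << 8) | key[i];
	return pos;
}

/* A verity attr name is either a merkle key or the descriptor name. */
static bool verity_name_len_ok(size_t namelen)
{
	return namelen == MERKLE_KEY_LEN || namelen == VERITY_DESC_NAME_LEN;
}
```

Note that, as in the patch, the check only looks at the name length
(8 bytes for a merkle key, 5 for "vdesc"), not at the name contents.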
 libxfs/Makefile        |    6 +++--
 libxfs/xfs_ag.h        |    8 +++++++
 libxfs/xfs_attr.c      |    4 +++
 libxfs/xfs_da_format.h |   14 ++++++++++++
 libxfs/xfs_ondisk.h    |    3 ++
 libxfs/xfs_verity.c    |   58 ++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_verity.h    |   13 +++++++++++
 7 files changed, 104 insertions(+), 2 deletions(-)
 create mode 100644 libxfs/xfs_verity.c
 create mode 100644 libxfs/xfs_verity.h


diff --git a/libxfs/Makefile b/libxfs/Makefile
index ac3484efe914..c67e9449835e 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -69,7 +69,8 @@ HFILES = \
 	xfs_shared.h \
 	xfs_trans_resv.h \
 	xfs_trans_space.h \
-	xfs_dir2_priv.h
+	xfs_dir2_priv.h \
+	xfs_verity.h
 
 CFILES = buf_mem.c \
 	cache.c \
@@ -131,7 +132,8 @@ CFILES = buf_mem.c \
 	xfs_trans_inode.c \
 	xfs_trans_resv.c \
 	xfs_trans_space.c \
-	xfs_types.c
+	xfs_types.c \
+	xfs_verity.c
 
 #
 # Tracing flags:
diff --git a/libxfs/xfs_ag.h b/libxfs/xfs_ag.h
index 80bf8771ea2a..792ce162312e 100644
--- a/libxfs/xfs_ag.h
+++ b/libxfs/xfs_ag.h
@@ -123,6 +123,12 @@ struct xfs_perag {
 
 	/* Hook to feed rmapbt updates to an active online repair. */
 	struct xfs_hooks	pag_rmap_update_hooks;
+
+# ifdef CONFIG_FS_VERITY
+	/* per-inode merkle tree caches */
+	spinlock_t		pagi_merkle_lock;
+	struct rhashtable	pagi_merkle_blobs;
+# endif /* CONFIG_FS_VERITY */
 #endif /* __KERNEL__ */
 };
 
@@ -135,6 +141,7 @@ struct xfs_perag {
 #define XFS_AGSTATE_ALLOWS_INODES	3
 #define XFS_AGSTATE_AGFL_NEEDS_RESET	4
 #define XFS_AGSTATE_NOALLOC		5
+#define XFS_AGSTATE_MERKLE		6
 
 #define __XFS_AG_OPSTATE(name, NAME) \
 static inline bool xfs_perag_ ## name (struct xfs_perag *pag) \
@@ -148,6 +155,7 @@ __XFS_AG_OPSTATE(prefers_metadata, PREFERS_METADATA)
 __XFS_AG_OPSTATE(allows_inodes, ALLOWS_INODES)
 __XFS_AG_OPSTATE(agfl_needs_reset, AGFL_NEEDS_RESET)
 __XFS_AG_OPSTATE(prohibits_alloc, NOALLOC)
+__XFS_AG_OPSTATE(caches_merkle, MERKLE)
 
 void xfs_free_unused_perag_range(struct xfs_mount *mp, xfs_agnumber_t agstart,
 			xfs_agnumber_t agend);
diff --git a/libxfs/xfs_attr.c b/libxfs/xfs_attr.c
index c2c411268904..94c425b984d2 100644
--- a/libxfs/xfs_attr.c
+++ b/libxfs/xfs_attr.c
@@ -26,6 +26,7 @@
 #include "xfs_trace.h"
 #include "defer_item.h"
 #include "xfs_parent.h"
+#include "xfs_verity.h"
 
 struct kmem_cache		*xfs_attr_intent_cache;
 
@@ -1618,6 +1619,9 @@ xfs_attr_namecheck(
 	if (!xfs_attr_check_namespace(attr_flags))
 		return false;
 
+	if (attr_flags & XFS_ATTR_VERITY)
+		return xfs_verity_namecheck(attr_flags, name, length);
+
 	/*
 	 * MAXNAMELEN includes the trailing null, but (name/length) leave it
 	 * out, so use >= for the length check.
diff --git a/libxfs/xfs_da_format.h b/libxfs/xfs_da_format.h
index c84b94da3f32..43e9d1f00a4a 100644
--- a/libxfs/xfs_da_format.h
+++ b/libxfs/xfs_da_format.h
@@ -929,4 +929,18 @@ struct xfs_parent_rec {
 	__be32	p_gen;
 } __packed;
 
+/*
+ * fs-verity attribute name format
+ *
+ * Merkle tree blocks are stored under extended attributes of the inode.  The
+ * name of the attributes are byte positions into the merkle data.
+ */
+struct xfs_merkle_key {
+	__be64	mk_pos;
+};
+
+/* ondisk xattr name used for the fsverity descriptor */
+#define XFS_VERITY_DESCRIPTOR_NAME	"vdesc"
+#define XFS_VERITY_DESCRIPTOR_NAME_LEN	(sizeof(XFS_VERITY_DESCRIPTOR_NAME) - 1)
+
 #endif /* __XFS_DA_FORMAT_H__ */
diff --git a/libxfs/xfs_ondisk.h b/libxfs/xfs_ondisk.h
index 7a312aed2337..03aaf508e4a4 100644
--- a/libxfs/xfs_ondisk.h
+++ b/libxfs/xfs_ondisk.h
@@ -209,6 +209,9 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_VALUE(XFS_DQ_BIGTIME_EXPIRY_MAX << XFS_DQ_BIGTIME_SHIFT,
 			16299260424LL);
 
+	/* fs-verity xattrs */
+	XFS_CHECK_STRUCT_SIZE(struct xfs_merkle_key,		8);
+	XFS_CHECK_VALUE(sizeof(XFS_VERITY_DESCRIPTOR_NAME),	6);
 }
 
 #endif /* __XFS_ONDISK_H */
diff --git a/libxfs/xfs_verity.c b/libxfs/xfs_verity.c
new file mode 100644
index 000000000000..8d1a759f995b
--- /dev/null
+++ b/libxfs/xfs_verity.c
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023 Red Hat, Inc.
+ */
+#include "libxfs_priv.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_inode.h"
+#include "xfs_log_format.h"
+#include "xfs_attr.h"
+#include "xfs_verity.h"
+
+/* Set a merkle tree pos in preparation for setting merkle tree attrs. */
+void
+xfs_merkle_key_to_disk(
+	struct xfs_merkle_key	*key,
+	uint64_t		pos)
+{
+	key->mk_pos = cpu_to_be64(pos);
+}
+
+/* Retrieve the merkle tree pos from the attr data. */
+uint64_t
+xfs_merkle_key_from_disk(
+	const void		*attr_name,
+	int			namelen)
+{
+	const struct xfs_merkle_key *key = attr_name;
+
+	ASSERT(namelen == sizeof(struct xfs_merkle_key));
+
+	return be64_to_cpu(key->mk_pos);
+}
+
+/* Return true if verity attr name is valid. */
+bool
+xfs_verity_namecheck(
+	unsigned int		attr_flags,
+	const void		*name,
+	int			namelen)
+{
+	if (!(attr_flags & XFS_ATTR_VERITY))
+		return false;
+
+	/*
+	 * Merkle tree pages are stored under u64 indexes; verity descriptor
+	 * blocks are held in a named attribute.
+	 */
+	if (namelen != sizeof(struct xfs_merkle_key) &&
+	    namelen != XFS_VERITY_DESCRIPTOR_NAME_LEN)
+		return false;
+
+	return true;
+}
diff --git a/libxfs/xfs_verity.h b/libxfs/xfs_verity.h
new file mode 100644
index 000000000000..5813665c5a01
--- /dev/null
+++ b/libxfs/xfs_verity.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2022 Red Hat, Inc.
+ */
+#ifndef __XFS_VERITY_H__
+#define __XFS_VERITY_H__
+
+void xfs_merkle_key_to_disk(struct xfs_merkle_key *key, uint64_t pos);
+uint64_t xfs_merkle_key_from_disk(const void *attr_name, int namelen);
+bool xfs_verity_namecheck(unsigned int attr_flags, const void *name,
+		int namelen);
+
+#endif	/* __XFS_VERITY_H__ */



* [PATCH 12/38] xfs: use merkle tree offset as attr hash
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (10 preceding siblings ...)
  2024-04-30  3:33   ` [PATCH 11/38] xfs: add fs-verity support Darrick J. Wong
@ 2024-04-30  3:34   ` Darrick J. Wong
  2024-04-30  3:34   ` [PATCH 13/38] xfs: advertise fs-verity being available on filesystem Darrick J. Wong
                     ` (25 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:34 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

I was exploring the fsverity metadata with xfs_db after creating a 220MB
verity file, and I noticed the following in the debugger output:

entries[0-75] = [hashval,nameidx,incomplete,root,secure,local,parent,verity]
0:[0,4076,0,0,0,0,0,1]
1:[0,1472,0,0,0,1,0,1]
2:[0x800,4056,0,0,0,0,0,1]
3:[0x800,4036,0,0,0,0,0,1]
...
72:[0x12000,2716,0,0,0,0,0,1]
73:[0x12000,2696,0,0,0,0,0,1]
74:[0x12800,2676,0,0,0,0,0,1]
75:[0x12800,2656,0,0,0,0,0,1]
...
nvlist[0].merkle_off = 0x18000
nvlist[1].merkle_off = 0
nvlist[2].merkle_off = 0x19000
nvlist[3].merkle_off = 0x1000
...
nvlist[71].merkle_off = 0x5b000
nvlist[72].merkle_off = 0x44000
nvlist[73].merkle_off = 0x5c000
nvlist[74].merkle_off = 0x45000
nvlist[75].merkle_off = 0x5d000

Within just this attr leaf block, there are 76 attr entries, but only 38
distinct hash values.  There are 415 merkle tree blocks for this file,
but we already have hash collisions.  This isn't good performance from
the standard da hash function because we're mostly shifting and rolling
zeroes around.

However, we don't even have to do that much work -- the merkle tree
block keys are themselves u64 values.  Truncate that value to 32 bits
(the size of xfs_dahash_t) and use that for the hash.  We won't have any
collisions between merkle tree blocks until that tree grows to 2^32
blocks.  On a 4k block filesystem, we won't hit that unless the file
contains more than 2^49 bytes, assuming sha256.

As a side effect, the keys for merkle tree blocks get written out in
roughly sequential order, though I didn't observe any change in
performance.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
---
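As a worked illustration of the new hash, the sketch below applies the
shift-and-truncate step to a few of the merkle_off values from the
xfs_db output above. merkle_hash() and first_wrapping_offset() are
hypothetical names; the real function is xfs_verity_hashname(), which
also falls back to xfs_attr_hashname() for the descriptor.

```c
#include <stdint.h>

/* Same value as XFS_VERITY_HASH_SHIFT in this patch. */
#define VERITY_HASH_SHIFT	10

typedef uint32_t dahash_t;	/* stand-in for xfs_dahash_t */

/* Hash a merkle tree byte offset: drop the low bits, keep 32 bits. */
static dahash_t merkle_hash(uint64_t merkle_off)
{
	return (dahash_t)(merkle_off >> VERITY_HASH_SHIFT);
}

/* Smallest merkle tree offset whose hash wraps around to zero. */
static uint64_t first_wrapping_offset(void)
{
	return 1ULL << (32 + VERITY_HASH_SHIFT);
}
```

Offsets 0x18000, 0 and 0x19000 hash to 0x60, 0 and 0x64, so adjacent
tree blocks no longer collide.  The hash first wraps at a merkle tree
offset of 2^42 bytes; with 32-byte sha256 digests a 4k tree block
covers 128 data blocks, which is roughly where the 2^49 byte file size
in the commit message comes from.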
 libxfs/xfs_attr.c      |    2 ++
 libxfs/xfs_da_format.h |    6 ++++++
 libxfs/xfs_verity.c    |   16 ++++++++++++++++
 libxfs/xfs_verity.h    |    1 +
 4 files changed, 25 insertions(+)


diff --git a/libxfs/xfs_attr.c b/libxfs/xfs_attr.c
index 94c425b984d2..2f491d072294 100644
--- a/libxfs/xfs_attr.c
+++ b/libxfs/xfs_attr.c
@@ -461,6 +461,8 @@ xfs_attr_hashval(
 
 	if (attr_flags & XFS_ATTR_PARENT)
 		return xfs_parent_hashattr(mp, name, namelen, value, valuelen);
+	if (attr_flags & XFS_ATTR_VERITY)
+		return xfs_verity_hashname(name, namelen);
 
 	return xfs_attr_hashname(name, namelen);
 }
diff --git a/libxfs/xfs_da_format.h b/libxfs/xfs_da_format.h
index 43e9d1f00a4a..c95e8ca22daa 100644
--- a/libxfs/xfs_da_format.h
+++ b/libxfs/xfs_da_format.h
@@ -943,4 +943,10 @@ struct xfs_merkle_key {
 #define XFS_VERITY_DESCRIPTOR_NAME	"vdesc"
 #define XFS_VERITY_DESCRIPTOR_NAME_LEN	(sizeof(XFS_VERITY_DESCRIPTOR_NAME) - 1)
 
+/*
+ * Merkle tree blocks cannot be smaller than 1k in size, so the hash function
+ * can right-shift the merkle offset by this amount without losing anything.
+ */
+#define XFS_VERITY_HASH_SHIFT		(10)
+
 #endif /* __XFS_DA_FORMAT_H__ */
diff --git a/libxfs/xfs_verity.c b/libxfs/xfs_verity.c
index 8d1a759f995b..907a0e0fcf41 100644
--- a/libxfs/xfs_verity.c
+++ b/libxfs/xfs_verity.c
@@ -56,3 +56,19 @@ xfs_verity_namecheck(
 
 	return true;
 }
+
+/*
+ * Compute name hash for a verity attribute.  For merkle tree blocks, we want
+ * to use the merkle tree block offset as the hash value to avoid collisions
+ * between blocks unless the merkle tree becomes larger than 2^32 blocks.
+ */
+xfs_dahash_t
+xfs_verity_hashname(
+	const uint8_t		*name,
+	unsigned int		namelen)
+{
+	if (namelen != sizeof(struct xfs_merkle_key))
+		return xfs_attr_hashname(name, namelen);
+
+	return xfs_merkle_key_from_disk(name, namelen) >> XFS_VERITY_HASH_SHIFT;
+}
diff --git a/libxfs/xfs_verity.h b/libxfs/xfs_verity.h
index 5813665c5a01..3d7485c511d5 100644
--- a/libxfs/xfs_verity.h
+++ b/libxfs/xfs_verity.h
@@ -9,5 +9,6 @@ void xfs_merkle_key_to_disk(struct xfs_merkle_key *key, uint64_t pos);
 uint64_t xfs_merkle_key_from_disk(const void *attr_name, int namelen);
 bool xfs_verity_namecheck(unsigned int attr_flags, const void *name,
 		int namelen);
+xfs_dahash_t xfs_verity_hashname(const uint8_t *name, unsigned int namelen);
 
 #endif	/* __XFS_VERITY_H__ */



* [PATCH 13/38] xfs: advertise fs-verity being available on filesystem
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (11 preceding siblings ...)
  2024-04-30  3:34   ` [PATCH 12/38] xfs: use merkle tree offset as attr hash Darrick J. Wong
@ 2024-04-30  3:34   ` Darrick J. Wong
  2024-04-30  3:34   ` [PATCH 14/38] xfs: report verity failures through the health system Darrick J. Wong
                     ` (24 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:34 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Advertise that this filesystem supports fsverity.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 libxfs/xfs_fs.h |    1 +
 libxfs/xfs_sb.c |    2 ++
 2 files changed, 3 insertions(+)


diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index f9a6a678f1b4..edc019d89702 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -246,6 +246,7 @@ typedef struct xfs_fsop_resblks {
 #define XFS_FSOP_GEOM_FLAGS_EXCHANGE_RANGE (1 << 24) /* exchange range */
 #define XFS_FSOP_GEOM_FLAGS_PARENT	(1 << 25) /* linux parent pointers */
 
+#define XFS_FSOP_GEOM_FLAGS_VERITY	(1U << 29) /* fs-verity */
 #define XFS_FSOP_GEOM_FLAGS_METADIR	(1U << 30) /* metadata directories */
 
 /*
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index f8902c4778da..936071abb207 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -1434,6 +1434,8 @@ xfs_fs_geometry(
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_EXCHANGE_RANGE;
 	if (xfs_has_metadir(mp))
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_METADIR;
+	if (xfs_has_verity(mp))
+		geo->flags |= XFS_FSOP_GEOM_FLAGS_VERITY;
 	geo->rtsectsize = sbp->sb_blocksize;
 	geo->dirblocksize = xfs_dir2_dirblock_bytes(sbp);
 



* [PATCH 14/38] xfs: report verity failures through the health system
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (12 preceding siblings ...)
  2024-04-30  3:34   ` [PATCH 13/38] xfs: advertise fs-verity being available on filesystem Darrick J. Wong
@ 2024-04-30  3:34   ` Darrick J. Wong
  2024-04-30  3:34   ` [PATCH 15/38] xfs: enable ro-compat fs-verity flag Darrick J. Wong
                     ` (23 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:34 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Record verity failures and report them through the health system.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 libxfs/xfs_fs.h     |    1 +
 libxfs/xfs_health.h |    4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)


diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index edc019d89702..bc529d862af7 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -424,6 +424,7 @@ struct xfs_bulkstat {
 #define XFS_BS_SICK_SYMLINK	(1 << 6)  /* symbolic link remote target */
 #define XFS_BS_SICK_PARENT	(1 << 7)  /* parent pointers */
 #define XFS_BS_SICK_DIRTREE	(1 << 8)  /* directory tree structure */
+#define XFS_BS_SICK_DATA	(1 << 9)  /* file data */
 
 /*
  * Project quota id helpers (previously projid was 16bit only
diff --git a/libxfs/xfs_health.h b/libxfs/xfs_health.h
index 89b80e957917..0f8533335e25 100644
--- a/libxfs/xfs_health.h
+++ b/libxfs/xfs_health.h
@@ -105,6 +105,7 @@ struct xfs_rtgroup;
 /* Don't propagate sick status to ag health summary during inactivation */
 #define XFS_SICK_INO_FORGET	(1 << 12)
 #define XFS_SICK_INO_DIRTREE	(1 << 13)  /* directory tree structure */
+#define XFS_SICK_INO_DATA	(1 << 14)  /* file data */
 
 /* Primary evidence of health problems in a given group. */
 #define XFS_SICK_FS_PRIMARY	(XFS_SICK_FS_COUNTERS | \
@@ -143,7 +144,8 @@ struct xfs_rtgroup;
 				 XFS_SICK_INO_XATTR | \
 				 XFS_SICK_INO_SYMLINK | \
 				 XFS_SICK_INO_PARENT | \
-				 XFS_SICK_INO_DIRTREE)
+				 XFS_SICK_INO_DIRTREE | \
+				 XFS_SICK_INO_DATA)
 
 #define XFS_SICK_INO_ZAPPED	(XFS_SICK_INO_BMBTD_ZAPPED | \
 				 XFS_SICK_INO_BMBTA_ZAPPED | \



* [PATCH 15/38] xfs: enable ro-compat fs-verity flag
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (13 preceding siblings ...)
  2024-04-30  3:34   ` [PATCH 14/38] xfs: report verity failures through the health system Darrick J. Wong
@ 2024-04-30  3:34   ` Darrick J. Wong
  2024-04-30  3:35   ` [PATCH 16/38] libfrog: add fsverity to xfs_report_geom output Darrick J. Wong
                     ` (22 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:34 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Andrey Albershteyn <aalbersh@redhat.com>

Finalize fs-verity integration in XFS by making the kernel aware of
the fs-verity ro-compat flag.

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: add spaces]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_format.h |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)


diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 810f2556762b..78a12705a88d 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -389,10 +389,11 @@ xfs_sb_has_compat_feature(
 #define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3)		/* inobt block counts */
 #define XFS_SB_FEAT_RO_COMPAT_VERITY   (1 << 4)		/* fs-verity */
 #define XFS_SB_FEAT_RO_COMPAT_ALL \
-		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
-		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
-		 XFS_SB_FEAT_RO_COMPAT_REFLINK| \
-		 XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
+		(XFS_SB_FEAT_RO_COMPAT_FINOBT   | \
+		 XFS_SB_FEAT_RO_COMPAT_RMAPBT   | \
+		 XFS_SB_FEAT_RO_COMPAT_REFLINK  | \
+		 XFS_SB_FEAT_RO_COMPAT_INOBTCNT | \
+		 XFS_SB_FEAT_RO_COMPAT_VERITY)
 #define XFS_SB_FEAT_RO_COMPAT_UNKNOWN	~XFS_SB_FEAT_RO_COMPAT_ALL
 static inline bool
 xfs_sb_has_ro_compat_feature(



* [PATCH 16/38] libfrog: add fsverity to xfs_report_geom output
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (14 preceding siblings ...)
  2024-04-30  3:34   ` [PATCH 15/38] xfs: enable ro-compat fs-verity flag Darrick J. Wong
@ 2024-04-30  3:35   ` Darrick J. Wong
  2024-04-30  3:35   ` [PATCH 17/38] xfs_db: introduce attr_modify command Darrick J. Wong
                     ` (21 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:35 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Announce the presence of fsverity on a filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libfrog/fsgeom.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)


diff --git a/libfrog/fsgeom.c b/libfrog/fsgeom.c
index 41958f00ce34..99d6d98e4679 100644
--- a/libfrog/fsgeom.c
+++ b/libfrog/fsgeom.c
@@ -34,6 +34,7 @@ xfs_report_geom(
 	int			exchangerange;
 	int			parent;
 	int			metadir;
+	int			verity;
 
 	isint = geo->logstart > 0;
 	lazycount = geo->flags & XFS_FSOP_GEOM_FLAGS_LAZYSB ? 1 : 0;
@@ -55,13 +56,14 @@ xfs_report_geom(
 	exchangerange = geo->flags & XFS_FSOP_GEOM_FLAGS_EXCHANGE_RANGE ? 1 : 0;
 	parent = geo->flags & XFS_FSOP_GEOM_FLAGS_PARENT ? 1 : 0;
 	metadir = geo->flags & XFS_FSOP_GEOM_FLAGS_METADIR ? 1 : 0;
+	verity = geo->flags & XFS_FSOP_GEOM_FLAGS_VERITY ? 1 : 0;
 
 	printf(_(
 "meta-data=%-22s isize=%-6d agcount=%u, agsize=%u blks\n"
 "         =%-22s sectsz=%-5u attr=%u, projid32bit=%u\n"
 "         =%-22s crc=%-8u finobt=%u, sparse=%u, rmapbt=%u\n"
 "         =%-22s reflink=%-4u bigtime=%u inobtcount=%u nrext64=%u\n"
-"         =%-22s exchange=%-3u metadir=%u\n"
+"         =%-22s exchange=%-3u metadir=%u verity=%u\n"
 "data     =%-22s bsize=%-6u blocks=%llu, imaxpct=%u\n"
 "         =%-22s sunit=%-6u swidth=%u blks\n"
 "naming   =version %-14u bsize=%-6u ascii-ci=%d, ftype=%d, parent=%d\n"
@@ -73,7 +75,7 @@ xfs_report_geom(
 		"", geo->sectsize, attrversion, projid32bit,
 		"", crcs_enabled, finobt_enabled, spinodes, rmapbt_enabled,
 		"", reflink_enabled, bigtime_enabled, inobtcount, nrext64,
-		"", exchangerange, metadir,
+		"", exchangerange, metadir, verity,
 		"", geo->blocksize, (unsigned long long)geo->datablocks,
 			geo->imaxpct,
 		"", geo->sunit, geo->swidth,



* [PATCH 17/38] xfs_db: introduce attr_modify command
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (15 preceding siblings ...)
  2024-04-30  3:35   ` [PATCH 16/38] libfrog: add fsverity to xfs_report_geom output Darrick J. Wong
@ 2024-04-30  3:35   ` Darrick J. Wong
  2024-04-30  3:35   ` [PATCH 18/38] xfs_db: add ATTR_PARENT support to " Darrick J. Wong
                     ` (20 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:35 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Andrey Albershteyn <aalbersh@redhat.com>

This command writes a new value over the existing value of an inode's
extended attribute. Unlike the 'write' command, the extended attribute
is addressed by name, and the new value is written over the old value.

The command also allows addressing via binary names (introduced by
parent pointers). This is done by specifying the name length (-m) and
passing the name in #hex format.

Example:

	# Modify attribute with name #00000042 by overwriting 8
	# bytes at offset 3 with value #0000000000FF00FF
	attr_modify -o 3 -m 4 -v 8 #42 #FF00FF

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
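The example in the commit message implies that a short #hex argument is
zero-padded on the left to the requested length (#42 with -m 4 becomes
#00000042).  The sketch below shows one way to do that conversion.  It
is inferred from the example only and is not the parser xfs_db uses;
hex_to_padded() is a hypothetical helper.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Convert "#<hexdigits>" into a buflen-byte buffer, right-aligned and
 * zero-padded on the left.  Returns 0 on success, -1 on bad input.
 */
static int hex_to_padded(const char *arg, uint8_t *buf, size_t buflen)
{
	size_t ndigits, i;

	if (arg[0] != '#')
		return -1;
	arg++;

	ndigits = strlen(arg);
	if (ndigits == 0 || ndigits > buflen * 2)
		return -1;

	memset(buf, 0, buflen);

	/* Walk the digits from the rightmost (least significant) one. */
	for (i = 0; i < ndigits; i++) {
		unsigned char c = (unsigned char)arg[ndigits - 1 - i];
		unsigned int v;

		if (!isxdigit(c))
			return -1;
		v = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;

		/* Even digits fill the low nibble, odd digits the high. */
		buf[buflen - 1 - i / 2] |= (uint8_t)(v << ((i % 2) * 4));
	}
	return 0;
}
```

Under this reading, "#FF00FF" with a length of 8 yields the bytes
00 00 00 00 00 FF 00 FF, matching the value shown in the example.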
 db/attrset.c |  210 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 db/write.c   |    2 -
 db/write.h   |    1 
 3 files changed, 210 insertions(+), 3 deletions(-)


diff --git a/db/attrset.c b/db/attrset.c
index 81d530055193..cfd6d9c1c954 100644
--- a/db/attrset.c
+++ b/db/attrset.c
@@ -16,10 +16,12 @@
 #include "field.h"
 #include "inode.h"
 #include "malloc.h"
+#include "write.h"
 #include <sys/xattr.h>
 
 static int		attr_set_f(int argc, char **argv);
 static int		attr_remove_f(int argc, char **argv);
+static int		attr_modify_f(int argc, char **argv);
 static void		attrset_help(void);
 
 static const cmdinfo_t	attr_set_cmd =
@@ -30,6 +32,11 @@ static const cmdinfo_t	attr_remove_cmd =
 	{ "attr_remove", "aremove", attr_remove_f, 1, -1, 0,
 	  N_("[-r|-s|-u|-p] [-n] name"),
 	  N_("remove the named attribute from the current inode"), attrset_help };
+static const cmdinfo_t	attr_modify_cmd =
+	{ "attr_modify", "amodify", attr_modify_f, 1, -1, 0,
+	  N_("[-r|-s|-u] [-o n] [-v n] [-m n] name value"),
+	  N_("modify value of the named attribute of the current inode"),
+		attrset_help };
 
 static void
 attrset_help(void)
@@ -38,8 +45,9 @@ attrset_help(void)
 "\n"
 " The 'attr_set' and 'attr_remove' commands provide interfaces for debugging\n"
 " the extended attribute allocation and removal code.\n"
-" Both commands require an attribute name to be specified, and the attr_set\n"
-" command allows an optional value length (-v) to be provided as well.\n"
+" Both commands together with 'attr_modify' require an attribute name to be\n"
+" specified. The attr_set and attr_modify commands allow an optional value\n"
+" length (-v) to be provided as well.\n"
 " There are 4 namespace flags:\n"
 "  -r -- 'root'\n"
 "  -u -- 'user'		(default)\n"
@@ -49,6 +57,9 @@ attrset_help(void)
 " For attr_set, these options further define the type of set operation:\n"
 "  -C -- 'create'    - create attribute, fail if it already exists\n"
 "  -R -- 'replace'   - replace attribute, fail if it does not exist\n"
+" attr_modify command provides more of the following options:\n"
+"  -m -- 'name length'   - specify length of the name (handy with binary names)\n"
+"  -o -- 'value offset'   - offset new value within old attr's value\n"
 " The backward compatibility mode 'noattr2' can be emulated (-n) also.\n"
 "\n"));
 }
@@ -61,6 +72,7 @@ attrset_init(void)
 
 	add_command(&attr_set_cmd);
 	add_command(&attr_remove_cmd);
+	add_command(&attr_modify_cmd);
 }
 
 static unsigned char *
@@ -402,3 +414,197 @@ attr_remove_f(
 		free((void *)args.name);
 	return 0;
 }
+
+static int
+attr_modify_f(
+	int			argc,
+	char			**argv)
+{
+	struct xfs_da_args	args = {
+		.geo		= mp->m_attr_geo,
+		.whichfork	= XFS_ATTR_FORK,
+		.op_flags	= XFS_DA_OP_OKNOENT,
+	};
+	int			c;
+	int			offset = 0;
+	char			*sp;
+	char			*converted;
+	uint8_t			*name;
+	int			namelen = 0;
+	uint8_t			*value;
+	int			valuelen = 0;
+	int			error;
+
+	if (cur_typ == NULL) {
+		dbprintf(_("no current type\n"));
+		return 0;
+	}
+
+	if (cur_typ->typnm != TYP_INODE) {
+		dbprintf(_("current type is not inode\n"));
+		return 0;
+	}
+
+	while ((c = getopt(argc, argv, "rusnv:o:m:")) != EOF) {
+		switch (c) {
+		/* namespaces */
+		case 'r':
+			args.attr_filter |= LIBXFS_ATTR_ROOT;
+			args.attr_filter &= ~LIBXFS_ATTR_SECURE;
+			break;
+		case 'u':
+			args.attr_filter &= ~(LIBXFS_ATTR_ROOT |
+					      LIBXFS_ATTR_SECURE);
+			break;
+		case 's':
+			args.attr_filter |= LIBXFS_ATTR_SECURE;
+			args.attr_filter &= ~LIBXFS_ATTR_ROOT;
+			break;
+
+		case 'n':
+			/*
+			 * We never touch attr2 these days; leave this here to
+			 * avoid breaking scripts.
+			 */
+			break;
+
+		case 'o':
+			offset = strtol(optarg, &sp, 0);
+			if (*sp != '\0' || offset < 0 || offset > XFS_XATTR_SIZE_MAX) {
+				dbprintf(_("bad attr_modify offset %s\n"),
+						optarg);
+				return 0;
+			}
+			break;
+
+		case 'v':
+			valuelen = strtol(optarg, &sp, 0);
+			if (*sp != '\0' || valuelen < 0 || valuelen > XFS_XATTR_SIZE_MAX) {
+				dbprintf(_("bad attr_modify value len %s\n"),
+						optarg);
+				return 0;
+			}
+			break;
+
+		case 'm':
+			namelen = strtol(optarg, &sp, 0);
+			if (*sp != '\0' || namelen < 0 || namelen > MAXNAMELEN) {
+				dbprintf(_("bad attr_modify name len %s\n"),
+						optarg);
+				return 0;
+			}
+			break;
+
+		default:
+			dbprintf(_("bad option for attr_modify command\n"));
+			return 0;
+		}
+	}
+
+	if (optind != argc - 2) {
+		dbprintf(_("too few options for attr_modify\n"));
+		return 0;
+	}
+
+	if (namelen >= MAXNAMELEN) {
+		dbprintf(_("name too long\n"));
+		return 0;
+	}
+
+	if (!namelen) {
+		if (argv[optind][0] == '#')
+			namelen = strlen(argv[optind])/2;
+		if (argv[optind][0] == '"')
+			namelen = strlen(argv[optind]) - 2;
+	}
+
+	name = xcalloc(namelen, sizeof(uint8_t));
+	converted = convert_arg(argv[optind], (int)(namelen*8));
+	if (!converted) {
+		dbprintf(_("invalid name\n"));
+		goto out_free_name;
+	}
+
+	memcpy(name, converted, namelen);
+	args.name = (const uint8_t *)name;
+	args.namelen = namelen;
+
+	optind++;
+
+	if (valuelen > XFS_XATTR_SIZE_MAX) {
+		dbprintf(_("value too long\n"));
+		goto out_free_name;
+	}
+
+	if (!valuelen) {
+		if (argv[optind][0] == '#')
+			valuelen = strlen(argv[optind])/2;
+		if (argv[optind][0] == '"')
+			valuelen = strlen(argv[optind]) - 2;
+	}
+
+	if ((valuelen + offset) > XFS_XATTR_SIZE_MAX) {
+		dbprintf(_("offsetted value too long\n"));
+		goto out_free_name;
+	}
+
+	value = xcalloc(valuelen, sizeof(uint8_t));
+	converted = convert_arg(argv[optind], (int)(valuelen*8));
+	if (!converted) {
+		dbprintf(_("invalid value\n"));
+		goto out_free_value;
+	}
+	memcpy(value, converted, valuelen);
+
+	if (libxfs_iget(mp, NULL, iocur_top->ino, 0, &args.dp)) {
+		dbprintf(_("failed to iget inode %llu\n"),
+			(unsigned long long)iocur_top->ino);
+		goto out;
+	}
+
+	args.owner = iocur_top->ino;
+	libxfs_attr_sethash(&args);
+
+	/*
+	 * Look up attr value with a maximally long length and a null buffer
+	 * to return the value and the correct length.
+	 */
+	args.valuelen = XATTR_SIZE_MAX;
+	error = -libxfs_attr_get(&args);
+	if (error) {
+		dbprintf(_("failed to get attr '%s' from inode %llu: %s\n"),
+			args.name, (unsigned long long)iocur_top->ino,
+			strerror(error));
+		goto out;
+	}
+
+	if (valuelen + offset > args.valuelen) {
+		dbprintf(_("new value too long\n"));
+		goto out;
+	}
+
+	/* modify value */
+	memcpy((uint8_t *)args.value + offset, value, valuelen);
+
+	error = -libxfs_attr_set(&args, XFS_ATTRUPDATE_REPLACE, false);
+	if (error) {
+		dbprintf(_("failed to set attr '%s' from inode %llu: %s\n"),
+			(unsigned char *)args.name,
+			(unsigned long long)iocur_top->ino,
+			strerror(error));
+		goto out;
+	}
+
+	/* refresh with updated inode contents */
+	set_cur_inode(iocur_top->ino);
+
+out:
+	if (args.dp)
+		libxfs_irele(args.dp);
+	xfree(args.value);
+out_free_value:
+	xfree(value);
+out_free_name:
+	xfree(name);
+	return 0;
+}
diff --git a/db/write.c b/db/write.c
index 96dea70519ba..9295dbc92a40 100644
--- a/db/write.c
+++ b/db/write.c
@@ -511,7 +511,7 @@ convert_oct(
  * are adjusted in the buffer so that the first input bit is to be be written to
  * the first bit in the output.
  */
-static char *
+char *
 convert_arg(
 	char		*arg,
 	int		bit_length)
diff --git a/db/write.h b/db/write.h
index e24e07d4c464..4ba04d0300fb 100644
--- a/db/write.h
+++ b/db/write.h
@@ -6,6 +6,7 @@
 
 struct field;
 
+extern char	*convert_arg(char *arg, int bit_length);
 extern void	write_init(void);
 extern void	write_block(const field_t *fields, int argc, char **argv);
 extern void	write_struct(const field_t *fields, int argc, char **argv);

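For readers skimming the patch above: the heart of attr_modify_f() is a
bounds-checked overlay.  It fetches the existing value, rejects the request
if offset + new length would run past the end of the old value, and then
copies the new bytes in place.  Below is a minimal standalone sketch of just
that step.  overlay_value() is a hypothetical helper name that does not
exist in xfs_db, and it works on plain buffers rather than on
struct xfs_da_args.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of xfs_db): overwrite part of an existing
 * attr value in place.  Returns 0 on success or -1 if the new bytes would
 * extend past the end of the old value, which mirrors the
 * "valuelen + offset > args.valuelen" check in attr_modify_f().
 */
int
overlay_value(
	uint8_t		*old,
	size_t		oldlen,
	const uint8_t	*src,
	size_t		srclen,
	size_t		offset)
{
	assert(old != NULL && src != NULL);

	/* Written this way so that offset + srclen cannot overflow. */
	if (offset > oldlen || srclen > oldlen - offset)
		return -1;

	memcpy(old + offset, src, srclen);
	return 0;
}
```

The value can never grow through this path, which seems to be the point of
the command: it patches bytes inside an attr without changing its size.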


* [PATCH 18/38] xfs_db: add ATTR_PARENT support to attr_modify command
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (16 preceding siblings ...)
  2024-04-30  3:35   ` [PATCH 17/38] xfs_db: introduce attr_modify command Darrick J. Wong
@ 2024-04-30  3:35   ` Darrick J. Wong
  2024-04-30  3:35   ` [PATCH 19/38] xfs_db: make attr_set/remove/modify be able to handle fs-verity attrs Darrick J. Wong
                     ` (19 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:35 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Add the parent namespace to the attr_modify command.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/attrset.c |   13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)


diff --git a/db/attrset.c b/db/attrset.c
index cfd6d9c1c954..915c20f8beb8 100644
--- a/db/attrset.c
+++ b/db/attrset.c
@@ -445,20 +445,23 @@ attr_modify_f(
 		return 0;
 	}
 
-	while ((c = getopt(argc, argv, "rusnv:o:m:")) != EOF) {
+	while ((c = getopt(argc, argv, "ruspnv:o:m:")) != EOF) {
 		switch (c) {
 		/* namespaces */
 		case 'r':
+			args.attr_filter &= ~LIBXFS_ATTR_NS;
 			args.attr_filter |= LIBXFS_ATTR_ROOT;
-			args.attr_filter &= ~LIBXFS_ATTR_SECURE;
 			break;
 		case 'u':
-			args.attr_filter &= ~(LIBXFS_ATTR_ROOT |
-					      LIBXFS_ATTR_SECURE);
+			args.attr_filter &= ~LIBXFS_ATTR_NS;
 			break;
 		case 's':
+			args.attr_filter &= ~LIBXFS_ATTR_NS;
 			args.attr_filter |= LIBXFS_ATTR_SECURE;
-			args.attr_filter &= ~LIBXFS_ATTR_ROOT;
+			break;
+		case 'p':
+			args.attr_filter &= ~LIBXFS_ATTR_NS;
+			args.attr_filter |= XFS_ATTR_PARENT;
 			break;
 
 		case 'n':



* [PATCH 19/38] xfs_db: make attr_set/remove/modify be able to handle fs-verity attrs
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (17 preceding siblings ...)
  2024-04-30  3:35   ` [PATCH 18/38] xfs_db: add ATTR_PARENT support to " Darrick J. Wong
@ 2024-04-30  3:35   ` Darrick J. Wong
  2024-04-30  3:36   ` [PATCH 20/38] man: document attr_modify command Darrick J. Wong
                     ` (18 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:35 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Andrey Albershteyn <aalbersh@redhat.com>

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 db/attrset.c             |   28 +++++++++++++++++++++-------
 libxfs/libxfs_api_defs.h |    1 +
 2 files changed, 22 insertions(+), 7 deletions(-)


diff --git a/db/attrset.c b/db/attrset.c
index 915c20f8beb8..477ea7cb29c1 100644
--- a/db/attrset.c
+++ b/db/attrset.c
@@ -26,15 +26,15 @@ static void		attrset_help(void);
 
 static const cmdinfo_t	attr_set_cmd =
 	{ "attr_set", "aset", attr_set_f, 1, -1, 0,
-	  N_("[-r|-s|-u|-p] [-n] [-R|-C] [-v n] name"),
+	  N_("[-r|-s|-u|-p|-f] [-n] [-R|-C] [-v n] name"),
 	  N_("set the named attribute on the current inode"), attrset_help };
 static const cmdinfo_t	attr_remove_cmd =
 	{ "attr_remove", "aremove", attr_remove_f, 1, -1, 0,
-	  N_("[-r|-s|-u|-p] [-n] name"),
+	  N_("[-r|-s|-u|-p|-f] [-n] name"),
 	  N_("remove the named attribute from the current inode"), attrset_help };
 static const cmdinfo_t	attr_modify_cmd =
 	{ "attr_modify", "amodify", attr_modify_f, 1, -1, 0,
-	  N_("[-r|-s|-u] [-o n] [-v n] [-m n] name value"),
+	  N_("[-r|-s|-u|-p|-f] [-o n] [-v n] [-m n] name value"),
 	  N_("modify value of the named attribute of the current inode"),
 		attrset_help };
 
@@ -53,6 +53,7 @@ attrset_help(void)
 "  -u -- 'user'		(default)\n"
 "  -s -- 'secure'\n"
 "  -p -- 'parent'\n"
+"  -f -- 'fs-verity'\n"
 "\n"
 " For attr_set, these options further define the type of set operation:\n"
 "  -C -- 'create'    - create attribute, fail if it already exists\n"
@@ -116,7 +117,8 @@ get_buf_from_file(
 
 #define LIBXFS_ATTR_NS		(LIBXFS_ATTR_SECURE | \
 				 LIBXFS_ATTR_ROOT | \
-				 LIBXFS_ATTR_PARENT)
+				 LIBXFS_ATTR_PARENT | \
+				 LIBXFS_ATTR_VERITY)
 
 static int
 attr_set_f(
@@ -144,7 +146,7 @@ attr_set_f(
 		return 0;
 	}
 
-	while ((c = getopt(argc, argv, "ruspCRnN:v:V:")) != EOF) {
+	while ((c = getopt(argc, argv, "fruspCRnN:v:V:")) != EOF) {
 		switch (c) {
 		/* namespaces */
 		case 'r':
@@ -162,6 +164,10 @@ attr_set_f(
 			args.attr_filter &= ~LIBXFS_ATTR_NS;
 			args.attr_filter |= XFS_ATTR_PARENT;
 			break;
+		case 'f':
+			args.attr_filter &= ~LIBXFS_ATTR_NS;
+			args.attr_filter |= LIBXFS_ATTR_VERITY;
+			break;
 
 		/* modifiers */
 		case 'C':
@@ -317,7 +323,7 @@ attr_remove_f(
 		return 0;
 	}
 
-	while ((c = getopt(argc, argv, "ruspnN:")) != EOF) {
+	while ((c = getopt(argc, argv, "fruspnN:")) != EOF) {
 		switch (c) {
 		/* namespaces */
 		case 'r':
@@ -335,6 +341,10 @@ attr_remove_f(
 			args.attr_filter &= ~LIBXFS_ATTR_NS;
 			args.attr_filter |= XFS_ATTR_PARENT;
 			break;
+		case 'f':
+			args.attr_filter &= ~LIBXFS_ATTR_NS;
+			args.attr_filter |= LIBXFS_ATTR_VERITY;
+			break;
 
 		case 'N':
 			name_from_file = optarg;
@@ -445,7 +455,7 @@ attr_modify_f(
 		return 0;
 	}
 
-	while ((c = getopt(argc, argv, "ruspnv:o:m:")) != EOF) {
+	while ((c = getopt(argc, argv, "fruspnv:o:m:")) != EOF) {
 		switch (c) {
 		/* namespaces */
 		case 'r':
@@ -463,6 +473,10 @@ attr_modify_f(
 			args.attr_filter &= ~LIBXFS_ATTR_NS;
 			args.attr_filter |= XFS_ATTR_PARENT;
 			break;
+		case 'f':
+			args.attr_filter &= ~LIBXFS_ATTR_NS;
+			args.attr_filter |= LIBXFS_ATTR_VERITY;
+			break;
 
 		case 'n':
 			/*
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 1b6efac9290d..6ad728af2e0a 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -16,6 +16,7 @@
 #define LIBXFS_ATTR_ROOT		XFS_ATTR_ROOT
 #define LIBXFS_ATTR_SECURE		XFS_ATTR_SECURE
 #define LIBXFS_ATTR_PARENT		XFS_ATTR_PARENT
+#define LIBXFS_ATTR_VERITY		XFS_ATTR_VERITY
 
 #define xfs_agfl_size			libxfs_agfl_size
 #define xfs_agfl_walk			libxfs_agfl_walk

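The namespace options in the three commands all follow the same pattern
after this patch: clear every namespace bit, then set at most one.  That is
what makes the options mutually exclusive, with the last one on the command
line winning.  A standalone sketch of the pattern follows.  The DEMO_* flag
values and demo_select_ns() are illustrative assumptions only; the real
values come from the libxfs headers.

```c
#include <assert.h>

/* Illustrative flag values only; the real ones live in the libxfs headers. */
#define DEMO_ATTR_ROOT		(1u << 1)
#define DEMO_ATTR_SECURE	(1u << 2)
#define DEMO_ATTR_PARENT	(1u << 3)
#define DEMO_ATTR_VERITY	(1u << 4)
#define DEMO_ATTR_NS		(DEMO_ATTR_ROOT | DEMO_ATTR_SECURE | \
				 DEMO_ATTR_PARENT | DEMO_ATTR_VERITY)

/*
 * Hypothetical helper: apply one namespace option letter to a filter word.
 * Non-namespace bits are preserved; unknown letters leave the filter alone.
 */
unsigned int
demo_select_ns(
	unsigned int	filter,
	int		opt)
{
	unsigned int	ns;

	switch (opt) {
	case 'r': ns = DEMO_ATTR_ROOT; break;
	case 's': ns = DEMO_ATTR_SECURE; break;
	case 'p': ns = DEMO_ATTR_PARENT; break;
	case 'f': ns = DEMO_ATTR_VERITY; break;
	case 'u': ns = 0; break;	/* user namespace: no bit set */
	default:
		return filter;
	}

	assert((ns & ~DEMO_ATTR_NS) == 0);
	return (filter & ~DEMO_ATTR_NS) | ns;
}
```

Because the mask is defined in one place, adding a namespace only requires
extending the mask and adding one case, which is what this patch does for
the verity namespace.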


* [PATCH 20/38] man: document attr_modify command
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (18 preceding siblings ...)
  2024-04-30  3:35   ` [PATCH 19/38] xfs_db: make attr_set/remove/modify be able to handle fs-verity attrs Darrick J. Wong
@ 2024-04-30  3:36   ` Darrick J. Wong
  2024-04-30  3:36   ` [PATCH 21/38] xfs_db: create hex string as a field type Darrick J. Wong
                     ` (17 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:36 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong
  Cc: Darrick J. Wong, linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@djwong.org>

Add some documentation for the new attr_modify command.  I'm not sure
what all this is supposed to do, but there needs to be /something/ to
satisfy the documentation tests.

Signed-off-by: Darrick J. Wong <djwong@djwong.org>
---
 man/man8/xfs_db.8 |   42 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)


diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index 701035cb986d..2c5aed2cf38c 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -206,7 +206,45 @@ Displays the length, free block count, per-AG reservation size, and per-AG
 reservation usage for a given AG.
 If no argument is given, display information for all AGs.
 .TP
-.BI "attr_remove [\-p|\-r|\-u|\-s] [\-n] [\-N " namefile "|" name "] "
+.BI "attr_modify [\-p|\-r|\-u|\-s|\-f] [\-o n] [\-v n] [\-m n] name value"
+Modifies an extended attribute on the current file with the given name.
+
+If the
+.B name
+is a string that can be converted into an integer value, it will be.
+.RS 1.0i
+.TP 0.4i
+.B \-p
+Sets the attribute in the parent namespace.
+Only one namespace option can be specified.
+.TP
+.B \-r
+Sets the attribute in the root namespace.
+Only one namespace option can be specified.
+.TP
+.B \-u
+Sets the attribute in the user namespace.
+Only one namespace option can be specified.
+.TP
+.B \-s
+Sets the attribute in the secure namespace.
+Only one namespace option can be specified.
+.TP
+.B \-f
+Sets the attribute in the verity namespace.
+Only one namespace option can be specified.
+.TP
+.B \-m
+Length of the attr name.
+.TP
+.B \-o
+Offset into the attr value to place the new contents.
+.TP
+.B \-v
+Length of the attr value.
+.RE
+.TP
+.BI "attr_remove [\-p|\-r|\-u|\-s|\-f] [\-n] [\-N " namefile "|" name "] "
 Remove the specified extended attribute from the current file.
 .RS 1.0i
 .TP 0.4i
@@ -233,7 +271,7 @@ Read the name from this file.
 Do not enable 'noattr2' mode on V4 filesystems.
 .RE
 .TP
-.BI "attr_set [\-p\-r|\-u|\-s] [\-n] [\-R|\-C] [\-v " valuelen "|\-V " valuefile "] [\-N " namefile "|" name "] "
+.BI "attr_set [\-p|\-r|\-u|\-s|\-f] [\-n] [\-R|\-C] [\-v " valuelen "|\-V " valuefile "] [\-N " namefile "|" name "] "
 Sets an extended attribute on the current file with the given name.
 .RS 1.0i
 .TP 0.4i



* [PATCH 21/38] xfs_db: create hex string as a field type
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (19 preceding siblings ...)
  2024-04-30  3:36   ` [PATCH 20/38] man: document attr_modify command Darrick J. Wong
@ 2024-04-30  3:36   ` Darrick J. Wong
  2024-04-30  3:36   ` [PATCH 22/38] xfs_db: dump verity features and metadata Darrick J. Wong
                     ` (16 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:36 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Define a field type for hex strings so that we can print things such as:

file_digest = deadbeef31337023aaaa

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/field.c  |    2 ++
 db/field.h  |    1 +
 db/fprint.c |   24 ++++++++++++++++++++++++
 db/fprint.h |    2 ++
 4 files changed, 29 insertions(+)


diff --git a/db/field.c b/db/field.c
index d5879f4ada7d..066239ae6073 100644
--- a/db/field.c
+++ b/db/field.c
@@ -158,6 +158,8 @@ const ftattr_t	ftattrtab[] = {
 	{ FLDT_CHARNS, "charns", fp_charns, NULL, SI(bitsz(char)), 0, NULL,
 	  NULL },
 	{ FLDT_CHARS, "chars", fp_num, "%c", SI(bitsz(char)), 0, NULL, NULL },
+	{ FLDT_HEXSTRING, "hexstring", fp_hexstr, NULL, SI(bitsz(char)), 0, NULL,
+	  NULL },
 	{ FLDT_REXTLEN, "rextlen", fp_num, "%u", SI(RMAPBT_BLOCKCOUNT_BITLEN),
 	  0, NULL, NULL },
 	{ FLDT_RFILEOFFD, "rfileoffd", fp_num, "%llu", SI(RMAPBT_OFFSET_BITLEN),
diff --git a/db/field.h b/db/field.h
index f1b4f4e217de..89752d07b84c 100644
--- a/db/field.h
+++ b/db/field.h
@@ -67,6 +67,7 @@ typedef enum fldt	{
 	FLDT_CFSBLOCK,
 	FLDT_CHARNS,
 	FLDT_CHARS,
+	FLDT_HEXSTRING,
 	FLDT_REXTLEN,
 	FLDT_RFILEOFFD,
 	FLDT_REXTFLG,
diff --git a/db/fprint.c b/db/fprint.c
index ac916d511e87..182e5b7cb27c 100644
--- a/db/fprint.c
+++ b/db/fprint.c
@@ -54,6 +54,30 @@ fp_charns(
 	return 1;
 }
 
+int
+fp_hexstr(
+	void	*obj,
+	int	bit,
+	int	count,
+	char	*fmtstr,
+	int	size,
+	int	arg,
+	int	base,
+	int	array)
+{
+	int	i;
+	char	*p;
+
+	ASSERT(bitoffs(bit) == 0);
+	ASSERT(size == bitsz(char));
+	for (i = 0, p = (char *)obj + byteize(bit);
+	     i < count && !seenint();
+	     i++, p++) {
+		dbprintf("%02x", *p & 0xff);
+	}
+	return 1;
+}
+
 int
 fp_num(
 	void		*obj,
diff --git a/db/fprint.h b/db/fprint.h
index a1ea935ca531..348e04215588 100644
--- a/db/fprint.h
+++ b/db/fprint.h
@@ -9,6 +9,8 @@ typedef int (*prfnc_t)(void *obj, int bit, int count, char *fmtstr, int size,
 
 extern int	fp_charns(void *obj, int bit, int count, char *fmtstr, int size,
 			  int arg, int base, int array);
+extern int	fp_hexstr(void *obj, int bit, int count, char *fmtstr, int size,
+			  int arg, int base, int array);
 extern int	fp_num(void *obj, int bit, int count, char *fmtstr, int size,
 		       int arg, int base, int array);
 extern int	fp_sarray(void *obj, int bit, int count, char *fmtstr, int size,

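To illustrate what the new field type prints, here is a standalone sketch of
the same byte-to-hex loop that formats into a caller-supplied buffer instead
of calling dbprintf().  demo_hexstr() is a hypothetical name and is not part
of xfs_db.  The "& 0xff" mask matters because plain char may be signed, and
without it a byte such as 0xde would be sign-extended before printing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format count bytes of obj as lowercase hex into out.
 * Output is truncated to whole bytes if out is too small, and out is always
 * NUL-terminated when outlen > 0.  Returns out.
 */
char *
demo_hexstr(
	const void	*obj,
	size_t		count,
	char		*out,
	size_t		outlen)
{
	const char	*p = obj;
	size_t		i;

	assert(obj != NULL && out != NULL);

	if (outlen == 0)
		return out;
	out[0] = '\0';

	/* Each byte needs two hex digits plus room for the trailing NUL. */
	for (i = 0; i < count && 2 * i + 3 <= outlen; i++)
		snprintf(out + 2 * i, 3, "%02x",
			 (unsigned int)(p[i] & 0xff));
	return out;
}
```

Feeding it the ten bytes de ad be ef 31 33 70 23 aa aa with a 21-byte buffer
yields the string shown in the commit message.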


* [PATCH 22/38] xfs_db: dump verity features and metadata
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (20 preceding siblings ...)
  2024-04-30  3:36   ` [PATCH 21/38] xfs_db: create hex string as a field type Darrick J. Wong
@ 2024-04-30  3:36   ` Darrick J. Wong
  2024-04-30  3:36   ` [PATCH 23/38] xfs_db: dump merkle tree data Darrick J. Wong
                     ` (15 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:36 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Teach the debugger how to decode the merkle tree block number in the
attr name, and to display the fact that this is a verity filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/sb.c |    2 ++
 1 file changed, 2 insertions(+)


diff --git a/db/sb.c b/db/sb.c
index cf5251cd728f..e4ca8f72ae97 100644
--- a/db/sb.c
+++ b/db/sb.c
@@ -857,6 +857,8 @@ version_string(
 		strcat(s, ",METADIR");
 	if (xfs_has_rtgroups(mp))
 		strcat(s, ",RTGROUPS");
+	if (xfs_has_verity(mp))
+		strcat(s, ",VERITY");
 	return s;
 }
 



* [PATCH 23/38] xfs_db: dump merkle tree data
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (21 preceding siblings ...)
  2024-04-30  3:36   ` [PATCH 22/38] xfs_db: dump verity features and metadata Darrick J. Wong
@ 2024-04-30  3:36   ` Darrick J. Wong
  2024-04-30  3:37   ` [PATCH 24/38] xfs_db: dump the verity descriptor Darrick J. Wong
                     ` (14 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:36 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Teach the debugger to dump the specific fields in the fsverity xattr
blocks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/attr.c      |  189 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 db/attrshort.c |   50 +++++++++++++++
 2 files changed, 237 insertions(+), 2 deletions(-)


diff --git a/db/attr.c b/db/attr.c
index 8e2bce7b7e02..7d8bdeb53032 100644
--- a/db/attr.c
+++ b/db/attr.c
@@ -35,6 +35,12 @@ static int	attr3_remote_data_count(void *obj, int startoff);
 
 static int	attr_leaf_value_pptr_count(void *obj, int startoff);
 
+static bool	is_verity_file(void);
+static int	attr3_remote_merkledata_count(void *obj, int startoff);
+static int	attr_leaf_name_local_merkledata_count(void *obj, int startoff);
+static int	attr_leaf_name_local_merkleoff_count(void *obj, int startoff);
+static int	attr_leaf_name_remote_merkleoff_count(void *obj, int startoff);
+
 const field_t	attr_hfld[] = {
 	{ "", FLDT_ATTR, OI(0), C1, 0, TYP_NONE },
 	{ NULL }
@@ -87,6 +93,9 @@ const field_t	attr_leaf_entry_flds[] = {
 	{ "parent", FLDT_UINT1,
 	  OI(LEOFF(flags) + bitsz(uint8_t) - XFS_ATTR_PARENT_BIT - 1), C1, 0,
 	  TYP_NONE },
+	{ "verity", FLDT_UINT1,
+	  OI(LEOFF(flags) + bitsz(uint8_t) - XFS_ATTR_VERITY_BIT - 1), C1, 0,
+	  TYP_NONE },
 	{ "pad2", FLDT_UINT8X, OI(LEOFF(pad2)), C1, FLD_SKIPALL, TYP_NONE },
 	{ NULL }
 };
@@ -113,6 +122,10 @@ const field_t	attr_leaf_map_flds[] = {
 
 #define	LNOFF(f)	bitize(offsetof(xfs_attr_leaf_name_local_t, f))
 #define	LVOFF(f)	bitize(offsetof(xfs_attr_leaf_name_remote_t, f))
+#define	MKLOFF(f)	bitize(offsetof(xfs_attr_leaf_name_local_t, nameval) + \
+			       offsetof(struct xfs_merkle_key, f))
+#define	MKROFF(f)	bitize(offsetof(xfs_attr_leaf_name_remote_t, name) + \
+			       offsetof(struct xfs_merkle_key, f))
 const field_t	attr_leaf_name_flds[] = {
 	{ "valuelen", FLDT_UINT16D, OI(LNOFF(valuelen)),
 	  attr_leaf_name_local_count, FLD_COUNT, TYP_NONE },
@@ -122,8 +135,12 @@ const field_t	attr_leaf_name_flds[] = {
 	  attr_leaf_name_local_name_count, FLD_COUNT, TYP_NONE },
 	{ "parent_dir", FLDT_PARENT_REC, attr_leaf_name_local_value_offset,
 	  attr_leaf_value_pptr_count, FLD_COUNT | FLD_OFFSET, TYP_NONE },
+	{ "merkle_pos", FLDT_UINT64X, OI(MKLOFF(mk_pos)),
+	  attr_leaf_name_local_merkleoff_count, FLD_COUNT, TYP_NONE },
 	{ "value", FLDT_CHARNS, attr_leaf_name_local_value_offset,
 	  attr_leaf_name_local_value_count, FLD_COUNT|FLD_OFFSET, TYP_NONE },
+	{ "merkle_data", FLDT_HEXSTRING, attr_leaf_name_local_value_offset,
+	  attr_leaf_name_local_merkledata_count, FLD_COUNT|FLD_OFFSET, TYP_NONE },
 	{ "valueblk", FLDT_UINT32X, OI(LVOFF(valueblk)),
 	  attr_leaf_name_remote_count, FLD_COUNT, TYP_NONE },
 	{ "valuelen", FLDT_UINT32D, OI(LVOFF(valuelen)),
@@ -132,6 +149,8 @@ const field_t	attr_leaf_name_flds[] = {
 	  attr_leaf_name_remote_count, FLD_COUNT, TYP_NONE },
 	{ "name", FLDT_CHARNS, OI(LVOFF(name)),
 	  attr_leaf_name_remote_name_count, FLD_COUNT, TYP_NONE },
+	{ "merkle_pos", FLDT_UINT64X, OI(MKROFF(mk_pos)),
+	  attr_leaf_name_remote_merkleoff_count, FLD_COUNT, TYP_NONE },
 	{ NULL }
 };
 
@@ -265,7 +284,19 @@ __attr_leaf_name_local_count(
 	struct xfs_attr_leaf_entry      *e,
 	int				i)
 {
-	return (e->flags & XFS_ATTR_LOCAL) != 0;
+	struct xfs_attr_leaf_name_local	*l;
+
+	if (!(e->flags & XFS_ATTR_LOCAL))
+		return 0;
+
+	if ((e->flags & XFS_ATTR_NSP_ONDISK_MASK) == XFS_ATTR_VERITY) {
+		l = xfs_attr3_leaf_name_local(leaf, i);
+
+		if (l->namelen == sizeof(struct xfs_merkle_key))
+			return 0;
+	}
+
+	return 1;
 }
 
 static int
@@ -289,6 +320,10 @@ __attr_leaf_name_local_name_count(
 		return 0;
 
 	l = xfs_attr3_leaf_name_local(leaf, i);
+	if ((e->flags & XFS_ATTR_NSP_ONDISK_MASK) == XFS_ATTR_VERITY &&
+	    l->namelen == sizeof(struct xfs_merkle_key))
+		return 0;
+
 	return l->namelen;
 }
 
@@ -311,7 +346,8 @@ __attr_leaf_name_local_value_count(
 
 	if (!(e->flags & XFS_ATTR_LOCAL))
 		return 0;
-	if ((e->flags & XFS_ATTR_NSP_ONDISK_MASK) == XFS_ATTR_PARENT)
+	if ((e->flags & XFS_ATTR_NSP_ONDISK_MASK) == XFS_ATTR_PARENT ||
+	    (e->flags & XFS_ATTR_NSP_ONDISK_MASK) == XFS_ATTR_VERITY)
 		return 0;
 
 	l = xfs_attr3_leaf_name_local(leaf, i);
@@ -382,6 +418,10 @@ __attr_leaf_name_remote_name_count(
 		return 0;
 
 	r = xfs_attr3_leaf_name_remote(leaf, i);
+	if ((e->flags & XFS_ATTR_NSP_ONDISK_MASK) == XFS_ATTR_VERITY &&
+	    r->namelen == sizeof(struct xfs_merkle_key))
+		return 0;
+
 	return r->namelen;
 }
 
@@ -542,6 +582,141 @@ attr_leaf_value_pptr_count(
 	return attr_leaf_entry_walk(obj, startoff, __leaf_pptr_count);
 }
 
+/*
+ * Is the current file a verity file?  This is a kludge for handling merkle
+ * tree blocks stored in a XFS_ATTR_VERITY attr's remote value block because we
+ * can't access the leaf entry to find out if the attr is actually a verity
+ * attr.
+ */
+static bool
+is_verity_file(void)
+{
+	struct xfs_inode	*ip;
+	bool			ret = false;
+
+	if (iocur_top->ino == 0 || iocur_top->ino == NULLFSINO)
+		return false;
+
+	if (!xfs_has_verity(mp))
+		return false;
+
+	ret = -libxfs_iget(mp, NULL, iocur_top->ino, 0, &ip);
+	if (ret)
+		return false;
+
+	if (ip->i_diflags2 & XFS_DIFLAG2_VERITY)
+		ret = true;
+
+	libxfs_irele(ip);
+	return ret;
+}
+
+static int
+attr3_remote_merkledata_count(
+	void				*obj,
+	int				startoff)
+{
+	struct xfs_attr3_leaf_hdr	*lhdr = obj;
+	struct xfs_attr3_rmt_hdr	*rhdr = obj;
+
+	if (rhdr->rm_magic == cpu_to_be32(XFS_ATTR3_RMT_MAGIC) ||
+	    lhdr->info.hdr.magic == cpu_to_be16(XFS_DA_NODE_MAGIC) ||
+	    lhdr->info.hdr.magic == cpu_to_be16(XFS_DA3_NODE_MAGIC) ||
+	    lhdr->info.hdr.magic == cpu_to_be16(XFS_ATTR_LEAF_MAGIC) ||
+	    lhdr->info.hdr.magic == cpu_to_be16(XFS_ATTR3_LEAF_MAGIC))
+		return 0;
+
+	if (startoff != 0 || !is_verity_file())
+		return 0;
+
+	return mp->m_sb.sb_blocksize;
+}
+
+static int
+__leaf_local_merkledata_count(
+	struct xfs_attr_leafblock	*leaf,
+	struct xfs_attr_leaf_entry      *e,
+	int				i)
+{
+	struct xfs_attr_leaf_name_local	*l;
+
+	if (!(e->flags & XFS_ATTR_LOCAL))
+		return 0;
+	if ((e->flags & XFS_ATTR_NSP_ONDISK_MASK) == XFS_ATTR_PARENT)
+		return 0;
+
+	l = xfs_attr3_leaf_name_local(leaf, i);
+	if ((e->flags & XFS_ATTR_NSP_ONDISK_MASK) == XFS_ATTR_VERITY &&
+	    l->namelen == sizeof(struct xfs_merkle_key))
+		return be16_to_cpu(l->valuelen);
+
+	return 0;
+}
+
+static int
+attr_leaf_name_local_merkledata_count(
+	void				*obj,
+	int				startoff)
+{
+	return attr_leaf_entry_walk(obj, startoff, __leaf_local_merkledata_count);
+}
+
+static int
+__leaf_local_merkleoff_count(
+	struct xfs_attr_leafblock	*leaf,
+	struct xfs_attr_leaf_entry      *e,
+	int				i)
+{
+	struct xfs_attr_leaf_name_local	*l;
+
+	if ((e->flags & XFS_ATTR_NSP_ONDISK_MASK) != XFS_ATTR_VERITY)
+		return 0;
+	if (!(e->flags & XFS_ATTR_LOCAL))
+		return 0;
+
+	l = xfs_attr3_leaf_name_local(leaf, i);
+	if (l->namelen != sizeof(struct xfs_merkle_key))
+		return 0;
+
+	return 1;
+}
+
+static int
+attr_leaf_name_local_merkleoff_count(
+	void				*obj,
+	int				startoff)
+{
+	return attr_leaf_entry_walk(obj, startoff, __leaf_local_merkleoff_count);
+}
+
+static int
+__leaf_remote_merkleoff_count(
+	struct xfs_attr_leafblock	*leaf,
+	struct xfs_attr_leaf_entry      *e,
+	int				i)
+{
+	struct xfs_attr_leaf_name_remote	*r;
+
+	if ((e->flags & XFS_ATTR_NSP_ONDISK_MASK) != XFS_ATTR_VERITY)
+		return 0;
+	if (e->flags & XFS_ATTR_LOCAL)
+		return 0;
+
+	r = xfs_attr3_leaf_name_remote(leaf, i);
+	if (r->namelen != sizeof(struct xfs_merkle_key))
+		return 0;
+
+	return 1;
+}
+
+static int
+attr_leaf_name_remote_merkleoff_count(
+	void				*obj,
+	int				startoff)
+{
+	return attr_leaf_entry_walk(obj, startoff, __leaf_remote_merkleoff_count);
+}
+
 int
 attr_size(
 	void	*obj,
@@ -570,6 +745,8 @@ const field_t	attr3_flds[] = {
 	  FLD_COUNT, TYP_NONE },
 	{ "data", FLDT_CHARNS, OI(bitize(sizeof(struct xfs_attr3_rmt_hdr))),
 	  attr3_remote_data_count, FLD_COUNT, TYP_NONE },
+	{ "merkle_data", FLDT_HEXSTRING, OI(0),
+	  attr3_remote_merkledata_count, FLD_COUNT, TYP_NONE },
 	{ "entries", FLDT_ATTR_LEAF_ENTRY, OI(L3OFF(entries)),
 	  attr3_leaf_entries_count, FLD_ARRAY|FLD_COUNT, TYP_NONE },
 	{ "btree", FLDT_ATTR_NODE_ENTRY, OI(N3OFF(__btree)),
@@ -652,6 +829,9 @@ xfs_attr3_set_crc(
 		xfs_buf_update_cksum(bp, XFS_ATTR3_RMT_CRC_OFF);
 		return;
 	default:
+		if (is_verity_file())
+			return;
+
 		dbprintf(_("Unknown attribute buffer type!\n"));
 		break;
 	}
@@ -687,6 +867,11 @@ xfs_attr3_db_read_verify(
 		bp->b_ops = &xfs_attr3_rmt_buf_ops;
 		break;
 	default:
+		if (is_verity_file()) {
+			bp->b_ops = &xfs_attr3_rmtverity_buf_ops;
+			goto verify;
+		}
+
 		dbprintf(_("Unknown attribute buffer type!\n"));
 		xfs_buf_ioerror(bp, -EFSCORRUPTED);
 		return;
diff --git a/db/attrshort.c b/db/attrshort.c
index 7e5c94ca533d..1d26a358335f 100644
--- a/db/attrshort.c
+++ b/db/attrshort.c
@@ -20,6 +20,9 @@ static int	attr_shortform_list_offset(void *obj, int startoff, int idx);
 
 static int	attr_sf_entry_pptr_count(void *obj, int startoff);
 
+static int	attr_sf_entry_merkleoff_count(void *obj, int startoff);
+static int	attr_sf_entry_merkledata_count(void *obj, int startoff);
+
 const field_t	attr_shortform_flds[] = {
 	{ "hdr", FLDT_ATTR_SF_HDR, OI(0), C1, 0, TYP_NONE },
 	{ "list", FLDT_ATTR_SF_ENTRY, attr_shortform_list_offset,
@@ -35,6 +38,8 @@ const field_t	attr_sf_hdr_flds[] = {
 };
 
 #define	EOFF(f)	bitize(offsetof(struct xfs_attr_sf_entry, f))
+#define	MKOFF(f) bitize(offsetof(struct xfs_attr_sf_entry, nameval) + \
+			offsetof(struct xfs_merkle_key, f))
 const field_t	attr_sf_entry_flds[] = {
 	{ "namelen", FLDT_UINT8D, OI(EOFF(namelen)), C1, 0, TYP_NONE },
 	{ "valuelen", FLDT_UINT8D, OI(EOFF(valuelen)), C1, 0, TYP_NONE },
@@ -48,10 +53,17 @@ const field_t	attr_sf_entry_flds[] = {
 	{ "parent", FLDT_UINT1,
 	  OI(EOFF(flags) + bitsz(uint8_t) - XFS_ATTR_PARENT_BIT - 1), C1, 0,
 	  TYP_NONE },
+	{ "verity", FLDT_UINT1,
+	  OI(EOFF(flags) + bitsz(uint8_t) - XFS_ATTR_VERITY_BIT - 1), C1, 0,
+	  TYP_NONE },
 	{ "name", FLDT_CHARNS, OI(EOFF(nameval)), attr_sf_entry_name_count,
 	  FLD_COUNT, TYP_NONE },
 	{ "parent_dir", FLDT_PARENT_REC, attr_sf_entry_value_offset,
 	  attr_sf_entry_pptr_count, FLD_COUNT | FLD_OFFSET, TYP_NONE },
+	{ "merkle_pos", FLDT_UINT64X, OI(MKOFF(mk_pos)),
+	  attr_sf_entry_merkleoff_count, FLD_COUNT, TYP_NONE },
+	{ "merkle_data", FLDT_HEXSTRING, attr_sf_entry_value_offset,
+	  attr_sf_entry_merkledata_count, FLD_COUNT | FLD_OFFSET, TYP_NONE },
 	{ "value", FLDT_CHARNS, attr_sf_entry_value_offset,
 	  attr_sf_entry_value_count, FLD_COUNT|FLD_OFFSET, TYP_NONE },
 	{ NULL }
@@ -100,6 +112,10 @@ attr_sf_entry_value_count(
 	if ((e->flags & XFS_ATTR_NSP_ONDISK_MASK) == XFS_ATTR_PARENT)
 		return 0;
 
+	if ((e->flags & XFS_ATTR_NSP_ONDISK_MASK) == XFS_ATTR_VERITY &&
+	    e->namelen == sizeof(struct xfs_merkle_key))
+		return 0;
+
 	return e->valuelen;
 }
 
@@ -183,3 +199,37 @@ attr_sf_entry_pptr_count(
 
 	return 1;
 }
+
+static int
+attr_sf_entry_merkleoff_count(
+	void				*obj,
+	int				startoff)
+{
+	struct xfs_attr_sf_entry	*e;
+
+	ASSERT(bitoffs(startoff) == 0);
+	e = (struct xfs_attr_sf_entry *)((char *)obj + byteize(startoff));
+	if ((e->flags & XFS_ATTR_NSP_ONDISK_MASK) != XFS_ATTR_VERITY)
+		return 0;
+
+	if (e->namelen != sizeof(struct xfs_merkle_key))
+		return 0;
+
+	return 1;
+}
+
+static int
+attr_sf_entry_merkledata_count(
+	void				*obj,
+	int				startoff)
+{
+	struct xfs_attr_sf_entry	*e;
+
+	ASSERT(bitoffs(startoff) == 0);
+	e = (struct xfs_attr_sf_entry *)((char *)obj + byteize(startoff));
+	if ((e->flags & XFS_ATTR_NSP_ONDISK_MASK) == XFS_ATTR_VERITY &&
+	    e->namelen == sizeof(struct xfs_merkle_key))
+		return e->valuelen;
+
+	return 0;
+}

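All of the *_merkleoff_count and *_merkledata_count helpers above apply the
same test: an attr is treated as a merkle tree block when it is in the
verity namespace and its name is exactly sizeof(struct xfs_merkle_key)
bytes long.  The sketch below shows that test and the decoding of the
position from the name.  Everything prefixed with demo_ or DEMO_ is a
stand-in: the flag values and the assumption that the key is a single
big-endian 64-bit position are mine and should be checked against the real
headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values only; not copied from the libxfs headers. */
#define DEMO_ATTR_LOCAL		(1u << 0)
#define DEMO_ATTR_ROOT		(1u << 1)
#define DEMO_ATTR_SECURE	(1u << 2)
#define DEMO_ATTR_PARENT	(1u << 3)
#define DEMO_ATTR_VERITY	(1u << 4)
#define DEMO_ATTR_NSP_MASK	(DEMO_ATTR_ROOT | DEMO_ATTR_SECURE | \
				 DEMO_ATTR_PARENT | DEMO_ATTR_VERITY)

/* Assumed layout: the attr name is one big-endian 64-bit tree position. */
struct demo_merkle_key {
	uint8_t		mk_pos[8];
};

/* Does this (flags, namelen) pair look like a merkle tree block attr? */
bool
demo_is_merkle_attr(
	uint8_t		flags,
	uint8_t		namelen)
{
	return (flags & DEMO_ATTR_NSP_MASK) == DEMO_ATTR_VERITY &&
	       namelen == sizeof(struct demo_merkle_key);
}

/* Decode the big-endian position stored in the attr name. */
uint64_t
demo_merkle_pos(
	const struct demo_merkle_key	*key)
{
	uint64_t	pos = 0;
	size_t		i;

	assert(key != NULL);
	for (i = 0; i < sizeof(key->mk_pos); i++)
		pos = (pos << 8) | key->mk_pos[i];
	return pos;
}
```

Any other verity attr, meaning one whose name has a different length, fails
the test and is shown with the ordinary name and value fields.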


* [PATCH 24/38] xfs_db: dump the verity descriptor
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (22 preceding siblings ...)
  2024-04-30  3:36   ` [PATCH 23/38] xfs_db: dump merkle tree data Darrick J. Wong
@ 2024-04-30  3:37   ` Darrick J. Wong
  2024-04-30  3:37   ` [PATCH 25/38] xfs_db: don't obfuscate verity xattrs Darrick J. Wong
                     ` (13 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:37 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Dump the fsverity descriptor if fsverity.h is present.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 configure.ac            |    1 +
 db/Makefile             |    4 ++++
 db/attr.c               |   31 +++++++++++++++++++++++++++++++
 db/attrshort.c          |   22 ++++++++++++++++++++--
 db/field.c              |   29 +++++++++++++++++++++++++++++
 db/field.h              |    3 +++
 include/builddefs.in    |    1 +
 include/platform_defs.h |    4 ++++
 m4/package_libcdev.m4   |   18 ++++++++++++++++++
 9 files changed, 111 insertions(+), 2 deletions(-)
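
For readers without linux/fsverity.h at hand, the vdesc fields added below
can be pictured with a small standalone decoder.  This is only a sketch: the
byte offsets are my reading of the UAPI struct fsverity_descriptor layout
(256 bytes in total), and vdesc_view, get_le64 and vdesc_decode are
hypothetical names that do not exist in xfs_db.  It illustrates why data_size
needs a little-endian field type while the first four fields are plain bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed byte offsets inside the 256-byte on-disk descriptor. */
enum {
	VD_OFF_VERSION		= 0,
	VD_OFF_HASH_ALGORITHM	= 1,
	VD_OFF_LOG_BLOCKSIZE	= 2,
	VD_OFF_SALT_SIZE	= 3,
	VD_OFF_DATA_SIZE	= 8,	/* little-endian 64-bit value */
	VD_SIZE			= 256,
};

/* Hypothetical in-memory view of the scalar descriptor fields. */
struct vdesc_view {
	uint8_t		version;
	uint8_t		hash_algorithm;
	uint8_t		log_blocksize;
	uint8_t		salt_size;
	uint64_t	data_size;
};

/* Decode a little-endian 64-bit value regardless of host endianness. */
static uint64_t
get_le64(const uint8_t *p)
{
	uint64_t	v = 0;
	int		i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Returns 0 on success, or -1 if the buffer cannot hold a descriptor. */
static int
vdesc_decode(const uint8_t *buf, size_t len, struct vdesc_view *out)
{
	assert(buf != NULL && out != NULL);

	if (len < VD_SIZE)
		return -1;

	out->version = buf[VD_OFF_VERSION];
	out->hash_algorithm = buf[VD_OFF_HASH_ALGORITHM];
	out->log_blocksize = buf[VD_OFF_LOG_BLOCKSIZE];
	out->salt_size = buf[VD_OFF_SALT_SIZE];
	out->data_size = get_le64(buf + VD_OFF_DATA_SIZE);
	return 0;
}
```

Decoding by explicit offsets keeps the sketch buildable even where the kernel
header is missing, which is the same situation the configure check in this
patch has to cope with.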


diff --git a/configure.ac b/configure.ac
index 1cb7d59c5582..ade0aca58418 100644
--- a/configure.ac
+++ b/configure.ac
@@ -223,6 +223,7 @@ fi
 AC_MANUAL_FORMAT
 AC_HAVE_LIBURCU_ATOMIC64
 AC_USE_RADIX_TREE_FOR_INUMS
+AC_HAVE_FSVERITY_DESCRIPTOR
 
 AC_CONFIG_FILES([include/builddefs])
 AC_OUTPUT
diff --git a/db/Makefile b/db/Makefile
index 02eeead25b49..9fe6fed727e1 100644
--- a/db/Makefile
+++ b/db/Makefile
@@ -78,6 +78,10 @@ LLDLIBS += $(LIBEDITLINE) $(LIBTERMCAP)
 CFLAGS += -DENABLE_EDITLINE
 endif
 
+ifeq ($(HAVE_FSVERITY_DESCR),yes)
+CFLAGS += -DHAVE_FSVERITY_DESCR
+endif
+
 default: depend $(LTCOMMAND)
 
 include $(BUILDRULES)
diff --git a/db/attr.c b/db/attr.c
index 7d8bdeb53032..e05243ff16fa 100644
--- a/db/attr.c
+++ b/db/attr.c
@@ -40,6 +40,7 @@ static int	attr3_remote_merkledata_count(void *obj, int startoff);
 static int	attr_leaf_name_local_merkledata_count(void *obj, int startoff);
 static int	attr_leaf_name_local_merkleoff_count(void *obj, int startoff);
 static int	attr_leaf_name_remote_merkleoff_count(void *obj, int startoff);
+static int	attr_leaf_vdesc_count(void *obj, int startoff);
 
 const field_t	attr_hfld[] = {
 	{ "", FLDT_ATTR, OI(0), C1, 0, TYP_NONE },
@@ -151,6 +152,8 @@ const field_t	attr_leaf_name_flds[] = {
 	  attr_leaf_name_remote_name_count, FLD_COUNT, TYP_NONE },
 	{ "merkle_pos", FLDT_UINT64X, OI(MKROFF(mk_pos)),
 	  attr_leaf_name_remote_merkleoff_count, FLD_COUNT, TYP_NONE },
+	{ "vdesc", FLDT_FSVERITY_DESCR, attr_leaf_name_local_value_offset,
+	  attr_leaf_vdesc_count, FLD_COUNT|FLD_OFFSET, TYP_NONE },
 	{ NULL }
 };
 
@@ -717,6 +720,34 @@ attr_leaf_name_remote_merkleoff_count(
 	return attr_leaf_entry_walk(obj, startoff, __leaf_remote_merkleoff_count);
 }
 
+static int
+__leaf_vdesc_count(
+	struct xfs_attr_leafblock	*leaf,
+	struct xfs_attr_leaf_entry      *e,
+	int				i)
+{
+	struct xfs_attr_leaf_name_local	*l;
+
+	if (!(e->flags & XFS_ATTR_LOCAL))
+		return 0;
+	if ((e->flags & XFS_ATTR_NSP_ONDISK_MASK) != XFS_ATTR_VERITY)
+		return 0;
+
+	l = xfs_attr3_leaf_name_local(leaf, i);
+	if (l->namelen != XFS_VERITY_DESCRIPTOR_NAME_LEN)
+		return 0;
+
+	return 1;
+}
+
+static int
+attr_leaf_vdesc_count(
+	void				*obj,
+	int				startoff)
+{
+	return attr_leaf_entry_walk(obj, startoff, __leaf_vdesc_count);
+}
+
 int
 attr_size(
 	void	*obj,
diff --git a/db/attrshort.c b/db/attrshort.c
index 1d26a358335f..4ff19d1284c8 100644
--- a/db/attrshort.c
+++ b/db/attrshort.c
@@ -22,6 +22,7 @@ static int	attr_sf_entry_pptr_count(void *obj, int startoff);
 
 static int	attr_sf_entry_merkleoff_count(void *obj, int startoff);
 static int	attr_sf_entry_merkledata_count(void *obj, int startoff);
+static int	attr_sf_entry_vdesc_count(void *obj, int startoff);
 
 const field_t	attr_shortform_flds[] = {
 	{ "hdr", FLDT_ATTR_SF_HDR, OI(0), C1, 0, TYP_NONE },
@@ -66,6 +67,8 @@ const field_t	attr_sf_entry_flds[] = {
 	  attr_sf_entry_merkledata_count, FLD_COUNT | FLD_OFFSET, TYP_NONE },
 	{ "value", FLDT_CHARNS, attr_sf_entry_value_offset,
 	  attr_sf_entry_value_count, FLD_COUNT|FLD_OFFSET, TYP_NONE },
+	{ "vdesc", FLDT_FSVERITY_DESCR, attr_sf_entry_value_offset,
+	  attr_sf_entry_vdesc_count, FLD_COUNT | FLD_OFFSET, TYP_NONE },
 	{ NULL }
 };
 
@@ -112,8 +115,7 @@ attr_sf_entry_value_count(
 	if ((e->flags & XFS_ATTR_NSP_ONDISK_MASK) == XFS_ATTR_PARENT)
 		return 0;
 
-	if ((e->flags & XFS_ATTR_NSP_ONDISK_MASK) == XFS_ATTR_VERITY &&
-	    e->namelen == sizeof(struct xfs_merkle_key))
+	if ((e->flags & XFS_ATTR_NSP_ONDISK_MASK) == XFS_ATTR_VERITY)
 		return 0;
 
 	return e->valuelen;
@@ -233,3 +235,19 @@ attr_sf_entry_merkledata_count(
 
 	return 0;
 }
+
+static int
+attr_sf_entry_vdesc_count(
+	void				*obj,
+	int				startoff)
+{
+	struct xfs_attr_sf_entry	*e;
+
+	ASSERT(bitoffs(startoff) == 0);
+	e = (struct xfs_attr_sf_entry *)((char *)obj + byteize(startoff));
+	if ((e->flags & XFS_ATTR_NSP_ONDISK_MASK) == XFS_ATTR_VERITY &&
+	    e->namelen == XFS_VERITY_DESCRIPTOR_NAME_LEN)
+		return 1;
+
+	return 0;
+}
diff --git a/db/field.c b/db/field.c
index 066239ae6073..4f9dafbee182 100644
--- a/db/field.c
+++ b/db/field.c
@@ -33,6 +33,25 @@ const field_t		parent_flds[] = {
 };
 #undef PPOFF
 
+#ifdef HAVE_FSVERITY_DESCR
+# define	OFF(f)	bitize(offsetof(struct fsverity_descriptor, f))
+const field_t	vdesc_flds[] = {
+	{ "version", FLDT_UINT8D, OI(OFF(version)), C1, 0, TYP_NONE },
+	{ "hash_algorithm", FLDT_UINT8D, OI(OFF(hash_algorithm)), C1, 0, TYP_NONE },
+	{ "log_blocksize", FLDT_UINT8D, OI(OFF(log_blocksize)), C1, 0, TYP_NONE },
+	{ "salt_size", FLDT_UINT8D, OI(OFF(salt_size)), C1, 0, TYP_NONE },
+	{ "data_size", FLDT_UINT64D_LE, OI(OFF(data_size)), C1, 0, TYP_NONE },
+	{ "root_hash", FLDT_HEXSTRING, OI(OFF(root_hash)), CI(64), 0, TYP_NONE },
+	{ "salt", FLDT_HEXSTRING, OI(OFF(salt)), CI(32), 0, TYP_NONE },
+	{ NULL }
+};
+# undef OFF
+#else
+const field_t	vdesc_flds[] = {
+	{ NULL }
+};
+#endif
+
 const ftattr_t	ftattrtab[] = {
 	{ FLDT_AGBLOCK, "agblock", fp_num, "%u", SI(bitsz(xfs_agblock_t)),
 	  FTARG_DONULL, fa_agblock, NULL },
@@ -440,6 +459,16 @@ const ftattr_t	ftattrtab[] = {
 	{ FLDT_RGSUMMARY, "rgsummary", NULL, (char *)rgsummary_flds,
 	  btblock_size, FTARG_SIZE, NULL, rgsummary_flds },
 
+	{ FLDT_UINT64D_LE, "uint64d_le", fp_num, "%llu", SI(bitsz(uint64_t)),
+	  FTARG_LE, NULL, NULL },
+
+#ifdef HAVE_FSVERITY_DESCR
+	{ FLDT_FSVERITY_DESCR, "verity", NULL, (char *)vdesc_flds,
+	  SI(bitsz(struct fsverity_descriptor)), 0, NULL, vdesc_flds },
+#else
+	{ FLDT_FSVERITY_DESCR, "verity", NULL, NULL, 0, 0, NULL, NULL },
+#endif
+
 	{ FLDT_ZZZ, NULL }
 };
 
diff --git a/db/field.h b/db/field.h
index 89752d07b84c..bc5426f47293 100644
--- a/db/field.h
+++ b/db/field.h
@@ -211,6 +211,9 @@ typedef enum fldt	{
 	FLDT_SUMINFO,
 	FLDT_RGSUMMARY,
 
+	FLDT_UINT64D_LE,
+	FLDT_FSVERITY_DESCR,
+
 	FLDT_ZZZ			/* mark last entry */
 } fldt_t;
 
diff --git a/include/builddefs.in b/include/builddefs.in
index 5a4008318c84..0e2974044a55 100644
--- a/include/builddefs.in
+++ b/include/builddefs.in
@@ -114,6 +114,7 @@ HAVE_UDEV = @have_udev@
 UDEV_RULE_DIR = @udev_rule_dir@
 HAVE_LIBURCU_ATOMIC64 = @have_liburcu_atomic64@
 USE_RADIX_TREE_FOR_INUMS = @use_radix_tree_for_inums@
+HAVE_FSVERITY_DESCR = @have_fsverity_descr@
 
 GCCFLAGS = -funsigned-char -fno-strict-aliasing -Wall
 #	   -Wbitwise -Wno-transparent-union -Wno-old-initializer -Wno-decl
diff --git a/include/platform_defs.h b/include/platform_defs.h
index 9c28e2744a8d..95f9df0d3d86 100644
--- a/include/platform_defs.h
+++ b/include/platform_defs.h
@@ -174,4 +174,8 @@ static inline size_t __ab_c_size(size_t a, size_t b, size_t c)
 # define barrier() __memory_barrier()
 #endif
 
+#ifdef HAVE_FSVERITY_DESCR
+# include <linux/fsverity.h>
+#endif
+
 #endif	/* __XFS_PLATFORM_DEFS_H__ */
diff --git a/m4/package_libcdev.m4 b/m4/package_libcdev.m4
index 711ff81f3332..1edf1fc12d6b 100644
--- a/m4/package_libcdev.m4
+++ b/m4/package_libcdev.m4
@@ -237,3 +237,21 @@ AC_DEFUN([AC_USE_RADIX_TREE_FOR_INUMS],
        AC_MSG_RESULT(yes)],[AC_MSG_RESULT(no)])
     AC_SUBST(use_radix_tree_for_inums)
   ])
+
+#
+# Check if linux/fsverity.h defines the verity descriptor
+#
+AC_DEFUN([AC_HAVE_FSVERITY_DESCRIPTOR],
+  [ AC_MSG_CHECKING([for fsverity_descriptor in linux/fsverity.h ])
+    AC_COMPILE_IFELSE(
+    [	AC_LANG_PROGRAM([[
+#include <linux/types.h>
+#include <linux/fsverity.h>
+	]], [[
+struct fsverity_descriptor m = { };
+	]])
+    ], have_fsverity_descr=yes
+       AC_MSG_RESULT(yes),
+       AC_MSG_RESULT(no))
+    AC_SUBST(have_fsverity_descr)
+  ])


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 25/38] xfs_db: don't obfuscate verity xattrs
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (23 preceding siblings ...)
  2024-04-30  3:37   ` [PATCH 24/38] xfs_db: dump the verity descriptor Darrick J. Wong
@ 2024-04-30  3:37   ` Darrick J. Wong
  2024-04-30  3:37   ` [PATCH 26/38] xfs_db: dump the inode verity flag Darrick J. Wong
                     ` (12 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:37 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Don't obfuscate fsverity metadata when performing a metadump.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/metadump.c |    8 ++++++++
 1 file changed, 8 insertions(+)


diff --git a/db/metadump.c b/db/metadump.c
index 23defaee929f..112d762a8c31 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -1448,6 +1448,8 @@ process_sf_attr(
 		if (asfep->flags & XFS_ATTR_PARENT) {
 			maybe_obfuscate_pptr(asfep->flags, name, namelen,
 					value, asfep->valuelen, is_meta);
+		} else if (asfep->flags & XFS_ATTR_VERITY) {
+			; /* never obfuscate verity metadata */
 		} else if (want_obfuscate_attr(asfep->flags, name, namelen,
 					value, asfep->valuelen, is_meta)) {
 			generate_obfuscated_name(0, asfep->namelen, name);
@@ -1843,6 +1845,8 @@ process_attr_block(
 				maybe_obfuscate_pptr(entry->flags, name,
 						local->namelen, value,
 						valuelen, is_meta);
+			} else if (entry->flags & XFS_ATTR_VERITY) {
+				; /* never obfuscate verity metadata */
 			} else if (want_obfuscate_attr(entry->flags, name,
 						local->namelen, value,
 						valuelen, is_meta)) {
@@ -1871,6 +1875,10 @@ process_attr_block(
 				/* do not obfuscate obviously busted pptr */
 				add_remote_vals(be32_to_cpu(remote->valueblk),
 						be32_to_cpu(remote->valuelen));
+			} else if (entry->flags & XFS_ATTR_VERITY) {
+				/* never obfuscate verity metadata */
+				add_remote_vals(be32_to_cpu(remote->valueblk),
+						be32_to_cpu(remote->valuelen));
 			} else if (want_obfuscate_dirents(is_meta)) {
 				generate_obfuscated_name(0, remote->namelen,
 							 &remote->name[0]);


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 26/38] xfs_db: dump the inode verity flag
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (24 preceding siblings ...)
  2024-04-30  3:37   ` [PATCH 25/38] xfs_db: don't obfuscate verity xattrs Darrick J. Wong
@ 2024-04-30  3:37   ` Darrick J. Wong
  2024-04-30  3:37   ` [PATCH 27/38] xfs_db: compute hashes of merkle tree blocks Darrick J. Wong
                     ` (11 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:37 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Display the verity iflag.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/inode.c |    3 +++
 1 file changed, 3 insertions(+)


diff --git a/db/inode.c b/db/inode.c
index d7ce7eb77365..6a6bb43dc15a 100644
--- a/db/inode.c
+++ b/db/inode.c
@@ -213,6 +213,9 @@ const field_t	inode_v3_flds[] = {
 	{ "metadir", FLDT_UINT1,
 	  OI(COFF(flags2) + bitsz(uint64_t) - XFS_DIFLAG2_METADIR_BIT-1), C1,
 	  0, TYP_NONE },
+	{ "verity", FLDT_UINT1,
+	  OI(COFF(flags2) + bitsz(uint64_t) - XFS_DIFLAG2_VERITY_BIT-1), C1,
+	  0, TYP_NONE },
 	{ NULL }
 };
 


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 27/38] xfs_db: compute hashes of merkle tree blocks
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (25 preceding siblings ...)
  2024-04-30  3:37   ` [PATCH 26/38] xfs_db: dump the inode verity flag Darrick J. Wong
@ 2024-04-30  3:37   ` Darrick J. Wong
  2024-04-30  3:38   ` [PATCH 28/38] xfs_repair: junk fsverity xattrs when unnecessary Darrick J. Wong
                     ` (10 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:37 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Compute the hash of verity merkle tree blocks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/hash.c                |   21 +++++++++++++++++++--
 include/libxfs.h         |    1 +
 libxfs/libxfs_api_defs.h |    1 +
 man/man8/xfs_db.8        |    5 +++++
 4 files changed, 26 insertions(+), 2 deletions(-)


diff --git a/db/hash.c b/db/hash.c
index ab9c435b545f..e88d7d326bb5 100644
--- a/db/hash.c
+++ b/db/hash.c
@@ -36,7 +36,7 @@ hash_help(void)
 " 'hash' prints out the calculated hash value for a string using the\n"
 "directory/attribute code hash function.\n"
 "\n"
-" Usage:  \"hash [-d|-p parent_ino] <string>\"\n"
+" Usage:  \"hash [-d|-p parent_ino|-m merkle_blkno] <string>\"\n"
 "\n"
 ));
 
@@ -46,6 +46,7 @@ enum hash_what {
 	ATTR,
 	DIRECTORY,
 	PPTR,
+	MERKLE,
 };
 
 /* ARGSUSED */
@@ -54,16 +55,28 @@ hash_f(
 	int		argc,
 	char		**argv)
 {
+	struct xfs_merkle_key mk = { };
 	xfs_ino_t	p_ino = 0;
 	xfs_dahash_t	hashval;
+	unsigned long long mk_pos;
 	enum hash_what	what = ATTR;
 	int		c;
 
-	while ((c = getopt(argc, argv, "dp:")) != EOF) {
+	while ((c = getopt(argc, argv, "dm:p:")) != EOF) {
 		switch (c) {
 		case 'd':
 			what = DIRECTORY;
 			break;
+		case 'm':
+			errno = 0;
+			mk_pos = strtoull(optarg, NULL, 0);
+			if (errno) {
+				perror(optarg);
+				return 1;
+			}
+			mk.mk_pos = cpu_to_be64(mk_pos << XFS_VERITY_HASH_SHIFT);
+			what = MERKLE;
+			break;
 		case 'p':
 			errno = 0;
 			p_ino = strtoull(optarg, NULL, 0);
@@ -97,6 +110,10 @@ hash_f(
 		case ATTR:
 			hashval = libxfs_attr_hashname(xname.name, xname.len);
 			break;
+		case MERKLE:
+			hashval = libxfs_verity_hashname((void *)&mk, sizeof(mk));
+			break;
+
 		}
 		dbprintf("0x%x\n", hashval);
 	}
diff --git a/include/libxfs.h b/include/libxfs.h
index b4c6a2882aa3..0c3f0be85565 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -100,6 +100,7 @@ struct iomap;
 #include "xfs_rtgroup.h"
 #include "xfs_rtrmap_btree.h"
 #include "xfs_ag_resv.h"
+#include "xfs_verity.h"
 
 #ifndef ARRAY_SIZE
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 6ad728af2e0a..d125e2679348 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -394,6 +394,7 @@
 #define xfs_verify_fsbno		libxfs_verify_fsbno
 #define xfs_verify_ino			libxfs_verify_ino
 #define xfs_verify_rtbno		libxfs_verify_rtbno
+#define xfs_verity_hashname		libxfs_verity_hashname
 #define xfs_zero_extent			libxfs_zero_extent
 
 /* Please keep this list alphabetized. */
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index 2c5aed2cf38c..deba4a6354aa 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -902,6 +902,11 @@ option is specified, the directory-specific hash function is used.
 This only makes a difference on filesystems with ascii case-insensitive
 lookups enabled.
 
+If the
+.B \-m
+option is specified, the merkle tree-specific hash function is used.
+The merkle tree block offset must be specified as an argument.
+
 If the
 .B \-p
 option is specified, the parent pointer-specific hash function is used.


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 28/38] xfs_repair: junk fsverity xattrs when unnecessary
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (26 preceding siblings ...)
  2024-04-30  3:37   ` [PATCH 27/38] xfs_db: compute hashes of merkle tree blocks Darrick J. Wong
@ 2024-04-30  3:38   ` Darrick J. Wong
  2024-04-30  3:38   ` [PATCH 29/38] xfs_repair: clear verity iflag when verity isn't supported Darrick J. Wong
                     ` (9 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:38 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Remove any fs-verity extended attributes when the filesystem doesn't
support fs-verity.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/attr_repair.c |   23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)


diff --git a/repair/attr_repair.c b/repair/attr_repair.c
index 227e5dbcd016..898eb3edfd12 100644
--- a/repair/attr_repair.c
+++ b/repair/attr_repair.c
@@ -334,6 +334,13 @@ process_shortform_attr(
 			junkit |= 1;
 		}
 
+		if ((currententry->flags & XFS_ATTR_VERITY) &&
+		    !xfs_has_verity(mp)) {
+			do_warn(
+ _("verity metadata found on filesystem that doesn't support verity\n"));
+			junkit |= 1;
+		}
+
 		remainingspace = remainingspace -
 					xfs_attr_sf_entsize(currententry);
 
@@ -543,6 +550,14 @@ process_leaf_attr_local(
 		return -1;
 	}
 
+	if ((entry->flags & XFS_ATTR_VERITY) && !xfs_has_verity(mp)) {
+		do_warn(
+ _("verity metadata found in attribute entry %d in attr block %u, inode %"
+   PRIu64 " on filesystem that doesn't support verity\n"),
+				i, da_bno, ino);
+		return -1;
+	}
+
 	return xfs_attr_leaf_entsize_local(local->namelen,
 						be16_to_cpu(local->valuelen));
 }
@@ -592,6 +607,14 @@ process_leaf_attr_remote(
 		return -1;
 	}
 
+	if ((entry->flags & XFS_ATTR_VERITY) && !xfs_has_verity(mp)) {
+		do_warn(
+ _("verity metadata found in attribute entry %d in attr block %u, inode %"
+   PRIu64 " on filesystem that doesn't support verity\n"),
+				i, da_bno, ino);
+		return -1;
+	}
+
 	value = malloc(be32_to_cpu(remotep->valuelen));
 	if (value == NULL) {
 		do_warn(


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 29/38] xfs_repair: clear verity iflag when verity isn't supported
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (27 preceding siblings ...)
  2024-04-30  3:38   ` [PATCH 28/38] xfs_repair: junk fsverity xattrs when unnecessary Darrick J. Wong
@ 2024-04-30  3:38   ` Darrick J. Wong
  2024-04-30  3:38   ` [PATCH 30/38] xfs_repair: handle verity remote attrs Darrick J. Wong
                     ` (8 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:38 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Clear the fsverity inode flag if the filesystem doesn't support it or if
the file is not a regular file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/dinode.c |   28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)


diff --git a/repair/dinode.c b/repair/dinode.c
index 4e39e5e76e90..bbb2db5c8e23 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -3324,6 +3324,34 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
 				*dirty = 1;
 		}
 
+		if ((flags2 & XFS_DIFLAG2_VERITY) &&
+		    !xfs_has_verity(mp)) {
+			if (!uncertain) {
+				do_warn(
+	_("inode %" PRIu64 " is marked verity but file system does not support fs-verity\n"),
+					lino);
+			}
+
+			flags2 &= ~XFS_DIFLAG2_VERITY;
+			if (!no_modify)
+				*dirty = 1;
+		}
+
+		if (flags2 & XFS_DIFLAG2_VERITY) {
+			/* must be a file */
+			if (di_mode && !S_ISREG(di_mode)) {
+				if (!uncertain) {
+					do_warn(
+	_("verity flag set on non-file inode %" PRIu64 "\n"),
+						lino);
+				}
+
+				flags2 &= ~XFS_DIFLAG2_VERITY;
+				if (!no_modify)
+					*dirty = 1;
+			}
+		}
+
 		if (xfs_dinode_has_large_extent_counts(dino)) {
 			if (dino->di_nrext64_pad) {
 				if (!no_modify) {


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 30/38] xfs_repair: handle verity remote attrs
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (28 preceding siblings ...)
  2024-04-30  3:38   ` [PATCH 29/38] xfs_repair: clear verity iflag when verity isn't supported Darrick J. Wong
@ 2024-04-30  3:38   ` Darrick J. Wong
  2024-04-30  3:38   ` [PATCH 31/38] xfs_repair: allow upgrading filesystems with verity Darrick J. Wong
                     ` (7 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:38 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Teach xfs_repair to handle remote verity xattr values.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/attr_repair.c |   21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)


diff --git a/repair/attr_repair.c b/repair/attr_repair.c
index 898eb3edfd12..2d0df492f71a 100644
--- a/repair/attr_repair.c
+++ b/repair/attr_repair.c
@@ -428,8 +428,14 @@ process_shortform_attr(
  * many blocks per remote value, so one by one is sufficient.
  */
 static int
-rmtval_get(xfs_mount_t *mp, xfs_ino_t ino, blkmap_t *blkmap,
-		xfs_dablk_t blocknum, int valuelen, char* value)
+rmtval_get(
+	struct xfs_mount	*mp,
+	xfs_ino_t		ino,
+	unsigned int		attrns,
+	blkmap_t		*blkmap,
+	xfs_dablk_t		blocknum,
+	int			valuelen,
+	char*			value)
 {
 	xfs_fsblock_t	bno;
 	struct xfs_buf	*bp;
@@ -437,12 +443,14 @@ rmtval_get(xfs_mount_t *mp, xfs_ino_t ino, blkmap_t *blkmap,
 	int		hdrsize = 0;
 	int		error;
 
-	if (xfs_has_crc(mp))
+	if (xfs_has_crc(mp) && !(attrns & XFS_ATTR_VERITY))
 		hdrsize = sizeof(struct xfs_attr3_rmt_hdr);
 
 	/* ASSUMPTION: valuelen is a valid number, so use it for looping */
 	/* Note that valuelen is not a multiple of blocksize */
 	while (amountdone < valuelen) {
+		const struct xfs_buf_ops	*ops;
+
 		bno = blkmap_get(blkmap, blocknum + i);
 		if (bno == NULLFSBLOCK) {
 			do_warn(
@@ -450,9 +458,11 @@ rmtval_get(xfs_mount_t *mp, xfs_ino_t ino, blkmap_t *blkmap,
 			clearit = 1;
 			break;
 		}
+
+		ops = libxfs_attr3_remote_buf_ops(attrns);
 		error = -libxfs_buf_read(mp->m_dev, XFS_FSB_TO_DADDR(mp, bno),
 				XFS_FSB_TO_BB(mp, 1), LIBXFS_READBUF_SALVAGE,
-				&bp, &xfs_attr3_rmt_buf_ops);
+				&bp, ops);
 		if (error) {
 			do_warn(
 	_("can't read remote block for attributes of inode %" PRIu64 "\n"), ino);
@@ -623,7 +633,8 @@ process_leaf_attr_remote(
 		do_warn(_("SKIPPING this remote attribute\n"));
 		goto out;
 	}
-	if (rmtval_get(mp, ino, blkmap, be32_to_cpu(remotep->valueblk),
+	if (rmtval_get(mp, ino, entry->flags, blkmap,
+				be32_to_cpu(remotep->valueblk),
 				be32_to_cpu(remotep->valuelen), value)) {
 		do_warn(
 	_("remote attribute get failed for entry %d, inode %" PRIu64 "\n"),


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 31/38] xfs_repair: allow upgrading filesystems with verity
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (29 preceding siblings ...)
  2024-04-30  3:38   ` [PATCH 30/38] xfs_repair: handle verity remote attrs Darrick J. Wong
@ 2024-04-30  3:38   ` Darrick J. Wong
  2024-04-30  3:39   ` [PATCH 32/38] xfs_scrub: check verity file metadata Darrick J. Wong
                     ` (6 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:38 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Allow upgrading of filesystems to support verity.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 man/man8/xfs_admin.8 |    6 ++++++
 repair/globals.c     |    1 +
 repair/globals.h     |    1 +
 repair/phase2.c      |   24 ++++++++++++++++++++++++
 repair/xfs_repair.c  |   11 +++++++++++
 5 files changed, 43 insertions(+)


diff --git a/man/man8/xfs_admin.8 b/man/man8/xfs_admin.8
index 83f8fe88ff18..cd18c18fd1b5 100644
--- a/man/man8/xfs_admin.8
+++ b/man/man8/xfs_admin.8
@@ -209,6 +209,12 @@ The filesystem cannot be downgraded after this feature is enabled.
 This upgrade is not possible if a realtime volume has already been added to the
 filesystem.
 This feature is not upstream yet.
+.TP 0.4i
+.B verity
+Enable fs-verity on the filesystem, which allows for sealing of regular file
+data with signed hashes.
+The filesystem cannot be downgraded after this feature is enabled.
+This feature is not upstream yet.
 .RE
 .TP
 .BI \-U " uuid"
diff --git a/repair/globals.c b/repair/globals.c
index a50e4959cbc1..410c3cd39d05 100644
--- a/repair/globals.c
+++ b/repair/globals.c
@@ -59,6 +59,7 @@ bool	add_rmapbt;		/* add reverse mapping btrees */
 bool	add_parent;		/* add parent pointers */
 bool	add_metadir;		/* add metadata directory tree */
 bool	add_rtgroups;		/* add realtime allocation groups */
+bool	add_verity;		/* add fs-verity support */
 
 /* misc status variables */
 
diff --git a/repair/globals.h b/repair/globals.h
index 4f9683bda949..994ea2b4e946 100644
--- a/repair/globals.h
+++ b/repair/globals.h
@@ -100,6 +100,7 @@ extern bool	add_rmapbt;		/* add reverse mapping btrees */
 extern bool	add_parent;		/* add parent pointers */
 extern bool	add_metadir;		/* add metadata directory tree */
 extern bool	add_rtgroups;		/* add realtime allocation groups */
+extern bool	add_verity;		/* add fs-verity support */
 
 /* misc status variables */
 
diff --git a/repair/phase2.c b/repair/phase2.c
index d1b2824caace..f8b0fefe3bc0 100644
--- a/repair/phase2.c
+++ b/repair/phase2.c
@@ -429,6 +429,28 @@ set_rtgroups(
 	return true;
 }
 
+static bool
+set_verity(
+	struct xfs_mount	*mp,
+	struct xfs_sb		*new_sb)
+{
+	if (xfs_has_verity(mp)) {
+		printf(_("Filesystem already supports verity.\n"));
+		exit(0);
+	}
+
+	if (!xfs_has_crc(mp)) {
+		printf(
+	_("Verity feature only supported on V5 filesystems.\n"));
+		exit(0);
+	}
+
+	printf(_("Adding verity to filesystem.\n"));
+	new_sb->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_VERITY;
+	new_sb->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR;
+	return true;
+}
+
 struct check_state {
 	struct xfs_sb		sb;
 	uint64_t		features;
@@ -868,6 +890,8 @@ upgrade_filesystem(
 		dirty |= set_metadir(mp, &new_sb);
 	if (add_rtgroups)
 		dirty |= set_rtgroups(mp, &new_sb);
+	if (add_verity)
+		dirty |= set_verity(mp, &new_sb);
 	if (!dirty)
 		return;
 
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index faaea4d45224..ab6f97157f1b 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -77,6 +77,7 @@ enum c_opt_nums {
 	CONVERT_PARENT,
 	CONVERT_METADIR,
 	CONVERT_RTGROUPS,
+	CONVERT_VERITY,
 	C_MAX_OPTS,
 };
 
@@ -92,6 +93,7 @@ static char *c_opts[] = {
 	[CONVERT_PARENT]	= "parent",
 	[CONVERT_METADIR]	= "metadir",
 	[CONVERT_RTGROUPS]	= "rtgroups",
+	[CONVERT_VERITY]	= "verity",
 	[C_MAX_OPTS]		= NULL,
 };
 
@@ -438,6 +440,15 @@ process_args(int argc, char **argv)
 		_("-c rtgroups only supports upgrades\n"));
 					add_rtgroups = true;
 					break;
+				case CONVERT_VERITY:
+					if (!val)
+						do_abort(
+		_("-c verity requires a parameter\n"));
+					if (strtol(val, NULL, 0) != 1)
+						do_abort(
+		_("-c verity only supports upgrades\n"));
+					add_verity = true;
+					break;
 				default:
 					unknown('c', val);
 					break;


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 32/38] xfs_scrub: check verity file metadata
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (30 preceding siblings ...)
  2024-04-30  3:38   ` [PATCH 31/38] xfs_repair: allow upgrading filesystems with verity Darrick J. Wong
@ 2024-04-30  3:39   ` Darrick J. Wong
  2024-04-30  3:39   ` [PATCH 33/38] xfs_scrub: validate verity file contents when doing a media scan Darrick J. Wong
                     ` (5 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:39 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

If phase 5 encounters a fsverity file, read its metadata to see if we
encounter any errors.  The consistency of the file data vs. the hashes
in the merkle tree is checked during the media scan.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/Makefile |    4 +
 scrub/inodes.h |   22 +++++++
 scrub/phase5.c |  182 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 208 insertions(+)


diff --git a/scrub/Makefile b/scrub/Makefile
index 885b43e9948d..ad010a05249f 100644
--- a/scrub/Makefile
+++ b/scrub/Makefile
@@ -109,6 +109,10 @@ CFILES += unicrash.c
 LCFLAGS += -DHAVE_LIBICU $(LIBICU_CFLAGS)
 endif
 
+ifeq ($(HAVE_FSVERITY_DESCR),yes)
+LCFLAGS += -DHAVE_FSVERITY_DESCR
+endif
+
 # Automatically trigger a media scan once per month
 XFS_SCRUB_ALL_AUTO_MEDIA_SCAN_INTERVAL=1mo
 
diff --git a/scrub/inodes.h b/scrub/inodes.h
index 7a0b275e575e..aab2d721fe02 100644
--- a/scrub/inodes.h
+++ b/scrub/inodes.h
@@ -25,4 +25,26 @@ int scrub_scan_all_inodes(struct scrub_ctx *ctx, scrub_inode_iter_fn fn,
 
 int scrub_open_handle(struct xfs_handle *handle);
 
+/*
+ * Might this be a file that's missing its fsverity metadata?  When this is the
+ * case, open() fails with ENODATA or another of the errno codes listed below.
+ */
+static inline bool fsverity_meta_is_missing(int error)
+{
+	switch (error) {
+	case ENODATA:
+	case EMSGSIZE:
+	case EINVAL:
+	case EFSCORRUPTED:
+	case EFBIG:
+		/*
+		 * The nonzero errno codes above are the error codes that can
+		 * be returned from fsverity on metadata validation errors.
+		 */
+		return true;
+	}
+
+	return false;
+}
+
 #endif /* XFS_SCRUB_INODES_H_ */
diff --git a/scrub/phase5.c b/scrub/phase5.c
index 6fd3c6982704..6f157fa3570c 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -28,6 +28,7 @@
 #include "descr.h"
 #include "unicrash.h"
 #include "repair.h"
+#include "atomic.h"
 
 /* Phase 5: Full inode scans and check directory connectivity. */
 
@@ -359,6 +360,183 @@ check_dir_connection(
 	return EADDRNOTAVAIL;
 }
 
+#ifdef HAVE_FSVERITY_DESCR
+struct fsverity_object {
+	const char		*name;
+	int			type;
+};
+
+struct fsverity_object fsverity_objects[] = {
+	{
+		.name		= "descriptor",
+		.type		= FS_VERITY_METADATA_TYPE_DESCRIPTOR,
+	},
+	{
+		.name		= "merkle tree",
+		.type		= FS_VERITY_METADATA_TYPE_MERKLE_TREE,
+	},
+	{
+		.name		= "signature",
+		.type		= FS_VERITY_METADATA_TYPE_SIGNATURE,
+	},
+};
+
+static void *fsverity_buf;
+#define FSVERITY_BUFSIZE	(32768)
+
+static inline void *
+get_fsverity_buf(void)
+{
+	static pthread_mutex_t	buf_lock = PTHREAD_MUTEX_INITIALIZER;
+	void			*new_buf;
+
+	if (!fsverity_buf) {
+		new_buf = malloc(FSVERITY_BUFSIZE);
+		if (!new_buf)
+			return NULL;
+
+		pthread_mutex_lock(&buf_lock);
+		if (!fsverity_buf) {
+			fsverity_buf = new_buf;
+			new_buf = NULL;
+		}
+		pthread_mutex_unlock(&buf_lock);
+		if (new_buf)
+			free(new_buf);
+	}
+
+	return fsverity_buf;
+}
+
+static int
+read_fsverity_object(
+	struct scrub_ctx		*ctx,
+	struct descr			*dsc,
+	int				fd,
+	const struct fsverity_object	*verity_obj)
+{
+	struct fsverity_read_metadata_arg arg = {
+		.buf_ptr		= (uintptr_t)get_fsverity_buf(),
+		.metadata_type		= verity_obj->type,
+		.length			= FSVERITY_BUFSIZE,
+	};
+	int				ret;
+
+	if (!arg.buf_ptr) {
+		str_liberror(ctx, ENOMEM, descr_render(dsc));
+		return ENOMEM;
+	}
+
+	do {
+		ret = ioctl(fd, FS_IOC_READ_VERITY_METADATA, &arg);
+		if (ret < 0) {
+			ret = errno;
+			switch (ret) {
+			case ENODATA:
+				/* No fsverity metadata found.  We're done. */
+				return 0;
+			case ENOTTY:
+			case EOPNOTSUPP:
+				/* not a verity file or object doesn't exist */
+				str_error(ctx, descr_render(dsc),
+ _("fsverity %s not supported at data offset %llu length %llu?"),
+					verity_obj->name,
+					arg.offset, arg.length);
+				return ret;
+			default:
+				/* some other error */
+				str_error(ctx, descr_render(dsc),
+ _("fsverity %s read error at data offset %llu length %llu."),
+					verity_obj->name,
+					arg.offset, arg.length);
+				return ret;
+			}
+		}
+		arg.offset += ret;
+	} while (ret > 0);
+
+	return 0;
+}
+
+/* Read all the fsverity metadata. */
+static int
+check_fsverity_metadata(
+	struct scrub_ctx	*ctx,
+	struct descr		*dsc,
+	int			fd)
+{
+	unsigned int		i;
+	int			error;
+
+	for (i = 0; i < ARRAY_SIZE(fsverity_objects); i++) {
+		error = read_fsverity_object(ctx, dsc, fd,
+				&fsverity_objects[i]);
+		if (error)
+			return error;
+	}
+
+	return 0;
+}
+
+/* Open this verity file and check its merkle tree and verity descriptor. */
+static int
+check_verity_file(
+	struct scrub_ctx	*ctx,
+	struct xfs_handle	*handle,
+	struct xfs_bulkstat	*bstat,
+	struct descr		*dsc,
+	int			*fdp)
+{
+	int			error;
+
+	/* Only regular files can have fsverity set. */
+	if (!S_ISREG(bstat->bs_mode)) {
+		str_error(ctx, descr_render(dsc),
+				_("fsverity cannot be set on a non-regular file."));
+		return 0;
+	}
+
+	*fdp = scrub_open_handle(handle);
+	if (*fdp >= 0)
+		return check_fsverity_metadata(ctx, dsc, *fdp);
+
+	/* Handle is stale, try again. */
+	if (errno == ESTALE)
+		return ESTALE;
+
+	/*
+	 * If the fsverity metadata is missing, inform the user and move on to
+	 * the next file.
+	 */
+	if (fsverity_meta_is_missing(errno)) {
+		str_error(ctx, descr_render(dsc),
+				_("fsverity metadata missing."));
+		return 0;
+	}
+
+	/* Some other runtime error. */
+	error = errno;
+	str_errno(ctx, descr_render(dsc));
+	return error;
+}
+#else
+static int
+check_verity_file(
+	struct scrub_ctx	*ctx,
+	struct xfs_handle	*handle,
+	struct xfs_bulkstat	*bstat,
+	struct descr		*dsc,
+	int			*fdp)
+{
+	static atomic_t		warned;
+
+	if (!atomic_inc_return(&warned))
+		str_warn(ctx, descr_render(dsc),
+				_("fsverity metadata checking not supported\n"));
+	return 0;
+}
+#endif /* HAVE_FSVERITY_DESCR */
+
 /*
  * Verify the connectivity of the directory tree.
  * We know that the kernel's open-by-handle function will try to reconnect
@@ -422,6 +600,10 @@ check_inode_names(
 		error = check_dirent_names(ctx, &dsc, &fd, bstat);
 		if (error)
 			goto err_fd;
+	} else if (bstat->bs_xflags & FS_XFLAG_VERITY) {
+		error = check_verity_file(ctx, handle, bstat, &dsc, &fd);
+		if (error)
+			goto err_fd;
 	}
 
 	progress_add(1);



* [PATCH 33/38] xfs_scrub: validate verity file contents when doing a media scan
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (31 preceding siblings ...)
  2024-04-30  3:39   ` [PATCH 32/38] xfs_scrub: check verity file metadata Darrick J. Wong
@ 2024-04-30  3:39   ` Darrick J. Wong
  2024-04-30  3:39   ` [PATCH 34/38] xfs_scrub: use MADV_POPULATE_READ to check verity files Darrick J. Wong
                     ` (4 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:39 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

When verity files are detected, augment the media scan by reading those
files through the pagecache to force verity to check the hashes and
(optional) signatures.
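
For anyone reading along without the tree checked out, the basic idea can
be sketched in standalone C roughly as follows.  This is an illustration
and not the xfs_scrub code: scan_fd(), scan_path() and report_range() are
made-up names, and a plain path stands in for the file handle that
xfs_scrub really opens.  It relies on fs-verity failing a read of
corrupted data with EIO, so touching one byte per st_blksize chunk should
be enough to make the kernel check the whole file.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Made-up helper: log one coalesced range of bytes that failed to read. */
static void report_range(off_t pos, off_t len)
{
	fprintf(stderr, "verity error at offsets %lld-%lld\n",
		(long long)pos, (long long)(pos + len - 1));
}

/*
 * Read one byte from every st_blksize chunk of @fd.  Adjacent EIO chunks
 * are merged into a single report.  Returns the number of bad ranges
 * found, or a negative errno for any other failure.
 */
static int scan_fd(int fd)
{
	struct stat	sb;
	off_t		pos, fail_pos = -1, fail_len = 0;
	int		nr_bad = 0;

	if (fstat(fd, &sb) < 0)
		return -errno;

	for (pos = 0; pos < sb.st_size; pos += sb.st_blksize) {
		off_t	len = sb.st_size - pos;
		ssize_t	got;
		char	c;

		if (len > sb.st_blksize)
			len = sb.st_blksize;

		got = pread(fd, &c, 1, pos);
		if (got == 0)
			break;			/* file shrank under us */
		if (got < 0 && errno != EIO)
			return -errno;

		if (got < 0 && fail_pos >= 0 && pos == fail_pos + fail_len) {
			fail_len += len;	/* extend the current bad range */
			continue;
		}

		/* Flush whatever bad range we were tracking. */
		if (fail_pos >= 0) {
			report_range(fail_pos, fail_len);
			nr_bad++;
		}
		fail_pos = got < 0 ? pos : -1;
		fail_len = got < 0 ? len : 0;
	}

	if (fail_pos >= 0) {
		report_range(fail_pos, fail_len);
		nr_bad++;
	}
	return nr_bad;
}

/* Made-up convenience wrapper: scan a file by path instead of by handle. */
static int scan_path(const char *path)
{
	int	fd, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;
	ret = scan_fd(fd);
	close(fd);
	return ret;
}
```

Merging adjacent failures keeps the log to one line per damaged region
instead of one line per block.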

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/phase6.c |  305 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 305 insertions(+)


diff --git a/scrub/phase6.c b/scrub/phase6.c
index de7fcb548fe6..983470b7bece 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -23,6 +23,8 @@
 #include "vfs.h"
 #include "common.h"
 #include "libfrog/bulkstat.h"
+#include "descr.h"
+#include "progress.h"
 
 /*
  * Phase 6: Verify data file integrity.
@@ -34,6 +36,9 @@
  * to tell us if metadata are now corrupt.  Otherwise, we'll scan the
  * whole directory tree looking for files that overlap the bad regions
  * and report the paths of the now corrupt files.
+ *
+ * If the filesystem supports verity, read the contents of each verity file to
+ * force it to validate the file contents.
  */
 
 /* Verify disk blocks with GETFSMAP */
@@ -674,6 +679,285 @@ remember_ioerr(
 		str_liberror(ctx, ret, _("setting bad block bitmap"));
 }
 
+struct verity_ctx {
+	struct scrub_ctx	*ctx;
+	struct workqueue	wq_ddev;
+	struct workqueue	wq_rtdev;
+	bool			aborted;
+};
+
+struct verity_file_ctx {
+	struct xfs_handle	handle;
+	struct verity_ctx	*vc;
+};
+
+static int
+render_ino_from_handle(
+	struct scrub_ctx	*ctx,
+	char			*buf,
+	size_t			buflen,
+	void			*data)
+{
+	struct xfs_handle	*han = data;
+
+	return scrub_render_ino_descr(ctx, buf, buflen, han->ha_fid.fid_ino,
+			han->ha_fid.fid_gen, NULL);
+}
+
+static inline void
+report_verity_error(
+	struct scrub_ctx	*ctx,
+	struct descr		*dsc,
+	off_t			fail_pos,
+	off_t			fail_len)
+{
+	if (fail_pos < 0)
+		return;
+
+	str_unfixable_error(ctx, descr_render(dsc),
+ _("verity error at offsets %llu-%llu"),
+			(unsigned long long)fail_pos,
+			(unsigned long long)(fail_pos + fail_len - 1));
+}
+
+/* Record a verity validation error and maybe log an old error. */
+static inline void
+record_verity_error(
+	struct scrub_ctx	*ctx,
+	struct descr		*dsc,
+	off_t			pos,
+	size_t			len,
+	off_t			*fail_pos,
+	off_t			*fail_len)
+{
+	if (*fail_pos < 0)
+		goto record;
+
+	if (pos == *fail_pos + *fail_len) {
+		*fail_len += len;
+		return;
+	}
+
+	report_verity_error(ctx, dsc, *fail_pos, *fail_len);
+record:
+	*fail_pos = pos;
+	*fail_len = len;
+}
+
+/* Record a verity validation success and maybe log an old error. */
+static inline void
+record_verity_success(
+	struct scrub_ctx	*ctx,
+	struct descr		*dsc,
+	off_t			*fail_pos,
+	off_t			*fail_len)
+{
+	if (*fail_pos >= 0)
+		report_verity_error(ctx, dsc, *fail_pos, *fail_len);
+
+	*fail_pos = -1;
+	*fail_len = 0;
+}
+
+/* Scan a verity file's data looking for validation errors. */
+static void
+scan_verity_file(
+	struct workqueue	*wq,
+	uint32_t		index,
+	void			*arg)
+{
+	struct stat		sb;
+	struct verity_file_ctx	*vf = arg;
+	struct scrub_ctx	*ctx = vf->vc->ctx;
+	off_t			pos;
+	off_t			fail_pos = -1, fail_len = 0;
+	int			fd;
+	int			ret;
+	DEFINE_DESCR(dsc, ctx, render_ino_from_handle);
+
+	descr_set(&dsc, &vf->handle);
+
+	if (vf->vc->aborted) {
+		ret = ECANCELED;
+		goto out_vf;
+	}
+
+	fd = scrub_open_handle(&vf->handle);
+	if (fd < 0) {
+		/*
+		 * Stale file handle means that the verity file is gone.
+		 *
+		 * Even if there's a replacement file, its contents have been
+		 * freshly written and checked.  Either way, we can skip
+		 * scanning this file.
+		 */
+		if (errno == ESTALE) {
+			ret = 0;
+			goto out_vf;
+		}
+
+		/*
+		 * If the fsverity metadata is missing, inform the user and
+		 * move on to the next file.
+		 */
+		if (fsverity_meta_is_missing(errno)) {
+			str_error(ctx, descr_render(&dsc),
+ _("fsverity metadata missing."));
+			ret = 0;
+			goto out_vf;
+		}
+
+		ret = -errno;
+		str_errno(ctx, descr_render(&dsc));
+		goto out_vf;
+	}
+
+	ret = fstat(fd, &sb);
+	if (ret) {
+		str_errno(ctx, descr_render(&dsc));
+		goto out_fd;
+	}
+
+	/* Read a single byte from each block in the file to validate. */
+	for (pos = 0; pos < sb.st_size; pos += sb.st_blksize) {
+		char	c;
+		ssize_t	bytes_read;
+
+		bytes_read = pread(fd, &c, 1, pos);
+		if (!bytes_read)
+			break;
+		if (bytes_read > 0) {
+			record_verity_success(ctx, &dsc, &fail_pos, &fail_len);
+			progress_add(sb.st_blksize);
+			continue;
+		}
+
+		if (errno == EIO) {
+			size_t	length = min(sb.st_size - pos, sb.st_blksize);
+
+			record_verity_error(ctx, &dsc, pos, length, &fail_pos,
+					&fail_len);
+			continue;
+		}
+
+		str_errno(ctx, descr_render(&dsc));
+		break;
+	}
+	report_verity_error(ctx, &dsc, fail_pos, fail_len);
+
+	ret = close(fd);
+	if (ret) {
+		str_errno(ctx, descr_render(&dsc));
+		goto out_vf;
+	}
+	fd = -1;
+
+out_fd:
+	if (fd >= 0)
+		close(fd);
+out_vf:
+	if (ret)
+		vf->vc->aborted = true;
+	free(vf);
+	return;
+}
+
+/* If this is a verity file, queue it for scanning. */
+static int
+schedule_verity_file(
+	struct scrub_ctx	*ctx,
+	struct xfs_handle	*handle,
+	struct xfs_bulkstat	*bs,
+	void			*arg)
+{
+	struct verity_ctx	*vc = arg;
+	struct verity_file_ctx	*vf;
+	int			ret;
+
+	if (vc->aborted)
+		return ECANCELED;
+
+	if (!(bs->bs_xflags & FS_XFLAG_VERITY)) {
+		progress_add(bs->bs_size);
+		return 0;
+	}
+
+	vf = malloc(sizeof(struct verity_file_ctx));
+	if (!vf) {
+		str_errno(ctx, _("could not allocate fsverity scan context"));
+		vc->aborted = true;
+		return ENOMEM;
+	}
+
+	/* Queue the validation work. */
+	vf->handle = *handle; /* struct copy */
+	vf->vc = vc;
+
+	if (bs->bs_xflags & FS_XFLAG_REALTIME)
+		ret = -workqueue_add(&vc->wq_rtdev, scan_verity_file, 0, vf);
+	else
+		ret = -workqueue_add(&vc->wq_ddev, scan_verity_file, 0, vf);
+	if (ret) {
+		str_liberror(ctx, ret, _("could not schedule fsverity scan"));
+		vc->aborted = true;
+		return ECANCELED;
+	}
+
+	return 0;
+}
+
+static int
+scan_verity_files(
+	struct scrub_ctx	*ctx)
+{
+	struct verity_ctx	vc = {
+		.ctx		= ctx,
+	};
+	unsigned int		verifier_threads;
+	int			ret;
+
+	/* Create thread pool for data dev fsverity processing. */
+	verifier_threads = disk_heads(ctx->datadev);
+	if (verifier_threads == 1)
+		verifier_threads = 0;
+	ret = -workqueue_create_bound(&vc.wq_ddev, ctx, verifier_threads, 500);
+	if (ret) {
+		str_liberror(ctx, ret, _("creating data dev fsverity workqueue"));
+		return ret;
+	}
+
+	/* Create thread pool for rtdev fsverity processing. */
+	if (ctx->rtdev) {
+		verifier_threads = disk_heads(ctx->rtdev);
+		if (verifier_threads == 1)
+			verifier_threads = 0;
+		ret = -workqueue_create_bound(&vc.wq_rtdev, ctx,
+				verifier_threads, 500);
+		if (ret) {
+			str_liberror(ctx, ret,
+					_("creating rt dev fsverity workqueue"));
+			goto out_ddev;
+		}
+	}
+
+	/* Find all the verity inodes. */
+	ret = scrub_scan_all_inodes(ctx, schedule_verity_file, 0, &vc);
+	if (ret)
+		goto out_rtdev;
+	if (vc.aborted) {
+		ret = ECANCELED;
+		goto out_rtdev;
+	}
+
+out_rtdev:
+	workqueue_terminate(&vc.wq_rtdev);
+	workqueue_destroy(&vc.wq_rtdev);
+out_ddev:
+	workqueue_terminate(&vc.wq_ddev);
+	workqueue_destroy(&vc.wq_ddev);
+	return ret;
+}
+
 /*
  * Read verify all the file data blocks in a filesystem.  Since XFS doesn't
  * do data checksums, we trust that the underlying storage will pass back
@@ -689,6 +973,12 @@ phase6_func(
 	struct media_verify_state	vs = { NULL };
 	int				ret, ret2, ret3;
 
+	if (ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_VERITY) {
+		ret = scan_verity_files(ctx);
+		if (ret)
+			return ret;
+	}
+
 	ret = -bitmap_alloc(&vs.d_bad);
 	if (ret) {
 		str_liberror(ctx, ret, _("creating datadev badblock bitmap"));
@@ -816,5 +1106,20 @@ phase6_estimate(
 	if (ctx->logdev)
 		*nr_threads += disk_heads(ctx->logdev);
 	*rshift = 20;
+
+	/*
+	 * If fsverity is active, double the amount of progress items because
+	 * we will want to validate individual files' data with fsverity.
+	 * Bump the thread counts for the separate verity thread pools and the
+	 * inode scanner.
+	 */
+	if (ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_VERITY) {
+		*items *= 2;
+		*nr_threads += disk_heads(ctx->datadev);
+		*nr_threads += scrub_nproc_workqueue(ctx);
+		if (ctx->rtdev)
+			*nr_threads += disk_heads(ctx->rtdev);
+	}
+
 	return 0;
 }



* [PATCH 34/38] xfs_scrub: use MADV_POPULATE_READ to check verity files
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (32 preceding siblings ...)
  2024-04-30  3:39   ` [PATCH 33/38] xfs_scrub: validate verity file contents when doing a media scan Darrick J. Wong
@ 2024-04-30  3:39   ` Darrick J. Wong
  2024-04-30  3:40   ` [PATCH 35/38] xfs_spaceman: report data corruption Darrick J. Wong
                     ` (3 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:39 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Use madvise to pull a large number of pages into the pagecache with a
single system call.  For the common case that everything is consistent,
this amortizes syscall overhead over a large amount of data.
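
Stripped of the scrub plumbing, the technique looks something like the
sketch below.  The names populate_chunk() and populate_path() are
invented for this example, as is the choice to walk a path rather than a
file handle.  The fallback value of 22 for MADV_POPULATE_READ is the
asm-generic UAPI number and is only there for older libc headers.
Treating EINVAL (kernel too old for MADV_POPULATE_READ) as "fall back to
reads" is an assumption of this sketch, not something the patch does.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Older libc headers may lack this; 22 is the asm-generic UAPI value. */
#ifndef MADV_POPULATE_READ
# define MADV_POPULATE_READ	22
#endif

/* Map at most this many bytes at a time. */
#define CHUNK_BYTES		(4194304)

/*
 * Returns @length if the whole region was populated, 0 if the caller
 * should fall back to regular reads, or a negative errno.
 * @pos must be page aligned.
 */
static ssize_t populate_chunk(int fd, off_t pos, size_t length)
{
	void	*addr;
	int	error;

	addr = mmap(NULL, length, PROT_READ, MAP_SHARED, fd, pos);
	if (addr == MAP_FAILED)
		return 0;

	if (madvise(addr, length, MADV_POPULATE_READ)) {
		error = errno;		/* save before munmap can clobber it */
		munmap(addr, length);
		/* EFAULT: read error.  EINVAL: assumed to mean "too old". */
		if (error == EFAULT || error == EINVAL)
			return 0;
		return -error;
	}

	if (munmap(addr, length))
		return -errno;
	return length;
}

/*
 * Walk @path in CHUNK_BYTES pieces.  Returns the number of chunks that
 * would need the slower read-based check, or a negative errno.
 */
static int populate_path(const char *path)
{
	struct stat	sb;
	off_t		pos, end;
	long		pagesize = sysconf(_SC_PAGESIZE);
	int		fd, nr_fallback = 0, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;
	if (fstat(fd, &sb) < 0) {
		ret = -errno;
		goto out;
	}

	/* Round up so that the last partial page gets populated too. */
	end = (sb.st_size + pagesize - 1) / pagesize * pagesize;
	for (pos = 0; pos < end; pos += CHUNK_BYTES) {
		size_t	length = end - pos < CHUNK_BYTES ?
					end - pos : CHUNK_BYTES;
		ssize_t	done = populate_chunk(fd, pos, length);

		if (done < 0) {
			ret = (int)done;
			goto out;
		}
		if (done == 0)
			nr_fallback++;
	}
	ret = nr_fallback;
out:
	close(fd);
	return ret;
}
```

A chunk that comes back as 0 only says that something in that 4MiB went
wrong; finding which blocks are bad still takes the per-block reads.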

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/phase6.c |  133 ++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 115 insertions(+), 18 deletions(-)


diff --git a/scrub/phase6.c b/scrub/phase6.c
index 983470b7bece..7bb11510d332 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -25,6 +25,7 @@
 #include "libfrog/bulkstat.h"
 #include "descr.h"
 #include "progress.h"
+#include <sys/mman.h>
 
 /*
  * Phase 6: Verify data file integrity.
@@ -759,6 +760,97 @@ record_verity_success(
 	*fail_len = 0;
 }
 
+/* Map at most this many bytes at a time. */
+#define MMAP_LENGTH		(4194304)
+
+/*
+ * Use MADV_POPULATE_READ to validate verity file contents.  Returns @length if
+ * the entire region validated ok; 0 to signal to the caller that they should
+ * fall back to regular reads; or a negative errno if some other error
+ * happened.
+ */
+static ssize_t
+validate_mmap(
+	int		fd,
+	off_t		pos,
+	size_t		length)
+{
+	void		*addr;
+	int		ret;
+
+	/*
+	 * Try to map this file into the address space.  If that fails, we can
+	 * fall back to reading the file contents with pread(), so collapse all
+	 * mmap errors into a return value of 0 to request that fallback.
+	 */
+	addr = mmap(NULL, length, PROT_READ, MAP_SHARED, fd, pos);
+	if (addr == MAP_FAILED)
+		return 0;
+
+	/* Returns EFAULT for read IO errors. */
+	ret = madvise(addr, length, MADV_POPULATE_READ);
+	if (ret) {
+		munmap(addr, length);
+		if (errno == EFAULT)
+			return 0;
+		return -errno;
+	}
+
+	ret = munmap(addr, length);
+	if (ret)
+		return -errno;
+
+	return length;
+}
+
+/*
+ * Use pread to validate verity file contents.  Returns the number of bytes
+ * validated; 0 to signal to the caller that EOF was encountered; or a negative
+ * errno if some other error happened.
+ */
+static ssize_t
+validate_pread(
+	struct scrub_ctx	*ctx,
+	struct descr		*dsc,
+	int			fd,
+	const struct stat	*statbuf,
+	off_t			pos,
+	size_t			length,
+	off_t			*fail_pos,
+	off_t			*fail_len)
+{
+	ssize_t			validated;
+
+	for (validated = 0;
+	     validated < length;
+	     validated += statbuf->st_blksize, pos += statbuf->st_blksize) {
+		char	c;
+		ssize_t	bytes_read;
+
+		bytes_read = pread(fd, &c, 1, pos);
+		if (!bytes_read)
+			break;
+		if (bytes_read > 0) {
+			record_verity_success(ctx, dsc, fail_pos, fail_len);
+			continue;
+		}
+
+		if (errno == EIO) {
+			size_t	length = min(statbuf->st_size - pos,
+					     statbuf->st_blksize);
+
+			record_verity_error(ctx, dsc, pos, length, fail_pos,
+					fail_len);
+			continue;
+		}
+
+		str_errno(ctx, descr_render(dsc));
+		return -errno;
+	}
+
+	return validated;
+}
+
 /* Scan a verity file's data looking for validation errors. */
 static void
 scan_verity_file(
@@ -770,10 +862,15 @@ scan_verity_file(
 	struct verity_file_ctx	*vf = arg;
 	struct scrub_ctx	*ctx = vf->vc->ctx;
 	off_t			pos;
+	off_t			max_map_pos;
 	off_t			fail_pos = -1, fail_len = 0;
 	int			fd;
 	int			ret;
 	DEFINE_DESCR(dsc, ctx, render_ino_from_handle);
+	static long		pagesize;
+
+	if (!pagesize)
+		pagesize = sysconf(_SC_PAGESIZE);
 
 	descr_set(&dsc, &vf->handle);
 
@@ -818,30 +915,30 @@ scan_verity_file(
 		goto out_fd;
 	}
 
-	/* Read a single byte from each block in the file to validate. */
-	for (pos = 0; pos < sb.st_size; pos += sb.st_blksize) {
-		char	c;
-		ssize_t	bytes_read;
+	/* Validate the file contents with MADV_POPULATE_READ and pread */
+	max_map_pos = roundup(sb.st_size, pagesize);
+	for (pos = 0; pos < max_map_pos; pos += MMAP_LENGTH) {
+		size_t	length = min(max_map_pos - pos, MMAP_LENGTH);
+		ssize_t	validated;
 
-		bytes_read = pread(fd, &c, 1, pos);
-		if (!bytes_read)
-			break;
-		if (bytes_read > 0) {
+		validated = validate_mmap(fd, pos, length);
+		if (validated > 0) {
 			record_verity_success(ctx, &dsc, &fail_pos, &fail_len);
-			progress_add(sb.st_blksize);
+			progress_add(validated);
 			continue;
 		}
-
-		if (errno == EIO) {
-			size_t	length = min(sb.st_size - pos, sb.st_blksize);
-
-			record_verity_error(ctx, &dsc, pos, length, &fail_pos,
-					&fail_len);
-			continue;
+		if (validated < 0) {
+			errno = -validated;
+			str_errno(ctx, descr_render(&dsc));
+			goto out_fd;
 		}
 
-		str_errno(ctx, descr_render(&dsc));
-		break;
+		validated = validate_pread(ctx, &dsc, fd, &sb, pos, length,
+				&fail_pos, &fail_len);
+		if (validated <= 0)
+			break;
+
+		progress_add(validated);
 	}
 	report_verity_error(ctx, &dsc, fail_pos, fail_len);
 



* [PATCH 35/38] xfs_spaceman: report data corruption
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (33 preceding siblings ...)
  2024-04-30  3:39   ` [PATCH 34/38] xfs_scrub: use MADV_POPULATE_READ to check verity files Darrick J. Wong
@ 2024-04-30  3:40   ` Darrick J. Wong
  2024-04-30  3:40   ` [PATCH 36/38] xfs_io: report fsverity status via statx Darrick J. Wong
                     ` (2 subsequent siblings)
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:40 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Report data corruption to userspace.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 man/man2/ioctl_xfs_bulkstat.2 |    3 +++
 spaceman/health.c             |    4 ++++
 2 files changed, 7 insertions(+)


diff --git a/man/man2/ioctl_xfs_bulkstat.2 b/man/man2/ioctl_xfs_bulkstat.2
index b6d51aa43811..0afa8177ebb3 100644
--- a/man/man2/ioctl_xfs_bulkstat.2
+++ b/man/man2/ioctl_xfs_bulkstat.2
@@ -329,6 +329,9 @@ Parent pointers.
 .TP
 .B XFS_BS_SICK_DIRTREE
 Directory is the source of corruption in the directory tree.
+.TP
+.B XFS_BS_SICK_DATA
+File data is corrupt.
 .RE
 .SH ERRORS
 Error codes can be one of, but are not limited to, the following:
diff --git a/spaceman/health.c b/spaceman/health.c
index ee0e108d5b2d..43270209b6a9 100644
--- a/spaceman/health.c
+++ b/spaceman/health.c
@@ -201,6 +201,10 @@ static const struct flag_map inode_flags[] = {
 		.mask = XFS_BS_SICK_DIRTREE,
 		.descr = "directory tree structure",
 	},
+	{
+		.mask = XFS_BS_SICK_DATA,
+		.descr = "file data",
+	},
 	{0},
 };
 



* [PATCH 36/38] xfs_io: report fsverity status via statx
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (34 preceding siblings ...)
  2024-04-30  3:40   ` [PATCH 35/38] xfs_spaceman: report data corruption Darrick J. Wong
@ 2024-04-30  3:40   ` Darrick J. Wong
  2024-04-30  3:40   ` [PATCH 37/38] xfs_io: create magic command to disable verity Darrick J. Wong
  2024-04-30  3:40   ` [PATCH 38/38] mkfs.xfs: add verity parameter Darrick J. Wong
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:40 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Report whether a file has verity enabled.
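
The diff below teaches xfs_io's lsattr about FS_XFLAG_VERITY.  Separately
from xfs_io, a program can ask a similar question through statx(2) and
the STATX_ATTR_VERITY attribute bit that mainline already defines for
fs-verity files.  The following is a rough sketch under the assumption
that the filesystem reports that attribute; file_has_verity() is a
made-up name, and the raw syscall is used only so that the example does
not depend on the libc version.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <linux/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback for older kernel headers; this is the UAPI value. */
#ifndef STATX_ATTR_VERITY
# define STATX_ATTR_VERITY	0x00100000
#endif

/*
 * Returns 1 if @path is a verity file, 0 if it is not (or if the
 * filesystem does not report the attribute at all), or a negative errno.
 */
static int file_has_verity(const char *path)
{
	struct statx	stx;

	if (syscall(SYS_statx, AT_FDCWD, path, 0, STATX_BASIC_STATS,
				&stx) < 0)
		return -errno;

	/* stx_attributes_mask says which attribute bits are meaningful. */
	if (!(stx.stx_attributes_mask & STATX_ATTR_VERITY))
		return 0;

	return !!(stx.stx_attributes & STATX_ATTR_VERITY);
}
```

Checking stx_attributes_mask first avoids mistaking "this filesystem
never sets the bit" for "this file is not a verity file", although the
sketch reports both cases as 0.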

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 io/attr.c                       |    2 ++
 man/man2/ioctl_xfs_fsgetxattr.2 |    3 +++
 man/man8/xfs_io.8               |    4 ++++
 3 files changed, 9 insertions(+)


diff --git a/io/attr.c b/io/attr.c
index fd82a2e73801..5df4edbbbb41 100644
--- a/io/attr.c
+++ b/io/attr.c
@@ -37,6 +37,7 @@ static struct xflags {
 	{ FS_XFLAG_FILESTREAM,		"S", "filestream"	},
 	{ FS_XFLAG_DAX,			"x", "dax"		},
 	{ FS_XFLAG_COWEXTSIZE,		"C", "cowextsize"	},
+	{ FS_XFLAG_VERITY,		"v", "verity"		},
 	{ FS_XFLAG_HASATTR,		"X", "has-xattr"	},
 	{ 0, NULL, NULL }
 };
@@ -66,6 +67,7 @@ lsattr_help(void)
 " S -- enable filestreams allocator for this directory\n"
 " x -- Use direct access (DAX) for data in this file\n"
 " C -- for files with shared blocks, observe the inode CoW extent size value\n"
+" v -- file has fsverity metadata to validate data contents\n"
 " X -- file has extended attributes (cannot be changed using chattr)\n"
 "\n"
 " Options:\n"
diff --git a/man/man2/ioctl_xfs_fsgetxattr.2 b/man/man2/ioctl_xfs_fsgetxattr.2
index 2c626a7e3742..ffcdedc1fb13 100644
--- a/man/man2/ioctl_xfs_fsgetxattr.2
+++ b/man/man2/ioctl_xfs_fsgetxattr.2
@@ -200,6 +200,9 @@ below).
 If set on a directory, new files and subdirectories created in the directory
 will have both the flag and the CoW extent size value set.
 .TP
+.B XFS_XFLAG_VERITY
+The file has fsverity metadata to verify the file contents.
+.TP
 .B XFS_XFLAG_HASATTR
 The file has extended attributes associated with it.
 
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index cd6e953d8223..4991ad471bd7 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -934,6 +934,10 @@ direct access persistent memory (XFS_XFLAG_DAX)
 .B C
 copy on write extent hint (XFS_XFLAG_COWEXTSIZE)
 
+.TP
+.B v
+fsverity enabled (XFS_XFLAG_VERITY)
+
 .TP
 .B X
 has extended attributes (XFS_XFLAG_HASATTR)



* [PATCH 37/38] xfs_io: create magic command to disable verity
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (35 preceding siblings ...)
  2024-04-30  3:40   ` [PATCH 36/38] xfs_io: report fsverity status via statx Darrick J. Wong
@ 2024-04-30  3:40   ` Darrick J. Wong
  2024-04-30  3:40   ` [PATCH 38/38] mkfs.xfs: add verity parameter Darrick J. Wong
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:40 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Darrick J. Wong <djwong@kernel.org>

Create a secret command to turn off fsverity if we need to.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 io/scrub.c        |   47 +++++++++++++++++++++++++++++++++++++++++++++++
 man/man8/xfs_io.8 |    3 +++
 2 files changed, 50 insertions(+)


diff --git a/io/scrub.c b/io/scrub.c
index dc40afdfb36f..8a4a7e2fc3af 100644
--- a/io/scrub.c
+++ b/io/scrub.c
@@ -19,6 +19,7 @@
 static struct cmdinfo scrub_cmd;
 static struct cmdinfo repair_cmd;
 static const struct cmdinfo scrubv_cmd;
+static const struct cmdinfo noverity_cmd;
 
 static void
 scrub_help(void)
@@ -356,6 +357,7 @@ scrub_init(void)
 
 	add_command(&scrub_cmd);
 	add_command(&scrubv_cmd);
+	add_command(&noverity_cmd);
 }
 
 static void
@@ -730,3 +732,48 @@ static const struct cmdinfo scrubv_cmd = {
 	.oneline	= N_("vectored metadata scrub"),
 	.help		= scrubv_help,
 };
+
+static void
+noverity_help(void)
+{
+	printf(_(
+"\n"
+" Disable fsverity on a file.\n"));
+}
+
+#ifndef FS_IOC_DISABLE_VERITY
+# define FS_IOC_DISABLE_VERITY _IO('f', 136)
+#endif
+
+static int
+noverity_f(
+	int		argc,
+	char		**argv)
+{
+	int		c;
+	int		error;
+
+	while ((c = getopt(argc, argv, "")) != EOF) {
+		switch (c) {
+		default:
+			noverity_help();
+			return 0;
+		}
+	}
+
+	error = ioctl(file->fd, FS_IOC_DISABLE_VERITY);
+	if (error)
+		perror("noverity");
+
+	return 0;
+}
+
+static const struct cmdinfo noverity_cmd = {
+	.name		= "noverity",
+	.cfunc		= noverity_f,
+	.argmin		= -1,
+	.argmax		= -1,
+	.flags		= CMD_NOMAP_OK,
+	.oneline	= N_("disable fsverity"),
+	.help		= noverity_help,
+};
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index 4991ad471bd7..013750faa113 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -1093,6 +1093,9 @@ Check parameters without changing anything.
 Do not print timing information at all.
 .PD
 .RE
+.TP
+.B noverity
+Disable fs-verity on this file.
 
 .SH MEMORY MAPPED I/O COMMANDS
 .TP



* [PATCH 38/38] mkfs.xfs: add verity parameter
  2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
                     ` (36 preceding siblings ...)
  2024-04-30  3:40   ` [PATCH 37/38] xfs_io: create magic command to disable verity Darrick J. Wong
@ 2024-04-30  3:40   ` Darrick J. Wong
  37 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:40 UTC (permalink / raw)
  To: aalbersh, ebiggers, cem, djwong; +Cc: linux-fsdevel, linux-xfs, fsverity

From: Andrey Albershteyn <aalbersh@redhat.com>

fs-verity brings on-disk changes (a new inode flag).  Add a parameter to
enable the fs-verity feature flag in the superblock (disabled by default).
Filesystems created with this flag can only be mounted read-only by older
kernels.
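
The read-only behaviour on older kernels follows from the flag being a
ro-compat feature bit: a kernel that sees a ro-compat bit it does not
know may still read the filesystem but must not write to it.  A toy model
of that check is sketched below.  The macro names, the enum and
allowed_mount_mode() are invented for illustration, and the bit position
chosen for verity is an assumption, so do not read it as the on-disk
value.

```c
#include <stdint.h>

/* Illustrative stand-ins for ro-compat feature bits. */
#define RO_COMPAT_FINOBT	(1U << 0)
#define RO_COMPAT_RMAPBT	(1U << 1)
#define RO_COMPAT_REFLINK	(1U << 2)
#define RO_COMPAT_INOBTCNT	(1U << 3)
/* Assumption: placeholder bit for verity, not taken from the patch. */
#define RO_COMPAT_VERITY	(1U << 4)

enum mount_mode {
	MOUNT_RW,		/* every ro-compat bit is understood */
	MOUNT_RO_ONLY,		/* unknown ro-compat bit: no writes allowed */
};

/*
 * Decide what a kernel that understands the bits in @known may do with a
 * superblock whose ro-compat field is @sb_ro_compat.
 */
static enum mount_mode allowed_mount_mode(uint32_t sb_ro_compat,
					  uint32_t known)
{
	return (sb_ro_compat & ~known) ? MOUNT_RO_ONLY : MOUNT_RW;
}
```

With these made-up values, a kernel that knows only the first four bits
gets MOUNT_RO_ONLY for a superblock that has the verity bit set, and a
kernel that also knows the verity bit gets MOUNT_RW.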

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: make this an -i(node) option, edit manpage]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 man/man8/mkfs.xfs.8.in |    6 ++++++
 mkfs/lts_4.19.conf     |    1 +
 mkfs/lts_5.10.conf     |    1 +
 mkfs/lts_5.15.conf     |    1 +
 mkfs/lts_5.4.conf      |    1 +
 mkfs/lts_6.1.conf      |    1 +
 mkfs/lts_6.6.conf      |    1 +
 mkfs/xfs_mkfs.c        |   25 ++++++++++++++++++++++++-
 8 files changed, 36 insertions(+), 1 deletion(-)


diff --git a/man/man8/mkfs.xfs.8.in b/man/man8/mkfs.xfs.8.in
index 1db6765a805a..431cbcb8c7be 100644
--- a/man/man8/mkfs.xfs.8.in
+++ b/man/man8/mkfs.xfs.8.in
@@ -688,6 +688,12 @@ Online repair uses this functionality to rebuild extended attributes,
 directories, symbolic links, and realtime metadata files.
 This feature is disabled by default.
 This feature is only available for filesystems formatted with -m crc=1.
+.TP
+.BI verity[= value]
+This flag activates verity support, which enables sealing of regular file data
+with hashes and cryptographic signatures.
+This feature is disabled by default.
+This feature is only available for filesystems formatted with -m crc=1.
 .RE
 .PP
 .PD 0
diff --git a/mkfs/lts_4.19.conf b/mkfs/lts_4.19.conf
index 700dd2dff977..2cd8999b207c 100644
--- a/mkfs/lts_4.19.conf
+++ b/mkfs/lts_4.19.conf
@@ -14,6 +14,7 @@ rmapbt=0
 sparse=1
 nrext64=0
 exchange=0
+verity=0
 
 [naming]
 parent=0
diff --git a/mkfs/lts_5.10.conf b/mkfs/lts_5.10.conf
index a03cebfc41b9..765ffde89dca 100644
--- a/mkfs/lts_5.10.conf
+++ b/mkfs/lts_5.10.conf
@@ -14,6 +14,7 @@ rmapbt=0
 sparse=1
 nrext64=0
 exchange=0
+verity=0
 
 [naming]
 parent=0
diff --git a/mkfs/lts_5.15.conf b/mkfs/lts_5.15.conf
index 0c93950f3119..76afb3cae691 100644
--- a/mkfs/lts_5.15.conf
+++ b/mkfs/lts_5.15.conf
@@ -14,6 +14,7 @@ rmapbt=0
 sparse=1
 nrext64=0
 exchange=0
+verity=0
 
 [naming]
 parent=0
diff --git a/mkfs/lts_5.4.conf b/mkfs/lts_5.4.conf
index 059af4126223..f0f6526da72c 100644
--- a/mkfs/lts_5.4.conf
+++ b/mkfs/lts_5.4.conf
@@ -14,6 +14,7 @@ rmapbt=0
 sparse=1
 nrext64=0
 exchange=0
+verity=0
 
 [naming]
 parent=0
diff --git a/mkfs/lts_6.1.conf b/mkfs/lts_6.1.conf
index 4d1409208669..7591699396ca 100644
--- a/mkfs/lts_6.1.conf
+++ b/mkfs/lts_6.1.conf
@@ -14,6 +14,7 @@ rmapbt=0
 sparse=1
 nrext64=0
 exchange=0
+verity=0
 
 [naming]
 parent=0
diff --git a/mkfs/lts_6.6.conf b/mkfs/lts_6.6.conf
index 0420e8e4760b..e3f99d2aa4ee 100644
--- a/mkfs/lts_6.6.conf
+++ b/mkfs/lts_6.6.conf
@@ -14,6 +14,7 @@ rmapbt=1
 sparse=1
 nrext64=1
 exchange=0
+verity=0
 
 [naming]
 parent=0
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 7e30404646c2..f41d9749b4ba 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -92,6 +92,7 @@ enum {
 	I_SPINODES,
 	I_NREXT64,
 	I_EXCHANGE,
+	I_VERITY,
 	I_MAX_OPTS,
 };
 
@@ -477,6 +478,7 @@ static struct opt_params iopts = {
 		[I_SPINODES] = "sparse",
 		[I_NREXT64] = "nrext64",
 		[I_EXCHANGE] = "exchange",
+		[I_VERITY] = "verity",
 		[I_MAX_OPTS] = NULL,
 	},
 	.subopt_params = {
@@ -538,6 +540,12 @@ static struct opt_params iopts = {
 		  .maxval = 1,
 		  .defaultval = 1,
 		},
+		{ .index = I_VERITY,
+		  .conflicts = { { NULL, LAST_CONFLICT } },
+		  .minval = 0,
+		  .maxval = 1,
+		  .defaultval = 1,
+		},
 	},
 };
 
@@ -946,6 +954,7 @@ struct sb_feat_args {
 	bool	nrext64;
 	bool	exchrange;		/* XFS_SB_FEAT_INCOMPAT_EXCHRANGE */
 	bool	rtgroups;		/* XFS_SB_FEAT_INCOMPAT_RTGROUPS */
+	bool	verity;			/* XFS_SB_FEAT_RO_COMPAT_VERITY */
 };
 
 struct cli_params {
@@ -1087,7 +1096,7 @@ usage( void )
 /* force overwrite */	[-f]\n\
 /* inode size */	[-i perblock=n|size=num,maxpct=n,attr=0|1|2,\n\
 			    projid32bit=0|1,sparse=0|1,nrext64=0|1,\n\
-			    exchange=0|1]\n\
+			    exchange=0|1,verity=0|1]\n\
 /* no discard */	[-K]\n\
 /* log subvol */	[-l agnum=n,internal,size=num,logdev=xxx,version=n\n\
 			    sunit=value|su=num,sectsize=num,lazy-count=0|1,\n\
@@ -1789,6 +1798,9 @@ inode_opts_parser(
 	case I_EXCHANGE:
 		cli->sb_feat.exchrange = getnum(value, opts, subopt);
 		break;
+	case I_VERITY:
+		cli->sb_feat.verity = getnum(value, opts, subopt);
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -2470,6 +2482,14 @@ _("metadata directory not supported without CRC support\n"));
 			usage();
 		}
 		cli->sb_feat.metadir = false;
+
+		if (cli->sb_feat.verity &&
+		    cli_opt_set(&iopts, I_VERITY)) {
+			fprintf(stderr,
+_("verity not supported without CRC support\n"));
+			usage();
+		}
+		cli->sb_feat.verity = false;
 	}
 
 	if (!cli->sb_feat.finobt) {
@@ -3813,6 +3833,8 @@ sb_set_features(
 		sbp->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_REFLINK;
 	if (fp->inobtcnt)
 		sbp->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_INOBTCNT;
+	if (fp->verity)
+		sbp->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_VERITY;
 	if (fp->bigtime)
 		sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_BIGTIME;
 	if (fp->parent_pointers) {
@@ -4766,6 +4788,7 @@ main(
 			.nortalign = false,
 			.bigtime = true,
 			.nrext64 = true,
+			.verity = false,
 			/*
 			 * When we decide to enable a new feature by default,
 			 * please remember to update the mkfs conf files.



* [PATCH 1/6] common/verity: enable fsverity for XFS
  2024-04-30  3:19 ` [PATCHSET v5.6] fstests: fs-verity support for XFS Darrick J. Wong
@ 2024-04-30  3:41   ` Darrick J. Wong
  2024-04-30 12:39     ` Andrey Albershteyn
  2024-04-30  3:41   ` [PATCH 2/6] xfs/{021,122}: adapt to fsverity xattrs Darrick J. Wong
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:41 UTC (permalink / raw)
  To: aalbersh, zlang, ebiggers, djwong
  Cc: Andrey Albershteyn, fsverity, linux-fsdevel, guan, linux-xfs, fstests

From: Andrey Albershteyn <aalbersh@redhat.com>

XFS supports verity, so enable it for the -g verity test group.

Signed-off-by: Andrey Albershteyn <andrey.albershteyn@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/verity |   39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)


diff --git a/common/verity b/common/verity
index 59b67e1201..20408c8c0e 100644
--- a/common/verity
+++ b/common/verity
@@ -43,7 +43,16 @@ _require_scratch_verity()
 
 	# The filesystem may be aware of fs-verity but have it disabled by
 	# CONFIG_FS_VERITY=n.  Detect support via sysfs.
-	if [ ! -e /sys/fs/$fstyp/features/verity ]; then
+	case $FSTYP in
+	xfs)
+		_scratch_unmount
+		_check_scratch_xfs_features VERITY &>>$seqres.full
+		_scratch_mount
+	;;
+	*)
+		test -e /sys/fs/$fstyp/features/verity
+	esac
+	if [ $? -ne 0 ]; then
 		_notrun "kernel $fstyp isn't configured with verity support"
 	fi
 
@@ -201,6 +210,9 @@ _scratch_mkfs_verity()
 	ext4|f2fs)
 		_scratch_mkfs -O verity
 		;;
+	xfs)
+		_scratch_mkfs -i verity
+		;;
 	btrfs)
 		_scratch_mkfs
 		;;
@@ -334,12 +346,19 @@ _fsv_scratch_corrupt_bytes()
 	local lstart lend pstart pend
 	local dd_cmds=()
 	local cmd
+	local device=$SCRATCH_DEV
 
 	sync	# Sync to avoid unwritten extents
 
 	cat > $tmp.bytes
 	local end=$(( offset + $(_get_filesize $tmp.bytes ) ))
 
+	# If this is an xfs realtime file, switch @device to the rt device
+	if [ $FSTYP = "xfs" ]; then
+		$XFS_IO_PROG -r -c 'stat -v' "$file" | grep -q -w realtime && \
+			device=$SCRATCH_RTDEV
+	fi
+
 	# For each extent that intersects the requested range in order, add a
 	# command that writes the next part of the data to that extent.
 	while read -r lstart lend pstart pend; do
@@ -355,7 +374,7 @@ _fsv_scratch_corrupt_bytes()
 		elif (( offset < lend )); then
 			local len=$((lend - offset))
 			local seek=$((pstart + (offset - lstart)))
-			dd_cmds+=("head -c $len | dd of=$SCRATCH_DEV oflag=seek_bytes seek=$seek status=none")
+			dd_cmds+=("head -c $len | dd of=$device oflag=seek_bytes seek=$seek status=none")
 			(( offset += len ))
 		fi
 	done < <($XFS_IO_PROG -r -c "fiemap $offset $((end - offset))" "$file" \
@@ -408,6 +427,22 @@ _fsv_scratch_corrupt_merkle_tree()
 		done
 		_scratch_mount
 		;;
+	xfs)
+		local ino=$(stat -c '%i' $file)
+		local attr_offset=$(( $offset % $FSV_BLOCK_SIZE ))
+		local attr_index=$(printf "%08d" $(( offset - attr_offset )))
+		_scratch_unmount
+		# Attribute name is 8 bytes long (byte position of Merkle tree block)
+		_scratch_xfs_db -x -c "inode $ino" \
+			-c "attr_modify -f -m 8 -o $attr_offset $attr_index \"BUG\"" \
+			-c "ablock 0" -c "print" \
+			>>$seqres.full
+		# If bsize == 4096 and the merkle block size == 1024, modifying
+		# the attribute with attr_modify can corrupt quota accounting,
+		# so repair the filesystem afterwards.
+		_scratch_xfs_repair >> $seqres.full 2>&1
+		_scratch_mount
+		;;
 	*)
 		_fail "_fsv_scratch_corrupt_merkle_tree() unimplemented on $FSTYP"
 		;;
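
A side note on the status check in _require_scratch_verity(): $? always
holds the exit status of the most recent command, so in the xfs branch it
reflects _scratch_mount rather than the feature check that ran before it.
The sketch below shows the difference using two stand-in functions
(check_feature and remount are hypothetical names, not fstests helpers);
it is an illustration under those assumptions, not a proposed change.

```shell
# Hypothetical stand-ins: the feature probe fails, the remount succeeds.
check_feature() { return 1; }
remount() { return 0; }

# Tests $? after the remount, so the probe's failure is lost.
probe_naive() {
	check_feature
	remount
	if [ $? -ne 0 ]; then echo "notrun"; else echo "run"; fi
}

# Saves the probe's status before running anything else.
probe_saved() {
	check_feature
	local ret=$?
	remount
	if [ $ret -ne 0 ]; then echo "notrun"; else echo "run"; fi
}

probe_naive
probe_saved
```

With these stand-ins the first function reports "run" and the second
reports "notrun".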


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 2/6] xfs/{021,122}: adapt to fsverity xattrs
  2024-04-30  3:19 ` [PATCHSET v5.6] fstests: fs-verity support for XFS Darrick J. Wong
  2024-04-30  3:41   ` [PATCH 1/6] common/verity: enable fsverity " Darrick J. Wong
@ 2024-04-30  3:41   ` Darrick J. Wong
  2024-04-30 12:46     ` Andrey Albershteyn
  2024-04-30  3:41   ` [PATCH 3/6] xfs/122: adapt to fsverity Darrick J. Wong
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:41 UTC (permalink / raw)
  To: aalbersh, zlang, ebiggers, djwong
  Cc: fsverity, linux-fsdevel, guan, linux-xfs, fstests

From: Darrick J. Wong <djwong@kernel.org>

Adjust these tests to accommodate the use of xattrs to store fsverity
metadata.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/021     |    3 +++
 tests/xfs/122.out |    1 +
 2 files changed, 4 insertions(+)


diff --git a/tests/xfs/021 b/tests/xfs/021
index ef307fc064..dcecf41958 100755
--- a/tests/xfs/021
+++ b/tests/xfs/021
@@ -118,6 +118,7 @@ _scratch_xfs_db -r -c "inode $inum_1" -c "print a.sfattr"  | \
 	perl -ne '
 /\.secure/ && next;
 /\.parent/ && next;
+/\.verity/ && next;
 	print unless /^\d+:\[.*/;'
 
 echo "*** dump attributes (2)"
@@ -128,6 +129,7 @@ _scratch_xfs_db -r -c "inode $inum_2" -c "a a.bmx[0].startblock" -c print  \
 	| perl -ne '
 s/,secure//;
 s/,parent//;
+s/,verity//;
 s/info.hdr/info/;
 /hdr.info.crc/ && next;
 /hdr.info.bno/ && next;
@@ -135,6 +137,7 @@ s/info.hdr/info/;
 /hdr.info.lsn/ && next;
 /hdr.info.owner/ && next;
 /\.parent/ && next;
+/\.verity/ && next;
 s/^(hdr.info.magic =) 0x3bee/\1 0xfbee/;
 s/^(hdr.firstused =) (\d+)/\1 FIRSTUSED/;
 s/^(hdr.freemap\[0-2] = \[base,size]).*/\1 [FREEMAP..]/;
diff --git a/tests/xfs/122.out b/tests/xfs/122.out
index abd82e7142..019fe7545f 100644
--- a/tests/xfs/122.out
+++ b/tests/xfs/122.out
@@ -142,6 +142,7 @@ sizeof(struct xfs_scrub_vec) = 16
 sizeof(struct xfs_scrub_vec_head) = 40
 sizeof(struct xfs_swap_extent) = 64
 sizeof(struct xfs_unmount_log_format) = 8
+sizeof(struct xfs_verity_merkle_key) = 8
 sizeof(struct xfs_xmd_log_format) = 16
 sizeof(struct xfs_xmi_log_format) = 88
 sizeof(union xfs_rtword_raw) = 4


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 3/6] xfs/122: adapt to fsverity
  2024-04-30  3:19 ` [PATCHSET v5.6] fstests: fs-verity support for XFS Darrick J. Wong
  2024-04-30  3:41   ` [PATCH 1/6] common/verity: enable fsverity " Darrick J. Wong
  2024-04-30  3:41   ` [PATCH 2/6] xfs/{021,122}: adapt to fsverity xattrs Darrick J. Wong
@ 2024-04-30  3:41   ` Darrick J. Wong
  2024-04-30 12:45     ` Andrey Albershteyn
  2024-04-30  3:41   ` [PATCH 4/6] xfs: test xfs_scrub detection and correction of corrupt fsverity metadata Darrick J. Wong
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:41 UTC (permalink / raw)
  To: aalbersh, zlang, ebiggers, djwong
  Cc: fsverity, linux-fsdevel, guan, linux-xfs, fstests

From: Darrick J. Wong <djwong@kernel.org>

Add fields for fsverity ondisk structures.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/122.out |    2 ++
 1 file changed, 2 insertions(+)


diff --git a/tests/xfs/122.out b/tests/xfs/122.out
index 019fe7545f..22f36c0311 100644
--- a/tests/xfs/122.out
+++ b/tests/xfs/122.out
@@ -65,6 +65,7 @@ sizeof(struct xfs_agfl) = 36
 sizeof(struct xfs_attr3_leaf_hdr) = 80
 sizeof(struct xfs_attr3_leafblock) = 88
 sizeof(struct xfs_attr3_rmt_hdr) = 56
+sizeof(struct xfs_attr3_rmtverity_hdr) = 36
 sizeof(struct xfs_attr_sf_entry) = 3
 sizeof(struct xfs_attr_sf_hdr) = 4
 sizeof(struct xfs_attr_shortform) = 8
@@ -120,6 +121,7 @@ sizeof(struct xfs_log_dinode) = 176
 sizeof(struct xfs_log_legacy_timestamp) = 8
 sizeof(struct xfs_map_extent) = 32
 sizeof(struct xfs_map_freesp) = 32
+sizeof(struct xfs_merkle_key) = 8
 sizeof(struct xfs_parent_rec) = 12
 sizeof(struct xfs_phys_extent) = 16
 sizeof(struct xfs_refcount_key) = 4


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 4/6] xfs: test xfs_scrub detection and correction of corrupt fsverity metadata
  2024-04-30  3:19 ` [PATCHSET v5.6] fstests: fs-verity support for XFS Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-04-30  3:41   ` [PATCH 3/6] xfs/122: adapt to fsverity Darrick J. Wong
@ 2024-04-30  3:41   ` Darrick J. Wong
  2024-04-30 12:29     ` Andrey Albershteyn
  2024-04-30  3:42   ` [PATCH 5/6] xfs: test disabling fsverity Darrick J. Wong
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:41 UTC (permalink / raw)
  To: aalbersh, zlang, ebiggers, djwong
  Cc: fsverity, linux-fsdevel, guan, linux-xfs, fstests

From: Darrick J. Wong <djwong@kernel.org>

Create a basic test to ensure that xfs_scrub media scans complain about
files that don't pass fsverity validation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/1880     |  135 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/1880.out |   37 ++++++++++++++
 2 files changed, 172 insertions(+)
 create mode 100755 tests/xfs/1880
 create mode 100644 tests/xfs/1880.out


diff --git a/tests/xfs/1880 b/tests/xfs/1880
new file mode 100755
index 0000000000..a2119f04c2
--- /dev/null
+++ b/tests/xfs/1880
@@ -0,0 +1,135 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2024 Oracle.  All Rights Reserved.
+#
+# FS QA Test 1880
+#
+# Corrupt fsverity descriptor, merkle tree blocks, and file contents.  Ensure
+# that xfs_scrub detects this and repairs whatever it can.
+#
+. ./common/preamble
+_begin_fstest auto quick verity
+
+_cleanup()
+{
+	cd /
+	_restore_fsverity_signatures
+	rm -f $tmp.*
+}
+
+. ./common/verity
+. ./common/filter
+. ./common/fuzzy
+
+_supported_fs xfs
+_require_scratch_verity
+_disable_fsverity_signatures
+_require_fsverity_corruption
+_require_scratch_nocheck	# fsck test
+
+_scratch_mkfs >> $seqres.full
+_scratch_mount
+
+_require_scratch_xfs_scrub
+_require_xfs_has_feature "$SCRATCH_MNT" verity
+VICTIM_FILE="$SCRATCH_MNT/a"
+_fsv_can_enable "$VICTIM_FILE" || _notrun "cannot enable fsverity"
+
+create_victim()
+{
+	local filesize="${1:-3}"
+
+	rm -f "$VICTIM_FILE"
+	perl -e "print 'moo' x $((filesize / 3))" > "$VICTIM_FILE"
+	fsverity enable --hash-alg=sha256 --block-size=1024 "$VICTIM_FILE"
+	fsverity measure "$VICTIM_FILE" | _filter_scratch
+}
+
+filter_scrub() {
+	awk '{
+		if ($0 ~ /fsverity metadata missing/) {
+			print("fsverity metadata missing");
+		} else if ($0 ~ /Corruption.*inode record/) {
+			print("xfs_ino corruption");
+		} else if ($0 ~ /verity error at offset/) {
+			print("fsverity read error");
+		}
+	}'
+}
+
+run_scrub() {
+	$XFS_SCRUB_PROG -b -x $* $SCRATCH_MNT &> $tmp.moo
+	filter_scrub < $tmp.moo
+	cat $tmp.moo >> $seqres.full
+}
+
+cat_victim() {
+	$XFS_IO_PROG -r -c 'pread -q 0 4096' "$VICTIM_FILE" 2>&1 | _filter_scratch
+}
+
+echo "Part 1: Delete the fsverity descriptor" | tee -a $seqres.full
+create_victim
+_scratch_unmount
+_scratch_xfs_db -x -c "path /a" -c "attr_remove -f vdesc" -c 'ablock 0' -c print >> $seqres.full
+_scratch_mount
+cat_victim
+run_scrub -n
+
+echo "Part 2: Run repair to clear XFS_DIFLAG2_VERITY" | tee -a $seqres.full
+run_scrub
+cat_victim
+run_scrub -n
+
+echo "Part 3: Corrupt the fsverity descriptor" | tee -a $seqres.full
+create_victim
+_scratch_unmount
+_scratch_xfs_db -x -c "path /a" -c 'attr_modify -f "vdesc" -o 0 "BUGSAHOY"' -c 'ablock 0' -c print >> $seqres.full
+_scratch_mount
+cat_victim
+run_scrub -n
+
+echo "Part 4: Run repair to clear XFS_DIFLAG2_VERITY" | tee -a $seqres.full
+run_scrub
+cat_victim
+run_scrub -n
+
+echo "Part 5: Corrupt the fsverity file data" | tee -a $seqres.full
+create_victim
+_scratch_unmount
+_scratch_xfs_db -x -c "path /a" -c 'dblock 0' -c 'blocktrash -3 -o 0 -x 24 -y 24 -z' -c print >> $seqres.full
+_scratch_mount
+cat_victim
+run_scrub -n
+
+echo "Part 6: Run repair which will not help" | tee -a $seqres.full
+run_scrub
+cat_victim
+run_scrub -n
+
+echo "Part 7: Corrupt a merkle tree block" | tee -a $seqres.full
+create_victim 1234 # two merkle tree blocks
+_fsv_scratch_corrupt_merkle_tree "$VICTIM_FILE" 0
+cat_victim
+run_scrub -n
+
+echo "Part 8: Run repair which will not help" | tee -a $seqres.full
+run_scrub
+cat_victim
+run_scrub -n
+
+echo "Part 9: Corrupt the fsverity salt" | tee -a $seqres.full
+create_victim
+_scratch_unmount
+_scratch_xfs_db -x -c "path /a" -c 'attr_modify -f "vdesc" -o 3 #08' -c 'attr_modify -f "vdesc" -o 80 "BUGSAHOY"' -c 'ablock 0' -c print >> $seqres.full
+_scratch_mount
+cat_victim
+run_scrub -n
+
+echo "Part 10: Run repair which will not help" | tee -a $seqres.full
+run_scrub
+cat_victim
+run_scrub -n
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/1880.out b/tests/xfs/1880.out
new file mode 100644
index 0000000000..17961ec70b
--- /dev/null
+++ b/tests/xfs/1880.out
@@ -0,0 +1,37 @@
+QA output created by 1880
+Part 1: Delete the fsverity descriptor
+sha256:bab5cfebae30d53e4318629d4ba0b4760d6aae38e03ae235741ed69a31873f1f SCRATCH_MNT/a
+SCRATCH_MNT/a: Invalid argument
+xfs_ino corruption
+fsverity metadata missing
+Part 2: Run repair to clear XFS_DIFLAG2_VERITY
+Part 3: Corrupt the fsverity descriptor
+sha256:bab5cfebae30d53e4318629d4ba0b4760d6aae38e03ae235741ed69a31873f1f SCRATCH_MNT/a
+SCRATCH_MNT/a: Invalid argument
+xfs_ino corruption
+fsverity metadata missing
+Part 4: Run repair to clear XFS_DIFLAG2_VERITY
+Part 5: Corrupt the fsverity file data
+sha256:bab5cfebae30d53e4318629d4ba0b4760d6aae38e03ae235741ed69a31873f1f SCRATCH_MNT/a
+pread: Input/output error
+fsverity read error
+Part 6: Run repair which will not help
+fsverity read error
+pread: Input/output error
+fsverity read error
+Part 7: Corrupt a merkle tree block
+sha256:c56f1115966bafa6c9d32b4717f554b304161f33923c9292c7a92a27866a853c SCRATCH_MNT/a
+pread: Input/output error
+fsverity read error
+Part 8: Run repair which will not help
+fsverity read error
+pread: Input/output error
+fsverity read error
+Part 9: Corrupt the fsverity salt
+sha256:bab5cfebae30d53e4318629d4ba0b4760d6aae38e03ae235741ed69a31873f1f SCRATCH_MNT/a
+pread: Input/output error
+fsverity read error
+Part 10: Run repair which will not help
+fsverity read error
+pread: Input/output error
+fsverity read error
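
For reference, Part 7 of the test passes offset 0 to
_fsv_scratch_corrupt_merkle_tree. The xfs branch of that helper (added to
common/verity in patch 1/6) splits the byte offset into the start of the
merkle tree block, zero-padded to 8 digits, and the offset within that
block. A minimal sketch of the same arithmetic follows; merkle_attr_coords
is a hypothetical name, and the block size of 1024 is assumed from the
--block-size=1024 used by create_victim.

```shell
# Hypothetical helper mirroring the arithmetic in the xfs branch of
# _fsv_scratch_corrupt_merkle_tree: print "<attr_index> <attr_offset>".
merkle_attr_coords() {
	local offset=$1 block_size=$2
	local attr_offset=$(( offset % block_size ))
	local attr_index
	attr_index=$(printf "%08d" $(( offset - attr_offset )))
	echo "$attr_index $attr_offset"
}

merkle_attr_coords 0 1024	# 00000000 0
merkle_attr_coords 1500 1024	# 00001024 476
```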


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 5/6] xfs: test disabling fsverity
  2024-04-30  3:19 ` [PATCHSET v5.6] fstests: fs-verity support for XFS Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-04-30  3:41   ` [PATCH 4/6] xfs: test xfs_scrub detection and correction of corrupt fsverity metadata Darrick J. Wong
@ 2024-04-30  3:42   ` Darrick J. Wong
  2024-04-30 12:56     ` Andrey Albershteyn
  2024-04-30 13:11     ` Andrey Albershteyn
  2024-04-30  3:42   ` [PATCH 6/6] common/populate: add verity files to populate xfs images Darrick J. Wong
  2024-05-11  5:01   ` [PATCHSET v5.6] fstests: fs-verity support for XFS Zorro Lang
  6 siblings, 2 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:42 UTC (permalink / raw)
  To: aalbersh, zlang, ebiggers, djwong
  Cc: fsverity, linux-fsdevel, guan, linux-xfs, fstests

From: Darrick J. Wong <djwong@kernel.org>

Add a test to make sure that we can disable fsverity on a file that
doesn't pass fsverity validation on its contents anymore.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/1881     |  111 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/1881.out |   28 +++++++++++++
 2 files changed, 139 insertions(+)
 create mode 100755 tests/xfs/1881
 create mode 100644 tests/xfs/1881.out


diff --git a/tests/xfs/1881 b/tests/xfs/1881
new file mode 100755
index 0000000000..411802d7c7
--- /dev/null
+++ b/tests/xfs/1881
@@ -0,0 +1,111 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2024 Oracle.  All Rights Reserved.
+#
+# FS QA Test 1881
+#
+# Corrupt fsverity descriptor, merkle tree blocks, and file contents.  Ensure
+# that we can still disable fsverity, at least for the latter cases.
+#
+. ./common/preamble
+_begin_fstest auto quick verity
+
+_cleanup()
+{
+	cd /
+	_restore_fsverity_signatures
+	rm -f $tmp.*
+}
+
+. ./common/verity
+. ./common/filter
+. ./common/fuzzy
+
+_supported_fs xfs
+_require_scratch_verity
+_disable_fsverity_signatures
+_require_fsverity_corruption
+_require_xfs_io_command noverity
+_require_scratch_nocheck	# corruption test
+
+_scratch_mkfs >> $seqres.full
+_scratch_mount
+
+_require_xfs_has_feature "$SCRATCH_MNT" verity
+VICTIM_FILE="$SCRATCH_MNT/a"
+_fsv_can_enable "$VICTIM_FILE" || _notrun "cannot enable fsverity"
+
+create_victim()
+{
+	local filesize="${1:-3}"
+
+	rm -f "$VICTIM_FILE"
+	perl -e "print 'moo' x $((filesize / 3))" > "$VICTIM_FILE"
+	fsverity enable --hash-alg=sha256 --block-size=1024 "$VICTIM_FILE"
+	fsverity measure "$VICTIM_FILE" | _filter_scratch
+}
+
+disable_verity() {
+	$XFS_IO_PROG -r -c 'noverity' "$VICTIM_FILE" 2>&1 | _filter_scratch
+}
+
+cat_victim() {
+	$XFS_IO_PROG -r -c 'pread -q 0 4096' "$VICTIM_FILE" 2>&1 | _filter_scratch
+}
+
+echo "Part 1: Delete the fsverity descriptor" | tee -a $seqres.full
+create_victim
+_scratch_unmount
+_scratch_xfs_db -x -c "path /a" -c "attr_remove -f vdesc" -c 'ablock 0' -c print >> $seqres.full
+_scratch_mount
+cat_victim
+
+echo "Part 2: Disable fsverity, which won't work" | tee -a $seqres.full
+disable_verity
+cat_victim
+
+echo "Part 3: Corrupt the fsverity descriptor" | tee -a $seqres.full
+create_victim
+_scratch_unmount
+_scratch_xfs_db -x -c "path /a" -c 'attr_modify -f "vdesc" -o 0 "BUGSAHOY"' -c 'ablock 0' -c print >> $seqres.full
+_scratch_mount
+cat_victim
+
+echo "Part 4: Disable fsverity, which won't work" | tee -a $seqres.full
+disable_verity
+cat_victim
+
+echo "Part 5: Corrupt the fsverity file data" | tee -a $seqres.full
+create_victim
+_scratch_unmount
+_scratch_xfs_db -x -c "path /a" -c 'dblock 0' -c 'blocktrash -3 -o 0 -x 24 -y 24 -z' -c print >> $seqres.full
+_scratch_mount
+cat_victim
+
+echo "Part 6: Disable fsverity, which should work" | tee -a $seqres.full
+disable_verity
+cat_victim
+
+echo "Part 7: Corrupt a merkle tree block" | tee -a $seqres.full
+create_victim 1234 # two merkle tree blocks
+_fsv_scratch_corrupt_merkle_tree "$VICTIM_FILE" 0
+cat_victim
+
+echo "Part 8: Disable fsverity, which should work" | tee -a $seqres.full
+disable_verity
+cat_victim
+
+echo "Part 9: Corrupt the fsverity salt" | tee -a $seqres.full
+create_victim
+_scratch_unmount
+_scratch_xfs_db -x -c "path /a" -c 'attr_modify -f "vdesc" -o 3 #08' -c 'attr_modify -f "vdesc" -o 80 "BUGSAHOY"' -c 'ablock 0' -c print >> $seqres.full
+_scratch_mount
+cat_victim
+
+echo "Part 10: Disable fsverity, which should work" | tee -a $seqres.full
+disable_verity
+cat_victim
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/1881.out b/tests/xfs/1881.out
new file mode 100644
index 0000000000..3e94b8001e
--- /dev/null
+++ b/tests/xfs/1881.out
@@ -0,0 +1,28 @@
+QA output created by 1881
+Part 1: Delete the fsverity descriptor
+sha256:bab5cfebae30d53e4318629d4ba0b4760d6aae38e03ae235741ed69a31873f1f SCRATCH_MNT/a
+SCRATCH_MNT/a: Invalid argument
+Part 2: Disable fsverity, which won't work
+SCRATCH_MNT/a: Invalid argument
+SCRATCH_MNT/a: Invalid argument
+Part 3: Corrupt the fsverity descriptor
+sha256:bab5cfebae30d53e4318629d4ba0b4760d6aae38e03ae235741ed69a31873f1f SCRATCH_MNT/a
+SCRATCH_MNT/a: Invalid argument
+Part 4: Disable fsverity, which won't work
+SCRATCH_MNT/a: Invalid argument
+SCRATCH_MNT/a: Invalid argument
+Part 5: Corrupt the fsverity file data
+sha256:bab5cfebae30d53e4318629d4ba0b4760d6aae38e03ae235741ed69a31873f1f SCRATCH_MNT/a
+pread: Input/output error
+Part 6: Disable fsverity, which should work
+pread: Input/output error
+Part 7: Corrupt a merkle tree block
+sha256:c56f1115966bafa6c9d32b4717f554b304161f33923c9292c7a92a27866a853c SCRATCH_MNT/a
+pread: Input/output error
+Part 8: Disable fsverity, which should work
+pread: Input/output error
+Part 9: Corrupt the fsverity salt
+sha256:bab5cfebae30d53e4318629d4ba0b4760d6aae38e03ae235741ed69a31873f1f SCRATCH_MNT/a
+pread: Input/output error
+Part 10: Disable fsverity, which should work
+pread: Input/output error


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* [PATCH 6/6] common/populate: add verity files to populate xfs images
  2024-04-30  3:19 ` [PATCHSET v5.6] fstests: fs-verity support for XFS Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-04-30  3:42   ` [PATCH 5/6] xfs: test disabling fsverity Darrick J. Wong
@ 2024-04-30  3:42   ` Darrick J. Wong
  2024-04-30 13:22     ` Andrey Albershteyn
  2024-05-11  5:01   ` [PATCHSET v5.6] fstests: fs-verity support for XFS Zorro Lang
  6 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30  3:42 UTC (permalink / raw)
  To: aalbersh, zlang, ebiggers, djwong
  Cc: fsverity, linux-fsdevel, guan, linux-xfs, fstests

From: Darrick J. Wong <djwong@kernel.org>

If verity is enabled on a filesystem, we should create some sample
verity files.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/populate |   24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)


diff --git a/common/populate b/common/populate
index 35071f4210..ab9495e739 100644
--- a/common/populate
+++ b/common/populate
@@ -520,6 +520,30 @@ _scratch_xfs_populate() {
 		done
 	fi
 
+	# verity merkle trees
+	is_verity="$(_xfs_has_feature "$SCRATCH_MNT" verity -v)"
+	if [ $is_verity -gt 0 ]; then
+		echo "+ fsverity"
+
+		# Create a biggish file with all zeroes, because metadump
+		# won't preserve data blocks and we don't want the hashes to
+		# stop working for our sample fs.
+		for ((pos = 0, i = 88; pos < 23456789; pos += 234567, i++)); do
+			$XFS_IO_PROG -f -c "pwrite -S 0 $pos 234567" "$SCRATCH_MNT/verity"
+		done
+
+		fsverity enable "$SCRATCH_MNT/verity"
+
+		# Create a sparse file
+		$XFS_IO_PROG -f -c "pwrite -S 0 0 3" -c "pwrite -S 0 23456789 3" "$SCRATCH_MNT/sparse_verity"
+		fsverity enable "$SCRATCH_MNT/sparse_verity"
+
+		# Create a salted sparse file
+		$XFS_IO_PROG -f -c "pwrite -S 0 0 3" -c "pwrite -S 0 23456789 3" "$SCRATCH_MNT/salted_verity"
+		local salt="5846532066696e616c6c7920686173206461746120636865636b73756d732121"	# XFS finally has data checksums!!
+		fsverity enable --salt="$salt" "$SCRATCH_MNT/salted_verity"
+	fi
+
 	# Copy some real files (xfs tests, I guess...)
 	echo "+ real files"
 	test $fill -ne 0 && __populate_fill_fs "${SCRATCH_MNT}" 5
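
As an aside, the salt passed to "fsverity enable --salt" above is the hex
encoding of the 32-byte ASCII string in the trailing comment. A small bash
sketch that decodes it (hex_to_ascii is a hypothetical helper written for
this illustration, not part of fstests):

```shell
# Hypothetical helper: decode a hex string into ASCII, one byte at a time.
# Requires bash (substring expansion and printf \x escapes).
hex_to_ascii() {
	local hex="$1" out="" i
	for (( i = 0; i < ${#hex}; i += 2 )); do
		out+=$(printf "\\x${hex:i:2}")
	done
	echo "$out"
}

salt="5846532066696e616c6c7920686173206461746120636865636b73756d732121"
hex_to_ascii "$salt"	# XFS finally has data checksums!!
```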


^ permalink raw reply related	[flat|nested] 159+ messages in thread

* Re: [PATCH 4/6] xfs: test xfs_scrub detection and correction of corrupt fsverity metadata
  2024-04-30  3:41   ` [PATCH 4/6] xfs: test xfs_scrub detection and correction of corrupt fsverity metadata Darrick J. Wong
@ 2024-04-30 12:29     ` Andrey Albershteyn
  2024-04-30 15:43       ` Darrick J. Wong
  0 siblings, 1 reply; 159+ messages in thread
From: Andrey Albershteyn @ 2024-04-30 12:29 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: zlang, ebiggers, fsverity, linux-fsdevel, guan, linux-xfs, fstests

On 2024-04-29 20:41:50, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Create a basic test to ensure that xfs_scrub media scans complain about
> files that don't pass fsverity validation.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  tests/xfs/1880     |  135 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/xfs/1880.out |   37 ++++++++++++++
>  2 files changed, 172 insertions(+)
>  create mode 100755 tests/xfs/1880
>  create mode 100644 tests/xfs/1880.out
> 
> 
> diff --git a/tests/xfs/1880 b/tests/xfs/1880
> new file mode 100755
> index 0000000000..a2119f04c2
> --- /dev/null
> +++ b/tests/xfs/1880
> @@ -0,0 +1,135 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2024 Oracle.  All Rights Reserved.
> +#
> +# FS QA Test 1880
> +#
> +# Corrupt fsverity descriptor, merkle tree blocks, and file contents.  Ensure
> +# that xfs_scrub detects this and repairs whatever it can.
> +#
> +. ./common/preamble
> +_begin_fstest auto quick verity
> +
> +_cleanup()
> +{
> +	cd /
> +	_restore_fsverity_signatures
> +	rm -f $tmp.*
> +}
> +
> +. ./common/verity
> +. ./common/filter
> +. ./common/fuzzy
> +
> +_supported_fs xfs
> +_require_scratch_verity
> +_disable_fsverity_signatures
> +_require_fsverity_corruption
> +_require_scratch_nocheck	# fsck test
> +
> +_scratch_mkfs >> $seqres.full
> +_scratch_mount
> +
> +_require_scratch_xfs_scrub
> +_require_xfs_has_feature "$SCRATCH_MNT" verity
> +VICTIM_FILE="$SCRATCH_MNT/a"
> +_fsv_can_enable "$VICTIM_FILE" || _notrun "cannot enable fsverity"

I think this is not necessary; _require_scratch_verity already checks
whether verity can be enabled (with more detailed errors).

Otherwise, looks good to me:
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>

-- 
- Andrey


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 1/6] common/verity: enable fsverity for XFS
  2024-04-30  3:41   ` [PATCH 1/6] common/verity: enable fsverity " Darrick J. Wong
@ 2024-04-30 12:39     ` Andrey Albershteyn
  2024-04-30 15:35       ` Darrick J. Wong
  0 siblings, 1 reply; 159+ messages in thread
From: Andrey Albershteyn @ 2024-04-30 12:39 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: zlang, ebiggers, Andrey Albershteyn, fsverity, linux-fsdevel,
	guan, linux-xfs, fstests

On 2024-04-29 20:41:03, Darrick J. Wong wrote:
> From: Andrey Albershteyn <aalbersh@redhat.com>
> 
> XFS supports fs-verity, so enable it for the -g verity test group.
> 
> Signed-off-by: Andrey Albershteyn <andrey.albershteyn@gmail.com>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  common/verity |   39 +++++++++++++++++++++++++++++++++++++--
>  1 file changed, 37 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/common/verity b/common/verity
> index 59b67e1201..20408c8c0e 100644
> --- a/common/verity
> +++ b/common/verity
> @@ -43,7 +43,16 @@ _require_scratch_verity()
>  
>  	# The filesystem may be aware of fs-verity but have it disabled by
>  	# CONFIG_FS_VERITY=n.  Detect support via sysfs.
> -	if [ ! -e /sys/fs/$fstyp/features/verity ]; then
> +	case $FSTYP in
> +	xfs)
> +		_scratch_unmount
> +		_check_scratch_xfs_features VERITY &>>$seqres.full
> +		_scratch_mount
> +	;;
> +	*)
> +		test -e /sys/fs/$fstyp/features/verity
> +	esac
> +	if [ $? -ne 0 ]; then
>  		_notrun "kernel $fstyp isn't configured with verity support"
>  	fi
>  
> @@ -201,6 +210,9 @@ _scratch_mkfs_verity()
>  	ext4|f2fs)
>  		_scratch_mkfs -O verity
>  		;;
> +	xfs)
> +		_scratch_mkfs -i verity
> +		;;
>  	btrfs)
>  		_scratch_mkfs
>  		;;
> @@ -334,12 +346,19 @@ _fsv_scratch_corrupt_bytes()
>  	local lstart lend pstart pend
>  	local dd_cmds=()
>  	local cmd
> +	local device=$SCRATCH_DEV
>  
>  	sync	# Sync to avoid unwritten extents
>  
>  	cat > $tmp.bytes
>  	local end=$(( offset + $(_get_filesize $tmp.bytes ) ))
>  
> +	# If this is an xfs realtime file, switch @device to the rt device
> +	if [ $FSTYP = "xfs" ]; then
> +		$XFS_IO_PROG -r -c 'stat -v' "$file" | grep -q -w realtime && \
> +			device=$SCRATCH_RTDEV
> +	fi
> +
>  	# For each extent that intersects the requested range in order, add a
>  	# command that writes the next part of the data to that extent.
>  	while read -r lstart lend pstart pend; do
> @@ -355,7 +374,7 @@ _fsv_scratch_corrupt_bytes()
>  		elif (( offset < lend )); then
>  			local len=$((lend - offset))
>  			local seek=$((pstart + (offset - lstart)))
> -			dd_cmds+=("head -c $len | dd of=$SCRATCH_DEV oflag=seek_bytes seek=$seek status=none")
> +			dd_cmds+=("head -c $len | dd of=$device oflag=seek_bytes seek=$seek status=none")
>  			(( offset += len ))
>  		fi
>  	done < <($XFS_IO_PROG -r -c "fiemap $offset $((end - offset))" "$file" \
> @@ -408,6 +427,22 @@ _fsv_scratch_corrupt_merkle_tree()
>  		done
>  		_scratch_mount
>  		;;
> +	xfs)
> +		local ino=$(stat -c '%i' $file)

I didn't know about xfs_db's "path" command; this can probably be
replaced with -c "path $file" below, in _scratch_xfs_db.
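
For illustration only: the tests elsewhere in this series pass xfs_db a
filesystem-relative path ("path /a" for $SCRATCH_MNT/a), so $file would
presumably need its mount point stripped first. A minimal sketch of that
step, with scratch_relpath as a hypothetical helper name and /mnt/scratch
as an assumed mount point:

```shell
# Hypothetical helper: turn a path under the mount point into the
# filesystem-relative form used with xfs_db's "path" command.
scratch_relpath() {
	local mnt="$1" file="$2"
	echo "/${file#"$mnt"/}"
}

scratch_relpath /mnt/scratch /mnt/scratch/a	# /a
```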

> +		local attr_offset=$(( $offset % $FSV_BLOCK_SIZE ))
> +		local attr_index=$(printf "%08d" $(( offset - attr_offset )))
> +		_scratch_unmount
> +		# Attribute name is 8 bytes long (byte position of Merkle tree block)
> +		_scratch_xfs_db -x -c "inode $ino" \
                                here   ^^^^^^^^^^
> +			-c "attr_modify -f -m 8 -o $attr_offset $attr_index \"BUG\"" \
> +			-c "ablock 0" -c "print" \
> +			>>$seqres.full
> +		# If bsize == 4096 and the merkle block size == 1024, modifying
> +		# the attribute with attr_modify can corrupt quota accounting,
> +		# so repair the filesystem afterwards.
> +		_scratch_xfs_repair >> $seqres.full 2>&1
> +		_scratch_mount
> +		;;
>  	*)
>  		_fail "_fsv_scratch_corrupt_merkle_tree() unimplemented on $FSTYP"
>  		;;
> 
> 

Otherwise, looks good to me:
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>

-- 
- Andrey


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 3/6] xfs/122: adapt to fsverity
  2024-04-30  3:41   ` [PATCH 3/6] xfs/122: adapt to fsverity Darrick J. Wong
@ 2024-04-30 12:45     ` Andrey Albershteyn
  2024-04-30 15:37       ` Darrick J. Wong
  0 siblings, 1 reply; 159+ messages in thread
From: Andrey Albershteyn @ 2024-04-30 12:45 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: zlang, ebiggers, fsverity, linux-fsdevel, guan, linux-xfs, fstests

On 2024-04-29 20:41:34, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Add fields for fsverity ondisk structures.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  tests/xfs/122.out |    2 ++
>  1 file changed, 2 insertions(+)
> 
> 
> diff --git a/tests/xfs/122.out b/tests/xfs/122.out
> index 019fe7545f..22f36c0311 100644
> --- a/tests/xfs/122.out
> +++ b/tests/xfs/122.out
> @@ -65,6 +65,7 @@ sizeof(struct xfs_agfl) = 36
>  sizeof(struct xfs_attr3_leaf_hdr) = 80
>  sizeof(struct xfs_attr3_leafblock) = 88
>  sizeof(struct xfs_attr3_rmt_hdr) = 56
> +sizeof(struct xfs_attr3_rmtverity_hdr) = 36
>  sizeof(struct xfs_attr_sf_entry) = 3
>  sizeof(struct xfs_attr_sf_hdr) = 4
>  sizeof(struct xfs_attr_shortform) = 8
> @@ -120,6 +121,7 @@ sizeof(struct xfs_log_dinode) = 176
>  sizeof(struct xfs_log_legacy_timestamp) = 8
>  sizeof(struct xfs_map_extent) = 32
>  sizeof(struct xfs_map_freesp) = 32
> +sizeof(struct xfs_merkle_key) = 8
>  sizeof(struct xfs_parent_rec) = 12
>  sizeof(struct xfs_phys_extent) = 16
>  sizeof(struct xfs_refcount_key) = 4
> 
> 

Shouldn't this patch be squashed with the previous one?

-- 
- Andrey


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 2/6] xfs/{021,122}: adapt to fsverity xattrs
  2024-04-30  3:41   ` [PATCH 2/6] xfs/{021,122}: adapt to fsverity xattrs Darrick J. Wong
@ 2024-04-30 12:46     ` Andrey Albershteyn
  2024-04-30 15:36       ` Darrick J. Wong
  0 siblings, 1 reply; 159+ messages in thread
From: Andrey Albershteyn @ 2024-04-30 12:46 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: zlang, ebiggers, fsverity, linux-fsdevel, guan, linux-xfs, fstests

On 2024-04-29 20:41:19, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Adjust these tests to accommodate the use of xattrs to store fsverity
> metadata.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  tests/xfs/021     |    3 +++
>  tests/xfs/122.out |    1 +
>  2 files changed, 4 insertions(+)
> 
> 
> diff --git a/tests/xfs/021 b/tests/xfs/021
> index ef307fc064..dcecf41958 100755
> --- a/tests/xfs/021
> +++ b/tests/xfs/021
> @@ -118,6 +118,7 @@ _scratch_xfs_db -r -c "inode $inum_1" -c "print a.sfattr"  | \
>  	perl -ne '
>  /\.secure/ && next;
>  /\.parent/ && next;
> +/\.verity/ && next;
>  	print unless /^\d+:\[.*/;'
>  
>  echo "*** dump attributes (2)"
> @@ -128,6 +129,7 @@ _scratch_xfs_db -r -c "inode $inum_2" -c "a a.bmx[0].startblock" -c print  \
>  	| perl -ne '
>  s/,secure//;
>  s/,parent//;
> +s/,verity//;
>  s/info.hdr/info/;
>  /hdr.info.crc/ && next;
>  /hdr.info.bno/ && next;
> @@ -135,6 +137,7 @@ s/info.hdr/info/;
>  /hdr.info.lsn/ && next;
>  /hdr.info.owner/ && next;
>  /\.parent/ && next;
> +/\.verity/ && next;
>  s/^(hdr.info.magic =) 0x3bee/\1 0xfbee/;
>  s/^(hdr.firstused =) (\d+)/\1 FIRSTUSED/;
>  s/^(hdr.freemap\[0-2] = \[base,size]).*/\1 [FREEMAP..]/;
> diff --git a/tests/xfs/122.out b/tests/xfs/122.out
> index abd82e7142..019fe7545f 100644
> --- a/tests/xfs/122.out
> +++ b/tests/xfs/122.out
> @@ -142,6 +142,7 @@ sizeof(struct xfs_scrub_vec) = 16
>  sizeof(struct xfs_scrub_vec_head) = 40
>  sizeof(struct xfs_swap_extent) = 64
>  sizeof(struct xfs_unmount_log_format) = 8
> +sizeof(struct xfs_verity_merkle_key) = 8
>  sizeof(struct xfs_xmd_log_format) = 16
>  sizeof(struct xfs_xmi_log_format) = 88
>  sizeof(union xfs_rtword_raw) = 4
> 

Looks good to me:
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>

-- 
- Andrey


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 5/6] xfs: test disabling fsverity
  2024-04-30  3:42   ` [PATCH 5/6] xfs: test disabling fsverity Darrick J. Wong
@ 2024-04-30 12:56     ` Andrey Albershteyn
  2024-04-30 13:11     ` Andrey Albershteyn
  1 sibling, 0 replies; 159+ messages in thread
From: Andrey Albershteyn @ 2024-04-30 12:56 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: zlang, ebiggers, fsverity, linux-fsdevel, guan, linux-xfs, fstests

On 2024-04-29 20:42:05, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Add a test to make sure that we can disable fsverity on a file that
> doesn't pass fsverity validation on its contents anymore.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  tests/xfs/1881     |  111 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/xfs/1881.out |   28 +++++++++++++
>  2 files changed, 139 insertions(+)
>  create mode 100755 tests/xfs/1881
>  create mode 100644 tests/xfs/1881.out
> 
> 
> diff --git a/tests/xfs/1881 b/tests/xfs/1881
> new file mode 100755
> index 0000000000..411802d7c7
> --- /dev/null
> +++ b/tests/xfs/1881
> @@ -0,0 +1,111 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2024 Oracle.  All Rights Reserved.
> +#
> +# FS QA Test 1881
> +#
> +# Corrupt fsverity descriptor, merkle tree blocks, and file contents.  Ensure
> +# that we can still disable fsverity, at least for the latter cases.
> +#
> +. ./common/preamble
> +_begin_fstest auto quick verity
> +
> +_cleanup()
> +{
> +	cd /
> +	_restore_fsverity_signatures
> +	rm -f $tmp.*
> +}
> +
> +. ./common/verity
> +. ./common/filter
> +. ./common/fuzzy
> +
> +_supported_fs xfs
> +_require_scratch_verity
> +_disable_fsverity_signatures
> +_require_fsverity_corruption
> +_require_xfs_io_command noverity
> +_require_scratch_nocheck	# corruption test
> +
> +_scratch_mkfs >> $seqres.full
> +_scratch_mount
> +
> +_require_xfs_has_feature "$SCRATCH_MNT" verity
> +VICTIM_FILE="$SCRATCH_MNT/a"
> +_fsv_can_enable "$VICTIM_FILE" || _notrun "cannot enable fsverity"

Same comment here: this check can go if it turns out not to be needed in 1880.

Looks good to me:
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>

-- 
- Andrey


* Re: [PATCH 5/6] xfs: test disabling fsverity
  2024-04-30  3:42   ` [PATCH 5/6] xfs: test disabling fsverity Darrick J. Wong
  2024-04-30 12:56     ` Andrey Albershteyn
@ 2024-04-30 13:11     ` Andrey Albershteyn
  2024-04-30 15:48       ` Darrick J. Wong
  1 sibling, 1 reply; 159+ messages in thread
From: Andrey Albershteyn @ 2024-04-30 13:11 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: zlang, ebiggers, fsverity, linux-fsdevel, guan, linux-xfs, fstests

On 2024-04-29 20:42:05, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Add a test to make sure that we can disable fsverity on a file that
> doesn't pass fsverity validation on its contents anymore.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  tests/xfs/1881     |  111 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/xfs/1881.out |   28 +++++++++++++
>  2 files changed, 139 insertions(+)
>  create mode 100755 tests/xfs/1881
>  create mode 100644 tests/xfs/1881.out
> 
> 
> diff --git a/tests/xfs/1881 b/tests/xfs/1881
> new file mode 100755
> index 0000000000..411802d7c7
> --- /dev/null
> +++ b/tests/xfs/1881
> @@ -0,0 +1,111 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2024 Oracle.  All Rights Reserved.
> +#
> +# FS QA Test 1881
> +#
> +# Corrupt fsverity descriptor, merkle tree blocks, and file contents.  Ensure
> +# that we can still disable fsverity, at least for the latter cases.
> +#
> +. ./common/preamble
> +_begin_fstest auto quick verity
> +
> +_cleanup()
> +{
> +	cd /
> +	_restore_fsverity_signatures
> +	rm -f $tmp.*
> +}
> +
> +. ./common/verity
> +. ./common/filter
> +. ./common/fuzzy
> +
> +_supported_fs xfs
> +_require_scratch_verity
> +_disable_fsverity_signatures
> +_require_fsverity_corruption
> +_require_xfs_io_command noverity
> +_require_scratch_nocheck	# corruption test
> +
> +_scratch_mkfs >> $seqres.full
> +_scratch_mount
> +
> +_require_xfs_has_feature "$SCRATCH_MNT" verity
> +VICTIM_FILE="$SCRATCH_MNT/a"
> +_fsv_can_enable "$VICTIM_FILE" || _notrun "cannot enable fsverity"
> +
> +create_victim()
> +{
> +	local filesize="${1:-3}"
> +
> +	rm -f "$VICTIM_FILE"
> +	perl -e "print 'moo' x $((filesize / 3))" > "$VICTIM_FILE"
> +	fsverity enable --hash-alg=sha256 --block-size=1024 "$VICTIM_FILE"
> +	fsverity measure "$VICTIM_FILE" | _filter_scratch
> +}
> +
> +disable_verity() {
> +	$XFS_IO_PROG -r -c 'noverity' "$VICTIM_FILE" 2>&1 | _filter_scratch
> +}
> +
> +cat_victim() {
> +	$XFS_IO_PROG -r -c 'pread -q 0 4096' "$VICTIM_FILE" 2>&1 | _filter_scratch
> +}
> +
> +echo "Part 1: Delete the fsverity descriptor" | tee -a $seqres.full
> +create_victim
> +_scratch_unmount
> +_scratch_xfs_db -x -c "path /a" -c "attr_remove -f vdesc" -c 'ablock 0' -c print >> $seqres.full
> +_scratch_mount
> +cat_victim
> +
> +echo "Part 2: Disable fsverity, which won't work" | tee -a $seqres.full
> +disable_verity
> +cat_victim
> +
> +echo "Part 3: Corrupt the fsverity descriptor" | tee -a $seqres.full
> +create_victim
> +_scratch_unmount
> +_scratch_xfs_db -x -c "path /a" -c 'attr_modify -f "vdesc" -o 0 "BUGSAHOY"' -c 'ablock 0' -c print >> $seqres.full
> +_scratch_mount
> +cat_victim
> +
> +echo "Part 4: Disable fsverity, which won't work" | tee -a $seqres.full
> +disable_verity
> +cat_victim
> +
> +echo "Part 5: Corrupt the fsverity file data" | tee -a $seqres.full
> +create_victim
> +_scratch_unmount
> +_scratch_xfs_db -x -c "path /a" -c 'dblock 0' -c 'blocktrash -3 -o 0 -x 24 -y 24 -z' -c print >> $seqres.full
> +_scratch_mount
> +cat_victim
> +
> +echo "Part 6: Disable fsverity, which should work" | tee -a $seqres.full
> +disable_verity
> +cat_victim
> +
> +echo "Part 7: Corrupt a merkle tree block" | tee -a $seqres.full
> +create_victim 1234 # two merkle tree blocks
> +_fsv_scratch_corrupt_merkle_tree "$VICTIM_FILE" 0

Hmm, _fsv_scratch_corrupt_merkle_tree calls _scratch_xfs_repair, and
now that xfs_repair knows about fs-verity, that is probably a problem. I
don't remember what the problem with quota was (why xfs_repair is
there); I can check it.

> +cat_victim
> +
> +echo "Part 8: Disable fsverity, which should work" | tee -a $seqres.full
> +disable_verity
> +cat_victim
> +
> +echo "Part 9: Corrupt the fsverity salt" | tee -a $seqres.full
> +create_victim
> +_scratch_unmount
> +_scratch_xfs_db -x -c "path /a" -c 'attr_modify -f "vdesc" -o 3 #08' -c 'attr_modify -f "vdesc" -o 80 "BUGSAHOY"' -c 'ablock 0' -c print >> $seqres.full
> +_scratch_mount
> +cat_victim
> +
> +echo "Part 10: Disable fsverity, which should work" | tee -a $seqres.full
> +disable_verity
> +cat_victim
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/xfs/1881.out b/tests/xfs/1881.out
> new file mode 100644
> index 0000000000..3e94b8001e
> --- /dev/null
> +++ b/tests/xfs/1881.out
> @@ -0,0 +1,28 @@
> +QA output created by 1881
> +Part 1: Delete the fsverity descriptor
> +sha256:bab5cfebae30d53e4318629d4ba0b4760d6aae38e03ae235741ed69a31873f1f SCRATCH_MNT/a
> +SCRATCH_MNT/a: Invalid argument
> +Part 2: Disable fsverity, which won't work
> +SCRATCH_MNT/a: Invalid argument
> +SCRATCH_MNT/a: Invalid argument
> +Part 3: Corrupt the fsverity descriptor
> +sha256:bab5cfebae30d53e4318629d4ba0b4760d6aae38e03ae235741ed69a31873f1f SCRATCH_MNT/a
> +SCRATCH_MNT/a: Invalid argument
> +Part 4: Disable fsverity, which won't work
> +SCRATCH_MNT/a: Invalid argument
> +SCRATCH_MNT/a: Invalid argument
> +Part 5: Corrupt the fsverity file data
> +sha256:bab5cfebae30d53e4318629d4ba0b4760d6aae38e03ae235741ed69a31873f1f SCRATCH_MNT/a
> +pread: Input/output error
> +Part 6: Disable fsverity, which should work
> +pread: Input/output error
> +Part 7: Corrupt a merkle tree block
> +sha256:c56f1115966bafa6c9d32b4717f554b304161f33923c9292c7a92a27866a853c SCRATCH_MNT/a
> +pread: Input/output error
> +Part 8: Disable fsverity, which should work
> +pread: Input/output error
> +Part 9: Corrupt the fsverity salt
> +sha256:bab5cfebae30d53e4318629d4ba0b4760d6aae38e03ae235741ed69a31873f1f SCRATCH_MNT/a
> +pread: Input/output error
> +Part 10: Disable fsverity, which should work
> +pread: Input/output error
> 

-- 
- Andrey


* Re: [PATCH 6/6] common/populate: add verity files to populate xfs images
  2024-04-30  3:42   ` [PATCH 6/6] common/populate: add verity files to populate xfs images Darrick J. Wong
@ 2024-04-30 13:22     ` Andrey Albershteyn
  2024-04-30 15:49       ` Darrick J. Wong
  0 siblings, 1 reply; 159+ messages in thread
From: Andrey Albershteyn @ 2024-04-30 13:22 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: zlang, ebiggers, fsverity, linux-fsdevel, guan, linux-xfs, fstests

On 2024-04-29 20:42:21, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> If verity is enabled on a filesystem, we should create some sample
> verity files.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  common/populate |   24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> 
> diff --git a/common/populate b/common/populate
> index 35071f4210..ab9495e739 100644
> --- a/common/populate
> +++ b/common/populate
> @@ -520,6 +520,30 @@ _scratch_xfs_populate() {
>  		done
>  	fi
>  
> +	# verity merkle trees
> +	is_verity="$(_xfs_has_feature "$SCRATCH_MNT" verity -v)"
> +	if [ $is_verity -gt 0 ]; then
> +		echo "+ fsverity"
> +
> +		# Create a biggish file with all zeroes, because metadump
> +		# won't preserve data blocks and we don't want the hashes to
> +		# stop working for our sample fs.

Hashes of the data blocks in the merkle tree? All zeros in order to use
.zero_digest in fs-verity? Not sure I got this comment right.

> +		for ((pos = 0, i = 88; pos < 23456789; pos += 234567, i++)); do
> +			$XFS_IO_PROG -f -c "pwrite -S 0 $pos 234567" "$SCRATCH_MNT/verity"
> +		done
> +
> +		fsverity enable "$SCRATCH_MNT/verity"
> +
> +		# Create a sparse file
> +		$XFS_IO_PROG -f -c "pwrite -S 0 0 3" -c "pwrite -S 0 23456789 3" "$SCRATCH_MNT/sparse_verity"
> +		fsverity enable "$SCRATCH_MNT/sparse_verity"
> +
> +		# Create a salted sparse file
> +		$XFS_IO_PROG -f -c "pwrite -S 0 0 3" -c "pwrite -S 0 23456789 3" "$SCRATCH_MNT/salted_verity"
> +		local salt="5846532066696e616c6c7920686173206461746120636865636b73756d732121"	# XFS finally has data checksums!!
> +		fsverity enable --salt="$salt" "$SCRATCH_MNT/salted_verity"
> +	fi
> +
>  	# Copy some real files (xfs tests, I guess...)
>  	echo "+ real files"
>  	test $fill -ne 0 && __populate_fill_fs "${SCRATCH_MNT}" 5
> 

-- 
- Andrey


* Re: [PATCH 1/6] common/verity: enable fsverity for XFS
  2024-04-30 12:39     ` Andrey Albershteyn
@ 2024-04-30 15:35       ` Darrick J. Wong
  0 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30 15:35 UTC (permalink / raw)
  To: Andrey Albershteyn
  Cc: zlang, ebiggers, Andrey Albershteyn, fsverity, linux-fsdevel,
	guan, linux-xfs, fstests

On Tue, Apr 30, 2024 at 02:39:04PM +0200, Andrey Albershteyn wrote:
> On 2024-04-29 20:41:03, Darrick J. Wong wrote:
> > From: Andrey Albershteyn <aalbersh@redhat.com>
> > 
> > XFS supports verity and can be enabled for -g verity group.
> > 
> > Signed-off-by: Andrey Albershteyn <andrey.albershteyn@gmail.com>
> > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  common/verity |   39 +++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 37 insertions(+), 2 deletions(-)
> > 
> > 
> > diff --git a/common/verity b/common/verity
> > index 59b67e1201..20408c8c0e 100644
> > --- a/common/verity
> > +++ b/common/verity
> > @@ -43,7 +43,16 @@ _require_scratch_verity()
> >  
> >  	# The filesystem may be aware of fs-verity but have it disabled by
> >  	# CONFIG_FS_VERITY=n.  Detect support via sysfs.
> > -	if [ ! -e /sys/fs/$fstyp/features/verity ]; then
> > +	case $FSTYP in
> > +	xfs)
> > +		_scratch_unmount
> > +		_check_scratch_xfs_features VERITY &>>$seqres.full
> > +		_scratch_mount
> > +	;;
> > +	*)
> > +		test -e /sys/fs/$fstyp/features/verity
> > +	esac
> > +	if [ ! $? ]; then
> >  		_notrun "kernel $fstyp isn't configured with verity support"
> >  	fi
> >  
> > @@ -201,6 +210,9 @@ _scratch_mkfs_verity()
> >  	ext4|f2fs)
> >  		_scratch_mkfs -O verity
> >  		;;
> > +	xfs)
> > +		_scratch_mkfs -i verity
> > +		;;
> >  	btrfs)
> >  		_scratch_mkfs
> >  		;;
> > @@ -334,12 +346,19 @@ _fsv_scratch_corrupt_bytes()
> >  	local lstart lend pstart pend
> >  	local dd_cmds=()
> >  	local cmd
> > +	local device=$SCRATCH_DEV
> >  
> >  	sync	# Sync to avoid unwritten extents
> >  
> >  	cat > $tmp.bytes
> >  	local end=$(( offset + $(_get_filesize $tmp.bytes ) ))
> >  
> > +	# If this is an xfs realtime file, switch @device to the rt device
> > +	if [ $FSTYP = "xfs" ]; then
> > +		$XFS_IO_PROG -r -c 'stat -v' "$file" | grep -q -w realtime && \
> > +			device=$SCRATCH_RTDEV
> > +	fi
> > +
> >  	# For each extent that intersects the requested range in order, add a
> >  	# command that writes the next part of the data to that extent.
> >  	while read -r lstart lend pstart pend; do
> > @@ -355,7 +374,7 @@ _fsv_scratch_corrupt_bytes()
> >  		elif (( offset < lend )); then
> >  			local len=$((lend - offset))
> >  			local seek=$((pstart + (offset - lstart)))
> > -			dd_cmds+=("head -c $len | dd of=$SCRATCH_DEV oflag=seek_bytes seek=$seek status=none")
> > +			dd_cmds+=("head -c $len | dd of=$device oflag=seek_bytes seek=$seek status=none")
> >  			(( offset += len ))
> >  		fi
> >  	done < <($XFS_IO_PROG -r -c "fiemap $offset $((end - offset))" "$file" \
> > @@ -408,6 +427,22 @@ _fsv_scratch_corrupt_merkle_tree()
> >  		done
> >  		_scratch_mount
> >  		;;
> > +	xfs)
> > +		local ino=$(stat -c '%i' $file)
> 
> I didn't know about xfs_db's "path" command; this can probably be
> replaced with -c "path $file", below in _scratch_xfs_db.

You /can/ use the xfs_db path command here, but then you have to strip
out $SCRATCH_MNT from $file since it of course doesn't know about mount
points.  Since $file is a file path, we might as well use stat to find
the inumber.
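
Purely for illustration, a rough sketch of the prefix stripping that the
"path" approach would need.  The helper name and the sample mount point are
hypothetical; nothing like this is part of the patch.

```shell
# Hypothetical helper: turn a path under a mount point into the
# filesystem-relative path that xfs_db's "path" command would need.
_strip_mnt_prefix()
{
	local mnt="${1%/}"	# drop a trailing slash, if any
	local file="$2"
	local rel

	case "$file" in
	"$mnt"/*)
		rel=${file#"$mnt"}
		echo "$rel"
		;;
	*)
		return 1	# not under the mount point
		;;
	esac
}

# e.g. with SCRATCH_MNT=/mnt/scratch, this yields "/a", which could then
# be passed as -c "path /a":
#   _strip_mnt_prefix "$SCRATCH_MNT" "$SCRATCH_MNT/a"
```

Using stat to get the inode number avoids this extra step entirely.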

> > +		local attr_offset=$(( $offset % $FSV_BLOCK_SIZE ))
> > +		local attr_index=$(printf "%08d" $(( offset - attr_offset )))
> > +		_scratch_unmount
> > +		# Attribute name is 8 bytes long (byte position of Merkle tree block)
> > +		_scratch_xfs_db -x -c "inode $ino" \
>                                 here   ^^^^^^^^^^
> > +			-c "attr_modify -f -m 8 -o $attr_offset $attr_index \"BUG\"" \
> > +			-c "ablock 0" -c "print" \
> > +			>>$seqres.full
> > +		# In case bsize == 4096 and merkle block size == 1024, by
> > +		# modifying attribute with 'attr_modify we can corrupt quota
> > +		# account. Let's repair it
> > +		_scratch_xfs_repair >> $seqres.full 2>&1
> > +		_scratch_mount
> > +		;;
> >  	*)
> >  		_fail "_fsv_scratch_corrupt_merkle_tree() unimplemented on $FSTYP"
> >  		;;
> > 
> > 
> 
> Otherwise, looks good to me:
> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>

<nod>

--D

> -- 
> - Andrey
> 
> 

* Re: [PATCH 2/6] xfs/{021,122}: adapt to fsverity xattrs
  2024-04-30 12:46     ` Andrey Albershteyn
@ 2024-04-30 15:36       ` Darrick J. Wong
  0 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30 15:36 UTC (permalink / raw)
  To: Andrey Albershteyn
  Cc: zlang, ebiggers, fsverity, linux-fsdevel, guan, linux-xfs, fstests

On Tue, Apr 30, 2024 at 02:46:18PM +0200, Andrey Albershteyn wrote:
> On 2024-04-29 20:41:19, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Adjust these tests to accomdate the use of xattrs to store fsverity
> > metadata.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  tests/xfs/021     |    3 +++
> >  tests/xfs/122.out |    1 +
> >  2 files changed, 4 insertions(+)
> > 
> > 
> > diff --git a/tests/xfs/021 b/tests/xfs/021
> > index ef307fc064..dcecf41958 100755
> > --- a/tests/xfs/021
> > +++ b/tests/xfs/021
> > @@ -118,6 +118,7 @@ _scratch_xfs_db -r -c "inode $inum_1" -c "print a.sfattr"  | \
> >  	perl -ne '
> >  /\.secure/ && next;
> >  /\.parent/ && next;
> > +/\.verity/ && next;
> >  	print unless /^\d+:\[.*/;'
> >  
> >  echo "*** dump attributes (2)"
> > @@ -128,6 +129,7 @@ _scratch_xfs_db -r -c "inode $inum_2" -c "a a.bmx[0].startblock" -c print  \
> >  	| perl -ne '
> >  s/,secure//;
> >  s/,parent//;
> > +s/,verity//;
> >  s/info.hdr/info/;
> >  /hdr.info.crc/ && next;
> >  /hdr.info.bno/ && next;
> > @@ -135,6 +137,7 @@ s/info.hdr/info/;
> >  /hdr.info.lsn/ && next;
> >  /hdr.info.owner/ && next;
> >  /\.parent/ && next;
> > +/\.verity/ && next;
> >  s/^(hdr.info.magic =) 0x3bee/\1 0xfbee/;
> >  s/^(hdr.firstused =) (\d+)/\1 FIRSTUSED/;
> >  s/^(hdr.freemap\[0-2] = \[base,size]).*/\1 [FREEMAP..]/;
> > diff --git a/tests/xfs/122.out b/tests/xfs/122.out
> > index abd82e7142..019fe7545f 100644
> > --- a/tests/xfs/122.out
> > +++ b/tests/xfs/122.out
> > @@ -142,6 +142,7 @@ sizeof(struct xfs_scrub_vec) = 16
> >  sizeof(struct xfs_scrub_vec_head) = 40
> >  sizeof(struct xfs_swap_extent) = 64
> >  sizeof(struct xfs_unmount_log_format) = 8
> > +sizeof(struct xfs_verity_merkle_key) = 8

Whoops, this change isn't needed anymore.

--D

> >  sizeof(struct xfs_xmd_log_format) = 16
> >  sizeof(struct xfs_xmi_log_format) = 88
> >  sizeof(union xfs_rtword_raw) = 4
> > 
> 
> Looks good to me:
> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
> 
> -- 
> - Andrey
> 
> 

* Re: [PATCH 3/6] xfs/122: adapt to fsverity
  2024-04-30 12:45     ` Andrey Albershteyn
@ 2024-04-30 15:37       ` Darrick J. Wong
  0 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30 15:37 UTC (permalink / raw)
  To: Andrey Albershteyn
  Cc: zlang, ebiggers, fsverity, linux-fsdevel, guan, linux-xfs, fstests

On Tue, Apr 30, 2024 at 02:45:29PM +0200, Andrey Albershteyn wrote:
> On 2024-04-29 20:41:34, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Add fields for fsverity ondisk structures.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  tests/xfs/122.out |    2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > 
> > diff --git a/tests/xfs/122.out b/tests/xfs/122.out
> > index 019fe7545f..22f36c0311 100644
> > --- a/tests/xfs/122.out
> > +++ b/tests/xfs/122.out
> > @@ -65,6 +65,7 @@ sizeof(struct xfs_agfl) = 36
> >  sizeof(struct xfs_attr3_leaf_hdr) = 80
> >  sizeof(struct xfs_attr3_leafblock) = 88
> >  sizeof(struct xfs_attr3_rmt_hdr) = 56
> > +sizeof(struct xfs_attr3_rmtverity_hdr) = 36
> >  sizeof(struct xfs_attr_sf_entry) = 3
> >  sizeof(struct xfs_attr_sf_hdr) = 4
> >  sizeof(struct xfs_attr_shortform) = 8
> > @@ -120,6 +121,7 @@ sizeof(struct xfs_log_dinode) = 176
> >  sizeof(struct xfs_log_legacy_timestamp) = 8
> >  sizeof(struct xfs_map_extent) = 32
> >  sizeof(struct xfs_map_freesp) = 32
> > +sizeof(struct xfs_merkle_key) = 8
> >  sizeof(struct xfs_parent_rec) = 12
> >  sizeof(struct xfs_phys_extent) = 16
> >  sizeof(struct xfs_refcount_key) = 4
> > 
> > 
> 
> Shouldn't this patch be squashed with previous one?

Actually, the 122.out change in the previous patch is now wrong and can
go away.  These two changes are still relevant though.

--D

> -- 
> - Andrey
> 
> 

* Re: [PATCH 4/6] xfs: test xfs_scrub detection and correction of corrupt fsverity metadata
  2024-04-30 12:29     ` Andrey Albershteyn
@ 2024-04-30 15:43       ` Darrick J. Wong
  0 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30 15:43 UTC (permalink / raw)
  To: Andrey Albershteyn
  Cc: zlang, ebiggers, fsverity, linux-fsdevel, guan, linux-xfs, fstests

On Tue, Apr 30, 2024 at 02:29:03PM +0200, Andrey Albershteyn wrote:
> On 2024-04-29 20:41:50, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Create a basic test to ensure that xfs_scrub media scans complain about
> > files that don't pass fsverity validation.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  tests/xfs/1880     |  135 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  tests/xfs/1880.out |   37 ++++++++++++++
> >  2 files changed, 172 insertions(+)
> >  create mode 100755 tests/xfs/1880
> >  create mode 100644 tests/xfs/1880.out
> > 
> > 
> > diff --git a/tests/xfs/1880 b/tests/xfs/1880
> > new file mode 100755
> > index 0000000000..a2119f04c2
> > --- /dev/null
> > +++ b/tests/xfs/1880
> > @@ -0,0 +1,135 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2024 Oracle.  All Rights Reserved.
> > +#
> > +# FS QA Test 1880
> > +#
> > +# Corrupt fsverity descriptor, merkle tree blocks, and file contents.  Ensure
> > +# that xfs_scrub detects this and repairs whatever it can.
> > +#
> > +. ./common/preamble
> > +_begin_fstest auto quick verity
> > +
> > +_cleanup()
> > +{
> > +	cd /
> > +	_restore_fsverity_signatures
> > +	rm -f $tmp.*
> > +}
> > +
> > +. ./common/verity
> > +. ./common/filter
> > +. ./common/fuzzy
> > +
> > +_supported_fs xfs
> > +_require_scratch_verity
> > +_disable_fsverity_signatures
> > +_require_fsverity_corruption
> > +_require_scratch_nocheck	# fsck test
> > +
> > +_scratch_mkfs >> $seqres.full
> > +_scratch_mount
> > +
> > +_require_scratch_xfs_scrub
> > +_require_xfs_has_feature "$SCRATCH_MNT" verity
> > +VICTIM_FILE="$SCRATCH_MNT/a"
> > +_fsv_can_enable "$VICTIM_FILE" || _notrun "cannot enable fsverity"
> 
> I think this is not necessary, _require_scratch_verity already does
> check if verity can be enabled (with more detailed errors).

It is needed, because _require_scratch_verity calls _scratch_mkfs_verity
to format the filesystem.  _scratch_mkfs_verity in turn forces verity on,
possibly overriding MKFS_OPTIONS to make it happen.  -iverity=1 might
not be set for a regular _scratch_mkfs call.

Therefore, this second _fsv_can_enable call checks that the test
runner's MKFS_OPTIONS set actually supports fsverity.

I'll leave a comment summarizing this:

# Check again to confirm that the caller's MKFS_OPTIONS result in a filesystem
# that supports fsverity.
_fsv_can_enable "$VICTIM_FILE" || _notrun "cannot enable fsverity"

--D

> Otherwise, looks good to me:
> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
> 
> -- 
> - Andrey
> 
> 

* Re: [PATCH 5/6] xfs: test disabling fsverity
  2024-04-30 13:11     ` Andrey Albershteyn
@ 2024-04-30 15:48       ` Darrick J. Wong
  2024-04-30 18:06         ` Andrey Albershteyn
  0 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30 15:48 UTC (permalink / raw)
  To: Andrey Albershteyn
  Cc: zlang, ebiggers, fsverity, linux-fsdevel, guan, linux-xfs, fstests

On Tue, Apr 30, 2024 at 03:11:11PM +0200, Andrey Albershteyn wrote:
> On 2024-04-29 20:42:05, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Add a test to make sure that we can disable fsverity on a file that
> > doesn't pass fsverity validation on its contents anymore.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  tests/xfs/1881     |  111 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  tests/xfs/1881.out |   28 +++++++++++++
> >  2 files changed, 139 insertions(+)
> >  create mode 100755 tests/xfs/1881
> >  create mode 100644 tests/xfs/1881.out
> > 
> > 
> > diff --git a/tests/xfs/1881 b/tests/xfs/1881
> > new file mode 100755
> > index 0000000000..411802d7c7
> > --- /dev/null
> > +++ b/tests/xfs/1881
> > @@ -0,0 +1,111 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2024 Oracle.  All Rights Reserved.
> > +#
> > +# FS QA Test 1881
> > +#
> > +# Corrupt fsverity descriptor, merkle tree blocks, and file contents.  Ensure
> > +# that we can still disable fsverity, at least for the latter cases.
> > +#
> > +. ./common/preamble
> > +_begin_fstest auto quick verity
> > +
> > +_cleanup()
> > +{
> > +	cd /
> > +	_restore_fsverity_signatures
> > +	rm -f $tmp.*
> > +}
> > +
> > +. ./common/verity
> > +. ./common/filter
> > +. ./common/fuzzy
> > +
> > +_supported_fs xfs
> > +_require_scratch_verity
> > +_disable_fsverity_signatures
> > +_require_fsverity_corruption
> > +_require_xfs_io_command noverity
> > +_require_scratch_nocheck	# corruption test
> > +
> > +_scratch_mkfs >> $seqres.full
> > +_scratch_mount
> > +
> > +_require_xfs_has_feature "$SCRATCH_MNT" verity
> > +VICTIM_FILE="$SCRATCH_MNT/a"
> > +_fsv_can_enable "$VICTIM_FILE" || _notrun "cannot enable fsverity"
> > +
> > +create_victim()
> > +{
> > +	local filesize="${1:-3}"
> > +
> > +	rm -f "$VICTIM_FILE"
> > +	perl -e "print 'moo' x $((filesize / 3))" > "$VICTIM_FILE"
> > +	fsverity enable --hash-alg=sha256 --block-size=1024 "$VICTIM_FILE"
> > +	fsverity measure "$VICTIM_FILE" | _filter_scratch
> > +}
> > +
> > +disable_verity() {
> > +	$XFS_IO_PROG -r -c 'noverity' "$VICTIM_FILE" 2>&1 | _filter_scratch
> > +}
> > +
> > +cat_victim() {
> > +	$XFS_IO_PROG -r -c 'pread -q 0 4096' "$VICTIM_FILE" 2>&1 | _filter_scratch
> > +}
> > +
> > +echo "Part 1: Delete the fsverity descriptor" | tee -a $seqres.full
> > +create_victim
> > +_scratch_unmount
> > +_scratch_xfs_db -x -c "path /a" -c "attr_remove -f vdesc" -c 'ablock 0' -c print >> $seqres.full
> > +_scratch_mount
> > +cat_victim
> > +
> > +echo "Part 2: Disable fsverity, which won't work" | tee -a $seqres.full
> > +disable_verity
> > +cat_victim
> > +
> > +echo "Part 3: Corrupt the fsverity descriptor" | tee -a $seqres.full
> > +create_victim
> > +_scratch_unmount
> > +_scratch_xfs_db -x -c "path /a" -c 'attr_modify -f "vdesc" -o 0 "BUGSAHOY"' -c 'ablock 0' -c print >> $seqres.full
> > +_scratch_mount
> > +cat_victim
> > +
> > +echo "Part 4: Disable fsverity, which won't work" | tee -a $seqres.full
> > +disable_verity
> > +cat_victim
> > +
> > +echo "Part 5: Corrupt the fsverity file data" | tee -a $seqres.full
> > +create_victim
> > +_scratch_unmount
> > +_scratch_xfs_db -x -c "path /a" -c 'dblock 0' -c 'blocktrash -3 -o 0 -x 24 -y 24 -z' -c print >> $seqres.full
> > +_scratch_mount
> > +cat_victim
> > +
> > +echo "Part 6: Disable fsverity, which should work" | tee -a $seqres.full
> > +disable_verity
> > +cat_victim
> > +
> > +echo "Part 7: Corrupt a merkle tree block" | tee -a $seqres.full
> > +create_victim 1234 # two merkle tree blocks
> > +_fsv_scratch_corrupt_merkle_tree "$VICTIM_FILE" 0
> 
> hmm, _fsv_scratch_corrupt_merkle_tree calls _scratch_xfs_repair, and
> now with xfs_repair knowing about fs-verity is probably a problem. I

It shouldn't be -- xfs_repair doesn't check the contents of the merkle
tree itself.

(xfs_scrub sort of does, but only by calling out to the kernel fsverity
code to get rough tree geometry and calling MADV_POPULATE_READ to
exercise the read validation.)

> don't remember what was the problem with quota (why xfs_repair is
> there), I can check it.

If the attr_modify commandline changes the block count of the file, it
won't update the quota accounting information.  That can happen if the
dabtree changes shape, or if the new attr requires the creation of a new
attr leaf block, or if the remote value block count changes due to
changes in the size of the attr value.
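
As a rough illustration of the last case only: assuming a V5 filesystem
where each remote value block gives up 56 bytes to a header (the
sizeof(struct xfs_attr3_rmt_hdr) that xfs/122.out reports), the remote block
count is the value length divided by the usable space, rounded up.  The
function name is made up for this sketch, and the header size may well
differ for verity remote values.

```shell
# Hypothetical helper: estimate how many fs blocks a remote xattr value
# occupies.  Assumes every remote block carries a 56-byte header, i.e.
# sizeof(struct xfs_attr3_rmt_hdr) on a V5 filesystem.
rmt_value_blocks()
{
	local valuelen="$1"
	local blocksize="${2:-4096}"
	local usable=$((blocksize - 56))

	# round up, like howmany(valuelen, usable)
	echo $(( (valuelen + usable - 1) / usable ))
}

rmt_value_blocks 4040	# exactly fills one 4k block
rmt_value_blocks 4041	# one byte more spills into a second block
```

Growing a value by a single byte across that boundary changes the file's
block count without the quota counters following along.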

--D

> > +cat_victim
> > +
> > +echo "Part 8: Disable fsverity, which should work" | tee -a $seqres.full
> > +disable_verity
> > +cat_victim
> > +
> > +echo "Part 9: Corrupt the fsverity salt" | tee -a $seqres.full
> > +create_victim
> > +_scratch_unmount
> > +_scratch_xfs_db -x -c "path /a" -c 'attr_modify -f "vdesc" -o 3 #08' -c 'attr_modify -f "vdesc" -o 80 "BUGSAHOY"' -c 'ablock 0' -c print >> $seqres.full
> > +_scratch_mount
> > +cat_victim
> > +
> > +echo "Part 10: Disable fsverity, which should work" | tee -a $seqres.full
> > +disable_verity
> > +cat_victim
> > +
> > +# success, all done
> > +status=0
> > +exit
> > diff --git a/tests/xfs/1881.out b/tests/xfs/1881.out
> > new file mode 100644
> > index 0000000000..3e94b8001e
> > --- /dev/null
> > +++ b/tests/xfs/1881.out
> > @@ -0,0 +1,28 @@
> > +QA output created by 1881
> > +Part 1: Delete the fsverity descriptor
> > +sha256:bab5cfebae30d53e4318629d4ba0b4760d6aae38e03ae235741ed69a31873f1f SCRATCH_MNT/a
> > +SCRATCH_MNT/a: Invalid argument
> > +Part 2: Disable fsverity, which won't work
> > +SCRATCH_MNT/a: Invalid argument
> > +SCRATCH_MNT/a: Invalid argument
> > +Part 3: Corrupt the fsverity descriptor
> > +sha256:bab5cfebae30d53e4318629d4ba0b4760d6aae38e03ae235741ed69a31873f1f SCRATCH_MNT/a
> > +SCRATCH_MNT/a: Invalid argument
> > +Part 4: Disable fsverity, which won't work
> > +SCRATCH_MNT/a: Invalid argument
> > +SCRATCH_MNT/a: Invalid argument
> > +Part 5: Corrupt the fsverity file data
> > +sha256:bab5cfebae30d53e4318629d4ba0b4760d6aae38e03ae235741ed69a31873f1f SCRATCH_MNT/a
> > +pread: Input/output error
> > +Part 6: Disable fsverity, which should work
> > +pread: Input/output error
> > +Part 7: Corrupt a merkle tree block
> > +sha256:c56f1115966bafa6c9d32b4717f554b304161f33923c9292c7a92a27866a853c SCRATCH_MNT/a
> > +pread: Input/output error
> > +Part 8: Disable fsverity, which should work
> > +pread: Input/output error
> > +Part 9: Corrupt the fsverity salt
> > +sha256:bab5cfebae30d53e4318629d4ba0b4760d6aae38e03ae235741ed69a31873f1f SCRATCH_MNT/a
> > +pread: Input/output error
> > +Part 10: Disable fsverity, which should work
> > +pread: Input/output error
> > 
> 
> -- 
> - Andrey
> 
> 

* Re: [PATCH 6/6] common/populate: add verity files to populate xfs images
  2024-04-30 13:22     ` Andrey Albershteyn
@ 2024-04-30 15:49       ` Darrick J. Wong
  0 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-04-30 15:49 UTC (permalink / raw)
  To: Andrey Albershteyn
  Cc: zlang, ebiggers, fsverity, linux-fsdevel, guan, linux-xfs, fstests

On Tue, Apr 30, 2024 at 03:22:50PM +0200, Andrey Albershteyn wrote:
> On 2024-04-29 20:42:21, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > If verity is enabled on a filesystem, we should create some sample
> > verity files.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  common/populate |   24 ++++++++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> > 
> > 
> > diff --git a/common/populate b/common/populate
> > index 35071f4210..ab9495e739 100644
> > --- a/common/populate
> > +++ b/common/populate
> > @@ -520,6 +520,30 @@ _scratch_xfs_populate() {
> >  		done
> >  	fi
> >  
> > +	# verity merkle trees
> > +	is_verity="$(_xfs_has_feature "$SCRATCH_MNT" verity -v)"
> > +	if [ $is_verity -gt 0 ]; then
> > +		echo "+ fsverity"
> > +
> > +		# Create a biggish file with all zeroes, because metadump
> > +		# won't preserve data blocks and we don't want the hashes to
> > +		# stop working for our sample fs.
> 
> Hashes of the data blocks in the merkle tree? All zeros to use
> .zero_digest in fs-verity? Not sure if got this comment right

Oooh, yeah, I need to go check that.  The block elision code might be
neutralizing this.
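
For context, a small sketch of why all-zero data looks attractive here:
every all-zero block of a given size hashes to the same digest, so a data
block that reads back as zeroes would still match the hash recorded for it.
This assumes an unsalted SHA-256 tree and uses sha256sum from coreutils; the
function name is made up, and it says nothing about what the block elision
code does with such blocks.

```shell
# Hypothetical helper: print the SHA-256 digest of one all-zero block.
zero_block_digest()
{
	local blocksize="${1:-4096}"

	head -c "$blocksize" /dev/zero | sha256sum | cut -d ' ' -f 1
}

# Two independent zero blocks always produce the same digest, whereas
# data with different contents does not.
a="$(zero_block_digest)"
b="$(zero_block_digest)"
c="$(printf 'moo' | sha256sum | cut -d ' ' -f 1)"
test "$a" = "$b" && test "$a" != "$c" && echo "zero blocks hash alike"
```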

--D

> > +		for ((pos = 0, i = 88; pos < 23456789; pos += 234567, i++)); do
> > +			$XFS_IO_PROG -f -c "pwrite -S 0 $pos 234567" "$SCRATCH_MNT/verity"
> > +		done
> > +
> > +		fsverity enable "$SCRATCH_MNT/verity"
> > +
> > +		# Create a sparse file
> > +		$XFS_IO_PROG -f -c "pwrite -S 0 0 3" -c "pwrite -S 0 23456789 3" "$SCRATCH_MNT/sparse_verity"
> > +		fsverity enable "$SCRATCH_MNT/sparse_verity"
> > +
> > +		# Create a salted sparse file
> > +		$XFS_IO_PROG -f -c "pwrite -S 0 0 3" -c "pwrite -S 0 23456789 3" "$SCRATCH_MNT/salted_verity"
> > +		local salt="5846532066696e616c6c7920686173206461746120636865636b73756d732121"	# XFS finally has data checksums!!
> > +		fsverity enable --salt="$salt" "$SCRATCH_MNT/salted_verity"
> > +	fi
> > +
> >  	# Copy some real files (xfs tests, I guess...)
> >  	echo "+ real files"
> >  	test $fill -ne 0 && __populate_fill_fs "${SCRATCH_MNT}" 5
> > 
> 
> -- 
> - Andrey
> 
> 

* Re: [PATCH 5/6] xfs: test disabling fsverity
  2024-04-30 15:48       ` Darrick J. Wong
@ 2024-04-30 18:06         ` Andrey Albershteyn
  0 siblings, 0 replies; 159+ messages in thread
From: Andrey Albershteyn @ 2024-04-30 18:06 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: zlang, ebiggers, fsverity, linux-fsdevel, guan, linux-xfs, fstests

On 2024-04-30 08:48:10, Darrick J. Wong wrote:
> On Tue, Apr 30, 2024 at 03:11:11PM +0200, Andrey Albershteyn wrote:
> > On 2024-04-29 20:42:05, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Add a test to make sure that we can disable fsverity on a file that
> > > doesn't pass fsverity validation on its contents anymore.
> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > ---
> > >  tests/xfs/1881     |  111 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  tests/xfs/1881.out |   28 +++++++++++++
> > >  2 files changed, 139 insertions(+)
> > >  create mode 100755 tests/xfs/1881
> > >  create mode 100644 tests/xfs/1881.out
> > > 
> > > 
> > > diff --git a/tests/xfs/1881 b/tests/xfs/1881
> > > new file mode 100755
> > > index 0000000000..411802d7c7
> > > --- /dev/null
> > > +++ b/tests/xfs/1881
> > > @@ -0,0 +1,111 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (c) 2024 Oracle.  All Rights Reserved.
> > > +#
> > > +# FS QA Test 1881
> > > +#
> > > +# Corrupt fsverity descriptor, merkle tree blocks, and file contents.  Ensure
> > > +# that we can still disable fsverity, at least for the latter cases.
> > > +#
> > > +. ./common/preamble
> > > +_begin_fstest auto quick verity
> > > +
> > > +_cleanup()
> > > +{
> > > +	cd /
> > > +	_restore_fsverity_signatures
> > > +	rm -f $tmp.*
> > > +}
> > > +
> > > +. ./common/verity
> > > +. ./common/filter
> > > +. ./common/fuzzy
> > > +
> > > +_supported_fs xfs
> > > +_require_scratch_verity
> > > +_disable_fsverity_signatures
> > > +_require_fsverity_corruption
> > > +_require_xfs_io_command noverity
> > > +_require_scratch_nocheck	# corruption test
> > > +
> > > +_scratch_mkfs >> $seqres.full
> > > +_scratch_mount
> > > +
> > > +_require_xfs_has_feature "$SCRATCH_MNT" verity
> > > +VICTIM_FILE="$SCRATCH_MNT/a"
> > > +_fsv_can_enable "$VICTIM_FILE" || _notrun "cannot enable fsverity"
> > > +
> > > +create_victim()
> > > +{
> > > +	local filesize="${1:-3}"
> > > +
> > > +	rm -f "$VICTIM_FILE"
> > > +	perl -e "print 'moo' x $((filesize / 3))" > "$VICTIM_FILE"
> > > +	fsverity enable --hash-alg=sha256 --block-size=1024 "$VICTIM_FILE"
> > > +	fsverity measure "$VICTIM_FILE" | _filter_scratch
> > > +}
> > > +
> > > +disable_verity() {
> > > +	$XFS_IO_PROG -r -c 'noverity' "$VICTIM_FILE" 2>&1 | _filter_scratch
> > > +}
> > > +
> > > +cat_victim() {
> > > +	$XFS_IO_PROG -r -c 'pread -q 0 4096' "$VICTIM_FILE" 2>&1 | _filter_scratch
> > > +}
> > > +
> > > +echo "Part 1: Delete the fsverity descriptor" | tee -a $seqres.full
> > > +create_victim
> > > +_scratch_unmount
> > > +_scratch_xfs_db -x -c "path /a" -c "attr_remove -f vdesc" -c 'ablock 0' -c print >> $seqres.full
> > > +_scratch_mount
> > > +cat_victim
> > > +
> > > +echo "Part 2: Disable fsverity, which won't work" | tee -a $seqres.full
> > > +disable_verity
> > > +cat_victim
> > > +
> > > +echo "Part 3: Corrupt the fsverity descriptor" | tee -a $seqres.full
> > > +create_victim
> > > +_scratch_unmount
> > > +_scratch_xfs_db -x -c "path /a" -c 'attr_modify -f "vdesc" -o 0 "BUGSAHOY"' -c 'ablock 0' -c print >> $seqres.full
> > > +_scratch_mount
> > > +cat_victim
> > > +
> > > +echo "Part 4: Disable fsverity, which won't work" | tee -a $seqres.full
> > > +disable_verity
> > > +cat_victim
> > > +
> > > +echo "Part 5: Corrupt the fsverity file data" | tee -a $seqres.full
> > > +create_victim
> > > +_scratch_unmount
> > > +_scratch_xfs_db -x -c "path /a" -c 'dblock 0' -c 'blocktrash -3 -o 0 -x 24 -y 24 -z' -c print >> $seqres.full
> > > +_scratch_mount
> > > +cat_victim
> > > +
> > > +echo "Part 6: Disable fsverity, which should work" | tee -a $seqres.full
> > > +disable_verity
> > > +cat_victim
> > > +
> > > +echo "Part 7: Corrupt a merkle tree block" | tee -a $seqres.full
> > > +create_victim 1234 # two merkle tree blocks
> > > +_fsv_scratch_corrupt_merkle_tree "$VICTIM_FILE" 0
> > 
> > hmm, _fsv_scratch_corrupt_merkle_tree calls _scratch_xfs_repair, and
> > now with xfs_repair knowing about fs-verity is probably a problem. I
> 
> It shouldn't be -- xfs_repair doesn't check the contents of the merkle
> tree itself.
> 
> (xfs_scrub sort of does, but only by calling out to the kernel fsverity
> code to get rough tree geometry and calling MADV_POPULATE_READ to
> exercise the read validation.)

oh right, it's xfs_scrub, I meant re-reading file validation

> 
> > don't remember what was the problem with quota (why xfs_repair is
> > there), I can check it.
> 
> If the attr_modify commandline changes the block count of the file, it
> won't update the quota accounting information.  That can happen if the
> dabtree changes shape, or if the new attr requires the creation of a new
> attr leaf block, or if the remote value block count changes due to
> changes in the size of the attr value.

aha, yeah

-- 
- Andrey


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 19/26] xfs: don't bother storing merkle tree blocks for zeroed data blocks
  2024-04-30  3:29   ` [PATCH 19/26] xfs: don't bother storing merkle tree blocks for zeroed data blocks Darrick J. Wong
@ 2024-05-01  6:47     ` Christoph Hellwig
  2024-05-01 22:47       ` Darrick J. Wong
  0 siblings, 1 reply; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-01  6:47 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

On Mon, Apr 29, 2024 at 08:29:03PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Now that fsverity tells our merkle tree io functions about what a hash
> of a data block full of zeroes looks like, we can use this information
> to avoid writing out merkle tree blocks for sparse regions of the file.
> For verified gold master images this can save quite a bit of overhead.

Is this something that fsverity should be doing in a generic way?
It feels odd to have XFS behave different from everyone else here,
even if this does feel useful.  Do we also need any hash validation
that no one tampered with the metadata and added a new extent, or
is this out of scope for fsverity?


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 25/26] xfs: make it possible to disable fsverity
  2024-04-30  3:30   ` [PATCH 25/26] xfs: make it possible to disable fsverity Darrick J. Wong
@ 2024-05-01  6:48     ` Christoph Hellwig
  2024-05-01 22:50       ` Darrick J. Wong
  0 siblings, 1 reply; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-01  6:48 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

On Mon, Apr 29, 2024 at 08:30:37PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Create an experimental ioctl so that we can turn off fsverity.

Didn't Eric argue against this?  And if we're adding this, I think
it should be a generic feature and not just xfs specific.


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 18/26] xfs: use merkle tree offset as attr hash
  2024-04-30  3:28   ` [PATCH 18/26] xfs: use merkle tree offset as attr hash Darrick J. Wong
@ 2024-05-01  6:53     ` Christoph Hellwig
  2024-05-01  7:23       ` Christoph Hellwig
  0 siblings, 1 reply; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-01  6:53 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

On Mon, Apr 29, 2024 at 08:28:48PM -0700, Darrick J. Wong wrote:
> Within just this attr leaf block, there are 76 attr entries, but only 38
> distinct hash values.  There are 415 merkle tree blocks for this file,
> but we already have hash collisions.  This isn't good performance from
> the standard da hash function because we're mostly shifting and rolling
> zeroes around.
> 
> However, we don't even have to do that much work -- the merkle tree
> block keys are themselves u64 values.  Truncate that value to 32 bits
> (the size of xfs_dahash_t) and use that for the hash.  We won't have any
> collisions between merkle tree blocks until that tree grows to 2^32nd
> blocks.  On a 4k block filesystem, we won't hit that unless the file
> contains more than 2^49 bytes, assuming sha256.
> 
> As a side effect, the keys for merkle tree blocks get written out in
> roughly sequential order, though I didn't observe any change in
> performance.

This and the header hacks suggest to me that shoehorning the fsverity
blocks into attrs just feels like the wrong approach.

They don't really behave like attrs, they aren't key/value pairs that
are separate, but a large amount of same sized blocks with logical
indexing.  All that is actually nicely solved by the original fsverity
used by ext4/f2fs, while we have to pile workarounds on top of
workarounds to make attrs work.


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 13/26] xfs: widen flags argument to the xfs_iflags_* helpers
  2024-04-30  3:27   ` [PATCH 13/26] xfs: widen flags argument to the xfs_iflags_* helpers Darrick J. Wong
@ 2024-05-01  6:54     ` Christoph Hellwig
  2024-05-01 22:44       ` Darrick J. Wong
  0 siblings, 1 reply; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-01  6:54 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

On Mon, Apr 29, 2024 at 08:27:29PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> xfs_inode.i_flags is an unsigned long, so make these helpers take that
> as the flags argument instead of unsigned short.  This is needed for the
> next patch.
> 
> While we're at it, remove the iflags variable from xfs_iget_cache_miss
> because we no longer need it.

I just reinvented this for another flag in work in progress code.
Can we just get this included in the current for-next tree?


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 01/26] xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c
  2024-04-30  3:24   ` [PATCH 01/26] xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c Darrick J. Wong
@ 2024-05-01  6:55     ` Christoph Hellwig
  2024-05-01 22:39       ` Darrick J. Wong
  0 siblings, 1 reply; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-01  6:55 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

On Mon, Apr 29, 2024 at 08:24:22PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> In the next few patches we're going to refactor the attr remote code so
> that we can support headerless remote xattr values for storing merkle
> tree blocks.  For now, let's change the code to use unsigned int to
> describe quantities of bytes and blocks that cannot be negative.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

Can we please get this included ASAP instead of having it linger around?


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 02/26] xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function
  2024-04-30  3:24   ` [PATCH 02/26] xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function Darrick J. Wong
@ 2024-05-01  6:55     ` Christoph Hellwig
  0 siblings, 0 replies; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-01  6:55 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

On Mon, Apr 29, 2024 at 08:24:37PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Turn this into a properly typechecked function, and actually use the
> correct blocksize for extended attributes.  The function cannot be
> static inline because xfsprogs userspace uses it.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

please expedite this as well.

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 03/26] xfs: create a helper to compute the blockcount of a max sized remote value
  2024-04-30  3:24   ` [PATCH 03/26] xfs: create a helper to compute the blockcount of a max sized remote value Darrick J. Wong
@ 2024-05-01  6:56     ` Christoph Hellwig
  0 siblings, 0 replies; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-01  6:56 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

On Mon, Apr 29, 2024 at 08:24:53PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Create a helper function to compute the number of fsblocks needed to
> store a maximally-sized extended attribute value.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 04/26] xfs: minor cleanups of xfs_attr3_rmt_blocks
  2024-04-30  3:25   ` [PATCH 04/26] xfs: minor cleanups of xfs_attr3_rmt_blocks Darrick J. Wong
@ 2024-05-01  6:56     ` Christoph Hellwig
  0 siblings, 0 replies; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-01  6:56 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 05/26] xfs: use an empty transaction to protect xfs_attr_get from deadlocks
  2024-04-30  3:25   ` [PATCH 05/26] xfs: use an empty transaction to protect xfs_attr_get from deadlocks Darrick J. Wong
@ 2024-05-01  6:57     ` Christoph Hellwig
  2024-05-01 22:42       ` Darrick J. Wong
  0 siblings, 1 reply; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-01  6:57 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

> +	if (error)
> +		return error;
> +
>  	lock_mode = xfs_ilock_attr_map_shared(args->dp);
> +
> +        /*
> +	 * Make sure the attr fork iext tree is loaded.  Use the empty
> +	 * transaction to load the bmbt so that we avoid livelocking on loops.
> +	 */
> +        if (xfs_inode_hasattr(args->dp)) {
> +                error = xfs_iread_extents(args->trans, args->dp, XFS_ATTR_FORK);

Overly long line here.  But I'd expect the xfs_iread_extents to be in
xfs_attr_get_ilocked anyway instead of duplicated in the callers.


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 18/18] iomap: integrate fs-verity verification into iomap's read path
  2024-04-30  3:24   ` [PATCH 18/18] iomap: integrate fs-verity verification into iomap's read path Darrick J. Wong
@ 2024-05-01  7:10     ` Christoph Hellwig
  2024-05-01 22:37       ` Darrick J. Wong
  0 siblings, 1 reply; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-01  7:10 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: aalbersh, ebiggers, Christoph Hellwig, linux-xfs, alexl, walters,
	fsverity, linux-fsdevel

On Mon, Apr 29, 2024 at 08:24:06PM -0700, Darrick J. Wong wrote:
> From: Andrey Albershteyn <aalbersh@redhat.com>
> 
> This patch adds fs-verity verification into iomap's read path. After
> BIO's io operation is complete the data are verified against
> fs-verity's Merkle tree. Verification work is done in a separate
> workqueue.
> 
> The read path ioend iomap_read_ioend are stored side by side with
> BIOs if FS_VERITY is enabled.
> 
> Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Not sure where my signoff is coming from.  It looks pretty similar to
a patch I sent a long time ago, but apparently it's been modified enough
to drop my authorship, in which case my signoff should be dropped as
well.


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 18/26] xfs: use merkle tree offset as attr hash
  2024-05-01  6:53     ` Christoph Hellwig
@ 2024-05-01  7:23       ` Christoph Hellwig
  2024-05-07 21:24         ` Darrick J. Wong
  0 siblings, 1 reply; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-01  7:23 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

On Tue, Apr 30, 2024 at 11:53:00PM -0700, Christoph Hellwig wrote:
> This and the header hacks suggest to me that shoehorning the fsverity
> blocks into attrs just feels like the wrong approach.
> 
> They don't really behave like attrs, they aren't key/value pairs that
> are separate, but a large amount of same sized blocks with logical
> indexing.  All that is actually nicely solved by the original fsverity
> used by ext4/f2fs, while we have to pile workarounds on top of
> workarounds to make attrs work.

Taking this a bit further:  If we want to avoid the problems associated
with the original scheme, mostly the file size limitation, and the (IMHO
more cosmetic than real) confusion with post-EOF preallocations, we
can still store the data in the attr fork, but not in the traditional
attr format.  The attr fork provides the same logical index to physical
translation as the data fork, and while that is currently only used for
dabtree blocks and remote attr values, that isn't actually a fundamental
requirement for using it.

All the attr fork placement works through xfs_bmap_first_unused() to
find completely random free space in the logical address space.

Now if we reserved say the high bit for verity blocks in verity enabled
file systems we can simply use the bmap btree to do the mapping from
the verity index to the on-disk verity blocks without any other impact
to the attr code.
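
A rough sketch of the offset arithmetic this would imply, not code from
any posted series.  The bit position (VERITY_FILEOFF_BIT) and all helper
names are hypothetical, picked only to show that a merkle tree block
index can map one-to-one onto a reserved region of the attr fork's
logical address space:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: which bit of an attr fork file offset marks verity blocks. */
#define VERITY_FILEOFF_BIT	53
#define VERITY_FILEOFF_FLAG	(UINT64_C(1) << VERITY_FILEOFF_BIT)

/* Map a merkle tree block index to an attr fork logical block offset. */
static inline uint64_t verity_index_to_fileoff(uint64_t index)
{
	/* The index must fit below the reserved bit. */
	assert(index < VERITY_FILEOFF_FLAG);
	return VERITY_FILEOFF_FLAG | index;
}

/* Does this attr fork offset fall in the region reserved for verity? */
static inline bool fileoff_is_verity(uint64_t fileoff)
{
	return (fileoff & VERITY_FILEOFF_FLAG) != 0;
}

/* Recover the merkle tree block index from an attr fork offset. */
static inline uint64_t verity_fileoff_to_index(uint64_t fileoff)
{
	return fileoff & ~VERITY_FILEOFF_FLAG;
}
```

With a split like this, ordinary attr placement would have to stay below
the reserved bit, which is the part that would need care in practice.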

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 03/18] fsverity: convert verification to use byte instead of page offsets
  2024-04-30  3:20   ` [PATCH 03/18] fsverity: convert verification to use byte instead of page offsets Darrick J. Wong
@ 2024-05-01  7:33     ` Christoph Hellwig
  2024-05-01 22:33       ` Darrick J. Wong
  0 siblings, 1 reply; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-01  7:33 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

> +	const u64 end_pos = min(pos + length, vi->tree_params.tree_size);
> +	struct backing_dev_info *bdi = inode->i_sb->s_bdi;
> +	const u64 max_ra_bytes = min((u64)bdi->io_pages << PAGE_SHIFT,
> +				     ULONG_MAX);
> +	const struct merkle_tree_params *params = &vi->tree_params;

bdi->io_pages is really a VM readahead concept.  I know this is existing
code, but can we rethink why this is even used here?

> +	unsigned int offs_in_block = pos & (params->block_size - 1);
>  	int retval = 0;
>  	int err = 0;
>  
> +	 * Iterate through each Merkle tree block in the requested range and
> +	 * copy the requested portion to userspace. Note that we are returning
> +	 * a byte stream.
>  	 */
> +	while (pos < end_pos) {
> +		unsigned long ra_bytes;
> +		unsigned int bytes_to_copy;
> +		struct fsverity_blockbuf block = { };
>  
> +		ra_bytes = min_t(unsigned long, end_pos - pos, max_ra_bytes);
> +		bytes_to_copy = min_t(u64, end_pos - pos,
> +				      params->block_size - offs_in_block);
> +
> +		err = fsverity_read_merkle_tree_block(inode, &vi->tree_params,
> +						      pos - offs_in_block,
> +						      ra_bytes, &block);

Maybe it's just me, but isn't passing a byte offset to a read...block
routine a bit weird and this should operate on the block number instead?

> +		if (copy_to_user(buf, block.kaddr + offs_in_block, bytes_to_copy)) {

And the returned/passed value should be a kernel pointer to the start
of the in-memory copy of the block?

> +static bool is_hash_block_verified(struct inode *inode,
> +				   struct fsverity_blockbuf *block,
>  				   unsigned long hblock_idx)

Other fsverity code seems to use the (IMHO) much more readable
two-tab indentation for prototype continuations, maybe stick to that?

>
>  {
> +	struct fsverity_info *vi = inode->i_verity_info;
> +	struct page *hpage = (struct page *)block->context;

block->context is a void pointer, no need for casting it.

> +	for (; level > 0; level--)
> +		fsverity_drop_merkle_tree_block(inode, &hblocks[level - 1].block);

Overly long line here.  But the loop kinda looks odd anyway with the
extra one off in the body instead of the loop.


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 04/18] fsverity: support block-based Merkle tree caching
  2024-04-30  3:20   ` [PATCH 04/18] fsverity: support block-based Merkle tree caching Darrick J. Wong
@ 2024-05-01  7:36     ` Christoph Hellwig
  2024-05-01 22:35       ` Darrick J. Wong
  0 siblings, 1 reply; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-01  7:36 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

> @@ -377,6 +391,19 @@ int fsverity_read_merkle_tree_block(struct inode *inode,
>  
>  	block->pos = pos;
>  	block->size = params->block_size;
> +	block->verified = false;
> +
> +	if (vops->read_merkle_tree_block) {
> +		struct fsverity_readmerkle req = {
> +			.inode = inode,
> +			.ra_bytes = ra_bytes,
> +		};
> +
> +		err = vops->read_merkle_tree_block(&req, block);
> +		if (err)
> +			goto bad;
> +		return 0;

I still don't understand why we're keeping two interfaces instead of
providing a read through pagecache helper that implements the
->read_block interface.  That makes the interface really hard to follow
and feel rather ad-hoc.  I also have vague memories of providing such a
refactoring a long time ago.


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 03/18] fsverity: convert verification to use byte instead of page offsets
  2024-05-01  7:33     ` Christoph Hellwig
@ 2024-05-01 22:33       ` Darrick J. Wong
  2024-05-02  0:42         ` Eric Biggers
  0 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-05-01 22:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

On Wed, May 01, 2024 at 12:33:14AM -0700, Christoph Hellwig wrote:
> > +	const u64 end_pos = min(pos + length, vi->tree_params.tree_size);
> > +	struct backing_dev_info *bdi = inode->i_sb->s_bdi;
> > +	const u64 max_ra_bytes = min((u64)bdi->io_pages << PAGE_SHIFT,
> > +				     ULONG_MAX);
> > +	const struct merkle_tree_params *params = &vi->tree_params;
> 
> bdi->io_pages is really a VM readahead concept.  I know this is existing
> code, but can we rethink why this is even used here?

I would get rid of it entirely for the merkle-by-block case, since we'd
have to walk the xattr tree again just to find the next block.  XFS
ignores the readahead value entirely.

I think this only makes sense for the merkle-by-page case, and only
because ext4 and friends are stuffing the merkle data in the posteof
parts of the file mapping.

And even then, shouldn't we figure out the amount of readahead going on
and only ask for enough readahead of the merkle tree to satisfy that
readahead?

> > +	unsigned int offs_in_block = pos & (params->block_size - 1);
> >  	int retval = 0;
> >  	int err = 0;
> >  
> > +	 * Iterate through each Merkle tree block in the requested range and
> > +	 * copy the requested portion to userspace. Note that we are returning
> > +	 * a byte stream.
> >  	 */
> > +	while (pos < end_pos) {
> > +		unsigned long ra_bytes;
> > +		unsigned int bytes_to_copy;
> > +		struct fsverity_blockbuf block = { };
> >  
> > +		ra_bytes = min_t(unsigned long, end_pos - pos, max_ra_bytes);
> > +		bytes_to_copy = min_t(u64, end_pos - pos,
> > +				      params->block_size - offs_in_block);
> > +
> > +		err = fsverity_read_merkle_tree_block(inode, &vi->tree_params,
> > +						      pos - offs_in_block,
> > +						      ra_bytes, &block);
> 
> Maybe it's just me, but isn't passing a byte offset to a read...block
> routine a bit weird and this should operate on the block number instead?

I would think so, but here's the thing -- the write_merkle_tree_block
functions get passed pos and length in units of bytes.  Maybe fsverity
should be passing (blockno, blocksize) to the read and write
functions?  Eric said he could be persuaded to change it:

https://lore.kernel.org/linux-xfs/20240307224903.GE1799@sol.localdomain/
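
For reference, the two conventions carry the same information as long as
the block size is a power of two.  A minimal sketch of the conversion;
the struct and function names are made up for illustration and are not
part of the fsverity API:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical (block number, offset) form of a merkle tree byte position. */
struct merkle_blockref {
	uint64_t	blockno;	/* index of the tree block */
	unsigned int	offset;		/* byte offset within that block */
};

/* Split a byte position; block_size must be a power of two. */
static inline struct merkle_blockref
merkle_pos_to_blockref(uint64_t pos, unsigned int block_size)
{
	struct merkle_blockref ref;

	assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
	ref.offset = pos & (block_size - 1);
	ref.blockno = (pos - ref.offset) / block_size;
	return ref;
}

/* Rebuild the byte position from a (block number, offset) pair. */
static inline uint64_t
merkle_blockref_to_pos(struct merkle_blockref ref, unsigned int block_size)
{
	return ref.blockno * block_size + ref.offset;
}
```

So the choice is mostly about which form reads better at the call sites,
not about what can be expressed.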

> > +		if (copy_to_user(buf, block.kaddr + offs_in_block, bytes_to_copy)) {
> 
> And the returned/passed value should be a kernel pointer to the start
> of the in-memory copy of the block?

<shrug> This particular callsite is reading merkle data on behalf of an
ioctl that exports data.  Maybe we want the filesystem's errors to be
bounced up to userspace?

> > +static bool is_hash_block_verified(struct inode *inode,
> > +				   struct fsverity_blockbuf *block,
> >  				   unsigned long hblock_idx)
> 
> Other fsverity code seems to use the (IMHO) much more readable
> two-tab indentation for prototype continuations, maybe stick to that?

I'll do that, if Eric says so. :)

> >
> >  {
> > +	struct fsverity_info *vi = inode->i_verity_info;
> > +	struct page *hpage = (struct page *)block->context;
> 
> block->context is a void pointer, no need for casting it.

Eric insisted on it:
https://lore.kernel.org/linux-xfs/20240306035622.GA68962@sol.localdomain/

> > +	for (; level > 0; level--)
> > +		fsverity_drop_merkle_tree_block(inode, &hblocks[level - 1].block);
> 
> Overly long line here.  But the loop kinda looks odd anyway with the
> extra one off in the body instead of the loop.

I /think/ that's a side effect of reusing the value of @level after the
first loop fails as the initial conditions of the unwind loop.  AFAICT
it doesn't leak, but it's not entirely straightforward.
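
To spell out the pattern, here is a standalone sketch, not the fsverity
code itself.  The acquire/release helpers and the level_state struct are
stand-ins invented for illustration; only the shape of the two loops
mirrors the hunk above:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LEVELS	8

struct level_state {
	bool	held[MAX_LEVELS];	/* stand-in for a held tree block */
};

/* Hypothetical acquire step; fails when asked for level == fail_at. */
static int acquire_level(struct level_state *st, int level, int fail_at)
{
	if (level == fail_at)
		return -1;
	st->held[level] = true;
	return 0;
}

/* Hypothetical release step; asserts that the level was really held. */
static void release_level(struct level_state *st, int level)
{
	assert(st->held[level]);
	st->held[level] = false;
}

/* Acquire levels 0..num_levels-1, then release whatever is held. */
static int walk_levels(struct level_state *st, int num_levels, int fail_at)
{
	int level;
	int err = 0;

	assert(num_levels <= MAX_LEVELS);
	for (level = 0; level < num_levels; level++) {
		err = acquire_level(st, level, fail_at);
		if (err)
			break;
	}

	/*
	 * If acquiring @level failed, exactly levels 0..level-1 are held,
	 * so the unwind reuses @level and touches index level - 1.
	 */
	for (; level > 0; level--)
		release_level(st, level - 1);
	return err;
}
```

Writing the unwind as `while (level-- > 0)` and indexing with `level`
directly would release the same set of levels.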

--D

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 04/18] fsverity: support block-based Merkle tree caching
  2024-05-01  7:36     ` Christoph Hellwig
@ 2024-05-01 22:35       ` Darrick J. Wong
  2024-05-02  4:42         ` Christoph Hellwig
  0 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-05-01 22:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

On Wed, May 01, 2024 at 12:36:11AM -0700, Christoph Hellwig wrote:
> > @@ -377,6 +391,19 @@ int fsverity_read_merkle_tree_block(struct inode *inode,
> >  
> >  	block->pos = pos;
> >  	block->size = params->block_size;
> > +	block->verified = false;
> > +
> > +	if (vops->read_merkle_tree_block) {
> > +		struct fsverity_readmerkle req = {
> > +			.inode = inode,
> > +			.ra_bytes = ra_bytes,
> > +		};
> > +
> > +		err = vops->read_merkle_tree_block(&req, block);
> > +		if (err)
> > +			goto bad;
> > +		return 0;
> 
> I still don't understand why we're keeping two interfaces instead of
> providing a read through pagecache helper that implements the
> ->read_block interface.  That makes the interface really hard to follow
> and feel rather ad-hoc.  I also have vague memories of providing such a
> refactoring a long time ago.

Got a link?  This is the first I've heard of this, but TBH I've been
ignoring a /lot/ of things trying to get online repair merged (thank
you!) over the past months...

--D

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 18/18] iomap: integrate fs-verity verification into iomap's read path
  2024-05-01  7:10     ` Christoph Hellwig
@ 2024-05-01 22:37       ` Darrick J. Wong
  0 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-05-01 22:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: aalbersh, ebiggers, Christoph Hellwig, linux-xfs, alexl, walters,
	fsverity, linux-fsdevel

On Wed, May 01, 2024 at 12:10:53AM -0700, Christoph Hellwig wrote:
> On Mon, Apr 29, 2024 at 08:24:06PM -0700, Darrick J. Wong wrote:
> > From: Andrey Albershteyn <aalbersh@redhat.com>
> > 
> > This patch adds fs-verity verification into iomap's read path. After
> > BIO's io operation is complete the data are verified against
> > fs-verity's Merkle tree. Verification work is done in a separate
> > workqueue.
> > 
> > The read path ioend iomap_read_ioend are stored side by side with
> > BIOs if FS_VERITY is enabled.
> > 
> > Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Not sure where my signoff is coming from.  It looks pretty similar to
> a patch I sent a long time ago, but apparently it's been modified enough
> to drop my authorship, in which case my signoff should be dropped as
> well.

Removed.

--D

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 01/26] xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c
  2024-05-01  6:55     ` Christoph Hellwig
@ 2024-05-01 22:39       ` Darrick J. Wong
  2024-05-02  4:56         ` Christoph Hellwig
  2024-05-02  5:56         ` Chandan Babu R
  0 siblings, 2 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-05-01 22:39 UTC (permalink / raw)
  To: Christoph Hellwig, Chandan Babu R
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

On Tue, Apr 30, 2024 at 11:55:26PM -0700, Christoph Hellwig wrote:
> On Mon, Apr 29, 2024 at 08:24:22PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > In the next few patches we're going to refactor the attr remote code so
> > that we can support headerless remote xattr values for storing merkle
> > tree blocks.  For now, let's change the code to use unsigned int to
> > describe quantities of bytes and blocks that cannot be negative.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
> 
> Looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

> Can we please get this included ASAP instead of having it linger around?

Chandan, how many more patches are you willing to take for 6.10?  I
think Christoph has a bunch of fully-reviewed cleanups lurking on the
list, and then there's this one.

--D

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 05/26] xfs: use an empty transaction to protect xfs_attr_get from deadlocks
  2024-05-01  6:57     ` Christoph Hellwig
@ 2024-05-01 22:42       ` Darrick J. Wong
  0 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-05-01 22:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

On Tue, Apr 30, 2024 at 11:57:13PM -0700, Christoph Hellwig wrote:
> > +	if (error)
> > +		return error;
> > +
> >  	lock_mode = xfs_ilock_attr_map_shared(args->dp);
> > +
> > +        /*
> > +	 * Make sure the attr fork iext tree is loaded.  Use the empty
> > +	 * transaction to load the bmbt so that we avoid livelocking on loops.
> > +	 */
> > +        if (xfs_inode_hasattr(args->dp)) {
> > +                error = xfs_iread_extents(args->trans, args->dp, XFS_ATTR_FORK);
> 
> Overly long line here.  But I'd expect the xfs_iread_extents to be in
> xfs_attr_get_ilocked anyway instead of duplicated in the callers.

Hmm, I think that's the result of xfs_iread_extents whackamole in
djwong-dev.  You are correct that we don't need this whitespace damaged
blob.

--D

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 13/26] xfs: widen flags argument to the xfs_iflags_* helpers
  2024-05-01  6:54     ` Christoph Hellwig
@ 2024-05-01 22:44       ` Darrick J. Wong
  0 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-05-01 22:44 UTC (permalink / raw)
  To: Christoph Hellwig, Chandan Babu R
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

On Tue, Apr 30, 2024 at 11:54:30PM -0700, Christoph Hellwig wrote:
> On Mon, Apr 29, 2024 at 08:27:29PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > xfs_inode.i_flags is an unsigned long, so make these helpers take that
> > as the flags argument instead of unsigned short.  This is needed for the
> > next patch.
> > 
> > While we're at it, remove the iflags variable from xfs_iget_cache_miss
> > because we no longer need it.
> 
> I just reinvented this for another flag in work in progress code.
> Can we just get this included in the current for-next tree?

Chandan?  Any thoughts on pushing this for 6.10?

--D

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 19/26] xfs: don't bother storing merkle tree blocks for zeroed data blocks
  2024-05-01  6:47     ` Christoph Hellwig
@ 2024-05-01 22:47       ` Darrick J. Wong
  2024-05-02  0:01         ` Eric Biggers
  0 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-05-01 22:47 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

On Tue, Apr 30, 2024 at 11:47:23PM -0700, Christoph Hellwig wrote:
> On Mon, Apr 29, 2024 at 08:29:03PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Now that fsverity tells our merkle tree io functions about what a hash
> > of a data block full of zeroes looks like, we can use this information
> > to avoid writing out merkle tree blocks for sparse regions of the file.
> > For verified gold master images this can save quite a bit of overhead.
> 
> Is this something that fsverity should be doing in a generic way?

I don't think it's all that useful for ext4/f2fs because they always
write out full merkle tree blocks even if it's the zerohash over and
over again.  Old kernels aren't going to know how to deal with that.

> It feels odd to have XFS behave different from everyone else here,
> even if this does feel useful.  Do we also need any hash validation
> that no one tampered with the metadata and added a new extent, or
> is this out of scope for fsverity?

If they wrote a new extent with nonzero contents, then the validation
will fail, right?

If they added a new unwritten extent (or a written one full of zeroes),
then the file data hasn't changed and validation would still pass,
correct?

--D

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 25/26] xfs: make it possible to disable fsverity
  2024-05-01  6:48     ` Christoph Hellwig
@ 2024-05-01 22:50       ` Darrick J. Wong
  2024-05-02  0:15         ` Eric Biggers
  0 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-05-01 22:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

On Tue, Apr 30, 2024 at 11:48:29PM -0700, Christoph Hellwig wrote:
> On Mon, Apr 29, 2024 at 08:30:37PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Create an experimental ioctl so that we can turn off fsverity.
> 
> Didn't Eric argue against this?  And if we're adding this, I think
> it should be a generic feature and not just xfs specific.

The tagging is a bit wrong, but it is a generic fsverity ioctl, though
ext4/f2fs/btrfs don't have implementations.

<shrug> According to Ted, programs that care about fsverity are supposed
to check that VERITY is set in the stat data, but I imagine those
programs aren't expecting it to turn off suddenly.  Maybe I should make
this CAP_SYS_ADMIN?  Or withdraw it?

--D

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 19/26] xfs: don't bother storing merkle tree blocks for zeroed data blocks
  2024-05-01 22:47       ` Darrick J. Wong
@ 2024-05-02  0:01         ` Eric Biggers
  2024-05-08 20:26           ` Darrick J. Wong
  0 siblings, 1 reply; 159+ messages in thread
From: Eric Biggers @ 2024-05-02  0:01 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, aalbersh, linux-xfs, alexl, walters, fsverity,
	linux-fsdevel

On Wed, May 01, 2024 at 03:47:36PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 30, 2024 at 11:47:23PM -0700, Christoph Hellwig wrote:
> > On Mon, Apr 29, 2024 at 08:29:03PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Now that fsverity tells our merkle tree io functions about what a hash
> > > of a data block full of zeroes looks like, we can use this information
> > > to avoid writing out merkle tree blocks for sparse regions of the file.
> > > For verified gold master images this can save quite a bit of overhead.
> > 
> > Is this something that fsverity should be doing in a generic way?
> 
> I don't think it's all that useful for ext4/f2fs because they always
> write out full merkle tree blocks even if it's the zerohash over and
> over again.  Old kernels aren't going to know how to deal with that.
> 
> > It feels odd to have XFS behave different from everyone else here,
> > even if this does feel useful.  Do we also need any hash validation
> > that no one tampered with the metadata and added a new extent, or
> > is this out of scope for fsverity?
> 
> If they wrote a new extent with nonzero contents, then the validation
> will fail, right?
> 
> If they added a new unwritten extent (or a written one full of zeroes),
> then the file data hasn't changed and validation would still pass,
> correct?

The point of fsverity is to verify that file data is consistent with the
top-level file digest.  It doesn't really matter which type of extent the data
came from, or if the data got synthesized somehow (e.g. zeroes synthesized from
a hole), as long as fsverity still gets invoked to verify the data.  If the data
itself passes verification, then it's good.  The same applies to Merkle tree
blocks which are an intermediate step in the verification.

In the Merkle tree, ext4 and f2fs currently just use the same concept of
sparsity as the file data, i.e. when a block is unmapped, it is filled in with
all zeroes.  As Darrick noticed, this isn't really the right concept of sparsity
for the Merkle tree, as a block full of hashes of zeroed blocks should be used,
not literally a zeroed block.  I think it makes sense to fix this in XFS, as
it's newly adding fsverity support, and this is a filesystem-level
implementation detail.  It would be difficult to fix this in ext4 and f2fs since
it would be an on-disk format upgrade.  (Existing files should not actually have
any sparse Merkle tree blocks, so we probably could redefine what they mean.
But even if so, old kernels would not be able to read the new files.)
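
To make the distinction concrete, here is a minimal userspace sketch of the
two notions of a "sparse" level-0 Merkle tree block.  It is not code from the
patchset: the helper names fill_zero_hash_block() and is_zero_hash_block() are
hypothetical, and the digest of a zeroed data block is assumed to be supplied
by the caller (in the kernel it would come from fsverity).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: build the block that a filesystem could synthesize
 * for a sparse region, i.e. the zero-block digest repeated back to back.
 * Any tail that cannot hold a whole digest is left zeroed.
 */
static void fill_zero_hash_block(uint8_t *block, size_t block_size,
				 const uint8_t *zero_digest, size_t digest_size)
{
	size_t off;

	assert(digest_size > 0 && digest_size <= block_size);
	memset(block, 0, block_size);
	for (off = 0; off + digest_size <= block_size; off += digest_size)
		memcpy(block + off, zero_digest, digest_size);
}

/*
 * Hypothetical helper: return true if every digest slot in the block holds
 * the zero-block digest, meaning the block need not be written out.
 */
static bool is_zero_hash_block(const uint8_t *block, size_t block_size,
			       const uint8_t *zero_digest, size_t digest_size)
{
	size_t off;

	assert(digest_size > 0 && digest_size <= block_size);
	for (off = 0; off + digest_size <= block_size; off += digest_size)
		if (memcmp(block + off, zero_digest, digest_size) != 0)
			return false;
	return true;
}
```

A literally zeroed block fails is_zero_hash_block() unless the digest itself
happens to be all zeroes, which is why reusing the file-data notion of a hole
does not give the right answer for the tree.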

- Eric

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 25/26] xfs: make it possible to disable fsverity
  2024-05-01 22:50       ` Darrick J. Wong
@ 2024-05-02  0:15         ` Eric Biggers
  2024-05-08 20:31           ` Darrick J. Wong
  0 siblings, 1 reply; 159+ messages in thread
From: Eric Biggers @ 2024-05-02  0:15 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, aalbersh, linux-xfs, alexl, walters, fsverity,
	linux-fsdevel

On Wed, May 01, 2024 at 03:50:07PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 30, 2024 at 11:48:29PM -0700, Christoph Hellwig wrote:
> > On Mon, Apr 29, 2024 at 08:30:37PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Create an experimental ioctl so that we can turn off fsverity.
> > 
> > Didn't Eric argue against this?  And if we're adding this, I think
> > it should be a generic feature and not just xfs specific.
> 
> The tagging is a bit wrong, but it is a generic fsverity ioctl, though
> ext4/f2fs/btrfs don't have implementations.
> 
> <shrug> According to Ted, programs that care about fsverity are supposed
> to check that VERITY is set in the stat data, but I imagine those
> programs aren't expecting it to turn off suddenly.  Maybe I should make
> this CAP_SYS_ADMIN?  Or withdraw it?
> 

I'm concerned that fsverity could be disabled after someone has already checked
for fsverity on a particular file.  Currently users only have to re-check for
fsverity if they close the file and re-open it (as in that case it might have
been replaced with a new file with fsverity disabled).

A similar issue also would exist for the in-kernel users of fsverity such as
overlayfs and IMA (upstream), and IPE
(https://lore.kernel.org/linux-security-module/1712969764-31039-1-git-send-email-wufan@linux.microsoft.com/).
For example, IPE is being proposed to cache some state about fsverity in the LSM
blob associated with the struct inode.  If fsverity is disabled on an inode,
that state would get out of sync.  This could allow bypassing the IPE policy.

CAP_SYS_ADMIN isn't supposed to give a license to bypass all security features
including LSMs, so using CAP_SYS_ADMIN doesn't seem like a great solution.

- Eric

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 03/18] fsverity: convert verification to use byte instead of page offsets
  2024-05-01 22:33       ` Darrick J. Wong
@ 2024-05-02  0:42         ` Eric Biggers
  2024-05-08 20:14           ` Darrick J. Wong
  0 siblings, 1 reply; 159+ messages in thread
From: Eric Biggers @ 2024-05-02  0:42 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, aalbersh, linux-xfs, alexl, walters, fsverity,
	linux-fsdevel

On Wed, May 01, 2024 at 03:33:03PM -0700, Darrick J. Wong wrote:
> On Wed, May 01, 2024 at 12:33:14AM -0700, Christoph Hellwig wrote:
> > > +	const u64 end_pos = min(pos + length, vi->tree_params.tree_size);
> > > +	struct backing_dev_info *bdi = inode->i_sb->s_bdi;
> > > +	const u64 max_ra_bytes = min((u64)bdi->io_pages << PAGE_SHIFT,
> > > +				     ULONG_MAX);
> > > +	const struct merkle_tree_params *params = &vi->tree_params;
> > 
> > bdi->io_pages is really a VM readahead concept.  I know this is existing
> > code, but can we rethink why this is even used here?
> 
> I would get rid of it entirely for the merkle-by-block case, since we'd
> have to walk the xattr tree again just to find the next block.  XFS
> ignores the readahead value entirely.
> 
> I think this only makes sense for the merkle-by-page case, and only
> because ext4 and friends are stuffing the merkle data in the posteof
> parts of the file mapping.
> 
> And even then, shouldn't we figure out the amount of readahead going on
> and only ask for enough readahead of the merkle tree to satisfy that
> readahead?

The existing code is:

                unsigned long num_ra_pages =
                        min_t(unsigned long, last_index - index + 1,
                              inode->i_sb->s_bdi->io_pages);

So it does limit the readahead amount to the amount remaining to be read.

In addition, it's limited to io_pages.  It's possible that's not the best value
to use (maybe it should be ra_pages?), but the intent was to just use a large
readahead size, since this code is doing a fully sequential read.
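
As a rough userspace illustration of that clamp (the function name
merkle_ra_pages() is made up for this sketch, and plain unsigned long
arithmetic stands in for the kernel's min_t()):

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the existing calculation: read ahead at most
 * io_pages pages, but never past the last page that still has to be read.
 */
static unsigned long merkle_ra_pages(unsigned long index,
				     unsigned long last_index,
				     unsigned long io_pages)
{
	unsigned long remaining;

	assert(index <= last_index);
	remaining = last_index - index + 1;
	return remaining < io_pages ? remaining : io_pages;
}
```

With io_pages = 4, a request starting at index 0 of a 10-page range asks for
4 pages, while one starting at index 8 asks for only the 2 that remain.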

I do think that the concept of Merkle tree readahead makes sense regardless of
how the blocks are being stored.  Having to go to disk every time a new 4K
Merkle tree block is needed increases read latencies.  It doesn't need to be
included in the initial implementation though.

> > And the returned/passed value should be a kernel pointer to the start
> > of the in-memory copy of the block?
> > to 
> 
> <shrug> This particular callsite is reading merkle data on behalf of an
> ioctl that exports data.  Maybe we want the filesystem's errors to be
> bounced up to userspace?

Yes, I think so.

> > > +static bool is_hash_block_verified(struct inode *inode,
> > > +				   struct fsverity_blockbuf *block,
> > >  				   unsigned long hblock_idx)
> > 
> > Other fsverity code seems to use the (IMHO) much more readable
> > two-tab indentation for prototype continuations, maybe stick to that?
> 
> I'll do that, if Eric says so. :)

My preference is to align continuations with the line that they're continuing:

static bool is_hash_block_verified(struct inode *inode,
				   struct fsverity_blockbuf *block,
				   unsigned long hblock_idx)

> > >
> > >  {
> > > +	struct fsverity_info *vi = inode->i_verity_info;
> > > +	struct page *hpage = (struct page *)block->context;
> > 
> > block->context is a void pointer, no need for casting it.
> 
> Eric insisted on it:
> https://lore.kernel.org/linux-xfs/20240306035622.GA68962@sol.localdomain/

No, I didn't.  It showed up in some code snippets that I suggested, but the
casts originated from the patch itself.  Leaving out the cast is fine with me.

> 
> > > +	for (; level > 0; level--)
> > > +		fsverity_drop_merkle_tree_block(inode, &hblocks[level - 1].block);
> > 
> > Overly long line here.  But the loop kinda looks odd anyway with the
> > extra one off in the body instead of the loop.
> 
> I /think/ that's a side effect of reusing the value of @level after the
> first loop fails as the initial conditions of the unwind loop.  AFAICT
> it doesn't leak, but it's not entirely straightforward.

When an error occurs either ascending or descending the tree, we end up here
with 'level' containing the number of levels that need to be cleaned up.  It
might be clearer if it was called 'num_levels', though that could be confused
with 'params->num_levels'.  Or we could use: 'while (level-- > 0)'.

This is unrelated to this patch though.
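
For reference, a small standalone sketch showing that the two spellings of
the unwind loop release the same levels in the same order.  The drop_log
structure and drop_level() are stand-ins invented for this example; they only
record which index would have been dropped.

```c
#include <assert.h>

#define MAX_LEVELS 8

/* Illustrative only: records the order in which levels are released. */
struct drop_log {
	int dropped[MAX_LEVELS];
	int count;
};

static void drop_level(struct drop_log *log, int idx)
{
	assert(log->count < MAX_LEVELS);
	log->dropped[log->count++] = idx;
}

/* Current form: 'level' is a count, so the body indexes level - 1. */
static void unwind_for(struct drop_log *log, int level)
{
	for (; level > 0; level--)
		drop_level(log, level - 1);
}

/* Suggested form: the post-decrement turns the count into an index. */
static void unwind_while(struct drop_log *log, int level)
{
	while (level-- > 0)
		drop_level(log, level);
}
```

Starting from level = 3, both variants drop indices 2, 1, 0, and both do
nothing when level is 0.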

- Eric

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 04/18] fsverity: support block-based Merkle tree caching
  2024-05-01 22:35       ` Darrick J. Wong
@ 2024-05-02  4:42         ` Christoph Hellwig
  2024-05-15  2:16           ` Eric Biggers
  0 siblings, 1 reply; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-02  4:42 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, aalbersh, ebiggers, linux-xfs, alexl, walters,
	fsverity, linux-fsdevel

On Wed, May 01, 2024 at 03:35:19PM -0700, Darrick J. Wong wrote:
> Got a link?  This is the first I've heard of this, but TBH I've been
> ignoring a /lot/ of things trying to get online repair merged (thank
> you!) over the past months...

This was long before I got involved with repair :)

Below is what I found in my local tree.  It doesn't have a proper commit
log, so I probably only sent it out as an RFC in reply to a patch series
posting, most likely untested:

commit c11dcbe101a240c7a9e9bae7efaff2779d88b292
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Oct 16 14:14:11 2023 +0200

    fsverity block interface

diff --git a/Documentation/filesystems/fsverity.rst b/Documentation/filesystems/fsverity.rst
index af889512c6ac99..c616d530a89086 100644
--- a/Documentation/filesystems/fsverity.rst
+++ b/Documentation/filesystems/fsverity.rst
@@ -648,7 +648,7 @@ which verifies data that has been read into the pagecache of a verity
 inode.  The containing folio must still be locked and not Uptodate, so
 it's not yet readable by userspace.  As needed to do the verification,
 fsverity_verify_blocks() will call back into the filesystem to read
-hash blocks via fsverity_operations::read_merkle_tree_page().
+hash blocks via fsverity_operations::read_merkle_tree_block().
 
 fsverity_verify_blocks() returns false if verification failed; in this
 case, the filesystem must not set the folio Uptodate.  Following this,
diff --git a/fs/btrfs/verity.c b/fs/btrfs/verity.c
index 2b34796f68d349..4b6134923232e7 100644
--- a/fs/btrfs/verity.c
+++ b/fs/btrfs/verity.c
@@ -713,20 +713,20 @@ int btrfs_get_verity_descriptor(struct inode *inode, void *buf, size_t buf_size)
  *
  * Returns the page we read, or an ERR_PTR on error.
  */
-static struct page *btrfs_read_merkle_tree_page(struct inode *inode,
-						pgoff_t index,
-						unsigned long num_ra_pages,
-						u8 log_blocksize)
+static int btrfs_read_merkle_tree_block(struct inode *inode,
+		unsigned int offset, struct fsverity_block *block,
+		unsigned long num_ra_pages)
 {
 	struct folio *folio;
+	pgoff_t index = offset >> PAGE_SHIFT;
 	u64 off = (u64)index << PAGE_SHIFT;
 	loff_t merkle_pos = merkle_file_pos(inode);
 	int ret;
 
 	if (merkle_pos < 0)
-		return ERR_PTR(merkle_pos);
+		return merkle_pos;
 	if (merkle_pos > inode->i_sb->s_maxbytes - off - PAGE_SIZE)
-		return ERR_PTR(-EFBIG);
+		return -EFBIG;
 	index += merkle_pos >> PAGE_SHIFT;
 again:
 	folio = __filemap_get_folio(inode->i_mapping, index, FGP_ACCESSED, 0);
@@ -739,7 +739,7 @@ static struct page *btrfs_read_merkle_tree_page(struct inode *inode,
 		if (!folio_test_uptodate(folio)) {
 			folio_unlock(folio);
 			folio_put(folio);
-			return ERR_PTR(-EIO);
+			return -EIO;
 		}
 		folio_unlock(folio);
 		goto out;
@@ -748,7 +748,7 @@ static struct page *btrfs_read_merkle_tree_page(struct inode *inode,
 	folio = filemap_alloc_folio(mapping_gfp_constraint(inode->i_mapping, ~__GFP_FS),
 				    0);
 	if (!folio)
-		return ERR_PTR(-ENOMEM);
+		return -ENOMEM;
 
 	ret = filemap_add_folio(inode->i_mapping, folio, index, GFP_NOFS);
 	if (ret) {
@@ -756,7 +756,7 @@ static struct page *btrfs_read_merkle_tree_page(struct inode *inode,
 		/* Did someone else insert a folio here? */
 		if (ret == -EEXIST)
 			goto again;
-		return ERR_PTR(ret);
+		return ret;
 	}
 
 	/*
@@ -769,7 +769,7 @@ static struct page *btrfs_read_merkle_tree_page(struct inode *inode,
 			     folio_address(folio), PAGE_SIZE, &folio->page);
 	if (ret < 0) {
 		folio_put(folio);
-		return ERR_PTR(ret);
+		return ret;
 	}
 	if (ret < PAGE_SIZE)
 		folio_zero_segment(folio, ret, PAGE_SIZE);
@@ -778,7 +778,8 @@ static struct page *btrfs_read_merkle_tree_page(struct inode *inode,
 	folio_unlock(folio);
 
 out:
-	return folio_file_page(folio, index);
+	return fsverity_set_block_page(block, folio_file_page(folio, index),
+				       offset);
 }
 
 /*
@@ -809,6 +810,7 @@ const struct fsverity_operations btrfs_verityops = {
 	.begin_enable_verity     = btrfs_begin_enable_verity,
 	.end_enable_verity       = btrfs_end_enable_verity,
 	.get_verity_descriptor   = btrfs_get_verity_descriptor,
-	.read_merkle_tree_page   = btrfs_read_merkle_tree_page,
+	.read_merkle_tree_block  = btrfs_read_merkle_tree_block,
 	.write_merkle_tree_block = btrfs_write_merkle_tree_block,
+	.drop_merkle_tree_block	 = fsverity_drop_page_merke_tree_block,
 };
diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
index 4e2f01f048c09b..5623e2c1c302e8 100644
--- a/fs/ext4/verity.c
+++ b/fs/ext4/verity.c
@@ -358,15 +358,13 @@ static int ext4_get_verity_descriptor(struct inode *inode, void *buf,
 	return desc_size;
 }
 
-static struct page *ext4_read_merkle_tree_page(struct inode *inode,
-					       pgoff_t index,
-					       unsigned long num_ra_pages,
-					       u8 log_blocksize)
+static int ext4_read_merkle_tree_block(struct inode *inode, unsigned int offset,
+		struct fsverity_block *block, unsigned long num_ra_pages)
 {
 	struct folio *folio;
+	pgoff_t index;
 
-	index += ext4_verity_metadata_pos(inode) >> PAGE_SHIFT;
-
+	index = (ext4_verity_metadata_pos(inode) + offset) >> PAGE_SHIFT;
 	folio = __filemap_get_folio(inode->i_mapping, index, FGP_ACCESSED, 0);
 	if (IS_ERR(folio) || !folio_test_uptodate(folio)) {
 		DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, index);
@@ -377,9 +375,10 @@ static struct page *ext4_read_merkle_tree_page(struct inode *inode,
 			page_cache_ra_unbounded(&ractl, num_ra_pages, 0);
 		folio = read_mapping_folio(inode->i_mapping, index, NULL);
 		if (IS_ERR(folio))
-			return ERR_CAST(folio);
+			return PTR_ERR(folio);
 	}
-	return folio_file_page(folio, index);
+	return fsverity_set_block_page(block, folio_file_page(folio, index),
+				       offset);
 }
 
 static int ext4_write_merkle_tree_block(struct inode *inode, const void *buf,
@@ -394,6 +393,7 @@ const struct fsverity_operations ext4_verityops = {
 	.begin_enable_verity	= ext4_begin_enable_verity,
 	.end_enable_verity	= ext4_end_enable_verity,
 	.get_verity_descriptor	= ext4_get_verity_descriptor,
-	.read_merkle_tree_page	= ext4_read_merkle_tree_page,
+	.read_merkle_tree_block	= ext4_read_merkle_tree_block,
 	.write_merkle_tree_block = ext4_write_merkle_tree_block,
+	.drop_merkle_tree_block	= fsverity_drop_page_merke_tree_block,
 };
diff --git a/fs/f2fs/verity.c b/fs/f2fs/verity.c
index 601ab9f0c02492..aac9281e9c4565 100644
--- a/fs/f2fs/verity.c
+++ b/fs/f2fs/verity.c
@@ -255,15 +255,13 @@ static int f2fs_get_verity_descriptor(struct inode *inode, void *buf,
 	return size;
 }
 
-static struct page *f2fs_read_merkle_tree_page(struct inode *inode,
-					       pgoff_t index,
-					       unsigned long num_ra_pages,
-					       u8 log_blocksize)
+static int f2fs_read_merkle_tree_block(struct inode *inode, unsigned int offset,
+		struct fsverity_block *block, unsigned long num_ra_pages)
 {
 	struct page *page;
+	pgoff_t index;
 
-	index += f2fs_verity_metadata_pos(inode) >> PAGE_SHIFT;
-
+	index = (f2fs_verity_metadata_pos(inode) + offset) >> PAGE_SHIFT;
 	page = find_get_page_flags(inode->i_mapping, index, FGP_ACCESSED);
 	if (!page || !PageUptodate(page)) {
 		DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, index);
@@ -274,7 +272,7 @@ static struct page *f2fs_read_merkle_tree_page(struct inode *inode,
 			page_cache_ra_unbounded(&ractl, num_ra_pages, 0);
 		page = read_mapping_page(inode->i_mapping, index, NULL);
 	}
-	return page;
+	return fsverity_set_block_page(block, page, offset);
 }
 
 static int f2fs_write_merkle_tree_block(struct inode *inode, const void *buf,
@@ -289,6 +287,7 @@ const struct fsverity_operations f2fs_verityops = {
 	.begin_enable_verity	= f2fs_begin_enable_verity,
 	.end_enable_verity	= f2fs_end_enable_verity,
 	.get_verity_descriptor	= f2fs_get_verity_descriptor,
-	.read_merkle_tree_page	= f2fs_read_merkle_tree_page,
+	.read_merkle_tree_block	= f2fs_read_merkle_tree_block,
 	.write_merkle_tree_block = f2fs_write_merkle_tree_block,
+	.drop_merkle_tree_block	= fsverity_drop_page_merke_tree_block,
 };
diff --git a/fs/verity/read_metadata.c b/fs/verity/read_metadata.c
index 182bddf5dec54c..5e362f8562bd5d 100644
--- a/fs/verity/read_metadata.c
+++ b/fs/verity/read_metadata.c
@@ -12,10 +12,33 @@
 #include <linux/sched/signal.h>
 #include <linux/uaccess.h>
 
+int fsverity_set_block_page(struct fsverity_block *block,
+		struct page *page, unsigned int index)
+{
+	if (IS_ERR(page))
+		return PTR_ERR(page);
+	block->kaddr = page_address(page) + (index % PAGE_SIZE);
+	block->cached = PageChecked(page);
+	block->context = page;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(fsverity_set_block_page);
+
+void fsverity_drop_page_merke_tree_block(struct fsverity_block *block)
+{
+	struct page *page = block->context;
+
+	if (block->verified)
+		SetPageChecked(page);
+	put_page(page);
+}
+EXPORT_SYMBOL_GPL(fsverity_drop_page_merke_tree_block);
+
 static int fsverity_read_merkle_tree(struct inode *inode,
 				     const struct fsverity_info *vi,
 				     void __user *buf, u64 offset, int length)
 {
+	const struct fsverity_operations *vop = inode->i_sb->s_vop;
 	u64 end_offset;
 	unsigned int offs_in_block;
 	unsigned int block_size = vi->tree_params.block_size;
@@ -45,20 +68,19 @@ static int fsverity_read_merkle_tree(struct inode *inode,
 		struct fsverity_block block;
 
 		block.len = block_size;
-		if (fsverity_read_merkle_tree_block(inode,
-					index << vi->tree_params.log_blocksize,
-					&block, num_ra_pages)) {
-			fsverity_drop_block(inode, &block);
+		if (vop->read_merkle_tree_block(inode,
+				index << vi->tree_params.log_blocksize,
+				&block, num_ra_pages)) {
 			err = -EFAULT;
 			break;
 		}
 
 		if (copy_to_user(buf, block.kaddr + offs_in_block, bytes_to_copy)) {
-			fsverity_drop_block(inode, &block);
+			vop->drop_merkle_tree_block(&block);
 			err = -EFAULT;
 			break;
 		}
-		fsverity_drop_block(inode, &block);
+		vop->drop_merkle_tree_block(&block);
 		block.kaddr = NULL;
 
 		retval += bytes_to_copy;
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index dfe01f12184341..9b84262a6fa413 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -42,6 +42,7 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		  const void *data, u64 data_pos, unsigned long max_ra_pages)
 {
 	const struct merkle_tree_params *params = &vi->tree_params;
+	const struct fsverity_operations *vop = inode->i_sb->s_vop;
 	const unsigned int hsize = params->digest_size;
 	int level;
 	int err;
@@ -115,9 +116,9 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		block->len = params->block_size;
 		num_ra_pages = level == 0 ?
 			min(max_ra_pages, params->tree_pages - hpage_idx) : 0;
-		err = fsverity_read_merkle_tree_block(
-			inode, hblock_idx << params->log_blocksize, block,
-			num_ra_pages);
+		err = vop->read_merkle_tree_block(inode,
+				hblock_idx << params->log_blocksize, block,
+				num_ra_pages);
 		if (err) {
 			fsverity_err(inode,
 				     "Error %d reading Merkle tree block %lu",
@@ -127,7 +128,7 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		if (is_hash_block_verified(vi, hblock_idx, block->cached)) {
 			memcpy(_want_hash, block->kaddr + hoffset, hsize);
 			want_hash = _want_hash;
-			fsverity_drop_block(inode, block);
+			vop->drop_merkle_tree_block(block);
 			goto descend;
 		}
 		hblocks[level].index = hblock_idx;
@@ -157,7 +158,7 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		block->verified = true;
 		memcpy(_want_hash, haddr + hoffset, hsize);
 		want_hash = _want_hash;
-		fsverity_drop_block(inode, block);
+		vop->drop_merkle_tree_block(block);
 	}
 
 	/* Finally, verify the data block. */
@@ -174,9 +175,8 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		     params->hash_alg->name, hsize, want_hash,
 		     params->hash_alg->name, hsize, real_hash);
 error:
-	for (; level > 0; level--) {
-		fsverity_drop_block(inode, &hblocks[level - 1].block);
-	}
+	for (; level > 0; level--)
+		vop->drop_merkle_tree_block(&hblocks[level - 1].block);
 	return false;
 }
 
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index ce37a430bc97f2..ae9ae7719af558 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -104,27 +104,6 @@ struct fsverity_operations {
 	int (*get_verity_descriptor)(struct inode *inode, void *buf,
 				     size_t bufsize);
 
-	/**
-	 * Read a Merkle tree page of the given inode.
-	 *
-	 * @inode: the inode
-	 * @index: 0-based index of the page within the Merkle tree
-	 * @num_ra_pages: The number of Merkle tree pages that should be
-	 *		  prefetched starting at @index if the page at @index
-	 *		  isn't already cached.  Implementations may ignore this
-	 *		  argument; it's only a performance optimization.
-	 *
-	 * This can be called at any time on an open verity file.  It may be
-	 * called by multiple processes concurrently, even with the same page.
-	 *
-	 * Note that this must retrieve a *page*, not necessarily a *block*.
-	 *
-	 * Return: the page on success, ERR_PTR() on failure
-	 */
-	struct page *(*read_merkle_tree_page)(struct inode *inode,
-					      pgoff_t index,
-					      unsigned long num_ra_pages,
-					      u8 log_blocksize);
 	/**
 	 * Read a Merkle tree block of the given inode.
 	 * @inode: the inode
@@ -162,13 +141,12 @@ struct fsverity_operations {
 
 	/**
 	 * Release the reference to a Merkle tree block
-	 *
-	 * @page: the block to release
+	 * @block: the block to release
 	 *
 	 * This is called when fs-verity is done with a block obtained with
 	 * ->read_merkle_tree_block().
 	 */
-	void (*drop_block)(struct fsverity_block *block);
+	void (*drop_merkle_tree_block)(struct fsverity_block *block);
 };
 
 #ifdef CONFIG_FS_VERITY
@@ -217,74 +195,16 @@ static inline void fsverity_cleanup_inode(struct inode *inode)
 
 int fsverity_ioctl_read_metadata(struct file *filp, const void __user *uarg);
 
+int fsverity_set_block_page(struct fsverity_block *block,
+		struct page *page, unsigned int index);
+void fsverity_drop_page_merke_tree_block(struct fsverity_block *block);
+
 /* verify.c */
 
 bool fsverity_verify_blocks(struct folio *folio, size_t len, size_t offset);
 void fsverity_verify_bio(struct bio *bio);
 void fsverity_enqueue_verify_work(struct work_struct *work);
 
-/**
- * fsverity_drop_block() - drop block obtained with ->read_merkle_tree_block()
- * @inode: inode in use for verification or metadata reading
- * @block: block to be dropped
- *
- * Generic put_page() method. Calls out back to filesystem if ->drop_block() is
- * set, otherwise do nothing.
- *
- */
-static inline void fsverity_drop_block(struct inode *inode,
-		struct fsverity_block *block)
-{
-	if (inode->i_sb->s_vop->drop_block)
-		inode->i_sb->s_vop->drop_block(block);
-	else {
-		struct page *page = (struct page *)block->context;
-
-		if (block->verified)
-			SetPageChecked(page);
-
-		put_page(page);
-	}
-}
-
-/**
- * fsverity_read_block_from_page() - layer between fs using read page
- * and read block
- * @inode: inode in use for verification or metadata reading
- * @index: index of the block in the tree (offset into the tree)
- * @block: block to be read
- * @num_ra_pages: number of pages to readahead, may be ignored
- *
- * Depending on fs implementation use read_merkle_tree_block or
- * read_merkle_tree_page.
- */
-static inline int fsverity_read_merkle_tree_block(struct inode *inode,
-					unsigned int index,
-					struct fsverity_block *block,
-					unsigned long num_ra_pages)
-{
-	struct page *page;
-
-	if (inode->i_sb->s_vop->read_merkle_tree_block)
-		return inode->i_sb->s_vop->read_merkle_tree_block(
-			inode, index, block, num_ra_pages);
-
-	page = inode->i_sb->s_vop->read_merkle_tree_page(
-			inode, index >> PAGE_SHIFT, num_ra_pages,
-			block->len);
-
-	block->kaddr = page_address(page) + (index % PAGE_SIZE);
-	block->cached = PageChecked(page);
-	block->context = page;
-
-	if (IS_ERR(page))
-		return PTR_ERR(page);
-	else
-		return 0;
-}
-
-
-
 #else /* !CONFIG_FS_VERITY */
 
 static inline struct fsverity_info *fsverity_get_info(const struct inode *inode)
@@ -362,20 +282,6 @@ static inline void fsverity_enqueue_verify_work(struct work_struct *work)
 	WARN_ON_ONCE(1);
 }
 
-static inline void fsverity_drop_page(struct inode *inode, struct page *page)
-{
-	WARN_ON_ONCE(1);
-}
-
-static inline int fsverity_read_merkle_tree_block(struct inode *inode,
-					unsigned int index,
-					struct fsverity_block *block,
-					unsigned long num_ra_pages)
-{
-	WARN_ON_ONCE(1);
-	return -EOPNOTSUPP;
-}
-
 #endif	/* !CONFIG_FS_VERITY */
 
 static inline bool fsverity_verify_folio(struct folio *folio)

^ permalink raw reply related	[flat|nested] 159+ messages in thread

* Re: [PATCH 01/26] xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c
  2024-05-01 22:39       ` Darrick J. Wong
@ 2024-05-02  4:56         ` Christoph Hellwig
  2024-05-02  5:56         ` Chandan Babu R
  1 sibling, 0 replies; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-02  4:56 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, aalbersh, ebiggers, linux-xfs,
	alexl, walters, fsverity, linux-fsdevel

On Wed, May 01, 2024 at 03:39:27PM -0700, Darrick J. Wong wrote:
> > Can we please get this included ASAP instead of having it linger around?
> 
> Chandan, how many more patches are you willing to take for 6.10?  I
> think Christoph has a bunch of fully-reviewed cleanups lurking on the
> list, and then there's this one.

Also a bunch more bugfixes for the log recovery out-of-bounds
access and the racy iext accesses.  Although I need to respin that
last series for the name change requested by Dave unless someone
disagrees with that.

But even if it's not for 6.10 I would love it if we could feed these kinds
of generally useful cleanups upstream ASAP instead of reposting them
hundreds of times or having them linger in a branch somewhere.  That just
leads to frustrating rebases, conflicts and reinventions.  You have
quite a few more candidates like that in your patch stack.


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 01/26] xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c
  2024-05-01 22:39       ` Darrick J. Wong
  2024-05-02  4:56         ` Christoph Hellwig
@ 2024-05-02  5:56         ` Chandan Babu R
  2024-05-02  6:34           ` Christoph Hellwig
  1 sibling, 1 reply; 159+ messages in thread
From: Chandan Babu R @ 2024-05-02  5:56 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, aalbersh, ebiggers, linux-xfs, alexl, walters,
	fsverity, linux-fsdevel

On Wed, May 01, 2024 at 03:39:27 PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 30, 2024 at 11:55:26PM -0700, Christoph Hellwig wrote:
>> On Mon, Apr 29, 2024 at 08:24:22PM -0700, Darrick J. Wong wrote:
>> > From: Darrick J. Wong <djwong@kernel.org>
>> > 
>> > In the next few patches we're going to refactor the attr remote code so
>> > that we can support headerless remote xattr values for storing merkle
>> > tree blocks.  For now, let's change the code to use unsigned int to
>> > describe quantities of bytes and blocks that cannot be negative.
>> > 
>> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
>> > Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
>> 
>> Looks good:
>> 
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>
> Thanks!
>
>> Can we please get this included ASAP instead of having it linger around?
>
> Chandan, how many more patches are you willing to take for 6.10?  I
> think Christoph has a bunch of fully-reviewed cleanups lurking on the
> list, and then there's this one.

I have pushed a set of new patches to for-next a few hours ago.

Also, I have queued the following patchsets for internal testing,
1. fix h_size validation v2
2. quota (un)reservation cleanups
3. Removal of duplicate includes
   i.e. https://lore.kernel.org/linux-xfs/20240430034728.86811-1-jiapeng.chong@linux.alibaba.com/ 

The remaining patchsets from Christoph i.e.
1. optimize COW end I/O remapping v2
2. optimize local for and shortform directory handling
3. iext handling fixes and cleanup
... are either missing RVBs or need to address review comments.

Darrick, I will update for-next sometime tomorrow evening my time. Can you
please send me a pull request containing the fs-verity patchset based on
tomorrow's updated for-next branch by end of Friday? This will be the last
patchset I will be applying for the 6.10 merge window since I would like to
test linux-next during next week.

-- 
Chandan

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 01/26] xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c
  2024-05-02  5:56         ` Chandan Babu R
@ 2024-05-02  6:34           ` Christoph Hellwig
  0 siblings, 0 replies; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-02  6:34 UTC (permalink / raw)
  To: Chandan Babu R
  Cc: Darrick J. Wong, Christoph Hellwig, aalbersh, ebiggers,
	linux-xfs, alexl, walters, fsverity, linux-fsdevel

On Thu, May 02, 2024 at 11:26:08AM +0530, Chandan Babu R wrote:
> 1. optimize COW end I/O remapping v2

This has been retracted and split.

> 2. iext handling fixes and cleanup
> ... are either missing RVBs or need to address review comments.

I'll resend this with the rename in a bit.


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 18/26] xfs: use merkle tree offset as attr hash
  2024-05-01  7:23       ` Christoph Hellwig
@ 2024-05-07 21:24         ` Darrick J. Wong
  2024-05-08 11:47           ` Christoph Hellwig
  0 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-05-07 21:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

On Wed, May 01, 2024 at 12:23:02AM -0700, Christoph Hellwig wrote:
> On Tue, Apr 30, 2024 at 11:53:00PM -0700, Christoph Hellwig wrote:
> > This and the header hacks suggest to me that shoehorning the fsverity
> > blocks into attrs just feels like the wrong approach.
> > 
> > They don't really behave like attrs, they aren't key/value pairs that
> > are separate, but a large amount of same sized blocks with logical
> > indexing.  All that is actually nicely solved by the original fsverity
> > used by ext4/f2fs, while we have to pile workarounds on top of
> > workarounds to make attrs work.
> 
> Taking this a bit further:  If we want to avoid the problems associated
> with the original scheme, mostly the file size limitation, and the (IMHO
> more cosmetic than real) confusion with post-EOF preallocations, we
> can still store the data in the attr fork, but not in the traditional
> attr format.  The attr fork provides the same logical index to physical
> translation as the data fork, and while that is currently only used for
> dabtree blocks and remote attr values, that isn't actually a fundamental
> requirement for using it.
> 
> All the attr fork placement works through xfs_bmap_first_unused() to
> find completely random free space in the logical address space.
> 
> Now if we reserved say the high bit for verity blocks in verity enabled
> file systems we can simply use the bmap btree to do the mapping from
> the verity index to the on-disk verity blocks without any other impact
> to the attr code.

Since we know the size of the merkle data ahead of time, we could also
preallocate space in the attr fork and create a remote ATTR_VERITY xattr
named "merkle" that points to the allocated space.  Then we don't have
to have magic meanings for the high bit.

Though I guess the question is, given the format:

struct xfs_attr_leaf_name_remote {
	__be32	valueblk;		/* block number of value bytes */
	__be32	valuelen;		/* number of bytes in value */
	__u8	namelen;		/* length of name bytes */
	/*
	 * In Linux 6.5 this flex array was converted from name[1] to name[].
	 * Be very careful here about extra padding at the end; see
	 * xfs_attr_leaf_entsize_remote() for details.
	 */
	__u8	name[];			/* name bytes */
};

Will we ever have a merkle tree larger than 2^32-1 bytes in length?  If
that's possible, then either we shard the merkle tree, or we have to rev
the ondisk xfs_attr_leaf_name_remote structure.

I think we have to rev the format anyway, since with nrext64==1 we can
have attr fork extents that start above 2^32 blocks, and the codebase
will blindly truncate the 64-bit quantity returned by
xfs_bmap_first_unused.
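
To make the truncation concrete, here is a minimal userspace sketch (not
kernel code).  The helper name and the choice of -EFBIG are hypothetical;
the only thing taken from the struct above is that valueblk is 32 bits
wide while the offset being stored is a 64-bit quantity.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: store a 64-bit attr fork block offset into a
 * 32-bit field (such as valueblk) only if it fits.  A plain assignment
 * would silently keep just the low 32 bits.
 */
static int checked_store_blkno(uint64_t fileoff, uint32_t *valueblk)
{
	if (fileoff > UINT32_MAX)
		return -EFBIG;	/* refuse instead of truncating */

	*valueblk = (uint32_t)fileoff;
	return 0;
}
```

With an unchecked cast, an offset of 2^32 + 7 would be stored as 7, which
is the failure mode described above.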

--D

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 18/26] xfs: use merkle tree offset as attr hash
  2024-05-07 21:24         ` Darrick J. Wong
@ 2024-05-08 11:47           ` Christoph Hellwig
  2024-05-08 20:26             ` Darrick J. Wong
  0 siblings, 1 reply; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-08 11:47 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, aalbersh, ebiggers, linux-xfs, alexl, walters,
	fsverity, linux-fsdevel

On Tue, May 07, 2024 at 02:24:54PM -0700, Darrick J. Wong wrote:
> Since we know the size of the merkle data ahead of time, we could also
> preallocate space in the attr fork and create a remote ATTR_VERITY xattr
> named "merkle" that points to the allocated space.  Then we don't have
> to have magic meanings for the high bit.

Note that the high bit was just an example; a random high offset
might be a better choice, sized with some space to spare for the maximum
verity data.

> Will we ever have a merkle tree larger than 2^32-1 bytes in length?  If
> that's possible, then either we shard the merkle tree, or we have to rev
> the ondisk xfs_attr_leaf_name_remote structure.

If we did that would be yet another indicator that they aren't attrs
but something else.  But maybe I should stop banging that drum and
agree that everything is a nail if all you got is a hammer.. :)

> 
> I think we have to rev the format anyway, since with nrext64==1 we can
> have attr fork extents that start above 2^32 blocks, and the codebase
> will blindly truncate the 64-bit quantity returned by
> xfs_bmap_first_unused.

Or we decide the space above 2^32 blocks can't be used by attrs,
and only by other users with other means of discovery.  Say the
verity hashes..


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 03/18] fsverity: convert verification to use byte instead of page offsets
  2024-05-02  0:42         ` Eric Biggers
@ 2024-05-08 20:14           ` Darrick J. Wong
  0 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-05-08 20:14 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Christoph Hellwig, aalbersh, linux-xfs, alexl, walters, fsverity,
	linux-fsdevel

On Thu, May 02, 2024 at 12:42:31AM +0000, Eric Biggers wrote:
> On Wed, May 01, 2024 at 03:33:03PM -0700, Darrick J. Wong wrote:
> > On Wed, May 01, 2024 at 12:33:14AM -0700, Christoph Hellwig wrote:
> > > > +	const u64 end_pos = min(pos + length, vi->tree_params.tree_size);
> > > > +	struct backing_dev_info *bdi = inode->i_sb->s_bdi;
> > > > +	const u64 max_ra_bytes = min((u64)bdi->io_pages << PAGE_SHIFT,
> > > > +				     ULONG_MAX);
> > > > +	const struct merkle_tree_params *params = &vi->tree_params;
> > > 
> > > bdi->io_pages is really a VM readahead concept.  I know this is existing
> > > code, but can we rethink why this is even used here?
> > 
> > I would get rid of it entirely for the merkle-by-block case, since we'd
> > have to walk the xattr tree again just to find the next block.  XFS
> > ignores the readahead value entirely.
> > 
> > I think this only makes sense for the merkle-by-page case, and only
> > because ext4 and friends are stuffing the merkle data in the posteof
> > parts of the file mapping.
> > 
> > And even then, shouldn't we figure out the amount of readahead going on
> > and only ask for enough readahead of the merkle tree to satisfy that
> > readahead?
> 
> The existing code is:
> 
>                 unsigned long num_ra_pages =
>                         min_t(unsigned long, last_index - index + 1,
>                               inode->i_sb->s_bdi->io_pages);
> 
> So it does limit the readahead amount to the amount remaining to be read.
> 
> In addition, it's limited to io_pages.  It's possible that's not the best value
> to use (maybe it should be ra_pages?), but the intent was to just use a large
> readahead size, since this code is doing a fully sequential read.

io_pages is supposed to be the optimal IO size, whereas ra_pages is the
readahead size for the block device.  I don't know why you chose
io_pages, but I'm assuming there's a reason there. :)

Somewhat confusingly, I think mm/readahead.c picks the maximum of
io_pages and ra_pages, which doesn't clear things up for me either.

Personally I think fsverity should be using ra_pages here, but changing
it should be a different patch with a separate justification.  This
patch simply has to translate the merkle-by-page code to handle by-block.
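
For reference, the clamp under discussion can be sketched as standalone
C.  The function name is made up for illustration; the expression is the
min_t() quoted above, with the limit passed in so that either io_pages or
ra_pages could be supplied by the caller.

```c
/*
 * Hypothetical stand-in for the quoted min_t() expression: read ahead
 * no further than the last page requested, and no more than the given
 * per-device limit.
 */
static unsigned long merkle_ra_pages(unsigned long index,
				     unsigned long last_index,
				     unsigned long limit_pages)
{
	unsigned long remaining = last_index - index + 1;

	return remaining < limit_pages ? remaining : limit_pages;
}
```

Swapping io_pages for ra_pages would only change the value passed as
limit_pages; the "amount remaining" half of the clamp stays the same.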

> I do think that the concept of Merkle tree readahead makes sense regardless of
> how the blocks are being stored.  Having to go to disk every time a new 4K
> Merkle tree block is needed increases read latencies.  It doesn't need to be
> included in the initial implementation though.

Of course, if we're really ok with xfs making a giant left turn and
storing the entire merkle tree as one big chunk of file range in the
attr fork, then suddenly it *does* make sense to allow merkle tree
readahead again.

> > > And the returned/passed value should be a kernel pointer to the start
> > > of the in-memory copy of the block?
> > 
> > <shrug> This particular callsite is reading merkle data on behalf of an
> > ioctl that exports data.  Maybe we want the filesystem's errors to be
> > bounced up to userspace?
> 
> Yes, I think so.

Ok, thanks for confirming that.

> > > > +static bool is_hash_block_verified(struct inode *inode,
> > > > +				   struct fsverity_blockbuf *block,
> > > >  				   unsigned long hblock_idx)
> > > 
> > > Other fsverity code seems to use the (IMHO) much more readable
> > > two-tab indentation for prototype continuations, maybe stick to that?
> > 
> > I'll do that, if Eric says so. :)
> 
> My preference is to align continuations with the line that they're continuing:
> 
> static bool is_hash_block_verified(struct inode *inode,
> 				   struct fsverity_blockbuf *block,
> 				   unsigned long hblock_idx)
> 
> > > >
> > > >  {
> > > > +	struct fsverity_info *vi = inode->i_verity_info;
> > > > +	struct page *hpage = (struct page *)block->context;
> > > 
> > > block->context is a void pointer, no need for casting it.
> > 
> > Eric insisted on it:
> > https://lore.kernel.org/linux-xfs/20240306035622.GA68962@sol.localdomain/
> 
> No, I didn't.  It showed up in some code snippets that I suggested, but the
> casts originated from the patch itself.  Leaving out the cast is fine with me.

Oh ok.  I'll drop those then.

> > 
> > > > +	for (; level > 0; level--)
> > > > +		fsverity_drop_merkle_tree_block(inode, &hblocks[level - 1].block);
> > > 
> > > Overly long line here.  But the loop kinda looks odd anyway with the
> > > extra one off in the body instead of the loop.
> > 
> > I /think/ that's a side effect of reusing the value of @level after the
> > first loop fails as the initial conditions of the unwind loop.  AFAICT
> > it doesn't leak, but it's not entirely straightforward.
> 
> When an error occurs either ascending or descending the tree, we end up here
> with 'level' containing the number of levels that need to be cleaned up.  It
> might be clearer if it was called 'num_levels', though that could be confused
> with 'params->num_levels'.  Or we could use: 'while (level-- > 0)'.
> 
> This is unrelated to this patch though.

<nod>
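
A small userspace sketch of the two loop spellings, to show they visit
the same levels in the same order.  The recording array is a stand-in
for the real fsverity_drop_merkle_tree_block() call, and both function
names are invented for this example.

```c
#include <stddef.h>

/* Current form: the "- 1" lives in the loop body. */
static size_t unwind_for(unsigned int level, unsigned int *dropped)
{
	size_t n = 0;

	for (; level > 0; level--)
		dropped[n++] = level - 1;	/* index that would be dropped */
	return n;
}

/* Suggested form: the decrement is folded into the loop condition. */
static size_t unwind_while(unsigned int level, unsigned int *dropped)
{
	size_t n = 0;

	while (level-- > 0)
		dropped[n++] = level;		/* same index, no "- 1" needed */
	return n;
}
```

Both variants drop nothing when level is 0, so the error path stays safe
if the failure happens before any block was acquired.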

--D

> - Eric
> 

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 18/26] xfs: use merkle tree offset as attr hash
  2024-05-08 11:47           ` Christoph Hellwig
@ 2024-05-08 20:26             ` Darrick J. Wong
  2024-05-09  5:02               ` Christoph Hellwig
  2024-05-09 17:46               ` Eric Biggers
  0 siblings, 2 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-05-08 20:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

On Wed, May 08, 2024 at 04:47:32AM -0700, Christoph Hellwig wrote:
> On Tue, May 07, 2024 at 02:24:54PM -0700, Darrick J. Wong wrote:
> > Since we know the size of the merkle data ahead of time, we could also
> > preallocate space in the attr fork and create a remote ATTR_VERITY xattr
> > named "merkle" that points to the allocated space.  Then we don't have
> > to have magic meanings for the high bit.
> 
> Note that the high bit was just an example; a random high offset
> might be a better choice, sized with some space to spare for the maximum
> verity data.

I guess we could make it really obvious by allocating a range in the
mapping starting at MAX_FILEOFF and going downwards.  Chances are pretty
good that with the xattr info growing upwards they're never going to
meet.
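
As a rough sketch of the arithmetic only (userspace C, hypothetical
function name, and the highest mappable file offset passed in as a
parameter rather than assumed), the start of a top-down region could be
computed like this:

```c
#include <stdint.h>

/*
 * Hypothetical helper: given the highest usable block offset in the
 * mapping and the merkle tree size in bytes, return the first block of
 * a region that ends exactly at max_fileoff.  Returns UINT64_MAX if
 * the inputs are unusable or the region would not fit.
 */
static uint64_t verity_region_start(uint64_t max_fileoff,
				    uint64_t tree_bytes,
				    uint32_t blocksize)
{
	uint64_t nblocks;

	if (blocksize == 0 || tree_bytes == 0)
		return UINT64_MAX;

	/* Round the tree size up to whole blocks. */
	nblocks = tree_bytes / blocksize + (tree_bytes % blocksize != 0);

	/* Conservative: reject anything that would reach down to block 0. */
	if (nblocks > max_fileoff)
		return UINT64_MAX;

	return max_fileoff - nblocks + 1;
}
```

This says nothing about whether the xattr structures growing upwards
could ever reach that range; a real implementation would still need an
explicit check for the collision.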

> > Will we ever have a merkle tree larger than 2^32-1 bytes in length?  If
> > that's possible, then either we shard the merkle tree, or we have to rev
> > the ondisk xfs_attr_leaf_name_remote structure.
> 
> If we did that would be yet another indicator that they aren't attrs
> but something else.  But maybe I should stop banging that drum and
> agree that everything is a nail if all you got is a hammer.. :)

Hammer?  All I've got is a big block of cheese. :P

FWIW the fsverity code seems to cut us off at U32_MAX bytes of merkle
data so that's going to be the limit until they rev the ondisk format.

> > I think we have to rev the format anyway, since with nrext64==1 we can
> > have attr fork extents that start above 2^32 blocks, and the codebase
> > will blindly truncate the 64-bit quantity returned by
> > xfs_bmap_first_unused.
> 
> Or we decide the space above 2^32 blocks can't be used by attrs,
> and only by other users with other means of discovery.  Say the
> verity hashes..

Well right now they can't be used by attrs because xfs_dablk_t isn't big
enough to fit a larger value.  The dangerous part here is that the code
silently truncates the outparam of xfs_bmap_first_unused, so I'll fix
that too.

--D

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 19/26] xfs: don't bother storing merkle tree blocks for zeroed data blocks
  2024-05-02  0:01         ` Eric Biggers
@ 2024-05-08 20:26           ` Darrick J. Wong
  0 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-05-08 20:26 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Christoph Hellwig, aalbersh, linux-xfs, alexl, walters, fsverity,
	linux-fsdevel

On Thu, May 02, 2024 at 12:01:32AM +0000, Eric Biggers wrote:
> On Wed, May 01, 2024 at 03:47:36PM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 30, 2024 at 11:47:23PM -0700, Christoph Hellwig wrote:
> > > On Mon, Apr 29, 2024 at 08:29:03PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > Now that fsverity tells our merkle tree io functions about what a hash
> > > > of a data block full of zeroes looks like, we can use this information
> > > > to avoid writing out merkle tree blocks for sparse regions of the file.
> > > > For verified gold master images this can save quite a bit of overhead.
> > > 
> > > Is this something that fsverity should be doing in a generic way?
> > 
> > I don't think it's all that useful for ext4/f2fs because they always
> > write out full merkle tree blocks even if it's the zerohash over and
> > over again.  Old kernels aren't going to know how to deal with that.
> > 
> > > It feels odd to have XFS behave different from everyone else here,
> > > even if this does feel useful.  Do we also need any hash validation
> > > that no one tampered with the metadata and added a new extent, or
> > > is this out of scope for fsverity?
> > 
> > If they wrote a new extent with nonzero contents, then the validation
> > will fail, right?
> > 
> > If they added a new unwritten extent (or a written one full of zeroes),
> > then the file data hasn't changed and validation would still pass,
> > correct?
> 
> The point of fsverity is to verify that file data is consistent with the
> top-level file digest.  It doesn't really matter which type of extent the data
> came from, or if the data got synthesized somehow (e.g. zeroes synthesized from
> a hole), as long as fsverity still gets invoked to verify the data.  If the data
> itself passes verification, then it's good.  The same applies to Merkle tree
> blocks which are an intermediate step in the verification.

<nod>

> In the Merkle tree, ext4 and f2fs currently just use the same concept of
> sparsity as the file data, i.e. when a block is unmapped, it is filled in with
> all zeroes.  As Darrick noticed, this isn't really the right concept of sparsity
> for the Merkle tree, as a block full of hashes of zeroed blocks should be used,
> not literally a zeroed block.  I think it makes sense to fix this in XFS, as
> it's newly adding fsverity support, and this is a filesystem-level
> implementation detail.  It would be difficult to fix this in ext4 and f2fs since
> it would be an on-disk format upgrade.  (Existing files should not actually have
> any sparse Merkle tree blocks, so we probably could redefine what they mean.
> But even if so, old kernels would not be able to read the new files.)

<nod>
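
A minimal sketch of the distinction Eric describes, in userspace C.  It
assumes the digest of an all-zeroes data block is already known and that
a tree block simply holds block_size / digest_size digests back to back;
the function name is hypothetical and this is not the XFS code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: synthesize the merkle tree block that covers a
 * fully sparse region.  It is filled with copies of the zero-block
 * digest rather than with literal zeroes.  Any tail bytes too short to
 * hold a whole digest are zero padded.  Returns the number of digests
 * written, or 0 if the sizes make no sense.
 */
static size_t fill_zero_hash_block(unsigned char *block, size_t block_size,
				   const unsigned char *zero_digest,
				   size_t digest_size)
{
	size_t n, i;

	if (digest_size == 0 || digest_size > block_size)
		return 0;

	n = block_size / digest_size;
	memset(block, 0, block_size);
	for (i = 0; i < n; i++)
		memcpy(block + i * digest_size, zero_digest, digest_size);
	return n;
}
```

A literally zeroed tree block would only verify if the hash of a zeroed
data block happened to be all zeroes, which is why the two notions of
sparsity differ.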

--D

> - Eric
> 

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 25/26] xfs: make it possible to disable fsverity
  2024-05-02  0:15         ` Eric Biggers
@ 2024-05-08 20:31           ` Darrick J. Wong
  2024-05-09  5:04             ` Christoph Hellwig
  0 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-05-08 20:31 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Christoph Hellwig, aalbersh, linux-xfs, alexl, walters, fsverity,
	linux-fsdevel

On Thu, May 02, 2024 at 12:15:01AM +0000, Eric Biggers wrote:
> On Wed, May 01, 2024 at 03:50:07PM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 30, 2024 at 11:48:29PM -0700, Christoph Hellwig wrote:
> > > On Mon, Apr 29, 2024 at 08:30:37PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > Create an experimental ioctl so that we can turn off fsverity.
> > > 
> > > Didn't Eric argue against this?  And if we're adding this, I think
> > > it should be a generic feature and not just xfs specific.
> > 
> > The tagging is a bit wrong, but it is a generic fsverity ioctl, though
> > ext4/f2fs/btrfs don't have implementations.
> > 
> > <shrug> According to Ted, programs that care about fsverity are supposed
> > to check that VERITY is set in the stat data, but I imagine those
> > programs aren't expecting it to turn off suddenly.  Maybe I should make
> > this CAP_SYS_ADMIN?  Or withdraw it?
> > 
> 
> I'm concerned that fsverity could be disabled after someone has already checked
> for fsverity on a particular file.  Currently users only have to re-check for
> fsverity if they close the file and re-open it (as in that case it might have
> been replaced with a new file with fsverity disabled).
> 
> A similar issue also would exist for the in-kernel users of fsverity such as
> overlayfs and IMA (upstream), and IPE
> (https://lore.kernel.org/linux-security-module/1712969764-31039-1-git-send-email-wufan@linux.microsoft.com/).
> For example, IPE is being proposed to cache some state about fsverity in the LSM
> blob associated with the struct inode.  If fsverity is disabled on an inode,
> that state would get out of sync.  This could allow bypassing the IPE policy.
> 
> CAP_SYS_ADMIN isn't supposed to give a license to bypass all security features
> including LSMs, so using CAP_SYS_ADMIN doesn't seem like a great solution.

Hmm.  What if we did something like what fsdax does to update the file
access methods?  We could clear the ondisk iflag but not the incore one;
set DONTCACHE on the dentry and the inode so that it will get reclaimed
ASAP instead of being put on the lru; and then tell userspace they have
to wait until the inode gets reclaimed and reloaded?

That would solve the problem of cached state (whether the statx flag
or IPE blobs) going stale because the only time we'd change the incore
flag is when there are zero open fds.

--D

> - Eric
> 

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 18/26] xfs: use merkle tree offset as attr hash
  2024-05-08 20:26             ` Darrick J. Wong
@ 2024-05-09  5:02               ` Christoph Hellwig
  2024-05-09 20:02                 ` Darrick J. Wong
  2024-05-09 17:46               ` Eric Biggers
  1 sibling, 1 reply; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-09  5:02 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, aalbersh, ebiggers, linux-xfs, alexl, walters,
	fsverity, linux-fsdevel

On Wed, May 08, 2024 at 01:26:03PM -0700, Darrick J. Wong wrote:
> I guess we could make it really obvious by allocating a range in the
> mapping starting at MAX_FILEOFF and going downwards.  Chances are pretty
> good that with the xattr info growing upwards they're never going to
> meet.

Yes, although I'd avoid taking chances.  More below.

> > Or we decide the space above 2^32 blocks can't be used by attrs,
> > and only by other users with other means of discovery.  Say the
> > verity hashes..
> 
> Well right now they can't be used by attrs because xfs_dablk_t isn't big
> enough to fit a larger value.

Yes.

> The dangerous part here is that the code
> silently truncates the outparam of xfs_bmap_first_unused, so I'll fix
> that too.

Well, we should check for that in xfs_attr_rmt_find_hole /
xfs_da_grow_inode_int, totally independent of the fsverity work.
The condition is basically impossible to hit right now, but I'd rather
make sure we do have a solid check.  I'll prepare a patch for it.


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 25/26] xfs: make it possible to disable fsverity
  2024-05-08 20:31           ` Darrick J. Wong
@ 2024-05-09  5:04             ` Christoph Hellwig
  2024-05-09 14:45               ` Darrick J. Wong
  0 siblings, 1 reply; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-09  5:04 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Eric Biggers, Christoph Hellwig, aalbersh, linux-xfs, alexl,
	walters, fsverity, linux-fsdevel

On Wed, May 08, 2024 at 01:31:48PM -0700, Darrick J. Wong wrote:
> Hmm.  What if we did something like what fsdax does to update the file
> access methods?  We could clear the ondisk iflag but not the incore one;
> set DONTCACHE on the dentry and the inode so that it will get reclaimed
> ASAP instead of being put on the lru; and then tell userspace they have
> to wait until the inode gets reclaimed and reloaded?

Yikes.  That's a complete mess I'd rather get rid of than add more of
it.

What is the use case of disabling fsverity to start with vs just
removing a fsverity enabled file after copying the content out?


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 25/26] xfs: make it possible to disable fsverity
  2024-05-09  5:04             ` Christoph Hellwig
@ 2024-05-09 14:45               ` Darrick J. Wong
  2024-05-09 15:06                 ` Christoph Hellwig
  0 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-05-09 14:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Eric Biggers, aalbersh, linux-xfs, alexl, walters, fsverity,
	linux-fsdevel

On Wed, May 08, 2024 at 10:04:05PM -0700, Christoph Hellwig wrote:
> On Wed, May 08, 2024 at 01:31:48PM -0700, Darrick J. Wong wrote:
> > Hmm.  What if we did something like what fsdax does to update the file
> > access methods?  We could clear the ondisk iflag but not the incore one;
> > set DONTCACHE on the dentry and the inode so that it will get reclaimed
> > ASAP instead of being put on the lru; and then tell userspace they have
> > to wait until the inode gets reclaimed and reloaded?
> 
> Yikes.  That's a complete mess I'd rather get rid of than add more of
> it.
> 
> What is the use case of disabling fsverity to start with vs just
> removing a fsverity enabled file after copying the content out?

How do you salvage the content of a fsverity file if the merkle tree
hashes don't match the data?  I'm thinking about the backup disk usecase
where you enable fsverity to detect bitrot in your video files but
they'd otherwise be mostly playable if it weren't for the EIO.

I guess you could always ddrescue the file, right?

--D

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 25/26] xfs: make it possible to disable fsverity
  2024-05-09 14:45               ` Darrick J. Wong
@ 2024-05-09 15:06                 ` Christoph Hellwig
  2024-05-09 15:09                   ` Darrick J. Wong
  0 siblings, 1 reply; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-09 15:06 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Eric Biggers, aalbersh, linux-xfs, alexl,
	walters, fsverity, linux-fsdevel

On Thu, May 09, 2024 at 07:45:42AM -0700, Darrick J. Wong wrote:
> How do you salvage the content of a fsverity file if the merkle tree
> hashes don't match the data?  I'm thinking about the backup disk usecase
> where you enable fsverity to detect bitrot in your video files but
> they'd otherwise be mostly playable if it weren't for the EIO.

Why would you enable fsverity for DVD images on your backup files?


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 25/26] xfs: make it possible to disable fsverity
  2024-05-09 15:06                 ` Christoph Hellwig
@ 2024-05-09 15:09                   ` Darrick J. Wong
  2024-05-09 15:13                     ` Christoph Hellwig
  0 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-05-09 15:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Eric Biggers, aalbersh, linux-xfs, alexl, walters, fsverity,
	linux-fsdevel

On Thu, May 09, 2024 at 08:06:46AM -0700, Christoph Hellwig wrote:
> On Thu, May 09, 2024 at 07:45:42AM -0700, Darrick J. Wong wrote:
> > How do you salvage the content of a fsverity file if the merkle tree
> > hashes don't match the data?  I'm thinking about the backup disk usecase
> > where you enable fsverity to detect bitrot in your video files but
> > they'd otherwise be mostly playable if it weren't for the EIO.
> 
> Why would you enable fsverity for DVD images on your backup files?

xfs doesn't do data block checksums.

I already have a dumb python program that basically duplicates fsverity
style merkle trees but I was looking forward to sunsetting it... :P

--D

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 25/26] xfs: make it possible to disable fsverity
  2024-05-09 15:09                   ` Darrick J. Wong
@ 2024-05-09 15:13                     ` Christoph Hellwig
  2024-05-09 15:43                       ` Darrick J. Wong
  0 siblings, 1 reply; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-09 15:13 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Eric Biggers, aalbersh, linux-xfs, alexl,
	walters, fsverity, linux-fsdevel

On Thu, May 09, 2024 at 08:09:55AM -0700, Darrick J. Wong wrote:
> xfs doesn't do data block checksums.
> 
> I already have a dumb python program that basically duplicates fsverity
> style merkle trees but I was looking forward to sunsetting it... :P

Well, fsverity as-is is intended for use cases where you care about
integrity of the file.  For that disabling it really doesn't make
sense.  If we have other use cases we can probably add a variant
of fsverity that clearly deals with non-integrity checksums.
Although just disabling them if they mismatch still feels like a
somewhat odd usage model.

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 25/26] xfs: make it possible to disable fsverity
  2024-05-09 15:13                     ` Christoph Hellwig
@ 2024-05-09 15:43                       ` Darrick J. Wong
  2024-05-17 19:36                         ` Theodore Ts'o
  0 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-05-09 15:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Eric Biggers, aalbersh, linux-xfs, alexl, walters, fsverity,
	linux-fsdevel

On Thu, May 09, 2024 at 08:13:48AM -0700, Christoph Hellwig wrote:
> On Thu, May 09, 2024 at 08:09:55AM -0700, Darrick J. Wong wrote:
> > xfs doesn't do data block checksums.
> > 
> > I already have a dumb python program that basically duplicates fsverity
> > style merkle trees but I was looking forward to sunsetting it... :P
> 
> Well, fsverity as-is is intended for use cases where you care about
> integrity of the file.  For that disabling it really doesn't make
> sense.  If we have other use cases we can probably add a variant
> of fsverity that clearly deals with non-integrity checksums.
> Although just disabling them if they mismatch still feels like a
> somewhat odd usage model.

Yeah, it definitely exists in the same weird grey area of turning off
metadata checksum validation to extract as many files from a busted fs
as can be done.

--D

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 18/26] xfs: use merkle tree offset as attr hash
  2024-05-08 20:26             ` Darrick J. Wong
  2024-05-09  5:02               ` Christoph Hellwig
@ 2024-05-09 17:46               ` Eric Biggers
  2024-05-09 18:04                 ` Darrick J. Wong
  1 sibling, 1 reply; 159+ messages in thread
From: Eric Biggers @ 2024-05-09 17:46 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, aalbersh, linux-xfs, alexl, walters, fsverity,
	linux-fsdevel

On Wed, May 08, 2024 at 01:26:03PM -0700, Darrick J. Wong wrote:
> > If we did that would be yet another indicator that they aren't attrs
> > but something else.  But maybe I should stop banging that drum and
> > agree that everything is a nail if all you got is a hammer.. :)
> 
> Hammer?  All I've got is a big block of cheese. :P
> 
> FWIW the fsverity code seems to cut us off at U32_MAX bytes of merkle
> data so that's going to be the limit until they rev the ondisk format.
> 

Where does that happen?

- Eric

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 18/26] xfs: use merkle tree offset as attr hash
  2024-05-09 17:46               ` Eric Biggers
@ 2024-05-09 18:04                 ` Darrick J. Wong
  2024-05-09 18:36                   ` Eric Biggers
  0 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-05-09 18:04 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Christoph Hellwig, aalbersh, linux-xfs, alexl, walters, fsverity,
	linux-fsdevel

On Thu, May 09, 2024 at 10:46:52AM -0700, Eric Biggers wrote:
> On Wed, May 08, 2024 at 01:26:03PM -0700, Darrick J. Wong wrote:
> > > If we did that would be yet another indicator that they aren't attrs
> > > but something else.  But maybe I should stop banging that drum and
> > > agree that everything is a nail if all you got is a hammer.. :)
> > 
> > Hammer?  All I've got is a big block of cheese. :P
> > 
> > FWIW the fsverity code seems to cut us off at U32_MAX bytes of merkle
> > data so that's going to be the limit until they rev the ondisk format.
> > 
> 
> Where does that happen?

fsverity_init_merkle_tree_params has the following:

	/*
	 * With block_size != PAGE_SIZE, an in-memory bitmap will need to be
	 * allocated to track the "verified" status of hash blocks.  Don't allow
	 * this bitmap to get too large.  For now, limit it to 1 MiB, which
	 * limits the file size to about 4.4 TB with SHA-256 and 4K blocks.
	 *
	 * Together with the fact that the data, and thus also the Merkle tree,
	 * cannot have more than ULONG_MAX pages, this implies that hash block
	 * indices can always fit in an 'unsigned long'.  But to be safe, we
	 * explicitly check for that too.  Note, this is only for hash block
	 * indices; data block indices might not fit in an 'unsigned long'.
	 */
	if ((params->block_size != PAGE_SIZE && offset > 1 << 23) ||
	    offset > ULONG_MAX) {
		fsverity_err(inode, "Too many blocks in Merkle tree");
		err = -EFBIG;
		goto out_err;
	}

Hmm.  I didn't read this correctly -- the comment says ULONG_MAX pages,
not bytes.  I got confused by the units of @offset, because "u64"
doesn't really help me distinguish bytes, blocks, or pages. :(

OTOH looking at how @offset is computed, it seems to be the total number
of blocks in the merkle tree by the time we get here?

So I guess we actually /can/ create a very large (e.g. 2^33 blocks)
merkle tree on a 64-bit machine, which could then return -EFBIG on
32-bit?
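
For illustration only, the arithmetic behind that check can be sketched
in Python.  check_tree_blocks() is a made-up helper that paraphrases the
condition quoted above; it is not fs/verity code.  The file size limit is
approximated by treating the tree as a geometric series in which the leaf
level holds about 127/128 of all hash blocks.

```python
# Sketch only: check_tree_blocks() is a hypothetical helper mirroring the
# quoted condition; it is not part of fs/verity.
HASH_SIZE = 32                  # SHA-256 digest size in bytes
BLOCK_SIZE = 4096               # Merkle tree block size in bytes
BITMAP_BITS = (1 << 20) * 8     # 1 MiB bitmap, one bit per hash block

def check_tree_blocks(offset, block_size, page_size, ulong_max):
    """Return True if a tree of @offset blocks passes the quoted check."""
    if block_size != page_size and offset > (1 << 23):
        return False
    return offset <= ulong_max

hashes_per_block = BLOCK_SIZE // HASH_SIZE      # 128 hashes per block
# Each level is 128x smaller than the one below it, so the leaf level
# accounts for roughly 127/128 of the whole tree.
leaf_blocks = BITMAP_BITS * (hashes_per_block - 1) // hashes_per_block
max_file_bytes = leaf_blocks * hashes_per_block * BLOCK_SIZE

print(f"bitmap-limited file size: ~{max_file_bytes / 1e12:.2f} TB")

# A 2^33-block tree with block_size == PAGE_SIZE:
big = 1 << 33
print(check_tree_blocks(big, 4096, 4096, (1 << 64) - 1))  # 64-bit ULONG_MAX
print(check_tree_blocks(big, 4096, 4096, (1 << 32) - 1))  # 32-bit ULONG_MAX
```

Under these assumptions the bitmap limit works out to a little under
2^42 bytes, in line with the "about 4.4 TB" in the comment, and the same
2^33-block tree passes with a 64-bit ULONG_MAX but fails with a 32-bit
one.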

My dumb btree geometry calculator seems to think that an 8EiB file with
a sha256 hash in 4k blocks would generate a 69,260,574,978MB merkle
tree, or roughly a 2^44 block merkle tree?

Ok I guess xfs fsverity really does need a substantial amount of attr
fork space then. :)

--D

> - Eric
> 

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 18/26] xfs: use merkle tree offset as attr hash
  2024-05-09 18:04                 ` Darrick J. Wong
@ 2024-05-09 18:36                   ` Eric Biggers
  0 siblings, 0 replies; 159+ messages in thread
From: Eric Biggers @ 2024-05-09 18:36 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, aalbersh, linux-xfs, alexl, walters, fsverity,
	linux-fsdevel

On Thu, May 09, 2024 at 11:04:27AM -0700, Darrick J. Wong wrote:
> On Thu, May 09, 2024 at 10:46:52AM -0700, Eric Biggers wrote:
> > On Wed, May 08, 2024 at 01:26:03PM -0700, Darrick J. Wong wrote:
> > > > If we did that would be yet another indicator that they aren't attrs
> > > > but something else.  But maybe I should stop banging that drum and
> > > > agree that everything is a nail if all you got is a hammer.. :)
> > > 
> > > Hammer?  All I've got is a big block of cheese. :P
> > > 
> > > FWIW the fsverity code seems to cut us off at U32_MAX bytes of merkle
> > > data so that's going to be the limit until they rev the ondisk format.
> > > 
> > 
> > Where does that happen?
> 
> fsverity_init_merkle_tree_params has the following:
> 
> 	/*
> 	 * With block_size != PAGE_SIZE, an in-memory bitmap will need to be
> 	 * allocated to track the "verified" status of hash blocks.  Don't allow
> 	 * this bitmap to get too large.  For now, limit it to 1 MiB, which
> 	 * limits the file size to about 4.4 TB with SHA-256 and 4K blocks.
> 	 *
> 	 * Together with the fact that the data, and thus also the Merkle tree,
> 	 * cannot have more than ULONG_MAX pages, this implies that hash block
> 	 * indices can always fit in an 'unsigned long'.  But to be safe, we
> 	 * explicitly check for that too.  Note, this is only for hash block
> 	 * indices; data block indices might not fit in an 'unsigned long'.
> 	 */
> 	if ((params->block_size != PAGE_SIZE && offset > 1 << 23) ||
> 	    offset > ULONG_MAX) {
> 		fsverity_err(inode, "Too many blocks in Merkle tree");
> 		err = -EFBIG;
> 		goto out_err;
> 	}
> 
> Hmm.  I didn't read this correctly -- the comment says ULONG_MAX pages,
> not bytes.  I got confused by the units of @offset, because "u64"
> doesn't really help me distinguish bytes, blocks, or pages. :(
> 
> OTOH looking at how @offset is computed, it seems to be the total number
> of blocks in the merkle tree by the time we get here?

Yes, it's blocks here.

> So I guess we actually /can/ create a very large (e.g. 2^33 blocks)
> merkle tree on a 64-bit machine, which could then return -EFBIG on
> 32-bit?

Sure, but the page cache is indexed with unsigned long, and there are more data
pages than Merkle tree blocks, so that becomes a problem first.  That's why
fs/verity/ uses unsigned long for Merkle tree block indices.

> My dumb btree geometry calculator seems to think that an 8EiB file with
> a sha256 hash in 4k blocks would generate a 69,260,574,978MB merkle
> tree, or roughly a 2^44 block merkle tree?
> 
> Ok I guess xfs fsverity really does need a substantial amount of attr
> fork space then. :)

- Eric

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 18/26] xfs: use merkle tree offset as attr hash
  2024-05-09  5:02               ` Christoph Hellwig
@ 2024-05-09 20:02                 ` Darrick J. Wong
  2024-05-10  5:08                   ` Christoph Hellwig
  0 siblings, 1 reply; 159+ messages in thread
From: Darrick J. Wong @ 2024-05-09 20:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

On Wed, May 08, 2024 at 10:02:52PM -0700, Christoph Hellwig wrote:
> On Wed, May 08, 2024 at 01:26:03PM -0700, Darrick J. Wong wrote:
> > I guess we could make it really obvious by allocating range in the
> > mapping starting at MAX_FILEOFF and going downwards.  Chances are pretty
> > good that with the xattr info growing upwards they're never going to
> > meet.
> 
> Yes, although I'd avoid taking chances.  More below.
> 
> > > Or we decide the space above 2^32 blocks can't be used by attrs,
> > > and only by other users with other means of discover.  Say the
> > > verify hashes..
> > 
> > Well right now they can't be used by attrs because xfs_dablk_t isn't big
> > enough to fit a larger value.
> 
> Yes.

Thinking about this further, I think building the merkle tree becomes a
lot more difficult than the current design.  At first I thought of
reserving a static partition in the attr fork address range, but got
bogged down in figuring out how big the static partition has to be.

Downthread I realized that the maximum size of a merkle tree is actually
ULONG_MAX blocks, which means that on a 64-bit machine there effectively
is no limit.

For an 8EB file with sha256 hashes (32 bytes) and 4k blocks, we need
2^(63-12) hashes.  Assuming maximum loading factor of 128 hashes per
block, btheight spits out:

# xfs_db /dev/sdf -c 'btheight -b 4096 merkle_sha256 -n '$(( 2 ** (63 - 12) ))
merkle_sha256: best case per 4096-byte block: 128 records (leaf) / 128 keyptrs (node)
level 0: 2251799813685248 records, 17592186044416 blocks
level 1: 17592186044416 records, 137438953472 blocks
level 2: 137438953472 records, 1073741824 blocks
level 3: 1073741824 records, 8388608 blocks
level 4: 8388608 records, 65536 blocks
level 5: 65536 records, 512 blocks
level 6: 512 records, 4 blocks
level 7: 4 records, 1 block
8 levels, 17730707194373 blocks total

That's about 2^45 blocks.  If the hash is sha512, double those figures.
For a 1k block size, we get:

# xfs_db /dev/sdf -c 'btheight -b 1024 merkle_sha256 -n '$(( 2 ** (63 - 10) ))
merkle_sha256: best case per 1024-byte block: 32 records (leaf) / 32 keyptrs (node)
level 0: 9007199254740992 records, 281474976710656 blocks
level 1: 281474976710656 records, 8796093022208 blocks
level 2: 8796093022208 records, 274877906944 blocks
level 3: 274877906944 records, 8589934592 blocks
level 4: 8589934592 records, 268435456 blocks
level 5: 268435456 records, 8388608 blocks
level 6: 8388608 records, 262144 blocks
level 7: 262144 records, 8192 blocks
level 8: 8192 records, 256 blocks
level 9: 256 records, 8 blocks
level 10: 8 records, 1 block
11 levels, 290554814669065 blocks total

That would be 2^49 blocks but mercifully fsverity doesn't allow more
than 8 levels of tree.
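
The btheight figures above are just repeated ceiling division, so they
are easy to cross-check.  A minimal Python sketch, assuming every tree
block is completely full; merkle_geometry() is an illustrative name and
not an xfs_db or fs/verity function.

```python
def merkle_geometry(data_blocks, block_size, hash_size):
    """Return per-level block counts, leaves first, for a full Merkle tree."""
    hashes_per_block = block_size // hash_size
    levels = []
    records = data_blocks
    while records > 1:
        blocks = -(-records // hashes_per_block)    # ceiling division
        levels.append(blocks)
        records = blocks
    return levels

for label, size_log2, block_size, hash_size in (
    ("8EiB, sha256, 4k blocks", 63, 4096, 32),
    ("8EiB, sha256, 1k blocks", 63, 1024, 32),
    ("1TiB, sha512, 4k blocks", 40, 4096, 64),
):
    data_blocks = (1 << size_log2) // block_size
    levels = merkle_geometry(data_blocks, block_size, hash_size)
    total = sum(levels)
    print(f"{label}: {len(levels)} levels, {total} blocks total, "
          f"needs {total.bit_length()} bits of block address")
```

The totals agree with the btheight output in this message.  The two
8EiB cases need 45 and 49 bits of block address respectively, which is
the sense in which the trees are "about 2^45" and "2^49" blocks.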

So I don't think it's a good idea to create a hardcoded partition in the
attr fork for merkle tree data, since it would have to be absurdly large
for the common case of sub-1T files:

# xfs_db /dev/sdf -c 'btheight -b 4096 merkle_sha512 -n '$(( 2 ** (40 - 12) ))
merkle_sha512: best case per 4096-byte block: 64 records (leaf) / 64 keyptrs (node)
level 0: 268435456 records, 4194304 blocks
level 1: 4194304 records, 65536 blocks
level 2: 65536 records, 1024 blocks
level 3: 1024 records, 16 blocks
level 4: 16 records, 1 block
5 levels, 4260881 blocks total

That led me to the idea of dynamic partitioning, where we find a sparse
part of the attr fork fileoff range and use that.  That burns a lot less
address range but means that we cannot elide merkle tree blocks that
contain entirely hash(zeroes) because elided blocks become sparse holes
in the attr fork, and xfs_bmap_first_unused can still find those holes.
I guess you could mark those blocks as unwritten, but that wastes space.

Setting even /that/ aside, how would we allocate/map the range?  I think
we'd want a second ATTR_VERITY attr to claim ownership of whatever attr
fork range we found.  xfs_fsverity_write_merkle would have to do
something like this, pretending that the merkle tree blocksize matches
the fs blocksize:

	offset = /* merkle tree pos to attr fork xfs_fileoff_t */
	xfs_trans_alloc()

	xfs_bmapi_write(..., offset, 1...);

	xfs_trans_buf_get()
	/* copy merkle tree contents */
	xfs_trans_log_buf()

	/* update verity extent attr value */
	xfs_attr_defer_add("verity.merkle",
			{fileoff: /* start of range */,
			 blockcount: /* blocks mapped so far */
			});

	xfs_trans_commit()

Note that xfs_fsverity_begin_enable would have to ensure that there's an
attr fork and that it's not in local format.  On the plus side, doing
all this transactionally means we have a second user of logged xattr
updates. :P

Online repair would need to grow new code to copy the merkle tree.

Tearing down the merkle tree (i.e. if tree setup fails and someone wants
to try again) would use the "verity.merkle" attr to figure out which
blocks to clobber.

Caching at least is pretty easy: look up the "verity.merkle" attribute
to find the fileoff, compute the fileoff of the particular block we want
in the attr fork, xfs_buf_read the buffer, and toss the contents in the
incore cache that we have now.
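
As a rough illustration of that lookup (not code from the patchset), the
offset computation might look like the Python sketch below.  MerkleExtent
and merkle_pos_to_fileoff() are hypothetical names, the attr value layout
is assumed to be just the {fileoff, blockcount} pair from the pseudocode
above, and the merkle tree block size is assumed to match the fs block
size.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MerkleExtent:
    """Hypothetical decoded value of the "verity.merkle" attr."""
    fileoff: int       # first attr fork block of the merkle tree range
    blockcount: int    # number of blocks mapped so far

def merkle_pos_to_fileoff(ext, pos, blocksize=4096):
    """Map a byte position in the merkle tree to an attr fork block."""
    blk = pos // blocksize
    if pos < 0 or blk >= ext.blockcount:
        raise ValueError("merkle tree position outside the mapped range")
    return ext.fileoff + blk

ext = MerkleExtent(fileoff=1 << 20, blockcount=4260881)
print(merkle_pos_to_fileoff(ext, 0))            # first tree block
print(merkle_pos_to_fileoff(ext, 3 * 4096 + 17))
```

The range check matters because a position past blockcount would point
at attr fork blocks that the verity extent does not own.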

<shrug> What do you think of that?

--D

> > The dangerous part here is that the code
> > silently truncates the outparam of xfs_bmap_first_unused, so I'll fix
> > that too.
> 
> Well, we should check for that in xfs_attr_rmt_find_hole /
> xfs_da_grow_inode_int, totally independent of the fsverity work.
> The condition is basically impossible to hit right now, but I'd rather
> make sure we do have a solid check.  I'll prepare a patch for it.
> 
> 

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 18/26] xfs: use merkle tree offset as attr hash
  2024-05-09 20:02                 ` Darrick J. Wong
@ 2024-05-10  5:08                   ` Christoph Hellwig
  2024-05-10  6:20                     ` Christoph Hellwig
  0 siblings, 1 reply; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-10  5:08 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, aalbersh, ebiggers, linux-xfs, alexl, walters,
	fsverity, linux-fsdevel

On Thu, May 09, 2024 at 01:02:50PM -0700, Darrick J. Wong wrote:
> Thinking about this further, I think building the merkle tree becomes a
> lot more difficult than the current design.  At first I thought of
> reserving a static partition in the attr fork address range, but got
> bogged down in figuring out how big the static partition has to be.
> 
> Downthread I realized that the maximum size of a merkle tree is actually
> ULONG_MAX blocks, which means that on a 64-bit machine there effectively
> is no limit.

Do we care about using up the limit?  Remember that in ext4/f2fs
the merkle tree is stored in what XFS calls the data fork,
so the file data plus the merkle tree have to fit into the size
limit, be that S64_MAX or a lower limit imposed by the page cache.
And besides being the limit imposed by the current most common
implementation (I haven't checked btrfs as the only other one),
that does seem like a pretty reasonable one.

> That led me to the idea of dynamic partitioning, where we find a sparse
> part of the attr fork fileoff range and use that.  That burns a lot less
> address range but means that we cannot elide merkle tree blocks that
> contain entirely hash(zeroes) because elided blocks become sparse holes
> in the attr fork, and xfs_bmap_first_unused can still find those holes.

xfs_bmap_first_unused currently finds them.  It should not, as its
callers are limited to 32-bit addressing.  I'll send a patch to make
that clear.

> Setting even /that/ aside, how would we allocate/map the range?

IFF we stick to a static range (which I think still makes sense),
that range would be statically reserved and should exist if the
VERITY bit is set on the inode, and the size is calculated from
the file size.  If not we'd indeed need to record the mapping
somewhere, and an attr would be the right place.  It still feels
like going down a rabbit hole for no obvious benefit to me.


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 18/26] xfs: use merkle tree offset as attr hash
  2024-05-10  5:08                   ` Christoph Hellwig
@ 2024-05-10  6:20                     ` Christoph Hellwig
  2024-05-17 17:17                       ` Darrick J. Wong
  0 siblings, 1 reply; 159+ messages in thread
From: Christoph Hellwig @ 2024-05-10  6:20 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, aalbersh, ebiggers, linux-xfs, alexl, walters,
	fsverity, linux-fsdevel

FYI, I spent some time looking over the core verity and ext4 code,
and I can't find anything enforcing any kind of size limit.  Of course
testing that is kinda hard without taking sparseness into account.

Eric, should fsverity or the fs backend check for a max size instead
of trying to build the merkle tree and eventually failing to write it
out?

An interesting note I found in the ext4 code is:

  Note that the verity metadata *must* be encrypted when the file is,
  since it contains hashes of the plaintext data.

While xfs doesn't currently support fscrypt it would actually be a very
useful feature, so we're locking ourselves into encrypting attrs or at
least magic attr fork data if we do our own non-standard fsverity storage.
I'm getting less and less happy with not just doing the normal post
i_size storage.  Yes, it's not pretty (neither is the whole fsverity idea
of shoehorning the hashes into file systems not built for it), but it
avoids adding tons of code and being very different.

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCHSET v5.6] fstests: fs-verity support for XFS
  2024-04-30  3:19 ` [PATCHSET v5.6] fstests: fs-verity support for XFS Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-04-30  3:42   ` [PATCH 6/6] common/populate: add verity files to populate xfs images Darrick J. Wong
@ 2024-05-11  5:01   ` Zorro Lang
  2024-05-17 15:56     ` Darrick J. Wong
  6 siblings, 1 reply; 159+ messages in thread
From: Zorro Lang @ 2024-05-11  5:01 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: fsverity, linux-fsdevel, linux-xfs, fstests

Hi Darrick,

Since only half of this patchset has been reviewed, I'd like to wait for
your later version.  I won't pick up part of this patchset to merge this
time; I think it's better to merge it as an integrated patchset.

Thanks,
Zorro

On Mon, Apr 29, 2024 at 08:19:24PM -0700, Darrick J. Wong wrote:
> Hi all,
> 
> This patchset adds support for fsverity to XFS.  In keeping with
> Andrey's original design, XFS stores all fsverity metadata in the
> extended attribute data.  However, I've made a few changes to the code:
> First, it now caches merkle tree blocks directly instead of abusing the
> buffer cache.  This reduces lookup overhead quite a bit, at a cost of
> needing a new shrinker for cached merkle tree blocks.
> 
> To reduce the ondisk footprint further, I also made the verity
> enablement code detect trailing zeroes whenever fsverity tells us to
> write a buffer, and elide storing the zeroes.  To further reduce the
> footprint of sparse files, I also skip writing merkle tree blocks if the
> block contents are entirely hashes of zeroes.
> 
> Next, I implemented more of the tooling around verity, such as debugger
> support, as much fsck support as I can manage without knowing the
> internal format of the fsverity information; and added support for
> xfs_scrub to read fsverity files to validate the consistency of the data
> against the merkle tree.
> 
> Finally, I add the ability for administrators to turn off fsverity,
> which might help recovering damaged data from an inconsistent file.
> 
> From Andrey Albershteyn:
> 
> Here's v5 of my patchset of adding fs-verity support to XFS.
> 
> This implementation uses extended attributes to store fs-verity
> metadata. The Merkle tree blocks are stored in the remote extended
> attributes. The names are offsets into the tree.
> 
> From Darrick J. Wong:
> 
> This v5.3 patchset builds upon v5.2 of Andrey's patchset to implement
> fsverity for XFS.
> 
> The biggest thing that I didn't like in the v5 patchset is the abuse of
> the data device's buffer cache to store the incore version of the merkle
> tree blocks.  Not only do verity state flags end up in xfs_buf, but the
> double-alloc flag wastes memory and doesn't remain internally consistent
> if the xattrs shift around.
> 
> I replaced all of that with a per-inode xarray that indexes incore
> merkle tree blocks.  For cache hits, this dramatically reduces the
> amount of work that xfs has to do to feed fsverity.  The per-block
> overhead is much lower (8 bytes instead of ~300 for xfs_bufs), and we no
> longer have to entertain layering violations in the buffer cache.  I
> also added a per-filesystem shrinker so that reclaim can cull cached
> merkle tree blocks, starting with the leaf tree nodes.
> 
> I've also rolled in some changes recommended by the fsverity maintainer,
> fixed some organization and naming problems in the xfs code, fixed a
> collision in the xfs_inode iflags, and improved dead merkle tree cleanup
> per the discussion of the v5 series.  At this point I'm happy enough
> with this code to start integrating and testing it in my trees, so it's
> time to send out a coherent patchset for comments.
> 
> For v5.3, I've added bits and pieces of online and offline repair
> support, reduced the size of partially filled merkle tree blocks by
> removing trailing zeroes, changed the xattr hash function to better
> avoid collisions between merkle tree keys, made the fsverity
> invalidation bitmap unnecessary, and made it so that we can save space
> on sparse verity files by not storing merkle tree blocks that hash
> totally zeroed data blocks.
> 
> If you're going to start using this code, I strongly recommend pulling
> from my git trees, which are linked below.
> 
> This has been running on the djcloud for months with no problems.  Enjoy!
> Comments and questions are, as always, welcome.
> 
> --D
> 
> kernel git tree:
> https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=fsverity
> 
> xfsprogs git tree:
> https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=fsverity
> 
> fstests git tree:
> https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fsverity
> ---
> Commits in this patchset:
>  * common/verity: enable fsverity for XFS
>  * xfs/{021,122}: adapt to fsverity xattrs
>  * xfs/122: adapt to fsverity
>  * xfs: test xfs_scrub detection and correction of corrupt fsverity metadata
>  * xfs: test disabling fsverity
>  * common/populate: add verity files to populate xfs images
> ---
>  common/populate    |   24 +++++++++
>  common/verity      |   39 ++++++++++++++-
>  tests/xfs/021      |    3 +
>  tests/xfs/122.out  |    3 +
>  tests/xfs/1880     |  135 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/xfs/1880.out |   37 ++++++++++++++
>  tests/xfs/1881     |  111 +++++++++++++++++++++++++++++++++++++++++++
>  tests/xfs/1881.out |   28 +++++++++++
>  8 files changed, 378 insertions(+), 2 deletions(-)
>  create mode 100755 tests/xfs/1880
>  create mode 100644 tests/xfs/1880.out
>  create mode 100755 tests/xfs/1881
>  create mode 100644 tests/xfs/1881.out
> 


^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 04/18] fsverity: support block-based Merkle tree caching
  2024-05-02  4:42         ` Christoph Hellwig
@ 2024-05-15  2:16           ` Eric Biggers
  0 siblings, 0 replies; 159+ messages in thread
From: Eric Biggers @ 2024-05-15  2:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Darrick J. Wong, aalbersh, linux-xfs, alexl, walters, fsverity,
	linux-fsdevel

On Wed, May 01, 2024 at 09:42:07PM -0700, Christoph Hellwig wrote:
> On Wed, May 01, 2024 at 03:35:19PM -0700, Darrick J. Wong wrote:
> > Got a link?  This is the first I've heard of this, but TBH I've been
> > ignoring a /lot/ of things trying to get online repair merged (thank
> > you!) over the past months...
> 
> This was long before I got involved with repair :)
> 
> Below is what I found in my local tree.  It doesn't have a proper commit
> log, so I probably only sent it out as a RFC in reply to a patch series
> posting, most likely untested:
> 
> commit c11dcbe101a240c7a9e9bae7efaff2779d88b292
> Author: Christoph Hellwig <hch@lst.de>
> Date:   Mon Oct 16 14:14:11 2023 +0200
> 
>     fsverity block interface

That RFC patch doesn't take into account the bitmap, but the overall idea does
seem to work.  I've had a go at the block-based Merkle tree caching support at
https://lore.kernel.org/fsverity/20240515015320.323443-1-ebiggers@kernel.org.
Let me know what you think.

(The one thing I'm not a huge fan of is the indirect call on the drop path.
Previously, it wasn't necessary for filesystems using page based caching.  This
hopefully is a minor point, but I'm not sure, since unfortunately indirect calls
are atrociously expensive these days -- especially on x86.  Having the single
read_block / drop_block interface does seem like the right solution, though.  We
could always optimize the pagecache-based drop to a direct call later, while
conceptually still having it be an implementation of the same interface.)

- Eric

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCHSET v5.6] fstests: fs-verity support for XFS
  2024-05-11  5:01   ` [PATCHSET v5.6] fstests: fs-verity support for XFS Zorro Lang
@ 2024-05-17 15:56     ` Darrick J. Wong
  0 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-05-17 15:56 UTC (permalink / raw)
  To: Zorro Lang; +Cc: fsverity, linux-fsdevel, linux-xfs, fstests

On Sat, May 11, 2024 at 01:01:46PM +0800, Zorro Lang wrote:
> Hi Darrick,
> 
> Since only half of this patchset has been reviewed, I'd like to wait for
> your later version.  I won't pick up part of this patchset to merge this
> time; I think it's better to merge it as an integrated patchset.

Christoph and I talked about the future of this patchset at LSF and
there are some file format changes in store, so please hold off on
analyzing this patchset for now.

--D

> Thanks,
> Zorro
> 
> On Mon, Apr 29, 2024 at 08:19:24PM -0700, Darrick J. Wong wrote:
> > Hi all,
> > 
> > This patchset adds support for fsverity to XFS.  In keeping with
> > Andrey's original design, XFS stores all fsverity metadata in the
> > extended attribute data.  However, I've made a few changes to the code:
> > First, it now caches merkle tree blocks directly instead of abusing the
> > buffer cache.  This reduces lookup overhead quite a bit, at a cost of
> > needing a new shrinker for cached merkle tree blocks.
> > 
> > To reduce the ondisk footprint further, I also made the verity
> > enablement code detect trailing zeroes whenever fsverity tells us to
> > write a buffer, and elide storing the zeroes.  To further reduce the
> > footprint of sparse files, I also skip writing merkle tree blocks if the
> > block contents are entirely hashes of zeroes.
> > 
> > Next, I implemented more of the tooling around verity, such as debugger
> > support, as much fsck support as I can manage without knowing the
> > internal format of the fsverity information; and added support for
> > xfs_scrub to read fsverity files to validate the consistency of the data
> > against the merkle tree.
> > 
> > Finally, I add the ability for administrators to turn off fsverity,
> > which might help recovering damaged data from an inconsistent file.
> > 
> > From Andrey Albershteyn:
> > 
> > Here's v5 of my patchset of adding fs-verity support to XFS.
> > 
> > This implementation uses extended attributes to store fs-verity
> > metadata. The Merkle tree blocks are stored in the remote extended
> > attributes. The names are offsets into the tree.
> > 
> > From Darrick J. Wong:
> > 
> > This v5.3 patchset builds upon v5.2 of Andrey's patchset to implement
> > fsverity for XFS.
> > 
> > The biggest thing that I didn't like in the v5 patchset is the abuse of
> > the data device's buffer cache to store the incore version of the merkle
> > tree blocks.  Not only do verity state flags end up in xfs_buf, but the
> > double-alloc flag wastes memory and doesn't remain internally consistent
> > if the xattrs shift around.
> > 
> > I replaced all of that with a per-inode xarray that indexes incore
> > merkle tree blocks.  For cache hits, this dramatically reduces the
> > amount of work that xfs has to do to feed fsverity.  The per-block
> > overhead is much lower (8 bytes instead of ~300 for xfs_bufs), and we no
> > longer have to entertain layering violations in the buffer cache.  I
> > also added a per-filesystem shrinker so that reclaim can cull cached
> > merkle tree blocks, starting with the leaf tree nodes.
> > 
> > I've also rolled in some changes recommended by the fsverity maintainer,
> > fixed some organization and naming problems in the xfs code, fixed a
> > collision in the xfs_inode iflags, and improved dead merkle tree cleanup
> > per the discussion of the v5 series.  At this point I'm happy enough
> > with this code to start integrating and testing it in my trees, so it's
> > time to send out a coherent patchset for comments.
> > 
> > For v5.3, I've added bits and pieces of online and offline repair
> > support, reduced the size of partially filled merkle tree blocks by
> > removing trailing zeroes, changed the xattr hash function to better
> > avoid collisions between merkle tree keys, made the fsverity
> > invalidation bitmap unnecessary, and made it so that we can save space
> > on sparse verity files by not storing merkle tree blocks that hash
> > totally zeroed data blocks.
> > 
> > If you're going to start using this code, I strongly recommend pulling
> > from my git trees, which are linked below.
> > 
> > This has been running on the djcloud for months with no problems.  Enjoy!
> > Comments and questions are, as always, welcome.
> > 
> > --D
> > 
> > kernel git tree:
> > https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=fsverity
> > 
> > xfsprogs git tree:
> > https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=fsverity
> > 
> > fstests git tree:
> > https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fsverity
> > ---
> > Commits in this patchset:
> >  * common/verity: enable fsverity for XFS
> >  * xfs/{021,122}: adapt to fsverity xattrs
> >  * xfs/122: adapt to fsverity
> >  * xfs: test xfs_scrub detection and correction of corrupt fsverity metadata
> >  * xfs: test disabling fsverity
> >  * common/populate: add verity files to populate xfs images
> > ---
> >  common/populate    |   24 +++++++++
> >  common/verity      |   39 ++++++++++++++-
> >  tests/xfs/021      |    3 +
> >  tests/xfs/122.out  |    3 +
> >  tests/xfs/1880     |  135 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  tests/xfs/1880.out |   37 ++++++++++++++
> >  tests/xfs/1881     |  111 +++++++++++++++++++++++++++++++++++++++++++
> >  tests/xfs/1881.out |   28 +++++++++++
> >  8 files changed, 378 insertions(+), 2 deletions(-)
> >  create mode 100755 tests/xfs/1880
> >  create mode 100644 tests/xfs/1880.out
> >  create mode 100755 tests/xfs/1881
> >  create mode 100644 tests/xfs/1881.out
> > 
> 
> 

^ permalink raw reply	[flat|nested] 159+ messages in thread

* Re: [PATCH 18/26] xfs: use merkle tree offset as attr hash
  2024-05-10  6:20                     ` Christoph Hellwig
@ 2024-05-17 17:17                       ` Darrick J. Wong
  0 siblings, 0 replies; 159+ messages in thread
From: Darrick J. Wong @ 2024-05-17 17:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: aalbersh, ebiggers, linux-xfs, alexl, walters, fsverity, linux-fsdevel

On Thu, May 09, 2024 at 11:20:17PM -0700, Christoph Hellwig wrote:
> FYI, I spent some time looking over the core verity and ext4 code,
> and I can't find anything enforcing any kind of size limit.  Of course
> testing that is kinda hard without taking sparseness into account.
> 
> Eric, should fsverity or the fs backend check for a max size instead
> of trying to build the merkle tree and eventually failing to write it
> out?
> 
> An interesting note I found in the ext4 code is:
> 
>   Note that the verity metadata *must* be encrypted when the file is,
>   since it contains hashes of the plaintext data.

Refresh my memory of fscrypt -- does it encrypt directory names, xattr
names, and xattr values too?  Or does it only do that to file data?

> While xfs doesn't currently support fscrypt it would actually be a very
> useful feature, so we're locking ourselves into encrypting attrs or at
> least magic attr fork data if we do our own non-standard fsverity storage.
> I'm getting less and less happy with not just doing the normal post
> i_size storage.  Yes, it's not pretty (neither is the whole fsverity idea
> of shoehorning the hashes into file systems not built for it), but it
> avoids adding tons of code and being very different.

And if we copy the ext4 method of putting the merkle data after eof and
loading it into the pagecache, how much of the generic fs/verity cleanup
patches do we really need?

--D


* Re: [PATCH 25/26] xfs: make it possible to disable fsverity
  2024-05-09 15:43                       ` Darrick J. Wong
@ 2024-05-17 19:36                         ` Theodore Ts'o
  0 siblings, 0 replies; 159+ messages in thread
From: Theodore Ts'o @ 2024-05-17 19:36 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Eric Biggers, aalbersh, linux-xfs, alexl,
	walters, fsverity, linux-fsdevel

On Thu, May 09, 2024 at 08:43:23AM -0700, Darrick J. Wong wrote:
> > Well, fsverity as-is is intended for use cases where you care about
> > integrity of the file.  For that disabling it really doesn't make
> > sense.  If we have other use cases we can probably add a variant
> > of fsverity that clearly deals with non-integrity checksums.
> > Although just disabling them if they mismatch still feels like a
> > somewhat odd usage model.
> 
> Yeah, it definitely exists in the same weird grey area of turning off
> metadata checksum validation to extract as many files from a busted fs
> as can be done.

I've certainly thought about the possibilities of adding a CRC
checksum type.  We do need to explicitly mark this as a
non-cryptographic checksum since it might make a difference for
IMA policies, etc.  This would be useful for detecting problems for
people's video or music archives, for example.

I can imagine situations where it might make sense to allow the file
owner to be able to disable fsverity, whether the checksum and use
case involves cryptographic or non-cryptographic checksums.  Having a
flag in the fsverity header indicating whether dropping fsverity
protection requires elevated privileges or can be done by the file
owner seems to make sense to me.

      	       	    	     - Ted


end of thread, other threads:[~2024-05-17 19:37 UTC | newest]

Thread overview: 159+ messages
2024-04-30  3:11 [PATCHBOMB v5.6] fs-verity support for XFS Darrick J. Wong
2024-04-30  3:18 ` [PATCHSET v5.6 1/2] fs-verity: support merkle tree access by blocks Darrick J. Wong
2024-04-30  3:19   ` [PATCH 01/18] fs: add FS_XFLAG_VERITY for verity files Darrick J. Wong
2024-04-30  3:19   ` [PATCH 02/18] fsverity: pass tree_blocksize to end_enable_verity() Darrick J. Wong
2024-04-30  3:20   ` [PATCH 03/18] fsverity: convert verification to use byte instead of page offsets Darrick J. Wong
2024-05-01  7:33     ` Christoph Hellwig
2024-05-01 22:33       ` Darrick J. Wong
2024-05-02  0:42         ` Eric Biggers
2024-05-08 20:14           ` Darrick J. Wong
2024-04-30  3:20   ` [PATCH 04/18] fsverity: support block-based Merkle tree caching Darrick J. Wong
2024-05-01  7:36     ` Christoph Hellwig
2024-05-01 22:35       ` Darrick J. Wong
2024-05-02  4:42         ` Christoph Hellwig
2024-05-15  2:16           ` Eric Biggers
2024-04-30  3:20   ` [PATCH 05/18] fsverity: pass the merkle tree block level to fsverity_read_merkle_tree_block Darrick J. Wong
2024-04-30  3:20   ` [PATCH 06/18] fsverity: add per-sb workqueue for post read processing Darrick J. Wong
2024-04-30  3:21   ` [PATCH 07/18] fsverity: add tracepoints Darrick J. Wong
2024-04-30  3:21   ` [PATCH 08/18] fsverity: pass the new tree size and block size to ->begin_enable_verity Darrick J. Wong
2024-04-30  3:21   ` [PATCH 09/18] fsverity: expose merkle tree geometry to callers Darrick J. Wong
2024-04-30  3:22   ` [PATCH 10/18] fsverity: box up the write_merkle_tree_block parameters too Darrick J. Wong
2024-04-30  3:22   ` [PATCH 11/18] fsverity: pass the zero-hash value to the implementation Darrick J. Wong
2024-04-30  3:22   ` [PATCH 12/18] fsverity: report validation errors back to the filesystem Darrick J. Wong
2024-04-30  3:22   ` [PATCH 13/18] fsverity: pass super_block to fsverity_enqueue_verify_work Darrick J. Wong
2024-04-30  3:23   ` [PATCH 14/18] ext4: use a per-superblock fsverity workqueue Darrick J. Wong
2024-04-30  3:23   ` [PATCH 15/18] f2fs: " Darrick J. Wong
2024-04-30  3:23   ` [PATCH 16/18] btrfs: " Darrick J. Wong
2024-04-30  3:23   ` [PATCH 17/18] fsverity: remove system-wide workqueue Darrick J. Wong
2024-04-30  3:24   ` [PATCH 18/18] iomap: integrate fs-verity verification into iomap's read path Darrick J. Wong
2024-05-01  7:10     ` Christoph Hellwig
2024-05-01 22:37       ` Darrick J. Wong
2024-04-30  3:18 ` [PATCHSET v5.6 2/2] xfs: fs-verity support Darrick J. Wong
2024-04-30  3:24   ` [PATCH 01/26] xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c Darrick J. Wong
2024-05-01  6:55     ` Christoph Hellwig
2024-05-01 22:39       ` Darrick J. Wong
2024-05-02  4:56         ` Christoph Hellwig
2024-05-02  5:56         ` Chandan Babu R
2024-05-02  6:34           ` Christoph Hellwig
2024-04-30  3:24   ` [PATCH 02/26] xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function Darrick J. Wong
2024-05-01  6:55     ` Christoph Hellwig
2024-04-30  3:24   ` [PATCH 03/26] xfs: create a helper to compute the blockcount of a max sized remote value Darrick J. Wong
2024-05-01  6:56     ` Christoph Hellwig
2024-04-30  3:25   ` [PATCH 04/26] xfs: minor cleanups of xfs_attr3_rmt_blocks Darrick J. Wong
2024-05-01  6:56     ` Christoph Hellwig
2024-04-30  3:25   ` [PATCH 05/26] xfs: use an empty transaction to protect xfs_attr_get from deadlocks Darrick J. Wong
2024-05-01  6:57     ` Christoph Hellwig
2024-05-01 22:42       ` Darrick J. Wong
2024-04-30  3:25   ` [PATCH 06/26] xfs: add attribute type for fs-verity Darrick J. Wong
2024-04-30  3:25   ` [PATCH 07/26] xfs: do not use xfs_attr3_rmt_hdr for remote verity value blocks Darrick J. Wong
2024-04-30  3:26   ` [PATCH 08/26] xfs: add fs-verity ro-compat flag Darrick J. Wong
2024-04-30  3:26   ` [PATCH 09/26] xfs: add inode on-disk VERITY flag Darrick J. Wong
2024-04-30  3:26   ` [PATCH 10/26] xfs: initialize fs-verity on file open and cleanup on inode destruction Darrick J. Wong
2024-04-30  3:26   ` [PATCH 11/26] xfs: don't allow to enable DAX on fs-verity sealed inode Darrick J. Wong
2024-04-30  3:27   ` [PATCH 12/26] xfs: disable direct read path for fs-verity files Darrick J. Wong
2024-04-30  3:27   ` [PATCH 13/26] xfs: widen flags argument to the xfs_iflags_* helpers Darrick J. Wong
2024-05-01  6:54     ` Christoph Hellwig
2024-05-01 22:44       ` Darrick J. Wong
2024-04-30  3:27   ` [PATCH 14/26] xfs: add fs-verity support Darrick J. Wong
2024-04-30  3:28   ` [PATCH 15/26] xfs: create a per-mount shrinker for verity inodes merkle tree blocks Darrick J. Wong
2024-04-30  3:28   ` [PATCH 16/26] xfs: shrink verity blob cache Darrick J. Wong
2024-04-30  3:28   ` [PATCH 17/26] xfs: don't store trailing zeroes of merkle tree blocks Darrick J. Wong
2024-04-30  3:28   ` [PATCH 18/26] xfs: use merkle tree offset as attr hash Darrick J. Wong
2024-05-01  6:53     ` Christoph Hellwig
2024-05-01  7:23       ` Christoph Hellwig
2024-05-07 21:24         ` Darrick J. Wong
2024-05-08 11:47           ` Christoph Hellwig
2024-05-08 20:26             ` Darrick J. Wong
2024-05-09  5:02               ` Christoph Hellwig
2024-05-09 20:02                 ` Darrick J. Wong
2024-05-10  5:08                   ` Christoph Hellwig
2024-05-10  6:20                     ` Christoph Hellwig
2024-05-17 17:17                       ` Darrick J. Wong
2024-05-09 17:46               ` Eric Biggers
2024-05-09 18:04                 ` Darrick J. Wong
2024-05-09 18:36                   ` Eric Biggers
2024-04-30  3:29   ` [PATCH 19/26] xfs: don't bother storing merkle tree blocks for zeroed data blocks Darrick J. Wong
2024-05-01  6:47     ` Christoph Hellwig
2024-05-01 22:47       ` Darrick J. Wong
2024-05-02  0:01         ` Eric Biggers
2024-05-08 20:26           ` Darrick J. Wong
2024-04-30  3:29   ` [PATCH 20/26] xfs: add fs-verity ioctls Darrick J. Wong
2024-04-30  3:29   ` [PATCH 21/26] xfs: advertise fs-verity being available on filesystem Darrick J. Wong
2024-04-30  3:29   ` [PATCH 22/26] xfs: check and repair the verity inode flag state Darrick J. Wong
2024-04-30  3:30   ` [PATCH 23/26] xfs: teach online repair to evaluate fsverity xattrs Darrick J. Wong
2024-04-30  3:30   ` [PATCH 24/26] xfs: report verity failures through the health system Darrick J. Wong
2024-04-30  3:30   ` [PATCH 25/26] xfs: make it possible to disable fsverity Darrick J. Wong
2024-05-01  6:48     ` Christoph Hellwig
2024-05-01 22:50       ` Darrick J. Wong
2024-05-02  0:15         ` Eric Biggers
2024-05-08 20:31           ` Darrick J. Wong
2024-05-09  5:04             ` Christoph Hellwig
2024-05-09 14:45               ` Darrick J. Wong
2024-05-09 15:06                 ` Christoph Hellwig
2024-05-09 15:09                   ` Darrick J. Wong
2024-05-09 15:13                     ` Christoph Hellwig
2024-05-09 15:43                       ` Darrick J. Wong
2024-05-17 19:36                         ` Theodore Ts'o
2024-04-30  3:30   ` [PATCH 26/26] xfs: enable ro-compat fs-verity flag Darrick J. Wong
2024-04-30  3:19 ` [PATCHSET v5.6] xfsprogs: fs-verity support for XFS Darrick J. Wong
2024-04-30  3:31   ` [PATCH 01/38] fs: add FS_XFLAG_VERITY for verity files Darrick J. Wong
2024-04-30  3:31   ` [PATCH 02/38] xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c Darrick J. Wong
2024-04-30  3:31   ` [PATCH 03/38] xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function Darrick J. Wong
2024-04-30  3:31   ` [PATCH 04/38] xfs: create a helper to compute the blockcount of a max sized remote value Darrick J. Wong
2024-04-30  3:32   ` [PATCH 05/38] xfs: minor cleanups of xfs_attr3_rmt_blocks Darrick J. Wong
2024-04-30  3:32   ` [PATCH 06/38] xfs: use an empty transaction to protect xfs_attr_get from deadlocks Darrick J. Wong
2024-04-30  3:32   ` [PATCH 07/38] xfs: add attribute type for fs-verity Darrick J. Wong
2024-04-30  3:32   ` [PATCH 08/38] xfs: do not use xfs_attr3_rmt_hdr for remote verity value blocks Darrick J. Wong
2024-04-30  3:33   ` [PATCH 09/38] xfs: add fs-verity ro-compat flag Darrick J. Wong
2024-04-30  3:33   ` [PATCH 10/38] xfs: add inode on-disk VERITY flag Darrick J. Wong
2024-04-30  3:33   ` [PATCH 11/38] xfs: add fs-verity support Darrick J. Wong
2024-04-30  3:34   ` [PATCH 12/38] xfs: use merkle tree offset as attr hash Darrick J. Wong
2024-04-30  3:34   ` [PATCH 13/38] xfs: advertise fs-verity being available on filesystem Darrick J. Wong
2024-04-30  3:34   ` [PATCH 14/38] xfs: report verity failures through the health system Darrick J. Wong
2024-04-30  3:34   ` [PATCH 15/38] xfs: enable ro-compat fs-verity flag Darrick J. Wong
2024-04-30  3:35   ` [PATCH 16/38] libfrog: add fsverity to xfs_report_geom output Darrick J. Wong
2024-04-30  3:35   ` [PATCH 17/38] xfs_db: introduce attr_modify command Darrick J. Wong
2024-04-30  3:35   ` [PATCH 18/38] xfs_db: add ATTR_PARENT support to " Darrick J. Wong
2024-04-30  3:35   ` [PATCH 19/38] xfs_db: make attr_set/remove/modify be able to handle fs-verity attrs Darrick J. Wong
2024-04-30  3:36   ` [PATCH 20/38] man: document attr_modify command Darrick J. Wong
2024-04-30  3:36   ` [PATCH 21/38] xfs_db: create hex string as a field type Darrick J. Wong
2024-04-30  3:36   ` [PATCH 22/38] xfs_db: dump verity features and metadata Darrick J. Wong
2024-04-30  3:36   ` [PATCH 23/38] xfs_db: dump merkle tree data Darrick J. Wong
2024-04-30  3:37   ` [PATCH 24/38] xfs_db: dump the verity descriptor Darrick J. Wong
2024-04-30  3:37   ` [PATCH 25/38] xfs_db: don't obfuscate verity xattrs Darrick J. Wong
2024-04-30  3:37   ` [PATCH 26/38] xfs_db: dump the inode verity flag Darrick J. Wong
2024-04-30  3:37   ` [PATCH 27/38] xfs_db: compute hashes of merkle tree blocks Darrick J. Wong
2024-04-30  3:38   ` [PATCH 28/38] xfs_repair: junk fsverity xattrs when unnecessary Darrick J. Wong
2024-04-30  3:38   ` [PATCH 29/38] xfs_repair: clear verity iflag when verity isn't supported Darrick J. Wong
2024-04-30  3:38   ` [PATCH 30/38] xfs_repair: handle verity remote attrs Darrick J. Wong
2024-04-30  3:38   ` [PATCH 31/38] xfs_repair: allow upgrading filesystems with verity Darrick J. Wong
2024-04-30  3:39   ` [PATCH 32/38] xfs_scrub: check verity file metadata Darrick J. Wong
2024-04-30  3:39   ` [PATCH 33/38] xfs_scrub: validate verity file contents when doing a media scan Darrick J. Wong
2024-04-30  3:39   ` [PATCH 34/38] xfs_scrub: use MADV_POPULATE_READ to check verity files Darrick J. Wong
2024-04-30  3:40   ` [PATCH 35/38] xfs_spaceman: report data corruption Darrick J. Wong
2024-04-30  3:40   ` [PATCH 36/38] xfs_io: report fsverity status via statx Darrick J. Wong
2024-04-30  3:40   ` [PATCH 37/38] xfs_io: create magic command to disable verity Darrick J. Wong
2024-04-30  3:40   ` [PATCH 38/38] mkfs.xfs: add verity parameter Darrick J. Wong
2024-04-30  3:19 ` [PATCHSET v5.6] fstests: fs-verity support for XFS Darrick J. Wong
2024-04-30  3:41   ` [PATCH 1/6] common/verity: enable fsverity " Darrick J. Wong
2024-04-30 12:39     ` Andrey Albershteyn
2024-04-30 15:35       ` Darrick J. Wong
2024-04-30  3:41   ` [PATCH 2/6] xfs/{021,122}: adapt to fsverity xattrs Darrick J. Wong
2024-04-30 12:46     ` Andrey Albershteyn
2024-04-30 15:36       ` Darrick J. Wong
2024-04-30  3:41   ` [PATCH 3/6] xfs/122: adapt to fsverity Darrick J. Wong
2024-04-30 12:45     ` Andrey Albershteyn
2024-04-30 15:37       ` Darrick J. Wong
2024-04-30  3:41   ` [PATCH 4/6] xfs: test xfs_scrub detection and correction of corrupt fsverity metadata Darrick J. Wong
2024-04-30 12:29     ` Andrey Albershteyn
2024-04-30 15:43       ` Darrick J. Wong
2024-04-30  3:42   ` [PATCH 5/6] xfs: test disabling fsverity Darrick J. Wong
2024-04-30 12:56     ` Andrey Albershteyn
2024-04-30 13:11     ` Andrey Albershteyn
2024-04-30 15:48       ` Darrick J. Wong
2024-04-30 18:06         ` Andrey Albershteyn
2024-04-30  3:42   ` [PATCH 6/6] common/populate: add verity files to populate xfs images Darrick J. Wong
2024-04-30 13:22     ` Andrey Albershteyn
2024-04-30 15:49       ` Darrick J. Wong
2024-05-11  5:01   ` [PATCHSET v5.6] fstests: fs-verity support for XFS Zorro Lang
2024-05-17 15:56     ` Darrick J. Wong
