linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Biggers <ebiggers@kernel.org>
To: Andrey Albershteyn <aalbersh@redhat.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	fsverity@lists.linux.dev, djwong@kernel.org, david@fromorbit.com,
	dchinner@redhat.com
Subject: Re: [PATCH v3 10/28] fsverity: operate with Merkle tree blocks instead of pages
Date: Tue, 10 Oct 2023 20:56:22 -0700	[thread overview]
Message-ID: <20231011035622.GE1185@sol.localdomain> (raw)
In-Reply-To: <20231006184922.252188-11-aalbersh@redhat.com>

On Fri, Oct 06, 2023 at 08:49:04PM +0200, Andrey Albershteyn wrote:
> fsverity: operate with Merkle tree blocks instead of pages

Well, it already does, just not for the Merkle tree caching.  A better title
might be something like "fsverity: support block-based Merkle tree caching".

> fsverity expects filesystem to provide PAGEs with Merkle tree
> blocks in it. Then, when fsverity is done with processing the
> blocks, reference to PAGE is freed. This doesn't fit well with the
> way XFS manages its memory.

BTW, I encourage using "fs/verity/" when referring specifically to the fsverity
support code in the kernel, which is located in that directory, as opposed to
the whole fsverity feature.

> This patch moves page reference management out of fsverity to
> filesystem. This way fsverity expects a kaddr to the Merkle tree
> block and filesystem can handle all caching and reference counting.

That's not done for the existing filesystems, though.  Which is probably the
right choice, but it isn't what this commit message says.

> diff --git a/fs/verity/open.c b/fs/verity/open.c
> index dfb9fe6aaae9..8665d8b40081 100644
> --- a/fs/verity/open.c
> +++ b/fs/verity/open.c
> @@ -126,19 +126,16 @@ int fsverity_init_merkle_tree_params(struct merkle_tree_params *params,
>  	}
>  
>  	/*
> -	 * With block_size != PAGE_SIZE, an in-memory bitmap will need to be
> -	 * allocated to track the "verified" status of hash blocks.  Don't allow
> -	 * this bitmap to get too large.  For now, limit it to 1 MiB, which
> -	 * limits the file size to about 4.4 TB with SHA-256 and 4K blocks.
> +	 * An in-memory bitmap will need to be allocated to track the "verified"
> +	 * status of hash blocks.  Don't allow this bitmap to get too large.
> +	 * For now, limit it to 1 MiB, which limits the file size to
> +	 * about 4.4 TB with SHA-256 and 4K blocks.
>  	 *
>  	 * Together with the fact that the data, and thus also the Merkle tree,
>  	 * cannot have more than ULONG_MAX pages, this implies that hash block
> -	 * indices can always fit in an 'unsigned long'.  But to be safe, we
> -	 * explicitly check for that too.  Note, this is only for hash block
> -	 * indices; data block indices might not fit in an 'unsigned long'.
> +	 * indices can always fit in an 'unsigned long'.
>  	 */
> -	if ((params->block_size != PAGE_SIZE && offset > 1 << 23) ||
> -	    offset > ULONG_MAX) {
> +	if (offset > (1 << 23)) {
>  		fsverity_err(inode, "Too many blocks in Merkle tree");
>  		err = -EFBIG;
>  		goto out_err;

This hunk should have been in the patch "fsverity: always use bitmap to track
verified status".

> diff --git a/fs/verity/read_metadata.c b/fs/verity/read_metadata.c
> index 197624cab43e..182bddf5dec5 100644
> --- a/fs/verity/read_metadata.c
> +++ b/fs/verity/read_metadata.c
> @@ -16,9 +16,9 @@ static int fsverity_read_merkle_tree(struct inode *inode,
>  				     const struct fsverity_info *vi,
>  				     void __user *buf, u64 offset, int length)
>  {
> -	const struct fsverity_operations *vops = inode->i_sb->s_vop;
>  	u64 end_offset;
> -	unsigned int offs_in_page;
> +	unsigned int offs_in_block;
> +	unsigned int block_size = vi->tree_params.block_size;

Maybe do here:

	const unsigned int block_size = vi->tree_params.block_size;
	const u8 log_blocksize = vi->tree_params.log_blocksize;

Then both are easily accessible to the rest of the function.

>  	pgoff_t index, last_index;
>  	int retval = 0;
>  	int err = 0;
> @@ -26,8 +26,8 @@ static int fsverity_read_merkle_tree(struct inode *inode,
>  	end_offset = min(offset + length, vi->tree_params.tree_size);
>  	if (offset >= end_offset)
>  		return 0;
> -	offs_in_page = offset_in_page(offset);
> -	last_index = (end_offset - 1) >> PAGE_SHIFT;
> +	offs_in_block = offset % block_size;

No modulo by non-constant values, please.  The block size always is a power of
2, so use 'offset & (block_size - 1)'.

> +	last_index = (end_offset - 1) >> vi->tree_params.log_blocksize;
>  
>  	/*
>	 * Iterate through each Merkle tree page in the requested range and copy
>	 * the requested portion to userspace.  Note that the Merkle tree block
>  	 * size isn't important here, as we are returning a byte stream; i.e.,
>  	 * we can just work with pages even if the tree block size != PAGE_SIZE.
>  	 */

The above comment needs to be updated.

> -	for (index = offset >> PAGE_SHIFT; index <= last_index; index++) {
> +	for (index = offset >> vi->tree_params.log_blocksize;
> +			index <= last_index; index++) {
>  		unsigned long num_ra_pages =
>  			min_t(unsigned long, last_index - index + 1,
>  			      inode->i_sb->s_bdi->io_pages);
>  		unsigned int bytes_to_copy = min_t(u64, end_offset - offset,
> -						   PAGE_SIZE - offs_in_page);
> -		struct page *page;
> -		const void *virt;
> +						   block_size - offs_in_block);
> +		struct fsverity_block block;
>  
> -		page = vops->read_merkle_tree_page(inode, index, num_ra_pages,
> -						   vi->tree_params.log_blocksize);
> -		if (IS_ERR(page)) {
> -			err = PTR_ERR(page);
> -			fsverity_err(inode,
> -				     "Error %d reading Merkle tree page %lu",
> -				     err, index);
> +		block.len = block_size;
> +		if (fsverity_read_merkle_tree_block(inode,
> +					index << vi->tree_params.log_blocksize,
> +					&block, num_ra_pages)) {
> +			fsverity_drop_block(inode, &block);
> +			err = -EFAULT;
>  			break;
>  		}

EFAULT is the wrong error code for an I/O error.

> diff --git a/fs/verity/verify.c b/fs/verity/verify.c
> index f556336ebd8d..dfe01f121843 100644
> --- a/fs/verity/verify.c
> +++ b/fs/verity/verify.c
> @@ -44,15 +44,15 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
>  	const struct merkle_tree_params *params = &vi->tree_params;
>  	const unsigned int hsize = params->digest_size;
>  	int level;
> +	int err;
> +	int num_ra_pages;
>  	u8 _want_hash[FS_VERITY_MAX_DIGEST_SIZE];
>  	const u8 *want_hash;
>  	u8 real_hash[FS_VERITY_MAX_DIGEST_SIZE];
>  	/* The hash blocks that are traversed, indexed by level */
>  	struct {
> -		/* Page containing the hash block */
> -		struct page *page;
> -		/* Mapped address of the hash block (will be within @page) */
> -		const void *addr;
> +		/* Block containing the hash block */
> +		struct fsverity_block block;

"Block containing the hash block" is nonsensical.  I think this indicates that
fsverity_block is misnamed.  It should be called something like
fsverity_blockbuf.  Then the above comment would be something like "Buffer
containing the hash block".

> @@ -93,10 +93,8 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
>  		unsigned long next_hidx;
>  		unsigned long hblock_idx;
>  		pgoff_t hpage_idx;
> -		unsigned int hblock_offset_in_page;
>  		unsigned int hoffset;
> -		struct page *hpage;
> -		const void *haddr;
> +		struct fsverity_block *block = &hblocks[level].block;
>  
>  		/*
>  		 * The index of the block in the current level; also the index
> @@ -110,34 +108,28 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
>  		/* Index of the hash page in the tree overall */
>  		hpage_idx = hblock_idx >> params->log_blocks_per_page;
>  
> -		/* Byte offset of the hash block within the page */
> -		hblock_offset_in_page =
> -			(hblock_idx << params->log_blocksize) & ~PAGE_MASK;
> -
>  		/* Byte offset of the hash within the block */
>  		hoffset = (hidx << params->log_digestsize) &
>  			  (params->block_size - 1);
>  
> -		hpage = inode->i_sb->s_vop->read_merkle_tree_page(inode,
> -				hpage_idx, level == 0 ? min(max_ra_pages,
> -					params->tree_pages - hpage_idx) : 0,
> -				params->log_blocksize);
> -		if (IS_ERR(hpage)) {
> +		block->len = params->block_size;
> +		num_ra_pages = level == 0 ?
> +			min(max_ra_pages, params->tree_pages - hpage_idx) : 0;

The fact that the readahead amount is still calculated in pages seems out of
place.  Maybe it should be done in bytes?

> -		if (vi->hash_block_verified)
> -			set_bit(hblock_idx, vi->hash_block_verified);
> -		else
> -			SetPageChecked(hpage);
> +		set_bit(hblock_idx, vi->hash_block_verified);

This hunk also should have been in "fsverity: always use bitmap to track
verified status".  But as I mentioned on that patch, I'm not sure we should
always use the bitmap.

> diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
> index cac012d4c86a..ce37a430bc97 100644
> --- a/include/linux/fsverity.h
> +++ b/include/linux/fsverity.h
> @@ -26,6 +26,24 @@
>  /* Arbitrary limit to bound the kmalloc() size.  Can be changed. */
>  #define FS_VERITY_MAX_DESCRIPTOR_SIZE	16384
>  
> +/**
> + * struct fsverity_block - Merkle Tree block
> + * @kaddr: virtual address of the block's data
> + * @len: length of the data

Maybe use size instead of len?  Currently it's pretty consistently called the
"block size", not "block length".

> + * @cached: true if block was already in cache, false otherwise

cached => already_cached?

> + * @verified: true if block is verified against Merkle tree
> + * @context: filesystem private context
> + *
> + * Merkle Tree blocks passed and requested from filesystem
> + */

It needs to be clearly documented how these fields flow between the filesystem
and fs/verity/ when ->read_merkle_tree_block() and ->drop_block() are used.  It
seems to be different for different fields.  Which are inputs and which are
outputs to which functions?

> +	/**
> +	 * Read a Merkle tree block of the given inode.
> +	 * @inode: the inode
> +	 * @index: 0-based index of the block within the Merkle tree
> +	 * @num_ra_pages: The number of pages with blocks that should be
> +	 *		  prefetched starting at @index if the page at @index
> +	 *		  isn't already cached.  Implementations may ignore this
> +	 *		  argument; it's only a performance optimization.
> +	 *
> +	 * This can be called at any time on an open verity file.  It may be
> +	 * called by multiple processes concurrently.
> +	 *
> +	 * Return: 0 on success, -errno on failure
> +	 */
> +	int (*read_merkle_tree_block)(struct inode *inode,
> +				      unsigned int index,
> +				      struct fsverity_block *block,
> +				      unsigned long num_ra_pages);

For the second parameter your code actually passes the block position (in
bytes), not the block index.  Perhaps you meant for it to be 'u64 pos'?  Either
way, 'unsigned int' is wrong.  Merkle tree block indices are unsigned long;
positions are u64.

> -static inline void fsverity_drop_page(struct inode *inode, struct page *page)
> +static inline void fsverity_drop_block(struct inode *inode,
> +		struct fsverity_block *block)
>  {
> -	if (inode->i_sb->s_vop->drop_page)
> -		inode->i_sb->s_vop->drop_page(page);
> -	else
> +	if (inode->i_sb->s_vop->drop_block)
> +		inode->i_sb->s_vop->drop_block(block);
> +	else {
> +		struct page *page = (struct page *)block->context;
> +
> +		if (block->verified)
> +			SetPageChecked(page);
> +

Why is PG_checked being set here?

> +/**
> + * fsverity_read_block_from_page() - layer between fs using read page
> + * and read block
> + * @inode: inode in use for verification or metadata reading
> + * @index: index of the block in the tree (offset into the tree)
> + * @block: block to be read
> + * @num_ra_pages: number of pages to readahead, may be ignored
> + *
> + * Depending on fs implementation use read_merkle_tree_block or
> + * read_merkle_tree_page.
> + */
> +static inline int fsverity_read_merkle_tree_block(struct inode *inode,
> +					unsigned int index,
> +					struct fsverity_block *block,
> +					unsigned long num_ra_pages)
> +{
> +	struct page *page;
> +
> +	if (inode->i_sb->s_vop->read_merkle_tree_block)
> +		return inode->i_sb->s_vop->read_merkle_tree_block(
> +			inode, index, block, num_ra_pages);
> +
> +	page = inode->i_sb->s_vop->read_merkle_tree_page(
> +			inode, index >> PAGE_SHIFT, num_ra_pages,
> +			block->len);
> +
> +	block->kaddr = page_address(page) + (index % PAGE_SIZE);
> +	block->cached = PageChecked(page);
> +	block->context = page;
> +
> +	if (IS_ERR(page))
> +		return PTR_ERR(page);
> +	else
> +		return 0;
> +}

This isn't handling errors from ->read_merkle_tree_page() correctly.  Also,
page_address() can't be used for pagecache pages unless the file has opted out
of highmem.  You need to use kmap_local_page(), as the code was doing before.

Also, since fsverity_read_merkle_tree_block() is a private function for
fs/verity/, not for filesystems to use, it should be put in
fs/verity/fsverity_private.h, not in include/linux/fsverity.h.

- Eric

  parent reply	other threads:[~2023-10-11  3:56 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-06 18:48 [PATCH v3 00/28] fs-verity support for XFS Andrey Albershteyn
2023-10-06 18:48 ` [PATCH v3 01/28] xfs: Add new name to attri/d Andrey Albershteyn
2023-10-06 18:48 ` [PATCH v3 02/28] xfs: add parent pointer support to attribute code Andrey Albershteyn
2023-10-06 18:48 ` [PATCH v3 03/28] xfs: define parent pointer xattr format Andrey Albershteyn
2023-10-06 18:48 ` [PATCH v3 04/28] xfs: Add xfs_verify_pptr Andrey Albershteyn
2023-10-11  1:01   ` Darrick J. Wong
2023-10-11 11:09     ` Andrey Albershteyn
2023-10-06 18:48 ` [PATCH v3 05/28] fs: add FS_XFLAG_VERITY for fs-verity sealed inodes Andrey Albershteyn
2023-10-11  4:05   ` Eric Biggers
2023-10-11 11:06     ` Andrey Albershteyn
2023-10-06 18:49 ` [PATCH v3 06/28] fsverity: add drop_page() callout Andrey Albershteyn
2023-10-11  3:06   ` Eric Biggers
2023-10-11 11:11     ` Andrey Albershteyn
2023-10-06 18:49 ` [PATCH v3 07/28] fsverity: always use bitmap to track verified status Andrey Albershteyn
2023-10-11  3:15   ` Eric Biggers
2023-10-11 13:03     ` Andrey Albershteyn
2023-10-12  7:27       ` Eric Biggers
2023-10-13  3:12         ` Darrick J. Wong
2023-10-17  4:58           ` Eric Biggers
2023-10-18  2:35             ` Darrick J. Wong
2023-10-17  6:01           ` Christoph Hellwig
2023-10-16 11:52         ` Andrey Albershteyn
2023-10-17  5:57         ` Christoph Hellwig
2023-10-17 17:49           ` Eric Biggers
2023-10-06 18:49 ` [PATCH v3 08/28] fsverity: pass Merkle tree block size to ->read_merkle_tree_page() Andrey Albershteyn
2023-10-11  3:17   ` Eric Biggers
2023-10-11 11:13     ` Andrey Albershteyn
2023-10-06 18:49 ` [PATCH v3 09/28] fsverity: pass log_blocksize to end_enable_verity() Andrey Albershteyn
2023-10-11  3:19   ` Eric Biggers
2023-10-11 11:17     ` Andrey Albershteyn
2023-10-12  7:34       ` Eric Biggers
2023-10-06 18:49 ` [PATCH v3 10/28] fsverity: operate with Merkle tree blocks instead of pages Andrey Albershteyn
2023-10-07  4:02   ` kernel test robot
2023-10-11  3:56   ` Eric Biggers [this message]
2023-10-16 13:00   ` Christoph Hellwig
2023-10-06 18:49 ` [PATCH v3 11/28] iomap: pass readpage operation to read path Andrey Albershteyn
2023-10-11 18:31   ` Darrick J. Wong
2023-10-16 12:35     ` Andrey Albershteyn
2023-10-16  9:15   ` Christoph Hellwig
2023-10-16 12:32     ` Andrey Albershteyn
2023-10-16 12:58       ` Christoph Hellwig
2023-10-06 18:49 ` [PATCH v3 12/28] iomap: allow filesystem to implement read path verification Andrey Albershteyn
2023-10-11 18:39   ` Darrick J. Wong
2023-10-06 18:49 ` [PATCH v3 13/28] xfs: add XBF_VERITY_CHECKED xfs_buf flag Andrey Albershteyn
2023-10-11 18:54   ` Darrick J. Wong
2023-10-06 18:49 ` [PATCH v3 14/28] xfs: add XFS_DA_OP_BUFFER to make xfs_attr_get() return buffer Andrey Albershteyn
2023-10-06 18:49 ` [PATCH v3 15/28] xfs: introduce workqueue for post read IO work Andrey Albershteyn
2023-10-11 18:55   ` Darrick J. Wong
2023-10-16 12:37     ` Andrey Albershteyn
2023-10-06 18:49 ` [PATCH v3 16/28] xfs: add bio_set and submit_io for ioend post-processing Andrey Albershteyn
2023-10-11 18:47   ` Darrick J. Wong
2023-10-06 18:49 ` [PATCH v3 17/28] xfs: add attribute type for fs-verity Andrey Albershteyn
2023-10-11 18:48   ` Darrick J. Wong
2023-10-06 18:49 ` [PATCH v3 18/28] xfs: make xfs_buf_get() to take XBF_* flags Andrey Albershteyn
2023-10-06 18:49 ` [PATCH v3 19/28] xfs: add XBF_DOUBLE_ALLOC to increase size of the buffer Andrey Albershteyn
2023-10-06 18:49 ` [PATCH v3 20/28] xfs: add fs-verity ro-compat flag Andrey Albershteyn
2023-10-11 18:56   ` Darrick J. Wong
2023-10-06 18:49 ` [PATCH v3 21/28] xfs: add inode on-disk VERITY flag Andrey Albershteyn
2023-10-11 18:57   ` Darrick J. Wong
2023-10-06 18:49 ` [PATCH v3 22/28] xfs: initialize fs-verity on file open and cleanup on inode destruction Andrey Albershteyn
2023-10-06 18:49 ` [PATCH v3 23/28] xfs: don't allow to enable DAX on fs-verity sealsed inode Andrey Albershteyn
2023-10-11 19:00   ` Darrick J. Wong
2023-10-06 18:49 ` [PATCH v3 24/28] xfs: disable direct read path for fs-verity sealed files Andrey Albershteyn
2023-10-11 19:02   ` Darrick J. Wong
2023-10-06 18:49 ` [PATCH v3 25/28] xfs: add fs-verity support Andrey Albershteyn
2023-10-06 23:40   ` kernel test robot
2023-10-11  1:39   ` Darrick J. Wong
2023-10-11 14:36     ` Andrey Albershteyn
2023-10-18 19:18   ` kernel test robot
2023-10-06 18:49 ` [PATCH v3 26/28] xfs: make scrub aware of verity dinode flag Andrey Albershteyn
2023-10-11  1:06   ` Darrick J. Wong
2023-10-11 14:37     ` Andrey Albershteyn
2023-10-06 18:49 ` [PATCH v3 27/28] xfs: add fs-verity ioctls Andrey Albershteyn
2023-10-06 18:49 ` [PATCH v3 28/28] xfs: enable ro-compat fs-verity flag Andrey Albershteyn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231011035622.GE1185@sol.localdomain \
    --to=ebiggers@kernel.org \
    --cc=aalbersh@redhat.com \
    --cc=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=fsverity@lists.linux.dev \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).