Re: [PATCH v4 3/7] iomap: support direct I/O with fscrypt using blk-crypto

From: Eric Biggers <ebiggers@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Satya Tangirala <satyat@google.com>,
	linux-fscrypt@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH v4 3/7] iomap: support direct I/O with fscrypt using blk-crypto
Date: Wed, 22 Jul 2020 15:34:04 -0700	[thread overview]
Message-ID: <20200722223404.GA76479@sol.localdomain> (raw)
In-Reply-To: <20200722211629.GE2005@dread.disaster.area>

On Thu, Jul 23, 2020 at 07:16:29AM +1000, Dave Chinner wrote:
> On Mon, Jul 20, 2020 at 11:37:35PM +0000, Satya Tangirala wrote:
> > From: Eric Biggers <ebiggers@google.com>
> > 
> > Wire up iomap direct I/O with the fscrypt additions for direct I/O.
> > This allows ext4 to support direct I/O on encrypted files when inline
> > encryption is enabled.
> > 
> > This change consists of two parts:
> > 
> > - Set a bio_crypt_ctx on bios for encrypted files, so that the file
> >   contents get encrypted (or decrypted).
> > 
> > - Ensure that encryption data unit numbers (DUNs) are contiguous within
> >   each bio.  Use the new function fscrypt_limit_io_pages() for this,
> >   since the iomap code works directly with logical ranges and thus
> >   doesn't have a chance to call fscrypt_mergeable_bio() on each page.
> > 
> > Note that fscrypt_limit_io_pages() is normally a no-op, as normally the
> > DUNs simply increment along with the logical blocks.  But it's needed to
> > handle an edge case in one of the fscrypt IV generation methods.
> > 
> > Signed-off-by: Eric Biggers <ebiggers@google.com>
> > Co-developed-by: Satya Tangirala <satyat@google.com>
> > Signed-off-by: Satya Tangirala <satyat@google.com>
> > ---
> >  fs/iomap/direct-io.c | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > index ec7b78e6feca..12064daa3e3d 100644
> > --- a/fs/iomap/direct-io.c
> > +++ b/fs/iomap/direct-io.c
> > @@ -6,6 +6,7 @@
> >  #include <linux/module.h>
> >  #include <linux/compiler.h>
> >  #include <linux/fs.h>
> > +#include <linux/fscrypt.h>
> >  #include <linux/iomap.h>
> >  #include <linux/backing-dev.h>
> >  #include <linux/uio.h>
> > @@ -183,11 +184,16 @@ static void
> >  iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
> >  		unsigned len)
> >  {
> > +	struct inode *inode = file_inode(dio->iocb->ki_filp);
> >  	struct page *page = ZERO_PAGE(0);
> >  	int flags = REQ_SYNC | REQ_IDLE;
> >  	struct bio *bio;
> >  
> >  	bio = bio_alloc(GFP_KERNEL, 1);
> > +
> > +	/* encrypted direct I/O is guaranteed to be fs-block aligned */
> > +	WARN_ON_ONCE(fscrypt_needs_contents_encryption(inode));
> 
> Which means you are now placing a new constraint on this code in
> that we cannot ever, in future, zero entire blocks here.
> 
> This code can issue arbitrary sized zeroing bios - multiple entire fs blocks
> blocks if necessary - so I think constraining it to only support
> partial block zeroing by adding a warning like this is no correct.

In v3 and earlier this instead had the code to set an encryption context:

	fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
				  GFP_KERNEL);

Would you prefer that, even though the call to fscrypt_set_bio_crypt_ctx() would
always be a no-op currently (since for now, iomap_dio_zero() will never be
called with an encrypted file) and thus wouldn't be properly tested?

BTW, iomap_dio_zero() is actually limited to one page, so it's not quite
"arbitrary sizes".
	
> 
> > @@ -253,6 +259,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
> >  		ret = nr_pages;
> >  		goto out;
> >  	}
> > +	nr_pages = fscrypt_limit_io_pages(inode, pos, nr_pages);
> 
> So if "pos" overlaps a 2^32 offset when a certain mode is used, we
> have to break up the IO?

More or less.  It's actually when 'hashed_ino + (pos >> i_blkbits)' overlaps a
2^32 offset.  But yes, we have to break up the I/O when it happens.

> 
> If true, I'm not sure that this belongs here. Limiting the size of
> the IOs because of filesytem contraints belongs in the filesystem
> extent mapping code. That's the point where the IO is broken up into
> maximally sized chunks the filesystem can issue as a contiguous
> range. If the fscrypt code is breaking that "contiguous IO range"
> because of the mode being used, the fs mapping code should break
> the mapping at the boundery where the IO needs to be broken.
> 
> Hence the dio mapping code here will never build IOs that cross this
> -filesystem- encryption limitation, and we don't need this fscrypt
> code in the direct IO path at all.
> 

I think that would work.

iomap is used for other filesystem operations too, so we need to consider when
to actually do the limiting.  I don't think we should break up the extents
returned FS_IOC_FIEMAP, for example.  FIEMAP already has a defined behavior.
Also, it would be weird for the list of extents that FIEMAP returns to change
depending on whether the filesystem is mounted with '-o inlinecrypt' or not.

So, something like this:

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 44bad4bb8831..2816194db46c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3437,6 +3437,15 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits,
 			  EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1;
 
+	/*
+	 * When inline encryption is enabled, sometimes I/O to an encrypted file
+	 * has to be broken up to guarantee DUN contiguity.  Handle this by
+	 * limiting the length of the mapping returned.
+	 */
+	if (!(flags & IOMAP_REPORT))
+		map.m_len = fscrypt_limit_io_blocks(inode, map.m_lblk,
+						    map.m_len);
+
 	if (flags & IOMAP_WRITE)
 		ret = ext4_iomap_alloc(inode, &map, flags);
 	else


That also avoids any confusion between pages and blocks, which is nice.

- Eric