linux-fscrypt.vger.kernel.org archive mirror
* fscrypt, i_blkbits and network filesystems
@ 2020-10-08 12:25 Jeff Layton
  2020-10-08 17:46 ` Eric Biggers
  0 siblings, 1 reply; 4+ messages in thread
From: Jeff Layton @ 2020-10-08 12:25 UTC (permalink / raw)
  To: linux-fscrypt; +Cc: Eric Biggers

I've had to table the work on fscrypt+ceph for a bit to take care of
some other issues, but I'm hoping to return to it soon, and I've started
looking at the content encryption in more detail.

One thing I'm not sure how to handle yet is fscrypt's reliance on
inode->i_blkbits. For ceph (and most netfs's), this value is a fiction.
We're not constrained to reading/writing along block boundaries.

Cephfs usually sets the blocksize in an S_ISREG inode to the same size as a
"chunk" on the OSD (usu. 4M). That's a bit too large to deal with IMO,
so I'm looking at lowering that to PAGE_SIZE when fscrypt is enabled.

That's reasonable when we can do pagecache-based I/O, but sometimes
netfs's will do I/O directly from read_iter/write_iter. For ceph, we may
need to do a rmw cycle if the iovec passed down from userland doesn't
align to crypto block boundaries. Ceph has a way to do a cmp_extent
operation such that it will only do the write if nothing changed in the
interim, so we can handle that case, but it would be better not to have
to read/write more than we need.

For the netfs case, would we be better off avoiding routines that take
i_blkbits into account, and instead just working with
fscrypt_encrypt_block_inplace / fscrypt_decrypt_block_inplace, maybe
even by rolling new helpers that call them under the hood? Or, would
that cause issues that I haven't foreseen, and I should just stick to
PAGE_SIZE blocks?
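
Something like this is what I had in mind -- totally untested, and the
helper name is made up, but it shows the shape: a wrapper that works in
a fixed data-unit size and derives the block number from the file
offset instead of from i_blkbits:

/*
 * Hypothetical helper (name invented for illustration): encrypt "len"
 * bytes at "offs" within "page" in fixed-size units, deriving the
 * block number from the file offset rather than from i_blkbits.
 * Assumes len and offs are multiples of blocksize.
 */
static int fscrypt_netfs_encrypt_blocks(const struct inode *inode,
					struct page *page,
					unsigned int len, unsigned int offs,
					u64 file_offset, unsigned int blocksize)
{
	u64 lblk = file_offset >> ilog2(blocksize);
	int err;

	while (len) {
		err = fscrypt_encrypt_block_inplace(inode, page, blocksize,
						    offs, lblk, GFP_NOFS);
		if (err)
			return err;
		len -= blocksize;
		offs += blocksize;
		lblk++;
	}
	return 0;
}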

Thoughts?
-- 
Jeff Layton <jlayton@kernel.org>



* Re: fscrypt, i_blkbits and network filesystems
  2020-10-08 12:25 fscrypt, i_blkbits and network filesystems Jeff Layton
@ 2020-10-08 17:46 ` Eric Biggers
  2020-10-09 20:16   ` Jeff Layton
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Biggers @ 2020-10-08 17:46 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-fscrypt

On Thu, Oct 08, 2020 at 08:25:10AM -0400, Jeff Layton wrote:
> I've had to table the work on fscrypt+ceph for a bit to take care of
> some other issues, but I'm hoping to return to it soon, and I've started
> looking at the content encryption in more detail.
> 
> One thing I'm not sure how to handle yet is fscrypt's reliance on
> inode->i_blkbits. For ceph (and most netfs's), this value is a fiction.
> We're not constrained to reading/writing along block boundaries.
> 
> Cephfs usually sets the blocksize in an S_ISREG inode to the same size as a
> "chunk" on the OSD (usu. 4M). That's a bit too large to deal with IMO,
> so I'm looking at lowering that to PAGE_SIZE when fscrypt is enabled.
> 
> That's reasonable when we can do pagecache-based I/O, but sometimes
> netfs's will do I/O directly from read_iter/write_iter. For ceph, we may
> need to do a rmw cycle if the iovec passed down from userland doesn't
> align to crypto block boundaries. Ceph has a way to do a cmp_extent
> operation such that it will only do the write if nothing changed in the
> interim, so we can handle that case, but it would be better not to have
> to read/write more than we need.
> 
> For the netfs case, would we be better off avoiding routines that take
> i_blkbits into account, and instead just working with
> fscrypt_encrypt_block_inplace / fscrypt_decrypt_block_inplace, maybe
> even by rolling new helpers that call them under the hood? Or, would
> that cause issues that I haven't foreseen, and I should just stick to
> PAGE_SIZE blocks?

First, you should avoid using "PAGE_SIZE" as the crypto data unit size, since
PAGE_SIZE isn't the same everywhere.  E.g. PAGE_SIZE is 4096 bytes on x86, but
usually 65536 bytes on PowerPC.  If encrypted files are created on x86, they
should be readable on PowerPC too, and vice versa.  That means the crypto data
unit size should be a specific value, generally 4096 bytes.  But other
power-of-2 sizes could be allowed too.

Second, I'm not really understanding what the problem is with setting i_blkbits
for IS_ENCRYPTED() inodes to the log2 of the crypto data unit size.  Wouldn't
that be the right thing to do?  Even though it wouldn't have any meaning for the
server, it would have a meaning for the client -- it would be the granularity of
encryption (and decryption).
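
I.e., roughly this, with the constant name made up here -- it would be
whatever data unit size you settle on:

	/* sketch: on the client, i_blkbits is just the encryption granularity */
	if (IS_ENCRYPTED(inode))
		inode->i_blkbits = ilog2(CEPH_FSCRYPT_DATA_UNIT_SIZE);	/* e.g. 12 for 4096 */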

If it really is a problem, by "fscrypt's reliance on inode->i_blkbits" are you
specifically referring to fscrypt_encrypt_pagecache_blocks() and
fscrypt_decrypt_pagecache_blocks()?  If so, I think the way to go would be to
add __fscrypt_encrypt_pagecache_blocks() and
__fscrypt_decrypt_pagecache_blocks() which have a blkbits argument.

Or alternatively just add a blkbits argument to the existing functions, but I'd
prefer to avoid adding error-prone arguments to all callers of these.
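
For the first option, the declarations would be shaped roughly like this
(nothing like this exists yet; it's just a sketch of the interface):

struct page *__fscrypt_encrypt_pagecache_blocks(struct page *page,
						unsigned int len,
						unsigned int offs,
						unsigned int blkbits,
						gfp_t gfp_flags);
int __fscrypt_decrypt_pagecache_blocks(struct page *page, unsigned int len,
				       unsigned int offs,
				       unsigned int blkbits);

The existing fscrypt_{en,de}crypt_pagecache_blocks() would then become
thin wrappers that pass the inode's i_blkbits.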

fscrypt_encrypt_block_inplace() does in-place encryption, which isn't what you
want because you want to encrypt into a bounce page, right?
fscrypt_encrypt_block_inplace() and fscrypt_decrypt_block_inplace() also take
too many arguments, including lblk_num, which is error-prone.

- Eric


* Re: fscrypt, i_blkbits and network filesystems
  2020-10-08 17:46 ` Eric Biggers
@ 2020-10-09 20:16   ` Jeff Layton
  2020-10-09 21:50     ` Eric Biggers
  0 siblings, 1 reply; 4+ messages in thread
From: Jeff Layton @ 2020-10-09 20:16 UTC (permalink / raw)
  To: Eric Biggers; +Cc: linux-fscrypt

On Thu, 2020-10-08 at 10:46 -0700, Eric Biggers wrote:
> On Thu, Oct 08, 2020 at 08:25:10AM -0400, Jeff Layton wrote:
> > I've had to table the work on fscrypt+ceph for a bit to take care of
> > some other issues, but I'm hoping to return to it soon, and I've started
> > looking at the content encryption in more detail.
> > 
> > One thing I'm not sure how to handle yet is fscrypt's reliance on
> > inode->i_blkbits. For ceph (and most netfs's), this value is a fiction.
> > We're not constrained to reading/writing along block boundaries.
> > 
> > Cephfs usually sets the blocksize in an S_ISREG inode to the same size as a
> > "chunk" on the OSD (usu. 4M). That's a bit too large to deal with IMO,
> > so I'm looking at lowering that to PAGE_SIZE when fscrypt is enabled.
> > 
> > That's reasonable when we can do pagecache-based I/O, but sometimes
> > netfs's will do I/O directly from read_iter/write_iter. For ceph, we may
> > need to do a rmw cycle if the iovec passed down from userland doesn't
> > align to crypto block boundaries. Ceph has a way to do a cmp_extent
> > operation such that it will only do the write if nothing changed in the
> > interim, so we can handle that case, but it would be better not to have
> > to read/write more than we need.
> > 
> > For the netfs case, would we be better off avoiding routines that take
> > i_blkbits into account, and instead just working with
> > fscrypt_encrypt_block_inplace / fscrypt_decrypt_block_inplace, maybe
> > even by rolling new helpers that call them under the hood? Or, would
> > that cause issues that I haven't foreseen, and I should just stick to
> > PAGE_SIZE blocks?
> 
> First, you should avoid using "PAGE_SIZE" as the crypto data unit size, since
> PAGE_SIZE isn't the same everywhere.  E.g. PAGE_SIZE is 4096 bytes on x86, but
> usually 65536 bytes on PowerPC.  If encrypted files are created on x86, they
> should be readable on PowerPC too, and vice versa.  That means the crypto data
> unit size should be a specific value, generally 4096 bytes.  But other
> power-of-2 sizes could be allowed too.
> 

Ok, good point.

Pardon my lack of crypto knowledge, but I assume we have to ensure that
we use the same crypto block size everywhere for the same inode as well?
i.e., I can't encrypt a 4k block and then read in and decrypt a 16 byte
chunk of it?

> Second, I'm not really understanding what the problem is with setting i_blkbits
> for IS_ENCRYPTED() inodes to the log2 of the crypto data unit size.  Wouldn't
> that be the right thing to do?  Even though it wouldn't have any meaning for the
> server, it would have a meaning for the client -- it would be the granularity of
> encryption (and decryption).
> 

It's not a huge problem. I was thinking there might be an issue with
some applications, but I don't think it really matters. The blocksize
reported by stat is sort of a nebulous concept anyway when you get to a
network filesystem.

The only real problem we have is that an application might pass down an
I/O that is smaller than 4k, but we haven't been granted the capability
to do buffered I/O. In that situation, we'll need to read what's there
now (if anything) and then dispatch a synchronous write op that is gated
on that data not having changed. 

There's some benefit to dealing with as small a chunk of data as we can,
but when that's not possible, 4k is probably a reasonable size to work
with in most cases.
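
Roughly, I'm picturing the unaligned, uncached write path looking
something like this.  This is pure hand-waving -- the two ceph_*
helpers and the CEPH_FSCRYPT_DATA_UNIT_SIZE constant named below don't
exist -- but it shows the rmw flow I described above:

static ssize_t ceph_sync_unaligned_write(struct inode *inode,
					 struct iov_iter *from, loff_t pos)
{
	size_t count = iov_iter_count(from);
	u64 start = round_down(pos, CEPH_FSCRYPT_DATA_UNIT_SIZE);
	u64 end = round_up(pos + count, CEPH_FSCRYPT_DATA_UNIT_SIZE);
	void *buf;
	int ret;

	buf = kvzalloc(end - start, GFP_KERNEL);
	if (!buf)
		return -ENOMEM;

	/* read and decrypt the full data units that the write touches */
	ret = ceph_read_decrypt_range(inode, start, end - start, buf);
	if (ret < 0)
		goto out;

	/* splice the new data from userland into the plaintext buffer */
	if (!copy_from_iter_full(buf + (pos - start), count, from)) {
		ret = -EFAULT;
		goto out;
	}

	/* re-encrypt and write back, gated on the old contents (cmp_extent) */
	ret = ceph_encrypt_write_range_cmpext(inode, start, end - start, buf);
out:
	kvfree(buf);
	return ret < 0 ? ret : count;
}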

> If it really is a problem, by "fscrypt's reliance on inode->i_blkbits" are you
> specifically referring to fscrypt_encrypt_pagecache_blocks() and
> fscrypt_decrypt_pagecache_blocks()?  If so, I think the way to go would be to
> add __fscrypt_encrypt_pagecache_blocks() and
> __fscrypt_decrypt_pagecache_blocks() which have a blkbits argument.
> 
> Or alternatively just add a blkbits argument to the existing functions, but I'd
> prefer to avoid adding error-prone arguments to all callers of these.
> 
> fscrypt_encrypt_block_inplace() does in-place encryption, which isn't what you
> want because you want to encrypt into a bounce page, right?
> fscrypt_encrypt_block_inplace() and fscrypt_decrypt_block_inplace() also take
> too many arguments, including lblk_num, which is error-prone.

Ok, got it, thanks.

We do want to encrypt into bounce pages that we can pass to the
messenger engine when doing writeback. We should be able to decrypt in
place in most cases though.
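
i.e. roughly the same pattern the block-based filesystems use, assuming
a 4k data unit and 4k pages:

	/* writeback: encrypt into a bounce page to hand off to the messenger */
	bounce_page = fscrypt_encrypt_pagecache_blocks(page, PAGE_SIZE, 0,
						       GFP_NOFS);
	if (IS_ERR(bounce_page))
		return PTR_ERR(bounce_page);

	/* read completion: decrypt the pagecache page in place */
	err = fscrypt_decrypt_pagecache_blocks(page, PAGE_SIZE, 0);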
-- 
Jeff Layton <jlayton@kernel.org>



* Re: fscrypt, i_blkbits and network filesystems
  2020-10-09 20:16   ` Jeff Layton
@ 2020-10-09 21:50     ` Eric Biggers
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Biggers @ 2020-10-09 21:50 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-fscrypt

On Fri, Oct 09, 2020 at 04:16:38PM -0400, Jeff Layton wrote:
> On Thu, 2020-10-08 at 10:46 -0700, Eric Biggers wrote:
> > 
> > First, you should avoid using "PAGE_SIZE" as the crypto data unit size, since
> > PAGE_SIZE isn't the same everywhere.  E.g. PAGE_SIZE is 4096 bytes on x86, but
> > usually 65536 bytes on PowerPC.  If encrypted files are created on x86, they
> > should be readable on PowerPC too, and vice versa.  That means the crypto data
> > unit size should be a specific value, generally 4096 bytes.  But other
> > power-of-2 sizes could be allowed too.
> > 
> 
> Ok, good point.
> 
> Pardon my lack of crypto knowledge, but I assume we have to ensure that
> we use the same crypto block size everywhere for the same inode as well?
> i.e., I can't encrypt a 4k block and then read in and decrypt a 16 byte
> chunk of it?

That's basically correct.  As I mentioned earlier: For AES-XTS specifically,
*in principle* it's possible to encrypt/decrypt an individual 16-byte aligned
region.  But Linux's crypto API doesn't currently support sub-message crypto,
and also fscrypt supports the AES-CBC and Adiantum encryption modes which have
stricter requirements.

> > Second, I'm not really understanding what the problem is with setting i_blkbits
> > for IS_ENCRYPTED() inodes to the log2 of the crypto data unit size.  Wouldn't
> > that be the right thing to do?  Even though it wouldn't have any meaning for the
> > server, it would have a meaning for the client -- it would be the granularity of
> > encryption (and decryption).
> > 
> 
> It's not a huge problem. I was thinking there might be an issue with
> some applications, but I don't think it really matters. The blocksize
> reported by stat is sort of a nebulous concept anyway when you get to a
> network filesystem.
> 
> The only real problem we have is that an application might pass down an
> I/O that is smaller than 4k, but we haven't been granted the capability
> to do buffered I/O. In that situation, we'll need to read what's there
> now (if anything) and then dispatch a synchronous write op that is gated
> on that data not having changed. 
> 
> There's some benefit to dealing with as small a chunk of data as we can,
> but when that's not possible, 4k is probably a reasonable size to work
> with in most cases.

Applications can do reads/writes of any length regardless of what they see in
stat::st_blksize.  So you're going to have to support reads/writes with length
less than the data unit size (granularity of encryption) anyway.

You can choose whatever data unit size you want; it's a trade-off between the
fixed overhead of doing each encryption/decryption operation, and the
granularity of I/O that you want to support.  I'd assume that 4096 bytes would
be a good compromise for ceph, like it is for the other filesystems.  It also
matches PAGE_SIZE on most platforms.  But it's possible that something else
would be better.

- Eric


