ceph-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* ceph-mds infrastructure for fscrypt
@ 2021-04-22 18:18 Jeff Layton
  2021-04-23 11:46 ` Luis Henriques
  2021-04-29 23:46 ` Patrick Donnelly
  0 siblings, 2 replies; 12+ messages in thread
From: Jeff Layton @ 2021-04-22 18:18 UTC (permalink / raw)
  To: dev, ceph-devel
  Cc: Luis Henriques, Patrick Donnelly, Xiubo Li, Gregory Farnum,
	Douglas Fuller

tl;dr: we need to change the MDS infrastructure for fscrypt (again), and
I want to do it in a way that would clean up some existing mess and more
easily allow for future changes. The design is a bit odd though...

Sorry for the long email here, but I needed communicate this design, and
the rationale for the changes I'm proposing. First, the rationale:

I've been (intermittently) working on the fscrypt implementation for
cephfs, and have posted a few different draft proposals for the first
part of it [1], which rely on a couple of changes in the MDS:

- the alternate_names feature [2]. This is needed to handle extra-long
  filenames without allowing unprintable characters in the filename.

- setting an "fscrypted" flag if the inode has an fscrypt context blob
  in encryption.ctx xattr [3].

With the filenames part more or less done, the next steps are to plumb
in content encryption. Because the MDS handles truncates, we have to
teach it to align those on fscrypt block boundaries. Rather than foist
those details onto the MDS, the current idea is to add an opaque blob to
the inode that would get updated along with size changes. The client
would be responsible for filling out that field with the actual i_size,
and would always round the existing size field up to the end of the last
crypto block. That keeps the real size opaque to the MDS and the
existing size handling logic should "just work". Regardless, that means
we need another inode field for the size.

Storing the context in an xattr is also proving to be problematic [4].
There are some situations where we can end up with an inode that is
flagged as encrypted but doesn't have the caps to trust its xattrs. We
could just treat "encryption.ctx" as special and not require Xs caps to
read whatever cached value we have, and that might fix that issue, but
I'm not fully convinced that's foolproof. We might end up with no cached
context on a directory that is actually encrypted in some cases and not
have a context.

At this point, I'm thinking it might be best to unify all of the 
per-inode info into a single field that the MDS would treat as opaque.
Note that the alternate_names feature would remain more or less
untouched since it's associated more with dentries than inodes.

The initial version of this field would look something like this:

struct ceph_fscrypt_context {
	u8				version;	// == 1
	struct fscrypt_context_v2	fscrypt_ctx;	// 40 bytes
	__le32				blocksize	// 4k for now
	__le64				size;		// "real"
i_size
};

The MDS would send this along with any size updates (InodeStat, and
MClientCaps replies). The client would need to send this in cap
flushes/updates, and we'd also need to extend the SETATTR op too, so the
client can update this field in truncates (at least).

I don't look forward to having to plumb this into all of the different
client ops that can create inodes though. What I'm thinking we might
want to do is expose this field as the "ceph.fscrypt" vxattr.

The client can stuff that into the xattr blob when creating a new inode,
and the MDS can scrape it out of that and move the data into the correct
field in the inode. A setxattr on this field would update the new field
too. It's an ugly interface, but shouldn't be too bad to handle and we
have some precedent for this sort of thing.

The rules for handling the new field in the client would be a bit weird
though. We'll need to allow it to reading the fscrypt_ctx part without
any caps (since that should be static once it's set), but the size
handling needs to be under the same caps as the traditional size field
(Is that Fsx? The rules for this are never quite clear to me.)

Would it be better to have two different fields here -- fscrypt_auth and
fscrypt_file? Or maybe, fscrypt_static/_dynamic? We don't necessarily
need to keep all of this info together, but it seemed neater that way.

Thoughts? Opinions? Is this a horrible idea? What would be better?

Thanks,
-- 
Jeff Layton <jlayton@redhat.com>

[1]: latest draft was posted here:
https://lore.kernel.org/ceph-devel/53d5bebb28c1e0cd354a336a56bf103d5e3a6344.camel@kernel.org/T/#t
[2]: https://github.com/ceph/ceph/pull/37297
[3]:
https://github.com/ceph/ceph/commit/7fe1c57846a42443f0258fd877d7166f33fd596f
[4]:
https://lore.kernel.org/ceph-devel/53d5bebb28c1e0cd354a336a56bf103d5e3a6344.camel@kernel.org/T/#m0f7bbed6280623d761b8b4e70671ed568535d7fa



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ceph-mds infrastructure for fscrypt
  2021-04-22 18:18 ceph-mds infrastructure for fscrypt Jeff Layton
@ 2021-04-23 11:46 ` Luis Henriques
  2021-04-23 12:27   ` Jeff Layton
  2021-04-29 23:46 ` Patrick Donnelly
  1 sibling, 1 reply; 12+ messages in thread
From: Luis Henriques @ 2021-04-23 11:46 UTC (permalink / raw)
  To: Jeff Layton
  Cc: dev, ceph-devel, Luis Henriques, Patrick Donnelly, Xiubo Li,
	Gregory Farnum, Douglas Fuller

Jeff Layton <jlayton@redhat.com> writes:

> tl;dr: we need to change the MDS infrastructure for fscrypt (again), and
> I want to do it in a way that would clean up some existing mess and more
> easily allow for future changes. The design is a bit odd though...

Thanks for summarizing this issue in an email.  It really helps to see the
full picture.

> Sorry for the long email here, but I needed communicate this design, and
> the rationale for the changes I'm proposing. First, the rationale:
>
> I've been (intermittently) working on the fscrypt implementation for
> cephfs, and have posted a few different draft proposals for the first
> part of it [1], which rely on a couple of changes in the MDS:
>
> - the alternate_names feature [2]. This is needed to handle extra-long
>   filenames without allowing unprintable characters in the filename.
>
> - setting an "fscrypted" flag if the inode has an fscrypt context blob
>   in encryption.ctx xattr [3].
>
> With the filenames part more or less done, the next steps are to plumb
> in content encryption. Because the MDS handles truncates, we have to
> teach it to align those on fscrypt block boundaries. Rather than foist
> those details onto the MDS, the current idea is to add an opaque blob to
> the inode that would get updated along with size changes. The client
> would be responsible for filling out that field with the actual i_size,
> and would always round the existing size field up to the end of the last
> crypto block. That keeps the real size opaque to the MDS and the
> existing size handling logic should "just work". Regardless, that means
> we need another inode field for the size.
>
> Storing the context in an xattr is also proving to be problematic [4].
> There are some situations where we can end up with an inode that is
> flagged as encrypted but doesn't have the caps to trust its xattrs. We
> could just treat "encryption.ctx" as special and not require Xs caps to
> read whatever cached value we have, and that might fix that issue, but
> I'm not fully convinced that's foolproof. We might end up with no cached
> context on a directory that is actually encrypted in some cases and not
> have a context.
>
> At this point, I'm thinking it might be best to unify all of the 
> per-inode info into a single field that the MDS would treat as opaque.
> Note that the alternate_names feature would remain more or less
> untouched since it's associated more with dentries than inodes.
>
> The initial version of this field would look something like this:
>
> struct ceph_fscrypt_context {
> 	u8				version;	// == 1
> 	struct fscrypt_context_v2	fscrypt_ctx;	// 40 bytes
> 	__le32				blocksize	// 4k for now
> 	__le64				size;		// "real"
> i_size
> };
>
> The MDS would send this along with any size updates (InodeStat, and
> MClientCaps replies). The client would need to send this in cap
> flushes/updates, and we'd also need to extend the SETATTR op too, so the
> client can update this field in truncates (at least).
>
> I don't look forward to having to plumb this into all of the different
> client ops that can create inodes though. What I'm thinking we might
> want to do is expose this field as the "ceph.fscrypt" vxattr.
>
> The client can stuff that into the xattr blob when creating a new inode,
> and the MDS can scrape it out of that and move the data into the correct
> field in the inode. A setxattr on this field would update the new field
> too. It's an ugly interface, but shouldn't be too bad to handle and we
> have some precedent for this sort of thing.

I don't really have an objection for this, but I'm not sure I understand
why we would want to have this as a vxattr if the it will really be stored
in the inode.  Will this make things easier on the client side?  Or is
that just a matter of having visibility into these fields?

> The rules for handling the new field in the client would be a bit weird
> though. We'll need to allow it to reading the fscrypt_ctx part without
> any caps (since that should be static once it's set), but the size

The PIN cap seems to fit here as the ctx can be considered an "immutable"
field to some extent.  This means that, if there's a context, it's safe to
assume it's valid.

If it's possible to request PIN caps to be revoked (is it?), a client
could simply do that when a directory is initially encrypted.  After that,
any client getting PIN caps for it will have the new fscrypt_ctx.

> handling needs to be under the same caps as the traditional size field
> (Is that Fsx? The rules for this are never quite clear to me.)

 [ A different question which is maybe a bit OT in this context but that
   pops up in my mind quite often is how to handle multiple writers to the
   same file.  It should be OK to have 2 clients doing O_DIRECT as long as
   they are writing to different encryption blocks but it's still tricky,
   isn't it?  Can't we end-up with a corrupted file that is completely
   unrecoverable?  Should O_DIRECT be forbid in encrypted inodes?  Not to
   mention LAZYIO... ]

> Would it be better to have two different fields here -- fscrypt_auth and
> fscrypt_file? Or maybe, fscrypt_static/_dynamic? We don't necessarily
> need to keep all of this info together, but it seemed neater that way.
>
> Thoughts? Opinions? Is this a horrible idea? What would be better?

I've been looping through this since yesterday and I'm convinced the
design will need to be something close to what you just described.  I
can't poke holes in it, but the devil is always on the details.

Cheers,
-- 
Luis

>
> Thanks,
> -- 
> Jeff Layton <jlayton@redhat.com>
>
> [1]: latest draft was posted here:
> https://lore.kernel.org/ceph-devel/53d5bebb28c1e0cd354a336a56bf103d5e3a6344.camel@kernel.org/T/#t
> [2]: https://github.com/ceph/ceph/pull/37297
> [3]:
> https://github.com/ceph/ceph/commit/7fe1c57846a42443f0258fd877d7166f33fd596f
> [4]:
> https://lore.kernel.org/ceph-devel/53d5bebb28c1e0cd354a336a56bf103d5e3a6344.camel@kernel.org/T/#m0f7bbed6280623d761b8b4e70671ed568535d7fa
>
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ceph-mds infrastructure for fscrypt
  2021-04-23 11:46 ` Luis Henriques
@ 2021-04-23 12:27   ` Jeff Layton
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff Layton @ 2021-04-23 12:27 UTC (permalink / raw)
  To: Luis Henriques
  Cc: dev, ceph-devel, Luis Henriques, Patrick Donnelly, Xiubo Li,
	Gregory Farnum, Douglas Fuller

On Fri, 2021-04-23 at 12:46 +0100, Luis Henriques wrote:
> Jeff Layton <jlayton@redhat.com> writes:
> 
> > tl;dr: we need to change the MDS infrastructure for fscrypt (again), and
> > I want to do it in a way that would clean up some existing mess and more
> > easily allow for future changes. The design is a bit odd though...
> 
> Thanks for summarizing this issue in an email.  It really helps to see the
> full picture.
> 
> > Sorry for the long email here, but I needed communicate this design, and
> > the rationale for the changes I'm proposing. First, the rationale:
> > 
> > I've been (intermittently) working on the fscrypt implementation for
> > cephfs, and have posted a few different draft proposals for the first
> > part of it [1], which rely on a couple of changes in the MDS:
> > 
> > - the alternate_names feature [2]. This is needed to handle extra-long
> >   filenames without allowing unprintable characters in the filename.
> > 
> > - setting an "fscrypted" flag if the inode has an fscrypt context blob
> >   in encryption.ctx xattr [3].
> > 
> > With the filenames part more or less done, the next steps are to plumb
> > in content encryption. Because the MDS handles truncates, we have to
> > teach it to align those on fscrypt block boundaries. Rather than foist
> > those details onto the MDS, the current idea is to add an opaque blob to
> > the inode that would get updated along with size changes. The client
> > would be responsible for filling out that field with the actual i_size,
> > and would always round the existing size field up to the end of the last
> > crypto block. That keeps the real size opaque to the MDS and the
> > existing size handling logic should "just work". Regardless, that means
> > we need another inode field for the size.
> > 
> > Storing the context in an xattr is also proving to be problematic [4].
> > There are some situations where we can end up with an inode that is
> > flagged as encrypted but doesn't have the caps to trust its xattrs. We
> > could just treat "encryption.ctx" as special and not require Xs caps to
> > read whatever cached value we have, and that might fix that issue, but
> > I'm not fully convinced that's foolproof. We might end up with no cached
> > context on a directory that is actually encrypted in some cases and not
> > have a context.
> > 
> > At this point, I'm thinking it might be best to unify all of the 
> > per-inode info into a single field that the MDS would treat as opaque.
> > Note that the alternate_names feature would remain more or less
> > untouched since it's associated more with dentries than inodes.
> > 
> > The initial version of this field would look something like this:
> > 
> > struct ceph_fscrypt_context {
> > 	u8				version;	// == 1
> > 	struct fscrypt_context_v2	fscrypt_ctx;	// 40 bytes
> > 	__le32				blocksize	// 4k for now
> > 	__le64				size;		// "real"
> > i_size
> > };
> > 
> > The MDS would send this along with any size updates (InodeStat, and
> > MClientCaps replies). The client would need to send this in cap
> > flushes/updates, and we'd also need to extend the SETATTR op too, so the
> > client can update this field in truncates (at least).
> > 
> > I don't look forward to having to plumb this into all of the different
> > client ops that can create inodes though. What I'm thinking we might
> > want to do is expose this field as the "ceph.fscrypt" vxattr.
> > 
> > The client can stuff that into the xattr blob when creating a new inode,
> > and the MDS can scrape it out of that and move the data into the correct
> > field in the inode. A setxattr on this field would update the new field
> > too. It's an ugly interface, but shouldn't be too bad to handle and we
> > have some precedent for this sort of thing.
> 
> I don't really have an objection for this, but I'm not sure I understand
> why we would want to have this as a vxattr if the it will really be stored
> in the inode.  Will this make things easier on the client side?  Or is
> that just a matter of having visibility into these fields?
> 

Mainly because I don't want to have to extend a bunch of MDS calls. We
need to ship the context along with every create, which means we'd need
to extend the MClientRequest calls for openc, mkdir, mknod, symlink,
etc. Since we already send an xattr blob with all of those, we can just
stuff this into there and avoid having to extend them all.

Being able to fetch the value with getxattr is sort of secondary, but
it's nice too.

> The rules for handling the new field in the client would be a bitweird
> > though. We'll need to allow it to reading the fscrypt_ctx part without
> > any caps (since that should be static once it's set), but the size
> 
> The PIN cap seems to fit here as the ctx can be considered an "immutable"
> field to some extent.  This means that, if there's a context, it's safe to
> assume it's valid.
> 
> If it's possible to request PIN caps to be revoked (is it?), a client
> could simply do that when a directory is initially encrypted.  After that,
> any client getting PIN caps for it will have the new fscrypt_ctx.
> 

The client doesn't get to request a revocation from another client. It
can request caps itself (or do a regular MDS request), and the MDS may
revoke them from another client if they conflict. We'd need this to be
under some sort of cap with "exclusive" semantics, I think.

> > handling needs to be under the same caps as the traditional size field
> > (Is that Fsx? The rules for this are never quite clear to me.)
> 
>  [ A different question which is maybe a bit OT in this context but that
>    pops up in my mind quite often is how to handle multiple writers to the
>    same file.  It should be OK to have 2 clients doing O_DIRECT as long as
>    they are writing to different encryption blocks but it's still tricky,
>    isn't it?  Can't we end-up with a corrupted file that is completely
>    unrecoverable?  Should O_DIRECT be forbid in encrypted inodes?  Not to
>    mention LAZYIO... ]
> 

If the client doesn't have Fb caps and wants to write a partial block,
then it'll have to do a read/modify/write cycle, and the write will have
to assert that the object version hasn't changed since read.

So basically:

- read the complete block and get the object version
- decrypt it
- modify and reencrypt it
- do a write but preface the OSD write with a version assert that the
data hasn't changed

If we hit the version assertion, just do the whole thing all over again.
Terribly inefficient, but it should be safe.

I'll also note that the rados libraries have a way to assert on an
extent not having changed, so we could potentially do this a bit more
efficiently if we have multiple writers writing different blocks in the
same object.

> > Would it be better to have two different fields here -- fscrypt_auth and
> > fscrypt_file? Or maybe, fscrypt_static/_dynamic? We don't necessarily
> > need to keep all of this info together, but it seemed neater that way.
> > 
> > Thoughts? Opinions? Is this a horrible idea? What would be better?
> 
> I've been looping through this since yesterday and I'm convinced the
> design will need to be something close to what you just described.  I
> can't poke holes in it, but the devil is always on the details.
> 

Indeed. Please do let me know if you see anything. Thanks for the
thoughts on the matter!

-- 
Jeff Layton <jlayton@redhat.com>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ceph-mds infrastructure for fscrypt
  2021-04-22 18:18 ceph-mds infrastructure for fscrypt Jeff Layton
  2021-04-23 11:46 ` Luis Henriques
@ 2021-04-29 23:46 ` Patrick Donnelly
  2021-04-30 13:45   ` Jeff Layton
  1 sibling, 1 reply; 12+ messages in thread
From: Patrick Donnelly @ 2021-04-29 23:46 UTC (permalink / raw)
  To: Jeff Layton
  Cc: dev, Ceph Development, Luis Henriques, Xiubo Li, Gregory Farnum,
	Douglas Fuller

Hi Jeff,

On Thu, Apr 22, 2021 at 11:19 AM Jeff Layton <jlayton@redhat.com> wrote:
> At this point, I'm thinking it might be best to unify all of the
> per-inode info into a single field that the MDS would treat as opaque.
> Note that the alternate_names feature would remain more or less
> untouched since it's associated more with dentries than inodes.
>
> The initial version of this field would look something like this:
>
> struct ceph_fscrypt_context {
>         u8                              version;        // == 1
>         struct fscrypt_context_v2       fscrypt_ctx;    // 40 bytes
>         __le32                          blocksize       // 4k for now
>         __le64                          size;           // "real"
> i_size
> };
>
> The MDS would send this along with any size updates (InodeStat, and
> MClientCaps replies). The client would need to send this in cap
> flushes/updates, and we'd also need to extend the SETATTR op too, so the
> client can update this field in truncates (at least).
>
> I don't look forward to having to plumb this into all of the different
> client ops that can create inodes though. What I'm thinking we might
> want to do is expose this field as the "ceph.fscrypt" vxattr.

I think the process for adding the fscrypt bits to the MClientRequest
will be the same as adding alternate_name? In the
ceph_mds_request_head payload. I don't like the idea of stuffing this
data in the xattr map.

> The client can stuff that into the xattr blob when creating a new inode,
> and the MDS can scrape it out of that and move the data into the correct
> field in the inode. A setxattr on this field would update the new field
> too. It's an ugly interface, but shouldn't be too bad to handle and we
> have some precedent for this sort of thing.
>
> The rules for handling the new field in the client would be a bit weird
> though. We'll need to allow it to reading the fscrypt_ctx part without
> any caps (since that should be static once it's set), but the size
> handling needs to be under the same caps as the traditional size field
> (Is that Fsx? The rules for this are never quite clear to me.)
>
> Would it be better to have two different fields here -- fscrypt_auth and
> fscrypt_file? Or maybe, fscrypt_static/_dynamic? We don't necessarily
> need to keep all of this info together, but it seemed neater that way.

I'm not seeing a reason to split the struct.

> Thoughts? Opinions? Is this a horrible idea? What would be better?
>
> Thanks,
> --
> Jeff Layton <jlayton@redhat.com>
>
> [1]: latest draft was posted here:
> https://lore.kernel.org/ceph-devel/53d5bebb28c1e0cd354a336a56bf103d5e3a6344.camel@kernel.org/T/#t
> [2]: https://github.com/ceph/ceph/pull/37297
> [3]:
> https://github.com/ceph/ceph/commit/7fe1c57846a42443f0258fd877d7166f33fd596f
> [4]:
> https://lore.kernel.org/ceph-devel/53d5bebb28c1e0cd354a336a56bf103d5e3a6344.camel@kernel.org/T/#m0f7bbed6280623d761b8b4e70671ed568535d7fa
>
>


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ceph-mds infrastructure for fscrypt
  2021-04-29 23:46 ` Patrick Donnelly
@ 2021-04-30 13:45   ` Jeff Layton
  2021-04-30 14:20     ` Patrick Donnelly
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Layton @ 2021-04-30 13:45 UTC (permalink / raw)
  To: Patrick Donnelly
  Cc: dev, Ceph Development, Luis Henriques, Xiubo Li, Gregory Farnum,
	Douglas Fuller

On Thu, 2021-04-29 at 16:46 -0700, Patrick Donnelly wrote:
> Hi Jeff,
> 
> On Thu, Apr 22, 2021 at 11:19 AM Jeff Layton <jlayton@redhat.com> wrote:
> > At this point, I'm thinking it might be best to unify all of the
> > per-inode info into a single field that the MDS would treat as opaque.
> > Note that the alternate_names feature would remain more or less
> > untouched since it's associated more with dentries than inodes.
> > 
> > The initial version of this field would look something like this:
> > 
> > struct ceph_fscrypt_context {
> >         u8                              version;        // == 1
> >         struct fscrypt_context_v2       fscrypt_ctx;    // 40 bytes
> >         __le32                          blocksize       // 4k for now
> >         __le64                          size;           // "real"
> > i_size
> > };
> > 
> > The MDS would send this along with any size updates (InodeStat, and
> > MClientCaps replies). The client would need to send this in cap
> > flushes/updates, and we'd also need to extend the SETATTR op too, so the
> > client can update this field in truncates (at least).
> > 
> > I don't look forward to having to plumb this into all of the different
> > client ops that can create inodes though. What I'm thinking we might
> > want to do is expose this field as the "ceph.fscrypt" vxattr.
> 
> I think the process for adding the fscrypt bits to the MClientRequest
> will be the same as adding alternate_name? In the
> ceph_mds_request_head payload. I don't like the idea of stuffing this
> data in the xattr map.
> 

That does sound considerably less hacky. I'll look into doing that.

> > The client can stuff that into the xattr blob when creating a new inode,
> > and the MDS can scrape it out of that and move the data into the correct
> > field in the inode. A setxattr on this field would update the new field
> > too. It's an ugly interface, but shouldn't be too bad to handle and we
> > have some precedent for this sort of thing.
> > 
> > The rules for handling the new field in the client would be a bit weird
> > though. We'll need to allow it to reading the fscrypt_ctx part without
> > any caps (since that should be static once it's set), but the size
> > handling needs to be under the same caps as the traditional size field
> > (Is that Fsx? The rules for this are never quite clear to me.)
> > 
> > Would it be better to have two different fields here -- fscrypt_auth and
> > fscrypt_file? Or maybe, fscrypt_static/_dynamic? We don't necessarily
> > need to keep all of this info together, but it seemed neater that way.
> 
> I'm not seeing a reason to split the struct.
> 

What caps should this live under? We have different requirements for
different parts of the struct.

1) fscrypt context: needs to be always available, especially when an
inode is initially instantiated, though it should almost always be
static once it's set. The exception is that an empty directory can grow
a new context when it's first encrypted, and we'll want other clients to
pick up on this change when it occurs.

2) "real" size: needs to be under Fwx, I think (though I need to look
more closely at the truncation path to be sure).

...and that's not even considering what rules we might want in the
future for other info we stuff into here. Given that the MDS needs to
treat this as opaque, what locks/caps should cover this new field?

> > Thoughts? Opinions? Is this a horrible idea? What would be better?
> > 
> > Thanks,
> > --
> > Jeff Layton <jlayton@redhat.com>
> > 
> > [1]: latest draft was posted here:
> > https://lore.kernel.org/ceph-devel/53d5bebb28c1e0cd354a336a56bf103d5e3a6344.camel@kernel.org/T/#t
> > [2]: https://github.com/ceph/ceph/pull/37297
> > [3]:
> > https://github.com/ceph/ceph/commit/7fe1c57846a42443f0258fd877d7166f33fd596f
> > [4]:
> > https://lore.kernel.org/ceph-devel/53d5bebb28c1e0cd354a336a56bf103d5e3a6344.camel@kernel.org/T/#m0f7bbed6280623d761b8b4e70671ed568535d7fa
> > 
> > 
> 
> 

-- 
Jeff Layton <jlayton@redhat.com>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ceph-mds infrastructure for fscrypt
  2021-04-30 13:45   ` Jeff Layton
@ 2021-04-30 14:20     ` Patrick Donnelly
  2021-04-30 14:33       ` Jeff Layton
  0 siblings, 1 reply; 12+ messages in thread
From: Patrick Donnelly @ 2021-04-30 14:20 UTC (permalink / raw)
  To: Jeff Layton
  Cc: dev, Ceph Development, Luis Henriques, Xiubo Li, Gregory Farnum,
	Douglas Fuller

On Fri, Apr 30, 2021 at 6:45 AM Jeff Layton <jlayton@redhat.com> wrote:
> > > The client can stuff that into the xattr blob when creating a new inode,
> > > and the MDS can scrape it out of that and move the data into the correct
> > > field in the inode. A setxattr on this field would update the new field
> > > too. It's an ugly interface, but shouldn't be too bad to handle and we
> > > have some precedent for this sort of thing.
> > >
> > > The rules for handling the new field in the client would be a bit weird
> > > though. We'll need to allow it to reading the fscrypt_ctx part without
> > > any caps (since that should be static once it's set), but the size
> > > handling needs to be under the same caps as the traditional size field
> > > (Is that Fsx? The rules for this are never quite clear to me.)
> > >
> > > Would it be better to have two different fields here -- fscrypt_auth and
> > > fscrypt_file? Or maybe, fscrypt_static/_dynamic? We don't necessarily
> > > need to keep all of this info together, but it seemed neater that way.
> >
> > I'm not seeing a reason to split the struct.
> >
>
> What caps should this live under? We have different requirements for
> different parts of the struct.
>
> 1) fscrypt context: needs to be always available, especially when an
> inode is initially instantiated, though it should almost always be
> static once it's set. The exception is that an empty directory can grow
> a new context when it's first encrypted, and we'll want other clients to
> pick up on this change when it occurs.

Do clients need to see this when not reading/writing to the file?

> 2) "real" size: needs to be under Fwx, I think (though I need to look
> more closely at the truncation path to be sure).

Frs would need the size as well.

> ...and that's not even considering what rules we might want in the
> future for other info we stuff into here. Given that the MDS needs to
> treat this as opaque, what locks/caps should cover this new field?

I think because the encryption context is used for reads/writes, it
can fall under the same lock domain as the file size. I don't see a
need (yet) for accessing e.g. the encrypted version/blocksize outside
of the Fsx cap. It's good to think about though and I wonder if anyone
else has thoughts on it.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ceph-mds infrastructure for fscrypt
  2021-04-30 14:20     ` Patrick Donnelly
@ 2021-04-30 14:33       ` Jeff Layton
  2021-04-30 14:45         ` Patrick Donnelly
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Layton @ 2021-04-30 14:33 UTC (permalink / raw)
  To: Patrick Donnelly
  Cc: dev, Ceph Development, Luis Henriques, Xiubo Li, Gregory Farnum,
	Douglas Fuller

On Fri, 2021-04-30 at 07:20 -0700, Patrick Donnelly wrote:
> On Fri, Apr 30, 2021 at 6:45 AM Jeff Layton <jlayton@redhat.com> wrote:
> > > > The client can stuff that into the xattr blob when creating a new inode,
> > > > and the MDS can scrape it out of that and move the data into the correct
> > > > field in the inode. A setxattr on this field would update the new field
> > > > too. It's an ugly interface, but shouldn't be too bad to handle and we
> > > > have some precedent for this sort of thing.
> > > > 
> > > > The rules for handling the new field in the client would be a bit weird
> > > > though. We'll need to allow it to reading the fscrypt_ctx part without
> > > > any caps (since that should be static once it's set), but the size
> > > > handling needs to be under the same caps as the traditional size field
> > > > (Is that Fsx? The rules for this are never quite clear to me.)
> > > > 
> > > > Would it be better to have two different fields here -- fscrypt_auth and
> > > > fscrypt_file? Or maybe, fscrypt_static/_dynamic? We don't necessarily
> > > > need to keep all of this info together, but it seemed neater that way.
> > > 
> > > I'm not seeing a reason to split the struct.
> > > 
> > 
> > What caps should this live under? We have different requirements for
> > different parts of the struct.
> > 
> > 1) fscrypt context: needs to be always available, especially when an
> > inode is initially instantiated, though it should almost always be
> > static once it's set. The exception is that an empty directory can grow
> > a new context when it's first encrypted, and we'll want other clients to
> > pick up on this change when it occurs.
> 
> Do clients need to see this when not reading/writing to the file?
> 

Generally, yes. It's used for regular files when reading/writing,
directories for accessing their contents, and for encrypting/decrypting
symlink contents.


> > 2) "real" size: needs to be under Fwx, I think (though I need to look
> > more closely at the truncation path to be sure).
> 
> Frs would need the size as well.
>

Correct, I was speaking more about what you'd need to cache changes to
it. Reads would indeed need Fr or Fs.

> > ...and that's not even considering what rules we might want in the
> > future for other info we stuff into here. Given that the MDS needs to
> > treat this as opaque, what locks/caps should cover this new field?
> 
> I think because the encryption context is used for reads/writes, it
> can fall under the same lock domain as the file size. I don't see a
> need (yet) for accessing e.g. the encrypted version/blocksize outside
> of the Fsx cap. It's good to think about though and I wonder if anyone
> else has thoughts on it.
> 

We specifically need this for directories and symlinks during pathwalks
too. Eventually we may also want to encrypt certain data for other inode
types as well (e.g. block/char devices). That's less critical though.

The problem with fetching it after the inode is first instantiated is
that we can end up recursing into a separate request while encoding a
path. For instance, see this stack trace that Luis reported:
https://lore.kernel.org/ceph-devel/53d5bebb28c1e0cd354a336a56bf103d5e3a6344.camel@kernel.org/T/#m0f7bbed6280623d761b8b4e70671ed568535d7fa

While that implementation stored the context in an xattr, the problem
isstill the same if you have to fetch the context in the middle of
building a path. The best solution is just to always ensure it's
available.

-- 
Jeff Layton <jlayton@redhat.com>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ceph-mds infrastructure for fscrypt
  2021-04-30 14:33       ` Jeff Layton
@ 2021-04-30 14:45         ` Patrick Donnelly
  2021-04-30 15:03           ` Jeff Layton
  0 siblings, 1 reply; 12+ messages in thread
From: Patrick Donnelly @ 2021-04-30 14:45 UTC (permalink / raw)
  To: Jeff Layton
  Cc: dev, Ceph Development, Luis Henriques, Xiubo Li, Gregory Farnum,
	Douglas Fuller

On Fri, Apr 30, 2021 at 7:33 AM Jeff Layton <jlayton@redhat.com> wrote:
> We specifically need this for directories and symlinks during pathwalks
> too. Eventually we may also want to encrypt certain data for other inode
> types as well (e.g. block/char devices). That's less critical though.
>
> The problem with fetching it after the inode is first instantiated is
> that we can end up recursing into a separate request while encoding a
> path. For instance, see this stack trace that Luis reported:
> https://lore.kernel.org/ceph-devel/53d5bebb28c1e0cd354a336a56bf103d5e3a6344.camel@kernel.org/T/#m0f7bbed6280623d761b8b4e70671ed568535d7fa
>
> While that implementation stored the context in an xattr, the problem
> isstill the same if you have to fetch the context in the middle of
> building a path. The best solution is just to always ensure it's
> available.

Got it. Splitting the struct makes sense then. The pin cap would be
suitable for the immutable encryption context (if truly
immutable?).Otherwise maybe the Axs cap?


--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ceph-mds infrastructure for fscrypt
  2021-04-30 14:45         ` Patrick Donnelly
@ 2021-04-30 15:03           ` Jeff Layton
  2021-05-01  1:03             ` Patrick Donnelly
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Layton @ 2021-04-30 15:03 UTC (permalink / raw)
  To: Patrick Donnelly
  Cc: dev, Ceph Development, Luis Henriques, Xiubo Li, Gregory Farnum,
	Douglas Fuller

On Fri, 2021-04-30 at 07:45 -0700, Patrick Donnelly wrote:
> On Fri, Apr 30, 2021 at 7:33 AM Jeff Layton <jlayton@redhat.com> wrote:
> > We specifically need this for directories and symlinks during pathwalks
> > too. Eventually we may also want to encrypt certain data for other inode
> > types as well (e.g. block/char devices). That's less critical though.
> > 
> > The problem with fetching it after the inode is first instantiated is
> > that we can end up recursing into a separate request while encoding a
> > path. For instance, see this stack trace that Luis reported:
> > https://lore.kernel.org/ceph-devel/53d5bebb28c1e0cd354a336a56bf103d5e3a6344.camel@kernel.org/T/#m0f7bbed6280623d761b8b4e70671ed568535d7fa
> > 
> > While that implementation stored the context in an xattr, the problem
> > isstill the same if you have to fetch the context in the middle of
> > building a path. The best solution is just to always ensure it's
> > available.
> 
> Got it. Splitting the struct makes sense then. The pin cap would be
> suitable for the immutable encryption context (if truly
> immutable?).Otherwise maybe the Axs cap?
> 

Ok. In that case, then we probably need to put the context blob under
AUTH caps so we can ensure that it's consulted during the permission
checks for pathwalks. The size will need to live under FILE.

Now for the hard part...what do we name these fields?

    fscrypt_context
    fscrypt_size

...or maybe...

    fscrypt_auth
    fscrypt_file

Since they'll be vector blobs, we can version these too so that we can
add other fields later if the need arises (even for non-fscrypt stuff).
Maybe we could consider:

    client_opaque_auth
    client_opaque_file

-- 
Jeff Layton <jlayton@redhat.com>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ceph-mds infrastructure for fscrypt
  2021-04-30 15:03           ` Jeff Layton
@ 2021-05-01  1:03             ` Patrick Donnelly
  2021-05-07 13:07               ` Jeff Layton
  0 siblings, 1 reply; 12+ messages in thread
From: Patrick Donnelly @ 2021-05-01  1:03 UTC (permalink / raw)
  To: Jeff Layton
  Cc: dev, Ceph Development, Luis Henriques, Xiubo Li, Gregory Farnum,
	Douglas Fuller

On Fri, Apr 30, 2021 at 8:04 AM Jeff Layton <jlayton@redhat.com> wrote:
>
> On Fri, 2021-04-30 at 07:45 -0700, Patrick Donnelly wrote:
> > On Fri, Apr 30, 2021 at 7:33 AM Jeff Layton <jlayton@redhat.com> wrote:
> > > We specifically need this for directories and symlinks during pathwalks
> > > too. Eventually we may also want to encrypt certain data for other inode
> > > types as well (e.g. block/char devices). That's less critical though.
> > >
> > > The problem with fetching it after the inode is first instantiated is
> > > that we can end up recursing into a separate request while encoding a
> > > path. For instance, see this stack trace that Luis reported:
> > > https://lore.kernel.org/ceph-devel/53d5bebb28c1e0cd354a336a56bf103d5e3a6344.camel@kernel.org/T/#m0f7bbed6280623d761b8b4e70671ed568535d7fa
> > >
> > > While that implementation stored the context in an xattr, the problem
> > > isstill the same if you have to fetch the context in the middle of
> > > building a path. The best solution is just to always ensure it's
> > > available.
> >
> > Got it. Splitting the struct makes sense then. The pin cap would be
> > suitable for the immutable encryption context (if truly
> > immutable?).Otherwise maybe the Axs cap?
> >
>
> Ok. In that case, then we probably need to put the context blob under
> AUTH caps so we can ensure that it's consulted during the permission
> checks for pathwalks. The size will need to live under FILE.
>
> Now for the hard part...what do we name these fields?
>
>     fscrypt_context
>     fscrypt_size
>
> ...or maybe...
>
>     fscrypt_auth
>     fscrypt_file
>
> Since they'll be vector blobs, we can version these too so that we can
> add other fields later if the need arises (even for non-fscrypt stuff).
> Maybe we could consider:
>
>     client_opaque_auth
>     client_opaque_file

An opaque blob makes sense but you'd want a sentinel indicating it's
an fscrypt blob. Don't think we'd be able to have two competing
use-cases but it'd be nice to have it generic enough for future
encryption libraries maybe.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ceph-mds infrastructure for fscrypt
  2021-05-01  1:03             ` Patrick Donnelly
@ 2021-05-07 13:07               ` Jeff Layton
  2021-05-07 17:15                 ` Patrick Donnelly
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Layton @ 2021-05-07 13:07 UTC (permalink / raw)
  To: Patrick Donnelly
  Cc: dev, Ceph Development, Luis Henriques, Xiubo Li, Gregory Farnum,
	Douglas Fuller

On Fri, 2021-04-30 at 18:03 -0700, Patrick Donnelly wrote:
> On Fri, Apr 30, 2021 at 8:04 AM Jeff Layton <jlayton@redhat.com> wrote:
> > 
> > On Fri, 2021-04-30 at 07:45 -0700, Patrick Donnelly wrote:
> > > On Fri, Apr 30, 2021 at 7:33 AM Jeff Layton <jlayton@redhat.com> wrote:
> > > > We specifically need this for directories and symlinks during pathwalks
> > > > too. Eventually we may also want to encrypt certain data for other inode
> > > > types as well (e.g. block/char devices). That's less critical though.
> > > > 
> > > > The problem with fetching it after the inode is first instantiated is
> > > > that we can end up recursing into a separate request while encoding a
> > > > path. For instance, see this stack trace that Luis reported:
> > > > https://lore.kernel.org/ceph-devel/53d5bebb28c1e0cd354a336a56bf103d5e3a6344.camel@kernel.org/T/#m0f7bbed6280623d761b8b4e70671ed568535d7fa
> > > > 
> > > > While that implementation stored the context in an xattr, the problem
> > > > isstill the same if you have to fetch the context in the middle of
> > > > building a path. The best solution is just to always ensure it's
> > > > available.
> > > 
> > > Got it. Splitting the struct makes sense then. The pin cap would be
> > > suitable for the immutable encryption context (if truly
> > > immutable?).Otherwise maybe the Axs cap?
> > > 
> > 
> > Ok. In that case, then we probably need to put the context blob under
> > AUTH caps so we can ensure that it's consulted during the permission
> > checks for pathwalks. The size will need to live under FILE.
> > 
> > Now for the hard part...what do we name these fields?
> > 
> >     fscrypt_context
> >     fscrypt_size
> > 
> > ...or maybe...
> > 
> >     fscrypt_auth
> >     fscrypt_file
> > 
> > Since they'll be vector blobs, we can version these too so that we can
> > add other fields later if the need arises (even for non-fscrypt stuff).
> > Maybe we could consider:
> > 
> >     client_opaque_auth
> >     client_opaque_file
> 
> An opaque blob makes sense but you'd want a sentinel indicating it's
> an fscrypt blob. Don't think we'd be able to have two competing
> use-cases but it'd be nice to have it generic enough for future
> encryption libraries maybe.
> 

I'm going with fscrypt_auth and fscrypt_file for now. We can rename them
later though if we want. What I'll probably do is just declare a
versioned format for these blobs. The MDS won't care about it, but the
clients can follow that convention.

I've made a bit of progress on this this week (fixing up the encoding
and decoding was a bit of a hassle, fwiw). These fields are associated
with the core inodes. The clients will use SETATTR calls to set them,
though they will also be updated with cap flushes, etc.

I need to be able to validate this feature in userland though and I
don't really want to roll dedicated functions for them. What I may do is
add new vxattrs (ceph.fscrypt_auth and ceph.fscrypt_file) and have those
expose these fields. Doing a setxattr on them will do a SETATTR under
the hood. The alternative is to declare new libcephfs routines for
fetching and setting these.

I'm not terribly crazy about either, but I have a slight preference for
the vxattr since it's something we could replicate in the kernel for
debugging purposes.

Thoughts?
-- 
Jeff Layton <jlayton@redhat.com>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ceph-mds infrastructure for fscrypt
  2021-05-07 13:07               ` Jeff Layton
@ 2021-05-07 17:15                 ` Patrick Donnelly
  0 siblings, 0 replies; 12+ messages in thread
From: Patrick Donnelly @ 2021-05-07 17:15 UTC (permalink / raw)
  To: Jeff Layton
  Cc: dev, Ceph Development, Luis Henriques, Xiubo Li, Gregory Farnum,
	Douglas Fuller

On Fri, May 7, 2021 at 6:07 AM Jeff Layton <jlayton@redhat.com> wrote:
>
> On Fri, 2021-04-30 at 18:03 -0700, Patrick Donnelly wrote:
> > On Fri, Apr 30, 2021 at 8:04 AM Jeff Layton <jlayton@redhat.com> wrote:
> > >
> > > On Fri, 2021-04-30 at 07:45 -0700, Patrick Donnelly wrote:
> > > > On Fri, Apr 30, 2021 at 7:33 AM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > We specifically need this for directories and symlinks during pathwalks
> > > > > too. Eventually we may also want to encrypt certain data for other inode
> > > > > types as well (e.g. block/char devices). That's less critical though.
> > > > >
> > > > > The problem with fetching it after the inode is first instantiated is
> > > > > that we can end up recursing into a separate request while encoding a
> > > > > path. For instance, see this stack trace that Luis reported:
> > > > > https://lore.kernel.org/ceph-devel/53d5bebb28c1e0cd354a336a56bf103d5e3a6344.camel@kernel.org/T/#m0f7bbed6280623d761b8b4e70671ed568535d7fa
> > > > >
> > > > > While that implementation stored the context in an xattr, the problem
> > > > > isstill the same if you have to fetch the context in the middle of
> > > > > building a path. The best solution is just to always ensure it's
> > > > > available.
> > > >
> > > > Got it. Splitting the struct makes sense then. The pin cap would be
> > > > suitable for the immutable encryption context (if truly
> > > > immutable?).Otherwise maybe the Axs cap?
> > > >
> > >
> > > Ok. In that case, then we probably need to put the context blob under
> > > AUTH caps so we can ensure that it's consulted during the permission
> > > checks for pathwalks. The size will need to live under FILE.
> > >
> > > Now for the hard part...what do we name these fields?
> > >
> > >     fscrypt_context
> > >     fscrypt_size
> > >
> > > ...or maybe...
> > >
> > >     fscrypt_auth
> > >     fscrypt_file
> > >
> > > Since they'll be vector blobs, we can version these too so that we can
> > > add other fields later if the need arises (even for non-fscrypt stuff).
> > > Maybe we could consider:
> > >
> > >     client_opaque_auth
> > >     client_opaque_file
> >
> > An opaque blob makes sense but you'd want a sentinel indicating it's
> > an fscrypt blob. Don't think we'd be able to have two competing
> > use-cases but it'd be nice to have it generic enough for future
> > encryption libraries maybe.
> >
>
> I'm going with fscrypt_auth and fscrypt_file for now. We can rename them
> later though if we want. What I'll probably do is just declare a
> versioned format for these blobs. The MDS won't care about it, but the
> clients can follow that convention.
>
> I've made a bit of progress on this this week (fixing up the encoding
> and decoding was a bit of a hassle, fwiw). These fields are associated
> with the core inodes. The clients will use SETATTR calls to set them,
> though they will also be updated with cap flushes, etc.
>
> I need to be able to validate this feature in userland though and I
> don't really want to roll dedicated functions for them. What I may do is
> add new vxattrs (ceph.fscrypt_auth and ceph.fscrypt_file) and have those
> expose these fields. Doing a setxattr on them will do a SETATTR under
> the hood. The alternative is to declare new libcephfs routines for
> fetching and setting these.

A client-side vxattr sounds good to me.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-05-07 17:15 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-22 18:18 ceph-mds infrastructure for fscrypt Jeff Layton
2021-04-23 11:46 ` Luis Henriques
2021-04-23 12:27   ` Jeff Layton
2021-04-29 23:46 ` Patrick Donnelly
2021-04-30 13:45   ` Jeff Layton
2021-04-30 14:20     ` Patrick Donnelly
2021-04-30 14:33       ` Jeff Layton
2021-04-30 14:45         ` Patrick Donnelly
2021-04-30 15:03           ` Jeff Layton
2021-05-01  1:03             ` Patrick Donnelly
2021-05-07 13:07               ` Jeff Layton
2021-05-07 17:15                 ` Patrick Donnelly

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).