All of lore.kernel.org
 help / color / mirror / Atom feed
* [Virtio-fs] Current file handle status and open questions
@ 2021-03-18 17:09 Max Reitz
  2021-03-23 16:39 ` Sergio Lopez
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Max Reitz @ 2021-03-18 17:09 UTC (permalink / raw)
  To: virtio-fs-list

Hi,

As threatened in our last meeting, I’ve written this mail to give an
overview on where we stand with regards to virtiofsd(-rs) using file
handles.

Technically, this should be a reply to the “Securint file handles”
thread, but this mail is so long I think it’s better to split it off.

There are multiple problems that somehow relate to file handles, the
ones I’m aware of are:

(A) We have a problem with too many FDs open.  To solve it, we could
     attach a file handle to each node, then close the FD (as far as
     possible) and reopen it when needed from the file handle.

(B) We want to allow the guest to use persistent file handles.

(C) For live migration, the problem isn’t even clear yet, but it seems
     like we’ll want to translate nodes into their file handles and
     transmit those and open them again on the destination (at least on
     shared filesystems).

Now every case has its own specific tricky bits:

Case (A) is something that we’d really like to have by default, and it
would need to work all the time during runtime.  So the problem here is
that we’d need CAP_DAC_READ_SEARCH, and we’d need it all the time, but
we don’t want that.  One interesting bit is that we don’t need these
file handles to be persistent between virtiofsd invocations.

For (B) we have the problem of needing to protect against a potentially
malicious guest, i.e. that it must not be able to reference files
outside the shared directory.  (Perhaps except for cases where the file
was once reachable, i.e. where a file handle was generated by the guest,
then the file was moved outside of the shared directory, but remains
accessible through the file handle.)  Furthermore, file handles should
really be persistent between virtiofsd invocations.  On the positive
side, it would be easier for us to require CAP_DAC_READ_SEARCH for this
case, because it really is optional.  We could require users to give us
that capability if they want file handles in the guest (and we find no
way to avoid requiring that capability).

(C) needs persistency between source and destination, but on the
positive side, we only need to be able to open file handles during the
in-migrate phase on the destination.  So requiring CAP_DAC_READ_SEARCH
only during that phase might not be that big of a deal.


(Ideally, we’d want all cases to work without CAP_DAC_READ_SEARCH, but
as you can see, that requirement is weakened for cases (B) and
especially (C).)


As far as I’ve understood, (A) is the case that we want to focus on
first, and the main problem there is that we need to open file handles
without CAP_DAC_READ_SEARCH.

To do that, I at one point proposed a service process that has
CAP_DAC_READ_SEARCH and would open file handles for virtiofsd.  But that
probably won’t really be an improvement, because this process too would
probably need to run in the container and so if we can’t give virtiofsd
that capability, we can’t give it to that service process either.

What Miklos proposed was to modify the kernel to allow processes to open
file handles even if they don’t have CAP_DAC_READ_SEARCH as long as
those files are in the process’s scope.  One way to implement this
restriction (in a very restrictive manner) is to only allow opening file
handles that the process has generated before, e.g. by appending a MAC
to every file handle (generated with a process-specific key) and
checking that when opening a handle.  (You would request this MAC with a
new AT_* flag passed to name_to_handle_at().  open_by_handle_at()
recognizes it due to a special file handle type value.)

(Process-specific key = stored in current->files, i.e. the files_struct.
I’m not 100% sure what this is, but I guess this is the structure that
keeps a process’s open file descriptors, and so should generally be
unique to a process, or at least unique to a group of processes that
share the same FDs.)


So far so good.

What I’m now worried about is whether we can make use of that kernel
feature for (B) and (C), too.  I don’t really want to design completely
independent solutions for those problems.

I don’t think it’s important that we come up with exactly plotted
implementations for (B) and (C) now, but a rough sketch on how we might
approach them would be nice.

What we need for (B) is persistency between virtiofsd invocations.  For
(C), we can decide whether we just don’t want to care (i.e., we simply
require CAP_DAC_READ_SEARCH during the in-migrate phase on the
destination), or whether we need some way to transfer the MAC key from
the source to the destination.  (But if we want (B) with live migration,
we will need to transfer the key to the destination.)

Let’s talk about (B).

I can’t imagine a design where the kernel could create persistent keys
without having a user space process store them.  Well, except something
where you measure a program image to derive its specific MAC key from
some persistent platform key, but I don’t think that’s feasible (for
multiple reasons), and it’d also make it impossible to migrate a VM to a
different host and keep guest file handles valid.

So I think we need some way to let a user space process upload a key for
some other key to use.  Vivek has proposed using keyctl(2) for that.
I’m not exactly sure how we’d approach that in practice, though...  I
suppose we’d add the key to some global keyring (write-only), and then
somehow we’d need to let virtiofsd inform the kernel which key to use
for its MACs.  (Perhaps we can do the last bit with keyctl_search(), by
copying a key from the upload keyring to a process-specific keyring,
perhaps even a keyring that exists only to hold the process’s MAC key?)

A problem here is that when a process (a) uploads a key for some other
process (b) to use, you can effectively give (b) CAP_DAC_READ_SEARCH;
because if (a) then shared the key with (b), (b) could forge all of its
MACs and thus bypass the MAC protection (so it effectively gains
CAP_DAC_READ_SEARCH).  To be allowed this, (a) would need a level of
privilege that’s equivalent to being able to grant arbitrary processes
CAP_DAC_READ_SEARCH.  Miklos suggested that we might as well fall back
to requiring to CAP_SYS_ADMIN, and I think that’s reasonable.

The next question is, how would we require this capability for a key
upload?  I don’t think keyctl currently has a solution for that.  I
suppose we would need to have a special global keyring for such MACs and
only allow privileged processes to upload keys there, so once a process
asks for a specific key to be used, we can fetch it from that keyring
and be sure that key has been uploaded by a privileged application.

Then, there’d be the question of how an application would select its key
from the upload keyring.  I think we would need to prevent it from
selecting a key that’s intended for some other application, because then
it could use that other application’s file handles...  Which I don’t
think is a problem if that other application willingly shares that
ability (because in such a case, that other application might as well
just do I/O on the first application’s behalf), but other applications’
keys must not be reachable by guessing.  So perhaps a key description of
high entropy would do (that’s given to keyctl_search()).

I think Dave also suggested using a certificate-based system, but I’m
afraid I don’t fully understand it.  As far as I had guessed, that would
mean the privileged process only configures what CAs are to be trusted,
and then every process could upload its own MAC key along with a
certificate from a trusted CA.  That would get around the whole keyring
stuff, but it means you’d need to be able to provide trusted CAs and
we’d need to check those certificates in the kernel.  I have no idea how
we’d do that, I don’t think keyctl could help us there.

Furthermore, the obvious problem with my interpretation of the
certificate proposal is that the uploading application would see its own
key, which would be bad.  I suspect it would need to be encrypted
somehow, but if we do that, we might as well skip the certificate part
and just let the privileged process upload one or more global encryption
keys, right?  (Applications could then upload encrypted keys with nonces
and a hash, I guess, so the kernel can verify that the key is valid.)


So, overall it seems like it may be workable to extend the in-kernel MAC
verification to allow for persistent keys, but I still have some open
questions, and I don’t know whether I should just defer them until we
get to the point where we need persistency.

(Of note: If we implement something where a user space process (or
multiple in conjunction) can arbitrarily choose a MAC key, then this
will also be usable for live migration, because you can continue to use
the key from the source on the destination.)


Final minor question that doesn’t really fit in fully elsewhere: When
generating a MAC over a file handle, should the mount ID be included?
I’m worried that this might definitely break persistency, but I think we
should include some FS ID.  Maybe the kernel should query the FS UUID
for this MAC calculation, and use that instead of the mount ID?


Thanks for reading and all your help,

Max


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] Current file handle status and open questions
  2021-03-18 17:09 [Virtio-fs] Current file handle status and open questions Max Reitz
@ 2021-03-23 16:39 ` Sergio Lopez
  2021-03-24  7:40   ` Max Reitz
  2021-03-23 19:52 ` Vivek Goyal
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Sergio Lopez @ 2021-03-23 16:39 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs-list

[-- Attachment #1: Type: text/plain, Size: 4626 bytes --]

On Thu, Mar 18, 2021 at 06:09:58PM +0100, Max Reitz wrote:
> Hi,
> 
> As threatened in our last meeting, I’ve written this mail to give an
> overview on where we stand with regards to virtiofsd(-rs) using file
> handles.
> 
> Technically, this should be a reply to the “Securint file handles”
> thread, but this mail is so long I think it’s better to split it off.
> 
> There are multiple problems that somehow relate to file handles, the
> ones I’m aware of are:
> 
> (A) We have a problem with too many FDs open.  To solve it, we could
>     attach a file handle to each node, then close the FD (as far as
>     possible) and reopen it when needed from the file handle.
> 
> (B) We want to allow the guest to use persistent file handles.
> 
> (C) For live migration, the problem isn’t even clear yet, but it seems
>     like we’ll want to translate nodes into their file handles and
>     transmit those and open them again on the destination (at least on
>     shared filesystems).
> 
> Now every case has its own specific tricky bits:
> 
> Case (A) is something that we’d really like to have by default, and it
> would need to work all the time during runtime.  So the problem here is
> that we’d need CAP_DAC_READ_SEARCH, and we’d need it all the time, but
> we don’t want that.  One interesting bit is that we don’t need these
> file handles to be persistent between virtiofsd invocations.
> 
> For (B) we have the problem of needing to protect against a potentially
> malicious guest, i.e. that it must not be able to reference files
> outside the shared directory.  (Perhaps except for cases where the file
> was once reachable, i.e. where a file handle was generated by the guest,
> then the file was moved outside of the shared directory, but remains
> accessible through the file handle.)  Furthermore, file handles should
> really be persistent between virtiofsd invocations.  On the positive
> side, it would be easier for us to require CAP_DAC_READ_SEARCH for this
> case, because it really is optional.  We could require users to give us
> that capability if they want file handles in the guest (and we find no
> way to avoid requiring that capability).
> 
> (C) needs persistency between source and destination, but on the
> positive side, we only need to be able to open file handles during the
> in-migrate phase on the destination.  So requiring CAP_DAC_READ_SEARCH
> only during that phase might not be that big of a deal.
> 
> 
> (Ideally, we’d want all cases to work without CAP_DAC_READ_SEARCH, but
> as you can see, that requirement is weakened for cases (B) and
> especially (C).)
> 
> 
> As far as I’ve understood, (A) is the case that we want to focus on
> first, and the main problem there is that we need to open file handles
> without CAP_DAC_READ_SEARCH.
> 
> To do that, I at one point proposed a service process that has
> CAP_DAC_READ_SEARCH and would open file handles for virtiofsd.  But that
> probably won’t really be an improvement, because this process too would
> probably need to run in the container and so if we can’t give virtiofsd
> that capability, we can’t give it to that service process either.
> 
> What Miklos proposed was to modify the kernel to allow processes to open
> file handles even if they don’t have CAP_DAC_READ_SEARCH as long as
> those files are in the process’s scope.  One way to implement this
> restriction (in a very restrictive manner) is to only allow opening file
> handles that the process has generated before, e.g. by appending a MAC
> to every file handle (generated with a process-specific key) and
> checking that when opening a handle.  (You would request this MAC with a
> new AT_* flag passed to name_to_handle_at().  open_by_handle_at()
> recognizes it due to a special file handle type value.)
> 
> (Process-specific key = stored in current->files, i.e. the files_struct.
> I’m not 100% sure what this is, but I guess this is the structure that
> keeps a process’s open file descriptors, and so should generally be
> unique to a process, or at least unique to a group of processes that
> share the same FDs.)

Does this mean that if we have, let's say, 3 virtiofsd instances each
one "remembering" (because there's a reference in the dentry cache of
the guest they are servicing) 10.000 files, the Host's kernel would
need to keep 30.000 entries holding the host_file+MAC information? Or
is there some kind of VFS trickery at play here that allows to do the
check without holding all that information?

Sergio.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] Current file handle status and open questions
  2021-03-18 17:09 [Virtio-fs] Current file handle status and open questions Max Reitz
  2021-03-23 16:39 ` Sergio Lopez
@ 2021-03-23 19:52 ` Vivek Goyal
  2021-03-24  8:06   ` Max Reitz
  2021-03-24 11:51 ` Stefan Hajnoczi
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Vivek Goyal @ 2021-03-23 19:52 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs-list

On Thu, Mar 18, 2021 at 06:09:58PM +0100, Max Reitz wrote:
> Hi,
> 
> As threatened in our last meeting, I’ve written this mail to give an
> overview on where we stand with regards to virtiofsd(-rs) using file
> handles.
> 
> Technically, this should be a reply to the “Securint file handles”
> thread, but this mail is so long I think it’s better to split it off.
> 
> There are multiple problems that somehow relate to file handles, the
> ones I’m aware of are:
> 
> (A) We have a problem with too many FDs open.  To solve it, we could
>     attach a file handle to each node, then close the FD (as far as
>     possible) and reopen it when needed from the file handle.
> 
> (B) We want to allow the guest to use persistent file handles.
> 
> (C) For live migration, the problem isn’t even clear yet, but it seems
>     like we’ll want to translate nodes into their file handles and
>     transmit those and open them again on the destination (at least on
>     shared filesystems).
> 
> Now every case has its own specific tricky bits:
> 
> Case (A) is something that we’d really like to have by default, and it
> would need to work all the time during runtime.  So the problem here is
> that we’d need CAP_DAC_READ_SEARCH, and we’d need it all the time, but
> we don’t want that.  One interesting bit is that we don’t need these
> file handles to be persistent between virtiofsd invocations.
> 
> For (B) we have the problem of needing to protect against a potentially
> malicious guest, i.e. that it must not be able to reference files
> outside the shared directory.  (Perhaps except for cases where the file
> was once reachable, i.e. where a file handle was generated by the guest,
> then the file was moved outside of the shared directory, but remains
> accessible through the file handle.)  Furthermore, file handles should
> really be persistent between virtiofsd invocations.  On the positive
> side, it would be easier for us to require CAP_DAC_READ_SEARCH for this
> case, because it really is optional.  We could require users to give us
> that capability if they want file handles in the guest (and we find no
> way to avoid requiring that capability).
> 
> (C) needs persistency between source and destination, but on the
> positive side, we only need to be able to open file handles during the
> in-migrate phase on the destination.  So requiring CAP_DAC_READ_SEARCH
> only during that phase might not be that big of a deal.
> 
> 
> (Ideally, we’d want all cases to work without CAP_DAC_READ_SEARCH, but
> as you can see, that requirement is weakened for cases (B) and
> especially (C).)
> 
> 
> As far as I’ve understood, (A) is the case that we want to focus on
> first, and the main problem there is that we need to open file handles
> without CAP_DAC_READ_SEARCH.
> 
> To do that, I at one point proposed a service process that has
> CAP_DAC_READ_SEARCH and would open file handles for virtiofsd.  But that
> probably won’t really be an improvement, because this process too would
> probably need to run in the container and so if we can’t give virtiofsd
> that capability, we can’t give it to that service process either.
> 
> What Miklos proposed was to modify the kernel to allow processes to open
> file handles even if they don’t have CAP_DAC_READ_SEARCH as long as
> those files are in the process’s scope.  One way to implement this
> restriction (in a very restrictive manner) is to only allow opening file
> handles that the process has generated before, e.g. by appending a MAC
> to every file handle (generated with a process-specific key) and
> checking that when opening a handle.  (You would request this MAC with a
> new AT_* flag passed to name_to_handle_at().  open_by_handle_at()
> recognizes it due to a special file handle type value.)
> 
> (Process-specific key = stored in current->files, i.e. the files_struct.
> I’m not 100% sure what this is, but I guess this is the structure that
> keeps a process’s open file descriptors, and so should generally be
> unique to a process, or at least unique to a group of processes that
> share the same FDs.)

People are now discussing the idea of doing crash recovery for virtiofsd
and pass all the information to systemd(or other process) and get it
back once virtiofsd restarts. So tying the key to process life time
might not work with crash recovery.

May be we can have a kernel system wide keyring and kernel can generate
a key and load there and that key can be used for the lifetime of kernel.
And every reboot will generate a new key.

I don't know much about encryption. How does HMAC key look like. Typically
with asymmetric keys, we sign kernel modules during build outside the
kernel and build public key into the kernel. Is this HMAC key a single
key which can be used both for encoding and decoding operation.

If same key can be used both for encoding/decoding, then question
arises, what are the chances that this key can leak to user space
and then user space can artifically encode file handles and be
able to open any files. I guess that's the reason TPM kind of things
are there to entrust that hardware with private key and nobody else
can get to it. (Sorry, I know very little about cryptography, so lot
of above might be wrong).

Thanks
Vivek


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] Current file handle status and open questions
  2021-03-23 16:39 ` Sergio Lopez
@ 2021-03-24  7:40   ` Max Reitz
  2021-03-24  8:51     ` Sergio Lopez
  0 siblings, 1 reply; 17+ messages in thread
From: Max Reitz @ 2021-03-24  7:40 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: virtio-fs-list

On 23.03.21 17:39, Sergio Lopez wrote:
> On Thu, Mar 18, 2021 at 06:09:58PM +0100, Max Reitz wrote:
>> Hi,
>>
>> As threatened in our last meeting, I’ve written this mail to give an
>> overview on where we stand with regards to virtiofsd(-rs) using file
>> handles.
>>
>> Technically, this should be a reply to the “Securint file handles”
>> thread, but this mail is so long I think it’s better to split it off.
>>
>> There are multiple problems that somehow relate to file handles, the
>> ones I’m aware of are:
>>
>> (A) We have a problem with too many FDs open.  To solve it, we could
>>      attach a file handle to each node, then close the FD (as far as
>>      possible) and reopen it when needed from the file handle.
>>
>> (B) We want to allow the guest to use persistent file handles.
>>
>> (C) For live migration, the problem isn’t even clear yet, but it seems
>>      like we’ll want to translate nodes into their file handles and
>>      transmit those and open them again on the destination (at least on
>>      shared filesystems).
>>
>> Now every case has its own specific tricky bits:
>>
>> Case (A) is something that we’d really like to have by default, and it
>> would need to work all the time during runtime.  So the problem here is
>> that we’d need CAP_DAC_READ_SEARCH, and we’d need it all the time, but
>> we don’t want that.  One interesting bit is that we don’t need these
>> file handles to be persistent between virtiofsd invocations.
>>
>> For (B) we have the problem of needing to protect against a potentially
>> malicious guest, i.e. that it must not be able to reference files
>> outside the shared directory.  (Perhaps except for cases where the file
>> was once reachable, i.e. where a file handle was generated by the guest,
>> then the file was moved outside of the shared directory, but remains
>> accessible through the file handle.)  Furthermore, file handles should
>> really be persistent between virtiofsd invocations.  On the positive
>> side, it would be easier for us to require CAP_DAC_READ_SEARCH for this
>> case, because it really is optional.  We could require users to give us
>> that capability if they want file handles in the guest (and we find no
>> way to avoid requiring that capability).
>>
>> (C) needs persistency between source and destination, but on the
>> positive side, we only need to be able to open file handles during the
>> in-migrate phase on the destination.  So requiring CAP_DAC_READ_SEARCH
>> only during that phase might not be that big of a deal.
>>
>>
>> (Ideally, we’d want all cases to work without CAP_DAC_READ_SEARCH, but
>> as you can see, that requirement is weakened for cases (B) and
>> especially (C).)
>>
>>
>> As far as I’ve understood, (A) is the case that we want to focus on
>> first, and the main problem there is that we need to open file handles
>> without CAP_DAC_READ_SEARCH.
>>
>> To do that, I at one point proposed a service process that has
>> CAP_DAC_READ_SEARCH and would open file handles for virtiofsd.  But that
>> probably won’t really be an improvement, because this process too would
>> probably need to run in the container and so if we can’t give virtiofsd
>> that capability, we can’t give it to that service process either.
>>
>> What Miklos proposed was to modify the kernel to allow processes to open
>> file handles even if they don’t have CAP_DAC_READ_SEARCH as long as
>> those files are in the process’s scope.  One way to implement this
>> restriction (in a very restrictive manner) is to only allow opening file
>> handles that the process has generated before, e.g. by appending a MAC
>> to every file handle (generated with a process-specific key) and
>> checking that when opening a handle.  (You would request this MAC with a
>> new AT_* flag passed to name_to_handle_at().  open_by_handle_at()
>> recognizes it due to a special file handle type value.)
>>
>> (Process-specific key = stored in current->files, i.e. the files_struct.
>> I’m not 100% sure what this is, but I guess this is the structure that
>> keeps a process’s open file descriptors, and so should generally be
>> unique to a process, or at least unique to a group of processes that
>> share the same FDs.)
> 
> Does this mean that if we have, let's say, 3 virtiofsd instances each
> one "remembering" (because there's a reference in the dentry cache of
> the guest they are servicing) 10.000 files, the Host's kernel would
> need to keep 30.000 entries holding the host_file+MAC information? Or
> is there some kind of VFS trickery at play here that allows to do the
> check without holding all that information?

No, you only store one MAC key per process.  The MAC is appended to the 
file handle that is handed back to virtiofsd, so the kernel doesn’t need 
to store it.

Max


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] Current file handle status and open questions
  2021-03-23 19:52 ` Vivek Goyal
@ 2021-03-24  8:06   ` Max Reitz
  2021-03-24 13:01     ` Vivek Goyal
  0 siblings, 1 reply; 17+ messages in thread
From: Max Reitz @ 2021-03-24  8:06 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs-list

On 23.03.21 20:52, Vivek Goyal wrote:
> On Thu, Mar 18, 2021 at 06:09:58PM +0100, Max Reitz wrote:
>> Hi,
>>
>> As threatened in our last meeting, I’ve written this mail to give an
>> overview on where we stand with regards to virtiofsd(-rs) using file
>> handles.
>>
>> Technically, this should be a reply to the “Securint file handles”
>> thread, but this mail is so long I think it’s better to split it off.
>>
>> There are multiple problems that somehow relate to file handles, the
>> ones I’m aware of are:
>>
>> (A) We have a problem with too many FDs open.  To solve it, we could
>>      attach a file handle to each node, then close the FD (as far as
>>      possible) and reopen it when needed from the file handle.
>>
>> (B) We want to allow the guest to use persistent file handles.
>>
>> (C) For live migration, the problem isn’t even clear yet, but it seems
>>      like we’ll want to translate nodes into their file handles and
>>      transmit those and open them again on the destination (at least on
>>      shared filesystems).
>>
>> Now every case has its own specific tricky bits:
>>
>> Case (A) is something that we’d really like to have by default, and it
>> would need to work all the time during runtime.  So the problem here is
>> that we’d need CAP_DAC_READ_SEARCH, and we’d need it all the time, but
>> we don’t want that.  One interesting bit is that we don’t need these
>> file handles to be persistent between virtiofsd invocations.
>>
>> For (B) we have the problem of needing to protect against a potentially
>> malicious guest, i.e. that it must not be able to reference files
>> outside the shared directory.  (Perhaps except for cases where the file
>> was once reachable, i.e. where a file handle was generated by the guest,
>> then the file was moved outside of the shared directory, but remains
>> accessible through the file handle.)  Furthermore, file handles should
>> really be persistent between virtiofsd invocations.  On the positive
>> side, it would be easier for us to require CAP_DAC_READ_SEARCH for this
>> case, because it really is optional.  We could require users to give us
>> that capability if they want file handles in the guest (and we find no
>> way to avoid requiring that capability).
>>
>> (C) needs persistency between source and destination, but on the
>> positive side, we only need to be able to open file handles during the
>> in-migrate phase on the destination.  So requiring CAP_DAC_READ_SEARCH
>> only during that phase might not be that big of a deal.
>>
>>
>> (Ideally, we’d want all cases to work without CAP_DAC_READ_SEARCH, but
>> as you can see, that requirement is weakened for cases (B) and
>> especially (C).)
>>
>>
>> As far as I’ve understood, (A) is the case that we want to focus on
>> first, and the main problem there is that we need to open file handles
>> without CAP_DAC_READ_SEARCH.
>>
>> To do that, I at one point proposed a service process that has
>> CAP_DAC_READ_SEARCH and would open file handles for virtiofsd.  But that
>> probably won’t really be an improvement, because this process too would
>> probably need to run in the container and so if we can’t give virtiofsd
>> that capability, we can’t give it to that service process either.
>>
>> What Miklos proposed was to modify the kernel to allow processes to open
>> file handles even if they don’t have CAP_DAC_READ_SEARCH as long as
>> those files are in the process’s scope.  One way to implement this
>> restriction (in a very restrictive manner) is to only allow opening file
>> handles that the process has generated before, e.g. by appending a MAC
>> to every file handle (generated with a process-specific key) and
>> checking that when opening a handle.  (You would request this MAC with a
>> new AT_* flag passed to name_to_handle_at().  open_by_handle_at()
>> recognizes it due to a special file handle type value.)
>>
>> (Process-specific key = stored in current->files, i.e. the files_struct.
>> I’m not 100% sure what this is, but I guess this is the structure that
>> keeps a process’s open file descriptors, and so should generally be
>> unique to a process, or at least unique to a group of processes that
>> share the same FDs.)
> 
> People are now discussing the idea of doing crash recovery for virtiofsd
> and pass all the information to systemd(or other process) and get it
> back once virtiofsd restarts. So tying the key to process life time
> might not work with crash recovery.

Well, this would be solved by having the "persistency between virtiofsd 
invocations" I go on to sketch.

> May be we can have a kernel system wide keyring and kernel can generate
> a key and load there and that key can be used for the lifetime of kernel.
> And every reboot will generate a new key.

Hm.  The problem I see with a system-wide key, i.e. one that isn’t 
process-specific, is that processes could exchange file handles among 
each other.  Probably not a real problem, because again, if processes 
can share file handles, they can probably do I/O for each other.  OTOH, 
a file handle persists after a process has exited, so maybe this isn’t 
truly equivalent.

The nice thing about a global key would be that it would make true 
persistency easier, too, because you could let it be uploaded from an 
arbitrary user space process and don’t have to deal with having to 
assign it to virtiofsd somehow.

But the question remains whether it is a good idea to have a single key 
for all processes.  I feel uneasy about it.


Or did you mean that the kernel would generate a new key per process, 
put it into the keyring, and then after a restart the application could 
select its old key again?  I don’t know how you’d verify that the 
application has the right to do so.  I think this would have the same 
problem as a global key, because a process could quit and let some 
different application use its key and its file handles.

> I don't know much about encryption. How does HMAC key look like. Typically
> with asymmetric keys, we sign kernel modules during build outside the
> kernel and build public key into the kernel. Is this HMAC key a single
> key which can be used both for encoding and decoding operation.

Yes.  You basically combine a single secret key with the message to be 
authenticated, hash it twice, and the resulting hash is the MAC.

For verification, you do exactly the same process (i.e., it’s the same 
key) and check whether the MAC matches.

> If same key can be used both for encoding/decoding, then question
> arises, what are the chances that this key can leak to user space
> and then user space can artifically encode file handles and be
> able to open any files. I guess that's the reason TPM kind of things
> are there to entrust that hardware with private key and nobody else
> can get to it. (Sorry, I know very little about cryptography, so lot
> of above might be wrong).

I don’t understand this question.  We would have the same problem with 
an asymmetric key, right?  The private key could leak to user space.

The chances are as high as it is for user space applications to read 
kernel space.  I guess when you upload a key with keyctl() so that no 
other process may read it, you rely on the same protection.

Max


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] Current file handle status and open questions
  2021-03-24  7:40   ` Max Reitz
@ 2021-03-24  8:51     ` Sergio Lopez
  0 siblings, 0 replies; 17+ messages in thread
From: Sergio Lopez @ 2021-03-24  8:51 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs-list

[-- Attachment #1: Type: text/plain, Size: 5479 bytes --]

On Wed, Mar 24, 2021 at 08:40:26AM +0100, Max Reitz wrote:
> On 23.03.21 17:39, Sergio Lopez wrote:
> > On Thu, Mar 18, 2021 at 06:09:58PM +0100, Max Reitz wrote:
> > > Hi,
> > > 
> > > As threatened in our last meeting, I’ve written this mail to give an
> > > overview on where we stand with regards to virtiofsd(-rs) using file
> > > handles.
> > > 
> > > Technically, this should be a reply to the “Securint file handles”
> > > thread, but this mail is so long I think it’s better to split it off.
> > > 
> > > There are multiple problems that somehow relate to file handles, the
> > > ones I’m aware of are:
> > > 
> > > (A) We have a problem with too many FDs open.  To solve it, we could
> > >      attach a file handle to each node, then close the FD (as far as
> > >      possible) and reopen it when needed from the file handle.
> > > 
> > > (B) We want to allow the guest to use persistent file handles.
> > > 
> > > (C) For live migration, the problem isn’t even clear yet, but it seems
> > >      like we’ll want to translate nodes into their file handles and
> > >      transmit those and open them again on the destination (at least on
> > >      shared filesystems).
> > > 
> > > Now every case has its own specific tricky bits:
> > > 
> > > Case (A) is something that we’d really like to have by default, and it
> > > would need to work all the time during runtime.  So the problem here is
> > > that we’d need CAP_DAC_READ_SEARCH, and we’d need it all the time, but
> > > we don’t want that.  One interesting bit is that we don’t need these
> > > file handles to be persistent between virtiofsd invocations.
> > > 
> > > For (B) we have the problem of needing to protect against a potentially
> > > malicious guest, i.e. that it must not be able to reference files
> > > outside the shared directory.  (Perhaps except for cases where the file
> > > was once reachable, i.e. where a file handle was generated by the guest,
> > > then the file was moved outside of the shared directory, but remains
> > > accessible through the file handle.)  Furthermore, file handles should
> > > really be persistent between virtiofsd invocations.  On the positive
> > > side, it would be easier for us to require CAP_DAC_READ_SEARCH for this
> > > case, because it really is optional.  We could require users to give us
> > > that capability if they want file handles in the guest (and we find no
> > > way to avoid requiring that capability).
> > > 
> > > (C) needs persistency between source and destination, but on the
> > > positive side, we only need to be able to open file handles during the
> > > in-migrate phase on the destination.  So requiring CAP_DAC_READ_SEARCH
> > > only during that phase might not be that big of a deal.
> > > 
> > > 
> > > (Ideally, we’d want all cases to work without CAP_DAC_READ_SEARCH, but
> > > as you can see, that requirement is weakened for cases (B) and
> > > especially (C).)
> > > 
> > > 
> > > As far as I’ve understood, (A) is the case that we want to focus on
> > > first, and the main problem there is that we need to open file handles
> > > without CAP_DAC_READ_SEARCH.
> > > 
> > > To do that, I at one point proposed a service process that has
> > > CAP_DAC_READ_SEARCH and would open file handles for virtiofsd.  But that
> > > probably won’t really be an improvement, because this process too would
> > > probably need to run in the container and so if we can’t give virtiofsd
> > > that capability, we can’t give it to that service process either.
> > > 
> > > What Miklos proposed was to modify the kernel to allow processes to open
> > > file handles even if they don’t have CAP_DAC_READ_SEARCH as long as
> > > those files are in the process’s scope.  One way to implement this
> > > restriction (in a very restrictive manner) is to only allow opening file
> > > handles that the process has generated before, e.g. by appending a MAC
> > > to every file handle (generated with a process-specific key) and
> > > checking that when opening a handle.  (You would request this MAC with a
> > > new AT_* flag passed to name_to_handle_at().  open_by_handle_at()
> > > recognizes it due to a special file handle type value.)
> > > 
> > > (Process-specific key = stored in current->files, i.e. the files_struct.
> > > I’m not 100% sure what this is, but I guess this is the structure that
> > > keeps a process’s open file descriptors, and so should generally be
> > > unique to a process, or at least unique to a group of processes that
> > > share the same FDs.)
> > 
> > Does this mean that if we have, let's say, 3 virtiofsd instances each
> > one "remembering" (because there's a reference in the dentry cache of
> > the guest they are servicing) 10.000 files, the Host's kernel would
> > need to keep 30.000 entries holding the host_file+MAC information? Or
> > is there some kind of VFS trickery at play here that allows to do the
> > check without holding all that information?
> 
> No, you only store one MAC key per process.  The MAC is appended to the file
> handle that is handed back to virtiofsd, so the kernel doesn’t need to store
> it.

Ah, that sounds great, thanks for the clarification. I'm mainly
concerned about the need to keep an FD open per each "known" (as in
present in the guest's dentry cache) file, and this seems like a good
solution for it.

Sergio.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] Current file handle status and open questions
  2021-03-18 17:09 [Virtio-fs] Current file handle status and open questions Max Reitz
  2021-03-23 16:39 ` Sergio Lopez
  2021-03-23 19:52 ` Vivek Goyal
@ 2021-03-24 11:51 ` Stefan Hajnoczi
  2021-03-24 12:28   ` Max Reitz
  2021-03-24 16:44   ` Dr. David Alan Gilbert
  2021-03-24 16:54 ` Dr. David Alan Gilbert
  2021-04-13 14:05 ` Vivek Goyal
  4 siblings, 2 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2021-03-24 11:51 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs-list

[-- Attachment #1: Type: text/plain, Size: 2086 bytes --]

On Thu, Mar 18, 2021 at 06:09:58PM +0100, Max Reitz wrote:
> So, overall it seems like it may be workable to extend the in-kernel MAC
> verification to allow for persistent keys, but I still have some open
> questions, and I don’t know whether I should just defer them until we
> get to the point where we need persistency.

It seems worth refining these ideas a little bit further until they
reach solid ground. Then we can be sure that there is at least one
possible solution. At the moment it seems like most the components of a
solution are known but it's not clear how they all fit together - and
they have a nasty tendency of failing to meet the requirements if they
aren't put together in exactly the right way :).

> (Of note: If we implement something where a user space process (or
> multiple in conjunction) can arbitrarily choose a MAC key, then this
> will also be usable for live migration, because you can continue to use
> the key from the source on the destination.)

Definitely. Taking migration into account is worthwhile. One question
about that:

Are Linux file handles transferrable between systems? Basic
configurations that come to mind:
1. XFS, brtfs, ext4 on SAN (e.g. FibreChannel SCSI LUN)
2. NFS

I assumed they were in previous discussions but has anyone checked the
file handle implementations to see whether this is really the case?

> Final minor question that doesn’t really fit in fully elsewhere: When
> generating a MAC over a file handle, should the mount ID be included?
> I’m worried that this might definitely break persistency, but I think we
> should include some FS ID.  Maybe the kernel should query the FS UUID
> for this MAC calculation, and use that instead of the mount ID?

This is a good point. If the file handle is not tied to a particular
file system mount then an application can stash a well-known file handle
(e.g. /) from one mount it has full access to and then use open a file
on a mount that it does not have full directory treeaccess to (e.g. a
bind mount/sub-tree?).

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] Current file handle status and open questions
  2021-03-24 11:51 ` Stefan Hajnoczi
@ 2021-03-24 12:28   ` Max Reitz
  2021-03-24 13:08     ` Stefan Hajnoczi
  2021-03-24 16:44   ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 17+ messages in thread
From: Max Reitz @ 2021-03-24 12:28 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs-list

On 24.03.21 12:51, Stefan Hajnoczi wrote:
> On Thu, Mar 18, 2021 at 06:09:58PM +0100, Max Reitz wrote:
>> So, overall it seems like it may be workable to extend the in-kernel MAC
>> verification to allow for persistent keys, but I still have some open
>> questions, and I don’t know whether I should just defer them until we
>> get to the point where we need persistency.
> 
> It seems worth refining these ideas a little bit further until they
> reach solid ground. Then we can be sure that there is at least one
> possible solution. At the moment it seems like most the components of a
> solution are known but it's not clear how they all fit together - and
> they have a nasty tendency of failing to meet the requirements if they
> aren't put together in exactly the right way :).
> 
>> (Of note: If we implement something where a user space process (or
>> multiple in conjunction) can arbitrarily choose a MAC key, then this
>> will also be usable for live migration, because you can continue to use
>> the key from the source on the destination.)
> 
> Definitely. Taking migration into account is worthwhile. One question
> about that:
> 
> Are Linux file handles transferrable between systems? Basic
> configurations that come to mind:
> 1. XFS, brtfs, ext4 on SAN (e.g. FibreChannel SCSI LUN)
> 2. NFS
> 
> I assumed they were in previous discussions but has anyone checked the
> file handle implementations to see whether this is really the case?

I’m not exactly sure what you mean.  I think you’re asking whether for a 
single (shared) filesystem a file handle always works independently of 
which Linux instance is using it?

If so, depends on the filesystem, I guess, but for the ones where I know 
how file handles work, yes, it does.

XFS and ext4 just have an inode ID plus a generation ID.  For XFS, the 
generation ID is stored in the inode structure (I don’t know about ext4, 
but I assume it’s the same), so this handle has no information that 
would change at runtime, it’s just about on-disk information.

I’ve described the NFS handle structure before, the only notable thing 
here is the FS ID it contains.  If it’s a UUID, I would assume it is 
transferrable, because this UUID is part of the on-disk filesystem.  If 
such a UUID is not available, AFAIA NFS may fall back to a device ID, 
which of course may not be transferrable, but that’s a fallback that I 
haven’t seen happen.

>> Final minor question that doesn’t really fit in fully elsewhere: When
>> generating a MAC over a file handle, should the mount ID be included?
>> I’m worried that this might definitely break persistency, but I think we
>> should include some FS ID.  Maybe the kernel should query the FS UUID
>> for this MAC calculation, and use that instead of the mount ID?
> 
> This is a good point. If the file handle is not tied to a particular
> file system mount then an application can stash a well-known file handle
> (e.g. /) from one mount it has full access to and then use open a file
> on a mount that it does not have full directory treeaccess to (e.g. a
> bind mount/sub-tree?).

Oh, that’s absolutely true.  / on XFS at least has the generation ID 0, 
so this would absolutely be usable at least for other XFS filesystems.

So we probably do need to include some FS ID, like the UUID.

Max


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] Current file handle status and open questions
  2021-03-24  8:06   ` Max Reitz
@ 2021-03-24 13:01     ` Vivek Goyal
  0 siblings, 0 replies; 17+ messages in thread
From: Vivek Goyal @ 2021-03-24 13:01 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs-list

On Wed, Mar 24, 2021 at 09:06:52AM +0100, Max Reitz wrote:
> On 23.03.21 20:52, Vivek Goyal wrote:
> > On Thu, Mar 18, 2021 at 06:09:58PM +0100, Max Reitz wrote:
> > > Hi,
> > > 
> > > As threatened in our last meeting, I’ve written this mail to give an
> > > overview on where we stand with regards to virtiofsd(-rs) using file
> > > handles.
> > > 
> > > Technically, this should be a reply to the “Securint file handles”
> > > thread, but this mail is so long I think it’s better to split it off.
> > > 
> > > There are multiple problems that somehow relate to file handles, the
> > > ones I’m aware of are:
> > > 
> > > (A) We have a problem with too many FDs open.  To solve it, we could
> > >      attach a file handle to each node, then close the FD (as far as
> > >      possible) and reopen it when needed from the file handle.
> > > 
> > > (B) We want to allow the guest to use persistent file handles.
> > > 
> > > (C) For live migration, the problem isn’t even clear yet, but it seems
> > >      like we’ll want to translate nodes into their file handles and
> > >      transmit those and open them again on the destination (at least on
> > >      shared filesystems).
> > > 
> > > Now every case has its own specific tricky bits:
> > > 
> > > Case (A) is something that we’d really like to have by default, and it
> > > would need to work all the time during runtime.  So the problem here is
> > > that we’d need CAP_DAC_READ_SEARCH, and we’d need it all the time, but
> > > we don’t want that.  One interesting bit is that we don’t need these
> > > file handles to be persistent between virtiofsd invocations.
> > > 
> > > For (B) we have the problem of needing to protect against a potentially
> > > malicious guest, i.e. that it must not be able to reference files
> > > outside the shared directory.  (Perhaps except for cases where the file
> > > was once reachable, i.e. where a file handle was generated by the guest,
> > > then the file was moved outside of the shared directory, but remains
> > > accessible through the file handle.)  Furthermore, file handles should
> > > really be persistent between virtiofsd invocations.  On the positive
> > > side, it would be easier for us to require CAP_DAC_READ_SEARCH for this
> > > case, because it really is optional.  We could require users to give us
> > > that capability if they want file handles in the guest (and we find no
> > > way to avoid requiring that capability).
> > > 
> > > (C) needs persistency between source and destination, but on the
> > > positive side, we only need to be able to open file handles during the
> > > in-migrate phase on the destination.  So requiring CAP_DAC_READ_SEARCH
> > > only during that phase might not be that big of a deal.
> > > 
> > > 
> > > (Ideally, we’d want all cases to work without CAP_DAC_READ_SEARCH, but
> > > as you can see, that requirement is weakened for cases (B) and
> > > especially (C).)
> > > 
> > > 
> > > As far as I’ve understood, (A) is the case that we want to focus on
> > > first, and the main problem there is that we need to open file handles
> > > without CAP_DAC_READ_SEARCH.
> > > 
> > > To do that, I at one point proposed a service process that has
> > > CAP_DAC_READ_SEARCH and would open file handles for virtiofsd.  But that
> > > probably won’t really be an improvement, because this process too would
> > > probably need to run in the container and so if we can’t give virtiofsd
> > > that capability, we can’t give it to that service process either.
> > > 
> > > What Miklos proposed was to modify the kernel to allow processes to open
> > > file handles even if they don’t have CAP_DAC_READ_SEARCH as long as
> > > those files are in the process’s scope.  One way to implement this
> > > restriction (in a very restrictive manner) is to only allow opening file
> > > handles that the process has generated before, e.g. by appending a MAC
> > > to every file handle (generated with a process-specific key) and
> > > checking that when opening a handle.  (You would request this MAC with a
> > > new AT_* flag passed to name_to_handle_at().  open_by_handle_at()
> > > recognizes it due to a special file handle type value.)
> > > 
> > > (Process-specific key = stored in current->files, i.e. the files_struct.
> > > I’m not 100% sure what this is, but I guess this is the structure that
> > > keeps a process’s open file descriptors, and so should generally be
> > > unique to a process, or at least unique to a group of processes that
> > > share the same FDs.)
> > 
> > People are now discussing the idea of doing crash recovery for virtiofsd
> > and pass all the information to systemd(or other process) and get it
> > back once virtiofsd restarts. So tying the key to process life time
> > might not work with crash recovery.
> 
> Well, this would be solved by having the "persistency between virtiofsd
> invocations" I go on to sketch.

Aha.., I did not read the full email and replied early. Will check it
out.

> 
> > May be we can have a kernel system wide keyring and kernel can generate
> > a key and load there and that key can be used for the lifetime of kernel.
> > And every reboot will generate a new key.
> 
> Hm.  The problem I see with a system-wide key, i.e. one that isn’t
> process-specific, is that processes could exchange file handles among each
> other.  Probably not a real problem, because again, if processes can share
> file handles, they can probably do I/O for each other.  OTOH, a file handle
> persists after a process has exited, so maybe this isn’t truly equivalent.
> 
> The nice thing about a global key would be that it would make true
> persistency easier, too, because you could let it be uploaded from an
> arbitrary user space process and don’t have to deal with having to assign it
> to virtiofsd somehow.

If user space were to load a key in kernel, then it has to be out of band
from a different process (and not virtiofsd) which is privileged,
say has CAP_SYS_ADMIN, IIUC.

Otherwise kernel can't trust a unprivileged process to load a key and bypass
CAP_DAC_READ_SEARCH. That will effectively mean, that load a key and
open any file in the filesystem (by crafting the file handle).

> 
> But the question remains whether it is a good idea to have a single key for
> all processes.  I feel uneasy about it.

Not sure. But can't think how per process keys will make it any better
(as opposed to a system wide key).

> 
> Or did you mean that the kernel would generate a new key per process, put it
> into the keyring, and then after a restart the application could select its
> old key again?  I don’t know how you’d verify that the application has the
> right to do so.  I think this would have the same problem as a global key,
> because a process could quit and let some different application use its key
> and its file handles.

No, I did not mean that.

I was thinking along the lines of having a system wide key and just
use that to encrypt/decrypt. And any process in the system will
be able to use that (as long as it requested this by passing a flag
in name_to_handle_at() and open_by_handle_at()).

> 
> > I don't know much about encryption. How does HMAC key look like. Typically
> > with asymmetric keys, we sign kernel modules during build outside the
> > kernel and build public key into the kernel. Is this HMAC key a single
> > key which can be used both for encoding and decoding operation.
> 
> Yes.  You basically combine a single secret key with the message to be
> authenticated, hash it twice, and the resulting hash is the MAC.
> 
> For verification, you do exactly the same process (i.e., it’s the same key)
> and check whether the MAC matches.
> 
> > If same key can be used both for encoding/decoding, then question
> > arises, what are the chances that this key can leak to user space
> > and then user space can artifically encode file handles and be
> > able to open any files. I guess that's the reason TPM kind of things
> > are there to entrust that hardware with private key and nobody else
> > can get to it. (Sorry, I know very little about cryptography, so lot
> > of above might be wrong).
> 
> I don’t understand this question.  We would have the same problem with an
> asymmetric key, right?  The private key could leak to user space.

Generally signing servers (which have private key) are separate and behind
firewall and they don't run general purpose workload. So yes, if one
manages to run malicious process on signing server and somehow has a 
way to get to key, it is possible. But difference here is that server
which is verifying the signature does not have private key (and only
uses public key to verify signature).

In the case of HMAC stuff, same key is being used both for both the
purposes, so we will have to be extra careful that it does not leak
to user space (say, through some sideband channel attack). May be
I am overthinking. 

Thanks
Vivek

> 
> The chances are as high as it is for user space applications to read kernel
> space.  I guess when you upload a key with keyctl() so that no other process
> may read it, you rely on the same protection.
> 
> Max


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] Current file handle status and open questions
  2021-03-24 12:28   ` Max Reitz
@ 2021-03-24 13:08     ` Stefan Hajnoczi
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2021-03-24 13:08 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs-list

[-- Attachment #1: Type: text/plain, Size: 1861 bytes --]

On Wed, Mar 24, 2021 at 01:28:48PM +0100, Max Reitz wrote:
> On 24.03.21 12:51, Stefan Hajnoczi wrote:
> > On Thu, Mar 18, 2021 at 06:09:58PM +0100, Max Reitz wrote:
> > > So, overall it seems like it may be workable to extend the in-kernel MAC
> > > verification to allow for persistent keys, but I still have some open
> > > questions, and I don’t know whether I should just defer them until we
> > > get to the point where we need persistency.
> > 
> > It seems worth refining these ideas a little bit further until they
> > reach solid ground. Then we can be sure that there is at least one
> > possible solution. At the moment it seems like most the components of a
> > solution are known but it's not clear how they all fit together - and
> > they have a nasty tendency of failing to meet the requirements if they
> > aren't put together in exactly the right way :).
> > 
> > > (Of note: If we implement something where a user space process (or
> > > multiple in conjunction) can arbitrarily choose a MAC key, then this
> > > will also be usable for live migration, because you can continue to use
> > > the key from the source on the destination.)
> > 
> > Definitely. Taking migration into account is worthwhile. One question
> > about that:
> > 
> > Are Linux file handles transferrable between systems? Basic
> > configurations that come to mind:
> > 1. XFS, brtfs, ext4 on SAN (e.g. FibreChannel SCSI LUN)
> > 2. NFS
> > 
> > I assumed they were in previous discussions but has anyone checked the
> > file handle implementations to see whether this is really the case?
> 
> I’m not exactly sure what you mean.  I think you’re asking whether for a
> single (shared) filesystem a file handle always works independently of which
> Linux instance is using it?

Yes. What you've found sounds promising.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] Current file handle status and open questions
  2021-03-24 11:51 ` Stefan Hajnoczi
  2021-03-24 12:28   ` Max Reitz
@ 2021-03-24 16:44   ` Dr. David Alan Gilbert
  2021-03-24 16:57     ` Max Reitz
  1 sibling, 1 reply; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2021-03-24 16:44 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs-list, Max Reitz

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Thu, Mar 18, 2021 at 06:09:58PM +0100, Max Reitz wrote:
> > So, overall it seems like it may be workable to extend the in-kernel MAC
> > verification to allow for persistent keys, but I still have some open
> > questions, and I don’t know whether I should just defer them until we
> > get to the point where we need persistency.
> 
> It seems worth refining these ideas a little bit further until they
> reach solid ground. Then we can be sure that there is at least one
> possible solution. At the moment it seems like most the components of a
> solution are known but it's not clear how they all fit together - and
> they have a nasty tendency of failing to meet the requirements if they
> aren't put together in exactly the right way :).
> 
> > (Of note: If we implement something where a user space process (or
> > multiple in conjunction) can arbitrarily choose a MAC key, then this
> > will also be usable for live migration, because you can continue to use
> > the key from the source on the destination.)
> 
> Definitely. Taking migration into account is worthwhile. One question
> about that:
> 
> Are Linux file handles transferrable between systems? Basic
> configurations that come to mind:
> 1. XFS, brtfs, ext4 on SAN (e.g. FibreChannel SCSI LUN)
> 2. NFS
> 
> I assumed they were in previous discussions but has anyone checked the
> file handle implementations to see whether this is really the case?
> 
> > Final minor question that doesn’t really fit in fully elsewhere: When
> > generating a MAC over a file handle, should the mount ID be included?
> > I’m worried that this might definitely break persistency, but I think we
> > should include some FS ID.  Maybe the kernel should query the FS UUID
> > for this MAC calculation, and use that instead of the mount ID?
> 
> This is a good point. If the file handle is not tied to a particular
> file system mount then an application can stash a well-known file handle
> (e.g. /) from one mount it has full access to and then use open a file
> on a mount that it does not have full directory treeaccess to (e.g. a
> bind mount/sub-tree?).

I'd be surprised if the mount-id was the same between two hosts.

Dave

> Stefan



> _______________________________________________
> Virtio-fs mailing list
> Virtio-fs@redhat.com
> https://listman.redhat.com/mailman/listinfo/virtio-fs

-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] Current file handle status and open questions
  2021-03-18 17:09 [Virtio-fs] Current file handle status and open questions Max Reitz
                   ` (2 preceding siblings ...)
  2021-03-24 11:51 ` Stefan Hajnoczi
@ 2021-03-24 16:54 ` Dr. David Alan Gilbert
  2021-04-13 14:05 ` Vivek Goyal
  4 siblings, 0 replies; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2021-03-24 16:54 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs-list

* Max Reitz (mreitz@redhat.com) wrote:
> Hi,
> 
> As threatened in our last meeting, I’ve written this mail to give an
> overview on where we stand with regards to virtiofsd(-rs) using file
> handles.
> 
> Technically, this should be a reply to the “Securint file handles”
> thread, but this mail is so long I think it’s better to split it off.
> 
> There are multiple problems that somehow relate to file handles, the
> ones I’m aware of are:
> 
> (A) We have a problem with too many FDs open.  To solve it, we could
>     attach a file handle to each node, then close the FD (as far as
>     possible) and reopen it when needed from the file handle.
> 
> (B) We want to allow the guest to use persistent file handles.
> 
> (C) For live migration, the problem isn’t even clear yet, but it seems
>     like we’ll want to translate nodes into their file handles and
>     transmit those and open them again on the destination (at least on
>     shared filesystems).

> Now every case has its own specific tricky bits:
> 
> Case (A) is something that we’d really like to have by default, and it
> would need to work all the time during runtime.  So the problem here is
> that we’d need CAP_DAC_READ_SEARCH, and we’d need it all the time, but
> we don’t want that.  One interesting bit is that we don’t need these
> file handles to be persistent between virtiofsd invocations.
> 
> For (B) we have the problem of needing to protect against a potentially
> malicious guest, i.e. that it must not be able to reference files
> outside the shared directory.  (Perhaps except for cases where the file
> was once reachable, i.e. where a file handle was generated by the guest,
> then the file was moved outside of the shared directory, but remains
> accessible through the file handle.)  Furthermore, file handles should
> really be persistent between virtiofsd invocations.  On the positive
> side, it would be easier for us to require CAP_DAC_READ_SEARCH for this
> case, because it really is optional.  We could require users to give us
> that capability if they want file handles in the guest (and we find no
> way to avoid requiring that capability).
> 
> (C) needs persistency between source and destination, but on the
> positive side, we only need to be able to open file handles during the
> in-migrate phase on the destination.  So requiring CAP_DAC_READ_SEARCH
> only during that phase might not be that big of a deal.

(B) & (C) interact in some fun ways; if we ignore that then we get:

  a) handle has to be readnble by same host kernel; so a MAC with a
local kernel known key works
  b) Same as (a)?
  c) Now we need a way to transfer the handle to a destination, where
the destination 1) Trusts that it was generated by another trustworthy
host  2) That it hasn't been tampered with

But if you allow the guest persistent file handles (b), and then migrate
it (c), that guest visibile file handle has to work on the destination;
even worse it has to work on some future destination even if
intermediate destinations haven't ever used that handle.

That makes it quite tricky, because one way to have solved (c) alone
would have been to generate some form of 'transport key' for the
migration, where the destination and source somehow exchange keys (at
the managment layer) and the handles are protected during migration; but
b+c is much harder because you can't use a short term key like that.

Dave

> 
> (Ideally, we’d want all cases to work without CAP_DAC_READ_SEARCH, but
> as you can see, that requirement is weakened for cases (B) and
> especially (C).)
> 
> 
> As far as I’ve understood, (A) is the case that we want to focus on
> first, and the main problem there is that we need to open file handles
> without CAP_DAC_READ_SEARCH.
> 
> To do that, I at one point proposed a service process that has
> CAP_DAC_READ_SEARCH and would open file handles for virtiofsd.  But that
> probably won’t really be an improvement, because this process too would
> probably need to run in the container and so if we can’t give virtiofsd
> that capability, we can’t give it to that service process either.
> 
> What Miklos proposed was to modify the kernel to allow processes to open
> file handles even if they don’t have CAP_DAC_READ_SEARCH as long as
> those files are in the process’s scope.  One way to implement this
> restriction (in a very restrictive manner) is to only allow opening file
> handles that the process has generated before, e.g. by appending a MAC
> to every file handle (generated with a process-specific key) and
> checking that when opening a handle.  (You would request this MAC with a
> new AT_* flag passed to name_to_handle_at().  open_by_handle_at()
> recognizes it due to a special file handle type value.)
> 
> (Process-specific key = stored in current->files, i.e. the files_struct.
> I’m not 100% sure what this is, but I guess this is the structure that
> keeps a process’s open file descriptors, and so should generally be
> unique to a process, or at least unique to a group of processes that
> share the same FDs.)
> 
> 
> So far so good.
> 
> What I’m now worried about is whether we can make use of that kernel
> feature for (B) and (C), too.  I don’t really want to design completely
> independent solutions for those problems.
> 
> I don’t think it’s important that we come up with exactly plotted
> implementations for (B) and (C) now, but a rough sketch on how we might
> approach them would be nice.
> 
> What we need for (B) is persistency between virtiofsd invocations.  For
> (C), we can decide whether we just don’t want to care (i.e., we simply
> require CAP_DAC_READ_SEARCH during the in-migrate phase on the
> destination), or whether we need some way to transfer the MAC key from
> the source to the destination.  (But if we want (B) with live migration,
> we will need to transfer the key to the destination.)
> 
> Let’s talk about (B).
> 
> I can’t imagine a design where the kernel could create persistent keys
> without having a user space process store them.  Well, except something
> where you measure a program image to derive its specific MAC key from
> some persistent platform key, but I don’t think that’s feasible (for
> multiple reasons), and it’d also make it impossible to migrate a VM to a
> different host and keep guest file handles valid.
> 
> So I think we need some way to let a user space process upload a key for
> some other key to use.  Vivek has proposed using keyctl(2) for that.
> I’m not exactly sure how we’d approach that in practice, though...  I
> suppose we’d add the key to some global keyring (write-only), and then
> somehow we’d need to let virtiofsd inform the kernel which key to use
> for its MACs.  (Perhaps we can do the last bit with keyctl_search(), by
> copying a key from the upload keyring to a process-specific keyring,
> perhaps even a keyring that exists only to hold the process’s MAC key?)
> 
> A problem here is that when a process (a) uploads a key for some other
> process (b) to use, you can effectively give (b) CAP_DAC_READ_SEARCH;
> because if (a) then shared the key with (b), (b) could forge all of its
> MACs and thus bypass the MAC protection (so it effectively gains
> CAP_DAC_READ_SEARCH).  To be allowed this, (a) would need a level of
> privilege that’s equivalent to being able to grant arbitrary processes
> CAP_DAC_READ_SEARCH.  Miklos suggested that we might as well fall back
> to requiring to CAP_SYS_ADMIN, and I think that’s reasonable.
> 
> The next question is, how would we require this capability for a key
> upload?  I don’t think keyctl currently has a solution for that.  I
> suppose we would need to have a special global keyring for such MACs and
> only allow privileged processes to upload keys there, so once a process
> asks for a specific key to be used, we can fetch it from that keyring
> and be sure that key has been uploaded by a privileged application.
> 
> Then, there’d be the question of how an application would select its key
> from the upload keyring.  I think we would need to prevent it from
> selecting a key that’s intended for some other application, because then
> it could use that other application’s file handles...  Which I don’t
> think is a problem if that other application willingly shares that
> ability (because in such a case, that other application might as well
> just do I/O on the first application’s behalf), but other applications’
> keys must not be reachable by guessing.  So perhaps a key description of
> high entropy would do (that’s given to keyctl_search()).
> 
> I think Dave also suggested using a certificate-based system, but I’m
> afraid I don’t fully understand it.  As far as I had guessed, that would
> mean the privileged process only configures what CAs are to be trusted,
> and then every process could upload its own MAC key along with a
> certificate from a trusted CA.  That would get around the whole keyring
> stuff, but it means you’d need to be able to provide trusted CAs and
> we’d need to check those certificates in the kernel.  I have no idea how
> we’d do that, I don’t think keyctl could help us there.
> 
> Furthermore, the obvious problem with my interpretation of the
> certificate proposal is that the uploading application would see its own
> key, which would be bad.  I suspect it would need to be encrypted
> somehow, but if we do that, we might as well skip the certificate part
> and just let the privileged process upload one or more global encryption
> keys, right?  (Applications could then upload encrypted keys with nonces
> and a hash, I guess, so the kernel can verify that the key is valid.)
> 
> 
> So, overall it seems like it may be workable to extend the in-kernel MAC
> verification to allow for persistent keys, but I still have some open
> questions, and I don’t know whether I should just defer them until we
> get to the point where we need persistency.
> 
> (Of note: If we implement something where a user space process (or
> multiple in conjunction) can arbitrarily choose a MAC key, then this
> will also be usable for live migration, because you can continue to use
> the key from the source on the destination.)
> 
> 
> Final minor question that doesn’t really fit in fully elsewhere: When
> generating a MAC over a file handle, should the mount ID be included?
> I’m worried that this might definitely break persistency, but I think we
> should include some FS ID.  Maybe the kernel should query the FS UUID
> for this MAC calculation, and use that instead of the mount ID?
> 
> 
> Thanks for reading and all your help,
> 
> Max
> 
> _______________________________________________
> Virtio-fs mailing list
> Virtio-fs@redhat.com
> https://listman.redhat.com/mailman/listinfo/virtio-fs
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] Current file handle status and open questions
  2021-03-24 16:44   ` Dr. David Alan Gilbert
@ 2021-03-24 16:57     ` Max Reitz
  0 siblings, 0 replies; 17+ messages in thread
From: Max Reitz @ 2021-03-24 16:57 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Stefan Hajnoczi; +Cc: virtio-fs-list

On 24.03.21 17:44, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
>> On Thu, Mar 18, 2021 at 06:09:58PM +0100, Max Reitz wrote:

[...]

>>> Final minor question that doesn’t really fit in fully elsewhere: When
>>> generating a MAC over a file handle, should the mount ID be included?
>>> I’m worried that this might definitely break persistency, but I think we
>>> should include some FS ID.  Maybe the kernel should query the FS UUID
>>> for this MAC calculation, and use that instead of the mount ID?
>>
>> This is a good point. If the file handle is not tied to a particular
>> file system mount then an application can stash a well-known file handle
>> (e.g. /) from one mount it has full access to and then use open a file
>> on a mount that it does not have full directory treeaccess to (e.g. a
>> bind mount/sub-tree?).
> 
> I'd be surprised if the mount-id was the same between two hosts.

True, and it’s also not persistent between reboots, that’s why I 
proposed for the kernel to query the FS UUID for the MAC calculation.

Max


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] Current file handle status and open questions
  2021-03-18 17:09 [Virtio-fs] Current file handle status and open questions Max Reitz
                   ` (3 preceding siblings ...)
  2021-03-24 16:54 ` Dr. David Alan Gilbert
@ 2021-04-13 14:05 ` Vivek Goyal
  2021-04-13 14:14   ` Max Reitz
  4 siblings, 1 reply; 17+ messages in thread
From: Vivek Goyal @ 2021-04-13 14:05 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs-list

On Thu, Mar 18, 2021 at 06:09:58PM +0100, Max Reitz wrote:
> Hi,
> 
> As threatened in our last meeting, I’ve written this mail to give an
> overview on where we stand with regards to virtiofsd(-rs) using file
> handles.
> 
> Technically, this should be a reply to the “Securint file handles”
> thread, but this mail is so long I think it’s better to split it off.
> 
> There are multiple problems that somehow relate to file handles, the
> ones I’m aware of are:
> 
> (A) We have a problem with too many FDs open.  To solve it, we could
>     attach a file handle to each node, then close the FD (as far as
>     possible) and reopen it when needed from the file handle.
> 
> (B) We want to allow the guest to use persistent file handles.
> 
> (C) For live migration, the problem isn’t even clear yet, but it seems
>     like we’ll want to translate nodes into their file handles and
>     transmit those and open them again on the destination (at least on
>     shared filesystems).
> 
> Now every case has its own specific tricky bits:
> 
> Case (A) is something that we’d really like to have by default, and it
> would need to work all the time during runtime.  So the problem here is
> that we’d need CAP_DAC_READ_SEARCH, and we’d need it all the time, but
> we don’t want that.  One interesting bit is that we don’t need these
> file handles to be persistent between virtiofsd invocations.

Hi Max,

A question about CAP_DAC_READ_SEARCH. I get it that in long term we
don't want it beacuse current limitation is that CAP_DAC_READ_SEARCH
is needed in init_user_ns. And we want to run virtiofsd in user
namespace and it will not have CAP_DAC_READ_SEARCH in init_user_ns.

But as of now we are running virtiofsd in init_user_ns
CAP_DAC_READ_SEARCH.

So question is that if virtiofsd has CAP_DAC_READ_SEARCH, can we
try to fall back to using name_to_handle_at()/open_by_handle_at()
for lo_inode->fd. And this should help solve the problem
A and C.

And once a solution comes along which does not require CAP_DAC_READ_SEARCH
we could move to that. In fact virtiofsd probably will have to deal
with both so that we could continue to work with older kernels.

Thanks
Vivek

> 
> For (B) we have the problem of needing to protect against a potentially
> malicious guest, i.e. that it must not be able to reference files
> outside the shared directory.  (Perhaps except for cases where the file
> was once reachable, i.e. where a file handle was generated by the guest,
> then the file was moved outside of the shared directory, but remains
> accessible through the file handle.)  Furthermore, file handles should
> really be persistent between virtiofsd invocations.  On the positive
> side, it would be easier for us to require CAP_DAC_READ_SEARCH for this
> case, because it really is optional.  We could require users to give us
> that capability if they want file handles in the guest (and we find no
> way to avoid requiring that capability).
> 
> (C) needs persistency between source and destination, but on the
> positive side, we only need to be able to open file handles during the
> in-migrate phase on the destination.  So requiring CAP_DAC_READ_SEARCH
> only during that phase might not be that big of a deal.
> 
> 
> (Ideally, we’d want all cases to work without CAP_DAC_READ_SEARCH, but
> as you can see, that requirement is weakened for cases (B) and
> especially (C).)
> 
> 
> As far as I’ve understood, (A) is the case that we want to focus on
> first, and the main problem there is that we need to open file handles
> without CAP_DAC_READ_SEARCH.
> 
> To do that, I at one point proposed a service process that has
> CAP_DAC_READ_SEARCH and would open file handles for virtiofsd.  But that
> probably won’t really be an improvement, because this process too would
> probably need to run in the container and so if we can’t give virtiofsd
> that capability, we can’t give it to that service process either.
> 
> What Miklos proposed was to modify the kernel to allow processes to open
> file handles even if they don’t have CAP_DAC_READ_SEARCH as long as
> those files are in the process’s scope.  One way to implement this
> restriction (in a very restrictive manner) is to only allow opening file
> handles that the process has generated before, e.g. by appending a MAC
> to every file handle (generated with a process-specific key) and
> checking that when opening a handle.  (You would request this MAC with a
> new AT_* flag passed to name_to_handle_at().  open_by_handle_at()
> recognizes it due to a special file handle type value.)
> 
> (Process-specific key = stored in current->files, i.e. the files_struct.
> I’m not 100% sure what this is, but I guess this is the structure that
> keeps a process’s open file descriptors, and so should generally be
> unique to a process, or at least unique to a group of processes that
> share the same FDs.)
> 
> 
> So far so good.
> 
> What I’m now worried about is whether we can make use of that kernel
> feature for (B) and (C), too.  I don’t really want to design completely
> independent solutions for those problems.
> 
> I don’t think it’s important that we come up with exactly plotted
> implementations for (B) and (C) now, but a rough sketch on how we might
> approach them would be nice.
> 
> What we need for (B) is persistency between virtiofsd invocations.  For
> (C), we can decide whether we just don’t want to care (i.e., we simply
> require CAP_DAC_READ_SEARCH during the in-migrate phase on the
> destination), or whether we need some way to transfer the MAC key from
> the source to the destination.  (But if we want (B) with live migration,
> we will need to transfer the key to the destination.)
> 
> Let’s talk about (B).
> 
> I can’t imagine a design where the kernel could create persistent keys
> without having a user space process store them.  Well, except something
> where you measure a program image to derive its specific MAC key from
> some persistent platform key, but I don’t think that’s feasible (for
> multiple reasons), and it’d also make it impossible to migrate a VM to a
> different host and keep guest file handles valid.
> 
> So I think we need some way to let a user space process upload a key for
> some other key to use.  Vivek has proposed using keyctl(2) for that.
> I’m not exactly sure how we’d approach that in practice, though...  I
> suppose we’d add the key to some global keyring (write-only), and then
> somehow we’d need to let virtiofsd inform the kernel which key to use
> for its MACs.  (Perhaps we can do the last bit with keyctl_search(), by
> copying a key from the upload keyring to a process-specific keyring,
> perhaps even a keyring that exists only to hold the process’s MAC key?)
> 
> A problem here is that when a process (a) uploads a key for some other
> process (b) to use, you can effectively give (b) CAP_DAC_READ_SEARCH;
> because if (a) then shared the key with (b), (b) could forge all of its
> MACs and thus bypass the MAC protection (so it effectively gains
> CAP_DAC_READ_SEARCH).  To be allowed this, (a) would need a level of
> privilege that’s equivalent to being able to grant arbitrary processes
> CAP_DAC_READ_SEARCH.  Miklos suggested that we might as well fall back
> to requiring to CAP_SYS_ADMIN, and I think that’s reasonable.
> 
> The next question is, how would we require this capability for a key
> upload?  I don’t think keyctl currently has a solution for that.  I
> suppose we would need to have a special global keyring for such MACs and
> only allow privileged processes to upload keys there, so once a process
> asks for a specific key to be used, we can fetch it from that keyring
> and be sure that key has been uploaded by a privileged application.
> 
> Then, there’d be the question of how an application would select its key
> from the upload keyring.  I think we would need to prevent it from
> selecting a key that’s intended for some other application, because then
> it could use that other application’s file handles...  Which I don’t
> think is a problem if that other application willingly shares that
> ability (because in such a case, that other application might as well
> just do I/O on the first application’s behalf), but other applications’
> keys must not be reachable by guessing.  So perhaps a key description of
> high entropy would do (that’s given to keyctl_search()).
> 
> I think Dave also suggested using a certificate-based system, but I’m
> afraid I don’t fully understand it.  As far as I had guessed, that would
> mean the privileged process only configures what CAs are to be trusted,
> and then every process could upload its own MAC key along with a
> certificate from a trusted CA.  That would get around the whole keyring
> stuff, but it means you’d need to be able to provide trusted CAs and
> we’d need to check those certificates in the kernel.  I have no idea how
> we’d do that, I don’t think keyctl could help us there.
> 
> Furthermore, the obvious problem with my interpretation of the
> certificate proposal is that the uploading application would see its own
> key, which would be bad.  I suspect it would need to be encrypted
> somehow, but if we do that, we might as well skip the certificate part
> and just let the privileged process upload one or more global encryption
> keys, right?  (Applications could then upload encrypted keys with nonces
> and a hash, I guess, so the kernel can verify that the key is valid.)
> 
> 
> So, overall it seems like it may be workable to extend the in-kernel MAC
> verification to allow for persistent keys, but I still have some open
> questions, and I don’t know whether I should just defer them until we
> get to the point where we need persistency.
> 
> (Of note: If we implement something where a user space process (or
> multiple in conjunction) can arbitrarily choose a MAC key, then this
> will also be usable for live migration, because you can continue to use
> the key from the source on the destination.)
> 
> 
> Final minor question that doesn’t really fit in fully elsewhere: When
> generating a MAC over a file handle, should the mount ID be included?
> I’m worried that this might definitely break persistency, but I think we
> should include some FS ID.  Maybe the kernel should query the FS UUID
> for this MAC calculation, and use that instead of the mount ID?
> 
> 
> Thanks for reading and all your help,
> 
> Max
> 
> _______________________________________________
> Virtio-fs mailing list
> Virtio-fs@redhat.com
> https://listman.redhat.com/mailman/listinfo/virtio-fs


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] Current file handle status and open questions
  2021-04-13 14:05 ` Vivek Goyal
@ 2021-04-13 14:14   ` Max Reitz
  2021-04-13 14:56     ` Vivek Goyal
  0 siblings, 1 reply; 17+ messages in thread
From: Max Reitz @ 2021-04-13 14:14 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs-list

On 13.04.21 16:05, Vivek Goyal wrote:
> On Thu, Mar 18, 2021 at 06:09:58PM +0100, Max Reitz wrote:
>> Hi,
>>
>> As threatened in our last meeting, I’ve written this mail to give an
>> overview on where we stand with regards to virtiofsd(-rs) using file
>> handles.
>>
>> Technically, this should be a reply to the “Securint file handles”
>> thread, but this mail is so long I think it’s better to split it off.
>>
>> There are multiple problems that somehow relate to file handles, the
>> ones I’m aware of are:
>>
>> (A) We have a problem with too many FDs open.  To solve it, we could
>>      attach a file handle to each node, then close the FD (as far as
>>      possible) and reopen it when needed from the file handle.
>>
>> (B) We want to allow the guest to use persistent file handles.
>>
>> (C) For live migration, the problem isn’t even clear yet, but it seems
>>      like we’ll want to translate nodes into their file handles and
>>      transmit those and open them again on the destination (at least on
>>      shared filesystems).
>>
>> Now every case has its own specific tricky bits:
>>
>> Case (A) is something that we’d really like to have by default, and it
>> would need to work all the time during runtime.  So the problem here is
>> that we’d need CAP_DAC_READ_SEARCH, and we’d need it all the time, but
>> we don’t want that.  One interesting bit is that we don’t need these
>> file handles to be persistent between virtiofsd invocations.
> 
> Hi Max,
> 
> A question about CAP_DAC_READ_SEARCH. I get it that in long term we
> don't want it beacuse current limitation is that CAP_DAC_READ_SEARCH
> is needed in init_user_ns. And we want to run virtiofsd in user
> namespace and it will not have CAP_DAC_READ_SEARCH in init_user_ns.
> 
> But as of now we are running virtiofsd in init_user_ns
> CAP_DAC_READ_SEARCH.
> 
> So question is that if virtiofsd has CAP_DAC_READ_SEARCH, can we
> try to fall back to using name_to_handle_at()/open_by_handle_at()
> for lo_inode->fd. And this should help solve the problem
> A and C.
> 
> And once a solution comes along which does not require CAP_DAC_READ_SEARCH
> we could move to that. In fact virtiofsd probably will have to deal
> with both so that we could continue to work with older kernels.

For A, we basically don’t need to change anything regardless of whether 
(1) we have CAP_DAC_READ_SEARCH, or whether (2) we don’t have that 
capability, but the kernel supports MAC-ed file handles for that case.

When we want to replace lo_inode->fd, virtiofsd only generates file 
handles (and stores them instead of the fd) and opens them later (when 
the fd is needed), that’s it.  The only difference would be that it 
needs to pass a new flag (AT_HANDLE_MAC) when generating handles without 
CAP_DAC_READ_SEARCH.


And then we should be able to use those handles for live migration, too, 
yes.  As I said, I suppose perhaps we can just live with requiring 
CAP_DAC_READ_SEARCH during the initial phase.

Max


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] Current file handle status and open questions
  2021-04-13 14:14   ` Max Reitz
@ 2021-04-13 14:56     ` Vivek Goyal
  2021-04-13 15:10       ` Miklos Szeredi
  0 siblings, 1 reply; 17+ messages in thread
From: Vivek Goyal @ 2021-04-13 14:56 UTC (permalink / raw)
  To: Max Reitz; +Cc: virtio-fs-list

On Tue, Apr 13, 2021 at 04:14:57PM +0200, Max Reitz wrote:
> On 13.04.21 16:05, Vivek Goyal wrote:
> > On Thu, Mar 18, 2021 at 06:09:58PM +0100, Max Reitz wrote:
> > > Hi,
> > > 
> > > As threatened in our last meeting, I’ve written this mail to give an
> > > overview on where we stand with regards to virtiofsd(-rs) using file
> > > handles.
> > > 
> > > Technically, this should be a reply to the “Securint file handles”
> > > thread, but this mail is so long I think it’s better to split it off.
> > > 
> > > There are multiple problems that somehow relate to file handles, the
> > > ones I’m aware of are:
> > > 
> > > (A) We have a problem with too many FDs open.  To solve it, we could
> > >      attach a file handle to each node, then close the FD (as far as
> > >      possible) and reopen it when needed from the file handle.
> > > 
> > > (B) We want to allow the guest to use persistent file handles.
> > > 
> > > (C) For live migration, the problem isn’t even clear yet, but it seems
> > >      like we’ll want to translate nodes into their file handles and
> > >      transmit those and open them again on the destination (at least on
> > >      shared filesystems).
> > > 
> > > Now every case has its own specific tricky bits:
> > > 
> > > Case (A) is something that we’d really like to have by default, and it
> > > would need to work all the time during runtime.  So the problem here is
> > > that we’d need CAP_DAC_READ_SEARCH, and we’d need it all the time, but
> > > we don’t want that.  One interesting bit is that we don’t need these
> > > file handles to be persistent between virtiofsd invocations.
> > 
> > Hi Max,
> > 
> > A question about CAP_DAC_READ_SEARCH. I get it that in long term we
> > don't want it beacuse current limitation is that CAP_DAC_READ_SEARCH
> > is needed in init_user_ns. And we want to run virtiofsd in user
> > namespace and it will not have CAP_DAC_READ_SEARCH in init_user_ns.
> > 
> > But as of now we are running virtiofsd in init_user_ns
> > CAP_DAC_READ_SEARCH.
> > 
> > So question is that if virtiofsd has CAP_DAC_READ_SEARCH, can we
> > try to fall back to using name_to_handle_at()/open_by_handle_at()
> > for lo_inode->fd. And this should help solve the problem
> > A and C.
> > 
> > And once a solution comes along which does not require CAP_DAC_READ_SEARCH
> > we could move to that. In fact virtiofsd probably will have to deal
> > with both so that we could continue to work with older kernels.
> 
> For A, we basically don’t need to change anything regardless of whether (1)
> we have CAP_DAC_READ_SEARCH, or whether (2) we don’t have that capability,
> but the kernel supports MAC-ed file handles for that case.
> 
> When we want to replace lo_inode->fd, virtiofsd only generates file handles
> (and stores them instead of the fd) and opens them later (when the fd is
> needed), that’s it.  The only difference would be that it needs to pass a
> new flag (AT_HANDLE_MAC) when generating handles without
> CAP_DAC_READ_SEARCH.
> 
> 
> And then we should be able to use those handles for live migration, too,
> yes.  As I said, I suppose perhaps we can just live with requiring
> CAP_DAC_READ_SEARCH during the initial phase.

Thanks Max. I guess making A and C work with CAP_DAC_READ_SEARCH is
probably a useful functionality. People are complaining both about
too many open file descriptors as well as lack of migration capability.
Especially A, is much more pressing.

I thought we are giving CAP_DAC_READ_SEARCH but I guest checked current
source code and CAP_DAC_READ_SEARCH is not in the list. So that means
either we or user will have to give it explicitly.

    if (capng_updatev(CAPNG_ADD, CAPNG_PERMITTED | CAPNG_EFFECTIVE,
            CAP_CHOWN,
            CAP_DAC_OVERRIDE,
            CAP_FOWNER,
            CAP_FSETID,
            CAP_SETGID,
            CAP_SETUID,
            CAP_MKNOD,
            CAP_SETFCAP,
            -1)) {

Thanks
Vivek


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] Current file handle status and open questions
  2021-04-13 14:56     ` Vivek Goyal
@ 2021-04-13 15:10       ` Miklos Szeredi
  0 siblings, 0 replies; 17+ messages in thread
From: Miklos Szeredi @ 2021-04-13 15:10 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs-list, Max Reitz

On Tue, Apr 13, 2021 at 4:57 PM Vivek Goyal <vgoyal@redhat.com> wrote:

> I thought we are giving CAP_DAC_READ_SEARCH but I guest checked current
> source code and CAP_DAC_READ_SEARCH is not in the list. So that means
> either we or user will have to give it explicitly.

Looking at generic_permission() it appears that CAP_DAC_READ_SEARCH
gives a subset of CAP_DAC_OVERRIDE capabilities.  So it seems quite
safe at this point to enable CAP_DAC_READ_SEARCH too.

Thanks,
Miklos


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-04-13 15:10 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-18 17:09 [Virtio-fs] Current file handle status and open questions Max Reitz
2021-03-23 16:39 ` Sergio Lopez
2021-03-24  7:40   ` Max Reitz
2021-03-24  8:51     ` Sergio Lopez
2021-03-23 19:52 ` Vivek Goyal
2021-03-24  8:06   ` Max Reitz
2021-03-24 13:01     ` Vivek Goyal
2021-03-24 11:51 ` Stefan Hajnoczi
2021-03-24 12:28   ` Max Reitz
2021-03-24 13:08     ` Stefan Hajnoczi
2021-03-24 16:44   ` Dr. David Alan Gilbert
2021-03-24 16:57     ` Max Reitz
2021-03-24 16:54 ` Dr. David Alan Gilbert
2021-04-13 14:05 ` Vivek Goyal
2021-04-13 14:14   ` Max Reitz
2021-04-13 14:56     ` Vivek Goyal
2021-04-13 15:10       ` Miklos Szeredi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.