All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [Virtio-fs] virtiofsd: doesn't garant write access at users allowed by group permission
@ 2021-06-18 23:28 Ameer Ghani
  2021-06-22 13:45 ` Vivek Goyal
  0 siblings, 1 reply; 9+ messages in thread
From: Ameer Ghani @ 2021-06-18 23:28 UTC (permalink / raw)
  To: virtio-fs; +Cc: vgoyal

[-- Attachment #1: Type: text/plain, Size: 5550 bytes --]

Hello,

We were experiencing the same issue w/r/t group permissions not being respected
on directories.

I did a quick-and-dirty hack to add the SECBIT_NO_SETUID_FIXUP securebit, as
suggested by Chirantan and Vivek, and that seems to have done the trick. Looks
like if we set the securebits before we setup sandbox we can avoid leaving the
process with CAP_SETPCAP.

Seems a better approach would just to expose the ability to set securebits at
the CLI like what is done with caps. Then anybody affected by this bug can set
SECBIT_NO_SETUID_FIXUP themselves. I'd prefer this if the maintainers agree.

Would be happy to clean up my patch and submit it--just want to check if this
is an approach y'all are content with.

Current patch is included if anybody wants to try it out. Note that the
securebits header is lifted from kernel 5.4.0 /usr/include/linux/securebits.h

Thanks,
Ameer Ghani

---
include/standard-headers/linux/securebits.h | 61 +++++++++++++++++++++
tools/virtiofsd/passthrough_ll.c            | 15 +++++
2 files changed, 76 insertions(+)
create mode 100644 include/standard-headers/linux/securebits.h

diff --git a/include/standard-headers/linux/securebits.h b/include/standard-headers/linux/securebits.h
new file mode 100644
index 0000000000..69c11a49c5
--- /dev/null
+++ b/include/standard-headers/linux/securebits.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _LINUX_SECUREBITS_H
+#define _LINUX_SECUREBITS_H
+
+/* Each securesetting is implemented using two bits. One bit specifies
+   whether the setting is on or off. The other bit specify whether the
+   setting is locked or not. A setting which is locked cannot be
+   changed from user-level. */
+#define issecure_mask(X)  (1 << (X))
+
+#define SECUREBITS_DEFAULT 0x00000000
+
+/* When set UID 0 has no special privileges. When unset, we support
+   inheritance of root-permissions and suid-root executable under
+   compatibility mode. We raise the effective and inheritable bitmasks
+   *of the executable file* if the effective uid of the new process is
+   0. If the real uid is 0, we raise the effective (legacy) bit of the
+   executable file. */
+#define SECURE_NOROOT                0
+#define SECURE_NOROOT_LOCKED         1  /* make bit-0 immutable */
+
+#define SECBIT_NOROOT          (issecure_mask(SECURE_NOROOT))
+#define SECBIT_NOROOT_LOCKED    (issecure_mask(SECURE_NOROOT_LOCKED))
+
+/* When set, setuid to/from uid 0 does not trigger capability-"fixup".
+   When unset, to provide compatiblility with old programs relying on
+   set*uid to gain/lose privilege, transitions to/from uid 0 cause
+   capabilities to be gained/lost. */
+#define SECURE_NO_SETUID_FIXUP       2
+#define SECURE_NO_SETUID_FIXUP_LOCKED 3  /* make bit-2 immutable */
+
+#define SECBIT_NO_SETUID_FIXUP  (issecure_mask(SECURE_NO_SETUID_FIXUP))
+#define SECBIT_NO_SETUID_FIXUP_LOCKED \
+               (issecure_mask(SECURE_NO_SETUID_FIXUP_LOCKED))
+
+/* When set, a process can retain its capabilities even after
+   transitioning to a non-root user (the set-uid fixup suppressed by
+   bit 2). Bit-4 is cleared when a process calls exec(); setting both
+   bit 4 and 5 will create a barrier through exec that no exec()'d
+   child can use this feature again. */
+#define SECURE_KEEP_CAPS        4
+#define SECURE_KEEP_CAPS_LOCKED      5  /* make bit-4 immutable */
+
+#define SECBIT_KEEP_CAPS  (issecure_mask(SECURE_KEEP_CAPS))
+#define SECBIT_KEEP_CAPS_LOCKED (issecure_mask(SECURE_KEEP_CAPS_LOCKED))
+
+/* When set, a process cannot add new capabilities to its ambient set. */
+#define SECURE_NO_CAP_AMBIENT_RAISE       6
+#define SECURE_NO_CAP_AMBIENT_RAISE_LOCKED 7  /* make bit-6 immutable */
+
+#define SECBIT_NO_CAP_AMBIENT_RAISE (issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE))
+#define SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED \
+               (issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE_LOCKED))
+
+#define SECURE_ALL_BITS         (issecure_mask(SECURE_NOROOT) | \
+                    issecure_mask(SECURE_NO_SETUID_FIXUP) | \
+                    issecure_mask(SECURE_KEEP_CAPS) | \
+                    issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE))
+#define SECURE_ALL_LOCKS  (SECURE_ALL_BITS << 1)
+
+#endif /* _LINUX_SECUREBITS_H */
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 49c21fd855..afec9a6567 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -43,6 +43,7 @@
#include "fuse_log.h"
#include "fuse_lowlevel.h"
#include "standard-headers/linux/fuse.h"
+#include "standard-headers/linux/securebits.h"
#include <cap-ng.h>
#include <dirent.h>
#include <pthread.h>
@@ -3515,6 +3516,17 @@ static void setup_chroot(struct lo_data *lo)
     }
}
+static void setup_securebits(void)
+{
+    int status;
+
+    status = prctl(PR_SET_SECUREBITS, SECBIT_NO_SETUID_FIXUP);
+    if (status == -1) {
+        fuse_log(FUSE_LOG_ERR, "prctl(PR_SET_SECUREBITS): %m\n");
+        exit(1);
+    }
+}
+
/*
  * Lock down this process to prevent access to other processes or files outside
  * source directory.  This reduces the impact of arbitrary code execution bugs.
@@ -3869,6 +3881,9 @@ int main(int argc, char *argv[])
     setup_nofile_rlimit(opts.rlimit_nofile);
+    /* Set securebits before we drop CAP_SETPCAP */
+    setup_securebits();
+
     /* Must be before sandbox since it wants /proc */
     setup_capng();
--
2.25.1


[-- Attachment #2: Type: text/html, Size: 21041 bytes --]

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Virtio-fs] virtiofsd: doesn't garant write access at users allowed by group permission
  2021-06-18 23:28 [Virtio-fs] virtiofsd: doesn't garant write access at users allowed by group permission Ameer Ghani
@ 2021-06-22 13:45 ` Vivek Goyal
  2021-06-24 14:23   ` Ameer Ghani
  0 siblings, 1 reply; 9+ messages in thread
From: Vivek Goyal @ 2021-06-22 13:45 UTC (permalink / raw)
  To: Ameer Ghani; +Cc: virtio-fs

On Fri, Jun 18, 2021 at 11:28:13PM +0000, Ameer Ghani wrote:
> Hello,
> 
> We were experiencing the same issue w/r/t group permissions not being respected
> on directories.
> 
> I did a quick-and-dirty hack to add the SECBIT_NO_SETUID_FIXUP securebit, as
> suggested by Chirantan and Vivek, and that seems to have done the trick. Looks
> like if we set the securebits before we setup sandbox we can avoid leaving the
> process with CAP_SETPCAP.

This is assuming that in all the cases we need not drop capabilities upon
uid switch. Hmm.., thinking about couple of use caches of swithing uid/gid
I have in mind.

- I have proposed ACL support patches for virtiofs. There I need to switch
  gid (atleast) and drop CAP_FSETID. Currently I am switching both uid/gid
  and dropping CAP_FSETID explicitly. But If I know that I am switching
  uid to non-zero and that will automatically drop CAP_FSETID, then I 
  will not have made explicit call to drop CAP_FSETID.

- There have been reports of some xfstests failing when virtiofsd is
  running on top of NFS with "-o killpriv_v2" enabled. For NFS, just
  dropping CAP_FSETID is not enough to clear suid/sgid (as local file
  systems do). So we probably will have to switch uid/gid in some
  more places where we kill suid/sgid bits so that it works on top of
  NFS too. Do we care about dropping capabilities, not sure. Maybe not.

  So I guess this is just an optimization. If we leave SECBIT_NO_SETUID_FIXUP
  set for virtiofsd for its life, then we need to make sure if we need
  to make sure that any capability will have to be dropped explicitly
  and just uid switch will not drop capability automatically.

I guess its ok to set SECBIT_NO_SETUID_FIXUP and drop CAP_SETPCAP and
let virtiosd drop capabilities explicitly where need be.

If this becomes too painful or inefficient from performance point of view,
we probably will have to change it and set SECBIT_NO_SETUID_FIXUP only
during file creation path. (lo_create and lo_mknod).

> 
> Seems a better approach would just to expose the ability to set securebits at
> the CLI like what is done with caps. Then anybody affected by this bug can set
> SECBIT_NO_SETUID_FIXUP themselves. I'd prefer this if the maintainers agree.

I would think that don't ask user to opt-in for this behavior and just
implement it for everyone. Asking too many many questions will make
configuration more complex.

> 
> Would be happy to clean up my patch and submit it--just want to check if this
> is an approach y'all are content with.

Can you please also run xfstests and see if this patch introduces any
regressions. Just want to make sure there are no unintended side affects.

Please do submit a formal patch.

> 
> Current patch is included if anybody wants to try it out. Note that the
> securebits header is lifted from kernel 5.4.0 /usr/include/linux/securebits.h

You have have to split changes in two patches. First patch probably adds
latest header (from 5.13-rc5) kernel and second patch does the securebit
magic.

Thanks
Vivek
> 
> Thanks,
> Ameer Ghani
> 
> ---
> include/standard-headers/linux/securebits.h | 61 +++++++++++++++++++++
> tools/virtiofsd/passthrough_ll.c            | 15 +++++
> 2 files changed, 76 insertions(+)
> create mode 100644 include/standard-headers/linux/securebits.h
> 
> diff --git a/include/standard-headers/linux/securebits.h b/include/standard-headers/linux/securebits.h
> new file mode 100644
> index 0000000000..69c11a49c5
> --- /dev/null
> +++ b/include/standard-headers/linux/securebits.h
> @@ -0,0 +1,61 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +#ifndef _LINUX_SECUREBITS_H
> +#define _LINUX_SECUREBITS_H
> +
> +/* Each securesetting is implemented using two bits. One bit specifies
> +   whether the setting is on or off. The other bit specify whether the
> +   setting is locked or not. A setting which is locked cannot be
> +   changed from user-level. */
> +#define issecure_mask(X)  (1 << (X))
> +
> +#define SECUREBITS_DEFAULT 0x00000000
> +
> +/* When set UID 0 has no special privileges. When unset, we support
> +   inheritance of root-permissions and suid-root executable under
> +   compatibility mode. We raise the effective and inheritable bitmasks
> +   *of the executable file* if the effective uid of the new process is
> +   0. If the real uid is 0, we raise the effective (legacy) bit of the
> +   executable file. */
> +#define SECURE_NOROOT                0
> +#define SECURE_NOROOT_LOCKED         1  /* make bit-0 immutable */
> +
> +#define SECBIT_NOROOT          (issecure_mask(SECURE_NOROOT))
> +#define SECBIT_NOROOT_LOCKED    (issecure_mask(SECURE_NOROOT_LOCKED))
> +
> +/* When set, setuid to/from uid 0 does not trigger capability-"fixup".
> +   When unset, to provide compatiblility with old programs relying on
> +   set*uid to gain/lose privilege, transitions to/from uid 0 cause
> +   capabilities to be gained/lost. */
> +#define SECURE_NO_SETUID_FIXUP       2
> +#define SECURE_NO_SETUID_FIXUP_LOCKED 3  /* make bit-2 immutable */
> +
> +#define SECBIT_NO_SETUID_FIXUP  (issecure_mask(SECURE_NO_SETUID_FIXUP))
> +#define SECBIT_NO_SETUID_FIXUP_LOCKED \
> +               (issecure_mask(SECURE_NO_SETUID_FIXUP_LOCKED))
> +
> +/* When set, a process can retain its capabilities even after
> +   transitioning to a non-root user (the set-uid fixup suppressed by
> +   bit 2). Bit-4 is cleared when a process calls exec(); setting both
> +   bit 4 and 5 will create a barrier through exec that no exec()'d
> +   child can use this feature again. */
> +#define SECURE_KEEP_CAPS        4
> +#define SECURE_KEEP_CAPS_LOCKED      5  /* make bit-4 immutable */
> +
> +#define SECBIT_KEEP_CAPS  (issecure_mask(SECURE_KEEP_CAPS))
> +#define SECBIT_KEEP_CAPS_LOCKED (issecure_mask(SECURE_KEEP_CAPS_LOCKED))
> +
> +/* When set, a process cannot add new capabilities to its ambient set. */
> +#define SECURE_NO_CAP_AMBIENT_RAISE       6
> +#define SECURE_NO_CAP_AMBIENT_RAISE_LOCKED 7  /* make bit-6 immutable */
> +
> +#define SECBIT_NO_CAP_AMBIENT_RAISE (issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE))
> +#define SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED \
> +               (issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE_LOCKED))
> +
> +#define SECURE_ALL_BITS         (issecure_mask(SECURE_NOROOT) | \
> +                    issecure_mask(SECURE_NO_SETUID_FIXUP) | \
> +                    issecure_mask(SECURE_KEEP_CAPS) | \
> +                    issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE))
> +#define SECURE_ALL_LOCKS  (SECURE_ALL_BITS << 1)
> +
> +#endif /* _LINUX_SECUREBITS_H */
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 49c21fd855..afec9a6567 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -43,6 +43,7 @@
> #include "fuse_log.h"
> #include "fuse_lowlevel.h"
> #include "standard-headers/linux/fuse.h"
> +#include "standard-headers/linux/securebits.h"
> #include <cap-ng.h>
> #include <dirent.h>
> #include <pthread.h>
> @@ -3515,6 +3516,17 @@ static void setup_chroot(struct lo_data *lo)
>      }
> }
> +static void setup_securebits(void)
> +{
> +    int status;
> +
> +    status = prctl(PR_SET_SECUREBITS, SECBIT_NO_SETUID_FIXUP);
> +    if (status == -1) {
> +        fuse_log(FUSE_LOG_ERR, "prctl(PR_SET_SECUREBITS): %m\n");
> +        exit(1);
> +    }
> +}
> +
> /*
>   * Lock down this process to prevent access to other processes or files outside
>   * source directory.  This reduces the impact of arbitrary code execution bugs.
> @@ -3869,6 +3881,9 @@ int main(int argc, char *argv[])
>      setup_nofile_rlimit(opts.rlimit_nofile);
> +    /* Set securebits before we drop CAP_SETPCAP */
> +    setup_securebits();
> +
>      /* Must be before sandbox since it wants /proc */
>      setup_capng();
> --
> 2.25.1
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Virtio-fs] virtiofsd: doesn't garant write access at users allowed by group permission
  2021-06-22 13:45 ` Vivek Goyal
@ 2021-06-24 14:23   ` Ameer Ghani
  0 siblings, 0 replies; 9+ messages in thread
From: Ameer Ghani @ 2021-06-24 14:23 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs


On 6/22/21, 8:46 AM, "Vivek Goyal" <vgoyal@redhat.com> wrote:

>    I guess its ok to set SECBIT_NO_SETUID_FIXUP and drop CAP_SETPCAP and
>    let virtiosd drop capabilities explicitly where need be.
>
>    If this becomes too painful or inefficient from performance point of view,
>    we probably will have to change it and set SECBIT_NO_SETUID_FIXUP only
>    during file creation path. (lo_create and lo_mknod).

I think I follow. Will proceed with permanent SECBIT_NO_SETUID_FIXUP, but I'll
also explore setting/dropping in places where it's explicitly needed.

>    I would think that don't ask user to opt-in for this behavior and just
>    implement it for everyone. Asking too many many questions will make
>    configuration more complex.

Understood.

>    Can you please also run xfstests and see if this patch introduces any
>    regressions. Just want to make sure there are no unintended side affects.
>
>    Please do submit a formal patch.

Will do!



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Virtio-fs] virtiofsd: doesn't garant write access at users allowed by group permission
  2021-06-04 18:30   ` tecufanujacu
@ 2021-06-07 15:51     ` Harry G. Coin
  0 siblings, 0 replies; 9+ messages in thread
From: Harry G. Coin @ 2021-06-07 15:51 UTC (permalink / raw)
  To: virtio-fs


On 6/4/21 1:30 PM, tecufanujacu@tutanota.com wrote:
> I guess a third option is to change the fuse protocol so that it also
> includes the supplementary groups of the calling process.

I suspect that option will, over time, require the fewest
developer/maintainer hours.

But not in the short term....




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Virtio-fs] virtiofsd: doesn't garant write access at users allowed by group permission
  2021-06-02  8:32 ` Chirantan Ekbote
  2021-06-02 13:17   ` Dr. David Alan Gilbert
  2021-06-04 13:16   ` Vivek Goyal
@ 2021-06-04 18:30   ` tecufanujacu
  2021-06-07 15:51     ` Harry G. Coin
  2 siblings, 1 reply; 9+ messages in thread
From: tecufanujacu @ 2021-06-04 18:30 UTC (permalink / raw)
  To: Chirantan Ekbote; +Cc: virtio-fs-list

[-- Attachment #1: Type: text/plain, Size: 4740 bytes --]

Thanks for this great explanation, now everything is more clear to me.

Basically this problem happens every time there is permissions configuration just a little bit more advanced and at the moment there isn't an on the fly solution that doesn't need kernel patching and recompiling.

Good to know.


2 giu 2021, 10:32 da chirantan@chromium.org:

> On Wed, Jun 2, 2021 at 1:01 AM <tecufanujacu@tutanota.com> wrote:
>
>>
>> Hello to everyone,
>>
>> I'm trying virtiofsd with proxmox from few weeks and I noticed a problem, virtiofsd doesn't garant write access at users allowed by group permission.
>>
>> The virtiofsd bin included in proxmox is v 7.32, but I have also tested the bins compiled from source from stable branch (v7.27) and dev branch (v7.33), and I also have tested the bin virtiofsd-rs. Always same problem.
>>
>> I opened an issue on gitlab and I asked in the proxmox forum but with almost zero interaction. Seems to me that virtiofsd isn't a lot used. Someone suggested me to write in the mailinglist for these technical things and here I am.
>>
>> To better make understand what problem I'm noticing I link the issue opened in the gitlab in which I have reported a lot of useful info and logs with a good formatting:
>> https://gitlab.com/qemu-project/qemu/-/issues/368
>>
>> To me seems really strange what is happening, am I doing some error or this really is a virtiofsd bug?
>>
>
> I'm pretty sure this is a virtiofsd issue because we ran into the same
> problem on crosvm.  The issue is that the server changes its uid/gid
> to the uid/gid in the fuse context before making the syscall.  This
> ensures that the file/directory appears atomically with the correct
> metadata.  However, this causes an EACCES error when the following
> conditions are
> met:
>
> * The parent directory has g+rw permissions with gid A
> * The process has gid B but has A in its list of supplementary groups
>
> In this case the fuse context only contains gid B, which doesn't have
> permission to modify the parent directory.
>
> There are a couple of ways to fix this problem:
>
> The first one we tried was to split file/directory creation into 2
> stages [1].  Basically for files we first create a temporary with
> O_TMPFILE and then initialize the metadata before linking it into the
> directory tree.  The main issue with this is that we're duplicating
> the work that kernel already does on open and turning a single syscall
> in the VM into several syscalls on the host, which adds a significant
> amount of latency.  You also have to deal with a bunch of esoteric
> corner cases for file systems that the kernel would normally just
> handle automatically [2][3].  For directories, there is no O_TMPFILE
> equivalent so we had to do a gross hack of creating a directory with a
> random name and then renaming it to the correct one once all the
> metadata was properly initialized.  In theory you could create the
> directory in a separate "work dir" first but you have to be careful if
> the original directory uses selinux.  From what I understand, rename
> preserves the security context so to ensure the security context is
> properly inherited from the parent directory you need to create a new
> directory anyway.  Or figure out what the correct context should be
> and set it in the work dir before the rename.
>
> The second solution, which is also what we're using now, is to set the
> SECBIT_NO_SETUID_FIXUP flag.  This flag prevents the kernel from
> dropping caps when the process changes its uid/gid so the permission
> checks are skipped as long as the server has the appropriate
> capabilities (CAP_DAC_OVERRIDE, I think?).  Doing this lets us drop
> all the special handling code and just go back to making a single
> syscall and letting the kernel figure out the rest [4].  The crosvm fs
> device always runs as root in a user namespace with the proper caps so
> this works for us but obviously will not work if virtiofsd doesn't
> have the caps to begin with.  At that point the first option may be
> the only choice.
>
> I guess a third option is to change the fuse protocol so that it also
> includes the supplementary groups of the calling process.  Then the
> server can also set its supplementary groups the same way before
> making the syscall.  Merging the necessary changes into the kernel is
> left as an exercise for the reader ;-)
>
>
> Chirantan
>
> [1]: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2217534
> [2]: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2253493
> [3]: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2260253
> [4]: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2684067
>


[-- Attachment #2: Type: text/html, Size: 6105 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Virtio-fs] virtiofsd: doesn't garant write access at users allowed by group permission
  2021-06-02  8:32 ` Chirantan Ekbote
  2021-06-02 13:17   ` Dr. David Alan Gilbert
@ 2021-06-04 13:16   ` Vivek Goyal
  2021-06-04 18:30   ` tecufanujacu
  2 siblings, 0 replies; 9+ messages in thread
From: Vivek Goyal @ 2021-06-04 13:16 UTC (permalink / raw)
  To: Chirantan Ekbote; +Cc: virtio-fs-list

On Wed, Jun 02, 2021 at 05:32:49PM +0900, Chirantan Ekbote wrote:
> On Wed, Jun 2, 2021 at 1:01 AM <tecufanujacu@tutanota.com> wrote:
> >
> > Hello to everyone,
> >
> > I'm trying virtiofsd with proxmox from few weeks and I noticed a problem, virtiofsd doesn't garant write access at users allowed by group permission.
> >
> > The virtiofsd bin included in proxmox is v 7.32, but I have also tested the bins compiled from source from stable branch (v7.27) and dev branch (v7.33), and I also have tested the bin virtiofsd-rs. Always same problem.
> >
> > I opened an issue on gitlab and I asked in the proxmox forum but with almost zero interaction. Seems to me that virtiofsd isn't a lot used. Someone suggested me to write in the mailinglist for these technical things and here I am.
> >
> > To better make understand what problem I'm noticing I link the issue opened in the gitlab in which I have reported a lot of useful info and logs with a good formatting:
> > https://gitlab.com/qemu-project/qemu/-/issues/368
> >
> > To me seems really strange what is happening, am I doing some error or this really is a virtiofsd bug?
> 
> I'm pretty sure this is a virtiofsd issue because we ran into the same
> problem on crosvm.  The issue is that the server changes its uid/gid
> to the uid/gid in the fuse context before making the syscall.  This
> ensures that the file/directory appears atomically with the correct
> metadata.  However, this causes an EACCES error when the following
> conditions are
> met:
> 
> * The parent directory has g+rw permissions with gid A
> * The process has gid B but has A in its list of supplementary groups
> 
> In this case the fuse context only contains gid B, which doesn't have
> permission to modify the parent directory.
> 
> There are a couple of ways to fix this problem:
> 
> The first one we tried was to split file/directory creation into 2
> stages [1].  Basically for files we first create a temporary with
> O_TMPFILE and then initialize the metadata before linking it into the
> directory tree.  The main issue with this is that we're duplicating
> the work that kernel already does on open and turning a single syscall
> in the VM into several syscalls on the host, which adds a significant
> amount of latency.  You also have to deal with a bunch of esoteric
> corner cases for file systems that the kernel would normally just
> handle automatically [2][3].  For directories, there is no O_TMPFILE
> equivalent so we had to do a gross hack of creating a directory with a
> random name and then renaming it to the correct one once all the
> metadata was properly initialized.  In theory you could create the
> directory in a separate "work dir" first but you have to be careful if
> the original directory uses selinux.  From what I understand, rename
> preserves the security context so to ensure the security context is
> properly inherited from the parent directory you need to create a new
> directory anyway.  Or figure out what the correct context should be
> and set it in the work dir before the rename.

Hi Chirantan,

Thanks for the updating us with all the work you have already done to
tackle this issue. 

> 
> The second solution, which is also what we're using now, is to set the
> SECBIT_NO_SETUID_FIXUP flag.  This flag prevents the kernel from
> dropping caps when the process changes its uid/gid so the permission
> checks are skipped as long as the server has the appropriate
> capabilities (CAP_DAC_OVERRIDE, I think?).  Doing this lets us drop
> all the special handling code and just go back to making a single
> syscall and letting the kernel figure out the rest [4].  The crosvm fs
> device always runs as root in a user namespace with the proper caps so
> this works for us but obviously will not work if virtiofsd doesn't
> have the caps to begin with.  At that point the first option may be
> the only choice.

For the short term fix, this solution sounds reasonable. Only reason
we are swtiching to uid/gid of caller is not because of any permission
checks but to make sure new file is created with caller as owner (and
not root as owner). Permission checks are supposed to be done in
client. So if we set SECBIT_NO_SETUID_FIXUP securebit flag and
retain CAP_DAC_OVERRIDE after switching uid/gid, that should do it.

We still run in init_user_ns and already give CAP_DAC_OVERRIDE. man
says setting securebit flag requires CAP_SETPCAP. We don't give this
capability to daemon as of now. Looks like we will have to add it.

I do want to play with running virtiofsd in user namespace. I think
one of the problem still is who is going to choose the uid/gid range
for virtiofsd and make sure there are no conflicts with other
daemons. Also this will need support for changing uid/gids of files
according to mapping (Or now use idmapped mount support).

> 
> I guess a third option is to change the fuse protocol so that it also
> includes the supplementary groups of the calling process.  Then the
> server can also set its supplementary groups the same way before
> making the syscall.  Merging the necessary changes into the kernel is
> left as an exercise for the reader ;-)

This definitely sounds like a more long term solution to the problem.
Carry all supplemental group membership id in fuse message. Not sure
how much work this is.

Cc Miklos. He might have more thoughts on this.

Thanks
Vivek

> 
> 
> Chirantan
> 
> [1]: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2217534
> [2]: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2253493
> [3]: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2260253
> [4]: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2684067
> 
> _______________________________________________
> Virtio-fs mailing list
> Virtio-fs@redhat.com
> https://listman.redhat.com/mailman/listinfo/virtio-fs
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Virtio-fs] virtiofsd: doesn't garant write access at users allowed by group permission
  2021-06-02  8:32 ` Chirantan Ekbote
@ 2021-06-02 13:17   ` Dr. David Alan Gilbert
  2021-06-04 13:16   ` Vivek Goyal
  2021-06-04 18:30   ` tecufanujacu
  2 siblings, 0 replies; 9+ messages in thread
From: Dr. David Alan Gilbert @ 2021-06-02 13:17 UTC (permalink / raw)
  To: Chirantan Ekbote; +Cc: virtio-fs-list

* Chirantan Ekbote (chirantan@chromium.org) wrote:
> On Wed, Jun 2, 2021 at 1:01 AM <tecufanujacu@tutanota.com> wrote:
> >
> > Hello to everyone,
> >
> > I'm trying virtiofsd with proxmox from few weeks and I noticed a problem, virtiofsd doesn't garant write access at users allowed by group permission.
> >
> > The virtiofsd bin included in proxmox is v 7.32, but I have also tested the bins compiled from source from stable branch (v7.27) and dev branch (v7.33), and I also have tested the bin virtiofsd-rs. Always same problem.
> >
> > I opened an issue on gitlab and I asked in the proxmox forum but with almost zero interaction. Seems to me that virtiofsd isn't a lot used. Someone suggested me to write in the mailinglist for these technical things and here I am.
> >
> > To better make understand what problem I'm noticing I link the issue opened in the gitlab in which I have reported a lot of useful info and logs with a good formatting:
> > https://gitlab.com/qemu-project/qemu/-/issues/368
> >
> > To me seems really strange what is happening, am I doing some error or this really is a virtiofsd bug?
> 
> I'm pretty sure this is a virtiofsd issue because we ran into the same
> problem on crosvm.  The issue is that the server changes its uid/gid
> to the uid/gid in the fuse context before making the syscall.  This
> ensures that the file/directory appears atomically with the correct
> metadata.  However, this causes an EACCES error when the following
> conditions are
> met:
> 
> * The parent directory has g+rw permissions with gid A
> * The process has gid B but has A in its list of supplementary groups
> 
> In this case the fuse context only contains gid B, which doesn't have
> permission to modify the parent directory.

Ewww; thanks for that description.
Could you add that to that gitlab issue please?

Dave

> There are a couple of ways to fix this problem:
> 
> The first one we tried was to split file/directory creation into 2
> stages [1].  Basically for files we first create a temporary with
> O_TMPFILE and then initialize the metadata before linking it into the
> directory tree.  The main issue with this is that we're duplicating
> the work that kernel already does on open and turning a single syscall
> in the VM into several syscalls on the host, which adds a significant
> amount of latency.  You also have to deal with a bunch of esoteric
> corner cases for file systems that the kernel would normally just
> handle automatically [2][3].  For directories, there is no O_TMPFILE
> equivalent so we had to do a gross hack of creating a directory with a
> random name and then renaming it to the correct one once all the
> metadata was properly initialized.  In theory you could create the
> directory in a separate "work dir" first but you have to be careful if
> the original directory uses selinux.  From what I understand, rename
> preserves the security context so to ensure the security context is
> properly inherited from the parent directory you need to create a new
> directory anyway.  Or figure out what the correct context should be
> and set it in the work dir before the rename.
> 
> The second solution, which is also what we're using now, is to set the
> SECBIT_NO_SETUID_FIXUP flag.  This flag prevents the kernel from
> dropping caps when the process changes its uid/gid so the permission
> checks are skipped as long as the server has the appropriate
> capabilities (CAP_DAC_OVERRIDE, I think?).  Doing this lets us drop
> all the special handling code and just go back to making a single
> syscall and letting the kernel figure out the rest [4].  The crosvm fs
> device always runs as root in a user namespace with the proper caps so
> this works for us but obviously will not work if virtiofsd doesn't
> have the caps to begin with.  At that point the first option may be
> the only choice.
> 
> I guess a third option is to change the fuse protocol so that it also
> includes the supplementary groups of the calling process.  Then the
> server can also set its supplementary groups the same way before
> making the syscall.  Merging the necessary changes into the kernel is
> left as an exercise for the reader ;-)
> 
> 
> Chirantan
> 
> [1]: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2217534
> [2]: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2253493
> [3]: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2260253
> [4]: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2684067
> 
> _______________________________________________
> Virtio-fs mailing list
> Virtio-fs@redhat.com
> https://listman.redhat.com/mailman/listinfo/virtio-fs
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Virtio-fs] virtiofsd: doesn't garant write access at users allowed by group permission
  2021-06-01 15:55 tecufanujacu
@ 2021-06-02  8:32 ` Chirantan Ekbote
  2021-06-02 13:17   ` Dr. David Alan Gilbert
                     ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Chirantan Ekbote @ 2021-06-02  8:32 UTC (permalink / raw)
  To: tecufanujacu; +Cc: virtio-fs-list

On Wed, Jun 2, 2021 at 1:01 AM <tecufanujacu@tutanota.com> wrote:
>
> Hello to everyone,
>
> I'm trying virtiofsd with proxmox from few weeks and I noticed a problem, virtiofsd doesn't garant write access at users allowed by group permission.
>
> The virtiofsd bin included in proxmox is v 7.32, but I have also tested the bins compiled from source from stable branch (v7.27) and dev branch (v7.33), and I also have tested the bin virtiofsd-rs. Always same problem.
>
> I opened an issue on gitlab and I asked in the proxmox forum but with almost zero interaction. Seems to me that virtiofsd isn't a lot used. Someone suggested me to write in the mailinglist for these technical things and here I am.
>
> To better make understand what problem I'm noticing I link the issue opened in the gitlab in which I have reported a lot of useful info and logs with a good formatting:
> https://gitlab.com/qemu-project/qemu/-/issues/368
>
> To me seems really strange what is happening, am I doing some error or this really is a virtiofsd bug?

I'm pretty sure this is a virtiofsd issue because we ran into the same
problem on crosvm.  The issue is that the server changes its uid/gid
to the uid/gid in the fuse context before making the syscall.  This
ensures that the file/directory appears atomically with the correct
metadata.  However, this causes an EACCES error when the following
conditions are
met:

* The parent directory has g+rw permissions with gid A
* The process has gid B but has A in its list of supplementary groups

In this case the fuse context only contains gid B, which doesn't have
permission to modify the parent directory.

There are a couple of ways to fix this problem:

The first one we tried was to split file/directory creation into 2
stages [1].  Basically for files we first create a temporary with
O_TMPFILE and then initialize the metadata before linking it into the
directory tree.  The main issue with this is that we're duplicating
the work that kernel already does on open and turning a single syscall
in the VM into several syscalls on the host, which adds a significant
amount of latency.  You also have to deal with a bunch of esoteric
corner cases for file systems that the kernel would normally just
handle automatically [2][3].  For directories, there is no O_TMPFILE
equivalent so we had to do a gross hack of creating a directory with a
random name and then renaming it to the correct one once all the
metadata was properly initialized.  In theory you could create the
directory in a separate "work dir" first but you have to be careful if
the original directory uses selinux.  From what I understand, rename
preserves the security context so to ensure the security context is
properly inherited from the parent directory you need to create a new
directory anyway.  Or figure out what the correct context should be
and set it in the work dir before the rename.

The second solution, which is also what we're using now, is to set the
SECBIT_NO_SETUID_FIXUP flag.  This flag prevents the kernel from
dropping caps when the process changes its uid/gid so the permission
checks are skipped as long as the server has the appropriate
capabilities (CAP_DAC_OVERRIDE, I think?).  Doing this lets us drop
all the special handling code and just go back to making a single
syscall and letting the kernel figure out the rest [4].  The crosvm fs
device always runs as root in a user namespace with the proper caps so
this works for us but obviously will not work if virtiofsd doesn't
have the caps to begin with.  At that point the first option may be
the only choice.

I guess a third option is to change the fuse protocol so that it also
includes the supplementary groups of the calling process.  Then the
server can also set its supplementary groups the same way before
making the syscall.  Merging the necessary changes into the kernel is
left as an exercise for the reader ;-)


Chirantan

[1]: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2217534
[2]: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2253493
[3]: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2260253
[4]: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2684067


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Virtio-fs] virtiofsd: doesn't garant write access at users allowed by group permission
@ 2021-06-01 15:55 tecufanujacu
  2021-06-02  8:32 ` Chirantan Ekbote
  0 siblings, 1 reply; 9+ messages in thread
From: tecufanujacu @ 2021-06-01 15:55 UTC (permalink / raw)
  To: virtio-fs

[-- Attachment #1: Type: text/plain, Size: 946 bytes --]

Hello to everyone,

I'm trying virtiofsd with proxmox from few weeks and I noticed a problem, virtiofsd doesn't garant write access at users allowed by group permission.

The virtiofsd bin included in proxmox is v 7.32, but I have also tested the bins compiled from source from stable branch (v7.27) and dev branch (v7.33), and I also have tested the bin virtiofsd-rs. Always same problem.

I opened an issue on gitlab and I asked in the proxmox forum but with almost zero interaction. Seems to me that virtiofsd isn't a lot used. Someone suggested me to write in the mailinglist for these technical things and here I am.

To better make understand what problem I'm noticing I link the issue opened in the gitlab in which I have reported a lot of useful info and logs with a good formatting:
https://gitlab.com/qemu-project/qemu/-/issues/368

To me seems really strange what is happening, am I doing some error or this really is a virtiofsd bug?

[-- Attachment #2: Type: text/html, Size: 4404 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-06-24 14:23 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-18 23:28 [Virtio-fs] virtiofsd: doesn't garant write access at users allowed by group permission Ameer Ghani
2021-06-22 13:45 ` Vivek Goyal
2021-06-24 14:23   ` Ameer Ghani
  -- strict thread matches above, loose matches on Subject: below --
2021-06-01 15:55 tecufanujacu
2021-06-02  8:32 ` Chirantan Ekbote
2021-06-02 13:17   ` Dr. David Alan Gilbert
2021-06-04 13:16   ` Vivek Goyal
2021-06-04 18:30   ` tecufanujacu
2021-06-07 15:51     ` Harry G. Coin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.