All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] fuse: Rearrange fuse_allow_current_process checks
@ 2022-10-20 20:14 Dave Marchevsky
  2022-10-21  8:09 ` Christian Brauner
  2022-10-21 16:05 ` Andrii Nakryiko
  0 siblings, 2 replies; 9+ messages in thread
From: Dave Marchevsky @ 2022-10-20 20:14 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Miklos Szeredi, kernel-team, Dave Marchevsky, Andrii Nakryiko,
	Christian Brauner

This is a followup to a previous commit of mine [0], which added the
allow_sys_admin_access && capable(CAP_SYS_ADMIN) check. This patch
rearranges the order of checks in fuse_allow_current_process without
changing functionality.

[0] added allow_sys_admin_access && capable(CAP_SYS_ADMIN) check to the
beginning of the function, with the reasoning that
allow_sys_admin_access should be an 'escape hatch' for users with
CAP_SYS_ADMIN, allowing them to skip any subsequent checks.

However, placing this new check first results in many capable() calls when
allow_sys_admin_access is set, where another check would've also
returned 1. This can be problematic when a BPF program is tracing
capable() calls.

At Meta we ran into such a scenario recently. On a host where
allow_sys_admin_access is set but most of the FUSE access is from
processes which would pass other checks - i.e. they don't need
CAP_SYS_ADMIN 'escape hatch' - this results in an unnecessary capable()
call for each fs op. We also have a daemon tracing capable() with BPF and
doing some data collection, so tracing these extraneous capable() calls
has the potential to regress performance for an application doing many
FUSE ops.

So rearrange the order of these checks such that CAP_SYS_ADMIN 'escape
hatch' is checked last. Previously, if allow_other is set on the
fuse_conn, uid/gid checking doesn't happen as current_in_userns result
is returned. It's necessary to add a 'goto' here to skip past uid/gid
check to maintain those semantics here.

  [0]: commit 9ccf47b26b73 ("fuse: Add module param for CAP_SYS_ADMIN access bypassing allow_other")

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
---
v1 -> v2: lore.kernel.org/linux-fsdevel/20221020183830.1077143-1-davemarchevsky@fb.com

  * Add missing brackets to allow_other if statement (Andrii)

 fs/fuse/dir.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 2c4b08a6ec81..2f14e90907a2 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1254,11 +1254,11 @@ int fuse_allow_current_process(struct fuse_conn *fc)
 {
 	const struct cred *cred;
 
-	if (allow_sys_admin_access && capable(CAP_SYS_ADMIN))
-		return 1;
-
-	if (fc->allow_other)
-		return current_in_userns(fc->user_ns);
+	if (fc->allow_other) {
+		if (current_in_userns(fc->user_ns))
+			return 1;
+		goto skip_cred_check;
+	}
 
 	cred = current_cred();
 	if (uid_eq(cred->euid, fc->user_id) &&
@@ -1269,6 +1269,10 @@ int fuse_allow_current_process(struct fuse_conn *fc)
 	    gid_eq(cred->gid,  fc->group_id))
 		return 1;
 
+skip_cred_check:
+	if (allow_sys_admin_access && capable(CAP_SYS_ADMIN))
+		return 1;
+
 	return 0;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] fuse: Rearrange fuse_allow_current_process checks
  2022-10-20 20:14 [PATCH v2] fuse: Rearrange fuse_allow_current_process checks Dave Marchevsky
@ 2022-10-21  8:09 ` Christian Brauner
  2022-10-21 16:02   ` Andrii Nakryiko
  2022-10-21 16:05 ` Andrii Nakryiko
  1 sibling, 1 reply; 9+ messages in thread
From: Christian Brauner @ 2022-10-21  8:09 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: linux-fsdevel, Miklos Szeredi, kernel-team, Andrii Nakryiko

On Thu, Oct 20, 2022 at 01:14:09PM -0700, Dave Marchevsky wrote:
> This is a followup to a previous commit of mine [0], which added the
> allow_sys_admin_access && capable(CAP_SYS_ADMIN) check. This patch
> rearranges the order of checks in fuse_allow_current_process without
> changing functionality.
> 
> [0] added allow_sys_admin_access && capable(CAP_SYS_ADMIN) check to the
> beginning of the function, with the reasoning that
> allow_sys_admin_access should be an 'escape hatch' for users with
> CAP_SYS_ADMIN, allowing them to skip any subsequent checks.
> 
> However, placing this new check first results in many capable() calls when
> allow_sys_admin_access is set, where another check would've also
> returned 1. This can be problematic when a BPF program is tracing
> capable() calls.
> 
> At Meta we ran into such a scenario recently. On a host where
> allow_sys_admin_access is set but most of the FUSE access is from
> processes which would pass other checks - i.e. they don't need
> CAP_SYS_ADMIN 'escape hatch' - this results in an unnecessary capable()
> call for each fs op. We also have a daemon tracing capable() with BPF and
> doing some data collection, so tracing these extraneous capable() calls
> has the potential to regress performance for an application doing many
> FUSE ops.
> 
> So rearrange the order of these checks such that CAP_SYS_ADMIN 'escape
> hatch' is checked last. Previously, if allow_other is set on the
> fuse_conn, uid/gid checking doesn't happen as current_in_userns result
> is returned. It's necessary to add a 'goto' here to skip past uid/gid
> check to maintain those semantics here.
> 
>   [0]: commit 9ccf47b26b73 ("fuse: Add module param for CAP_SYS_ADMIN access bypassing allow_other")
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> Cc: Christian Brauner <brauner@kernel.org>
> ---

The idea sounds good.

> v1 -> v2: lore.kernel.org/linux-fsdevel/20221020183830.1077143-1-davemarchevsky@fb.com
> 
>   * Add missing brackets to allow_other if statement (Andrii)
> 
>  fs/fuse/dir.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 2c4b08a6ec81..2f14e90907a2 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1254,11 +1254,11 @@ int fuse_allow_current_process(struct fuse_conn *fc)
>  {
>  	const struct cred *cred;
>  
> -	if (allow_sys_admin_access && capable(CAP_SYS_ADMIN))
> -		return 1;
> -
> -	if (fc->allow_other)
> -		return current_in_userns(fc->user_ns);
> +	if (fc->allow_other) {
> +		if (current_in_userns(fc->user_ns))
> +			return 1;
> +		goto skip_cred_check;

I think this is misnamed especially because capabilities are creds as
well. Maybe we should not use a goto even if it makes the patch a bit
bigger (_completely untested_)?:

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index bb97a384dc5d..3d733e0865bf 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1235,6 +1235,28 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
        return err;
 }

+static inline bool fuse_permissible_uidgid(const struct fuse_conn *fc)
+{
+       cred = current_cred();
+       return (uid_eq(cred->euid, fc->user_id) &&
+               uid_eq(cred->suid, fc->user_id) &&
+               uid_eq(cred->uid,  fc->user_id) &&
+               gid_eq(cred->egid, fc->group_id) &&
+               gid_eq(cred->sgid, fc->group_id) &&
+               gid_eq(cred->gid,  fc->group_id))
+}
+
+static inline bool fuse_permissible_other(const struct fuse_conn *fc)
+{
+       if (current_in_userns(fc->user_ns))
+               return true;
+
+       if (allow_sys_admin_access && capable(CAP_SYS_ADMIN))
+               return true;
+
+       return false;
+}
+
 /*
  * Calling into a user-controlled filesystem gives the filesystem
  * daemon ptrace-like capabilities over the current process.  This
@@ -1250,24 +1272,10 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
  */
 int fuse_allow_current_process(struct fuse_conn *fc)
 {
-       const struct cred *cred;
-
-       if (allow_sys_admin_access && capable(CAP_SYS_ADMIN))
-               return 1;
-
        if (fc->allow_other)
-               return current_in_userns(fc->user_ns);
-
-       cred = current_cred();
-       if (uid_eq(cred->euid, fc->user_id) &&
-           uid_eq(cred->suid, fc->user_id) &&
-           uid_eq(cred->uid,  fc->user_id) &&
-           gid_eq(cred->egid, fc->group_id) &&
-           gid_eq(cred->sgid, fc->group_id) &&
-           gid_eq(cred->gid,  fc->group_id))
-               return 1;
+               return fuse_permissible_other(fc);

-       return 0;
+       return fuse_permissible_uidgid(fc);
 }

 static int fuse_access(struct inode *inode, int mask)


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] fuse: Rearrange fuse_allow_current_process checks
  2022-10-21  8:09 ` Christian Brauner
@ 2022-10-21 16:02   ` Andrii Nakryiko
  2022-10-22 13:30     ` Christian Brauner
  0 siblings, 1 reply; 9+ messages in thread
From: Andrii Nakryiko @ 2022-10-21 16:02 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Dave Marchevsky, linux-fsdevel, Miklos Szeredi, kernel-team,
	Andrii Nakryiko

On Fri, Oct 21, 2022 at 1:09 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Thu, Oct 20, 2022 at 01:14:09PM -0700, Dave Marchevsky wrote:
> > This is a followup to a previous commit of mine [0], which added the
> > allow_sys_admin_access && capable(CAP_SYS_ADMIN) check. This patch
> > rearranges the order of checks in fuse_allow_current_process without
> > changing functionality.
> >
> > [0] added allow_sys_admin_access && capable(CAP_SYS_ADMIN) check to the
> > beginning of the function, with the reasoning that
> > allow_sys_admin_access should be an 'escape hatch' for users with
> > CAP_SYS_ADMIN, allowing them to skip any subsequent checks.
> >
> > However, placing this new check first results in many capable() calls when
> > allow_sys_admin_access is set, where another check would've also
> > returned 1. This can be problematic when a BPF program is tracing
> > capable() calls.
> >
> > At Meta we ran into such a scenario recently. On a host where
> > allow_sys_admin_access is set but most of the FUSE access is from
> > processes which would pass other checks - i.e. they don't need
> > CAP_SYS_ADMIN 'escape hatch' - this results in an unnecessary capable()
> > call for each fs op. We also have a daemon tracing capable() with BPF and
> > doing some data collection, so tracing these extraneous capable() calls
> > has the potential to regress performance for an application doing many
> > FUSE ops.
> >
> > So rearrange the order of these checks such that CAP_SYS_ADMIN 'escape
> > hatch' is checked last. Previously, if allow_other is set on the
> > fuse_conn, uid/gid checking doesn't happen as current_in_userns result
> > is returned. It's necessary to add a 'goto' here to skip past uid/gid
> > check to maintain those semantics here.
> >
> >   [0]: commit 9ccf47b26b73 ("fuse: Add module param for CAP_SYS_ADMIN access bypassing allow_other")
> >
> > Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> > Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> > Cc: Christian Brauner <brauner@kernel.org>
> > ---
>
> The idea sounds good.
>
> > v1 -> v2: lore.kernel.org/linux-fsdevel/20221020183830.1077143-1-davemarchevsky@fb.com
> >
> >   * Add missing brackets to allow_other if statement (Andrii)
> >
> >  fs/fuse/dir.c | 14 +++++++++-----
> >  1 file changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > index 2c4b08a6ec81..2f14e90907a2 100644
> > --- a/fs/fuse/dir.c
> > +++ b/fs/fuse/dir.c
> > @@ -1254,11 +1254,11 @@ int fuse_allow_current_process(struct fuse_conn *fc)
> >  {
> >       const struct cred *cred;
> >
> > -     if (allow_sys_admin_access && capable(CAP_SYS_ADMIN))
> > -             return 1;
> > -
> > -     if (fc->allow_other)
> > -             return current_in_userns(fc->user_ns);
> > +     if (fc->allow_other) {
> > +             if (current_in_userns(fc->user_ns))
> > +                     return 1;
> > +             goto skip_cred_check;
>
> I think this is misnamed especially because capabilities are creds as
> well. Maybe we should not use a goto even if it makes the patch a bit
> bigger (_completely untested_)?:
>
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index bb97a384dc5d..3d733e0865bf 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1235,6 +1235,28 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
>         return err;
>  }
>
> +static inline bool fuse_permissible_uidgid(const struct fuse_conn *fc)
> +{
> +       cred = current_cred();
> +       return (uid_eq(cred->euid, fc->user_id) &&
> +               uid_eq(cred->suid, fc->user_id) &&
> +               uid_eq(cred->uid,  fc->user_id) &&
> +               gid_eq(cred->egid, fc->group_id) &&
> +               gid_eq(cred->sgid, fc->group_id) &&
> +               gid_eq(cred->gid,  fc->group_id))
> +}
> +
> +static inline bool fuse_permissible_other(const struct fuse_conn *fc)
> +{
> +       if (current_in_userns(fc->user_ns))
> +               return true;
> +
> +       if (allow_sys_admin_access && capable(CAP_SYS_ADMIN))
> +               return true;

This needs to be checked regardless of fc->allow_other, so the change
you are proposing is not equivalent to the original logic. It does
seem like a simple goto is a cleaner approach in this particular case.

> +
> +       return false;
> +}
> +
>  /*
>   * Calling into a user-controlled filesystem gives the filesystem
>   * daemon ptrace-like capabilities over the current process.  This
> @@ -1250,24 +1272,10 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
>   */
>  int fuse_allow_current_process(struct fuse_conn *fc)
>  {
> -       const struct cred *cred;
> -
> -       if (allow_sys_admin_access && capable(CAP_SYS_ADMIN))
> -               return 1;
> -
>         if (fc->allow_other)
> -               return current_in_userns(fc->user_ns);
> -
> -       cred = current_cred();
> -       if (uid_eq(cred->euid, fc->user_id) &&
> -           uid_eq(cred->suid, fc->user_id) &&
> -           uid_eq(cred->uid,  fc->user_id) &&
> -           gid_eq(cred->egid, fc->group_id) &&
> -           gid_eq(cred->sgid, fc->group_id) &&
> -           gid_eq(cred->gid,  fc->group_id))
> -               return 1;
> +               return fuse_permissible_other(fc);
>
> -       return 0;
> +       return fuse_permissible_uidgid(fc);
>  }
>
>  static int fuse_access(struct inode *inode, int mask)
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] fuse: Rearrange fuse_allow_current_process checks
  2022-10-20 20:14 [PATCH v2] fuse: Rearrange fuse_allow_current_process checks Dave Marchevsky
  2022-10-21  8:09 ` Christian Brauner
@ 2022-10-21 16:05 ` Andrii Nakryiko
  2022-10-22 13:22   ` Christian Brauner
  1 sibling, 1 reply; 9+ messages in thread
From: Andrii Nakryiko @ 2022-10-21 16:05 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: linux-fsdevel, Miklos Szeredi, kernel-team, Andrii Nakryiko,
	Christian Brauner

On Thu, Oct 20, 2022 at 1:14 PM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>
> This is a followup to a previous commit of mine [0], which added the
> allow_sys_admin_access && capable(CAP_SYS_ADMIN) check. This patch
> rearranges the order of checks in fuse_allow_current_process without
> changing functionality.
>
> [0] added allow_sys_admin_access && capable(CAP_SYS_ADMIN) check to the
> beginning of the function, with the reasoning that
> allow_sys_admin_access should be an 'escape hatch' for users with
> CAP_SYS_ADMIN, allowing them to skip any subsequent checks.
>
> However, placing this new check first results in many capable() calls when
> allow_sys_admin_access is set, where another check would've also
> returned 1. This can be problematic when a BPF program is tracing
> capable() calls.
>
> At Meta we ran into such a scenario recently. On a host where
> allow_sys_admin_access is set but most of the FUSE access is from
> processes which would pass other checks - i.e. they don't need
> CAP_SYS_ADMIN 'escape hatch' - this results in an unnecessary capable()
> call for each fs op. We also have a daemon tracing capable() with BPF and
> doing some data collection, so tracing these extraneous capable() calls
> has the potential to regress performance for an application doing many
> FUSE ops.
>
> So rearrange the order of these checks such that CAP_SYS_ADMIN 'escape
> hatch' is checked last. Previously, if allow_other is set on the
> fuse_conn, uid/gid checking doesn't happen as current_in_userns result
> is returned. It's necessary to add a 'goto' here to skip past uid/gid
> check to maintain those semantics here.
>
>   [0]: commit 9ccf47b26b73 ("fuse: Add module param for CAP_SYS_ADMIN access bypassing allow_other")
>
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> Cc: Christian Brauner <brauner@kernel.org>
> ---

LGTM!

Acked-by: Andrii Nakryiko <andrii@kernel.org>


But I would also be curious to hear from Miklos or others whether
skipping uid/gid check if fc->allow_other is true wa an intentional
logic, oversight, or just doesn't matter. Because doing:

if (fc->allow_other && current_in_userns(fc->user_ns))
    return 1;

/* do uid/gid check */

if (allow_sys_admin_access && capable(CAP_SYS_ADMIN))
    return 1;

Would be simpler and more linear logic, but I'm not sure if there are
subtle downsides to doing it this way.


> v1 -> v2: lore.kernel.org/linux-fsdevel/20221020183830.1077143-1-davemarchevsky@fb.com
>
>   * Add missing brackets to allow_other if statement (Andrii)
>
>  fs/fuse/dir.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 2c4b08a6ec81..2f14e90907a2 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1254,11 +1254,11 @@ int fuse_allow_current_process(struct fuse_conn *fc)
>  {
>         const struct cred *cred;
>
> -       if (allow_sys_admin_access && capable(CAP_SYS_ADMIN))
> -               return 1;
> -
> -       if (fc->allow_other)
> -               return current_in_userns(fc->user_ns);
> +       if (fc->allow_other) {
> +               if (current_in_userns(fc->user_ns))
> +                       return 1;
> +               goto skip_cred_check;
> +       }
>
>         cred = current_cred();
>         if (uid_eq(cred->euid, fc->user_id) &&
> @@ -1269,6 +1269,10 @@ int fuse_allow_current_process(struct fuse_conn *fc)
>             gid_eq(cred->gid,  fc->group_id))
>                 return 1;
>
> +skip_cred_check:
> +       if (allow_sys_admin_access && capable(CAP_SYS_ADMIN))
> +               return 1;
> +
>         return 0;
>  }
>
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] fuse: Rearrange fuse_allow_current_process checks
  2022-10-21 16:05 ` Andrii Nakryiko
@ 2022-10-22 13:22   ` Christian Brauner
  2022-10-24 13:44     ` Seth Forshee
  2022-10-24 16:52     ` Andrii Nakryiko
  0 siblings, 2 replies; 9+ messages in thread
From: Christian Brauner @ 2022-10-22 13:22 UTC (permalink / raw)
  To: Andrii Nakryiko, Seth Forshee
  Cc: Dave Marchevsky, linux-fsdevel, Miklos Szeredi, kernel-team,
	Andrii Nakryiko

On Fri, Oct 21, 2022 at 09:05:26AM -0700, Andrii Nakryiko wrote:
> On Thu, Oct 20, 2022 at 1:14 PM Dave Marchevsky <davemarchevsky@fb.com> wrote:
> >
> > This is a followup to a previous commit of mine [0], which added the
> > allow_sys_admin_access && capable(CAP_SYS_ADMIN) check. This patch
> > rearranges the order of checks in fuse_allow_current_process without
> > changing functionality.
> >
> > [0] added allow_sys_admin_access && capable(CAP_SYS_ADMIN) check to the
> > beginning of the function, with the reasoning that
> > allow_sys_admin_access should be an 'escape hatch' for users with
> > CAP_SYS_ADMIN, allowing them to skip any subsequent checks.
> >
> > However, placing this new check first results in many capable() calls when
> > allow_sys_admin_access is set, where another check would've also
> > returned 1. This can be problematic when a BPF program is tracing
> > capable() calls.
> >
> > At Meta we ran into such a scenario recently. On a host where
> > allow_sys_admin_access is set but most of the FUSE access is from
> > processes which would pass other checks - i.e. they don't need
> > CAP_SYS_ADMIN 'escape hatch' - this results in an unnecessary capable()
> > call for each fs op. We also have a daemon tracing capable() with BPF and
> > doing some data collection, so tracing these extraneous capable() calls
> > has the potential to regress performance for an application doing many
> > FUSE ops.
> >
> > So rearrange the order of these checks such that CAP_SYS_ADMIN 'escape
> > hatch' is checked last. Previously, if allow_other is set on the
> > fuse_conn, uid/gid checking doesn't happen as current_in_userns result
> > is returned. It's necessary to add a 'goto' here to skip past uid/gid
> > check to maintain those semantics here.
> >
> >   [0]: commit 9ccf47b26b73 ("fuse: Add module param for CAP_SYS_ADMIN access bypassing allow_other")
> >
> > Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> > Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> > Cc: Christian Brauner <brauner@kernel.org>
> > ---
> 
> LGTM!
> 
> Acked-by: Andrii Nakryiko <andrii@kernel.org>
> 
> 
> But I would also be curious to hear from Miklos or others whether
> skipping uid/gid check if fc->allow_other is true wa an intentional
> logic, oversight, or just doesn't matter. Because doing:

Originally, setting fc->allow_other granted access to everyone skipping
all further permission checks.

When Seth (Cced) made it possible to mount fuse in userns
fc->allow_other needed to be restricted to callers who are in the
superblock's userns or a descendant of it. The reason is that an
unprivileged user can mount a fuse filesystem with fc->allow_other
turned on. But then the user could mess with processes outside its
userns when they access the filesystem.

Without fc->allow_other it doesn't matter what userns the filesystem is
accessed from as long as the uidgid is permissible. This could e.g.,
happen if you unshare a userns but dont set{g,u}id.

In general, I see two obvious permission models. In the first model we
restrict access to the owner of the mount by uidgid but also require
callers to be within the userns hierarchy. If allow_other is specified
the requirement would be relaxed to only require the caller to be within
the userns hierarchy. That would make the permission checking
consistent.

The second permission model would only require permissible uidgid by
default and for allow other either permissible uidgid or for the caller
to be within the userns hierarchy (Which is what you're asking about
afaiu.).

I don't see any obvious reasons why this wouldn't work from a security
perspective; maybe Seth has additional insights. The problem however is
that it would be a non-trivial change for containers to lift the
restriction now. It is quite useful to be able to mount a fuse
filesystem with allow_other and not have it accessible by anything
outside your userns hierarchy. So I wouldn't like to see that changed.

If so we should probably make it yet another fuse module option or sm.
But in any case that should be a separate patch with a proper
justification why this semantic change is needed other than that it
simplifies the logic.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] fuse: Rearrange fuse_allow_current_process checks
  2022-10-21 16:02   ` Andrii Nakryiko
@ 2022-10-22 13:30     ` Christian Brauner
  2022-10-24 16:53       ` Andrii Nakryiko
  0 siblings, 1 reply; 9+ messages in thread
From: Christian Brauner @ 2022-10-22 13:30 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Dave Marchevsky, linux-fsdevel, Miklos Szeredi, kernel-team,
	Andrii Nakryiko

On Fri, Oct 21, 2022 at 09:02:30AM -0700, Andrii Nakryiko wrote:
> On Fri, Oct 21, 2022 at 1:09 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Thu, Oct 20, 2022 at 01:14:09PM -0700, Dave Marchevsky wrote:
> > > This is a followup to a previous commit of mine [0], which added the
> > > allow_sys_admin_access && capable(CAP_SYS_ADMIN) check. This patch
> > > rearranges the order of checks in fuse_allow_current_process without
> > > changing functionality.
> > >
> > > [0] added allow_sys_admin_access && capable(CAP_SYS_ADMIN) check to the
> > > beginning of the function, with the reasoning that
> > > allow_sys_admin_access should be an 'escape hatch' for users with
> > > CAP_SYS_ADMIN, allowing them to skip any subsequent checks.
> > >
> > > However, placing this new check first results in many capable() calls when
> > > allow_sys_admin_access is set, where another check would've also
> > > returned 1. This can be problematic when a BPF program is tracing
> > > capable() calls.
> > >
> > > At Meta we ran into such a scenario recently. On a host where
> > > allow_sys_admin_access is set but most of the FUSE access is from
> > > processes which would pass other checks - i.e. they don't need
> > > CAP_SYS_ADMIN 'escape hatch' - this results in an unnecessary capable()
> > > call for each fs op. We also have a daemon tracing capable() with BPF and
> > > doing some data collection, so tracing these extraneous capable() calls
> > > has the potential to regress performance for an application doing many
> > > FUSE ops.
> > >
> > > So rearrange the order of these checks such that CAP_SYS_ADMIN 'escape
> > > hatch' is checked last. Previously, if allow_other is set on the
> > > fuse_conn, uid/gid checking doesn't happen as current_in_userns result
> > > is returned. It's necessary to add a 'goto' here to skip past uid/gid
> > > check to maintain those semantics here.
> > >
> > >   [0]: commit 9ccf47b26b73 ("fuse: Add module param for CAP_SYS_ADMIN access bypassing allow_other")
> > >
> > > Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> > > Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > ---
> >
> > The idea sounds good.
> >
> > > v1 -> v2: lore.kernel.org/linux-fsdevel/20221020183830.1077143-1-davemarchevsky@fb.com
> > >
> > >   * Add missing brackets to allow_other if statement (Andrii)
> > >
> > >  fs/fuse/dir.c | 14 +++++++++-----
> > >  1 file changed, 9 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > > index 2c4b08a6ec81..2f14e90907a2 100644
> > > --- a/fs/fuse/dir.c
> > > +++ b/fs/fuse/dir.c
> > > @@ -1254,11 +1254,11 @@ int fuse_allow_current_process(struct fuse_conn *fc)
> > >  {
> > >       const struct cred *cred;
> > >
> > > -     if (allow_sys_admin_access && capable(CAP_SYS_ADMIN))
> > > -             return 1;
> > > -
> > > -     if (fc->allow_other)
> > > -             return current_in_userns(fc->user_ns);
> > > +     if (fc->allow_other) {
> > > +             if (current_in_userns(fc->user_ns))
> > > +                     return 1;
> > > +             goto skip_cred_check;
> >
> > I think this is misnamed especially because capabilities are creds as
> > well. Maybe we should not use a goto even if it makes the patch a bit
> > bigger (_completely untested_)?:
> >
> > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > index bb97a384dc5d..3d733e0865bf 100644
> > --- a/fs/fuse/dir.c
> > +++ b/fs/fuse/dir.c
> > @@ -1235,6 +1235,28 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
> >         return err;
> >  }
> >
> > +static inline bool fuse_permissible_uidgid(const struct fuse_conn *fc)
> > +{
> > +       cred = current_cred();
> > +       return (uid_eq(cred->euid, fc->user_id) &&
> > +               uid_eq(cred->suid, fc->user_id) &&
> > +               uid_eq(cred->uid,  fc->user_id) &&
> > +               gid_eq(cred->egid, fc->group_id) &&
> > +               gid_eq(cred->sgid, fc->group_id) &&
> > +               gid_eq(cred->gid,  fc->group_id))
> > +}
> > +
> > +static inline bool fuse_permissible_other(const struct fuse_conn *fc)
> > +{
> > +       if (current_in_userns(fc->user_ns))
> > +               return true;
> > +
> > +       if (allow_sys_admin_access && capable(CAP_SYS_ADMIN))
> > +               return true;
> 
> This needs to be checked regardless of fc->allow_other, so the change
> you are proposing is not equivalent to the original logic. It does

Thanks for spotting this. I missed that. I'm not opposed to gotos it
just seems a bit ugly in this case and I'd prefer sm like (again,
completely untested):

int fuse_allow_current_process(struct fuse_conn *fc)
{
	const struct cred *cred;
	int ret;

	if (fc->allow_other)
		ret = current_in_userns(fc->user_ns);
	else
		ret = fuse_permissible_uidgid(fc);
	if (!ret && allow_sys_admin_access && capable(CAP_SYS_ADMIN))
		ret = 1;

	return ret;
}

but I'm not fuzzed about it.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] fuse: Rearrange fuse_allow_current_process checks
  2022-10-22 13:22   ` Christian Brauner
@ 2022-10-24 13:44     ` Seth Forshee
  2022-10-24 16:52     ` Andrii Nakryiko
  1 sibling, 0 replies; 9+ messages in thread
From: Seth Forshee @ 2022-10-24 13:44 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andrii Nakryiko, Dave Marchevsky, linux-fsdevel, Miklos Szeredi,
	kernel-team, Andrii Nakryiko

On Sat, Oct 22, 2022 at 03:22:13PM +0200, Christian Brauner wrote:
> On Fri, Oct 21, 2022 at 09:05:26AM -0700, Andrii Nakryiko wrote:
> > On Thu, Oct 20, 2022 at 1:14 PM Dave Marchevsky <davemarchevsky@fb.com> wrote:
> > >
> > > This is a followup to a previous commit of mine [0], which added the
> > > allow_sys_admin_access && capable(CAP_SYS_ADMIN) check. This patch
> > > rearranges the order of checks in fuse_allow_current_process without
> > > changing functionality.
> > >
> > > [0] added allow_sys_admin_access && capable(CAP_SYS_ADMIN) check to the
> > > beginning of the function, with the reasoning that
> > > allow_sys_admin_access should be an 'escape hatch' for users with
> > > CAP_SYS_ADMIN, allowing them to skip any subsequent checks.
> > >
> > > However, placing this new check first results in many capable() calls when
> > > allow_sys_admin_access is set, where another check would've also
> > > returned 1. This can be problematic when a BPF program is tracing
> > > capable() calls.
> > >
> > > At Meta we ran into such a scenario recently. On a host where
> > > allow_sys_admin_access is set but most of the FUSE access is from
> > > processes which would pass other checks - i.e. they don't need
> > > CAP_SYS_ADMIN 'escape hatch' - this results in an unnecessary capable()
> > > call for each fs op. We also have a daemon tracing capable() with BPF and
> > > doing some data collection, so tracing these extraneous capable() calls
> > > has the potential to regress performance for an application doing many
> > > FUSE ops.
> > >
> > > So rearrange the order of these checks such that CAP_SYS_ADMIN 'escape
> > > hatch' is checked last. Previously, if allow_other is set on the
> > > fuse_conn, uid/gid checking doesn't happen as current_in_userns result
> > > is returned. It's necessary to add a 'goto' here to skip past uid/gid
> > > check to maintain those semantics here.
> > >
> > >   [0]: commit 9ccf47b26b73 ("fuse: Add module param for CAP_SYS_ADMIN access bypassing allow_other")
> > >
> > > Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> > > Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > ---
> > 
> > LGTM!
> > 
> > Acked-by: Andrii Nakryiko <andrii@kernel.org>
> > 
> > 
> > But I would also be curious to hear from Miklos or others whether
> > skipping uid/gid check if fc->allow_other is true wa an intentional
> > logic, oversight, or just doesn't matter. Because doing:
> 
> Originally, setting fc->allow_other granted access to everyone skipping
> all further permission checks.
> 
> When Seth (Cced) made it possible to mount fuse in userns
> fc->allow_other needed to be restricted to callers who are in the
> superblock's userns or a descendant of it. The reason is that an
> unprivileged user can mount a fuse filesystem with fc->allow_other
> turned on. But then the user could mess with processes outside its
> userns when they access the filesystem.
> 
> Without fc->allow_other it doesn't matter what userns the filesystem is
> accessed from as long as the uidgid is permissible. This could e.g.,
> happen if you unshare a userns but dont set{g,u}id.
> 
> In general, I see two obvious permission models. In the first model we
> restrict access to the owner of the mount by uidgid but also require
> callers to be within the userns hierarchy. If allow_other is specified
> the requirement would be relaxed to only require the caller to be within
> the userns hierarchy. That would make the permission checking
> consistent.
> 
> The second permission model would only require permissible uidgid by
> default and for allow other either permissible uidgid or for the caller
> to be within the userns hierarchy (Which is what you're asking about
> afaiu.).
> 
> I don't see any obvious reasons why this wouldn't work from a security
> perspective; maybe Seth has additional insights. The problem however is
> that it would be a non-trivial change for containers to lift the
> restriction now. It is quite useful to be able to mount a fuse
> filesystem with allow_other and not have it accessible by anything
> outside your userns hierarchy. So I wouldn't like to see that changed.

I think both of these are defensible models from a security perspective.
But I don't have a good idea of what container runtimes have come to
expect. Either of these would represent a change which could potentially
break assumptions of something in userspace.

Seth

> If so we should probably make it yet another fuse module option or sm.
> But in any case that should be a separate patch with a proper
> justification why this semantic change is needed other than that it
> simplifies the logic.
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] fuse: Rearrange fuse_allow_current_process checks
  2022-10-22 13:22   ` Christian Brauner
  2022-10-24 13:44     ` Seth Forshee
@ 2022-10-24 16:52     ` Andrii Nakryiko
  1 sibling, 0 replies; 9+ messages in thread
From: Andrii Nakryiko @ 2022-10-24 16:52 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Seth Forshee, Dave Marchevsky, linux-fsdevel, Miklos Szeredi,
	kernel-team, Andrii Nakryiko

On Sat, Oct 22, 2022 at 6:22 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Fri, Oct 21, 2022 at 09:05:26AM -0700, Andrii Nakryiko wrote:
> > On Thu, Oct 20, 2022 at 1:14 PM Dave Marchevsky <davemarchevsky@fb.com> wrote:
> > >
> > > This is a followup to a previous commit of mine [0], which added the
> > > allow_sys_admin_access && capable(CAP_SYS_ADMIN) check. This patch
> > > rearranges the order of checks in fuse_allow_current_process without
> > > changing functionality.
> > >
> > > [0] added allow_sys_admin_access && capable(CAP_SYS_ADMIN) check to the
> > > beginning of the function, with the reasoning that
> > > allow_sys_admin_access should be an 'escape hatch' for users with
> > > CAP_SYS_ADMIN, allowing them to skip any subsequent checks.
> > >
> > > However, placing this new check first results in many capable() calls when
> > > allow_sys_admin_access is set, where another check would've also
> > > returned 1. This can be problematic when a BPF program is tracing
> > > capable() calls.
> > >
> > > At Meta we ran into such a scenario recently. On a host where
> > > allow_sys_admin_access is set but most of the FUSE access is from
> > > processes which would pass other checks - i.e. they don't need
> > > CAP_SYS_ADMIN 'escape hatch' - this results in an unnecessary capable()
> > > call for each fs op. We also have a daemon tracing capable() with BPF and
> > > doing some data collection, so tracing these extraneous capable() calls
> > > has the potential to regress performance for an application doing many
> > > FUSE ops.
> > >
> > > So rearrange the order of these checks such that CAP_SYS_ADMIN 'escape
> > > hatch' is checked last. Previously, if allow_other is set on the
> > > fuse_conn, uid/gid checking doesn't happen as current_in_userns result
> > > is returned. It's necessary to add a 'goto' here to skip past uid/gid
> > > check to maintain those semantics here.
> > >
> > >   [0]: commit 9ccf47b26b73 ("fuse: Add module param for CAP_SYS_ADMIN access bypassing allow_other")
> > >
> > > Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> > > Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > ---
> >
> > LGTM!
> >
> > Acked-by: Andrii Nakryiko <andrii@kernel.org>
> >
> >
> > But I would also be curious to hear from Miklos or others whether
> > skipping uid/gid check if fc->allow_other is true wa an intentional
> > logic, oversight, or just doesn't matter. Because doing:
>
> Originally, setting fc->allow_other granted access to everyone skipping
> all further permission checks.
>
> When Seth (Cced) made it possible to mount fuse in userns
> fc->allow_other needed to be restricted to callers who are in the
> superblock's userns or a descendant of it. The reason is that an
> unprivileged user can mount a fuse filesystem with fc->allow_other
> turned on. But then the user could mess with processes outside its
> userns when they access the filesystem.
>
> Without fc->allow_other it doesn't matter what userns the filesystem is
> accessed from as long as the uidgid is permissible. This could e.g.,
> happen if you unshare a userns but dont set{g,u}id.
>
> In general, I see two obvious permission models. In the first model we
> restrict access to the owner of the mount by uidgid but also require
> callers to be within the userns hierarchy. If allow_other is specified
> the requirement would be relaxed to only require the caller to be within
> the userns hierarchy. That would make the permission checking
> consistent.
>
> The second permission model would only require permissible uidgid by
> default and for allow other either permissible uidgid or for the caller
> to be within the userns hierarchy (Which is what you're asking about
> afaiu.).
>

Great, thanks for explaining! It does feel the latter is a more
natural behavior, but I was just mostly curious, don't really have any
strong opinion either way.

> I don't see any obvious reasons why this wouldn't work from a security
> perspective; maybe Seth has additional insights. The problem however is
> that it would be a non-trivial change for containers to lift the
> restriction now. It is quite useful to be able to mount a fuse
> filesystem with allow_other and not have it accessible by anything
> outside your userns hierarchy. So I wouldn't like to see that changed.
>
> If so we should probably make it yet another fuse module option or sm.
> But in any case that should be a separate patch with a proper
> justification why this semantic change is needed other than that it
> simplifies the logic.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] fuse: Rearrange fuse_allow_current_process checks
  2022-10-22 13:30     ` Christian Brauner
@ 2022-10-24 16:53       ` Andrii Nakryiko
  0 siblings, 0 replies; 9+ messages in thread
From: Andrii Nakryiko @ 2022-10-24 16:53 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Dave Marchevsky, linux-fsdevel, Miklos Szeredi, kernel-team,
	Andrii Nakryiko

On Sat, Oct 22, 2022 at 6:30 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Fri, Oct 21, 2022 at 09:02:30AM -0700, Andrii Nakryiko wrote:
> > On Fri, Oct 21, 2022 at 1:09 AM Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > On Thu, Oct 20, 2022 at 01:14:09PM -0700, Dave Marchevsky wrote:
> > > > This is a followup to a previous commit of mine [0], which added the
> > > > allow_sys_admin_access && capable(CAP_SYS_ADMIN) check. This patch
> > > > rearranges the order of checks in fuse_allow_current_process without
> > > > changing functionality.
> > > >
> > > > [0] added allow_sys_admin_access && capable(CAP_SYS_ADMIN) check to the
> > > > beginning of the function, with the reasoning that
> > > > allow_sys_admin_access should be an 'escape hatch' for users with
> > > > CAP_SYS_ADMIN, allowing them to skip any subsequent checks.
> > > >
> > > > However, placing this new check first results in many capable() calls when
> > > > allow_sys_admin_access is set, where another check would've also
> > > > returned 1. This can be problematic when a BPF program is tracing
> > > > capable() calls.
> > > >
> > > > At Meta we ran into such a scenario recently. On a host where
> > > > allow_sys_admin_access is set but most of the FUSE access is from
> > > > processes which would pass other checks - i.e. they don't need
> > > > CAP_SYS_ADMIN 'escape hatch' - this results in an unnecessary capable()
> > > > call for each fs op. We also have a daemon tracing capable() with BPF and
> > > > doing some data collection, so tracing these extraneous capable() calls
> > > > has the potential to regress performance for an application doing many
> > > > FUSE ops.
> > > >
> > > > So rearrange the order of these checks such that CAP_SYS_ADMIN 'escape
> > > > hatch' is checked last. Previously, if allow_other is set on the
> > > > fuse_conn, uid/gid checking doesn't happen as current_in_userns result
> > > > is returned. It's necessary to add a 'goto' here to skip past uid/gid
> > > > check to maintain those semantics here.
> > > >
> > > >   [0]: commit 9ccf47b26b73 ("fuse: Add module param for CAP_SYS_ADMIN access bypassing allow_other")
> > > >
> > > > Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> > > > Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> > > > Cc: Christian Brauner <brauner@kernel.org>
> > > > ---
> > >
> > > The idea sounds good.
> > >
> > > > v1 -> v2: lore.kernel.org/linux-fsdevel/20221020183830.1077143-1-davemarchevsky@fb.com
> > > >
> > > >   * Add missing brackets to allow_other if statement (Andrii)
> > > >
> > > >  fs/fuse/dir.c | 14 +++++++++-----
> > > >  1 file changed, 9 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > > > index 2c4b08a6ec81..2f14e90907a2 100644
> > > > --- a/fs/fuse/dir.c
> > > > +++ b/fs/fuse/dir.c
> > > > @@ -1254,11 +1254,11 @@ int fuse_allow_current_process(struct fuse_conn *fc)
> > > >  {
> > > >       const struct cred *cred;
> > > >
> > > > -     if (allow_sys_admin_access && capable(CAP_SYS_ADMIN))
> > > > -             return 1;
> > > > -
> > > > -     if (fc->allow_other)
> > > > -             return current_in_userns(fc->user_ns);
> > > > +     if (fc->allow_other) {
> > > > +             if (current_in_userns(fc->user_ns))
> > > > +                     return 1;
> > > > +             goto skip_cred_check;
> > >
> > > I think this is misnamed especially because capabilities are creds as
> > > well. Maybe we should not use a goto even if it makes the patch a bit
> > > bigger (_completely untested_)?:
> > >
> > > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > > index bb97a384dc5d..3d733e0865bf 100644
> > > --- a/fs/fuse/dir.c
> > > +++ b/fs/fuse/dir.c
> > > @@ -1235,6 +1235,28 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
> > >         return err;
> > >  }
> > >
> > > +static inline bool fuse_permissible_uidgid(const struct fuse_conn *fc)
> > > +{
> > > +       cred = current_cred();
> > > +       return (uid_eq(cred->euid, fc->user_id) &&
> > > +               uid_eq(cred->suid, fc->user_id) &&
> > > +               uid_eq(cred->uid,  fc->user_id) &&
> > > +               gid_eq(cred->egid, fc->group_id) &&
> > > +               gid_eq(cred->sgid, fc->group_id) &&
> > > +               gid_eq(cred->gid,  fc->group_id))
> > > +}
> > > +
> > > +static inline bool fuse_permissible_other(const struct fuse_conn *fc)
> > > +{
> > > +       if (current_in_userns(fc->user_ns))
> > > +               return true;
> > > +
> > > +       if (allow_sys_admin_access && capable(CAP_SYS_ADMIN))
> > > +               return true;
> >
> > This needs to be checked regardless of fc->allow_other, so the change
> > you are proposing is not equivalent to the original logic. It does
>
> Thanks for spotting this. I missed that. I'm not opposed to gotos it
> just seems a bit ugly in this case and I'd prefer sm like (again,
> completely untested):
>
> int fuse_allow_current_process(struct fuse_conn *fc)
> {
>         const struct cred *cred;
>         int ret;
>
>         if (fc->allow_other)
>                 ret = current_in_userns(fc->user_ns);
>         else
>                 ret = fuse_permissible_uidgid(fc);
>         if (!ret && allow_sys_admin_access && capable(CAP_SYS_ADMIN))
>                 ret = 1;
>
>         return ret;
> }
>
> but I'm not fuzzed about it.

Me neither as long as it allows kernel to avoid unnecessary capable()
calls :) LGTM, thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-10-24 19:27 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-20 20:14 [PATCH v2] fuse: Rearrange fuse_allow_current_process checks Dave Marchevsky
2022-10-21  8:09 ` Christian Brauner
2022-10-21 16:02   ` Andrii Nakryiko
2022-10-22 13:30     ` Christian Brauner
2022-10-24 16:53       ` Andrii Nakryiko
2022-10-21 16:05 ` Andrii Nakryiko
2022-10-22 13:22   ` Christian Brauner
2022-10-24 13:44     ` Seth Forshee
2022-10-24 16:52     ` Andrii Nakryiko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.