* [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
@ 2021-11-11 22:11 Dave Marchevsky
  2021-11-12  2:10 ` Rik van Riel
  2021-11-12 10:13 ` Christian Brauner
  0 siblings, 2 replies; 14+ messages in thread
From: Dave Marchevsky @ 2021-11-11 22:11 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Dave Marchevsky, Miklos Szeredi, Seth Forshee, Rik van Riel, kernel-team

Since commit 73f03c2b4b52 ("fuse: Restrict allow_other to the
superblock's namespace or a descendant"), access to allow_other FUSE
filesystems has been limited to users in the mounting user namespace or
descendants. This prevents a process that is privileged in its userns -
but not its parent namespaces - from mounting a FUSE fs w/ allow_other
that is accessible to processes in parent namespaces.

While this restriction makes sense overall it breaks a legitimate
usecase for me. I have a tracing daemon which needs to peek into
process' open files in order to symbolicate - similar to 'perf'. The
daemon is a privileged process in the root userns, but is unable to peek
into FUSE filesystems mounted with allow_other by processes in child
namespaces.

This patch adds an escape hatch to the descendant userns logic
specifically for processes with CAP_SYS_ADMIN in the root userns. Such
processes can already do many dangerous things regardless of namespace,
and moreover could fork and setns into any child userns with a FUSE
mount, so it's reasonable to allow them to interact with all allow_other
FUSE filesystems.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: kernel-team@fb.com
---

Note: I was unsure whether CAP_SYS_ADMIN or CAP_SYS_PTRACE was the best
choice of capability here. Went with the former as it's checked
elsewhere in fs/fuse while CAP_SYS_PTRACE isn't.

 fs/fuse/dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 0654bfedcbb0..2524eeb0f35d 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1134,7 +1134,7 @@ int fuse_allow_current_process(struct fuse_conn *fc)
 	const struct cred *cred;
 
 	if (fc->allow_other)
-		return current_in_userns(fc->user_ns);
+		return current_in_userns(fc->user_ns) || capable(CAP_SYS_ADMIN);
 
 	cred = current_cred();
 	if (uid_eq(cred->euid, fc->user_id) &&
-- 
2.30.2
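
To make the one-line change easier to reason about, here is a userspace toy
model of the allow_other branch before and after the patch. It is only a
sketch: struct toy_userns, struct toy_task and the toy_* functions are
made-up names, the "capability" is a plain boolean, and nothing here is the
kernel's actual implementation of current_in_userns() or capable().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a user namespace: only the parent link is modelled. */
struct toy_userns {
    const struct toy_userns *parent; /* NULL for the initial (root) userns */
};

/* Toy stand-in for the calling task. */
struct toy_task {
    const struct toy_userns *user_ns;
    bool cap_sys_admin_in_init_ns;   /* models capable(CAP_SYS_ADMIN) */
};

/* True if `ns` is `ancestor` itself or one of its descendants. */
static bool toy_in_userns(const struct toy_userns *ancestor,
                          const struct toy_userns *ns)
{
    assert(ancestor != NULL);
    for (; ns != NULL; ns = ns->parent)
        if (ns == ancestor)
            return true;
    return false;
}

/* allow_other branch before the patch: same or descendant userns only. */
static bool toy_allow_before(const struct toy_userns *mount_ns,
                             const struct toy_task *t)
{
    return toy_in_userns(mount_ns, t->user_ns);
}

/* allow_other branch after the patch: adds the privileged escape hatch. */
static bool toy_allow_after(const struct toy_userns *mount_ns,
                            const struct toy_task *t)
{
    return toy_in_userns(mount_ns, t->user_ns) || t->cap_sys_admin_in_init_ns;
}
```

In this model a privileged task in the root userns is rejected by
toy_allow_before() for a mount owned by a child userns and accepted by
toy_allow_after(), while tasks inside the child userns are accepted either way.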



* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
  2021-11-11 22:11 [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount Dave Marchevsky
@ 2021-11-12  2:10 ` Rik van Riel
  2021-11-12 10:13 ` Christian Brauner
  1 sibling, 0 replies; 14+ messages in thread
From: Rik van Riel @ 2021-11-12  2:10 UTC (permalink / raw)
  To: Dave Marchevsky, linux-fsdevel; +Cc: Miklos Szeredi, Seth Forshee, kernel-team

On Thu, 2021-11-11 at 14:11 -0800, Dave Marchevsky wrote:
> 
> This patch adds an escape hatch to the descendant userns logic
> specifically for processes with CAP_SYS_ADMIN in the root userns. Such
> processes can already do many dangerous things regardless of namespace,
> and moreover could fork and setns into any child userns with a FUSE
> mount, so it's reasonable to allow them to interact with all allow_other
> FUSE filesystems.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> Cc: Miklos Szeredi <miklos@szeredi.hu>
> Cc: Seth Forshee <sforshee@digitalocean.com>
> Cc: Rik van Riel <riel@surriel.com>
> Cc: kernel-team@fb.com

This will also want a:

Fixes: 73f03c2b4b52 ("fuse: Restrict allow_other to the superblock's namespace or a descendant")
Cc: stable@kernel.org

The patch itself looks good to my untrained eye, but could
probably use some attention from somebody who really understands
the VFS :)

-- 
All Rights Reversed.



* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
  2021-11-11 22:11 [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount Dave Marchevsky
  2021-11-12  2:10 ` Rik van Riel
@ 2021-11-12 10:13 ` Christian Brauner
  2021-11-12 23:29   ` Dave Marchevsky
  1 sibling, 1 reply; 14+ messages in thread
From: Christian Brauner @ 2021-11-12 10:13 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: linux-fsdevel, Miklos Szeredi, Seth Forshee, Rik van Riel, kernel-team

On Thu, Nov 11, 2021 at 02:11:42PM -0800, Dave Marchevsky wrote:
> Since commit 73f03c2b4b52 ("fuse: Restrict allow_other to the
> superblock's namespace or a descendant"), access to allow_other FUSE
> filesystems has been limited to users in the mounting user namespace or
> descendants. This prevents a process that is privileged in its userns -
> but not its parent namespaces - from mounting a FUSE fs w/ allow_other
> that is accessible to processes in parent namespaces.
> 
> While this restriction makes sense overall it breaks a legitimate
> usecase for me. I have a tracing daemon which needs to peek into
> process' open files in order to symbolicate - similar to 'perf'. The
> daemon is a privileged process in the root userns, but is unable to peek
> into FUSE filesystems mounted with allow_other by processes in child
> namespaces.
> 
> This patch adds an escape hatch to the descendant userns logic
> specifically for processes with CAP_SYS_ADMIN in the root userns. Such
> processes can already do many dangerous things regardless of namespace,
> and moreover could fork and setns into any child userns with a FUSE
> mount, so it's reasonable to allow them to interact with all allow_other
> FUSE filesystems.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> Cc: Miklos Szeredi <miklos@szeredi.hu>
> Cc: Seth Forshee <sforshee@digitalocean.com>
> Cc: Rik van Riel <riel@surriel.com>
> Cc: kernel-team@fb.com
> ---

If your tracing daemon runs in init_user_ns with CAP_SYS_ADMIN why can't
it simply use a helper process/thread to
setns(userns_fd/pidfd, CLONE_NEWUSER)
to the target userns? This way we don't need to special-case
init_user_ns at all.
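
A minimal sketch of what such a helper could look like in userspace, under
stated assumptions: read_via_userns_helper() is a hypothetical name, the
target userns is found through /proc/<pid>/ns/user, and the caller has
CAP_SYS_ADMIN in the target userns. It forks so that the setns() call happens
in a single-threaded process, then ships the file contents back over a pipe.
Depending on where the FUSE mount lives, a real tool may also need to join the
target's mount namespace; that is not shown here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Read up to `len` bytes of `path` from a forked helper that first joins the
 * user namespace of `target_pid`. Returns bytes read, or -1 on any failure.
 */
static ssize_t read_via_userns_helper(pid_t target_pid, const char *path,
                                      char *buf, size_t len)
{
    char ns_path[64];
    int pipefd[2], ns_fd, status;
    pid_t child;
    ssize_t total = 0, n = 0;

    assert(path != NULL && buf != NULL);

    snprintf(ns_path, sizeof(ns_path), "/proc/%ld/ns/user", (long)target_pid);
    ns_fd = open(ns_path, O_RDONLY | O_CLOEXEC);
    if (ns_fd < 0)
        return -1;
    if (pipe(pipefd) < 0) {
        close(ns_fd);
        return -1;
    }

    child = fork();
    if (child < 0) {
        close(ns_fd);
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    if (child == 0) {
        /* Helper: join the target userns, then read on the parent's behalf. */
        char tmp[4096];
        size_t left = len;
        int fd;

        close(pipefd[0]);
        if (setns(ns_fd, CLONE_NEWUSER) < 0)
            _exit(1);
        fd = open(path, O_RDONLY);
        if (fd < 0)
            _exit(2);
        while (left > 0 &&
               (n = read(fd, tmp, left < sizeof(tmp) ? left : sizeof(tmp))) > 0) {
            if (write(pipefd[1], tmp, (size_t)n) != n)
                _exit(3);
            left -= (size_t)n;
        }
        _exit(n < 0 ? 4 : 0);
    }

    /* Parent: drain the pipe, then reap the helper and check its status. */
    close(ns_fd);
    close(pipefd[1]);
    while ((size_t)total < len &&
           (n = read(pipefd[0], buf + total, len - (size_t)total)) > 0)
        total += n;
    close(pipefd[0]);
    if (waitpid(child, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return total;
}
```

The fork is what makes this an explicit opt-in: only the helper changes
namespace, and the parent keeps its original credentials.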

> 
> Note: I was unsure whether CAP_SYS_ADMIN or CAP_SYS_PTRACE was the best
> choice of capability here. Went with the former as it's checked
> elsewhere in fs/fuse while CAP_SYS_PTRACE isn't.
> 
>  fs/fuse/dir.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 0654bfedcbb0..2524eeb0f35d 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1134,7 +1134,7 @@ int fuse_allow_current_process(struct fuse_conn *fc)
>  	const struct cred *cred;
>  
>  	if (fc->allow_other)
> -		return current_in_userns(fc->user_ns);
> +		return current_in_userns(fc->user_ns) || capable(CAP_SYS_ADMIN);
>  
>  	cred = current_cred();
>  	if (uid_eq(cred->euid, fc->user_id) &&
> -- 
> 2.30.2
> 
> 


* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
  2021-11-12 10:13 ` Christian Brauner
@ 2021-11-12 23:29   ` Dave Marchevsky
  2021-11-15 15:28     ` Miklos Szeredi
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Marchevsky @ 2021-11-12 23:29 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Miklos Szeredi, Seth Forshee, Rik van Riel, kernel-team

On 11/12/21 5:13 AM, Christian Brauner wrote:   
> On Thu, Nov 11, 2021 at 02:11:42PM -0800, Dave Marchevsky wrote:
>> Since commit 73f03c2b4b52 ("fuse: Restrict allow_other to the
>> superblock's namespace or a descendant"), access to allow_other FUSE
>> filesystems has been limited to users in the mounting user namespace or
>> descendants. This prevents a process that is privileged in its userns -
>> but not its parent namespaces - from mounting a FUSE fs w/ allow_other
>> that is accessible to processes in parent namespaces.
>>
>> While this restriction makes sense overall it breaks a legitimate
>> usecase for me. I have a tracing daemon which needs to peek into
>> process' open files in order to symbolicate - similar to 'perf'. The
>> daemon is a privileged process in the root userns, but is unable to peek
>> into FUSE filesystems mounted with allow_other by processes in child
>> namespaces.
>>
>> This patch adds an escape hatch to the descendant userns logic
>> specifically for processes with CAP_SYS_ADMIN in the root userns. Such
>> processes can already do many dangerous things regardless of namespace,
>> and moreover could fork and setns into any child userns with a FUSE
>> mount, so it's reasonable to allow them to interact with all allow_other
>> FUSE filesystems.
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>> Cc: Miklos Szeredi <miklos@szeredi.hu>
>> Cc: Seth Forshee <sforshee@digitalocean.com>
>> Cc: Rik van Riel <riel@surriel.com>
>> Cc: kernel-team@fb.com
>> ---
> 
> If your tracing daemon runs in init_user_ns with CAP_SYS_ADMIN why can't
> it simply use a helper process/thread to
> setns(userns_fd/pidfd, CLONE_NEWUSER)
> to the target userns? This way we don't need to special-case
> init_user_ns at all.

A helper process plus setns() could work for my use case. But the fact that
there's no way to say "I know what I am about to do is potentially stupid and
dangerous, but I am root so let me do it" without spawning a helper process
feels like it will result in special-case userspace workarounds for anyone
doing symbolication of backtraces.

e.g. perf will have to add some logic: "did I fail
to grab this exe that some process had mapped? Is it in a FUSE mounted by some
descendant userns? let's fork a helper process..." Not the end of the world,
but unnecessary complexity nonetheless.

That being said, I agree that this patch's special-casing of init_user_ns is
hacky. What do you think about a more explicit and general "let me do this
stupid and dangerous thing" mechanism? Perhaps a new struct fuse_conn field
containing a set of exception user namespaces, populated via an ioctl or
similar.

> 
>>
>> Note: I was unsure whether CAP_SYS_ADMIN or CAP_SYS_PTRACE was the best
>> choice of capability here. Went with the former as it's checked
>> elsewhere in fs/fuse while CAP_SYS_PTRACE isn't.
>>
>>  fs/fuse/dir.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
>> index 0654bfedcbb0..2524eeb0f35d 100644
>> --- a/fs/fuse/dir.c
>> +++ b/fs/fuse/dir.c
>> @@ -1134,7 +1134,7 @@ int fuse_allow_current_process(struct fuse_conn *fc)
>>  	const struct cred *cred;
>>  
>>  	if (fc->allow_other)
>> -		return current_in_userns(fc->user_ns);
>> +		return current_in_userns(fc->user_ns) || capable(CAP_SYS_ADMIN);
>>  
>>  	cred = current_cred();
>>  	if (uid_eq(cred->euid, fc->user_id) &&
>> -- 
>> 2.30.2
>>
>>



* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
  2021-11-12 23:29   ` Dave Marchevsky
@ 2021-11-15 15:28     ` Miklos Szeredi
  2022-05-17 16:50       ` Dave Marchevsky
  0 siblings, 1 reply; 14+ messages in thread
From: Miklos Szeredi @ 2021-11-15 15:28 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: Christian Brauner, linux-fsdevel, Seth Forshee, Rik van Riel,
	kernel-team

On Sat, 13 Nov 2021 at 00:29, Dave Marchevsky <davemarchevsky@fb.com> wrote:

> > If your tracing daemon runs in init_user_ns with CAP_SYS_ADMIN why can't
> > it simply use a helper process/thread to
> > setns(userns_fd/pidfd, CLONE_NEWUSER)
> > to the target userns? This way we don't need to special-case
> > init_user_ns at all.
>
> helper process + setns could work for my usecase. But the fact that there's no
> way to say "I know what I am about to do is potentially stupid and dangerous,
> but I am root so let me do it", without spawning a helper process in this case,
> feels like it'll result in special-case userspace workarounds for anyone doing
> symbolication of backtraces.

Note: any mechanism that grants filesystem access to users that have
higher privileges than the daemon serving the filesystem will
potentially open DoS attacks against the higher privilege task.  This
would be somewhat mitigated if the filesystem is only mounted in a
private mount namespace, but AFAICS that's not guaranteed.

The above obviously applies to your original patch but it also applies
to any other mechanism where the high privilege user doesn't
explicitly acknowledge and accept the consequences.  IOW, granting the
exception has to be initiated by the high-privilege user.

Thanks,
Miklos
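
One way a privileged reader could limit its exposure to a stalling filesystem
daemon is to do the access in a disposable child and give up after a timeout.
The sketch below assumes that approach; read_with_timeout() is a hypothetical
name, the timeout applies to each wait on the pipe rather than to the whole
read, and it relies on a stuck FUSE request normally being killable with
SIGKILL, which should be verified for the kernel in use.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Read up to `len` bytes of `path` in a child process. Each wait for data is
 * limited to `timeout_ms`. Returns bytes read, or -1 on error or timeout.
 */
static ssize_t read_with_timeout(const char *path, char *buf, size_t len,
                                 int timeout_ms)
{
    int pipefd[2], status, failed = 0;
    pid_t child;
    ssize_t total = 0, n = 0;

    assert(path != NULL && buf != NULL);
    if (pipe(pipefd) < 0)
        return -1;
    child = fork();
    if (child < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    if (child == 0) {
        /* Child: the only process that touches the untrusted filesystem. */
        char tmp[4096];
        size_t left = len;
        int fd;

        close(pipefd[0]);
        fd = open(path, O_RDONLY);
        if (fd < 0)
            _exit(1);
        while (left > 0 &&
               (n = read(fd, tmp, left < sizeof(tmp) ? left : sizeof(tmp))) > 0) {
            if (write(pipefd[1], tmp, (size_t)n) != n)
                _exit(2);
            left -= (size_t)n;
        }
        _exit(n < 0 ? 3 : 0);
    }

    close(pipefd[1]);
    while ((size_t)total < len) {
        struct pollfd pfd = { .fd = pipefd[0], .events = POLLIN };

        if (poll(&pfd, 1, timeout_ms) <= 0) {  /* timeout or poll error */
            failed = 1;
            break;
        }
        n = read(pipefd[0], buf + total, len - (size_t)total);
        if (n < 0)
            failed = 1;
        if (n <= 0)                            /* error, or EOF from the child */
            break;
        total += n;
    }
    close(pipefd[0]);
    if (failed)
        kill(child, SIGKILL);                  /* abandon the stuck child */
    if (waitpid(child, &status, 0) < 0)
        return -1;
    if (failed || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return total;
}
```

This only contains the damage; it does not remove the underlying problem that
the serving daemon controls how long each request takes.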



>
> e.g. perf will have to add some logic: "did I fail
> to grab this exe that some process had mapped? Is it in a FUSE mounted by some
> descendant userns? let's fork a helper process..." Not the end of the world,
> but unnecessary complexity nonetheless.
>
> That being said, I agree that this patch's special-casing of init_user_ns is
> hacky. What do you think about a more explicit and general "let me do this
> stupid and dangerous thing" mechanism - perhaps a new struct fuse_conn field
> containing a set of exception userns', populated with ioctl or similar.



>
> >
> >>
> >> Note: I was unsure whether CAP_SYS_ADMIN or CAP_SYS_PTRACE was the best
> >> choice of capability here. Went with the former as it's checked
> >> elsewhere in fs/fuse while CAP_SYS_PTRACE isn't.
> >>
> >>  fs/fuse/dir.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> >> index 0654bfedcbb0..2524eeb0f35d 100644
> >> --- a/fs/fuse/dir.c
> >> +++ b/fs/fuse/dir.c
> >> @@ -1134,7 +1134,7 @@ int fuse_allow_current_process(struct fuse_conn *fc)
> >>      const struct cred *cred;
> >>
> >>      if (fc->allow_other)
> >> -            return current_in_userns(fc->user_ns);
> >> +            return current_in_userns(fc->user_ns) || capable(CAP_SYS_ADMIN);
> >>
> >>      cred = current_cred();
> >>      if (uid_eq(cred->euid, fc->user_id) &&
> >> --
> >> 2.30.2
> >>
> >>
>


* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
  2021-11-15 15:28     ` Miklos Szeredi
@ 2022-05-17 16:50       ` Dave Marchevsky
  2022-05-18 11:22         ` Christian Brauner
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Marchevsky @ 2022-05-17 16:50 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Christian Brauner, linux-fsdevel, Seth Forshee, Rik van Riel,
	kernel-team, Andrii Nakryiko

On 11/15/21 10:28 AM, Miklos Szeredi wrote:   
> On Sat, 13 Nov 2021 at 00:29, Dave Marchevsky <davemarchevsky@fb.com> wrote:
> 
>>> If your tracing daemon runs in init_user_ns with CAP_SYS_ADMIN why can't
>>> it simply use a helper process/thread to
>>> setns(userns_fd/pidfd, CLONE_NEWUSER)
>>> to the target userns? This way we don't need to special-case
>>> init_user_ns at all.
>>
>> helper process + setns could work for my usecase. But the fact that there's no
>> way to say "I know what I am about to do is potentially stupid and dangerous,
>> but I am root so let me do it", without spawning a helper process in this case,
>> feels like it'll result in special-case userspace workarounds for anyone doing
>> symbolication of backtraces.
> 
> Note: any mechanism that grants filesystem access to users that have
> higher privileges than the daemon serving the filesystem will
> potentially open DoS attacks against the higher privilege task.  This
> would be somewhat mitigated if the filesystem is only mounted in a
> private mount namespace, but AFAICS that's not guaranteed.
> 
> The above obviously applies to your original patch but it also applies
> to any other mechanism where the high privilege user doesn't
> explicitly acknowledge and accept the consequences.  IOW, granting the
> exception has to be initiated by the high-privilege user.
> 
> Thanks,
> Miklos
> 

Sorry to resurrect this old thread. My proposed alternate approach of a
"special ioctl to grant exception to descendant userns check" proved
unnecessarily complex: ioctls also go through the fuse_allow_current_process
check, so a special carve-out would be necessary in both the ioctl and
fuse_permission checks in order to make it possible for a
non-descendant-userns user to opt in to the exception.

How about a version of this patch with a CAP_DAC_READ_SEARCH check? This way
there's more of a clear opt-in vs CAP_SYS_ADMIN.

FWIW we've been running the CAP_SYS_ADMIN version of this patch internally and
can confirm it fixes tracing tools' ability to symbolicate binaries in FUSE.

> 
>>
>> e.g. perf will have to add some logic: "did I fail
>> to grab this exe that some process had mapped? Is it in a FUSE mounted by some
>> descendant userns? let's fork a helper process..." Not the end of the world,
>> but unnecessary complexity nonetheless.
>>
>> That being said, I agree that this patch's special-casing of init_user_ns is
>> hacky. What do you think about a more explicit and general "let me do this
>> stupid and dangerous thing" mechanism - perhaps a new struct fuse_conn field
>> containing a set of exception userns', populated with ioctl or similar.
> 
> 
> 
>>
>>>
>>>>
>>>> Note: I was unsure whether CAP_SYS_ADMIN or CAP_SYS_PTRACE was the best
>>>> choice of capability here. Went with the former as it's checked
>>>> elsewhere in fs/fuse while CAP_SYS_PTRACE isn't.
>>>>
>>>>  fs/fuse/dir.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
>>>> index 0654bfedcbb0..2524eeb0f35d 100644
>>>> --- a/fs/fuse/dir.c
>>>> +++ b/fs/fuse/dir.c
>>>> @@ -1134,7 +1134,7 @@ int fuse_allow_current_process(struct fuse_conn *fc)
>>>>      const struct cred *cred;
>>>>
>>>>      if (fc->allow_other)
>>>> -            return current_in_userns(fc->user_ns);
>>>> +            return current_in_userns(fc->user_ns) || capable(CAP_SYS_ADMIN);
>>>>
>>>>      cred = current_cred();
>>>>      if (uid_eq(cred->euid, fc->user_id) &&
>>>> --
>>>> 2.30.2
>>>>
>>>>
>>


* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
  2022-05-17 16:50       ` Dave Marchevsky
@ 2022-05-18 11:22         ` Christian Brauner
  2022-05-18 11:26           ` Miklos Szeredi
  0 siblings, 1 reply; 14+ messages in thread
From: Christian Brauner @ 2022-05-18 11:22 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: Miklos Szeredi, Christian Brauner, linux-fsdevel, Seth Forshee,
	Rik van Riel, kernel-team, Andrii Nakryiko

On Tue, May 17, 2022 at 12:50:32PM -0400, Dave Marchevsky wrote:
> On 11/15/21 10:28 AM, Miklos Szeredi wrote:   
> > On Sat, 13 Nov 2021 at 00:29, Dave Marchevsky <davemarchevsky@fb.com> wrote:
> > 
> >>> If your tracing daemon runs in init_user_ns with CAP_SYS_ADMIN why can't
> >>> it simply use a helper process/thread to
> >>> setns(userns_fd/pidfd, CLONE_NEWUSER)
> >>> to the target userns? This way we don't need to special-case
> >>> init_user_ns at all.
> >>
> >> helper process + setns could work for my usecase. But the fact that there's no
> >> way to say "I know what I am about to do is potentially stupid and dangerous,
> >> but I am root so let me do it", without spawning a helper process in this case,
> >> feels like it'll result in special-case userspace workarounds for anyone doing
> >> symbolication of backtraces.
> > 
> > Note: any mechanism that grants filesystem access to users that have
> > higher privileges than the daemon serving the filesystem will
> > potentially open DoS attacks against the higher privilege task.  This
> > would be somewhat mitigated if the filesystem is only mounted in a
> > private mount namespace, but AFAICS that's not guaranteed.
> > 
> > The above obviously applies to your original patch but it also applies
> > to any other mechanism where the high privilege user doesn't
> > explicitly acknowledge and accept the consequences.  IOW, granting the
> > exception has to be initiated by the high-privilege user.
> > 
> > Thanks,
> > Miklos
> > 
> 
> Sorry to resurrect this old thread. My proposed alternate approach of "special
> ioctl to grant exception to descendant userns check" proved unnecessarily
> complex: ioctls also go through fuse_allow_current_process check, so a special
> carve-out would be necessary in both ioctl and fuse_permission check in
> order to make it possible for non-descendant-userns user to opt in to exception.
> 
> How about a version of this patch with CAP_DAC_READ_SEARCH check? This way 
> there's more of a clear opt-in vs CAP_SYS_ADMIN.

I still think this isn't needed given that especially for the use-cases
listed here you have a workable userspace solution to this problem.

If the CAP_SYS_ADMIN/CAP_DAC_READ_SEARCH check were really just about
giving a privileged task access then it'd be fine imho. But given that
this means the privileged task is open to a DoS attack it seems we're
building a trap into the fuse code.

The setns() model has the advantage that this forces the task to assume
the correct privileges and also serves as an explicit opt-in. Just my 2
cents here.

> 
> FWIW we've been running CAP_SYS_ADMIN version of this patch internally and
> can confirm it fixes tracing tools' ability to symbolicate binaries in FUSE.
> 
> > 
> >>
> >> e.g. perf will have to add some logic: "did I fail
> >> to grab this exe that some process had mapped? Is it in a FUSE mounted by some
> >> descendant userns? let's fork a helper process..." Not the end of the world,
> >> but unnecessary complexity nonetheless.
> >>
> >> That being said, I agree that this patch's special-casing of init_user_ns is
> >> hacky. What do you think about a more explicit and general "let me do this
> >> stupid and dangerous thing" mechanism - perhaps a new struct fuse_conn field
> >> containing a set of exception userns', populated with ioctl or similar.
> > 
> > 
> > 
> >>
> >>>
> >>>>
> >>>> Note: I was unsure whether CAP_SYS_ADMIN or CAP_SYS_PTRACE was the best
> >>>> choice of capability here. Went with the former as it's checked
> >>>> elsewhere in fs/fuse while CAP_SYS_PTRACE isn't.
> >>>>
> >>>>  fs/fuse/dir.c | 2 +-
> >>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> >>>> index 0654bfedcbb0..2524eeb0f35d 100644
> >>>> --- a/fs/fuse/dir.c
> >>>> +++ b/fs/fuse/dir.c
> >>>> @@ -1134,7 +1134,7 @@ int fuse_allow_current_process(struct fuse_conn *fc)
> >>>>      const struct cred *cred;
> >>>>
> >>>>      if (fc->allow_other)
> >>>> -            return current_in_userns(fc->user_ns);
> >>>> +            return current_in_userns(fc->user_ns) || capable(CAP_SYS_ADMIN);
> >>>>
> >>>>      cred = current_cred();
> >>>>      if (uid_eq(cred->euid, fc->user_id) &&
> >>>> --
> >>>> 2.30.2
> >>>>
> >>>>
> >>


* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
  2022-05-18 11:22         ` Christian Brauner
@ 2022-05-18 11:26           ` Miklos Szeredi
  2022-05-19  4:56             ` Andrii Nakryiko
  0 siblings, 1 reply; 14+ messages in thread
From: Miklos Szeredi @ 2022-05-18 11:26 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Dave Marchevsky, Christian Brauner, linux-fsdevel, Seth Forshee,
	Rik van Riel, kernel-team, Andrii Nakryiko

On Wed, 18 May 2022 at 13:22, Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, May 17, 2022 at 12:50:32PM -0400, Dave Marchevsky wrote:

> > Sorry to resurrect this old thread. My proposed alternate approach of "special
> > ioctl to grant exception to descendant userns check" proved unnecessarily
> > complex: ioctls also go through fuse_allow_current_process check, so a special
> > carve-out would be necessary in both ioctl and fuse_permission check in
> > order to make it possible for non-descendant-userns user to opt in to exception.
> >
> > How about a version of this patch with CAP_DAC_READ_SEARCH check? This way
> > there's more of a clear opt-in vs CAP_SYS_ADMIN.
>
> I still think this isn't needed given that especially for the use-cases
> listed here you have a workable userspace solution to this problem.
>
> If the CAP_SYS_ADMIN/CAP_DAC_READ_SEARCH check were really just about
> giving a privileged task access then it'd be fine imho. But given that
> this means the privileged task is open to a DoS attack it seems we're
> building a trap into the fuse code.
>
> The setns() model has the advantage that this forces the task to assume
> the correct privileges and also serves as an explicit opt-in. Just my 2
> cents here.

Fully agreed.  Using CAP_DAC_READ_SEARCH instead of CAP_SYS_ADMIN
doesn't make this any better, since root has all caps including
CAP_DAC_READ_SEARCH.

Thanks,
Miklos
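
As a small illustration of that point, the effective capability set is a
bitmask, and a full-root mask has both bits set. The sketch below parses a
"CapEff:" line in the format used by /proc/<pid>/status and tests individual
bits; parse_cap_eff() and mask_has_cap() are hypothetical helper names, and the
sample masks in the note after the code are illustrative values, not output
captured from a real system.

```c
#include <assert.h>
#include <linux/capability.h>  /* CAP_SYS_ADMIN, CAP_DAC_READ_SEARCH */
#include <stdint.h>
#include <stdio.h>

/* Parse a "CapEff:\t<hex>" line into `mask`. Returns 0 on success, -1 otherwise. */
static int parse_cap_eff(const char *line, uint64_t *mask)
{
    unsigned long long v;

    assert(line != NULL && mask != NULL);
    if (sscanf(line, "CapEff: %llx", &v) != 1)
        return -1;
    *mask = v;
    return 0;
}

/* Returns 1 if capability number `cap` is set in `mask`, 0 otherwise. */
static int mask_has_cap(uint64_t mask, int cap)
{
    return (int)((mask >> cap) & 1);
}
```

With a mask of 000001ffffffffff both CAP_SYS_ADMIN and CAP_DAC_READ_SEARCH test
as set, whereas a mask of 0000000000000004 carries only CAP_DAC_READ_SEARCH.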


* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
  2022-05-18 11:26           ` Miklos Szeredi
@ 2022-05-19  4:56             ` Andrii Nakryiko
  2022-05-19  8:59               ` Christian Brauner
  0 siblings, 1 reply; 14+ messages in thread
From: Andrii Nakryiko @ 2022-05-19  4:56 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Christian Brauner, Dave Marchevsky, Christian Brauner,
	linux-fsdevel, Seth Forshee, Rik van Riel, kernel-team,
	Andrii Nakryiko, clm

On Wed, May 18, 2022 at 4:26 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Wed, 18 May 2022 at 13:22, Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Tue, May 17, 2022 at 12:50:32PM -0400, Dave Marchevsky wrote:
>
> > > Sorry to resurrect this old thread. My proposed alternate approach of "special
> > > ioctl to grant exception to descendant userns check" proved unnecessarily
> > > complex: ioctls also go through fuse_allow_current_process check, so a special
> > > carve-out would be necessary in both ioctl and fuse_permission check in
> > > order to make it possible for non-descendant-userns user to opt in to exception.
> > >
> > > How about a version of this patch with CAP_DAC_READ_SEARCH check? This way
> > > there's more of a clear opt-in vs CAP_SYS_ADMIN.
> >
> > I still think this isn't needed given that especially for the use-cases
> > listed here you have a workable userspace solution to this problem.

Unfortunately, such a userspace solution isn't that great in practice.
It is very cumbersome to implement and integrate into existing
profiling solutions, and it causes undesired inefficiencies when
processing (typically for stack trace symbolization) lots of profiled
processes.

> >
> > If the CAP_SYS_ADMIN/CAP_DAC_READ_SEARCH check were really just about
> > giving a privileged task access then it'd be fine imho. But given that
> > this means the privileged task is open to a DoS attack it seems we're
> > building a trap into the fuse code.

Running under root presumably means that the application knows what
it's doing (and it can do a lot of dangerous and harmful things
outside of FUSE already), so why should there be any more opt-in for
it to access file contents? CAP_SYS_ADMIN can do pretty much anything
in the system, so it seems a bit asymmetric to have extra FUSE-specific
restrictions for it.

> >
> > The setns() model has the advantage that this forces the task to assume
> > the correct privileges and also serves as an explicit opt-in. Just my 2
> > cents here.
>
> Fully agreed.  Using CAP_DAC_READ_SEARCH instead of CAP_SYS_ADMIN
> doesn't make this any better, since root has all caps including
> CAP_DAC_READ_SEARCH.
>
> Thanks,
> Miklos


* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
  2022-05-19  4:56             ` Andrii Nakryiko
@ 2022-05-19  8:59               ` Christian Brauner
  2022-05-24  4:35                 ` Andrii Nakryiko
  0 siblings, 1 reply; 14+ messages in thread
From: Christian Brauner @ 2022-05-19  8:59 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Miklos Szeredi, Dave Marchevsky, linux-fsdevel, Seth Forshee,
	Rik van Riel, kernel-team, Andrii Nakryiko, clm

On Wed, May 18, 2022 at 09:56:26PM -0700, Andrii Nakryiko wrote:
> On Wed, May 18, 2022 at 4:26 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Wed, 18 May 2022 at 13:22, Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > On Tue, May 17, 2022 at 12:50:32PM -0400, Dave Marchevsky wrote:
> >
> > > > Sorry to resurrect this old thread. My proposed alternate approach of "special
> > > > ioctl to grant exception to descendant userns check" proved unnecessarily
> > > > complex: ioctls also go through fuse_allow_current_process check, so a special
> > > > carve-out would be necessary in both ioctl and fuse_permission check in
> > > > order to make it possible for non-descendant-userns user to opt in to exception.
> > > >
> > > > How about a version of this patch with CAP_DAC_READ_SEARCH check? This way
> > > > there's more of a clear opt-in vs CAP_SYS_ADMIN.
> > >
> > > I still think this isn't needed given that especially for the use-cases
> > > listed here you have a workable userspace solution to this problem.
> 
> Unfortunately such userspace solution isn't that great in practice.
> It's both very cumbersome to implement and integrate into existing
> profiling solutions and causes undesired inefficiencies when
> processing (typically for stack trace symbolization) lots of profiled
> processes.
> 
> > >
> > > If the CAP_SYS_ADMIN/CAP_DAC_READ_SEARCH check were really just about
> > > giving a privileged task access then it'd be fine imho. But given that
> > > this means the privileged task is open to a DoS attack it seems we're
> > > building a trap into the fuse code.
> 
> Running under root presumably means that the application knows what
> it's doing (and it can do a lot of dangerous and harmful things
> outside of FUSE already), so why should there be any more opt in for
> it to access file contents? CAP_SYS_ADMIN can do pretty much anything
> in the system, it seems a bit asymmetric to have extra FUSE-specific
> restrictions for it.

Processes trying to access a fuse filesystem that is not in the same
userns or a descendant userns are open to DoS attacks. This specifically
includes processes capable in the initial userns.

If it suddenly becomes possible that an initial userns capable process
can access fuse filesystems in any userns, then any such process
accessing a fuse filesystem unintentionally will be susceptible to DoS
attacks.

Iow, the problem isn't that an initial userns capable process is doing
something harmful and we're overly careful trying to prevent this and
thereby going against standard CAP_SYS_ADMIN assumptions; it's that an
initial userns capable process can unintentionally have something
harmful done to it simply by accessing a fuse filesystem.

This is even more concerning since right now this isn't possible, so this
patch is removing a protection/security mechanism. The performance argument
isn't enough to justify this imho.


* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
  2022-05-19  8:59               ` Christian Brauner
@ 2022-05-24  4:35                 ` Andrii Nakryiko
  2022-05-24  7:07                   ` Miklos Szeredi
  0 siblings, 1 reply; 14+ messages in thread
From: Andrii Nakryiko @ 2022-05-24  4:35 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Miklos Szeredi, Dave Marchevsky, linux-fsdevel, Seth Forshee,
	Rik van Riel, kernel-team, Andrii Nakryiko, clm,
	Arnaldo Carvalho de Melo

On Thu, May 19, 2022 at 1:59 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Wed, May 18, 2022 at 09:56:26PM -0700, Andrii Nakryiko wrote:
> > On Wed, May 18, 2022 at 4:26 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >
> > > On Wed, 18 May 2022 at 13:22, Christian Brauner <brauner@kernel.org> wrote:
> > > >
> > > > On Tue, May 17, 2022 at 12:50:32PM -0400, Dave Marchevsky wrote:
> > >
> > > > > > Sorry to resurrect this old thread. My proposed alternate approach of "special
> > > > > > ioctl to grant exception to descendant userns check" proved unnecessarily
> > > > > > complex: ioctls also go through fuse_allow_current_process check, so a special
> > > > > > carve-out would be necessary in both the ioctl and fuse_permission check in
> > > > > > order to make it possible for non-descendant-userns user to opt in to exception.
> > > > >
> > > > > How about a version of this patch with CAP_DAC_READ_SEARCH check? This way
> > > > > there's more of a clear opt-in vs CAP_SYS_ADMIN.
> > > >
> > > > I still think this isn't needed given that especially for the use-cases
> > > > listed here you have a workable userspace solution to this problem.
> >
> > Unfortunately such userspace solution isn't that great in practice.
> > It's both very cumbersome to implement and integrate into existing
> > profiling solutions and causes undesired inefficiencies when
> > processing (typically for stack trace symbolization) lots of profiled
> > processes.
> >
> > > >
> > > > If the CAP_SYS_ADMIN/CAP_DAC_READ_SEARCH check were really just about
> > > > giving a privileged task access then it'd be fine imho. But given that
> > > > this means the privileged task is open to a DoS attack it seems we're
> > > > building a trap into the fuse code.
> >
> > Running under root presumably means that the application knows what
> > it's doing (and it can do a lot of dangerous and harmful things
> > outside of FUSE already), so why should there be any more opt in for
> > it to access file contents? CAP_SYS_ADMIN can do pretty much anything
> > in the system, it seems a bit asymmetric to have extra FUSE-specific
> > restrictions for it.
>
> Processes trying to access a fuse filesystem that is not in the same
> userns or a descendant userns are open to DoS attacks. This specifically
> includes processes capable in the initial userns.

Sure, but by DoS attack here you mean that a capable (I'll just say
"root" for simplicity) process might get stuck. While not great, it's
not as horrible as crashing the kernel or something along those lines.
So let's keep this perspective in mind, because here we are talking
about disabling very useful functionality (it's not a hypothetical
problem, it's a real production problem that users are struggling with
right now) while trying to prevent a root process (which has to be
careful anyway, as it's a root process after all) from shooting itself
in the foot.

>
> If it suddenly becomes possible that an initial userns capable process
> can access fuse filesystems in any userns then any such process
> accessing a fuse filesystem unintentionally will be susceptible to DoS
> attacks.
>
> Iow, the problem isn't that an initial userns capable process is doing
> something harmful and we're overly careful trying to prevent this and
> thereby going against standard CAP_SYS_ADMIN assumptions; it's that an
> initial userns capable process can unintentionally have something
> harmful done to it simply by accessing a fuse filesystem.
>
> This is even more concerning since rn this isn't possible so this patch
> is removing a protection/security mechanism. The performance argument
> isn't enough to justify this imho.

Performance is a big deal in a fleet of many thousands of servers,
so let's not dismiss this argument so easily. Also, in a lot of
cases (production systems, properly audited, monitored, secured, etc.)
the workloads can be trusted, so a DoS attack by an unprivileged
process is not a huge concern. On the other hand, though, the inability
of a root process to read FUSE-backed files is a huge blocker for stack
symbolization, as one specific use case. And spinning up threads for
hundreds of target processes is not a viable solution in production,
unfortunately.

But given that in some situations it might be better to be safe than
sorry, can we let users building kernels decide on this? How about
adding a Kconfig option that would allow such access for CAP_SYS_ADMIN
(or whichever capability makes most sense)? E.g., with
CONFIG_FUSE_ALLOW_OTHER_PERMISSIVE set the kernel would allow such
reads, and otherwise reject them. Would something like this be a good
compromise here?

I still think that the ability of tools like perf to provide good
tracing data is going to suffer due to this cautious rejection of
access, but with Kconfig we at least give users an option to opt out
of it. WDYT?
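
For illustration, the compromise proposed above can be modeled in user
space roughly as follows. This is a minimal sketch and not kernel code:
the struct layouts, the cap_sys_admin field, the function names and the
FUSE_ALLOW_OTHER_PERMISSIVE macro (standing in for the proposed Kconfig
symbol) are all assumptions made for the example. It only shows how a
build-time switch could sit in front of the "same or descendant userns"
rule.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical build-time switch standing in for the proposed
 * CONFIG_FUSE_ALLOW_OTHER_PERMISSIVE. Build with
 * -DFUSE_ALLOW_OTHER_PERMISSIVE=1 to enable the bypass. */
#ifndef FUSE_ALLOW_OTHER_PERMISSIVE
#define FUSE_ALLOW_OTHER_PERMISSIVE 0
#endif

/* Toy user namespace: the initial namespace has parent == NULL. */
struct userns {
	const struct userns *parent;
};

/* A task, reduced to the two facts the check needs (assumed fields). */
struct task {
	const struct userns *ns;
	bool cap_sys_admin;	/* holds CAP_SYS_ADMIN in its own namespace */
};

/* True if ns is target itself or one of target's descendants. */
static bool in_userns(const struct userns *ns, const struct userns *target)
{
	for (; ns; ns = ns->parent)
		if (ns == target)
			return true;
	return false;
}

/* Model of the access decision for an allow_other mount owned by mount_ns. */
static bool may_access_allow_other(const struct task *t,
				   const struct userns *mount_ns)
{
	/* Opt-in bypass: only a task that is capable in the initial userns. */
	if (FUSE_ALLOW_OTHER_PERMISSIVE && t->cap_sys_admin &&
	    t->ns->parent == NULL)
		return true;

	/* Default rule: caller must be in the mount's userns or below it. */
	return in_userns(t->ns, mount_ns);
}
```

With the switch left at 0, a capable task in the initial namespace is
still refused access to a mount owned by a child namespace, which is the
current behavior discussed in this thread. Building with the switch set
to 1 lets only that task through, and the rule for everyone else is
unchanged.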

* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
  2022-05-24  4:35                 ` Andrii Nakryiko
@ 2022-05-24  7:07                   ` Miklos Szeredi
  2022-05-24 14:59                     ` Rik van Riel
  2022-05-24 15:44                     ` Christian Brauner
  0 siblings, 2 replies; 14+ messages in thread
From: Miklos Szeredi @ 2022-05-24  7:07 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Christian Brauner, Dave Marchevsky, linux-fsdevel, Seth Forshee,
	Rik van Riel, kernel-team, Andrii Nakryiko, Chris Mason,
	Arnaldo Carvalho de Melo

On Tue, 24 May 2022 at 06:36, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> I still think that the ability of tools like perf to provide good
> tracing data is going to suffer due to this cautious rejection of
> access, but with Kconfig we at least give users an option to opt out
> of it. WDYT?

I'd rather use a module option for this, always defaulting to off.
The sysadmin can then choose to turn this protection off if
necessary. This would effectively be the same as the "user_allow_other"
option in /etc/fuse.conf, which fusermount interprets but the kernel
doesn't.

Thanks,
Miklos
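
As a rough illustration of how a userspace tool might check such a
module option before relying on the relaxed behavior: boolean module
parameters are exposed under /sys/module/<module>/parameters/ and read
back as "Y" or "N". The sketch below assumes that layout. The helper
names are made up for the example, and no parameter name has been
chosen in this thread, so any path passed in is hypothetical.

```c
#include <stdio.h>

/* Interpret the text of a boolean module parameter.
 * Returns 1 for enabled, 0 for disabled, -1 if unrecognized. */
static int parse_bool_param(const char *s)
{
	if (!s)
		return -1;
	switch (s[0]) {
	case 'Y': case 'y': case '1':
		return 1;
	case 'N': case 'n': case '0':
		return 0;
	default:
		return -1;
	}
}

/* Read a boolean parameter file such as
 * /sys/module/fuse/parameters/<name> (the name is hypothetical here).
 * Returns 1, 0, or -1 if the file is missing or unreadable. */
static int read_bool_param(const char *path)
{
	char buf[8];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_bool_param(buf);
}
```

A profiler could call read_bool_param() at startup and fall back to its
existing userspace workaround when the result is 0 or -1, so that older
kernels without the option keep working.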

* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
  2022-05-24  7:07                   ` Miklos Szeredi
@ 2022-05-24 14:59                     ` Rik van Riel
  2022-05-24 15:44                     ` Christian Brauner
  1 sibling, 0 replies; 14+ messages in thread
From: Rik van Riel @ 2022-05-24 14:59 UTC (permalink / raw)
  To: Miklos Szeredi, Andrii Nakryiko
  Cc: Christian Brauner, Dave Marchevsky, linux-fsdevel, Seth Forshee,
	kernel-team, Andrii Nakryiko, Chris Mason,
	Arnaldo Carvalho de Melo

On Tue, 2022-05-24 at 09:07 +0200, Miklos Szeredi wrote:
> On Tue, 24 May 2022 at 06:36, Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> 
> > I still think that the ability of tools like perf to provide good
> > tracing data is going to suffer due to this cautious rejection of
> > access, but with Kconfig we at least give users an option to opt
> > out of it. WDYT?
> 
> I'd rather use a module option for this, always defaulting to off.
> The sysadmin can then choose to turn this protection off if
> necessary. This would effectively be the same as the "user_allow_other"
> option in /etc/fuse.conf, which fusermount interprets but the kernel
> doesn't.

Exposing that behavior through /sys/module/fuse/parameters/user_allow_other
(or some other name if people have better ideas) seems like a good
way to configure that, indeed!

-- 
All Rights Reversed.

* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
  2022-05-24  7:07                   ` Miklos Szeredi
  2022-05-24 14:59                     ` Rik van Riel
@ 2022-05-24 15:44                     ` Christian Brauner
  1 sibling, 0 replies; 14+ messages in thread
From: Christian Brauner @ 2022-05-24 15:44 UTC (permalink / raw)
  To: Miklos Szeredi, Andrii Nakryiko
  Cc: Dave Marchevsky, linux-fsdevel, Seth Forshee, Rik van Riel,
	kernel-team, Andrii Nakryiko, Chris Mason,
	Arnaldo Carvalho de Melo

On Tue, May 24, 2022 at 09:07:34AM +0200, Miklos Szeredi wrote:
> On Tue, 24 May 2022 at 06:36, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> > I still think that the ability of tools like perf to provide good
> > tracing data is going to suffer due to this cautious rejection of
> > access, but with Kconfig we at least give users an option to opt
> > out of it. WDYT?
> 
> I'd rather use a module option for this, always defaulting to off.
> The sysadmin can then choose to turn this protection off if
> necessary. This would effectively be the same as the "user_allow_other"
> option in /etc/fuse.conf, which fusermount interprets but the kernel
> doesn't.

Agreed. Should be properly documented.

Christian

end of thread, other threads:[~2022-05-24 15:45 UTC | newest]

Thread overview: 14+ messages
2021-11-11 22:11 [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount Dave Marchevsky
2021-11-12  2:10 ` Rik van Riel
2021-11-12 10:13 ` Christian Brauner
2021-11-12 23:29   ` Dave Marchevsky
2021-11-15 15:28     ` Miklos Szeredi
2022-05-17 16:50       ` Dave Marchevsky
2022-05-18 11:22         ` Christian Brauner
2022-05-18 11:26           ` Miklos Szeredi
2022-05-19  4:56             ` Andrii Nakryiko
2022-05-19  8:59               ` Christian Brauner
2022-05-24  4:35                 ` Andrii Nakryiko
2022-05-24  7:07                   ` Miklos Szeredi
2022-05-24 14:59                     ` Rik van Riel
2022-05-24 15:44                     ` Christian Brauner
