* [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
From: Dave Marchevsky @ 2021-11-11 22:11 UTC
To: linux-fsdevel
Cc: Dave Marchevsky, Miklos Szeredi, Seth Forshee, Rik van Riel, kernel-team

Since commit 73f03c2b4b52 ("fuse: Restrict allow_other to the
superblock's namespace or a descendant"), access to allow_other FUSE
filesystems has been limited to users in the mounting user namespace or
descendants. This prevents a process that is privileged in its userns -
but not its parent namespaces - from mounting a FUSE fs w/ allow_other
that is accessible to processes in parent namespaces.

While this restriction makes sense overall it breaks a legitimate
usecase for me. I have a tracing daemon which needs to peek into
process' open files in order to symbolicate - similar to 'perf'. The
daemon is a privileged process in the root userns, but is unable to peek
into FUSE filesystems mounted with allow_other by processes in child
namespaces.

This patch adds an escape hatch to the descendant userns logic
specifically for processes with CAP_SYS_ADMIN in the root userns. Such
processes can already do many dangerous things regardless of namespace,
and moreover could fork and setns into any child userns with a FUSE
mount, so it's reasonable to allow them to interact with all allow_other
FUSE filesystems.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: kernel-team@fb.com
---

Note: I was unsure whether CAP_SYS_ADMIN or CAP_SYS_PTRACE was the best
choice of capability here. Went with the former as it's checked
elsewhere in fs/fuse while CAP_SYS_PTRACE isn't.

 fs/fuse/dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 0654bfedcbb0..2524eeb0f35d 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1134,7 +1134,7 @@ int fuse_allow_current_process(struct fuse_conn *fc)
 	const struct cred *cred;
 
 	if (fc->allow_other)
-		return current_in_userns(fc->user_ns);
+		return current_in_userns(fc->user_ns) || capable(CAP_SYS_ADMIN);
 
 	cred = current_cred();
 	if (uid_eq(cred->euid, fc->user_id) &&
-- 
2.30.2

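To make the effect of the one-line change concrete, the following is a minimal userspace model of the allow_other check before and after the patch. It is a sketch, not kernel code: `struct userns`, `struct task`, `in_userns_model()`, `allow_other_before()` and `allow_other_after()` are hypothetical stand-ins, and the `cap_sys_admin_in_init` flag is an assumed simplification of what `capable(CAP_SYS_ADMIN)` tests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a user namespace: only the parent link is modelled. */
struct userns {
	const struct userns *parent;	/* NULL for the initial (root) userns */
};

/* Toy stand-in for a task: its userns, plus whether it holds CAP_SYS_ADMIN
 * in the initial userns (assumed equivalent of capable(CAP_SYS_ADMIN)). */
struct task {
	const struct userns *ns;
	bool cap_sys_admin_in_init;
};

/* True if 'ns' is 'ancestor' itself or one of its descendants. */
static bool in_userns_model(const struct userns *ancestor,
			    const struct userns *ns)
{
	assert(ancestor != NULL);
	for (; ns; ns = ns->parent)
		if (ns == ancestor)
			return true;
	return false;
}

/* allow_other check as described for commit 73f03c2b4b52: the caller must
 * be in the mounting userns or a descendant of it. */
static bool allow_other_before(const struct userns *mount_ns,
			       const struct task *t)
{
	return in_userns_model(mount_ns, t->ns);
}

/* allow_other check with the proposed escape hatch added. */
static bool allow_other_after(const struct userns *mount_ns,
			      const struct task *t)
{
	return in_userns_model(mount_ns, t->ns) || t->cap_sys_admin_in_init;
}
```

In this model a daemon living in the initial userns is refused by `allow_other_before()` when the mount belongs to a child userns, and admitted by `allow_other_after()` only if its `cap_sys_admin_in_init` flag is set. Tasks inside the mounting userns or its descendants pass both versions.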
* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
From: Rik van Riel @ 2021-11-12 2:10 UTC
To: Dave Marchevsky, linux-fsdevel
Cc: Miklos Szeredi, Seth Forshee, kernel-team

On Thu, 2021-11-11 at 14:11 -0800, Dave Marchevsky wrote:
>
> This patch adds an escape hatch to the descendant userns logic
> specifically for processes with CAP_SYS_ADMIN in the root userns.
> Such processes can already do many dangerous things regardless of
> namespace, and moreover could fork and setns into any child userns
> with a FUSE mount, so it's reasonable to allow them to interact with
> all allow_other FUSE filesystems.
>
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> Cc: Miklos Szeredi <miklos@szeredi.hu>
> Cc: Seth Forshee <sforshee@digitalocean.com>
> Cc: Rik van Riel <riel@surriel.com>
> Cc: kernel-team@fb.com

This will also want a:

Fixes: 73f03c2b4b52 ("fuse: Restrict allow_other to the superblock's namespace or a descendant")
Cc: stable@kernel.org

The patch itself looks good to my untrained eye, but could probably
use some attention from somebody who really understands the VFS :)

-- 
All Rights Reversed.

* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
From: Christian Brauner @ 2021-11-12 10:13 UTC
To: Dave Marchevsky
Cc: linux-fsdevel, Miklos Szeredi, Seth Forshee, Rik van Riel, kernel-team

On Thu, Nov 11, 2021 at 02:11:42PM -0800, Dave Marchevsky wrote:
> Since commit 73f03c2b4b52 ("fuse: Restrict allow_other to the
> superblock's namespace or a descendant"), access to allow_other FUSE
> filesystems has been limited to users in the mounting user namespace or
> descendants. This prevents a process that is privileged in its userns -
> but not its parent namespaces - from mounting a FUSE fs w/ allow_other
> that is accessible to processes in parent namespaces.
>
> While this restriction makes sense overall it breaks a legitimate
> usecase for me. I have a tracing daemon which needs to peek into
> process' open files in order to symbolicate - similar to 'perf'. The
> daemon is a privileged process in the root userns, but is unable to peek
> into FUSE filesystems mounted with allow_other by processes in child
> namespaces.
>
> This patch adds an escape hatch to the descendant userns logic
> specifically for processes with CAP_SYS_ADMIN in the root userns. Such
> processes can already do many dangerous things regardless of namespace,
> and moreover could fork and setns into any child userns with a FUSE
> mount, so it's reasonable to allow them to interact with all allow_other
> FUSE filesystems.
>
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> Cc: Miklos Szeredi <miklos@szeredi.hu>
> Cc: Seth Forshee <sforshee@digitalocean.com>
> Cc: Rik van Riel <riel@surriel.com>
> Cc: kernel-team@fb.com
> ---

If your tracing daemon runs in init_user_ns with CAP_SYS_ADMIN why can't
it simply use a helper process/thread to
setns(userns_fd/pidfd, CLONE_NEWUSER)
to the target userns? This way we don't need to special-case
init_user_ns at all.

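The helper-process approach suggested above could look roughly like the following. This is a minimal sketch under stated assumptions, not code from the thread: `userns_path()` and `read_via_userns_helper()` are hypothetical names, the caller is assumed to hold CAP_SYS_ADMIN over the target userns, and `path` is assumed to resolve from the caller's mount namespace (for example through `/proc/<pid>/root/...`). Only `fork()`, `open()`, `setns()`, `pipe()`, `read()`, `write()` and `waitpid()` are real interfaces here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Build "/proc/<pid>/ns/user" in 'out'. Returns the string length, or -1 if
 * 'out' is too small. */
static int userns_path(pid_t pid, char *out, size_t len)
{
	int n;

	assert(out != NULL);
	n = snprintf(out, len, "/proc/%ld/ns/user", (long)pid);
	return (n > 0 && (size_t)n < len) ? n : -1;
}

/* Fork a helper, move it into the user namespace of 'target', and read up to
 * 'len' bytes of 'path' from there. Returns bytes read, or -1 on failure. */
static ssize_t read_via_userns_helper(pid_t target, const char *path,
				      char *buf, size_t len)
{
	char nspath[64];
	int pipefd[2], status;
	ssize_t got = 0, n = 0;
	pid_t child;

	if (userns_path(target, nspath, sizeof(nspath)) < 0 || pipe(pipefd) < 0)
		return -1;

	child = fork();
	if (child < 0) {
		close(pipefd[0]);
		close(pipefd[1]);
		return -1;
	}
	if (child == 0) {
		/* Helper: single-threaded after fork(), as setns() requires
		 * when switching user namespaces. */
		char tmp[4096];
		size_t left = len;
		int nsfd, fd;

		close(pipefd[0]);
		nsfd = open(nspath, O_RDONLY | O_CLOEXEC);
		if (nsfd < 0 || setns(nsfd, CLONE_NEWUSER) < 0)
			_exit(1);
		fd = open(path, O_RDONLY | O_CLOEXEC);
		if (fd < 0)
			_exit(1);
		while (left > 0) {
			n = read(fd, tmp, left < sizeof(tmp) ? left : sizeof(tmp));
			if (n <= 0)
				break;
			if (write(pipefd[1], tmp, (size_t)n) != n)
				_exit(1);
			left -= (size_t)n;
		}
		_exit(n < 0 ? 1 : 0);
	}

	/* Parent: collect what the helper streamed back over the pipe. */
	close(pipefd[1]);
	while ((size_t)got < len &&
	       (n = read(pipefd[0], buf + got, len - (size_t)got)) > 0)
		got += n;
	close(pipefd[0]);
	if (waitpid(child, &status, 0) < 0 || !WIFEXITED(status) ||
	    WEXITSTATUS(status) != 0)
		return -1;
	return got;
}
```

A tracing daemon would call something like `read_via_userns_helper(pid, "/proc/<pid>/root/path/to/binary", buf, sizeof(buf))` once per file it fails to open directly. The process doing the read has explicitly joined the target userns, which is the opt-in the suggestion relies on; the cost is one fork per target.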
* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
From: Dave Marchevsky @ 2021-11-12 23:29 UTC
To: Christian Brauner
Cc: linux-fsdevel, Miklos Szeredi, Seth Forshee, Rik van Riel, kernel-team

On 11/12/21 5:13 AM, Christian Brauner wrote:
> On Thu, Nov 11, 2021 at 02:11:42PM -0800, Dave Marchevsky wrote:
>> Since commit 73f03c2b4b52 ("fuse: Restrict allow_other to the
>> superblock's namespace or a descendant"), access to allow_other FUSE
>> filesystems has been limited to users in the mounting user namespace or
>> descendants. This prevents a process that is privileged in its userns -
>> but not its parent namespaces - from mounting a FUSE fs w/ allow_other
>> that is accessible to processes in parent namespaces.
>>
>> While this restriction makes sense overall it breaks a legitimate
>> usecase for me. I have a tracing daemon which needs to peek into
>> process' open files in order to symbolicate - similar to 'perf'. The
>> daemon is a privileged process in the root userns, but is unable to peek
>> into FUSE filesystems mounted with allow_other by processes in child
>> namespaces.
>>
>> This patch adds an escape hatch to the descendant userns logic
>> specifically for processes with CAP_SYS_ADMIN in the root userns. Such
>> processes can already do many dangerous things regardless of namespace,
>> and moreover could fork and setns into any child userns with a FUSE
>> mount, so it's reasonable to allow them to interact with all allow_other
>> FUSE filesystems.
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>> Cc: Miklos Szeredi <miklos@szeredi.hu>
>> Cc: Seth Forshee <sforshee@digitalocean.com>
>> Cc: Rik van Riel <riel@surriel.com>
>> Cc: kernel-team@fb.com
>> ---
>
> If your tracing daemon runs in init_user_ns with CAP_SYS_ADMIN why can't
> it simply use a helper process/thread to
> setns(userns_fd/pidfd, CLONE_NEWUSER)
> to the target userns? This way we don't need to special-case
> init_user_ns at all.

helper process + setns could work for my usecase. But the fact that there's no
way to say "I know what I am about to do is potentially stupid and dangerous,
but I am root so let me do it", without spawning a helper process in this case,
feels like it'll result in special-case userspace workarounds for anyone doing
symbolication of backtraces.

e.g. perf will have to add some logic: "did I fail
to grab this exe that some process had mapped? Is it in a FUSE mounted by some
descendant userns? let's fork a helper process..." Not the end of the world,
but unnecessary complexity nonetheless.

That being said, I agree that this patch's special-casing of init_user_ns is
hacky. What do you think about a more explicit and general "let me do this
stupid and dangerous thing" mechanism - perhaps a new struct fuse_conn field
containing a set of exception userns', populated with ioctl or similar.

* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
From: Miklos Szeredi @ 2021-11-15 15:28 UTC
To: Dave Marchevsky
Cc: Christian Brauner, linux-fsdevel, Seth Forshee, Rik van Riel, kernel-team

On Sat, 13 Nov 2021 at 00:29, Dave Marchevsky <davemarchevsky@fb.com> wrote:

> > If your tracing daemon runs in init_user_ns with CAP_SYS_ADMIN why can't
> > it simply use a helper process/thread to
> > setns(userns_fd/pidfd, CLONE_NEWUSER)
> > to the target userns? This way we don't need to special-case
> > init_user_ns at all.
>
> helper process + setns could work for my usecase. But the fact that there's no
> way to say "I know what I am about to do is potentially stupid and dangerous,
> but I am root so let me do it", without spawning a helper process in this case,
> feels like it'll result in special-case userspace workarounds for anyone doing
> symbolication of backtraces.

Note: any mechanism that grants filesystem access to users that have
higher privileges than the daemon serving the filesystem will
potentially open DoS attacks against the higher privilege task. This
would be somewhat mitigated if the filesystem is only mounted in a
private mount namespace, but AFAICS that's not guaranteed.

The above obviously applies to your original patch but it also applies
to any other mechanism where the high privilege user doesn't
explicitly acknowledge and accept the consequences. IOW granting the
exception has to be initiated by the high privileged user.

Thanks,
Miklos

* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
From: Dave Marchevsky @ 2022-05-17 16:50 UTC
To: Miklos Szeredi
Cc: Christian Brauner, linux-fsdevel, Seth Forshee, Rik van Riel, kernel-team, Andrii Nakryiko

On 11/15/21 10:28 AM, Miklos Szeredi wrote:
> On Sat, 13 Nov 2021 at 00:29, Dave Marchevsky <davemarchevsky@fb.com> wrote:
>
>>> If your tracing daemon runs in init_user_ns with CAP_SYS_ADMIN why can't
>>> it simply use a helper process/thread to
>>> setns(userns_fd/pidfd, CLONE_NEWUSER)
>>> to the target userns? This way we don't need to special-case
>>> init_user_ns at all.
>>
>> helper process + setns could work for my usecase. But the fact that there's no
>> way to say "I know what I am about to do is potentially stupid and dangerous,
>> but I am root so let me do it", without spawning a helper process in this case,
>> feels like it'll result in special-case userspace workarounds for anyone doing
>> symbolication of backtraces.
>
> Note: any mechanism that grants filesystem access to users that have
> higher privileges than the daemon serving the filesystem will
> potentially open DoS attacks against the higher privilege task. This
> would be somewhat mitigated if the filesystem is only mounted in a
> private mount namespace, but AFAICS that's not guaranteed.
>
> The above obviously applies to your original patch but it also applies
> to any other mechanism where the high privilege user doesn't
> explicitly acknowledge and accept the consequences. IOW granting the
> exception has to be initiated by the high privileged user.
>
> Thanks,
> Miklos
>

Sorry to resurrect this old thread. My proposed alternate approach of "special
ioctl to grant exception to descendant userns check" proved unnecessarily
complex: ioctls also go through fuse_allow_current_process check, so a special
carve-out would be necessary in both ioctl and fuse_permission check in
order to make it possible for non-descendant-userns user to opt in to exception.

How about a version of this patch with CAP_DAC_READ_SEARCH check? This way
there's more of a clear opt-in vs CAP_SYS_ADMIN.

FWIW we've been running CAP_SYS_ADMIN version of this patch internally and
can confirm it fixes tracing tools' ability to symbolicate binaries in FUSE.

* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
From: Christian Brauner @ 2022-05-18 11:22 UTC
To: Dave Marchevsky
Cc: Miklos Szeredi, Christian Brauner, linux-fsdevel, Seth Forshee, Rik van Riel, kernel-team, Andrii Nakryiko

On Tue, May 17, 2022 at 12:50:32PM -0400, Dave Marchevsky wrote:
> On 11/15/21 10:28 AM, Miklos Szeredi wrote:
> > On Sat, 13 Nov 2021 at 00:29, Dave Marchevsky <davemarchevsky@fb.com> wrote:
> >
> >>> If your tracing daemon runs in init_user_ns with CAP_SYS_ADMIN why can't
> >>> it simply use a helper process/thread to
> >>> setns(userns_fd/pidfd, CLONE_NEWUSER)
> >>> to the target userns? This way we don't need to special-case
> >>> init_user_ns at all.
> >>
> >> helper process + setns could work for my usecase. But the fact that there's no
> >> way to say "I know what I am about to do is potentially stupid and dangerous,
> >> but I am root so let me do it", without spawning a helper process in this case,
> >> feels like it'll result in special-case userspace workarounds for anyone doing
> >> symbolication of backtraces.
> >
> > Note: any mechanism that grants filesystem access to users that have
> > higher privileges than the daemon serving the filesystem will
> > potentially open DoS attacks against the higher privilege task. This
> > would be somewhat mitigated if the filesystem is only mounted in a
> > private mount namespace, but AFAICS that's not guaranteed.
> >
> > The above obviously applies to your original patch but it also applies
> > to any other mechanism where the high privilege user doesn't
> > explicitly acknowledge and accept the consequences. IOW granting the
> > exception has to be initiated by the high privileged user.
> >
> > Thanks,
> > Miklos
> >
>
> Sorry to resurrect this old thread. My proposed alternate approach of "special
> ioctl to grant exception to descendant userns check" proved unnecessarily
> complex: ioctls also go through fuse_allow_current_process check, so a special
> carve-out would be necessary in both ioctl and fuse_permission check in
> order to make it possible for non-descendant-userns user to opt in to exception.
>
> How about a version of this patch with CAP_DAC_READ_SEARCH check? This way
> there's more of a clear opt-in vs CAP_SYS_ADMIN.

I still think this isn't needed given that especially for the use-cases
listed here you have a workable userspace solution to this problem.

If the CAP_SYS_ADMIN/CAP_DAC_READ_SEARCH check were really just about
giving a privileged task access then it'd be fine imho. But given that
this means the privileged task is open to a DoS attack it seems we're
building a trap into the fuse code.

The setns() model has the advantage that this forces the task to assume
the correct privileges and also serves as an explicit opt-in. Just my 2
cents here.

* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
From: Miklos Szeredi @ 2022-05-18 11:26 UTC
To: Christian Brauner
Cc: Dave Marchevsky, Christian Brauner, linux-fsdevel, Seth Forshee, Rik van Riel, kernel-team, Andrii Nakryiko

On Wed, 18 May 2022 at 13:22, Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, May 17, 2022 at 12:50:32PM -0400, Dave Marchevsky wrote:

> > Sorry to resurrect this old thread. My proposed alternate approach of "special
> > ioctl to grant exception to descendant userns check" proved unnecessarily
> > complex: ioctls also go through fuse_allow_current_process check, so a special
> > carve-out would be necessary in both ioctl and fuse_permission check in
> > order to make it possible for non-descendant-userns user to opt in to exception.
> >
> > How about a version of this patch with CAP_DAC_READ_SEARCH check? This way
> > there's more of a clear opt-in vs CAP_SYS_ADMIN.
>
> I still think this isn't needed given that especially for the use-cases
> listed here you have a workable userspace solution to this problem.
>
> If the CAP_SYS_ADMIN/CAP_DAC_READ_SEARCH check were really just about
> giving a privileged task access then it'd be fine imho. But given that
> this means the privileged task is open to a DoS attack it seems we're
> building a trap into the fuse code.
>
> The setns() model has the advantage that this forces the task to assume
> the correct privileges and also serves as an explicit opt-in. Just my 2
> cents here.

Fully agreed. Using CAP_DAC_READ_SEARCH instead of CAP_SYS_ADMIN
doesn't make this any better, since root has all caps including
CAP_DAC_READ_SEARCH.

Thanks,
Miklos

* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
From: Andrii Nakryiko @ 2022-05-19 4:56 UTC
To: Miklos Szeredi
Cc: Christian Brauner, Dave Marchevsky, Christian Brauner, linux-fsdevel, Seth Forshee, Rik van Riel, kernel-team, Andrii Nakryiko, clm

On Wed, May 18, 2022 at 4:26 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Wed, 18 May 2022 at 13:22, Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Tue, May 17, 2022 at 12:50:32PM -0400, Dave Marchevsky wrote:
>
> > > Sorry to resurrect this old thread. My proposed alternate approach of "special
> > > ioctl to grant exception to descendant userns check" proved unnecessarily
> > > complex: ioctls also go through fuse_allow_current_process check, so a special
> > > carve-out would be necessary in both ioctl and fuse_permission check in
> > > order to make it possible for non-descendant-userns user to opt in to exception.
> > >
> > > How about a version of this patch with CAP_DAC_READ_SEARCH check? This way
> > > there's more of a clear opt-in vs CAP_SYS_ADMIN.
> >
> > I still think this isn't needed given that especially for the use-cases
> > listed here you have a workable userspace solution to this problem.

Unfortunately such userspace solution isn't that great in practice.
It's both very cumbersome to implement and integrate into existing
profiling solutions and causes undesired inefficiencies when
processing (typically for stack trace symbolization) lots of profiled
processes.

> >
> > If the CAP_SYS_ADMIN/CAP_DAC_READ_SEARCH check were really just about
> > giving a privileged task access then it'd be fine imho. But given that
> > this means the privileged task is open to a DoS attack it seems we're
> > building a trap into the fuse code.

Running under root presumably means that the application knows what
it's doing (and it can do a lot of dangerous and harmful things
outside of FUSE already), so why should there be any more opt in for
it to access file contents? CAP_SYS_ADMIN can do pretty much anything
in the system, it seems a bit asymmetric to have extra FUSE-specific
restrictions for it.

* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
From: Christian Brauner @ 2022-05-19 8:59 UTC
To: Andrii Nakryiko
Cc: Miklos Szeredi, Dave Marchevsky, linux-fsdevel, Seth Forshee, Rik van Riel, kernel-team, Andrii Nakryiko, clm

On Wed, May 18, 2022 at 09:56:26PM -0700, Andrii Nakryiko wrote:
> On Wed, May 18, 2022 at 4:26 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Wed, 18 May 2022 at 13:22, Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > On Tue, May 17, 2022 at 12:50:32PM -0400, Dave Marchevsky wrote:
> >
> > > > Sorry to resurrect this old thread. My proposed alternate approach of "special
> > > > ioctl to grant exception to descendant userns check" proved unnecessarily
> > > > complex: ioctls also go through fuse_allow_current_process check, so a special
> > > > carve-out would be necessary in both ioctl and fuse_permission check in
> > > > order to make it possible for non-descendant-userns user to opt in to exception.
> > > >
> > > > How about a version of this patch with CAP_DAC_READ_SEARCH check? This way
> > > > there's more of a clear opt-in vs CAP_SYS_ADMIN.
> > >
> > > I still think this isn't needed given that especially for the use-cases
> > > listed here you have a workable userspace solution to this problem.
>
> Unfortunately such userspace solution isn't that great in practice.
> It's both very cumbersome to implement and integrate into existing
> profiling solutions and causes undesired inefficiencies when
> processing (typically for stack trace symbolization) lots of profiled
> processes.
>
> > >
> > > If the CAP_SYS_ADMIN/CAP_DAC_READ_SEARCH check were really just about
> > > giving a privileged task access then it'd be fine imho. But given that
> > > this means the privileged task is open to a DoS attack it seems we're
> > > building a trap into the fuse code.
>
> Running under root presumably means that the application knows what
> it's doing (and it can do a lot of dangerous and harmful things
> outside of FUSE already), so why should there be any more opt in for
> it to access file contents? CAP_SYS_ADMIN can do pretty much anything
> in the system, it seems a bit asymmetric to have extra FUSE-specific
> restrictions for it.

Processes trying to access a fuse filesystem that is not in the same
userns or a descendant userns are open to DoS attacks. This specifically
includes processes capable in the initial userns.

If it suddenly becomes possible that an initial userns capable process
can access fuse filesystems in any userns then any such process
accessing a fuse filesystem unintentionally will be susceptible to DoS
attacks.

Iow, the problem isn't that an initial userns capable process is doing
something harmful and we're overly careful trying to prevent this and
thereby going against standard CAP_SYS_ADMIN assumptions; it's that an
initial userns capable process can unintentionally have something
harmful done to it simply by accessing a fuse filesystem.

This is even more concerning since right now this isn't possible so this
patch is removing a protection/security mechanism. The performance argument
isn't enough to justify this imho.

* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
From: Andrii Nakryiko @ 2022-05-24 4:35 UTC
To: Christian Brauner
Cc: Miklos Szeredi, Dave Marchevsky, linux-fsdevel, Seth Forshee,
    Rik van Riel, kernel-team, Andrii Nakryiko, clm,
    Arnaldo Carvalho de Melo

On Thu, May 19, 2022 at 1:59 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Wed, May 18, 2022 at 09:56:26PM -0700, Andrii Nakryiko wrote:
> > On Wed, May 18, 2022 at 4:26 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >
> > > On Wed, 18 May 2022 at 13:22, Christian Brauner <brauner@kernel.org> wrote:
> > > >
> > > > On Tue, May 17, 2022 at 12:50:32PM -0400, Dave Marchevsky wrote:
> > >
> > > > > Sorry to resurrect this old thread. My proposed alternate approach of "special
> > > > > ioctl to grant exception to descendant userns check" proved unnecessarily
> > > > > complex: ioctls also go through fuse_allow_current_process check, so a special
> > > > > carve-out would be necessary in both ioctl and fuse_permission check in
> > > > > order to make it possible for non-descendant-userns user to opt in to exception.
> > > > >
> > > > > How about a version of this patch with CAP_DAC_READ_SEARCH check? This way
> > > > > there's more of a clear opt-in vs CAP_SYS_ADMIN.
> > > >
> > > > I still think this isn't needed given that especially for the use-cases
> > > > listed here you have a workable userspace solution to this problem.
> > Unfortunately such userspace solution isn't that great in practice.
> > It's both very cumbersome to implement and integrate into existing
> > profiling solutions and causes undesired inefficiencies when
> > processing (typically for stack trace symbolization) lots of profiled
> > processes.
> >
> > > >
> > > > If the CAP_SYS_ADMIN/CAP_DAC_READ_SEARCH check were really just about
> > > > giving a privileged task access then it'd be fine imho. But given that
> > > > this means the privileged task is open to a DoS attack it seems we're
> > > > building a trap into the fuse code.
> >
> > Running under root presumably means that the application knows what
> > it's doing (and it can do a lot of dangerous and harmful things
> > outside of FUSE already), so why should there be any more opt in for
> > it to access file contents? CAP_SYS_ADMIN can do pretty much anything
> > in the system, it seems a bit asymmetric to have extra FUSE-specific
> > restrictions for it.
>
> Processes trying to access a fuse filesystem that is not in the same
> userns or a descendant userns are open to DoS attacks. This specifically
> includes processes capable in the initial userns.

Sure, but by DoS attack here you mean that a capable (I'll just say
"root" for simplicity) process might get stuck. While not great, it's
not as horrible as crashing the kernel or something along those lines.
So let's keep this perspective in mind, because here we are talking
about disabling very useful functionality (it's not a hypothetical
problem, it's a real production problem that users are struggling with
right now) while trying to prevent a root process (which has to be
careful anyways as it's a root process after all) from shooting itself
in the foot.

>
> If it suddenly becomes possible that an initial userns capable process
> can access fuse filesystems in any userns, then any such process
> accessing a fuse filesystem unintentionally will be susceptible to DoS
> attacks.
>
> Iow, the problem isn't that an initial userns capable process is doing
> something harmful and we're overly careful trying to prevent this and
> thereby going against standard CAP_SYS_ADMIN assumptions; it's that an
> initial userns capable process can unintentionally have something
> harmful done to it simply by accessing a fuse filesystem.
>
> This is even more concerning since rn this isn't possible so this patch
> is removing a protection/security mechanism. The performance argument
> isn't enough to justify this imho.

Performance is a big deal in a fleet of many thousands of servers, so
let's not dismiss this argument so easily. Also, in a lot of cases
(production systems, properly audited, monitored, secured, etc) the
workloads can be trusted, so a DoS attack by an unprivileged process
is not a huge concern. On the other hand, though, the inability of a
root process to read FUSE-backed files is a huge blocker for stack
symbolization, as one specific use case. And spinning up threads for
hundreds of target processes is not a viable solution in production,
unfortunately.

But given that in some situations it might be better to be safe than
sorry, can we let users building kernels decide on this? How about
adding a Kconfig option that would allow such access for CAP_SYS_ADMIN
(or whichever capability makes most sense)? E.g., enabling
CONFIG_FUSE_ALLOW_OTHER_PERMISSIVE would allow such reads, and they
would be rejected otherwise. Would something like this be a good
compromise here?

I still think that tools like perf being able to provide good tracing
data is going to hurt due to this cautious rejection of access, but
with Kconfig we at least give an option for users to opt out of it.
WDYT?
* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
From: Miklos Szeredi @ 2022-05-24 7:07 UTC
To: Andrii Nakryiko
Cc: Christian Brauner, Dave Marchevsky, linux-fsdevel, Seth Forshee,
    Rik van Riel, kernel-team, Andrii Nakryiko, Chris Mason,
    Arnaldo Carvalho de Melo

On Tue, 24 May 2022 at 06:36, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> I still think that tools like perf being able to provide good tracing
> data is going to hurt due to this cautious rejection of access, but
> with Kconfig we at least give an option for users to opt out of it.
> WDYT?

I'd rather use a module option for this, always defaulting to off.
The sysadmin can then choose to turn this protection off if
necessary. This would effectively be the same as the "user_allow_other"
option in /etc/fuse.conf, which fusermount interprets but the kernel
doesn't.

Thanks,
Miklos
* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
From: Rik van Riel @ 2022-05-24 14:59 UTC
To: Miklos Szeredi, Andrii Nakryiko
Cc: Christian Brauner, Dave Marchevsky, linux-fsdevel, Seth Forshee,
    kernel-team, Andrii Nakryiko, Chris Mason, Arnaldo Carvalho de Melo

On Tue, 2022-05-24 at 09:07 +0200, Miklos Szeredi wrote:
> On Tue, 24 May 2022 at 06:36, Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
>
> > I still think that tools like perf being able to provide good
> > tracing data is going to hurt due to this cautious rejection of
> > access, but with Kconfig we at least give an option for users to
> > opt out of it.
> > WDYT?
>
> I'd rather use a module option for this, always defaulting to off.
> The sysadmin can then choose to turn this protection off if
> necessary. This would effectively be the same as the "user_allow_other"
> option in /etc/fuse.conf, which fusermount interprets but the kernel
> doesn't.

Exposing that behavior through /sys/module/fuse/user_allow_other
(or some other name if people have better ideas) seems like a
good way to configure it, indeed!

-- 
All Rights Reversed.
* Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
From: Christian Brauner @ 2022-05-24 15:44 UTC
To: Miklos Szeredi, Andrii Nakryiko
Cc: Dave Marchevsky, linux-fsdevel, Seth Forshee, Rik van Riel,
    kernel-team, Andrii Nakryiko, Chris Mason, Arnaldo Carvalho de Melo

On Tue, May 24, 2022 at 09:07:34AM +0200, Miklos Szeredi wrote:
> On Tue, 24 May 2022 at 06:36, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> > I still think that tools like perf being able to provide good tracing
> > data is going to hurt due to this cautious rejection of access, but
> > with Kconfig we at least give an option for users to opt out of it.
> > WDYT?
>
> I'd rather use a module option for this, always defaulting to off.
> The sysadmin can then choose to turn this protection off if
> necessary. This would effectively be the same as the "user_allow_other"
> option in /etc/fuse.conf, which fusermount interprets but the kernel
> doesn't.

Agreed. Should be properly documented.

Christian
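
The module-option idea discussed above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not kernel
code and not the implementation anyone in the thread posted: the names
allow_admin_bypass, task_model, conn_model and model_allow_current_process
are hypothetical, and the structs reduce the real credential and namespace
checks to plain booleans. It combines the one-line change from the original
patch with the opt-in switch Miklos suggests. In a real module the switch
would presumably be declared with module_param(), which exposes parameters
under /sys/module/<module>/parameters/.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the proposed fuse module option; off by default. */
static bool allow_admin_bypass = false;

/* Simplified, hypothetical view of the calling task. */
struct task_model {
	bool in_mount_userns;     /* caller is in fc->user_ns or a descendant */
	bool cap_sys_admin_init;  /* caller has CAP_SYS_ADMIN in the initial userns */
	bool owns_mount;          /* caller's uids/gids match the mounting user */
};

/* Simplified, hypothetical view of a FUSE connection. */
struct conn_model {
	bool allow_other;
};

/*
 * Models the access decision: for allow_other mounts the userns check
 * applies, and the CAP_SYS_ADMIN escape hatch is honoured only when the
 * sysadmin has switched the option on.
 */
static bool model_allow_current_process(const struct conn_model *fc,
					const struct task_model *task)
{
	assert(fc != NULL && task != NULL);

	if (fc->allow_other)
		return task->in_mount_userns ||
		       (allow_admin_bypass && task->cap_sys_admin_init);

	/* Without allow_other only the mounting user may access the mount. */
	return task->owns_mount;
}
```

With the option left at its default, a root-userns tracing daemon is still
refused on an allow_other mount from a child userns, so the DoS protection
Christian describes stays in place unless the sysadmin deliberately turns
it off.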