Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount

From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Christian Brauner <brauner@kernel.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	Dave Marchevsky <davemarchevsky@fb.com>,
	linux-fsdevel@vger.kernel.org,
	Seth Forshee <sforshee@digitalocean.com>,
	Rik van Riel <riel@surriel.com>, kernel-team <kernel-team@fb.com>,
	Andrii Nakryiko <andrii@kernel.org>,
	clm@fb.com, Arnaldo Carvalho de Melo <acme@kernel.org>
Subject: Re: [PATCH] fuse: allow CAP_SYS_ADMIN in root userns to access allow_other mount
Date: Mon, 23 May 2022 21:35:58 -0700
Message-ID: <CAEf4BzY5en_O9NtKUB=1uHkGdHLSo_FqddUkokh7pcEWAQ2omw@mail.gmail.com>
In-Reply-To: <20220519085919.yqj2hvlzg7gpzby3@wittgenstein>

On Thu, May 19, 2022 at 1:59 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Wed, May 18, 2022 at 09:56:26PM -0700, Andrii Nakryiko wrote:
> > On Wed, May 18, 2022 at 4:26 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >
> > > On Wed, 18 May 2022 at 13:22, Christian Brauner <brauner@kernel.org> wrote:
> > > >
> > > > On Tue, May 17, 2022 at 12:50:32PM -0400, Dave Marchevsky wrote:
> > >
> > > > > > Sorry to resurrect this old thread. My proposed alternate approach of "special
> > > > > ioctl to grant exception to descendant userns check" proved unnecessarily
> > > > > complex: ioctls also go through fuse_allow_current_process check, so a special
> > > > > > carve-out would be necessary in both ioctl and fuse_permission check in
> > > > > order to make it possible for non-descendant-userns user to opt in to exception.
> > > > >
> > > > > How about a version of this patch with CAP_DAC_READ_SEARCH check? This way
> > > > > there's more of a clear opt-in vs CAP_SYS_ADMIN.
> > > >
> > > > I still think this isn't needed given that especially for the use-cases
> > > > listed here you have a workable userspace solution to this problem.
> >
> > Unfortunately such userspace solution isn't that great in practice.
> > It's both very cumbersome to implement and integrate into existing
> > profiling solutions and causes undesired inefficiencies when
> > processing (typically for stack trace symbolization) lots of profiled
> > processes.
> >
> > > >
> > > > If the CAP_SYS_ADMIN/CAP_DAC_READ_SEARCH check were really just about
> > > > giving a privileged task access then it'd be fine imho. But given that
> > > > this means the privileged task is open to a DoS attack it seems we're
> > > > building a trap into the fuse code.
> >
> > Running under root presumably means that the application knows what
> > it's doing (and it can do a lot of dangerous and harmful things
> > outside of FUSE already), so why should there be any more opt in for
> > it to access file contents? CAP_SYS_ADMIN can do pretty much anything
> > in the system, it seems a bit asymmetric to have extra FUSE-specific
> > restrictions for it.
>
> Processes trying to access a fuse filesystem that is not in the same
> userns or a descendant userns are open to DoS attacks. This specifically
> includes processes capable in the initial userns.

Sure, but by DoS attack here you mean that a capable (I'll just say
"root" for simplicity) process might get stuck. While not great, that's
not as horrible as crashing the kernel or something along those lines.
So let's keep this in perspective, because here we are talking
about disabling very useful functionality (it's not a hypothetical
problem, it's a real production problem that users are struggling with
right now) while trying to prevent a root process (which has to be
careful anyway, as it's a root process after all) from shooting itself
in the foot.

>
> If it suddenly becomes possible that an initial userns capable process
> can access fuse filesystems in any userns then any such process
> accessing a fuse filesystem unintentionally will be susceptible to DoS
> attacks.
>
> Iow, the problem isn't that an initial userns capable process is doing
> something harmful and we're overly careful trying to prevent this and
> thereby going against standard CAP_SYS_ADMIN assumptions; it's that an
> initial userns capable process can unintentionally have something
> harmful done to it simply by accessing a fuse filesystem.
>
> This is even more concerning since rn this isn't possible so this patch
> is removing a protection/security mechanism. The performance argument
> isn't enough to justify this imho.

Performance is a big deal in a fleet of many thousands of servers,
so let's not dismiss this argument so easily. Also, in a lot of
cases (production systems that are properly audited, monitored,
secured, etc.) the workloads can be trusted, so a DoS attack by an
unprivileged process is not a huge concern. On the other hand, the
inability of a root process to read FUSE-backed files is a huge
blocker for stack symbolization, as one specific use case. And
spinning up threads for hundreds of target processes is not a viable
solution in production, unfortunately.

But given that in some situations it might be better to be safe than
sorry, can we let users building kernels decide on this? How about
adding a Kconfig option that would allow such access for CAP_SYS_ADMIN
(or whichever capability makes most sense)? E.g., enabling
CONFIG_FUSE_ALLOW_OTHER_PERMISSIVE would allow such reads, and with it
disabled the kernel would keep rejecting them. Would something like
this be a good compromise here?
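
To make the proposal a bit more concrete, here's a rough userspace
model of the check I have in mind. This is only a sketch and not actual
fuse code: the struct and function names (model_mount, model_task,
model_allow_other_access) are made up for illustration, and the
"permissive" argument is just a stand-in for the proposed
CONFIG_FUSE_ALLOW_OTHER_PERMISSIVE option. The owner-only
(non-allow_other) path is deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a FUSE mount. */
struct model_mount {
	bool allow_other;	/* mounted with allow_other */
};

/* Hypothetical, simplified view of the task doing the access. */
struct model_task {
	bool in_mount_userns;	/* in the mount's userns or a descendant */
	bool init_ns_sys_admin;	/* has CAP_SYS_ADMIN in the initial userns */
};

/*
 * Model of the allow_other decision. "permissive" stands in for the
 * proposed Kconfig option; it is an assumption of this sketch, not an
 * existing kernel knob.
 */
static bool model_allow_other_access(const struct model_mount *m,
				     const struct model_task *t,
				     bool permissive)
{
	if (!m->allow_other)
		return false;	/* owner-only check not modeled here */

	/* Existing behavior: same or descendant userns is allowed. */
	if (t->in_mount_userns)
		return true;

	/* Proposed behavior: only when the option is enabled. */
	return permissive && t->init_ns_sys_admin;
}
```

With the option off, a root process outside of the mount's userns is
rejected exactly as today; with it on, that same process gets access,
while unprivileged outsiders are still rejected either way.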

I still think that the ability of tools like perf to provide good
tracing data is going to be hurt by this cautious rejection of access,
but with Kconfig we at least give users an option to opt out of it.
WDYT?