Re: [PATCH 2/8] loopfs: implement loopfs

From: David Rheinsberg <david.rheinsberg@gmail.com>
To: Christian Brauner <christian.brauner@ubuntu.com>
Cc: "Jens Axboe" <axboe@kernel.dk>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	lkml <linux-kernel@vger.kernel.org>,
	linux-block@vger.kernel.org, linux-api@vger.kernel.org,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Serge Hallyn" <serge@hallyn.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	"Tejun Heo" <tj@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Saravana Kannan" <saravanak@google.com>,
	"Jan Kara" <jack@suse.cz>, "David Howells" <dhowells@redhat.com>,
	"Seth Forshee" <seth.forshee@canonical.com>,
	"Tom Gundersen" <teg@jklm.no>,
	"Christian Kellner" <ckellner@redhat.com>,
	"Dmitry Vyukov" <dvyukov@google.com>,
	"Stéphane Graber" <stgraber@ubuntu.com>,
	linux-doc@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH 2/8] loopfs: implement loopfs
Date: Sun, 12 Apr 2020 15:44:44 +0200	[thread overview]
Message-ID: <CADyDSO7t7xXWmc=GJVbi6GWicuvDh_80tdYbWsneR7ZoTqE79A@mail.gmail.com> (raw)
In-Reply-To: <20200412120300.vuigwofazxfbxluu@wittgenstein>

Hi

On Sun, Apr 12, 2020 at 2:03 PM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
[...]
> On Sun, Apr 12, 2020 at 12:38:54PM +0200, David Rheinsberg wrote:
> > which scenario the limit would be useful. Anyone can create a user-ns,
> > create a new loopfs mount, and just happily create more loop-devices.
> > So what is so special that you want to restrict the devices on a
> > _single_ mount instance?
>
> To share that instance across namespaces. You can e.g. create the
> mount instance in one mount namespace owned by userns1, create a second
> user namespace usern2 with the same mapping which is blocked from
> creating additional user namespaces either by seccomp or by
> /proc/sys/user/max_user_namespaces or lsms what have you. Because it
> doesn't own the mount namespace the loopfs mount it is in it can't
> remount it and can't exceed the local limit.

Right. But now you re-use the userns-limit to also limit loopfs (or
other userns restrictions to limit loopfs access). Existing safe
setups allow contained processes to create their own user-namespace.
With your patchset merged, every such existing contained system with
userns-access gets access to a kernel API that allows them unbound
kernel memory allocations. I don't think you can tell every existing
system to not enable CONFIG_LOOP_FS. Or to make sure to install
seccomp filters before updating their kernels. Right? These setups
already exist, and they happily use distribution kernels.

I think there is no way around `struct user_struct`, `struct ucount`,
or whatever you like.

> > Furthermore, how do you intend to limit user-space from creating an
> > unbound amount of loop devices? Unless I am mistaken, with your
> > proposal *any* process can create a new loopfs with a basically
> > unlimited amount of loop-devices, thus easily triggering unbound
> > kernel allocations. I think this needs to be accounted. The classic
> > way is to put a per-uid limit into `struct user_struct` (done by
> > pipes, mlock, epoll, mq, etc.). An alternative is `struct ucount`,
> > which allows hierarchical management (inotify uses that, as an
> > example).
>
> Yeah, I know. We can certainly do this.

My point is, I think we have to.

[...]
> > With your proposed loop-fs we could achieve something close to it:
> > Mount a private loopfs, create a loop-device, and rely on automatic
> > cleanup when the mount-namespace is destroyed.
>
> With loopfs you can do this with the old or new mount api and you don't
> need to have loopfs mounted for that at all. Here's a sample program
> that works right now with the old mount api:

Yeah, loopfs would certainly allow this, and I would be perfectly
happy with this API. I think it is overly heavy for the use-case we
have, but I do acknowledge that there are other use-cases as well.
But I think your claim that "you don't need to have loopfs mounted" is
misleading. loopfs must be mounted for the entirety of the program.
Instead, you don't have to have it linked in your mount-namespace,
since you can immediately detach it. And with the new mount-APIs, you
don't even need it linked initially, as you can create a detached
mount right away.

Thanks
David