All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christian Brauner <christian.brauner@ubuntu.com>
To: David Rheinsberg <david.rheinsberg@gmail.com>
Cc: "Jens Axboe" <axboe@kernel.dk>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	lkml <linux-kernel@vger.kernel.org>,
	linux-block@vger.kernel.org, linux-api@vger.kernel.org,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Serge Hallyn" <serge@hallyn.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	"Tejun Heo" <tj@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Saravana Kannan" <saravanak@google.com>,
	"Jan Kara" <jack@suse.cz>, "David Howells" <dhowells@redhat.com>,
	"Seth Forshee" <seth.forshee@canonical.com>,
	"Tom Gundersen" <teg@jklm.no>,
	"Christian Kellner" <ckellner@redhat.com>,
	"Dmitry Vyukov" <dvyukov@google.com>,
	"Stéphane Graber" <stgraber@ubuntu.com>,
	linux-doc@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH 2/8] loopfs: implement loopfs
Date: Sun, 12 Apr 2020 15:04:57 +0200	[thread overview]
Message-ID: <20200412130457.a7ma4z3lmry5tcmc@wittgenstein> (raw)
In-Reply-To: <20200412120300.vuigwofazxfbxluu@wittgenstein>

On Sun, Apr 12, 2020 at 02:03:00PM +0200, Christian Brauner wrote:
> On Sun, Apr 12, 2020 at 12:38:54PM +0200, David Rheinsberg wrote:
> > Hey
> > 
> > On Thu, Apr 9, 2020 at 10:27 AM Christian Brauner
> > <christian.brauner@ubuntu.com> wrote:
> > > On Thu, Apr 09, 2020 at 07:39:18AM +0200, David Rheinsberg wrote:
> > > > With loopfs in place, any process can create its own user_ns, mount
> > > > their private loopfs and create as many loop-devices as they want.
> > > > Hence, this limit does not serve as an effective global
> > > > resource-control. Secondly, anyone with access to `loop-control` can
> > > > now create loop instances until this limit is hit, thus causing anyone
> > > > else to be unable to create more. This effectively prevents you from
> > > > sharing a loopfs between non-trusting parties. I am unsure where that
> > > > limit would actually be used?
> > >
> > > Restricting it globally indeed wasn't the intended use-case for it. This
> > > was more so that you can specify an instance limit, bind-mount that
> > > instance into several places and sufficiently locked down users cannot
> > > exceed the instance limit.
> > 
> > But then these users can each exhaust the limit individually. As such,
> > you cannot share this instance across users that have no
> > trust-relationship. Fine with me, but I still don't understand in
> 
> Well, you can't really share anything across clients with the same
> privilege level if one of them is untrusted.
> 
> > which scenario the limit would be useful. Anyone can create a user-ns,
> > create a new loopfs mount, and just happily create more loop-devices.
> > So what is so special that you want to restrict the devices on a
> > _single_ mount instance?
> 
> To share that instance across namespaces. You can e.g. create the
> mount instance in one mount namespace owned by userns1, create a second
> user namespace usern2 with the same mapping which is blocked from
> creating additional user namespaces either by seccomp or by
> /proc/sys/user/max_user_namespaces or lsms what have you. Because it
> doesn't own the mount namespace the loopfs mount it is in it can't
> remount it and can't exceed the local limit.
> 
> > 
> > > I don't think we'd be getting much out of a global limit per se I think
> > > the initial namespace being able to reserve a bunch of devices
> > > they can always rely on being able create when they need them is more
> > > interesting. This is similat to what devpts implements with the
> > > "reserved" mount option and what I initially proposed for binderfs. For
> > > the latter it was deemed unnecessary by others so I dropped it from
> > > loopfs too.
> > 
> > The `reserve` of devpts has a fixed 2-tier system: A global limit, and
> > a init-ns reserve. This does nothing to protect one container from
> > another.
> 
> What I was getting at is that what matters first and foremost is
> protecting init userns.
> 
> > 
> > Furthermore, how do you intend to limit user-space from creating an
> > unbound amount of loop devices? Unless I am mistaken, with your
> > proposal *any* process can create a new loopfs with a basically
> > unlimited amount of loop-devices, thus easily triggering unbound
> > kernel allocations. I think this needs to be accounted. The classic
> > way is to put a per-uid limit into `struct user_struct` (done by
> > pipes, mlock, epoll, mq, etc.). An alternative is `struct ucount`,
> > which allows hierarchical management (inotify uses that, as an
> > example).
> 
> Yeah, I know. We can certainly do this.
> 
> > 
> > > I also expect most users to pre-create devices in the initial namespace
> > > instance they need (e.g. similar to what binderfs does or what loop
> > > devices currently have). Does that make sense to you?
> > 
> > Our use-case is to get programmatic access to loop-devices, so we can
> > build customer images on request (especially to create XFS images,
> > since mkfs.xfs cannot write them, IIRC). We would be perfectly happy
> > with a kernel-interface that takes a file-descriptor to a regular file
> > and returns us a file-descriptor to a newly created block device
> > (which is automatically destroyed when the last file-descriptor to it
> > is closed). This would be ideal *to us*, since it would do automatic
> > cleanup on crashes.
> > 
> > We don't need any representation of the loop-device in the
> > file-system, as long as we can somehow mount it (either by passing the
> > bdev-FD to the new mount-api, or by using /proc/self/fd/ as
> > mount-source).
> 
> We want the ability to have a filesystem representation as it will allow
> us to handle a host of legacy workloads cleanly e.g. that users can just
> call mount -o loop /bla whenever they have opted into syscall
> interception for a particular filesystem. In addition, we can cover your
> use case completely was well I think. Both with the old and new mount api.
> 
> > 
> > With your proposed loop-fs we could achieve something close to it:
> > Mount a private loopfs, create a loop-device, and rely on automatic
> > cleanup when the mount-namespace is destroyed.
> 
> With loopfs you can do this with the old or new mount api and you don't
> need to have loopfs mounted for that at all. Here's a sample program
> that works right now with the old mount api:

That also led me to discover a bug I need to fix, so thanks!
Christian

  reply	other threads:[~2020-04-12 13:05 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-08 15:21 [PATCH 0/8] loopfs Christian Brauner
2020-04-08 15:21 ` [PATCH 1/8] kobject_uevent: remove unneeded netlink_ns check Christian Brauner
2020-04-08 15:21 ` [PATCH 2/8] loopfs: implement loopfs Christian Brauner
2020-04-09  5:39   ` David Rheinsberg
2020-04-09  8:26     ` Christian Brauner
2020-04-12 10:38       ` David Rheinsberg
2020-04-12 12:03         ` Christian Brauner
2020-04-12 13:04           ` Christian Brauner [this message]
2020-04-12 13:44           ` David Rheinsberg
2020-04-09  7:53   ` Christoph Hellwig
2020-04-09  8:33     ` Christian Brauner
2020-04-08 15:21 ` [PATCH 3/8] loop: use ns_capable for some loop operations Christian Brauner
2020-04-08 15:21 ` [PATCH 4/8] kernfs: handle multiple namespace tags Christian Brauner
2020-04-13 18:46   ` Tejun Heo
2020-04-08 15:21 ` [PATCH 5/8] kernfs: let objects opt-in to propagating from the initial namespace Christian Brauner
2020-04-08 21:42   ` kbuild test robot
2020-04-08 22:13   ` kbuild test robot
2020-04-13 19:02   ` Tejun Heo
2020-04-13 19:39     ` Christian Brauner
2020-04-13 19:45       ` Tejun Heo
2020-04-13 19:59         ` Christian Brauner
2020-04-13 20:37           ` Tejun Heo
2020-04-14 10:39             ` Christian Brauner
2020-04-08 15:21 ` [PATCH 6/8] genhd: add minimal namespace infrastructure Christian Brauner
2020-04-13 19:04   ` Tejun Heo
2020-04-13 19:42     ` Christian Brauner
2020-04-08 15:21 ` [PATCH 7/8] loopfs: start attaching correct namespace during loop_add() Christian Brauner
2020-04-08 15:21 ` [PATCH 8/8] loopfs: only show devices in their correct instance Christian Brauner
2020-04-08 16:24 ` [PATCH 0/8] loopfs Jann Horn
2020-04-08 16:41   ` Stéphane Graber
2020-04-09  7:02     ` Dmitry Vyukov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200412130457.a7ma4z3lmry5tcmc@wittgenstein \
    --to=christian.brauner@ubuntu.com \
    --cc=axboe@kernel.dk \
    --cc=ckellner@redhat.com \
    --cc=corbet@lwn.net \
    --cc=davem@davemloft.net \
    --cc=david.rheinsberg@gmail.com \
    --cc=dhowells@redhat.com \
    --cc=dvyukov@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jack@suse.cz \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=rafael@kernel.org \
    --cc=saravanak@google.com \
    --cc=serge@hallyn.com \
    --cc=seth.forshee@canonical.com \
    --cc=stgraber@ubuntu.com \
    --cc=teg@jklm.no \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.