linux-kernel.vger.kernel.org archive mirror
From: David Rheinsberg <david.rheinsberg@gmail.com>
To: Christian Brauner <christian.brauner@ubuntu.com>
Cc: "Jens Axboe" <axboe@kernel.dk>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	lkml <linux-kernel@vger.kernel.org>,
	linux-block@vger.kernel.org, linux-api@vger.kernel.org,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Serge Hallyn" <serge@hallyn.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	"Tejun Heo" <tj@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Saravana Kannan" <saravanak@google.com>,
	"Jan Kara" <jack@suse.cz>, "David Howells" <dhowells@redhat.com>,
	"Seth Forshee" <seth.forshee@canonical.com>,
	"Tom Gundersen" <teg@jklm.no>,
	"Christian Kellner" <ckellner@redhat.com>,
	"Dmitry Vyukov" <dvyukov@google.com>,
	"Stéphane Graber" <stgraber@ubuntu.com>,
	linux-doc@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH 2/8] loopfs: implement loopfs
Date: Sun, 12 Apr 2020 12:38:54 +0200
Message-ID: <CADyDSO54FV7OaVwWremmnNbTkvw6hQ-KTLJdEg3V5rfBi8n3Yw@mail.gmail.com>
In-Reply-To: <20200409082659.exequ3evhlv33csr@wittgenstein>

Hey

On Thu, Apr 9, 2020 at 10:27 AM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
> On Thu, Apr 09, 2020 at 07:39:18AM +0200, David Rheinsberg wrote:
> > With loopfs in place, any process can create its own user_ns, mount
> > their private loopfs and create as many loop-devices as they want.
> > Hence, this limit does not serve as an effective global
> > resource-control. Secondly, anyone with access to `loop-control` can
> > now create loop instances until this limit is hit, thus causing anyone
> > else to be unable to create more. This effectively prevents you from
> > sharing a loopfs between non-trusting parties. I am unsure where that
> > limit would actually be used?
>
> Restricting it globally indeed wasn't the intended use-case for it. This
> was more so that you can specify an instance limit, bind-mount that
> instance into several places and sufficiently locked down users cannot
> exceed the instance limit.

But then these users can each exhaust the limit individually. As such,
you cannot share this instance across users that have no
trust-relationship. Fine with me, but I still don't understand in
which scenario the limit would be useful. Anyone can create a user-ns,
create a new loopfs mount, and just happily create more loop-devices.
So what is so special that you want to restrict the devices on a
_single_ mount instance?

> I don't think we'd be getting much out of a global limit per se. I think
> the initial namespace being able to reserve a bunch of devices
> they can always rely on being able to create when they need them is more
> interesting. This is similar to what devpts implements with the
> "reserved" mount option and what I initially proposed for binderfs. For
> the latter it was deemed unnecessary by others so I dropped it from
> loopfs too.

The `reserve` of devpts has a fixed 2-tier system: A global limit, and
an init-ns reserve. This does nothing to protect one container from
another.

Furthermore, how do you intend to limit user-space from creating an
unbounded number of loop devices? Unless I am mistaken, with your
proposal *any* process can create a new loopfs with a basically
unlimited number of loop-devices, thus easily triggering unbounded
kernel allocations. I think this needs to be accounted for. The classic
way is to put a per-uid limit into `struct user_struct` (done by
pipes, mlock, epoll, mq, etc.). An alternative is `struct ucounts`,
which allows hierarchical management (inotify uses that, as an
example).
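
To illustrate what I mean by hierarchical accounting, here is a rough
user-space model. It is not kernel code and does not use the real
ucounts API; `struct counter`, `counter_charge()` and
`counter_uncharge()` are names made up for this sketch. Each device is
charged against the creating namespace and every ancestor, and the
charge is rolled back if any level is already at its limit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a hierarchical counter; not kernel code. */
struct counter {
    struct counter *parent; /* NULL for the root of the hierarchy */
    long count;             /* devices currently charged at this level */
    long max;               /* limit enforced at this level */
};

/*
 * Charge one device to c and to every ancestor of c.
 * If any level is already full, undo the partial charge and fail.
 */
static bool counter_charge(struct counter *c)
{
    struct counter *iter, *undo;

    for (iter = c; iter; iter = iter->parent) {
        if (iter->count >= iter->max)
            goto rollback;
        iter->count++;
    }
    return true;

rollback:
    /* Every level below iter was already incremented; release those. */
    for (undo = c; undo != iter; undo = undo->parent)
        undo->count--;
    return false;
}

/* Release one previously charged device from c and all its ancestors. */
static void counter_uncharge(struct counter *c)
{
    for (; c; c = c->parent)
        c->count--;
}
```

With a root limit of 3 and two children limited to 2 each, one child
can take at most 2 devices, and both together at most 3. A new child
cannot escape the limits of its ancestors, which is the property a
per-mount limit lacks.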

> I also expect most users to pre-create devices in the initial namespace
> instance they need (e.g. similar to what binderfs does or what loop
> devices currently have). Does that make sense to you?

Our use-case is to get programmatic access to loop-devices, so we can
build customer images on request (especially to create XFS images,
since mkfs.xfs cannot write them, IIRC). We would be perfectly happy
with a kernel-interface that takes a file-descriptor to a regular file
and returns us a file-descriptor to a newly created block device
(which is automatically destroyed when the last file-descriptor to it
is closed). This would be ideal *to us*, since it would do automatic
cleanup on crashes.

We don't need any representation of the loop-device in the
file-system, as long as we can somehow mount it (either by passing the
bdev-FD to the new mount-api, or by using /proc/self/fd/ as
mount-source).

With your proposed loop-fs we could achieve something close to it:
Mount a private loopfs, create a loop-device, and rely on automatic
cleanup when the mount-namespace is destroyed.

Thanks
David
