From: "Serge E. Hallyn" <serge@hallyn.com>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>,
	Marian Marinov <mm@1h.com>, Andy Lutomirski <luto@amacapital.net>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	"Michael H. Warfield" <mhw@wittsend.com>,
	Arnd Bergmann <arnd@arndb.de>,
	LXC development mailing-list 
	<lxc-devel@lists.linuxcontainers.org>,
	Richard Weinberger <richard@nod.at>,
	LKML <linux-kernel@vger.kernel.org>,
	Serge Hallyn <serge.hallyn@canonical.com>,
	Jens Axboe <axboe@kernel.dk>
Subject: Re: [lxc-devel] [RFC PATCH 00/11] Add support for devtmpfs in user namespaces
Date: Mon, 26 May 2014 00:24:43 +0200
Message-ID: <20140525222443.GA18410@mail.hallyn.com>
In-Reply-To: <1401005530.2322.43.camel@dabdike.int.hansenpartnership.com>

Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
> On Sat, 2014-05-24 at 22:25 +0000, Serge Hallyn wrote:
> > Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
> > > On Fri, 2014-05-23 at 11:20 +0300, Marian Marinov wrote:
> > > > On 05/20/2014 05:19 PM, Serge Hallyn wrote:
> > > > > Quoting Andy Lutomirski (luto@amacapital.net):
> > > > >> On May 15, 2014 1:26 PM, "Serge E. Hallyn" <serge@hallyn.com> wrote:
> > > > >>> 
> > > > >>> Quoting Richard Weinberger (richard@nod.at):
> > > > >>>> Am 15.05.2014 21:50, schrieb Serge Hallyn:
> > > > >>>>> Quoting Richard Weinberger (richard.weinberger@gmail.com):
> > > > >>>>>> On Thu, May 15, 2014 at 4:08 PM, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> > > > >>>>>>> Then don't use a container to build such a thing, or fix the build scripts to not do that :)
> > > > >>>>>> 
> > > > >>>>>> I second this. To me it looks like some folks try to (ab)use Linux containers for purposes where KVM
> > > > >>>>>> would much better fit in. Please don't put more complexity into containers. They are already horrible
> > > > >>>>>> complex and error prone.
> > > > >>>>> 
> > > > >>>>> I, naturally, disagree :)  The only use case which is inherently not valid for containers is running a
> > > > >>>>> kernel.  Practically speaking there are other things which likely will never be possible, but if someone 
> > > > >>>>> offers a way to do something in containers, "you can't do that in containers" is not an apropos response.
> > > > >>>>> 
> > > > >>>>> "That abstraction is wrong" is certainly valid, as when vpids were originally proposed and rejected,
> > > > >>>>> resulting in the development of pid namespaces.  "We have to work out (x) first" can be valid (and I can
> > > > >>>>> think of examples here), assuming it's not just trying to hide behind a catch-22/chicken-egg problem.
> > > > >>>>> 
> > > > >>>>> Finally, saying "containers are complex and error prone" is conflating several large suites of userspace
> > > > >>>>> code and many kernel features which support them.  Being more precise would, if the argument is valid, lend
> > > > >>>>> it a lot more weight.
> > > > >>>> 
> > > > >>>> We (my company) use Linux containers since 2011 in production. First LXC, now libvirt-lxc. To understand the
> > > > >>>> internals better I also wrote my own userspace to create/start containers. There are so many things which can
> > > > >>>> hurt you badly. With user namespaces we expose a really big attack surface to regular users. I.e. Suddenly a
> > > > >>>> user is allowed to mount filesystems.
> > > > >>> 
> > > > >>> That is currently not the case.  They can mount some virtual filesystems and do bind mounts, but cannot mount
> > > > >>> most real filesystems.  This keeps us protected (for now) from potentially unsafe superblock readers in the 
> > > > >>> kernel.
> > > > >>> 
> > > > >>>> Ask Andy, he found already lots of nasty things...
> > > > >> 
> > > > >> I don't think I have anything brilliant to add to this discussion right now, except possibly:
> > > > >> 
> > > > >> ISTM that Linux distributions are, in general, vulnerable to all kinds of shenanigans that would happen if an
> > > > >> untrusted user can cause a block device to appear.  That user doesn't need permission to mount it
> > > > > 
> > > > > Interesting point.  This would further suggest that we absolutely must ensure that a loop device which shows up in
> > > > > the container does not also show up in the host.
> > > > 
> > > > Can I suggest the usage of the devices cgroup to achieve that?
> > > 
> > > Not really ... cgroups impose resource limits, it's namespaces that
> > > impose visibility separations.  In theory this can be done with the
> > > device namespace that's been proposed; however, a simpler way is simply
> > > to rm the device node in the host and mknod it in the guest.  I don't
> > > really see host visibility as a huge problem: in a shared OS
> > > virtualisation it's not really possible securely to separate the guest
> > > from the host (only vice versa).
> > > 
> > > But I really don't think we want to do it this way.  Giving a container
> > > the ability to do a mount is too dangerous.  What we want to do is
> > > intercept the mount in the host and perform it on behalf of the guest as
> > > host root in the guest's mount namespace.  If you do it that way, it
> > 
> > That doesn't help the problem of guests being able to provide bad input
> > for (basically fuzz) the in-kernel filesystem code.  So apparently I'm
> > suffering a failure of the imagination - what problem exactly does it solve?
> 
> Well, there's two types of fuzzing, one is on sys_mount, which this
> would help with because the host filters the mount including all
> parameters and may even redo the mount (from direct to bind etc).

Sorry - I'm not *trying* to be dense, but am still not seeing it.

Let's assume that we continue to be strict about what a container may
mount - let's say they can only mount using loopdev from blockdev images.
They have to own the file, as well as the mount target.  Whatever they
do with sys_mount, the only danger I see is the one where the filesystem
data is bad and causes a DoS or privilege escalation in some bad
fs-reading code in the kernel.

What else is there?  Are you thinking of the sys_mount flags?  I guess
the void *data?  (Though I see that as the same problem; we're just
not trusting the fs code to deal with badly formed data.)
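
To make the question concrete, here is a rough sketch of what a host-side
filter could check on a vectored mount.  Everything in it is hypothetical
(struct mount_request, vet_mount(), the allowed list and the flag mask are
made up for illustration; nothing like this exists in the patch set).  It
only looks at the sys_mount arguments: fstype, flags and the void *data.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical record of the arguments a container passed to mount(2). */
struct mount_request {
    const char *source;   /* e.g. a loop device backed by a guest-owned file */
    const char *target;   /* mount point owned by the guest */
    const char *fstype;
    unsigned long flags;  /* MS_* flags */
    const void *data;     /* fs-specific options, opaque to the VFS */
};

/* Assumed policy: filesystem types the host is willing to mount. */
static const char *const allowed_fstypes[] = { "ext4", "tmpfs" };

/* Assumed policy: flags the guest may choose; anything else is refused. */
#define ALLOWED_FLAGS \
    ((unsigned long)(MS_RDONLY | MS_NOSUID | MS_NODEV | MS_NOEXEC))

/* Returns 0 if the host would perform the mount, or a negative errno. */
static int vet_mount(const struct mount_request *req)
{
    size_t i;
    int type_ok = 0;

    assert(req != NULL);
    if (!req->source || !req->target || !req->fstype)
        return -EINVAL;

    for (i = 0; i < sizeof(allowed_fstypes) / sizeof(allowed_fstypes[0]); i++)
        if (strcmp(req->fstype, allowed_fstypes[i]) == 0)
            type_ok = 1;
    if (!type_ok)
        return -EPERM;

    if (req->flags & ~ALLOWED_FLAGS)
        return -EPERM;

    /* Simplest possible stance on void *data: refuse any option string. */
    if (req->data != NULL)
        return -EPERM;

    /* Note: nothing here inspects the contents of the image behind source. */
    return 0;
}
```

A filter like this can reject odd flags or option strings, but a request
that passes every check still hands a guest-supplied image to the
in-kernel superblock reader.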

> If you're thinking the system can be compromised by fuzzing within the
> filesystem, then yes, I agree, but it's the same vulnerability an
> unvirtualised host would have, so I don't necessarily see it as our
> problem.
> 
> The problem vectored mount solves is the one of not wanting root in the
> container to have unfettered access to sys_mount because it allows the
> host to vet all calls and execute the ones it likes in the context of
> real root (possibly after modifying the parameters).
> 
> James
> 
