From: "Serge E. Hallyn" <serge@hallyn.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>,
Aristeu Rozanski <aris@ruivo.org>,
Neil Horman <nhorman@tuxdriver.com>,
"Serge E. Hallyn" <serue@us.ibm.com>,
containers@lists.linux-foundation.org,
linux-kernel@vger.kernel.org, Michal Hocko <mhocko@suse.cz>,
Thomas Graf <tgraf@suug.ch>, Paul Mackerras <paulus@samba.org>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
Arnaldo Carvalho de Melo <acme@ghostprotocols.net>,
Johannes Weiner <hannes@cmpxchg.org>, Tejun Heo <tj@kernel.org>,
cgroups@vger.kernel.org, Paul Turner <pjt@google.com>,
Ingo Molnar <mingo@redhat.com>
Subject: Re: Controlling devices and device namespaces
Date: Sat, 15 Sep 2012 22:05:20 +0000
Message-ID: <20120915220520.GA11364@mail.hallyn.com>
In-Reply-To: <87wqzv7i08.fsf_-_@xmission.com>
Quoting Eric W. Biederman (ebiederm@xmission.com):
> "Serge E. Hallyn" <serge@hallyn.com> writes:
>
> > Quoting Aristeu Rozanski (aris@ruivo.org):
> >> Tejun,
> >> On Thu, Sep 13, 2012 at 01:58:27PM -0700, Tejun Heo wrote:
> >> > memcg can be handled by memcg people and I can handle cgroup_freezer
> >> > and others with help from the authors. The problematic one is
> >> > blkio. If anyone is interested in working on blkio, please be my
> >> > guest. Vivek? Glauber?
> >>
> >> if Serge is not planning to do it already, I can take a look in device_cgroup.
> >
> > That's fine with me, thanks.
> >
> >> also, heard about the desire of having a device namespace instead with
> >> support for translation ("sda" -> "sdf"). If anyone sees an immediate use for
> >> this please let me know.
> >
> > Before going down this road, I'd like to discuss this with at least you,
> > me, and Eric Biederman (cc:d) as to how it relates to a device
> > namespace.
>
>
> The problem with devices.
>
> - An unrestricted mknod gives you access to effectively any device in
> the system.
>
> - During process migration, if the device number changes, code that
> uses stat on file descriptors can fail even on the same file descriptor.
>
> - Devices coming from preexisting filesystems that we mount
> as unprivileged users are as dangerous as mknod but show
> that the problem is not limited to mknod.
>
> - udev thinks mknod is a system call we can remove from the kernel.
Also,
- udevadm trigger --action=add
causes all the devices known on the host to be re-sent to
everyone (all namespaces), which floods everyone and causes the
host to reset some devices.
> ---
>
> The use cases seem comparatively simple to enumerate.
>
> - Giving unfiltered access to a device to someone who is not root.
>
> - Virtual devices that everyone uses and have no real privilege
> requirements: /dev/null /dev/tty /dev/zero etc.
>
> - Dynamically created devices /dev/loopN /dev/tun /dev/macvtapN,
> nbd, iscsi, /dev/ptsN, etc
and
- per-namespace uevent filtering.
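The per-namespace uevent filtering listed above could be done by a userspace relay roughly like the following. This is a minimal sketch, not an existing tool: it assumes each container has a hypothetical whitelist of exact (type, major, minor) tuples, parses events in the kernel uevent datagram layout (an "action@devpath" header followed by NUL-separated KEY=VALUE pairs, with MAJOR and MINOR present for devices that have nodes), and treats SUBSYSTEM=block as a block device and everything else as a character device. The names parse_uevent, device_id, route, COMMON_VIRTUAL and allow_lines are illustrative only; allow_lines merely renders entries in the "type major:minor access" form that the devices.allow file accepts.

```python
# Sketch: forward a kernel uevent only to containers whose (hypothetical)
# device whitelist covers that device. Whitelists are exact
# (type, major, minor) tuples; wildcards and access bits are not modelled.
from typing import Dict, Iterable, List, Optional, Set, Tuple

DeviceId = Tuple[str, int, int]  # ("c" or "b", major, minor)

# Standard numbers of the virtual devices that "everyone uses".
COMMON_VIRTUAL: Set[DeviceId] = {
    ("c", 1, 3),  # /dev/null
    ("c", 1, 5),  # /dev/zero
    ("c", 5, 0),  # /dev/tty
}


def parse_uevent(raw: bytes) -> Dict[str, str]:
    """Return the KEY=VALUE pairs of a kernel uevent datagram."""
    env: Dict[str, str] = {}
    for field in raw.split(b"\x00")[1:]:  # [0] is the action@devpath header
        key, sep, value = field.partition(b"=")
        if sep:  # skips the empty field after the trailing NUL
            env[key.decode()] = value.decode()
    return env


def device_id(env: Dict[str, str]) -> Optional[DeviceId]:
    """Return the device number an event refers to, or None if it has none."""
    if "MAJOR" not in env or "MINOR" not in env:
        return None  # e.g. a bus or driver event without a device node
    kind = "b" if env.get("SUBSYSTEM") == "block" else "c"
    return (kind, int(env["MAJOR"]), int(env["MINOR"]))


def route(raw: bytes, containers: Dict[str, Set[DeviceId]]) -> List[str]:
    """Sorted names of the containers that should receive this event."""
    dev = device_id(parse_uevent(raw))
    if dev is None:
        return []
    return sorted(name for name, allowed in containers.items() if dev in allowed)


def allow_lines(whitelist: Iterable[DeviceId], access: str = "rwm") -> List[str]:
    """Render entries in the "type major:minor access" form of devices.allow."""
    return [f"{kind} {major}:{minor} {access}"
            for kind, major, minor in sorted(whitelist)]
```

With this design an event that carries no device number is forwarded to nobody, so a re-triggered "add" for an unrelated host device no longer reaches every namespace; a real relay would still need a policy for such events.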
> ---
>
> There are a couple of solutions to these problems.
>
> - The classic solution of creating a /dev for a container
> before starting it.
>
> - The devpts filesystem. This works well for unprivileged access
> to ptys. Except for the /dev/ptmx silliness, I very much like how
> things are handled today with devpts.
>
> - Device control groups. I am not quite certain what to make
> of them. The only case I see where they are better than
> a prebuilt static /dev is if there is a hotplugged device
> that I want to push into my container.
>
> I think the only problem with device control groups and
> hierarchies is that removing a device from a whitelist
> does not recurse down the hierarchy.
That's going to be fixed soon thanks to Aristeu :)
> Can a process inside of a device control group create
> a child group that has access to a subset of its
> devices? The actual checks don't need to be hierarchical
> but the presence of device nodes should be.
If I understand your question right, yes.
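The semantics asked about here (a child group receiving a subset of its parent's devices, with removals propagating down) can be illustrated by a toy model. It is not the kernel's device_cgroup code: DevGroup and its methods are hypothetical names, entries are exact (type, major, minor) tuples with no wildcards or r/w/m access bits, and every group, including the root, starts empty rather than allowing everything. Granting requires the parent to hold the entry, removal recurses into descendants, and the access check itself stays flat, in line with the point that the checks need not be hierarchical.

```python
# Toy model of a hierarchical device whitelist (hypothetical, not the
# kernel's device_cgroup). Invariant: each group's whitelist is a subset
# of its parent's whitelist.
from typing import List, Optional, Set, Tuple

DeviceId = Tuple[str, int, int]  # ("c" or "b", major, minor)


class DevGroup:
    def __init__(self, name: str, parent: Optional["DevGroup"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: List["DevGroup"] = []
        self.allowed: Set[DeviceId] = set()  # every group starts empty here
        if parent is not None:
            parent.children.append(self)

    def allow(self, dev: DeviceId) -> None:
        # A child may only be granted devices its parent already has, so a
        # process can hand a subset of its own devices to a child group.
        if self.parent is not None and dev not in self.parent.allowed:
            raise PermissionError(f"{self.name}: parent does not allow {dev}")
        self.allowed.add(dev)

    def deny(self, dev: DeviceId) -> None:
        # Removing an entry recurses down the hierarchy, which keeps the
        # subset invariant true for every descendant.
        self.allowed.discard(dev)
        for child in self.children:
            child.deny(dev)

    def may_access(self, dev: DeviceId) -> bool:
        # The access check is flat: only this group's own list is consulted.
        return dev in self.allowed
```

Because the subset invariant is maintained at allow() and deny() time, the hot path never has to walk up the tree.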
> ---
>
> I see a couple of holes in the device control picture.
>
> - How do we handle hotplug events?
>
> I think we can do this by relaying events through userspace,
> updating the device control groups etc.
>
> - Unprivileged processes interacting with all of this.
> (possibly with privilege in their user namespace)
> What I don't know is how to create a couple of different
> subhierarchies, each for a different child process.
>
> - Dynamically created devices.
>
> My gut feel is that we should replicate the success of devpts
> and give each type of dynamically created device its own
> filesystem and mount point under /dev, and just bend
> the handful of userspace users into that model.
Phew. Maybe. Had not considered that. But seems daunting.
> - Sysfs
>
> My gut says for the container use case we should aim to
> simply not have dynamically created devices in sysfs
> and then we can simply not care.
>
> - Migration
>
> Either we need block device numbers that can migrate with us,
> (possibly a subset of the entire range à la devpts) or we need to send
> hotplug events to userspace right after a migration so userspace
> processes that care can invalidate their caches of stat data.
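The second migration option, notifying userspace so it can drop cached stat data, might look like this on the consumer side. A minimal sketch, assuming some hypothetical post-migration event ends up calling invalidate_all(); StatCache is an illustrative name. Comparing (st_dev, st_ino) is simply the common userspace way to decide whether two stat results refer to the same file, and it is exactly the test that breaks when st_dev changes underneath a cached entry.

```python
# Sketch: a userspace cache of fstat() results that is dropped when a
# (hypothetical) post-migration notification arrives, because st_dev and
# st_rdev may not be the same on the destination host.
import os
from typing import Dict


class StatCache:
    def __init__(self) -> None:
        self._cache: Dict[int, os.stat_result] = {}

    def stat(self, fd: int) -> os.stat_result:
        # Return the cached result, calling fstat() only on a miss.
        if fd not in self._cache:
            self._cache[fd] = os.fstat(fd)
        return self._cache[fd]

    def same_file(self, fd: int, other: os.stat_result) -> bool:
        # The usual identity test: same device number and same inode.
        st = self.stat(fd)
        return (st.st_dev, st.st_ino) == (other.st_dev, other.st_ino)

    def invalidate_all(self) -> None:
        # Hook for the post-migration hotplug notification.
        self._cache.clear()
```

Without the invalidation hook, a stale cached st_dev makes same_file() report that a descriptor no longer refers to the file it was opened on.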
>
> ---
>
> With the code in my userns development tree I can create a user
> namespace, create a new mount namespace, and then if I have
> access to any block devices mount filesystems, all without
> needing to have any special privileges. What I haven't
> figured out is what it would take to get the device
> control group into the middle of that.
I'm really not sure that's a question we want to ask. The
device control group, like the ns cgroup, was meant as a
temporary workaround to not having user and device namespaces.
If we can come up with a device cgroup model that works to
fill all the requirements we would have for a devices ns, then
great. But I don't want us to be constrained by that.
> It feels like it should be possible to get the checks straight
> and use the device control group hooks to control which devices
> are usable in a user namespace. Unfortunately, when I try to work
> it out, the independence of the user namespace and the device
> control group seems to make that impossible.
>
> Shrug. There is most definitely something missing from our
> model of how to handle devices well. I am hoping we can
> sprinkle some devpts-derived pixie dust at the problem,
> migrate userspace to some new interfaces, and have life
> be good.
>
> Eric
Me too!
I'm torn between suggesting that we have a session at UDS to
discuss this, and not wanting to so that we can focus on the
remaining questions with the user namespace.
thanks,
-serge