All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Serge E. Hallyn" <serge@hallyn.com>
To: "Serge E. Hallyn" <serge@hallyn.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
	Serge Hallyn <serge.hallyn@ubuntu.com>,
	Aristeu Rozanski <aris@redhat.com>,
	linux-kernel@vger.kernel.org, morgan@kernel.org,
	cgroups@vger.kernel.org
Subject: Re: [PATCH 0/4] Rebase device_cgroup v2 patchset
Date: Wed, 15 May 2013 20:23:10 -0500	[thread overview]
Message-ID: <20130516012310.GA17819@austin.hallyn.com> (raw)
In-Reply-To: <20130516011401.GA17462@austin.hallyn.com>

Quoting Serge E. Hallyn (serge@hallyn.com):
> Quoting Eric W. Biederman (ebiederm@xmission.com):
> > Serge Hallyn <serge.hallyn@ubuntu.com> writes:
> > 
> > > Quoting Aristeu Rozanski (aris@redhat.com):
> > >> On Tue, May 14, 2013 at 10:05:39AM -0500, Serge Hallyn wrote:
> > >> > so now that the device cgroup properly respects hierarchy, not allowing
> > >> > a cgroup to be given greater permission than its parent, should we consider
> > >> > relaxing the capability checks?
> > >> > 
> > >> > There are two capable(CAP_SYS_ADMIN) checks in deice_cgroup.c: one in
> > >> > devcgroup_can_attach() to protect changing another task's cgroup, and
> > >> > one in devcgroup_update_access() to protect writes to the devices.allow
> > >> > and devices.deny files.
> > >> > 
> > >> > I think the first should be changed to a check for ns_capable() to
> > >> > the victim's user_ns.  Something like 
> > >> > 
> > >> > --- a/security/device_cgroup.c
> > >> > +++ b/security/device_cgroup.c
> > >> > @@ -70,10 +70,16 @@ static int devcgroup_can_attach(struct cgroup *new_cgrp,
> > >> >                                 struct cgroup_taskset *set)
> > >> >  {
> > >> >         struct task_struct *task = cgroup_taskset_first(set);
> > >> > +       struct user_namespace *ns;
> > >> > +       int ret = -EPERM;
> > >> > 
> > >> > -       if (current != task && !capable(CAP_SYS_ADMIN))
> > >> > -               return -EPERM;
> > >> > -       return 0;
> > >> > +       if (current == task)
> > >> > +               return 0;
> > >> > +
> > >> > +       ns = userns_get(task);;
> > >> > +       ret = ns_capable(ns, CAP_SYS_ADMIN) ? 0 : -EPERM;
> > >> > +       put_user_ns(ns);
> > >> > +       return ret;
> > >> >  }
> > >> 
> > >> wouldn't this allow a userns root to move a task in the same userns into
> > >> a parent cgroup? I believe than anything but moving down the hierarchy
> > >> would be very complicated to verify (how far up can you go).
> > >
> > > But only if they are able to open the tasks file for writing, which
> > > they shouldn't be able to do, right?
> > 
> > That should be looked at very closely.  There are some funny exploits of
> > setuid root applications writing to files that have required some
> > additional permission checks on /proc/<pid>/uid_map.  I think the
> > cgroups files may be vulnerable to some of the same kind of exploits.
> > 
> > Certainly we should be verifying that the opener of the file had the
> > capabilities we are trying to use to avoid being open to those kinds of
> > problems.
> > 
> > I am trying to see the utilitity of the proposed patch.  It doesn't
> > allow mknod.  So what is the benefit of having the user namespace bits?
> 
> I'm still thinking through it, which is why I haven't sent a real
> patch.  What I'm working on is the unprivileged startup of a container.
> Right now most things are not allowed in a private user ns, so device
> cgroup is not as useful.  But it should be possible eventually to use
> block devices, which the original unprivileged user owned, by chowning
> the blockdev to a user mapped into the target userns.
> 
> The unprivileged user may want to use devices cgroup so he can chown
> the loop file into the container, but only allow read-only mounts, for
> instance.
> 
> > Is the point to allow the userns root to remove access to selected
> > devices from it's children even if the DAC permissions would allow the
> > access?
> 
> Yes I think that's it - except userns root before forking the container
> init (and venturing into the really untrusted category).
> 
> ...
> 
> > That said I haven't looked at open or mknod, and usually we are talking
> > about calls that aren't made by suid apps so I think there is a fair
> > chance that dropping some of those permissions could cause issues.
> > The first danger that crosses my mind is what happens if you remove
> > access to /dev/tty from a normal application that would trying and log
> > strange goings on to a user if they could.
> 
> If they were going to do that over tty, that would be to the malicious
> user anyway, so that should just either be ignored, or result in the
> program exiting early.
> 
> > Shrug mostly I don't see the advantage of this change.
> 
> It's also possible that this will end up being worked around by the new
> (not-yet-designed) interface/library which Tejun wants people to use,
> sitting above the cgroupfs.  At least at a first layer.
> 
> Anyway this isn't urgent, as it's not in the way for general unprivileged
> container creation.  But in general if we don't need the check to be
> capable(), it would be better to introduce the right check.
> 
> -serge

I'm terribly sorry, Andrew, I have no idea how that address for you got
into my address book.  (Corrected)  fwiw the thread can be followed at
https://lkml.org/lkml/2013/5/14/363 .

-serge

WARNING: multiple messages have this Message-ID (diff)
From: "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
To: "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
Cc: "Eric W. Biederman"
	<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>,
	Serge Hallyn
	<serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>,
	Aristeu Rozanski <aris-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	morgan-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH 0/4] Rebase device_cgroup v2 patchset
Date: Wed, 15 May 2013 20:23:10 -0500	[thread overview]
Message-ID: <20130516012310.GA17819@austin.hallyn.com> (raw)
In-Reply-To: <20130516011401.GA17462-anj0Drq5vpzx6HRWoRZK3AC/G2K4zDHf@public.gmane.org>

Quoting Serge E. Hallyn (serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org):
> Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> > Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> writes:
> > 
> > > Quoting Aristeu Rozanski (aris-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org):
> > >> On Tue, May 14, 2013 at 10:05:39AM -0500, Serge Hallyn wrote:
> > >> > so now that the device cgroup properly respects hierarchy, not allowing
> > >> > a cgroup to be given greater permission than its parent, should we consider
> > >> > relaxing the capability checks?
> > >> > 
> > >> > There are two capable(CAP_SYS_ADMIN) checks in deice_cgroup.c: one in
> > >> > devcgroup_can_attach() to protect changing another task's cgroup, and
> > >> > one in devcgroup_update_access() to protect writes to the devices.allow
> > >> > and devices.deny files.
> > >> > 
> > >> > I think the first should be changed to a check for ns_capable() to
> > >> > the victim's user_ns.  Something like 
> > >> > 
> > >> > --- a/security/device_cgroup.c
> > >> > +++ b/security/device_cgroup.c
> > >> > @@ -70,10 +70,16 @@ static int devcgroup_can_attach(struct cgroup *new_cgrp,
> > >> >                                 struct cgroup_taskset *set)
> > >> >  {
> > >> >         struct task_struct *task = cgroup_taskset_first(set);
> > >> > +       struct user_namespace *ns;
> > >> > +       int ret = -EPERM;
> > >> > 
> > >> > -       if (current != task && !capable(CAP_SYS_ADMIN))
> > >> > -               return -EPERM;
> > >> > -       return 0;
> > >> > +       if (current == task)
> > >> > +               return 0;
> > >> > +
> > >> > +       ns = userns_get(task);;
> > >> > +       ret = ns_capable(ns, CAP_SYS_ADMIN) ? 0 : -EPERM;
> > >> > +       put_user_ns(ns);
> > >> > +       return ret;
> > >> >  }
> > >> 
> > >> wouldn't this allow a userns root to move a task in the same userns into
> > >> a parent cgroup? I believe than anything but moving down the hierarchy
> > >> would be very complicated to verify (how far up can you go).
> > >
> > > But only if they are able to open the tasks file for writing, which
> > > they shouldn't be able to do, right?
> > 
> > That should be looked at very closely.  There are some funny exploits of
> > setuid root applications writing to files that have required some
> > additional permission checks on /proc/<pid>/uid_map.  I think the
> > cgroups files may be vulnerable to some of the same kind of exploits.
> > 
> > Certainly we should be verifying that the opener of the file had the
> > capabilities we are trying to use to avoid being open to those kinds of
> > problems.
> > 
> > I am trying to see the utilitity of the proposed patch.  It doesn't
> > allow mknod.  So what is the benefit of having the user namespace bits?
> 
> I'm still thinking through it, which is why I haven't sent a real
> patch.  What I'm working on is the unprivileged startup of a container.
> Right now most things are not allowed in a private user ns, so device
> cgroup is not as useful.  But it should be possible eventually to use
> block devices, which the original unprivileged user owned, by chowning
> the blockdev to a user mapped into the target userns.
> 
> The unprivileged user may want to use devices cgroup so he can chown
> the loop file into the container, but only allow read-only mounts, for
> instance.
> 
> > Is the point to allow the userns root to remove access to selected
> > devices from it's children even if the DAC permissions would allow the
> > access?
> 
> Yes I think that's it - except userns root before forking the container
> init (and venturing into the really untrusted category).
> 
> ...
> 
> > That said I haven't looked at open or mknod, and usually we are talking
> > about calls that aren't made by suid apps so I think there is a fair
> > chance that dropping some of those permissions could cause issues.
> > The first danger that crosses my mind is what happens if you remove
> > access to /dev/tty from a normal application that would trying and log
> > strange goings on to a user if they could.
> 
> If they were going to do that over tty, that would be to the malicious
> user anyway, so that should just either be ignored, or result in the
> program exiting early.
> 
> > Shrug mostly I don't see the advantage of this change.
> 
> It's also possible that this will end up being worked around by the new
> (not-yet-designed) interface/library which Tejun wants people to use,
> sitting above the cgroupfs.  At least at a first layer.
> 
> Anyway this isn't urgent, as it's not in the way for general unprivileged
> container creation.  But in general if we don't need the check to be
> capable(), it would be better to introduce the right check.
> 
> -serge

I'm terribly sorry, Andrew, I have no idea how that address for you got
into my address book.  (Corrected)  fwiw the thread can be followed at
https://lkml.org/lkml/2013/5/14/363 .

-serge

  reply	other threads:[~2013-05-16  1:22 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-22 13:45 [PATCH 0/4] Rebase device_cgroup v2 patchset Aristeu Rozanski
2012-10-22 13:45 ` [PATCH 1/4] cgroup: fix invalid rcu dereference Aristeu Rozanski
2012-10-22 16:11   ` Serge Hallyn
2012-10-22 16:11     ` Serge Hallyn
2012-10-23 12:50   ` Jiri Slaby
2012-10-23 12:50     ` Jiri Slaby
2012-10-23 13:17     ` Aristeu Rozanski
2012-10-23 13:17       ` Aristeu Rozanski
2012-10-23 13:53       ` Jiri Slaby
2012-10-23 13:53         ` Jiri Slaby
2012-10-22 13:45 ` [PATCH 2/4] device_cgroup: rename deny_all to behavior Aristeu Rozanski
2012-10-22 16:12   ` Serge Hallyn
2012-10-22 13:45 ` [PATCH 3/4] device_cgroup: stop using simple_strtoul() Aristeu Rozanski
2012-10-22 16:14   ` Serge Hallyn
2012-10-22 16:14     ` Serge Hallyn
2012-10-22 13:45 ` [PATCH 4/4] device_cgroup: add proper checking when changing default behavior Aristeu Rozanski
2012-10-22 16:16   ` Serge Hallyn
2012-10-22 16:16     ` Serge Hallyn
2012-10-22 19:58 ` [PATCH 0/4] Rebase device_cgroup v2 patchset Andrew Morton
2012-10-22 19:58   ` Andrew Morton
2012-10-22 20:14   ` Aristeu Rozanski
2012-10-22 20:14     ` Aristeu Rozanski
2013-05-14 15:05 ` Serge Hallyn
2013-05-14 15:51   ` Aristeu Rozanski
2013-05-14 15:51     ` Aristeu Rozanski
2013-05-14 16:22     ` Serge Hallyn
2013-05-14 16:22       ` Serge Hallyn
2013-05-14 21:02       ` Eric W. Biederman
2013-05-14 21:02         ` Eric W. Biederman
2013-05-16  1:14         ` Serge E. Hallyn
2013-05-16  1:14           ` Serge E. Hallyn
2013-05-16  1:23           ` Serge E. Hallyn [this message]
2013-05-16  1:23             ` Serge E. Hallyn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130516012310.GA17819@austin.hallyn.com \
    --to=serge@hallyn.com \
    --cc=aris@redhat.com \
    --cc=cgroups@vger.kernel.org \
    --cc=ebiederm@xmission.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=morgan@kernel.org \
    --cc=serge.hallyn@ubuntu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.