From mboxrd@z Thu Jan 1 00:00:00 1970
From: ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman)
Subject: Re: [PATCH] devpts: Add ptmx_uid and ptmx_gid options
Date: Thu, 28 May 2015 12:14:19 -0500
Message-ID: <87mw0omxp0.fsf@x220.int.ebiederm.org>
References: <1427447013.2250.9.camel@HansenPartnership.com>
	<1427788642.4411.12.camel@redhat.com>
	<1427807248.2117.117.camel@HansenPartnership.com>
	<1427808184.2117.122.camel@HansenPartnership.com>
	<1427810118.2117.126.camel@HansenPartnership.com>
	<1427810886.2117.129.camel@HansenPartnership.com>
	<1427811444.4411.20.camel@redhat.com>
	<1427969525.3559.120.camel@HansenPartnership.com>
	<1427984969.13651.11.camel@redhat.com>
	<87zj6qs7v8.fsf@x220.int.ebiederm.org>
	<87oal4odne.fsf@x220.int.ebiederm.org>
	<1432832511.21304.6.camel@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <1432832511.21304.6.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
	(Alexander Larsson's message of "Thu, 28 May 2015 19:01:51 +0200")
Sender: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Errors-To: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
To: Alexander Larsson
Cc: gnome-os-list-rDKQcyrBJuzYtjvyW6yDsg@public.gmane.org,
	Linux Containers,
	"linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org",
	Andy Lutomirski,
	mclasen-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	Linux FS Devel
List-Id: containers.vger.kernel.org

Alexander Larsson writes:

> On Thu, 2015-05-28 at 11:44 -0500, Eric W. Biederman wrote:
>> Andy Lutomirski writes:
>>
>> > On Thu, Apr 2, 2015 at 11:27 AM, Eric W. Biederman wrote:
>> > > Andy Lutomirski writes:
>> > >
>> > > > On Thu, Apr 2, 2015 at 7:29 AM, Alexander Larsson <
>> > > > alexl-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>> > > > > On Thu, 2015-04-02 at 07:06 -0700, Andy Lutomirski wrote:
>> > > > > > On Thu, Apr 2, 2015 at 3:12 AM, James Bottomley wrote:
>> > > > > > > On Tue, 2015-03-31 at 16:17 +0200, Alexander Larsson wrote:
>> > > > > > > > On tis, 2015-03-31 at 17:08 +0300, James Bottomley wrote:
>> > > > > > > > > On Tue, 2015-03-31 at 06:59 -0700, Andy Lutomirski wrote:
>> > > > > > > > > >
>> > > > > > > > > > I don't think that this is correct. That user can already
>> > > > > > > > > > create a nested userns and map themselves as 0 inside it.
>> > > > > > > > > > Then they can mount devpts.
>> > > > > > > > >
>> > > > > > > > > I don't mind if they create a container and control the
>> > > > > > > > > isolated ttys in that sub container in the VPS; that's fine.
>> > > > > > > > > I do mind if they get access to the ttys in the VPS.
>> > > > > > > > >
>> > > > > > > > > If you can convince me (and the rest of Linux) that the tty
>> > > > > > > > > subsystem should be mountable by an unprivileged user
>> > > > > > > > > generally, then what you propose is OK.
>> > > > > > > >
>> > > > > > > > That is controlled by the general rights to mount stuff. I.e.
>> > > > > > > > unless you have CAP_SYS_ADMIN in the VPS container you will not
>> > > > > > > > be able to mount devpts there. You can only do it in a
>> > > > > > > > subcontainer where you got permissions to mount via using user
>> > > > > > > > namespaces.
>> > > > > > >
>> > > > > > > OK let me try again. Fine, if you want to speak capabilities,
>> > > > > > > you've given a non-root user an unexpected capability (the
>> > > > > > > capability of creating a ptmx device). But you haven't used a
>> > > > > > > capability separation to do this, you've just hard coded it via a
>> > > > > > > mount parameter mechanism.
>> > > > > > >
>> > > > > > > If you want to do this thing, do it properly, so it's acceptable
>> > > > > > > to the whole of Linux, not a special corner case for one
>> > > > > > > particular type of container.
>> > > > > > >
>> > > > > > > Security breaches are created when people code in special, little
>> > > > > > > used, corner cases because they don't get as thoroughly tested
>> > > > > > > and inspected as generally applicable mechanisms.
>> > > > > > >
>> > > > > > > What you want is to be able to use the tty subsystem as a non
>> > > > > > > root user: fine, but set that up globally, don't hide it in
>> > > > > > > containers so a lot fewer people care.
>> > > > > >
>> > > > > > I tend to agree, and not just for the tty subsystem. This is an
>> > > > > > attack surface issue. With unprivileged user namespaces,
>> > > > > > unprivileged users can create mount namespaces (probably a good
>> > > > > > thing for bind mounts, etc), network namespaces (reasonably safe by
>> > > > > > themselves), network interfaces and iptables rules (scary), fresh
>> > > > > > instances/superblocks of some filesystems (scariness depends on the
>> > > > > > fs -- tmpfs is probably fine), and more.
>> > > > > >
>> > > > > > I think we should have real controls for this, and this is mostly
>> > > > > > Eric's domain. Eric? A silly issue that sometimes prevents devpts
>> > > > > > from being mountable isn't a real control, though.
>> > >
>> > > I thought the controls for limiting how much of the userspace API
>> > > an application could use were called seccomp and seccomp2.
>> > >
>> > > Do we need something like a PAM module so that we can set up these
>> > > controls during login?
>> > >
>> > > > > I'm honestly surprised that non-root is allowed to mount things in
>> > > > > general with user namespaces. This was long disabled use for
>> > > > > non-root in Fedora, but it is now enabled.
>> > > > >
>> > > > > For instance, using loopback mounted files you could probably attack
>> > > > > some of the less well tested filesystem implementations by feeding
>> > > > > them fuzzed data.
>> > > > >
>> > > >
>> > > > You actually can't do that right now. Filesystems have to opt in to
>> > > > being mounted in unprivileged user namespaces, and no filesystems with
>> > > > backing stores have opted in. devpts has, but it's buggy without this
>> > > > patch IMO.
>> > >
>> > > Arguably you should use two user namespaces. The first to do what you
>> > > want to as root the second to run as the uid you want to run as.
>> > >
>> > > > > Anyway, I don't see how this affects devpts though. If you're
>> > > > > running in a container (or uncontained), as a regular users with no
>> > > > > mount capabilities you can already mount a devpts filesystem if you
>> > > > > create a subbcontainer with user namespaces and map your uid to 0 in
>> > > > > the subcontainer. Then you get a new ptmx device that you can do
>> > > > > whatever you want with. The mount option would let you do the same,
>> > > > > except be your regular uid in the subcontainer.
>> > > > >
>> > > > > The only difference outside of the subcontainer is that if the outer
>> > > > > container has no uid 0 mapped, yet the user has CAP_SYSADMIN rights
>> > > > > in that container. Then he can mount devpts in the outer container
>> > > > > where he before could only mount it in an inner container.
>> > > > >
>> > > >
>> > > > Agreed. Also, devpts doesn't seem scary at all to me from a userns
>> > > > perspective. Regular users on normal systems can already use ptmx,
>> > > > and AFAICS basically all of the attack surface is already available
>> > > > through the normal /dev/ptmx node.
>> > >
>> > > My only real take is that there are a lot more places that you need to
>> > > tweak beyond devpts. So this patch seemed lacking and boring.
>> > >
>> > > Beyond that until I get the mount namespace sorted out things are
>> > > pretty much in a feature freeze because I can't multitask well enough
>> > > to do complicated patches and take feature patches.
>> > >
>> >
>> > Eric, do you think you have time now to take a look at this patch?
>>
>> I am much closer. Escaping bind mounts is still not yet fixed but I
>> have code that almost works.
>>
>> My gut feel still says that two user namespaces one where your 0 is
>> mapped to your uid and a second where your uid is identity mapped is
>> the preferrable configuration, and makes this patch unnecessary.
>
> I don't really understand this. My usecase is that I want a desktop app
> sandbox, it should run as the actual user that is running the graphical
> session mapped to its real uid. In this namespace i want a /dev/pts so
> that i can e.g. shell out to ssh and feed it a password on the tty
> prompt or similar. And i don't want to bind-mount in the host /dev/pts,
> because then the sandbox can read from the ttys of other apps.
>
> Where does the second namespace enter into this?

Step a.
Create a user namespace where uid 0 is mapped to your real uid,
and set up your sandbox (aka mount /dev/pts and everything else).

Step b. Create a nested user namespace where your uid is identity mapped
and run your desktop application.

You can even drop all caps in your namespace.

Or basically:

    unshare(CLONE_NEWUSER)
    map 0 to real_uid
    set things up.
    unshare(CLONE_NEWUSER)
    map real_uid to 0
    (Because I am assuming we are single threaded in the nested context)
    drop caps
    exec /path/to/my/sandboxed/application

Eric
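
The two-step recipe above can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper names (id_map_line, plan_maps, enter_userns, run_sandboxed) and the setup callback are made up for the example, and it assumes a Linux kernel that allows unprivileged user namespaces and provides /proc/self/setgroups. The sandbox setup itself (mounting /dev/pts and so on) is left to the caller.

```python
import ctypes
import os

CLONE_NEWNS = 0x00020000    # new mount namespace (value from <linux/sched.h>)
CLONE_NEWUSER = 0x10000000  # new user namespace (value from <linux/sched.h>)


def id_map_line(inside_id, outside_id, count=1):
    """Format one uid_map/gid_map line: 'ID-inside ID-in-parent length'."""
    return f"{inside_id} {outside_id} {count}\n"


def plan_maps(real_id):
    """Return the (outer, inner) map lines for the two-namespace scheme."""
    outer = id_map_line(0, real_id)  # step a: 0 inside is real_id in the parent
    inner = id_map_line(real_id, 0)  # step b: real_id inside is 0 in the outer ns
    return outer, inner


def _unshare(flags):
    # Call unshare(2) through libc; raise OSError with errno on failure.
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(flags) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def _write(path, text):
    with open(path, "w") as f:
        f.write(text)


def enter_userns(uid_line, gid_line, extra_flags=0):
    """unshare() into a new user namespace and install single-line ID maps."""
    # unshare(CLONE_NEWUSER) requires a single-threaded caller.
    _unshare(CLONE_NEWUSER | extra_flags)
    # An unprivileged writer has to deny setgroups before writing gid_map.
    _write("/proc/self/setgroups", "deny")
    _write("/proc/self/uid_map", uid_line)
    _write("/proc/self/gid_map", gid_line)


def run_sandboxed(argv, setup=lambda: None):
    """Hypothetical driver: outer ns as 0, setup, inner ns as real uid, exec."""
    outer_uid, inner_uid = plan_maps(os.getuid())
    outer_gid, inner_gid = plan_maps(os.getgid())
    enter_userns(outer_uid, outer_gid, CLONE_NEWNS)  # step a: we are uid 0 here
    setup()                                          # e.g. mount /dev/pts
    enter_userns(inner_uid, inner_gid)               # step b: back to real uid
    # No explicit capability drop here: exec'ing as a non-zero uid with no
    # file capabilities normally leaves the new program without capabilities.
    os.execvp(argv[0], argv)
```

Something like run_sandboxed(["/path/to/my/sandboxed/application"], setup=my_setup) would then follow the order of the pseudocode, with my_setup being whatever caller-supplied function performs the mounts. Note that the inner map line reads "real_uid 0 1" because the second column is the ID in the parent (outer) namespace, where the process is uid 0; relative to the initial namespace that is the identity mapping.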