From: ebiederm@xmission.com (Eric W. Biederman)
To: Jürg Billeter
Cc: Andrew Morton, Oleg Nesterov, Linus Torvalds, Michael Kerrisk, Filipe Brandenburger, David Wilcox, hansecke@gmail.com, linux-kernel@vger.kernel.org
References: <20170909094008.49983-1-j@bitron.ch> <20170929123058.48924-1-j@bitron.ch> <20171002162041.a7cefe8af71327b8becd2347@linux-foundation.org> <87o9pogbf7.fsf@xmission.com> <1507013157.2304.48.camel@bitron.ch> <878tgse1c5.fsf@xmission.com> <1507050019.19102.51.camel@bitron.ch>
Date: Tue, 03 Oct 2017 12:40:43 -0500
In-Reply-To: <1507050019.19102.51.camel@bitron.ch> ("Jürg Billeter"'s message of "Tue, 03 Oct 2017 19:00:19 +0200")
Message-ID: <8760bwb04k.fsf@xmission.com>
Subject: Re: [RESEND PATCH] prctl: add PR_[GS]ET_PDEATHSIG_PROC

Jürg Billeter writes:

> On Tue, 2017-10-03 at 09:46 -0500, Eric W. Biederman wrote:
>> There is a general need to find out about the death of other processes,
>> if you are not the parent of the process. I would be inclined to call
>> it waitfd. Something that you give a pid. It performs a permission
>> check and the pid becomes readable when the process dies. With poll
>> working on the fd, and the fd returning wstatus of the dead child.
>>
>> Support SIGIO on the fd and you have a signal delivery mechanism,
>> if you want it.
>
> File descriptors for processes (waitfd/clonefd) are definitely
> interesting.
> Especially if reaping the process (and reparenting its
> children) is delayed until the last process file descriptor is closed.
> However, this would be a much larger addition and also less intuitive
> to use if all you want is killing the process tree.
>
>> For the kill all children when the parent dies the mechanism you are
>> proposing is escapable. We already have an inescapable version of it
>> with init in a pid namespace. We already have an escapable version of
>> it with orphaned process groups and SIGHUP.
>>
>> So I would really appreciate a very clear use case for what we are
>> building here. As it appears the killing of children can already be
>> done another way, and that the waiting for the parent can be done better
>> another way.
>
> My use case is to provide a way for a process to spawn a child and
> ensure that no descendants survive when that child dies. Avoiding
> runaway processes is desirable in many situations. My motivation is
> very lightweight (nested) sandboxing (every process is potentially
> sandboxed).
>
> I.e., pid namespaces would be a pretty good fit (assuming they are
> sufficiently lightweight) but CLONE_NEWPID requires CAP_SYS_ADMIN.
> User namespaces can help here, but creating tons of user namespaces
> just for this doesn't sound sensible. MAX_PID_NS_LEVEL could be an
> issue as well at some point but 32 levels are likely fine in practice.
>
> For my particular scenario I may actually be able to create a single
> user namespace, run all processes with (namespaced) CAP_SYS_ADMIN and
> use CLONE_NEWPID for every process. However, I would prefer not
> requiring CAP_SYS_ADMIN and a regular application that wants to avoid
> runaway processes for a spawned helper process cannot rely on
> CAP_SYS_ADMIN.
>
> My plan was to use PR_SET_PDEATHSIG_PROC with PR_NO_NEW_PRIVS and a
> suitable seccomp filter to prevent changes to pdeath_signal_proc.
> For my SIGKILL use case it would be even better to simply require
> PR_NO_NEW_PRIVS and make pdeath_signal_proc sticky, avoiding the need
> for seccomp. I wanted to keep the differences to the existing
> PR_SET_PDEATHSIG minimal but if we argue that the non-SIGKILL use case
> is better solved with waitfd (or maybe the process events connector),
> we could tailor the prctl for the SIGKILL use case (or support both via
> prctl arg3).
>
> I have another small patch locally that adds a prctl that restricts
> kill(2) to direct children of the current thread group for lightweight
> sandboxing. That would also be redundant if it was possible to use
> CLONE_NEWPID for every process.

I believe the current default limits allow using CLONE_NEWPID for every
process. The data structures seem light enough as well.

> What's actually the reason that CLONE_NEWPID requires CAP_SYS_ADMIN?
> Does CLONE_NEWPID pose any risks that don't exist for
> CLONE_NEWUSER|CLONE_NEWPID? Assuming we can't simply drop the
> CAP_SYS_ADMIN requirement, do you see a better solution for this use
> case?

CLONE_NEWPID without a permission check would allow running a setuid
root application in a pid namespace. Off the top of my head I can't
think of a really good exploit. But when you mess up pid files and hide
information from a privileged application, I can completely imagine
forcing that application to misbehave in ways the attacker can control,
leading to bad things.

Eric