From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751548AbdJCRlE convert rfc822-to-8bit (ORCPT <rfc822;w@1wt.eu>);
        Tue, 3 Oct 2017 13:41:04 -0400
Received: from out02.mta.xmission.com ([166.70.13.232]:58495 "EHLO
        out02.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751104AbdJCRlC (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Tue, 3 Oct 2017 13:41:02 -0400
From: ebiederm@xmission.com (Eric W. Biederman)
To: =?utf-8?Q?J=C3=BCrg?= Billeter <j@bitron.ch>
Cc: Andrew Morton <akpm@linux-foundation.org>,
        Oleg Nesterov <oleg@redhat.com>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Michael Kerrisk <mtk.manpages@gmail.com>,
        Filipe Brandenburger <filbranden@google.com>,
        David Wilcox <davidvsthegiant@gmail.com>, hansecke@gmail.com,
        linux-kernel@vger.kernel.org
References: <20170909094008.49983-1-j@bitron.ch>
        <20170929123058.48924-1-j@bitron.ch>
        <20171002162041.a7cefe8af71327b8becd2347@linux-foundation.org>
        <87o9pogbf7.fsf@xmission.com> <1507013157.2304.48.camel@bitron.ch>
        <878tgse1c5.fsf@xmission.com> <1507050019.19102.51.camel@bitron.ch>
Date: Tue, 03 Oct 2017 12:40:43 -0500
In-Reply-To: <1507050019.19102.51.camel@bitron.ch> (=?utf-8?Q?=22J=C3=BCrg?=
 Billeter"'s message
        of "Tue, 03 Oct 2017 19:00:19 +0200")
Message-ID: <8760bwb04k.fsf@xmission.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8BIT
Subject: Re: [RESEND PATCH] prctl: add PR_[GS]ET_PDEATHSIG_PROC
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Jürg Billeter <j@bitron.ch> writes:

> On Tue, 2017-10-03 at 09:46 -0500, Eric W. Biederman wrote:
>> There is a general need to find out about the death of other processes,
>> if you are not the parent of the process.   I would be inclined to call
>> it waitfd.  Something that you give a pid.  It performs a permission
>> check and the pid becomes readable when the process dies.  With poll
>> working on the fd, and the fd returning wstatus of the dead child.
>> 
>> Support SIGIO on the fd and you have a signal delivery mechanism,
>> if you want it.
>
> File descriptors for processes (waitfd/clonefd) are definitely
> interesting.  Especially if reaping the process (and reparenting its
> children) is delayed until the last process file descriptor is closed. 
> However, this would be a much larger addition and also less intuitive
> to use if all you want is killing the process tree.
>
>> For the kill all children when the parent dies the mechanism you are
>> proposing is escapable.  We already have an inescapable version of it
>> with init in a pid namespace.  We already have an escapable version of
>> it with orphaned process groups and SIGHUP.
>> 
>> So I would really appreciate a very clear use case for what we are
>> building here.  As it appears the killing of children can already be
>> done another way, and that the waiting for the parent can be done better
>> another way.
>
> My use case is to provide a way for a process to spawn a child and
> ensure that no descendants survive when that child dies.  Avoiding
> runaway processes is desirable in many situations.  My motivation is
> very lightweight (nested) sandboxing (every process is potentially
> sandboxed).
>
> I.e., pid namespaces would be a pretty good fit (assuming they are
> sufficiently lightweight) but CLONE_NEWPID requires CAP_SYS_ADMIN. 
> User namespaces can help here, but creating tons of user namespaces
> just for this doesn't sound sensible. MAX_PID_NS_LEVEL could be an
> issue as well at some point but 32 levels are likely fine in practice.
>
> For my particular scenario I may actually be able to create a single
> user namespace, run all processes with (namespaced) CAP_SYS_ADMIN and
> use CLONE_NEWPID for every process.  However, I would prefer not
> requiring CAP_SYS_ADMIN and a regular application that wants to avoid
> runaway processes for a spawned helper process cannot rely on
> CAP_SYS_ADMIN.
>
> My plan was to use PR_SET_PDEATHSIG_PROC with PR_NO_NEW_PRIVS and a
> suitable seccomp filter to prevent changes to pdeath_signal_proc.  For
> my SIGKILL use case it would be even better to simply require
> PR_NO_NEW_PRIVS and make pdeath_signal_proc sticky, avoiding the need
> for seccomp.  I wanted to keep the differences to the existing
> PR_SET_PDEATHSIG minimal but if we argue that the non-SIGKILL use case
> is better solved with waitfd (or maybe the process events connector),
> we could tailor the prctl for the SIGKILL use case (or support both via
> prctl arg3).
>
> I have another small patch locally that adds a prctl that restricts
> kill(2) to direct children of the current thread group for lightweight
> sandboxing.  That would also be redundant if it was possible to use
> CLONE_NEWPID for every process.

I believe the current default limits allow using CLONE_NEWPID for every
process.  The data structures seem light enough as well.
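
For illustration only, here is a minimal sketch of the
CLONE_NEWUSER|CLONE_NEWPID route discussed above.  It assumes a kernel
that permits unprivileged user namespaces; the helper names
(clone_like_fork, descendants_die_with_init) are made up for this
example and are not part of any patch in this thread.  The child
becomes pid 1 of a new pid namespace, forks a "runaway" descendant and
exits; the kernel then kills everything left in the namespace, which
the parent observes as a hang-up on a pipe.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <linux/sched.h>   /* CLONE_NEWUSER, CLONE_NEWPID */
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork()-like clone with extra namespace flags.
 * Raw syscall with all pointer arguments NULL; flags-first argument
 * order as on x86-64 and arm64 (an assumption of this sketch). */
static pid_t clone_like_fork(unsigned long flags)
{
    return (pid_t)syscall(SYS_clone, flags | SIGCHLD, NULL, NULL, NULL, NULL);
}

/* Returns 1 if the descendant died together with the namespace init,
 * 0 if it apparently survived, -1 if the namespace could not be set up
 * (for example, unprivileged user namespaces are disabled). */
int descendants_die_with_init(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t init = clone_like_fork(CLONE_NEWUSER | CLONE_NEWPID);
    if (init < 0) {
        int saved = errno;
        close(fds[0]);
        close(fds[1]);
        errno = saved;
        return -1;
    }
    if (init == 0) {
        /* We are pid 1 of the new pid namespace. */
        close(fds[0]);
        pid_t worker = fork();
        if (worker == 0)
            for (;;)
                pause();            /* runaway descendant, holds fds[1] */
        _exit(worker < 0 ? 1 : 0);  /* init exits immediately */
    }

    close(fds[1]);
    int status = 0;
    while (waitpid(init, &status, 0) < 0 && errno == EINTR)
        ;
    /* The pipe hangs up only once every holder of the write end is gone. */
    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    int n = poll(&pfd, 1, 5000);
    close(fds[0]);

    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (n == 1 && (pfd.revents & POLLHUP)) ? 1 : 0;
}
```

Calling descendants_die_with_init() from an unprivileged process should
return 1 where user namespaces are available, and -1 with errno set
(typically EPERM) where they are not.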

> What's actually the reason that CLONE_NEWPID requires CAP_SYS_ADMIN? 
> Does CLONE_NEWPID pose any risks that don't exist for
> CLONE_NEWUSER|CLONE_NEWPID?  Assuming we can't simply drop the
> CAP_SYS_ADMIN requirement, do you see a better solution for this use
> case?

CLONE_NEWPID without a permission check would allow running a setuid root
application in a pid namespace.  Off the top of my head I can't think of
a really good exploit, but when you can mess up pid files and hide
information from a privileged application, I can completely imagine
forcing that application to misbehave in ways the attacker can control,
leading to bad things.
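
To make the pid file concern concrete, the sketch below shows only the
pid mismatch, not an exploit.  It again assumes unprivileged user
namespaces are available (so that CLONE_NEWPID is permitted at all),
and the helper names are invented for this example.  A process started
in a new pid namespace sees itself as pid 1, while its parent knows it
under a different pid, so any pid it writes to a pid file is
meaningless outside the namespace.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <linux/sched.h>   /* CLONE_NEWUSER, CLONE_NEWPID */
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork()-like clone with extra namespace flags.
 * Raw syscall with all pointer arguments NULL; flags-first argument
 * order as on x86-64 and arm64 (an assumption of this sketch). */
static pid_t clone_like_fork(unsigned long flags)
{
    return (pid_t)syscall(SYS_clone, flags | SIGCHLD, NULL, NULL, NULL, NULL);
}

/* On success returns 0 and stores the child's pid as seen by the
 * parent (*outer) and as seen by the child itself (*inner).
 * Returns -1 if the namespaces could not be created. */
int pid_inside_new_namespace(pid_t *outer, pid_t *inner)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t child = clone_like_fork(CLONE_NEWUSER | CLONE_NEWPID);
    if (child < 0) {
        int saved = errno;
        close(fds[0]);
        close(fds[1]);
        errno = saved;
        return -1;
    }
    if (child == 0) {
        /* Ask the kernel directly, so no cached value can interfere. */
        pid_t self = (pid_t)syscall(SYS_getpid);
        ssize_t w = write(fds[1], &self, sizeof self);
        _exit(w == (ssize_t)sizeof self ? 0 : 1);
    }

    close(fds[1]);
    pid_t seen = 0;
    ssize_t n;
    do {
        n = read(fds[0], &seen, sizeof seen);
    } while (n < 0 && errno == EINTR);
    close(fds[0]);
    while (waitpid(child, NULL, 0) < 0 && errno == EINTR)
        ;

    if (n != (ssize_t)sizeof seen)
        return -1;
    *outer = child;
    *inner = seen;
    return 0;
}
```

A setuid program started this way would see the same inner pid and
would have no view of processes outside its namespace, which is the
kind of hidden information referred to above.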

Eric