From mboxrd@z Thu Jan 1 00:00:00 1970 From: Christian Brauner Date: Thu, 16 May 2019 14:57:17 +0000 Subject: Re: [PATCH v1 1/2] pid: add pidfd_open() Message-Id: <20190516145716.ool2pzdqbfclnnqi@brauner.io> List-Id: References: <20190516135944.7205-1-christian@brauner.io> <20190516142659.GB22564@redhat.com> In-Reply-To: <20190516142659.GB22564@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Oleg Nesterov Cc: jannh@google.com, viro@zeniv.linux.org.uk, torvalds@linux-foundation.org, linux-kernel@vger.kernel.org, arnd@arndb.de, akpm@linux-foundation.org, cyphar@cyphar.com, dhowells@redhat.com, ebiederm@xmission.com, elena.reshetova@intel.com, keescook@chromium.org, luto@amacapital.net, luto@kernel.org, tglx@linutronix.de, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-ia64@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org, joel@joelfernandes.org, dancol@google.com, serge@hallyn.c On Thu, May 16, 2019 at 04:27:00PM +0200, Oleg Nesterov wrote: > On 05/16, Christian Brauner wrote: > > > > With the introduction of pidfds through CLONE_PIDFD it is possible to > > created pidfds at process creation time. > > Now I am wondering why do we need CLONE_PIDFD, you can just do > > pid = fork(); > pidfd_open(pid); CLONE_PIDFD eliminates the race at the source and let's us avoid two syscalls for the sake of one. That'll obviously matter even more when we enable CLONE_THREAD | CLONE_PIDFD. pidfd_open() is really just a necessity for anyone who does non-parent process management aka LMK or service managers. I also would like to reserve the ability at some point (e.g. with cloneX or sm) to be able to specify specific additional flags at process creation time that modify pidfd behavior. > > > +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags) > > +{ > > + int fd, ret; > > + struct pid *p; > > + struct task_struct *tsk; > > + > > + if (flags) > > + return -EINVAL; > > + > > + if (pid <= 0) > > + return -EINVAL; > > + > > + p = find_get_pid(pid); > > + if (!p) > > + return -ESRCH; > > + > > + ret = 0; > > + rcu_read_lock(); > > + /* > > + * If this returns non-NULL the pid was used as a thread-group > > + * leader. Note, we race with exec here: If it changes the > > + * thread-group leader we might return the old leader. > > + */ > > + tsk = pid_task(p, PIDTYPE_TGID); > > + if (!tsk) > > + ret = -ESRCH; > > + rcu_read_unlock(); > > + > > + fd = ret ?: pidfd_create(p); > > + put_pid(p); > > + return fd; > > +} > > Looks correct, feel free to add Reviewed-by: Oleg Nesterov > > But why do we need task_struct *tsk? > > rcu_read_lock(); > if (!pid_task(PIDTYPE_TGID)) > ret = -ESRCH; > rcu_read_unlock(); Sure, that's simpler. I'll rework and add your Reviewed-by. > > and in fact we do not even need rcu_read_lock(), we could do > > // shut up rcu_dereference_check() > rcu_lock_acquire(&rcu_lock_map); > if (!pid_task(PIDTYPE_TGID)) > ret = -ESRCH; > rcu_lock_release(&rcu_lock_map); > > Well... I won't insist, but the comment about the race with exec looks a bit > confusing to me. It is true, but we do not care at all, we are not going to > use the task_struct returned by pid_task(). Yeah, I can remove it. Thanks! Christian From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.1 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,USER_AGENT_NEOMUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 20744C04AAF for ; Thu, 16 May 2019 14:57:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E524820848 for ; Thu, 16 May 2019 14:57:22 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=brauner.io header.i=@brauner.io header.b="N/UXULef" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726856AbfEPO5W (ORCPT ); Thu, 16 May 2019 10:57:22 -0400 Received: from mail-ed1-f65.google.com ([209.85.208.65]:40432 "EHLO mail-ed1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726796AbfEPO5W (ORCPT ); Thu, 16 May 2019 10:57:22 -0400 Received: by mail-ed1-f65.google.com with SMTP id j12so5681972eds.7 for ; Thu, 16 May 2019 07:57:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=brauner.io; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=zu/DFoA8gurZzDl4gRPBRIxRpOpQplODrEJWOL6D0bE=; b=N/UXULef7deCmqVDFFuhdRP+EE/HVwi/f8Xw9poIWaOZnQrgvAqVWfNExWGd8BOsJf 33o1T0+h2MUW+EUGMeLcQHkrt56q01RBA4HuUCpP1HhL+qbKYrx8/hsh8yLGkBvmD49R 8+PbTDBAKauvYkDZWTDJ6HTdkLHobcXK0IWVXwUdxsfwEzukIgCacfL1mkKlnh4BufAE cmndL7Y+bK4GfjQMr6DNIgpxClV5ziCpFELiCjoMMwn4lPnJkr6DjWQm5xrMWvDy9MoN 9Tz8wKlpHlaRJ+TCJu0UoV0aK7s1NnsOhDY4lT/AMXF5dPM+Twab8dYUty/tvzJmCNb5 QTZw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=zu/DFoA8gurZzDl4gRPBRIxRpOpQplODrEJWOL6D0bE=; b=CHcxKCTT/mW+aC9FZfVBrtnrMYlL2ZtpPiIUFYVxHl6QL/P7IfnovvrPbsSDp4k4lJ xiBfOOIAIkJUXgq3idGzfAJo6tBYbYf2lswd25pYxO5dUVDdKw1CsvM5m3p9v2zyh2bO kG2fdQB5U2StiIHiw8tPbxBmZVAK8owT8lHmkwLckNlh0/oYtMxpt+PpVEjcPYm0osJU hLENWyPdN1RGsWIfX/NtY8/HiayYFQ9sKnuy2l6scd1e+rtBGKVl3jTJHAt15rgs38mE Z7faGO01GC+xdu/8g9bTHvkMq6TdfR5TRiuXgutiIlX8+mxqYrLP//BDx6Ca1hdUzS3h /bXg== X-Gm-Message-State: APjAAAU06RugGJYp6d+PC+taJhvUZRkjG+So48MC9ZP1L0dL2yd51YNo ZXOhSf4uOuOFrripXetyOqSNqg== X-Google-Smtp-Source: APXvYqwezzg+v8pMXQltznYwK70iO/BCmxK3JFRnM2IhYSWFpsND0L8cwVVVFYHGIM+owfjxdC4+8Q== X-Received: by 2002:aa7:c483:: with SMTP id m3mr21927311edq.161.1558018640584; Thu, 16 May 2019 07:57:20 -0700 (PDT) Received: from brauner.io ([193.96.224.243]) by smtp.gmail.com with ESMTPSA id u11sm1122263ejr.48.2019.05.16.07.57.18 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Thu, 16 May 2019 07:57:20 -0700 (PDT) Date: Thu, 16 May 2019 16:57:17 +0200 From: Christian Brauner To: Oleg Nesterov Cc: jannh@google.com, viro@zeniv.linux.org.uk, torvalds@linux-foundation.org, linux-kernel@vger.kernel.org, arnd@arndb.de, akpm@linux-foundation.org, cyphar@cyphar.com, dhowells@redhat.com, ebiederm@xmission.com, elena.reshetova@intel.com, keescook@chromium.org, luto@amacapital.net, luto@kernel.org, tglx@linutronix.de, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-ia64@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org, joel@joelfernandes.org, dancol@google.com, serge@hallyn.com, Geert Uytterhoeven Subject: Re: [PATCH v1 1/2] pid: add pidfd_open() Message-ID: <20190516145716.ool2pzdqbfclnnqi@brauner.io> References: <20190516135944.7205-1-christian@brauner.io> <20190516142659.GB22564@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20190516142659.GB22564@redhat.com> User-Agent: NeoMutt/20180716 Sender: linux-parisc-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-parisc@vger.kernel.org On Thu, May 16, 2019 at 04:27:00PM +0200, Oleg Nesterov wrote: > On 05/16, Christian Brauner wrote: > > > > With the introduction of pidfds through CLONE_PIDFD it is possible to > > created pidfds at process creation time. > > Now I am wondering why do we need CLONE_PIDFD, you can just do > > pid = fork(); > pidfd_open(pid); CLONE_PIDFD eliminates the race at the source and let's us avoid two syscalls for the sake of one. That'll obviously matter even more when we enable CLONE_THREAD | CLONE_PIDFD. pidfd_open() is really just a necessity for anyone who does non-parent process management aka LMK or service managers. I also would like to reserve the ability at some point (e.g. with cloneX or sm) to be able to specify specific additional flags at process creation time that modify pidfd behavior. > > > +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags) > > +{ > > + int fd, ret; > > + struct pid *p; > > + struct task_struct *tsk; > > + > > + if (flags) > > + return -EINVAL; > > + > > + if (pid <= 0) > > + return -EINVAL; > > + > > + p = find_get_pid(pid); > > + if (!p) > > + return -ESRCH; > > + > > + ret = 0; > > + rcu_read_lock(); > > + /* > > + * If this returns non-NULL the pid was used as a thread-group > > + * leader. Note, we race with exec here: If it changes the > > + * thread-group leader we might return the old leader. > > + */ > > + tsk = pid_task(p, PIDTYPE_TGID); > > + if (!tsk) > > + ret = -ESRCH; > > + rcu_read_unlock(); > > + > > + fd = ret ?: pidfd_create(p); > > + put_pid(p); > > + return fd; > > +} > > Looks correct, feel free to add Reviewed-by: Oleg Nesterov > > But why do we need task_struct *tsk? > > rcu_read_lock(); > if (!pid_task(PIDTYPE_TGID)) > ret = -ESRCH; > rcu_read_unlock(); Sure, that's simpler. I'll rework and add your Reviewed-by. > > and in fact we do not even need rcu_read_lock(), we could do > > // shut up rcu_dereference_check() > rcu_lock_acquire(&rcu_lock_map); > if (!pid_task(PIDTYPE_TGID)) > ret = -ESRCH; > rcu_lock_release(&rcu_lock_map); > > Well... I won't insist, but the comment about the race with exec looks a bit > confusing to me. It is true, but we do not care at all, we are not going to > use the task_struct returned by pid_task(). Yeah, I can remove it. Thanks! Christian From mboxrd@z Thu Jan 1 00:00:00 1970 From: Christian Brauner Subject: Re: [PATCH v1 1/2] pid: add pidfd_open() Date: Thu, 16 May 2019 16:57:17 +0200 Message-ID: <20190516145716.ool2pzdqbfclnnqi@brauner.io> References: <20190516135944.7205-1-christian@brauner.io> <20190516142659.GB22564@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Return-path: Content-Disposition: inline In-Reply-To: <20190516142659.GB22564@redhat.com> Sender: linux-kernel-owner@vger.kernel.org List-Archive: List-Post: To: Oleg Nesterov Cc: jannh@google.com, viro@zeniv.linux.org.uk, torvalds@linux-foundation.org, linux-kernel@vger.kernel.org, arnd@arndb.de, akpm@linux-foundation.org, cyphar@cyphar.com, dhowells@redhat.com, ebiederm@xmission.com, elena.reshetova@intel.com, keescook@chromium.org, luto@amacapital.net, luto@kernel.org, tglx@linutronix.de, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-ia64@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org, joel@joelfernandes.org, dancol@google.com, serge@hallyn.c List-ID: On Thu, May 16, 2019 at 04:27:00PM +0200, Oleg Nesterov wrote: > On 05/16, Christian Brauner wrote: > > > > With the introduction of pidfds through CLONE_PIDFD it is possible to > > created pidfds at process creation time. > > Now I am wondering why do we need CLONE_PIDFD, you can just do > > pid = fork(); > pidfd_open(pid); CLONE_PIDFD eliminates the race at the source and let's us avoid two syscalls for the sake of one. That'll obviously matter even more when we enable CLONE_THREAD | CLONE_PIDFD. pidfd_open() is really just a necessity for anyone who does non-parent process management aka LMK or service managers. I also would like to reserve the ability at some point (e.g. with cloneX or sm) to be able to specify specific additional flags at process creation time that modify pidfd behavior. > > > +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags) > > +{ > > + int fd, ret; > > + struct pid *p; > > + struct task_struct *tsk; > > + > > + if (flags) > > + return -EINVAL; > > + > > + if (pid <= 0) > > + return -EINVAL; > > + > > + p = find_get_pid(pid); > > + if (!p) > > + return -ESRCH; > > + > > + ret = 0; > > + rcu_read_lock(); > > + /* > > + * If this returns non-NULL the pid was used as a thread-group > > + * leader. Note, we race with exec here: If it changes the > > + * thread-group leader we might return the old leader. > > + */ > > + tsk = pid_task(p, PIDTYPE_TGID); > > + if (!tsk) > > + ret = -ESRCH; > > + rcu_read_unlock(); > > + > > + fd = ret ?: pidfd_create(p); > > + put_pid(p); > > + return fd; > > +} > > Looks correct, feel free to add Reviewed-by: Oleg Nesterov > > But why do we need task_struct *tsk? > > rcu_read_lock(); > if (!pid_task(PIDTYPE_TGID)) > ret = -ESRCH; > rcu_read_unlock(); Sure, that's simpler. I'll rework and add your Reviewed-by. > > and in fact we do not even need rcu_read_lock(), we could do > > // shut up rcu_dereference_check() > rcu_lock_acquire(&rcu_lock_map); > if (!pid_task(PIDTYPE_TGID)) > ret = -ESRCH; > rcu_lock_release(&rcu_lock_map); > > Well... I won't insist, but the comment about the race with exec looks a bit > confusing to me. It is true, but we do not care at all, we are not going to > use the task_struct returned by pid_task(). Yeah, I can remove it. Thanks! Christian From mboxrd@z Thu Jan 1 00:00:00 1970 From: christian at brauner.io (Christian Brauner) Date: Thu, 16 May 2019 16:57:17 +0200 Subject: [PATCH v1 1/2] pid: add pidfd_open() In-Reply-To: <20190516142659.GB22564@redhat.com> References: <20190516135944.7205-1-christian@brauner.io> <20190516142659.GB22564@redhat.com> Message-ID: <20190516145716.ool2pzdqbfclnnqi@brauner.io> On Thu, May 16, 2019 at 04:27:00PM +0200, Oleg Nesterov wrote: > On 05/16, Christian Brauner wrote: > > > > With the introduction of pidfds through CLONE_PIDFD it is possible to > > created pidfds at process creation time. > > Now I am wondering why do we need CLONE_PIDFD, you can just do > > pid = fork(); > pidfd_open(pid); CLONE_PIDFD eliminates the race at the source and let's us avoid two syscalls for the sake of one. That'll obviously matter even more when we enable CLONE_THREAD | CLONE_PIDFD. pidfd_open() is really just a necessity for anyone who does non-parent process management aka LMK or service managers. I also would like to reserve the ability at some point (e.g. with cloneX or sm) to be able to specify specific additional flags at process creation time that modify pidfd behavior. > > > +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags) > > +{ > > + int fd, ret; > > + struct pid *p; > > + struct task_struct *tsk; > > + > > + if (flags) > > + return -EINVAL; > > + > > + if (pid <= 0) > > + return -EINVAL; > > + > > + p = find_get_pid(pid); > > + if (!p) > > + return -ESRCH; > > + > > + ret = 0; > > + rcu_read_lock(); > > + /* > > + * If this returns non-NULL the pid was used as a thread-group > > + * leader. Note, we race with exec here: If it changes the > > + * thread-group leader we might return the old leader. > > + */ > > + tsk = pid_task(p, PIDTYPE_TGID); > > + if (!tsk) > > + ret = -ESRCH; > > + rcu_read_unlock(); > > + > > + fd = ret ?: pidfd_create(p); > > + put_pid(p); > > + return fd; > > +} > > Looks correct, feel free to add Reviewed-by: Oleg Nesterov > > But why do we need task_struct *tsk? > > rcu_read_lock(); > if (!pid_task(PIDTYPE_TGID)) > ret = -ESRCH; > rcu_read_unlock(); Sure, that's simpler. I'll rework and add your Reviewed-by. > > and in fact we do not even need rcu_read_lock(), we could do > > // shut up rcu_dereference_check() > rcu_lock_acquire(&rcu_lock_map); > if (!pid_task(PIDTYPE_TGID)) > ret = -ESRCH; > rcu_lock_release(&rcu_lock_map); > > Well... I won't insist, but the comment about the race with exec looks a bit > confusing to me. It is true, but we do not care at all, we are not going to > use the task_struct returned by pid_task(). Yeah, I can remove it. Thanks! Christian From mboxrd@z Thu Jan 1 00:00:00 1970 From: christian@brauner.io (Christian Brauner) Date: Thu, 16 May 2019 16:57:17 +0200 Subject: [PATCH v1 1/2] pid: add pidfd_open() In-Reply-To: <20190516142659.GB22564@redhat.com> References: <20190516135944.7205-1-christian@brauner.io> <20190516142659.GB22564@redhat.com> Message-ID: <20190516145716.ool2pzdqbfclnnqi@brauner.io> Content-Type: text/plain; charset="UTF-8" Message-ID: <20190516145717.FAz8YSZ70Qj2Gsqlny2qQf7XrKGZBzR8Esfmv5LIHNw@z> On Thu, May 16, 2019@04:27:00PM +0200, Oleg Nesterov wrote: > On 05/16, Christian Brauner wrote: > > > > With the introduction of pidfds through CLONE_PIDFD it is possible to > > created pidfds at process creation time. > > Now I am wondering why do we need CLONE_PIDFD, you can just do > > pid = fork(); > pidfd_open(pid); CLONE_PIDFD eliminates the race at the source and let's us avoid two syscalls for the sake of one. That'll obviously matter even more when we enable CLONE_THREAD | CLONE_PIDFD. pidfd_open() is really just a necessity for anyone who does non-parent process management aka LMK or service managers. I also would like to reserve the ability at some point (e.g. with cloneX or sm) to be able to specify specific additional flags at process creation time that modify pidfd behavior. > > > +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags) > > +{ > > + int fd, ret; > > + struct pid *p; > > + struct task_struct *tsk; > > + > > + if (flags) > > + return -EINVAL; > > + > > + if (pid <= 0) > > + return -EINVAL; > > + > > + p = find_get_pid(pid); > > + if (!p) > > + return -ESRCH; > > + > > + ret = 0; > > + rcu_read_lock(); > > + /* > > + * If this returns non-NULL the pid was used as a thread-group > > + * leader. Note, we race with exec here: If it changes the > > + * thread-group leader we might return the old leader. > > + */ > > + tsk = pid_task(p, PIDTYPE_TGID); > > + if (!tsk) > > + ret = -ESRCH; > > + rcu_read_unlock(); > > + > > + fd = ret ?: pidfd_create(p); > > + put_pid(p); > > + return fd; > > +} > > Looks correct, feel free to add Reviewed-by: Oleg Nesterov > > But why do we need task_struct *tsk? > > rcu_read_lock(); > if (!pid_task(PIDTYPE_TGID)) > ret = -ESRCH; > rcu_read_unlock(); Sure, that's simpler. I'll rework and add your Reviewed-by. > > and in fact we do not even need rcu_read_lock(), we could do > > // shut up rcu_dereference_check() > rcu_lock_acquire(&rcu_lock_map); > if (!pid_task(PIDTYPE_TGID)) > ret = -ESRCH; > rcu_lock_release(&rcu_lock_map); > > Well... I won't insist, but the comment about the race with exec looks a bit > confusing to me. It is true, but we do not care at all, we are not going to > use the task_struct returned by pid_task(). Yeah, I can remove it. Thanks! Christian From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.8 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_NEOMUTT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C0605C04AAF for ; Thu, 16 May 2019 15:04:25 +0000 (UTC) Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D3A212082E for ; Thu, 16 May 2019 15:04:24 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=brauner.io header.i=@brauner.io header.b="N/UXULef" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D3A212082E Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=brauner.io Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 454ZT21QTxzDqMt for ; Fri, 17 May 2019 01:04:22 +1000 (AEST) Authentication-Results: lists.ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=brauner.io (client-ip=2a00:1450:4864:20::544; helo=mail-ed1-x544.google.com; envelope-from=christian@brauner.io; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=brauner.io Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; secure) header.d=brauner.io header.i=@brauner.io header.b="N/UXULef"; dkim-atps=neutral Received: from mail-ed1-x544.google.com (mail-ed1-x544.google.com [IPv6:2a00:1450:4864:20::544]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 454ZK00cPFzDqdG for ; Fri, 17 May 2019 00:57:23 +1000 (AEST) Received: by mail-ed1-x544.google.com with SMTP id g57so5640622edc.12 for ; Thu, 16 May 2019 07:57:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=brauner.io; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=zu/DFoA8gurZzDl4gRPBRIxRpOpQplODrEJWOL6D0bE=; b=N/UXULef7deCmqVDFFuhdRP+EE/HVwi/f8Xw9poIWaOZnQrgvAqVWfNExWGd8BOsJf 33o1T0+h2MUW+EUGMeLcQHkrt56q01RBA4HuUCpP1HhL+qbKYrx8/hsh8yLGkBvmD49R 8+PbTDBAKauvYkDZWTDJ6HTdkLHobcXK0IWVXwUdxsfwEzukIgCacfL1mkKlnh4BufAE cmndL7Y+bK4GfjQMr6DNIgpxClV5ziCpFELiCjoMMwn4lPnJkr6DjWQm5xrMWvDy9MoN 9Tz8wKlpHlaRJ+TCJu0UoV0aK7s1NnsOhDY4lT/AMXF5dPM+Twab8dYUty/tvzJmCNb5 QTZw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=zu/DFoA8gurZzDl4gRPBRIxRpOpQplODrEJWOL6D0bE=; b=PBEdb4mEXOfWZjzHLUVvtuPUIFVH2ZBI/y8QPp6sfWodHDOIqFr8KXrOelIP6LW5qX 5MdZJpXkWwM2zrJT7J/GyI/KFiHCL/sesV8hXYvuGCdDRZfNdXe8BuO3TKsXLN0jE5wO /EPNYraZvKZr/mQPPkInOSkHOYe1y8lUtcwnMolow1UaveArjke+5ObYfy5gTn5e7F4m YpS+Qchgah9Z+VHWcgnuO9L938KtnC4agEzFYKHzYoVgrQzCoRgl2iXObm/aeS+wLCkd Rbq1og1shBVJkvQXIzPy7BdiYP8KuajcD5qMNdEcLnfcDWFFSCkLAd++9+oY7JPbZMt6 VINg== X-Gm-Message-State: APjAAAVrclslQlC2geJR9ZJ8IXlTJl8rMflgtP9xk/rhyAYsj6IDNlID zA1aJnuK15oW7rOFmN6hLbHKmQ== X-Google-Smtp-Source: APXvYqwezzg+v8pMXQltznYwK70iO/BCmxK3JFRnM2IhYSWFpsND0L8cwVVVFYHGIM+owfjxdC4+8Q== X-Received: by 2002:aa7:c483:: with SMTP id m3mr21927311edq.161.1558018640584; Thu, 16 May 2019 07:57:20 -0700 (PDT) Received: from brauner.io ([193.96.224.243]) by smtp.gmail.com with ESMTPSA id u11sm1122263ejr.48.2019.05.16.07.57.18 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Thu, 16 May 2019 07:57:20 -0700 (PDT) Date: Thu, 16 May 2019 16:57:17 +0200 From: Christian Brauner To: Oleg Nesterov Subject: Re: [PATCH v1 1/2] pid: add pidfd_open() Message-ID: <20190516145716.ool2pzdqbfclnnqi@brauner.io> References: <20190516135944.7205-1-christian@brauner.io> <20190516142659.GB22564@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20190516142659.GB22564@redhat.com> User-Agent: NeoMutt/20180716 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-ia64@vger.kernel.org, linux-sh@vger.kernel.org, linux-mips@vger.kernel.org, dhowells@redhat.com, joel@joelfernandes.org, linux-kselftest@vger.kernel.org, sparclinux@vger.kernel.org, linux-api@vger.kernel.org, elena.reshetova@intel.com, linux-arch@vger.kernel.org, linux-s390@vger.kernel.org, dancol@google.com, Geert Uytterhoeven , serge@hallyn.com, linux-xtensa@linux-xtensa.org, keescook@chromium.org, arnd@arndb.de, jannh@google.com, linux-m68k@lists.linux-m68k.org, viro@zeniv.linux.org.uk, luto@kernel.org, tglx@linutronix.de, linux-arm-kernel@lists.infradead.org, linux-parisc@vger.kernel.org, cyphar@cyphar.com, torvalds@linux-foundation.org, linux-kernel@vger.kernel.org, luto@amacapital.net, ebiederm@xmission.com, linux-alpha@vger.kernel.org, akpm@linux-foundation.org, linuxppc-dev@lists.ozlabs.org Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" On Thu, May 16, 2019 at 04:27:00PM +0200, Oleg Nesterov wrote: > On 05/16, Christian Brauner wrote: > > > > With the introduction of pidfds through CLONE_PIDFD it is possible to > > created pidfds at process creation time. > > Now I am wondering why do we need CLONE_PIDFD, you can just do > > pid = fork(); > pidfd_open(pid); CLONE_PIDFD eliminates the race at the source and let's us avoid two syscalls for the sake of one. That'll obviously matter even more when we enable CLONE_THREAD | CLONE_PIDFD. pidfd_open() is really just a necessity for anyone who does non-parent process management aka LMK or service managers. I also would like to reserve the ability at some point (e.g. with cloneX or sm) to be able to specify specific additional flags at process creation time that modify pidfd behavior. > > > +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags) > > +{ > > + int fd, ret; > > + struct pid *p; > > + struct task_struct *tsk; > > + > > + if (flags) > > + return -EINVAL; > > + > > + if (pid <= 0) > > + return -EINVAL; > > + > > + p = find_get_pid(pid); > > + if (!p) > > + return -ESRCH; > > + > > + ret = 0; > > + rcu_read_lock(); > > + /* > > + * If this returns non-NULL the pid was used as a thread-group > > + * leader. Note, we race with exec here: If it changes the > > + * thread-group leader we might return the old leader. > > + */ > > + tsk = pid_task(p, PIDTYPE_TGID); > > + if (!tsk) > > + ret = -ESRCH; > > + rcu_read_unlock(); > > + > > + fd = ret ?: pidfd_create(p); > > + put_pid(p); > > + return fd; > > +} > > Looks correct, feel free to add Reviewed-by: Oleg Nesterov > > But why do we need task_struct *tsk? > > rcu_read_lock(); > if (!pid_task(PIDTYPE_TGID)) > ret = -ESRCH; > rcu_read_unlock(); Sure, that's simpler. I'll rework and add your Reviewed-by. > > and in fact we do not even need rcu_read_lock(), we could do > > // shut up rcu_dereference_check() > rcu_lock_acquire(&rcu_lock_map); > if (!pid_task(PIDTYPE_TGID)) > ret = -ESRCH; > rcu_lock_release(&rcu_lock_map); > > Well... I won't insist, but the comment about the race with exec looks a bit > confusing to me. It is true, but we do not care at all, we are not going to > use the task_struct returned by pid_task(). Yeah, I can remove it. Thanks! Christian From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.0 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED,USER_AGENT_NEOMUTT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C1776C04AAF for ; Thu, 16 May 2019 14:57:31 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8376620848 for ; Thu, 16 May 2019 14:57:31 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="aQRyYrcM"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=brauner.io header.i=@brauner.io header.b="N/UXULef" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8376620848 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=brauner.io Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+infradead-linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References: Message-ID:Subject:To:From:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=1XOgmNXTkvPHoFgmQCmmKYlovKBqFaz5bTHt3sL69u8=; b=aQRyYrcMRw57u1 /TOwoEZArwktc99e15c9QAr3+KIrC0wcuwHdtnudGitoaZFOkLFGlq9j2kZftyV0D2lDSbENVvZvn 0if2Gey0CpHsuDI6vh0eZQHBCCeiulCKGEZrJT9kffXl6DA541keSsfHMYMCKCPqW1I8p9LO8OWl8 uqfIsdkv2pb3efLs16aTWmDyxEV2wzhVaELL3g4YIqUmjnsXyvBVUR+iYEovnMmBEcEZMJUzPRw9P TtDsTBkLMz3mgqX84EL76u25CIKK71U7iPnar4H4ZG2LpudVT4c4rnquuDNeksXIYKFL6wOrxFXYE T+AAG8sE1HDhnUoI28Hw==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.90_1 #2 (Red Hat Linux)) id 1hRHoz-0007D4-AS; Thu, 16 May 2019 14:57:25 +0000 Received: from mail-ed1-x541.google.com ([2a00:1450:4864:20::541]) by bombadil.infradead.org with esmtps (Exim 4.90_1 #2 (Red Hat Linux)) id 1hRHow-0007C3-7F for linux-arm-kernel@lists.infradead.org; Thu, 16 May 2019 14:57:23 +0000 Received: by mail-ed1-x541.google.com with SMTP id b8so5645664edm.11 for ; Thu, 16 May 2019 07:57:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=brauner.io; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=zu/DFoA8gurZzDl4gRPBRIxRpOpQplODrEJWOL6D0bE=; b=N/UXULef7deCmqVDFFuhdRP+EE/HVwi/f8Xw9poIWaOZnQrgvAqVWfNExWGd8BOsJf 33o1T0+h2MUW+EUGMeLcQHkrt56q01RBA4HuUCpP1HhL+qbKYrx8/hsh8yLGkBvmD49R 8+PbTDBAKauvYkDZWTDJ6HTdkLHobcXK0IWVXwUdxsfwEzukIgCacfL1mkKlnh4BufAE cmndL7Y+bK4GfjQMr6DNIgpxClV5ziCpFELiCjoMMwn4lPnJkr6DjWQm5xrMWvDy9MoN 9Tz8wKlpHlaRJ+TCJu0UoV0aK7s1NnsOhDY4lT/AMXF5dPM+Twab8dYUty/tvzJmCNb5 QTZw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=zu/DFoA8gurZzDl4gRPBRIxRpOpQplODrEJWOL6D0bE=; b=A/b1SvC3ZPpy0PFrQ1GbgIXhgl5BKUd+2RAnP6ECfx+fcCjScfc1VrQDBq3I3y6oOt EwoVLAiYNbF/Z1ay+0k1KuW3lwZ+M/NSaLPSBJLE0WZ7IcEWg1U1Zjor0HgwDDQtrdLD FONi59ZMyIZswVazCoecDcen00dwBz4r3btcb3wnuKbCRz9GWuyEO+Q+mc5Lfy+9zwpd OnYc6tLQ/X8W/8XpbP3cqqfrui19VfW7i2JIQGPBsdihJ7qsdrxwXTgr7jXCUIaPwDDt 4YUuIbOpgFz9vRtWsb+PqevsZAD73GDOy8uNisUhgVjlls2ludNsF9/+vumukiZRyibd bB5Q== X-Gm-Message-State: APjAAAU5I8rxUWEC/pPYxxYTLbetD3HTslA+7W54WlPXImqP+QznhomT aXczweH0jOgm2gkICx2GEEgCfA== X-Google-Smtp-Source: APXvYqwezzg+v8pMXQltznYwK70iO/BCmxK3JFRnM2IhYSWFpsND0L8cwVVVFYHGIM+owfjxdC4+8Q== X-Received: by 2002:aa7:c483:: with SMTP id m3mr21927311edq.161.1558018640584; Thu, 16 May 2019 07:57:20 -0700 (PDT) Received: from brauner.io ([193.96.224.243]) by smtp.gmail.com with ESMTPSA id u11sm1122263ejr.48.2019.05.16.07.57.18 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Thu, 16 May 2019 07:57:20 -0700 (PDT) Date: Thu, 16 May 2019 16:57:17 +0200 From: Christian Brauner To: Oleg Nesterov Subject: Re: [PATCH v1 1/2] pid: add pidfd_open() Message-ID: <20190516145716.ool2pzdqbfclnnqi@brauner.io> References: <20190516135944.7205-1-christian@brauner.io> <20190516142659.GB22564@redhat.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20190516142659.GB22564@redhat.com> User-Agent: NeoMutt/20180716 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20190516_075722_265266_FE7FA9B4 X-CRM114-Status: GOOD ( 18.42 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-ia64@vger.kernel.org, linux-sh@vger.kernel.org, linux-mips@vger.kernel.org, dhowells@redhat.com, joel@joelfernandes.org, linux-kselftest@vger.kernel.org, sparclinux@vger.kernel.org, linux-api@vger.kernel.org, elena.reshetova@intel.com, linux-arch@vger.kernel.org, linux-s390@vger.kernel.org, dancol@google.com, Geert Uytterhoeven , serge@hallyn.com, linux-xtensa@linux-xtensa.org, keescook@chromium.org, arnd@arndb.de, jannh@google.com, linux-m68k@lists.linux-m68k.org, viro@zeniv.linux.org.uk, luto@kernel.org, tglx@linutronix.de, linux-arm-kernel@lists.infradead.org, linux-parisc@vger.kernel.org, cyphar@cyphar.com, torvalds@linux-foundation.org, linux-kernel@vger.kernel.org, luto@amacapital.net, ebiederm@xmission.com, linux-alpha@vger.kernel.org, akpm@linux-foundation.org, linuxppc-dev@lists.ozlabs.org Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+infradead-linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Thu, May 16, 2019 at 04:27:00PM +0200, Oleg Nesterov wrote: > On 05/16, Christian Brauner wrote: > > > > With the introduction of pidfds through CLONE_PIDFD it is possible to > > created pidfds at process creation time. > > Now I am wondering why do we need CLONE_PIDFD, you can just do > > pid = fork(); > pidfd_open(pid); CLONE_PIDFD eliminates the race at the source and let's us avoid two syscalls for the sake of one. That'll obviously matter even more when we enable CLONE_THREAD | CLONE_PIDFD. pidfd_open() is really just a necessity for anyone who does non-parent process management aka LMK or service managers. I also would like to reserve the ability at some point (e.g. with cloneX or sm) to be able to specify specific additional flags at process creation time that modify pidfd behavior. > > > +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags) > > +{ > > + int fd, ret; > > + struct pid *p; > > + struct task_struct *tsk; > > + > > + if (flags) > > + return -EINVAL; > > + > > + if (pid <= 0) > > + return -EINVAL; > > + > > + p = find_get_pid(pid); > > + if (!p) > > + return -ESRCH; > > + > > + ret = 0; > > + rcu_read_lock(); > > + /* > > + * If this returns non-NULL the pid was used as a thread-group > > + * leader. Note, we race with exec here: If it changes the > > + * thread-group leader we might return the old leader. > > + */ > > + tsk = pid_task(p, PIDTYPE_TGID); > > + if (!tsk) > > + ret = -ESRCH; > > + rcu_read_unlock(); > > + > > + fd = ret ?: pidfd_create(p); > > + put_pid(p); > > + return fd; > > +} > > Looks correct, feel free to add Reviewed-by: Oleg Nesterov > > But why do we need task_struct *tsk? > > rcu_read_lock(); > if (!pid_task(PIDTYPE_TGID)) > ret = -ESRCH; > rcu_read_unlock(); Sure, that's simpler. I'll rework and add your Reviewed-by. > > and in fact we do not even need rcu_read_lock(), we could do > > // shut up rcu_dereference_check() > rcu_lock_acquire(&rcu_lock_map); > if (!pid_task(PIDTYPE_TGID)) > ret = -ESRCH; > rcu_lock_release(&rcu_lock_map); > > Well... I won't insist, but the comment about the race with exec looks a bit > confusing to me. It is true, but we do not care at all, we are not going to > use the task_struct returned by pid_task(). Yeah, I can remove it. Thanks! Christian _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel