From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: [PATCH v5 1/2] kernel/signal: Signal-based pre-coredump notification
From: Enke Chen
Date: Tue, 27 Nov 2018 14:54:41 -0800
Message-ID: <80e96710-f424-9b39-72ee-9cc7cbe7a5f7@cisco.com>
In-Reply-To: <0c197608-3b7e-ffd1-8943-801a60beb917@cisco.com>
References: <458c04d8-d189-4a26-729a-bb1d1d751534@cisco.com>
 <7741efa7-a3f8-62a1-ba52-613883164643@cisco.com>
 <84460a77-a111-404e-4bad-88104a6e246e@cisco.com>
 <20181026082812.GA10581@redhat.com>
 <21f678a8-4001-df36-c26e-e96cf203b1b1@cisco.com>
 <20181029111804.GA24820@redhat.com>
 <0c197608-3b7e-ffd1-8943-801a60beb917@cisco.com>
To: Oleg Nesterov, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 "H. Peter Anvin", Peter Zijlstra, Arnd Bergmann, "Eric W. Biederman",
 Khalid Aziz, Kate Stewart, Helge Deller, Greg Kroah-Hartman, Al Viro,
 Andrew Morton, Christian Brauner, Catalin Marinas, Will Deacon,
 Dave Martin, Mauro Carvalho Chehab, Michal Hocko, Rik van Riel,
 "Kirill A. Shutemov", Roman Gushchin, Marcos Paulo de Souza,
 Dominik Brodowski, Cyrill Gorcunov, Yang Shi, Jann Horn, Kees Cook
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
 "Victor Kamensky (kamensky)", xe-linux-external@cisco.com,
 Stefan Strogin, Enke Chen
X-Mailing-List: linux-kernel@vger.kernel.org

[Repost as a series, as suggested by Andrew Morton]

For simplicity and consistency, this patch provides an implementation
for signal-based fault notification prior to the coredump of a child
process. A new prctl command, PR_SET_PREDUMP_SIG, is defined that can
be used by an application to express its interest and to specify the
signal for such a notification.

Changes to prctl(2):

   PR_SET_PREDUMP_SIG (since Linux 4.20.x)
          Set the child pre-coredump signal of the calling process to
          arg2 (either a signal value in the range 1..maxsig, or 0 to
          clear). This is the signal that the calling process will get
          prior to the coredump of a child process. This value is
          cleared across execve(2), or for the child of a fork(2).

   PR_GET_PREDUMP_SIG (since Linux 4.20.x)
          Return the current value of the child pre-coredump signal,
          in the location pointed to by (int *) arg2.

Background:

   As the coredump of a process may take time, in certain time-sensitive
   applications it is necessary for a parent process (e.g., a process
   manager) to be notified of a child's imminent death before the
   coredump so that the parent process can act sooner, such as
   re-spawning an application process, or initiating a control-plane
   fail-over.

   One application is BFD. The early fault notification is a critical
   component for maintaining BFD sessions (with a timeout value of
   50 msec or 100 msec) across a control-plane failure.

   Currently there are two ways for a parent process to be notified of a
   child process's state change. One is to use the POSIX signal, and
   another is to use the kernel connector module.
   The specific events and actions are summarized as follows:

   Process Event    POSIX Signal                Connector-based
   ----------------------------------------------------------------------
   ptrace_attach()  do_notify_parent_cldstop()  proc_ptrace_connector()
                    SIGCHLD / CLD_STOPPED

   ptrace_detach()  do_notify_parent_cldstop()  proc_ptrace_connector()
                    SIGCHLD / CLD_CONTINUED

   pre_coredump/    N/A                         proc_coredump_connector()
   get_signal()

   post_coredump/   do_notify_parent()          proc_exit_connector()
   do_exit()        SIGCHLD / exit_signal
   ----------------------------------------------------------------------

   As shown in the table, the signal-based pre-coredump notification is
   not currently available. In some cases using a connector-based
   notification can be quite complicated (e.g., when a process manager
   is written in shell scripts and thus is subject to certain inherent
   limitations), and a signal-based notification would be simpler and
   better suited.

Signed-off-by: Enke Chen
Reviewed-by: Oleg Nesterov
---
v4 -> v5:

Addressed review comments from Oleg Nesterov:

o use rcu_read_lock instead.
o revert back to notify the real_parent.

 fs/coredump.c                | 23 +++++++++++++++++++++++
 fs/exec.c                    |  3 +++
 include/linux/sched/signal.h |  3 +++
 include/uapi/linux/prctl.h   |  4 ++++
 kernel/sys.c                 | 13 +++++++++++++
 5 files changed, 46 insertions(+)

diff --git a/fs/coredump.c b/fs/coredump.c
index e42e17e..740b1bb 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -536,6 +536,24 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
 	return err;
 }
 
+/*
+ * While do_notify_parent() notifies the parent of a child's death post
+ * its coredump, this function lets the parent (if so desired) know about
+ * the imminent death of a child just prior to its coredump.
+ */
+static void do_notify_parent_predump(void)
+{
+	struct task_struct *parent;
+	int sig;
+
+	rcu_read_lock();
+	parent = rcu_dereference(current->real_parent);
+	sig = parent->signal->predump_signal;
+	if (sig != 0)
+		do_send_sig_info(sig, SEND_SIG_NOINFO, parent, PIDTYPE_TGID);
+	rcu_read_unlock();
+}
+
 void do_coredump(const kernel_siginfo_t *siginfo)
 {
 	struct core_state core_state;
@@ -590,6 +608,11 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 	if (retval < 0)
 		goto fail_creds;
 
+	/*
+	 * Send the pre-coredump signal to the parent if requested.
+	 */
+	do_notify_parent_predump();
+
 	old_cred = override_creds(cred);
 
 	ispipe = format_corename(&cn, &cprm);
diff --git a/fs/exec.c b/fs/exec.c
index fc281b7..7714da7 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1181,6 +1181,9 @@ static int de_thread(struct task_struct *tsk)
 	/* we have changed execution domain */
 	tsk->exit_signal = SIGCHLD;
 
+	/* Clear the pre-coredump signal before loading a new binary */
+	sig->predump_signal = 0;
+
 #ifdef CONFIG_POSIX_TIMERS
 	exit_itimers(sig);
 	flush_itimer_signals();
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 13789d1..728ef68 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -112,6 +112,9 @@ struct signal_struct {
 	int			group_stop_count;
 	unsigned int		flags; /* see SIGNAL_* flags below */
 
+	/* The signal sent prior to a child's coredump */
+	int			predump_signal;
+
 	/*
 	 * PR_SET_CHILD_SUBREAPER marks a process, like a service
 	 * manager, to re-parent orphan (double-forking) child processes
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index c0d7ea0..79f0a8a 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -219,4 +219,8 @@ struct prctl_mm_map {
 # define PR_SPEC_DISABLE		(1UL << 2)
 # define PR_SPEC_FORCE_DISABLE		(1UL << 3)
 
+/* Whether to receive signal prior to child's coredump */
+#define PR_SET_PREDUMP_SIG	54
+#define PR_GET_PREDUMP_SIG	55
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 123bd73..39aa3b8 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2476,6 +2476,19 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
 			return -EINVAL;
 		error = arch_prctl_spec_ctrl_set(me, arg2, arg3);
 		break;
+	case PR_SET_PREDUMP_SIG:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		if (!valid_signal((int)arg2))
+			return -EINVAL;
+		me->signal->predump_signal = (int)arg2;
+		break;
+	case PR_GET_PREDUMP_SIG:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = put_user(me->signal->predump_signal,
+				 (int __user *)arg2);
+		break;
 	default:
 		error = -EINVAL;
 		break;
-- 
1.8.3.1