From: Enke Chen <enkechen@cisco.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>,
	Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>, Arnd Bergmann <arnd@arndb.de>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Khalid Aziz <khalid.aziz@oracle.com>,
	Kate Stewart <kstewart@linuxfoundation.org>,
	Helge Deller <deller@gmx.de>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christian Brauner <christian@brauner.io>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will.deacon@arm.com>, Dave Martin <Dave.Martin@arm.com>,
	Mauro Carvalho Chehab <mchehab+samsung@kernel.org>,
	Michal Hocko <mhocko@kernel.org>, Rik van Riel <riel@surriel.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Roman Gushchin <guro@fb.com>,
	Marcos Paulo de Souza <marcos.souza.org@gmail.com>,
	Dominik Brodowski <linux@dominikbrodowski.net>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	Yang Shi <yang.shi@linux.alibaba.com>, Jann Horn <jannh@google.com>,
	Kees Cook <keescook@chromium.org>, x86@kernel.org,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	"Victor Kamensky (kamensky)" <kamensky@cisco.com>,
	xe-linux-external@cisco.com, Stefan Strogin <sstrogin@cisco.com>,
	Enke Chen <enkechen@cisco.com>
Subject: Re: [PATCH v2] kernel/signal: Signal-based pre-coredump notification
Date: Tue, 23 Oct 2018 12:43:20 -0700
Message-ID: <1e68a3ce-32cd-b058-3d1d-36455ceca848@cisco.com>
In-Reply-To: <20181023092348.GA14340@redhat.com>

Hi, Oleg:

Thanks for your review. Please see my replies inline.

On 10/23/18 2:23 AM, Oleg Nesterov wrote:
> On 10/22, Enke Chen wrote:
>>
>> As the coredump of a process may take time, in certain time-sensitive
>> applications it is necessary for a parent process (e.g., a process
>> manager) to be notified of a child's imminent death before the coredump
>> so that the parent process can act sooner, such as re-spawning an
>> application process, or initiating a control-plane fail-over.
>
> Personally I still do not like this feature, but I won't argue.
>
>> --- a/fs/coredump.c
>> +++ b/fs/coredump.c
>> @@ -546,6 +546,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
>>  	struct cred *cred;
>>  	int retval = 0;
>>  	int ispipe;
>> +	bool notify;
>>  	struct files_struct *displaced;
>>  	/* require nonrelative corefile path and be extra careful */
>>  	bool need_suid_safe = false;
>> @@ -590,6 +591,15 @@ void do_coredump(const kernel_siginfo_t *siginfo)
>>  	if (retval < 0)
>>  		goto fail_creds;
>>
>> +	/*
>> +	 * Send the pre-coredump signal to the parent if requested.
>> +	 */
>> +	read_lock(&tasklist_lock);
>> +	notify = do_notify_parent_predump(current);
>> +	read_unlock(&tasklist_lock);
>> +	if (notify)
>> +		cond_resched();
>
> Hmm. I do not understand why do we need cond_resched(). And even if we need it,
> why we can't call it unconditionally?

Remember the goal is to allow the parent (e.g., a process manager) to take
early action. The "yield" before doing the coredump will help.

The yield is made conditional because the notification is conditional.
Is that ok?

>
> I'd also suggest to move read_lock/unlock(tasklist) into do_notify_parent_predump()
> and remove the "task_struct *tsk" argument, tsk is always current.
>
> Yes, do_notify_parent() and do_notify_parent_cldstop() are called with tasklist_lock
> held, but there are good reasons for that.

Sure, I will make the suggested changes. This function is only called in
one place.

>
>
>> +static inline int valid_predump_signal(int sig)
>> +{
>> +	return (sig == SIGCHLD) || (sig == SIGUSR1) || (sig == SIGUSR2);
>> +}
>
> I still do not understand why do we need to restrict predump_signal.
>
> PR_SET_PREDUMP_SIG can only change the caller's ->predump_signal, so to me
> even PR_SET_PREDUMP_SIG(SIGKILL) is fine.

I will remove it to reduce the code size and give more flexibility to the
application.

>
> And once again, SIGCHLD/SIGUSR do not queue, this means that PR_SET_PREDUMP_SIG
> is pointless if you have 2 or more children.

Hmm, could you point me to the code where SIGCHLD/SIGUSR is treated
differently w.r.t. queuing? That does not sound right to me.

>
>> +bool do_notify_parent_predump(struct task_struct *tsk)
>> +{
>> +	struct sighand_struct *sighand;
>> +	struct kernel_siginfo info;
>> +	struct task_struct *parent;
>> +	unsigned long flags;
>> +	pid_t pid;
>> +	int sig;
>> +
>> +	parent = tsk->parent;
>> +	sighand = parent->sighand;
>> +	pid = task_tgid_vnr(tsk);
>> +
>> +	spin_lock_irqsave(&sighand->siglock, flags);
>> +	sig = parent->signal->predump_signal;
>> +	if (!valid_predump_signal(sig)) {
>> +		spin_unlock_irqrestore(&sighand->siglock, flags);
>> +		return false;
>> +	}
>
> Why do we need to check parent->signal->predump_signal under ->siglock?
> This complicates the code for no reason, afaics.
>
>> +	clear_siginfo(&info);
>> +	info.si_pid = pid;
>> +	info.si_signo = sig;
>> +	if (sig == SIGCHLD)
>> +		info.si_code = CLD_PREDUMP;
>> +
>> +	__group_send_sig_info(sig, &info, parent);
>> +	__wake_up_parent(tsk, parent);
>
> Why __wake_up_parent() ?

It is not needed; I will remove it.

>
> do_notify_parent() does this to wake up the parent sleeping in do_wait(), to
> report the event. But predump_signal has nothing to do with wait().
>
> Now. This version sends the signal to ->parent, not ->real_parent. OK, but this
> means that real_parent won't be notified if its child is traced.
>
>
>> +	case PR_SET_PREDUMP_SIG:
>> +		if (arg3 || arg4 || arg5)
>> +			return -EINVAL;
>> +
>> +		/* 0 is valid for disabling the feature */
>> +		if (arg2 && !valid_predump_signal((int)arg2))
>> +			return -EINVAL;
>> +		me->signal->predump_signal = (int)arg2;
>> +		break;
>
> Again, I do not understand why do we need valid_predump_signal(). But even
> if we need it, I don't understand why should we check it twice. IOW, why
> do_notify_parent_predump() can't simply check ->predump_signal != 0?
>
> Whatever we do, PR_SET_PREDUMP_SIG should validate arg2 anyway. Who else can
> change ->predump_signal after that?

Ok, will relax.

>
>> +	case PR_GET_PREDUMP_SIG:
>> +		if (arg3 || arg4 || arg5)
>> +			return -EINVAL;
>> +		error = put_user(me->signal->predump_signal,
>> +				 (int __user *)arg2);
>
> To me it would be better to simply return ->predump_signal, iow
>
>		error = me->signal->predump_signal;
>		break;
>
> but I won't insist, this is subjective and cosmetic.

The vast majority of system calls return 0 or -1. So does PR_GET_PDEATHSIG.
I would like to keep them consistent.

Thanks again.

-- Enke