From: ebiederm@xmission.com (Eric W. Biederman)
To: Enke Chen
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, "H. Peter Anvin",
    Peter Zijlstra, Arnd Bergmann, Khalid Aziz, Kate Stewart, Helge Deller,
    Greg Kroah-Hartman, Al Viro, Andrew Morton, Christian Brauner,
    Catalin Marinas, Will Deacon, Dave Martin, Mauro Carvalho Chehab,
    Michal Hocko, Rik van Riel, "Kirill A. Shutemov", Roman Gushchin,
    Marcos Paulo de Souza, Oleg Nesterov, Dominik Brodowski,
    Cyrill Gorcunov, Yang Shi, Jann Horn, Kees Cook, x86@kernel.org,
    linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
    "Victor Kamensky \(kamensky\)", xe-linux-external@cisco.com,
    Stefan Strogin
References: <458c04d8-d189-4a26-729a-bb1d1d751534@cisco.com> <87sh0vpj5q.fsf@xmission.com>
Date: Thu, 25 Oct 2018 07:23:07 -0500
In-Reply-To: (Enke Chen's message of "Wed, 24 Oct 2018 16:50:26 -0700")
Message-ID: <87zhv2md04.fsf@xmission.com>
Subject: Re: [PATCH v2] kernel/signal: Signal-based pre-coredump notification

Enke Chen writes:

> Hi, Eric:
>
> Thanks for your comments. Please see my replies inline.
>
> On 10/24/18 6:29 AM, Eric W. Biederman wrote:
>> Enke Chen writes:
>>
>>> For simplicity and consistency, this patch provides an implementation
>>> for signal-based fault notification prior to the coredump of a child
>>> process.
>>> A new prctl command, PR_SET_PREDUMP_SIG, is defined that can
>>> be used by an application to express its interest and to specify the
>>> signal (SIGCHLD or SIGUSR1 or SIGUSR2) for such a notification. A new
>>> signal code (si_code), CLD_PREDUMP, is also defined for SIGCHLD.
>>>
>>> Changes to prctl(2):
>>>
>>> PR_SET_PREDUMP_SIG (since Linux 4.20.x)
>>>        Set the child pre-coredump signal of the calling process to
>>>        arg2 (either SIGUSR1, or SIGUSR2, or SIGCHLD, or 0 to clear).
>>>        This is the signal that the calling process will get prior to
>>>        the coredump of a child process. This value is cleared across
>>>        execve(2), or for the child of a fork(2).
>>>
>>>        When SIGCHLD is specified, the signal code will be set to
>>>        CLD_PREDUMP in such a SIGCHLD signal.
>>
>> Your signal handling is still not right. Please read and comprehend
>> siginfo_layout.
>>
>> You have not filled in all of the required fields for the SIGCHLD case.
>> For the non SIGCHLD case you are using si_code == 0 == SI_USER which is
>> very wrong. This is not a user generated signal.
>>
>> Let me say this slowly. The pair si_signo si_code determines the union
>> member of struct siginfo. That needs to be handled consistently. You
>> aren't. I just finished fixing this up in the entire kernel and now you
>> are trying to add a usage that is worse than most of the bugs I have
>> fixed. I really don't appreciate having to deal with new bugs.
>>
>
> My apologies. I will investigate and make them consistent.
>
>>
>>
>> Further siginfo can be dropped. Multiple signals with the same signal
>> number can be consolidated. What is your plan for dealing with that?
>
> The primary application for the early notification involves a process
> manager which is responsible for re-spawning processes or initiating
> the control-plane fail-over. There are two models:
>
> One model is to have a 1:1 relationship between a process manager and
> an application process.
> There can only be one predump-signal (say, SIGUSR1)
> from the child to the parent, and it is unlikely to be dropped or
> consolidated.
>
> Another model is to have 1:N where there is only one process manager with
> multiple application processes. One of the RT signals can be used to help
> make it more reliable.

Which suggests you want one of the negative si_codes, and to use the _rt
siginfo member like sigqueue.

>> Other code paths pair with wait to get the information out. There
>> is no equivalent of wait in your code.
>
> I was not aware of that before. Let me investigate.
>
>>
>> Signals can be delayed by quite a bit, scheduling delays etc. They can
>> not provide any meaningful kind of real time notification.
>>
>
> The timing requirement is about 50-100 msecs for BFD. Not sure if that
> qualifies as "real time". This mechanism has worked well in deployment
> over the years.

It would help if those numbers were put into the patch description so
people can tell if the mechanism is quick enough.

>> So between delays and loss of information signals appear to be a very
>> poor fit for this usecase.
>>
>> I am concerned about code that does not fit the usecase well because
>> such code winds up as code that no one cares about that must be
>> maintained indefinitely, because somewhere out there there is one use
>> that would break if the interface was removed. This does not feel like
>> an interface people will want to use and maintain in proper working
>> order forever.
>>
>> Ugh. Your test case is even using signalfd. So you don't even want
>> this signal to be delivered as a signal.
>
> I actually tested sigaction()/waitpid() as well. If there is a preference,
> I can check in the sigaction()/waitpid() version instead.
>
>>
>> You add an interface that takes a pointer and you don't add a compat
>> interface. See Oleg's point of just returning the signal number in the
>> return code.
>
> This is what Oleg said "but I won't insist, this is subjective and cosmetic".
>
> It is no big deal either way. It just seems less work if we do not keep
> adding exceptions to the prctl(2) manpage:
>
> prctl(2):
>
> On success, PR_GET_DUMPABLE, PR_GET_KEEPCAPS, PR_GET_NO_NEW_PRIVS, PR_CAPBSET_READ, PR_GET_TIMING, PR_GET_SECUREBITS,
> PR_MCE_KILL_GET, PR_CAP_AMBIENT+PR_CAP_AMBIENT_IS_SET, and (if it returns) PR_GET_SECCOMP return the nonnegative values described
> above. All other option values return 0 on success. On error, -1 is returned, and errno is set appropriately.

More work in the man page versus less work in the kernel, and less code
to maintain. I will vote for more work in the man page.

>> Now I am wondering how well prctl works from a 32bit process on a 64bit
>> kernel. At first glance it looks like it probably does not work.
>>
>
> I am not sure which part would be problematic.

32bit pointers need to be translated into 64bit pointers if the system
call does not zero extend them. Plus structure sizes. I think prctl is
just inside the line where problems happen, but it is so close to the
line of structure size differences that it makes me nervous.

Typically pointers in structures are what cause system calls to cross
that line.

>> Consistency with PDEATHSIG is not a good argument for anything.
>> PDEATHSIG at the present time is unusable in the real world by most
>> applications that want something like it.
>
> Agreed, PDEATHSIG seems to have a few issues ...
>
>>
>> So far I see an interface that even you don't want to use as designed,
>> that is implemented incorrectly.
>>
>> The concern is real and deserves to be addressed. I don't think signals
>> are the right way to handle it, and certainly not this patch as it
>> stands.
>
> I will address your concerns on the patch. Regarding the requirement and the
> overall solution, if there are specific questions that I have not answered,
> please let me know.

So far so good.

Eric
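
For readers following the si_code discussion above, here is a minimal
userspace sketch of what "a negative si_code with the _rt siginfo member,
like sigqueue" looks like to a receiver. It is not taken from the patch. The
helper names (block_notice, notify_manager, wait_for_notice) and the choice of
carrying a pid in sival_int are assumptions made for illustration only. It
relies on two POSIX/Linux behaviours: realtime signals are queued rather than
merged, and sigqueue(3) delivers them with si_code set to SI_QUEUE, which is
negative on Linux.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: block signo in the calling thread so that queued
 * instances stay pending until they are collected with sigwaitinfo().
 * Call this before any notification can arrive.
 */
static int block_notice(int signo)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, signo);
    return sigprocmask(SIG_BLOCK, &set, NULL);
}

/*
 * Hypothetical helper: queue a realtime signal to the process manager,
 * carrying the pid of the failing process as the payload.
 */
static int notify_manager(pid_t manager, int signo, pid_t failing)
{
    union sigval payload;

    /* Only realtime signals are queued; standard ones may be merged. */
    assert(signo >= SIGRTMIN && signo <= SIGRTMAX);
    payload.sival_int = (int)failing;
    return sigqueue(manager, signo, payload);
}

/*
 * Hypothetical helper: collect one queued instance synchronously.
 * Returns the signal number on success, or -1 on error.
 */
static int wait_for_notice(int signo, siginfo_t *info)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, signo);
    return sigwaitinfo(&set, info);
}
```

A receiver built this way would check info.si_code == SI_QUEUE before reading
info.si_value.sival_int, since the si_signo/si_code pair decides which siginfo
fields are valid. A kernel-generated notification would need its own si_code
and field layout, which this sketch does not attempt to model.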