Date: Sat, 12 Mar 2016 13:05:31 -0500
From: Rich Felker
To: Ingo Molnar
Cc: Linus Torvalds, Andy Lutomirski, the arch/x86 maintainers,
    Linux Kernel Mailing List, Borislav Petkov,
    "musl@lists.openwall.com", Andrew Morton, Thomas Gleixner,
    Peter Zijlstra
Subject: Re: [musl] Re: [RFC PATCH] x86/vdso/32: Add AT_SYSINFO cancellation helpers
Message-ID: <20160312180531.GD9349@brightrain.aerifal.cx>
References: <20160310033446.GL9349@brightrain.aerifal.cx>
    <20160310111646.GA13102@gmail.com>
    <20160310164104.GM9349@brightrain.aerifal.cx>
    <20160310180331.GB15940@gmail.com>
    <20160310232819.GR9349@brightrain.aerifal.cx>
    <20160311093347.GA17749@gmail.com>
    <20160311113914.GD29662@port70.net>
    <20160312170040.GA1108@gmail.com>
In-Reply-To: <20160312170040.GA1108@gmail.com>

On Sat, Mar 12, 2016 at 06:00:40PM +0100, Ingo Molnar wrote:
>
> * Linus Torvalds wrote:
>
> > [...]
> >
> > Because if that's the case, I wonder if what you really want is not "sticky
> > signals" as much as "synchronous signals" - ie the ability to say that a signal
> > shouldn't ever interrupt in random places, but only at well-defined points
> > (where a system call would be one such point - are there others?)
>
> Yes, I had similar 'deferred signal delivery' thoughts after having written up the
> sticky signals approach, I just couldn't map all details of the semantics: see the
> 'internal libc functions' problem below.
>
> If we can do this approach then there's another advantage as well: this way the C
> library does not even have to poll for cancellation at syscall boundaries: i.e.
> the regular system call fast path gets faster by 2-3 instructions as well.

That is not a measurable benefit. You're talking about 2-3 cycles out
of 10k or more cycles (these are heavy blocking syscalls, not light
things like SYS_time or SYS_getpid).

> > So then you could make "pthread_setcanceltype()" just set that flag for the
> > cancellation signal, and just know that the signal itself will always be
> > deferred to such a synchronous point (ie system call entry).
> >
> > We already have the ability to catch things at system call entry (ptrace needs
> > it, for example), so we could possibly make our signal delivery have a mode
> > where a signal does *not* cause user space execution to be interrupted by a
> > signal handler, but instead just sets a bit in the thread info state that then
> > causes the next system call to take the signal.
>
> Yes, so this would need a bit of work, to handle the problem mentioned by Rich
> Felker: "internal" libc APIs (such as name server lookups) may consist of a series
> of complex system calls - some of which might be blocking. It should still be
> possible to execute such 'internal' system calls undisturbed, even if a 'deferred'
> signal is sent.

That's equivalent to setcancelstate(disabled), and it is actually the
mechanism we use for most "complex" functions, since it's a lot simpler
and more maintainable to build these complex functions on top of public
APIs than on direct inline syscalls or internal APIs that may change.

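As a rough sketch of that pattern (this is not musl's actual code, and
read_file_nocancel is a made-up name used only for illustration), a
helper that makes several cancellation-point calls through the public
API can disable cancellation for its duration and restore the caller's
state afterwards:

```c
#include <fcntl.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only: open a file, read up to
 * len bytes into buf, and close it again. open(), read() and close()
 * are all cancellation points, so cancellation is disabled around the
 * whole sequence. A cancellation request that arrives meanwhile stays
 * pending and is acted on at the caller's next cancellation point. */
static ssize_t read_file_nocancel(const char *path, char *buf, size_t len)
{
    int old_state, ignored;
    ssize_t n = -1;

    /* Save the caller's cancel state and disable cancellation. */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);

    int fd = open(path, O_RDONLY);
    if (fd >= 0) {
        n = read(fd, buf, len);
        close(fd);
    }

    /* Restore whatever state the caller had on entry. */
    pthread_setcancelstate(old_state, &ignored);
    return n;
}
```

The point of doing it this way is that the helper needs no knowledge of
how cancellation is implemented underneath; it only uses the public
interface.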
In musl, direct non-cancellable syscall variants are mainly used in
places where either it's just a single simple syscall (like close) or
where calling the public API is already impossible for namespace
reasons (e.g. inside stdio, which can't use the POSIX namespace because
it's implementing ISO C, not POSIX).

> One workable solution I think would be to prepare the internal functions for
> eventual interruption by the cancellation signal. They have to be restartable
> anyway, because the application can send other signals. As long as the
> interruption is only transient it should be fine.

No, that does not work. EINTR from a non-restarting signal is a
specified, reportable error (despite being rather useless in practice
due to race conditions; of course you can solve those with repeated
signals and exponential backoff). We cannot just loop and retry on
spurious EINTR except in a few cases where EINTR is optional or not
used (like sem_wait).

> And note that this approach would also be pretty fast on the libc side: none of
> the 'fast' cancellation APIs would have to do anything complex like per call
> signal blocking/unblocking or other complex signal operations. They would just
> activate a straightforward new SA_ flag and rely on its semantics.

It's already fast, aside from not being able to use sysenter/syscall
instructions.

I'm really frustrated that, again and again, we have kernel folks with
no experience with libc implementation trying to redesign something
that already has a simple zero-cost design that works on all existing
systems, and proposing things that have a mix of immediately-obvious
flaws and potential future problems we haven't even thought of yet.
Even if your designs were ideal, we would end up with libc
implementing two good designs and switching them at runtime based on
kernel version, instead of just one good design.
As it stands, every alternative proposed so far is _more_ complex on
the libc side, _more_ complex on the kernel side, _and_ on top of
that, requires having two implementations.

Rich