From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758834Ab2DYDHH (ORCPT ); Tue, 24 Apr 2012 23:07:07 -0400 Received: from zeniv.linux.org.uk ([195.92.253.2]:35824 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757157Ab2DYDHD (ORCPT ); Tue, 24 Apr 2012 23:07:03 -0400 Date: Wed, 25 Apr 2012 04:06:59 +0100 From: Al Viro To: Linus Torvalds Cc: Oleg Nesterov , linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, Russell King , Tejun Heo , Arnd Bergmann Subject: Re: [RFC] TIF_NOTIFY_RESUME, arch/*/*/*signal*.c and all such Message-ID: <20120425030659.GE6871@ZenIV.linux.org.uk> References: <20120420025438.GD6871@ZenIV.linux.org.uk> <20120420080914.GF6871@ZenIV.linux.org.uk> <20120420160848.GG6871@ZenIV.linux.org.uk> <20120420164239.GH6871@ZenIV.linux.org.uk> <20120420180748.GI6871@ZenIV.linux.org.uk> <20120423180150.GA6871@ZenIV.linux.org.uk> <20120424072617.GB6871@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120424072617.GB6871@ZenIV.linux.org.uk> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > And then there's a lovely issue of what syscall restarts - both on multiple > arriving signals (we want the restart to apply on the _first_ signal being > processed, TYVM, since the rest of those signals are not interrupting > a syscall - conceptually, they are interrupting the previous signal handlers > at the point of entry) and on {rt_,}sigreturn(2) (where we might get a value > in the register normally used to return sys_whatever() value that would look > like one of restart-related special errors, except that it's simply what that > register used to have when e.g. a timer interrupt had hit while we had a signal > pending; hell to debug, since it looks e.g. like one register in userland > process getting its value randomly replaced with -EINTR if it happened to > contain -ERESTARTSYS when an interrupt had happened). I'd fixed one like > that on arm a couple of years ago, but AFAICS we still have several on > other architectures ;-/ FWIW, there's an interesting question rmk has brought up. Consider the following scenario (on any architecture): sigsuspend(2) sees a signal and returns ERESTARTNOHAND. do_signal() is called and calls get_signal_to_deliver() and gets 0, for whatever reason. We decide to restart, return address adjusted, syscall number returned to the right register in pt_regs. In the meanwhile, no matter what state interrupts used to have before, get_signal_to_deliver() has enabled them when returning, so we'll need to reload thread flags. And we find that another signal has arrived in the meanwhile. OK, do_signal() is called again, and this time we have a handler for the arrived signal. We form a stack frame and return to userland, into the beginning of the handler. We don't even look at the restart-related logics this time around, due to the usual logics protecting us from double restarts. Handler is executed, up to rt_sigreturn(2). We decode the sigcontext, restore pt_regs and return to userland. Right into the beginning of interrupted sigsuspend() So we have sigsuspend() hit by a signal we have a handler for. Handler is executed and we are stuck is sigsuspend() again, all because a signal without a handler has arrived just before that one - close enough for our signal to come right after get_signal_to_deliver() has returned zero to do_signal(). AFAICS, that's a clear bug. Arrival of a signal that has userland handler and that isn't blocked by the mask given to sigsuspend() should terminate sigsuspend(). Sure, we can weasel around the wording in POSIX (it says "sigsuspend() will return after the signal-catching function returns" without explicitly saying that it shouldn't sit around waiting for another signals to arrive), but it's obviously against the intent of standard (as well as expectations of any programmer). Note that if the handler-less signal had arrived a bit earlier, we would've returned to userland and reenter the kernel before the arrival of our signal. Then it *would* terminate sigsuspend() - handler would be executed and we would've returned to caller of sigsuspend() with EINTR as return value. If they came simultaneously, the same thing would've happened - get_signal_to_deliver() would have returned non-zero the first time around and that would be it. It's just the right timing for the second signal that triggers that race. Solution proposed last summer when that had been noticed by arm folks was more or less along the lines of * new thread flag, checked after we'd seen that no SIGPENDING et.al. is there. If it's set, we clear it, do syscall restart work as we would for handlerless signal and recheck the flags if we had to do something like __put_user() in process (arm might have to do that for ERESTART_RESTARTBLOCK)[1] * do_signal() would set that flag if + anti double-restart logics would not have prevented restarts + error value was ERESTART_... * no restart work on "no signal" path in do_signal() * if we have a handler and the flag is set, clear it and do what we normally do for restarts (including the "has ptrace mangled registers in a way that would prevent restarts in the current code" logics for architectures that have such logics - arm and sparc, at least). I would've probably made that TS_... instead of TIF_... at least for something like x86 - it's obviously thread-synchronous. [1] it wasn't covered in that thread, but if a signal arrives during that __put_user() we really won't care - the usual logics in sigreturn() will make sure that when we return from handler into the resulting restart_syscall(2) we'll have it immediately fail with -EINTR. Comments? AFAICS, the bug is arch-independent; it's not just arm and it's worse than the example mentioned last year - sigsuspend() is a lot more common than ppoll() and behaviour clearly goes against what sigsuspend() exists for.