Linux-arch Archive on lore.kernel.org
 help / color / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: James Hogan <james.hogan@imgtec.com>
Cc: linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
	Arnd Bergmann <arnd@arndb.de>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling)
Date: Sat, 8 Dec 2012 07:44:29 +0000
Message-ID: <20121208074429.GC4939@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20121206220955.GZ4939@ZenIV.linux.org.uk>

On Thu, Dec 06, 2012 at 10:09:55PM +0000, Al Viro wrote:
> 	"Subtle and undocumented" is an extremely polite way to describe that.
> By now we had at least a dozen architectures step on that trap, simply because
> they had different calling conventions and the same logics did *not* "just
> work" there.  
> 
> 	What we need to guarantee is
> * restarts do not happen on signals caught in interrupts or exceptions
> * restarts do not happen on signals caught in sigreturn()
> * restart should happen only once, even if we get through do_signal() many
> times.

FWIW, here's the current situation:

alpha: works.  Double restarts are prevented by the loop in do_work_pending()
resetting 'r0' (syscall number or 0 if restarts should not be done)
to 0 after the first call of do_signal(); all restart logics is conditional
on r0 != 0.  The logics making sure that we get the right value passed
to do_work_pending() is convoluted and had cost us at least one bug
(sigreturn/rt_sigreturn had stopped only once in case of straced process;
strace(1) got seriously confused and produced garbage).

arm: works.  Double restarts are prevented by logics similar to alpha
do_work_pending(); prevention of restarts on non-syscalls and sigreturn is
done by asm glue setting r8 ('why', aka 'tbl') to 0 in non-syscall entry points
and to syscall table address in syscall entry; zeroed in asm wrappers for
sigreturn/rt_sigreturn.  Used to be broken until several years ago.

arm64: works.  Syscall number is in pt_regs (->syscallno); -1 for non-syscall
ones.  Reaching do_signal() the first time around will set it to -1 and so will
sigreturn (in restore_sigframe()).  Restart logics is conditional on
->syscallno being non-negative.

avr32: _very_ odd logics used to decide whether to do restarts or not and
frankly, I do not believe that it could possibly work correctly - whatever
we do when building a sigframe, we don't touch SYSREG_SR in process, so
that won't prevent double restarts.  And if we had r12 (first argument of
syscall) restart-worthy at the entry, setup_syscall_restart() will leave
us with restart-worthy value in ->r12.  So e.g. pause(2) called when
r12 happened to contain -514 (it's a zero-argument syscall, so calling it
doesn't involve assignments to r12) will happily hit double restarts if
we have e.g. SIGCHLD coming often enough.  If that thing works, I would
really appreciate a detailed explanation of how it manages to do that - it
definitely deserves one.

blackfin: doesn't handle multiple signals; if you get a SIGSEGV generated
by failing attempt to set a sigframe up, too bad - you'll pass to userland
and coredump will hit at some later point when a hardware interrupt happend.
Restarts on sigreturn and non-syscalls are prevented by checking if
->orig_p0 is non-negative (similar to arm64 solution above) and it's easy to
turn into prevention of double restarts, which will become necessary as soon
as multiple signal handling gets fixed.  Actually, it's almost OK as is -
ERESTART_RESTARTBLOCK case is the only problematic one.

c6x: there's a flag next to pt_regs on stack and it's non-zero if and only
if we have a syscall.  Passed explicitly to do_notify_resume() to tell if
restarts are allowed.  As far as I can see, it is vulnerable to bogus
restarts on sigreturn (can be fixed by clearing the same flag in
do_rt_sigreturn() - simple *(long *)(regs + 1) = 0 in there will do).
It might be vulnerable to double restarts as well - pause(2) is not
enabled there, but it's not much comfort.  In the best case we are relying
on the following property:
	no syscall can return -ERESTART... when called with the first
argument equal to that value.
Might be true (the usual suspects are pause() and ancient sigsuspend() of
3-argument variety and neither is used here), but it's brittle as hell.
Come to think of that, clone(2) would probably fit the bill - we ignore
all unknown bits in the mask and clone(2) *can* return -ERESTARTNOINTR.
So it's almost certainly vulnerable.  The same fix would do, but explicit
loop a-la arm might be better.

cris: same lossage as on blackfin (quits after the first signal).  Vulnerable
to bogus restarts on sigreturn.  Would be vulnerable to double restarts if it
handled multiple signals.

frv: works (similar to the situation on arm64.  Used to be broken until a
couple of years ago.

h8300: works.  Prevention of restarts on sigreturn and non-syscalls based
on sign of ->orig_er0; the first pass through syscall restart logics renders
regs->er0 (return value) non-restart-worthy - anything that used to be
restart-worthy becomes either non-negative (->orig_er0 has to be, or we
won't touch that at all) or -EINTR.  In other words, we can't hit that
sucker twice.

hexagon: broken.  Prevention of restarts on non-syscalls is based on
sign of ->syscall_nr.  sigreturn carefully sets it *positive* and that
makes it vulnerable to bogus restarts.  Moreover, double restarts are
possible as well, same as on c6x.

ia64: really, really weird.  The main source of weirdness is that in
addition to usual cleanup on return from handle, it has very non-trivial
work done on *entry* to handle (register stack manipulations) and its
asm glue is really a thing to behold - from a safe distance, preferably
with warranty that you won't need to touch it.  I *think* it avoids the
restart-related holes, but...

m32r: works.  regs->syscall_nr is used in more or less usual fashion;
sigreturn and non-syscalls set it to -1 and so does the shiftback
logics in case of do_signal().  I would probably move setting it to -1
from prev_insn() to just before both switches by return value, but that's
cosmetical.  Used to be broken until a couple of years ago...

m68k: works.  Same situation as for x86 - ->orig_d0 set to -1 by non-syscalls
and sigreturn, restart-worthy return value (in ->d0) can be had only if
the syscall number (in ->orig_d0) was non-negative and that's enough to
make it non-restart-worthy after handle_restart().

microblaze: broken, fixes await an ACK from maintainer.

mips: works.  regs->regs[0] is used more or less as ->syscall and friends
are on other architectures.  Explicitly cleaned on syscall restart.
sigreturn bypasses the codepath on the normal way out of syscall where
the thing is set to non-zero (we do it only on sys_something() returning
an error there).  Used to be broken...

mn10300: works.  Usual story, ->orig_d0 set to -1 by non-syscalls, by
sigreturn and by restart logics.  This approach is possible on architectures
where the register clobbered by return value is used for syscall number;
preserved value can't be negative, or we'll never get a restart-worthy
return value in the first place.  Used to be broken...

openrisc: broken.  regs->orig_gpr11 can be easily used to fix - it fits the
usual model, but isn't set by sigreturn/restarts.  BTW, the comment around
the call of do_notify_resume() in the asm glue is deeply confused - we *do*
want the userspace pt_regs; fortunately, there can't be any on top of those
at that point.

parisc: works.  regs->orig_r28 is used as a flag suppressing restarts (used
by sigreturn and restart).  Used to be broken...

powerpc: works.  regs->trap is used to tell syscalls from non-syscalls and
it's tweaked by sigreturn/restarts.  Used to be broken...

s390: works.  This one uses thread flag (TIF_SYSCALL) as an indicator,
clearing it in sigreturn and restarts.  In reality, it uses the same
"handlerless restarts without stepping out into userland" model as
arm does (well, the other way round, really), but the loop is done in
asm glue instead of taking it into C...  The things are slightly complicated
because the same flag is used to tell the caller of do_signal() that it
ought to do restart-without-stepping-out right now.  IMO taking the
damn thing into C would make it at least easier to review, but there might
be some dragons elsewhere in that loop.

score: works (well, in that respect, at least).  Explicit regs->in_syscall
handled in usual fashion.  Used to be broken (double restarts prevented,
but sigreturn missed).

sh32: regs->tra < 0 used to suppress restarts on non-syscalls and sigreturn,
but it seems to be vulnerable to double restarts.

sh64: similar, but this one actually works - regs->syscall_nr is used in a
similar way, but there the register clobbered by return value seems to be
the one used to pass the syscall number.

sparc: works.  syscall is indicated by a bit stolen from psr or tstate,
cleared in sigreturn/restarts.  Used to be broken...

tile: works.  regs->faultnum is used to suppress restarts - sigreturn
flips it from "I'm a syscall" to "I'm a sigreturn" and so does do_signal().
Used to be broken...

unicore32: works, but would really benefit from switch to *current* arm
variant (it's very obviously modelled after arch/arm, but the changes
done on arm hadn't propagated there).

x86: works.  The usual "we are using the same register for syscall number
and for return value, so we can't go through restart and keep a restart-worthy
return value" situation.

um/x86: same ABI as x86, same solution...

xtensa: works.  regs->syscall is used, solution based on having the same
register used for syscall number and return value.

The sad part is, those "used to be broken" bits are about the pile of fixes
done about two years ago (some by me, some - by architecture maintainers).
Since then we got
	* arm64 - modelled after arm, inherited the fixes
	* c6x - stepped on that minefield
	* hexagon - stepped on that minefield
	* openrisc - stepped on that minefield
	* unicore32 - modelled after arm, inherited the fixes
and a bunch of embedded architectures are *still* broken the same way they
had been back then ;-/  Sigh...

  reply index

Thread overview: 125+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-05 16:08 [PATCH v2 00/44] Meta Linux Kernel Port James Hogan
2012-12-05 16:08 ` [PATCH v2 01/44] asm-generic/io.h: remove asm/cacheflush.h include James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 02/44] asm-generic/unistd.h: handle symbol prefixes in cond_syscall James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 03/44] Add CONFIG_HAVE_64BIT_ALIGNED_STRUCT for taskstats James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-08  3:43   ` H. Peter Anvin
2012-12-10 10:22     ` James Hogan
2012-12-10 12:55       ` Geert Uytterhoeven
2012-12-10 12:55         ` Geert Uytterhoeven
2012-12-17  9:51         ` James Hogan
2012-12-17 19:11           ` David Miller
2012-12-05 16:08 ` [PATCH v2 04/44] trace/ring_buffer: handle 64bit aligned structs James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-08  1:24   ` Steven Rostedt
2012-12-10 10:27     ` James Hogan
2012-12-10 10:27       ` James Hogan
2012-12-05 16:08 ` [PATCH v2 05/44] Revert some of "binfmt_elf: cleanups" James Hogan
2012-12-05 16:08   ` James Hogan
     [not found] ` <1354723742-6195-1-git-send-email-james.hogan-1AXoQHu6uovQT0dZR+AlfA@public.gmane.org>
2012-12-05 16:08   ` [PATCH v2 06/44] of/vendor-prefixes: add Imagination Technologies James Hogan
2012-12-05 16:08     ` James Hogan
2012-12-05 22:28     ` Grant Likely
2012-12-05 22:28       ` Grant Likely
2012-12-06  9:24       ` James Hogan
2012-12-05 16:08 ` [PATCH v2 07/44] metag: Add MAINTAINERS entry James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 08/44] metag: Headers for core arch constants James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 09/44] metag: Header for core memory mapped registers James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 10/44] metag: Boot James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 11/44] metag; TBX header James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 12/44] metag: TBX source James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 18:53   ` Joe Perches
2012-12-05 18:53     ` Joe Perches
2012-12-06  9:35     ` James Hogan
2012-12-06 12:59       ` Joe Perches
2012-12-06 15:03         ` James Hogan
2012-12-05 16:08 ` [PATCH v2 13/44] metag: Cache/TLB handling James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 14/44] metag: Memory management James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 15/44] metag: Memory handling James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 16/44] metag: Huge TLB James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 17/44] metag: Highmem support James Hogan
2012-12-05 16:08 ` [PATCH v2 18/44] metag: TCM support James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 19/44] metag: Signal handling James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 17:16   ` Al Viro
2012-12-06 11:17     ` James Hogan
2012-12-06 22:09       ` [braindump][RFC] signals and syscall restarts (Re: [PATCH v2 19/44] metag: Signal handling) Al Viro
2012-12-08  7:44         ` Al Viro [this message]
2012-12-15 16:26           ` Jonas Bonn
2012-12-15 17:07             ` Al Viro
2012-12-08 18:14         ` Al Viro
2012-12-08 18:14           ` Al Viro
2012-12-12  9:44           ` James Hogan
2012-12-10 10:40         ` James Hogan
2012-12-05 16:08 ` [PATCH v2 20/44] metag: Device tree James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 21/44] metag: ptrace James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 22/44] metag: Time keeping James Hogan
2013-01-04 10:05   ` Vineet Gupta
2013-01-04 12:21     ` James Hogan
2013-01-04 12:48       ` Vineet Gupta
2013-01-04 13:11         ` James Hogan
2012-12-05 16:08 ` [PATCH v2 23/44] metag: Traps James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 17:40   ` Al Viro
2012-12-06 11:43     ` James Hogan
2012-12-05 16:08 ` [PATCH v2 24/44] metag: IRQ handling James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 25/44] metag: System Calls James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 26/44] metag: Scheduling/Process management James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 27/44] metag: Module support James Hogan
2012-12-05 16:08 ` [PATCH v2 28/44] metag: Atomics, locks and bitops James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 29/44] metag: Basic documentation James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 30/44] metag: SMP support James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 31/44] metag: DMA James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 32/44] metag: Optimised library functions James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 33/44] metag: Stack unwinding James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 34/44] metag: Various other headers James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 35/44] mm: define VM_GROWSUP for CONFIG_METAG James Hogan
2012-12-05 16:08 ` [PATCH v2 36/44] Add metag to various Kconfig dependency lists James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 37/44] metag: Build infrastructure James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 38/44] metag: Perf James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 39/44] metag: ftrace support James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 40/44] scripts/checkstack.pl: Add metag support James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:08 ` [PATCH v2 41/44] metag: OProfile James Hogan
2012-12-05 16:08   ` James Hogan
2012-12-05 16:09 ` [PATCH v2 42/44] metag: Add JTAG Debug Adapter (DA) support James Hogan
2012-12-05 16:09 ` [PATCH v2 43/44] tty/metag_da: Add metag DA TTY driver James Hogan
2012-12-05 16:09   ` James Hogan
2012-12-05 17:24   ` Alan Cox
2013-01-04 14:11     ` James Hogan
2013-01-04 17:00       ` Alan Cox
2013-01-07 11:30         ` James Hogan
2013-01-07 11:54           ` Alan Cox
2012-12-05 16:09 ` [PATCH v2 44/44] fs: imgdafs: Add IMG DAFS filesystem for metag James Hogan
2012-12-05 17:11 ` [PATCH v2 00/44] Meta Linux Kernel Port Al Viro
2012-12-05 18:39   ` Al Viro
2012-12-18 16:09     ` James Hogan
2012-12-06  9:19   ` James Hogan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121208074429.GC4939@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=arnd@arndb.de \
    --cc=james.hogan@imgtec.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-arch Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-arch/0 linux-arch/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-arch linux-arch/ https://lore.kernel.org/linux-arch \
		linux-arch@vger.kernel.org
	public-inbox-index linux-arch

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-arch


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git