linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: linux-m68k@lists.linux-m68k.org
Cc: Geert Uytterhoeven <geert@linux-m68k.org>,
	Greg Ungerer <gerg@linux-m68k.org>,
	linux-kernel@vger.kernel.org
Subject: [RFC][CFT] signal handling fixes
Date: Sun, 25 Jul 2021 17:18:15 +0000	[thread overview]
Message-ID: <YP2c1xk9LJ0zE3KW@zeniv-ca.linux.org.uk> (raw)

	Back in 2012 or so I'd found a bunch of fun issues with multiple
pending signals on a lot of architectures.  m68k looked scarier than
usual (due to the combination of variable-sized exception frames with the
way kernel stack pointer is handled by the hardware), but I'd convinced
myself that it had been correct.

	Unfortunately, I was wrong - handling of multiple pending signals
does *not* work correctly there.

	Some background: wrt exception stack frames m68k variants fall
into 3 groups -
	1) 68000 and near relatives (all non-MMU): push 32bit PC, then
push 16bit SR.
	2) everything later than that, except for coldfire: push a
variable amount of auxillary data (used for insn restart, among other
things), then 16bit value encoding the format (upper 4 bits) and vector
(lower 12), then same as for (1) - 32bit PC and 16bit SR.  Size of
variable part depends upon the frame type (upper 4 bits of frame/vector
word).	Note that CPU32 falls into that group, even though it's non-MMU.
	3) coldfire (both MMU and non-MMU): push 32bit PC, then 16bit SR,
then 16bit word superficially similar to format/vector combination on (2).

	Handling of (2) is complicated, since we need the right frame
type when we go from kernel to userland.  In particular, we want format 0
(8-byte) for entering a signal handler, no matter how did we enter the
kernel when we caught the signal.  Conversely, when we return from signal
handler, we have format 0 on kernel entry (sigreturn(2) is a syscall) and
we need whatever frame we used to have back when we'd caught the signal.

	The monumentally unpleasant part is that we *must* leave the
kernel mode with the same value of kernel stack pointer we had on entry.
Crossing from user to kernel mode does not set the kernel stack pointer
to known value - its value is kept since the last time we'd left the
kernel mode.

	The sigreturn part is ugly as hell.  Signal delivery avoids
quite that level of nastiness by the following trick: there's an int
(stkadj, initially 0) between the exception frame and the rest of pt_regs.
On the way back from exception we pop the registers, then add stkadj +
4 to stack pointer before doing RTE.  Normally stkadj remains zero;
if we need to shrink the exception stack frame, we put the minimal one
over the aux data and store the offset into stkadj.  When on the way out
we pop our way through the registers, we'll end up with stack pointer
pointing to stkadj (4 bytes below the original exception stack frame)
and once we add 4 + stkadj to stack pointer, we have the minimal exception
stack frame on top of stack and RTE does the right thing.

	The problem with that trick is that exception stack frame in
pt_regs is no longer valid; in the best case regs->format will match
the original exception stack frame format and in the worst case it'll
be overwritten by bits 31..28 of signal handler address (if aux data
used to be 4 bytes long).

	ptrace get_regs()/set_regs() takes that effect into account when
accessing PC and SR; unfortunately, setup_frame() and setup_rt_frame()
do not.  That's not a problem for the first signal - ->stkadj is still
0 at that point.  However, if we end up building multiple sigframes,
we might get screwed.  Not hard to fix, thankfully...

	Another bug is on the sigreturn side of things, and that one is
my fault - in bd6f56a75bb2 ("m68k: Missing syscall_trace() on sigreturn")
I'd missed the fact that we'd just relocated pt_regs, without having
updated ->thread.esp0.

	These two are, IMO, -stable fodder.  The third one isn't -
it cleans sigreturn, hopefully making it less convoluted.  Instead of
doing unnatural things to C stack frames, call chain, etc. we let the
asm wrapper of {rt_,}sigreturn(2) do the following:
	reserve a gap on stack, sufficiently large to hold any aux data
	then call the C side of things, passing pt_regs and switch_stack
pointers to it, same as we do now if C part decides to insert aux data,
	it can simply use memmove
to shift switch_stack + pt_regs and memcpy whatever's needed into the
created gap.  Then we can return without any kind of magic - the C part
of stack is intact.  Just return the new location of switch_stack +
pt_regs to the (asm) caller, so it could set stack pointer to it.

	The series is on top of 5.14-rc1; it lives in
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #untested.m68k
Individual patches in followups...

	_Very_ lightly tested on aranym; no real hardware to test it on.
Any help with review and testing would be very welcome.

PS:  FWIW, ifdefs in arch/m68k/kernel/signal.c are wrong - it's not !MMU
vs. coldfire/MMU vs. classic/MMU.  It's actually 68000 vs. coldfire vs.
everything else.  These days it's nearly correct, but only because on MMU
variants of coldfire we never see exception stack frames with type other
than 4 - it's controlled by alignment of kernel stack pointer on those,
and it's under the kernel control, so it's always 32bit-aligned.  It used
to be more serious back when we had 68360 support - that's !MMU and exception
stack frames are like those on 68020, unless I'm misreading their manual...

             reply	other threads:[~2021-07-25 17:20 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-25 17:18 Al Viro [this message]
2021-07-25 17:19 ` [PATCH 1/3] m68k: handle arrivals of multiple signals correctly Al Viro
2021-09-15 22:08   ` Michael Schmitz
2021-07-25 17:19 ` [PATCH 2/3] m68k: update ->thread.esp0 before calling syscall_trace() in ret_from_signal Al Viro
2021-09-15 22:19   ` Michael Schmitz
2021-07-25 17:20 ` [PATCH 3/3] m68k: leave stack mangling to asm wrapper of sigreturn() Al Viro
2021-09-15 23:35   ` Michael Schmitz
2021-09-16  0:19     ` Al Viro
2021-09-16  0:53       ` Michael Schmitz
2021-09-16  3:21         ` Al Viro
2021-09-16  5:02           ` Michael Schmitz
2021-09-16 16:14             ` Al Viro
2021-07-27 10:21 ` [RFC][CFT] signal handling fixes Finn Thain
2021-07-27 14:42   ` Al Viro
2021-07-28  1:23     ` Finn Thain
2021-08-11  1:42     ` Finn Thain
2021-09-16  9:03 ` Finn Thain
2021-09-23 14:43   ` Geert Uytterhoeven
2021-09-23 14:45 ` Geert Uytterhoeven

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YP2c1xk9LJ0zE3KW@zeniv-ca.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=geert@linux-m68k.org \
    --cc=gerg@linux-m68k.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-m68k@lists.linux-m68k.org \
    --subject='Re: [RFC][CFT] signal handling fixes' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).