* [PATCH] powerpc/tm: Avoid machine crash on rt_sigreturn
@ 2019-01-16 16:47 Breno Leitao
2019-05-01 7:17 ` Michael Neuling
2019-05-03 6:59 ` Michael Ellerman
0 siblings, 2 replies; 3+ messages in thread
From: Breno Leitao @ 2019-01-16 16:47 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Breno Leitao, mikey, gromero
There is a kernel crash that happens if rt_sigreturn is called inside a
transactional block.
This crash happens if the kernel hits an in-kernel page fault when
accessing userspace memory, usually through copy_ckvsx_to_user(). A major
page fault calls might_sleep() function, which can cause a task reschedule.
A task reschedule (switch_to()) reclaim and recheckpoint the TM states,
but, in the signal return path, the checkpointed memory was already
reclaimed, thus the exception stack has MSR that points to MSR[TS]=0.
When the code returns from might_sleep() and a task reschedule happened,
then this task is returned with the memory recheckpointed, and
CPU MSR[TS] = suspended.
This means that there is a side effect at might_sleep() if it is called
with CPU MSR[TS] = 0 and the task has regs->msr[TS] != 0.
This side effect can cause a TM bad thing, since at the exception entrance,
the stack saves MSR[TS]=0, and this is what will be used at RFID, but,
the processor has MSR[TS] = Suspended, and this transition will be invalid
and a TM Bad thing will be raised, causing the following crash:
Unexpected TM Bad Thing exception at c00000000000e9ec (msr 0x8000000302a03031) tm_scratch=800000010280b033
cpu 0xc: Vector: 700 (Program Check) at [c00000003ff1fd70]
pc: c00000000000e9ec: fast_exception_return+0x100/0x1bc
lr: c000000000032948: handle_rt_signal64+0xb8/0xaf0
sp: c0000004263ebc40
msr: 8000000302a03031
current = 0xc000000415050300
paca = 0xc00000003ffc4080 irqmask: 0x03 irq_happened: 0x01
pid = 25006, comm = sigfuz
Linux version 5.0.0-rc1-00001-g3bd6e94bec12 (breno@debian) (gcc version 8.2.0 (Debian 8.2.0-3)) #899 SMP Mon Jan 7 11:30:07 EST 2019
WARNING: exception is not recoverable, can't continue
enter ? for help
[c0000004263ebc40] c000000000032948 handle_rt_signal64+0xb8/0xaf0 (unreliable)
[c0000004263ebd30] c000000000022780 do_notify_resume+0x2f0/0x430
[c0000004263ebe20] c00000000000e844 ret_from_except_lite+0x70/0x74
--- Exception: c00 (System Call) at 00007fffbaac400c
SP (7fffeca90f40) is in userspace
The solution for this problem is running the sigreturn code with
regs->msr[TS] disabled, thus, avoiding hitting the side effect above. This
does not seem to be a problem since regs->msr will be replaced by the
ucontext value, so, it is being flushed already. In this case, it is
flushed earlier.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
arch/powerpc/kernel/signal_64.c | 27 ++++++++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 6794466f6420..06c299ef6132 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -565,7 +565,7 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
preempt_disable();
/* pull in MSR TS bits from user context */
- regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
+ regs->msr |= msr & MSR_TS_MASK;
/*
* Ensure that TM is enabled in regs->msr before we leave the signal
@@ -745,6 +745,31 @@ SYSCALL_DEFINE0(rt_sigreturn)
if (MSR_TM_SUSPENDED(mfmsr()))
tm_reclaim_current(0);
+ /*
+ * Disable MSR[TS] bit also, so, if there is an exception in the
+ * code below (as a page fault in copy_ckvsx_to_user()), it does
+ * not recheckpoint this task if there was a context switch inside
+ * the exception.
+ *
+ * A major page fault can indirectly call schedule(). A reschedule
+ * process in the middle of an exception can have a side effect
+ * (Changing the CPU MSR[TS] state), since schedule() is called
+ * with the CPU MSR[TS] disable and returns with MSR[TS]=Suspended
+ * (switch_to() calls tm_recheckpoint() for the 'new' process). In
+ * this case, the process continues to be the same in the CPU, but
+ * the CPU state just changed.
+ *
+ * This can cause a TM Bad Thing, since the MSR in the stack will
+ * have the MSR[TS]=0, and this is what will be used to RFID.
+ *
+ * Clearing MSR[TS] state here will avoid a recheckpoint if there
+ * is any process reschedule in kernel space. The MSR[TS] state
+ * does not need to be saved also, since it will be replaced with
+ * the MSR[TS] that came from user context later, at
+ * restore_tm_sigcontexts.
+ */
+ regs->msr &= ~MSR_TS_MASK;
+
if (__get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR]))
goto badframe;
if (MSR_TM_ACTIVE(msr)) {
--
2.19.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] powerpc/tm: Avoid machine crash on rt_sigreturn
2019-01-16 16:47 [PATCH] powerpc/tm: Avoid machine crash on rt_sigreturn Breno Leitao
@ 2019-05-01 7:17 ` Michael Neuling
2019-05-03 6:59 ` Michael Ellerman
1 sibling, 0 replies; 3+ messages in thread
From: Michael Neuling @ 2019-05-01 7:17 UTC (permalink / raw)
To: Breno Leitao, linuxppc-dev; +Cc: gromero
On Wed, 2019-01-16 at 14:47 -0200, Breno Leitao wrote:
> There is a kernel crash that happens if rt_sigreturn is called inside a
> transactional block.
>
> This crash happens if the kernel hits an in-kernel page fault when
> accessing userspace memory, usually through copy_ckvsx_to_user(). A major
> page fault calls might_sleep() function, which can cause a task reschedule.
> A task reschedule (switch_to()) reclaim and recheckpoint the TM states,
> but, in the signal return path, the checkpointed memory was already
> reclaimed, thus the exception stack has MSR that points to MSR[TS]=0.
>
> When the code returns from might_sleep() and a task reschedule happened,
> then this task is returned with the memory recheckpointed, and
> CPU MSR[TS] = suspended.
>
> This means that there is a side effect at might_sleep() if it is called
> with CPU MSR[TS] = 0 and the task has regs->msr[TS] != 0.
>
> This side effect can cause a TM bad thing, since at the exception entrance,
> the stack saves MSR[TS]=0, and this is what will be used at RFID, but,
> the processor has MSR[TS] = Suspended, and this transition will be invalid
> and a TM Bad thing will be raised, causing the following crash:
>
> Unexpected TM Bad Thing exception at c00000000000e9ec (msr
> 0x8000000302a03031) tm_scratch=800000010280b033
> cpu 0xc: Vector: 700 (Program Check) at [c00000003ff1fd70]
> pc: c00000000000e9ec: fast_exception_return+0x100/0x1bc
> lr: c000000000032948: handle_rt_signal64+0xb8/0xaf0
> sp: c0000004263ebc40
> msr: 8000000302a03031
> current = 0xc000000415050300
> paca = 0xc00000003ffc4080 irqmask: 0x03 irq_happened: 0x01
> pid = 25006, comm = sigfuz
> Linux version 5.0.0-rc1-00001-g3bd6e94bec12 (breno@debian) (gcc version
> 8.2.0 (Debian 8.2.0-3)) #899 SMP Mon Jan 7 11:30:07 EST 2019
> WARNING: exception is not recoverable, can't continue
> enter ? for help
> [c0000004263ebc40] c000000000032948 handle_rt_signal64+0xb8/0xaf0
> (unreliable)
> [c0000004263ebd30] c000000000022780 do_notify_resume+0x2f0/0x430
> [c0000004263ebe20] c00000000000e844 ret_from_except_lite+0x70/0x74
> --- Exception: c00 (System Call) at 00007fffbaac400c
> SP (7fffeca90f40) is in userspace
>
> The solution for this problem is running the sigreturn code with
> regs->msr[TS] disabled, thus, avoiding hitting the side effect above. This
> does not seem to be a problem since regs->msr will be replaced by the
> ucontext value, so, it is being flushed already. In this case, it is
> flushed earlier.
>
> Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Michael Neuling <mikey@neuling.org>
This still applies on powerpc/next so just acking rather than reposting
> ---
> arch/powerpc/kernel/signal_64.c | 27 ++++++++++++++++++++++++++-
> 1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
> index 6794466f6420..06c299ef6132 100644
> --- a/arch/powerpc/kernel/signal_64.c
> +++ b/arch/powerpc/kernel/signal_64.c
> @@ -565,7 +565,7 @@ static long restore_tm_sigcontexts(struct task_struct
> *tsk,
> preempt_disable();
>
> /* pull in MSR TS bits from user context */
> - regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
> + regs->msr |= msr & MSR_TS_MASK;
>
> /*
> * Ensure that TM is enabled in regs->msr before we leave the signal
> @@ -745,6 +745,31 @@ SYSCALL_DEFINE0(rt_sigreturn)
> if (MSR_TM_SUSPENDED(mfmsr()))
> tm_reclaim_current(0);
>
> + /*
> + * Disable MSR[TS] bit also, so, if there is an exception in the
> + * code below (as a page fault in copy_ckvsx_to_user()), it does
> + * not recheckpoint this task if there was a context switch inside
> + * the exception.
> + *
> + * A major page fault can indirectly call schedule(). A reschedule
> + * process in the middle of an exception can have a side effect
> + * (Changing the CPU MSR[TS] state), since schedule() is called
> + * with the CPU MSR[TS] disable and returns with MSR[TS]=Suspended
> + * (switch_to() calls tm_recheckpoint() for the 'new' process). In
> + * this case, the process continues to be the same in the CPU, but
> + * the CPU state just changed.
> + *
> + * This can cause a TM Bad Thing, since the MSR in the stack will
> + * have the MSR[TS]=0, and this is what will be used to RFID.
> + *
> + * Clearing MSR[TS] state here will avoid a recheckpoint if there
> + * is any process reschedule in kernel space. The MSR[TS] state
> + * does not need to be saved also, since it will be replaced with
> + * the MSR[TS] that came from user context later, at
> + * restore_tm_sigcontexts.
> + */
> + regs->msr &= ~MSR_TS_MASK;
> +
> if (__get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR]))
> goto badframe;
> if (MSR_TM_ACTIVE(msr)) {
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] powerpc/tm: Avoid machine crash on rt_sigreturn
2019-01-16 16:47 [PATCH] powerpc/tm: Avoid machine crash on rt_sigreturn Breno Leitao
2019-05-01 7:17 ` Michael Neuling
@ 2019-05-03 6:59 ` Michael Ellerman
1 sibling, 0 replies; 3+ messages in thread
From: Michael Ellerman @ 2019-05-03 6:59 UTC (permalink / raw)
To: Breno Leitao, linuxppc-dev; +Cc: Breno Leitao, mikey, gromero
On Wed, 2019-01-16 at 16:47:44 UTC, Breno Leitao wrote:
> There is a kernel crash that happens if rt_sigreturn is called inside a
> transactional block.
>
> This crash happens if the kernel hits an in-kernel page fault when
> accessing userspace memory, usually through copy_ckvsx_to_user(). A major
> page fault calls might_sleep() function, which can cause a task reschedule.
> A task reschedule (switch_to()) reclaim and recheckpoint the TM states,
> but, in the signal return path, the checkpointed memory was already
> reclaimed, thus the exception stack has MSR that points to MSR[TS]=0.
>
> When the code returns from might_sleep() and a task reschedule happened,
> then this task is returned with the memory recheckpointed, and
> CPU MSR[TS] = suspended.
>
> This means that there is a side effect at might_sleep() if it is called
> with CPU MSR[TS] = 0 and the task has regs->msr[TS] != 0.
>
> This side effect can cause a TM bad thing, since at the exception entrance,
> the stack saves MSR[TS]=0, and this is what will be used at RFID, but,
> the processor has MSR[TS] = Suspended, and this transition will be invalid
> and a TM Bad thing will be raised, causing the following crash:
>
> Unexpected TM Bad Thing exception at c00000000000e9ec (msr 0x8000000302a03031) tm_scratch=800000010280b033
> cpu 0xc: Vector: 700 (Program Check) at [c00000003ff1fd70]
> pc: c00000000000e9ec: fast_exception_return+0x100/0x1bc
> lr: c000000000032948: handle_rt_signal64+0xb8/0xaf0
> sp: c0000004263ebc40
> msr: 8000000302a03031
> current = 0xc000000415050300
> paca = 0xc00000003ffc4080 irqmask: 0x03 irq_happened: 0x01
> pid = 25006, comm = sigfuz
> Linux version 5.0.0-rc1-00001-g3bd6e94bec12 (breno@debian) (gcc version 8.2.0 (Debian 8.2.0-3)) #899 SMP Mon Jan 7 11:30:07 EST 2019
> WARNING: exception is not recoverable, can't continue
> enter ? for help
> [c0000004263ebc40] c000000000032948 handle_rt_signal64+0xb8/0xaf0 (unreliable)
> [c0000004263ebd30] c000000000022780 do_notify_resume+0x2f0/0x430
> [c0000004263ebe20] c00000000000e844 ret_from_except_lite+0x70/0x74
> --- Exception: c00 (System Call) at 00007fffbaac400c
> SP (7fffeca90f40) is in userspace
>
> The solution for this problem is running the sigreturn code with
> regs->msr[TS] disabled, thus, avoiding hitting the side effect above. This
> does not seem to be a problem since regs->msr will be replaced by the
> ucontext value, so, it is being flushed already. In this case, it is
> flushed earlier.
>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> Acked-by: Michael Neuling <mikey@neuling.org>
Applied to powerpc next, thanks.
https://git.kernel.org/powerpc/c/e620d45065c7b5b8d6ae11217c09c093
cheers
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2019-05-03 7:04 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-16 16:47 [PATCH] powerpc/tm: Avoid machine crash on rt_sigreturn Breno Leitao
2019-05-01 7:17 ` Michael Neuling
2019-05-03 6:59 ` Michael Ellerman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).