All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
@ 2018-11-19 12:44 ` Breno Leitao
  0 siblings, 0 replies; 10+ messages in thread
From: Breno Leitao @ 2018-11-19 12:44 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mikey, mpe, gromero, v3.9+, Breno Leitao

On a signal handler return, the user could set a context with MSR[TS] bits
set, and these bits would be copied to task regs->msr.

At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
several __get_user() are called and then a recheckpoint is executed.

This is a problem since a page fault (in kernel space) could happen when
calling __get_user(). If it happens, the process MSR[TS] bits were
already set, but recheckpoint was not executed, and SPRs are still invalid.

The page fault can cause the current process to be de-scheduled, with
MSR[TS] active and without tm_recheckpoint() being called.  More
importantly, without TEXAR[FS] bit set also.

Since TEXASR might not have the FS bit set, and when the process is
scheduled back, it will try to reclaim, which will be aborted because of
the CPU is not in the suspended state, and, then, recheckpoint. This
recheckpoint will restore thread->texasr into TEXASR SPR, which might be
zero, hitting a BUG_ON().

	[ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446!

This patch simply delays the MSR[TS] set, so, if there is any page fault in
the __get_user() section, it does not have regs->msr[TS] set, since the TM
structures are still invalid, thus avoiding doing TM operations for
in-kernel exceptions and possible process reschedule.

With this patch, the MSR[TS] will only be set just before recheckpointing
and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in
invalid state.

It is not possible to move tm_recheckpoint to happen earlier, because it is
required to get the checkpointed registers from userspace, with
__get_user(), thus, the only way to avoid this undesired behavior is
delaying the MSR[TS] set, as done in this patch.

Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals")
Cc: stable@vger.kernel.org (v3.9+)
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 arch/powerpc/kernel/signal_64.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 83d51bf586c7..15b153bdd826 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
 	if (MSR_TM_RESV(msr))
 		return -EINVAL;
 
-	/* pull in MSR TS bits from user context */
-	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
-
-	/*
-	 * Ensure that TM is enabled in regs->msr before we leave the signal
-	 * handler. It could be the case that (a) user disabled the TM bit
-	 * through the manipulation of the MSR bits in uc_mcontext or (b) the
-	 * TM bit was disabled because a sufficient number of context switches
-	 * happened whilst in the signal handler and load_tm overflowed,
-	 * disabling the TM bit. In either case we can end up with an illegal
-	 * TM state leading to a TM Bad Thing when we return to userspace.
-	 */
-	regs->msr |= MSR_TM;
-
 	/* pull in MSR LE from user context */
 	regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
 
@@ -572,6 +558,21 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
 	tm_enable();
 	/* Make sure the transaction is marked as failed */
 	tsk->thread.tm_texasr |= TEXASR_FS;
+
+	/* pull in MSR TS bits from user context */
+	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
+
+	/*
+	 * Ensure that TM is enabled in regs->msr before we leave the signal
+	 * handler. It could be the case that (a) user disabled the TM bit
+	 * through the manipulation of the MSR bits in uc_mcontext or (b) the
+	 * TM bit was disabled because a sufficient number of context switches
+	 * happened whilst in the signal handler and load_tm overflowed,
+	 * disabling the TM bit. In either case we can end up with an illegal
+	 * TM state leading to a TM Bad Thing when we return to userspace.
+	 */
+	regs->msr |= MSR_TM;
+
 	/* This loads the checkpointed FP/VEC state, if used */
 	tm_recheckpoint(&tsk->thread);
 
-- 
2.19.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
@ 2018-11-19 12:44 ` Breno Leitao
  0 siblings, 0 replies; 10+ messages in thread
From: Breno Leitao @ 2018-11-19 12:44 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Breno Leitao, mikey, v3.9+, gromero

On a signal handler return, the user could set a context with MSR[TS] bits
set, and these bits would be copied to task regs->msr.

At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
several __get_user() are called and then a recheckpoint is executed.

This is a problem since a page fault (in kernel space) could happen when
calling __get_user(). If it happens, the process MSR[TS] bits were
already set, but recheckpoint was not executed, and SPRs are still invalid.

The page fault can cause the current process to be de-scheduled, with
MSR[TS] active and without tm_recheckpoint() being called.  More
importantly, without TEXAR[FS] bit set also.

Since TEXASR might not have the FS bit set, and when the process is
scheduled back, it will try to reclaim, which will be aborted because of
the CPU is not in the suspended state, and, then, recheckpoint. This
recheckpoint will restore thread->texasr into TEXASR SPR, which might be
zero, hitting a BUG_ON().

	[ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446!

This patch simply delays the MSR[TS] set, so, if there is any page fault in
the __get_user() section, it does not have regs->msr[TS] set, since the TM
structures are still invalid, thus avoiding doing TM operations for
in-kernel exceptions and possible process reschedule.

With this patch, the MSR[TS] will only be set just before recheckpointing
and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in
invalid state.

It is not possible to move tm_recheckpoint to happen earlier, because it is
required to get the checkpointed registers from userspace, with
__get_user(), thus, the only way to avoid this undesired behavior is
delaying the MSR[TS] set, as done in this patch.

Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals")
Cc: stable@vger.kernel.org (v3.9+)
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 arch/powerpc/kernel/signal_64.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 83d51bf586c7..15b153bdd826 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
 	if (MSR_TM_RESV(msr))
 		return -EINVAL;
 
-	/* pull in MSR TS bits from user context */
-	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
-
-	/*
-	 * Ensure that TM is enabled in regs->msr before we leave the signal
-	 * handler. It could be the case that (a) user disabled the TM bit
-	 * through the manipulation of the MSR bits in uc_mcontext or (b) the
-	 * TM bit was disabled because a sufficient number of context switches
-	 * happened whilst in the signal handler and load_tm overflowed,
-	 * disabling the TM bit. In either case we can end up with an illegal
-	 * TM state leading to a TM Bad Thing when we return to userspace.
-	 */
-	regs->msr |= MSR_TM;
-
 	/* pull in MSR LE from user context */
 	regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
 
@@ -572,6 +558,21 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
 	tm_enable();
 	/* Make sure the transaction is marked as failed */
 	tsk->thread.tm_texasr |= TEXASR_FS;
+
+	/* pull in MSR TS bits from user context */
+	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
+
+	/*
+	 * Ensure that TM is enabled in regs->msr before we leave the signal
+	 * handler. It could be the case that (a) user disabled the TM bit
+	 * through the manipulation of the MSR bits in uc_mcontext or (b) the
+	 * TM bit was disabled because a sufficient number of context switches
+	 * happened whilst in the signal handler and load_tm overflowed,
+	 * disabling the TM bit. In either case we can end up with an illegal
+	 * TM state leading to a TM Bad Thing when we return to userspace.
+	 */
+	regs->msr |= MSR_TM;
+
 	/* This loads the checkpointed FP/VEC state, if used */
 	tm_recheckpoint(&tsk->thread);
 
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
  2018-11-19 12:44 ` Breno Leitao
  (?)
@ 2018-11-19 23:30 ` Michael Neuling
  2018-11-21 18:27   ` Breno Leitao
  -1 siblings, 1 reply; 10+ messages in thread
From: Michael Neuling @ 2018-11-19 23:30 UTC (permalink / raw)
  To: Breno Leitao, linuxppc-dev; +Cc: Gustavo Romero

On Mon, 2018-11-19 at 10:44 -0200, Breno Leitao wrote:
> On a signal handler return, the user could set a context with MSR[TS] bits
> set, and these bits would be copied to task regs->msr.
> 
> At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
> several __get_user() are called and then a recheckpoint is executed.

Do we have the same problem on the entry side?  There are a bunch of
copy_from_user() before we do a tm_reclaim() (ie when still transactional). Is
there some chance of the same problem there?

> This is a problem since a page fault (in kernel space) could happen when
> calling __get_user(). If it happens, the process MSR[TS] bits were
> already set, but recheckpoint was not executed, and SPRs are still invalid.
> 
> The page fault can cause the current process to be de-scheduled, with
> MSR[TS] active and without tm_recheckpoint() being called.  More
> importantly, without TEXAR[FS] bit set also.

This patch is great and should go in ASAP 

> Since TEXASR might not have the FS bit set, and when the process is
> scheduled back, it will try to reclaim, which will be aborted because of
> the CPU is not in the suspended state, and, then, recheckpoint. This
> recheckpoint will restore thread->texasr into TEXASR SPR, which might be
> zero, hitting a BUG_ON().
> 
> 	[ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446!

Can you put more of the backtrace here?  Can be useful for people searching for
a similar problem.

> This patch simply delays the MSR[TS] set, so, if there is any page fault in
> the __get_user() section, it does not have regs->msr[TS] set, since the TM
> structures are still invalid, thus avoiding doing TM operations for
> in-kernel exceptions and possible process reschedule.

Can you put a comment in the code what says after the MSR[TS] setting, there can
be no get/put_user before the recheckpoint? 

Also, it looks like the 32bit signals case is safe, but please add a comment in
there too.

> With this patch, the MSR[TS] will only be set just before recheckpointing
> and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in
> invalid state.
> 
> It is not possible to move tm_recheckpoint to happen earlier, because it is
> required to get the checkpointed registers from userspace, with
> __get_user(), thus, the only way to avoid this undesired behavior is
> delaying the MSR[TS] set, as done in this patch.
> 
> Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals")
> Cc: stable@vger.kernel.org (v3.9+)
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  arch/powerpc/kernel/signal_64.c | 29 +++++++++++++++--------------
>  1 file changed, 15 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
> index 83d51bf586c7..15b153bdd826 100644
> --- a/arch/powerpc/kernel/signal_64.c
> +++ b/arch/powerpc/kernel/signal_64.c
> @@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
>  	if (MSR_TM_RESV(msr))
>  		return -EINVAL;
>  
> -	/* pull in MSR TS bits from user context */
> -	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
> -
> -	/*
> -	 * Ensure that TM is enabled in regs->msr before we leave the signal
> -	 * handler. It could be the case that (a) user disabled the TM bit
> -	 * through the manipulation of the MSR bits in uc_mcontext or (b) the
> -	 * TM bit was disabled because a sufficient number of context switches
> -	 * happened whilst in the signal handler and load_tm overflowed,
> -	 * disabling the TM bit. In either case we can end up with an illegal
> -	 * TM state leading to a TM Bad Thing when we return to userspace.
> -	 */
> -	regs->msr |= MSR_TM;
> -
>  	/* pull in MSR LE from user context */
>  	regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
>  
> @@ -572,6 +558,21 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
>  	tm_enable();
>  	/* Make sure the transaction is marked as failed */
>  	tsk->thread.tm_texasr |= TEXASR_FS;
> +
> +	/* pull in MSR TS bits from user context */
> +	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
> +
> +	/*
> +	 * Ensure that TM is enabled in regs->msr before we leave the signal
> +	 * handler. It could be the case that (a) user disabled the TM bit
> +	 * through the manipulation of ararararthe MSR bits in uc_mcontext or (b) the
> +	 * TM bit was disabled because a sufficient number of context switches
> +	 * happened whilst in the signal handler and load_tm overflowed,
> +	 * disabling the TM bit. In either case we can end up with an illegal
> +	 * TM state leading to a TM Bad Thing when we return to userspace.
> +	 */
> +	regs->msr |= MSR_TM;
> +
>  	/* This loads the checkpointed FP/VEC state, if used */
>  	tm_recheckpoint(&tsk->thread);
>  

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
  2018-11-19 12:44 ` Breno Leitao
@ 2018-11-20 10:34   ` Michael Ellerman
  -1 siblings, 0 replies; 10+ messages in thread
From: Michael Ellerman @ 2018-11-20 10:34 UTC (permalink / raw)
  To: Breno Leitao, linuxppc-dev; +Cc: mikey, gromero, v3.9+, Breno Leitao

Hi Breno,

Thanks for chasing this one down.

Breno Leitao <leitao@debian.org> writes:

> On a signal handler return, the user could set a context with MSR[TS] bits
> set, and these bits would be copied to task regs->msr.
>
> At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
> several __get_user() are called and then a recheckpoint is executed.
>
> This is a problem since a page fault (in kernel space) could happen when
> calling __get_user(). If it happens, the process MSR[TS] bits were
> already set, but recheckpoint was not executed, and SPRs are still invalid.
>
> The page fault can cause the current process to be de-scheduled, with
> MSR[TS] active and without tm_recheckpoint() being called.  More
> importantly, without TEXAR[FS] bit set also.
>
> Since TEXASR might not have the FS bit set, and when the process is
> scheduled back, it will try to reclaim, which will be aborted because of
> the CPU is not in the suspended state, and, then, recheckpoint. This
> recheckpoint will restore thread->texasr into TEXASR SPR, which might be
> zero, hitting a BUG_ON().
>
> 	[ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446!

As Mikey said, would be good to have at least the stack trace & NIP
here, if not the full oops.

> This patch simply delays the MSR[TS] set, so, if there is any page fault in
> the __get_user() section, it does not have regs->msr[TS] set, since the TM
> structures are still invalid, thus avoiding doing TM operations for
> in-kernel exceptions and possible process reschedule.
>
> With this patch, the MSR[TS] will only be set just before recheckpointing
> and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in
> invalid state.

To make this safe when PREEMPT is enabled we need to preempt_disable() /
enable() around the setting of regs->msr and the recheckpoint.

That could also serve as nice documentation.

I guess the other question is whether it should be the job of
tm_recheckpoint() to set regs->msr, given that it already hard disables
interrupts.

eg. we'd set the TM flags in a local msr variable and pass the to
tm_recheckpoint(), it would then assign that to regs->msr in the IRQ
disabled section. Though there's many callers of tm_recheckpoint() that
don't need that behaviour, so it would probably need to be factored out.

> It is not possible to move tm_recheckpoint to happen earlier, because it is
> required to get the checkpointed registers from userspace, with
> __get_user(), thus, the only way to avoid this undesired behavior is
> delaying the MSR[TS] set, as done in this patch.

I think the root cause here is that we're copying into the live regs of
current. That has obviously worked in the past, because the register
state wasn't used until we returned back to userspace. But that's no
longer true with TM. And even so it's quite subtle. I also suspect some
of our FP/VEC handling may not work correctly if we're scheduled part
way through restoring the regs.

What might work better is if we copy all the regs into temporary space
and then with interrupts disabled we copy them into the task. That way
we should never be scheduled with a half-populated set of regs.

That's obviously a much bigger patch though and something we'll have to
do later.

> Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals")
> Cc: stable@vger.kernel.org (v3.9+)
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  arch/powerpc/kernel/signal_64.c | 29 +++++++++++++++--------------
>  1 file changed, 15 insertions(+), 14 deletions(-)
>
> diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
> index 83d51bf586c7..15b153bdd826 100644
> --- a/arch/powerpc/kernel/signal_64.c
> +++ b/arch/powerpc/kernel/signal_64.c
> @@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
>  	if (MSR_TM_RESV(msr))
>  		return -EINVAL;
>  
> -	/* pull in MSR TS bits from user context */
> -	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
> -
> -	/*
> -	 * Ensure that TM is enabled in regs->msr before we leave the signal
> -	 * handler. It could be the case that (a) user disabled the TM bit
> -	 * through the manipulation of the MSR bits in uc_mcontext or (b) the
> -	 * TM bit was disabled because a sufficient number of context switches
> -	 * happened whilst in the signal handler and load_tm overflowed,
> -	 * disabling the TM bit. In either case we can end up with an illegal
> -	 * TM state leading to a TM Bad Thing when we return to userspace.
> -	 */
> -	regs->msr |= MSR_TM;
> -
>  	/* pull in MSR LE from user context */
>  	regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
>  
> @@ -572,6 +558,21 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
>  	tm_enable();
>  	/* Make sure the transaction is marked as failed */
>  	tsk->thread.tm_texasr |= TEXASR_FS;
> +

	preempt_disable();

> +	/* pull in MSR TS bits from user context */
> +	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
> +
> +	/*
> +	 * Ensure that TM is enabled in regs->msr before we leave the signal
> +	 * handler. It could be the case that (a) user disabled the TM bit
> +	 * through the manipulation of the MSR bits in uc_mcontext or (b) the
> +	 * TM bit was disabled because a sufficient number of context switches
> +	 * happened whilst in the signal handler and load_tm overflowed,
> +	 * disabling the TM bit. In either case we can end up with an illegal
> +	 * TM state leading to a TM Bad Thing when we return to userspace.
> +	 */
> +	regs->msr |= MSR_TM;
> +
>  	/* This loads the checkpointed FP/VEC state, if used */
>  	tm_recheckpoint(&tsk->thread);
> 
	preempt_enable();

Although looking at the code that follows, it probably won't cope with
being preempted either. So the preempt_enable() should probably go at
the end of the function.

cheers

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
@ 2018-11-20 10:34   ` Michael Ellerman
  0 siblings, 0 replies; 10+ messages in thread
From: Michael Ellerman @ 2018-11-20 10:34 UTC (permalink / raw)
  To: Breno Leitao, linuxppc-dev; +Cc: Breno Leitao, mikey, v3.9+, gromero

Hi Breno,

Thanks for chasing this one down.

Breno Leitao <leitao@debian.org> writes:

> On a signal handler return, the user could set a context with MSR[TS] bits
> set, and these bits would be copied to task regs->msr.
>
> At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
> several __get_user() are called and then a recheckpoint is executed.
>
> This is a problem since a page fault (in kernel space) could happen when
> calling __get_user(). If it happens, the process MSR[TS] bits were
> already set, but recheckpoint was not executed, and SPRs are still invalid.
>
> The page fault can cause the current process to be de-scheduled, with
> MSR[TS] active and without tm_recheckpoint() being called.  More
> importantly, without TEXAR[FS] bit set also.
>
> Since TEXASR might not have the FS bit set, and when the process is
> scheduled back, it will try to reclaim, which will be aborted because of
> the CPU is not in the suspended state, and, then, recheckpoint. This
> recheckpoint will restore thread->texasr into TEXASR SPR, which might be
> zero, hitting a BUG_ON().
>
> 	[ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446!

As Mikey said, would be good to have at least the stack trace & NIP
here, if not the full oops.

> This patch simply delays the MSR[TS] set, so, if there is any page fault in
> the __get_user() section, it does not have regs->msr[TS] set, since the TM
> structures are still invalid, thus avoiding doing TM operations for
> in-kernel exceptions and possible process reschedule.
>
> With this patch, the MSR[TS] will only be set just before recheckpointing
> and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in
> invalid state.

To make this safe when PREEMPT is enabled we need to preempt_disable() /
enable() around the setting of regs->msr and the recheckpoint.

That could also serve as nice documentation.

I guess the other question is whether it should be the job of
tm_recheckpoint() to set regs->msr, given that it already hard disables
interrupts.

eg. we'd set the TM flags in a local msr variable and pass the to
tm_recheckpoint(), it would then assign that to regs->msr in the IRQ
disabled section. Though there's many callers of tm_recheckpoint() that
don't need that behaviour, so it would probably need to be factored out.

> It is not possible to move tm_recheckpoint to happen earlier, because it is
> required to get the checkpointed registers from userspace, with
> __get_user(), thus, the only way to avoid this undesired behavior is
> delaying the MSR[TS] set, as done in this patch.

I think the root cause here is that we're copying into the live regs of
current. That has obviously worked in the past, because the register
state wasn't used until we returned back to userspace. But that's no
longer true with TM. And even so it's quite subtle. I also suspect some
of our FP/VEC handling may not work correctly if we're scheduled part
way through restoring the regs.

What might work better is if we copy all the regs into temporary space
and then with interrupts disabled we copy them into the task. That way
we should never be scheduled with a half-populated set of regs.

That's obviously a much bigger patch though and something we'll have to
do later.

> Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals")
> Cc: stable@vger.kernel.org (v3.9+)
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  arch/powerpc/kernel/signal_64.c | 29 +++++++++++++++--------------
>  1 file changed, 15 insertions(+), 14 deletions(-)
>
> diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
> index 83d51bf586c7..15b153bdd826 100644
> --- a/arch/powerpc/kernel/signal_64.c
> +++ b/arch/powerpc/kernel/signal_64.c
> @@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
>  	if (MSR_TM_RESV(msr))
>  		return -EINVAL;
>  
> -	/* pull in MSR TS bits from user context */
> -	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
> -
> -	/*
> -	 * Ensure that TM is enabled in regs->msr before we leave the signal
> -	 * handler. It could be the case that (a) user disabled the TM bit
> -	 * through the manipulation of the MSR bits in uc_mcontext or (b) the
> -	 * TM bit was disabled because a sufficient number of context switches
> -	 * happened whilst in the signal handler and load_tm overflowed,
> -	 * disabling the TM bit. In either case we can end up with an illegal
> -	 * TM state leading to a TM Bad Thing when we return to userspace.
> -	 */
> -	regs->msr |= MSR_TM;
> -
>  	/* pull in MSR LE from user context */
>  	regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
>  
> @@ -572,6 +558,21 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
>  	tm_enable();
>  	/* Make sure the transaction is marked as failed */
>  	tsk->thread.tm_texasr |= TEXASR_FS;
> +

	preempt_disable();

> +	/* pull in MSR TS bits from user context */
> +	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
> +
> +	/*
> +	 * Ensure that TM is enabled in regs->msr before we leave the signal
> +	 * handler. It could be the case that (a) user disabled the TM bit
> +	 * through the manipulation of the MSR bits in uc_mcontext or (b) the
> +	 * TM bit was disabled because a sufficient number of context switches
> +	 * happened whilst in the signal handler and load_tm overflowed,
> +	 * disabling the TM bit. In either case we can end up with an illegal
> +	 * TM state leading to a TM Bad Thing when we return to userspace.
> +	 */
> +	regs->msr |= MSR_TM;
> +
>  	/* This loads the checkpointed FP/VEC state, if used */
>  	tm_recheckpoint(&tsk->thread);
> 
	preempt_enable();

Although looking at the code that follows, it probably won't cope with
being preempted either. So the preempt_enable() should probably go at
the end of the function.

cheers

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
  2018-11-19 23:30 ` Michael Neuling
@ 2018-11-21 18:27   ` Breno Leitao
  2018-11-21 22:42     ` Michael Neuling
  0 siblings, 1 reply; 10+ messages in thread
From: Breno Leitao @ 2018-11-21 18:27 UTC (permalink / raw)
  To: Michael Neuling, linuxppc-dev; +Cc: Gustavo Romero

hi Mikey,

On 11/19/18 9:30 PM, Michael Neuling wrote:
> On Mon, 2018-11-19 at 10:44 -0200, Breno Leitao wrote:
>> On a signal handler return, the user could set a context with MSR[TS] bits
>> set, and these bits would be copied to task regs->msr.
>>
>> At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
>> several __get_user() are called and then a recheckpoint is executed.
> 
> Do we have the same problem on the entry side?  There are a bunch of
> copy_from_user() before we do a tm_reclaim() (ie when still transactional). Is
> there some chance of the same problem there?

Do you mean in this part of code?

  SYSCALL_DEFINE0(rt_sigreturn)
  {
	....
        if (__copy_from_user(&set, &uc->uc_sigmask, sizeof(set)))
                goto badframe;

	...
        if (MSR_TM_SUSPENDED(mfmsr()))
                tm_reclaim_current(0);

I suspect it is not a problem, since a page fault here will cause the process
to be reschedule, and tm-reclaim and tm_recheckpoint will happen, and save
and restore the checkpointed/live registers before this tm_reclaim_current(0).

I also forced a process reschedule (calling schedule()) just after
copy_from_user() and I didn't see it breaking anything. I might spent more
timing thinking about this whole get_user() at signal handlers and I will
double check it.

>> This is a problem since a page fault (in kernel space) could happen when
>> calling __get_user(). If it happens, the process MSR[TS] bits were
>> already set, but recheckpoint was not executed, and SPRs are still invalid.
>>
>> The page fault can cause the current process to be de-scheduled, with
>> MSR[TS] active and without tm_recheckpoint() being called.  More
>> importantly, without TEXAR[FS] bit set also.
> 
> This patch is great and should go in ASAP 
> 
>> Since TEXASR might not have the FS bit set, and when the process is
>> scheduled back, it will try to reclaim, which will be aborted because of
>> the CPU is not in the suspended state, and, then, recheckpoint. This
>> recheckpoint will restore thread->texasr into TEXASR SPR, which might be
>> zero, hitting a BUG_ON().
>>
>> 	[ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446!
> 
> Can you put more of the backtrace here?  Can be useful for people searching for
> a similar problem.

Ack!

> 
>> This patch simply delays the MSR[TS] set, so, if there is any page fault in
>> the __get_user() section, it does not have regs->msr[TS] set, since the TM
>> structures are still invalid, thus avoiding doing TM operations for
>> in-kernel exceptions and possible process reschedule.
> 
> Can you put a comment in the code what says after the MSR[TS] setting, there can
> be no get/put_user before the recheckpoint?

Sure. I will circle this code with a preempt_disable/enable() and put a
comment there as well.

> Also, it looks like the 32bit signals case is safe, but please add a comment in
> there too.

Yes, 32-bits seems to be safe, but, I will do the same as above, adding
preempt disable/enable and warn about possible interrupts in-between.

Thank you!
Breno

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
  2018-11-20 10:34   ` Michael Ellerman
@ 2018-11-21 18:32     ` Breno Leitao
  -1 siblings, 0 replies; 10+ messages in thread
From: Breno Leitao @ 2018-11-21 18:32 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: mikey, gromero, v3.9+

hi Michael,

On 11/20/18 8:34 AM, Michael Ellerman wrote:
> Hi Breno,
> 
> Thanks for chasing this one down.
> 
> Breno Leitao <leitao@debian.org> writes:
> 
>> On a signal handler return, the user could set a context with MSR[TS] bits
>> set, and these bits would be copied to task regs->msr.
>>
>> At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
>> several __get_user() are called and then a recheckpoint is executed.
>>
>> This is a problem since a page fault (in kernel space) could happen when
>> calling __get_user(). If it happens, the process MSR[TS] bits were
>> already set, but recheckpoint was not executed, and SPRs are still invalid.
>>
>> The page fault can cause the current process to be de-scheduled, with
>> MSR[TS] active and without tm_recheckpoint() being called.  More
>> importantly, without TEXAR[FS] bit set also.
>>
>> Since TEXASR might not have the FS bit set, and when the process is
>> scheduled back, it will try to reclaim, which will be aborted because of
>> the CPU is not in the suspended state, and, then, recheckpoint. This
>> recheckpoint will restore thread->texasr into TEXASR SPR, which might be
>> zero, hitting a BUG_ON().
>>
>> 	[ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446!
> 
> As Mikey said, would be good to have at least the stack trace & NIP
> here, if not the full oops.

Ack!

>> This patch simply delays the MSR[TS] set, so, if there is any page fault in
>> the __get_user() section, it does not have regs->msr[TS] set, since the TM
>> structures are still invalid, thus avoiding doing TM operations for
>> in-kernel exceptions and possible process reschedule.
>>
>> With this patch, the MSR[TS] will only be set just before recheckpointing
>> and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in
>> invalid state.
> 
> To make this safe when PREEMPT is enabled we need to preempt_disable() /
> enable() around the setting of regs->msr and the recheckpoint.
> 
> That could also serve as nice documentation.
> 
> I guess the other question is whether it should be the job of
> tm_recheckpoint() to set regs->msr, given that it already hard disables
> interrupts.
> 
> eg. we'd set the TM flags in a local msr variable and pass the to
> tm_recheckpoint(), it would then assign that to regs->msr in the IRQ
> disabled section. Though there's many callers of tm_recheckpoint() that
> don't need that behaviour, so it would probably need to be factored out.

Right, that might be doable, but I am wondering if it isn't better to create
a new function that does it as below:

void tm_set_msr_and_recheckpoint(struct pt_regs *regs, u64 msr)
{
       preempt_disable();
       regs->msr = msr;
       tm_recheckpoint();
       preempt_enable();
}

I understand that preempt_disable() does a better work disabling preemption
compared to disabling IRQ. Also, it does not change the API just for this
very specific signal case.

>> It is not possible to move tm_recheckpoint to happen earlier, because it is
>> required to get the checkpointed registers from userspace, with
>> __get_user(), thus, the only way to avoid this undesired behavior is
>> delaying the MSR[TS] set, as done in this patch.
> 
> I think the root cause here is that we're copying into the live regs of
> current. That has obviously worked in the past, because the register
> state wasn't used until we returned back to userspace. But that's no
> longer true with TM. And even so it's quite subtle. I also suspect some
> of our FP/VEC handling may not work correctly if we're scheduled part
> way through restoring the regs.

I got your point and I think we have a risk of corrupting the regs if MSR[TS]
is set and we do page_fault->recheckpoint/recheclaim in the middle of
__get_user() chunks.

> What might work better is if we copy all the regs into temporary space
> and then with interrupts disabled we copy them into the task. That way
> we should never be scheduled with a half-populated set of regs.

Yes, that seems to be the best strategy, and I might be interested in working
on it.

> That's obviously a much bigger patch though and something we'll have to
> do later.
> 
>> Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals")
>> Cc: stable@vger.kernel.org (v3.9+)
>> Signed-off-by: Breno Leitao <leitao@debian.org>
>> ---
>>  arch/powerpc/kernel/signal_64.c | 29 +++++++++++++++--------------
>>  1 file changed, 15 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
>> index 83d51bf586c7..15b153bdd826 100644
>> --- a/arch/powerpc/kernel/signal_64.c
>> +++ b/arch/powerpc/kernel/signal_64.c
>> @@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
>>  	if (MSR_TM_RESV(msr))
>>  		return -EINVAL;
>>  
>> -	/* pull in MSR TS bits from user context */
>> -	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
>> -
>> -	/*
>> -	 * Ensure that TM is enabled in regs->msr before we leave the signal
>> -	 * handler. It could be the case that (a) user disabled the TM bit
>> -	 * through the manipulation of the MSR bits in uc_mcontext or (b) the
>> -	 * TM bit was disabled because a sufficient number of context switches
>> -	 * happened whilst in the signal handler and load_tm overflowed,
>> -	 * disabling the TM bit. In either case we can end up with an illegal
>> -	 * TM state leading to a TM Bad Thing when we return to userspace.
>> -	 */
>> -	regs->msr |= MSR_TM;
>> -
>>  	/* pull in MSR LE from user context */
>>  	regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
>>  
>> @@ -572,6 +558,21 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
>>  	tm_enable();
>>  	/* Make sure the transaction is marked as failed */
>>  	tsk->thread.tm_texasr |= TEXASR_FS;
>> +
> 
> 	preempt_disable();
> 
>> +	/* pull in MSR TS bits from user context */
>> +	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
>> +
>> +	/*
>> +	 * Ensure that TM is enabled in regs->msr before we leave the signal
>> +	 * handler. It could be the case that (a) user disabled the TM bit
>> +	 * through the manipulation of the MSR bits in uc_mcontext or (b) the
>> +	 * TM bit was disabled because a sufficient number of context switches
>> +	 * happened whilst in the signal handler and load_tm overflowed,
>> +	 * disabling the TM bit. In either case we can end up with an illegal
>> +	 * TM state leading to a TM Bad Thing when we return to userspace.
>> +	 */
>> +	regs->msr |= MSR_TM;
>> +
>>  	/* This loads the checkpointed FP/VEC state, if used */
>>  	tm_recheckpoint(&tsk->thread);
>>
> 	preempt_enable();
> 
> Although looking at the code that follows, it probably won't cope with
> being preempted either. So the preempt_enable() should probably go at
> the end of the function.

Why? I confess I do not understand the preempt mechanism a lot. Does a
preemption save/restore FP and vector states when it is preempted?

What is the code/IRQ that is executed when there is a preemption? I am
wondering if a preempt happens because the  Decrementer (0x900) IRQ  happens
and a syscall is being executed, thus, saving the whole context and then
restoring it. If my guess is correct, the IRQ might not save the FP/VEC
states, thus, your concern about ahving load_fp_state() after a preemption.

Let me paste the "patched" code here to help with the discussion:

        regs->msr |= MSR_TM;

        /* This loads the checkpointed FP/VEC state, if used */
        tm_recheckpoint(&tsk->thread);

        msr_check_and_set(msr & (MSR_FP | MSR_VEC));
        if (msr & MSR_FP) {
                load_fp_state(&tsk->thread.fp_state);
                regs->msr |= (MSR_FP | tsk->thread.fpexc_mode);
        }
        if (msr & MSR_VEC) {
                load_vr_state(&tsk->thread.vr_state);
                regs->msr |= MSR_VEC;
        }

Thanks
Breno

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
@ 2018-11-21 18:32     ` Breno Leitao
  0 siblings, 0 replies; 10+ messages in thread
From: Breno Leitao @ 2018-11-21 18:32 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: mikey, v3.9+, gromero

hi Michael,

On 11/20/18 8:34 AM, Michael Ellerman wrote:
> Hi Breno,
> 
> Thanks for chasing this one down.
> 
> Breno Leitao <leitao@debian.org> writes:
> 
>> On a signal handler return, the user could set a context with MSR[TS] bits
>> set, and these bits would be copied to task regs->msr.
>>
>> At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
>> several __get_user() are called and then a recheckpoint is executed.
>>
>> This is a problem since a page fault (in kernel space) could happen when
>> calling __get_user(). If it happens, the process MSR[TS] bits were
>> already set, but recheckpoint was not executed, and SPRs are still invalid.
>>
>> The page fault can cause the current process to be de-scheduled, with
>> MSR[TS] active and without tm_recheckpoint() being called.  More
>> importantly, without TEXAR[FS] bit set also.
>>
>> Since TEXASR might not have the FS bit set, and when the process is
>> scheduled back, it will try to reclaim, which will be aborted because of
>> the CPU is not in the suspended state, and, then, recheckpoint. This
>> recheckpoint will restore thread->texasr into TEXASR SPR, which might be
>> zero, hitting a BUG_ON().
>>
>> 	[ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446!
> 
> As Mikey said, would be good to have at least the stack trace & NIP
> here, if not the full oops.

Ack!

>> This patch simply delays the MSR[TS] set, so, if there is any page fault in
>> the __get_user() section, it does not have regs->msr[TS] set, since the TM
>> structures are still invalid, thus avoiding doing TM operations for
>> in-kernel exceptions and possible process reschedule.
>>
>> With this patch, the MSR[TS] will only be set just before recheckpointing
>> and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in
>> invalid state.
> 
> To make this safe when PREEMPT is enabled we need to preempt_disable() /
> enable() around the setting of regs->msr and the recheckpoint.
> 
> That could also serve as nice documentation.
> 
> I guess the other question is whether it should be the job of
> tm_recheckpoint() to set regs->msr, given that it already hard disables
> interrupts.
> 
> eg. we'd set the TM flags in a local msr variable and pass the to
> tm_recheckpoint(), it would then assign that to regs->msr in the IRQ
> disabled section. Though there's many callers of tm_recheckpoint() that
> don't need that behaviour, so it would probably need to be factored out.

Right, that might be doable, but I am wondering if it isn't better to create
a new function that does it as below:

void tm_set_msr_and_recheckpoint(struct pt_regs *regs, u64 msr)
{
       preempt_disable();
       regs->msr = msr;
       tm_recheckpoint();
       preempt_enable();
}

I understand that preempt_disable() does a better work disabling preemption
compared to disabling IRQ. Also, it does not change the API just for this
very specific signal case.

>> It is not possible to move tm_recheckpoint to happen earlier, because it is
>> required to get the checkpointed registers from userspace, with
>> __get_user(), thus, the only way to avoid this undesired behavior is
>> delaying the MSR[TS] set, as done in this patch.
> 
> I think the root cause here is that we're copying into the live regs of
> current. That has obviously worked in the past, because the register
> state wasn't used until we returned back to userspace. But that's no
> longer true with TM. And even so it's quite subtle. I also suspect some
> of our FP/VEC handling may not work correctly if we're scheduled part
> way through restoring the regs.

I got your point and I think we have a risk of corrupting the regs if MSR[TS]
is set and we do page_fault->recheckpoint/recheclaim in the middle of
__get_user() chunks.

> What might work better is if we copy all the regs into temporary space
> and then with interrupts disabled we copy them into the task. That way
> we should never be scheduled with a half-populated set of regs.

Yes, that seems to be the best strategy, and I might be interested in working
on it.

> That's obviously a much bigger patch though and something we'll have to
> do later.
> 
>> Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals")
>> Cc: stable@vger.kernel.org (v3.9+)
>> Signed-off-by: Breno Leitao <leitao@debian.org>
>> ---
>>  arch/powerpc/kernel/signal_64.c | 29 +++++++++++++++--------------
>>  1 file changed, 15 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
>> index 83d51bf586c7..15b153bdd826 100644
>> --- a/arch/powerpc/kernel/signal_64.c
>> +++ b/arch/powerpc/kernel/signal_64.c
>> @@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
>>  	if (MSR_TM_RESV(msr))
>>  		return -EINVAL;
>>  
>> -	/* pull in MSR TS bits from user context */
>> -	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
>> -
>> -	/*
>> -	 * Ensure that TM is enabled in regs->msr before we leave the signal
>> -	 * handler. It could be the case that (a) user disabled the TM bit
>> -	 * through the manipulation of the MSR bits in uc_mcontext or (b) the
>> -	 * TM bit was disabled because a sufficient number of context switches
>> -	 * happened whilst in the signal handler and load_tm overflowed,
>> -	 * disabling the TM bit. In either case we can end up with an illegal
>> -	 * TM state leading to a TM Bad Thing when we return to userspace.
>> -	 */
>> -	regs->msr |= MSR_TM;
>> -
>>  	/* pull in MSR LE from user context */
>>  	regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
>>  
>> @@ -572,6 +558,21 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
>>  	tm_enable();
>>  	/* Make sure the transaction is marked as failed */
>>  	tsk->thread.tm_texasr |= TEXASR_FS;
>> +
> 
> 	preempt_disable();
> 
>> +	/* pull in MSR TS bits from user context */
>> +	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
>> +
>> +	/*
>> +	 * Ensure that TM is enabled in regs->msr before we leave the signal
>> +	 * handler. It could be the case that (a) user disabled the TM bit
>> +	 * through the manipulation of the MSR bits in uc_mcontext or (b) the
>> +	 * TM bit was disabled because a sufficient number of context switches
>> +	 * happened whilst in the signal handler and load_tm overflowed,
>> +	 * disabling the TM bit. In either case we can end up with an illegal
>> +	 * TM state leading to a TM Bad Thing when we return to userspace.
>> +	 */
>> +	regs->msr |= MSR_TM;
>> +
>>  	/* This loads the checkpointed FP/VEC state, if used */
>>  	tm_recheckpoint(&tsk->thread);
>>
> 	preempt_enable();
> 
> Although looking at the code that follows, it probably won't cope with
> being preempted either. So the preempt_enable() should probably go at
> the end of the function.

Why? I confess I do not understand the preempt mechanism a lot. Does a
preemption save/restore FP and vector states when it is preempted?

What is the code/IRQ that is executed when there is a preemption? I am
wondering if a preempt happens because the  Decrementer (0x900) IRQ  happens
and a syscall is being executed, thus, saving the whole context and then
restoring it. If my guess is correct, the IRQ might not save the FP/VEC
states, thus, your concern about ahving load_fp_state() after a preemption.

Let me paste the "patched" code here to help with the discussion:

        regs->msr |= MSR_TM;

        /* This loads the checkpointed FP/VEC state, if used */
        tm_recheckpoint(&tsk->thread);

        msr_check_and_set(msr & (MSR_FP | MSR_VEC));
        if (msr & MSR_FP) {
                load_fp_state(&tsk->thread.fp_state);
                regs->msr |= (MSR_FP | tsk->thread.fpexc_mode);
        }
        if (msr & MSR_VEC) {
                load_vr_state(&tsk->thread.vr_state);
                regs->msr |= MSR_VEC;
        }

Thanks
Breno

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
  2018-11-21 18:27   ` Breno Leitao
@ 2018-11-21 22:42     ` Michael Neuling
  2018-11-22 12:46       ` Breno Leitao
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Neuling @ 2018-11-21 22:42 UTC (permalink / raw)
  To: Breno Leitao, linuxppc-dev; +Cc: Gustavo Romero

> Do you mean in this part of code?
> 
>   SYSCALL_DEFINE0(rt_sigreturn)
>   {
> 	....
>         if (__copy_from_user(&set, &uc->uc_sigmask, sizeof(set)))
>                 goto badframe;
> 
> 	...
>         if (MSR_TM_SUSPENDED(mfmsr()))
>                 tm_reclaim_current(0);

I'm actually thinking after the reclaim, not before.

If I follow your original email properly, you have a problem because you end up
in this senario:
1) Current MSR is not TM suspended
2) regs->msr[TS] set
3) get_user() (which may fault)

After the tm_reclaim there are cases in restore_tm_sigcontexts() where the above
is also the case. Hence why I think we have a problem there too.

Mikey

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
  2018-11-21 22:42     ` Michael Neuling
@ 2018-11-22 12:46       ` Breno Leitao
  0 siblings, 0 replies; 10+ messages in thread
From: Breno Leitao @ 2018-11-22 12:46 UTC (permalink / raw)
  To: Michael Neuling, linuxppc-dev; +Cc: Gustavo Romero

Hi Mikey,

On 11/21/18 8:42 PM, Michael Neuling wrote:
>> Do you mean in this part of code?
>>
>>   SYSCALL_DEFINE0(rt_sigreturn)
>>   {
>> 	....
>>         if (__copy_from_user(&set, &uc->uc_sigmask, sizeof(set)))
>>                 goto badframe;
>>
>> 	...
>>         if (MSR_TM_SUSPENDED(mfmsr()))
>>                 tm_reclaim_current(0);
> 
> I'm actually thinking after the reclaim, not before.
> 
> If I follow your original email properly, you have a problem because you end up
> in this senario:
> 1) Current MSR is not TM suspended
> 2) regs->msr[TS] set
> 3) get_user() (which may fault)

In fact you need another case, where TEXASR register (the live register) does
not contain FS bit set. So, the current flow is:

 1) Current MSR is not TM suspended
 2) regs->msr[TS] set
 2a) TEXASR[FS] = 0
3) get_user() (which may fault)

In this case, the page fault will call SCHEDULE, which will call
__switch_to_tm().

__switch_to_tm() will call tm_reclaim_task(), which does:

	static inline void tm_reclaim_task(struct task_struct *tsk)
	{
		...
	        tm_reclaim_thread(thr, TM_CAUSE_RESCHED);
		...
	        tm_save_sprs(thr);
	}

So, the code above is executed at page fault with the scenario you described
(current MSR is not suspended, regs->msr[TS] set and current TEXASR = 0).

That said, tm_reclaim_task() will invoke tm_reclaim_thread() which will
return due to:

	        if (!MSR_TM_SUSPENDED(mfmsr()))
	                return;

Calling tm_save_sprs(thread), which does:

	_GLOBAL(tm_save_sprs)
     		mfspr   r0, SPRN_TEXASR		 <- TEXASR is 0 here
      		std     r0, THREAD_TM_TEXASR(r3) <- thr->texasr will be 0


In this case, we have a process that was de-schedule properly but has
regs->msr[TS] set and Thread->texasr[FS] = 0. (If current MSR[TS] was set,
then the reclaim process would set the live TEXASR[FS] for us, but it didn't
happen, since MSR_TM_SUSPENDED(mfmsr()) was false.)

When this process is scheduled back, then it breaks. It will do call the
following chain:

__switch_to_tm() -> tm_recheckpoint_new_task() -> tm_recheckpoint() which does:

	void tm_recheckpoint(struct thread_struct *thread)
	{
		...
	        tm_restore_sprs(thread); 	
        	__tm_recheckpoint(thread);
	}

In this case, __tm-recheckpoint() is called with current TEXASR[FS] = 0,
hitting that bug.


> After the tm_reclaim there are cases in restore_tm_sigcontexts() where the
> above
> is also the case. Hence why I think we have a problem there too

Right, but in order to meet the criteria, you need to *fully* execute
tm_reclaim() (i.e execute the TRECLAIM instruction), so, thread->texasr[FS]
will be set.

So, at the entrance level, you either have current MSR[TS] set and fully
reclaim, thus setting texasr[FS], or, you will not have the regs->msr[TS] set
*and* current MSR[TS] disabled (until later where this patch fixes the problem).

Anyway, I might be missing something. So, the root of the problem seem to be
related to creating a case where current MSR[TS] is not set but regs->msr[TS]
is set.




^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-11-22 12:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-19 12:44 [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint Breno Leitao
2018-11-19 12:44 ` Breno Leitao
2018-11-19 23:30 ` Michael Neuling
2018-11-21 18:27   ` Breno Leitao
2018-11-21 22:42     ` Michael Neuling
2018-11-22 12:46       ` Breno Leitao
2018-11-20 10:34 ` Michael Ellerman
2018-11-20 10:34   ` Michael Ellerman
2018-11-21 18:32   ` Breno Leitao
2018-11-21 18:32     ` Breno Leitao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.