All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 1/4] powerpc: Don't enable FP/Altivec if not checkpointed
@ 2017-11-02  3:09 Cyril Bur
  2017-11-02  3:09 ` [PATCH v3 2/4] powerpc: Force reload for recheckpoint during tm {fp, vec, vsx} unavailable exception Cyril Bur
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Cyril Bur @ 2017-11-02  3:09 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mikey, leitao, gromero, andrew, jk, sam

Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

Lazy save and restore of FP/Altivec cannot be done if a process is
transactional. If a facility was enabled it must remain enabled whenever
a thread is transactional.

Commit dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware
transactional memory in use") ensures that the facilities are always
enabled if a thread is transactional. A bug in the introduced code may
cause it to inadvertently enable a facility that was (and should remain)
disabled. The problem with this extraneous enablement is that the
registers for the erroneously enabled facility have not been correctly
recheckpointed - the recheckpointing code assumed the facility would
remain disabled.

Further compounding the issue, the transactional {fp,altivec,vsx}
unavailable code has been incorrectly using the MSR to enable
facilities. The presence of the {FP,VEC,VSX} bit in the regs->msr simply
means if the registers are live on the CPU, not if the kernel should
load them before returning to userspace. This has worked due to the bug
mentioned above.

This causes transactional threads which return to their failure handler
to observe incorrect checkpointed registers. Perhaps an example will
help illustrate the problem:

A userspace process is running and uses both FP and Altivec registers.
This process then continues to run for some time without touching
either sets of registers. The kernel subsequently disables the
facilities as part of lazy save and restore. The userspace process then
performs a tbegin and the CPU checkpoints 'junk' FP and Altivec
registers. The process then performs a floating point instruction
triggering a fp unavailable exception in the kernel.

The kernel then loads the FP registers - and only the FP registers.
Since the thread is transactional it must perform a reclaim and
recheckpoint to ensure both the checkpointed registers and the
transactional registers are correct. It then (correctly) enables
MSR[FP] for the process. Later (on exception exist) the kernel also
(inadvertently) enables MSR[VEC]. The process is then returned to
userspace.

Since the act of loading the FP registers doomed the transaction we know
CPU will fail the transaction, restore its checkpointed registers, and
return the process to its failure handler. The problem is that we're
now running with Altivec enabled and the 'junk' checkpointed registers
are restored. The kernel had only recheckpointed FP.

This patch solves this by only activating FP/Altivec if userspace was
using them when it entered the kernel and not simply if the process is
transactional.

Fixes: dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware
transactional memory in use")

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
V2: Rather than incorrectly using the MSR to enable {FP,VEC,VSX} use
the load_fp and load_vec booleans to help restore_math() make the
correct decision
V3: Put tm_active_with_{fp,altivec}() inside a #ifdef
CONFIG_PPC_TRANSACTIONAL_MEM 


 arch/powerpc/kernel/process.c | 18 ++++++++++++++++--
 arch/powerpc/kernel/traps.c   |  8 ++++----
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index a0c74bbf3454..cff887e67eb9 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -97,9 +97,23 @@ static inline bool msr_tm_active(unsigned long msr)
 {
 	return MSR_TM_ACTIVE(msr);
 }
+
+static bool tm_active_with_fp(struct task_struct *tsk)
+{
+	return msr_tm_active(tsk->thread.regs->msr) &&
+		(tsk->thread.ckpt_regs.msr & MSR_FP);
+}
+
+static bool tm_active_with_altivec(struct task_struct *tsk)
+{
+	return msr_tm_active(tsk->thread.regs->msr) &&
+		(tsk->thread.ckpt_regs.msr & MSR_VEC);
+}
 #else
 static inline bool msr_tm_active(unsigned long msr) { return false; }
 static inline void check_if_tm_restore_required(struct task_struct *tsk) { }
+static inline bool tm_active_with_fp(struct task_struct *tsk) { return false; }
+static inline bool tm_active_with_altivec(struct task_struct *tsk) { return false; }
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
 bool strict_msr_control;
@@ -232,7 +246,7 @@ EXPORT_SYMBOL(enable_kernel_fp);
 
 static int restore_fp(struct task_struct *tsk)
 {
-	if (tsk->thread.load_fp || msr_tm_active(tsk->thread.regs->msr)) {
+	if (tsk->thread.load_fp || tm_active_with_fp(tsk)) {
 		load_fp_state(&current->thread.fp_state);
 		current->thread.load_fp++;
 		return 1;
@@ -314,7 +328,7 @@ EXPORT_SYMBOL_GPL(flush_altivec_to_thread);
 static int restore_altivec(struct task_struct *tsk)
 {
 	if (cpu_has_feature(CPU_FTR_ALTIVEC) &&
-		(tsk->thread.load_vec || msr_tm_active(tsk->thread.regs->msr))) {
+		(tsk->thread.load_vec || tm_active_with_altivec(tsk))) {
 		load_vr_state(&tsk->thread.vr_state);
 		tsk->thread.used_vr = 1;
 		tsk->thread.load_vec++;
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 13c9dcdcba69..ef6a45969812 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1487,7 +1487,7 @@ void fp_unavailable_tm(struct pt_regs *regs)
 	/* Reclaim didn't save out any FPRs to transact_fprs. */
 
 	/* Enable FP for the task: */
-	regs->msr |= (MSR_FP | current->thread.fpexc_mode);
+	current->thread.load_fp = 1;
 
 	/* This loads and recheckpoints the FP registers from
 	 * thread.fpr[].  They will remain in registers after the
@@ -1516,7 +1516,7 @@ void altivec_unavailable_tm(struct pt_regs *regs)
 		 "MSR=%lx\n",
 		 regs->nip, regs->msr);
 	tm_reclaim_current(TM_CAUSE_FAC_UNAV);
-	regs->msr |= MSR_VEC;
+	current->thread.load_vec = 1;
 	tm_recheckpoint(&current->thread, MSR_VEC);
 	current->thread.used_vr = 1;
 
@@ -1553,8 +1553,8 @@ void vsx_unavailable_tm(struct pt_regs *regs)
 	/* This reclaims FP and/or VR regs if they're already enabled */
 	tm_reclaim_current(TM_CAUSE_FAC_UNAV);
 
-	regs->msr |= MSR_VEC | MSR_FP | current->thread.fpexc_mode |
-		MSR_VSX;
+	current->thread.load_vec = 1;
+	current->thread.load_fp = 1;
 
 	/* This loads & recheckpoints FP and VRs; but we have
 	 * to be sure not to overwrite previously-valid state.
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v3 2/4] powerpc: Force reload for recheckpoint during tm {fp, vec, vsx} unavailable exception
  2017-11-02  3:09 [PATCH v3 1/4] powerpc: Don't enable FP/Altivec if not checkpointed Cyril Bur
@ 2017-11-02  3:09 ` Cyril Bur
  2017-11-02  3:09 ` [PATCH v3 3/4] powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint Cyril Bur
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Cyril Bur @ 2017-11-02  3:09 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mikey, leitao, gromero, andrew, jk, sam

Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

tm_reclaim() has optimisations to not always save the FP/Altivec
registers to the checkpointed save area. This was originally done
because the caller might have information that the checkpointed
registers aren't valid due to lazy save and restore. We've also been a
little vague as to how tm_reclaim() leaves the FP/Altivec state since it
doesn't necessarily always save it to the thread struct. This has lead
to an (incorrect) assumption that it leaves the checkpointed state on
the CPU.

tm_recheckpoint() has similar optimisations in reverse. It may not
always reload the checkpointed FP/Altivec registers from the thread
struct before the trecheckpoint. It is therefore quite unclear where it
expects to get the state from. This didn't help with the assumption
made about tm_reclaim().

This patch is a minimal fix for ease of backporting. A more correct fix
which removes the msr parameter to tm_reclaim() and tm_recheckpoint()
altogether has been upstreamed to apply on top of this patch.

Fixes: dc3106690b20 ("powerpc: tm: Always use fp_state and vr_state to
store live registers")

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
V2: Add this patch for ease of backporting the same fix as the next
patch.
V3: No change

 arch/powerpc/kernel/process.c |  4 ++--
 arch/powerpc/kernel/traps.c   | 22 +++++++++++++++++-----
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index cff887e67eb9..bf651f2fd3bd 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -867,6 +867,8 @@ static void tm_reclaim_thread(struct thread_struct *thr,
 	if (!MSR_TM_SUSPENDED(mfmsr()))
 		return;
 
+	giveup_all(container_of(thr, struct task_struct, thread));
+
 	/*
 	 * If we are in a transaction and FP is off then we can't have
 	 * used FP inside that transaction. Hence the checkpointed
@@ -886,8 +888,6 @@ static void tm_reclaim_thread(struct thread_struct *thr,
 		memcpy(&thr->ckvr_state, &thr->vr_state,
 		       sizeof(struct thread_vr_state));
 
-	giveup_all(container_of(thr, struct task_struct, thread));
-
 	tm_reclaim(thr, thr->ckpt_regs.msr, cause);
 }
 
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index ef6a45969812..a7d42c89a257 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1471,6 +1471,12 @@ void facility_unavailable_exception(struct pt_regs *regs)
 
 void fp_unavailable_tm(struct pt_regs *regs)
 {
+	/*
+	 * Save the MSR now because tm_reclaim_current() is likely to
+	 * change it
+	 */
+	unsigned long orig_msr = regs->msr;
+
 	/* Note:  This does not handle any kind of FP laziness. */
 
 	TM_DEBUG("FP Unavailable trap whilst transactional at 0x%lx, MSR=%lx\n",
@@ -1495,10 +1501,10 @@ void fp_unavailable_tm(struct pt_regs *regs)
 	 * If VMX is in use, the VRs now hold checkpointed values,
 	 * so we don't want to load the VRs from the thread_struct.
 	 */
-	tm_recheckpoint(&current->thread, MSR_FP);
+	tm_recheckpoint(&current->thread, orig_msr | MSR_FP);
 
 	/* If VMX is in use, get the transactional values back */
-	if (regs->msr & MSR_VEC) {
+	if (orig_msr & MSR_VEC) {
 		msr_check_and_set(MSR_VEC);
 		load_vr_state(&current->thread.vr_state);
 		/* At this point all the VSX state is loaded, so enable it */
@@ -1508,6 +1514,12 @@ void fp_unavailable_tm(struct pt_regs *regs)
 
 void altivec_unavailable_tm(struct pt_regs *regs)
 {
+	/*
+	 * Save the MSR now because tm_reclaim_current() is likely to
+	 * change it
+	 */
+	unsigned long orig_msr = regs->msr;
+
 	/* See the comments in fp_unavailable_tm().  This function operates
 	 * the same way.
 	 */
@@ -1517,10 +1529,10 @@ void altivec_unavailable_tm(struct pt_regs *regs)
 		 regs->nip, regs->msr);
 	tm_reclaim_current(TM_CAUSE_FAC_UNAV);
 	current->thread.load_vec = 1;
-	tm_recheckpoint(&current->thread, MSR_VEC);
+	tm_recheckpoint(&current->thread, orig_msr | MSR_VEC);
 	current->thread.used_vr = 1;
 
-	if (regs->msr & MSR_FP) {
+	if (orig_msr & MSR_FP) {
 		msr_check_and_set(MSR_FP);
 		load_fp_state(&current->thread.fp_state);
 		regs->msr |= MSR_VSX;
@@ -1559,7 +1571,7 @@ void vsx_unavailable_tm(struct pt_regs *regs)
 	/* This loads & recheckpoints FP and VRs; but we have
 	 * to be sure not to overwrite previously-valid state.
 	 */
-	tm_recheckpoint(&current->thread, regs->msr & ~orig_msr);
+	tm_recheckpoint(&current->thread, orig_msr | MSR_FP | MSR_VEC);
 
 	msr_check_and_set(orig_msr & (MSR_FP | MSR_VEC));
 
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v3 3/4] powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint
  2017-11-02  3:09 [PATCH v3 1/4] powerpc: Don't enable FP/Altivec if not checkpointed Cyril Bur
  2017-11-02  3:09 ` [PATCH v3 2/4] powerpc: Force reload for recheckpoint during tm {fp, vec, vsx} unavailable exception Cyril Bur
@ 2017-11-02  3:09 ` Cyril Bur
  2017-11-02  3:09 ` [PATCH v3 4/4] powerpc: Remove facility loadups on transactional {fp, vec, vsx} unavailable Cyril Bur
  2017-11-07 23:30 ` [v3,1/4] powerpc: Don't enable FP/Altivec if not checkpointed Michael Ellerman
  3 siblings, 0 replies; 5+ messages in thread
From: Cyril Bur @ 2017-11-02  3:09 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mikey, leitao, gromero, andrew, jk, sam

Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

tm_reclaim() has optimisations to not always save the FP/Altivec
registers to the checkpointed save area. This was originally done
because the caller might have information that the checkpointed
registers aren't valid due to lazy save and restore. We've also been a
little vague as to how tm_reclaim() leaves the FP/Altivec state since it
doesn't necessarily always save it to the thread struct. This has lead
to an (incorrect) assumption that it leaves the checkpointed state on
the CPU.

tm_recheckpoint() has similar optimisations in reverse. It may not
always reload the checkpointed FP/Altivec registers from the thread
struct before the trecheckpoint. It is therefore quite unclear where it
expects to get the state from. This didn't help with the assumption
made about tm_reclaim().

These optimisations sit in what is by definition a slow path. If a
process has to go through a reclaim/recheckpoint then its transaction
will be doomed on returning to userspace. This mean that the process
will be unable to complete its transaction and be forced to its failure
handler. This is already an out if line case for userspace. Furthermore,
the cost of copying 64 times 128 bits from registers isn't very long[0]
(at all) on modern processors. As such it appears these optimisations
have only served to increase code complexity and are unlikely to have
had a measurable performance impact.

Our transactional memory handling has been riddled with bugs. A cause
of this has been difficulty in following the code flow, code complexity
has not been our friend here. It makes sense to remove these
optimisations in favour of a (hopefully) more stable implementation.

This patch does mean that some times the assembly will needlessly save
'junk' registers which will subsequently get overwritten with the
correct value by the C code which calls the assembly function. This
small inefficiency is far outweighed by the reduction in complexity for
general TM code, context switching paths, and transactional facility
unavailable exception handler.

0: I tried to measure it once for other work and found that it was
hiding in the noise of everything else I was working with. I find it
exceedingly likely this will be the case here.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
V2: Unchanged
V3: Unchanged

 arch/powerpc/include/asm/tm.h   |  5 ++--
 arch/powerpc/kernel/process.c   | 22 ++++++---------
 arch/powerpc/kernel/signal_32.c |  2 +-
 arch/powerpc/kernel/signal_64.c |  2 +-
 arch/powerpc/kernel/tm.S        | 59 ++++++++++++-----------------------------
 arch/powerpc/kernel/traps.c     | 26 +++++-------------
 6 files changed, 35 insertions(+), 81 deletions(-)

diff --git a/arch/powerpc/include/asm/tm.h b/arch/powerpc/include/asm/tm.h
index 82e06ca3a49b..33d965911bec 100644
--- a/arch/powerpc/include/asm/tm.h
+++ b/arch/powerpc/include/asm/tm.h
@@ -11,10 +11,9 @@
 
 extern void tm_enable(void);
 extern void tm_reclaim(struct thread_struct *thread,
-		       unsigned long orig_msr, uint8_t cause);
+		       uint8_t cause);
 extern void tm_reclaim_current(uint8_t cause);
-extern void tm_recheckpoint(struct thread_struct *thread,
-			    unsigned long orig_msr);
+extern void tm_recheckpoint(struct thread_struct *thread);
 extern void tm_abort(uint8_t cause);
 extern void tm_save_sprs(struct thread_struct *thread);
 extern void tm_restore_sprs(struct thread_struct *thread);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index bf651f2fd3bd..b00c291cd05c 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -869,6 +869,8 @@ static void tm_reclaim_thread(struct thread_struct *thr,
 
 	giveup_all(container_of(thr, struct task_struct, thread));
 
+	tm_reclaim(thr, cause);
+
 	/*
 	 * If we are in a transaction and FP is off then we can't have
 	 * used FP inside that transaction. Hence the checkpointed
@@ -887,8 +889,6 @@ static void tm_reclaim_thread(struct thread_struct *thr,
 	if ((thr->ckpt_regs.msr & MSR_VEC) == 0)
 		memcpy(&thr->ckvr_state, &thr->vr_state,
 		       sizeof(struct thread_vr_state));
-
-	tm_reclaim(thr, thr->ckpt_regs.msr, cause);
 }
 
 void tm_reclaim_current(uint8_t cause)
@@ -937,11 +937,9 @@ static inline void tm_reclaim_task(struct task_struct *tsk)
 	tm_save_sprs(thr);
 }
 
-extern void __tm_recheckpoint(struct thread_struct *thread,
-			      unsigned long orig_msr);
+extern void __tm_recheckpoint(struct thread_struct *thread);
 
-void tm_recheckpoint(struct thread_struct *thread,
-		     unsigned long orig_msr)
+void tm_recheckpoint(struct thread_struct *thread)
 {
 	unsigned long flags;
 
@@ -960,15 +958,13 @@ void tm_recheckpoint(struct thread_struct *thread,
 	 */
 	tm_restore_sprs(thread);
 
-	__tm_recheckpoint(thread, orig_msr);
+	__tm_recheckpoint(thread);
 
 	local_irq_restore(flags);
 }
 
 static inline void tm_recheckpoint_new_task(struct task_struct *new)
 {
-	unsigned long msr;
-
 	if (!cpu_has_feature(CPU_FTR_TM))
 		return;
 
@@ -987,13 +983,11 @@ static inline void tm_recheckpoint_new_task(struct task_struct *new)
 		tm_restore_sprs(&new->thread);
 		return;
 	}
-	msr = new->thread.ckpt_regs.msr;
 	/* Recheckpoint to restore original checkpointed register state. */
-	TM_DEBUG("*** tm_recheckpoint of pid %d "
-		 "(new->msr 0x%lx, new->origmsr 0x%lx)\n",
-		 new->pid, new->thread.regs->msr, msr);
+	TM_DEBUG("*** tm_recheckpoint of pid %d (new->msr 0x%lx)\n",
+		 new->pid, new->thread.regs->msr);
 
-	tm_recheckpoint(&new->thread, msr);
+	tm_recheckpoint(&new->thread);
 
 	/*
 	 * The checkpointed state has been restored but the live state has
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 92fb1c8dbbd8..6fde1ff7396a 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -876,7 +876,7 @@ static long restore_tm_user_regs(struct pt_regs *regs,
 	/* Make sure the transaction is marked as failed */
 	current->thread.tm_texasr |= TEXASR_FS;
 	/* This loads the checkpointed FP/VEC state, if used */
-	tm_recheckpoint(&current->thread, msr);
+	tm_recheckpoint(&current->thread);
 
 	/* This loads the speculative FP/VEC state, if used */
 	msr_check_and_set(msr & (MSR_FP | MSR_VEC));
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index b2c002993d78..f395c5b81df9 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -558,7 +558,7 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
 	/* Make sure the transaction is marked as failed */
 	tsk->thread.tm_texasr |= TEXASR_FS;
 	/* This loads the checkpointed FP/VEC state, if used */
-	tm_recheckpoint(&tsk->thread, msr);
+	tm_recheckpoint(&tsk->thread);
 
 	msr_check_and_set(msr & (MSR_FP | MSR_VEC));
 	if (msr & MSR_FP) {
diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S
index c4ba37822ba0..d89fb0e6f9ed 100644
--- a/arch/powerpc/kernel/tm.S
+++ b/arch/powerpc/kernel/tm.S
@@ -79,15 +79,12 @@ _GLOBAL(tm_abort)
 	blr
 
 /* void tm_reclaim(struct thread_struct *thread,
- *                 unsigned long orig_msr,
  *		   uint8_t cause)
  *
  *	- Performs a full reclaim.  This destroys outstanding
  *	  transactions and updates thread->regs.tm_ckpt_* with the
  *	  original checkpointed state.  Note that thread->regs is
  *	  unchanged.
- *	- FP regs are written back to thread->transact_fpr before
- *	  reclaiming.  These are the transactional (current) versions.
  *
  * Purpose is to both abort transactions of, and preserve the state of,
  * a transactions at a context switch. We preserve/restore both sets of process
@@ -98,9 +95,9 @@ _GLOBAL(tm_abort)
  * Call with IRQs off, stacks get all out of sync for some periods in here!
  */
 _GLOBAL(tm_reclaim)
-	mfcr	r6
+	mfcr	r5
 	mflr	r0
-	stw	r6, 8(r1)
+	stw	r5, 8(r1)
 	std	r0, 16(r1)
 	std	r2, STK_GOT(r1)
 	stdu	r1, -TM_FRAME_SIZE(r1)
@@ -108,7 +105,6 @@ _GLOBAL(tm_reclaim)
 	/* We've a struct pt_regs at [r1+STACK_FRAME_OVERHEAD]. */
 
 	std	r3, STK_PARAM(R3)(r1)
-	std	r4, STK_PARAM(R4)(r1)
 	SAVE_NVGPRS(r1)
 
 	/* We need to setup MSR for VSX register save instructions. */
@@ -138,8 +134,8 @@ _GLOBAL(tm_reclaim)
 	std	r1, PACAR1(r13)
 
 	/* Clear MSR RI since we are about to change r1, EE is already off. */
-	li	r4, 0
-	mtmsrd	r4, 1
+	li	r5, 0
+	mtmsrd	r5, 1
 
 	/*
 	 * BE CAREFUL HERE:
@@ -151,7 +147,7 @@ _GLOBAL(tm_reclaim)
 	 * to user register state.  (FPRs, CCR etc. also!)
 	 * Use an sprg and a tm_scratch in the PACA to shuffle.
 	 */
-	TRECLAIM(R5)				/* Cause in r5 */
+	TRECLAIM(R4)				/* Cause in r4 */
 
 	/* ******************** GPRs ******************** */
 	/* Stash the checkpointed r13 away in the scratch SPR and get the real
@@ -242,40 +238,30 @@ _GLOBAL(tm_reclaim)
 
 
 	/* ******************** FPR/VR/VSRs ************
-	 * After reclaiming, capture the checkpointed FPRs/VRs /if used/.
-	 *
-	 * (If VSX used, FP and VMX are implied.  Or, we don't need to look
-	 * at MSR.VSX as copying FP regs if .FP, vector regs if .VMX covers it.)
-	 *
-	 * We're passed the thread's MSR as the second parameter
+	 * After reclaiming, capture the checkpointed FPRs/VRs.
 	 *
 	 * We enabled VEC/FP/VSX in the msr above, so we can execute these
 	 * instructions!
 	 */
-	ld	r4, STK_PARAM(R4)(r1)		/* Second parameter, MSR * */
 	mr	r3, r12
-	andis.		r0, r4, MSR_VEC@h
-	beq	dont_backup_vec
 
+	/* Altivec (VEC/VMX/VR)*/
 	addi	r7, r3, THREAD_CKVRSTATE
 	SAVE_32VRS(0, r6, r7)	/* r6 scratch, r7 transact vr state */
 	mfvscr	v0
 	li	r6, VRSTATE_VSCR
 	stvx	v0, r7, r6
-dont_backup_vec:
+
+	/* VRSAVE */
 	mfspr	r0, SPRN_VRSAVE
 	std	r0, THREAD_CKVRSAVE(r3)
 
-	andi.	r0, r4, MSR_FP
-	beq	dont_backup_fp
-
+	/* Floating Point (FP) */
 	addi	r7, r3, THREAD_CKFPSTATE
 	SAVE_32FPRS_VSRS(0, R6, R7)	/* r6 scratch, r7 transact fp state */
-
 	mffs    fr0
 	stfd    fr0,FPSTATE_FPSCR(r7)
 
-dont_backup_fp:
 
 	/* TM regs, incl TEXASR -- these live in thread_struct.  Note they've
 	 * been updated by the treclaim, to explain to userland the failure
@@ -343,22 +329,19 @@ _GLOBAL(__tm_recheckpoint)
 	 */
 	subi	r7, r7, STACK_FRAME_OVERHEAD
 
+	/* We need to setup MSR for FP/VMX/VSX register save instructions. */
 	mfmsr	r6
-	/* R4 = original MSR to indicate whether thread used FP/Vector etc. */
-
-	/* Enable FP/vec in MSR if necessary! */
-	lis	r5, MSR_VEC@h
+	mr	r5, r6
 	ori	r5, r5, MSR_FP
-	and.	r5, r4, r5
-	beq	restore_gprs			/* if neither, skip both */
-
+#ifdef CONFIG_ALTIVEC
+	oris	r5, r5, MSR_VEC@h
+#endif
 #ifdef CONFIG_VSX
 	BEGIN_FTR_SECTION
-	oris	r5, r5, MSR_VSX@h
+	oris	r5,r5, MSR_VSX@h
 	END_FTR_SECTION_IFSET(CPU_FTR_VSX)
 #endif
-	or	r5, r6, r5			/* Set MSR.FP+.VSX/.VEC */
-	mtmsr	r5
+	mtmsrd	r5
 
 #ifdef CONFIG_ALTIVEC
 	/*
@@ -367,28 +350,20 @@ _GLOBAL(__tm_recheckpoint)
 	 * thread.fp_state[] version holds the 'live' (transactional)
 	 * and will be loaded subsequently by any FPUnavailable trap.
 	 */
-	andis.	r0, r4, MSR_VEC@h
-	beq	dont_restore_vec
-
 	addi	r8, r3, THREAD_CKVRSTATE
 	li	r5, VRSTATE_VSCR
 	lvx	v0, r8, r5
 	mtvscr	v0
 	REST_32VRS(0, r5, r8)			/* r5 scratch, r8 ptr */
-dont_restore_vec:
 	ld	r5, THREAD_CKVRSAVE(r3)
 	mtspr	SPRN_VRSAVE, r5
 #endif
 
-	andi.	r0, r4, MSR_FP
-	beq	dont_restore_fp
-
 	addi	r8, r3, THREAD_CKFPSTATE
 	lfd	fr0, FPSTATE_FPSCR(r8)
 	MTFSF_L(fr0)
 	REST_32FPRS_VSRS(0, R4, R8)
 
-dont_restore_fp:
 	mtmsr	r6				/* FP/Vec off again! */
 
 restore_gprs:
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index a7d42c89a257..4a7bc64352fd 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1501,7 +1501,7 @@ void fp_unavailable_tm(struct pt_regs *regs)
 	 * If VMX is in use, the VRs now hold checkpointed values,
 	 * so we don't want to load the VRs from the thread_struct.
 	 */
-	tm_recheckpoint(&current->thread, orig_msr | MSR_FP);
+	tm_recheckpoint(&current->thread);
 
 	/* If VMX is in use, get the transactional values back */
 	if (orig_msr & MSR_VEC) {
@@ -1529,7 +1529,7 @@ void altivec_unavailable_tm(struct pt_regs *regs)
 		 regs->nip, regs->msr);
 	tm_reclaim_current(TM_CAUSE_FAC_UNAV);
 	current->thread.load_vec = 1;
-	tm_recheckpoint(&current->thread, orig_msr | MSR_VEC);
+	tm_recheckpoint(&current->thread);
 	current->thread.used_vr = 1;
 
 	if (orig_msr & MSR_FP) {
@@ -1541,8 +1541,6 @@ void altivec_unavailable_tm(struct pt_regs *regs)
 
 void vsx_unavailable_tm(struct pt_regs *regs)
 {
-	unsigned long orig_msr = regs->msr;
-
 	/* See the comments in fp_unavailable_tm().  This works similarly,
 	 * though we're loading both FP and VEC registers in here.
 	 *
@@ -1556,29 +1554,17 @@ void vsx_unavailable_tm(struct pt_regs *regs)
 
 	current->thread.used_vsr = 1;
 
-	/* If FP and VMX are already loaded, we have all the state we need */
-	if ((orig_msr & (MSR_FP | MSR_VEC)) == (MSR_FP | MSR_VEC)) {
-		regs->msr |= MSR_VSX;
-		return;
-	}
-
 	/* This reclaims FP and/or VR regs if they're already enabled */
 	tm_reclaim_current(TM_CAUSE_FAC_UNAV);
 
 	current->thread.load_vec = 1;
 	current->thread.load_fp = 1;
 
-	/* This loads & recheckpoints FP and VRs; but we have
-	 * to be sure not to overwrite previously-valid state.
-	 */
-	tm_recheckpoint(&current->thread, orig_msr | MSR_FP | MSR_VEC);
-
-	msr_check_and_set(orig_msr & (MSR_FP | MSR_VEC));
+	tm_recheckpoint(&current->thread);
 
-	if (orig_msr & MSR_FP)
-		load_fp_state(&current->thread.fp_state);
-	if (orig_msr & MSR_VEC)
-		load_vr_state(&current->thread.vr_state);
+	msr_check_and_set(MSR_FP | MSR_VEC);
+	load_fp_state(&current->thread.fp_state);
+	load_vr_state(&current->thread.vr_state);
 }
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v3 4/4] powerpc: Remove facility loadups on transactional {fp, vec, vsx} unavailable
  2017-11-02  3:09 [PATCH v3 1/4] powerpc: Don't enable FP/Altivec if not checkpointed Cyril Bur
  2017-11-02  3:09 ` [PATCH v3 2/4] powerpc: Force reload for recheckpoint during tm {fp, vec, vsx} unavailable exception Cyril Bur
  2017-11-02  3:09 ` [PATCH v3 3/4] powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint Cyril Bur
@ 2017-11-02  3:09 ` Cyril Bur
  2017-11-07 23:30 ` [v3,1/4] powerpc: Don't enable FP/Altivec if not checkpointed Michael Ellerman
  3 siblings, 0 replies; 5+ messages in thread
From: Cyril Bur @ 2017-11-02  3:09 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mikey, leitao, gromero, andrew, jk, sam

After handling a transactional FP, Altivec or VSX unavailable exception.
The return to userspace code will detect that the TIF_RESTORE_TM bit is
set and call restore_tm_state(). restore_tm_state() will call
restore_math() to ensure that the correct facilities are loaded.

This means that all the loadup code in {fp,altivec,vsx}_unavailable_tm()
is doing pointless work and can simply be removed.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
V2: Obvious cleanup which should have been in v1
V3: Unchanged
 arch/powerpc/kernel/traps.c | 30 ------------------------------
 1 file changed, 30 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 4a7bc64352fd..3181e85ef17c 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1471,12 +1471,6 @@ void facility_unavailable_exception(struct pt_regs *regs)
 
 void fp_unavailable_tm(struct pt_regs *regs)
 {
-	/*
-	 * Save the MSR now because tm_reclaim_current() is likely to
-	 * change it
-	 */
-	unsigned long orig_msr = regs->msr;
-
 	/* Note:  This does not handle any kind of FP laziness. */
 
 	TM_DEBUG("FP Unavailable trap whilst transactional at 0x%lx, MSR=%lx\n",
@@ -1502,24 +1496,10 @@ void fp_unavailable_tm(struct pt_regs *regs)
 	 * so we don't want to load the VRs from the thread_struct.
 	 */
 	tm_recheckpoint(&current->thread);
-
-	/* If VMX is in use, get the transactional values back */
-	if (orig_msr & MSR_VEC) {
-		msr_check_and_set(MSR_VEC);
-		load_vr_state(&current->thread.vr_state);
-		/* At this point all the VSX state is loaded, so enable it */
-		regs->msr |= MSR_VSX;
-	}
 }
 
 void altivec_unavailable_tm(struct pt_regs *regs)
 {
-	/*
-	 * Save the MSR now because tm_reclaim_current() is likely to
-	 * change it
-	 */
-	unsigned long orig_msr = regs->msr;
-
 	/* See the comments in fp_unavailable_tm().  This function operates
 	 * the same way.
 	 */
@@ -1531,12 +1511,6 @@ void altivec_unavailable_tm(struct pt_regs *regs)
 	current->thread.load_vec = 1;
 	tm_recheckpoint(&current->thread);
 	current->thread.used_vr = 1;
-
-	if (orig_msr & MSR_FP) {
-		msr_check_and_set(MSR_FP);
-		load_fp_state(&current->thread.fp_state);
-		regs->msr |= MSR_VSX;
-	}
 }
 
 void vsx_unavailable_tm(struct pt_regs *regs)
@@ -1561,10 +1535,6 @@ void vsx_unavailable_tm(struct pt_regs *regs)
 	current->thread.load_fp = 1;
 
 	tm_recheckpoint(&current->thread);
-
-	msr_check_and_set(MSR_FP | MSR_VEC);
-	load_fp_state(&current->thread.fp_state);
-	load_vr_state(&current->thread.vr_state);
 }
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [v3,1/4] powerpc: Don't enable FP/Altivec if not checkpointed
  2017-11-02  3:09 [PATCH v3 1/4] powerpc: Don't enable FP/Altivec if not checkpointed Cyril Bur
                   ` (2 preceding siblings ...)
  2017-11-02  3:09 ` [PATCH v3 4/4] powerpc: Remove facility loadups on transactional {fp, vec, vsx} unavailable Cyril Bur
@ 2017-11-07 23:30 ` Michael Ellerman
  3 siblings, 0 replies; 5+ messages in thread
From: Michael Ellerman @ 2017-11-07 23:30 UTC (permalink / raw)
  To: Cyril Bur, linuxppc-dev; +Cc: mikey, andrew, gromero, jk, leitao, sam

On Thu, 2017-11-02 at 03:09:03 UTC, Cyril Bur wrote:
> Lazy save and restore of FP/Altivec means that a userspace process can
> be sent to userspace with FP or Altivec disabled and loaded only as
> required (by way of an FP/Altivec unavailable exception). Transactional
> Memory complicates this situation as a transaction could be started
> without FP/Altivec being loaded up. This causes the hardware to
> checkpoint incorrect registers. Handling FP/Altivec unavailable
> exceptions while a thread is transactional requires a reclaim and
> recheckpoint to ensure the CPU has correct state for both sets of
> registers.
> 
> Lazy save and restore of FP/Altivec cannot be done if a process is
> transactional. If a facility was enabled it must remain enabled whenever
> a thread is transactional.
> 
> Commit dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware
> transactional memory in use") ensures that the facilities are always
> enabled if a thread is transactional. A bug in the introduced code may
> cause it to inadvertently enable a facility that was (and should remain)
> disabled. The problem with this extraneous enablement is that the
> registers for the erroneously enabled facility have not been correctly
> recheckpointed - the recheckpointing code assumed the facility would
> remain disabled.
> 
> Further compounding the issue, the transactional {fp,altivec,vsx}
> unavailable code has been incorrectly using the MSR to enable
> facilities. The presence of the {FP,VEC,VSX} bit in the regs->msr simply
> means if the registers are live on the CPU, not if the kernel should
> load them before returning to userspace. This has worked due to the bug
> mentioned above.
> 
> This causes transactional threads which return to their failure handler
> to observe incorrect checkpointed registers. Perhaps an example will
> help illustrate the problem:
> 
> A userspace process is running and uses both FP and Altivec registers.
> This process then continues to run for some time without touching
> either sets of registers. The kernel subsequently disables the
> facilities as part of lazy save and restore. The userspace process then
> performs a tbegin and the CPU checkpoints 'junk' FP and Altivec
> registers. The process then performs a floating point instruction
> triggering a fp unavailable exception in the kernel.
> 
> The kernel then loads the FP registers - and only the FP registers.
> Since the thread is transactional it must perform a reclaim and
> recheckpoint to ensure both the checkpointed registers and the
> transactional registers are correct. It then (correctly) enables
> MSR[FP] for the process. Later (on exception exist) the kernel also
> (inadvertently) enables MSR[VEC]. The process is then returned to
> userspace.
> 
> Since the act of loading the FP registers doomed the transaction we know
> CPU will fail the transaction, restore its checkpointed registers, and
> return the process to its failure handler. The problem is that we're
> now running with Altivec enabled and the 'junk' checkpointed registers
> are restored. The kernel had only recheckpointed FP.
> 
> This patch solves this by only activating FP/Altivec if userspace was
> using them when it entered the kernel and not simply if the process is
> transactional.
> 
> Fixes: dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware
> transactional memory in use")
> 
> Signed-off-by: Cyril Bur <cyrilbur@gmail.com>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a7771176b4392fbc3a17399c51a8c1

cheers

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-11-07 23:30 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-02  3:09 [PATCH v3 1/4] powerpc: Don't enable FP/Altivec if not checkpointed Cyril Bur
2017-11-02  3:09 ` [PATCH v3 2/4] powerpc: Force reload for recheckpoint during tm {fp, vec, vsx} unavailable exception Cyril Bur
2017-11-02  3:09 ` [PATCH v3 3/4] powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint Cyril Bur
2017-11-02  3:09 ` [PATCH v3 4/4] powerpc: Remove facility loadups on transactional {fp, vec, vsx} unavailable Cyril Bur
2017-11-07 23:30 ` [v3,1/4] powerpc: Don't enable FP/Altivec if not checkpointed Michael Ellerman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.