* [RFC/PATCH 0/7] powerpc: Implement lazy save of FP, VMX and VSX state in SMP
From: Michael Neuling @ 2010-12-06 23:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Kumar Gala; +Cc: linuxppc-dev, linux-kernel

This implements lazy save of FP, VMX and VSX state on SMP 64bit and 32
bit powerpc.

Currently we only do lazy save in UP, but this patch set extends this to
SMP.  We always do lazy restore.

For VMX, on a context switch we do the following:
 - if we are switching to a CPU that currently holds the new process's
   state, just turn on VMX in the MSR (this is the lazy/quick case)
 - if the new process's state is in the thread_struct, turn VMX off in
   the MSR.
 - if the new process's state is on another CPU, IPI that CPU to give
   up its state and turn VMX off in the MSR (slow IPI case).
We always start the new process at this point, irrespective of whether
the state is in the thread_struct or on the current CPU.

So in the slow case, we attempt to hide the IPI latency by starting
the process immediately and only waiting for the state to be flushed
when the process actually needs VMX, i.e. when we take the VMX
unavailable exception after the context switch.
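
Roughly, in C, the decision looks like this (a sketch only; it mirrors
the switch_to_altivec_lazy() helper added in patch 5, with
this_cpu_state() and owning_cpu standing in for the PPC32/PPC64
specific accessors):

	if (new->thread.vr_state == this_cpu_state()) {
		/* lazy/quick case: the state is already on this CPU */
		new->thread.regs->msr |= MSR_VEC;
	} else {
		/* state is in the thread_struct or on another CPU */
		new->thread.regs->msr &= ~MSR_VEC;
		if (new->thread.vr_state != TS_LAZY_STATE_INVALID &&
		    !csd_locked(&new->thread.vr_csd))
			/* slow case: IPI the owning CPU to flush its state */
			__smp_call_function_single(owning_cpu,
						   &new->thread.vr_csd, 0);
	}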

FP is implemented in a similar way.  VSX reuses the FP and VMX code,
since it has no additional state beyond what FP and VMX already cover.

I've been benchmarking with Anton Blanchard's context_switch.c
benchmark, found here:
  http://ozlabs.org/~anton/junkcode/context_switch.c
Using this benchmark as-is shows no degradation in performance with
these patches applied.

Inserting a simple FP instruction into one of the threads (which
exercises the nice lazy save/restore case), I get about a 4%
improvement in context switching rates with my patches applied.  I get
similar results with VMX.  With a simple VSX instruction (VSX state is
64 x 128-bit registers) in one thread, I get an 8% bump in performance
with these patches.

With FP/VMX/VSX instructions in both threads, I get no degradation in
performance.

Running lmbench shows no degradation in performance either.

Most of my benchmarking and testing has been done on 64 bit systems.
I've tested 32 bit FP but I've not tested 32 bit VMX at all.

There are probably some optimisations that could still be made to my
asm code.  I've been concentrating on correctness rather than speed in
the asm, since if you get a lazy context switch you skip all that asm
now anyway.

The whole series is bisectable and compiles with the various 64/32-bit,
SMP/UP and FPU/VMX/VSX config options on and off.

I really hate the include file changes in this series.  Getting the
call_single_data into the powerpc thread_struct was a PITA :-)

Mikey

Signed-off-by: Michael Neuling <mikey@neuling.org>


* [RFC/PATCH 1/7] Add csd_locked function
From: Michael Neuling @ 2010-12-06 23:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Kumar Gala; +Cc: linuxppc-dev, linux-kernel

Add a csd_locked() function to determine whether a struct
call_single_data is currently locked.  This can be used to check
whether an IPI can be issued again using this call_single_data.
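
A minimal usage sketch (this is how the later patches in this series
use it; the vr_csd field and 'cpu' come from patch 5):

	/* only fire another IPI if the previous one using this csd is done */
	if (!csd_locked(&tsk->thread.vr_csd))
		__smp_call_function_single(cpu, &tsk->thread.vr_csd, 0);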

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 kernel/smp.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Index: linux-lazy/kernel/smp.c
===================================================================
--- linux-lazy.orig/kernel/smp.c
+++ linux-lazy/kernel/smp.c
@@ -12,6 +12,7 @@
 #include <linux/gfp.h>
 #include <linux/smp.h>
 #include <linux/cpu.h>
+#include <linux/hardirq.h>
 
 static struct {
 	struct list_head	queue;
@@ -131,6 +132,22 @@
 }
 
 /*
+ * Determine if a csd is currently locked.  This can be used to
+ * check whether an IPI using this csd is still pending.
+ */
+int csd_locked(struct call_single_data *data)
+{
+	WARN_ON(preemptible());
+
+	/* Ensure flags have propagated */
+	smp_mb();
+
+	if (data->flags & CSD_FLAG_LOCK)
+		return 1;
+	return 0;
+}
+
+/*
  * Insert a previously allocated call_single_data element
  * for execution on the given CPU. data must already have
  * ->func, ->info, and ->flags set.


* [RFC/PATCH 2/7] Rearrange include files to make struct call_single_data usable in more places
From: Michael Neuling @ 2010-12-06 23:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Kumar Gala; +Cc: linuxppc-dev, linux-kernel

We need to put struct call_single_data in the powerpc thread_struct,
but can't without this.

The thread_struct is in processor.h.  To add a struct call_single_data
to the thread_struct, asm/processor.h must include linux/smp.h.  When
linux/smp.h is added to processor.h, this creates an include loop
through list.h:

  linux/list.h includes: 
    linux/prefetch.h includes:
      asm/processor.h (for powerpc) includes:
        linux/smp.h includes:
          linux/list.h

This loop results in an "incomplete list type" compile error when
struct list_head is used in struct call_single_data.

This patch rearranges some include files to avoid this loop.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 include/linux/call_single_data.h |   14 ++++++++++++++
 include/linux/list.h             |    4 +++-
 include/linux/smp.h              |    8 +-------
 3 files changed, 18 insertions(+), 8 deletions(-)

Index: linux-lazy/include/linux/call_single_data.h
===================================================================
--- /dev/null
+++ linux-lazy/include/linux/call_single_data.h
@@ -0,0 +1,14 @@
+#ifndef __LINUX_CALL_SINGLE_DATA_H
+#define __LINUX_CALL_SINGLE_DATA_H
+
+#include <linux/list.h>
+
+struct call_single_data {
+	struct list_head list;
+	void (*func) (void *info);
+	void *info;
+	u16 flags;
+	u16 priv;
+};
+
+#endif /* __LINUX_CALL_SINGLE_DATA_H */
Index: linux-lazy/include/linux/list.h
===================================================================
--- linux-lazy.orig/include/linux/list.h
+++ linux-lazy/include/linux/list.h
@@ -4,7 +4,6 @@
 #include <linux/types.h>
 #include <linux/stddef.h>
 #include <linux/poison.h>
-#include <linux/prefetch.h>
 
 /*
  * Simple doubly linked list implementation.
@@ -16,6 +15,9 @@
  * using the generic single-entry routines.
  */
 
+#include <linux/prefetch.h>
+#include <asm/system.h>
+
 #define LIST_HEAD_INIT(name) { &(name), &(name) }
 
 #define LIST_HEAD(name) \
Index: linux-lazy/include/linux/smp.h
===================================================================
--- linux-lazy.orig/include/linux/smp.h
+++ linux-lazy/include/linux/smp.h
@@ -9,18 +9,12 @@
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/list.h>
+#include <linux/call_single_data.h>
 #include <linux/cpumask.h>
 
 extern void cpu_idle(void);
 
 typedef void (*smp_call_func_t)(void *info);
-struct call_single_data {
-	struct list_head list;
-	smp_call_func_t func;
-	void *info;
-	u16 flags;
-	u16 priv;
-};
 
 /* total number of cpus in this system (may exceed NR_CPUS) */
 extern unsigned int total_cpus;


* [RFC/PATCH 3/7] powerpc: Reorganise powerpc include files to make call_single_data
From: Michael Neuling @ 2010-12-06 23:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Kumar Gala; +Cc: linuxppc-dev

We need to put struct call_single_data in the powerpc thread_struct,
but can't without this.  

In processor.h this moves the prefetch() functions up before the
#include of types.h, to ensure __builtin_prefetch doesn't get defined
twice.

Similarly, in hw_irq.h, arch_irqs_disabled_flags() is moved before the
#include of processor.h to ensure it is correctly found when hw_irq.h
has already been included somewhere earlier.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/include/asm/hw_irq.h    |   21 ++++++++------
 arch/powerpc/include/asm/processor.h |   52 ++++++++++++++++++-----------------
 2 files changed, 40 insertions(+), 33 deletions(-)

Index: linux-lazy/arch/powerpc/include/asm/hw_irq.h
===================================================================
--- linux-lazy.orig/arch/powerpc/include/asm/hw_irq.h
+++ linux-lazy/arch/powerpc/include/asm/hw_irq.h
@@ -8,10 +8,6 @@
 
 #include <linux/errno.h>
 #include <linux/compiler.h>
-#include <asm/ptrace.h>
-#include <asm/processor.h>
-
-extern void timer_interrupt(struct pt_regs *);
 
 #ifdef CONFIG_PPC64
 #include <asm/paca.h>
@@ -81,6 +77,8 @@
 
 #else /* CONFIG_PPC64 */
 
+#include <asm/reg.h>
+
 #define SET_MSR_EE(x)	mtmsr(x)
 
 static inline unsigned long arch_local_save_flags(void)
@@ -108,6 +106,16 @@
 	return flags;
 }
 
+static inline bool arch_irqs_disabled_flags(unsigned long flags)
+{
+	return (flags & MSR_EE) == 0;
+}
+
+#include <asm/ptrace.h>
+#include <asm/processor.h>
+
+extern void timer_interrupt(struct pt_regs *);
+
 static inline void arch_local_irq_disable(void)
 {
 #ifdef CONFIG_BOOKE
@@ -127,11 +135,6 @@
 #endif
 }
 
-static inline bool arch_irqs_disabled_flags(unsigned long flags)
-{
-	return (flags & MSR_EE) == 0;
-}
-
 static inline bool arch_irqs_disabled(void)
 {
 	return arch_irqs_disabled_flags(arch_local_save_flags());
Index: linux-lazy/arch/powerpc/include/asm/processor.h
===================================================================
--- linux-lazy.orig/arch/powerpc/include/asm/processor.h
+++ linux-lazy/arch/powerpc/include/asm/processor.h
@@ -20,6 +20,34 @@
 
 #ifndef __ASSEMBLY__
 #include <linux/compiler.h>
+
+struct thread_struct;
+
+/*
+ * Prefetch macros.
+ */
+#define ARCH_HAS_PREFETCH
+#define ARCH_HAS_PREFETCHW
+#define ARCH_HAS_SPINLOCK_PREFETCH
+
+static inline void prefetch(const void *x)
+{
+	if (unlikely(!x))
+		return;
+
+	__asm__ __volatile__ ("dcbt 0,%0" : : "r" (x));
+}
+
+static inline void prefetchw(const void *x)
+{
+	if (unlikely(!x))
+		return;
+
+	__asm__ __volatile__ ("dcbtst 0,%0" : : "r" (x));
+}
+
+#define spin_lock_prefetch(x)	prefetchw(x)
+
 #include <asm/ptrace.h>
 #include <asm/types.h>
 
@@ -327,30 +355,6 @@
 int validate_sp(unsigned long sp, struct task_struct *p,
                        unsigned long nbytes);
 
-/*
- * Prefetch macros.
- */
-#define ARCH_HAS_PREFETCH
-#define ARCH_HAS_PREFETCHW
-#define ARCH_HAS_SPINLOCK_PREFETCH
-
-static inline void prefetch(const void *x)
-{
-	if (unlikely(!x))
-		return;
-
-	__asm__ __volatile__ ("dcbt 0,%0" : : "r" (x));
-}
-
-static inline void prefetchw(const void *x)
-{
-	if (unlikely(!x))
-		return;
-
-	__asm__ __volatile__ ("dcbtst 0,%0" : : "r" (x));
-}
-
-#define spin_lock_prefetch(x)	prefetchw(x)
 
 #ifdef CONFIG_PPC64
 #define HAVE_ARCH_PICK_MMAP_LAYOUT


* [RFC/PATCH 4/7] powerpc: Change fast_exception_return to restore r0, r7, r8, and CTR
From: Michael Neuling @ 2010-12-06 23:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Kumar Gala; +Cc: linuxppc-dev

Currently we don't restore r0, r7, r8 and CTR in
fast_exception_return.  This changes fast_exception_return to restore
these, since they were saved anyway on exception entry.

Not restoring them seems like a bug waiting to happen, and we already
do this restore in hash_page for 32-bit anyway.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/kernel/entry_32.S |    4 ++++
 arch/powerpc/mm/hash_low_32.S  |    7 -------
 2 files changed, 4 insertions(+), 7 deletions(-)

Index: linux-lazy/arch/powerpc/kernel/entry_32.S
===================================================================
--- linux-lazy.orig/arch/powerpc/kernel/entry_32.S
+++ linux-lazy/arch/powerpc/kernel/entry_32.S
@@ -720,6 +720,10 @@
 #endif
 
 2:	REST_4GPRS(3, r11)
+	REST_2GPRS(7, r11)
+	REST_GPR(0, r11)
+ 	lwz	r10,_CTR(r11)
+ 	mtctr	r10
 	lwz	r10,_CCR(r11)
 	REST_GPR(1, r11)
 	mtcr	r10
Index: linux-lazy/arch/powerpc/mm/hash_low_32.S
===================================================================
--- linux-lazy.orig/arch/powerpc/mm/hash_low_32.S
+++ linux-lazy/arch/powerpc/mm/hash_low_32.S
@@ -146,13 +146,6 @@
 	li	r0,0
 	stw	r0,mmu_hash_lock@l(r8)
 #endif
-
-	/* Return from the exception */
-	lwz	r5,_CTR(r11)
-	mtctr	r5
-	lwz	r0,GPR0(r11)
-	lwz	r7,GPR7(r11)
-	lwz	r8,GPR8(r11)
 	b	fast_exception_return
 
 #ifdef CONFIG_SMP


* [RFC/PATCH 5/7] powerpc: Enable lazy save VMX registers for SMP
From: Michael Neuling @ 2010-12-06 23:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Kumar Gala; +Cc: linuxppc-dev

This enables lazy save of VMX registers for SMP configurations.

This adds a pointer to the thread_struct to say which CPU holds this
process's VMX register state.  On 64-bit, this points to the paca of
the CPU holding the state, or NULL if it's in the thread_struct.  On
32-bit, this is the CPU number of the CPU holding the state, or -1 if
it's in the thread_struct.

It also adds a per-CPU pointer (in the paca on 64-bit), which points to
the task_struct of the process whose state this CPU currently owns.

On a context switch we do the following:
 - if we are switching to a CPU that currently holds the new process's
   state, just turn on VMX in the MSR (this is the lazy/quick case)
 - if the new process's state is in the thread_struct, turn VMX off.
 - if the new process's state is on another CPU, IPI that CPU to give
   up its state and turn VMX off.
We always start the new process at this point, irrespective of whether
the state is in the thread_struct or on the current CPU.

When we take the VMX unavailable exception, load_up_altivec checks to
see if the state is now in the thread_struct.  If it is, we restore
the VMX registers and start the process.  If it's not, we need to wait
for the IPI to finish.  Unfortunately, IRQs are off on the current CPU
at this point, so we must turn IRQs on (to avoid a deadlock) before we
spin waiting for the IPI to finish on the other CPU.
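
In C-like pseudocode the wait looks roughly like this (the real code is
the asm in vector.S below; local_irq_enable()/local_irq_disable() stand
in for the raw MSR_EE manipulation done there):

	if (current->thread.vr_state != TS_LAZY_STATE_INVALID) {
		/* IPI not finished: spin with IRQs on to avoid deadlock */
		local_irq_enable();
		while (current->thread.vr_state != TS_LAZY_STATE_INVALID)
			cpu_relax();
		local_irq_disable();
		/* flush any state we picked up while IRQs were on */
		giveup_altivec(0);
	}
	/* state is now in the thread_struct: restore the VRs and return */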

We also change load_up_altivec to call giveup_altivec to save the old
state rather than duplicating this code.  This means that
giveup_altivec can now be called with the MMU on or off, hence we pass
in an offset which gets subtracted from addresses on 32-bit systems
for loads and stores.

For 32-bit it would be nice to have last_used_altivec cacheline
aligned or as a per_cpu variable, but we can't access per_cpu vars in
asm.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/include/asm/paca.h              |    5 
 arch/powerpc/include/asm/processor.h         |   22 ++
 arch/powerpc/include/asm/reg.h               |    9 +
 arch/powerpc/include/asm/system.h            |   10 -
 arch/powerpc/kernel/asm-offsets.c            |    4 
 arch/powerpc/kernel/paca.c                   |    3 
 arch/powerpc/kernel/process.c                |  138 ++++++++++++------
 arch/powerpc/kernel/vector.S                 |  199 ++++++++++++++++++++-------
 arch/powerpc/platforms/pseries/hotplug-cpu.c |    1 
 9 files changed, 288 insertions(+), 103 deletions(-)

Index: linux-lazy/arch/powerpc/include/asm/paca.h
===================================================================
--- linux-lazy.orig/arch/powerpc/include/asm/paca.h
+++ linux-lazy/arch/powerpc/include/asm/paca.h
@@ -145,6 +145,11 @@
 	u64 dtl_ridx;			/* read index in dispatch log */
 	struct dtl_entry *dtl_curr;	/* pointer corresponding to dtl_ridx */
 
+#ifdef CONFIG_ALTIVEC
+	/* lazy save pointers */
+	struct task_struct *last_used_altivec;
+#endif
+
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
 	/* We use this to store guest state in */
 	struct kvmppc_book3s_shadow_vcpu shadow_vcpu;
Index: linux-lazy/arch/powerpc/include/asm/processor.h
===================================================================
--- linux-lazy.orig/arch/powerpc/include/asm/processor.h
+++ linux-lazy/arch/powerpc/include/asm/processor.h
@@ -18,6 +18,16 @@
 #define TS_FPRWIDTH 1
 #endif
 
+#ifdef CONFIG_PPC64
+#ifdef __ASSEMBLY__
+#define TS_LAZY_STATE_INVALID 0
+#else
+#define TS_LAZY_STATE_INVALID NULL
+#endif
+#else
+#define TS_LAZY_STATE_INVALID -1
+#endif
+
 #ifndef __ASSEMBLY__
 #include <linux/compiler.h>
 
@@ -48,6 +58,7 @@
 
 #define spin_lock_prefetch(x)	prefetchw(x)
 
+#include <linux/call_single_data.h>
 #include <asm/ptrace.h>
 #include <asm/types.h>
 
@@ -109,7 +120,6 @@
 
 /* Lazy FPU handling on uni-processor */
 extern struct task_struct *last_task_used_math;
-extern struct task_struct *last_task_used_altivec;
 extern struct task_struct *last_task_used_vsx;
 extern struct task_struct *last_task_used_spe;
 
@@ -177,6 +187,14 @@
 #define TS_VSRLOWOFFSET 1
 #define TS_FPR(i) fpr[i][TS_FPROFFSET]
 
+#ifdef CONFIG_PPC64
+#define TS_LAZY_STATE_TYPE struct paca_struct *
+#define TS_LAZY_STATE_INVALID NULL
+#else
+#define TS_LAZY_STATE_TYPE unsigned long
+#define TS_LAZY_STATE_INVALID -1
+#endif
+
 struct thread_struct {
 	unsigned long	ksp;		/* Kernel stack pointer */
 	unsigned long	ksp_limit;	/* if ksp <= ksp_limit stack overflow */
@@ -253,6 +271,8 @@
 	/* AltiVec status */
 	vector128	vscr __attribute__((aligned(16)));
 	unsigned long	vrsave;
+	TS_LAZY_STATE_TYPE vr_state;	/* where my vr state is */
+	struct call_single_data vr_csd;	/* IPI data structure */
 	int		used_vr;	/* set if process has used altivec */
 #endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_VSX
Index: linux-lazy/arch/powerpc/include/asm/reg.h
===================================================================
--- linux-lazy.orig/arch/powerpc/include/asm/reg.h
+++ linux-lazy/arch/powerpc/include/asm/reg.h
@@ -808,6 +808,15 @@
 
 #define __is_processor(pv)	(PVR_VER(mfspr(SPRN_PVR)) == (pv))
 
+#ifdef CONFIG_PPC64
+#define GET_CURRENT_THREAD(reg)			\
+	ld	(reg),PACACURRENT(r13) ;	\
+	addi    (reg),(reg),THREAD
+#else
+#define GET_CURRENT_THREAD(reg)			\
+	mfspr	(reg),SPRN_SPRG_THREAD
+#endif
+
 /*
  * IBM has further subdivided the standard PowerPC 16-bit version and
  * revision subfields of the PVR for the PowerPC 403s into the following:
Index: linux-lazy/arch/powerpc/include/asm/system.h
===================================================================
--- linux-lazy.orig/arch/powerpc/include/asm/system.h
+++ linux-lazy/arch/powerpc/include/asm/system.h
@@ -7,6 +7,7 @@
 #include <linux/kernel.h>
 #include <linux/irqflags.h>
 
+#include <asm/irqflags.h>
 #include <asm/hw_irq.h>
 
 /*
@@ -145,7 +146,8 @@
 extern void enable_kernel_fp(void);
 extern void flush_fp_to_thread(struct task_struct *);
 extern void enable_kernel_altivec(void);
-extern void giveup_altivec(struct task_struct *);
+extern void giveup_altivec(unsigned long offset);
+extern void giveup_altivec_ipi(void *);
 extern void load_up_altivec(struct task_struct *);
 extern int emulate_altivec(struct pt_regs *);
 extern void __giveup_vsx(struct task_struct *);
@@ -157,13 +159,7 @@
 extern void cvt_fd(float *from, double *to);
 extern void cvt_df(double *from, float *to);
 
-#ifndef CONFIG_SMP
 extern void discard_lazy_cpu_state(void);
-#else
-static inline void discard_lazy_cpu_state(void)
-{
-}
-#endif
 
 #ifdef CONFIG_ALTIVEC
 extern void flush_altivec_to_thread(struct task_struct *);
Index: linux-lazy/arch/powerpc/kernel/asm-offsets.c
===================================================================
--- linux-lazy.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-lazy/arch/powerpc/kernel/asm-offsets.c
@@ -89,6 +89,7 @@
 	DEFINE(THREAD_VRSAVE, offsetof(struct thread_struct, vrsave));
 	DEFINE(THREAD_VSCR, offsetof(struct thread_struct, vscr));
 	DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
+	DEFINE(THREAD_VR_STATE, offsetof(struct thread_struct, vr_state));
 #endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_VSX
 	DEFINE(THREAD_VSR0, offsetof(struct thread_struct, fpr));
@@ -197,6 +198,9 @@
 	DEFINE(PACA_USER_TIME, offsetof(struct paca_struct, user_time));
 	DEFINE(PACA_SYSTEM_TIME, offsetof(struct paca_struct, system_time));
 	DEFINE(PACA_TRAP_SAVE, offsetof(struct paca_struct, trap_save));
+#ifdef CONFIG_ALTIVEC
+	DEFINE(PACA_LAST_USED_ALTIVEC, offsetof(struct paca_struct, last_used_altivec));
+#endif
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 	DEFINE(PACA_KVM_SVCPU, offsetof(struct paca_struct, shadow_vcpu));
 	DEFINE(SVCPU_SLB, offsetof(struct kvmppc_book3s_shadow_vcpu, slb));
Index: linux-lazy/arch/powerpc/kernel/paca.c
===================================================================
--- linux-lazy.orig/arch/powerpc/kernel/paca.c
+++ linux-lazy/arch/powerpc/kernel/paca.c
@@ -162,6 +162,9 @@
 	new_paca->hw_cpu_id = 0xffff;
 	new_paca->kexec_state = KEXEC_STATE_NONE;
 	new_paca->__current = &init_task;
+#ifdef CONFIG_ALTIVEC
+	new_paca->last_used_altivec = NULL;
+#endif
 #ifdef CONFIG_PPC_STD_MMU_64
 	new_paca->slb_shadow_ptr = &slb_shadow[cpu];
 #endif /* CONFIG_PPC_STD_MMU_64 */
Index: linux-lazy/arch/powerpc/kernel/process.c
===================================================================
--- linux-lazy.orig/arch/powerpc/kernel/process.c
+++ linux-lazy/arch/powerpc/kernel/process.c
@@ -59,7 +59,6 @@
 
 #ifndef CONFIG_SMP
 struct task_struct *last_task_used_math = NULL;
-struct task_struct *last_task_used_altivec = NULL;
 struct task_struct *last_task_used_vsx = NULL;
 struct task_struct *last_task_used_spe = NULL;
 #endif
@@ -117,14 +116,7 @@
 {
 	WARN_ON(preemptible());
 
-#ifdef CONFIG_SMP
-	if (current->thread.regs && (current->thread.regs->msr & MSR_VEC))
-		giveup_altivec(current);
-	else
-		giveup_altivec(NULL);	/* just enable AltiVec for kernel - force */
-#else
-	giveup_altivec(last_task_used_altivec);
-#endif /* CONFIG_SMP */
+	giveup_altivec(0);
 }
 EXPORT_SYMBOL(enable_kernel_altivec);
 
@@ -134,16 +126,7 @@
  */
 void flush_altivec_to_thread(struct task_struct *tsk)
 {
-	if (tsk->thread.regs) {
-		preempt_disable();
-		if (tsk->thread.regs->msr & MSR_VEC) {
-#ifdef CONFIG_SMP
-			BUG_ON(tsk != current);
-#endif
-			giveup_altivec(tsk);
-		}
-		preempt_enable();
-	}
+	giveup_altivec(0);
 }
 #endif /* CONFIG_ALTIVEC */
 
@@ -169,7 +152,7 @@
 void giveup_vsx(struct task_struct *tsk)
 {
 	giveup_fpu(tsk);
-	giveup_altivec(tsk);
+	giveup_altivec(0);
 	__giveup_vsx(tsk);
 }
 
@@ -220,7 +203,6 @@
 }
 #endif /* CONFIG_SPE */
 
-#ifndef CONFIG_SMP
 /*
  * If we are doing lazy switching of CPU state (FP, altivec or SPE),
  * and the current task has some state, discard it.
@@ -228,12 +210,12 @@
 void discard_lazy_cpu_state(void)
 {
 	preempt_disable();
-	if (last_task_used_math == current)
-		last_task_used_math = NULL;
 #ifdef CONFIG_ALTIVEC
-	if (last_task_used_altivec == current)
-		last_task_used_altivec = NULL;
+	giveup_altivec(0);
 #endif /* CONFIG_ALTIVEC */
+#ifndef CONFIG_SMP
+	if (last_task_used_math == current)
+		last_task_used_math = NULL;
 #ifdef CONFIG_VSX
 	if (last_task_used_vsx == current)
 		last_task_used_vsx = NULL;
@@ -242,9 +224,9 @@
 	if (last_task_used_spe == current)
 		last_task_used_spe = NULL;
 #endif
+#endif /* CONFIG_SMP */
 	preempt_enable();
 }
-#endif /* CONFIG_SMP */
 
 #ifdef CONFIG_PPC_ADV_DEBUG_REGS
 void do_send_trap(struct pt_regs *regs, unsigned long address,
@@ -386,6 +368,78 @@
 #ifdef CONFIG_PPC64
 DEFINE_PER_CPU(struct cpu_usage, cpu_usage_array);
 #endif
+#ifdef CONFIG_PPC64
+#define LAZY_STATE_HERE (get_paca())
+#define LAZY_STATE_CPU_ID (state->hw_cpu_id)
+#else
+#define LAZY_STATE_HERE (smp_processor_id())
+#define LAZY_STATE_CPU_ID (state)
+#endif
+
+extern int csd_locked(struct call_single_data *data);
+
+#ifdef CONFIG_ALTIVEC
+/* Return value indicates if it was lazy or not */
+static bool switch_to_altivec_lazy(struct task_struct *prev,
+				   struct task_struct *new)
+{
+	/*
+	 * At this point the Altivec reg state can be in 1 of 3 places
+	 * 1) cached on _this_ CPU.   Lazy/fast  :-)
+	 * 2) in the thread_struct.   Normal     :-|
+	 * 3) cached on another CPU.  Slow IPI   :-(
+	 * .... let's go work out what happened ....
+	 */
+
+	/* Cache the state pointer here in case it changes */
+	TS_LAZY_STATE_TYPE state = new->thread.vr_state;
+
+	/* Is the state here? */
+	if (state == LAZY_STATE_HERE) {
+		/* It's here! Excellent, simply turn VMX on */
+		new->thread.regs->msr |= MSR_VEC;
+		return true;
+	}
+	/*
+	 * If we have used VMX in the past, but don't have lazy state,
+	 * then make sure we turn off VMX.  load_up_altivec will deal
+	 * with saving the lazy state if we run a VMX instruction
+	 */
+	new->thread.regs->msr &= ~MSR_VEC;
+
+	if (state != TS_LAZY_STATE_INVALID) {
+#ifdef CONFIG_SMP
+		/*
+		 * To avoid a deadlock, make sure we don't
+		 * have someone else's state here
+		 */
+		discard_lazy_cpu_state();
+
+		/*
+		 * Get the other CPU to flush its state
+		 * synchronously.  It's possible this may get run
+		 * multiple times, but giveup_altivec can handle this.
+		 */
+		if (!csd_locked(&(new->thread.vr_csd)))
+			__smp_call_function_single(
+				LAZY_STATE_CPU_ID,
+				&(new->thread.vr_csd),
+				0);
+#else /* CONFIG_SMP */
+		/* UP can't have state on another CPU */
+		BUG();
+#endif
+
+	}
+	return false;
+}
+#else /* CONFIG_ALTIVEC */
+static bool switch_to_altivec_lazy(struct task_struct *prev,
+				  struct task_struct *new)
+{
+	return true;
+}
+#endif /* CONFIG_ALTIVEC */
 
 struct task_struct *__switch_to(struct task_struct *prev,
 	struct task_struct *new)
@@ -393,6 +447,12 @@
 	struct thread_struct *new_thread, *old_thread;
 	unsigned long flags;
 	struct task_struct *last;
+	int lazy = 1;
+
+	/* Does next have lazy state somewhere? */
+	if (new->thread.regs) {
+		lazy &= switch_to_altivec_lazy(prev, new);
+	}
 
 #ifdef CONFIG_SMP
 	/* avoid complexity of lazy save/restore of fpu
@@ -406,21 +466,6 @@
 	 */
 	if (prev->thread.regs && (prev->thread.regs->msr & MSR_FP))
 		giveup_fpu(prev);
-#ifdef CONFIG_ALTIVEC
-	/*
-	 * If the previous thread used altivec in the last quantum
-	 * (thus changing altivec regs) then save them.
-	 * We used to check the VRSAVE register but not all apps
-	 * set it, so we don't rely on it now (and in fact we need
-	 * to save & restore VSCR even if VRSAVE == 0).  -- paulus
-	 *
-	 * On SMP we always save/restore altivec regs just to avoid the
-	 * complexity of changing processors.
-	 *  -- Cort
-	 */
-	if (prev->thread.regs && (prev->thread.regs->msr & MSR_VEC))
-		giveup_altivec(prev);
-#endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_VSX
 	if (prev->thread.regs && (prev->thread.regs->msr & MSR_VSX))
 		/* VMX and FPU registers are already save here */
@@ -439,13 +484,6 @@
 #endif /* CONFIG_SPE */
 
 #else  /* CONFIG_SMP */
-#ifdef CONFIG_ALTIVEC
-	/* Avoid the trap.  On smp this this never happens since
-	 * we don't set last_task_used_altivec -- Cort
-	 */
-	if (new->thread.regs && last_task_used_altivec == new)
-		new->thread.regs->msr |= MSR_VEC;
-#endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_VSX
 	if (new->thread.regs && last_task_used_vsx == new)
 		new->thread.regs->msr |= MSR_VSX;
@@ -862,6 +900,10 @@
 	current->thread.vscr.u[3] = 0x00010000; /* Java mode disabled */
 	current->thread.vrsave = 0;
 	current->thread.used_vr = 0;
+	current->thread.vr_state = TS_LAZY_STATE_INVALID;
+	current->thread.vr_csd.func = giveup_altivec_ipi;
+	current->thread.vr_csd.info = 0;
+	current->thread.vr_csd.flags = 0;
 #endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_SPE
 	memset(current->thread.evr, 0, sizeof(current->thread.evr));
Index: linux-lazy/arch/powerpc/kernel/vector.S
===================================================================
--- linux-lazy.orig/arch/powerpc/kernel/vector.S
+++ linux-lazy/arch/powerpc/kernel/vector.S
@@ -5,7 +5,29 @@
 #include <asm/cputable.h>
 #include <asm/thread_info.h>
 #include <asm/page.h>
+#include <asm/exception-64s.h>
+#include <linux/threads.h>
 
+#ifdef CONFIG_PPC32
+	.section .bss
+	.align	4
+last_used_altivec:
+	.space	4*NR_CPUS
+	.previous
+/*
+ * Get the last_used_altivec pointer for this cpu.
+ * Pointer ends up in register n.  Offset in a, volatile scratch in b
+ */
+#define LAST_USED_ALTIVEC_PTR(n, a, b)		\
+	rlwinm	b,r1,0,0,(31-THREAD_SHIFT) ;	\
+        sub     b,b,a	;			\
+	lwz	b,TI_CPU(b) ;			\
+	slwi	b,b,2	    ;			\
+	lis	n,last_used_altivec@ha ;	\
+	addi	n,n,last_used_altivec@l	;	\
+	sub	n,n,b			;	\
+	add	n,n,b
+#endif
 /*
  * load_up_altivec(unused, unused, tsk)
  * Disable VMX for the task which had it previously,
@@ -20,38 +42,98 @@
 	MTMSRD(r5)			/* enable use of AltiVec now */
 	isync
 
+	mflr	r10
+#ifdef CONFIG_PPC32
+	lis	r3, PAGE_OFFSET@h
+#endif
+	bl	giveup_altivec_msr_done
 /*
- * For SMP, we don't do lazy VMX switching because it just gets too
- * horrendously complex, especially when a task switches from one CPU
- * to another.  Instead we call giveup_altvec in switch_to.
- * VRSAVE isn't dealt with here, that is done in the normal context
- * switch code. Note that we could rely on vrsave value to eventually
- * avoid saving all of the VREGs here...
+ * lazy restore:
+ * 	If we are doing lazy restore we enter here either:
+ * 	1. never done vmx before
+ * 	2. done vmx and state is in our thread_struct
+ * 	3. done vmx and but state is being flushed via an IPI
  */
-#ifndef CONFIG_SMP
-	LOAD_REG_ADDRBASE(r3, last_task_used_altivec)
-	toreal(r3)
-	PPC_LL	r4,ADDROFF(last_task_used_altivec)(r3)
-	PPC_LCMPI	0,r4,0
-	beq	1f
+	GET_CURRENT_THREAD(r5)
+	lwz 	r4,THREAD_USED_VR(r5)
+	PPC_LCMPI	cr0,r4,0 /* we've not used vmx before */
+	beq	4f
+
+	/*
+	 * Spin here waiting for the IPI to finish.  Once the data is in
+	 * our thread_struct, vr_state will be TS_LAZY_STATE_INVALID:
+	 *
+	 * First quickly check to see if data has been flushed from
+	 * another CPU yet (as it's likely the IPI has completed)
+	 */
+5:
+	PPC_LL	r4,THREAD_VR_STATE(r5)
+	PPC_LCMPI	0,r4,TS_LAZY_STATE_INVALID
+	beq+	3f /* it's likely the data is already here */
+	/*
+	 * Bugger, the IPI has not completed.  Let's spin here waiting
+	 * for it, but we should turn IRQs on in case someone is
+	 * waiting for us for something.
+	 */
 
-	/* Save VMX state to last_task_used_altivec's THREAD struct */
-	toreal(r4)
-	addi	r4,r4,THREAD
-	SAVE_32VRS(0,r5,r4)
-	mfvscr	vr0
-	li	r10,THREAD_VSCR
-	stvx	vr0,r10,r4
-	/* Disable VMX for last_task_used_altivec */
-	PPC_LL	r5,PT_REGS(r4)
-	toreal(r5)
-	PPC_LL	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-	lis	r10,MSR_VEC@h
-	andc	r4,r4,r10
-	PPC_STL	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-1:
-#endif /* CONFIG_SMP */
+	/* Enable IRQs */
+#ifdef CONFIG_PPC32
+	mfmsr	r4
+	rlwimi	r4,r9,0,MSR_EE
+	MTMSRD(r4)
+#else
+	ENABLE_INTS
+#endif
+2:
+	/* Wait for lazy state to appear */
+	PPC_LL  r4,THREAD_VR_STATE(r5)
+	PPC_LCMPI	0,r4,TS_LAZY_STATE_INVALID
+	bne     2b
 
+	/* disable irqs and enable vec again */
+#ifdef CONFIG_PPC32
+	mfmsr	r4
+	oris	r4,r4,MSR_VEC@h
+	xori	r4,r4,MSR_EE
+	MTMSRD(r4)
+#else
+	mfmsr   r11
+	oris    r11,r11,MSR_VEC@h
+	xori	r11,r11,MSR_EE
+	MTMSRD(r11)
+#endif
+	/*
+	 * make sure we didn't pick up someone's state while we had
+	 * IRQs on
+	 */
+#ifdef CONFIG_PPC32
+	lis	r3, PAGE_OFFSET@h
+#endif
+        bl      giveup_altivec_msr_done
+3:
+	LWSYNC /* make sure VMX registers are in memory */
+4:
+	mtlr	r10
+	/* setup lazy pointers */
+	GET_CURRENT_THREAD(r5)
+#ifdef CONFIG_PPC64
+	PPC_STL	r13,THREAD_VR_STATE(r5)
+#else
+	/* get the cpuid */
+	lis	r6,PAGE_OFFSET@h
+	rlwinm  r7,r1,0,0,(31-THREAD_SHIFT)
+	sub     r7,r7,r6
+	lwz     r7,TI_CPU(r7)
+	PPC_STL	r7,THREAD_VR_STATE(r5) /* write the cpuid */
+#endif
+	subi	r4, r5, THREAD
+#ifdef CONFIG_PPC64
+	PPC_STL	r4,PACA_LAST_USED_ALTIVEC(r13)
+#else
+/*	lis	r6,PAGE_OFFSET@h */
+	LAST_USED_ALTIVEC_PTR(r3, r6, r7)
+	PPC_STL	r4,0(r3)
+#endif
 	/* Hack: if we get an altivec unavailable trap with VRSAVE
 	 * set to all zeros, we assume this is a broken application
 	 * that fails to set it properly, and thus we switch it to
@@ -65,11 +147,8 @@
 1:
 	/* enable use of VMX after return */
 #ifdef CONFIG_PPC32
-	mfspr	r5,SPRN_SPRG_THREAD		/* current task's THREAD (phys) */
 	oris	r9,r9,MSR_VEC@h
 #else
-	ld	r4,PACACURRENT(r13)
-	addi	r5,r4,THREAD		/* Get THREAD */
 	oris	r12,r12,MSR_VEC@h
 	std	r12,_MSR(r1)
 #endif
@@ -79,29 +158,41 @@
 	lvx	vr0,r10,r5
 	mtvscr	vr0
 	REST_32VRS(0,r4,r5)
-#ifndef CONFIG_SMP
-	/* Update last_task_used_altivec to 'current' */
-	subi	r4,r5,THREAD		/* Back to 'current' */
-	fromreal(r4)
-	PPC_STL	r4,ADDROFF(last_task_used_altivec)(r3)
-#endif /* CONFIG_SMP */
 	/* restore registers and return */
 	blr
-
 /*
- * giveup_altivec(tsk)
- * Disable VMX for the task given as the argument,
- * and save the vector registers in its thread_struct.
+ * giveup_altivec(offset)
+ * Disable VMX for the task currently using VMX and save the
+ * vector registers in its thread_struct.
  * Enables the VMX for use in the kernel on return.
  */
+_GLOBAL(giveup_altivec_ipi)
 _GLOBAL(giveup_altivec)
 	mfmsr	r5
 	oris	r5,r5,MSR_VEC@h
 	SYNC
 	MTMSRD(r5)			/* enable use of VMX now */
 	isync
+
+giveup_altivec_msr_done:
+#ifdef CONFIG_PPC64
+	PPC_LL	r3,PACA_LAST_USED_ALTIVEC(r13)
+#else
+	mr	r7, r3
+	LAST_USED_ALTIVEC_PTR(r4, r7, r5)
+	PPC_LL	r3,0(r4) /* phys address */
+#endif
 	PPC_LCMPI	0,r3,0
-	beqlr-				/* if no previous owner, done */
+	beqlr				/* if no previous owner, done */
+#ifdef CONFIG_PPC32
+	/* turn phys address into phys or virt based on offset */
+	lis	r6,PAGE_OFFSET@h
+	sub	r6, r6, r7
+	add	r3, r3, r6
+#endif
+2:
+	/* Save state to the thread struct */
+	mr	r6,r3
 	addi	r3,r3,THREAD		/* want THREAD of task */
 	PPC_LL	r5,PT_REGS(r3)
 	PPC_LCMPI	0,r5,0
@@ -110,6 +201,9 @@
 	li	r4,THREAD_VSCR
 	stvx	vr0,r4,r3
 	beq	1f
+#ifdef CONFIG_PPC32
+	sub	r5, r5, r7
+#endif
 	PPC_LL	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
 #ifdef CONFIG_VSX
 BEGIN_FTR_SECTION
@@ -120,14 +214,25 @@
 #else
 	lis	r3,MSR_VEC@h
 #endif
-	andc	r4,r4,r3		/* disable FP for previous task */
+	andc	r4,r4,r3		/* disable vmx for previous task */
 	PPC_STL	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
 1:
-#ifndef CONFIG_SMP
+	/*
+	 * If this is an IPI, make sure the state is committed before we
+	 * clear the lazy state pointers and return.  If a CPU is waiting on
+	 * this data (IPI case) then it won't start until VR_STATE is cleared
+	 */
+	LWSYNC /* make sure registers are in memory before we say they are */
+	li	r5,TS_LAZY_STATE_INVALID
+	PPC_STL	r5,THREAD+THREAD_VR_STATE(r6)
 	li	r5,0
-	LOAD_REG_ADDRBASE(r4,last_task_used_altivec)
-	PPC_STL	r5,ADDROFF(last_task_used_altivec)(r4)
-#endif /* CONFIG_SMP */
+#ifdef CONFIG_PPC64
+	PPC_STL	r5,PACA_LAST_USED_ALTIVEC(r13)
+#else
+	LAST_USED_ALTIVEC_PTR(r3, r7, r4)
+	PPC_STL	r5,0(r3)
+#endif
+	LWSYNC
 	blr
 
 #ifdef CONFIG_VSX
Index: linux-lazy/arch/powerpc/platforms/pseries/hotplug-cpu.c
===================================================================
--- linux-lazy.orig/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ linux-lazy/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -112,6 +112,7 @@
 
 	local_irq_disable();
 	idle_task_exit();
+	discard_lazy_cpu_state();
 	xics_teardown_cpu();
 
 	if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {


* [RFC/PATCH 6/7] powerpc: Enable lazy save FP registers for SMP
From: Michael Neuling @ 2010-12-06 23:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Kumar Gala; +Cc: linuxppc-dev

This enables lazy save of FP registers for SMP configurations.

This adds a pointer to the thread_struct to say which CPU holds this
process's FP register state.  On 64-bit, this points to the paca of
the CPU holding the state, or NULL if it's in the thread_struct.  On
32-bit, this is the CPU number of the CPU holding the state, or -1 if
it's in the thread_struct.

It also adds a per-CPU pointer (in the paca on 64-bit), which points to
the task_struct of the process whose state this CPU currently owns.

On a context switch we do the following:
 - if we are switching to a CPU that currently holds the new process's
   state, just turn on FP in the MSR (this is the lazy/quick case)
 - if the new process's state is in the thread_struct, turn FP off.
 - if the new process's state is on another CPU, IPI that CPU to give
   up its state and turn FP off.
We always start the new process at this point, irrespective of whether
the state is in the thread_struct or on the current CPU.

When we take the FP unavailable exception, load_up_fpu checks to see
if the state is now in the thread_struct.  If it is, we restore the FP
registers and start the process.  If it's not, we need to wait for the
IPI to finish.  Unfortunately, IRQs are off on the current CPU at this
point, so we must turn IRQs on (to avoid a deadlock) before we spin
waiting for the IPI to finish on the other CPU.

We also change load_up_fpu to call giveup_fpu to save the old state
rather than duplicating this code.  This means that giveup_fpu can now
be called with the MMU on or off, hence we pass in an offset which
gets subtracted from addresses on 32-bit systems for loads and stores.

For 32-bit it would be nice to have last_used_fp cacheline aligned or
as a per_cpu variable, but we can't access per_cpu vars in asm.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/include/asm/paca.h      |    1 
 arch/powerpc/include/asm/processor.h |    8 +
 arch/powerpc/include/asm/system.h    |    3 
 arch/powerpc/kernel/asm-offsets.c    |    3 
 arch/powerpc/kernel/fpu.S            |  198 +++++++++++++++++++++++++++--------
 arch/powerpc/kernel/paca.c           |    1 
 arch/powerpc/kernel/process.c        |  114 +++++++++++---------
 7 files changed, 237 insertions(+), 91 deletions(-)

Index: linux-lazy/arch/powerpc/include/asm/paca.h
===================================================================
--- linux-lazy.orig/arch/powerpc/include/asm/paca.h
+++ linux-lazy/arch/powerpc/include/asm/paca.h
@@ -145,6 +145,7 @@
 	u64 dtl_ridx;			/* read index in dispatch log */
 	struct dtl_entry *dtl_curr;	/* pointer corresponding to dtl_ridx */
 
+	struct task_struct *last_used_fp;
 #ifdef CONFIG_ALTIVEC
 	/* lazy save pointers */
 	struct task_struct *last_used_altivec;
Index: linux-lazy/arch/powerpc/include/asm/processor.h
===================================================================
--- linux-lazy.orig/arch/powerpc/include/asm/processor.h
+++ linux-lazy/arch/powerpc/include/asm/processor.h
@@ -120,7 +120,6 @@
 extern long kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
 
 /* Lazy FPU handling on uni-processor */
-extern struct task_struct *last_task_used_math;
 extern struct task_struct *last_task_used_vsx;
 extern struct task_struct *last_task_used_spe;
 
@@ -253,6 +252,13 @@
 	} fpscr;
 	int		fpexc_mode;	/* floating-point exception mode */
 	unsigned int	align_ctl;	/* alignment handling control */
+	int		used_fp;	/* set if process has used fp */
+#ifdef CONFIG_PPC64
+	struct paca_struct *fp_state;	/* paca where my fp state could be? */
+#else
+	unsigned long	fp_state;	/* paca where my fp state could be? */
+#endif
+	struct call_single_data fp_csd;	/* IPI data structure */
 #ifdef CONFIG_PPC64
 	unsigned long	start_tb;	/* Start purr when proc switched in */
 	unsigned long	accum_tb;	/* Total accumilated purr for process */
Index: linux-lazy/arch/powerpc/include/asm/system.h
===================================================================
--- linux-lazy.orig/arch/powerpc/include/asm/system.h
+++ linux-lazy/arch/powerpc/include/asm/system.h
@@ -140,7 +140,8 @@
 extern void via_cuda_init(void);
 extern void read_rtc_time(void);
 extern void pmac_find_display(void);
-extern void giveup_fpu(struct task_struct *);
+extern void giveup_fpu(unsigned long offset);
+extern void giveup_fpu_ipi(void *);
 extern void disable_kernel_fp(void);
 extern void enable_kernel_fp(void);
 extern void flush_fp_to_thread(struct task_struct *);
Index: linux-lazy/arch/powerpc/kernel/asm-offsets.c
===================================================================
--- linux-lazy.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-lazy/arch/powerpc/kernel/asm-offsets.c
@@ -84,6 +84,8 @@
 	DEFINE(THREAD_FPEXC_MODE, offsetof(struct thread_struct, fpexc_mode));
 	DEFINE(THREAD_FPR0, offsetof(struct thread_struct, fpr[0]));
 	DEFINE(THREAD_FPSCR, offsetof(struct thread_struct, fpscr));
+	DEFINE(THREAD_FP_STATE, offsetof(struct thread_struct, fp_state));
+	DEFINE(THREAD_USED_FP, offsetof(struct thread_struct, used_fp));
 #ifdef CONFIG_ALTIVEC
 	DEFINE(THREAD_VR0, offsetof(struct thread_struct, vr[0]));
 	DEFINE(THREAD_VRSAVE, offsetof(struct thread_struct, vrsave));
@@ -198,6 +200,7 @@
 	DEFINE(PACA_USER_TIME, offsetof(struct paca_struct, user_time));
 	DEFINE(PACA_SYSTEM_TIME, offsetof(struct paca_struct, system_time));
 	DEFINE(PACA_TRAP_SAVE, offsetof(struct paca_struct, trap_save));
+	DEFINE(PACA_LAST_USED_FP, offsetof(struct paca_struct, last_used_fp));
 #ifdef CONFIG_ALTIVEC
 	DEFINE(PACA_LAST_USED_ALTIVEC, offsetof(struct paca_struct, last_used_altivec));
 #endif
Index: linux-lazy/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-lazy.orig/arch/powerpc/kernel/fpu.S
+++ linux-lazy/arch/powerpc/kernel/fpu.S
@@ -23,6 +23,8 @@
 #include <asm/thread_info.h>
 #include <asm/ppc_asm.h>
 #include <asm/asm-offsets.h>
+#include <asm/exception-64s.h>
+#include <linux/threads.h>
 
 #ifdef CONFIG_VSX
 #define REST_32FPVSRS(n,c,base)						\
@@ -47,6 +49,26 @@
 #define SAVE_32FPVSRS(n,b,base)	SAVE_32FPRS(n, base)
 #endif
 
+#ifdef CONFIG_PPC32
+       .section .bss
+       .align  4
+last_used_fp:
+       .space  4*NR_CPUS
+       .previous
+/*
+ * Get the last_used_fp pointer for this cpu.
+ * Pointer ends up in register n.  Offset in a, volatile scratch in b
+ */
+#define LAST_USED_FP_PTR(n, a, b)		\
+       rlwinm  b,r1,0,0,(31-THREAD_SHIFT) ;	\
+       sub     b,b,a	;			\
+       lwz     b,TI_CPU(b) ;			\
+       slwi    b,b,2       ;			\
+       lis     n,last_used_fp@ha ;		\
+       addi    n,n,last_used_fp@l ;		\
+       sub     n,n,a	;			\
+       add     n,n,b
+#endif
 /*
  * This task wants to use the FPU now.
  * On UP, disable FP for the task which had the FPU previously,
@@ -65,52 +87,113 @@
 	SYNC
 	MTMSRD(r5)			/* enable use of fpu now */
 	isync
+
+	mflr    r10
+#ifdef CONFIG_PPC32
+	lis	r3, PAGE_OFFSET@h
+#endif
+	bl      giveup_fpu_msr_done
 /*
- * For SMP, we don't do lazy FPU switching because it just gets too
- * horrendously complex, especially when a task switches from one CPU
- * to another.  Instead we call giveup_fpu in switch_to.
+ * lazy restore:
+ * 	If we are doing lazy restore we enter here either:
+ * 	1. never done fp before
+ * 	2. done fp and state is in our thread_struct
+ * 	3. done fp and but state is being flushed via an IPI
  */
-#ifndef CONFIG_SMP
-	LOAD_REG_ADDRBASE(r3, last_task_used_math)
-	toreal(r3)
-	PPC_LL	r4,ADDROFF(last_task_used_math)(r3)
-	PPC_LCMPI	0,r4,0
-	beq	1f
-	toreal(r4)
-	addi	r4,r4,THREAD		/* want last_task_used_math->thread */
-	SAVE_32FPVSRS(0, r5, r4)
-	mffs	fr0
-	stfd	fr0,THREAD_FPSCR(r4)
-	PPC_LL	r5,PT_REGS(r4)
-	toreal(r5)
-	PPC_LL	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-	li	r10,MSR_FP|MSR_FE0|MSR_FE1
-	andc	r4,r4,r10		/* disable FP for previous task */
-	PPC_STL	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-1:
-#endif /* CONFIG_SMP */
+	GET_CURRENT_THREAD(r5)
+	lwz 	r4,THREAD_USED_FP(r5)
+	PPC_LCMPI	cr0,r4,0 /* we've not used fp before */
+	beq	4f
+
+	/*
+	 * Spin here waiting for the IPI to finish.  Once the data is in
+	 * our thread_struct, fp_state will be TS_LAZY_STATE_INVALID:
+	 *
+	 * First quickly check to see if data has been flushed from
+	 * another CPU yet (as it's likely the IPI has completed)
+	 */
+5:
+	PPC_LL	r4,THREAD_FP_STATE(r5)
+	PPC_LCMPI	0,r4,TS_LAZY_STATE_INVALID
+	beq+	3f /* it's likely the data is already here */
+	/*
+	 * Bugger, the IPI has not completed.  Let's spin here waiting
+	 * for it, but we should turn IRQs on in case someone is
+	 * waiting for us for something.
+	 */
+
+	/* Enable IRQs */
+#ifdef CONFIG_PPC32
+	mfmsr	r4
+	rlwimi	r4,r9,0,MSR_EE
+	MTMSRD(r4)
+#else
+	ENABLE_INTS
+#endif
+2:
+	/* Wait for lazy state to appear */
+	PPC_LL	r4,THREAD_FP_STATE(r5)
+	PPC_LCMPI	0,r4,TS_LAZY_STATE_INVALID
+	bne	2b
+
+	/* disable irqs and enable fp again */
+#ifdef CONFIG_PPC32
+	mfmsr	r4
+	ori	r4,r4,MSR_FP
+	xori	r4,r4,MSR_EE
+	MTMSRD(r4)
+#else
+	mfmsr	r11
+	ori	r11,r11,MSR_FP
+	xori	r11,r11,MSR_EE
+	MTMSRD(r11)
+#endif
+	/*
+	 * make sure we didn't pick up someone's state while we had
+	 * IRQs on
+	 */
+#ifdef CONFIG_PPC32
+	lis	r3, PAGE_OFFSET@h
+#endif
+	bl	giveup_fpu_msr_done
+3:
+	LWSYNC /* make sure fp registers are in memory */
+4:
+	mtlr	r10
+
+	/* setup lazy pointers */
+	GET_CURRENT_THREAD(r5)
+#ifdef CONFIG_PPC64
+	PPC_STL	r13,THREAD_FP_STATE(r5)
+#else
+	/* get the cpuid */
+	lis	r6,PAGE_OFFSET@h
+	rlwinm  r7,r1,0,0,(31-THREAD_SHIFT)
+	sub     r7,r7,r6
+	lwz     r7,TI_CPU(r7)
+	PPC_STL	r7,THREAD_FP_STATE(r5) /* write the cpuid */
+#endif
+	subi	r4, r5, THREAD
+#ifdef CONFIG_PPC64
+	PPC_STL	r4,PACA_LAST_USED_FP(r13)
+#else
+/*	lis	r6, PAGE_OFFSET@h */
+	LAST_USED_FP_PTR(r3, r6, r7)
+	PPC_STL	r4,0(r3)
+#endif
 	/* enable use of FP after return */
 #ifdef CONFIG_PPC32
-	mfspr	r5,SPRN_SPRG_THREAD		/* current task's THREAD (phys) */
-	lwz	r4,THREAD_FPEXC_MODE(r5)
-	ori	r9,r9,MSR_FP		/* enable FP for current */
-	or	r9,r9,r4
-#else
-	ld	r4,PACACURRENT(r13)
-	addi	r5,r4,THREAD		/* Get THREAD */
-	lwz	r4,THREAD_FPEXC_MODE(r5)
+	ori	r9,r9,MSR_FP
+#else
 	ori	r12,r12,MSR_FP
-	or	r12,r12,r4
 	std	r12,_MSR(r1)
 #endif
+	li	r4,1
+	stw	r4,THREAD_USED_FP(r5)
+	LWSYNC
 	lfd	fr0,THREAD_FPSCR(r5)
 	MTFSF_L(fr0)
 	REST_32FPVSRS(0, r4, r5)
-#ifndef CONFIG_SMP
-	subi	r4,r5,THREAD
-	fromreal(r4)
-	PPC_STL	r4,ADDROFF(last_task_used_math)(r3)
-#endif /* CONFIG_SMP */
 	/* restore registers and return */
 	/* we haven't used ctr or xer or lr */
 	blr
@@ -122,6 +205,7 @@
  * Enables the FPU for use in the kernel on return.
  */
 _GLOBAL(giveup_fpu)
+_GLOBAL(giveup_fpu_ipi)
 	mfmsr	r5
 	ori	r5,r5,MSR_FP
 #ifdef CONFIG_VSX
@@ -134,8 +218,26 @@
 	MTMSRD(r5)			/* enable use of fpu now */
 	SYNC_601
 	isync
+
+giveup_fpu_msr_done:
+#ifdef CONFIG_PPC64
+	PPC_LL	r3,PACA_LAST_USED_FP(r13)
+#else
+	mr	r7, r3
+	LAST_USED_FP_PTR(r4, r7, r5)
+	PPC_LL	r3,0(r4)		/* phys address */
+#endif
 	PPC_LCMPI	0,r3,0
-	beqlr-				/* if no previous owner, done */
+	beqlr				/* if no previous owner, done */
+#ifdef CONFIG_PPC32
+	/* turn phys address into phys or virt based on offset */
+	lis	r6,PAGE_OFFSET@h
+	sub	r6, r6, r7
+	add	r3, r3, r6
+#endif
+2:
+	/* Save state to the thread struct */
+	mr	r6,r3
 	addi	r3,r3,THREAD	        /* want THREAD of task */
 	PPC_LL	r5,PT_REGS(r3)
 	PPC_LCMPI	0,r5,0
@@ -143,6 +245,9 @@
 	mffs	fr0
 	stfd	fr0,THREAD_FPSCR(r3)
 	beq	1f
+#ifdef CONFIG_PPC32
+	sub	r5, r5, r7
+#endif
 	PPC_LL	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
 	li	r3,MSR_FP|MSR_FE0|MSR_FE1
 #ifdef CONFIG_VSX
@@ -153,11 +258,22 @@
 	andc	r4,r4,r3		/* disable FP for previous task */
 	PPC_STL	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
 1:
-#ifndef CONFIG_SMP
+	/*
+	 * If this is an IPI, make sure the state is committed before we
+	 * clear the lazy state pointers and return.  If a CPU is waiting on
+	 * this data (IPI case) then it won't start until FP_STATE is cleared
+	 */
+	LWSYNC /* make sure registers are in memory before we say they are */
+	li	r5,TS_LAZY_STATE_INVALID
+	PPC_STL	r5,THREAD+THREAD_FP_STATE(r6)
 	li	r5,0
-	LOAD_REG_ADDRBASE(r4,last_task_used_math)
-	PPC_STL	r5,ADDROFF(last_task_used_math)(r4)
-#endif /* CONFIG_SMP */
+#ifdef CONFIG_PPC64
+	PPC_STL	r5,PACA_LAST_USED_FP(r13)
+#else
+	LAST_USED_FP_PTR(r3, r7, r4)
+	PPC_STL	r5,0(r3)
+#endif
+	LWSYNC
 	blr
 
 /*
Index: linux-lazy/arch/powerpc/kernel/paca.c
===================================================================
--- linux-lazy.orig/arch/powerpc/kernel/paca.c
+++ linux-lazy/arch/powerpc/kernel/paca.c
@@ -162,6 +162,7 @@
 	new_paca->hw_cpu_id = 0xffff;
 	new_paca->kexec_state = KEXEC_STATE_NONE;
 	new_paca->__current = &init_task;
+	new_paca->last_used_fp = NULL;
 #ifdef CONFIG_ALTIVEC
 	new_paca->last_used_altivec = NULL;
 #endif
Index: linux-lazy/arch/powerpc/kernel/process.c
===================================================================
--- linux-lazy.orig/arch/powerpc/kernel/process.c
+++ linux-lazy/arch/powerpc/kernel/process.c
@@ -58,7 +58,6 @@
 extern unsigned long _get_SP(void);
 
 #ifndef CONFIG_SMP
-struct task_struct *last_task_used_math = NULL;
 struct task_struct *last_task_used_vsx = NULL;
 struct task_struct *last_task_used_spe = NULL;
 #endif
@@ -69,45 +68,14 @@
  */
 void flush_fp_to_thread(struct task_struct *tsk)
 {
-	if (tsk->thread.regs) {
-		/*
-		 * We need to disable preemption here because if we didn't,
-		 * another process could get scheduled after the regs->msr
-		 * test but before we have finished saving the FP registers
-		 * to the thread_struct.  That process could take over the
-		 * FPU, and then when we get scheduled again we would store
-		 * bogus values for the remaining FP registers.
-		 */
-		preempt_disable();
-		if (tsk->thread.regs->msr & MSR_FP) {
-#ifdef CONFIG_SMP
-			/*
-			 * This should only ever be called for current or
-			 * for a stopped child process.  Since we save away
-			 * the FP register state on context switch on SMP,
-			 * there is something wrong if a stopped child appears
-			 * to still have its FP state in the CPU registers.
-			 */
-			BUG_ON(tsk != current);
-#endif
-			giveup_fpu(tsk);
-		}
-		preempt_enable();
-	}
+	giveup_fpu(0);
 }
 
 void enable_kernel_fp(void)
 {
 	WARN_ON(preemptible());
 
-#ifdef CONFIG_SMP
-	if (current->thread.regs && (current->thread.regs->msr & MSR_FP))
-		giveup_fpu(current);
-	else
-		giveup_fpu(NULL);	/* just enables FP for kernel */
-#else
-	giveup_fpu(last_task_used_math);
-#endif /* CONFIG_SMP */
+	giveup_fpu(0);
 }
 EXPORT_SYMBOL(enable_kernel_fp);
 
@@ -151,7 +119,7 @@
 
 void giveup_vsx(struct task_struct *tsk)
 {
-	giveup_fpu(tsk);
+	giveup_fpu(0);
 	giveup_altivec(0);
 	__giveup_vsx(tsk);
 }
@@ -210,12 +178,11 @@
 void discard_lazy_cpu_state(void)
 {
 	preempt_disable();
+	giveup_fpu(0);
 #ifdef CONFIG_ALTIVEC
 	giveup_altivec(0);
 #endif /* CONFIG_ALTIVEC */
 #ifndef CONFIG_SMP
-	if (last_task_used_math == current)
-		last_task_used_math = NULL;
 #ifdef CONFIG_VSX
 	if (last_task_used_vsx == current)
 		last_task_used_vsx = NULL;
@@ -378,6 +345,60 @@
 
 extern int csd_locked(struct call_single_data *data);
 
+/* Return value indicates if it was lazy or not */
+static bool switch_to_fp_lazy(struct task_struct *prev,
+			      struct task_struct *new)
+{
+	/*
+	 * At this point the FP reg state can be in 1 of 3 places
+	 * 1) cached on _this_ CPU.   Lazy/fast  :-)
+	 * 2) in the thread_struct.   Normal     :-|
+	 * 3) cached on another CPU.  Slow IPI   :-(
+	 * .... let's go work out what happened ....
+	 */
+
+	/* Cache the state pointer here in case it changes */
+	TS_LAZY_STATE_TYPE state = new->thread.fp_state;
+
+	/* Is the state here? */
+	if (state == LAZY_STATE_HERE) {
+		/* It's here! Excellent, simply turn FP on */
+		new->thread.regs->msr |= MSR_FP;
+		return true;
+	}
+	/*
+	 * If we have used FP in the past, but don't have lazy state,
+	 * then make sure we turn off FP.  load_up_fpu will deal
+	 * with saving the lazy state if we run an fp instruction
+	 */
+	new->thread.regs->msr &= ~MSR_FP;
+
+	if (state != TS_LAZY_STATE_INVALID) {
+#ifdef CONFIG_SMP
+		/*
+		 * To avoid a deadlock, make sure we don't
+		 * have someone else's state here
+		 */
+		discard_lazy_cpu_state();
+
+		/*
+		 * Get the other CPU to flush its state
+		 * synchronously.  It's possible this may get run
+		 * multiple times, but giveup_fpu can handle this.
+		 */
+		if (!csd_locked(&(new->thread.fp_csd)))
+			__smp_call_function_single(
+				LAZY_STATE_CPU_ID,
+				&(new->thread.fp_csd),
+				0);
+#else /* CONFIG_SMP */
+		/* UP can't have state on another CPU */
+		BUG();
+#endif
+	}
+	return false;
+}
+
 #ifdef CONFIG_ALTIVEC
 /* Return value indicates if it was lazy or not */
 static bool switch_to_altivec_lazy(struct task_struct *prev,
@@ -451,21 +472,11 @@
 
 	/* Does next have lazy state somewhere? */
 	if (new->thread.regs) {
+		lazy &= switch_to_fp_lazy(prev, new);
 		lazy &= switch_to_altivec_lazy(prev, new);
 	}
 
 #ifdef CONFIG_SMP
-	/* avoid complexity of lazy save/restore of fpu
-	 * by just saving it every time we switch out if
-	 * this task used the fpu during the last quantum.
-	 *
-	 * If it tries to use the fpu again, it'll trap and
-	 * reload its fp regs.  So we don't have to do a restore
-	 * every switch, just a save.
-	 *  -- Cort
-	 */
-	if (prev->thread.regs && (prev->thread.regs->msr & MSR_FP))
-		giveup_fpu(prev);
 #ifdef CONFIG_VSX
 	if (prev->thread.regs && (prev->thread.regs->msr & MSR_VSX))
 		/* VMX and FPU registers are already save here */
@@ -892,8 +903,15 @@
 #ifdef CONFIG_VSX
 	current->thread.used_vsr = 0;
 #endif
+#ifdef CONFIG_PPC_FPU
 	memset(current->thread.fpr, 0, sizeof(current->thread.fpr));
 	current->thread.fpscr.val = 0;
+	current->thread.used_fp = 0;
+	current->thread.fp_state = TS_LAZY_STATE_INVALID;
+	current->thread.fp_csd.func = giveup_fpu_ipi;
+	current->thread.fp_csd.info = 0;
+	current->thread.fp_csd.flags = 0;
+#endif /* CONFIG_PPC_FPU */
 #ifdef CONFIG_ALTIVEC
 	memset(current->thread.vr, 0, sizeof(current->thread.vr));
 	memset(&current->thread.vscr, 0, sizeof(current->thread.vscr));


* [RFC/PATCH 7/7] powerpc: Enable lazy save VSX for SMP
From: Michael Neuling @ 2010-12-06 23:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Kumar Gala; +Cc: linuxppc-dev

This enables lazy save of VSX state for SMP configurations.

Most of the logic for this is in the FP and VMX code, since VSX has no
additional state over these.  

When context switching to a new process:
 - if both the VMX and FP state are on the CPU we are switching to,
   turn VSX on as well.
 - if either the FP or VMX state is not on the CPU we are switching to,
   do not turn VSX on in the MSR.
We always start the new process at this point, irrespective of whether
we have the FP and/or VMX state in the thread struct or on the current
CPU.
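
For illustration, the context switch flow this gives in __switch_to()
looks roughly like the C below.  It simply mirrors the
switch_to_fp_lazy/switch_to_altivec_lazy helpers from the earlier
patches plus the switch_to_vsx_lazy() helper added in the diff further
down, so treat it as a sketch rather than the patch itself:

	bool lazy = true;

	if (new->thread.regs) {
		/* each helper returns true only if the new task's
		 * state is already sitting on this CPU */
		lazy &= switch_to_fp_lazy(prev, new);
		lazy &= switch_to_altivec_lazy(prev, new);
		/* VSX can only be lazy if both FP and VMX state are
		 * local, since VSX has no state of its own */
		switch_to_vsx_lazy(prev, new, lazy);
	}

switch_to_vsx_lazy() then either sets MSR_VSX for the new task (the
fast case) or clears it and leaves the work to the vsx_unavailable
exception.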

When we take the vsx_unavailable exception, we first run load_up_fpu
and load_up_altivec to bring our state in.  If either of these fails to
enable its respective MSR bit, the state has not yet arrived from the
IPI, so we bail back to userspace with VSX off.  This enables IRQs
while we wait for the FP and VMX state.
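
As a rough sketch only -- the real implementation is the load_up_vsx
assembly in vector.S below, and the function name, signature and
helper names here are illustrative rather than actual kernel
interfaces -- the exception path described above behaves like this C:

	/* Sketch of the vsx_unavailable logic; the real code is asm. */
	static void vsx_unavailable_sketch(struct pt_regs *regs)
	{
		/* Pull in FP and VMX state; either call may have to IPI
		 * the CPU that still holds the state. */
		if (!(regs->msr & MSR_FP))
			load_up_fpu_state(current);	/* illustrative name */
		if (!(regs->msr & MSR_VEC))
			load_up_altivec_state(current);	/* illustrative name */

		/* If either piece of state has not arrived yet, return
		 * to userspace with VSX still off (and IRQs on); we
		 * will simply fault again once the IPI has flushed the
		 * state to this CPU. */
		if ((regs->msr & (MSR_FP | MSR_VEC)) != (MSR_FP | MSR_VEC))
			return;

		current->thread.used_vsr = 1;
		regs->msr |= MSR_VSX;	/* VSX on when we return to userspace */
	}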

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/include/asm/system.h |    4 +-
 arch/powerpc/kernel/process.c     |   72 ++++++++++++++++++++------------------
 arch/powerpc/kernel/signal_32.c   |    2 -
 arch/powerpc/kernel/signal_64.c   |    2 -
 arch/powerpc/kernel/vector.S      |   48 +++----------------------
 5 files changed, 49 insertions(+), 79 deletions(-)

Index: linux-lazy/arch/powerpc/include/asm/system.h
===================================================================
--- linux-lazy.orig/arch/powerpc/include/asm/system.h
+++ linux-lazy/arch/powerpc/include/asm/system.h
@@ -150,8 +150,8 @@
 extern void giveup_altivec_ipi(void *);
 extern void load_up_altivec(struct task_struct *);
 extern int emulate_altivec(struct pt_regs *);
-extern void __giveup_vsx(struct task_struct *);
-extern void giveup_vsx(struct task_struct *);
+extern void __giveup_vsx(void);
+extern void giveup_vsx(void);
 extern void enable_kernel_spe(void);
 extern void giveup_spe(struct task_struct *);
 extern void load_up_spe(struct task_struct *);
Index: linux-lazy/arch/powerpc/kernel/process.c
===================================================================
--- linux-lazy.orig/arch/powerpc/kernel/process.c
+++ linux-lazy/arch/powerpc/kernel/process.c
@@ -58,7 +58,6 @@
 extern unsigned long _get_SP(void);
 
 #ifndef CONFIG_SMP
-struct task_struct *last_task_used_vsx = NULL;
 struct task_struct *last_task_used_spe = NULL;
 #endif
 
@@ -105,37 +104,21 @@
 {
 	WARN_ON(preemptible());
 
-#ifdef CONFIG_SMP
-	if (current->thread.regs && (current->thread.regs->msr & MSR_VSX))
-		giveup_vsx(current);
-	else
-		giveup_vsx(NULL);	/* just enable vsx for kernel - force */
-#else
-	giveup_vsx(last_task_used_vsx);
-#endif /* CONFIG_SMP */
+	giveup_vsx();
 }
 EXPORT_SYMBOL(enable_kernel_vsx);
 #endif
 
-void giveup_vsx(struct task_struct *tsk)
+void giveup_vsx(void)
 {
 	giveup_fpu(0);
 	giveup_altivec(0);
-	__giveup_vsx(tsk);
+	__giveup_vsx();
 }
 
 void flush_vsx_to_thread(struct task_struct *tsk)
 {
-	if (tsk->thread.regs) {
-		preempt_disable();
-		if (tsk->thread.regs->msr & MSR_VSX) {
-#ifdef CONFIG_SMP
-			BUG_ON(tsk != current);
-#endif
-			giveup_vsx(tsk);
-		}
-		preempt_enable();
-	}
+	giveup_vsx();
 }
 #endif /* CONFIG_VSX */
 
@@ -182,11 +165,11 @@
 #ifdef CONFIG_ALTIVEC
 	giveup_altivec(0);
 #endif /* CONFIG_ALTIVEC */
-#ifndef CONFIG_SMP
 #ifdef CONFIG_VSX
-	if (last_task_used_vsx == current)
-		last_task_used_vsx = NULL;
+	/* use __ version since fpu and altivec have been called already */
+	__giveup_vsx();
 #endif /* CONFIG_VSX */
+#ifndef CONFIG_SMP
 #ifdef CONFIG_SPE
 	if (last_task_used_spe == current)
 		last_task_used_spe = NULL;
@@ -462,26 +445,51 @@
 }
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+/* Return value indicates if it was lazy or not */
+static bool switch_to_vsx_lazy(struct task_struct *prev,
+			      struct task_struct *new,
+			      bool lazy)
+{
+	/* Is the state here? */
+	if (lazy) {
+		/* It's here! Excellent, simply turn VSX on */
+		new->thread.regs->msr |= MSR_VSX;
+		return true;
+	}
+	/*
+	 * If we have used VSX in the past, but don't have lazy state,
+	 * then make sure we turn off VSX.  load_up_vsx will deal
+	 * with saving the lazy state if we run a VSX instruction
+	 */
+	new->thread.regs->msr &= ~MSR_VSX;
+	return false;
+}
+#else /* CONFIG_VSX */
+static bool switch_to_vsx_lazy(struct task_struct *prev,
+			      struct task_struct *new,
+			      bool lazy)
+{
+	return true;
+}
+#endif /* CONFIG_VSX */
+
 struct task_struct *__switch_to(struct task_struct *prev,
 	struct task_struct *new)
 {
 	struct thread_struct *new_thread, *old_thread;
 	unsigned long flags;
 	struct task_struct *last;
-	int lazy = 1;
+	bool lazy = true;
 
 	/* Does next have lazy state somewhere? */
 	if (new->thread.regs) {
 		lazy &= switch_to_fp_lazy(prev, new);
 		lazy &= switch_to_altivec_lazy(prev, new);
+		switch_to_vsx_lazy(prev, new, lazy);
 	}
 
 #ifdef CONFIG_SMP
-#ifdef CONFIG_VSX
-	if (prev->thread.regs && (prev->thread.regs->msr & MSR_VSX))
-		/* VMX and FPU registers are already save here */
-		__giveup_vsx(prev);
-#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	/*
 	 * If the previous thread used spe in the last quantum
@@ -495,10 +503,6 @@
 #endif /* CONFIG_SPE */
 
 #else  /* CONFIG_SMP */
-#ifdef CONFIG_VSX
-	if (new->thread.regs && last_task_used_vsx == new)
-		new->thread.regs->msr |= MSR_VSX;
-#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	/* Avoid the trap.  On smp this this never happens since
 	 * we don't set last_task_used_spe
Index: linux-lazy/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-lazy.orig/arch/powerpc/kernel/signal_32.c
+++ linux-lazy/arch/powerpc/kernel/signal_32.c
@@ -452,7 +452,7 @@
 	 * contains valid data
 	 */
 	if (current->thread.used_vsr && ctx_has_vsx_region) {
-		__giveup_vsx(current);
+		__giveup_vsx();
 		if (copy_vsx_to_user(&frame->mc_vsregs, current))
 			return 1;
 		msr |= MSR_VSX;
Index: linux-lazy/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-lazy.orig/arch/powerpc/kernel/signal_64.c
+++ linux-lazy/arch/powerpc/kernel/signal_64.c
@@ -123,7 +123,7 @@
 	 * VMX data.
 	 */
 	if (current->thread.used_vsr && ctx_has_vsx_region) {
-		__giveup_vsx(current);
+		__giveup_vsx();
 		v_regs += ELF_NVRREG;
 		err |= copy_vsx_to_user(v_regs, current);
 		/* set MSR_VSX in the MSR value in the frame to
Index: linux-lazy/arch/powerpc/kernel/vector.S
===================================================================
--- linux-lazy.orig/arch/powerpc/kernel/vector.S
+++ linux-lazy/arch/powerpc/kernel/vector.S
@@ -257,50 +257,23 @@
 	beql+	load_up_fpu		/* skip if already loaded */
 	andis.	r5,r12,MSR_VEC@h
 	beql+	load_up_altivec		/* skip if already loaded */
-
-#ifndef CONFIG_SMP
-	ld	r3,last_task_used_vsx@got(r2)
-	ld	r4,0(r3)
-	cmpdi	0,r4,0
-	beq	1f
-	/* Disable VSX for last_task_used_vsx */
-	addi	r4,r4,THREAD
-	ld	r5,PT_REGS(r4)
-	ld	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-	lis	r6,MSR_VSX@h
-	andc	r6,r4,r6
-	std	r6,_MSR-STACK_FRAME_OVERHEAD(r5)
-1:
-#endif /* CONFIG_SMP */
-	ld	r4,PACACURRENT(r13)
-	addi	r4,r4,THREAD		/* Get THREAD */
+/* state is all local now */
+	GET_CURRENT_THREAD(r5)
 	li	r6,1
-	stw	r6,THREAD_USED_VSR(r4) /* ... also set thread used vsr */
+	stw	r6,THREAD_USED_VSR(r5)
 	/* enable use of VSX after return */
 	oris	r12,r12,MSR_VSX@h
 	std	r12,_MSR(r1)
-#ifndef CONFIG_SMP
-	/* Update last_task_used_vsx to 'current' */
-	ld	r4,PACACURRENT(r13)
-	std	r4,0(r3)
-#endif /* CONFIG_SMP */
 	b	fast_exception_return
 
 /*
- * __giveup_vsx(tsk)
- * Disable VSX for the task given as the argument.
+ * __giveup_vsx()
+ * Disable VSX for current task
  * Does NOT save vsx registers.
- * Enables the VSX for use in the kernel on return.
+ * Doesn't enable kernel VSX on return (we could if needed later)
  */
 _GLOBAL(__giveup_vsx)
-	mfmsr	r5
-	oris	r5,r5,MSR_VSX@h
-	mtmsrd	r5			/* enable use of VSX now */
-	isync
-
-	cmpdi	0,r3,0
-	beqlr-				/* if no previous owner, done */
-	addi	r3,r3,THREAD		/* want THREAD of task */
+	GET_CURRENT_THREAD(r3)
 	ld	r5,PT_REGS(r3)
 	cmpdi	0,r5,0
 	beq	1f
@@ -309,16 +282,9 @@
 	andc	r4,r4,r3		/* disable VSX for previous task */
 	std	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
 1:
-#ifndef CONFIG_SMP
-	li	r5,0
-	ld	r4,last_task_used_vsx@got(r2)
-	std	r5,0(r4)
-#endif /* CONFIG_SMP */
 	blr
-
 #endif /* CONFIG_VSX */
 
-
 /*
  * The routines below are in assembler so we can closely control the
  * usage of floating-point registers.  These routines must be called

^ permalink raw reply	[flat|nested] 8+ messages in thread
