* [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX.
@ 2008-06-18  0:47 Michael Neuling
From: Michael Neuling @ 2008-06-18  0:47 UTC
  To: Paul Mackerras; +Cc: linuxppc-dev

The following set of patches adds Vector Scalar Extensions (VSX)
support for POWER7.  It includes context switch, ptrace and signal support.

Signed-off-by: Michael Neuling <mikey@neuling.org>
--- 
This series is on top of the POWER7 cputable entry patch. 

Paulus: please consider for your 2.6.27 tree.


* [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
@ 2008-06-18  0:47 ` Michael Neuling
  2008-06-18 14:53   ` Kumar Gala
From: Michael Neuling @ 2008-06-18  0:47 UTC
  To: Paul Mackerras; +Cc: linuxppc-dev

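save_user_regs writes the saved MSR to the frame separately for VEC and
for SPE, each time as regs->msr with one bit ORed in.  If we set the SPE
MSR bit, that second write can blow away the VEC bit.  This will never
happen in reality, but it looks bad.  Accumulate the bits in a local
variable and write the MSR to the frame once.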
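
For illustration only, here is a minimal user-space sketch of the two
patterns.  The bit values and function names are made up for the demo
(the real MSR_VEC and MSR_SPE definitions live in the kernel headers
and may differ), and the return value stands in for
frame->mc_gregs[PT_MSR].

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values only, chosen to be distinct for the demo. */
#define DEMO_MSR_VEC (1u << 25)
#define DEMO_MSR_SPE (1u << 24)

/* Old pattern: each branch stores regs_msr | bit, so the later store
 * overwrites whatever the earlier one wrote. */
uint32_t save_msr_overwrite(uint32_t regs_msr, int used_vr, int used_spe)
{
	uint32_t saved = regs_msr;	/* copied with the general regs */

	if (used_vr)
		saved = regs_msr | DEMO_MSR_VEC;
	if (used_spe)
		saved = regs_msr | DEMO_MSR_SPE;	/* VEC bit lost here */
	return saved;
}

/* New pattern: accumulate the bits in a local and store once. */
uint32_t save_msr_accumulate(uint32_t regs_msr, int used_vr, int used_spe)
{
	uint32_t msr = regs_msr;

	/* The demo only makes sense if the two bits are distinct. */
	assert((DEMO_MSR_VEC & DEMO_MSR_SPE) == 0);

	if (used_vr)
		msr |= DEMO_MSR_VEC;
	if (used_spe)
		msr |= DEMO_MSR_SPE;
	return msr;
}
```

The two functions agree whenever at most one of used_vr and used_spe
is set, which is why the problem is cosmetic in practice.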

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/signal_32.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -336,6 +336,8 @@ struct rt_sigframe {
 static int save_user_regs(struct pt_regs *regs, struct mcontext __user *frame,
 		int sigret)
 {
+	unsigned long msr = regs->msr;
+
 	/* Make sure floating point registers are stored in regs */
 	flush_fp_to_thread(current);
 
@@ -354,8 +356,7 @@ static int save_user_regs(struct pt_regs
 			return 1;
 		/* set MSR_VEC in the saved MSR value to indicate that
 		   frame->mc_vregs contains valid data */
-		if (__put_user(regs->msr | MSR_VEC, &frame->mc_gregs[PT_MSR]))
-			return 1;
+		msr |= MSR_VEC;
 	}
 	/* else assert((regs->msr & MSR_VEC) == 0) */
 
@@ -377,8 +378,7 @@ static int save_user_regs(struct pt_regs
 			return 1;
 		/* set MSR_SPE in the saved MSR value to indicate that
 		   frame->mc_vregs contains valid data */
-		if (__put_user(regs->msr | MSR_SPE, &frame->mc_gregs[PT_MSR]))
-			return 1;
+		msr |= MSR_SPE;
 	}
 	/* else assert((regs->msr & MSR_SPE) == 0) */
 
@@ -387,6 +387,8 @@ static int save_user_regs(struct pt_regs
 		return 1;
 #endif /* CONFIG_SPE */
 
+	if (__put_user(msr, &frame->mc_gregs[PT_MSR]))
+		return 1;
 	if (sigret) {
 		/* Set up the sigreturn trampoline: li r0,sigret; sc */
 		if (__put_user(0x38000000UL + sigret, &frame->tramp[0])


* [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
@ 2008-06-18  0:47 ` Michael Neuling
From: Michael Neuling @ 2008-06-18  0:47 UTC
  To: Paul Mackerras; +Cc: linuxppc-dev

We are going to change where the floating point registers are stored
in the thread_struct, so in preparation add some macros to access the
floating point registers.  Update all code to use these new macros.
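
As a rough user-space sketch of the idea (not kernel code): callers go
through TS_FPR(), TS_FPRSTART and TS_FPRSPACING instead of naming the
fpr[] array, so only the three macros need to change when the storage
moves.  The struct and the demo_* helpers below are hypothetical
stand-ins; the macro bodies match the ones this patch adds.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TS_FPR(i)	fpr[i]
#define TS_FPRSTART	fpr
#define TS_FPRSPACING	1

/* Hypothetical stand-in for the FP part of thread_struct. */
struct demo_fp_thread {
	double fpr[32];
};

/* Typed access, as in align.c or math-emu. */
double demo_get_fpr(const struct demo_fp_thread *t, int i)
{
	assert(i >= 0 && i < 32);
	return t->TS_FPR(i);
}

/* Whole-array access, as in start_thread(). */
void demo_clear_fprs(struct demo_fp_thread *t)
{
	memset(t->TS_FPRSTART, 0, sizeof(t->TS_FPRSTART));
}

/* Word-indexed access, as in the ptrace peek path: the index is scaled
 * by TS_FPRSPACING so it keeps working if each FPR later occupies more
 * than one 8-byte slot. */
uint64_t demo_peek_fpr(const struct demo_fp_thread *t, int index)
{
	uint64_t word;

	assert(index >= 0 && index < 32);
	memcpy(&word,
	       (const char *)t->TS_FPRSTART + 8 * TS_FPRSPACING * index,
	       sizeof(word));
	return word;
}
```

With TS_FPRSPACING at 1 this behaves exactly like the open-coded
fpr[index] accesses it replaces.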

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/align.c       |    6 ++--
 arch/powerpc/kernel/asm-offsets.c |    2 -
 arch/powerpc/kernel/process.c     |    5 ++-
 arch/powerpc/kernel/ptrace.c      |   14 +++++----
 arch/powerpc/kernel/ptrace32.c    |    9 ++++--
 arch/powerpc/kernel/signal_32.c   |    6 ++--
 arch/powerpc/kernel/signal_64.c   |   13 +++++---
 arch/powerpc/kernel/softemu8xx.c  |    4 +-
 arch/powerpc/math-emu/math.c      |   56 +++++++++++++++++++-------------------
 include/asm-powerpc/ppc_asm.h     |    5 ++-
 include/asm-powerpc/processor.h   |    7 ++++
 11 files changed, 71 insertions(+), 56 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/align.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/align.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/align.c
@@ -366,7 +366,7 @@ static int emulate_multiple(struct pt_re
 static int emulate_fp_pair(struct pt_regs *regs, unsigned char __user *addr,
 			   unsigned int reg, unsigned int flags)
 {
-	char *ptr = (char *) &current->thread.fpr[reg];
+	char *ptr = (char *) &current->thread.TS_FPR(reg);
 	int i, ret;
 
 	if (!(flags & F))
@@ -784,7 +784,7 @@ int fix_alignment(struct pt_regs *regs)
 				return -EFAULT;
 		}
 	} else if (flags & F) {
-		data.dd = current->thread.fpr[reg];
+		data.dd = current->thread.TS_FPR(reg);
 		if (flags & S) {
 			/* Single-precision FP store requires conversion... */
 #ifdef CONFIG_PPC_FPU
@@ -862,7 +862,7 @@ int fix_alignment(struct pt_regs *regs)
 		if (unlikely(ret))
 			return -EFAULT;
 	} else if (flags & F)
-		current->thread.fpr[reg] = data.dd;
+		current->thread.TS_FPR(reg) = data.dd;
 	else
 		regs->gpr[reg] = data.ll;
 
Index: linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
@@ -66,7 +66,7 @@ int main(void)
 	DEFINE(KSP_LIMIT, offsetof(struct thread_struct, ksp_limit));
 	DEFINE(PT_REGS, offsetof(struct thread_struct, regs));
 	DEFINE(THREAD_FPEXC_MODE, offsetof(struct thread_struct, fpexc_mode));
-	DEFINE(THREAD_FPR0, offsetof(struct thread_struct, fpr[0]));
+	DEFINE(THREAD_FPR0, offsetof(struct thread_struct, TS_FPR(0)));
 	DEFINE(THREAD_FPSCR, offsetof(struct thread_struct, fpscr));
 #ifdef CONFIG_ALTIVEC
 	DEFINE(THREAD_VR0, offsetof(struct thread_struct, vr[0]));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -110,7 +110,7 @@ int dump_task_fpu(struct task_struct *ts
 		return 0;
 	flush_fp_to_thread(current);
 
-	memcpy(fpregs, &tsk->thread.fpr[0], sizeof(*fpregs));
+	memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
 
 	return 1;
 }
@@ -689,7 +689,8 @@ void start_thread(struct pt_regs *regs, 
 #endif
 
 	discard_lazy_cpu_state();
-	memset(current->thread.fpr, 0, sizeof(current->thread.fpr));
+	memset(current->thread.TS_FPRSTART, 0,
+	       sizeof(current->thread.TS_FPRSTART));
 	current->thread.fpscr.val = 0;
 #ifdef CONFIG_ALTIVEC
 	memset(current->thread.vr, 0, sizeof(current->thread.vr));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -218,10 +218,10 @@ static int fpr_get(struct task_struct *t
 	flush_fp_to_thread(target);
 
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
-		     offsetof(struct thread_struct, fpr[32]));
+		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
-				   &target->thread.fpr, 0, -1);
+				   &target->thread.TS_FPRSTART, 0, -1);
 }
 
 static int fpr_set(struct task_struct *target, const struct user_regset *regset,
@@ -231,10 +231,10 @@ static int fpr_set(struct task_struct *t
 	flush_fp_to_thread(target);
 
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
-		     offsetof(struct thread_struct, fpr[32]));
+		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
-				  &target->thread.fpr, 0, -1);
+				  &target->thread.TS_FPRSTART, 0, -1);
 }
 
 
@@ -728,7 +728,8 @@ long arch_ptrace(struct task_struct *chi
 			tmp = ptrace_get_reg(child, (int) index);
 		} else {
 			flush_fp_to_thread(child);
-			tmp = ((unsigned long *)child->thread.fpr)[index - PT_FPR0];
+			tmp = ((unsigned long *)child->thread.TS_FPRSTART)
+				[TS_FPRSPACING * (index - PT_FPR0)];
 		}
 		ret = put_user(tmp,(unsigned long __user *) data);
 		break;
@@ -755,7 +756,8 @@ long arch_ptrace(struct task_struct *chi
 			ret = ptrace_put_reg(child, index, data);
 		} else {
 			flush_fp_to_thread(child);
-			((unsigned long *)child->thread.fpr)[index - PT_FPR0] = data;
+			((unsigned long *)child->thread.TS_FPRSTART)
+				[TS_FPRSPACING * (index - PT_FPR0)] = data;
 			ret = 0;
 		}
 		break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
@@ -122,7 +122,8 @@ long compat_arch_ptrace(struct task_stru
 			 * to be an array of unsigned int (32 bits) - the
 			 * index passed in is based on this assumption.
 			 */
-			tmp = ((unsigned int *)child->thread.fpr)[index - PT_FPR0];
+			tmp = ((unsigned int *)child->thread.TS_FPRSTART)
+				[TS_FPRSPACING * (index - PT_FPR0)];
 		}
 		ret = put_user((unsigned int)tmp, (u32 __user *)data);
 		break;
@@ -162,7 +163,8 @@ long compat_arch_ptrace(struct task_stru
 		CHECK_FULL_REGS(child->thread.regs);
 		if (numReg >= PT_FPR0) {
 			flush_fp_to_thread(child);
-			tmp = ((unsigned long int *)child->thread.fpr)[numReg - PT_FPR0];
+			tmp = ((unsigned long int *)child->thread.TS_FPRSTART)
+				[TS_FPRSPACING * (numReg - PT_FPR0)];
 		} else { /* register within PT_REGS struct */
 			tmp = ptrace_get_reg(child, numReg);
 		} 
@@ -217,7 +219,8 @@ long compat_arch_ptrace(struct task_stru
 			 * to be an array of unsigned int (32 bits) - the
 			 * index passed in is based on this assumption.
 			 */
-			((unsigned int *)child->thread.fpr)[index - PT_FPR0] = data;
+			((unsigned int *)child->thread.TS_FPRSTART)
+				[TS_FPRSPACING * (index - PT_FPR0)] = data;
 			ret = 0;
 		}
 		break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -343,7 +343,7 @@ static int save_user_regs(struct pt_regs
 
 	/* save general and floating-point registers */
 	if (save_general_regs(regs, frame) ||
-	    __copy_to_user(&frame->mc_fregs, current->thread.fpr,
+	    __copy_to_user(&frame->mc_fregs, current->thread.TS_FPRSTART,
 		    ELF_NFPREG * sizeof(double)))
 		return 1;
 
@@ -431,7 +431,7 @@ static long restore_user_regs(struct pt_
 
 	/*
 	 * Do this before updating the thread state in
-	 * current->thread.fpr/vr/evr.  That way, if we get preempted
+	 * current->thread.TS_FPR/vr/evr.  That way, if we get preempted
 	 * and another task grabs the FPU/Altivec/SPE, it won't be
 	 * tempted to save the current CPU state into the thread_struct
 	 * and corrupt what we are writing there.
@@ -441,7 +441,7 @@ static long restore_user_regs(struct pt_
 	/* force the process to reload the FP registers from
 	   current->thread when it next does FP instructions */
 	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
-	if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
+	if (__copy_from_user(current->thread.TS_FPRSTART, &sr->mc_fregs,
 			     sizeof(sr->mc_fregs)))
 		return 1;
 
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -116,7 +116,8 @@ static long setup_sigcontext(struct sigc
 	WARN_ON(!FULL_REGS(regs));
 	err |= __copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE);
 	err |= __put_user(msr, &sc->gp_regs[PT_MSR]);
-	err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
+	err |= __copy_to_user(&sc->fp_regs, &current->thread.TS_FPRSTART,
+			      FP_REGS_SIZE);
 	err |= __put_user(signr, &sc->signal);
 	err |= __put_user(handler, &sc->handler);
 	if (set != NULL)
@@ -168,7 +169,7 @@ static long restore_sigcontext(struct pt
 
 	/*
 	 * Do this before updating the thread state in
-	 * current->thread.fpr/vr.  That way, if we get preempted
+	 * current->thread.TS_FPR/vr.  That way, if we get preempted
 	 * and another task grabs the FPU/Altivec, it won't be
 	 * tempted to save the current CPU state into the thread_struct
 	 * and corrupt what we are writing there.
@@ -177,12 +178,14 @@ static long restore_sigcontext(struct pt
 
 	/*
 	 * Force reload of FP/VEC.
-	 * This has to be done before copying stuff into current->thread.fpr/vr
-	 * for the reasons explained in the previous comment.
+	 * This has to be done before copying stuff into
+	 * current->thread.TS_FPR/vr for the reasons explained in the
+	 * previous comment.
 	 */
 	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
 
-	err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
+	err |= __copy_from_user(&current->thread.TS_FPRSTART, &sc->fp_regs,
+				FP_REGS_SIZE);
 
 #ifdef CONFIG_ALTIVEC
 	err |= __get_user(v_regs, &sc->v_regs);
Index: linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/softemu8xx.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
@@ -124,7 +124,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
 	disp = instword & 0xffff;
 
 	ea = (u32 *)(regs->gpr[idxreg] + disp);
-	ip = (u32 *)&current->thread.fpr[flreg];
+	ip = (u32 *)&current->thread.TS_FPR(flreg);
 
 	switch ( inst )
 	{
@@ -168,7 +168,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
 		break;
 	case FMR:
 		/* assume this is a fp move -- Cort */
-		memcpy(ip, &current->thread.fpr[(instword>>11)&0x1f],
+		memcpy(ip, &current->thread.TS_FPR((instword>>11)&0x1f),
 		       sizeof(double));
 		break;
 	default:
Index: linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/math-emu/math.c
+++ linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
@@ -230,14 +230,14 @@ do_mathemu(struct pt_regs *regs)
 	case LFD:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		lfd(op0, op1, op2, op3);
 		break;
 	case LFDU:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		lfd(op0, op1, op2, op3);
 		regs->gpr[idx] = (unsigned long)op1;
@@ -245,21 +245,21 @@ do_mathemu(struct pt_regs *regs)
 	case STFD:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		stfd(op0, op1, op2, op3);
 		break;
 	case STFDU:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		stfd(op0, op1, op2, op3);
 		regs->gpr[idx] = (unsigned long)op1;
 		break;
 	case OP63:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		fmr(op0, op1, op2, op3);
 		break;
 	default:
@@ -356,28 +356,28 @@ do_mathemu(struct pt_regs *regs)
 
 	switch (type) {
 	case AB:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	case AC:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op2 = (void *)&current->thread.fpr[(insn >>  6) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op2 = (void *)&current->thread.TS_FPR((insn >>  6) & 0x1f);
 		break;
 
 	case ABC:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
-		op3 = (void *)&current->thread.fpr[(insn >>  6) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
+		op3 = (void *)&current->thread.TS_FPR((insn >>  6) & 0x1f);
 		break;
 
 	case D:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		break;
 
@@ -387,27 +387,27 @@ do_mathemu(struct pt_regs *regs)
 			goto illegal;
 
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)(regs->gpr[idx] + sdisp);
 		break;
 
 	case X:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		break;
 
 	case XA:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
 		break;
 
 	case XB:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	case XE:
 		idx = (insn >> 16) & 0x1f;
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		if (!idx) {
 			if (((insn >> 1) & 0x3ff) == STFIWX)
 				op1 = (void *)(regs->gpr[(insn >> 11) & 0x1f]);
@@ -421,7 +421,7 @@ do_mathemu(struct pt_regs *regs)
 
 	case XEU:
 		idx = (insn >> 16) & 0x1f;
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0)
 				+ regs->gpr[(insn >> 11) & 0x1f]);
 		break;
@@ -429,8 +429,8 @@ do_mathemu(struct pt_regs *regs)
 	case XCR:
 		op0 = (void *)&regs->ccr;
 		op1 = (void *)((insn >> 23) & 0x7);
-		op2 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op3 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op2 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op3 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	case XCRL:
@@ -450,7 +450,7 @@ do_mathemu(struct pt_regs *regs)
 
 	case XFLB:
 		op0 = (void *)((insn >> 17) & 0xff);
-		op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	default:
Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -6,6 +6,7 @@
 
 #include <linux/stringify.h>
 #include <asm/asm-compat.h>
+#include <asm/processor.h>
 
 #ifndef __ASSEMBLY__
 #error __FILE__ should only be used in assembler files
@@ -83,13 +84,13 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);				
 #define REST_8GPRS(n, base)	REST_4GPRS(n, base); REST_4GPRS(n+4, base)
 #define REST_10GPRS(n, base)	REST_8GPRS(n, base); REST_2GPRS(n+8, base)
 
-#define SAVE_FPR(n, base)	stfd	n,THREAD_FPR0+8*(n)(base)
+#define SAVE_FPR(n, base)	stfd	n,THREAD_FPR0+8*TS_FPRSPACING*(n)(base)
 #define SAVE_2FPRS(n, base)	SAVE_FPR(n, base); SAVE_FPR(n+1, base)
 #define SAVE_4FPRS(n, base)	SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
 #define SAVE_8FPRS(n, base)	SAVE_4FPRS(n, base); SAVE_4FPRS(n+4, base)
 #define SAVE_16FPRS(n, base)	SAVE_8FPRS(n, base); SAVE_8FPRS(n+8, base)
 #define SAVE_32FPRS(n, base)	SAVE_16FPRS(n, base); SAVE_16FPRS(n+16, base)
-#define REST_FPR(n, base)	lfd	n,THREAD_FPR0+8*(n)(base)
+#define REST_FPR(n, base)	lfd	n,THREAD_FPR0+8*TS_FPRSPACING*(n)(base)
 #define REST_2FPRS(n, base)	REST_FPR(n, base); REST_FPR(n+1, base)
 #define REST_4FPRS(n, base)	REST_2FPRS(n, base); REST_2FPRS(n+2, base)
 #define REST_8FPRS(n, base)	REST_4FPRS(n, base); REST_4FPRS(n+4, base)
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -136,6 +136,9 @@ typedef struct {
 	unsigned long seg;
 } mm_segment_t;
 
+#define TS_FPR(i) fpr[i]
+#define TS_FPRSTART fpr
+
 struct thread_struct {
 	unsigned long	ksp;		/* Kernel stack pointer */
 	unsigned long	ksp_limit;	/* if ksp <= ksp_limit stack overflow */
@@ -197,12 +200,13 @@ struct thread_struct {
 	.fpexc_mode = MSR_FE0 | MSR_FE1, \
 }
 #else
+#define	FPVSR_INIT_THREAD .fpr = {0}
 #define INIT_THREAD  { \
 	.ksp = INIT_SP, \
 	.ksp_limit = INIT_SP_LIMIT, \
 	.regs = (struct pt_regs *)INIT_SP - 1, /* XXX bogus, I think */ \
 	.fs = KERNEL_DS, \
-	.fpr = {0}, \
+	FPVSR_INIT_THREAD, \
 	.fpscr = { .val = 0, }, \
 	.fpexc_mode = 0, \
 }
@@ -289,4 +293,5 @@ static inline void prefetchw(const void 
 
 #endif /* __KERNEL__ */
 #endif /* __ASSEMBLY__ */
+#define TS_FPRSPACING 1
 #endif /* _ASM_POWERPC_PROCESSOR_H */


* [PATCH 3/9] powerpc: Move altivec_unavailable
@ 2008-06-18  0:47 ` Michael Neuling
From: Michael Neuling @ 2008-06-18  0:47 UTC
  To: Paul Mackerras; +Cc: linuxppc-dev

Move the altivec_unavailable code out of line to make room at 0xf40,
where the vsx_unavailable exception will be.  Only a branch to the
moved code is left at 0xf20.
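
A small arithmetic sketch of why only a branch can stay in place.  The
vectors at 0xf00, 0xf20 and 0xf40 are 0x20 bytes apart and PowerPC
instructions are 4 bytes each.  That the full exception prolog is
longer than one slot is an assumption here, inferred from the fact
that room has to be made; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { DEMO_INSN_BYTES = 4 };	/* fixed-width PowerPC instructions */

/* Number of instructions that fit between two vector addresses. */
unsigned int demo_slot_insns(uint32_t this_vec, uint32_t next_vec)
{
	assert(next_vec > this_vec);
	return (next_vec - this_vec) / DEMO_INSN_BYTES;
}
```

Both demo_slot_insns(0xf00, 0xf20) and demo_slot_insns(0xf20, 0xf40)
come to eight instructions, so anything longer would run into the next
vector.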

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/head_64.S |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -275,7 +275,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	. = 0xf00
 	b	performance_monitor_pSeries
 
-	STD_EXCEPTION_PSERIES(0xf20, altivec_unavailable)
+	. = 0xf20
+	b	altivec_unavailable_pSeries
 
 #ifdef CONFIG_CBE_RAS
 	HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
@@ -295,6 +296,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 
 	/* moved from 0xf00 */
 	STD_EXCEPTION_PSERIES(., performance_monitor)
+	STD_EXCEPTION_PSERIES(., altivec_unavailable)
 
 /*
  * An interrupt came in while soft-disabled; clear EE in SRR1,


* [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable
@ 2008-06-18  0:47 ` Michael Neuling
From: Michael Neuling @ 2008-06-18  0:47 UTC
  To: Paul Mackerras; +Cc: linuxppc-dev

Make load_up_fpu and load_up_altivec callable so they can be reused by
the VSX code.  They now return with blr, and their callers branch to
fast_exception_return themselves.
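
As a loose C analogy of the control-flow change (the real code is
assembly, and every name below is hypothetical): once the helpers
simply return instead of jumping to the exception exit, a handler can
call more than one of them before exiting.  The combined handler is
only a guess at how the VSX code might use them.

```c
#include <assert.h>

struct demo_cpu {
	int fp_loaded;
	int vec_loaded;
	int returned_to_user;
};

/* After the patch: helpers only load state, then return (blr). */
void demo_load_up_fpu(struct demo_cpu *c)
{
	assert(c != 0);
	c->fp_loaded = 1;
}

void demo_load_up_altivec(struct demo_cpu *c)
{
	assert(c != 0);
	c->vec_loaded = 1;
}

/* Stand-in for "b fast_exception_return", now done by the caller. */
void demo_fast_exception_return(struct demo_cpu *c)
{
	c->returned_to_user = 1;
}

/* Existing handler: call the helper, then exit. */
void demo_fp_unavailable(struct demo_cpu *c)
{
	demo_load_up_fpu(c);
	demo_fast_exception_return(c);
}

/* Hypothetical combined handler: possible only because the helpers
 * return to their caller. */
void demo_vsx_unavailable(struct demo_cpu *c)
{
	demo_load_up_fpu(c);
	demo_load_up_altivec(c);
	demo_fast_exception_return(c);
}
```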

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/fpu.S        |    2 +-
 arch/powerpc/kernel/head_32.S    |    6 ++++--
 arch/powerpc/kernel/head_64.S    |    8 +++++---
 arch/powerpc/kernel/head_booke.h |    6 ++++--
 4 files changed, 14 insertions(+), 8 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -85,7 +85,7 @@ _GLOBAL(load_up_fpu)
 #endif /* CONFIG_SMP */
 	/* restore registers and return */
 	/* we haven't used ctr or xer or lr */
-	b	fast_exception_return
+	blr
 
 /*
  * giveup_fpu(tsk)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_32.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
@@ -421,8 +421,10 @@ BEGIN_FTR_SECTION
 	b 	ProgramCheck
 END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE)
 	EXCEPTION_PROLOG
-	bne	load_up_fpu		/* if from user, just load it up */
-	addi	r3,r1,STACK_FRAME_OVERHEAD
+	beq	1f
+	bl	load_up_fpu		/* if from user, just load it up */
+	b	fast_exception_return
+1:	addi	r3,r1,STACK_FRAME_OVERHEAD
 	EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
 
 /* Decrementer */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -741,7 +741,8 @@ fp_unavailable_common:
 	ENABLE_INTS
 	bl	.kernel_fp_unavailable_exception
 	BUG_OPCODE
-1:	b	.load_up_fpu
+1:	bl	.load_up_fpu
+	b	fast_exception_return
 
 	.align	7
 	.globl altivec_unavailable_common
@@ -749,7 +750,8 @@ altivec_unavailable_common:
 	EXCEPTION_PROLOG_COMMON(0xf20, PACA_EXGEN)
 #ifdef CONFIG_ALTIVEC
 BEGIN_FTR_SECTION
-	bne	.load_up_altivec	/* if from user, just load it up */
+	bnel	.load_up_altivec
+	b	fast_exception_return
 END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 #endif
 	bl	.save_nvgprs
@@ -829,7 +831,7 @@ _STATIC(load_up_altivec)
 	std	r4,0(r3)
 #endif /* CONFIG_SMP */
 	/* restore registers and return */
-	b	fast_exception_return
+	blr
 #endif /* CONFIG_ALTIVEC */
 
 /*
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_booke.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
@@ -363,8 +363,10 @@ label:
 #define FP_UNAVAILABLE_EXCEPTION					      \
 	START_EXCEPTION(FloatingPointUnavailable)			      \
 	NORMAL_EXCEPTION_PROLOG;					      \
-	bne	load_up_fpu;		/* if from user, just load it up */   \
-	addi	r3,r1,STACK_FRAME_OVERHEAD;				      \
+	beq	1f;							      \
+	bl	load_up_fpu;		/* if from user, just load it up */   \
+	b	fast_exception_return;					      \
+1:	addi	r3,r1,STACK_FRAME_OVERHEAD;				      \
 	EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
 
 #endif /* __HEAD_BOOKE_H__ */


* [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
@ 2008-06-18  0:47 ` Michael Neuling
  2008-06-18 19:35   ` Kumar Gala
  2008-06-19  4:22   ` Kumar Gala
From: Michael Neuling @ 2008-06-18  0:47 UTC
  To: Paul Mackerras; +Cc: linuxppc-dev

The layout of the new VSR registers and how they overlap on top of the
legacy FPR and VR registers is:

                   VSR doubleword 0               VSR doubleword 1
          ----------------------------------------------------------------
  VSR[0]  |             FPR[0]            |                              |
          ----------------------------------------------------------------
  VSR[1]  |             FPR[1]            |                              |
          ----------------------------------------------------------------
          |              ...              |                              |
          |              ...              |                              |
          ----------------------------------------------------------------
  VSR[30] |             FPR[30]           |                              |
          ----------------------------------------------------------------
  VSR[31] |             FPR[31]           |                              |
          ----------------------------------------------------------------
  VSR[32] |                             VR[0]                            |
          ----------------------------------------------------------------
  VSR[33] |                             VR[1]                            |
          ----------------------------------------------------------------
          |                              ...                             |
          |                              ...                             |
          ----------------------------------------------------------------
  VSR[62] |                             VR[30]                           |
          ----------------------------------------------------------------
  VSR[63] |                             VR[31]                           |
          ----------------------------------------------------------------

VSX has 64 128-bit registers.  The first 32 regs overlap with the FP
registers and hence extend them with an additional 64 bits.  The
second 32 regs overlap with the VMX registers.
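
One way to picture the first half of this layout in C is a union in
which each FPR is doubleword 0 of a 16-byte slot.  This is a sketch
under assumptions: the type, field and macro names are illustrative
(only the fpvsr.vsr[] spelling is taken from the asm-offsets.c hunk
below), and VSR[32..63] are left out because they live in the existing
VR storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 128-bit VSR, modelled as two doublewords. */
struct demo_vsr {
	uint64_t dw[2];
};

/* VSR[0..31]: FPR[i] is doubleword 0 of VSR[i]. */
union demo_fpvsr {
	struct {
		double fpr;	/* VSR doubleword 0, the legacy FPR */
		double vsrlow;	/* VSR doubleword 1, new with VSX */
	} fp[32];
	struct demo_vsr vsr[32];
};

struct demo_vsx_thread {
	union demo_fpvsr fpvsr;
};

/* With this layout an FPR accessor skips over the low doubleword, so
 * consecutive FPRs are two 8-byte words apart. */
#define DEMO_VSX_FPR(i)		fpvsr.fp[i].fpr
#define DEMO_VSX_FPRSPACING	2

size_t demo_fpr_offset(const struct demo_vsx_thread *t, int i)
{
	assert(i >= 0 && i < 32);
	return (size_t)((const char *)&t->DEMO_VSX_FPR(i) - (const char *)t);
}

size_t demo_vsr_offset(const struct demo_vsx_thread *t, int i)
{
	assert(i >= 0 && i < 32);
	return (size_t)((const char *)&t->fpvsr.vsr[i] - (const char *)t);
}
```

In this sketch FPR[i] and VSR[i] start at the same offset, 16 * i
bytes into the union.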

This patch introduces the thread_struct changes required to reflect
this register layout.  Ptrace and signals code is updated so that the
floating point registers are correctly accessed from the thread_struct
when CONFIG_VSX is enabled.
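
The ptrace side can be sketched as a gather and scatter between the
strided thread storage and the contiguous 33-doubleword image (32 FPRs
followed by FPSCR) that the regset interface exposes.  This is a
user-space illustration with hypothetical names, and FPSCR is
simplified to a plain 64-bit word.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NFPR 32

/* Hypothetical strided storage: each FPR shares a 16-byte slot. */
struct demo_strided_thread {
	struct {
		double fpr;
		double vsrlow;
	} fp[DEMO_NFPR];
	uint64_t fpscr;
};

/* Gather: strided thread storage -> contiguous regset image. */
void demo_fpr_gather(const struct demo_strided_thread *t,
		     double buf[DEMO_NFPR + 1])
{
	int i;

	assert(t != 0 && buf != 0);
	for (i = 0; i < DEMO_NFPR; i++)
		buf[i] = t->fp[i].fpr;
	memcpy(&buf[DEMO_NFPR], &t->fpscr, sizeof(double));
}

/* Scatter: contiguous regset image -> strided thread storage.  The
 * low doublewords are left untouched. */
void demo_fpr_scatter(struct demo_strided_thread *t,
		      const double buf[DEMO_NFPR + 1])
{
	int i;

	assert(t != 0 && buf != 0);
	for (i = 0; i < DEMO_NFPR; i++)
		t->fp[i].fpr = buf[i];
	memcpy(&t->fpscr, &buf[DEMO_NFPR], sizeof(double));
}
```

Going through a local buffer keeps the user-visible layout the same
whether or not the FPRs are contiguous in the thread storage.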

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/asm-offsets.c      |    4 ++
 arch/powerpc/kernel/ptrace.c           |   28 +++++++++++++++
 arch/powerpc/kernel/signal_32.c        |   59 +++++++++++++++++++++++++--------
 arch/powerpc/kernel/signal_64.c        |   36 +++++++++++++++++---
 arch/powerpc/platforms/Kconfig.cputype |   16 ++++++++
 include/asm-powerpc/processor.h        |   31 ++++++++++++++++-
 6 files changed, 155 insertions(+), 19 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
@@ -74,6 +74,10 @@ int main(void)
 	DEFINE(THREAD_VSCR, offsetof(struct thread_struct, vscr));
 	DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	DEFINE(THREAD_VSR0, offsetof(struct thread_struct, fpvsr.vsr[0]));
+	DEFINE(THREAD_USED_VSR, offsetof(struct thread_struct, used_vsr));
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_PPC64
 	DEFINE(KSP_VSID, offsetof(struct thread_struct, ksp_vsid));
 #else /* CONFIG_PPC64 */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -215,26 +215,54 @@ static int fpr_get(struct task_struct *t
 		   unsigned int pos, unsigned int count,
 		   void *kbuf, void __user *ubuf)
 {
+#ifdef CONFIG_VSX
+	double buf[33];
+	int i;
+#endif
 	flush_fp_to_thread(target);
 
+#ifdef CONFIG_VSX
+	/* copy to local buffer then write that out */
+	for (i = 0; i < 32 ; i++)
+		buf[i] = target->thread.TS_FPR(i);
+	memcpy(&buf[32], &target->thread.fpscr, sizeof(double));
+	return user_regset_copyout(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+
+#else
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
 		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
 				   &target->thread.TS_FPRSTART, 0, -1);
+#endif
 }
 
 static int fpr_set(struct task_struct *target, const struct user_regset *regset,
 		   unsigned int pos, unsigned int count,
 		   const void *kbuf, const void __user *ubuf)
 {
+#ifdef CONFIG_VSX
+	double buf[33];
+	int i;
+#endif
 	flush_fp_to_thread(target);
 
+#ifdef CONFIG_VSX
+	/* copy in to local buffer then write that to the thread_struct */
+	i = user_regset_copyin(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+	if (i)
+		return i;
+	for (i = 0; i < 32 ; i++)
+		target->thread.TS_FPR(i) = buf[i];
+	memcpy(&target->thread.fpscr, &buf[32], sizeof(double));
+	return 0;
+#else
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
 		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
 				  &target->thread.TS_FPRSTART, 0, -1);
+#endif
 }
 
 
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -337,14 +337,16 @@ static int save_user_regs(struct pt_regs
 		int sigret)
 {
 	unsigned long msr = regs->msr;
+#ifdef CONFIG_VSX
+	double buf[ELF_NFPREG];
+	int i;
+#endif
 
 	/* Make sure floating point registers are stored in regs */
 	flush_fp_to_thread(current);
 
-	/* save general and floating-point registers */
-	if (save_general_regs(regs, frame) ||
-	    __copy_to_user(&frame->mc_fregs, current->thread.TS_FPRSTART,
-		    ELF_NFPREG * sizeof(double)))
+	/* save general registers */
+	if (save_general_regs(regs, frame))
 		return 1;
 
 #ifdef CONFIG_ALTIVEC
@@ -368,7 +370,21 @@ static int save_user_regs(struct pt_regs
 	if (__put_user(current->thread.vrsave, (u32 __user *)&frame->mc_vregs[32]))
 		return 1;
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	/* copy FPRs and FPSCR to local buffer then write that to the frame */
+	flush_fp_to_thread(current);
+	for (i = 0; i < 32 ; i++)
+		buf[i] = current->thread.TS_FPR(i);
+	memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+	if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
+		return 1;
 
+#else
+	/* save floating-point registers */
+	if (__copy_to_user(&frame->mc_fregs, current->thread.TS_FPRSTART,
+		    ELF_NFPREG * sizeof(double)))
+		return 1;
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	/* save spe registers */
 	if (current->thread.used_spe) {
@@ -411,6 +427,10 @@ static long restore_user_regs(struct pt_
 	long err;
 	unsigned int save_r2 = 0;
 	unsigned long msr;
+#ifdef CONFIG_VSX
+	double buf[ELF_NFPREG];
+	int i;
+#endif
 
 	/*
 	 * restore general registers but not including MSR or SOFTE. Also
@@ -438,16 +458,11 @@ static long restore_user_regs(struct pt_
 	 */
 	discard_lazy_cpu_state();
 
-	/* force the process to reload the FP registers from
-	   current->thread when it next does FP instructions */
-	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
-	if (__copy_from_user(current->thread.TS_FPRSTART, &sr->mc_fregs,
-			     sizeof(sr->mc_fregs)))
-		return 1;
-
 #ifdef CONFIG_ALTIVEC
-	/* force the process to reload the altivec registers from
-	   current->thread when it next does altivec instructions */
+	/*
+	 * Force the process to reload the altivec registers from
+	 * current->thread when it next does altivec instructions
+	 */
 	regs->msr &= ~MSR_VEC;
 	if (msr & MSR_VEC) {
 		/* restore altivec registers from the stack */
@@ -462,6 +477,24 @@ static long restore_user_regs(struct pt_
 		return 1;
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+	if (__copy_from_user(buf, &sr->mc_fregs,sizeof(sr->mc_fregs)))
+		return 1;
+	for (i = 0; i < 32 ; i++)
+		current->thread.TS_FPR(i) = buf[i];
+	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+
+#else
+	if (__copy_from_user(current->thread.TS_FPRSTART, &sr->mc_fregs,
+			     sizeof(sr->mc_fregs)))
+		return 1;
+#endif /* CONFIG_VSX */
+	/*
+	 * force the process to reload the FP registers from
+	 * current->thread when it next does FP instructions
+	 */
+	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
+
 #ifdef CONFIG_SPE
 	/* force the process to reload the spe registers from
 	   current->thread when it next does spe instructions */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -89,6 +89,10 @@ static long setup_sigcontext(struct sigc
 #endif
 	unsigned long msr = regs->msr;
 	long err = 0;
+#ifdef CONFIG_VSX
+	double buf[ELF_NFPREG];
+	int i;
+#endif
 
 	flush_fp_to_thread(current);
 
@@ -112,12 +116,22 @@ static long setup_sigcontext(struct sigc
 #else /* CONFIG_ALTIVEC */
 	err |= __put_user(0, &sc->v_regs);
 #endif /* CONFIG_ALTIVEC */
+	flush_fp_to_thread(current);
+#ifdef CONFIG_VSX
+	/* Copy FP to local buffer then write that out */
+	for (i = 0; i < 32 ; i++)
+		buf[i] = current->thread.TS_FPR(i);
+	memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+	err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+#else /* CONFIG_VSX */
+	/* copy fpr regs and fpscr */
+	err |= __copy_to_user(&sc->fp_regs, &current->thread.TS_FPR(0),
+			      FP_REGS_SIZE);
+#endif /* CONFIG_VSX */
 	err |= __put_user(&sc->gp_regs, &sc->regs);
 	WARN_ON(!FULL_REGS(regs));
 	err |= __copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE);
 	err |= __put_user(msr, &sc->gp_regs[PT_MSR]);
-	err |= __copy_to_user(&sc->fp_regs, &current->thread.TS_FPRSTART,
-			      FP_REGS_SIZE);
 	err |= __put_user(signr, &sc->signal);
 	err |= __put_user(handler, &sc->handler);
 	if (set != NULL)
@@ -136,6 +150,9 @@ static long restore_sigcontext(struct pt
 #ifdef CONFIG_ALTIVEC
 	elf_vrreg_t __user *v_regs;
 #endif
+#ifdef CONFIG_VSX
+	double buf[ELF_NFPREG];
+#endif
 	unsigned long err = 0;
 	unsigned long save_r13 = 0;
 	elf_greg_t *gregs = (elf_greg_t *)regs;
@@ -184,9 +201,6 @@ static long restore_sigcontext(struct pt
 	 */
 	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
 
-	err |= __copy_from_user(&current->thread.TS_FPRSTART, &sc->fp_regs,
-				FP_REGS_SIZE);
-
 #ifdef CONFIG_ALTIVEC
 	err |= __get_user(v_regs, &sc->v_regs);
 	if (err)
@@ -205,7 +219,19 @@ static long restore_sigcontext(struct pt
 	else
 		current->thread.vrsave = 0;
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	/* restore floating point */
+	err |= __copy_from_user(buf, &sc->fp_regs, FP_REGS_SIZE);
+	if (err)
+		return err;
+	for (i = 0; i < 32 ; i++)
+		current->thread.TS_FPR(i) = buf[i];
+	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
 
+#else
+	err |= __copy_from_user(&current->thread.TS_FPRSTART, &sc->fp_regs,
+				FP_REGS_SIZE);
+#endif
 	return err;
 }
 
Index: linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/platforms/Kconfig.cputype
+++ linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
@@ -155,6 +155,22 @@ config ALTIVEC
 
 	  If in doubt, say Y here.
 
+config VSX
+	bool "VSX Support"
+	depends on POWER4 && ALTIVEC && PPC_FPU
+	---help---
+
+	  This option enables kernel support for the Vector Scalar extensions
+	  to the PowerPC processor. The kernel currently supports saving and
+	  restoring VSX registers, and turning on the 'VSX enable' bit so user
+	  processes can execute VSX instructions.
+
+	  This option is only useful if you have a processor that supports
+	  VSX (P7 and above), but does not have any effect on a non-VSX
+	  CPU (it does, however, add code to the kernel).
+
+	  If in doubt, say Y here.
+
 config SPE
 	bool "SPE Support"
 	depends on E200 || E500
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -78,6 +78,7 @@ extern long kernel_thread(int (*fn)(void
 /* Lazy FPU handling on uni-processor */
 extern struct task_struct *last_task_used_math;
 extern struct task_struct *last_task_used_altivec;
+extern struct task_struct *last_task_used_vsx;
 extern struct task_struct *last_task_used_spe;
 
 #ifdef CONFIG_PPC32
@@ -136,8 +137,13 @@ typedef struct {
 	unsigned long seg;
 } mm_segment_t;
 
+#ifdef CONFIG_VSX
+#define TS_FPR(i) fpvsr.fp[i].fpr
+#define TS_FPRSTART fpvsr.fp
+#else
 #define TS_FPR(i) fpr[i]
 #define TS_FPRSTART fpr
+#endif
 
 struct thread_struct {
 	unsigned long	ksp;		/* Kernel stack pointer */
@@ -155,8 +161,19 @@ struct thread_struct {
 	unsigned long	dbcr0;		/* debug control register values */
 	unsigned long	dbcr1;
 #endif
+#ifdef CONFIG_VSX
+	/* First 32 VSX registers (overlap with fpr[32]) */
+	union {
+		struct {
+			double fpr;
+			double vsrlow;
+		} fp[32];
+		vector128	vsr[32];
+	} fpvsr __attribute__((aligned(16)));
+#else
 	double		fpr[32];	/* Complete floating point set */
-	struct {			/* fpr ... fpscr must be contiguous */
+#endif
+	struct {
 
 		unsigned int pad;
 		unsigned int val;	/* Floating point status */
@@ -176,6 +193,10 @@ struct thread_struct {
 	unsigned long	vrsave;
 	int		used_vr;	/* set if process has used altivec */
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	/* VSR status */
+	int		used_vsr;	/* set if process has used VSX */
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	unsigned long	evr[32];	/* upper 32-bits of SPE regs */
 	u64		acc;		/* Accumulator */
@@ -200,7 +221,11 @@ struct thread_struct {
 	.fpexc_mode = MSR_FE0 | MSR_FE1, \
 }
 #else
+#ifdef CONFIG_VSX
+#define	FPVSR_INIT_THREAD .fpvsr = { .vsr = 0, }
+#else
 #define	FPVSR_INIT_THREAD .fpr = {0}
+#endif
 #define INIT_THREAD  { \
 	.ksp = INIT_SP, \
 	.ksp_limit = INIT_SP_LIMIT, \
@@ -293,5 +318,9 @@ static inline void prefetchw(const void 
 
 #endif /* __KERNEL__ */
 #endif /* __ASSEMBLY__ */
+#ifdef CONFIG_VSX
+#define TS_FPRSPACING 2
+#else
 #define TS_FPRSPACING 1
+#endif
 #endif /* _ASM_POWERPC_PROCESSOR_H */


* [PATCH 6/9] powerpc: Add VSX CPU feature
  2008-06-18  0:47 [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                   ` (6 preceding siblings ...)
  2008-06-18  0:47 ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
@ 2008-06-18  0:47 ` Michael Neuling
  2008-06-18 16:28   ` Joel Schopp
  2008-06-19  6:51   ` David Woodhouse
  2008-06-18  0:47 ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-18  0:47 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Add a VSX CPU feature.  Also add code to detect if VSX is available
from the device tree.
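
A minimal sketch of how such a table entry could be evaluated is shown
below.  It assumes a feature is enabled when the named device tree
property is at least the entry's minimum value, which is how the new
{"ibm,vmx", 2, ...} entry reads next to the existing {"ibm,vmx", 1, ...}
one.  The struct, the function apply_feature_property and the ALTIVEC
constants are placeholders for illustration; only the two VSX values
are taken from this patch.

```c
#include <stdint.h>
#include <string.h>

/* Values from this patch. */
#define CPU_FTR_VSX			0x0010000000000000ULL
#define PPC_FEATURE_HAS_VSX		0x00000080u
/* Placeholder values, for illustration only. */
#define CPU_FTR_ALTIVEC_SKETCH		0x0000000000000008ULL
#define PPC_FEATURE_ALTIVEC_SKETCH	0x10000000u

struct feature_property_sketch {
	const char *name;
	uint32_t min_value;
	uint64_t cpu_feature;
	uint32_t cpu_user_ftr;
};

static const struct feature_property_sketch feature_properties[] = {
	{ "ibm,vmx", 1, CPU_FTR_ALTIVEC_SKETCH, PPC_FEATURE_ALTIVEC_SKETCH },
	{ "ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX },
};

/* OR in every feature whose property name matches and whose minimum is met. */
static void apply_feature_property(const char *name, uint32_t value,
				   uint64_t *cpu_features,
				   uint32_t *cpu_user_features)
{
	size_t i;
	size_t n = sizeof(feature_properties) / sizeof(feature_properties[0]);

	for (i = 0; i < n; i++) {
		const struct feature_property_sketch *fp = &feature_properties[i];

		if (strcmp(name, fp->name) == 0 && value >= fp->min_value) {
			*cpu_features |= fp->cpu_feature;
			*cpu_user_features |= fp->cpu_user_ftr;
		}
	}
}
```

Under that assumption an "ibm,vmx" value of 1 would advertise only
AltiVec, while a value of 2 would advertise both AltiVec and VSX.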

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/prom.c     |    3 +++
 include/asm-powerpc/cputable.h |   13 +++++++++++++
 2 files changed, 16 insertions(+)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/prom.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
@@ -609,6 +609,9 @@ static struct feature_property {
 	{"altivec", 0, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
 	{"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	{"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_PPC64
 	{"ibm,dfp", 1, 0, PPC_FEATURE_HAS_DFP},
 	{"ibm,purr", 1, CPU_FTR_PURR, 0},
Index: linux-2.6-ozlabs/include/asm-powerpc/cputable.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/cputable.h
+++ linux-2.6-ozlabs/include/asm-powerpc/cputable.h
@@ -27,6 +27,7 @@
 #define PPC_FEATURE_HAS_DFP		0x00000400
 #define PPC_FEATURE_POWER6_EXT		0x00000200
 #define PPC_FEATURE_ARCH_2_06		0x00000100
+#define PPC_FEATURE_HAS_VSX		0x00000080
 
 #define PPC_FEATURE_TRUE_LE		0x00000002
 #define PPC_FEATURE_PPC_LE		0x00000001
@@ -181,6 +182,7 @@ extern void do_feature_fixups(unsigned l
 #define CPU_FTR_DSCR			LONG_ASM_CONST(0x0002000000000000)
 #define CPU_FTR_1T_SEGMENT		LONG_ASM_CONST(0x0004000000000000)
 #define CPU_FTR_NO_SLBIE_B		LONG_ASM_CONST(0x0008000000000000)
+#define CPU_FTR_VSX			LONG_ASM_CONST(0x0010000000000000)
 
 #ifndef __ASSEMBLY__
 
@@ -199,6 +201,17 @@ extern void do_feature_fixups(unsigned l
 #define PPC_FEATURE_HAS_ALTIVEC_COMP    0
 #endif
 
+/* We only set the VSX features if the kernel was compiled with VSX
+ * support
+ */
+#ifdef CONFIG_VSX
+#define CPU_FTR_VSX_COMP	CPU_FTR_VSX
+#define PPC_FEATURE_HAS_VSX_COMP PPC_FEATURE_HAS_VSX
+#else
+#define CPU_FTR_VSX_COMP	0
+#define PPC_FEATURE_HAS_VSX_COMP    0
+#endif
+
 /* We only set the spe features if the kernel was compiled with spe
  * support
  */


* [PATCH 7/9] powerpc: Add VSX assembler code macros
  2008-06-18  0:47 [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                   ` (5 preceding siblings ...)
  2008-06-18  0:47 ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
@ 2008-06-18  0:47 ` Michael Neuling
  2008-06-18  0:47 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-18  0:47 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

This adds macros for the VSX load/store instructions, as most binutils
versions are not going to support them for a while.

Also add VSX register save/restore macros and vsr[0-63] register definitions.
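
To make the encoding concrete, the XX1-form macros in this patch can be
mirrored in plain C.  The helper names vsx_xx1, stxvd2x and lxvd2x are
hypothetical; the bit layout and the two base opcodes are copied from
the macros below.  The low five bits of the VSR number go in the usual
register field and the sixth bit goes in the least significant bit of
the instruction word.

```c
#include <stdint.h>

/* Operand fields of an XX1-form instruction, as in the VSX_XX1 macro. */
static uint32_t vsx_xx1(uint32_t xs, uint32_t ra, uint32_t rb)
{
	return ((xs & 0x1f) << 21) | (ra << 16) | (rb << 11) | (xs >> 5);
}

/* Instruction word for stxvd2x xs,ra,rb (store a 128-bit VSR). */
static uint32_t stxvd2x(uint32_t xs, uint32_t ra, uint32_t rb)
{
	return 0x7c000798u | vsx_xx1(xs, ra, rb);
}

/* Instruction word for lxvd2x xs,ra,rb (load a 128-bit VSR). */
static uint32_t lxvd2x(uint32_t xs, uint32_t ra, uint32_t rb)
{
	return 0x7c000698u | vsx_xx1(xs, ra, rb);
}
```

For example, stxvd2x(32, 0, 0) differs from stxvd2x(0, 0, 0) only in
the lowest bit, which is what lets the same macro address VSR 32-63.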

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 include/asm-powerpc/ppc_asm.h |  127 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 127 insertions(+)

Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -74,6 +74,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);				
 				REST_10GPRS(22, base)
 #endif
 
+/*
+ * Define what the VSX XX1 form instructions will look like, then add
+ * the 128 bit load store instructions based on that.
+ */
+#define VSX_XX1(xs, ra, rb)	(((xs) & 0x1f) << 21 | ((ra) << 16) |  \
+				 ((rb) << 11) | (((xs) >> 5)))
+
+#define STXVD2X(xs, ra, rb)	.long (0x7c000798 | VSX_XX1((xs), (ra), (rb)))
+#define LXVD2X(xs, ra, rb)	.long (0x7c000698 | VSX_XX1((xs), (ra), (rb)))
 
 #define SAVE_2GPRS(n, base)	SAVE_GPR(n, base); SAVE_GPR(n+1, base)
 #define SAVE_4GPRS(n, base)	SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
@@ -110,6 +119,57 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);				
 #define REST_16VRS(n,b,base)	REST_8VRS(n,b,base); REST_8VRS(n+8,b,base)
 #define REST_32VRS(n,b,base)	REST_16VRS(n,b,base); REST_16VRS(n+16,b,base)
 
+/* Save the lower 32 VSRs in the thread VSR region */
+#define SAVE_VSR(n,b,base)	li b,THREAD_VSR0+(16*(n));  STXVD2X(n,b,base)
+#define SAVE_2VSRS(n,b,base)	SAVE_VSR(n,b,base); SAVE_VSR(n+1,b,base)
+#define SAVE_4VSRS(n,b,base)	SAVE_2VSRS(n,b,base); SAVE_2VSRS(n+2,b,base)
+#define SAVE_8VSRS(n,b,base)	SAVE_4VSRS(n,b,base); SAVE_4VSRS(n+4,b,base)
+#define SAVE_16VSRS(n,b,base)	SAVE_8VSRS(n,b,base); SAVE_8VSRS(n+8,b,base)
+#define SAVE_32VSRS(n,b,base)	SAVE_16VSRS(n,b,base); SAVE_16VSRS(n+16,b,base)
+#define REST_VSR(n,b,base)	li b,THREAD_VSR0+(16*(n)); LXVD2X(n,b,base)
+#define REST_2VSRS(n,b,base)	REST_VSR(n,b,base); REST_VSR(n+1,b,base)
+#define REST_4VSRS(n,b,base)	REST_2VSRS(n,b,base); REST_2VSRS(n+2,b,base)
+#define REST_8VSRS(n,b,base)	REST_4VSRS(n,b,base); REST_4VSRS(n+4,b,base)
+#define REST_16VSRS(n,b,base)	REST_8VSRS(n,b,base); REST_8VSRS(n+8,b,base)
+#define REST_32VSRS(n,b,base)	REST_16VSRS(n,b,base); REST_16VSRS(n+16,b,base)
+/* Save the upper 32 VSRs (32-63) in the thread VR region (0-31) */
+#define SAVE_VSRU(n,b,base)	li b,THREAD_VR0+(16*(n));  STXVD2X(n+32,b,base)
+#define SAVE_2VSRSU(n,b,base)	SAVE_VSRU(n,b,base); SAVE_VSRU(n+1,b,base)
+#define SAVE_4VSRSU(n,b,base)	SAVE_2VSRSU(n,b,base); SAVE_2VSRSU(n+2,b,base)
+#define SAVE_8VSRSU(n,b,base)	SAVE_4VSRSU(n,b,base); SAVE_4VSRSU(n+4,b,base)
+#define SAVE_16VSRSU(n,b,base)	SAVE_8VSRSU(n,b,base); SAVE_8VSRSU(n+8,b,base)
+#define SAVE_32VSRSU(n,b,base)	SAVE_16VSRSU(n,b,base); SAVE_16VSRSU(n+16,b,base)
+#define REST_VSRU(n,b,base)	li b,THREAD_VR0+(16*(n)); LXVD2X(n+32,b,base)
+#define REST_2VSRSU(n,b,base)	REST_VSRU(n,b,base); REST_VSRU(n+1,b,base)
+#define REST_4VSRSU(n,b,base)	REST_2VSRSU(n,b,base); REST_2VSRSU(n+2,b,base)
+#define REST_8VSRSU(n,b,base)	REST_4VSRSU(n,b,base); REST_4VSRSU(n+4,b,base)
+#define REST_16VSRSU(n,b,base)	REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
+#define REST_32VSRSU(n,b,base)	REST_16VSRSU(n,b,base); REST_16VSRSU(n+16,b,base)
+
+#ifdef CONFIG_VSX
+#define REST_32FPVSRS(n,c,base)						\
+BEGIN_FTR_SECTION							\
+	b	2f;							\
+END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
+	REST_32FPRS(n,base);						\
+	b	3f;							\
+2:	REST_32VSRS(n,c,base);						\
+3:
+
+#define SAVE_32FPVSRS(n,c,base)						\
+BEGIN_FTR_SECTION							\
+	b	2f;							\
+END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
+	SAVE_32FPRS(n,base);						\
+	b	3f;							\
+2:	SAVE_32VSRS(n,c,base);						\
+3:
+
+#else
+#define REST_32FPVSRS(n,b,base)	REST_32FPRS(n, base)
+#define SAVE_32FPVSRS(n,b,base)	SAVE_32FPRS(n, base)
+#endif
+
 #define SAVE_EVR(n,s,base)	evmergehi s,s,n; stw s,THREAD_EVR0+4*(n)(base)
 #define SAVE_2EVRS(n,s,base)	SAVE_EVR(n,s,base); SAVE_EVR(n+1,s,base)
 #define SAVE_4EVRS(n,s,base)	SAVE_2EVRS(n,s,base); SAVE_2EVRS(n+2,s,base)
@@ -534,6 +594,73 @@ END_FTR_SECTION_IFCLR(CPU_FTR_601)
 #define	vr30	30
 #define	vr31	31
 
+/* VSX Registers (VSRs) */
+
+#define	vsr0	0
+#define	vsr1	1
+#define	vsr2	2
+#define	vsr3	3
+#define	vsr4	4
+#define	vsr5	5
+#define	vsr6	6
+#define	vsr7	7
+#define	vsr8	8
+#define	vsr9	9
+#define	vsr10	10
+#define	vsr11	11
+#define	vsr12	12
+#define	vsr13	13
+#define	vsr14	14
+#define	vsr15	15
+#define	vsr16	16
+#define	vsr17	17
+#define	vsr18	18
+#define	vsr19	19
+#define	vsr20	20
+#define	vsr21	21
+#define	vsr22	22
+#define	vsr23	23
+#define	vsr24	24
+#define	vsr25	25
+#define	vsr26	26
+#define	vsr27	27
+#define	vsr28	28
+#define	vsr29	29
+#define	vsr30	30
+#define	vsr31	31
+#define	vsr32	32
+#define	vsr33	33
+#define	vsr34	34
+#define	vsr35	35
+#define	vsr36	36
+#define	vsr37	37
+#define	vsr38	38
+#define	vsr39	39
+#define	vsr40	40
+#define	vsr41	41
+#define	vsr42	42
+#define	vsr43	43
+#define	vsr44	44
+#define	vsr45	45
+#define	vsr46	46
+#define	vsr47	47
+#define	vsr48	48
+#define	vsr49	49
+#define	vsr50	50
+#define	vsr51	51
+#define	vsr52	52
+#define	vsr53	53
+#define	vsr54	54
+#define	vsr55	55
+#define	vsr56	56
+#define	vsr57	57
+#define	vsr58	58
+#define	vsr59	59
+#define	vsr60	60
+#define	vsr61	61
+#define	vsr62	62
+#define	vsr63	63
+
 /* SPE Registers (EVPRs) */
 
 #define	evr0	0


* [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support
  2008-06-18  0:47 [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                   ` (4 preceding siblings ...)
  2008-06-18  0:47 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
@ 2008-06-18  0:47 ` Michael Neuling
  2008-06-18  0:47 ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-18  0:47 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

This patch extends the floating point save and restore code to use the
VSX load/stores when VSX is available.  This will make FP context
save/restore marginally slower for FP-only code when VSX is available,
as it has to load/store 128 bits rather than just 64 bits.

Code that mixes FP, VMX and VSX will see consistent architected state.

The signals interface is extended to enable access to VSR 0-31
doubleword 1 after discussions with tool chain maintainers.  Backward
compatibility is maintained.

The ptrace interface is also extended to allow access to VSR 0-31 full
registers.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/entry_64.S   |    5 +
 arch/powerpc/kernel/fpu.S        |   16 ++++-
 arch/powerpc/kernel/head_64.S    |   65 +++++++++++++++++++++++
 arch/powerpc/kernel/misc_64.S    |   33 +++++++++++
 arch/powerpc/kernel/ppc32.h      |    1 
 arch/powerpc/kernel/ppc_ksyms.c  |    3 +
 arch/powerpc/kernel/process.c    |  108 ++++++++++++++++++++++++++++++++++++++-
 arch/powerpc/kernel/ptrace.c     |   70 +++++++++++++++++++++++++
 arch/powerpc/kernel/signal_32.c  |   33 +++++++++++
 arch/powerpc/kernel/signal_64.c  |   31 ++++++++++-
 arch/powerpc/kernel/traps.c      |   29 ++++++++++
 include/asm-powerpc/elf.h        |    6 +-
 include/asm-powerpc/ptrace.h     |   12 ++++
 include/asm-powerpc/reg.h        |    2 
 include/asm-powerpc/sigcontext.h |   37 +++++++++++++
 include/asm-powerpc/system.h     |    9 +++
 include/linux/elf.h              |    1 
 17 files changed, 453 insertions(+), 8 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/entry_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
@@ -353,6 +353,11 @@ _GLOBAL(_switch)
 	mflr	r20		/* Return to switch caller */
 	mfmsr	r22
 	li	r0, MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r0,r0,MSR_VSX@h	/* Disable VSX */
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_ALTIVEC
 BEGIN_FTR_SECTION
 	oris	r0,r0,MSR_VEC@h	/* Disable altivec */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -34,6 +34,11 @@
 _GLOBAL(load_up_fpu)
 	mfmsr	r5
 	ori	r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
 	SYNC
 	MTMSRD(r5)			/* enable use of fpu now */
 	isync
@@ -50,7 +55,7 @@ _GLOBAL(load_up_fpu)
 	beq	1f
 	toreal(r4)
 	addi	r4,r4,THREAD		/* want last_task_used_math->thread */
-	SAVE_32FPRS(0, r4)
+	SAVE_32FPVSRS(0, r5, r4)
 	mffs	fr0
 	stfd	fr0,THREAD_FPSCR(r4)
 	PPC_LL	r5,PT_REGS(r4)
@@ -77,7 +82,7 @@ _GLOBAL(load_up_fpu)
 #endif
 	lfd	fr0,THREAD_FPSCR(r5)
 	MTFSF_L(fr0)
-	REST_32FPRS(0, r5)
+	REST_32FPVSRS(0, r4, r5)
 #ifndef CONFIG_SMP
 	subi	r4,r5,THREAD
 	fromreal(r4)
@@ -96,6 +101,11 @@ _GLOBAL(load_up_fpu)
 _GLOBAL(giveup_fpu)
 	mfmsr	r5
 	ori	r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
 	SYNC_601
 	ISYNC_601
 	MTMSRD(r5)			/* enable use of fpu now */
@@ -106,7 +116,7 @@ _GLOBAL(giveup_fpu)
 	addi	r3,r3,THREAD	        /* want THREAD of task */
 	PPC_LL	r5,PT_REGS(r3)
 	PPC_LCMPI	0,r5,0
-	SAVE_32FPRS(0, r3)
+	SAVE_32FPVSRS(0, r4 ,r3)
 	mffs	fr0
 	stfd	fr0,THREAD_FPSCR(r3)
 	beq	1f
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -278,6 +278,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	. = 0xf20
 	b	altivec_unavailable_pSeries
 
+	. = 0xf40
+	b	vsx_unavailable_pSeries
+
 #ifdef CONFIG_CBE_RAS
 	HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
 #endif /* CONFIG_CBE_RAS */
@@ -297,6 +300,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	/* moved from 0xf00 */
 	STD_EXCEPTION_PSERIES(., performance_monitor)
 	STD_EXCEPTION_PSERIES(., altivec_unavailable)
+	STD_EXCEPTION_PSERIES(., vsx_unavailable)
 
 /*
  * An interrupt came in while soft-disabled; clear EE in SRR1,
@@ -834,6 +838,67 @@ _STATIC(load_up_altivec)
 	blr
 #endif /* CONFIG_ALTIVEC */
 
+	.align	7
+	.globl vsx_unavailable_common
+vsx_unavailable_common:
+	EXCEPTION_PROLOG_COMMON(0xf40, PACA_EXGEN)
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	bne	.load_up_vsx
+1:
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
+	bl	.save_nvgprs
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	ENABLE_INTS
+	bl	.vsx_unavailable_exception
+	b	.ret_from_except
+
+#ifdef CONFIG_VSX
+/*
+ * load_up_vsx(unused, unused, tsk)
+ * Disable VSX for the task which had it previously,
+ * and save its vector registers in its thread_struct.
+ * Reuse the fp and vsx saves, but first check to see if they have
+ * been saved already.
+ * On entry: r13 == 'current' && last_task_used_vsx != 'current'
+ */
+_STATIC(load_up_vsx)
+/* Load FP and VMX registers if they haven't been done yet */
+	andi.	r5,r12,MSR_FP
+	beql+	load_up_fpu		/* skip if already loaded */
+	andis.	r5,r12,MSR_VEC@h
+	beql+	load_up_altivec		/* skip if already loaded */
+
+#ifndef CONFIG_SMP
+	ld	r3,last_task_used_vsx@got(r2)
+	ld	r4,0(r3)
+	cmpdi	0,r4,0
+	beq	1f
+	/* Disable VSX for last_task_used_vsx */
+	addi	r4,r4,THREAD
+	ld	r5,PT_REGS(r4)
+	ld	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+	lis	r6,MSR_VSX@h
+	andc	r6,r4,r6
+	std	r6,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#endif /* CONFIG_SMP */
+	ld	r4,PACACURRENT(r13)
+	addi	r4,r4,THREAD		/* Get THREAD */
+	li	r6,1
+	stw	r6,THREAD_USED_VSR(r4) /* ... also set thread used vsr */
+	/* enable use of VSX after return */
+	oris	r12,r12,MSR_VSX@h
+	std	r12,_MSR(r1)
+#ifndef CONFIG_SMP
+	/* Update last_task_used_vsx to 'current' */
+	ld	r4,PACACURRENT(r13)
+	std	r4,0(r3)
+#endif /* CONFIG_SMP */
+	b	fast_exception_return
+#endif /* CONFIG_VSX */
+
 /*
  * Hash table stuff
  */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/misc_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
@@ -506,6 +506,39 @@ _GLOBAL(giveup_altivec)
 
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+/*
+ * giveup_vsx(tsk)
+ * Disable VSX for the task given as the argument,
+ * and save the vector registers in its thread_struct.
+ * Enables the VSX for use in the kernel on return.
+ */
+_GLOBAL(giveup_vsx)
+	mfmsr	r5
+	oris	r5,r5,MSR_VSX@h
+	mtmsrd	r5			/* enable use of VSX now */
+	isync
+
+	cmpdi	0,r3,0
+	beqlr-				/* if no previous owner, done */
+	addi	r3,r3,THREAD		/* want THREAD of task */
+	ld	r5,PT_REGS(r3)
+	cmpdi	0,r5,0
+	beq	1f
+	ld	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+	lis	r3,MSR_VSX@h
+	andc	r4,r4,r3		/* disable VSX for previous task */
+	std	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#ifndef CONFIG_SMP
+	li	r5,0
+	ld	r4,last_task_used_vsx@got(r2)
+	std	r5,0(r4)
+#endif /* CONFIG_SMP */
+	blr
+
+#endif /* CONFIG_VSX */
+
 /* kexec_wait(phys_cpu)
  *
  * wait for the flag to change, indicating this kernel is going away but
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc32.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
@@ -120,6 +120,7 @@ struct mcontext32 {
 	elf_fpregset_t		mc_fregs;
 	unsigned int		mc_pad[2];
 	elf_vrregset_t32	mc_vregs __attribute__((__aligned__(16)));
+	elf_vsrreghalf_t32      mc_vsregs __attribute__((__aligned__(16)));
 };
 
 struct ucontext32 { 
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc_ksyms.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
@@ -102,6 +102,9 @@ EXPORT_SYMBOL(giveup_fpu);
 #ifdef CONFIG_ALTIVEC
 EXPORT_SYMBOL(giveup_altivec);
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+EXPORT_SYMBOL(giveup_vsx);
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 EXPORT_SYMBOL(giveup_spe);
 #endif /* CONFIG_SPE */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -53,6 +53,7 @@ extern unsigned long _get_SP(void);
 #ifndef CONFIG_SMP
 struct task_struct *last_task_used_math = NULL;
 struct task_struct *last_task_used_altivec = NULL;
+struct task_struct *last_task_used_vsx = NULL;
 struct task_struct *last_task_used_spe = NULL;
 #endif
 
@@ -106,11 +107,23 @@ EXPORT_SYMBOL(enable_kernel_fp);
 
 int dump_task_fpu(struct task_struct *tsk, elf_fpregset_t *fpregs)
 {
+#ifdef CONFIG_VSX
+	int i;
+	elf_fpreg_t *reg;
+#endif
+
 	if (!tsk->thread.regs)
 		return 0;
 	flush_fp_to_thread(current);
 
+#ifdef CONFIG_VSX
+	reg = (elf_fpreg_t *)fpregs;
+	for (i = 0; i < ELF_NFPREG - 1; i++, reg++)
+		*reg = tsk->thread.TS_FPR(i);
+	memcpy(reg, &tsk->thread.fpscr, sizeof(elf_fpreg_t));
+#else
 	memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
+#endif
 
 	return 1;
 }
@@ -149,7 +162,7 @@ void flush_altivec_to_thread(struct task
 	}
 }
 
-int dump_task_altivec(struct task_struct *tsk, elf_vrregset_t *vrregs)
+int dump_task_altivec(struct task_struct *tsk, elf_vrreg_t *vrregs)
 {
 	/* ELF_NVRREG includes the VSCR and VRSAVE which we need to save
 	 * separately, see below */
@@ -179,6 +192,79 @@ int dump_task_altivec(struct task_struct
 }
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+#if 0
+/* not currently used, but some crazy RAID module might want to later */
+void enable_kernel_vsx(void)
+{
+	WARN_ON(preemptible());
+
+#ifdef CONFIG_SMP
+	if (current->thread.regs && (current->thread.regs->msr & MSR_VSX))
+		giveup_vsx(current);
+	else
+		giveup_vsx(NULL);	/* just enable vsx for kernel - force */
+#else
+	giveup_vsx(last_task_used_vsx);
+#endif /* CONFIG_SMP */
+}
+EXPORT_SYMBOL(enable_kernel_vsx);
+#endif
+
+void flush_vsx_to_thread(struct task_struct *tsk)
+{
+	if (tsk->thread.regs) {
+		preempt_disable();
+		if (tsk->thread.regs->msr & MSR_VSX) {
+#ifdef CONFIG_SMP
+			BUG_ON(tsk != current);
+#endif
+			giveup_vsx(tsk);
+		}
+		preempt_enable();
+	}
+}
+
+/*
+ * This dumps the full 128 bits of the first 32 VSX registers.  This
+ * needs to be called with dump_task_fpu and dump_task_altivec to get
+ * all the VSX state.
+ */
+int dump_task_vsx(struct task_struct *tsk, elf_vrreg_t *vrregs)
+{
+	/* Grab only the first half */
+	const int nregs = 32;
+	elf_vrreg_t *reg;
+
+	if (tsk == current)
+		flush_vsx_to_thread(tsk);
+
+	reg = (elf_vrreg_t *)vrregs;
+
+	/* copy the first 32 vsr registers */
+	memcpy(reg, &tsk->thread.fpvsr.vsr[0], nregs * sizeof(*reg));
+
+	return 1;
+}
+#endif /* CONFIG_VSX */
+
+int dump_task_vector(struct task_struct *tsk, elf_vrregset_t *vrregs)
+{
+	int rc = 0;
+	elf_vrreg_t *regs = (elf_vrreg_t *)vrregs;
+#ifdef CONFIG_ALTIVEC
+	rc = dump_task_altivec(tsk, regs);
+	if (rc)
+		return rc;
+	regs += ELF_NVRREG;
+#endif
+
+#ifdef CONFIG_VSX
+	rc = dump_task_vsx(tsk, regs);
+#endif
+	return rc;
+}
+
 #ifdef CONFIG_SPE
 
 void enable_kernel_spe(void)
@@ -233,6 +319,10 @@ void discard_lazy_cpu_state(void)
 	if (last_task_used_altivec == current)
 		last_task_used_altivec = NULL;
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	if (last_task_used_vsx == current)
+		last_task_used_vsx = NULL;
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	if (last_task_used_spe == current)
 		last_task_used_spe = NULL;
@@ -297,6 +387,10 @@ struct task_struct *__switch_to(struct t
 	if (prev->thread.regs && (prev->thread.regs->msr & MSR_VEC))
 		giveup_altivec(prev);
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	if (prev->thread.regs && (prev->thread.regs->msr & MSR_VSX))
+		giveup_vsx(prev);
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	/*
 	 * If the previous thread used spe in the last quantum
@@ -317,6 +411,10 @@ struct task_struct *__switch_to(struct t
 	if (new->thread.regs && last_task_used_altivec == new)
 		new->thread.regs->msr |= MSR_VEC;
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	if (new->thread.regs && last_task_used_vsx == new)
+		new->thread.regs->msr |= MSR_VSX;
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	/* Avoid the trap.  On smp this this never happens since
 	 * we don't set last_task_used_spe
@@ -417,6 +515,8 @@ static struct regbit {
 	{MSR_EE,	"EE"},
 	{MSR_PR,	"PR"},
 	{MSR_FP,	"FP"},
+	{MSR_VEC,	"VEC"},
+	{MSR_VSX,	"VSX"},
 	{MSR_ME,	"ME"},
 	{MSR_IR,	"IR"},
 	{MSR_DR,	"DR"},
@@ -534,6 +634,7 @@ void prepare_to_copy(struct task_struct 
 {
 	flush_fp_to_thread(current);
 	flush_altivec_to_thread(current);
+	flush_vsx_to_thread(current);
 	flush_spe_to_thread(current);
 }
 
@@ -689,8 +790,13 @@ void start_thread(struct pt_regs *regs, 
 #endif
 
 	discard_lazy_cpu_state();
+#ifdef CONFIG_VSX
+	memset(current->thread.fpvsr.vsr, 0, sizeof(current->thread.fpvsr.vsr));
+	current->thread.used_vsr = 0;
+#else
 	memset(current->thread.TS_FPRSTART, 0,
 	       sizeof(current->thread.TS_FPRSTART));
+#endif /* CONFIG_VSX */
 	current->thread.fpscr.val = 0;
 #ifdef CONFIG_ALTIVEC
 	memset(current->thread.vr, 0, sizeof(current->thread.vr));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -351,6 +351,51 @@ static int vr_set(struct task_struct *ta
 }
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+/*
+ * Currently, to set and get all the VSX state, you need to call
+ * the FP and VMX calls as well.  This only gets/sets the lower 32
+ * 128-bit VSX registers.
+ */
+
+static int vsr_active(struct task_struct *target,
+		      const struct user_regset *regset)
+{
+	flush_vsx_to_thread(target);
+	return target->thread.used_vsr ? regset->n : 0;
+}
+
+static int vsr_get(struct task_struct *target, const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   void *kbuf, void __user *ubuf)
+{
+	int ret;
+
+	flush_vsx_to_thread(target);
+
+	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+				  &target->thread.fpvsr.vsr, 0,
+				  32 * sizeof(vector128));
+
+	return ret;
+}
+
+static int vsr_set(struct task_struct *target, const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   const void *kbuf, const void __user *ubuf)
+{
+	int ret;
+
+	flush_vsx_to_thread(target);
+
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+				 &target->thread.fpvsr.vsr, 0,
+				 32 * sizeof(vector128));
+
+	return ret;
+}
+#endif /* CONFIG_VSX */
+
 #ifdef CONFIG_SPE
 
 /*
@@ -427,6 +472,9 @@ enum powerpc_regset {
 #ifdef CONFIG_ALTIVEC
 	REGSET_VMX,
 #endif
+#ifdef CONFIG_VSX
+	REGSET_VSX,
+#endif
 #ifdef CONFIG_SPE
 	REGSET_SPE,
 #endif
@@ -450,6 +498,13 @@ static const struct user_regset native_r
 		.active = vr_active, .get = vr_get, .set = vr_set
 	},
 #endif
+#ifdef CONFIG_VSX
+	[REGSET_VSX] = {
+		.core_note_type = NT_PPC_VSX, .n = 34,
+		.size = sizeof(vector128), .align = sizeof(vector128),
+		.active = vsr_active, .get = vsr_get, .set = vsr_set
+	},
+#endif
 #ifdef CONFIG_SPE
 	[REGSET_SPE] = {
 		.n = 35,
@@ -850,6 +905,21 @@ long arch_ptrace(struct task_struct *chi
 						 sizeof(u32)),
 					     (const void __user *) data);
 #endif
+#ifdef CONFIG_VSX
+	case PTRACE_GETVSRREGS:
+		return copy_regset_to_user(child, &user_ppc_native_view,
+					   REGSET_VSX,
+					   0, (32 * sizeof(vector128) +
+					       sizeof(u32)),
+					   (void __user *) data);
+
+	case PTRACE_SETVSRREGS:
+		return copy_regset_from_user(child, &user_ppc_native_view,
+					     REGSET_VSX,
+					     0, (32 * sizeof(vector128) +
+						 sizeof(u32)),
+					     (const void __user *) data);
+#endif
 #ifdef CONFIG_SPE
 	case PTRACE_GETEVRREGS:
 		/* Get the child spe register state. */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -379,6 +379,21 @@ static int save_user_regs(struct pt_regs
 	if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
 		return 1;
 
+	/*
+	 * Copy the low doubleword of VSR 0-31 from thread_struct to a
+	 * local buffer, then write that to userspace.  Also set MSR_VSX
+	 * in the saved MSR value to indicate that frame->mc_vsregs
+	 * contains valid data.
+	 */
+	if (current->thread.used_vsr) {
+		flush_vsx_to_thread(current);
+		for (i = 0; i < 32 ; i++)
+			buf[i] = current->thread.fpvsr.fp[i].vsrlow;
+		if (__copy_to_user(&frame->mc_vsregs, buf,
+				   ELF_NVSRHALFREG  * sizeof(double)))
+			return 1;
+		msr |= MSR_VSX;
+	}
 #else
 	/* save floating-point registers */
 	if (__copy_to_user(&frame->mc_fregs, current->thread.TS_FPRSTART,
@@ -484,6 +499,24 @@ static long restore_user_regs(struct pt_
 		current->thread.TS_FPR(i) = buf[i];
 	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
 
+	/*
+	 * Force the process to reload the VSX registers from
+	 * current->thread when it next executes a VSX instruction.
+	 */
+	regs->msr &= ~MSR_VSX;
+	if (msr & MSR_VSX) {
+		/*
+		 * Restore VSX registers from the stack to a local
+		 * buffer, then write this out to the thread_struct
+		 */
+		if (__copy_from_user(buf, &sr->mc_vsregs,
+				     sizeof(sr->mc_vsregs)))
+			return 1;
+		for (i = 0; i < 32 ; i++)
+			current->thread.fpvsr.fp[i].vsrlow = buf[i];
+	} else if (current->thread.used_vsr)
+		for (i = 0; i < 32 ; i++)
+			current->thread.fpvsr.fp[i].vsrlow = 0;
 #else
 	if (__copy_from_user(current->thread.TS_FPRSTART, &sr->mc_fregs,
 			     sizeof(sr->mc_fregs)))
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -123,6 +123,22 @@ static long setup_sigcontext(struct sigc
 		buf[i] = current->thread.TS_FPR(i);
 	memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
 	err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+	/*
+	 * Copy VSX low doubleword to local buffer for formatting,
+	 * then out to userspace.  Update v_regs to point after the
+	 * VMX data.
+	 */
+	if (current->thread.used_vsr) {
+		flush_vsx_to_thread(current);
+		v_regs += ELF_NVRREG;
+		for (i = 0; i < 32 ; i++)
+			buf[i] = current->thread.fpvsr.fp[i].vsrlow;
+		err |= __copy_to_user(v_regs, buf, 32 * sizeof(double));
+		/* set MSR_VSX in the MSR value in the frame to
+		 * indicate that sc->vs_reg contains valid data.
+		 */
+		msr |= MSR_VSX;
+	}
 #else /* CONFIG_VSX */
 	/* copy fpr regs and fpscr */
 	err |= __copy_to_user(&sc->fp_regs, &current->thread.TS_FPR(0),
@@ -199,7 +215,7 @@ static long restore_sigcontext(struct pt
 	 * current->thread.TS_FPR/vr for the reasons explained in the
 	 * previous comment.
 	 */
-	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
+	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC | MSR_VSX);
 
 #ifdef CONFIG_ALTIVEC
 	err |= __get_user(v_regs, &sc->v_regs);
@@ -228,6 +244,19 @@ static long restore_sigcontext(struct pt
 		current->thread.TS_FPR(i) = buf[i];
 	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
 
+	/*
+	 * Get additional VSX data. Update v_regs to point after the
+	 * VMX data.  Copy VSX low doubleword from userspace to local
+	 * buffer for formatting, then into the thread_struct.
+	 */
+	v_regs += ELF_NVRREG;
+	if ((msr & MSR_VSX) != 0)
+		err |= __copy_from_user(buf, v_regs, 32 * sizeof(double));
+	else
+		memset(buf, 0, 32 * sizeof(double));
+
+	for (i = 0; i < 32 ; i++)
+		current->thread.fpvsr.fp[i].vsrlow = buf[i];
 #else
 	err |= __copy_from_user(&current->thread.TS_FPRSTART, &sc->fp_regs,
 				FP_REGS_SIZE);
Index: linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/traps.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
@@ -967,6 +967,20 @@ void altivec_unavailable_exception(struc
 	die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT);
 }
 
+void vsx_unavailable_exception(struct pt_regs *regs)
+{
+	if (user_mode(regs)) {
+		/* A user program has executed a VSX instruction,
+		   but this kernel doesn't support VSX. */
+		_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+		return;
+	}
+
+	printk(KERN_EMERG "Unrecoverable VSX Unavailable Exception "
+			"%lx at %lx\n", regs->trap, regs->nip);
+	die("Unrecoverable VSX Unavailable Exception", regs, SIGABRT);
+}
+
 void performance_monitor_exception(struct pt_regs *regs)
 {
 	perf_irq(regs);
@@ -1091,6 +1105,21 @@ void altivec_assist_exception(struct pt_
 }
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+void vsx_assist_exception(struct pt_regs *regs)
+{
+	if (!user_mode(regs)) {
+		printk(KERN_EMERG "VSX assist exception in kernel mode"
+		       " at %lx\n", regs->nip);
+		die("Kernel VSX assist exception", regs, SIGILL);
+	}
+
+	flush_vsx_to_thread(current);
+	printk(KERN_INFO "VSX assist not supported at %lx\n", regs->nip);
+	_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+}
+#endif /* CONFIG_VSX */
+
 #ifdef CONFIG_FSL_BOOKE
 void CacheLockingException(struct pt_regs *regs, unsigned long address,
 			   unsigned long error_code)
Index: linux-2.6-ozlabs/include/asm-powerpc/elf.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/elf.h
+++ linux-2.6-ozlabs/include/asm-powerpc/elf.h
@@ -109,6 +109,7 @@ typedef elf_gregset_t32 compat_elf_gregs
 #ifdef __powerpc64__
 # define ELF_NVRREG32	33	/* includes vscr & vrsave stuffed together */
 # define ELF_NVRREG	34	/* includes vscr & vrsave in split vectors */
+# define ELF_NVSRHALFREG 32	/* Half the vsx registers */
 # define ELF_GREG_TYPE	elf_greg_t64
 #else
 # define ELF_NEVRREG	34	/* includes acc (as 2) */
@@ -158,6 +159,7 @@ typedef __vector128 elf_vrreg_t;
 typedef elf_vrreg_t elf_vrregset_t[ELF_NVRREG];
 #ifdef __powerpc64__
 typedef elf_vrreg_t elf_vrregset_t32[ELF_NVRREG32];
+typedef elf_fpreg_t elf_vsrreghalf_t32[ELF_NVSRHALFREG];
 #endif
 
 #ifdef __KERNEL__
@@ -219,8 +221,8 @@ extern int dump_task_fpu(struct task_str
 typedef elf_vrregset_t elf_fpxregset_t;
 
 #ifdef CONFIG_ALTIVEC
-extern int dump_task_altivec(struct task_struct *, elf_vrregset_t *vrregs);
-#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_altivec(tsk, regs)
+extern int dump_task_vector(struct task_struct *, elf_vrregset_t *vrregs);
+#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_vector(tsk, regs)
 #define ELF_CORE_XFPREG_TYPE NT_PPC_VMX
 #endif
 
Index: linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ptrace.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
@@ -223,6 +223,14 @@ extern void user_disable_single_step(str
 #define PT_VRSAVE_32 (PT_VR0 + 33*4)
 #endif
 
+/*
+ * Only store the first 32 VSRs here. The second 32 VSRs are in VR0-31.
+ */
+#define PT_VSR0 150	/* each VSR reg occupies 2 slots in 64-bit */
+#define PT_VSR31 (PT_VSR0 + 2*31)
+#ifdef __KERNEL__
+#define PT_VSR0_32 300 	/* each VSR reg occupies 4 slots in 32-bit */
+#endif
 #endif /* __powerpc64__ */
 
 /*
@@ -245,6 +253,10 @@ extern void user_disable_single_step(str
 #define PTRACE_GETEVRREGS	20
 #define PTRACE_SETEVRREGS	21
 
+/* Get the first 32 128bit VSX registers */
+#define PTRACE_GETVSRREGS	27
+#define PTRACE_SETVSRREGS	28
+
 /*
  * Get or set a debug register. The first 16 are DABR registers and the
  * second 16 are IABR registers.
Index: linux-2.6-ozlabs/include/asm-powerpc/reg.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/reg.h
+++ linux-2.6-ozlabs/include/asm-powerpc/reg.h
@@ -30,6 +30,7 @@
 #define MSR_ISF_LG	61              /* Interrupt 64b mode valid on 630 */
 #define MSR_HV_LG 	60              /* Hypervisor state */
 #define MSR_VEC_LG	25	        /* Enable AltiVec */
+#define MSR_VSX_LG	23		/* Enable VSX */
 #define MSR_POW_LG	18		/* Enable Power Management */
 #define MSR_WE_LG	18		/* Wait State Enable */
 #define MSR_TGPR_LG	17		/* TLB Update registers in use */
@@ -71,6 +72,7 @@
 #endif
 
 #define MSR_VEC		__MASK(MSR_VEC_LG)	/* Enable AltiVec */
+#define MSR_VSX		__MASK(MSR_VSX_LG)	/* Enable VSX */
 #define MSR_POW		__MASK(MSR_POW_LG)	/* Enable Power Management */
 #define MSR_WE		__MASK(MSR_WE_LG)	/* Wait State Enable */
 #define MSR_TGPR	__MASK(MSR_TGPR_LG)	/* TLB Update registers in use */
Index: linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/sigcontext.h
+++ linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
@@ -43,9 +43,44 @@ struct sigcontext {
  * it must be copied via a vector register to/from storage) or as a word.
  * The entry with index 33 contains the vrsave as the first word (offset 0)
  * within the quadword.
+ *
+ * Part of the VSX data is stored here also by extending vmx_reserve
+ * by an additional 32 double words.  Architecturally the layout of
+ * the VSR registers and how they overlap on top of the legacy FPR and
+ * VR registers is shown below:
+ *
+ *                    VSR doubleword 0               VSR doubleword 1
+ *           ----------------------------------------------------------------
+ *   VSR[0]  |             FPR[0]            |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[1]  |             FPR[1]            |                              |
+ *           ----------------------------------------------------------------
+ *           |              ...              |                              |
+ *           |              ...              |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[30] |             FPR[30]           |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[31] |             FPR[31]           |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[32] |                             VR[0]                            |
+ *           ----------------------------------------------------------------
+ *   VSR[33] |                             VR[1]                            |
+ *           ----------------------------------------------------------------
+ *           |                              ...                             |
+ *           |                              ...                             |
+ *           ----------------------------------------------------------------
+ *   VSR[62] |                             VR[30]                           |
+ *           ----------------------------------------------------------------
+ *   VSR[63] |                             VR[31]                           |
+ *           ----------------------------------------------------------------
+ *
+ * FPR/VSR 0-31 doubleword 0 is stored in fp_regs, and VMX/VSR 32-63
+ * is stored at the start of vmx_reserve.  vmx_reserve is extended for
+ * backwards compatibility to store VSR 0-31 doubleword 1 after the VMX
+ * registers and vscr/vrsave.
  */
 	elf_vrreg_t	__user *v_regs;
-	long		vmx_reserve[ELF_NVRREG+ELF_NVRREG+1];
+	long		vmx_reserve[ELF_NVRREG+ELF_NVRREG+32+1];
 #endif
 };
 
Index: linux-2.6-ozlabs/include/asm-powerpc/system.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/system.h
+++ linux-2.6-ozlabs/include/asm-powerpc/system.h
@@ -132,6 +132,7 @@ extern void enable_kernel_altivec(void);
 extern void giveup_altivec(struct task_struct *);
 extern void load_up_altivec(struct task_struct *);
 extern int emulate_altivec(struct pt_regs *);
+extern void giveup_vsx(struct task_struct *);
 extern void enable_kernel_spe(void);
 extern void giveup_spe(struct task_struct *);
 extern void load_up_spe(struct task_struct *);
@@ -155,6 +156,14 @@ static inline void flush_altivec_to_thre
 }
 #endif
 
+#ifdef CONFIG_VSX
+extern void flush_vsx_to_thread(struct task_struct *);
+#else
+static inline void flush_vsx_to_thread(struct task_struct *t)
+{
+}
+#endif
+
 #ifdef CONFIG_SPE
 extern void flush_spe_to_thread(struct task_struct *);
 #else
Index: linux-2.6-ozlabs/include/linux/elf.h
===================================================================
--- linux-2.6-ozlabs.orig/include/linux/elf.h
+++ linux-2.6-ozlabs/include/linux/elf.h
@@ -358,6 +358,7 @@ typedef struct elf64_shdr {
 #define NT_PRXFPREG     0x46e62b7f      /* copied from gdb5.1/include/elf/common.h */
 #define NT_PPC_VMX	0x100		/* PowerPC Altivec/VMX registers */
 #define NT_PPC_SPE	0x101		/* PowerPC SPE/EVR registers */
+#define NT_PPC_VSX	0x102		/* PowerPC VSX registers */
 #define NT_386_TLS	0x200		/* i386 TLS slots (struct user_desc) */
 
 


* [PATCH 9/9] powerpc: Add CONFIG_VSX config option
  2008-06-18  0:47 [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                   ` (7 preceding siblings ...)
  2008-06-18  0:47 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
@ 2008-06-18  0:47 ` Michael Neuling
  2008-06-18 13:05 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Kumar Gala
  2008-06-20  4:13 ` Michael Neuling
  10 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-18  0:47 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Add CONFIG_VSX config build option.  Must compile with POWER4, FPU and ALTIVEC.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/platforms/Kconfig.cputype |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Index: linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/platforms/Kconfig.cputype
+++ linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
@@ -171,6 +171,22 @@ config VSX
 
 	  If in doubt, say Y here.
 
+config VSX
+	bool "VSX Support"
+	depends on POWER4 && ALTIVEC && PPC_FPU
+	---help---
+
+	  This option enables kernel support for the Vector Scalar extensions
+	  to the PowerPC processor. The kernel currently supports saving and
+	  restoring VSX registers, and turning on the 'VSX enable' bit so user
+	  processes can execute VSX instructions.
+
+	  This option is only useful if you have a processor that supports
+	  VSX (P7 and above), but does not have any effect on non-VSX
+	  CPUs (it does, however, add code to the kernel).
+
+	  If in doubt, say Y here.
+
 config SPE
 	bool "SPE Support"
 	depends on E200 || E500


* Re: [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX.
  2008-06-18  0:47 [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                   ` (8 preceding siblings ...)
  2008-06-18  0:47 ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
@ 2008-06-18 13:05 ` Kumar Gala
  2008-06-18 23:54   ` Michael Neuling
  2008-06-20  4:13 ` Michael Neuling
  10 siblings, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-18 13:05 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras


On Jun 17, 2008, at 7:47 PM, Michael Neuling wrote:

> The following set of patches adds Vector Scalar Extentions (VSX)
> support for POWER7.  Includes context switch, ptrace and signals  
> support.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
> This series is on top of the POWER7 cputable entry patch.
>
> Paulus: please consider for your 2.6.27 tree.

A bit better explanation of what VSX is would be useful.  It's not clear
to me exactly how these instructions behave such that we have to touch
all this code.

- k


* Re: [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
  2008-06-18  0:47 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
@ 2008-06-18 14:53   ` Kumar Gala
  2008-06-18 23:55     ` Michael Neuling
  0 siblings, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-18 14:53 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras


On Jun 17, 2008, at 7:47 PM, Michael Neuling wrote:

> If we set the SPE MSR bit in save_user_regs we can blow away the VEC
> bit.  This will never happen in reality, but it looks bad.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
>
> arch/powerpc/kernel/signal_32.c |   10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)

probably worth commenting on why this will never happen.

- k


* Re: [PATCH 6/9] powerpc: Add VSX CPU feature
  2008-06-18  0:47 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
@ 2008-06-18 16:28   ` Joel Schopp
  2008-06-19  6:51   ` David Woodhouse
  1 sibling, 0 replies; 106+ messages in thread
From: Joel Schopp @ 2008-06-18 16:28 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras

A couple of these lines originated with me.

Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>

Michael Neuling wrote:
> Add a VSX CPU feature.  Also add code to detect if VSX is available
> from the device tree.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
>
>  arch/powerpc/kernel/prom.c     |    3 +++
>  include/asm-powerpc/cputable.h |   13 +++++++++++++
>  2 files changed, 16 insertions(+)
>
> Index: linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
> ===================================================================
> --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/prom.c
> +++ linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
> @@ -609,6 +609,9 @@ static struct feature_property {
>  	{"altivec", 0, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
>  	{"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
>  #endif /* CONFIG_ALTIVEC */
> +#ifdef CONFIG_VSX
> +	{"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
> +#endif /* CONFIG_VSX */
>  #ifdef CONFIG_PPC64
>  	{"ibm,dfp", 1, 0, PPC_FEATURE_HAS_DFP},
>  	{"ibm,purr", 1, CPU_FTR_PURR, 0},
> Index: linux-2.6-ozlabs/include/asm-powerpc/cputable.h
> ===================================================================
> --- linux-2.6-ozlabs.orig/include/asm-powerpc/cputable.h
> +++ linux-2.6-ozlabs/include/asm-powerpc/cputable.h
> @@ -27,6 +27,7 @@
>  #define PPC_FEATURE_HAS_DFP		0x00000400
>  #define PPC_FEATURE_POWER6_EXT		0x00000200
>  #define PPC_FEATURE_ARCH_2_06		0x00000100
> +#define PPC_FEATURE_HAS_VSX		0x00000080
>  
>  #define PPC_FEATURE_TRUE_LE		0x00000002
>  #define PPC_FEATURE_PPC_LE		0x00000001
> @@ -181,6 +182,7 @@ extern void do_feature_fixups(unsigned l
>  #define CPU_FTR_DSCR			LONG_ASM_CONST(0x0002000000000000)
>  #define CPU_FTR_1T_SEGMENT		LONG_ASM_CONST(0x0004000000000000)
>  #define CPU_FTR_NO_SLBIE_B		LONG_ASM_CONST(0x0008000000000000)
> +#define CPU_FTR_VSX			LONG_ASM_CONST(0x0010000000000000)
>  
>  #ifndef __ASSEMBLY__
>  
> @@ -199,6 +201,17 @@ extern void do_feature_fixups(unsigned l
>  #define PPC_FEATURE_HAS_ALTIVEC_COMP    0
>  #endif
>  
> +/* We only set the VSX features if the kernel was compiled with VSX
> + * support
> + */
> +#ifdef CONFIG_VSX
> +#define CPU_FTR_VSX_COMP	CPU_FTR_VSX
> +#define PPC_FEATURE_HAS_VSX_COMP PPC_FEATURE_HAS_VSX
> +#else
> +#define CPU_FTR_VSX_COMP	0
> +#define PPC_FEATURE_HAS_VSX_COMP    0
> +#endif
> +
>  /* We only set the spe features if the kernel was compiled with spe
>   * support
>   */
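
The "ibm,vmx" entries quoted above share one property name and differ only in the second column. A minimal userspace sketch of how such a table could be evaluated follows, assuming the second column is a minimum property value (the matching code itself is not part of this patch, so that reading is an assumption). The struct feature_property_sketch, the helper user_features_for() and the SKETCH_HAS_ALTIVEC value are hypothetical names for illustration; only PPC_FEATURE_HAS_VSX is taken from the quoted hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value taken from the cputable.h hunk quoted above. */
#define PPC_FEATURE_HAS_VSX	0x00000080UL
/* Placeholder bit for illustration only; not taken from the patch. */
#define SKETCH_HAS_ALTIVEC	0x10000000UL

/* Hypothetical, simplified mirror of one feature table row. */
struct feature_property_sketch {
	const char *name;		/* device tree property name */
	unsigned int min_value;		/* assumed: smallest value enabling the feature */
	unsigned long user_feature;	/* bit reported to userspace */
};

static const struct feature_property_sketch props[] = {
	{"ibm,vmx", 1, SKETCH_HAS_ALTIVEC},
	{"ibm,vmx", 2, PPC_FEATURE_HAS_VSX},
};

/*
 * OR together the feature bits of every row whose name matches and
 * whose threshold is met by the given property value.
 */
static unsigned long user_features_for(const char *name, unsigned int value)
{
	unsigned long features = 0;
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(props) / sizeof(props[0]); i++)
		if (strcmp(props[i].name, name) == 0 &&
		    value >= props[i].min_value)
			features |= props[i].user_feature;
	return features;
}
```

Under that assumption, an "ibm,vmx" value of 1 would report only the AltiVec placeholder bit, while a value of 2 would report both bits.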


* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
  2008-06-18  0:47 ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
@ 2008-06-18 19:35   ` Kumar Gala
  2008-06-18 22:58     ` Paul Mackerras
  2008-06-19  4:22   ` Kumar Gala
  1 sibling, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-18 19:35 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras


On Jun 17, 2008, at 7:47 PM, Michael Neuling wrote:

> The layout of the new VSR registers and how they overlap on top of the
> legacy FPR and VR registers is:
>
>                    VSR doubleword 0               VSR doubleword 1
>           ----------------------------------------------------------------
>   VSR[0]  |             FPR[0]            |                              |
>           ----------------------------------------------------------------
>   VSR[1]  |             FPR[1]            |                              |
>           ----------------------------------------------------------------
>           |              ...              |                              |
>           |              ...              |                              |
>           ----------------------------------------------------------------
>   VSR[30] |             FPR[30]           |                              |
>           ----------------------------------------------------------------
>   VSR[31] |             FPR[31]           |                              |
>           ----------------------------------------------------------------
>   VSR[32] |                             VR[0]                            |
>           ----------------------------------------------------------------
>   VSR[33] |                             VR[1]                            |
>           ----------------------------------------------------------------
>           |                              ...                             |
>           |                              ...                             |
>           ----------------------------------------------------------------
>   VSR[62] |                             VR[30]                           |
>           ----------------------------------------------------------------
>   VSR[63] |                             VR[31]                           |
>           ----------------------------------------------------------------
>
> VSX has 64 128bit registers.  The first 32 regs overlap with the FP
> registers and hence extend them with an additional 64 bits.  The
> second 32 regs overlap with the VMX registers.
>
> This patch introduces the thread_struct changes required to reflect
> this register layout.  Ptrace and signals code is updated so that the
> floating point registers are correctly accessed from the thread_struct
> when CONFIG_VSX is enabled.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---

Is VSX mutually exclusive with altivec/fp?  is there a MSR bit for it?

- k
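
For readers following the layout question, here is a minimal userspace sketch of the overlay described in the quoted text: each of the first 32 VSRs shares its doubleword 0 with the legacy FPR of the same index. It assumes a 16-byte stand-in type (vec128) in place of the kernel's vector128 and relies on the GCC/Clang aligned attribute. The names fpvsr_sketch and fpr_aliases_vsr() are hypothetical; the field names fpr, vsrlow, fp and vsr follow the thread_struct change in patch 5.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the kernel's vector128: 16 bytes, 16-byte aligned. */
typedef struct {
	uint32_t u[4];
} __attribute__((aligned(16))) vec128;

/* Hypothetical mirror of the fpvsr union introduced by patch 5. */
union fpvsr_sketch {
	struct {
		double fpr;	/* VSR doubleword 0, i.e. the legacy FPR */
		double vsrlow;	/* VSR doubleword 1, new with VSX */
	} fp[32];
	vec128 vsr[32];		/* the same storage viewed as full VSRs */
};

/*
 * Write FPR[i] through the fp view and report whether the vsr view
 * sees the same bytes at the start of VSR[i].
 */
static int fpr_aliases_vsr(union fpvsr_sketch *t, int i, double val)
{
	assert(i >= 0 && i < 32);
	t->fp[i].fpr = val;
	return memcmp(&t->vsr[i], &val, sizeof(val)) == 0;
}
```

In this sketch one fp[] element and one vsr[] element are both 16 bytes, so the two views line up register by register and vsrlow sits 8 bytes into each VSR.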


* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
  2008-06-18 19:35   ` Kumar Gala
@ 2008-06-18 22:58     ` Paul Mackerras
  2008-06-19  4:13       ` Kumar Gala
  0 siblings, 1 reply; 106+ messages in thread
From: Paul Mackerras @ 2008-06-18 22:58 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Michael Neuling

Kumar Gala writes:

> Is VSX mutually exclusive with altivec/fp?  is there a MSR bit for it?

It's not exclusive, it's an extension of altivec/fp, and yes it has
its own MSR bit to enable it.

Paul.
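
The MSR bit mentioned here is defined in the reg.h hunk of patch 8 as bit 23 (MSR_VSX_LG), next to the existing AltiVec bit 25. A small userspace sketch of testing and clearing it follows. SKETCH_MASK stands in for the kernel's __MASK macro, and the helpers msr_vsx_enabled() and msr_force_vsx_reload() are hypothetical names; the second one mimics the "regs->msr &= ~MSR_VSX" step that the signal restore code in patch 8 uses to force a reload on the next VSX instruction.

```c
#include <assert.h>

/* Bit positions as given in the reg.h hunk of patch 8. */
#define MSR_VEC_LG	25	/* Enable AltiVec */
#define MSR_VSX_LG	23	/* Enable VSX */

/* Stand-in for the kernel's __MASK() macro. */
#define SKETCH_MASK(b)	(1UL << (b))
#define MSR_VEC		SKETCH_MASK(MSR_VEC_LG)
#define MSR_VSX		SKETCH_MASK(MSR_VSX_LG)

/* Return nonzero if the VSX enable bit is set in an MSR image. */
static int msr_vsx_enabled(unsigned long msr)
{
	return (msr & MSR_VSX) != 0;
}

/* Clear only the VSX enable bit, leaving all other bits untouched. */
static unsigned long msr_force_vsx_reload(unsigned long msr)
{
	unsigned long out = msr & ~MSR_VSX;

	assert(!msr_vsx_enabled(out));
	return out;
}
```

Because VSX has its own bit, clearing it does not disturb MSR_VEC, which is consistent with VSX being an extension rather than a replacement.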


* Re: [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX.
  2008-06-18 13:05 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Kumar Gala
@ 2008-06-18 23:54   ` Michael Neuling
  0 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-18 23:54 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Paul Mackerras

> On Jun 17, 2008, at 7:47 PM, Michael Neuling wrote:
> 
> > The following set of patches adds Vector Scalar Extentions (VSX)
> > support for POWER7.  Includes context switch, ptrace and signals  
> > support.
> >
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > ---
> > This series is on top of the POWER7 cputable entry patch.
> >
> > Paulus: please consider for your 2.6.27 tree.
> 
> A bit better explanation of what VSX is would be useful.  It's not clear
> to me exactly how these instructions behave such that we have to touch
> all this code.

There is a register layout description which it looks like you found at
the top of patch 5.

Mikey
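
As a rough illustration of why the signal code has to change: doubleword 1 of VSR 0-31 is interleaved with the FPRs in the thread_struct, so it has to be gathered into a contiguous buffer before it can be written to the frame, and scattered back on return. The sketch below mirrors the loops in the 64-bit signal hunks of patch 8 in plain userspace C. The struct fp_vsr_pair and the helpers pack_vsrlow() and unpack_vsrlow() are hypothetical names, and the copy_to_user/copy_from_user steps are replaced by ordinary memory copies.

```c
#include <assert.h>
#include <string.h>

#define NVSRHALF 32	/* same count as ELF_NVSRHALFREG in the patch */

/* Hypothetical mirror of one fpvsr.fp[] element. */
struct fp_vsr_pair {
	double fpr;	/* VSR doubleword 0 (legacy FPR) */
	double vsrlow;	/* VSR doubleword 1 */
};

/* Gather doubleword 1 of each register into a contiguous buffer. */
static void pack_vsrlow(const struct fp_vsr_pair *fp, double *buf)
{
	int i;

	assert(fp != NULL && buf != NULL);
	for (i = 0; i < NVSRHALF; i++)
		buf[i] = fp[i].vsrlow;
}

/*
 * Scatter doubleword 1 back into the registers.  If the frame did not
 * carry VSX data, zero the low doublewords instead, as the restore
 * path in the patch does when MSR_VSX is not set in the saved MSR.
 */
static void unpack_vsrlow(struct fp_vsr_pair *fp, const double *frame,
			  int frame_has_vsx)
{
	double buf[NVSRHALF];
	int i;

	if (frame_has_vsx)
		memcpy(buf, frame, sizeof(buf));
	else
		memset(buf, 0, sizeof(buf));

	for (i = 0; i < NVSRHALF; i++)
		fp[i].vsrlow = buf[i];
}
```

Note that neither helper touches the fpr member: doubleword 0 keeps travelling through the existing FP register area of the frame.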


* Re: [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
  2008-06-18 14:53   ` Kumar Gala
@ 2008-06-18 23:55     ` Michael Neuling
  0 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-18 23:55 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Paul Mackerras

In message <DB1B686B-FE98-486B-B345-D18408C51135@kernel.crashing.org> you wrote:
> 
> On Jun 17, 2008, at 7:47 PM, Michael Neuling wrote:
> 
> > If we set the SPE MSR bit in save_user_regs we can blow away the VEC
> > bit.  This will never happen in reality, but it looks bad.
> >
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > ---
> >
> > arch/powerpc/kernel/signal_32.c |   10 ++++++----
> > 1 file changed, 6 insertions(+), 4 deletions(-)
> 
> probably worth commenting on why this will never happen.

Ok, I'll update the comments.

Mikey


* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
  2008-06-18 22:58     ` Paul Mackerras
@ 2008-06-19  4:13       ` Kumar Gala
  2008-06-19  4:30         ` Michael Neuling
  0 siblings, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-19  4:13 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, Michael Neuling


On Jun 18, 2008, at 5:58 PM, Paul Mackerras wrote:

> Kumar Gala writes:
>
>> Is VSX mutually exclusive with altivec/fp?  is there a MSR bit for  
>> it?
>
> It's not exclusive, it's an extension of altivec/fp, and yes it has
> its own MSR bit to enable it.

what MSR bit does it use... I'm not seeing the code add or test a new  
MSR bit anywhere.

What exactly do you mean by it's an extension of altivec/fp?  Are the
instructions considered part of altivec/fp or is it just reusing the
register storage like SPE?

- k

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
  2008-06-18  0:47 ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
  2008-06-18 19:35   ` Kumar Gala
@ 2008-06-19  4:22   ` Kumar Gala
  2008-06-19  4:35     ` Michael Neuling
  1 sibling, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-19  4:22 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras

>
>
> Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
> ===================================================================
> --- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
> +++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
> @@ -78,6 +78,7 @@ extern long kernel_thread(int (*fn)(void
> /* Lazy FPU handling on uni-processor */
> extern struct task_struct *last_task_used_math;
> extern struct task_struct *last_task_used_altivec;
> +extern struct task_struct *last_task_used_vsx;
> extern struct task_struct *last_task_used_spe;
>
> #ifdef CONFIG_PPC32
> @@ -136,8 +137,13 @@ typedef struct {
> 	unsigned long seg;
> } mm_segment_t;
>
> +#ifdef CONFIG_VSX
> +#define TS_FPR(i) fpvsr.fp[i].fpr
> +#define TS_FPRSTART fpvsr.fp
> +#else
> #define TS_FPR(i) fpr[i]
> #define TS_FPRSTART fpr
> +#endif
>
> struct thread_struct {
> 	unsigned long	ksp;		/* Kernel stack pointer */
> @@ -155,8 +161,19 @@ struct thread_struct {
> 	unsigned long	dbcr0;		/* debug control register values */
> 	unsigned long	dbcr1;
> #endif
> +#ifdef CONFIG_VSX
> +	/* First 32 VSX registers (overlap with fpr[32]) */
> +	union {
> +		struct {
> +			double fpr;
> +			double vsrlow;
> +		} fp[32];
> +		vector128	vsr[32];
> +	} fpvsr __attribute__((aligned(16)));

Do we really need a union here?  What would happen if you just changed
the type of fpr[32] from double to vector under #ifdef CONFIG_VSX?

I really don't like the union and think we can just make the storage
look opaque, which is the key.  I doubt we ever really care about
using fpr[] as a double in the kernel.

Also, the attribute is redundant, vector is already aligned(16).

> +#else
> 	double		fpr[32];	/* Complete floating point set */
> -	struct {			/* fpr ... fpscr must be contiguous */
> +#endif
> +	struct {
>
> 		unsigned int pad;
> 		unsigned int val;	/* Floating point status */
> @@ -176,6 +193,10 @@ struct thread_struct {
> 	unsigned long	vrsave;
> 	int		used_vr;	/* set if process has used altivec */
> #endif /* CONFIG_ALTIVEC */
> +#ifdef CONFIG_VSX
> +	/* VSR status */
> +	int		used_vsr;	/* set if process has used altivec */
> +#endif /* CONFIG_VSX */
> #ifdef CONFIG_SPE
> 	unsigned long	evr[32];	/* upper 32-bits of SPE regs */
> 	u64		acc;		/* Accumulator */
> @@ -200,7 +221,11 @@ struct thread_struct {
> 	.fpexc_mode = MSR_FE0 | MSR_FE1, \
> }
> #else
> +#ifdef CONFIG_VSX
> +#define	FPVSR_INIT_THREAD .fpvsr = { .vsr = 0, }
> +#else
> #define	FPVSR_INIT_THREAD .fpr = {0}
> +#endif
> #define INIT_THREAD  { \
> 	.ksp = INIT_SP, \
> 	.ksp_limit = INIT_SP_LIMIT, \
> @@ -293,5 +318,9 @@ static inline void prefetchw(const void
>
> #endif /* __KERNEL__ */
> #endif /* __ASSEMBLY__ */
> +#ifdef CONFIG_VSX
> +#define TS_FPRSPACING 2
> +#else
> #define TS_FPRSPACING 1
> +#endif
> #endif /* _ASM_POWERPC_PROCESSOR_H */

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
  2008-06-19  4:13       ` Kumar Gala
@ 2008-06-19  4:30         ` Michael Neuling
  0 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-19  4:30 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Paul Mackerras

In message <C780D687-D505-4A01-BED8-9866F4D0160A@kernel.crashing.org> you wrote:
> 
> On Jun 18, 2008, at 5:58 PM, Paul Mackerras wrote:
> 
> > Kumar Gala writes:
> >
> >> Is VSX mutually exclusive with altivec/fp?  is there a MSR bit for  
> >> it?
> >
> > It's not exclusive, it's an extension of altivec/fp, and yes it has
> > its own MSR bit to enable it.
> 
> what MSR bit does it use... I'm not seeing the code add or test a new  
> MSR bit anywhere.

It's introduced in patch 8.

 #define MSR_VEC_LG	25	        /* Enable AltiVec */
+#define MSR_VSX_LG	23		/* Enable VSX */
 #define MSR_POW_LG	18		/* Enable Power Management */
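
For readers following along, here is a minimal sketch of how a `_LG` bit
number like the one above can be turned into a mask and tested.  The bit
numbers 25 and 23 come from the hunk; `msr_mask` and `msr_has` are
hypothetical helpers for illustration, not functions from the series.

```c
#include <stdint.h>

/* Bit numbers taken from the hunk above. */
#define MSR_VEC_LG 25 /* Enable AltiVec */
#define MSR_VSX_LG 23 /* Enable VSX */

/* Hypothetical helper: turn a bit number into a 64-bit mask. */
static inline uint64_t msr_mask(unsigned int lg)
{
	return (uint64_t)1 << lg;
}

/* Hypothetical helper: is the facility bit set in this MSR image? */
static inline int msr_has(uint64_t msr, unsigned int lg)
{
	return (msr & msr_mask(lg)) != 0;
}
```

With these, a check such as `msr_has(msr, MSR_VSX_LG)` would tell whether
VSX is enabled in a saved MSR value.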

> What exactly do you mean by its an extension of altivec/fp?  Are the  
> instructions considered part of altivec/fp or is it just reusing the  
> register storage like SPE?

VSX is considered separate instructions, but it uses the same
architected registers as FP and VMX.  

i.e. if you execute a VSX instruction which touches VSX register 0, it'll
change FP register 0 (and vice versa).

Also, if you execute a VSX instruction which touches VSX register 32, it'll
change VMX register 0 (and vice versa).  In fact, for this patch we use
the 128-bit VMX load/stores to perform the context save/restore on the
VSX registers 32-63.

I guess in theory you could have VSX without FP and VMX, but this patch
assumes you have FP and VMX if you have VSX.  

This set of patches should allow any crazy mix of FP, VMX and VSX code
and the architected state should be context switched correctly.  
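
The overlap described above can be modelled roughly as follows.  This is
an illustrative sketch only: `struct vsx_model`, `model_fpr` and
`model_vr` are made-up names, and placing the FP value in the first
doubleword is an assumption based on the union in patch 5, where `fpr`
precedes `vsrlow`.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative model only: 64 VSX registers of 128 bits each. */
enum { NR_VSR = 64 };

struct vsx_model {
	uint64_t vsr[NR_VSR][2]; /* [n][0] = first doubleword, [n][1] = second */
};

/* FP register n aliases the first doubleword of VSX register n (n = 0..31). */
static inline uint64_t *model_fpr(struct vsx_model *m, unsigned int n)
{
	return &m->vsr[n][0];
}

/* VMX register n aliases the whole of VSX register 32 + n (n = 0..31). */
static inline uint64_t *model_vr(struct vsx_model *m, unsigned int n)
{
	return m->vsr[32 + n];
}
```

In this model a write through `model_fpr(&m, 0)` is visible in
`m.vsr[0]`, and a write through `model_vr(&m, 0)` is visible in
`m.vsr[32]`, which is the aliasing the context switch code has to
preserve.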

Sorry, I'm not familiar with how SPE works, so I can't comment on its
relevance.

Mikey

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
  2008-06-19  4:22   ` Kumar Gala
@ 2008-06-19  4:35     ` Michael Neuling
  2008-06-19  4:58       ` Kumar Gala
  0 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-19  4:35 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Paul Mackerras

In message <5AEB0769-1394-4924-803D-C40CAF685519@kernel.crashing.org> you wrote:
> >
> >
> > Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
> > ===================================================================
> > --- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
> > +++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
> > @@ -78,6 +78,7 @@ extern long kernel_thread(int (*fn)(void
> > /* Lazy FPU handling on uni-processor */
> > extern struct task_struct *last_task_used_math;
> > extern struct task_struct *last_task_used_altivec;
> > +extern struct task_struct *last_task_used_vsx;
> > extern struct task_struct *last_task_used_spe;
> >
> > #ifdef CONFIG_PPC32
> > @@ -136,8 +137,13 @@ typedef struct {
> > 	unsigned long seg;
> > } mm_segment_t;
> >
> > +#ifdef CONFIG_VSX
> > +#define TS_FPR(i) fpvsr.fp[i].fpr
> > +#define TS_FPRSTART fpvsr.fp
> > +#else
> > #define TS_FPR(i) fpr[i]
> > #define TS_FPRSTART fpr
> > +#endif
> >
> > struct thread_struct {
> > 	unsigned long	ksp;		/* Kernel stack pointer */
> > @@ -155,8 +161,19 @@ struct thread_struct {
> > 	unsigned long	dbcr0;		/* debug control register values */
> > 	unsigned long	dbcr1;
> > #endif
> > +#ifdef CONFIG_VSX
> > +	/* First 32 VSX registers (overlap with fpr[32]) */
> > +	union {
> > +		struct {
> > +			double fpr;
> > +			double vsrlow;
> > +		} fp[32];
> > +		vector128	vsr[32];
> > +	} fpvsr __attribute__((aligned(16)));
> 
> Do we really need a union here?  what would happen if you just changed  
> the type of fpr[32] from double to vector if #CONFIG_VSX?
>
> I really dont like the union and think we can just make the storage  
> look opaque which is the key.  I doubt we every really care about  
> using fpr[] as a double in the kernel.

I did something similar to this for the first cut of this patch, but it
made the code accessing this structure much less readable.

Personally, I think the union is good as it represents the true
structure of what it's storing.

> Also, the attribute is redundant, vector is already aligned(16).

Ok, I'll remove.

Mikey

> 
> > +#else
> > 	double		fpr[32];	/* Complete floating point set */
> > -	struct {			/* fpr ... fpscr must be contiguous */
> > +#endif
> > +	struct {
> >
> > 		unsigned int pad;
> > 		unsigned int val;	/* Floating point status */
> > @@ -176,6 +193,10 @@ struct thread_struct {
> > 	unsigned long	vrsave;
> > 	int		used_vr;	/* set if process has used altivec */
> > #endif /* CONFIG_ALTIVEC */
> > +#ifdef CONFIG_VSX
> > +	/* VSR status */
> > +	int		used_vsr;	/* set if process has used altivec */
> > +#endif /* CONFIG_VSX */
> > #ifdef CONFIG_SPE
> > 	unsigned long	evr[32];	/* upper 32-bits of SPE regs */
> > 	u64		acc;		/* Accumulator */
> > @@ -200,7 +221,11 @@ struct thread_struct {
> > 	.fpexc_mode = MSR_FE0 | MSR_FE1, \
> > }
> > #else
> > +#ifdef CONFIG_VSX
> > +#define	FPVSR_INIT_THREAD .fpvsr = { .vsr = 0, }
> > +#else
> > #define	FPVSR_INIT_THREAD .fpr = {0}
> > +#endif
> > #define INIT_THREAD  { \
> > 	.ksp = INIT_SP, \
> > 	.ksp_limit = INIT_SP_LIMIT, \
> > @@ -293,5 +318,9 @@ static inline void prefetchw(const void
> >
> > #endif /* __KERNEL__ */
> > #endif /* __ASSEMBLY__ */
> > +#ifdef CONFIG_VSX
> > +#define TS_FPRSPACING 2
> > +#else
> > #define TS_FPRSPACING 1
> > +#endif
> > #endif /* _ASM_POWERPC_PROCESSOR_H */

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
  2008-06-19  4:35     ` Michael Neuling
@ 2008-06-19  4:58       ` Kumar Gala
  2008-06-19  5:37         ` Michael Neuling
  0 siblings, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-19  4:58 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras


On Jun 18, 2008, at 11:35 PM, Michael Neuling wrote:

> In message <5AEB0769-1394-4924-803D-C40CAF685519@kernel.crashing.org> you wrote:
>>>
>>>
>>> Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
>>> ===================================================================
>>> --- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
>>> +++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
>>> @@ -78,6 +78,7 @@ extern long kernel_thread(int (*fn)(void
>>> /* Lazy FPU handling on uni-processor */
>>> extern struct task_struct *last_task_used_math;
>>> extern struct task_struct *last_task_used_altivec;
>>> +extern struct task_struct *last_task_used_vsx;
>>> extern struct task_struct *last_task_used_spe;
>>>
>>> #ifdef CONFIG_PPC32
>>> @@ -136,8 +137,13 @@ typedef struct {
>>> 	unsigned long seg;
>>> } mm_segment_t;
>>>
>>> +#ifdef CONFIG_VSX
>>> +#define TS_FPR(i) fpvsr.fp[i].fpr
>>> +#define TS_FPRSTART fpvsr.fp
>>> +#else
>>> #define TS_FPR(i) fpr[i]
>>> #define TS_FPRSTART fpr
>>> +#endif
>>>
>>> struct thread_struct {
>>> 	unsigned long	ksp;		/* Kernel stack pointer */
>>> @@ -155,8 +161,19 @@ struct thread_struct {
>>> 	unsigned long	dbcr0;		/* debug control register values */
>>> 	unsigned long	dbcr1;
>>> #endif
>>> +#ifdef CONFIG_VSX
>>> +	/* First 32 VSX registers (overlap with fpr[32]) */
>>> +	union {
>>> +		struct {
>>> +			double fpr;
>>> +			double vsrlow;
>>> +		} fp[32];
>>> +		vector128	vsr[32];

how about:

	union {
		struct {
			double fp;
			double vsrlow;
		} fpr;
		vector128 v;
	} fpvsr[32];

>>>
>>> +	} fpvsr __attribute__((aligned(16)));
>>
>> Do we really need a union here?  what would happen if you just  
>> changed
>> the type of fpr[32] from double to vector if #CONFIG_VSX?
>>
>> I really dont like the union and think we can just make the storage
>> look opaque which is the key.  I doubt we every really care about
>> using fpr[] as a double in the kernel.
>
> I did something similar to this for the first cut of this patch, but  
> it
> made the code accessing this structure much less readable.

really, what code is that?

- k

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
  2008-06-19  4:58       ` Kumar Gala
@ 2008-06-19  5:37         ` Michael Neuling
  2008-06-19  5:47           ` Kumar Gala
  0 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-19  5:37 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Paul Mackerras

In message <A62DFD0C-387A-4833-B266-99DB1B09E101@kernel.crashing.org> you wrote:
> 
> On Jun 18, 2008, at 11:35 PM, Michael Neuling wrote:
> 
> > In message <5AEB0769-1394-4924-803D-C40CAF685519@kernel.crashing.org> you wrote:
> >>>
> >>>
> >>> Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
> >>> ===================================================================
> >>> --- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
> >>> +++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
> >>> @@ -78,6 +78,7 @@ extern long kernel_thread(int (*fn)(void
> >>> /* Lazy FPU handling on uni-processor */
> >>> extern struct task_struct *last_task_used_math;
> >>> extern struct task_struct *last_task_used_altivec;
> >>> +extern struct task_struct *last_task_used_vsx;
> >>> extern struct task_struct *last_task_used_spe;
> >>>
> >>> #ifdef CONFIG_PPC32
> >>> @@ -136,8 +137,13 @@ typedef struct {
> >>> 	unsigned long seg;
> >>> } mm_segment_t;
> >>>
> >>> +#ifdef CONFIG_VSX
> >>> +#define TS_FPR(i) fpvsr.fp[i].fpr
> >>> +#define TS_FPRSTART fpvsr.fp
> >>> +#else
> >>> #define TS_FPR(i) fpr[i]
> >>> #define TS_FPRSTART fpr
> >>> +#endif
> >>>
> >>> struct thread_struct {
> >>> 	unsigned long	ksp;		/* Kernel stack pointer */
> >>> @@ -155,8 +161,19 @@ struct thread_struct {
> >>> 	unsigned long	dbcr0;		/* debug control register values */
> >>> 	unsigned long	dbcr1;
> >>> #endif
> >>> +#ifdef CONFIG_VSX
> >>> +	/* First 32 VSX registers (overlap with fpr[32]) */
> >>> +	union {
> >>> +		struct {
> >>> +			double fpr;
> >>> +			double vsrlow;
> >>> +		} fp[32];
> >>> +		vector128	vsr[32];
> 
> how about:
> 
> 	union {
> 		struct {
> 			double fp;
> 			double vsrlow;
> 		} fpr;
> 		vector128 v;
> 	} fpvsr[32];

Arrh, yep, makes more sense to put the array definition outside the
union.  I'll change.
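
As a rough check that the two layouts are interchangeable in memory, the
sketch below declares both and compares sizes and field offsets at
compile time.  The `vector128` typedef here is a stand-in for the kernel
type, and `struct layout_a` / `struct layout_b` are hypothetical wrappers
used only for this comparison.

```c
#include <stddef.h>

/* Stand-in for the kernel's vector128 type; an assumption for this sketch. */
typedef struct { unsigned int u[4]; } __attribute__((aligned(16))) vector128;

/* Layout from the posted patch: arrays inside the union. */
struct layout_a {
	union {
		struct { double fpr; double vsrlow; } fp[32];
		vector128 vsr[32];
	} fpvsr;
};

/* Suggested layout: one union per register, array outside. */
struct layout_b {
	union {
		struct { double fp; double vsrlow; } fpr;
		vector128 v;
	} fpvsr[32];
};

/* Both forms should occupy the same storage with fields at the same place. */
_Static_assert(sizeof(struct layout_a) == sizeof(struct layout_b),
	       "same total size");
_Static_assert(offsetof(struct layout_a, fpvsr.fp[5].vsrlow) ==
	       offsetof(struct layout_b, fpvsr[5].fpr.vsrlow),
	       "same field offset");
```

Only the spelling of the accesses changes, e.g. `fpvsr.fp[i].fpr` becomes
`fpvsr[i].fpr.fp`.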

> 
> >>>
> >>> +	} fpvsr __attribute__((aligned(16)));
> >>
> >> Do we really need a union here?  what would happen if you just  
> >> changed
> >> the type of fpr[32] from double to vector if #CONFIG_VSX?
> >>
> >> I really dont like the union and think we can just make the storage
> >> look opaque which is the key.  I doubt we every really care about
> >> using fpr[] as a double in the kernel.
> >
> > I did something similar to this for the first cut of this patch, but  
> > it
> > made the code accessing this structure much less readable.
> 
> really, what code is that?

Any code that has to read/write the top or bottom 64 bits _only_ of the
128 bit vector.

The signals code is a good example where, for backwards compatibility,
we need to read/write the old 64 bit FP regs, from the 128 bit value in
the struct.

Similarly, the way we've extended the signals interface for VSX, you
need to read/write out the bottom 64 bits (vsrlow) of a 128 bit value.

eg. the simple:
     current->thread.fpvsr.fp[i].vsrlow = buf[i]
would turn into some abomination/macro.
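
To make the comparison concrete, here is a rough sketch (not code from
the series) of writing only the low doubleword with and without the
union.  `union fpvsr_union`, `struct fpvsr_opaque` and both setter
functions are hypothetical names, and `vector128` is a stand-in typedef.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the kernel's vector128 type; an assumption for this sketch. */
typedef struct { uint32_t u[4]; } __attribute__((aligned(16))) vector128;

/* Union form, as in the posted patch. */
union fpvsr_union {
	struct { double fpr; double vsrlow; } fp[32];
	vector128 vsr[32];
};

/* Opaque form: only the 128-bit view exists. */
struct fpvsr_opaque {
	vector128 fpr[32];
};

/* With the union, the second doubleword has a name. */
static inline void set_vsrlow_union(union fpvsr_union *t, int i, double val)
{
	t->fp[i].vsrlow = val;
}

/* With opaque storage, the caller has to know which bytes hold it. */
static inline void set_vsrlow_opaque(struct fpvsr_opaque *t, int i, double val)
{
	memcpy((char *)&t->fpr[i] + sizeof(double), &val, sizeof(val));
}
```

Both setters leave the same bytes in memory; the difference is whether
the offset arithmetic is hidden in the type or spelled out at each use.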

Mikey

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
  2008-06-19  5:37         ` Michael Neuling
@ 2008-06-19  5:47           ` Kumar Gala
  2008-06-19  6:01             ` Michael Neuling
  0 siblings, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-19  5:47 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras

>>>>> +	} fpvsr __attribute__((aligned(16)));
>>>>
>>>> Do we really need a union here?  what would happen if you just
>>>> changed
>>>> the type of fpr[32] from double to vector if #CONFIG_VSX?
>>>>
>>>> I really dont like the union and think we can just make the storage
>>>> look opaque which is the key.  I doubt we every really care about
>>>> using fpr[] as a double in the kernel.
>>>
>>> I did something similar to this for the first cut of this patch, but
>>> it
>>> made the code accessing this structure much less readable.
>>
>> really, what code is that?
>
> Any code that has to read/write the top or bottom 64 bits _only_ of  
> the
> 128 bit vector.
>
> The signals code is a good example where, for backwards compatibility,
> we need to read/write the old 64 bit FP regs, from the 128 bit value  
> in
> the struct.
>
> Similarly, the way we've extended the signals interface for VSX, you
> need to read/write out the bottom 64 bits (vsrlow) of a 128 bit value.
>
> eg. the simple:
>     current->thread.fpvsr.fp[i].vsrlow = buf[i]
> would turn into some abomination/macro.

it would turn into something like:

current->thread.fpr[i][2] = buf[i];
current->thread.fpr[i][3] = buf[i+1];

if you look at your code you'll see there are only a few places where you
are accessing the union as fpvsr.vsr[], and those places could easily be
fpr[], since they are already protected by #ifdef CONFIG_VSX.

- k

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
  2008-06-19  5:47           ` Kumar Gala
@ 2008-06-19  6:01             ` Michael Neuling
  2008-06-19  6:10               ` Kumar Gala
  0 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-19  6:01 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Paul Mackerras

In message <B0E87874-BC65-4037-A43D-91C4142475E7@kernel.crashing.org> you wrote:
> >>>>> +	} fpvsr __attribute__((aligned(16)));
> >>>>
> >>>> Do we really need a union here?  what would happen if you just
> >>>> changed
> >>>> the type of fpr[32] from double to vector if #CONFIG_VSX?
> >>>>
> >>>> I really dont like the union and think we can just make the storage
> >>>> look opaque which is the key.  I doubt we every really care about
> >>>> using fpr[] as a double in the kernel.
> >>>
> >>> I did something similar to this for the first cut of this patch, but
> >>> it
> >>> made the code accessing this structure much less readable.
> >>
> >> really, what code is that?
> >
> > Any code that has to read/write the top or bottom 64 bits _only_ of  
> > the
> > 128 bit vector.
> >
> > The signals code is a good example where, for backwards compatibility,
> > we need to read/write the old 64 bit FP regs, from the 128 bit value  
> > in
> > the struct.
> >
> > Similarly, the way we've extended the signals interface for VSX, you
> > need to read/write out the bottom 64 bits (vsrlow) of a 128 bit value.
> >
> > eg. the simple:
> >     current->thread.fpvsr.fp[i].vsrlow = buf[i]
> > would turn into some abomination/macro.
> 
> it would turn into something like:
> 
> current->thread.fpr[i][2] = buf[i];
> current->thread.fpr[i][3] = buf[i+1];

Maybe abomination was going too far :-) 

I still think using the union makes it easier to read than what you
have here.  Also, it better reflects the structure of what's being
stored there.

Mikey

> if you look at your code you'll see there are only a few places you  
> accessing the union as fpvsr.vsr[] and those places could easily be  
> fpr[], since they are already #CONFIG_VSX protected.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
  2008-06-19  6:01             ` Michael Neuling
@ 2008-06-19  6:10               ` Kumar Gala
  2008-06-19  9:33                 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-19  6:10 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras


On Jun 19, 2008, at 1:01 AM, Michael Neuling wrote:

> In message <B0E87874-BC65-4037-A43D-91C4142475E7@kernel.crashing.org> you wrote:
>>>>>>> +	} fpvsr __attribute__((aligned(16)));
>>>>>>
>>>>>> Do we really need a union here?  what would happen if you just
>>>>>> changed
>>>>>> the type of fpr[32] from double to vector if #CONFIG_VSX?
>>>>>>
>>>>>> I really dont like the union and think we can just make the  
>>>>>> storage
>>>>>> look opaque which is the key.  I doubt we every really care about
>>>>>> using fpr[] as a double in the kernel.
>>>>>
>>>>> I did something similar to this for the first cut of this patch,  
>>>>> but
>>>>> it
>>>>> made the code accessing this structure much less readable.
>>>>
>>>> really, what code is that?
>>>
>>> Any code that has to read/write the top or bottom 64 bits _only_ of
>>> the
>>> 128 bit vector.
>>>
>>> The signals code is a good example where, for backwards  
>>> compatibility,
>>> we need to read/write the old 64 bit FP regs, from the 128 bit value
>>> in
>>> the struct.
>>>
>>> Similarly, the way we've extended the signals interface for VSX, you
>>> need to read/write out the bottom 64 bits (vsrlow) of a 128 bit  
>>> value.
>>>
>>> eg. the simple:
>>>    current->thread.fpvsr.fp[i].vsrlow = buf[i]
>>> would turn into some abomination/macro.
>>
>> it would turn into something like:
>>
>> current->thread.fpr[i][2] = buf[i];
>> current->thread.fpr[i][3] = buf[i+1];
>
> Maybe abomination was going too far :-)
>
> I still think using the union makes it is easier to read than what you
> have here.  Also, it better reflects the structure of what's being
> stored there.

I don't think that holds much weight with me.  We don't union the  
vector128 type to show it also supports float, u16, and u8 types.

I stick by the fact that the ONLY place it looks like you access the  
union via the .vsr member is for memset or memcpy so you clearly know  
if the size should be sizeof(double) or sizeof(vector).

Also, I can see the case in the future that 'fpr's become 128-bits  
wide' and allow for native long double support.

- k

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 6/9] powerpc: Add VSX CPU feature
  2008-06-18  0:47 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
  2008-06-18 16:28   ` Joel Schopp
@ 2008-06-19  6:51   ` David Woodhouse
  2008-06-19  7:00     ` Michael Neuling
  1 sibling, 1 reply; 106+ messages in thread
From: David Woodhouse @ 2008-06-19  6:51 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras

On Wed, 2008-06-18 at 10:47 +1000, Michael Neuling wrote:
>         {"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
>  #endif /* CONFIG_ALTIVEC */
> +#ifdef CONFIG_VSX
> +       {"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
> +#endif /* CONFIG_VSX */

Should that be "ibm,vsx"?

-- 
dwmw2

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 6/9] powerpc: Add VSX CPU feature
  2008-06-19  6:51   ` David Woodhouse
@ 2008-06-19  7:00     ` Michael Neuling
  0 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-19  7:00 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linuxppc-dev, Paul Mackerras

> On Wed, 2008-06-18 at 10:47 +1000, Michael Neuling wrote:
> >         {"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
> >  #endif /* CONFIG_ALTIVEC */
> > +#ifdef CONFIG_VSX
> > +       {"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
> > +#endif /* CONFIG_VSX */
> 
> Should that be "ibm,vsx"?

Nope "ibm,vmx" == 2 is correct for VSX.

You're not the first to think it looks wrong, so I should add a
comment.  
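
A hedged sketch of why the same property name appears twice: if the
table is matched by name and a minimum value, an "ibm,vmx" value of 2
enables both entries.  The flag values, the struct layout and
`features_for` are assumptions made for this illustration, and the
`>=` comparison is likewise assumed rather than taken from the patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flag values, for illustration only. */
#define FTR_ALTIVEC 0x1u
#define FTR_VSX     0x2u

struct feature_property {
	const char *name;       /* device tree property name */
	uint32_t min_value;     /* smallest value that enables the feature */
	unsigned int feature;   /* flag to set when matched */
};

static const struct feature_property feature_properties[] = {
	{ "ibm,vmx", 1, FTR_ALTIVEC },
	/* "ibm,vmx" == 2 really means VSX */
	{ "ibm,vmx", 2, FTR_VSX },
};

/* Collect every feature whose name matches and whose threshold is met. */
static unsigned int features_for(const char *name, uint32_t value)
{
	unsigned int ftrs = 0;
	size_t i;

	for (i = 0; i < sizeof(feature_properties) / sizeof(feature_properties[0]); i++) {
		const struct feature_property *fp = &feature_properties[i];

		if (strcmp(fp->name, name) == 0 && value >= fp->min_value)
			ftrs |= fp->feature;
	}
	return ftrs;
}
```

Under these assumptions a property value of 1 yields AltiVec only, while
2 yields AltiVec and VSX together.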

Mikey

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
  2008-06-19  6:10               ` Kumar Gala
@ 2008-06-19  9:33                 ` Benjamin Herrenschmidt
  2008-06-19 13:24                   ` Kumar Gala
  0 siblings, 1 reply; 106+ messages in thread
From: Benjamin Herrenschmidt @ 2008-06-19  9:33 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Michael Neuling, Paul Mackerras

On Thu, 2008-06-19 at 01:10 -0500, Kumar Gala wrote:
> > I still think using the union makes it is easier to read than what you
> > have here.  Also, it better reflects the structure of what's being
> > stored there.
> 
> I don't think that holds much weight with me.  We don't union the  
> vector128 type to show it also supports float, u16, and u8 types.

But this is different. The same registers are either basic FP regs or 
full VSX regs.

I don't see what's wrong with union, it's a nice way to express things.
 
> I stick by the fact that the ONLY place it looks like you access the  
> union via the .vsr member is for memset or memcpy so you clearly know  
> if the size should be sizeof(double) or sizeof(vector).
> 
> Also, I can see the case in the future that 'fpr's become 

What's wrong with the union ? there's nothing ugly about them..

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
  2008-06-19  9:33                 ` Benjamin Herrenschmidt
@ 2008-06-19 13:24                   ` Kumar Gala
  0 siblings, 0 replies; 106+ messages in thread
From: Kumar Gala @ 2008-06-19 13:24 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev, Michael Neuling, Paul Mackerras


On Jun 19, 2008, at 4:33 AM, Benjamin Herrenschmidt wrote:

> On Thu, 2008-06-19 at 01:10 -0500, Kumar Gala wrote:
>>> I still think using the union makes it is easier to read than what  
>>> you
>>> have here.  Also, it better reflects the structure of what's being
>>> stored there.
>>
>> I don't think that holds much weight with me.  We don't union the
>> vector128 type to show it also supports float, u16, and u8 types.
>
> But this is different. The same registers are either basic FP regs or
> full VSX regs.
>
> I don't see what's wrong with union, it's a nice way to express  
> things.

We also don't do this for SPE (the Freescale version).

>> I stick by the fact that the ONLY place it looks like you access the
>> union via the .vsr member is for memset or memcpy so you clearly know
>> if the size should be sizeof(double) or sizeof(vector).
>>
>> Also, I can see the case in the future that 'fpr's become
>
> What's wrong with the union ? there's nothing ugly about them..

I'll wait for the next version and see how many places .vsr is  
actually accessed.

- k

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX.
  2008-06-18  0:47 [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                   ` (9 preceding siblings ...)
  2008-06-18 13:05 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Kumar Gala
@ 2008-06-20  4:13 ` Michael Neuling
  2008-06-20  4:13   ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
                     ` (10 more replies)
  10 siblings, 11 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-20  4:13 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

The following set of patches adds Vector Scalar Extensions (VSX)
support for POWER7.  Includes context switch, ptrace and signals support.

Signed-off-by: Michael Neuling <mikey@neuling.org>
--- 
Paulus: please consider for your 2.6.27 tree.

Updated with comments from Kumar, Milton, Dave Woodhouse and Mark
'NKOTB' Nelson.
- Changed thread_struct array definition to be cleaner
- Updated CPU_FTRS_POSSIBLE 
- Updated Kconfig typo and duplicate
- Added comment to clarify ibm,vmx = 2 really means VSX. 

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 3/9] powerpc: Move altivec_unavailable
  2008-06-20  4:13 ` Michael Neuling
                     ` (5 preceding siblings ...)
  2008-06-20  4:13   ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
@ 2008-06-20  4:13   ` Michael Neuling
  2008-06-20  4:13   ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
                     ` (3 subsequent siblings)
  10 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-20  4:13 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Move the altivec_unavailable code, to make room at 0xf40 where the
vsx_unavailable exception will be.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/head_64.S |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -275,7 +275,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	. = 0xf00
 	b	performance_monitor_pSeries
 
-	STD_EXCEPTION_PSERIES(0xf20, altivec_unavailable)
+	. = 0xf20
+	b	altivec_unavailable_pSeries
 
 #ifdef CONFIG_CBE_RAS
 	HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
@@ -295,6 +296,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 
 	/* moved from 0xf00 */
 	STD_EXCEPTION_PSERIES(., performance_monitor)
+	STD_EXCEPTION_PSERIES(., altivec_unavailable)
 
 /*
  * An interrupt came in while soft-disabled; clear EE in SRR1,

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
  2008-06-20  4:13 ` Michael Neuling
@ 2008-06-20  4:13   ` Michael Neuling
  2008-06-20  6:39     ` Kumar Gala
  2008-06-20  4:13   ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
                     ` (9 subsequent siblings)
  10 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-20  4:13 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

We are going to change where the floating point registers are stored
in the thread_struct, so in preparation add some macros to access the
floating point registers.  Update all code to use these new macros.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/align.c       |    6 ++--
 arch/powerpc/kernel/asm-offsets.c |    2 -
 arch/powerpc/kernel/process.c     |    5 ++-
 arch/powerpc/kernel/ptrace.c      |   14 +++++----
 arch/powerpc/kernel/ptrace32.c    |    9 ++++--
 arch/powerpc/kernel/signal_32.c   |    6 ++--
 arch/powerpc/kernel/signal_64.c   |   13 +++++---
 arch/powerpc/kernel/softemu8xx.c  |    4 +-
 arch/powerpc/math-emu/math.c      |   56 +++++++++++++++++++-------------------
 include/asm-powerpc/ppc_asm.h     |    5 ++-
 include/asm-powerpc/processor.h   |    7 ++++
 11 files changed, 71 insertions(+), 56 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/align.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/align.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/align.c
@@ -366,7 +366,7 @@ static int emulate_multiple(struct pt_re
 static int emulate_fp_pair(struct pt_regs *regs, unsigned char __user *addr,
 			   unsigned int reg, unsigned int flags)
 {
-	char *ptr = (char *) &current->thread.fpr[reg];
+	char *ptr = (char *) &current->thread.TS_FPR(reg);
 	int i, ret;
 
 	if (!(flags & F))
@@ -784,7 +784,7 @@ int fix_alignment(struct pt_regs *regs)
 				return -EFAULT;
 		}
 	} else if (flags & F) {
-		data.dd = current->thread.fpr[reg];
+		data.dd = current->thread.TS_FPR(reg);
 		if (flags & S) {
 			/* Single-precision FP store requires conversion... */
 #ifdef CONFIG_PPC_FPU
@@ -862,7 +862,7 @@ int fix_alignment(struct pt_regs *regs)
 		if (unlikely(ret))
 			return -EFAULT;
 	} else if (flags & F)
-		current->thread.fpr[reg] = data.dd;
+		current->thread.TS_FPR(reg) = data.dd;
 	else
 		regs->gpr[reg] = data.ll;
 
Index: linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
@@ -66,7 +66,7 @@ int main(void)
 	DEFINE(KSP_LIMIT, offsetof(struct thread_struct, ksp_limit));
 	DEFINE(PT_REGS, offsetof(struct thread_struct, regs));
 	DEFINE(THREAD_FPEXC_MODE, offsetof(struct thread_struct, fpexc_mode));
-	DEFINE(THREAD_FPR0, offsetof(struct thread_struct, fpr[0]));
+	DEFINE(THREAD_FPR0, offsetof(struct thread_struct, TS_FPR(0)));
 	DEFINE(THREAD_FPSCR, offsetof(struct thread_struct, fpscr));
 #ifdef CONFIG_ALTIVEC
 	DEFINE(THREAD_VR0, offsetof(struct thread_struct, vr[0]));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -110,7 +110,7 @@ int dump_task_fpu(struct task_struct *ts
 		return 0;
 	flush_fp_to_thread(current);
 
-	memcpy(fpregs, &tsk->thread.fpr[0], sizeof(*fpregs));
+	memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
 
 	return 1;
 }
@@ -689,7 +689,8 @@ void start_thread(struct pt_regs *regs, 
 #endif
 
 	discard_lazy_cpu_state();
-	memset(current->thread.fpr, 0, sizeof(current->thread.fpr));
+	memset(current->thread.TS_FPRSTART, 0,
+	       sizeof(current->thread.TS_FPRSTART));
 	current->thread.fpscr.val = 0;
 #ifdef CONFIG_ALTIVEC
 	memset(current->thread.vr, 0, sizeof(current->thread.vr));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -218,10 +218,10 @@ static int fpr_get(struct task_struct *t
 	flush_fp_to_thread(target);
 
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
-		     offsetof(struct thread_struct, fpr[32]));
+		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
-				   &target->thread.fpr, 0, -1);
+				   &target->thread.TS_FPRSTART, 0, -1);
 }
 
 static int fpr_set(struct task_struct *target, const struct user_regset *regset,
@@ -231,10 +231,10 @@ static int fpr_set(struct task_struct *t
 	flush_fp_to_thread(target);
 
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
-		     offsetof(struct thread_struct, fpr[32]));
+		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
-				  &target->thread.fpr, 0, -1);
+				  &target->thread.TS_FPRSTART, 0, -1);
 }
 
 
@@ -728,7 +728,8 @@ long arch_ptrace(struct task_struct *chi
 			tmp = ptrace_get_reg(child, (int) index);
 		} else {
 			flush_fp_to_thread(child);
-			tmp = ((unsigned long *)child->thread.fpr)[index - PT_FPR0];
+			tmp = ((unsigned long *)child->thread.TS_FPRSTART)
+				[TS_FPRSPACING * (index - PT_FPR0)];
 		}
 		ret = put_user(tmp,(unsigned long __user *) data);
 		break;
@@ -755,7 +756,8 @@ long arch_ptrace(struct task_struct *chi
 			ret = ptrace_put_reg(child, index, data);
 		} else {
 			flush_fp_to_thread(child);
-			((unsigned long *)child->thread.fpr)[index - PT_FPR0] = data;
+			((unsigned long *)child->thread.TS_FPRSTART)
+				[TS_FPRSPACING * (index - PT_FPR0)] = data;
 			ret = 0;
 		}
 		break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
@@ -122,7 +122,8 @@ long compat_arch_ptrace(struct task_stru
 			 * to be an array of unsigned int (32 bits) - the
 			 * index passed in is based on this assumption.
 			 */
-			tmp = ((unsigned int *)child->thread.fpr)[index - PT_FPR0];
+			tmp = ((unsigned int *)child->thread.TS_FPRSTART)
+				[TS_FPRSPACING * (index - PT_FPR0)];
 		}
 		ret = put_user((unsigned int)tmp, (u32 __user *)data);
 		break;
@@ -162,7 +163,8 @@ long compat_arch_ptrace(struct task_stru
 		CHECK_FULL_REGS(child->thread.regs);
 		if (numReg >= PT_FPR0) {
 			flush_fp_to_thread(child);
-			tmp = ((unsigned long int *)child->thread.fpr)[numReg - PT_FPR0];
+			tmp = ((unsigned long int *)child->thread.TS_FPRSTART)
+				[TS_FPRSPACING * (numReg - PT_FPR0)];
 		} else { /* register within PT_REGS struct */
 			tmp = ptrace_get_reg(child, numReg);
 		} 
@@ -217,7 +219,8 @@ long compat_arch_ptrace(struct task_stru
 			 * to be an array of unsigned int (32 bits) - the
 			 * index passed in is based on this assumption.
 			 */
-			((unsigned int *)child->thread.fpr)[index - PT_FPR0] = data;
+			((unsigned int *)child->thread.TS_FPRSTART)
+				[TS_FPRSPACING * (index - PT_FPR0)] = data;
 			ret = 0;
 		}
 		break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -343,7 +343,7 @@ static int save_user_regs(struct pt_regs
 
 	/* save general and floating-point registers */
 	if (save_general_regs(regs, frame) ||
-	    __copy_to_user(&frame->mc_fregs, current->thread.fpr,
+	    __copy_to_user(&frame->mc_fregs, current->thread.TS_FPRSTART,
 		    ELF_NFPREG * sizeof(double)))
 		return 1;
 
@@ -431,7 +431,7 @@ static long restore_user_regs(struct pt_
 
 	/*
 	 * Do this before updating the thread state in
-	 * current->thread.fpr/vr/evr.  That way, if we get preempted
+	 * current->thread.TS_FPR/vr/evr.  That way, if we get preempted
 	 * and another task grabs the FPU/Altivec/SPE, it won't be
 	 * tempted to save the current CPU state into the thread_struct
 	 * and corrupt what we are writing there.
@@ -441,7 +441,7 @@ static long restore_user_regs(struct pt_
 	/* force the process to reload the FP registers from
 	   current->thread when it next does FP instructions */
 	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
-	if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
+	if (__copy_from_user(current->thread.TS_FPRSTART, &sr->mc_fregs,
 			     sizeof(sr->mc_fregs)))
 		return 1;
 
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -116,7 +116,8 @@ static long setup_sigcontext(struct sigc
 	WARN_ON(!FULL_REGS(regs));
 	err |= __copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE);
 	err |= __put_user(msr, &sc->gp_regs[PT_MSR]);
-	err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
+	err |= __copy_to_user(&sc->fp_regs, &current->thread.TS_FPRSTART,
+			      FP_REGS_SIZE);
 	err |= __put_user(signr, &sc->signal);
 	err |= __put_user(handler, &sc->handler);
 	if (set != NULL)
@@ -168,7 +169,7 @@ static long restore_sigcontext(struct pt
 
 	/*
 	 * Do this before updating the thread state in
-	 * current->thread.fpr/vr.  That way, if we get preempted
+	 * current->thread.TS_FPR/vr.  That way, if we get preempted
 	 * and another task grabs the FPU/Altivec, it won't be
 	 * tempted to save the current CPU state into the thread_struct
 	 * and corrupt what we are writing there.
@@ -177,12 +178,14 @@ static long restore_sigcontext(struct pt
 
 	/*
 	 * Force reload of FP/VEC.
-	 * This has to be done before copying stuff into current->thread.fpr/vr
-	 * for the reasons explained in the previous comment.
+	 * This has to be done before copying stuff into
+	 * current->thread.TS_FPR/vr for the reasons explained in the
+	 * previous comment.
 	 */
 	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
 
-	err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
+	err |= __copy_from_user(&current->thread.TS_FPRSTART, &sc->fp_regs,
+				FP_REGS_SIZE);
 
 #ifdef CONFIG_ALTIVEC
 	err |= __get_user(v_regs, &sc->v_regs);
Index: linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/softemu8xx.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
@@ -124,7 +124,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
 	disp = instword & 0xffff;
 
 	ea = (u32 *)(regs->gpr[idxreg] + disp);
-	ip = (u32 *)&current->thread.fpr[flreg];
+	ip = (u32 *)&current->thread.TS_FPR(flreg);
 
 	switch ( inst )
 	{
@@ -168,7 +168,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
 		break;
 	case FMR:
 		/* assume this is a fp move -- Cort */
-		memcpy(ip, &current->thread.fpr[(instword>>11)&0x1f],
+		memcpy(ip, &current->thread.TS_FPR((instword>>11)&0x1f),
 		       sizeof(double));
 		break;
 	default:
Index: linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/math-emu/math.c
+++ linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
@@ -230,14 +230,14 @@ do_mathemu(struct pt_regs *regs)
 	case LFD:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		lfd(op0, op1, op2, op3);
 		break;
 	case LFDU:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		lfd(op0, op1, op2, op3);
 		regs->gpr[idx] = (unsigned long)op1;
@@ -245,21 +245,21 @@ do_mathemu(struct pt_regs *regs)
 	case STFD:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		stfd(op0, op1, op2, op3);
 		break;
 	case STFDU:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		stfd(op0, op1, op2, op3);
 		regs->gpr[idx] = (unsigned long)op1;
 		break;
 	case OP63:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		fmr(op0, op1, op2, op3);
 		break;
 	default:
@@ -356,28 +356,28 @@ do_mathemu(struct pt_regs *regs)
 
 	switch (type) {
 	case AB:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	case AC:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op2 = (void *)&current->thread.fpr[(insn >>  6) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op2 = (void *)&current->thread.TS_FPR((insn >>  6) & 0x1f);
 		break;
 
 	case ABC:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
-		op3 = (void *)&current->thread.fpr[(insn >>  6) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
+		op3 = (void *)&current->thread.TS_FPR((insn >>  6) & 0x1f);
 		break;
 
 	case D:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		break;
 
@@ -387,27 +387,27 @@ do_mathemu(struct pt_regs *regs)
 			goto illegal;
 
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)(regs->gpr[idx] + sdisp);
 		break;
 
 	case X:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		break;
 
 	case XA:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
 		break;
 
 	case XB:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	case XE:
 		idx = (insn >> 16) & 0x1f;
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		if (!idx) {
 			if (((insn >> 1) & 0x3ff) == STFIWX)
 				op1 = (void *)(regs->gpr[(insn >> 11) & 0x1f]);
@@ -421,7 +421,7 @@ do_mathemu(struct pt_regs *regs)
 
 	case XEU:
 		idx = (insn >> 16) & 0x1f;
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0)
 				+ regs->gpr[(insn >> 11) & 0x1f]);
 		break;
@@ -429,8 +429,8 @@ do_mathemu(struct pt_regs *regs)
 	case XCR:
 		op0 = (void *)&regs->ccr;
 		op1 = (void *)((insn >> 23) & 0x7);
-		op2 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op3 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op2 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op3 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	case XCRL:
@@ -450,7 +450,7 @@ do_mathemu(struct pt_regs *regs)
 
 	case XFLB:
 		op0 = (void *)((insn >> 17) & 0xff);
-		op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	default:
Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -6,6 +6,7 @@
 
 #include <linux/stringify.h>
 #include <asm/asm-compat.h>
+#include <asm/processor.h>
 
 #ifndef __ASSEMBLY__
 #error __FILE__ should only be used in assembler files
@@ -83,13 +84,13 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);				
 #define REST_8GPRS(n, base)	REST_4GPRS(n, base); REST_4GPRS(n+4, base)
 #define REST_10GPRS(n, base)	REST_8GPRS(n, base); REST_2GPRS(n+8, base)
 
-#define SAVE_FPR(n, base)	stfd	n,THREAD_FPR0+8*(n)(base)
+#define SAVE_FPR(n, base)	stfd	n,THREAD_FPR0+8*TS_FPRSPACING*(n)(base)
 #define SAVE_2FPRS(n, base)	SAVE_FPR(n, base); SAVE_FPR(n+1, base)
 #define SAVE_4FPRS(n, base)	SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
 #define SAVE_8FPRS(n, base)	SAVE_4FPRS(n, base); SAVE_4FPRS(n+4, base)
 #define SAVE_16FPRS(n, base)	SAVE_8FPRS(n, base); SAVE_8FPRS(n+8, base)
 #define SAVE_32FPRS(n, base)	SAVE_16FPRS(n, base); SAVE_16FPRS(n+16, base)
-#define REST_FPR(n, base)	lfd	n,THREAD_FPR0+8*(n)(base)
+#define REST_FPR(n, base)	lfd	n,THREAD_FPR0+8*TS_FPRSPACING*(n)(base)
 #define REST_2FPRS(n, base)	REST_FPR(n, base); REST_FPR(n+1, base)
 #define REST_4FPRS(n, base)	REST_2FPRS(n, base); REST_2FPRS(n+2, base)
 #define REST_8FPRS(n, base)	REST_4FPRS(n, base); REST_4FPRS(n+4, base)
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -136,6 +136,9 @@ typedef struct {
 	unsigned long seg;
 } mm_segment_t;
 
+#define TS_FPR(i) fpr[i]
+#define TS_FPRSTART fpr
+
 struct thread_struct {
 	unsigned long	ksp;		/* Kernel stack pointer */
 	unsigned long	ksp_limit;	/* if ksp <= ksp_limit stack overflow */
@@ -197,12 +200,13 @@ struct thread_struct {
 	.fpexc_mode = MSR_FE0 | MSR_FE1, \
 }
 #else
+#define	FPVSR_INIT_THREAD .fpr = {0}
 #define INIT_THREAD  { \
 	.ksp = INIT_SP, \
 	.ksp_limit = INIT_SP_LIMIT, \
 	.regs = (struct pt_regs *)INIT_SP - 1, /* XXX bogus, I think */ \
 	.fs = KERNEL_DS, \
-	.fpr = {0}, \
+	FPVSR_INIT_THREAD, \
 	.fpscr = { .val = 0, }, \
 	.fpexc_mode = 0, \
 }
@@ -289,4 +293,5 @@ static inline void prefetchw(const void 
 
 #endif /* __KERNEL__ */
 #endif /* __ASSEMBLY__ */
+#define TS_FPRSPACING 1
 #endif /* _ASM_POWERPC_PROCESSOR_H */

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
  2008-06-20  4:13 ` Michael Neuling
  2008-06-20  4:13   ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
@ 2008-06-20  4:13   ` Michael Neuling
  2008-06-20  6:35     ` Kumar Gala
  2008-06-20  4:13   ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
                     ` (8 subsequent siblings)
  10 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-20  4:13 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

If we set the SPE MSR bit in save_user_regs we can blow away the VEC
bit.  This will never happen in reality (VMX and SPE will never be in
the same processor, as their opcodes overlap), but it looks bad.  Also,
when we add VSX here in a later patch, we can hit two of these at the
same time.
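
For readers following along, here is a rough userspace sketch of the
difference between the old "store regs->msr | BIT each time" pattern and
the "accumulate in a local, store once" pattern this patch switches to.
It is not kernel code: the DEMO_MSR_* values and the function names are
made up for illustration, and the return value stands in for what ends
up in frame->mc_gregs[PT_MSR].

```c
#include <assert.h>

/* Illustrative bit values only; not taken from the kernel headers. */
#define DEMO_MSR_VEC	(1UL << 25)
#define DEMO_MSR_VSX	(1UL << 23)

/*
 * Old pattern: every facility stores (regs_msr | its own bit) into the
 * same slot, so the last store wins and earlier bits are lost.
 */
unsigned long saved_msr_overwrite(unsigned long regs_msr,
				  int used_vec, int used_vsx)
{
	unsigned long slot = regs_msr;	/* stands in for mc_gregs[PT_MSR] */

	if (used_vec)
		slot = regs_msr | DEMO_MSR_VEC;
	if (used_vsx)
		slot = regs_msr | DEMO_MSR_VSX;
	return slot;
}

/*
 * New pattern: OR each bit into a local copy and store it once at the
 * end, so all facility bits survive.
 */
unsigned long saved_msr_accumulate(unsigned long regs_msr,
				   int used_vec, int used_vsx)
{
	unsigned long msr = regs_msr;

	if (used_vec)
		msr |= DEMO_MSR_VEC;
	if (used_vsx)
		msr |= DEMO_MSR_VSX;
	return msr;
}

/* Small self-check showing the two patterns diverge with two facilities. */
void demo_msr(void)
{
	assert(saved_msr_overwrite(0, 1, 1) == DEMO_MSR_VSX);
	assert(saved_msr_accumulate(0, 1, 1) ==
	       (DEMO_MSR_VEC | DEMO_MSR_VSX));
}
```

With only one facility in use the two functions agree, which is why the
old code was harmless in practice; they only differ once two bits need
to be set for the same frame.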

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/signal_32.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -336,6 +336,8 @@ struct rt_sigframe {
 static int save_user_regs(struct pt_regs *regs, struct mcontext __user *frame,
 		int sigret)
 {
+	unsigned long msr = regs->msr;
+
 	/* Make sure floating point registers are stored in regs */
 	flush_fp_to_thread(current);
 
@@ -354,8 +356,7 @@ static int save_user_regs(struct pt_regs
 			return 1;
 		/* set MSR_VEC in the saved MSR value to indicate that
 		   frame->mc_vregs contains valid data */
-		if (__put_user(regs->msr | MSR_VEC, &frame->mc_gregs[PT_MSR]))
-			return 1;
+		msr |= MSR_VEC;
 	}
 	/* else assert((regs->msr & MSR_VEC) == 0) */
 
@@ -377,8 +378,7 @@ static int save_user_regs(struct pt_regs
 			return 1;
 		/* set MSR_SPE in the saved MSR value to indicate that
 		   frame->mc_vregs contains valid data */
-		if (__put_user(regs->msr | MSR_SPE, &frame->mc_gregs[PT_MSR]))
-			return 1;
+		msr |= MSR_SPE;
 	}
 	/* else assert((regs->msr & MSR_SPE) == 0) */
 
@@ -387,6 +387,8 @@ static int save_user_regs(struct pt_regs
 		return 1;
 #endif /* CONFIG_SPE */
 
+	if (__put_user(msr, &frame->mc_gregs[PT_MSR]))
+		return 1;
 	if (sigret) {
 		/* Set up the sigreturn trampoline: li r0,sigret; sc */
 		if (__put_user(0x38000000UL + sigret, &frame->tramp[0])

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable
  2008-06-20  4:13 ` Michael Neuling
                     ` (3 preceding siblings ...)
  2008-06-20  4:13   ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
@ 2008-06-20  4:13   ` Michael Neuling
  2008-06-20  4:13   ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
                     ` (5 subsequent siblings)
  10 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-20  4:13 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Make load_up_fpu and load_up_altivec callable so they can be reused by
the VSX code.  

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/fpu.S        |    2 +-
 arch/powerpc/kernel/head_32.S    |    6 ++++--
 arch/powerpc/kernel/head_64.S    |    8 +++++---
 arch/powerpc/kernel/head_booke.h |    6 ++++--
 4 files changed, 14 insertions(+), 8 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -85,7 +85,7 @@ _GLOBAL(load_up_fpu)
 #endif /* CONFIG_SMP */
 	/* restore registers and return */
 	/* we haven't used ctr or xer or lr */
-	b	fast_exception_return
+	blr
 
 /*
  * giveup_fpu(tsk)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_32.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
@@ -421,8 +421,10 @@ BEGIN_FTR_SECTION
 	b 	ProgramCheck
 END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE)
 	EXCEPTION_PROLOG
-	bne	load_up_fpu		/* if from user, just load it up */
-	addi	r3,r1,STACK_FRAME_OVERHEAD
+	beq	1f
+	bl	load_up_fpu		/* if from user, just load it up */
+	b	fast_exception_return
+1:	addi	r3,r1,STACK_FRAME_OVERHEAD
 	EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
 
 /* Decrementer */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -741,7 +741,8 @@ fp_unavailable_common:
 	ENABLE_INTS
 	bl	.kernel_fp_unavailable_exception
 	BUG_OPCODE
-1:	b	.load_up_fpu
+1:	bl	.load_up_fpu
+	b	fast_exception_return
 
 	.align	7
 	.globl altivec_unavailable_common
@@ -749,7 +750,8 @@ altivec_unavailable_common:
 	EXCEPTION_PROLOG_COMMON(0xf20, PACA_EXGEN)
 #ifdef CONFIG_ALTIVEC
 BEGIN_FTR_SECTION
-	bne	.load_up_altivec	/* if from user, just load it up */
+	bnel	.load_up_altivec
+	b	fast_exception_return
 END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 #endif
 	bl	.save_nvgprs
@@ -829,7 +831,7 @@ _STATIC(load_up_altivec)
 	std	r4,0(r3)
 #endif /* CONFIG_SMP */
 	/* restore registers and return */
-	b	fast_exception_return
+	blr
 #endif /* CONFIG_ALTIVEC */
 
 /*
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_booke.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
@@ -363,8 +363,10 @@ label:
 #define FP_UNAVAILABLE_EXCEPTION					      \
 	START_EXCEPTION(FloatingPointUnavailable)			      \
 	NORMAL_EXCEPTION_PROLOG;					      \
-	bne	load_up_fpu;		/* if from user, just load it up */   \
-	addi	r3,r1,STACK_FRAME_OVERHEAD;				      \
+	beq	1f;							      \
+	bl	load_up_fpu;		/* if from user, just load it up */   \
+	b	fast_exception_return;					      \
+1:	addi	r3,r1,STACK_FRAME_OVERHEAD;				      \
 	EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
 
 #endif /* __HEAD_BOOKE_H__ */

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
  2008-06-20  4:13 ` Michael Neuling
  2008-06-20  4:13   ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
  2008-06-20  4:13   ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
@ 2008-06-20  4:13   ` Michael Neuling
  2008-06-20  6:44     ` Kumar Gala
  2008-06-20  4:13   ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
                     ` (7 subsequent siblings)
  10 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-20  4:13 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

The layout of the new VSR registers and how they overlap on top of the
legacy FPR and VR registers is:

                   VSR doubleword 0               VSR doubleword 1
          ----------------------------------------------------------------
  VSR[0]  |             FPR[0]            |                              |
          ----------------------------------------------------------------
  VSR[1]  |             FPR[1]            |                              |
          ----------------------------------------------------------------
          |              ...              |                              |
          |              ...              |                              |
          ----------------------------------------------------------------
  VSR[30] |             FPR[30]           |                              |
          ----------------------------------------------------------------
  VSR[31] |             FPR[31]           |                              |
          ----------------------------------------------------------------
  VSR[32] |                             VR[0]                            |
          ----------------------------------------------------------------
  VSR[33] |                             VR[1]                            |
          ----------------------------------------------------------------
          |                              ...                             |
          |                              ...                             |
          ----------------------------------------------------------------
  VSR[62] |                             VR[30]                           |
          ----------------------------------------------------------------
  VSR[63] |                             VR[31]                           |
          ----------------------------------------------------------------

VSX has 64 128-bit registers.  The first 32 regs overlap with the FP
registers and hence extend them with an additional 64 bits.  The
second 32 regs overlap with the VMX registers.

This patch introduces the thread_struct changes required to reflect
this register layout.  Ptrace and signals code is updated so that the
floating point registers are correctly accessed from the thread_struct
when CONFIG_VSX is enabled.
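
As a rough illustration of why the ptrace and signal paths need a bounce
buffer once the FPRs live inside 128-bit slots, here is a small userspace
sketch.  The type and function names (struct vsr128, union fp_vsr_slot,
struct fp_state, pack_fprs, unpack_fprs) are hypothetical and only
approximate what the thread_struct change does; the 33-double layout
(32 FPRs followed by the FPSCR) mirrors the buf[33] used in fpr_get and
fpr_set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NFPR 32			/* FPR0..FPR31 */

/* Hypothetical 128-bit VSR: doubleword 0 aliases the legacy FPR. */
struct vsr128 {
	uint64_t dw[2];
};

union fp_vsr_slot {
	double fpr;		/* VSR doubleword 0 */
	struct vsr128 vsr;	/* full 128-bit register */
};

struct fp_state {
	union fp_vsr_slot fpvsr[NFPR];
	uint64_t fpscr;
};

/* Number of doubles between consecutive FPRs in the slot array. */
size_t fpr_spacing(void)
{
	return sizeof(union fp_vsr_slot) / sizeof(double);
}

/*
 * Gather the strided FPRs plus the FPSCR into the contiguous
 * 33-double layout that the existing user-visible format expects.
 */
void pack_fprs(const struct fp_state *st, double buf[NFPR + 1])
{
	int i;

	assert(st && buf);
	for (i = 0; i < NFPR; i++)
		buf[i] = st->fpvsr[i].fpr;
	memcpy(&buf[NFPR], &st->fpscr, sizeof(double));
}

/* Scatter a contiguous 33-double buffer back into the strided slots. */
void unpack_fprs(struct fp_state *st, const double buf[NFPR + 1])
{
	int i;

	assert(st && buf);
	for (i = 0; i < NFPR; i++)
		st->fpvsr[i].fpr = buf[i];
	memcpy(&st->fpscr, &buf[NFPR], sizeof(double));
}
```

Because each slot is two doubles wide in this sketch, a single memcpy of
the slot array would interleave the upper VSR halves with the FPR values,
which is not what existing users of the FP regset or the signal frame
expect.  Copying through a packed buffer keeps the old format unchanged.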

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/asm-offsets.c |    4 ++
 arch/powerpc/kernel/ptrace.c      |   28 ++++++++++++++++++
 arch/powerpc/kernel/signal_32.c   |   59 +++++++++++++++++++++++++++++---------
 arch/powerpc/kernel/signal_64.c   |   36 +++++++++++++++++++----
 include/asm-powerpc/processor.h   |   31 +++++++++++++++++++
 5 files changed, 139 insertions(+), 19 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
@@ -74,6 +74,10 @@ int main(void)
 	DEFINE(THREAD_VSCR, offsetof(struct thread_struct, vscr));
 	DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	DEFINE(THREAD_VSR0, offsetof(struct thread_struct, fpvsr[0].vsr));
+	DEFINE(THREAD_USED_VSR, offsetof(struct thread_struct, used_vsr));
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_PPC64
 	DEFINE(KSP_VSID, offsetof(struct thread_struct, ksp_vsid));
 #else /* CONFIG_PPC64 */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -215,26 +215,54 @@ static int fpr_get(struct task_struct *t
 		   unsigned int pos, unsigned int count,
 		   void *kbuf, void __user *ubuf)
 {
+#ifdef CONFIG_VSX
+	double buf[33];
+	int i;
+#endif
 	flush_fp_to_thread(target);
 
+#ifdef CONFIG_VSX
+	/* copy to local buffer then write that out */
+	for (i = 0; i < 32 ; i++)
+		buf[i] = target->thread.TS_FPR(i);
+	memcpy(&buf[32], &target->thread.fpscr, sizeof(double));
+	return user_regset_copyout(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+
+#else
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
 		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
 				   &target->thread.TS_FPRSTART, 0, -1);
+#endif
 }
 
 static int fpr_set(struct task_struct *target, const struct user_regset *regset,
 		   unsigned int pos, unsigned int count,
 		   const void *kbuf, const void __user *ubuf)
 {
+#ifdef CONFIG_VSX
+	double buf[33];
+	int i;
+#endif
 	flush_fp_to_thread(target);
 
+#ifdef CONFIG_VSX
+	/* copy in to local buffer then write that to the thread_struct */
+	i = user_regset_copyin(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+	if (i)
+		return i;
+	for (i = 0; i < 32 ; i++)
+		target->thread.TS_FPR(i) = buf[i];
+	memcpy(&target->thread.fpscr, &buf[32], sizeof(double));
+	return 0;
+#else
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
 		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
 				  &target->thread.TS_FPRSTART, 0, -1);
+#endif
 }
 
 
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -337,14 +337,16 @@ static int save_user_regs(struct pt_regs
 		int sigret)
 {
 	unsigned long msr = regs->msr;
+#ifdef CONFIG_VSX
+	double buf[ELF_NFPREG];
+	int i;
+#endif
 
 	/* Make sure floating point registers are stored in regs */
 	flush_fp_to_thread(current);
 
-	/* save general and floating-point registers */
-	if (save_general_regs(regs, frame) ||
-	    __copy_to_user(&frame->mc_fregs, current->thread.TS_FPRSTART,
-		    ELF_NFPREG * sizeof(double)))
+	/* save general registers */
+	if (save_general_regs(regs, frame))
 		return 1;
 
 #ifdef CONFIG_ALTIVEC
@@ -368,7 +370,21 @@ static int save_user_regs(struct pt_regs
 	if (__put_user(current->thread.vrsave, (u32 __user *)&frame->mc_vregs[32]))
 		return 1;
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	/* copy FPRs to a local buffer then write that out to the frame */
+	flush_fp_to_thread(current);
+	for (i = 0; i < 32 ; i++)
+		buf[i] = current->thread.TS_FPR(i);
+	memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+	if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
+		return 1;
 
+#else
+	/* save floating-point registers */
+	if (__copy_to_user(&frame->mc_fregs, current->thread.TS_FPRSTART,
+		    ELF_NFPREG * sizeof(double)))
+		return 1;
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	/* save spe registers */
 	if (current->thread.used_spe) {
@@ -411,6 +427,10 @@ static long restore_user_regs(struct pt_
 	long err;
 	unsigned int save_r2 = 0;
 	unsigned long msr;
+#ifdef CONFIG_VSX
+	double buf[ELF_NFPREG];
+	int i;
+#endif
 
 	/*
 	 * restore general registers but not including MSR or SOFTE. Also
@@ -438,16 +458,11 @@ static long restore_user_regs(struct pt_
 	 */
 	discard_lazy_cpu_state();
 
-	/* force the process to reload the FP registers from
-	   current->thread when it next does FP instructions */
-	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
-	if (__copy_from_user(current->thread.TS_FPRSTART, &sr->mc_fregs,
-			     sizeof(sr->mc_fregs)))
-		return 1;
-
 #ifdef CONFIG_ALTIVEC
-	/* force the process to reload the altivec registers from
-	   current->thread when it next does altivec instructions */
+	/*
+	 * Force the process to reload the altivec registers from
+	 * current->thread when it next does altivec instructions
+	 */
 	regs->msr &= ~MSR_VEC;
 	if (msr & MSR_VEC) {
 		/* restore altivec registers from the stack */
@@ -462,6 +477,24 @@ static long restore_user_regs(struct pt_
 		return 1;
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+	if (__copy_from_user(buf, &sr->mc_fregs,sizeof(sr->mc_fregs)))
+		return 1;
+	for (i = 0; i < 32 ; i++)
+		current->thread.TS_FPR(i) = buf[i];
+	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+
+#else
+	if (__copy_from_user(current->thread.TS_FPRSTART, &sr->mc_fregs,
+			     sizeof(sr->mc_fregs)))
+		return 1;
+#endif /* CONFIG_VSX */
+	/*
+	 * force the process to reload the FP registers from
+	 * current->thread when it next does FP instructions
+	 */
+	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
+
 #ifdef CONFIG_SPE
 	/* force the process to reload the spe registers from
 	   current->thread when it next does spe instructions */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -89,6 +89,10 @@ static long setup_sigcontext(struct sigc
 #endif
 	unsigned long msr = regs->msr;
 	long err = 0;
+#ifdef CONFIG_VSX
+	double buf[FP_REGS_SIZE];
+	int i;
+#endif
 
 	flush_fp_to_thread(current);
 
@@ -112,12 +116,22 @@ static long setup_sigcontext(struct sigc
 #else /* CONFIG_ALTIVEC */
 	err |= __put_user(0, &sc->v_regs);
 #endif /* CONFIG_ALTIVEC */
+	flush_fp_to_thread(current);
+#ifdef CONFIG_VSX
+	/* Copy FP to local buffer then write that out */
+	for (i = 0; i < 32 ; i++)
+		buf[i] = current->thread.TS_FPR(i);
+	memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+	err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+#else /* CONFIG_VSX */
+	/* copy fpr regs and fpscr */
+	err |= __copy_to_user(&sc->fp_regs, &current->thread.TS_FPR(0),
+			      FP_REGS_SIZE);
+#endif /* CONFIG_VSX */
 	err |= __put_user(&sc->gp_regs, &sc->regs);
 	WARN_ON(!FULL_REGS(regs));
 	err |= __copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE);
 	err |= __put_user(msr, &sc->gp_regs[PT_MSR]);
-	err |= __copy_to_user(&sc->fp_regs, &current->thread.TS_FPRSTART,
-			      FP_REGS_SIZE);
 	err |= __put_user(signr, &sc->signal);
 	err |= __put_user(handler, &sc->handler);
 	if (set != NULL)
@@ -136,6 +150,9 @@ static long restore_sigcontext(struct pt
 #ifdef CONFIG_ALTIVEC
 	elf_vrreg_t __user *v_regs;
 #endif
+#ifdef CONFIG_VSX
+	double buf[FP_REGS_SIZE];
+#endif
 	unsigned long err = 0;
 	unsigned long save_r13 = 0;
 	elf_greg_t *gregs = (elf_greg_t *)regs;
@@ -184,9 +201,6 @@ static long restore_sigcontext(struct pt
 	 */
 	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
 
-	err |= __copy_from_user(&current->thread.TS_FPRSTART, &sc->fp_regs,
-				FP_REGS_SIZE);
-
 #ifdef CONFIG_ALTIVEC
 	err |= __get_user(v_regs, &sc->v_regs);
 	if (err)
@@ -205,7 +219,19 @@ static long restore_sigcontext(struct pt
 	else
 		current->thread.vrsave = 0;
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	/* restore floating point */
+	err |= __copy_from_user(buf, &sc->fp_regs, FP_REGS_SIZE);
+	if (err)
+		return err;
+	for (i = 0; i < 32 ; i++)
+		current->thread.TS_FPR(i) = buf[i];
+	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
 
+#else
+	err |= __copy_from_user(&current->thread.TS_FPRSTART, &sc->fp_regs,
+				FP_REGS_SIZE);
+#endif
 	return err;
 }
 
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -78,6 +78,7 @@ extern long kernel_thread(int (*fn)(void
 /* Lazy FPU handling on uni-processor */
 extern struct task_struct *last_task_used_math;
 extern struct task_struct *last_task_used_altivec;
+extern struct task_struct *last_task_used_vsx;
 extern struct task_struct *last_task_used_spe;
 
 #ifdef CONFIG_PPC32
@@ -136,8 +137,13 @@ typedef struct {
 	unsigned long seg;
 } mm_segment_t;
 
+#ifdef CONFIG_VSX
+#define TS_FPR(i) fpvsr[i].fpr.fp
+#define TS_FPRSTART fpvsr
+#else
 #define TS_FPR(i) fpr[i]
 #define TS_FPRSTART fpr
+#endif
 
 struct thread_struct {
 	unsigned long	ksp;		/* Kernel stack pointer */
@@ -155,8 +161,19 @@ struct thread_struct {
 	unsigned long	dbcr0;		/* debug control register values */
 	unsigned long	dbcr1;
 #endif
+#ifdef CONFIG_VSX
+	/* First 32 VSX registers (overlap with fpr[32]) */
+	union {
+		struct {
+			double fp;
+			double vsrlow;
+		} fpr;
+		vector128	vsr;
+	} fpvsr[32];
+#else
 	double		fpr[32];	/* Complete floating point set */
-	struct {			/* fpr ... fpscr must be contiguous */
+#endif
+	struct {
 
 		unsigned int pad;
 		unsigned int val;	/* Floating point status */
@@ -176,6 +193,10 @@ struct thread_struct {
 	unsigned long	vrsave;
 	int		used_vr;	/* set if process has used altivec */
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	/* VSR status */
+	int		used_vsr;	/* set if process has used VSX */
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	unsigned long	evr[32];	/* upper 32-bits of SPE regs */
 	u64		acc;		/* Accumulator */
@@ -200,7 +221,11 @@ struct thread_struct {
 	.fpexc_mode = MSR_FE0 | MSR_FE1, \
 }
 #else
+#ifdef CONFIG_VSX
+#define	FPVSR_INIT_THREAD .fpvsr = {0}
+#else
 #define	FPVSR_INIT_THREAD .fpr = {0}
+#endif
 #define INIT_THREAD  { \
 	.ksp = INIT_SP, \
 	.ksp_limit = INIT_SP_LIMIT, \
@@ -293,5 +318,9 @@ static inline void prefetchw(const void 
 
 #endif /* __KERNEL__ */
 #endif /* __ASSEMBLY__ */
+#ifdef CONFIG_VSX
+#define TS_FPRSPACING 2
+#else
 #define TS_FPRSPACING 1
+#endif
 #endif /* _ASM_POWERPC_PROCESSOR_H */

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 9/9] powerpc: Add CONFIG_VSX config option
  2008-06-20  4:13 ` Michael Neuling
                     ` (6 preceding siblings ...)
  2008-06-20  4:13   ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
@ 2008-06-20  4:13   ` Michael Neuling
  2008-06-20  4:13   ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
                     ` (2 subsequent siblings)
  10 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-20  4:13 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Add CONFIG_VSX config build option.  Must compile with POWER4, FPU and ALTIVEC.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/platforms/Kconfig.cputype |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Index: linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/platforms/Kconfig.cputype
+++ linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
@@ -155,6 +155,22 @@ config ALTIVEC
 
 	  If in doubt, say Y here.
 
+config VSX
+	bool "VSX Support"
+	depends on POWER4 && ALTIVEC && PPC_FPU
+	---help---
+
+	  This option enables kernel support for the Vector Scalar extensions
+	  to the PowerPC processor. The kernel currently supports saving and
+	  restoring VSX registers, and turning on the 'VSX enable' bit so user
+	  processes can execute VSX instructions.
+
+	  This option is only useful if you have a processor that supports
+	  VSX (P7 and above), but does not have any effect on non-VSX
+	  CPUs (it does, however, add code to the kernel).
+
+	  If in doubt, say Y here.
+
 config SPE
 	bool "SPE Support"
 	depends on E200 || E500

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 6/9] powerpc: Add VSX CPU feature
  2008-06-20  4:13 ` Michael Neuling
                     ` (4 preceding siblings ...)
  2008-06-20  4:13   ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
@ 2008-06-20  4:13   ` Michael Neuling
  2008-06-20  4:13   ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
                     ` (4 subsequent siblings)
  10 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-20  4:13 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Add a VSX CPU feature.  Also add code to detect if VSX is available
from the device tree.
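
For readers following along, here is a minimal user-space sketch of how
the new table entry is meant to behave.  The VSX constants are the ones
from this patch; the AltiVec constants, the struct and function names,
and the "property value >= min_value" matching rule are assumptions
made for illustration only, not a copy of prom.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* VSX values come from the patch; the AltiVec values are placeholders
 * chosen for this sketch. */
#define SKETCH_CPU_FTR_ALTIVEC		0x0000000000000008ULL
#define CPU_FTR_VSX			0x0010000000000000ULL
#define SKETCH_FEATURE_HAS_ALTIVEC	0x10000000u
#define PPC_FEATURE_HAS_VSX		0x00000080u

struct feature_property {
	const char *name;	/* device tree property name */
	uint32_t min_value;	/* smallest value that enables the feature */
	uint64_t cpu_feature;	/* kernel-internal feature bit */
	uint32_t cpu_user_ftr;	/* feature bit exported to userspace */
};

static const struct feature_property feature_properties[] = {
	{"ibm,vmx", 1, SKETCH_CPU_FTR_ALTIVEC, SKETCH_FEATURE_HAS_ALTIVEC},
	/* ibm,vmx == 2 additionally enables VSX */
	{"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
};

struct cpu_bits {
	uint64_t cpu_features;
	uint32_t cpu_user_features;
};

/* Collect the feature bits implied by one property (assumed rule:
 * an entry matches when the property value is >= its min_value). */
static struct cpu_bits apply_property(const char *name, uint32_t value)
{
	struct cpu_bits bits = {0, 0};
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(feature_properties) /
			sizeof(feature_properties[0]); i++) {
		const struct feature_property *fp = &feature_properties[i];

		if (strcmp(fp->name, name) == 0 && value >= fp->min_value) {
			bits.cpu_features |= fp->cpu_feature;
			bits.cpu_user_features |= fp->cpu_user_ftr;
		}
	}
	return bits;
}
```

Under those assumptions, apply_property("ibm,vmx", 1) yields only the
AltiVec bits, while apply_property("ibm,vmx", 2) yields both the
AltiVec and the VSX bits.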

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>

---

 arch/powerpc/kernel/prom.c     |    4 ++++
 include/asm-powerpc/cputable.h |   15 ++++++++++++++-
 2 files changed, 18 insertions(+), 1 deletion(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/prom.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
@@ -609,6 +609,10 @@ static struct feature_property {
 	{"altivec", 0, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
 	{"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	/* Yes, this _really_ is ibm,vmx == 2 to enable VSX */
+	{"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_PPC64
 	{"ibm,dfp", 1, 0, PPC_FEATURE_HAS_DFP},
 	{"ibm,purr", 1, CPU_FTR_PURR, 0},
Index: linux-2.6-ozlabs/include/asm-powerpc/cputable.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/cputable.h
+++ linux-2.6-ozlabs/include/asm-powerpc/cputable.h
@@ -27,6 +27,7 @@
 #define PPC_FEATURE_HAS_DFP		0x00000400
 #define PPC_FEATURE_POWER6_EXT		0x00000200
 #define PPC_FEATURE_ARCH_2_06		0x00000100
+#define PPC_FEATURE_HAS_VSX		0x00000080
 
 #define PPC_FEATURE_TRUE_LE		0x00000002
 #define PPC_FEATURE_PPC_LE		0x00000001
@@ -181,6 +182,7 @@ extern void do_feature_fixups(unsigned l
 #define CPU_FTR_DSCR			LONG_ASM_CONST(0x0002000000000000)
 #define CPU_FTR_1T_SEGMENT		LONG_ASM_CONST(0x0004000000000000)
 #define CPU_FTR_NO_SLBIE_B		LONG_ASM_CONST(0x0008000000000000)
+#define CPU_FTR_VSX			LONG_ASM_CONST(0x0010000000000000)
 
 #ifndef __ASSEMBLY__
 
@@ -199,6 +201,17 @@ extern void do_feature_fixups(unsigned l
 #define PPC_FEATURE_HAS_ALTIVEC_COMP    0
 #endif
 
+/* We only set the VSX features if the kernel was compiled with VSX
+ * support
+ */
+#ifdef CONFIG_VSX
+#define CPU_FTR_VSX_COMP	CPU_FTR_VSX
+#define PPC_FEATURE_HAS_VSX_COMP PPC_FEATURE_HAS_VSX
+#else
+#define CPU_FTR_VSX_COMP	0
+#define PPC_FEATURE_HAS_VSX_COMP    0
+#endif
+
 /* We only set the spe features if the kernel was compiled with spe
  * support
  */
@@ -399,7 +412,7 @@ extern void do_feature_fixups(unsigned l
 	    (CPU_FTRS_POWER3 | CPU_FTRS_RS64 | CPU_FTRS_POWER4 |	\
 	    CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | CPU_FTRS_POWER6 |	\
 	    CPU_FTRS_POWER7 | CPU_FTRS_CELL | CPU_FTRS_PA6T |		\
-	    CPU_FTR_1T_SEGMENT)
+	    CPU_FTR_1T_SEGMENT | CPU_FTR_VSX)
 #else
 enum {
 	CPU_FTRS_POSSIBLE =

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 7/9] powerpc: Add VSX assembler code macros
  2008-06-20  4:13 ` Michael Neuling
                     ` (7 preceding siblings ...)
  2008-06-20  4:13   ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
@ 2008-06-20  4:13   ` Michael Neuling
  2008-06-20  6:37   ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Kumar Gala
  2008-06-23  5:31   ` Michael Neuling
  10 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-20  4:13 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

This adds the macros for the VSX load/store instructions, as most
binutils are not going to support them for a while.

Also add VSX register save/restore macros and vsr[0-63] register definitions.
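
As a worked illustration of the encoding, the following user-space C
sketch mirrors the VSX_XX1(), STXVD2X() and LXVD2X() macros from this
patch.  The function names are made up for the example; only the bit
layout and the two opcode constants are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mirrors VSX_XX1(): the low five bits of the 6-bit VSR number go in
 * the usual register field, and the sixth bit lands in the least
 * significant bit of the instruction word.
 */
static uint32_t vsx_xx1(unsigned int xs, unsigned int ra, unsigned int rb)
{
	assert(xs < 64 && ra < 32 && rb < 32);
	return ((xs & 0x1f) << 21) | (ra << 16) | (rb << 11) | (xs >> 5);
}

/* Instruction word for "stxvd2x xs,ra,rb" as built by STXVD2X() */
static uint32_t stxvd2x(unsigned int xs, unsigned int ra, unsigned int rb)
{
	return 0x7c000798u | vsx_xx1(xs, ra, rb);
}

/* Instruction word for "lxvd2x xs,ra,rb" as built by LXVD2X() */
static uint32_t lxvd2x(unsigned int xs, unsigned int ra, unsigned int rb)
{
	return 0x7c000698u | vsx_xx1(xs, ra, rb);
}
```

For example, stxvd2x(33, 4, 5) evaluates to 0x7c242f99: VSR 33 shows up
as register field 1 plus the low bit set, which is why SAVE_VSRU() can
simply pass n+32 to reach the upper half of the VSR file.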

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 include/asm-powerpc/ppc_asm.h |  127 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 127 insertions(+)

Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -74,6 +74,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);				
 				REST_10GPRS(22, base)
 #endif
 
+/*
+ * Define what the VSX XX1 form instructions will look like, then add
+ * the 128 bit load store instructions based on that.
+ */
+#define VSX_XX1(xs, ra, rb)	(((xs) & 0x1f) << 21 | ((ra) << 16) |  \
+				 ((rb) << 11) | (((xs) >> 5)))
+
+#define STXVD2X(xs, ra, rb)	.long (0x7c000798 | VSX_XX1((xs), (ra), (rb)))
+#define LXVD2X(xs, ra, rb)	.long (0x7c000698 | VSX_XX1((xs), (ra), (rb)))
 
 #define SAVE_2GPRS(n, base)	SAVE_GPR(n, base); SAVE_GPR(n+1, base)
 #define SAVE_4GPRS(n, base)	SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
@@ -110,6 +119,57 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);				
 #define REST_16VRS(n,b,base)	REST_8VRS(n,b,base); REST_8VRS(n+8,b,base)
 #define REST_32VRS(n,b,base)	REST_16VRS(n,b,base); REST_16VRS(n+16,b,base)
 
+/* Save the lower 32 VSRs in the thread VSR region */
+#define SAVE_VSR(n,b,base)	li b,THREAD_VSR0+(16*(n));  STXVD2X(n,b,base)
+#define SAVE_2VSRS(n,b,base)	SAVE_VSR(n,b,base); SAVE_VSR(n+1,b,base)
+#define SAVE_4VSRS(n,b,base)	SAVE_2VSRS(n,b,base); SAVE_2VSRS(n+2,b,base)
+#define SAVE_8VSRS(n,b,base)	SAVE_4VSRS(n,b,base); SAVE_4VSRS(n+4,b,base)
+#define SAVE_16VSRS(n,b,base)	SAVE_8VSRS(n,b,base); SAVE_8VSRS(n+8,b,base)
+#define SAVE_32VSRS(n,b,base)	SAVE_16VSRS(n,b,base); SAVE_16VSRS(n+16,b,base)
+#define REST_VSR(n,b,base)	li b,THREAD_VSR0+(16*(n)); LXVD2X(n,b,base)
+#define REST_2VSRS(n,b,base)	REST_VSR(n,b,base); REST_VSR(n+1,b,base)
+#define REST_4VSRS(n,b,base)	REST_2VSRS(n,b,base); REST_2VSRS(n+2,b,base)
+#define REST_8VSRS(n,b,base)	REST_4VSRS(n,b,base); REST_4VSRS(n+4,b,base)
+#define REST_16VSRS(n,b,base)	REST_8VSRS(n,b,base); REST_8VSRS(n+8,b,base)
+#define REST_32VSRS(n,b,base)	REST_16VSRS(n,b,base); REST_16VSRS(n+16,b,base)
+/* Save the upper 32 VSRs (32-63) in the thread VR region (0-31) */
+#define SAVE_VSRU(n,b,base)	li b,THREAD_VR0+(16*(n));  STXVD2X(n+32,b,base)
+#define SAVE_2VSRSU(n,b,base)	SAVE_VSRU(n,b,base); SAVE_VSRU(n+1,b,base)
+#define SAVE_4VSRSU(n,b,base)	SAVE_2VSRSU(n,b,base); SAVE_2VSRSU(n+2,b,base)
+#define SAVE_8VSRSU(n,b,base)	SAVE_4VSRSU(n,b,base); SAVE_4VSRSU(n+4,b,base)
+#define SAVE_16VSRSU(n,b,base)	SAVE_8VSRSU(n,b,base); SAVE_8VSRSU(n+8,b,base)
+#define SAVE_32VSRSU(n,b,base)	SAVE_16VSRSU(n,b,base); SAVE_16VSRSU(n+16,b,base)
+#define REST_VSRU(n,b,base)	li b,THREAD_VR0+(16*(n)); LXVD2X(n+32,b,base)
+#define REST_2VSRSU(n,b,base)	REST_VSRU(n,b,base); REST_VSRU(n+1,b,base)
+#define REST_4VSRSU(n,b,base)	REST_2VSRSU(n,b,base); REST_2VSRSU(n+2,b,base)
+#define REST_8VSRSU(n,b,base)	REST_4VSRSU(n,b,base); REST_4VSRSU(n+4,b,base)
+#define REST_16VSRSU(n,b,base)	REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
+#define REST_32VSRSU(n,b,base)	REST_16VSRSU(n,b,base); REST_16VSRSU(n+16,b,base)
+
+#ifdef CONFIG_VSX
+#define REST_32FPVSRS(n,c,base)						\
+BEGIN_FTR_SECTION							\
+	b	2f;							\
+END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
+	REST_32FPRS(n,base);						\
+	b	3f;							\
+2:	REST_32VSRS(n,c,base);						\
+3:
+
+#define SAVE_32FPVSRS(n,c,base)						\
+BEGIN_FTR_SECTION							\
+	b	2f;							\
+END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
+	SAVE_32FPRS(n,base);						\
+	b	3f;							\
+2:	SAVE_32VSRS(n,c,base);						\
+3:
+
+#else
+#define REST_32FPVSRS(n,b,base)	REST_32FPRS(n, base)
+#define SAVE_32FPVSRS(n,b,base)	SAVE_32FPRS(n, base)
+#endif
+
 #define SAVE_EVR(n,s,base)	evmergehi s,s,n; stw s,THREAD_EVR0+4*(n)(base)
 #define SAVE_2EVRS(n,s,base)	SAVE_EVR(n,s,base); SAVE_EVR(n+1,s,base)
 #define SAVE_4EVRS(n,s,base)	SAVE_2EVRS(n,s,base); SAVE_2EVRS(n+2,s,base)
@@ -534,6 +594,73 @@ END_FTR_SECTION_IFCLR(CPU_FTR_601)
 #define	vr30	30
 #define	vr31	31
 
+/* VSX Registers (VSRs) */
+
+#define	vsr0	0
+#define	vsr1	1
+#define	vsr2	2
+#define	vsr3	3
+#define	vsr4	4
+#define	vsr5	5
+#define	vsr6	6
+#define	vsr7	7
+#define	vsr8	8
+#define	vsr9	9
+#define	vsr10	10
+#define	vsr11	11
+#define	vsr12	12
+#define	vsr13	13
+#define	vsr14	14
+#define	vsr15	15
+#define	vsr16	16
+#define	vsr17	17
+#define	vsr18	18
+#define	vsr19	19
+#define	vsr20	20
+#define	vsr21	21
+#define	vsr22	22
+#define	vsr23	23
+#define	vsr24	24
+#define	vsr25	25
+#define	vsr26	26
+#define	vsr27	27
+#define	vsr28	28
+#define	vsr29	29
+#define	vsr30	30
+#define	vsr31	31
+#define	vsr32	32
+#define	vsr33	33
+#define	vsr34	34
+#define	vsr35	35
+#define	vsr36	36
+#define	vsr37	37
+#define	vsr38	38
+#define	vsr39	39
+#define	vsr40	40
+#define	vsr41	41
+#define	vsr42	42
+#define	vsr43	43
+#define	vsr44	44
+#define	vsr45	45
+#define	vsr46	46
+#define	vsr47	47
+#define	vsr48	48
+#define	vsr49	49
+#define	vsr50	50
+#define	vsr51	51
+#define	vsr52	52
+#define	vsr53	53
+#define	vsr54	54
+#define	vsr55	55
+#define	vsr56	56
+#define	vsr57	57
+#define	vsr58	58
+#define	vsr59	59
+#define	vsr60	60
+#define	vsr61	61
+#define	vsr62	62
+#define	vsr63	63
+
 /* SPE Registers (EVPRs) */
 
 #define	evr0	0

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support
  2008-06-20  4:13 ` Michael Neuling
                     ` (2 preceding siblings ...)
  2008-06-20  4:13   ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
@ 2008-06-20  4:13   ` Michael Neuling
  2008-06-20  4:13   ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
                     ` (6 subsequent siblings)
  10 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-20  4:13 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

This patch extends the floating point save and restore code to use the
VSX load/stores when VSX is available.  This will make FP context
save/restore marginally slower on FP only code, when VSX is available,
as it has to load/store 128 bits rather than just 64 bits.

Mixing FP, VMX and VSX code will get consistent architected state.

The signals interface is extended to enable access to VSR 0-31
doubleword 1 after discussions with tool chain maintainers.  Backward
compatibility is maintained.  

The ptrace interface is also extended to allow access to VSR 0-31 full
registers.
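
To make the split between the two interfaces concrete, here is a small
user-space sketch of the fpvsr layout introduced earlier in the series.
The type and function names ending in _sketch are hypothetical, and the
16-byte vector type stands in for the kernel's vector128; the point is
only that each FPR and its "doubleword 1" companion share one 128-bit
slot, so the signal code has to gather them with a stride.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel's vector128 (assumed to be 16 bytes) */
typedef struct { unsigned char b[16]; } vector128_sketch;

/* Same shape as the fpvsr[] element in the patched thread_struct */
union fpvsr_sketch {
	struct {
		double fp;	/* the classic FPR, doubleword 0 */
		double vsrlow;	/* doubleword 1 of the same VSR */
	} fpr;
	vector128_sketch vsr;	/* the full 128-bit register */
};

struct thread_sketch {
	union fpvsr_sketch fpvsr[32];
};

/* What the existing FP area of a signal frame carries: FPR 0-31 */
static void gather_fprs_sketch(const struct thread_sketch *t, double out[32])
{
	int i;

	assert(t != NULL && out != NULL);
	for (i = 0; i < 32; i++)
		out[i] = t->fpvsr[i].fpr.fp;
}

/* What the new VSX area of a signal frame carries: doubleword 1 only */
static void gather_vsrlow_sketch(const struct thread_sketch *t, double out[32])
{
	int i;

	assert(t != NULL && out != NULL);
	for (i = 0; i < 32; i++)
		out[i] = t->fpvsr[i].fpr.vsrlow;
}
```

Because the FP half stays where old binaries expect it and the second
half is appended separately, a signal handler that knows nothing about
VSX still sees the same FP layout, whereas ptrace can hand out each
16-byte fpvsr[i].vsr slot whole.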

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/entry_64.S   |    5 +
 arch/powerpc/kernel/fpu.S        |   16 ++++-
 arch/powerpc/kernel/head_64.S    |   65 +++++++++++++++++++++++
 arch/powerpc/kernel/misc_64.S    |   33 +++++++++++
 arch/powerpc/kernel/ppc32.h      |    1 
 arch/powerpc/kernel/ppc_ksyms.c  |    3 +
 arch/powerpc/kernel/process.c    |  109 ++++++++++++++++++++++++++++++++++++++-
 arch/powerpc/kernel/ptrace.c     |   70 +++++++++++++++++++++++++
 arch/powerpc/kernel/signal_32.c  |   33 +++++++++++
 arch/powerpc/kernel/signal_64.c  |   31 ++++++++++-
 arch/powerpc/kernel/traps.c      |   29 ++++++++++
 include/asm-powerpc/elf.h        |    6 +-
 include/asm-powerpc/ptrace.h     |   12 ++++
 include/asm-powerpc/reg.h        |    2 
 include/asm-powerpc/sigcontext.h |   37 ++++++++++++-
 include/asm-powerpc/system.h     |    9 +++
 include/linux/elf.h              |    1 
 17 files changed, 454 insertions(+), 8 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/entry_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
@@ -353,6 +353,11 @@ _GLOBAL(_switch)
 	mflr	r20		/* Return to switch caller */
 	mfmsr	r22
 	li	r0, MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r0,r0,MSR_VSX@h	/* Disable VSX */
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_ALTIVEC
 BEGIN_FTR_SECTION
 	oris	r0,r0,MSR_VEC@h	/* Disable altivec */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -34,6 +34,11 @@
 _GLOBAL(load_up_fpu)
 	mfmsr	r5
 	ori	r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
 	SYNC
 	MTMSRD(r5)			/* enable use of fpu now */
 	isync
@@ -50,7 +55,7 @@ _GLOBAL(load_up_fpu)
 	beq	1f
 	toreal(r4)
 	addi	r4,r4,THREAD		/* want last_task_used_math->thread */
-	SAVE_32FPRS(0, r4)
+	SAVE_32FPVSRS(0, r5, r4)
 	mffs	fr0
 	stfd	fr0,THREAD_FPSCR(r4)
 	PPC_LL	r5,PT_REGS(r4)
@@ -77,7 +82,7 @@ _GLOBAL(load_up_fpu)
 #endif
 	lfd	fr0,THREAD_FPSCR(r5)
 	MTFSF_L(fr0)
-	REST_32FPRS(0, r5)
+	REST_32FPVSRS(0, r4, r5)
 #ifndef CONFIG_SMP
 	subi	r4,r5,THREAD
 	fromreal(r4)
@@ -96,6 +101,11 @@ _GLOBAL(load_up_fpu)
 _GLOBAL(giveup_fpu)
 	mfmsr	r5
 	ori	r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
 	SYNC_601
 	ISYNC_601
 	MTMSRD(r5)			/* enable use of fpu now */
@@ -106,7 +116,7 @@ _GLOBAL(giveup_fpu)
 	addi	r3,r3,THREAD	        /* want THREAD of task */
 	PPC_LL	r5,PT_REGS(r3)
 	PPC_LCMPI	0,r5,0
-	SAVE_32FPRS(0, r3)
+	SAVE_32FPVSRS(0, r4 ,r3)
 	mffs	fr0
 	stfd	fr0,THREAD_FPSCR(r3)
 	beq	1f
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -278,6 +278,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	. = 0xf20
 	b	altivec_unavailable_pSeries
 
+	. = 0xf40
+	b	vsx_unavailable_pSeries
+
 #ifdef CONFIG_CBE_RAS
 	HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
 #endif /* CONFIG_CBE_RAS */
@@ -297,6 +300,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	/* moved from 0xf00 */
 	STD_EXCEPTION_PSERIES(., performance_monitor)
 	STD_EXCEPTION_PSERIES(., altivec_unavailable)
+	STD_EXCEPTION_PSERIES(., vsx_unavailable)
 
 /*
  * An interrupt came in while soft-disabled; clear EE in SRR1,
@@ -834,6 +838,67 @@ _STATIC(load_up_altivec)
 	blr
 #endif /* CONFIG_ALTIVEC */
 
+	.align	7
+	.globl vsx_unavailable_common
+vsx_unavailable_common:
+	EXCEPTION_PROLOG_COMMON(0xf40, PACA_EXGEN)
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	bne	.load_up_vsx
+1:
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
+	bl	.save_nvgprs
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	ENABLE_INTS
+	bl	.vsx_unavailable_exception
+	b	.ret_from_except
+
+#ifdef CONFIG_VSX
+/*
+ * load_up_vsx(unused, unused, tsk)
+ * Disable VSX for the task which had it previously,
+ * and save its vector registers in its thread_struct.
+ * Reuse the fp and vsx saves, but first check to see if they have
+ * been saved already.
+ * On entry: r13 == 'current' && last_task_used_vsx != 'current'
+ */
+_STATIC(load_up_vsx)
+/* Load FP and VMX registers if they haven't been done yet */
+	andi.	r5,r12,MSR_FP
+	beql+	load_up_fpu		/* skip if already loaded */
+	andis.	r5,r12,MSR_VEC@h
+	beql+	load_up_altivec		/* skip if already loaded */
+
+#ifndef CONFIG_SMP
+	ld	r3,last_task_used_vsx@got(r2)
+	ld	r4,0(r3)
+	cmpdi	0,r4,0
+	beq	1f
+	/* Disable VSX for last_task_used_vsx */
+	addi	r4,r4,THREAD
+	ld	r5,PT_REGS(r4)
+	ld	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+	lis	r6,MSR_VSX@h
+	andc	r6,r4,r6
+	std	r6,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#endif /* CONFIG_SMP */
+	ld	r4,PACACURRENT(r13)
+	addi	r4,r4,THREAD		/* Get THREAD */
+	li	r6,1
+	stw	r6,THREAD_USED_VSR(r4) /* ... also set thread used vsr */
+	/* enable use of VSX after return */
+	oris	r12,r12,MSR_VSX@h
+	std	r12,_MSR(r1)
+#ifndef CONFIG_SMP
+	/* Update last_task_used_vsx to 'current' */
+	ld	r4,PACACURRENT(r13)
+	std	r4,0(r3)
+#endif /* CONFIG_SMP */
+	b	fast_exception_return
+#endif /* CONFIG_VSX */
+
 /*
  * Hash table stuff
  */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/misc_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
@@ -506,6 +506,39 @@ _GLOBAL(giveup_altivec)
 
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+/*
+ * giveup_vsx(tsk)
+ * Disable VSX for the task given as the argument,
+ * and save the vector registers in its thread_struct.
+ * Enables the VSX for use in the kernel on return.
+ */
+_GLOBAL(giveup_vsx)
+	mfmsr	r5
+	oris	r5,r5,MSR_VSX@h
+	mtmsrd	r5			/* enable use of VSX now */
+	isync
+
+	cmpdi	0,r3,0
+	beqlr-				/* if no previous owner, done */
+	addi	r3,r3,THREAD		/* want THREAD of task */
+	ld	r5,PT_REGS(r3)
+	cmpdi	0,r5,0
+	beq	1f
+	ld	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+	lis	r3,MSR_VSX@h
+	andc	r4,r4,r3		/* disable VSX for previous task */
+	std	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#ifndef CONFIG_SMP
+	li	r5,0
+	ld	r4,last_task_used_vsx@got(r2)
+	std	r5,0(r4)
+#endif /* CONFIG_SMP */
+	blr
+
+#endif /* CONFIG_VSX */
+
 /* kexec_wait(phys_cpu)
  *
  * wait for the flag to change, indicating this kernel is going away but
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc32.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
@@ -120,6 +120,7 @@ struct mcontext32 {
 	elf_fpregset_t		mc_fregs;
 	unsigned int		mc_pad[2];
 	elf_vrregset_t32	mc_vregs __attribute__((__aligned__(16)));
+	elf_vsrreghalf_t32      mc_vsregs __attribute__((__aligned__(16)));
 };
 
 struct ucontext32 { 
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc_ksyms.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
@@ -102,6 +102,9 @@ EXPORT_SYMBOL(giveup_fpu);
 #ifdef CONFIG_ALTIVEC
 EXPORT_SYMBOL(giveup_altivec);
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+EXPORT_SYMBOL(giveup_vsx);
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 EXPORT_SYMBOL(giveup_spe);
 #endif /* CONFIG_SPE */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -53,6 +53,7 @@ extern unsigned long _get_SP(void);
 #ifndef CONFIG_SMP
 struct task_struct *last_task_used_math = NULL;
 struct task_struct *last_task_used_altivec = NULL;
+struct task_struct *last_task_used_vsx = NULL;
 struct task_struct *last_task_used_spe = NULL;
 #endif
 
@@ -106,11 +107,23 @@ EXPORT_SYMBOL(enable_kernel_fp);
 
 int dump_task_fpu(struct task_struct *tsk, elf_fpregset_t *fpregs)
 {
+#ifdef CONFIG_VSX
+	int i;
+	elf_fpreg_t *reg;
+#endif
+
 	if (!tsk->thread.regs)
 		return 0;
 	flush_fp_to_thread(current);
 
+#ifdef CONFIG_VSX
+	reg = (elf_fpreg_t *)fpregs;
+	for (i = 0; i < ELF_NFPREG - 1; i++, reg++)
+		*reg = tsk->thread.TS_FPR(i);
+	memcpy(reg, &tsk->thread.fpscr, sizeof(elf_fpreg_t));
+#else
 	memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
+#endif
 
 	return 1;
 }
@@ -149,7 +162,7 @@ void flush_altivec_to_thread(struct task
 	}
 }
 
-int dump_task_altivec(struct task_struct *tsk, elf_vrregset_t *vrregs)
+int dump_task_altivec(struct task_struct *tsk, elf_vrreg_t *vrregs)
 {
 	/* ELF_NVRREG includes the VSCR and VRSAVE which we need to save
 	 * separately, see below */
@@ -179,6 +192,79 @@ int dump_task_altivec(struct task_struct
 }
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+#if 0
+/* not currently used, but some crazy RAID module might want to later */
+void enable_kernel_vsx(void)
+{
+	WARN_ON(preemptible());
+
+#ifdef CONFIG_SMP
+	if (current->thread.regs && (current->thread.regs->msr & MSR_VSX))
+		giveup_vsx(current);
+	else
+		giveup_vsx(NULL);	/* just enable vsx for kernel - force */
+#else
+	giveup_vsx(last_task_used_vsx);
+#endif /* CONFIG_SMP */
+}
+EXPORT_SYMBOL(enable_kernel_vsx);
+#endif
+
+void flush_vsx_to_thread(struct task_struct *tsk)
+{
+	if (tsk->thread.regs) {
+		preempt_disable();
+		if (tsk->thread.regs->msr & MSR_VSX) {
+#ifdef CONFIG_SMP
+			BUG_ON(tsk != current);
+#endif
+			giveup_vsx(tsk);
+		}
+		preempt_enable();
+	}
+}
+
+/*
+ * This dumps the full 128 bits of the first 32 VSX registers.  This
+ * needs to be called with dump_task_fpu and dump_task_altivec to get
+ * all the VSX state.
+ */
+int dump_task_vsx(struct task_struct *tsk, elf_vrreg_t *vrregs)
+{
+	/* Grab only the first half */
+	const int nregs = 32;
+	elf_vrreg_t *reg;
+
+	if (tsk == current)
+		flush_vsx_to_thread(tsk);
+
+	reg = (elf_vrreg_t *)vrregs;
+
+	/* copy the first 32 vsr registers */
+	memcpy(reg, &tsk->thread.vr[0], nregs * sizeof(*reg));
+
+	return 1;
+}
+#endif /* CONFIG_VSX */
+
+int dump_task_vector(struct task_struct *tsk, elf_vrregset_t *vrregs)
+{
+	int rc = 0;
+	elf_vrreg_t *regs = (elf_vrreg_t *)vrregs;
+#ifdef CONFIG_ALTIVEC
+	rc = dump_task_altivec(tsk, regs);
+	if (rc)
+		return rc;
+	regs += ELF_NVRREG;
+#endif
+
+#ifdef CONFIG_VSX
+	rc = dump_task_vsx(tsk, regs);
+#endif
+	return rc;
+}
+
 #ifdef CONFIG_SPE
 
 void enable_kernel_spe(void)
@@ -233,6 +319,10 @@ void discard_lazy_cpu_state(void)
 	if (last_task_used_altivec == current)
 		last_task_used_altivec = NULL;
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	if (last_task_used_vsx == current)
+		last_task_used_vsx = NULL;
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	if (last_task_used_spe == current)
 		last_task_used_spe = NULL;
@@ -297,6 +387,10 @@ struct task_struct *__switch_to(struct t
 	if (prev->thread.regs && (prev->thread.regs->msr & MSR_VEC))
 		giveup_altivec(prev);
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	if (prev->thread.regs && (prev->thread.regs->msr & MSR_VSX))
+		giveup_vsx(prev);
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	/*
 	 * If the previous thread used spe in the last quantum
@@ -317,6 +411,10 @@ struct task_struct *__switch_to(struct t
 	if (new->thread.regs && last_task_used_altivec == new)
 		new->thread.regs->msr |= MSR_VEC;
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	if (new->thread.regs && last_task_used_vsx == new)
+		new->thread.regs->msr |= MSR_VSX;
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	/* Avoid the trap.  On smp this this never happens since
 	 * we don't set last_task_used_spe
@@ -417,6 +515,8 @@ static struct regbit {
 	{MSR_EE,	"EE"},
 	{MSR_PR,	"PR"},
 	{MSR_FP,	"FP"},
+	{MSR_VEC,	"VEC"},
+	{MSR_VSX,	"VSX"},
 	{MSR_ME,	"ME"},
 	{MSR_IR,	"IR"},
 	{MSR_DR,	"DR"},
@@ -534,6 +634,7 @@ void prepare_to_copy(struct task_struct 
 {
 	flush_fp_to_thread(current);
 	flush_altivec_to_thread(current);
+	flush_vsx_to_thread(current);
 	flush_spe_to_thread(current);
 }
 
@@ -689,8 +790,14 @@ void start_thread(struct pt_regs *regs, 
 #endif
 
 	discard_lazy_cpu_state();
+#ifdef CONFIG_VSX
+	memset(current->thread.fpvsr, 0,
+	       sizeof(current->thread.fpvsr));
+	current->thread.used_vsr = 0;
+#else
 	memset(current->thread.TS_FPRSTART, 0,
 	       sizeof(current->thread.TS_FPRSTART));
+#endif /* CONFIG_VSX */
 	current->thread.fpscr.val = 0;
 #ifdef CONFIG_ALTIVEC
 	memset(current->thread.vr, 0, sizeof(current->thread.vr));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -351,6 +351,51 @@ static int vr_set(struct task_struct *ta
 }
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+/*
+ * Currently, to set and get all the VSX state, you need to call
+ * the FP and VMX calls as well.  This only gets/sets the lower 32
+ * 128-bit VSX registers.
+ */
+
+static int vsr_active(struct task_struct *target,
+		      const struct user_regset *regset)
+{
+	flush_vsx_to_thread(target);
+	return target->thread.used_vsr ? regset->n : 0;
+}
+
+static int vsr_get(struct task_struct *target, const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   void *kbuf, void __user *ubuf)
+{
+	int ret;
+
+	flush_vsx_to_thread(target);
+
+	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+				  &target->thread.fpvsr[0].vsr, 0,
+				  32 * sizeof(vector128));
+
+	return ret;
+}
+
+static int vsr_set(struct task_struct *target, const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   const void *kbuf, const void __user *ubuf)
+{
+	int ret;
+
+	flush_vsx_to_thread(target);
+
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+				 &target->thread.fpvsr[0].vsr, 0,
+				 32 * sizeof(vector128));
+
+	return ret;
+}
+#endif /* CONFIG_VSX */
+
 #ifdef CONFIG_SPE
 
 /*
@@ -427,6 +472,9 @@ enum powerpc_regset {
 #ifdef CONFIG_ALTIVEC
 	REGSET_VMX,
 #endif
+#ifdef CONFIG_VSX
+	REGSET_VSX,
+#endif
 #ifdef CONFIG_SPE
 	REGSET_SPE,
 #endif
@@ -450,6 +498,13 @@ static const struct user_regset native_r
 		.active = vr_active, .get = vr_get, .set = vr_set
 	},
 #endif
+#ifdef CONFIG_VSX
+	[REGSET_VSX] = {
+		.core_note_type = NT_PPC_VSX, .n = 34,
+		.size = sizeof(vector128), .align = sizeof(vector128),
+		.active = vsr_active, .get = vsr_get, .set = vsr_set
+	},
+#endif
 #ifdef CONFIG_SPE
 	[REGSET_SPE] = {
 		.n = 35,
@@ -850,6 +905,21 @@ long arch_ptrace(struct task_struct *chi
 						 sizeof(u32)),
 					     (const void __user *) data);
 #endif
+#ifdef CONFIG_VSX
+	case PTRACE_GETVSRREGS:
+		return copy_regset_to_user(child, &user_ppc_native_view,
+					   REGSET_VSX,
+					   0, (32 * sizeof(vector128) +
+					       sizeof(u32)),
+					   (void __user *) data);
+
+	case PTRACE_SETVSRREGS:
+		return copy_regset_from_user(child, &user_ppc_native_view,
+					     REGSET_VSX,
+					     0, (32 * sizeof(vector128) +
+						 sizeof(u32)),
+					     (const void __user *) data);
+#endif
 #ifdef CONFIG_SPE
 	case PTRACE_GETEVRREGS:
 		/* Get the child spe register state. */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -379,6 +379,21 @@ static int save_user_regs(struct pt_regs
 	if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
 		return 1;
 
+	/*
+	 * Copy VSR 0-31 low doubleword from thread_struct to local
+	 * buffer, then write that to userspace.  Also set MSR_VSX in
+	 * the saved MSR value to indicate that frame->mc_vsregs
+	 * contains valid data
+	 */
+	if (current->thread.used_vsr) {
+		flush_vsx_to_thread(current);
+		for (i = 0; i < 32 ; i++)
+			buf[i] = current->thread.fpvsr[i].fpr.vsrlow;
+		if (__copy_to_user(&frame->mc_vsregs, buf,
+				   ELF_NVSRHALFREG  * sizeof(double)))
+			return 1;
+		msr |= MSR_VSX;
+	}
 #else
 	/* save floating-point registers */
 	if (__copy_to_user(&frame->mc_fregs, current->thread.TS_FPRSTART,
@@ -484,6 +499,24 @@ static long restore_user_regs(struct pt_
 		current->thread.TS_FPR(i) = buf[i];
 	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
 
+	/*
+	 * Force the process to reload the VSX registers from
+	 * current->thread when it next executes a VSX instruction.
+	 */
+	regs->msr &= ~MSR_VSX;
+	if (msr & MSR_VSX) {
+		/*
+		 * Restore VSX registers from the stack to a local
+		 * buffer, then write this out to the thread_struct
+		 */
+		if (__copy_from_user(buf, &sr->mc_vsregs,
+				     sizeof(sr->mc_vsregs)))
+			return 1;
+		for (i = 0; i < 32 ; i++)
+			current->thread.fpvsr[i].fpr.vsrlow = buf[i];
+	} else if (current->thread.used_vsr)
+		for (i = 0; i < 32 ; i++)
+			current->thread.fpvsr[i].fpr.vsrlow = 0;
 #else
 	if (__copy_from_user(current->thread.TS_FPRSTART, &sr->mc_fregs,
 			     sizeof(sr->mc_fregs)))
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -123,6 +123,22 @@ static long setup_sigcontext(struct sigc
 		buf[i] = current->thread.TS_FPR(i);
 	memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
 	err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+	/*
+	 * Copy VSX low doubleword to local buffer for formatting,
+	 * then out to userspace.  Update v_regs to point after the
+	 * VMX data.
+	 */
+	if (current->thread.used_vsr) {
+		flush_vsx_to_thread(current);
+		v_regs += ELF_NVRREG;
+		for (i = 0; i < 32 ; i++)
+			buf[i] = current->thread.fpvsr[i].fpr.vsrlow;
+		err |= __copy_to_user(v_regs, buf, 32 * sizeof(double));
+		/* set MSR_VSX in the MSR value in the frame to
+		 * indicate that sc->vs_reg contains valid data.
+		 */
+		msr |= MSR_VSX;
+	}
 #else /* CONFIG_VSX */
 	/* copy fpr regs and fpscr */
 	err |= __copy_to_user(&sc->fp_regs, &current->thread.TS_FPR(0),
@@ -199,7 +215,7 @@ static long restore_sigcontext(struct pt
 	 * current->thread.TS_FPR/vr for the reasons explained in the
 	 * previous comment.
 	 */
-	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
+	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC | MSR_VSX);
 
 #ifdef CONFIG_ALTIVEC
 	err |= __get_user(v_regs, &sc->v_regs);
@@ -228,6 +244,19 @@ static long restore_sigcontext(struct pt
 		current->thread.TS_FPR(i) = buf[i];
 	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
 
+	/*
+	 * Get additional VSX data. Update v_regs to point after the
+	 * VMX data.  Copy VSX low doubleword from userspace to local
+	 * buffer for formatting, then into the thread_struct.
+	 */
+	v_regs += ELF_NVRREG;
+	if ((msr & MSR_VSX) != 0)
+		err |= __copy_from_user(buf, v_regs, 32 * sizeof(double));
+	else
+		memset(buf, 0, 32 * sizeof(double));
+
+	for (i = 0; i < 32 ; i++)
+		current->thread.fpvsr[i].fpr.vsrlow = buf[i];
 #else
 	err |= __copy_from_user(&current->thread.TS_FPRSTART, &sc->fp_regs,
 				FP_REGS_SIZE);
Index: linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/traps.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
@@ -967,6 +967,20 @@ void altivec_unavailable_exception(struc
 	die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT);
 }
 
+void vsx_unavailable_exception(struct pt_regs *regs)
+{
+	if (user_mode(regs)) {
+		/* A user program has executed a VSX instruction,
+		   but this kernel doesn't support VSX. */
+		_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+		return;
+	}
+
+	printk(KERN_EMERG "Unrecoverable VSX Unavailable Exception "
+			"%lx at %lx\n", regs->trap, regs->nip);
+	die("Unrecoverable VSX Unavailable Exception", regs, SIGABRT);
+}
+
 void performance_monitor_exception(struct pt_regs *regs)
 {
 	perf_irq(regs);
@@ -1091,6 +1105,21 @@ void altivec_assist_exception(struct pt_
 }
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+void vsx_assist_exception(struct pt_regs *regs)
+{
+	if (!user_mode(regs)) {
+		printk(KERN_EMERG "VSX assist exception in kernel mode"
+		       " at %lx\n", regs->nip);
+		die("Kernel VSX assist exception", regs, SIGILL);
+	}
+
+	flush_vsx_to_thread(current);
+	printk(KERN_INFO "VSX assist not supported at %lx\n", regs->nip);
+	_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+}
+#endif /* CONFIG_VSX */
+
 #ifdef CONFIG_FSL_BOOKE
 void CacheLockingException(struct pt_regs *regs, unsigned long address,
 			   unsigned long error_code)
Index: linux-2.6-ozlabs/include/asm-powerpc/elf.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/elf.h
+++ linux-2.6-ozlabs/include/asm-powerpc/elf.h
@@ -109,6 +109,7 @@ typedef elf_gregset_t32 compat_elf_gregs
 #ifdef __powerpc64__
 # define ELF_NVRREG32	33	/* includes vscr & vrsave stuffed together */
 # define ELF_NVRREG	34	/* includes vscr & vrsave in split vectors */
+# define ELF_NVSRHALFREG 32	/* Half the vsx registers */
 # define ELF_GREG_TYPE	elf_greg_t64
 #else
 # define ELF_NEVRREG	34	/* includes acc (as 2) */
@@ -158,6 +159,7 @@ typedef __vector128 elf_vrreg_t;
 typedef elf_vrreg_t elf_vrregset_t[ELF_NVRREG];
 #ifdef __powerpc64__
 typedef elf_vrreg_t elf_vrregset_t32[ELF_NVRREG32];
+typedef elf_fpreg_t elf_vsrreghalf_t32[ELF_NVSRHALFREG];
 #endif
 
 #ifdef __KERNEL__
@@ -219,8 +221,8 @@ extern int dump_task_fpu(struct task_str
 typedef elf_vrregset_t elf_fpxregset_t;
 
 #ifdef CONFIG_ALTIVEC
-extern int dump_task_altivec(struct task_struct *, elf_vrregset_t *vrregs);
-#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_altivec(tsk, regs)
+extern int dump_task_vector(struct task_struct *, elf_vrregset_t *vrregs);
+#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_vector(tsk, regs)
 #define ELF_CORE_XFPREG_TYPE NT_PPC_VMX
 #endif
 
Index: linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ptrace.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
@@ -223,6 +223,14 @@ extern void user_disable_single_step(str
 #define PT_VRSAVE_32 (PT_VR0 + 33*4)
 #endif
 
+/*
+ * Only store first 32 VSRs here. The second 32 VSRs are in VR0-31
+ */
+#define PT_VSR0 150	/* each VSR reg occupies 2 slots in 64-bit */
+#define PT_VSR31 (PT_VSR0 + 2*31)
+#ifdef __KERNEL__
+#define PT_VSR0_32 300 	/* each VSR reg occupies 4 slots in 32-bit */
+#endif
 #endif /* __powerpc64__ */
 
 /*
@@ -245,6 +253,10 @@ extern void user_disable_single_step(str
 #define PTRACE_GETEVRREGS	20
 #define PTRACE_SETEVRREGS	21
 
+/* Get the first 32 128bit VSX registers */
+#define PTRACE_GETVSRREGS	27
+#define PTRACE_SETVSRREGS	28
+
 /*
  * Get or set a debug register. The first 16 are DABR registers and the
  * second 16 are IABR registers.
Index: linux-2.6-ozlabs/include/asm-powerpc/reg.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/reg.h
+++ linux-2.6-ozlabs/include/asm-powerpc/reg.h
@@ -30,6 +30,7 @@
 #define MSR_ISF_LG	61              /* Interrupt 64b mode valid on 630 */
 #define MSR_HV_LG 	60              /* Hypervisor state */
 #define MSR_VEC_LG	25	        /* Enable AltiVec */
+#define MSR_VSX_LG	23		/* Enable VSX */
 #define MSR_POW_LG	18		/* Enable Power Management */
 #define MSR_WE_LG	18		/* Wait State Enable */
 #define MSR_TGPR_LG	17		/* TLB Update registers in use */
@@ -71,6 +72,7 @@
 #endif
 
 #define MSR_VEC		__MASK(MSR_VEC_LG)	/* Enable AltiVec */
+#define MSR_VSX		__MASK(MSR_VSX_LG)	/* Enable VSX */
 #define MSR_POW		__MASK(MSR_POW_LG)	/* Enable Power Management */
 #define MSR_WE		__MASK(MSR_WE_LG)	/* Wait State Enable */
 #define MSR_TGPR	__MASK(MSR_TGPR_LG)	/* TLB Update registers in use */
Index: linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/sigcontext.h
+++ linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
@@ -43,9 +43,44 @@ struct sigcontext {
  * it must be copied via a vector register to/from storage) or as a word.
  * The entry with index 33 contains the vrsave as the first word (offset 0)
  * within the quadword.
+ *
+ * Part of the VSX data is stored here also by extending vmx_reserve
+ * by an additional 32 double words.  Architecturally the layout of
+ * the VSR registers and how they overlap on top of the legacy FPR and
+ * VR registers is shown below:
+ *
+ *                    VSR doubleword 0               VSR doubleword 1
+ *           ----------------------------------------------------------------
+ *   VSR[0]  |             FPR[0]            |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[1]  |             FPR[1]            |                              |
+ *           ----------------------------------------------------------------
+ *           |              ...              |                              |
+ *           |              ...              |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[30] |             FPR[30]           |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[31] |             FPR[31]           |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[32] |                             VR[0]                            |
+ *           ----------------------------------------------------------------
+ *   VSR[33] |                             VR[1]                            |
+ *           ----------------------------------------------------------------
+ *           |                              ...                             |
+ *           |                              ...                             |
+ *           ----------------------------------------------------------------
+ *   VSR[62] |                             VR[30]                           |
+ *           ----------------------------------------------------------------
+ *   VSR[63] |                             VR[31]                           |
+ *           ----------------------------------------------------------------
+ *
+ * FPR/VSR 0-31 doubleword 0 is stored in fp_regs, and VMX/VSR 32-63
+ * is stored at the start of vmx_reserve.  vmx_reserve is extended for
+ * backwards compatibility to store VSR 0-31 doubleword 1 after the VMX
+ * registers and vscr/vrsave.
  */
 	elf_vrreg_t	__user *v_regs;
-	long		vmx_reserve[ELF_NVRREG+ELF_NVRREG+1];
+	long		vmx_reserve[ELF_NVRREG+ELF_NVRREG+32+1];
 #endif
 };
 
Index: linux-2.6-ozlabs/include/asm-powerpc/system.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/system.h
+++ linux-2.6-ozlabs/include/asm-powerpc/system.h
@@ -132,6 +132,7 @@ extern void enable_kernel_altivec(void);
 extern void giveup_altivec(struct task_struct *);
 extern void load_up_altivec(struct task_struct *);
 extern int emulate_altivec(struct pt_regs *);
+extern void giveup_vsx(struct task_struct *);
 extern void enable_kernel_spe(void);
 extern void giveup_spe(struct task_struct *);
 extern void load_up_spe(struct task_struct *);
@@ -155,6 +156,14 @@ static inline void flush_altivec_to_thre
 }
 #endif
 
+#ifdef CONFIG_VSX
+extern void flush_vsx_to_thread(struct task_struct *);
+#else
+static inline void flush_vsx_to_thread(struct task_struct *t)
+{
+}
+#endif
+
 #ifdef CONFIG_SPE
 extern void flush_spe_to_thread(struct task_struct *);
 #else
Index: linux-2.6-ozlabs/include/linux/elf.h
===================================================================
--- linux-2.6-ozlabs.orig/include/linux/elf.h
+++ linux-2.6-ozlabs/include/linux/elf.h
@@ -358,6 +358,7 @@ typedef struct elf64_shdr {
 #define NT_PRXFPREG     0x46e62b7f      /* copied from gdb5.1/include/elf/common.h */
 #define NT_PPC_VMX	0x100		/* PowerPC Altivec/VMX registers */
 #define NT_PPC_SPE	0x101		/* PowerPC SPE/EVR registers */
+#define NT_PPC_VSX	0x102		/* PowerPC VSX registers */
 #define NT_386_TLS	0x200		/* i386 TLS slots (struct user_desc) */
 
 

^ permalink raw reply	[flat|nested] 106+ messages in thread
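
For anyone following the signal changes in 8/9, here is a rough userspace
sketch of the save/restore pattern used in setup_sigcontext() and
restore_sigcontext(): doubleword 1 of VSR 0-31 is gathered into a flat
buffer and stored in the extra space after the VMX data, then scattered
back on restore (or zeroed when MSR_VSX was not set in the saved MSR).
The names demo_thread, demo_frame, demo_save and demo_restore are made up
for illustration and are not kernel interfaces; only the MSR_VSX bit
number (23) is taken from the patch.

```c
#include <assert.h>
#include <string.h>

#define DEMO_NVSR	32
#define DEMO_MSR_VSX	(1UL << 23)	/* same bit as MSR_VSX_LG in the patch */

/* Hypothetical stand-in for the fpvsr[] part of thread_struct. */
struct demo_thread {
	struct {
		double fp;	/* VSR doubleword 0, aliases FPR[i] */
		double vsrlow;	/* VSR doubleword 1 */
	} fpvsr[DEMO_NVSR];
	int used_vsr;
};

/* Hypothetical stand-in for the extra space after the VMX data. */
struct demo_frame {
	unsigned long msr;
	double vsx_low[DEMO_NVSR];
};

/* Gather the low doublewords and flag them as valid in the saved MSR. */
static void demo_save(const struct demo_thread *t, struct demo_frame *f)
{
	double buf[DEMO_NVSR];
	int i;

	assert(sizeof(buf) == sizeof(f->vsx_low));

	f->msr &= ~DEMO_MSR_VSX;
	if (!t->used_vsr)
		return;

	for (i = 0; i < DEMO_NVSR; i++)
		buf[i] = t->fpvsr[i].vsrlow;
	memcpy(f->vsx_low, buf, sizeof(buf));
	f->msr |= DEMO_MSR_VSX;
}

/* Scatter the low doublewords back, or zero them if the frame has none. */
static void demo_restore(struct demo_thread *t, const struct demo_frame *f)
{
	double buf[DEMO_NVSR];
	int i;

	if (f->msr & DEMO_MSR_VSX)
		memcpy(buf, f->vsx_low, sizeof(buf));
	else
		memset(buf, 0, sizeof(buf));

	for (i = 0; i < DEMO_NVSR; i++)
		t->fpvsr[i].vsrlow = buf[i];
}
```

The intermediate buffer mirrors the patch: the thread_struct interleaves
the two doublewords, while the frame wants the low halves contiguous.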

* Re: [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
  2008-06-20  4:13   ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
@ 2008-06-20  6:35     ` Kumar Gala
  0 siblings, 0 replies; 106+ messages in thread
From: Kumar Gala @ 2008-06-20  6:35 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras


On Jun 19, 2008, at 11:13 PM, Michael Neuling wrote:

> If we set the SPE MSR bit in save_user_regs we can blow away the VEC
> bit.  This will never happen in reality (VMX and SPE will never be in
> the same processor as their opcodes overlap), but it looks bad.  Also
> when we add VSX here in a later patch, we can hit two of these at the
> same time.

Also, MSR_SPE and MSR_VEC are the same bit.  So we'd never clobber  
anything.

- k

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX.
  2008-06-20  4:13 ` Michael Neuling
                     ` (8 preceding siblings ...)
  2008-06-20  4:13   ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
@ 2008-06-20  6:37   ` Kumar Gala
  2008-06-20  8:15     ` Michael Neuling
  2008-06-23  5:31   ` Michael Neuling
  10 siblings, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-20  6:37 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras


On Jun 19, 2008, at 11:13 PM, Michael Neuling wrote:

> The following set of patches adds Vector Scalar Extensions (VSX)
> support for POWER7.  Includes context switch, ptrace and signals  
> support.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
> Paulus: please consider for your 2.6.27 tree.
>
> Updated with comments from Kumar, Milton, Dave Woodhouse and Mark
> 'NKOTB' Nelson.
> - Changed thread_struct array definition to be cleaner
> - Updated CPU_FTRS_POSSIBLE
> - Updated Kconfig typo and duplicate
> - Added comment to clarify ibm,vmx = 2 really means VSX.

One question I was wondering about is the "user space" view of VSX.   
Is the intent to have it seem like there is a unique register set for  
VSX separate from FP or AltiVec?

(This gets into what the ABI changes look like).

- k

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
  2008-06-20  4:13   ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
@ 2008-06-20  6:39     ` Kumar Gala
  2008-06-22 11:29       ` Michael Neuling
  0 siblings, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-20  6:39 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras


On Jun 19, 2008, at 11:13 PM, Michael Neuling wrote:

> Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
> ===================================================================
> --- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
> +++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
> @@ -136,6 +136,9 @@ typedef struct {
> 	unsigned long seg;
> } mm_segment_t;
>
> +#define TS_FPR(i) fpr[i]
> +#define TS_FPRSTART fpr
> +
> struct thread_struct {
> 	unsigned long	ksp;		/* Kernel stack pointer */
> 	unsigned long	ksp_limit;	/* if ksp <= ksp_limit stack overflow */
> @@ -197,12 +200,13 @@ struct thread_struct {
> 	.fpexc_mode = MSR_FE0 | MSR_FE1, \
> }
> #else
> +#define	FPVSR_INIT_THREAD .fpr = {0}

Being a bit nit picky, but doesn't seem like this patch should  
introduce FPVSR.

>
> #define INIT_THREAD  { \
> 	.ksp = INIT_SP, \
> 	.ksp_limit = INIT_SP_LIMIT, \
> 	.regs = (struct pt_regs *)INIT_SP - 1, /* XXX bogus, I think */ \
> 	.fs = KERNEL_DS, \
> -	.fpr = {0}, \
> +	FPVSR_INIT_THREAD, \
> 	.fpscr = { .val = 0, }, \
> 	.fpexc_mode = 0, \
> }
> @@ -289,4 +293,5 @@ static inline void prefetchw(const void
>
> #endif /* __KERNEL__ */
> #endif /* __ASSEMBLY__ */
> +#define TS_FPRSPACING 1
> #endif /* _ASM_POWERPC_PROCESSOR_H */

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
  2008-06-20  4:13   ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
@ 2008-06-20  6:44     ` Kumar Gala
  0 siblings, 0 replies; 106+ messages in thread
From: Kumar Gala @ 2008-06-20  6:44 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras

> Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
> ===================================================================
> --- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
> +++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
> @@ -78,6 +78,7 @@ extern long kernel_thread(int (*fn)(void
> /* Lazy FPU handling on uni-processor */
> extern struct task_struct *last_task_used_math;
> extern struct task_struct *last_task_used_altivec;
> +extern struct task_struct *last_task_used_vsx;
> extern struct task_struct *last_task_used_spe;
>
> #ifdef CONFIG_PPC32
> @@ -136,8 +137,13 @@ typedef struct {
> 	unsigned long seg;
> } mm_segment_t;
>
> +#ifdef CONFIG_VSX
> +#define TS_FPR(i) fpvsr[i].fpr.fp
> +#define TS_FPRSTART fpvsr
> +#else
> #define TS_FPR(i) fpr[i]
> #define TS_FPRSTART fpr
> +#endif
>
> struct thread_struct {
> 	unsigned long	ksp;		/* Kernel stack pointer */
> @@ -155,8 +161,19 @@ struct thread_struct {
> 	unsigned long	dbcr0;		/* debug control register values */
> 	unsigned long	dbcr1;
> #endif
> +#ifdef CONFIG_VSX
> +	/* First 32 VSX registers (overlap with fpr[32]) */
> +	union {
> +		struct {
> +			double fp;

s/fp/fpr

> +			double vsrlow;
> +		} fpr;
> +		vector128	vsr;
> +	} fpvsr[32];
> +#else
> 	double		fpr[32];	/* Complete floating point set */
> -	struct {			/* fpr ... fpscr must be contiguous */
> +#endif
> +	struct {
>
> 		unsigned int pad;
> 		unsigned int val;	/* Floating point status */

So if I search correctly I count 2 uses of .vsr.  Seems like we could  
easily make those two cases use .fp and drop the union.

- k

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX.
  2008-06-20  6:37   ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Kumar Gala
@ 2008-06-20  8:15     ` Michael Neuling
  0 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-20  8:15 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Paul Mackerras



In message <B353EBCF-A080-41C7-B331-61D29C6F5C02@kernel.crashing.org> you wrote:
> 
> On Jun 19, 2008, at 11:13 PM, Michael Neuling wrote:
> 
> > The following set of patches adds Vector Scalar Extensions (VSX)
> > support for POWER7.  Includes context switch, ptrace and signals  
> > support.
> >
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > ---
> > Paulus: please consider for your 2.6.27 tree.
> >
> > Updated with comments from Kumar, Milton, Dave Woodhouse and Mark
> > 'NKOTB' Nelson.
> > - Changed thread_struct array definition to be cleaner
> > - Updated CPU_FTRS_POSSIBLE
> > - Updated Kconfig typo and duplicate
> > - Added comment to clarify ibm,vmx = 2 really means VSX.
> 
> One question I was wondering about is the "user space" view of VSX.   
> Is the intent to have it seem like there is a unique register set for  
> VSX separate from FP or AltiVec?

For userspace it's not a unique register set.  So if you execute FP code
in the middle of your VSX code, you change VSX registers 0-31.

Userspace will see the same as if it was running natively on the CPU.  

> (This gets into what the ABI changes look like).

Signals and ptrace interfaces have been kept backwards compatible.

Mikey
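
To illustrate the aliasing described above, here is a small userspace
sketch based on the union from patch 5/9: storing to the FPR view of a
register changes the 128-bit VSR view of the same register, because FPR[i]
is doubleword 0 of VSR[i].  The type and function names (demo_vector128,
demo_fpvsr, demo_fp_write_changes_vsr) are invented for this example, and
demo_vector128 is only assumed to resemble the kernel's vector128 type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed 128-bit container, standing in for the kernel's vector128. */
typedef struct {
	uint32_t u[4];
} demo_vector128;

/* Same shape as the fpvsr[] element in patch 5/9. */
union demo_fpvsr {
	struct {
		double fp;	/* FPR[i], VSR[i] doubleword 0 */
		double vsrlow;	/* VSR[i] doubleword 1 */
	} fpr;
	demo_vector128 vsr;
};

/*
 * Write FPR[i] and report whether the 128-bit VSR[i] view changed.
 * Returns 1 if the VSR bytes differ after the store, 0 otherwise.
 */
static int demo_fp_write_changes_vsr(union demo_fpvsr *regs, int i, double val)
{
	demo_vector128 before;

	assert(sizeof(union demo_fpvsr) == sizeof(demo_vector128));

	before = regs[i].vsr;
	regs[i].fpr.fp = val;
	return memcmp(&before, &regs[i].vsr, sizeof(before)) != 0;
}
```

Only doubleword 0 is touched by the store, so vsrlow keeps whatever value
it had before.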

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
  2008-06-20  6:39     ` Kumar Gala
@ 2008-06-22 11:29       ` Michael Neuling
  0 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-22 11:29 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Paul Mackerras



In message <7F82D2F6-6FB3-49F0-9512-D60AC2E9CBED@kernel.crashing.org> you wrote:
> 
> On Jun 19, 2008, at 11:13 PM, Michael Neuling wrote:
> 
> > Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
> > ===================================================================
> > --- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
> > +++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
> > @@ -136,6 +136,9 @@ typedef struct {
> > 	unsigned long seg;
> > } mm_segment_t;
> >
> > +#define TS_FPR(i) fpr[i]
> > +#define TS_FPRSTART fpr
> > +
> > struct thread_struct {
> > 	unsigned long	ksp;		/* Kernel stack pointer */
> > 	unsigned long	ksp_limit;	/* if ksp <= ksp_limit stack overflow */
> > @@ -197,12 +200,13 @@ struct thread_struct {
> > 	.fpexc_mode = MSR_FE0 | MSR_FE1, \
> > }
> > #else
> > +#define	FPVSR_INIT_THREAD .fpr = {0}
> 
> Being a bit nit picky, but doesn't seem like this patch should  
> introduce FPVSR.

Yep.. a bit early, thanks.

> 
> >
> > #define INIT_THREAD  { \
> > 	.ksp = INIT_SP, \
> > 	.ksp_limit = INIT_SP_LIMIT, \
> > 	.regs = (struct pt_regs *)INIT_SP - 1, /* XXX bogus, I think */ \
> > 	.fs = KERNEL_DS, \
> > -	.fpr = {0}, \
> > +	FPVSR_INIT_THREAD, \
> > 	.fpscr = { .val = 0, }, \
> > 	.fpexc_mode = 0, \
> > }
> > @@ -289,4 +293,5 @@ static inline void prefetchw(const void
> >
> > #endif /* __KERNEL__ */
> > #endif /* __ASSEMBLY__ */
> > +#define TS_FPRSPACING 1
> > #endif /* _ASM_POWERPC_PROCESSOR_H */
> 

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
  2008-06-23  5:31   ` Michael Neuling
  2008-06-23  5:31     ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
@ 2008-06-23  5:31     ` Michael Neuling
  2008-06-23  5:31     ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
                       ` (7 subsequent siblings)
  9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23  5:31 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

If we set the SPE MSR bit in save_user_regs we can blow away the VEC
bit.  This will never happen in reality (VMX and SPE will never be in
the same processor as their opcodes overlap), but it looks bad.  Also
when we add VSX here in a later patch, we can hit two of these at the
same time.  

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/signal_32.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -336,6 +336,8 @@ struct rt_sigframe {
 static int save_user_regs(struct pt_regs *regs, struct mcontext __user *frame,
 		int sigret)
 {
+	unsigned long msr = regs->msr;
+
 	/* Make sure floating point registers are stored in regs */
 	flush_fp_to_thread(current);
 
@@ -354,8 +356,7 @@ static int save_user_regs(struct pt_regs
 			return 1;
 		/* set MSR_VEC in the saved MSR value to indicate that
 		   frame->mc_vregs contains valid data */
-		if (__put_user(regs->msr | MSR_VEC, &frame->mc_gregs[PT_MSR]))
-			return 1;
+		msr |= MSR_VEC;
 	}
 	/* else assert((regs->msr & MSR_VEC) == 0) */
 
@@ -377,8 +378,7 @@ static int save_user_regs(struct pt_regs
 			return 1;
 		/* set MSR_SPE in the saved MSR value to indicate that
 		   frame->mc_vregs contains valid data */
-		if (__put_user(regs->msr | MSR_SPE, &frame->mc_gregs[PT_MSR]))
-			return 1;
+		msr |= MSR_SPE;
 	}
 	/* else assert((regs->msr & MSR_SPE) == 0) */
 
@@ -387,6 +387,8 @@ static int save_user_regs(struct pt_regs
 		return 1;
 #endif /* CONFIG_SPE */
 
+	if (__put_user(msr, &frame->mc_gregs[PT_MSR]))
+		return 1;
 	if (sigret) {
 		/* Set up the sigreturn trampoline: li r0,sigret; sc */
 		if (__put_user(0x38000000UL + sigret, &frame->tramp[0])

^ permalink raw reply	[flat|nested] 106+ messages in thread
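
A small userspace sketch of the problem 1/9 addresses, assuming two
facilities are in use at once (VMX and VSX, as the changelog anticipates):
when each block stores "regs->msr | BIT" on its own, the last store wins
and the earlier bit is lost, whereas accumulating into a local msr and
storing once keeps both.  The function names and the DEMO_ prefixes are
hypothetical; the bit numbers (25 for VEC, 23 for VSX) come from the reg.h
hunk in this series.

```c
#include <assert.h>

#define DEMO_MSR_VEC	(1UL << 25)
#define DEMO_MSR_VSX	(1UL << 23)

/* Old pattern: every facility writes regs_msr | BIT separately. */
static unsigned long demo_save_msr_old(unsigned long regs_msr,
				       int used_vr, int used_vsr)
{
	unsigned long saved = regs_msr;

	if (used_vr)
		saved = regs_msr | DEMO_MSR_VEC;
	if (used_vsr)
		saved = regs_msr | DEMO_MSR_VSX;	/* drops the VEC store */
	return saved;
}

/* New pattern: accumulate in a local, store once at the end. */
static unsigned long demo_save_msr_new(unsigned long regs_msr,
				       int used_vr, int used_vsr)
{
	unsigned long msr = regs_msr;

	if (used_vr)
		msr |= DEMO_MSR_VEC;
	if (used_vsr)
		msr |= DEMO_MSR_VSX;

	/* No bit that was set on entry may be lost. */
	assert((msr & regs_msr) == regs_msr);
	return msr;
}
```

With both facilities in use, the old variant returns only the VSX bit
while the new one returns VEC and VSX together.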

* [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable
  2008-06-23  5:31   ` Michael Neuling
@ 2008-06-23  5:31     ` Michael Neuling
  2008-06-23  5:31     ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
                       ` (8 subsequent siblings)
  9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23  5:31 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Make load_up_fpu and load_up_altivec callable so they can be reused by
the VSX code.  

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/fpu.S        |    2 +-
 arch/powerpc/kernel/head_32.S    |    6 ++++--
 arch/powerpc/kernel/head_64.S    |    8 +++++---
 arch/powerpc/kernel/head_booke.h |    6 ++++--
 4 files changed, 14 insertions(+), 8 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -85,7 +85,7 @@ _GLOBAL(load_up_fpu)
 #endif /* CONFIG_SMP */
 	/* restore registers and return */
 	/* we haven't used ctr or xer or lr */
-	b	fast_exception_return
+	blr
 
 /*
  * giveup_fpu(tsk)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_32.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
@@ -421,8 +421,10 @@ BEGIN_FTR_SECTION
 	b 	ProgramCheck
 END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE)
 	EXCEPTION_PROLOG
-	bne	load_up_fpu		/* if from user, just load it up */
-	addi	r3,r1,STACK_FRAME_OVERHEAD
+	beq	1f
+	bl	load_up_fpu		/* if from user, just load it up */
+	b	fast_exception_return
+1:	addi	r3,r1,STACK_FRAME_OVERHEAD
 	EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
 
 /* Decrementer */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -741,7 +741,8 @@ fp_unavailable_common:
 	ENABLE_INTS
 	bl	.kernel_fp_unavailable_exception
 	BUG_OPCODE
-1:	b	.load_up_fpu
+1:	bl	.load_up_fpu
+	b	fast_exception_return
 
 	.align	7
 	.globl altivec_unavailable_common
@@ -749,7 +750,8 @@ altivec_unavailable_common:
 	EXCEPTION_PROLOG_COMMON(0xf20, PACA_EXGEN)
 #ifdef CONFIG_ALTIVEC
 BEGIN_FTR_SECTION
-	bne	.load_up_altivec	/* if from user, just load it up */
+	bnel	.load_up_altivec
+	b	fast_exception_return
 END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 #endif
 	bl	.save_nvgprs
@@ -829,7 +831,7 @@ _STATIC(load_up_altivec)
 	std	r4,0(r3)
 #endif /* CONFIG_SMP */
 	/* restore registers and return */
-	b	fast_exception_return
+	blr
 #endif /* CONFIG_ALTIVEC */
 
 /*
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_booke.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
@@ -363,8 +363,10 @@ label:
 #define FP_UNAVAILABLE_EXCEPTION					      \
 	START_EXCEPTION(FloatingPointUnavailable)			      \
 	NORMAL_EXCEPTION_PROLOG;					      \
-	bne	load_up_fpu;		/* if from user, just load it up */   \
-	addi	r3,r1,STACK_FRAME_OVERHEAD;				      \
+	beq	1f;							      \
+	bl	load_up_fpu;		/* if from user, just load it up */   \
+	b	fast_exception_return;					      \
+1:	addi	r3,r1,STACK_FRAME_OVERHEAD;				      \
 	EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
 
 #endif /* __HEAD_BOOKE_H__ */

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 3/9] powerpc: Move altivec_unavailable
  2008-06-23  5:31   ` Michael Neuling
                       ` (2 preceding siblings ...)
  2008-06-23  5:31     ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
@ 2008-06-23  5:31     ` Michael Neuling
  2008-06-23  5:31     ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
                       ` (5 subsequent siblings)
  9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23  5:31 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Move the altivec_unavailable code, to make room at 0xf40 where the
vsx_unavailable exception will be.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/head_64.S |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -275,7 +275,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	. = 0xf00
 	b	performance_monitor_pSeries
 
-	STD_EXCEPTION_PSERIES(0xf20, altivec_unavailable)
+	. = 0xf20
+	b	altivec_unavailable_pSeries
 
 #ifdef CONFIG_CBE_RAS
 	HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
@@ -295,6 +296,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 
 	/* moved from 0xf00 */
 	STD_EXCEPTION_PSERIES(., performance_monitor)
+	STD_EXCEPTION_PSERIES(., altivec_unavailable)
 
 /*
  * An interrupt came in while soft-disabled; clear EE in SRR1,

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX.
  2008-06-20  4:13 ` Michael Neuling
                     ` (9 preceding siblings ...)
  2008-06-20  6:37   ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Kumar Gala
@ 2008-06-23  5:31   ` Michael Neuling
  2008-06-23  5:31     ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
                       ` (9 more replies)
  10 siblings, 10 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23  5:31 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

The following set of patches adds Vector Scalar Extensions (VSX)
support for POWER7.  Includes context switch, ptrace and signals support.

Signed-off-by: Michael Neuling <mikey@neuling.org>
--- 
Paulus: please consider for your 2.6.27 tree.

- Updated to remove the union that Kumar doesn't like.  I'm not sure I
  like this version as much due to the magic offsets required to
  access the vsrlow.  It does clean up some other parts of the code
  though.

* [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
  2008-06-23  5:31   ` Michael Neuling
  2008-06-23  5:31     ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
  2008-06-23  5:31     ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
@ 2008-06-23  5:31     ` Michael Neuling
  2008-06-23  5:31     ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
                       ` (6 subsequent siblings)
  9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23  5:31 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

We are going to change where the floating point registers are stored
in the thread_struct, so in preparation add some macros to access the
floating point registers.  Update all code to use these new macros.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
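
For anyone reading along, here is a minimal user-space sketch of the
accessor idea.  TS_FPR and TS_FPRSPACING are defined as in this patch;
struct demo_thread_fp and peek_fpr_word() are hypothetical stand-ins
for illustration only, not kernel code.  It assumes a double and a
uint64_t are the same size, as on 64-bit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same definitions as in this patch: one double per FPR. */
#define TS_FPR(i) fpr[i]
#define TS_FPRSPACING 1

/* Hypothetical stand-in for the FP part of thread_struct. */
struct demo_thread_fp {
	double fpr[32];
};

/*
 * Mirrors the PEEKUSR-style arithmetic in arch_ptrace(): treat the FPR
 * area as an array of 64-bit words and scale the register number by
 * TS_FPRSPACING.  memcpy is used instead of a pointer cast so the
 * sketch stays well defined in plain C.
 */
static uint64_t peek_fpr_word(const struct demo_thread_fp *t, unsigned int n)
{
	uint64_t tmp;
	size_t off = sizeof(uint64_t) * TS_FPRSPACING * n;

	assert(sizeof(uint64_t) == sizeof(double));
	assert(n < 32);
	memcpy(&tmp, (const char *)t->fpr + off, sizeof(tmp));
	return tmp;
}
```

Callers that want a double keep using t->TS_FPR(n) directly; only the
word-indexed ptrace paths need the spacing factor, which is why both
macros exist.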

 arch/powerpc/kernel/align.c      |    6 ++--
 arch/powerpc/kernel/process.c    |    5 ++-
 arch/powerpc/kernel/ptrace.c     |   14 +++++----
 arch/powerpc/kernel/ptrace32.c   |    9 ++++--
 arch/powerpc/kernel/softemu8xx.c |    4 +-
 arch/powerpc/math-emu/math.c     |   56 +++++++++++++++++++--------------------
 include/asm-powerpc/ppc_asm.h    |    5 ++-
 include/asm-powerpc/processor.h  |    3 ++
 8 files changed, 56 insertions(+), 46 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/align.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/align.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/align.c
@@ -366,7 +366,7 @@ static int emulate_multiple(struct pt_re
 static int emulate_fp_pair(struct pt_regs *regs, unsigned char __user *addr,
 			   unsigned int reg, unsigned int flags)
 {
-	char *ptr = (char *) &current->thread.fpr[reg];
+	char *ptr = (char *) &current->thread.TS_FPR(reg);
 	int i, ret;
 
 	if (!(flags & F))
@@ -784,7 +784,7 @@ int fix_alignment(struct pt_regs *regs)
 				return -EFAULT;
 		}
 	} else if (flags & F) {
-		data.dd = current->thread.fpr[reg];
+		data.dd = current->thread.TS_FPR(reg);
 		if (flags & S) {
 			/* Single-precision FP store requires conversion... */
 #ifdef CONFIG_PPC_FPU
@@ -862,7 +862,7 @@ int fix_alignment(struct pt_regs *regs)
 		if (unlikely(ret))
 			return -EFAULT;
 	} else if (flags & F)
-		current->thread.fpr[reg] = data.dd;
+		current->thread.TS_FPR(reg) = data.dd;
 	else
 		regs->gpr[reg] = data.ll;
 
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -110,7 +110,7 @@ int dump_task_fpu(struct task_struct *ts
 		return 0;
 	flush_fp_to_thread(current);
 
-	memcpy(fpregs, &tsk->thread.fpr[0], sizeof(*fpregs));
+	memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
 
 	return 1;
 }
@@ -689,7 +689,8 @@ void start_thread(struct pt_regs *regs, 
 #endif
 
 	discard_lazy_cpu_state();
-	memset(current->thread.fpr, 0, sizeof(current->thread.fpr));
+	memset(current->thread.fpr, 0,
+	       sizeof(current->thread.fpr));
 	current->thread.fpscr.val = 0;
 #ifdef CONFIG_ALTIVEC
 	memset(current->thread.vr, 0, sizeof(current->thread.vr));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -218,10 +218,10 @@ static int fpr_get(struct task_struct *t
 	flush_fp_to_thread(target);
 
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
-		     offsetof(struct thread_struct, fpr[32]));
+		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
-				   &target->thread.fpr, 0, -1);
+				   target->thread.fpr, 0, -1);
 }
 
 static int fpr_set(struct task_struct *target, const struct user_regset *regset,
@@ -231,10 +231,10 @@ static int fpr_set(struct task_struct *t
 	flush_fp_to_thread(target);
 
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
-		     offsetof(struct thread_struct, fpr[32]));
+		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
-				  &target->thread.fpr, 0, -1);
+				  target->thread.fpr, 0, -1);
 }
 
 
@@ -728,7 +728,8 @@ long arch_ptrace(struct task_struct *chi
 			tmp = ptrace_get_reg(child, (int) index);
 		} else {
 			flush_fp_to_thread(child);
-			tmp = ((unsigned long *)child->thread.fpr)[index - PT_FPR0];
+			tmp = ((unsigned long *)child->thread.fpr)
+				[TS_FPRSPACING * (index - PT_FPR0)];
 		}
 		ret = put_user(tmp,(unsigned long __user *) data);
 		break;
@@ -755,7 +756,8 @@ long arch_ptrace(struct task_struct *chi
 			ret = ptrace_put_reg(child, index, data);
 		} else {
 			flush_fp_to_thread(child);
-			((unsigned long *)child->thread.fpr)[index - PT_FPR0] = data;
+			((unsigned long *)child->thread.fpr)
+				[TS_FPRSPACING * (index - PT_FPR0)] = data;
 			ret = 0;
 		}
 		break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
@@ -122,7 +122,8 @@ long compat_arch_ptrace(struct task_stru
 			 * to be an array of unsigned int (32 bits) - the
 			 * index passed in is based on this assumption.
 			 */
-			tmp = ((unsigned int *)child->thread.fpr)[index - PT_FPR0];
+			tmp = ((unsigned int *)child->thread.fpr)
+				[TS_FPRSPACING * (index - PT_FPR0)];
 		}
 		ret = put_user((unsigned int)tmp, (u32 __user *)data);
 		break;
@@ -162,7 +163,8 @@ long compat_arch_ptrace(struct task_stru
 		CHECK_FULL_REGS(child->thread.regs);
 		if (numReg >= PT_FPR0) {
 			flush_fp_to_thread(child);
-			tmp = ((unsigned long int *)child->thread.fpr)[numReg - PT_FPR0];
+			tmp = ((unsigned long int *)child->thread.fpr)
+				[TS_FPRSPACING * (numReg - PT_FPR0)];
 		} else { /* register within PT_REGS struct */
 			tmp = ptrace_get_reg(child, numReg);
 		} 
@@ -217,7 +219,8 @@ long compat_arch_ptrace(struct task_stru
 			 * to be an array of unsigned int (32 bits) - the
 			 * index passed in is based on this assumption.
 			 */
-			((unsigned int *)child->thread.fpr)[index - PT_FPR0] = data;
+			((unsigned int *)child->thread.fpr)
+				[TS_FPRSPACING * (index - PT_FPR0)] = data;
 			ret = 0;
 		}
 		break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/softemu8xx.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
@@ -124,7 +124,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
 	disp = instword & 0xffff;
 
 	ea = (u32 *)(regs->gpr[idxreg] + disp);
-	ip = (u32 *)&current->thread.fpr[flreg];
+	ip = (u32 *)&current->thread.TS_FPR(flreg);
 
 	switch ( inst )
 	{
@@ -168,7 +168,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
 		break;
 	case FMR:
 		/* assume this is a fp move -- Cort */
-		memcpy(ip, &current->thread.fpr[(instword>>11)&0x1f],
+		memcpy(ip, &current->thread.TS_FPR((instword>>11)&0x1f),
 		       sizeof(double));
 		break;
 	default:
Index: linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/math-emu/math.c
+++ linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
@@ -230,14 +230,14 @@ do_mathemu(struct pt_regs *regs)
 	case LFD:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		lfd(op0, op1, op2, op3);
 		break;
 	case LFDU:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		lfd(op0, op1, op2, op3);
 		regs->gpr[idx] = (unsigned long)op1;
@@ -245,21 +245,21 @@ do_mathemu(struct pt_regs *regs)
 	case STFD:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		stfd(op0, op1, op2, op3);
 		break;
 	case STFDU:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		stfd(op0, op1, op2, op3);
 		regs->gpr[idx] = (unsigned long)op1;
 		break;
 	case OP63:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		fmr(op0, op1, op2, op3);
 		break;
 	default:
@@ -356,28 +356,28 @@ do_mathemu(struct pt_regs *regs)
 
 	switch (type) {
 	case AB:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	case AC:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op2 = (void *)&current->thread.fpr[(insn >>  6) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op2 = (void *)&current->thread.TS_FPR((insn >>  6) & 0x1f);
 		break;
 
 	case ABC:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
-		op3 = (void *)&current->thread.fpr[(insn >>  6) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
+		op3 = (void *)&current->thread.TS_FPR((insn >>  6) & 0x1f);
 		break;
 
 	case D:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		break;
 
@@ -387,27 +387,27 @@ do_mathemu(struct pt_regs *regs)
 			goto illegal;
 
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)(regs->gpr[idx] + sdisp);
 		break;
 
 	case X:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		break;
 
 	case XA:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
 		break;
 
 	case XB:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	case XE:
 		idx = (insn >> 16) & 0x1f;
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		if (!idx) {
 			if (((insn >> 1) & 0x3ff) == STFIWX)
 				op1 = (void *)(regs->gpr[(insn >> 11) & 0x1f]);
@@ -421,7 +421,7 @@ do_mathemu(struct pt_regs *regs)
 
 	case XEU:
 		idx = (insn >> 16) & 0x1f;
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0)
 				+ regs->gpr[(insn >> 11) & 0x1f]);
 		break;
@@ -429,8 +429,8 @@ do_mathemu(struct pt_regs *regs)
 	case XCR:
 		op0 = (void *)&regs->ccr;
 		op1 = (void *)((insn >> 23) & 0x7);
-		op2 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op3 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op2 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op3 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	case XCRL:
@@ -450,7 +450,7 @@ do_mathemu(struct pt_regs *regs)
 
 	case XFLB:
 		op0 = (void *)((insn >> 17) & 0xff);
-		op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	default:
Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -6,6 +6,7 @@
 
 #include <linux/stringify.h>
 #include <asm/asm-compat.h>
+#include <asm/processor.h>
 
 #ifndef __ASSEMBLY__
 #error __FILE__ should only be used in assembler files
@@ -83,13 +84,13 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);				
 #define REST_8GPRS(n, base)	REST_4GPRS(n, base); REST_4GPRS(n+4, base)
 #define REST_10GPRS(n, base)	REST_8GPRS(n, base); REST_2GPRS(n+8, base)
 
-#define SAVE_FPR(n, base)	stfd	n,THREAD_FPR0+8*(n)(base)
+#define SAVE_FPR(n, base)	stfd	n,THREAD_FPR0+8*TS_FPRSPACING*(n)(base)
 #define SAVE_2FPRS(n, base)	SAVE_FPR(n, base); SAVE_FPR(n+1, base)
 #define SAVE_4FPRS(n, base)	SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
 #define SAVE_8FPRS(n, base)	SAVE_4FPRS(n, base); SAVE_4FPRS(n+4, base)
 #define SAVE_16FPRS(n, base)	SAVE_8FPRS(n, base); SAVE_8FPRS(n+8, base)
 #define SAVE_32FPRS(n, base)	SAVE_16FPRS(n, base); SAVE_16FPRS(n+16, base)
-#define REST_FPR(n, base)	lfd	n,THREAD_FPR0+8*(n)(base)
+#define REST_FPR(n, base)	lfd	n,THREAD_FPR0+8*TS_FPRSPACING*(n)(base)
 #define REST_2FPRS(n, base)	REST_FPR(n, base); REST_FPR(n+1, base)
 #define REST_4FPRS(n, base)	REST_2FPRS(n, base); REST_2FPRS(n+2, base)
 #define REST_8FPRS(n, base)	REST_4FPRS(n, base); REST_4FPRS(n+4, base)
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -136,6 +136,8 @@ typedef struct {
 	unsigned long seg;
 } mm_segment_t;
 
+#define TS_FPR(i) fpr[i]
+
 struct thread_struct {
 	unsigned long	ksp;		/* Kernel stack pointer */
 	unsigned long	ksp_limit;	/* if ksp <= ksp_limit stack overflow */
@@ -289,4 +291,5 @@ static inline void prefetchw(const void 
 
 #endif /* __KERNEL__ */
 #endif /* __ASSEMBLY__ */
+#define TS_FPRSPACING 1
 #endif /* _ASM_POWERPC_PROCESSOR_H */

* [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
  2008-06-23  5:31   ` Michael Neuling
                       ` (5 preceding siblings ...)
  2008-06-23  5:31     ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
@ 2008-06-23  5:31     ` Michael Neuling
  2008-06-23  5:31     ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
                       ` (2 subsequent siblings)
  9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23  5:31 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

The layout of the new VSR registers and how they overlap on top of the
legacy FPR and VR registers is:

                   VSR doubleword 0               VSR doubleword 1
          ----------------------------------------------------------------
  VSR[0]  |             FPR[0]            |                              |
          ----------------------------------------------------------------
  VSR[1]  |             FPR[1]            |                              |
          ----------------------------------------------------------------
          |              ...              |                              |
          |              ...              |                              |
          ----------------------------------------------------------------
  VSR[30] |             FPR[30]           |                              |
          ----------------------------------------------------------------
  VSR[31] |             FPR[31]           |                              |
          ----------------------------------------------------------------
  VSR[32] |                             VR[0]                            |
          ----------------------------------------------------------------
  VSR[33] |                             VR[1]                            |
          ----------------------------------------------------------------
          |                              ...                             |
          |                              ...                             |
          ----------------------------------------------------------------
  VSR[62] |                             VR[30]                           |
          ----------------------------------------------------------------
  VSR[63] |                             VR[31]                           |
          ----------------------------------------------------------------

VSX has 64 128-bit registers.  The first 32 regs overlap with the FP
registers and hence extend them with an additional 64 bits.  The
second 32 regs overlap with the VMX registers.

This patch introduces the thread_struct changes required to reflect
this register layout.  Ptrace and signals code is updated so that the
floating point registers are correctly accessed from the thread_struct
when CONFIG_VSX is enabled.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
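
A minimal user-space sketch of why the ptrace and signal paths now go
through a local buffer: with CONFIG_VSX the FPRs are no longer
contiguous, so they have to be gathered into the flat 33-double layout
userspace expects.  The TS_* macros match this patch; struct
demo_thread_vsx, demo_fpr_gather() and demo_fpr_scatter() are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Same definitions as the CONFIG_VSX case in this patch. */
#define TS_FPROFFSET 0
#define TS_VSRLOWOFFSET 1
#define TS_FPR(i) fpr[i][TS_FPROFFSET]

/* Hypothetical stand-in for the CONFIG_VSX part of thread_struct. */
struct demo_thread_vsx {
	double fpr[32][2];	/* [i][0] is FPR i, [i][1] is VSR i doubleword 1 */
	struct {
		unsigned int pad;
		unsigned int val;
	} fpscr;
};

/* Gather the 32 FPRs plus fpscr into the contiguous user layout. */
static void demo_fpr_gather(const struct demo_thread_vsx *t, double buf[33])
{
	int i;

	for (i = 0; i < 32; i++)
		buf[i] = t->TS_FPR(i);
	memcpy(&buf[32], &t->fpscr, sizeof(double));
}

/* Scatter the user layout back; VSR doubleword 1 is left untouched. */
static void demo_fpr_scatter(struct demo_thread_vsx *t, const double buf[33])
{
	int i;

	for (i = 0; i < 32; i++)
		t->TS_FPR(i) = buf[i];
	memcpy(&t->fpscr, &buf[32], sizeof(double));
}
```

Note that the buffer needs 33 entries, since fpscr travels as the
33rd double.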

 arch/powerpc/kernel/asm-offsets.c |    4 ++
 arch/powerpc/kernel/ptrace.c      |   28 ++++++++++++++++++
 arch/powerpc/kernel/signal_32.c   |   59 ++++++++++++++++++++++++++++----------
 arch/powerpc/kernel/signal_64.c   |   32 ++++++++++++++++++--
 include/asm-powerpc/processor.h   |   21 ++++++++++++-
 5 files changed, 126 insertions(+), 18 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
@@ -74,6 +74,10 @@ int main(void)
 	DEFINE(THREAD_VSCR, offsetof(struct thread_struct, vscr));
 	DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	DEFINE(THREAD_VSR0, offsetof(struct thread_struct, fpr));
+	DEFINE(THREAD_USED_VSR, offsetof(struct thread_struct, used_vsr));
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_PPC64
 	DEFINE(KSP_VSID, offsetof(struct thread_struct, ksp_vsid));
 #else /* CONFIG_PPC64 */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -215,26 +215,54 @@ static int fpr_get(struct task_struct *t
 		   unsigned int pos, unsigned int count,
 		   void *kbuf, void __user *ubuf)
 {
+#ifdef CONFIG_VSX
+	double buf[33];
+	int i;
+#endif
 	flush_fp_to_thread(target);
 
+#ifdef CONFIG_VSX
+	/* copy to local buffer then write that out */
+	for (i = 0; i < 32 ; i++)
+		buf[i] = target->thread.TS_FPR(i);
+	memcpy(&buf[32], &target->thread.fpscr, sizeof(double));
+	return user_regset_copyout(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+
+#else
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
 		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
 				   target->thread.fpr, 0, -1);
+#endif
 }
 
 static int fpr_set(struct task_struct *target, const struct user_regset *regset,
 		   unsigned int pos, unsigned int count,
 		   const void *kbuf, const void __user *ubuf)
 {
+#ifdef CONFIG_VSX
+	double buf[33];
+	int i;
+#endif
 	flush_fp_to_thread(target);
 
+#ifdef CONFIG_VSX
+	/* copy from user to local buffer then write that to the thread_struct */
+	i = user_regset_copyin(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+	if (i)
+		return i;
+	for (i = 0; i < 32 ; i++)
+		target->thread.TS_FPR(i) = buf[i];
+	memcpy(&target->thread.fpscr, &buf[32], sizeof(double));
+	return 0;
+#else
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
 		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
 				  target->thread.fpr, 0, -1);
+#endif
 }
 
 
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -337,14 +337,16 @@ static int save_user_regs(struct pt_regs
 		int sigret)
 {
 	unsigned long msr = regs->msr;
+#ifdef CONFIG_VSX
+	double buf[ELF_NFPREG];
+	int i;
+#endif
 
 	/* Make sure floating point registers are stored in regs */
 	flush_fp_to_thread(current);
 
-	/* save general and floating-point registers */
-	if (save_general_regs(regs, frame) ||
-	    __copy_to_user(&frame->mc_fregs, current->thread.fpr,
-		    ELF_NFPREG * sizeof(double)))
+	/* save general registers */
+	if (save_general_regs(regs, frame))
 		return 1;
 
 #ifdef CONFIG_ALTIVEC
@@ -368,7 +370,20 @@ static int save_user_regs(struct pt_regs
 	if (__put_user(current->thread.vrsave, (u32 __user *)&frame->mc_vregs[32]))
 		return 1;
 #endif /* CONFIG_ALTIVEC */
-
+#ifdef CONFIG_VSX
+	/* copy FPRs and fpscr to a local buffer then write that to the signal frame */
+	flush_fp_to_thread(current);
+	for (i = 0; i < 32 ; i++)
+		buf[i] = current->thread.TS_FPR(i);
+	memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+	if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
+		return 1;
+#else
+	/* save floating-point registers */
+	if (__copy_to_user(&frame->mc_fregs, current->thread.fpr,
+		    ELF_NFPREG * sizeof(double)))
+		return 1;
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	/* save spe registers */
 	if (current->thread.used_spe) {
@@ -411,6 +426,10 @@ static long restore_user_regs(struct pt_
 	long err;
 	unsigned int save_r2 = 0;
 	unsigned long msr;
+#ifdef CONFIG_VSX
+	double buf[ELF_NFPREG];
+	int i;
+#endif
 
 	/*
 	 * restore general registers but not including MSR or SOFTE. Also
@@ -438,16 +457,11 @@ static long restore_user_regs(struct pt_
 	 */
 	discard_lazy_cpu_state();
 
-	/* force the process to reload the FP registers from
-	   current->thread when it next does FP instructions */
-	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
-	if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
-			     sizeof(sr->mc_fregs)))
-		return 1;
-
 #ifdef CONFIG_ALTIVEC
-	/* force the process to reload the altivec registers from
-	   current->thread when it next does altivec instructions */
+	/*
+	 * Force the process to reload the altivec registers from
+	 * current->thread when it next does altivec instructions
+	 */
 	regs->msr &= ~MSR_VEC;
 	if (msr & MSR_VEC) {
 		/* restore altivec registers from the stack */
@@ -462,6 +476,23 @@ static long restore_user_regs(struct pt_
 		return 1;
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+	if (__copy_from_user(buf, &sr->mc_fregs,sizeof(sr->mc_fregs)))
+		return 1;
+	for (i = 0; i < 32 ; i++)
+		current->thread.TS_FPR(i) = buf[i];
+	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+#else
+	if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
+			     sizeof(sr->mc_fregs)))
+		return 1;
+#endif /* CONFIG_VSX */
+	/*
+	 * force the process to reload the FP registers from
+	 * current->thread when it next does FP instructions
+	 */
+	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
+
 #ifdef CONFIG_SPE
 	/* force the process to reload the spe registers from
 	   current->thread when it next does spe instructions */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -89,6 +89,10 @@ static long setup_sigcontext(struct sigc
 #endif
 	unsigned long msr = regs->msr;
 	long err = 0;
+#ifdef CONFIG_VSX
+	double buf[ELF_NFPREG];
+	int i;
+#endif
 
 	flush_fp_to_thread(current);
 
@@ -112,11 +116,21 @@ static long setup_sigcontext(struct sigc
 #else /* CONFIG_ALTIVEC */
 	err |= __put_user(0, &sc->v_regs);
 #endif /* CONFIG_ALTIVEC */
+	flush_fp_to_thread(current);
+#ifdef CONFIG_VSX
+	/* Copy FP to local buffer then write that out */
+	for (i = 0; i < 32 ; i++)
+		buf[i] = current->thread.TS_FPR(i);
+	memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+	err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+#else /* CONFIG_VSX */
+	/* copy fpr regs and fpscr */
+	err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
+#endif /* CONFIG_VSX */
 	err |= __put_user(&sc->gp_regs, &sc->regs);
 	WARN_ON(!FULL_REGS(regs));
 	err |= __copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE);
 	err |= __put_user(msr, &sc->gp_regs[PT_MSR]);
-	err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
 	err |= __put_user(signr, &sc->signal);
 	err |= __put_user(handler, &sc->handler);
 	if (set != NULL)
@@ -135,6 +149,9 @@ static long restore_sigcontext(struct pt
 #ifdef CONFIG_ALTIVEC
 	elf_vrreg_t __user *v_regs;
 #endif
+#ifdef CONFIG_VSX
+	double buf[ELF_NFPREG];
+#endif
 	unsigned long err = 0;
 	unsigned long save_r13 = 0;
 	elf_greg_t *gregs = (elf_greg_t *)regs;
@@ -182,8 +199,6 @@ static long restore_sigcontext(struct pt
 	 */
 	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
 
-	err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
-
 #ifdef CONFIG_ALTIVEC
 	err |= __get_user(v_regs, &sc->v_regs);
 	if (err)
@@ -202,7 +217,18 @@ static long restore_sigcontext(struct pt
 	else
 		current->thread.vrsave = 0;
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	/* restore floating point */
+	err |= __copy_from_user(buf, &sc->fp_regs, FP_REGS_SIZE);
+	if (err)
+		return err;
+	for (i = 0; i < 32 ; i++)
+		current->thread.TS_FPR(i) = buf[i];
+	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
 
+#else
+	err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
+#endif
 	return err;
 }
 
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -78,6 +78,7 @@ extern long kernel_thread(int (*fn)(void
 /* Lazy FPU handling on uni-processor */
 extern struct task_struct *last_task_used_math;
 extern struct task_struct *last_task_used_altivec;
+extern struct task_struct *last_task_used_vsx;
 extern struct task_struct *last_task_used_spe;
 
 #ifdef CONFIG_PPC32
@@ -136,7 +137,13 @@ typedef struct {
 	unsigned long seg;
 } mm_segment_t;
 
+#define TS_FPROFFSET 0
+#define TS_VSRLOWOFFSET 1
+#ifdef CONFIG_VSX
+#define TS_FPR(i) fpr[i][TS_FPROFFSET]
+#else
 #define TS_FPR(i) fpr[i]
+#endif
 
 struct thread_struct {
 	unsigned long	ksp;		/* Kernel stack pointer */
@@ -154,8 +161,12 @@ struct thread_struct {
 	unsigned long	dbcr0;		/* debug control register values */
 	unsigned long	dbcr1;
 #endif
+#ifdef CONFIG_VSX
+	double		fpr[32][2];	/* Complete floating point set */
+#else
 	double		fpr[32];	/* Complete floating point set */
-	struct {			/* fpr ... fpscr must be contiguous */
+#endif
+	struct {
 
 		unsigned int pad;
 		unsigned int val;	/* Floating point status */
@@ -175,6 +186,10 @@ struct thread_struct {
 	unsigned long	vrsave;
 	int		used_vr;	/* set if process has used altivec */
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	/* VSR status */
+	int		used_vsr;	/* set if process has used VSX */
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	unsigned long	evr[32];	/* upper 32-bits of SPE regs */
 	u64		acc;		/* Accumulator */
@@ -291,5 +306,9 @@ static inline void prefetchw(const void 
 
 #endif /* __KERNEL__ */
 #endif /* __ASSEMBLY__ */
+#ifdef CONFIG_VSX
+#define TS_FPRSPACING 2
+#else
 #define TS_FPRSPACING 1
+#endif
 #endif /* _ASM_POWERPC_PROCESSOR_H */

* [PATCH 7/9] powerpc: Add VSX assembler code macros
  2008-06-23  5:31   ` Michael Neuling
                       ` (3 preceding siblings ...)
  2008-06-23  5:31     ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
@ 2008-06-23  5:31     ` Michael Neuling
  2008-06-23  5:31     ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
                       ` (4 subsequent siblings)
  9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23  5:31 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

This adds the macros for the VSX load/store instructions, as most
binutils are not going to support them for a while.

Also add VSX register save/restore macros and vsr[0-63] register definitions.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
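
To make the hand encoding easier to check, here is a small user-space
sketch that builds the same instruction words in C.  The field layout
and the two opcode constants are taken from the VSX_XX1, STXVD2X and
LXVD2X macros in this patch; the function names are hypothetical and
the range check is an addition for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same field layout as the VSX_XX1 macro: the low five bits of the
 * VSR number go in bits 21-25, RA in bits 16-20, RB in bits 11-15,
 * and the sixth VSR bit lands in the least significant bit.
 */
static uint32_t vsx_xx1(unsigned int xs, unsigned int ra, unsigned int rb)
{
	assert(xs < 64 && ra < 32 && rb < 32);
	return ((uint32_t)(xs & 0x1f) << 21) | ((uint32_t)ra << 16) |
	       ((uint32_t)rb << 11) | (uint32_t)(xs >> 5);
}

/* Word emitted by STXVD2X(xs, ra, rb). */
static uint32_t stxvd2x_word(unsigned int xs, unsigned int ra, unsigned int rb)
{
	return 0x7c000798u | vsx_xx1(xs, ra, rb);
}

/* Word emitted by LXVD2X(xs, ra, rb). */
static uint32_t lxvd2x_word(unsigned int xs, unsigned int ra, unsigned int rb)
{
	return 0x7c000698u | vsx_xx1(xs, ra, rb);
}
```

For example, stxvd2x_word(33, 3, 4) gives 0x7c232799: VSR 33 sets
both the low register field (1) and the extra low-order bit, which is
how SAVE_VSRU reaches registers 32-63 with n+32.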

 include/asm-powerpc/ppc_asm.h |  127 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 127 insertions(+)

Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -74,6 +74,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);				
 				REST_10GPRS(22, base)
 #endif
 
+/*
+ * Define what the VSX XX1 form instructions will look like, then add
+ * the 128 bit load store instructions based on that.
+ */
+#define VSX_XX1(xs, ra, rb)	(((xs) & 0x1f) << 21 | ((ra) << 16) |  \
+				 ((rb) << 11) | (((xs) >> 5)))
+
+#define STXVD2X(xs, ra, rb)	.long (0x7c000798 | VSX_XX1((xs), (ra), (rb)))
+#define LXVD2X(xs, ra, rb)	.long (0x7c000698 | VSX_XX1((xs), (ra), (rb)))
 
 #define SAVE_2GPRS(n, base)	SAVE_GPR(n, base); SAVE_GPR(n+1, base)
 #define SAVE_4GPRS(n, base)	SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
@@ -110,6 +119,57 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);				
 #define REST_16VRS(n,b,base)	REST_8VRS(n,b,base); REST_8VRS(n+8,b,base)
 #define REST_32VRS(n,b,base)	REST_16VRS(n,b,base); REST_16VRS(n+16,b,base)
 
+/* Save the lower 32 VSRs in the thread VSR region */
+#define SAVE_VSR(n,b,base)	li b,THREAD_VSR0+(16*(n));  STXVD2X(n,b,base)
+#define SAVE_2VSRS(n,b,base)	SAVE_VSR(n,b,base); SAVE_VSR(n+1,b,base)
+#define SAVE_4VSRS(n,b,base)	SAVE_2VSRS(n,b,base); SAVE_2VSRS(n+2,b,base)
+#define SAVE_8VSRS(n,b,base)	SAVE_4VSRS(n,b,base); SAVE_4VSRS(n+4,b,base)
+#define SAVE_16VSRS(n,b,base)	SAVE_8VSRS(n,b,base); SAVE_8VSRS(n+8,b,base)
+#define SAVE_32VSRS(n,b,base)	SAVE_16VSRS(n,b,base); SAVE_16VSRS(n+16,b,base)
+#define REST_VSR(n,b,base)	li b,THREAD_VSR0+(16*(n)); LXVD2X(n,b,base)
+#define REST_2VSRS(n,b,base)	REST_VSR(n,b,base); REST_VSR(n+1,b,base)
+#define REST_4VSRS(n,b,base)	REST_2VSRS(n,b,base); REST_2VSRS(n+2,b,base)
+#define REST_8VSRS(n,b,base)	REST_4VSRS(n,b,base); REST_4VSRS(n+4,b,base)
+#define REST_16VSRS(n,b,base)	REST_8VSRS(n,b,base); REST_8VSRS(n+8,b,base)
+#define REST_32VSRS(n,b,base)	REST_16VSRS(n,b,base); REST_16VSRS(n+16,b,base)
+/* Save the upper 32 VSRs (32-63) in the thread VMX (VR) region (0-31) */
+#define SAVE_VSRU(n,b,base)	li b,THREAD_VR0+(16*(n));  STXVD2X(n+32,b,base)
+#define SAVE_2VSRSU(n,b,base)	SAVE_VSRU(n,b,base); SAVE_VSRU(n+1,b,base)
+#define SAVE_4VSRSU(n,b,base)	SAVE_2VSRSU(n,b,base); SAVE_2VSRSU(n+2,b,base)
+#define SAVE_8VSRSU(n,b,base)	SAVE_4VSRSU(n,b,base); SAVE_4VSRSU(n+4,b,base)
+#define SAVE_16VSRSU(n,b,base)	SAVE_8VSRSU(n,b,base); SAVE_8VSRSU(n+8,b,base)
+#define SAVE_32VSRSU(n,b,base)	SAVE_16VSRSU(n,b,base); SAVE_16VSRSU(n+16,b,base)
+#define REST_VSRU(n,b,base)	li b,THREAD_VR0+(16*(n)); LXVD2X(n+32,b,base)
+#define REST_2VSRSU(n,b,base)	REST_VSRU(n,b,base); REST_VSRU(n+1,b,base)
+#define REST_4VSRSU(n,b,base)	REST_2VSRSU(n,b,base); REST_2VSRSU(n+2,b,base)
+#define REST_8VSRSU(n,b,base)	REST_4VSRSU(n,b,base); REST_4VSRSU(n+4,b,base)
+#define REST_16VSRSU(n,b,base)	REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
+#define REST_32VSRSU(n,b,base)	REST_16VSRSU(n,b,base); REST_16VSRSU(n+16,b,base)
+
+#ifdef CONFIG_VSX
+#define REST_32FPVSRS(n,c,base)						\
+BEGIN_FTR_SECTION							\
+	b	2f;							\
+END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
+	REST_32FPRS(n,base);						\
+	b	3f;							\
+2:	REST_32VSRS(n,c,base);						\
+3:
+
+#define SAVE_32FPVSRS(n,c,base)						\
+BEGIN_FTR_SECTION							\
+	b	2f;							\
+END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
+	SAVE_32FPRS(n,base);						\
+	b	3f;							\
+2:	SAVE_32VSRS(n,c,base);						\
+3:
+
+#else
+#define REST_32FPVSRS(n,b,base)	REST_32FPRS(n, base)
+#define SAVE_32FPVSRS(n,b,base)	SAVE_32FPRS(n, base)
+#endif
+
 #define SAVE_EVR(n,s,base)	evmergehi s,s,n; stw s,THREAD_EVR0+4*(n)(base)
 #define SAVE_2EVRS(n,s,base)	SAVE_EVR(n,s,base); SAVE_EVR(n+1,s,base)
 #define SAVE_4EVRS(n,s,base)	SAVE_2EVRS(n,s,base); SAVE_2EVRS(n+2,s,base)
@@ -534,6 +594,73 @@ END_FTR_SECTION_IFCLR(CPU_FTR_601)
 #define	vr30	30
 #define	vr31	31
 
+/* VSX Registers (VSRs) */
+
+#define	vsr0	0
+#define	vsr1	1
+#define	vsr2	2
+#define	vsr3	3
+#define	vsr4	4
+#define	vsr5	5
+#define	vsr6	6
+#define	vsr7	7
+#define	vsr8	8
+#define	vsr9	9
+#define	vsr10	10
+#define	vsr11	11
+#define	vsr12	12
+#define	vsr13	13
+#define	vsr14	14
+#define	vsr15	15
+#define	vsr16	16
+#define	vsr17	17
+#define	vsr18	18
+#define	vsr19	19
+#define	vsr20	20
+#define	vsr21	21
+#define	vsr22	22
+#define	vsr23	23
+#define	vsr24	24
+#define	vsr25	25
+#define	vsr26	26
+#define	vsr27	27
+#define	vsr28	28
+#define	vsr29	29
+#define	vsr30	30
+#define	vsr31	31
+#define	vsr32	32
+#define	vsr33	33
+#define	vsr34	34
+#define	vsr35	35
+#define	vsr36	36
+#define	vsr37	37
+#define	vsr38	38
+#define	vsr39	39
+#define	vsr40	40
+#define	vsr41	41
+#define	vsr42	42
+#define	vsr43	43
+#define	vsr44	44
+#define	vsr45	45
+#define	vsr46	46
+#define	vsr47	47
+#define	vsr48	48
+#define	vsr49	49
+#define	vsr50	50
+#define	vsr51	51
+#define	vsr52	52
+#define	vsr53	53
+#define	vsr54	54
+#define	vsr55	55
+#define	vsr56	56
+#define	vsr57	57
+#define	vsr58	58
+#define	vsr59	59
+#define	vsr60	60
+#define	vsr61	61
+#define	vsr62	62
+#define	vsr63	63
+
 /* SPE Registers (EVPRs) */
 
 #define	evr0	0

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support
  2008-06-23  5:31   ` Michael Neuling
                       ` (6 preceding siblings ...)
  2008-06-23  5:31     ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
@ 2008-06-23  5:31     ` Michael Neuling
  2008-06-23  5:31     ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
  2008-06-23  7:38     ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
  9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23  5:31 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

This patch extends the floating point save and restore code to use the
VSX load/stores when VSX is available.  This makes FP context
save/restore marginally slower for FP-only code when VSX is available,
as it has to load/store 128 bits per register rather than just 64 bits.

Code that mixes FP, VMX and VSX instructions will see consistent
architected state.

The signals interface is extended to enable access to VSR 0-31
doubleword 1 after discussions with tool chain maintainers.  Backward
compatibility is maintained.  

The ptrace interface is also extended to allow access to VSR 0-31 full
registers.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/entry_64.S   |    5 +
 arch/powerpc/kernel/fpu.S        |   16 ++++-
 arch/powerpc/kernel/head_64.S    |   65 +++++++++++++++++++++++
 arch/powerpc/kernel/misc_64.S    |   33 ++++++++++++
 arch/powerpc/kernel/ppc32.h      |    1 
 arch/powerpc/kernel/ppc_ksyms.c  |    3 +
 arch/powerpc/kernel/process.c    |  106 ++++++++++++++++++++++++++++++++++++++-
 arch/powerpc/kernel/ptrace.c     |   70 +++++++++++++++++++++++++
 arch/powerpc/kernel/signal_32.c  |   33 ++++++++++++
 arch/powerpc/kernel/signal_64.c  |   31 +++++++++++
 arch/powerpc/kernel/traps.c      |   29 ++++++++++
 include/asm-powerpc/elf.h        |    6 +-
 include/asm-powerpc/ptrace.h     |   12 ++++
 include/asm-powerpc/reg.h        |    2 
 include/asm-powerpc/sigcontext.h |   37 +++++++++++++
 include/asm-powerpc/system.h     |    9 +++
 include/linux/elf.h              |    1 
 17 files changed, 451 insertions(+), 8 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/entry_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
@@ -353,6 +353,11 @@ _GLOBAL(_switch)
 	mflr	r20		/* Return to switch caller */
 	mfmsr	r22
 	li	r0, MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r0,r0,MSR_VSX@h	/* Disable VSX */
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_ALTIVEC
 BEGIN_FTR_SECTION
 	oris	r0,r0,MSR_VEC@h	/* Disable altivec */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -34,6 +34,11 @@
 _GLOBAL(load_up_fpu)
 	mfmsr	r5
 	ori	r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
 	SYNC
 	MTMSRD(r5)			/* enable use of fpu now */
 	isync
@@ -50,7 +55,7 @@ _GLOBAL(load_up_fpu)
 	beq	1f
 	toreal(r4)
 	addi	r4,r4,THREAD		/* want last_task_used_math->thread */
-	SAVE_32FPRS(0, r4)
+	SAVE_32FPVSRS(0, r5, r4)
 	mffs	fr0
 	stfd	fr0,THREAD_FPSCR(r4)
 	PPC_LL	r5,PT_REGS(r4)
@@ -77,7 +82,7 @@ _GLOBAL(load_up_fpu)
 #endif
 	lfd	fr0,THREAD_FPSCR(r5)
 	MTFSF_L(fr0)
-	REST_32FPRS(0, r5)
+	REST_32FPVSRS(0, r4, r5)
 #ifndef CONFIG_SMP
 	subi	r4,r5,THREAD
 	fromreal(r4)
@@ -96,6 +101,11 @@ _GLOBAL(load_up_fpu)
 _GLOBAL(giveup_fpu)
 	mfmsr	r5
 	ori	r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
 	SYNC_601
 	ISYNC_601
 	MTMSRD(r5)			/* enable use of fpu now */
@@ -106,7 +116,7 @@ _GLOBAL(giveup_fpu)
 	addi	r3,r3,THREAD	        /* want THREAD of task */
 	PPC_LL	r5,PT_REGS(r3)
 	PPC_LCMPI	0,r5,0
-	SAVE_32FPRS(0, r3)
+	SAVE_32FPVSRS(0, r4 ,r3)
 	mffs	fr0
 	stfd	fr0,THREAD_FPSCR(r3)
 	beq	1f
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -278,6 +278,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	. = 0xf20
 	b	altivec_unavailable_pSeries
 
+	. = 0xf40
+	b	vsx_unavailable_pSeries
+
 #ifdef CONFIG_CBE_RAS
 	HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
 #endif /* CONFIG_CBE_RAS */
@@ -297,6 +300,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	/* moved from 0xf00 */
 	STD_EXCEPTION_PSERIES(., performance_monitor)
 	STD_EXCEPTION_PSERIES(., altivec_unavailable)
+	STD_EXCEPTION_PSERIES(., vsx_unavailable)
 
 /*
  * An interrupt came in while soft-disabled; clear EE in SRR1,
@@ -834,6 +838,67 @@ _STATIC(load_up_altivec)
 	blr
 #endif /* CONFIG_ALTIVEC */
 
+	.align	7
+	.globl vsx_unavailable_common
+vsx_unavailable_common:
+	EXCEPTION_PROLOG_COMMON(0xf40, PACA_EXGEN)
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	bne	.load_up_vsx
+1:
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
+	bl	.save_nvgprs
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	ENABLE_INTS
+	bl	.vsx_unavailable_exception
+	b	.ret_from_except
+
+#ifdef CONFIG_VSX
+/*
+ * load_up_vsx(unused, unused, tsk)
+ * Disable VSX for the task which had it previously,
+ * and save its vector registers in its thread_struct.
+ * Reuse the fp and vsx saves, but first check to see if they have
+ * been saved already.
+ * On entry: r13 == 'current' && last_task_used_vsx != 'current'
+ */
+_STATIC(load_up_vsx)
+/* Load FP and VMX registers if they haven't been loaded yet */
+	andi.	r5,r12,MSR_FP
+	beql+	load_up_fpu		/* skip if already loaded */
+	andis.	r5,r12,MSR_VEC@h
+	beql+	load_up_altivec		/* skip if already loaded */
+
+#ifndef CONFIG_SMP
+	ld	r3,last_task_used_vsx@got(r2)
+	ld	r4,0(r3)
+	cmpdi	0,r4,0
+	beq	1f
+	/* Disable VSX for last_task_used_vsx */
+	addi	r4,r4,THREAD
+	ld	r5,PT_REGS(r4)
+	ld	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+	lis	r6,MSR_VSX@h
+	andc	r6,r4,r6
+	std	r6,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#endif /* CONFIG_SMP */
+	ld	r4,PACACURRENT(r13)
+	addi	r4,r4,THREAD		/* Get THREAD */
+	li	r6,1
+	stw	r6,THREAD_USED_VSR(r4) /* ... also set thread used vsr */
+	/* enable use of VSX after return */
+	oris	r12,r12,MSR_VSX@h
+	std	r12,_MSR(r1)
+#ifndef CONFIG_SMP
+	/* Update last_task_used_vsx to 'current' */
+	ld	r4,PACACURRENT(r13)
+	std	r4,0(r3)
+#endif /* CONFIG_SMP */
+	b	fast_exception_return
+#endif /* CONFIG_VSX */
+
 /*
  * Hash table stuff
  */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/misc_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
@@ -506,6 +506,39 @@ _GLOBAL(giveup_altivec)
 
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+/*
+ * giveup_vsx(tsk)
+ * Disable VSX for the task given as the argument,
+ * and save the vector registers in its thread_struct.
+ * Enables the VSX for use in the kernel on return.
+ */
+_GLOBAL(giveup_vsx)
+	mfmsr	r5
+	oris	r5,r5,MSR_VSX@h
+	mtmsrd	r5			/* enable use of VSX now */
+	isync
+
+	cmpdi	0,r3,0
+	beqlr-				/* if no previous owner, done */
+	addi	r3,r3,THREAD		/* want THREAD of task */
+	ld	r5,PT_REGS(r3)
+	cmpdi	0,r5,0
+	beq	1f
+	ld	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+	lis	r3,MSR_VSX@h
+	andc	r4,r4,r3		/* disable VSX for previous task */
+	std	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#ifndef CONFIG_SMP
+	li	r5,0
+	ld	r4,last_task_used_vsx@got(r2)
+	std	r5,0(r4)
+#endif /* CONFIG_SMP */
+	blr
+
+#endif /* CONFIG_VSX */
+
 /* kexec_wait(phys_cpu)
  *
  * wait for the flag to change, indicating this kernel is going away but
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc32.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
@@ -120,6 +120,7 @@ struct mcontext32 {
 	elf_fpregset_t		mc_fregs;
 	unsigned int		mc_pad[2];
 	elf_vrregset_t32	mc_vregs __attribute__((__aligned__(16)));
+	elf_vsrreghalf_t32      mc_vsregs __attribute__((__aligned__(16)));
 };
 
 struct ucontext32 { 
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc_ksyms.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
@@ -102,6 +102,9 @@ EXPORT_SYMBOL(giveup_fpu);
 #ifdef CONFIG_ALTIVEC
 EXPORT_SYMBOL(giveup_altivec);
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+EXPORT_SYMBOL(giveup_vsx);
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 EXPORT_SYMBOL(giveup_spe);
 #endif /* CONFIG_SPE */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -53,6 +53,7 @@ extern unsigned long _get_SP(void);
 #ifndef CONFIG_SMP
 struct task_struct *last_task_used_math = NULL;
 struct task_struct *last_task_used_altivec = NULL;
+struct task_struct *last_task_used_vsx = NULL;
 struct task_struct *last_task_used_spe = NULL;
 #endif
 
@@ -106,11 +107,23 @@ EXPORT_SYMBOL(enable_kernel_fp);
 
 int dump_task_fpu(struct task_struct *tsk, elf_fpregset_t *fpregs)
 {
+#ifdef CONFIG_VSX
+	int i;
+	elf_fpreg_t *reg;
+#endif
+
 	if (!tsk->thread.regs)
 		return 0;
 	flush_fp_to_thread(current);
 
+#ifdef CONFIG_VSX
+	reg = (elf_fpreg_t *)fpregs;
+	for (i = 0; i < ELF_NFPREG - 1; i++, reg++)
+		*reg = tsk->thread.TS_FPR(i);
+	memcpy(reg, &tsk->thread.fpscr, sizeof(elf_fpreg_t));
+#else
 	memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
+#endif
 
 	return 1;
 }
@@ -149,7 +162,7 @@ void flush_altivec_to_thread(struct task
 	}
 }
 
-int dump_task_altivec(struct task_struct *tsk, elf_vrregset_t *vrregs)
+int dump_task_altivec(struct task_struct *tsk, elf_vrreg_t *vrregs)
 {
 	/* ELF_NVRREG includes the VSCR and VRSAVE which we need to save
 	 * separately, see below */
@@ -179,6 +192,79 @@ int dump_task_altivec(struct task_struct
 }
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+#if 0
+/* not currently used, but some crazy RAID module might want to later */
+void enable_kernel_vsx(void)
+{
+	WARN_ON(preemptible());
+
+#ifdef CONFIG_SMP
+	if (current->thread.regs && (current->thread.regs->msr & MSR_VSX))
+		giveup_vsx(current);
+	else
+		giveup_vsx(NULL);	/* just enable vsx for kernel - force */
+#else
+	giveup_vsx(last_task_used_vsx);
+#endif /* CONFIG_SMP */
+}
+EXPORT_SYMBOL(enable_kernel_vsx);
+#endif
+
+void flush_vsx_to_thread(struct task_struct *tsk)
+{
+	if (tsk->thread.regs) {
+		preempt_disable();
+		if (tsk->thread.regs->msr & MSR_VSX) {
+#ifdef CONFIG_SMP
+			BUG_ON(tsk != current);
+#endif
+			giveup_vsx(tsk);
+		}
+		preempt_enable();
+	}
+}
+
+/*
+ * This dumps the full 128 bits of the first 32 VSX registers.  This
+ * needs to be called with dump_task_fpu and dump_task_altivec to get
+ * all the VSX state.
+ */
+int dump_task_vsx(struct task_struct *tsk, elf_vrreg_t *vrregs)
+{
+	/* Grab only the first half */
+	const int nregs = 32;
+	elf_vrreg_t *reg;
+
+	if (tsk == current)
+		flush_vsx_to_thread(tsk);
+
+	reg = (elf_vrreg_t *)vrregs;
+
+	/* copy the first 32 vsr registers */
+	memcpy(reg, &tsk->thread.vr[0], nregs * sizeof(*reg));
+
+	return 1;
+}
+#endif /* CONFIG_VSX */
+
+int dump_task_vector(struct task_struct *tsk, elf_vrregset_t *vrregs)
+{
+	int rc = 0;
+	elf_vrreg_t *regs = (elf_vrreg_t *)vrregs;
+#ifdef CONFIG_ALTIVEC
+	rc = dump_task_altivec(tsk, regs);
+	if (rc)
+		return rc;
+	regs += ELF_NVRREG;
+#endif
+
+#ifdef CONFIG_VSX
+	rc = dump_task_vsx(tsk, regs);
+#endif
+	return rc;
+}
+
 #ifdef CONFIG_SPE
 
 void enable_kernel_spe(void)
@@ -233,6 +319,10 @@ void discard_lazy_cpu_state(void)
 	if (last_task_used_altivec == current)
 		last_task_used_altivec = NULL;
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	if (last_task_used_vsx == current)
+		last_task_used_vsx = NULL;
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	if (last_task_used_spe == current)
 		last_task_used_spe = NULL;
@@ -297,6 +387,10 @@ struct task_struct *__switch_to(struct t
 	if (prev->thread.regs && (prev->thread.regs->msr & MSR_VEC))
 		giveup_altivec(prev);
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	if (prev->thread.regs && (prev->thread.regs->msr & MSR_VSX))
+		giveup_vsx(prev);
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	/*
 	 * If the previous thread used spe in the last quantum
@@ -317,6 +411,10 @@ struct task_struct *__switch_to(struct t
 	if (new->thread.regs && last_task_used_altivec == new)
 		new->thread.regs->msr |= MSR_VEC;
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	if (new->thread.regs && last_task_used_vsx == new)
+		new->thread.regs->msr |= MSR_VSX;
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	/* Avoid the trap.  On smp this this never happens since
 	 * we don't set last_task_used_spe
@@ -417,6 +515,8 @@ static struct regbit {
 	{MSR_EE,	"EE"},
 	{MSR_PR,	"PR"},
 	{MSR_FP,	"FP"},
+	{MSR_VEC,	"VEC"},
+	{MSR_VSX,	"VSX"},
 	{MSR_ME,	"ME"},
 	{MSR_IR,	"IR"},
 	{MSR_DR,	"DR"},
@@ -534,6 +634,7 @@ void prepare_to_copy(struct task_struct 
 {
 	flush_fp_to_thread(current);
 	flush_altivec_to_thread(current);
+	flush_vsx_to_thread(current);
 	flush_spe_to_thread(current);
 }
 
@@ -689,6 +790,9 @@ void start_thread(struct pt_regs *regs, 
 #endif
 
 	discard_lazy_cpu_state();
+#ifdef CONFIG_VSX
+	current->thread.used_vsr = 0;
+#endif
 	memset(current->thread.fpr, 0,
 	       sizeof(current->thread.fpr));
 	current->thread.fpscr.val = 0;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -351,6 +351,51 @@ static int vr_set(struct task_struct *ta
 }
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+/*
+ * Currently, to set and get all the VSX state, you need to call
+ * the FP and VMX calls as well.  This only gets/sets the lower 32
+ * 128-bit VSX registers.
+ */
+
+static int vsr_active(struct task_struct *target,
+		      const struct user_regset *regset)
+{
+	flush_vsx_to_thread(target);
+	return target->thread.used_vsr ? regset->n : 0;
+}
+
+static int vsr_get(struct task_struct *target, const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   void *kbuf, void __user *ubuf)
+{
+	int ret;
+
+	flush_vsx_to_thread(target);
+
+	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+				  target->thread.fpr, 0,
+				  32 * sizeof(vector128));
+
+	return ret;
+}
+
+static int vsr_set(struct task_struct *target, const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   const void *kbuf, const void __user *ubuf)
+{
+	int ret;
+
+	flush_vsx_to_thread(target);
+
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+				 target->thread.fpr, 0,
+				 32 * sizeof(vector128));
+
+	return ret;
+}
+#endif /* CONFIG_VSX */
+
 #ifdef CONFIG_SPE
 
 /*
@@ -427,6 +472,9 @@ enum powerpc_regset {
 #ifdef CONFIG_ALTIVEC
 	REGSET_VMX,
 #endif
+#ifdef CONFIG_VSX
+	REGSET_VSX,
+#endif
 #ifdef CONFIG_SPE
 	REGSET_SPE,
 #endif
@@ -450,6 +498,13 @@ static const struct user_regset native_r
 		.active = vr_active, .get = vr_get, .set = vr_set
 	},
 #endif
+#ifdef CONFIG_VSX
+	[REGSET_VSX] = {
+		.core_note_type = NT_PPC_VSX, .n = 34,
+		.size = sizeof(vector128), .align = sizeof(vector128),
+		.active = vsr_active, .get = vsr_get, .set = vsr_set
+	},
+#endif
 #ifdef CONFIG_SPE
 	[REGSET_SPE] = {
 		.n = 35,
@@ -850,6 +905,21 @@ long arch_ptrace(struct task_struct *chi
 						 sizeof(u32)),
 					     (const void __user *) data);
 #endif
+#ifdef CONFIG_VSX
+	case PTRACE_GETVSRREGS:
+		return copy_regset_to_user(child, &user_ppc_native_view,
+					   REGSET_VSX,
+					   0, (32 * sizeof(vector128) +
+					       sizeof(u32)),
+					   (void __user *) data);
+
+	case PTRACE_SETVSRREGS:
+		return copy_regset_from_user(child, &user_ppc_native_view,
+					     REGSET_VSX,
+					     0, (32 * sizeof(vector128) +
+						 sizeof(u32)),
+					     (const void __user *) data);
+#endif
 #ifdef CONFIG_SPE
 	case PTRACE_GETEVRREGS:
 		/* Get the child spe register state. */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -378,6 +378,21 @@ static int save_user_regs(struct pt_regs
 	memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
 	if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
 		return 1;
+	/*
+	 * Copy VSR 0-31 doubleword 1 from thread_struct to local
+	 * buffer, then write that to userspace.  Also set MSR_VSX in
+	 * the saved MSR value to indicate that frame->mc_vsregs
+	 * contains valid data.
+	 */
+	if (current->thread.used_vsr) {
+		flush_vsx_to_thread(current);
+		for (i = 0; i < 32 ; i++)
+			buf[i] = current->thread.fpr[i][TS_VSRLOWOFFSET];
+		if (__copy_to_user(&frame->mc_vsregs, buf,
+				   ELF_NVSRHALFREG  * sizeof(double)))
+			return 1;
+		msr |= MSR_VSX;
+	}
 #else
 	/* save floating-point registers */
 	if (__copy_to_user(&frame->mc_fregs, current->thread.fpr,
@@ -482,6 +497,24 @@ static long restore_user_regs(struct pt_
 	for (i = 0; i < 32 ; i++)
 		current->thread.TS_FPR(i) = buf[i];
 	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+	/*
+	 * Force the process to reload the VSX registers from
+	 * current->thread when it next executes a VSX instruction.
+	 */
+	regs->msr &= ~MSR_VSX;
+	if (msr & MSR_VSX) {
+		/*
+		 * Restore VSR 0-31 doubleword 1 from the stack to a local
+		 * buffer, then write this out to the thread_struct.
+		 */
+		if (__copy_from_user(buf, &sr->mc_vsregs,
+				     sizeof(sr->mc_vsregs)))
+			return 1;
+		for (i = 0; i < 32 ; i++)
+			current->thread.fpr[i][TS_VSRLOWOFFSET] = buf[i];
+	} else if (current->thread.used_vsr)
+		for (i = 0; i < 32 ; i++)
+			current->thread.fpr[i][TS_VSRLOWOFFSET] = 0;
 #else
 	if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
 			     sizeof(sr->mc_fregs)))
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -123,6 +123,22 @@ static long setup_sigcontext(struct sigc
 		buf[i] = current->thread.TS_FPR(i);
 	memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
 	err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+	/*
+	 * Copy VSX low doubleword to local buffer for formatting,
+	 * then out to userspace.  Update v_regs to point after the
+	 * VMX data.
+	 */
+	if (current->thread.used_vsr) {
+		flush_vsx_to_thread(current);
+		v_regs += ELF_NVRREG;
+		for (i = 0; i < 32 ; i++)
+			buf[i] = current->thread.fpr[i][TS_VSRLOWOFFSET];
+		err |= __copy_to_user(v_regs, buf, 32 * sizeof(double));
+		/* set MSR_VSX in the MSR value in the frame to
+		 * indicate that the VSX data after the VMX data is valid.
+		 */
+		msr |= MSR_VSX;
+	}
 #else /* CONFIG_VSX */
 	/* copy fpr regs and fpscr */
 	err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
@@ -197,7 +213,7 @@ static long restore_sigcontext(struct pt
 	 * This has to be done before copying stuff into current->thread.fpr/vr
 	 * for the reasons explained in the previous comment.
 	 */
-	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
+	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC | MSR_VSX);
 
 #ifdef CONFIG_ALTIVEC
 	err |= __get_user(v_regs, &sc->v_regs);
@@ -226,6 +242,19 @@ static long restore_sigcontext(struct pt
 		current->thread.TS_FPR(i) = buf[i];
 	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
 
+	/*
+	 * Get additional VSX data. Update v_regs to point after the
+	 * VMX data.  Copy VSX low doubleword from userspace to local
+	 * buffer for formatting, then into the taskstruct.
+	 */
+	v_regs += ELF_NVRREG;
+	if ((msr & MSR_VSX) != 0)
+		err |= __copy_from_user(buf, v_regs, 32 * sizeof(double));
+	else
+		memset(buf, 0, 32 * sizeof(double));
+
+	for (i = 0; i < 32 ; i++)
+		current->thread.fpr[i][TS_VSRLOWOFFSET] = buf[i];
 #else
 	err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
 #endif
Index: linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/traps.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
@@ -967,6 +967,20 @@ void altivec_unavailable_exception(struc
 	die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT);
 }
 
+void vsx_unavailable_exception(struct pt_regs *regs)
+{
+	if (user_mode(regs)) {
+		/* A user program has executed a VSX instruction,
+		   but this kernel doesn't support VSX. */
+		_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+		return;
+	}
+
+	printk(KERN_EMERG "Unrecoverable VSX Unavailable Exception "
+			"%lx at %lx\n", regs->trap, regs->nip);
+	die("Unrecoverable VSX Unavailable Exception", regs, SIGABRT);
+}
+
 void performance_monitor_exception(struct pt_regs *regs)
 {
 	perf_irq(regs);
@@ -1091,6 +1105,21 @@ void altivec_assist_exception(struct pt_
 }
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+void vsx_assist_exception(struct pt_regs *regs)
+{
+	if (!user_mode(regs)) {
+		printk(KERN_EMERG "VSX assist exception in kernel mode"
+		       " at %lx\n", regs->nip);
+		die("Kernel VSX assist exception", regs, SIGILL);
+	}
+
+	flush_vsx_to_thread(current);
+	printk(KERN_INFO "VSX assist not supported at %lx\n", regs->nip);
+	_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+}
+#endif /* CONFIG_VSX */
+
 #ifdef CONFIG_FSL_BOOKE
 void CacheLockingException(struct pt_regs *regs, unsigned long address,
 			   unsigned long error_code)
Index: linux-2.6-ozlabs/include/asm-powerpc/elf.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/elf.h
+++ linux-2.6-ozlabs/include/asm-powerpc/elf.h
@@ -109,6 +109,7 @@ typedef elf_gregset_t32 compat_elf_gregs
 #ifdef __powerpc64__
 # define ELF_NVRREG32	33	/* includes vscr & vrsave stuffed together */
 # define ELF_NVRREG	34	/* includes vscr & vrsave in split vectors */
+# define ELF_NVSRHALFREG 32	/* Half the vsx registers */
 # define ELF_GREG_TYPE	elf_greg_t64
 #else
 # define ELF_NEVRREG	34	/* includes acc (as 2) */
@@ -158,6 +159,7 @@ typedef __vector128 elf_vrreg_t;
 typedef elf_vrreg_t elf_vrregset_t[ELF_NVRREG];
 #ifdef __powerpc64__
 typedef elf_vrreg_t elf_vrregset_t32[ELF_NVRREG32];
+typedef elf_fpreg_t elf_vsrreghalf_t32[ELF_NVSRHALFREG];
 #endif
 
 #ifdef __KERNEL__
@@ -219,8 +221,8 @@ extern int dump_task_fpu(struct task_str
 typedef elf_vrregset_t elf_fpxregset_t;
 
 #ifdef CONFIG_ALTIVEC
-extern int dump_task_altivec(struct task_struct *, elf_vrregset_t *vrregs);
-#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_altivec(tsk, regs)
+extern int dump_task_vector(struct task_struct *, elf_vrregset_t *vrregs);
+#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_vector(tsk, regs)
 #define ELF_CORE_XFPREG_TYPE NT_PPC_VMX
 #endif
 
Index: linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ptrace.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
@@ -223,6 +223,14 @@ extern void user_disable_single_step(str
 #define PT_VRSAVE_32 (PT_VR0 + 33*4)
 #endif
 
+/*
+ * Only store the first 32 VSRs here. The second 32 VSRs are in VR0-31
+ */
+#define PT_VSR0 150	/* each VSR reg occupies 2 slots in 64-bit */
+#define PT_VSR31 (PT_VSR0 + 2*31)
+#ifdef __KERNEL__
+#define PT_VSR0_32 300 	/* each VSR reg occupies 4 slots in 32-bit */
+#endif
 #endif /* __powerpc64__ */
 
 /*
@@ -245,6 +253,10 @@ extern void user_disable_single_step(str
 #define PTRACE_GETEVRREGS	20
 #define PTRACE_SETEVRREGS	21
 
+/* Get the first 32 128bit VSX registers */
+#define PTRACE_GETVSRREGS	27
+#define PTRACE_SETVSRREGS	28
+
 /*
  * Get or set a debug register. The first 16 are DABR registers and the
  * second 16 are IABR registers.
Index: linux-2.6-ozlabs/include/asm-powerpc/reg.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/reg.h
+++ linux-2.6-ozlabs/include/asm-powerpc/reg.h
@@ -30,6 +30,7 @@
 #define MSR_ISF_LG	61              /* Interrupt 64b mode valid on 630 */
 #define MSR_HV_LG 	60              /* Hypervisor state */
 #define MSR_VEC_LG	25	        /* Enable AltiVec */
+#define MSR_VSX_LG	23		/* Enable VSX */
 #define MSR_POW_LG	18		/* Enable Power Management */
 #define MSR_WE_LG	18		/* Wait State Enable */
 #define MSR_TGPR_LG	17		/* TLB Update registers in use */
@@ -71,6 +72,7 @@
 #endif
 
 #define MSR_VEC		__MASK(MSR_VEC_LG)	/* Enable AltiVec */
+#define MSR_VSX		__MASK(MSR_VSX_LG)	/* Enable VSX */
 #define MSR_POW		__MASK(MSR_POW_LG)	/* Enable Power Management */
 #define MSR_WE		__MASK(MSR_WE_LG)	/* Wait State Enable */
 #define MSR_TGPR	__MASK(MSR_TGPR_LG)	/* TLB Update registers in use */
Index: linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/sigcontext.h
+++ linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
@@ -43,9 +43,44 @@ struct sigcontext {
  * it must be copied via a vector register to/from storage) or as a word.
  * The entry with index 33 contains the vrsave as the first word (offset 0)
  * within the quadword.
+ *
+ * Part of the VSX data is also stored here, by extending vmx_reserve
+ * by an additional 32 doublewords.  Architecturally, the layout of
+ * the VSR registers and how they overlap on top of the legacy FPR and
+ * VR registers is shown below:
+ *
+ *                    VSR doubleword 0               VSR doubleword 1
+ *           ----------------------------------------------------------------
+ *   VSR[0]  |             FPR[0]            |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[1]  |             FPR[1]            |                              |
+ *           ----------------------------------------------------------------
+ *           |              ...              |                              |
+ *           |              ...              |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[30] |             FPR[30]           |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[31] |             FPR[31]           |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[32] |                             VR[0]                            |
+ *           ----------------------------------------------------------------
+ *   VSR[33] |                             VR[1]                            |
+ *           ----------------------------------------------------------------
+ *           |                              ...                             |
+ *           |                              ...                             |
+ *           ----------------------------------------------------------------
+ *   VSR[62] |                             VR[30]                           |
+ *           ----------------------------------------------------------------
+ *   VSR[63] |                             VR[31]                           |
+ *           ----------------------------------------------------------------
+ *
+ * FPR/VSR 0-31 doubleword 0 is stored in fp_regs, and VMX/VSR 32-63
+ * is stored at the start of vmx_reserve.  vmx_reserve is extended for
+ * backwards compatibility to store VSR 0-31 doubleword 1 after the VMX
+ * registers and vscr/vrsave.
  */
 	elf_vrreg_t	__user *v_regs;
-	long		vmx_reserve[ELF_NVRREG+ELF_NVRREG+1];
+	long		vmx_reserve[ELF_NVRREG+ELF_NVRREG+32+1];
 #endif
 };
 
Index: linux-2.6-ozlabs/include/asm-powerpc/system.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/system.h
+++ linux-2.6-ozlabs/include/asm-powerpc/system.h
@@ -132,6 +132,7 @@ extern void enable_kernel_altivec(void);
 extern void giveup_altivec(struct task_struct *);
 extern void load_up_altivec(struct task_struct *);
 extern int emulate_altivec(struct pt_regs *);
+extern void giveup_vsx(struct task_struct *);
 extern void enable_kernel_spe(void);
 extern void giveup_spe(struct task_struct *);
 extern void load_up_spe(struct task_struct *);
@@ -155,6 +156,14 @@ static inline void flush_altivec_to_thre
 }
 #endif
 
+#ifdef CONFIG_VSX
+extern void flush_vsx_to_thread(struct task_struct *);
+#else
+static inline void flush_vsx_to_thread(struct task_struct *t)
+{
+}
+#endif
+
 #ifdef CONFIG_SPE
 extern void flush_spe_to_thread(struct task_struct *);
 #else
Index: linux-2.6-ozlabs/include/linux/elf.h
===================================================================
--- linux-2.6-ozlabs.orig/include/linux/elf.h
+++ linux-2.6-ozlabs/include/linux/elf.h
@@ -358,6 +358,7 @@ typedef struct elf64_shdr {
 #define NT_PRXFPREG     0x46e62b7f      /* copied from gdb5.1/include/elf/common.h */
 #define NT_PPC_VMX	0x100		/* PowerPC Altivec/VMX registers */
 #define NT_PPC_SPE	0x101		/* PowerPC SPE/EVR registers */
+#define NT_PPC_VSX	0x102		/* PowerPC VSX registers */
 #define NT_386_TLS	0x200		/* i386 TLS slots (struct user_desc) */
 
 

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 9/9] powerpc: Add CONFIG_VSX config option
  2008-06-23  5:31   ` Michael Neuling
                       ` (7 preceding siblings ...)
  2008-06-23  5:31     ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
@ 2008-06-23  5:31     ` Michael Neuling
  2008-06-23  7:38     ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
  9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23  5:31 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Add the CONFIG_VSX build option.  It depends on POWER4, PPC_FPU and ALTIVEC.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/platforms/Kconfig.cputype |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Index: linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/platforms/Kconfig.cputype
+++ linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
@@ -171,6 +171,22 @@ config VSX
 
 	  If in doubt, say Y here.
 
+config VSX
+	bool "VSX Support"
+	depends on POWER4 && ALTIVEC && PPC_FPU
+	---help---
+
+	  This option enables kernel support for the Vector Scalar extensions
+	  to the PowerPC processor. The kernel currently supports saving and
+	  restoring VSX registers, and turning on the 'VSX enable' bit so user
+	  processes can execute VSX instructions.
+
+	  This option is only useful if you have a processor that supports
+	  VSX (P7 and above), but does not have any effect on non-VSX
+	  CPUs (it does, however, add code to the kernel).
+
+	  If in doubt, say Y here.
+
 config SPE
 	bool "SPE Support"
 	depends on E200 || E500

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 6/9] powerpc: Add VSX CPU feature
  2008-06-23  5:31   ` Michael Neuling
                       ` (4 preceding siblings ...)
  2008-06-23  5:31     ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
@ 2008-06-23  5:31     ` Michael Neuling
  2008-06-23  5:31     ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
                       ` (3 subsequent siblings)
  9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23  5:31 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Add a VSX CPU feature.  Also add code to detect if VSX is available
from the device tree.
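
As a rough illustration of the device tree detection, here is a minimal
userspace sketch.  It assumes a feature is enabled when the property value
is at least the table entry's minimum, so that "ibm,vmx" == 2 enables both
Altivec and VSX.  The EX_* names and bit values are hypothetical and are
not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical feature bits for illustration; not the kernel's CPU_FTR_* values. */
#define EX_FTR_ALTIVEC 0x1UL
#define EX_FTR_VSX     0x2UL

/* Hypothetical stand-in for one row of a feature property table. */
struct ex_feature_property {
	const char *name;        /* device tree property name */
	unsigned int min_value;  /* minimum property value that enables the feature */
	unsigned long feature;   /* feature bit to set */
};

static const struct ex_feature_property ex_props[] = {
	{ "ibm,vmx", 1, EX_FTR_ALTIVEC },
	{ "ibm,vmx", 2, EX_FTR_VSX },
};

/* Return the feature bits enabled by one property/value pair. */
unsigned long ex_features_for(const char *name, unsigned int value)
{
	unsigned long features = 0;
	size_t i;

	for (i = 0; i < sizeof(ex_props) / sizeof(ex_props[0]); i++) {
		/* Assumption: a row matches when the value is >= its minimum. */
		if (strcmp(ex_props[i].name, name) == 0 &&
		    value >= ex_props[i].min_value)
			features |= ex_props[i].feature;
	}
	return features;
}

/* Small usage example. */
void ex_feature_demo(void)
{
	assert(ex_features_for("ibm,vmx", 1) == EX_FTR_ALTIVEC);
	assert(ex_features_for("ibm,vmx", 2) == (EX_FTR_ALTIVEC | EX_FTR_VSX));
}
```

Under that assumption, a value of 2 passes both rows, so a VSX-capable
CPU is also reported as Altivec-capable.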

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>

---

 arch/powerpc/kernel/prom.c     |    4 ++++
 include/asm-powerpc/cputable.h |   15 ++++++++++++++-
 2 files changed, 18 insertions(+), 1 deletion(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/prom.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
@@ -609,6 +609,10 @@ static struct feature_property {
 	{"altivec", 0, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
 	{"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	/* Yes, this _really_ is ibm,vmx == 2 to enable VSX */
+	{"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_PPC64
 	{"ibm,dfp", 1, 0, PPC_FEATURE_HAS_DFP},
 	{"ibm,purr", 1, CPU_FTR_PURR, 0},
Index: linux-2.6-ozlabs/include/asm-powerpc/cputable.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/cputable.h
+++ linux-2.6-ozlabs/include/asm-powerpc/cputable.h
@@ -27,6 +27,7 @@
 #define PPC_FEATURE_HAS_DFP		0x00000400
 #define PPC_FEATURE_POWER6_EXT		0x00000200
 #define PPC_FEATURE_ARCH_2_06		0x00000100
+#define PPC_FEATURE_HAS_VSX		0x00000080
 
 #define PPC_FEATURE_TRUE_LE		0x00000002
 #define PPC_FEATURE_PPC_LE		0x00000001
@@ -181,6 +182,7 @@ extern void do_feature_fixups(unsigned l
 #define CPU_FTR_DSCR			LONG_ASM_CONST(0x0002000000000000)
 #define CPU_FTR_1T_SEGMENT		LONG_ASM_CONST(0x0004000000000000)
 #define CPU_FTR_NO_SLBIE_B		LONG_ASM_CONST(0x0008000000000000)
+#define CPU_FTR_VSX			LONG_ASM_CONST(0x0010000000000000)
 
 #ifndef __ASSEMBLY__
 
@@ -199,6 +201,17 @@ extern void do_feature_fixups(unsigned l
 #define PPC_FEATURE_HAS_ALTIVEC_COMP    0
 #endif
 
+/* We only set the VSX features if the kernel was compiled with VSX
+ * support
+ */
+#ifdef CONFIG_VSX
+#define CPU_FTR_VSX_COMP	CPU_FTR_VSX
+#define PPC_FEATURE_HAS_VSX_COMP PPC_FEATURE_HAS_VSX
+#else
+#define CPU_FTR_VSX_COMP	0
+#define PPC_FEATURE_HAS_VSX_COMP    0
+#endif
+
 /* We only set the spe features if the kernel was compiled with spe
  * support
  */
@@ -399,7 +412,7 @@ extern void do_feature_fixups(unsigned l
 	    (CPU_FTRS_POWER3 | CPU_FTRS_RS64 | CPU_FTRS_POWER4 |	\
 	    CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | CPU_FTRS_POWER6 |	\
 	    CPU_FTRS_POWER7 | CPU_FTRS_CELL | CPU_FTRS_PA6T |		\
-	    CPU_FTR_1T_SEGMENT)
+	    CPU_FTR_1T_SEGMENT | CPU_FTR_VSX)
 #else
 enum {
 	CPU_FTRS_POSSIBLE =

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
  2008-06-23  7:38     ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
  2008-06-23  7:38       ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
@ 2008-06-23  7:38       ` Michael Neuling
  2008-06-23 14:46         ` Kumar Gala
  2008-06-23  7:38       ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
                         ` (7 subsequent siblings)
  9 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-23  7:38 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

If we set the SPE MSR bit in save_user_regs we can blow away the VEC
bit.  This will never happen in reality (VMX and SPE will never be in
the same processor as their opcodes overlap), but it looks bad.  Also
when we add VSX here in a later patch, we can hit two of these at the
same time.  
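
To illustrate the difference, here is a minimal userspace sketch of the two
patterns.  The EX_* flag values and function names are hypothetical and
are not the kernel's MSR bits or signal code.

```c
#include <assert.h>

/* Hypothetical flag bits for illustration; not the kernel's MSR_* values. */
#define EX_MSR_VEC 0x1UL
#define EX_MSR_VSX 0x2UL

/*
 * Old pattern: each facility stores (regs_msr | its own bit), so a later
 * store replaces the value written by an earlier one.
 */
unsigned long ex_save_msr_overwrite(unsigned long regs_msr,
				    int used_vec, int used_vsx)
{
	unsigned long saved = regs_msr;

	if (used_vec)
		saved = regs_msr | EX_MSR_VEC;
	if (used_vsx)
		saved = regs_msr | EX_MSR_VSX;	/* loses EX_MSR_VEC */
	return saved;
}

/* New pattern: accumulate the bits in a local and store the result once. */
unsigned long ex_save_msr_accumulate(unsigned long regs_msr,
				     int used_vec, int used_vsx)
{
	unsigned long msr = regs_msr;

	if (used_vec)
		msr |= EX_MSR_VEC;
	if (used_vsx)
		msr |= EX_MSR_VSX;
	return msr;
}

/* Small usage example. */
void ex_msr_demo(void)
{
	assert(ex_save_msr_overwrite(0, 1, 1) == EX_MSR_VSX);
	assert(ex_save_msr_accumulate(0, 1, 1) == (EX_MSR_VEC | EX_MSR_VSX));
}
```

With a single facility in use the two patterns give the same result; they
only differ once two bits need to be set together.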

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/signal_32.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -336,6 +336,8 @@ struct rt_sigframe {
 static int save_user_regs(struct pt_regs *regs, struct mcontext __user *frame,
 		int sigret)
 {
+	unsigned long msr = regs->msr;
+
 	/* Make sure floating point registers are stored in regs */
 	flush_fp_to_thread(current);
 
@@ -354,8 +356,7 @@ static int save_user_regs(struct pt_regs
 			return 1;
 		/* set MSR_VEC in the saved MSR value to indicate that
 		   frame->mc_vregs contains valid data */
-		if (__put_user(regs->msr | MSR_VEC, &frame->mc_gregs[PT_MSR]))
-			return 1;
+		msr |= MSR_VEC;
 	}
 	/* else assert((regs->msr & MSR_VEC) == 0) */
 
@@ -377,8 +378,7 @@ static int save_user_regs(struct pt_regs
 			return 1;
 		/* set MSR_SPE in the saved MSR value to indicate that
 		   frame->mc_vregs contains valid data */
-		if (__put_user(regs->msr | MSR_SPE, &frame->mc_gregs[PT_MSR]))
-			return 1;
+		msr |= MSR_SPE;
 	}
 	/* else assert((regs->msr & MSR_SPE) == 0) */
 
@@ -387,6 +387,8 @@ static int save_user_regs(struct pt_regs
 		return 1;
 #endif /* CONFIG_SPE */
 
+	if (__put_user(msr, &frame->mc_gregs[PT_MSR]))
+		return 1;
 	if (sigret) {
 		/* Set up the sigreturn trampoline: li r0,sigret; sc */
 		if (__put_user(0x38000000UL + sigret, &frame->tramp[0])

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX.
  2008-06-23  5:31   ` Michael Neuling
                       ` (8 preceding siblings ...)
  2008-06-23  5:31     ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
@ 2008-06-23  7:38     ` Michael Neuling
  2008-06-23  7:38       ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
                         ` (9 more replies)
  9 siblings, 10 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23  7:38 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

The following set of patches adds Vector Scalar Extensions (VSX)
support for POWER7.  Includes context switch, ptrace and signals support.

Signed-off-by: Michael Neuling <mikey@neuling.org>
--- 
Paulus: please consider for your 2.6.27 tree.

Updates in this post:
- Fixed ptrace 32 error noticed by paulus.
- Fixed calling of load_up_altivec in head_64.S also noticed by paulus

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
  2008-06-23  7:38     ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
@ 2008-06-23  7:38       ` Michael Neuling
  2008-06-23  7:38       ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
                         ` (8 subsequent siblings)
  9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23  7:38 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

We are going to change where the floating point registers are stored
in the thread_struct, so in preparation add some macros to access the
floating point registers.  Update all code to use these new macros.
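
As a rough sketch of why the macros help, the userspace example below
assumes a possible later layout where each register slot holds two
doublewords.  The EX_* names are hypothetical stand-ins for TS_FPR and
TS_FPRSPACING, and struct ex_thread is not the real thread_struct.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: doublewords per register slot (this patch itself uses 1). */
#define EX_FPRSPACING 2

/* Hypothetical stand-in for the FP part of thread_struct. */
struct ex_thread {
	double fpr[32][EX_FPRSPACING];
};

/* Field-access macro: callers never spell out the storage layout. */
#define EX_TS_FPR(i) fpr[i][0]

double ex_get_fpr(const struct ex_thread *t, int reg)
{
	return t->EX_TS_FPR(reg);
}

void ex_set_fpr(struct ex_thread *t, int reg, double val)
{
	t->EX_TS_FPR(reg) = val;
}

/* Index of register 'reg' when the storage is viewed as a flat double array. */
int ex_flat_index(int reg)
{
	return EX_FPRSPACING * reg;
}

/* Small usage example. */
void ex_fpr_demo(void)
{
	struct ex_thread t;

	memset(&t, 0, sizeof(t));
	ex_set_fpr(&t, 3, 1.5);
	assert(ex_get_fpr(&t, 3) == 1.5);
	/* The flat view needs the spacing factor to find the same value. */
	assert(((double *)t.fpr)[ex_flat_index(3)] == 1.5);
}
```

Changing EX_FPRSPACING or the macro body changes the layout without
touching any of the callers, which is the point of converting everything
to the accessor first.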

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/align.c      |    6 ++--
 arch/powerpc/kernel/process.c    |    5 ++-
 arch/powerpc/kernel/ptrace.c     |   14 +++++----
 arch/powerpc/kernel/ptrace32.c   |   14 +++++++--
 arch/powerpc/kernel/softemu8xx.c |    4 +-
 arch/powerpc/math-emu/math.c     |   56 +++++++++++++++++++--------------------
 include/asm-powerpc/ppc_asm.h    |    5 ++-
 include/asm-powerpc/processor.h  |    3 ++
 8 files changed, 61 insertions(+), 46 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/align.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/align.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/align.c
@@ -366,7 +366,7 @@ static int emulate_multiple(struct pt_re
 static int emulate_fp_pair(struct pt_regs *regs, unsigned char __user *addr,
 			   unsigned int reg, unsigned int flags)
 {
-	char *ptr = (char *) &current->thread.fpr[reg];
+	char *ptr = (char *) &current->thread.TS_FPR(reg);
 	int i, ret;
 
 	if (!(flags & F))
@@ -784,7 +784,7 @@ int fix_alignment(struct pt_regs *regs)
 				return -EFAULT;
 		}
 	} else if (flags & F) {
-		data.dd = current->thread.fpr[reg];
+		data.dd = current->thread.TS_FPR(reg);
 		if (flags & S) {
 			/* Single-precision FP store requires conversion... */
 #ifdef CONFIG_PPC_FPU
@@ -862,7 +862,7 @@ int fix_alignment(struct pt_regs *regs)
 		if (unlikely(ret))
 			return -EFAULT;
 	} else if (flags & F)
-		current->thread.fpr[reg] = data.dd;
+		current->thread.TS_FPR(reg) = data.dd;
 	else
 		regs->gpr[reg] = data.ll;
 
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -110,7 +110,7 @@ int dump_task_fpu(struct task_struct *ts
 		return 0;
 	flush_fp_to_thread(current);
 
-	memcpy(fpregs, &tsk->thread.fpr[0], sizeof(*fpregs));
+	memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
 
 	return 1;
 }
@@ -689,7 +689,8 @@ void start_thread(struct pt_regs *regs, 
 #endif
 
 	discard_lazy_cpu_state();
-	memset(current->thread.fpr, 0, sizeof(current->thread.fpr));
+	memset(current->thread.fpr, 0,
+	       sizeof(current->thread.fpr));
 	current->thread.fpscr.val = 0;
 #ifdef CONFIG_ALTIVEC
 	memset(current->thread.vr, 0, sizeof(current->thread.vr));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -218,10 +218,10 @@ static int fpr_get(struct task_struct *t
 	flush_fp_to_thread(target);
 
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
-		     offsetof(struct thread_struct, fpr[32]));
+		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
-				   &target->thread.fpr, 0, -1);
+				   target->thread.fpr, 0, -1);
 }
 
 static int fpr_set(struct task_struct *target, const struct user_regset *regset,
@@ -231,10 +231,10 @@ static int fpr_set(struct task_struct *t
 	flush_fp_to_thread(target);
 
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
-		     offsetof(struct thread_struct, fpr[32]));
+		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
-				  &target->thread.fpr, 0, -1);
+				  target->thread.fpr, 0, -1);
 }
 
 
@@ -728,7 +728,8 @@ long arch_ptrace(struct task_struct *chi
 			tmp = ptrace_get_reg(child, (int) index);
 		} else {
 			flush_fp_to_thread(child);
-			tmp = ((unsigned long *)child->thread.fpr)[index - PT_FPR0];
+			tmp = ((unsigned long *)child->thread.fpr)
+				[TS_FPRSPACING * (index - PT_FPR0)];
 		}
 		ret = put_user(tmp,(unsigned long __user *) data);
 		break;
@@ -755,7 +756,8 @@ long arch_ptrace(struct task_struct *chi
 			ret = ptrace_put_reg(child, index, data);
 		} else {
 			flush_fp_to_thread(child);
-			((unsigned long *)child->thread.fpr)[index - PT_FPR0] = data;
+			((unsigned long *)child->thread.fpr)
+				[TS_FPRSPACING * (index - PT_FPR0)] = data;
 			ret = 0;
 		}
 		break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
@@ -64,6 +64,11 @@ static long compat_ptrace_old(struct tas
 	return -EPERM;
 }
 
+/* Macros to work out the correct index for the FPR in the thread struct */
+#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
+#define FPRHALF(i) (((i) - PT_FPR0) % 2)
+#define FPRINDEX(i) (TS_FPRSPACING * FPRNUMBER(i) * 2 + FPRHALF(i))
+
 long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
 			compat_ulong_t caddr, compat_ulong_t cdata)
 {
@@ -122,7 +127,8 @@ long compat_arch_ptrace(struct task_stru
 			 * to be an array of unsigned int (32 bits) - the
 			 * index passed in is based on this assumption.
 			 */
-			tmp = ((unsigned int *)child->thread.fpr)[index - PT_FPR0];
+			tmp = ((unsigned int *)child->thread.fpr)
+				[FPRINDEX(index)];
 		}
 		ret = put_user((unsigned int)tmp, (u32 __user *)data);
 		break;
@@ -162,7 +168,8 @@ long compat_arch_ptrace(struct task_stru
 		CHECK_FULL_REGS(child->thread.regs);
 		if (numReg >= PT_FPR0) {
 			flush_fp_to_thread(child);
-			tmp = ((unsigned long int *)child->thread.fpr)[numReg - PT_FPR0];
+			tmp = ((unsigned long int *)child->thread.fpr)
+				[FPRINDEX(numReg)];
 		} else { /* register within PT_REGS struct */
 			tmp = ptrace_get_reg(child, numReg);
 		} 
@@ -217,7 +224,8 @@ long compat_arch_ptrace(struct task_stru
 			 * to be an array of unsigned int (32 bits) - the
 			 * index passed in is based on this assumption.
 			 */
-			((unsigned int *)child->thread.fpr)[index - PT_FPR0] = data;
+			((unsigned int *)child->thread.fpr)
+				[TS_FPRSPACING * (index - PT_FPR0)] = data;
 			ret = 0;
 		}
 		break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/softemu8xx.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
@@ -124,7 +124,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
 	disp = instword & 0xffff;
 
 	ea = (u32 *)(regs->gpr[idxreg] + disp);
-	ip = (u32 *)&current->thread.fpr[flreg];
+	ip = (u32 *)&current->thread.TS_FPR(flreg);
 
 	switch ( inst )
 	{
@@ -168,7 +168,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
 		break;
 	case FMR:
 		/* assume this is a fp move -- Cort */
-		memcpy(ip, &current->thread.fpr[(instword>>11)&0x1f],
+		memcpy(ip, &current->thread.TS_FPR((instword>>11)&0x1f),
 		       sizeof(double));
 		break;
 	default:
Index: linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/math-emu/math.c
+++ linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
@@ -230,14 +230,14 @@ do_mathemu(struct pt_regs *regs)
 	case LFD:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		lfd(op0, op1, op2, op3);
 		break;
 	case LFDU:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		lfd(op0, op1, op2, op3);
 		regs->gpr[idx] = (unsigned long)op1;
@@ -245,21 +245,21 @@ do_mathemu(struct pt_regs *regs)
 	case STFD:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		stfd(op0, op1, op2, op3);
 		break;
 	case STFDU:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		stfd(op0, op1, op2, op3);
 		regs->gpr[idx] = (unsigned long)op1;
 		break;
 	case OP63:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		fmr(op0, op1, op2, op3);
 		break;
 	default:
@@ -356,28 +356,28 @@ do_mathemu(struct pt_regs *regs)
 
 	switch (type) {
 	case AB:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	case AC:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op2 = (void *)&current->thread.fpr[(insn >>  6) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op2 = (void *)&current->thread.TS_FPR((insn >>  6) & 0x1f);
 		break;
 
 	case ABC:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
-		op3 = (void *)&current->thread.fpr[(insn >>  6) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
+		op3 = (void *)&current->thread.TS_FPR((insn >>  6) & 0x1f);
 		break;
 
 	case D:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		break;
 
@@ -387,27 +387,27 @@ do_mathemu(struct pt_regs *regs)
 			goto illegal;
 
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)(regs->gpr[idx] + sdisp);
 		break;
 
 	case X:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		break;
 
 	case XA:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
 		break;
 
 	case XB:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	case XE:
 		idx = (insn >> 16) & 0x1f;
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		if (!idx) {
 			if (((insn >> 1) & 0x3ff) == STFIWX)
 				op1 = (void *)(regs->gpr[(insn >> 11) & 0x1f]);
@@ -421,7 +421,7 @@ do_mathemu(struct pt_regs *regs)
 
 	case XEU:
 		idx = (insn >> 16) & 0x1f;
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0)
 				+ regs->gpr[(insn >> 11) & 0x1f]);
 		break;
@@ -429,8 +429,8 @@ do_mathemu(struct pt_regs *regs)
 	case XCR:
 		op0 = (void *)&regs->ccr;
 		op1 = (void *)((insn >> 23) & 0x7);
-		op2 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op3 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op2 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op3 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	case XCRL:
@@ -450,7 +450,7 @@ do_mathemu(struct pt_regs *regs)
 
 	case XFLB:
 		op0 = (void *)((insn >> 17) & 0xff);
-		op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	default:
Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -6,6 +6,7 @@
 
 #include <linux/stringify.h>
 #include <asm/asm-compat.h>
+#include <asm/processor.h>
 
 #ifndef __ASSEMBLY__
 #error __FILE__ should only be used in assembler files
@@ -83,13 +84,13 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);				
 #define REST_8GPRS(n, base)	REST_4GPRS(n, base); REST_4GPRS(n+4, base)
 #define REST_10GPRS(n, base)	REST_8GPRS(n, base); REST_2GPRS(n+8, base)
 
-#define SAVE_FPR(n, base)	stfd	n,THREAD_FPR0+8*(n)(base)
+#define SAVE_FPR(n, base)	stfd	n,THREAD_FPR0+8*TS_FPRSPACING*(n)(base)
 #define SAVE_2FPRS(n, base)	SAVE_FPR(n, base); SAVE_FPR(n+1, base)
 #define SAVE_4FPRS(n, base)	SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
 #define SAVE_8FPRS(n, base)	SAVE_4FPRS(n, base); SAVE_4FPRS(n+4, base)
 #define SAVE_16FPRS(n, base)	SAVE_8FPRS(n, base); SAVE_8FPRS(n+8, base)
 #define SAVE_32FPRS(n, base)	SAVE_16FPRS(n, base); SAVE_16FPRS(n+16, base)
-#define REST_FPR(n, base)	lfd	n,THREAD_FPR0+8*(n)(base)
+#define REST_FPR(n, base)	lfd	n,THREAD_FPR0+8*TS_FPRSPACING*(n)(base)
 #define REST_2FPRS(n, base)	REST_FPR(n, base); REST_FPR(n+1, base)
 #define REST_4FPRS(n, base)	REST_2FPRS(n, base); REST_2FPRS(n+2, base)
 #define REST_8FPRS(n, base)	REST_4FPRS(n, base); REST_4FPRS(n+4, base)
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -136,6 +136,8 @@ typedef struct {
 	unsigned long seg;
 } mm_segment_t;
 
+#define TS_FPR(i) fpr[i]
+
 struct thread_struct {
 	unsigned long	ksp;		/* Kernel stack pointer */
 	unsigned long	ksp_limit;	/* if ksp <= ksp_limit stack overflow */
@@ -289,4 +291,5 @@ static inline void prefetchw(const void 
 
 #endif /* __KERNEL__ */
 #endif /* __ASSEMBLY__ */
+#define TS_FPRSPACING 1
 #endif /* _ASM_POWERPC_PROCESSOR_H */

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 3/9] powerpc: Move altivec_unavailable
  2008-06-23  7:38     ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                         ` (4 preceding siblings ...)
  2008-06-23  7:38       ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
@ 2008-06-23  7:38       ` Michael Neuling
  2008-06-23  7:38       ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
                         ` (3 subsequent siblings)
  9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23  7:38 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Move the altivec_unavailable code to make room at 0xf40, where the
vsx_unavailable exception will be.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/head_64.S |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -275,7 +275,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	. = 0xf00
 	b	performance_monitor_pSeries
 
-	STD_EXCEPTION_PSERIES(0xf20, altivec_unavailable)
+	. = 0xf20
+	b	altivec_unavailable_pSeries
 
 #ifdef CONFIG_CBE_RAS
 	HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
@@ -295,6 +296,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 
 	/* moved from 0xf00 */
 	STD_EXCEPTION_PSERIES(., performance_monitor)
+	STD_EXCEPTION_PSERIES(., altivec_unavailable)
 
 /*
  * An interrupt came in while soft-disabled; clear EE in SRR1,

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
  2008-06-23  7:38     ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                         ` (6 preceding siblings ...)
  2008-06-23  7:38       ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
@ 2008-06-23  7:38       ` Michael Neuling
  2008-06-23  7:38       ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
  2008-06-24 10:57       ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
  9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23  7:38 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

The layout of the new VSR registers and how they overlap on top of the
legacy FPR and VR registers is:

                   VSR doubleword 0               VSR doubleword 1
          ----------------------------------------------------------------
  VSR[0]  |             FPR[0]            |                              |
          ----------------------------------------------------------------
  VSR[1]  |             FPR[1]            |                              |
          ----------------------------------------------------------------
          |              ...              |                              |
          |              ...              |                              |
          ----------------------------------------------------------------
  VSR[30] |             FPR[30]           |                              |
          ----------------------------------------------------------------
  VSR[31] |             FPR[31]           |                              |
          ----------------------------------------------------------------
  VSR[32] |                             VR[0]                            |
          ----------------------------------------------------------------
  VSR[33] |                             VR[1]                            |
          ----------------------------------------------------------------
          |                              ...                             |
          |                              ...                             |
          ----------------------------------------------------------------
  VSR[62] |                             VR[30]                           |
          ----------------------------------------------------------------
  VSR[63] |                             VR[31]                           |
          ----------------------------------------------------------------

VSX has 64 128-bit registers.  The first 32 regs overlap with the FP
registers and hence extend them with an additional 64 bits.  The
second 32 regs overlap with the VMX registers.

This patch introduces the thread_struct changes required to reflect
this register layout.  Ptrace and signals code is updated so that the
floating point registers are correctly accessed from the thread_struct
when CONFIG_VSX is enabled.
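
The register overlap can be sketched in userspace C as below.  This is
only an illustration of the layout in the diagram: struct ex_vsx_state
and the ex_* helpers are hypothetical and do not mirror the actual
thread_struct fields.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the 64 128-bit VSX registers. */
struct ex_vsx_state {
	uint64_t fp_vsr[32][2];	/* VSR 0-31: [0] is the FPR, [1] is doubleword 1 */
	uint64_t vr[32][2];	/* VSR 32-63: the same storage as VR 0-31 */
};

/* Pointer to doubleword 'dw' (0 or 1) of VSR 'vsr' (0-63). */
uint64_t *ex_vsr_dword(struct ex_vsx_state *s, unsigned int vsr,
		       unsigned int dw)
{
	assert(vsr < 64 && dw < 2);
	return vsr < 32 ? &s->fp_vsr[vsr][dw] : &s->vr[vsr - 32][dw];
}

/* FPR n is doubleword 0 of VSR n. */
uint64_t *ex_fpr(struct ex_vsx_state *s, unsigned int n)
{
	assert(n < 32);
	return &s->fp_vsr[n][0];
}

/* Small usage example. */
void ex_vsx_demo(void)
{
	struct ex_vsx_state s;

	memset(&s, 0, sizeof(s));
	*ex_fpr(&s, 5) = 0x1234;
	/* A write to FPR[5] is visible as VSR[5] doubleword 0. */
	assert(*ex_vsr_dword(&s, 5, 0) == 0x1234);
	/* VSR[40] aliases VR[8]. */
	assert(ex_vsr_dword(&s, 40, 1) == &s.vr[8][1]);
}
```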

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/asm-offsets.c |    4 ++
 arch/powerpc/kernel/ptrace.c      |   28 ++++++++++++++++++
 arch/powerpc/kernel/signal_32.c   |   59 ++++++++++++++++++++++++++++----------
 arch/powerpc/kernel/signal_64.c   |   32 ++++++++++++++++++--
 include/asm-powerpc/processor.h   |   21 ++++++++++++-
 5 files changed, 126 insertions(+), 18 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
@@ -74,6 +74,10 @@ int main(void)
 	DEFINE(THREAD_VSCR, offsetof(struct thread_struct, vscr));
 	DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	DEFINE(THREAD_VSR0, offsetof(struct thread_struct, fpr));
+	DEFINE(THREAD_USED_VSR, offsetof(struct thread_struct, used_vsr));
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_PPC64
 	DEFINE(KSP_VSID, offsetof(struct thread_struct, ksp_vsid));
 #else /* CONFIG_PPC64 */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -215,26 +215,54 @@ static int fpr_get(struct task_struct *t
 		   unsigned int pos, unsigned int count,
 		   void *kbuf, void __user *ubuf)
 {
+#ifdef CONFIG_VSX
+	double buf[33];
+	int i;
+#endif
 	flush_fp_to_thread(target);
 
+#ifdef CONFIG_VSX
+	/* copy to local buffer then write that out */
+	for (i = 0; i < 32 ; i++)
+		buf[i] = target->thread.TS_FPR(i);
+	memcpy(&buf[32], &target->thread.fpscr, sizeof(double));
+	return user_regset_copyout(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+
+#else
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
 		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
 				   target->thread.fpr, 0, -1);
+#endif
 }
 
 static int fpr_set(struct task_struct *target, const struct user_regset *regset,
 		   unsigned int pos, unsigned int count,
 		   const void *kbuf, const void __user *ubuf)
 {
+#ifdef CONFIG_VSX
+	double buf[33];
+	int i;
+#endif
 	flush_fp_to_thread(target);
 
+#ifdef CONFIG_VSX
+	/* copy in to a local buffer, then write that to the thread_struct */
+	i = user_regset_copyin(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+	if (i)
+		return i;
+	for (i = 0; i < 32 ; i++)
+		target->thread.TS_FPR(i) = buf[i];
+	memcpy(&target->thread.fpscr, &buf[32], sizeof(double));
+	return 0;
+#else
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
 		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
 				  target->thread.fpr, 0, -1);
+#endif
 }
 
 
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -337,14 +337,16 @@ static int save_user_regs(struct pt_regs
 		int sigret)
 {
 	unsigned long msr = regs->msr;
+#ifdef CONFIG_VSX
+	double buf[ELF_NFPREG];
+	int i;
+#endif
 
 	/* Make sure floating point registers are stored in regs */
 	flush_fp_to_thread(current);
 
-	/* save general and floating-point registers */
-	if (save_general_regs(regs, frame) ||
-	    __copy_to_user(&frame->mc_fregs, current->thread.fpr,
-		    ELF_NFPREG * sizeof(double)))
+	/* save general registers */
+	if (save_general_regs(regs, frame))
 		return 1;
 
 #ifdef CONFIG_ALTIVEC
@@ -368,7 +370,20 @@ static int save_user_regs(struct pt_regs
 	if (__put_user(current->thread.vrsave, (u32 __user *)&frame->mc_vregs[32]))
 		return 1;
 #endif /* CONFIG_ALTIVEC */
-
+#ifdef CONFIG_VSX
+	/* copy FPRs to a local buffer, then write that out to userspace */
+	flush_fp_to_thread(current);
+	for (i = 0; i < 32 ; i++)
+		buf[i] = current->thread.TS_FPR(i);
+	memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+	if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
+		return 1;
+#else
+	/* save floating-point registers */
+	if (__copy_to_user(&frame->mc_fregs, current->thread.fpr,
+		    ELF_NFPREG * sizeof(double)))
+		return 1;
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	/* save spe registers */
 	if (current->thread.used_spe) {
@@ -411,6 +426,10 @@ static long restore_user_regs(struct pt_
 	long err;
 	unsigned int save_r2 = 0;
 	unsigned long msr;
+#ifdef CONFIG_VSX
+	double buf[ELF_NFPREG];
+	int i;
+#endif
 
 	/*
 	 * restore general registers but not including MSR or SOFTE. Also
@@ -438,16 +457,11 @@ static long restore_user_regs(struct pt_
 	 */
 	discard_lazy_cpu_state();
 
-	/* force the process to reload the FP registers from
-	   current->thread when it next does FP instructions */
-	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
-	if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
-			     sizeof(sr->mc_fregs)))
-		return 1;
-
 #ifdef CONFIG_ALTIVEC
-	/* force the process to reload the altivec registers from
-	   current->thread when it next does altivec instructions */
+	/*
+	 * Force the process to reload the altivec registers from
+	 * current->thread when it next does altivec instructions
+	 */
 	regs->msr &= ~MSR_VEC;
 	if (msr & MSR_VEC) {
 		/* restore altivec registers from the stack */
@@ -462,6 +476,23 @@ static long restore_user_regs(struct pt_
 		return 1;
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+	if (__copy_from_user(buf, &sr->mc_fregs,sizeof(sr->mc_fregs)))
+		return 1;
+	for (i = 0; i < 32 ; i++)
+		current->thread.TS_FPR(i) = buf[i];
+	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+#else
+	if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
+			     sizeof(sr->mc_fregs)))
+		return 1;
+#endif /* CONFIG_VSX */
+	/*
+	 * force the process to reload the FP registers from
+	 * current->thread when it next does FP instructions
+	 */
+	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
+
 #ifdef CONFIG_SPE
 	/* force the process to reload the spe registers from
 	   current->thread when it next does spe instructions */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -89,6 +89,10 @@ static long setup_sigcontext(struct sigc
 #endif
 	unsigned long msr = regs->msr;
 	long err = 0;
+#ifdef CONFIG_VSX
+	double buf[FP_REGS_SIZE];
+	int i;
+#endif
 
 	flush_fp_to_thread(current);
 
@@ -112,11 +116,21 @@ static long setup_sigcontext(struct sigc
 #else /* CONFIG_ALTIVEC */
 	err |= __put_user(0, &sc->v_regs);
 #endif /* CONFIG_ALTIVEC */
+	flush_fp_to_thread(current);
+#ifdef CONFIG_VSX
+	/* Copy FP to local buffer then write that out */
+	for (i = 0; i < 32 ; i++)
+		buf[i] = current->thread.TS_FPR(i);
+	memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+	err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+#else /* CONFIG_VSX */
+	/* copy fpr regs and fpscr */
+	err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
+#endif /* CONFIG_VSX */
 	err |= __put_user(&sc->gp_regs, &sc->regs);
 	WARN_ON(!FULL_REGS(regs));
 	err |= __copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE);
 	err |= __put_user(msr, &sc->gp_regs[PT_MSR]);
-	err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
 	err |= __put_user(signr, &sc->signal);
 	err |= __put_user(handler, &sc->handler);
 	if (set != NULL)
@@ -135,6 +149,9 @@ static long restore_sigcontext(struct pt
 #ifdef CONFIG_ALTIVEC
 	elf_vrreg_t __user *v_regs;
 #endif
+#ifdef CONFIG_VSX
+	double buf[FP_REGS_SIZE];
+#endif
 	unsigned long err = 0;
 	unsigned long save_r13 = 0;
 	elf_greg_t *gregs = (elf_greg_t *)regs;
@@ -182,8 +199,6 @@ static long restore_sigcontext(struct pt
 	 */
 	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
 
-	err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
-
 #ifdef CONFIG_ALTIVEC
 	err |= __get_user(v_regs, &sc->v_regs);
 	if (err)
@@ -202,7 +217,18 @@ static long restore_sigcontext(struct pt
 	else
 		current->thread.vrsave = 0;
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	/* restore floating point */
+	err |= __copy_from_user(buf, &sc->fp_regs, FP_REGS_SIZE);
+	if (err)
+		return err;
+	for (i = 0; i < 32 ; i++)
+		current->thread.TS_FPR(i) = buf[i];
+	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
 
+#else
+	err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
+#endif
 	return err;
 }
 
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -78,6 +78,7 @@ extern long kernel_thread(int (*fn)(void
 /* Lazy FPU handling on uni-processor */
 extern struct task_struct *last_task_used_math;
 extern struct task_struct *last_task_used_altivec;
+extern struct task_struct *last_task_used_vsx;
 extern struct task_struct *last_task_used_spe;
 
 #ifdef CONFIG_PPC32
@@ -136,7 +137,13 @@ typedef struct {
 	unsigned long seg;
 } mm_segment_t;
 
+#define TS_FPROFFSET 0
+#define TS_VSRLOWOFFSET 1
+#ifdef CONFIG_VSX
+#define TS_FPR(i) fpr[i][TS_FPROFFSET]
+#else
 #define TS_FPR(i) fpr[i]
+#endif
 
 struct thread_struct {
 	unsigned long	ksp;		/* Kernel stack pointer */
@@ -154,8 +161,12 @@ struct thread_struct {
 	unsigned long	dbcr0;		/* debug control register values */
 	unsigned long	dbcr1;
 #endif
+#ifdef CONFIG_VSX
+	double		fpr[32][2];	/* Complete floating point set */
+#else
 	double		fpr[32];	/* Complete floating point set */
-	struct {			/* fpr ... fpscr must be contiguous */
+#endif
+	struct {
 
 		unsigned int pad;
 		unsigned int val;	/* Floating point status */
@@ -175,6 +186,10 @@ struct thread_struct {
 	unsigned long	vrsave;
 	int		used_vr;	/* set if process has used altivec */
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	/* VSR status */
+	int		used_vsr;	/* set if process has used VSX */
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	unsigned long	evr[32];	/* upper 32-bits of SPE regs */
 	u64		acc;		/* Accumulator */
@@ -291,5 +306,9 @@ static inline void prefetchw(const void 
 
 #endif /* __KERNEL__ */
 #endif /* __ASSEMBLY__ */
+#ifdef CONFIG_VSX
+#define TS_FPRSPACING 2
+#else
 #define TS_FPRSPACING 1
+#endif
 #endif /* _ASM_POWERPC_PROCESSOR_H */

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 6/9] powerpc: Add VSX CPU feature
  2008-06-23  7:38     ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
  2008-06-23  7:38       ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
  2008-06-23  7:38       ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
@ 2008-06-23  7:38       ` Michael Neuling
  2008-06-23  7:38       ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
                         ` (6 subsequent siblings)
  9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23  7:38 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Add a VSX CPU feature.  Also add code to detect if VSX is available
from the device tree.
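
The device-tree detection can be pictured with the user-space sketch
below.  It assumes the feature table is matched by property name with a
"value >= min_value" test, which is my reading of how prom.c consumes
these entries rather than something shown in this patch.  The struct and
function names ending in _model, apply_property, and the numeric value
used for the AltiVec CPU feature bit are placeholders for illustration;
PPC_FEATURE_HAS_VSX and CPU_FTR_VSX use the values added by this patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PPC_FEATURE_HAS_ALTIVEC	0x10000000u
#define PPC_FEATURE_HAS_VSX	0x00000080u
#define CPU_FTR_ALTIVEC_MODEL	0x0000000000000008ull	/* placeholder */
#define CPU_FTR_VSX		0x0010000000000000ull

/* Hypothetical mirror of the feature_property table rows. */
struct feature_property_model {
	const char *name;
	uint32_t min_value;
	uint64_t cpu_feature;
	uint32_t cpu_user_ftr;
};

static const struct feature_property_model feature_props[] = {
	{"altivec", 0, CPU_FTR_ALTIVEC_MODEL, PPC_FEATURE_HAS_ALTIVEC},
	{"ibm,vmx", 1, CPU_FTR_ALTIVEC_MODEL, PPC_FEATURE_HAS_ALTIVEC},
	/* ibm,vmx == 2 additionally enables VSX */
	{"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
};

struct cpu_spec_model {
	uint64_t cpu_features;
	uint32_t cpu_user_features;
};

/* OR in every feature whose name matches and whose minimum is met. */
static void apply_property(struct cpu_spec_model *spec,
			   const char *name, uint32_t value)
{
	size_t i;

	for (i = 0; i < sizeof(feature_props) / sizeof(feature_props[0]); i++) {
		const struct feature_property_model *fp = &feature_props[i];

		if (strcmp(fp->name, name) == 0 && value >= fp->min_value) {
			spec->cpu_features |= fp->cpu_feature;
			spec->cpu_user_features |= fp->cpu_user_ftr;
		}
	}
}
```

Under that assumption a node with ibm,vmx = 2 picks up both the AltiVec
and the VSX bits, while ibm,vmx = 1 picks up AltiVec only.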

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>

---

 arch/powerpc/kernel/prom.c     |    4 ++++
 include/asm-powerpc/cputable.h |   15 ++++++++++++++-
 2 files changed, 18 insertions(+), 1 deletion(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/prom.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
@@ -609,6 +609,10 @@ static struct feature_property {
 	{"altivec", 0, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
 	{"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	/* Yes, this _really_ is ibm,vmx == 2 to enable VSX */
+	{"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_PPC64
 	{"ibm,dfp", 1, 0, PPC_FEATURE_HAS_DFP},
 	{"ibm,purr", 1, CPU_FTR_PURR, 0},
Index: linux-2.6-ozlabs/include/asm-powerpc/cputable.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/cputable.h
+++ linux-2.6-ozlabs/include/asm-powerpc/cputable.h
@@ -27,6 +27,7 @@
 #define PPC_FEATURE_HAS_DFP		0x00000400
 #define PPC_FEATURE_POWER6_EXT		0x00000200
 #define PPC_FEATURE_ARCH_2_06		0x00000100
+#define PPC_FEATURE_HAS_VSX		0x00000080
 
 #define PPC_FEATURE_TRUE_LE		0x00000002
 #define PPC_FEATURE_PPC_LE		0x00000001
@@ -181,6 +182,7 @@ extern void do_feature_fixups(unsigned l
 #define CPU_FTR_DSCR			LONG_ASM_CONST(0x0002000000000000)
 #define CPU_FTR_1T_SEGMENT		LONG_ASM_CONST(0x0004000000000000)
 #define CPU_FTR_NO_SLBIE_B		LONG_ASM_CONST(0x0008000000000000)
+#define CPU_FTR_VSX			LONG_ASM_CONST(0x0010000000000000)
 
 #ifndef __ASSEMBLY__
 
@@ -199,6 +201,17 @@ extern void do_feature_fixups(unsigned l
 #define PPC_FEATURE_HAS_ALTIVEC_COMP    0
 #endif
 
+/* We only set the VSX features if the kernel was compiled with VSX
+ * support
+ */
+#ifdef CONFIG_VSX
+#define CPU_FTR_VSX_COMP	CPU_FTR_VSX
+#define PPC_FEATURE_HAS_VSX_COMP PPC_FEATURE_HAS_VSX
+#else
+#define CPU_FTR_VSX_COMP	0
+#define PPC_FEATURE_HAS_VSX_COMP    0
+#endif
+
 /* We only set the spe features if the kernel was compiled with spe
  * support
  */
@@ -399,7 +412,7 @@ extern void do_feature_fixups(unsigned l
 	    (CPU_FTRS_POWER3 | CPU_FTRS_RS64 | CPU_FTRS_POWER4 |	\
 	    CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | CPU_FTRS_POWER6 |	\
 	    CPU_FTRS_POWER7 | CPU_FTRS_CELL | CPU_FTRS_PA6T |		\
-	    CPU_FTR_1T_SEGMENT)
+	    CPU_FTR_1T_SEGMENT | CPU_FTR_VSX)
 #else
 enum {
 	CPU_FTRS_POSSIBLE =

^ permalink raw reply	[flat|nested] 106+ messages in thread

* [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support
  2008-06-23  7:38     ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                         ` (7 preceding siblings ...)
  2008-06-23  7:38       ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
@ 2008-06-23  7:38       ` Michael Neuling
  2008-06-24 10:57       ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
  9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23  7:38 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

This patch extends the floating point save and restore code to use the
VSX load/stores when VSX is available.  This will make FP context
save/restore marginally slower for FP-only code when VSX is available,
as it has to load/store 128 bits rather than just 64 bits.

Mixing FP, VMX and VSX code will get consistent architected state.

The signals interface is extended to enable access to VSR 0-31
doubleword 1 after discussions with tool chain maintainers.  Backward
compatibility is maintained.  

The ptrace interface is also extended to allow access to VSR 0-31 full
registers.
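
The signal-frame extension can be sketched in user-space C as follows.
This is only a model of the approach taken in signal_32.c: the
frame_model and thread_model structs and the two helper functions are
hypothetical names for this example, and MSR_VSX_MODEL is an
illustrative flag bit (the real MSR_VSX definition lives in reg.h).  The
idea is that doubleword 1 of VSR 0-31 goes into a separate area of the
frame, with a flag in the saved MSR telling the restore path whether
that area holds valid data.

```c
#include <stdint.h>
#include <string.h>

#define MSR_VSX_MODEL	(1UL << 23)	/* illustrative flag bit only */
#define TS_VSRLOWOFFSET	1
#define NVSR_HALF	32

/* Hypothetical signal frame: legacy FP area plus the new VSX area. */
struct frame_model {
	unsigned long msr;
	double fregs[33];		/* FPR0-31 + FPSCR, unchanged layout */
	double vsregs[NVSR_HALF];	/* doubleword 1 of VSR0-31 */
};

struct thread_model {
	double fpr[32][2];
	int used_vsr;
};

/* Save the VSX-only halves and flag them as valid in the saved MSR. */
static void save_vsx_half(const struct thread_model *t,
			  struct frame_model *f)
{
	int i;

	if (!t->used_vsr)
		return;
	for (i = 0; i < NVSR_HALF; i++)
		f->vsregs[i] = t->fpr[i][TS_VSRLOWOFFSET];
	f->msr |= MSR_VSX_MODEL;
}

/* Restore the halves if flagged; otherwise clear stale VSX state. */
static void restore_vsx_half(struct thread_model *t,
			     const struct frame_model *f)
{
	int i;

	if (f->msr & MSR_VSX_MODEL) {
		for (i = 0; i < NVSR_HALF; i++)
			t->fpr[i][TS_VSRLOWOFFSET] = f->vsregs[i];
	} else if (t->used_vsr) {
		for (i = 0; i < NVSR_HALF; i++)
			t->fpr[i][TS_VSRLOWOFFSET] = 0;
	}
}
```

Keeping the legacy FP area untouched and signalling the extra data
through a flag is what lets frames written by older code be restored
without misreading the new area.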

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/entry_64.S   |    5 +
 arch/powerpc/kernel/fpu.S        |   16 ++++-
 arch/powerpc/kernel/head_64.S    |   65 +++++++++++++++++++++++
 arch/powerpc/kernel/misc_64.S    |   33 ++++++++++++
 arch/powerpc/kernel/ppc32.h      |    1 
 arch/powerpc/kernel/ppc_ksyms.c  |    3 +
 arch/powerpc/kernel/process.c    |  106 ++++++++++++++++++++++++++++++++++++++-
 arch/powerpc/kernel/ptrace.c     |   70 +++++++++++++++++++++++++
 arch/powerpc/kernel/signal_32.c  |   33 ++++++++++++
 arch/powerpc/kernel/signal_64.c  |   31 +++++++++++
 arch/powerpc/kernel/traps.c      |   29 ++++++++++
 include/asm-powerpc/elf.h        |    6 +-
 include/asm-powerpc/ptrace.h     |   12 ++++
 include/asm-powerpc/reg.h        |    2 
 include/asm-powerpc/sigcontext.h |   37 +++++++++++++
 include/asm-powerpc/system.h     |    9 +++
 include/linux/elf.h              |    1 
 17 files changed, 451 insertions(+), 8 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/entry_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
@@ -353,6 +353,11 @@ _GLOBAL(_switch)
 	mflr	r20		/* Return to switch caller */
 	mfmsr	r22
 	li	r0, MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r0,r0,MSR_VSX@h	/* Disable VSX */
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_ALTIVEC
 BEGIN_FTR_SECTION
 	oris	r0,r0,MSR_VEC@h	/* Disable altivec */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -34,6 +34,11 @@
 _GLOBAL(load_up_fpu)
 	mfmsr	r5
 	ori	r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
 	SYNC
 	MTMSRD(r5)			/* enable use of fpu now */
 	isync
@@ -50,7 +55,7 @@ _GLOBAL(load_up_fpu)
 	beq	1f
 	toreal(r4)
 	addi	r4,r4,THREAD		/* want last_task_used_math->thread */
-	SAVE_32FPRS(0, r4)
+	SAVE_32FPVSRS(0, r5, r4)
 	mffs	fr0
 	stfd	fr0,THREAD_FPSCR(r4)
 	PPC_LL	r5,PT_REGS(r4)
@@ -77,7 +82,7 @@ _GLOBAL(load_up_fpu)
 #endif
 	lfd	fr0,THREAD_FPSCR(r5)
 	MTFSF_L(fr0)
-	REST_32FPRS(0, r5)
+	REST_32FPVSRS(0, r4, r5)
 #ifndef CONFIG_SMP
 	subi	r4,r5,THREAD
 	fromreal(r4)
@@ -96,6 +101,11 @@ _GLOBAL(load_up_fpu)
 _GLOBAL(giveup_fpu)
 	mfmsr	r5
 	ori	r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
 	SYNC_601
 	ISYNC_601
 	MTMSRD(r5)			/* enable use of fpu now */
@@ -106,7 +116,7 @@ _GLOBAL(giveup_fpu)
 	addi	r3,r3,THREAD	        /* want THREAD of task */
 	PPC_LL	r5,PT_REGS(r3)
 	PPC_LCMPI	0,r5,0
-	SAVE_32FPRS(0, r3)
+	SAVE_32FPVSRS(0, r4 ,r3)
 	mffs	fr0
 	stfd	fr0,THREAD_FPSCR(r3)
 	beq	1f
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -278,6 +278,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	. = 0xf20
 	b	altivec_unavailable_pSeries
 
+	. = 0xf40
+	b	vsx_unavailable_pSeries
+
 #ifdef CONFIG_CBE_RAS
 	HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
 #endif /* CONFIG_CBE_RAS */
@@ -297,6 +300,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	/* moved from 0xf00 */
 	STD_EXCEPTION_PSERIES(., performance_monitor)
 	STD_EXCEPTION_PSERIES(., altivec_unavailable)
+	STD_EXCEPTION_PSERIES(., vsx_unavailable)
 
 /*
  * An interrupt came in while soft-disabled; clear EE in SRR1,
@@ -836,6 +840,67 @@ _STATIC(load_up_altivec)
 	blr
 #endif /* CONFIG_ALTIVEC */
 
+	.align	7
+	.globl vsx_unavailable_common
+vsx_unavailable_common:
+	EXCEPTION_PROLOG_COMMON(0xf40, PACA_EXGEN)
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	bne	.load_up_vsx
+1:
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
+	bl	.save_nvgprs
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	ENABLE_INTS
+	bl	.vsx_unavailable_exception
+	b	.ret_from_except
+
+#ifdef CONFIG_VSX
+/*
+ * load_up_vsx(unused, unused, tsk)
+ * Disable VSX for the task which had it previously,
+ * and save its vector registers in its thread_struct.
+ * Reuse the fp and vsx saves, but first check to see if they have
+ * been saved already.
+ * On entry: r13 == 'current' && last_task_used_vsx != 'current'
+ */
+_STATIC(load_up_vsx)
+/* Load FP and VSX registers if they haven't been done yet */
+	andi.	r5,r12,MSR_FP
+	beql+	load_up_fpu		/* skip if already loaded */
+	andis.	r5,r12,MSR_VEC@h
+	beql+	load_up_altivec		/* skip if already loaded */
+
+#ifndef CONFIG_SMP
+	ld	r3,last_task_used_vsx@got(r2)
+	ld	r4,0(r3)
+	cmpdi	0,r4,0
+	beq	1f
+	/* Disable VSX for last_task_used_vsx */
+	addi	r4,r4,THREAD
+	ld	r5,PT_REGS(r4)
+	ld	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+	lis	r6,MSR_VSX@h
+	andc	r6,r4,r6
+	std	r6,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#endif /* CONFIG_SMP */
+	ld	r4,PACACURRENT(r13)
+	addi	r4,r4,THREAD		/* Get THREAD */
+	li	r6,1
+	stw	r6,THREAD_USED_VSR(r4) /* ... also set thread used vsr */
+	/* enable use of VSX after return */
+	oris	r12,r12,MSR_VSX@h
+	std	r12,_MSR(r1)
+#ifndef CONFIG_SMP
+	/* Update last_task_used_vsx to 'current' */
+	ld	r4,PACACURRENT(r13)
+	std	r4,0(r3)
+#endif /* CONFIG_SMP */
+	b	fast_exception_return
+#endif /* CONFIG_VSX */
+
 /*
  * Hash table stuff
  */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/misc_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
@@ -506,6 +506,39 @@ _GLOBAL(giveup_altivec)
 
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+/*
+ * giveup_vsx(tsk)
+ * Disable VSX for the task given as the argument,
+ * and save the vector registers in its thread_struct.
+ * Enables the VSX for use in the kernel on return.
+ */
+_GLOBAL(giveup_vsx)
+	mfmsr	r5
+	oris	r5,r5,MSR_VSX@h
+	mtmsrd	r5			/* enable use of VSX now */
+	isync
+
+	cmpdi	0,r3,0
+	beqlr-				/* if no previous owner, done */
+	addi	r3,r3,THREAD		/* want THREAD of task */
+	ld	r5,PT_REGS(r3)
+	cmpdi	0,r5,0
+	beq	1f
+	ld	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+	lis	r3,MSR_VSX@h
+	andc	r4,r4,r3		/* disable VSX for previous task */
+	std	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#ifndef CONFIG_SMP
+	li	r5,0
+	ld	r4,last_task_used_vsx@got(r2)
+	std	r5,0(r4)
+#endif /* CONFIG_SMP */
+	blr
+
+#endif /* CONFIG_VSX */
+
 /* kexec_wait(phys_cpu)
  *
  * wait for the flag to change, indicating this kernel is going away but
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc32.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
@@ -120,6 +120,7 @@ struct mcontext32 {
 	elf_fpregset_t		mc_fregs;
 	unsigned int		mc_pad[2];
 	elf_vrregset_t32	mc_vregs __attribute__((__aligned__(16)));
+	elf_vsrreghalf_t32      mc_vsregs __attribute__((__aligned__(16)));
 };
 
 struct ucontext32 { 
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc_ksyms.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
@@ -102,6 +102,9 @@ EXPORT_SYMBOL(giveup_fpu);
 #ifdef CONFIG_ALTIVEC
 EXPORT_SYMBOL(giveup_altivec);
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+EXPORT_SYMBOL(giveup_vsx);
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 EXPORT_SYMBOL(giveup_spe);
 #endif /* CONFIG_SPE */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -53,6 +53,7 @@ extern unsigned long _get_SP(void);
 #ifndef CONFIG_SMP
 struct task_struct *last_task_used_math = NULL;
 struct task_struct *last_task_used_altivec = NULL;
+struct task_struct *last_task_used_vsx = NULL;
 struct task_struct *last_task_used_spe = NULL;
 #endif
 
@@ -106,11 +107,23 @@ EXPORT_SYMBOL(enable_kernel_fp);
 
 int dump_task_fpu(struct task_struct *tsk, elf_fpregset_t *fpregs)
 {
+#ifdef CONFIG_VSX
+	int i;
+	elf_fpreg_t *reg;
+#endif
+
 	if (!tsk->thread.regs)
 		return 0;
 	flush_fp_to_thread(current);
 
+#ifdef CONFIG_VSX
+	reg = (elf_fpreg_t *)fpregs;
+	for (i = 0; i < ELF_NFPREG - 1; i++, reg++)
+		*reg = tsk->thread.TS_FPR(i);
+	memcpy(reg, &tsk->thread.fpscr, sizeof(elf_fpreg_t));
+#else
 	memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
+#endif
 
 	return 1;
 }
@@ -149,7 +162,7 @@ void flush_altivec_to_thread(struct task
 	}
 }
 
-int dump_task_altivec(struct task_struct *tsk, elf_vrregset_t *vrregs)
+int dump_task_altivec(struct task_struct *tsk, elf_vrreg_t *vrregs)
 {
 	/* ELF_NVRREG includes the VSCR and VRSAVE which we need to save
 	 * separately, see below */
@@ -179,6 +192,79 @@ int dump_task_altivec(struct task_struct
 }
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+#if 0
+/* not currently used, but some crazy RAID module might want to later */
+void enable_kernel_vsx(void)
+{
+	WARN_ON(preemptible());
+
+#ifdef CONFIG_SMP
+	if (current->thread.regs && (current->thread.regs->msr & MSR_VSX))
+		giveup_vsx(current);
+	else
+		giveup_vsx(NULL);	/* just enable vsx for kernel - force */
+#else
+	giveup_vsx(last_task_used_vsx);
+#endif /* CONFIG_SMP */
+}
+EXPORT_SYMBOL(enable_kernel_vsx);
+#endif
+
+void flush_vsx_to_thread(struct task_struct *tsk)
+{
+	if (tsk->thread.regs) {
+		preempt_disable();
+		if (tsk->thread.regs->msr & MSR_VSX) {
+#ifdef CONFIG_SMP
+			BUG_ON(tsk != current);
+#endif
+			giveup_vsx(tsk);
+		}
+		preempt_enable();
+	}
+}
+
+/*
+ * This dumps the full 128bits of the first 32 VSX registers.  This
+ * needs to be called with dump_task_fpu and dump_task_altivec to get
+ * all the VSX state.
+ */
+int dump_task_vsx(struct task_struct *tsk, elf_vrreg_t *vrregs)
+{
+	/* Grab only the first half */
+	const int nregs = 32;
+	elf_vrreg_t *reg;
+
+	if (tsk == current)
+		flush_vsx_to_thread(tsk);
+
+	reg = (elf_vrreg_t *)vrregs;
+
+	/* copy the first 32 vsr registers */
+	memcpy(reg, &tsk->thread.fpr[0], nregs * sizeof(*reg));
+
+	return 1;
+}
+#endif /* CONFIG_VSX */
+
+int dump_task_vector(struct task_struct *tsk, elf_vrregset_t *vrregs)
+{
+	int rc = 0;
+	elf_vrreg_t *regs = (elf_vrreg_t *)vrregs;
+#ifdef CONFIG_ALTIVEC
+	rc = dump_task_altivec(tsk, regs);
+	if (rc)
+		return rc;
+	regs += ELF_NVRREG;
+#endif
+
+#ifdef CONFIG_VSX
+	rc = dump_task_vsx(tsk, regs);
+#endif
+	return rc;
+}
+
 #ifdef CONFIG_SPE
 
 void enable_kernel_spe(void)
@@ -233,6 +319,10 @@ void discard_lazy_cpu_state(void)
 	if (last_task_used_altivec == current)
 		last_task_used_altivec = NULL;
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	if (last_task_used_vsx == current)
+		last_task_used_vsx = NULL;
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	if (last_task_used_spe == current)
 		last_task_used_spe = NULL;
@@ -297,6 +387,10 @@ struct task_struct *__switch_to(struct t
 	if (prev->thread.regs && (prev->thread.regs->msr & MSR_VEC))
 		giveup_altivec(prev);
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	if (prev->thread.regs && (prev->thread.regs->msr & MSR_VSX))
+		giveup_vsx(prev);
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	/*
 	 * If the previous thread used spe in the last quantum
@@ -317,6 +411,10 @@ struct task_struct *__switch_to(struct t
 	if (new->thread.regs && last_task_used_altivec == new)
 		new->thread.regs->msr |= MSR_VEC;
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	if (new->thread.regs && last_task_used_vsx == new)
+		new->thread.regs->msr |= MSR_VSX;
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	/* Avoid the trap.  On smp this this never happens since
 	 * we don't set last_task_used_spe
@@ -417,6 +515,8 @@ static struct regbit {
 	{MSR_EE,	"EE"},
 	{MSR_PR,	"PR"},
 	{MSR_FP,	"FP"},
+	{MSR_VEC,	"VEC"},
+	{MSR_VSX,	"VSX"},
 	{MSR_ME,	"ME"},
 	{MSR_IR,	"IR"},
 	{MSR_DR,	"DR"},
@@ -534,6 +634,7 @@ void prepare_to_copy(struct task_struct 
 {
 	flush_fp_to_thread(current);
 	flush_altivec_to_thread(current);
+	flush_vsx_to_thread(current);
 	flush_spe_to_thread(current);
 }
 
@@ -689,6 +790,9 @@ void start_thread(struct pt_regs *regs, 
 #endif
 
 	discard_lazy_cpu_state();
+#ifdef CONFIG_VSX
+	current->thread.used_vsr = 0;
+#endif
 	memset(current->thread.fpr, 0,
 	       sizeof(current->thread.fpr));
 	current->thread.fpscr.val = 0;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -351,6 +351,51 @@ static int vr_set(struct task_struct *ta
 }
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+/*
+ * Currently, to set and get all the VSX state, you need to call
+ * the FP and VMX calls as well.  This only gets/sets the lower 32
+ * 128-bit VSX registers.
+ */
+
+static int vsr_active(struct task_struct *target,
+		      const struct user_regset *regset)
+{
+	flush_vsx_to_thread(target);
+	return target->thread.used_vsr ? regset->n : 0;
+}
+
+static int vsr_get(struct task_struct *target, const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   void *kbuf, void __user *ubuf)
+{
+	int ret;
+
+	flush_vsx_to_thread(target);
+
+	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+				  target->thread.fpr, 0,
+				  32 * sizeof(vector128));
+
+	return ret;
+}
+
+static int vsr_set(struct task_struct *target, const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   const void *kbuf, const void __user *ubuf)
+{
+	int ret;
+
+	flush_vsx_to_thread(target);
+
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+				 target->thread.fpr, 0,
+				 32 * sizeof(vector128));
+
+	return ret;
+}
+#endif /* CONFIG_VSX */
+
 #ifdef CONFIG_SPE
 
 /*
@@ -427,6 +472,9 @@ enum powerpc_regset {
 #ifdef CONFIG_ALTIVEC
 	REGSET_VMX,
 #endif
+#ifdef CONFIG_VSX
+	REGSET_VSX,
+#endif
 #ifdef CONFIG_SPE
 	REGSET_SPE,
 #endif
@@ -450,6 +498,13 @@ static const struct user_regset native_r
 		.active = vr_active, .get = vr_get, .set = vr_set
 	},
 #endif
+#ifdef CONFIG_VSX
+	[REGSET_VSX] = {
+		.core_note_type = NT_PPC_VSX, .n = 34,
+		.size = sizeof(vector128), .align = sizeof(vector128),
+		.active = vsr_active, .get = vsr_get, .set = vsr_set
+	},
+#endif
 #ifdef CONFIG_SPE
 	[REGSET_SPE] = {
 		.n = 35,
@@ -850,6 +905,21 @@ long arch_ptrace(struct task_struct *chi
 						 sizeof(u32)),
 					     (const void __user *) data);
 #endif
+#ifdef CONFIG_VSX
+	case PTRACE_GETVSRREGS:
+		return copy_regset_to_user(child, &user_ppc_native_view,
+					   REGSET_VSX,
+					   0, (32 * sizeof(vector128) +
+					       sizeof(u32)),
+					   (void __user *) data);
+
+	case PTRACE_SETVSRREGS:
+		return copy_regset_from_user(child, &user_ppc_native_view,
+					     REGSET_VSX,
+					     0, (32 * sizeof(vector128) +
+						 sizeof(u32)),
+					     (const void __user *) data);
+#endif
 #ifdef CONFIG_SPE
 	case PTRACE_GETEVRREGS:
 		/* Get the child spe register state. */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -378,6 +378,21 @@ static int save_user_regs(struct pt_regs
 	memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
 	if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
 		return 1;
+	/*
+	 * Copy doubleword 1 of VSR 0-31 from thread_struct to a local
+	 * buffer, then write that to userspace.  Also set MSR_VSX in
+	 * the saved MSR value to indicate that frame->mc_vsregs
+	 * contains valid data
+	 */
+	if (current->thread.used_vsr) {
+		flush_vsx_to_thread(current);
+		for (i = 0; i < 32 ; i++)
+			buf[i] = current->thread.fpr[i][TS_VSRLOWOFFSET];
+		if (__copy_to_user(&frame->mc_vsregs, buf,
+				   ELF_NVSRHALFREG  * sizeof(double)))
+			return 1;
+		msr |= MSR_VSX;
+	}
 #else
 	/* save floating-point registers */
 	if (__copy_to_user(&frame->mc_fregs, current->thread.fpr,
@@ -482,6 +497,24 @@ static long restore_user_regs(struct pt_
 	for (i = 0; i < 32 ; i++)
 		current->thread.TS_FPR(i) = buf[i];
 	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+	/*
+	 * Force the process to reload the VSX registers from
+	 * current->thread when it next executes a VSX instruction.
+	 */
+	regs->msr &= ~MSR_VSX;
+	if (msr & MSR_VSX) {
+		/*
+		 * Restore VSX registers from the stack to a local
+		 * buffer, then write this out to the thread_struct
+		 */
+		if (__copy_from_user(buf, &sr->mc_vsregs,
+				     sizeof(sr->mc_vsregs)))
+			return 1;
+		for (i = 0; i < 32 ; i++)
+			current->thread.fpr[i][TS_VSRLOWOFFSET] = buf[i];
+	} else if (current->thread.used_vsr)
+		for (i = 0; i < 32 ; i++)
+			current->thread.fpr[i][TS_VSRLOWOFFSET] = 0;
 #else
 	if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
 			     sizeof(sr->mc_fregs)))
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -123,6 +123,22 @@ static long setup_sigcontext(struct sigc
 		buf[i] = current->thread.TS_FPR(i);
 	memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
 	err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+	/*
+	 * Copy VSX low doubleword to local buffer for formatting,
+	 * then out to userspace.  Update v_regs to point after the
+	 * VMX data.
+	 */
+	if (current->thread.used_vsr) {
+		flush_vsx_to_thread(current);
+		v_regs += ELF_NVRREG;
+		for (i = 0; i < 32 ; i++)
+			buf[i] = current->thread.fpr[i][TS_VSRLOWOFFSET];
+		err |= __copy_to_user(v_regs, buf, 32 * sizeof(double));
+		/* set MSR_VSX in the MSR value in the frame to
+		 * indicate that sc->vs_reg contains valid data.
+		 */
+		msr |= MSR_VSX;
+	}
 #else /* CONFIG_VSX */
 	/* copy fpr regs and fpscr */
 	err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
@@ -197,7 +213,7 @@ static long restore_sigcontext(struct pt
 	 * This has to be done before copying stuff into current->thread.fpr/vr
 	 * for the reasons explained in the previous comment.
 	 */
-	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
+	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC | MSR_VSX);
 
 #ifdef CONFIG_ALTIVEC
 	err |= __get_user(v_regs, &sc->v_regs);
@@ -226,6 +242,19 @@ static long restore_sigcontext(struct pt
 		current->thread.TS_FPR(i) = buf[i];
 	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
 
+	/*
+	 * Get additional VSX data. Update v_regs to point after the
+	 * VMX data.  Copy VSX low doubleword from userspace to local
+	 * buffer for formatting, then into the thread_struct.
+	 */
+	v_regs += ELF_NVRREG;
+	if ((msr & MSR_VSX) != 0)
+		err |= __copy_from_user(buf, v_regs, 32 * sizeof(double));
+	else
+		memset(buf, 0, 32 * sizeof(double));
+
+	for (i = 0; i < 32 ; i++)
+		current->thread.fpr[i][TS_VSRLOWOFFSET] = buf[i];
 #else
 	err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
 #endif
Index: linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/traps.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
@@ -967,6 +967,20 @@ void altivec_unavailable_exception(struc
 	die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT);
 }
 
+void vsx_unavailable_exception(struct pt_regs *regs)
+{
+	if (user_mode(regs)) {
+		/* A user program has executed a VSX instruction,
+		   but this kernel doesn't support VSX. */
+		_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+		return;
+	}
+
+	printk(KERN_EMERG "Unrecoverable VSX Unavailable Exception "
+			"%lx at %lx\n", regs->trap, regs->nip);
+	die("Unrecoverable VSX Unavailable Exception", regs, SIGABRT);
+}
+
 void performance_monitor_exception(struct pt_regs *regs)
 {
 	perf_irq(regs);
@@ -1091,6 +1105,21 @@ void altivec_assist_exception(struct pt_
 }
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+void vsx_assist_exception(struct pt_regs *regs)
+{
+	if (!user_mode(regs)) {
+		printk(KERN_EMERG "VSX assist exception in kernel mode"
+		       " at %lx\n", regs->nip);
+		die("Kernel VSX assist exception", regs, SIGILL);
+	}
+
+	flush_vsx_to_thread(current);
+	printk(KERN_INFO "VSX assist not supported at %lx\n", regs->nip);
+	_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+}
+#endif /* CONFIG_VSX */
+
 #ifdef CONFIG_FSL_BOOKE
 void CacheLockingException(struct pt_regs *regs, unsigned long address,
 			   unsigned long error_code)
Index: linux-2.6-ozlabs/include/asm-powerpc/elf.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/elf.h
+++ linux-2.6-ozlabs/include/asm-powerpc/elf.h
@@ -109,6 +109,7 @@ typedef elf_gregset_t32 compat_elf_gregs
 #ifdef __powerpc64__
 # define ELF_NVRREG32	33	/* includes vscr & vrsave stuffed together */
 # define ELF_NVRREG	34	/* includes vscr & vrsave in split vectors */
+# define ELF_NVSRHALFREG 32	/* Half the vsx registers */
 # define ELF_GREG_TYPE	elf_greg_t64
 #else
 # define ELF_NEVRREG	34	/* includes acc (as 2) */
@@ -158,6 +159,7 @@ typedef __vector128 elf_vrreg_t;
 typedef elf_vrreg_t elf_vrregset_t[ELF_NVRREG];
 #ifdef __powerpc64__
 typedef elf_vrreg_t elf_vrregset_t32[ELF_NVRREG32];
+typedef elf_fpreg_t elf_vsrreghalf_t32[ELF_NVSRHALFREG];
 #endif
 
 #ifdef __KERNEL__
@@ -219,8 +221,8 @@ extern int dump_task_fpu(struct task_str
 typedef elf_vrregset_t elf_fpxregset_t;
 
 #ifdef CONFIG_ALTIVEC
-extern int dump_task_altivec(struct task_struct *, elf_vrregset_t *vrregs);
-#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_altivec(tsk, regs)
+extern int dump_task_vector(struct task_struct *, elf_vrregset_t *vrregs);
+#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_vector(tsk, regs)
 #define ELF_CORE_XFPREG_TYPE NT_PPC_VMX
 #endif
 
Index: linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ptrace.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
@@ -223,6 +223,14 @@ extern void user_disable_single_step(str
 #define PT_VRSAVE_32 (PT_VR0 + 33*4)
 #endif
 
+/*
+ * Only store first 32 VSRs here. The second 32 VSRs are in VR0-31.
+ */
+#define PT_VSR0 150	/* each VSR reg occupies 2 slots in 64-bit */
+#define PT_VSR31 (PT_VSR0 + 2*31)
+#ifdef __KERNEL__
+#define PT_VSR0_32 300 	/* each VSR reg occupies 4 slots in 32-bit */
+#endif
 #endif /* __powerpc64__ */
 
 /*
@@ -245,6 +253,10 @@ extern void user_disable_single_step(str
 #define PTRACE_GETEVRREGS	20
 #define PTRACE_SETEVRREGS	21
 
+/* Get the first 32 128bit VSX registers */
+#define PTRACE_GETVSRREGS	27
+#define PTRACE_SETVSRREGS	28
+
 /*
  * Get or set a debug register. The first 16 are DABR registers and the
  * second 16 are IABR registers.
Index: linux-2.6-ozlabs/include/asm-powerpc/reg.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/reg.h
+++ linux-2.6-ozlabs/include/asm-powerpc/reg.h
@@ -30,6 +30,7 @@
 #define MSR_ISF_LG	61              /* Interrupt 64b mode valid on 630 */
 #define MSR_HV_LG 	60              /* Hypervisor state */
 #define MSR_VEC_LG	25	        /* Enable AltiVec */
+#define MSR_VSX_LG	23		/* Enable VSX */
 #define MSR_POW_LG	18		/* Enable Power Management */
 #define MSR_WE_LG	18		/* Wait State Enable */
 #define MSR_TGPR_LG	17		/* TLB Update registers in use */
@@ -71,6 +72,7 @@
 #endif
 
 #define MSR_VEC		__MASK(MSR_VEC_LG)	/* Enable AltiVec */
+#define MSR_VSX		__MASK(MSR_VSX_LG)	/* Enable VSX */
 #define MSR_POW		__MASK(MSR_POW_LG)	/* Enable Power Management */
 #define MSR_WE		__MASK(MSR_WE_LG)	/* Wait State Enable */
 #define MSR_TGPR	__MASK(MSR_TGPR_LG)	/* TLB Update registers in use */
Index: linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/sigcontext.h
+++ linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
@@ -43,9 +43,44 @@ struct sigcontext {
  * it must be copied via a vector register to/from storage) or as a word.
  * The entry with index 33 contains the vrsave as the first word (offset 0)
  * within the quadword.
+ *
+ * Part of the VSX data is stored here also by extending vmx_reserve
+ * by an additional 32 double words.  Architecturally the layout of
+ * the VSR registers and how they overlap on top of the legacy FPR and
+ * VR registers is shown below:
+ *
+ *                    VSR doubleword 0               VSR doubleword 1
+ *           ----------------------------------------------------------------
+ *   VSR[0]  |             FPR[0]            |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[1]  |             FPR[1]            |                              |
+ *           ----------------------------------------------------------------
+ *           |              ...              |                              |
+ *           |              ...              |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[30] |             FPR[30]           |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[31] |             FPR[31]           |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[32] |                             VR[0]                            |
+ *           ----------------------------------------------------------------
+ *   VSR[33] |                             VR[1]                            |
+ *           ----------------------------------------------------------------
+ *           |                              ...                             |
+ *           |                              ...                             |
+ *           ----------------------------------------------------------------
+ *   VSR[62] |                             VR[30]                           |
+ *           ----------------------------------------------------------------
+ *   VSR[63] |                             VR[31]                           |
+ *           ----------------------------------------------------------------
+ *
+ * FPR/VSR 0-31 doubleword 0 is stored in fp_regs, and VMX/VSR 32-63
+ * is stored at the start of vmx_reserve.  vmx_reserve is extended for
+ * backwards compatibility to store VSR 0-31 doubleword 1 after the VMX
+ * registers and vscr/vrsave.
  */
 	elf_vrreg_t	__user *v_regs;
-	long		vmx_reserve[ELF_NVRREG+ELF_NVRREG+1];
+	long		vmx_reserve[ELF_NVRREG+ELF_NVRREG+32+1];
 #endif
 };
 
Index: linux-2.6-ozlabs/include/asm-powerpc/system.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/system.h
+++ linux-2.6-ozlabs/include/asm-powerpc/system.h
@@ -132,6 +132,7 @@ extern void enable_kernel_altivec(void);
 extern void giveup_altivec(struct task_struct *);
 extern void load_up_altivec(struct task_struct *);
 extern int emulate_altivec(struct pt_regs *);
+extern void giveup_vsx(struct task_struct *);
 extern void enable_kernel_spe(void);
 extern void giveup_spe(struct task_struct *);
 extern void load_up_spe(struct task_struct *);
@@ -155,6 +156,14 @@ static inline void flush_altivec_to_thre
 }
 #endif
 
+#ifdef CONFIG_VSX
+extern void flush_vsx_to_thread(struct task_struct *);
+#else
+static inline void flush_vsx_to_thread(struct task_struct *t)
+{
+}
+#endif
+
 #ifdef CONFIG_SPE
 extern void flush_spe_to_thread(struct task_struct *);
 #else
Index: linux-2.6-ozlabs/include/linux/elf.h
===================================================================
--- linux-2.6-ozlabs.orig/include/linux/elf.h
+++ linux-2.6-ozlabs/include/linux/elf.h
@@ -358,6 +358,7 @@ typedef struct elf64_shdr {
 #define NT_PRXFPREG     0x46e62b7f      /* copied from gdb5.1/include/elf/common.h */
 #define NT_PPC_VMX	0x100		/* PowerPC Altivec/VMX registers */
 #define NT_PPC_SPE	0x101		/* PowerPC SPE/EVR registers */
+#define NT_PPC_VSX	0x102		/* PowerPC VSX registers */
 #define NT_386_TLS	0x200		/* i386 TLS slots (struct user_desc) */
 
 

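To make the register layout from the sigcontext.h comment concrete, here is a minimal C sketch. It is not code from the patch: `struct vsr`, `vsr_split()` and `vsr_merge()` are hypothetical names used only for illustration. It assumes each of VSR 0-31 is two 64-bit doublewords, with doubleword 0 aliasing the FPR and doubleword 1 being the part that is stored separately after the VMX data. Zeroing the low doublewords when no VSX data is present mirrors what the 64-bit restore path above does when MSR_VSX is clear.

```c
#include <assert.h>
#include <stdint.h>

#define NVSR_HALF 32	/* hypothetical name: VSR 0-31 overlay FPR 0-31 */

/* Hypothetical model of one 128-bit VSR: dw[0] is the FPR, dw[1] the rest. */
struct vsr {
	uint64_t dw[2];
};

/* Split VSR 0-31 into the legacy FP area and the extra VSX area. */
static void vsr_split(const struct vsr *vsr, uint64_t *fp_regs,
		      uint64_t *vsx_low)
{
	assert(vsr && fp_regs && vsx_low);
	for (int i = 0; i < NVSR_HALF; i++) {
		fp_regs[i] = vsr[i].dw[0];	/* doubleword 0: FPR[i] */
		vsx_low[i] = vsr[i].dw[1];	/* doubleword 1: VSX only */
	}
}

/* Rebuild VSR 0-31; without valid VSX data the low halves are zeroed. */
static void vsr_merge(struct vsr *vsr, const uint64_t *fp_regs,
		      const uint64_t *vsx_low, int have_vsx)
{
	assert(vsr && fp_regs && vsx_low);
	for (int i = 0; i < NVSR_HALF; i++) {
		vsr[i].dw[0] = fp_regs[i];
		vsr[i].dw[1] = have_vsx ? vsx_low[i] : 0;
	}
}
```

Keeping doubleword 0 in the existing FP area means a consumer that only knows about FPRs still sees the values it expects, which is the backwards-compatibility point made in the comment.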

* [PATCH 9/9] powerpc: Add CONFIG_VSX config option
  2008-06-23  7:38     ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                         ` (2 preceding siblings ...)
  2008-06-23  7:38       ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
@ 2008-06-23  7:38       ` Michael Neuling
  2008-06-23  7:38       ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
                         ` (5 subsequent siblings)
  9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23  7:38 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Add the CONFIG_VSX build option.  It depends on POWER4, PPC_FPU and ALTIVEC.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/platforms/Kconfig.cputype |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Index: linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/platforms/Kconfig.cputype
+++ linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
@@ -155,6 +155,22 @@ config ALTIVEC
 
 	  If in doubt, say Y here.
 
+config VSX
+	bool "VSX Support"
+	depends on POWER4 && ALTIVEC && PPC_FPU
+	---help---
+
+	  This option enables kernel support for the Vector Scalar Extensions
+	  to the PowerPC processor. The kernel currently supports saving and
+	  restoring VSX registers, and turning on the 'VSX enable' bit so user
+	  processes can execute VSX instructions.
+
+	  This option is only useful if you have a processor that supports
+	  VSX (P7 and above), but it has no effect on non-VSX CPUs (it
+	  does, however, add code to the kernel).
+
+	  If in doubt, say Y here.
+
 config SPE
 	bool "SPE Support"
 	depends on E200 || E500


* [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable
  2008-06-23  7:38     ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                         ` (3 preceding siblings ...)
  2008-06-23  7:38       ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
@ 2008-06-23  7:38       ` Michael Neuling
  2008-06-23  7:38       ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
                         ` (4 subsequent siblings)
  9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23  7:38 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Make load_up_fpu and load_up_altivec callable so they can be reused by
the VSX code.  

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/fpu.S        |    2 +-
 arch/powerpc/kernel/head_32.S    |    6 ++++--
 arch/powerpc/kernel/head_64.S    |   10 +++++++---
 arch/powerpc/kernel/head_booke.h |    6 ++++--
 4 files changed, 16 insertions(+), 8 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -85,7 +85,7 @@ _GLOBAL(load_up_fpu)
 #endif /* CONFIG_SMP */
 	/* restore registers and return */
 	/* we haven't used ctr or xer or lr */
-	b	fast_exception_return
+	blr
 
 /*
  * giveup_fpu(tsk)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_32.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
@@ -421,8 +421,10 @@ BEGIN_FTR_SECTION
 	b 	ProgramCheck
 END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE)
 	EXCEPTION_PROLOG
-	bne	load_up_fpu		/* if from user, just load it up */
-	addi	r3,r1,STACK_FRAME_OVERHEAD
+	beq	1f
+	bl	load_up_fpu		/* if from user, just load it up */
+	b	fast_exception_return
+1:	addi	r3,r1,STACK_FRAME_OVERHEAD
 	EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
 
 /* Decrementer */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -741,7 +741,8 @@ fp_unavailable_common:
 	ENABLE_INTS
 	bl	.kernel_fp_unavailable_exception
 	BUG_OPCODE
-1:	b	.load_up_fpu
+1:	bl	.load_up_fpu
+	b	fast_exception_return
 
 	.align	7
 	.globl altivec_unavailable_common
@@ -749,7 +750,10 @@ altivec_unavailable_common:
 	EXCEPTION_PROLOG_COMMON(0xf20, PACA_EXGEN)
 #ifdef CONFIG_ALTIVEC
 BEGIN_FTR_SECTION
-	bne	.load_up_altivec	/* if from user, just load it up */
+	beq	1f
+	bl	.load_up_altivec
+	b	fast_exception_return
+1:
 END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 #endif
 	bl	.save_nvgprs
@@ -829,7 +833,7 @@ _STATIC(load_up_altivec)
 	std	r4,0(r3)
 #endif /* CONFIG_SMP */
 	/* restore registers and return */
-	b	fast_exception_return
+	blr
 #endif /* CONFIG_ALTIVEC */
 
 /*
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_booke.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
@@ -363,8 +363,10 @@ label:
 #define FP_UNAVAILABLE_EXCEPTION					      \
 	START_EXCEPTION(FloatingPointUnavailable)			      \
 	NORMAL_EXCEPTION_PROLOG;					      \
-	bne	load_up_fpu;		/* if from user, just load it up */   \
-	addi	r3,r1,STACK_FRAME_OVERHEAD;				      \
+	beq	1f;							      \
+	bl	load_up_fpu;		/* if from user, just load it up */   \
+	b	fast_exception_return;					      \
+1:	addi	r3,r1,STACK_FRAME_OVERHEAD;				      \
 	EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
 
 #endif /* __HEAD_BOOKE_H__ */


* [PATCH 7/9] powerpc: Add VSX assembler code macros
  2008-06-23  7:38     ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                         ` (5 preceding siblings ...)
  2008-06-23  7:38       ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
@ 2008-06-23  7:38       ` Michael Neuling
  2008-06-23  7:38       ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
                         ` (2 subsequent siblings)
  9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23  7:38 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

This adds macros for the VSX load/store instructions, as most
binutils versions are not going to support them for a while.

Also add VSX register save/restore macros and vsr[0-63] register definitions.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
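As a rough illustration of what the VSX_XX1() macro in this patch computes, here is a minimal C sketch. The function names are hypothetical; the bit positions and the two base opcode constants are copied from the macros in the diff below. The point it shows is that the 6-bit VSR number is split: the low five bits go into the usual register field at bit 21, and the sixth bit lands in bit 0 of the instruction word.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors VSX_XX1(xs, ra, rb) from the patch (hypothetical function name). */
static uint32_t vsx_xx1(unsigned xs, unsigned ra, unsigned rb)
{
	assert(xs < 64 && ra < 32 && rb < 32);
	return ((uint32_t)(xs & 0x1f) << 21) |	/* low 5 bits of the VSR number */
	       ((uint32_t)ra << 16) |
	       ((uint32_t)rb << 11) |
	       (uint32_t)(xs >> 5);		/* 6th bit of the VSR number */
}

/* Base opcodes are the constants used by STXVD2X and LXVD2X in the patch. */
static uint32_t encode_stxvd2x(unsigned xs, unsigned ra, unsigned rb)
{
	return 0x7c000798u | vsx_xx1(xs, ra, rb);
}

static uint32_t encode_lxvd2x(unsigned xs, unsigned ra, unsigned rb)
{
	return 0x7c000698u | vsx_xx1(xs, ra, rb);
}
```

For example, encode_stxvd2x(33, 4, 3) evaluates to 0x7c241f99: register 33 contributes 1 to the five-bit field and sets bit 0, which is how the SAVE_VSRU()/REST_VSRU() macros reach VSR 32-63 by passing n+32.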

 include/asm-powerpc/ppc_asm.h |  127 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 127 insertions(+)

Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -74,6 +74,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);				
 				REST_10GPRS(22, base)
 #endif
 
+/*
+ * Define what the VSX XX1 form instructions will look like, then add
+ * the 128 bit load store instructions based on that.
+ */
+#define VSX_XX1(xs, ra, rb)	(((xs) & 0x1f) << 21 | ((ra) << 16) |  \
+				 ((rb) << 11) | (((xs) >> 5)))
+
+#define STXVD2X(xs, ra, rb)	.long (0x7c000798 | VSX_XX1((xs), (ra), (rb)))
+#define LXVD2X(xs, ra, rb)	.long (0x7c000698 | VSX_XX1((xs), (ra), (rb)))
 
 #define SAVE_2GPRS(n, base)	SAVE_GPR(n, base); SAVE_GPR(n+1, base)
 #define SAVE_4GPRS(n, base)	SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
@@ -110,6 +119,57 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);				
 #define REST_16VRS(n,b,base)	REST_8VRS(n,b,base); REST_8VRS(n+8,b,base)
 #define REST_32VRS(n,b,base)	REST_16VRS(n,b,base); REST_16VRS(n+16,b,base)
 
+/* Save the lower 32 VSRs in the thread VSR region */
+#define SAVE_VSR(n,b,base)	li b,THREAD_VSR0+(16*(n));  STXVD2X(n,b,base)
+#define SAVE_2VSRS(n,b,base)	SAVE_VSR(n,b,base); SAVE_VSR(n+1,b,base)
+#define SAVE_4VSRS(n,b,base)	SAVE_2VSRS(n,b,base); SAVE_2VSRS(n+2,b,base)
+#define SAVE_8VSRS(n,b,base)	SAVE_4VSRS(n,b,base); SAVE_4VSRS(n+4,b,base)
+#define SAVE_16VSRS(n,b,base)	SAVE_8VSRS(n,b,base); SAVE_8VSRS(n+8,b,base)
+#define SAVE_32VSRS(n,b,base)	SAVE_16VSRS(n,b,base); SAVE_16VSRS(n+16,b,base)
+#define REST_VSR(n,b,base)	li b,THREAD_VSR0+(16*(n)); LXVD2X(n,b,base)
+#define REST_2VSRS(n,b,base)	REST_VSR(n,b,base); REST_VSR(n+1,b,base)
+#define REST_4VSRS(n,b,base)	REST_2VSRS(n,b,base); REST_2VSRS(n+2,b,base)
+#define REST_8VSRS(n,b,base)	REST_4VSRS(n,b,base); REST_4VSRS(n+4,b,base)
+#define REST_16VSRS(n,b,base)	REST_8VSRS(n,b,base); REST_8VSRS(n+8,b,base)
+#define REST_32VSRS(n,b,base)	REST_16VSRS(n,b,base); REST_16VSRS(n+16,b,base)
+/* Save the upper 32 VSRs (32-63) in the thread VR region (0-31) */
+#define SAVE_VSRU(n,b,base)	li b,THREAD_VR0+(16*(n));  STXVD2X(n+32,b,base)
+#define SAVE_2VSRSU(n,b,base)	SAVE_VSRU(n,b,base); SAVE_VSRU(n+1,b,base)
+#define SAVE_4VSRSU(n,b,base)	SAVE_2VSRSU(n,b,base); SAVE_2VSRSU(n+2,b,base)
+#define SAVE_8VSRSU(n,b,base)	SAVE_4VSRSU(n,b,base); SAVE_4VSRSU(n+4,b,base)
+#define SAVE_16VSRSU(n,b,base)	SAVE_8VSRSU(n,b,base); SAVE_8VSRSU(n+8,b,base)
+#define SAVE_32VSRSU(n,b,base)	SAVE_16VSRSU(n,b,base); SAVE_16VSRSU(n+16,b,base)
+#define REST_VSRU(n,b,base)	li b,THREAD_VR0+(16*(n)); LXVD2X(n+32,b,base)
+#define REST_2VSRSU(n,b,base)	REST_VSRU(n,b,base); REST_VSRU(n+1,b,base)
+#define REST_4VSRSU(n,b,base)	REST_2VSRSU(n,b,base); REST_2VSRSU(n+2,b,base)
+#define REST_8VSRSU(n,b,base)	REST_4VSRSU(n,b,base); REST_4VSRSU(n+4,b,base)
+#define REST_16VSRSU(n,b,base)	REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
+#define REST_32VSRSU(n,b,base)	REST_16VSRSU(n,b,base); REST_16VSRSU(n+16,b,base)
+
+#ifdef CONFIG_VSX
+#define REST_32FPVSRS(n,c,base)						\
+BEGIN_FTR_SECTION							\
+	b	2f;							\
+END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
+	REST_32FPRS(n,base);						\
+	b	3f;							\
+2:	REST_32VSRS(n,c,base);						\
+3:
+
+#define SAVE_32FPVSRS(n,c,base)						\
+BEGIN_FTR_SECTION							\
+	b	2f;							\
+END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
+	SAVE_32FPRS(n,base);						\
+	b	3f;							\
+2:	SAVE_32VSRS(n,c,base);						\
+3:
+
+#else
+#define REST_32FPVSRS(n,b,base)	REST_32FPRS(n, base)
+#define SAVE_32FPVSRS(n,b,base)	SAVE_32FPRS(n, base)
+#endif
+
 #define SAVE_EVR(n,s,base)	evmergehi s,s,n; stw s,THREAD_EVR0+4*(n)(base)
 #define SAVE_2EVRS(n,s,base)	SAVE_EVR(n,s,base); SAVE_EVR(n+1,s,base)
 #define SAVE_4EVRS(n,s,base)	SAVE_2EVRS(n,s,base); SAVE_2EVRS(n+2,s,base)
@@ -534,6 +594,73 @@ END_FTR_SECTION_IFCLR(CPU_FTR_601)
 #define	vr30	30
 #define	vr31	31
 
+/* VSX Registers (VSRs) */
+
+#define	vsr0	0
+#define	vsr1	1
+#define	vsr2	2
+#define	vsr3	3
+#define	vsr4	4
+#define	vsr5	5
+#define	vsr6	6
+#define	vsr7	7
+#define	vsr8	8
+#define	vsr9	9
+#define	vsr10	10
+#define	vsr11	11
+#define	vsr12	12
+#define	vsr13	13
+#define	vsr14	14
+#define	vsr15	15
+#define	vsr16	16
+#define	vsr17	17
+#define	vsr18	18
+#define	vsr19	19
+#define	vsr20	20
+#define	vsr21	21
+#define	vsr22	22
+#define	vsr23	23
+#define	vsr24	24
+#define	vsr25	25
+#define	vsr26	26
+#define	vsr27	27
+#define	vsr28	28
+#define	vsr29	29
+#define	vsr30	30
+#define	vsr31	31
+#define	vsr32	32
+#define	vsr33	33
+#define	vsr34	34
+#define	vsr35	35
+#define	vsr36	36
+#define	vsr37	37
+#define	vsr38	38
+#define	vsr39	39
+#define	vsr40	40
+#define	vsr41	41
+#define	vsr42	42
+#define	vsr43	43
+#define	vsr44	44
+#define	vsr45	45
+#define	vsr46	46
+#define	vsr47	47
+#define	vsr48	48
+#define	vsr49	49
+#define	vsr50	50
+#define	vsr51	51
+#define	vsr52	52
+#define	vsr53	53
+#define	vsr54	54
+#define	vsr55	55
+#define	vsr56	56
+#define	vsr57	57
+#define	vsr58	58
+#define	vsr59	59
+#define	vsr60	60
+#define	vsr61	61
+#define	vsr62	62
+#define	vsr63	63
+
 /* SPE Registers (EVPRs) */
 
 #define	evr0	0


* Re: [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
  2008-06-23  7:38       ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
@ 2008-06-23 14:46         ` Kumar Gala
  0 siblings, 0 replies; 106+ messages in thread
From: Kumar Gala @ 2008-06-23 14:46 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras


On Jun 23, 2008, at 2:38 AM, Michael Neuling wrote:

> If we set the SPE MSR bit in save_user_regs we can blow away the VEC
> bit.  This will never happen in reality (VMX and SPE will never be in
> the same processor as their opcodes overlap), but it looks bad.  Also
> when we add VSX here in a later patch, we can hit two of these at the
> same time.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---

I think it would also be good to comment about how this doesn't happen  
since they are the same MSR bit.  Having that comment might reduce  
confusion if anyone ever looks at this commit message in the future.   
(Plus you seem to have trailing white space in the commit message).

- k
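
The concern behind patch 1/9 can be sketched in a few lines of C. This is a simplified illustration under stated assumptions, not the code in signal_32.c: the two helper functions are hypothetical, and only the bit numbers (25 for VEC, 23 for VSX) come from the reg.h hunk earlier in the thread. It contrasts storing "base | own bit" once per facility, where the later store discards the earlier bit, with accumulating all bits in one local value.

```c
#include <assert.h>

#define MSR_VEC (1ul << 25)	/* MSR_VEC_LG from reg.h in this series */
#define MSR_VSX (1ul << 23)	/* MSR_VSX_LG from reg.h in this series */

/* Hypothetical "overwrite" pattern: the last store wins. */
static unsigned long frame_msr_overwrite(unsigned long base, int used_vr,
					 int used_vsr)
{
	unsigned long stored = base;

	if (used_vr)
		stored = base | MSR_VEC;
	if (used_vsr)
		stored = base | MSR_VSX;	/* the VEC bit is lost here */
	return stored;
}

/* Hypothetical "accumulate" pattern: every facility ORs into one value. */
static unsigned long frame_msr_accumulate(unsigned long base, int used_vr,
					  int used_vsr)
{
	unsigned long msr = base;

	assert((MSR_VEC & MSR_VSX) == 0);	/* distinct bits, unlike VEC/SPE */
	if (used_vr)
		msr |= MSR_VEC;
	if (used_vsr)
		msr |= MSR_VSX;
	return msr;
}
```

With VMX and SPE the overwrite is harmless because, as noted above, they share an MSR bit and never coexist. VMX and VSX do coexist, so the accumulating form is the one that keeps both bits.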


* [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX.
  2008-06-23  7:38     ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                         ` (8 preceding siblings ...)
  2008-06-23  7:38       ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
@ 2008-06-24 10:57       ` Michael Neuling
  2008-06-24 10:57         ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
                           ` (9 more replies)
  9 siblings, 10 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-24 10:57 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

The following set of patches adds Vector Scalar Extensions (VSX)
support for POWER7.  Includes context switch, ptrace and signals support.

Signed-off-by: Michael Neuling <mikey@neuling.org>
--- 
Paulus: please consider for your 2.6.27 tree.

Updates in this posting:
- Comment on VMX vs SPE as suggested by Kumar.
- Fixes for core files


* [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
  2008-06-24 10:57       ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
  2008-06-24 10:57         ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
  2008-06-24 10:57         ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
@ 2008-06-24 10:57         ` Michael Neuling
  2008-06-24 14:07           ` Kumar Gala
  2008-06-24 10:57         ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
                           ` (6 subsequent siblings)
  9 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-24 10:57 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

We are going to change where the floating point registers are stored
in the thread_struct, so in preparation add some macros to access the
floating point registers.  Update all code to use these new macros.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
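A minimal C sketch of the accessor idea, for readers who have not seen the processor.h hunk: the struct and function names are hypothetical, and the macro values (offset 0 for the FPR, offset 1 for the VSX low doubleword, spacing 2) are assumptions about where the series ends up, not definitions quoted from this patch. It shows how a TS_FPR()-style macro lets callers keep writing thread.TS_FPR(i) while the storage behind it grows from one double per register to two.

```c
#include <assert.h>
#include <stddef.h>

#define TS_FPROFFSET	0	/* assumed: doubleword 0 holds the FPR */
#define TS_VSRLOWOFFSET	1	/* assumed: doubleword 1 holds the VSX low half */
#define TS_FPRSPACING	2	/* assumed: doubles per register slot */
#define TS_FPR(i)	fpr[i][TS_FPROFFSET]

/* Hypothetical stand-in for the FP part of thread_struct. */
struct thread_sketch {
	double fpr[32][TS_FPRSPACING];
};

/* Callers use the macro and never see the extra array dimension. */
static double read_fpr(const struct thread_sketch *t, int reg)
{
	assert(reg >= 0 && reg < 32);
	return t->TS_FPR(reg);
}

/* Index of FPR 'reg' when the storage is viewed as a flat array of doubles. */
static size_t fpr_flat_index(int reg)
{
	assert(reg >= 0 && reg < 32);
	return (size_t)TS_FPRSPACING * (size_t)reg;
}
```

The flat index is why the ptrace.c hunk below multiplies (index - PT_FPR0) by TS_FPRSPACING: once each slot is wider than one double, consecutive FPRs are no longer adjacent in memory.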

 arch/powerpc/kernel/align.c      |    6 ++--
 arch/powerpc/kernel/process.c    |    5 ++-
 arch/powerpc/kernel/ptrace.c     |   14 +++++----
 arch/powerpc/kernel/ptrace32.c   |   14 +++++++--
 arch/powerpc/kernel/softemu8xx.c |    4 +-
 arch/powerpc/math-emu/math.c     |   56 +++++++++++++++++++--------------------
 include/asm-powerpc/ppc_asm.h    |    5 ++-
 include/asm-powerpc/processor.h  |    3 ++
 8 files changed, 61 insertions(+), 46 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/align.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/align.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/align.c
@@ -366,7 +366,7 @@ static int emulate_multiple(struct pt_re
 static int emulate_fp_pair(struct pt_regs *regs, unsigned char __user *addr,
 			   unsigned int reg, unsigned int flags)
 {
-	char *ptr = (char *) &current->thread.fpr[reg];
+	char *ptr = (char *) &current->thread.TS_FPR(reg);
 	int i, ret;
 
 	if (!(flags & F))
@@ -784,7 +784,7 @@ int fix_alignment(struct pt_regs *regs)
 				return -EFAULT;
 		}
 	} else if (flags & F) {
-		data.dd = current->thread.fpr[reg];
+		data.dd = current->thread.TS_FPR(reg);
 		if (flags & S) {
 			/* Single-precision FP store requires conversion... */
 #ifdef CONFIG_PPC_FPU
@@ -862,7 +862,7 @@ int fix_alignment(struct pt_regs *regs)
 		if (unlikely(ret))
 			return -EFAULT;
 	} else if (flags & F)
-		current->thread.fpr[reg] = data.dd;
+		current->thread.TS_FPR(reg) = data.dd;
 	else
 		regs->gpr[reg] = data.ll;
 
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -110,7 +110,7 @@ int dump_task_fpu(struct task_struct *ts
 		return 0;
 	flush_fp_to_thread(current);
 
-	memcpy(fpregs, &tsk->thread.fpr[0], sizeof(*fpregs));
+	memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
 
 	return 1;
 }
@@ -689,7 +689,8 @@ void start_thread(struct pt_regs *regs, 
 #endif
 
 	discard_lazy_cpu_state();
-	memset(current->thread.fpr, 0, sizeof(current->thread.fpr));
+	memset(current->thread.fpr, 0,
+	       sizeof(current->thread.fpr));
 	current->thread.fpscr.val = 0;
 #ifdef CONFIG_ALTIVEC
 	memset(current->thread.vr, 0, sizeof(current->thread.vr));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -218,10 +218,10 @@ static int fpr_get(struct task_struct *t
 	flush_fp_to_thread(target);
 
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
-		     offsetof(struct thread_struct, fpr[32]));
+		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
-				   &target->thread.fpr, 0, -1);
+				   target->thread.fpr, 0, -1);
 }
 
 static int fpr_set(struct task_struct *target, const struct user_regset *regset,
@@ -231,10 +231,10 @@ static int fpr_set(struct task_struct *t
 	flush_fp_to_thread(target);
 
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
-		     offsetof(struct thread_struct, fpr[32]));
+		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
-				  &target->thread.fpr, 0, -1);
+				  target->thread.fpr, 0, -1);
 }
 
 
@@ -728,7 +728,8 @@ long arch_ptrace(struct task_struct *chi
 			tmp = ptrace_get_reg(child, (int) index);
 		} else {
 			flush_fp_to_thread(child);
-			tmp = ((unsigned long *)child->thread.fpr)[index - PT_FPR0];
+			tmp = ((unsigned long *)child->thread.fpr)
+				[TS_FPRSPACING * (index - PT_FPR0)];
 		}
 		ret = put_user(tmp,(unsigned long __user *) data);
 		break;
@@ -755,7 +756,8 @@ long arch_ptrace(struct task_struct *chi
 			ret = ptrace_put_reg(child, index, data);
 		} else {
 			flush_fp_to_thread(child);
-			((unsigned long *)child->thread.fpr)[index - PT_FPR0] = data;
+			((unsigned long *)child->thread.fpr)
+				[TS_FPRSPACING * (index - PT_FPR0)] = data;
 			ret = 0;
 		}
 		break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
@@ -64,6 +64,11 @@ static long compat_ptrace_old(struct tas
 	return -EPERM;
 }
 
+/* Macros to work out the correct index for the FPR in the thread struct */
+#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
+#define FPRHALF(i) (((i) - PT_FPR0) % 2)
+#define FPRINDEX(i) (TS_FPRSPACING * FPRNUMBER(i) + FPRHALF(i))
+
 long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
 			compat_ulong_t caddr, compat_ulong_t cdata)
 {
@@ -122,7 +127,8 @@ long compat_arch_ptrace(struct task_stru
 			 * to be an array of unsigned int (32 bits) - the
 			 * index passed in is based on this assumption.
 			 */
-			tmp = ((unsigned int *)child->thread.fpr)[index - PT_FPR0];
+			tmp = ((unsigned int *)child->thread.fpr)
+				[FPRINDEX(index)];
 		}
 		ret = put_user((unsigned int)tmp, (u32 __user *)data);
 		break;
@@ -162,7 +168,8 @@ long compat_arch_ptrace(struct task_stru
 		CHECK_FULL_REGS(child->thread.regs);
 		if (numReg >= PT_FPR0) {
 			flush_fp_to_thread(child);
-			tmp = ((unsigned long int *)child->thread.fpr)[numReg - PT_FPR0];
+			tmp = ((unsigned long int *)child->thread.fpr)
+				[FPRINDEX(numReg)];
 		} else { /* register within PT_REGS struct */
 			tmp = ptrace_get_reg(child, numReg);
 		} 
@@ -217,7 +224,8 @@ long compat_arch_ptrace(struct task_stru
 			 * to be an array of unsigned int (32 bits) - the
 			 * index passed in is based on this assumption.
 			 */
-			((unsigned int *)child->thread.fpr)[index - PT_FPR0] = data;
+			((unsigned int *)child->thread.fpr)
+				[TS_FPRSPACING * (index - PT_FPR0)] = data;
 			ret = 0;
 		}
 		break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/softemu8xx.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
@@ -124,7 +124,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
 	disp = instword & 0xffff;
 
 	ea = (u32 *)(regs->gpr[idxreg] + disp);
-	ip = (u32 *)&current->thread.fpr[flreg];
+	ip = (u32 *)&current->thread.TS_FPR(flreg);
 
 	switch ( inst )
 	{
@@ -168,7 +168,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
 		break;
 	case FMR:
 		/* assume this is a fp move -- Cort */
-		memcpy(ip, &current->thread.fpr[(instword>>11)&0x1f],
+		memcpy(ip, &current->thread.TS_FPR((instword>>11)&0x1f),
 		       sizeof(double));
 		break;
 	default:
Index: linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/math-emu/math.c
+++ linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
@@ -230,14 +230,14 @@ do_mathemu(struct pt_regs *regs)
 	case LFD:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		lfd(op0, op1, op2, op3);
 		break;
 	case LFDU:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		lfd(op0, op1, op2, op3);
 		regs->gpr[idx] = (unsigned long)op1;
@@ -245,21 +245,21 @@ do_mathemu(struct pt_regs *regs)
 	case STFD:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		stfd(op0, op1, op2, op3);
 		break;
 	case STFDU:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		stfd(op0, op1, op2, op3);
 		regs->gpr[idx] = (unsigned long)op1;
 		break;
 	case OP63:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		fmr(op0, op1, op2, op3);
 		break;
 	default:
@@ -356,28 +356,28 @@ do_mathemu(struct pt_regs *regs)
 
 	switch (type) {
 	case AB:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	case AC:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op2 = (void *)&current->thread.fpr[(insn >>  6) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op2 = (void *)&current->thread.TS_FPR((insn >>  6) & 0x1f);
 		break;
 
 	case ABC:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
-		op3 = (void *)&current->thread.fpr[(insn >>  6) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
+		op3 = (void *)&current->thread.TS_FPR((insn >>  6) & 0x1f);
 		break;
 
 	case D:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		break;
 
@@ -387,27 +387,27 @@ do_mathemu(struct pt_regs *regs)
 			goto illegal;
 
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)(regs->gpr[idx] + sdisp);
 		break;
 
 	case X:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		break;
 
 	case XA:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
 		break;
 
 	case XB:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	case XE:
 		idx = (insn >> 16) & 0x1f;
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		if (!idx) {
 			if (((insn >> 1) & 0x3ff) == STFIWX)
 				op1 = (void *)(regs->gpr[(insn >> 11) & 0x1f]);
@@ -421,7 +421,7 @@ do_mathemu(struct pt_regs *regs)
 
 	case XEU:
 		idx = (insn >> 16) & 0x1f;
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0)
 				+ regs->gpr[(insn >> 11) & 0x1f]);
 		break;
@@ -429,8 +429,8 @@ do_mathemu(struct pt_regs *regs)
 	case XCR:
 		op0 = (void *)&regs->ccr;
 		op1 = (void *)((insn >> 23) & 0x7);
-		op2 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op3 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op2 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op3 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	case XCRL:
@@ -450,7 +450,7 @@ do_mathemu(struct pt_regs *regs)
 
 	case XFLB:
 		op0 = (void *)((insn >> 17) & 0xff);
-		op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	default:
Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -6,6 +6,7 @@
 
 #include <linux/stringify.h>
 #include <asm/asm-compat.h>
+#include <asm/processor.h>
 
 #ifndef __ASSEMBLY__
 #error __FILE__ should only be used in assembler files
@@ -83,13 +84,13 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);				
 #define REST_8GPRS(n, base)	REST_4GPRS(n, base); REST_4GPRS(n+4, base)
 #define REST_10GPRS(n, base)	REST_8GPRS(n, base); REST_2GPRS(n+8, base)
 
-#define SAVE_FPR(n, base)	stfd	n,THREAD_FPR0+8*(n)(base)
+#define SAVE_FPR(n, base)	stfd	n,THREAD_FPR0+8*TS_FPRSPACING*(n)(base)
 #define SAVE_2FPRS(n, base)	SAVE_FPR(n, base); SAVE_FPR(n+1, base)
 #define SAVE_4FPRS(n, base)	SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
 #define SAVE_8FPRS(n, base)	SAVE_4FPRS(n, base); SAVE_4FPRS(n+4, base)
 #define SAVE_16FPRS(n, base)	SAVE_8FPRS(n, base); SAVE_8FPRS(n+8, base)
 #define SAVE_32FPRS(n, base)	SAVE_16FPRS(n, base); SAVE_16FPRS(n+16, base)
-#define REST_FPR(n, base)	lfd	n,THREAD_FPR0+8*(n)(base)
+#define REST_FPR(n, base)	lfd	n,THREAD_FPR0+8*TS_FPRSPACING*(n)(base)
 #define REST_2FPRS(n, base)	REST_FPR(n, base); REST_FPR(n+1, base)
 #define REST_4FPRS(n, base)	REST_2FPRS(n, base); REST_2FPRS(n+2, base)
 #define REST_8FPRS(n, base)	REST_4FPRS(n, base); REST_4FPRS(n+4, base)
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -136,6 +136,8 @@ typedef struct {
 	unsigned long seg;
 } mm_segment_t;
 
+#define TS_FPR(i) fpr[i]
+
 struct thread_struct {
 	unsigned long	ksp;		/* Kernel stack pointer */
 	unsigned long	ksp_limit;	/* if ksp <= ksp_limit stack overflow */
@@ -289,4 +291,5 @@ static inline void prefetchw(const void 
 
 #endif /* __KERNEL__ */
 #endif /* __ASSEMBLY__ */
+#define TS_FPRSPACING 1
 #endif /* _ASM_POWERPC_PROCESSOR_H */


* [PATCH 3/9] powerpc: Move altivec_unavailable
  2008-06-24 10:57       ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
@ 2008-06-24 10:57         ` Michael Neuling
  2008-06-24 10:57         ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
                           ` (8 subsequent siblings)
  9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-24 10:57 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Move the altivec_unavailable code to make room at 0xf40, where the
vsx_unavailable exception will be.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/head_64.S |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -275,7 +275,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	. = 0xf00
 	b	performance_monitor_pSeries
 
-	STD_EXCEPTION_PSERIES(0xf20, altivec_unavailable)
+	. = 0xf20
+	b	altivec_unavailable_pSeries
 
 #ifdef CONFIG_CBE_RAS
 	HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
@@ -295,6 +296,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 
 	/* moved from 0xf00 */
 	STD_EXCEPTION_PSERIES(., performance_monitor)
+	STD_EXCEPTION_PSERIES(., altivec_unavailable)
 
 /*
  * An interrupt came in while soft-disabled; clear EE in SRR1,


* [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
  2008-06-24 10:57       ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
  2008-06-24 10:57         ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
@ 2008-06-24 10:57         ` Michael Neuling
  2008-06-24 13:47           ` Kumar Gala
  2008-06-24 10:57         ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
                           ` (7 subsequent siblings)
  9 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-24 10:57 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

If we set the SPE MSR bit in save_user_regs we can blow away the VEC
bit.  This doesn't matter in reality, as they are in fact the same bit,
but it looks bad.

Also, when we add VSX in a later patch, we need to be able to set two
separate MSR bits here.
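
As a rough illustration of why accumulating into a local variable matters
once two distinct bits are involved, here is a minimal userspace sketch.
The DEMO_* bit values and both function names are hypothetical and chosen
only for the example; they are not the kernel's MSR_* definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define DEMO_MSR_VEC (1u << 25)
#define DEMO_MSR_VSX (1u << 23)

/*
 * Old pattern: every save step stores "regs_msr | its own bit" into the
 * frame, so a later store silently discards the bit set by an earlier one.
 */
static uint32_t demo_save_msr_old(uint32_t regs_msr, int used_vec, int used_vsx)
{
	uint32_t frame_msr = regs_msr;

	if (used_vec)
		frame_msr = regs_msr | DEMO_MSR_VEC;
	if (used_vsx)
		frame_msr = regs_msr | DEMO_MSR_VSX;	/* loses DEMO_MSR_VEC */
	return frame_msr;
}

/*
 * New pattern: accumulate every bit in a local copy and store it once
 * at the end, so all bits survive.
 */
static uint32_t demo_save_msr_new(uint32_t regs_msr, int used_vec, int used_vsx)
{
	uint32_t msr = regs_msr;

	if (used_vec)
		msr |= DEMO_MSR_VEC;
	if (used_vsx)
		msr |= DEMO_MSR_VSX;
	return msr;	/* single store to the frame would happen here */
}
```

With both facilities in use, the old pattern returns only the last bit
written, while the new one returns both.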

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/signal_32.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -336,6 +336,8 @@ struct rt_sigframe {
 static int save_user_regs(struct pt_regs *regs, struct mcontext __user *frame,
 		int sigret)
 {
+	unsigned long msr = regs->msr;
+
 	/* Make sure floating point registers are stored in regs */
 	flush_fp_to_thread(current);
 
@@ -354,8 +356,7 @@ static int save_user_regs(struct pt_regs
 			return 1;
 		/* set MSR_VEC in the saved MSR value to indicate that
 		   frame->mc_vregs contains valid data */
-		if (__put_user(regs->msr | MSR_VEC, &frame->mc_gregs[PT_MSR]))
-			return 1;
+		msr |= MSR_VEC;
 	}
 	/* else assert((regs->msr & MSR_VEC) == 0) */
 
@@ -377,8 +378,7 @@ static int save_user_regs(struct pt_regs
 			return 1;
 		/* set MSR_SPE in the saved MSR value to indicate that
 		   frame->mc_vregs contains valid data */
-		if (__put_user(regs->msr | MSR_SPE, &frame->mc_gregs[PT_MSR]))
-			return 1;
+		msr |= MSR_SPE;
 	}
 	/* else assert((regs->msr & MSR_SPE) == 0) */
 
@@ -387,6 +387,8 @@ static int save_user_regs(struct pt_regs
 		return 1;
 #endif /* CONFIG_SPE */
 
+	if (__put_user(msr, &frame->mc_gregs[PT_MSR]))
+		return 1;
 	if (sigret) {
 		/* Set up the sigreturn trampoline: li r0,sigret; sc */
 		if (__put_user(0x38000000UL + sigret, &frame->tramp[0])


* [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable
  2008-06-24 10:57       ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                           ` (4 preceding siblings ...)
  2008-06-24 10:57         ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
@ 2008-06-24 10:57         ` Michael Neuling
  2008-06-24 14:01           ` Kumar Gala
  2008-06-24 10:57         ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
                           ` (3 subsequent siblings)
  9 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-24 10:57 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Make load_up_fpu and load_up_altivec callable so they can be reused by
the VSX code.  

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/fpu.S        |    2 +-
 arch/powerpc/kernel/head_32.S    |    6 ++++--
 arch/powerpc/kernel/head_64.S    |   10 +++++++---
 arch/powerpc/kernel/head_booke.h |    6 ++++--
 4 files changed, 16 insertions(+), 8 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -85,7 +85,7 @@ _GLOBAL(load_up_fpu)
 #endif /* CONFIG_SMP */
 	/* restore registers and return */
 	/* we haven't used ctr or xer or lr */
-	b	fast_exception_return
+	blr
 
 /*
  * giveup_fpu(tsk)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_32.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
@@ -421,8 +421,10 @@ BEGIN_FTR_SECTION
 	b 	ProgramCheck
 END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE)
 	EXCEPTION_PROLOG
-	bne	load_up_fpu		/* if from user, just load it up */
-	addi	r3,r1,STACK_FRAME_OVERHEAD
+	beq	1f
+	bl	load_up_fpu		/* if from user, just load it up */
+	b	fast_exception_return
+1:	addi	r3,r1,STACK_FRAME_OVERHEAD
 	EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
 
 /* Decrementer */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -741,7 +741,8 @@ fp_unavailable_common:
 	ENABLE_INTS
 	bl	.kernel_fp_unavailable_exception
 	BUG_OPCODE
-1:	b	.load_up_fpu
+1:	bl	.load_up_fpu
+	b	fast_exception_return
 
 	.align	7
 	.globl altivec_unavailable_common
@@ -749,7 +750,10 @@ altivec_unavailable_common:
 	EXCEPTION_PROLOG_COMMON(0xf20, PACA_EXGEN)
 #ifdef CONFIG_ALTIVEC
 BEGIN_FTR_SECTION
-	bne	.load_up_altivec	/* if from user, just load it up */
+	beq	1f
+	bl	.load_up_altivec
+	b	fast_exception_return
+1:
 END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 #endif
 	bl	.save_nvgprs
@@ -829,7 +833,7 @@ _STATIC(load_up_altivec)
 	std	r4,0(r3)
 #endif /* CONFIG_SMP */
 	/* restore registers and return */
-	b	fast_exception_return
+	blr
 #endif /* CONFIG_ALTIVEC */
 
 /*
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_booke.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
@@ -363,8 +363,10 @@ label:
 #define FP_UNAVAILABLE_EXCEPTION					      \
 	START_EXCEPTION(FloatingPointUnavailable)			      \
 	NORMAL_EXCEPTION_PROLOG;					      \
-	bne	load_up_fpu;		/* if from user, just load it up */   \
-	addi	r3,r1,STACK_FRAME_OVERHEAD;				      \
+	beq	1f;							      \
+	bl	load_up_fpu;		/* if from user, just load it up */   \
+	b	fast_exception_return;					      \
+1:	addi	r3,r1,STACK_FRAME_OVERHEAD;				      \
 	EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
 
 #endif /* __HEAD_BOOKE_H__ */


* [PATCH 6/9] powerpc: Add VSX CPU feature
  2008-06-24 10:57       ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                           ` (5 preceding siblings ...)
  2008-06-24 10:57         ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
@ 2008-06-24 10:57         ` Michael Neuling
  2008-06-24 14:19           ` Kumar Gala
  2008-06-24 10:57         ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
                           ` (2 subsequent siblings)
  9 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-24 10:57 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Add a VSX CPU feature.  Also add code to detect if VSX is available
from the device tree.
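
For readers unfamiliar with the feature_property table, the following
userspace sketch shows how a single "ibm,vmx" property value can enable
more than one feature.  It assumes the matching rule is "property value
>= min_value"; the demo_* names are hypothetical, and the AltiVec
constants are placeholders rather than values taken from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* VSX values are the ones added by this patch; AltiVec ones are assumed. */
#define DEMO_CPU_FTR_ALTIVEC		0x0000000000000008ULL
#define DEMO_CPU_FTR_VSX		0x0010000000000000ULL
#define DEMO_FEATURE_HAS_ALTIVEC	0x10000000u
#define DEMO_FEATURE_HAS_VSX		0x00000080u

struct demo_feature_property {
	const char *name;	/* device tree property name */
	uint32_t min_value;	/* minimum value that enables the feature */
	uint64_t cpu_feature;	/* kernel-internal feature bit */
	uint32_t cpu_user_ftr;	/* feature bit exposed to userspace */
};

static const struct demo_feature_property demo_props[] = {
	{ "ibm,vmx", 1, DEMO_CPU_FTR_ALTIVEC, DEMO_FEATURE_HAS_ALTIVEC },
	{ "ibm,vmx", 2, DEMO_CPU_FTR_VSX, DEMO_FEATURE_HAS_VSX },
};

/* Walk the table and OR in every feature whose threshold is met. */
static void demo_check_features(const char *name, uint32_t value,
				uint64_t *cpu, uint32_t *user)
{
	size_t i;

	for (i = 0; i < sizeof(demo_props) / sizeof(demo_props[0]); i++) {
		if (strcmp(demo_props[i].name, name) == 0 &&
		    value >= demo_props[i].min_value) {
			*cpu |= demo_props[i].cpu_feature;
			*user |= demo_props[i].cpu_user_ftr;
		}
	}
}
```

Under that assumption, a value of 1 enables only AltiVec, and a value of
2 enables both AltiVec and VSX.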

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>

---

 arch/powerpc/kernel/prom.c     |    4 ++++
 include/asm-powerpc/cputable.h |   15 ++++++++++++++-
 2 files changed, 18 insertions(+), 1 deletion(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/prom.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
@@ -609,6 +609,10 @@ static struct feature_property {
 	{"altivec", 0, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
 	{"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	/* Yes, this _really_ is ibm,vmx == 2 to enable VSX */
+	{"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_PPC64
 	{"ibm,dfp", 1, 0, PPC_FEATURE_HAS_DFP},
 	{"ibm,purr", 1, CPU_FTR_PURR, 0},
Index: linux-2.6-ozlabs/include/asm-powerpc/cputable.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/cputable.h
+++ linux-2.6-ozlabs/include/asm-powerpc/cputable.h
@@ -27,6 +27,7 @@
 #define PPC_FEATURE_HAS_DFP		0x00000400
 #define PPC_FEATURE_POWER6_EXT		0x00000200
 #define PPC_FEATURE_ARCH_2_06		0x00000100
+#define PPC_FEATURE_HAS_VSX		0x00000080
 
 #define PPC_FEATURE_TRUE_LE		0x00000002
 #define PPC_FEATURE_PPC_LE		0x00000001
@@ -181,6 +182,7 @@ extern void do_feature_fixups(unsigned l
 #define CPU_FTR_DSCR			LONG_ASM_CONST(0x0002000000000000)
 #define CPU_FTR_1T_SEGMENT		LONG_ASM_CONST(0x0004000000000000)
 #define CPU_FTR_NO_SLBIE_B		LONG_ASM_CONST(0x0008000000000000)
+#define CPU_FTR_VSX			LONG_ASM_CONST(0x0010000000000000)
 
 #ifndef __ASSEMBLY__
 
@@ -199,6 +201,17 @@ extern void do_feature_fixups(unsigned l
 #define PPC_FEATURE_HAS_ALTIVEC_COMP    0
 #endif
 
+/* We only set the VSX features if the kernel was compiled with VSX
+ * support
+ */
+#ifdef CONFIG_VSX
+#define CPU_FTR_VSX_COMP	CPU_FTR_VSX
+#define PPC_FEATURE_HAS_VSX_COMP PPC_FEATURE_HAS_VSX
+#else
+#define CPU_FTR_VSX_COMP	0
+#define PPC_FEATURE_HAS_VSX_COMP    0
+#endif
+
 /* We only set the spe features if the kernel was compiled with spe
  * support
  */
@@ -399,7 +412,7 @@ extern void do_feature_fixups(unsigned l
 	    (CPU_FTRS_POWER3 | CPU_FTRS_RS64 | CPU_FTRS_POWER4 |	\
 	    CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | CPU_FTRS_POWER6 |	\
 	    CPU_FTRS_POWER7 | CPU_FTRS_CELL | CPU_FTRS_PA6T |		\
-	    CPU_FTR_1T_SEGMENT)
+	    CPU_FTR_1T_SEGMENT | CPU_FTR_VSX)
 #else
 enum {
 	CPU_FTRS_POSSIBLE =


* [PATCH 7/9] powerpc: Add VSX assembler code macros
  2008-06-24 10:57       ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                           ` (7 preceding siblings ...)
  2008-06-24 10:57         ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
@ 2008-06-24 10:57         ` Michael Neuling
  2008-06-24 14:06           ` Kumar Gala
  2008-06-25  4:07         ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
  9 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-24 10:57 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

This adds macros for the VSX load/store instructions, as most binutils
versions are not going to support them for a while.

Also add VSX register save/restore macros and vsr[0-63] register definitions.
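
To make the hand-assembled encoding easier to check, the sketch below
mirrors the VSX_XX1, STXVD2X and LXVD2X macros from this patch as plain C
functions that return the 32-bit instruction word.  The demo_* function
names are hypothetical; the opcode constants and bit layout are copied
from the macros in the diff.

```c
#include <assert.h>
#include <stdint.h>

/*
 * XX1 form: the low five bits of the 6-bit VSR number go into the
 * register field at bit 21, and the sixth bit goes into bit 0.
 */
static uint32_t demo_vsx_xx1(uint32_t xs, uint32_t ra, uint32_t rb)
{
	return ((xs & 0x1f) << 21) | (ra << 16) | (rb << 11) | (xs >> 5);
}

/* Instruction word for stxvd2x xs,ra,rb, as emitted via .long. */
static uint32_t demo_stxvd2x(uint32_t xs, uint32_t ra, uint32_t rb)
{
	return 0x7c000798u | demo_vsx_xx1(xs, ra, rb);
}

/* Instruction word for lxvd2x xs,ra,rb, as emitted via .long. */
static uint32_t demo_lxvd2x(uint32_t xs, uint32_t ra, uint32_t rb)
{
	return 0x7c000698u | demo_vsx_xx1(xs, ra, rb);
}
```

For example, demo_stxvd2x(33, 4, 3) evaluates to 0x7c241f99: VSR 33
contributes 1 to the register field and sets the low bit, which is how
the upper 32 VSRs are selected.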

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 include/asm-powerpc/ppc_asm.h |  127 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 127 insertions(+)

Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -74,6 +74,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);				
 				REST_10GPRS(22, base)
 #endif
 
+/*
+ * Define what the VSX XX1 form instructions will look like, then add
+ * the 128 bit load store instructions based on that.
+ */
+#define VSX_XX1(xs, ra, rb)	(((xs) & 0x1f) << 21 | ((ra) << 16) |  \
+				 ((rb) << 11) | (((xs) >> 5)))
+
+#define STXVD2X(xs, ra, rb)	.long (0x7c000798 | VSX_XX1((xs), (ra), (rb)))
+#define LXVD2X(xs, ra, rb)	.long (0x7c000698 | VSX_XX1((xs), (ra), (rb)))
 
 #define SAVE_2GPRS(n, base)	SAVE_GPR(n, base); SAVE_GPR(n+1, base)
 #define SAVE_4GPRS(n, base)	SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
@@ -110,6 +119,57 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);				
 #define REST_16VRS(n,b,base)	REST_8VRS(n,b,base); REST_8VRS(n+8,b,base)
 #define REST_32VRS(n,b,base)	REST_16VRS(n,b,base); REST_16VRS(n+16,b,base)
 
+/* Save the lower 32 VSRs in the thread VSR region */
+#define SAVE_VSR(n,b,base)	li b,THREAD_VSR0+(16*(n));  STXVD2X(n,b,base)
+#define SAVE_2VSRS(n,b,base)	SAVE_VSR(n,b,base); SAVE_VSR(n+1,b,base)
+#define SAVE_4VSRS(n,b,base)	SAVE_2VSRS(n,b,base); SAVE_2VSRS(n+2,b,base)
+#define SAVE_8VSRS(n,b,base)	SAVE_4VSRS(n,b,base); SAVE_4VSRS(n+4,b,base)
+#define SAVE_16VSRS(n,b,base)	SAVE_8VSRS(n,b,base); SAVE_8VSRS(n+8,b,base)
+#define SAVE_32VSRS(n,b,base)	SAVE_16VSRS(n,b,base); SAVE_16VSRS(n+16,b,base)
+#define REST_VSR(n,b,base)	li b,THREAD_VSR0+(16*(n)); LXVD2X(n,b,base)
+#define REST_2VSRS(n,b,base)	REST_VSR(n,b,base); REST_VSR(n+1,b,base)
+#define REST_4VSRS(n,b,base)	REST_2VSRS(n,b,base); REST_2VSRS(n+2,b,base)
+#define REST_8VSRS(n,b,base)	REST_4VSRS(n,b,base); REST_4VSRS(n+4,b,base)
+#define REST_16VSRS(n,b,base)	REST_8VSRS(n,b,base); REST_8VSRS(n+8,b,base)
+#define REST_32VSRS(n,b,base)	REST_16VSRS(n,b,base); REST_16VSRS(n+16,b,base)
+/* Save the upper 32 VSRs (32-63) in the thread VSX region (0-31) */
+#define SAVE_VSRU(n,b,base)	li b,THREAD_VR0+(16*(n));  STXVD2X(n+32,b,base)
+#define SAVE_2VSRSU(n,b,base)	SAVE_VSRU(n,b,base); SAVE_VSRU(n+1,b,base)
+#define SAVE_4VSRSU(n,b,base)	SAVE_2VSRSU(n,b,base); SAVE_2VSRSU(n+2,b,base)
+#define SAVE_8VSRSU(n,b,base)	SAVE_4VSRSU(n,b,base); SAVE_4VSRSU(n+4,b,base)
+#define SAVE_16VSRSU(n,b,base)	SAVE_8VSRSU(n,b,base); SAVE_8VSRSU(n+8,b,base)
+#define SAVE_32VSRSU(n,b,base)	SAVE_16VSRSU(n,b,base); SAVE_16VSRSU(n+16,b,base)
+#define REST_VSRU(n,b,base)	li b,THREAD_VR0+(16*(n)); LXVD2X(n+32,b,base)
+#define REST_2VSRSU(n,b,base)	REST_VSRU(n,b,base); REST_VSRU(n+1,b,base)
+#define REST_4VSRSU(n,b,base)	REST_2VSRSU(n,b,base); REST_2VSRSU(n+2,b,base)
+#define REST_8VSRSU(n,b,base)	REST_4VSRSU(n,b,base); REST_4VSRSU(n+4,b,base)
+#define REST_16VSRSU(n,b,base)	REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
+#define REST_32VSRSU(n,b,base)	REST_16VSRSU(n,b,base); REST_16VSRSU(n+16,b,base)
+
+#ifdef CONFIG_VSX
+#define REST_32FPVSRS(n,c,base)						\
+BEGIN_FTR_SECTION							\
+	b	2f;							\
+END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
+	REST_32FPRS(n,base);						\
+	b	3f;							\
+2:	REST_32VSRS(n,c,base);						\
+3:
+
+#define SAVE_32FPVSRS(n,c,base)						\
+BEGIN_FTR_SECTION							\
+	b	2f;							\
+END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
+	SAVE_32FPRS(n,base);						\
+	b	3f;							\
+2:	SAVE_32VSRS(n,c,base);						\
+3:
+
+#else
+#define REST_32FPVSRS(n,b,base)	REST_32FPRS(n, base)
+#define SAVE_32FPVSRS(n,b,base)	SAVE_32FPRS(n, base)
+#endif
+
 #define SAVE_EVR(n,s,base)	evmergehi s,s,n; stw s,THREAD_EVR0+4*(n)(base)
 #define SAVE_2EVRS(n,s,base)	SAVE_EVR(n,s,base); SAVE_EVR(n+1,s,base)
 #define SAVE_4EVRS(n,s,base)	SAVE_2EVRS(n,s,base); SAVE_2EVRS(n+2,s,base)
@@ -534,6 +594,73 @@ END_FTR_SECTION_IFCLR(CPU_FTR_601)
 #define	vr30	30
 #define	vr31	31
 
+/* VSX Registers (VSRs) */
+
+#define	vsr0	0
+#define	vsr1	1
+#define	vsr2	2
+#define	vsr3	3
+#define	vsr4	4
+#define	vsr5	5
+#define	vsr6	6
+#define	vsr7	7
+#define	vsr8	8
+#define	vsr9	9
+#define	vsr10	10
+#define	vsr11	11
+#define	vsr12	12
+#define	vsr13	13
+#define	vsr14	14
+#define	vsr15	15
+#define	vsr16	16
+#define	vsr17	17
+#define	vsr18	18
+#define	vsr19	19
+#define	vsr20	20
+#define	vsr21	21
+#define	vsr22	22
+#define	vsr23	23
+#define	vsr24	24
+#define	vsr25	25
+#define	vsr26	26
+#define	vsr27	27
+#define	vsr28	28
+#define	vsr29	29
+#define	vsr30	30
+#define	vsr31	31
+#define	vsr32	32
+#define	vsr33	33
+#define	vsr34	34
+#define	vsr35	35
+#define	vsr36	36
+#define	vsr37	37
+#define	vsr38	38
+#define	vsr39	39
+#define	vsr40	40
+#define	vsr41	41
+#define	vsr42	42
+#define	vsr43	43
+#define	vsr44	44
+#define	vsr45	45
+#define	vsr46	46
+#define	vsr47	47
+#define	vsr48	48
+#define	vsr49	49
+#define	vsr50	50
+#define	vsr51	51
+#define	vsr52	52
+#define	vsr53	53
+#define	vsr54	54
+#define	vsr55	55
+#define	vsr56	56
+#define	vsr57	57
+#define	vsr58	58
+#define	vsr59	59
+#define	vsr60	60
+#define	vsr61	61
+#define	vsr62	62
+#define	vsr63	63
+
 /* SPE Registers (EVPRs) */
 
 #define	evr0	0


* [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
  2008-06-24 10:57       ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                           ` (2 preceding siblings ...)
  2008-06-24 10:57         ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
@ 2008-06-24 10:57         ` Michael Neuling
  2008-06-24 10:57         ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
                           ` (5 subsequent siblings)
  9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-24 10:57 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

The layout of the new VSR registers and how they overlap on top of the
legacy FPR and VR registers is:

                   VSR doubleword 0               VSR doubleword 1
          ----------------------------------------------------------------
  VSR[0]  |             FPR[0]            |                              |
          ----------------------------------------------------------------
  VSR[1]  |             FPR[1]            |                              |
          ----------------------------------------------------------------
          |              ...              |                              |
          |              ...              |                              |
          ----------------------------------------------------------------
  VSR[30] |             FPR[30]           |                              |
          ----------------------------------------------------------------
  VSR[31] |             FPR[31]           |                              |
          ----------------------------------------------------------------
  VSR[32] |                             VR[0]                            |
          ----------------------------------------------------------------
  VSR[33] |                             VR[1]                            |
          ----------------------------------------------------------------
          |                              ...                             |
          |                              ...                             |
          ----------------------------------------------------------------
  VSR[62] |                             VR[30]                           |
          ----------------------------------------------------------------
  VSR[63] |                             VR[31]                           |
          ----------------------------------------------------------------

VSX has 64 128-bit registers.  The first 32 regs overlap with the FP
registers and hence extend them with an additional 64 bits.  The
second 32 regs overlap with the VMX registers.

This patch introduces the thread_struct changes required to reflect
this register layout.  Ptrace and signals code is updated so that the
floating point registers are correctly accessed from the thread_struct
when CONFIG_VSX is enabled.
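
The sketch below illustrates why the ptrace and signal paths need a
local buffer once each FPR sits in the first doubleword of a 128-bit
slot: the 32 FPRs are no longer contiguous in memory, so they have to be
gathered before being copied out as a flat array.  The layout shown (two
doubles per slot, FPR in doubleword 0) follows the diagram above, but
struct demo_thread, DEMO_FPR and demo_fpr_get are hypothetical names,
not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_FPRSPACING	2	/* doubles per VSR slot (assumed) */
#define DEMO_FPR(t, i)	((t)->fpr[i][0])	/* FPR i is doubleword 0 */

/* The FPSCR image is copied as one double-sized blob. */
typedef char demo_size_check[(sizeof(double) == sizeof(uint64_t)) ? 1 : -1];

struct demo_thread {
	double fpr[32][DEMO_FPRSPACING];	/* low 32 VSRs */
	uint64_t fpscr;
};

/*
 * Gather the 32 FPRs plus the FPSCR into a flat 33-entry buffer, the
 * shape the legacy FP register set uses.
 */
static void demo_fpr_get(const struct demo_thread *t, double buf[33])
{
	int i;

	for (i = 0; i < 32; i++)
		buf[i] = DEMO_FPR(t, i);
	memcpy(&buf[32], &t->fpscr, sizeof(double));
}
```

Note that the buffer needs 33 entries, not 32, because the FPSCR is
stored immediately after the last FPR.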

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/asm-offsets.c |    4 ++
 arch/powerpc/kernel/ptrace.c      |   28 ++++++++++++++++++
 arch/powerpc/kernel/signal_32.c   |   59 ++++++++++++++++++++++++++++----------
 arch/powerpc/kernel/signal_64.c   |   32 ++++++++++++++++++--
 include/asm-powerpc/processor.h   |   21 ++++++++++++-
 5 files changed, 126 insertions(+), 18 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
@@ -74,6 +74,10 @@ int main(void)
 	DEFINE(THREAD_VSCR, offsetof(struct thread_struct, vscr));
 	DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	DEFINE(THREAD_VSR0, offsetof(struct thread_struct, fpr));
+	DEFINE(THREAD_USED_VSR, offsetof(struct thread_struct, used_vsr));
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_PPC64
 	DEFINE(KSP_VSID, offsetof(struct thread_struct, ksp_vsid));
 #else /* CONFIG_PPC64 */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -215,26 +215,54 @@ static int fpr_get(struct task_struct *t
 		   unsigned int pos, unsigned int count,
 		   void *kbuf, void __user *ubuf)
 {
+#ifdef CONFIG_VSX
+	double buf[33];
+	int i;
+#endif
 	flush_fp_to_thread(target);
 
+#ifdef CONFIG_VSX
+	/* copy to local buffer then write that out */
+	for (i = 0; i < 32 ; i++)
+		buf[i] = target->thread.TS_FPR(i);
+	memcpy(&buf[32], &target->thread.fpscr, sizeof(double));
+	return user_regset_copyout(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+
+#else
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
 		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
 				   target->thread.fpr, 0, -1);
+#endif
 }
 
 static int fpr_set(struct task_struct *target, const struct user_regset *regset,
 		   unsigned int pos, unsigned int count,
 		   const void *kbuf, const void __user *ubuf)
 {
+#ifdef CONFIG_VSX
+	double buf[33];
+	int i;
+#endif
 	flush_fp_to_thread(target);
 
+#ifdef CONFIG_VSX
+	/* copy in to a local buffer, then write that to the thread_struct */
+	i = user_regset_copyin(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+	if (i)
+		return i;
+	for (i = 0; i < 32 ; i++)
+		target->thread.TS_FPR(i) = buf[i];
+	memcpy(&target->thread.fpscr, &buf[32], sizeof(double));
+	return 0;
+#else
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
 		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
 				  target->thread.fpr, 0, -1);
+#endif
 }
 
 
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -337,14 +337,16 @@ static int save_user_regs(struct pt_regs
 		int sigret)
 {
 	unsigned long msr = regs->msr;
+#ifdef CONFIG_VSX
+	double buf[ELF_NFPREG];	/* 32 FPRs plus the FPSCR slot */
+	int i;
+#endif
 
 	/* Make sure floating point registers are stored in regs */
 	flush_fp_to_thread(current);
 
-	/* save general and floating-point registers */
-	if (save_general_regs(regs, frame) ||
-	    __copy_to_user(&frame->mc_fregs, current->thread.fpr,
-		    ELF_NFPREG * sizeof(double)))
+	/* save general registers */
+	if (save_general_regs(regs, frame))
 		return 1;
 
 #ifdef CONFIG_ALTIVEC
@@ -368,7 +370,20 @@ static int save_user_regs(struct pt_regs
 	if (__put_user(current->thread.vrsave, (u32 __user *)&frame->mc_vregs[32]))
 		return 1;
 #endif /* CONFIG_ALTIVEC */
-
+#ifdef CONFIG_VSX
+	/* copy FPRs and FPSCR to a local buffer, then write that to the frame */
+	flush_fp_to_thread(current);
+	for (i = 0; i < 32 ; i++)
+		buf[i] = current->thread.TS_FPR(i);
+	memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+	if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
+		return 1;
+#else
+	/* save floating-point registers */
+	if (__copy_to_user(&frame->mc_fregs, current->thread.fpr,
+		    ELF_NFPREG * sizeof(double)))
+		return 1;
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	/* save spe registers */
 	if (current->thread.used_spe) {
@@ -411,6 +426,10 @@ static long restore_user_regs(struct pt_
 	long err;
 	unsigned int save_r2 = 0;
 	unsigned long msr;
+#ifdef CONFIG_VSX
+	double buf[ELF_NFPREG];	/* 32 FPRs plus the FPSCR slot */
+	int i;
+#endif
 
 	/*
 	 * restore general registers but not including MSR or SOFTE. Also
@@ -438,16 +457,11 @@ static long restore_user_regs(struct pt_
 	 */
 	discard_lazy_cpu_state();
 
-	/* force the process to reload the FP registers from
-	   current->thread when it next does FP instructions */
-	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
-	if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
-			     sizeof(sr->mc_fregs)))
-		return 1;
-
 #ifdef CONFIG_ALTIVEC
-	/* force the process to reload the altivec registers from
-	   current->thread when it next does altivec instructions */
+	/*
+	 * Force the process to reload the altivec registers from
+	 * current->thread when it next does altivec instructions
+	 */
 	regs->msr &= ~MSR_VEC;
 	if (msr & MSR_VEC) {
 		/* restore altivec registers from the stack */
@@ -462,6 +476,23 @@ static long restore_user_regs(struct pt_
 		return 1;
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+	if (__copy_from_user(buf, &sr->mc_fregs,sizeof(sr->mc_fregs)))
+		return 1;
+	for (i = 0; i < 32 ; i++)
+		current->thread.TS_FPR(i) = buf[i];
+	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+#else
+	if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
+			     sizeof(sr->mc_fregs)))
+		return 1;
+#endif /* CONFIG_VSX */
+	/*
+	 * force the process to reload the FP registers from
+	 * current->thread when it next does FP instructions
+	 */
+	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
+
 #ifdef CONFIG_SPE
 	/* force the process to reload the spe registers from
 	   current->thread when it next does spe instructions */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -89,6 +89,10 @@ static long setup_sigcontext(struct sigc
 #endif
 	unsigned long msr = regs->msr;
 	long err = 0;
+#ifdef CONFIG_VSX
+	double buf[FP_REGS_SIZE];
+	int i;
+#endif
 
 	flush_fp_to_thread(current);
 
@@ -112,11 +116,21 @@ static long setup_sigcontext(struct sigc
 #else /* CONFIG_ALTIVEC */
 	err |= __put_user(0, &sc->v_regs);
 #endif /* CONFIG_ALTIVEC */
+	flush_fp_to_thread(current);
+#ifdef CONFIG_VSX
+	/* Copy FP to local buffer then write that out */
+	for (i = 0; i < 32 ; i++)
+		buf[i] = current->thread.TS_FPR(i);
+	memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+	err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+#else /* CONFIG_VSX */
+	/* copy fpr regs and fpscr */
+	err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
+#endif /* CONFIG_VSX */
 	err |= __put_user(&sc->gp_regs, &sc->regs);
 	WARN_ON(!FULL_REGS(regs));
 	err |= __copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE);
 	err |= __put_user(msr, &sc->gp_regs[PT_MSR]);
-	err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
 	err |= __put_user(signr, &sc->signal);
 	err |= __put_user(handler, &sc->handler);
 	if (set != NULL)
@@ -135,6 +149,9 @@ static long restore_sigcontext(struct pt
 #ifdef CONFIG_ALTIVEC
 	elf_vrreg_t __user *v_regs;
 #endif
+#ifdef CONFIG_VSX
+	double buf[FP_REGS_SIZE];
+#endif
 	unsigned long err = 0;
 	unsigned long save_r13 = 0;
 	elf_greg_t *gregs = (elf_greg_t *)regs;
@@ -182,8 +199,6 @@ static long restore_sigcontext(struct pt
 	 */
 	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
 
-	err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
-
 #ifdef CONFIG_ALTIVEC
 	err |= __get_user(v_regs, &sc->v_regs);
 	if (err)
@@ -202,7 +217,18 @@ static long restore_sigcontext(struct pt
 	else
 		current->thread.vrsave = 0;
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	/* restore floating point */
+	err |= __copy_from_user(buf, &sc->fp_regs, FP_REGS_SIZE);
+	if (err)
+		return err;
+	for (i = 0; i < 32 ; i++)
+		current->thread.TS_FPR(i) = buf[i];
+	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
 
+#else
+	err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
+#endif
 	return err;
 }
 
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -78,6 +78,7 @@ extern long kernel_thread(int (*fn)(void
 /* Lazy FPU handling on uni-processor */
 extern struct task_struct *last_task_used_math;
 extern struct task_struct *last_task_used_altivec;
+extern struct task_struct *last_task_used_vsx;
 extern struct task_struct *last_task_used_spe;
 
 #ifdef CONFIG_PPC32
@@ -136,7 +137,13 @@ typedef struct {
 	unsigned long seg;
 } mm_segment_t;
 
+#define TS_FPROFFSET 0
+#define TS_VSRLOWOFFSET 1
+#ifdef CONFIG_VSX
+#define TS_FPR(i) fpr[i][TS_FPROFFSET]
+#else
 #define TS_FPR(i) fpr[i]
+#endif
 
 struct thread_struct {
 	unsigned long	ksp;		/* Kernel stack pointer */
@@ -154,8 +161,12 @@ struct thread_struct {
 	unsigned long	dbcr0;		/* debug control register values */
 	unsigned long	dbcr1;
 #endif
+#ifdef CONFIG_VSX
+	double		fpr[32][2];	/* Complete floating point set */
+#else
 	double		fpr[32];	/* Complete floating point set */
-	struct {			/* fpr ... fpscr must be contiguous */
+#endif
+	struct {
 
 		unsigned int pad;
 		unsigned int val;	/* Floating point status */
@@ -175,6 +186,10 @@ struct thread_struct {
 	unsigned long	vrsave;
 	int		used_vr;	/* set if process has used altivec */
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	/* VSR status */
+	int		used_vsr;	/* set if process has used VSX */
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	unsigned long	evr[32];	/* upper 32-bits of SPE regs */
 	u64		acc;		/* Accumulator */
@@ -291,5 +306,9 @@ static inline void prefetchw(const void 
 
 #endif /* __KERNEL__ */
 #endif /* __ASSEMBLY__ */
+#ifdef CONFIG_VSX
+#define TS_FPRSPACING 2
+#else
 #define TS_FPRSPACING 1
+#endif
 #endif /* _ASM_POWERPC_PROCESSOR_H */
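
The layout change above can be modelled in user space. The following is only a sketch under stated assumptions, not kernel code: `struct fp_state_model`, `struct fpscr_model`, `pack_fpregs()` and `unpack_fpregs()` are hypothetical names invented for illustration. It shows why the CONFIG_VSX paths go through a local buffer: once `fpr` becomes `fpr[32][2]`, the 32 FPRs are no longer contiguous with each other or with the FPSCR, so they are gathered into the legacy 33-double layout before being copied out.

```c
#include <assert.h>
#include <string.h>

#define NFPR            32
#define TS_FPROFFSET    0   /* doubleword holding the classic FPR */
#define TS_VSRLOWOFFSET 1   /* other doubleword of VSR 0-31 */

/* Hypothetical stand-in for the thread_struct fpscr member. */
struct fpscr_model {
	unsigned int pad;
	unsigned int val;
};

/* Hypothetical stand-in for the FP part of thread_struct with CONFIG_VSX. */
struct fp_state_model {
	double fpr[NFPR][2];
	struct fpscr_model fpscr;
};

/* Same accessor shape as the TS_FPR() macro in the patch. */
#define TS_FPR(i) fpr[i][TS_FPROFFSET]

/* The FPSCR is copied as one double-sized slot, as in the patch. */
static_assert(sizeof(struct fpscr_model) == sizeof(double),
	      "fpscr must fill exactly one double slot");

/* Gather FPR0-31 and the FPSCR into the legacy 33-double layout. */
static void pack_fpregs(const struct fp_state_model *t, double out[NFPR + 1])
{
	int i;

	for (i = 0; i < NFPR; i++)
		out[i] = t->TS_FPR(i);
	memcpy(&out[NFPR], &t->fpscr, sizeof(double));
}

/* Scatter the legacy layout back; the second doublewords are untouched. */
static void unpack_fpregs(struct fp_state_model *t, const double in[NFPR + 1])
{
	int i;

	for (i = 0; i < NFPR; i++)
		t->TS_FPR(i) = in[i];
	memcpy(&t->fpscr, &in[NFPR], sizeof(double));
}
```

Note that the staging buffer needs 33 slots, one more than the number of FPRs, because the FPSCR is stored in the slot after FPR31.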

* [PATCH 9/9] powerpc: Add CONFIG_VSX config option
  2008-06-24 10:57       ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                           ` (6 preceding siblings ...)
  2008-06-24 10:57         ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
@ 2008-06-24 10:57         ` Michael Neuling
  2008-06-24 14:19           ` Kumar Gala
  2008-06-24 10:57         ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
  2008-06-25  4:07         ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
  9 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-24 10:57 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Add the CONFIG_VSX build option.  It depends on POWER4, PPC_FPU and ALTIVEC.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/platforms/Kconfig.cputype |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Index: linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/platforms/Kconfig.cputype
+++ linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
@@ -155,6 +155,22 @@ config ALTIVEC
 
 	  If in doubt, say Y here.
 
+config VSX
+	bool "VSX Support"
+	depends on POWER4 && ALTIVEC && PPC_FPU
+	---help---
+
+	  This option enables kernel support for the Vector Scalar extensions
+	  to the PowerPC processor. The kernel currently supports saving and
+	  restoring VSX registers, and turning on the 'VSX enable' bit so user
+	  processes can execute VSX instructions.
+
+	  This option is only useful if you have a processor that supports
+	  VSX (P7 and above), but does not have any effect on non-VSX
+	  CPUs (it does, however, add code to the kernel).
+
+	  If in doubt, say Y here.
+
 config SPE
 	bool "SPE Support"
 	depends on E200 || E500

* [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support
  2008-06-24 10:57       ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                           ` (3 preceding siblings ...)
  2008-06-24 10:57         ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
@ 2008-06-24 10:57         ` Michael Neuling
  2008-06-24 10:57         ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
                           ` (4 subsequent siblings)
  9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-24 10:57 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

This patch extends the floating point save and restore code to use the
VSX load/stores when VSX is available.  This will make FP context
save/restore marginally slower on FP only code, when VSX is available,
as it has to load/store 128bits rather than just 64bits.

Mixing FP, VMX and VSX code will get consistent architected state.

The signals interface is extended to enable access to VSR 0-31
doubleword 1 after discussions with tool chain maintainers.  Backward
compatibility is maintained.  

The ptrace interface is also extended to allow access to VSR 0-31 full
registers.
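
As an illustration of the signal-frame extension described above, here is a small user-space model. It is a sketch under stated assumptions rather than the kernel implementation: `struct thread_model`, `struct frame_model`, `MODEL_MSR_VSX`, `save_vsx_low()` and `restore_vsx_low()` are hypothetical names. The model keeps doubleword 1 of VSR 0-31 in a separate array of the frame and uses an MSR bit in the saved frame to say whether that array is valid, which is how a frame without VSX data stays acceptable.

```c
#include <assert.h>
#include <string.h>

#define NVSR            32
#define TS_VSRLOWOFFSET 1              /* doubleword 1 of VSR 0-31 */
#define MODEL_MSR_VSX   (1UL << 23)    /* illustrative flag bit for this model */

/* Hypothetical stand-in for the relevant thread_struct fields. */
struct thread_model {
	double fpr[NVSR][2];
	int used_vsr;
};

/* Hypothetical stand-in for the saved frame: MSR plus the VSX half array. */
struct frame_model {
	unsigned long msr;
	double vsregs[NVSR];
};

/* Save: only emit VSX data (and flag it) if the task ever used VSX. */
static void save_vsx_low(const struct thread_model *t, struct frame_model *f)
{
	int i;

	if (!t->used_vsr)
		return;
	for (i = 0; i < NVSR; i++)
		f->vsregs[i] = t->fpr[i][TS_VSRLOWOFFSET];
	f->msr |= MODEL_MSR_VSX;
}

/* Restore: trust the array only if flagged, otherwise clear stale state. */
static void restore_vsx_low(struct thread_model *t, const struct frame_model *f)
{
	int i;

	if (f->msr & MODEL_MSR_VSX) {
		for (i = 0; i < NVSR; i++)
			t->fpr[i][TS_VSRLOWOFFSET] = f->vsregs[i];
	} else if (t->used_vsr) {
		for (i = 0; i < NVSR; i++)
			t->fpr[i][TS_VSRLOWOFFSET] = 0;
	}
}
```

Gating on the flag means the VSX array is never read from a frame that does not carry it.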

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/entry_64.S   |    5 +
 arch/powerpc/kernel/fpu.S        |   16 ++++-
 arch/powerpc/kernel/head_64.S    |   65 +++++++++++++++++++++++
 arch/powerpc/kernel/misc_64.S    |   33 ++++++++++++
 arch/powerpc/kernel/ppc32.h      |    1 
 arch/powerpc/kernel/ppc_ksyms.c  |    3 +
 arch/powerpc/kernel/process.c    |  107 ++++++++++++++++++++++++++++++++++++++-
 arch/powerpc/kernel/ptrace.c     |   70 +++++++++++++++++++++++++
 arch/powerpc/kernel/signal_32.c  |   33 ++++++++++++
 arch/powerpc/kernel/signal_64.c  |   31 ++++++++++-
 arch/powerpc/kernel/traps.c      |   29 ++++++++++
 include/asm-powerpc/elf.h        |    6 +-
 include/asm-powerpc/ptrace.h     |   12 ++++
 include/asm-powerpc/reg.h        |    2 
 include/asm-powerpc/sigcontext.h |   37 +++++++++++++
 include/asm-powerpc/system.h     |    9 +++
 16 files changed, 451 insertions(+), 8 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/entry_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
@@ -353,6 +353,11 @@ _GLOBAL(_switch)
 	mflr	r20		/* Return to switch caller */
 	mfmsr	r22
 	li	r0, MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r0,r0,MSR_VSX@h	/* Disable VSX */
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_ALTIVEC
 BEGIN_FTR_SECTION
 	oris	r0,r0,MSR_VEC@h	/* Disable altivec */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -34,6 +34,11 @@
 _GLOBAL(load_up_fpu)
 	mfmsr	r5
 	ori	r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
 	SYNC
 	MTMSRD(r5)			/* enable use of fpu now */
 	isync
@@ -50,7 +55,7 @@ _GLOBAL(load_up_fpu)
 	beq	1f
 	toreal(r4)
 	addi	r4,r4,THREAD		/* want last_task_used_math->thread */
-	SAVE_32FPRS(0, r4)
+	SAVE_32FPVSRS(0, r5, r4)
 	mffs	fr0
 	stfd	fr0,THREAD_FPSCR(r4)
 	PPC_LL	r5,PT_REGS(r4)
@@ -77,7 +82,7 @@ _GLOBAL(load_up_fpu)
 #endif
 	lfd	fr0,THREAD_FPSCR(r5)
 	MTFSF_L(fr0)
-	REST_32FPRS(0, r5)
+	REST_32FPVSRS(0, r4, r5)
 #ifndef CONFIG_SMP
 	subi	r4,r5,THREAD
 	fromreal(r4)
@@ -96,6 +101,11 @@ _GLOBAL(load_up_fpu)
 _GLOBAL(giveup_fpu)
 	mfmsr	r5
 	ori	r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
 	SYNC_601
 	ISYNC_601
 	MTMSRD(r5)			/* enable use of fpu now */
@@ -106,7 +116,7 @@ _GLOBAL(giveup_fpu)
 	addi	r3,r3,THREAD	        /* want THREAD of task */
 	PPC_LL	r5,PT_REGS(r3)
 	PPC_LCMPI	0,r5,0
-	SAVE_32FPRS(0, r3)
+	SAVE_32FPVSRS(0, r4 ,r3)
 	mffs	fr0
 	stfd	fr0,THREAD_FPSCR(r3)
 	beq	1f
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -278,6 +278,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	. = 0xf20
 	b	altivec_unavailable_pSeries
 
+	. = 0xf40
+	b	vsx_unavailable_pSeries
+
 #ifdef CONFIG_CBE_RAS
 	HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
 #endif /* CONFIG_CBE_RAS */
@@ -297,6 +300,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	/* moved from 0xf00 */
 	STD_EXCEPTION_PSERIES(., performance_monitor)
 	STD_EXCEPTION_PSERIES(., altivec_unavailable)
+	STD_EXCEPTION_PSERIES(., vsx_unavailable)
 
 /*
  * An interrupt came in while soft-disabled; clear EE in SRR1,
@@ -836,6 +840,67 @@ _STATIC(load_up_altivec)
 	blr
 #endif /* CONFIG_ALTIVEC */
 
+	.align	7
+	.globl vsx_unavailable_common
+vsx_unavailable_common:
+	EXCEPTION_PROLOG_COMMON(0xf40, PACA_EXGEN)
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	bne	.load_up_vsx
+1:
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
+	bl	.save_nvgprs
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	ENABLE_INTS
+	bl	.vsx_unavailable_exception
+	b	.ret_from_except
+
+#ifdef CONFIG_VSX
+/*
+ * load_up_vsx(unused, unused, tsk)
+ * Disable VSX for the task which had it previously,
+ * and save its vector registers in its thread_struct.
+ * Reuse the fp and vsx saves, but first check to see if they have
+ * been saved already.
+ * On entry: r13 == 'current' && last_task_used_vsx != 'current'
+ */
+_STATIC(load_up_vsx)
+/* Load FP and VMX registers if they haven't been done yet */
+	andi.	r5,r12,MSR_FP
+	beql+	load_up_fpu		/* skip if already loaded */
+	andis.	r5,r12,MSR_VEC@h
+	beql+	load_up_altivec		/* skip if already loaded */
+
+#ifndef CONFIG_SMP
+	ld	r3,last_task_used_vsx@got(r2)
+	ld	r4,0(r3)
+	cmpdi	0,r4,0
+	beq	1f
+	/* Disable VSX for last_task_used_vsx */
+	addi	r4,r4,THREAD
+	ld	r5,PT_REGS(r4)
+	ld	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+	lis	r6,MSR_VSX@h
+	andc	r6,r4,r6
+	std	r6,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#endif /* CONFIG_SMP */
+	ld	r4,PACACURRENT(r13)
+	addi	r4,r4,THREAD		/* Get THREAD */
+	li	r6,1
+	stw	r6,THREAD_USED_VSR(r4) /* ... also set thread used vsr */
+	/* enable use of VSX after return */
+	oris	r12,r12,MSR_VSX@h
+	std	r12,_MSR(r1)
+#ifndef CONFIG_SMP
+	/* Update last_task_used_vsx to 'current' */
+	ld	r4,PACACURRENT(r13)
+	std	r4,0(r3)
+#endif /* CONFIG_SMP */
+	b	fast_exception_return
+#endif /* CONFIG_VSX */
+
 /*
  * Hash table stuff
  */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/misc_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
@@ -506,6 +506,39 @@ _GLOBAL(giveup_altivec)
 
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+/*
+ * giveup_vsx(tsk)
+ * Disable VSX for the task given as the argument,
+ * and save the vector registers in its thread_struct.
+ * Enables the VSX for use in the kernel on return.
+ */
+_GLOBAL(giveup_vsx)
+	mfmsr	r5
+	oris	r5,r5,MSR_VSX@h
+	mtmsrd	r5			/* enable use of VSX now */
+	isync
+
+	cmpdi	0,r3,0
+	beqlr-				/* if no previous owner, done */
+	addi	r3,r3,THREAD		/* want THREAD of task */
+	ld	r5,PT_REGS(r3)
+	cmpdi	0,r5,0
+	beq	1f
+	ld	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+	lis	r3,MSR_VSX@h
+	andc	r4,r4,r3		/* disable VSX for previous task */
+	std	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#ifndef CONFIG_SMP
+	li	r5,0
+	ld	r4,last_task_used_vsx@got(r2)
+	std	r5,0(r4)
+#endif /* CONFIG_SMP */
+	blr
+
+#endif /* CONFIG_VSX */
+
 /* kexec_wait(phys_cpu)
  *
  * wait for the flag to change, indicating this kernel is going away but
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc32.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
@@ -120,6 +120,7 @@ struct mcontext32 {
 	elf_fpregset_t		mc_fregs;
 	unsigned int		mc_pad[2];
 	elf_vrregset_t32	mc_vregs __attribute__((__aligned__(16)));
+	elf_vsrreghalf_t32      mc_vsregs __attribute__((__aligned__(16)));
 };
 
 struct ucontext32 { 
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc_ksyms.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
@@ -102,6 +102,9 @@ EXPORT_SYMBOL(giveup_fpu);
 #ifdef CONFIG_ALTIVEC
 EXPORT_SYMBOL(giveup_altivec);
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+EXPORT_SYMBOL(giveup_vsx);
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 EXPORT_SYMBOL(giveup_spe);
 #endif /* CONFIG_SPE */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -53,6 +53,7 @@ extern unsigned long _get_SP(void);
 #ifndef CONFIG_SMP
 struct task_struct *last_task_used_math = NULL;
 struct task_struct *last_task_used_altivec = NULL;
+struct task_struct *last_task_used_vsx = NULL;
 struct task_struct *last_task_used_spe = NULL;
 #endif
 
@@ -106,11 +107,23 @@ EXPORT_SYMBOL(enable_kernel_fp);
 
 int dump_task_fpu(struct task_struct *tsk, elf_fpregset_t *fpregs)
 {
+#ifdef CONFIG_VSX
+	int i;
+	elf_fpreg_t *reg;
+#endif
+
 	if (!tsk->thread.regs)
 		return 0;
 	flush_fp_to_thread(current);
 
+#ifdef CONFIG_VSX
+	reg = (elf_fpreg_t *)fpregs;
+	for (i = 0; i < ELF_NFPREG - 1; i++, reg++)
+		*reg = tsk->thread.TS_FPR(i);
+	memcpy(reg, &tsk->thread.fpscr, sizeof(elf_fpreg_t));
+#else
 	memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
+#endif
 
 	return 1;
 }
@@ -149,7 +162,7 @@ void flush_altivec_to_thread(struct task
 	}
 }
 
-int dump_task_altivec(struct task_struct *tsk, elf_vrregset_t *vrregs)
+int dump_task_altivec(struct task_struct *tsk, elf_vrreg_t *vrregs)
 {
 	/* ELF_NVRREG includes the VSCR and VRSAVE which we need to save
 	 * separately, see below */
@@ -179,6 +192,80 @@ int dump_task_altivec(struct task_struct
 }
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+#if 0
+/* not currently used, but some crazy RAID module might want to later */
+void enable_kernel_vsx(void)
+{
+	WARN_ON(preemptible());
+
+#ifdef CONFIG_SMP
+	if (current->thread.regs && (current->thread.regs->msr & MSR_VSX))
+		giveup_vsx(current);
+	else
+		giveup_vsx(NULL);	/* just enable vsx for kernel - force */
+#else
+	giveup_vsx(last_task_used_vsx);
+#endif /* CONFIG_SMP */
+}
+EXPORT_SYMBOL(enable_kernel_vsx);
+#endif
+
+void flush_vsx_to_thread(struct task_struct *tsk)
+{
+	if (tsk->thread.regs) {
+		preempt_disable();
+		if (tsk->thread.regs->msr & MSR_VSX) {
+#ifdef CONFIG_SMP
+			BUG_ON(tsk != current);
+#endif
+			giveup_vsx(tsk);
+		}
+		preempt_enable();
+	}
+}
+
+/*
+ * This dumps the low 64 bits of each of the first 32 VSX registers.
+ * This needs to be called with dump_task_fpu and dump_task_altivec to
+ * get all the VSX state.
+ */
+int dump_task_vsx(struct task_struct *tsk, elf_vrreg_t *vrregs)
+{
+	elf_vrreg_t *reg;
+	double buf[32];
+	int i;
+
+	if (tsk == current)
+		flush_vsx_to_thread(tsk);
+
+	reg = (elf_vrreg_t *)vrregs;
+
+	for (i = 0; i < 32 ; i++)
+		buf[i] = tsk->thread.fpr[i][TS_VSRLOWOFFSET];
+	memcpy(reg, buf, sizeof(buf));
+
+	return 1;
+}
+#endif /* CONFIG_VSX */
+
+int dump_task_vector(struct task_struct *tsk, elf_vrregset_t *vrregs)
+{
+	int rc = 0;
+	elf_vrreg_t *regs = (elf_vrreg_t *)vrregs;
+#ifdef CONFIG_ALTIVEC
+	rc = dump_task_altivec(tsk, regs);
+	if (rc)
+		return rc;
+	regs += ELF_NVRREG;
+#endif
+
+#ifdef CONFIG_VSX
+	rc = dump_task_vsx(tsk, regs);
+#endif
+	return rc;
+}
+
 #ifdef CONFIG_SPE
 
 void enable_kernel_spe(void)
@@ -233,6 +320,10 @@ void discard_lazy_cpu_state(void)
 	if (last_task_used_altivec == current)
 		last_task_used_altivec = NULL;
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	if (last_task_used_vsx == current)
+		last_task_used_vsx = NULL;
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	if (last_task_used_spe == current)
 		last_task_used_spe = NULL;
@@ -297,6 +388,10 @@ struct task_struct *__switch_to(struct t
 	if (prev->thread.regs && (prev->thread.regs->msr & MSR_VEC))
 		giveup_altivec(prev);
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	if (prev->thread.regs && (prev->thread.regs->msr & MSR_VSX))
+		giveup_vsx(prev);
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	/*
 	 * If the previous thread used spe in the last quantum
@@ -317,6 +412,10 @@ struct task_struct *__switch_to(struct t
 	if (new->thread.regs && last_task_used_altivec == new)
 		new->thread.regs->msr |= MSR_VEC;
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	if (new->thread.regs && last_task_used_vsx == new)
+		new->thread.regs->msr |= MSR_VSX;
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	/* Avoid the trap.  On smp this this never happens since
 	 * we don't set last_task_used_spe
@@ -417,6 +516,8 @@ static struct regbit {
 	{MSR_EE,	"EE"},
 	{MSR_PR,	"PR"},
 	{MSR_FP,	"FP"},
+	{MSR_VEC,	"VEC"},
+	{MSR_VSX,	"VSX"},
 	{MSR_ME,	"ME"},
 	{MSR_IR,	"IR"},
 	{MSR_DR,	"DR"},
@@ -534,6 +635,7 @@ void prepare_to_copy(struct task_struct 
 {
 	flush_fp_to_thread(current);
 	flush_altivec_to_thread(current);
+	flush_vsx_to_thread(current);
 	flush_spe_to_thread(current);
 }
 
@@ -689,6 +791,9 @@ void start_thread(struct pt_regs *regs, 
 #endif
 
 	discard_lazy_cpu_state();
+#ifdef CONFIG_VSX
+	current->thread.used_vsr = 0;
+#endif
 	memset(current->thread.fpr, 0,
 	       sizeof(current->thread.fpr));
 	current->thread.fpscr.val = 0;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -351,6 +351,51 @@ static int vr_set(struct task_struct *ta
 }
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+/*
+ * Currently, to set and get all the VSX state, you need to call
+ * the FP and VMX calls as well.  This only gets/sets the lower 32
+ * 128-bit VSX registers.
+ */
+
+static int vsr_active(struct task_struct *target,
+		      const struct user_regset *regset)
+{
+	flush_vsx_to_thread(target);
+	return target->thread.used_vsr ? regset->n : 0;
+}
+
+static int vsr_get(struct task_struct *target, const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   void *kbuf, void __user *ubuf)
+{
+	int ret;
+
+	flush_vsx_to_thread(target);
+
+	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+				  target->thread.fpr, 0,
+				  32 * sizeof(vector128));
+
+	return ret;
+}
+
+static int vsr_set(struct task_struct *target, const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   const void *kbuf, const void __user *ubuf)
+{
+	int ret;
+
+	flush_vsx_to_thread(target);
+
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+				 target->thread.fpr, 0,
+				 32 * sizeof(vector128));
+
+	return ret;
+}
+#endif /* CONFIG_VSX */
+
 #ifdef CONFIG_SPE
 
 /*
@@ -427,6 +472,9 @@ enum powerpc_regset {
 #ifdef CONFIG_ALTIVEC
 	REGSET_VMX,
 #endif
+#ifdef CONFIG_VSX
+	REGSET_VSX,
+#endif
 #ifdef CONFIG_SPE
 	REGSET_SPE,
 #endif
@@ -450,6 +498,13 @@ static const struct user_regset native_r
 		.active = vr_active, .get = vr_get, .set = vr_set
 	},
 #endif
+#ifdef CONFIG_VSX
+	[REGSET_VSX] = {
+		.n = 32,
+		.size = sizeof(vector128), .align = sizeof(vector128),
+		.active = vsr_active, .get = vsr_get, .set = vsr_set
+	},
+#endif
 #ifdef CONFIG_SPE
 	[REGSET_SPE] = {
 		.n = 35,
@@ -850,6 +905,21 @@ long arch_ptrace(struct task_struct *chi
 						 sizeof(u32)),
 					     (const void __user *) data);
 #endif
+#ifdef CONFIG_VSX
+	case PTRACE_GETVSRREGS:
+		return copy_regset_to_user(child, &user_ppc_native_view,
+					   REGSET_VSX,
+					   0, (32 * sizeof(vector128) +
+					       sizeof(u32)),
+					   (void __user *) data);
+
+	case PTRACE_SETVSRREGS:
+		return copy_regset_from_user(child, &user_ppc_native_view,
+					     REGSET_VSX,
+					     0, (32 * sizeof(vector128) +
+						 sizeof(u32)),
+					     (const void __user *) data);
+#endif
 #ifdef CONFIG_SPE
 	case PTRACE_GETEVRREGS:
 		/* Get the child spe register state. */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -378,6 +378,21 @@ static int save_user_regs(struct pt_regs
 	memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
 	if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
 		return 1;
+	/*
+	 * Copy VSR 0-31 low doubleword from thread_struct to local
+	 * buffer, then write that to userspace.  Also set MSR_VSX in
+	 * the saved MSR value to indicate that frame->mc_vsregs
+	 * contains valid data
+	 */
+	if (current->thread.used_vsr) {
+		flush_vsx_to_thread(current);
+		for (i = 0; i < 32 ; i++)
+			buf[i] = current->thread.fpr[i][TS_VSRLOWOFFSET];
+		if (__copy_to_user(&frame->mc_vsregs, buf,
+				   ELF_NVSRHALFREG  * sizeof(double)))
+			return 1;
+		msr |= MSR_VSX;
+	}
 #else
 	/* save floating-point registers */
 	if (__copy_to_user(&frame->mc_fregs, current->thread.fpr,
@@ -482,6 +497,24 @@ static long restore_user_regs(struct pt_
 	for (i = 0; i < 32 ; i++)
 		current->thread.TS_FPR(i) = buf[i];
 	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+	/*
+	 * Force the process to reload the VSX registers from
+	 * current->thread when it next executes a VSX instruction.
+	 */
+	regs->msr &= ~MSR_VSX;
+	if (msr & MSR_VSX) {
+		/*
+		 * Restore VSX registers from the stack to a local
+		 * buffer, then write this out to the thread_struct
+		 */
+		if (__copy_from_user(buf, &sr->mc_vsregs,
+				     sizeof(sr->mc_vsregs)))
+			return 1;
+		for (i = 0; i < 32 ; i++)
+			current->thread.fpr[i][TS_VSRLOWOFFSET] = buf[i];
+	} else if (current->thread.used_vsr)
+		for (i = 0; i < 32 ; i++)
+			current->thread.fpr[i][TS_VSRLOWOFFSET] = 0;
 #else
 	if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
 			     sizeof(sr->mc_fregs)))
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -123,6 +123,22 @@ static long setup_sigcontext(struct sigc
 		buf[i] = current->thread.TS_FPR(i);
 	memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
 	err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+	/*
+	 * Copy VSX low doubleword to local buffer for formatting,
+	 * then out to userspace.  Update v_regs to point after the
+	 * VMX data.
+	 */
+	if (current->thread.used_vsr) {
+		flush_vsx_to_thread(current);
+		v_regs += ELF_NVRREG;
+		for (i = 0; i < 32 ; i++)
+			buf[i] = current->thread.fpr[i][TS_VSRLOWOFFSET];
+		err |= __copy_to_user(v_regs, buf, 32 * sizeof(double));
+		/* set MSR_VSX in the MSR value in the frame to
+		 * indicate that sc->vs_reg contains valid data.
+		 */
+		msr |= MSR_VSX;
+	}
 #else /* CONFIG_VSX */
 	/* copy fpr regs and fpscr */
 	err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
@@ -197,7 +213,7 @@ static long restore_sigcontext(struct pt
 	 * This has to be done before copying stuff into current->thread.fpr/vr
 	 * for the reasons explained in the previous comment.
 	 */
-	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
+	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC | MSR_VSX);
 
 #ifdef CONFIG_ALTIVEC
 	err |= __get_user(v_regs, &sc->v_regs);
@@ -226,6 +242,19 @@ static long restore_sigcontext(struct pt
 		current->thread.TS_FPR(i) = buf[i];
 	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
 
+	/*
+	 * Get additional VSX data. Update v_regs to point after the
+	 * VMX data.  Copy VSX low doubleword from userspace to local
+	 * buffer for formatting, then into the thread_struct.
+	 */
+	v_regs += ELF_NVRREG;
+	if ((msr & MSR_VSX) != 0)
+		err |= __copy_from_user(buf, v_regs, 32 * sizeof(double));
+	else
+		memset(buf, 0, 32 * sizeof(double));
+
+	for (i = 0; i < 32 ; i++)
+		current->thread.fpr[i][TS_VSRLOWOFFSET] = buf[i];
 #else
 	err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
 #endif
Index: linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/traps.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
@@ -967,6 +967,20 @@ void altivec_unavailable_exception(struc
 	die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT);
 }
 
+void vsx_unavailable_exception(struct pt_regs *regs)
+{
+	if (user_mode(regs)) {
+		/* A user program has executed a VSX instruction,
+		   but this kernel doesn't support VSX. */
+		_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+		return;
+	}
+
+	printk(KERN_EMERG "Unrecoverable VSX Unavailable Exception "
+			"%lx at %lx\n", regs->trap, regs->nip);
+	die("Unrecoverable VSX Unavailable Exception", regs, SIGABRT);
+}
+
 void performance_monitor_exception(struct pt_regs *regs)
 {
 	perf_irq(regs);
@@ -1091,6 +1105,21 @@ void altivec_assist_exception(struct pt_
 }
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+void vsx_assist_exception(struct pt_regs *regs)
+{
+	if (!user_mode(regs)) {
+		printk(KERN_EMERG "VSX assist exception in kernel mode"
+		       " at %lx\n", regs->nip);
+		die("Kernel VSX assist exception", regs, SIGILL);
+	}
+
+	flush_vsx_to_thread(current);
+	printk(KERN_INFO "VSX assist not supported at %lx\n", regs->nip);
+	_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+}
+#endif /* CONFIG_VSX */
+
 #ifdef CONFIG_FSL_BOOKE
 void CacheLockingException(struct pt_regs *regs, unsigned long address,
 			   unsigned long error_code)
Index: linux-2.6-ozlabs/include/asm-powerpc/elf.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/elf.h
+++ linux-2.6-ozlabs/include/asm-powerpc/elf.h
@@ -109,6 +109,7 @@ typedef elf_gregset_t32 compat_elf_gregs
 #ifdef __powerpc64__
 # define ELF_NVRREG32	33	/* includes vscr & vrsave stuffed together */
 # define ELF_NVRREG	34	/* includes vscr & vrsave in split vectors */
+# define ELF_NVSRHALFREG 32	/* Half the vsx registers */
 # define ELF_GREG_TYPE	elf_greg_t64
 #else
 # define ELF_NEVRREG	34	/* includes acc (as 2) */
@@ -158,6 +159,7 @@ typedef __vector128 elf_vrreg_t;
 typedef elf_vrreg_t elf_vrregset_t[ELF_NVRREG];
 #ifdef __powerpc64__
 typedef elf_vrreg_t elf_vrregset_t32[ELF_NVRREG32];
+typedef elf_fpreg_t elf_vsrreghalf_t32[ELF_NVSRHALFREG];
 #endif
 
 #ifdef __KERNEL__
@@ -219,8 +221,8 @@ extern int dump_task_fpu(struct task_str
 typedef elf_vrregset_t elf_fpxregset_t;
 
 #ifdef CONFIG_ALTIVEC
-extern int dump_task_altivec(struct task_struct *, elf_vrregset_t *vrregs);
-#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_altivec(tsk, regs)
+extern int dump_task_vector(struct task_struct *, elf_vrregset_t *vrregs);
+#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_vector(tsk, regs)
 #define ELF_CORE_XFPREG_TYPE NT_PPC_VMX
 #endif
 
Index: linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ptrace.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
@@ -223,6 +223,14 @@ extern void user_disable_single_step(str
 #define PT_VRSAVE_32 (PT_VR0 + 33*4)
 #endif
 
+/*
+ * Only the first 32 VSRs are stored here.  The second 32 VSRs are in VR0-31.
+ */
+#define PT_VSR0 150	/* each VSR reg occupies 2 slots in 64-bit */
+#define PT_VSR31 (PT_VSR0 + 2*31)
+#ifdef __KERNEL__
+#define PT_VSR0_32 300 	/* each VSR reg occupies 4 slots in 32-bit */
+#endif
 #endif /* __powerpc64__ */
 
 /*
@@ -245,6 +253,10 @@ extern void user_disable_single_step(str
 #define PTRACE_GETEVRREGS	20
 #define PTRACE_SETEVRREGS	21
 
+/* Get the first 32 128bit VSX registers */
+#define PTRACE_GETVSRREGS	27
+#define PTRACE_SETVSRREGS	28
+
 /*
  * Get or set a debug register. The first 16 are DABR registers and the
  * second 16 are IABR registers.
Index: linux-2.6-ozlabs/include/asm-powerpc/reg.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/reg.h
+++ linux-2.6-ozlabs/include/asm-powerpc/reg.h
@@ -30,6 +30,7 @@
 #define MSR_ISF_LG	61              /* Interrupt 64b mode valid on 630 */
 #define MSR_HV_LG 	60              /* Hypervisor state */
 #define MSR_VEC_LG	25	        /* Enable AltiVec */
+#define MSR_VSX_LG	23		/* Enable VSX */
 #define MSR_POW_LG	18		/* Enable Power Management */
 #define MSR_WE_LG	18		/* Wait State Enable */
 #define MSR_TGPR_LG	17		/* TLB Update registers in use */
@@ -71,6 +72,7 @@
 #endif
 
 #define MSR_VEC		__MASK(MSR_VEC_LG)	/* Enable AltiVec */
+#define MSR_VSX		__MASK(MSR_VSX_LG)	/* Enable VSX */
 #define MSR_POW		__MASK(MSR_POW_LG)	/* Enable Power Management */
 #define MSR_WE		__MASK(MSR_WE_LG)	/* Wait State Enable */
 #define MSR_TGPR	__MASK(MSR_TGPR_LG)	/* TLB Update registers in use */
Index: linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/sigcontext.h
+++ linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
@@ -43,9 +43,44 @@ struct sigcontext {
  * it must be copied via a vector register to/from storage) or as a word.
  * The entry with index 33 contains the vrsave as the first word (offset 0)
  * within the quadword.
+ *
+ * Part of the VSX data is also stored here, by extending vmx_reserve
+ * by an additional 32 doublewords.  Architecturally, the layout of
+ * the VSR registers and how they overlap on top of the legacy FPR and
+ * VR registers is shown below:
+ *
+ *                    VSR doubleword 0               VSR doubleword 1
+ *           ----------------------------------------------------------------
+ *   VSR[0]  |             FPR[0]            |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[1]  |             FPR[1]            |                              |
+ *           ----------------------------------------------------------------
+ *           |              ...              |                              |
+ *           |              ...              |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[30] |             FPR[30]           |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[31] |             FPR[31]           |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[32] |                             VR[0]                            |
+ *           ----------------------------------------------------------------
+ *   VSR[33] |                             VR[1]                            |
+ *           ----------------------------------------------------------------
+ *           |                              ...                             |
+ *           |                              ...                             |
+ *           ----------------------------------------------------------------
+ *   VSR[62] |                             VR[30]                           |
+ *           ----------------------------------------------------------------
+ *   VSR[63] |                             VR[31]                           |
+ *           ----------------------------------------------------------------
+ *
+ * FPR/VSR 0-31 doubleword 0 is stored in fp_regs, and VMX/VSR 32-63
+ * are stored at the start of vmx_reserve.  vmx_reserve is extended for
+ * backwards compatibility to store VSR 0-31 doubleword 1 after the VMX
+ * registers and vscr/vrsave.
  */
 	elf_vrreg_t	__user *v_regs;
-	long		vmx_reserve[ELF_NVRREG+ELF_NVRREG+1];
+	long		vmx_reserve[ELF_NVRREG+ELF_NVRREG+32+1];
 #endif
 };
 
Index: linux-2.6-ozlabs/include/asm-powerpc/system.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/system.h
+++ linux-2.6-ozlabs/include/asm-powerpc/system.h
@@ -132,6 +132,7 @@ extern void enable_kernel_altivec(void);
 extern void giveup_altivec(struct task_struct *);
 extern void load_up_altivec(struct task_struct *);
 extern int emulate_altivec(struct pt_regs *);
+extern void giveup_vsx(struct task_struct *);
 extern void enable_kernel_spe(void);
 extern void giveup_spe(struct task_struct *);
 extern void load_up_spe(struct task_struct *);
@@ -155,6 +156,14 @@ static inline void flush_altivec_to_thre
 }
 #endif
 
+#ifdef CONFIG_VSX
+extern void flush_vsx_to_thread(struct task_struct *);
+#else
+static inline void flush_vsx_to_thread(struct task_struct *t)
+{
+}
+#endif
+
 #ifdef CONFIG_SPE
 extern void flush_spe_to_thread(struct task_struct *);
 #else
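
The vmx_reserve sizing in the sigcontext.h hunk above can be checked with a
small sketch.  The register counts are taken from the elf.h hunk of this
patch; the helper names are hypothetical, and the sketch assumes 8-byte
longs and that the trailing "+1" is slack for aligning the save area, which
the patch itself does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Values taken from the elf.h hunk of this patch. */
#define SK_ELF_NVRREG		34	/* 32 VRs plus vscr and vrsave */
#define SK_ELF_NVSRHALFREG	32	/* doubleword 1 of VSR 0-31 */

/* Length of vmx_reserve[] in longs before this patch. */
static size_t vmx_reserve_longs_old(void)
{
	return SK_ELF_NVRREG + SK_ELF_NVRREG + 1;
}

/* Length of vmx_reserve[] in longs after this patch. */
static size_t vmx_reserve_longs_new(void)
{
	return SK_ELF_NVRREG + SK_ELF_NVRREG + SK_ELF_NVSRHALFREG + 1;
}

/*
 * Offset, in longs from the start of the VMX save area, of doubleword 1
 * of VSR[n].  Assumes 8-byte longs, so that each 16-byte vector register
 * occupies two longs and the VSX halves follow the 34 VMX entries.
 */
static size_t vsr_dw1_offset_longs(unsigned int n)
{
	assert(n < SK_ELF_NVSRHALFREG);
	return 2 * SK_ELF_NVRREG + n;
}
```

Under those assumptions the array grows from 69 to 101 longs, and the last
VSX half lands at offset 99, which still leaves one long of slack.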

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
  2008-06-24 10:57         ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
@ 2008-06-24 13:47           ` Kumar Gala
  0 siblings, 0 replies; 106+ messages in thread
From: Kumar Gala @ 2008-06-24 13:47 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras


On Jun 24, 2008, at 5:57 AM, Michael Neuling wrote:

> If we set the SPE MSR bit in save_user_regs we can blow away the VEC
> bit.  This doesn't matter in reality as they are in fact the same bit,
> but it looks bad.
>
> Also, when we add VSX in a later patch, we need to be able to set two
> separate MSR bits here.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>

Acked-by: Kumar Gala <galak@kernel.crashing.org>

- k
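
The need for two separate MSR bits can be illustrated with a small sketch.
This is not the signal_32.c code: the function names are hypothetical and
only show the failure pattern described above.  The bit positions are the
MSR_VEC_LG (25) and MSR_VSX_LG (23) values from the reg.h hunk of patch 8/9.

```c
#include <assert.h>

#define SK_MSR_VEC	(1UL << 25)	/* MSR_VEC_LG from reg.h */
#define SK_MSR_VSX	(1UL << 23)	/* MSR_VSX_LG from patch 8/9 */

/*
 * Pattern to avoid: each store recomputes the value from the original
 * msr, so the second store throws away the bit set by the first.
 */
static unsigned long saved_msr_overwrite(unsigned long msr)
{
	unsigned long slot;

	slot = msr | SK_MSR_VEC;
	slot = msr | SK_MSR_VSX;	/* VEC bit is lost here */
	return slot;
}

/*
 * Pattern the patch description asks for: accumulate the bits in a
 * local copy and store the result once.
 */
static unsigned long saved_msr_accumulate(unsigned long msr)
{
	unsigned long m = msr;

	/* The problem only shows up when the two bits are distinct. */
	assert((SK_MSR_VEC & SK_MSR_VSX) == 0);
	m |= SK_MSR_VEC;
	m |= SK_MSR_VSX;
	return m;
}
```

With SPE and VEC sharing one bit the overwrite is harmless, which is why the
original code only "looks bad"; with VSX on a different bit it would not be.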

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable
  2008-06-24 10:57         ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
@ 2008-06-24 14:01           ` Kumar Gala
  0 siblings, 0 replies; 106+ messages in thread
From: Kumar Gala @ 2008-06-24 14:01 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras


On Jun 24, 2008, at 5:57 AM, Michael Neuling wrote:

> Make load_up_fpu and load_up_altivec callable so they can be reused by
> the VSX code.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>

Acked-by: Kumar Gala <galak@kernel.crashing.org>

- k

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 7/9] powerpc: Add VSX assembler code macros
  2008-06-24 10:57         ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
@ 2008-06-24 14:06           ` Kumar Gala
  2008-06-25  0:06             ` Michael Neuling
  0 siblings, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-24 14:06 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras


On Jun 24, 2008, at 5:57 AM, Michael Neuling wrote:

> This adds the macros for the VSX load/store instruction as most
> binutils are not going to support this for a while.
>
> Also add VSX register save/restore macros and vsr[0-63] register  
> definitions.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
>
> include/asm-powerpc/ppc_asm.h |  127 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 127 insertions(+)
>
> Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
> ===================================================================
> --- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
> +++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
> @@ -74,6 +74,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);				
> 				REST_10GPRS(22, base)
> #endif
>
> +/*
> + * Define what the VSX XX1 form instructions will look like, then add
> + * the 128 bit load store instructions based on that.
> + */
> +#define VSX_XX1(xs, ra, rb)	(((xs) & 0x1f) << 21 | ((ra) << 16) |  \
> +				 ((rb) << 11) | (((xs) >> 5)))
> +
> +#define STXVD2X(xs, ra, rb)	.long (0x7c000798 | VSX_XX1((xs), (ra), (rb)))
> +#define LXVD2X(xs, ra, rb)	.long (0x7c000698 | VSX_XX1((xs), (ra), (rb)))
>
> #define SAVE_2GPRS(n, base)	SAVE_GPR(n, base); SAVE_GPR(n+1, base)
> #define SAVE_4GPRS(n, base)	SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
> @@ -110,6 +119,57 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);				
> #define REST_16VRS(n,b,base)	REST_8VRS(n,b,base); REST_8VRS(n+8,b,base)
> #define REST_32VRS(n,b,base)	REST_16VRS(n,b,base); REST_16VRS(n+16,b,base)
>
> +/* Save the lower 32 VSRs in the thread VSR region */
> +#define SAVE_VSR(n,b,base)	li b,THREAD_VSR0+(16*(n));  STXVD2X(n,b,base)
> +#define SAVE_2VSRS(n,b,base)	SAVE_VSR(n,b,base); SAVE_VSR(n+1,b,base)
> +#define SAVE_4VSRS(n,b,base)	SAVE_2VSRS(n,b,base); SAVE_2VSRS(n+2,b,base)
> +#define SAVE_8VSRS(n,b,base)	SAVE_4VSRS(n,b,base); SAVE_4VSRS(n+4,b,base)
> +#define SAVE_16VSRS(n,b,base)	SAVE_8VSRS(n,b,base); SAVE_8VSRS(n+8,b,base)
> +#define SAVE_32VSRS(n,b,base)	SAVE_16VSRS(n,b,base); SAVE_16VSRS(n+16,b,base)
> +#define REST_VSR(n,b,base)	li b,THREAD_VSR0+(16*(n)); LXVD2X(n,b,base)
> +#define REST_2VSRS(n,b,base)	REST_VSR(n,b,base); REST_VSR(n+1,b,base)
> +#define REST_4VSRS(n,b,base)	REST_2VSRS(n,b,base); REST_2VSRS(n+2,b,base)
> +#define REST_8VSRS(n,b,base)	REST_4VSRS(n,b,base); REST_4VSRS(n+4,b,base)
> +#define REST_16VSRS(n,b,base)	REST_8VSRS(n,b,base); REST_8VSRS(n+8,b,base)
> +#define REST_32VSRS(n,b,base)	REST_16VSRS(n,b,base); REST_16VSRS(n+16,b,base)
> +/* Save the upper 32 VSRs (32-63) in the thread VSX region (0-31) */
> +#define SAVE_VSRU(n,b,base)	li b,THREAD_VR0+(16*(n));  STXVD2X(n+32,b,base)
> +#define SAVE_2VSRSU(n,b,base)	SAVE_VSRU(n,b,base); SAVE_VSRU(n+1,b,base)
> +#define SAVE_4VSRSU(n,b,base)	SAVE_2VSRSU(n,b,base); SAVE_2VSRSU(n+2,b,base)
> +#define SAVE_8VSRSU(n,b,base)	SAVE_4VSRSU(n,b,base); SAVE_4VSRSU(n+4,b,base)
> +#define SAVE_16VSRSU(n,b,base)	SAVE_8VSRSU(n,b,base); SAVE_8VSRSU(n+8,b,base)
> +#define SAVE_32VSRSU(n,b,base)	SAVE_16VSRSU(n,b,base); SAVE_16VSRSU(n+16,b,base)
> +#define REST_VSRU(n,b,base)	li b,THREAD_VR0+(16*(n)); LXVD2X(n+32,b,base)
> +#define REST_2VSRSU(n,b,base)	REST_VSRU(n,b,base); REST_VSRU(n+1,b,base)
> +#define REST_4VSRSU(n,b,base)	REST_2VSRSU(n,b,base); REST_2VSRSU(n+2,b,base)
> +#define REST_8VSRSU(n,b,base)	REST_4VSRSU(n,b,base); REST_4VSRSU(n+4,b,base)
> +#define REST_16VSRSU(n,b,base)	REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
> +#define REST_32VSRSU(n,b,base)	REST_16VSRSU(n,b,base); REST_16VSRSU(n+16,b,base)
> +
> +#ifdef CONFIG_VSX

I think we should do this in fpu.S so it's clear in the code when
reading it what's going on.

>
> +#define REST_32FPVSRS(n,c,base)						\
> +BEGIN_FTR_SECTION							\
> +	b	2f;							\
> +END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
> +	REST_32FPRS(n,base);						\
> +	b	3f;							\
> +2:	REST_32VSRS(n,c,base);						\
> +3:
> +
> +#define SAVE_32FPVSRS(n,c,base)						\
> +BEGIN_FTR_SECTION							\
> +	b	2f;							\
> +END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
> +	SAVE_32FPRS(n,base);						\
> +	b	3f;							\
> +2:	SAVE_32VSRS(n,c,base);						\
> +3:
> +
> +#else
> +#define REST_32FPVSRS(n,b,base)	REST_32FPRS(n, base)
> +#define SAVE_32FPVSRS(n,b,base)	SAVE_32FPRS(n, base)
> +#endif
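
The VSX_XX1 encoding quoted above can be checked on the host with a small
sketch.  The bit layout and the two base opcodes are copied from the
macros in the patch; the C function names are hypothetical and exist only
for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mirror of the VSX_XX1() macro: the low five bits of the VSR number go
 * into the usual register field, and bit 5 of the VSR number goes into
 * the least significant bit of the instruction word.
 */
static uint32_t vsx_xx1(uint32_t xs, uint32_t ra, uint32_t rb)
{
	assert(xs < 64 && ra < 32 && rb < 32);
	return ((xs & 0x1f) << 21) | (ra << 16) | (rb << 11) | (xs >> 5);
}

/* Base opcode taken from the STXVD2X macro in the patch. */
static uint32_t encode_stxvd2x(uint32_t xs, uint32_t ra, uint32_t rb)
{
	return 0x7c000798u | vsx_xx1(xs, ra, rb);
}

/* Base opcode taken from the LXVD2X macro in the patch. */
static uint32_t encode_lxvd2x(uint32_t xs, uint32_t ra, uint32_t rb)
{
	return 0x7c000698u | vsx_xx1(xs, ra, rb);
}
```

Registers 0 and 32 therefore differ only in the lowest bit of the word,
which is how SAVE_VSRU can reuse the same macro with n+32.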

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
  2008-06-24 10:57         ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
@ 2008-06-24 14:07           ` Kumar Gala
  2008-06-24 16:33             ` Segher Boessenkool
  2008-06-25  0:25             ` Michael Neuling
  0 siblings, 2 replies; 106+ messages in thread
From: Kumar Gala @ 2008-06-24 14:07 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras


On Jun 24, 2008, at 5:57 AM, Michael Neuling wrote:

> We are going to change where the floating point registers are stored
> in the thread_struct, so in preparation add some macros to access the
> floating point registers.  Update all code to use these new macros.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
>
> arch/powerpc/kernel/align.c      |    6 ++--
> arch/powerpc/kernel/process.c    |    5 ++-
> arch/powerpc/kernel/ptrace.c     |   14 +++++----
> arch/powerpc/kernel/ptrace32.c   |   14 +++++++--
> arch/powerpc/kernel/softemu8xx.c |    4 +-
> arch/powerpc/math-emu/math.c     |   56 +++++++++++++++++++--------------------
> include/asm-powerpc/ppc_asm.h    |    5 ++-
> include/asm-powerpc/processor.h  |    3 ++
> 8 files changed, 61 insertions(+), 46 deletions(-)
>

> Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
> ===================================================================
> --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
> +++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
> @@ -218,10 +218,10 @@ static int fpr_get(struct task_struct *t
> 	flush_fp_to_thread(target);
>
> 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
> -		     offsetof(struct thread_struct, fpr[32]));
> +		     offsetof(struct thread_struct, TS_FPR(32)));
>
> 	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> -				   &target->thread.fpr, 0, -1);
> +				   target->thread.fpr, 0, -1);

is there a reason we can drop the '&'? (I'm only looking at this as a
textual diff, not at what the code is trying to do).
>
> }
>
> static int fpr_set(struct task_struct *target, const struct user_regset *regset,
> @@ -231,10 +231,10 @@ static int fpr_set(struct task_struct *t
> 	flush_fp_to_thread(target);
>
> 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
> -		     offsetof(struct thread_struct, fpr[32]));
> +		     offsetof(struct thread_struct, TS_FPR(32)));
>
> 	return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
> -				  &target->thread.fpr, 0, -1);
> +				  target->thread.fpr, 0, -1);

ditto.
>
> }
>
>
> @@ -728,7 +728,8 @@ long arch_ptrace(struct task_struct *chi
> 			tmp = ptrace_get_reg(child, (int) index);
> 		} else {
> 			flush_fp_to_thread(child);
> -			tmp = ((unsigned long *)child->thread.fpr)[index - PT_FPR0];
> +			tmp = ((unsigned long *)child->thread.fpr)
> +				[TS_FPRSPACING * (index - PT_FPR0)];
> 		}
> 		ret = put_user(tmp,(unsigned long __user *) data);
> 		break;
> @@ -755,7 +756,8 @@ long arch_ptrace(struct task_struct *chi
> 			ret = ptrace_put_reg(child, index, data);
> 		} else {
> 			flush_fp_to_thread(child);
> -			((unsigned long *)child->thread.fpr)[index - PT_FPR0] = data;
> +			((unsigned long *)child->thread.fpr)
> +				[TS_FPRSPACING * (index - PT_FPR0)] = data;
> 			ret = 0;
> 		}
> 		break;
> Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
> ===================================================================
> --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace32.c
> +++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
> @@ -64,6 +64,11 @@ static long compat_ptrace_old(struct tas
> 	return -EPERM;
> }
>
> +/* Macros to work out the correct index for the FPR in the thread struct */
> +#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
> +#define FPRHALF(i) (((i) - PT_FPR0) % 2)
> +#define FPRINDEX(i) TS_FPRSPACING * FPRNUMBER(i) + FPRHALF(i)

we should either use these macros in both ptrace.c and ptrace32.c or
drop them

>
> +
> long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
> 			compat_ulong_t caddr, compat_ulong_t cdata)
> {
> @@ -122,7 +127,8 @@ long compat_arch_ptrace(struct task_stru
> 			 * to be an array of unsigned int (32 bits) - the
> 			 * index passed in is based on this assumption.
> 			 */
> -			tmp = ((unsigned int *)child->thread.fpr)[index - PT_FPR0];
> +			tmp = ((unsigned int *)child->thread.fpr)
> +				[FPRINDEX(index)];
> 		}
> 		ret = put_user((unsigned int)tmp, (u32 __user *)data);
> 		break;
> @@ -162,7 +168,8 @@ long compat_arch_ptrace(struct task_stru
> 		CHECK_FULL_REGS(child->thread.regs);
> 		if (numReg >= PT_FPR0) {
> 			flush_fp_to_thread(child);
> -			tmp = ((unsigned long int *)child->thread.fpr)[numReg - PT_FPR0];
> +			tmp = ((unsigned long int *)child->thread.fpr)
> +				[FPRINDEX(numReg)];
> 		} else { /* register within PT_REGS struct */
> 			tmp = ptrace_get_reg(child, numReg);
> 		}
> @@ -217,7 +224,8 @@ long compat_arch_ptrace(struct task_stru
> 			 * to be an array of unsigned int (32 bits) - the
> 			 * index passed in is based on this assumption.
> 			 */
> -			((unsigned int *)child->thread.fpr)[index - PT_FPR0] = data;
> +			((unsigned int *)child->thread.fpr)
> +				[TS_FPRSPACING * (index - PT_FPR0)] = data;

is there a reason this isn't FPRINDEX(index)?

- k

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 6/9] powerpc: Add VSX CPU feature
  2008-06-24 10:57         ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
@ 2008-06-24 14:19           ` Kumar Gala
  0 siblings, 0 replies; 106+ messages in thread
From: Kumar Gala @ 2008-06-24 14:19 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras


On Jun 24, 2008, at 5:57 AM, Michael Neuling wrote:

> Add a VSX CPU feature.  Also add code to detect if VSX is available
> from the device tree.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>

Acked-by: Kumar Gala <galak@kernel.crashing.org>

- k

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 9/9] powerpc: Add CONFIG_VSX config option
  2008-06-24 10:57         ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
@ 2008-06-24 14:19           ` Kumar Gala
  0 siblings, 0 replies; 106+ messages in thread
From: Kumar Gala @ 2008-06-24 14:19 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras


On Jun 24, 2008, at 5:57 AM, Michael Neuling wrote:

> Add CONFIG_VSX config build option.  Must compile with POWER4, FPU  
> and ALTIVEC.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>

Acked-by: Kumar Gala <galak@kernel.crashing.org>

- k

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
  2008-06-24 14:07           ` Kumar Gala
@ 2008-06-24 16:33             ` Segher Boessenkool
  2008-06-25  0:25             ` Michael Neuling
  1 sibling, 0 replies; 106+ messages in thread
From: Segher Boessenkool @ 2008-06-24 16:33 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Michael Neuling, Paul Mackerras

>> 	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
>> -				   &target->thread.fpr, 0, -1);
>> +				   target->thread.fpr, 0, -1);
>
> is there a reason we can drop the '&'?

Yes, .fpr is an array.  C is _such_ a fun language, heh.
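
A minimal sketch of why both spellings end up passing the same address,
assuming a hypothetical struct (thread_sketch) whose fpr member is a
two-dimensional array; the real thread_struct layout is not shown in this
thread.  The two expressions have different pointer types, which only
matters if the pointer is dereferenced or stepped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of thread_struct. */
struct thread_sketch {
	double fpr[32][2];
};

/* &t->fpr and t->fpr point at the same byte. */
static int same_address(const struct thread_sketch *t)
{
	assert(t != NULL);
	return (const void *)&t->fpr == (const void *)t->fpr;
}

/* &t->fpr points at the whole array, so the pointee is all of it. */
static size_t pointee_size_with_amp(const struct thread_sketch *t)
{
	return sizeof(*&t->fpr);
}

/* t->fpr decays to a pointer to the first row, so the pointee is one row. */
static size_t pointee_size_without_amp(const struct thread_sketch *t)
{
	return sizeof(*t->fpr);
}
```

Since user_regset_copyout() takes the buffer as a plain address, either
form would hand it the same value in this sketch.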


Segher

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 7/9] powerpc: Add VSX assembler code macros
  2008-06-24 14:06           ` Kumar Gala
@ 2008-06-25  0:06             ` Michael Neuling
  2008-06-25  2:19               ` Kumar Gala
  0 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-25  0:06 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Paul Mackerras

In message <B2FEFA8A-8814-44BE-81E5-8E2A873C2A1F@kernel.crashing.org> you wrote:
> 
> On Jun 24, 2008, at 5:57 AM, Michael Neuling wrote:
> 
> > This adds the macros for the VSX load/store instruction as most
> > binutils are not going to support this for a while.
> >
> > Also add VSX register save/restore macros and vsr[0-63] register  
> > definitions.
> >
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > ---
> >
> > include/asm-powerpc/ppc_asm.h |  127 ++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 127 insertions(+)
> >
> > Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
> > ===================================================================
> > --- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
> > +++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
> > @@ -74,6 +74,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
> > 				REST_10GPRS(22, base)
> > #endif
> >
> > +/*
> > + * Define what the VSX XX1 form instructions will look like, then add
> > + * the 128 bit load store instructions based on that.
> > + */
> > +#define VSX_XX1(xs, ra, rb)	(((xs) & 0x1f) << 21 | ((ra) << 16) |  \
> > +				 ((rb) << 11) | (((xs) >> 5)))
> > +
> > +#define STXVD2X(xs, ra, rb)	.long (0x7c000798 | VSX_XX1((xs), (ra), (rb)))
> > +#define LXVD2X(xs, ra, rb)	.long (0x7c000698 | VSX_XX1((xs), (ra), (rb)))
> >
> > #define SAVE_2GPRS(n, base)	SAVE_GPR(n, base); SAVE_GPR(n+1, base)
> > #define SAVE_4GPRS(n, base)	SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
> > @@ -110,6 +119,57 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
> > #define REST_16VRS(n,b,base)	REST_8VRS(n,b,base); REST_8VRS(n+8,b,base)
> > #define REST_32VRS(n,b,base)	REST_16VRS(n,b,base); REST_16VRS(n+16,b,base)
> >
> > +/* Save the lower 32 VSRs in the thread VSR region */
> > +#define SAVE_VSR(n,b,base)	li b,THREAD_VSR0+(16*(n));  STXVD2X(n,b,base)
> > +#define SAVE_2VSRS(n,b,base)	SAVE_VSR(n,b,base); SAVE_VSR(n+1,b,base)
> > +#define SAVE_4VSRS(n,b,base)	SAVE_2VSRS(n,b,base); SAVE_2VSRS(n+2,b,base)
> > +#define SAVE_8VSRS(n,b,base)	SAVE_4VSRS(n,b,base); SAVE_4VSRS(n+4,b,base)
> > +#define SAVE_16VSRS(n,b,base)	SAVE_8VSRS(n,b,base); SAVE_8VSRS(n+8,b,base)
> > +#define SAVE_32VSRS(n,b,base)	SAVE_16VSRS(n,b,base); SAVE_16VSRS(n+16,b,base)
> > +#define REST_VSR(n,b,base)	li b,THREAD_VSR0+(16*(n)); LXVD2X(n,b,base)
> > +#define REST_2VSRS(n,b,base)	REST_VSR(n,b,base); REST_VSR(n+1,b,base)
> > +#define REST_4VSRS(n,b,base)	REST_2VSRS(n,b,base); REST_2VSRS(n+2,b,base)
> > +#define REST_8VSRS(n,b,base)	REST_4VSRS(n,b,base); REST_4VSRS(n+4,b,base)
> > +#define REST_16VSRS(n,b,base)	REST_8VSRS(n,b,base); REST_8VSRS(n+8,b,base)
> > +#define REST_32VSRS(n,b,base)	REST_16VSRS(n,b,base); REST_16VSRS(n+16,b,base)
> > +/* Save the upper 32 VSRs (32-63) in the thread VSX region (0-31) */
> > +#define SAVE_VSRU(n,b,base)	li b,THREAD_VR0+(16*(n));  STXVD2X(n+32,b,base)
> > +#define SAVE_2VSRSU(n,b,base)	SAVE_VSRU(n,b,base); SAVE_VSRU(n+1,b,base)
> > +#define SAVE_4VSRSU(n,b,base)	SAVE_2VSRSU(n,b,base); SAVE_2VSRSU(n+2,b,base)
> > +#define SAVE_8VSRSU(n,b,base)	SAVE_4VSRSU(n,b,base); SAVE_4VSRSU(n+4,b,base)
> > +#define SAVE_16VSRSU(n,b,base)	SAVE_8VSRSU(n,b,base); SAVE_8VSRSU(n+8,b,base)
> > +#define SAVE_32VSRSU(n,b,base)	SAVE_16VSRSU(n,b,base); SAVE_16VSRSU(n+16,b,base)
> > +#define REST_VSRU(n,b,base)	li b,THREAD_VR0+(16*(n)); LXVD2X(n+32,b,base)
> > +#define REST_2VSRSU(n,b,base)	REST_VSRU(n,b,base); REST_VSRU(n+1,b,base)
> > +#define REST_4VSRSU(n,b,base)	REST_2VSRSU(n,b,base); REST_2VSRSU(n+2,b,base)
> > +#define REST_8VSRSU(n,b,base)	REST_4VSRSU(n,b,base); REST_4VSRSU(n+4,b,base)
> > +#define REST_16VSRSU(n,b,base)	REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
> > +#define REST_32VSRSU(n,b,base)	REST_16VSRSU(n,b,base); REST_16VSRSU(n+16,b,base)
> > +
> > +#ifdef CONFIG_VSX
> 
> I think we should do this in fpu.S so it's clear in the code when
> reading it what's going on.

Do you mean the section above or below this comment? 

> 
> >
> > +#define REST_32FPVSRS(n,c,base)						\
> > +BEGIN_FTR_SECTION							\
> > +	b	2f;							\
> > +END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
> > +	REST_32FPRS(n,base);						\
> > +	b	3f;							\
> > +2:	REST_32VSRS(n,c,base);						\
> > +3:
> > +
> > +#define SAVE_32FPVSRS(n,c,base)						\
> > +BEGIN_FTR_SECTION							\
> > +	b	2f;							\
> > +END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
> > +	SAVE_32FPRS(n,base);						\
> > +	b	3f;							\
> > +2:	SAVE_32VSRS(n,c,base);						\
> > +3:
> > +
> > +#else
> > +#define REST_32FPVSRS(n,b,base)	REST_32FPRS(n, base)
> > +#define SAVE_32FPVSRS(n,b,base)	SAVE_32FPRS(n, base)
> > +#endif
> 
> 

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
  2008-06-24 14:07           ` Kumar Gala
  2008-06-24 16:33             ` Segher Boessenkool
@ 2008-06-25  0:25             ` Michael Neuling
  1 sibling, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-25  0:25 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Paul Mackerras



In message <0DCAAAC2-52AB-4704-98C0-4E9235C3AC88@kernel.crashing.org> you wrote:
> 
> On Jun 24, 2008, at 5:57 AM, Michael Neuling wrote:
> 
> > We are going to change where the floating point registers are stored
> > in the thread_struct, so in preparation add some macros to access the
> > floating point registers.  Update all code to use these new macros.
> >
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > ---
> >
> > arch/powerpc/kernel/align.c      |    6 ++--
> > arch/powerpc/kernel/process.c    |    5 ++-
> > arch/powerpc/kernel/ptrace.c     |   14 +++++----
> > arch/powerpc/kernel/ptrace32.c   |   14 +++++++--
> > arch/powerpc/kernel/softemu8xx.c |    4 +-
> > arch/powerpc/math-emu/math.c     |   56 +++++++++++++++++++--------------------
> > include/asm-powerpc/ppc_asm.h    |    5 ++-
> > include/asm-powerpc/processor.h  |    3 ++
> > 8 files changed, 61 insertions(+), 46 deletions(-)
> >
> 
> > Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
> > ===================================================================
> > --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
> > +++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
> > @@ -218,10 +218,10 @@ static int fpr_get(struct task_struct *t
> > 	flush_fp_to_thread(target);
> >
> > 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
> > -		     offsetof(struct thread_struct, fpr[32]));
> > +		     offsetof(struct thread_struct, TS_FPR(32)));
> >
> > 	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> > -				   &target->thread.fpr, 0, -1);
> > +				   target->thread.fpr, 0, -1);
> 
> is there a reason we can drop the '&'? (I'm only looking at this as a
> textual diff, not at what the code is trying to do).

Oops.. I'll fix.

> >
> > }
> >
> > static int fpr_set(struct task_struct *target, const struct user_regset *regset,
> > @@ -231,10 +231,10 @@ static int fpr_set(struct task_struct *t
> > 	flush_fp_to_thread(target);
> >
> > 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
> > -		     offsetof(struct thread_struct, fpr[32]));
> > +		     offsetof(struct thread_struct, TS_FPR(32)));
> >
> > 	return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
> > -				  &target->thread.fpr, 0, -1);
> > +				  target->thread.fpr, 0, -1);
> 
> ditto.
> >
> > }
> >
> >
> > @@ -728,7 +728,8 @@ long arch_ptrace(struct task_struct *chi
> > 			tmp = ptrace_get_reg(child, (int) index);
> > 		} else {
> > 			flush_fp_to_thread(child);
> > -			tmp = ((unsigned long *)child->thread.fpr)[index - PT_FPR0];
> > +			tmp = ((unsigned long *)child->thread.fpr)
> > +				[TS_FPRSPACING * (index - PT_FPR0)];
> > 		}
> > 		ret = put_user(tmp,(unsigned long __user *) data);
> > 		break;
> > @@ -755,7 +756,8 @@ long arch_ptrace(struct task_struct *chi
> > 			ret = ptrace_put_reg(child, index, data);
> > 		} else {
> > 			flush_fp_to_thread(child);
> > -			((unsigned long *)child->thread.fpr)[index - PT_FPR0] = data;
> > +			((unsigned long *)child->thread.fpr)
> > +				[TS_FPRSPACING * (index - PT_FPR0)] = data;
> > 			ret = 0;
> > 		}
> > 		break;
> > Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
> > ===================================================================
> > --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace32.c
> > +++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
> > @@ -64,6 +64,11 @@ static long compat_ptrace_old(struct tas
> > 	return -EPERM;
> > }
> >
> > +/* Macros to work out the correct index for the FPR in the thread struct */
> > +#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
> > +#define FPRHALF(i) (((i) - PT_FPR0) % 2)
> > +#define FPRINDEX(i) TS_FPRSPACING * FPRNUMBER(i) + FPRHALF(i)
> 
> we should either use these macros in both ptrace.c and ptrace32.c or
> drop them

This set of macros is specific to 32-bit ptrace.  In ptrace32 we access
the registers 32 bits at a time (hence needing two accesses to get the
full 64 bits), but in 64-bit ptrace we access them as 64 bits (hence
only one access).

These macros are only here to deal with the unique indexing into the
thread struct that we now need to do for ptrace32 (thanks to paulus,
who pointed out I got this wrong the first time).

The only macro here that could potentially be reused is FPRNUMBER(i).
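
The split into register number and half can be sketched as follows.  This
is an illustration only: the function names are hypothetical, PT_FPR0 is
assumed to be 48, and the slot width is passed in explicitly in units of
32-bit words rather than taken from TS_FPRSPACING, whose definition is not
shown in this thread.

```c
#include <assert.h>

#define SK_PT_FPR0 48u	/* assumed value of PT_FPR0 */

/* Which 64-bit FPR a 32-bit ptrace index refers to. */
static unsigned int fpr_number(unsigned int i)
{
	assert(i >= SK_PT_FPR0);
	return (i - SK_PT_FPR0) >> 1;
}

/* Which 32-bit half of that FPR (0 or 1) the index refers to. */
static unsigned int fpr_half(unsigned int i)
{
	assert(i >= SK_PT_FPR0);
	return (i - SK_PT_FPR0) % 2;
}

/*
 * Index into the save area viewed as an array of 32-bit words, given how
 * many words each register slot occupies.  With 2 words per slot this is
 * the old "index - PT_FPR0"; a wider slot spreads the registers out.
 */
static unsigned int fpr_word_index(unsigned int i, unsigned int words_per_slot)
{
	return words_per_slot * fpr_number(i) + fpr_half(i);
}
```

For example, ptrace index 51 is the second half of FPR 1; it maps to word 3
when slots are two words wide and to word 5 when they are four words wide.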

> 
> >
> > +
> > long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
> > 			compat_ulong_t caddr, compat_ulong_t cdata)
> > {
> > @@ -122,7 +127,8 @@ long compat_arch_ptrace(struct task_stru
> > 			 * to be an array of unsigned int (32 bits) - the
> > 			 * index passed in is based on this assumption.
> > 			 */
> > -			tmp = ((unsigned int *)child->thread.fpr)[index - PT_FPR0];
> > +			tmp = ((unsigned int *)child->thread.fpr)
> > +				[FPRINDEX(index)];
> > 		}
> > 		ret = put_user((unsigned int)tmp, (u32 __user *)data);
> > 		break;
> > @@ -162,7 +168,8 @@ long compat_arch_ptrace(struct task_stru
> > 		CHECK_FULL_REGS(child->thread.regs);
> > 		if (numReg >= PT_FPR0) {
> > 			flush_fp_to_thread(child);
> > -			tmp = ((unsigned long int *)child->thread.fpr)[numReg - PT_FPR0];
> > +			tmp = ((unsigned long int *)child->thread.fpr)
> > +				[FPRINDEX(numReg)];
> > 		} else { /* register within PT_REGS struct */
> > 			tmp = ptrace_get_reg(child, numReg);
> > 		}
> > @@ -217,7 +224,8 @@ long compat_arch_ptrace(struct task_stru
> > 			 * to be an array of unsigned int (32 bits) - the
> > 			 * index passed in is based on this assumption.
> > 			 */
> > -			((unsigned int *)child->thread.fpr)[index - PT_FPR0] = data;
> > +			((unsigned int *)child->thread.fpr)
> > +				[TS_FPRSPACING * (index - PT_FPR0)] = data;
> 
> is there a reason this isn't FPRINDEX(index)?

Oops, fixed.  

Can you tell I only tested peek not poke user :-D 

> 
> - k
> 


* Re: [PATCH 7/9] powerpc: Add VSX assembler code macros
  2008-06-25  0:06             ` Michael Neuling
@ 2008-06-25  2:19               ` Kumar Gala
  0 siblings, 0 replies; 106+ messages in thread
From: Kumar Gala @ 2008-06-25  2:19 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras


On Jun 24, 2008, at 7:06 PM, Michael Neuling wrote:

> In message <B2FEFA8A-8814-44BE-81E5-8E2A873C2A1F@kernel.crashing.org> you wrote:
>>
>> On Jun 24, 2008, at 5:57 AM, Michael Neuling wrote:
>>
>>> This adds the macros for the VSX load/store instruction as most
>>> binutils are not going to support this for a while.
>>>
>>> Also add VSX register save/restore macros and vsr[0-63] register
>>> definitions.
>>>
>>> Signed-off-by: Michael Neuling <mikey@neuling.org>
>>> ---
>>>
>>> include/asm-powerpc/ppc_asm.h |  127 ++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 127 insertions(+)
>>>
>>> Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
>>> ===================================================================
>>> --- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
>>> +++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
>>> @@ -74,6 +74,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
>>> 				REST_10GPRS(22, base)
>>> #endif
>>>
>>> +/*
>>> + * Define what the VSX XX1 form instructions will look like, then add
>>> + * the 128 bit load store instructions based on that.
>>> + */
>>> +#define VSX_XX1(xs, ra, rb)	(((xs) & 0x1f) << 21 | ((ra) << 16) | \
>>> +				 ((rb) << 11) | (((xs) >> 5)))
>>> +
>>> +#define STXVD2X(xs, ra, rb)	.long (0x7c000798 | VSX_XX1((xs), (ra), (rb)))
>>> +#define LXVD2X(xs, ra, rb)	.long (0x7c000698 | VSX_XX1((xs), (ra), (rb)))
>>>
>>> #define SAVE_2GPRS(n, base)	SAVE_GPR(n, base); SAVE_GPR(n+1, base)
>>> #define SAVE_4GPRS(n, base)	SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
>>> @@ -110,6 +119,57 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
>>> #define REST_16VRS(n,b,base)	REST_8VRS(n,b,base); REST_8VRS(n+8,b,base)
>>> #define REST_32VRS(n,b,base)	REST_16VRS(n,b,base); REST_16VRS(n+16,b,base)
>>>
>>> +/* Save the lower 32 VSRs in the thread VSR region */
>>> +#define SAVE_VSR(n,b,base)	li b,THREAD_VSR0+(16*(n)); STXVD2X(n,b,base)
>>> +#define SAVE_2VSRS(n,b,base)	SAVE_VSR(n,b,base); SAVE_VSR(n+1,b,base)
>>> +#define SAVE_4VSRS(n,b,base)	SAVE_2VSRS(n,b,base); SAVE_2VSRS(n+2,b,base)
>>> +#define SAVE_8VSRS(n,b,base)	SAVE_4VSRS(n,b,base); SAVE_4VSRS(n+4,b,base)
>>> +#define SAVE_16VSRS(n,b,base)	SAVE_8VSRS(n,b,base); SAVE_8VSRS(n+8,b,base)
>>> +#define SAVE_32VSRS(n,b,base)	SAVE_16VSRS(n,b,base); SAVE_16VSRS(n+16,b,base)
>>> +#define REST_VSR(n,b,base)	li b,THREAD_VSR0+(16*(n)); LXVD2X(n,b,base)
>>> +#define REST_2VSRS(n,b,base)	REST_VSR(n,b,base); REST_VSR(n+1,b,base)
>>> +#define REST_4VSRS(n,b,base)	REST_2VSRS(n,b,base); REST_2VSRS(n+2,b,base)
>>> +#define REST_8VSRS(n,b,base)	REST_4VSRS(n,b,base); REST_4VSRS(n+4,b,base)
>>> +#define REST_16VSRS(n,b,base)	REST_8VSRS(n,b,base); REST_8VSRS(n+8,b,base)
>>> +#define REST_32VSRS(n,b,base)	REST_16VSRS(n,b,base); REST_16VSRS(n+16,b,base)
>>> +/* Save the upper 32 VSRs (32-63) in the thread VSX region (0-31) */
>>> +#define SAVE_VSRU(n,b,base)	li b,THREAD_VR0+(16*(n));  STXVD2X(n+32,b,base)
>>> +#define SAVE_2VSRSU(n,b,base)	SAVE_VSRU(n,b,base); SAVE_VSRU(n+1,b,base)
>>> +#define SAVE_4VSRSU(n,b,base)	SAVE_2VSRSU(n,b,base); SAVE_2VSRSU(n+2,b,base)
>>> +#define SAVE_8VSRSU(n,b,base)	SAVE_4VSRSU(n,b,base); SAVE_4VSRSU(n+4,b,base)
>>> +#define SAVE_16VSRSU(n,b,base)	SAVE_8VSRSU(n,b,base); SAVE_8VSRSU(n+8,b,base)
>>> +#define SAVE_32VSRSU(n,b,base)	SAVE_16VSRSU(n,b,base); SAVE_16VSRSU(n+16,b,base)
>>> +#define REST_VSRU(n,b,base)	li b,THREAD_VR0+(16*(n)); LXVD2X(n+32,b,base)
>>> +#define REST_2VSRSU(n,b,base)	REST_VSRU(n,b,base); REST_VSRU(n+1,b,base)
>>> +#define REST_4VSRSU(n,b,base)	REST_2VSRSU(n,b,base); REST_2VSRSU(n+2,b,base)
>>> +#define REST_8VSRSU(n,b,base)	REST_4VSRSU(n,b,base); REST_4VSRSU(n+4,b,base)
>>> +#define REST_16VSRSU(n,b,base)	REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
>>> +#define REST_32VSRSU(n,b,base)	REST_16VSRSU(n,b,base); REST_16VSRSU(n+16,b,base)
>>> +
>>> +#ifdef CONFIG_VSX
>>
>> I think we should do this in fpu.S so its clearly in the code when
>> reading it what's going on.
>
> Do you mean the section above or below this comment?

Sorry, the code below.  (That does REST_32FPVSRS)..

>
>
>>
>>>
>>> +#define REST_32FPVSRS(n,c,base)						\
>>> +BEGIN_FTR_SECTION							\
>>> +	b	2f;							\
>>> +END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
>>> +	REST_32FPRS(n,base);						\
>>> +	b	3f;							\
>>> +2:	REST_32VSRS(n,c,base);						\
>>> +3:
>>> +
>>> +#define SAVE_32FPVSRS(n,c,base)						\
>>> +BEGIN_FTR_SECTION							\
>>> +	b	2f;							\
>>> +END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
>>> +	SAVE_32FPRS(n,base);						\
>>> +	b	3f;							\
>>> +2:	SAVE_32VSRS(n,c,base);						\
>>> +3:
>>> +
>>> +#else
>>> +#define REST_32FPVSRS(n,b,base)	REST_32FPRS(n, base)
>>> +#define SAVE_32FPVSRS(n,b,base)	SAVE_32FPRS(n, base)
>>> +#endif
>>
>>


* [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX.
  2008-06-24 10:57       ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                           ` (8 preceding siblings ...)
  2008-06-24 10:57         ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
@ 2008-06-25  4:07         ` Michael Neuling
  2008-06-25  4:07           ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
                             ` (8 more replies)
  9 siblings, 9 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-25  4:07 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

The following set of patches adds Vector Scalar Extensions (VSX)
support for POWER7.  Includes context switch, ptrace and signals support.

Signed-off-by: Michael Neuling <mikey@neuling.org>
--- 
Paulus: please consider for your 2.6.27 tree.

Updates in this post:
- White space change in start_thread, thanks to Paulus
- thread_struct change/cleanup as suggested by Paulus. This
  also resulted in changing TS_FPRSPACING to TS_FPRWIDTH
- pointer to array fix, thanks to Kumar
- indexing macro fix in ptrace32, thanks to Kumar
- moved SAVE/REST_32FPVSRS to where they are used in fpu.S, as suggested by Kumar

This time for sure!


* [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
  2008-06-25  4:07         ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
  2008-06-25  4:07           ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
@ 2008-06-25  4:07           ` Michael Neuling
  2008-06-25 14:08             ` Kumar Gala
  2008-06-25  4:07           ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
                             ` (6 subsequent siblings)
  8 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-25  4:07 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

We are going to change where the floating point registers are stored
in the thread_struct, so in preparation add some macros to access the
floating point registers.  Update all code to use these new macros.
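
As a rough illustration of the accessor (a sketch only; thread_sketch and
the helper functions are made-up names, and the struct is a cut-down
stand-in for thread_struct), TS_FPR(i) expands to the member expression
itself, so it can be used after "." or "->" for reads, writes and
address-of alike:

```c
/* As in this patch: the macro expands to the member access itself. */
#define TS_FPR(i) fpr[i]

/* Cut-down, hypothetical stand-in for thread_struct. */
struct thread_sketch {
	double fpr[32];
};

/* Write one FPR slot; expands to t->fpr[reg] = val. */
static void set_fpr(struct thread_sketch *t, unsigned int reg, double val)
{
	t->TS_FPR(reg) = val;
}

/* Read one FPR slot; expands to t->fpr[reg]. */
static double get_fpr(const struct thread_sketch *t, unsigned int reg)
{
	return t->TS_FPR(reg);
}

/* Address of one FPR slot, as the alignment and emulation code takes it. */
static void *fpr_addr(struct thread_sketch *t, unsigned int reg)
{
	return &t->TS_FPR(reg);
}
```

Callers written this way don't need to change if the macro body later
changes to match a different layout of fpr.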

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/align.c      |    6 ++--
 arch/powerpc/kernel/process.c    |    2 -
 arch/powerpc/kernel/ptrace.c     |   10 ++++--
 arch/powerpc/kernel/ptrace32.c   |   14 +++++++--
 arch/powerpc/kernel/softemu8xx.c |    4 +-
 arch/powerpc/math-emu/math.c     |   56 +++++++++++++++++++--------------------
 include/asm-powerpc/ppc_asm.h    |    5 ++-
 include/asm-powerpc/processor.h  |    4 ++
 8 files changed, 58 insertions(+), 43 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/align.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/align.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/align.c
@@ -366,7 +366,7 @@ static int emulate_multiple(struct pt_re
 static int emulate_fp_pair(struct pt_regs *regs, unsigned char __user *addr,
 			   unsigned int reg, unsigned int flags)
 {
-	char *ptr = (char *) &current->thread.fpr[reg];
+	char *ptr = (char *) &current->thread.TS_FPR(reg);
 	int i, ret;
 
 	if (!(flags & F))
@@ -784,7 +784,7 @@ int fix_alignment(struct pt_regs *regs)
 				return -EFAULT;
 		}
 	} else if (flags & F) {
-		data.dd = current->thread.fpr[reg];
+		data.dd = current->thread.TS_FPR(reg);
 		if (flags & S) {
 			/* Single-precision FP store requires conversion... */
 #ifdef CONFIG_PPC_FPU
@@ -862,7 +862,7 @@ int fix_alignment(struct pt_regs *regs)
 		if (unlikely(ret))
 			return -EFAULT;
 	} else if (flags & F)
-		current->thread.fpr[reg] = data.dd;
+		current->thread.TS_FPR(reg) = data.dd;
 	else
 		regs->gpr[reg] = data.ll;
 
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -110,7 +110,7 @@ int dump_task_fpu(struct task_struct *ts
 		return 0;
 	flush_fp_to_thread(current);
 
-	memcpy(fpregs, &tsk->thread.fpr[0], sizeof(*fpregs));
+	memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
 
 	return 1;
 }
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -218,7 +218,7 @@ static int fpr_get(struct task_struct *t
 	flush_fp_to_thread(target);
 
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
-		     offsetof(struct thread_struct, fpr[32]));
+		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
 				   &target->thread.fpr, 0, -1);
@@ -231,7 +231,7 @@ static int fpr_set(struct task_struct *t
 	flush_fp_to_thread(target);
 
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
-		     offsetof(struct thread_struct, fpr[32]));
+		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
 				  &target->thread.fpr, 0, -1);
@@ -728,7 +728,8 @@ long arch_ptrace(struct task_struct *chi
 			tmp = ptrace_get_reg(child, (int) index);
 		} else {
 			flush_fp_to_thread(child);
-			tmp = ((unsigned long *)child->thread.fpr)[index - PT_FPR0];
+			tmp = ((unsigned long *)child->thread.fpr)
+				[TS_FPRWIDTH * (index - PT_FPR0)];
 		}
 		ret = put_user(tmp,(unsigned long __user *) data);
 		break;
@@ -755,7 +756,8 @@ long arch_ptrace(struct task_struct *chi
 			ret = ptrace_put_reg(child, index, data);
 		} else {
 			flush_fp_to_thread(child);
-			((unsigned long *)child->thread.fpr)[index - PT_FPR0] = data;
+			((unsigned long *)child->thread.fpr)
+				[TS_FPRWIDTH * (index - PT_FPR0)] = data;
 			ret = 0;
 		}
 		break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
@@ -64,6 +64,11 @@ static long compat_ptrace_old(struct tas
 	return -EPERM;
 }
 
+/* Macros to work out the correct index for the FPR in the thread struct */
+#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
+#define FPRHALF(i) (((i) - PT_FPR0) % 2)
+#define FPRINDEX(i) TS_FPRWIDTH * FPRNUMBER(i) + FPRHALF(i)
+
 long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
 			compat_ulong_t caddr, compat_ulong_t cdata)
 {
@@ -122,7 +127,8 @@ long compat_arch_ptrace(struct task_stru
 			 * to be an array of unsigned int (32 bits) - the
 			 * index passed in is based on this assumption.
 			 */
-			tmp = ((unsigned int *)child->thread.fpr)[index - PT_FPR0];
+			tmp = ((unsigned int *)child->thread.fpr)
+				[FPRINDEX(index)];
 		}
 		ret = put_user((unsigned int)tmp, (u32 __user *)data);
 		break;
@@ -162,7 +168,8 @@ long compat_arch_ptrace(struct task_stru
 		CHECK_FULL_REGS(child->thread.regs);
 		if (numReg >= PT_FPR0) {
 			flush_fp_to_thread(child);
-			tmp = ((unsigned long int *)child->thread.fpr)[numReg - PT_FPR0];
+			tmp = ((unsigned long int *)child->thread.fpr)
+				[FPRINDEX(numReg)];
 		} else { /* register within PT_REGS struct */
 			tmp = ptrace_get_reg(child, numReg);
 		} 
@@ -217,7 +224,8 @@ long compat_arch_ptrace(struct task_stru
 			 * to be an array of unsigned int (32 bits) - the
 			 * index passed in is based on this assumption.
 			 */
-			((unsigned int *)child->thread.fpr)[index - PT_FPR0] = data;
+			((unsigned int *)child->thread.fpr)
+				[FPRINDEX(index)] = data;
 			ret = 0;
 		}
 		break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/softemu8xx.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
@@ -124,7 +124,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
 	disp = instword & 0xffff;
 
 	ea = (u32 *)(regs->gpr[idxreg] + disp);
-	ip = (u32 *)&current->thread.fpr[flreg];
+	ip = (u32 *)&current->thread.TS_FPR(flreg);
 
 	switch ( inst )
 	{
@@ -168,7 +168,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
 		break;
 	case FMR:
 		/* assume this is a fp move -- Cort */
-		memcpy(ip, &current->thread.fpr[(instword>>11)&0x1f],
+		memcpy(ip, &current->thread.TS_FPR((instword>>11)&0x1f),
 		       sizeof(double));
 		break;
 	default:
Index: linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/math-emu/math.c
+++ linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
@@ -230,14 +230,14 @@ do_mathemu(struct pt_regs *regs)
 	case LFD:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		lfd(op0, op1, op2, op3);
 		break;
 	case LFDU:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		lfd(op0, op1, op2, op3);
 		regs->gpr[idx] = (unsigned long)op1;
@@ -245,21 +245,21 @@ do_mathemu(struct pt_regs *regs)
 	case STFD:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		stfd(op0, op1, op2, op3);
 		break;
 	case STFDU:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		stfd(op0, op1, op2, op3);
 		regs->gpr[idx] = (unsigned long)op1;
 		break;
 	case OP63:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		fmr(op0, op1, op2, op3);
 		break;
 	default:
@@ -356,28 +356,28 @@ do_mathemu(struct pt_regs *regs)
 
 	switch (type) {
 	case AB:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	case AC:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op2 = (void *)&current->thread.fpr[(insn >>  6) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op2 = (void *)&current->thread.TS_FPR((insn >>  6) & 0x1f);
 		break;
 
 	case ABC:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
-		op3 = (void *)&current->thread.fpr[(insn >>  6) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
+		op3 = (void *)&current->thread.TS_FPR((insn >>  6) & 0x1f);
 		break;
 
 	case D:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		break;
 
@@ -387,27 +387,27 @@ do_mathemu(struct pt_regs *regs)
 			goto illegal;
 
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)(regs->gpr[idx] + sdisp);
 		break;
 
 	case X:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		break;
 
 	case XA:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
 		break;
 
 	case XB:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	case XE:
 		idx = (insn >> 16) & 0x1f;
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		if (!idx) {
 			if (((insn >> 1) & 0x3ff) == STFIWX)
 				op1 = (void *)(regs->gpr[(insn >> 11) & 0x1f]);
@@ -421,7 +421,7 @@ do_mathemu(struct pt_regs *regs)
 
 	case XEU:
 		idx = (insn >> 16) & 0x1f;
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0)
 				+ regs->gpr[(insn >> 11) & 0x1f]);
 		break;
@@ -429,8 +429,8 @@ do_mathemu(struct pt_regs *regs)
 	case XCR:
 		op0 = (void *)&regs->ccr;
 		op1 = (void *)((insn >> 23) & 0x7);
-		op2 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op3 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op2 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op3 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	case XCRL:
@@ -450,7 +450,7 @@ do_mathemu(struct pt_regs *regs)
 
 	case XFLB:
 		op0 = (void *)((insn >> 17) & 0xff);
-		op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	default:
Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -6,6 +6,7 @@
 
 #include <linux/stringify.h>
 #include <asm/asm-compat.h>
+#include <asm/processor.h>
 
 #ifndef __ASSEMBLY__
 #error __FILE__ should only be used in assembler files
@@ -83,13 +84,13 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);				
 #define REST_8GPRS(n, base)	REST_4GPRS(n, base); REST_4GPRS(n+4, base)
 #define REST_10GPRS(n, base)	REST_8GPRS(n, base); REST_2GPRS(n+8, base)
 
-#define SAVE_FPR(n, base)	stfd	n,THREAD_FPR0+8*(n)(base)
+#define SAVE_FPR(n, base)	stfd	n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
 #define SAVE_2FPRS(n, base)	SAVE_FPR(n, base); SAVE_FPR(n+1, base)
 #define SAVE_4FPRS(n, base)	SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
 #define SAVE_8FPRS(n, base)	SAVE_4FPRS(n, base); SAVE_4FPRS(n+4, base)
 #define SAVE_16FPRS(n, base)	SAVE_8FPRS(n, base); SAVE_8FPRS(n+8, base)
 #define SAVE_32FPRS(n, base)	SAVE_16FPRS(n, base); SAVE_16FPRS(n+16, base)
-#define REST_FPR(n, base)	lfd	n,THREAD_FPR0+8*(n)(base)
+#define REST_FPR(n, base)	lfd	n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
 #define REST_2FPRS(n, base)	REST_FPR(n, base); REST_FPR(n+1, base)
 #define REST_4FPRS(n, base)	REST_2FPRS(n, base); REST_2FPRS(n+2, base)
 #define REST_8FPRS(n, base)	REST_4FPRS(n, base); REST_4FPRS(n+4, base)
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -12,6 +12,8 @@
 
 #include <asm/reg.h>
 
+#define TS_FPRWIDTH 1
+
 #ifndef __ASSEMBLY__
 #include <linux/compiler.h>
 #include <asm/ptrace.h>
@@ -136,6 +138,8 @@ typedef struct {
 	unsigned long seg;
 } mm_segment_t;
 
+#define TS_FPR(i) fpr[i]
+
 struct thread_struct {
 	unsigned long	ksp;		/* Kernel stack pointer */
 	unsigned long	ksp_limit;	/* if ksp <= ksp_limit stack overflow */


* [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
  2008-06-25  4:07         ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
@ 2008-06-25  4:07           ` Michael Neuling
  2008-06-25  4:07           ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
                             ` (7 subsequent siblings)
  8 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-25  4:07 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

If we set the SPE MSR bit in save_user_regs we can blow away the VEC
bit.  This doesn't matter in reality, as they are in fact the same bit,
but it looks bad.

Also, when we add VSX in a later patch, we need to be able to set two
separate MSR bits here.
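
The difference between the two patterns can be sketched in plain C.  This
is an illustration only: FLAG_A and FLAG_B are placeholder bits standing in
for two distinct MSR bits, and the function names are made up.

```c
/* Placeholder bits standing in for two distinct MSR feature bits. */
#define FLAG_A (1UL << 25)
#define FLAG_B (1UL << 23)

/*
 * Old pattern: each feature block stores (regs_msr | its own bit), so a
 * later store replaces whatever an earlier block stored.
 */
static unsigned long saved_msr_overwrite(unsigned long regs_msr,
					 int has_a, int has_b)
{
	unsigned long saved = regs_msr;

	if (has_a)
		saved = regs_msr | FLAG_A;
	if (has_b)
		saved = regs_msr | FLAG_B;	/* FLAG_A is lost here */
	return saved;
}

/*
 * New pattern: accumulate the bits in a local copy and store it once
 * after all feature blocks have run.
 */
static unsigned long saved_msr_accumulate(unsigned long regs_msr,
					  int has_a, int has_b)
{
	unsigned long msr = regs_msr;

	if (has_a)
		msr |= FLAG_A;
	if (has_b)
		msr |= FLAG_B;
	return msr;
}
```

With both features active, the first version ends up with only FLAG_B set,
while the second keeps both bits.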

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/signal_32.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -336,6 +336,8 @@ struct rt_sigframe {
 static int save_user_regs(struct pt_regs *regs, struct mcontext __user *frame,
 		int sigret)
 {
+	unsigned long msr = regs->msr;
+
 	/* Make sure floating point registers are stored in regs */
 	flush_fp_to_thread(current);
 
@@ -354,8 +356,7 @@ static int save_user_regs(struct pt_regs
 			return 1;
 		/* set MSR_VEC in the saved MSR value to indicate that
 		   frame->mc_vregs contains valid data */
-		if (__put_user(regs->msr | MSR_VEC, &frame->mc_gregs[PT_MSR]))
-			return 1;
+		msr |= MSR_VEC;
 	}
 	/* else assert((regs->msr & MSR_VEC) == 0) */
 
@@ -377,8 +378,7 @@ static int save_user_regs(struct pt_regs
 			return 1;
 		/* set MSR_SPE in the saved MSR value to indicate that
 		   frame->mc_vregs contains valid data */
-		if (__put_user(regs->msr | MSR_SPE, &frame->mc_gregs[PT_MSR]))
-			return 1;
+		msr |= MSR_SPE;
 	}
 	/* else assert((regs->msr & MSR_SPE) == 0) */
 
@@ -387,6 +387,8 @@ static int save_user_regs(struct pt_regs
 		return 1;
 #endif /* CONFIG_SPE */
 
+	if (__put_user(msr, &frame->mc_gregs[PT_MSR]))
+		return 1;
 	if (sigret) {
 		/* Set up the sigreturn trampoline: li r0,sigret; sc */
 		if (__put_user(0x38000000UL + sigret, &frame->tramp[0])


* [PATCH 3/9] powerpc: Move altivec_unavailable
  2008-06-25  4:07         ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                             ` (3 preceding siblings ...)
  2008-06-25  4:07           ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
@ 2008-06-25  4:07           ` Michael Neuling
  2008-06-25  4:07           ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
                             ` (3 subsequent siblings)
  8 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-25  4:07 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Move the altivec_unavailable code, to make room at 0xf40 where the
vsx_unavailable exception will be.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/head_64.S |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -275,7 +275,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	. = 0xf00
 	b	performance_monitor_pSeries
 
-	STD_EXCEPTION_PSERIES(0xf20, altivec_unavailable)
+	. = 0xf20
+	b	altivec_unavailable_pSeries
 
 #ifdef CONFIG_CBE_RAS
 	HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
@@ -295,6 +296,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 
 	/* moved from 0xf00 */
 	STD_EXCEPTION_PSERIES(., performance_monitor)
+	STD_EXCEPTION_PSERIES(., altivec_unavailable)
 
 /*
  * An interrupt came in while soft-disabled; clear EE in SRR1,


* [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable
  2008-06-25  4:07         ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                             ` (5 preceding siblings ...)
  2008-06-25  4:07           ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
@ 2008-06-25  4:07           ` Michael Neuling
  2008-06-25  4:07           ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
  2008-06-25  4:07           ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
  8 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-25  4:07 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Make load_up_fpu and load_up_altivec callable so they can be reused by
the VSX code.  

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/fpu.S        |    2 +-
 arch/powerpc/kernel/head_32.S    |    6 ++++--
 arch/powerpc/kernel/head_64.S    |   10 +++++++---
 arch/powerpc/kernel/head_booke.h |    6 ++++--
 4 files changed, 16 insertions(+), 8 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -85,7 +85,7 @@ _GLOBAL(load_up_fpu)
 #endif /* CONFIG_SMP */
 	/* restore registers and return */
 	/* we haven't used ctr or xer or lr */
-	b	fast_exception_return
+	blr
 
 /*
  * giveup_fpu(tsk)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_32.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
@@ -421,8 +421,10 @@ BEGIN_FTR_SECTION
 	b 	ProgramCheck
 END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE)
 	EXCEPTION_PROLOG
-	bne	load_up_fpu		/* if from user, just load it up */
-	addi	r3,r1,STACK_FRAME_OVERHEAD
+	beq	1f
+	bl	load_up_fpu		/* if from user, just load it up */
+	b	fast_exception_return
+1:	addi	r3,r1,STACK_FRAME_OVERHEAD
 	EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
 
 /* Decrementer */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -741,7 +741,8 @@ fp_unavailable_common:
 	ENABLE_INTS
 	bl	.kernel_fp_unavailable_exception
 	BUG_OPCODE
-1:	b	.load_up_fpu
+1:	bl	.load_up_fpu
+	b	fast_exception_return
 
 	.align	7
 	.globl altivec_unavailable_common
@@ -749,7 +750,10 @@ altivec_unavailable_common:
 	EXCEPTION_PROLOG_COMMON(0xf20, PACA_EXGEN)
 #ifdef CONFIG_ALTIVEC
 BEGIN_FTR_SECTION
-	bne	.load_up_altivec	/* if from user, just load it up */
+	beq	1f
+	bl	.load_up_altivec
+	b	fast_exception_return
+1:
 END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 #endif
 	bl	.save_nvgprs
@@ -829,7 +833,7 @@ _STATIC(load_up_altivec)
 	std	r4,0(r3)
 #endif /* CONFIG_SMP */
 	/* restore registers and return */
-	b	fast_exception_return
+	blr
 #endif /* CONFIG_ALTIVEC */
 
 /*
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_booke.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
@@ -363,8 +363,10 @@ label:
 #define FP_UNAVAILABLE_EXCEPTION					      \
 	START_EXCEPTION(FloatingPointUnavailable)			      \
 	NORMAL_EXCEPTION_PROLOG;					      \
-	bne	load_up_fpu;		/* if from user, just load it up */   \
-	addi	r3,r1,STACK_FRAME_OVERHEAD;				      \
+	beq	1f;							      \
+	bl	load_up_fpu;		/* if from user, just load it up */   \
+	b	fast_exception_return;					      \
+1:	addi	r3,r1,STACK_FRAME_OVERHEAD;				      \
 	EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
 
 #endif /* __HEAD_BOOKE_H__ */


* [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
  2008-06-25  4:07         ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                             ` (2 preceding siblings ...)
  2008-06-25  4:07           ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
@ 2008-06-25  4:07           ` Michael Neuling
  2008-06-25  4:07           ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
                             ` (4 subsequent siblings)
  8 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-25  4:07 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

The layout of the new VSR registers and how they overlap on top of the
legacy FPR and VR registers is:

                   VSR doubleword 0               VSR doubleword 1
          ----------------------------------------------------------------
  VSR[0]  |             FPR[0]            |                              |
          ----------------------------------------------------------------
  VSR[1]  |             FPR[1]            |                              |
          ----------------------------------------------------------------
          |              ...              |                              |
          |              ...              |                              |
          ----------------------------------------------------------------
  VSR[30] |             FPR[30]           |                              |
          ----------------------------------------------------------------
  VSR[31] |             FPR[31]           |                              |
          ----------------------------------------------------------------
  VSR[32] |                             VR[0]                            |
          ----------------------------------------------------------------
  VSR[33] |                             VR[1]                            |
          ----------------------------------------------------------------
          |                              ...                             |
          |                              ...                             |
          ----------------------------------------------------------------
  VSR[62] |                             VR[30]                           |
          ----------------------------------------------------------------
  VSR[63] |                             VR[31]                           |
          ----------------------------------------------------------------

VSX has 64 128-bit registers.  The first 32 regs overlap with the FP
registers and hence extend them with an additional 64 bits.  The
second 32 regs overlap with the VMX registers.
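
As an illustration of the overlay described above, here is a minimal C
sketch that maps a VSR number onto the legacy register it shares storage
with.  The type and function names (vsr_backing, vsr_location,
vsr_locate) are hypothetical and exist only for this example; they are
not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of where a VSR lives; not a kernel type. */
enum vsr_backing { VSR_ON_FPR, VSR_ON_VR };

struct vsr_location {
	enum vsr_backing backing;	/* legacy register file the VSR overlaps */
	unsigned int index;		/* FPR or VR number, 0..31 */
	bool has_new_dword1;		/* doubleword 1 is new, VSX-only state */
};

/* Map VSR number n (0..63) onto the legacy register it overlaps. */
static struct vsr_location vsr_locate(unsigned int n)
{
	struct vsr_location loc;

	assert(n < 64);
	if (n < 32) {
		/* VSR[0..31]: doubleword 0 is FPR[n], doubleword 1 is new */
		loc.backing = VSR_ON_FPR;
		loc.index = n;
		loc.has_new_dword1 = true;
	} else {
		/* VSR[32..63]: the whole 128 bits are VR[n - 32] */
		loc.backing = VSR_ON_VR;
		loc.index = n - 32;
		loc.has_new_dword1 = false;
	}
	return loc;
}
```

Only the first 32 VSRs carry state that neither the FP nor the VMX
save areas already hold, which is why the extra storage is needed for
VSR[0..31] only.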

This patch introduces the thread_struct changes required to reflect
this register layout.  Ptrace and signals code is updated so that the
floating point registers are correctly accessed from the thread_struct
when CONFIG_VSX is enabled.
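
To show why the ptrace and signal paths need a staging buffer once each
FPR slot is two doublewords wide, here is a cut-down, user-space C model
of the layout.  TS_FPRWIDTH, TS_FPROFFSET and TS_FPR mirror the macros in
this patch; struct fp_state, NUM_FPRS, fp_state_pack and fp_state_unpack
are assumptions made for the sketch, and the FPSCR is simplified to a
single 64-bit slot.

```c
#include <assert.h>
#include <string.h>

#define TS_FPRWIDTH	2	/* doublewords per slot, as with CONFIG_VSX */
#define TS_FPROFFSET	0	/* doubleword 0 of a slot holds the FPR */
#define NUM_FPRS	32	/* name assumed for this sketch */

/* Cut-down stand-in for the FP part of thread_struct. */
struct fp_state {
	double fpr[NUM_FPRS][TS_FPRWIDTH];
	double fpscr;		/* modelled as one 64-bit slot */
};

#define TS_FPR(i) fpr[i][TS_FPROFFSET]

/* Gather 32 FPRs plus the FPSCR into the contiguous 33-slot layout. */
static void fp_state_pack(const struct fp_state *st, double out[NUM_FPRS + 1])
{
	int i;

	for (i = 0; i < NUM_FPRS; i++)
		out[i] = st->TS_FPR(i);
	memcpy(&out[NUM_FPRS], &st->fpscr, sizeof(double));
}

/* Scatter the contiguous layout back, leaving doubleword 1 untouched. */
static void fp_state_unpack(struct fp_state *st, const double in[NUM_FPRS + 1])
{
	int i;

	for (i = 0; i < NUM_FPRS; i++)
		st->TS_FPR(i) = in[i];
	memcpy(&st->fpscr, &in[NUM_FPRS], sizeof(double));
}
```

The staging buffer has 33 slots (32 FPRs and the FPSCR), so a buffer
sized for only 32 entries would be overrun by the final memcpy.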

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/asm-offsets.c |    4 ++
 arch/powerpc/kernel/ptrace.c      |   29 ++++++++++++++++++
 arch/powerpc/kernel/signal_32.c   |   59 ++++++++++++++++++++++++++++----------
 arch/powerpc/kernel/signal_64.c   |   32 ++++++++++++++++++--
 include/asm-powerpc/processor.h   |   18 +++++++++--
 5 files changed, 121 insertions(+), 21 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
@@ -74,6 +74,10 @@ int main(void)
 	DEFINE(THREAD_VSCR, offsetof(struct thread_struct, vscr));
 	DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	DEFINE(THREAD_VSR0, offsetof(struct thread_struct, fpr));
+	DEFINE(THREAD_USED_VSR, offsetof(struct thread_struct, used_vsr));
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_PPC64
 	DEFINE(KSP_VSID, offsetof(struct thread_struct, ksp_vsid));
 #else /* CONFIG_PPC64 */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -215,29 +215,56 @@ static int fpr_get(struct task_struct *t
 		   unsigned int pos, unsigned int count,
 		   void *kbuf, void __user *ubuf)
 {
+#ifdef CONFIG_VSX
+	double buf[33];
+	int i;
+#endif
 	flush_fp_to_thread(target);
 
+#ifdef CONFIG_VSX
+	/* copy to local buffer then write that out */
+	for (i = 0; i < 32 ; i++)
+		buf[i] = target->thread.TS_FPR(i);
+	memcpy(&buf[32], &target->thread.fpscr, sizeof(double));
+	return user_regset_copyout(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+
+#else
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
 		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
 				   &target->thread.fpr, 0, -1);
+#endif
 }
 
 static int fpr_set(struct task_struct *target, const struct user_regset *regset,
 		   unsigned int pos, unsigned int count,
 		   const void *kbuf, const void __user *ubuf)
 {
+#ifdef CONFIG_VSX
+	double buf[33];
+	int i;
+#endif
 	flush_fp_to_thread(target);
 
+#ifdef CONFIG_VSX
+	/* copy to local buffer then write that out */
+	i = user_regset_copyin(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+	if (i)
+		return i;
+	for (i = 0; i < 32 ; i++)
+		target->thread.TS_FPR(i) = buf[i];
+	memcpy(&target->thread.fpscr, &buf[32], sizeof(double));
+	return 0;
+#else
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
 		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
 				  &target->thread.fpr, 0, -1);
+#endif
 }
 
-
 #ifdef CONFIG_ALTIVEC
 /*
  * Get/set all the altivec registers vr0..vr31, vscr, vrsave, in one go.
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -337,14 +337,16 @@ static int save_user_regs(struct pt_regs
 		int sigret)
 {
 	unsigned long msr = regs->msr;
+#ifdef CONFIG_VSX
+	double buf[ELF_NFPREG];
+	int i;
+#endif
 
 	/* Make sure floating point registers are stored in regs */
 	flush_fp_to_thread(current);
 
-	/* save general and floating-point registers */
-	if (save_general_regs(regs, frame) ||
-	    __copy_to_user(&frame->mc_fregs, current->thread.fpr,
-		    ELF_NFPREG * sizeof(double)))
+	/* save general registers */
+	if (save_general_regs(regs, frame))
 		return 1;
 
 #ifdef CONFIG_ALTIVEC
@@ -368,7 +370,20 @@ static int save_user_regs(struct pt_regs
 	if (__put_user(current->thread.vrsave, (u32 __user *)&frame->mc_vregs[32]))
 		return 1;
 #endif /* CONFIG_ALTIVEC */
-
+#ifdef CONFIG_VSX
+	/* copy FPRs and FPSCR to a local buffer then write that out */
+	flush_fp_to_thread(current);
+	for (i = 0; i < 32 ; i++)
+		buf[i] = current->thread.TS_FPR(i);
+	memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+	if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
+		return 1;
+#else
+	/* save floating-point registers */
+	if (__copy_to_user(&frame->mc_fregs, current->thread.fpr,
+		    ELF_NFPREG * sizeof(double)))
+		return 1;
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	/* save spe registers */
 	if (current->thread.used_spe) {
@@ -411,6 +426,10 @@ static long restore_user_regs(struct pt_
 	long err;
 	unsigned int save_r2 = 0;
 	unsigned long msr;
+#ifdef CONFIG_VSX
+	double buf[ELF_NFPREG];
+	int i;
+#endif
 
 	/*
 	 * restore general registers but not including MSR or SOFTE. Also
@@ -438,16 +457,11 @@ static long restore_user_regs(struct pt_
 	 */
 	discard_lazy_cpu_state();
 
-	/* force the process to reload the FP registers from
-	   current->thread when it next does FP instructions */
-	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
-	if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
-			     sizeof(sr->mc_fregs)))
-		return 1;
-
 #ifdef CONFIG_ALTIVEC
-	/* force the process to reload the altivec registers from
-	   current->thread when it next does altivec instructions */
+	/*
+	 * Force the process to reload the altivec registers from
+	 * current->thread when it next does altivec instructions
+	 */
 	regs->msr &= ~MSR_VEC;
 	if (msr & MSR_VEC) {
 		/* restore altivec registers from the stack */
@@ -462,6 +476,23 @@ static long restore_user_regs(struct pt_
 		return 1;
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+	if (__copy_from_user(buf, &sr->mc_fregs,sizeof(sr->mc_fregs)))
+		return 1;
+	for (i = 0; i < 32 ; i++)
+		current->thread.TS_FPR(i) = buf[i];
+	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+#else
+	if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
+			     sizeof(sr->mc_fregs)))
+		return 1;
+#endif /* CONFIG_VSX */
+	/*
+	 * force the process to reload the FP registers from
+	 * current->thread when it next does FP instructions
+	 */
+	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
+
 #ifdef CONFIG_SPE
 	/* force the process to reload the spe registers from
 	   current->thread when it next does spe instructions */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -89,6 +89,10 @@ static long setup_sigcontext(struct sigc
 #endif
 	unsigned long msr = regs->msr;
 	long err = 0;
+#ifdef CONFIG_VSX
+	double buf[FP_REGS_SIZE];
+	int i;
+#endif
 
 	flush_fp_to_thread(current);
 
@@ -112,11 +116,21 @@ static long setup_sigcontext(struct sigc
 #else /* CONFIG_ALTIVEC */
 	err |= __put_user(0, &sc->v_regs);
 #endif /* CONFIG_ALTIVEC */
+	flush_fp_to_thread(current);
+#ifdef CONFIG_VSX
+	/* Copy FP to local buffer then write that out */
+	for (i = 0; i < 32 ; i++)
+		buf[i] = current->thread.TS_FPR(i);
+	memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+	err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+#else /* CONFIG_VSX */
+	/* copy fpr regs and fpscr */
+	err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
+#endif /* CONFIG_VSX */
 	err |= __put_user(&sc->gp_regs, &sc->regs);
 	WARN_ON(!FULL_REGS(regs));
 	err |= __copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE);
 	err |= __put_user(msr, &sc->gp_regs[PT_MSR]);
-	err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
 	err |= __put_user(signr, &sc->signal);
 	err |= __put_user(handler, &sc->handler);
 	if (set != NULL)
@@ -135,6 +149,9 @@ static long restore_sigcontext(struct pt
 #ifdef CONFIG_ALTIVEC
 	elf_vrreg_t __user *v_regs;
 #endif
+#ifdef CONFIG_VSX
+	double buf[FP_REGS_SIZE];
+#endif
 	unsigned long err = 0;
 	unsigned long save_r13 = 0;
 	elf_greg_t *gregs = (elf_greg_t *)regs;
@@ -182,8 +199,6 @@ static long restore_sigcontext(struct pt
 	 */
 	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
 
-	err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
-
 #ifdef CONFIG_ALTIVEC
 	err |= __get_user(v_regs, &sc->v_regs);
 	if (err)
@@ -202,7 +217,18 @@ static long restore_sigcontext(struct pt
 	else
 		current->thread.vrsave = 0;
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	/* restore floating point */
+	err |= __copy_from_user(buf, &sc->fp_regs, FP_REGS_SIZE);
+	if (err)
+		return err;
+	for (i = 0; i < 32 ; i++)
+		current->thread.TS_FPR(i) = buf[i];
+	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
 
+#else
+	err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
+#endif
 	return err;
 }
 
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -12,7 +12,11 @@
 
 #include <asm/reg.h>
 
+#ifdef CONFIG_VSX
+#define TS_FPRWIDTH 2
+#else
 #define TS_FPRWIDTH 1
+#endif
 
 #ifndef __ASSEMBLY__
 #include <linux/compiler.h>
@@ -80,6 +84,7 @@ extern long kernel_thread(int (*fn)(void
 /* Lazy FPU handling on uni-processor */
 extern struct task_struct *last_task_used_math;
 extern struct task_struct *last_task_used_altivec;
+extern struct task_struct *last_task_used_vsx;
 extern struct task_struct *last_task_used_spe;
 
 #ifdef CONFIG_PPC32
@@ -138,7 +143,9 @@ typedef struct {
 	unsigned long seg;
 } mm_segment_t;
 
-#define TS_FPR(i) fpr[i]
+#define TS_FPROFFSET 0
+#define TS_VSRLOWOFFSET 1
+#define TS_FPR(i) fpr[i][TS_FPROFFSET]
 
 struct thread_struct {
 	unsigned long	ksp;		/* Kernel stack pointer */
@@ -156,8 +163,9 @@ struct thread_struct {
 	unsigned long	dbcr0;		/* debug control register values */
 	unsigned long	dbcr1;
 #endif
-	double		fpr[32];	/* Complete floating point set */
-	struct {			/* fpr ... fpscr must be contiguous */
+	/* FP and VSX 0-31 register set */
+	double		fpr[32][TS_FPRWIDTH];
+	struct {
 
 		unsigned int pad;
 		unsigned int val;	/* Floating point status */
@@ -177,6 +185,10 @@ struct thread_struct {
 	unsigned long	vrsave;
 	int		used_vr;	/* set if process has used altivec */
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	/* VSR status */
+	int		used_vsr;	/* set if process has used VSX */
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	unsigned long	evr[32];	/* upper 32-bits of SPE regs */
 	u64		acc;		/* Accumulator */


* [PATCH 7/9] powerpc: Add VSX assembler code macros
  2008-06-25  4:07         ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                             ` (4 preceding siblings ...)
  2008-06-25  4:07           ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
@ 2008-06-25  4:07           ` Michael Neuling
  2008-06-25  4:07           ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
                             ` (2 subsequent siblings)
  8 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-25  4:07 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

This adds macros for the VSX load/store instructions, as most
binutils versions are not going to support them for a while.

Also add VSX register save/restore macros and vsr[0-63] register definitions.
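
As a rough illustration of the XX1-form encoding that the VSX_XX1,
STXVD2X and LXVD2X macros in this patch emit as raw .long words, here
is a small C sketch that computes the same instruction words.  The two
base opcode values are taken from the patch; the function names
(vsx_xx1, encode_stxvd2x, encode_lxvd2x) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define STXVD2X_OPCODE	0x7c000798u	/* base word used by STXVD2X() */
#define LXVD2X_OPCODE	0x7c000698u	/* base word used by LXVD2X() */

/*
 * Operand fields of an XX1-form instruction: the low five bits of the
 * VSR number go at bit 21, RA at bit 16, RB at bit 11, and the sixth
 * (top) bit of the VSR number goes in the least significant bit.
 */
static uint32_t vsx_xx1(unsigned int xs, unsigned int ra, unsigned int rb)
{
	assert(xs < 64 && ra < 32 && rb < 32);
	return ((uint32_t)(xs & 0x1f) << 21) |
	       ((uint32_t)ra << 16) |
	       ((uint32_t)rb << 11) |
	       (uint32_t)(xs >> 5);
}

static uint32_t encode_stxvd2x(unsigned int xs, unsigned int ra, unsigned int rb)
{
	return STXVD2X_OPCODE | vsx_xx1(xs, ra, rb);
}

static uint32_t encode_lxvd2x(unsigned int xs, unsigned int ra, unsigned int rb)
{
	return LXVD2X_OPCODE | vsx_xx1(xs, ra, rb);
}
```

Splitting the register number this way is what lets SAVE_VSRU and
REST_VSRU reach VSR 32-63 simply by passing n+32 as the register
operand.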

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/fpu.S     |   23 +++++++++
 include/asm-powerpc/ppc_asm.h |  103 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 126 insertions(+)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -24,6 +24,29 @@
 #include <asm/ppc_asm.h>
 #include <asm/asm-offsets.h>
 
+#ifdef CONFIG_VSX
+#define REST_32FPVSRS(n,c,base)						\
+BEGIN_FTR_SECTION							\
+	b	2f;							\
+END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
+	REST_32FPRS(n,base);						\
+	b	3f;							\
+2:	REST_32VSRS(n,c,base);						\
+3:
+
+#define SAVE_32FPVSRS(n,c,base)						\
+BEGIN_FTR_SECTION							\
+	b	2f;							\
+END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
+	SAVE_32FPRS(n,base);						\
+	b	3f;							\
+2:	SAVE_32VSRS(n,c,base);						\
+3:
+#else
+#define REST_32FPVSRS(n,b,base)	REST_32FPRS(n, base)
+#define SAVE_32FPVSRS(n,b,base)	SAVE_32FPRS(n, base)
+#endif
+
 /*
  * This task wants to use the FPU now.
  * On UP, disable FP for the task which had the FPU previously,
Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -74,6 +74,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);				
 				REST_10GPRS(22, base)
 #endif
 
+/*
+ * Define what the VSX XX1 form instructions will look like, then add
+ * the 128 bit load store instructions based on that.
+ */
+#define VSX_XX1(xs, ra, rb)	(((xs) & 0x1f) << 21 | ((ra) << 16) |  \
+				 ((rb) << 11) | (((xs) >> 5)))
+
+#define STXVD2X(xs, ra, rb)	.long (0x7c000798 | VSX_XX1((xs), (ra), (rb)))
+#define LXVD2X(xs, ra, rb)	.long (0x7c000698 | VSX_XX1((xs), (ra), (rb)))
 
 #define SAVE_2GPRS(n, base)	SAVE_GPR(n, base); SAVE_GPR(n+1, base)
 #define SAVE_4GPRS(n, base)	SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
@@ -110,6 +119,33 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);				
 #define REST_16VRS(n,b,base)	REST_8VRS(n,b,base); REST_8VRS(n+8,b,base)
 #define REST_32VRS(n,b,base)	REST_16VRS(n,b,base); REST_16VRS(n+16,b,base)
 
+/* Save the lower 32 VSRs in the thread VSR region */
+#define SAVE_VSR(n,b,base)	li b,THREAD_VSR0+(16*(n));  STXVD2X(n,b,base)
+#define SAVE_2VSRS(n,b,base)	SAVE_VSR(n,b,base); SAVE_VSR(n+1,b,base)
+#define SAVE_4VSRS(n,b,base)	SAVE_2VSRS(n,b,base); SAVE_2VSRS(n+2,b,base)
+#define SAVE_8VSRS(n,b,base)	SAVE_4VSRS(n,b,base); SAVE_4VSRS(n+4,b,base)
+#define SAVE_16VSRS(n,b,base)	SAVE_8VSRS(n,b,base); SAVE_8VSRS(n+8,b,base)
+#define SAVE_32VSRS(n,b,base)	SAVE_16VSRS(n,b,base); SAVE_16VSRS(n+16,b,base)
+#define REST_VSR(n,b,base)	li b,THREAD_VSR0+(16*(n)); LXVD2X(n,b,base)
+#define REST_2VSRS(n,b,base)	REST_VSR(n,b,base); REST_VSR(n+1,b,base)
+#define REST_4VSRS(n,b,base)	REST_2VSRS(n,b,base); REST_2VSRS(n+2,b,base)
+#define REST_8VSRS(n,b,base)	REST_4VSRS(n,b,base); REST_4VSRS(n+4,b,base)
+#define REST_16VSRS(n,b,base)	REST_8VSRS(n,b,base); REST_8VSRS(n+8,b,base)
+#define REST_32VSRS(n,b,base)	REST_16VSRS(n,b,base); REST_16VSRS(n+16,b,base)
+/* Save the upper 32 VSRs (32-63) in the thread VR region (0-31) */
+#define SAVE_VSRU(n,b,base)	li b,THREAD_VR0+(16*(n));  STXVD2X(n+32,b,base)
+#define SAVE_2VSRSU(n,b,base)	SAVE_VSRU(n,b,base); SAVE_VSRU(n+1,b,base)
+#define SAVE_4VSRSU(n,b,base)	SAVE_2VSRSU(n,b,base); SAVE_2VSRSU(n+2,b,base)
+#define SAVE_8VSRSU(n,b,base)	SAVE_4VSRSU(n,b,base); SAVE_4VSRSU(n+4,b,base)
+#define SAVE_16VSRSU(n,b,base)	SAVE_8VSRSU(n,b,base); SAVE_8VSRSU(n+8,b,base)
+#define SAVE_32VSRSU(n,b,base)	SAVE_16VSRSU(n,b,base); SAVE_16VSRSU(n+16,b,base)
+#define REST_VSRU(n,b,base)	li b,THREAD_VR0+(16*(n)); LXVD2X(n+32,b,base)
+#define REST_2VSRSU(n,b,base)	REST_VSRU(n,b,base); REST_VSRU(n+1,b,base)
+#define REST_4VSRSU(n,b,base)	REST_2VSRSU(n,b,base); REST_2VSRSU(n+2,b,base)
+#define REST_8VSRSU(n,b,base)	REST_4VSRSU(n,b,base); REST_4VSRSU(n+4,b,base)
+#define REST_16VSRSU(n,b,base)	REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
+#define REST_32VSRSU(n,b,base)	REST_16VSRSU(n,b,base); REST_16VSRSU(n+16,b,base)
+
 #define SAVE_EVR(n,s,base)	evmergehi s,s,n; stw s,THREAD_EVR0+4*(n)(base)
 #define SAVE_2EVRS(n,s,base)	SAVE_EVR(n,s,base); SAVE_EVR(n+1,s,base)
 #define SAVE_4EVRS(n,s,base)	SAVE_2EVRS(n,s,base); SAVE_2EVRS(n+2,s,base)
@@ -534,6 +570,73 @@ END_FTR_SECTION_IFCLR(CPU_FTR_601)
 #define	vr30	30
 #define	vr31	31
 
+/* VSX Registers (VSRs) */
+
+#define	vsr0	0
+#define	vsr1	1
+#define	vsr2	2
+#define	vsr3	3
+#define	vsr4	4
+#define	vsr5	5
+#define	vsr6	6
+#define	vsr7	7
+#define	vsr8	8
+#define	vsr9	9
+#define	vsr10	10
+#define	vsr11	11
+#define	vsr12	12
+#define	vsr13	13
+#define	vsr14	14
+#define	vsr15	15
+#define	vsr16	16
+#define	vsr17	17
+#define	vsr18	18
+#define	vsr19	19
+#define	vsr20	20
+#define	vsr21	21
+#define	vsr22	22
+#define	vsr23	23
+#define	vsr24	24
+#define	vsr25	25
+#define	vsr26	26
+#define	vsr27	27
+#define	vsr28	28
+#define	vsr29	29
+#define	vsr30	30
+#define	vsr31	31
+#define	vsr32	32
+#define	vsr33	33
+#define	vsr34	34
+#define	vsr35	35
+#define	vsr36	36
+#define	vsr37	37
+#define	vsr38	38
+#define	vsr39	39
+#define	vsr40	40
+#define	vsr41	41
+#define	vsr42	42
+#define	vsr43	43
+#define	vsr44	44
+#define	vsr45	45
+#define	vsr46	46
+#define	vsr47	47
+#define	vsr48	48
+#define	vsr49	49
+#define	vsr50	50
+#define	vsr51	51
+#define	vsr52	52
+#define	vsr53	53
+#define	vsr54	54
+#define	vsr55	55
+#define	vsr56	56
+#define	vsr57	57
+#define	vsr58	58
+#define	vsr59	59
+#define	vsr60	60
+#define	vsr61	61
+#define	vsr62	62
+#define	vsr63	63
+
 /* SPE Registers (EVPRs) */
 
 #define	evr0	0


* [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support
  2008-06-25  4:07         ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
  2008-06-25  4:07           ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
  2008-06-25  4:07           ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
@ 2008-06-25  4:07           ` Michael Neuling
  2008-06-25  4:07           ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
                             ` (5 subsequent siblings)
  8 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-25  4:07 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

This patch extends the floating point save and restore code to use the
VSX load/stores when VSX is available.  This will make FP context
save/restore marginally slower on FP only code, when VSX is available,
as it has to load/store 128bits rather than just 64bits.

Mixing FP, VMX and VSX code will see consistent architected state.

The signals interface is extended to enable access to VSR 0-31
doubleword 1 after discussions with tool chain maintainers.  Backward
compatibility is maintained.  

The ptrace interface is also extended to allow access to VSR 0-31 full
registers.
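
The following C sketch illustrates, under simplifying assumptions, how
the full 128-bit contents of VSR 0-31 relate to the two pieces that are
exported separately: doubleword 0 through the existing FP register
area, and doubleword 1 through the new VSX area.  struct vsr128,
NUM_VSR_LOW, vsr_split and vsr_merge are hypothetical names for this
example, and registers are treated as raw 64-bit patterns.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_VSR_LOW	32	/* VSR 0-31, the ones overlapping the FPRs */

/* Hypothetical view of one 128-bit VSR as two doublewords. */
struct vsr128 {
	uint64_t dword0;	/* overlaps the legacy FPR */
	uint64_t dword1;	/* new state introduced by VSX */
};

/* Split full VSRs into the legacy FP image and the new low halves. */
static void vsr_split(const struct vsr128 full[NUM_VSR_LOW],
		      uint64_t fp[NUM_VSR_LOW], uint64_t low[NUM_VSR_LOW])
{
	int i;

	for (i = 0; i < NUM_VSR_LOW; i++) {
		fp[i] = full[i].dword0;
		low[i] = full[i].dword1;
	}
}

/* Rebuild full VSRs from the two separately exported pieces. */
static void vsr_merge(struct vsr128 full[NUM_VSR_LOW],
		      const uint64_t fp[NUM_VSR_LOW],
		      const uint64_t low[NUM_VSR_LOW])
{
	int i;

	for (i = 0; i < NUM_VSR_LOW; i++) {
		full[i].dword0 = fp[i];
		full[i].dword1 = low[i];
	}
}
```

Keeping doubleword 0 in the existing FP area is what preserves backward
compatibility: a consumer that knows nothing about VSX still finds the
FPRs where it always did.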

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/kernel/entry_64.S   |    5 +
 arch/powerpc/kernel/fpu.S        |   16 ++++-
 arch/powerpc/kernel/head_64.S    |   65 +++++++++++++++++++++++
 arch/powerpc/kernel/misc_64.S    |   33 ++++++++++++
 arch/powerpc/kernel/ppc32.h      |    1 
 arch/powerpc/kernel/ppc_ksyms.c  |    3 +
 arch/powerpc/kernel/process.c    |  107 ++++++++++++++++++++++++++++++++++++++-
 arch/powerpc/kernel/ptrace.c     |   70 +++++++++++++++++++++++++
 arch/powerpc/kernel/signal_32.c  |   33 ++++++++++++
 arch/powerpc/kernel/signal_64.c  |   31 ++++++++++-
 arch/powerpc/kernel/traps.c      |   29 ++++++++++
 include/asm-powerpc/elf.h        |    6 +-
 include/asm-powerpc/ptrace.h     |   12 ++++
 include/asm-powerpc/reg.h        |    2 
 include/asm-powerpc/sigcontext.h |   37 +++++++++++++
 include/asm-powerpc/system.h     |    9 +++
 16 files changed, 451 insertions(+), 8 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/entry_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
@@ -353,6 +353,11 @@ _GLOBAL(_switch)
 	mflr	r20		/* Return to switch caller */
 	mfmsr	r22
 	li	r0, MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r0,r0,MSR_VSX@h	/* Disable VSX */
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_ALTIVEC
 BEGIN_FTR_SECTION
 	oris	r0,r0,MSR_VEC@h	/* Disable altivec */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -57,6 +57,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX);					
 _GLOBAL(load_up_fpu)
 	mfmsr	r5
 	ori	r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
 	SYNC
 	MTMSRD(r5)			/* enable use of fpu now */
 	isync
@@ -73,7 +78,7 @@ _GLOBAL(load_up_fpu)
 	beq	1f
 	toreal(r4)
 	addi	r4,r4,THREAD		/* want last_task_used_math->thread */
-	SAVE_32FPRS(0, r4)
+	SAVE_32FPVSRS(0, r5, r4)
 	mffs	fr0
 	stfd	fr0,THREAD_FPSCR(r4)
 	PPC_LL	r5,PT_REGS(r4)
@@ -100,7 +105,7 @@ _GLOBAL(load_up_fpu)
 #endif
 	lfd	fr0,THREAD_FPSCR(r5)
 	MTFSF_L(fr0)
-	REST_32FPRS(0, r5)
+	REST_32FPVSRS(0, r4, r5)
 #ifndef CONFIG_SMP
 	subi	r4,r5,THREAD
 	fromreal(r4)
@@ -119,6 +124,11 @@ _GLOBAL(load_up_fpu)
 _GLOBAL(giveup_fpu)
 	mfmsr	r5
 	ori	r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	oris	r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
 	SYNC_601
 	ISYNC_601
 	MTMSRD(r5)			/* enable use of fpu now */
@@ -129,7 +139,7 @@ _GLOBAL(giveup_fpu)
 	addi	r3,r3,THREAD	        /* want THREAD of task */
 	PPC_LL	r5,PT_REGS(r3)
 	PPC_LCMPI	0,r5,0
-	SAVE_32FPRS(0, r3)
+	SAVE_32FPVSRS(0, r4 ,r3)
 	mffs	fr0
 	stfd	fr0,THREAD_FPSCR(r3)
 	beq	1f
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -278,6 +278,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	. = 0xf20
 	b	altivec_unavailable_pSeries
 
+	. = 0xf40
+	b	vsx_unavailable_pSeries
+
 #ifdef CONFIG_CBE_RAS
 	HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
 #endif /* CONFIG_CBE_RAS */
@@ -297,6 +300,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	/* moved from 0xf00 */
 	STD_EXCEPTION_PSERIES(., performance_monitor)
 	STD_EXCEPTION_PSERIES(., altivec_unavailable)
+	STD_EXCEPTION_PSERIES(., vsx_unavailable)
 
 /*
  * An interrupt came in while soft-disabled; clear EE in SRR1,
@@ -836,6 +840,67 @@ _STATIC(load_up_altivec)
 	blr
 #endif /* CONFIG_ALTIVEC */
 
+	.align	7
+	.globl vsx_unavailable_common
+vsx_unavailable_common:
+	EXCEPTION_PROLOG_COMMON(0xf40, PACA_EXGEN)
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+	bne	.load_up_vsx
+1:
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
+	bl	.save_nvgprs
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	ENABLE_INTS
+	bl	.vsx_unavailable_exception
+	b	.ret_from_except
+
+#ifdef CONFIG_VSX
+/*
+ * load_up_vsx(unused, unused, tsk)
+ * Disable VSX for the task which had it previously,
+ * and save its vector registers in its thread_struct.
+ * Reuse the fp and vsx saves, but first check to see if they have
+ * been saved already.
+ * On entry: r13 == 'current' && last_task_used_vsx != 'current'
+ */
+_STATIC(load_up_vsx)
+/* Load FP and VMX registers if they haven't been loaded yet */
+	andi.	r5,r12,MSR_FP
+	beql+	load_up_fpu		/* skip if already loaded */
+	andis.	r5,r12,MSR_VEC@h
+	beql+	load_up_altivec		/* skip if already loaded */
+
+#ifndef CONFIG_SMP
+	ld	r3,last_task_used_vsx@got(r2)
+	ld	r4,0(r3)
+	cmpdi	0,r4,0
+	beq	1f
+	/* Disable VSX for last_task_used_vsx */
+	addi	r4,r4,THREAD
+	ld	r5,PT_REGS(r4)
+	ld	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+	lis	r6,MSR_VSX@h
+	andc	r6,r4,r6
+	std	r6,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#endif /* CONFIG_SMP */
+	ld	r4,PACACURRENT(r13)
+	addi	r4,r4,THREAD		/* Get THREAD */
+	li	r6,1
+	stw	r6,THREAD_USED_VSR(r4) /* ... also set thread used vsr */
+	/* enable use of VSX after return */
+	oris	r12,r12,MSR_VSX@h
+	std	r12,_MSR(r1)
+#ifndef CONFIG_SMP
+	/* Update last_task_used_vsx to 'current' */
+	ld	r4,PACACURRENT(r13)
+	std	r4,0(r3)
+#endif /* CONFIG_SMP */
+	b	fast_exception_return
+#endif /* CONFIG_VSX */
+
 /*
  * Hash table stuff
  */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/misc_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
@@ -506,6 +506,39 @@ _GLOBAL(giveup_altivec)
 
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+/*
+ * giveup_vsx(tsk)
+ * Disable VSX for the task given as the argument,
+ * and save the vector registers in its thread_struct.
+ * Enables the VSX for use in the kernel on return.
+ */
+_GLOBAL(giveup_vsx)
+	mfmsr	r5
+	oris	r5,r5,MSR_VSX@h
+	mtmsrd	r5			/* enable use of VSX now */
+	isync
+
+	cmpdi	0,r3,0
+	beqlr-				/* if no previous owner, done */
+	addi	r3,r3,THREAD		/* want THREAD of task */
+	ld	r5,PT_REGS(r3)
+	cmpdi	0,r5,0
+	beq	1f
+	ld	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+	lis	r3,MSR_VSX@h
+	andc	r4,r4,r3		/* disable VSX for previous task */
+	std	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#ifndef CONFIG_SMP
+	li	r5,0
+	ld	r4,last_task_used_vsx@got(r2)
+	std	r5,0(r4)
+#endif /* CONFIG_SMP */
+	blr
+
+#endif /* CONFIG_VSX */
+
 /* kexec_wait(phys_cpu)
  *
  * wait for the flag to change, indicating this kernel is going away but
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc32.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
@@ -120,6 +120,7 @@ struct mcontext32 {
 	elf_fpregset_t		mc_fregs;
 	unsigned int		mc_pad[2];
 	elf_vrregset_t32	mc_vregs __attribute__((__aligned__(16)));
+	elf_vsrreghalf_t32      mc_vsregs __attribute__((__aligned__(16)));
 };
 
 struct ucontext32 { 
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc_ksyms.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
@@ -102,6 +102,9 @@ EXPORT_SYMBOL(giveup_fpu);
 #ifdef CONFIG_ALTIVEC
 EXPORT_SYMBOL(giveup_altivec);
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+EXPORT_SYMBOL(giveup_vsx);
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 EXPORT_SYMBOL(giveup_spe);
 #endif /* CONFIG_SPE */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -53,6 +53,7 @@ extern unsigned long _get_SP(void);
 #ifndef CONFIG_SMP
 struct task_struct *last_task_used_math = NULL;
 struct task_struct *last_task_used_altivec = NULL;
+struct task_struct *last_task_used_vsx = NULL;
 struct task_struct *last_task_used_spe = NULL;
 #endif
 
@@ -106,11 +107,23 @@ EXPORT_SYMBOL(enable_kernel_fp);
 
 int dump_task_fpu(struct task_struct *tsk, elf_fpregset_t *fpregs)
 {
+#ifdef CONFIG_VSX
+	int i;
+	elf_fpreg_t *reg;
+#endif
+
 	if (!tsk->thread.regs)
 		return 0;
 	flush_fp_to_thread(current);
 
+#ifdef CONFIG_VSX
+	reg = (elf_fpreg_t *)fpregs;
+	for (i = 0; i < ELF_NFPREG - 1; i++, reg++)
+		*reg = tsk->thread.TS_FPR(i);
+	memcpy(reg, &tsk->thread.fpscr, sizeof(elf_fpreg_t));
+#else
 	memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
+#endif
 
 	return 1;
 }
@@ -149,7 +162,7 @@ void flush_altivec_to_thread(struct task
 	}
 }
 
-int dump_task_altivec(struct task_struct *tsk, elf_vrregset_t *vrregs)
+int dump_task_altivec(struct task_struct *tsk, elf_vrreg_t *vrregs)
 {
 	/* ELF_NVRREG includes the VSCR and VRSAVE which we need to save
 	 * separately, see below */
@@ -179,6 +192,80 @@ int dump_task_altivec(struct task_struct
 }
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+#if 0
+/* not currently used, but some crazy RAID module might want to later */
+void enable_kernel_vsx(void)
+{
+	WARN_ON(preemptible());
+
+#ifdef CONFIG_SMP
+	if (current->thread.regs && (current->thread.regs->msr & MSR_VSX))
+		giveup_vsx(current);
+	else
+		giveup_vsx(NULL);	/* just enable vsx for kernel - force */
+#else
+	giveup_vsx(last_task_used_vsx);
+#endif /* CONFIG_SMP */
+}
+EXPORT_SYMBOL(enable_kernel_vsx);
+#endif
+
+void flush_vsx_to_thread(struct task_struct *tsk)
+{
+	if (tsk->thread.regs) {
+		preempt_disable();
+		if (tsk->thread.regs->msr & MSR_VSX) {
+#ifdef CONFIG_SMP
+			BUG_ON(tsk != current);
+#endif
+			giveup_vsx(tsk);
+		}
+		preempt_enable();
+	}
+}
+
+/*
+ * This dumps the lower 64 bits (doubleword 1) of the first 32 VSX registers.
+ * It needs to be called along with dump_task_fpu and dump_task_altivec to
+ * get all the VSX state.
+ */
+int dump_task_vsx(struct task_struct *tsk, elf_vrreg_t *vrregs)
+{
+	elf_vrreg_t *reg;
+	double buf[32];
+	int i;
+
+	if (tsk == current)
+		flush_vsx_to_thread(tsk);
+
+	reg = (elf_vrreg_t *)vrregs;
+
+	for (i = 0; i < 32 ; i++)
+		buf[i] = tsk->thread.fpr[i][TS_VSRLOWOFFSET];
+	memcpy(reg, buf, sizeof(buf));
+
+	return 1;
+}
+#endif /* CONFIG_VSX */
+
+int dump_task_vector(struct task_struct *tsk, elf_vrregset_t *vrregs)
+{
+	int rc = 0;
+	elf_vrreg_t *regs = (elf_vrreg_t *)vrregs;
+#ifdef CONFIG_ALTIVEC
+	rc = dump_task_altivec(tsk, regs);
+	if (rc)
+		return rc;
+	regs += ELF_NVRREG;
+#endif
+
+#ifdef CONFIG_VSX
+	rc = dump_task_vsx(tsk, regs);
+#endif
+	return rc;
+}
+
 #ifdef CONFIG_SPE
 
 void enable_kernel_spe(void)
@@ -233,6 +320,10 @@ void discard_lazy_cpu_state(void)
 	if (last_task_used_altivec == current)
 		last_task_used_altivec = NULL;
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	if (last_task_used_vsx == current)
+		last_task_used_vsx = NULL;
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	if (last_task_used_spe == current)
 		last_task_used_spe = NULL;
@@ -297,6 +388,10 @@ struct task_struct *__switch_to(struct t
 	if (prev->thread.regs && (prev->thread.regs->msr & MSR_VEC))
 		giveup_altivec(prev);
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	if (prev->thread.regs && (prev->thread.regs->msr & MSR_VSX))
+		giveup_vsx(prev);
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	/*
 	 * If the previous thread used spe in the last quantum
@@ -317,6 +412,10 @@ struct task_struct *__switch_to(struct t
 	if (new->thread.regs && last_task_used_altivec == new)
 		new->thread.regs->msr |= MSR_VEC;
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	if (new->thread.regs && last_task_used_vsx == new)
+		new->thread.regs->msr |= MSR_VSX;
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_SPE
 	/* Avoid the trap.  On smp this this never happens since
 	 * we don't set last_task_used_spe
@@ -417,6 +516,8 @@ static struct regbit {
 	{MSR_EE,	"EE"},
 	{MSR_PR,	"PR"},
 	{MSR_FP,	"FP"},
+	{MSR_VEC,	"VEC"},
+	{MSR_VSX,	"VSX"},
 	{MSR_ME,	"ME"},
 	{MSR_IR,	"IR"},
 	{MSR_DR,	"DR"},
@@ -534,6 +635,7 @@ void prepare_to_copy(struct task_struct 
 {
 	flush_fp_to_thread(current);
 	flush_altivec_to_thread(current);
+	flush_vsx_to_thread(current);
 	flush_spe_to_thread(current);
 }
 
@@ -689,6 +791,9 @@ void start_thread(struct pt_regs *regs, 
 #endif
 
 	discard_lazy_cpu_state();
+#ifdef CONFIG_VSX
+	current->thread.used_vsr = 0;
+#endif
 	memset(current->thread.fpr, 0, sizeof(current->thread.fpr));
 	current->thread.fpscr.val = 0;
 #ifdef CONFIG_ALTIVEC
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -350,6 +350,51 @@ static int vr_set(struct task_struct *ta
 }
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+/*
+ * Currently, to set and get all the VSX state, you need to call
+ * the FP and VMX calls as well.  This only gets/sets the lower 32
+ * 128-bit VSX registers.
+ */
+
+static int vsr_active(struct task_struct *target,
+		      const struct user_regset *regset)
+{
+	flush_vsx_to_thread(target);
+	return target->thread.used_vsr ? regset->n : 0;
+}
+
+static int vsr_get(struct task_struct *target, const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   void *kbuf, void __user *ubuf)
+{
+	int ret;
+
+	flush_vsx_to_thread(target);
+
+	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+				  target->thread.fpr, 0,
+				  32 * sizeof(vector128));
+
+	return ret;
+}
+
+static int vsr_set(struct task_struct *target, const struct user_regset *regset,
+		   unsigned int pos, unsigned int count,
+		   const void *kbuf, const void __user *ubuf)
+{
+	int ret;
+
+	flush_vsx_to_thread(target);
+
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+				 target->thread.fpr, 0,
+				 32 * sizeof(vector128));
+
+	return ret;
+}
+#endif /* CONFIG_VSX */
+
 #ifdef CONFIG_SPE
 
 /*
@@ -426,6 +471,9 @@ enum powerpc_regset {
 #ifdef CONFIG_ALTIVEC
 	REGSET_VMX,
 #endif
+#ifdef CONFIG_VSX
+	REGSET_VSX,
+#endif
 #ifdef CONFIG_SPE
 	REGSET_SPE,
 #endif
@@ -449,6 +497,13 @@ static const struct user_regset native_r
 		.active = vr_active, .get = vr_get, .set = vr_set
 	},
 #endif
+#ifdef CONFIG_VSX
+	[REGSET_VSX] = {
+		.n = 32,
+		.size = sizeof(vector128), .align = sizeof(vector128),
+		.active = vsr_active, .get = vsr_get, .set = vsr_set
+	},
+#endif
 #ifdef CONFIG_SPE
 	[REGSET_SPE] = {
 		.n = 35,
@@ -849,6 +904,21 @@ long arch_ptrace(struct task_struct *chi
 						 sizeof(u32)),
 					     (const void __user *) data);
 #endif
+#ifdef CONFIG_VSX
+	case PTRACE_GETVSRREGS:
+		return copy_regset_to_user(child, &user_ppc_native_view,
+					   REGSET_VSX,
+					   0, (32 * sizeof(vector128) +
+					       sizeof(u32)),
+					   (void __user *) data);
+
+	case PTRACE_SETVSRREGS:
+		return copy_regset_from_user(child, &user_ppc_native_view,
+					     REGSET_VSX,
+					     0, (32 * sizeof(vector128) +
+						 sizeof(u32)),
+					     (const void __user *) data);
+#endif
 #ifdef CONFIG_SPE
 	case PTRACE_GETEVRREGS:
 		/* Get the child spe register state. */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -378,6 +378,21 @@ static int save_user_regs(struct pt_regs
 	memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
 	if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
 		return 1;
+	/*
+	 * Copy VSR 0-31 low doubleword from thread_struct to local
+	 * buffer, then write that to userspace.  Also set MSR_VSX in
+	 * the saved MSR value to indicate that frame->mc_vsregs
+	 * contains valid data.
+	 */
+	if (current->thread.used_vsr) {
+		flush_vsx_to_thread(current);
+		for (i = 0; i < 32 ; i++)
+			buf[i] = current->thread.fpr[i][TS_VSRLOWOFFSET];
+		if (__copy_to_user(&frame->mc_vsregs, buf,
+				   ELF_NVSRHALFREG  * sizeof(double)))
+			return 1;
+		msr |= MSR_VSX;
+	}
 #else
 	/* save floating-point registers */
 	if (__copy_to_user(&frame->mc_fregs, current->thread.fpr,
@@ -482,6 +497,24 @@ static long restore_user_regs(struct pt_
 	for (i = 0; i < 32 ; i++)
 		current->thread.TS_FPR(i) = buf[i];
 	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+	/*
+	 * Force the process to reload the VSX registers from
+	 * current->thread when it next executes a VSX instruction.
+	 */
+	regs->msr &= ~MSR_VSX;
+	if (msr & MSR_VSX) {
+		/*
+		 * Restore VSX registers from the stack to a local
+		 * buffer, then write this out to the thread_struct
+		 */
+		if (__copy_from_user(buf, &sr->mc_vsregs,
+				     sizeof(sr->mc_vsregs)))
+			return 1;
+		for (i = 0; i < 32 ; i++)
+			current->thread.fpr[i][TS_VSRLOWOFFSET] = buf[i];
+	} else if (current->thread.used_vsr)
+		for (i = 0; i < 32 ; i++)
+			current->thread.fpr[i][TS_VSRLOWOFFSET] = 0;
 #else
 	if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
 			     sizeof(sr->mc_fregs)))
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -123,6 +123,22 @@ static long setup_sigcontext(struct sigc
 		buf[i] = current->thread.TS_FPR(i);
 	memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
 	err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+	/*
+	 * Copy VSX low doubleword to local buffer for formatting,
+	 * then out to userspace.  Update v_regs to point after the
+	 * VMX data.
+	 */
+	if (current->thread.used_vsr) {
+		flush_vsx_to_thread(current);
+		v_regs += ELF_NVRREG;
+		for (i = 0; i < 32 ; i++)
+			buf[i] = current->thread.fpr[i][TS_VSRLOWOFFSET];
+		err |= __copy_to_user(v_regs, buf, 32 * sizeof(double));
+		/* set MSR_VSX in the MSR value in the frame to
+		 * indicate that sc->vs_reg contains valid data.
+		 */
+		msr |= MSR_VSX;
+	}
 #else /* CONFIG_VSX */
 	/* copy fpr regs and fpscr */
 	err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
@@ -197,7 +213,7 @@ static long restore_sigcontext(struct pt
 	 * This has to be done before copying stuff into current->thread.fpr/vr
 	 * for the reasons explained in the previous comment.
 	 */
-	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
+	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC | MSR_VSX);
 
 #ifdef CONFIG_ALTIVEC
 	err |= __get_user(v_regs, &sc->v_regs);
@@ -226,6 +242,19 @@ static long restore_sigcontext(struct pt
 		current->thread.TS_FPR(i) = buf[i];
 	memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
 
+	/*
+	 * Get additional VSX data. Update v_regs to point after the
+	 * VMX data.  Copy VSX low doubleword from userspace to local
+	 * buffer for formatting, then into the thread_struct.
+	 */
+	v_regs += ELF_NVRREG;
+	if ((msr & MSR_VSX) != 0)
+		err |= __copy_from_user(buf, v_regs, 32 * sizeof(double));
+	else
+		memset(buf, 0, 32 * sizeof(double));
+
+	for (i = 0; i < 32 ; i++)
+		current->thread.fpr[i][TS_VSRLOWOFFSET] = buf[i];
 #else
 	err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
 #endif
Index: linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/traps.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
@@ -967,6 +967,20 @@ void altivec_unavailable_exception(struc
 	die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT);
 }
 
+void vsx_unavailable_exception(struct pt_regs *regs)
+{
+	if (user_mode(regs)) {
+		/* A user program has executed a VSX instruction,
+		   but this kernel doesn't support VSX. */
+		_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+		return;
+	}
+
+	printk(KERN_EMERG "Unrecoverable VSX Unavailable Exception "
+			"%lx at %lx\n", regs->trap, regs->nip);
+	die("Unrecoverable VSX Unavailable Exception", regs, SIGABRT);
+}
+
 void performance_monitor_exception(struct pt_regs *regs)
 {
 	perf_irq(regs);
@@ -1091,6 +1105,21 @@ void altivec_assist_exception(struct pt_
 }
 #endif /* CONFIG_ALTIVEC */
 
+#ifdef CONFIG_VSX
+void vsx_assist_exception(struct pt_regs *regs)
+{
+	if (!user_mode(regs)) {
+		printk(KERN_EMERG "VSX assist exception in kernel mode"
+		       " at %lx\n", regs->nip);
+		die("Kernel VSX assist exception", regs, SIGILL);
+	}
+
+	flush_vsx_to_thread(current);
+	printk(KERN_INFO "VSX assist not supported at %lx\n", regs->nip);
+	_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+}
+#endif /* CONFIG_VSX */
+
 #ifdef CONFIG_FSL_BOOKE
 void CacheLockingException(struct pt_regs *regs, unsigned long address,
 			   unsigned long error_code)
Index: linux-2.6-ozlabs/include/asm-powerpc/elf.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/elf.h
+++ linux-2.6-ozlabs/include/asm-powerpc/elf.h
@@ -109,6 +109,7 @@ typedef elf_gregset_t32 compat_elf_gregs
 #ifdef __powerpc64__
 # define ELF_NVRREG32	33	/* includes vscr & vrsave stuffed together */
 # define ELF_NVRREG	34	/* includes vscr & vrsave in split vectors */
+# define ELF_NVSRHALFREG 32	/* Half the vsx registers */
 # define ELF_GREG_TYPE	elf_greg_t64
 #else
 # define ELF_NEVRREG	34	/* includes acc (as 2) */
@@ -158,6 +159,7 @@ typedef __vector128 elf_vrreg_t;
 typedef elf_vrreg_t elf_vrregset_t[ELF_NVRREG];
 #ifdef __powerpc64__
 typedef elf_vrreg_t elf_vrregset_t32[ELF_NVRREG32];
+typedef elf_fpreg_t elf_vsrreghalf_t32[ELF_NVSRHALFREG];
 #endif
 
 #ifdef __KERNEL__
@@ -219,8 +221,8 @@ extern int dump_task_fpu(struct task_str
 typedef elf_vrregset_t elf_fpxregset_t;
 
 #ifdef CONFIG_ALTIVEC
-extern int dump_task_altivec(struct task_struct *, elf_vrregset_t *vrregs);
-#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_altivec(tsk, regs)
+extern int dump_task_vector(struct task_struct *, elf_vrregset_t *vrregs);
+#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_vector(tsk, regs)
 #define ELF_CORE_XFPREG_TYPE NT_PPC_VMX
 #endif
 
Index: linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ptrace.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
@@ -223,6 +223,14 @@ extern void user_disable_single_step(str
 #define PT_VRSAVE_32 (PT_VR0 + 33*4)
 #endif
 
+/*
+ * Only store the first 32 VSRs here. The second 32 VSRs are in VR0-31
+ */
+#define PT_VSR0 150	/* each VSR reg occupies 2 slots in 64-bit */
+#define PT_VSR31 (PT_VSR0 + 2*31)
+#ifdef __KERNEL__
+#define PT_VSR0_32 300 	/* each VSR reg occupies 4 slots in 32-bit */
+#endif
 #endif /* __powerpc64__ */
 
 /*
@@ -245,6 +253,10 @@ extern void user_disable_single_step(str
 #define PTRACE_GETEVRREGS	20
 #define PTRACE_SETEVRREGS	21
 
+/* Get the first 32 128bit VSX registers */
+#define PTRACE_GETVSRREGS	27
+#define PTRACE_SETVSRREGS	28
+
 /*
  * Get or set a debug register. The first 16 are DABR registers and the
  * second 16 are IABR registers.
Index: linux-2.6-ozlabs/include/asm-powerpc/reg.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/reg.h
+++ linux-2.6-ozlabs/include/asm-powerpc/reg.h
@@ -30,6 +30,7 @@
 #define MSR_ISF_LG	61              /* Interrupt 64b mode valid on 630 */
 #define MSR_HV_LG 	60              /* Hypervisor state */
 #define MSR_VEC_LG	25	        /* Enable AltiVec */
+#define MSR_VSX_LG	23		/* Enable VSX */
 #define MSR_POW_LG	18		/* Enable Power Management */
 #define MSR_WE_LG	18		/* Wait State Enable */
 #define MSR_TGPR_LG	17		/* TLB Update registers in use */
@@ -71,6 +72,7 @@
 #endif
 
 #define MSR_VEC		__MASK(MSR_VEC_LG)	/* Enable AltiVec */
+#define MSR_VSX		__MASK(MSR_VSX_LG)	/* Enable VSX */
 #define MSR_POW		__MASK(MSR_POW_LG)	/* Enable Power Management */
 #define MSR_WE		__MASK(MSR_WE_LG)	/* Wait State Enable */
 #define MSR_TGPR	__MASK(MSR_TGPR_LG)	/* TLB Update registers in use */
Index: linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/sigcontext.h
+++ linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
@@ -43,9 +43,44 @@ struct sigcontext {
  * it must be copied via a vector register to/from storage) or as a word.
  * The entry with index 33 contains the vrsave as the first word (offset 0)
  * within the quadword.
+ *
+ * Part of the VSX data is also stored here, by extending vmx_reserve
+ * by an additional 32 doublewords.  Architecturally the layout of
+ * the VSR registers and how they overlap on top of the legacy FPR and
+ * VR registers is shown below:
+ *
+ *                    VSR doubleword 0               VSR doubleword 1
+ *           ----------------------------------------------------------------
+ *   VSR[0]  |             FPR[0]            |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[1]  |             FPR[1]            |                              |
+ *           ----------------------------------------------------------------
+ *           |              ...              |                              |
+ *           |              ...              |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[30] |             FPR[30]           |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[31] |             FPR[31]           |                              |
+ *           ----------------------------------------------------------------
+ *   VSR[32] |                             VR[0]                            |
+ *           ----------------------------------------------------------------
+ *   VSR[33] |                             VR[1]                            |
+ *           ----------------------------------------------------------------
+ *           |                              ...                             |
+ *           |                              ...                             |
+ *           ----------------------------------------------------------------
+ *   VSR[62] |                             VR[30]                           |
+ *           ----------------------------------------------------------------
+ *   VSR[63] |                             VR[31]                           |
+ *           ----------------------------------------------------------------
+ *
+ * FPR/VSR 0-31 doubleword 0 is stored in fp_regs, and VMX/VSR 32-63
+ * is stored at the start of vmx_reserve.  vmx_reserve is extended for
+ * backwards compatibility to store VSR 0-31 doubleword 1 after the VMX
+ * registers and vscr/vrsave.
  */
 	elf_vrreg_t	__user *v_regs;
-	long		vmx_reserve[ELF_NVRREG+ELF_NVRREG+1];
+	long		vmx_reserve[ELF_NVRREG+ELF_NVRREG+32+1];
 #endif
 };
 
Index: linux-2.6-ozlabs/include/asm-powerpc/system.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/system.h
+++ linux-2.6-ozlabs/include/asm-powerpc/system.h
@@ -132,6 +132,7 @@ extern void enable_kernel_altivec(void);
 extern void giveup_altivec(struct task_struct *);
 extern void load_up_altivec(struct task_struct *);
 extern int emulate_altivec(struct pt_regs *);
+extern void giveup_vsx(struct task_struct *);
 extern void enable_kernel_spe(void);
 extern void giveup_spe(struct task_struct *);
 extern void load_up_spe(struct task_struct *);
@@ -155,6 +156,14 @@ static inline void flush_altivec_to_thre
 }
 #endif
 
+#ifdef CONFIG_VSX
+extern void flush_vsx_to_thread(struct task_struct *);
+#else
+static inline void flush_vsx_to_thread(struct task_struct *t)
+{
+}
+#endif
+
 #ifdef CONFIG_SPE
 extern void flush_spe_to_thread(struct task_struct *);
 #else
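
The split described in the sigcontext.h comment above (doubleword 0 of VSR 0-31 kept with the FP registers, doubleword 1 appended after the VMX data) can be modelled in a few lines of ordinary C. This is only a sketch under stated assumptions: the struct and function names are hypothetical, MODEL_VSRLOWOFFSET stands in for TS_VSRLOWOFFSET and is assumed to be 1, and plain memory copies replace the __copy_to_user/__copy_from_user calls in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NVSR		32	/* VSR 0-31, the ones overlapping the FPRs */
#define MODEL_VSRLOWOFFSET	1	/* assumption: stands in for TS_VSRLOWOFFSET */

/* Hypothetical stand-in for the widened thread.fpr[][] array. */
struct vsx_thread_model {
	uint64_t fpr[MODEL_NVSR][2];	/* [i][0] = FPR i, [i][1] = VSR i doubleword 1 */
};

/* Hypothetical stand-in for the two areas of the signal frame. */
struct vsx_frame_model {
	uint64_t fp_regs[MODEL_NVSR];	/* doubleword 0, the legacy FP area */
	uint64_t vsx_low[MODEL_NVSR];	/* doubleword 1, stored after the VMX data */
};

/* Save path: split each 128-bit VSR into its two frame locations. */
static void model_save(const struct vsx_thread_model *t,
		       struct vsx_frame_model *f)
{
	assert(t && f);
	for (int i = 0; i < MODEL_NVSR; i++) {
		f->fp_regs[i] = t->fpr[i][0];
		f->vsx_low[i] = t->fpr[i][MODEL_VSRLOWOFFSET];
	}
}

/* Restore path: zero the low doublewords when the frame carries no VSX data. */
static void model_restore(struct vsx_thread_model *t,
			  const struct vsx_frame_model *f, int frame_has_vsx)
{
	assert(t && f);
	for (int i = 0; i < MODEL_NVSR; i++) {
		t->fpr[i][0] = f->fp_regs[i];
		t->fpr[i][MODEL_VSRLOWOFFSET] = frame_has_vsx ? f->vsx_low[i] : 0;
	}
}
```

The frame_has_vsx flag plays the role of the MSR_VSX bit saved in the frame: an old frame without that bit restores the FP halves and clears the rest, which matches the memset() branch in restore_sigcontext().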


* [PATCH 9/9] powerpc: Add CONFIG_VSX config option
  2008-06-25  4:07         ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                             ` (7 preceding siblings ...)
  2008-06-25  4:07           ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
@ 2008-06-25  4:07           ` Michael Neuling
  8 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-25  4:07 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Add a CONFIG_VSX build option.  It depends on POWER4, PPC_FPU and ALTIVEC.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---

 arch/powerpc/platforms/Kconfig.cputype |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Index: linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/platforms/Kconfig.cputype
+++ linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
@@ -155,6 +155,22 @@ config ALTIVEC
 
 	  If in doubt, say Y here.
 
+config VSX
+	bool "VSX Support"
+	depends on POWER4 && ALTIVEC && PPC_FPU
+	---help---
+
+	  This option enables kernel support for the Vector Scalar Extensions
+	  to the PowerPC processor. The kernel currently supports saving and
+	  restoring VSX registers, and turning on the 'VSX enable' bit so user
+	  processes can execute VSX instructions.
+
+	  This option is only useful if you have a processor that supports
+	  VSX (P7 and above), but does not have any effect on non-VSX
+	  CPUs (it does, however, add code to the kernel).
+
+	  If in doubt, say Y here.
+
 config SPE
 	bool "SPE Support"
 	depends on E200 || E500


* [PATCH 6/9] powerpc: Add VSX CPU feature
  2008-06-25  4:07         ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
                             ` (6 preceding siblings ...)
  2008-06-25  4:07           ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
@ 2008-06-25  4:07           ` Michael Neuling
  2008-06-25  4:07           ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
  8 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-25  4:07 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Add a VSX CPU feature.  Also add code to detect if VSX is available
from the device tree.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>

---

 arch/powerpc/kernel/prom.c     |    4 ++++
 include/asm-powerpc/cputable.h |   15 ++++++++++++++-
 2 files changed, 18 insertions(+), 1 deletion(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/prom.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
@@ -609,6 +609,10 @@ static struct feature_property {
 	{"altivec", 0, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
 	{"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+	/* Yes, this _really_ is ibm,vmx == 2 to enable VSX */
+	{"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
+#endif /* CONFIG_VSX */
 #ifdef CONFIG_PPC64
 	{"ibm,dfp", 1, 0, PPC_FEATURE_HAS_DFP},
 	{"ibm,purr", 1, CPU_FTR_PURR, 0},
Index: linux-2.6-ozlabs/include/asm-powerpc/cputable.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/cputable.h
+++ linux-2.6-ozlabs/include/asm-powerpc/cputable.h
@@ -27,6 +27,7 @@
 #define PPC_FEATURE_HAS_DFP		0x00000400
 #define PPC_FEATURE_POWER6_EXT		0x00000200
 #define PPC_FEATURE_ARCH_2_06		0x00000100
+#define PPC_FEATURE_HAS_VSX		0x00000080
 
 #define PPC_FEATURE_TRUE_LE		0x00000002
 #define PPC_FEATURE_PPC_LE		0x00000001
@@ -181,6 +182,7 @@ extern void do_feature_fixups(unsigned l
 #define CPU_FTR_DSCR			LONG_ASM_CONST(0x0002000000000000)
 #define CPU_FTR_1T_SEGMENT		LONG_ASM_CONST(0x0004000000000000)
 #define CPU_FTR_NO_SLBIE_B		LONG_ASM_CONST(0x0008000000000000)
+#define CPU_FTR_VSX			LONG_ASM_CONST(0x0010000000000000)
 
 #ifndef __ASSEMBLY__
 
@@ -199,6 +201,17 @@ extern void do_feature_fixups(unsigned l
 #define PPC_FEATURE_HAS_ALTIVEC_COMP    0
 #endif
 
+/* We only set the VSX features if the kernel was compiled with VSX
+ * support
+ */
+#ifdef CONFIG_VSX
+#define CPU_FTR_VSX_COMP	CPU_FTR_VSX
+#define PPC_FEATURE_HAS_VSX_COMP PPC_FEATURE_HAS_VSX
+#else
+#define CPU_FTR_VSX_COMP	0
+#define PPC_FEATURE_HAS_VSX_COMP    0
+#endif
+
 /* We only set the spe features if the kernel was compiled with spe
  * support
  */
@@ -399,7 +412,7 @@ extern void do_feature_fixups(unsigned l
 	    (CPU_FTRS_POWER3 | CPU_FTRS_RS64 | CPU_FTRS_POWER4 |	\
 	    CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | CPU_FTRS_POWER6 |	\
 	    CPU_FTRS_POWER7 | CPU_FTRS_CELL | CPU_FTRS_PA6T |		\
-	    CPU_FTR_1T_SEGMENT)
+	    CPU_FTR_1T_SEGMENT | CPU_FTR_VSX)
 #else
 enum {
 	CPU_FTRS_POSSIBLE =
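
The "ibm,vmx == 2" entry added to the feature_property table in prom.c above is easier to follow with a small model of a table-driven match. The sketch below is not the kernel's code: the names and flag values are hypothetical, and the matching rule (a property enables an entry when its value is at least min_value) is an assumption, since the patch only shows the new table row.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values, for illustration only. */
#define MODEL_FTR_ALTIVEC	0x1u
#define MODEL_FTR_VSX		0x2u

struct feature_property_model {
	const char *name;	/* device tree property name */
	unsigned int min_value;	/* smallest value that enables the feature */
	unsigned int feature;	/* feature bit to set */
};

static const struct feature_property_model model_props[] = {
	{ "altivec", 0, MODEL_FTR_ALTIVEC },
	{ "ibm,vmx", 1, MODEL_FTR_ALTIVEC },
	{ "ibm,vmx", 2, MODEL_FTR_VSX },
};

/* Return the feature bits implied by one property/value pair. */
static unsigned int model_features(const char *prop_name, unsigned int value)
{
	unsigned int features = 0;

	assert(prop_name != NULL);
	for (size_t i = 0; i < sizeof(model_props) / sizeof(model_props[0]); i++) {
		/* Assumed rule: the entry applies when value >= min_value. */
		if (strcmp(model_props[i].name, prop_name) == 0 &&
		    value >= model_props[i].min_value)
			features |= model_props[i].feature;
	}
	return features;
}
```

Under that assumed rule, a device tree reporting ibm,vmx = 2 sets both the AltiVec and the VSX bits, while ibm,vmx = 1 sets only AltiVec.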


* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
  2008-06-25  4:07           ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
@ 2008-06-25 14:08             ` Kumar Gala
  2008-06-25 15:34               ` Scott Wood
  0 siblings, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-25 14:08 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras

>
> Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
> ===================================================================
> --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace32.c
> +++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
> @@ -64,6 +64,11 @@ static long compat_ptrace_old(struct tas
> 	return -EPERM;
> }
>
> +/* Macros to workout the correct index for the FPR in the thread  
> struct */
> +#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
> +#define FPRHALF(i) (((i) - PT_FPR0) % 2)

Have you looked at what the compiler spits out here to make sure we  
aren't getting a divide?  Seems like we could use '& 0x1'.

> +#define FPRINDEX(i) TS_FPRWIDTH * FPRNUMBER(i) + FPRHALF(i)



>
> +
> long compat_arch_ptrace(struct task_struct *child, compat_long_t  
> request,
> 			compat_ulong_t caddr, compat_ulong_t cdata)
> {
> @@ -122,7 +127,8 @@ long compat_arch_ptrace(struct task_stru
> 			 * to be an array of unsigned int (32 bits) - the
> 			 * index passed in is based on this assumption.
> 			 */
> -			tmp = ((unsigned int *)child->thread.fpr)[index - PT_FPR0];
> +			tmp = ((unsigned int *)child->thread.fpr)
> +				[FPRINDEX(index)];
> 		}
> 		ret = put_user((unsigned int)tmp, (u32 __user *)data);
> 		break;
> @@ -162,7 +168,8 @@ long compat_arch_ptrace(struct task_stru
> 		CHECK_FULL_REGS(child->thread.regs);
> 		if (numReg >= PT_FPR0) {
> 			flush_fp_to_thread(child);
> -			tmp = ((unsigned long int *)child->thread.fpr)[numReg - PT_FPR0];
> +			tmp = ((unsigned long int *)child->thread.fpr)
> +				[FPRINDEX(numReg)];
> 		} else { /* register within PT_REGS struct */
> 			tmp = ptrace_get_reg(child, numReg);
> 		}
> @@ -217,7 +224,8 @@ long compat_arch_ptrace(struct task_stru
> 			 * to be an array of unsigned int (32 bits) - the
> 			 * index passed in is based on this assumption.
> 			 */
> -			((unsigned int *)child->thread.fpr)[index - PT_FPR0] = data;
> +			((unsigned int *)child->thread.fpr)
> +				[FPRINDEX(index)] = data;
> 			ret = 0;
> 		}
> 		break;

- k


* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
  2008-06-25 14:08             ` Kumar Gala
@ 2008-06-25 15:34               ` Scott Wood
  2008-06-25 16:12                 ` Gabriel Paubert
  0 siblings, 1 reply; 106+ messages in thread
From: Scott Wood @ 2008-06-25 15:34 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Michael Neuling, Paul Mackerras

Kumar Gala wrote:
>> +/* Macros to workout the correct index for the FPR in the thread 
>> struct */
>> +#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
>> +#define FPRHALF(i) (((i) - PT_FPR0) % 2)
> 
> Have you looked at what the compiler spits out here to make sure we 
> aren't getting a divide?  Seems like we could use '& 0x1'.

GCC's not *that* dumb.  However, you may get some unnecessary 
sign-twiddling if "i" is signed.

-Scott


* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
  2008-06-25 15:34               ` Scott Wood
@ 2008-06-25 16:12                 ` Gabriel Paubert
  2008-06-25 16:17                   ` Scott Wood
  2008-06-25 17:08                   ` Andreas Schwab
  0 siblings, 2 replies; 106+ messages in thread
From: Gabriel Paubert @ 2008-06-25 16:12 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, Michael Neuling, Paul Mackerras

On Wed, Jun 25, 2008 at 10:34:32AM -0500, Scott Wood wrote:
> Kumar Gala wrote:
> >>+/* Macros to workout the correct index for the FPR in the thread 
> >>struct */
> >>+#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
> >>+#define FPRHALF(i) (((i) - PT_FPR0) % 2)
> >
> >Have you looked at what the compiler spits out here to make sure we 
> >aren't getting a divide?  Seems like we could use '& 0x1'.
> 
> GCC's not *that* dumb.  However, you may get some unnecessary 
> sign-twiddling if "i" is signed.

Not for modulo 2, it's only an even/odd choice and GCC 
implements that efficiently IIRC. For other powers of 2,
making the left hand side unsigned helps the compiler.

The right shift OTOH might be faster if "i" is unsigned 
since signed right shifts affect the carry on PPC (I really
don't know if srawi is slower than srwi on some processors,
srwi is a form of rlwinm which is always fast).

	Gabriel


* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
  2008-06-25 16:12                 ` Gabriel Paubert
@ 2008-06-25 16:17                   ` Scott Wood
  2008-06-25 17:07                     ` Kumar Gala
  2008-06-26 10:44                     ` [PATCH 2/9] " Gabriel Paubert
  2008-06-25 17:08                   ` Andreas Schwab
  1 sibling, 2 replies; 106+ messages in thread
From: Scott Wood @ 2008-06-25 16:17 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linuxppc-dev, Michael Neuling, Paul Mackerras

Gabriel Paubert wrote:
> On Wed, Jun 25, 2008 at 10:34:32AM -0500, Scott Wood wrote:
>> Kumar Gala wrote:
>>>> +/* Macros to workout the correct index for the FPR in the thread 
>>>> struct */
>>>> +#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
>>>> +#define FPRHALF(i) (((i) - PT_FPR0) % 2)
>>> Have you looked at what the compiler spits out here to make sure we 
>>> aren't getting a divide?  Seems like we could use '& 0x1'.
>> GCC's not *that* dumb.  However, you may get some unnecessary 
>> sign-twiddling if "i" is signed.
> 
> Not for modulo 2, it's only an even/odd choice and GCC 
> implements that efficiently IIRC. For other powers of 2,
> making the left hand side unsigned helps the compiler.

 From this:

int foo(int x)
{
	return x % 2;
}

I get this with -O3:

foo:
         mr 0,3
         srawi 3,3,1
         addze 3,3
         slwi 3,3,1
         subf 3,3,0
         blr
         .size   foo, .-foo
         .ident  "GCC: (GNU) 4.1.2"

Changing it to "x & 1", or to unsigned, gives this:

foo:
         rlwinm 3,3,0,31,31
         blr
         .size   foo, .-foo
         .ident  "GCC: (GNU) 4.1.2"

Maybe newer GCCs are better?
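
For readers who do not read PowerPC assembly, the two listings above can be transliterated into C. This is a sketch under one assumption: that right-shifting a negative int is an arithmetic shift, which GCC documents but ISO C leaves implementation-defined. The function names are made up for the illustration.

```c
#include <assert.h>

/* Step-by-step equivalent of the five-instruction sequence for x % 2. */
static int rem2_like_asm(int x)
{
	int q = x >> 1;			 /* srawi 3,3,1: floor(x / 2) */
	int carry = (x < 0) && (x & 1);	 /* CA: negative input, a 1 bit shifted out */

	q += carry;			 /* addze 3,3: floor becomes round-toward-zero */
	return x - q * 2;		 /* slwi 3,3,1 and subf 3,3,0: x - 2 * (x / 2) */
}

/* Equivalent of the single rlwinm 3,3,0,31,31: keep only the lowest bit. */
static int rem2_masked(int x)
{
	return x & 1;
}

/* Hypothetical self-check: the long sequence matches C's signed remainder. */
static void rem2_check(int x)
{
	assert(rem2_like_asm(x) == x % 2);
}
```

The extra instructions exist only to get the sign of the result right for negative inputs; for non-negative x both functions return the same value.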

-Scott


* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
  2008-06-25 16:17                   ` Scott Wood
@ 2008-06-25 17:07                     ` Kumar Gala
  2008-06-26  0:09                       ` Michael Neuling
  2008-06-26 10:44                     ` [PATCH 2/9] " Gabriel Paubert
  1 sibling, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-25 17:07 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, Michael Neuling, Paul Mackerras


On Jun 25, 2008, at 11:17 AM, Scott Wood wrote:

> Gabriel Paubert wrote:
>> On Wed, Jun 25, 2008 at 10:34:32AM -0500, Scott Wood wrote:
>>> Kumar Gala wrote:
>>>>> +/* Macros to workout the correct index for the FPR in the  
>>>>> thread struct */
>>>>> +#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
>>>>> +#define FPRHALF(i) (((i) - PT_FPR0) % 2)
>>>> Have you looked at what the compiler spits out here to make sure  
>>>> we aren't getting a divide?  Seems like we could use '& 0x1'.
>>> GCC's not *that* dumb.  However, you may get some unnecessary sign- 
>>> twiddling if "i" is signed.
>> Not for modulo 2, it's only an even/odd choice and GCC implements  
>> that efficiently IIRC. For other powers of 2,
>> making the left hand side unsigned helps the compiler.
>
> From this:
>
> int foo(int x)
> {
> 	return x % 2;
> }
>
> I get this with -O3:
>
> foo:
>        mr 0,3
>        srawi 3,3,1
>        addze 3,3
>        slwi 3,3,1
>        subf 3,3,0
>        blr
>        .size   foo, .-foo
>        .ident  "GCC: (GNU) 4.1.2"
>
> Changing it to "x & 1", or to unsigned, gives this:
>
> foo:
>        rlwinm 3,3,0,31,31
>        blr
>        .size   foo, .-foo
>        .ident  "GCC: (GNU) 4.1.2"
>
> Maybe newer GCCs are better?

Nope. gcc-4.3.0 from fedora 9:

foo:
         mr 0,3
         srawi 3,3,1
         addze 3,3
         slwi 3,3,1
         subf 3,3,0
         blr

bar:
         rlwinm 3,3,0,31,31
         blr

if you make 'x' unsigned things are better.

- k


* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
  2008-06-25 16:12                 ` Gabriel Paubert
  2008-06-25 16:17                   ` Scott Wood
@ 2008-06-25 17:08                   ` Andreas Schwab
  1 sibling, 0 replies; 106+ messages in thread
From: Andreas Schwab @ 2008-06-25 17:08 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: Scott Wood, linuxppc-dev, Michael Neuling, Paul Mackerras

Gabriel Paubert <paubert@iram.es> writes:

> On Wed, Jun 25, 2008 at 10:34:32AM -0500, Scott Wood wrote:
>> Kumar Gala wrote:
>> >>+/* Macros to workout the correct index for the FPR in the thread 
>> >>struct */
>> >>+#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
>> >>+#define FPRHALF(i) (((i) - PT_FPR0) % 2)
>> >
>> >Have you looked at what the compiler spits out here to make sure we 
>> >aren't getting a divide?  Seems like we could use '& 0x1'.
>> 
>> GCC's not *that* dumb.  However, you may get some unnecessary 
>> sign-twiddling if "i" is signed.
>
> Not for modulo 2, it's only an even/odd choice

That's wrong.  -1 % 2 == -1, 1 % 2 == 1.
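
The identity above is easy to check in C, where the remainder of a signed division takes the sign of the dividend. A minimal sketch follows; the function names are hypothetical, and the mask result for negative values assumes a two's complement int.

```c
#include <assert.h>

/* Signed remainder: yields -1, 0 or 1. */
static int half_by_mod(int offset)
{
	return offset % 2;
}

/* Bit mask: yields 0 or 1 (two's complement assumed for negative inputs). */
static int half_by_mask(int offset)
{
	return offset & 1;
}

/* For non-negative offsets the two forms agree, so either one works. */
static int half_checked(int offset)
{
	assert(offset >= 0);
	assert(half_by_mod(offset) == half_by_mask(offset));
	return half_by_mask(offset);
}
```

This is why the compiler cannot simply mask the low bit for a signed operand, and why switching to '& 1' is safe only when the operand is known to be non-negative.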

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
  2008-06-25 17:07                     ` Kumar Gala
@ 2008-06-26  0:09                       ` Michael Neuling
  2008-06-26  7:07                         ` [PATCH] " Michael Neuling
  0 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-26  0:09 UTC (permalink / raw)
  To: Kumar Gala; +Cc: Scott Wood, linuxppc-dev, Paul Mackerras

In message <1DD06CDB-428E-4832-93CA-6F0404CA6692@kernel.crashing.org> you wrote:
> 
> On Jun 25, 2008, at 11:17 AM, Scott Wood wrote:
> 
> > Gabriel Paubert wrote:
> >> On Wed, Jun 25, 2008 at 10:34:32AM -0500, Scott Wood wrote:
> >>> Kumar Gala wrote:
> >>>>> +/* Macros to workout the correct index for the FPR in the  
> >>>>> thread struct */
> >>>>> +#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
> >>>>> +#define FPRHALF(i) (((i) - PT_FPR0) % 2)
> >>>> Have you looked at what the compiler spits out here to make sure  
> >>>> we aren't getting a divide?  Seems like we could use '& 0x1'.
> >>> GCC's not *that* dumb.  However, you may get some unnecessary sign- 
> >>> twiddling if "i" is signed.
> >> Not for modulo 2, it's only an even/odd choice and GCC implements  
> >> that efficiently IIRC. For other powers of 2,
> >> making the left hand side unsigned helps the compiler.
> >
> > From this:
> >
> > int foo(int x)
> > {
> > 	return x % 2;
> > }
> >
> > I get this with -O3:
> >
> > foo:
> >        mr 0,3
> >        srawi 3,3,1
> >        addze 3,3
> >        slwi 3,3,1
> >        subf 3,3,0
> >        blr
> >        .size   foo, .-foo
> >        .ident  "GCC: (GNU) 4.1.2"
> >
> > Changing it to "x & 1", or to unsigned, gives this:
> >
> > foo:
> >        rlwinm 3,3,0,31,31
> >        blr
> >        .size   foo, .-foo
> >        .ident  "GCC: (GNU) 4.1.2"
> >
> > Maybe newer GCCs are better?
> 
> Nope. gcc-4.3.0 from fedora 9:
> 
> foo:
>          mr 0,3
>          srawi 3,3,1
>          addze 3,3
>          slwi 3,3,1
>          subf 3,3,0
>          blr
> 
> bar:
>          rlwinm 3,3,0,31,31
>          blr
> 
> if you make 'x' unsigned things are better.

I've changed it to '& 0x1', which compiles to something better here.

Mikey


* [PATCH] powerpc: Add macros to access floating point registers in thread_struct.
  2008-06-26  0:09                       ` Michael Neuling
@ 2008-06-26  7:07                         ` Michael Neuling
  0 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-26  7:07 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

We are going to change where the floating point registers are stored
in the thread_struct, so in preparation add some macros to access the
floating point registers.  Update all code to use these new macros.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
Changed '% 2' to '& 1', as suggested by Kumar.

---

 arch/powerpc/kernel/align.c      |    6 ++--
 arch/powerpc/kernel/process.c    |    2 -
 arch/powerpc/kernel/ptrace.c     |   10 ++++--
 arch/powerpc/kernel/ptrace32.c   |   14 +++++++--
 arch/powerpc/kernel/softemu8xx.c |    4 +-
 arch/powerpc/math-emu/math.c     |   56 +++++++++++++++++++--------------------
 include/asm-powerpc/ppc_asm.h    |    5 ++-
 include/asm-powerpc/processor.h  |    4 ++
 8 files changed, 58 insertions(+), 43 deletions(-)

Index: linux-2.6-ozlabs/arch/powerpc/kernel/align.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/align.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/align.c
@@ -366,7 +366,7 @@ static int emulate_multiple(struct pt_re
 static int emulate_fp_pair(struct pt_regs *regs, unsigned char __user *addr,
 			   unsigned int reg, unsigned int flags)
 {
-	char *ptr = (char *) &current->thread.fpr[reg];
+	char *ptr = (char *) &current->thread.TS_FPR(reg);
 	int i, ret;
 
 	if (!(flags & F))
@@ -784,7 +784,7 @@ int fix_alignment(struct pt_regs *regs)
 				return -EFAULT;
 		}
 	} else if (flags & F) {
-		data.dd = current->thread.fpr[reg];
+		data.dd = current->thread.TS_FPR(reg);
 		if (flags & S) {
 			/* Single-precision FP store requires conversion... */
 #ifdef CONFIG_PPC_FPU
@@ -862,7 +862,7 @@ int fix_alignment(struct pt_regs *regs)
 		if (unlikely(ret))
 			return -EFAULT;
 	} else if (flags & F)
-		current->thread.fpr[reg] = data.dd;
+		current->thread.TS_FPR(reg) = data.dd;
 	else
 		regs->gpr[reg] = data.ll;
 
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -110,7 +110,7 @@ int dump_task_fpu(struct task_struct *ts
 		return 0;
 	flush_fp_to_thread(current);
 
-	memcpy(fpregs, &tsk->thread.fpr[0], sizeof(*fpregs));
+	memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
 
 	return 1;
 }
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -218,7 +218,7 @@ static int fpr_get(struct task_struct *t
 	flush_fp_to_thread(target);
 
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
-		     offsetof(struct thread_struct, fpr[32]));
+		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
 				   &target->thread.fpr, 0, -1);
@@ -231,7 +231,7 @@ static int fpr_set(struct task_struct *t
 	flush_fp_to_thread(target);
 
 	BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
-		     offsetof(struct thread_struct, fpr[32]));
+		     offsetof(struct thread_struct, TS_FPR(32)));
 
 	return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
 				  &target->thread.fpr, 0, -1);
@@ -728,7 +728,8 @@ long arch_ptrace(struct task_struct *chi
 			tmp = ptrace_get_reg(child, (int) index);
 		} else {
 			flush_fp_to_thread(child);
-			tmp = ((unsigned long *)child->thread.fpr)[index - PT_FPR0];
+			tmp = ((unsigned long *)child->thread.fpr)
+				[TS_FPRWIDTH * (index - PT_FPR0)];
 		}
 		ret = put_user(tmp,(unsigned long __user *) data);
 		break;
@@ -755,7 +756,8 @@ long arch_ptrace(struct task_struct *chi
 			ret = ptrace_put_reg(child, index, data);
 		} else {
 			flush_fp_to_thread(child);
-			((unsigned long *)child->thread.fpr)[index - PT_FPR0] = data;
+			((unsigned long *)child->thread.fpr)
+				[TS_FPRWIDTH * (index - PT_FPR0)] = data;
 			ret = 0;
 		}
 		break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
@@ -64,6 +64,11 @@ static long compat_ptrace_old(struct tas
 	return -EPERM;
 }
 
+/* Macros to work out the correct index for the FPR in the thread struct */
+#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
+#define FPRHALF(i) (((i) - PT_FPR0) & 1)
+#define FPRINDEX(i) (TS_FPRWIDTH * FPRNUMBER(i) * 2 + FPRHALF(i))
+
 long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
 			compat_ulong_t caddr, compat_ulong_t cdata)
 {
@@ -122,7 +127,8 @@ long compat_arch_ptrace(struct task_stru
 			 * to be an array of unsigned int (32 bits) - the
 			 * index passed in is based on this assumption.
 			 */
-			tmp = ((unsigned int *)child->thread.fpr)[index - PT_FPR0];
+			tmp = ((unsigned int *)child->thread.fpr)
+				[FPRINDEX(index)];
 		}
 		ret = put_user((unsigned int)tmp, (u32 __user *)data);
 		break;
@@ -162,7 +168,8 @@ long compat_arch_ptrace(struct task_stru
 		CHECK_FULL_REGS(child->thread.regs);
 		if (numReg >= PT_FPR0) {
 			flush_fp_to_thread(child);
-			tmp = ((unsigned long int *)child->thread.fpr)[numReg - PT_FPR0];
+			tmp = ((unsigned long int *)child->thread.fpr)
+				[TS_FPRWIDTH * (numReg - PT_FPR0)];
 		} else { /* register within PT_REGS struct */
 			tmp = ptrace_get_reg(child, numReg);
 		} 
@@ -217,7 +224,8 @@ long compat_arch_ptrace(struct task_stru
 			 * to be an array of unsigned int (32 bits) - the
 			 * index passed in is based on this assumption.
 			 */
-			((unsigned int *)child->thread.fpr)[index - PT_FPR0] = data;
+			((unsigned int *)child->thread.fpr)
+				[FPRINDEX(index)] = data;
 			ret = 0;
 		}
 		break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/softemu8xx.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
@@ -124,7 +124,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
 	disp = instword & 0xffff;
 
 	ea = (u32 *)(regs->gpr[idxreg] + disp);
-	ip = (u32 *)&current->thread.fpr[flreg];
+	ip = (u32 *)&current->thread.TS_FPR(flreg);
 
 	switch ( inst )
 	{
@@ -168,7 +168,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
 		break;
 	case FMR:
 		/* assume this is a fp move -- Cort */
-		memcpy(ip, &current->thread.fpr[(instword>>11)&0x1f],
+		memcpy(ip, &current->thread.TS_FPR((instword>>11)&0x1f),
 		       sizeof(double));
 		break;
 	default:
Index: linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/math-emu/math.c
+++ linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
@@ -230,14 +230,14 @@ do_mathemu(struct pt_regs *regs)
 	case LFD:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		lfd(op0, op1, op2, op3);
 		break;
 	case LFDU:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		lfd(op0, op1, op2, op3);
 		regs->gpr[idx] = (unsigned long)op1;
@@ -245,21 +245,21 @@ do_mathemu(struct pt_regs *regs)
 	case STFD:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		stfd(op0, op1, op2, op3);
 		break;
 	case STFDU:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		stfd(op0, op1, op2, op3);
 		regs->gpr[idx] = (unsigned long)op1;
 		break;
 	case OP63:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		fmr(op0, op1, op2, op3);
 		break;
 	default:
@@ -356,28 +356,28 @@ do_mathemu(struct pt_regs *regs)
 
 	switch (type) {
 	case AB:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	case AC:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op2 = (void *)&current->thread.fpr[(insn >>  6) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op2 = (void *)&current->thread.TS_FPR((insn >>  6) & 0x1f);
 		break;
 
 	case ABC:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
-		op3 = (void *)&current->thread.fpr[(insn >>  6) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
+		op3 = (void *)&current->thread.TS_FPR((insn >>  6) & 0x1f);
 		break;
 
 	case D:
 		idx = (insn >> 16) & 0x1f;
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
 		break;
 
@@ -387,27 +387,27 @@ do_mathemu(struct pt_regs *regs)
 			goto illegal;
 
 		sdisp = (insn & 0xffff);
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)(regs->gpr[idx] + sdisp);
 		break;
 
 	case X:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		break;
 
 	case XA:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
 		break;
 
 	case XB:
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+		op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	case XE:
 		idx = (insn >> 16) & 0x1f;
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		if (!idx) {
 			if (((insn >> 1) & 0x3ff) == STFIWX)
 				op1 = (void *)(regs->gpr[(insn >> 11) & 0x1f]);
@@ -421,7 +421,7 @@ do_mathemu(struct pt_regs *regs)
 
 	case XEU:
 		idx = (insn >> 16) & 0x1f;
-		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+		op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
 		op1 = (void *)((idx ? regs->gpr[idx] : 0)
 				+ regs->gpr[(insn >> 11) & 0x1f]);
 		break;
@@ -429,8 +429,8 @@ do_mathemu(struct pt_regs *regs)
 	case XCR:
 		op0 = (void *)&regs->ccr;
 		op1 = (void *)((insn >> 23) & 0x7);
-		op2 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
-		op3 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op2 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+		op3 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	case XCRL:
@@ -450,7 +450,7 @@ do_mathemu(struct pt_regs *regs)
 
 	case XFLB:
 		op0 = (void *)((insn >> 17) & 0xff);
-		op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+		op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
 		break;
 
 	default:
Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -6,6 +6,7 @@
 
 #include <linux/stringify.h>
 #include <asm/asm-compat.h>
+#include <asm/processor.h>
 
 #ifndef __ASSEMBLY__
 #error __FILE__ should only be used in assembler files
@@ -83,13 +84,13 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);				
 #define REST_8GPRS(n, base)	REST_4GPRS(n, base); REST_4GPRS(n+4, base)
 #define REST_10GPRS(n, base)	REST_8GPRS(n, base); REST_2GPRS(n+8, base)
 
-#define SAVE_FPR(n, base)	stfd	n,THREAD_FPR0+8*(n)(base)
+#define SAVE_FPR(n, base)	stfd	n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
 #define SAVE_2FPRS(n, base)	SAVE_FPR(n, base); SAVE_FPR(n+1, base)
 #define SAVE_4FPRS(n, base)	SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
 #define SAVE_8FPRS(n, base)	SAVE_4FPRS(n, base); SAVE_4FPRS(n+4, base)
 #define SAVE_16FPRS(n, base)	SAVE_8FPRS(n, base); SAVE_8FPRS(n+8, base)
 #define SAVE_32FPRS(n, base)	SAVE_16FPRS(n, base); SAVE_16FPRS(n+16, base)
-#define REST_FPR(n, base)	lfd	n,THREAD_FPR0+8*(n)(base)
+#define REST_FPR(n, base)	lfd	n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
 #define REST_2FPRS(n, base)	REST_FPR(n, base); REST_FPR(n+1, base)
 #define REST_4FPRS(n, base)	REST_2FPRS(n, base); REST_2FPRS(n+2, base)
 #define REST_8FPRS(n, base)	REST_4FPRS(n, base); REST_4FPRS(n+4, base)
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -12,6 +12,8 @@
 
 #include <asm/reg.h>
 
+#define TS_FPRWIDTH 1
+
 #ifndef __ASSEMBLY__
 #include <linux/compiler.h>
 #include <asm/ptrace.h>
@@ -136,6 +138,8 @@ typedef struct {
 	unsigned long seg;
 } mm_segment_t;
 
+#define TS_FPR(i) fpr[i]
+
 struct thread_struct {
 	unsigned long	ksp;		/* Kernel stack pointer */
 	unsigned long	ksp_limit;	/* if ksp <= ksp_limit stack overflow */
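
As a rough sketch of the index arithmetic used in the compat ptrace path (not part of the patch itself): a 32-bit ptrace index is split into a register number and a half, then mapped to a 32-bit word offset in the FPR area. The function name and the value 48 for PT_FPR0 are assumptions for illustration, and the sketch assumes each register occupies `width` 64-bit slots (that is, 2 * width 32-bit words), with `width` standing in for TS_FPRWIDTH.

```c
#include <assert.h>

/* Assumed value for illustration; the kernel defines PT_FPR0 in its
 * ptrace header. */
#define PT_FPR0 48

/* Hypothetical helper: map a 32-bit ptrace register index to a 32-bit
 * word offset in the FPR area.
 *   number = which FPR (two 32-bit halves per register)
 *   half   = 0 or 1, which half of the 64-bit FPR
 *   width  = 64-bit slots per register (stand-in for TS_FPRWIDTH) */
unsigned int fpr_word_index(unsigned int i, unsigned int width)
{
	unsigned int number = (i - PT_FPR0) >> 1;
	unsigned int half = (i - PT_FPR0) & 1;

	/* Each register spans 2 * width 32-bit words. */
	return width * number * 2 + half;
}

/* Self-check: with width 1 the mapping is the identity on (i - PT_FPR0);
 * with width 2 every register starts four words after the previous one. */
void check_fpr_word_index(void)
{
	assert(fpr_word_index(PT_FPR0 + 5, 1) == 5);
	assert(fpr_word_index(PT_FPR0 + 5, 2) == 9);
}
```

With a width of 1 the layout is unchanged, which is why the macros can be introduced before the thread_struct layout actually changes.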


* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
  2008-06-25 16:17                   ` Scott Wood
  2008-06-25 17:07                     ` Kumar Gala
@ 2008-06-26 10:44                     ` Gabriel Paubert
  1 sibling, 0 replies; 106+ messages in thread
From: Gabriel Paubert @ 2008-06-26 10:44 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, Michael Neuling, Paul Mackerras

On Wed, Jun 25, 2008 at 11:17:45AM -0500, Scott Wood wrote:
> Gabriel Paubert wrote:
> >On Wed, Jun 25, 2008 at 10:34:32AM -0500, Scott Wood wrote:
> >>Kumar Gala wrote:
> >>>>+/* Macros to workout the correct index for the FPR in the thread 
> >>>>struct */
> >>>>+#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
> >>>>+#define FPRHALF(i) (((i) - PT_FPR0) % 2)
> >>>Have you looked at what the compiler spits out here to make sure we 
> >>>aren't getting a divide?  Seems like we could use '& 0x1'.
> >>GCC's not *that* dumb.  However, you may get some unnecessary 
> >>sign-twiddling if "i" is signed.
> >
> >Not for modulo 2, it's only an even/odd choice and GCC 
> >implements that efficiently IIRC. For other powers of 2,
> >making the left hand side unsigned helps the compiler.
> 
> From this:
> 
> int foo(int x)
> {
> 	return x % 2;
> }
> 
> I get this with -O3:
> 
> foo:
>         mr 0,3
>         srawi 3,3,1
>         addze 3,3
>         slwi 3,3,1
>         subf 3,3,0
>         blr
>         .size   foo, .-foo
>         .ident  "GCC: (GNU) 4.1.2"
> 

Indeed. Signed modulo results can be negative...

There are probably better ways to implement this case
on PPC, for example:

	rlwinm tmp,input,4,27,28 ; shift amount from LSB and MSB: 16*LSB + 8*MSB
	lis result,0xff01
	srw result,result,tmp
	; low byte of result is now 0x00 for even, 0x01 for odd positive,
	; and 0xff for odd negative
	extsb result,result

No carry and a shorter dependency chain (srw seems to be slow
on Cell, but addze may be worse).
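
For readers who do not read PowerPC assembly, the sequence above can be modelled in C roughly as follows. This is only a sketch of the same idea, assuming 32-bit two's-complement integers; the function name is made up, and nothing here claims that a compiler would emit these exact instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the rlwinm/lis/srw/extsb sequence. */
int mod2_shift_trick(int32_t x)
{
	uint32_t u = (uint32_t)x;

	/* rlwinm tmp,input,4,27,28: rotate left by 4 and keep mask 0x18,
	 * giving 16 * (low bit) + 8 * (sign bit). */
	uint32_t tmp = ((u << 4) | (u >> 28)) & 0x18u;

	/* lis result,0xff01 ; srw result,result,tmp */
	uint32_t result = 0xff010000u >> tmp;

	/* extsb result,result: sign-extend the low byte. */
	int low = (int)(result & 0xffu);
	return (low ^ 0x80) - 0x80;
}

/* Compare against the signed remainder over a small range. */
void check_mod2_shift_trick(void)
{
	int32_t x;

	for (x = -1000; x <= 1000; x++)
		assert(mod2_shift_trick(x) == x % 2);
}
```

The four possible shift amounts (0, 8, 16, 24) select the bytes 0x00, 0x00, 0x01 and 0xff of the constant 0xff010000, which is how a single shift covers the even, odd-positive and odd-negative cases.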


> Changing it to "x & 1", or to unsigned, gives this:
> 
> foo:
>         rlwinm 3,3,0,31,31
>         blr
>         .size   foo, .-foo
>         .ident  "GCC: (GNU) 4.1.2"
> 
> Maybe newer GCCs are better?

Nope, but unsigned is often better for the right shift.

	Gabriel


end of thread, other threads:[~2008-06-26 11:21 UTC | newest]

Thread overview: 106+ messages
2008-06-18  0:47 [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
2008-06-18  0:47 ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
2008-06-18  0:47 ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
2008-06-18  0:47 ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
2008-06-18 19:35   ` Kumar Gala
2008-06-18 22:58     ` Paul Mackerras
2008-06-19  4:13       ` Kumar Gala
2008-06-19  4:30         ` Michael Neuling
2008-06-19  4:22   ` Kumar Gala
2008-06-19  4:35     ` Michael Neuling
2008-06-19  4:58       ` Kumar Gala
2008-06-19  5:37         ` Michael Neuling
2008-06-19  5:47           ` Kumar Gala
2008-06-19  6:01             ` Michael Neuling
2008-06-19  6:10               ` Kumar Gala
2008-06-19  9:33                 ` Benjamin Herrenschmidt
2008-06-19 13:24                   ` Kumar Gala
2008-06-18  0:47 ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
2008-06-18  0:47 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
2008-06-18 14:53   ` Kumar Gala
2008-06-18 23:55     ` Michael Neuling
2008-06-18  0:47 ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
2008-06-18  0:47 ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
2008-06-18  0:47 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
2008-06-18 16:28   ` Joel Schopp
2008-06-19  6:51   ` David Woodhouse
2008-06-19  7:00     ` Michael Neuling
2008-06-18  0:47 ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
2008-06-18 13:05 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Kumar Gala
2008-06-18 23:54   ` Michael Neuling
2008-06-20  4:13 ` Michael Neuling
2008-06-20  4:13   ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
2008-06-20  6:39     ` Kumar Gala
2008-06-22 11:29       ` Michael Neuling
2008-06-20  4:13   ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
2008-06-20  6:35     ` Kumar Gala
2008-06-20  4:13   ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
2008-06-20  6:44     ` Kumar Gala
2008-06-20  4:13   ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
2008-06-20  4:13   ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
2008-06-20  4:13   ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
2008-06-20  4:13   ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
2008-06-20  4:13   ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
2008-06-20  4:13   ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
2008-06-20  6:37   ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Kumar Gala
2008-06-20  8:15     ` Michael Neuling
2008-06-23  5:31   ` Michael Neuling
2008-06-23  5:31     ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
2008-06-23  5:31     ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
2008-06-23  5:31     ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
2008-06-23  5:31     ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
2008-06-23  5:31     ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
2008-06-23  5:31     ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
2008-06-23  5:31     ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
2008-06-23  5:31     ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
2008-06-23  5:31     ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
2008-06-23  7:38     ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
2008-06-23  7:38       ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
2008-06-23  7:38       ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
2008-06-23 14:46         ` Kumar Gala
2008-06-23  7:38       ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
2008-06-23  7:38       ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
2008-06-23  7:38       ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
2008-06-23  7:38       ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
2008-06-23  7:38       ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
2008-06-23  7:38       ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
2008-06-23  7:38       ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
2008-06-24 10:57       ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
2008-06-24 10:57         ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
2008-06-24 10:57         ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
2008-06-24 13:47           ` Kumar Gala
2008-06-24 10:57         ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
2008-06-24 14:07           ` Kumar Gala
2008-06-24 16:33             ` Segher Boessenkool
2008-06-25  0:25             ` Michael Neuling
2008-06-24 10:57         ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
2008-06-24 10:57         ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
2008-06-24 10:57         ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
2008-06-24 14:01           ` Kumar Gala
2008-06-24 10:57         ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
2008-06-24 14:19           ` Kumar Gala
2008-06-24 10:57         ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
2008-06-24 14:19           ` Kumar Gala
2008-06-24 10:57         ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
2008-06-24 14:06           ` Kumar Gala
2008-06-25  0:06             ` Michael Neuling
2008-06-25  2:19               ` Kumar Gala
2008-06-25  4:07         ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
2008-06-25  4:07           ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
2008-06-25  4:07           ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
2008-06-25 14:08             ` Kumar Gala
2008-06-25 15:34               ` Scott Wood
2008-06-25 16:12                 ` Gabriel Paubert
2008-06-25 16:17                   ` Scott Wood
2008-06-25 17:07                     ` Kumar Gala
2008-06-26  0:09                       ` Michael Neuling
2008-06-26  7:07                         ` [PATCH] " Michael Neuling
2008-06-26 10:44                     ` [PATCH 2/9] " Gabriel Paubert
2008-06-25 17:08                   ` Andreas Schwab
2008-06-25  4:07           ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
2008-06-25  4:07           ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
2008-06-25  4:07           ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
2008-06-25  4:07           ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
2008-06-25  4:07           ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
2008-06-25  4:07           ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
2008-06-25  4:07           ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
