* [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX.
@ 2008-06-18 0:47 Michael Neuling
From: Michael Neuling @ 2008-06-18 0:47 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
The following set of patches adds Vector Scalar Extension (VSX)
support for POWER7. It includes context switch, ptrace and signal support.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
This series is on top of the POWER7 cputable entry patch.
Paulus: please consider for your 2.6.27 tree.
* [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
From: Michael Neuling @ 2008-06-18 0:47 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
If we set the SPE MSR bit in save_user_regs we can blow away the VEC
bit, since both updates are stored to the same saved-MSR word. This
will never happen in practice, but it looks bad.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/signal_32.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -336,6 +336,8 @@ struct rt_sigframe {
static int save_user_regs(struct pt_regs *regs, struct mcontext __user *frame,
int sigret)
{
+ unsigned long msr = regs->msr;
+
/* Make sure floating point registers are stored in regs */
flush_fp_to_thread(current);
@@ -354,8 +356,7 @@ static int save_user_regs(struct pt_regs
return 1;
/* set MSR_VEC in the saved MSR value to indicate that
frame->mc_vregs contains valid data */
- if (__put_user(regs->msr | MSR_VEC, &frame->mc_gregs[PT_MSR]))
- return 1;
+ msr |= MSR_VEC;
}
/* else assert((regs->msr & MSR_VEC) == 0) */
@@ -377,8 +378,7 @@ static int save_user_regs(struct pt_regs
return 1;
/* set MSR_SPE in the saved MSR value to indicate that
frame->mc_vregs contains valid data */
- if (__put_user(regs->msr | MSR_SPE, &frame->mc_gregs[PT_MSR]))
- return 1;
+ msr |= MSR_SPE;
}
/* else assert((regs->msr & MSR_SPE) == 0) */
@@ -387,6 +387,8 @@ static int save_user_regs(struct pt_regs
return 1;
#endif /* CONFIG_SPE */
+ if (__put_user(msr, &frame->mc_gregs[PT_MSR]))
+ return 1;
if (sigret) {
/* Set up the sigreturn trampoline: li r0,sigret; sc */
if (__put_user(0x38000000UL + sigret, &frame->tramp[0])
* [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
From: Michael Neuling @ 2008-06-18 0:47 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
We are going to change where the floating point registers are stored
in the thread_struct, so in preparation add some macros to access the
floating point registers. Update all code to use these new macros.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/align.c | 6 ++--
arch/powerpc/kernel/asm-offsets.c | 2 -
arch/powerpc/kernel/process.c | 5 ++-
arch/powerpc/kernel/ptrace.c | 14 +++++----
arch/powerpc/kernel/ptrace32.c | 9 ++++--
arch/powerpc/kernel/signal_32.c | 6 ++--
arch/powerpc/kernel/signal_64.c | 13 +++++---
arch/powerpc/kernel/softemu8xx.c | 4 +-
arch/powerpc/math-emu/math.c | 56 +++++++++++++++++++-------------------
include/asm-powerpc/ppc_asm.h | 5 ++-
include/asm-powerpc/processor.h | 7 ++++
11 files changed, 71 insertions(+), 56 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/align.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/align.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/align.c
@@ -366,7 +366,7 @@ static int emulate_multiple(struct pt_re
static int emulate_fp_pair(struct pt_regs *regs, unsigned char __user *addr,
unsigned int reg, unsigned int flags)
{
- char *ptr = (char *) &current->thread.fpr[reg];
+ char *ptr = (char *) &current->thread.TS_FPR(reg);
int i, ret;
if (!(flags & F))
@@ -784,7 +784,7 @@ int fix_alignment(struct pt_regs *regs)
return -EFAULT;
}
} else if (flags & F) {
- data.dd = current->thread.fpr[reg];
+ data.dd = current->thread.TS_FPR(reg);
if (flags & S) {
/* Single-precision FP store requires conversion... */
#ifdef CONFIG_PPC_FPU
@@ -862,7 +862,7 @@ int fix_alignment(struct pt_regs *regs)
if (unlikely(ret))
return -EFAULT;
} else if (flags & F)
- current->thread.fpr[reg] = data.dd;
+ current->thread.TS_FPR(reg) = data.dd;
else
regs->gpr[reg] = data.ll;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
@@ -66,7 +66,7 @@ int main(void)
DEFINE(KSP_LIMIT, offsetof(struct thread_struct, ksp_limit));
DEFINE(PT_REGS, offsetof(struct thread_struct, regs));
DEFINE(THREAD_FPEXC_MODE, offsetof(struct thread_struct, fpexc_mode));
- DEFINE(THREAD_FPR0, offsetof(struct thread_struct, fpr[0]));
+ DEFINE(THREAD_FPR0, offsetof(struct thread_struct, TS_FPR(0)));
DEFINE(THREAD_FPSCR, offsetof(struct thread_struct, fpscr));
#ifdef CONFIG_ALTIVEC
DEFINE(THREAD_VR0, offsetof(struct thread_struct, vr[0]));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -110,7 +110,7 @@ int dump_task_fpu(struct task_struct *ts
return 0;
flush_fp_to_thread(current);
- memcpy(fpregs, &tsk->thread.fpr[0], sizeof(*fpregs));
+ memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
return 1;
}
@@ -689,7 +689,8 @@ void start_thread(struct pt_regs *regs,
#endif
discard_lazy_cpu_state();
- memset(current->thread.fpr, 0, sizeof(current->thread.fpr));
+ memset(current->thread.TS_FPRSTART, 0,
+ sizeof(current->thread.TS_FPRSTART));
current->thread.fpscr.val = 0;
#ifdef CONFIG_ALTIVEC
memset(current->thread.vr, 0, sizeof(current->thread.vr));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -218,10 +218,10 @@ static int fpr_get(struct task_struct *t
flush_fp_to_thread(target);
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
- offsetof(struct thread_struct, fpr[32]));
+ offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
- &target->thread.fpr, 0, -1);
+ &target->thread.TS_FPRSTART, 0, -1);
}
static int fpr_set(struct task_struct *target, const struct user_regset *regset,
@@ -231,10 +231,10 @@ static int fpr_set(struct task_struct *t
flush_fp_to_thread(target);
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
- offsetof(struct thread_struct, fpr[32]));
+ offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
- &target->thread.fpr, 0, -1);
+ &target->thread.TS_FPRSTART, 0, -1);
}
@@ -728,7 +728,8 @@ long arch_ptrace(struct task_struct *chi
tmp = ptrace_get_reg(child, (int) index);
} else {
flush_fp_to_thread(child);
- tmp = ((unsigned long *)child->thread.fpr)[index - PT_FPR0];
+ tmp = ((unsigned long *)child->thread.TS_FPRSTART)
+ [TS_FPRSPACING * (index - PT_FPR0)];
}
ret = put_user(tmp,(unsigned long __user *) data);
break;
@@ -755,7 +756,8 @@ long arch_ptrace(struct task_struct *chi
ret = ptrace_put_reg(child, index, data);
} else {
flush_fp_to_thread(child);
- ((unsigned long *)child->thread.fpr)[index - PT_FPR0] = data;
+ ((unsigned long *)child->thread.TS_FPRSTART)
+ [TS_FPRSPACING * (index - PT_FPR0)] = data;
ret = 0;
}
break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
@@ -122,7 +122,8 @@ long compat_arch_ptrace(struct task_stru
* to be an array of unsigned int (32 bits) - the
* index passed in is based on this assumption.
*/
- tmp = ((unsigned int *)child->thread.fpr)[index - PT_FPR0];
+ tmp = ((unsigned int *)child->thread.TS_FPRSTART)
+ [TS_FPRSPACING * (index - PT_FPR0)];
}
ret = put_user((unsigned int)tmp, (u32 __user *)data);
break;
@@ -162,7 +163,8 @@ long compat_arch_ptrace(struct task_stru
CHECK_FULL_REGS(child->thread.regs);
if (numReg >= PT_FPR0) {
flush_fp_to_thread(child);
- tmp = ((unsigned long int *)child->thread.fpr)[numReg - PT_FPR0];
+ tmp = ((unsigned long int *)child->thread.TS_FPRSTART)
+ [TS_FPRSPACING * (numReg - PT_FPR0)];
} else { /* register within PT_REGS struct */
tmp = ptrace_get_reg(child, numReg);
}
@@ -217,7 +219,8 @@ long compat_arch_ptrace(struct task_stru
* to be an array of unsigned int (32 bits) - the
* index passed in is based on this assumption.
*/
- ((unsigned int *)child->thread.fpr)[index - PT_FPR0] = data;
+ ((unsigned int *)child->thread.TS_FPRSTART)
+ [TS_FPRSPACING * (index - PT_FPR0)] = data;
ret = 0;
}
break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -343,7 +343,7 @@ static int save_user_regs(struct pt_regs
/* save general and floating-point registers */
if (save_general_regs(regs, frame) ||
- __copy_to_user(&frame->mc_fregs, current->thread.fpr,
+ __copy_to_user(&frame->mc_fregs, current->thread.TS_FPRSTART,
ELF_NFPREG * sizeof(double)))
return 1;
@@ -431,7 +431,7 @@ static long restore_user_regs(struct pt_
/*
* Do this before updating the thread state in
- * current->thread.fpr/vr/evr. That way, if we get preempted
+ * current->thread.FPR/vr/evr. That way, if we get preempted
* and another task grabs the FPU/Altivec/SPE, it won't be
* tempted to save the current CPU state into the thread_struct
* and corrupt what we are writing there.
@@ -441,7 +441,7 @@ static long restore_user_regs(struct pt_
/* force the process to reload the FP registers from
current->thread when it next does FP instructions */
regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
- if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
+ if (__copy_from_user(current->thread.TS_FPRSTART, &sr->mc_fregs,
sizeof(sr->mc_fregs)))
return 1;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -116,7 +116,8 @@ static long setup_sigcontext(struct sigc
WARN_ON(!FULL_REGS(regs));
err |= __copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE);
err |= __put_user(msr, &sc->gp_regs[PT_MSR]);
- err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
+ err |= __copy_to_user(&sc->fp_regs, &current->thread.TS_FPRSTART,
+ FP_REGS_SIZE);
err |= __put_user(signr, &sc->signal);
err |= __put_user(handler, &sc->handler);
if (set != NULL)
@@ -168,7 +169,7 @@ static long restore_sigcontext(struct pt
/*
* Do this before updating the thread state in
- * current->thread.fpr/vr. That way, if we get preempted
+ * current->thread.TS_FPR/vr. That way, if we get preempted
* and another task grabs the FPU/Altivec, it won't be
* tempted to save the current CPU state into the thread_struct
* and corrupt what we are writing there.
@@ -177,12 +178,14 @@ static long restore_sigcontext(struct pt
/*
* Force reload of FP/VEC.
- * This has to be done before copying stuff into current->thread.fpr/vr
- * for the reasons explained in the previous comment.
+ * This has to be done before copying stuff into
+ * current->thread.TS_FPR/vr for the reasons explained in the
+ * previous comment.
*/
regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
- err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
+ err |= __copy_from_user(&current->thread.TS_FPRSTART, &sc->fp_regs,
+ FP_REGS_SIZE);
#ifdef CONFIG_ALTIVEC
err |= __get_user(v_regs, &sc->v_regs);
Index: linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/softemu8xx.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
@@ -124,7 +124,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
disp = instword & 0xffff;
ea = (u32 *)(regs->gpr[idxreg] + disp);
- ip = (u32 *)&current->thread.fpr[flreg];
+ ip = (u32 *)&current->thread.TS_FPR(flreg);
switch ( inst )
{
@@ -168,7 +168,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
break;
case FMR:
/* assume this is a fp move -- Cort */
- memcpy(ip, &current->thread.fpr[(instword>>11)&0x1f],
+ memcpy(ip, &current->thread.TS_FPR((instword>>11)&0x1f),
sizeof(double));
break;
default:
Index: linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/math-emu/math.c
+++ linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
@@ -230,14 +230,14 @@ do_mathemu(struct pt_regs *regs)
case LFD:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
lfd(op0, op1, op2, op3);
break;
case LFDU:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
lfd(op0, op1, op2, op3);
regs->gpr[idx] = (unsigned long)op1;
@@ -245,21 +245,21 @@ do_mathemu(struct pt_regs *regs)
case STFD:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
stfd(op0, op1, op2, op3);
break;
case STFDU:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
stfd(op0, op1, op2, op3);
regs->gpr[idx] = (unsigned long)op1;
break;
case OP63:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
fmr(op0, op1, op2, op3);
break;
default:
@@ -356,28 +356,28 @@ do_mathemu(struct pt_regs *regs)
switch (type) {
case AB:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
case AC:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op2 = (void *)&current->thread.fpr[(insn >> 6) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op2 = (void *)&current->thread.TS_FPR((insn >> 6) & 0x1f);
break;
case ABC:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
- op3 = (void *)&current->thread.fpr[(insn >> 6) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
+ op3 = (void *)&current->thread.TS_FPR((insn >> 6) & 0x1f);
break;
case D:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
break;
@@ -387,27 +387,27 @@ do_mathemu(struct pt_regs *regs)
goto illegal;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)(regs->gpr[idx] + sdisp);
break;
case X:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
break;
case XA:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
break;
case XB:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
case XE:
idx = (insn >> 16) & 0x1f;
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
if (!idx) {
if (((insn >> 1) & 0x3ff) == STFIWX)
op1 = (void *)(regs->gpr[(insn >> 11) & 0x1f]);
@@ -421,7 +421,7 @@ do_mathemu(struct pt_regs *regs)
case XEU:
idx = (insn >> 16) & 0x1f;
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0)
+ regs->gpr[(insn >> 11) & 0x1f]);
break;
@@ -429,8 +429,8 @@ do_mathemu(struct pt_regs *regs)
case XCR:
op0 = (void *)&regs->ccr;
op1 = (void *)((insn >> 23) & 0x7);
- op2 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op3 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op2 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op3 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
case XCRL:
@@ -450,7 +450,7 @@ do_mathemu(struct pt_regs *regs)
case XFLB:
op0 = (void *)((insn >> 17) & 0xff);
- op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
default:
Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -6,6 +6,7 @@
#include <linux/stringify.h>
#include <asm/asm-compat.h>
+#include <asm/processor.h>
#ifndef __ASSEMBLY__
#error __FILE__ should only be used in assembler files
@@ -83,13 +84,13 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
#define REST_8GPRS(n, base) REST_4GPRS(n, base); REST_4GPRS(n+4, base)
#define REST_10GPRS(n, base) REST_8GPRS(n, base); REST_2GPRS(n+8, base)
-#define SAVE_FPR(n, base) stfd n,THREAD_FPR0+8*(n)(base)
+#define SAVE_FPR(n, base) stfd n,THREAD_FPR0+8*TS_FPRSPACING*(n)(base)
#define SAVE_2FPRS(n, base) SAVE_FPR(n, base); SAVE_FPR(n+1, base)
#define SAVE_4FPRS(n, base) SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
#define SAVE_8FPRS(n, base) SAVE_4FPRS(n, base); SAVE_4FPRS(n+4, base)
#define SAVE_16FPRS(n, base) SAVE_8FPRS(n, base); SAVE_8FPRS(n+8, base)
#define SAVE_32FPRS(n, base) SAVE_16FPRS(n, base); SAVE_16FPRS(n+16, base)
-#define REST_FPR(n, base) lfd n,THREAD_FPR0+8*(n)(base)
+#define REST_FPR(n, base) lfd n,THREAD_FPR0+8*TS_FPRSPACING*(n)(base)
#define REST_2FPRS(n, base) REST_FPR(n, base); REST_FPR(n+1, base)
#define REST_4FPRS(n, base) REST_2FPRS(n, base); REST_2FPRS(n+2, base)
#define REST_8FPRS(n, base) REST_4FPRS(n, base); REST_4FPRS(n+4, base)
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -136,6 +136,9 @@ typedef struct {
unsigned long seg;
} mm_segment_t;
+#define TS_FPR(i) fpr[i]
+#define TS_FPRSTART fpr
+
struct thread_struct {
unsigned long ksp; /* Kernel stack pointer */
unsigned long ksp_limit; /* if ksp <= ksp_limit stack overflow */
@@ -197,12 +200,13 @@ struct thread_struct {
.fpexc_mode = MSR_FE0 | MSR_FE1, \
}
#else
+#define FPVSR_INIT_THREAD .fpr = {0}
#define INIT_THREAD { \
.ksp = INIT_SP, \
.ksp_limit = INIT_SP_LIMIT, \
.regs = (struct pt_regs *)INIT_SP - 1, /* XXX bogus, I think */ \
.fs = KERNEL_DS, \
- .fpr = {0}, \
+ FPVSR_INIT_THREAD, \
.fpscr = { .val = 0, }, \
.fpexc_mode = 0, \
}
@@ -289,4 +293,5 @@ static inline void prefetchw(const void
#endif /* __KERNEL__ */
#endif /* __ASSEMBLY__ */
+#define TS_FPRSPACING 1
#endif /* _ASM_POWERPC_PROCESSOR_H */
* [PATCH 3/9] powerpc: Move altivec_unavailable
From: Michael Neuling @ 2008-06-18 0:47 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Move the altivec_unavailable exception code to make room at 0xf40,
where the vsx_unavailable exception vector will go.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/head_64.S | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -275,7 +275,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
. = 0xf00
b performance_monitor_pSeries
- STD_EXCEPTION_PSERIES(0xf20, altivec_unavailable)
+ . = 0xf20
+ b altivec_unavailable_pSeries
#ifdef CONFIG_CBE_RAS
HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
@@ -295,6 +296,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
/* moved from 0xf00 */
STD_EXCEPTION_PSERIES(., performance_monitor)
+ STD_EXCEPTION_PSERIES(., altivec_unavailable)
/*
* An interrupt came in while soft-disabled; clear EE in SRR1,
* [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable
From: Michael Neuling @ 2008-06-18 0:47 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Make load_up_fpu and load_up_altivec return with blr instead of
branching to fast_exception_return, so they become callable routines
that can be reused by the VSX code.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/fpu.S | 2 +-
arch/powerpc/kernel/head_32.S | 6 ++++--
arch/powerpc/kernel/head_64.S | 8 +++++---
arch/powerpc/kernel/head_booke.h | 6 ++++--
4 files changed, 14 insertions(+), 8 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -85,7 +85,7 @@ _GLOBAL(load_up_fpu)
#endif /* CONFIG_SMP */
/* restore registers and return */
/* we haven't used ctr or xer or lr */
- b fast_exception_return
+ blr
/*
* giveup_fpu(tsk)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_32.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
@@ -421,8 +421,10 @@ BEGIN_FTR_SECTION
b ProgramCheck
END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE)
EXCEPTION_PROLOG
- bne load_up_fpu /* if from user, just load it up */
- addi r3,r1,STACK_FRAME_OVERHEAD
+ beq 1f
+ bl load_up_fpu /* if from user, just load it up */
+ b fast_exception_return
+1: addi r3,r1,STACK_FRAME_OVERHEAD
EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
/* Decrementer */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -741,7 +741,8 @@ fp_unavailable_common:
ENABLE_INTS
bl .kernel_fp_unavailable_exception
BUG_OPCODE
-1: b .load_up_fpu
+1: bl .load_up_fpu
+ b fast_exception_return
.align 7
.globl altivec_unavailable_common
@@ -749,7 +750,8 @@ altivec_unavailable_common:
EXCEPTION_PROLOG_COMMON(0xf20, PACA_EXGEN)
#ifdef CONFIG_ALTIVEC
BEGIN_FTR_SECTION
- bne .load_up_altivec /* if from user, just load it up */
+ bnel .load_up_altivec
+ b fast_exception_return
END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
#endif
bl .save_nvgprs
@@ -829,7 +831,7 @@ _STATIC(load_up_altivec)
std r4,0(r3)
#endif /* CONFIG_SMP */
/* restore registers and return */
- b fast_exception_return
+ blr
#endif /* CONFIG_ALTIVEC */
/*
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_booke.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
@@ -363,8 +363,10 @@ label:
#define FP_UNAVAILABLE_EXCEPTION \
START_EXCEPTION(FloatingPointUnavailable) \
NORMAL_EXCEPTION_PROLOG; \
- bne load_up_fpu; /* if from user, just load it up */ \
- addi r3,r1,STACK_FRAME_OVERHEAD; \
+ beq 1f; \
+ bl load_up_fpu; /* if from user, just load it up */ \
+ b fast_exception_return; \
+1: addi r3,r1,STACK_FRAME_OVERHEAD; \
EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
#endif /* __HEAD_BOOKE_H__ */
* [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
From: Michael Neuling @ 2008-06-18 0:47 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
The layout of the new VSR registers and how they overlap on top of the
legacy FPR and VR registers is:
VSR doubleword 0 VSR doubleword 1
----------------------------------------------------------------
VSR[0] | FPR[0] | |
----------------------------------------------------------------
VSR[1] | FPR[1] | |
----------------------------------------------------------------
| ... | |
| ... | |
----------------------------------------------------------------
VSR[30] | FPR[30] | |
----------------------------------------------------------------
VSR[31] | FPR[31] | |
----------------------------------------------------------------
VSR[32] | VR[0] |
----------------------------------------------------------------
VSR[33] | VR[1] |
----------------------------------------------------------------
| ... |
| ... |
----------------------------------------------------------------
VSR[62] | VR[30] |
----------------------------------------------------------------
VSR[63] | VR[31] |
----------------------------------------------------------------
VSX has 64 128-bit registers. The first 32 registers overlap with the
FP registers and hence extend each of them with an additional 64 bits.
The second 32 registers overlap with the VMX registers.
This patch introduces the thread_struct changes required to reflect
this register layout. Ptrace and signals code is updated so that the
floating point registers are correctly accessed from the thread_struct
when CONFIG_VSX is enabled.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/asm-offsets.c | 4 ++
arch/powerpc/kernel/ptrace.c | 28 +++++++++++++++
arch/powerpc/kernel/signal_32.c | 59 +++++++++++++++++++++++++--------
arch/powerpc/kernel/signal_64.c | 36 +++++++++++++++++---
arch/powerpc/platforms/Kconfig.cputype | 16 ++++++++
include/asm-powerpc/processor.h | 31 ++++++++++++++++-
6 files changed, 155 insertions(+), 19 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
@@ -74,6 +74,10 @@ int main(void)
DEFINE(THREAD_VSCR, offsetof(struct thread_struct, vscr));
DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ DEFINE(THREAD_VSR0, offsetof(struct thread_struct, fpvsr.vsr[0]));
+ DEFINE(THREAD_USED_VSR, offsetof(struct thread_struct, used_vsr));
+#endif /* CONFIG_VSX */
#ifdef CONFIG_PPC64
DEFINE(KSP_VSID, offsetof(struct thread_struct, ksp_vsid));
#else /* CONFIG_PPC64 */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -215,26 +215,54 @@ static int fpr_get(struct task_struct *t
unsigned int pos, unsigned int count,
void *kbuf, void __user *ubuf)
{
+#ifdef CONFIG_VSX
+ double buf[33];
+ int i;
+#endif
flush_fp_to_thread(target);
+#ifdef CONFIG_VSX
+ /* copy to local buffer then write that out */
+ for (i = 0; i < 32 ; i++)
+ buf[i] = target->thread.TS_FPR(i);
+ memcpy(&buf[32], &target->thread.fpscr, sizeof(double));
+ return user_regset_copyout(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+
+#else
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
&target->thread.TS_FPRSTART, 0, -1);
+#endif
}
static int fpr_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
+#ifdef CONFIG_VSX
+ double buf[33];
+ int i;
+#endif
flush_fp_to_thread(target);
+#ifdef CONFIG_VSX
+ /* copy to local buffer then write that out */
+ i = user_regset_copyin(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+ if (i)
+ return i;
+ for (i = 0; i < 32 ; i++)
+ target->thread.TS_FPR(i) = buf[i];
+ memcpy(&target->thread.fpscr, &buf[32], sizeof(double));
+ return 0;
+#else
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
&target->thread.TS_FPRSTART, 0, -1);
+#endif
}
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -337,14 +337,16 @@ static int save_user_regs(struct pt_regs
int sigret)
{
unsigned long msr = regs->msr;
+#ifdef CONFIG_VSX
+ double buf[32];
+ int i;
+#endif
/* Make sure floating point registers are stored in regs */
flush_fp_to_thread(current);
- /* save general and floating-point registers */
- if (save_general_regs(regs, frame) ||
- __copy_to_user(&frame->mc_fregs, current->thread.TS_FPRSTART,
- ELF_NFPREG * sizeof(double)))
+ /* save general registers */
+ if (save_general_regs(regs, frame))
return 1;
#ifdef CONFIG_ALTIVEC
@@ -368,7 +370,21 @@ static int save_user_regs(struct pt_regs
if (__put_user(current->thread.vrsave, (u32 __user *)&frame->mc_vregs[32]))
return 1;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ /* save FPR copy to local buffer then write to the thread_struct */
+ flush_fp_to_thread(current);
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.TS_FPR(i);
+ memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+ if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
+ return 1;
+#else
+ /* save floating-point registers */
+ if (__copy_to_user(&frame->mc_fregs, current->thread.TS_FPRSTART,
+ ELF_NFPREG * sizeof(double)))
+ return 1;
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
/* save spe registers */
if (current->thread.used_spe) {
@@ -411,6 +427,10 @@ static long restore_user_regs(struct pt_
long err;
unsigned int save_r2 = 0;
unsigned long msr;
+#ifdef CONFIG_VSX
+ double buf[32];
+ int i;
+#endif
/*
* restore general registers but not including MSR or SOFTE. Also
@@ -438,16 +458,11 @@ static long restore_user_regs(struct pt_
*/
discard_lazy_cpu_state();
- /* force the process to reload the FP registers from
- current->thread when it next does FP instructions */
- regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
- if (__copy_from_user(current->thread.TS_FPRSTART, &sr->mc_fregs,
- sizeof(sr->mc_fregs)))
- return 1;
-
#ifdef CONFIG_ALTIVEC
- /* force the process to reload the altivec registers from
- current->thread when it next does altivec instructions */
+ /*
+ * Force the process to reload the altivec registers from
+ * current->thread when it next does altivec instructions
+ */
regs->msr &= ~MSR_VEC;
if (msr & MSR_VEC) {
/* restore altivec registers from the stack */
@@ -462,6 +477,24 @@ static long restore_user_regs(struct pt_
return 1;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (__copy_from_user(buf, &sr->mc_fregs, sizeof(sr->mc_fregs)))
+ return 1;
+ for (i = 0; i < 32 ; i++)
+ current->thread.TS_FPR(i) = buf[i];
+ memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+
+#else
+ if (__copy_from_user(current->thread.TS_FPRSTART, &sr->mc_fregs,
+ sizeof(sr->mc_fregs)))
+ return 1;
+#endif /* CONFIG_VSX */
+ /*
+ * force the process to reload the FP registers from
+ * current->thread when it next does FP instructions
+ */
+ regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
+
#ifdef CONFIG_SPE
/* force the process to reload the spe registers from
current->thread when it next does spe instructions */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -89,6 +89,10 @@ static long setup_sigcontext(struct sigc
#endif
unsigned long msr = regs->msr;
long err = 0;
+#ifdef CONFIG_VSX
+ double buf[FP_REGS_SIZE];
+ int i;
+#endif
flush_fp_to_thread(current);
@@ -112,12 +116,22 @@ static long setup_sigcontext(struct sigc
#else /* CONFIG_ALTIVEC */
err |= __put_user(0, &sc->v_regs);
#endif /* CONFIG_ALTIVEC */
+ flush_fp_to_thread(current);
+#ifdef CONFIG_VSX
+ /* Copy FP to local buffer then write that out */
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.TS_FPR(i);
+ memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+ err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+#else /* CONFIG_VSX */
+ /* copy fpr regs and fpscr */
+ err |= __copy_to_user(&sc->fp_regs, &current->thread.TS_FPR(0),
+ FP_REGS_SIZE);
+#endif /* CONFIG_VSX */
err |= __put_user(&sc->gp_regs, &sc->regs);
WARN_ON(!FULL_REGS(regs));
err |= __copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE);
err |= __put_user(msr, &sc->gp_regs[PT_MSR]);
- err |= __copy_to_user(&sc->fp_regs, &current->thread.TS_FPRSTART,
- FP_REGS_SIZE);
err |= __put_user(signr, &sc->signal);
err |= __put_user(handler, &sc->handler);
if (set != NULL)
@@ -136,6 +150,9 @@ static long restore_sigcontext(struct pt
#ifdef CONFIG_ALTIVEC
elf_vrreg_t __user *v_regs;
#endif
+#ifdef CONFIG_VSX
+ double buf[FP_REGS_SIZE];
+ int i;
+#endif
unsigned long err = 0;
unsigned long save_r13 = 0;
elf_greg_t *gregs = (elf_greg_t *)regs;
@@ -184,9 +201,6 @@ static long restore_sigcontext(struct pt
*/
regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
- err |= __copy_from_user(&current->thread.TS_FPRSTART, &sc->fp_regs,
- FP_REGS_SIZE);
-
#ifdef CONFIG_ALTIVEC
err |= __get_user(v_regs, &sc->v_regs);
if (err)
@@ -205,7 +219,19 @@ static long restore_sigcontext(struct pt
else
current->thread.vrsave = 0;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ /* restore floating point */
+ err |= __copy_from_user(buf, &sc->fp_regs, FP_REGS_SIZE);
+ if (err)
+ return err;
+ for (i = 0; i < 32 ; i++)
+ current->thread.TS_FPR(i) = buf[i];
+ memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+#else
+ err |= __copy_from_user(&current->thread.TS_FPRSTART, &sc->fp_regs,
+ FP_REGS_SIZE);
+#endif
return err;
}
Index: linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/platforms/Kconfig.cputype
+++ linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
@@ -155,6 +155,22 @@ config ALTIVEC
If in doubt, say Y here.
+config VSX
+ bool "VSX Support"
+ depends on POWER4 && ALTIVEC && PPC_FPU
+ ---help---
+
+ This option enables kernel support for the Vector Scalar Extension
+ (VSX) to the PowerPC processor. The kernel currently supports saving
+ and restoring VSX registers, and turning on the 'VSX enable' bit so
+ user processes can execute VSX instructions.
+
+ This option is only useful if you have a processor that supports
+ VSX (POWER7 and above), but does not have any effect on a non-VSX
+ CPU (it does, however, add code to the kernel).
+
+ If in doubt, say Y here.
+
config SPE
bool "SPE Support"
depends on E200 || E500
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -78,6 +78,7 @@ extern long kernel_thread(int (*fn)(void
/* Lazy FPU handling on uni-processor */
extern struct task_struct *last_task_used_math;
extern struct task_struct *last_task_used_altivec;
+extern struct task_struct *last_task_used_vsx;
extern struct task_struct *last_task_used_spe;
#ifdef CONFIG_PPC32
@@ -136,8 +137,13 @@ typedef struct {
unsigned long seg;
} mm_segment_t;
+#ifdef CONFIG_VSX
+#define TS_FPR(i) fpvsr.fp[i].fpr
+#define TS_FPRSTART fpvsr.fp
+#else
#define TS_FPR(i) fpr[i]
#define TS_FPRSTART fpr
+#endif
struct thread_struct {
unsigned long ksp; /* Kernel stack pointer */
@@ -155,8 +161,19 @@ struct thread_struct {
unsigned long dbcr0; /* debug control register values */
unsigned long dbcr1;
#endif
+#ifdef CONFIG_VSX
+ /* First 32 VSX registers (overlap with fpr[32]) */
+ union {
+ struct {
+ double fpr;
+ double vsrlow;
+ } fp[32];
+ vector128 vsr[32];
+ } fpvsr __attribute__((aligned(16)));
+#else
double fpr[32]; /* Complete floating point set */
- struct { /* fpr ... fpscr must be contiguous */
+#endif
+ struct {
unsigned int pad;
unsigned int val; /* Floating point status */
@@ -176,6 +193,10 @@ struct thread_struct {
unsigned long vrsave;
int used_vr; /* set if process has used altivec */
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ /* VSR status */
+ int used_vsr; /* set if process has used VSX */
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
unsigned long evr[32]; /* upper 32-bits of SPE regs */
u64 acc; /* Accumulator */
@@ -200,7 +221,11 @@ struct thread_struct {
.fpexc_mode = MSR_FE0 | MSR_FE1, \
}
#else
+#ifdef CONFIG_VSX
+#define FPVSR_INIT_THREAD .fpvsr = { .vsr = 0, }
+#else
#define FPVSR_INIT_THREAD .fpr = {0}
+#endif
#define INIT_THREAD { \
.ksp = INIT_SP, \
.ksp_limit = INIT_SP_LIMIT, \
@@ -293,5 +318,9 @@ static inline void prefetchw(const void
#endif /* __KERNEL__ */
#endif /* __ASSEMBLY__ */
+#ifdef CONFIG_VSX
+#define TS_FPRSPACING 2
+#else
#define TS_FPRSPACING 1
+#endif
#endif /* _ASM_POWERPC_PROCESSOR_H */
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 6/9] powerpc: Add VSX CPU feature
2008-06-18 0:47 [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (6 preceding siblings ...)
2008-06-18 0:47 ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
@ 2008-06-18 0:47 ` Michael Neuling
2008-06-18 16:28 ` Joel Schopp
2008-06-19 6:51 ` David Woodhouse
2008-06-18 0:47 ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
` (2 subsequent siblings)
10 siblings, 2 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-18 0:47 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Add a VSX CPU feature. Also add code to detect if VSX is available
from the device tree.
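The detection reuses prom.c's feature_property table: if the "ibm,vmx" device-tree property is present with a value of at least 2, the corresponding kernel and userspace VSX feature bits are turned on. A simplified model of that scan (constants copied from the patch, the `>= min_value` comparison assumed from the table's semantics):

```c
#include <assert.h>
#include <string.h>

/* Illustrative copies of the patch's feature bits */
#define CPU_FTR_VSX          (1ULL << 52)  /* 0x0010000000000000 */
#define PPC_FEATURE_HAS_VSX  0x00000080

struct feature_property {
	const char *name;       /* device-tree property name */
	unsigned int min_value; /* minimum property value required */
	unsigned long long cpu_feature;
	unsigned int cpu_user_ftr;
};

static const struct feature_property table[] = {
	{ "ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX },
};

/* Hypothetical helper: returns the CPU feature bits implied by one
 * device-tree property and its value. */
unsigned long long scan_features(const char *prop, unsigned int val)
{
	unsigned long long ftrs = 0;
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (!strcmp(prop, table[i].name) && val >= table[i].min_value)
			ftrs |= table[i].cpu_feature;
	return ftrs;
}
```

So an "ibm,vmx" value of 1 (plain AltiVec) leaves VSX disabled, while 2 enables it.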
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/prom.c | 3 +++
include/asm-powerpc/cputable.h | 13 +++++++++++++
2 files changed, 16 insertions(+)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/prom.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
@@ -609,6 +609,9 @@ static struct feature_property {
{"altivec", 0, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
{"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ {"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
+#endif /* CONFIG_VSX */
#ifdef CONFIG_PPC64
{"ibm,dfp", 1, 0, PPC_FEATURE_HAS_DFP},
{"ibm,purr", 1, CPU_FTR_PURR, 0},
Index: linux-2.6-ozlabs/include/asm-powerpc/cputable.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/cputable.h
+++ linux-2.6-ozlabs/include/asm-powerpc/cputable.h
@@ -27,6 +27,7 @@
#define PPC_FEATURE_HAS_DFP 0x00000400
#define PPC_FEATURE_POWER6_EXT 0x00000200
#define PPC_FEATURE_ARCH_2_06 0x00000100
+#define PPC_FEATURE_HAS_VSX 0x00000080
#define PPC_FEATURE_TRUE_LE 0x00000002
#define PPC_FEATURE_PPC_LE 0x00000001
@@ -181,6 +182,7 @@ extern void do_feature_fixups(unsigned l
#define CPU_FTR_DSCR LONG_ASM_CONST(0x0002000000000000)
#define CPU_FTR_1T_SEGMENT LONG_ASM_CONST(0x0004000000000000)
#define CPU_FTR_NO_SLBIE_B LONG_ASM_CONST(0x0008000000000000)
+#define CPU_FTR_VSX LONG_ASM_CONST(0x0010000000000000)
#ifndef __ASSEMBLY__
@@ -199,6 +201,17 @@ extern void do_feature_fixups(unsigned l
#define PPC_FEATURE_HAS_ALTIVEC_COMP 0
#endif
+/* We only set the VSX features if the kernel was compiled with VSX
+ * support
+ */
+#ifdef CONFIG_VSX
+#define CPU_FTR_VSX_COMP CPU_FTR_VSX
+#define PPC_FEATURE_HAS_VSX_COMP PPC_FEATURE_HAS_VSX
+#else
+#define CPU_FTR_VSX_COMP 0
+#define PPC_FEATURE_HAS_VSX_COMP 0
+#endif
+
/* We only set the spe features if the kernel was compiled with spe
* support
*/
* [PATCH 7/9] powerpc: Add VSX assembler code macros
2008-06-18 0:47 [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (5 preceding siblings ...)
2008-06-18 0:47 ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
@ 2008-06-18 0:47 ` Michael Neuling
2008-06-18 0:47 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
` (3 subsequent siblings)
10 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-18 0:47 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
This adds macros for the VSX load/store instructions, since most
binutils versions will not support them for a while.
Also add VSX register save/restore macros and vsr[0-63] register definitions.
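As a sanity check on the hand-assembled encoding, the XX1-form split can be reproduced in plain C: the 6-bit VSX register number carries its low five bits in the instruction's RS/RT field and its high bit in the low-order TX/SX bit. A minimal sketch (opcode constants copied from the patch's macros, with `.long` dropped for C):

```c
#include <assert.h>
#include <stdint.h>

/* C mirror of the patch's VSX_XX1 macro: the 6-bit VSX register
 * number xs is split, low 5 bits into the RS/RT field (bit 21) and
 * the high bit into the XX1 form's TX/SX bit (bit 0). */
#define VSX_XX1(xs, ra, rb) (((xs) & 0x1f) << 21 | ((ra) << 16) | \
			     ((rb) << 11) | (((xs) >> 5)))

#define STXVD2X(xs, ra, rb) (0x7c000798u | VSX_XX1((xs), (ra), (rb)))
#define LXVD2X(xs, ra, rb)  (0x7c000698u | VSX_XX1((xs), (ra), (rb)))

uint32_t encode_stxvd2x(int xs, int ra, int rb) { return STXVD2X(xs, ra, rb); }
uint32_t encode_lxvd2x(int xs, int ra, int rb)  { return LXVD2X(xs, ra, rb); }
```

For example, vsr32 differs from vsr0 only in the instruction's bottom bit, which is why the one macro covers all 64 registers.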
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
include/asm-powerpc/ppc_asm.h | 127 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 127 insertions(+)
Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -74,6 +74,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
REST_10GPRS(22, base)
#endif
+/*
+ * Define what the VSX XX1 form instructions will look like, then add
+ * the 128 bit load store instructions based on that.
+ */
+#define VSX_XX1(xs, ra, rb) (((xs) & 0x1f) << 21 | ((ra) << 16) | \
+ ((rb) << 11) | (((xs) >> 5)))
+
+#define STXVD2X(xs, ra, rb) .long (0x7c000798 | VSX_XX1((xs), (ra), (rb)))
+#define LXVD2X(xs, ra, rb) .long (0x7c000698 | VSX_XX1((xs), (ra), (rb)))
#define SAVE_2GPRS(n, base) SAVE_GPR(n, base); SAVE_GPR(n+1, base)
#define SAVE_4GPRS(n, base) SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
@@ -110,6 +119,57 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
#define REST_16VRS(n,b,base) REST_8VRS(n,b,base); REST_8VRS(n+8,b,base)
#define REST_32VRS(n,b,base) REST_16VRS(n,b,base); REST_16VRS(n+16,b,base)
+/* Save the lower 32 VSRs in the thread VSR region */
+#define SAVE_VSR(n,b,base) li b,THREAD_VSR0+(16*(n)); STXVD2X(n,b,base)
+#define SAVE_2VSRS(n,b,base) SAVE_VSR(n,b,base); SAVE_VSR(n+1,b,base)
+#define SAVE_4VSRS(n,b,base) SAVE_2VSRS(n,b,base); SAVE_2VSRS(n+2,b,base)
+#define SAVE_8VSRS(n,b,base) SAVE_4VSRS(n,b,base); SAVE_4VSRS(n+4,b,base)
+#define SAVE_16VSRS(n,b,base) SAVE_8VSRS(n,b,base); SAVE_8VSRS(n+8,b,base)
+#define SAVE_32VSRS(n,b,base) SAVE_16VSRS(n,b,base); SAVE_16VSRS(n+16,b,base)
+#define REST_VSR(n,b,base) li b,THREAD_VSR0+(16*(n)); LXVD2X(n,b,base)
+#define REST_2VSRS(n,b,base) REST_VSR(n,b,base); REST_VSR(n+1,b,base)
+#define REST_4VSRS(n,b,base) REST_2VSRS(n,b,base); REST_2VSRS(n+2,b,base)
+#define REST_8VSRS(n,b,base) REST_4VSRS(n,b,base); REST_4VSRS(n+4,b,base)
+#define REST_16VSRS(n,b,base) REST_8VSRS(n,b,base); REST_8VSRS(n+8,b,base)
+#define REST_32VSRS(n,b,base) REST_16VSRS(n,b,base); REST_16VSRS(n+16,b,base)
+/* Save the upper 32 VSRs (32-63) in the thread VMX region (vr 0-31) */
+#define SAVE_VSRU(n,b,base) li b,THREAD_VR0+(16*(n)); STXVD2X(n+32,b,base)
+#define SAVE_2VSRSU(n,b,base) SAVE_VSRU(n,b,base); SAVE_VSRU(n+1,b,base)
+#define SAVE_4VSRSU(n,b,base) SAVE_2VSRSU(n,b,base); SAVE_2VSRSU(n+2,b,base)
+#define SAVE_8VSRSU(n,b,base) SAVE_4VSRSU(n,b,base); SAVE_4VSRSU(n+4,b,base)
+#define SAVE_16VSRSU(n,b,base) SAVE_8VSRSU(n,b,base); SAVE_8VSRSU(n+8,b,base)
+#define SAVE_32VSRSU(n,b,base) SAVE_16VSRSU(n,b,base); SAVE_16VSRSU(n+16,b,base)
+#define REST_VSRU(n,b,base) li b,THREAD_VR0+(16*(n)); LXVD2X(n+32,b,base)
+#define REST_2VSRSU(n,b,base) REST_VSRU(n,b,base); REST_VSRU(n+1,b,base)
+#define REST_4VSRSU(n,b,base) REST_2VSRSU(n,b,base); REST_2VSRSU(n+2,b,base)
+#define REST_8VSRSU(n,b,base) REST_4VSRSU(n,b,base); REST_4VSRSU(n+4,b,base)
+#define REST_16VSRSU(n,b,base) REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
+#define REST_32VSRSU(n,b,base) REST_16VSRSU(n,b,base); REST_16VSRSU(n+16,b,base)
+
+#ifdef CONFIG_VSX
+#define REST_32FPVSRS(n,c,base) \
+BEGIN_FTR_SECTION \
+ b 2f; \
+END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
+ REST_32FPRS(n,base); \
+ b 3f; \
+2: REST_32VSRS(n,c,base); \
+3:
+
+#define SAVE_32FPVSRS(n,c,base) \
+BEGIN_FTR_SECTION \
+ b 2f; \
+END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
+ SAVE_32FPRS(n,base); \
+ b 3f; \
+2: SAVE_32VSRS(n,c,base); \
+3:
+
+#else
+#define REST_32FPVSRS(n,b,base) REST_32FPRS(n, base)
+#define SAVE_32FPVSRS(n,b,base) SAVE_32FPRS(n, base)
+#endif
+
#define SAVE_EVR(n,s,base) evmergehi s,s,n; stw s,THREAD_EVR0+4*(n)(base)
#define SAVE_2EVRS(n,s,base) SAVE_EVR(n,s,base); SAVE_EVR(n+1,s,base)
#define SAVE_4EVRS(n,s,base) SAVE_2EVRS(n,s,base); SAVE_2EVRS(n+2,s,base)
@@ -534,6 +594,73 @@ END_FTR_SECTION_IFCLR(CPU_FTR_601)
#define vr30 30
#define vr31 31
+/* VSX Registers (VSRs) */
+
+#define vsr0 0
+#define vsr1 1
+#define vsr2 2
+#define vsr3 3
+#define vsr4 4
+#define vsr5 5
+#define vsr6 6
+#define vsr7 7
+#define vsr8 8
+#define vsr9 9
+#define vsr10 10
+#define vsr11 11
+#define vsr12 12
+#define vsr13 13
+#define vsr14 14
+#define vsr15 15
+#define vsr16 16
+#define vsr17 17
+#define vsr18 18
+#define vsr19 19
+#define vsr20 20
+#define vsr21 21
+#define vsr22 22
+#define vsr23 23
+#define vsr24 24
+#define vsr25 25
+#define vsr26 26
+#define vsr27 27
+#define vsr28 28
+#define vsr29 29
+#define vsr30 30
+#define vsr31 31
+#define vsr32 32
+#define vsr33 33
+#define vsr34 34
+#define vsr35 35
+#define vsr36 36
+#define vsr37 37
+#define vsr38 38
+#define vsr39 39
+#define vsr40 40
+#define vsr41 41
+#define vsr42 42
+#define vsr43 43
+#define vsr44 44
+#define vsr45 45
+#define vsr46 46
+#define vsr47 47
+#define vsr48 48
+#define vsr49 49
+#define vsr50 50
+#define vsr51 51
+#define vsr52 52
+#define vsr53 53
+#define vsr54 54
+#define vsr55 55
+#define vsr56 56
+#define vsr57 57
+#define vsr58 58
+#define vsr59 59
+#define vsr60 60
+#define vsr61 61
+#define vsr62 62
+#define vsr63 63
+
/* SPE Registers (EVPRs) */
#define evr0 0
* [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support
2008-06-18 0:47 [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (4 preceding siblings ...)
2008-06-18 0:47 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
@ 2008-06-18 0:47 ` Michael Neuling
2008-06-18 0:47 ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
` (4 subsequent siblings)
10 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-18 0:47 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
This patch extends the floating point save and restore code to use the
VSX load/stores when VSX is available. This will make FP context
save/restore marginally slower on FP-only code when VSX is available,
as it has to load/store 128 bits rather than just 64 bits.
Mixing FP, VMX and VSX code will see consistent architected state.
The signals interface is extended to enable access to VSR 0-31
doubleword 1 after discussions with tool chain maintainers. Backward
compatibility is maintained.
The ptrace interface is also extended to allow access to VSR 0-31 full
registers.
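The register overlap this relies on (each FPR aliasing doubleword 0 of the corresponding VSR 0-31, with doubleword 1 being the new state the signal interface exposes) can be sketched as a plain C union; `vector128` here is a hypothetical 16-byte stand-in for the kernel's type:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for powerpc's 16-byte vector128 type */
typedef struct { unsigned int u[4]; } vector128;

/* Mirror of the patch's thread_struct union: fp[i].fpr aliases
 * doubleword 0 of vsr[i] (big-endian layout), and fp[i].vsrlow is
 * doubleword 1, which the extended signal frame saves separately. */
union fpvsr {
	struct {
		double fpr;
		double vsrlow;
	} fp[32];
	vector128 vsr[32];
};
```

This is why the FPR save path must stride by 16 bytes under CONFIG_VSX (the TS_FPR / TS_FPRSPACING machinery), and why saving FPRs plus the vsrlow halves plus the VMX registers together recovers all 64 architected VSRs.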
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/entry_64.S | 5 +
arch/powerpc/kernel/fpu.S | 16 ++++-
arch/powerpc/kernel/head_64.S | 65 +++++++++++++++++++++++
arch/powerpc/kernel/misc_64.S | 33 +++++++++++
arch/powerpc/kernel/ppc32.h | 1
arch/powerpc/kernel/ppc_ksyms.c | 3 +
arch/powerpc/kernel/process.c | 108 ++++++++++++++++++++++++++++++++++++++-
arch/powerpc/kernel/ptrace.c | 70 +++++++++++++++++++++++++
arch/powerpc/kernel/signal_32.c | 33 +++++++++++
arch/powerpc/kernel/signal_64.c | 31 ++++++++++-
arch/powerpc/kernel/traps.c | 29 ++++++++++
include/asm-powerpc/elf.h | 6 +-
include/asm-powerpc/ptrace.h | 12 ++++
include/asm-powerpc/reg.h | 2
include/asm-powerpc/sigcontext.h | 37 +++++++++++++
include/asm-powerpc/system.h | 9 +++
include/linux/elf.h | 1
17 files changed, 453 insertions(+), 8 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/entry_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
@@ -353,6 +353,11 @@ _GLOBAL(_switch)
mflr r20 /* Return to switch caller */
mfmsr r22
li r0, MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ oris r0,r0,MSR_VSX@h /* Disable VSX */
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif /* CONFIG_VSX */
#ifdef CONFIG_ALTIVEC
BEGIN_FTR_SECTION
oris r0,r0,MSR_VEC@h /* Disable altivec */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -34,6 +34,11 @@
_GLOBAL(load_up_fpu)
mfmsr r5
ori r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ oris r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
SYNC
MTMSRD(r5) /* enable use of fpu now */
isync
@@ -50,7 +55,7 @@ _GLOBAL(load_up_fpu)
beq 1f
toreal(r4)
addi r4,r4,THREAD /* want last_task_used_math->thread */
- SAVE_32FPRS(0, r4)
+ SAVE_32FPVSRS(0, r5, r4)
mffs fr0
stfd fr0,THREAD_FPSCR(r4)
PPC_LL r5,PT_REGS(r4)
@@ -77,7 +82,7 @@ _GLOBAL(load_up_fpu)
#endif
lfd fr0,THREAD_FPSCR(r5)
MTFSF_L(fr0)
- REST_32FPRS(0, r5)
+ REST_32FPVSRS(0, r4, r5)
#ifndef CONFIG_SMP
subi r4,r5,THREAD
fromreal(r4)
@@ -96,6 +101,11 @@ _GLOBAL(load_up_fpu)
_GLOBAL(giveup_fpu)
mfmsr r5
ori r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ oris r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
SYNC_601
ISYNC_601
MTMSRD(r5) /* enable use of fpu now */
@@ -106,7 +116,7 @@ _GLOBAL(giveup_fpu)
addi r3,r3,THREAD /* want THREAD of task */
PPC_LL r5,PT_REGS(r3)
PPC_LCMPI 0,r5,0
- SAVE_32FPRS(0, r3)
+ SAVE_32FPVSRS(0, r4 ,r3)
mffs fr0
stfd fr0,THREAD_FPSCR(r3)
beq 1f
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -278,6 +278,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
. = 0xf20
b altivec_unavailable_pSeries
+ . = 0xf40
+ b vsx_unavailable_pSeries
+
#ifdef CONFIG_CBE_RAS
HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
#endif /* CONFIG_CBE_RAS */
@@ -297,6 +300,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
/* moved from 0xf00 */
STD_EXCEPTION_PSERIES(., performance_monitor)
STD_EXCEPTION_PSERIES(., altivec_unavailable)
+ STD_EXCEPTION_PSERIES(., vsx_unavailable)
/*
* An interrupt came in while soft-disabled; clear EE in SRR1,
@@ -834,6 +838,67 @@ _STATIC(load_up_altivec)
blr
#endif /* CONFIG_ALTIVEC */
+ .align 7
+ .globl vsx_unavailable_common
+vsx_unavailable_common:
+ EXCEPTION_PROLOG_COMMON(0xf40, PACA_EXGEN)
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ bne .load_up_vsx
+1:
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
+ bl .save_nvgprs
+ addi r3,r1,STACK_FRAME_OVERHEAD
+ ENABLE_INTS
+ bl .vsx_unavailable_exception
+ b .ret_from_except
+
+#ifdef CONFIG_VSX
+/*
+ * load_up_vsx(unused, unused, tsk)
+ * Disable VSX for the task which had it previously,
+ * and save its vector registers in its thread_struct.
+ * Reuse the fp and vsx saves, but first check to see if they have
+ * been saved already.
+ * On entry: r13 == 'current' && last_task_used_vsx != 'current'
+ */
+_STATIC(load_up_vsx)
+/* Load FP and VSX registers if they haven't been done yet */
+ andi. r5,r12,MSR_FP
+ beql+ load_up_fpu /* skip if already loaded */
+ andis. r5,r12,MSR_VEC@h
+ beql+ load_up_altivec /* skip if already loaded */
+
+#ifndef CONFIG_SMP
+ ld r3,last_task_used_vsx@got(r2)
+ ld r4,0(r3)
+ cmpdi 0,r4,0
+ beq 1f
+ /* Disable VSX for last_task_used_vsx */
+ addi r4,r4,THREAD
+ ld r5,PT_REGS(r4)
+ ld r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+ lis r6,MSR_VSX@h
+ andc r6,r4,r6
+ std r6,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#endif /* CONFIG_SMP */
+ ld r4,PACACURRENT(r13)
+ addi r4,r4,THREAD /* Get THREAD */
+ li r6,1
+ stw r6,THREAD_USED_VSR(r4) /* ... also set thread used vsr */
+ /* enable use of VSX after return */
+ oris r12,r12,MSR_VSX@h
+ std r12,_MSR(r1)
+#ifndef CONFIG_SMP
+ /* Update last_task_used_vsx to 'current' */
+ ld r4,PACACURRENT(r13)
+ std r4,0(r3)
+#endif /* CONFIG_SMP */
+ b fast_exception_return
+#endif /* CONFIG_VSX */
+
/*
* Hash table stuff
*/
Index: linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/misc_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
@@ -506,6 +506,39 @@ _GLOBAL(giveup_altivec)
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+/*
+ * giveup_vsx(tsk)
+ * Disable VSX for the task given as the argument,
+ * and save the vector registers in its thread_struct.
+ * Enables the VSX for use in the kernel on return.
+ */
+_GLOBAL(giveup_vsx)
+ mfmsr r5
+ oris r5,r5,MSR_VSX@h
+ mtmsrd r5 /* enable use of VSX now */
+ isync
+
+ cmpdi 0,r3,0
+ beqlr- /* if no previous owner, done */
+ addi r3,r3,THREAD /* want THREAD of task */
+ ld r5,PT_REGS(r3)
+ cmpdi 0,r5,0
+ beq 1f
+ ld r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+ lis r3,MSR_VSX@h
+ andc r4,r4,r3 /* disable VSX for previous task */
+ std r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#ifndef CONFIG_SMP
+ li r5,0
+ ld r4,last_task_used_vsx@got(r2)
+ std r5,0(r4)
+#endif /* CONFIG_SMP */
+ blr
+
+#endif /* CONFIG_VSX */
+
/* kexec_wait(phys_cpu)
*
* wait for the flag to change, indicating this kernel is going away but
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc32.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
@@ -120,6 +120,7 @@ struct mcontext32 {
elf_fpregset_t mc_fregs;
unsigned int mc_pad[2];
elf_vrregset_t32 mc_vregs __attribute__((__aligned__(16)));
+ elf_vsrreghalf_t32 mc_vsregs __attribute__((__aligned__(16)));
};
struct ucontext32 {
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc_ksyms.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
@@ -102,6 +102,9 @@ EXPORT_SYMBOL(giveup_fpu);
#ifdef CONFIG_ALTIVEC
EXPORT_SYMBOL(giveup_altivec);
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+EXPORT_SYMBOL(giveup_vsx);
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
EXPORT_SYMBOL(giveup_spe);
#endif /* CONFIG_SPE */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -53,6 +53,7 @@ extern unsigned long _get_SP(void);
#ifndef CONFIG_SMP
struct task_struct *last_task_used_math = NULL;
struct task_struct *last_task_used_altivec = NULL;
+struct task_struct *last_task_used_vsx = NULL;
struct task_struct *last_task_used_spe = NULL;
#endif
@@ -106,11 +107,23 @@ EXPORT_SYMBOL(enable_kernel_fp);
int dump_task_fpu(struct task_struct *tsk, elf_fpregset_t *fpregs)
{
+#ifdef CONFIG_VSX
+ int i;
+ elf_fpreg_t *reg;
+#endif
+
if (!tsk->thread.regs)
return 0;
flush_fp_to_thread(current);
+#ifdef CONFIG_VSX
+ reg = (elf_fpreg_t *)fpregs;
+ for (i = 0; i < ELF_NFPREG - 1; i++, reg++)
+ *reg = tsk->thread.TS_FPR(i);
+ memcpy(reg, &tsk->thread.fpscr, sizeof(elf_fpreg_t));
+#else
memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
+#endif
return 1;
}
@@ -149,7 +162,7 @@ void flush_altivec_to_thread(struct task
}
}
-int dump_task_altivec(struct task_struct *tsk, elf_vrregset_t *vrregs)
+int dump_task_altivec(struct task_struct *tsk, elf_vrreg_t *vrregs)
{
/* ELF_NVRREG includes the VSCR and VRSAVE which we need to save
* separately, see below */
@@ -179,6 +192,79 @@ int dump_task_altivec(struct task_struct
}
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+#if 0
+/* not currently used, but some crazy RAID module might want to later */
+void enable_kernel_vsx(void)
+{
+ WARN_ON(preemptible());
+
+#ifdef CONFIG_SMP
+ if (current->thread.regs && (current->thread.regs->msr & MSR_VSX))
+ giveup_vsx(current);
+ else
+ giveup_vsx(NULL); /* just enable vsx for kernel - force */
+#else
+ giveup_vsx(last_task_used_vsx);
+#endif /* CONFIG_SMP */
+}
+EXPORT_SYMBOL(enable_kernel_vsx);
+#endif
+
+void flush_vsx_to_thread(struct task_struct *tsk)
+{
+ if (tsk->thread.regs) {
+ preempt_disable();
+ if (tsk->thread.regs->msr & MSR_VSX) {
+#ifdef CONFIG_SMP
+ BUG_ON(tsk != current);
+#endif
+ giveup_vsx(tsk);
+ }
+ preempt_enable();
+ }
+}
+
+/*
+ * This dumps the full 128 bits of the first 32 VSX registers. This
+ * needs to be called with dump_task_fp and dump_task_altivec to get
+ * all the VSX state.
+ */
+int dump_task_vsx(struct task_struct *tsk, elf_vrreg_t *vrregs)
+{
+ /* Grab only the first half */
+ const int nregs = 32;
+ elf_vrreg_t *reg;
+
+ if (tsk == current)
+ flush_vsx_to_thread(tsk);
+
+ reg = (elf_vrreg_t *)vrregs;
+
+ /* copy the first 32 vsr registers */
+ memcpy(reg, &tsk->thread.fpvsr.vsr[0], nregs * sizeof(*reg));
+
+ return 1;
+}
+#endif /* CONFIG_VSX */
+
+int dump_task_vector(struct task_struct *tsk, elf_vrregset_t *vrregs)
+{
+ int rc = 0;
+ elf_vrreg_t *regs = (elf_vrreg_t *)vrregs;
+#ifdef CONFIG_ALTIVEC
+ rc = dump_task_altivec(tsk, regs);
+ if (rc)
+ return rc;
+ regs += ELF_NVRREG;
+#endif
+
+#ifdef CONFIG_VSX
+ rc = dump_task_vsx(tsk, regs);
+#endif
+ return rc;
+}
+
#ifdef CONFIG_SPE
void enable_kernel_spe(void)
@@ -233,6 +319,10 @@ void discard_lazy_cpu_state(void)
if (last_task_used_altivec == current)
last_task_used_altivec = NULL;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (last_task_used_vsx == current)
+ last_task_used_vsx = NULL;
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
if (last_task_used_spe == current)
last_task_used_spe = NULL;
@@ -297,6 +387,10 @@ struct task_struct *__switch_to(struct t
if (prev->thread.regs && (prev->thread.regs->msr & MSR_VEC))
giveup_altivec(prev);
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (prev->thread.regs && (prev->thread.regs->msr & MSR_VSX))
+ giveup_vsx(prev);
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
/*
* If the previous thread used spe in the last quantum
@@ -317,6 +411,10 @@ struct task_struct *__switch_to(struct t
if (new->thread.regs && last_task_used_altivec == new)
new->thread.regs->msr |= MSR_VEC;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (new->thread.regs && last_task_used_vsx == new)
+ new->thread.regs->msr |= MSR_VSX;
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
/* Avoid the trap. On smp this this never happens since
* we don't set last_task_used_spe
@@ -417,6 +515,8 @@ static struct regbit {
{MSR_EE, "EE"},
{MSR_PR, "PR"},
{MSR_FP, "FP"},
+ {MSR_VEC, "VEC"},
+ {MSR_VSX, "VSX"},
{MSR_ME, "ME"},
{MSR_IR, "IR"},
{MSR_DR, "DR"},
@@ -534,6 +634,7 @@ void prepare_to_copy(struct task_struct
{
flush_fp_to_thread(current);
flush_altivec_to_thread(current);
+ flush_vsx_to_thread(current);
flush_spe_to_thread(current);
}
@@ -689,8 +790,13 @@ void start_thread(struct pt_regs *regs,
#endif
discard_lazy_cpu_state();
+#ifdef CONFIG_VSX
+ memset(current->thread.fpvsr.vsr, 0, sizeof(current->thread.fpvsr.vsr));
+ current->thread.used_vsr = 0;
+#else
memset(current->thread.TS_FPRSTART, 0,
sizeof(current->thread.TS_FPRSTART));
+#endif /* CONFIG_VSX */
current->thread.fpscr.val = 0;
#ifdef CONFIG_ALTIVEC
memset(current->thread.vr, 0, sizeof(current->thread.vr));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -351,6 +351,51 @@ static int vr_set(struct task_struct *ta
}
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+/*
+ * Currently, to set and get all the VSX state, you need to make
+ * the FP and VMX calls as well. This only gets/sets the lower 32
+ * 128-bit VSX registers.
+ */
+
+static int vsr_active(struct task_struct *target,
+ const struct user_regset *regset)
+{
+ flush_vsx_to_thread(target);
+ return target->thread.used_vsr ? regset->n : 0;
+}
+
+static int vsr_get(struct task_struct *target, const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ void *kbuf, void __user *ubuf)
+{
+ int ret;
+
+ flush_vsx_to_thread(target);
+
+ ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+ &target->thread.fpvsr.vsr, 0,
+ 32 * sizeof(vector128));
+
+ return ret;
+}
+
+static int vsr_set(struct task_struct *target, const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ const void *kbuf, const void __user *ubuf)
+{
+ int ret;
+
+ flush_vsx_to_thread(target);
+
+ ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+ &target->thread.fpvsr.vsr, 0,
+ 32 * sizeof(vector128));
+
+ return ret;
+}
+#endif /* CONFIG_VSX */
+
#ifdef CONFIG_SPE
/*
@@ -427,6 +472,9 @@ enum powerpc_regset {
#ifdef CONFIG_ALTIVEC
REGSET_VMX,
#endif
+#ifdef CONFIG_VSX
+ REGSET_VSX,
+#endif
#ifdef CONFIG_SPE
REGSET_SPE,
#endif
@@ -450,6 +498,13 @@ static const struct user_regset native_r
.active = vr_active, .get = vr_get, .set = vr_set
},
#endif
+#ifdef CONFIG_VSX
+ [REGSET_VSX] = {
+ .core_note_type = NT_PPC_VSX, .n = 34,
+ .size = sizeof(vector128), .align = sizeof(vector128),
+ .active = vsr_active, .get = vsr_get, .set = vsr_set
+ },
+#endif
#ifdef CONFIG_SPE
[REGSET_SPE] = {
.n = 35,
@@ -850,6 +905,21 @@ long arch_ptrace(struct task_struct *chi
sizeof(u32)),
(const void __user *) data);
#endif
+#ifdef CONFIG_VSX
+ case PTRACE_GETVSRREGS:
+ return copy_regset_to_user(child, &user_ppc_native_view,
+ REGSET_VSX,
+ 0, (32 * sizeof(vector128) +
+ sizeof(u32)),
+ (void __user *) data);
+
+ case PTRACE_SETVSRREGS:
+ return copy_regset_from_user(child, &user_ppc_native_view,
+ REGSET_VSX,
+ 0, (32 * sizeof(vector128) +
+ sizeof(u32)),
+ (const void __user *) data);
+#endif
#ifdef CONFIG_SPE
case PTRACE_GETEVRREGS:
/* Get the child spe register state. */
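[Editor's aside, not part of the patch: the length passed to copy_regset_to_user() above — 32 full 128-bit VSRs plus one u32 — fixes the buffer size user space must supply for the new PTRACE_GETVSRREGS request. A minimal sketch, assuming only the sizes quoted in the patch; the helper name is illustrative:]

```c
#include <stddef.h>
#include <stdint.h>

/* Sizes taken from the patch above: 32 VSRs of 16 bytes each,
 * followed by a u32, matching the 32 * sizeof(vector128) +
 * sizeof(u32) length used in arch_ptrace(). */
enum { NVSRREG = 32, VSR_SIZE = 16 };

size_t vsrregs_buf_size(void)
{
    return NVSRREG * VSR_SIZE + sizeof(uint32_t);
}
```

[A caller would allocate at least this many bytes before issuing the request, since a shorter buffer would truncate the regset copy.]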
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -379,6 +379,21 @@ static int save_user_regs(struct pt_regs
if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
return 1;
+ /*
+ * Copy VSR 0-31 upper half from thread_struct to local
+ * buffer, then write that to userspace. Also set MSR_VSX in
+ * the saved MSR value to indicate that frame->mc_vsregs
+ * contains valid data
+ */
+ if (current->thread.used_vsr) {
+ flush_vsx_to_thread(current);
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.fpvsr.fp[i].vsrlow;
+ if (__copy_to_user(&frame->mc_vsregs, buf,
+ ELF_NVSRHALFREG * sizeof(double)))
+ return 1;
+ msr |= MSR_VSX;
+ }
#else
/* save floating-point registers */
if (__copy_to_user(&frame->mc_fregs, current->thread.TS_FPRSTART,
@@ -484,6 +499,24 @@ static long restore_user_regs(struct pt_
current->thread.TS_FPR(i) = buf[i];
memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+ /*
+ * Force the process to reload the VSX registers from
+ * current->thread when it next does VSX instruction.
+ */
+ regs->msr &= ~MSR_VSX;
+ if (msr & MSR_VSX) {
+ /*
+ * Restore VSX registers from the stack to a local
+ * buffer, then write this out to the thread_struct
+ */
+ if (__copy_from_user(buf, &sr->mc_vsregs,
+ sizeof(sr->mc_vsregs)))
+ return 1;
+ for (i = 0; i < 32 ; i++)
+ current->thread.fpvsr.fp[i].vsrlow = buf[i];
+ } else if (current->thread.used_vsr)
+ for (i = 0; i < 32 ; i++)
+ current->thread.fpvsr.fp[i].vsrlow = 0;
#else
if (__copy_from_user(current->thread.TS_FPRSTART, &sr->mc_fregs,
sizeof(sr->mc_fregs)))
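[Editor's aside, not part of the patch: the restore path above has three cases for VSR doubleword 1 — reload it from the frame when MSR_VSX was set in the saved MSR, zero it if the task has used VSX but the frame carries no VSX state, and leave it alone otherwise. A sketch of that logic with simplified stand-in types; the helper is hypothetical, not kernel code:]

```c
#define NVSRHALF 32  /* ELF_NVSRHALFREG in the patch */

/* Mirrors the restore_user_regs() VSX branch quoted above. */
void restore_vsrlow(double *thread_vsrlow, const double *frame_buf,
                    int msr_had_vsx, int used_vsr)
{
    int i;

    if (msr_had_vsx) {
        /* Signal frame carries valid VSX state: copy it in. */
        for (i = 0; i < NVSRHALF; i++)
            thread_vsrlow[i] = frame_buf[i];
    } else if (used_vsr) {
        /* Task has used VSX but the frame has none: clear state
         * so stale values cannot leak back into the task. */
        for (i = 0; i < NVSRHALF; i++)
            thread_vsrlow[i] = 0;
    }
}
```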
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -123,6 +123,22 @@ static long setup_sigcontext(struct sigc
buf[i] = current->thread.TS_FPR(i);
memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+ /*
+ * Copy VSX low doubleword to local buffer for formatting,
+ * then out to userspace. Update v_regs to point after the
+ * VMX data.
+ */
+ if (current->thread.used_vsr) {
+ flush_vsx_to_thread(current);
+ v_regs += ELF_NVRREG;
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.fpvsr.fp[i].vsrlow;
+ err |= __copy_to_user(v_regs, buf, 32 * sizeof(double));
+ /* set MSR_VSX in the MSR value in the frame to
+ * indicate that sc->vs_regs contains valid data.
+ */
+ msr |= MSR_VSX;
+ }
#else /* CONFIG_VSX */
/* copy fpr regs and fpscr */
err |= __copy_to_user(&sc->fp_regs, &current->thread.TS_FPR(0),
@@ -199,7 +215,7 @@ static long restore_sigcontext(struct pt
* current->thread.TS_FPR/vr for the reasons explained in the
* previous comment.
*/
- regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
+ regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC | MSR_VSX);
#ifdef CONFIG_ALTIVEC
err |= __get_user(v_regs, &sc->v_regs);
@@ -228,6 +244,19 @@ static long restore_sigcontext(struct pt
current->thread.TS_FPR(i) = buf[i];
memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+ /*
+ * Get additional VSX data. Update v_regs to point after the
+ * VMX data. Copy VSX low doubleword from userspace to local
+ * buffer for formatting, then into the taskstruct.
+ */
+ v_regs += ELF_NVRREG;
+ if ((msr & MSR_VSX) != 0)
+ err |= __copy_from_user(buf, v_regs, 32 * sizeof(double));
+ else
+ memset(buf, 0, 32 * sizeof(double));
+
+ for (i = 0; i < 32 ; i++)
+ current->thread.fpvsr.fp[i].vsrlow = buf[i];
#else
err |= __copy_from_user(&current->thread.TS_FPRSTART, &sc->fp_regs,
FP_REGS_SIZE);
Index: linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/traps.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
@@ -967,6 +967,20 @@ void altivec_unavailable_exception(struc
die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT);
}
+void vsx_unavailable_exception(struct pt_regs *regs)
+{
+ if (user_mode(regs)) {
+ /* A user program has executed a VSX instruction,
+ but this kernel doesn't support VSX. */
+ _exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+ return;
+ }
+
+ printk(KERN_EMERG "Unrecoverable VSX Unavailable Exception "
+ "%lx at %lx\n", regs->trap, regs->nip);
+ die("Unrecoverable VSX Unavailable Exception", regs, SIGABRT);
+}
+
void performance_monitor_exception(struct pt_regs *regs)
{
perf_irq(regs);
@@ -1091,6 +1105,21 @@ void altivec_assist_exception(struct pt_
}
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+void vsx_assist_exception(struct pt_regs *regs)
+{
+ if (!user_mode(regs)) {
+ printk(KERN_EMERG "VSX assist exception in kernel mode"
+ " at %lx\n", regs->nip);
+ die("Kernel VSX assist exception", regs, SIGILL);
+ }
+
+ flush_vsx_to_thread(current);
+ printk(KERN_INFO "VSX assist not supported at %lx\n", regs->nip);
+ _exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+}
+#endif /* CONFIG_VSX */
+
#ifdef CONFIG_FSL_BOOKE
void CacheLockingException(struct pt_regs *regs, unsigned long address,
unsigned long error_code)
Index: linux-2.6-ozlabs/include/asm-powerpc/elf.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/elf.h
+++ linux-2.6-ozlabs/include/asm-powerpc/elf.h
@@ -109,6 +109,7 @@ typedef elf_gregset_t32 compat_elf_gregs
#ifdef __powerpc64__
# define ELF_NVRREG32 33 /* includes vscr & vrsave stuffed together */
# define ELF_NVRREG 34 /* includes vscr & vrsave in split vectors */
+# define ELF_NVSRHALFREG 32 /* Half the vsx registers */
# define ELF_GREG_TYPE elf_greg_t64
#else
# define ELF_NEVRREG 34 /* includes acc (as 2) */
@@ -158,6 +159,7 @@ typedef __vector128 elf_vrreg_t;
typedef elf_vrreg_t elf_vrregset_t[ELF_NVRREG];
#ifdef __powerpc64__
typedef elf_vrreg_t elf_vrregset_t32[ELF_NVRREG32];
+typedef elf_fpreg_t elf_vsrreghalf_t32[ELF_NVSRHALFREG];
#endif
#ifdef __KERNEL__
@@ -219,8 +221,8 @@ extern int dump_task_fpu(struct task_str
typedef elf_vrregset_t elf_fpxregset_t;
#ifdef CONFIG_ALTIVEC
-extern int dump_task_altivec(struct task_struct *, elf_vrregset_t *vrregs);
-#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_altivec(tsk, regs)
+extern int dump_task_vector(struct task_struct *, elf_vrregset_t *vrregs);
+#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_vector(tsk, regs)
#define ELF_CORE_XFPREG_TYPE NT_PPC_VMX
#endif
Index: linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ptrace.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
@@ -223,6 +223,14 @@ extern void user_disable_single_step(str
#define PT_VRSAVE_32 (PT_VR0 + 33*4)
#endif
+/*
+ * Only store the first 32 VSRs here; the second 32 VSRs overlap VR0-31.
+ */
+#define PT_VSR0 150 /* each VSR reg occupies 2 slots in 64-bit */
+#define PT_VSR31 (PT_VSR0 + 2*31)
+#ifdef __KERNEL__
+#define PT_VSR0_32 300 /* each VSR reg occupies 4 slots in 32-bit */
+#endif
#endif /* __powerpc64__ */
/*
@@ -245,6 +253,10 @@ extern void user_disable_single_step(str
#define PTRACE_GETEVRREGS 20
#define PTRACE_SETEVRREGS 21
+/* Get the first 32 128bit VSX registers */
+#define PTRACE_GETVSRREGS 27
+#define PTRACE_SETVSRREGS 28
+
/*
* Get or set a debug register. The first 16 are DABR registers and the
* second 16 are IABR registers.
Index: linux-2.6-ozlabs/include/asm-powerpc/reg.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/reg.h
+++ linux-2.6-ozlabs/include/asm-powerpc/reg.h
@@ -30,6 +30,7 @@
#define MSR_ISF_LG 61 /* Interrupt 64b mode valid on 630 */
#define MSR_HV_LG 60 /* Hypervisor state */
#define MSR_VEC_LG 25 /* Enable AltiVec */
+#define MSR_VSX_LG 23 /* Enable VSX */
#define MSR_POW_LG 18 /* Enable Power Management */
#define MSR_WE_LG 18 /* Wait State Enable */
#define MSR_TGPR_LG 17 /* TLB Update registers in use */
@@ -71,6 +72,7 @@
#endif
#define MSR_VEC __MASK(MSR_VEC_LG) /* Enable AltiVec */
+#define MSR_VSX __MASK(MSR_VSX_LG) /* Enable VSX */
#define MSR_POW __MASK(MSR_POW_LG) /* Enable Power Management */
#define MSR_WE __MASK(MSR_WE_LG) /* Wait State Enable */
#define MSR_TGPR __MASK(MSR_TGPR_LG) /* TLB Update registers in use */
Index: linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/sigcontext.h
+++ linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
@@ -43,9 +43,44 @@ struct sigcontext {
* it must be copied via a vector register to/from storage) or as a word.
* The entry with index 33 contains the vrsave as the first word (offset 0)
* within the quadword.
+ *
+ * Part of the VSX data is stored here also by extending vmx_restore
+ * by an additional 32 double words. Architecturally the layout of
+ * the VSR registers and how they overlap on top of the legacy FPR and
+ * VR registers is shown below:
+ *
+ * VSR doubleword 0 VSR doubleword 1
+ * ----------------------------------------------------------------
+ * VSR[0] | FPR[0] | |
+ * ----------------------------------------------------------------
+ * VSR[1] | FPR[1] | |
+ * ----------------------------------------------------------------
+ * | ... | |
+ * | ... | |
+ * ----------------------------------------------------------------
+ * VSR[30] | FPR[30] | |
+ * ----------------------------------------------------------------
+ * VSR[31] | FPR[31] | |
+ * ----------------------------------------------------------------
+ * VSR[32] | VR[0] |
+ * ----------------------------------------------------------------
+ * VSR[33] | VR[1] |
+ * ----------------------------------------------------------------
+ * | ... |
+ * | ... |
+ * ----------------------------------------------------------------
+ * VSR[62] | VR[30] |
+ * ----------------------------------------------------------------
+ * VSR[63] | VR[31] |
+ * ----------------------------------------------------------------
+ *
+ * FPR/VSR 0-31 doubleword 0 is stored in fp_regs, and VMX/VSR 32-63
+ * is stored at the start of vmx_reserve. vmx_reserve is extended for
+ * backwards compatibility to store VSR 0-31 doubleword 1 after the VMX
+ * registers and vscr/vrsave.
*/
elf_vrreg_t __user *v_regs;
- long vmx_reserve[ELF_NVRREG+ELF_NVRREG+1];
+ long vmx_reserve[ELF_NVRREG+ELF_NVRREG+32+1];
#endif
};
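[Editor's aside, not part of the patch: the doubleword overlap documented in the comment above can be modeled in plain C. This is a sketch only — the vector128 stand-in and union name are illustrative, not the kernel's types — but it shows why FPR i and VSR i doubleword 0 are the same storage:]

```c
#include <string.h>

/* Illustrative stand-in for the kernel's vector128 type. */
typedef struct { unsigned long long dw[2]; } vector128;

/* One entry per VSR 0-31: the legacy FPR aliases doubleword 0. */
union fpvsr {
    struct {
        double fpr;     /* VSR doubleword 0 == legacy FPR */
        double vsrlow;  /* VSR doubleword 1 */
    } fp[32];
    vector128 vsr[32];
};

/* Writing FPR 3 lands in the first 8 bytes of VSR 3, and the
 * union occupies exactly 32 x 16 bytes. */
int fpr_aliases_vsr_dw0(void)
{
    union fpvsr u;
    double d;

    memset(&u, 0, sizeof(u));
    u.fp[3].fpr = 1.0;
    memcpy(&d, &u.vsr[3].dw[0], sizeof(d));
    return sizeof(union fpvsr) == 32 * 16 && d == 1.0;
}
```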
Index: linux-2.6-ozlabs/include/asm-powerpc/system.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/system.h
+++ linux-2.6-ozlabs/include/asm-powerpc/system.h
@@ -132,6 +132,7 @@ extern void enable_kernel_altivec(void);
extern void giveup_altivec(struct task_struct *);
extern void load_up_altivec(struct task_struct *);
extern int emulate_altivec(struct pt_regs *);
+extern void giveup_vsx(struct task_struct *);
extern void enable_kernel_spe(void);
extern void giveup_spe(struct task_struct *);
extern void load_up_spe(struct task_struct *);
@@ -155,6 +156,14 @@ static inline void flush_altivec_to_thre
}
#endif
+#ifdef CONFIG_VSX
+extern void flush_vsx_to_thread(struct task_struct *);
+#else
+static inline void flush_vsx_to_thread(struct task_struct *t)
+{
+}
+#endif
+
#ifdef CONFIG_SPE
extern void flush_spe_to_thread(struct task_struct *);
#else
Index: linux-2.6-ozlabs/include/linux/elf.h
===================================================================
--- linux-2.6-ozlabs.orig/include/linux/elf.h
+++ linux-2.6-ozlabs/include/linux/elf.h
@@ -358,6 +358,7 @@ typedef struct elf64_shdr {
#define NT_PRXFPREG 0x46e62b7f /* copied from gdb5.1/include/elf/common.h */
#define NT_PPC_VMX 0x100 /* PowerPC Altivec/VMX registers */
#define NT_PPC_SPE 0x101 /* PowerPC SPE/EVR registers */
+#define NT_PPC_VSX 0x102 /* PowerPC VSX registers */
#define NT_386_TLS 0x200 /* i386 TLS slots (struct user_desc) */
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 9/9] powerpc: Add CONFIG_VSX config option
2008-06-18 0:47 [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (7 preceding siblings ...)
2008-06-18 0:47 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
@ 2008-06-18 0:47 ` Michael Neuling
2008-06-18 13:05 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Kumar Gala
2008-06-20 4:13 ` Michael Neuling
10 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-18 0:47 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Add the CONFIG_VSX build option. It must be compiled with POWER4, FPU and ALTIVEC support.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/platforms/Kconfig.cputype | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
Index: linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/platforms/Kconfig.cputype
+++ linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
@@ -171,6 +171,22 @@ config VSX
If in doubt, say Y here.
+config VSX
+ bool "VSX Support"
+ depends on POWER4 && ALTIVEC && PPC_FPU
+ ---help---
+
+ This option enables kernel support for the Vector Scalar Extensions (VSX)
+ to the PowerPC processor. The kernel currently supports saving and
+ restoring VSX registers, and turning on the 'VSX enable' bit so user
+ processes can execute VSX instructions.
+
+ This option is only useful if you have a processor that supports
+ VSX (POWER7 and above), but it has no effect on non-VSX
+ CPUs (it does, however, add code to the kernel).
+
+ If in doubt, say Y here.
+
config SPE
bool "SPE Support"
depends on E200 || E500
^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX.
2008-06-18 0:47 [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (8 preceding siblings ...)
2008-06-18 0:47 ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
@ 2008-06-18 13:05 ` Kumar Gala
2008-06-18 23:54 ` Michael Neuling
2008-06-20 4:13 ` Michael Neuling
10 siblings, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-18 13:05 UTC (permalink / raw)
To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras
On Jun 17, 2008, at 7:47 PM, Michael Neuling wrote:
> The following set of patches adds Vector Scalar Extensions (VSX)
> support for POWER7. Includes context switch, ptrace and signals
> support.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
> This series is on top of the POWER7 cputable entry patch.
>
> Paulus: please consider for your 2.6.27 tree.
A bit better explanation of what VSX is would be useful. It's not clear
to me exactly how these instructions behave such that we have to touch
all this code.
- k
^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
2008-06-18 0:47 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
@ 2008-06-18 14:53 ` Kumar Gala
2008-06-18 23:55 ` Michael Neuling
0 siblings, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-18 14:53 UTC (permalink / raw)
To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras
On Jun 17, 2008, at 7:47 PM, Michael Neuling wrote:
> If we set the SPE MSR bit in save_user_regs we can blow away the VEC
> bit. This will never happen in reality, but it looks bad.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
>
> arch/powerpc/kernel/signal_32.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
probably worth commenting on why this will never happen.
- k
^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH 6/9] powerpc: Add VSX CPU feature
2008-06-18 0:47 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
@ 2008-06-18 16:28 ` Joel Schopp
2008-06-19 6:51 ` David Woodhouse
1 sibling, 0 replies; 106+ messages in thread
From: Joel Schopp @ 2008-06-18 16:28 UTC (permalink / raw)
To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras
A couple of these lines originated with me.
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
Michael Neuling wrote:
> Add a VSX CPU feature. Also add code to detect if VSX is available
> from the device tree.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
>
> arch/powerpc/kernel/prom.c | 3 +++
> include/asm-powerpc/cputable.h | 13 +++++++++++++
> 2 files changed, 16 insertions(+)
>
> Index: linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
> ===================================================================
> --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/prom.c
> +++ linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
> @@ -609,6 +609,9 @@ static struct feature_property {
> {"altivec", 0, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
> {"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
> #endif /* CONFIG_ALTIVEC */
> +#ifdef CONFIG_VSX
> + {"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
> +#endif /* CONFIG_VSX */
> #ifdef CONFIG_PPC64
> {"ibm,dfp", 1, 0, PPC_FEATURE_HAS_DFP},
> {"ibm,purr", 1, CPU_FTR_PURR, 0},
> Index: linux-2.6-ozlabs/include/asm-powerpc/cputable.h
> ===================================================================
> --- linux-2.6-ozlabs.orig/include/asm-powerpc/cputable.h
> +++ linux-2.6-ozlabs/include/asm-powerpc/cputable.h
> @@ -27,6 +27,7 @@
> #define PPC_FEATURE_HAS_DFP 0x00000400
> #define PPC_FEATURE_POWER6_EXT 0x00000200
> #define PPC_FEATURE_ARCH_2_06 0x00000100
> +#define PPC_FEATURE_HAS_VSX 0x00000080
>
> #define PPC_FEATURE_TRUE_LE 0x00000002
> #define PPC_FEATURE_PPC_LE 0x00000001
> @@ -181,6 +182,7 @@ extern void do_feature_fixups(unsigned l
> #define CPU_FTR_DSCR LONG_ASM_CONST(0x0002000000000000)
> #define CPU_FTR_1T_SEGMENT LONG_ASM_CONST(0x0004000000000000)
> #define CPU_FTR_NO_SLBIE_B LONG_ASM_CONST(0x0008000000000000)
> +#define CPU_FTR_VSX LONG_ASM_CONST(0x0010000000000000)
>
> #ifndef __ASSEMBLY__
>
> @@ -199,6 +201,17 @@ extern void do_feature_fixups(unsigned l
> #define PPC_FEATURE_HAS_ALTIVEC_COMP 0
> #endif
>
> +/* We only set the VSX features if the kernel was compiled with VSX
> + * support
> + */
> +#ifdef CONFIG_VSX
> +#define CPU_FTR_VSX_COMP CPU_FTR_VSX
> +#define PPC_FEATURE_HAS_VSX_COMP PPC_FEATURE_HAS_VSX
> +#else
> +#define CPU_FTR_VSX_COMP 0
> +#define PPC_FEATURE_HAS_VSX_COMP 0
> +#endif
> +
> /* We only set the spe features if the kernel was compiled with spe
> * support
> */
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@ozlabs.org
> https://ozlabs.org/mailman/listinfo/linuxppc-dev
>
^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
2008-06-18 0:47 ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
@ 2008-06-18 19:35 ` Kumar Gala
2008-06-18 22:58 ` Paul Mackerras
2008-06-19 4:22 ` Kumar Gala
1 sibling, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-18 19:35 UTC (permalink / raw)
To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras
On Jun 17, 2008, at 7:47 PM, Michael Neuling wrote:
> The layout of the new VSR registers and how they overlap on top of the
> legacy FPR and VR registers is:
>
> VSR doubleword 0 VSR doubleword 1
> ----------------------------------------------------------------
> VSR[0] | FPR[0] | |
> ----------------------------------------------------------------
> VSR[1] | FPR[1] | |
> ----------------------------------------------------------------
> | ... | |
> | ... | |
> ----------------------------------------------------------------
> VSR[30] | FPR[30] | |
> ----------------------------------------------------------------
> VSR[31] | FPR[31] | |
> ----------------------------------------------------------------
> VSR[32] | VR[0] |
> ----------------------------------------------------------------
> VSR[33] | VR[1] |
> ----------------------------------------------------------------
> | ... |
> | ... |
> ----------------------------------------------------------------
> VSR[62] | VR[30] |
> ----------------------------------------------------------------
> VSR[63] | VR[31] |
> ----------------------------------------------------------------
>
> VSX has 64 128bit registers. The first 32 regs overlap with the FP
> registers and hence extend them with an additional 64 bits. The
> second 32 regs overlap with the VMX registers.
>
> This patch introduces the thread_struct changes required to reflect
> this register layout. Ptrace and signals code is updated so that the
> floating point registers are correctly accessed from the thread_struct
> when CONFIG_VSX is enabled.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
Is VSX mutually exclusive with altivec/fp? is there a MSR bit for it?
- k
^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
2008-06-18 19:35 ` Kumar Gala
@ 2008-06-18 22:58 ` Paul Mackerras
2008-06-19 4:13 ` Kumar Gala
0 siblings, 1 reply; 106+ messages in thread
From: Paul Mackerras @ 2008-06-18 22:58 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-dev, Michael Neuling
Kumar Gala writes:
> Is VSX mutually exclusive with altivec/fp? is there a MSR bit for it?
It's not exclusive, it's an extension of altivec/fp, and yes it has
its own MSR bit to enable it.
Paul.
^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX.
2008-06-18 13:05 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Kumar Gala
@ 2008-06-18 23:54 ` Michael Neuling
0 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-18 23:54 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-dev, Paul Mackerras
> On Jun 17, 2008, at 7:47 PM, Michael Neuling wrote:
>
> > The following set of patches adds Vector Scalar Extensions (VSX)
> > support for POWER7. Includes context switch, ptrace and signals
> > support.
> >
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > ---
> > This series is on top of the POWER7 cputable entry patch.
> >
> > Paulus: please consider for your 2.6.27 tree.
>
> A bit better explanation of what VSX is would be useful. It's not clear
> to me exactly how these instructions behave such that we have to touch
> all this code.
There is a register layout description which it looks like you found at
the top of patch 5.
Mikey
^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
2008-06-18 14:53 ` Kumar Gala
@ 2008-06-18 23:55 ` Michael Neuling
0 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-18 23:55 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-dev, Paul Mackerras
In message <DB1B686B-FE98-486B-B345-D18408C51135@kernel.crashing.org> you wrote:
>
> On Jun 17, 2008, at 7:47 PM, Michael Neuling wrote:
>
> > If we set the SPE MSR bit in save_user_regs we can blow away the VEC
> > bit. This will never happen in reality, but it looks bad.
> >
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > ---
> >
> > arch/powerpc/kernel/signal_32.c | 10 ++++++----
> > 1 file changed, 6 insertions(+), 4 deletions(-)
>
> probably worth commenting on why this will never happen.
Ok, I'll update the comments.
Mikey
^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
2008-06-18 22:58 ` Paul Mackerras
@ 2008-06-19 4:13 ` Kumar Gala
2008-06-19 4:30 ` Michael Neuling
0 siblings, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-19 4:13 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev, Michael Neuling
On Jun 18, 2008, at 5:58 PM, Paul Mackerras wrote:
> Kumar Gala writes:
>
>> Is VSX mutually exclusive with altivec/fp? is there a MSR bit for
>> it?
>
> It's not exclusive, it's an extension of altivec/fp, and yes it has
> its own MSR bit to enable it.
What MSR bit does it use? I'm not seeing the code add or test a new
MSR bit anywhere.
What exactly do you mean by its an extension of altivec/fp? Are the
instructions considered part of altivec/fp or is it just reusing the
register storage like SPE?
- k
^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
2008-06-18 0:47 ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
2008-06-18 19:35 ` Kumar Gala
@ 2008-06-19 4:22 ` Kumar Gala
2008-06-19 4:35 ` Michael Neuling
1 sibling, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-19 4:22 UTC (permalink / raw)
To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras
>
>
> Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
> ===================================================================
> --- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
> +++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
> @@ -78,6 +78,7 @@ extern long kernel_thread(int (*fn)(void
> /* Lazy FPU handling on uni-processor */
> extern struct task_struct *last_task_used_math;
> extern struct task_struct *last_task_used_altivec;
> +extern struct task_struct *last_task_used_vsx;
> extern struct task_struct *last_task_used_spe;
>
> #ifdef CONFIG_PPC32
> @@ -136,8 +137,13 @@ typedef struct {
> unsigned long seg;
> } mm_segment_t;
>
> +#ifdef CONFIG_VSX
> +#define TS_FPR(i) fpvsr.fp[i].fpr
> +#define TS_FPRSTART fpvsr.fp
> +#else
> #define TS_FPR(i) fpr[i]
> #define TS_FPRSTART fpr
> +#endif
>
> struct thread_struct {
> unsigned long ksp; /* Kernel stack pointer */
> @@ -155,8 +161,19 @@ struct thread_struct {
> unsigned long dbcr0; /* debug control register values */
> unsigned long dbcr1;
> #endif
> +#ifdef CONFIG_VSX
> + /* First 32 VSX registers (overlap with fpr[32]) */
> + union {
> + struct {
> + double fpr;
> + double vsrlow;
> + } fp[32];
> + vector128 vsr[32];
> + } fpvsr __attribute__((aligned(16)));
Do we really need a union here? What would happen if you just changed
the type of fpr[32] from double to vector if CONFIG_VSX is set?
I really don't like the union and think we can just make the storage
look opaque, which is the key. I doubt we ever really care about
using fpr[] as a double in the kernel.
Also, the attribute is redundant, vector is already aligned(16).
> +#else
> double fpr[32]; /* Complete floating point set */
> - struct { /* fpr ... fpscr must be contiguous */
> +#endif
> + struct {
>
> unsigned int pad;
> unsigned int val; /* Floating point status */
> @@ -176,6 +193,10 @@ struct thread_struct {
> unsigned long vrsave;
> int used_vr; /* set if process has used altivec */
> #endif /* CONFIG_ALTIVEC */
> +#ifdef CONFIG_VSX
> + /* VSR status */
> > + int used_vsr; /* set if process has used VSX */
> +#endif /* CONFIG_VSX */
> #ifdef CONFIG_SPE
> unsigned long evr[32]; /* upper 32-bits of SPE regs */
> u64 acc; /* Accumulator */
> @@ -200,7 +221,11 @@ struct thread_struct {
> .fpexc_mode = MSR_FE0 | MSR_FE1, \
> }
> #else
> +#ifdef CONFIG_VSX
> +#define FPVSR_INIT_THREAD .fpvsr = { .vsr = 0, }
> +#else
> #define FPVSR_INIT_THREAD .fpr = {0}
> +#endif
> #define INIT_THREAD { \
> .ksp = INIT_SP, \
> .ksp_limit = INIT_SP_LIMIT, \
> @@ -293,5 +318,9 @@ static inline void prefetchw(const void
>
> #endif /* __KERNEL__ */
> #endif /* __ASSEMBLY__ */
> +#ifdef CONFIG_VSX
> +#define TS_FPRSPACING 2
> +#else
> #define TS_FPRSPACING 1
> +#endif
> #endif /* _ASM_POWERPC_PROCESSOR_H */
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@ozlabs.org
> https://ozlabs.org/mailman/listinfo/linuxppc-dev
^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
2008-06-19 4:13 ` Kumar Gala
@ 2008-06-19 4:30 ` Michael Neuling
0 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-19 4:30 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-dev, Paul Mackerras
In message <C780D687-D505-4A01-BED8-9866F4D0160A@kernel.crashing.org> you wrote:
>
> On Jun 18, 2008, at 5:58 PM, Paul Mackerras wrote:
>
> > Kumar Gala writes:
> >
> >> Is VSX mutually exclusive with altivec/fp? is there a MSR bit for
> >> it?
> >
> > It's not exclusive, it's an extension of altivec/fp, and yes it has
> > its own MSR bit to enable it.
>
> what MSR bit does it use... I'm not seeing the code add or test a new
> MSR bit anywhere.
It's introduced in patch 8.
#define MSR_VEC_LG 25 /* Enable AltiVec */
+#define MSR_VSX_LG 23 /* Enable VSX */
#define MSR_POW_LG 18 /* Enable Power Management */
> What exactly do you mean by its an extension of altivec/fp? Are the
> instructions considered part of altivec/fp or is it just reusing the
> register storage like SPE?
VSX is considered separate instructions, but it uses the same
architected registers as FP and VMX.
i.e. if you execute a VSX instruction which touches VSX register 0, it'll
change FP register 0 (and vice versa).
Also, if you execute a VSX instruction which touches VSX register 32, it'll
change VMX register 0 (and vice versa). In fact, for this patch we use
the 128bit VMX load/stores to perform the context save/restore on the
VSX registers 32-63.
I guess in theory you could have VSX without FP and VMX, but this patch
assumes you have FP and VMX if you have VSX.
This set of patches should allow any crazy mix of FP, VMX and VSX code
and the architected state should be context switched correctly.
Sorry, I'm not familiar with how SPE works, so I can't comment on its
relevance.
Mikey
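[Editor's footnote to the MSR discussion above, not part of the thread: the reg.h pattern quoted earlier derives each mask from its _LG bit number with __MASK(), so the combined clear in restore_sigcontext() is a single AND. A sketch using the bit numbers from the patch; MSR_FP_LG is taken from the existing reg.h:]

```c
#define __MASK(X) (1UL << (X))
#define MSR_VEC_LG 25           /* from the patch context */
#define MSR_VSX_LG 23           /* added by the patch */
#define MSR_FP_LG  13           /* existing reg.h value */

#define MSR_VEC __MASK(MSR_VEC_LG)
#define MSR_VSX __MASK(MSR_VSX_LG)
#define MSR_FP  __MASK(MSR_FP_LG)

/* What restore_sigcontext() does to force a lazy reload of all
 * three facilities on their next use. */
unsigned long clear_fp_vec_vsx(unsigned long msr)
{
    return msr & ~(MSR_FP | MSR_VEC | MSR_VSX);
}
```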
^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
2008-06-19 4:22 ` Kumar Gala
@ 2008-06-19 4:35 ` Michael Neuling
2008-06-19 4:58 ` Kumar Gala
0 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-19 4:35 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-dev, Paul Mackerras
In message <5AEB0769-1394-4924-803D-C40CAF685519@kernel.crashing.org> you wrote:
> >
> >
> > Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
> > ===================================================================
> > --- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
> > +++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
> > @@ -78,6 +78,7 @@ extern long kernel_thread(int (*fn)(void
> > /* Lazy FPU handling on uni-processor */
> > extern struct task_struct *last_task_used_math;
> > extern struct task_struct *last_task_used_altivec;
> > +extern struct task_struct *last_task_used_vsx;
> > extern struct task_struct *last_task_used_spe;
> >
> > #ifdef CONFIG_PPC32
> > @@ -136,8 +137,13 @@ typedef struct {
> > unsigned long seg;
> > } mm_segment_t;
> >
> > +#ifdef CONFIG_VSX
> > +#define TS_FPR(i) fpvsr.fp[i].fpr
> > +#define TS_FPRSTART fpvsr.fp
> > +#else
> > #define TS_FPR(i) fpr[i]
> > #define TS_FPRSTART fpr
> > +#endif
> >
> > struct thread_struct {
> > unsigned long ksp; /* Kernel stack pointer */
> > @@ -155,8 +161,19 @@ struct thread_struct {
> > unsigned long dbcr0; /* debug control register values */
> > unsigned long dbcr1;
> > #endif
> > +#ifdef CONFIG_VSX
> > + /* First 32 VSX registers (overlap with fpr[32]) */
> > + union {
> > + struct {
> > + double fpr;
> > + double vsrlow;
> > + } fp[32];
> > + vector128 vsr[32];
> > + } fpvsr __attribute__((aligned(16)));
>
> Do we really need a union here? what would happen if you just changed
> the type of fpr[32] from double to vector if #CONFIG_VSX?
>
> I really dont like the union and think we can just make the storage
> look opaque which is the key. I doubt we every really care about
> using fpr[] as a double in the kernel.
I did something similar to this for the first cut of this patch, but it
made the code accessing this structure much less readable.
Personally, I think the union is good as it represents the true
structure of what it's storing.
> Also, the attribute is redundant, vector is already aligned(16).
Ok, I'll remove.
Mikey
>
> > +#else
> > double fpr[32]; /* Complete floating point set */
> > - struct { /* fpr ... fpscr must be contiguous */
> > +#endif
> > + struct {
> >
> > unsigned int pad;
> > unsigned int val; /* Floating point status */
> > @@ -176,6 +193,10 @@ struct thread_struct {
> > unsigned long vrsave;
> > int used_vr; /* set if process has used altivec */
> > #endif /* CONFIG_ALTIVEC */
> > +#ifdef CONFIG_VSX
> > + /* VSR status */
> > + int used_vsr; /* set if process has used altivec */
> > +#endif /* CONFIG_VSX */
> > #ifdef CONFIG_SPE
> > unsigned long evr[32]; /* upper 32-bits of SPE regs */
> > u64 acc; /* Accumulator */
> > @@ -200,7 +221,11 @@ struct thread_struct {
> > .fpexc_mode = MSR_FE0 | MSR_FE1, \
> > }
> > #else
> > +#ifdef CONFIG_VSX
> > +#define FPVSR_INIT_THREAD .fpvsr = { .vsr = 0, }
> > +#else
> > #define FPVSR_INIT_THREAD .fpr = {0}
> > +#endif
> > #define INIT_THREAD { \
> > .ksp = INIT_SP, \
> > .ksp_limit = INIT_SP_LIMIT, \
> > @@ -293,5 +318,9 @@ static inline void prefetchw(const void
> >
> > #endif /* __KERNEL__ */
> > #endif /* __ASSEMBLY__ */
> > +#ifdef CONFIG_VSX
> > +#define TS_FPRSPACING 2
> > +#else
> > #define TS_FPRSPACING 1
> > +#endif
> > #endif /* _ASM_POWERPC_PROCESSOR_H */
* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
2008-06-19 4:35 ` Michael Neuling
@ 2008-06-19 4:58 ` Kumar Gala
2008-06-19 5:37 ` Michael Neuling
0 siblings, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-19 4:58 UTC (permalink / raw)
To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras
On Jun 18, 2008, at 11:35 PM, Michael Neuling wrote:
> In message <5AEB0769-1394-4924-803D-
> C40CAF685519@kernel.crashing.org> you wrote
> :
>>>
>>>
>>> Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
>>> ===================================================================
>>> --- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
>>> +++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
>>> @@ -78,6 +78,7 @@ extern long kernel_thread(int (*fn)(void
>>> /* Lazy FPU handling on uni-processor */
>>> extern struct task_struct *last_task_used_math;
>>> extern struct task_struct *last_task_used_altivec;
>>> +extern struct task_struct *last_task_used_vsx;
>>> extern struct task_struct *last_task_used_spe;
>>>
>>> #ifdef CONFIG_PPC32
>>> @@ -136,8 +137,13 @@ typedef struct {
>>> unsigned long seg;
>>> } mm_segment_t;
>>>
>>> +#ifdef CONFIG_VSX
>>> +#define TS_FPR(i) fpvsr.fp[i].fpr
>>> +#define TS_FPRSTART fpvsr.fp
>>> +#else
>>> #define TS_FPR(i) fpr[i]
>>> #define TS_FPRSTART fpr
>>> +#endif
>>>
>>> struct thread_struct {
>>> unsigned long ksp; /* Kernel stack pointer */
>>> @@ -155,8 +161,19 @@ struct thread_struct {
>>> unsigned long dbcr0; /* debug control register values */
>>> unsigned long dbcr1;
>>> #endif
>>> +#ifdef CONFIG_VSX
>>> + /* First 32 VSX registers (overlap with fpr[32]) */
>>> + union {
>>> + struct {
>>> + double fpr;
>>> + double vsrlow;
>>> + } fp[32];
>>> + vector128 vsr[32];
how about:
union {
struct {
double fp;
double vsrlow;
} fpr;
vector128 v;
} fpvsr[32];
>>>
>>> + } fpvsr __attribute__((aligned(16)));
>>
>> Do we really need a union here? what would happen if you just
>> changed
>> the type of fpr[32] from double to vector if #CONFIG_VSX?
>>
>> I really dont like the union and think we can just make the storage
>> look opaque which is the key. I doubt we every really care about
>> using fpr[] as a double in the kernel.
>
> I did something similar to this for the first cut of this patch, but
> it
> made the code accessing this structure much less readable.
really, what code is that?
- k
* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
2008-06-19 4:58 ` Kumar Gala
@ 2008-06-19 5:37 ` Michael Neuling
2008-06-19 5:47 ` Kumar Gala
0 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-19 5:37 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-dev, Paul Mackerras
In message <A62DFD0C-387A-4833-B266-99DB1B09E101@kernel.crashing.org> you wrote:
>
> On Jun 18, 2008, at 11:35 PM, Michael Neuling wrote:
>
> > In message <5AEB0769-1394-4924-803D-
> > C40CAF685519@kernel.crashing.org> you wrote
> > :
> >>>
> >>>
> >>> Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
> >>> ===================================================================
> >>> --- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
> >>> +++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
> >>> @@ -78,6 +78,7 @@ extern long kernel_thread(int (*fn)(void
> >>> /* Lazy FPU handling on uni-processor */
> >>> extern struct task_struct *last_task_used_math;
> >>> extern struct task_struct *last_task_used_altivec;
> >>> +extern struct task_struct *last_task_used_vsx;
> >>> extern struct task_struct *last_task_used_spe;
> >>>
> >>> #ifdef CONFIG_PPC32
> >>> @@ -136,8 +137,13 @@ typedef struct {
> >>> unsigned long seg;
> >>> } mm_segment_t;
> >>>
> >>> +#ifdef CONFIG_VSX
> >>> +#define TS_FPR(i) fpvsr.fp[i].fpr
> >>> +#define TS_FPRSTART fpvsr.fp
> >>> +#else
> >>> #define TS_FPR(i) fpr[i]
> >>> #define TS_FPRSTART fpr
> >>> +#endif
> >>>
> >>> struct thread_struct {
> >>> unsigned long ksp; /* Kernel stack pointer */
> >>> @@ -155,8 +161,19 @@ struct thread_struct {
> >>> unsigned long dbcr0; /* debug control register values */
> >>> unsigned long dbcr1;
> >>> #endif
> >>> +#ifdef CONFIG_VSX
> >>> + /* First 32 VSX registers (overlap with fpr[32]) */
> >>> + union {
> >>> + struct {
> >>> + double fpr;
> >>> + double vsrlow;
> >>> + } fp[32];
> >>> + vector128 vsr[32];
>
> how about:
>
> union {
> struct {
> double fp;
> double vsrlow;
> } fpr;
> vector128 v;
> } fpvsr[32];
Arrh, yep, makes more sense to put the array definition outside the
union. I'll change.
>
> >>>
> >>> + } fpvsr __attribute__((aligned(16)));
> >>
> >> Do we really need a union here? what would happen if you just
> >> changed
> >> the type of fpr[32] from double to vector if #CONFIG_VSX?
> >>
> >> I really dont like the union and think we can just make the storage
> >> look opaque which is the key. I doubt we every really care about
> >> using fpr[] as a double in the kernel.
> >
> > I did something similar to this for the first cut of this patch, but
> > it
> > made the code accessing this structure much less readable.
>
> really, what code is that?
Any code that has to read/write _only_ the top or bottom 64 bits of
the 128-bit vector.
The signals code is a good example: for backwards compatibility, we
need to read/write the old 64-bit FP regs from the 128-bit value in
the struct.
Similarly, the way we've extended the signals interface for VSX, you
need to read/write the bottom 64 bits (vsrlow) of a 128-bit value.
e.g. the simple:
current->thread.fpvsr.fp[i].vsrlow = buf[i]
would turn into some abomination/macro.
Mikey
* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
2008-06-19 5:37 ` Michael Neuling
@ 2008-06-19 5:47 ` Kumar Gala
2008-06-19 6:01 ` Michael Neuling
0 siblings, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-19 5:47 UTC (permalink / raw)
To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras
>>>>> + } fpvsr __attribute__((aligned(16)));
>>>>
>>>> Do we really need a union here? what would happen if you just
>>>> changed
>>>> the type of fpr[32] from double to vector if #CONFIG_VSX?
>>>>
>>>> I really dont like the union and think we can just make the storage
>>>> look opaque which is the key. I doubt we every really care about
>>>> using fpr[] as a double in the kernel.
>>>
>>> I did something similar to this for the first cut of this patch, but
>>> it
>>> made the code accessing this structure much less readable.
>>
>> really, what code is that?
>
> Any code that has to read/write the top or bottom 64 bits _only_ of
> the
> 128 bit vector.
>
> The signals code is a good example where, for backwards compatibility,
> we need to read/write the old 64 bit FP regs, from the 128 bit value
> in
> the struct.
>
> Similarly, the way we've extended the signals interface for VSX, you
> need to read/write out the bottom 64 bits (vsrlow) of a 128 bit value.
>
> eg. the simple:
> current->thread.fpvsr.fp[i].vsrlow = buf[i]
> would turn into some abomination/macro.
it would turn into something like:
current->thread.fpr[i][2] = buf[i];
current->thread.fpr[i][3] = buf[i+1];
if you look at your code you'll see there are only a few places you
accessing the union as fpvsr.vsr[] and those places could easily be
fpr[], since they are already #CONFIG_VSX protected.
- k
* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
2008-06-19 5:47 ` Kumar Gala
@ 2008-06-19 6:01 ` Michael Neuling
2008-06-19 6:10 ` Kumar Gala
0 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-19 6:01 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-dev, Paul Mackerras
In message <B0E87874-BC65-4037-A43D-91C4142475E7@kernel.crashing.org> you wrote:
> >>>>> + } fpvsr __attribute__((aligned(16)));
> >>>>
> >>>> Do we really need a union here? what would happen if you just
> >>>> changed
> >>>> the type of fpr[32] from double to vector if #CONFIG_VSX?
> >>>>
> >>>> I really dont like the union and think we can just make the storage
> >>>> look opaque which is the key. I doubt we every really care about
> >>>> using fpr[] as a double in the kernel.
> >>>
> >>> I did something similar to this for the first cut of this patch, but
> >>> it
> >>> made the code accessing this structure much less readable.
> >>
> >> really, what code is that?
> >
> > Any code that has to read/write the top or bottom 64 bits _only_ of
> > the
> > 128 bit vector.
> >
> > The signals code is a good example where, for backwards compatibility,
> > we need to read/write the old 64 bit FP regs, from the 128 bit value
> > in
> > the struct.
> >
> > Similarly, the way we've extended the signals interface for VSX, you
> > need to read/write out the bottom 64 bits (vsrlow) of a 128 bit value.
> >
> > eg. the simple:
> > current->thread.fpvsr.fp[i].vsrlow = buf[i]
> > would turn into some abomination/macro.
>
> it would turn into something like:
>
> current->thread.fpr[i][2] = buf[i];
> current->thread.fpr[i][3] = buf[i+1];
Maybe abomination was going too far :-)
I still think using the union makes it easier to read than what you
have here. Also, it better reflects the structure of what's being
stored there.
Mikey
> if you look at your code you'll see there are only a few places you
> accessing the union as fpvsr.vsr[] and those places could easily be
> fpr[], since they are already #CONFIG_VSX protected.
* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
2008-06-19 6:01 ` Michael Neuling
@ 2008-06-19 6:10 ` Kumar Gala
2008-06-19 9:33 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-19 6:10 UTC (permalink / raw)
To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras
On Jun 19, 2008, at 1:01 AM, Michael Neuling wrote:
> In message <B0E87874-BC65-4037-
> A43D-91C4142475E7@kernel.crashing.org> you wrote
> :
>>>>>>> + } fpvsr __attribute__((aligned(16)));
>>>>>>
>>>>>> Do we really need a union here? what would happen if you just
>>>>>> changed
>>>>>> the type of fpr[32] from double to vector if #CONFIG_VSX?
>>>>>>
>>>>>> I really dont like the union and think we can just make the
>>>>>> storage
>>>>>> look opaque which is the key. I doubt we every really care about
>>>>>> using fpr[] as a double in the kernel.
>>>>>
>>>>> I did something similar to this for the first cut of this patch,
>>>>> but
>>>>> it
>>>>> made the code accessing this structure much less readable.
>>>>
>>>> really, what code is that?
>>>
>>> Any code that has to read/write the top or bottom 64 bits _only_ of
>>> the
>>> 128 bit vector.
>>>
>>> The signals code is a good example where, for backwards
>>> compatibility,
>>> we need to read/write the old 64 bit FP regs, from the 128 bit value
>>> in
>>> the struct.
>>>
>>> Similarly, the way we've extended the signals interface for VSX, you
>>> need to read/write out the bottom 64 bits (vsrlow) of a 128 bit
>>> value.
>>>
>>> eg. the simple:
>>> current->thread.fpvsr.fp[i].vsrlow = buf[i]
>>> would turn into some abomination/macro.
>>
>> it would turn into something like:
>>
>> current->thread.fpr[i][2] = buf[i];
>> current->thread.fpr[i][3] = buf[i+1];
>
> Maybe abomination was going too far :-)
>
> I still think using the union makes it is easier to read than what you
> have here. Also, it better reflects the structure of what's being
> stored there.
I don't think that holds much weight with me. We don't union the
vector128 type to show it also supports float, u16, and u8 types.
I stand by my point: the ONLY places you access the union via the .vsr
member look to be memset or memcpy, where you clearly know whether the
size should be sizeof(double) or sizeof(vector).
Also, I can see a case in the future where fprs become 128 bits wide
and allow for native long double support.
- k
* Re: [PATCH 6/9] powerpc: Add VSX CPU feature
2008-06-18 0:47 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
2008-06-18 16:28 ` Joel Schopp
@ 2008-06-19 6:51 ` David Woodhouse
2008-06-19 7:00 ` Michael Neuling
1 sibling, 1 reply; 106+ messages in thread
From: David Woodhouse @ 2008-06-19 6:51 UTC (permalink / raw)
To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras
On Wed, 2008-06-18 at 10:47 +1000, Michael Neuling wrote:
> {"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
> #endif /* CONFIG_ALTIVEC */
> +#ifdef CONFIG_VSX
> + {"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
> +#endif /* CONFIG_VSX */
Should that be "ibm,vsx"?
--
dwmw2
* Re: [PATCH 6/9] powerpc: Add VSX CPU feature
2008-06-19 6:51 ` David Woodhouse
@ 2008-06-19 7:00 ` Michael Neuling
0 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-19 7:00 UTC (permalink / raw)
To: David Woodhouse; +Cc: linuxppc-dev, Paul Mackerras
> On Wed, 2008-06-18 at 10:47 +1000, Michael Neuling wrote:
> > {"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
> > #endif /* CONFIG_ALTIVEC */
> > +#ifdef CONFIG_VSX
> > + {"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
> > +#endif /* CONFIG_VSX */
>
> Should that be "ibm,vsx"?
Nope, "ibm,vmx" == 2 is correct for VSX.
You're not the first to think it looks wrong, so I should add a
comment.
Mikey
* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
2008-06-19 6:10 ` Kumar Gala
@ 2008-06-19 9:33 ` Benjamin Herrenschmidt
2008-06-19 13:24 ` Kumar Gala
0 siblings, 1 reply; 106+ messages in thread
From: Benjamin Herrenschmidt @ 2008-06-19 9:33 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-dev, Michael Neuling, Paul Mackerras
On Thu, 2008-06-19 at 01:10 -0500, Kumar Gala wrote:
> > I still think using the union makes it is easier to read than what you
> > have here. Also, it better reflects the structure of what's being
> > stored there.
>
> I don't think that holds much weight with me. We don't union the
> vector128 type to show it also supports float, u16, and u8 types.
But this is different. The same registers are either basic FP regs or
full VSX regs.
I don't see what's wrong with union, it's a nice way to express things.
> I stick by the fact that the ONLY place it looks like you access the
> union via the .vsr member is for memset or memcpy so you clearly know
> if the size should be sizeof(double) or sizeof(vector).
>
> Also, I can see the case in the future that 'fpr's become
What's wrong with the union ? there's nothing ugly about them..
Cheers,
Ben.
* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
2008-06-19 9:33 ` Benjamin Herrenschmidt
@ 2008-06-19 13:24 ` Kumar Gala
0 siblings, 0 replies; 106+ messages in thread
From: Kumar Gala @ 2008-06-19 13:24 UTC (permalink / raw)
To: benh; +Cc: linuxppc-dev, Michael Neuling, Paul Mackerras
On Jun 19, 2008, at 4:33 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2008-06-19 at 01:10 -0500, Kumar Gala wrote:
>>> I still think using the union makes it is easier to read than what
>>> you
>>> have here. Also, it better reflects the structure of what's being
>>> stored there.
>>
>> I don't think that holds much weight with me. We don't union the
>> vector128 type to show it also supports float, u16, and u8 types.
>
> But this is different. The same registers are either basic FP regs or
> full VSX regs.
>
> I don't see what's wrong with union, it's a nice way to express
> things.
We also don't do this for SPE (the Freescale version).
>> I stick by the fact that the ONLY place it looks like you access the
>> union via the .vsr member is for memset or memcpy so you clearly know
>> if the size should be sizeof(double) or sizeof(vector).
>>
>> Also, I can see the case in the future that 'fpr's become
>
> What's wrong with the union ? there's nothing ugly about them..
I'll wait for the next version and see how many places .vsr is
actually accessed.
- k
* [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX.
2008-06-18 0:47 [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (9 preceding siblings ...)
2008-06-18 13:05 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Kumar Gala
@ 2008-06-20 4:13 ` Michael Neuling
2008-06-20 4:13 ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
` (10 more replies)
10 siblings, 11 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-20 4:13 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
The following set of patches adds Vector Scalar Extensions (VSX)
support for POWER7. Includes context switch, ptrace and signals support.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
Paulus: please consider for your 2.6.27 tree.
Updated with comments from Kumar, Milton, Dave Woodhouse and Mark
'NKOTB' Nelson.
- Changed thread_struct array definition to be cleaner
- Updated CPU_FTRS_POSSIBLE
- Fixed Kconfig typo and duplicate entry
- Added comment to clarify ibm,vmx = 2 really means VSX.
* [PATCH 3/9] powerpc: Move altivec_unavailable
2008-06-20 4:13 ` Michael Neuling
` (5 preceding siblings ...)
2008-06-20 4:13 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
@ 2008-06-20 4:13 ` Michael Neuling
2008-06-20 4:13 ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
` (3 subsequent siblings)
10 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-20 4:13 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Move the altivec_unavailable code, to make room at 0xf40 where the
vsx_unavailable exception will be.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/head_64.S | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -275,7 +275,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
. = 0xf00
b performance_monitor_pSeries
- STD_EXCEPTION_PSERIES(0xf20, altivec_unavailable)
+ . = 0xf20
+ b altivec_unavailable_pSeries
#ifdef CONFIG_CBE_RAS
HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
@@ -295,6 +296,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
/* moved from 0xf00 */
STD_EXCEPTION_PSERIES(., performance_monitor)
+ STD_EXCEPTION_PSERIES(., altivec_unavailable)
/*
* An interrupt came in while soft-disabled; clear EE in SRR1,
* [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
2008-06-20 4:13 ` Michael Neuling
@ 2008-06-20 4:13 ` Michael Neuling
2008-06-20 6:39 ` Kumar Gala
2008-06-20 4:13 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
` (9 subsequent siblings)
10 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-20 4:13 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
We are going to change where the floating point registers are stored
in the thread_struct, so in preparation add some macros to access the
floating point registers. Update all code to use these new macros.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/align.c | 6 ++--
arch/powerpc/kernel/asm-offsets.c | 2 -
arch/powerpc/kernel/process.c | 5 ++-
arch/powerpc/kernel/ptrace.c | 14 +++++----
arch/powerpc/kernel/ptrace32.c | 9 ++++--
arch/powerpc/kernel/signal_32.c | 6 ++--
arch/powerpc/kernel/signal_64.c | 13 +++++---
arch/powerpc/kernel/softemu8xx.c | 4 +-
arch/powerpc/math-emu/math.c | 56 +++++++++++++++++++-------------------
include/asm-powerpc/ppc_asm.h | 5 ++-
include/asm-powerpc/processor.h | 7 ++++
11 files changed, 71 insertions(+), 56 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/align.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/align.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/align.c
@@ -366,7 +366,7 @@ static int emulate_multiple(struct pt_re
static int emulate_fp_pair(struct pt_regs *regs, unsigned char __user *addr,
unsigned int reg, unsigned int flags)
{
- char *ptr = (char *) &current->thread.fpr[reg];
+ char *ptr = (char *) &current->thread.TS_FPR(reg);
int i, ret;
if (!(flags & F))
@@ -784,7 +784,7 @@ int fix_alignment(struct pt_regs *regs)
return -EFAULT;
}
} else if (flags & F) {
- data.dd = current->thread.fpr[reg];
+ data.dd = current->thread.TS_FPR(reg);
if (flags & S) {
/* Single-precision FP store requires conversion... */
#ifdef CONFIG_PPC_FPU
@@ -862,7 +862,7 @@ int fix_alignment(struct pt_regs *regs)
if (unlikely(ret))
return -EFAULT;
} else if (flags & F)
- current->thread.fpr[reg] = data.dd;
+ current->thread.TS_FPR(reg) = data.dd;
else
regs->gpr[reg] = data.ll;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
@@ -66,7 +66,7 @@ int main(void)
DEFINE(KSP_LIMIT, offsetof(struct thread_struct, ksp_limit));
DEFINE(PT_REGS, offsetof(struct thread_struct, regs));
DEFINE(THREAD_FPEXC_MODE, offsetof(struct thread_struct, fpexc_mode));
- DEFINE(THREAD_FPR0, offsetof(struct thread_struct, fpr[0]));
+ DEFINE(THREAD_FPR0, offsetof(struct thread_struct, TS_FPR(0)));
DEFINE(THREAD_FPSCR, offsetof(struct thread_struct, fpscr));
#ifdef CONFIG_ALTIVEC
DEFINE(THREAD_VR0, offsetof(struct thread_struct, vr[0]));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -110,7 +110,7 @@ int dump_task_fpu(struct task_struct *ts
return 0;
flush_fp_to_thread(current);
- memcpy(fpregs, &tsk->thread.fpr[0], sizeof(*fpregs));
+ memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
return 1;
}
@@ -689,7 +689,8 @@ void start_thread(struct pt_regs *regs,
#endif
discard_lazy_cpu_state();
- memset(current->thread.fpr, 0, sizeof(current->thread.fpr));
+ memset(current->thread.TS_FPRSTART, 0,
+ sizeof(current->thread.TS_FPRSTART));
current->thread.fpscr.val = 0;
#ifdef CONFIG_ALTIVEC
memset(current->thread.vr, 0, sizeof(current->thread.vr));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -218,10 +218,10 @@ static int fpr_get(struct task_struct *t
flush_fp_to_thread(target);
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
- offsetof(struct thread_struct, fpr[32]));
+ offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
- &target->thread.fpr, 0, -1);
+ &target->thread.TS_FPRSTART, 0, -1);
}
static int fpr_set(struct task_struct *target, const struct user_regset *regset,
@@ -231,10 +231,10 @@ static int fpr_set(struct task_struct *t
flush_fp_to_thread(target);
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
- offsetof(struct thread_struct, fpr[32]));
+ offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
- &target->thread.fpr, 0, -1);
+ &target->thread.TS_FPRSTART, 0, -1);
}
@@ -728,7 +728,8 @@ long arch_ptrace(struct task_struct *chi
tmp = ptrace_get_reg(child, (int) index);
} else {
flush_fp_to_thread(child);
- tmp = ((unsigned long *)child->thread.fpr)[index - PT_FPR0];
+ tmp = ((unsigned long *)child->thread.TS_FPRSTART)
+ [TS_FPRSPACING * (index - PT_FPR0)];
}
ret = put_user(tmp,(unsigned long __user *) data);
break;
@@ -755,7 +756,8 @@ long arch_ptrace(struct task_struct *chi
ret = ptrace_put_reg(child, index, data);
} else {
flush_fp_to_thread(child);
- ((unsigned long *)child->thread.fpr)[index - PT_FPR0] = data;
+ ((unsigned long *)child->thread.TS_FPRSTART)
+ [TS_FPRSPACING * (index - PT_FPR0)] = data;
ret = 0;
}
break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
@@ -122,7 +122,8 @@ long compat_arch_ptrace(struct task_stru
* to be an array of unsigned int (32 bits) - the
* index passed in is based on this assumption.
*/
- tmp = ((unsigned int *)child->thread.fpr)[index - PT_FPR0];
+ tmp = ((unsigned int *)child->thread.TS_FPRSTART)
+ [TS_FPRSPACING * (index - PT_FPR0)];
}
ret = put_user((unsigned int)tmp, (u32 __user *)data);
break;
@@ -162,7 +163,8 @@ long compat_arch_ptrace(struct task_stru
CHECK_FULL_REGS(child->thread.regs);
if (numReg >= PT_FPR0) {
flush_fp_to_thread(child);
- tmp = ((unsigned long int *)child->thread.fpr)[numReg - PT_FPR0];
+ tmp = ((unsigned long int *)child->thread.TS_FPRSTART)
+ [TS_FPRSPACING * (numReg - PT_FPR0)];
} else { /* register within PT_REGS struct */
tmp = ptrace_get_reg(child, numReg);
}
@@ -217,7 +219,8 @@ long compat_arch_ptrace(struct task_stru
* to be an array of unsigned int (32 bits) - the
* index passed in is based on this assumption.
*/
- ((unsigned int *)child->thread.fpr)[index - PT_FPR0] = data;
+ ((unsigned int *)child->thread.TS_FPRSTART)
+ [TS_FPRSPACING * (index - PT_FPR0)] = data;
ret = 0;
}
break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -343,7 +343,7 @@ static int save_user_regs(struct pt_regs
/* save general and floating-point registers */
if (save_general_regs(regs, frame) ||
- __copy_to_user(&frame->mc_fregs, current->thread.fpr,
+ __copy_to_user(&frame->mc_fregs, current->thread.TS_FPRSTART,
ELF_NFPREG * sizeof(double)))
return 1;
@@ -431,7 +431,7 @@ static long restore_user_regs(struct pt_
/*
* Do this before updating the thread state in
- * current->thread.fpr/vr/evr. That way, if we get preempted
+ * current->thread.FPR/vr/evr. That way, if we get preempted
* and another task grabs the FPU/Altivec/SPE, it won't be
* tempted to save the current CPU state into the thread_struct
* and corrupt what we are writing there.
@@ -441,7 +441,7 @@ static long restore_user_regs(struct pt_
/* force the process to reload the FP registers from
current->thread when it next does FP instructions */
regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
- if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
+ if (__copy_from_user(current->thread.TS_FPRSTART, &sr->mc_fregs,
sizeof(sr->mc_fregs)))
return 1;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -116,7 +116,8 @@ static long setup_sigcontext(struct sigc
WARN_ON(!FULL_REGS(regs));
err |= __copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE);
err |= __put_user(msr, &sc->gp_regs[PT_MSR]);
- err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
+ err |= __copy_to_user(&sc->fp_regs, &current->thread.TS_FPRSTART,
+ FP_REGS_SIZE);
err |= __put_user(signr, &sc->signal);
err |= __put_user(handler, &sc->handler);
if (set != NULL)
@@ -168,7 +169,7 @@ static long restore_sigcontext(struct pt
/*
* Do this before updating the thread state in
- * current->thread.fpr/vr. That way, if we get preempted
+ * current->thread.TS_FPR/vr. That way, if we get preempted
* and another task grabs the FPU/Altivec, it won't be
* tempted to save the current CPU state into the thread_struct
* and corrupt what we are writing there.
@@ -177,12 +178,14 @@ static long restore_sigcontext(struct pt
/*
* Force reload of FP/VEC.
- * This has to be done before copying stuff into current->thread.fpr/vr
- * for the reasons explained in the previous comment.
+ * This has to be done before copying stuff into
+ * current->thread.TS_FPR/vr for the reasons explained in the
+ * previous comment.
*/
regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
- err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
+ err |= __copy_from_user(&current->thread.TS_FPRSTART, &sc->fp_regs,
+ FP_REGS_SIZE);
#ifdef CONFIG_ALTIVEC
err |= __get_user(v_regs, &sc->v_regs);
Index: linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/softemu8xx.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
@@ -124,7 +124,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
disp = instword & 0xffff;
ea = (u32 *)(regs->gpr[idxreg] + disp);
- ip = (u32 *)&current->thread.fpr[flreg];
+ ip = (u32 *)&current->thread.TS_FPR(flreg);
switch ( inst )
{
@@ -168,7 +168,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
break;
case FMR:
/* assume this is a fp move -- Cort */
- memcpy(ip, &current->thread.fpr[(instword>>11)&0x1f],
+ memcpy(ip, &current->thread.TS_FPR((instword>>11)&0x1f),
sizeof(double));
break;
default:
Index: linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/math-emu/math.c
+++ linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
@@ -230,14 +230,14 @@ do_mathemu(struct pt_regs *regs)
case LFD:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
lfd(op0, op1, op2, op3);
break;
case LFDU:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
lfd(op0, op1, op2, op3);
regs->gpr[idx] = (unsigned long)op1;
@@ -245,21 +245,21 @@ do_mathemu(struct pt_regs *regs)
case STFD:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
stfd(op0, op1, op2, op3);
break;
case STFDU:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
stfd(op0, op1, op2, op3);
regs->gpr[idx] = (unsigned long)op1;
break;
case OP63:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
fmr(op0, op1, op2, op3);
break;
default:
@@ -356,28 +356,28 @@ do_mathemu(struct pt_regs *regs)
switch (type) {
case AB:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
case AC:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op2 = (void *)&current->thread.fpr[(insn >> 6) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op2 = (void *)&current->thread.TS_FPR((insn >> 6) & 0x1f);
break;
case ABC:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
- op3 = (void *)&current->thread.fpr[(insn >> 6) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
+ op3 = (void *)&current->thread.TS_FPR((insn >> 6) & 0x1f);
break;
case D:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
break;
@@ -387,27 +387,27 @@ do_mathemu(struct pt_regs *regs)
goto illegal;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)(regs->gpr[idx] + sdisp);
break;
case X:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
break;
case XA:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
break;
case XB:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
case XE:
idx = (insn >> 16) & 0x1f;
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
if (!idx) {
if (((insn >> 1) & 0x3ff) == STFIWX)
op1 = (void *)(regs->gpr[(insn >> 11) & 0x1f]);
@@ -421,7 +421,7 @@ do_mathemu(struct pt_regs *regs)
case XEU:
idx = (insn >> 16) & 0x1f;
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0)
+ regs->gpr[(insn >> 11) & 0x1f]);
break;
@@ -429,8 +429,8 @@ do_mathemu(struct pt_regs *regs)
case XCR:
op0 = (void *)&regs->ccr;
op1 = (void *)((insn >> 23) & 0x7);
- op2 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op3 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op2 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op3 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
case XCRL:
@@ -450,7 +450,7 @@ do_mathemu(struct pt_regs *regs)
case XFLB:
op0 = (void *)((insn >> 17) & 0xff);
- op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
default:
Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -6,6 +6,7 @@
#include <linux/stringify.h>
#include <asm/asm-compat.h>
+#include <asm/processor.h>
#ifndef __ASSEMBLY__
#error __FILE__ should only be used in assembler files
@@ -83,13 +84,13 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
#define REST_8GPRS(n, base) REST_4GPRS(n, base); REST_4GPRS(n+4, base)
#define REST_10GPRS(n, base) REST_8GPRS(n, base); REST_2GPRS(n+8, base)
-#define SAVE_FPR(n, base) stfd n,THREAD_FPR0+8*(n)(base)
+#define SAVE_FPR(n, base) stfd n,THREAD_FPR0+8*TS_FPRSPACING*(n)(base)
#define SAVE_2FPRS(n, base) SAVE_FPR(n, base); SAVE_FPR(n+1, base)
#define SAVE_4FPRS(n, base) SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
#define SAVE_8FPRS(n, base) SAVE_4FPRS(n, base); SAVE_4FPRS(n+4, base)
#define SAVE_16FPRS(n, base) SAVE_8FPRS(n, base); SAVE_8FPRS(n+8, base)
#define SAVE_32FPRS(n, base) SAVE_16FPRS(n, base); SAVE_16FPRS(n+16, base)
-#define REST_FPR(n, base) lfd n,THREAD_FPR0+8*(n)(base)
+#define REST_FPR(n, base) lfd n,THREAD_FPR0+8*TS_FPRSPACING*(n)(base)
#define REST_2FPRS(n, base) REST_FPR(n, base); REST_FPR(n+1, base)
#define REST_4FPRS(n, base) REST_2FPRS(n, base); REST_2FPRS(n+2, base)
#define REST_8FPRS(n, base) REST_4FPRS(n, base); REST_4FPRS(n+4, base)
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -136,6 +136,9 @@ typedef struct {
unsigned long seg;
} mm_segment_t;
+#define TS_FPR(i) fpr[i]
+#define TS_FPRSTART fpr
+
struct thread_struct {
unsigned long ksp; /* Kernel stack pointer */
unsigned long ksp_limit; /* if ksp <= ksp_limit stack overflow */
@@ -197,12 +200,13 @@ struct thread_struct {
.fpexc_mode = MSR_FE0 | MSR_FE1, \
}
#else
+#define FPVSR_INIT_THREAD .fpr = {0}
#define INIT_THREAD { \
.ksp = INIT_SP, \
.ksp_limit = INIT_SP_LIMIT, \
.regs = (struct pt_regs *)INIT_SP - 1, /* XXX bogus, I think */ \
.fs = KERNEL_DS, \
- .fpr = {0}, \
+ FPVSR_INIT_THREAD, \
.fpscr = { .val = 0, }, \
.fpexc_mode = 0, \
}
@@ -289,4 +293,5 @@ static inline void prefetchw(const void
#endif /* __KERNEL__ */
#endif /* __ASSEMBLY__ */
+#define TS_FPRSPACING 1
#endif /* _ASM_POWERPC_PROCESSOR_H */
^ permalink raw reply [flat|nested] 106+ messages in thread
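[Editorial note: the TS_FPR()/TS_FPRSTART indirection used throughout the patch above can be exercised in isolation. The sketch below is a user-space approximation; thread_struct is reduced to just the FP fields, only the non-VSX layout from the patch is reproduced, and the final assertion mirrors the "fpr ... fpscr must be contiguous" assumption the ptrace code relies on.]

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's thread_struct; only the FP
 * fields touched by the patch are kept. */
struct thread_struct {
	double fpr[32];			/* Complete floating point set */
	struct {			/* fpr ... fpscr must be contiguous */
		unsigned int pad;
		unsigned int val;	/* Floating point status */
	} fpscr;
};

/* Non-VSX definitions, as in processor.h after this patch. */
#define TS_FPR(i)	fpr[i]
#define TS_FPRSTART	fpr
#define TS_FPRSPACING	1
```

With these macros, `t.TS_FPR(3)` expands to `t.fpr[3]`, so callers such as the math emulator can stay layout-agnostic.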
* [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
2008-06-20 4:13 ` Michael Neuling
2008-06-20 4:13 ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
@ 2008-06-20 4:13 ` Michael Neuling
2008-06-20 6:35 ` Kumar Gala
2008-06-20 4:13 ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
` (8 subsequent siblings)
10 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-20 4:13 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
If we set the SPE MSR bit in save_user_regs we can blow away the VEC
bit. This will never happen in reality (VMX and SPE will never be in
the same processor as their opcodes overlap), but it looks bad. Also
when we add VSX here in a later patch, we can hit two of these at the
same time.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/signal_32.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -336,6 +336,8 @@ struct rt_sigframe {
static int save_user_regs(struct pt_regs *regs, struct mcontext __user *frame,
int sigret)
{
+ unsigned long msr = regs->msr;
+
/* Make sure floating point registers are stored in regs */
flush_fp_to_thread(current);
@@ -354,8 +356,7 @@ static int save_user_regs(struct pt_regs
return 1;
/* set MSR_VEC in the saved MSR value to indicate that
frame->mc_vregs contains valid data */
- if (__put_user(regs->msr | MSR_VEC, &frame->mc_gregs[PT_MSR]))
- return 1;
+ msr |= MSR_VEC;
}
/* else assert((regs->msr & MSR_VEC) == 0) */
@@ -377,8 +378,7 @@ static int save_user_regs(struct pt_regs
return 1;
/* set MSR_SPE in the saved MSR value to indicate that
frame->mc_vregs contains valid data */
- if (__put_user(regs->msr | MSR_SPE, &frame->mc_gregs[PT_MSR]))
- return 1;
+ msr |= MSR_SPE;
}
/* else assert((regs->msr & MSR_SPE) == 0) */
@@ -387,6 +387,8 @@ static int save_user_regs(struct pt_regs
return 1;
#endif /* CONFIG_SPE */
+ if (__put_user(msr, &frame->mc_gregs[PT_MSR]))
+ return 1;
if (sigret) {
/* Set up the sigreturn trampoline: li r0,sigret; sc */
if (__put_user(0x38000000UL + sigret, &frame->tramp[0])
^ permalink raw reply [flat|nested] 106+ messages in thread
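[Editorial note: the fix in this patch boils down to accumulating MSR feature bits in a local variable and writing the frame's saved MSR once, instead of recomputing regs->msr | bit at each site. A stand-alone sketch of the before/after pattern; the MSR_* values here are invented for illustration, the real ones live in the kernel headers.]

```c
#include <assert.h>

/* Hypothetical stand-ins for kernel MSR feature bits (values invented). */
#define MSR_VEC 0x1UL
#define MSR_SPE 0x2UL

/* Buggy pattern: each write starts over from regs_msr, so the second
 * store discards the bit set by the first. */
static unsigned long save_msr_buggy(unsigned long regs_msr,
				    int used_vec, int used_spe)
{
	unsigned long saved = regs_msr;
	if (used_vec)
		saved = regs_msr | MSR_VEC;
	if (used_spe)
		saved = regs_msr | MSR_SPE;	/* MSR_VEC lost here */
	return saved;
}

/* Fixed pattern: accumulate into a local, write once at the end. */
static unsigned long save_msr_fixed(unsigned long regs_msr,
				    int used_vec, int used_spe)
{
	unsigned long msr = regs_msr;
	if (used_vec)
		msr |= MSR_VEC;
	if (used_spe)
		msr |= MSR_SPE;
	return msr;
}
```

As the changelog notes, VMX and SPE never coexist on real hardware, but once VSX is added a third feature bit can legitimately be live at the same time, which is when the accumulate-then-write-once form matters.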
* [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable
2008-06-20 4:13 ` Michael Neuling
` (3 preceding siblings ...)
2008-06-20 4:13 ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
@ 2008-06-20 4:13 ` Michael Neuling
2008-06-20 4:13 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
` (5 subsequent siblings)
10 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-20 4:13 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Make load_up_fpu and load_up_altivec callable so they can be reused by
the VSX code.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/fpu.S | 2 +-
arch/powerpc/kernel/head_32.S | 6 ++++--
arch/powerpc/kernel/head_64.S | 8 +++++---
arch/powerpc/kernel/head_booke.h | 6 ++++--
4 files changed, 14 insertions(+), 8 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -85,7 +85,7 @@ _GLOBAL(load_up_fpu)
#endif /* CONFIG_SMP */
/* restore registers and return */
/* we haven't used ctr or xer or lr */
- b fast_exception_return
+ blr
/*
* giveup_fpu(tsk)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_32.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
@@ -421,8 +421,10 @@ BEGIN_FTR_SECTION
b ProgramCheck
END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE)
EXCEPTION_PROLOG
- bne load_up_fpu /* if from user, just load it up */
- addi r3,r1,STACK_FRAME_OVERHEAD
+ beq 1f
+ bl load_up_fpu /* if from user, just load it up */
+ b fast_exception_return
+1: addi r3,r1,STACK_FRAME_OVERHEAD
EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
/* Decrementer */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -741,7 +741,8 @@ fp_unavailable_common:
ENABLE_INTS
bl .kernel_fp_unavailable_exception
BUG_OPCODE
-1: b .load_up_fpu
+1: bl .load_up_fpu
+ b fast_exception_return
.align 7
.globl altivec_unavailable_common
@@ -749,7 +750,8 @@ altivec_unavailable_common:
EXCEPTION_PROLOG_COMMON(0xf20, PACA_EXGEN)
#ifdef CONFIG_ALTIVEC
BEGIN_FTR_SECTION
- bne .load_up_altivec /* if from user, just load it up */
+ bnel .load_up_altivec
+ b fast_exception_return
END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
#endif
bl .save_nvgprs
@@ -829,7 +831,7 @@ _STATIC(load_up_altivec)
std r4,0(r3)
#endif /* CONFIG_SMP */
/* restore registers and return */
- b fast_exception_return
+ blr
#endif /* CONFIG_ALTIVEC */
/*
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_booke.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
@@ -363,8 +363,10 @@ label:
#define FP_UNAVAILABLE_EXCEPTION \
START_EXCEPTION(FloatingPointUnavailable) \
NORMAL_EXCEPTION_PROLOG; \
- bne load_up_fpu; /* if from user, just load it up */ \
- addi r3,r1,STACK_FRAME_OVERHEAD; \
+ beq 1f; \
+ bl load_up_fpu; /* if from user, just load it up */ \
+ b fast_exception_return; \
+1: addi r3,r1,STACK_FRAME_OVERHEAD; \
EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
#endif /* __HEAD_BOOKE_H__ */
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
2008-06-20 4:13 ` Michael Neuling
2008-06-20 4:13 ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
2008-06-20 4:13 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
@ 2008-06-20 4:13 ` Michael Neuling
2008-06-20 6:44 ` Kumar Gala
2008-06-20 4:13 ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
` (7 subsequent siblings)
10 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-20 4:13 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
The layout of the new VSR registers and how they overlap on top of the
legacy FPR and VR registers is:
VSR doubleword 0 VSR doubleword 1
----------------------------------------------------------------
VSR[0] | FPR[0] | |
----------------------------------------------------------------
VSR[1] | FPR[1] | |
----------------------------------------------------------------
| ... | |
| ... | |
----------------------------------------------------------------
VSR[30] | FPR[30] | |
----------------------------------------------------------------
VSR[31] | FPR[31] | |
----------------------------------------------------------------
VSR[32] | VR[0] |
----------------------------------------------------------------
VSR[33] | VR[1] |
----------------------------------------------------------------
| ... |
| ... |
----------------------------------------------------------------
VSR[62] | VR[30] |
----------------------------------------------------------------
VSR[63] | VR[31] |
----------------------------------------------------------------
VSX has 64 128-bit registers. The first 32 registers overlap with the FP
registers and hence extend them with an additional 64 bits. The
second 32 registers overlap with the VMX registers.
This patch introduces the thread_struct changes required to reflect
this register layout. Ptrace and signals code is updated so that the
floating point registers are correctly accessed from the thread_struct
when CONFIG_VSX is enabled.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/asm-offsets.c | 4 ++
arch/powerpc/kernel/ptrace.c | 28 ++++++++++++++++++
arch/powerpc/kernel/signal_32.c | 59 +++++++++++++++++++++++++++++---------
arch/powerpc/kernel/signal_64.c | 36 +++++++++++++++++++----
include/asm-powerpc/processor.h | 31 +++++++++++++++++++
5 files changed, 139 insertions(+), 19 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
@@ -74,6 +74,10 @@ int main(void)
DEFINE(THREAD_VSCR, offsetof(struct thread_struct, vscr));
DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ DEFINE(THREAD_VSR0, offsetof(struct thread_struct, fpvsr[0].vsr));
+ DEFINE(THREAD_USED_VSR, offsetof(struct thread_struct, used_vsr));
+#endif /* CONFIG_VSX */
#ifdef CONFIG_PPC64
DEFINE(KSP_VSID, offsetof(struct thread_struct, ksp_vsid));
#else /* CONFIG_PPC64 */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -215,26 +215,54 @@ static int fpr_get(struct task_struct *t
unsigned int pos, unsigned int count,
void *kbuf, void __user *ubuf)
{
+#ifdef CONFIG_VSX
+ double buf[33];
+ int i;
+#endif
flush_fp_to_thread(target);
+#ifdef CONFIG_VSX
+ /* copy to local buffer then write that out */
+ for (i = 0; i < 32 ; i++)
+ buf[i] = target->thread.TS_FPR(i);
+ memcpy(&buf[32], &target->thread.fpscr, sizeof(double));
+ return user_regset_copyout(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+
+#else
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
&target->thread.TS_FPRSTART, 0, -1);
+#endif
}
static int fpr_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
+#ifdef CONFIG_VSX
+ double buf[33];
+ int i;
+#endif
flush_fp_to_thread(target);
+#ifdef CONFIG_VSX
+ /* copy to local buffer then write that out */
+ i = user_regset_copyin(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+ if (i)
+ return i;
+ for (i = 0; i < 32 ; i++)
+ target->thread.TS_FPR(i) = buf[i];
+ memcpy(&target->thread.fpscr, &buf[32], sizeof(double));
+ return 0;
+#else
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
&target->thread.TS_FPRSTART, 0, -1);
+#endif
}
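[Editorial note: under CONFIG_VSX the FPRs are strided inside the fpvsr array, so fpr_get/fpr_set above gather them into a flat double[33] buffer (32 FPRs plus fpscr) before the regset copy. A user-space sketch of that gather step; the helper name is invented for illustration.]

```c
#include <string.h>

#define NFPREG 33	/* 32 FPRs + fpscr, matching the regset layout */

/* Mirror of the CONFIG_VSX fpr_get path: collect the (possibly strided)
 * FPRs and the fpscr into one contiguous double[33] buffer, which is
 * what user_regset_copyout() then hands to userspace. */
static void gather_fpr(const double fpr[32], const double *fpscr,
		       double buf[NFPREG])
{
	int i;

	for (i = 0; i < 32; i++)
		buf[i] = fpr[i];
	memcpy(&buf[32], fpscr, sizeof(double));
}
```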
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -337,14 +337,16 @@ static int save_user_regs(struct pt_regs
int sigret)
{
unsigned long msr = regs->msr;
+#ifdef CONFIG_VSX
+ double buf[32];
+ int i;
+#endif
/* Make sure floating point registers are stored in regs */
flush_fp_to_thread(current);
- /* save general and floating-point registers */
- if (save_general_regs(regs, frame) ||
- __copy_to_user(&frame->mc_fregs, current->thread.TS_FPRSTART,
- ELF_NFPREG * sizeof(double)))
+ /* save general registers */
+ if (save_general_regs(regs, frame))
return 1;
#ifdef CONFIG_ALTIVEC
@@ -368,7 +370,21 @@ static int save_user_regs(struct pt_regs
if (__put_user(current->thread.vrsave, (u32 __user *)&frame->mc_vregs[32]))
return 1;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ /* save FPR copy to local buffer then write to the thread_struct */
+ flush_fp_to_thread(current);
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.TS_FPR(i);
+ memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+ if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
+ return 1;
+#else
+ /* save floating-point registers */
+ if (__copy_to_user(&frame->mc_fregs, current->thread.TS_FPRSTART,
+ ELF_NFPREG * sizeof(double)))
+ return 1;
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
/* save spe registers */
if (current->thread.used_spe) {
@@ -411,6 +427,10 @@ static long restore_user_regs(struct pt_
long err;
unsigned int save_r2 = 0;
unsigned long msr;
+#ifdef CONFIG_VSX
+ double buf[32];
+ int i;
+#endif
/*
* restore general registers but not including MSR or SOFTE. Also
@@ -438,16 +458,11 @@ static long restore_user_regs(struct pt_
*/
discard_lazy_cpu_state();
- /* force the process to reload the FP registers from
- current->thread when it next does FP instructions */
- regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
- if (__copy_from_user(current->thread.TS_FPRSTART, &sr->mc_fregs,
- sizeof(sr->mc_fregs)))
- return 1;
-
#ifdef CONFIG_ALTIVEC
- /* force the process to reload the altivec registers from
- current->thread when it next does altivec instructions */
+ /*
+ * Force the process to reload the altivec registers from
+ * current->thread when it next does altivec instructions
+ */
regs->msr &= ~MSR_VEC;
if (msr & MSR_VEC) {
/* restore altivec registers from the stack */
@@ -462,6 +477,24 @@ static long restore_user_regs(struct pt_
return 1;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (__copy_from_user(buf, &sr->mc_fregs, sizeof(sr->mc_fregs)))
+ return 1;
+ for (i = 0; i < 32 ; i++)
+ current->thread.TS_FPR(i) = buf[i];
+ memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+
+#else
+ if (__copy_from_user(current->thread.TS_FPRSTART, &sr->mc_fregs,
+ sizeof(sr->mc_fregs)))
+ return 1;
+#endif /* CONFIG_VSX */
+ /*
+ * force the process to reload the FP registers from
+ * current->thread when it next does FP instructions
+ */
+ regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
+
#ifdef CONFIG_SPE
/* force the process to reload the spe registers from
current->thread when it next does spe instructions */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -89,6 +89,10 @@ static long setup_sigcontext(struct sigc
#endif
unsigned long msr = regs->msr;
long err = 0;
+#ifdef CONFIG_VSX
+ double buf[FP_REGS_SIZE];
+ int i;
+#endif
flush_fp_to_thread(current);
@@ -112,12 +116,22 @@ static long setup_sigcontext(struct sigc
#else /* CONFIG_ALTIVEC */
err |= __put_user(0, &sc->v_regs);
#endif /* CONFIG_ALTIVEC */
+ flush_fp_to_thread(current);
+#ifdef CONFIG_VSX
+ /* Copy FP to local buffer then write that out */
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.TS_FPR(i);
+ memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+ err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+#else /* CONFIG_VSX */
+ /* copy fpr regs and fpscr */
+ err |= __copy_to_user(&sc->fp_regs, &current->thread.TS_FPR(0),
+ FP_REGS_SIZE);
+#endif /* CONFIG_VSX */
err |= __put_user(&sc->gp_regs, &sc->regs);
WARN_ON(!FULL_REGS(regs));
err |= __copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE);
err |= __put_user(msr, &sc->gp_regs[PT_MSR]);
- err |= __copy_to_user(&sc->fp_regs, &current->thread.TS_FPRSTART,
- FP_REGS_SIZE);
err |= __put_user(signr, &sc->signal);
err |= __put_user(handler, &sc->handler);
if (set != NULL)
@@ -136,6 +150,9 @@ static long restore_sigcontext(struct pt
#ifdef CONFIG_ALTIVEC
elf_vrreg_t __user *v_regs;
#endif
+#ifdef CONFIG_VSX
+ double buf[FP_REGS_SIZE];
+#endif
unsigned long err = 0;
unsigned long save_r13 = 0;
elf_greg_t *gregs = (elf_greg_t *)regs;
@@ -184,9 +201,6 @@ static long restore_sigcontext(struct pt
*/
regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
- err |= __copy_from_user(&current->thread.TS_FPRSTART, &sc->fp_regs,
- FP_REGS_SIZE);
-
#ifdef CONFIG_ALTIVEC
err |= __get_user(v_regs, &sc->v_regs);
if (err)
@@ -205,7 +219,19 @@ static long restore_sigcontext(struct pt
else
current->thread.vrsave = 0;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ /* restore floating point */
+ err |= __copy_from_user(buf, &sc->fp_regs, FP_REGS_SIZE);
+ if (err)
+ return err;
+ for (i = 0; i < 32 ; i++)
+ current->thread.TS_FPR(i) = buf[i];
+ memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+#else
+ err |= __copy_from_user(&current->thread.TS_FPRSTART, &sc->fp_regs,
+ FP_REGS_SIZE);
+#endif
return err;
}
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -78,6 +78,7 @@ extern long kernel_thread(int (*fn)(void
/* Lazy FPU handling on uni-processor */
extern struct task_struct *last_task_used_math;
extern struct task_struct *last_task_used_altivec;
+extern struct task_struct *last_task_used_vsx;
extern struct task_struct *last_task_used_spe;
#ifdef CONFIG_PPC32
@@ -136,8 +137,13 @@ typedef struct {
unsigned long seg;
} mm_segment_t;
+#ifdef CONFIG_VSX
+#define TS_FPR(i) fpvsr[i].fpr.fp
+#define TS_FPRSTART fpvsr
+#else
#define TS_FPR(i) fpr[i]
#define TS_FPRSTART fpr
+#endif
struct thread_struct {
unsigned long ksp; /* Kernel stack pointer */
@@ -155,8 +161,19 @@ struct thread_struct {
unsigned long dbcr0; /* debug control register values */
unsigned long dbcr1;
#endif
+#ifdef CONFIG_VSX
+ /* First 32 VSX registers (overlap with fpr[32]) */
+ union {
+ struct {
+ double fp;
+ double vsrlow;
+ } fpr;
+ vector128 vsr;
+ } fpvsr[32];
+#else
double fpr[32]; /* Complete floating point set */
- struct { /* fpr ... fpscr must be contiguous */
+#endif
+ struct {
unsigned int pad;
unsigned int val; /* Floating point status */
@@ -176,6 +193,10 @@ struct thread_struct {
unsigned long vrsave;
int used_vr; /* set if process has used altivec */
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ /* VSR status */
+ int used_vsr; /* set if process has used VSX */
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
unsigned long evr[32]; /* upper 32-bits of SPE regs */
u64 acc; /* Accumulator */
@@ -200,7 +221,11 @@ struct thread_struct {
.fpexc_mode = MSR_FE0 | MSR_FE1, \
}
#else
+#ifdef CONFIG_VSX
+#define FPVSR_INIT_THREAD .fpvsr = {0}
+#else
#define FPVSR_INIT_THREAD .fpr = {0}
+#endif
#define INIT_THREAD { \
.ksp = INIT_SP, \
.ksp_limit = INIT_SP_LIMIT, \
@@ -293,5 +318,9 @@ static inline void prefetchw(const void
#endif /* __KERNEL__ */
#endif /* __ASSEMBLY__ */
+#ifdef CONFIG_VSX
+#define TS_FPRSPACING 2
+#else
#define TS_FPRSPACING 1
+#endif
#endif /* _ASM_POWERPC_PROCESSOR_H */
^ permalink raw reply [flat|nested] 106+ messages in thread
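[Editorial note: the fpvsr union introduced above can be mocked up in user space to sanity-check the overlap the patch describes: each array element is one 128-bit VSR whose first doubleword aliases the legacy FPR. A sketch, with vector128 approximated as four 32-bit words and the struct name invented for illustration.]

```c
#include <assert.h>

typedef struct { unsigned int u[4]; } vector128;	/* 128-bit vector */

/* The CONFIG_VSX union from the patch: FPR i is doubleword 0 of VSR i,
 * so the FPR array is effectively strided at 16 bytes (TS_FPRSPACING=2
 * doublewords per register). */
struct fp_state {
	union {
		struct {
			double fp;	/* legacy FPR view */
			double vsrlow;	/* low doubleword of the VSR */
		} fpr;
		vector128 vsr;		/* full 128-bit VSR view */
	} fpvsr[32];
};
```

The size assertions below are exactly why the asm macros in this series switch to a TS_FPRSPACING multiplier: each FPR slot doubles in size under CONFIG_VSX.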
* [PATCH 9/9] powerpc: Add CONFIG_VSX config option
2008-06-20 4:13 ` Michael Neuling
` (6 preceding siblings ...)
2008-06-20 4:13 ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
@ 2008-06-20 4:13 ` Michael Neuling
2008-06-20 4:13 ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
` (2 subsequent siblings)
10 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-20 4:13 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Add the CONFIG_VSX config build option. It depends on POWER4, ALTIVEC and PPC_FPU.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/platforms/Kconfig.cputype | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
Index: linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/platforms/Kconfig.cputype
+++ linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
@@ -155,6 +155,22 @@ config ALTIVEC
If in doubt, say Y here.
+config VSX
+ bool "VSX Support"
+ depends on POWER4 && ALTIVEC && PPC_FPU
+ ---help---
+
+ This option enables kernel support for the Vector Scalar Extensions
+ to the PowerPC processor. The kernel currently supports saving and
+ restoring VSX registers, and turning on the 'VSX enable' bit so user
+ processes can execute VSX instructions.
+
+ This option is only useful if you have a processor that supports
+ VSX (P7 and above), but does not have any effect on non-VSX
+ CPUs (it does, however, add code to the kernel).
+
+ If in doubt, say Y here.
+
config SPE
bool "SPE Support"
depends on E200 || E500
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 6/9] powerpc: Add VSX CPU feature
2008-06-20 4:13 ` Michael Neuling
` (4 preceding siblings ...)
2008-06-20 4:13 ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
@ 2008-06-20 4:13 ` Michael Neuling
2008-06-20 4:13 ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
` (4 subsequent siblings)
10 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-20 4:13 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Add a VSX CPU feature. Also add code to detect if VSX is available
from the device tree.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
---
arch/powerpc/kernel/prom.c | 4 ++++
include/asm-powerpc/cputable.h | 15 ++++++++++++++-
2 files changed, 18 insertions(+), 1 deletion(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/prom.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
@@ -609,6 +609,10 @@ static struct feature_property {
{"altivec", 0, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
{"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ /* Yes, this _really_ is ibm,vmx == 2 to enable VSX */
+ {"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
+#endif /* CONFIG_VSX */
#ifdef CONFIG_PPC64
{"ibm,dfp", 1, 0, PPC_FEATURE_HAS_DFP},
{"ibm,purr", 1, CPU_FTR_PURR, 0},
Index: linux-2.6-ozlabs/include/asm-powerpc/cputable.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/cputable.h
+++ linux-2.6-ozlabs/include/asm-powerpc/cputable.h
@@ -27,6 +27,7 @@
#define PPC_FEATURE_HAS_DFP 0x00000400
#define PPC_FEATURE_POWER6_EXT 0x00000200
#define PPC_FEATURE_ARCH_2_06 0x00000100
+#define PPC_FEATURE_HAS_VSX 0x00000080
#define PPC_FEATURE_TRUE_LE 0x00000002
#define PPC_FEATURE_PPC_LE 0x00000001
@@ -181,6 +182,7 @@ extern void do_feature_fixups(unsigned l
#define CPU_FTR_DSCR LONG_ASM_CONST(0x0002000000000000)
#define CPU_FTR_1T_SEGMENT LONG_ASM_CONST(0x0004000000000000)
#define CPU_FTR_NO_SLBIE_B LONG_ASM_CONST(0x0008000000000000)
+#define CPU_FTR_VSX LONG_ASM_CONST(0x0010000000000000)
#ifndef __ASSEMBLY__
@@ -199,6 +201,17 @@ extern void do_feature_fixups(unsigned l
#define PPC_FEATURE_HAS_ALTIVEC_COMP 0
#endif
+/* We only set the VSX features if the kernel was compiled with VSX
+ * support
+ */
+#ifdef CONFIG_VSX
+#define CPU_FTR_VSX_COMP CPU_FTR_VSX
+#define PPC_FEATURE_HAS_VSX_COMP PPC_FEATURE_HAS_VSX
+#else
+#define CPU_FTR_VSX_COMP 0
+#define PPC_FEATURE_HAS_VSX_COMP 0
+#endif
+
/* We only set the spe features if the kernel was compiled with spe
* support
*/
@@ -399,7 +412,7 @@ extern void do_feature_fixups(unsigned l
(CPU_FTRS_POWER3 | CPU_FTRS_RS64 | CPU_FTRS_POWER4 | \
CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | CPU_FTRS_POWER6 | \
CPU_FTRS_POWER7 | CPU_FTRS_CELL | CPU_FTRS_PA6T | \
- CPU_FTR_1T_SEGMENT)
+ CPU_FTR_1T_SEGMENT | CPU_FTR_VSX)
#else
enum {
CPU_FTRS_POSSIBLE =
^ permalink raw reply [flat|nested] 106+ messages in thread
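[Editorial note: the device-tree matching above hinges on the ibm,vmx property value: 1 advertises VMX/Altivec, 2 additionally advertises VSX. A simplified user-space sketch of the feature_property scan; the CPU_FTR_* values are invented for illustration, and the >= comparison reflects my reading of how prom.c treats the table's minimum value.]

```c
#include <string.h>

/* Invented bit values for illustration; real ones live in cputable.h. */
#define CPU_FTR_ALTIVEC	0x1UL
#define CPU_FTR_VSX	0x2UL

struct feature_property {
	const char *name;
	unsigned int min_value;	/* property value must be >= this */
	unsigned long cpu_ftr;
};

static const struct feature_property table[] = {
	{ "ibm,vmx", 1, CPU_FTR_ALTIVEC },
	/* Yes, ibm,vmx == 2 is what additionally enables VSX. */
	{ "ibm,vmx", 2, CPU_FTR_VSX },
};

/* Return the CPU feature bits implied by one device-tree property. */
static unsigned long features_for(const char *prop, unsigned int value)
{
	unsigned long ftrs = 0;
	unsigned int i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (!strcmp(table[i].name, prop) &&
		    value >= table[i].min_value)
			ftrs |= table[i].cpu_ftr;
	return ftrs;
}
```

So a device tree reporting ibm,vmx = 2 picks up both table entries, setting the Altivec and VSX features together.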
* [PATCH 7/9] powerpc: Add VSX assembler code macros
2008-06-20 4:13 ` Michael Neuling
` (7 preceding siblings ...)
2008-06-20 4:13 ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
@ 2008-06-20 4:13 ` Michael Neuling
2008-06-20 6:37 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Kumar Gala
2008-06-23 5:31 ` Michael Neuling
10 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-20 4:13 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
This adds macros for the VSX load/store instructions, as most
binutils versions will not support these for a while.
Also add VSX register save/restore macros and vsr[0-63] register definitions.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
include/asm-powerpc/ppc_asm.h | 127 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 127 insertions(+)
Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -74,6 +74,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
REST_10GPRS(22, base)
#endif
+/*
+ * Define what the VSX XX1 form instructions will look like, then add
+ * the 128-bit load/store instructions based on that.
+ */
+#define VSX_XX1(xs, ra, rb) (((xs) & 0x1f) << 21 | ((ra) << 16) | \
+ ((rb) << 11) | (((xs) >> 5)))
+
+#define STXVD2X(xs, ra, rb) .long (0x7c000798 | VSX_XX1((xs), (ra), (rb)))
+#define LXVD2X(xs, ra, rb) .long (0x7c000698 | VSX_XX1((xs), (ra), (rb)))
#define SAVE_2GPRS(n, base) SAVE_GPR(n, base); SAVE_GPR(n+1, base)
#define SAVE_4GPRS(n, base) SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
@@ -110,6 +119,57 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
#define REST_16VRS(n,b,base) REST_8VRS(n,b,base); REST_8VRS(n+8,b,base)
#define REST_32VRS(n,b,base) REST_16VRS(n,b,base); REST_16VRS(n+16,b,base)
+/* Save the lower 32 VSRs in the thread VSR region */
+#define SAVE_VSR(n,b,base) li b,THREAD_VSR0+(16*(n)); STXVD2X(n,b,base)
+#define SAVE_2VSRS(n,b,base) SAVE_VSR(n,b,base); SAVE_VSR(n+1,b,base)
+#define SAVE_4VSRS(n,b,base) SAVE_2VSRS(n,b,base); SAVE_2VSRS(n+2,b,base)
+#define SAVE_8VSRS(n,b,base) SAVE_4VSRS(n,b,base); SAVE_4VSRS(n+4,b,base)
+#define SAVE_16VSRS(n,b,base) SAVE_8VSRS(n,b,base); SAVE_8VSRS(n+8,b,base)
+#define SAVE_32VSRS(n,b,base) SAVE_16VSRS(n,b,base); SAVE_16VSRS(n+16,b,base)
+#define REST_VSR(n,b,base) li b,THREAD_VSR0+(16*(n)); LXVD2X(n,b,base)
+#define REST_2VSRS(n,b,base) REST_VSR(n,b,base); REST_VSR(n+1,b,base)
+#define REST_4VSRS(n,b,base) REST_2VSRS(n,b,base); REST_2VSRS(n+2,b,base)
+#define REST_8VSRS(n,b,base) REST_4VSRS(n,b,base); REST_4VSRS(n+4,b,base)
+#define REST_16VSRS(n,b,base) REST_8VSRS(n,b,base); REST_8VSRS(n+8,b,base)
+#define REST_32VSRS(n,b,base) REST_16VSRS(n,b,base); REST_16VSRS(n+16,b,base)
+/* Save the upper 32 VSRs (32-63) in the thread VSX region (0-31) */
+#define SAVE_VSRU(n,b,base) li b,THREAD_VR0+(16*(n)); STXVD2X(n+32,b,base)
+#define SAVE_2VSRSU(n,b,base) SAVE_VSRU(n,b,base); SAVE_VSRU(n+1,b,base)
+#define SAVE_4VSRSU(n,b,base) SAVE_2VSRSU(n,b,base); SAVE_2VSRSU(n+2,b,base)
+#define SAVE_8VSRSU(n,b,base) SAVE_4VSRSU(n,b,base); SAVE_4VSRSU(n+4,b,base)
+#define SAVE_16VSRSU(n,b,base) SAVE_8VSRSU(n,b,base); SAVE_8VSRSU(n+8,b,base)
+#define SAVE_32VSRSU(n,b,base) SAVE_16VSRSU(n,b,base); SAVE_16VSRSU(n+16,b,base)
+#define REST_VSRU(n,b,base) li b,THREAD_VR0+(16*(n)); LXVD2X(n+32,b,base)
+#define REST_2VSRSU(n,b,base) REST_VSRU(n,b,base); REST_VSRU(n+1,b,base)
+#define REST_4VSRSU(n,b,base) REST_2VSRSU(n,b,base); REST_2VSRSU(n+2,b,base)
+#define REST_8VSRSU(n,b,base) REST_4VSRSU(n,b,base); REST_4VSRSU(n+4,b,base)
+#define REST_16VSRSU(n,b,base) REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
+#define REST_32VSRSU(n,b,base) REST_16VSRSU(n,b,base); REST_16VSRSU(n+16,b,base)
+
+#ifdef CONFIG_VSX
+#define REST_32FPVSRS(n,c,base) \
+BEGIN_FTR_SECTION \
+ b 2f; \
+END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
+ REST_32FPRS(n,base); \
+ b 3f; \
+2: REST_32VSRS(n,c,base); \
+3:
+
+#define SAVE_32FPVSRS(n,c,base) \
+BEGIN_FTR_SECTION \
+ b 2f; \
+END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
+ SAVE_32FPRS(n,base); \
+ b 3f; \
+2: SAVE_32VSRS(n,c,base); \
+3:
+
+#else
+#define REST_32FPVSRS(n,b,base) REST_32FPRS(n, base)
+#define SAVE_32FPVSRS(n,b,base) SAVE_32FPRS(n, base)
+#endif
+
#define SAVE_EVR(n,s,base) evmergehi s,s,n; stw s,THREAD_EVR0+4*(n)(base)
#define SAVE_2EVRS(n,s,base) SAVE_EVR(n,s,base); SAVE_EVR(n+1,s,base)
#define SAVE_4EVRS(n,s,base) SAVE_2EVRS(n,s,base); SAVE_2EVRS(n+2,s,base)
@@ -534,6 +594,73 @@ END_FTR_SECTION_IFCLR(CPU_FTR_601)
#define vr30 30
#define vr31 31
+/* VSX Registers (VSRs) */
+
+#define vsr0 0
+#define vsr1 1
+#define vsr2 2
+#define vsr3 3
+#define vsr4 4
+#define vsr5 5
+#define vsr6 6
+#define vsr7 7
+#define vsr8 8
+#define vsr9 9
+#define vsr10 10
+#define vsr11 11
+#define vsr12 12
+#define vsr13 13
+#define vsr14 14
+#define vsr15 15
+#define vsr16 16
+#define vsr17 17
+#define vsr18 18
+#define vsr19 19
+#define vsr20 20
+#define vsr21 21
+#define vsr22 22
+#define vsr23 23
+#define vsr24 24
+#define vsr25 25
+#define vsr26 26
+#define vsr27 27
+#define vsr28 28
+#define vsr29 29
+#define vsr30 30
+#define vsr31 31
+#define vsr32 32
+#define vsr33 33
+#define vsr34 34
+#define vsr35 35
+#define vsr36 36
+#define vsr37 37
+#define vsr38 38
+#define vsr39 39
+#define vsr40 40
+#define vsr41 41
+#define vsr42 42
+#define vsr43 43
+#define vsr44 44
+#define vsr45 45
+#define vsr46 46
+#define vsr47 47
+#define vsr48 48
+#define vsr49 49
+#define vsr50 50
+#define vsr51 51
+#define vsr52 52
+#define vsr53 53
+#define vsr54 54
+#define vsr55 55
+#define vsr56 56
+#define vsr57 57
+#define vsr58 58
+#define vsr59 59
+#define vsr60 60
+#define vsr61 61
+#define vsr62 62
+#define vsr63 63
+
/* SPE Registers (EVPRs) */
#define evr0 0
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support
2008-06-20 4:13 ` Michael Neuling
` (2 preceding siblings ...)
2008-06-20 4:13 ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
@ 2008-06-20 4:13 ` Michael Neuling
2008-06-20 4:13 ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
` (6 subsequent siblings)
10 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-20 4:13 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
This patch extends the floating point save and restore code to use the
VSX load/stores when VSX is available. This will make FP context
save/restore marginally slower on FP-only code when VSX is available,
as it has to load/store 128 bits rather than just 64 bits.
Code that mixes FP, VMX and VSX will see consistent architected state.
The signals interface is extended to enable access to VSR 0-31
doubleword 1, after discussions with toolchain maintainers. Backward
compatibility is maintained.
The ptrace interface is also extended to allow access to VSR 0-31 full
registers.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/entry_64.S | 5 +
arch/powerpc/kernel/fpu.S | 16 ++++-
arch/powerpc/kernel/head_64.S | 65 +++++++++++++++++++++++
arch/powerpc/kernel/misc_64.S | 33 +++++++++++
arch/powerpc/kernel/ppc32.h | 1
arch/powerpc/kernel/ppc_ksyms.c | 3 +
arch/powerpc/kernel/process.c | 109 ++++++++++++++++++++++++++++++++++++++-
arch/powerpc/kernel/ptrace.c | 70 +++++++++++++++++++++++++
arch/powerpc/kernel/signal_32.c | 33 +++++++++++
arch/powerpc/kernel/signal_64.c | 31 ++++++++++-
arch/powerpc/kernel/traps.c | 29 ++++++++++
include/asm-powerpc/elf.h | 6 +-
include/asm-powerpc/ptrace.h | 12 ++++
include/asm-powerpc/reg.h | 2
include/asm-powerpc/sigcontext.h | 37 ++++++++++++-
include/asm-powerpc/system.h | 9 +++
include/linux/elf.h | 1
17 files changed, 454 insertions(+), 8 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/entry_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
@@ -353,6 +353,11 @@ _GLOBAL(_switch)
mflr r20 /* Return to switch caller */
mfmsr r22
li r0, MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ oris r0,r0,MSR_VSX@h /* Disable VSX */
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif /* CONFIG_VSX */
#ifdef CONFIG_ALTIVEC
BEGIN_FTR_SECTION
oris r0,r0,MSR_VEC@h /* Disable altivec */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -34,6 +34,11 @@
_GLOBAL(load_up_fpu)
mfmsr r5
ori r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ oris r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
SYNC
MTMSRD(r5) /* enable use of fpu now */
isync
@@ -50,7 +55,7 @@ _GLOBAL(load_up_fpu)
beq 1f
toreal(r4)
addi r4,r4,THREAD /* want last_task_used_math->thread */
- SAVE_32FPRS(0, r4)
+ SAVE_32FPVSRS(0, r5, r4)
mffs fr0
stfd fr0,THREAD_FPSCR(r4)
PPC_LL r5,PT_REGS(r4)
@@ -77,7 +82,7 @@ _GLOBAL(load_up_fpu)
#endif
lfd fr0,THREAD_FPSCR(r5)
MTFSF_L(fr0)
- REST_32FPRS(0, r5)
+ REST_32FPVSRS(0, r4, r5)
#ifndef CONFIG_SMP
subi r4,r5,THREAD
fromreal(r4)
@@ -96,6 +101,11 @@ _GLOBAL(load_up_fpu)
_GLOBAL(giveup_fpu)
mfmsr r5
ori r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ oris r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
SYNC_601
ISYNC_601
MTMSRD(r5) /* enable use of fpu now */
@@ -106,7 +116,7 @@ _GLOBAL(giveup_fpu)
addi r3,r3,THREAD /* want THREAD of task */
PPC_LL r5,PT_REGS(r3)
PPC_LCMPI 0,r5,0
- SAVE_32FPRS(0, r3)
+ SAVE_32FPVSRS(0, r4, r3)
mffs fr0
stfd fr0,THREAD_FPSCR(r3)
beq 1f
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -278,6 +278,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
. = 0xf20
b altivec_unavailable_pSeries
+ . = 0xf40
+ b vsx_unavailable_pSeries
+
#ifdef CONFIG_CBE_RAS
HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
#endif /* CONFIG_CBE_RAS */
@@ -297,6 +300,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
/* moved from 0xf00 */
STD_EXCEPTION_PSERIES(., performance_monitor)
STD_EXCEPTION_PSERIES(., altivec_unavailable)
+ STD_EXCEPTION_PSERIES(., vsx_unavailable)
/*
* An interrupt came in while soft-disabled; clear EE in SRR1,
@@ -834,6 +838,67 @@ _STATIC(load_up_altivec)
blr
#endif /* CONFIG_ALTIVEC */
+ .align 7
+ .globl vsx_unavailable_common
+vsx_unavailable_common:
+ EXCEPTION_PROLOG_COMMON(0xf40, PACA_EXGEN)
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ bne .load_up_vsx
+1:
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
+ bl .save_nvgprs
+ addi r3,r1,STACK_FRAME_OVERHEAD
+ ENABLE_INTS
+ bl .vsx_unavailable_exception
+ b .ret_from_except
+
+#ifdef CONFIG_VSX
+/*
+ * load_up_vsx(unused, unused, tsk)
+ * Disable VSX for the task which had it previously,
+ * and save its vector registers in its thread_struct.
+ * Reuse the fp and vsx saves, but first check to see if they have
+ * been saved already.
+ * On entry: r13 == 'current' && last_task_used_vsx != 'current'
+ */
+_STATIC(load_up_vsx)
+/* Load FP and VSX registers if they haven't been done yet */
+ andi. r5,r12,MSR_FP
+ beql+ load_up_fpu /* skip if already loaded */
+ andis. r5,r12,MSR_VEC@h
+ beql+ load_up_altivec /* skip if already loaded */
+
+#ifndef CONFIG_SMP
+ ld r3,last_task_used_vsx@got(r2)
+ ld r4,0(r3)
+ cmpdi 0,r4,0
+ beq 1f
+ /* Disable VSX for last_task_used_vsx */
+ addi r4,r4,THREAD
+ ld r5,PT_REGS(r4)
+ ld r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+ lis r6,MSR_VSX@h
+ andc r6,r4,r6
+ std r6,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#endif /* CONFIG_SMP */
+ ld r4,PACACURRENT(r13)
+ addi r4,r4,THREAD /* Get THREAD */
+ li r6,1
+ stw r6,THREAD_USED_VSR(r4) /* ... also set thread used vsr */
+ /* enable use of VSX after return */
+ oris r12,r12,MSR_VSX@h
+ std r12,_MSR(r1)
+#ifndef CONFIG_SMP
+ /* Update last_task_used_math to 'current' */
+ ld r4,PACACURRENT(r13)
+ std r4,0(r3)
+#endif /* CONFIG_SMP */
+ b fast_exception_return
+#endif /* CONFIG_VSX */
+
/*
* Hash table stuff
*/
Index: linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/misc_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
@@ -506,6 +506,39 @@ _GLOBAL(giveup_altivec)
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+/*
+ * giveup_vsx(tsk)
+ * Disable VSX for the task given as the argument,
+ * and save the vector registers in its thread_struct.
+ * Enables the VSX for use in the kernel on return.
+ */
+_GLOBAL(giveup_vsx)
+ mfmsr r5
+ oris r5,r5,MSR_VSX@h
+ mtmsrd r5 /* enable use of VSX now */
+ isync
+
+ cmpdi 0,r3,0
+ beqlr- /* if no previous owner, done */
+ addi r3,r3,THREAD /* want THREAD of task */
+ ld r5,PT_REGS(r3)
+ cmpdi 0,r5,0
+ beq 1f
+ ld r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+ lis r3,MSR_VSX@h
+ andc r4,r4,r3 /* disable VSX for previous task */
+ std r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#ifndef CONFIG_SMP
+ li r5,0
+ ld r4,last_task_used_vsx@got(r2)
+ std r5,0(r4)
+#endif /* CONFIG_SMP */
+ blr
+
+#endif /* CONFIG_VSX */
+
/* kexec_wait(phys_cpu)
*
* wait for the flag to change, indicating this kernel is going away but
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc32.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
@@ -120,6 +120,7 @@ struct mcontext32 {
elf_fpregset_t mc_fregs;
unsigned int mc_pad[2];
elf_vrregset_t32 mc_vregs __attribute__((__aligned__(16)));
+ elf_vsrreghalf_t32 mc_vsregs __attribute__((__aligned__(16)));
};
struct ucontext32 {
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc_ksyms.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
@@ -102,6 +102,9 @@ EXPORT_SYMBOL(giveup_fpu);
#ifdef CONFIG_ALTIVEC
EXPORT_SYMBOL(giveup_altivec);
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+EXPORT_SYMBOL(giveup_vsx);
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
EXPORT_SYMBOL(giveup_spe);
#endif /* CONFIG_SPE */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -53,6 +53,7 @@ extern unsigned long _get_SP(void);
#ifndef CONFIG_SMP
struct task_struct *last_task_used_math = NULL;
struct task_struct *last_task_used_altivec = NULL;
+struct task_struct *last_task_used_vsx = NULL;
struct task_struct *last_task_used_spe = NULL;
#endif
@@ -106,11 +107,23 @@ EXPORT_SYMBOL(enable_kernel_fp);
int dump_task_fpu(struct task_struct *tsk, elf_fpregset_t *fpregs)
{
+#ifdef CONFIG_VSX
+ int i;
+ elf_fpreg_t *reg;
+#endif
+
if (!tsk->thread.regs)
return 0;
flush_fp_to_thread(current);
+#ifdef CONFIG_VSX
+ reg = (elf_fpreg_t *)fpregs;
+ for (i = 0; i < ELF_NFPREG - 1; i++, reg++)
+ *reg = tsk->thread.TS_FPR(i);
+ memcpy(reg, &tsk->thread.fpscr, sizeof(elf_fpreg_t));
+#else
memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
+#endif
return 1;
}
@@ -149,7 +162,7 @@ void flush_altivec_to_thread(struct task
}
}
-int dump_task_altivec(struct task_struct *tsk, elf_vrregset_t *vrregs)
+int dump_task_altivec(struct task_struct *tsk, elf_vrreg_t *vrregs)
{
/* ELF_NVRREG includes the VSCR and VRSAVE which we need to save
* separately, see below */
@@ -179,6 +192,79 @@ int dump_task_altivec(struct task_struct
}
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+#if 0
+/* not currently used, but some crazy RAID module might want to later */
+void enable_kernel_vsx(void)
+{
+ WARN_ON(preemptible());
+
+#ifdef CONFIG_SMP
+ if (current->thread.regs && (current->thread.regs->msr & MSR_VSX))
+ giveup_vsx(current);
+ else
+ giveup_vsx(NULL); /* just enable vsx for kernel - force */
+#else
+ giveup_vsx(last_task_used_vsx);
+#endif /* CONFIG_SMP */
+}
+EXPORT_SYMBOL(enable_kernel_vsx);
+#endif
+
+void flush_vsx_to_thread(struct task_struct *tsk)
+{
+ if (tsk->thread.regs) {
+ preempt_disable();
+ if (tsk->thread.regs->msr & MSR_VSX) {
+#ifdef CONFIG_SMP
+ BUG_ON(tsk != current);
+#endif
+ giveup_vsx(tsk);
+ }
+ preempt_enable();
+ }
+}
+
+/*
+ * This dumps the full 128bits of the first 32 VSX registers. This
+ * needs to be called with dump_task_fp and dump_task_altivec to get
+ * all the VSX state.
+ */
+int dump_task_vsx(struct task_struct *tsk, elf_vrreg_t *vrregs)
+{
+ /* Grab only the first half */
+ const int nregs = 32;
+ elf_vrreg_t *reg;
+
+ if (tsk == current)
+ flush_vsx_to_thread(tsk);
+
+ reg = (elf_vrreg_t *)vrregs;
+
+ /* copy the first 32 vsr registers */
+ memcpy(reg, &tsk->thread.vr[0], nregs * sizeof(*reg));
+
+ return 1;
+}
+#endif /* CONFIG_VSX */
+
+int dump_task_vector(struct task_struct *tsk, elf_vrregset_t *vrregs)
+{
+ int rc = 0;
+ elf_vrreg_t *regs = (elf_vrreg_t *)vrregs;
+#ifdef CONFIG_ALTIVEC
+ rc = dump_task_altivec(tsk, regs);
+ if (rc)
+ return rc;
+ regs += ELF_NVRREG;
+#endif
+
+#ifdef CONFIG_VSX
+ rc = dump_task_vsx(tsk, regs);
+#endif
+ return rc;
+}
+
#ifdef CONFIG_SPE
void enable_kernel_spe(void)
@@ -233,6 +319,10 @@ void discard_lazy_cpu_state(void)
if (last_task_used_altivec == current)
last_task_used_altivec = NULL;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (last_task_used_vsx == current)
+ last_task_used_vsx = NULL;
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
if (last_task_used_spe == current)
last_task_used_spe = NULL;
@@ -297,6 +387,10 @@ struct task_struct *__switch_to(struct t
if (prev->thread.regs && (prev->thread.regs->msr & MSR_VEC))
giveup_altivec(prev);
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (prev->thread.regs && (prev->thread.regs->msr & MSR_VSX))
+ giveup_vsx(prev);
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
/*
* If the previous thread used spe in the last quantum
@@ -317,6 +411,10 @@ struct task_struct *__switch_to(struct t
if (new->thread.regs && last_task_used_altivec == new)
new->thread.regs->msr |= MSR_VEC;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (new->thread.regs && last_task_used_vsx == new)
+ new->thread.regs->msr |= MSR_VSX;
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
/* Avoid the trap. On smp this this never happens since
* we don't set last_task_used_spe
@@ -417,6 +515,8 @@ static struct regbit {
{MSR_EE, "EE"},
{MSR_PR, "PR"},
{MSR_FP, "FP"},
+ {MSR_VEC, "VEC"},
+ {MSR_VSX, "VSX"},
{MSR_ME, "ME"},
{MSR_IR, "IR"},
{MSR_DR, "DR"},
@@ -534,6 +634,7 @@ void prepare_to_copy(struct task_struct
{
flush_fp_to_thread(current);
flush_altivec_to_thread(current);
+ flush_vsx_to_thread(current);
flush_spe_to_thread(current);
}
@@ -689,8 +790,14 @@ void start_thread(struct pt_regs *regs,
#endif
discard_lazy_cpu_state();
+#ifdef CONFIG_VSX
+ memset(current->thread.fpvsr, 0,
+ sizeof(current->thread.fpvsr));
+ current->thread.used_vsr = 0;
+#else
memset(current->thread.TS_FPRSTART, 0,
sizeof(current->thread.TS_FPRSTART));
+#endif /* CONFIG_VSX */
current->thread.fpscr.val = 0;
#ifdef CONFIG_ALTIVEC
memset(current->thread.vr, 0, sizeof(current->thread.vr));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -351,6 +351,51 @@ static int vr_set(struct task_struct *ta
}
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+/*
+ * Currently, to set and get all the VSX state, you need to call the
+ * FP and VMX calls as well. This only gets/sets the lower 32
+ * 128-bit VSX registers.
+ */
+
+static int vsr_active(struct task_struct *target,
+ const struct user_regset *regset)
+{
+ flush_vsx_to_thread(target);
+ return target->thread.used_vsr ? regset->n : 0;
+}
+
+static int vsr_get(struct task_struct *target, const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ void *kbuf, void __user *ubuf)
+{
+ int ret;
+
+ flush_vsx_to_thread(target);
+
+ ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+ &target->thread.fpvsr[0].vsr, 0,
+ 32 * sizeof(vector128));
+
+ return ret;
+}
+
+static int vsr_set(struct task_struct *target, const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ const void *kbuf, const void __user *ubuf)
+{
+ int ret;
+
+ flush_vsx_to_thread(target);
+
+ ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+ &target->thread.fpvsr[0].vsr, 0,
+ 32 * sizeof(vector128));
+
+ return ret;
+}
+#endif /* CONFIG_VSX */
+
#ifdef CONFIG_SPE
/*
@@ -427,6 +472,9 @@ enum powerpc_regset {
#ifdef CONFIG_ALTIVEC
REGSET_VMX,
#endif
+#ifdef CONFIG_VSX
+ REGSET_VSX,
+#endif
#ifdef CONFIG_SPE
REGSET_SPE,
#endif
@@ -450,6 +498,13 @@ static const struct user_regset native_r
.active = vr_active, .get = vr_get, .set = vr_set
},
#endif
+#ifdef CONFIG_VSX
+ [REGSET_VSX] = {
+ .core_note_type = NT_PPC_VSX, .n = 34,
+ .size = sizeof(vector128), .align = sizeof(vector128),
+ .active = vsr_active, .get = vsr_get, .set = vsr_set
+ },
+#endif
#ifdef CONFIG_SPE
[REGSET_SPE] = {
.n = 35,
@@ -850,6 +905,21 @@ long arch_ptrace(struct task_struct *chi
sizeof(u32)),
(const void __user *) data);
#endif
+#ifdef CONFIG_VSX
+ case PTRACE_GETVSRREGS:
+ return copy_regset_to_user(child, &user_ppc_native_view,
+ REGSET_VSX,
+ 0, (32 * sizeof(vector128) +
+ sizeof(u32)),
+ (void __user *) data);
+
+ case PTRACE_SETVSRREGS:
+ return copy_regset_from_user(child, &user_ppc_native_view,
+ REGSET_VSX,
+ 0, (32 * sizeof(vector128) +
+ sizeof(u32)),
+ (const void __user *) data);
+#endif
#ifdef CONFIG_SPE
case PTRACE_GETEVRREGS:
/* Get the child spe register state. */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -379,6 +379,21 @@ static int save_user_regs(struct pt_regs
if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
return 1;
+ /*
+ * Copy VSR 0-31 upper half from thread_struct to local
+ * buffer, then write that to userspace. Also set MSR_VSX in
+ * the saved MSR value to indicate that frame->mc_vregs
+ * contains valid data
+ */
+ if (current->thread.used_vsr) {
+ flush_vsx_to_thread(current);
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.fpvsr[i].fpr.vsrlow;
+ if (__copy_to_user(&frame->mc_vsregs, buf,
+ ELF_NVSRHALFREG * sizeof(double)))
+ return 1;
+ msr |= MSR_VSX;
+ }
#else
/* save floating-point registers */
if (__copy_to_user(&frame->mc_fregs, current->thread.TS_FPRSTART,
@@ -484,6 +499,24 @@ static long restore_user_regs(struct pt_
current->thread.TS_FPR(i) = buf[i];
memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+ /*
+ * Force the process to reload the VSX registers from
+ * current->thread when it next executes a VSX instruction.
+ */
+ regs->msr &= ~MSR_VSX;
+ if (msr & MSR_VSX) {
+ /*
+ * Restore VSX registers from the stack to a local
+ * buffer, then write this out to the thread_struct
+ */
+ if (__copy_from_user(buf, &sr->mc_vsregs,
+ sizeof(sr->mc_vsregs)))
+ return 1;
+ for (i = 0; i < 32 ; i++)
+ current->thread.fpvsr[i].fpr.vsrlow = buf[i];
+ } else if (current->thread.used_vsr)
+ for (i = 0; i < 32 ; i++)
+ current->thread.fpvsr[i].fpr.vsrlow = 0;
#else
if (__copy_from_user(current->thread.TS_FPRSTART, &sr->mc_fregs,
sizeof(sr->mc_fregs)))
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -123,6 +123,22 @@ static long setup_sigcontext(struct sigc
buf[i] = current->thread.TS_FPR(i);
memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+ /*
+ * Copy VSX low doubleword to local buffer for formatting,
+ * then out to userspace. Update v_regs to point after the
+ * VMX data.
+ */
+ if (current->thread.used_vsr) {
+ flush_vsx_to_thread(current);
+ v_regs += ELF_NVRREG;
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.fpvsr[i].fpr.vsrlow;
+ err |= __copy_to_user(v_regs, buf, 32 * sizeof(double));
+ /* set MSR_VSX in the MSR value in the frame to
+ * indicate that the frame's VSX area contains valid data.
+ */
+ msr |= MSR_VSX;
+ }
#else /* CONFIG_VSX */
/* copy fpr regs and fpscr */
err |= __copy_to_user(&sc->fp_regs, &current->thread.TS_FPR(0),
@@ -199,7 +215,7 @@ static long restore_sigcontext(struct pt
* current->thread.TS_FPR/vr for the reasons explained in the
* previous comment.
*/
- regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
+ regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC | MSR_VSX);
#ifdef CONFIG_ALTIVEC
err |= __get_user(v_regs, &sc->v_regs);
@@ -228,6 +244,19 @@ static long restore_sigcontext(struct pt
current->thread.TS_FPR(i) = buf[i];
memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+ /*
+ * Get additional VSX data. Update v_regs to point after the
+ * VMX data. Copy VSX low doubleword from userspace to local
+ * buffer for formatting, then into the taskstruct.
+ */
+ v_regs += ELF_NVRREG;
+ if ((msr & MSR_VSX) != 0)
+ err |= __copy_from_user(buf, v_regs, 32 * sizeof(double));
+ else
+ memset(buf, 0, 32 * sizeof(double));
+
+ for (i = 0; i < 32 ; i++)
+ current->thread.fpvsr[i].fpr.vsrlow = buf[i];
#else
err |= __copy_from_user(&current->thread.TS_FPRSTART, &sc->fp_regs,
FP_REGS_SIZE);
Index: linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/traps.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
@@ -967,6 +967,20 @@ void altivec_unavailable_exception(struc
die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT);
}
+void vsx_unavailable_exception(struct pt_regs *regs)
+{
+ if (user_mode(regs)) {
+ /* A user program has executed a VSX instruction,
+ but this kernel doesn't support VSX. */
+ _exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+ return;
+ }
+
+ printk(KERN_EMERG "Unrecoverable VSX Unavailable Exception "
+ "%lx at %lx\n", regs->trap, regs->nip);
+ die("Unrecoverable VSX Unavailable Exception", regs, SIGABRT);
+}
+
void performance_monitor_exception(struct pt_regs *regs)
{
perf_irq(regs);
@@ -1091,6 +1105,21 @@ void altivec_assist_exception(struct pt_
}
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+void vsx_assist_exception(struct pt_regs *regs)
+{
+ if (!user_mode(regs)) {
+ printk(KERN_EMERG "VSX assist exception in kernel mode"
+ " at %lx\n", regs->nip);
+ die("Kernel VSX assist exception", regs, SIGILL);
+ }
+
+ flush_vsx_to_thread(current);
+ printk(KERN_INFO "VSX assist not supported at %lx\n", regs->nip);
+ _exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+}
+#endif /* CONFIG_VSX */
+
#ifdef CONFIG_FSL_BOOKE
void CacheLockingException(struct pt_regs *regs, unsigned long address,
unsigned long error_code)
Index: linux-2.6-ozlabs/include/asm-powerpc/elf.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/elf.h
+++ linux-2.6-ozlabs/include/asm-powerpc/elf.h
@@ -109,6 +109,7 @@ typedef elf_gregset_t32 compat_elf_gregs
#ifdef __powerpc64__
# define ELF_NVRREG32 33 /* includes vscr & vrsave stuffed together */
# define ELF_NVRREG 34 /* includes vscr & vrsave in split vectors */
+# define ELF_NVSRHALFREG 32 /* Half the vsx registers */
# define ELF_GREG_TYPE elf_greg_t64
#else
# define ELF_NEVRREG 34 /* includes acc (as 2) */
@@ -158,6 +159,7 @@ typedef __vector128 elf_vrreg_t;
typedef elf_vrreg_t elf_vrregset_t[ELF_NVRREG];
#ifdef __powerpc64__
typedef elf_vrreg_t elf_vrregset_t32[ELF_NVRREG32];
+typedef elf_fpreg_t elf_vsrreghalf_t32[ELF_NVSRHALFREG];
#endif
#ifdef __KERNEL__
@@ -219,8 +221,8 @@ extern int dump_task_fpu(struct task_str
typedef elf_vrregset_t elf_fpxregset_t;
#ifdef CONFIG_ALTIVEC
-extern int dump_task_altivec(struct task_struct *, elf_vrregset_t *vrregs);
-#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_altivec(tsk, regs)
+extern int dump_task_vector(struct task_struct *, elf_vrregset_t *vrregs);
+#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_vector(tsk, regs)
#define ELF_CORE_XFPREG_TYPE NT_PPC_VMX
#endif
Index: linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ptrace.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
@@ -223,6 +223,14 @@ extern void user_disable_single_step(str
#define PT_VRSAVE_32 (PT_VR0 + 33*4)
#endif
+/*
+ * Only the first 32 VSRs are stored here; the second 32 VSRs overlap VR0-31.
+ */
+#define PT_VSR0 150 /* each VSR reg occupies 2 slots in 64-bit */
+#define PT_VSR31 (PT_VSR0 + 2*31)
+#ifdef __KERNEL__
+#define PT_VSR0_32 300 /* each VSR reg occupies 4 slots in 32-bit */
+#endif
#endif /* __powerpc64__ */
/*
@@ -245,6 +253,10 @@ extern void user_disable_single_step(str
#define PTRACE_GETEVRREGS 20
#define PTRACE_SETEVRREGS 21
+/* Get the first 32 128bit VSX registers */
+#define PTRACE_GETVSRREGS 27
+#define PTRACE_SETVSRREGS 28
+
/*
* Get or set a debug register. The first 16 are DABR registers and the
* second 16 are IABR registers.
Index: linux-2.6-ozlabs/include/asm-powerpc/reg.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/reg.h
+++ linux-2.6-ozlabs/include/asm-powerpc/reg.h
@@ -30,6 +30,7 @@
#define MSR_ISF_LG 61 /* Interrupt 64b mode valid on 630 */
#define MSR_HV_LG 60 /* Hypervisor state */
#define MSR_VEC_LG 25 /* Enable AltiVec */
+#define MSR_VSX_LG 23 /* Enable VSX */
#define MSR_POW_LG 18 /* Enable Power Management */
#define MSR_WE_LG 18 /* Wait State Enable */
#define MSR_TGPR_LG 17 /* TLB Update registers in use */
@@ -71,6 +72,7 @@
#endif
#define MSR_VEC __MASK(MSR_VEC_LG) /* Enable AltiVec */
+#define MSR_VSX __MASK(MSR_VSX_LG) /* Enable VSX */
#define MSR_POW __MASK(MSR_POW_LG) /* Enable Power Management */
#define MSR_WE __MASK(MSR_WE_LG) /* Wait State Enable */
#define MSR_TGPR __MASK(MSR_TGPR_LG) /* TLB Update registers in use */
Index: linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/sigcontext.h
+++ linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
@@ -43,9 +43,44 @@ struct sigcontext {
* it must be copied via a vector register to/from storage) or as a word.
* The entry with index 33 contains the vrsave as the first word (offset 0)
* within the quadword.
+ *
+ * Part of the VSX data is stored here also by extending vmx_restore
+ * by an additional 32 double words. Architecturally the layout of
+ * the VSR registers and how they overlap on top of the legacy FPR and
+ * VR registers is shown below:
+ *
+ * VSR doubleword 0 VSR doubleword 1
+ * ----------------------------------------------------------------
+ * VSR[0] | FPR[0] | |
+ * ----------------------------------------------------------------
+ * VSR[1] | FPR[1] | |
+ * ----------------------------------------------------------------
+ * | ... | |
+ * | ... | |
+ * ----------------------------------------------------------------
+ * VSR[30] | FPR[30] | |
+ * ----------------------------------------------------------------
+ * VSR[31] | FPR[31] | |
+ * ----------------------------------------------------------------
+ * VSR[32] | VR[0] |
+ * ----------------------------------------------------------------
+ * VSR[33] | VR[1] |
+ * ----------------------------------------------------------------
+ * | ... |
+ * | ... |
+ * ----------------------------------------------------------------
+ * VSR[62] | VR[30] |
+ * ----------------------------------------------------------------
+ * VSR[63] | VR[31] |
+ * ----------------------------------------------------------------
+ *
+ * FPR/VSR 0-31 doubleword 0 is stored in fp_regs, and VMX/VSR 32-63
+ * is stored at the start of vmx_reserve. vmx_reserve is extended for
+ * backwards compatibility to store VSR 0-31 doubleword 1 after the VMX
+ * registers and vscr/vrsave.
*/
elf_vrreg_t __user *v_regs;
- long vmx_reserve[ELF_NVRREG+ELF_NVRREG+1];
+ long vmx_reserve[ELF_NVRREG+ELF_NVRREG+32+1];
#endif
};
Index: linux-2.6-ozlabs/include/asm-powerpc/system.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/system.h
+++ linux-2.6-ozlabs/include/asm-powerpc/system.h
@@ -132,6 +132,7 @@ extern void enable_kernel_altivec(void);
extern void giveup_altivec(struct task_struct *);
extern void load_up_altivec(struct task_struct *);
extern int emulate_altivec(struct pt_regs *);
+extern void giveup_vsx(struct task_struct *);
extern void enable_kernel_spe(void);
extern void giveup_spe(struct task_struct *);
extern void load_up_spe(struct task_struct *);
@@ -155,6 +156,14 @@ static inline void flush_altivec_to_thre
}
#endif
+#ifdef CONFIG_VSX
+extern void flush_vsx_to_thread(struct task_struct *);
+#else
+static inline void flush_vsx_to_thread(struct task_struct *t)
+{
+}
+#endif
+
#ifdef CONFIG_SPE
extern void flush_spe_to_thread(struct task_struct *);
#else
Index: linux-2.6-ozlabs/include/linux/elf.h
===================================================================
--- linux-2.6-ozlabs.orig/include/linux/elf.h
+++ linux-2.6-ozlabs/include/linux/elf.h
@@ -358,6 +358,7 @@ typedef struct elf64_shdr {
#define NT_PRXFPREG 0x46e62b7f /* copied from gdb5.1/include/elf/common.h */
#define NT_PPC_VMX 0x100 /* PowerPC Altivec/VMX registers */
#define NT_PPC_SPE 0x101 /* PowerPC SPE/EVR registers */
+#define NT_PPC_VSX 0x102 /* PowerPC VSX registers */
#define NT_386_TLS 0x200 /* i386 TLS slots (struct user_desc) */
^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
2008-06-20 4:13 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
@ 2008-06-20 6:35 ` Kumar Gala
0 siblings, 0 replies; 106+ messages in thread
From: Kumar Gala @ 2008-06-20 6:35 UTC (permalink / raw)
To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras
On Jun 19, 2008, at 11:13 PM, Michael Neuling wrote:
> If we set the SPE MSR bit in save_user_regs we can blow away the VEC
> bit. This will never happen in reality (VMX and SPE will never be in
> the same processor as their opcodes overlap), but it looks bad. Also
> when we add VSX here in a later patch, we can hit two of these at the
> same time.
Also, MSR_SPE and MSR_VEC are the same bit. So we'd never clobber
anything.
- k
* Re: [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX.
2008-06-20 4:13 ` Michael Neuling
` (8 preceding siblings ...)
2008-06-20 4:13 ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
@ 2008-06-20 6:37 ` Kumar Gala
2008-06-20 8:15 ` Michael Neuling
2008-06-23 5:31 ` Michael Neuling
10 siblings, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-20 6:37 UTC (permalink / raw)
To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras
On Jun 19, 2008, at 11:13 PM, Michael Neuling wrote:
> The following set of patches adds Vector Scalar Extensions (VSX)
> support for POWER7. Includes context switch, ptrace and signals
> support.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
> Paulus: please consider for your 2.6.27 tree.
>
> Updated with comments from Kumar, Milton, Dave Woodhouse and Mark
> 'NKOTB' Nelson.
> - Changed thread_struct array definition to be cleaner
> - Updated CPU_FTRS_POSSIBLE
> - Updated Kconfig typo and duplicate
> - Added comment to clarify ibm,vmx = 2 really means VSX.
One question I was wondering about is the "user space" view of VSX.
Is the intent to have it seem like there is a unique register set for
VSX separate from FP or AltiVec?
(This gets into what the ABI changes look like).
- k
* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
2008-06-20 4:13 ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
@ 2008-06-20 6:39 ` Kumar Gala
2008-06-22 11:29 ` Michael Neuling
0 siblings, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-20 6:39 UTC (permalink / raw)
To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras
On Jun 19, 2008, at 11:13 PM, Michael Neuling wrote:
> Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
> ===================================================================
> --- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
> +++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
> @@ -136,6 +136,9 @@ typedef struct {
> unsigned long seg;
> } mm_segment_t;
>
> +#define TS_FPR(i) fpr[i]
> +#define TS_FPRSTART fpr
> +
> struct thread_struct {
> unsigned long ksp; /* Kernel stack pointer */
> unsigned long ksp_limit; /* if ksp <= ksp_limit stack overflow */
> @@ -197,12 +200,13 @@ struct thread_struct {
> .fpexc_mode = MSR_FE0 | MSR_FE1, \
> }
> #else
> +#define FPVSR_INIT_THREAD .fpr = {0}
Being a bit nit picky, but doesn't seem like this patch should
introduce FPVSR.
>
> #define INIT_THREAD { \
> .ksp = INIT_SP, \
> .ksp_limit = INIT_SP_LIMIT, \
> .regs = (struct pt_regs *)INIT_SP - 1, /* XXX bogus, I think */ \
> .fs = KERNEL_DS, \
> - .fpr = {0}, \
> + FPVSR_INIT_THREAD, \
> .fpscr = { .val = 0, }, \
> .fpexc_mode = 0, \
> }
> @@ -289,4 +293,5 @@ static inline void prefetchw(const void
>
> #endif /* __KERNEL__ */
> #endif /* __ASSEMBLY__ */
> +#define TS_FPRSPACING 1
> #endif /* _ASM_POWERPC_PROCESSOR_H */
* Re: [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
2008-06-20 4:13 ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
@ 2008-06-20 6:44 ` Kumar Gala
0 siblings, 0 replies; 106+ messages in thread
From: Kumar Gala @ 2008-06-20 6:44 UTC (permalink / raw)
To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras
> Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
> ===================================================================
> --- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
> +++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
> @@ -78,6 +78,7 @@ extern long kernel_thread(int (*fn)(void
> /* Lazy FPU handling on uni-processor */
> extern struct task_struct *last_task_used_math;
> extern struct task_struct *last_task_used_altivec;
> +extern struct task_struct *last_task_used_vsx;
> extern struct task_struct *last_task_used_spe;
>
> #ifdef CONFIG_PPC32
> @@ -136,8 +137,13 @@ typedef struct {
> unsigned long seg;
> } mm_segment_t;
>
> +#ifdef CONFIG_VSX
> +#define TS_FPR(i) fpvsr[i].fpr.fp
> +#define TS_FPRSTART fpvsr
> +#else
> #define TS_FPR(i) fpr[i]
> #define TS_FPRSTART fpr
> +#endif
>
> struct thread_struct {
> unsigned long ksp; /* Kernel stack pointer */
> @@ -155,8 +161,19 @@ struct thread_struct {
> unsigned long dbcr0; /* debug control register values */
> unsigned long dbcr1;
> #endif
> +#ifdef CONFIG_VSX
> + /* First 32 VSX registers (overlap with fpr[32]) */
> + union {
> + struct {
> + double fp;
s/fp/fpr
> + double vsrlow;
> + } fpr;
> + vector128 vsr;
> + } fpvsr[32];
> +#else
> double fpr[32]; /* Complete floating point set */
> - struct { /* fpr ... fpscr must be contiguous */
> +#endif
> + struct {
>
> unsigned int pad;
> unsigned int val; /* Floating point status */
So if I search correctly I count 2 uses of .vsr. Seems like we could
easily make those two cases use .fp and drop the union.
- k
* Re: [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX.
2008-06-20 6:37 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Kumar Gala
@ 2008-06-20 8:15 ` Michael Neuling
0 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-20 8:15 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-dev, Paul Mackerras
In message <B353EBCF-A080-41C7-B331-61D29C6F5C02@kernel.crashing.org> you wrote:
>
> On Jun 19, 2008, at 11:13 PM, Michael Neuling wrote:
>
> > The following set of patches adds Vector Scalar Extensions (VSX)
> > support for POWER7. Includes context switch, ptrace and signals
> > support.
> >
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > ---
> > Paulus: please consider for your 2.6.27 tree.
> >
> > Updated with comments from Kumar, Milton, Dave Woodhouse and Mark
> > 'NKOTB' Nelson.
> > - Changed thread_struct array definition to be cleaner
> > - Updated CPU_FTRS_POSSIBLE
> > - Updated Kconfig typo and duplicate
> > - Added comment to clarify ibm,vmx = 2 really means VSX.
>
> One question I was wondering about is the "user space" view of VSX.
> Is the intent to have it seem like there is a unique register set for
> VSX separate from FP or AltiVec?
For userspace it's not a unique register set. So if you execute FP code
in the middle of your VSX code, you change VSX registers 0-31.
Userspace will see the same as if it was running natively on the CPU.
> (This gets into what the ABI changes look like).
Signals and ptrace interfaces have been kept backwards compatible.
Mikey
* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
2008-06-20 6:39 ` Kumar Gala
@ 2008-06-22 11:29 ` Michael Neuling
0 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-22 11:29 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-dev, Paul Mackerras
In message <7F82D2F6-6FB3-49F0-9512-D60AC2E9CBED@kernel.crashing.org> you wrote:
>
> On Jun 19, 2008, at 11:13 PM, Michael Neuling wrote:
>
> > Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
> > ===================================================================
> > --- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
> > +++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
> > @@ -136,6 +136,9 @@ typedef struct {
> > unsigned long seg;
> > } mm_segment_t;
> >
> > +#define TS_FPR(i) fpr[i]
> > +#define TS_FPRSTART fpr
> > +
> > struct thread_struct {
> > unsigned long ksp; /* Kernel stack pointer */
> > unsigned long ksp_limit; /* if ksp <= ksp_limit stack overflow */
> > @@ -197,12 +200,13 @@ struct thread_struct {
> > .fpexc_mode = MSR_FE0 | MSR_FE1, \
> > }
> > #else
> > +#define FPVSR_INIT_THREAD .fpr = {0}
>
> Being a bit nit picky, but doesn't seem like this patch should
> introduce FPVSR.
Yep.. a bit early, thanks.
>
> >
> > #define INIT_THREAD { \
> > .ksp = INIT_SP, \
> > .ksp_limit = INIT_SP_LIMIT, \
> > .regs = (struct pt_regs *)INIT_SP - 1, /* XXX bogus, I think */ \
> > .fs = KERNEL_DS, \
> > - .fpr = {0}, \
> > + FPVSR_INIT_THREAD, \
> > .fpscr = { .val = 0, }, \
> > .fpexc_mode = 0, \
> > }
> > @@ -289,4 +293,5 @@ static inline void prefetchw(const void
> >
> > #endif /* __KERNEL__ */
> > #endif /* __ASSEMBLY__ */
> > +#define TS_FPRSPACING 1
> > #endif /* _ASM_POWERPC_PROCESSOR_H */
>
* [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
2008-06-23 5:31 ` Michael Neuling
2008-06-23 5:31 ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
@ 2008-06-23 5:31 ` Michael Neuling
2008-06-23 5:31 ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
` (7 subsequent siblings)
9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23 5:31 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
If we set the SPE MSR bit in save_user_regs we can blow away the VEC
bit. This will never happen in reality (VMX and SPE will never be in
the same processor as their opcodes overlap), but it looks bad. Also
when we add VSX here in a later patch, we can hit two of these at the
same time.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/signal_32.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -336,6 +336,8 @@ struct rt_sigframe {
static int save_user_regs(struct pt_regs *regs, struct mcontext __user *frame,
int sigret)
{
+ unsigned long msr = regs->msr;
+
/* Make sure floating point registers are stored in regs */
flush_fp_to_thread(current);
@@ -354,8 +356,7 @@ static int save_user_regs(struct pt_regs
return 1;
/* set MSR_VEC in the saved MSR value to indicate that
frame->mc_vregs contains valid data */
- if (__put_user(regs->msr | MSR_VEC, &frame->mc_gregs[PT_MSR]))
- return 1;
+ msr |= MSR_VEC;
}
/* else assert((regs->msr & MSR_VEC) == 0) */
@@ -377,8 +378,7 @@ static int save_user_regs(struct pt_regs
return 1;
/* set MSR_SPE in the saved MSR value to indicate that
frame->mc_vregs contains valid data */
- if (__put_user(regs->msr | MSR_SPE, &frame->mc_gregs[PT_MSR]))
- return 1;
+ msr |= MSR_SPE;
}
/* else assert((regs->msr & MSR_SPE) == 0) */
@@ -387,6 +387,8 @@ static int save_user_regs(struct pt_regs
return 1;
#endif /* CONFIG_SPE */
+ if (__put_user(msr, &frame->mc_gregs[PT_MSR]))
+ return 1;
if (sigret) {
/* Set up the sigreturn trampoline: li r0,sigret; sc */
if (__put_user(0x38000000UL + sigret, &frame->tramp[0])
* [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable
2008-06-23 5:31 ` Michael Neuling
@ 2008-06-23 5:31 ` Michael Neuling
2008-06-23 5:31 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
` (8 subsequent siblings)
9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23 5:31 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Make load_up_fpu and load_up_altivec callable so they can be reused by
the VSX code.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/fpu.S | 2 +-
arch/powerpc/kernel/head_32.S | 6 ++++--
arch/powerpc/kernel/head_64.S | 8 +++++---
arch/powerpc/kernel/head_booke.h | 6 ++++--
4 files changed, 14 insertions(+), 8 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -85,7 +85,7 @@ _GLOBAL(load_up_fpu)
#endif /* CONFIG_SMP */
/* restore registers and return */
/* we haven't used ctr or xer or lr */
- b fast_exception_return
+ blr
/*
* giveup_fpu(tsk)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_32.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
@@ -421,8 +421,10 @@ BEGIN_FTR_SECTION
b ProgramCheck
END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE)
EXCEPTION_PROLOG
- bne load_up_fpu /* if from user, just load it up */
- addi r3,r1,STACK_FRAME_OVERHEAD
+ beq 1f
+ bl load_up_fpu /* if from user, just load it up */
+ b fast_exception_return
+1: addi r3,r1,STACK_FRAME_OVERHEAD
EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
/* Decrementer */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -741,7 +741,8 @@ fp_unavailable_common:
ENABLE_INTS
bl .kernel_fp_unavailable_exception
BUG_OPCODE
-1: b .load_up_fpu
+1: bl .load_up_fpu
+ b fast_exception_return
.align 7
.globl altivec_unavailable_common
@@ -749,7 +750,8 @@ altivec_unavailable_common:
EXCEPTION_PROLOG_COMMON(0xf20, PACA_EXGEN)
#ifdef CONFIG_ALTIVEC
BEGIN_FTR_SECTION
- bne .load_up_altivec /* if from user, just load it up */
+ bnel .load_up_altivec
+ b fast_exception_return
END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
#endif
bl .save_nvgprs
@@ -829,7 +831,7 @@ _STATIC(load_up_altivec)
std r4,0(r3)
#endif /* CONFIG_SMP */
/* restore registers and return */
- b fast_exception_return
+ blr
#endif /* CONFIG_ALTIVEC */
/*
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_booke.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
@@ -363,8 +363,10 @@ label:
#define FP_UNAVAILABLE_EXCEPTION \
START_EXCEPTION(FloatingPointUnavailable) \
NORMAL_EXCEPTION_PROLOG; \
- bne load_up_fpu; /* if from user, just load it up */ \
- addi r3,r1,STACK_FRAME_OVERHEAD; \
+ beq 1f; \
+ bl load_up_fpu; /* if from user, just load it up */ \
+ b fast_exception_return; \
+1: addi r3,r1,STACK_FRAME_OVERHEAD; \
EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
#endif /* __HEAD_BOOKE_H__ */
* [PATCH 3/9] powerpc: Move altivec_unavailable
2008-06-23 5:31 ` Michael Neuling
` (2 preceding siblings ...)
2008-06-23 5:31 ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
@ 2008-06-23 5:31 ` Michael Neuling
2008-06-23 5:31 ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
` (5 subsequent siblings)
9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23 5:31 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Move the altivec_unavailable code, to make room at 0xf40 where the
vsx_unavailable exception will be.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/head_64.S | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -275,7 +275,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
. = 0xf00
b performance_monitor_pSeries
- STD_EXCEPTION_PSERIES(0xf20, altivec_unavailable)
+ . = 0xf20
+ b altivec_unavailable_pSeries
#ifdef CONFIG_CBE_RAS
HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
@@ -295,6 +296,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
/* moved from 0xf00 */
STD_EXCEPTION_PSERIES(., performance_monitor)
+ STD_EXCEPTION_PSERIES(., altivec_unavailable)
/*
* An interrupt came in while soft-disabled; clear EE in SRR1,
* [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX.
2008-06-20 4:13 ` Michael Neuling
` (9 preceding siblings ...)
2008-06-20 6:37 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Kumar Gala
@ 2008-06-23 5:31 ` Michael Neuling
2008-06-23 5:31 ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
` (9 more replies)
10 siblings, 10 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23 5:31 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
The following set of patches adds Vector Scalar Extensions (VSX)
support for POWER7. Includes context switch, ptrace and signals support.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
Paulus: please consider for your 2.6.27 tree.
- Updated to remove the union that Kumar doesn't like. I'm not sure I
like this version as much due to the magic offsets required to
access the vsrlow. It does clean up some other parts of the code
though.
* [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
2008-06-23 5:31 ` Michael Neuling
2008-06-23 5:31 ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
2008-06-23 5:31 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
@ 2008-06-23 5:31 ` Michael Neuling
2008-06-23 5:31 ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
` (6 subsequent siblings)
9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23 5:31 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
We are going to change where the floating point registers are stored
in the thread_struct, so in preparation add some macros to access the
floating point registers. Update all code to use these new macros.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/align.c | 6 ++--
arch/powerpc/kernel/process.c | 5 ++-
arch/powerpc/kernel/ptrace.c | 14 +++++----
arch/powerpc/kernel/ptrace32.c | 9 ++++--
arch/powerpc/kernel/softemu8xx.c | 4 +-
arch/powerpc/math-emu/math.c | 56 +++++++++++++++++++--------------------
include/asm-powerpc/ppc_asm.h | 5 ++-
include/asm-powerpc/processor.h | 3 ++
8 files changed, 56 insertions(+), 46 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/align.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/align.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/align.c
@@ -366,7 +366,7 @@ static int emulate_multiple(struct pt_re
static int emulate_fp_pair(struct pt_regs *regs, unsigned char __user *addr,
unsigned int reg, unsigned int flags)
{
- char *ptr = (char *) &current->thread.fpr[reg];
+ char *ptr = (char *) &current->thread.TS_FPR(reg);
int i, ret;
if (!(flags & F))
@@ -784,7 +784,7 @@ int fix_alignment(struct pt_regs *regs)
return -EFAULT;
}
} else if (flags & F) {
- data.dd = current->thread.fpr[reg];
+ data.dd = current->thread.TS_FPR(reg);
if (flags & S) {
/* Single-precision FP store requires conversion... */
#ifdef CONFIG_PPC_FPU
@@ -862,7 +862,7 @@ int fix_alignment(struct pt_regs *regs)
if (unlikely(ret))
return -EFAULT;
} else if (flags & F)
- current->thread.fpr[reg] = data.dd;
+ current->thread.TS_FPR(reg) = data.dd;
else
regs->gpr[reg] = data.ll;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -110,7 +110,7 @@ int dump_task_fpu(struct task_struct *ts
return 0;
flush_fp_to_thread(current);
- memcpy(fpregs, &tsk->thread.fpr[0], sizeof(*fpregs));
+ memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
return 1;
}
@@ -689,7 +689,8 @@ void start_thread(struct pt_regs *regs,
#endif
discard_lazy_cpu_state();
- memset(current->thread.fpr, 0, sizeof(current->thread.fpr));
+ memset(current->thread.fpr, 0,
+ sizeof(current->thread.fpr));
current->thread.fpscr.val = 0;
#ifdef CONFIG_ALTIVEC
memset(current->thread.vr, 0, sizeof(current->thread.vr));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -218,10 +218,10 @@ static int fpr_get(struct task_struct *t
flush_fp_to_thread(target);
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
- offsetof(struct thread_struct, fpr[32]));
+ offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
- &target->thread.fpr, 0, -1);
+ target->thread.fpr, 0, -1);
}
static int fpr_set(struct task_struct *target, const struct user_regset *regset,
@@ -231,10 +231,10 @@ static int fpr_set(struct task_struct *t
flush_fp_to_thread(target);
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
- offsetof(struct thread_struct, fpr[32]));
+ offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
- &target->thread.fpr, 0, -1);
+ target->thread.fpr, 0, -1);
}
@@ -728,7 +728,8 @@ long arch_ptrace(struct task_struct *chi
tmp = ptrace_get_reg(child, (int) index);
} else {
flush_fp_to_thread(child);
- tmp = ((unsigned long *)child->thread.fpr)[index - PT_FPR0];
+ tmp = ((unsigned long *)child->thread.fpr)
+ [TS_FPRSPACING * (index - PT_FPR0)];
}
ret = put_user(tmp,(unsigned long __user *) data);
break;
@@ -755,7 +756,8 @@ long arch_ptrace(struct task_struct *chi
ret = ptrace_put_reg(child, index, data);
} else {
flush_fp_to_thread(child);
- ((unsigned long *)child->thread.fpr)[index - PT_FPR0] = data;
+ ((unsigned long *)child->thread.fpr)
+ [TS_FPRSPACING * (index - PT_FPR0)] = data;
ret = 0;
}
break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
@@ -122,7 +122,8 @@ long compat_arch_ptrace(struct task_stru
* to be an array of unsigned int (32 bits) - the
* index passed in is based on this assumption.
*/
- tmp = ((unsigned int *)child->thread.fpr)[index - PT_FPR0];
+ tmp = ((unsigned int *)child->thread.fpr)
+ [TS_FPRSPACING * (index - PT_FPR0)];
}
ret = put_user((unsigned int)tmp, (u32 __user *)data);
break;
@@ -162,7 +163,8 @@ long compat_arch_ptrace(struct task_stru
CHECK_FULL_REGS(child->thread.regs);
if (numReg >= PT_FPR0) {
flush_fp_to_thread(child);
- tmp = ((unsigned long int *)child->thread.fpr)[numReg - PT_FPR0];
+ tmp = ((unsigned long int *)child->thread.fpr)
+ [TS_FPRSPACING * (numReg - PT_FPR0)];
} else { /* register within PT_REGS struct */
tmp = ptrace_get_reg(child, numReg);
}
@@ -217,7 +219,8 @@ long compat_arch_ptrace(struct task_stru
* to be an array of unsigned int (32 bits) - the
* index passed in is based on this assumption.
*/
- ((unsigned int *)child->thread.fpr)[index - PT_FPR0] = data;
+ ((unsigned int *)child->thread.fpr)
+ [TS_FPRSPACING * (index - PT_FPR0)] = data;
ret = 0;
}
break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/softemu8xx.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
@@ -124,7 +124,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
disp = instword & 0xffff;
ea = (u32 *)(regs->gpr[idxreg] + disp);
- ip = (u32 *)&current->thread.fpr[flreg];
+ ip = (u32 *)&current->thread.TS_FPR(flreg);
switch ( inst )
{
@@ -168,7 +168,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
break;
case FMR:
/* assume this is a fp move -- Cort */
- memcpy(ip, &current->thread.fpr[(instword>>11)&0x1f],
+ memcpy(ip, &current->thread.TS_FPR((instword>>11)&0x1f),
sizeof(double));
break;
default:
Index: linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/math-emu/math.c
+++ linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
@@ -230,14 +230,14 @@ do_mathemu(struct pt_regs *regs)
case LFD:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
lfd(op0, op1, op2, op3);
break;
case LFDU:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
lfd(op0, op1, op2, op3);
regs->gpr[idx] = (unsigned long)op1;
@@ -245,21 +245,21 @@ do_mathemu(struct pt_regs *regs)
case STFD:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
stfd(op0, op1, op2, op3);
break;
case STFDU:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
stfd(op0, op1, op2, op3);
regs->gpr[idx] = (unsigned long)op1;
break;
case OP63:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
fmr(op0, op1, op2, op3);
break;
default:
@@ -356,28 +356,28 @@ do_mathemu(struct pt_regs *regs)
switch (type) {
case AB:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
case AC:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op2 = (void *)&current->thread.fpr[(insn >> 6) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op2 = (void *)&current->thread.TS_FPR((insn >> 6) & 0x1f);
break;
case ABC:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
- op3 = (void *)&current->thread.fpr[(insn >> 6) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
+ op3 = (void *)&current->thread.TS_FPR((insn >> 6) & 0x1f);
break;
case D:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
break;
@@ -387,27 +387,27 @@ do_mathemu(struct pt_regs *regs)
goto illegal;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)(regs->gpr[idx] + sdisp);
break;
case X:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
break;
case XA:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
break;
case XB:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
case XE:
idx = (insn >> 16) & 0x1f;
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
if (!idx) {
if (((insn >> 1) & 0x3ff) == STFIWX)
op1 = (void *)(regs->gpr[(insn >> 11) & 0x1f]);
@@ -421,7 +421,7 @@ do_mathemu(struct pt_regs *regs)
case XEU:
idx = (insn >> 16) & 0x1f;
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0)
+ regs->gpr[(insn >> 11) & 0x1f]);
break;
@@ -429,8 +429,8 @@ do_mathemu(struct pt_regs *regs)
case XCR:
op0 = (void *)&regs->ccr;
op1 = (void *)((insn >> 23) & 0x7);
- op2 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op3 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op2 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op3 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
case XCRL:
@@ -450,7 +450,7 @@ do_mathemu(struct pt_regs *regs)
case XFLB:
op0 = (void *)((insn >> 17) & 0xff);
- op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
default:
Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -6,6 +6,7 @@
#include <linux/stringify.h>
#include <asm/asm-compat.h>
+#include <asm/processor.h>
#ifndef __ASSEMBLY__
#error __FILE__ should only be used in assembler files
@@ -83,13 +84,13 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
#define REST_8GPRS(n, base) REST_4GPRS(n, base); REST_4GPRS(n+4, base)
#define REST_10GPRS(n, base) REST_8GPRS(n, base); REST_2GPRS(n+8, base)
-#define SAVE_FPR(n, base) stfd n,THREAD_FPR0+8*(n)(base)
+#define SAVE_FPR(n, base) stfd n,THREAD_FPR0+8*TS_FPRSPACING*(n)(base)
#define SAVE_2FPRS(n, base) SAVE_FPR(n, base); SAVE_FPR(n+1, base)
#define SAVE_4FPRS(n, base) SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
#define SAVE_8FPRS(n, base) SAVE_4FPRS(n, base); SAVE_4FPRS(n+4, base)
#define SAVE_16FPRS(n, base) SAVE_8FPRS(n, base); SAVE_8FPRS(n+8, base)
#define SAVE_32FPRS(n, base) SAVE_16FPRS(n, base); SAVE_16FPRS(n+16, base)
-#define REST_FPR(n, base) lfd n,THREAD_FPR0+8*(n)(base)
+#define REST_FPR(n, base) lfd n,THREAD_FPR0+8*TS_FPRSPACING*(n)(base)
#define REST_2FPRS(n, base) REST_FPR(n, base); REST_FPR(n+1, base)
#define REST_4FPRS(n, base) REST_2FPRS(n, base); REST_2FPRS(n+2, base)
#define REST_8FPRS(n, base) REST_4FPRS(n, base); REST_4FPRS(n+4, base)
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -136,6 +136,8 @@ typedef struct {
unsigned long seg;
} mm_segment_t;
+#define TS_FPR(i) fpr[i]
+
struct thread_struct {
unsigned long ksp; /* Kernel stack pointer */
unsigned long ksp_limit; /* if ksp <= ksp_limit stack overflow */
@@ -289,4 +291,5 @@ static inline void prefetchw(const void
#endif /* __KERNEL__ */
#endif /* __ASSEMBLY__ */
+#define TS_FPRSPACING 1
#endif /* _ASM_POWERPC_PROCESSOR_H */
* [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
2008-06-23 5:31 ` Michael Neuling
` (5 preceding siblings ...)
2008-06-23 5:31 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
@ 2008-06-23 5:31 ` Michael Neuling
2008-06-23 5:31 ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
` (2 subsequent siblings)
9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23 5:31 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
The layout of the new VSR registers and how they overlap on top of the
legacy FPR and VR registers is:
VSR doubleword 0 VSR doubleword 1
----------------------------------------------------------------
VSR[0] | FPR[0] | |
----------------------------------------------------------------
VSR[1] | FPR[1] | |
----------------------------------------------------------------
| ... | |
| ... | |
----------------------------------------------------------------
VSR[30] | FPR[30] | |
----------------------------------------------------------------
VSR[31] | FPR[31] | |
----------------------------------------------------------------
VSR[32] | VR[0] |
----------------------------------------------------------------
VSR[33] | VR[1] |
----------------------------------------------------------------
| ... |
| ... |
----------------------------------------------------------------
VSR[62] | VR[30] |
----------------------------------------------------------------
VSR[63] | VR[31] |
----------------------------------------------------------------
VSX has 64 128-bit registers. The first 32 registers overlap with the
FP registers, extending each of them with an additional 64 bits. The
second 32 registers overlap with the VMX registers.
This patch introduces the thread_struct changes required to reflect
this register layout. Ptrace and signals code is updated so that the
floating point registers are correctly accessed from the thread_struct
when CONFIG_VSX is enabled.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/asm-offsets.c | 4 ++
arch/powerpc/kernel/ptrace.c | 28 ++++++++++++++++++
arch/powerpc/kernel/signal_32.c | 59 ++++++++++++++++++++++++++++----------
arch/powerpc/kernel/signal_64.c | 32 ++++++++++++++++++--
include/asm-powerpc/processor.h | 21 ++++++++++++-
5 files changed, 126 insertions(+), 18 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
@@ -74,6 +74,10 @@ int main(void)
DEFINE(THREAD_VSCR, offsetof(struct thread_struct, vscr));
DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ DEFINE(THREAD_VSR0, offsetof(struct thread_struct, fpr));
+ DEFINE(THREAD_USED_VSR, offsetof(struct thread_struct, used_vsr));
+#endif /* CONFIG_VSX */
#ifdef CONFIG_PPC64
DEFINE(KSP_VSID, offsetof(struct thread_struct, ksp_vsid));
#else /* CONFIG_PPC64 */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -215,26 +215,54 @@ static int fpr_get(struct task_struct *t
unsigned int pos, unsigned int count,
void *kbuf, void __user *ubuf)
{
+#ifdef CONFIG_VSX
+ double buf[33];
+ int i;
+#endif
flush_fp_to_thread(target);
+#ifdef CONFIG_VSX
+ /* copy to local buffer then write that out */
+ for (i = 0; i < 32 ; i++)
+ buf[i] = target->thread.TS_FPR(i);
+ memcpy(&buf[32], &target->thread.fpscr, sizeof(double));
+ return user_regset_copyout(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+
+#else
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
target->thread.fpr, 0, -1);
+#endif
}
static int fpr_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
+#ifdef CONFIG_VSX
+ double buf[33];
+ int i;
+#endif
flush_fp_to_thread(target);
+#ifdef CONFIG_VSX
+ /* copy to local buffer then write that out */
+ i = user_regset_copyin(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+ if (i)
+ return i;
+ for (i = 0; i < 32 ; i++)
+ target->thread.TS_FPR(i) = buf[i];
+ memcpy(&target->thread.fpscr, &buf[32], sizeof(double));
+ return 0;
+#else
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
target->thread.fpr, 0, -1);
+#endif
}
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -337,14 +337,16 @@ static int save_user_regs(struct pt_regs
int sigret)
{
unsigned long msr = regs->msr;
+#ifdef CONFIG_VSX
+ double buf[32];
+ int i;
+#endif
/* Make sure floating point registers are stored in regs */
flush_fp_to_thread(current);
- /* save general and floating-point registers */
- if (save_general_regs(regs, frame) ||
- __copy_to_user(&frame->mc_fregs, current->thread.fpr,
- ELF_NFPREG * sizeof(double)))
+ /* save general registers */
+ if (save_general_regs(regs, frame))
return 1;
#ifdef CONFIG_ALTIVEC
@@ -368,7 +370,20 @@ static int save_user_regs(struct pt_regs
if (__put_user(current->thread.vrsave, (u32 __user *)&frame->mc_vregs[32]))
return 1;
#endif /* CONFIG_ALTIVEC */
-
+#ifdef CONFIG_VSX
+ /* save FPR copy to local buffer then write to the thread_struct */
+ flush_fp_to_thread(current);
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.TS_FPR(i);
+ memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+ if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
+ return 1;
+#else
+ /* save floating-point registers */
+ if (__copy_to_user(&frame->mc_fregs, current->thread.fpr,
+ ELF_NFPREG * sizeof(double)))
+ return 1;
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
/* save spe registers */
if (current->thread.used_spe) {
@@ -411,6 +426,10 @@ static long restore_user_regs(struct pt_
long err;
unsigned int save_r2 = 0;
unsigned long msr;
+#ifdef CONFIG_VSX
+ double buf[32];
+ int i;
+#endif
/*
* restore general registers but not including MSR or SOFTE. Also
@@ -438,16 +457,11 @@ static long restore_user_regs(struct pt_
*/
discard_lazy_cpu_state();
- /* force the process to reload the FP registers from
- current->thread when it next does FP instructions */
- regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
- if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
- sizeof(sr->mc_fregs)))
- return 1;
-
#ifdef CONFIG_ALTIVEC
- /* force the process to reload the altivec registers from
- current->thread when it next does altivec instructions */
+ /*
+ * Force the process to reload the altivec registers from
+ * current->thread when it next does altivec instructions
+ */
regs->msr &= ~MSR_VEC;
if (msr & MSR_VEC) {
/* restore altivec registers from the stack */
@@ -462,6 +476,23 @@ static long restore_user_regs(struct pt_
return 1;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (__copy_from_user(buf, &sr->mc_fregs, sizeof(sr->mc_fregs)))
+ return 1;
+ for (i = 0; i < 32 ; i++)
+ current->thread.TS_FPR(i) = buf[i];
+ memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+#else
+ if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
+ sizeof(sr->mc_fregs)))
+ return 1;
+#endif /* CONFIG_VSX */
+ /*
+ * force the process to reload the FP registers from
+ * current->thread when it next does FP instructions
+ */
+ regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
+
#ifdef CONFIG_SPE
/* force the process to reload the spe registers from
current->thread when it next does spe instructions */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -89,6 +89,10 @@ static long setup_sigcontext(struct sigc
#endif
unsigned long msr = regs->msr;
long err = 0;
+#ifdef CONFIG_VSX
+ double buf[FP_REGS_SIZE];
+ int i;
+#endif
flush_fp_to_thread(current);
@@ -112,11 +116,21 @@ static long setup_sigcontext(struct sigc
#else /* CONFIG_ALTIVEC */
err |= __put_user(0, &sc->v_regs);
#endif /* CONFIG_ALTIVEC */
+ flush_fp_to_thread(current);
+#ifdef CONFIG_VSX
+ /* Copy FP to local buffer then write that out */
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.TS_FPR(i);
+ memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+ err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+#else /* CONFIG_VSX */
+ /* copy fpr regs and fpscr */
+ err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
+#endif /* CONFIG_VSX */
err |= __put_user(&sc->gp_regs, &sc->regs);
WARN_ON(!FULL_REGS(regs));
err |= __copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE);
err |= __put_user(msr, &sc->gp_regs[PT_MSR]);
- err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
err |= __put_user(signr, &sc->signal);
err |= __put_user(handler, &sc->handler);
if (set != NULL)
@@ -135,6 +149,10 @@ static long restore_sigcontext(struct pt
#ifdef CONFIG_ALTIVEC
elf_vrreg_t __user *v_regs;
#endif
+#ifdef CONFIG_VSX
+ double buf[FP_REGS_SIZE];
+ int i;
+#endif
unsigned long err = 0;
unsigned long save_r13 = 0;
elf_greg_t *gregs = (elf_greg_t *)regs;
@@ -182,8 +199,6 @@ static long restore_sigcontext(struct pt
*/
regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
- err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
-
#ifdef CONFIG_ALTIVEC
err |= __get_user(v_regs, &sc->v_regs);
if (err)
@@ -202,7 +217,18 @@ static long restore_sigcontext(struct pt
else
current->thread.vrsave = 0;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ /* restore floating point */
+ err |= __copy_from_user(buf, &sc->fp_regs, FP_REGS_SIZE);
+ if (err)
+ return err;
+ for (i = 0; i < 32 ; i++)
+ current->thread.TS_FPR(i) = buf[i];
+ memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+#else
+ err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
+#endif
return err;
}
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -78,6 +78,7 @@ extern long kernel_thread(int (*fn)(void
/* Lazy FPU handling on uni-processor */
extern struct task_struct *last_task_used_math;
extern struct task_struct *last_task_used_altivec;
+extern struct task_struct *last_task_used_vsx;
extern struct task_struct *last_task_used_spe;
#ifdef CONFIG_PPC32
@@ -136,7 +137,13 @@ typedef struct {
unsigned long seg;
} mm_segment_t;
+#define TS_FPROFFSET 0
+#define TS_VSRLOWOFFSET 1
+#ifdef CONFIG_VSX
+#define TS_FPR(i) fpr[i][TS_FPROFFSET]
+#else
#define TS_FPR(i) fpr[i]
+#endif
struct thread_struct {
unsigned long ksp; /* Kernel stack pointer */
@@ -154,8 +161,12 @@ struct thread_struct {
unsigned long dbcr0; /* debug control register values */
unsigned long dbcr1;
#endif
+#ifdef CONFIG_VSX
+ double fpr[32][2]; /* Complete floating point set */
+#else
double fpr[32]; /* Complete floating point set */
- struct { /* fpr ... fpscr must be contiguous */
+#endif
+ struct {
unsigned int pad;
unsigned int val; /* Floating point status */
@@ -175,6 +186,10 @@ struct thread_struct {
unsigned long vrsave;
int used_vr; /* set if process has used altivec */
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ /* VSR status */
+ int used_vsr; /* set if process has used VSX */
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
unsigned long evr[32]; /* upper 32-bits of SPE regs */
u64 acc; /* Accumulator */
@@ -291,5 +306,9 @@ static inline void prefetchw(const void
#endif /* __KERNEL__ */
#endif /* __ASSEMBLY__ */
+#ifdef CONFIG_VSX
+#define TS_FPRSPACING 2
+#else
#define TS_FPRSPACING 1
+#endif
#endif /* _ASM_POWERPC_PROCESSOR_H */
* [PATCH 7/9] powerpc: Add VSX assembler code macros
2008-06-23 5:31 ` Michael Neuling
` (3 preceding siblings ...)
2008-06-23 5:31 ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
@ 2008-06-23 5:31 ` Michael Neuling
2008-06-23 5:31 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
` (4 subsequent siblings)
9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23 5:31 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
This adds macros for the VSX load/store instructions, as most
binutils versions are not going to support them for a while.
Also add VSX register save/restore macros and vsr[0-63] register definitions.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
include/asm-powerpc/ppc_asm.h | 127 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 127 insertions(+)
Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -74,6 +74,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
REST_10GPRS(22, base)
#endif
+/*
+ * Define what the VSX XX1 form instructions will look like, then add
+ * the 128 bit load store instructions based on that.
+ */
+#define VSX_XX1(xs, ra, rb) (((xs) & 0x1f) << 21 | ((ra) << 16) | \
+ ((rb) << 11) | (((xs) >> 5)))
+
+#define STXVD2X(xs, ra, rb) .long (0x7c000798 | VSX_XX1((xs), (ra), (rb)))
+#define LXVD2X(xs, ra, rb) .long (0x7c000698 | VSX_XX1((xs), (ra), (rb)))
#define SAVE_2GPRS(n, base) SAVE_GPR(n, base); SAVE_GPR(n+1, base)
#define SAVE_4GPRS(n, base) SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
@@ -110,6 +119,57 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
#define REST_16VRS(n,b,base) REST_8VRS(n,b,base); REST_8VRS(n+8,b,base)
#define REST_32VRS(n,b,base) REST_16VRS(n,b,base); REST_16VRS(n+16,b,base)
+/* Save the lower 32 VSRs in the thread VSR region */
+#define SAVE_VSR(n,b,base) li b,THREAD_VSR0+(16*(n)); STXVD2X(n,b,base)
+#define SAVE_2VSRS(n,b,base) SAVE_VSR(n,b,base); SAVE_VSR(n+1,b,base)
+#define SAVE_4VSRS(n,b,base) SAVE_2VSRS(n,b,base); SAVE_2VSRS(n+2,b,base)
+#define SAVE_8VSRS(n,b,base) SAVE_4VSRS(n,b,base); SAVE_4VSRS(n+4,b,base)
+#define SAVE_16VSRS(n,b,base) SAVE_8VSRS(n,b,base); SAVE_8VSRS(n+8,b,base)
+#define SAVE_32VSRS(n,b,base) SAVE_16VSRS(n,b,base); SAVE_16VSRS(n+16,b,base)
+#define REST_VSR(n,b,base) li b,THREAD_VSR0+(16*(n)); LXVD2X(n,b,base)
+#define REST_2VSRS(n,b,base) REST_VSR(n,b,base); REST_VSR(n+1,b,base)
+#define REST_4VSRS(n,b,base) REST_2VSRS(n,b,base); REST_2VSRS(n+2,b,base)
+#define REST_8VSRS(n,b,base) REST_4VSRS(n,b,base); REST_4VSRS(n+4,b,base)
+#define REST_16VSRS(n,b,base) REST_8VSRS(n,b,base); REST_8VSRS(n+8,b,base)
+#define REST_32VSRS(n,b,base) REST_16VSRS(n,b,base); REST_16VSRS(n+16,b,base)
+/* Save the upper 32 VSRs (32-63) in the thread VSX region (0-31) */
+#define SAVE_VSRU(n,b,base) li b,THREAD_VR0+(16*(n)); STXVD2X(n+32,b,base)
+#define SAVE_2VSRSU(n,b,base) SAVE_VSRU(n,b,base); SAVE_VSRU(n+1,b,base)
+#define SAVE_4VSRSU(n,b,base) SAVE_2VSRSU(n,b,base); SAVE_2VSRSU(n+2,b,base)
+#define SAVE_8VSRSU(n,b,base) SAVE_4VSRSU(n,b,base); SAVE_4VSRSU(n+4,b,base)
+#define SAVE_16VSRSU(n,b,base) SAVE_8VSRSU(n,b,base); SAVE_8VSRSU(n+8,b,base)
+#define SAVE_32VSRSU(n,b,base) SAVE_16VSRSU(n,b,base); SAVE_16VSRSU(n+16,b,base)
+#define REST_VSRU(n,b,base) li b,THREAD_VR0+(16*(n)); LXVD2X(n+32,b,base)
+#define REST_2VSRSU(n,b,base) REST_VSRU(n,b,base); REST_VSRU(n+1,b,base)
+#define REST_4VSRSU(n,b,base) REST_2VSRSU(n,b,base); REST_2VSRSU(n+2,b,base)
+#define REST_8VSRSU(n,b,base) REST_4VSRSU(n,b,base); REST_4VSRSU(n+4,b,base)
+#define REST_16VSRSU(n,b,base) REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
+#define REST_32VSRSU(n,b,base) REST_16VSRSU(n,b,base); REST_16VSRSU(n+16,b,base)
+
+#ifdef CONFIG_VSX
+#define REST_32FPVSRS(n,c,base) \
+BEGIN_FTR_SECTION \
+ b 2f; \
+END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
+ REST_32FPRS(n,base); \
+ b 3f; \
+2: REST_32VSRS(n,c,base); \
+3:
+
+#define SAVE_32FPVSRS(n,c,base) \
+BEGIN_FTR_SECTION \
+ b 2f; \
+END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
+ SAVE_32FPRS(n,base); \
+ b 3f; \
+2: SAVE_32VSRS(n,c,base); \
+3:
+
+#else
+#define REST_32FPVSRS(n,b,base) REST_32FPRS(n, base)
+#define SAVE_32FPVSRS(n,b,base) SAVE_32FPRS(n, base)
+#endif
+
#define SAVE_EVR(n,s,base) evmergehi s,s,n; stw s,THREAD_EVR0+4*(n)(base)
#define SAVE_2EVRS(n,s,base) SAVE_EVR(n,s,base); SAVE_EVR(n+1,s,base)
#define SAVE_4EVRS(n,s,base) SAVE_2EVRS(n,s,base); SAVE_2EVRS(n+2,s,base)
@@ -534,6 +594,73 @@ END_FTR_SECTION_IFCLR(CPU_FTR_601)
#define vr30 30
#define vr31 31
+/* VSX Registers (VSRs) */
+
+#define vsr0 0
+#define vsr1 1
+#define vsr2 2
+#define vsr3 3
+#define vsr4 4
+#define vsr5 5
+#define vsr6 6
+#define vsr7 7
+#define vsr8 8
+#define vsr9 9
+#define vsr10 10
+#define vsr11 11
+#define vsr12 12
+#define vsr13 13
+#define vsr14 14
+#define vsr15 15
+#define vsr16 16
+#define vsr17 17
+#define vsr18 18
+#define vsr19 19
+#define vsr20 20
+#define vsr21 21
+#define vsr22 22
+#define vsr23 23
+#define vsr24 24
+#define vsr25 25
+#define vsr26 26
+#define vsr27 27
+#define vsr28 28
+#define vsr29 29
+#define vsr30 30
+#define vsr31 31
+#define vsr32 32
+#define vsr33 33
+#define vsr34 34
+#define vsr35 35
+#define vsr36 36
+#define vsr37 37
+#define vsr38 38
+#define vsr39 39
+#define vsr40 40
+#define vsr41 41
+#define vsr42 42
+#define vsr43 43
+#define vsr44 44
+#define vsr45 45
+#define vsr46 46
+#define vsr47 47
+#define vsr48 48
+#define vsr49 49
+#define vsr50 50
+#define vsr51 51
+#define vsr52 52
+#define vsr53 53
+#define vsr54 54
+#define vsr55 55
+#define vsr56 56
+#define vsr57 57
+#define vsr58 58
+#define vsr59 59
+#define vsr60 60
+#define vsr61 61
+#define vsr62 62
+#define vsr63 63
+
/* SPE Registers (EVPRs) */
#define evr0 0
* [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support
2008-06-23 5:31 ` Michael Neuling
` (6 preceding siblings ...)
2008-06-23 5:31 ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
@ 2008-06-23 5:31 ` Michael Neuling
2008-06-23 5:31 ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
2008-06-23 7:38 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23 5:31 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
This patch extends the floating point save and restore code to use the
VSX load/stores when VSX is available. This will make FP context
save/restore marginally slower for FP-only code when VSX is available,
as it has to load/store 128 bits rather than just 64 bits.
Mixing FP, VMX and VSX code will see consistent architected state.
The signals interface is extended to enable access to VSR 0-31
doubleword 1 after discussions with tool chain maintainers. Backward
compatibility is maintained.
The ptrace interface is also extended to allow access to VSR 0-31 full
registers.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/entry_64.S | 5 +
arch/powerpc/kernel/fpu.S | 16 ++++-
arch/powerpc/kernel/head_64.S | 65 +++++++++++++++++++++++
arch/powerpc/kernel/misc_64.S | 33 ++++++++++++
arch/powerpc/kernel/ppc32.h | 1
arch/powerpc/kernel/ppc_ksyms.c | 3 +
arch/powerpc/kernel/process.c | 106 ++++++++++++++++++++++++++++++++++++++-
arch/powerpc/kernel/ptrace.c | 70 +++++++++++++++++++++++++
arch/powerpc/kernel/signal_32.c | 33 ++++++++++++
arch/powerpc/kernel/signal_64.c | 31 +++++++++++
arch/powerpc/kernel/traps.c | 29 ++++++++++
include/asm-powerpc/elf.h | 6 +-
include/asm-powerpc/ptrace.h | 12 ++++
include/asm-powerpc/reg.h | 2
include/asm-powerpc/sigcontext.h | 37 +++++++++++++
include/asm-powerpc/system.h | 9 +++
include/linux/elf.h | 1
17 files changed, 451 insertions(+), 8 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/entry_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
@@ -353,6 +353,11 @@ _GLOBAL(_switch)
mflr r20 /* Return to switch caller */
mfmsr r22
li r0, MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ oris r0,r0,MSR_VSX@h /* Disable VSX */
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif /* CONFIG_VSX */
#ifdef CONFIG_ALTIVEC
BEGIN_FTR_SECTION
oris r0,r0,MSR_VEC@h /* Disable altivec */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -34,6 +34,11 @@
_GLOBAL(load_up_fpu)
mfmsr r5
ori r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ oris r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
SYNC
MTMSRD(r5) /* enable use of fpu now */
isync
@@ -50,7 +55,7 @@ _GLOBAL(load_up_fpu)
beq 1f
toreal(r4)
addi r4,r4,THREAD /* want last_task_used_math->thread */
- SAVE_32FPRS(0, r4)
+ SAVE_32FPVSRS(0, r5, r4)
mffs fr0
stfd fr0,THREAD_FPSCR(r4)
PPC_LL r5,PT_REGS(r4)
@@ -77,7 +82,7 @@ _GLOBAL(load_up_fpu)
#endif
lfd fr0,THREAD_FPSCR(r5)
MTFSF_L(fr0)
- REST_32FPRS(0, r5)
+ REST_32FPVSRS(0, r4, r5)
#ifndef CONFIG_SMP
subi r4,r5,THREAD
fromreal(r4)
@@ -96,6 +101,11 @@ _GLOBAL(load_up_fpu)
_GLOBAL(giveup_fpu)
mfmsr r5
ori r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ oris r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
SYNC_601
ISYNC_601
MTMSRD(r5) /* enable use of fpu now */
@@ -106,7 +116,7 @@ _GLOBAL(giveup_fpu)
addi r3,r3,THREAD /* want THREAD of task */
PPC_LL r5,PT_REGS(r3)
PPC_LCMPI 0,r5,0
- SAVE_32FPRS(0, r3)
+ SAVE_32FPVSRS(0, r4, r3)
mffs fr0
stfd fr0,THREAD_FPSCR(r3)
beq 1f
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -278,6 +278,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
. = 0xf20
b altivec_unavailable_pSeries
+ . = 0xf40
+ b vsx_unavailable_pSeries
+
#ifdef CONFIG_CBE_RAS
HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
#endif /* CONFIG_CBE_RAS */
@@ -297,6 +300,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
/* moved from 0xf00 */
STD_EXCEPTION_PSERIES(., performance_monitor)
STD_EXCEPTION_PSERIES(., altivec_unavailable)
+ STD_EXCEPTION_PSERIES(., vsx_unavailable)
/*
* An interrupt came in while soft-disabled; clear EE in SRR1,
@@ -834,6 +838,67 @@ _STATIC(load_up_altivec)
blr
#endif /* CONFIG_ALTIVEC */
+ .align 7
+ .globl vsx_unavailable_common
+vsx_unavailable_common:
+ EXCEPTION_PROLOG_COMMON(0xf40, PACA_EXGEN)
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ bne .load_up_vsx
+1:
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
+ bl .save_nvgprs
+ addi r3,r1,STACK_FRAME_OVERHEAD
+ ENABLE_INTS
+ bl .vsx_unavailable_exception
+ b .ret_from_except
+
+#ifdef CONFIG_VSX
+/*
+ * load_up_vsx(unused, unused, tsk)
+ * Disable VSX for the task which had it previously,
+ * and save its vector registers in its thread_struct.
+ * Reuse the fp and vsx saves, but first check to see if they have
+ * been saved already.
+ * On entry: r13 == 'current' && last_task_used_vsx != 'current'
+ */
+_STATIC(load_up_vsx)
+/* Load FP and VSX registers if they haven't been done yet */
+ andi. r5,r12,MSR_FP
+ beql+ load_up_fpu /* skip if already loaded */
+ andis. r5,r12,MSR_VEC@h
+ beql+ load_up_altivec /* skip if already loaded */
+
+#ifndef CONFIG_SMP
+ ld r3,last_task_used_vsx@got(r2)
+ ld r4,0(r3)
+ cmpdi 0,r4,0
+ beq 1f
+ /* Disable VSX for last_task_used_vsx */
+ addi r4,r4,THREAD
+ ld r5,PT_REGS(r4)
+ ld r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+ lis r6,MSR_VSX@h
+ andc r6,r4,r6
+ std r6,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#endif /* CONFIG_SMP */
+ ld r4,PACACURRENT(r13)
+ addi r4,r4,THREAD /* Get THREAD */
+ li r6,1
+ stw r6,THREAD_USED_VSR(r4) /* ... also set thread used vsr */
+ /* enable use of VSX after return */
+ oris r12,r12,MSR_VSX@h
+ std r12,_MSR(r1)
+#ifndef CONFIG_SMP
+ /* Update last_task_used_math to 'current' */
+ ld r4,PACACURRENT(r13)
+ std r4,0(r3)
+#endif /* CONFIG_SMP */
+ b fast_exception_return
+#endif /* CONFIG_VSX */
+
/*
* Hash table stuff
*/
Index: linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/misc_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
@@ -506,6 +506,39 @@ _GLOBAL(giveup_altivec)
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+/*
+ * giveup_vsx(tsk)
+ * Disable VSX for the task given as the argument,
+ * and save the vector registers in its thread_struct.
+ * Enables the VSX for use in the kernel on return.
+ */
+_GLOBAL(giveup_vsx)
+ mfmsr r5
+ oris r5,r5,MSR_VSX@h
+ mtmsrd r5 /* enable use of VSX now */
+ isync
+
+ cmpdi 0,r3,0
+ beqlr- /* if no previous owner, done */
+ addi r3,r3,THREAD /* want THREAD of task */
+ ld r5,PT_REGS(r3)
+ cmpdi 0,r5,0
+ beq 1f
+ ld r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+ lis r3,MSR_VSX@h
+ andc r4,r4,r3 /* disable VSX for previous task */
+ std r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#ifndef CONFIG_SMP
+ li r5,0
+ ld r4,last_task_used_vsx@got(r2)
+ std r5,0(r4)
+#endif /* CONFIG_SMP */
+ blr
+
+#endif /* CONFIG_VSX */
+
/* kexec_wait(phys_cpu)
*
* wait for the flag to change, indicating this kernel is going away but
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc32.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
@@ -120,6 +120,7 @@ struct mcontext32 {
elf_fpregset_t mc_fregs;
unsigned int mc_pad[2];
elf_vrregset_t32 mc_vregs __attribute__((__aligned__(16)));
+ elf_vsrreghalf_t32 mc_vsregs __attribute__((__aligned__(16)));
};
struct ucontext32 {
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc_ksyms.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
@@ -102,6 +102,9 @@ EXPORT_SYMBOL(giveup_fpu);
#ifdef CONFIG_ALTIVEC
EXPORT_SYMBOL(giveup_altivec);
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+EXPORT_SYMBOL(giveup_vsx);
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
EXPORT_SYMBOL(giveup_spe);
#endif /* CONFIG_SPE */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -53,6 +53,7 @@ extern unsigned long _get_SP(void);
#ifndef CONFIG_SMP
struct task_struct *last_task_used_math = NULL;
struct task_struct *last_task_used_altivec = NULL;
+struct task_struct *last_task_used_vsx = NULL;
struct task_struct *last_task_used_spe = NULL;
#endif
@@ -106,11 +107,23 @@ EXPORT_SYMBOL(enable_kernel_fp);
int dump_task_fpu(struct task_struct *tsk, elf_fpregset_t *fpregs)
{
+#ifdef CONFIG_VSX
+ int i;
+ elf_fpreg_t *reg;
+#endif
+
if (!tsk->thread.regs)
return 0;
flush_fp_to_thread(current);
+#ifdef CONFIG_VSX
+ reg = (elf_fpreg_t *)fpregs;
+ for (i = 0; i < ELF_NFPREG - 1; i++, reg++)
+ *reg = tsk->thread.TS_FPR(i);
+ memcpy(reg, &tsk->thread.fpscr, sizeof(elf_fpreg_t));
+#else
memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
+#endif
return 1;
}
@@ -149,7 +162,7 @@ void flush_altivec_to_thread(struct task
}
}
-int dump_task_altivec(struct task_struct *tsk, elf_vrregset_t *vrregs)
+int dump_task_altivec(struct task_struct *tsk, elf_vrreg_t *vrregs)
{
/* ELF_NVRREG includes the VSCR and VRSAVE which we need to save
* separately, see below */
@@ -179,6 +192,79 @@ int dump_task_altivec(struct task_struct
}
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+#if 0
+/* not currently used, but some crazy RAID module might want to later */
+void enable_kernel_vsx(void)
+{
+ WARN_ON(preemptible());
+
+#ifdef CONFIG_SMP
+ if (current->thread.regs && (current->thread.regs->msr & MSR_VSX))
+ giveup_vsx(current);
+ else
+ giveup_vsx(NULL); /* just enable vsx for kernel - force */
+#else
+ giveup_vsx(last_task_used_vsx);
+#endif /* CONFIG_SMP */
+}
+EXPORT_SYMBOL(enable_kernel_vsx);
+#endif
+
+void flush_vsx_to_thread(struct task_struct *tsk)
+{
+ if (tsk->thread.regs) {
+ preempt_disable();
+ if (tsk->thread.regs->msr & MSR_VSX) {
+#ifdef CONFIG_SMP
+ BUG_ON(tsk != current);
+#endif
+ giveup_vsx(tsk);
+ }
+ preempt_enable();
+ }
+}
+
+/*
+ * This dumps the full 128bits of the first 32 VSX registers. This
+ * needs to be called with dump_task_fp and dump_task_altivec to get
+ * all the VSX state.
+ */
+int dump_task_vsx(struct task_struct *tsk, elf_vrreg_t *vrregs)
+{
+ /* Grab only the first half */
+ const int nregs = 32;
+ elf_vrreg_t *reg;
+
+ if (tsk == current)
+ flush_vsx_to_thread(tsk);
+
+ reg = (elf_vrreg_t *)vrregs;
+
+ /* copy the first 32 vsr registers */
+ memcpy(reg, &tsk->thread.vr[0], nregs * sizeof(*reg));
+
+ return 1;
+}
+#endif /* CONFIG_VSX */
+
+int dump_task_vector(struct task_struct *tsk, elf_vrregset_t *vrregs)
+{
+ int rc = 0;
+ elf_vrreg_t *regs = (elf_vrreg_t *)vrregs;
+#ifdef CONFIG_ALTIVEC
+ rc = dump_task_altivec(tsk, regs);
+ if (rc)
+ return rc;
+ regs += ELF_NVRREG;
+#endif
+
+#ifdef CONFIG_VSX
+ rc = dump_task_vsx(tsk, regs);
+#endif
+ return rc;
+}
+
#ifdef CONFIG_SPE
void enable_kernel_spe(void)
@@ -233,6 +319,10 @@ void discard_lazy_cpu_state(void)
if (last_task_used_altivec == current)
last_task_used_altivec = NULL;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (last_task_used_vsx == current)
+ last_task_used_vsx = NULL;
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
if (last_task_used_spe == current)
last_task_used_spe = NULL;
@@ -297,6 +387,10 @@ struct task_struct *__switch_to(struct t
if (prev->thread.regs && (prev->thread.regs->msr & MSR_VEC))
giveup_altivec(prev);
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (prev->thread.regs && (prev->thread.regs->msr & MSR_VSX))
+ giveup_vsx(prev);
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
/*
* If the previous thread used spe in the last quantum
@@ -317,6 +411,10 @@ struct task_struct *__switch_to(struct t
if (new->thread.regs && last_task_used_altivec == new)
new->thread.regs->msr |= MSR_VEC;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (new->thread.regs && last_task_used_vsx == new)
+ new->thread.regs->msr |= MSR_VSX;
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
/* Avoid the trap. On smp this this never happens since
* we don't set last_task_used_spe
@@ -417,6 +515,8 @@ static struct regbit {
{MSR_EE, "EE"},
{MSR_PR, "PR"},
{MSR_FP, "FP"},
+ {MSR_VEC, "VEC"},
+ {MSR_VSX, "VSX"},
{MSR_ME, "ME"},
{MSR_IR, "IR"},
{MSR_DR, "DR"},
@@ -534,6 +634,7 @@ void prepare_to_copy(struct task_struct
{
flush_fp_to_thread(current);
flush_altivec_to_thread(current);
+ flush_vsx_to_thread(current);
flush_spe_to_thread(current);
}
@@ -689,6 +790,9 @@ void start_thread(struct pt_regs *regs,
#endif
discard_lazy_cpu_state();
+#ifdef CONFIG_VSX
+ current->thread.used_vsr = 0;
+#endif
memset(current->thread.fpr, 0,
sizeof(current->thread.fpr));
current->thread.fpscr.val = 0;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -351,6 +351,51 @@ static int vr_set(struct task_struct *ta
}
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+/*
+ * Currently, to set and get all of the VSX state, you need to call
+ * the FP and VMX calls as well. This only gets/sets the lower 32
+ * 128-bit VSX registers.
+ */
+
+static int vsr_active(struct task_struct *target,
+ const struct user_regset *regset)
+{
+ flush_vsx_to_thread(target);
+ return target->thread.used_vsr ? regset->n : 0;
+}
+
+static int vsr_get(struct task_struct *target, const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ void *kbuf, void __user *ubuf)
+{
+ int ret;
+
+ flush_vsx_to_thread(target);
+
+ ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+ target->thread.fpr, 0,
+ 32 * sizeof(vector128));
+
+ return ret;
+}
+
+static int vsr_set(struct task_struct *target, const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ const void *kbuf, const void __user *ubuf)
+{
+ int ret;
+
+ flush_vsx_to_thread(target);
+
+ ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+ target->thread.fpr, 0,
+ 32 * sizeof(vector128));
+
+ return ret;
+}
+#endif /* CONFIG_VSX */
+
#ifdef CONFIG_SPE
/*
@@ -427,6 +472,9 @@ enum powerpc_regset {
#ifdef CONFIG_ALTIVEC
REGSET_VMX,
#endif
+#ifdef CONFIG_VSX
+ REGSET_VSX,
+#endif
#ifdef CONFIG_SPE
REGSET_SPE,
#endif
@@ -450,6 +498,13 @@ static const struct user_regset native_r
.active = vr_active, .get = vr_get, .set = vr_set
},
#endif
+#ifdef CONFIG_VSX
+ [REGSET_VSX] = {
+ .core_note_type = NT_PPC_VSX, .n = 34,
+ .size = sizeof(vector128), .align = sizeof(vector128),
+ .active = vsr_active, .get = vsr_get, .set = vsr_set
+ },
+#endif
#ifdef CONFIG_SPE
[REGSET_SPE] = {
.n = 35,
@@ -850,6 +905,21 @@ long arch_ptrace(struct task_struct *chi
sizeof(u32)),
(const void __user *) data);
#endif
+#ifdef CONFIG_VSX
+ case PTRACE_GETVSRREGS:
+ return copy_regset_to_user(child, &user_ppc_native_view,
+ REGSET_VSX,
+ 0, (32 * sizeof(vector128) +
+ sizeof(u32)),
+ (void __user *) data);
+
+ case PTRACE_SETVSRREGS:
+ return copy_regset_from_user(child, &user_ppc_native_view,
+ REGSET_VSX,
+ 0, (32 * sizeof(vector128) +
+ sizeof(u32)),
+ (const void __user *) data);
+#endif
#ifdef CONFIG_SPE
case PTRACE_GETEVRREGS:
/* Get the child spe register state. */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -378,6 +378,21 @@ static int save_user_regs(struct pt_regs
memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
return 1;
+ /*
+ * Copy VSR 0-31 upper half from thread_struct to local
+ * buffer, then write that to userspace. Also set MSR_VSX in
+ * the saved MSR value to indicate that frame->mc_vregs
+ * contains valid data
+ */
+ if (current->thread.used_vsr) {
+ flush_vsx_to_thread(current);
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.fpr[i][TS_VSRLOWOFFSET];
+ if (__copy_to_user(&frame->mc_vsregs, buf,
+ ELF_NVSRHALFREG * sizeof(double)))
+ return 1;
+ msr |= MSR_VSX;
+ }
#else
/* save floating-point registers */
if (__copy_to_user(&frame->mc_fregs, current->thread.fpr,
@@ -482,6 +497,24 @@ static long restore_user_regs(struct pt_
for (i = 0; i < 32 ; i++)
current->thread.TS_FPR(i) = buf[i];
memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+ /*
+ * Force the process to reload the VSX registers from
+ * current->thread when it next does VSX instruction.
+ */
+ regs->msr &= ~MSR_VSX;
+ if (msr & MSR_VSX) {
+ /*
+ * Restore altivec registers from the stack to a local
+ * buffer, then write this out to the thread_struct
+ */
+ if (__copy_from_user(buf, &sr->mc_vsregs,
+ sizeof(sr->mc_vsregs)))
+ return 1;
+ for (i = 0; i < 32 ; i++)
+ current->thread.fpr[i][TS_VSRLOWOFFSET] = buf[i];
+ } else if (current->thread.used_vsr)
+ for (i = 0; i < 32 ; i++)
+ current->thread.fpr[i][TS_VSRLOWOFFSET] = 0;
#else
if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
sizeof(sr->mc_fregs)))
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -123,6 +123,22 @@ static long setup_sigcontext(struct sigc
buf[i] = current->thread.TS_FPR(i);
memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+ /*
+ * Copy VSX low doubleword to local buffer for formatting,
+ * then out to userspace. Update v_regs to point after the
+ * VMX data.
+ */
+ if (current->thread.used_vsr) {
+ flush_vsx_to_thread(current);
+ v_regs += ELF_NVRREG;
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.fpr[i][TS_VSRLOWOFFSET];
+ err |= __copy_to_user(v_regs, buf, 32 * sizeof(double));
+ /* set MSR_VSX in the MSR value in the frame to
+ * indicate that sc->vs_reg contains valid data.
+ */
+ msr |= MSR_VSX;
+ }
#else /* CONFIG_VSX */
/* copy fpr regs and fpscr */
err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
@@ -197,7 +213,7 @@ static long restore_sigcontext(struct pt
* This has to be done before copying stuff into current->thread.fpr/vr
* for the reasons explained in the previous comment.
*/
- regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
+ regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC | MSR_VSX);
#ifdef CONFIG_ALTIVEC
err |= __get_user(v_regs, &sc->v_regs);
@@ -226,6 +242,19 @@ static long restore_sigcontext(struct pt
current->thread.TS_FPR(i) = buf[i];
memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+ /*
+ * Get additional VSX data. Update v_regs to point after the
+ * VMX data. Copy VSX low doubleword from userspace to local
+ * buffer for formatting, then into the taskstruct.
+ */
+ v_regs += ELF_NVRREG;
+ if ((msr & MSR_VSX) != 0)
+ err |= __copy_from_user(buf, v_regs, 32 * sizeof(double));
+ else
+ memset(buf, 0, 32 * sizeof(double));
+
+ for (i = 0; i < 32 ; i++)
+ current->thread.fpr[i][TS_VSRLOWOFFSET] = buf[i];
#else
err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
#endif
Index: linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/traps.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
@@ -967,6 +967,20 @@ void altivec_unavailable_exception(struc
die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT);
}
+void vsx_unavailable_exception(struct pt_regs *regs)
+{
+ if (user_mode(regs)) {
+ /* A user program has executed a VSX instruction,
+ but this kernel doesn't support VSX. */
+ _exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+ return;
+ }
+
+ printk(KERN_EMERG "Unrecoverable VSX Unavailable Exception "
+ "%lx at %lx\n", regs->trap, regs->nip);
+ die("Unrecoverable VSX Unavailable Exception", regs, SIGABRT);
+}
+
void performance_monitor_exception(struct pt_regs *regs)
{
perf_irq(regs);
@@ -1091,6 +1105,21 @@ void altivec_assist_exception(struct pt_
}
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+void vsx_assist_exception(struct pt_regs *regs)
+{
+ if (!user_mode(regs)) {
+ printk(KERN_EMERG "VSX assist exception in kernel mode"
+ " at %lx\n", regs->nip);
+ die("Kernel VSX assist exception", regs, SIGILL);
+ }
+
+ flush_vsx_to_thread(current);
+ printk(KERN_INFO "VSX assist not supported at %lx\n", regs->nip);
+ _exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+}
+#endif /* CONFIG_VSX */
+
#ifdef CONFIG_FSL_BOOKE
void CacheLockingException(struct pt_regs *regs, unsigned long address,
unsigned long error_code)
Index: linux-2.6-ozlabs/include/asm-powerpc/elf.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/elf.h
+++ linux-2.6-ozlabs/include/asm-powerpc/elf.h
@@ -109,6 +109,7 @@ typedef elf_gregset_t32 compat_elf_gregs
#ifdef __powerpc64__
# define ELF_NVRREG32 33 /* includes vscr & vrsave stuffed together */
# define ELF_NVRREG 34 /* includes vscr & vrsave in split vectors */
+# define ELF_NVSRHALFREG 32 /* Half the vsx registers */
# define ELF_GREG_TYPE elf_greg_t64
#else
# define ELF_NEVRREG 34 /* includes acc (as 2) */
@@ -158,6 +159,7 @@ typedef __vector128 elf_vrreg_t;
typedef elf_vrreg_t elf_vrregset_t[ELF_NVRREG];
#ifdef __powerpc64__
typedef elf_vrreg_t elf_vrregset_t32[ELF_NVRREG32];
+typedef elf_fpreg_t elf_vsrreghalf_t32[ELF_NVSRHALFREG];
#endif
#ifdef __KERNEL__
@@ -219,8 +221,8 @@ extern int dump_task_fpu(struct task_str
typedef elf_vrregset_t elf_fpxregset_t;
#ifdef CONFIG_ALTIVEC
-extern int dump_task_altivec(struct task_struct *, elf_vrregset_t *vrregs);
-#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_altivec(tsk, regs)
+extern int dump_task_vector(struct task_struct *, elf_vrregset_t *vrregs);
+#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_vector(tsk, regs)
#define ELF_CORE_XFPREG_TYPE NT_PPC_VMX
#endif
Index: linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ptrace.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
@@ -223,6 +223,14 @@ extern void user_disable_single_step(str
#define PT_VRSAVE_32 (PT_VR0 + 33*4)
#endif
+/*
+ * Only store the first 32 VSRs here; the second 32 VSRs live in VR0-31.
+ */
+#define PT_VSR0 150 /* each VSR reg occupies 2 slots in 64-bit */
+#define PT_VSR31 (PT_VSR0 + 2*31)
+#ifdef __KERNEL__
+#define PT_VSR0_32 300 /* each VSR reg occupies 4 slots in 32-bit */
+#endif
#endif /* __powerpc64__ */
/*
@@ -245,6 +253,10 @@ extern void user_disable_single_step(str
#define PTRACE_GETEVRREGS 20
#define PTRACE_SETEVRREGS 21
+/* Get the first 32 128bit VSX registers */
+#define PTRACE_GETVSRREGS 27
+#define PTRACE_SETVSRREGS 28
+
/*
* Get or set a debug register. The first 16 are DABR registers and the
* second 16 are IABR registers.
Index: linux-2.6-ozlabs/include/asm-powerpc/reg.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/reg.h
+++ linux-2.6-ozlabs/include/asm-powerpc/reg.h
@@ -30,6 +30,7 @@
#define MSR_ISF_LG 61 /* Interrupt 64b mode valid on 630 */
#define MSR_HV_LG 60 /* Hypervisor state */
#define MSR_VEC_LG 25 /* Enable AltiVec */
+#define MSR_VSX_LG 23 /* Enable VSX */
#define MSR_POW_LG 18 /* Enable Power Management */
#define MSR_WE_LG 18 /* Wait State Enable */
#define MSR_TGPR_LG 17 /* TLB Update registers in use */
@@ -71,6 +72,7 @@
#endif
#define MSR_VEC __MASK(MSR_VEC_LG) /* Enable AltiVec */
+#define MSR_VSX __MASK(MSR_VSX_LG) /* Enable VSX */
#define MSR_POW __MASK(MSR_POW_LG) /* Enable Power Management */
#define MSR_WE __MASK(MSR_WE_LG) /* Wait State Enable */
#define MSR_TGPR __MASK(MSR_TGPR_LG) /* TLB Update registers in use */
Index: linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/sigcontext.h
+++ linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
@@ -43,9 +43,44 @@ struct sigcontext {
* it must be copied via a vector register to/from storage) or as a word.
* The entry with index 33 contains the vrsave as the first word (offset 0)
* within the quadword.
+ *
+ * Part of the VSX data is stored here also by extending vmx_reserve
+ * by an additional 32 double words. Architecturally the layout of
+ * the VSR registers and how they overlap on top of the legacy FPR and
+ * VR registers is shown below:
+ *
+ * VSR doubleword 0 VSR doubleword 1
+ * ----------------------------------------------------------------
+ * VSR[0] | FPR[0] | |
+ * ----------------------------------------------------------------
+ * VSR[1] | FPR[1] | |
+ * ----------------------------------------------------------------
+ * | ... | |
+ * | ... | |
+ * ----------------------------------------------------------------
+ * VSR[30] | FPR[30] | |
+ * ----------------------------------------------------------------
+ * VSR[31] | FPR[31] | |
+ * ----------------------------------------------------------------
+ * VSR[32] | VR[0] |
+ * ----------------------------------------------------------------
+ * VSR[33] | VR[1] |
+ * ----------------------------------------------------------------
+ * | ... |
+ * | ... |
+ * ----------------------------------------------------------------
+ * VSR[62] | VR[30] |
+ * ----------------------------------------------------------------
+ * VSR[63] | VR[31] |
+ * ----------------------------------------------------------------
+ *
+ * FPR/VSR 0-31 doubleword 0 is stored in fp_regs, and VMX/VSR 32-63
+ * is stored at the start of vmx_reserve. vmx_reserve is extended for
+ * backwards compatibility to store VSR 0-31 doubleword 1 after the VMX
+ * registers and vscr/vrsave.
*/
elf_vrreg_t __user *v_regs;
- long vmx_reserve[ELF_NVRREG+ELF_NVRREG+1];
+ long vmx_reserve[ELF_NVRREG+ELF_NVRREG+32+1];
#endif
};
Index: linux-2.6-ozlabs/include/asm-powerpc/system.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/system.h
+++ linux-2.6-ozlabs/include/asm-powerpc/system.h
@@ -132,6 +132,7 @@ extern void enable_kernel_altivec(void);
extern void giveup_altivec(struct task_struct *);
extern void load_up_altivec(struct task_struct *);
extern int emulate_altivec(struct pt_regs *);
+extern void giveup_vsx(struct task_struct *);
extern void enable_kernel_spe(void);
extern void giveup_spe(struct task_struct *);
extern void load_up_spe(struct task_struct *);
@@ -155,6 +156,14 @@ static inline void flush_altivec_to_thre
}
#endif
+#ifdef CONFIG_VSX
+extern void flush_vsx_to_thread(struct task_struct *);
+#else
+static inline void flush_vsx_to_thread(struct task_struct *t)
+{
+}
+#endif
+
#ifdef CONFIG_SPE
extern void flush_spe_to_thread(struct task_struct *);
#else
Index: linux-2.6-ozlabs/include/linux/elf.h
===================================================================
--- linux-2.6-ozlabs.orig/include/linux/elf.h
+++ linux-2.6-ozlabs/include/linux/elf.h
@@ -358,6 +358,7 @@ typedef struct elf64_shdr {
#define NT_PRXFPREG 0x46e62b7f /* copied from gdb5.1/include/elf/common.h */
#define NT_PPC_VMX 0x100 /* PowerPC Altivec/VMX registers */
#define NT_PPC_SPE 0x101 /* PowerPC SPE/EVR registers */
+#define NT_PPC_VSX 0x102 /* PowerPC VSX registers */
#define NT_386_TLS 0x200 /* i386 TLS slots (struct user_desc) */
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 9/9] powerpc: Add CONFIG_VSX config option
2008-06-23 5:31 ` Michael Neuling
` (7 preceding siblings ...)
2008-06-23 5:31 ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
@ 2008-06-23 5:31 ` Michael Neuling
2008-06-23 7:38 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23 5:31 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Add the CONFIG_VSX build option. It depends on POWER4, PPC_FPU and ALTIVEC.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/platforms/Kconfig.cputype | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
Index: linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/platforms/Kconfig.cputype
+++ linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
@@ -171,6 +171,22 @@ config VSX
If in doubt, say Y here.
+config VSX
+ bool "VSX Support"
+ depends on POWER4 && ALTIVEC && PPC_FPU
+ ---help---
+
+ This option enables kernel support for the Vector Scalar extension (VSX)
+ to the PowerPC processor. The kernel currently supports saving and
+ restoring VSX registers, and turning on the 'VSX enable' bit so user
+ processes can execute VSX instructions.
+
+ This option is only useful if you have a processor that supports
+ VSX (POWER7 and above), but it has no effect on non-VSX
+ CPUs (it does, however, add code to the kernel).
+
+ If in doubt, say Y here.
+
config SPE
bool "SPE Support"
depends on E200 || E500
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 6/9] powerpc: Add VSX CPU feature
2008-06-23 5:31 ` Michael Neuling
` (4 preceding siblings ...)
2008-06-23 5:31 ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
@ 2008-06-23 5:31 ` Michael Neuling
2008-06-23 5:31 ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
` (3 subsequent siblings)
9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23 5:31 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Add a VSX CPU feature. Also add code to detect if VSX is available
from the device tree.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
---
arch/powerpc/kernel/prom.c | 4 ++++
include/asm-powerpc/cputable.h | 15 ++++++++++++++-
2 files changed, 18 insertions(+), 1 deletion(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/prom.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
@@ -609,6 +609,10 @@ static struct feature_property {
{"altivec", 0, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
{"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ /* Yes, this _really_ is ibm,vmx == 2 to enable VSX */
+ {"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
+#endif /* CONFIG_VSX */
#ifdef CONFIG_PPC64
{"ibm,dfp", 1, 0, PPC_FEATURE_HAS_DFP},
{"ibm,purr", 1, CPU_FTR_PURR, 0},
Index: linux-2.6-ozlabs/include/asm-powerpc/cputable.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/cputable.h
+++ linux-2.6-ozlabs/include/asm-powerpc/cputable.h
@@ -27,6 +27,7 @@
#define PPC_FEATURE_HAS_DFP 0x00000400
#define PPC_FEATURE_POWER6_EXT 0x00000200
#define PPC_FEATURE_ARCH_2_06 0x00000100
+#define PPC_FEATURE_HAS_VSX 0x00000080
#define PPC_FEATURE_TRUE_LE 0x00000002
#define PPC_FEATURE_PPC_LE 0x00000001
@@ -181,6 +182,7 @@ extern void do_feature_fixups(unsigned l
#define CPU_FTR_DSCR LONG_ASM_CONST(0x0002000000000000)
#define CPU_FTR_1T_SEGMENT LONG_ASM_CONST(0x0004000000000000)
#define CPU_FTR_NO_SLBIE_B LONG_ASM_CONST(0x0008000000000000)
+#define CPU_FTR_VSX LONG_ASM_CONST(0x0010000000000000)
#ifndef __ASSEMBLY__
@@ -199,6 +201,17 @@ extern void do_feature_fixups(unsigned l
#define PPC_FEATURE_HAS_ALTIVEC_COMP 0
#endif
+/* We only set the VSX features if the kernel was compiled with VSX
+ * support
+ */
+#ifdef CONFIG_VSX
+#define CPU_FTR_VSX_COMP CPU_FTR_VSX
+#define PPC_FEATURE_HAS_VSX_COMP PPC_FEATURE_HAS_VSX
+#else
+#define CPU_FTR_VSX_COMP 0
+#define PPC_FEATURE_HAS_VSX_COMP 0
+#endif
+
/* We only set the spe features if the kernel was compiled with spe
* support
*/
@@ -399,7 +412,7 @@ extern void do_feature_fixups(unsigned l
(CPU_FTRS_POWER3 | CPU_FTRS_RS64 | CPU_FTRS_POWER4 | \
CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | CPU_FTRS_POWER6 | \
CPU_FTRS_POWER7 | CPU_FTRS_CELL | CPU_FTRS_PA6T | \
- CPU_FTR_1T_SEGMENT)
+ CPU_FTR_1T_SEGMENT | CPU_FTR_VSX)
#else
enum {
CPU_FTRS_POSSIBLE =
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
2008-06-23 7:38 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
2008-06-23 7:38 ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
@ 2008-06-23 7:38 ` Michael Neuling
2008-06-23 14:46 ` Kumar Gala
2008-06-23 7:38 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
` (7 subsequent siblings)
9 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-23 7:38 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
If we set the SPE MSR bit in save_user_regs we can blow away the VEC
bit. This will never happen in reality (VMX and SPE will never be in
the same processor as their opcodes overlap), but it looks bad. Also
when we add VSX here in a later patch, we can hit two of these at the
same time.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/signal_32.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -336,6 +336,8 @@ struct rt_sigframe {
static int save_user_regs(struct pt_regs *regs, struct mcontext __user *frame,
int sigret)
{
+ unsigned long msr = regs->msr;
+
/* Make sure floating point registers are stored in regs */
flush_fp_to_thread(current);
@@ -354,8 +356,7 @@ static int save_user_regs(struct pt_regs
return 1;
/* set MSR_VEC in the saved MSR value to indicate that
frame->mc_vregs contains valid data */
- if (__put_user(regs->msr | MSR_VEC, &frame->mc_gregs[PT_MSR]))
- return 1;
+ msr |= MSR_VEC;
}
/* else assert((regs->msr & MSR_VEC) == 0) */
@@ -377,8 +378,7 @@ static int save_user_regs(struct pt_regs
return 1;
/* set MSR_SPE in the saved MSR value to indicate that
frame->mc_vregs contains valid data */
- if (__put_user(regs->msr | MSR_SPE, &frame->mc_gregs[PT_MSR]))
- return 1;
+ msr |= MSR_SPE;
}
/* else assert((regs->msr & MSR_SPE) == 0) */
@@ -387,6 +387,8 @@ static int save_user_regs(struct pt_regs
return 1;
#endif /* CONFIG_SPE */
+ if (__put_user(msr, &frame->mc_gregs[PT_MSR]))
+ return 1;
if (sigret) {
/* Set up the sigreturn trampoline: li r0,sigret; sc */
if (__put_user(0x38000000UL + sigret, &frame->tramp[0])
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX.
2008-06-23 5:31 ` Michael Neuling
` (8 preceding siblings ...)
2008-06-23 5:31 ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
@ 2008-06-23 7:38 ` Michael Neuling
2008-06-23 7:38 ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
` (9 more replies)
9 siblings, 10 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23 7:38 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
The following set of patches adds Vector Scalar Extensions (VSX)
support for POWER7. Includes context switch, ptrace and signals support.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
Paulus: please consider for your 2.6.27 tree.
Updates in this post:
- Fixed ptrace 32 error noticed by paulus.
- Fixed calling of load_up_altivec in head_64.S, also noticed by paulus.
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
2008-06-23 7:38 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
@ 2008-06-23 7:38 ` Michael Neuling
2008-06-23 7:38 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
` (8 subsequent siblings)
9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23 7:38 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
We are going to change where the floating point registers are stored
in the thread_struct, so in preparation add some macros to access the
floating point registers. Update all code to use these new macros.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/align.c | 6 ++--
arch/powerpc/kernel/process.c | 5 ++-
arch/powerpc/kernel/ptrace.c | 14 +++++----
arch/powerpc/kernel/ptrace32.c | 14 +++++++--
arch/powerpc/kernel/softemu8xx.c | 4 +-
arch/powerpc/math-emu/math.c | 56 +++++++++++++++++++--------------------
include/asm-powerpc/ppc_asm.h | 5 ++-
include/asm-powerpc/processor.h | 3 ++
8 files changed, 61 insertions(+), 46 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/align.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/align.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/align.c
@@ -366,7 +366,7 @@ static int emulate_multiple(struct pt_re
static int emulate_fp_pair(struct pt_regs *regs, unsigned char __user *addr,
unsigned int reg, unsigned int flags)
{
- char *ptr = (char *) &current->thread.fpr[reg];
+ char *ptr = (char *) &current->thread.TS_FPR(reg);
int i, ret;
if (!(flags & F))
@@ -784,7 +784,7 @@ int fix_alignment(struct pt_regs *regs)
return -EFAULT;
}
} else if (flags & F) {
- data.dd = current->thread.fpr[reg];
+ data.dd = current->thread.TS_FPR(reg);
if (flags & S) {
/* Single-precision FP store requires conversion... */
#ifdef CONFIG_PPC_FPU
@@ -862,7 +862,7 @@ int fix_alignment(struct pt_regs *regs)
if (unlikely(ret))
return -EFAULT;
} else if (flags & F)
- current->thread.fpr[reg] = data.dd;
+ current->thread.TS_FPR(reg) = data.dd;
else
regs->gpr[reg] = data.ll;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -110,7 +110,7 @@ int dump_task_fpu(struct task_struct *ts
return 0;
flush_fp_to_thread(current);
- memcpy(fpregs, &tsk->thread.fpr[0], sizeof(*fpregs));
+ memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
return 1;
}
@@ -689,7 +689,8 @@ void start_thread(struct pt_regs *regs,
#endif
discard_lazy_cpu_state();
- memset(current->thread.fpr, 0, sizeof(current->thread.fpr));
+ memset(current->thread.fpr, 0,
+ sizeof(current->thread.fpr));
current->thread.fpscr.val = 0;
#ifdef CONFIG_ALTIVEC
memset(current->thread.vr, 0, sizeof(current->thread.vr));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -218,10 +218,10 @@ static int fpr_get(struct task_struct *t
flush_fp_to_thread(target);
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
- offsetof(struct thread_struct, fpr[32]));
+ offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
- &target->thread.fpr, 0, -1);
+ target->thread.fpr, 0, -1);
}
static int fpr_set(struct task_struct *target, const struct user_regset *regset,
@@ -231,10 +231,10 @@ static int fpr_set(struct task_struct *t
flush_fp_to_thread(target);
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
- offsetof(struct thread_struct, fpr[32]));
+ offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
- &target->thread.fpr, 0, -1);
+ target->thread.fpr, 0, -1);
}
@@ -728,7 +728,8 @@ long arch_ptrace(struct task_struct *chi
tmp = ptrace_get_reg(child, (int) index);
} else {
flush_fp_to_thread(child);
- tmp = ((unsigned long *)child->thread.fpr)[index - PT_FPR0];
+ tmp = ((unsigned long *)child->thread.fpr)
+ [TS_FPRSPACING * (index - PT_FPR0)];
}
ret = put_user(tmp,(unsigned long __user *) data);
break;
@@ -755,7 +756,8 @@ long arch_ptrace(struct task_struct *chi
ret = ptrace_put_reg(child, index, data);
} else {
flush_fp_to_thread(child);
- ((unsigned long *)child->thread.fpr)[index - PT_FPR0] = data;
+ ((unsigned long *)child->thread.fpr)
+ [TS_FPRSPACING * (index - PT_FPR0)] = data;
ret = 0;
}
break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
@@ -64,6 +64,11 @@ static long compat_ptrace_old(struct tas
return -EPERM;
}
+/* Macros to workout the correct index for the FPR in the thread struct */
+#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
+#define FPRHALF(i) (((i) - PT_FPR0) % 2)
+#define FPRINDEX(i) (TS_FPRSPACING * FPRNUMBER(i) + FPRHALF(i))
+
long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
compat_ulong_t caddr, compat_ulong_t cdata)
{
@@ -122,7 +127,8 @@ long compat_arch_ptrace(struct task_stru
* to be an array of unsigned int (32 bits) - the
* index passed in is based on this assumption.
*/
- tmp = ((unsigned int *)child->thread.fpr)[index - PT_FPR0];
+ tmp = ((unsigned int *)child->thread.fpr)
+ [FPRINDEX(index)];
}
ret = put_user((unsigned int)tmp, (u32 __user *)data);
break;
@@ -162,7 +168,8 @@ long compat_arch_ptrace(struct task_stru
CHECK_FULL_REGS(child->thread.regs);
if (numReg >= PT_FPR0) {
flush_fp_to_thread(child);
- tmp = ((unsigned long int *)child->thread.fpr)[numReg - PT_FPR0];
+ tmp = ((unsigned long int *)child->thread.fpr)
+ [FPRINDEX(numReg)];
} else { /* register within PT_REGS struct */
tmp = ptrace_get_reg(child, numReg);
}
@@ -217,7 +224,8 @@ long compat_arch_ptrace(struct task_stru
* to be an array of unsigned int (32 bits) - the
* index passed in is based on this assumption.
*/
- ((unsigned int *)child->thread.fpr)[index - PT_FPR0] = data;
+ ((unsigned int *)child->thread.fpr)
+ [FPRINDEX(index)] = data;
ret = 0;
}
break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/softemu8xx.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
@@ -124,7 +124,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
disp = instword & 0xffff;
ea = (u32 *)(regs->gpr[idxreg] + disp);
- ip = (u32 *)&current->thread.fpr[flreg];
+ ip = (u32 *)&current->thread.TS_FPR(flreg);
switch ( inst )
{
@@ -168,7 +168,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
break;
case FMR:
/* assume this is a fp move -- Cort */
- memcpy(ip, &current->thread.fpr[(instword>>11)&0x1f],
+ memcpy(ip, &current->thread.TS_FPR((instword>>11)&0x1f),
sizeof(double));
break;
default:
Index: linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/math-emu/math.c
+++ linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
@@ -230,14 +230,14 @@ do_mathemu(struct pt_regs *regs)
case LFD:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
lfd(op0, op1, op2, op3);
break;
case LFDU:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
lfd(op0, op1, op2, op3);
regs->gpr[idx] = (unsigned long)op1;
@@ -245,21 +245,21 @@ do_mathemu(struct pt_regs *regs)
case STFD:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
stfd(op0, op1, op2, op3);
break;
case STFDU:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
stfd(op0, op1, op2, op3);
regs->gpr[idx] = (unsigned long)op1;
break;
case OP63:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
fmr(op0, op1, op2, op3);
break;
default:
@@ -356,28 +356,28 @@ do_mathemu(struct pt_regs *regs)
switch (type) {
case AB:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
case AC:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op2 = (void *)&current->thread.fpr[(insn >> 6) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op2 = (void *)&current->thread.TS_FPR((insn >> 6) & 0x1f);
break;
case ABC:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
- op3 = (void *)&current->thread.fpr[(insn >> 6) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
+ op3 = (void *)&current->thread.TS_FPR((insn >> 6) & 0x1f);
break;
case D:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
break;
@@ -387,27 +387,27 @@ do_mathemu(struct pt_regs *regs)
goto illegal;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)(regs->gpr[idx] + sdisp);
break;
case X:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
break;
case XA:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
break;
case XB:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
case XE:
idx = (insn >> 16) & 0x1f;
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
if (!idx) {
if (((insn >> 1) & 0x3ff) == STFIWX)
op1 = (void *)(regs->gpr[(insn >> 11) & 0x1f]);
@@ -421,7 +421,7 @@ do_mathemu(struct pt_regs *regs)
case XEU:
idx = (insn >> 16) & 0x1f;
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0)
+ regs->gpr[(insn >> 11) & 0x1f]);
break;
@@ -429,8 +429,8 @@ do_mathemu(struct pt_regs *regs)
case XCR:
op0 = (void *)&regs->ccr;
op1 = (void *)((insn >> 23) & 0x7);
- op2 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op3 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op2 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op3 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
case XCRL:
@@ -450,7 +450,7 @@ do_mathemu(struct pt_regs *regs)
case XFLB:
op0 = (void *)((insn >> 17) & 0xff);
- op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
default:
Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -6,6 +6,7 @@
#include <linux/stringify.h>
#include <asm/asm-compat.h>
+#include <asm/processor.h>
#ifndef __ASSEMBLY__
#error __FILE__ should only be used in assembler files
@@ -83,13 +84,13 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
#define REST_8GPRS(n, base) REST_4GPRS(n, base); REST_4GPRS(n+4, base)
#define REST_10GPRS(n, base) REST_8GPRS(n, base); REST_2GPRS(n+8, base)
-#define SAVE_FPR(n, base) stfd n,THREAD_FPR0+8*(n)(base)
+#define SAVE_FPR(n, base) stfd n,THREAD_FPR0+8*TS_FPRSPACING*(n)(base)
#define SAVE_2FPRS(n, base) SAVE_FPR(n, base); SAVE_FPR(n+1, base)
#define SAVE_4FPRS(n, base) SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
#define SAVE_8FPRS(n, base) SAVE_4FPRS(n, base); SAVE_4FPRS(n+4, base)
#define SAVE_16FPRS(n, base) SAVE_8FPRS(n, base); SAVE_8FPRS(n+8, base)
#define SAVE_32FPRS(n, base) SAVE_16FPRS(n, base); SAVE_16FPRS(n+16, base)
-#define REST_FPR(n, base) lfd n,THREAD_FPR0+8*(n)(base)
+#define REST_FPR(n, base) lfd n,THREAD_FPR0+8*TS_FPRSPACING*(n)(base)
#define REST_2FPRS(n, base) REST_FPR(n, base); REST_FPR(n+1, base)
#define REST_4FPRS(n, base) REST_2FPRS(n, base); REST_2FPRS(n+2, base)
#define REST_8FPRS(n, base) REST_4FPRS(n, base); REST_4FPRS(n+4, base)
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -136,6 +136,8 @@ typedef struct {
unsigned long seg;
} mm_segment_t;
+#define TS_FPR(i) fpr[i]
+
struct thread_struct {
unsigned long ksp; /* Kernel stack pointer */
unsigned long ksp_limit; /* if ksp <= ksp_limit stack overflow */
@@ -289,4 +291,5 @@ static inline void prefetchw(const void
#endif /* __KERNEL__ */
#endif /* __ASSEMBLY__ */
+#define TS_FPRSPACING 1
#endif /* _ASM_POWERPC_PROCESSOR_H */
* [PATCH 3/9] powerpc: Move altivec_unavailable
2008-06-23 7:38 ` Michael Neuling
From: Michael Neuling @ 2008-06-23 7:38 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Move the altivec_unavailable code to make room at 0xf40, where the
vsx_unavailable exception will live.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/head_64.S | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -275,7 +275,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
. = 0xf00
b performance_monitor_pSeries
- STD_EXCEPTION_PSERIES(0xf20, altivec_unavailable)
+ . = 0xf20
+ b altivec_unavailable_pSeries
#ifdef CONFIG_CBE_RAS
HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
@@ -295,6 +296,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
/* moved from 0xf00 */
STD_EXCEPTION_PSERIES(., performance_monitor)
+ STD_EXCEPTION_PSERIES(., altivec_unavailable)
/*
* An interrupt came in while soft-disabled; clear EE in SRR1,
* [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
2008-06-23 7:38 ` Michael Neuling
From: Michael Neuling @ 2008-06-23 7:38 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
The layout of the new VSR registers and how they overlap on top of the
legacy FPR and VR registers is:
VSR doubleword 0 VSR doubleword 1
----------------------------------------------------------------
VSR[0] | FPR[0] | |
----------------------------------------------------------------
VSR[1] | FPR[1] | |
----------------------------------------------------------------
| ... | |
| ... | |
----------------------------------------------------------------
VSR[30] | FPR[30] | |
----------------------------------------------------------------
VSR[31] | FPR[31] | |
----------------------------------------------------------------
VSR[32] | VR[0] |
----------------------------------------------------------------
VSR[33] | VR[1] |
----------------------------------------------------------------
| ... |
| ... |
----------------------------------------------------------------
VSR[62] | VR[30] |
----------------------------------------------------------------
VSR[63] | VR[31] |
----------------------------------------------------------------
VSX has 64 128-bit registers. The first 32 registers overlap with the FP
registers and hence extend them with an additional 64 bits. The
second 32 registers overlap with the VMX registers.
This patch introduces the thread_struct changes required to reflect
this register layout. Ptrace and signals code is updated so that the
floating point registers are correctly accessed from the thread_struct
when CONFIG_VSX is enabled.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/asm-offsets.c | 4 ++
arch/powerpc/kernel/ptrace.c | 28 ++++++++++++++++++
arch/powerpc/kernel/signal_32.c | 59 ++++++++++++++++++++++++++++----------
arch/powerpc/kernel/signal_64.c | 32 ++++++++++++++++++--
include/asm-powerpc/processor.h | 21 ++++++++++++-
5 files changed, 126 insertions(+), 18 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
@@ -74,6 +74,10 @@ int main(void)
DEFINE(THREAD_VSCR, offsetof(struct thread_struct, vscr));
DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ DEFINE(THREAD_VSR0, offsetof(struct thread_struct, fpr));
+ DEFINE(THREAD_USED_VSR, offsetof(struct thread_struct, used_vsr));
+#endif /* CONFIG_VSX */
#ifdef CONFIG_PPC64
DEFINE(KSP_VSID, offsetof(struct thread_struct, ksp_vsid));
#else /* CONFIG_PPC64 */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -215,26 +215,54 @@ static int fpr_get(struct task_struct *t
unsigned int pos, unsigned int count,
void *kbuf, void __user *ubuf)
{
+#ifdef CONFIG_VSX
+ double buf[33];
+ int i;
+#endif
flush_fp_to_thread(target);
+#ifdef CONFIG_VSX
+ /* copy to local buffer then write that out */
+ for (i = 0; i < 32 ; i++)
+ buf[i] = target->thread.TS_FPR(i);
+ memcpy(&buf[32], &target->thread.fpscr, sizeof(double));
+ return user_regset_copyout(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+
+#else
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
target->thread.fpr, 0, -1);
+#endif
}
static int fpr_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
+#ifdef CONFIG_VSX
+ double buf[33];
+ int i;
+#endif
flush_fp_to_thread(target);
+#ifdef CONFIG_VSX
+ /* copy to local buffer then write that out */
+ i = user_regset_copyin(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+ if (i)
+ return i;
+ for (i = 0; i < 32 ; i++)
+ target->thread.TS_FPR(i) = buf[i];
+ memcpy(&target->thread.fpscr, &buf[32], sizeof(double));
+ return 0;
+#else
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
target->thread.fpr, 0, -1);
+#endif
}
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -337,14 +337,16 @@ static int save_user_regs(struct pt_regs
int sigret)
{
unsigned long msr = regs->msr;
+#ifdef CONFIG_VSX
+ double buf[32];
+ int i;
+#endif
/* Make sure floating point registers are stored in regs */
flush_fp_to_thread(current);
- /* save general and floating-point registers */
- if (save_general_regs(regs, frame) ||
- __copy_to_user(&frame->mc_fregs, current->thread.fpr,
- ELF_NFPREG * sizeof(double)))
+ /* save general registers */
+ if (save_general_regs(regs, frame))
return 1;
#ifdef CONFIG_ALTIVEC
@@ -368,7 +370,20 @@ static int save_user_regs(struct pt_regs
if (__put_user(current->thread.vrsave, (u32 __user *)&frame->mc_vregs[32]))
return 1;
#endif /* CONFIG_ALTIVEC */
-
+#ifdef CONFIG_VSX
+ /* copy the FPRs to a local buffer then write that out */
+ flush_fp_to_thread(current);
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.TS_FPR(i);
+ memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+ if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
+ return 1;
+#else
+ /* save floating-point registers */
+ if (__copy_to_user(&frame->mc_fregs, current->thread.fpr,
+ ELF_NFPREG * sizeof(double)))
+ return 1;
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
/* save spe registers */
if (current->thread.used_spe) {
@@ -411,6 +426,10 @@ static long restore_user_regs(struct pt_
long err;
unsigned int save_r2 = 0;
unsigned long msr;
+#ifdef CONFIG_VSX
+ double buf[32];
+ int i;
+#endif
/*
* restore general registers but not including MSR or SOFTE. Also
@@ -438,16 +457,11 @@ static long restore_user_regs(struct pt_
*/
discard_lazy_cpu_state();
- /* force the process to reload the FP registers from
- current->thread when it next does FP instructions */
- regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
- if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
- sizeof(sr->mc_fregs)))
- return 1;
-
#ifdef CONFIG_ALTIVEC
- /* force the process to reload the altivec registers from
- current->thread when it next does altivec instructions */
+ /*
+ * Force the process to reload the altivec registers from
+ * current->thread when it next does altivec instructions
+ */
regs->msr &= ~MSR_VEC;
if (msr & MSR_VEC) {
/* restore altivec registers from the stack */
@@ -462,6 +476,23 @@ static long restore_user_regs(struct pt_
return 1;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (__copy_from_user(buf, &sr->mc_fregs, sizeof(sr->mc_fregs)))
+ return 1;
+ for (i = 0; i < 32 ; i++)
+ current->thread.TS_FPR(i) = buf[i];
+ memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+#else
+ if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
+ sizeof(sr->mc_fregs)))
+ return 1;
+#endif /* CONFIG_VSX */
+ /*
+ * force the process to reload the FP registers from
+ * current->thread when it next does FP instructions
+ */
+ regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
+
#ifdef CONFIG_SPE
/* force the process to reload the spe registers from
current->thread when it next does spe instructions */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -89,6 +89,10 @@ static long setup_sigcontext(struct sigc
#endif
unsigned long msr = regs->msr;
long err = 0;
+#ifdef CONFIG_VSX
+ double buf[FP_REGS_SIZE];
+ int i;
+#endif
flush_fp_to_thread(current);
@@ -112,11 +116,21 @@ static long setup_sigcontext(struct sigc
#else /* CONFIG_ALTIVEC */
err |= __put_user(0, &sc->v_regs);
#endif /* CONFIG_ALTIVEC */
+ flush_fp_to_thread(current);
+#ifdef CONFIG_VSX
+ /* Copy FP to local buffer then write that out */
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.TS_FPR(i);
+ memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+ err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+#else /* CONFIG_VSX */
+ /* copy fpr regs and fpscr */
+ err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
+#endif /* CONFIG_VSX */
err |= __put_user(&sc->gp_regs, &sc->regs);
WARN_ON(!FULL_REGS(regs));
err |= __copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE);
err |= __put_user(msr, &sc->gp_regs[PT_MSR]);
- err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
err |= __put_user(signr, &sc->signal);
err |= __put_user(handler, &sc->handler);
if (set != NULL)
@@ -135,6 +149,10 @@ static long restore_sigcontext(struct pt
#ifdef CONFIG_ALTIVEC
elf_vrreg_t __user *v_regs;
#endif
+#ifdef CONFIG_VSX
+ double buf[FP_REGS_SIZE];
+ int i;
+#endif
unsigned long err = 0;
unsigned long save_r13 = 0;
elf_greg_t *gregs = (elf_greg_t *)regs;
@@ -182,8 +199,6 @@ static long restore_sigcontext(struct pt
*/
regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
- err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
-
#ifdef CONFIG_ALTIVEC
err |= __get_user(v_regs, &sc->v_regs);
if (err)
@@ -202,7 +217,18 @@ static long restore_sigcontext(struct pt
else
current->thread.vrsave = 0;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ /* restore floating point */
+ err |= __copy_from_user(buf, &sc->fp_regs, FP_REGS_SIZE);
+ if (err)
+ return err;
+ for (i = 0; i < 32 ; i++)
+ current->thread.TS_FPR(i) = buf[i];
+ memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+#else
+ err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
+#endif
return err;
}
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -78,6 +78,7 @@ extern long kernel_thread(int (*fn)(void
/* Lazy FPU handling on uni-processor */
extern struct task_struct *last_task_used_math;
extern struct task_struct *last_task_used_altivec;
+extern struct task_struct *last_task_used_vsx;
extern struct task_struct *last_task_used_spe;
#ifdef CONFIG_PPC32
@@ -136,7 +137,13 @@ typedef struct {
unsigned long seg;
} mm_segment_t;
+#define TS_FPROFFSET 0
+#define TS_VSRLOWOFFSET 1
+#ifdef CONFIG_VSX
+#define TS_FPR(i) fpr[i][TS_FPROFFSET]
+#else
#define TS_FPR(i) fpr[i]
+#endif
struct thread_struct {
unsigned long ksp; /* Kernel stack pointer */
@@ -154,8 +161,12 @@ struct thread_struct {
unsigned long dbcr0; /* debug control register values */
unsigned long dbcr1;
#endif
+#ifdef CONFIG_VSX
+ double fpr[32][2]; /* Complete floating point set */
+#else
double fpr[32]; /* Complete floating point set */
- struct { /* fpr ... fpscr must be contiguous */
+#endif
+ struct {
unsigned int pad;
unsigned int val; /* Floating point status */
@@ -175,6 +186,10 @@ struct thread_struct {
unsigned long vrsave;
int used_vr; /* set if process has used altivec */
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ /* VSR status */
+ int used_vsr; /* set if process has used VSX */
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
unsigned long evr[32]; /* upper 32-bits of SPE regs */
u64 acc; /* Accumulator */
@@ -291,5 +306,9 @@ static inline void prefetchw(const void
#endif /* __KERNEL__ */
#endif /* __ASSEMBLY__ */
+#ifdef CONFIG_VSX
+#define TS_FPRSPACING 2
+#else
#define TS_FPRSPACING 1
+#endif
#endif /* _ASM_POWERPC_PROCESSOR_H */
* [PATCH 6/9] powerpc: Add VSX CPU feature
2008-06-23 7:38 ` Michael Neuling
From: Michael Neuling @ 2008-06-23 7:38 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Add a VSX CPU feature. Also add code to detect if VSX is available
from the device tree.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
---
arch/powerpc/kernel/prom.c | 4 ++++
include/asm-powerpc/cputable.h | 15 ++++++++++++++-
2 files changed, 18 insertions(+), 1 deletion(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/prom.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
@@ -609,6 +609,10 @@ static struct feature_property {
{"altivec", 0, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
{"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ /* Yes, this _really_ is ibm,vmx == 2 to enable VSX */
+ {"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
+#endif /* CONFIG_VSX */
#ifdef CONFIG_PPC64
{"ibm,dfp", 1, 0, PPC_FEATURE_HAS_DFP},
{"ibm,purr", 1, CPU_FTR_PURR, 0},
Index: linux-2.6-ozlabs/include/asm-powerpc/cputable.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/cputable.h
+++ linux-2.6-ozlabs/include/asm-powerpc/cputable.h
@@ -27,6 +27,7 @@
#define PPC_FEATURE_HAS_DFP 0x00000400
#define PPC_FEATURE_POWER6_EXT 0x00000200
#define PPC_FEATURE_ARCH_2_06 0x00000100
+#define PPC_FEATURE_HAS_VSX 0x00000080
#define PPC_FEATURE_TRUE_LE 0x00000002
#define PPC_FEATURE_PPC_LE 0x00000001
@@ -181,6 +182,7 @@ extern void do_feature_fixups(unsigned l
#define CPU_FTR_DSCR LONG_ASM_CONST(0x0002000000000000)
#define CPU_FTR_1T_SEGMENT LONG_ASM_CONST(0x0004000000000000)
#define CPU_FTR_NO_SLBIE_B LONG_ASM_CONST(0x0008000000000000)
+#define CPU_FTR_VSX LONG_ASM_CONST(0x0010000000000000)
#ifndef __ASSEMBLY__
@@ -199,6 +201,17 @@ extern void do_feature_fixups(unsigned l
#define PPC_FEATURE_HAS_ALTIVEC_COMP 0
#endif
+/* We only set the VSX features if the kernel was compiled with VSX
+ * support
+ */
+#ifdef CONFIG_VSX
+#define CPU_FTR_VSX_COMP CPU_FTR_VSX
+#define PPC_FEATURE_HAS_VSX_COMP PPC_FEATURE_HAS_VSX
+#else
+#define CPU_FTR_VSX_COMP 0
+#define PPC_FEATURE_HAS_VSX_COMP 0
+#endif
+
/* We only set the spe features if the kernel was compiled with spe
* support
*/
@@ -399,7 +412,7 @@ extern void do_feature_fixups(unsigned l
(CPU_FTRS_POWER3 | CPU_FTRS_RS64 | CPU_FTRS_POWER4 | \
CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | CPU_FTRS_POWER6 | \
CPU_FTRS_POWER7 | CPU_FTRS_CELL | CPU_FTRS_PA6T | \
- CPU_FTR_1T_SEGMENT)
+ CPU_FTR_1T_SEGMENT | CPU_FTR_VSX)
#else
enum {
CPU_FTRS_POSSIBLE =
* [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support
2008-06-23 7:38 ` Michael Neuling
From: Michael Neuling @ 2008-06-23 7:38 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
This patch extends the floating point save and restore code to use the
VSX load/stores when VSX is available. This will make FP context
save/restore marginally slower on FP-only code when VSX is available,
as it has to load/store 128 bits rather than just 64 bits.
Mixing FP, VMX and VSX code will get consistent architected state.
The signals interface is extended to enable access to VSR 0-31
doubleword 1 after discussions with tool chain maintainers. Backward
compatibility is maintained.
The ptrace interface is also extended to allow access to VSR 0-31 full
registers.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/entry_64.S | 5 +
arch/powerpc/kernel/fpu.S | 16 ++++-
arch/powerpc/kernel/head_64.S | 65 +++++++++++++++++++++++
arch/powerpc/kernel/misc_64.S | 33 ++++++++++++
arch/powerpc/kernel/ppc32.h | 1
arch/powerpc/kernel/ppc_ksyms.c | 3 +
arch/powerpc/kernel/process.c | 106 ++++++++++++++++++++++++++++++++++++++-
arch/powerpc/kernel/ptrace.c | 70 +++++++++++++++++++++++++
arch/powerpc/kernel/signal_32.c | 33 ++++++++++++
arch/powerpc/kernel/signal_64.c | 31 +++++++++++
arch/powerpc/kernel/traps.c | 29 ++++++++++
include/asm-powerpc/elf.h | 6 +-
include/asm-powerpc/ptrace.h | 12 ++++
include/asm-powerpc/reg.h | 2
include/asm-powerpc/sigcontext.h | 37 +++++++++++++
include/asm-powerpc/system.h | 9 +++
include/linux/elf.h | 1
17 files changed, 451 insertions(+), 8 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/entry_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
@@ -353,6 +353,11 @@ _GLOBAL(_switch)
mflr r20 /* Return to switch caller */
mfmsr r22
li r0, MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ oris r0,r0,MSR_VSX@h /* Disable VSX */
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif /* CONFIG_VSX */
#ifdef CONFIG_ALTIVEC
BEGIN_FTR_SECTION
oris r0,r0,MSR_VEC@h /* Disable altivec */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -34,6 +34,11 @@
_GLOBAL(load_up_fpu)
mfmsr r5
ori r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ oris r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
SYNC
MTMSRD(r5) /* enable use of fpu now */
isync
@@ -50,7 +55,7 @@ _GLOBAL(load_up_fpu)
beq 1f
toreal(r4)
addi r4,r4,THREAD /* want last_task_used_math->thread */
- SAVE_32FPRS(0, r4)
+ SAVE_32FPVSRS(0, r5, r4)
mffs fr0
stfd fr0,THREAD_FPSCR(r4)
PPC_LL r5,PT_REGS(r4)
@@ -77,7 +82,7 @@ _GLOBAL(load_up_fpu)
#endif
lfd fr0,THREAD_FPSCR(r5)
MTFSF_L(fr0)
- REST_32FPRS(0, r5)
+ REST_32FPVSRS(0, r4, r5)
#ifndef CONFIG_SMP
subi r4,r5,THREAD
fromreal(r4)
@@ -96,6 +101,11 @@ _GLOBAL(load_up_fpu)
_GLOBAL(giveup_fpu)
mfmsr r5
ori r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ oris r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
SYNC_601
ISYNC_601
MTMSRD(r5) /* enable use of fpu now */
@@ -106,7 +116,7 @@ _GLOBAL(giveup_fpu)
addi r3,r3,THREAD /* want THREAD of task */
PPC_LL r5,PT_REGS(r3)
PPC_LCMPI 0,r5,0
- SAVE_32FPRS(0, r3)
+ SAVE_32FPVSRS(0, r4, r3)
mffs fr0
stfd fr0,THREAD_FPSCR(r3)
beq 1f
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -278,6 +278,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
. = 0xf20
b altivec_unavailable_pSeries
+ . = 0xf40
+ b vsx_unavailable_pSeries
+
#ifdef CONFIG_CBE_RAS
HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
#endif /* CONFIG_CBE_RAS */
@@ -297,6 +300,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
/* moved from 0xf00 */
STD_EXCEPTION_PSERIES(., performance_monitor)
STD_EXCEPTION_PSERIES(., altivec_unavailable)
+ STD_EXCEPTION_PSERIES(., vsx_unavailable)
/*
* An interrupt came in while soft-disabled; clear EE in SRR1,
@@ -836,6 +840,67 @@ _STATIC(load_up_altivec)
blr
#endif /* CONFIG_ALTIVEC */
+ .align 7
+ .globl vsx_unavailable_common
+vsx_unavailable_common:
+ EXCEPTION_PROLOG_COMMON(0xf40, PACA_EXGEN)
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ bne .load_up_vsx
+1:
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
+ bl .save_nvgprs
+ addi r3,r1,STACK_FRAME_OVERHEAD
+ ENABLE_INTS
+ bl .vsx_unavailable_exception
+ b .ret_from_except
+
+#ifdef CONFIG_VSX
+/*
+ * load_up_vsx(unused, unused, tsk)
+ * Disable VSX for the task which had it previously,
+ * and save its vector registers in its thread_struct.
+ * Reuse the fp and vsx saves, but first check to see if they have
+ * been saved already.
+ * On entry: r13 == 'current' && last_task_used_vsx != 'current'
+ */
+_STATIC(load_up_vsx)
+/* Load FP and VSX registers if they haven't been done yet */
+ andi. r5,r12,MSR_FP
+ beql+ load_up_fpu /* skip if already loaded */
+ andis. r5,r12,MSR_VEC@h
+ beql+ load_up_altivec /* skip if already loaded */
+
+#ifndef CONFIG_SMP
+ ld r3,last_task_used_vsx@got(r2)
+ ld r4,0(r3)
+ cmpdi 0,r4,0
+ beq 1f
+ /* Disable VSX for last_task_used_vsx */
+ addi r4,r4,THREAD
+ ld r5,PT_REGS(r4)
+ ld r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+ lis r6,MSR_VSX@h
+ andc r6,r4,r6
+ std r6,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#endif /* CONFIG_SMP */
+ ld r4,PACACURRENT(r13)
+ addi r4,r4,THREAD /* Get THREAD */
+ li r6,1
+ stw r6,THREAD_USED_VSR(r4) /* ... also set thread used vsr */
+ /* enable use of VSX after return */
+ oris r12,r12,MSR_VSX@h
+ std r12,_MSR(r1)
+#ifndef CONFIG_SMP
+ /* Update last_task_used_math to 'current' */
+ ld r4,PACACURRENT(r13)
+ std r4,0(r3)
+#endif /* CONFIG_SMP */
+ b fast_exception_return
+#endif /* CONFIG_VSX */
+
/*
* Hash table stuff
*/
Index: linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/misc_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
@@ -506,6 +506,39 @@ _GLOBAL(giveup_altivec)
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+/*
+ * giveup_vsx(tsk)
+ * Disable VSX for the task given as the argument,
+ * and save the vector registers in its thread_struct.
+ * Enables the VSX for use in the kernel on return.
+ */
+_GLOBAL(giveup_vsx)
+ mfmsr r5
+ oris r5,r5,MSR_VSX@h
+ mtmsrd r5 /* enable use of VSX now */
+ isync
+
+ cmpdi 0,r3,0
+ beqlr- /* if no previous owner, done */
+ addi r3,r3,THREAD /* want THREAD of task */
+ ld r5,PT_REGS(r3)
+ cmpdi 0,r5,0
+ beq 1f
+ ld r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+ lis r3,MSR_VSX@h
+ andc r4,r4,r3 /* disable VSX for previous task */
+ std r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#ifndef CONFIG_SMP
+ li r5,0
+ ld r4,last_task_used_vsx@got(r2)
+ std r5,0(r4)
+#endif /* CONFIG_SMP */
+ blr
+
+#endif /* CONFIG_VSX */
+
/* kexec_wait(phys_cpu)
*
* wait for the flag to change, indicating this kernel is going away but
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc32.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
@@ -120,6 +120,7 @@ struct mcontext32 {
elf_fpregset_t mc_fregs;
unsigned int mc_pad[2];
elf_vrregset_t32 mc_vregs __attribute__((__aligned__(16)));
+ elf_vsrreghalf_t32 mc_vsregs __attribute__((__aligned__(16)));
};
struct ucontext32 {
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc_ksyms.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
@@ -102,6 +102,9 @@ EXPORT_SYMBOL(giveup_fpu);
#ifdef CONFIG_ALTIVEC
EXPORT_SYMBOL(giveup_altivec);
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+EXPORT_SYMBOL(giveup_vsx);
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
EXPORT_SYMBOL(giveup_spe);
#endif /* CONFIG_SPE */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -53,6 +53,7 @@ extern unsigned long _get_SP(void);
#ifndef CONFIG_SMP
struct task_struct *last_task_used_math = NULL;
struct task_struct *last_task_used_altivec = NULL;
+struct task_struct *last_task_used_vsx = NULL;
struct task_struct *last_task_used_spe = NULL;
#endif
@@ -106,11 +107,23 @@ EXPORT_SYMBOL(enable_kernel_fp);
int dump_task_fpu(struct task_struct *tsk, elf_fpregset_t *fpregs)
{
+#ifdef CONFIG_VSX
+ int i;
+ elf_fpreg_t *reg;
+#endif
+
if (!tsk->thread.regs)
return 0;
flush_fp_to_thread(current);
+#ifdef CONFIG_VSX
+ reg = (elf_fpreg_t *)fpregs;
+ for (i = 0; i < ELF_NFPREG - 1; i++, reg++)
+ *reg = tsk->thread.TS_FPR(i);
+ memcpy(reg, &tsk->thread.fpscr, sizeof(elf_fpreg_t));
+#else
memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
+#endif
return 1;
}
@@ -149,7 +162,7 @@ void flush_altivec_to_thread(struct task
}
}
-int dump_task_altivec(struct task_struct *tsk, elf_vrregset_t *vrregs)
+int dump_task_altivec(struct task_struct *tsk, elf_vrreg_t *vrregs)
{
/* ELF_NVRREG includes the VSCR and VRSAVE which we need to save
* separately, see below */
@@ -179,6 +192,79 @@ int dump_task_altivec(struct task_struct
}
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+#if 0
+/* not currently used, but some crazy RAID module might want to later */
+void enable_kernel_vsx(void)
+{
+ WARN_ON(preemptible());
+
+#ifdef CONFIG_SMP
+ if (current->thread.regs && (current->thread.regs->msr & MSR_VSX))
+ giveup_vsx(current);
+ else
+ giveup_vsx(NULL); /* just enable vsx for kernel - force */
+#else
+ giveup_vsx(last_task_used_vsx);
+#endif /* CONFIG_SMP */
+}
+EXPORT_SYMBOL(enable_kernel_vsx);
+#endif
+
+void flush_vsx_to_thread(struct task_struct *tsk)
+{
+ if (tsk->thread.regs) {
+ preempt_disable();
+ if (tsk->thread.regs->msr & MSR_VSX) {
+#ifdef CONFIG_SMP
+ BUG_ON(tsk != current);
+#endif
+ giveup_vsx(tsk);
+ }
+ preempt_enable();
+ }
+}
+
+/*
+ * This dumps the full 128 bits of the first 32 VSX registers.  This
+ * needs to be called along with dump_task_fpu and dump_task_altivec
+ * to get all the VSX state.
+ */
+int dump_task_vsx(struct task_struct *tsk, elf_vrreg_t *vrregs)
+{
+ /* Grab only the first half */
+ const int nregs = 32;
+ elf_vrreg_t *reg;
+
+ if (tsk == current)
+ flush_vsx_to_thread(tsk);
+
+ reg = (elf_vrreg_t *)vrregs;
+
+ /* copy the first 32 vsr registers */
+ memcpy(reg, &tsk->thread.fpr[0], nregs * sizeof(*reg));
+
+ return 1;
+}
+#endif /* CONFIG_VSX */
+
+int dump_task_vector(struct task_struct *tsk, elf_vrregset_t *vrregs)
+{
+ int rc = 0;
+ elf_vrreg_t *regs = (elf_vrreg_t *)vrregs;
+#ifdef CONFIG_ALTIVEC
+ rc = dump_task_altivec(tsk, regs);
+ if (rc)
+ return rc;
+ regs += ELF_NVRREG;
+#endif
+
+#ifdef CONFIG_VSX
+ rc = dump_task_vsx(tsk, regs);
+#endif
+ return rc;
+}
+
#ifdef CONFIG_SPE
void enable_kernel_spe(void)
@@ -233,6 +319,10 @@ void discard_lazy_cpu_state(void)
if (last_task_used_altivec == current)
last_task_used_altivec = NULL;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (last_task_used_vsx == current)
+ last_task_used_vsx = NULL;
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
if (last_task_used_spe == current)
last_task_used_spe = NULL;
@@ -297,6 +387,10 @@ struct task_struct *__switch_to(struct t
if (prev->thread.regs && (prev->thread.regs->msr & MSR_VEC))
giveup_altivec(prev);
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (prev->thread.regs && (prev->thread.regs->msr & MSR_VSX))
+ giveup_vsx(prev);
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
/*
* If the previous thread used spe in the last quantum
@@ -317,6 +411,10 @@ struct task_struct *__switch_to(struct t
if (new->thread.regs && last_task_used_altivec == new)
new->thread.regs->msr |= MSR_VEC;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (new->thread.regs && last_task_used_vsx == new)
+ new->thread.regs->msr |= MSR_VSX;
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
/* Avoid the trap. On SMP this never happens since
* we don't set last_task_used_spe
@@ -417,6 +515,8 @@ static struct regbit {
{MSR_EE, "EE"},
{MSR_PR, "PR"},
{MSR_FP, "FP"},
+ {MSR_VEC, "VEC"},
+ {MSR_VSX, "VSX"},
{MSR_ME, "ME"},
{MSR_IR, "IR"},
{MSR_DR, "DR"},
@@ -534,6 +634,7 @@ void prepare_to_copy(struct task_struct
{
flush_fp_to_thread(current);
flush_altivec_to_thread(current);
+ flush_vsx_to_thread(current);
flush_spe_to_thread(current);
}
@@ -689,6 +790,9 @@ void start_thread(struct pt_regs *regs,
#endif
discard_lazy_cpu_state();
+#ifdef CONFIG_VSX
+ current->thread.used_vsr = 0;
+#endif
memset(current->thread.fpr, 0,
sizeof(current->thread.fpr));
current->thread.fpscr.val = 0;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -351,6 +351,51 @@ static int vr_set(struct task_struct *ta
}
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+/*
+ * Currently, to set and get all the VSX state, you need to call the
+ * FP and VMX calls as well.  This only gets/sets the lower 32
+ * 128-bit VSX registers.
+ */
+
+static int vsr_active(struct task_struct *target,
+ const struct user_regset *regset)
+{
+ flush_vsx_to_thread(target);
+ return target->thread.used_vsr ? regset->n : 0;
+}
+
+static int vsr_get(struct task_struct *target, const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ void *kbuf, void __user *ubuf)
+{
+ int ret;
+
+ flush_vsx_to_thread(target);
+
+ ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+ target->thread.fpr, 0,
+ 32 * sizeof(vector128));
+
+ return ret;
+}
+
+static int vsr_set(struct task_struct *target, const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ const void *kbuf, const void __user *ubuf)
+{
+ int ret;
+
+ flush_vsx_to_thread(target);
+
+ ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+ target->thread.fpr, 0,
+ 32 * sizeof(vector128));
+
+ return ret;
+}
+#endif /* CONFIG_VSX */
+
#ifdef CONFIG_SPE
/*
@@ -427,6 +472,9 @@ enum powerpc_regset {
#ifdef CONFIG_ALTIVEC
REGSET_VMX,
#endif
+#ifdef CONFIG_VSX
+ REGSET_VSX,
+#endif
#ifdef CONFIG_SPE
REGSET_SPE,
#endif
@@ -450,6 +498,13 @@ static const struct user_regset native_r
.active = vr_active, .get = vr_get, .set = vr_set
},
#endif
+#ifdef CONFIG_VSX
+ [REGSET_VSX] = {
+ .core_note_type = NT_PPC_VSX, .n = 34,
+ .size = sizeof(vector128), .align = sizeof(vector128),
+ .active = vsr_active, .get = vsr_get, .set = vsr_set
+ },
+#endif
#ifdef CONFIG_SPE
[REGSET_SPE] = {
.n = 35,
@@ -850,6 +905,21 @@ long arch_ptrace(struct task_struct *chi
sizeof(u32)),
(const void __user *) data);
#endif
+#ifdef CONFIG_VSX
+ case PTRACE_GETVSRREGS:
+ return copy_regset_to_user(child, &user_ppc_native_view,
+ REGSET_VSX,
+ 0, (32 * sizeof(vector128) +
+ sizeof(u32)),
+ (void __user *) data);
+
+ case PTRACE_SETVSRREGS:
+ return copy_regset_from_user(child, &user_ppc_native_view,
+ REGSET_VSX,
+ 0, (32 * sizeof(vector128) +
+ sizeof(u32)),
+ (const void __user *) data);
+#endif
#ifdef CONFIG_SPE
case PTRACE_GETEVRREGS:
/* Get the child spe register state. */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -378,6 +378,21 @@ static int save_user_regs(struct pt_regs
memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
return 1;
+ /*
+ * Copy VSR 0-31 upper half from thread_struct to local
+ * buffer, then write that to userspace. Also set MSR_VSX in
+ * the saved MSR value to indicate that frame->mc_vregs
+ * contains valid data
+ */
+ if (current->thread.used_vsr) {
+ flush_vsx_to_thread(current);
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.fpr[i][TS_VSRLOWOFFSET];
+ if (__copy_to_user(&frame->mc_vsregs, buf,
+ ELF_NVSRHALFREG * sizeof(double)))
+ return 1;
+ msr |= MSR_VSX;
+ }
#else
/* save floating-point registers */
if (__copy_to_user(&frame->mc_fregs, current->thread.fpr,
@@ -482,6 +497,24 @@ static long restore_user_regs(struct pt_
for (i = 0; i < 32 ; i++)
current->thread.TS_FPR(i) = buf[i];
memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+ /*
+ * Force the process to reload the VSX registers from
+ * current->thread when it next does VSX instruction.
+ */
+ regs->msr &= ~MSR_VSX;
+ if (msr & MSR_VSX) {
+ /*
+ * Restore altivec registers from the stack to a local
+ * buffer, then write this out to the thread_struct
+ */
+ if (__copy_from_user(buf, &sr->mc_vsregs,
+ sizeof(sr->mc_vsregs)))
+ return 1;
+ for (i = 0; i < 32 ; i++)
+ current->thread.fpr[i][TS_VSRLOWOFFSET] = buf[i];
+ } else if (current->thread.used_vsr)
+ for (i = 0; i < 32 ; i++)
+ current->thread.fpr[i][TS_VSRLOWOFFSET] = 0;
#else
if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
sizeof(sr->mc_fregs)))
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -123,6 +123,22 @@ static long setup_sigcontext(struct sigc
buf[i] = current->thread.TS_FPR(i);
memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+ /*
+ * Copy VSX low doubleword to local buffer for formatting,
+ * then out to userspace. Update v_regs to point after the
+ * VMX data.
+ */
+ if (current->thread.used_vsr) {
+ flush_vsx_to_thread(current);
+ v_regs += ELF_NVRREG;
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.fpr[i][TS_VSRLOWOFFSET];
+ err |= __copy_to_user(v_regs, buf, 32 * sizeof(double));
+ /* set MSR_VSX in the MSR value in the frame to
+ * indicate that sc->v_regs contains valid data.
+ */
+ msr |= MSR_VSX;
+ }
#else /* CONFIG_VSX */
/* copy fpr regs and fpscr */
err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
@@ -197,7 +213,7 @@ static long restore_sigcontext(struct pt
* This has to be done before copying stuff into current->thread.fpr/vr
* for the reasons explained in the previous comment.
*/
- regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
+ regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC | MSR_VSX);
#ifdef CONFIG_ALTIVEC
err |= __get_user(v_regs, &sc->v_regs);
@@ -226,6 +242,19 @@ static long restore_sigcontext(struct pt
current->thread.TS_FPR(i) = buf[i];
memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+ /*
+ * Get additional VSX data. Update v_regs to point after the
+ * VMX data. Copy VSX low doubleword from userspace to local
+ * buffer for formatting, then into the taskstruct.
+ */
+ v_regs += ELF_NVRREG;
+ if ((msr & MSR_VSX) != 0)
+ err |= __copy_from_user(buf, v_regs, 32 * sizeof(double));
+ else
+ memset(buf, 0, 32 * sizeof(double));
+
+ for (i = 0; i < 32 ; i++)
+ current->thread.fpr[i][TS_VSRLOWOFFSET] = buf[i];
#else
err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
#endif
Index: linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/traps.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
@@ -967,6 +967,20 @@ void altivec_unavailable_exception(struc
die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT);
}
+void vsx_unavailable_exception(struct pt_regs *regs)
+{
+ if (user_mode(regs)) {
+ /* A user program has executed a VSX instruction,
+ but this kernel doesn't support VSX. */
+ _exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+ return;
+ }
+
+ printk(KERN_EMERG "Unrecoverable VSX Unavailable Exception "
+ "%lx at %lx\n", regs->trap, regs->nip);
+ die("Unrecoverable VSX Unavailable Exception", regs, SIGABRT);
+}
+
void performance_monitor_exception(struct pt_regs *regs)
{
perf_irq(regs);
@@ -1091,6 +1105,21 @@ void altivec_assist_exception(struct pt_
}
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+void vsx_assist_exception(struct pt_regs *regs)
+{
+ if (!user_mode(regs)) {
+ printk(KERN_EMERG "VSX assist exception in kernel mode"
+ " at %lx\n", regs->nip);
+ die("Kernel VSX assist exception", regs, SIGILL);
+ }
+
+ flush_vsx_to_thread(current);
+ printk(KERN_INFO "VSX assist not supported at %lx\n", regs->nip);
+ _exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+}
+#endif /* CONFIG_VSX */
+
#ifdef CONFIG_FSL_BOOKE
void CacheLockingException(struct pt_regs *regs, unsigned long address,
unsigned long error_code)
Index: linux-2.6-ozlabs/include/asm-powerpc/elf.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/elf.h
+++ linux-2.6-ozlabs/include/asm-powerpc/elf.h
@@ -109,6 +109,7 @@ typedef elf_gregset_t32 compat_elf_gregs
#ifdef __powerpc64__
# define ELF_NVRREG32 33 /* includes vscr & vrsave stuffed together */
# define ELF_NVRREG 34 /* includes vscr & vrsave in split vectors */
+# define ELF_NVSRHALFREG 32 /* Half the vsx registers */
# define ELF_GREG_TYPE elf_greg_t64
#else
# define ELF_NEVRREG 34 /* includes acc (as 2) */
@@ -158,6 +159,7 @@ typedef __vector128 elf_vrreg_t;
typedef elf_vrreg_t elf_vrregset_t[ELF_NVRREG];
#ifdef __powerpc64__
typedef elf_vrreg_t elf_vrregset_t32[ELF_NVRREG32];
+typedef elf_fpreg_t elf_vsrreghalf_t32[ELF_NVSRHALFREG];
#endif
#ifdef __KERNEL__
@@ -219,8 +221,8 @@ extern int dump_task_fpu(struct task_str
typedef elf_vrregset_t elf_fpxregset_t;
#ifdef CONFIG_ALTIVEC
-extern int dump_task_altivec(struct task_struct *, elf_vrregset_t *vrregs);
-#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_altivec(tsk, regs)
+extern int dump_task_vector(struct task_struct *, elf_vrregset_t *vrregs);
+#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_vector(tsk, regs)
#define ELF_CORE_XFPREG_TYPE NT_PPC_VMX
#endif
Index: linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ptrace.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
@@ -223,6 +223,14 @@ extern void user_disable_single_step(str
#define PT_VRSAVE_32 (PT_VR0 + 33*4)
#endif
+/*
+ * Only store the first 32 VSRs here; the second 32 VSRs are in VR0-31.
+ */
+#define PT_VSR0 150 /* each VSR reg occupies 2 slots in 64-bit */
+#define PT_VSR31 (PT_VSR0 + 2*31)
+#ifdef __KERNEL__
+#define PT_VSR0_32 300 /* each VSR reg occupies 4 slots in 32-bit */
+#endif
#endif /* __powerpc64__ */
/*
@@ -245,6 +253,10 @@ extern void user_disable_single_step(str
#define PTRACE_GETEVRREGS 20
#define PTRACE_SETEVRREGS 21
+/* Get the first 32 128bit VSX registers */
+#define PTRACE_GETVSRREGS 27
+#define PTRACE_SETVSRREGS 28
+
/*
* Get or set a debug register. The first 16 are DABR registers and the
* second 16 are IABR registers.
Index: linux-2.6-ozlabs/include/asm-powerpc/reg.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/reg.h
+++ linux-2.6-ozlabs/include/asm-powerpc/reg.h
@@ -30,6 +30,7 @@
#define MSR_ISF_LG 61 /* Interrupt 64b mode valid on 630 */
#define MSR_HV_LG 60 /* Hypervisor state */
#define MSR_VEC_LG 25 /* Enable AltiVec */
+#define MSR_VSX_LG 23 /* Enable VSX */
#define MSR_POW_LG 18 /* Enable Power Management */
#define MSR_WE_LG 18 /* Wait State Enable */
#define MSR_TGPR_LG 17 /* TLB Update registers in use */
@@ -71,6 +72,7 @@
#endif
#define MSR_VEC __MASK(MSR_VEC_LG) /* Enable AltiVec */
+#define MSR_VSX __MASK(MSR_VSX_LG) /* Enable VSX */
#define MSR_POW __MASK(MSR_POW_LG) /* Enable Power Management */
#define MSR_WE __MASK(MSR_WE_LG) /* Wait State Enable */
#define MSR_TGPR __MASK(MSR_TGPR_LG) /* TLB Update registers in use */
Index: linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/sigcontext.h
+++ linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
@@ -43,9 +43,44 @@ struct sigcontext {
* it must be copied via a vector register to/from storage) or as a word.
* The entry with index 33 contains the vrsave as the first word (offset 0)
* within the quadword.
+ *
+ * Part of the VSX data is stored here also by extending vmx_reserve
+ * by an additional 32 double words. Architecturally the layout of
+ * the VSR registers and how they overlap on top of the legacy FPR and
+ * VR registers is shown below:
+ *
+ * VSR doubleword 0 VSR doubleword 1
+ * ----------------------------------------------------------------
+ * VSR[0] | FPR[0] | |
+ * ----------------------------------------------------------------
+ * VSR[1] | FPR[1] | |
+ * ----------------------------------------------------------------
+ * | ... | |
+ * | ... | |
+ * ----------------------------------------------------------------
+ * VSR[30] | FPR[30] | |
+ * ----------------------------------------------------------------
+ * VSR[31] | FPR[31] | |
+ * ----------------------------------------------------------------
+ * VSR[32] | VR[0] |
+ * ----------------------------------------------------------------
+ * VSR[33] | VR[1] |
+ * ----------------------------------------------------------------
+ * | ... |
+ * | ... |
+ * ----------------------------------------------------------------
+ * VSR[62] | VR[30] |
+ * ----------------------------------------------------------------
+ * VSR[63] | VR[31] |
+ * ----------------------------------------------------------------
+ *
+ * FPR/VSR 0-31 doubleword 0 is stored in fp_regs, and VMX/VSR 32-63
+ * is stored at the start of vmx_reserve. vmx_reserve is extended for
+ * backwards compatibility to store VSR 0-31 doubleword 1 after the VMX
+ * registers and vscr/vrsave.
*/
elf_vrreg_t __user *v_regs;
- long vmx_reserve[ELF_NVRREG+ELF_NVRREG+1];
+ long vmx_reserve[ELF_NVRREG+ELF_NVRREG+32+1];
#endif
};
Index: linux-2.6-ozlabs/include/asm-powerpc/system.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/system.h
+++ linux-2.6-ozlabs/include/asm-powerpc/system.h
@@ -132,6 +132,7 @@ extern void enable_kernel_altivec(void);
extern void giveup_altivec(struct task_struct *);
extern void load_up_altivec(struct task_struct *);
extern int emulate_altivec(struct pt_regs *);
+extern void giveup_vsx(struct task_struct *);
extern void enable_kernel_spe(void);
extern void giveup_spe(struct task_struct *);
extern void load_up_spe(struct task_struct *);
@@ -155,6 +156,14 @@ static inline void flush_altivec_to_thre
}
#endif
+#ifdef CONFIG_VSX
+extern void flush_vsx_to_thread(struct task_struct *);
+#else
+static inline void flush_vsx_to_thread(struct task_struct *t)
+{
+}
+#endif
+
#ifdef CONFIG_SPE
extern void flush_spe_to_thread(struct task_struct *);
#else
Index: linux-2.6-ozlabs/include/linux/elf.h
===================================================================
--- linux-2.6-ozlabs.orig/include/linux/elf.h
+++ linux-2.6-ozlabs/include/linux/elf.h
@@ -358,6 +358,7 @@ typedef struct elf64_shdr {
#define NT_PRXFPREG 0x46e62b7f /* copied from gdb5.1/include/elf/common.h */
#define NT_PPC_VMX 0x100 /* PowerPC Altivec/VMX registers */
#define NT_PPC_SPE 0x101 /* PowerPC SPE/EVR registers */
+#define NT_PPC_VSX 0x102 /* PowerPC VSX registers */
#define NT_386_TLS 0x200 /* i386 TLS slots (struct user_desc) */
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 9/9] powerpc: Add CONFIG_VSX config option
2008-06-23 7:38 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (2 preceding siblings ...)
2008-06-23 7:38 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
@ 2008-06-23 7:38 ` Michael Neuling
2008-06-23 7:38 ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
` (5 subsequent siblings)
9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23 7:38 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Add the CONFIG_VSX build option. It requires POWER4, PPC_FPU and ALTIVEC.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/platforms/Kconfig.cputype | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
Index: linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/platforms/Kconfig.cputype
+++ linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
@@ -155,6 +155,22 @@ config ALTIVEC
If in doubt, say Y here.
+config VSX
+ bool "VSX Support"
+ depends on POWER4 && ALTIVEC && PPC_FPU
+ ---help---
+
+ This option enables kernel support for the Vector Scalar Extensions
+ to the PowerPC processor. The kernel currently supports saving and
+ restoring VSX registers, and turning on the 'VSX enable' bit so user
+ processes can execute VSX instructions.
+
+ This option is only useful if you have a processor that supports
+ VSX (POWER7 and above), but it has no effect on non-VSX
+ CPUs (it does, however, add code to the kernel).
+
+ If in doubt, say Y here.
+
config SPE
bool "SPE Support"
depends on E200 || E500
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable
2008-06-23 7:38 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (3 preceding siblings ...)
2008-06-23 7:38 ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
@ 2008-06-23 7:38 ` Michael Neuling
2008-06-23 7:38 ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
` (4 subsequent siblings)
9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23 7:38 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Make load_up_fpu and load_up_altivec callable so they can be reused by
the VSX code.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/fpu.S | 2 +-
arch/powerpc/kernel/head_32.S | 6 ++++--
arch/powerpc/kernel/head_64.S | 10 +++++++---
arch/powerpc/kernel/head_booke.h | 6 ++++--
4 files changed, 16 insertions(+), 8 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -85,7 +85,7 @@ _GLOBAL(load_up_fpu)
#endif /* CONFIG_SMP */
/* restore registers and return */
/* we haven't used ctr or xer or lr */
- b fast_exception_return
+ blr
/*
* giveup_fpu(tsk)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_32.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
@@ -421,8 +421,10 @@ BEGIN_FTR_SECTION
b ProgramCheck
END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE)
EXCEPTION_PROLOG
- bne load_up_fpu /* if from user, just load it up */
- addi r3,r1,STACK_FRAME_OVERHEAD
+ beq 1f
+ bl load_up_fpu /* if from user, just load it up */
+ b fast_exception_return
+1: addi r3,r1,STACK_FRAME_OVERHEAD
EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
/* Decrementer */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -741,7 +741,8 @@ fp_unavailable_common:
ENABLE_INTS
bl .kernel_fp_unavailable_exception
BUG_OPCODE
-1: b .load_up_fpu
+1: bl .load_up_fpu
+ b fast_exception_return
.align 7
.globl altivec_unavailable_common
@@ -749,7 +750,10 @@ altivec_unavailable_common:
EXCEPTION_PROLOG_COMMON(0xf20, PACA_EXGEN)
#ifdef CONFIG_ALTIVEC
BEGIN_FTR_SECTION
- bne .load_up_altivec /* if from user, just load it up */
+ beq 1f
+ bl .load_up_altivec
+ b fast_exception_return
+1:
END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
#endif
bl .save_nvgprs
@@ -829,7 +833,7 @@ _STATIC(load_up_altivec)
std r4,0(r3)
#endif /* CONFIG_SMP */
/* restore registers and return */
- b fast_exception_return
+ blr
#endif /* CONFIG_ALTIVEC */
/*
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_booke.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
@@ -363,8 +363,10 @@ label:
#define FP_UNAVAILABLE_EXCEPTION \
START_EXCEPTION(FloatingPointUnavailable) \
NORMAL_EXCEPTION_PROLOG; \
- bne load_up_fpu; /* if from user, just load it up */ \
- addi r3,r1,STACK_FRAME_OVERHEAD; \
+ beq 1f; \
+ bl load_up_fpu; /* if from user, just load it up */ \
+ b fast_exception_return; \
+1: addi r3,r1,STACK_FRAME_OVERHEAD; \
EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
#endif /* __HEAD_BOOKE_H__ */
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 7/9] powerpc: Add VSX assembler code macros
2008-06-23 7:38 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (5 preceding siblings ...)
2008-06-23 7:38 ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
@ 2008-06-23 7:38 ` Michael Neuling
2008-06-23 7:38 ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
` (2 subsequent siblings)
9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-23 7:38 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
This adds macros for the VSX load/store instructions, as most
binutils versions are not going to support them for a while.
Also adds VSX register save/restore macros and vsr[0-63] register definitions.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
include/asm-powerpc/ppc_asm.h | 127 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 127 insertions(+)
Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -74,6 +74,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
REST_10GPRS(22, base)
#endif
+/*
+ * Define what the VSX XX1 form instructions will look like, then add
+ * the 128 bit load store instructions based on that.
+ */
+#define VSX_XX1(xs, ra, rb) (((xs) & 0x1f) << 21 | ((ra) << 16) | \
+ ((rb) << 11) | (((xs) >> 5)))
+
+#define STXVD2X(xs, ra, rb) .long (0x7c000798 | VSX_XX1((xs), (ra), (rb)))
+#define LXVD2X(xs, ra, rb) .long (0x7c000698 | VSX_XX1((xs), (ra), (rb)))
#define SAVE_2GPRS(n, base) SAVE_GPR(n, base); SAVE_GPR(n+1, base)
#define SAVE_4GPRS(n, base) SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
@@ -110,6 +119,57 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
#define REST_16VRS(n,b,base) REST_8VRS(n,b,base); REST_8VRS(n+8,b,base)
#define REST_32VRS(n,b,base) REST_16VRS(n,b,base); REST_16VRS(n+16,b,base)
+/* Save the lower 32 VSRs in the thread VSR region */
+#define SAVE_VSR(n,b,base) li b,THREAD_VSR0+(16*(n)); STXVD2X(n,b,base)
+#define SAVE_2VSRS(n,b,base) SAVE_VSR(n,b,base); SAVE_VSR(n+1,b,base)
+#define SAVE_4VSRS(n,b,base) SAVE_2VSRS(n,b,base); SAVE_2VSRS(n+2,b,base)
+#define SAVE_8VSRS(n,b,base) SAVE_4VSRS(n,b,base); SAVE_4VSRS(n+4,b,base)
+#define SAVE_16VSRS(n,b,base) SAVE_8VSRS(n,b,base); SAVE_8VSRS(n+8,b,base)
+#define SAVE_32VSRS(n,b,base) SAVE_16VSRS(n,b,base); SAVE_16VSRS(n+16,b,base)
+#define REST_VSR(n,b,base) li b,THREAD_VSR0+(16*(n)); LXVD2X(n,b,base)
+#define REST_2VSRS(n,b,base) REST_VSR(n,b,base); REST_VSR(n+1,b,base)
+#define REST_4VSRS(n,b,base) REST_2VSRS(n,b,base); REST_2VSRS(n+2,b,base)
+#define REST_8VSRS(n,b,base) REST_4VSRS(n,b,base); REST_4VSRS(n+4,b,base)
+#define REST_16VSRS(n,b,base) REST_8VSRS(n,b,base); REST_8VSRS(n+8,b,base)
+#define REST_32VSRS(n,b,base) REST_16VSRS(n,b,base); REST_16VSRS(n+16,b,base)
+/* Save the upper 32 VSRs (32-63) in the thread VSX region (0-31) */
+#define SAVE_VSRU(n,b,base) li b,THREAD_VR0+(16*(n)); STXVD2X(n+32,b,base)
+#define SAVE_2VSRSU(n,b,base) SAVE_VSRU(n,b,base); SAVE_VSRU(n+1,b,base)
+#define SAVE_4VSRSU(n,b,base) SAVE_2VSRSU(n,b,base); SAVE_2VSRSU(n+2,b,base)
+#define SAVE_8VSRSU(n,b,base) SAVE_4VSRSU(n,b,base); SAVE_4VSRSU(n+4,b,base)
+#define SAVE_16VSRSU(n,b,base) SAVE_8VSRSU(n,b,base); SAVE_8VSRSU(n+8,b,base)
+#define SAVE_32VSRSU(n,b,base) SAVE_16VSRSU(n,b,base); SAVE_16VSRSU(n+16,b,base)
+#define REST_VSRU(n,b,base) li b,THREAD_VR0+(16*(n)); LXVD2X(n+32,b,base)
+#define REST_2VSRSU(n,b,base) REST_VSRU(n,b,base); REST_VSRU(n+1,b,base)
+#define REST_4VSRSU(n,b,base) REST_2VSRSU(n,b,base); REST_2VSRSU(n+2,b,base)
+#define REST_8VSRSU(n,b,base) REST_4VSRSU(n,b,base); REST_4VSRSU(n+4,b,base)
+#define REST_16VSRSU(n,b,base) REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
+#define REST_32VSRSU(n,b,base) REST_16VSRSU(n,b,base); REST_16VSRSU(n+16,b,base)
+
+#ifdef CONFIG_VSX
+#define REST_32FPVSRS(n,c,base) \
+BEGIN_FTR_SECTION \
+ b 2f; \
+END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
+ REST_32FPRS(n,base); \
+ b 3f; \
+2: REST_32VSRS(n,c,base); \
+3:
+
+#define SAVE_32FPVSRS(n,c,base) \
+BEGIN_FTR_SECTION \
+ b 2f; \
+END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
+ SAVE_32FPRS(n,base); \
+ b 3f; \
+2: SAVE_32VSRS(n,c,base); \
+3:
+
+#else
+#define REST_32FPVSRS(n,b,base) REST_32FPRS(n, base)
+#define SAVE_32FPVSRS(n,b,base) SAVE_32FPRS(n, base)
+#endif
+
#define SAVE_EVR(n,s,base) evmergehi s,s,n; stw s,THREAD_EVR0+4*(n)(base)
#define SAVE_2EVRS(n,s,base) SAVE_EVR(n,s,base); SAVE_EVR(n+1,s,base)
#define SAVE_4EVRS(n,s,base) SAVE_2EVRS(n,s,base); SAVE_2EVRS(n+2,s,base)
@@ -534,6 +594,73 @@ END_FTR_SECTION_IFCLR(CPU_FTR_601)
#define vr30 30
#define vr31 31
+/* VSX Registers (VSRs) */
+
+#define vsr0 0
+#define vsr1 1
+#define vsr2 2
+#define vsr3 3
+#define vsr4 4
+#define vsr5 5
+#define vsr6 6
+#define vsr7 7
+#define vsr8 8
+#define vsr9 9
+#define vsr10 10
+#define vsr11 11
+#define vsr12 12
+#define vsr13 13
+#define vsr14 14
+#define vsr15 15
+#define vsr16 16
+#define vsr17 17
+#define vsr18 18
+#define vsr19 19
+#define vsr20 20
+#define vsr21 21
+#define vsr22 22
+#define vsr23 23
+#define vsr24 24
+#define vsr25 25
+#define vsr26 26
+#define vsr27 27
+#define vsr28 28
+#define vsr29 29
+#define vsr30 30
+#define vsr31 31
+#define vsr32 32
+#define vsr33 33
+#define vsr34 34
+#define vsr35 35
+#define vsr36 36
+#define vsr37 37
+#define vsr38 38
+#define vsr39 39
+#define vsr40 40
+#define vsr41 41
+#define vsr42 42
+#define vsr43 43
+#define vsr44 44
+#define vsr45 45
+#define vsr46 46
+#define vsr47 47
+#define vsr48 48
+#define vsr49 49
+#define vsr50 50
+#define vsr51 51
+#define vsr52 52
+#define vsr53 53
+#define vsr54 54
+#define vsr55 55
+#define vsr56 56
+#define vsr57 57
+#define vsr58 58
+#define vsr59 59
+#define vsr60 60
+#define vsr61 61
+#define vsr62 62
+#define vsr63 63
+
/* SPE Registers (EVPRs) */
#define evr0 0
^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
2008-06-23 7:38 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
@ 2008-06-23 14:46 ` Kumar Gala
0 siblings, 0 replies; 106+ messages in thread
From: Kumar Gala @ 2008-06-23 14:46 UTC (permalink / raw)
To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras
On Jun 23, 2008, at 2:38 AM, Michael Neuling wrote:
> If we set the SPE MSR bit in save_user_regs we can blow away the VEC
> bit. This will never happen in reality (VMX and SPE will never be in
> the same processor as their opcodes overlap), but it looks bad. Also
> when we add VSX here in a later patch, we can hit two of these at the
> same time.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
I think it would also be good to comment about how this doesn't happen
since they are the same MSR bit. Having that comment might reduce
confusion if anyone ever looks at this commit message in the future.
(Plus you seem to have trailing white space in the commit message).
- k
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX.
2008-06-23 7:38 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (8 preceding siblings ...)
2008-06-23 7:38 ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
@ 2008-06-24 10:57 ` Michael Neuling
2008-06-24 10:57 ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
` (9 more replies)
9 siblings, 10 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-24 10:57 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
The following set of patches adds Vector Scalar Extensions (VSX)
support for POWER7, including context switch, ptrace and signal support.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
Paulus: please consider for your 2.6.27 tree.
Updates since the last post:
- Comment on VMX vs SPE as suggested by Kumar.
- Fixes for core files
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
2008-06-24 10:57 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
2008-06-24 10:57 ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
2008-06-24 10:57 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
@ 2008-06-24 10:57 ` Michael Neuling
2008-06-24 14:07 ` Kumar Gala
2008-06-24 10:57 ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
` (6 subsequent siblings)
9 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-24 10:57 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
We are going to change where the floating point registers are stored
in the thread_struct, so in preparation add some macros to access the
floating point registers. Update all code to use these new macros.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/align.c | 6 ++--
arch/powerpc/kernel/process.c | 5 ++-
arch/powerpc/kernel/ptrace.c | 14 +++++----
arch/powerpc/kernel/ptrace32.c | 14 +++++++--
arch/powerpc/kernel/softemu8xx.c | 4 +-
arch/powerpc/math-emu/math.c | 56 +++++++++++++++++++--------------------
include/asm-powerpc/ppc_asm.h | 5 ++-
include/asm-powerpc/processor.h | 3 ++
8 files changed, 61 insertions(+), 46 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/align.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/align.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/align.c
@@ -366,7 +366,7 @@ static int emulate_multiple(struct pt_re
static int emulate_fp_pair(struct pt_regs *regs, unsigned char __user *addr,
unsigned int reg, unsigned int flags)
{
- char *ptr = (char *) &current->thread.fpr[reg];
+ char *ptr = (char *) &current->thread.TS_FPR(reg);
int i, ret;
if (!(flags & F))
@@ -784,7 +784,7 @@ int fix_alignment(struct pt_regs *regs)
return -EFAULT;
}
} else if (flags & F) {
- data.dd = current->thread.fpr[reg];
+ data.dd = current->thread.TS_FPR(reg);
if (flags & S) {
/* Single-precision FP store requires conversion... */
#ifdef CONFIG_PPC_FPU
@@ -862,7 +862,7 @@ int fix_alignment(struct pt_regs *regs)
if (unlikely(ret))
return -EFAULT;
} else if (flags & F)
- current->thread.fpr[reg] = data.dd;
+ current->thread.TS_FPR(reg) = data.dd;
else
regs->gpr[reg] = data.ll;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -110,7 +110,7 @@ int dump_task_fpu(struct task_struct *ts
return 0;
flush_fp_to_thread(current);
- memcpy(fpregs, &tsk->thread.fpr[0], sizeof(*fpregs));
+ memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
return 1;
}
@@ -689,7 +689,8 @@ void start_thread(struct pt_regs *regs,
#endif
discard_lazy_cpu_state();
- memset(current->thread.fpr, 0, sizeof(current->thread.fpr));
+ memset(current->thread.fpr, 0,
+ sizeof(current->thread.fpr));
current->thread.fpscr.val = 0;
#ifdef CONFIG_ALTIVEC
memset(current->thread.vr, 0, sizeof(current->thread.vr));
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -218,10 +218,10 @@ static int fpr_get(struct task_struct *t
flush_fp_to_thread(target);
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
- offsetof(struct thread_struct, fpr[32]));
+ offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
- &target->thread.fpr, 0, -1);
+ target->thread.fpr, 0, -1);
}
static int fpr_set(struct task_struct *target, const struct user_regset *regset,
@@ -231,10 +231,10 @@ static int fpr_set(struct task_struct *t
flush_fp_to_thread(target);
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
- offsetof(struct thread_struct, fpr[32]));
+ offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
- &target->thread.fpr, 0, -1);
+ target->thread.fpr, 0, -1);
}
@@ -728,7 +728,8 @@ long arch_ptrace(struct task_struct *chi
tmp = ptrace_get_reg(child, (int) index);
} else {
flush_fp_to_thread(child);
- tmp = ((unsigned long *)child->thread.fpr)[index - PT_FPR0];
+ tmp = ((unsigned long *)child->thread.fpr)
+ [TS_FPRSPACING * (index - PT_FPR0)];
}
ret = put_user(tmp,(unsigned long __user *) data);
break;
@@ -755,7 +756,8 @@ long arch_ptrace(struct task_struct *chi
ret = ptrace_put_reg(child, index, data);
} else {
flush_fp_to_thread(child);
- ((unsigned long *)child->thread.fpr)[index - PT_FPR0] = data;
+ ((unsigned long *)child->thread.fpr)
+ [TS_FPRSPACING * (index - PT_FPR0)] = data;
ret = 0;
}
break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
@@ -64,6 +64,11 @@ static long compat_ptrace_old(struct tas
return -EPERM;
}
+/* Macros to work out the correct index for the FPR in the thread struct */
+#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
+#define FPRHALF(i) (((i) - PT_FPR0) % 2)
+#define FPRINDEX(i) TS_FPRSPACING * FPRNUMBER(i) + FPRHALF(i)
+
long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
compat_ulong_t caddr, compat_ulong_t cdata)
{
@@ -122,7 +127,8 @@ long compat_arch_ptrace(struct task_stru
* to be an array of unsigned int (32 bits) - the
* index passed in is based on this assumption.
*/
- tmp = ((unsigned int *)child->thread.fpr)[index - PT_FPR0];
+ tmp = ((unsigned int *)child->thread.fpr)
+ [FPRINDEX(index)];
}
ret = put_user((unsigned int)tmp, (u32 __user *)data);
break;
@@ -162,7 +168,8 @@ long compat_arch_ptrace(struct task_stru
CHECK_FULL_REGS(child->thread.regs);
if (numReg >= PT_FPR0) {
flush_fp_to_thread(child);
- tmp = ((unsigned long int *)child->thread.fpr)[numReg - PT_FPR0];
+ tmp = ((unsigned long int *)child->thread.fpr)
+ [FPRINDEX(numReg)];
} else { /* register within PT_REGS struct */
tmp = ptrace_get_reg(child, numReg);
}
@@ -217,7 +224,8 @@ long compat_arch_ptrace(struct task_stru
* to be an array of unsigned int (32 bits) - the
* index passed in is based on this assumption.
*/
- ((unsigned int *)child->thread.fpr)[index - PT_FPR0] = data;
+ ((unsigned int *)child->thread.fpr)
+ [TS_FPRSPACING * (index - PT_FPR0)] = data;
ret = 0;
}
break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/softemu8xx.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
@@ -124,7 +124,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
disp = instword & 0xffff;
ea = (u32 *)(regs->gpr[idxreg] + disp);
- ip = (u32 *)&current->thread.fpr[flreg];
+ ip = (u32 *)&current->thread.TS_FPR(flreg);
switch ( inst )
{
@@ -168,7 +168,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
break;
case FMR:
/* assume this is a fp move -- Cort */
- memcpy(ip, &current->thread.fpr[(instword>>11)&0x1f],
+ memcpy(ip, &current->thread.TS_FPR((instword>>11)&0x1f),
sizeof(double));
break;
default:
Index: linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/math-emu/math.c
+++ linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
@@ -230,14 +230,14 @@ do_mathemu(struct pt_regs *regs)
case LFD:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
lfd(op0, op1, op2, op3);
break;
case LFDU:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
lfd(op0, op1, op2, op3);
regs->gpr[idx] = (unsigned long)op1;
@@ -245,21 +245,21 @@ do_mathemu(struct pt_regs *regs)
case STFD:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
stfd(op0, op1, op2, op3);
break;
case STFDU:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
stfd(op0, op1, op2, op3);
regs->gpr[idx] = (unsigned long)op1;
break;
case OP63:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
fmr(op0, op1, op2, op3);
break;
default:
@@ -356,28 +356,28 @@ do_mathemu(struct pt_regs *regs)
switch (type) {
case AB:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
case AC:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op2 = (void *)&current->thread.fpr[(insn >> 6) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op2 = (void *)&current->thread.TS_FPR((insn >> 6) & 0x1f);
break;
case ABC:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
- op3 = (void *)&current->thread.fpr[(insn >> 6) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
+ op3 = (void *)&current->thread.TS_FPR((insn >> 6) & 0x1f);
break;
case D:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
break;
@@ -387,27 +387,27 @@ do_mathemu(struct pt_regs *regs)
goto illegal;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)(regs->gpr[idx] + sdisp);
break;
case X:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
break;
case XA:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
break;
case XB:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
case XE:
idx = (insn >> 16) & 0x1f;
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
if (!idx) {
if (((insn >> 1) & 0x3ff) == STFIWX)
op1 = (void *)(regs->gpr[(insn >> 11) & 0x1f]);
@@ -421,7 +421,7 @@ do_mathemu(struct pt_regs *regs)
case XEU:
idx = (insn >> 16) & 0x1f;
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0)
+ regs->gpr[(insn >> 11) & 0x1f]);
break;
@@ -429,8 +429,8 @@ do_mathemu(struct pt_regs *regs)
case XCR:
op0 = (void *)&regs->ccr;
op1 = (void *)((insn >> 23) & 0x7);
- op2 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op3 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op2 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op3 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
case XCRL:
@@ -450,7 +450,7 @@ do_mathemu(struct pt_regs *regs)
case XFLB:
op0 = (void *)((insn >> 17) & 0xff);
- op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
default:
Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -6,6 +6,7 @@
#include <linux/stringify.h>
#include <asm/asm-compat.h>
+#include <asm/processor.h>
#ifndef __ASSEMBLY__
#error __FILE__ should only be used in assembler files
@@ -83,13 +84,13 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
#define REST_8GPRS(n, base) REST_4GPRS(n, base); REST_4GPRS(n+4, base)
#define REST_10GPRS(n, base) REST_8GPRS(n, base); REST_2GPRS(n+8, base)
-#define SAVE_FPR(n, base) stfd n,THREAD_FPR0+8*(n)(base)
+#define SAVE_FPR(n, base) stfd n,THREAD_FPR0+8*TS_FPRSPACING*(n)(base)
#define SAVE_2FPRS(n, base) SAVE_FPR(n, base); SAVE_FPR(n+1, base)
#define SAVE_4FPRS(n, base) SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
#define SAVE_8FPRS(n, base) SAVE_4FPRS(n, base); SAVE_4FPRS(n+4, base)
#define SAVE_16FPRS(n, base) SAVE_8FPRS(n, base); SAVE_8FPRS(n+8, base)
#define SAVE_32FPRS(n, base) SAVE_16FPRS(n, base); SAVE_16FPRS(n+16, base)
-#define REST_FPR(n, base) lfd n,THREAD_FPR0+8*(n)(base)
+#define REST_FPR(n, base) lfd n,THREAD_FPR0+8*TS_FPRSPACING*(n)(base)
#define REST_2FPRS(n, base) REST_FPR(n, base); REST_FPR(n+1, base)
#define REST_4FPRS(n, base) REST_2FPRS(n, base); REST_2FPRS(n+2, base)
#define REST_8FPRS(n, base) REST_4FPRS(n, base); REST_4FPRS(n+4, base)
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -136,6 +136,8 @@ typedef struct {
unsigned long seg;
} mm_segment_t;
+#define TS_FPR(i) fpr[i]
+
struct thread_struct {
unsigned long ksp; /* Kernel stack pointer */
unsigned long ksp_limit; /* if ksp <= ksp_limit stack overflow */
@@ -289,4 +291,5 @@ static inline void prefetchw(const void
#endif /* __KERNEL__ */
#endif /* __ASSEMBLY__ */
+#define TS_FPRSPACING 1
#endif /* _ASM_POWERPC_PROCESSOR_H */
^ permalink raw reply [flat|nested] 106+ messages in thread
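[Editorial note: the TS_FPR indirection above can be sketched in plain C. This is an illustrative model, not kernel code — the struct layouts and function names below are hypothetical stand-ins showing why callers must go through the macro once the FPR spacing changes for VSX.]

```c
#include <assert.h>

/* Hypothetical model of the two layouts hidden behind TS_FPR().
 * Today: 32 plain doubles, TS_FPRSPACING == 1. */
struct thread_fp_today {
	double fpr[32];
};

/* With VSX (a later patch): each FPR becomes doubleword 0 of a 128-bit
 * VSR, so consecutive FPRs sit two doubles apart (TS_FPRSPACING == 2). */
struct thread_fp_vsx {
	double fpr[32][2];
};

/* Distance, in doubles, between FPR n and FPR n+1 in each layout. */
static int stride_today(void)
{
	struct thread_fp_today t;
	return (int)(&t.fpr[1] - &t.fpr[0]);
}

static int stride_vsx(void)
{
	struct thread_fp_vsx t;
	return (int)(&t.fpr[1][0] - &t.fpr[0][0]);
}
```

Code that indexes the raw array as `fpr[i]` silently breaks under the second layout; routing every access through one macro keeps the change in a single place.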
* [PATCH 3/9] powerpc: Move altivec_unavailable
2008-06-24 10:57 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
@ 2008-06-24 10:57 ` Michael Neuling
2008-06-24 10:57 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
` (8 subsequent siblings)
9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-24 10:57 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Move the altivec_unavailable code to make room at 0xf40, where the
vsx_unavailable exception will be.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/head_64.S | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -275,7 +275,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
. = 0xf00
b performance_monitor_pSeries
- STD_EXCEPTION_PSERIES(0xf20, altivec_unavailable)
+ . = 0xf20
+ b altivec_unavailable_pSeries
#ifdef CONFIG_CBE_RAS
HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
@@ -295,6 +296,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
/* moved from 0xf00 */
STD_EXCEPTION_PSERIES(., performance_monitor)
+ STD_EXCEPTION_PSERIES(., altivec_unavailable)
/*
* An interrupt came in while soft-disabled; clear EE in SRR1,
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
2008-06-24 10:57 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
2008-06-24 10:57 ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
@ 2008-06-24 10:57 ` Michael Neuling
2008-06-24 13:47 ` Kumar Gala
2008-06-24 10:57 ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
` (7 subsequent siblings)
9 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-24 10:57 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
If we set the SPE MSR bit in save_user_regs we can blow away the VEC
bit. This doesn't matter in reality, as they are in fact the same bit,
but it looks bad.
Also, when we add VSX in a later patch, we need to be able to set two
separate MSR bits here.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/signal_32.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -336,6 +336,8 @@ struct rt_sigframe {
static int save_user_regs(struct pt_regs *regs, struct mcontext __user *frame,
int sigret)
{
+ unsigned long msr = regs->msr;
+
/* Make sure floating point registers are stored in regs */
flush_fp_to_thread(current);
@@ -354,8 +356,7 @@ static int save_user_regs(struct pt_regs
return 1;
/* set MSR_VEC in the saved MSR value to indicate that
frame->mc_vregs contains valid data */
- if (__put_user(regs->msr | MSR_VEC, &frame->mc_gregs[PT_MSR]))
- return 1;
+ msr |= MSR_VEC;
}
/* else assert((regs->msr & MSR_VEC) == 0) */
@@ -377,8 +378,7 @@ static int save_user_regs(struct pt_regs
return 1;
/* set MSR_SPE in the saved MSR value to indicate that
frame->mc_vregs contains valid data */
- if (__put_user(regs->msr | MSR_SPE, &frame->mc_gregs[PT_MSR]))
- return 1;
+ msr |= MSR_SPE;
}
/* else assert((regs->msr & MSR_SPE) == 0) */
@@ -387,6 +387,8 @@ static int save_user_regs(struct pt_regs
return 1;
#endif /* CONFIG_SPE */
+ if (__put_user(msr, &frame->mc_gregs[PT_MSR]))
+ return 1;
if (sigret) {
/* Set up the sigreturn trampoline: li r0,sigret; sc */
if (__put_user(0x38000000UL + sigret, &frame->tramp[0])
^ permalink raw reply [flat|nested] 106+ messages in thread
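[Editorial note: the restructuring above accumulates MSR bits in a local copy and writes the signal frame exactly once. A minimal sketch of that pattern follows — the flag values are made up for illustration; the real MSR_VEC and MSR_SPE are architecture-defined (and happen to be the same bit).]

```c
#include <assert.h>

/* Hypothetical flag values for illustration only. */
#define VEC_FLAG 0x1UL
#define SPE_FLAG 0x2UL

static unsigned long build_saved_msr(unsigned long regs_msr,
				     int used_vec, int used_spe)
{
	unsigned long msr = regs_msr;	/* start from the live MSR */

	if (used_vec)
		msr |= VEC_FLAG;	/* frame vector regs are valid */
	if (used_spe)
		msr |= SPE_FLAG;

	return msr;			/* written to the frame exactly once */
}
```

Compared with OR-ing into user memory per feature, a single store at the end cannot lose a bit set by an earlier branch, and it naturally supports setting two bits at once for VSX.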
* [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable
2008-06-24 10:57 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (4 preceding siblings ...)
2008-06-24 10:57 ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
@ 2008-06-24 10:57 ` Michael Neuling
2008-06-24 14:01 ` Kumar Gala
2008-06-24 10:57 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
` (3 subsequent siblings)
9 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-24 10:57 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Make load_up_fpu and load_up_altivec callable so they can be reused by
the VSX code.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/fpu.S | 2 +-
arch/powerpc/kernel/head_32.S | 6 ++++--
arch/powerpc/kernel/head_64.S | 10 +++++++---
arch/powerpc/kernel/head_booke.h | 6 ++++--
4 files changed, 16 insertions(+), 8 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -85,7 +85,7 @@ _GLOBAL(load_up_fpu)
#endif /* CONFIG_SMP */
/* restore registers and return */
/* we haven't used ctr or xer or lr */
- b fast_exception_return
+ blr
/*
* giveup_fpu(tsk)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_32.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
@@ -421,8 +421,10 @@ BEGIN_FTR_SECTION
b ProgramCheck
END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE)
EXCEPTION_PROLOG
- bne load_up_fpu /* if from user, just load it up */
- addi r3,r1,STACK_FRAME_OVERHEAD
+ beq 1f
+ bl load_up_fpu /* if from user, just load it up */
+ b fast_exception_return
+1: addi r3,r1,STACK_FRAME_OVERHEAD
EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
/* Decrementer */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -741,7 +741,8 @@ fp_unavailable_common:
ENABLE_INTS
bl .kernel_fp_unavailable_exception
BUG_OPCODE
-1: b .load_up_fpu
+1: bl .load_up_fpu
+ b fast_exception_return
.align 7
.globl altivec_unavailable_common
@@ -749,7 +750,10 @@ altivec_unavailable_common:
EXCEPTION_PROLOG_COMMON(0xf20, PACA_EXGEN)
#ifdef CONFIG_ALTIVEC
BEGIN_FTR_SECTION
- bne .load_up_altivec /* if from user, just load it up */
+ beq 1f
+ bl .load_up_altivec
+ b fast_exception_return
+1:
END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
#endif
bl .save_nvgprs
@@ -829,7 +833,7 @@ _STATIC(load_up_altivec)
std r4,0(r3)
#endif /* CONFIG_SMP */
/* restore registers and return */
- b fast_exception_return
+ blr
#endif /* CONFIG_ALTIVEC */
/*
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_booke.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
@@ -363,8 +363,10 @@ label:
#define FP_UNAVAILABLE_EXCEPTION \
START_EXCEPTION(FloatingPointUnavailable) \
NORMAL_EXCEPTION_PROLOG; \
- bne load_up_fpu; /* if from user, just load it up */ \
- addi r3,r1,STACK_FRAME_OVERHEAD; \
+ beq 1f; \
+ bl load_up_fpu; /* if from user, just load it up */ \
+ b fast_exception_return; \
+1: addi r3,r1,STACK_FRAME_OVERHEAD; \
EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
#endif /* __HEAD_BOOKE_H__ */
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 6/9] powerpc: Add VSX CPU feature
2008-06-24 10:57 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (5 preceding siblings ...)
2008-06-24 10:57 ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
@ 2008-06-24 10:57 ` Michael Neuling
2008-06-24 14:19 ` Kumar Gala
2008-06-24 10:57 ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
` (2 subsequent siblings)
9 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-24 10:57 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Add a VSX CPU feature. Also add code to detect if VSX is available
from the device tree.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
---
arch/powerpc/kernel/prom.c | 4 ++++
include/asm-powerpc/cputable.h | 15 ++++++++++++++-
2 files changed, 18 insertions(+), 1 deletion(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/prom.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
@@ -609,6 +609,10 @@ static struct feature_property {
{"altivec", 0, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
{"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ /* Yes, this _really_ is ibm,vmx == 2 to enable VSX */
+ {"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
+#endif /* CONFIG_VSX */
#ifdef CONFIG_PPC64
{"ibm,dfp", 1, 0, PPC_FEATURE_HAS_DFP},
{"ibm,purr", 1, CPU_FTR_PURR, 0},
Index: linux-2.6-ozlabs/include/asm-powerpc/cputable.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/cputable.h
+++ linux-2.6-ozlabs/include/asm-powerpc/cputable.h
@@ -27,6 +27,7 @@
#define PPC_FEATURE_HAS_DFP 0x00000400
#define PPC_FEATURE_POWER6_EXT 0x00000200
#define PPC_FEATURE_ARCH_2_06 0x00000100
+#define PPC_FEATURE_HAS_VSX 0x00000080
#define PPC_FEATURE_TRUE_LE 0x00000002
#define PPC_FEATURE_PPC_LE 0x00000001
@@ -181,6 +182,7 @@ extern void do_feature_fixups(unsigned l
#define CPU_FTR_DSCR LONG_ASM_CONST(0x0002000000000000)
#define CPU_FTR_1T_SEGMENT LONG_ASM_CONST(0x0004000000000000)
#define CPU_FTR_NO_SLBIE_B LONG_ASM_CONST(0x0008000000000000)
+#define CPU_FTR_VSX LONG_ASM_CONST(0x0010000000000000)
#ifndef __ASSEMBLY__
@@ -199,6 +201,17 @@ extern void do_feature_fixups(unsigned l
#define PPC_FEATURE_HAS_ALTIVEC_COMP 0
#endif
+/* We only set the VSX features if the kernel was compiled with VSX
+ * support
+ */
+#ifdef CONFIG_VSX
+#define CPU_FTR_VSX_COMP CPU_FTR_VSX
+#define PPC_FEATURE_HAS_VSX_COMP PPC_FEATURE_HAS_VSX
+#else
+#define CPU_FTR_VSX_COMP 0
+#define PPC_FEATURE_HAS_VSX_COMP 0
+#endif
+
/* We only set the spe features if the kernel was compiled with spe
* support
*/
@@ -399,7 +412,7 @@ extern void do_feature_fixups(unsigned l
(CPU_FTRS_POWER3 | CPU_FTRS_RS64 | CPU_FTRS_POWER4 | \
CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | CPU_FTRS_POWER6 | \
CPU_FTRS_POWER7 | CPU_FTRS_CELL | CPU_FTRS_PA6T | \
- CPU_FTR_1T_SEGMENT)
+ CPU_FTR_1T_SEGMENT | CPU_FTR_VSX)
#else
enum {
CPU_FTRS_POSSIBLE =
^ permalink raw reply [flat|nested] 106+ messages in thread
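[Editorial note: once the kernel advertises the feature, userspace can test the PPC_FEATURE_HAS_VSX bit in the AT_HWCAP auxiliary-vector word (obtained via getauxval(AT_HWCAP) on a real system). A minimal sketch of the predicate, using the bit value defined in this patch:]

```c
#include <assert.h>

#define PPC_FEATURE_HAS_VSX 0x00000080	/* value defined by this patch */

/* hwcap would come from getauxval(AT_HWCAP) on a live powerpc system. */
static int cpu_has_vsx(unsigned long hwcap)
{
	return (hwcap & PPC_FEATURE_HAS_VSX) != 0;
}
```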
* [PATCH 7/9] powerpc: Add VSX assembler code macros
2008-06-24 10:57 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (7 preceding siblings ...)
2008-06-24 10:57 ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
@ 2008-06-24 10:57 ` Michael Neuling
2008-06-24 14:06 ` Kumar Gala
2008-06-25 4:07 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
9 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-24 10:57 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
This adds macros for the VSX load/store instructions, since most
binutils versions are not going to support them for a while.
Also add VSX register save/restore macros and vsr[0-63] register definitions.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
include/asm-powerpc/ppc_asm.h | 127 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 127 insertions(+)
Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -74,6 +74,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
REST_10GPRS(22, base)
#endif
+/*
+ * Define what the VSX XX1 form instructions will look like, then add
+ * the 128 bit load store instructions based on that.
+ */
+#define VSX_XX1(xs, ra, rb) (((xs) & 0x1f) << 21 | ((ra) << 16) | \
+ ((rb) << 11) | (((xs) >> 5)))
+
+#define STXVD2X(xs, ra, rb) .long (0x7c000798 | VSX_XX1((xs), (ra), (rb)))
+#define LXVD2X(xs, ra, rb) .long (0x7c000698 | VSX_XX1((xs), (ra), (rb)))
#define SAVE_2GPRS(n, base) SAVE_GPR(n, base); SAVE_GPR(n+1, base)
#define SAVE_4GPRS(n, base) SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
@@ -110,6 +119,57 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
#define REST_16VRS(n,b,base) REST_8VRS(n,b,base); REST_8VRS(n+8,b,base)
#define REST_32VRS(n,b,base) REST_16VRS(n,b,base); REST_16VRS(n+16,b,base)
+/* Save the lower 32 VSRs in the thread VSR region */
+#define SAVE_VSR(n,b,base) li b,THREAD_VSR0+(16*(n)); STXVD2X(n,b,base)
+#define SAVE_2VSRS(n,b,base) SAVE_VSR(n,b,base); SAVE_VSR(n+1,b,base)
+#define SAVE_4VSRS(n,b,base) SAVE_2VSRS(n,b,base); SAVE_2VSRS(n+2,b,base)
+#define SAVE_8VSRS(n,b,base) SAVE_4VSRS(n,b,base); SAVE_4VSRS(n+4,b,base)
+#define SAVE_16VSRS(n,b,base) SAVE_8VSRS(n,b,base); SAVE_8VSRS(n+8,b,base)
+#define SAVE_32VSRS(n,b,base) SAVE_16VSRS(n,b,base); SAVE_16VSRS(n+16,b,base)
+#define REST_VSR(n,b,base) li b,THREAD_VSR0+(16*(n)); LXVD2X(n,b,base)
+#define REST_2VSRS(n,b,base) REST_VSR(n,b,base); REST_VSR(n+1,b,base)
+#define REST_4VSRS(n,b,base) REST_2VSRS(n,b,base); REST_2VSRS(n+2,b,base)
+#define REST_8VSRS(n,b,base) REST_4VSRS(n,b,base); REST_4VSRS(n+4,b,base)
+#define REST_16VSRS(n,b,base) REST_8VSRS(n,b,base); REST_8VSRS(n+8,b,base)
+#define REST_32VSRS(n,b,base) REST_16VSRS(n,b,base); REST_16VSRS(n+16,b,base)
+/* Save the upper 32 VSRs (32-63) in the thread VSX region (0-31) */
+#define SAVE_VSRU(n,b,base) li b,THREAD_VR0+(16*(n)); STXVD2X(n+32,b,base)
+#define SAVE_2VSRSU(n,b,base) SAVE_VSRU(n,b,base); SAVE_VSRU(n+1,b,base)
+#define SAVE_4VSRSU(n,b,base) SAVE_2VSRSU(n,b,base); SAVE_2VSRSU(n+2,b,base)
+#define SAVE_8VSRSU(n,b,base) SAVE_4VSRSU(n,b,base); SAVE_4VSRSU(n+4,b,base)
+#define SAVE_16VSRSU(n,b,base) SAVE_8VSRSU(n,b,base); SAVE_8VSRSU(n+8,b,base)
+#define SAVE_32VSRSU(n,b,base) SAVE_16VSRSU(n,b,base); SAVE_16VSRSU(n+16,b,base)
+#define REST_VSRU(n,b,base) li b,THREAD_VR0+(16*(n)); LXVD2X(n+32,b,base)
+#define REST_2VSRSU(n,b,base) REST_VSRU(n,b,base); REST_VSRU(n+1,b,base)
+#define REST_4VSRSU(n,b,base) REST_2VSRSU(n,b,base); REST_2VSRSU(n+2,b,base)
+#define REST_8VSRSU(n,b,base) REST_4VSRSU(n,b,base); REST_4VSRSU(n+4,b,base)
+#define REST_16VSRSU(n,b,base) REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
+#define REST_32VSRSU(n,b,base) REST_16VSRSU(n,b,base); REST_16VSRSU(n+16,b,base)
+
+#ifdef CONFIG_VSX
+#define REST_32FPVSRS(n,c,base) \
+BEGIN_FTR_SECTION \
+ b 2f; \
+END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
+ REST_32FPRS(n,base); \
+ b 3f; \
+2: REST_32VSRS(n,c,base); \
+3:
+
+#define SAVE_32FPVSRS(n,c,base) \
+BEGIN_FTR_SECTION \
+ b 2f; \
+END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
+ SAVE_32FPRS(n,base); \
+ b 3f; \
+2: SAVE_32VSRS(n,c,base); \
+3:
+
+#else
+#define REST_32FPVSRS(n,b,base) REST_32FPRS(n, base)
+#define SAVE_32FPVSRS(n,b,base) SAVE_32FPRS(n, base)
+#endif
+
#define SAVE_EVR(n,s,base) evmergehi s,s,n; stw s,THREAD_EVR0+4*(n)(base)
#define SAVE_2EVRS(n,s,base) SAVE_EVR(n,s,base); SAVE_EVR(n+1,s,base)
#define SAVE_4EVRS(n,s,base) SAVE_2EVRS(n,s,base); SAVE_2EVRS(n+2,s,base)
@@ -534,6 +594,73 @@ END_FTR_SECTION_IFCLR(CPU_FTR_601)
#define vr30 30
#define vr31 31
+/* VSX Registers (VSRs) */
+
+#define vsr0 0
+#define vsr1 1
+#define vsr2 2
+#define vsr3 3
+#define vsr4 4
+#define vsr5 5
+#define vsr6 6
+#define vsr7 7
+#define vsr8 8
+#define vsr9 9
+#define vsr10 10
+#define vsr11 11
+#define vsr12 12
+#define vsr13 13
+#define vsr14 14
+#define vsr15 15
+#define vsr16 16
+#define vsr17 17
+#define vsr18 18
+#define vsr19 19
+#define vsr20 20
+#define vsr21 21
+#define vsr22 22
+#define vsr23 23
+#define vsr24 24
+#define vsr25 25
+#define vsr26 26
+#define vsr27 27
+#define vsr28 28
+#define vsr29 29
+#define vsr30 30
+#define vsr31 31
+#define vsr32 32
+#define vsr33 33
+#define vsr34 34
+#define vsr35 35
+#define vsr36 36
+#define vsr37 37
+#define vsr38 38
+#define vsr39 39
+#define vsr40 40
+#define vsr41 41
+#define vsr42 42
+#define vsr43 43
+#define vsr44 44
+#define vsr45 45
+#define vsr46 46
+#define vsr47 47
+#define vsr48 48
+#define vsr49 49
+#define vsr50 50
+#define vsr51 51
+#define vsr52 52
+#define vsr53 53
+#define vsr54 54
+#define vsr55 55
+#define vsr56 56
+#define vsr57 57
+#define vsr58 58
+#define vsr59 59
+#define vsr60 60
+#define vsr61 61
+#define vsr62 62
+#define vsr63 63
+
/* SPE Registers (EVPRs) */
#define evr0 0
^ permalink raw reply [flat|nested] 106+ messages in thread
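[Editorial note: the hand-encoded opcodes above can be checked in plain C. The functions below mirror the VSX_XX1/STXVD2X/LXVD2X macros from the patch: XX1 is a split-field form where the low 5 bits of the 6-bit XS register number go in bits 21-25 and the high bit goes in bit 0 of the instruction word.]

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the VSX_XX1 macro: split XS field plus RA/RB. */
static uint32_t vsx_xx1(uint32_t xs, uint32_t ra, uint32_t rb)
{
	return ((xs & 0x1f) << 21) | (ra << 16) | (rb << 11) | (xs >> 5);
}

static uint32_t stxvd2x(uint32_t xs, uint32_t ra, uint32_t rb)
{
	return 0x7c000798u | vsx_xx1(xs, ra, rb);	/* store VSR */
}

static uint32_t lxvd2x(uint32_t xs, uint32_t ra, uint32_t rb)
{
	return 0x7c000698u | vsx_xx1(xs, ra, rb);	/* load VSR */
}
```

For example, storing VSR 32 (the first VMX-overlapping register) needs the high XS bit, which lands in bit 0 of the word rather than in the contiguous register field.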
* [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
2008-06-24 10:57 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (2 preceding siblings ...)
2008-06-24 10:57 ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
@ 2008-06-24 10:57 ` Michael Neuling
2008-06-24 10:57 ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
` (5 subsequent siblings)
9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-24 10:57 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
The layout of the new VSR registers, and how they overlap with the
legacy FPR and VR registers, is:
VSR doubleword 0 VSR doubleword 1
----------------------------------------------------------------
VSR[0] | FPR[0] | |
----------------------------------------------------------------
VSR[1] | FPR[1] | |
----------------------------------------------------------------
| ... | |
| ... | |
----------------------------------------------------------------
VSR[30] | FPR[30] | |
----------------------------------------------------------------
VSR[31] | FPR[31] | |
----------------------------------------------------------------
VSR[32] | VR[0] |
----------------------------------------------------------------
VSR[33] | VR[1] |
----------------------------------------------------------------
| ... |
| ... |
----------------------------------------------------------------
VSR[62] | VR[30] |
----------------------------------------------------------------
VSR[63] | VR[31] |
----------------------------------------------------------------
VSX has 64 128-bit registers. The first 32 registers overlap with the FP
registers and hence extend each of them with an additional 64 bits. The
second 32 registers overlap with the VMX registers.
This patch introduces the thread_struct changes required to reflect
this register layout. Ptrace and signals code is updated so that the
floating point registers are correctly accessed from the thread_struct
when CONFIG_VSX is enabled.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/asm-offsets.c | 4 ++
arch/powerpc/kernel/ptrace.c | 28 ++++++++++++++++++
arch/powerpc/kernel/signal_32.c | 59 ++++++++++++++++++++++++++++----------
arch/powerpc/kernel/signal_64.c | 32 ++++++++++++++++++--
include/asm-powerpc/processor.h | 21 ++++++++++++-
5 files changed, 126 insertions(+), 18 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
@@ -74,6 +74,10 @@ int main(void)
DEFINE(THREAD_VSCR, offsetof(struct thread_struct, vscr));
DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ DEFINE(THREAD_VSR0, offsetof(struct thread_struct, fpr));
+ DEFINE(THREAD_USED_VSR, offsetof(struct thread_struct, used_vsr));
+#endif /* CONFIG_VSX */
#ifdef CONFIG_PPC64
DEFINE(KSP_VSID, offsetof(struct thread_struct, ksp_vsid));
#else /* CONFIG_PPC64 */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -215,26 +215,54 @@ static int fpr_get(struct task_struct *t
unsigned int pos, unsigned int count,
void *kbuf, void __user *ubuf)
{
+#ifdef CONFIG_VSX
+ double buf[33];
+ int i;
+#endif
flush_fp_to_thread(target);
+#ifdef CONFIG_VSX
+ /* copy to local buffer then write that out */
+ for (i = 0; i < 32 ; i++)
+ buf[i] = target->thread.TS_FPR(i);
+ memcpy(&buf[32], &target->thread.fpscr, sizeof(double));
+ return user_regset_copyout(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+
+#else
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
target->thread.fpr, 0, -1);
+#endif
}
static int fpr_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
+#ifdef CONFIG_VSX
+ double buf[33];
+ int i;
+#endif
flush_fp_to_thread(target);
+#ifdef CONFIG_VSX
+ /* copy to local buffer then write that out */
+ i = user_regset_copyin(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+ if (i)
+ return i;
+ for (i = 0; i < 32 ; i++)
+ target->thread.TS_FPR(i) = buf[i];
+ memcpy(&target->thread.fpscr, &buf[32], sizeof(double));
+ return 0;
+#else
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
target->thread.fpr, 0, -1);
+#endif
}
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -337,14 +337,16 @@ static int save_user_regs(struct pt_regs
int sigret)
{
unsigned long msr = regs->msr;
+#ifdef CONFIG_VSX
+ double buf[32];
+ int i;
+#endif
/* Make sure floating point registers are stored in regs */
flush_fp_to_thread(current);
- /* save general and floating-point registers */
- if (save_general_regs(regs, frame) ||
- __copy_to_user(&frame->mc_fregs, current->thread.fpr,
- ELF_NFPREG * sizeof(double)))
+ /* save general registers */
+ if (save_general_regs(regs, frame))
return 1;
#ifdef CONFIG_ALTIVEC
@@ -368,7 +370,20 @@ static int save_user_regs(struct pt_regs
if (__put_user(current->thread.vrsave, (u32 __user *)&frame->mc_vregs[32]))
return 1;
#endif /* CONFIG_ALTIVEC */
-
+#ifdef CONFIG_VSX
+ /* save FPR copy to local buffer then write to the thread_struct */
+ flush_fp_to_thread(current);
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.TS_FPR(i);
+ memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+ if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
+ return 1;
+#else
+ /* save floating-point registers */
+ if (__copy_to_user(&frame->mc_fregs, current->thread.fpr,
+ ELF_NFPREG * sizeof(double)))
+ return 1;
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
/* save spe registers */
if (current->thread.used_spe) {
@@ -411,6 +426,10 @@ static long restore_user_regs(struct pt_
long err;
unsigned int save_r2 = 0;
unsigned long msr;
+#ifdef CONFIG_VSX
+ double buf[32];
+ int i;
+#endif
/*
* restore general registers but not including MSR or SOFTE. Also
@@ -438,16 +457,11 @@ static long restore_user_regs(struct pt_
*/
discard_lazy_cpu_state();
- /* force the process to reload the FP registers from
- current->thread when it next does FP instructions */
- regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
- if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
- sizeof(sr->mc_fregs)))
- return 1;
-
#ifdef CONFIG_ALTIVEC
- /* force the process to reload the altivec registers from
- current->thread when it next does altivec instructions */
+ /*
+ * Force the process to reload the altivec registers from
+ * current->thread when it next does altivec instructions
+ */
regs->msr &= ~MSR_VEC;
if (msr & MSR_VEC) {
/* restore altivec registers from the stack */
@@ -462,6 +476,23 @@ static long restore_user_regs(struct pt_
return 1;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (__copy_from_user(buf, &sr->mc_fregs, sizeof(sr->mc_fregs)))
+ return 1;
+ for (i = 0; i < 32 ; i++)
+ current->thread.TS_FPR(i) = buf[i];
+ memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+#else
+ if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
+ sizeof(sr->mc_fregs)))
+ return 1;
+#endif /* CONFIG_VSX */
+ /*
+ * force the process to reload the FP registers from
+ * current->thread when it next does FP instructions
+ */
+ regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
+
#ifdef CONFIG_SPE
/* force the process to reload the spe registers from
current->thread when it next does spe instructions */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -89,6 +89,10 @@ static long setup_sigcontext(struct sigc
#endif
unsigned long msr = regs->msr;
long err = 0;
+#ifdef CONFIG_VSX
+ double buf[FP_REGS_SIZE];
+ int i;
+#endif
flush_fp_to_thread(current);
@@ -112,11 +116,21 @@ static long setup_sigcontext(struct sigc
#else /* CONFIG_ALTIVEC */
err |= __put_user(0, &sc->v_regs);
#endif /* CONFIG_ALTIVEC */
+ flush_fp_to_thread(current);
+#ifdef CONFIG_VSX
+ /* Copy FP to local buffer then write that out */
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.TS_FPR(i);
+ memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+ err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+#else /* CONFIG_VSX */
+ /* copy fpr regs and fpscr */
+ err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
+#endif /* CONFIG_VSX */
err |= __put_user(&sc->gp_regs, &sc->regs);
WARN_ON(!FULL_REGS(regs));
err |= __copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE);
err |= __put_user(msr, &sc->gp_regs[PT_MSR]);
- err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
err |= __put_user(signr, &sc->signal);
err |= __put_user(handler, &sc->handler);
if (set != NULL)
@@ -135,6 +149,9 @@ static long restore_sigcontext(struct pt
#ifdef CONFIG_ALTIVEC
elf_vrreg_t __user *v_regs;
#endif
+#ifdef CONFIG_VSX
+ double buf[FP_REGS_SIZE];
+#endif
unsigned long err = 0;
unsigned long save_r13 = 0;
elf_greg_t *gregs = (elf_greg_t *)regs;
@@ -182,8 +199,6 @@ static long restore_sigcontext(struct pt
*/
regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
- err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
-
#ifdef CONFIG_ALTIVEC
err |= __get_user(v_regs, &sc->v_regs);
if (err)
@@ -202,7 +217,18 @@ static long restore_sigcontext(struct pt
else
current->thread.vrsave = 0;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ /* restore floating point */
+ err |= __copy_from_user(buf, &sc->fp_regs, FP_REGS_SIZE);
+ if (err)
+ return err;
+ for (i = 0; i < 32 ; i++)
+ current->thread.TS_FPR(i) = buf[i];
+ memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+#else
+ err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
+#endif
return err;
}
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -78,6 +78,7 @@ extern long kernel_thread(int (*fn)(void
/* Lazy FPU handling on uni-processor */
extern struct task_struct *last_task_used_math;
extern struct task_struct *last_task_used_altivec;
+extern struct task_struct *last_task_used_vsx;
extern struct task_struct *last_task_used_spe;
#ifdef CONFIG_PPC32
@@ -136,7 +137,13 @@ typedef struct {
unsigned long seg;
} mm_segment_t;
+#define TS_FPROFFSET 0
+#define TS_VSRLOWOFFSET 1
+#ifdef CONFIG_VSX
+#define TS_FPR(i) fpr[i][TS_FPROFFSET]
+#else
#define TS_FPR(i) fpr[i]
+#endif
struct thread_struct {
unsigned long ksp; /* Kernel stack pointer */
@@ -154,8 +161,12 @@ struct thread_struct {
unsigned long dbcr0; /* debug control register values */
unsigned long dbcr1;
#endif
+#ifdef CONFIG_VSX
+ double fpr[32][2]; /* Complete floating point set */
+#else
double fpr[32]; /* Complete floating point set */
- struct { /* fpr ... fpscr must be contiguous */
+#endif
+ struct {
unsigned int pad;
unsigned int val; /* Floating point status */
@@ -175,6 +186,10 @@ struct thread_struct {
unsigned long vrsave;
int used_vr; /* set if process has used altivec */
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ /* VSR status */
+ int used_vsr; /* set if process has used VSX */
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
unsigned long evr[32]; /* upper 32-bits of SPE regs */
u64 acc; /* Accumulator */
@@ -291,5 +306,9 @@ static inline void prefetchw(const void
#endif /* __KERNEL__ */
#endif /* __ASSEMBLY__ */
+#ifdef CONFIG_VSX
+#define TS_FPRSPACING 2
+#else
#define TS_FPRSPACING 1
+#endif
#endif /* _ASM_POWERPC_PROCESSOR_H */
* [PATCH 9/9] powerpc: Add CONFIG_VSX config option
2008-06-24 10:57 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (6 preceding siblings ...)
2008-06-24 10:57 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
@ 2008-06-24 10:57 ` Michael Neuling
2008-06-24 14:19 ` Kumar Gala
2008-06-24 10:57 ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
2008-06-25 4:07 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
9 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-24 10:57 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Add the CONFIG_VSX config build option. It depends on POWER4, PPC_FPU and ALTIVEC.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/platforms/Kconfig.cputype | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
Index: linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/platforms/Kconfig.cputype
+++ linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
@@ -155,6 +155,22 @@ config ALTIVEC
If in doubt, say Y here.
+config VSX
+ bool "VSX Support"
+ depends on POWER4 && ALTIVEC && PPC_FPU
+ ---help---
+
+ This option enables kernel support for the Vector Scalar Extensions
+ to the PowerPC processor. The kernel currently supports saving and
+ restoring VSX registers, and turning on the 'VSX enable' bit so user
+ processes can execute VSX instructions.
+
+ This option is only useful if you have a processor that supports
+ VSX (POWER7 and above), but it does not have any effect on non-VSX
+ CPUs (it does, however, add code to the kernel).
+
+ If in doubt, say Y here.
+
config SPE
bool "SPE Support"
depends on E200 || E500
* [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support
2008-06-24 10:57 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (3 preceding siblings ...)
2008-06-24 10:57 ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
@ 2008-06-24 10:57 ` Michael Neuling
2008-06-24 10:57 ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
` (4 subsequent siblings)
9 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-24 10:57 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
This patch extends the floating point save and restore code to use the
VSX loads/stores when VSX is available. This makes FP context
save/restore marginally slower for FP-only code when VSX is available,
as it has to load/store 128 bits rather than just 64 bits.
Code mixing FP, VMX and VSX will see consistent architected state.
The signals interface is extended to enable access to VSR 0-31
doubleword 1, after discussions with toolchain maintainers. Backward
compatibility is maintained.
The ptrace interface is also extended to allow access to VSR 0-31 full
registers.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/entry_64.S | 5 +
arch/powerpc/kernel/fpu.S | 16 ++++-
arch/powerpc/kernel/head_64.S | 65 +++++++++++++++++++++++
arch/powerpc/kernel/misc_64.S | 33 ++++++++++++
arch/powerpc/kernel/ppc32.h | 1
arch/powerpc/kernel/ppc_ksyms.c | 3 +
arch/powerpc/kernel/process.c | 107 ++++++++++++++++++++++++++++++++++++++-
arch/powerpc/kernel/ptrace.c | 70 +++++++++++++++++++++++++
arch/powerpc/kernel/signal_32.c | 33 ++++++++++++
arch/powerpc/kernel/signal_64.c | 31 ++++++++++-
arch/powerpc/kernel/traps.c | 29 ++++++++++
include/asm-powerpc/elf.h | 6 +-
include/asm-powerpc/ptrace.h | 12 ++++
include/asm-powerpc/reg.h | 2
include/asm-powerpc/sigcontext.h | 37 +++++++++++++
include/asm-powerpc/system.h | 9 +++
16 files changed, 451 insertions(+), 8 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/entry_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
@@ -353,6 +353,11 @@ _GLOBAL(_switch)
mflr r20 /* Return to switch caller */
mfmsr r22
li r0, MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ oris r0,r0,MSR_VSX@h /* Disable VSX */
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif /* CONFIG_VSX */
#ifdef CONFIG_ALTIVEC
BEGIN_FTR_SECTION
oris r0,r0,MSR_VEC@h /* Disable altivec */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -34,6 +34,11 @@
_GLOBAL(load_up_fpu)
mfmsr r5
ori r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ oris r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
SYNC
MTMSRD(r5) /* enable use of fpu now */
isync
@@ -50,7 +55,7 @@ _GLOBAL(load_up_fpu)
beq 1f
toreal(r4)
addi r4,r4,THREAD /* want last_task_used_math->thread */
- SAVE_32FPRS(0, r4)
+ SAVE_32FPVSRS(0, r5, r4)
mffs fr0
stfd fr0,THREAD_FPSCR(r4)
PPC_LL r5,PT_REGS(r4)
@@ -77,7 +82,7 @@ _GLOBAL(load_up_fpu)
#endif
lfd fr0,THREAD_FPSCR(r5)
MTFSF_L(fr0)
- REST_32FPRS(0, r5)
+ REST_32FPVSRS(0, r4, r5)
#ifndef CONFIG_SMP
subi r4,r5,THREAD
fromreal(r4)
@@ -96,6 +101,11 @@ _GLOBAL(load_up_fpu)
_GLOBAL(giveup_fpu)
mfmsr r5
ori r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ oris r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
SYNC_601
ISYNC_601
MTMSRD(r5) /* enable use of fpu now */
@@ -106,7 +116,7 @@ _GLOBAL(giveup_fpu)
addi r3,r3,THREAD /* want THREAD of task */
PPC_LL r5,PT_REGS(r3)
PPC_LCMPI 0,r5,0
- SAVE_32FPRS(0, r3)
+ SAVE_32FPVSRS(0, r4, r3)
mffs fr0
stfd fr0,THREAD_FPSCR(r3)
beq 1f
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -278,6 +278,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
. = 0xf20
b altivec_unavailable_pSeries
+ . = 0xf40
+ b vsx_unavailable_pSeries
+
#ifdef CONFIG_CBE_RAS
HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
#endif /* CONFIG_CBE_RAS */
@@ -297,6 +300,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
/* moved from 0xf00 */
STD_EXCEPTION_PSERIES(., performance_monitor)
STD_EXCEPTION_PSERIES(., altivec_unavailable)
+ STD_EXCEPTION_PSERIES(., vsx_unavailable)
/*
* An interrupt came in while soft-disabled; clear EE in SRR1,
@@ -836,6 +840,67 @@ _STATIC(load_up_altivec)
blr
#endif /* CONFIG_ALTIVEC */
+ .align 7
+ .globl vsx_unavailable_common
+vsx_unavailable_common:
+ EXCEPTION_PROLOG_COMMON(0xf40, PACA_EXGEN)
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ bne .load_up_vsx
+1:
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
+ bl .save_nvgprs
+ addi r3,r1,STACK_FRAME_OVERHEAD
+ ENABLE_INTS
+ bl .vsx_unavailable_exception
+ b .ret_from_except
+
+#ifdef CONFIG_VSX
+/*
+ * load_up_vsx(unused, unused, tsk)
+ * Disable VSX for the task which had it previously,
+ * and save its vector registers in its thread_struct.
+ * Reuse the fp and vsx saves, but first check to see if they have
+ * been saved already.
+ * On entry: r13 == 'current' && last_task_used_vsx != 'current'
+ */
+_STATIC(load_up_vsx)
+/* Load FP and VSX registers if they haven't been done yet */
+ andi. r5,r12,MSR_FP
+ beql+ load_up_fpu /* skip if already loaded */
+ andis. r5,r12,MSR_VEC@h
+ beql+ load_up_altivec /* skip if already loaded */
+
+#ifndef CONFIG_SMP
+ ld r3,last_task_used_vsx@got(r2)
+ ld r4,0(r3)
+ cmpdi 0,r4,0
+ beq 1f
+ /* Disable VSX for last_task_used_vsx */
+ addi r4,r4,THREAD
+ ld r5,PT_REGS(r4)
+ ld r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+ lis r6,MSR_VSX@h
+ andc r6,r4,r6
+ std r6,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#endif /* CONFIG_SMP */
+ ld r4,PACACURRENT(r13)
+ addi r4,r4,THREAD /* Get THREAD */
+ li r6,1
+ stw r6,THREAD_USED_VSR(r4) /* ... also set thread used vsr */
+ /* enable use of VSX after return */
+ oris r12,r12,MSR_VSX@h
+ std r12,_MSR(r1)
+#ifndef CONFIG_SMP
+ /* Update last_task_used_math to 'current' */
+ ld r4,PACACURRENT(r13)
+ std r4,0(r3)
+#endif /* CONFIG_SMP */
+ b fast_exception_return
+#endif /* CONFIG_VSX */
+
/*
* Hash table stuff
*/
Index: linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/misc_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
@@ -506,6 +506,39 @@ _GLOBAL(giveup_altivec)
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+/*
+ * giveup_vsx(tsk)
+ * Disable VSX for the task given as the argument,
+ * and save the vector registers in its thread_struct.
+ * Enables the VSX for use in the kernel on return.
+ */
+_GLOBAL(giveup_vsx)
+ mfmsr r5
+ oris r5,r5,MSR_VSX@h
+ mtmsrd r5 /* enable use of VSX now */
+ isync
+
+ cmpdi 0,r3,0
+ beqlr- /* if no previous owner, done */
+ addi r3,r3,THREAD /* want THREAD of task */
+ ld r5,PT_REGS(r3)
+ cmpdi 0,r5,0
+ beq 1f
+ ld r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+ lis r3,MSR_VSX@h
+ andc r4,r4,r3 /* disable VSX for previous task */
+ std r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#ifndef CONFIG_SMP
+ li r5,0
+ ld r4,last_task_used_vsx@got(r2)
+ std r5,0(r4)
+#endif /* CONFIG_SMP */
+ blr
+
+#endif /* CONFIG_VSX */
+
/* kexec_wait(phys_cpu)
*
* wait for the flag to change, indicating this kernel is going away but
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc32.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
@@ -120,6 +120,7 @@ struct mcontext32 {
elf_fpregset_t mc_fregs;
unsigned int mc_pad[2];
elf_vrregset_t32 mc_vregs __attribute__((__aligned__(16)));
+ elf_vsrreghalf_t32 mc_vsregs __attribute__((__aligned__(16)));
};
struct ucontext32 {
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc_ksyms.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
@@ -102,6 +102,9 @@ EXPORT_SYMBOL(giveup_fpu);
#ifdef CONFIG_ALTIVEC
EXPORT_SYMBOL(giveup_altivec);
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+EXPORT_SYMBOL(giveup_vsx);
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
EXPORT_SYMBOL(giveup_spe);
#endif /* CONFIG_SPE */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -53,6 +53,7 @@ extern unsigned long _get_SP(void);
#ifndef CONFIG_SMP
struct task_struct *last_task_used_math = NULL;
struct task_struct *last_task_used_altivec = NULL;
+struct task_struct *last_task_used_vsx = NULL;
struct task_struct *last_task_used_spe = NULL;
#endif
@@ -106,11 +107,23 @@ EXPORT_SYMBOL(enable_kernel_fp);
int dump_task_fpu(struct task_struct *tsk, elf_fpregset_t *fpregs)
{
+#ifdef CONFIG_VSX
+ int i;
+ elf_fpreg_t *reg;
+#endif
+
if (!tsk->thread.regs)
return 0;
flush_fp_to_thread(current);
+#ifdef CONFIG_VSX
+ reg = (elf_fpreg_t *)fpregs;
+ for (i = 0; i < ELF_NFPREG - 1; i++, reg++)
+ *reg = tsk->thread.TS_FPR(i);
+ memcpy(reg, &tsk->thread.fpscr, sizeof(elf_fpreg_t));
+#else
memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
+#endif
return 1;
}
@@ -149,7 +162,7 @@ void flush_altivec_to_thread(struct task
}
}
-int dump_task_altivec(struct task_struct *tsk, elf_vrregset_t *vrregs)
+int dump_task_altivec(struct task_struct *tsk, elf_vrreg_t *vrregs)
{
/* ELF_NVRREG includes the VSCR and VRSAVE which we need to save
* separately, see below */
@@ -179,6 +192,80 @@ int dump_task_altivec(struct task_struct
}
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+#if 0
+/* not currently used, but some crazy RAID module might want to later */
+void enable_kernel_vsx(void)
+{
+ WARN_ON(preemptible());
+
+#ifdef CONFIG_SMP
+ if (current->thread.regs && (current->thread.regs->msr & MSR_VSX))
+ giveup_vsx(current);
+ else
+ giveup_vsx(NULL); /* just enable vsx for kernel - force */
+#else
+ giveup_vsx(last_task_used_vsx);
+#endif /* CONFIG_SMP */
+}
+EXPORT_SYMBOL(enable_kernel_vsx);
+#endif
+
+void flush_vsx_to_thread(struct task_struct *tsk)
+{
+ if (tsk->thread.regs) {
+ preempt_disable();
+ if (tsk->thread.regs->msr & MSR_VSX) {
+#ifdef CONFIG_SMP
+ BUG_ON(tsk != current);
+#endif
+ giveup_vsx(tsk);
+ }
+ preempt_enable();
+ }
+}
+
+/*
+ * This dumps the low doubleword (64 bits) of the first 32 VSX registers.
+ * This needs to be called with dump_task_fp and dump_task_altivec to
+ * get all the VSX state.
+ */
+int dump_task_vsx(struct task_struct *tsk, elf_vrreg_t *vrregs)
+{
+ elf_vrreg_t *reg;
+ double buf[32];
+ int i;
+
+ if (tsk == current)
+ flush_vsx_to_thread(tsk);
+
+ reg = (elf_vrreg_t *)vrregs;
+
+ for (i = 0; i < 32 ; i++)
+ buf[i] = tsk->thread.fpr[i][TS_VSRLOWOFFSET];
+ memcpy(reg, buf, sizeof(buf));
+
+ return 1;
+}
+#endif /* CONFIG_VSX */
+
+int dump_task_vector(struct task_struct *tsk, elf_vrregset_t *vrregs)
+{
+ int rc = 0;
+ elf_vrreg_t *regs = (elf_vrreg_t *)vrregs;
+#ifdef CONFIG_ALTIVEC
+ rc = dump_task_altivec(tsk, regs);
+ if (rc)
+ return rc;
+ regs += ELF_NVRREG;
+#endif
+
+#ifdef CONFIG_VSX
+ rc = dump_task_vsx(tsk, regs);
+#endif
+ return rc;
+}
+
#ifdef CONFIG_SPE
void enable_kernel_spe(void)
@@ -233,6 +320,10 @@ void discard_lazy_cpu_state(void)
if (last_task_used_altivec == current)
last_task_used_altivec = NULL;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (last_task_used_vsx == current)
+ last_task_used_vsx = NULL;
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
if (last_task_used_spe == current)
last_task_used_spe = NULL;
@@ -297,6 +388,10 @@ struct task_struct *__switch_to(struct t
if (prev->thread.regs && (prev->thread.regs->msr & MSR_VEC))
giveup_altivec(prev);
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (prev->thread.regs && (prev->thread.regs->msr & MSR_VSX))
+ giveup_vsx(prev);
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
/*
* If the previous thread used spe in the last quantum
@@ -317,6 +412,10 @@ struct task_struct *__switch_to(struct t
if (new->thread.regs && last_task_used_altivec == new)
new->thread.regs->msr |= MSR_VEC;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (new->thread.regs && last_task_used_vsx == new)
+ new->thread.regs->msr |= MSR_VSX;
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
/* Avoid the trap. On smp this never happens since
* we don't set last_task_used_spe
@@ -417,6 +516,8 @@ static struct regbit {
{MSR_EE, "EE"},
{MSR_PR, "PR"},
{MSR_FP, "FP"},
+ {MSR_VEC, "VEC"},
+ {MSR_VSX, "VSX"},
{MSR_ME, "ME"},
{MSR_IR, "IR"},
{MSR_DR, "DR"},
@@ -534,6 +635,7 @@ void prepare_to_copy(struct task_struct
{
flush_fp_to_thread(current);
flush_altivec_to_thread(current);
+ flush_vsx_to_thread(current);
flush_spe_to_thread(current);
}
@@ -689,6 +791,9 @@ void start_thread(struct pt_regs *regs,
#endif
discard_lazy_cpu_state();
+#ifdef CONFIG_VSX
+ current->thread.used_vsr = 0;
+#endif
memset(current->thread.fpr, 0,
sizeof(current->thread.fpr));
current->thread.fpscr.val = 0;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -351,6 +351,51 @@ static int vr_set(struct task_struct *ta
}
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+/*
+ * Currently, to get and set all the VSX state, you need to call
+ * the FP and VMX calls as well. This only gets/sets the lower 32
+ * of the 64 128-bit VSX registers.
+ */
+
+static int vsr_active(struct task_struct *target,
+ const struct user_regset *regset)
+{
+ flush_vsx_to_thread(target);
+ return target->thread.used_vsr ? regset->n : 0;
+}
+
+static int vsr_get(struct task_struct *target, const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ void *kbuf, void __user *ubuf)
+{
+ int ret;
+
+ flush_vsx_to_thread(target);
+
+ ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+ target->thread.fpr, 0,
+ 32 * sizeof(vector128));
+
+ return ret;
+}
+
+static int vsr_set(struct task_struct *target, const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ const void *kbuf, const void __user *ubuf)
+{
+ int ret;
+
+ flush_vsx_to_thread(target);
+
+ ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+ target->thread.fpr, 0,
+ 32 * sizeof(vector128));
+
+ return ret;
+}
+#endif /* CONFIG_VSX */
+
#ifdef CONFIG_SPE
/*
@@ -427,6 +472,9 @@ enum powerpc_regset {
#ifdef CONFIG_ALTIVEC
REGSET_VMX,
#endif
+#ifdef CONFIG_VSX
+ REGSET_VSX,
+#endif
#ifdef CONFIG_SPE
REGSET_SPE,
#endif
@@ -450,6 +498,13 @@ static const struct user_regset native_r
.active = vr_active, .get = vr_get, .set = vr_set
},
#endif
+#ifdef CONFIG_VSX
+ [REGSET_VSX] = {
+ .n = 32,
+ .size = sizeof(vector128), .align = sizeof(vector128),
+ .active = vsr_active, .get = vsr_get, .set = vsr_set
+ },
+#endif
#ifdef CONFIG_SPE
[REGSET_SPE] = {
.n = 35,
@@ -850,6 +905,21 @@ long arch_ptrace(struct task_struct *chi
sizeof(u32)),
(const void __user *) data);
#endif
+#ifdef CONFIG_VSX
+ case PTRACE_GETVSRREGS:
+ return copy_regset_to_user(child, &user_ppc_native_view,
+ REGSET_VSX,
+ 0, (32 * sizeof(vector128) +
+ sizeof(u32)),
+ (void __user *) data);
+
+ case PTRACE_SETVSRREGS:
+ return copy_regset_from_user(child, &user_ppc_native_view,
+ REGSET_VSX,
+ 0, (32 * sizeof(vector128) +
+ sizeof(u32)),
+ (const void __user *) data);
+#endif
#ifdef CONFIG_SPE
case PTRACE_GETEVRREGS:
/* Get the child spe register state. */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -378,6 +378,21 @@ static int save_user_regs(struct pt_regs
memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
return 1;
+ /*
+ * Copy VSR 0-31 upper half from thread_struct to local
+ * buffer, then write that to userspace. Also set MSR_VSX in
+ * the saved MSR value to indicate that frame->mc_vregs
+ * contains valid data
+ */
+ if (current->thread.used_vsr) {
+ flush_vsx_to_thread(current);
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.fpr[i][TS_VSRLOWOFFSET];
+ if (__copy_to_user(&frame->mc_vsregs, buf,
+ ELF_NVSRHALFREG * sizeof(double)))
+ return 1;
+ msr |= MSR_VSX;
+ }
#else
/* save floating-point registers */
if (__copy_to_user(&frame->mc_fregs, current->thread.fpr,
@@ -482,6 +497,24 @@ static long restore_user_regs(struct pt_
for (i = 0; i < 32 ; i++)
current->thread.TS_FPR(i) = buf[i];
memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+ /*
+ * Force the process to reload the VSX registers from
+ * current->thread when it next executes a VSX instruction.
+ */
+ regs->msr &= ~MSR_VSX;
+ if (msr & MSR_VSX) {
+ /*
+ * Restore VSX registers from the stack to a local
+ * buffer, then write this out to the thread_struct
+ */
+ if (__copy_from_user(buf, &sr->mc_vsregs,
+ sizeof(sr->mc_vsregs)))
+ return 1;
+ for (i = 0; i < 32 ; i++)
+ current->thread.fpr[i][TS_VSRLOWOFFSET] = buf[i];
+ } else if (current->thread.used_vsr)
+ for (i = 0; i < 32 ; i++)
+ current->thread.fpr[i][TS_VSRLOWOFFSET] = 0;
#else
if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
sizeof(sr->mc_fregs)))
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -123,6 +123,22 @@ static long setup_sigcontext(struct sigc
buf[i] = current->thread.TS_FPR(i);
memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+ /*
+ * Copy VSX low doubleword to local buffer for formatting,
+ * then out to userspace. Update v_regs to point after the
+ * VMX data.
+ */
+ if (current->thread.used_vsr) {
+ flush_vsx_to_thread(current);
+ v_regs += ELF_NVRREG;
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.fpr[i][TS_VSRLOWOFFSET];
+ err |= __copy_to_user(v_regs, buf, 32 * sizeof(double));
+ /* set MSR_VSX in the MSR value in the frame to
+ * indicate that sc->vs_regs contains valid data.
+ */
+ msr |= MSR_VSX;
+ }
#else /* CONFIG_VSX */
/* copy fpr regs and fpscr */
err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
@@ -197,7 +213,7 @@ static long restore_sigcontext(struct pt
* This has to be done before copying stuff into current->thread.fpr/vr
* for the reasons explained in the previous comment.
*/
- regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
+ regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC | MSR_VSX);
#ifdef CONFIG_ALTIVEC
err |= __get_user(v_regs, &sc->v_regs);
@@ -226,6 +242,19 @@ static long restore_sigcontext(struct pt
current->thread.TS_FPR(i) = buf[i];
memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+ /*
+ * Get additional VSX data. Update v_regs to point after the
+ * VMX data. Copy VSX low doubleword from userspace to local
+ * buffer for formatting, then into the taskstruct.
+ */
+ v_regs += ELF_NVRREG;
+ if ((msr & MSR_VSX) != 0)
+ err |= __copy_from_user(buf, v_regs, 32 * sizeof(double));
+ else
+ memset(buf, 0, 32 * sizeof(double));
+
+ for (i = 0; i < 32 ; i++)
+ current->thread.fpr[i][TS_VSRLOWOFFSET] = buf[i];
#else
err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
#endif
Index: linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/traps.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
@@ -967,6 +967,20 @@ void altivec_unavailable_exception(struc
die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT);
}
+void vsx_unavailable_exception(struct pt_regs *regs)
+{
+ if (user_mode(regs)) {
+ /* A user program has executed a vsx instruction,
+ but this kernel doesn't support vsx. */
+ _exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+ return;
+ }
+
+ printk(KERN_EMERG "Unrecoverable VSX Unavailable Exception "
+ "%lx at %lx\n", regs->trap, regs->nip);
+ die("Unrecoverable VSX Unavailable Exception", regs, SIGABRT);
+}
+
void performance_monitor_exception(struct pt_regs *regs)
{
perf_irq(regs);
@@ -1091,6 +1105,21 @@ void altivec_assist_exception(struct pt_
}
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+void vsx_assist_exception(struct pt_regs *regs)
+{
+ if (!user_mode(regs)) {
+ printk(KERN_EMERG "VSX assist exception in kernel mode"
+ " at %lx\n", regs->nip);
+ die("Kernel VSX assist exception", regs, SIGILL);
+ }
+
+ flush_vsx_to_thread(current);
+ printk(KERN_INFO "VSX assist not supported at %lx\n", regs->nip);
+ _exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+}
+#endif /* CONFIG_VSX */
+
#ifdef CONFIG_FSL_BOOKE
void CacheLockingException(struct pt_regs *regs, unsigned long address,
unsigned long error_code)
Index: linux-2.6-ozlabs/include/asm-powerpc/elf.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/elf.h
+++ linux-2.6-ozlabs/include/asm-powerpc/elf.h
@@ -109,6 +109,7 @@ typedef elf_gregset_t32 compat_elf_gregs
#ifdef __powerpc64__
# define ELF_NVRREG32 33 /* includes vscr & vrsave stuffed together */
# define ELF_NVRREG 34 /* includes vscr & vrsave in split vectors */
+# define ELF_NVSRHALFREG 32 /* Half the vsx registers */
# define ELF_GREG_TYPE elf_greg_t64
#else
# define ELF_NEVRREG 34 /* includes acc (as 2) */
@@ -158,6 +159,7 @@ typedef __vector128 elf_vrreg_t;
typedef elf_vrreg_t elf_vrregset_t[ELF_NVRREG];
#ifdef __powerpc64__
typedef elf_vrreg_t elf_vrregset_t32[ELF_NVRREG32];
+typedef elf_fpreg_t elf_vsrreghalf_t32[ELF_NVSRHALFREG];
#endif
#ifdef __KERNEL__
@@ -219,8 +221,8 @@ extern int dump_task_fpu(struct task_str
typedef elf_vrregset_t elf_fpxregset_t;
#ifdef CONFIG_ALTIVEC
-extern int dump_task_altivec(struct task_struct *, elf_vrregset_t *vrregs);
-#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_altivec(tsk, regs)
+extern int dump_task_vector(struct task_struct *, elf_vrregset_t *vrregs);
+#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_vector(tsk, regs)
#define ELF_CORE_XFPREG_TYPE NT_PPC_VMX
#endif
Index: linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ptrace.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
@@ -223,6 +223,14 @@ extern void user_disable_single_step(str
#define PT_VRSAVE_32 (PT_VR0 + 33*4)
#endif
+/*
+ * Only store the first 32 VSRs here; the second 32 VSRs are in VR0-31.
+ */
+#define PT_VSR0 150 /* each VSR reg occupies 2 slots in 64-bit */
+#define PT_VSR31 (PT_VSR0 + 2*31)
+#ifdef __KERNEL__
+#define PT_VSR0_32 300 /* each VSR reg occupies 4 slots in 32-bit */
+#endif
#endif /* __powerpc64__ */
/*
@@ -245,6 +253,10 @@ extern void user_disable_single_step(str
#define PTRACE_GETEVRREGS 20
#define PTRACE_SETEVRREGS 21
+/* Get the first 32 128bit VSX registers */
+#define PTRACE_GETVSRREGS 27
+#define PTRACE_SETVSRREGS 28
+
/*
* Get or set a debug register. The first 16 are DABR registers and the
* second 16 are IABR registers.
Index: linux-2.6-ozlabs/include/asm-powerpc/reg.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/reg.h
+++ linux-2.6-ozlabs/include/asm-powerpc/reg.h
@@ -30,6 +30,7 @@
#define MSR_ISF_LG 61 /* Interrupt 64b mode valid on 630 */
#define MSR_HV_LG 60 /* Hypervisor state */
#define MSR_VEC_LG 25 /* Enable AltiVec */
+#define MSR_VSX_LG 23 /* Enable VSX */
#define MSR_POW_LG 18 /* Enable Power Management */
#define MSR_WE_LG 18 /* Wait State Enable */
#define MSR_TGPR_LG 17 /* TLB Update registers in use */
@@ -71,6 +72,7 @@
#endif
#define MSR_VEC __MASK(MSR_VEC_LG) /* Enable AltiVec */
+#define MSR_VSX __MASK(MSR_VSX_LG) /* Enable VSX */
#define MSR_POW __MASK(MSR_POW_LG) /* Enable Power Management */
#define MSR_WE __MASK(MSR_WE_LG) /* Wait State Enable */
#define MSR_TGPR __MASK(MSR_TGPR_LG) /* TLB Update registers in use */
Index: linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/sigcontext.h
+++ linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
@@ -43,9 +43,44 @@ struct sigcontext {
* it must be copied via a vector register to/from storage) or as a word.
* The entry with index 33 contains the vrsave as the first word (offset 0)
* within the quadword.
+ *
+ * Part of the VSX data is stored here also by extending vmx_reserve
+ * by an additional 32 double words. Architecturally the layout of
+ * the VSR registers and how they overlap on top of the legacy FPR and
+ * VR registers is shown below:
+ *
+ * VSR doubleword 0 VSR doubleword 1
+ * ----------------------------------------------------------------
+ * VSR[0] | FPR[0] | |
+ * ----------------------------------------------------------------
+ * VSR[1] | FPR[1] | |
+ * ----------------------------------------------------------------
+ * | ... | |
+ * | ... | |
+ * ----------------------------------------------------------------
+ * VSR[30] | FPR[30] | |
+ * ----------------------------------------------------------------
+ * VSR[31] | FPR[31] | |
+ * ----------------------------------------------------------------
+ * VSR[32] | VR[0] |
+ * ----------------------------------------------------------------
+ * VSR[33] | VR[1] |
+ * ----------------------------------------------------------------
+ * | ... |
+ * | ... |
+ * ----------------------------------------------------------------
+ * VSR[62] | VR[30] |
+ * ----------------------------------------------------------------
+ * VSR[63] | VR[31] |
+ * ----------------------------------------------------------------
+ *
+ * FPR/VSR 0-31 doubleword 0 is stored in fp_regs, and VMX/VSR 32-63
+ * is stored at the start of vmx_reserve. vmx_reserve is extended for
+ * backwards compatibility to store VSR 0-31 doubleword 1 after the VMX
+ * registers and vscr/vrsave.
*/
elf_vrreg_t __user *v_regs;
- long vmx_reserve[ELF_NVRREG+ELF_NVRREG+1];
+ long vmx_reserve[ELF_NVRREG+ELF_NVRREG+32+1];
#endif
};
Index: linux-2.6-ozlabs/include/asm-powerpc/system.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/system.h
+++ linux-2.6-ozlabs/include/asm-powerpc/system.h
@@ -132,6 +132,7 @@ extern void enable_kernel_altivec(void);
extern void giveup_altivec(struct task_struct *);
extern void load_up_altivec(struct task_struct *);
extern int emulate_altivec(struct pt_regs *);
+extern void giveup_vsx(struct task_struct *);
extern void enable_kernel_spe(void);
extern void giveup_spe(struct task_struct *);
extern void load_up_spe(struct task_struct *);
@@ -155,6 +156,14 @@ static inline void flush_altivec_to_thre
}
#endif
+#ifdef CONFIG_VSX
+extern void flush_vsx_to_thread(struct task_struct *);
+#else
+static inline void flush_vsx_to_thread(struct task_struct *t)
+{
+}
+#endif
+
#ifdef CONFIG_SPE
extern void flush_spe_to_thread(struct task_struct *);
#else
^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
2008-06-24 10:57 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
@ 2008-06-24 13:47 ` Kumar Gala
0 siblings, 0 replies; 106+ messages in thread
From: Kumar Gala @ 2008-06-24 13:47 UTC (permalink / raw)
To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras
On Jun 24, 2008, at 5:57 AM, Michael Neuling wrote:
> If we set the SPE MSR bit in save_user_regs we can blow away the VEC
> bit. This doesn't matter in reality as they are in fact the same bit
> but looks bad.
>
> Also, when we add VSX in a later patch, we need to be able to set two
> separate MSR bits here.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
- k
* Re: [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable
2008-06-24 10:57 ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
@ 2008-06-24 14:01 ` Kumar Gala
0 siblings, 0 replies; 106+ messages in thread
From: Kumar Gala @ 2008-06-24 14:01 UTC (permalink / raw)
To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras
On Jun 24, 2008, at 5:57 AM, Michael Neuling wrote:
> Make load_up_fpu and load_up_altivec callable so they can be reused by
> the VSX code.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
- k
* Re: [PATCH 7/9] powerpc: Add VSX assembler code macros
2008-06-24 10:57 ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
@ 2008-06-24 14:06 ` Kumar Gala
2008-06-25 0:06 ` Michael Neuling
0 siblings, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-24 14:06 UTC (permalink / raw)
To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras
On Jun 24, 2008, at 5:57 AM, Michael Neuling wrote:
> This adds the macros for the VSX load/store instructions as most
> binutils are not going to support this for a while.
>
> Also add VSX register save/restore macros and vsr[0-63] register
> definitions.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
>
> include/asm-powerpc/ppc_asm.h | 127 ++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 127 insertions(+)
>
> Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
> ===================================================================
> --- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
> +++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
> @@ -74,6 +74,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
> REST_10GPRS(22, base)
> #endif
>
> +/*
> + * Define what the VSX XX1 form instructions will look like, then add
> + * the 128 bit load store instructions based on that.
> + */
> +#define VSX_XX1(xs, ra, rb) (((xs) & 0x1f) << 21 | ((ra) << 16) | \
> + ((rb) << 11) | (((xs) >> 5)))
> +
> +#define STXVD2X(xs, ra, rb) .long (0x7c000798 | VSX_XX1((xs), (ra), (rb)))
> +#define LXVD2X(xs, ra, rb) .long (0x7c000698 | VSX_XX1((xs), (ra), (rb)))
>
> #define SAVE_2GPRS(n, base) SAVE_GPR(n, base); SAVE_GPR(n+1, base)
> #define SAVE_4GPRS(n, base) SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
> @@ -110,6 +119,57 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
> #define REST_16VRS(n,b,base) REST_8VRS(n,b,base); REST_8VRS(n+8,b,base)
> #define REST_32VRS(n,b,base) REST_16VRS(n,b,base); REST_16VRS(n+16,b,base)
>
> +/* Save the lower 32 VSRs in the thread VSR region */
> +#define SAVE_VSR(n,b,base) li b,THREAD_VSR0+(16*(n)); STXVD2X(n,b,base)
> +#define SAVE_2VSRS(n,b,base) SAVE_VSR(n,b,base); SAVE_VSR(n+1,b,base)
> +#define SAVE_4VSRS(n,b,base) SAVE_2VSRS(n,b,base); SAVE_2VSRS(n+2,b,base)
> +#define SAVE_8VSRS(n,b,base) SAVE_4VSRS(n,b,base); SAVE_4VSRS(n+4,b,base)
> +#define SAVE_16VSRS(n,b,base) SAVE_8VSRS(n,b,base); SAVE_8VSRS(n+8,b,base)
> +#define SAVE_32VSRS(n,b,base) SAVE_16VSRS(n,b,base); SAVE_16VSRS(n+16,b,base)
> +#define REST_VSR(n,b,base) li b,THREAD_VSR0+(16*(n)); LXVD2X(n,b,base)
> +#define REST_2VSRS(n,b,base) REST_VSR(n,b,base); REST_VSR(n+1,b,base)
> +#define REST_4VSRS(n,b,base) REST_2VSRS(n,b,base); REST_2VSRS(n+2,b,base)
> +#define REST_8VSRS(n,b,base) REST_4VSRS(n,b,base); REST_4VSRS(n+4,b,base)
> +#define REST_16VSRS(n,b,base) REST_8VSRS(n,b,base); REST_8VSRS(n+8,b,base)
> +#define REST_32VSRS(n,b,base) REST_16VSRS(n,b,base); REST_16VSRS(n+16,b,base)
> +/* Save the upper 32 VSRs (32-63) in the thread VSX region (0-31) */
> +#define SAVE_VSRU(n,b,base) li b,THREAD_VR0+(16*(n)); STXVD2X(n+32,b,base)
> +#define SAVE_2VSRSU(n,b,base) SAVE_VSRU(n,b,base); SAVE_VSRU(n+1,b,base)
> +#define SAVE_4VSRSU(n,b,base) SAVE_2VSRSU(n,b,base); SAVE_2VSRSU(n+2,b,base)
> +#define SAVE_8VSRSU(n,b,base) SAVE_4VSRSU(n,b,base); SAVE_4VSRSU(n+4,b,base)
> +#define SAVE_16VSRSU(n,b,base) SAVE_8VSRSU(n,b,base); SAVE_8VSRSU(n+8,b,base)
> +#define SAVE_32VSRSU(n,b,base) SAVE_16VSRSU(n,b,base); SAVE_16VSRSU(n+16,b,base)
> +#define REST_VSRU(n,b,base) li b,THREAD_VR0+(16*(n)); LXVD2X(n+32,b,base)
> +#define REST_2VSRSU(n,b,base) REST_VSRU(n,b,base); REST_VSRU(n+1,b,base)
> +#define REST_4VSRSU(n,b,base) REST_2VSRSU(n,b,base); REST_2VSRSU(n+2,b,base)
> +#define REST_8VSRSU(n,b,base) REST_4VSRSU(n,b,base); REST_4VSRSU(n+4,b,base)
> +#define REST_16VSRSU(n,b,base) REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
> +#define REST_32VSRSU(n,b,base) REST_16VSRSU(n,b,base); REST_16VSRSU(n+16,b,base)
> +
> +#ifdef CONFIG_VSX
I think we should do this in fpu.S so it's clear what's going on when
reading the code.
>
> +#define REST_32FPVSRS(n,c,base) \
> +BEGIN_FTR_SECTION \
> + b 2f; \
> +END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
> + REST_32FPRS(n,base); \
> + b 3f; \
> +2: REST_32VSRS(n,c,base); \
> +3:
> +
> +#define SAVE_32FPVSRS(n,c,base) \
> +BEGIN_FTR_SECTION \
> + b 2f; \
> +END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
> + SAVE_32FPRS(n,base); \
> + b 3f; \
> +2: SAVE_32VSRS(n,c,base); \
> +3:
> +
> +#else
> +#define REST_32FPVSRS(n,b,base) REST_32FPRS(n, base)
> +#define SAVE_32FPVSRS(n,b,base) SAVE_32FPRS(n, base)
> +#endif
* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
2008-06-24 10:57 ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
@ 2008-06-24 14:07 ` Kumar Gala
2008-06-24 16:33 ` Segher Boessenkool
2008-06-25 0:25 ` Michael Neuling
0 siblings, 2 replies; 106+ messages in thread
From: Kumar Gala @ 2008-06-24 14:07 UTC (permalink / raw)
To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras
On Jun 24, 2008, at 5:57 AM, Michael Neuling wrote:
> We are going to change where the floating point registers are stored
> in the thread_struct, so in preparation add some macros to access the
> floating point registers. Update all code to use these new macros.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> ---
>
> arch/powerpc/kernel/align.c | 6 ++--
> arch/powerpc/kernel/process.c | 5 ++-
> arch/powerpc/kernel/ptrace.c | 14 +++++----
> arch/powerpc/kernel/ptrace32.c | 14 +++++++--
> arch/powerpc/kernel/softemu8xx.c | 4 +-
> arch/powerpc/math-emu/math.c | 56 +++++++++++++++++++--------------------
> include/asm-powerpc/ppc_asm.h | 5 ++-
> include/asm-powerpc/processor.h | 3 ++
> 8 files changed, 61 insertions(+), 46 deletions(-)
>
> Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
> ===================================================================
> --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
> +++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
> @@ -218,10 +218,10 @@ static int fpr_get(struct task_struct *t
> flush_fp_to_thread(target);
>
> BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
> - offsetof(struct thread_struct, fpr[32]));
> + offsetof(struct thread_struct, TS_FPR(32)));
>
> return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> - &target->thread.fpr, 0, -1);
> + target->thread.fpr, 0, -1);
is there a reason we can drop the '&'? (I'm only looking at this as a
textual diff, not at what the code is trying to do).
>
> }
>
> static int fpr_set(struct task_struct *target, const struct
> user_regset *regset,
> @@ -231,10 +231,10 @@ static int fpr_set(struct task_struct *t
> flush_fp_to_thread(target);
>
> BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
> - offsetof(struct thread_struct, fpr[32]));
> + offsetof(struct thread_struct, TS_FPR(32)));
>
> return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
> - &target->thread.fpr, 0, -1);
> + target->thread.fpr, 0, -1);
ditto.
>
> }
>
>
> @@ -728,7 +728,8 @@ long arch_ptrace(struct task_struct *chi
> tmp = ptrace_get_reg(child, (int) index);
> } else {
> flush_fp_to_thread(child);
> - tmp = ((unsigned long *)child->thread.fpr)[index - PT_FPR0];
> + tmp = ((unsigned long *)child->thread.fpr)
> + [TS_FPRSPACING * (index - PT_FPR0)];
> }
> ret = put_user(tmp,(unsigned long __user *) data);
> break;
> @@ -755,7 +756,8 @@ long arch_ptrace(struct task_struct *chi
> ret = ptrace_put_reg(child, index, data);
> } else {
> flush_fp_to_thread(child);
> - ((unsigned long *)child->thread.fpr)[index - PT_FPR0] = data;
> + ((unsigned long *)child->thread.fpr)
> + [TS_FPRSPACING * (index - PT_FPR0)] = data;
> ret = 0;
> }
> break;
> Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
> ===================================================================
> --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace32.c
> +++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
> @@ -64,6 +64,11 @@ static long compat_ptrace_old(struct tas
> return -EPERM;
> }
>
> > +/* Macros to work out the correct index for the FPR in the thread struct */
> +#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
> +#define FPRHALF(i) (((i) - PT_FPR0) % 2)
> +#define FPRINDEX(i) TS_FPRSPACING * FPRNUMBER(i) + FPRHALF(i)
we should either use these macros in both ptrace.c and ptrace32.c or
drop them
>
> +
> long compat_arch_ptrace(struct task_struct *child, compat_long_t
> request,
> compat_ulong_t caddr, compat_ulong_t cdata)
> {
> @@ -122,7 +127,8 @@ long compat_arch_ptrace(struct task_stru
> * to be an array of unsigned int (32 bits) - the
> * index passed in is based on this assumption.
> */
> - tmp = ((unsigned int *)child->thread.fpr)[index - PT_FPR0];
> + tmp = ((unsigned int *)child->thread.fpr)
> + [FPRINDEX(index)];
> }
> ret = put_user((unsigned int)tmp, (u32 __user *)data);
> break;
> @@ -162,7 +168,8 @@ long compat_arch_ptrace(struct task_stru
> CHECK_FULL_REGS(child->thread.regs);
> if (numReg >= PT_FPR0) {
> flush_fp_to_thread(child);
> - tmp = ((unsigned long int *)child->thread.fpr)[numReg - PT_FPR0];
> + tmp = ((unsigned long int *)child->thread.fpr)
> + [FPRINDEX(numReg)];
> } else { /* register within PT_REGS struct */
> tmp = ptrace_get_reg(child, numReg);
> }
> @@ -217,7 +224,8 @@ long compat_arch_ptrace(struct task_stru
> * to be an array of unsigned int (32 bits) - the
> * index passed in is based on this assumption.
> */
> - ((unsigned int *)child->thread.fpr)[index - PT_FPR0] = data;
> + ((unsigned int *)child->thread.fpr)
> + [TS_FPRSPACING * (index - PT_FPR0)] = data;
is there a reason this isn't FPRINDEX(index)?
- k
* Re: [PATCH 6/9] powerpc: Add VSX CPU feature
2008-06-24 10:57 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
@ 2008-06-24 14:19 ` Kumar Gala
0 siblings, 0 replies; 106+ messages in thread
From: Kumar Gala @ 2008-06-24 14:19 UTC (permalink / raw)
To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras
On Jun 24, 2008, at 5:57 AM, Michael Neuling wrote:
> Add a VSX CPU feature. Also add code to detect if VSX is available
> from the device tree.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
- k
* Re: [PATCH 9/9] powerpc: Add CONFIG_VSX config option
2008-06-24 10:57 ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
@ 2008-06-24 14:19 ` Kumar Gala
0 siblings, 0 replies; 106+ messages in thread
From: Kumar Gala @ 2008-06-24 14:19 UTC (permalink / raw)
To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras
On Jun 24, 2008, at 5:57 AM, Michael Neuling wrote:
> Add CONFIG_VSX config build option. Must compile with POWER4, FPU
> and ALTIVEC.
>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
- k
* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
2008-06-24 14:07 ` Kumar Gala
@ 2008-06-24 16:33 ` Segher Boessenkool
2008-06-25 0:25 ` Michael Neuling
1 sibling, 0 replies; 106+ messages in thread
From: Segher Boessenkool @ 2008-06-24 16:33 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-dev, Michael Neuling, Paul Mackerras
>> return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
>> - &target->thread.fpr, 0, -1);
>> + target->thread.fpr, 0, -1);
>
> is there a reason we can drop the '&'?
Yes, .fpr is an array. C is _such_ a fun language, heh.
Segher
* Re: [PATCH 7/9] powerpc: Add VSX assembler code macros
2008-06-24 14:06 ` Kumar Gala
@ 2008-06-25 0:06 ` Michael Neuling
2008-06-25 2:19 ` Kumar Gala
0 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-25 0:06 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-dev, Paul Mackerras
In message <B2FEFA8A-8814-44BE-81E5-8E2A873C2A1F@kernel.crashing.org> you wrote:
>
> On Jun 24, 2008, at 5:57 AM, Michael Neuling wrote:
>
> > This adds the macros for the VSX load/store instruction as most
> > binutils are not going to support this for a while.
> >
> > Also add VSX register save/restore macros and vsr[0-63] register
> > definitions.
> >
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > ---
> >
> > include/asm-powerpc/ppc_asm.h | 127 ++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 127 insertions(+)
> >
> > Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
> > ===================================================================
> > --- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
> > +++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
> > @@ -74,6 +74,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
> > REST_10GPRS(22, base)
> > #endif
> >
> > +/*
> > + * Define what the VSX XX1 form instructions will look like, then add
> > + * the 128 bit load store instructions based on that.
> > + */
> > +#define VSX_XX1(xs, ra, rb) (((xs) & 0x1f) << 21 | ((ra) << 16) | \
> > + ((rb) << 11) | (((xs) >> 5)))
> > +
> > +#define STXVD2X(xs, ra, rb) .long (0x7c000798 | VSX_XX1((xs), (ra), (rb)))
> > +#define LXVD2X(xs, ra, rb) .long (0x7c000698 | VSX_XX1((xs), (ra), (rb)))
> >
> > #define SAVE_2GPRS(n, base) SAVE_GPR(n, base); SAVE_GPR(n+1, base)
> > #define SAVE_4GPRS(n, base) SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
> > @@ -110,6 +119,57 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
> > #define REST_16VRS(n,b,base) REST_8VRS(n,b,base); REST_8VRS(n+8,b,base)
> > #define REST_32VRS(n,b,base) REST_16VRS(n,b,base); REST_16VRS(n+16,b,base)
> >
> > +/* Save the lower 32 VSRs in the thread VSR region */
> > +#define SAVE_VSR(n,b,base) li b,THREAD_VSR0+(16*(n)); STXVD2X(n,b,base)
> > +#define SAVE_2VSRS(n,b,base) SAVE_VSR(n,b,base); SAVE_VSR(n+1,b,base)
> > +#define SAVE_4VSRS(n,b,base) SAVE_2VSRS(n,b,base); SAVE_2VSRS(n+2,b,base)
> > +#define SAVE_8VSRS(n,b,base) SAVE_4VSRS(n,b,base); SAVE_4VSRS(n+4,b,base)
> > +#define SAVE_16VSRS(n,b,base) SAVE_8VSRS(n,b,base); SAVE_8VSRS(n+8,b,base)
> > +#define SAVE_32VSRS(n,b,base) SAVE_16VSRS(n,b,base); SAVE_16VSRS(n+16,b,base)
> > +#define REST_VSR(n,b,base) li b,THREAD_VSR0+(16*(n)); LXVD2X(n,b,base)
> > +#define REST_2VSRS(n,b,base) REST_VSR(n,b,base); REST_VSR(n+1,b,base)
> > +#define REST_4VSRS(n,b,base) REST_2VSRS(n,b,base); REST_2VSRS(n+2,b,base)
> > +#define REST_8VSRS(n,b,base) REST_4VSRS(n,b,base); REST_4VSRS(n+4,b,base)
> > +#define REST_16VSRS(n,b,base) REST_8VSRS(n,b,base); REST_8VSRS(n+8,b,base)
> > +#define REST_32VSRS(n,b,base) REST_16VSRS(n,b,base); REST_16VSRS(n+16,b,base)
> > +/* Save the upper 32 VSRs (32-63) in the thread VSX region (0-31) */
> > +#define SAVE_VSRU(n,b,base) li b,THREAD_VR0+(16*(n)); STXVD2X(n+32,b,base)
> > +#define SAVE_2VSRSU(n,b,base) SAVE_VSRU(n,b,base); SAVE_VSRU(n+1,b,base)
> > +#define SAVE_4VSRSU(n,b,base) SAVE_2VSRSU(n,b,base); SAVE_2VSRSU(n+2,b,base)
> > +#define SAVE_8VSRSU(n,b,base) SAVE_4VSRSU(n,b,base); SAVE_4VSRSU(n+4,b,base)
> > +#define SAVE_16VSRSU(n,b,base) SAVE_8VSRSU(n,b,base); SAVE_8VSRSU(n+8,b,base)
> > +#define SAVE_32VSRSU(n,b,base) SAVE_16VSRSU(n,b,base); SAVE_16VSRSU(n+16,b,base)
> > +#define REST_VSRU(n,b,base) li b,THREAD_VR0+(16*(n)); LXVD2X(n+32,b,base)
> > +#define REST_2VSRSU(n,b,base) REST_VSRU(n,b,base); REST_VSRU(n+1,b,base)
> > +#define REST_4VSRSU(n,b,base) REST_2VSRSU(n,b,base); REST_2VSRSU(n+2,b,base)
> > +#define REST_8VSRSU(n,b,base) REST_4VSRSU(n,b,base); REST_4VSRSU(n+4,b,base)
> > +#define REST_16VSRSU(n,b,base) REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
> > +#define REST_32VSRSU(n,b,base) REST_16VSRSU(n,b,base); REST_16VSRSU(n+16,b,base)
> > +
> > +#ifdef CONFIG_VSX
>
> I think we should do this in fpu.S so it's clear what's going on when
> reading the code.
Do you mean the section above or below this comment?
>
> >
> > +#define REST_32FPVSRS(n,c,base) \
> > +BEGIN_FTR_SECTION \
> > + b 2f; \
> > +END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
> > + REST_32FPRS(n,base); \
> > + b 3f; \
> > +2: REST_32VSRS(n,c,base); \
> > +3:
> > +
> > +#define SAVE_32FPVSRS(n,c,base) \
> > +BEGIN_FTR_SECTION \
> > + b 2f; \
> > +END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
> > + SAVE_32FPRS(n,base); \
> > + b 3f; \
> > +2: SAVE_32VSRS(n,c,base); \
> > +3:
> > +
> > +#else
> > +#define REST_32FPVSRS(n,b,base) REST_32FPRS(n, base)
> > +#define SAVE_32FPVSRS(n,b,base) SAVE_32FPRS(n, base)
> > +#endif
>
>
* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
2008-06-24 14:07 ` Kumar Gala
2008-06-24 16:33 ` Segher Boessenkool
@ 2008-06-25 0:25 ` Michael Neuling
1 sibling, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-25 0:25 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-dev, Paul Mackerras
In message <0DCAAAC2-52AB-4704-98C0-4E9235C3AC88@kernel.crashing.org> you wrote:
>
> On Jun 24, 2008, at 5:57 AM, Michael Neuling wrote:
>
> > We are going to change where the floating point registers are stored
> > in the thread_struct, so in preparation add some macros to access the
> > floating point registers. Update all code to use these new macros.
> >
> > Signed-off-by: Michael Neuling <mikey@neuling.org>
> > ---
> >
> > arch/powerpc/kernel/align.c | 6 ++--
> > arch/powerpc/kernel/process.c | 5 ++-
> > arch/powerpc/kernel/ptrace.c | 14 +++++----
> > arch/powerpc/kernel/ptrace32.c | 14 +++++++--
> > arch/powerpc/kernel/softemu8xx.c | 4 +-
> > arch/powerpc/math-emu/math.c | 56 +++++++++++++++++++--------------------
> > include/asm-powerpc/ppc_asm.h | 5 ++-
> > include/asm-powerpc/processor.h | 3 ++
> > 8 files changed, 61 insertions(+), 46 deletions(-)
> >
>
> > Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
> > ===================================================================
> > --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
> > +++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
> > @@ -218,10 +218,10 @@ static int fpr_get(struct task_struct *t
> > flush_fp_to_thread(target);
> >
> > BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
> > - offsetof(struct thread_struct, fpr[32]));
> > + offsetof(struct thread_struct, TS_FPR(32)));
> >
> > return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
> > - &target->thread.fpr, 0, -1);
> > + target->thread.fpr, 0, -1);
>
> is there a reason we can drop the '&'? (I'm only looking at this as a
> textual diff, not at what the code is trying to do).
Oops.. I'll fix.
> >
> > }
> >
> > static int fpr_set(struct task_struct *target, const struct
> > user_regset *regset,
> > @@ -231,10 +231,10 @@ static int fpr_set(struct task_struct *t
> > flush_fp_to_thread(target);
> >
> > BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
> > - offsetof(struct thread_struct, fpr[32]));
> > + offsetof(struct thread_struct, TS_FPR(32)));
> >
> > return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
> > - &target->thread.fpr, 0, -1);
> > + target->thread.fpr, 0, -1);
>
> ditto.
> >
> > }
> >
> >
> > @@ -728,7 +728,8 @@ long arch_ptrace(struct task_struct *chi
> > tmp = ptrace_get_reg(child, (int) index);
> > } else {
> > flush_fp_to_thread(child);
> > - tmp = ((unsigned long *)child->thread.fpr)[index - PT_FPR0];
> > + tmp = ((unsigned long *)child->thread.fpr)
> > + [TS_FPRSPACING * (index - PT_FPR0)];
> > }
> > ret = put_user(tmp,(unsigned long __user *) data);
> > break;
> > @@ -755,7 +756,8 @@ long arch_ptrace(struct task_struct *chi
> > ret = ptrace_put_reg(child, index, data);
> > } else {
> > flush_fp_to_thread(child);
> > - ((unsigned long *)child->thread.fpr)[index - PT_FPR0] = data;
> > + ((unsigned long *)child->thread.fpr)
> > + [TS_FPRSPACING * (index - PT_FPR0)] = data;
> > ret = 0;
> > }
> > break;
> > Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
> > ===================================================================
> > --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace32.c
> > +++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
> > @@ -64,6 +64,11 @@ static long compat_ptrace_old(struct tas
> > return -EPERM;
> > }
> >
> > +/* Macros to work out the correct index for the FPR in the thread struct */
> > +#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
> > +#define FPRHALF(i) (((i) - PT_FPR0) % 2)
> > +#define FPRINDEX(i) TS_FPRSPACING * FPRNUMBER(i) + FPRHALF(i)
>
> we should either use these macros in both ptrace.c and ptrace32.c or
> drop them
This set of macros is really only 32 bit specific since in ptrace 32 we
access the registers as 32 bits (hence needing two accesses to get the
full 64 bits), but in ptrace 64, we access them as 64 bit (hence only 1
access).
These macros are really only here to deal with the unique indexing into
the thread struct that we now need to do for ptrace 32 only (thanks to
paulus who pointed out I got this wrong first time).
The only macro here that could potentially be reused is FPRNUMBER(i).
>
> >
> > +
> > long compat_arch_ptrace(struct task_struct *child, compat_long_t
> > request,
> > compat_ulong_t caddr, compat_ulong_t cdata)
> > {
> > @@ -122,7 +127,8 @@ long compat_arch_ptrace(struct task_stru
> > * to be an array of unsigned int (32 bits) - the
> > * index passed in is based on this assumption.
> > */
> > - tmp = ((unsigned int *)child->thread.fpr)[index - PT_FPR0];
> > + tmp = ((unsigned int *)child->thread.fpr)
> > + [FPRINDEX(index)];
> > }
> > ret = put_user((unsigned int)tmp, (u32 __user *)data);
> > break;
> > @@ -162,7 +168,8 @@ long compat_arch_ptrace(struct task_stru
> > CHECK_FULL_REGS(child->thread.regs);
> > if (numReg >= PT_FPR0) {
> > flush_fp_to_thread(child);
> > - tmp = ((unsigned long int *)child->thread.fpr)[numReg - PT_FPR0];
> > + tmp = ((unsigned long int *)child->thread.fpr)
> > + [FPRINDEX(numReg)];
> > } else { /* register within PT_REGS struct */
> > tmp = ptrace_get_reg(child, numReg);
> > }
> > @@ -217,7 +224,8 @@ long compat_arch_ptrace(struct task_stru
> > * to be an array of unsigned int (32 bits) - the
> > * index passed in is based on this assumption.
> > */
> > - ((unsigned int *)child->thread.fpr)[index - PT_FPR0] = data;
> > + ((unsigned int *)child->thread.fpr)
> > + [TS_FPRSPACING * (index - PT_FPR0)] = data;
>
> is there a reason this isn't FPRINDEX(index)?
Oops, fixed.
Can you tell I only tested peek not poke user :-D
>
> - k
>
^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH 7/9] powerpc: Add VSX assembler code macros
2008-06-25 0:06 ` Michael Neuling
@ 2008-06-25 2:19 ` Kumar Gala
0 siblings, 0 replies; 106+ messages in thread
From: Kumar Gala @ 2008-06-25 2:19 UTC (permalink / raw)
To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras
On Jun 24, 2008, at 7:06 PM, Michael Neuling wrote:
> In message
> <B2FEFA8A-8814-44BE-81E5-8E2A873C2A1F@kernel.crashing.org> you wrote:
>>
>> On Jun 24, 2008, at 5:57 AM, Michael Neuling wrote:
>>
>>> This adds the macros for the VSX load/store instruction as most
>>> binutils are not going to support this for a while.
>>>
>>> Also add VSX register save/restore macros and vsr[0-63] register
>>> definitions.
>>>
>>> Signed-off-by: Michael Neuling <mikey@neuling.org>
>>> ---
>>>
>>> include/asm-powerpc/ppc_asm.h | 127 ++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 127 insertions(+)
>>>
>>> Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
>>> ===================================================================
>>> --- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
>>> +++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
>>> @@ -74,6 +74,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
>
>>> REST_10GPRS(22, base)
>>> #endif
>>>
>>> +/*
>>> + * Define what the VSX XX1 form instructions will look like, then
>>> add
>>> + * the 128 bit load store instructions based on that.
>>> + */
>>> +#define VSX_XX1(xs, ra, rb) (((xs) & 0x1f) << 21 | ((ra) << 16) | \
>>> + ((rb) << 11) | (((xs) >> 5)))
>>> +
>>> +#define STXVD2X(xs, ra, rb) .long (0x7c000798 | VSX_XX1((xs), (ra), (rb)))
>>> +#define LXVD2X(xs, ra, rb) .long (0x7c000698 | VSX_XX1((xs), (ra), (rb)))
>>>
>>> #define SAVE_2GPRS(n, base) SAVE_GPR(n, base); SAVE_GPR(n+1, base)
>>> #define SAVE_4GPRS(n, base) SAVE_2GPRS(n, base); SAVE_2GPRS(n+2,
>>> base)
>>> @@ -110,6 +119,57 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
>
>>> #define REST_16VRS(n,b,base) REST_8VRS(n,b,base); REST_8VRS(n+8,b,base)
>>> #define REST_32VRS(n,b,base) REST_16VRS(n,b,base); REST_16VRS(n+16,b,base)
>>>
>>> +/* Save the lower 32 VSRs in the thread VSR region */
>>> +#define SAVE_VSR(n,b,base) li b,THREAD_VSR0+(16*(n)); STXVD2X(n,b,base)
>>> +#define SAVE_2VSRS(n,b,base) SAVE_VSR(n,b,base); SAVE_VSR(n+1,b,base)
>>> +#define SAVE_4VSRS(n,b,base) SAVE_2VSRS(n,b,base); SAVE_2VSRS(n+2,b,base)
>>> +#define SAVE_8VSRS(n,b,base) SAVE_4VSRS(n,b,base); SAVE_4VSRS(n+4,b,base)
>>> +#define SAVE_16VSRS(n,b,base) SAVE_8VSRS(n,b,base); SAVE_8VSRS(n+8,b,base)
>>> +#define SAVE_32VSRS(n,b,base) SAVE_16VSRS(n,b,base); SAVE_16VSRS(n+16,b,base)
>>> +#define REST_VSR(n,b,base) li b,THREAD_VSR0+(16*(n)); LXVD2X(n,b,base)
>>> +#define REST_2VSRS(n,b,base) REST_VSR(n,b,base); REST_VSR(n+1,b,base)
>>> +#define REST_4VSRS(n,b,base) REST_2VSRS(n,b,base); REST_2VSRS(n+2,b,base)
>>> +#define REST_8VSRS(n,b,base) REST_4VSRS(n,b,base); REST_4VSRS(n+4,b,base)
>>> +#define REST_16VSRS(n,b,base) REST_8VSRS(n,b,base); REST_8VSRS(n+8,b,base)
>>> +#define REST_32VSRS(n,b,base) REST_16VSRS(n,b,base); REST_16VSRS(n+16,b,base)
>>> +/* Save the upper 32 VSRs (32-63) in the thread VSX region (0-31) */
>>> +#define SAVE_VSRU(n,b,base) li b,THREAD_VR0+(16*(n)); STXVD2X(n+32,b,base)
>>> +#define SAVE_2VSRSU(n,b,base) SAVE_VSRU(n,b,base); SAVE_VSRU(n+1,b,base)
>>> +#define SAVE_4VSRSU(n,b,base) SAVE_2VSRSU(n,b,base); SAVE_2VSRSU(n+2,b,base)
>>> +#define SAVE_8VSRSU(n,b,base) SAVE_4VSRSU(n,b,base); SAVE_4VSRSU(n+4,b,base)
>>> +#define SAVE_16VSRSU(n,b,base) SAVE_8VSRSU(n,b,base); SAVE_8VSRSU(n+8,b,base)
>>> +#define SAVE_32VSRSU(n,b,base) SAVE_16VSRSU(n,b,base); SAVE_16VSRSU(n+16,b,base)
>>> +#define REST_VSRU(n,b,base) li b,THREAD_VR0+(16*(n)); LXVD2X(n+32,b,base)
>>> +#define REST_2VSRSU(n,b,base) REST_VSRU(n,b,base); REST_VSRU(n+1,b,base)
>>> +#define REST_4VSRSU(n,b,base) REST_2VSRSU(n,b,base); REST_2VSRSU(n+2,b,base)
>>> +#define REST_8VSRSU(n,b,base) REST_4VSRSU(n,b,base); REST_4VSRSU(n+4,b,base)
>>> +#define REST_16VSRSU(n,b,base) REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
>>> +#define REST_32VSRSU(n,b,base) REST_16VSRSU(n,b,base); REST_16VSRSU(n+16,b,base)
>>> +
>>> +#ifdef CONFIG_VSX
>>
>> I think we should do this in fpu.S so it's clearly in the code when
>> reading it what's going on.
>
> Do you mean the section above or below this comment?
Sorry, the code below (that does REST_32FPVSRS).
>
>
>>
>>>
>>> +#define REST_32FPVSRS(n,c,base) \
>>> +BEGIN_FTR_SECTION \
>>> + b 2f; \
>>> +END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
>>> + REST_32FPRS(n,base); \
>>> + b 3f; \
>>> +2: REST_32VSRS(n,c,base); \
>>> +3:
>>> +
>>> +#define SAVE_32FPVSRS(n,c,base) \
>>> +BEGIN_FTR_SECTION \
>>> + b 2f; \
>>> +END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
>>> + SAVE_32FPRS(n,base); \
>>> + b 3f; \
>>> +2: SAVE_32VSRS(n,c,base); \
>>> +3:
>>> +
>>> +#else
>>> +#define REST_32FPVSRS(n,b,base) REST_32FPRS(n, base)
>>> +#define SAVE_32FPVSRS(n,b,base) SAVE_32FPRS(n, base)
>>> +#endif
>>
>>
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX.
2008-06-24 10:57 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (8 preceding siblings ...)
2008-06-24 10:57 ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
@ 2008-06-25 4:07 ` Michael Neuling
2008-06-25 4:07 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
` (8 more replies)
9 siblings, 9 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-25 4:07 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
The following set of patches adds Vector Scalar Extensions (VSX)
support for POWER7. Includes context switch, ptrace and signals support.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
Paulus: please consider for your 2.6.27 tree.
Updates in this post:
- white space change in start_thread, thanks to Paulus
- thread_struct change/cleanup suggested by Paulus; this
  also resulted in renaming TS_FPRSPACING to TS_FPRWIDTH
- pointer-to-array fix, thanks to Kumar
- indexing macro fix in ptrace32, thanks to Kumar
- moved SAVE/REST_32FPVSRS to where they are used in fpu.S, as suggested by Kumar
This time for sure!
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
2008-06-25 4:07 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
2008-06-25 4:07 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
@ 2008-06-25 4:07 ` Michael Neuling
2008-06-25 14:08 ` Kumar Gala
2008-06-25 4:07 ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
` (6 subsequent siblings)
8 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-25 4:07 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
We are going to change where the floating point registers are stored
in the thread_struct, so in preparation add some macros to access the
floating point registers. Update all code to use these new macros.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/align.c | 6 ++--
arch/powerpc/kernel/process.c | 2 -
arch/powerpc/kernel/ptrace.c | 10 ++++--
arch/powerpc/kernel/ptrace32.c | 14 +++++++--
arch/powerpc/kernel/softemu8xx.c | 4 +-
arch/powerpc/math-emu/math.c | 56 +++++++++++++++++++--------------------
include/asm-powerpc/ppc_asm.h | 5 ++-
include/asm-powerpc/processor.h | 4 ++
8 files changed, 58 insertions(+), 43 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/align.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/align.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/align.c
@@ -366,7 +366,7 @@ static int emulate_multiple(struct pt_re
static int emulate_fp_pair(struct pt_regs *regs, unsigned char __user *addr,
unsigned int reg, unsigned int flags)
{
- char *ptr = (char *) &current->thread.fpr[reg];
+ char *ptr = (char *) &current->thread.TS_FPR(reg);
int i, ret;
if (!(flags & F))
@@ -784,7 +784,7 @@ int fix_alignment(struct pt_regs *regs)
return -EFAULT;
}
} else if (flags & F) {
- data.dd = current->thread.fpr[reg];
+ data.dd = current->thread.TS_FPR(reg);
if (flags & S) {
/* Single-precision FP store requires conversion... */
#ifdef CONFIG_PPC_FPU
@@ -862,7 +862,7 @@ int fix_alignment(struct pt_regs *regs)
if (unlikely(ret))
return -EFAULT;
} else if (flags & F)
- current->thread.fpr[reg] = data.dd;
+ current->thread.TS_FPR(reg) = data.dd;
else
regs->gpr[reg] = data.ll;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -110,7 +110,7 @@ int dump_task_fpu(struct task_struct *ts
return 0;
flush_fp_to_thread(current);
- memcpy(fpregs, &tsk->thread.fpr[0], sizeof(*fpregs));
+ memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
return 1;
}
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -218,7 +218,7 @@ static int fpr_get(struct task_struct *t
flush_fp_to_thread(target);
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
- offsetof(struct thread_struct, fpr[32]));
+ offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
&target->thread.fpr, 0, -1);
@@ -231,7 +231,7 @@ static int fpr_set(struct task_struct *t
flush_fp_to_thread(target);
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
- offsetof(struct thread_struct, fpr[32]));
+ offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
&target->thread.fpr, 0, -1);
@@ -728,7 +728,8 @@ long arch_ptrace(struct task_struct *chi
tmp = ptrace_get_reg(child, (int) index);
} else {
flush_fp_to_thread(child);
- tmp = ((unsigned long *)child->thread.fpr)[index - PT_FPR0];
+ tmp = ((unsigned long *)child->thread.fpr)
+ [TS_FPRWIDTH * (index - PT_FPR0)];
}
ret = put_user(tmp,(unsigned long __user *) data);
break;
@@ -755,7 +756,8 @@ long arch_ptrace(struct task_struct *chi
ret = ptrace_put_reg(child, index, data);
} else {
flush_fp_to_thread(child);
- ((unsigned long *)child->thread.fpr)[index - PT_FPR0] = data;
+ ((unsigned long *)child->thread.fpr)
+ [TS_FPRWIDTH * (index - PT_FPR0)] = data;
ret = 0;
}
break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
@@ -64,6 +64,11 @@ static long compat_ptrace_old(struct tas
return -EPERM;
}
+/* Macros to work out the correct index for the FPR in the thread struct */
+#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
+#define FPRHALF(i) (((i) - PT_FPR0) % 2)
+#define FPRINDEX(i) TS_FPRWIDTH * FPRNUMBER(i) + FPRHALF(i)
+
long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
compat_ulong_t caddr, compat_ulong_t cdata)
{
@@ -122,7 +127,8 @@ long compat_arch_ptrace(struct task_stru
* to be an array of unsigned int (32 bits) - the
* index passed in is based on this assumption.
*/
- tmp = ((unsigned int *)child->thread.fpr)[index - PT_FPR0];
+ tmp = ((unsigned int *)child->thread.fpr)
+ [FPRINDEX(index)];
}
ret = put_user((unsigned int)tmp, (u32 __user *)data);
break;
@@ -162,7 +168,8 @@ long compat_arch_ptrace(struct task_stru
CHECK_FULL_REGS(child->thread.regs);
if (numReg >= PT_FPR0) {
flush_fp_to_thread(child);
- tmp = ((unsigned long int *)child->thread.fpr)[numReg - PT_FPR0];
+ tmp = ((unsigned long int *)child->thread.fpr)
+ [FPRINDEX(numReg)];
} else { /* register within PT_REGS struct */
tmp = ptrace_get_reg(child, numReg);
}
@@ -217,7 +224,8 @@ long compat_arch_ptrace(struct task_stru
* to be an array of unsigned int (32 bits) - the
* index passed in is based on this assumption.
*/
- ((unsigned int *)child->thread.fpr)[index - PT_FPR0] = data;
+ ((unsigned int *)child->thread.fpr)
+ [FPRINDEX(index)] = data;
ret = 0;
}
break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/softemu8xx.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
@@ -124,7 +124,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
disp = instword & 0xffff;
ea = (u32 *)(regs->gpr[idxreg] + disp);
- ip = (u32 *)&current->thread.fpr[flreg];
+ ip = (u32 *)&current->thread.TS_FPR(flreg);
switch ( inst )
{
@@ -168,7 +168,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
break;
case FMR:
/* assume this is a fp move -- Cort */
- memcpy(ip, &current->thread.fpr[(instword>>11)&0x1f],
+ memcpy(ip, &current->thread.TS_FPR((instword>>11)&0x1f),
sizeof(double));
break;
default:
Index: linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/math-emu/math.c
+++ linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
@@ -230,14 +230,14 @@ do_mathemu(struct pt_regs *regs)
case LFD:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
lfd(op0, op1, op2, op3);
break;
case LFDU:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
lfd(op0, op1, op2, op3);
regs->gpr[idx] = (unsigned long)op1;
@@ -245,21 +245,21 @@ do_mathemu(struct pt_regs *regs)
case STFD:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
stfd(op0, op1, op2, op3);
break;
case STFDU:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
stfd(op0, op1, op2, op3);
regs->gpr[idx] = (unsigned long)op1;
break;
case OP63:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
fmr(op0, op1, op2, op3);
break;
default:
@@ -356,28 +356,28 @@ do_mathemu(struct pt_regs *regs)
switch (type) {
case AB:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
case AC:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op2 = (void *)&current->thread.fpr[(insn >> 6) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op2 = (void *)&current->thread.TS_FPR((insn >> 6) & 0x1f);
break;
case ABC:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
- op3 = (void *)&current->thread.fpr[(insn >> 6) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
+ op3 = (void *)&current->thread.TS_FPR((insn >> 6) & 0x1f);
break;
case D:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
break;
@@ -387,27 +387,27 @@ do_mathemu(struct pt_regs *regs)
goto illegal;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)(regs->gpr[idx] + sdisp);
break;
case X:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
break;
case XA:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
break;
case XB:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
case XE:
idx = (insn >> 16) & 0x1f;
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
if (!idx) {
if (((insn >> 1) & 0x3ff) == STFIWX)
op1 = (void *)(regs->gpr[(insn >> 11) & 0x1f]);
@@ -421,7 +421,7 @@ do_mathemu(struct pt_regs *regs)
case XEU:
idx = (insn >> 16) & 0x1f;
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0)
+ regs->gpr[(insn >> 11) & 0x1f]);
break;
@@ -429,8 +429,8 @@ do_mathemu(struct pt_regs *regs)
case XCR:
op0 = (void *)&regs->ccr;
op1 = (void *)((insn >> 23) & 0x7);
- op2 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op3 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op2 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op3 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
case XCRL:
@@ -450,7 +450,7 @@ do_mathemu(struct pt_regs *regs)
case XFLB:
op0 = (void *)((insn >> 17) & 0xff);
- op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
default:
Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -6,6 +6,7 @@
#include <linux/stringify.h>
#include <asm/asm-compat.h>
+#include <asm/processor.h>
#ifndef __ASSEMBLY__
#error __FILE__ should only be used in assembler files
@@ -83,13 +84,13 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
#define REST_8GPRS(n, base) REST_4GPRS(n, base); REST_4GPRS(n+4, base)
#define REST_10GPRS(n, base) REST_8GPRS(n, base); REST_2GPRS(n+8, base)
-#define SAVE_FPR(n, base) stfd n,THREAD_FPR0+8*(n)(base)
+#define SAVE_FPR(n, base) stfd n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
#define SAVE_2FPRS(n, base) SAVE_FPR(n, base); SAVE_FPR(n+1, base)
#define SAVE_4FPRS(n, base) SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
#define SAVE_8FPRS(n, base) SAVE_4FPRS(n, base); SAVE_4FPRS(n+4, base)
#define SAVE_16FPRS(n, base) SAVE_8FPRS(n, base); SAVE_8FPRS(n+8, base)
#define SAVE_32FPRS(n, base) SAVE_16FPRS(n, base); SAVE_16FPRS(n+16, base)
-#define REST_FPR(n, base) lfd n,THREAD_FPR0+8*(n)(base)
+#define REST_FPR(n, base) lfd n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
#define REST_2FPRS(n, base) REST_FPR(n, base); REST_FPR(n+1, base)
#define REST_4FPRS(n, base) REST_2FPRS(n, base); REST_2FPRS(n+2, base)
#define REST_8FPRS(n, base) REST_4FPRS(n, base); REST_4FPRS(n+4, base)
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -12,6 +12,8 @@
#include <asm/reg.h>
+#define TS_FPRWIDTH 1
+
#ifndef __ASSEMBLY__
#include <linux/compiler.h>
#include <asm/ptrace.h>
@@ -136,6 +138,8 @@ typedef struct {
unsigned long seg;
} mm_segment_t;
+#define TS_FPR(i) fpr[i]
+
struct thread_struct {
unsigned long ksp; /* Kernel stack pointer */
unsigned long ksp_limit; /* if ksp <= ksp_limit stack overflow */
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code
2008-06-25 4:07 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
@ 2008-06-25 4:07 ` Michael Neuling
2008-06-25 4:07 ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
` (7 subsequent siblings)
8 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-25 4:07 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
If we set the SPE MSR bit in save_user_regs we can blow away the VEC
bit. This doesn't matter in reality, as they are in fact the same bit,
but it looks bad.
Also, when we add VSX in a later patch, we need to be able to set two
separate MSR bits here.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/signal_32.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -336,6 +336,8 @@ struct rt_sigframe {
static int save_user_regs(struct pt_regs *regs, struct mcontext __user *frame,
int sigret)
{
+ unsigned long msr = regs->msr;
+
/* Make sure floating point registers are stored in regs */
flush_fp_to_thread(current);
@@ -354,8 +356,7 @@ static int save_user_regs(struct pt_regs
return 1;
/* set MSR_VEC in the saved MSR value to indicate that
frame->mc_vregs contains valid data */
- if (__put_user(regs->msr | MSR_VEC, &frame->mc_gregs[PT_MSR]))
- return 1;
+ msr |= MSR_VEC;
}
/* else assert((regs->msr & MSR_VEC) == 0) */
@@ -377,8 +378,7 @@ static int save_user_regs(struct pt_regs
return 1;
/* set MSR_SPE in the saved MSR value to indicate that
frame->mc_vregs contains valid data */
- if (__put_user(regs->msr | MSR_SPE, &frame->mc_gregs[PT_MSR]))
- return 1;
+ msr |= MSR_SPE;
}
/* else assert((regs->msr & MSR_SPE) == 0) */
@@ -387,6 +387,8 @@ static int save_user_regs(struct pt_regs
return 1;
#endif /* CONFIG_SPE */
+ if (__put_user(msr, &frame->mc_gregs[PT_MSR]))
+ return 1;
if (sigret) {
/* Set up the sigreturn trampoline: li r0,sigret; sc */
if (__put_user(0x38000000UL + sigret, &frame->tramp[0])
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 3/9] powerpc: Move altivec_unavailable
2008-06-25 4:07 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (3 preceding siblings ...)
2008-06-25 4:07 ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
@ 2008-06-25 4:07 ` Michael Neuling
2008-06-25 4:07 ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
` (3 subsequent siblings)
8 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-25 4:07 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Move the altivec_unavailable code to make room at 0xf40, where the
vsx_unavailable exception will live.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/head_64.S | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -275,7 +275,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
. = 0xf00
b performance_monitor_pSeries
- STD_EXCEPTION_PSERIES(0xf20, altivec_unavailable)
+ . = 0xf20
+ b altivec_unavailable_pSeries
#ifdef CONFIG_CBE_RAS
HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
@@ -295,6 +296,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
/* moved from 0xf00 */
STD_EXCEPTION_PSERIES(., performance_monitor)
+ STD_EXCEPTION_PSERIES(., altivec_unavailable)
/*
* An interrupt came in while soft-disabled; clear EE in SRR1,
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable
2008-06-25 4:07 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (5 preceding siblings ...)
2008-06-25 4:07 ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
@ 2008-06-25 4:07 ` Michael Neuling
2008-06-25 4:07 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
2008-06-25 4:07 ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
8 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-25 4:07 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Make load_up_fpu and load_up_altivec callable so they can be reused by
the VSX code.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/fpu.S | 2 +-
arch/powerpc/kernel/head_32.S | 6 ++++--
arch/powerpc/kernel/head_64.S | 10 +++++++---
arch/powerpc/kernel/head_booke.h | 6 ++++--
4 files changed, 16 insertions(+), 8 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -85,7 +85,7 @@ _GLOBAL(load_up_fpu)
#endif /* CONFIG_SMP */
/* restore registers and return */
/* we haven't used ctr or xer or lr */
- b fast_exception_return
+ blr
/*
* giveup_fpu(tsk)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_32.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_32.S
@@ -421,8 +421,10 @@ BEGIN_FTR_SECTION
b ProgramCheck
END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE)
EXCEPTION_PROLOG
- bne load_up_fpu /* if from user, just load it up */
- addi r3,r1,STACK_FRAME_OVERHEAD
+ beq 1f
+ bl load_up_fpu /* if from user, just load it up */
+ b fast_exception_return
+1: addi r3,r1,STACK_FRAME_OVERHEAD
EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
/* Decrementer */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -741,7 +741,8 @@ fp_unavailable_common:
ENABLE_INTS
bl .kernel_fp_unavailable_exception
BUG_OPCODE
-1: b .load_up_fpu
+1: bl .load_up_fpu
+ b fast_exception_return
.align 7
.globl altivec_unavailable_common
@@ -749,7 +750,10 @@ altivec_unavailable_common:
EXCEPTION_PROLOG_COMMON(0xf20, PACA_EXGEN)
#ifdef CONFIG_ALTIVEC
BEGIN_FTR_SECTION
- bne .load_up_altivec /* if from user, just load it up */
+ beq 1f
+ bl .load_up_altivec
+ b fast_exception_return
+1:
END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
#endif
bl .save_nvgprs
@@ -829,7 +833,7 @@ _STATIC(load_up_altivec)
std r4,0(r3)
#endif /* CONFIG_SMP */
/* restore registers and return */
- b fast_exception_return
+ blr
#endif /* CONFIG_ALTIVEC */
/*
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_booke.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_booke.h
@@ -363,8 +363,10 @@ label:
#define FP_UNAVAILABLE_EXCEPTION \
START_EXCEPTION(FloatingPointUnavailable) \
NORMAL_EXCEPTION_PROLOG; \
- bne load_up_fpu; /* if from user, just load it up */ \
- addi r3,r1,STACK_FRAME_OVERHEAD; \
+ beq 1f; \
+ bl load_up_fpu; /* if from user, just load it up */ \
+ b fast_exception_return; \
+1: addi r3,r1,STACK_FRAME_OVERHEAD; \
EXC_XFER_EE_LITE(0x800, kernel_fp_unavailable_exception)
#endif /* __HEAD_BOOKE_H__ */
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX
2008-06-25 4:07 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (2 preceding siblings ...)
2008-06-25 4:07 ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
@ 2008-06-25 4:07 ` Michael Neuling
2008-06-25 4:07 ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
` (4 subsequent siblings)
8 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-25 4:07 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
The layout of the new VSR registers, and how they overlap the legacy
FPR and VR registers, is:
VSR doubleword 0 VSR doubleword 1
----------------------------------------------------------------
VSR[0] | FPR[0] | |
----------------------------------------------------------------
VSR[1] | FPR[1] | |
----------------------------------------------------------------
| ... | |
| ... | |
----------------------------------------------------------------
VSR[30] | FPR[30] | |
----------------------------------------------------------------
VSR[31] | FPR[31] | |
----------------------------------------------------------------
VSR[32] | VR[0] |
----------------------------------------------------------------
VSR[33] | VR[1] |
----------------------------------------------------------------
| ... |
| ... |
----------------------------------------------------------------
VSR[62] | VR[30] |
----------------------------------------------------------------
VSR[63] | VR[31] |
----------------------------------------------------------------
VSX has 64 128-bit registers. The first 32 registers overlap with the
FP registers and hence extend them with an additional 64 bits. The
second 32 overlap with the VMX registers.
This patch introduces the thread_struct changes required to reflect
this register layout. Ptrace and signals code is updated so that the
floating point registers are correctly accessed from the thread_struct
when CONFIG_VSX is enabled.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/asm-offsets.c | 4 ++
arch/powerpc/kernel/ptrace.c | 29 ++++++++++++++++++
arch/powerpc/kernel/signal_32.c | 59 ++++++++++++++++++++++++++++----------
arch/powerpc/kernel/signal_64.c | 32 ++++++++++++++++++--
include/asm-powerpc/processor.h | 18 +++++++++--
5 files changed, 121 insertions(+), 21 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/asm-offsets.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/asm-offsets.c
@@ -74,6 +74,10 @@ int main(void)
DEFINE(THREAD_VSCR, offsetof(struct thread_struct, vscr));
DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ DEFINE(THREAD_VSR0, offsetof(struct thread_struct, fpr));
+ DEFINE(THREAD_USED_VSR, offsetof(struct thread_struct, used_vsr));
+#endif /* CONFIG_VSX */
#ifdef CONFIG_PPC64
DEFINE(KSP_VSID, offsetof(struct thread_struct, ksp_vsid));
#else /* CONFIG_PPC64 */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -215,29 +215,56 @@ static int fpr_get(struct task_struct *t
unsigned int pos, unsigned int count,
void *kbuf, void __user *ubuf)
{
+#ifdef CONFIG_VSX
+ double buf[33];
+ int i;
+#endif
flush_fp_to_thread(target);
+#ifdef CONFIG_VSX
+ /* copy to local buffer then write that out */
+ for (i = 0; i < 32 ; i++)
+ buf[i] = target->thread.TS_FPR(i);
+ memcpy(&buf[32], &target->thread.fpscr, sizeof(double));
+ return user_regset_copyout(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+
+#else
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
&target->thread.fpr, 0, -1);
+#endif
}
static int fpr_set(struct task_struct *target, const struct user_regset *regset,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
+#ifdef CONFIG_VSX
+ double buf[33];
+ int i;
+#endif
flush_fp_to_thread(target);
+#ifdef CONFIG_VSX
+ /* copy to local buffer then write that out */
+ i = user_regset_copyin(&pos, &count, &kbuf, &ubuf, buf, 0, -1);
+ if (i)
+ return i;
+ for (i = 0; i < 32 ; i++)
+ target->thread.TS_FPR(i) = buf[i];
+ memcpy(&target->thread.fpscr, &buf[32], sizeof(double));
+ return 0;
+#else
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
&target->thread.fpr, 0, -1);
+#endif
}
-
#ifdef CONFIG_ALTIVEC
/*
* Get/set all the altivec registers vr0..vr31, vscr, vrsave, in one go.
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -337,14 +337,16 @@ static int save_user_regs(struct pt_regs
int sigret)
{
unsigned long msr = regs->msr;
+#ifdef CONFIG_VSX
+ double buf[32];
+ int i;
+#endif
/* Make sure floating point registers are stored in regs */
flush_fp_to_thread(current);
- /* save general and floating-point registers */
- if (save_general_regs(regs, frame) ||
- __copy_to_user(&frame->mc_fregs, current->thread.fpr,
- ELF_NFPREG * sizeof(double)))
+ /* save general registers */
+ if (save_general_regs(regs, frame))
return 1;
#ifdef CONFIG_ALTIVEC
@@ -368,7 +370,20 @@ static int save_user_regs(struct pt_regs
if (__put_user(current->thread.vrsave, (u32 __user *)&frame->mc_vregs[32]))
return 1;
#endif /* CONFIG_ALTIVEC */
-
+#ifdef CONFIG_VSX
+ /* save FPR copy to local buffer then write to the thread_struct */
+ flush_fp_to_thread(current);
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.TS_FPR(i);
+ memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+ if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
+ return 1;
+#else
+ /* save floating-point registers */
+ if (__copy_to_user(&frame->mc_fregs, current->thread.fpr,
+ ELF_NFPREG * sizeof(double)))
+ return 1;
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
/* save spe registers */
if (current->thread.used_spe) {
@@ -411,6 +426,10 @@ static long restore_user_regs(struct pt_
long err;
unsigned int save_r2 = 0;
unsigned long msr;
+#ifdef CONFIG_VSX
+ double buf[32];
+ int i;
+#endif
/*
* restore general registers but not including MSR or SOFTE. Also
@@ -438,16 +457,11 @@ static long restore_user_regs(struct pt_
*/
discard_lazy_cpu_state();
- /* force the process to reload the FP registers from
- current->thread when it next does FP instructions */
- regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
- if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
- sizeof(sr->mc_fregs)))
- return 1;
-
#ifdef CONFIG_ALTIVEC
- /* force the process to reload the altivec registers from
- current->thread when it next does altivec instructions */
+ /*
+ * Force the process to reload the altivec registers from
+ * current->thread when it next does altivec instructions
+ */
regs->msr &= ~MSR_VEC;
if (msr & MSR_VEC) {
/* restore altivec registers from the stack */
@@ -462,6 +476,23 @@ static long restore_user_regs(struct pt_
return 1;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (__copy_from_user(buf, &sr->mc_fregs, sizeof(sr->mc_fregs)))
+ return 1;
+ for (i = 0; i < 32 ; i++)
+ current->thread.TS_FPR(i) = buf[i];
+ memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+#else
+ if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
+ sizeof(sr->mc_fregs)))
+ return 1;
+#endif /* CONFIG_VSX */
+ /*
+ * force the process to reload the FP registers from
+ * current->thread when it next does FP instructions
+ */
+ regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
+
#ifdef CONFIG_SPE
/* force the process to reload the spe registers from
current->thread when it next does spe instructions */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -89,6 +89,10 @@ static long setup_sigcontext(struct sigc
#endif
unsigned long msr = regs->msr;
long err = 0;
+#ifdef CONFIG_VSX
+ double buf[FP_REGS_SIZE];
+ int i;
+#endif
flush_fp_to_thread(current);
@@ -112,11 +116,21 @@ static long setup_sigcontext(struct sigc
#else /* CONFIG_ALTIVEC */
err |= __put_user(0, &sc->v_regs);
#endif /* CONFIG_ALTIVEC */
+ flush_fp_to_thread(current);
+#ifdef CONFIG_VSX
+ /* Copy FP to local buffer then write that out */
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.TS_FPR(i);
+ memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
+ err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+#else /* CONFIG_VSX */
+ /* copy fpr regs and fpscr */
+ err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
+#endif /* CONFIG_VSX */
err |= __put_user(&sc->gp_regs, &sc->regs);
WARN_ON(!FULL_REGS(regs));
err |= __copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE);
err |= __put_user(msr, &sc->gp_regs[PT_MSR]);
- err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
err |= __put_user(signr, &sc->signal);
err |= __put_user(handler, &sc->handler);
if (set != NULL)
@@ -135,6 +149,9 @@ static long restore_sigcontext(struct pt
#ifdef CONFIG_ALTIVEC
elf_vrreg_t __user *v_regs;
#endif
+#ifdef CONFIG_VSX
+ double buf[FP_REGS_SIZE];
+#endif
unsigned long err = 0;
unsigned long save_r13 = 0;
elf_greg_t *gregs = (elf_greg_t *)regs;
@@ -182,8 +199,6 @@ static long restore_sigcontext(struct pt
*/
regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
- err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
-
#ifdef CONFIG_ALTIVEC
err |= __get_user(v_regs, &sc->v_regs);
if (err)
@@ -202,7 +217,18 @@ static long restore_sigcontext(struct pt
else
current->thread.vrsave = 0;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ /* restore floating point */
+ err |= __copy_from_user(buf, &sc->fp_regs, FP_REGS_SIZE);
+ if (err)
+ return err;
+ for (i = 0; i < 32 ; i++)
+ current->thread.TS_FPR(i) = buf[i];
+ memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+#else
+ err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
+#endif
return err;
}
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -12,7 +12,11 @@
#include <asm/reg.h>
+#ifdef CONFIG_VSX
+#define TS_FPRWIDTH 2
+#else
#define TS_FPRWIDTH 1
+#endif
#ifndef __ASSEMBLY__
#include <linux/compiler.h>
@@ -80,6 +84,7 @@ extern long kernel_thread(int (*fn)(void
/* Lazy FPU handling on uni-processor */
extern struct task_struct *last_task_used_math;
extern struct task_struct *last_task_used_altivec;
+extern struct task_struct *last_task_used_vsx;
extern struct task_struct *last_task_used_spe;
#ifdef CONFIG_PPC32
@@ -138,7 +143,9 @@ typedef struct {
unsigned long seg;
} mm_segment_t;
-#define TS_FPR(i) fpr[i]
+#define TS_FPROFFSET 0
+#define TS_VSRLOWOFFSET 1
+#define TS_FPR(i) fpr[i][TS_FPROFFSET]
struct thread_struct {
unsigned long ksp; /* Kernel stack pointer */
@@ -156,8 +163,9 @@ struct thread_struct {
unsigned long dbcr0; /* debug control register values */
unsigned long dbcr1;
#endif
- double fpr[32]; /* Complete floating point set */
- struct { /* fpr ... fpscr must be contiguous */
+ /* FP and VSX 0-31 register set */
+ double fpr[32][TS_FPRWIDTH];
+ struct {
unsigned int pad;
unsigned int val; /* Floating point status */
@@ -177,6 +185,10 @@ struct thread_struct {
unsigned long vrsave;
int used_vr; /* set if process has used altivec */
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ /* VSR status */
+ int used_vsr; /* set if process has used VSX */
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
unsigned long evr[32]; /* upper 32-bits of SPE regs */
u64 acc; /* Accumulator */
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 7/9] powerpc: Add VSX assembler code macros
2008-06-25 4:07 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (4 preceding siblings ...)
2008-06-25 4:07 ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
@ 2008-06-25 4:07 ` Michael Neuling
2008-06-25 4:07 ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
` (2 subsequent siblings)
8 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-25 4:07 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
This adds the macros for the VSX load/store instructions, as most
binutils versions are not going to support these for a while.
Also add VSX register save/restore macros and vsr[0-63] register definitions.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/fpu.S | 23 +++++++++
include/asm-powerpc/ppc_asm.h | 103 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 126 insertions(+)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -24,6 +24,29 @@
#include <asm/ppc_asm.h>
#include <asm/asm-offsets.h>
+#ifdef CONFIG_VSX
+#define REST_32FPVSRS(n,c,base) \
+BEGIN_FTR_SECTION \
+ b 2f; \
+END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
+ REST_32FPRS(n,base); \
+ b 3f; \
+2: REST_32VSRS(n,c,base); \
+3:
+
+#define SAVE_32FPVSRS(n,c,base) \
+BEGIN_FTR_SECTION \
+ b 2f; \
+END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
+ SAVE_32FPRS(n,base); \
+ b 3f; \
+2: SAVE_32VSRS(n,c,base); \
+3:
+#else
+#define REST_32FPVSRS(n,b,base) REST_32FPRS(n, base)
+#define SAVE_32FPVSRS(n,b,base) SAVE_32FPRS(n, base)
+#endif
+
/*
* This task wants to use the FPU now.
* On UP, disable FP for the task which had the FPU previously,
Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -74,6 +74,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
REST_10GPRS(22, base)
#endif
+/*
+ * Define what the VSX XX1 form instructions will look like, then add
+ * the 128 bit load store instructions based on that.
+ */
+#define VSX_XX1(xs, ra, rb) (((xs) & 0x1f) << 21 | ((ra) << 16) | \
+ ((rb) << 11) | (((xs) >> 5)))
+
+#define STXVD2X(xs, ra, rb) .long (0x7c000798 | VSX_XX1((xs), (ra), (rb)))
+#define LXVD2X(xs, ra, rb) .long (0x7c000698 | VSX_XX1((xs), (ra), (rb)))
#define SAVE_2GPRS(n, base) SAVE_GPR(n, base); SAVE_GPR(n+1, base)
#define SAVE_4GPRS(n, base) SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
@@ -110,6 +119,33 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
#define REST_16VRS(n,b,base) REST_8VRS(n,b,base); REST_8VRS(n+8,b,base)
#define REST_32VRS(n,b,base) REST_16VRS(n,b,base); REST_16VRS(n+16,b,base)
+/* Save the lower 32 VSRs in the thread VSR region */
+#define SAVE_VSR(n,b,base) li b,THREAD_VSR0+(16*(n)); STXVD2X(n,b,base)
+#define SAVE_2VSRS(n,b,base) SAVE_VSR(n,b,base); SAVE_VSR(n+1,b,base)
+#define SAVE_4VSRS(n,b,base) SAVE_2VSRS(n,b,base); SAVE_2VSRS(n+2,b,base)
+#define SAVE_8VSRS(n,b,base) SAVE_4VSRS(n,b,base); SAVE_4VSRS(n+4,b,base)
+#define SAVE_16VSRS(n,b,base) SAVE_8VSRS(n,b,base); SAVE_8VSRS(n+8,b,base)
+#define SAVE_32VSRS(n,b,base) SAVE_16VSRS(n,b,base); SAVE_16VSRS(n+16,b,base)
+#define REST_VSR(n,b,base) li b,THREAD_VSR0+(16*(n)); LXVD2X(n,b,base)
+#define REST_2VSRS(n,b,base) REST_VSR(n,b,base); REST_VSR(n+1,b,base)
+#define REST_4VSRS(n,b,base) REST_2VSRS(n,b,base); REST_2VSRS(n+2,b,base)
+#define REST_8VSRS(n,b,base) REST_4VSRS(n,b,base); REST_4VSRS(n+4,b,base)
+#define REST_16VSRS(n,b,base) REST_8VSRS(n,b,base); REST_8VSRS(n+8,b,base)
+#define REST_32VSRS(n,b,base) REST_16VSRS(n,b,base); REST_16VSRS(n+16,b,base)
+/* Save the upper 32 VSRs (32-63) in the thread VSX region (0-31) */
+#define SAVE_VSRU(n,b,base) li b,THREAD_VR0+(16*(n)); STXVD2X(n+32,b,base)
+#define SAVE_2VSRSU(n,b,base) SAVE_VSRU(n,b,base); SAVE_VSRU(n+1,b,base)
+#define SAVE_4VSRSU(n,b,base) SAVE_2VSRSU(n,b,base); SAVE_2VSRSU(n+2,b,base)
+#define SAVE_8VSRSU(n,b,base) SAVE_4VSRSU(n,b,base); SAVE_4VSRSU(n+4,b,base)
+#define SAVE_16VSRSU(n,b,base) SAVE_8VSRSU(n,b,base); SAVE_8VSRSU(n+8,b,base)
+#define SAVE_32VSRSU(n,b,base) SAVE_16VSRSU(n,b,base); SAVE_16VSRSU(n+16,b,base)
+#define REST_VSRU(n,b,base) li b,THREAD_VR0+(16*(n)); LXVD2X(n+32,b,base)
+#define REST_2VSRSU(n,b,base) REST_VSRU(n,b,base); REST_VSRU(n+1,b,base)
+#define REST_4VSRSU(n,b,base) REST_2VSRSU(n,b,base); REST_2VSRSU(n+2,b,base)
+#define REST_8VSRSU(n,b,base) REST_4VSRSU(n,b,base); REST_4VSRSU(n+4,b,base)
+#define REST_16VSRSU(n,b,base) REST_8VSRSU(n,b,base); REST_8VSRSU(n+8,b,base)
+#define REST_32VSRSU(n,b,base) REST_16VSRSU(n,b,base); REST_16VSRSU(n+16,b,base)
+
#define SAVE_EVR(n,s,base) evmergehi s,s,n; stw s,THREAD_EVR0+4*(n)(base)
#define SAVE_2EVRS(n,s,base) SAVE_EVR(n,s,base); SAVE_EVR(n+1,s,base)
#define SAVE_4EVRS(n,s,base) SAVE_2EVRS(n,s,base); SAVE_2EVRS(n+2,s,base)
@@ -534,6 +570,73 @@ END_FTR_SECTION_IFCLR(CPU_FTR_601)
#define vr30 30
#define vr31 31
+/* VSX Registers (VSRs) */
+
+#define vsr0 0
+#define vsr1 1
+#define vsr2 2
+#define vsr3 3
+#define vsr4 4
+#define vsr5 5
+#define vsr6 6
+#define vsr7 7
+#define vsr8 8
+#define vsr9 9
+#define vsr10 10
+#define vsr11 11
+#define vsr12 12
+#define vsr13 13
+#define vsr14 14
+#define vsr15 15
+#define vsr16 16
+#define vsr17 17
+#define vsr18 18
+#define vsr19 19
+#define vsr20 20
+#define vsr21 21
+#define vsr22 22
+#define vsr23 23
+#define vsr24 24
+#define vsr25 25
+#define vsr26 26
+#define vsr27 27
+#define vsr28 28
+#define vsr29 29
+#define vsr30 30
+#define vsr31 31
+#define vsr32 32
+#define vsr33 33
+#define vsr34 34
+#define vsr35 35
+#define vsr36 36
+#define vsr37 37
+#define vsr38 38
+#define vsr39 39
+#define vsr40 40
+#define vsr41 41
+#define vsr42 42
+#define vsr43 43
+#define vsr44 44
+#define vsr45 45
+#define vsr46 46
+#define vsr47 47
+#define vsr48 48
+#define vsr49 49
+#define vsr50 50
+#define vsr51 51
+#define vsr52 52
+#define vsr53 53
+#define vsr54 54
+#define vsr55 55
+#define vsr56 56
+#define vsr57 57
+#define vsr58 58
+#define vsr59 59
+#define vsr60 60
+#define vsr61 61
+#define vsr62 62
+#define vsr63 63
+
/* SPE Registers (EVPRs) */
#define evr0 0
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support
2008-06-25 4:07 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
2008-06-25 4:07 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
2008-06-25 4:07 ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
@ 2008-06-25 4:07 ` Michael Neuling
2008-06-25 4:07 ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
` (5 subsequent siblings)
8 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-25 4:07 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
This patch extends the floating point save and restore code to use the
VSX load/stores when VSX is available. This will make FP context
save/restore marginally slower on FP-only code when VSX is available,
as it has to load/store 128 bits rather than just 64 bits.
Mixing FP, VMX and VSX code will see consistent architected state.
The signals interface is extended to enable access to VSR 0-31
doubleword 1 after discussions with toolchain maintainers. Backward
compatibility is maintained.
The ptrace interface is also extended to allow access to VSR 0-31 full
registers.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/kernel/entry_64.S | 5 +
arch/powerpc/kernel/fpu.S | 16 ++++-
arch/powerpc/kernel/head_64.S | 65 +++++++++++++++++++++++
arch/powerpc/kernel/misc_64.S | 33 ++++++++++++
arch/powerpc/kernel/ppc32.h | 1
arch/powerpc/kernel/ppc_ksyms.c | 3 +
arch/powerpc/kernel/process.c | 107 ++++++++++++++++++++++++++++++++++++++-
arch/powerpc/kernel/ptrace.c | 70 +++++++++++++++++++++++++
arch/powerpc/kernel/signal_32.c | 33 ++++++++++++
arch/powerpc/kernel/signal_64.c | 31 ++++++++++-
arch/powerpc/kernel/traps.c | 29 ++++++++++
include/asm-powerpc/elf.h | 6 +-
include/asm-powerpc/ptrace.h | 12 ++++
include/asm-powerpc/reg.h | 2
include/asm-powerpc/sigcontext.h | 37 +++++++++++++
include/asm-powerpc/system.h | 9 +++
16 files changed, 451 insertions(+), 8 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/entry_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/entry_64.S
@@ -353,6 +353,11 @@ _GLOBAL(_switch)
mflr r20 /* Return to switch caller */
mfmsr r22
li r0, MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ oris r0,r0,MSR_VSX@h /* Disable VSX */
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif /* CONFIG_VSX */
#ifdef CONFIG_ALTIVEC
BEGIN_FTR_SECTION
oris r0,r0,MSR_VEC@h /* Disable altivec */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/fpu.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/fpu.S
@@ -57,6 +57,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX);
_GLOBAL(load_up_fpu)
mfmsr r5
ori r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ oris r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
SYNC
MTMSRD(r5) /* enable use of fpu now */
isync
@@ -73,7 +78,7 @@ _GLOBAL(load_up_fpu)
beq 1f
toreal(r4)
addi r4,r4,THREAD /* want last_task_used_math->thread */
- SAVE_32FPRS(0, r4)
+ SAVE_32FPVSRS(0, r5, r4)
mffs fr0
stfd fr0,THREAD_FPSCR(r4)
PPC_LL r5,PT_REGS(r4)
@@ -100,7 +105,7 @@ _GLOBAL(load_up_fpu)
#endif
lfd fr0,THREAD_FPSCR(r5)
MTFSF_L(fr0)
- REST_32FPRS(0, r5)
+ REST_32FPVSRS(0, r4, r5)
#ifndef CONFIG_SMP
subi r4,r5,THREAD
fromreal(r4)
@@ -119,6 +124,11 @@ _GLOBAL(load_up_fpu)
_GLOBAL(giveup_fpu)
mfmsr r5
ori r5,r5,MSR_FP
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ oris r5,r5,MSR_VSX@h
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
SYNC_601
ISYNC_601
MTMSRD(r5) /* enable use of fpu now */
@@ -129,7 +139,7 @@ _GLOBAL(giveup_fpu)
addi r3,r3,THREAD /* want THREAD of task */
PPC_LL r5,PT_REGS(r3)
PPC_LCMPI 0,r5,0
- SAVE_32FPRS(0, r3)
+ SAVE_32FPVSRS(0, r4, r3)
mffs fr0
stfd fr0,THREAD_FPSCR(r3)
beq 1f
Index: linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/head_64.S
@@ -278,6 +278,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
. = 0xf20
b altivec_unavailable_pSeries
+ . = 0xf40
+ b vsx_unavailable_pSeries
+
#ifdef CONFIG_CBE_RAS
HSTD_EXCEPTION_PSERIES(0x1200, cbe_system_error)
#endif /* CONFIG_CBE_RAS */
@@ -297,6 +300,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
/* moved from 0xf00 */
STD_EXCEPTION_PSERIES(., performance_monitor)
STD_EXCEPTION_PSERIES(., altivec_unavailable)
+ STD_EXCEPTION_PSERIES(., vsx_unavailable)
/*
* An interrupt came in while soft-disabled; clear EE in SRR1,
@@ -836,6 +840,67 @@ _STATIC(load_up_altivec)
blr
#endif /* CONFIG_ALTIVEC */
+ .align 7
+ .globl vsx_unavailable_common
+vsx_unavailable_common:
+ EXCEPTION_PROLOG_COMMON(0xf40, PACA_EXGEN)
+#ifdef CONFIG_VSX
+BEGIN_FTR_SECTION
+ bne .load_up_vsx
+1:
+END_FTR_SECTION_IFSET(CPU_FTR_VSX)
+#endif
+ bl .save_nvgprs
+ addi r3,r1,STACK_FRAME_OVERHEAD
+ ENABLE_INTS
+ bl .vsx_unavailable_exception
+ b .ret_from_except
+
+#ifdef CONFIG_VSX
+/*
+ * load_up_vsx(unused, unused, tsk)
+ * Disable VSX for the task which had it previously,
+ * and save its vector registers in its thread_struct.
+ * Reuse the fp and vsx saves, but first check to see if they have
+ * been saved already.
+ * On entry: r13 == 'current' && last_task_used_vsx != 'current'
+ */
+_STATIC(load_up_vsx)
+/* Load FP and VSX registers if they haven't been done yet */
+ andi. r5,r12,MSR_FP
+ beql+ load_up_fpu /* skip if already loaded */
+ andis. r5,r12,MSR_VEC@h
+ beql+ load_up_altivec /* skip if already loaded */
+
+#ifndef CONFIG_SMP
+ ld r3,last_task_used_vsx@got(r2)
+ ld r4,0(r3)
+ cmpdi 0,r4,0
+ beq 1f
+ /* Disable VSX for last_task_used_vsx */
+ addi r4,r4,THREAD
+ ld r5,PT_REGS(r4)
+ ld r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+ lis r6,MSR_VSX@h
+ andc r6,r4,r6
+ std r6,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#endif /* CONFIG_SMP */
+ ld r4,PACACURRENT(r13)
+ addi r4,r4,THREAD /* Get THREAD */
+ li r6,1
+ stw r6,THREAD_USED_VSR(r4) /* ... also set thread used vsr */
+ /* enable use of VSX after return */
+ oris r12,r12,MSR_VSX@h
+ std r12,_MSR(r1)
+#ifndef CONFIG_SMP
+ /* Update last_task_used_math to 'current' */
+ ld r4,PACACURRENT(r13)
+ std r4,0(r3)
+#endif /* CONFIG_SMP */
+ b fast_exception_return
+#endif /* CONFIG_VSX */
+
/*
* Hash table stuff
*/
Index: linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/misc_64.S
+++ linux-2.6-ozlabs/arch/powerpc/kernel/misc_64.S
@@ -506,6 +506,39 @@ _GLOBAL(giveup_altivec)
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+/*
+ * giveup_vsx(tsk)
+ * Disable VSX for the task given as the argument,
+ * and save the vector registers in its thread_struct.
+ * Enables the VSX for use in the kernel on return.
+ */
+_GLOBAL(giveup_vsx)
+ mfmsr r5
+ oris r5,r5,MSR_VSX@h
+ mtmsrd r5 /* enable use of VSX now */
+ isync
+
+ cmpdi 0,r3,0
+ beqlr- /* if no previous owner, done */
+ addi r3,r3,THREAD /* want THREAD of task */
+ ld r5,PT_REGS(r3)
+ cmpdi 0,r5,0
+ beq 1f
+ ld r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+ lis r3,MSR_VSX@h
+ andc r4,r4,r3 /* disable VSX for previous task */
+ std r4,_MSR-STACK_FRAME_OVERHEAD(r5)
+1:
+#ifndef CONFIG_SMP
+ li r5,0
+ ld r4,last_task_used_vsx@got(r2)
+ std r5,0(r4)
+#endif /* CONFIG_SMP */
+ blr
+
+#endif /* CONFIG_VSX */
+
/* kexec_wait(phys_cpu)
*
* wait for the flag to change, indicating this kernel is going away but
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc32.h
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc32.h
@@ -120,6 +120,7 @@ struct mcontext32 {
elf_fpregset_t mc_fregs;
unsigned int mc_pad[2];
elf_vrregset_t32 mc_vregs __attribute__((__aligned__(16)));
+ elf_vsrreghalf_t32 mc_vsregs __attribute__((__aligned__(16)));
};
struct ucontext32 {
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ppc_ksyms.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ppc_ksyms.c
@@ -102,6 +102,9 @@ EXPORT_SYMBOL(giveup_fpu);
#ifdef CONFIG_ALTIVEC
EXPORT_SYMBOL(giveup_altivec);
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+EXPORT_SYMBOL(giveup_vsx);
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
EXPORT_SYMBOL(giveup_spe);
#endif /* CONFIG_SPE */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -53,6 +53,7 @@ extern unsigned long _get_SP(void);
#ifndef CONFIG_SMP
struct task_struct *last_task_used_math = NULL;
struct task_struct *last_task_used_altivec = NULL;
+struct task_struct *last_task_used_vsx = NULL;
struct task_struct *last_task_used_spe = NULL;
#endif
@@ -106,11 +107,23 @@ EXPORT_SYMBOL(enable_kernel_fp);
int dump_task_fpu(struct task_struct *tsk, elf_fpregset_t *fpregs)
{
+#ifdef CONFIG_VSX
+ int i;
+ elf_fpreg_t *reg;
+#endif
+
if (!tsk->thread.regs)
return 0;
flush_fp_to_thread(current);
+#ifdef CONFIG_VSX
+ reg = (elf_fpreg_t *)fpregs;
+ for (i = 0; i < ELF_NFPREG - 1; i++, reg++)
+ *reg = tsk->thread.TS_FPR(i);
+ memcpy(reg, &tsk->thread.fpscr, sizeof(elf_fpreg_t));
+#else
memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
+#endif
return 1;
}
@@ -149,7 +162,7 @@ void flush_altivec_to_thread(struct task
}
}
-int dump_task_altivec(struct task_struct *tsk, elf_vrregset_t *vrregs)
+int dump_task_altivec(struct task_struct *tsk, elf_vrreg_t *vrregs)
{
/* ELF_NVRREG includes the VSCR and VRSAVE which we need to save
* separately, see below */
@@ -179,6 +192,80 @@ int dump_task_altivec(struct task_struct
}
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+#if 0
+/* not currently used, but some crazy RAID module might want to later */
+void enable_kernel_vsx(void)
+{
+ WARN_ON(preemptible());
+
+#ifdef CONFIG_SMP
+ if (current->thread.regs && (current->thread.regs->msr & MSR_VSX))
+ giveup_vsx(current);
+ else
+ giveup_vsx(NULL); /* just enable vsx for kernel - force */
+#else
+ giveup_vsx(last_task_used_vsx);
+#endif /* CONFIG_SMP */
+}
+EXPORT_SYMBOL(enable_kernel_vsx);
+#endif
+
+void flush_vsx_to_thread(struct task_struct *tsk)
+{
+ if (tsk->thread.regs) {
+ preempt_disable();
+ if (tsk->thread.regs->msr & MSR_VSX) {
+#ifdef CONFIG_SMP
+ BUG_ON(tsk != current);
+#endif
+ giveup_vsx(tsk);
+ }
+ preempt_enable();
+ }
+}
+
+/*
+ * This dumps the lower half 64bits of the first 32 VSX registers.
+ * This needs to be called with dump_task_fp and dump_task_altivec to
+ * get all the VSX state.
+ */
+int dump_task_vsx(struct task_struct *tsk, elf_vrreg_t *vrregs)
+{
+ elf_vrreg_t *reg;
+ double buf[32];
+ int i;
+
+ if (tsk == current)
+ flush_vsx_to_thread(tsk);
+
+ reg = (elf_vrreg_t *)vrregs;
+
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.fpr[i][TS_VSRLOWOFFSET];
+ memcpy(reg, buf, sizeof(buf));
+
+ return 1;
+}
+#endif /* CONFIG_VSX */
+
+int dump_task_vector(struct task_struct *tsk, elf_vrregset_t *vrregs)
+{
+ int rc = 0;
+ elf_vrreg_t *regs = (elf_vrreg_t *)vrregs;
+#ifdef CONFIG_ALTIVEC
+ rc = dump_task_altivec(tsk, regs);
+ if (rc)
+ return rc;
+ regs += ELF_NVRREG;
+#endif
+
+#ifdef CONFIG_VSX
+ rc = dump_task_vsx(tsk, regs);
+#endif
+ return rc;
+}
+
#ifdef CONFIG_SPE
void enable_kernel_spe(void)
@@ -233,6 +320,10 @@ void discard_lazy_cpu_state(void)
if (last_task_used_altivec == current)
last_task_used_altivec = NULL;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (last_task_used_vsx == current)
+ last_task_used_vsx = NULL;
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
if (last_task_used_spe == current)
last_task_used_spe = NULL;
@@ -297,6 +388,10 @@ struct task_struct *__switch_to(struct t
if (prev->thread.regs && (prev->thread.regs->msr & MSR_VEC))
giveup_altivec(prev);
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (prev->thread.regs && (prev->thread.regs->msr & MSR_VSX))
+ giveup_vsx(prev);
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
/*
* If the previous thread used spe in the last quantum
@@ -317,6 +412,10 @@ struct task_struct *__switch_to(struct t
if (new->thread.regs && last_task_used_altivec == new)
new->thread.regs->msr |= MSR_VEC;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ if (new->thread.regs && last_task_used_vsx == new)
+ new->thread.regs->msr |= MSR_VSX;
+#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
/* Avoid the trap. On smp this never happens since
* we don't set last_task_used_spe
@@ -417,6 +516,8 @@ static struct regbit {
{MSR_EE, "EE"},
{MSR_PR, "PR"},
{MSR_FP, "FP"},
+ {MSR_VEC, "VEC"},
+ {MSR_VSX, "VSX"},
{MSR_ME, "ME"},
{MSR_IR, "IR"},
{MSR_DR, "DR"},
@@ -534,6 +635,7 @@ void prepare_to_copy(struct task_struct
{
flush_fp_to_thread(current);
flush_altivec_to_thread(current);
+ flush_vsx_to_thread(current);
flush_spe_to_thread(current);
}
@@ -689,6 +791,9 @@ void start_thread(struct pt_regs *regs,
#endif
discard_lazy_cpu_state();
+#ifdef CONFIG_VSX
+ current->thread.used_vsr = 0;
+#endif
memset(current->thread.fpr, 0, sizeof(current->thread.fpr));
current->thread.fpscr.val = 0;
#ifdef CONFIG_ALTIVEC
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -350,6 +350,51 @@ static int vr_set(struct task_struct *ta
}
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+/*
+ * Currently to set and get all the VSX state, you need to call
+ * the FP and VMX calls as well. This only gets/sets the lower 32
+ * 128bit VSX registers.
+ */
+
+static int vsr_active(struct task_struct *target,
+ const struct user_regset *regset)
+{
+ flush_vsx_to_thread(target);
+ return target->thread.used_vsr ? regset->n : 0;
+}
+
+static int vsr_get(struct task_struct *target, const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ void *kbuf, void __user *ubuf)
+{
+ int ret;
+
+ flush_vsx_to_thread(target);
+
+ ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+ target->thread.fpr, 0,
+ 32 * sizeof(vector128));
+
+ return ret;
+}
+
+static int vsr_set(struct task_struct *target, const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ const void *kbuf, const void __user *ubuf)
+{
+ int ret;
+
+ flush_vsx_to_thread(target);
+
+ ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+ target->thread.fpr, 0,
+ 32 * sizeof(vector128));
+
+ return ret;
+}
+#endif /* CONFIG_VSX */
+
#ifdef CONFIG_SPE
/*
@@ -426,6 +471,9 @@ enum powerpc_regset {
#ifdef CONFIG_ALTIVEC
REGSET_VMX,
#endif
+#ifdef CONFIG_VSX
+ REGSET_VSX,
+#endif
#ifdef CONFIG_SPE
REGSET_SPE,
#endif
@@ -449,6 +497,13 @@ static const struct user_regset native_r
.active = vr_active, .get = vr_get, .set = vr_set
},
#endif
+#ifdef CONFIG_VSX
+ [REGSET_VSX] = {
+ .n = 32,
+ .size = sizeof(vector128), .align = sizeof(vector128),
+ .active = vsr_active, .get = vsr_get, .set = vsr_set
+ },
+#endif
#ifdef CONFIG_SPE
[REGSET_SPE] = {
.n = 35,
@@ -849,6 +904,21 @@ long arch_ptrace(struct task_struct *chi
sizeof(u32)),
(const void __user *) data);
#endif
+#ifdef CONFIG_VSX
+ case PTRACE_GETVSRREGS:
+ return copy_regset_to_user(child, &user_ppc_native_view,
+ REGSET_VSX,
+ 0, (32 * sizeof(vector128) +
+ sizeof(u32)),
+ (void __user *) data);
+
+ case PTRACE_SETVSRREGS:
+ return copy_regset_from_user(child, &user_ppc_native_view,
+ REGSET_VSX,
+ 0, (32 * sizeof(vector128) +
+ sizeof(u32)),
+ (const void __user *) data);
+#endif
#ifdef CONFIG_SPE
case PTRACE_GETEVRREGS:
/* Get the child spe register state. */
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_32.c
@@ -378,6 +378,21 @@ static int save_user_regs(struct pt_regs
memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
if (__copy_to_user(&frame->mc_fregs, buf, ELF_NFPREG * sizeof(double)))
return 1;
+ /*
+ * Copy VSR 0-31 upper half from thread_struct to local
+ * buffer, then write that to userspace. Also set MSR_VSX in
+ * the saved MSR value to indicate that frame->mc_vregs
+ * contains valid data
+ */
+ if (current->thread.used_vsr) {
+ flush_vsx_to_thread(current);
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.fpr[i][TS_VSRLOWOFFSET];
+ if (__copy_to_user(&frame->mc_vsregs, buf,
+ ELF_NVSRHALFREG * sizeof(double)))
+ return 1;
+ msr |= MSR_VSX;
+ }
#else
/* save floating-point registers */
if (__copy_to_user(&frame->mc_fregs, current->thread.fpr,
@@ -482,6 +497,24 @@ static long restore_user_regs(struct pt_
for (i = 0; i < 32 ; i++)
current->thread.TS_FPR(i) = buf[i];
memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+ /*
+ * Force the process to reload the VSX registers from
+ * current->thread when it next executes a VSX instruction.
+ */
+ regs->msr &= ~MSR_VSX;
+ if (msr & MSR_VSX) {
+ /*
+ * Restore VSX registers from the stack to a local
+ * buffer, then write this out to the thread_struct.
+ */
+ if (__copy_from_user(buf, &sr->mc_vsregs,
+ sizeof(sr->mc_vsregs)))
+ return 1;
+ for (i = 0; i < 32 ; i++)
+ current->thread.fpr[i][TS_VSRLOWOFFSET] = buf[i];
+ } else if (current->thread.used_vsr)
+ for (i = 0; i < 32 ; i++)
+ current->thread.fpr[i][TS_VSRLOWOFFSET] = 0;
#else
if (__copy_from_user(current->thread.fpr, &sr->mc_fregs,
sizeof(sr->mc_fregs)))
Index: linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/signal_64.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/signal_64.c
@@ -123,6 +123,22 @@ static long setup_sigcontext(struct sigc
buf[i] = current->thread.TS_FPR(i);
memcpy(&buf[i], &current->thread.fpscr, sizeof(double));
err |= __copy_to_user(&sc->fp_regs, buf, FP_REGS_SIZE);
+ /*
+ * Copy VSX low doubleword to local buffer for formatting,
+ * then out to userspace. Update v_regs to point after the
+ * VMX data.
+ */
+ if (current->thread.used_vsr) {
+ flush_vsx_to_thread(current);
+ v_regs += ELF_NVRREG;
+ for (i = 0; i < 32 ; i++)
+ buf[i] = current->thread.fpr[i][TS_VSRLOWOFFSET];
+ err |= __copy_to_user(v_regs, buf, 32 * sizeof(double));
+ /* set MSR_VSX in the MSR value in the frame to
+ * indicate that sc->vs_regs contains valid data.
+ */
+ msr |= MSR_VSX;
+ }
#else /* CONFIG_VSX */
/* copy fpr regs and fpscr */
err |= __copy_to_user(&sc->fp_regs, &current->thread.fpr, FP_REGS_SIZE);
@@ -197,7 +213,7 @@ static long restore_sigcontext(struct pt
* This has to be done before copying stuff into current->thread.fpr/vr
* for the reasons explained in the previous comment.
*/
- regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
+ regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC | MSR_VSX);
#ifdef CONFIG_ALTIVEC
err |= __get_user(v_regs, &sc->v_regs);
@@ -226,6 +242,19 @@ static long restore_sigcontext(struct pt
current->thread.TS_FPR(i) = buf[i];
memcpy(&current->thread.fpscr, &buf[i], sizeof(double));
+ /*
+ * Get additional VSX data. Update v_regs to point after the
+ * VMX data. Copy VSX low doubleword from userspace to local
+ * buffer for formatting, then into the taskstruct.
+ */
+ v_regs += ELF_NVRREG;
+ if ((msr & MSR_VSX) != 0)
+ err |= __copy_from_user(buf, v_regs, 32 * sizeof(double));
+ else
+ memset(buf, 0, 32 * sizeof(double));
+
+ for (i = 0; i < 32 ; i++)
+ current->thread.fpr[i][TS_VSRLOWOFFSET] = buf[i];
#else
err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
#endif
Index: linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/traps.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/traps.c
@@ -967,6 +967,20 @@ void altivec_unavailable_exception(struc
die("Unrecoverable VMX/Altivec Unavailable Exception", regs, SIGABRT);
}
+void vsx_unavailable_exception(struct pt_regs *regs)
+{
+ if (user_mode(regs)) {
+ /* A user program has executed a VSX instruction,
+ but this kernel doesn't support VSX. */
+ _exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+ return;
+ }
+
+ printk(KERN_EMERG "Unrecoverable VSX Unavailable Exception "
+ "%lx at %lx\n", regs->trap, regs->nip);
+ die("Unrecoverable VSX Unavailable Exception", regs, SIGABRT);
+}
+
void performance_monitor_exception(struct pt_regs *regs)
{
perf_irq(regs);
@@ -1091,6 +1105,21 @@ void altivec_assist_exception(struct pt_
}
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+void vsx_assist_exception(struct pt_regs *regs)
+{
+ if (!user_mode(regs)) {
+ printk(KERN_EMERG "VSX assist exception in kernel mode"
+ " at %lx\n", regs->nip);
+ die("Kernel VSX assist exception", regs, SIGILL);
+ }
+
+ flush_vsx_to_thread(current);
+ printk(KERN_INFO "VSX assist not supported at %lx\n", regs->nip);
+ _exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
+}
+#endif /* CONFIG_VSX */
+
#ifdef CONFIG_FSL_BOOKE
void CacheLockingException(struct pt_regs *regs, unsigned long address,
unsigned long error_code)
Index: linux-2.6-ozlabs/include/asm-powerpc/elf.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/elf.h
+++ linux-2.6-ozlabs/include/asm-powerpc/elf.h
@@ -109,6 +109,7 @@ typedef elf_gregset_t32 compat_elf_gregs
#ifdef __powerpc64__
# define ELF_NVRREG32 33 /* includes vscr & vrsave stuffed together */
# define ELF_NVRREG 34 /* includes vscr & vrsave in split vectors */
+# define ELF_NVSRHALFREG 32 /* Half the vsx registers */
# define ELF_GREG_TYPE elf_greg_t64
#else
# define ELF_NEVRREG 34 /* includes acc (as 2) */
@@ -158,6 +159,7 @@ typedef __vector128 elf_vrreg_t;
typedef elf_vrreg_t elf_vrregset_t[ELF_NVRREG];
#ifdef __powerpc64__
typedef elf_vrreg_t elf_vrregset_t32[ELF_NVRREG32];
+typedef elf_fpreg_t elf_vsrreghalf_t32[ELF_NVSRHALFREG];
#endif
#ifdef __KERNEL__
@@ -219,8 +221,8 @@ extern int dump_task_fpu(struct task_str
typedef elf_vrregset_t elf_fpxregset_t;
#ifdef CONFIG_ALTIVEC
-extern int dump_task_altivec(struct task_struct *, elf_vrregset_t *vrregs);
-#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_altivec(tsk, regs)
+extern int dump_task_vector(struct task_struct *, elf_vrregset_t *vrregs);
+#define ELF_CORE_COPY_XFPREGS(tsk, regs) dump_task_vector(tsk, regs)
#define ELF_CORE_XFPREG_TYPE NT_PPC_VMX
#endif
Index: linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ptrace.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ptrace.h
@@ -223,6 +223,14 @@ extern void user_disable_single_step(str
#define PT_VRSAVE_32 (PT_VR0 + 33*4)
#endif
+/*
+ * Only store the first 32 VSRs here; the second 32 VSRs overlap VR0-31.
+ */
+#define PT_VSR0 150 /* each VSR reg occupies 2 slots in 64-bit */
+#define PT_VSR31 (PT_VSR0 + 2*31)
+#ifdef __KERNEL__
+#define PT_VSR0_32 300 /* each VSR reg occupies 4 slots in 32-bit */
+#endif
#endif /* __powerpc64__ */
/*
@@ -245,6 +253,10 @@ extern void user_disable_single_step(str
#define PTRACE_GETEVRREGS 20
#define PTRACE_SETEVRREGS 21
+/* Get the first 32 128bit VSX registers */
+#define PTRACE_GETVSRREGS 27
+#define PTRACE_SETVSRREGS 28
+
/*
* Get or set a debug register. The first 16 are DABR registers and the
* second 16 are IABR registers.
Index: linux-2.6-ozlabs/include/asm-powerpc/reg.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/reg.h
+++ linux-2.6-ozlabs/include/asm-powerpc/reg.h
@@ -30,6 +30,7 @@
#define MSR_ISF_LG 61 /* Interrupt 64b mode valid on 630 */
#define MSR_HV_LG 60 /* Hypervisor state */
#define MSR_VEC_LG 25 /* Enable AltiVec */
+#define MSR_VSX_LG 23 /* Enable VSX */
#define MSR_POW_LG 18 /* Enable Power Management */
#define MSR_WE_LG 18 /* Wait State Enable */
#define MSR_TGPR_LG 17 /* TLB Update registers in use */
@@ -71,6 +72,7 @@
#endif
#define MSR_VEC __MASK(MSR_VEC_LG) /* Enable AltiVec */
+#define MSR_VSX __MASK(MSR_VSX_LG) /* Enable VSX */
#define MSR_POW __MASK(MSR_POW_LG) /* Enable Power Management */
#define MSR_WE __MASK(MSR_WE_LG) /* Wait State Enable */
#define MSR_TGPR __MASK(MSR_TGPR_LG) /* TLB Update registers in use */
Index: linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/sigcontext.h
+++ linux-2.6-ozlabs/include/asm-powerpc/sigcontext.h
@@ -43,9 +43,44 @@ struct sigcontext {
* it must be copied via a vector register to/from storage) or as a word.
* The entry with index 33 contains the vrsave as the first word (offset 0)
* within the quadword.
+ *
+ * Part of the VSX data is also stored here by extending vmx_reserve
+ * by an additional 32 double words. Architecturally the layout of
+ * the VSR registers and how they overlap on top of the legacy FPR and
+ * VR registers is shown below:
+ *
+ * VSR doubleword 0 VSR doubleword 1
+ * ----------------------------------------------------------------
+ * VSR[0] | FPR[0] | |
+ * ----------------------------------------------------------------
+ * VSR[1] | FPR[1] | |
+ * ----------------------------------------------------------------
+ * | ... | |
+ * | ... | |
+ * ----------------------------------------------------------------
+ * VSR[30] | FPR[30] | |
+ * ----------------------------------------------------------------
+ * VSR[31] | FPR[31] | |
+ * ----------------------------------------------------------------
+ * VSR[32] | VR[0] |
+ * ----------------------------------------------------------------
+ * VSR[33] | VR[1] |
+ * ----------------------------------------------------------------
+ * | ... |
+ * | ... |
+ * ----------------------------------------------------------------
+ * VSR[62] | VR[30] |
+ * ----------------------------------------------------------------
+ * VSR[63] | VR[31] |
+ * ----------------------------------------------------------------
+ *
+ * FPR/VSR 0-31 doubleword 0 is stored in fp_regs, and VMX/VSR 32-63
+ * is stored at the start of vmx_reserve. vmx_reserve is extended for
+ * backwards compatibility to store VSR 0-31 doubleword 1 after the VMX
+ * registers and vscr/vrsave.
*/
elf_vrreg_t __user *v_regs;
- long vmx_reserve[ELF_NVRREG+ELF_NVRREG+1];
+ long vmx_reserve[ELF_NVRREG+ELF_NVRREG+32+1];
#endif
};
Index: linux-2.6-ozlabs/include/asm-powerpc/system.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/system.h
+++ linux-2.6-ozlabs/include/asm-powerpc/system.h
@@ -132,6 +132,7 @@ extern void enable_kernel_altivec(void);
extern void giveup_altivec(struct task_struct *);
extern void load_up_altivec(struct task_struct *);
extern int emulate_altivec(struct pt_regs *);
+extern void giveup_vsx(struct task_struct *);
extern void enable_kernel_spe(void);
extern void giveup_spe(struct task_struct *);
extern void load_up_spe(struct task_struct *);
@@ -155,6 +156,14 @@ static inline void flush_altivec_to_thre
}
#endif
+#ifdef CONFIG_VSX
+extern void flush_vsx_to_thread(struct task_struct *);
+#else
+static inline void flush_vsx_to_thread(struct task_struct *t)
+{
+}
+#endif
+
#ifdef CONFIG_SPE
extern void flush_spe_to_thread(struct task_struct *);
#else
^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH 9/9] powerpc: Add CONFIG_VSX config option
2008-06-25 4:07 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (7 preceding siblings ...)
2008-06-25 4:07 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
@ 2008-06-25 4:07 ` Michael Neuling
8 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-25 4:07 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Add the CONFIG_VSX build option. It depends on POWER4, PPC_FPU and ALTIVEC.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/platforms/Kconfig.cputype | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
Index: linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/platforms/Kconfig.cputype
+++ linux-2.6-ozlabs/arch/powerpc/platforms/Kconfig.cputype
@@ -155,6 +155,22 @@ config ALTIVEC
If in doubt, say Y here.
+config VSX
+ bool "VSX Support"
+ depends on POWER4 && ALTIVEC && PPC_FPU
+ ---help---
+
+ This option enables kernel support for the Vector Scalar extensions
+ to the PowerPC processor. The kernel currently supports saving and
+ restoring VSX registers, and turning on the 'VSX enable' bit so user
+ processes can execute VSX instructions.
+
+ This option is only useful if you have a processor that supports
+ VSX (POWER7 and above), but it does not have any effect on non-VSX
+ CPUs (it does, however, add code to the kernel).
+
+ If in doubt, say Y here.
+
config SPE
bool "SPE Support"
depends on E200 || E500
* [PATCH 6/9] powerpc: Add VSX CPU feature
2008-06-25 4:07 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
` (6 preceding siblings ...)
2008-06-25 4:07 ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
@ 2008-06-25 4:07 ` Michael Neuling
2008-06-25 4:07 ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
8 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-25 4:07 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
Add a VSX CPU feature. Also add code to detect if VSX is available
from the device tree.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
---
arch/powerpc/kernel/prom.c | 4 ++++
include/asm-powerpc/cputable.h | 15 ++++++++++++++-
2 files changed, 18 insertions(+), 1 deletion(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/prom.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/prom.c
@@ -609,6 +609,10 @@ static struct feature_property {
{"altivec", 0, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
{"ibm,vmx", 1, CPU_FTR_ALTIVEC, PPC_FEATURE_HAS_ALTIVEC},
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_VSX
+ /* Yes, this _really_ is ibm,vmx == 2 to enable VSX */
+ {"ibm,vmx", 2, CPU_FTR_VSX, PPC_FEATURE_HAS_VSX},
+#endif /* CONFIG_VSX */
#ifdef CONFIG_PPC64
{"ibm,dfp", 1, 0, PPC_FEATURE_HAS_DFP},
{"ibm,purr", 1, CPU_FTR_PURR, 0},
Index: linux-2.6-ozlabs/include/asm-powerpc/cputable.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/cputable.h
+++ linux-2.6-ozlabs/include/asm-powerpc/cputable.h
@@ -27,6 +27,7 @@
#define PPC_FEATURE_HAS_DFP 0x00000400
#define PPC_FEATURE_POWER6_EXT 0x00000200
#define PPC_FEATURE_ARCH_2_06 0x00000100
+#define PPC_FEATURE_HAS_VSX 0x00000080
#define PPC_FEATURE_TRUE_LE 0x00000002
#define PPC_FEATURE_PPC_LE 0x00000001
@@ -181,6 +182,7 @@ extern void do_feature_fixups(unsigned l
#define CPU_FTR_DSCR LONG_ASM_CONST(0x0002000000000000)
#define CPU_FTR_1T_SEGMENT LONG_ASM_CONST(0x0004000000000000)
#define CPU_FTR_NO_SLBIE_B LONG_ASM_CONST(0x0008000000000000)
+#define CPU_FTR_VSX LONG_ASM_CONST(0x0010000000000000)
#ifndef __ASSEMBLY__
@@ -199,6 +201,17 @@ extern void do_feature_fixups(unsigned l
#define PPC_FEATURE_HAS_ALTIVEC_COMP 0
#endif
+/* We only set the VSX features if the kernel was compiled with VSX
+ * support
+ */
+#ifdef CONFIG_VSX
+#define CPU_FTR_VSX_COMP CPU_FTR_VSX
+#define PPC_FEATURE_HAS_VSX_COMP PPC_FEATURE_HAS_VSX
+#else
+#define CPU_FTR_VSX_COMP 0
+#define PPC_FEATURE_HAS_VSX_COMP 0
+#endif
+
/* We only set the spe features if the kernel was compiled with spe
* support
*/
@@ -399,7 +412,7 @@ extern void do_feature_fixups(unsigned l
(CPU_FTRS_POWER3 | CPU_FTRS_RS64 | CPU_FTRS_POWER4 | \
CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | CPU_FTRS_POWER6 | \
CPU_FTRS_POWER7 | CPU_FTRS_CELL | CPU_FTRS_PA6T | \
- CPU_FTR_1T_SEGMENT)
+ CPU_FTR_1T_SEGMENT | CPU_FTR_VSX)
#else
enum {
CPU_FTRS_POSSIBLE =
* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
2008-06-25 4:07 ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
@ 2008-06-25 14:08 ` Kumar Gala
2008-06-25 15:34 ` Scott Wood
0 siblings, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-25 14:08 UTC (permalink / raw)
To: Michael Neuling; +Cc: linuxppc-dev, Paul Mackerras
>
> Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
> ===================================================================
> --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace32.c
> +++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
> @@ -64,6 +64,11 @@ static long compat_ptrace_old(struct tas
> return -EPERM;
> }
>
> +/* Macros to workout the correct index for the FPR in the thread
> struct */
> +#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
> +#define FPRHALF(i) (((i) - PT_FPR0) % 2)
Have you looked at what the compiler spits out here to make sure we
aren't getting a divide? Seems like we could use '& 0x1'.
> +#define FPRINDEX(i) TS_FPRWIDTH * FPRNUMBER(i) + FPRHALF(i)
>
> +
> long compat_arch_ptrace(struct task_struct *child, compat_long_t
> request,
> compat_ulong_t caddr, compat_ulong_t cdata)
> {
> @@ -122,7 +127,8 @@ long compat_arch_ptrace(struct task_stru
> * to be an array of unsigned int (32 bits) - the
> * index passed in is based on this assumption.
> */
> - tmp = ((unsigned int *)child->thread.fpr)[index - PT_FPR0];
> + tmp = ((unsigned int *)child->thread.fpr)
> + [FPRINDEX(index)];
> }
> ret = put_user((unsigned int)tmp, (u32 __user *)data);
> break;
> @@ -162,7 +168,8 @@ long compat_arch_ptrace(struct task_stru
> CHECK_FULL_REGS(child->thread.regs);
> if (numReg >= PT_FPR0) {
> flush_fp_to_thread(child);
> - tmp = ((unsigned long int *)child->thread.fpr)[numReg - PT_FPR0];
> + tmp = ((unsigned long int *)child->thread.fpr)
> + [FPRINDEX(numReg)];
> } else { /* register within PT_REGS struct */
> tmp = ptrace_get_reg(child, numReg);
> }
> @@ -217,7 +224,8 @@ long compat_arch_ptrace(struct task_stru
> * to be an array of unsigned int (32 bits) - the
> * index passed in is based on this assumption.
> */
> - ((unsigned int *)child->thread.fpr)[index - PT_FPR0] = data;
> + ((unsigned int *)child->thread.fpr)
> + [FPRINDEX(index)] = data;
> ret = 0;
> }
> break;
- k
* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
2008-06-25 14:08 ` Kumar Gala
@ 2008-06-25 15:34 ` Scott Wood
2008-06-25 16:12 ` Gabriel Paubert
0 siblings, 1 reply; 106+ messages in thread
From: Scott Wood @ 2008-06-25 15:34 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-dev, Michael Neuling, Paul Mackerras
Kumar Gala wrote:
>> +/* Macros to workout the correct index for the FPR in the thread
>> struct */
>> +#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
>> +#define FPRHALF(i) (((i) - PT_FPR0) % 2)
>
> Have you looked at what the compiler spits out here to make sure we
> aren't getting a divide? Seems like we could use '& 0x1'.
GCC's not *that* dumb. However, you may get some unnecessary
sign-twiddling if "i" is signed.
-Scott
* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
2008-06-25 15:34 ` Scott Wood
@ 2008-06-25 16:12 ` Gabriel Paubert
2008-06-25 16:17 ` Scott Wood
2008-06-25 17:08 ` Andreas Schwab
0 siblings, 2 replies; 106+ messages in thread
From: Gabriel Paubert @ 2008-06-25 16:12 UTC (permalink / raw)
To: Scott Wood; +Cc: linuxppc-dev, Michael Neuling, Paul Mackerras
On Wed, Jun 25, 2008 at 10:34:32AM -0500, Scott Wood wrote:
> Kumar Gala wrote:
> >>+/* Macros to workout the correct index for the FPR in the thread
> >>struct */
> >>+#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
> >>+#define FPRHALF(i) (((i) - PT_FPR0) % 2)
> >
> >Have you looked at what the compiler spits out here to make sure we
> >aren't getting a divide? Seems like we could use '& 0x1'.
>
> GCC's not *that* dumb. However, you may get some unnecessary
> sign-twiddling if "i" is signed.
Not for modulo 2, it's only an even/odd choice and GCC
implements that efficiently IIRC. For other powers of 2,
making the left hand side unsigned helps the compiler.
The right shift OTOH might be faster if "i" is unsigned
since signed right shifts affect the carry on PPC (I really
don't know if srawi is slower than srwi on some processors,
srwi is a form of rlwinm which is always fast).
Gabriel
* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
2008-06-25 16:12 ` Gabriel Paubert
@ 2008-06-25 16:17 ` Scott Wood
2008-06-25 17:07 ` Kumar Gala
2008-06-26 10:44 ` [PATCH 2/9] " Gabriel Paubert
2008-06-25 17:08 ` Andreas Schwab
1 sibling, 2 replies; 106+ messages in thread
From: Scott Wood @ 2008-06-25 16:17 UTC (permalink / raw)
To: Gabriel Paubert; +Cc: linuxppc-dev, Michael Neuling, Paul Mackerras
Gabriel Paubert wrote:
> On Wed, Jun 25, 2008 at 10:34:32AM -0500, Scott Wood wrote:
>> Kumar Gala wrote:
>>>> +/* Macros to workout the correct index for the FPR in the thread
>>>> struct */
>>>> +#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
>>>> +#define FPRHALF(i) (((i) - PT_FPR0) % 2)
>>> Have you looked at what the compiler spits out here to make sure we
>>> aren't getting a divide? Seems like we could use '& 0x1'.
>> GCC's not *that* dumb. However, you may get some unnecessary
>> sign-twiddling if "i" is signed.
>
> Not for modulo 2, it's only an even/odd choice and GCC
> implements that efficiently IIRC. For other powers of 2,
> making the left hand side unsigned helps the compiler.
From this:
int foo(int x)
{
return x % 2;
}
I get this with -O3:
foo:
mr 0,3
srawi 3,3,1
addze 3,3
slwi 3,3,1
subf 3,3,0
blr
.size foo, .-foo
.ident "GCC: (GNU) 4.1.2"
Changing it to "x & 1", or to unsigned, gives this:
foo:
rlwinm 3,3,0,31,31
blr
.size foo, .-foo
.ident "GCC: (GNU) 4.1.2"
Maybe newer GCCs are better?
-Scott
* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
2008-06-25 16:17 ` Scott Wood
@ 2008-06-25 17:07 ` Kumar Gala
2008-06-26 0:09 ` Michael Neuling
2008-06-26 10:44 ` [PATCH 2/9] " Gabriel Paubert
1 sibling, 1 reply; 106+ messages in thread
From: Kumar Gala @ 2008-06-25 17:07 UTC (permalink / raw)
To: Scott Wood; +Cc: linuxppc-dev, Michael Neuling, Paul Mackerras
On Jun 25, 2008, at 11:17 AM, Scott Wood wrote:
> Gabriel Paubert wrote:
>> On Wed, Jun 25, 2008 at 10:34:32AM -0500, Scott Wood wrote:
>>> Kumar Gala wrote:
>>>>> +/* Macros to workout the correct index for the FPR in the
>>>>> thread struct */
>>>>> +#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
>>>>> +#define FPRHALF(i) (((i) - PT_FPR0) % 2)
>>>> Have you looked at what the compiler spits out here to make sure
>>>> we aren't getting a divide? Seems like we could use '& 0x1'.
>>> GCC's not *that* dumb. However, you may get some unnecessary sign-
>>> twiddling if "i" is signed.
>> Not for modulo 2, it's only an even/odd choice and GCC implements
>> that efficiently IIRC. For other powers of 2,
>> making the left hand side unsigned helps the compiler.
>
> From this:
>
> int foo(int x)
> {
> return x % 2;
> }
>
> I get this with -O3:
>
> foo:
> mr 0,3
> srawi 3,3,1
> addze 3,3
> slwi 3,3,1
> subf 3,3,0
> blr
> .size foo, .-foo
> .ident "GCC: (GNU) 4.1.2"
>
> Changing it to "x & 1", or to unsigned, gives this:
>
> foo:
> rlwinm 3,3,0,31,31
> blr
> .size foo, .-foo
> .ident "GCC: (GNU) 4.1.2"
>
> Maybe newer GCCs are better?
Nope. gcc-4.3.0 from fedora 9:
foo:
mr 0,3
srawi 3,3,1
addze 3,3
slwi 3,3,1
subf 3,3,0
blr
bar:
rlwinm 3,3,0,31,31
blr
if you make 'x' unsigned things are better.
- k
* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
2008-06-25 16:12 ` Gabriel Paubert
2008-06-25 16:17 ` Scott Wood
@ 2008-06-25 17:08 ` Andreas Schwab
1 sibling, 0 replies; 106+ messages in thread
From: Andreas Schwab @ 2008-06-25 17:08 UTC (permalink / raw)
To: Gabriel Paubert; +Cc: Scott Wood, linuxppc-dev, Michael Neuling, Paul Mackerras
Gabriel Paubert <paubert@iram.es> writes:
> On Wed, Jun 25, 2008 at 10:34:32AM -0500, Scott Wood wrote:
>> Kumar Gala wrote:
>> >>+/* Macros to workout the correct index for the FPR in the thread
>> >>struct */
>> >>+#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
>> >>+#define FPRHALF(i) (((i) - PT_FPR0) % 2)
>> >
>> >Have you looked at what the compiler spits out here to make sure we
>> >aren't getting a divide? Seems like we could use '& 0x1'.
>>
>> GCC's not *that* dumb. However, you may get some unnecessary
>> sign-twiddling if "i" is signed.
>
> Not for modulo 2, it's only an even/odd choice
That's wrong. -1 % 2 == -1, 1 % 2 == 1.
Andreas.
--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
2008-06-25 17:07 ` Kumar Gala
@ 2008-06-26 0:09 ` Michael Neuling
2008-06-26 7:07 ` [PATCH] " Michael Neuling
0 siblings, 1 reply; 106+ messages in thread
From: Michael Neuling @ 2008-06-26 0:09 UTC (permalink / raw)
To: Kumar Gala; +Cc: Scott Wood, linuxppc-dev, Paul Mackerras
In message <1DD06CDB-428E-4832-93CA-6F0404CA6692@kernel.crashing.org> you wrote:
>
> On Jun 25, 2008, at 11:17 AM, Scott Wood wrote:
>
> > Gabriel Paubert wrote:
> >> On Wed, Jun 25, 2008 at 10:34:32AM -0500, Scott Wood wrote:
> >>> Kumar Gala wrote:
> >>>>> +/* Macros to workout the correct index for the FPR in the
> >>>>> thread struct */
> >>>>> +#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
> >>>>> +#define FPRHALF(i) (((i) - PT_FPR0) % 2)
> >>>> Have you looked at what the compiler spits out here to make sure
> >>>> we aren't getting a divide? Seems like we could use '& 0x1'.
> >>> GCC's not *that* dumb. However, you may get some unnecessary sign-
> >>> twiddling if "i" is signed.
> >> Not for modulo 2, it's only an even/odd choice and GCC implements
> >> that efficiently IIRC. For other powers of 2,
> >> making the left hand side unsigned helps the compiler.
> >
> > From this:
> >
> > int foo(int x)
> > {
> > return x % 2;
> > }
> >
> > I get this with -O3:
> >
> > foo:
> > mr 0,3
> > srawi 3,3,1
> > addze 3,3
> > slwi 3,3,1
> > subf 3,3,0
> > blr
> > .size foo, .-foo
> > .ident "GCC: (GNU) 4.1.2"
> >
> > Changing it to "x & 1", or to unsigned, gives this:
> >
> > foo:
> > rlwinm 3,3,0,31,31
> > blr
> > .size foo, .-foo
> > .ident "GCC: (GNU) 4.1.2"
> >
> > Maybe newer GCCs are better?
>
> Nope. gcc-4.3.0 from fedora 9:
>
> foo:
> mr 0,3
> srawi 3,3,1
> addze 3,3
> slwi 3,3,1
> subf 3,3,0
> blr
>
> bar:
> rlwinm 3,3,0,31,31
> blr
>
> if you make 'x' unsigned things are better.
I've changed it to '& 0x1', which compiles to something better here.
Mikey
* [PATCH] powerpc: Add macros to access floating point registers in thread_struct.
2008-06-26 0:09 ` Michael Neuling
@ 2008-06-26 7:07 ` Michael Neuling
0 siblings, 0 replies; 106+ messages in thread
From: Michael Neuling @ 2008-06-26 7:07 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev
We are going to change where the floating point registers are stored
in the thread_struct, so in preparation add some macros to access the
floating point registers. Update all code to use these new macros.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
Changes '% 2' to '& 1' as noticed by Kumar
---
arch/powerpc/kernel/align.c | 6 ++--
arch/powerpc/kernel/process.c | 2 -
arch/powerpc/kernel/ptrace.c | 10 ++++--
arch/powerpc/kernel/ptrace32.c | 14 +++++++--
arch/powerpc/kernel/softemu8xx.c | 4 +-
arch/powerpc/math-emu/math.c | 56 +++++++++++++++++++--------------------
include/asm-powerpc/ppc_asm.h | 5 ++-
include/asm-powerpc/processor.h | 4 ++
8 files changed, 58 insertions(+), 43 deletions(-)
Index: linux-2.6-ozlabs/arch/powerpc/kernel/align.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/align.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/align.c
@@ -366,7 +366,7 @@ static int emulate_multiple(struct pt_re
static int emulate_fp_pair(struct pt_regs *regs, unsigned char __user *addr,
unsigned int reg, unsigned int flags)
{
- char *ptr = (char *) &current->thread.fpr[reg];
+ char *ptr = (char *) &current->thread.TS_FPR(reg);
int i, ret;
if (!(flags & F))
@@ -784,7 +784,7 @@ int fix_alignment(struct pt_regs *regs)
return -EFAULT;
}
} else if (flags & F) {
- data.dd = current->thread.fpr[reg];
+ data.dd = current->thread.TS_FPR(reg);
if (flags & S) {
/* Single-precision FP store requires conversion... */
#ifdef CONFIG_PPC_FPU
@@ -862,7 +862,7 @@ int fix_alignment(struct pt_regs *regs)
if (unlikely(ret))
return -EFAULT;
} else if (flags & F)
- current->thread.fpr[reg] = data.dd;
+ current->thread.TS_FPR(reg) = data.dd;
else
regs->gpr[reg] = data.ll;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -110,7 +110,7 @@ int dump_task_fpu(struct task_struct *ts
return 0;
flush_fp_to_thread(current);
- memcpy(fpregs, &tsk->thread.fpr[0], sizeof(*fpregs));
+ memcpy(fpregs, &tsk->thread.TS_FPR(0), sizeof(*fpregs));
return 1;
}
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace.c
@@ -218,7 +218,7 @@ static int fpr_get(struct task_struct *t
flush_fp_to_thread(target);
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
- offsetof(struct thread_struct, fpr[32]));
+ offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
&target->thread.fpr, 0, -1);
@@ -231,7 +231,7 @@ static int fpr_set(struct task_struct *t
flush_fp_to_thread(target);
BUILD_BUG_ON(offsetof(struct thread_struct, fpscr) !=
- offsetof(struct thread_struct, fpr[32]));
+ offsetof(struct thread_struct, TS_FPR(32)));
return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
&target->thread.fpr, 0, -1);
@@ -728,7 +728,8 @@ long arch_ptrace(struct task_struct *chi
tmp = ptrace_get_reg(child, (int) index);
} else {
flush_fp_to_thread(child);
- tmp = ((unsigned long *)child->thread.fpr)[index - PT_FPR0];
+ tmp = ((unsigned long *)child->thread.fpr)
+ [TS_FPRWIDTH * (index - PT_FPR0)];
}
ret = put_user(tmp,(unsigned long __user *) data);
break;
@@ -755,7 +756,8 @@ long arch_ptrace(struct task_struct *chi
ret = ptrace_put_reg(child, index, data);
} else {
flush_fp_to_thread(child);
- ((unsigned long *)child->thread.fpr)[index - PT_FPR0] = data;
+ ((unsigned long *)child->thread.fpr)
+ [TS_FPRWIDTH * (index - PT_FPR0)] = data;
ret = 0;
}
break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/ptrace32.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/ptrace32.c
@@ -64,6 +64,11 @@ static long compat_ptrace_old(struct tas
return -EPERM;
}
+/* Macros to work out the correct index for the FPR in the thread struct */
+#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
+#define FPRHALF(i) (((i) - PT_FPR0) & 1)
+#define FPRINDEX(i) TS_FPRWIDTH * FPRNUMBER(i) + FPRHALF(i)
+
long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
compat_ulong_t caddr, compat_ulong_t cdata)
{
@@ -122,7 +127,8 @@ long compat_arch_ptrace(struct task_stru
* to be an array of unsigned int (32 bits) - the
* index passed in is based on this assumption.
*/
- tmp = ((unsigned int *)child->thread.fpr)[index - PT_FPR0];
+ tmp = ((unsigned int *)child->thread.fpr)
+ [FPRINDEX(index)];
}
ret = put_user((unsigned int)tmp, (u32 __user *)data);
break;
@@ -162,7 +168,8 @@ long compat_arch_ptrace(struct task_stru
CHECK_FULL_REGS(child->thread.regs);
if (numReg >= PT_FPR0) {
flush_fp_to_thread(child);
- tmp = ((unsigned long int *)child->thread.fpr)[numReg - PT_FPR0];
+ tmp = ((unsigned long int *)child->thread.fpr)
+ [FPRINDEX(numReg)];
} else { /* register within PT_REGS struct */
tmp = ptrace_get_reg(child, numReg);
}
@@ -217,7 +224,8 @@ long compat_arch_ptrace(struct task_stru
* to be an array of unsigned int (32 bits) - the
* index passed in is based on this assumption.
*/
- ((unsigned int *)child->thread.fpr)[index - PT_FPR0] = data;
+ ((unsigned int *)child->thread.fpr)
+ [FPRINDEX(index)] = data;
ret = 0;
}
break;
Index: linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/softemu8xx.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/softemu8xx.c
@@ -124,7 +124,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
disp = instword & 0xffff;
ea = (u32 *)(regs->gpr[idxreg] + disp);
- ip = (u32 *)&current->thread.fpr[flreg];
+ ip = (u32 *)&current->thread.TS_FPR(flreg);
switch ( inst )
{
@@ -168,7 +168,7 @@ int Soft_emulate_8xx(struct pt_regs *reg
break;
case FMR:
/* assume this is a fp move -- Cort */
- memcpy(ip, &current->thread.fpr[(instword>>11)&0x1f],
+ memcpy(ip, &current->thread.TS_FPR((instword>>11)&0x1f),
sizeof(double));
break;
default:
Index: linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
===================================================================
--- linux-2.6-ozlabs.orig/arch/powerpc/math-emu/math.c
+++ linux-2.6-ozlabs/arch/powerpc/math-emu/math.c
@@ -230,14 +230,14 @@ do_mathemu(struct pt_regs *regs)
case LFD:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
lfd(op0, op1, op2, op3);
break;
case LFDU:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
lfd(op0, op1, op2, op3);
regs->gpr[idx] = (unsigned long)op1;
@@ -245,21 +245,21 @@ do_mathemu(struct pt_regs *regs)
case STFD:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
stfd(op0, op1, op2, op3);
break;
case STFDU:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
stfd(op0, op1, op2, op3);
regs->gpr[idx] = (unsigned long)op1;
break;
case OP63:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
fmr(op0, op1, op2, op3);
break;
default:
@@ -356,28 +356,28 @@ do_mathemu(struct pt_regs *regs)
switch (type) {
case AB:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
case AC:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op2 = (void *)&current->thread.fpr[(insn >> 6) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op2 = (void *)&current->thread.TS_FPR((insn >> 6) & 0x1f);
break;
case ABC:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op2 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
- op3 = (void *)&current->thread.fpr[(insn >> 6) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op2 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
+ op3 = (void *)&current->thread.TS_FPR((insn >> 6) & 0x1f);
break;
case D:
idx = (insn >> 16) & 0x1f;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0) + sdisp);
break;
@@ -387,27 +387,27 @@ do_mathemu(struct pt_regs *regs)
goto illegal;
sdisp = (insn & 0xffff);
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)(regs->gpr[idx] + sdisp);
break;
case X:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
break;
case XA:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
break;
case XB:
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
- op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
+ op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
case XE:
idx = (insn >> 16) & 0x1f;
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
if (!idx) {
if (((insn >> 1) & 0x3ff) == STFIWX)
op1 = (void *)(regs->gpr[(insn >> 11) & 0x1f]);
@@ -421,7 +421,7 @@ do_mathemu(struct pt_regs *regs)
case XEU:
idx = (insn >> 16) & 0x1f;
- op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
+ op0 = (void *)&current->thread.TS_FPR((insn >> 21) & 0x1f);
op1 = (void *)((idx ? regs->gpr[idx] : 0)
+ regs->gpr[(insn >> 11) & 0x1f]);
break;
@@ -429,8 +429,8 @@ do_mathemu(struct pt_regs *regs)
case XCR:
op0 = (void *)&regs->ccr;
op1 = (void *)((insn >> 23) & 0x7);
- op2 = (void *)&current->thread.fpr[(insn >> 16) & 0x1f];
- op3 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op2 = (void *)&current->thread.TS_FPR((insn >> 16) & 0x1f);
+ op3 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
case XCRL:
@@ -450,7 +450,7 @@ do_mathemu(struct pt_regs *regs)
case XFLB:
op0 = (void *)((insn >> 17) & 0xff);
- op1 = (void *)&current->thread.fpr[(insn >> 11) & 0x1f];
+ op1 = (void *)&current->thread.TS_FPR((insn >> 11) & 0x1f);
break;
default:
Index: linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/ppc_asm.h
+++ linux-2.6-ozlabs/include/asm-powerpc/ppc_asm.h
@@ -6,6 +6,7 @@
#include <linux/stringify.h>
#include <asm/asm-compat.h>
+#include <asm/processor.h>
#ifndef __ASSEMBLY__
#error __FILE__ should only be used in assembler files
@@ -83,13 +84,13 @@ END_FTR_SECTION_IFCLR(CPU_FTR_PURR);
#define REST_8GPRS(n, base) REST_4GPRS(n, base); REST_4GPRS(n+4, base)
#define REST_10GPRS(n, base) REST_8GPRS(n, base); REST_2GPRS(n+8, base)
-#define SAVE_FPR(n, base) stfd n,THREAD_FPR0+8*(n)(base)
+#define SAVE_FPR(n, base) stfd n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
#define SAVE_2FPRS(n, base) SAVE_FPR(n, base); SAVE_FPR(n+1, base)
#define SAVE_4FPRS(n, base) SAVE_2FPRS(n, base); SAVE_2FPRS(n+2, base)
#define SAVE_8FPRS(n, base) SAVE_4FPRS(n, base); SAVE_4FPRS(n+4, base)
#define SAVE_16FPRS(n, base) SAVE_8FPRS(n, base); SAVE_8FPRS(n+8, base)
#define SAVE_32FPRS(n, base) SAVE_16FPRS(n, base); SAVE_16FPRS(n+16, base)
-#define REST_FPR(n, base) lfd n,THREAD_FPR0+8*(n)(base)
+#define REST_FPR(n, base) lfd n,THREAD_FPR0+8*TS_FPRWIDTH*(n)(base)
#define REST_2FPRS(n, base) REST_FPR(n, base); REST_FPR(n+1, base)
#define REST_4FPRS(n, base) REST_2FPRS(n, base); REST_2FPRS(n+2, base)
#define REST_8FPRS(n, base) REST_4FPRS(n, base); REST_4FPRS(n+4, base)
Index: linux-2.6-ozlabs/include/asm-powerpc/processor.h
===================================================================
--- linux-2.6-ozlabs.orig/include/asm-powerpc/processor.h
+++ linux-2.6-ozlabs/include/asm-powerpc/processor.h
@@ -12,6 +12,8 @@
#include <asm/reg.h>
+#define TS_FPRWIDTH 1
+
#ifndef __ASSEMBLY__
#include <linux/compiler.h>
#include <asm/ptrace.h>
@@ -136,6 +138,8 @@ typedef struct {
unsigned long seg;
} mm_segment_t;
+#define TS_FPR(i) fpr[i]
+
struct thread_struct {
unsigned long ksp; /* Kernel stack pointer */
unsigned long ksp_limit; /* if ksp <= ksp_limit stack overflow */
* Re: [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.
2008-06-25 16:17 ` Scott Wood
2008-06-25 17:07 ` Kumar Gala
@ 2008-06-26 10:44 ` Gabriel Paubert
1 sibling, 0 replies; 106+ messages in thread
From: Gabriel Paubert @ 2008-06-26 10:44 UTC (permalink / raw)
To: Scott Wood; +Cc: linuxppc-dev, Michael Neuling, Paul Mackerras
On Wed, Jun 25, 2008 at 11:17:45AM -0500, Scott Wood wrote:
> Gabriel Paubert wrote:
> >On Wed, Jun 25, 2008 at 10:34:32AM -0500, Scott Wood wrote:
> >>Kumar Gala wrote:
> >>>>+/* Macros to workout the correct index for the FPR in the thread
> >>>>struct */
> >>>>+#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
> >>>>+#define FPRHALF(i) (((i) - PT_FPR0) % 2)
> >>>Have you looked at what the compiler spits out here to make sure we
> >>>aren't getting a divide? Seems like we could use '& 0x1'.
> >>GCC's not *that* dumb. However, you may get some unnecessary
> >>sign-twiddling if "i" is signed.
> >
> >Not for modulo 2, it's only an even/odd choice and GCC
> >implements that efficiently IIRC. For other powers of 2,
> >making the left hand side unsigned helps the compiler.
>
> From this:
>
> int foo(int x)
> {
> return x % 2;
> }
>
> I get this with -O3:
>
> foo:
> mr 0,3
> srawi 3,3,1
> addze 3,3
> slwi 3,3,1
> subf 3,3,0
> blr
> .size foo, .-foo
> .ident "GCC: (GNU) 4.1.2"
>
Indeed. Signed modulo results can be negative...
There are probably better ways to implement this case
on PPC, for example:
rlwinm tmp,input,4,27,28 ; make shift amount from LSB and MSB
lis result,0xff01
srw result,result,tmp
; result is now 0x00 for even, 0x01 for odd positive,
; and 0xff for odd negative
extsb result,result
No carry, shorter dependency length (although srw may be slow
on Cell it seems, but addze may be worse).
> Changing it to "x & 1", or to unsigned, gives this:
>
> foo:
> rlwinm 3,3,0,31,31
> blr
> .size foo, .-foo
> .ident "GCC: (GNU) 4.1.2"
>
> Maybe newer GCCs are better?
Nope, but unsigned is often better for the right shift.
Gabriel
Thread overview: 106+ messages
2008-06-18 0:47 [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
2008-06-18 0:47 ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
2008-06-18 0:47 ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
2008-06-18 0:47 ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
2008-06-18 19:35 ` Kumar Gala
2008-06-18 22:58 ` Paul Mackerras
2008-06-19 4:13 ` Kumar Gala
2008-06-19 4:30 ` Michael Neuling
2008-06-19 4:22 ` Kumar Gala
2008-06-19 4:35 ` Michael Neuling
2008-06-19 4:58 ` Kumar Gala
2008-06-19 5:37 ` Michael Neuling
2008-06-19 5:47 ` Kumar Gala
2008-06-19 6:01 ` Michael Neuling
2008-06-19 6:10 ` Kumar Gala
2008-06-19 9:33 ` Benjamin Herrenschmidt
2008-06-19 13:24 ` Kumar Gala
2008-06-18 0:47 ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
2008-06-18 0:47 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
2008-06-18 14:53 ` Kumar Gala
2008-06-18 23:55 ` Michael Neuling
2008-06-18 0:47 ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
2008-06-18 0:47 ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
2008-06-18 0:47 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
2008-06-18 16:28 ` Joel Schopp
2008-06-19 6:51 ` David Woodhouse
2008-06-19 7:00 ` Michael Neuling
2008-06-18 0:47 ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
2008-06-18 13:05 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Kumar Gala
2008-06-18 23:54 ` Michael Neuling
2008-06-20 4:13 ` Michael Neuling
2008-06-20 4:13 ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
2008-06-20 6:39 ` Kumar Gala
2008-06-22 11:29 ` Michael Neuling
2008-06-20 4:13 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
2008-06-20 6:35 ` Kumar Gala
2008-06-20 4:13 ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
2008-06-20 6:44 ` Kumar Gala
2008-06-20 4:13 ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
2008-06-20 4:13 ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
2008-06-20 4:13 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
2008-06-20 4:13 ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
2008-06-20 4:13 ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
2008-06-20 4:13 ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
2008-06-20 6:37 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Kumar Gala
2008-06-20 8:15 ` Michael Neuling
2008-06-23 5:31 ` Michael Neuling
2008-06-23 5:31 ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
2008-06-23 5:31 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
2008-06-23 5:31 ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
2008-06-23 5:31 ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
2008-06-23 5:31 ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
2008-06-23 5:31 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
2008-06-23 5:31 ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
2008-06-23 5:31 ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
2008-06-23 5:31 ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
2008-06-23 7:38 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
2008-06-23 7:38 ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
2008-06-23 7:38 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
2008-06-23 14:46 ` Kumar Gala
2008-06-23 7:38 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
2008-06-23 7:38 ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
2008-06-23 7:38 ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
2008-06-23 7:38 ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
2008-06-23 7:38 ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
2008-06-23 7:38 ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
2008-06-23 7:38 ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
2008-06-24 10:57 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
2008-06-24 10:57 ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
2008-06-24 10:57 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
2008-06-24 13:47 ` Kumar Gala
2008-06-24 10:57 ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
2008-06-24 14:07 ` Kumar Gala
2008-06-24 16:33 ` Segher Boessenkool
2008-06-25 0:25 ` Michael Neuling
2008-06-24 10:57 ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
2008-06-24 10:57 ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
2008-06-24 10:57 ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
2008-06-24 14:01 ` Kumar Gala
2008-06-24 10:57 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
2008-06-24 14:19 ` Kumar Gala
2008-06-24 10:57 ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling
2008-06-24 14:19 ` Kumar Gala
2008-06-24 10:57 ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
2008-06-24 14:06 ` Kumar Gala
2008-06-25 0:06 ` Michael Neuling
2008-06-25 2:19 ` Kumar Gala
2008-06-25 4:07 ` [PATCH 0/9] powerpc: Add kernel support for POWER7 VSX Michael Neuling
2008-06-25 4:07 ` [PATCH 1/9] powerpc: Fix msr setting in 32 bit signal code Michael Neuling
2008-06-25 4:07 ` [PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct Michael Neuling
2008-06-25 14:08 ` Kumar Gala
2008-06-25 15:34 ` Scott Wood
2008-06-25 16:12 ` Gabriel Paubert
2008-06-25 16:17 ` Scott Wood
2008-06-25 17:07 ` Kumar Gala
2008-06-26 0:09 ` Michael Neuling
2008-06-26 7:07 ` [PATCH] " Michael Neuling
2008-06-26 10:44 ` [PATCH 2/9] " Gabriel Paubert
2008-06-25 17:08 ` Andreas Schwab
2008-06-25 4:07 ` [PATCH 8/9] powerpc: Add VSX context save/restore, ptrace and signal support Michael Neuling
2008-06-25 4:07 ` [PATCH 5/9] powerpc: Introduce VSX thread_struct and CONFIG_VSX Michael Neuling
2008-06-25 4:07 ` [PATCH 3/9] powerpc: Move altivec_unavailable Michael Neuling
2008-06-25 4:07 ` [PATCH 7/9] powerpc: Add VSX assembler code macros Michael Neuling
2008-06-25 4:07 ` [PATCH 4/9] powerpc: Make load_up_fpu and load_up_altivec callable Michael Neuling
2008-06-25 4:07 ` [PATCH 6/9] powerpc: Add VSX CPU feature Michael Neuling
2008-06-25 4:07 ` [PATCH 9/9] powerpc: Add CONFIG_VSX config option Michael Neuling