* [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc.
@ 2017-08-30  4:12 Paul Mackerras
  2017-08-30  4:12 ` [PATCH v3 01/17] powerpc: Correct instruction code for xxlor instruction Paul Mackerras
                   ` (19 more replies)
From: Paul Mackerras @ 2017-08-30  4:12 UTC (permalink / raw)
  To: linuxppc-dev

This series extends the instruction emulation infrastructure in
arch/powerpc/lib/sstep.c and uses it for emulating instructions when
we get an alignment interrupt.  The advantage of this is that we only
have to add the new POWER9 instructions in one place, and it fixes
several bugs in alignment interrupt handling that have been identified
recently.

With this, analyse_instr() and emulate_step() handle almost all load
and store instructions in Power ISA v3.00 -- all except the atomic
memory operations (lwat, stwat, etc.).  We now always use the largest
possible aligned memory accesses (up to 8 bytes) to emulate unaligned
accesses.  If we get a fault, the faulting address is accurately
recorded in regs->dar.  We can also now access FP/VMX/VSX registers
directly if they are live, without having to spill them all to the
thread_struct and then reload them all later.  There are also various
other fixes in the series.
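
To illustrate the approach (a minimal sketch, not the code in the
series): an unaligned access of nb bytes at effective address ea can
be broken into the largest naturally aligned pieces like this:

	/*
	 * Illustrative sketch only: split an unaligned access of nb
	 * bytes at ea into the largest naturally aligned chunks, at
	 * most 8 bytes each.
	 */
	static int max_align(unsigned long x)
	{
		/* largest power of 2 that divides x, capped at 8 */
		return (x & 1) ? 1 : (x & 2) ? 2 : (x & 4) ? 4 : 8;
	}

	static void access_unaligned(unsigned long ea, int nb)
	{
		int c;

		for (; nb > 0; nb -= c, ea += c) {
			c = max_align(ea);
			while (c > nb)
				c >>= 1;
			/* do one aligned load or store of c bytes at ea */
		}
	}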

This version is based on the current powerpc next branch.

Paul.

 arch/powerpc/Kconfig                  |    4 -
 arch/powerpc/include/asm/ppc-opcode.h |   10 +-
 arch/powerpc/include/asm/sstep.h      |   90 +-
 arch/powerpc/kernel/align.c           |  774 +-----------
 arch/powerpc/lib/Makefile             |    3 +-
 arch/powerpc/lib/ldstfp.S             |  307 ++---
 arch/powerpc/lib/quad.S               |   62 +
 arch/powerpc/lib/sstep.c              | 2139 +++++++++++++++++++++++----------
 8 files changed, 1802 insertions(+), 1587 deletions(-)

* [PATCH v3 01/17] powerpc: Correct instruction code for xxlor instruction
  2017-08-30  4:12 [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Paul Mackerras
@ 2017-08-30  4:12 ` Paul Mackerras
  2017-09-01 13:29   ` [v3, " Michael Ellerman
  2017-08-30  4:12 ` [PATCH v3 02/17] powerpc: Change analyse_instr so it doesn't modify *regs Paul Mackerras
                   ` (18 subsequent siblings)
From: Paul Mackerras @ 2017-08-30  4:12 UTC (permalink / raw)
  To: linuxppc-dev

The instruction code for xxlor that commit 0016a4cf5582 ("powerpc:
Emulate most Book I instructions in emulate_step()", 2010-06-15)
added is actually the code for xxlnor.  It is used in get_vsr()
and put_vsr(), so the effect of the error is that if emulate_step()
is used to emulate a VSX load or store from any register other
than vsr0, the bitwise complement of the correct value will be
loaded or stored.  This corrects the error.
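
For reference, the two encodings differ only in the extended opcode
field; a quick userspace check (a sketch, assuming the standard
XX3-form layout) shows that the old value decodes as xxlnor:

	#include <stdio.h>

	/*
	 * Sanity check (assumes the standard XX3-form layout, with the
	 * extended opcode in bits 21-28, i.e. (instr >> 3) & 0xff).
	 */
	int main(void)
	{
		unsigned int old_code = 0xf0000510;
		unsigned int new_code = 0xf0000490;

		printf("XO = %u\n", (old_code >> 3) & 0xff);	/* 162 = xxlnor */
		printf("XO = %u\n", (new_code >> 3) & 0xff);	/* 146 = xxlor */
		return 0;
	}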

Fixes: 0016a4cf5582 ("powerpc: Emulate most Book I instructions in emulate_step()")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/include/asm/ppc-opcode.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 041ba15..8861289 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -262,7 +262,7 @@
 #define PPC_INST_TLBSRX_DOT		0x7c0006a5
 #define PPC_INST_VPMSUMW		0x10000488
 #define PPC_INST_VPMSUMD		0x100004c8
-#define PPC_INST_XXLOR			0xf0000510
+#define PPC_INST_XXLOR			0xf0000490
 #define PPC_INST_XXSWAPD		0xf0000250
 #define PPC_INST_XVCPSGNDP		0xf0000780
 #define PPC_INST_TRECHKPT		0x7c0007dd
-- 
2.7.4

* [PATCH v3 02/17] powerpc: Change analyse_instr so it doesn't modify *regs
  2017-08-30  4:12 [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Paul Mackerras
  2017-08-30  4:12 ` [PATCH v3 01/17] powerpc: Correct instruction code for xxlor instruction Paul Mackerras
@ 2017-08-30  4:12 ` Paul Mackerras
  2017-08-30  4:12 ` [PATCH v3 03/17] powerpc: Don't check MSR FP/VMX/VSX enable bits in analyse_instr() Paul Mackerras
                   ` (17 subsequent siblings)
From: Paul Mackerras @ 2017-08-30  4:12 UTC (permalink / raw)
  To: linuxppc-dev

The analyse_instr function currently doesn't just work out what an
instruction does; it also executes those instructions whose only
effect is to update CPU registers that are stored in struct pt_regs.
This is undesirable because optprobes uses analyse_instr to work out
if an instruction could be successfully emulated in future.

This changes analyse_instr so it doesn't modify *regs; instead it
stores information in the instruction_op structure to indicate what
registers (GPRs, CR, XER, LR) would be set and what value they would
be set to.  A companion function called emulate_update_regs() can
then use that information to update a pt_regs struct appropriately.
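
The resulting calling pattern looks like this (a sketch of the usage
this patch introduces in emulate_step(), not new API):

	static int try_emulate(struct pt_regs *regs, unsigned int instr)
	{
		struct instruction_op op;
		int r;

		r = analyse_instr(&op, regs, instr);	/* decode; *regs unchanged */
		if (r < 0)
			return r;	/* full GPR set not available in *regs */
		if (r > 0) {
			emulate_update_regs(regs, &op);	/* apply recorded effects */
			return 1;
		}
		/* r == 0: op describes a load/store etc. for the caller */
		return 0;
	}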

As a minor cleanup, this replaces inline asm using the cntlzw and
cntlzd instructions with calls to __builtin_clz() and __builtin_clzl().

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/include/asm/sstep.h |  52 +++-
 arch/powerpc/lib/sstep.c         | 601 +++++++++++++++++++++++----------------
 2 files changed, 396 insertions(+), 257 deletions(-)

diff --git a/arch/powerpc/include/asm/sstep.h b/arch/powerpc/include/asm/sstep.h
index d3a42cc..442e636 100644
--- a/arch/powerpc/include/asm/sstep.h
+++ b/arch/powerpc/include/asm/sstep.h
@@ -23,9 +23,6 @@ struct pt_regs;
 #define IS_RFID(instr)		(((instr) & 0xfc0007fe) == 0x4c000024)
 #define IS_RFI(instr)		(((instr) & 0xfc0007fe) == 0x4c000064)
 
-/* Emulate instructions that cause a transfer of control. */
-extern int emulate_step(struct pt_regs *regs, unsigned int instr);
-
 enum instruction_type {
 	COMPUTE,		/* arith/logical/CR op, etc. */
 	LOAD,
@@ -55,11 +52,29 @@ enum instruction_type {
 
 #define INSTR_TYPE_MASK	0x1f
 
+/* Compute flags, ORed in with type */
+#define SETREG		0x20
+#define SETCC		0x40
+#define SETXER		0x80
+
+/* Branch flags, ORed in with type */
+#define SETLK		0x20
+#define BRTAKEN		0x40
+#define DECCTR		0x80
+
 /* Load/store flags, ORed in with type */
 #define SIGNEXT		0x20
 #define UPDATE		0x40	/* matches bit in opcode 31 instructions */
 #define BYTEREV		0x80
 
+/* Barrier type field, ORed in with type */
+#define BARRIER_MASK	0xe0
+#define BARRIER_SYNC	0x00
+#define BARRIER_ISYNC	0x20
+#define BARRIER_EIEIO	0x40
+#define BARRIER_LWSYNC	0x60
+#define BARRIER_PTESYNC	0x80
+
 /* Cacheop values, ORed in with type */
 #define CACHEOP_MASK	0x700
 #define DCBST		0
@@ -83,7 +98,36 @@ struct instruction_op {
 	int update_reg;
 	/* For MFSPR */
 	int spr;
+	u32 ccval;
+	u32 xerval;
 };
 
-extern int analyse_instr(struct instruction_op *op, struct pt_regs *regs,
+/*
+ * Decode an instruction, and return information about it in *op
+ * without changing *regs.
+ *
+ * Return value is 1 if the instruction can be emulated just by
+ * updating *regs with the information in *op, -1 if we need the
+ * GPRs but *regs doesn't contain the full register set, or 0
+ * otherwise.
+ */
+extern int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 			 unsigned int instr);
+
+/*
+ * Emulate an instruction that can be executed just by updating
+ * fields in *regs.
+ */
+void emulate_update_regs(struct pt_regs *reg, struct instruction_op *op);
+
+/*
+ * Emulate instructions that cause a transfer of control,
+ * arithmetic/logical instructions, loads and stores,
+ * cache operations and barriers.
+ *
+ * Returns 1 if the instruction was emulated successfully,
+ * 0 if it could not be emulated, or -1 for an instruction that
+ * should not be emulated (rfid, mtmsrd clearing MSR_RI, etc.).
+ */
+extern int emulate_step(struct pt_regs *regs, unsigned int instr);
+
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index a85b82c..8e581c6 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -62,15 +62,17 @@ static nokprobe_inline unsigned long truncate_if_32bit(unsigned long msr,
 /*
  * Determine whether a conditional branch instruction would branch.
  */
-static nokprobe_inline int branch_taken(unsigned int instr, struct pt_regs *regs)
+static nokprobe_inline int branch_taken(unsigned int instr,
+					const struct pt_regs *regs,
+					struct instruction_op *op)
 {
 	unsigned int bo = (instr >> 21) & 0x1f;
 	unsigned int bi;
 
 	if ((bo & 4) == 0) {
 		/* decrement counter */
-		--regs->ctr;
-		if (((bo >> 1) & 1) ^ (regs->ctr == 0))
+		op->type |= DECCTR;
+		if (((bo >> 1) & 1) ^ (regs->ctr == 1))
 			return 0;
 	}
 	if ((bo & 0x10) == 0) {
@@ -92,7 +94,8 @@ static nokprobe_inline long address_ok(struct pt_regs *regs, unsigned long ea, i
 /*
  * Calculate effective address for a D-form instruction
  */
-static nokprobe_inline unsigned long dform_ea(unsigned int instr, struct pt_regs *regs)
+static nokprobe_inline unsigned long dform_ea(unsigned int instr,
+					      const struct pt_regs *regs)
 {
 	int ra;
 	unsigned long ea;
@@ -109,7 +112,8 @@ static nokprobe_inline unsigned long dform_ea(unsigned int instr, struct pt_regs
 /*
  * Calculate effective address for a DS-form instruction
  */
-static nokprobe_inline unsigned long dsform_ea(unsigned int instr, struct pt_regs *regs)
+static nokprobe_inline unsigned long dsform_ea(unsigned int instr,
+					       const struct pt_regs *regs)
 {
 	int ra;
 	unsigned long ea;
@@ -127,7 +131,7 @@ static nokprobe_inline unsigned long dsform_ea(unsigned int instr, struct pt_reg
  * Calculate effective address for an X-form instruction
  */
 static nokprobe_inline unsigned long xform_ea(unsigned int instr,
-						struct pt_regs *regs)
+					      const struct pt_regs *regs)
 {
 	int ra, rb;
 	unsigned long ea;
@@ -526,24 +530,27 @@ static nokprobe_inline int do_vsx_store(int rn, int (*func)(int, unsigned long),
 		: "=r" (err)				\
 		: "r" (addr), "i" (-EFAULT), "0" (err))
 
-static nokprobe_inline void set_cr0(struct pt_regs *regs, int rd)
+static nokprobe_inline void set_cr0(const struct pt_regs *regs,
+				    struct instruction_op *op, int rd)
 {
 	long val = regs->gpr[rd];
 
-	regs->ccr = (regs->ccr & 0x0fffffff) | ((regs->xer >> 3) & 0x10000000);
+	op->type |= SETCC;
+	op->ccval = (regs->ccr & 0x0fffffff) | ((regs->xer >> 3) & 0x10000000);
 #ifdef __powerpc64__
 	if (!(regs->msr & MSR_64BIT))
 		val = (int) val;
 #endif
 	if (val < 0)
-		regs->ccr |= 0x80000000;
+		op->ccval |= 0x80000000;
 	else if (val > 0)
-		regs->ccr |= 0x40000000;
+		op->ccval |= 0x40000000;
 	else
-		regs->ccr |= 0x20000000;
+		op->ccval |= 0x20000000;
 }
 
-static nokprobe_inline void add_with_carry(struct pt_regs *regs, int rd,
+static nokprobe_inline void add_with_carry(const struct pt_regs *regs,
+				     struct instruction_op *op, int rd,
 				     unsigned long val1, unsigned long val2,
 				     unsigned long carry_in)
 {
@@ -551,24 +558,29 @@ static nokprobe_inline void add_with_carry(struct pt_regs *regs, int rd,
 
 	if (carry_in)
 		++val;
-	regs->gpr[rd] = val;
+	op->type = COMPUTE + SETREG + SETXER;
+	op->reg = rd;
+	op->val = val;
 #ifdef __powerpc64__
 	if (!(regs->msr & MSR_64BIT)) {
 		val = (unsigned int) val;
 		val1 = (unsigned int) val1;
 	}
 #endif
+	op->xerval = regs->xer;
 	if (val < val1 || (carry_in && val == val1))
-		regs->xer |= XER_CA;
+		op->xerval |= XER_CA;
 	else
-		regs->xer &= ~XER_CA;
+		op->xerval &= ~XER_CA;
 }
 
-static nokprobe_inline void do_cmp_signed(struct pt_regs *regs, long v1, long v2,
-				    int crfld)
+static nokprobe_inline void do_cmp_signed(const struct pt_regs *regs,
+					  struct instruction_op *op,
+					  long v1, long v2, int crfld)
 {
 	unsigned int crval, shift;
 
+	op->type = COMPUTE + SETCC;
 	crval = (regs->xer >> 31) & 1;		/* get SO bit */
 	if (v1 < v2)
 		crval |= 8;
@@ -577,14 +589,17 @@ static nokprobe_inline void do_cmp_signed(struct pt_regs *regs, long v1, long v2
 	else
 		crval |= 2;
 	shift = (7 - crfld) * 4;
-	regs->ccr = (regs->ccr & ~(0xf << shift)) | (crval << shift);
+	op->ccval = (regs->ccr & ~(0xf << shift)) | (crval << shift);
 }
 
-static nokprobe_inline void do_cmp_unsigned(struct pt_regs *regs, unsigned long v1,
-				      unsigned long v2, int crfld)
+static nokprobe_inline void do_cmp_unsigned(const struct pt_regs *regs,
+					    struct instruction_op *op,
+					    unsigned long v1,
+					    unsigned long v2, int crfld)
 {
 	unsigned int crval, shift;
 
+	op->type = COMPUTE + SETCC;
 	crval = (regs->xer >> 31) & 1;		/* get SO bit */
 	if (v1 < v2)
 		crval |= 8;
@@ -593,11 +608,12 @@ static nokprobe_inline void do_cmp_unsigned(struct pt_regs *regs, unsigned long
 	else
 		crval |= 2;
 	shift = (7 - crfld) * 4;
-	regs->ccr = (regs->ccr & ~(0xf << shift)) | (crval << shift);
+	op->ccval = (regs->ccr & ~(0xf << shift)) | (crval << shift);
 }
 
-static nokprobe_inline void do_cmpb(struct pt_regs *regs, unsigned long v1,
-				unsigned long v2, int rd)
+static nokprobe_inline void do_cmpb(const struct pt_regs *regs,
+				    struct instruction_op *op,
+				    unsigned long v1, unsigned long v2)
 {
 	unsigned long long out_val, mask;
 	int i;
@@ -608,16 +624,16 @@ static nokprobe_inline void do_cmpb(struct pt_regs *regs, unsigned long v1,
 		if ((v1 & mask) == (v2 & mask))
 			out_val |= mask;
 	}
-
-	regs->gpr[rd] = out_val;
+	op->val = out_val;
 }
 
 /*
  * The size parameter is used to adjust the equivalent popcnt instruction.
  * popcntb = 8, popcntw = 32, popcntd = 64
  */
-static nokprobe_inline void do_popcnt(struct pt_regs *regs, unsigned long v1,
-				int size, int ra)
+static nokprobe_inline void do_popcnt(const struct pt_regs *regs,
+				      struct instruction_op *op,
+				      unsigned long v1, int size)
 {
 	unsigned long long out = v1;
 
@@ -626,23 +642,24 @@ static nokprobe_inline void do_popcnt(struct pt_regs *regs, unsigned long v1,
 	out = (out + (out >> 4)) & 0x0f0f0f0f0f0f0f0f;
 
 	if (size == 8) {	/* popcntb */
-		regs->gpr[ra] = out;
+		op->val = out;
 		return;
 	}
 	out += out >> 8;
 	out += out >> 16;
 	if (size == 32) {	/* popcntw */
-		regs->gpr[ra] = out & 0x0000003f0000003f;
+		op->val = out & 0x0000003f0000003f;
 		return;
 	}
 
 	out = (out + (out >> 32)) & 0x7f;
-	regs->gpr[ra] = out;	/* popcntd */
+	op->val = out;	/* popcntd */
 }
 
 #ifdef CONFIG_PPC64
-static nokprobe_inline void do_bpermd(struct pt_regs *regs, unsigned long v1,
-				unsigned long v2, int ra)
+static nokprobe_inline void do_bpermd(const struct pt_regs *regs,
+				      struct instruction_op *op,
+				      unsigned long v1, unsigned long v2)
 {
 	unsigned char perm, idx;
 	unsigned int i;
@@ -654,26 +671,27 @@ static nokprobe_inline void do_bpermd(struct pt_regs *regs, unsigned long v1,
 			if (v2 & PPC_BIT(idx))
 				perm |= 1 << i;
 	}
-	regs->gpr[ra] = perm;
+	op->val = perm;
 }
 #endif /* CONFIG_PPC64 */
 /*
  * The size parameter adjusts the equivalent prty instruction.
  * prtyw = 32, prtyd = 64
  */
-static nokprobe_inline void do_prty(struct pt_regs *regs, unsigned long v,
-				int size, int ra)
+static nokprobe_inline void do_prty(const struct pt_regs *regs,
+				    struct instruction_op *op,
+				    unsigned long v, int size)
 {
 	unsigned long long res = v ^ (v >> 8);
 
 	res ^= res >> 16;
 	if (size == 32) {		/* prtyw */
-		regs->gpr[ra] = res & 0x0000000100000001;
+		op->val = res & 0x0000000100000001;
 		return;
 	}
 
 	res ^= res >> 32;
-	regs->gpr[ra] = res & 1;	/*prtyd */
+	op->val = res & 1;	/*prtyd */
 }
 
 static nokprobe_inline int trap_compare(long v1, long v2)
@@ -709,14 +727,18 @@ static nokprobe_inline int trap_compare(long v1, long v2)
 #define ROTATE(x, n)	((n) ? (((x) << (n)) | ((x) >> (8 * sizeof(long) - (n)))) : (x))
 
 /*
- * Decode an instruction, and execute it if that can be done just by
- * modifying *regs (i.e. integer arithmetic and logical instructions,
- * branches, and barrier instructions).
- * Returns 1 if the instruction has been executed, or 0 if not.
- * Sets *op to indicate what the instruction does.
+ * Decode an instruction, and return information about it in *op
+ * without changing *regs.
+ * Integer arithmetic and logical instructions, branches, and barrier
+ * instructions can be emulated just using the information in *op.
+ *
+ * Return value is 1 if the instruction can be emulated just by
+ * updating *regs with the information in *op, -1 if we need the
+ * GPRs but *regs doesn't contain the full register set, or 0
+ * otherwise.
  */
-int analyse_instr(struct instruction_op *op, struct pt_regs *regs,
-			    unsigned int instr)
+int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
+		  unsigned int instr)
 {
 	unsigned int opcode, ra, rb, rd, spr, u;
 	unsigned long int imm;
@@ -733,12 +755,11 @@ int analyse_instr(struct instruction_op *op, struct pt_regs *regs,
 		imm = (signed short)(instr & 0xfffc);
 		if ((instr & 2) == 0)
 			imm += regs->nip;
-		regs->nip += 4;
-		regs->nip = truncate_if_32bit(regs->msr, regs->nip);
+		op->val = truncate_if_32bit(regs->msr, imm);
 		if (instr & 1)
-			regs->link = regs->nip;
-		if (branch_taken(instr, regs))
-			regs->nip = truncate_if_32bit(regs->msr, imm);
+			op->type |= SETLK;
+		if (branch_taken(instr, regs, op))
+			op->type |= BRTAKEN;
 		return 1;
 #ifdef CONFIG_PPC64
 	case 17:	/* sc */
@@ -749,38 +770,37 @@ int analyse_instr(struct instruction_op *op, struct pt_regs *regs,
 		return 0;
 #endif
 	case 18:	/* b */
-		op->type = BRANCH;
+		op->type = BRANCH | BRTAKEN;
 		imm = instr & 0x03fffffc;
 		if (imm & 0x02000000)
 			imm -= 0x04000000;
 		if ((instr & 2) == 0)
 			imm += regs->nip;
+		op->val = truncate_if_32bit(regs->msr, imm);
 		if (instr & 1)
-			regs->link = truncate_if_32bit(regs->msr, regs->nip + 4);
-		imm = truncate_if_32bit(regs->msr, imm);
-		regs->nip = imm;
+			op->type |= SETLK;
 		return 1;
 	case 19:
 		switch ((instr >> 1) & 0x3ff) {
 		case 0:		/* mcrf */
+			op->type = COMPUTE + SETCC;
 			rd = 7 - ((instr >> 23) & 0x7);
 			ra = 7 - ((instr >> 18) & 0x7);
 			rd *= 4;
 			ra *= 4;
 			val = (regs->ccr >> ra) & 0xf;
-			regs->ccr = (regs->ccr & ~(0xfUL << rd)) | (val << rd);
-			goto instr_done;
+			op->ccval = (regs->ccr & ~(0xfUL << rd)) | (val << rd);
+			return 1;
 
 		case 16:	/* bclr */
 		case 528:	/* bcctr */
 			op->type = BRANCH;
 			imm = (instr & 0x400)? regs->ctr: regs->link;
-			regs->nip = truncate_if_32bit(regs->msr, regs->nip + 4);
-			imm = truncate_if_32bit(regs->msr, imm);
+			op->val = truncate_if_32bit(regs->msr, imm);
 			if (instr & 1)
-				regs->link = regs->nip;
-			if (branch_taken(instr, regs))
-				regs->nip = imm;
+				op->type |= SETLK;
+			if (branch_taken(instr, regs, op))
+				op->type |= BRTAKEN;
 			return 1;
 
 		case 18:	/* rfid, scary */
@@ -790,9 +810,8 @@ int analyse_instr(struct instruction_op *op, struct pt_regs *regs,
 			return 0;
 
 		case 150:	/* isync */
-			op->type = BARRIER;
-			isync();
-			goto instr_done;
+			op->type = BARRIER | BARRIER_ISYNC;
+			return 1;
 
 		case 33:	/* crnor */
 		case 129:	/* crandc */
@@ -802,45 +821,47 @@ int analyse_instr(struct instruction_op *op, struct pt_regs *regs,
 		case 289:	/* creqv */
 		case 417:	/* crorc */
 		case 449:	/* cror */
+			op->type = COMPUTE + SETCC;
 			ra = (instr >> 16) & 0x1f;
 			rb = (instr >> 11) & 0x1f;
 			rd = (instr >> 21) & 0x1f;
 			ra = (regs->ccr >> (31 - ra)) & 1;
 			rb = (regs->ccr >> (31 - rb)) & 1;
 			val = (instr >> (6 + ra * 2 + rb)) & 1;
-			regs->ccr = (regs->ccr & ~(1UL << (31 - rd))) |
+			op->ccval = (regs->ccr & ~(1UL << (31 - rd))) |
 				(val << (31 - rd));
-			goto instr_done;
+			return 1;
+		default:
+			op->type = UNKNOWN;
+			return 0;
 		}
 		break;
 	case 31:
 		switch ((instr >> 1) & 0x3ff) {
 		case 598:	/* sync */
-			op->type = BARRIER;
+			op->type = BARRIER + BARRIER_SYNC;
 #ifdef __powerpc64__
 			switch ((instr >> 21) & 3) {
 			case 1:		/* lwsync */
-				asm volatile("lwsync" : : : "memory");
-				goto instr_done;
+				op->type = BARRIER + BARRIER_LWSYNC;
+				break;
 			case 2:		/* ptesync */
-				asm volatile("ptesync" : : : "memory");
-				goto instr_done;
+				op->type = BARRIER + BARRIER_PTESYNC;
+				break;
 			}
 #endif
-			mb();
-			goto instr_done;
+			return 1;
 
 		case 854:	/* eieio */
-			op->type = BARRIER;
-			eieio();
-			goto instr_done;
+			op->type = BARRIER + BARRIER_EIEIO;
+			return 1;
 		}
 		break;
 	}
 
 	/* Following cases refer to regs->gpr[], so we need all regs */
 	if (!FULL_REGS(regs))
-		return 0;
+		return -1;
 
 	rd = (instr >> 21) & 0x1f;
 	ra = (instr >> 16) & 0x1f;
@@ -851,21 +872,21 @@ int analyse_instr(struct instruction_op *op, struct pt_regs *regs,
 	case 2:		/* tdi */
 		if (rd & trap_compare(regs->gpr[ra], (short) instr))
 			goto trap;
-		goto instr_done;
+		return 1;
 #endif
 	case 3:		/* twi */
 		if (rd & trap_compare((int)regs->gpr[ra], (short) instr))
 			goto trap;
-		goto instr_done;
+		return 1;
 
 	case 7:		/* mulli */
-		regs->gpr[rd] = regs->gpr[ra] * (short) instr;
-		goto instr_done;
+		op->val = regs->gpr[ra] * (short) instr;
+		goto compute_done;
 
 	case 8:		/* subfic */
 		imm = (short) instr;
-		add_with_carry(regs, rd, ~regs->gpr[ra], imm, 1);
-		goto instr_done;
+		add_with_carry(regs, op, rd, ~regs->gpr[ra], imm, 1);
+		return 1;
 
 	case 10:	/* cmpli */
 		imm = (unsigned short) instr;
@@ -874,8 +895,8 @@ int analyse_instr(struct instruction_op *op, struct pt_regs *regs,
 		if ((rd & 1) == 0)
 			val = (unsigned int) val;
 #endif
-		do_cmp_unsigned(regs, val, imm, rd >> 2);
-		goto instr_done;
+		do_cmp_unsigned(regs, op, val, imm, rd >> 2);
+		return 1;
 
 	case 11:	/* cmpi */
 		imm = (short) instr;
@@ -884,47 +905,47 @@ int analyse_instr(struct instruction_op *op, struct pt_regs *regs,
 		if ((rd & 1) == 0)
 			val = (int) val;
 #endif
-		do_cmp_signed(regs, val, imm, rd >> 2);
-		goto instr_done;
+		do_cmp_signed(regs, op, val, imm, rd >> 2);
+		return 1;
 
 	case 12:	/* addic */
 		imm = (short) instr;
-		add_with_carry(regs, rd, regs->gpr[ra], imm, 0);
-		goto instr_done;
+		add_with_carry(regs, op, rd, regs->gpr[ra], imm, 0);
+		return 1;
 
 	case 13:	/* addic. */
 		imm = (short) instr;
-		add_with_carry(regs, rd, regs->gpr[ra], imm, 0);
-		set_cr0(regs, rd);
-		goto instr_done;
+		add_with_carry(regs, op, rd, regs->gpr[ra], imm, 0);
+		set_cr0(regs, op, rd);
+		return 1;
 
 	case 14:	/* addi */
 		imm = (short) instr;
 		if (ra)
 			imm += regs->gpr[ra];
-		regs->gpr[rd] = imm;
-		goto instr_done;
+		op->val = imm;
+		goto compute_done;
 
 	case 15:	/* addis */
 		imm = ((short) instr) << 16;
 		if (ra)
 			imm += regs->gpr[ra];
-		regs->gpr[rd] = imm;
-		goto instr_done;
+		op->val = imm;
+		goto compute_done;
 
 	case 20:	/* rlwimi */
 		mb = (instr >> 6) & 0x1f;
 		me = (instr >> 1) & 0x1f;
 		val = DATA32(regs->gpr[rd]);
 		imm = MASK32(mb, me);
-		regs->gpr[ra] = (regs->gpr[ra] & ~imm) | (ROTATE(val, rb) & imm);
+		op->val = (regs->gpr[ra] & ~imm) | (ROTATE(val, rb) & imm);
 		goto logical_done;
 
 	case 21:	/* rlwinm */
 		mb = (instr >> 6) & 0x1f;
 		me = (instr >> 1) & 0x1f;
 		val = DATA32(regs->gpr[rd]);
-		regs->gpr[ra] = ROTATE(val, rb) & MASK32(mb, me);
+		op->val = ROTATE(val, rb) & MASK32(mb, me);
 		goto logical_done;
 
 	case 23:	/* rlwnm */
@@ -932,40 +953,37 @@ int analyse_instr(struct instruction_op *op, struct pt_regs *regs,
 		me = (instr >> 1) & 0x1f;
 		rb = regs->gpr[rb] & 0x1f;
 		val = DATA32(regs->gpr[rd]);
-		regs->gpr[ra] = ROTATE(val, rb) & MASK32(mb, me);
+		op->val = ROTATE(val, rb) & MASK32(mb, me);
 		goto logical_done;
 
 	case 24:	/* ori */
-		imm = (unsigned short) instr;
-		regs->gpr[ra] = regs->gpr[rd] | imm;
-		goto instr_done;
+		op->val = regs->gpr[rd] | (unsigned short) instr;
+		goto logical_done_nocc;
 
 	case 25:	/* oris */
 		imm = (unsigned short) instr;
-		regs->gpr[ra] = regs->gpr[rd] | (imm << 16);
-		goto instr_done;
+		op->val = regs->gpr[rd] | (imm << 16);
+		goto logical_done_nocc;
 
 	case 26:	/* xori */
-		imm = (unsigned short) instr;
-		regs->gpr[ra] = regs->gpr[rd] ^ imm;
-		goto instr_done;
+		op->val = regs->gpr[rd] ^ (unsigned short) instr;
+		goto logical_done_nocc;
 
 	case 27:	/* xoris */
 		imm = (unsigned short) instr;
-		regs->gpr[ra] = regs->gpr[rd] ^ (imm << 16);
-		goto instr_done;
+		op->val = regs->gpr[rd] ^ (imm << 16);
+		goto logical_done_nocc;
 
 	case 28:	/* andi. */
-		imm = (unsigned short) instr;
-		regs->gpr[ra] = regs->gpr[rd] & imm;
-		set_cr0(regs, ra);
-		goto instr_done;
+		op->val = regs->gpr[rd] & (unsigned short) instr;
+		set_cr0(regs, op, ra);
+		goto logical_done_nocc;
 
 	case 29:	/* andis. */
 		imm = (unsigned short) instr;
-		regs->gpr[ra] = regs->gpr[rd] & (imm << 16);
-		set_cr0(regs, ra);
-		goto instr_done;
+		op->val = regs->gpr[rd] & (imm << 16);
+		set_cr0(regs, op, ra);
+		goto logical_done_nocc;
 
 #ifdef __powerpc64__
 	case 30:	/* rld* */
@@ -976,34 +994,36 @@ int analyse_instr(struct instruction_op *op, struct pt_regs *regs,
 			val = ROTATE(val, sh);
 			switch ((instr >> 2) & 3) {
 			case 0:		/* rldicl */
-				regs->gpr[ra] = val & MASK64_L(mb);
-				goto logical_done;
+				val &= MASK64_L(mb);
+				break;
 			case 1:		/* rldicr */
-				regs->gpr[ra] = val & MASK64_R(mb);
-				goto logical_done;
+				val &= MASK64_R(mb);
+				break;
 			case 2:		/* rldic */
-				regs->gpr[ra] = val & MASK64(mb, 63 - sh);
-				goto logical_done;
+				val &= MASK64(mb, 63 - sh);
+				break;
 			case 3:		/* rldimi */
 				imm = MASK64(mb, 63 - sh);
-				regs->gpr[ra] = (regs->gpr[ra] & ~imm) |
+				val = (regs->gpr[ra] & ~imm) |
 					(val & imm);
-				goto logical_done;
 			}
+			op->val = val;
+			goto logical_done;
 		} else {
 			sh = regs->gpr[rb] & 0x3f;
 			val = ROTATE(val, sh);
 			switch ((instr >> 1) & 7) {
 			case 0:		/* rldcl */
-				regs->gpr[ra] = val & MASK64_L(mb);
+				op->val = val & MASK64_L(mb);
 				goto logical_done;
 			case 1:		/* rldcr */
-				regs->gpr[ra] = val & MASK64_R(mb);
+				op->val = val & MASK64_R(mb);
 				goto logical_done;
 			}
 		}
 #endif
-	break; /* illegal instruction */
+		op->type = UNKNOWN;	/* illegal instruction */
+		return 0;
 
 	case 31:
 		switch ((instr >> 1) & 0x3ff) {
@@ -1012,12 +1032,12 @@ int analyse_instr(struct instruction_op *op, struct pt_regs *regs,
 			    (rd & trap_compare((int)regs->gpr[ra],
 					       (int)regs->gpr[rb])))
 				goto trap;
-			goto instr_done;
+			return 1;
 #ifdef __powerpc64__
 		case 68:	/* td */
 			if (rd & trap_compare(regs->gpr[ra], regs->gpr[rb]))
 				goto trap;
-			goto instr_done;
+			return 1;
 #endif
 		case 83:	/* mfmsr */
 			if (regs->msr & MSR_PR)
@@ -1046,74 +1066,50 @@ int analyse_instr(struct instruction_op *op, struct pt_regs *regs,
 #endif
 
 		case 19:	/* mfcr */
+			imm = 0xffffffffUL;
 			if ((instr >> 20) & 1) {
 				imm = 0xf0000000UL;
 				for (sh = 0; sh < 8; ++sh) {
-					if (instr & (0x80000 >> sh)) {
-						regs->gpr[rd] = regs->ccr & imm;
+					if (instr & (0x80000 >> sh))
 						break;
-					}
 					imm >>= 4;
 				}
-
-				goto instr_done;
 			}
-
-			regs->gpr[rd] = regs->ccr;
-			regs->gpr[rd] &= 0xffffffffUL;
-			goto instr_done;
+			op->val = regs->ccr & imm;
+			goto compute_done;
 
 		case 144:	/* mtcrf */
+			op->type = COMPUTE + SETCC;
 			imm = 0xf0000000UL;
 			val = regs->gpr[rd];
+			op->val = regs->ccr;
 			for (sh = 0; sh < 8; ++sh) {
 				if (instr & (0x80000 >> sh))
-					regs->ccr = (regs->ccr & ~imm) |
+					op->val = (op->val & ~imm) |
 						(val & imm);
 				imm >>= 4;
 			}
-			goto instr_done;
+			return 1;
 
 		case 339:	/* mfspr */
 			spr = ((instr >> 16) & 0x1f) | ((instr >> 6) & 0x3e0);
-			switch (spr) {
-			case SPRN_XER:	/* mfxer */
-				regs->gpr[rd] = regs->xer;
-				regs->gpr[rd] &= 0xffffffffUL;
-				goto instr_done;
-			case SPRN_LR:	/* mflr */
-				regs->gpr[rd] = regs->link;
-				goto instr_done;
-			case SPRN_CTR:	/* mfctr */
-				regs->gpr[rd] = regs->ctr;
-				goto instr_done;
-			default:
-				op->type = MFSPR;
-				op->reg = rd;
-				op->spr = spr;
-				return 0;
-			}
-			break;
+			op->type = MFSPR;
+			op->reg = rd;
+			op->spr = spr;
+			if (spr == SPRN_XER || spr == SPRN_LR ||
+			    spr == SPRN_CTR)
+				return 1;
+			return 0;
 
 		case 467:	/* mtspr */
 			spr = ((instr >> 16) & 0x1f) | ((instr >> 6) & 0x3e0);
-			switch (spr) {
-			case SPRN_XER:	/* mtxer */
-				regs->xer = (regs->gpr[rd] & 0xffffffffUL);
-				goto instr_done;
-			case SPRN_LR:	/* mtlr */
-				regs->link = regs->gpr[rd];
-				goto instr_done;
-			case SPRN_CTR:	/* mtctr */
-				regs->ctr = regs->gpr[rd];
-				goto instr_done;
-			default:
-				op->type = MTSPR;
-				op->val = regs->gpr[rd];
-				op->spr = spr;
-				return 0;
-			}
-			break;
+			op->type = MTSPR;
+			op->val = regs->gpr[rd];
+			op->spr = spr;
+			if (spr == SPRN_XER || spr == SPRN_LR ||
+			    spr == SPRN_CTR)
+				return 1;
+			return 0;
 
 /*
  * Compare instructions
@@ -1128,8 +1124,8 @@ int analyse_instr(struct instruction_op *op, struct pt_regs *regs,
 				val2 = (int) val2;
 			}
 #endif
-			do_cmp_signed(regs, val, val2, rd >> 2);
-			goto instr_done;
+			do_cmp_signed(regs, op, val, val2, rd >> 2);
+			return 1;
 
 		case 32:	/* cmpl */
 			val = regs->gpr[ra];
@@ -1141,113 +1137,113 @@ int analyse_instr(struct instruction_op *op, struct pt_regs *regs,
 				val2 = (unsigned int) val2;
 			}
 #endif
-			do_cmp_unsigned(regs, val, val2, rd >> 2);
-			goto instr_done;
+			do_cmp_unsigned(regs, op, val, val2, rd >> 2);
+			return 1;
 
 		case 508: /* cmpb */
-			do_cmpb(regs, regs->gpr[rd], regs->gpr[rb], ra);
-			goto instr_done;
+			do_cmpb(regs, op, regs->gpr[rd], regs->gpr[rb]);
+			goto logical_done_nocc;
 
 /*
  * Arithmetic instructions
  */
 		case 8:	/* subfc */
-			add_with_carry(regs, rd, ~regs->gpr[ra],
+			add_with_carry(regs, op, rd, ~regs->gpr[ra],
 				       regs->gpr[rb], 1);
 			goto arith_done;
 #ifdef __powerpc64__
 		case 9:	/* mulhdu */
-			asm("mulhdu %0,%1,%2" : "=r" (regs->gpr[rd]) :
+			asm("mulhdu %0,%1,%2" : "=r" (op->val) :
 			    "r" (regs->gpr[ra]), "r" (regs->gpr[rb]));
 			goto arith_done;
 #endif
 		case 10:	/* addc */
-			add_with_carry(regs, rd, regs->gpr[ra],
+			add_with_carry(regs, op, rd, regs->gpr[ra],
 				       regs->gpr[rb], 0);
 			goto arith_done;
 
 		case 11:	/* mulhwu */
-			asm("mulhwu %0,%1,%2" : "=r" (regs->gpr[rd]) :
+			asm("mulhwu %0,%1,%2" : "=r" (op->val) :
 			    "r" (regs->gpr[ra]), "r" (regs->gpr[rb]));
 			goto arith_done;
 
 		case 40:	/* subf */
-			regs->gpr[rd] = regs->gpr[rb] - regs->gpr[ra];
+			op->val = regs->gpr[rb] - regs->gpr[ra];
 			goto arith_done;
 #ifdef __powerpc64__
 		case 73:	/* mulhd */
-			asm("mulhd %0,%1,%2" : "=r" (regs->gpr[rd]) :
+			asm("mulhd %0,%1,%2" : "=r" (op->val) :
 			    "r" (regs->gpr[ra]), "r" (regs->gpr[rb]));
 			goto arith_done;
 #endif
 		case 75:	/* mulhw */
-			asm("mulhw %0,%1,%2" : "=r" (regs->gpr[rd]) :
+			asm("mulhw %0,%1,%2" : "=r" (op->val) :
 			    "r" (regs->gpr[ra]), "r" (regs->gpr[rb]));
 			goto arith_done;
 
 		case 104:	/* neg */
-			regs->gpr[rd] = -regs->gpr[ra];
+			op->val = -regs->gpr[ra];
 			goto arith_done;
 
 		case 136:	/* subfe */
-			add_with_carry(regs, rd, ~regs->gpr[ra], regs->gpr[rb],
-				       regs->xer & XER_CA);
+			add_with_carry(regs, op, rd, ~regs->gpr[ra],
+				       regs->gpr[rb], regs->xer & XER_CA);
 			goto arith_done;
 
 		case 138:	/* adde */
-			add_with_carry(regs, rd, regs->gpr[ra], regs->gpr[rb],
-				       regs->xer & XER_CA);
+			add_with_carry(regs, op, rd, regs->gpr[ra],
+				       regs->gpr[rb], regs->xer & XER_CA);
 			goto arith_done;
 
 		case 200:	/* subfze */
-			add_with_carry(regs, rd, ~regs->gpr[ra], 0L,
+			add_with_carry(regs, op, rd, ~regs->gpr[ra], 0L,
 				       regs->xer & XER_CA);
 			goto arith_done;
 
 		case 202:	/* addze */
-			add_with_carry(regs, rd, regs->gpr[ra], 0L,
+			add_with_carry(regs, op, rd, regs->gpr[ra], 0L,
 				       regs->xer & XER_CA);
 			goto arith_done;
 
 		case 232:	/* subfme */
-			add_with_carry(regs, rd, ~regs->gpr[ra], -1L,
+			add_with_carry(regs, op, rd, ~regs->gpr[ra], -1L,
 				       regs->xer & XER_CA);
 			goto arith_done;
 #ifdef __powerpc64__
 		case 233:	/* mulld */
-			regs->gpr[rd] = regs->gpr[ra] * regs->gpr[rb];
+			op->val = regs->gpr[ra] * regs->gpr[rb];
 			goto arith_done;
 #endif
 		case 234:	/* addme */
-			add_with_carry(regs, rd, regs->gpr[ra], -1L,
+			add_with_carry(regs, op, rd, regs->gpr[ra], -1L,
 				       regs->xer & XER_CA);
 			goto arith_done;
 
 		case 235:	/* mullw */
-			regs->gpr[rd] = (unsigned int) regs->gpr[ra] *
+			op->val = (unsigned int) regs->gpr[ra] *
 				(unsigned int) regs->gpr[rb];
 			goto arith_done;
 
 		case 266:	/* add */
-			regs->gpr[rd] = regs->gpr[ra] + regs->gpr[rb];
+			op->val = regs->gpr[ra] + regs->gpr[rb];
 			goto arith_done;
 #ifdef __powerpc64__
 		case 457:	/* divdu */
-			regs->gpr[rd] = regs->gpr[ra] / regs->gpr[rb];
+			op->val = regs->gpr[ra] / regs->gpr[rb];
 			goto arith_done;
 #endif
 		case 459:	/* divwu */
-			regs->gpr[rd] = (unsigned int) regs->gpr[ra] /
+			op->val = (unsigned int) regs->gpr[ra] /
 				(unsigned int) regs->gpr[rb];
 			goto arith_done;
 #ifdef __powerpc64__
 		case 489:	/* divd */
-			regs->gpr[rd] = (long int) regs->gpr[ra] /
+			op->val = (long int) regs->gpr[ra] /
 				(long int) regs->gpr[rb];
 			goto arith_done;
 #endif
 		case 491:	/* divw */
-			regs->gpr[rd] = (int) regs->gpr[ra] /
+			op->val = (int) regs->gpr[ra] /
 				(int) regs->gpr[rb];
 			goto arith_done;
 
@@ -1260,85 +1256,83 @@ int analyse_instr(struct instruction_op *op, struct pt_regs *regs,
 			val = (regs->ccr >> (31 - mb)) & 1;
 			val2 = (ra) ? regs->gpr[ra] : 0;
 
-			regs->gpr[rd] = (val) ? val2 : regs->gpr[rb];
-			goto logical_done;
+			op->val = (val) ? val2 : regs->gpr[rb];
+			goto compute_done;
 
 		case 26:	/* cntlzw */
-			asm("cntlzw %0,%1" : "=r" (regs->gpr[ra]) :
-			    "r" (regs->gpr[rd]));
+			op->val = __builtin_clz((unsigned int) regs->gpr[rd]);
 			goto logical_done;
 #ifdef __powerpc64__
 		case 58:	/* cntlzd */
-			asm("cntlzd %0,%1" : "=r" (regs->gpr[ra]) :
-			    "r" (regs->gpr[rd]));
+			op->val = __builtin_clzl(regs->gpr[rd]);
 			goto logical_done;
 #endif
 		case 28:	/* and */
-			regs->gpr[ra] = regs->gpr[rd] & regs->gpr[rb];
+			op->val = regs->gpr[rd] & regs->gpr[rb];
 			goto logical_done;
 
 		case 60:	/* andc */
-			regs->gpr[ra] = regs->gpr[rd] & ~regs->gpr[rb];
+			op->val = regs->gpr[rd] & ~regs->gpr[rb];
 			goto logical_done;
 
 		case 122:	/* popcntb */
-			do_popcnt(regs, regs->gpr[rd], 8, ra);
+			do_popcnt(regs, op, regs->gpr[rd], 8);
 			goto logical_done;
 
 		case 124:	/* nor */
-			regs->gpr[ra] = ~(regs->gpr[rd] | regs->gpr[rb]);
+			op->val = ~(regs->gpr[rd] | regs->gpr[rb]);
 			goto logical_done;
 
 		case 154:	/* prtyw */
-			do_prty(regs, regs->gpr[rd], 32, ra);
+			do_prty(regs, op, regs->gpr[rd], 32);
 			goto logical_done;
 
 		case 186:	/* prtyd */
-			do_prty(regs, regs->gpr[rd], 64, ra);
+			do_prty(regs, op, regs->gpr[rd], 64);
 			goto logical_done;
 #ifdef CONFIG_PPC64
 		case 252:	/* bpermd */
-			do_bpermd(regs, regs->gpr[rd], regs->gpr[rb], ra);
+			do_bpermd(regs, op, regs->gpr[rd], regs->gpr[rb]);
 			goto logical_done;
 #endif
 		case 284:	/* eqv */
-			regs->gpr[ra] = ~(regs->gpr[rd] ^ regs->gpr[rb]);
+			op->val = ~(regs->gpr[rd] ^ regs->gpr[rb]);
 			goto logical_done;
 
 		case 316:	/* xor */
-			regs->gpr[ra] = regs->gpr[rd] ^ regs->gpr[rb];
+			op->val = regs->gpr[rd] ^ regs->gpr[rb];
 			goto logical_done;
 
 		case 378:	/* popcntw */
-			do_popcnt(regs, regs->gpr[rd], 32, ra);
+			do_popcnt(regs, op, regs->gpr[rd], 32);
 			goto logical_done;
 
 		case 412:	/* orc */
-			regs->gpr[ra] = regs->gpr[rd] | ~regs->gpr[rb];
+			op->val = regs->gpr[rd] | ~regs->gpr[rb];
 			goto logical_done;
 
 		case 444:	/* or */
-			regs->gpr[ra] = regs->gpr[rd] | regs->gpr[rb];
+			op->val = regs->gpr[rd] | regs->gpr[rb];
 			goto logical_done;
 
 		case 476:	/* nand */
-			regs->gpr[ra] = ~(regs->gpr[rd] & regs->gpr[rb]);
+			op->val = ~(regs->gpr[rd] & regs->gpr[rb]);
 			goto logical_done;
 #ifdef CONFIG_PPC64
 		case 506:	/* popcntd */
-			do_popcnt(regs, regs->gpr[rd], 64, ra);
+			do_popcnt(regs, op, regs->gpr[rd], 64);
 			goto logical_done;
 #endif
 		case 922:	/* extsh */
-			regs->gpr[ra] = (signed short) regs->gpr[rd];
+			op->val = (signed short) regs->gpr[rd];
 			goto logical_done;
 
 		case 954:	/* extsb */
-			regs->gpr[ra] = (signed char) regs->gpr[rd];
+			op->val = (signed char) regs->gpr[rd];
 			goto logical_done;
 #ifdef __powerpc64__
 		case 986:	/* extsw */
-			regs->gpr[ra] = (signed int) regs->gpr[rd];
+			op->val = (signed int) regs->gpr[rd];
 			goto logical_done;
 #endif
 
@@ -1348,75 +1342,83 @@ int analyse_instr(struct instruction_op *op, struct pt_regs *regs,
 		case 24:	/* slw */
 			sh = regs->gpr[rb] & 0x3f;
 			if (sh < 32)
-				regs->gpr[ra] = (regs->gpr[rd] << sh) & 0xffffffffUL;
+				op->val = (regs->gpr[rd] << sh) & 0xffffffffUL;
 			else
-				regs->gpr[ra] = 0;
+				op->val = 0;
 			goto logical_done;
 
 		case 536:	/* srw */
 			sh = regs->gpr[rb] & 0x3f;
 			if (sh < 32)
-				regs->gpr[ra] = (regs->gpr[rd] & 0xffffffffUL) >> sh;
+				op->val = (regs->gpr[rd] & 0xffffffffUL) >> sh;
 			else
-				regs->gpr[ra] = 0;
+				op->val = 0;
 			goto logical_done;
 
 		case 792:	/* sraw */
+			op->type = COMPUTE + SETREG + SETXER;
 			sh = regs->gpr[rb] & 0x3f;
 			ival = (signed int) regs->gpr[rd];
-			regs->gpr[ra] = ival >> (sh < 32 ? sh : 31);
+			op->val = ival >> (sh < 32 ? sh : 31);
+			op->xerval = regs->xer;
 			if (ival < 0 && (sh >= 32 || (ival & ((1ul << sh) - 1)) != 0))
-				regs->xer |= XER_CA;
+				op->xerval |= XER_CA;
 			else
-				regs->xer &= ~XER_CA;
+				op->xerval &= ~XER_CA;
 			goto logical_done;
 
 		case 824:	/* srawi */
+			op->type = COMPUTE + SETREG + SETXER;
 			sh = rb;
 			ival = (signed int) regs->gpr[rd];
-			regs->gpr[ra] = ival >> sh;
+			op->val = ival >> sh;
+			op->xerval = regs->xer;
 			if (ival < 0 && (ival & ((1ul << sh) - 1)) != 0)
-				regs->xer |= XER_CA;
+				op->xerval |= XER_CA;
 			else
-				regs->xer &= ~XER_CA;
+				op->xerval &= ~XER_CA;
 			goto logical_done;
 
 #ifdef __powerpc64__
 		case 27:	/* sld */
 			sh = regs->gpr[rb] & 0x7f;
 			if (sh < 64)
-				regs->gpr[ra] = regs->gpr[rd] << sh;
+				op->val = regs->gpr[rd] << sh;
 			else
-				regs->gpr[ra] = 0;
+				op->val = 0;
 			goto logical_done;
 
 		case 539:	/* srd */
 			sh = regs->gpr[rb] & 0x7f;
 			if (sh < 64)
-				regs->gpr[ra] = regs->gpr[rd] >> sh;
+				op->val = regs->gpr[rd] >> sh;
 			else
-				regs->gpr[ra] = 0;
+				op->val = 0;
 			goto logical_done;
 
 		case 794:	/* srad */
+			op->type = COMPUTE + SETREG + SETXER;
 			sh = regs->gpr[rb] & 0x7f;
 			ival = (signed long int) regs->gpr[rd];
-			regs->gpr[ra] = ival >> (sh < 64 ? sh : 63);
+			op->val = ival >> (sh < 64 ? sh : 63);
+			op->xerval = regs->xer;
 			if (ival < 0 && (sh >= 64 || (ival & ((1ul << sh) - 1)) != 0))
-				regs->xer |= XER_CA;
+				op->xerval |= XER_CA;
 			else
-				regs->xer &= ~XER_CA;
+				op->xerval &= ~XER_CA;
 			goto logical_done;
 
 		case 826:	/* sradi with sh_5 = 0 */
 		case 827:	/* sradi with sh_5 = 1 */
+			op->type = COMPUTE + SETREG + SETXER;
 			sh = rb | ((instr & 2) << 4);
 			ival = (signed long int) regs->gpr[rd];
-			regs->gpr[ra] = ival >> sh;
+			op->val = ival >> sh;
+			op->xerval = regs->xer;
 			if (ival < 0 && (ival & ((1ul << sh) - 1)) != 0)
-				regs->xer |= XER_CA;
+				op->xerval |= XER_CA;
 			else
-				regs->xer &= ~XER_CA;
+				op->xerval &= ~XER_CA;
 			goto logical_done;
 #endif /* __powerpc64__ */
 
@@ -1787,15 +1789,18 @@ int analyse_instr(struct instruction_op *op, struct pt_regs *regs,
 
  logical_done:
 	if (instr & 1)
-		set_cr0(regs, ra);
-	goto instr_done;
+		set_cr0(regs, op, ra);
+ logical_done_nocc:
+	op->reg = ra;
+	op->type |= SETREG;
+	return 1;
 
  arith_done:
 	if (instr & 1)
-		set_cr0(regs, rd);
-
- instr_done:
-	regs->nip = truncate_if_32bit(regs->msr, regs->nip + 4);
+		set_cr0(regs, op, rd);
+ compute_done:
+	op->reg = rd;
+	op->type |= SETREG;
 	return 1;
 
  priv:
@@ -1887,6 +1892,92 @@ static nokprobe_inline void do_byterev(unsigned long *valp, int size)
 }
 
 /*
+ * Emulate an instruction that can be executed just by updating
+ * fields in *regs.
+ */
+void emulate_update_regs(struct pt_regs *regs, struct instruction_op *op)
+{
+	unsigned long next_pc;
+
+	next_pc = truncate_if_32bit(regs->msr, regs->nip + 4);
+	switch (op->type & INSTR_TYPE_MASK) {
+	case COMPUTE:
+		if (op->type & SETREG)
+			regs->gpr[op->reg] = op->val;
+		if (op->type & SETCC)
+			regs->ccr = op->ccval;
+		if (op->type & SETXER)
+			regs->xer = op->xerval;
+		break;
+
+	case BRANCH:
+		if (op->type & SETLK)
+			regs->link = next_pc;
+		if (op->type & BRTAKEN)
+			next_pc = op->val;
+		if (op->type & DECCTR)
+			--regs->ctr;
+		break;
+
+	case BARRIER:
+		switch (op->type & BARRIER_MASK) {
+		case BARRIER_SYNC:
+			mb();
+			break;
+		case BARRIER_ISYNC:
+			isync();
+			break;
+		case BARRIER_EIEIO:
+			eieio();
+			break;
+		case BARRIER_LWSYNC:
+			asm volatile("lwsync" : : : "memory");
+			break;
+		case BARRIER_PTESYNC:
+			asm volatile("ptesync" : : : "memory");
+			break;
+		}
+		break;
+
+	case MFSPR:
+		switch (op->spr) {
+		case SPRN_XER:
+			regs->gpr[op->reg] = regs->xer & 0xffffffffUL;
+			break;
+		case SPRN_LR:
+			regs->gpr[op->reg] = regs->link;
+			break;
+		case SPRN_CTR:
+			regs->gpr[op->reg] = regs->ctr;
+			break;
+		default:
+			WARN_ON_ONCE(1);
+		}
+		break;
+
+	case MTSPR:
+		switch (op->spr) {
+		case SPRN_XER:
+			regs->xer = op->val & 0xffffffffUL;
+			break;
+		case SPRN_LR:
+			regs->link = op->val;
+			break;
+		case SPRN_CTR:
+			regs->ctr = op->val;
+			break;
+		default:
+			WARN_ON_ONCE(1);
+		}
+		break;
+
+	default:
+		WARN_ON_ONCE(1);
+	}
+	regs->nip = next_pc;
+}
+
+/*
  * Emulate instructions that cause a transfer of control,
  * loads and stores, and a few other instructions.
  * Returns 1 if the step was emulated, 0 if not,
@@ -1902,8 +1993,12 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 	int i, rd, nb;
 
 	r = analyse_instr(&op, regs, instr);
-	if (r != 0)
+	if (r < 0)
 		return r;
+	if (r > 0) {
+		emulate_update_regs(regs, &op);
+		return 1;
+	}
 
 	err = 0;
 	size = GETSIZE(op.type);
-- 
2.7.4

* [PATCH v3 03/17] powerpc: Don't check MSR FP/VMX/VSX enable bits in analyse_instr()
  2017-08-30  4:12 [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Paul Mackerras
  2017-08-30  4:12 ` [PATCH v3 01/17] powerpc: Correct instruction code for xxlor instruction Paul Mackerras
  2017-08-30  4:12 ` [PATCH v3 02/17] powerpc: Change analyse_instr so it doesn't modify *regs Paul Mackerras
@ 2017-08-30  4:12 ` Paul Mackerras
  2017-08-30  4:12 ` [PATCH v3 04/17] powerpc: Handle most loads and stores in instruction emulation code Paul Mackerras
                   ` (16 subsequent siblings)
From: Paul Mackerras @ 2017-08-30  4:12 UTC (permalink / raw)
  To: linuxppc-dev

This removes the checks for the FP/VMX/VSX enable bits in the MSR
from analyse_instr() and adds them to emulate_step() instead.

The reason for this is that we may want to use analyse_instr() in
a situation where the FP/VMX/VSX register values are stored in the
current thread_struct and the FP/VMX/VSX enable bits in the MSR
image in the pt_regs are zero.  Since analyse_instr() doesn't make
any changes to register state, it is reasonable for it to indicate
what the effect of an instruction would be even though the relevant
enable bit is off.
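
For example, a hypothetical caller (a sketch, not code from this
patch) could decode an FP load with MSR_FP clear and then decide
where the register values actually live:

	/*
	 * Hypothetical caller sketch: analyse_instr() now classifies an
	 * FP load even when MSR_FP is clear, so a caller can choose
	 * between the live FPRs and saved thread_struct state.
	 */
	static int classify_fp_load(const struct pt_regs *regs,
				    unsigned int instr,
				    struct instruction_op *op)
	{
		int r = analyse_instr(op, regs, instr);

		if (r == 0 && (op->type & INSTR_TYPE_MASK) == LOAD_FP)
			/* 1: FPRs are live; 2: values are in thread_struct */
			return (regs->msr & MSR_FP) ? 1 : 2;
		return 0;
	}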

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/lib/sstep.c | 54 +++++++++++-------------------------------------
 1 file changed, 12 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 8e581c6..13733b7 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1505,15 +1505,11 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 #ifdef CONFIG_ALTIVEC
 		case 103:	/* lvx */
 		case 359:	/* lvxl */
-			if (!(regs->msr & MSR_VEC))
-				goto vecunavail;
 			op->type = MKOP(LOAD_VMX, 0, 16);
 			break;
 
 		case 231:	/* stvx */
 		case 487:	/* stvxl */
-			if (!(regs->msr & MSR_VEC))
-				goto vecunavail;
 			op->type = MKOP(STORE_VMX, 0, 16);
 			break;
 #endif /* CONFIG_ALTIVEC */
@@ -1584,29 +1580,21 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 #ifdef CONFIG_PPC_FPU
 		case 535:	/* lfsx */
 		case 567:	/* lfsux */
-			if (!(regs->msr & MSR_FP))
-				goto fpunavail;
 			op->type = MKOP(LOAD_FP, u, 4);
 			break;
 
 		case 599:	/* lfdx */
 		case 631:	/* lfdux */
-			if (!(regs->msr & MSR_FP))
-				goto fpunavail;
 			op->type = MKOP(LOAD_FP, u, 8);
 			break;
 
 		case 663:	/* stfsx */
 		case 695:	/* stfsux */
-			if (!(regs->msr & MSR_FP))
-				goto fpunavail;
 			op->type = MKOP(STORE_FP, u, 4);
 			break;
 
 		case 727:	/* stfdx */
 		case 759:	/* stfdux */
-			if (!(regs->msr & MSR_FP))
-				goto fpunavail;
 			op->type = MKOP(STORE_FP, u, 8);
 			break;
 #endif
@@ -1649,16 +1637,12 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 #ifdef CONFIG_VSX
 		case 844:	/* lxvd2x */
 		case 876:	/* lxvd2ux */
-			if (!(regs->msr & MSR_VSX))
-				goto vsxunavail;
 			op->reg = rd | ((instr & 1) << 5);
 			op->type = MKOP(LOAD_VSX, u, 16);
 			break;
 
 		case 972:	/* stxvd2x */
 		case 1004:	/* stxvd2ux */
-			if (!(regs->msr & MSR_VSX))
-				goto vsxunavail;
 			op->reg = rd | ((instr & 1) << 5);
 			op->type = MKOP(STORE_VSX, u, 16);
 			break;
@@ -1724,32 +1708,24 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 #ifdef CONFIG_PPC_FPU
 	case 48:	/* lfs */
 	case 49:	/* lfsu */
-		if (!(regs->msr & MSR_FP))
-			goto fpunavail;
 		op->type = MKOP(LOAD_FP, u, 4);
 		op->ea = dform_ea(instr, regs);
 		break;
 
 	case 50:	/* lfd */
 	case 51:	/* lfdu */
-		if (!(regs->msr & MSR_FP))
-			goto fpunavail;
 		op->type = MKOP(LOAD_FP, u, 8);
 		op->ea = dform_ea(instr, regs);
 		break;
 
 	case 52:	/* stfs */
 	case 53:	/* stfsu */
-		if (!(regs->msr & MSR_FP))
-			goto fpunavail;
 		op->type = MKOP(STORE_FP, u, 4);
 		op->ea = dform_ea(instr, regs);
 		break;
 
 	case 54:	/* stfd */
 	case 55:	/* stfdu */
-		if (!(regs->msr & MSR_FP))
-			goto fpunavail;
 		op->type = MKOP(STORE_FP, u, 8);
 		op->ea = dform_ea(instr, regs);
 		break;
@@ -1812,24 +1788,6 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 	op->type = INTERRUPT | 0x700;
 	op->val = SRR1_PROGTRAP;
 	return 0;
-
-#ifdef CONFIG_PPC_FPU
- fpunavail:
-	op->type = INTERRUPT | 0x800;
-	return 0;
-#endif
-
-#ifdef CONFIG_ALTIVEC
- vecunavail:
-	op->type = INTERRUPT | 0xf20;
-	return 0;
-#endif
-
-#ifdef CONFIG_VSX
- vsxunavail:
-	op->type = INTERRUPT | 0xf40;
-	return 0;
-#endif
 }
 EXPORT_SYMBOL_GPL(analyse_instr);
 NOKPROBE_SYMBOL(analyse_instr);
@@ -2087,6 +2045,8 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 
 #ifdef CONFIG_PPC_FPU
 	case LOAD_FP:
+		if (!(regs->msr & MSR_FP))
+			return 0;
 		if (size == 4)
 			err = do_fp_load(op.reg, do_lfs, op.ea, size, regs);
 		else
@@ -2095,11 +2055,15 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 #endif
 #ifdef CONFIG_ALTIVEC
 	case LOAD_VMX:
+		if (!(regs->msr & MSR_VEC))
+			return 0;
 		err = do_vec_load(op.reg, do_lvx, op.ea & ~0xfUL, regs);
 		goto ldst_done;
 #endif
 #ifdef CONFIG_VSX
 	case LOAD_VSX:
+		if (!(regs->msr & MSR_VSX))
+			return 0;
 		err = do_vsx_load(op.reg, do_lxvd2x, op.ea, regs);
 		goto ldst_done;
 #endif
@@ -2134,6 +2098,8 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 
 #ifdef CONFIG_PPC_FPU
 	case STORE_FP:
+		if (!(regs->msr & MSR_FP))
+			return 0;
 		if (size == 4)
 			err = do_fp_store(op.reg, do_stfs, op.ea, size, regs);
 		else
@@ -2142,11 +2108,15 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 #endif
 #ifdef CONFIG_ALTIVEC
 	case STORE_VMX:
+		if (!(regs->msr & MSR_VEC))
+			return 0;
 		err = do_vec_store(op.reg, do_stvx, op.ea & ~0xfUL, regs);
 		goto ldst_done;
 #endif
 #ifdef CONFIG_VSX
 	case STORE_VSX:
+		if (!(regs->msr & MSR_VSX))
+			return 0;
 		err = do_vsx_store(op.reg, do_stxvd2x, op.ea, regs);
 		goto ldst_done;
 #endif
-- 
2.7.4

* [PATCH v3 04/17] powerpc: Handle most loads and stores in instruction emulation code
  2017-08-30  4:12 [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Paul Mackerras
                   ` (2 preceding siblings ...)
  2017-08-30  4:12 ` [PATCH v3 03/17] powerpc: Don't check MSR FP/VMX/VSX enable bits in analyse_instr() Paul Mackerras
@ 2017-08-30  4:12 ` Paul Mackerras
  2017-08-30  4:12 ` [PATCH v3 05/17] powerpc/64: Fix update forms of loads and stores to write 64-bit EA Paul Mackerras
                   ` (15 subsequent siblings)
From: Paul Mackerras @ 2017-08-30  4:12 UTC (permalink / raw)
  To: linuxppc-dev

This extends the instruction emulation infrastructure in sstep.c to
handle all the load and store instructions defined in the Power ISA
v3.0, except for the atomic memory operations, ldmx (which was never
implemented), lfdp/stfdp, and the vector element load/stores.

The instructions added are:

Integer loads and stores: lbarx, lharx, lqarx, stbcx., sthcx., stqcx.,
lq, stq.

VSX loads and stores: lxsiwzx, lxsiwax, stxsiwx, lxvx, lxvl, lxvll,
lxvdsx, lxvwsx, stxvx, stxvl, stxvll, lxsspx, lxsdx, stxsspx, stxsdx,
lxvw4x, lxsibzx, lxvh8x, lxsihzx, lxvb16x, stxvw4x, stxsibx, stxvh8x,
stxsihx, stxvb16x, lxsd, lxssp, lxv, stxsd, stxssp, stxv.

These instructions are handled both in the analyse_instr phase and in
the emulate_step phase.

The code for lxvd2ux and stxvd2ux has been removed, as those
instructions were never implemented in any processor and have been
dropped from the architecture; their opcodes have been reused for
other instructions in POWER9 (lxvb16x and stxvb16x).

The emulation for the VSX loads and stores uses helper functions
that don't access registers or memory directly; these can hopefully
be reused by KVM later, as in the sketch below.
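
A hypothetical KVM-style caller (sketch only; emulate_vsx_load() and
union vsx_reg are as declared in sstep.h in this patch, and the
guest-memory fetch is assumed to have happened already):

	/*
	 * Hypothetical caller sketch: mem[] holds bytes already fetched
	 * from guest memory; emulate_vsx_load() only converts them into
	 * the register image for VSR op->reg, without touching any
	 * registers or memory itself.
	 */
	static void emulate_guest_vsx_load(struct instruction_op *op,
					   const u8 mem[16])
	{
		union vsx_reg val;

		emulate_vsx_load(op, &val, mem);
		/* the caller then writes val into the guest's VSR op->reg */
	}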

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/include/asm/ppc-opcode.h |   8 +
 arch/powerpc/include/asm/sstep.h      |  21 ++
 arch/powerpc/lib/Makefile             |   1 +
 arch/powerpc/lib/ldstfp.S             |  70 ++--
 arch/powerpc/lib/quad.S               |  62 ++++
 arch/powerpc/lib/sstep.c              | 610 +++++++++++++++++++++++++++++++---
 6 files changed, 710 insertions(+), 62 deletions(-)
 create mode 100644 arch/powerpc/lib/quad.S

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 8861289..46f3b26 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -205,6 +205,8 @@
 #define PPC_INST_ISEL_MASK		0xfc00003e
 #define PPC_INST_LDARX			0x7c0000a8
 #define PPC_INST_STDCX			0x7c0001ad
+#define PPC_INST_LQARX			0x7c000228
+#define PPC_INST_STQCX			0x7c00016d
 #define PPC_INST_LSWI			0x7c0004aa
 #define PPC_INST_LSWX			0x7c00042a
 #define PPC_INST_LWARX			0x7c000028
@@ -403,12 +405,18 @@
 					__PPC_RA(a) | __PPC_RB(b))
 #define	PPC_DCBZL(a, b)		stringify_in_c(.long PPC_INST_DCBZL | \
 					__PPC_RA(a) | __PPC_RB(b))
+#define PPC_LQARX(t, a, b, eh)	stringify_in_c(.long PPC_INST_LQARX | \
+					___PPC_RT(t) | ___PPC_RA(a) | \
+					___PPC_RB(b) | __PPC_EH(eh))
 #define PPC_LDARX(t, a, b, eh)	stringify_in_c(.long PPC_INST_LDARX | \
 					___PPC_RT(t) | ___PPC_RA(a) | \
 					___PPC_RB(b) | __PPC_EH(eh))
 #define PPC_LWARX(t, a, b, eh)	stringify_in_c(.long PPC_INST_LWARX | \
 					___PPC_RT(t) | ___PPC_RA(a) | \
 					___PPC_RB(b) | __PPC_EH(eh))
+#define PPC_STQCX(t, a, b)	stringify_in_c(.long PPC_INST_STQCX | \
+					___PPC_RT(t) | ___PPC_RA(a) | \
+					___PPC_RB(b))
 #define PPC_MSGSND(b)		stringify_in_c(.long PPC_INST_MSGSND | \
 					___PPC_RB(b))
 #define PPC_MSGSYNC		stringify_in_c(.long PPC_INST_MSGSYNC)
diff --git a/arch/powerpc/include/asm/sstep.h b/arch/powerpc/include/asm/sstep.h
index 442e636..9801970 100644
--- a/arch/powerpc/include/asm/sstep.h
+++ b/arch/powerpc/include/asm/sstep.h
@@ -83,6 +83,12 @@ enum instruction_type {
 #define DCBT		0x300
 #define ICBI		0x400
 
+/* VSX flags values */
+#define VSX_FPCONV	1	/* do floating point SP/DP conversion */
+#define VSX_SPLAT	2	/* store loaded value into all elements */
+#define VSX_LDLEFT	4	/* load VSX register from left */
+#define VSX_CHECK_VEC	8	/* check MSR_VEC not MSR_VSX for reg >= 32 */
+
 /* Size field in type word */
 #define SIZE(n)		((n) << 8)
 #define GETSIZE(w)	((w) >> 8)
@@ -100,6 +106,17 @@ struct instruction_op {
 	int spr;
 	u32 ccval;
 	u32 xerval;
+	u8 element_size;	/* for VSX/VMX loads/stores */
+	u8 vsx_flags;
+};
+
+union vsx_reg {
+	u8	b[16];
+	u16	h[8];
+	u32	w[4];
+	unsigned long d[2];
+	float	fp[4];
+	double	dp[2];
 };
 
 /*
@@ -131,3 +148,7 @@ void emulate_update_regs(struct pt_regs *reg, struct instruction_op *op);
  */
 extern int emulate_step(struct pt_regs *regs, unsigned int instr);
 
+extern void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
+			     const void *mem);
+extern void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg,
+			      void *mem);
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 3c3146b..400778d 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -32,6 +32,7 @@ obj64-$(CONFIG_KPROBES_SANITY_TEST) += test_emulate_step.o
 obj-y			+= checksum_$(BITS).o checksum_wrappers.o
 
 obj-$(CONFIG_PPC_EMULATE_SSTEP)	+= sstep.o ldstfp.o
+obj64-$(CONFIG_PPC_EMULATE_SSTEP) += quad.o
 
 obj-$(CONFIG_PPC_LIB_RHEAP) += rheap.o
 
diff --git a/arch/powerpc/lib/ldstfp.S b/arch/powerpc/lib/ldstfp.S
index a58777c..6840911 100644
--- a/arch/powerpc/lib/ldstfp.S
+++ b/arch/powerpc/lib/ldstfp.S
@@ -178,10 +178,10 @@ _GLOBAL(do_stfd)
 	EX_TABLE(2b,3b)
 
 #ifdef CONFIG_ALTIVEC
-/* Get the contents of vrN into v0; N is in r3. */
+/* Get the contents of vrN into v0; N is in r3. Doesn't touch r3 or r4. */
 _GLOBAL(get_vr)
 	mflr	r0
-	rlwinm	r3,r3,3,0xf8
+	rlwinm	r6,r3,3,0xf8
 	bcl	20,31,1f
 	blr			/* v0 is already in v0 */
 	nop
@@ -192,15 +192,15 @@ reg = 1
 reg = reg + 1
 	.endr
 1:	mflr	r5
-	add	r5,r3,r5
+	add	r5,r6,r5
 	mtctr	r5
 	mtlr	r0
 	bctr
 
-/* Put the contents of v0 into vrN; N is in r3. */
+/* Put the contents of v0 into vrN; N is in r3. Doesn't touch r3 or r4. */
 _GLOBAL(put_vr)
 	mflr	r0
-	rlwinm	r3,r3,3,0xf8
+	rlwinm	r6,r3,3,0xf8
 	bcl	20,31,1f
 	blr			/* v0 is already in v0 */
 	nop
@@ -211,7 +211,7 @@ reg = 1
 reg = reg + 1
 	.endr
 1:	mflr	r5
-	add	r5,r3,r5
+	add	r5,r6,r5
 	mtctr	r5
 	mtlr	r0
 	bctr
@@ -313,7 +313,7 @@ reg = reg + 1
 	bctr
 
 /* Load VSX reg N from vector doubleword *p.  N is in r3, p in r4. */
-_GLOBAL(do_lxvd2x)
+_GLOBAL(load_vsrn)
 	PPC_STLU r1,-STKFRM(r1)
 	mflr	r0
 	PPC_STL	r0,STKFRM+PPC_LR_STKOFF(r1)
@@ -325,41 +325,38 @@ _GLOBAL(do_lxvd2x)
 	isync
 	beq	cr7,1f
 	STXVD2X(0,R1,R8)
-1:	li	r9,-EFAULT
-2:	LXVD2X(0,R0,R4)
-	li	r9,0
-3:	beq	cr7,4f
+1:	LXVD2X(0,R0,R4)
+#ifdef __LITTLE_ENDIAN__
+	XXSWAPD(0,0)
+#endif
+	beq	cr7,4f
 	bl	put_vsr
 	LXVD2X(0,R1,R8)
 4:	PPC_LL	r0,STKFRM+PPC_LR_STKOFF(r1)
 	mtlr	r0
 	MTMSRD(r6)
 	isync
-	mr	r3,r9
 	addi	r1,r1,STKFRM
 	blr
-	EX_TABLE(2b,3b)
 
 /* Store VSX reg N to vector doubleword *p.  N is in r3, p in r4. */
-_GLOBAL(do_stxvd2x)
+_GLOBAL(store_vsrn)
 	PPC_STLU r1,-STKFRM(r1)
 	mflr	r0
 	PPC_STL	r0,STKFRM+PPC_LR_STKOFF(r1)
 	mfmsr	r6
 	oris	r7,r6,MSR_VSX@h
-	cmpwi	cr7,r3,0
 	li	r8,STKFRM-16
 	MTMSRD(r7)
 	isync
-	beq	cr7,1f
 	STXVD2X(0,R1,R8)
 	bl	get_vsr
-1:	li	r9,-EFAULT
-2:	STXVD2X(0,R0,R4)
-	li	r9,0
-3:	beq	cr7,4f
+#ifdef __LITTLE_ENDIAN__
+	XXSWAPD(0,0)
+#endif
+	STXVD2X(0,R0,R4)
 	LXVD2X(0,R1,R8)
-4:	PPC_LL	r0,STKFRM+PPC_LR_STKOFF(r1)
+	PPC_LL	r0,STKFRM+PPC_LR_STKOFF(r1)
 	mtlr	r0
 	MTMSRD(r6)
 	isync
@@ -367,7 +364,35 @@ _GLOBAL(do_stxvd2x)
 	addi	r1,r1,STKFRM
 	blr
-	EX_TABLE(2b,3b)
-
 #endif /* CONFIG_VSX */
 
+/* Convert single-precision to double, without disturbing FPRs. */
+/* conv_sp_to_dp(float *sp, double *dp) */
+_GLOBAL(conv_sp_to_dp)
+	mfmsr	r6
+	ori	r7, r6, MSR_FP
+	MTMSRD(r7)
+	isync
+	stfd	fr0, -16(r1)
+	lfs	fr0, 0(r3)
+	stfd	fr0, 0(r4)
+	lfd	fr0, -16(r1)
+	MTMSRD(r6)
+	isync
+	blr
+
+/* Convert double-precision to single, without disturbing FPRs. */
+/* conv_dp_to_sp(double *dp, float *sp) */
+_GLOBAL(conv_dp_to_sp)
+	mfmsr	r6
+	ori	r7, r6, MSR_FP
+	MTMSRD(r7)
+	isync
+	stfd	fr0, -16(r1)
+	lfd	fr0, 0(r3)
+	stfs	fr0, 0(r4)
+	lfd	fr0, -16(r1)
+	MTMSRD(r6)
+	isync
+	blr
+
 #endif	/* CONFIG_PPC_FPU */
diff --git a/arch/powerpc/lib/quad.S b/arch/powerpc/lib/quad.S
new file mode 100644
index 0000000..c4d12fa
--- /dev/null
+++ b/arch/powerpc/lib/quad.S
@@ -0,0 +1,62 @@
+/*
+ * Quadword loads and stores
+ * for use in instruction emulation.
+ *
+ * Copyright 2017 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ */
+
+#include <asm/processor.h>
+#include <asm/ppc_asm.h>
+#include <asm/ppc-opcode.h>
+#include <asm/reg.h>
+#include <asm/asm-offsets.h>
+#include <linux/errno.h>
+
+/* do_lq(unsigned long ea, unsigned long *regs) */
+_GLOBAL(do_lq)
+1:	lq	r6, 0(r3)
+	std	r6, 0(r4)
+	std	r7, 8(r4)
+	li	r3, 0
+	blr
+2:	li	r3, -EFAULT
+	blr
+	EX_TABLE(1b, 2b)
+
+/* do_stq(unsigned long ea, unsigned long val0, unsigned long val1) */
+_GLOBAL(do_stq)
+1:	stq	r4, 0(r3)
+	li	r3, 0
+	blr
+2:	li	r3, -EFAULT
+	blr
+	EX_TABLE(1b, 2b)
+
+/* do_lqarx(unsigned long ea, unsigned long *regs) */
+_GLOBAL(do_lqarx)
+1:	PPC_LQARX(6, 0, 3, 0)
+	std	r6, 0(r4)
+	std	r7, 8(r4)
+	li	r3, 0
+	blr
+2:	li	r3, -EFAULT
+	blr
+	EX_TABLE(1b, 2b)
+
+/* do_stqcx(unsigned long ea, unsigned long val0, unsigned long val1,
+	    unsigned int *crp) */
+
+_GLOBAL(do_stqcx)
+1:	PPC_STQCX(4, 0, 3)
+	mfcr	r5
+	stw	r5, 0(r6)
+	li	r3, 0
+	blr
+2:	li	r3, -EFAULT
+	blr
+	EX_TABLE(1b, 2b)
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 13733b7..88c7487 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -42,8 +42,29 @@ extern int do_stfs(int rn, unsigned long ea);
 extern int do_stfd(int rn, unsigned long ea);
 extern int do_lvx(int rn, unsigned long ea);
 extern int do_stvx(int rn, unsigned long ea);
-extern int do_lxvd2x(int rn, unsigned long ea);
-extern int do_stxvd2x(int rn, unsigned long ea);
+extern void load_vsrn(int vsr, const void *p);
+extern void store_vsrn(int vsr, void *p);
+extern void conv_sp_to_dp(const float *sp, double *dp);
+extern void conv_dp_to_sp(const double *dp, float *sp);
+#endif
+
+#ifdef __powerpc64__
+/*
+ * Functions in quad.S
+ */
+extern int do_lq(unsigned long ea, unsigned long *regs);
+extern int do_stq(unsigned long ea, unsigned long val0, unsigned long val1);
+extern int do_lqarx(unsigned long ea, unsigned long *regs);
+extern int do_stqcx(unsigned long ea, unsigned long val0, unsigned long val1,
+		    unsigned int *crp);
+#endif
+
+#ifdef __LITTLE_ENDIAN__
+#define IS_LE	1
+#define IS_BE	0
+#else
+#define IS_LE	0
+#define IS_BE	1
 #endif
 
 /*
@@ -125,6 +146,23 @@ static nokprobe_inline unsigned long dsform_ea(unsigned int instr,
 
 	return truncate_if_32bit(regs->msr, ea);
 }
+
+/*
+ * Calculate effective address for a DQ-form instruction
+ */
+static nokprobe_inline unsigned long dqform_ea(unsigned int instr,
+					       const struct pt_regs *regs)
+{
+	int ra;
+	unsigned long ea;
+
+	ra = (instr >> 16) & 0x1f;
+	ea = (signed short) (instr & ~0xf);	/* sign-extend */
+	if (ra)
+		ea += regs->gpr[ra];
+
+	return truncate_if_32bit(regs->msr, ea);
+}
 #endif /* __powerpc64 */
 
 /*
@@ -454,43 +492,195 @@ static nokprobe_inline int do_vec_store(int rn, int (*func)(int, unsigned long),
 }
 #endif /* CONFIG_ALTIVEC */
 
-#ifdef CONFIG_VSX
-static nokprobe_inline int do_vsx_load(int rn, int (*func)(int, unsigned long),
-				 unsigned long ea, struct pt_regs *regs)
+#ifdef __powerpc64__
+static nokprobe_inline int emulate_lq(struct pt_regs *regs, unsigned long ea,
+				      int reg)
 {
 	int err;
-	unsigned long val[2];
 
 	if (!address_ok(regs, ea, 16))
 		return -EFAULT;
-	if ((ea & 3) == 0)
-		return (*func)(rn, ea);
-	err = read_mem_unaligned(&val[0], ea, 8, regs);
-	if (!err)
-		err = read_mem_unaligned(&val[1], ea + 8, 8, regs);
+	/* if aligned, should be atomic */
+	if ((ea & 0xf) == 0)
+		return do_lq(ea, &regs->gpr[reg]);
+
+	err = read_mem(&regs->gpr[reg + IS_LE], ea, 8, regs);
 	if (!err)
-		err = (*func)(rn, (unsigned long) &val[0]);
+		err = read_mem(&regs->gpr[reg + IS_BE], ea + 8, 8, regs);
 	return err;
 }
 
-static nokprobe_inline int do_vsx_store(int rn, int (*func)(int, unsigned long),
-				 unsigned long ea, struct pt_regs *regs)
+static nokprobe_inline int emulate_stq(struct pt_regs *regs, unsigned long ea,
+				       int reg)
 {
 	int err;
-	unsigned long val[2];
 
 	if (!address_ok(regs, ea, 16))
 		return -EFAULT;
-	if ((ea & 3) == 0)
-		return (*func)(rn, ea);
-	err = (*func)(rn, (unsigned long) &val[0]);
-	if (err)
-		return err;
-	err = write_mem_unaligned(val[0], ea, 8, regs);
+	/* if aligned, should be atomic */
+	if ((ea & 0xf) == 0)
+		return do_stq(ea, regs->gpr[reg], regs->gpr[reg + 1]);
+
+	err = write_mem(regs->gpr[reg + IS_LE], ea, 8, regs);
 	if (!err)
-		err = write_mem_unaligned(val[1], ea + 8, 8, regs);
+		err = write_mem(regs->gpr[reg + IS_BE], ea + 8, 8, regs);
 	return err;
 }
+#endif /* __powerpc64 */
+
+#ifdef CONFIG_VSX
+void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
+		      const void *mem)
+{
+	int size, read_size;
+	int i, j;
+	const unsigned int *wp;
+	const unsigned short *hp;
+	const unsigned char *bp;
+
+	size = GETSIZE(op->type);
+	reg->d[0] = reg->d[1] = 0;
+
+	switch (op->element_size) {
+	case 16:
+		/* whole vector; lxv[x] or lxvl[l] */
+		if (size == 0)
+			break;
+		memcpy(reg, mem, size);
+		if (IS_LE && (op->vsx_flags & VSX_LDLEFT)) {
+			/* reverse 16 bytes */
+			unsigned long tmp;
+			tmp = byterev_8(reg->d[0]);
+			reg->d[0] = byterev_8(reg->d[1]);
+			reg->d[1] = tmp;
+		}
+		break;
+	case 8:
+		/* scalar loads, lxvd2x, lxvdsx */
+		read_size = (size >= 8) ? 8 : size;
+		i = IS_LE ? 8 : 8 - read_size;
+		memcpy(&reg->b[i], mem, read_size);
+		if (size < 8) {
+			if (op->type & SIGNEXT) {
+				/* size == 4 is the only case here */
+				reg->d[IS_LE] = (signed int) reg->d[IS_LE];
+			} else if (op->vsx_flags & VSX_FPCONV) {
+				preempt_disable();
+				conv_sp_to_dp(&reg->fp[1 + IS_LE],
+					      &reg->dp[IS_LE]);
+				preempt_enable();
+			}
+		} else {
+			if (size == 16)
+				reg->d[IS_BE] = *(unsigned long *)(mem + 8);
+			else if (op->vsx_flags & VSX_SPLAT)
+				reg->d[IS_BE] = reg->d[IS_LE];
+		}
+		break;
+	case 4:
+		/* lxvw4x, lxvwsx */
+		wp = mem;
+		for (j = 0; j < size / 4; ++j) {
+			i = IS_LE ? 3 - j : j;
+			reg->w[i] = *wp++;
+		}
+		if (op->vsx_flags & VSX_SPLAT) {
+			u32 val = reg->w[IS_LE ? 3 : 0];
+			for (; j < 4; ++j) {
+				i = IS_LE ? 3 - j : j;
+				reg->w[i] = val;
+			}
+		}
+		break;
+	case 2:
+		/* lxvh8x */
+		hp = mem;
+		for (j = 0; j < size / 2; ++j) {
+			i = IS_LE ? 7 - j : j;
+			reg->h[i] = *hp++;
+		}
+		break;
+	case 1:
+		/* lxvb16x */
+		bp = mem;
+		for (j = 0; j < size; ++j) {
+			i = IS_LE ? 15 - j : j;
+			reg->b[i] = *bp++;
+		}
+		break;
+	}
+}
+EXPORT_SYMBOL_GPL(emulate_vsx_load);
+NOKPROBE_SYMBOL(emulate_vsx_load);
+
+void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg,
+		       void *mem)
+{
+	int size, write_size;
+	int i, j;
+	union vsx_reg buf;
+	unsigned int *wp;
+	unsigned short *hp;
+	unsigned char *bp;
+
+	size = GETSIZE(op->type);
+
+	switch (op->element_size) {
+	case 16:
+		/* stxv, stxvx, stxvl, stxvll */
+		if (size == 0)
+			break;
+		if (IS_LE && (op->vsx_flags & VSX_LDLEFT)) {
+			/* reverse 16 bytes */
+			buf.d[0] = byterev_8(reg->d[1]);
+			buf.d[1] = byterev_8(reg->d[0]);
+			reg = &buf;
+		}
+		memcpy(mem, reg, size);
+		break;
+	case 8:
+		/* scalar stores, stxvd2x */
+		write_size = (size >= 8) ? 8 : size;
+		i = IS_LE ? 8 : 8 - write_size;
+		if (size < 8 && op->vsx_flags & VSX_FPCONV) {
+			buf.d[0] = buf.d[1] = 0;
+			preempt_disable();
+			conv_dp_to_sp(&reg->dp[IS_LE], &buf.fp[1 + IS_LE]);
+			preempt_enable();
+			reg = &buf;
+		}
+		memcpy(mem, &reg->b[i], write_size);
+		if (size == 16)
+			memcpy(mem + 8, &reg->d[IS_BE], 8);
+		break;
+	case 4:
+		/* stxvw4x */
+		wp = mem;
+		for (j = 0; j < size / 4; ++j) {
+			i = IS_LE ? 3 - j : j;
+			*wp++ = reg->w[i];
+		}
+		break;
+	case 2:
+		/* stxvh8x */
+		hp = mem;
+		for (j = 0; j < size / 2; ++j) {
+			i = IS_LE ? 7 - j : j;
+			*hp++ = reg->h[i];
+		}
+		break;
+	case 1:
+		/* stxvb16x */
+		bp = mem;
+		for (j = 0; j < size; ++j) {
+			i = IS_LE ? 15 - j : j;
+			*bp++ = reg->b[i];
+		}
+		break;
+	}
+}
+EXPORT_SYMBOL_GPL(emulate_vsx_store);
+NOKPROBE_SYMBOL(emulate_vsx_store);
 #endif /* CONFIG_VSX */
 
 #define __put_user_asmx(x, addr, err, op, cr)		\
@@ -1455,14 +1645,15 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 		break;
 	}
 
-	/*
-	 * Loads and stores.
-	 */
+/*
+ * Loads and stores.
+ */
 	op->type = UNKNOWN;
 	op->update_reg = ra;
 	op->reg = rd;
 	op->val = regs->gpr[rd];
 	u = (instr >> 20) & UPDATE;
+	op->vsx_flags = 0;
 
 	switch (opcode) {
 	case 31:
@@ -1486,9 +1677,30 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 			op->type = MKOP(STCX, 0, 8);
 			break;
 
-		case 21:	/* ldx */
-		case 53:	/* ldux */
-			op->type = MKOP(LOAD, u, 8);
+		case 52:	/* lbarx */
+			op->type = MKOP(LARX, 0, 1);
+			break;
+
+		case 694:	/* stbcx. */
+			op->type = MKOP(STCX, 0, 1);
+			break;
+
+		case 116:	/* lharx */
+			op->type = MKOP(LARX, 0, 2);
+			break;
+
+		case 726:	/* sthcx. */
+			op->type = MKOP(STCX, 0, 2);
+			break;
+
+		case 276:	/* lqarx */
+			if (!((rd & 1) || rd == ra || rd == rb))
+				op->type = MKOP(LARX, 0, 16);
+			break;
+
+		case 182:	/* stqcx. */
+			if (!(rd & 1))
+				op->type = MKOP(STCX, 0, 16);
 			break;
 #endif
 
@@ -1506,6 +1718,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 		case 103:	/* lvx */
 		case 359:	/* lvxl */
 			op->type = MKOP(LOAD_VMX, 0, 16);
+			op->element_size = 16;
 			break;
 
 		case 231:	/* stvx */
@@ -1515,6 +1728,11 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 #endif /* CONFIG_ALTIVEC */
 
 #ifdef __powerpc64__
+		case 21:	/* ldx */
+		case 53:	/* ldux */
+			op->type = MKOP(LOAD, u, 8);
+			break;
+
 		case 149:	/* stdx */
 		case 181:	/* stdux */
 			op->type = MKOP(STORE, u, 8);
@@ -1635,16 +1853,184 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 			break;
 
 #ifdef CONFIG_VSX
+		case 12:	/* lxsiwzx */
+			op->reg = rd | ((instr & 1) << 5);
+			op->type = MKOP(LOAD_VSX, 0, 4);
+			op->element_size = 8;
+			break;
+
+		case 76:	/* lxsiwax */
+			op->reg = rd | ((instr & 1) << 5);
+			op->type = MKOP(LOAD_VSX, SIGNEXT, 4);
+			op->element_size = 8;
+			break;
+
+		case 140:	/* stxsiwx */
+			op->reg = rd | ((instr & 1) << 5);
+			op->type = MKOP(STORE_VSX, 0, 4);
+			op->element_size = 8;
+			break;
+
+		case 268:	/* lxvx */
+			op->reg = rd | ((instr & 1) << 5);
+			op->type = MKOP(LOAD_VSX, 0, 16);
+			op->element_size = 16;
+			op->vsx_flags = VSX_CHECK_VEC;
+			break;
+
+		case 269:	/* lxvl */
+		case 301: {	/* lxvll */
+			int nb;
+			op->reg = rd | ((instr & 1) << 5);
+			op->ea = ra ? regs->gpr[ra] : 0;
+			nb = regs->gpr[rb] & 0xff;
+			if (nb > 16)
+				nb = 16;
+			op->type = MKOP(LOAD_VSX, 0, nb);
+			op->element_size = 16;
+			op->vsx_flags = ((instr & 0x20) ? VSX_LDLEFT : 0) |
+				VSX_CHECK_VEC;
+			break;
+		}
+		case 332:	/* lxvdsx */
+			op->reg = rd | ((instr & 1) << 5);
+			op->type = MKOP(LOAD_VSX, 0, 8);
+			op->element_size = 8;
+			op->vsx_flags = VSX_SPLAT;
+			break;
+
+		case 364:	/* lxvwsx */
+			op->reg = rd | ((instr & 1) << 5);
+			op->type = MKOP(LOAD_VSX, 0, 4);
+			op->element_size = 4;
+			op->vsx_flags = VSX_SPLAT | VSX_CHECK_VEC;
+			break;
+
+		case 396:	/* stxvx */
+			op->reg = rd | ((instr & 1) << 5);
+			op->type = MKOP(STORE_VSX, 0, 16);
+			op->element_size = 16;
+			op->vsx_flags = VSX_CHECK_VEC;
+			break;
+
+		case 397:	/* stxvl */
+		case 429: {	/* stxvll */
+			int nb;
+			op->reg = rd | ((instr & 1) << 5);
+			op->ea = ra ? regs->gpr[ra] : 0;
+			nb = regs->gpr[rb] & 0xff;
+			if (nb > 16)
+				nb = 16;
+			op->type = MKOP(STORE_VSX, 0, nb);
+			op->element_size = 16;
+			op->vsx_flags = ((instr & 0x20) ? VSX_LDLEFT : 0) |
+				VSX_CHECK_VEC;
+			break;
+		}
+		case 524:	/* lxsspx */
+			op->reg = rd | ((instr & 1) << 5);
+			op->type = MKOP(LOAD_VSX, 0, 4);
+			op->element_size = 8;
+			op->vsx_flags = VSX_FPCONV;
+			break;
+
+		case 588:	/* lxsdx */
+			op->reg = rd | ((instr & 1) << 5);
+			op->type = MKOP(LOAD_VSX, 0, 8);
+			op->element_size = 8;
+			break;
+
+		case 652:	/* stxsspx */
+			op->reg = rd | ((instr & 1) << 5);
+			op->type = MKOP(STORE_VSX, 0, 4);
+			op->element_size = 8;
+			op->vsx_flags = VSX_FPCONV;
+			break;
+
+		case 716:	/* stxsdx */
+			op->reg = rd | ((instr & 1) << 5);
+			op->type = MKOP(STORE_VSX, 0, 8);
+			op->element_size = 8;
+			break;
+
+		case 780:	/* lxvw4x */
+			op->reg = rd | ((instr & 1) << 5);
+			op->type = MKOP(LOAD_VSX, 0, 16);
+			op->element_size = 4;
+			break;
+
+		case 781:	/* lxsibzx */
+			op->reg = rd | ((instr & 1) << 5);
+			op->type = MKOP(LOAD_VSX, 0, 1);
+			op->element_size = 8;
+			op->vsx_flags = VSX_CHECK_VEC;
+			break;
+
+		case 812:	/* lxvh8x */
+			op->reg = rd | ((instr & 1) << 5);
+			op->type = MKOP(LOAD_VSX, 0, 16);
+			op->element_size = 2;
+			op->vsx_flags = VSX_CHECK_VEC;
+			break;
+
+		case 813:	/* lxsihzx */
+			op->reg = rd | ((instr & 1) << 5);
+			op->type = MKOP(LOAD_VSX, 0, 2);
+			op->element_size = 8;
+			op->vsx_flags = VSX_CHECK_VEC;
+			break;
+
 		case 844:	/* lxvd2x */
-		case 876:	/* lxvd2ux */
 			op->reg = rd | ((instr & 1) << 5);
-			op->type = MKOP(LOAD_VSX, u, 16);
+			op->type = MKOP(LOAD_VSX, 0, 16);
+			op->element_size = 8;
+			break;
+
+		case 876:	/* lxvb16x */
+			op->reg = rd | ((instr & 1) << 5);
+			op->type = MKOP(LOAD_VSX, 0, 16);
+			op->element_size = 1;
+			op->vsx_flags = VSX_CHECK_VEC;
+			break;
+
+		case 908:	/* stxvw4x */
+			op->reg = rd | ((instr & 1) << 5);
+			op->type = MKOP(STORE_VSX, 0, 16);
+			op->element_size = 4;
+			break;
+
+		case 909:	/* stxsibx */
+			op->reg = rd | ((instr & 1) << 5);
+			op->type = MKOP(STORE_VSX, 0, 1);
+			op->element_size = 8;
+			op->vsx_flags = VSX_CHECK_VEC;
+			break;
+
+		case 940:	/* stxvh8x */
+			op->reg = rd | ((instr & 1) << 5);
+			op->type = MKOP(STORE_VSX, 0, 16);
+			op->element_size = 2;
+			op->vsx_flags = VSX_CHECK_VEC;
+			break;
+
+		case 941:	/* stxsihx */
+			op->reg = rd | ((instr & 1) << 5);
+			op->type = MKOP(STORE_VSX, 0, 2);
+			op->element_size = 8;
+			op->vsx_flags = VSX_CHECK_VEC;
 			break;
 
 		case 972:	/* stxvd2x */
-		case 1004:	/* stxvd2ux */
 			op->reg = rd | ((instr & 1) << 5);
-			op->type = MKOP(STORE_VSX, u, 16);
+			op->type = MKOP(STORE_VSX, 0, 16);
+			op->element_size = 8;
+			break;
+
+		case 1004:	/* stxvb16x */
+			op->reg = rd | ((instr & 1) << 5);
+			op->type = MKOP(STORE_VSX, 0, 16);
+			op->element_size = 1;
+			op->vsx_flags = VSX_CHECK_VEC;
 			break;
 
 #endif /* CONFIG_VSX */
@@ -1732,6 +2118,34 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 #endif
 
 #ifdef __powerpc64__
+	case 56:	/* lq */
+		if (!((rd & 1) || (rd == ra)))
+			op->type = MKOP(LOAD, 0, 16);
+		op->ea = dqform_ea(instr, regs);
+		break;
+#endif
+
+#ifdef CONFIG_VSX
+	case 57:	/* lxsd, lxssp */
+		op->ea = dsform_ea(instr, regs);
+		switch (instr & 3) {
+		case 2:		/* lxsd */
+			op->reg = rd + 32;
+			op->type = MKOP(LOAD_VSX, 0, 8);
+			op->element_size = 8;
+			op->vsx_flags = VSX_CHECK_VEC;
+			break;
+		case 3:		/* lxssp */
+			op->reg = rd + 32;
+			op->type = MKOP(LOAD_VSX, 0, 4);
+			op->element_size = 8;
+			op->vsx_flags = VSX_FPCONV | VSX_CHECK_VEC;
+			break;
+		}
+		break;
+#endif /* CONFIG_VSX */
+
+#ifdef __powerpc64__
 	case 58:	/* ld[u], lwa */
 		op->ea = dsform_ea(instr, regs);
 		switch (instr & 3) {
@@ -1746,7 +2160,51 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 			break;
 		}
 		break;
+#endif
 
+#ifdef CONFIG_VSX
+	case 61:	/* lxv, stxsd, stxssp, stxv */
+		switch (instr & 7) {
+		case 1:		/* lxv */
+			op->ea = dqform_ea(instr, regs);
+			if (instr & 8)
+				op->reg = rd + 32;
+			op->type = MKOP(LOAD_VSX, 0, 16);
+			op->element_size = 16;
+			op->vsx_flags = VSX_CHECK_VEC;
+			break;
+
+		case 2:		/* stxsd with LSB of DS field = 0 */
+		case 6:		/* stxsd with LSB of DS field = 1 */
+			op->ea = dsform_ea(instr, regs);
+			op->reg = rd + 32;
+			op->type = MKOP(STORE_VSX, 0, 8);
+			op->element_size = 8;
+			op->vsx_flags = VSX_CHECK_VEC;
+			break;
+
+		case 3:		/* stxssp with LSB of DS field = 0 */
+		case 7:		/* stxssp with LSB of DS field = 1 */
+			op->ea = dsform_ea(instr, regs);
+			op->reg = rd + 32;
+			op->type = MKOP(STORE_VSX, 0, 4);
+			op->element_size = 8;
+			op->vsx_flags = VSX_FPCONV | VSX_CHECK_VEC;
+			break;
+
+		case 5:		/* stxv */
+			op->ea = dqform_ea(instr, regs);
+			if (instr & 8)
+				op->reg = rd + 32;
+			op->type = MKOP(STORE_VSX, 0, 16);
+			op->element_size = 16;
+			op->vsx_flags = VSX_CHECK_VEC;
+			break;
+		}
+		break;
+#endif /* CONFIG_VSX */
+
+#ifdef __powerpc64__
 	case 62:	/* std[u] */
 		op->ea = dsform_ea(instr, regs);
 		switch (instr & 3) {
@@ -1756,6 +2214,10 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 		case 1:		/* stdu */
 			op->type = MKOP(STORE, UPDATE, 8);
 			break;
+		case 2:		/* stq */
+			if (!(rd & 1))
+				op->type = MKOP(STORE, 0, 16);
+			break;
 		}
 		break;
 #endif /* __powerpc64__ */
@@ -1994,6 +2456,14 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 			return 0;
 		err = 0;
 		switch (size) {
+#ifdef __powerpc64__
+		case 1:
+			__get_user_asmx(val, op.ea, err, "lbarx");
+			break;
+		case 2:
+			__get_user_asmx(val, op.ea, err, "lharx");
+			break;
+#endif
 		case 4:
 			__get_user_asmx(val, op.ea, err, "lwarx");
 			break;
@@ -2001,6 +2471,9 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 		case 8:
 			__get_user_asmx(val, op.ea, err, "ldarx");
 			break;
+		case 16:
+			err = do_lqarx(op.ea, &regs->gpr[op.reg]);
+			goto ldst_done;
 #endif
 		default:
 			return 0;
@@ -2016,6 +2489,14 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 			return 0;
 		err = 0;
 		switch (size) {
+#ifdef __powerpc64__
+		case 1:
+			__put_user_asmx(op.val, op.ea, err, "stbcx.", cr);
+			break;
+		case 2:
+			__put_user_asmx(op.val, op.ea, err, "sthcx.", cr);
+			break;
+#endif
 		case 4:
 			__put_user_asmx(op.val, op.ea, err, "stwcx.", cr);
 			break;
@@ -2023,6 +2504,10 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 		case 8:
 			__put_user_asmx(op.val, op.ea, err, "stdcx.", cr);
 			break;
+		case 16:
+			err = do_stqcx(op.ea, regs->gpr[op.reg],
+				       regs->gpr[op.reg + 1], &cr);
+			break;
 #endif
 		default:
 			return 0;
@@ -2034,6 +2519,12 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 		goto ldst_done;
 
 	case LOAD:
+#ifdef __powerpc64__
+		if (size == 16) {
+			err = emulate_lq(regs, op.ea, op.reg);
+			goto ldst_done;
+		}
+#endif
 		err = read_mem(&regs->gpr[op.reg], op.ea, size, regs);
 		if (!err) {
 			if (op.type & SIGNEXT)
@@ -2057,15 +2548,31 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 	case LOAD_VMX:
 		if (!(regs->msr & MSR_VEC))
 			return 0;
-		err = do_vec_load(op.reg, do_lvx, op.ea & ~0xfUL, regs);
+		err = do_vec_load(op.reg, do_lvx, op.ea, regs);
 		goto ldst_done;
 #endif
 #ifdef CONFIG_VSX
-	case LOAD_VSX:
-		if (!(regs->msr & MSR_VSX))
+	case LOAD_VSX: {
+		char mem[16];
+		union vsx_reg buf;
+		unsigned long msrbit = MSR_VSX;
+
+		/*
+		 * Some VSX instructions check the MSR_VEC bit rather than MSR_VSX
+		 * when the target of the instruction is a vector register.
+		 */
+		if (op.reg >= 32 && (op.vsx_flags & VSX_CHECK_VEC))
+			msrbit = MSR_VEC;
+		if (!(regs->msr & msrbit))
+			return 0;
+		if (!address_ok(regs, op.ea, size) ||
+		    __copy_from_user(mem, (void __user *)op.ea, size))
 			return 0;
-		err = do_vsx_load(op.reg, do_lxvd2x, op.ea, regs);
+
+		emulate_vsx_load(&op, &buf, mem);
+		load_vsrn(op.reg, &buf);
 		goto ldst_done;
+	}
 #endif
 	case LOAD_MULTI:
 		if (regs->msr & MSR_LE)
@@ -2086,6 +2593,12 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 		goto instr_done;
 
 	case STORE:
+#ifdef __powerpc64__
+		if (size == 16) {
+			err = emulate_stq(regs, op.ea, op.reg);
+			goto ldst_done;
+		}
+#endif
 		if ((op.type & UPDATE) && size == sizeof(long) &&
 		    op.reg == 1 && op.update_reg == 1 &&
 		    !(regs->msr & MSR_PR) &&
@@ -2110,15 +2623,32 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 	case STORE_VMX:
 		if (!(regs->msr & MSR_VEC))
 			return 0;
-		err = do_vec_store(op.reg, do_stvx, op.ea & ~0xfUL, regs);
+		err = do_vec_store(op.reg, do_stvx, op.ea, regs);
 		goto ldst_done;
 #endif
 #ifdef CONFIG_VSX
-	case STORE_VSX:
-		if (!(regs->msr & MSR_VSX))
+	case STORE_VSX: {
+		char mem[16];
+		union vsx_reg buf;
+		unsigned long msrbit = MSR_VSX;
+
+		/*
+		 * Some VSX instructions check the MSR_VEC bit rather than MSR_VSX
+		 * when the target of the instruction is a vector register.
+		 */
+		if (op.reg >= 32 && (op.vsx_flags & VSX_CHECK_VEC))
+			msrbit = MSR_VEC;
+		if (!(regs->msr & msrbit))
+			return 0;
+		if (!address_ok(regs, op.ea, size))
+			return 0;
+
+		store_vsrn(op.reg, &buf);
+		emulate_vsx_store(&op, &buf, mem);
+		if (__copy_to_user((void __user *)op.ea, mem, size))
 			return 0;
-		err = do_vsx_store(op.reg, do_stxvd2x, op.ea, regs);
 		goto ldst_done;
+	}
 #endif
 	case STORE_MULTI:
 		if (regs->msr & MSR_LE)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 05/17] powerpc/64: Fix update forms of loads and stores to write 64-bit EA
  2017-08-30  4:12 [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Paul Mackerras
                   ` (3 preceding siblings ...)
  2017-08-30  4:12 ` [PATCH v3 04/17] powerpc: Handle most loads and stores in instruction emulation code Paul Mackerras
@ 2017-08-30  4:12 ` Paul Mackerras
  2017-08-30  4:12 ` [PATCH v3 06/17] powerpc: Fix emulation of the isel instruction Paul Mackerras
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Paul Mackerras @ 2017-08-30  4:12 UTC (permalink / raw)
  To: linuxppc-dev

When a 64-bit processor is executing in 32-bit mode, the update forms
of load and store instructions are required by the architecture to
write the full 64-bit effective address into the RA register, though
only the bottom 32 bits are used to address memory.  Currently,
the instruction emulation code writes the truncated address to the
RA register.  This fixes it by keeping the full 64-bit EA in the
instruction_op structure, truncating the address in emulate_step()
where it is used to address memory, rather than in the address
computations in analyse_instr().
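
For reference, the truncation helper this change relies on amounts to
the following (a simplified sketch of truncate_if_32bit() in sstep.c):

	static unsigned long truncate_if_32bit(unsigned long msr,
					       unsigned long val)
	{
	#ifdef __powerpc64__
		/* in 32-bit mode, only the low 32 bits address memory */
		if ((msr & MSR_64BIT) == 0)
			val &= 0xffffffffUL;
	#endif
		return val;
	}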

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/include/asm/sstep.h |   4 +-
 arch/powerpc/lib/sstep.c         | 109 ++++++++++++++++++++-------------------
 2 files changed, 58 insertions(+), 55 deletions(-)

diff --git a/arch/powerpc/include/asm/sstep.h b/arch/powerpc/include/asm/sstep.h
index 9801970..4fcc2c9 100644
--- a/arch/powerpc/include/asm/sstep.h
+++ b/arch/powerpc/include/asm/sstep.h
@@ -25,7 +25,7 @@ struct pt_regs;
 
 enum instruction_type {
 	COMPUTE,		/* arith/logical/CR op, etc. */
-	LOAD,
+	LOAD,			/* load and store types need to be contiguous */
 	LOAD_MULTI,
 	LOAD_FP,
 	LOAD_VMX,
@@ -52,6 +52,8 @@ enum instruction_type {
 
 #define INSTR_TYPE_MASK	0x1f
 
+#define OP_IS_LOAD_STORE(type)	(LOAD <= (type) && (type) <= STCX)
+
 /* Compute flags, ORed in with type */
 #define SETREG		0x20
 #define SETCC		0x40
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 88c7487..e20f2b4 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -126,7 +126,7 @@ static nokprobe_inline unsigned long dform_ea(unsigned int instr,
 	if (ra)
 		ea += regs->gpr[ra];
 
-	return truncate_if_32bit(regs->msr, ea);
+	return ea;
 }
 
 #ifdef __powerpc64__
@@ -144,7 +144,7 @@ static nokprobe_inline unsigned long dsform_ea(unsigned int instr,
 	if (ra)
 		ea += regs->gpr[ra];
 
-	return truncate_if_32bit(regs->msr, ea);
+	return ea;
 }
 
 /*
@@ -161,7 +161,7 @@ static nokprobe_inline unsigned long dqform_ea(unsigned int instr,
 	if (ra)
 		ea += regs->gpr[ra];
 
-	return truncate_if_32bit(regs->msr, ea);
+	return ea;
 }
 #endif /* __powerpc64 */
 
@@ -180,7 +180,7 @@ static nokprobe_inline unsigned long xform_ea(unsigned int instr,
 	if (ra)
 		ea += regs->gpr[ra];
 
-	return truncate_if_32bit(regs->msr, ea);
+	return ea;
 }
 
 /*
@@ -1789,10 +1789,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 			if (rb == 0)
 				rb = 32;	/* # bytes to load */
 			op->type = MKOP(LOAD_MULTI, 0, rb);
-			op->ea = 0;
-			if (ra)
-				op->ea = truncate_if_32bit(regs->msr,
-							   regs->gpr[ra]);
+			op->ea = ra ? regs->gpr[ra] : 0;
 			break;
 
 #ifdef CONFIG_PPC_FPU
@@ -1837,10 +1834,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 			if (rb == 0)
 				rb = 32;	/* # bytes to store */
 			op->type = MKOP(STORE_MULTI, 0, rb);
-			op->ea = 0;
-			if (ra)
-				op->ea = truncate_if_32bit(regs->msr,
-							   regs->gpr[ra]);
+			op->ea = ra ? regs->gpr[ra] : 0;
 			break;
 
 		case 790:	/* lhbrx */
@@ -2407,10 +2401,11 @@ void emulate_update_regs(struct pt_regs *regs, struct instruction_op *op)
 int emulate_step(struct pt_regs *regs, unsigned int instr)
 {
 	struct instruction_op op;
-	int r, err, size;
+	int r, err, size, type;
 	unsigned long val;
 	unsigned int cr;
 	int i, rd, nb;
+	unsigned long ea;
 
 	r = analyse_instr(&op, regs, instr);
 	if (r < 0)
@@ -2422,27 +2417,33 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 
 	err = 0;
 	size = GETSIZE(op.type);
-	switch (op.type & INSTR_TYPE_MASK) {
+	type = op.type & INSTR_TYPE_MASK;
+
+	ea = op.ea;
+	if (OP_IS_LOAD_STORE(type) || type == CACHEOP)
+		ea = truncate_if_32bit(regs->msr, op.ea);
+
+	switch (type) {
 	case CACHEOP:
-		if (!address_ok(regs, op.ea, 8))
+		if (!address_ok(regs, ea, 8))
 			return 0;
 		switch (op.type & CACHEOP_MASK) {
 		case DCBST:
-			__cacheop_user_asmx(op.ea, err, "dcbst");
+			__cacheop_user_asmx(ea, err, "dcbst");
 			break;
 		case DCBF:
-			__cacheop_user_asmx(op.ea, err, "dcbf");
+			__cacheop_user_asmx(ea, err, "dcbf");
 			break;
 		case DCBTST:
 			if (op.reg == 0)
-				prefetchw((void *) op.ea);
+				prefetchw((void *) ea);
 			break;
 		case DCBT:
 			if (op.reg == 0)
-				prefetch((void *) op.ea);
+				prefetch((void *) ea);
 			break;
 		case ICBI:
-			__cacheop_user_asmx(op.ea, err, "icbi");
+			__cacheop_user_asmx(ea, err, "icbi");
 			break;
 		}
 		if (err)
@@ -2450,29 +2451,29 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 		goto instr_done;
 
 	case LARX:
-		if (op.ea & (size - 1))
+		if (ea & (size - 1))
 			break;		/* can't handle misaligned */
-		if (!address_ok(regs, op.ea, size))
+		if (!address_ok(regs, ea, size))
 			return 0;
 		err = 0;
 		switch (size) {
 #ifdef __powerpc64__
 		case 1:
-			__get_user_asmx(val, op.ea, err, "lbarx");
+			__get_user_asmx(val, ea, err, "lbarx");
 			break;
 		case 2:
-			__get_user_asmx(val, op.ea, err, "lharx");
+			__get_user_asmx(val, ea, err, "lharx");
 			break;
 #endif
 		case 4:
-			__get_user_asmx(val, op.ea, err, "lwarx");
+			__get_user_asmx(val, ea, err, "lwarx");
 			break;
 #ifdef __powerpc64__
 		case 8:
-			__get_user_asmx(val, op.ea, err, "ldarx");
+			__get_user_asmx(val, ea, err, "ldarx");
 			break;
 		case 16:
-			err = do_lqarx(op.ea, &regs->gpr[op.reg]);
+			err = do_lqarx(ea, &regs->gpr[op.reg]);
 			goto ldst_done;
 #endif
 		default:
@@ -2483,29 +2484,29 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 		goto ldst_done;
 
 	case STCX:
-		if (op.ea & (size - 1))
+		if (ea & (size - 1))
 			break;		/* can't handle misaligned */
-		if (!address_ok(regs, op.ea, size))
+		if (!address_ok(regs, ea, size))
 			return 0;
 		err = 0;
 		switch (size) {
 #ifdef __powerpc64__
 		case 1:
-			__put_user_asmx(op.val, op.ea, err, "stbcx.", cr);
+			__put_user_asmx(op.val, ea, err, "stbcx.", cr);
 			break;
 		case 2:
-			__put_user_asmx(op.val, op.ea, err, "sthcx.", cr);
+			__put_user_asmx(op.val, ea, err, "sthcx.", cr);
 			break;
 #endif
 		case 4:
-			__put_user_asmx(op.val, op.ea, err, "stwcx.", cr);
+			__put_user_asmx(op.val, ea, err, "stwcx.", cr);
 			break;
 #ifdef __powerpc64__
 		case 8:
-			__put_user_asmx(op.val, op.ea, err, "stdcx.", cr);
+			__put_user_asmx(op.val, ea, err, "stdcx.", cr);
 			break;
 		case 16:
-			err = do_stqcx(op.ea, regs->gpr[op.reg],
+			err = do_stqcx(ea, regs->gpr[op.reg],
 				       regs->gpr[op.reg + 1], &cr);
 			break;
 #endif
@@ -2521,11 +2522,11 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 	case LOAD:
 #ifdef __powerpc64__
 		if (size == 16) {
-			err = emulate_lq(regs, op.ea, op.reg);
+			err = emulate_lq(regs, ea, op.reg);
 			goto ldst_done;
 		}
 #endif
-		err = read_mem(&regs->gpr[op.reg], op.ea, size, regs);
+		err = read_mem(&regs->gpr[op.reg], ea, size, regs);
 		if (!err) {
 			if (op.type & SIGNEXT)
 				do_signext(&regs->gpr[op.reg], size);
@@ -2539,16 +2540,16 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 		if (!(regs->msr & MSR_FP))
 			return 0;
 		if (size == 4)
-			err = do_fp_load(op.reg, do_lfs, op.ea, size, regs);
+			err = do_fp_load(op.reg, do_lfs, ea, size, regs);
 		else
-			err = do_fp_load(op.reg, do_lfd, op.ea, size, regs);
+			err = do_fp_load(op.reg, do_lfd, ea, size, regs);
 		goto ldst_done;
 #endif
 #ifdef CONFIG_ALTIVEC
 	case LOAD_VMX:
 		if (!(regs->msr & MSR_VEC))
 			return 0;
-		err = do_vec_load(op.reg, do_lvx, op.ea, regs);
+		err = do_vec_load(op.reg, do_lvx, ea, regs);
 		goto ldst_done;
 #endif
 #ifdef CONFIG_VSX
@@ -2565,8 +2566,8 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 			msrbit = MSR_VEC;
 		if (!(regs->msr & msrbit))
 			return 0;
-		if (!address_ok(regs, op.ea, size) ||
-		    __copy_from_user(mem, (void __user *)op.ea, size))
+		if (!address_ok(regs, ea, size) ||
+		    __copy_from_user(mem, (void __user *)ea, size))
 			return 0;
 
 		emulate_vsx_load(&op, &buf, mem);
@@ -2582,12 +2583,12 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 			nb = size - i;
 			if (nb > 4)
 				nb = 4;
-			err = read_mem(&regs->gpr[rd], op.ea, nb, regs);
+			err = read_mem(&regs->gpr[rd], ea, nb, regs);
 			if (err)
 				return 0;
 			if (nb < 4)	/* left-justify last bytes */
 				regs->gpr[rd] <<= 32 - 8 * nb;
-			op.ea += 4;
+			ea += 4;
 			++rd;
 		}
 		goto instr_done;
@@ -2595,18 +2596,18 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 	case STORE:
 #ifdef __powerpc64__
 		if (size == 16) {
-			err = emulate_stq(regs, op.ea, op.reg);
+			err = emulate_stq(regs, ea, op.reg);
 			goto ldst_done;
 		}
 #endif
 		if ((op.type & UPDATE) && size == sizeof(long) &&
 		    op.reg == 1 && op.update_reg == 1 &&
 		    !(regs->msr & MSR_PR) &&
-		    op.ea >= regs->gpr[1] - STACK_INT_FRAME_SIZE) {
-			err = handle_stack_update(op.ea, regs);
+		    ea >= regs->gpr[1] - STACK_INT_FRAME_SIZE) {
+			err = handle_stack_update(ea, regs);
 			goto ldst_done;
 		}
-		err = write_mem(op.val, op.ea, size, regs);
+		err = write_mem(op.val, ea, size, regs);
 		goto ldst_done;
 
 #ifdef CONFIG_PPC_FPU
@@ -2614,16 +2615,16 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 		if (!(regs->msr & MSR_FP))
 			return 0;
 		if (size == 4)
-			err = do_fp_store(op.reg, do_stfs, op.ea, size, regs);
+			err = do_fp_store(op.reg, do_stfs, ea, size, regs);
 		else
-			err = do_fp_store(op.reg, do_stfd, op.ea, size, regs);
+			err = do_fp_store(op.reg, do_stfd, ea, size, regs);
 		goto ldst_done;
 #endif
 #ifdef CONFIG_ALTIVEC
 	case STORE_VMX:
 		if (!(regs->msr & MSR_VEC))
 			return 0;
-		err = do_vec_store(op.reg, do_stvx, op.ea, regs);
+		err = do_vec_store(op.reg, do_stvx, ea, regs);
 		goto ldst_done;
 #endif
 #ifdef CONFIG_VSX
@@ -2640,12 +2641,12 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 			msrbit = MSR_VEC;
 		if (!(regs->msr & msrbit))
 			return 0;
-		if (!address_ok(regs, op.ea, size))
+		if (!address_ok(regs, ea, size))
 			return 0;
 
 		store_vsrn(op.reg, &buf);
 		emulate_vsx_store(&op, &buf, mem);
-		if (__copy_to_user((void __user *)op.ea, mem, size))
+		if (__copy_to_user((void __user *)ea, mem, size))
 			return 0;
 		goto ldst_done;
 	}
@@ -2661,10 +2662,10 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 				nb = 4;
 			else
 				val >>= 32 - 8 * nb;
-			err = write_mem(val, op.ea, nb, regs);
+			err = write_mem(val, ea, nb, regs);
 			if (err)
 				return 0;
-			op.ea += 4;
+			ea += 4;
 			++rd;
 		}
 		goto instr_done;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 06/17] powerpc: Fix emulation of the isel instruction
  2017-08-30  4:12 [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Paul Mackerras
                   ` (4 preceding siblings ...)
  2017-08-30  4:12 ` [PATCH v3 05/17] powerpc/64: Fix update forms of loads and stores to write 64-bit EA Paul Mackerras
@ 2017-08-30  4:12 ` Paul Mackerras
  2017-08-30  4:12 ` [PATCH v3 07/17] powerpc: Don't update CR0 in emulation of popcnt, prty, bpermd instructions Paul Mackerras
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Paul Mackerras @ 2017-08-30  4:12 UTC (permalink / raw)
  To: linuxppc-dev

The case for the isel instruction was added inside a switch
statement which uses the 10-bit minor opcode field in the 0x7fe
bits of the instruction word.  However, for the isel instruction,
the minor opcode field is only the 0x3e bits, and the 0x7c0 bits
are used for the "BC" field, which indicates which CR bit to use
to select the result.

Therefore, for the isel emulation to work correctly when BC != 0,
we need to match on ((instr >> 1) & 0x1f) == 15.  To do this, we
pull the isel case out of the switch statement and put it in an
if statement of its own.
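
To make the field layout concrete, the relevant decode looks like this
(a sketch; the variable names are illustrative only):

	unsigned int xo = (instr >> 1) & 0x1f;	/* minor opcode; 15 for isel */
	unsigned int bc = (instr >> 6) & 0x1f;	/* CR bit selecting the result */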

Fixes: e27f71e5ff3c ("powerpc/lib/sstep: Add isel instruction emulation")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/lib/sstep.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index e20f2b4..522bc7b 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1216,6 +1216,16 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 		return 0;
 
 	case 31:
+		/* isel occupies 32 minor opcodes */
+		if (((instr >> 1) & 0x1f) == 15) {
+			mb = (instr >> 6) & 0x1f; /* bc field */
+			val = (regs->ccr >> (31 - mb)) & 1;
+			val2 = (ra) ? regs->gpr[ra] : 0;
+
+			op->val = (val) ? val2 : regs->gpr[rb];
+			goto compute_done;
+		}
+
 		switch ((instr >> 1) & 0x3ff) {
 		case 4:		/* tw */
 			if (rd == 0x1f ||
@@ -1441,14 +1451,6 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 /*
  * Logical instructions
  */
-		case 15:	/* isel */
-			mb = (instr >> 6) & 0x1f; /* bc */
-			val = (regs->ccr >> (31 - mb)) & 1;
-			val2 = (ra) ? regs->gpr[ra] : 0;
-
-			op->val = (val) ? val2 : regs->gpr[rb];
-			goto compute_done;
-
 		case 26:	/* cntlzw */
 			op->val = __builtin_clz((unsigned int) regs->gpr[rd]);
 			goto logical_done;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 07/17] powerpc: Don't update CR0 in emulation of popcnt, prty, bpermd instructions
  2017-08-30  4:12 [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Paul Mackerras
                   ` (5 preceding siblings ...)
  2017-08-30  4:12 ` [PATCH v3 06/17] powerpc: Fix emulation of the isel instruction Paul Mackerras
@ 2017-08-30  4:12 ` Paul Mackerras
  2017-08-30  4:12 ` [PATCH v3 08/17] powerpc: Add emulation for the addpcis instruction Paul Mackerras
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Paul Mackerras @ 2017-08-30  4:12 UTC (permalink / raw)
  To: linuxppc-dev

The architecture shows the least-significant bit of the instruction
word as reserved for the popcnt[bwd], prty[wd] and bpermd
instructions, that is, these instructions never update CR0.
Therefore this changes the emulation of these instructions to
skip the CR0 update.
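
For instructions that do honour the Rc bit, analyse_instr() reaches the
CR0 update through a check of roughly this shape (a sketch; set_cr0()
is the existing helper in sstep.c):

	if (instr & 1)			/* Rc bit, bit 31 of the word */
		set_cr0(regs, op);	/* record a compare of the result with 0 */

The popcnt/prty/bpermd cases now branch past this check, since that bit
is reserved in their encodings.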

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/lib/sstep.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 522bc7b..114e597 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1469,7 +1469,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 
 		case 122:	/* popcntb */
 			do_popcnt(regs, op, regs->gpr[rd], 8);
-			goto logical_done;
+			goto logical_done_nocc;
 
 		case 124:	/* nor */
 			op->val = ~(regs->gpr[rd] | regs->gpr[rb]);
@@ -1477,15 +1477,15 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 
 		case 154:	/* prtyw */
 			do_prty(regs, op, regs->gpr[rd], 32);
-			goto logical_done;
+			goto logical_done_nocc;
 
 		case 186:	/* prtyd */
 			do_prty(regs, op, regs->gpr[rd], 64);
-			goto logical_done;
+			goto logical_done_nocc;
 #ifdef CONFIG_PPC64
 		case 252:	/* bpermd */
 			do_bpermd(regs, op, regs->gpr[rd], regs->gpr[rb]);
-			goto logical_done;
+			goto logical_done_nocc;
 #endif
 		case 284:	/* eqv */
 			op->val = ~(regs->gpr[rd] ^ regs->gpr[rb]);
@@ -1497,7 +1497,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 
 		case 378:	/* popcntw */
 			do_popcnt(regs, op, regs->gpr[rd], 32);
-			goto logical_done;
+			goto logical_done_nocc;
 
 		case 412:	/* orc */
 			op->val = regs->gpr[rd] | ~regs->gpr[rb];
@@ -1513,7 +1513,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 #ifdef CONFIG_PPC64
 		case 506:	/* popcntd */
 			do_popcnt(regs, op, regs->gpr[rd], 64);
-			goto logical_done;
+			goto logical_done_nocc;
 #endif
 		case 922:	/* extsh */
 			op->val = (signed short) regs->gpr[rd];
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 08/17] powerpc: Add emulation for the addpcis instruction
  2017-08-30  4:12 [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Paul Mackerras
                   ` (6 preceding siblings ...)
  2017-08-30  4:12 ` [PATCH v3 07/17] powerpc: Don't update CR0 in emulation of popcnt, prty, bpermd instructions Paul Mackerras
@ 2017-08-30  4:12 ` Paul Mackerras
  2017-08-30  4:12 ` [PATCH v3 09/17] powerpc: Make load/store emulation use larger memory accesses Paul Mackerras
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Paul Mackerras @ 2017-08-30  4:12 UTC (permalink / raw)
  To: linuxppc-dev

The addpcis instruction puts the sum of the next instruction address
and a shifted constant into a register.  Since the result depends on the
address of the instruction, it will give an incorrect result if it
is single-stepped out of line, which is what the *probes subsystem
will currently do if a probe is placed on an addpcis instruction.
This fixes the problem by adding emulation of it to analyse_instr().
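
Written out, the emulation computes (with imm being the signed 16-bit
immediate assembled from the d0, d1 and d2 fields):

	op->val = regs->nip + (imm << 16) + 4;

so, for example, an addpcis rD,1 executed at address A leaves
A + 4 + 0x10000 in rD.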

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/lib/sstep.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 114e597..ed2bc4c 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1021,9 +1021,6 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 			op->ccval = (regs->ccr & ~(1UL << (31 - rd))) |
 				(val << (31 - rd));
 			return 1;
-		default:
-			op->type = UNKNOWN;
-			return 0;
 		}
 		break;
 	case 31:
@@ -1123,6 +1120,17 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 		op->val = imm;
 		goto compute_done;
 
+	case 19:
+		if (((instr >> 1) & 0x1f) == 2) {
+			/* addpcis */
+			imm = (short) (instr & 0xffc1);	/* d0 + d2 fields */
+			imm |= (instr >> 15) & 0x3e;	/* d1 field */
+			op->val = regs->nip + (imm << 16) + 4;
+			goto compute_done;
+		}
+		op->type = UNKNOWN;
+		return 0;
+
 	case 20:	/* rlwimi */
 		mb = (instr >> 6) & 0x1f;
 		me = (instr >> 1) & 0x1f;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 09/17] powerpc: Make load/store emulation use larger memory accesses
  2017-08-30  4:12 [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Paul Mackerras
                   ` (7 preceding siblings ...)
  2017-08-30  4:12 ` [PATCH v3 08/17] powerpc: Add emulation for the addpcis instruction Paul Mackerras
@ 2017-08-30  4:12 ` Paul Mackerras
  2017-08-30  4:12 ` [PATCH v3 10/17] powerpc: Emulate FP/vector/VSX loads/stores correctly when regs not live Paul Mackerras
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Paul Mackerras @ 2017-08-30  4:12 UTC (permalink / raw)
  To: linuxppc-dev

At the moment, emulation of loads and stores of up to 8 bytes to
unaligned addresses on a little-endian system uses a sequence of
single-byte loads or stores to memory.  This is rather inefficient,
and the code is hard to follow because it has many ifdefs.
In addition, the Power ISA has requirements on how unaligned accesses
are performed, which are not met by breaking every access into a
sequence of single-byte operations.

Emulation of VSX loads and stores uses __copy_{to,from}_user,
which means the emulation code has no control on the size of
accesses.

To simplify this, we add new copy_mem_in() and copy_mem_out()
functions for accessing memory.  These use a sequence of the largest
possible aligned accesses, up to 8 bytes (or 4 on 32-bit systems),
to copy memory between a local buffer and user memory.  We then
rewrite {read,write}_mem_unaligned and the VSX load/store
emulation using these new functions.
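
As an illustration (with a hypothetical address and length), a call
such as copy_mem_in(dest, 0x1003, 7) performs three accesses:

	copy_mem_in(dest, 0x1003, 7):
		1-byte access at 0x1003		/* reaches 4-byte alignment */
		4-byte access at 0x1004
		2-byte access at 0x1008		/* 2 bytes remain */

since each step uses the largest naturally aligned size that fits the
remaining length.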

These new functions also simplify the code in do_fp_load() and
do_fp_store() for the unaligned cases.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/lib/sstep.c | 235 +++++++++++++++++++++--------------------------
 1 file changed, 106 insertions(+), 129 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index ed2bc4c..6cc2911 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -193,7 +193,6 @@ static nokprobe_inline unsigned long max_align(unsigned long x)
 	return x & -x;		/* isolates rightmost bit */
 }
 
-
 static nokprobe_inline unsigned long byterev_2(unsigned long x)
 {
 	return ((x >> 8) & 0xff) | ((x & 0xff) << 8);
@@ -239,56 +238,69 @@ static nokprobe_inline int read_mem_aligned(unsigned long *dest,
 	return err;
 }
 
-static nokprobe_inline int read_mem_unaligned(unsigned long *dest,
-				unsigned long ea, int nb, struct pt_regs *regs)
+/*
+ * Copy from userspace to a buffer, using the largest possible
+ * aligned accesses, up to sizeof(long).
+ */
+static int nokprobe_inline copy_mem_in(u8 *dest, unsigned long ea, int nb)
 {
-	int err;
-	unsigned long x, b, c;
-#ifdef __LITTLE_ENDIAN__
-	int len = nb; /* save a copy of the length for byte reversal */
-#endif
+	int err = 0;
+	int c;
 
-	/* unaligned, do this in pieces */
-	x = 0;
 	for (; nb > 0; nb -= c) {
-#ifdef __LITTLE_ENDIAN__
-		c = 1;
-#endif
-#ifdef __BIG_ENDIAN__
 		c = max_align(ea);
-#endif
 		if (c > nb)
 			c = max_align(nb);
-		err = read_mem_aligned(&b, ea, c);
+		switch (c) {
+		case 1:
+			err = __get_user(*dest, (unsigned char __user *) ea);
+			break;
+		case 2:
+			err = __get_user(*(u16 *)dest,
+					 (unsigned short __user *) ea);
+			break;
+		case 4:
+			err = __get_user(*(u32 *)dest,
+					 (unsigned int __user *) ea);
+			break;
+#ifdef __powerpc64__
+		case 8:
+			err = __get_user(*(unsigned long *)dest,
+					 (unsigned long __user *) ea);
+			break;
+#endif
+		}
 		if (err)
 			return err;
-		x = (x << (8 * c)) + b;
+		dest += c;
 		ea += c;
 	}
-#ifdef __LITTLE_ENDIAN__
-	switch (len) {
-	case 2:
-		*dest = byterev_2(x);
-		break;
-	case 4:
-		*dest = byterev_4(x);
-		break;
-#ifdef __powerpc64__
-	case 8:
-		*dest = byterev_8(x);
-		break;
-#endif
-	}
-#endif
-#ifdef __BIG_ENDIAN__
-	*dest = x;
-#endif
 	return 0;
 }
 
+static nokprobe_inline int read_mem_unaligned(unsigned long *dest,
+					      unsigned long ea, int nb,
+					      struct pt_regs *regs)
+{
+	union {
+		unsigned long ul;
+		u8 b[sizeof(unsigned long)];
+	} u;
+	int i;
+	int err;
+
+	u.ul = 0;
+	i = IS_BE ? sizeof(unsigned long) - nb : 0;
+	err = copy_mem_in(&u.b[i], ea, nb);
+	if (!err)
+		*dest = u.ul;
+	return err;
+}
+
 /*
  * Read memory at address ea for nb bytes, return 0 for success
- * or -EFAULT if an error occurred.
+ * or -EFAULT if an error occurred.  N.B. nb must be 1, 2, 4 or 8.
+ * If nb < sizeof(long), the result is right-justified on BE systems.
  */
 static int read_mem(unsigned long *dest, unsigned long ea, int nb,
 			      struct pt_regs *regs)
@@ -325,48 +337,64 @@ static nokprobe_inline int write_mem_aligned(unsigned long val,
 	return err;
 }
 
-static nokprobe_inline int write_mem_unaligned(unsigned long val,
-				unsigned long ea, int nb, struct pt_regs *regs)
+/*
+ * Copy from a buffer to userspace, using the largest possible
+ * aligned accesses, up to sizeof(long).
+ */
+static int nokprobe_inline copy_mem_out(u8 *dest, unsigned long ea, int nb)
 {
-	int err;
-	unsigned long c;
+	int err = 0;
+	int c;
 
-#ifdef __LITTLE_ENDIAN__
-	switch (nb) {
-	case 2:
-		val = byterev_2(val);
-		break;
-	case 4:
-		val = byterev_4(val);
-		break;
-#ifdef __powerpc64__
-	case 8:
-		val = byterev_8(val);
-		break;
-#endif
-	}
-#endif
-	/* unaligned or little-endian, do this in pieces */
 	for (; nb > 0; nb -= c) {
-#ifdef __LITTLE_ENDIAN__
-		c = 1;
-#endif
-#ifdef __BIG_ENDIAN__
 		c = max_align(ea);
-#endif
 		if (c > nb)
 			c = max_align(nb);
-		err = write_mem_aligned(val >> (nb - c) * 8, ea, c);
+		switch (c) {
+		case 1:
+			err = __put_user(*dest, (unsigned char __user *) ea);
+			break;
+		case 2:
+			err = __put_user(*(u16 *)dest,
+					 (unsigned short __user *) ea);
+			break;
+		case 4:
+			err = __put_user(*(u32 *)dest,
+					 (unsigned int __user *) ea);
+			break;
+#ifdef __powerpc64__
+		case 8:
+			err = __put_user(*(unsigned long *)dest,
+					 (unsigned long __user *) ea);
+			break;
+#endif
+		}
 		if (err)
 			return err;
+		dest += c;
 		ea += c;
 	}
 	return 0;
 }
 
+static nokprobe_inline int write_mem_unaligned(unsigned long val,
+					       unsigned long ea, int nb,
+					       struct pt_regs *regs)
+{
+	union {
+		unsigned long ul;
+		u8 b[sizeof(unsigned long)];
+	} u;
+	int i;
+
+	u.ul = val;
+	i = IS_BE ? sizeof(unsigned long) - nb : 0;
+	return copy_mem_out(&u.b[i], ea, nb);
+}
+
 /*
  * Write memory at address ea for nb bytes, return 0 for success
- * or -EFAULT if an error occurred.
+ * or -EFAULT if an error occurred.  N.B. nb must be 1, 2, 4 or 8.
  */
 static int write_mem(unsigned long val, unsigned long ea, int nb,
 			       struct pt_regs *regs)
@@ -389,40 +417,17 @@ static int do_fp_load(int rn, int (*func)(int, unsigned long),
 				struct pt_regs *regs)
 {
 	int err;
-	union {
-		double dbl;
-		unsigned long ul[2];
-		struct {
-#ifdef __BIG_ENDIAN__
-			unsigned _pad_;
-			unsigned word;
-#endif
-#ifdef __LITTLE_ENDIAN__
-			unsigned word;
-			unsigned _pad_;
-#endif
-		} single;
-	} data;
-	unsigned long ptr;
+	u8 buf[sizeof(double)] __attribute__((aligned(sizeof(double))));
 
 	if (!address_ok(regs, ea, nb))
 		return -EFAULT;
-	if ((ea & 3) == 0)
-		return (*func)(rn, ea);
-	ptr = (unsigned long) &data.ul;
-	if (sizeof(unsigned long) == 8 || nb == 4) {
-		err = read_mem_unaligned(&data.ul[0], ea, nb, regs);
-		if (nb == 4)
-			ptr = (unsigned long)&(data.single.word);
-	} else {
-		/* reading a double on 32-bit */
-		err = read_mem_unaligned(&data.ul[0], ea, 4, regs);
-		if (!err)
-			err = read_mem_unaligned(&data.ul[1], ea + 4, 4, regs);
+	if (ea & 3) {
+		err = copy_mem_in(buf, ea, nb);
+		if (err)
+			return err;
+		ea = (unsigned long) buf;
 	}
-	if (err)
-		return err;
-	return (*func)(rn, ptr);
+	return (*func)(rn, ea);
 }
 NOKPROBE_SYMBOL(do_fp_load);
 
@@ -431,43 +436,15 @@ static int do_fp_store(int rn, int (*func)(int, unsigned long),
 				 struct pt_regs *regs)
 {
 	int err;
-	union {
-		double dbl;
-		unsigned long ul[2];
-		struct {
-#ifdef __BIG_ENDIAN__
-			unsigned _pad_;
-			unsigned word;
-#endif
-#ifdef __LITTLE_ENDIAN__
-			unsigned word;
-			unsigned _pad_;
-#endif
-		} single;
-	} data;
-	unsigned long ptr;
+	u8 buf[sizeof(double)] __attribute__((aligned(sizeof(double))));
 
 	if (!address_ok(regs, ea, nb))
 		return -EFAULT;
 	if ((ea & 3) == 0)
 		return (*func)(rn, ea);
-	ptr = (unsigned long) &data.ul[0];
-	if (sizeof(unsigned long) == 8 || nb == 4) {
-		if (nb == 4)
-			ptr = (unsigned long)&(data.single.word);
-		err = (*func)(rn, ptr);
-		if (err)
-			return err;
-		err = write_mem_unaligned(data.ul[0], ea, nb, regs);
-	} else {
-		/* writing a double on 32-bit */
-		err = (*func)(rn, ptr);
-		if (err)
-			return err;
-		err = write_mem_unaligned(data.ul[0], ea, 4, regs);
-		if (!err)
-			err = write_mem_unaligned(data.ul[1], ea + 4, 4, regs);
-	}
+	err = (*func)(rn, (unsigned long) buf);
+	if (!err)
+		err = copy_mem_out(buf, ea, nb);
 	return err;
 }
 NOKPROBE_SYMBOL(do_fp_store);
@@ -2564,7 +2541,7 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 #endif
 #ifdef CONFIG_VSX
 	case LOAD_VSX: {
-		char mem[16];
+		u8 mem[16];
 		union vsx_reg buf;
 		unsigned long msrbit = MSR_VSX;
 
@@ -2577,7 +2554,7 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 		if (!(regs->msr & msrbit))
 			return 0;
 		if (!address_ok(regs, ea, size) ||
-		    __copy_from_user(mem, (void __user *)ea, size))
+		    copy_mem_in(mem, ea, size))
 			return 0;
 
 		emulate_vsx_load(&op, &buf, mem);
@@ -2639,7 +2616,7 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 #endif
 #ifdef CONFIG_VSX
 	case STORE_VSX: {
-		char mem[16];
+		u8 mem[16];
 		union vsx_reg buf;
 		unsigned long msrbit = MSR_VSX;
 
@@ -2656,7 +2633,7 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 
 		store_vsrn(op.reg, &buf);
 		emulate_vsx_store(&op, &buf, mem);
-		if (__copy_to_user((void __user *)ea, mem, size))
+		if (copy_mem_out(mem, ea, size))
 			return 0;
 		goto ldst_done;
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 10/17] powerpc: Emulate FP/vector/VSX loads/stores correctly when regs not live
  2017-08-30  4:12 [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Paul Mackerras
                   ` (8 preceding siblings ...)
  2017-08-30  4:12 ` [PATCH v3 09/17] powerpc: Make load/store emulation use larger memory accesses Paul Mackerras
@ 2017-08-30  4:12 ` Paul Mackerras
  2017-08-30  4:12 ` [PATCH v3 11/17] powerpc: Emulate vector element load/store instructions Paul Mackerras
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Paul Mackerras @ 2017-08-30  4:12 UTC (permalink / raw)
  To: linuxppc-dev

At present, the analyse_instr/emulate_step code checks for the
relevant MSR_FP/VEC/VSX bit being set when a FP/VMX/VSX load
or store is decoded, but doesn't recheck the bit before reading or
writing the relevant FP/VMX/VSX register in emulate_step().

Since we don't have preemption disabled, it is possible that we get
preempted between checking the MSR bit and doing the register access.
If that happened, then the registers would have been saved to the
thread_struct for the current process.  Accesses to the CPU registers
would then potentially read stale values, or write values that would
never be seen by the user process.

Another way that the registers can become non-live is if a page
fault occurs when accessing user memory, and the page fault code
calls a copy routine that wants to use the VMX or VSX registers.

To fix this, the code for all the FP/VMX/VSX loads gets restructured
so that it forms an image of the desired register contents in a local
variable, then disables preemption, checks the MSR bit and either
sets the CPU register or writes the value to the thread struct.
Similarly, the code for stores checks the MSR bit, copies either the
CPU register or the thread struct to a local variable, re-enables
preemption and then copies the register image to memory.
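
Concretely, a FP load ends up following this pattern (a simplified
sketch, assuming a local "union { double d; unsigned long l; } val"
that already holds the value read from user memory):

	preempt_disable();
	if (regs->msr & MSR_FP)
		put_fpr(rn, &val.d);	/* registers are live on this CPU */
	else
		current->thread.TS_FPR(rn) = val.l;	/* update saved image */
	preempt_enable();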

If the instruction being emulated is in the kernel, then we must not
use the register values in the thread_struct.  In this case, if the
relevant MSR enable bit is not set, then emulate_step refuses to
emulate the instruction.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/include/asm/sstep.h |   1 +
 arch/powerpc/lib/ldstfp.S        | 241 +++++++--------------------------------
 arch/powerpc/lib/sstep.c         | 228 +++++++++++++++++++++++++-----------
 3 files changed, 203 insertions(+), 267 deletions(-)

diff --git a/arch/powerpc/include/asm/sstep.h b/arch/powerpc/include/asm/sstep.h
index 4fcc2c9..474a992 100644
--- a/arch/powerpc/include/asm/sstep.h
+++ b/arch/powerpc/include/asm/sstep.h
@@ -119,6 +119,7 @@ union vsx_reg {
 	unsigned long d[2];
 	float	fp[4];
 	double	dp[2];
+	__vector128 v;
 };
 
 /*
diff --git a/arch/powerpc/lib/ldstfp.S b/arch/powerpc/lib/ldstfp.S
index 6840911..7b5cf5e 100644
--- a/arch/powerpc/lib/ldstfp.S
+++ b/arch/powerpc/lib/ldstfp.S
@@ -21,27 +21,19 @@
 
 #define STKFRM	(PPC_MIN_STKFRM + 16)
 
-	.macro	inst32	op
-reg = 0
-	.rept	32
-20:	\op	reg,0,r4
-	b	3f
-	EX_TABLE(20b,99f)
-reg = reg + 1
-	.endr
-	.endm
-
-/* Get the contents of frN into fr0; N is in r3. */
+/* Get the contents of frN into *p; N is in r3 and p is in r4. */
 _GLOBAL(get_fpr)
 	mflr	r0
+	mfmsr	r6
+	ori	r7, r6, MSR_FP
+	MTMSRD(r7)
+	isync
 	rlwinm	r3,r3,3,0xf8
 	bcl	20,31,1f
-	blr			/* fr0 is already in fr0 */
-	nop
-reg = 1
-	.rept	31
-	fmr	fr0,reg
-	blr
+reg = 0
+	.rept	32
+	stfd	reg, 0(r4)
+	b	2f
 reg = reg + 1
 	.endr
 1:	mflr	r5
@@ -49,18 +41,23 @@ reg = reg + 1
 	mtctr	r5
 	mtlr	r0
 	bctr
+2:	MTMSRD(r6)
+	isync
+	blr
 
-/* Put the contents of fr0 into frN; N is in r3. */
+/* Put the contents of *p into frN; N is in r3 and p is in r4. */
 _GLOBAL(put_fpr)
 	mflr	r0
+	mfmsr	r6
+	ori	r7, r6, MSR_FP
+	MTMSRD(r7)
+	isync
 	rlwinm	r3,r3,3,0xf8
 	bcl	20,31,1f
-	blr			/* fr0 is already in fr0 */
-	nop
-reg = 1
-	.rept	31
-	fmr	reg,fr0
-	blr
+reg = 0
+	.rept	32
+	lfd	reg, 0(r4)
+	b	2f
 reg = reg + 1
 	.endr
 1:	mflr	r5
@@ -68,127 +65,24 @@ reg = reg + 1
 	mtctr	r5
 	mtlr	r0
 	bctr
-
-/* Load FP reg N from float at *p.  N is in r3, p in r4. */
-_GLOBAL(do_lfs)
-	PPC_STLU r1,-STKFRM(r1)
-	mflr	r0
-	PPC_STL	r0,STKFRM+PPC_LR_STKOFF(r1)
-	mfmsr	r6
-	ori	r7,r6,MSR_FP
-	cmpwi	cr7,r3,0
-	MTMSRD(r7)
-	isync
-	beq	cr7,1f
-	stfd	fr0,STKFRM-16(r1)
-1:	li	r9,-EFAULT
-2:	lfs	fr0,0(r4)
-	li	r9,0
-3:	bl	put_fpr
-	beq	cr7,4f
-	lfd	fr0,STKFRM-16(r1)
-4:	PPC_LL	r0,STKFRM+PPC_LR_STKOFF(r1)
-	mtlr	r0
-	MTMSRD(r6)
-	isync
-	mr	r3,r9
-	addi	r1,r1,STKFRM
-	blr
-	EX_TABLE(2b,3b)
-
-/* Load FP reg N from double at *p.  N is in r3, p in r4. */
-_GLOBAL(do_lfd)
-	PPC_STLU r1,-STKFRM(r1)
-	mflr	r0
-	PPC_STL	r0,STKFRM+PPC_LR_STKOFF(r1)
-	mfmsr	r6
-	ori	r7,r6,MSR_FP
-	cmpwi	cr7,r3,0
-	MTMSRD(r7)
-	isync
-	beq	cr7,1f
-	stfd	fr0,STKFRM-16(r1)
-1:	li	r9,-EFAULT
-2:	lfd	fr0,0(r4)
-	li	r9,0
-3:	beq	cr7,4f
-	bl	put_fpr
-	lfd	fr0,STKFRM-16(r1)
-4:	PPC_LL	r0,STKFRM+PPC_LR_STKOFF(r1)
-	mtlr	r0
-	MTMSRD(r6)
-	isync
-	mr	r3,r9
-	addi	r1,r1,STKFRM
-	blr
-	EX_TABLE(2b,3b)
-
-/* Store FP reg N to float at *p.  N is in r3, p in r4. */
-_GLOBAL(do_stfs)
-	PPC_STLU r1,-STKFRM(r1)
-	mflr	r0
-	PPC_STL	r0,STKFRM+PPC_LR_STKOFF(r1)
-	mfmsr	r6
-	ori	r7,r6,MSR_FP
-	cmpwi	cr7,r3,0
-	MTMSRD(r7)
-	isync
-	beq	cr7,1f
-	stfd	fr0,STKFRM-16(r1)
-	bl	get_fpr
-1:	li	r9,-EFAULT
-2:	stfs	fr0,0(r4)
-	li	r9,0
-3:	beq	cr7,4f
-	lfd	fr0,STKFRM-16(r1)
-4:	PPC_LL	r0,STKFRM+PPC_LR_STKOFF(r1)
-	mtlr	r0
-	MTMSRD(r6)
+2:	MTMSRD(r6)
 	isync
-	mr	r3,r9
-	addi	r1,r1,STKFRM
 	blr
-	EX_TABLE(2b,3b)
 
-/* Store FP reg N to double at *p.  N is in r3, p in r4. */
-_GLOBAL(do_stfd)
-	PPC_STLU r1,-STKFRM(r1)
+#ifdef CONFIG_ALTIVEC
+/* Get the contents of vrN into *p; N is in r3 and p is in r4. */
+_GLOBAL(get_vr)
 	mflr	r0
-	PPC_STL	r0,STKFRM+PPC_LR_STKOFF(r1)
 	mfmsr	r6
-	ori	r7,r6,MSR_FP
-	cmpwi	cr7,r3,0
+	oris	r7, r6, MSR_VEC@h
 	MTMSRD(r7)
 	isync
-	beq	cr7,1f
-	stfd	fr0,STKFRM-16(r1)
-	bl	get_fpr
-1:	li	r9,-EFAULT
-2:	stfd	fr0,0(r4)
-	li	r9,0
-3:	beq	cr7,4f
-	lfd	fr0,STKFRM-16(r1)
-4:	PPC_LL	r0,STKFRM+PPC_LR_STKOFF(r1)
-	mtlr	r0
-	MTMSRD(r6)
-	isync
-	mr	r3,r9
-	addi	r1,r1,STKFRM
-	blr
-	EX_TABLE(2b,3b)
-
-#ifdef CONFIG_ALTIVEC
-/* Get the contents of vrN into v0; N is in r3. Doesn't touch r3 or r4. */
-_GLOBAL(get_vr)
-	mflr	r0
 	rlwinm	r6,r3,3,0xf8
 	bcl	20,31,1f
-	blr			/* v0 is already in v0 */
-	nop
-reg = 1
-	.rept	31
-	vor	v0,reg,reg	/* assembler doesn't know vmr? */
-	blr
+reg = 0
+	.rept	32
+	stvx	reg, 0, r4
+	b	2f
 reg = reg + 1
 	.endr
 1:	mflr	r5
@@ -196,18 +90,23 @@ reg = reg + 1
 	mtctr	r5
 	mtlr	r0
 	bctr
+2:	MTMSRD(r6)
+	isync
+	blr
 
-/* Put the contents of v0 into vrN; N is in r3. Doesn't touch r3 or r4. */
+/* Put the contents of *p into vrN; N is in r3 and p is in r4. */
 _GLOBAL(put_vr)
 	mflr	r0
+	mfmsr	r6
+	oris	r7, r6, MSR_VEC@h
+	MTMSRD(r7)
+	isync
 	rlwinm	r6,r3,3,0xf8
 	bcl	20,31,1f
-	blr			/* v0 is already in v0 */
-	nop
-reg = 1
-	.rept	31
-	vor	reg,v0,v0
-	blr
+reg = 0
+	.rept	32
+	lvx	reg, 0, r4
+	b	2f
 reg = reg + 1
 	.endr
 1:	mflr	r5
@@ -215,62 +114,9 @@ reg = reg + 1
 	mtctr	r5
 	mtlr	r0
 	bctr
-
-/* Load vector reg N from *p.  N is in r3, p in r4. */
-_GLOBAL(do_lvx)
-	PPC_STLU r1,-STKFRM(r1)
-	mflr	r0
-	PPC_STL	r0,STKFRM+PPC_LR_STKOFF(r1)
-	mfmsr	r6
-	oris	r7,r6,MSR_VEC@h
-	cmpwi	cr7,r3,0
-	li	r8,STKFRM-16
-	MTMSRD(r7)
-	isync
-	beq	cr7,1f
-	stvx	v0,r1,r8
-1:	li	r9,-EFAULT
-2:	lvx	v0,0,r4
-	li	r9,0
-3:	beq	cr7,4f
-	bl	put_vr
-	lvx	v0,r1,r8
-4:	PPC_LL	r0,STKFRM+PPC_LR_STKOFF(r1)
-	mtlr	r0
-	MTMSRD(r6)
-	isync
-	mr	r3,r9
-	addi	r1,r1,STKFRM
-	blr
-	EX_TABLE(2b,3b)
-
-/* Store vector reg N to *p.  N is in r3, p in r4. */
-_GLOBAL(do_stvx)
-	PPC_STLU r1,-STKFRM(r1)
-	mflr	r0
-	PPC_STL	r0,STKFRM+PPC_LR_STKOFF(r1)
-	mfmsr	r6
-	oris	r7,r6,MSR_VEC@h
-	cmpwi	cr7,r3,0
-	li	r8,STKFRM-16
-	MTMSRD(r7)
-	isync
-	beq	cr7,1f
-	stvx	v0,r1,r8
-	bl	get_vr
-1:	li	r9,-EFAULT
-2:	stvx	v0,0,r4
-	li	r9,0
-3:	beq	cr7,4f
-	lvx	v0,r1,r8
-4:	PPC_LL	r0,STKFRM+PPC_LR_STKOFF(r1)
-	mtlr	r0
-	MTMSRD(r6)
+2:	MTMSRD(r6)
 	isync
-	mr	r3,r9
-	addi	r1,r1,STKFRM
 	blr
-	EX_TABLE(2b,3b)
 #endif /* CONFIG_ALTIVEC */
 
 #ifdef CONFIG_VSX
@@ -363,7 +209,6 @@ _GLOBAL(store_vsrn)
 	mr	r3,r9
 	addi	r1,r1,STKFRM
 	blr
-	EX_TABLE(2b,3b)
 #endif /* CONFIG_VSX */
 
 /* Convert single-precision to double, without disturbing FPRs. */
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 6cc2911..91ae031 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -36,12 +36,10 @@ extern char system_call_common[];
 /*
  * Functions in ldstfp.S
  */
-extern int do_lfs(int rn, unsigned long ea);
-extern int do_lfd(int rn, unsigned long ea);
-extern int do_stfs(int rn, unsigned long ea);
-extern int do_stfd(int rn, unsigned long ea);
-extern int do_lvx(int rn, unsigned long ea);
-extern int do_stvx(int rn, unsigned long ea);
+extern void get_fpr(int rn, double *p);
+extern void put_fpr(int rn, const double *p);
+extern void get_vr(int rn, __vector128 *p);
+extern void put_vr(int rn, __vector128 *p);
 extern void load_vsrn(int vsr, const void *p);
 extern void store_vsrn(int vsr, void *p);
 extern void conv_sp_to_dp(const float *sp, double *dp);
@@ -409,63 +407,108 @@ NOKPROBE_SYMBOL(write_mem);
 
 #ifdef CONFIG_PPC_FPU
 /*
- * Check the address and alignment, and call func to do the actual
- * load or store.
+ * These access either the real FP register or the image in the
+ * thread_struct, depending on regs->msr & MSR_FP.
  */
-static int do_fp_load(int rn, int (*func)(int, unsigned long),
-				unsigned long ea, int nb,
-				struct pt_regs *regs)
+static int do_fp_load(int rn, unsigned long ea, int nb, struct pt_regs *regs)
 {
 	int err;
-	u8 buf[sizeof(double)] __attribute__((aligned(sizeof(double))));
+	union {
+		float f;
+		double d;
+		unsigned long l;
+		u8 b[sizeof(double)];
+	} u;
 
 	if (!address_ok(regs, ea, nb))
 		return -EFAULT;
-	if (ea & 3) {
-		err = copy_mem_in(buf, ea, nb);
-		if (err)
-			return err;
-		ea = (unsigned long) buf;
-	}
-	return (*func)(rn, ea);
+	err = copy_mem_in(u.b, ea, nb);
+	if (err)
+		return err;
+	preempt_disable();
+	if (nb == 4)
+		conv_sp_to_dp(&u.f, &u.d);
+	if (regs->msr & MSR_FP)
+		put_fpr(rn, &u.d);
+	else
+		current->thread.TS_FPR(rn) = u.l;
+	preempt_enable();
+	return 0;
 }
 NOKPROBE_SYMBOL(do_fp_load);
 
-static int do_fp_store(int rn, int (*func)(int, unsigned long),
-				 unsigned long ea, int nb,
-				 struct pt_regs *regs)
+static int do_fp_store(int rn, unsigned long ea, int nb, struct pt_regs *regs)
 {
-	int err;
-	u8 buf[sizeof(double)] __attribute__((aligned(sizeof(double))));
+	union {
+		float f;
+		double d;
+		unsigned long l;
+		u8 b[sizeof(double)];
+	} u;
 
 	if (!address_ok(regs, ea, nb))
 		return -EFAULT;
-	if ((ea & 3) == 0)
-		return (*func)(rn, ea);
-	err = (*func)(rn, (unsigned long) buf);
-	if (!err)
-		err = copy_mem_out(buf, ea, nb);
-	return err;
+	preempt_disable();
+	if (regs->msr & MSR_FP)
+		get_fpr(rn, &u.d);
+	else
+		u.l = current->thread.TS_FPR(rn);
+	if (nb == 4)
+		conv_dp_to_sp(&u.d, &u.f);
+	preempt_enable();
+	return copy_mem_out(u.b, ea, nb);
 }
 NOKPROBE_SYMBOL(do_fp_store);
 #endif
 
 #ifdef CONFIG_ALTIVEC
 /* For Altivec/VMX, no need to worry about alignment */
-static nokprobe_inline int do_vec_load(int rn, int (*func)(int, unsigned long),
-				 unsigned long ea, struct pt_regs *regs)
+static nokprobe_inline int do_vec_load(int rn, unsigned long ea,
+				       int size, struct pt_regs *regs)
 {
+	int err;
+	union {
+		__vector128 v;
+		u8 b[sizeof(__vector128)];
+	} u = {};
+
 	if (!address_ok(regs, ea & ~0xfUL, 16))
 		return -EFAULT;
-	return (*func)(rn, ea);
+	/* align to multiple of size */
+	ea &= ~(size - 1);
+	err = copy_mem_in(u.b, ea, size);
+	if (err)
+		return err;
+
+	preempt_disable();
+	if (regs->msr & MSR_VEC)
+		put_vr(rn, &u.v);
+	else
+		current->thread.vr_state.vr[rn] = u.v;
+	preempt_enable();
+	return 0;
 }
 
-static nokprobe_inline int do_vec_store(int rn, int (*func)(int, unsigned long),
-				  unsigned long ea, struct pt_regs *regs)
+static nokprobe_inline int do_vec_store(int rn, unsigned long ea,
+					int size, struct pt_regs *regs)
 {
+	union {
+		__vector128 v;
+		u8 b[sizeof(__vector128)];
+	} u;
+
 	if (!address_ok(regs, ea & ~0xfUL, 16))
 		return -EFAULT;
-	return (*func)(rn, ea);
+	/* align to multiple of size */
+	ea &= ~(size - 1);
+
+	preempt_disable();
+	if (regs->msr & MSR_VEC)
+		get_vr(rn, &u.v);
+	else
+		u.v = current->thread.vr_state.vr[rn];
+	preempt_enable();
+	return copy_mem_out(u.b, ea, size);
 }
 #endif /* CONFIG_ALTIVEC */
 
@@ -658,6 +701,68 @@ void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg,
 }
 EXPORT_SYMBOL_GPL(emulate_vsx_store);
 NOKPROBE_SYMBOL(emulate_vsx_store);
+
+static nokprobe_inline int do_vsx_load(struct instruction_op *op,
+				       unsigned long ea, struct pt_regs *regs)
+{
+	int reg = op->reg;
+	u8 mem[16];
+	union vsx_reg buf;
+	int size = GETSIZE(op->type);
+
+	if (!address_ok(regs, ea, size) || copy_mem_in(mem, ea, size))
+		return -EFAULT;
+
+	emulate_vsx_load(op, &buf, mem);
+	preempt_disable();
+	if (reg < 32) {
+		/* FP regs + extensions */
+		if (regs->msr & MSR_FP) {
+			load_vsrn(reg, &buf);
+		} else {
+			current->thread.fp_state.fpr[reg][0] = buf.d[0];
+			current->thread.fp_state.fpr[reg][1] = buf.d[1];
+		}
+	} else {
+		if (regs->msr & MSR_VEC)
+			load_vsrn(reg, &buf);
+		else
+			current->thread.vr_state.vr[reg - 32] = buf.v;
+	}
+	preempt_enable();
+	return 0;
+}
+
+static nokprobe_inline int do_vsx_store(struct instruction_op *op,
+					unsigned long ea, struct pt_regs *regs)
+{
+	int reg = op->reg;
+	u8 mem[16];
+	union vsx_reg buf;
+	int size = GETSIZE(op->type);
+
+	if (!address_ok(regs, ea, size))
+		return -EFAULT;
+
+	preempt_disable();
+	if (reg < 32) {
+		/* FP regs + extensions */
+		if (regs->msr & MSR_FP) {
+			store_vsrn(reg, &buf);
+		} else {
+			buf.d[0] = current->thread.fp_state.fpr[reg][0];
+			buf.d[1] = current->thread.fp_state.fpr[reg][1];
+		}
+	} else {
+		if (regs->msr & MSR_VEC)
+			store_vsrn(reg, &buf);
+		else
+			buf.v = current->thread.vr_state.vr[reg - 32];
+	}
+	preempt_enable();
+	emulate_vsx_store(op, &buf, mem);
+	return  copy_mem_out(mem, ea, size);
+}
 #endif /* CONFIG_VSX */
 
 #define __put_user_asmx(x, addr, err, op, cr)		\
@@ -2524,25 +2629,26 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 
 #ifdef CONFIG_PPC_FPU
 	case LOAD_FP:
-		if (!(regs->msr & MSR_FP))
+		/*
+		 * If the instruction is in userspace, we can emulate it even
+		 * if the FP state is not live, because we have the state
+		 * stored in the thread_struct.  If the instruction is in
+		 * the kernel, we must not touch the state in the thread_struct.
+		 */
+		if (!(regs->msr & MSR_PR) && !(regs->msr & MSR_FP))
 			return 0;
-		if (size == 4)
-			err = do_fp_load(op.reg, do_lfs, ea, size, regs);
-		else
-			err = do_fp_load(op.reg, do_lfd, ea, size, regs);
+		err = do_fp_load(op.reg, ea, size, regs);
 		goto ldst_done;
 #endif
 #ifdef CONFIG_ALTIVEC
 	case LOAD_VMX:
-		if (!(regs->msr & MSR_VEC))
+		if (!(regs->msr & MSR_PR) && !(regs->msr & MSR_VEC))
 			return 0;
-		err = do_vec_load(op.reg, do_lvx, ea, regs);
+		err = do_vec_load(op.reg, ea, size, regs);
 		goto ldst_done;
 #endif
 #ifdef CONFIG_VSX
 	case LOAD_VSX: {
-		u8 mem[16];
-		union vsx_reg buf;
 		unsigned long msrbit = MSR_VSX;
 
 		/*
@@ -2551,14 +2657,9 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 		 */
 		if (op.reg >= 32 && (op.vsx_flags & VSX_CHECK_VEC))
 			msrbit = MSR_VEC;
-		if (!(regs->msr & msrbit))
+		if (!(regs->msr & MSR_PR) && !(regs->msr & msrbit))
 			return 0;
-		if (!address_ok(regs, ea, size) ||
-		    copy_mem_in(mem, ea, size))
-			return 0;
-
-		emulate_vsx_load(&op, &buf, mem);
-		load_vsrn(op.reg, &buf);
+		err = do_vsx_load(&op, ea, regs);
 		goto ldst_done;
 	}
 #endif
@@ -2599,25 +2700,20 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 
 #ifdef CONFIG_PPC_FPU
 	case STORE_FP:
-		if (!(regs->msr & MSR_FP))
+		if (!(regs->msr & MSR_PR) && !(regs->msr & MSR_FP))
 			return 0;
-		if (size == 4)
-			err = do_fp_store(op.reg, do_stfs, ea, size, regs);
-		else
-			err = do_fp_store(op.reg, do_stfd, ea, size, regs);
+		err = do_fp_store(op.reg, ea, size, regs);
 		goto ldst_done;
 #endif
 #ifdef CONFIG_ALTIVEC
 	case STORE_VMX:
-		if (!(regs->msr & MSR_VEC))
+		if (!(regs->msr & MSR_PR) && !(regs->msr & MSR_VEC))
 			return 0;
-		err = do_vec_store(op.reg, do_stvx, ea, regs);
+		err = do_vec_store(op.reg, ea, size, regs);
 		goto ldst_done;
 #endif
 #ifdef CONFIG_VSX
 	case STORE_VSX: {
-		u8 mem[16];
-		union vsx_reg buf;
 		unsigned long msrbit = MSR_VSX;
 
 		/*
@@ -2626,15 +2722,9 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 		 */
 		if (op.reg >= 32 && (op.vsx_flags & VSX_CHECK_VEC))
 			msrbit = MSR_VEC;
-		if (!(regs->msr & msrbit))
-			return 0;
-		if (!address_ok(regs, ea, size))
-			return 0;
-
-		store_vsrn(op.reg, &buf);
-		emulate_vsx_store(&op, &buf, mem);
-		if (copy_mem_out(mem, ea, size))
+		if (!(regs->msr & MSR_PR) && !(regs->msr & msrbit))
 			return 0;
+		err = do_vsx_store(&op, ea, regs);
 		goto ldst_done;
 	}
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 11/17] powerpc: Emulate vector element load/store instructions
  2017-08-30  4:12 [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Paul Mackerras
                   ` (9 preceding siblings ...)
  2017-08-30  4:12 ` [PATCH v3 10/17] powerpc: Emulate FP/vector/VSX loads/stores correctly when regs not live Paul Mackerras
@ 2017-08-30  4:12 ` Paul Mackerras
  2017-08-30  4:12 ` [PATCH v3 12/17] powerpc: Emulate load/store floating double pair instructions Paul Mackerras
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Paul Mackerras @ 2017-08-30  4:12 UTC (permalink / raw)
  To: linuxppc-dev

This adds code to analyse_instr() and emulate_step() to handle the
vector element loads and stores:

lvebx, lvehx, lvewx, stvebx, stvehx, stvewx.
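
A minimal userspace sketch (not the kernel code; names here are
illustrative) of the indexing these instructions need: the EA is
aligned down to the element size, and its low four bits select the
byte offset within the 16-byte register image.

    #include <stdint.h>
    #include <string.h>

    /* Sketch only: 'vimage' stands in for the 16-byte VMX register
     * image; 'mem' is the memory at the aligned EA; size is 1, 2 or 4. */
    static void vec_element_load_sketch(uint8_t vimage[16],
                                        const uint8_t *mem,
                                        unsigned long ea, int size)
    {
            ea &= ~(unsigned long)(size - 1);       /* align EA down */
            memcpy(&vimage[ea & 0xf], mem, size);   /* EA bits pick the field */
    }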

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/lib/sstep.c | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 91ae031..167d40d 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -476,7 +476,7 @@ static nokprobe_inline int do_vec_load(int rn, unsigned long ea,
 		return -EFAULT;
 	/* align to multiple of size */
 	ea &= ~(size - 1);
-	err = copy_mem_in(u.b, ea, size);
+	err = copy_mem_in(&u.b[ea & 0xf], ea, size);
 	if (err)
 		return err;
 
@@ -508,7 +508,7 @@ static nokprobe_inline int do_vec_store(int rn, unsigned long ea,
 	else
 		u.v = current->thread.vr_state.vr[rn];
 	preempt_enable();
-	return copy_mem_out(u.b, ea, size);
+	return copy_mem_out(&u.b[ea & 0xf], ea, size);
 }
 #endif /* CONFIG_ALTIVEC */
 
@@ -1807,12 +1807,46 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 			break;
 
 #ifdef CONFIG_ALTIVEC
+		/*
+		 * Note: for the load/store vector element instructions,
+		 * bits of the EA select which field of the VMX register to use.
+		 */
+		case 7:		/* lvebx */
+			op->type = MKOP(LOAD_VMX, 0, 1);
+			op->element_size = 1;
+			break;
+
+		case 39:	/* lvehx */
+			op->type = MKOP(LOAD_VMX, 0, 2);
+			op->element_size = 2;
+			break;
+
+		case 71:	/* lvewx */
+			op->type = MKOP(LOAD_VMX, 0, 4);
+			op->element_size = 4;
+			break;
+
 		case 103:	/* lvx */
 		case 359:	/* lvxl */
 			op->type = MKOP(LOAD_VMX, 0, 16);
 			op->element_size = 16;
 			break;
 
+		case 135:	/* stvebx */
+			op->type = MKOP(STORE_VMX, 0, 1);
+			op->element_size = 1;
+			break;
+
+		case 167:	/* stvehx */
+			op->type = MKOP(STORE_VMX, 0, 2);
+			op->element_size = 2;
+			break;
+
+		case 199:	/* stvewx */
+			op->type = MKOP(STORE_VMX, 0, 4);
+			op->element_size = 4;
+			break;
+
 		case 231:	/* stvx */
 		case 487:	/* stvxl */
 			op->type = MKOP(STORE_VMX, 0, 16);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 12/17] powerpc: Emulate load/store floating double pair instructions
  2017-08-30  4:12 [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Paul Mackerras
                   ` (10 preceding siblings ...)
  2017-08-30  4:12 ` [PATCH v3 11/17] powerpc: Emulate vector element load/store instructions Paul Mackerras
@ 2017-08-30  4:12 ` Paul Mackerras
  2017-08-30  4:12 ` [PATCH v3 13/17] powerpc: Emulate the dcbz instruction Paul Mackerras
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Paul Mackerras @ 2017-08-30  4:12 UTC (permalink / raw)
  To: linuxppc-dev

This adds lfdp[x] and stfdp[x] to the set of instructions that
analyse_instr() and emulate_step() understand.
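
As a hedged illustration of the even/odd register pairing (plain C,
made-up names; only the pairing rule comes from the patch):

    #include <stdint.h>

    /* Sketch: lfdp loads 16 bytes into an even/odd FP register pair;
     * the second doubleword lands in register rn|1. */
    static int lfdp_sketch(uint64_t fpr[32], int rn, const uint64_t mem[2])
    {
            if (rn & 1)
                    return -1;      /* the instruction requires an even reg */
            fpr[rn]     = mem[0];   /* first doubleword */
            fpr[rn | 1] = mem[1];   /* second doubleword -> odd register */
            return 0;
    }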

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/lib/sstep.c | 68 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 52 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 167d40d..817cdc9 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -415,9 +415,9 @@ static int do_fp_load(int rn, unsigned long ea, int nb, struct pt_regs *regs)
 	int err;
 	union {
 		float f;
-		double d;
-		unsigned long l;
-		u8 b[sizeof(double)];
+		double d[2];
+		unsigned long l[2];
+		u8 b[2 * sizeof(double)];
 	} u;
 
 	if (!address_ok(regs, ea, nb))
@@ -427,11 +427,19 @@ static int do_fp_load(int rn, unsigned long ea, int nb, struct pt_regs *regs)
 		return err;
 	preempt_disable();
 	if (nb == 4)
-		conv_sp_to_dp(&u.f, &u.d);
+		conv_sp_to_dp(&u.f, &u.d[0]);
 	if (regs->msr & MSR_FP)
-		put_fpr(rn, &u.d);
+		put_fpr(rn, &u.d[0]);
 	else
-		current->thread.TS_FPR(rn) = u.l;
+		current->thread.TS_FPR(rn) = u.l[0];
+	if (nb == 16) {
+		/* lfdp */
+		rn |= 1;
+		if (regs->msr & MSR_FP)
+			put_fpr(rn, &u.d[1]);
+		else
+			current->thread.TS_FPR(rn) = u.l[1];
+	}
 	preempt_enable();
 	return 0;
 }
@@ -441,20 +449,27 @@ static int do_fp_store(int rn, unsigned long ea, int nb, struct pt_regs *regs)
 {
 	union {
 		float f;
-		double d;
-		unsigned long l;
-		u8 b[sizeof(double)];
+		double d[2];
+		unsigned long l[2];
+		u8 b[2 * sizeof(double)];
 	} u;
 
 	if (!address_ok(regs, ea, nb))
 		return -EFAULT;
 	preempt_disable();
 	if (regs->msr & MSR_FP)
-		get_fpr(rn, &u.d);
+		get_fpr(rn, &u.d[0]);
 	else
-		u.l = current->thread.TS_FPR(rn);
+		u.l[0] = current->thread.TS_FPR(rn);
 	if (nb == 4)
-		conv_dp_to_sp(&u.d, &u.f);
+		conv_dp_to_sp(&u.d[0], &u.f);
+	if (nb == 16) {
+		rn |= 1;
+		if (regs->msr & MSR_FP)
+			get_fpr(rn, &u.d[1]);
+		else
+			u.l[1] = current->thread.TS_FPR(rn);
+	}
 	preempt_enable();
 	return copy_mem_out(u.b, ea, nb);
 }
@@ -1938,7 +1953,17 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 		case 759:	/* stfdux */
 			op->type = MKOP(STORE_FP, u, 8);
 			break;
-#endif
+
+#ifdef __powerpc64__
+		case 791:	/* lfdpx */
+			op->type = MKOP(LOAD_FP, 0, 16);
+			break;
+
+		case 919:	/* stfdpx */
+			op->type = MKOP(STORE_FP, 0, 16);
+			break;
+#endif /* __powerpc64__ */
+#endif /* CONFIG_PPC_FPU */
 
 #ifdef __powerpc64__
 		case 660:	/* stdbrx */
@@ -1956,7 +1981,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 			op->val = byterev_4(regs->gpr[rd]);
 			break;
 
-		case 725:
+		case 725:	/* stswi */
 			if (rb == 0)
 				rb = 32;	/* # bytes to store */
 			op->type = MKOP(STORE_MULTI, 0, rb);
@@ -2246,9 +2271,14 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 #endif
 
 #ifdef CONFIG_VSX
-	case 57:	/* lxsd, lxssp */
+	case 57:	/* lfdp, lxsd, lxssp */
 		op->ea = dsform_ea(instr, regs);
 		switch (instr & 3) {
+		case 0:		/* lfdp */
+			if (rd & 1)
+				break;		/* reg must be even */
+			op->type = MKOP(LOAD_FP, 0, 16);
+			break;
 		case 2:		/* lxsd */
 			op->reg = rd + 32;
 			op->type = MKOP(LOAD_VSX, 0, 8);
@@ -2283,8 +2313,14 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 #endif
 
 #ifdef CONFIG_VSX
-	case 61:	/* lxv, stxsd, stxssp, stxv */
+	case 61:	/* stfdp, lxv, stxsd, stxssp, stxv */
 		switch (instr & 7) {
+		case 0:		/* stfdp with LSB of DS field = 0 */
+		case 4:		/* stfdp with LSB of DS field = 1 */
+			op->ea = dsform_ea(instr, regs);
+			op->type = MKOP(STORE_FP, 0, 16);
+			break;
+
 		case 1:		/* lxv */
 			op->ea = dqform_ea(instr, regs);
 			if (instr & 8)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 13/17] powerpc: Emulate the dcbz instruction
  2017-08-30  4:12 [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Paul Mackerras
                   ` (11 preceding siblings ...)
  2017-08-30  4:12 ` [PATCH v3 12/17] powerpc: Emulate load/store floating double pair instructions Paul Mackerras
@ 2017-08-30  4:12 ` Paul Mackerras
  2017-08-30  4:12 ` [PATCH v3 14/17] powerpc: Set regs->dar if memory access fails in emulate_step() Paul Mackerras
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Paul Mackerras @ 2017-08-30  4:12 UTC (permalink / raw)
  To: linuxppc-dev

This adds code to analyse_instr() and emulate_step() to understand the
dcbz (data cache block zero) instruction.  The emulate_dcbz() function
is made public so it can be used by the alignment handler in future.
(The apparently unnecessary cropping of the address to 32 bits is
there because the alignment handler will need it when emulating
dcbz for 32-bit tasks.)
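
The address arithmetic is small enough to show in isolation; this is
a userspace sketch over a byte array, not the kernel implementation
(which writes through __put_user() so that faults are caught):

    #include <stdint.h>
    #include <string.h>

    /* Sketch: crop the EA for 32-bit mode, align down to the cache
     * block, then zero the whole block. */
    static void dcbz_sketch(uint8_t *ram, unsigned long ea,
                            unsigned long block_size, int is_32bit)
    {
            if (is_32bit)
                    ea &= 0xffffffffUL;             /* 32-bit mode crops the EA */
            ea &= ~(block_size - 1);                /* round down to the block */
            memset(ram + ea, 0, block_size);        /* zero the block */
    }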

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/include/asm/sstep.h |  2 ++
 arch/powerpc/lib/sstep.c         | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/arch/powerpc/include/asm/sstep.h b/arch/powerpc/include/asm/sstep.h
index 474a992..793639a 100644
--- a/arch/powerpc/include/asm/sstep.h
+++ b/arch/powerpc/include/asm/sstep.h
@@ -84,6 +84,7 @@ enum instruction_type {
 #define DCBTST		0x200
 #define DCBT		0x300
 #define ICBI		0x400
+#define DCBZ		0x500
 
 /* VSX flags values */
 #define VSX_FPCONV	1	/* do floating point SP/DP conversion */
@@ -155,3 +156,4 @@ extern void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
 			     const void *mem);
 extern void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg,
 			      void *mem);
+extern int emulate_dcbz(unsigned long ea, struct pt_regs *regs);
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 817cdc9..fa20f3a 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -780,6 +780,30 @@ static nokprobe_inline int do_vsx_store(struct instruction_op *op,
 }
 #endif /* CONFIG_VSX */
 
+int emulate_dcbz(unsigned long ea, struct pt_regs *regs)
+{
+	int err;
+	unsigned long i, size;
+
+#ifdef __powerpc64__
+	size = ppc64_caches.l1d.block_size;
+	if (!(regs->msr & MSR_64BIT))
+		ea &= 0xffffffffUL;
+#else
+	size = L1_CACHE_BYTES;
+#endif
+	ea &= ~(size - 1);
+	if (!address_ok(regs, ea, size))
+		return -EFAULT;
+	for (i = 0; i < size; i += sizeof(long)) {
+		err = __put_user(0, (unsigned long __user *) (ea + i));
+		if (err)
+			return err;
+	}
+	return 0;
+}
+NOKPROBE_SYMBOL(emulate_dcbz);
+
 #define __put_user_asmx(x, addr, err, op, cr)		\
 	__asm__ __volatile__(				\
 		"1:	" op " %2,0,%3\n"		\
@@ -1748,6 +1772,11 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 			op->type = MKOP(CACHEOP, ICBI, 0);
 			op->ea = xform_ea(instr, regs);
 			return 0;
+
+		case 1014:	/* dcbz */
+			op->type = MKOP(CACHEOP, DCBZ, 0);
+			op->ea = xform_ea(instr, regs);
+			return 0;
 		}
 		break;
 	}
@@ -2607,6 +2636,9 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 		case ICBI:
 			__cacheop_user_asmx(ea, err, "icbi");
 			break;
+		case DCBZ:
+			err = emulate_dcbz(ea, regs);
+			break;
 		}
 		if (err)
 			return 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 14/17] powerpc: Set regs->dar if memory access fails in emulate_step()
  2017-08-30  4:12 [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Paul Mackerras
                   ` (12 preceding siblings ...)
  2017-08-30  4:12 ` [PATCH v3 13/17] powerpc: Emulate the dcbz instruction Paul Mackerras
@ 2017-08-30  4:12 ` Paul Mackerras
  2017-08-30  4:12 ` [PATCH v3 15/17] powerpc: Handle opposite-endian processes in emulation code Paul Mackerras
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Paul Mackerras @ 2017-08-30  4:12 UTC (permalink / raw)
  To: linuxppc-dev

This adds code to the instruction emulation infrastructure to set
regs->dar to the address of any memory access that fails.  This
address is
not necessarily the same as the effective address of the instruction,
because if the memory access is unaligned, it might cross a page
boundary and fault on the second page.
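
A small worked example (4 KiB pages assumed; the numbers are chosen
for illustration only): an 8-byte access at ea 0x0ffa spans two
pages, so if the second page is unmapped the faulting address is
0x1000, not 0x0ffa.

    #include <stdio.h>

    /* Sketch of why the faulting address can differ from the EA. */
    int main(void)
    {
            unsigned long ea = 0x0ffa, size = 8, page = 0x1000;
            unsigned long last = ea + size - 1;        /* last byte touched */
            unsigned long fault = last & ~(page - 1);  /* start of 2nd page */

            printf("ea=%#lx fault=%#lx\n", ea, fault); /* 0xffa, 0x1000 */
            return 0;
    }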

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/lib/sstep.c | 74 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 52 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index fa20f3a..5c0f50b 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -103,11 +103,19 @@ static nokprobe_inline int branch_taken(unsigned int instr,
 	return 1;
 }
 
-static nokprobe_inline long address_ok(struct pt_regs *regs, unsigned long ea, int nb)
+static nokprobe_inline long address_ok(struct pt_regs *regs,
+				       unsigned long ea, int nb)
 {
 	if (!user_mode(regs))
 		return 1;
-	return __access_ok(ea, nb, USER_DS);
+	if (__access_ok(ea, nb, USER_DS))
+		return 1;
+	if (__access_ok(ea, 1, USER_DS))
+		/* Access overlaps the end of the user region */
+		regs->dar = USER_DS.seg;
+	else
+		regs->dar = ea;
+	return 0;
 }
 
 /*
@@ -210,7 +218,8 @@ static nokprobe_inline unsigned long byterev_8(unsigned long x)
 #endif
 
 static nokprobe_inline int read_mem_aligned(unsigned long *dest,
-					unsigned long ea, int nb)
+					    unsigned long ea, int nb,
+					    struct pt_regs *regs)
 {
 	int err = 0;
 	unsigned long x = 0;
@@ -233,6 +242,8 @@ static nokprobe_inline int read_mem_aligned(unsigned long *dest,
 	}
 	if (!err)
 		*dest = x;
+	else
+		regs->dar = ea;
 	return err;
 }
 
@@ -240,7 +251,8 @@ static nokprobe_inline int read_mem_aligned(unsigned long *dest,
  * Copy from userspace to a buffer, using the largest possible
  * aligned accesses, up to sizeof(long).
  */
-static int nokprobe_inline copy_mem_in(u8 *dest, unsigned long ea, int nb)
+static int nokprobe_inline copy_mem_in(u8 *dest, unsigned long ea, int nb,
+				       struct pt_regs *regs)
 {
 	int err = 0;
 	int c;
@@ -268,8 +280,10 @@ static int nokprobe_inline copy_mem_in(u8 *dest, unsigned long ea, int nb)
 			break;
 #endif
 		}
-		if (err)
+		if (err) {
+			regs->dar = ea;
 			return err;
+		}
 		dest += c;
 		ea += c;
 	}
@@ -289,7 +303,7 @@ static nokprobe_inline int read_mem_unaligned(unsigned long *dest,
 
 	u.ul = 0;
 	i = IS_BE ? sizeof(unsigned long) - nb : 0;
-	err = copy_mem_in(&u.b[i], ea, nb);
+	err = copy_mem_in(&u.b[i], ea, nb, regs);
 	if (!err)
 		*dest = u.ul;
 	return err;
@@ -306,13 +320,14 @@ static int read_mem(unsigned long *dest, unsigned long ea, int nb,
 	if (!address_ok(regs, ea, nb))
 		return -EFAULT;
 	if ((ea & (nb - 1)) == 0)
-		return read_mem_aligned(dest, ea, nb);
+		return read_mem_aligned(dest, ea, nb, regs);
 	return read_mem_unaligned(dest, ea, nb, regs);
 }
 NOKPROBE_SYMBOL(read_mem);
 
 static nokprobe_inline int write_mem_aligned(unsigned long val,
-					unsigned long ea, int nb)
+					     unsigned long ea, int nb,
+					     struct pt_regs *regs)
 {
 	int err = 0;
 
@@ -332,6 +347,8 @@ static nokprobe_inline int write_mem_aligned(unsigned long val,
 		break;
 #endif
 	}
+	if (err)
+		regs->dar = ea;
 	return err;
 }
 
@@ -339,7 +356,8 @@ static nokprobe_inline int write_mem_aligned(unsigned long val,
  * Copy from a buffer to userspace, using the largest possible
  * aligned accesses, up to sizeof(long).
  */
-static int nokprobe_inline copy_mem_out(u8 *dest, unsigned long ea, int nb)
+static int nokprobe_inline copy_mem_out(u8 *dest, unsigned long ea, int nb,
+					struct pt_regs *regs)
 {
 	int err = 0;
 	int c;
@@ -367,8 +385,10 @@ static int nokprobe_inline copy_mem_out(u8 *dest, unsigned long ea, int nb)
 			break;
 #endif
 		}
-		if (err)
+		if (err) {
+			regs->dar = ea;
 			return err;
+		}
 		dest += c;
 		ea += c;
 	}
@@ -387,7 +407,7 @@ static nokprobe_inline int write_mem_unaligned(unsigned long val,
 
 	u.ul = val;
 	i = IS_BE ? sizeof(unsigned long) - nb : 0;
-	return copy_mem_out(&u.b[i], ea, nb);
+	return copy_mem_out(&u.b[i], ea, nb, regs);
 }
 
 /*
@@ -400,7 +420,7 @@ static int write_mem(unsigned long val, unsigned long ea, int nb,
 	if (!address_ok(regs, ea, nb))
 		return -EFAULT;
 	if ((ea & (nb - 1)) == 0)
-		return write_mem_aligned(val, ea, nb);
+		return write_mem_aligned(val, ea, nb, regs);
 	return write_mem_unaligned(val, ea, nb, regs);
 }
 NOKPROBE_SYMBOL(write_mem);
@@ -422,7 +442,7 @@ static int do_fp_load(int rn, unsigned long ea, int nb, struct pt_regs *regs)
 
 	if (!address_ok(regs, ea, nb))
 		return -EFAULT;
-	err = copy_mem_in(u.b, ea, nb);
+	err = copy_mem_in(u.b, ea, nb, regs);
 	if (err)
 		return err;
 	preempt_disable();
@@ -471,7 +491,7 @@ static int do_fp_store(int rn, unsigned long ea, int nb, struct pt_regs *regs)
 			u.l[1] = current->thread.TS_FPR(rn);
 	}
 	preempt_enable();
-	return copy_mem_out(u.b, ea, nb);
+	return copy_mem_out(u.b, ea, nb, regs);
 }
 NOKPROBE_SYMBOL(do_fp_store);
 #endif
@@ -491,7 +511,7 @@ static nokprobe_inline int do_vec_load(int rn, unsigned long ea,
 		return -EFAULT;
 	/* align to multiple of size */
 	ea &= ~(size - 1);
-	err = copy_mem_in(&u.b[ea & 0xf], ea, size);
+	err = copy_mem_in(&u.b[ea & 0xf], ea, size, regs);
 	if (err)
 		return err;
 
@@ -523,7 +543,7 @@ static nokprobe_inline int do_vec_store(int rn, unsigned long ea,
 	else
 		u.v = current->thread.vr_state.vr[rn];
 	preempt_enable();
-	return copy_mem_out(&u.b[ea & 0xf], ea, size);
+	return copy_mem_out(&u.b[ea & 0xf], ea, size, regs);
 }
 #endif /* CONFIG_ALTIVEC */
 
@@ -725,7 +745,7 @@ static nokprobe_inline int do_vsx_load(struct instruction_op *op,
 	union vsx_reg buf;
 	int size = GETSIZE(op->type);
 
-	if (!address_ok(regs, ea, size) || copy_mem_in(mem, ea, size))
+	if (!address_ok(regs, ea, size) || copy_mem_in(mem, ea, size, regs))
 		return -EFAULT;
 
 	emulate_vsx_load(op, &buf, mem);
@@ -776,7 +796,7 @@ static nokprobe_inline int do_vsx_store(struct instruction_op *op,
 	}
 	preempt_enable();
 	emulate_vsx_store(op, &buf, mem);
-	return  copy_mem_out(mem, ea, size);
+	return  copy_mem_out(mem, ea, size, regs);
 }
 #endif /* CONFIG_VSX */
 
@@ -797,8 +817,10 @@ int emulate_dcbz(unsigned long ea, struct pt_regs *regs)
 		return -EFAULT;
 	for (i = 0; i < size; i += sizeof(long)) {
 		err = __put_user(0, (unsigned long __user *) (ea + i));
-		if (err)
+		if (err) {
+			regs->dar = ea;
 			return err;
+		}
 	}
 	return 0;
 }
@@ -2640,8 +2662,10 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 			err = emulate_dcbz(ea, regs);
 			break;
 		}
-		if (err)
+		if (err) {
+			regs->dar = ea;
 			return 0;
+		}
 		goto instr_done;
 
 	case LARX:
@@ -2668,12 +2692,16 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 			break;
 		case 16:
 			err = do_lqarx(ea, &regs->gpr[op.reg]);
-			goto ldst_done;
+			break;
 #endif
 		default:
 			return 0;
 		}
-		if (!err)
+		if (err) {
+			regs->dar = ea;
+			return 0;
+		}
+		if (size < 16)
 			regs->gpr[op.reg] = val;
 		goto ldst_done;
 
@@ -2711,6 +2739,8 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 			regs->ccr = (regs->ccr & 0x0fffffff) |
 				(cr & 0xe0000000) |
 				((regs->xer >> 3) & 0x10000000);
+		else
+			regs->dar = ea;
 		goto ldst_done;
 
 	case LOAD:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 15/17] powerpc: Handle opposite-endian processes in emulation code
  2017-08-30  4:12 [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Paul Mackerras
                   ` (13 preceding siblings ...)
  2017-08-30  4:12 ` [PATCH v3 14/17] powerpc: Set regs->dar if memory access fails in emulate_step() Paul Mackerras
@ 2017-08-30  4:12 ` Paul Mackerras
  2017-08-30  4:12 ` [PATCH v3 16/17] powerpc: Separate out load/store emulation into its own function Paul Mackerras
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Paul Mackerras @ 2017-08-30  4:12 UTC (permalink / raw)
  To: linuxppc-dev

This adds code to the load and store emulation paths to byte-swap
the data appropriately when the process being emulated runs with
the opposite endianness to that of the kernel.
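
As a sketch of what "byte-swap the data appropriately" means for a
4-byte load (plain C with illustrative names, not the kernel code):

    #include <stdint.h>

    static uint32_t byterev_4_sketch(uint32_t x)
    {
            return (x << 24) | ((x & 0xff00) << 8) |
                   ((x >> 8) & 0xff00) | (x >> 24);
    }

    /* If the emulated process's MSR_LE differs from the kernel's, the
     * value read from memory is reversed before it reaches the register. */
    static uint32_t load_w_sketch(uint32_t mem_val, int cross_endian)
    {
            return cross_endian ? byterev_4_sketch(mem_val) : mem_val;
    }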

This also enables the emulation for the multiple-register loads
and stores (lmw, stmw, lswi, stswi, lswx, stswx) to work for
little-endian.  In little-endian mode, the partial word at the
end of a transfer for lsw*/stsw* (when the byte count is not a
multiple of 4) is loaded/stored at the least-significant end of
the register.  Additionally, this fixes a bug in the previous
code, which could call read_mem/write_mem with a byte count that
was not 1, 2, 4 or 8.

Note that this only works correctly on processors with "true"
little-endian mode, such as IBM POWER processors from POWER6 on, not
the so-called "PowerPC" little-endian mode that uses address swizzling
as implemented on the old 32-bit 603, 604, 740/750, 74xx CPUs.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/include/asm/sstep.h |   7 +-
 arch/powerpc/lib/sstep.c         | 184 +++++++++++++++++++++++++++------------
 2 files changed, 131 insertions(+), 60 deletions(-)

diff --git a/arch/powerpc/include/asm/sstep.h b/arch/powerpc/include/asm/sstep.h
index 793639a..958c2c5 100644
--- a/arch/powerpc/include/asm/sstep.h
+++ b/arch/powerpc/include/asm/sstep.h
@@ -153,7 +153,8 @@ void emulate_update_regs(struct pt_regs *reg, struct instruction_op *op);
 extern int emulate_step(struct pt_regs *regs, unsigned int instr);
 
 extern void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
-			     const void *mem);
-extern void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg,
-			      void *mem);
+			     const void *mem, bool cross_endian);
+extern void emulate_vsx_store(struct instruction_op *op,
+			      const union vsx_reg *reg, void *mem,
+			      bool cross_endian);
 extern int emulate_dcbz(unsigned long ea, struct pt_regs *regs);
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 5c0f50b..810b5f2 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -217,6 +217,33 @@ static nokprobe_inline unsigned long byterev_8(unsigned long x)
 }
 #endif
 
+static nokprobe_inline void do_byte_reverse(void *ptr, int nb)
+{
+	switch (nb) {
+	case 2:
+		*(u16 *)ptr = byterev_2(*(u16 *)ptr);
+		break;
+	case 4:
+		*(u32 *)ptr = byterev_4(*(u32 *)ptr);
+		break;
+#ifdef __powerpc64__
+	case 8:
+		*(unsigned long *)ptr = byterev_8(*(unsigned long *)ptr);
+		break;
+	case 16: {
+		unsigned long *up = (unsigned long *)ptr;
+		unsigned long tmp;
+		tmp = byterev_8(up[0]);
+		up[0] = byterev_8(up[1]);
+		up[1] = tmp;
+		break;
+	}
+#endif
+	default:
+		WARN_ON_ONCE(1);
+	}
+}
+
 static nokprobe_inline int read_mem_aligned(unsigned long *dest,
 					    unsigned long ea, int nb,
 					    struct pt_regs *regs)
@@ -430,7 +457,8 @@ NOKPROBE_SYMBOL(write_mem);
  * These access either the real FP register or the image in the
  * thread_struct, depending on regs->msr & MSR_FP.
  */
-static int do_fp_load(int rn, unsigned long ea, int nb, struct pt_regs *regs)
+static int do_fp_load(int rn, unsigned long ea, int nb, struct pt_regs *regs,
+		      bool cross_endian)
 {
 	int err;
 	union {
@@ -445,6 +473,11 @@ static int do_fp_load(int rn, unsigned long ea, int nb, struct pt_regs *regs)
 	err = copy_mem_in(u.b, ea, nb, regs);
 	if (err)
 		return err;
+	if (unlikely(cross_endian)) {
+		do_byte_reverse(u.b, min(nb, 8));
+		if (nb == 16)
+			do_byte_reverse(&u.b[8], 8);
+	}
 	preempt_disable();
 	if (nb == 4)
 		conv_sp_to_dp(&u.f, &u.d[0]);
@@ -465,7 +498,8 @@ static int do_fp_load(int rn, unsigned long ea, int nb, struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(do_fp_load);
 
-static int do_fp_store(int rn, unsigned long ea, int nb, struct pt_regs *regs)
+static int do_fp_store(int rn, unsigned long ea, int nb, struct pt_regs *regs,
+		       bool cross_endian)
 {
 	union {
 		float f;
@@ -491,6 +525,11 @@ static int do_fp_store(int rn, unsigned long ea, int nb, struct pt_regs *regs)
 			u.l[1] = current->thread.TS_FPR(rn);
 	}
 	preempt_enable();
+	if (unlikely(cross_endian)) {
+		do_byte_reverse(u.b, min(nb, 8));
+		if (nb == 16)
+			do_byte_reverse(&u.b[8], 8);
+	}
 	return copy_mem_out(u.b, ea, nb, regs);
 }
 NOKPROBE_SYMBOL(do_fp_store);
@@ -499,7 +538,8 @@ NOKPROBE_SYMBOL(do_fp_store);
 #ifdef CONFIG_ALTIVEC
 /* For Altivec/VMX, no need to worry about alignment */
 static nokprobe_inline int do_vec_load(int rn, unsigned long ea,
-				       int size, struct pt_regs *regs)
+				       int size, struct pt_regs *regs,
+				       bool cross_endian)
 {
 	int err;
 	union {
@@ -514,7 +554,8 @@ static nokprobe_inline int do_vec_load(int rn, unsigned long ea,
 	err = copy_mem_in(&u.b[ea & 0xf], ea, size, regs);
 	if (err)
 		return err;
-
+	if (unlikely(cross_endian))
+		do_byte_reverse(&u.b[ea & 0xf], size);
 	preempt_disable();
 	if (regs->msr & MSR_VEC)
 		put_vr(rn, &u.v);
@@ -525,7 +566,8 @@ static nokprobe_inline int do_vec_load(int rn, unsigned long ea,
 }
 
 static nokprobe_inline int do_vec_store(int rn, unsigned long ea,
-					int size, struct pt_regs *regs)
+					int size, struct pt_regs *regs,
+					bool cross_endian)
 {
 	union {
 		__vector128 v;
@@ -543,49 +585,60 @@ static nokprobe_inline int do_vec_store(int rn, unsigned long ea,
 	else
 		u.v = current->thread.vr_state.vr[rn];
 	preempt_enable();
+	if (unlikely(cross_endian))
+		do_byte_reverse(&u.b[ea & 0xf], size);
 	return copy_mem_out(&u.b[ea & 0xf], ea, size, regs);
 }
 #endif /* CONFIG_ALTIVEC */
 
 #ifdef __powerpc64__
 static nokprobe_inline int emulate_lq(struct pt_regs *regs, unsigned long ea,
-				      int reg)
+				      int reg, bool cross_endian)
 {
 	int err;
 
 	if (!address_ok(regs, ea, 16))
 		return -EFAULT;
 	/* if aligned, should be atomic */
-	if ((ea & 0xf) == 0)
-		return do_lq(ea, &regs->gpr[reg]);
-
-	err = read_mem(&regs->gpr[reg + IS_LE], ea, 8, regs);
-	if (!err)
-		err = read_mem(&regs->gpr[reg + IS_BE], ea + 8, 8, regs);
+	if ((ea & 0xf) == 0) {
+		err = do_lq(ea, &regs->gpr[reg]);
+	} else {
+		err = read_mem(&regs->gpr[reg + IS_LE], ea, 8, regs);
+		if (!err)
+			err = read_mem(&regs->gpr[reg + IS_BE], ea + 8, 8, regs);
+	}
+	if (!err && unlikely(cross_endian))
+		do_byte_reverse(&regs->gpr[reg], 16);
 	return err;
 }
 
 static nokprobe_inline int emulate_stq(struct pt_regs *regs, unsigned long ea,
-				       int reg)
+				       int reg, bool cross_endian)
 {
 	int err;
+	unsigned long vals[2];
 
 	if (!address_ok(regs, ea, 16))
 		return -EFAULT;
+	vals[0] = regs->gpr[reg];
+	vals[1] = regs->gpr[reg + 1];
+	if (unlikely(cross_endian))
+		do_byte_reverse(vals, 16);
+
 	/* if aligned, should be atomic */
 	if ((ea & 0xf) == 0)
-		return do_stq(ea, regs->gpr[reg], regs->gpr[reg + 1]);
+		return do_stq(ea, vals[0], vals[1]);
 
-	err = write_mem(regs->gpr[reg + IS_LE], ea, 8, regs);
+	err = write_mem(vals[IS_LE], ea, 8, regs);
 	if (!err)
-		err = write_mem(regs->gpr[reg + IS_BE], ea + 8, 8, regs);
+		err = write_mem(vals[IS_BE], ea + 8, 8, regs);
 	return err;
 }
 #endif /* __powerpc64__ */
 
 #ifdef CONFIG_VSX
 void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
-		      const void *mem)
+		      const void *mem, bool rev)
 {
 	int size, read_size;
 	int i, j;
@@ -602,19 +655,18 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
 		if (size == 0)
 			break;
 		memcpy(reg, mem, size);
-		if (IS_LE && (op->vsx_flags & VSX_LDLEFT)) {
-			/* reverse 16 bytes */
-			unsigned long tmp;
-			tmp = byterev_8(reg->d[0]);
-			reg->d[0] = byterev_8(reg->d[1]);
-			reg->d[1] = tmp;
-		}
+		if (IS_LE && (op->vsx_flags & VSX_LDLEFT))
+			rev = !rev;
+		if (rev)
+			do_byte_reverse(reg, 16);
 		break;
 	case 8:
 		/* scalar loads, lxvd2x, lxvdsx */
 		read_size = (size >= 8) ? 8 : size;
 		i = IS_LE ? 8 : 8 - read_size;
 		memcpy(&reg->b[i], mem, read_size);
+		if (rev)
+			do_byte_reverse(&reg->b[i], 8);
 		if (size < 8) {
 			if (op->type & SIGNEXT) {
 				/* size == 4 is the only case here */
@@ -626,9 +678,10 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
 				preempt_enable();
 			}
 		} else {
-			if (size == 16)
-				reg->d[IS_BE] = *(unsigned long *)(mem + 8);
-			else if (op->vsx_flags & VSX_SPLAT)
+			if (size == 16) {
+				unsigned long v = *(unsigned long *)(mem + 8);
+				reg->d[IS_BE] = !rev ? v : byterev_8(v);
+			} else if (op->vsx_flags & VSX_SPLAT)
 				reg->d[IS_BE] = reg->d[IS_LE];
 		}
 		break;
@@ -637,7 +690,7 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
 		wp = mem;
 		for (j = 0; j < size / 4; ++j) {
 			i = IS_LE ? 3 - j : j;
-			reg->w[i] = *wp++;
+			reg->w[i] = !rev ? *wp++ : byterev_4(*wp++);
 		}
 		if (op->vsx_flags & VSX_SPLAT) {
 			u32 val = reg->w[IS_LE ? 3 : 0];
@@ -652,7 +705,7 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
 		hp = mem;
 		for (j = 0; j < size / 2; ++j) {
 			i = IS_LE ? 7 - j : j;
-			reg->h[i] = *hp++;
+			reg->h[i] = !rev ? *hp++ : byterev_2(*hp++);
 		}
 		break;
 	case 1:
@@ -669,7 +722,7 @@ EXPORT_SYMBOL_GPL(emulate_vsx_load);
 NOKPROBE_SYMBOL(emulate_vsx_load);
 
 void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg,
-		       void *mem)
+		       void *mem, bool rev)
 {
 	int size, write_size;
 	int i, j;
@@ -685,7 +738,9 @@ void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg,
 		/* stxv, stxvx, stxvl, stxvll */
 		if (size == 0)
 			break;
-		if (IS_LE && (op->vsx_flags & VSX_LDLEFT)) {
+		if (IS_LE && (op->vsx_flags & VSX_LDLEFT))
+			rev = !rev;
+		if (rev) {
 			/* reverse 16 bytes */
 			buf.d[0] = byterev_8(reg->d[1]);
 			buf.d[1] = byterev_8(reg->d[0]);
@@ -707,13 +762,18 @@ void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg,
 		memcpy(mem, &reg->b[i], write_size);
 		if (size == 16)
 			memcpy(mem + 8, &reg->d[IS_BE], 8);
+		if (unlikely(rev)) {
+			do_byte_reverse(mem, write_size);
+			if (size == 16)
+				do_byte_reverse(mem + 8, 8);
+		}
 		break;
 	case 4:
 		/* stxvw4x */
 		wp = mem;
 		for (j = 0; j < size / 4; ++j) {
 			i = IS_LE ? 3 - j : j;
-			*wp++ = reg->w[i];
+			*wp++ = !rev ? reg->w[i] : byterev_4(reg->w[i]);
 		}
 		break;
 	case 2:
@@ -721,7 +781,7 @@ void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg,
 		hp = mem;
 		for (j = 0; j < size / 2; ++j) {
 			i = IS_LE ? 7 - j : j;
-			*hp++ = reg->h[i];
+			*hp++ = !rev ? reg->h[i] : byterev_2(reg->h[i]);
 		}
 		break;
 	case 1:
@@ -738,7 +798,8 @@ EXPORT_SYMBOL_GPL(emulate_vsx_store);
 NOKPROBE_SYMBOL(emulate_vsx_store);
 
 static nokprobe_inline int do_vsx_load(struct instruction_op *op,
-				       unsigned long ea, struct pt_regs *regs)
+				       unsigned long ea, struct pt_regs *regs,
+				       bool cross_endian)
 {
 	int reg = op->reg;
 	u8 mem[16];
@@ -748,7 +809,7 @@ static nokprobe_inline int do_vsx_load(struct instruction_op *op,
 	if (!address_ok(regs, ea, size) || copy_mem_in(mem, ea, size, regs))
 		return -EFAULT;
 
-	emulate_vsx_load(op, &buf, mem);
+	emulate_vsx_load(op, &buf, mem, cross_endian);
 	preempt_disable();
 	if (reg < 32) {
 		/* FP regs + extensions */
@@ -769,7 +830,8 @@ static nokprobe_inline int do_vsx_load(struct instruction_op *op,
 }
 
 static nokprobe_inline int do_vsx_store(struct instruction_op *op,
-					unsigned long ea, struct pt_regs *regs)
+					unsigned long ea, struct pt_regs *regs,
+					bool cross_endian)
 {
 	int reg = op->reg;
 	u8 mem[16];
@@ -795,7 +857,7 @@ static nokprobe_inline int do_vsx_store(struct instruction_op *op,
 			buf.v = current->thread.vr_state.vr[reg - 32];
 	}
 	preempt_enable();
-	emulate_vsx_store(op, &buf, mem);
+	emulate_vsx_store(op, &buf, mem, cross_endian);
 	return  copy_mem_out(mem, ea, size, regs);
 }
 #endif /* CONFIG_VSX */
@@ -2619,6 +2681,7 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 	unsigned int cr;
 	int i, rd, nb;
 	unsigned long ea;
+	bool cross_endian;
 
 	r = analyse_instr(&op, regs, instr);
 	if (r < 0)
@@ -2631,6 +2694,7 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 	err = 0;
 	size = GETSIZE(op.type);
 	type = op.type & INSTR_TYPE_MASK;
+	cross_endian = (regs->msr & MSR_LE) != (MSR_KERNEL & MSR_LE);
 
 	ea = op.ea;
 	if (OP_IS_LOAD_STORE(type) || type == CACHEOP)
@@ -2746,7 +2810,7 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 	case LOAD:
 #ifdef __powerpc64__
 		if (size == 16) {
-			err = emulate_lq(regs, ea, op.reg);
+			err = emulate_lq(regs, ea, op.reg, cross_endian);
 			goto ldst_done;
 		}
 #endif
@@ -2754,7 +2818,7 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 		if (!err) {
 			if (op.type & SIGNEXT)
 				do_signext(&regs->gpr[op.reg], size);
-			if (op.type & BYTEREV)
+			if ((op.type & BYTEREV) == (cross_endian ? 0 : BYTEREV))
 				do_byterev(&regs->gpr[op.reg], size);
 		}
 		goto ldst_done;
@@ -2769,14 +2833,14 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 		 */
 		if (!(regs->msr & MSR_PR) && !(regs->msr & MSR_FP))
 			return 0;
-		err = do_fp_load(op.reg, ea, size, regs);
+		err = do_fp_load(op.reg, ea, size, regs, cross_endian);
 		goto ldst_done;
 #endif
 #ifdef CONFIG_ALTIVEC
 	case LOAD_VMX:
 		if (!(regs->msr & MSR_PR) && !(regs->msr & MSR_VEC))
 			return 0;
-		err = do_vec_load(op.reg, ea, size, regs);
+		err = do_vec_load(op.reg, ea, size, regs, cross_endian);
 		goto ldst_done;
 #endif
 #ifdef CONFIG_VSX
@@ -2791,23 +2855,26 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 			msrbit = MSR_VEC;
 		if (!(regs->msr & MSR_PR) && !(regs->msr & msrbit))
 			return 0;
-		err = do_vsx_load(&op, ea, regs);
+		err = do_vsx_load(&op, ea, regs, cross_endian);
 		goto ldst_done;
 	}
 #endif
 	case LOAD_MULTI:
-		if (regs->msr & MSR_LE)
-			return 0;
+		if (!address_ok(regs, ea, size))
+			return -EFAULT;
 		rd = op.reg;
 		for (i = 0; i < size; i += 4) {
+			unsigned int v32 = 0;
+
 			nb = size - i;
 			if (nb > 4)
 				nb = 4;
-			err = read_mem(&regs->gpr[rd], ea, nb, regs);
+			err = copy_mem_in((u8 *) &v32, ea, nb, regs);
 			if (err)
 				return 0;
-			if (nb < 4)	/* left-justify last bytes */
-				regs->gpr[rd] <<= 32 - 8 * nb;
+			if (unlikely(cross_endian))
+				v32 = byterev_4(v32);
+			regs->gpr[rd] = v32;
 			ea += 4;
 			++rd;
 		}
@@ -2816,7 +2883,7 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 	case STORE:
 #ifdef __powerpc64__
 		if (size == 16) {
-			err = emulate_stq(regs, ea, op.reg);
+			err = emulate_stq(regs, ea, op.reg, cross_endian);
 			goto ldst_done;
 		}
 #endif
@@ -2827,6 +2894,8 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 			err = handle_stack_update(ea, regs);
 			goto ldst_done;
 		}
+		if (unlikely(cross_endian))
+			do_byterev(&op.val, size);
 		err = write_mem(op.val, ea, size, regs);
 		goto ldst_done;
 
@@ -2834,14 +2903,14 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 	case STORE_FP:
 		if (!(regs->msr & MSR_PR) && !(regs->msr & MSR_FP))
 			return 0;
-		err = do_fp_store(op.reg, ea, size, regs);
+		err = do_fp_store(op.reg, ea, size, regs, cross_endian);
 		goto ldst_done;
 #endif
 #ifdef CONFIG_ALTIVEC
 	case STORE_VMX:
 		if (!(regs->msr & MSR_PR) && !(regs->msr & MSR_VEC))
 			return 0;
-		err = do_vec_store(op.reg, ea, size, regs);
+		err = do_vec_store(op.reg, ea, size, regs, cross_endian);
 		goto ldst_done;
 #endif
 #ifdef CONFIG_VSX
@@ -2856,22 +2925,23 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 			msrbit = MSR_VEC;
 		if (!(regs->msr & MSR_PR) && !(regs->msr & msrbit))
 			return 0;
-		err = do_vsx_store(&op, ea, regs);
+		err = do_vsx_store(&op, ea, regs, cross_endian);
 		goto ldst_done;
 	}
 #endif
 	case STORE_MULTI:
-		if (regs->msr & MSR_LE)
-			return 0;
+		if (!address_ok(regs, ea, size))
+			return -EFAULT;
 		rd = op.reg;
 		for (i = 0; i < size; i += 4) {
-			val = regs->gpr[rd];
+			unsigned int v32 = regs->gpr[rd];
+
 			nb = size - i;
 			if (nb > 4)
 				nb = 4;
-			else
-				val >>= 32 - 8 * nb;
-			err = write_mem(val, ea, nb, regs);
+			if (unlikely(cross_endian))
+				v32 = byterev_4(v32);
+			err = copy_mem_out((u8 *) &v32, ea, nb, regs);
 			if (err)
 				return 0;
 			ea += 4;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 16/17] powerpc: Separate out load/store emulation into its own function
  2017-08-30  4:12 [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Paul Mackerras
                   ` (14 preceding siblings ...)
  2017-08-30  4:12 ` [PATCH v3 15/17] powerpc: Handle opposite-endian processes in emulation code Paul Mackerras
@ 2017-08-30  4:12 ` Paul Mackerras
  2017-08-30  4:12 ` [PATCH v3 17/17] powerpc: Use instruction emulation infrastructure to handle alignment faults Paul Mackerras
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Paul Mackerras @ 2017-08-30  4:12 UTC (permalink / raw)
  To: linuxppc-dev

This moves the parts of emulate_step() that emulate load and store
instructions into a new function called emulate_loadstore(), so
that the alignment handler can reuse this code.
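
emulate_loadstore() reports its outcome through its return value
(see the comment block in the diff below).  Here is a hedged sketch
of how a caller might dispatch on it; the action strings are made
up, only the error values come from the patch:

    #include <errno.h>
    #include <stdio.h>

    /* Sketch: map emulate_loadstore()'s return values to actions. */
    static const char *loadstore_result(int err)
    {
            switch (err) {
            case 0:         return "emulated; advance regs->nip";
            case -EFAULT:   return "bad address; regs->dar holds it";
            case -EACCES:   return "misaligned; alignment required";
            case -EINVAL:   return "unknown operation";
            default:        return "unexpected";
            }
    }

    int main(void)
    {
            printf("%s\n", loadstore_result(-EACCES));
            return 0;
    }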

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/include/asm/sstep.h |   9 ++
 arch/powerpc/lib/sstep.c         | 258 ++++++++++++++++++++++-----------------
 2 files changed, 154 insertions(+), 113 deletions(-)

diff --git a/arch/powerpc/include/asm/sstep.h b/arch/powerpc/include/asm/sstep.h
index 958c2c5..309d1c5 100644
--- a/arch/powerpc/include/asm/sstep.h
+++ b/arch/powerpc/include/asm/sstep.h
@@ -152,6 +152,15 @@ void emulate_update_regs(struct pt_regs *reg, struct instruction_op *op);
  */
 extern int emulate_step(struct pt_regs *regs, unsigned int instr);
 
+/*
+ * Emulate a load or store instruction by reading/writing the
+ * memory of the current process.  FP/VMX/VSX registers are assumed
+ * to hold live values if the appropriate enable bit in regs->msr is
+ * set; otherwise this will use the saved values in the thread struct
+ * for user-mode accesses.
+ */
+extern int emulate_loadstore(struct pt_regs *regs, struct instruction_op *op);
+
 extern void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
 			     const void *mem, bool cross_endian);
 extern void emulate_vsx_store(struct instruction_op *op,
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 810b5f2..24031ca 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -2667,76 +2667,35 @@ void emulate_update_regs(struct pt_regs *regs, struct instruction_op *op)
 }
 
 /*
- * Emulate instructions that cause a transfer of control,
- * loads and stores, and a few other instructions.
- * Returns 1 if the step was emulated, 0 if not,
- * or -1 if the instruction is one that should not be stepped,
- * such as an rfid, or a mtmsrd that would clear MSR_RI.
+ * Emulate a previously-analysed load or store instruction.
+ * Return values are:
+ * 0 = instruction emulated successfully
+ * -EFAULT = address out of range or access faulted (regs->dar
+ *	     contains the faulting address)
+ * -EACCES = misaligned access, instruction requires alignment
+ * -EINVAL = unknown operation in *op
  */
-int emulate_step(struct pt_regs *regs, unsigned int instr)
+int emulate_loadstore(struct pt_regs *regs, struct instruction_op *op)
 {
-	struct instruction_op op;
-	int r, err, size, type;
-	unsigned long val;
-	unsigned int cr;
+	int err, size, type;
 	int i, rd, nb;
+	unsigned int cr;
+	unsigned long val;
 	unsigned long ea;
 	bool cross_endian;
 
-	r = analyse_instr(&op, regs, instr);
-	if (r < 0)
-		return r;
-	if (r > 0) {
-		emulate_update_regs(regs, &op);
-		return 0;
-	}
-
 	err = 0;
-	size = GETSIZE(op.type);
-	type = op.type & INSTR_TYPE_MASK;
+	size = GETSIZE(op->type);
+	type = op->type & INSTR_TYPE_MASK;
 	cross_endian = (regs->msr & MSR_LE) != (MSR_KERNEL & MSR_LE);
-
-	ea = op.ea;
-	if (OP_IS_LOAD_STORE(type) || type == CACHEOP)
-		ea = truncate_if_32bit(regs->msr, op.ea);
+	ea = truncate_if_32bit(regs->msr, op->ea);
 
 	switch (type) {
-	case CACHEOP:
-		if (!address_ok(regs, ea, 8))
-			return 0;
-		switch (op.type & CACHEOP_MASK) {
-		case DCBST:
-			__cacheop_user_asmx(ea, err, "dcbst");
-			break;
-		case DCBF:
-			__cacheop_user_asmx(ea, err, "dcbf");
-			break;
-		case DCBTST:
-			if (op.reg == 0)
-				prefetchw((void *) ea);
-			break;
-		case DCBT:
-			if (op.reg == 0)
-				prefetch((void *) ea);
-			break;
-		case ICBI:
-			__cacheop_user_asmx(ea, err, "icbi");
-			break;
-		case DCBZ:
-			err = emulate_dcbz(ea, regs);
-			break;
-		}
-		if (err) {
-			regs->dar = ea;
-			return 0;
-		}
-		goto instr_done;
-
 	case LARX:
 		if (ea & (size - 1))
-			break;		/* can't handle misaligned */
+			return -EACCES;		/* can't handle misaligned */
 		if (!address_ok(regs, ea, size))
-			return 0;
+			return -EFAULT;
 		err = 0;
 		switch (size) {
 #ifdef __powerpc64__
@@ -2755,49 +2714,49 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 			__get_user_asmx(val, ea, err, "ldarx");
 			break;
 		case 16:
-			err = do_lqarx(ea, &regs->gpr[op.reg]);
+			err = do_lqarx(ea, &regs->gpr[op->reg]);
 			break;
 #endif
 		default:
-			return 0;
+			return -EINVAL;
 		}
 		if (err) {
 			regs->dar = ea;
-			return 0;
+			break;
 		}
 		if (size < 16)
-			regs->gpr[op.reg] = val;
-		goto ldst_done;
+			regs->gpr[op->reg] = val;
+		break;
 
 	case STCX:
 		if (ea & (size - 1))
-			break;		/* can't handle misaligned */
+			return -EACCES;		/* can't handle misaligned */
 		if (!address_ok(regs, ea, size))
-			return 0;
+			return -EFAULT;
 		err = 0;
 		switch (size) {
 #ifdef __powerpc64__
 		case 1:
-			__put_user_asmx(op.val, ea, err, "stbcx.", cr);
+			__put_user_asmx(op->val, ea, err, "stbcx.", cr);
 			break;
 		case 2:
-			__put_user_asmx(op.val, ea, err, "stbcx.", cr);
+			__put_user_asmx(op->val, ea, err, "sthcx.", cr);
 			break;
 #endif
 		case 4:
-			__put_user_asmx(op.val, ea, err, "stwcx.", cr);
+			__put_user_asmx(op->val, ea, err, "stwcx.", cr);
 			break;
 #ifdef __powerpc64__
 		case 8:
-			__put_user_asmx(op.val, ea, err, "stdcx.", cr);
+			__put_user_asmx(op->val, ea, err, "stdcx.", cr);
 			break;
 		case 16:
-			err = do_stqcx(ea, regs->gpr[op.reg],
-				       regs->gpr[op.reg + 1], &cr);
+			err = do_stqcx(ea, regs->gpr[op->reg],
+				       regs->gpr[op->reg + 1], &cr);
 			break;
 #endif
 		default:
-			return 0;
+			return -EINVAL;
 		}
 		if (!err)
 			regs->ccr = (regs->ccr & 0x0fffffff) |
@@ -2805,23 +2764,23 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 				((regs->xer >> 3) & 0x10000000);
 		else
 			regs->dar = ea;
-		goto ldst_done;
+		break;
 
 	case LOAD:
 #ifdef __powerpc64__
 		if (size == 16) {
-			err = emulate_lq(regs, ea, op.reg, cross_endian);
-			goto ldst_done;
+			err = emulate_lq(regs, ea, op->reg, cross_endian);
+			break;
 		}
 #endif
-		err = read_mem(&regs->gpr[op.reg], ea, size, regs);
+		err = read_mem(&regs->gpr[op->reg], ea, size, regs);
 		if (!err) {
-			if (op.type & SIGNEXT)
-				do_signext(&regs->gpr[op.reg], size);
-			if ((op.type & BYTEREV) == (cross_endian ? 0 : BYTEREV))
-				do_byterev(&regs->gpr[op.reg], size);
+			if (op->type & SIGNEXT)
+				do_signext(&regs->gpr[op->reg], size);
+			if ((op->type & BYTEREV) == (cross_endian ? 0 : BYTEREV))
+				do_byterev(&regs->gpr[op->reg], size);
 		}
-		goto ldst_done;
+		break;
 
 #ifdef CONFIG_PPC_FPU
 	case LOAD_FP:
@@ -2833,15 +2792,15 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 		 */
 		if (!(regs->msr & MSR_PR) && !(regs->msr & MSR_FP))
 			return 0;
-		err = do_fp_load(op.reg, ea, size, regs, cross_endian);
-		goto ldst_done;
+		err = do_fp_load(op->reg, ea, size, regs, cross_endian);
+		break;
 #endif
 #ifdef CONFIG_ALTIVEC
 	case LOAD_VMX:
 		if (!(regs->msr & MSR_PR) && !(regs->msr & MSR_VEC))
 			return 0;
-		err = do_vec_load(op.reg, ea, size, regs, cross_endian);
-		goto ldst_done;
+		err = do_vec_load(op->reg, ea, size, regs, cross_endian);
+		break;
 #endif
 #ifdef CONFIG_VSX
 	case LOAD_VSX: {
@@ -2851,18 +2810,18 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 		 * Some VSX instructions check the MSR_VEC bit rather than MSR_VSX
 		 * when the target of the instruction is a vector register.
 		 */
-		if (op.reg >= 32 && (op.vsx_flags & VSX_CHECK_VEC))
+		if (op->reg >= 32 && (op->vsx_flags & VSX_CHECK_VEC))
 			msrbit = MSR_VEC;
 		if (!(regs->msr & MSR_PR) && !(regs->msr & msrbit))
 			return 0;
-		err = do_vsx_load(&op, ea, regs, cross_endian);
-		goto ldst_done;
+		err = do_vsx_load(op, ea, regs, cross_endian);
+		break;
 	}
 #endif
 	case LOAD_MULTI:
 		if (!address_ok(regs, ea, size))
 			return -EFAULT;
-		rd = op.reg;
+		rd = op->reg;
 		for (i = 0; i < size; i += 4) {
 			unsigned int v32 = 0;
 
@@ -2871,47 +2830,47 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 				nb = 4;
 			err = copy_mem_in((u8 *) &v32, ea, nb, regs);
 			if (err)
-				return 0;
+				break;
 			if (unlikely(cross_endian))
 				v32 = byterev_4(v32);
 			regs->gpr[rd] = v32;
 			ea += 4;
 			++rd;
 		}
-		goto instr_done;
+		break;
 
 	case STORE:
 #ifdef __powerpc64__
 		if (size == 16) {
-			err = emulate_stq(regs, ea, op.reg, cross_endian);
-			goto ldst_done;
+			err = emulate_stq(regs, ea, op->reg, cross_endian);
+			break;
 		}
 #endif
-		if ((op.type & UPDATE) && size == sizeof(long) &&
-		    op.reg == 1 && op.update_reg == 1 &&
+		if ((op->type & UPDATE) && size == sizeof(long) &&
+		    op->reg == 1 && op->update_reg == 1 &&
 		    !(regs->msr & MSR_PR) &&
 		    ea >= regs->gpr[1] - STACK_INT_FRAME_SIZE) {
 			err = handle_stack_update(ea, regs);
-			goto ldst_done;
+			break;
 		}
 		if (unlikely(cross_endian))
-			do_byterev(&op.val, size);
-		err = write_mem(op.val, ea, size, regs);
-		goto ldst_done;
+			do_byterev(&op->val, size);
+		err = write_mem(op->val, ea, size, regs);
+		break;
 
 #ifdef CONFIG_PPC_FPU
 	case STORE_FP:
 		if (!(regs->msr & MSR_PR) && !(regs->msr & MSR_FP))
 			return 0;
-		err = do_fp_store(op.reg, ea, size, regs, cross_endian);
-		goto ldst_done;
+		err = do_fp_store(op->reg, ea, size, regs, cross_endian);
+		break;
 #endif
 #ifdef CONFIG_ALTIVEC
 	case STORE_VMX:
 		if (!(regs->msr & MSR_PR) && !(regs->msr & MSR_VEC))
 			return 0;
-		err = do_vec_store(op.reg, ea, size, regs, cross_endian);
-		goto ldst_done;
+		err = do_vec_store(op->reg, ea, size, regs, cross_endian);
+		break;
 #endif
 #ifdef CONFIG_VSX
 	case STORE_VSX: {
@@ -2921,18 +2880,18 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 		 * Some VSX instructions check the MSR_VEC bit rather than MSR_VSX
 		 * when the target of the instruction is a vector register.
 		 */
-		if (op.reg >= 32 && (op.vsx_flags & VSX_CHECK_VEC))
+		if (op->reg >= 32 && (op->vsx_flags & VSX_CHECK_VEC))
 			msrbit = MSR_VEC;
 		if (!(regs->msr & MSR_PR) && !(regs->msr & msrbit))
 			return 0;
-		err = do_vsx_store(&op, ea, regs, cross_endian);
-		goto ldst_done;
+		err = do_vsx_store(op, ea, regs, cross_endian);
+		break;
 	}
 #endif
 	case STORE_MULTI:
 		if (!address_ok(regs, ea, size))
 			return -EFAULT;
-		rd = op.reg;
+		rd = op->reg;
 		for (i = 0; i < size; i += 4) {
 			unsigned int v32 = regs->gpr[rd];
 
@@ -2943,10 +2902,89 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 				v32 = byterev_4(v32);
 			err = copy_mem_out((u8 *) &v32, ea, nb, regs);
 			if (err)
-				return 0;
+				break;
 			ea += 4;
 			++rd;
 		}
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	if (err)
+		return err;
+
+	if (op->type & UPDATE)
+		regs->gpr[op->update_reg] = op->ea;
+
+	return 0;
+}
+NOKPROBE_SYMBOL(emulate_loadstore);
+
+/*
+ * Emulate instructions that cause a transfer of control,
+ * loads and stores, and a few other instructions.
+ * Returns 1 if the step was emulated, 0 if not,
+ * or -1 if the instruction is one that should not be stepped,
+ * such as an rfid, or a mtmsrd that would clear MSR_RI.
+ */
+int emulate_step(struct pt_regs *regs, unsigned int instr)
+{
+	struct instruction_op op;
+	int r, err, type;
+	unsigned long val;
+	unsigned long ea;
+
+	r = analyse_instr(&op, regs, instr);
+	if (r < 0)
+		return r;
+	if (r > 0) {
+		emulate_update_regs(regs, &op);
+		return 0;
+	}
+
+	err = 0;
+	type = op.type & INSTR_TYPE_MASK;
+
+	if (OP_IS_LOAD_STORE(type)) {
+		err = emulate_loadstore(regs, &op);
+		if (err)
+			return 0;
+		goto instr_done;
+	}
+
+	switch (type) {
+	case CACHEOP:
+		ea = truncate_if_32bit(regs->msr, op.ea);
+		if (!address_ok(regs, ea, 8))
+			return 0;
+		switch (op.type & CACHEOP_MASK) {
+		case DCBST:
+			__cacheop_user_asmx(ea, err, "dcbst");
+			break;
+		case DCBF:
+			__cacheop_user_asmx(ea, err, "dcbf");
+			break;
+		case DCBTST:
+			if (op.reg == 0)
+				prefetchw((void *) ea);
+			break;
+		case DCBT:
+			if (op.reg == 0)
+				prefetch((void *) ea);
+			break;
+		case ICBI:
+			__cacheop_user_asmx(ea, err, "icbi");
+			break;
+		case DCBZ:
+			err = emulate_dcbz(ea, regs);
+			break;
+		}
+		if (err) {
+			regs->dar = ea;
+			return 0;
+		}
 		goto instr_done;
 
 	case MFMSR:
@@ -2989,12 +3027,6 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 	}
 	return 0;
 
- ldst_done:
-	if (err)
-		return 0;
-	if (op.type & UPDATE)
-		regs->gpr[op.update_reg] = op.ea;
-
  instr_done:
 	regs->nip = truncate_if_32bit(regs->msr, regs->nip + 4);
 	return 1;
-- 
2.7.4
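
For illustration, here is a minimal sketch of how a caller might act on
the emulate_step() return convention documented above (1 = emulated,
0 = not emulated, -1 = must not be stepped).  fetch_instr() and
arm_hw_step() are assumed helper names for the example, not kernel APIs:

	/* Hedged sketch only: single-stepping on top of emulate_step(). */
	static int try_soft_step(struct pt_regs *regs)
	{
		unsigned int instr;

		if (fetch_instr(regs->nip, &instr))	/* assumed helper */
			return -EFAULT;

		switch (emulate_step(regs, instr)) {
		case 1:		/* emulated; regs->nip already advanced */
			return 0;
		case 0:		/* not emulated; fall back to hardware step */
			return arm_hw_step(regs);	/* assumed helper */
		default:	/* -1: e.g. rfid, or mtmsrd clearing MSR_RI */
			return -EPERM;
		}
	}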

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 17/17] powerpc: Use instruction emulation infrastructure to handle alignment faults
  2017-08-30  4:12 [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Paul Mackerras
                   ` (15 preceding siblings ...)
  2017-08-30  4:12 ` [PATCH v3 16/17] powerpc: Separate out load/store emulation into its own function Paul Mackerras
@ 2017-08-30  4:12 ` Paul Mackerras
  2017-08-30  6:34 ` [PATCH v3 18/17] powerpc: Emulate load/store floating point as integer word instructions Paul Mackerras
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Paul Mackerras @ 2017-08-30  4:12 UTC (permalink / raw)
  To: linuxppc-dev

This replaces almost all of the instruction emulation code in
fix_alignment() with calls to analyse_instr(), emulate_loadstore()
and emulate_dcbz().  The only emulation code left is for SPE,
since analyse_instr() etc. do not handle SPE instructions at
present.

One result of this is that we can now handle alignment faults on
all the new VSX load and store instructions that were added in POWER9.
VSX loads/stores will take alignment faults for unaligned accesses
to cache-inhibited memory.

Another effect is that we no longer rely on the DAR and DSISR values
set by the processor.

With this, we now need to include the instruction emulation code
unconditionally.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/Kconfig        |   4 -
 arch/powerpc/kernel/align.c | 774 ++------------------------------------------
 arch/powerpc/lib/Makefile   |   4 +-
 3 files changed, 34 insertions(+), 748 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index bf6abab..9fc3c0b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -367,10 +367,6 @@ config PPC_ADV_DEBUG_DAC_RANGE
 	depends on PPC_ADV_DEBUG_REGS && 44x
 	default y
 
-config PPC_EMULATE_SSTEP
-	bool
-	default y if KPROBES || UPROBES || XMON || HAVE_HW_BREAKPOINT
-
 config ZONE_DMA32
 	bool
 	default y if PPC64
diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
index ec7a8b0..26b9994 100644
--- a/arch/powerpc/kernel/align.c
+++ b/arch/powerpc/kernel/align.c
@@ -27,6 +27,7 @@
 #include <asm/switch_to.h>
 #include <asm/disassemble.h>
 #include <asm/cpu_has_feature.h>
+#include <asm/sstep.h>
 
 struct aligninfo {
 	unsigned char len;
@@ -40,364 +41,9 @@ struct aligninfo {
 #define LD	0	/* load */
 #define ST	1	/* store */
 #define SE	2	/* sign-extend value, or FP ld/st as word */
-#define F	4	/* to/from fp regs */
-#define U	8	/* update index register */
-#define M	0x10	/* multiple load/store */
 #define SW	0x20	/* byte swap */
-#define S	0x40	/* single-precision fp or... */
-#define SX	0x40	/* ... byte count in XER */
-#define HARD	0x80	/* string, stwcx. */
 #define E4	0x40	/* SPE endianness is word */
 #define E8	0x80	/* SPE endianness is double word */
-#define SPLT	0x80	/* VSX SPLAT load */
-
-/* DSISR bits reported for a DCBZ instruction: */
-#define DCBZ	0x5f	/* 8xx/82xx dcbz faults when cache not enabled */
-
-/*
- * The PowerPC stores certain bits of the instruction that caused the
- * alignment exception in the DSISR register.  This array maps those
- * bits to information about the operand length and what the
- * instruction would do.
- */
-static struct aligninfo aligninfo[128] = {
-	{ 4, LD },		/* 00 0 0000: lwz / lwarx */
-	INVALID,		/* 00 0 0001 */
-	{ 4, ST },		/* 00 0 0010: stw */
-	INVALID,		/* 00 0 0011 */
-	{ 2, LD },		/* 00 0 0100: lhz */
-	{ 2, LD+SE },		/* 00 0 0101: lha */
-	{ 2, ST },		/* 00 0 0110: sth */
-	{ 4, LD+M },		/* 00 0 0111: lmw */
-	{ 4, LD+F+S },		/* 00 0 1000: lfs */
-	{ 8, LD+F },		/* 00 0 1001: lfd */
-	{ 4, ST+F+S },		/* 00 0 1010: stfs */
-	{ 8, ST+F },		/* 00 0 1011: stfd */
-	{ 16, LD },		/* 00 0 1100: lq */
-	{ 8, LD },		/* 00 0 1101: ld/ldu/lwa */
-	INVALID,		/* 00 0 1110 */
-	{ 8, ST },		/* 00 0 1111: std/stdu */
-	{ 4, LD+U },		/* 00 1 0000: lwzu */
-	INVALID,		/* 00 1 0001 */
-	{ 4, ST+U },		/* 00 1 0010: stwu */
-	INVALID,		/* 00 1 0011 */
-	{ 2, LD+U },		/* 00 1 0100: lhzu */
-	{ 2, LD+SE+U },		/* 00 1 0101: lhau */
-	{ 2, ST+U },		/* 00 1 0110: sthu */
-	{ 4, ST+M },		/* 00 1 0111: stmw */
-	{ 4, LD+F+S+U },	/* 00 1 1000: lfsu */
-	{ 8, LD+F+U },		/* 00 1 1001: lfdu */
-	{ 4, ST+F+S+U },	/* 00 1 1010: stfsu */
-	{ 8, ST+F+U },		/* 00 1 1011: stfdu */
-	{ 16, LD+F },		/* 00 1 1100: lfdp */
-	INVALID,		/* 00 1 1101 */
-	{ 16, ST+F },		/* 00 1 1110: stfdp */
-	INVALID,		/* 00 1 1111 */
-	{ 8, LD },		/* 01 0 0000: ldx */
-	INVALID,		/* 01 0 0001 */
-	{ 8, ST },		/* 01 0 0010: stdx */
-	INVALID,		/* 01 0 0011 */
-	INVALID,		/* 01 0 0100 */
-	{ 4, LD+SE },		/* 01 0 0101: lwax */
-	INVALID,		/* 01 0 0110 */
-	INVALID,		/* 01 0 0111 */
-	{ 4, LD+M+HARD+SX },	/* 01 0 1000: lswx */
-	{ 4, LD+M+HARD },	/* 01 0 1001: lswi */
-	{ 4, ST+M+HARD+SX },	/* 01 0 1010: stswx */
-	{ 4, ST+M+HARD },	/* 01 0 1011: stswi */
-	INVALID,		/* 01 0 1100 */
-	{ 8, LD+U },		/* 01 0 1101: ldu */
-	INVALID,		/* 01 0 1110 */
-	{ 8, ST+U },		/* 01 0 1111: stdu */
-	{ 8, LD+U },		/* 01 1 0000: ldux */
-	INVALID,		/* 01 1 0001 */
-	{ 8, ST+U },		/* 01 1 0010: stdux */
-	INVALID,		/* 01 1 0011 */
-	INVALID,		/* 01 1 0100 */
-	{ 4, LD+SE+U },		/* 01 1 0101: lwaux */
-	INVALID,		/* 01 1 0110 */
-	INVALID,		/* 01 1 0111 */
-	INVALID,		/* 01 1 1000 */
-	INVALID,		/* 01 1 1001 */
-	INVALID,		/* 01 1 1010 */
-	INVALID,		/* 01 1 1011 */
-	INVALID,		/* 01 1 1100 */
-	INVALID,		/* 01 1 1101 */
-	INVALID,		/* 01 1 1110 */
-	INVALID,		/* 01 1 1111 */
-	INVALID,		/* 10 0 0000 */
-	INVALID,		/* 10 0 0001 */
-	INVALID,		/* 10 0 0010: stwcx. */
-	INVALID,		/* 10 0 0011 */
-	INVALID,		/* 10 0 0100 */
-	INVALID,		/* 10 0 0101 */
-	INVALID,		/* 10 0 0110 */
-	INVALID,		/* 10 0 0111 */
-	{ 4, LD+SW },		/* 10 0 1000: lwbrx */
-	INVALID,		/* 10 0 1001 */
-	{ 4, ST+SW },		/* 10 0 1010: stwbrx */
-	INVALID,		/* 10 0 1011 */
-	{ 2, LD+SW },		/* 10 0 1100: lhbrx */
-	{ 4, LD+SE },		/* 10 0 1101  lwa */
-	{ 2, ST+SW },		/* 10 0 1110: sthbrx */
-	{ 16, ST },		/* 10 0 1111: stq */
-	INVALID,		/* 10 1 0000 */
-	INVALID,		/* 10 1 0001 */
-	INVALID,		/* 10 1 0010 */
-	INVALID,		/* 10 1 0011 */
-	INVALID,		/* 10 1 0100 */
-	INVALID,		/* 10 1 0101 */
-	INVALID,		/* 10 1 0110 */
-	INVALID,		/* 10 1 0111 */
-	INVALID,		/* 10 1 1000 */
-	INVALID,		/* 10 1 1001 */
-	INVALID,		/* 10 1 1010 */
-	INVALID,		/* 10 1 1011 */
-	INVALID,		/* 10 1 1100 */
-	INVALID,		/* 10 1 1101 */
-	INVALID,		/* 10 1 1110 */
-	{ 0, ST+HARD },		/* 10 1 1111: dcbz */
-	{ 4, LD },		/* 11 0 0000: lwzx */
-	INVALID,		/* 11 0 0001 */
-	{ 4, ST },		/* 11 0 0010: stwx */
-	INVALID,		/* 11 0 0011 */
-	{ 2, LD },		/* 11 0 0100: lhzx */
-	{ 2, LD+SE },		/* 11 0 0101: lhax */
-	{ 2, ST },		/* 11 0 0110: sthx */
-	INVALID,		/* 11 0 0111 */
-	{ 4, LD+F+S },		/* 11 0 1000: lfsx */
-	{ 8, LD+F },		/* 11 0 1001: lfdx */
-	{ 4, ST+F+S },		/* 11 0 1010: stfsx */
-	{ 8, ST+F },		/* 11 0 1011: stfdx */
-	{ 16, LD+F },		/* 11 0 1100: lfdpx */
-	{ 4, LD+F+SE },		/* 11 0 1101: lfiwax */
-	{ 16, ST+F },		/* 11 0 1110: stfdpx */
-	{ 4, ST+F },		/* 11 0 1111: stfiwx */
-	{ 4, LD+U },		/* 11 1 0000: lwzux */
-	INVALID,		/* 11 1 0001 */
-	{ 4, ST+U },		/* 11 1 0010: stwux */
-	INVALID,		/* 11 1 0011 */
-	{ 2, LD+U },		/* 11 1 0100: lhzux */
-	{ 2, LD+SE+U },		/* 11 1 0101: lhaux */
-	{ 2, ST+U },		/* 11 1 0110: sthux */
-	INVALID,		/* 11 1 0111 */
-	{ 4, LD+F+S+U },	/* 11 1 1000: lfsux */
-	{ 8, LD+F+U },		/* 11 1 1001: lfdux */
-	{ 4, ST+F+S+U },	/* 11 1 1010: stfsux */
-	{ 8, ST+F+U },		/* 11 1 1011: stfdux */
-	INVALID,		/* 11 1 1100 */
-	{ 4, LD+F },		/* 11 1 1101: lfiwzx */
-	INVALID,		/* 11 1 1110 */
-	INVALID,		/* 11 1 1111 */
-};
-
-/*
- * The dcbz (data cache block zero) instruction
- * gives an alignment fault if used on non-cacheable
- * memory.  We handle the fault mainly for the
- * case when we are running with the cache disabled
- * for debugging.
- */
-static int emulate_dcbz(struct pt_regs *regs, unsigned char __user *addr)
-{
-	long __user *p;
-	int i, size;
-
-#ifdef __powerpc64__
-	size = ppc64_caches.l1d.block_size;
-#else
-	size = L1_CACHE_BYTES;
-#endif
-	p = (long __user *) (regs->dar & -size);
-	if (user_mode(regs) && !access_ok(VERIFY_WRITE, p, size))
-		return -EFAULT;
-	for (i = 0; i < size / sizeof(long); ++i)
-		if (__put_user_inatomic(0, p+i))
-			return -EFAULT;
-	return 1;
-}
-
-/*
- * Emulate load & store multiple instructions
- * On 64-bit machines, these instructions only affect/use the
- * bottom 4 bytes of each register, and the loads clear the
- * top 4 bytes of the affected register.
- */
-#ifdef __BIG_ENDIAN__
-#ifdef CONFIG_PPC64
-#define REG_BYTE(rp, i)		*((u8 *)((rp) + ((i) >> 2)) + ((i) & 3) + 4)
-#else
-#define REG_BYTE(rp, i)		*((u8 *)(rp) + (i))
-#endif
-#else
-#define REG_BYTE(rp, i)		(*(((u8 *)((rp) + ((i)>>2)) + ((i)&3))))
-#endif
-
-#define SWIZ_PTR(p)		((unsigned char __user *)((p) ^ swiz))
-
-static int emulate_multiple(struct pt_regs *regs, unsigned char __user *addr,
-			    unsigned int reg, unsigned int nb,
-			    unsigned int flags, unsigned int instr,
-			    unsigned long swiz)
-{
-	unsigned long *rptr;
-	unsigned int nb0, i, bswiz;
-	unsigned long p;
-
-	/*
-	 * We do not try to emulate 8 bytes multiple as they aren't really
-	 * available in our operating environments and we don't try to
-	 * emulate multiples operations in kernel land as they should never
-	 * be used/generated there at least not on unaligned boundaries
-	 */
-	if (unlikely((nb > 4) || !user_mode(regs)))
-		return 0;
-
-	/* lmw, stmw, lswi/x, stswi/x */
-	nb0 = 0;
-	if (flags & HARD) {
-		if (flags & SX) {
-			nb = regs->xer & 127;
-			if (nb == 0)
-				return 1;
-		} else {
-			unsigned long pc = regs->nip ^ (swiz & 4);
-
-			if (__get_user_inatomic(instr,
-						(unsigned int __user *)pc))
-				return -EFAULT;
-			if (swiz == 0 && (flags & SW))
-				instr = cpu_to_le32(instr);
-			nb = (instr >> 11) & 0x1f;
-			if (nb == 0)
-				nb = 32;
-		}
-		if (nb + reg * 4 > 128) {
-			nb0 = nb + reg * 4 - 128;
-			nb = 128 - reg * 4;
-		}
-#ifdef __LITTLE_ENDIAN__
-		/*
-		 *  String instructions are endian neutral but the code
-		 *  below is not.  Force byte swapping on so that the
-		 *  effects of swizzling are undone in the load/store
-		 *  loops below.
-		 */
-		flags ^= SW;
-#endif
-	} else {
-		/* lwm, stmw */
-		nb = (32 - reg) * 4;
-	}
-
-	if (!access_ok((flags & ST ? VERIFY_WRITE: VERIFY_READ), addr, nb+nb0))
-		return -EFAULT;	/* bad address */
-
-	rptr = &regs->gpr[reg];
-	p = (unsigned long) addr;
-	bswiz = (flags & SW)? 3: 0;
-
-	if (!(flags & ST)) {
-		/*
-		 * This zeroes the top 4 bytes of the affected registers
-		 * in 64-bit mode, and also zeroes out any remaining
-		 * bytes of the last register for lsw*.
-		 */
-		memset(rptr, 0, ((nb + 3) / 4) * sizeof(unsigned long));
-		if (nb0 > 0)
-			memset(&regs->gpr[0], 0,
-			       ((nb0 + 3) / 4) * sizeof(unsigned long));
-
-		for (i = 0; i < nb; ++i, ++p)
-			if (__get_user_inatomic(REG_BYTE(rptr, i ^ bswiz),
-						SWIZ_PTR(p)))
-				return -EFAULT;
-		if (nb0 > 0) {
-			rptr = &regs->gpr[0];
-			addr += nb;
-			for (i = 0; i < nb0; ++i, ++p)
-				if (__get_user_inatomic(REG_BYTE(rptr,
-								 i ^ bswiz),
-							SWIZ_PTR(p)))
-					return -EFAULT;
-		}
-
-	} else {
-		for (i = 0; i < nb; ++i, ++p)
-			if (__put_user_inatomic(REG_BYTE(rptr, i ^ bswiz),
-						SWIZ_PTR(p)))
-				return -EFAULT;
-		if (nb0 > 0) {
-			rptr = &regs->gpr[0];
-			addr += nb;
-			for (i = 0; i < nb0; ++i, ++p)
-				if (__put_user_inatomic(REG_BYTE(rptr,
-								 i ^ bswiz),
-							SWIZ_PTR(p)))
-					return -EFAULT;
-		}
-	}
-	return 1;
-}
-
-/*
- * Emulate floating-point pair loads and stores.
- * Only POWER6 has these instructions, and it does true little-endian,
- * so we don't need the address swizzling.
- */
-static int emulate_fp_pair(unsigned char __user *addr, unsigned int reg,
-			   unsigned int flags)
-{
-	char *ptr0 = (char *) &current->thread.TS_FPR(reg);
-	char *ptr1 = (char *) &current->thread.TS_FPR(reg+1);
-	int i, ret, sw = 0;
-
-	if (reg & 1)
-		return 0;	/* invalid form: FRS/FRT must be even */
-	if (flags & SW)
-		sw = 7;
-	ret = 0;
-	for (i = 0; i < 8; ++i) {
-		if (!(flags & ST)) {
-			ret |= __get_user(ptr0[i^sw], addr + i);
-			ret |= __get_user(ptr1[i^sw], addr + i + 8);
-		} else {
-			ret |= __put_user(ptr0[i^sw], addr + i);
-			ret |= __put_user(ptr1[i^sw], addr + i + 8);
-		}
-	}
-	if (ret)
-		return -EFAULT;
-	return 1;	/* exception handled and fixed up */
-}
-
-#ifdef CONFIG_PPC64
-static int emulate_lq_stq(struct pt_regs *regs, unsigned char __user *addr,
-			  unsigned int reg, unsigned int flags)
-{
-	char *ptr0 = (char *)&regs->gpr[reg];
-	char *ptr1 = (char *)&regs->gpr[reg+1];
-	int i, ret, sw = 0;
-
-	if (reg & 1)
-		return 0;	/* invalid form: GPR must be even */
-	if (flags & SW)
-		sw = 7;
-	ret = 0;
-	for (i = 0; i < 8; ++i) {
-		if (!(flags & ST)) {
-			ret |= __get_user(ptr0[i^sw], addr + i);
-			ret |= __get_user(ptr1[i^sw], addr + i + 8);
-		} else {
-			ret |= __put_user(ptr0[i^sw], addr + i);
-			ret |= __put_user(ptr1[i^sw], addr + i + 8);
-		}
-	}
-	if (ret)
-		return -EFAULT;
-	return 1;	/* exception handled and fixed up */
-}
-#endif /* CONFIG_PPC64 */
 
 #ifdef CONFIG_SPE
 
@@ -636,133 +282,21 @@ static int emulate_spe(struct pt_regs *regs, unsigned int reg,
 }
 #endif /* CONFIG_SPE */
 
-#ifdef CONFIG_VSX
-/*
- * Emulate VSX instructions...
- */
-static int emulate_vsx(unsigned char __user *addr, unsigned int reg,
-		       unsigned int areg, struct pt_regs *regs,
-		       unsigned int flags, unsigned int length,
-		       unsigned int elsize)
-{
-	char *ptr;
-	unsigned long *lptr;
-	int ret = 0;
-	int sw = 0;
-	int i, j;
-
-	/* userland only */
-	if (unlikely(!user_mode(regs)))
-		return 0;
-
-	flush_vsx_to_thread(current);
-
-	if (reg < 32)
-		ptr = (char *) &current->thread.fp_state.fpr[reg][0];
-	else
-		ptr = (char *) &current->thread.vr_state.vr[reg - 32];
-
-	lptr = (unsigned long *) ptr;
-
-#ifdef __LITTLE_ENDIAN__
-	if (flags & SW) {
-		elsize = length;
-		sw = length-1;
-	} else {
-		/*
-		 * The elements are BE ordered, even in LE mode, so process
-		 * them in reverse order.
-		 */
-		addr += length - elsize;
-
-		/* 8 byte memory accesses go in the top 8 bytes of the VR */
-		if (length == 8)
-			ptr += 8;
-	}
-#else
-	if (flags & SW)
-		sw = elsize-1;
-#endif
-
-	for (j = 0; j < length; j += elsize) {
-		for (i = 0; i < elsize; ++i) {
-			if (flags & ST)
-				ret |= __put_user(ptr[i^sw], addr + i);
-			else
-				ret |= __get_user(ptr[i^sw], addr + i);
-		}
-		ptr  += elsize;
-#ifdef __LITTLE_ENDIAN__
-		addr -= elsize;
-#else
-		addr += elsize;
-#endif
-	}
-
-#ifdef __BIG_ENDIAN__
-#define VSX_HI 0
-#define VSX_LO 1
-#else
-#define VSX_HI 1
-#define VSX_LO 0
-#endif
-
-	if (!ret) {
-		if (flags & U)
-			regs->gpr[areg] = regs->dar;
-
-		/* Splat load copies the same data to top and bottom 8 bytes */
-		if (flags & SPLT)
-			lptr[VSX_LO] = lptr[VSX_HI];
-		/* For 8 byte loads, zero the low 8 bytes */
-		else if (!(flags & ST) && (8 == length))
-			lptr[VSX_LO] = 0;
-	} else
-		return -EFAULT;
-
-	return 1;
-}
-#endif
-
 /*
  * Called on alignment exception. Attempts to fixup
  *
  * Return 1 on success
  * Return 0 if unable to handle the interrupt
  * Return -EFAULT if data address is bad
+ * Other negative return values indicate that the instruction can't
+ * be emulated, and the process should be given a SIGBUS.
  */
 
 int fix_alignment(struct pt_regs *regs)
 {
-	unsigned int instr, nb, flags, instruction = 0;
-	unsigned int reg, areg;
-	unsigned int dsisr;
-	unsigned char __user *addr;
-	unsigned long p, swiz;
-	int ret, i;
-	union data {
-		u64 ll;
-		double dd;
-		unsigned char v[8];
-		struct {
-#ifdef __LITTLE_ENDIAN__
-			int	 low32;
-			unsigned hi32;
-#else
-			unsigned hi32;
-			int	 low32;
-#endif
-		} x32;
-		struct {
-#ifdef __LITTLE_ENDIAN__
-			short	      low16;
-			unsigned char hi48[6];
-#else
-			unsigned char hi48[6];
-			short	      low16;
-#endif
-		} x16;
-	} data;
+	unsigned int instr;
+	struct instruction_op op;
+	int r, type;
 
 	/*
 	 * We require a complete register set, if not, then our assembly
@@ -770,121 +304,23 @@ int fix_alignment(struct pt_regs *regs)
 	 */
 	CHECK_FULL_REGS(regs);
 
-	dsisr = regs->dsisr;
-
-	/* Some processors don't provide us with a DSISR we can use here,
-	 * let's make one up from the instruction
-	 */
-	if (cpu_has_feature(CPU_FTR_NODSISRALIGN)) {
-		unsigned long pc = regs->nip;
-
-		if (cpu_has_feature(CPU_FTR_PPC_LE) && (regs->msr & MSR_LE))
-			pc ^= 4;
-		if (unlikely(__get_user_inatomic(instr,
-						 (unsigned int __user *)pc)))
-			return -EFAULT;
-		if (cpu_has_feature(CPU_FTR_REAL_LE) && (regs->msr & MSR_LE))
-			instr = cpu_to_le32(instr);
-		dsisr = make_dsisr(instr);
-		instruction = instr;
+	if (unlikely(__get_user(instr, (unsigned int __user *)regs->nip)))
+		return -EFAULT;
+	if ((regs->msr & MSR_LE) != (MSR_KERNEL & MSR_LE)) {
+		/* We don't handle PPC little-endian any more... */
+		if (cpu_has_feature(CPU_FTR_PPC_LE))
+			return -EIO;
+		instr = swab32(instr);
 	}
 
-	/* extract the operation and registers from the dsisr */
-	reg = (dsisr >> 5) & 0x1f;	/* source/dest register */
-	areg = dsisr & 0x1f;		/* register to update */
-
 #ifdef CONFIG_SPE
 	if ((instr >> 26) == 0x4) {
+		int reg = (instr >> 21) & 0x1f;
 		PPC_WARN_ALIGNMENT(spe, regs);
 		return emulate_spe(regs, reg, instr);
 	}
 #endif
 
-	instr = (dsisr >> 10) & 0x7f;
-	instr |= (dsisr >> 13) & 0x60;
-
-	/* Lookup the operation in our table */
-	nb = aligninfo[instr].len;
-	flags = aligninfo[instr].flags;
-
-	/*
-	 * Handle some cases which give overlaps in the DSISR values.
-	 */
-	if (IS_XFORM(instruction)) {
-		switch (get_xop(instruction)) {
-		case 532:	/* ldbrx */
-			nb = 8;
-			flags = LD+SW;
-			break;
-		case 660:	/* stdbrx */
-			nb = 8;
-			flags = ST+SW;
-			break;
-		case 20:	/* lwarx */
-		case 84:	/* ldarx */
-		case 116:	/* lharx */
-		case 276:	/* lqarx */
-			return 0;	/* not emulated ever */
-		}
-	}
-
-	/* Byteswap little endian loads and stores */
-	swiz = 0;
-	if ((regs->msr & MSR_LE) != (MSR_KERNEL & MSR_LE)) {
-		flags ^= SW;
-#ifdef __BIG_ENDIAN__
-		/*
-		 * So-called "PowerPC little endian" mode works by
-		 * swizzling addresses rather than by actually doing
-		 * any byte-swapping.  To emulate this, we XOR each
-		 * byte address with 7.  We also byte-swap, because
-		 * the processor's address swizzling depends on the
-		 * operand size (it xors the address with 7 for bytes,
-		 * 6 for halfwords, 4 for words, 0 for doublewords) but
-		 * we will xor with 7 and load/store each byte separately.
-		 */
-		if (cpu_has_feature(CPU_FTR_PPC_LE))
-			swiz = 7;
-#endif
-	}
-
-	/* DAR has the operand effective address */
-	addr = (unsigned char __user *)regs->dar;
-
-#ifdef CONFIG_VSX
-	if ((instruction & 0xfc00003e) == 0x7c000018) {
-		unsigned int elsize;
-
-		/* Additional register addressing bit (64 VSX vs 32 FPR/GPR) */
-		reg |= (instruction & 0x1) << 5;
-		/* Simple inline decoder instead of a table */
-		/* VSX has only 8 and 16 byte memory accesses */
-		nb = 8;
-		if (instruction & 0x200)
-			nb = 16;
-
-		/* Vector stores in little-endian mode swap individual
-		   elements, so process them separately */
-		elsize = 4;
-		if (instruction & 0x80)
-			elsize = 8;
-
-		flags = 0;
-		if ((regs->msr & MSR_LE) != (MSR_KERNEL & MSR_LE))
-			flags |= SW;
-		if (instruction & 0x100)
-			flags |= ST;
-		if (instruction & 0x040)
-			flags |= U;
-		/* splat load needs a special decoder */
-		if ((instruction & 0x400) == 0){
-			flags |= SPLT;
-			nb = 8;
-		}
-		PPC_WARN_ALIGNMENT(vsx, regs);
-		return emulate_vsx(addr, reg, areg, regs, flags, nb, elsize);
-	}
-#endif
 
 	/*
 	 * ISA 3.0 (such as P9) copy, copy_first, paste and paste_last alignment
@@ -896,173 +332,27 @@ int fix_alignment(struct pt_regs *regs)
 	 * when pasting to a co-processor. Furthermore, paste_last is the
 	 * synchronisation point for preceding copy/paste sequences.
 	 */
-	if ((instruction & 0xfc0006fe) == PPC_INST_COPY)
+	if ((instr & 0xfc0006fe) == PPC_INST_COPY)
 		return -EIO;
 
-	/* A size of 0 indicates an instruction we don't support, with
-	 * the exception of DCBZ which is handled as a special case here
-	 */
-	if (instr == DCBZ) {
-		PPC_WARN_ALIGNMENT(dcbz, regs);
-		return emulate_dcbz(regs, addr);
-	}
-	if (unlikely(nb == 0))
-		return 0;
-
-	/* Load/Store Multiple instructions are handled in their own
-	 * function
-	 */
-	if (flags & M) {
-		PPC_WARN_ALIGNMENT(multiple, regs);
-		return emulate_multiple(regs, addr, reg, nb,
-					flags, instr, swiz);
-	}
-
-	/* Verify the address of the operand */
-	if (unlikely(user_mode(regs) &&
-		     !access_ok((flags & ST ? VERIFY_WRITE : VERIFY_READ),
-				addr, nb)))
-		return -EFAULT;
-
-	/* Force the fprs into the save area so we can reference them */
-	if (flags & F) {
-		/* userland only */
-		if (unlikely(!user_mode(regs)))
-			return 0;
-		flush_fp_to_thread(current);
-	}
+	r = analyse_instr(&op, regs, instr);
+	if (r < 0)
+		return -EINVAL;
 
-	if (nb == 16) {
-		if (flags & F) {
-			/* Special case for 16-byte FP loads and stores */
-			PPC_WARN_ALIGNMENT(fp_pair, regs);
-			return emulate_fp_pair(addr, reg, flags);
-		} else {
-#ifdef CONFIG_PPC64
-			/* Special case for 16-byte loads and stores */
-			PPC_WARN_ALIGNMENT(lq_stq, regs);
-			return emulate_lq_stq(regs, addr, reg, flags);
-#else
-			return 0;
-#endif
-		}
-	}
-
-	PPC_WARN_ALIGNMENT(unaligned, regs);
-
-	/* If we are loading, get the data from user space, else
-	 * get it from register values
-	 */
-	if (!(flags & ST)) {
-		unsigned int start = 0;
-
-		switch (nb) {
-		case 4:
-			start = offsetof(union data, x32.low32);
-			break;
-		case 2:
-			start = offsetof(union data, x16.low16);
-			break;
-		}
-
-		data.ll = 0;
-		ret = 0;
-		p = (unsigned long)addr;
-
-		for (i = 0; i < nb; i++)
-			ret |= __get_user_inatomic(data.v[start + i],
-						   SWIZ_PTR(p++));
-
-		if (unlikely(ret))
-			return -EFAULT;
-
-	} else if (flags & F) {
-		data.ll = current->thread.TS_FPR(reg);
-		if (flags & S) {
-			/* Single-precision FP store requires conversion... */
-#ifdef CONFIG_PPC_FPU
-			preempt_disable();
-			enable_kernel_fp();
-			cvt_df(&data.dd, (float *)&data.x32.low32);
-			disable_kernel_fp();
-			preempt_enable();
-#else
-			return 0;
-#endif
-		}
-	} else
-		data.ll = regs->gpr[reg];
-
-	if (flags & SW) {
-		switch (nb) {
-		case 8:
-			data.ll = swab64(data.ll);
-			break;
-		case 4:
-			data.x32.low32 = swab32(data.x32.low32);
-			break;
-		case 2:
-			data.x16.low16 = swab16(data.x16.low16);
-			break;
-		}
-	}
-
-	/* Perform other misc operations like sign extension
-	 * or floating point single precision conversion
-	 */
-	switch (flags & ~(U|SW)) {
-	case LD+SE:	/* sign extending integer loads */
-	case LD+F+SE:	/* sign extend for lfiwax */
-		if ( nb == 2 )
-			data.ll = data.x16.low16;
-		else	/* nb must be 4 */
-			data.ll = data.x32.low32;
-		break;
-
-	/* Single-precision FP load requires conversion... */
-	case LD+F+S:
-#ifdef CONFIG_PPC_FPU
-		preempt_disable();
-		enable_kernel_fp();
-		cvt_fd((float *)&data.x32.low32, &data.dd);
-		disable_kernel_fp();
-		preempt_enable();
-#else
-		return 0;
-#endif
-		break;
+	type = op.type & INSTR_TYPE_MASK;
+	if (!OP_IS_LOAD_STORE(type)) {
+		if (type != CACHEOP + DCBZ)
+			return -EINVAL;
+		PPC_WARN_ALIGNMENT(dcbz, regs);
+		r = emulate_dcbz(op.ea, regs);
+	} else {
+		if (type == LARX || type == STCX)
+			return -EIO;
+		PPC_WARN_ALIGNMENT(unaligned, regs);
+		r = emulate_loadstore(regs, &op);
 	}
 
-	/* Store result to memory or update registers */
-	if (flags & ST) {
-		unsigned int start = 0;
-
-		switch (nb) {
-		case 4:
-			start = offsetof(union data, x32.low32);
-			break;
-		case 2:
-			start = offsetof(union data, x16.low16);
-			break;
-		}
-
-		ret = 0;
-		p = (unsigned long)addr;
-
-		for (i = 0; i < nb; i++)
-			ret |= __put_user_inatomic(data.v[start + i],
-						   SWIZ_PTR(p++));
-
-		if (unlikely(ret))
-			return -EFAULT;
-	} else if (flags & F)
-		current->thread.TS_FPR(reg) = data.ll;
-	else
-		regs->gpr[reg] = data.ll;
-
-	/* Update RA as needed */
-	if (flags & U)
-		regs->gpr[areg] = regs->dar;
-
-	return 1;
+	if (!r)
+		return 1;
+	return r;
 }
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 400778d..50d5bf9 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -31,8 +31,8 @@ obj64-$(CONFIG_KPROBES_SANITY_TEST) += test_emulate_step.o
 
 obj-y			+= checksum_$(BITS).o checksum_wrappers.o
 
-obj-$(CONFIG_PPC_EMULATE_SSTEP)	+= sstep.o ldstfp.o
-obj64-$(CONFIG_PPC_EMULATE_SSTEP) += quad.o
+obj-y			+= sstep.o ldstfp.o
+obj64-y			+= quad.o
 
 obj-$(CONFIG_PPC_LIB_RHEAP) += rheap.o
 
-- 
2.7.4
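
For illustration, a minimal sketch of how an exception handler could act
on the fix_alignment() return convention above (1 = fixed up, -EFAULT =
bad data address, 0 or other negatives = cannot be handled).
deliver_signal() is an assumed helper, and mapping -EFAULT to SIGSEGV
is this sketch's assumption, not something the patch itself specifies:

	/* Hedged sketch only, mirroring the documented return values. */
	static void alignment_interrupt_sketch(struct pt_regs *regs)
	{
		int r = fix_alignment(regs);

		if (r == 1) {
			regs->nip += 4;	/* fixed up; skip the instruction */
			return;
		}
		if (r == -EFAULT)
			deliver_signal(regs, SIGSEGV);	/* bad data address */
		else
			deliver_signal(regs, SIGBUS);	/* can't be emulated */
	}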

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 18/17] powerpc: Emulate load/store floating point as integer word instructions
  2017-08-30  4:12 [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Paul Mackerras
                   ` (16 preceding siblings ...)
  2017-08-30  4:12 ` [PATCH v3 17/17] powerpc: Use instruction emulation infrastructure to handle alignment faults Paul Mackerras
@ 2017-08-30  6:34 ` Paul Mackerras
  2017-08-31  0:49 ` [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Michael Neuling
  2017-08-31 23:51 ` [PATCH 19/17] powerpc: Wrap register number correctly for string load/store instructions Paul Mackerras
  19 siblings, 0 replies; 23+ messages in thread
From: Paul Mackerras @ 2017-08-30  6:34 UTC (permalink / raw)
  To: linuxppc-dev

This adds emulation for the lfiwax, lfiwzx and stfiwx instructions.
This necessitated adding a new flag (FPCONV) to indicate whether a
4-byte LOAD_FP or STORE_FP needs floating-point conversion or a
plain integer transfer, which in turn means moving the size field
in op->type up 4 bits to make room for the flag.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/include/asm/sstep.h |  5 ++--
 arch/powerpc/lib/sstep.c         | 60 ++++++++++++++++++++++++++++++----------
 2 files changed, 48 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/sstep.h b/arch/powerpc/include/asm/sstep.h
index 309d1c5..ab9d849 100644
--- a/arch/powerpc/include/asm/sstep.h
+++ b/arch/powerpc/include/asm/sstep.h
@@ -68,6 +68,7 @@ enum instruction_type {
 #define SIGNEXT		0x20
 #define UPDATE		0x40	/* matches bit in opcode 31 instructions */
 #define BYTEREV		0x80
+#define FPCONV		0x100
 
 /* Barrier type field, ORed in with type */
 #define BARRIER_MASK	0xe0
@@ -93,8 +94,8 @@ enum instruction_type {
 #define VSX_CHECK_VEC	8	/* check MSR_VEC not MSR_VSX for reg >= 32 */
 
 /* Size field in type word */
-#define SIZE(n)		((n) << 8)
-#define GETSIZE(w)	((w) >> 8)
+#define SIZE(n)		((n) << 12)
+#define GETSIZE(w)	((w) >> 12)
 
 #define MKOP(t, f, s)	((t) | (f) | SIZE(s))
 
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 24031ca..2f6897c 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -457,19 +457,23 @@ NOKPROBE_SYMBOL(write_mem);
  * These access either the real FP register or the image in the
  * thread_struct, depending on regs->msr & MSR_FP.
  */
-static int do_fp_load(int rn, unsigned long ea, int nb, struct pt_regs *regs,
-		      bool cross_endian)
+static int do_fp_load(struct instruction_op *op, unsigned long ea,
+		      struct pt_regs *regs, bool cross_endian)
 {
-	int err;
+	int err, rn, nb;
 	union {
+		int i;
+		unsigned int u;
 		float f;
 		double d[2];
 		unsigned long l[2];
 		u8 b[2 * sizeof(double)];
 	} u;
 
+	nb = GETSIZE(op->type);
 	if (!address_ok(regs, ea, nb))
 		return -EFAULT;
+	rn = op->reg;
 	err = copy_mem_in(u.b, ea, nb, regs);
 	if (err)
 		return err;
@@ -479,8 +483,14 @@ static int do_fp_load(int rn, unsigned long ea, int nb, struct pt_regs *regs,
 			do_byte_reverse(&u.b[8], 8);
 	}
 	preempt_disable();
-	if (nb == 4)
-		conv_sp_to_dp(&u.f, &u.d[0]);
+	if (nb == 4) {
+		if (op->type & FPCONV)
+			conv_sp_to_dp(&u.f, &u.d[0]);
+		else if (op->type & SIGNEXT)
+			u.l[0] = u.i;
+		else
+			u.l[0] = u.u;
+	}
 	if (regs->msr & MSR_FP)
 		put_fpr(rn, &u.d[0]);
 	else
@@ -498,25 +508,33 @@ static int do_fp_load(int rn, unsigned long ea, int nb, struct pt_regs *regs,
 }
 NOKPROBE_SYMBOL(do_fp_load);
 
-static int do_fp_store(int rn, unsigned long ea, int nb, struct pt_regs *regs,
-		       bool cross_endian)
+static int do_fp_store(struct instruction_op *op, unsigned long ea,
+		       struct pt_regs *regs, bool cross_endian)
 {
+	int rn, nb;
 	union {
+		unsigned int u;
 		float f;
 		double d[2];
 		unsigned long l[2];
 		u8 b[2 * sizeof(double)];
 	} u;
 
+	nb = GETSIZE(op->type);
 	if (!address_ok(regs, ea, nb))
 		return -EFAULT;
+	rn = op->reg;
 	preempt_disable();
 	if (regs->msr & MSR_FP)
 		get_fpr(rn, &u.d[0]);
 	else
 		u.l[0] = current->thread.TS_FPR(rn);
-	if (nb == 4)
-		conv_dp_to_sp(&u.d[0], &u.f);
+	if (nb == 4) {
+		if (op->type & FPCONV)
+			conv_dp_to_sp(&u.d[0], &u.f);
+		else
+			u.u = u.l[0];
+	}
 	if (nb == 16) {
 		rn |= 1;
 		if (regs->msr & MSR_FP)
@@ -2049,7 +2067,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 #ifdef CONFIG_PPC_FPU
 		case 535:	/* lfsx */
 		case 567:	/* lfsux */
-			op->type = MKOP(LOAD_FP, u, 4);
+			op->type = MKOP(LOAD_FP, u | FPCONV, 4);
 			break;
 
 		case 599:	/* lfdx */
@@ -2059,7 +2077,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 
 		case 663:	/* stfsx */
 		case 695:	/* stfsux */
-			op->type = MKOP(STORE_FP, u, 4);
+			op->type = MKOP(STORE_FP, u | FPCONV, 4);
 			break;
 
 		case 727:	/* stfdx */
@@ -2072,9 +2090,21 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 			op->type = MKOP(LOAD_FP, 0, 16);
 			break;
 
+		case 855:	/* lfiwax */
+			op->type = MKOP(LOAD_FP, SIGNEXT, 4);
+			break;
+
+		case 887:	/* lfiwzx */
+			op->type = MKOP(LOAD_FP, 0, 4);
+			break;
+
 		case 919:	/* stfdpx */
 			op->type = MKOP(STORE_FP, 0, 16);
 			break;
+
+		case 983:	/* stfiwx */
+			op->type = MKOP(STORE_FP, 0, 4);
+			break;
 #endif /* __powerpc64 */
 #endif /* CONFIG_PPC_FPU */
 
@@ -2352,7 +2382,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 #ifdef CONFIG_PPC_FPU
 	case 48:	/* lfs */
 	case 49:	/* lfsu */
-		op->type = MKOP(LOAD_FP, u, 4);
+		op->type = MKOP(LOAD_FP, u | FPCONV, 4);
 		op->ea = dform_ea(instr, regs);
 		break;
 
@@ -2364,7 +2394,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 
 	case 52:	/* stfs */
 	case 53:	/* stfsu */
-		op->type = MKOP(STORE_FP, u, 4);
+		op->type = MKOP(STORE_FP, u | FPCONV, 4);
 		op->ea = dform_ea(instr, regs);
 		break;
 
@@ -2792,7 +2822,7 @@ int emulate_loadstore(struct pt_regs *regs, struct instruction_op *op)
 		 */
 		if (!(regs->msr & MSR_PR) && !(regs->msr & MSR_FP))
 			return 0;
-		err = do_fp_load(op->reg, ea, size, regs, cross_endian);
+		err = do_fp_load(op, ea, regs, cross_endian);
 		break;
 #endif
 #ifdef CONFIG_ALTIVEC
@@ -2862,7 +2892,7 @@ int emulate_loadstore(struct pt_regs *regs, struct instruction_op *op)
 	case STORE_FP:
 		if (!(regs->msr & MSR_PR) && !(regs->msr & MSR_FP))
 			return 0;
-		err = do_fp_store(op->reg, ea, size, regs, cross_endian);
+		err = do_fp_store(op, ea, regs, cross_endian);
 		break;
 #endif
 #ifdef CONFIG_ALTIVEC
-- 
2.7.4
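
For illustration, the op->type layout implied by the hunks above can be
checked with a small self-contained sketch (the flag and size values are
copied from the patch; the instruction type bits are irrelevant to the
arithmetic, so 0 stands in for LOAD_FP):

	/* Flag bits sit below bit 12; sizes moved up 4 bits for FPCONV. */
	#define SIGNEXT		0x20
	#define UPDATE		0x40
	#define BYTEREV		0x80
	#define FPCONV		0x100
	#define SIZE(n)		((n) << 12)
	#define GETSIZE(w)	((w) >> 12)
	#define MKOP(t, f, s)	((t) | (f) | SIZE(s))

	/* lfsx encodes as MKOP(LOAD_FP, u | FPCONV, 4): 4 bytes plus
	 * single->double conversion; lfiwzx is MKOP(LOAD_FP, 0, 4): a
	 * plain 4-byte integer load into the FPR, no conversion. */
	_Static_assert(GETSIZE(MKOP(0, FPCONV | SIGNEXT | BYTEREV, 16)) == 16,
		       "size field must survive all flag bits");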

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc.
  2017-08-30  4:12 [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Paul Mackerras
                   ` (17 preceding siblings ...)
  2017-08-30  6:34 ` [PATCH v3 18/17] powerpc: Emulate load/store floating point as integer word instructions Paul Mackerras
@ 2017-08-31  0:49 ` Michael Neuling
  2017-08-31  0:54   ` Michael Neuling
  2017-08-31 23:51 ` [PATCH 19/17] powerpc: Wrap register number correctly for string load/store instructions Paul Mackerras
  19 siblings, 1 reply; 23+ messages in thread
From: Michael Neuling @ 2017-08-31  0:49 UTC (permalink / raw)
  To: Paul Mackerras, linuxppc-dev

Tested-by: Michael Neuling <mikey@neuling.org>

FWIW I've written a test case for alignment faults (which I'll convert to a
selftest and upstream). It tests all load stores supported by POWER9
(results below).

VSX: 2.06B
	Doing lxvd2x:	PASSED
	Doing lxvw4x:	PASSED
	Doing lxsdx:	PASSED
	Doing lxvdsx:	PASSED
	Doing stxvd2x:	PASSED
	Doing stxvw4x:	PASSED
	Doing stxsdx:	PASSED
VSX: 2.07B
	Doing lxsspx:	PASSED
	Doing lxsiwax:	PASSED
	Doing lxsiwzx:	PASSED
	Doing stxsspx:	PASSED
	Doing stxsiwx:	PASSED
VSX: 3.00B
	Doing lxsd:	PASSED
	Doing lxsibzx:	PASSED
	Doing lxsihzx:	PASSED
	Doing lxssp:	PASSED
	Doing lxv:	PASSED
	Doing lxvb16x:	PASSED
	Doing lxvh8x:	PASSED
	Doing lxvx:	PASSED
	Doing lxvwsx:	PASSED
	Doing lxvl:	PASSED
	Doing lxvll:	PASSED
	Doing stxsd:	PASSED
	Doing stxsibx:	PASSED
	Doing stxsihx:	PASSED
	Doing stxssp:	PASSED
	Doing stxv:	PASSED
	Doing stxvb16x:	PASSED
	Doing stxvh8x:	PASSED
	Doing stxvx:	PASSED
	Doing stxvl:	PASSED
	Doing stxvll:	PASSED
Integer
	Doing lbz:	PASSED
	Doing lbzu:	PASSED
	Doing lbzx:	PASSED
	Doing lbzux:	PASSED
	Doing lhz:	PASSED
	Doing lhzu:	PASSED
	Doing lhzx:	PASSED
	Doing lhzux:	PASSED
	Doing lha:	PASSED
	Doing lhau:	PASSED
	Doing lhax:	PASSED
	Doing lhaux:	PASSED
	Doing lhbrx:	PASSED
	Doing lwz:	PASSED
	Doing lwzu:	PASSED
	Doing lwzx:	PASSED
	Doing lwzux:	PASSED
	Doing lwa:	PASSED
	Doing lwax:	PASSED
	Doing lwaux:	PASSED
	Doing lwbrx:	PASSED
	Doing ld:	PASSED
	Doing ldu:	PASSED
	Doing ldx:	PASSED
	Doing ldux:	PASSED
	Doing ldbrx:	PASSED
	Doing lmw:	PASSED
	Doing stb:	PASSED
	Doing stbx:	PASSED
	Doing stbu:	PASSED
	Doing stbux:	PASSED
	Doing sth:	PASSED
	Doing sthx:	PASSED
	Doing sthu:	PASSED
	Doing sthux:	PASSED
	Doing sthbrx:	PASSED
	Doing stw:	PASSED
	Doing stwx:	PASSED
	Doing stwu:	PASSED
	Doing stwux:	PASSED
	Doing stwbrx:	PASSED
	Doing std:	PASSED
	Doing stdx:	PASSED
	Doing stdu:	PASSED
	Doing stdux:	PASSED
	Doing stdbrx:	PASSED
	Doing stmw:	PASSED
VMX
	Doing stvx:	PASSED
	Doing stvebx:	PASSED
	Doing stvehx:	PASSED
	Doing stvewx:	PASSED
	Doing stvxl:	PASSED
Floating point
	Doing lfd:	PASSED
	Doing lfdx:	PASSED
	Doing lfdp:	PASSED
	Doing lfdpx:	PASSED
	Doing lfdu:	PASSED
	Doing lfdux:	PASSED
	Doing lfs:	PASSED
	Doing lfsx:	PASSED
	Doing lfsu:	PASSED
	Doing lfsux:	PASSED
	Doing lfiwzx:	PASSED
	Doing lfiwax:	PASSED
	Doing stfd:	PASSED
	Doing stfdx:	PASSED
	Doing stfdp:	PASSED
	Doing stfdpx:	PASSED
	Doing stfdu:	PASSED
	Doing stfdux:	PASSED
	Doing stfs:	PASSED
	Doing stfsx:	PASSED
	Doing stfsu:	PASSED
	Doing stfsux:	PASSED
	Doing stfiwx:	PASSED
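
For reference, a hypothetical per-instruction check along these lines
(not the actual test code) might look like the sketch below, assuming
ci_buf points at cache-inhibited memory where unaligned accesses take
alignment faults, and cached_copy holds the same bytes in normal memory:

	#include <string.h>

	/* Hypothetical sketch: the kernel's fixup of an unaligned lwz
	 * must produce the same value as reading the bytes directly. */
	static int check_lwz(char *ci_buf, char *cached_copy)
	{
		unsigned int val, expect;

		memcpy(&expect, cached_copy + 1, 4);	/* reference value */
		asm volatile("lwz %0, 1(%1)"		/* EA = ci_buf + 1 */
			     : "=r" (val) : "b" (ci_buf) : "memory");
		return val == expect ? 0 : 1;		/* 0 => PASSED */
	}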


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc.
  2017-08-31  0:49 ` [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Michael Neuling
@ 2017-08-31  0:54   ` Michael Neuling
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Neuling @ 2017-08-31  0:54 UTC (permalink / raw)
  To: Paul Mackerras, linuxppc-dev

On Thu, 2017-08-31 at 10:49 +1000, Michael Neuling wrote:
> Tested-by: Michael Neuling <mikey@neuling.org>
> 
> FWIW I've written a test case for alignment faults (which I'll convert to a
> selftest and upstream). It tests all load stores supported by POWER9
> (results below).

Sorry, this is not quite right.  It doesn't test load/store quad or string
instructions.

It also doesn't test atomic memory operations (AMO) since they can't be
emulated anyway.

Mikey

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 19/17] powerpc: Wrap register number correctly for string load/store instructions
  2017-08-30  4:12 [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Paul Mackerras
                   ` (18 preceding siblings ...)
  2017-08-31  0:49 ` [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Michael Neuling
@ 2017-08-31 23:51 ` Paul Mackerras
  19 siblings, 0 replies; 23+ messages in thread
From: Paul Mackerras @ 2017-08-31 23:51 UTC (permalink / raw)
  To: linuxppc-dev

Michael Ellerman reported that emulate_loadstore() was trying to
access element 32 of regs->gpr[], which doesn't exist, when
emulating a string store instruction.  This is because the string
load and store instructions (lswi, lswx, stswi and stswx) are
defined to wrap around from register 31 to register 0 if the number
of bytes being loaded or stored is sufficiently large.  This wrapping
was not implemented in the emulation code.  To fix it, we mask the
register number after incrementing it.

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Fixes: c9f6f4ed95d4 ("powerpc: Implement emulation of string loads and stores")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/lib/sstep.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 2f6897c..c406611 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -2865,7 +2865,8 @@ int emulate_loadstore(struct pt_regs *regs, struct instruction_op *op)
 				v32 = byterev_4(v32);
 			regs->gpr[rd] = v32;
 			ea += 4;
-			++rd;
+			/* reg number wraps from 31 to 0 for lsw[ix] */
+			rd = (rd + 1) & 0x1f;
 		}
 		break;
 
@@ -2934,7 +2935,8 @@ int emulate_loadstore(struct pt_regs *regs, struct instruction_op *op)
 			if (err)
 				break;
 			ea += 4;
-			++rd;
+			/* reg number wraps from 31 to 0 for stsw[ix] */
+			rd = (rd + 1) & 0x1f;
 		}
 		break;
 
-- 
2.7.4
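
For illustration, a standalone sketch of the wrap-around rule this patch
implements: loading 12 bytes with lswi starting at register 30 must use
r30, r31 and then r0, never the non-existent r32.

	#include <stdio.h>

	int main(void)
	{
		int rd = 30, nbytes = 12;

		for (int i = 0; i < nbytes; i += 4) {
			printf("4 bytes -> r%d\n", rd);
			rd = (rd + 1) & 0x1f;	/* wraps from 31 back to 0 */
		}
		return 0;	/* prints: r30, r31, r0 */
	}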

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [v3, 01/17] powerpc: Correct instruction code for xxlor instruction
  2017-08-30  4:12 ` [PATCH v3 01/17] powerpc: Correct instruction code for xxlor instruction Paul Mackerras
@ 2017-09-01 13:29   ` Michael Ellerman
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Ellerman @ 2017-09-01 13:29 UTC (permalink / raw)
  To: Paul Mackerras, linuxppc-dev

On Wed, 2017-08-30 at 04:12:24 UTC, Paul Mackerras wrote:
> The instruction code for xxlor that commit 0016a4cf5582 ("powerpc:
> Emulate most Book I instructions in emulate_step()", 2010-06-15)
> added is actually the code for xxlnor.  It is used in get_vsr()
> and put_vsr() and the effect of the error is that if emulate_step
> is used to emulate a VSX load or store from any register other
> than vsr0, the bitwise complement of the correct value will be
> loaded or stored.  This corrects the error.
> 
> Fixes: 0016a4cf5582 ("powerpc: Emulate most Book I instructions in emulate_step()")
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/93b2d3cf3733b4060d3623161551f5

cheers

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2017-09-01 13:29 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-30  4:12 [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Paul Mackerras
2017-08-30  4:12 ` [PATCH v3 01/17] powerpc: Correct instruction code for xxlor instruction Paul Mackerras
2017-09-01 13:29   ` [v3, " Michael Ellerman
2017-08-30  4:12 ` [PATCH v3 02/17] powerpc: Change analyse_instr so it doesn't modify *regs Paul Mackerras
2017-08-30  4:12 ` [PATCH v3 03/17] powerpc: Don't check MSR FP/VMX/VSX enable bits in analyse_instr() Paul Mackerras
2017-08-30  4:12 ` [PATCH v3 04/17] powerpc: Handle most loads and stores in instruction emulation code Paul Mackerras
2017-08-30  4:12 ` [PATCH v3 05/17] powerpc/64: Fix update forms of loads and stores to write 64-bit EA Paul Mackerras
2017-08-30  4:12 ` [PATCH v3 06/17] powerpc: Fix emulation of the isel instruction Paul Mackerras
2017-08-30  4:12 ` [PATCH v3 07/17] powerpc: Don't update CR0 in emulation of popcnt, prty, bpermd instructions Paul Mackerras
2017-08-30  4:12 ` [PATCH v3 08/17] powerpc: Add emulation for the addpcis instruction Paul Mackerras
2017-08-30  4:12 ` [PATCH v3 09/17] powerpc: Make load/store emulation use larger memory accesses Paul Mackerras
2017-08-30  4:12 ` [PATCH v3 10/17] powerpc: Emulate FP/vector/VSX loads/stores correctly when regs not live Paul Mackerras
2017-08-30  4:12 ` [PATCH v3 11/17] powerpc: Emulate vector element load/store instructions Paul Mackerras
2017-08-30  4:12 ` [PATCH v3 12/17] powerpc: Emulate load/store floating double pair instructions Paul Mackerras
2017-08-30  4:12 ` [PATCH v3 13/17] powerpc: Emulate the dcbz instruction Paul Mackerras
2017-08-30  4:12 ` [PATCH v3 14/17] powerpc: Set regs->dar if memory access fails in emulate_step() Paul Mackerras
2017-08-30  4:12 ` [PATCH v3 15/17] powerpc: Handle opposite-endian processes in emulation code Paul Mackerras
2017-08-30  4:12 ` [PATCH v3 16/17] powerpc: Separate out load/store emulation into its own function Paul Mackerras
2017-08-30  4:12 ` [PATCH v3 17/17] powerpc: Use instruction emulation infrastructure to handle alignment faults Paul Mackerras
2017-08-30  6:34 ` [PATCH v3 18/17] powerpc: Emulate load/store floating point as integer word instructions Paul Mackerras
2017-08-31  0:49 ` [PATCH v3 00/17] powerpc: Do alignment fixups using analyse_instr etc Michael Neuling
2017-08-31  0:54   ` Michael Neuling
2017-08-31 23:51 ` [PATCH 19/17] powerpc: Wrap register number correctly for string load/store instructions Paul Mackerras
