* [RFC PATCH 0/8] use r14 for a per-cpu kernel register
@ 2017-12-20 14:51 Nicholas Piggin
  2017-12-20 14:51 ` [RFC PATCH 1/8] powerpc/64s: stop using r14 register Nicholas Piggin
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Nicholas Piggin @ 2017-12-20 14:51 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

This makes r14 a fixed register, used to store per-cpu data in the
kernel, including read-write fields that are retained across
interrupts. It ends up being most useful for speeding up per-cpu
pointer dereferencing and soft-irq masking and testing, but it can
also reduce the number of loads and stores in the interrupt entry
paths by moving several bits of interest into r14 (another bit I'm
looking at is one for HSTATE_IN_GUEST, to speed up kvmtest).
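
As a rough illustration of the idea (an editor's sketch, not code from
this series -- the later patches also pack soft-mask and other bits
into r14, so the real accessor would mask those out):

/*
 * Illustrative sketch only.  With -ffixed-r14 and the local CPU's
 * per-cpu data offset kept in r14, reading __my_cpu_offset becomes a
 * plain register access instead of a load from the paca.
 */
register unsigned long local_r14 asm("r14");

#define __my_cpu_offset		(local_r14)	/* was: local_paca->data_offset */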

The series goes on top of Maddy's soft-irq patches. It works on 64s,
but KVM and 64e are probably broken at the moment, so it's not
intended for merging yet. If people like the result, then maybe the
first patch can be merged in preparation, to stop using r14.

Nicholas Piggin (8):
  powerpc/64s: stop using r14 register
  powerpc/64s: poison r14 register while in kernel
  powerpc/64s: put the per-cpu data_offset in r14
  powerpc/64s: put io_sync bit into r14
  powerpc/64s: put work_pending bit into r14
  powerpc/64s: put irq_soft_mask bits into r14
  powerpc/64s: put irq_soft_mask and irq_happened bits into r14
  powerpc/64s: inline local_irq_enable/restore

 arch/powerpc/Makefile                          |   1 +
 arch/powerpc/crypto/md5-asm.S                  |  40 +++----
 arch/powerpc/crypto/sha1-powerpc-asm.S         |  10 +-
 arch/powerpc/include/asm/exception-64s.h       | 127 +++++++++++----------
 arch/powerpc/include/asm/hw_irq.h              | 130 ++++++++++-----------
 arch/powerpc/include/asm/io.h                  |  11 +-
 arch/powerpc/include/asm/irqflags.h            |  24 ++--
 arch/powerpc/include/asm/kvm_book3s_asm.h      |   2 +-
 arch/powerpc/include/asm/kvm_ppc.h             |   6 +-
 arch/powerpc/include/asm/paca.h                |  73 +++++++++++-
 arch/powerpc/include/asm/percpu.h              |   2 +-
 arch/powerpc/include/asm/ppc_asm.h             |  21 +++-
 arch/powerpc/include/asm/spinlock.h            |  21 ++--
 arch/powerpc/kernel/asm-offsets.c              |  23 +++-
 arch/powerpc/kernel/entry_32.S                 |   4 +-
 arch/powerpc/kernel/entry_64.S                 |  95 ++++++++++------
 arch/powerpc/kernel/exceptions-64s.S           |  52 ++++-----
 arch/powerpc/kernel/head_64.S                  |  25 ++--
 arch/powerpc/kernel/idle_book3s.S              |  86 +++++++-------
 arch/powerpc/kernel/irq.c                      |  92 ++++++---------
 arch/powerpc/kernel/kgdb.c                     |   8 +-
 arch/powerpc/kernel/optprobes_head.S           |   3 +-
 arch/powerpc/kernel/paca.c                     |   1 -
 arch/powerpc/kernel/process.c                  |   6 +-
 arch/powerpc/kernel/setup_64.c                 |  19 +++-
 arch/powerpc/kernel/time.c                     |  15 +--
 arch/powerpc/kernel/tm.S                       |  40 ++++---
 arch/powerpc/kernel/trace/ftrace_64_mprofile.S |  10 +-
 arch/powerpc/kvm/book3s_hv.c                   |   6 +-
 arch/powerpc/kvm/book3s_hv_interrupts.S        |   5 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S        |   3 +-
 arch/powerpc/kvm/book3s_interrupts.S           |  93 +++++++++++----
 arch/powerpc/kvm/book3s_pr.c                   |   6 +
 arch/powerpc/lib/checksum_64.S                 |  66 +++++------
 arch/powerpc/lib/copypage_power7.S             |  32 +++---
 arch/powerpc/lib/copyuser_power7.S             | 152 ++++++++++++-------------
 arch/powerpc/lib/crtsavres.S                   |   3 +
 arch/powerpc/lib/memcpy_power7.S               |  80 ++++++-------
 arch/powerpc/lib/sstep.c                       |   1 +
 arch/powerpc/xmon/xmon.c                       |   7 +-
 40 files changed, 764 insertions(+), 637 deletions(-)

-- 
2.15.0


* [RFC PATCH 1/8] powerpc/64s: stop using r14 register
  2017-12-20 14:51 [RFC PATCH 0/8] use r14 for a per-cpu kernel register Nicholas Piggin
@ 2017-12-20 14:51 ` Nicholas Piggin
  2017-12-20 14:52 ` [RFC PATCH 2/8] powerpc/64s: poison r14 register while in kernel Nicholas Piggin
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2017-12-20 14:51 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

---
 arch/powerpc/Makefile                          |   1 +
 arch/powerpc/crypto/md5-asm.S                  |  40 +++----
 arch/powerpc/crypto/sha1-powerpc-asm.S         |  10 +-
 arch/powerpc/include/asm/kvm_book3s_asm.h      |   2 +-
 arch/powerpc/include/asm/ppc_asm.h             |  21 +++-
 arch/powerpc/kernel/asm-offsets.c              |   4 +-
 arch/powerpc/kernel/entry_32.S                 |   4 +-
 arch/powerpc/kernel/entry_64.S                 |  45 ++++----
 arch/powerpc/kernel/exceptions-64s.S           |   3 +-
 arch/powerpc/kernel/head_64.S                  |   8 +-
 arch/powerpc/kernel/idle_book3s.S              |  79 +++++++------
 arch/powerpc/kernel/kgdb.c                     |   8 +-
 arch/powerpc/kernel/process.c                  |   4 +-
 arch/powerpc/kernel/tm.S                       |  40 ++++---
 arch/powerpc/kernel/trace/ftrace_64_mprofile.S |  10 +-
 arch/powerpc/kvm/book3s_hv_interrupts.S        |   5 +-
 arch/powerpc/kvm/book3s_interrupts.S           |  93 +++++++++++----
 arch/powerpc/kvm/book3s_pr.c                   |   6 +
 arch/powerpc/lib/checksum_64.S                 |  66 +++++------
 arch/powerpc/lib/copypage_power7.S             |  32 +++---
 arch/powerpc/lib/copyuser_power7.S             | 152 ++++++++++++-------------
 arch/powerpc/lib/crtsavres.S                   |   3 +
 arch/powerpc/lib/memcpy_power7.S               |  80 ++++++-------
 23 files changed, 396 insertions(+), 320 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 1381693a4a51..8dd38facc5f2 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -140,6 +140,7 @@ AFLAGS-$(CONFIG_PPC64)	+= $(call cc-option,-mabi=elfv1)
 endif
 CFLAGS-$(CONFIG_PPC64)	+= $(call cc-option,-mcmodel=medium,$(call cc-option,-mminimal-toc))
 CFLAGS-$(CONFIG_PPC64)	+= $(call cc-option,-mno-pointers-to-nested-functions)
+CFLAGS-$(CONFIG_PPC64)	+= -ffixed-r13 -ffixed-r14
 CFLAGS-$(CONFIG_PPC32)	:= -ffixed-r2 $(MULTIPLEWORD)
 
 ifeq ($(CONFIG_PPC_BOOK3S_64),y)
diff --git a/arch/powerpc/crypto/md5-asm.S b/arch/powerpc/crypto/md5-asm.S
index 10cdf5bceebb..99e41af88e19 100644
--- a/arch/powerpc/crypto/md5-asm.S
+++ b/arch/powerpc/crypto/md5-asm.S
@@ -25,31 +25,31 @@
 #define rW02	r10
 #define rW03	r11
 #define rW04	r12
-#define rW05	r14
-#define rW06	r15
-#define rW07	r16
-#define rW08	r17
-#define rW09	r18
-#define rW10	r19
-#define rW11	r20
-#define rW12	r21
-#define rW13	r22
-#define rW14	r23
-#define rW15	r24
-
-#define rT0	r25
-#define rT1	r26
+#define rW05	r15
+#define rW06	r16
+#define rW07	r17
+#define rW08	r18
+#define rW09	r19
+#define rW10	r20
+#define rW11	r21
+#define rW12	r22
+#define rW13	r23
+#define rW14	r24
+#define rW15	r25
+
+#define rT0	r26
+#define rT1	r27
 
 #define INITIALIZE \
 	PPC_STLU r1,-INT_FRAME_SIZE(r1); \
-	SAVE_8GPRS(14, r1);		/* push registers onto stack	*/ \
-	SAVE_4GPRS(22, r1);						   \
-	SAVE_GPR(26, r1)
+	SAVE_8GPRS(15, r1);		/* push registers onto stack	*/ \
+	SAVE_4GPRS(23, r1);						   \
+	SAVE_GPR(27, r1)
 
 #define FINALIZE \
-	REST_8GPRS(14, r1);		/* pop registers from stack	*/ \
-	REST_4GPRS(22, r1);						   \
-	REST_GPR(26, r1);						   \
+	REST_8GPRS(15, r1);		/* pop registers from stack	*/ \
+	REST_4GPRS(23, r1);						   \
+	REST_GPR(27, r1);						   \
 	addi	r1,r1,INT_FRAME_SIZE;
 
 #ifdef __BIG_ENDIAN__
diff --git a/arch/powerpc/crypto/sha1-powerpc-asm.S b/arch/powerpc/crypto/sha1-powerpc-asm.S
index c8951ce0dcc4..6c38de214c11 100644
--- a/arch/powerpc/crypto/sha1-powerpc-asm.S
+++ b/arch/powerpc/crypto/sha1-powerpc-asm.S
@@ -42,10 +42,10 @@
 	or	r6,r6,r0;			\
 	add	r0,RE(t),r15;			\
 	add	RT(t),RT(t),r6;		\
-	add	r14,r0,W(t);			\
+	add	r6,r0,W(t);			\
 	LWZ(W((t)+4),((t)+4)*4,r4);	\
 	rotlwi	RB(t),RB(t),30;			\
-	add	RT(t),RT(t),r14
+	add	RT(t),RT(t),r6
 
 #define STEPD0_UPDATE(t)			\
 	and	r6,RB(t),RC(t);		\
@@ -124,8 +124,7 @@
 
 _GLOBAL(powerpc_sha_transform)
 	PPC_STLU r1,-INT_FRAME_SIZE(r1)
-	SAVE_8GPRS(14, r1)
-	SAVE_10GPRS(22, r1)
+	SAVE_NVGPRS(r1)
 
 	/* Load up A - E */
 	lwz	RA(0),0(r3)	/* A */
@@ -183,7 +182,6 @@ _GLOBAL(powerpc_sha_transform)
 	stw	RD(0),12(r3)
 	stw	RE(0),16(r3)
 
-	REST_8GPRS(14, r1)
-	REST_10GPRS(22, r1)
+	REST_NVGPRS(r1)
 	addi	r1,r1,INT_FRAME_SIZE
 	blr
diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index ab386af2904f..fbbd2ca18866 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -143,7 +143,7 @@ struct kvmppc_host_state {
 
 struct kvmppc_book3s_shadow_vcpu {
 	bool in_use;
-	ulong gpr[14];
+	ulong gpr[15];
 	u32 cr;
 	ulong xer;
 	ulong ctr;
diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index ae94b3626b6c..109f4dce8a79 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -10,6 +10,16 @@
 #include <asm/ppc-opcode.h>
 #include <asm/firmware.h>
 
+#ifdef __powerpc64__
+#ifdef CONFIG_PPC_BOOK3S
+#define FIRST_NVGPR		15
+#else
+#define FIRST_NVGPR		14
+#endif
+#else
+#define FIRST_NVGPR		13
+#endif
+
 #ifdef __ASSEMBLY__
 
 #define SZL			(BITS_PER_LONG/8)
@@ -75,16 +85,21 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
 #ifdef __powerpc64__
 #define SAVE_GPR(n, base)	std	n,GPR0+8*(n)(base)
 #define REST_GPR(n, base)	ld	n,GPR0+8*(n)(base)
+#ifdef CONFIG_PPC_BOOK3S
+#define SAVE_NVGPRS(base)	SAVE_GPR(15, base); SAVE_2GPRS(16, base); SAVE_4GPRS(18, base); SAVE_10GPRS(22, base)
+#define REST_NVGPRS(base)	REST_GPR(15, base); REST_2GPRS(16, base); REST_4GPRS(18, base); REST_10GPRS(22, base)
+#else /* CONFIG_PPC_BOOK3S */
 #define SAVE_NVGPRS(base)	SAVE_8GPRS(14, base); SAVE_10GPRS(22, base)
 #define REST_NVGPRS(base)	REST_8GPRS(14, base); REST_10GPRS(22, base)
-#else
+#endif /* CONFIG_PPC_BOOK3S */
+#else /* __powerpc64__ */
 #define SAVE_GPR(n, base)	stw	n,GPR0+4*(n)(base)
 #define REST_GPR(n, base)	lwz	n,GPR0+4*(n)(base)
 #define SAVE_NVGPRS(base)	SAVE_GPR(13, base); SAVE_8GPRS(14, base); \
 				SAVE_10GPRS(22, base)
 #define REST_NVGPRS(base)	REST_GPR(13, base); REST_8GPRS(14, base); \
 				REST_10GPRS(22, base)
-#endif
+#endif /* __powerpc64__ */
 
 #define SAVE_2GPRS(n, base)	SAVE_GPR(n, base); SAVE_GPR(n+1, base)
 #define SAVE_4GPRS(n, base)	SAVE_2GPRS(n, base); SAVE_2GPRS(n+2, base)
@@ -184,7 +199,7 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
 #ifdef CONFIG_PPC64
 
 #define STACKFRAMESIZE 256
-#define __STK_REG(i)   (112 + ((i)-14)*8)
+#define __STK_REG(i)   (112 + ((i)-15)*8)
 #define STK_REG(i)     __STK_REG(__REG_##i)
 
 #ifdef PPC64_ELF_ABI_v2
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 397681f43eed..db8407483c9e 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -283,9 +283,9 @@ int main(void)
 	STACK_PT_REGS_OFFSET(GPR11, gpr[11]);
 	STACK_PT_REGS_OFFSET(GPR12, gpr[12]);
 	STACK_PT_REGS_OFFSET(GPR13, gpr[13]);
-#ifndef CONFIG_PPC64
+#ifndef CONFIG_PPC_BOOK3E_64
 	STACK_PT_REGS_OFFSET(GPR14, gpr[14]);
-#endif /* CONFIG_PPC64 */
+#endif
 	/*
 	 * Note: these symbols include _ because they overlap with special
 	 * register names
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index e780e1fbf6c2..4b3cf4dc7d05 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -458,8 +458,8 @@ ret_from_fork:
 ret_from_kernel_thread:
 	REST_NVGPRS(r1)
 	bl	schedule_tail
-	mtlr	r14
-	mr	r3,r15
+	mtlr	FIRST_NVGPR
+	mr	r3,FIRST_NVGPR+1
 	PPC440EP_ERR42
 	blrl
 	li	r3,0
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index f2ffb9aa7ff4..b9bf44635b10 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -419,7 +419,7 @@ _ASM_NOKPROBE_SYMBOL(save_nvgprs);
  * The sigsuspend and rt_sigsuspend system calls can call do_signal
  * and thus put the process into the stopped state where we might
  * want to examine its user state with ptrace.  Therefore we need
- * to save all the nonvolatile registers (r14 - r31) before calling
+ * to save all the nonvolatile registers (r15 - r31) before calling
  * the C code.  Similarly, fork, vfork and clone need the full
  * register state on the stack so that it can be copied to the child.
  */
@@ -463,10 +463,10 @@ _GLOBAL(ret_from_fork)
 _GLOBAL(ret_from_kernel_thread)
 	bl	schedule_tail
 	REST_NVGPRS(r1)
-	mtlr	r14
-	mr	r3,r15
+	mtlr	FIRST_NVGPR
+	mr	r3,FIRST_NVGPR+1
 #ifdef PPC64_ELF_ABI_v2
-	mr	r12,r14
+	mr	r12,FIRST_NVGPR
 #endif
 	blrl
 	li	r3,0
@@ -495,9 +495,7 @@ _GLOBAL(_switch)
 	mflr	r0
 	std	r0,16(r1)
 	stdu	r1,-SWITCH_FRAME_SIZE(r1)
-	/* r3-r13 are caller saved -- Cort */
-	SAVE_8GPRS(14, r1)
-	SAVE_10GPRS(22, r1)
+	SAVE_NVGPRS(r1)
 	std	r0,_NIP(r1)	/* Return to switch caller */
 	mfcr	r23
 	std	r23,_CCR(r1)
@@ -609,9 +607,8 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
 	ld	r6,_CCR(r1)
 	mtcrf	0xFF,r6
 
-	/* r3-r13 are destroyed -- Cort */
-	REST_8GPRS(14, r1)
-	REST_10GPRS(22, r1)
+	/* Volatile regs are destroyed */
+	REST_NVGPRS(r1)
 
 	/* convert old thread to its task_struct for return value */
 	addi	r3,r3,-THREAD
@@ -1011,12 +1008,14 @@ _GLOBAL(enter_rtas)
 
 	/* Because RTAS is running in 32b mode, it clobbers the high order half
 	 * of all registers that it saves.  We therefore save those registers
-	 * RTAS might touch to the stack.  (r0, r3-r13 are caller saved)
+	 * RTAS might touch to the stack.  (r0, r3-r12 are caller saved)
    	 */
 	SAVE_GPR(2, r1)			/* Save the TOC */
 	SAVE_GPR(13, r1)		/* Save paca */
-	SAVE_8GPRS(14, r1)		/* Save the non-volatiles */
-	SAVE_10GPRS(22, r1)		/* ditto */
+#ifdef CONFIG_PPC_BOOK3S
+	SAVE_GPR(14, r1)		/* Save r14 */
+#endif
+	SAVE_NVGPRS(r1)			/* Save the non-volatiles */
 
 	mfcr	r4
 	std	r4,_CCR(r1)
@@ -1118,8 +1117,10 @@ rtas_restore_regs:
 	/* relocation is on at this point */
 	REST_GPR(2, r1)			/* Restore the TOC */
 	REST_GPR(13, r1)		/* Restore paca */
-	REST_8GPRS(14, r1)		/* Restore the non-volatiles */
-	REST_10GPRS(22, r1)		/* ditto */
+#ifdef CONFIG_PPC_BOOK3S
+	REST_GPR(14, r1)		/* Restore r14 */
+#endif
+	REST_NVGPRS(r1)			/* Restore the non-volatiles */
 
 	GET_PACA(r13)
 
@@ -1149,12 +1150,14 @@ _GLOBAL(enter_prom)
 
 	/* Because PROM is running in 32b mode, it clobbers the high order half
 	 * of all registers that it saves.  We therefore save those registers
-	 * PROM might touch to the stack.  (r0, r3-r13 are caller saved)
+	 * PROM might touch to the stack.  (r0, r3-r14 are caller saved)
    	 */
 	SAVE_GPR(2, r1)
 	SAVE_GPR(13, r1)
-	SAVE_8GPRS(14, r1)
-	SAVE_10GPRS(22, r1)
+#ifdef CONFIG_PPC_BOOK3S
+	SAVE_GPR(14, r1)
+#endif
+	SAVE_NVGPRS(r1)
 	mfcr	r10
 	mfmsr	r11
 	std	r10,_CCR(r1)
@@ -1198,8 +1201,10 @@ _GLOBAL(enter_prom)
 	/* Restore other registers */
 	REST_GPR(2, r1)
 	REST_GPR(13, r1)
-	REST_8GPRS(14, r1)
-	REST_10GPRS(22, r1)
+#ifdef CONFIG_PPC_BOOK3S
+	REST_GPR(14, r1)
+#endif
+	REST_NVGPRS(r1)
 	ld	r4,_CCR(r1)
 	mtcr	r4
 	
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 60d9c68ef414..189c456450e2 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1656,8 +1656,7 @@ BEGIN_FTR_SECTION
 	ld	r10,EX_CFAR(r3)
 	std	r10,ORIG_GPR3(r1)
 END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
-	SAVE_8GPRS(14,r1)
-	SAVE_10GPRS(22,r1)
+	SAVE_NVGPRS(r1)
 	lhz	r12,PACA_TRAP_SAVE(r13)
 	std	r12,_TRAP(r1)
 	addi	r11,r1,INT_FRAME_SIZE
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 56f9e112b98d..87b2c2264118 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -802,9 +802,9 @@ __secondary_start:
 	/* Initialize the kernel stack */
 	LOAD_REG_ADDR(r3, current_set)
 	sldi	r28,r24,3		/* get current_set[cpu#]	 */
-	ldx	r14,r3,r28
-	addi	r14,r14,THREAD_SIZE-STACK_FRAME_OVERHEAD
-	std	r14,PACAKSAVE(r13)
+	ldx	r15,r3,r28
+	addi	r15,r15,THREAD_SIZE-STACK_FRAME_OVERHEAD
+	std	r15,PACAKSAVE(r13)
 
 	/* Do early setup for that CPU (SLB and hash table pointer) */
 	bl	early_setup_secondary
@@ -813,7 +813,7 @@ __secondary_start:
 	 * setup the new stack pointer, but *don't* use this until
 	 * translation is on.
 	 */
-	mr	r1, r14
+	mr	r1, r15
 
 	/* Clear backchain so we get nice backtraces */
 	li	r7,0
diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 01e1c1997893..5065c4cb5f12 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -150,19 +150,19 @@ power9_restore_additional_sprs:
 /*
  * Used by threads when the lock bit of core_idle_state is set.
  * Threads will spin in HMT_LOW until the lock bit is cleared.
- * r14 - pointer to core_idle_state
- * r15 - used to load contents of core_idle_state
+ * r15 - pointer to core_idle_state
+ * r16 - used to load contents of core_idle_state
  * r9  - used as a temporary variable
  */
 
 core_idle_lock_held:
 	HMT_LOW
-3:	lwz	r15,0(r14)
-	andis.	r15,r15,PNV_CORE_IDLE_LOCK_BIT@h
+3:	lwz	r16,0(r15)
+	andis.	r16,r16,PNV_CORE_IDLE_LOCK_BIT@h
 	bne	3b
 	HMT_MEDIUM
-	lwarx	r15,0,r14
-	andis.	r9,r15,PNV_CORE_IDLE_LOCK_BIT@h
+	lwarx	r16,0,r15
+	andis.	r9,r16,PNV_CORE_IDLE_LOCK_BIT@h
 	bne-	core_idle_lock_held
 	blr
 
@@ -263,21 +263,21 @@ pnv_enter_arch207_idle_mode:
 2:
 	/* Sleep or winkle */
 	lbz	r7,PACA_THREAD_MASK(r13)
-	ld	r14,PACA_CORE_IDLE_STATE_PTR(r13)
+	ld	r15,PACA_CORE_IDLE_STATE_PTR(r13)
 	li	r5,0
 	beq	cr3,3f
 	lis	r5,PNV_CORE_IDLE_WINKLE_COUNT@h
 3:
 lwarx_loop1:
-	lwarx	r15,0,r14
+	lwarx	r16,0,r15
 
-	andis.	r9,r15,PNV_CORE_IDLE_LOCK_BIT@h
+	andis.	r9,r16,PNV_CORE_IDLE_LOCK_BIT@h
 	bnel-	core_idle_lock_held
 
-	add	r15,r15,r5			/* Add if winkle */
-	andc	r15,r15,r7			/* Clear thread bit */
+	add	r16,r16,r5			/* Add if winkle */
+	andc	r16,r16,r7			/* Clear thread bit */
 
-	andi.	r9,r15,PNV_CORE_IDLE_THREAD_BITS
+	andi.	r9,r16,PNV_CORE_IDLE_THREAD_BITS
 
 /*
  * If cr0 = 0, then current thread is the last thread of the core entering
@@ -291,7 +291,7 @@ lwarx_loop1:
 pnv_fastsleep_workaround_at_entry:
 	beq	fastsleep_workaround_at_entry
 
-	stwcx.	r15,0,r14
+	stwcx.	r16,0,r15
 	bne-	lwarx_loop1
 	isync
 
@@ -300,8 +300,8 @@ common_enter: /* common code for all the threads entering sleep or winkle */
 	IDLE_STATE_ENTER_SEQ_NORET(PPC_SLEEP)
 
 fastsleep_workaround_at_entry:
-	oris	r15,r15,PNV_CORE_IDLE_LOCK_BIT@h
-	stwcx.	r15,0,r14
+	oris	r16,r16,PNV_CORE_IDLE_LOCK_BIT@h
+	stwcx.	r16,0,r15
 	bne-	lwarx_loop1
 	isync
 
@@ -311,9 +311,9 @@ fastsleep_workaround_at_entry:
 	bl	opal_config_cpu_idle_state
 
 	/* Unlock */
-	xoris	r15,r15,PNV_CORE_IDLE_LOCK_BIT@h
+	xoris	r16,r16,PNV_CORE_IDLE_LOCK_BIT@h
 	lwsync
-	stw	r15,0(r14)
+	stw	r16,0(r15)
 	b	common_enter
 
 enter_winkle:
@@ -381,15 +381,15 @@ END_FTR_SECTION_IFCLR(CPU_FTR_POWER9_DD2_1)
  * stack and enter stop
  */
 	lbz     r7,PACA_THREAD_MASK(r13)
-	ld      r14,PACA_CORE_IDLE_STATE_PTR(r13)
+	ld      r15,PACA_CORE_IDLE_STATE_PTR(r13)
 
 lwarx_loop_stop:
-	lwarx   r15,0,r14
-	andis.	r9,r15,PNV_CORE_IDLE_LOCK_BIT@h
+	lwarx   r16,0,r15
+	andis.	r9,r16,PNV_CORE_IDLE_LOCK_BIT@h
 	bnel-	core_idle_lock_held
-	andc    r15,r15,r7                      /* Clear thread bit */
+	andc    r16,r16,r7                      /* Clear thread bit */
 
-	stwcx.  r15,0,r14
+	stwcx.  r16,0,r15
 	bne-    lwarx_loop_stop
 	isync
 
@@ -643,8 +643,7 @@ pnv_wakeup_tb_loss:
 	stb	r0,PACA_NAPSTATELOST(r13)
 
 	/*
-	 *
-	 * Save SRR1 and LR in NVGPRs as they might be clobbered in
+	 * Save SRR1 (r12) and LR in NVGPRs as they might be clobbered in
 	 * opal_call() (called in CHECK_HMI_INTERRUPT). SRR1 is required
 	 * to determine the wakeup reason if we branch to kvm_start_guest. LR
 	 * is required to return back to reset vector after hypervisor state
@@ -657,7 +656,7 @@ BEGIN_FTR_SECTION
 	CHECK_HMI_INTERRUPT
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
 
-	ld	r14,PACA_CORE_IDLE_STATE_PTR(r13)
+	ld	r15,PACA_CORE_IDLE_STATE_PTR(r13)
 	lbz	r7,PACA_THREAD_MASK(r13)
 
 	/*
@@ -671,15 +670,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
 	 * In either case loop until the lock bit is cleared.
 	 */
 1:
-	lwarx	r15,0,r14
-	andis.	r9,r15,PNV_CORE_IDLE_LOCK_BIT@h
+	lwarx	r16,0,r15
+	andis.	r9,r16,PNV_CORE_IDLE_LOCK_BIT@h
 	bnel-	core_idle_lock_held
-	oris	r15,r15,PNV_CORE_IDLE_LOCK_BIT@h
-	stwcx.	r15,0,r14
+	oris	r16,r16,PNV_CORE_IDLE_LOCK_BIT@h
+	stwcx.	r16,0,r15
 	bne-	1b
 	isync
 
-	andi.	r9,r15,PNV_CORE_IDLE_THREAD_BITS
+	andi.	r9,r16,PNV_CORE_IDLE_THREAD_BITS
 	cmpwi	cr2,r9,0
 
 	/*
@@ -745,27 +744,27 @@ BEGIN_FTR_SECTION
 	 */
 	cmpwi	r18,PNV_THREAD_WINKLE
 	bne	2f
-	andis.	r9,r15,PNV_CORE_IDLE_WINKLE_COUNT_ALL_BIT@h
-	subis	r15,r15,PNV_CORE_IDLE_WINKLE_COUNT@h
+	andis.	r9,r16,PNV_CORE_IDLE_WINKLE_COUNT_ALL_BIT@h
+	subis	r16,r16,PNV_CORE_IDLE_WINKLE_COUNT@h
 	beq	2f
-	ori	r15,r15,PNV_CORE_IDLE_THREAD_WINKLE_BITS /* all were winkle */
+	ori	r16,r16,PNV_CORE_IDLE_THREAD_WINKLE_BITS /* all were winkle */
 2:
 	/* Shift thread bit to winkle mask, then test if this thread is set,
 	 * and remove it from the winkle bits */
 	slwi	r8,r7,8
-	and	r8,r8,r15
-	andc	r15,r15,r8
+	and	r8,r8,r16
+	andc	r16,r16,r8
 	cmpwi	cr4,r8,1 /* cr4 will be gt if our bit is set, lt if not */
 
 	lbz	r4,PACA_SUBCORE_SIBLING_MASK(r13)
-	and	r4,r4,r15
+	and	r4,r4,r16
 	cmpwi	r4,0	/* Check if first in subcore */
 
-	or	r15,r15,r7		/* Set thread bit */
+	or	r16,r16,r7		/* Set thread bit */
 	beq	first_thread_in_subcore
 END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
 
-	or	r15,r15,r7		/* Set thread bit */
+	or	r16,r16,r7		/* Set thread bit */
 	beq	cr2,first_thread_in_core
 
 	/* Not first thread in core or subcore to wake up */
@@ -842,9 +841,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 	mtspr	SPRN_WORC,r4
 
 clear_lock:
-	xoris	r15,r15,PNV_CORE_IDLE_LOCK_BIT@h
+	xoris	r16,r16,PNV_CORE_IDLE_LOCK_BIT@h
 	lwsync
-	stw	r15,0(r14)
+	stw	r16,0(r15)
 
 common_exit:
 	/*
diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
index 35e240a0a408..d593e381f34d 100644
--- a/arch/powerpc/kernel/kgdb.c
+++ b/arch/powerpc/kernel/kgdb.c
@@ -230,11 +230,11 @@ void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *p)
 	for (reg = 0; reg < 3; reg++)
 		PACK64(ptr, regs->gpr[reg]);
 
-	/* Regs GPR3-13 are caller saved, not in regs->gpr[] */
-	ptr += 11;
+	/* Regs GPR3-13/14 are caller saved, not in regs->gpr[] */
+	ptr += FIRST_NVGPR - 2;
 
-	/* Regs GPR14-31 */
-	for (reg = 14; reg < 32; reg++)
+	/* Regs GPR14/15-31 */
+	for (reg = FIRST_NVGPR; reg < 32; reg++)
 		PACK64(ptr, regs->gpr[reg]);
 
 #ifdef CONFIG_FSL_BOOKE
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 2a63cc78257d..f56e42f06f24 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1672,12 +1672,12 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
 		childregs->gpr[1] = sp + sizeof(struct pt_regs);
 		/* function */
 		if (usp)
-			childregs->gpr[14] = ppc_function_entry((void *)usp);
+			childregs->gpr[FIRST_NVGPR] = ppc_function_entry((void *)usp);
 #ifdef CONFIG_PPC64
 		clear_tsk_thread_flag(p, TIF_32BIT);
 		childregs->softe = IRQ_SOFT_MASK_NONE;
 #endif
-		childregs->gpr[15] = kthread_arg;
+		childregs->gpr[FIRST_NVGPR + 1] = kthread_arg;
 		p->thread.regs = NULL;	/* no user register state */
 		ti->flags |= _TIF_RESTOREALL;
 		f = ret_from_kernel_thread;
diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S
index b92ac8e711db..3f62b8157aea 100644
--- a/arch/powerpc/kernel/tm.S
+++ b/arch/powerpc/kernel/tm.S
@@ -106,27 +106,28 @@ _GLOBAL(tm_reclaim)
 	/* We've a struct pt_regs at [r1+STACK_FRAME_OVERHEAD]. */
 
 	std	r3, STK_PARAM(R3)(r1)
+	SAVE_GPR(14, r1)
 	SAVE_NVGPRS(r1)
 
 	/* We need to setup MSR for VSX register save instructions. */
-	mfmsr	r14
-	mr	r15, r14
-	ori	r15, r15, MSR_FP
-	li	r16, 0
-	ori	r16, r16, MSR_EE /* IRQs hard off */
-	andc	r15, r15, r16
-	oris	r15, r15, MSR_VEC@h
+	mfmsr	r15
+	mr	r16, r15
+	ori	r16, r16, MSR_FP
+	li	r17, 0
+	ori	r17, r17, MSR_EE /* IRQs hard off */
+	andc	r16, r16, r17
+	oris	r16, r16, MSR_VEC@h
 #ifdef CONFIG_VSX
 	BEGIN_FTR_SECTION
-	oris	r15,r15, MSR_VSX@h
+	oris	r16,r16, MSR_VSX@h
 	END_FTR_SECTION_IFSET(CPU_FTR_VSX)
 #endif
-	mtmsrd	r15
-	std	r14, TM_FRAME_L0(r1)
+	mtmsrd	r16
+	std	r15, TM_FRAME_L0(r1)
 
 	/* Do sanity check on MSR to make sure we are suspended */
 	li	r7, (MSR_TS_S)@higher
-	srdi	r6, r14, 32
+	srdi	r6, r15, 32
 	and	r6, r6, r7
 1:	tdeqi   r6, 0
 	EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,0
@@ -205,7 +206,8 @@ _GLOBAL(tm_reclaim)
 	std	r6, GPR12(r7)
 	std	r8, GPR13(r7)
 
-	SAVE_NVGPRS(r7)				/* user r14-r31 */
+	SAVE_GPR(14, r7)			/* user r14 */
+	SAVE_NVGPRS(r7)				/* user r15-r31 */
 
 	/* ******************** NIP ******************** */
 	mfspr	r3, SPRN_TFHAR
@@ -278,12 +280,13 @@ _GLOBAL(tm_reclaim)
 	/* AMR is checkpointed too, but is unsupported by Linux. */
 
 	/* Restore original MSR/IRQ state & clear TM mode */
-	ld	r14, TM_FRAME_L0(r1)		/* Orig MSR */
+	ld	r15, TM_FRAME_L0(r1)		/* Orig MSR */
 
-	li	r15, 0
-	rldimi  r14, r15, MSR_TS_LG, (63-MSR_TS_LG)-1
-	mtmsrd  r14
+	li	r16, 0
+	rldimi  r15, r16, MSR_TS_LG, (63-MSR_TS_LG)-1
+	mtmsrd  r15
 
+	REST_GPR(14, r1)
 	REST_NVGPRS(r1)
 
 	addi    r1, r1, TM_FRAME_SIZE
@@ -319,6 +322,7 @@ _GLOBAL(__tm_recheckpoint)
 	/* We've a struct pt_regs at [r1+STACK_FRAME_OVERHEAD].
 	 * This is used for backing up the NVGPRs:
 	 */
+	SAVE_GPR(14, r1)
 	SAVE_NVGPRS(r1)
 
 	/* Load complete register state from ts_ckpt* registers */
@@ -391,8 +395,9 @@ restore_gprs:
 	REST_GPR(4, r7)				/* GPR4 */
 	REST_4GPRS(8, r7)			/* GPR8-11 */
 	REST_2GPRS(12, r7)			/* GPR12-13 */
+	REST_GPR(14, r7)			/* GPR14 */
 
-	REST_NVGPRS(r7)				/* GPR14-31 */
+	REST_NVGPRS(r7)				/* GPR15-31 */
 
 	/* Load up PPR and DSCR here so we don't run with user values for long
 	 */
@@ -471,6 +476,7 @@ restore_gprs:
 	li	r4, MSR_RI
 	mtmsrd	r4, 1
 
+	REST_GPR(14, r1)
 	REST_NVGPRS(r1)
 
 	addi    r1, r1, TM_FRAME_SIZE
diff --git a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
index 3f3e81852422..a0f62ff3442f 100644
--- a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
+++ b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
@@ -76,7 +76,7 @@ _GLOBAL(ftrace_caller)
 	ld	r5,0(r3)
 
 #ifdef CONFIG_LIVEPATCH
-	mr	r14,r7		/* remember old NIP */
+	mr	r15,r7		/* remember old NIP */
 #endif
 	/* Calculate ip from nip-4 into r3 for call below */
 	subi    r3, r7, MCOUNT_INSN_SIZE
@@ -100,10 +100,10 @@ ftrace_call:
 	nop
 
 	/* Load the possibly modified NIP */
-	ld	r15, _NIP(r1)
+	ld	r16, _NIP(r1)
 
 #ifdef CONFIG_LIVEPATCH
-	cmpd	r14, r15	/* has NIP been altered? */
+	cmpd	r15, r16	/* has NIP been altered? */
 #endif
 
 #if defined(CONFIG_LIVEPATCH) && defined(CONFIG_KPROBES_ON_FTRACE)
@@ -111,7 +111,7 @@ ftrace_call:
 	beq	1f
 
 	/* Check if there is an active jprobe on us */
-	subi	r3, r14, 4
+	subi	r3, r15, 4
 	bl	__is_active_jprobe
 	nop
 
@@ -130,7 +130,7 @@ ftrace_call:
 #endif
 
 	/* Load CTR with the possibly modified NIP */
-	mtctr	r15
+	mtctr	r16
 
 	/* Restore gprs */
 	REST_GPR(0,r1)
diff --git a/arch/powerpc/kvm/book3s_hv_interrupts.S b/arch/powerpc/kvm/book3s_hv_interrupts.S
index dc54373c8780..ef1c0c1e57ff 100644
--- a/arch/powerpc/kvm/book3s_hv_interrupts.S
+++ b/arch/powerpc/kvm/book3s_hv_interrupts.S
@@ -46,7 +46,7 @@ _GLOBAL(__kvmppc_vcore_entry)
 	/* Save host state to the stack */
 	stdu	r1, -SWITCH_FRAME_SIZE(r1)
 
-	/* Save non-volatile registers (r14 - r31) and CR */
+	/* Save non-volatile registers (r15 - r31) and CR */
 	SAVE_NVGPRS(r1)
 	mfcr	r3
 	std	r3, _CCR(r1)
@@ -149,9 +149,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
 	 * R3       = trap number on this thread
 	 * R12      = exit handler id
 	 * R13      = PACA
+	 * R14      = ? XXX
 	 */
 
-	/* Restore non-volatile host registers (r14 - r31) and CR */
+	/* Restore non-volatile host registers (r15 - r31) and CR */
 	REST_NVGPRS(r1)
 	ld	r4, _CCR(r1)
 	mtcr	r4
diff --git a/arch/powerpc/kvm/book3s_interrupts.S b/arch/powerpc/kvm/book3s_interrupts.S
index 901e6fe00c39..b275a544129e 100644
--- a/arch/powerpc/kvm/book3s_interrupts.S
+++ b/arch/powerpc/kvm/book3s_interrupts.S
@@ -38,6 +38,45 @@
 
 #endif /* CONFIG_PPC_BOOK3S_XX */
 
+#if defined(CONFIG_PPC_BOOK3S_64)
+#define VCPU_LOAD_NVGPRS(vcpu) \
+	PPC_LL	r15, VCPU_GPR(R15)(vcpu); \
+	PPC_LL	r16, VCPU_GPR(R16)(vcpu); \
+	PPC_LL	r17, VCPU_GPR(R17)(vcpu); \
+	PPC_LL	r18, VCPU_GPR(R18)(vcpu); \
+	PPC_LL	r19, VCPU_GPR(R19)(vcpu); \
+	PPC_LL	r20, VCPU_GPR(R20)(vcpu); \
+	PPC_LL	r21, VCPU_GPR(R21)(vcpu); \
+	PPC_LL	r22, VCPU_GPR(R22)(vcpu); \
+	PPC_LL	r23, VCPU_GPR(R23)(vcpu); \
+	PPC_LL	r24, VCPU_GPR(R24)(vcpu); \
+	PPC_LL	r25, VCPU_GPR(R25)(vcpu); \
+	PPC_LL	r26, VCPU_GPR(R26)(vcpu); \
+	PPC_LL	r27, VCPU_GPR(R27)(vcpu); \
+	PPC_LL	r28, VCPU_GPR(R28)(vcpu); \
+	PPC_LL	r29, VCPU_GPR(R29)(vcpu); \
+	PPC_LL	r30, VCPU_GPR(R30)(vcpu); \
+	PPC_LL	r31, VCPU_GPR(R31)(vcpu);
+
+#define VCPU_STORE_NVGPRS(vcpu) \
+	PPC_STL	r15, VCPU_GPR(R15)(vcpu); \
+	PPC_STL	r16, VCPU_GPR(R16)(vcpu); \
+	PPC_STL	r17, VCPU_GPR(R17)(vcpu); \
+	PPC_STL	r18, VCPU_GPR(R18)(vcpu); \
+	PPC_STL	r19, VCPU_GPR(R19)(vcpu); \
+	PPC_STL	r20, VCPU_GPR(R20)(vcpu); \
+	PPC_STL	r21, VCPU_GPR(R21)(vcpu); \
+	PPC_STL	r22, VCPU_GPR(R22)(vcpu); \
+	PPC_STL	r23, VCPU_GPR(R23)(vcpu); \
+	PPC_STL	r24, VCPU_GPR(R24)(vcpu); \
+	PPC_STL	r25, VCPU_GPR(R25)(vcpu); \
+	PPC_STL	r26, VCPU_GPR(R26)(vcpu); \
+	PPC_STL	r27, VCPU_GPR(R27)(vcpu); \
+	PPC_STL	r28, VCPU_GPR(R28)(vcpu); \
+	PPC_STL	r29, VCPU_GPR(R29)(vcpu); \
+	PPC_STL	r30, VCPU_GPR(R30)(vcpu); \
+	PPC_STL	r31, VCPU_GPR(R31)(vcpu);
+#else
 #define VCPU_LOAD_NVGPRS(vcpu) \
 	PPC_LL	r14, VCPU_GPR(R14)(vcpu); \
 	PPC_LL	r15, VCPU_GPR(R15)(vcpu); \
@@ -56,7 +95,28 @@
 	PPC_LL	r28, VCPU_GPR(R28)(vcpu); \
 	PPC_LL	r29, VCPU_GPR(R29)(vcpu); \
 	PPC_LL	r30, VCPU_GPR(R30)(vcpu); \
-	PPC_LL	r31, VCPU_GPR(R31)(vcpu); \
+	PPC_LL	r31, VCPU_GPR(R31)(vcpu);
+
+#define VCPU_STORE_NVGPRS(vcpu) \
+	PPC_STL	r14, VCPU_GPR(R14)(vcpu); \
+	PPC_STL	r15, VCPU_GPR(R15)(vcpu); \
+	PPC_STL	r16, VCPU_GPR(R16)(vcpu); \
+	PPC_STL	r17, VCPU_GPR(R17)(vcpu); \
+	PPC_STL	r18, VCPU_GPR(R18)(vcpu); \
+	PPC_STL	r19, VCPU_GPR(R19)(vcpu); \
+	PPC_STL	r20, VCPU_GPR(R20)(vcpu); \
+	PPC_STL	r21, VCPU_GPR(R21)(vcpu); \
+	PPC_STL	r22, VCPU_GPR(R22)(vcpu); \
+	PPC_STL	r23, VCPU_GPR(R23)(vcpu); \
+	PPC_STL	r24, VCPU_GPR(R24)(vcpu); \
+	PPC_STL	r25, VCPU_GPR(R25)(vcpu); \
+	PPC_STL	r26, VCPU_GPR(R26)(vcpu); \
+	PPC_STL	r27, VCPU_GPR(R27)(vcpu); \
+	PPC_STL	r28, VCPU_GPR(R28)(vcpu); \
+	PPC_STL	r29, VCPU_GPR(R29)(vcpu); \
+	PPC_STL	r30, VCPU_GPR(R30)(vcpu); \
+	PPC_STL	r31, VCPU_GPR(R31)(vcpu);
+#endif
 
 /*****************************************************************************
  *                                                                           *
@@ -81,12 +141,12 @@ kvm_start_entry:
 	/* Save r3 (kvm_run) and r4 (vcpu) */
 	SAVE_2GPRS(3, r1)
 
-	/* Save non-volatile registers (r14 - r31) */
+	/* Save non-volatile registers (r14/r15 - r31) */
 	SAVE_NVGPRS(r1)
 
 	/* Save CR */
-	mfcr	r14
-	stw	r14, _CCR(r1)
+	mfcr	r15
+	stw	r15, _CCR(r1)
 
 	/* Save LR */
 	PPC_STL	r0, _LINK(r1)
@@ -183,24 +243,7 @@ after_sprg3_load:
 	/* R7 = vcpu */
 	PPC_LL	r7, GPR4(r1)
 
-	PPC_STL	r14, VCPU_GPR(R14)(r7)
-	PPC_STL	r15, VCPU_GPR(R15)(r7)
-	PPC_STL	r16, VCPU_GPR(R16)(r7)
-	PPC_STL	r17, VCPU_GPR(R17)(r7)
-	PPC_STL	r18, VCPU_GPR(R18)(r7)
-	PPC_STL	r19, VCPU_GPR(R19)(r7)
-	PPC_STL	r20, VCPU_GPR(R20)(r7)
-	PPC_STL	r21, VCPU_GPR(R21)(r7)
-	PPC_STL	r22, VCPU_GPR(R22)(r7)
-	PPC_STL	r23, VCPU_GPR(R23)(r7)
-	PPC_STL	r24, VCPU_GPR(R24)(r7)
-	PPC_STL	r25, VCPU_GPR(R25)(r7)
-	PPC_STL	r26, VCPU_GPR(R26)(r7)
-	PPC_STL	r27, VCPU_GPR(R27)(r7)
-	PPC_STL	r28, VCPU_GPR(R28)(r7)
-	PPC_STL	r29, VCPU_GPR(R29)(r7)
-	PPC_STL	r30, VCPU_GPR(R30)(r7)
-	PPC_STL	r31, VCPU_GPR(R31)(r7)
+	VCPU_STORE_NVGPRS(r7)
 
 	/* Pass the exit number as 3rd argument to kvmppc_handle_exit */
 	lwz	r5, VCPU_TRAP(r7)
@@ -221,10 +264,10 @@ kvm_exit_loop:
 	PPC_LL	r4, _LINK(r1)
 	mtlr	r4
 
-	lwz	r14, _CCR(r1)
-	mtcr	r14
+	lwz	r15, _CCR(r1)
+	mtcr	r15
 
-	/* Restore non-volatile host registers (r14 - r31) */
+	/* Restore non-volatile host registers (r14/r15 - r31) */
 	REST_NVGPRS(r1)
 
 	addi    r1, r1, SWITCH_FRAME_SIZE
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index d0dc8624198f..b149d4fa4a45 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -159,6 +159,9 @@ void kvmppc_copy_to_svcpu(struct kvmppc_book3s_shadow_vcpu *svcpu,
 	svcpu->gpr[11] = vcpu->arch.gpr[11];
 	svcpu->gpr[12] = vcpu->arch.gpr[12];
 	svcpu->gpr[13] = vcpu->arch.gpr[13];
+#ifdef CONFIG_PPC_BOOK3S_64
+	svcpu->gpr[14] = vcpu->arch.gpr[14];
+#endif
 	svcpu->cr  = vcpu->arch.cr;
 	svcpu->xer = vcpu->arch.xer;
 	svcpu->ctr = vcpu->arch.ctr;
@@ -209,6 +212,9 @@ void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu,
 	vcpu->arch.gpr[11] = svcpu->gpr[11];
 	vcpu->arch.gpr[12] = svcpu->gpr[12];
 	vcpu->arch.gpr[13] = svcpu->gpr[13];
+#ifdef CONFIG_PPC_BOOK3S_64
+	vcpu->arch.gpr[14] = svcpu->gpr[14];
+#endif
 	vcpu->arch.cr  = svcpu->cr;
 	vcpu->arch.xer = svcpu->xer;
 	vcpu->arch.ctr = svcpu->ctr;
diff --git a/arch/powerpc/lib/checksum_64.S b/arch/powerpc/lib/checksum_64.S
index d7f1a966136e..19b0fd549b24 100644
--- a/arch/powerpc/lib/checksum_64.S
+++ b/arch/powerpc/lib/checksum_64.S
@@ -64,9 +64,9 @@ _GLOBAL(__csum_partial)
 	mtctr	r6
 
 	stdu	r1,-STACKFRAMESIZE(r1)
-	std	r14,STK_REG(R14)(r1)
 	std	r15,STK_REG(R15)(r1)
 	std	r16,STK_REG(R16)(r1)
+	std	r17,STK_REG(R17)(r1)
 
 	ld	r6,0(r3)
 	ld	r9,8(r3)
@@ -84,11 +84,11 @@ _GLOBAL(__csum_partial)
 2:
 	adde	r0,r0,r6
 	ld	r12,32(r3)
-	ld	r14,40(r3)
+	ld	r15,40(r3)
 
 	adde	r0,r0,r9
-	ld	r15,48(r3)
-	ld	r16,56(r3)
+	ld	r16,48(r3)
+	ld	r17,56(r3)
 	addi	r3,r3,64
 
 	adde	r0,r0,r10
@@ -97,13 +97,13 @@ _GLOBAL(__csum_partial)
 
 	adde	r0,r0,r12
 
-	adde	r0,r0,r14
-
 	adde	r0,r0,r15
+
+	adde	r0,r0,r16
 	ld	r6,0(r3)
 	ld	r9,8(r3)
 
-	adde	r0,r0,r16
+	adde	r0,r0,r17
 	ld	r10,16(r3)
 	ld	r11,24(r3)
 	bdnz	2b
@@ -111,23 +111,23 @@ _GLOBAL(__csum_partial)
 
 	adde	r0,r0,r6
 	ld	r12,32(r3)
-	ld	r14,40(r3)
+	ld	r15,40(r3)
 
 	adde	r0,r0,r9
-	ld	r15,48(r3)
-	ld	r16,56(r3)
+	ld	r16,48(r3)
+	ld	r17,56(r3)
 	addi	r3,r3,64
 
 	adde	r0,r0,r10
 	adde	r0,r0,r11
 	adde	r0,r0,r12
-	adde	r0,r0,r14
 	adde	r0,r0,r15
 	adde	r0,r0,r16
+	adde	r0,r0,r17
 
-	ld	r14,STK_REG(R14)(r1)
 	ld	r15,STK_REG(R15)(r1)
 	ld	r16,STK_REG(R16)(r1)
+	ld	r17,STK_REG(R17)(r1)
 	addi	r1,r1,STACKFRAMESIZE
 
 	andi.	r4,r4,63
@@ -258,9 +258,9 @@ dstnr;	sth	r6,0(r4)
 	mtctr	r6
 
 	stdu	r1,-STACKFRAMESIZE(r1)
-	std	r14,STK_REG(R14)(r1)
 	std	r15,STK_REG(R15)(r1)
 	std	r16,STK_REG(R16)(r1)
+	std	r17,STK_REG(R17)(r1)
 
 source;	ld	r6,0(r3)
 source;	ld	r9,8(r3)
@@ -278,11 +278,11 @@ source;	ld	r11,24(r3)
 2:
 	adde	r0,r0,r6
 source;	ld	r12,32(r3)
-source;	ld	r14,40(r3)
+source;	ld	r15,40(r3)
 
 	adde	r0,r0,r9
-source;	ld	r15,48(r3)
-source;	ld	r16,56(r3)
+source;	ld	r16,48(r3)
+source;	ld	r17,56(r3)
 	addi	r3,r3,64
 
 	adde	r0,r0,r10
@@ -295,18 +295,18 @@ dest;	std	r11,24(r4)
 
 	adde	r0,r0,r12
 dest;	std	r12,32(r4)
-dest;	std	r14,40(r4)
+dest;	std	r15,40(r4)
 
-	adde	r0,r0,r14
-dest;	std	r15,48(r4)
-dest;	std	r16,56(r4)
+	adde	r0,r0,r15
+dest;	std	r16,48(r4)
+dest;	std	r17,56(r4)
 	addi	r4,r4,64
 
-	adde	r0,r0,r15
+	adde	r0,r0,r16
 source;	ld	r6,0(r3)
 source;	ld	r9,8(r3)
 
-	adde	r0,r0,r16
+	adde	r0,r0,r17
 source;	ld	r10,16(r3)
 source;	ld	r11,24(r3)
 	bdnz	2b
@@ -314,11 +314,11 @@ source;	ld	r11,24(r3)
 
 	adde	r0,r0,r6
 source;	ld	r12,32(r3)
-source;	ld	r14,40(r3)
+source;	ld	r15,40(r3)
 
 	adde	r0,r0,r9
-source;	ld	r15,48(r3)
-source;	ld	r16,56(r3)
+source;	ld	r16,48(r3)
+source;	ld	r17,56(r3)
 	addi	r3,r3,64
 
 	adde	r0,r0,r10
@@ -331,19 +331,19 @@ dest;	std	r11,24(r4)
 
 	adde	r0,r0,r12
 dest;	std	r12,32(r4)
-dest;	std	r14,40(r4)
+dest;	std	r15,40(r4)
 
-	adde	r0,r0,r14
-dest;	std	r15,48(r4)
-dest;	std	r16,56(r4)
+	adde	r0,r0,r15
+dest;	std	r16,48(r4)
+dest;	std	r17,56(r4)
 	addi	r4,r4,64
 
-	adde	r0,r0,r15
 	adde	r0,r0,r16
+	adde	r0,r0,r17
 
-	ld	r14,STK_REG(R14)(r1)
 	ld	r15,STK_REG(R15)(r1)
 	ld	r16,STK_REG(R16)(r1)
+	ld	r17,STK_REG(R17)(r1)
 	addi	r1,r1,STACKFRAMESIZE
 
 	andi.	r5,r5,63
@@ -406,9 +406,9 @@ dstnr;	stb	r6,0(r4)
 	blr
 
 .Lsrc_error:
-	ld	r14,STK_REG(R14)(r1)
 	ld	r15,STK_REG(R15)(r1)
 	ld	r16,STK_REG(R16)(r1)
+	ld	r17,STK_REG(R17)(r1)
 	addi	r1,r1,STACKFRAMESIZE
 .Lsrc_error_nr:
 	cmpdi	0,r7,0
@@ -418,9 +418,9 @@ dstnr;	stb	r6,0(r4)
 	blr
 
 .Ldest_error:
-	ld	r14,STK_REG(R14)(r1)
 	ld	r15,STK_REG(R15)(r1)
 	ld	r16,STK_REG(R16)(r1)
+	ld	r17,STK_REG(R17)(r1)
 	addi	r1,r1,STACKFRAMESIZE
 .Ldest_error_nr:
 	cmpdi	0,r8,0
diff --git a/arch/powerpc/lib/copypage_power7.S b/arch/powerpc/lib/copypage_power7.S
index ca5fc8fa7efc..c65fc4a412f2 100644
--- a/arch/powerpc/lib/copypage_power7.S
+++ b/arch/powerpc/lib/copypage_power7.S
@@ -113,13 +113,13 @@ _GLOBAL(copypage_power7)
 #endif
 
 .Lnonvmx_copy:
-	std	r14,STK_REG(R14)(r1)
 	std	r15,STK_REG(R15)(r1)
 	std	r16,STK_REG(R16)(r1)
 	std	r17,STK_REG(R17)(r1)
 	std	r18,STK_REG(R18)(r1)
 	std	r19,STK_REG(R19)(r1)
 	std	r20,STK_REG(R20)(r1)
+	std	r21,STK_REG(R21)(r1)
 
 1:	ld	r0,0(r4)
 	ld	r5,8(r4)
@@ -130,13 +130,13 @@ _GLOBAL(copypage_power7)
 	ld	r10,48(r4)
 	ld	r11,56(r4)
 	ld	r12,64(r4)
-	ld	r14,72(r4)
-	ld	r15,80(r4)
-	ld	r16,88(r4)
-	ld	r17,96(r4)
-	ld	r18,104(r4)
-	ld	r19,112(r4)
-	ld	r20,120(r4)
+	ld	r15,72(r4)
+	ld	r16,80(r4)
+	ld	r17,88(r4)
+	ld	r18,96(r4)
+	ld	r19,104(r4)
+	ld	r20,112(r4)
+	ld	r21,120(r4)
 	addi	r4,r4,128
 	std	r0,0(r3)
 	std	r5,8(r3)
@@ -147,22 +147,22 @@ _GLOBAL(copypage_power7)
 	std	r10,48(r3)
 	std	r11,56(r3)
 	std	r12,64(r3)
-	std	r14,72(r3)
-	std	r15,80(r3)
-	std	r16,88(r3)
-	std	r17,96(r3)
-	std	r18,104(r3)
-	std	r19,112(r3)
-	std	r20,120(r3)
+	std	r15,72(r3)
+	std	r16,80(r3)
+	std	r17,88(r3)
+	std	r18,96(r3)
+	std	r19,104(r3)
+	std	r20,112(r3)
+	std	r21,120(r3)
 	addi	r3,r3,128
 	bdnz	1b
 
-	ld	r14,STK_REG(R14)(r1)
 	ld	r15,STK_REG(R15)(r1)
 	ld	r16,STK_REG(R16)(r1)
 	ld	r17,STK_REG(R17)(r1)
 	ld	r18,STK_REG(R18)(r1)
 	ld	r19,STK_REG(R19)(r1)
 	ld	r20,STK_REG(R20)(r1)
+	ld	r21,STK_REG(R21)(r1)
 	addi	r1,r1,STACKFRAMESIZE
 	blr
diff --git a/arch/powerpc/lib/copyuser_power7.S b/arch/powerpc/lib/copyuser_power7.S
index d416a4a66578..9c137451149b 100644
--- a/arch/powerpc/lib/copyuser_power7.S
+++ b/arch/powerpc/lib/copyuser_power7.S
@@ -50,9 +50,9 @@
 
 
 .Ldo_err4:
-	ld	r16,STK_REG(R16)(r1)
-	ld	r15,STK_REG(R15)(r1)
-	ld	r14,STK_REG(R14)(r1)
+	ld	r17,STK_REG(R16)(r1)
+	ld	r16,STK_REG(R15)(r1)
+	ld	r15,STK_REG(R14)(r1)
 .Ldo_err3:
 	bl	exit_vmx_usercopy
 	ld	r0,STACKFRAMESIZE+16(r1)
@@ -61,15 +61,15 @@
 #endif /* CONFIG_ALTIVEC */
 
 .Ldo_err2:
-	ld	r22,STK_REG(R22)(r1)
-	ld	r21,STK_REG(R21)(r1)
-	ld	r20,STK_REG(R20)(r1)
-	ld	r19,STK_REG(R19)(r1)
-	ld	r18,STK_REG(R18)(r1)
-	ld	r17,STK_REG(R17)(r1)
-	ld	r16,STK_REG(R16)(r1)
-	ld	r15,STK_REG(R15)(r1)
-	ld	r14,STK_REG(R14)(r1)
+	ld	r23,STK_REG(R22)(r1)
+	ld	r22,STK_REG(R21)(r1)
+	ld	r21,STK_REG(R20)(r1)
+	ld	r20,STK_REG(R19)(r1)
+	ld	r19,STK_REG(R18)(r1)
+	ld	r18,STK_REG(R17)(r1)
+	ld	r17,STK_REG(R16)(r1)
+	ld	r16,STK_REG(R15)(r1)
+	ld	r15,STK_REG(R14)(r1)
 .Lexit:
 	addi	r1,r1,STACKFRAMESIZE
 .Ldo_err1:
@@ -130,15 +130,15 @@ err1;	stw	r0,0(r3)
 
 	mflr	r0
 	stdu	r1,-STACKFRAMESIZE(r1)
-	std	r14,STK_REG(R14)(r1)
-	std	r15,STK_REG(R15)(r1)
-	std	r16,STK_REG(R16)(r1)
-	std	r17,STK_REG(R17)(r1)
-	std	r18,STK_REG(R18)(r1)
-	std	r19,STK_REG(R19)(r1)
-	std	r20,STK_REG(R20)(r1)
-	std	r21,STK_REG(R21)(r1)
-	std	r22,STK_REG(R22)(r1)
+	std	r15,STK_REG(R14)(r1)
+	std	r16,STK_REG(R15)(r1)
+	std	r17,STK_REG(R16)(r1)
+	std	r18,STK_REG(R17)(r1)
+	std	r19,STK_REG(R18)(r1)
+	std	r20,STK_REG(R19)(r1)
+	std	r21,STK_REG(R20)(r1)
+	std	r22,STK_REG(R21)(r1)
+	std	r23,STK_REG(R22)(r1)
 	std	r0,STACKFRAMESIZE+16(r1)
 
 	srdi	r6,r5,7
@@ -155,14 +155,14 @@ err2;	ld	r9,32(r4)
 err2;	ld	r10,40(r4)
 err2;	ld	r11,48(r4)
 err2;	ld	r12,56(r4)
-err2;	ld	r14,64(r4)
-err2;	ld	r15,72(r4)
-err2;	ld	r16,80(r4)
-err2;	ld	r17,88(r4)
-err2;	ld	r18,96(r4)
-err2;	ld	r19,104(r4)
-err2;	ld	r20,112(r4)
-err2;	ld	r21,120(r4)
+err2;	ld	r15,64(r4)
+err2;	ld	r16,72(r4)
+err2;	ld	r17,80(r4)
+err2;	ld	r18,88(r4)
+err2;	ld	r19,96(r4)
+err2;	ld	r20,104(r4)
+err2;	ld	r21,112(r4)
+err2;	ld	r22,120(r4)
 	addi	r4,r4,128
 err2;	std	r0,0(r3)
 err2;	std	r6,8(r3)
@@ -172,28 +172,28 @@ err2;	std	r9,32(r3)
 err2;	std	r10,40(r3)
 err2;	std	r11,48(r3)
 err2;	std	r12,56(r3)
-err2;	std	r14,64(r3)
-err2;	std	r15,72(r3)
-err2;	std	r16,80(r3)
-err2;	std	r17,88(r3)
-err2;	std	r18,96(r3)
-err2;	std	r19,104(r3)
-err2;	std	r20,112(r3)
-err2;	std	r21,120(r3)
+err2;	std	r15,64(r3)
+err2;	std	r16,72(r3)
+err2;	std	r17,80(r3)
+err2;	std	r18,88(r3)
+err2;	std	r19,96(r3)
+err2;	std	r20,104(r3)
+err2;	std	r21,112(r3)
+err2;	std	r22,120(r3)
 	addi	r3,r3,128
 	bdnz	4b
 
 	clrldi	r5,r5,(64-7)
 
-	ld	r14,STK_REG(R14)(r1)
-	ld	r15,STK_REG(R15)(r1)
-	ld	r16,STK_REG(R16)(r1)
-	ld	r17,STK_REG(R17)(r1)
-	ld	r18,STK_REG(R18)(r1)
-	ld	r19,STK_REG(R19)(r1)
-	ld	r20,STK_REG(R20)(r1)
-	ld	r21,STK_REG(R21)(r1)
-	ld	r22,STK_REG(R22)(r1)
+	ld	r15,STK_REG(R14)(r1)
+	ld	r16,STK_REG(R15)(r1)
+	ld	r17,STK_REG(R16)(r1)
+	ld	r18,STK_REG(R17)(r1)
+	ld	r19,STK_REG(R18)(r1)
+	ld	r20,STK_REG(R19)(r1)
+	ld	r21,STK_REG(R20)(r1)
+	ld	r22,STK_REG(R21)(r1)
+	ld	r23,STK_REG(R22)(r1)
 	addi	r1,r1,STACKFRAMESIZE
 
 	/* Up to 127B to go */
@@ -404,14 +404,14 @@ err3;	stvx	v0,r3,r11
 7:	sub	r5,r5,r6
 	srdi	r6,r5,7
 
-	std	r14,STK_REG(R14)(r1)
-	std	r15,STK_REG(R15)(r1)
-	std	r16,STK_REG(R16)(r1)
+	std	r15,STK_REG(R14)(r1)
+	std	r16,STK_REG(R15)(r1)
+	std	r17,STK_REG(R16)(r1)
 
 	li	r12,64
-	li	r14,80
-	li	r15,96
-	li	r16,112
+	li	r15,80
+	li	r16,96
+	li	r17,112
 
 	mtctr	r6
 
@@ -426,24 +426,24 @@ err4;	lvx	v6,r4,r9
 err4;	lvx	v5,r4,r10
 err4;	lvx	v4,r4,r11
 err4;	lvx	v3,r4,r12
-err4;	lvx	v2,r4,r14
-err4;	lvx	v1,r4,r15
-err4;	lvx	v0,r4,r16
+err4;	lvx	v2,r4,r15
+err4;	lvx	v1,r4,r16
+err4;	lvx	v0,r4,r17
 	addi	r4,r4,128
 err4;	stvx	v7,0,r3
 err4;	stvx	v6,r3,r9
 err4;	stvx	v5,r3,r10
 err4;	stvx	v4,r3,r11
 err4;	stvx	v3,r3,r12
-err4;	stvx	v2,r3,r14
-err4;	stvx	v1,r3,r15
-err4;	stvx	v0,r3,r16
+err4;	stvx	v2,r3,r15
+err4;	stvx	v1,r3,r16
+err4;	stvx	v0,r3,r17
 	addi	r3,r3,128
 	bdnz	8b
 
-	ld	r14,STK_REG(R14)(r1)
-	ld	r15,STK_REG(R15)(r1)
-	ld	r16,STK_REG(R16)(r1)
+	ld	r15,STK_REG(R14)(r1)
+	ld	r16,STK_REG(R15)(r1)
+	ld	r17,STK_REG(R16)(r1)
 
 	/* Up to 127B to go */
 	clrldi	r5,r5,(64-7)
@@ -589,14 +589,14 @@ err3;	stvx	v11,r3,r11
 7:	sub	r5,r5,r6
 	srdi	r6,r5,7
 
-	std	r14,STK_REG(R14)(r1)
-	std	r15,STK_REG(R15)(r1)
-	std	r16,STK_REG(R16)(r1)
+	std	r15,STK_REG(R14)(r1)
+	std	r16,STK_REG(R15)(r1)
+	std	r17,STK_REG(R16)(r1)
 
 	li	r12,64
-	li	r14,80
-	li	r15,96
-	li	r16,112
+	li	r15,80
+	li	r16,96
+	li	r17,112
 
 	mtctr	r6
 
@@ -616,11 +616,11 @@ err4;	lvx	v4,r4,r11
 	VPERM(v11,v5,v4,v16)
 err4;	lvx	v3,r4,r12
 	VPERM(v12,v4,v3,v16)
-err4;	lvx	v2,r4,r14
+err4;	lvx	v2,r4,r15
 	VPERM(v13,v3,v2,v16)
-err4;	lvx	v1,r4,r15
+err4;	lvx	v1,r4,r16
 	VPERM(v14,v2,v1,v16)
-err4;	lvx	v0,r4,r16
+err4;	lvx	v0,r4,r17
 	VPERM(v15,v1,v0,v16)
 	addi	r4,r4,128
 err4;	stvx	v8,0,r3
@@ -628,15 +628,15 @@ err4;	stvx	v9,r3,r9
 err4;	stvx	v10,r3,r10
 err4;	stvx	v11,r3,r11
 err4;	stvx	v12,r3,r12
-err4;	stvx	v13,r3,r14
-err4;	stvx	v14,r3,r15
-err4;	stvx	v15,r3,r16
+err4;	stvx	v13,r3,r15
+err4;	stvx	v14,r3,r16
+err4;	stvx	v15,r3,r17
 	addi	r3,r3,128
 	bdnz	8b
 
-	ld	r14,STK_REG(R14)(r1)
-	ld	r15,STK_REG(R15)(r1)
-	ld	r16,STK_REG(R16)(r1)
+	ld	r15,STK_REG(R14)(r1)
+	ld	r16,STK_REG(R15)(r1)
+	ld	r17,STK_REG(R16)(r1)
 
 	/* Up to 127B to go */
 	clrldi	r5,r5,(64-7)
diff --git a/arch/powerpc/lib/crtsavres.S b/arch/powerpc/lib/crtsavres.S
index 7e5e1c28e56a..c46ad2f0a718 100644
--- a/arch/powerpc/lib/crtsavres.S
+++ b/arch/powerpc/lib/crtsavres.S
@@ -314,9 +314,12 @@ _GLOBAL(_restvr_31)
 
 #else /* CONFIG_PPC64 */
 
+/* 64-bit has -ffixed-r13, Book3S also has -ffixed-r14 */
+#ifdef CONFIG_PPC_BOOK3E
 .globl	_savegpr0_14
 _savegpr0_14:
 	std	r14,-144(r1)
+#endif
 .globl	_savegpr0_15
 _savegpr0_15:
 	std	r15,-136(r1)
diff --git a/arch/powerpc/lib/memcpy_power7.S b/arch/powerpc/lib/memcpy_power7.S
index 193909abd18b..dd679a0fa650 100644
--- a/arch/powerpc/lib/memcpy_power7.S
+++ b/arch/powerpc/lib/memcpy_power7.S
@@ -75,7 +75,6 @@ _GLOBAL(memcpy_power7)
 
 	mflr	r0
 	stdu	r1,-STACKFRAMESIZE(r1)
-	std	r14,STK_REG(R14)(r1)
 	std	r15,STK_REG(R15)(r1)
 	std	r16,STK_REG(R16)(r1)
 	std	r17,STK_REG(R17)(r1)
@@ -84,6 +83,7 @@ _GLOBAL(memcpy_power7)
 	std	r20,STK_REG(R20)(r1)
 	std	r21,STK_REG(R21)(r1)
 	std	r22,STK_REG(R22)(r1)
+	std	r23,STK_REG(R23)(r1)
 	std	r0,STACKFRAMESIZE+16(r1)
 
 	srdi	r6,r5,7
@@ -100,14 +100,14 @@ _GLOBAL(memcpy_power7)
 	ld	r10,40(r4)
 	ld	r11,48(r4)
 	ld	r12,56(r4)
-	ld	r14,64(r4)
-	ld	r15,72(r4)
-	ld	r16,80(r4)
-	ld	r17,88(r4)
-	ld	r18,96(r4)
-	ld	r19,104(r4)
-	ld	r20,112(r4)
-	ld	r21,120(r4)
+	ld	r15,64(r4)
+	ld	r16,72(r4)
+	ld	r17,80(r4)
+	ld	r18,88(r4)
+	ld	r19,96(r4)
+	ld	r20,104(r4)
+	ld	r21,112(r4)
+	ld	r22,120(r4)
 	addi	r4,r4,128
 	std	r0,0(r3)
 	std	r6,8(r3)
@@ -117,20 +117,19 @@ _GLOBAL(memcpy_power7)
 	std	r10,40(r3)
 	std	r11,48(r3)
 	std	r12,56(r3)
-	std	r14,64(r3)
-	std	r15,72(r3)
-	std	r16,80(r3)
-	std	r17,88(r3)
-	std	r18,96(r3)
-	std	r19,104(r3)
-	std	r20,112(r3)
-	std	r21,120(r3)
+	std	r15,64(r3)
+	std	r16,72(r3)
+	std	r17,80(r3)
+	std	r18,88(r3)
+	std	r19,96(r3)
+	std	r20,104(r3)
+	std	r21,112(r3)
+	std	r22,120(r3)
 	addi	r3,r3,128
 	bdnz	4b
 
 	clrldi	r5,r5,(64-7)
 
-	ld	r14,STK_REG(R14)(r1)
 	ld	r15,STK_REG(R15)(r1)
 	ld	r16,STK_REG(R16)(r1)
 	ld	r17,STK_REG(R17)(r1)
@@ -139,6 +138,7 @@ _GLOBAL(memcpy_power7)
 	ld	r20,STK_REG(R20)(r1)
 	ld	r21,STK_REG(R21)(r1)
 	ld	r22,STK_REG(R22)(r1)
+	ld	r23,STK_REG(R23)(r1)
 	addi	r1,r1,STACKFRAMESIZE
 
 	/* Up to 127B to go */
@@ -349,14 +349,14 @@ _GLOBAL(memcpy_power7)
 7:	sub	r5,r5,r6
 	srdi	r6,r5,7
 
-	std	r14,STK_REG(R14)(r1)
 	std	r15,STK_REG(R15)(r1)
 	std	r16,STK_REG(R16)(r1)
+	std	r17,STK_REG(R17)(r1)
 
 	li	r12,64
-	li	r14,80
-	li	r15,96
-	li	r16,112
+	li	r15,80
+	li	r16,96
+	li	r17,112
 
 	mtctr	r6
 
@@ -371,24 +371,24 @@ _GLOBAL(memcpy_power7)
 	lvx	v5,r4,r10
 	lvx	v4,r4,r11
 	lvx	v3,r4,r12
-	lvx	v2,r4,r14
-	lvx	v1,r4,r15
-	lvx	v0,r4,r16
+	lvx	v2,r4,r15
+	lvx	v1,r4,r16
+	lvx	v0,r4,r17
 	addi	r4,r4,128
 	stvx	v7,0,r3
 	stvx	v6,r3,r9
 	stvx	v5,r3,r10
 	stvx	v4,r3,r11
 	stvx	v3,r3,r12
-	stvx	v2,r3,r14
-	stvx	v1,r3,r15
-	stvx	v0,r3,r16
+	stvx	v2,r3,r15
+	stvx	v1,r3,r16
+	stvx	v0,r3,r17
 	addi	r3,r3,128
 	bdnz	8b
 
-	ld	r14,STK_REG(R14)(r1)
 	ld	r15,STK_REG(R15)(r1)
 	ld	r16,STK_REG(R16)(r1)
+	ld	r17,STK_REG(R17)(r1)
 
 	/* Up to 127B to go */
 	clrldi	r5,r5,(64-7)
@@ -535,14 +535,14 @@ _GLOBAL(memcpy_power7)
 7:	sub	r5,r5,r6
 	srdi	r6,r5,7
 
-	std	r14,STK_REG(R14)(r1)
 	std	r15,STK_REG(R15)(r1)
 	std	r16,STK_REG(R16)(r1)
+	std	r17,STK_REG(R17)(r1)
 
 	li	r12,64
-	li	r14,80
-	li	r15,96
-	li	r16,112
+	li	r15,80
+	li	r16,96
+	li	r17,112
 
 	mtctr	r6
 
@@ -562,11 +562,11 @@ _GLOBAL(memcpy_power7)
 	VPERM(v11,v5,v4,v16)
 	lvx	v3,r4,r12
 	VPERM(v12,v4,v3,v16)
-	lvx	v2,r4,r14
+	lvx	v2,r4,r15
 	VPERM(v13,v3,v2,v16)
-	lvx	v1,r4,r15
+	lvx	v1,r4,r16
 	VPERM(v14,v2,v1,v16)
-	lvx	v0,r4,r16
+	lvx	v0,r4,r17
 	VPERM(v15,v1,v0,v16)
 	addi	r4,r4,128
 	stvx	v8,0,r3
@@ -574,15 +574,15 @@ _GLOBAL(memcpy_power7)
 	stvx	v10,r3,r10
 	stvx	v11,r3,r11
 	stvx	v12,r3,r12
-	stvx	v13,r3,r14
-	stvx	v14,r3,r15
-	stvx	v15,r3,r16
+	stvx	v13,r3,r15
+	stvx	v14,r3,r16
+	stvx	v15,r3,r17
 	addi	r3,r3,128
 	bdnz	8b
 
-	ld	r14,STK_REG(R14)(r1)
 	ld	r15,STK_REG(R15)(r1)
 	ld	r16,STK_REG(R16)(r1)
+	ld	r17,STK_REG(R17)(r1)
 
 	/* Up to 127B to go */
 	clrldi	r5,r5,(64-7)
-- 
2.15.0


* [RFC PATCH 2/8] powerpc/64s: poison r14 register while in kernel
  2017-12-20 14:51 [RFC PATCH 0/8] use r14 for a per-cpu kernel register Nicholas Piggin
  2017-12-20 14:51 ` [RFC PATCH 1/8] powerpc/64s: stop using r14 register Nicholas Piggin
@ 2017-12-20 14:52 ` Nicholas Piggin
  2017-12-20 14:52 ` [RFC PATCH 3/8] powerpc/64s: put the per-cpu data_offset in r14 Nicholas Piggin
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2017-12-20 14:52 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

Poison the r14 register with the PIR SPR and a magic number. This
means it must be treated like r13: saved and restored on kernel
entry/exit, but not restored when returning back to the kernel.

However, r14 will not be a constant like r13; it may be modified by
the kernel, which means it must not be loaded on exception entry if
the exception is coming from the kernel.

This requires loading SRR1 earlier, before the exception mask/kvm
test. That's okay because SRR1 almost always gets loaded anyway.
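
As an illustration only (an editor's sketch, not this patch's code; the
constant and the r14_poison field name are assumptions), the poison can
be thought of as a magic pattern combined with the thread's PIR, so a
stray use of r14 is both recognisable and traceable to a CPU:

#include <asm/paca.h>
#include <asm/reg.h>

/* Hypothetical helper: build the r14 poison value during per-CPU setup. */
static void init_r14_poison(struct paca_struct *paca)
{
	/* magic pattern in the high bits, hardware thread id (PIR) in the low bits */
	paca->r14_poison = 0x5deadbeef0000000ULL | mfspr(SPRN_PIR);
}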
---
 arch/powerpc/include/asm/exception-64s.h | 121 ++++++++++++++++---------------
 arch/powerpc/include/asm/paca.h          |   5 +-
 arch/powerpc/kernel/asm-offsets.c        |   3 +-
 arch/powerpc/kernel/entry_64.S           |  23 ++++++
 arch/powerpc/kernel/exceptions-64s.S     |  45 ++++++------
 arch/powerpc/kernel/head_64.S            |   5 ++
 arch/powerpc/kernel/idle_book3s.S        |   4 +
 arch/powerpc/kernel/paca.c               |   1 -
 arch/powerpc/kernel/setup_64.c           |   3 +
 arch/powerpc/lib/sstep.c                 |   1 +
 10 files changed, 128 insertions(+), 83 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 54afd1f140a4..dadaa7471755 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -42,16 +42,18 @@
 #define EX_R11		16
 #define EX_R12		24
 #define EX_R13		32
-#define EX_DAR		40
-#define EX_DSISR	48
-#define EX_CCR		52
-#define EX_CFAR		56
-#define EX_PPR		64
+#define EX_R14		40
+#define EX_DAR		48
+#define EX_DSISR	56
+#define EX_CCR		60
+#define EX_CFAR		64
+#define EX_PPR		72
+
 #if defined(CONFIG_RELOCATABLE)
-#define EX_CTR		72
-#define EX_SIZE		10	/* size in u64 units */
+#define EX_CTR		80
+#define EX_SIZE		11	/* size in u64 units */
 #else
-#define EX_SIZE		9	/* size in u64 units */
+#define EX_SIZE		10	/* size in u64 units */
 #endif
 
 /*
@@ -77,9 +79,8 @@
 #ifdef CONFIG_RELOCATABLE
 #define __EXCEPTION_RELON_PROLOG_PSERIES_1(label, h)			\
 	mfspr	r11,SPRN_##h##SRR0;	/* save SRR0 */			\
-	LOAD_HANDLER(r12,label);					\
-	mtctr	r12;							\
-	mfspr	r12,SPRN_##h##SRR1;	/* and SRR1 */			\
+	LOAD_HANDLER(r10,label);					\
+	mtctr	r10;							\
 	li	r10,MSR_RI;						\
 	mtmsrd 	r10,1;			/* Set RI (EE=0) */		\
 	bctr;
@@ -87,7 +88,6 @@
 /* If not relocatable, we can jump directly -- and save messing with LR */
 #define __EXCEPTION_RELON_PROLOG_PSERIES_1(label, h)			\
 	mfspr	r11,SPRN_##h##SRR0;	/* save SRR0 */			\
-	mfspr	r12,SPRN_##h##SRR1;	/* and SRR1 */			\
 	li	r10,MSR_RI;						\
 	mtmsrd 	r10,1;			/* Set RI (EE=0) */		\
 	b	label;
@@ -102,7 +102,7 @@
  */
 #define EXCEPTION_RELON_PROLOG_PSERIES(area, label, h, extra, vec)	\
 	EXCEPTION_PROLOG_0(area);					\
-	EXCEPTION_PROLOG_1(area, extra, vec);				\
+	EXCEPTION_PROLOG_1(area, h, extra, vec);				\
 	EXCEPTION_RELON_PROLOG_PSERIES_1(label, h)
 
 /*
@@ -198,17 +198,21 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 	std	r10,area+EX_R10(r13);	/* save r10 - r12 */		\
 	OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR)
 
-#define __EXCEPTION_PROLOG_1_PRE(area)					\
+#define __EXCEPTION_PROLOG_1(area, h)					\
+	std	r11,area+EX_R11(r13);					\
+	std	r12,area+EX_R12(r13);					\
+	mfspr	r12,SPRN_##h##SRR1;	/* and SRR1 */			\
+	GET_SCRATCH0(r11);						\
 	OPT_SAVE_REG_TO_PACA(area+EX_PPR, r9, CPU_FTR_HAS_PPR);		\
 	OPT_SAVE_REG_TO_PACA(area+EX_CFAR, r10, CPU_FTR_CFAR);		\
 	SAVE_CTR(r10, area);						\
-	mfcr	r9;
-
-#define __EXCEPTION_PROLOG_1_POST(area)					\
-	std	r11,area+EX_R11(r13);					\
-	std	r12,area+EX_R12(r13);					\
-	GET_SCRATCH0(r10);						\
-	std	r10,area+EX_R13(r13)
+	mfcr	r9;							\
+	std	r11,area+EX_R13(r13);					\
+	std	r14,area+EX_R14(r13);					\
+	andi.	r10,r12,MSR_PR;						\
+	beq	1f;							\
+	ld	r14,PACA_R14(r13);					\
+1:
 
 /*
  * This version of the EXCEPTION_PROLOG_1 will carry
@@ -216,30 +220,27 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
  * checking of the interrupt maskable level in the SOFTEN_TEST.
  * Intended to be used in MASKABLE_EXCPETION_* macros.
  */
-#define MASKABLE_EXCEPTION_PROLOG_1(area, extra, vec, bitmask)			\
-	__EXCEPTION_PROLOG_1_PRE(area);					\
-	extra(vec, bitmask);						\
-	__EXCEPTION_PROLOG_1_POST(area);
+#define MASKABLE_EXCEPTION_PROLOG_1(area, h, extra, vec, bitmask)	\
+	__EXCEPTION_PROLOG_1(area, h);					\
+	extra(vec, bitmask);
 
 /*
  * This version of the EXCEPTION_PROLOG_1 is intended
  * to be used in STD_EXCEPTION* macros
  */
-#define _EXCEPTION_PROLOG_1(area, extra, vec)				\
-	__EXCEPTION_PROLOG_1_PRE(area);					\
-	extra(vec);							\
-	__EXCEPTION_PROLOG_1_POST(area);
+#define _EXCEPTION_PROLOG_1(area, h, extra, vec)			\
+	__EXCEPTION_PROLOG_1(area, h);					\
+	extra(vec);
 
-#define EXCEPTION_PROLOG_1(area, extra, vec)				\
-	_EXCEPTION_PROLOG_1(area, extra, vec)
+#define EXCEPTION_PROLOG_1(area, h, extra, vec)				\
+	_EXCEPTION_PROLOG_1(area, h, extra, vec)
 
 #define __EXCEPTION_PROLOG_PSERIES_1(label, h)				\
-	ld	r10,PACAKMSR(r13);	/* get MSR value for kernel */	\
 	mfspr	r11,SPRN_##h##SRR0;	/* save SRR0 */			\
-	LOAD_HANDLER(r12,label)						\
-	mtspr	SPRN_##h##SRR0,r12;					\
-	mfspr	r12,SPRN_##h##SRR1;	/* and SRR1 */			\
+	ld	r10,PACAKMSR(r13);	/* get MSR value for kernel */	\
 	mtspr	SPRN_##h##SRR1,r10;					\
+	LOAD_HANDLER(r10,label)						\
+	mtspr	SPRN_##h##SRR0,r10;					\
 	h##rfid;							\
 	b	.	/* prevent speculative execution */
 #define EXCEPTION_PROLOG_PSERIES_1(label, h)				\
@@ -247,13 +248,12 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 
 /* _NORI variant keeps MSR_RI clear */
 #define __EXCEPTION_PROLOG_PSERIES_1_NORI(label, h)			\
+	mfspr	r11,SPRN_##h##SRR0;	/* save SRR0 */			\
 	ld	r10,PACAKMSR(r13);	/* get MSR value for kernel */	\
 	xori	r10,r10,MSR_RI;		/* Clear MSR_RI */		\
-	mfspr	r11,SPRN_##h##SRR0;	/* save SRR0 */			\
-	LOAD_HANDLER(r12,label)						\
-	mtspr	SPRN_##h##SRR0,r12;					\
-	mfspr	r12,SPRN_##h##SRR1;	/* and SRR1 */			\
 	mtspr	SPRN_##h##SRR1,r10;					\
+	LOAD_HANDLER(r10,label)						\
+	mtspr	SPRN_##h##SRR0,r10;					\
 	h##rfid;							\
 	b	.	/* prevent speculative execution */
 
@@ -262,7 +262,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 
 #define EXCEPTION_PROLOG_PSERIES(area, label, h, extra, vec)		\
 	EXCEPTION_PROLOG_0(area);					\
-	EXCEPTION_PROLOG_1(area, extra, vec);				\
+	EXCEPTION_PROLOG_1(area, h, extra, vec);			\
 	EXCEPTION_PROLOG_PSERIES_1(label, h);
 
 #define __KVMTEST(h, n)							\
@@ -336,7 +336,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 /* Do not enable RI */
 #define EXCEPTION_PROLOG_PSERIES_NORI(area, label, h, extra, vec)	\
 	EXCEPTION_PROLOG_0(area);					\
-	EXCEPTION_PROLOG_1(area, extra, vec);				\
+	EXCEPTION_PROLOG_1(area, h, extra, vec);			\
 	EXCEPTION_PROLOG_PSERIES_1_NORI(label, h);
 
 
@@ -349,8 +349,9 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 	ld	r10,area+EX_PPR(r13);					\
 	std	r10,HSTATE_PPR(r13);					\
 	END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948);	\
+	ld	r10,area+EX_R12(r13);					\
+	std	r10,HSTATE_SCRATCH0(r13);				\
 	ld	r10,area+EX_R10(r13);					\
-	std	r12,HSTATE_SCRATCH0(r13);				\
 	sldi	r12,r9,32;						\
 	ori	r12,r12,(n);						\
 	/* This reloads r9 before branching to kvmppc_interrupt */	\
@@ -363,8 +364,9 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 	ld	r10,area+EX_PPR(r13);					\
 	std	r10,HSTATE_PPR(r13);					\
 	END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948);	\
+	ld	r10,area+EX_R12(r13);					\
+	std	r10,HSTATE_SCRATCH0(r13);				\
 	ld	r10,area+EX_R10(r13);					\
-	std	r12,HSTATE_SCRATCH0(r13);				\
 	sldi	r12,r9,32;						\
 	ori	r12,r12,(n);						\
 	/* This reloads r9 before branching to kvmppc_interrupt */	\
@@ -372,6 +374,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 89:	mtocrf	0x80,r9;						\
 	ld	r9,area+EX_R9(r13);					\
 	ld	r10,area+EX_R10(r13);					\
+	ld	r12,area+EX_R12(r13);					\
 	b	kvmppc_skip_##h##interrupt
 
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
@@ -393,7 +396,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 	std	r12,_MSR(r1);		/* save SRR1 in stackframe	*/ \
 	std	r10,0(r1);		/* make stack chain pointer	*/ \
 	std	r0,GPR0(r1);		/* save r0 in stackframe	*/ \
-	std	r10,GPR1(r1);		/* save r1 in stackframe	*/ \
+	std	r10,GPR1(r1)		/* save r1 in stackframe	*/
 
 
 /*
@@ -430,16 +433,18 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 
 /* Save original regs values from save area to stack frame. */
 #define EXCEPTION_PROLOG_COMMON_2(area)					   \
-	ld	r9,area+EX_R9(r13);	/* move r9, r10 to stackframe	*/ \
+	ld	r9,area+EX_R9(r13);	/* move r9-r14 to stackframe	*/ \
 	ld	r10,area+EX_R10(r13);					   \
 	std	r9,GPR9(r1);						   \
 	std	r10,GPR10(r1);						   \
-	ld	r9,area+EX_R11(r13);	/* move r11 - r13 to stackframe	*/ \
+	ld	r9,area+EX_R11(r13);					   \
 	ld	r10,area+EX_R12(r13);					   \
-	ld	r11,area+EX_R13(r13);					   \
 	std	r9,GPR11(r1);						   \
 	std	r10,GPR12(r1);						   \
-	std	r11,GPR13(r1);						   \
+	ld	r9,area+EX_R13(r13);					   \
+	ld	r10,area+EX_R14(r13);					   \
+	std	r9,GPR13(r1);						   \
+	std	r10,GPR14(r1);						   \
 	BEGIN_FTR_SECTION_NESTED(66);					   \
 	ld	r10,area+EX_CFAR(r13);					   \
 	std	r10,ORIG_GPR3(r1);					   \
@@ -480,7 +485,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 	b hdlr;
 
 #define STD_EXCEPTION_PSERIES_OOL(vec, label)			\
-	EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST_PR, vec);	\
+	EXCEPTION_PROLOG_1(PACA_EXGEN, EXC_STD, KVMTEST_PR, vec);	\
 	EXCEPTION_PROLOG_PSERIES_1(label, EXC_STD)
 
 #define STD_EXCEPTION_HV(loc, vec, label)			\
@@ -489,7 +494,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 				 EXC_HV, KVMTEST_HV, vec);
 
 #define STD_EXCEPTION_HV_OOL(vec, label)			\
-	EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST_HV, vec);	\
+	EXCEPTION_PROLOG_1(PACA_EXGEN, EXC_HV, KVMTEST_HV, vec);	\
 	EXCEPTION_PROLOG_PSERIES_1(label, EXC_HV)
 
 #define STD_RELON_EXCEPTION_PSERIES(loc, vec, label)	\
@@ -498,7 +503,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 	EXCEPTION_RELON_PROLOG_PSERIES(PACA_EXGEN, label, EXC_STD, NOTEST, vec);
 
 #define STD_RELON_EXCEPTION_PSERIES_OOL(vec, label)		\
-	EXCEPTION_PROLOG_1(PACA_EXGEN, NOTEST, vec);		\
+	EXCEPTION_PROLOG_1(PACA_EXGEN, EXC_STD, NOTEST, vec);		\
 	EXCEPTION_RELON_PROLOG_PSERIES_1(label, EXC_STD)
 
 #define STD_RELON_EXCEPTION_HV(loc, vec, label)		\
@@ -507,7 +512,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 				       EXC_HV, KVMTEST_HV, vec);
 
 #define STD_RELON_EXCEPTION_HV_OOL(vec, label)			\
-	EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST_HV, vec);	\
+	EXCEPTION_PROLOG_1(PACA_EXGEN, EXC_HV, KVMTEST_HV, vec);	\
 	EXCEPTION_RELON_PROLOG_PSERIES_1(label, EXC_HV)
 
 /* This associate vector numbers with bits in paca->irq_happened */
@@ -548,7 +553,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define __MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra, bitmask)	\
 	SET_SCRATCH0(r13);    /* save r13 */				\
 	EXCEPTION_PROLOG_0(PACA_EXGEN);					\
-	MASKABLE_EXCEPTION_PROLOG_1(PACA_EXGEN, extra, vec, bitmask);	\
+	MASKABLE_EXCEPTION_PROLOG_1(PACA_EXGEN, h, extra, vec, bitmask);\
 	EXCEPTION_PROLOG_PSERIES_1(label, h);
 
 #define _MASKABLE_EXCEPTION_PSERIES(vec, label, h, extra, bitmask)	\
@@ -559,7 +564,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 				    EXC_STD, SOFTEN_TEST_PR, bitmask)
 
 #define MASKABLE_EXCEPTION_PSERIES_OOL(vec, label, bitmask)		\
-	MASKABLE_EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_PR, vec, bitmask);\
+	MASKABLE_EXCEPTION_PROLOG_1(PACA_EXGEN, EXC_STD, SOFTEN_TEST_PR, vec, bitmask);\
 	EXCEPTION_PROLOG_PSERIES_1(label, EXC_STD)
 
 #define MASKABLE_EXCEPTION_HV(loc, vec, label, bitmask)			\
@@ -567,13 +572,13 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 				    EXC_HV, SOFTEN_TEST_HV, bitmask)
 
 #define MASKABLE_EXCEPTION_HV_OOL(vec, label, bitmask)			\
-	MASKABLE_EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_HV, vec, bitmask);\
+	MASKABLE_EXCEPTION_PROLOG_1(PACA_EXGEN, EXC_HV, SOFTEN_TEST_HV, vec, bitmask);\
 	EXCEPTION_PROLOG_PSERIES_1(label, EXC_HV)
 
 #define __MASKABLE_RELON_EXCEPTION_PSERIES(vec, label, h, extra, bitmask) \
 	SET_SCRATCH0(r13);    /* save r13 */				\
 	EXCEPTION_PROLOG_0(PACA_EXGEN);					\
-	MASKABLE_EXCEPTION_PROLOG_1(PACA_EXGEN, extra, vec, bitmask);	\
+	MASKABLE_EXCEPTION_PROLOG_1(PACA_EXGEN, h, extra, vec, bitmask);\
 	EXCEPTION_RELON_PROLOG_PSERIES_1(label, h)
 
 #define _MASKABLE_RELON_EXCEPTION_PSERIES(vec, label, h, extra, bitmask)\
@@ -584,7 +589,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 					  EXC_STD, SOFTEN_NOTEST_PR, bitmask)
 
 #define MASKABLE_RELON_EXCEPTION_PSERIES_OOL(vec, label, bitmask)	\
-	MASKABLE_EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_NOTEST_PR, vec, bitmask);\
+	MASKABLE_EXCEPTION_PROLOG_1(PACA_EXGEN, EXC_STD, SOFTEN_NOTEST_PR, vec, bitmask);\
 	EXCEPTION_PROLOG_PSERIES_1(label, EXC_STD);
 
 #define MASKABLE_RELON_EXCEPTION_HV(loc, vec, label, bitmask)		\
@@ -592,7 +597,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 					  EXC_HV, SOFTEN_TEST_HV, bitmask)
 
 #define MASKABLE_RELON_EXCEPTION_HV_OOL(vec, label, bitmask)		\
-	MASKABLE_EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_NOTEST_HV, vec, bitmask);\
+	MASKABLE_EXCEPTION_PROLOG_1(PACA_EXGEN, EXC_HV, SOFTEN_NOTEST_HV, vec, bitmask);\
 	EXCEPTION_RELON_PROLOG_PSERIES_1(label, EXC_HV)
 
 /*
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index e2ee193eb24d..cd6a9a010895 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -34,6 +34,9 @@
 #include <asm/cpuidle.h>
 
 register struct paca_struct *local_paca asm("r13");
+#ifdef CONFIG_PPC_BOOK3S
+register u64 local_r14 asm("r14");
+#endif
 
 #if defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_SMP)
 extern unsigned int debug_smp_processor_id(void); /* from linux/smp.h */
@@ -65,7 +68,6 @@ struct paca_struct {
 	 * read-only (after boot) fields in the first cacheline to
 	 * avoid cacheline bouncing.
 	 */
-
 	struct lppaca *lppaca_ptr;	/* Pointer to LpPaca for PLIC */
 #endif /* CONFIG_PPC_BOOK3S */
 	/*
@@ -104,6 +106,7 @@ struct paca_struct {
 	 */
 	/* used for most interrupts/exceptions */
 	u64 exgen[EX_SIZE] __attribute__((aligned(0x80)));
+	u64 r14;
 	u64 exslb[EX_SIZE];	/* used for SLB/segment table misses
  				 * on the linear mapping */
 	/* SLB related definitions */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index db8407483c9e..32d393f55a96 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -218,8 +218,9 @@ int main(void)
 	OFFSET(PACACONTEXTSLLP, paca_struct, mm_ctx_sllp);
 #endif /* CONFIG_PPC_MM_SLICES */
 	OFFSET(PACA_EXGEN, paca_struct, exgen);
-	OFFSET(PACA_EXMC, paca_struct, exmc);
+	OFFSET(PACA_R14, paca_struct, r14);
 	OFFSET(PACA_EXSLB, paca_struct, exslb);
+	OFFSET(PACA_EXMC, paca_struct, exmc);
 	OFFSET(PACA_EXNMI, paca_struct, exnmi);
 	OFFSET(PACALPPACAPTR, paca_struct, lppaca_ptr);
 	OFFSET(PACA_SLBSHADOWPTR, paca_struct, slb_shadow_ptr);
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index b9bf44635b10..592e4b36065f 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -37,6 +37,11 @@
 #include <asm/tm.h>
 #include <asm/ppc-opcode.h>
 #include <asm/export.h>
+#ifdef CONFIG_PPC_BOOK3S
+#include <asm/exception-64s.h>
+#else
+#include <asm/exception-64e.h>
+#endif
 
 /*
  * System calls.
@@ -65,11 +70,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM)
 	addi	r1,r1,-INT_FRAME_SIZE
 	beq-	1f
 	ld	r1,PACAKSAVE(r13)
+#ifdef CONFIG_PPC_BOOK3S
+	ld	r14,PACA_R14(r13)
+#endif
 1:	std	r10,0(r1)
 	std	r11,_NIP(r1)
 	std	r12,_MSR(r1)
 	std	r0,GPR0(r1)
 	std	r10,GPR1(r1)
+#ifdef CONFIG_PPC_BOOK3S
+	ld	r10,PACA_EXGEN+EX_R14(r13)
+	std	r10,GPR14(r1)
+#endif
 	beq	2f			/* if from kernel mode */
 	ACCOUNT_CPU_USER_ENTRY(r13, r10, r11)
 2:	std	r2,GPR2(r1)
@@ -250,6 +262,11 @@ system_call_exit:
 BEGIN_FTR_SECTION
 	stdcx.	r0,0,r1			/* to clear the reservation */
 END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
+	LOAD_REG_IMMEDIATE(r10, 0xdeadbeefULL << 32)
+	mfspr	r11,SPRN_PIR
+	or	r10,r10,r11
+	tdne	r10,r14
+
 	andi.	r6,r8,MSR_PR
 	ld	r4,_LINK(r1)
 
@@ -261,6 +278,9 @@ BEGIN_FTR_SECTION
 END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
 	ld	r13,GPR13(r1)	/* only restore r13 if returning to usermode */
+#ifdef CONFIG_PPC_BOOK3S
+	ld	r14,GPR14(r1)	/* only restore r14 if returning to usermode */
+#endif
 1:	ld	r2,GPR2(r1)
 	ld	r1,GPR1(r1)
 	mtlr	r4
@@ -874,6 +894,9 @@ BEGIN_FTR_SECTION
 END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	ACCOUNT_CPU_USER_EXIT(r13, r2, r4)
 	REST_GPR(13, r1)
+#ifdef CONFIG_PPC_BOOK3S
+	REST_GPR(14, r1)
+#endif
 1:
 	mtspr	SPRN_SRR1,r3
 
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 189c456450e2..ca962bf85b8a 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -196,7 +196,7 @@ EXC_REAL_END(machine_check, 0x200, 0x100)
 EXC_VIRT_NONE(0x4200, 0x100)
 TRAMP_REAL_BEGIN(machine_check_powernv_early)
 BEGIN_FTR_SECTION
-	EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
+	EXCEPTION_PROLOG_1(PACA_EXMC, EXC_STD, NOTEST, 0x200)
 	/*
 	 * Register contents:
 	 * R13		= PACA
@@ -248,7 +248,7 @@ BEGIN_FTR_SECTION
 	mfspr	r11,SPRN_DSISR		/* Save DSISR */
 	std	r11,_DSISR(r1)
 	std	r9,_CCR(r1)		/* Save CR in stackframe */
-	/* Save r9 through r13 from EXMC save area to stack frame. */
+	/* Save r9 through r14 from EXMC save area to stack frame. */
 	EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
 	mfmsr	r11			/* get MSR value */
 	ori	r11,r11,MSR_ME		/* turn on ME bit */
@@ -278,7 +278,7 @@ machine_check_fwnmi:
 	SET_SCRATCH0(r13)		/* save r13 */
 	EXCEPTION_PROLOG_0(PACA_EXMC)
 machine_check_pSeries_0:
-	EXCEPTION_PROLOG_1(PACA_EXMC, KVMTEST_PR, 0x200)
+	EXCEPTION_PROLOG_1(PACA_EXMC, EXC_STD, KVMTEST_PR, 0x200)
 	/*
 	 * MSR_RI is not enabled, because PACA_EXMC is being used, so a
 	 * nested machine check corrupts it. machine_check_common enables
@@ -338,8 +338,7 @@ EXC_COMMON_BEGIN(machine_check_common)
 	lhz	r12,PACA_IN_MCE(r13);			\
 	subi	r12,r12,1;				\
 	sth	r12,PACA_IN_MCE(r13);			\
-	REST_GPR(11, r1);				\
-	REST_2GPRS(12, r1);				\
+	REST_4GPRS(11, r1);				\
 	/* restore original r1. */			\
 	ld	r1,GPR1(r1)
 
@@ -514,7 +513,7 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80)
 	SET_SCRATCH0(r13)
 	EXCEPTION_PROLOG_0(PACA_EXSLB)
-	EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x380)
+	EXCEPTION_PROLOG_1(PACA_EXSLB, EXC_STD, KVMTEST_PR, 0x380)
 	mr	r12,r3	/* save r3 */
 	mfspr	r3,SPRN_DAR
 	mfspr	r11,SPRN_SRR1
@@ -525,7 +524,7 @@ EXC_REAL_END(data_access_slb, 0x380, 0x80)
 EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80)
 	SET_SCRATCH0(r13)
 	EXCEPTION_PROLOG_0(PACA_EXSLB)
-	EXCEPTION_PROLOG_1(PACA_EXSLB, NOTEST, 0x380)
+	EXCEPTION_PROLOG_1(PACA_EXSLB, EXC_STD, NOTEST, 0x380)
 	mr	r12,r3	/* save r3 */
 	mfspr	r3,SPRN_DAR
 	mfspr	r11,SPRN_SRR1
@@ -558,7 +557,7 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 EXC_REAL_BEGIN(instruction_access_slb, 0x480, 0x80)
 	SET_SCRATCH0(r13)
 	EXCEPTION_PROLOG_0(PACA_EXSLB)
-	EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x480)
+	EXCEPTION_PROLOG_1(PACA_EXSLB, EXC_STD, KVMTEST_PR, 0x480)
 	mr	r12,r3	/* save r3 */
 	mfspr	r3,SPRN_SRR0		/* SRR0 is faulting address */
 	mfspr	r11,SPRN_SRR1
@@ -569,7 +568,7 @@ EXC_REAL_END(instruction_access_slb, 0x480, 0x80)
 EXC_VIRT_BEGIN(instruction_access_slb, 0x4480, 0x80)
 	SET_SCRATCH0(r13)
 	EXCEPTION_PROLOG_0(PACA_EXSLB)
-	EXCEPTION_PROLOG_1(PACA_EXSLB, NOTEST, 0x480)
+	EXCEPTION_PROLOG_1(PACA_EXSLB, EXC_STD, NOTEST, 0x480)
 	mr	r12,r3	/* save r3 */
 	mfspr	r3,SPRN_SRR0		/* SRR0 is faulting address */
 	mfspr	r11,SPRN_SRR1
@@ -639,6 +638,7 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_TYPE_RADIX)
 	ld	r10,PACA_EXSLB+EX_R10(r13)
 	ld	r11,PACA_EXSLB+EX_R11(r13)
 	ld	r12,PACA_EXSLB+EX_R12(r13)
+	ld	r14,PACA_EXSLB+EX_R14(r13)
 	ld	r13,PACA_EXSLB+EX_R13(r13)
 	rfid
 	b	.	/* prevent speculative execution */
@@ -806,7 +806,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM)
 #endif
 
 
-EXC_REAL_MASKABLE(decrementer, 0x900, 0x80, IRQ_SOFT_MASK_STD)
+EXC_REAL_OOL_MASKABLE(decrementer, 0x900, 0x80, IRQ_SOFT_MASK_STD)
 EXC_VIRT_MASKABLE(decrementer, 0x4900, 0x80, 0x900, IRQ_SOFT_MASK_STD)
 TRAMP_KVM(PACA_EXGEN, 0x900)
 EXC_COMMON_ASYNC(decrementer_common, 0x900, timer_interrupt)
@@ -907,6 +907,7 @@ EXC_COMMON(trap_0b_common, 0xb00, unknown_exception)
 	mtspr	SPRN_SRR0,r10 ; 				\
 	ld	r10,PACAKMSR(r13) ;				\
 	mtspr	SPRN_SRR1,r10 ; 				\
+	std	r14,PACA_EXGEN+EX_R14(r13);			\
 	rfid ; 							\
 	b	. ;	/* prevent speculative execution */
 
@@ -942,6 +943,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)				\
 	mfspr	r12,SPRN_SRR1 ;					\
 	li	r10,MSR_RI ;					\
 	mtmsrd 	r10,1 ;						\
+	std	r14,PACA_EXGEN+EX_R14(r13);			\
 	bctr ;
 #else
 	/* We can branch directly */
@@ -950,6 +952,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)				\
 	mfspr	r12,SPRN_SRR1 ;					\
 	li	r10,MSR_RI ;					\
 	mtmsrd 	r10,1 ;			/* Set RI (EE=0) */	\
+	std	r14,PACA_EXGEN+EX_R14(r13);			\
 	b	system_call_common ;
 #endif
 
@@ -1035,12 +1038,11 @@ __TRAMP_REAL_OOL_MASKABLE_HV(hmi_exception, 0xe60, IRQ_SOFT_MASK_STD)
 EXC_VIRT_NONE(0x4e60, 0x20)
 TRAMP_KVM_HV(PACA_EXGEN, 0xe60)
 TRAMP_REAL_BEGIN(hmi_exception_early)
-	EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST_HV, 0xe60)
+	EXCEPTION_PROLOG_1(PACA_EXGEN, EXC_HV, KVMTEST_HV, 0xe60)
 	mr	r10,r1			/* Save r1 */
 	ld	r1,PACAEMERGSP(r13)	/* Use emergency stack for realmode */
 	subi	r1,r1,INT_FRAME_SIZE	/* alloc stack frame		*/
 	mfspr	r11,SPRN_HSRR0		/* Save HSRR0 */
-	mfspr	r12,SPRN_HSRR1		/* Save HSRR1 */
 	EXCEPTION_PROLOG_COMMON_1()
 	EXCEPTION_PROLOG_COMMON_2(PACA_EXGEN)
 	EXCEPTION_PROLOG_COMMON_3(0xe60)
@@ -1236,7 +1238,7 @@ EXC_VIRT_NONE(0x5400, 0x100)
 EXC_REAL_BEGIN(denorm_exception_hv, 0x1500, 0x100)
 	mtspr	SPRN_SPRG_HSCRATCH0,r13
 	EXCEPTION_PROLOG_0(PACA_EXGEN)
-	EXCEPTION_PROLOG_1(PACA_EXGEN, NOTEST, 0x1500)
+	EXCEPTION_PROLOG_1(PACA_EXGEN, EXC_HV, NOTEST, 0x1500)
 
 #ifdef CONFIG_PPC_DENORMALISATION
 	mfspr	r10,SPRN_HSRR1
@@ -1319,6 +1321,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 	ld	r10,PACA_EXGEN+EX_R10(r13)
 	ld	r11,PACA_EXGEN+EX_R11(r13)
 	ld	r12,PACA_EXGEN+EX_R12(r13)
+	ld	r14,PACA_EXGEN+EX_R14(r13)
 	ld	r13,PACA_EXGEN+EX_R13(r13)
 	HRFID
 	b	.
@@ -1364,9 +1367,6 @@ EXC_VIRT_NONE(0x5800, 0x100)
 
 #define MASKED_DEC_HANDLER(_H)				\
 3: /* soft-nmi */					\
-	std	r12,PACA_EXGEN+EX_R12(r13);		\
-	GET_SCRATCH0(r10);				\
-	std	r10,PACA_EXGEN+EX_R13(r13);		\
 	EXCEPTION_PROLOG_PSERIES_1(soft_nmi_common, _H)
 
 /*
@@ -1404,7 +1404,6 @@ EXC_COMMON_BEGIN(soft_nmi_common)
  */
 #define MASKED_INTERRUPT(_H)				\
 masked_##_H##interrupt:					\
-	std	r11,PACA_EXGEN+EX_R11(r13);		\
 	lbz	r11,PACAIRQHAPPENED(r13);		\
 	or	r11,r11,r10;				\
 	stb	r11,PACAIRQHAPPENED(r13);		\
@@ -1416,13 +1415,13 @@ masked_##_H##interrupt:					\
 	b	MASKED_DEC_HANDLER_LABEL;		\
 1:	andi.	r10,r10,(PACA_IRQ_DBELL|PACA_IRQ_HMI);	\
 	bne	2f;					\
-	mfspr	r10,SPRN_##_H##SRR1;			\
-	xori	r10,r10,MSR_EE; /* clear MSR_EE */	\
-	mtspr	SPRN_##_H##SRR1,r10;			\
+	xori	r12,r12,MSR_EE; /* clear MSR_EE */	\
+	mtspr	SPRN_##_H##SRR1,r12;			\
 2:	mtcrf	0x80,r9;				\
 	ld	r9,PACA_EXGEN+EX_R9(r13);		\
 	ld	r10,PACA_EXGEN+EX_R10(r13);		\
 	ld	r11,PACA_EXGEN+EX_R11(r13);		\
+	ld	r12,PACA_EXGEN+EX_R12(r13);		\
 	/* returns to kernel where r13 must be set up, so don't restore it */ \
 	##_H##rfid;					\
 	b	.;					\
@@ -1648,10 +1647,12 @@ bad_stack:
 	SAVE_2GPRS(9,r1)
 	ld	r9,EX_R11(r3)
 	ld	r10,EX_R12(r3)
-	ld	r11,EX_R13(r3)
 	std	r9,GPR11(r1)
 	std	r10,GPR12(r1)
-	std	r11,GPR13(r1)
+	ld	r9,EX_R13(r3)
+	ld	r10,EX_R14(r3)
+	std	r9,GPR13(r1)
+	std	r10,GPR14(r1)
 BEGIN_FTR_SECTION
 	ld	r10,EX_CFAR(r3)
 	std	r10,ORIG_GPR3(r1)
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 87b2c2264118..5a9ec06eab14 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -413,6 +413,11 @@ generic_secondary_common_init:
 	b	kexec_wait		/* next kernel might do better	 */
 
 2:	SET_PACA(r13)
+	LOAD_REG_IMMEDIATE(r14, 0xdeadbeef << 32)
+	mfspr	r3,SPRN_PIR
+	or	r14,r14,r3
+	std	r14,PACA_R14(r13)
+
 #ifdef CONFIG_PPC_BOOK3E
 	addi	r12,r13,PACA_EXTLB	/* and TLB exc frame in another  */
 	mtspr	SPRN_SPRG_TLB_EXFRAME,r12
diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 5065c4cb5f12..3b6de0ba4e03 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -195,6 +195,7 @@ pnv_powersave_common:
 
 	/* Continue saving state */
 	SAVE_GPR(2, r1)
+	SAVE_GPR(14, r1) /* XXX: check if we need to save/restore or can rely on PACA_R14 reload */
 	SAVE_NVGPRS(r1)
 	mfcr	r5
 	std	r5,_CCR(r1)
@@ -464,6 +465,7 @@ power9_dd1_recover_paca:
 	/* Load paca->thread_sibling_pacas[i] into r13 */
 	ldx	r13, r4, r5
 	SET_PACA(r13)
+	/* R14 will be restored */
 	/*
 	 * Indicate that we have lost NVGPR state
 	 * which needs to be restored from the stack.
@@ -926,6 +928,7 @@ fastsleep_workaround_at_exit:
 .global pnv_wakeup_loss
 pnv_wakeup_loss:
 	ld	r1,PACAR1(r13)
+	REST_GPR(14, r1)
 BEGIN_FTR_SECTION
 	CHECK_HMI_INTERRUPT
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
@@ -950,6 +953,7 @@ pnv_wakeup_noloss:
 	cmpwi	r0,0
 	bne	pnv_wakeup_loss
 	ld	r1,PACAR1(r13)
+	REST_GPR(14, r1)
 BEGIN_FTR_SECTION
 	CHECK_HMI_INTERRUPT
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index d6597038931d..2491e9cd8d24 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -198,7 +198,6 @@ void setup_paca(struct paca_struct *new_paca)
 		mtspr(SPRN_SPRG_HPACA, local_paca);
 #endif
 	mtspr(SPRN_SPRG_PACA, local_paca);
-
 }
 
 static int __initdata paca_size;
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index a4408a7e6f14..9a4c5bf35d92 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -192,6 +192,9 @@ static void __init fixup_boot_paca(void)
 	get_paca()->data_offset = 0;
 	/* Mark interrupts disabled in PACA */
 	irq_soft_mask_set(IRQ_SOFT_MASK_STD);
+	/* Set r14 and paca_r14 to debug value */
+	get_paca()->r14 = (0xdeadbeefULL << 32) | mfspr(SPRN_PIR);
+	local_r14 = get_paca()->r14;
 }
 
 static void __init configure_exceptions(void)
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 70274b7b4773..b1b8f9f88083 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -3073,6 +3073,7 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 		regs->gpr[11] = regs->nip + 4;
 		regs->gpr[12] = regs->msr & MSR_MASK;
 		regs->gpr[13] = (unsigned long) get_paca();
+		regs->gpr[14] = local_r14;
 		regs->nip = (unsigned long) &system_call_common;
 		regs->msr = MSR_KERNEL;
 		return 1;
-- 
2.15.0

* [RFC PATCH 3/8] powerpc/64s: put the per-cpu data_offset in r14
  2017-12-20 14:51 [RFC PATCH 0/8] use r14 for a per-cpu kernel register Nicholas Piggin
  2017-12-20 14:51 ` [RFC PATCH 1/8] powerpc/64s: stop using r14 register Nicholas Piggin
  2017-12-20 14:52 ` [RFC PATCH 2/8] powerpc/64s: poison r14 register while in kernel Nicholas Piggin
@ 2017-12-20 14:52 ` Nicholas Piggin
  2017-12-20 17:53   ` Gabriel Paubert
  2017-12-20 14:52 ` [RFC PATCH 4/8] powerpc/64s: put io_sync bit into r14 Nicholas Piggin
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 12+ messages in thread
From: Nicholas Piggin @ 2017-12-20 14:52 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

The per-cpu data_offset is stored in r14 shifted left by 16 bits, so the
low 16 bits of r14 remain available. This allows per-cpu pointers to be
dereferenced with a single extra shift, whereas previously it required a
load and an add.
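
A minimal sketch of the difference (illustrative only; local_r14 and
data_offset come from this series, the two helper names below are made up):

	register unsigned long local_r14 asm("r14");	/* as declared in paca.h */

	/* Old: the offset lives in the paca and must be loaded from memory. */
	static inline unsigned long my_cpu_offset_old(void)
	{
		return local_paca->data_offset;		/* ld from the paca */
	}

	/* New: the offset is carried in the upper bits of r14. */
	static inline unsigned long my_cpu_offset_new(void)
	{
		return local_r14 >> 16;			/* one shift, no load */
	}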
---
 arch/powerpc/include/asm/paca.h   |  5 +++++
 arch/powerpc/include/asm/percpu.h |  2 +-
 arch/powerpc/kernel/entry_64.S    |  5 -----
 arch/powerpc/kernel/head_64.S     |  5 +----
 arch/powerpc/kernel/setup_64.c    | 11 +++++++++--
 5 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index cd6a9a010895..4dd4ac69e84f 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -35,6 +35,11 @@
 
 register struct paca_struct *local_paca asm("r13");
 #ifdef CONFIG_PPC_BOOK3S
+/*
+ * The top 32-bits of r14 is used as the per-cpu offset, shifted by PAGE_SHIFT.
+ * The per-cpu could be moved completely to vmalloc space if we had large
+ * vmalloc page mapping? (no, must access it in real mode).
+ */
 register u64 local_r14 asm("r14");
 #endif
 
diff --git a/arch/powerpc/include/asm/percpu.h b/arch/powerpc/include/asm/percpu.h
index dce863a7635c..1e0d79d30eac 100644
--- a/arch/powerpc/include/asm/percpu.h
+++ b/arch/powerpc/include/asm/percpu.h
@@ -12,7 +12,7 @@
 
 #include <asm/paca.h>
 
-#define __my_cpu_offset local_paca->data_offset
+#define __my_cpu_offset (local_r14 >> 16)
 
 #endif /* CONFIG_SMP */
 #endif /* __powerpc64__ */
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 592e4b36065f..6b0e3ac311e8 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -262,11 +262,6 @@ system_call_exit:
 BEGIN_FTR_SECTION
 	stdcx.	r0,0,r1			/* to clear the reservation */
 END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
-	LOAD_REG_IMMEDIATE(r10, 0xdeadbeefULL << 32)
-	mfspr	r11,SPRN_PIR
-	or	r10,r10,r11
-	tdne	r10,r14
-
 	andi.	r6,r8,MSR_PR
 	ld	r4,_LINK(r1)
 
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 5a9ec06eab14..cdb710f43681 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -413,10 +413,7 @@ generic_secondary_common_init:
 	b	kexec_wait		/* next kernel might do better	 */
 
 2:	SET_PACA(r13)
-	LOAD_REG_IMMEDIATE(r14, 0xdeadbeef << 32)
-	mfspr	r3,SPRN_PIR
-	or	r14,r14,r3
-	std	r14,PACA_R14(r13)
+	ld	r14,PACA_R14(r13)
 
 #ifdef CONFIG_PPC_BOOK3E
 	addi	r12,r13,PACA_EXTLB	/* and TLB exc frame in another  */
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 9a4c5bf35d92..f4a96ebb523a 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -192,8 +192,8 @@ static void __init fixup_boot_paca(void)
 	get_paca()->data_offset = 0;
 	/* Mark interrupts disabled in PACA */
 	irq_soft_mask_set(IRQ_SOFT_MASK_STD);
-	/* Set r14 and paca_r14 to debug value */
-	get_paca()->r14 = (0xdeadbeefULL << 32) | mfspr(SPRN_PIR);
+	/* Set r14 and paca_r14 to zero */
+	get_paca()->r14 = 0;
 	local_r14 = get_paca()->r14;
 }
 
@@ -761,7 +761,14 @@ void __init setup_per_cpu_areas(void)
 	for_each_possible_cpu(cpu) {
                 __per_cpu_offset[cpu] = delta + pcpu_unit_offsets[cpu];
 		paca[cpu].data_offset = __per_cpu_offset[cpu];
+
+		BUG_ON(paca[cpu].data_offset & (PAGE_SIZE-1));
+		BUG_ON(paca[cpu].data_offset >= (1UL << (64 - 16)));
+
+		/* The top 48 bits are used for per-cpu data */
+		paca[cpu].r14 |= paca[cpu].data_offset << 16;
 	}
+	local_r14 = paca[smp_processor_id()].r14;
 }
 #endif
 
-- 
2.15.0

* [RFC PATCH 4/8] powerpc/64s: put io_sync bit into r14
  2017-12-20 14:51 [RFC PATCH 0/8] use r14 for a per-cpu kernel register Nicholas Piggin
                   ` (2 preceding siblings ...)
  2017-12-20 14:52 ` [RFC PATCH 3/8] powerpc/64s: put the per-cpu data_offset in r14 Nicholas Piggin
@ 2017-12-20 14:52 ` Nicholas Piggin
  2017-12-22 15:08   ` Thiago Jung Bauermann
  2017-12-20 14:52 ` [RFC PATCH 5/8] powerpc/64s: put work_pending " Nicholas Piggin
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 12+ messages in thread
From: Nicholas Piggin @ 2017-12-20 14:52 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

This simplifies the spin unlock code and the mmio primitives. It may not
be the best use of an r14 bit, but it was a simple first proof of concept
after the per-cpu data_offset, so it can stay until we get low on bits.
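
For context, a sketch of the driver-side pattern this bit exists for
(illustrative only; the lock, register and value names are made up):

	spin_lock(&dev->lock);
	writel(val, dev->regs + CTRL_OFFSET);	/* out path sets R14_BIT_IO_SYNC
						 * (previously paca->io_sync) */
	spin_unlock(&dev->lock);		/* unlock sees the bit and does a
						 * full sync first, so the MMIO
						 * store cannot leak past the
						 * lock release */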
---
 arch/powerpc/include/asm/io.h       | 11 ++++------
 arch/powerpc/include/asm/paca.h     | 44 ++++++++++++++++++++++++++++++++++++-
 arch/powerpc/include/asm/spinlock.h | 21 ++++++++++--------
 arch/powerpc/xmon/xmon.c            |  1 -
 4 files changed, 59 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 422f99cf9924..c817f3a83fcc 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -104,8 +104,8 @@ extern bool isa_io_special;
  *
  */
 
-#ifdef CONFIG_PPC64
-#define IO_SET_SYNC_FLAG()	do { local_paca->io_sync = 1; } while(0)
+#if defined(CONFIG_PPC64) && defined(CONFIG_SMP)
+#define IO_SET_SYNC_FLAG()	do { r14_set_bits(R14_BIT_IO_SYNC); } while(0)
 #else
 #define IO_SET_SYNC_FLAG()
 #endif
@@ -673,11 +673,8 @@ static inline void name at					\
  */
 static inline void mmiowb(void)
 {
-	unsigned long tmp;
-
-	__asm__ __volatile__("sync; li %0,0; stb %0,%1(13)"
-	: "=&r" (tmp) : "i" (offsetof(struct paca_struct, io_sync))
-	: "memory");
+	__asm__ __volatile__("sync" : : : "memory");
+	r14_clear_bits(R14_BIT_IO_SYNC);
 }
 #endif /* !CONFIG_PPC32 */
 
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 4dd4ac69e84f..408fa079e00d 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -35,12 +35,55 @@
 
 register struct paca_struct *local_paca asm("r13");
 #ifdef CONFIG_PPC_BOOK3S
+
+#define R14_BIT_IO_SYNC	0x0001
+
 /*
  * The top 32-bits of r14 is used as the per-cpu offset, shifted by PAGE_SHIFT.
  * The per-cpu could be moved completely to vmalloc space if we had large
  * vmalloc page mapping? (no, must access it in real mode).
  */
 register u64 local_r14 asm("r14");
+
+/*
+ * r14 should not be modified by C code, because we can not guarantee it
+ * will be done with non-atomic (vs interrupts) read-modify-write sequences.
+ * All updates must be of the form `op r14,r14,xxx` or similar (i.e., atomic
+ * updates).
+ *
+ * Make asm statements have r14 for input and output so that the compiler
+ * does not re-order it with respect to other r14 manipulations.
+ */
+static inline void r14_set_bits(unsigned long mask)
+{
+	if (__builtin_constant_p(mask))
+		asm volatile("ori	%0,%0,%2\n"
+				: "=r" (local_r14)
+				: "0" (local_r14), "i" (mask));
+	else
+		asm volatile("or	%0,%0,%2\n"
+				: "=r" (local_r14)
+				: "0" (local_r14), "r" (mask));
+}
+
+static inline void r14_flip_bits(unsigned long mask)
+{
+	if (__builtin_constant_p(mask))
+		asm volatile("xori	%0,%0,%2\n"
+				: "=r" (local_r14)
+				: "0" (local_r14), "i" (mask));
+	else
+		asm volatile("xor	%0,%0,%2\n"
+				: "=r" (local_r14)
+				: "0" (local_r14), "r" (mask));
+}
+
+static inline void r14_clear_bits(unsigned long mask)
+{
+	asm volatile("andc	%0,%0,%2\n"
+			: "=r" (local_r14)
+			: "0" (local_r14), "r" (mask));
+}
 #endif
 
 #if defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_SMP)
@@ -169,7 +212,6 @@ struct paca_struct {
 	u16 trap_save;			/* Used when bad stack is encountered */
 	u8 irq_soft_mask;		/* mask for irq soft masking */
 	u8 irq_happened;		/* irq happened while soft-disabled */
-	u8 io_sync;			/* writel() needs spin_unlock sync */
 	u8 irq_work_pending;		/* IRQ_WORK interrupt while soft-disable */
 	u8 nap_state_lost;		/* NV GPR values lost in power7_idle */
 	u64 sprg_vdso;			/* Saved user-visible sprg */
diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index b9ebc3085fb7..182bb9304c79 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -40,16 +40,9 @@
 #endif
 
 #if defined(CONFIG_PPC64) && defined(CONFIG_SMP)
-#define CLEAR_IO_SYNC	(get_paca()->io_sync = 0)
-#define SYNC_IO		do {						\
-				if (unlikely(get_paca()->io_sync)) {	\
-					mb();				\
-					get_paca()->io_sync = 0;	\
-				}					\
-			} while (0)
+#define CLEAR_IO_SYNC	do { r14_clear_bits(R14_BIT_IO_SYNC); } while(0)
 #else
 #define CLEAR_IO_SYNC
-#define SYNC_IO
 #endif
 
 #ifdef CONFIG_PPC_PSERIES
@@ -165,9 +158,19 @@ void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags)
 
 static inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
-	SYNC_IO;
+#if defined(CONFIG_PPC64) && defined(CONFIG_SMP)
+	bool io_sync = local_r14 & R14_BIT_IO_SYNC;
+	if (unlikely(io_sync)) {
+		mb();
+		CLEAR_IO_SYNC;
+	} else {
+		__asm__ __volatile__("# arch_spin_unlock\n\t"
+				PPC_RELEASE_BARRIER: : :"memory");
+	}
+#else
 	__asm__ __volatile__("# arch_spin_unlock\n\t"
 				PPC_RELEASE_BARRIER: : :"memory");
+#endif
 	lock->slock = 0;
 }
 
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index a53454f61d09..40f0d02ae92d 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2393,7 +2393,6 @@ static void dump_one_paca(int cpu)
 	DUMP(p, trap_save, "x");
 	DUMP(p, irq_soft_mask, "x");
 	DUMP(p, irq_happened, "x");
-	DUMP(p, io_sync, "x");
 	DUMP(p, irq_work_pending, "x");
 	DUMP(p, nap_state_lost, "x");
 	DUMP(p, sprg_vdso, "llx");
-- 
2.15.0

* [RFC PATCH 5/8] powerpc/64s: put work_pending bit into r14
  2017-12-20 14:51 [RFC PATCH 0/8] use r14 for a per-cpu kernel register Nicholas Piggin
                   ` (3 preceding siblings ...)
  2017-12-20 14:52 ` [RFC PATCH 4/8] powerpc/64s: put io_sync bit into r14 Nicholas Piggin
@ 2017-12-20 14:52 ` Nicholas Piggin
  2017-12-20 14:52 ` [RFC PATCH 6/8] powerpc/64s: put irq_soft_mask bits " Nicholas Piggin
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2017-12-20 14:52 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

Similarly, this may not be worth an r14 bit, but it removes the
irq_work_pending byte from the paca and turns the test/set/clear
accessors into single register operations.
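
The callers are unchanged by this patch; roughly, the timer interrupt
path keeps doing something like the following (a sketch from memory, not
part of this diff):

	if (test_irq_work_pending()) {		/* now a single test of r14 */
		clear_irq_work_pending();	/* now an andc on r14 */
		irq_work_run();
	}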
---
 arch/powerpc/include/asm/paca.h |  4 ++--
 arch/powerpc/kernel/time.c      | 15 +++------------
 arch/powerpc/xmon/xmon.c        |  1 -
 3 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 408fa079e00d..cd3637f4ee4e 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -36,7 +36,8 @@
 register struct paca_struct *local_paca asm("r13");
 #ifdef CONFIG_PPC_BOOK3S
 
-#define R14_BIT_IO_SYNC	0x0001
+#define R14_BIT_IO_SYNC			0x0001
+#define R14_BIT_IRQ_WORK_PENDING	0x0002 /* IRQ_WORK interrupt while soft-disable */
 
 /*
  * The top 32-bits of r14 is used as the per-cpu offset, shifted by PAGE_SHIFT.
@@ -212,7 +213,6 @@ struct paca_struct {
 	u16 trap_save;			/* Used when bad stack is encountered */
 	u8 irq_soft_mask;		/* mask for irq soft masking */
 	u8 irq_happened;		/* irq happened while soft-disabled */
-	u8 irq_work_pending;		/* IRQ_WORK interrupt while soft-disable */
 	u8 nap_state_lost;		/* NV GPR values lost in power7_idle */
 	u64 sprg_vdso;			/* Saved user-visible sprg */
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 8d32ce95ec88..fac30152723f 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -488,26 +488,17 @@ EXPORT_SYMBOL(profile_pc);
 #ifdef CONFIG_PPC64
 static inline unsigned long test_irq_work_pending(void)
 {
-	unsigned long x;
-
-	asm volatile("lbz %0,%1(13)"
-		: "=r" (x)
-		: "i" (offsetof(struct paca_struct, irq_work_pending)));
-	return x;
+	return local_r14 & R14_BIT_IRQ_WORK_PENDING;
 }
 
 static inline void set_irq_work_pending_flag(void)
 {
-	asm volatile("stb %0,%1(13)" : :
-		"r" (1),
-		"i" (offsetof(struct paca_struct, irq_work_pending)));
+	r14_set_bits(R14_BIT_IRQ_WORK_PENDING);
 }
 
 static inline void clear_irq_work_pending(void)
 {
-	asm volatile("stb %0,%1(13)" : :
-		"r" (0),
-		"i" (offsetof(struct paca_struct, irq_work_pending)));
+	r14_clear_bits(R14_BIT_IRQ_WORK_PENDING);
 }
 
 #else /* 32-bit */
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 40f0d02ae92d..7d2bb26ff333 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2393,7 +2393,6 @@ static void dump_one_paca(int cpu)
 	DUMP(p, trap_save, "x");
 	DUMP(p, irq_soft_mask, "x");
 	DUMP(p, irq_happened, "x");
-	DUMP(p, irq_work_pending, "x");
 	DUMP(p, nap_state_lost, "x");
 	DUMP(p, sprg_vdso, "llx");
 
-- 
2.15.0

* [RFC PATCH 6/8] powerpc/64s: put irq_soft_mask bits into r14
  2017-12-20 14:51 [RFC PATCH 0/8] use r14 for a per-cpu kernel register Nicholas Piggin
                   ` (4 preceding siblings ...)
  2017-12-20 14:52 ` [RFC PATCH 5/8] powerpc/64s: put work_pending " Nicholas Piggin
@ 2017-12-20 14:52 ` Nicholas Piggin
  2017-12-20 14:52 ` [RFC PATCH 7/8] powerpc/64s: put irq_soft_mask and irq_happened " Nicholas Piggin
  2017-12-20 14:52 ` [RFC PATCH 8/8] powerpc/64s: inline local_irq_enable/restore Nicholas Piggin
  7 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2017-12-20 14:52 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

Put the STD and PMI interrupt soft-mask bits into r14. This benefits
IRQ disabling (and enabling, to a lesser extent), and the soft-mask
check in the interrupt entry handler.
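
A hedged sketch of what the disable fast path becomes (the names are
taken from this series; the wrapper function itself is illustrative):

	static inline unsigned long local_irq_save_sketch(void)
	{
		unsigned long flags;

		barrier();
		flags = local_r14 & IRQ_SOFT_MASK_ALL;	/* previously: lbz from the paca */
		r14_set_bits(IRQ_SOFT_MASK_STD);	/* previously: stb to the paca */
		barrier();

		return flags;
	}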
---
 arch/powerpc/include/asm/exception-64s.h |  6 +-
 arch/powerpc/include/asm/hw_irq.h        | 98 ++++++++++++--------------------
 arch/powerpc/include/asm/irqflags.h      |  9 +--
 arch/powerpc/include/asm/kvm_ppc.h       |  2 +-
 arch/powerpc/include/asm/paca.h          | 18 +++++-
 arch/powerpc/kernel/asm-offsets.c        |  7 ++-
 arch/powerpc/kernel/entry_64.S           | 19 +++----
 arch/powerpc/kernel/idle_book3s.S        |  3 +
 arch/powerpc/kernel/irq.c                | 12 ++--
 arch/powerpc/kernel/optprobes_head.S     |  3 +-
 arch/powerpc/kernel/process.c            |  2 +-
 arch/powerpc/kernel/setup_64.c           | 11 ++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |  3 +-
 arch/powerpc/xmon/xmon.c                 |  5 +-
 14 files changed, 95 insertions(+), 103 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index dadaa7471755..5602454ae56f 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -459,9 +459,8 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 	mflr	r9;			/* Get LR, later save to stack	*/ \
 	ld	r2,PACATOC(r13);	/* get kernel TOC into r2	*/ \
 	std	r9,_LINK(r1);						   \
-	lbz	r10,PACAIRQSOFTMASK(r13);				   \
 	mfspr	r11,SPRN_XER;		/* save XER in stackframe	*/ \
-	std	r10,SOFTE(r1);						   \
+	std	r14,SOFTE(r1);		/* full r14 not just softe XXX	*/ \
 	std	r11,_XER(r1);						   \
 	li	r9,(n)+1;						   \
 	std	r9,_TRAP(r1);		/* set trap number		*/ \
@@ -526,8 +525,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define SOFTEN_VALUE_0xf00	PACA_IRQ_PMI
 
 #define __SOFTEN_TEST(h, vec, bitmask)					\
-	lbz	r10,PACAIRQSOFTMASK(r13);				\
-	andi.	r10,r10,bitmask;					\
+	andi.	r10,r14,bitmask;					\
 	li	r10,SOFTEN_VALUE_##vec;					\
 	bne	masked_##h##interrupt
 
diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index eea02cbf5699..9ba445de989d 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -12,8 +12,9 @@
 #include <asm/ptrace.h>
 #include <asm/processor.h>
 
-#ifdef CONFIG_PPC64
+#ifndef __ASSEMBLY__
 
+#ifdef CONFIG_PPC64
 /*
  * PACA flags in paca->irq_happened.
  *
@@ -30,21 +31,16 @@
 #define PACA_IRQ_PMI		0x40
 
 /*
- * flags for paca->irq_soft_mask
+ * 64s uses r14 rather than paca for irq_soft_mask
  */
-#define IRQ_SOFT_MASK_NONE	0x00
-#define IRQ_SOFT_MASK_STD	0x01 /* local_irq_disable() interrupts */
 #ifdef CONFIG_PPC_BOOK3S
-#define IRQ_SOFT_MASK_PMU	0x02
-#define IRQ_SOFT_MASK_ALL	0x03
-#else
-#define IRQ_SOFT_MASK_ALL	0x01
-#endif
+#define IRQ_SOFT_MASK_STD	(0x01 << R14_BIT_IRQ_SOFT_MASK_SHIFT)
+#define IRQ_SOFT_MASK_PMU	(0x02 << R14_BIT_IRQ_SOFT_MASK_SHIFT)
+#define IRQ_SOFT_MASK_ALL	(0x03 << R14_BIT_IRQ_SOFT_MASK_SHIFT)
+#endif /* CONFIG_PPC_BOOK3S */
 
 #endif /* CONFIG_PPC64 */
 
-#ifndef __ASSEMBLY__
-
 extern void replay_system_reset(void);
 extern void __replay_interrupt(unsigned int vector);
 
@@ -56,24 +52,16 @@ extern void unknown_exception(struct pt_regs *regs);
 #ifdef CONFIG_PPC64
 #include <asm/paca.h>
 
-static inline notrace unsigned long irq_soft_mask_return(void)
+/*
+ * __irq_soft_mask_set/clear do not have memory clobbers so they
+ * should not be used by themselves to disable/enable irqs.
+ */
+static inline notrace void __irq_soft_mask_set(unsigned long disable_mask)
 {
-	unsigned long flags;
-
-	asm volatile(
-		"lbz %0,%1(13)"
-		: "=r" (flags)
-		: "i" (offsetof(struct paca_struct, irq_soft_mask)));
-
-	return flags;
+	r14_set_bits(disable_mask);
 }
 
-/*
- * The "memory" clobber acts as both a compiler barrier
- * for the critical section and as a clobber because
- * we changed paca->irq_soft_mask
- */
-static inline notrace void irq_soft_mask_set(unsigned long mask)
+static inline notrace void __irq_soft_mask_insert(unsigned long new_mask)
 {
 #ifdef CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
 	/*
@@ -90,49 +78,37 @@ static inline notrace void irq_soft_mask_set(unsigned long mask)
 	 * unmasks to be replayed, among other things. For now, take
 	 * the simple approach.
 	 */
-	WARN_ON(mask && !(mask & IRQ_SOFT_MASK_STD));
+	WARN_ON(new_mask && !(new_mask & IRQ_SOFT_MASK_STD));
 #endif
 
-	asm volatile(
-		"stb %0,%1(13)"
-		:
-		: "r" (mask),
-		  "i" (offsetof(struct paca_struct, irq_soft_mask))
-		: "memory");
+	r14_insert_bits(new_mask, IRQ_SOFT_MASK_ALL);
 }
 
-static inline notrace unsigned long irq_soft_mask_set_return(unsigned long mask)
+static inline notrace void __irq_soft_mask_clear(unsigned long enable_mask)
 {
-	unsigned long flags;
-
-#ifdef CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
-	WARN_ON(mask && !(mask & IRQ_SOFT_MASK_STD));
-#endif
-
-	asm volatile(
-		"lbz %0,%1(13); stb %2,%1(13)"
-		: "=&r" (flags)
-		: "i" (offsetof(struct paca_struct, irq_soft_mask)),
-		  "r" (mask)
-		: "memory");
+	r14_clear_bits(enable_mask);
+}
 
-	return flags;
+static inline notrace unsigned long irq_soft_mask_return(void)
+{
+	return local_r14 & IRQ_SOFT_MASK_ALL;
 }
 
-static inline notrace unsigned long irq_soft_mask_or_return(unsigned long mask)
+static inline notrace void irq_soft_mask_set(unsigned long disable_mask)
 {
-	unsigned long flags, tmp;
+	barrier();
+	__irq_soft_mask_set(disable_mask);
+	barrier();
+}
 
-	asm volatile(
-		"lbz %0,%2(13); or %1,%0,%3; stb %1,%2(13)"
-		: "=&r" (flags), "=r" (tmp)
-		: "i" (offsetof(struct paca_struct, irq_soft_mask)),
-		  "r" (mask)
-		: "memory");
+static inline notrace unsigned long irq_soft_mask_set_return(unsigned long disable_mask)
+{
+	unsigned long flags;
 
-#ifdef CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
-	WARN_ON((mask | flags) && !((mask | flags) & IRQ_SOFT_MASK_STD));
-#endif
+	barrier();
+	flags = irq_soft_mask_return();
+	__irq_soft_mask_set(disable_mask);
+	barrier();
 
 	return flags;
 }
@@ -151,7 +127,7 @@ extern void arch_local_irq_restore(unsigned long);
 
 static inline void arch_local_irq_enable(void)
 {
-	arch_local_irq_restore(IRQ_SOFT_MASK_NONE);
+	arch_local_irq_restore(0);
 }
 
 static inline unsigned long arch_local_irq_save(void)
@@ -179,8 +155,8 @@ static inline bool arch_irqs_disabled(void)
 #define raw_local_irq_pmu_save(flags)					\
 	do {								\
 		typecheck(unsigned long, flags);			\
-		flags = irq_soft_mask_or_return(IRQ_SOFT_MASK_STD |	\
-				IRQ_SOFT_MASK_PMU);			\
+		flags = irq_soft_mask_set_return(			\
+				IRQ_SOFT_MASK_STD | IRQ_SOFT_MASK_PMU);	\
 	} while(0)
 
 #define raw_local_irq_pmu_restore(flags)				\
diff --git a/arch/powerpc/include/asm/irqflags.h b/arch/powerpc/include/asm/irqflags.h
index 492b0a9fa352..19a2752868f8 100644
--- a/arch/powerpc/include/asm/irqflags.h
+++ b/arch/powerpc/include/asm/irqflags.h
@@ -47,14 +47,12 @@
  * be clobbered.
  */
 #define RECONCILE_IRQ_STATE(__rA, __rB)		\
-	lbz	__rA,PACAIRQSOFTMASK(r13);	\
 	lbz	__rB,PACAIRQHAPPENED(r13);	\
-	andi.	__rA,__rA,IRQ_SOFT_MASK_STD;	\
-	li	__rA,IRQ_SOFT_MASK_STD;		\
+	andi.	__rA,r14,IRQ_SOFT_MASK_STD;	\
+	ori	r14,r14,IRQ_SOFT_MASK_STD;	\
 	ori	__rB,__rB,PACA_IRQ_HARD_DIS;	\
 	stb	__rB,PACAIRQHAPPENED(r13);	\
 	bne	44f;				\
-	stb	__rA,PACAIRQSOFTMASK(r13);	\
 	TRACE_DISABLE_INTS;			\
 44:
 
@@ -64,9 +62,8 @@
 
 #define RECONCILE_IRQ_STATE(__rA, __rB)		\
 	lbz	__rA,PACAIRQHAPPENED(r13);	\
-	li	__rB,IRQ_SOFT_MASK_STD;		\
+	ori	r14,r14,IRQ_SOFT_MASK_STD;	\
 	ori	__rA,__rA,PACA_IRQ_HARD_DIS;	\
-	stb	__rB,PACAIRQSOFTMASK(r13);	\
 	stb	__rA,PACAIRQHAPPENED(r13)
 #endif
 #endif
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 08053e596753..028b7cefe089 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -873,7 +873,7 @@ static inline void kvmppc_fix_ee_before_entry(void)
 
 	/* Only need to enable IRQs by hard enabling them after this */
 	local_paca->irq_happened = 0;
-	irq_soft_mask_set(IRQ_SOFT_MASK_NONE);
+	__irq_soft_mask_clear(IRQ_SOFT_MASK_ALL);
 #endif
 }
 
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index cd3637f4ee4e..dbf80fff2f53 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -36,8 +36,10 @@
 register struct paca_struct *local_paca asm("r13");
 #ifdef CONFIG_PPC_BOOK3S
 
-#define R14_BIT_IO_SYNC			0x0001
-#define R14_BIT_IRQ_WORK_PENDING	0x0002 /* IRQ_WORK interrupt while soft-disable */
+#define R14_BIT_IRQ_SOFT_MASK_SHIFT	0
+#define R14_BIT_IRQ_SOFT_MASK		(0x3 << R14_BIT_IRQ_SOFT_MASK_SHIFT)
+#define R14_BIT_IO_SYNC			0x0004
+#define R14_BIT_IRQ_WORK_PENDING	0x0008 /* IRQ_WORK interrupt while soft-disable */
 
 /*
  * The top 32-bits of r14 is used as the per-cpu offset, shifted by PAGE_SHIFT.
@@ -79,6 +81,17 @@ static inline void r14_flip_bits(unsigned long mask)
 				: "0" (local_r14), "r" (mask));
 }
 
+static inline void r14_insert_bits(unsigned long source, unsigned long mask)
+{
+	unsigned long first = ffs(mask) - 1;
+	unsigned long last = fls64(mask) - 1;
+
+	mask >>= first;
+	asm volatile("rldimi	%0,%2,%3,%4\n"
+			: "=r" (local_r14)
+			: "0" (local_r14), "r" (source), "i" (first), "i" (63 - last));
+}
+
 static inline void r14_clear_bits(unsigned long mask)
 {
 	asm volatile("andc	%0,%0,%2\n"
@@ -211,7 +224,6 @@ struct paca_struct {
 	u64 saved_r1;			/* r1 save for RTAS calls or PM */
 	u64 saved_msr;			/* MSR saved here by enter_rtas */
 	u16 trap_save;			/* Used when bad stack is encountered */
-	u8 irq_soft_mask;		/* mask for irq soft masking */
 	u8 irq_happened;		/* irq happened while soft-disabled */
 	u8 nap_state_lost;		/* NV GPR values lost in power7_idle */
 	u64 sprg_vdso;			/* Saved user-visible sprg */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 32d393f55a96..c5c005d354b0 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -83,6 +83,12 @@ int main(void)
 #ifdef CONFIG_PPC64
 	DEFINE(SIGSEGV, SIGSEGV);
 	DEFINE(NMI_MASK, NMI_MASK);
+	DEFINE(R14_BIT_IRQ_SOFT_MASK_SHIFT, R14_BIT_IRQ_SOFT_MASK_SHIFT);
+	DEFINE(R14_BIT_IRQ_SOFT_MASK, R14_BIT_IRQ_SOFT_MASK);
+	DEFINE(IRQ_SOFT_MASK_STD, IRQ_SOFT_MASK_STD);
+	DEFINE(IRQ_SOFT_MASK_PMU, IRQ_SOFT_MASK_PMU);
+	DEFINE(IRQ_SOFT_MASK_ALL, IRQ_SOFT_MASK_ALL);
+
 	OFFSET(TASKTHREADPPR, task_struct, thread.ppr);
 #else
 	OFFSET(THREAD_INFO, task_struct, stack);
@@ -178,7 +184,6 @@ int main(void)
 	OFFSET(PACATOC, paca_struct, kernel_toc);
 	OFFSET(PACAKBASE, paca_struct, kernelbase);
 	OFFSET(PACAKMSR, paca_struct, kernel_msr);
-	OFFSET(PACAIRQSOFTMASK, paca_struct, irq_soft_mask);
 	OFFSET(PACAIRQHAPPENED, paca_struct, irq_happened);
 #ifdef CONFIG_PPC_BOOK3S
 	OFFSET(PACACONTEXTID, paca_struct, mm_ctx_id);
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 6b0e3ac311e8..dd06f8f874f3 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -141,8 +141,8 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
 	 * is correct
 	 */
 #if defined(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG) && defined(CONFIG_BUG)
-	lbz	r10,PACAIRQSOFTMASK(r13)
-1:	tdnei	r10,IRQ_SOFT_MASK_NONE
+	andi.	r10,r14,R14_BIT_IRQ_SOFT_MASK
+1:	tdnei	r10,0
 	EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
 #endif
 
@@ -158,7 +158,7 @@ system_call:			/* label this so stack traces look sane */
 	/* We do need to set SOFTE in the stack frame or the return
 	 * from interrupt will be painful
 	 */
-	li	r10,IRQ_SOFT_MASK_NONE
+	li	r10,0
 	std	r10,SOFTE(r1)
 
 	CURRENT_THREAD_INFO(r11, r1)
@@ -793,13 +793,12 @@ restore:
 	 * are about to re-enable interrupts
 	 */
 	ld	r5,SOFTE(r1)
-	lbz	r6,PACAIRQSOFTMASK(r13)
 	andi.	r5,r5,IRQ_SOFT_MASK_STD
 	bne	.Lrestore_irq_off
 
 	/* We are enabling, were we already enabled ? Yes, just return */
-	andi.	r6,r6,IRQ_SOFT_MASK_STD
-	beq	cr0,.Ldo_restore
+	andi.	r5,r14,IRQ_SOFT_MASK_STD
+	beq	.Ldo_restore
 
 	/*
 	 * We are about to soft-enable interrupts (we are hard disabled
@@ -817,8 +816,8 @@ restore:
 	 */
 .Lrestore_no_replay:
 	TRACE_ENABLE_INTS
-	li	r0,IRQ_SOFT_MASK_NONE
-	stb	r0,PACAIRQSOFTMASK(r13);
+	li	r0,R14_BIT_IRQ_SOFT_MASK
+	andc	r14,r14,r0
 
 	/*
 	 * Final return path. BookE is handled in a different file
@@ -1056,8 +1055,8 @@ _GLOBAL(enter_rtas)
 	/* There is no way it is acceptable to get here with interrupts enabled,
 	 * check it with the asm equivalent of WARN_ON
 	 */
-	lbz	r0,PACAIRQSOFTMASK(r13)
-1:	tdeqi	r0,IRQ_SOFT_MASK_NONE
+	andi.	r0,r14,R14_BIT_IRQ_SOFT_MASK
+1:	tdeqi	r0,0
 	EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
 #endif
 
diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 3b6de0ba4e03..0a7de4e13cf8 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -510,12 +510,15 @@ pnv_powersave_wakeup:
 BEGIN_FTR_SECTION
 BEGIN_FTR_SECTION_NESTED(70)
 	bl	power9_dd1_recover_paca
+	ld	r14,PACA_R14(r13)
 END_FTR_SECTION_NESTED_IFSET(CPU_FTR_POWER9_DD1, 70)
 	bl	pnv_restore_hyp_resource_arch300
 FTR_SECTION_ELSE
 	bl	pnv_restore_hyp_resource_arch207
 ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
 
+	ori	r14,r14,PACA_IRQ_HARD_DIS
+
 	li	r0,PNV_THREAD_RUNNING
 	stb	r0,PACA_THREAD_IDLE_STATE(r13)	/* Clear thread state */
 
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index e7619d144c15..2341029653e4 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -230,10 +230,13 @@ notrace void arch_local_irq_restore(unsigned long mask)
 	unsigned int replay;
 
 	/* Write the new soft-enabled value */
-	irq_soft_mask_set(mask);
+	__irq_soft_mask_insert(mask);
+	/* any bits still disabled */
 	if (mask)
 		return;
 
+	barrier();
+
 	/*
 	 * From this point onward, we can take interrupts, preempt,
 	 * etc... unless we got hard-disabled. We check if an event
@@ -261,6 +264,7 @@ notrace void arch_local_irq_restore(unsigned long mask)
 	 * then we are already hard disabled (there are other less
 	 * common cases that we'll ignore for now), so we skip the
 	 * (expensive) mtmsrd.
+	 * XXX: why not test & IRQ_HARD_DIS?
 	 */
 	if (unlikely(irq_happened != PACA_IRQ_HARD_DIS))
 		__hard_irq_disable();
@@ -277,7 +281,7 @@ notrace void arch_local_irq_restore(unsigned long mask)
 	}
 #endif
 
-	irq_soft_mask_set(IRQ_SOFT_MASK_ALL);
+	__irq_soft_mask_set(IRQ_SOFT_MASK_ALL);
 	trace_hardirqs_off();
 
 	/*
@@ -289,7 +293,7 @@ notrace void arch_local_irq_restore(unsigned long mask)
 
 	/* We can soft-enable now */
 	trace_hardirqs_on();
-	irq_soft_mask_set(IRQ_SOFT_MASK_NONE);
+	__irq_soft_mask_clear(IRQ_SOFT_MASK_ALL);
 
 	/*
 	 * And replay if we have to. This will return with interrupts
@@ -364,7 +368,7 @@ bool prep_irq_for_idle(void)
 	 * of entering the low power state.
 	 */
 	local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
-	irq_soft_mask_set(IRQ_SOFT_MASK_NONE);
+	__irq_soft_mask_clear(IRQ_SOFT_MASK_ALL);
 
 	/* Tell the caller to enter the low power state */
 	return true;
diff --git a/arch/powerpc/kernel/optprobes_head.S b/arch/powerpc/kernel/optprobes_head.S
index 98a3aeeb3c8c..c8f106e6bc70 100644
--- a/arch/powerpc/kernel/optprobes_head.S
+++ b/arch/powerpc/kernel/optprobes_head.S
@@ -58,8 +58,7 @@ optprobe_template_entry:
 	std	r5,_XER(r1)
 	mfcr	r5
 	std	r5,_CCR(r1)
-	lbz     r5,PACAIRQSOFTMASK(r13)
-	std     r5,SOFTE(r1)
+	std     r14,SOFTE(r1)
 
 	/*
 	 * We may get here from a module, so load the kernel TOC in r2.
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index f56e42f06f24..5914da5db7d9 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1675,7 +1675,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
 			childregs->gpr[FIRST_NVGPR] = ppc_function_entry((void *)usp);
 #ifdef CONFIG_PPC64
 		clear_tsk_thread_flag(p, TIF_32BIT);
-		childregs->softe = IRQ_SOFT_MASK_NONE;
+		childregs->softe = 0;
 #endif
 		childregs->gpr[FIRST_NVGPR + 1] = kthread_arg;
 		p->thread.regs = NULL;	/* no user register state */
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index f4a96ebb523a..6e4f4e46b76b 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -190,11 +190,12 @@ static void __init fixup_boot_paca(void)
 	get_paca()->cpu_start = 1;
 	/* Allow percpu accesses to work until we setup percpu data */
 	get_paca()->data_offset = 0;
-	/* Mark interrupts disabled in PACA */
-	irq_soft_mask_set(IRQ_SOFT_MASK_STD);
 	/* Set r14 and paca_r14 to zero */
 	get_paca()->r14 = 0;
 	local_r14 = get_paca()->r14;
+
+	/* Mark interrupts disabled in PACA or r14 */
+	__irq_soft_mask_set(IRQ_SOFT_MASK_STD);
 }
 
 static void __init configure_exceptions(void)
@@ -356,8 +357,8 @@ void __init early_setup(unsigned long dt_ptr)
 #ifdef CONFIG_SMP
 void early_setup_secondary(void)
 {
-	/* Mark interrupts disabled in PACA */
-	irq_soft_mask_set(IRQ_SOFT_MASK_STD);
+	/* Mark interrupts disabled in r14 */
+	__irq_soft_mask_set(IRQ_SOFT_MASK_STD);
 
 	/* Initialize the hash table or TLB handling */
 	early_init_mmu_secondary();
@@ -768,7 +769,7 @@ void __init setup_per_cpu_areas(void)
 		/* The top 48 bits are used for per-cpu data */
 		paca[cpu].r14 |= paca[cpu].data_offset << 16;
 	}
-	local_r14 = paca[smp_processor_id()].r14;
+	local_r14 |= paca[smp_processor_id()].r14;
 }
 #endif
 
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index a92ad8500917..88455123352c 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -3249,11 +3249,10 @@ kvmppc_bad_host_intr:
 	mfctr	r4
 #endif
 	mfxer	r5
-	lbz	r6, PACAIRQSOFTMASK(r13)
 	std	r3, _LINK(r1)
 	std	r4, _CTR(r1)
 	std	r5, _XER(r1)
-	std	r6, SOFTE(r1)
+	std	r14, SOFTE(r1)
 	ld	r2, PACATOC(r13)
 	LOAD_REG_IMMEDIATE(3, 0x7265677368657265)
 	std	r3, STACK_FRAME_OVERHEAD-16(r1)
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 7d2bb26ff333..d7d3885035f2 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -1622,8 +1622,8 @@ static void excprint(struct pt_regs *fp)
 
 	printf("  current = 0x%lx\n", current);
 #ifdef CONFIG_PPC64
-	printf("  paca    = 0x%lx\t softe: %d\t irq_happened: 0x%02x\n",
-	       local_paca, local_paca->irq_soft_mask, local_paca->irq_happened);
+	printf("  paca    = 0x%lx\t r14: 0x%lx\t irq_happened: 0x%02x\n",
+	       local_paca, local_r14, local_paca->irq_happened);
 #endif
 	if (current) {
 		printf("    pid   = %ld, comm = %s\n",
@@ -2391,7 +2391,6 @@ static void dump_one_paca(int cpu)
 	DUMP(p, stab_rr, "lx");
 	DUMP(p, saved_r1, "lx");
 	DUMP(p, trap_save, "x");
-	DUMP(p, irq_soft_mask, "x");
 	DUMP(p, irq_happened, "x");
 	DUMP(p, nap_state_lost, "x");
 	DUMP(p, sprg_vdso, "llx");
-- 
2.15.0

* [RFC PATCH 7/8] powerpc/64s: put irq_soft_mask and irq_happened bits into r14
  2017-12-20 14:51 [RFC PATCH 0/8] use r14 for a per-cpu kernel register Nicholas Piggin
                   ` (5 preceding siblings ...)
  2017-12-20 14:52 ` [RFC PATCH 6/8] powerpc/64s: put irq_soft_mask bits " Nicholas Piggin
@ 2017-12-20 14:52 ` Nicholas Piggin
  2017-12-20 14:52 ` [RFC PATCH 8/8] powerpc/64s: inline local_irq_enable/restore Nicholas Piggin
  7 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2017-12-20 14:52 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

This should be split into two patches: one for irq_happened and one for
soft_mask. It may not be worth putting all of the irq_happened bits into
r14; a single "an irq did happen" bit may be good enough, with the
detailed reason then loaded from a paca variable.
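
A sketch of that alternative (hypothetical, not what this patch does):
keep one summary bit in r14 and leave the detailed reasons in the paca:

	/* R14_BIT_IRQ_HAPPENED is hypothetical here: a single summary flag. */
	static inline bool lazy_irq_pending_alt(void)
	{
		if (!(local_r14 & R14_BIT_IRQ_HAPPENED))
			return false;		/* fast path: one register test */
		/* slow path: the detailed reasons stay in the paca */
		return !!(get_paca()->irq_happened & ~PACA_IRQ_HARD_DIS);
	}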
---
 arch/powerpc/include/asm/hw_irq.h    | 23 +++++++++----
 arch/powerpc/include/asm/irqflags.h  | 21 +++++-------
 arch/powerpc/include/asm/kvm_ppc.h   |  4 +--
 arch/powerpc/include/asm/paca.h      |  7 ++--
 arch/powerpc/kernel/asm-offsets.c    |  9 +++++-
 arch/powerpc/kernel/entry_64.S       | 13 ++++----
 arch/powerpc/kernel/exceptions-64s.S |  4 +--
 arch/powerpc/kernel/head_64.S        | 15 ++-------
 arch/powerpc/kernel/irq.c            | 62 ++++++++++++++----------------------
 arch/powerpc/kvm/book3s_hv.c         |  6 ++--
 arch/powerpc/xmon/xmon.c             |  4 +--
 11 files changed, 76 insertions(+), 92 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index 9ba445de989d..f492a7779ea3 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -22,6 +22,7 @@
  * and allow a proper replay. Additionally, PACA_IRQ_HARD_DIS
  * is set whenever we manually hard disable.
  */
+#ifdef CONFIG_PPC_BOOK3E
 #define PACA_IRQ_HARD_DIS	0x01
 #define PACA_IRQ_DBELL		0x02
 #define PACA_IRQ_EE		0x04
@@ -30,14 +31,22 @@
 #define PACA_IRQ_HMI		0x20
 #define PACA_IRQ_PMI		0x40
 
+#else /* CONFIG_PPC_BOOK3E */
 /*
- * 64s uses r14 rather than paca for irq_soft_mask
+ * 64s uses r14 rather than paca for irq_soft_mask and irq_happened
  */
-#ifdef CONFIG_PPC_BOOK3S
+
+#define PACA_IRQ_HARD_DIS	(0x01 << R14_BIT_IRQ_HAPPENED_SHIFT)
+#define PACA_IRQ_DBELL		(0x02 << R14_BIT_IRQ_HAPPENED_SHIFT)
+#define PACA_IRQ_EE		(0x04 << R14_BIT_IRQ_HAPPENED_SHIFT)
+#define PACA_IRQ_DEC		(0x08 << R14_BIT_IRQ_HAPPENED_SHIFT)
+#define PACA_IRQ_HMI		(0x10 << R14_BIT_IRQ_HAPPENED_SHIFT)
+#define PACA_IRQ_PMI		(0x20 << R14_BIT_IRQ_HAPPENED_SHIFT)
+
 #define IRQ_SOFT_MASK_STD	(0x01 << R14_BIT_IRQ_SOFT_MASK_SHIFT)
 #define IRQ_SOFT_MASK_PMU	(0x02 << R14_BIT_IRQ_SOFT_MASK_SHIFT)
 #define IRQ_SOFT_MASK_ALL	(0x03 << R14_BIT_IRQ_SOFT_MASK_SHIFT)
-#endif /* CONFIG_PPC_BOOK3S */
+#endif /* CONFIG_PPC_BOOK3E */
 
 #endif /* CONFIG_PPC64 */
 
@@ -206,14 +215,14 @@ static inline bool arch_irqs_disabled(void)
 	unsigned long flags;						\
 	__hard_irq_disable();						\
 	flags = irq_soft_mask_set_return(IRQ_SOFT_MASK_ALL);		\
-	local_paca->irq_happened |= PACA_IRQ_HARD_DIS;			\
+	r14_set_bits(PACA_IRQ_HARD_DIS);				\
 	if (!arch_irqs_disabled_flags(flags))				\
 		trace_hardirqs_off();					\
 } while(0)
 
 static inline bool lazy_irq_pending(void)
 {
-	return !!(get_paca()->irq_happened & ~PACA_IRQ_HARD_DIS);
+	return !!(local_r14 & R14_BIT_IRQ_HAPPENED_MASK & ~PACA_IRQ_HARD_DIS);
 }
 
 /*
@@ -223,8 +232,8 @@ static inline bool lazy_irq_pending(void)
  */
 static inline void may_hard_irq_enable(void)
 {
-	get_paca()->irq_happened &= ~PACA_IRQ_HARD_DIS;
-	if (!(get_paca()->irq_happened & PACA_IRQ_EE))
+	r14_clear_bits(PACA_IRQ_HARD_DIS);
+	if (!(local_r14 & PACA_IRQ_EE))
 		__hard_irq_enable();
 }
 
diff --git a/arch/powerpc/include/asm/irqflags.h b/arch/powerpc/include/asm/irqflags.h
index 19a2752868f8..140e51b9f436 100644
--- a/arch/powerpc/include/asm/irqflags.h
+++ b/arch/powerpc/include/asm/irqflags.h
@@ -45,26 +45,21 @@
  *
  * NB: This may call C code, so the caller must be prepared for volatiles to
  * be clobbered.
+ * XXX: could make this single-register now
  */
-#define RECONCILE_IRQ_STATE(__rA, __rB)		\
-	lbz	__rB,PACAIRQHAPPENED(r13);	\
-	andi.	__rA,r14,IRQ_SOFT_MASK_STD;	\
-	ori	r14,r14,IRQ_SOFT_MASK_STD;	\
-	ori	__rB,__rB,PACA_IRQ_HARD_DIS;	\
-	stb	__rB,PACAIRQHAPPENED(r13);	\
-	bne	44f;				\
-	TRACE_DISABLE_INTS;			\
+#define RECONCILE_IRQ_STATE(__rA, __rB)					\
+	andi.	__rA,r14,IRQ_SOFT_MASK_STD;				\
+	ori	r14,r14,(PACA_IRQ_HARD_DIS | IRQ_SOFT_MASK_STD);	\
+	bne	44f;							\
+	TRACE_DISABLE_INTS;						\
 44:
 
 #else
 #define TRACE_ENABLE_INTS
 #define TRACE_DISABLE_INTS
 
-#define RECONCILE_IRQ_STATE(__rA, __rB)		\
-	lbz	__rA,PACAIRQHAPPENED(r13);	\
-	ori	r14,r14,IRQ_SOFT_MASK_STD;	\
-	ori	__rA,__rA,PACA_IRQ_HARD_DIS;	\
-	stb	__rA,PACAIRQHAPPENED(r13)
+#define RECONCILE_IRQ_STATE(__rA, __rB)					\
+	ori	r14,r14,(PACA_IRQ_HARD_DIS | IRQ_SOFT_MASK_STD)
 #endif
 #endif
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 028b7cefe089..45202e124003 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -869,10 +869,10 @@ static inline void kvmppc_fix_ee_before_entry(void)
 	 * To avoid races, the caller must have gone directly from having
 	 * interrupts fully-enabled to hard-disabled.
 	 */
-	WARN_ON(local_paca->irq_happened != PACA_IRQ_HARD_DIS);
+	WARN_ON((local_r14 & R14_BIT_IRQ_HAPPENED_MASK) != PACA_IRQ_HARD_DIS);
 
 	/* Only need to enable IRQs by hard enabling them after this */
-	local_paca->irq_happened = 0;
+	r14_clear_bits(R14_BIT_IRQ_HAPPENED_MASK);
 	__irq_soft_mask_clear(IRQ_SOFT_MASK_ALL);
 #endif
 }
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index dbf80fff2f53..4edfcdecb268 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -38,8 +38,10 @@ register struct paca_struct *local_paca asm("r13");
 
 #define R14_BIT_IRQ_SOFT_MASK_SHIFT	0
 #define R14_BIT_IRQ_SOFT_MASK		(0x3 << R14_BIT_IRQ_SOFT_MASK_SHIFT)
-#define R14_BIT_IO_SYNC			0x0004
-#define R14_BIT_IRQ_WORK_PENDING	0x0008 /* IRQ_WORK interrupt while soft-disable */
+#define R14_BIT_IRQ_HAPPENED_SHIFT	2
+#define R14_BIT_IRQ_HAPPENED_MASK	(0x3f << R14_BIT_IRQ_HAPPENED_SHIFT) /* irq happened while soft-disabled */
+#define R14_BIT_IO_SYNC			0x0100
+#define R14_BIT_IRQ_WORK_PENDING	0x0200 /* IRQ_WORK interrupt while soft-disable */
 
 /*
  * The top 32-bits of r14 is used as the per-cpu offset, shifted by PAGE_SHIFT.
@@ -224,7 +226,6 @@ struct paca_struct {
 	u64 saved_r1;			/* r1 save for RTAS calls or PM */
 	u64 saved_msr;			/* MSR saved here by enter_rtas */
 	u16 trap_save;			/* Used when bad stack is encountered */
-	u8 irq_happened;		/* irq happened while soft-disabled */
 	u8 nap_state_lost;		/* NV GPR values lost in power7_idle */
 	u64 sprg_vdso;			/* Saved user-visible sprg */
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index c5c005d354b0..2f03f778baf2 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -88,6 +88,14 @@ int main(void)
 	DEFINE(IRQ_SOFT_MASK_STD, IRQ_SOFT_MASK_STD);
 	DEFINE(IRQ_SOFT_MASK_PMU, IRQ_SOFT_MASK_PMU);
 	DEFINE(IRQ_SOFT_MASK_ALL, IRQ_SOFT_MASK_ALL);
+	DEFINE(R14_BIT_IRQ_HAPPENED_SHIFT, R14_BIT_IRQ_HAPPENED_SHIFT);
+	DEFINE(R14_BIT_IRQ_HAPPENED_MASK, R14_BIT_IRQ_HAPPENED_MASK);
+	DEFINE(PACA_IRQ_HARD_DIS, PACA_IRQ_HARD_DIS);
+	DEFINE(PACA_IRQ_EE, PACA_IRQ_EE);
+	DEFINE(PACA_IRQ_DBELL, PACA_IRQ_DBELL);
+	DEFINE(PACA_IRQ_DEC, PACA_IRQ_DEC);
+	DEFINE(PACA_IRQ_HMI, PACA_IRQ_HMI);
+	DEFINE(PACA_IRQ_PMI, PACA_IRQ_PMI);
 
 	OFFSET(TASKTHREADPPR, task_struct, thread.ppr);
 #else
@@ -184,7 +192,6 @@ int main(void)
 	OFFSET(PACATOC, paca_struct, kernel_toc);
 	OFFSET(PACAKBASE, paca_struct, kernelbase);
 	OFFSET(PACAKMSR, paca_struct, kernel_msr);
-	OFFSET(PACAIRQHAPPENED, paca_struct, irq_happened);
 #ifdef CONFIG_PPC_BOOK3S
 	OFFSET(PACACONTEXTID, paca_struct, mm_ctx_id);
 #ifdef CONFIG_PPC_MM_SLICES
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index dd06f8f874f3..33e596d587fb 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -805,8 +805,7 @@ restore:
 	 * at this point). We check if there's anything that needs to
 	 * be replayed first.
 	 */
-	lbz	r0,PACAIRQHAPPENED(r13)
-	cmpwi	cr0,r0,0
+	andi.	r0,r14,R14_BIT_IRQ_HAPPENED_MASK
 	bne-	.Lrestore_check_irq_replay
 
 	/*
@@ -919,16 +918,16 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	 */
 .Lrestore_irq_off:
 	ld	r3,_MSR(r1)
-	lbz	r7,PACAIRQHAPPENED(r13)
+	ori	r14,r14,IRQ_SOFT_MASK_STD
 	andi.	r0,r3,MSR_EE
 	beq	1f
-	rlwinm	r7,r7,0,~PACA_IRQ_HARD_DIS
-	stb	r7,PACAIRQHAPPENED(r13)
+	li	r0,PACA_IRQ_HARD_DIS
+	andc	r14,r14,r0
 1:
 #if defined(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG) && defined(CONFIG_BUG)
 	/* The interrupt should not have soft enabled. */
-	lbz	r7,PACAIRQSOFTMASK(r13)
-	tdeqi	r7,IRQ_SOFT_MASK_NONE
+	andi.	r7,r14,R14_BIT_IRQ_SOFT_MASK
+1:	tdeqi	r7,0
 	EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
 #endif
 	b	.Ldo_restore
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index ca962bf85b8a..4da2b586e29e 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1404,9 +1404,7 @@ EXC_COMMON_BEGIN(soft_nmi_common)
  */
 #define MASKED_INTERRUPT(_H)				\
 masked_##_H##interrupt:					\
-	lbz	r11,PACAIRQHAPPENED(r13);		\
-	or	r11,r11,r10;				\
-	stb	r11,PACAIRQHAPPENED(r13);		\
+	or	r14,r14,r10;				\
 	cmpwi	r10,PACA_IRQ_DEC;			\
 	bne	1f;					\
 	lis	r10,0x7fff;				\
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index cdb710f43681..60de1aa6ef85 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -767,10 +767,7 @@ _GLOBAL(pmac_secondary_start)
 	/* Mark interrupts soft and hard disabled (they might be enabled
 	 * in the PACA when doing hotplug)
 	 */
-	li	r0,IRQ_SOFT_MASK_STD
-	stb	r0,PACAIRQSOFTMASK(r13)
-	li	r0,PACA_IRQ_HARD_DIS
-	stb	r0,PACAIRQHAPPENED(r13)
+	ori	r14,r14,(IRQ_SOFT_MASK_STD | PACA_IRQ_HARD_DIS)
 
 	/* Create a temp kernel stack for use before relocation is on.	*/
 	ld	r1,PACAEMERGSP(r13)
@@ -824,10 +821,7 @@ __secondary_start:
 	/* Mark interrupts soft and hard disabled (they might be enabled
 	 * in the PACA when doing hotplug)
 	 */
-	li	r7,IRQ_SOFT_MASK_STD
-	stb	r7,PACAIRQSOFTMASK(r13)
-	li	r0,PACA_IRQ_HARD_DIS
-	stb	r0,PACAIRQHAPPENED(r13)
+	ori	r14,r14,(IRQ_SOFT_MASK_STD | PACA_IRQ_HARD_DIS)
 
 	/* enable MMU and jump to start_secondary */
 	LOAD_REG_ADDR(r3, start_secondary_prolog)
@@ -991,10 +985,7 @@ start_here_common:
 	/* Mark interrupts soft and hard disabled (they might be enabled
 	 * in the PACA when doing hotplug)
 	 */
-	li	r0,IRQ_SOFT_MASK_STD
-	stb	r0,PACAIRQSOFTMASK(r13)
-	li	r0,PACA_IRQ_HARD_DIS
-	stb	r0,PACAIRQHAPPENED(r13)
+	ori	r14,r14,(IRQ_SOFT_MASK_STD | PACA_IRQ_HARD_DIS)
 
 	/* Generic kernel entry */
 	bl	start_kernel
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 2341029653e4..ebaf210a7406 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -99,12 +99,7 @@ int distribute_irqs = 1;
 
 static inline notrace unsigned long get_irq_happened(void)
 {
-	unsigned long happened;
-
-	__asm__ __volatile__("lbz %0,%1(13)"
-	: "=r" (happened) : "i" (offsetof(struct paca_struct, irq_happened)));
-
-	return happened;
+	return local_r14 & R14_BIT_IRQ_HAPPENED_MASK;
 }
 
 static inline notrace int decrementer_check_overflow(void)
@@ -131,13 +126,6 @@ static inline notrace int decrementer_check_overflow(void)
  */
 notrace unsigned int __check_irq_replay(void)
 {
-	/*
-	 * We use local_paca rather than get_paca() to avoid all
-	 * the debug_smp_processor_id() business in this low level
-	 * function
-	 */
-	unsigned char happened = local_paca->irq_happened;
-
 	/*
 	 * We are responding to the next interrupt, so interrupt-off
 	 * latencies should be reset here.
@@ -145,20 +133,17 @@ notrace unsigned int __check_irq_replay(void)
 	trace_hardirqs_on();
 	trace_hardirqs_off();
 
-	if (happened & PACA_IRQ_HARD_DIS) {
+	if (local_r14 & PACA_IRQ_HARD_DIS) {
 		/* Clear bit 0 which we wouldn't clear otherwise */
-		local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
-
+		r14_clear_bits(PACA_IRQ_HARD_DIS);
 		/*
 		 * We may have missed a decrementer interrupt if hard disabled.
 		 * Check the decrementer register in case we had a rollover
 		 * while hard disabled.
 		 */
-		if (!(happened & PACA_IRQ_DEC)) {
-			if (decrementer_check_overflow()) {
-				local_paca->irq_happened |= PACA_IRQ_DEC;
-				happened |= PACA_IRQ_DEC;
-			}
+		if (!(local_r14 & PACA_IRQ_DEC)) {
+			if (decrementer_check_overflow())
+				r14_set_bits(PACA_IRQ_DEC);
 		}
 	}
 
@@ -176,23 +161,24 @@ notrace unsigned int __check_irq_replay(void)
 	 * This is a higher priority interrupt than the others, so
 	 * replay it first.
 	 */
-	if (happened & PACA_IRQ_HMI) {
-		local_paca->irq_happened &= ~PACA_IRQ_HMI;
+	if (local_r14 & PACA_IRQ_HMI) {
+		r14_clear_bits(PACA_IRQ_HMI);
 		return 0xe60;
 	}
 
-	if (happened & PACA_IRQ_DEC) {
-		local_paca->irq_happened &= ~PACA_IRQ_DEC;
+	if (local_r14 & PACA_IRQ_DEC) {
+		r14_clear_bits(PACA_IRQ_DEC);
 		return 0x900;
 	}
 
-	if (happened & PACA_IRQ_PMI) {
-		local_paca->irq_happened &= ~PACA_IRQ_PMI;
+	if (local_r14 & PACA_IRQ_PMI) {
+		r14_clear_bits(PACA_IRQ_PMI);
 		return 0xf00;
 	}
 
-	if (happened & PACA_IRQ_EE) {
-		local_paca->irq_happened &= ~PACA_IRQ_EE;
+	/* Finally check if an external interrupt happened */
+	if (local_r14 & PACA_IRQ_EE) {
+		r14_clear_bits(PACA_IRQ_EE);
 		return 0x500;
 	}
 
@@ -212,14 +198,14 @@ notrace unsigned int __check_irq_replay(void)
 		return 0x280;
 	}
 #else
-	if (happened & PACA_IRQ_DBELL) {
-		local_paca->irq_happened &= ~PACA_IRQ_DBELL;
+	if (local_r14 & PACA_IRQ_DBELL) {
+		r14_clear_bits(PACA_IRQ_DBELL);
 		return 0xa00;
 	}
 #endif /* CONFIG_PPC_BOOK3E */
 
 	/* There should be nothing left ! */
-	BUG_ON(local_paca->irq_happened != 0);
+	BUG_ON((local_r14 & R14_BIT_IRQ_HAPPENED_MASK) != 0);
 
 	return 0;
 }
@@ -321,7 +307,7 @@ EXPORT_SYMBOL(arch_local_irq_restore);
 void notrace restore_interrupts(void)
 {
 	if (irqs_disabled()) {
-		local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
+		r14_set_bits(PACA_IRQ_HARD_DIS);
 		local_irq_enable();
 	} else
 		__hard_irq_enable();
@@ -349,7 +335,7 @@ bool prep_irq_for_idle(void)
 	 * occurs before we effectively enter the low power state
 	 */
 	__hard_irq_disable();
-	local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
+	r14_set_bits(PACA_IRQ_HARD_DIS);
 
 	/*
 	 * If anything happened while we were soft-disabled,
@@ -367,7 +353,7 @@ bool prep_irq_for_idle(void)
 	 * are about to hard enable as well as a side effect
 	 * of entering the low power state.
 	 */
-	local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
+	r14_clear_bits(PACA_IRQ_HARD_DIS);
 	__irq_soft_mask_clear(IRQ_SOFT_MASK_ALL);
 
 	/* Tell the caller to enter the low power state */
@@ -390,7 +376,7 @@ bool prep_irq_for_idle_irqsoff(void)
 	 * occurs before we effectively enter the low power state
 	 */
 	__hard_irq_disable();
-	local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
+	r14_set_bits(PACA_IRQ_HARD_DIS);
 
 	/*
 	 * If anything happened while we were soft-disabled,
@@ -465,7 +451,7 @@ void irq_set_pending_from_srr1(unsigned long srr1)
 	 * If a future CPU was to designate this as an interrupt reason,
 	 * then a new index for no interrupt must be assigned.
 	 */
-	local_paca->irq_happened |= reason;
+	r14_set_bits(reason);
 }
 #endif /* CONFIG_PPC_BOOK3S */
 
@@ -481,7 +467,7 @@ void force_external_irq_replay(void)
 	WARN_ON(!arch_irqs_disabled());
 
 	/* Indicate in the PACA that we have an interrupt to replay */
-	local_paca->irq_happened |= PACA_IRQ_EE;
+	r14_set_bits(PACA_IRQ_EE);
 }
 
 #endif /* CONFIG_PPC64 */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2d46037ce936..870dd835c8b6 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2614,13 +2614,13 @@ static void set_irq_happened(int trap)
 {
 	switch (trap) {
 	case BOOK3S_INTERRUPT_EXTERNAL:
-		local_paca->irq_happened |= PACA_IRQ_EE;
+		r14_set_bits(PACA_IRQ_EE);
 		break;
 	case BOOK3S_INTERRUPT_H_DOORBELL:
-		local_paca->irq_happened |= PACA_IRQ_DBELL;
+		r14_set_bits(PACA_IRQ_DBELL);
 		break;
 	case BOOK3S_INTERRUPT_HMI:
-		local_paca->irq_happened |= PACA_IRQ_HMI;
+		r14_set_bits(PACA_IRQ_HMI);
 		break;
 	case BOOK3S_INTERRUPT_SYSTEM_RESET:
 		replay_system_reset();
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index d7d3885035f2..df73f8a1f030 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -1622,8 +1622,7 @@ static void excprint(struct pt_regs *fp)
 
 	printf("  current = 0x%lx\n", current);
 #ifdef CONFIG_PPC64
-	printf("  paca    = 0x%lx\t r14: 0x%lx\t irq_happened: 0x%02x\n",
-	       local_paca, local_r14, local_paca->irq_happened);
+	printf("  paca    = 0x%lx\t r14: %lx\n", local_paca, local_r14);
 #endif
 	if (current) {
 		printf("    pid   = %ld, comm = %s\n",
@@ -2391,7 +2390,6 @@ static void dump_one_paca(int cpu)
 	DUMP(p, stab_rr, "lx");
 	DUMP(p, saved_r1, "lx");
 	DUMP(p, trap_save, "x");
-	DUMP(p, irq_happened, "x");
 	DUMP(p, nap_state_lost, "x");
 	DUMP(p, sprg_vdso, "llx");
 
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 8/8] powerpc/64s: inline local_irq_enable/restore
  2017-12-20 14:51 [RFC PATCH 0/8] use r14 for a per-cpu kernel register Nicholas Piggin
                   ` (6 preceding siblings ...)
  2017-12-20 14:52 ` [RFC PATCH 7/8] powerpc/64s: put irq_soft_mask and irq_happened " Nicholas Piggin
@ 2017-12-20 14:52 ` Nicholas Piggin
  7 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2017-12-20 14:52 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

This does increase kernel text size by about 0.4%, but the generated
code is often improved: the interrupt-replay call moves out of line,
and gcc's function "shrink wrapping" can more often avoid setting up
a stack frame. For example, the _raw_spin_unlock_irqrestore fast path
before:

	<_raw_spin_unlock_irqrestore>:
	addis   r2,r12,63
	addi    r2,r2,24688
	mflr    r0
	andi.   r9,r14,256
	mr      r9,r3
	std     r0,16(r1)
	stdu    r1,-32(r1)
	bne     c0000000009fd1e0 <_raw_spin_unlock_irqrestore+0x50>
	lwsync
	li      r10,0
	mr      r3,r4
	stw     r10,0(r9)
	bl      c000000000013f98 <arch_local_irq_restore+0x8>

		<arch_local_irq_restore>:
		addis   r2,r12,222
		addi    r2,r2,-3472
		rldimi  r14,r3,0,62
		cmpdi   cr7,r3,0
		bnelr   cr7
		andi.   r9,r14,252
		beqlr

	nop
	addi    r1,r1,32
	ld      r0,16(r1)
	mtlr    r0
	blr

And after:

	<_raw_spin_unlock_irqrestore>:
	addis   r2,r12,64
	addi    r2,r2,-15200
	andi.   r9,r14,256
	bne     c000000000a06dd0 <_raw_spin_unlock_irqrestore+0x70>
	lwsync
	li      r9,0
	stw     r9,0(r3)
	rldimi  r14,r4,0,62
	cmpdi   cr7,r4,0
	bne     cr7,c000000000a06d90 <_raw_spin_unlock_irqrestore+0x30>
	andi.   r9,r14,252
	bne     c000000000a06da0 <_raw_spin_unlock_irqrestore+0x40>
	blr

GCC could still improve code size on the slow paths, for example by
not aligning cold branch targets, so there is room to reduce the text
size cost further.
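
For readers unfamiliar with shrink wrapping, here is a minimal
stand-alone sketch of the pattern this patch relies on (the names are
made up for illustration, not taken from the series): when the common
path contains no calls, the compiler can skip or defer the prologue,
and only the unlikely branch that reaches the out-of-line helper pays
for a stack frame.

	/* Illustrative only -- not kernel code */
	#include <stdio.h>

	static __attribute__((noinline)) void slow_path(void)
	{
		puts("replay pending interrupt");	/* stand-in for __arch_local_irq_enable() */
	}

	static inline void restore_sketch(unsigned long mask, unsigned long pending)
	{
		if (mask)			/* still soft-masked: nothing more to do */
			return;
		if (__builtin_expect(pending != 0, 0))
			slow_path();		/* only this cold branch needs a call */
	}

	int main(void)
	{
		restore_sketch(0, 0);		/* fast path: falls straight through */
		restore_sketch(0, 1);		/* slow path: takes the out-of-line call */
		return 0;
	}

Built with -O2, a caller whose common path never reaches slow_path()
can run that path without setting up a stack frame at all, which is
the effect visible in the _raw_spin_unlock_irqrestore listings above.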
---
 arch/powerpc/include/asm/hw_irq.h | 15 +++++++++++++--
 arch/powerpc/kernel/irq.c         | 28 ++++++----------------------
 2 files changed, 19 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index f492a7779ea3..8690e0d5605d 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -132,11 +132,22 @@ static inline void arch_local_irq_disable(void)
 	irq_soft_mask_set(IRQ_SOFT_MASK_STD);
 }
 
-extern void arch_local_irq_restore(unsigned long);
+extern void __arch_local_irq_enable(void);
 
 static inline void arch_local_irq_enable(void)
 {
-	arch_local_irq_restore(0);
+	__irq_soft_mask_clear(IRQ_SOFT_MASK_ALL);
+	if (unlikely(local_r14 & R14_BIT_IRQ_HAPPENED_MASK))
+		__arch_local_irq_enable();
+}
+
+static inline void arch_local_irq_restore(unsigned long flags)
+{
+	__irq_soft_mask_insert(flags);
+	if (!flags) {
+		if (unlikely(local_r14 & R14_BIT_IRQ_HAPPENED_MASK))
+			__arch_local_irq_enable();
+	}
 }
 
 static inline unsigned long arch_local_irq_save(void)
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index ebaf210a7406..e2ff0210477e 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -97,11 +97,6 @@ extern int tau_interrupts(int);
 
 int distribute_irqs = 1;
 
-static inline notrace unsigned long get_irq_happened(void)
-{
-	return local_r14 & R14_BIT_IRQ_HAPPENED_MASK;
-}
-
 static inline notrace int decrementer_check_overflow(void)
 {
  	u64 now = get_tb_or_rtc();
@@ -210,19 +205,10 @@ notrace unsigned int __check_irq_replay(void)
 	return 0;
 }
 
-notrace void arch_local_irq_restore(unsigned long mask)
+notrace void __arch_local_irq_enable(void)
 {
-	unsigned char irq_happened;
 	unsigned int replay;
 
-	/* Write the new soft-enabled value */
-	__irq_soft_mask_insert(mask);
-	/* any bits still disabled */
-	if (mask)
-		return;
-
-	barrier();
-
 	/*
 	 * From this point onward, we can take interrupts, preempt,
 	 * etc... unless we got hard-disabled. We check if an event
@@ -236,9 +222,6 @@ notrace void arch_local_irq_restore(unsigned long mask)
 	 * be hard-disabled, so there is no problem, we
 	 * cannot have preempted.
 	 */
-	irq_happened = get_irq_happened();
-	if (!irq_happened)
-		return;
 
 	/*
 	 * We need to hard disable to get a trusted value from
@@ -252,10 +235,11 @@ notrace void arch_local_irq_restore(unsigned long mask)
 	 * (expensive) mtmsrd.
 	 * XXX: why not test & IRQ_HARD_DIS?
 	 */
-	if (unlikely(irq_happened != PACA_IRQ_HARD_DIS))
+	if (unlikely((local_r14 & R14_BIT_IRQ_HAPPENED_MASK) !=
+						PACA_IRQ_HARD_DIS)) {
 		__hard_irq_disable();
 #ifdef CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
-	else {
+	} else {
 		/*
 		 * We should already be hard disabled here. We had bugs
 		 * where that wasn't the case so let's dbl check it and
@@ -264,8 +248,8 @@ notrace void arch_local_irq_restore(unsigned long mask)
 		 */
 		if (WARN_ON(mfmsr() & MSR_EE))
 			__hard_irq_disable();
-	}
 #endif
+	}
 
 	__irq_soft_mask_set(IRQ_SOFT_MASK_ALL);
 	trace_hardirqs_off();
@@ -293,7 +277,7 @@ notrace void arch_local_irq_restore(unsigned long mask)
 	/* Finally, let's ensure we are hard enabled */
 	__hard_irq_enable();
 }
-EXPORT_SYMBOL(arch_local_irq_restore);
+EXPORT_SYMBOL(__arch_local_irq_enable);
 
 /*
  * This is specifically called by assembly code to re-enable interrupts
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 3/8] powerpc/64s: put the per-cpu data_offset in r14
  2017-12-20 14:52 ` [RFC PATCH 3/8] powerpc/64s: put the per-cpu data_offset in r14 Nicholas Piggin
@ 2017-12-20 17:53   ` Gabriel Paubert
  2017-12-22 13:50     ` Nicholas Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: Gabriel Paubert @ 2017-12-20 17:53 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev

On Thu, Dec 21, 2017 at 12:52:01AM +1000, Nicholas Piggin wrote:
> Shifted left by 16 bits, so the low 16 bits of r14 remain available.
> This allows per-cpu pointers to be dereferenced with a single extra
> shift whereas previously it was a load and add.
> ---
>  arch/powerpc/include/asm/paca.h   |  5 +++++
>  arch/powerpc/include/asm/percpu.h |  2 +-
>  arch/powerpc/kernel/entry_64.S    |  5 -----
>  arch/powerpc/kernel/head_64.S     |  5 +----
>  arch/powerpc/kernel/setup_64.c    | 11 +++++++++--
>  5 files changed, 16 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index cd6a9a010895..4dd4ac69e84f 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -35,6 +35,11 @@
>  
>  register struct paca_struct *local_paca asm("r13");
>  #ifdef CONFIG_PPC_BOOK3S
> +/*
> + * The top 32-bits of r14 is used as the per-cpu offset, shifted by PAGE_SHIFT.

Top 32, really? It's 48 in later comments.

	Gabriel

> + * The per-cpu could be moved completely to vmalloc space if we had large
> + * vmalloc page mapping? (no, must access it in real mode).
> + */
>  register u64 local_r14 asm("r14");
>  #endif
>  
> diff --git a/arch/powerpc/include/asm/percpu.h b/arch/powerpc/include/asm/percpu.h
> index dce863a7635c..1e0d79d30eac 100644
> --- a/arch/powerpc/include/asm/percpu.h
> +++ b/arch/powerpc/include/asm/percpu.h
> @@ -12,7 +12,7 @@
>  
>  #include <asm/paca.h>
>  
> -#define __my_cpu_offset local_paca->data_offset
> +#define __my_cpu_offset (local_r14 >> 16)
>  
>  #endif /* CONFIG_SMP */
>  #endif /* __powerpc64__ */
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 592e4b36065f..6b0e3ac311e8 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -262,11 +262,6 @@ system_call_exit:
>  BEGIN_FTR_SECTION
>  	stdcx.	r0,0,r1			/* to clear the reservation */
>  END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
> -	LOAD_REG_IMMEDIATE(r10, 0xdeadbeefULL << 32)
> -	mfspr	r11,SPRN_PIR
> -	or	r10,r10,r11
> -	tdne	r10,r14
> -
>  	andi.	r6,r8,MSR_PR
>  	ld	r4,_LINK(r1)
>  
> diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
> index 5a9ec06eab14..cdb710f43681 100644
> --- a/arch/powerpc/kernel/head_64.S
> +++ b/arch/powerpc/kernel/head_64.S
> @@ -413,10 +413,7 @@ generic_secondary_common_init:
>  	b	kexec_wait		/* next kernel might do better	 */
>  
>  2:	SET_PACA(r13)
> -	LOAD_REG_IMMEDIATE(r14, 0xdeadbeef << 32)
> -	mfspr	r3,SPRN_PIR
> -	or	r14,r14,r3
> -	std	r14,PACA_R14(r13)
> +	ld	r14,PACA_R14(r13)
>  
>  #ifdef CONFIG_PPC_BOOK3E
>  	addi	r12,r13,PACA_EXTLB	/* and TLB exc frame in another  */
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index 9a4c5bf35d92..f4a96ebb523a 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -192,8 +192,8 @@ static void __init fixup_boot_paca(void)
>  	get_paca()->data_offset = 0;
>  	/* Mark interrupts disabled in PACA */
>  	irq_soft_mask_set(IRQ_SOFT_MASK_STD);
> -	/* Set r14 and paca_r14 to debug value */
> -	get_paca()->r14 = (0xdeadbeefULL << 32) | mfspr(SPRN_PIR);
> +	/* Set r14 and paca_r14 to zero */
> +	get_paca()->r14 = 0;
>  	local_r14 = get_paca()->r14;
>  }
>  
> @@ -761,7 +761,14 @@ void __init setup_per_cpu_areas(void)
>  	for_each_possible_cpu(cpu) {
>                  __per_cpu_offset[cpu] = delta + pcpu_unit_offsets[cpu];
>  		paca[cpu].data_offset = __per_cpu_offset[cpu];
> +
> +		BUG_ON(paca[cpu].data_offset & (PAGE_SIZE-1));
> +		BUG_ON(paca[cpu].data_offset >= (1UL << (64 - 16)));
> +
> +		/* The top 48 bits are used for per-cpu data */
> +		paca[cpu].r14 |= paca[cpu].data_offset << 16;
>  	}
> +	local_r14 = paca[smp_processor_id()].r14;
>  }
>  #endif
>  
> -- 
> 2.15.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 3/8] powerpc/64s: put the per-cpu data_offset in r14
  2017-12-20 17:53   ` Gabriel Paubert
@ 2017-12-22 13:50     ` Nicholas Piggin
  0 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2017-12-22 13:50 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linuxppc-dev

On Wed, 20 Dec 2017 18:53:24 +0100
Gabriel Paubert <paubert@iram.es> wrote:

> On Thu, Dec 21, 2017 at 12:52:01AM +1000, Nicholas Piggin wrote:
> > Shifted left by 16 bits, so the low 16 bits of r14 remain available.
> > This allows per-cpu pointers to be dereferenced with a single extra
> > shift whereas previously it was a load and add.
> > ---
> >  arch/powerpc/include/asm/paca.h   |  5 +++++
> >  arch/powerpc/include/asm/percpu.h |  2 +-
> >  arch/powerpc/kernel/entry_64.S    |  5 -----
> >  arch/powerpc/kernel/head_64.S     |  5 +----
> >  arch/powerpc/kernel/setup_64.c    | 11 +++++++++--
> >  5 files changed, 16 insertions(+), 12 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> > index cd6a9a010895..4dd4ac69e84f 100644
> > --- a/arch/powerpc/include/asm/paca.h
> > +++ b/arch/powerpc/include/asm/paca.h
> > @@ -35,6 +35,11 @@
> >  
> >  register struct paca_struct *local_paca asm("r13");
> >  #ifdef CONFIG_PPC_BOOK3S
> > +/*
> > + * The top 32-bits of r14 is used as the per-cpu offset, shifted by PAGE_SHIFT.  
> 
> Top 32, really? It's 48 in later comments.

Yep, I used 32 to start with but it wasn't enough. Will fix.
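
As a quick reference, the layout the series ends up with is as below
(bit values are taken from the patches; the helper is only an
illustrative sketch, not something the series adds):

	/*
	 * r14 layout:
	 *   bits  0-1 : IRQ_SOFT_MASK_STD/PMU      (soft-mask state)
	 *   bits  2-7 : PACA_IRQ_* "happened" bits (pending while soft-masked)
	 *   bit     8 : R14_BIT_IO_SYNC
	 *   bit     9 : R14_BIT_IRQ_WORK_PENDING
	 *   bits 16-63: per-cpu data_offset, stored pre-shifted left by 16
	 */
	static inline unsigned long r14_percpu_offset(unsigned long r14)
	{
		return r14 >> 16;	/* mirrors __my_cpu_offset in percpu.h */
	}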

Thanks,
Nick

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 4/8] powerpc/64s: put io_sync bit into r14
  2017-12-20 14:52 ` [RFC PATCH 4/8] powerpc/64s: put io_sync bit into r14 Nicholas Piggin
@ 2017-12-22 15:08   ` Thiago Jung Bauermann
  0 siblings, 0 replies; 12+ messages in thread
From: Thiago Jung Bauermann @ 2017-12-22 15:08 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev


Hello Nicholas,

Just a small comment about syntax. I'm afraid I can't comment much about
the substance of the patch.

Nicholas Piggin <npiggin@gmail.com> writes:
> diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
> index b9ebc3085fb7..182bb9304c79 100644
> --- a/arch/powerpc/include/asm/spinlock.h
> +++ b/arch/powerpc/include/asm/spinlock.h
> @@ -40,16 +40,9 @@
>  #endif
>
>  #if defined(CONFIG_PPC64) && defined(CONFIG_SMP)
> -#define CLEAR_IO_SYNC	(get_paca()->io_sync = 0)
> -#define SYNC_IO		do {						\
> -				if (unlikely(get_paca()->io_sync)) {	\
> -					mb();				\
> -					get_paca()->io_sync = 0;	\
> -				}					\
> -			} while (0)
> +#define CLEAR_IO_SYNC	do { r14_clear_bits(R14_BIT_IO_SYNC); } while(0)

Is there a reason for the do { } while(0) idiom here? If
r14_clear_bits() is an inline function, isn't it a single statement
already?
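
The wrapper normally earns its keep when a macro expands to more than
one statement, so that an unbraced if/else still sees a single
statement -- a generic sketch, not code from this series:

	/* illustrative macro, not the kernel's */
	#define SYNC_SKETCH(flag)		\
		do {				\
			__sync_synchronize();	\
			(flag) = 0;		\
		} while (0)

	static void example(int need_sync, int *flag)
	{
		if (need_sync)
			SYNC_SKETCH(*flag);	/* one statement, so an else still pairs correctly */
		else
			*flag = 1;
	}

For a lone call to an inline function there is nothing to protect.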

-- 
Thiago Jung Bauermann
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-12-22 15:09 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-20 14:51 [RFC PATCH 0/8] use r14 for a per-cpu kernel register Nicholas Piggin
2017-12-20 14:51 ` [RFC PATCH 1/8] powerpc/64s: stop using r14 register Nicholas Piggin
2017-12-20 14:52 ` [RFC PATCH 2/8] powerpc/64s: poison r14 register while in kernel Nicholas Piggin
2017-12-20 14:52 ` [RFC PATCH 3/8] powerpc/64s: put the per-cpu data_offset in r14 Nicholas Piggin
2017-12-20 17:53   ` Gabriel Paubert
2017-12-22 13:50     ` Nicholas Piggin
2017-12-20 14:52 ` [RFC PATCH 4/8] powerpc/64s: put io_sync bit into r14 Nicholas Piggin
2017-12-22 15:08   ` Thiago Jung Bauermann
2017-12-20 14:52 ` [RFC PATCH 5/8] powerpc/64s: put work_pending " Nicholas Piggin
2017-12-20 14:52 ` [RFC PATCH 6/8] powerpc/64s: put irq_soft_mask bits " Nicholas Piggin
2017-12-20 14:52 ` [RFC PATCH 7/8] powerpc/64s: put irq_soft_mask and irq_happened " Nicholas Piggin
2017-12-20 14:52 ` [RFC PATCH 8/8] powerpc/64s: inline local_irq_enable/restore Nicholas Piggin
