* [PATCH v3 00/32] powerpc/64: interrupts and syscalls series
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

This is a long overdue update of the series, with fixes from me, Michal,
and Michael. It does not include Michal's syscall compat series.

Patches 1-22 are changes to the low level 64s interrupt entry assembly,
which have been posted before; nothing has changed except adding patch
21 and fixing patch 22 to reconcile irq state in the soft-NMI handler
to avoid preempt warnings.

Patches 23-26 turn the system call entry/exit code into C. A bunch of
irq, preempt, and TM warnings and bugs caught by selftests etc. are
fixed, and a few peripheral patches are added (sstep and zeroing regs).

Patches 27-29 turn the interrupt exit code into C. This had a bit more
change, most significantly to how interrupt exit soft irq replay works.

Patches 30-32 add scv system call support. There are a lot of changes
here to turn it into something better than RFC quality. Discussion
about the ABI seems to be settling and has not been very controversial.

Thanks,
Nick

Nicholas Piggin (32):
  powerpc/64s/exception: Introduce INT_DEFINE parameter block for code
    generation
  powerpc/64s/exception: Add GEN_COMMON macro that uses INT_DEFINE
    parameters
  powerpc/64s/exception: Add GEN_KVM macro that uses INT_DEFINE
    parameters
  powerpc/64s/exception: Expand EXC_COMMON and EXC_COMMON_ASYNC macros
  powerpc/64s/exception: Move all interrupt handlers to new style code
    gen macros
  powerpc/64s/exception: Remove old INT_ENTRY macro
  powerpc/64s/exception: Remove old INT_COMMON macro
  powerpc/64s/exception: Remove old INT_KVM_HANDLER
  powerpc/64s/exception: Add ISIDE option
  powerpc/64s/exception: move real->virt switch into the common handler
  powerpc/64s/exception: move soft-mask test to common code
  powerpc/64s/exception: move KVM test to common code
  powerpc/64s/exception: remove confusing IEARLY option
  powerpc/64s/exception: remove the SPR saving patch code macros
  powerpc/64s/exception: trim unused arguments from KVMTEST macro
  powerpc/64s/exception: hdecrementer avoid touching the stack
  powerpc/64s/exception: re-inline some handlers
  powerpc/64s/exception: Clean up SRR specifiers
  powerpc/64s/exception: add more comments for interrupt handlers
  powerpc/64s/exception: only test KVM in SRR interrupts when PR KVM is
    supported
  powerpc/64s/exception: sreset interrupts reconcile fix
  powerpc/64s/exception: soft nmi interrupt should not use
    ret_from_except
  powerpc/64: system call remove non-volatile GPR save optimisation
  powerpc/64: sstep ifdef the deprecated fast endian switch syscall
  powerpc/64: system call implement entry/exit logic in C
  powerpc/64: system call zero volatile registers when returning
  powerpc/64: implement soft interrupt replay in C
  powerpc/64s: interrupt implement exit logic in C
  powerpc/64s/exception: remove lite interrupt return
  powerpc/64: system call reconcile interrupts
  powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked
  powerpc/64s: system call support for scv/rfscv instructions

 Documentation/powerpc/syscall64-abi.rst       |   42 +-
 arch/powerpc/include/asm/asm-prototypes.h     |   17 +-
 .../powerpc/include/asm/book3s/64/kup-radix.h |   24 +-
 arch/powerpc/include/asm/cputime.h            |   29 +
 arch/powerpc/include/asm/exception-64s.h      |   10 +-
 arch/powerpc/include/asm/head-64.h            |    2 +-
 arch/powerpc/include/asm/hw_irq.h             |    6 +-
 arch/powerpc/include/asm/ppc_asm.h            |    2 +
 arch/powerpc/include/asm/processor.h          |    2 +-
 arch/powerpc/include/asm/ptrace.h             |    3 +
 arch/powerpc/include/asm/setup.h              |    4 +-
 arch/powerpc/include/asm/signal.h             |    3 +
 arch/powerpc/include/asm/switch_to.h          |   11 +
 arch/powerpc/include/asm/time.h               |    4 +-
 arch/powerpc/kernel/Makefile                  |    3 +-
 arch/powerpc/kernel/cpu_setup_power.S         |    2 +-
 arch/powerpc/kernel/cputable.c                |    3 +-
 arch/powerpc/kernel/dt_cpu_ftrs.c             |    1 +
 arch/powerpc/kernel/entry_64.S                | 1017 +++-----
 arch/powerpc/kernel/exceptions-64e.S          |  287 ++-
 arch/powerpc/kernel/exceptions-64s.S          | 2168 ++++++++++++-----
 arch/powerpc/kernel/irq.c                     |  183 +-
 arch/powerpc/kernel/process.c                 |   89 +-
 arch/powerpc/kernel/setup_64.c                |    5 +-
 arch/powerpc/kernel/signal.h                  |    2 -
 arch/powerpc/kernel/syscall_64.c              |  379 +++
 arch/powerpc/kernel/syscalls/syscall.tbl      |   22 +-
 arch/powerpc/kernel/systbl.S                  |    9 +-
 arch/powerpc/kernel/time.c                    |    9 -
 arch/powerpc/kernel/vector.S                  |    2 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S       |   11 -
 arch/powerpc/kvm/book3s_segment.S             |    7 -
 arch/powerpc/lib/sstep.c                      |    5 +-
 arch/powerpc/platforms/pseries/setup.c        |    8 +-
 34 files changed, 2769 insertions(+), 1602 deletions(-)
 create mode 100644 arch/powerpc/kernel/syscall_64.c

-- 
2.23.0



* [PATCH v3 01/32] powerpc/64s/exception: Introduce INT_DEFINE parameter block for code generation
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

The code generation macro arguments are difficult to read, and
defaults can't easily be used.

This introduces a block where parameters can be set for interrupt
handler code generation by the subsequent macros, and adds the first
generation macro for interrupt entry.

One interrupt handler is converted to the new macros to demonstrate
the change; the rest will be converted all at once.
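
As a sketch of the new usage (the data_access conversion below is the
real example), parameters are assigned between the begin/end markers,
and do_define_int fills in defaults for anything left unset:

	INT_DEFINE_BEGIN(data_access)
		IVEC=0x300
		IDAR=1
		IDSISR=1
		IKVM_REAL=1
	INT_DEFINE_END(data_access)

Each parameter lands in a per-interrupt local assembler symbol (e.g.,
IVEC expands to .L_IVEC_data_access), which the GEN_INT_ENTRY macro
then picks up by name.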

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 77 ++++++++++++++++++++++++++--
 1 file changed, 73 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index ffc15f4f079d..1b942c98bc05 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -193,6 +193,61 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 	mtctr	reg;							\
 	bctr
 
+/*
+ * Interrupt code generation macros
+ */
+#define IVEC		.L_IVEC_\name\()
+#define IHSRR		.L_IHSRR_\name\()
+#define IAREA		.L_IAREA_\name\()
+#define IDAR		.L_IDAR_\name\()
+#define IDSISR		.L_IDSISR_\name\()
+#define ISET_RI		.L_ISET_RI_\name\()
+#define IEARLY		.L_IEARLY_\name\()
+#define IMASK		.L_IMASK_\name\()
+#define IKVM_REAL	.L_IKVM_REAL_\name\()
+#define IKVM_VIRT	.L_IKVM_VIRT_\name\()
+
+#define INT_DEFINE_BEGIN(n)						\
+.macro int_define_ ## n name
+
+#define INT_DEFINE_END(n)						\
+.endm ;									\
+int_define_ ## n n ;							\
+do_define_int n
+
+.macro do_define_int name
+	.ifndef IVEC
+		.error "IVEC not defined"
+	.endif
+	.ifndef IHSRR
+		IHSRR=EXC_STD
+	.endif
+	.ifndef IAREA
+		IAREA=PACA_EXGEN
+	.endif
+	.ifndef IDAR
+		IDAR=0
+	.endif
+	.ifndef IDSISR
+		IDSISR=0
+	.endif
+	.ifndef ISET_RI
+		ISET_RI=1
+	.endif
+	.ifndef IEARLY
+		IEARLY=0
+	.endif
+	.ifndef IMASK
+		IMASK=0
+	.endif
+	.ifndef IKVM_REAL
+		IKVM_REAL=0
+	.endif
+	.ifndef IKVM_VIRT
+		IKVM_VIRT=0
+	.endif
+.endm
+
 .macro INT_KVM_HANDLER name, vec, hsrr, area, skip
 	TRAMP_KVM_BEGIN(\name\()_kvm)
 	KVM_HANDLER \vec, \hsrr, \area, \skip
@@ -474,7 +529,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	 */
 	GET_SCRATCH0(r10)
 	std	r10,\area\()+EX_R13(r13)
-	.if \dar
+	.if \dar == 1
 	.if \hsrr
 	mfspr	r10,SPRN_HDAR
 	.else
@@ -482,7 +537,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	.endif
 	std	r10,\area\()+EX_DAR(r13)
 	.endif
-	.if \dsisr
+	.if \dsisr == 1
 	.if \hsrr
 	mfspr	r10,SPRN_HDSISR
 	.else
@@ -506,6 +561,14 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	.endif
 .endm
 
+.macro GEN_INT_ENTRY name, virt, ool=0
+	.if ! \virt
+		INT_HANDLER \name, IVEC, \ool, IEARLY, \virt, IHSRR, IAREA, ISET_RI, IDAR, IDSISR, IMASK, IKVM_REAL
+	.else
+		INT_HANDLER \name, IVEC, \ool, IEARLY, \virt, IHSRR, IAREA, ISET_RI, IDAR, IDSISR, IMASK, IKVM_VIRT
+	.endif
+.endm
+
 /*
  * On entry r13 points to the paca, r9-r13 are saved in the paca,
  * r9 contains the saved CR, r11 and r12 contain the saved SRR0 and
@@ -1143,12 +1206,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
 	bl	unrecoverable_exception
 	b	.
 
+INT_DEFINE_BEGIN(data_access)
+	IVEC=0x300
+	IDAR=1
+	IDSISR=1
+	IKVM_REAL=1
+INT_DEFINE_END(data_access)
 
 EXC_REAL_BEGIN(data_access, 0x300, 0x80)
-	INT_HANDLER data_access, 0x300, ool=1, dar=1, dsisr=1, kvm=1
+	GEN_INT_ENTRY data_access, virt=0, ool=1
 EXC_REAL_END(data_access, 0x300, 0x80)
 EXC_VIRT_BEGIN(data_access, 0x4300, 0x80)
-	INT_HANDLER data_access, 0x300, virt=1, dar=1, dsisr=1
+	GEN_INT_ENTRY data_access, virt=1
 EXC_VIRT_END(data_access, 0x4300, 0x80)
 INT_KVM_HANDLER data_access, 0x300, EXC_STD, PACA_EXGEN, 1
 EXC_COMMON_BEGIN(data_access_common)
-- 
2.23.0



* [PATCH v3 02/32] powerpc/64s/exception: Add GEN_COMMON macro that uses INT_DEFINE parameters
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

Add a GEN_COMMON macro that expands INT_COMMON with the parameters
from the interrupt's INT_DEFINE block, and add ISTACK, IRECONCILE
and IKUAP parameters, all defaulting to 1.

No generated code change.
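
As a minimal sketch of what changes at the use sites (taken from the
data_access conversion below),

	GEN_COMMON data_access

now stands in for the open-coded

	INT_COMMON 0x300, PACA_EXGEN, 1, 1, 1, 1, 1

with the stack, kuap and reconcile arguments supplied by the new
ISTACK, IKUAP and IRECONCILE parameters.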

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 1b942c98bc05..f3f2ec88b3d8 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -206,6 +206,9 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define IMASK		.L_IMASK_\name\()
 #define IKVM_REAL	.L_IKVM_REAL_\name\()
 #define IKVM_VIRT	.L_IKVM_VIRT_\name\()
+#define ISTACK		.L_ISTACK_\name\()
+#define IRECONCILE	.L_IRECONCILE_\name\()
+#define IKUAP		.L_IKUAP_\name\()
 
 #define INT_DEFINE_BEGIN(n)						\
 .macro int_define_ ## n name
@@ -246,6 +249,15 @@ do_define_int n
 	.ifndef IKVM_VIRT
 		IKVM_VIRT=0
 	.endif
+	.ifndef ISTACK
+		ISTACK=1
+	.endif
+	.ifndef IRECONCILE
+		IRECONCILE=1
+	.endif
+	.ifndef IKUAP
+		IKUAP=1
+	.endif
 .endm
 
 .macro INT_KVM_HANDLER name, vec, hsrr, area, skip
@@ -670,6 +682,10 @@ END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66)
 	.endif
 .endm
 
+.macro GEN_COMMON name
+	INT_COMMON IVEC, IAREA, ISTACK, IKUAP, IRECONCILE, IDAR, IDSISR
+.endm
+
 /*
  * Restore all registers including H/SRR0/1 saved in a stack frame of a
  * standard exception.
@@ -1221,13 +1237,7 @@ EXC_VIRT_BEGIN(data_access, 0x4300, 0x80)
 EXC_VIRT_END(data_access, 0x4300, 0x80)
 INT_KVM_HANDLER data_access, 0x300, EXC_STD, PACA_EXGEN, 1
 EXC_COMMON_BEGIN(data_access_common)
-	/*
-	 * Here r13 points to the paca, r9 contains the saved CR,
-	 * SRR0 and SRR1 are saved in r11 and r12,
-	 * r9 - r13 are saved in paca->exgen.
-	 * EX_DAR and EX_DSISR have saved DAR/DSISR
-	 */
-	INT_COMMON 0x300, PACA_EXGEN, 1, 1, 1, 1, 1
+	GEN_COMMON data_access
 	ld	r4,_DAR(r1)
 	ld	r5,_DSISR(r1)
 BEGIN_MMU_FTR_SECTION
-- 
2.23.0



* [PATCH v3 03/32] powerpc/64s/exception: Add GEN_KVM macro that uses INT_DEFINE parameters
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

Add an IKVM_SKIP parameter and a GEN_KVM macro that expands
KVM_HANDLER with the parameters from the interrupt's INT_DEFINE
block.

No generated code change.
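
At the use sites this replaces, e.g.,

	INT_KVM_HANDLER data_access, 0x300, EXC_STD, PACA_EXGEN, 1

with

	TRAMP_KVM_BEGIN(data_access_kvm)
		GEN_KVM data_access

where the final skip argument now comes from IKVM_SKIP.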

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index f3f2ec88b3d8..da3c22eea72d 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -204,6 +204,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define ISET_RI		.L_ISET_RI_\name\()
 #define IEARLY		.L_IEARLY_\name\()
 #define IMASK		.L_IMASK_\name\()
+#define IKVM_SKIP	.L_IKVM_SKIP_\name\()
 #define IKVM_REAL	.L_IKVM_REAL_\name\()
 #define IKVM_VIRT	.L_IKVM_VIRT_\name\()
 #define ISTACK		.L_ISTACK_\name\()
@@ -243,6 +244,9 @@ do_define_int n
 	.ifndef IMASK
 		IMASK=0
 	.endif
+	.ifndef IKVM_SKIP
+		IKVM_SKIP=0
+	.endif
 	.ifndef IKVM_REAL
 		IKVM_REAL=0
 	.endif
@@ -265,6 +269,10 @@ do_define_int n
 	KVM_HANDLER \vec, \hsrr, \area, \skip
 .endm
 
+.macro GEN_KVM name
+	KVM_HANDLER IVEC, IHSRR, IAREA, IKVM_SKIP
+.endm
+
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 /*
@@ -1226,6 +1234,7 @@ INT_DEFINE_BEGIN(data_access)
 	IVEC=0x300
 	IDAR=1
 	IDSISR=1
+	IKVM_SKIP=1
 	IKVM_REAL=1
 INT_DEFINE_END(data_access)
 
@@ -1235,7 +1244,8 @@ EXC_REAL_END(data_access, 0x300, 0x80)
 EXC_VIRT_BEGIN(data_access, 0x4300, 0x80)
 	GEN_INT_ENTRY data_access, virt=1
 EXC_VIRT_END(data_access, 0x4300, 0x80)
-INT_KVM_HANDLER data_access, 0x300, EXC_STD, PACA_EXGEN, 1
+TRAMP_KVM_BEGIN(data_access_kvm)
+	GEN_KVM data_access
 EXC_COMMON_BEGIN(data_access_common)
 	GEN_COMMON data_access
 	ld	r4,_DAR(r1)
-- 
2.23.0



* [PATCH v3 04/32] powerpc/64s/exception: Expand EXC_COMMON and EXC_COMMON_ASYNC macros
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

These don't provide a large amount of code sharing. Removing them
makes code easier to shuffle around. For example, some of the common
instructions will be moved into the common code gen macro.
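
Concretely, each use of the macros becomes the equivalent open-coded
sequence; for example (taken from the hdecrementer conversion below):

	EXC_COMMON_BEGIN(hdecrementer_common)
		INT_COMMON 0x980, PACA_EXGEN, 1, 1, 1, 0, 0
		bl	save_nvgprs
		addi	r3,r1,STACK_FRAME_OVERHEAD
		bl	hdec_interrupt
		b	ret_from_except

This also allows the handlers that previously had to wrap the whole
EXC_COMMON invocation in #ifdef (doorbell, altivec_assist) to switch
just the bl target instead.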

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 160 ++++++++++++++++++++-------
 1 file changed, 117 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index da3c22eea72d..0f1da3099c28 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -757,28 +757,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP)
 #define FINISH_NAP
 #endif
 
-#define EXC_COMMON(name, realvec, hdlr)					\
-	EXC_COMMON_BEGIN(name);						\
-	INT_COMMON realvec, PACA_EXGEN, 1, 1, 1, 0, 0 ;			\
-	bl	save_nvgprs;						\
-	addi	r3,r1,STACK_FRAME_OVERHEAD;				\
-	bl	hdlr;							\
-	b	ret_from_except
-
-/*
- * Like EXC_COMMON, but for exceptions that can occur in the idle task and
- * therefore need the special idle handling (finish nap and runlatch)
- */
-#define EXC_COMMON_ASYNC(name, realvec, hdlr)				\
-	EXC_COMMON_BEGIN(name);						\
-	INT_COMMON realvec, PACA_EXGEN, 1, 1, 1, 0, 0 ;			\
-	FINISH_NAP;							\
-	RUNLATCH_ON;							\
-	addi	r3,r1,STACK_FRAME_OVERHEAD;				\
-	bl	hdlr;							\
-	b	ret_from_except_lite
-
-
 /*
  * There are a few constraints to be concerned with.
  * - Real mode exceptions code/data must be located at their physical location.
@@ -1349,7 +1327,13 @@ EXC_VIRT_BEGIN(hardware_interrupt, 0x4500, 0x100)
 	INT_HANDLER hardware_interrupt, 0x500, virt=1, hsrr=EXC_HV_OR_STD, bitmask=IRQS_DISABLED, kvm=1
 EXC_VIRT_END(hardware_interrupt, 0x4500, 0x100)
 INT_KVM_HANDLER hardware_interrupt, 0x500, EXC_HV_OR_STD, PACA_EXGEN, 0
-EXC_COMMON_ASYNC(hardware_interrupt_common, 0x500, do_IRQ)
+EXC_COMMON_BEGIN(hardware_interrupt_common)
+	INT_COMMON 0x500, PACA_EXGEN, 1, 1, 1, 0, 0
+	FINISH_NAP
+	RUNLATCH_ON
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	do_IRQ
+	b	ret_from_except_lite
 
 
 EXC_REAL_BEGIN(alignment, 0x600, 0x100)
@@ -1455,7 +1439,13 @@ EXC_VIRT_BEGIN(decrementer, 0x4900, 0x80)
 	INT_HANDLER decrementer, 0x900, virt=1, bitmask=IRQS_DISABLED
 EXC_VIRT_END(decrementer, 0x4900, 0x80)
 INT_KVM_HANDLER decrementer, 0x900, EXC_STD, PACA_EXGEN, 0
-EXC_COMMON_ASYNC(decrementer_common, 0x900, timer_interrupt)
+EXC_COMMON_BEGIN(decrementer_common)
+	INT_COMMON 0x900, PACA_EXGEN, 1, 1, 1, 0, 0
+	FINISH_NAP
+	RUNLATCH_ON
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	timer_interrupt
+	b	ret_from_except_lite
 
 
 EXC_REAL_BEGIN(hdecrementer, 0x980, 0x80)
@@ -1465,7 +1455,12 @@ EXC_VIRT_BEGIN(hdecrementer, 0x4980, 0x80)
 	INT_HANDLER hdecrementer, 0x980, virt=1, hsrr=EXC_HV, kvm=1
 EXC_VIRT_END(hdecrementer, 0x4980, 0x80)
 INT_KVM_HANDLER hdecrementer, 0x980, EXC_HV, PACA_EXGEN, 0
-EXC_COMMON(hdecrementer_common, 0x980, hdec_interrupt)
+EXC_COMMON_BEGIN(hdecrementer_common)
+	INT_COMMON 0x980, PACA_EXGEN, 1, 1, 1, 0, 0
+	bl	save_nvgprs
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	hdec_interrupt
+	b	ret_from_except
 
 
 EXC_REAL_BEGIN(doorbell_super, 0xa00, 0x100)
@@ -1475,11 +1470,17 @@ EXC_VIRT_BEGIN(doorbell_super, 0x4a00, 0x100)
 	INT_HANDLER doorbell_super, 0xa00, virt=1, bitmask=IRQS_DISABLED
 EXC_VIRT_END(doorbell_super, 0x4a00, 0x100)
 INT_KVM_HANDLER doorbell_super, 0xa00, EXC_STD, PACA_EXGEN, 0
+EXC_COMMON_BEGIN(doorbell_super_common)
+	INT_COMMON 0xa00, PACA_EXGEN, 1, 1, 1, 0, 0
+	FINISH_NAP
+	RUNLATCH_ON
+	addi	r3,r1,STACK_FRAME_OVERHEAD
 #ifdef CONFIG_PPC_DOORBELL
-EXC_COMMON_ASYNC(doorbell_super_common, 0xa00, doorbell_exception)
+	bl	doorbell_exception
 #else
-EXC_COMMON_ASYNC(doorbell_super_common, 0xa00, unknown_exception)
+	bl	unknown_exception
 #endif
+	b	ret_from_except_lite
 
 
 EXC_REAL_NONE(0xb00, 0x100)
@@ -1610,7 +1611,12 @@ EXC_VIRT_BEGIN(single_step, 0x4d00, 0x100)
 	INT_HANDLER single_step, 0xd00, virt=1
 EXC_VIRT_END(single_step, 0x4d00, 0x100)
 INT_KVM_HANDLER single_step, 0xd00, EXC_STD, PACA_EXGEN, 0
-EXC_COMMON(single_step_common, 0xd00, single_step_exception)
+EXC_COMMON_BEGIN(single_step_common)
+	INT_COMMON 0xd00, PACA_EXGEN, 1, 1, 1, 0, 0
+	bl	save_nvgprs
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	single_step_exception
+	b	ret_from_except
 
 
 EXC_REAL_BEGIN(h_data_storage, 0xe00, 0x20)
@@ -1641,7 +1647,12 @@ EXC_VIRT_BEGIN(h_instr_storage, 0x4e20, 0x20)
 	INT_HANDLER h_instr_storage, 0xe20, ool=1, virt=1, hsrr=EXC_HV, kvm=1
 EXC_VIRT_END(h_instr_storage, 0x4e20, 0x20)
 INT_KVM_HANDLER h_instr_storage, 0xe20, EXC_HV, PACA_EXGEN, 0
-EXC_COMMON(h_instr_storage_common, 0xe20, unknown_exception)
+EXC_COMMON_BEGIN(h_instr_storage_common)
+	INT_COMMON 0xe20, PACA_EXGEN, 1, 1, 1, 0, 0
+	bl	save_nvgprs
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	unknown_exception
+	b	ret_from_except
 
 
 EXC_REAL_BEGIN(emulation_assist, 0xe40, 0x20)
@@ -1651,7 +1662,12 @@ EXC_VIRT_BEGIN(emulation_assist, 0x4e40, 0x20)
 	INT_HANDLER emulation_assist, 0xe40, ool=1, virt=1, hsrr=EXC_HV, kvm=1
 EXC_VIRT_END(emulation_assist, 0x4e40, 0x20)
 INT_KVM_HANDLER emulation_assist, 0xe40, EXC_HV, PACA_EXGEN, 0
-EXC_COMMON(emulation_assist_common, 0xe40, emulation_assist_interrupt)
+EXC_COMMON_BEGIN(emulation_assist_common)
+	INT_COMMON 0xe40, PACA_EXGEN, 1, 1, 1, 0, 0
+	bl	save_nvgprs
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	emulation_assist_interrupt
+	b	ret_from_except
 
 
 /*
@@ -1708,11 +1724,17 @@ EXC_VIRT_BEGIN(h_doorbell, 0x4e80, 0x20)
 	INT_HANDLER h_doorbell, 0xe80, ool=1, virt=1, hsrr=EXC_HV, bitmask=IRQS_DISABLED, kvm=1
 EXC_VIRT_END(h_doorbell, 0x4e80, 0x20)
 INT_KVM_HANDLER h_doorbell, 0xe80, EXC_HV, PACA_EXGEN, 0
+EXC_COMMON_BEGIN(h_doorbell_common)
+	INT_COMMON 0xe80, PACA_EXGEN, 1, 1, 1, 0, 0
+	FINISH_NAP
+	RUNLATCH_ON
+	addi	r3,r1,STACK_FRAME_OVERHEAD
 #ifdef CONFIG_PPC_DOORBELL
-EXC_COMMON_ASYNC(h_doorbell_common, 0xe80, doorbell_exception)
+	bl	doorbell_exception
 #else
-EXC_COMMON_ASYNC(h_doorbell_common, 0xe80, unknown_exception)
+	bl	unknown_exception
 #endif
+	b	ret_from_except_lite
 
 
 EXC_REAL_BEGIN(h_virt_irq, 0xea0, 0x20)
@@ -1722,7 +1744,13 @@ EXC_VIRT_BEGIN(h_virt_irq, 0x4ea0, 0x20)
 	INT_HANDLER h_virt_irq, 0xea0, ool=1, virt=1, hsrr=EXC_HV, bitmask=IRQS_DISABLED, kvm=1
 EXC_VIRT_END(h_virt_irq, 0x4ea0, 0x20)
 INT_KVM_HANDLER h_virt_irq, 0xea0, EXC_HV, PACA_EXGEN, 0
-EXC_COMMON_ASYNC(h_virt_irq_common, 0xea0, do_IRQ)
+EXC_COMMON_BEGIN(h_virt_irq_common)
+	INT_COMMON 0xea0, PACA_EXGEN, 1, 1, 1, 0, 0
+	FINISH_NAP
+	RUNLATCH_ON
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	do_IRQ
+	b	ret_from_except_lite
 
 
 EXC_REAL_NONE(0xec0, 0x20)
@@ -1738,7 +1766,13 @@ EXC_VIRT_BEGIN(performance_monitor, 0x4f00, 0x20)
 	INT_HANDLER performance_monitor, 0xf00, ool=1, virt=1, bitmask=IRQS_PMI_DISABLED
 EXC_VIRT_END(performance_monitor, 0x4f00, 0x20)
 INT_KVM_HANDLER performance_monitor, 0xf00, EXC_STD, PACA_EXGEN, 0
-EXC_COMMON_ASYNC(performance_monitor_common, 0xf00, performance_monitor_exception)
+EXC_COMMON_BEGIN(performance_monitor_common)
+	INT_COMMON 0xf00, PACA_EXGEN, 1, 1, 1, 0, 0
+	FINISH_NAP
+	RUNLATCH_ON
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	performance_monitor_exception
+	b	ret_from_except_lite
 
 
 EXC_REAL_BEGIN(altivec_unavailable, 0xf20, 0x20)
@@ -1829,7 +1863,12 @@ EXC_VIRT_BEGIN(facility_unavailable, 0x4f60, 0x20)
 	INT_HANDLER facility_unavailable, 0xf60, ool=1, virt=1
 EXC_VIRT_END(facility_unavailable, 0x4f60, 0x20)
 INT_KVM_HANDLER facility_unavailable, 0xf60, EXC_STD, PACA_EXGEN, 0
-EXC_COMMON(facility_unavailable_common, 0xf60, facility_unavailable_exception)
+EXC_COMMON_BEGIN(facility_unavailable_common)
+	INT_COMMON 0xf60, PACA_EXGEN, 1, 1, 1, 0, 0
+	bl	save_nvgprs
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	facility_unavailable_exception
+	b	ret_from_except
 
 
 EXC_REAL_BEGIN(h_facility_unavailable, 0xf80, 0x20)
@@ -1839,7 +1878,12 @@ EXC_VIRT_BEGIN(h_facility_unavailable, 0x4f80, 0x20)
 	INT_HANDLER h_facility_unavailable, 0xf80, ool=1, virt=1, hsrr=EXC_HV, kvm=1
 EXC_VIRT_END(h_facility_unavailable, 0x4f80, 0x20)
 INT_KVM_HANDLER h_facility_unavailable, 0xf80, EXC_HV, PACA_EXGEN, 0
-EXC_COMMON(h_facility_unavailable_common, 0xf80, facility_unavailable_exception)
+EXC_COMMON_BEGIN(h_facility_unavailable_common)
+	INT_COMMON 0xf80, PACA_EXGEN, 1, 1, 1, 0, 0
+	bl	save_nvgprs
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	facility_unavailable_exception
+	b	ret_from_except
 
 
 EXC_REAL_NONE(0xfa0, 0x20)
@@ -1860,7 +1904,12 @@ EXC_REAL_BEGIN(cbe_system_error, 0x1200, 0x100)
 EXC_REAL_END(cbe_system_error, 0x1200, 0x100)
 EXC_VIRT_NONE(0x5200, 0x100)
 INT_KVM_HANDLER cbe_system_error, 0x1200, EXC_HV, PACA_EXGEN, 1
-EXC_COMMON(cbe_system_error_common, 0x1200, cbe_system_error_exception)
+EXC_COMMON_BEGIN(cbe_system_error_common)
+	INT_COMMON 0x1200, PACA_EXGEN, 1, 1, 1, 0, 0
+	bl	save_nvgprs
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	cbe_system_error_exception
+	b	ret_from_except
 #else /* CONFIG_CBE_RAS */
 EXC_REAL_NONE(0x1200, 0x100)
 EXC_VIRT_NONE(0x5200, 0x100)
@@ -1874,7 +1923,12 @@ EXC_VIRT_BEGIN(instruction_breakpoint, 0x5300, 0x100)
 	INT_HANDLER instruction_breakpoint, 0x1300, virt=1
 EXC_VIRT_END(instruction_breakpoint, 0x5300, 0x100)
 INT_KVM_HANDLER instruction_breakpoint, 0x1300, EXC_STD, PACA_EXGEN, 1
-EXC_COMMON(instruction_breakpoint_common, 0x1300, instruction_breakpoint_exception)
+EXC_COMMON_BEGIN(instruction_breakpoint_common)
+	INT_COMMON 0x1300, PACA_EXGEN, 1, 1, 1, 0, 0
+	bl	save_nvgprs
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	instruction_breakpoint_exception
+	b	ret_from_except
 
 
 EXC_REAL_NONE(0x1400, 0x100)
@@ -1974,7 +2028,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 	b	.
 #endif
 
-EXC_COMMON(denorm_common, 0x1500, unknown_exception)
+EXC_COMMON_BEGIN(denorm_common)
+	INT_COMMON 0x1500, PACA_EXGEN, 1, 1, 1, 0, 0
+	bl	save_nvgprs
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	unknown_exception
+	b	ret_from_except
 
 
 #ifdef CONFIG_CBE_RAS
@@ -1983,7 +2042,12 @@ EXC_REAL_BEGIN(cbe_maintenance, 0x1600, 0x100)
 EXC_REAL_END(cbe_maintenance, 0x1600, 0x100)
 EXC_VIRT_NONE(0x5600, 0x100)
 INT_KVM_HANDLER cbe_maintenance, 0x1600, EXC_HV, PACA_EXGEN, 1
-EXC_COMMON(cbe_maintenance_common, 0x1600, cbe_maintenance_exception)
+EXC_COMMON_BEGIN(cbe_maintenance_common)
+	INT_COMMON 0x1600, PACA_EXGEN, 1, 1, 1, 0, 0
+	bl	save_nvgprs
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	cbe_maintenance_exception
+	b	ret_from_except
 #else /* CONFIG_CBE_RAS */
 EXC_REAL_NONE(0x1600, 0x100)
 EXC_VIRT_NONE(0x5600, 0x100)
@@ -1997,11 +2061,16 @@ EXC_VIRT_BEGIN(altivec_assist, 0x5700, 0x100)
 	INT_HANDLER altivec_assist, 0x1700, virt=1
 EXC_VIRT_END(altivec_assist, 0x5700, 0x100)
 INT_KVM_HANDLER altivec_assist, 0x1700, EXC_STD, PACA_EXGEN, 0
+EXC_COMMON_BEGIN(altivec_assist_common)
+	INT_COMMON 0x1700, PACA_EXGEN, 1, 1, 1, 0, 0
+	bl	save_nvgprs
+	addi	r3,r1,STACK_FRAME_OVERHEAD
 #ifdef CONFIG_ALTIVEC
-EXC_COMMON(altivec_assist_common, 0x1700, altivec_assist_exception)
+	bl	altivec_assist_exception
 #else
-EXC_COMMON(altivec_assist_common, 0x1700, unknown_exception)
+	bl	unknown_exception
 #endif
+	b	ret_from_except
 
 
 #ifdef CONFIG_CBE_RAS
@@ -2010,7 +2079,12 @@ EXC_REAL_BEGIN(cbe_thermal, 0x1800, 0x100)
 EXC_REAL_END(cbe_thermal, 0x1800, 0x100)
 EXC_VIRT_NONE(0x5800, 0x100)
 INT_KVM_HANDLER cbe_thermal, 0x1800, EXC_HV, PACA_EXGEN, 1
-EXC_COMMON(cbe_thermal_common, 0x1800, cbe_thermal_exception)
+EXC_COMMON_BEGIN(cbe_thermal_common)
+	INT_COMMON 0x1800, PACA_EXGEN, 1, 1, 1, 0, 0
+	bl	save_nvgprs
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	cbe_thermal_exception
+	b	ret_from_except
 #else /* CONFIG_CBE_RAS */
 EXC_REAL_NONE(0x1800, 0x100)
 EXC_VIRT_NONE(0x5800, 0x100)
-- 
2.23.0



* [PATCH v3 05/32] powerpc/64s/exception: Move all interrupt handlers to new style code gen macros
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

Aside from label names and BUG line numbers, the only generated code
change is an additional KVM handler for the "late" HMI interrupt,
because early and late HMI handling is now achieved by defining two
different interrupt types.
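
One detail visible in the diff: a few sites need to override a single
parameter for one instance of the generated code, which is done via
the __ISTACK()/__IKVM_REAL() accessors that take the interrupt name
directly. For example, program_check generates its common code twice,
once on the emergency stack and once normally:

	__ISTACK(program_check)=0
	GEN_COMMON program_check
	b 3f
2:
	__ISTACK(program_check)=1
	GEN_COMMON program_check
3:

and system_reset_fwnmi clears __IKVM_REAL(system_reset) before
reusing GEN_INT_ENTRY.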

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 556 ++++++++++++++++++++-------
 1 file changed, 418 insertions(+), 138 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 0f1da3099c28..0157ba48efe9 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -206,8 +206,10 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define IMASK		.L_IMASK_\name\()
 #define IKVM_SKIP	.L_IKVM_SKIP_\name\()
 #define IKVM_REAL	.L_IKVM_REAL_\name\()
+#define __IKVM_REAL(name)	.L_IKVM_REAL_ ## name
 #define IKVM_VIRT	.L_IKVM_VIRT_\name\()
 #define ISTACK		.L_ISTACK_\name\()
+#define __ISTACK(name)	.L_ISTACK_ ## name
 #define IRECONCILE	.L_IRECONCILE_\name\()
 #define IKUAP		.L_IKUAP_\name\()
 
@@ -570,7 +572,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	/* nothing more */
 	.elseif \early
 	mfctr	r10			/* save ctr, even for !RELOCATABLE */
-	BRANCH_TO_C000(r11, \name\()_early_common)
+	BRANCH_TO_C000(r11, \name\()_common)
 	.elseif !\virt
 	INT_SAVE_SRR_AND_JUMP \name\()_common, \hsrr, \ri
 	.else
@@ -843,6 +845,19 @@ __start_interrupts:
 EXC_VIRT_NONE(0x4000, 0x100)
 
 
+INT_DEFINE_BEGIN(system_reset)
+	IVEC=0x100
+	IAREA=PACA_EXNMI
+	/*
+	 * MSR_RI is not enabled, because PACA_EXNMI and nmi stack is
+	 * being used, so a nested NMI exception would corrupt it.
+	 */
+	ISET_RI=0
+	ISTACK=0
+	IRECONCILE=0
+	IKVM_REAL=1
+INT_DEFINE_END(system_reset)
+
 EXC_REAL_BEGIN(system_reset, 0x100, 0x100)
 #ifdef CONFIG_PPC_P7_NAP
 	/*
@@ -880,11 +895,8 @@ BEGIN_FTR_SECTION
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
 #endif
 
-	INT_HANDLER system_reset, 0x100, area=PACA_EXNMI, ri=0, kvm=1
+	GEN_INT_ENTRY system_reset, virt=0
 	/*
-	 * MSR_RI is not enabled, because PACA_EXNMI and nmi stack is
-	 * being used, so a nested NMI exception would corrupt it.
-	 *
 	 * In theory, we should not enable relocation here if it was disabled
 	 * in SRR1, because the MMU may not be configured to support it (e.g.,
 	 * SLB may have been cleared). In practice, there should only be a few
@@ -893,7 +905,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
 	 */
 EXC_REAL_END(system_reset, 0x100, 0x100)
 EXC_VIRT_NONE(0x4100, 0x100)
-INT_KVM_HANDLER system_reset 0x100, EXC_STD, PACA_EXNMI, 0
+TRAMP_KVM_BEGIN(system_reset_kvm)
+	GEN_KVM system_reset
 
 #ifdef CONFIG_PPC_P7_NAP
 TRAMP_REAL_BEGIN(system_reset_idle_wake)
@@ -908,8 +921,8 @@ TRAMP_REAL_BEGIN(system_reset_idle_wake)
  * Vectors for the FWNMI option.  Share common code.
  */
 TRAMP_REAL_BEGIN(system_reset_fwnmi)
-	/* See comment at system_reset exception, don't turn on RI */
-	INT_HANDLER system_reset, 0x100, area=PACA_EXNMI, ri=0
+	__IKVM_REAL(system_reset)=0
+	GEN_INT_ENTRY system_reset, virt=0
 
 #endif /* CONFIG_PPC_PSERIES */
 
@@ -929,7 +942,7 @@ EXC_COMMON_BEGIN(system_reset_common)
 	mr	r10,r1
 	ld	r1,PACA_NMI_EMERG_SP(r13)
 	subi	r1,r1,INT_FRAME_SIZE
-	INT_COMMON 0x100, PACA_EXNMI, 0, 1, 0, 0, 0
+	GEN_COMMON system_reset
 	bl	save_nvgprs
 	/*
 	 * Set IRQS_ALL_DISABLED unconditionally so arch_irqs_disabled does
@@ -971,23 +984,46 @@ EXC_COMMON_BEGIN(system_reset_common)
 	RFI_TO_USER_OR_KERNEL
 
 
-EXC_REAL_BEGIN(machine_check, 0x200, 0x100)
-	INT_HANDLER machine_check, 0x200, early=1, area=PACA_EXMC, dar=1, dsisr=1
+INT_DEFINE_BEGIN(machine_check_early)
+	IVEC=0x200
+	IAREA=PACA_EXMC
 	/*
 	 * MSR_RI is not enabled, because PACA_EXMC is being used, so a
 	 * nested machine check corrupts it. machine_check_common enables
 	 * MSR_RI.
 	 */
+	ISET_RI=0
+	ISTACK=0
+	IEARLY=1
+	IDAR=1
+	IDSISR=1
+	IRECONCILE=0
+	IKUAP=0 /* We don't touch AMR here, we never go to virtual mode */
+INT_DEFINE_END(machine_check_early)
+
+INT_DEFINE_BEGIN(machine_check)
+	IVEC=0x200
+	IAREA=PACA_EXMC
+	ISET_RI=0
+	IDAR=1
+	IDSISR=1
+	IKVM_SKIP=1
+	IKVM_REAL=1
+INT_DEFINE_END(machine_check)
+
+EXC_REAL_BEGIN(machine_check, 0x200, 0x100)
+	GEN_INT_ENTRY machine_check_early, virt=0
 EXC_REAL_END(machine_check, 0x200, 0x100)
 EXC_VIRT_NONE(0x4200, 0x100)
 
 #ifdef CONFIG_PPC_PSERIES
 TRAMP_REAL_BEGIN(machine_check_fwnmi)
 	/* See comment at machine_check exception, don't turn on RI */
-	INT_HANDLER machine_check, 0x200, early=1, area=PACA_EXMC, dar=1, dsisr=1
+	GEN_INT_ENTRY machine_check_early, virt=0
 #endif
 
-INT_KVM_HANDLER machine_check 0x200, EXC_STD, PACA_EXMC, 1
+TRAMP_KVM_BEGIN(machine_check_kvm)
+	GEN_KVM machine_check
 
 #define MACHINE_CHECK_HANDLER_WINDUP			\
 	/* Clear MSR_RI before setting SRR0 and SRR1. */\
@@ -1039,8 +1075,7 @@ EXC_COMMON_BEGIN(machine_check_early_common)
 	bgt	cr1,unrecoverable_mce	/* Check if we hit limit of 4 */
 	subi	r1,r1,INT_FRAME_SIZE	/* alloc stack frame */
 
-	/* We don't touch AMR here, we never go to virtual mode */
-	INT_COMMON 0x200, PACA_EXMC, 0, 0, 0, 1, 1
+	GEN_COMMON machine_check_early
 
 BEGIN_FTR_SECTION
 	bl	enable_machine_check
@@ -1128,15 +1163,15 @@ BEGIN_FTR_SECTION
 	mtspr	SPRN_CFAR,r10
 END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 	MACHINE_CHECK_HANDLER_WINDUP
-	/* See comment at machine_check exception, don't turn on RI */
-	INT_HANDLER machine_check, 0x200, area=PACA_EXMC, ri=0, dar=1, dsisr=1, kvm=1
+	GEN_INT_ENTRY machine_check, virt=0
 
 EXC_COMMON_BEGIN(machine_check_common)
 	/*
 	 * Machine check is different because we use a different
 	 * save area: PACA_EXMC instead of PACA_EXGEN.
 	 */
-	INT_COMMON 0x200, PACA_EXMC, 1, 1, 1, 1, 1
+	GEN_COMMON machine_check
+
 	FINISH_NAP
 	/* Enable MSR_RI when finished with PACA_EXMC */
 	li	r10,MSR_RI
@@ -1208,6 +1243,22 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
 	bl	unrecoverable_exception
 	b	.
 
+
+/**
+ * 0x300 - Data Storage Interrupt (DSI)
+ * This interrupt is generated due to a data access which does not have a valid
+ * page table entry with permissions to allow the data access to be performed.
+ * DAWR matches also fault here, as do RC updates, and minor misc errors e.g.,
+ * copy/paste, AMO, certain invalid CI accesses, etc.
+ *
+ * This interrupt is delivered to the guest (HV bit unchanged).
+ *
+ * Linux HPT responds by first attempting to refill the hash table from the
+ * Linux page table, then going to a full page fault if the Linux page table
+ * entry was insufficient. RPT goes straight to full page fault.
+ *
+ * PR KVM ...?
+ */
 INT_DEFINE_BEGIN(data_access)
 	IVEC=0x300
 	IDAR=1
@@ -1237,15 +1288,25 @@ MMU_FTR_SECTION_ELSE
 ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
 
+INT_DEFINE_BEGIN(data_access_slb)
+	IVEC=0x380
+	IAREA=PACA_EXSLB
+	IRECONCILE=0
+	IDAR=1
+	IKVM_SKIP=1
+	IKVM_REAL=1
+INT_DEFINE_END(data_access_slb)
+
 EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80)
-	INT_HANDLER data_access_slb, 0x380, ool=1, area=PACA_EXSLB, dar=1, kvm=1
+	GEN_INT_ENTRY data_access_slb, virt=0, ool=1
 EXC_REAL_END(data_access_slb, 0x380, 0x80)
 EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80)
-	INT_HANDLER data_access_slb, 0x380, virt=1, area=PACA_EXSLB, dar=1
+	GEN_INT_ENTRY data_access_slb, virt=1
 EXC_VIRT_END(data_access_slb, 0x4380, 0x80)
-INT_KVM_HANDLER data_access_slb, 0x380, EXC_STD, PACA_EXSLB, 1
+TRAMP_KVM_BEGIN(data_access_slb_kvm)
+	GEN_KVM data_access_slb
 EXC_COMMON_BEGIN(data_access_slb_common)
-	INT_COMMON 0x380, PACA_EXSLB, 1, 1, 0, 1, 0
+	GEN_COMMON data_access_slb
 	ld	r4,_DAR(r1)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 BEGIN_MMU_FTR_SECTION
@@ -1269,15 +1330,23 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 	b	ret_from_except
 
 
+INT_DEFINE_BEGIN(instruction_access)
+	IVEC=0x400
+	IDAR=2
+	IDSISR=2
+	IKVM_REAL=1
+INT_DEFINE_END(instruction_access)
+
 EXC_REAL_BEGIN(instruction_access, 0x400, 0x80)
-	INT_HANDLER instruction_access, 0x400, kvm=1
+	GEN_INT_ENTRY instruction_access, virt=0
 EXC_REAL_END(instruction_access, 0x400, 0x80)
 EXC_VIRT_BEGIN(instruction_access, 0x4400, 0x80)
-	INT_HANDLER instruction_access, 0x400, virt=1
+	GEN_INT_ENTRY instruction_access, virt=1
 EXC_VIRT_END(instruction_access, 0x4400, 0x80)
-INT_KVM_HANDLER instruction_access, 0x400, EXC_STD, PACA_EXGEN, 0
+TRAMP_KVM_BEGIN(instruction_access_kvm)
+	GEN_KVM instruction_access
 EXC_COMMON_BEGIN(instruction_access_common)
-	INT_COMMON 0x400, PACA_EXGEN, 1, 1, 1, 2, 2
+	GEN_COMMON instruction_access
 	ld	r4,_DAR(r1)
 	ld	r5,_DSISR(r1)
 BEGIN_MMU_FTR_SECTION
@@ -1289,15 +1358,24 @@ MMU_FTR_SECTION_ELSE
 ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
 
+INT_DEFINE_BEGIN(instruction_access_slb)
+	IVEC=0x480
+	IAREA=PACA_EXSLB
+	IRECONCILE=0
+	IDAR=2
+	IKVM_REAL=1
+INT_DEFINE_END(instruction_access_slb)
+
 EXC_REAL_BEGIN(instruction_access_slb, 0x480, 0x80)
-	INT_HANDLER instruction_access_slb, 0x480, area=PACA_EXSLB, kvm=1
+	GEN_INT_ENTRY instruction_access_slb, virt=0
 EXC_REAL_END(instruction_access_slb, 0x480, 0x80)
 EXC_VIRT_BEGIN(instruction_access_slb, 0x4480, 0x80)
-	INT_HANDLER instruction_access_slb, 0x480, virt=1, area=PACA_EXSLB
+	GEN_INT_ENTRY instruction_access_slb, virt=1
 EXC_VIRT_END(instruction_access_slb, 0x4480, 0x80)
-INT_KVM_HANDLER instruction_access_slb, 0x480, EXC_STD, PACA_EXSLB, 0
+TRAMP_KVM_BEGIN(instruction_access_slb_kvm)
+	GEN_KVM instruction_access_slb
 EXC_COMMON_BEGIN(instruction_access_slb_common)
-	INT_COMMON 0x480, PACA_EXSLB, 1, 1, 0, 2, 0
+	GEN_COMMON instruction_access_slb
 	ld	r4,_DAR(r1)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 BEGIN_MMU_FTR_SECTION
@@ -1320,15 +1398,24 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 	bl	do_bad_slb_fault
 	b	ret_from_except
 
+INT_DEFINE_BEGIN(hardware_interrupt)
+	IVEC=0x500
+	IHSRR=EXC_HV_OR_STD
+	IMASK=IRQS_DISABLED
+	IKVM_REAL=1
+	IKVM_VIRT=1
+INT_DEFINE_END(hardware_interrupt)
+
 EXC_REAL_BEGIN(hardware_interrupt, 0x500, 0x100)
-	INT_HANDLER hardware_interrupt, 0x500, hsrr=EXC_HV_OR_STD, bitmask=IRQS_DISABLED, kvm=1
+	GEN_INT_ENTRY hardware_interrupt, virt=0
 EXC_REAL_END(hardware_interrupt, 0x500, 0x100)
 EXC_VIRT_BEGIN(hardware_interrupt, 0x4500, 0x100)
-	INT_HANDLER hardware_interrupt, 0x500, virt=1, hsrr=EXC_HV_OR_STD, bitmask=IRQS_DISABLED, kvm=1
+	GEN_INT_ENTRY hardware_interrupt, virt=1
 EXC_VIRT_END(hardware_interrupt, 0x4500, 0x100)
-INT_KVM_HANDLER hardware_interrupt, 0x500, EXC_HV_OR_STD, PACA_EXGEN, 0
+TRAMP_KVM_BEGIN(hardware_interrupt_kvm)
+	GEN_KVM hardware_interrupt
 EXC_COMMON_BEGIN(hardware_interrupt_common)
-	INT_COMMON 0x500, PACA_EXGEN, 1, 1, 1, 0, 0
+	GEN_COMMON hardware_interrupt
 	FINISH_NAP
 	RUNLATCH_ON
 	addi	r3,r1,STACK_FRAME_OVERHEAD
@@ -1336,28 +1423,42 @@ EXC_COMMON_BEGIN(hardware_interrupt_common)
 	b	ret_from_except_lite
 
 
+INT_DEFINE_BEGIN(alignment)
+	IVEC=0x600
+	IDAR=1
+	IDSISR=1
+	IKVM_REAL=1
+INT_DEFINE_END(alignment)
+
 EXC_REAL_BEGIN(alignment, 0x600, 0x100)
-	INT_HANDLER alignment, 0x600, dar=1, dsisr=1, kvm=1
+	GEN_INT_ENTRY alignment, virt=0
 EXC_REAL_END(alignment, 0x600, 0x100)
 EXC_VIRT_BEGIN(alignment, 0x4600, 0x100)
-	INT_HANDLER alignment, 0x600, virt=1, dar=1, dsisr=1
+	GEN_INT_ENTRY alignment, virt=1
 EXC_VIRT_END(alignment, 0x4600, 0x100)
-INT_KVM_HANDLER alignment, 0x600, EXC_STD, PACA_EXGEN, 0
+TRAMP_KVM_BEGIN(alignment_kvm)
+	GEN_KVM alignment
 EXC_COMMON_BEGIN(alignment_common)
-	INT_COMMON 0x600, PACA_EXGEN, 1, 1, 1, 1, 1
+	GEN_COMMON alignment
 	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	alignment_exception
 	b	ret_from_except
 
 
+INT_DEFINE_BEGIN(program_check)
+	IVEC=0x700
+	IKVM_REAL=1
+INT_DEFINE_END(program_check)
+
 EXC_REAL_BEGIN(program_check, 0x700, 0x100)
-	INT_HANDLER program_check, 0x700, kvm=1
+	GEN_INT_ENTRY program_check, virt=0
 EXC_REAL_END(program_check, 0x700, 0x100)
 EXC_VIRT_BEGIN(program_check, 0x4700, 0x100)
-	INT_HANDLER program_check, 0x700, virt=1
+	GEN_INT_ENTRY program_check, virt=1
 EXC_VIRT_END(program_check, 0x4700, 0x100)
-INT_KVM_HANDLER program_check, 0x700, EXC_STD, PACA_EXGEN, 0
+TRAMP_KVM_BEGIN(program_check_kvm)
+	GEN_KVM program_check
 EXC_COMMON_BEGIN(program_check_common)
 	/*
 	 * It's possible to receive a TM Bad Thing type program check with
@@ -1383,10 +1484,12 @@ EXC_COMMON_BEGIN(program_check_common)
 	mr	r10,r1			/* Save r1			*/
 	ld	r1,PACAEMERGSP(r13)	/* Use emergency stack		*/
 	subi	r1,r1,INT_FRAME_SIZE	/* alloc stack frame		*/
-	INT_COMMON 0x700, PACA_EXGEN, 0, 1, 1, 0, 0
+	__ISTACK(program_check)=0
+	GEN_COMMON program_check
 	b 3f
 2:
-	INT_COMMON 0x700, PACA_EXGEN, 1, 1, 1, 0, 0
+	__ISTACK(program_check)=1
+	GEN_COMMON program_check
 3:
 	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
@@ -1394,15 +1497,22 @@ EXC_COMMON_BEGIN(program_check_common)
 	b	ret_from_except
 
 
+INT_DEFINE_BEGIN(fp_unavailable)
+	IVEC=0x800
+	IRECONCILE=0
+	IKVM_REAL=1
+INT_DEFINE_END(fp_unavailable)
+
 EXC_REAL_BEGIN(fp_unavailable, 0x800, 0x100)
-	INT_HANDLER fp_unavailable, 0x800, kvm=1
+	GEN_INT_ENTRY fp_unavailable, virt=0
 EXC_REAL_END(fp_unavailable, 0x800, 0x100)
 EXC_VIRT_BEGIN(fp_unavailable, 0x4800, 0x100)
-	INT_HANDLER fp_unavailable, 0x800, virt=1
+	GEN_INT_ENTRY fp_unavailable, virt=1
 EXC_VIRT_END(fp_unavailable, 0x4800, 0x100)
-INT_KVM_HANDLER fp_unavailable, 0x800, EXC_STD, PACA_EXGEN, 0
+TRAMP_KVM_BEGIN(fp_unavailable_kvm)
+	GEN_KVM fp_unavailable
 EXC_COMMON_BEGIN(fp_unavailable_common)
-	INT_COMMON 0x800, PACA_EXGEN, 1, 1, 0, 0, 0
+	GEN_COMMON fp_unavailable
 	bne	1f			/* if from user, just load it up */
 	bl	save_nvgprs
 	RECONCILE_IRQ_STATE(r10, r11)
@@ -1432,15 +1542,22 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM)
 #endif
 
 
+INT_DEFINE_BEGIN(decrementer)
+	IVEC=0x900
+	IMASK=IRQS_DISABLED
+	IKVM_REAL=1
+INT_DEFINE_END(decrementer)
+
 EXC_REAL_BEGIN(decrementer, 0x900, 0x80)
-	INT_HANDLER decrementer, 0x900, ool=1, bitmask=IRQS_DISABLED, kvm=1
+	GEN_INT_ENTRY decrementer, virt=0, ool=1
 EXC_REAL_END(decrementer, 0x900, 0x80)
 EXC_VIRT_BEGIN(decrementer, 0x4900, 0x80)
-	INT_HANDLER decrementer, 0x900, virt=1, bitmask=IRQS_DISABLED
+	GEN_INT_ENTRY decrementer, virt=1
 EXC_VIRT_END(decrementer, 0x4900, 0x80)
-INT_KVM_HANDLER decrementer, 0x900, EXC_STD, PACA_EXGEN, 0
+TRAMP_KVM_BEGIN(decrementer_kvm)
+	GEN_KVM decrementer
 EXC_COMMON_BEGIN(decrementer_common)
-	INT_COMMON 0x900, PACA_EXGEN, 1, 1, 1, 0, 0
+	GEN_COMMON decrementer
 	FINISH_NAP
 	RUNLATCH_ON
 	addi	r3,r1,STACK_FRAME_OVERHEAD
@@ -1448,30 +1565,45 @@ EXC_COMMON_BEGIN(decrementer_common)
 	b	ret_from_except_lite
 
 
+INT_DEFINE_BEGIN(hdecrementer)
+	IVEC=0x980
+	IHSRR=EXC_HV
+	IKVM_REAL=1
+	IKVM_VIRT=1
+INT_DEFINE_END(hdecrementer)
+
 EXC_REAL_BEGIN(hdecrementer, 0x980, 0x80)
-	INT_HANDLER hdecrementer, 0x980, hsrr=EXC_HV, kvm=1
+	GEN_INT_ENTRY hdecrementer, virt=0
 EXC_REAL_END(hdecrementer, 0x980, 0x80)
 EXC_VIRT_BEGIN(hdecrementer, 0x4980, 0x80)
-	INT_HANDLER hdecrementer, 0x980, virt=1, hsrr=EXC_HV, kvm=1
+	GEN_INT_ENTRY hdecrementer, virt=1
 EXC_VIRT_END(hdecrementer, 0x4980, 0x80)
-INT_KVM_HANDLER hdecrementer, 0x980, EXC_HV, PACA_EXGEN, 0
+TRAMP_KVM_BEGIN(hdecrementer_kvm)
+	GEN_KVM hdecrementer
 EXC_COMMON_BEGIN(hdecrementer_common)
-	INT_COMMON 0x980, PACA_EXGEN, 1, 1, 1, 0, 0
+	GEN_COMMON hdecrementer
 	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	hdec_interrupt
 	b	ret_from_except
 
 
+INT_DEFINE_BEGIN(doorbell_super)
+	IVEC=0xa00
+	IMASK=IRQS_DISABLED
+	IKVM_REAL=1
+INT_DEFINE_END(doorbell_super)
+
 EXC_REAL_BEGIN(doorbell_super, 0xa00, 0x100)
-	INT_HANDLER doorbell_super, 0xa00, bitmask=IRQS_DISABLED, kvm=1
+	GEN_INT_ENTRY doorbell_super, virt=0
 EXC_REAL_END(doorbell_super, 0xa00, 0x100)
 EXC_VIRT_BEGIN(doorbell_super, 0x4a00, 0x100)
-	INT_HANDLER doorbell_super, 0xa00, virt=1, bitmask=IRQS_DISABLED
+	GEN_INT_ENTRY doorbell_super, virt=1
 EXC_VIRT_END(doorbell_super, 0x4a00, 0x100)
-INT_KVM_HANDLER doorbell_super, 0xa00, EXC_STD, PACA_EXGEN, 0
+TRAMP_KVM_BEGIN(doorbell_super_kvm)
+	GEN_KVM doorbell_super
 EXC_COMMON_BEGIN(doorbell_super_common)
-	INT_COMMON 0xa00, PACA_EXGEN, 1, 1, 1, 0, 0
+	GEN_COMMON doorbell_super
 	FINISH_NAP
 	RUNLATCH_ON
 	addi	r3,r1,STACK_FRAME_OVERHEAD
@@ -1604,30 +1736,47 @@ TRAMP_KVM_BEGIN(system_call_kvm)
 #endif
 
 
+INT_DEFINE_BEGIN(single_step)
+	IVEC=0xd00
+	IKVM_REAL=1
+INT_DEFINE_END(single_step)
+
 EXC_REAL_BEGIN(single_step, 0xd00, 0x100)
-	INT_HANDLER single_step, 0xd00, kvm=1
+	GEN_INT_ENTRY single_step, virt=0
 EXC_REAL_END(single_step, 0xd00, 0x100)
 EXC_VIRT_BEGIN(single_step, 0x4d00, 0x100)
-	INT_HANDLER single_step, 0xd00, virt=1
+	GEN_INT_ENTRY single_step, virt=1
 EXC_VIRT_END(single_step, 0x4d00, 0x100)
-INT_KVM_HANDLER single_step, 0xd00, EXC_STD, PACA_EXGEN, 0
+TRAMP_KVM_BEGIN(single_step_kvm)
+	GEN_KVM single_step
 EXC_COMMON_BEGIN(single_step_common)
-	INT_COMMON 0xd00, PACA_EXGEN, 1, 1, 1, 0, 0
+	GEN_COMMON single_step
 	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	single_step_exception
 	b	ret_from_except
 
 
+INT_DEFINE_BEGIN(h_data_storage)
+	IVEC=0xe00
+	IHSRR=EXC_HV
+	IDAR=1
+	IDSISR=1
+	IKVM_SKIP=1
+	IKVM_REAL=1
+	IKVM_VIRT=1
+INT_DEFINE_END(h_data_storage)
+
 EXC_REAL_BEGIN(h_data_storage, 0xe00, 0x20)
-	INT_HANDLER h_data_storage, 0xe00, ool=1, hsrr=EXC_HV, dar=1, dsisr=1, kvm=1
+	GEN_INT_ENTRY h_data_storage, virt=0, ool=1
 EXC_REAL_END(h_data_storage, 0xe00, 0x20)
 EXC_VIRT_BEGIN(h_data_storage, 0x4e00, 0x20)
-	INT_HANDLER h_data_storage, 0xe00, ool=1, virt=1, hsrr=EXC_HV, dar=1, dsisr=1, kvm=1
+	GEN_INT_ENTRY h_data_storage, virt=1, ool=1
 EXC_VIRT_END(h_data_storage, 0x4e00, 0x20)
-INT_KVM_HANDLER h_data_storage, 0xe00, EXC_HV, PACA_EXGEN, 1
+TRAMP_KVM_BEGIN(h_data_storage_kvm)
+	GEN_KVM h_data_storage
 EXC_COMMON_BEGIN(h_data_storage_common)
-	INT_COMMON 0xe00, PACA_EXGEN, 1, 1, 1, 1, 1
+	GEN_COMMON h_data_storage
 	bl      save_nvgprs
 	addi    r3,r1,STACK_FRAME_OVERHEAD
 BEGIN_MMU_FTR_SECTION
@@ -1640,30 +1789,46 @@ ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_TYPE_RADIX)
 	b       ret_from_except
 
 
+INT_DEFINE_BEGIN(h_instr_storage)
+	IVEC=0xe20
+	IHSRR=EXC_HV
+	IKVM_REAL=1
+	IKVM_VIRT=1
+INT_DEFINE_END(h_instr_storage)
+
 EXC_REAL_BEGIN(h_instr_storage, 0xe20, 0x20)
-	INT_HANDLER h_instr_storage, 0xe20, ool=1, hsrr=EXC_HV, kvm=1
+	GEN_INT_ENTRY h_instr_storage, virt=0, ool=1
 EXC_REAL_END(h_instr_storage, 0xe20, 0x20)
 EXC_VIRT_BEGIN(h_instr_storage, 0x4e20, 0x20)
-	INT_HANDLER h_instr_storage, 0xe20, ool=1, virt=1, hsrr=EXC_HV, kvm=1
+	GEN_INT_ENTRY h_instr_storage, virt=1, ool=1
 EXC_VIRT_END(h_instr_storage, 0x4e20, 0x20)
-INT_KVM_HANDLER h_instr_storage, 0xe20, EXC_HV, PACA_EXGEN, 0
+TRAMP_KVM_BEGIN(h_instr_storage_kvm)
+	GEN_KVM h_instr_storage
 EXC_COMMON_BEGIN(h_instr_storage_common)
-	INT_COMMON 0xe20, PACA_EXGEN, 1, 1, 1, 0, 0
+	GEN_COMMON h_instr_storage
 	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	unknown_exception
 	b	ret_from_except
 
 
+INT_DEFINE_BEGIN(emulation_assist)
+	IVEC=0xe40
+	IHSRR=EXC_HV
+	IKVM_REAL=1
+	IKVM_VIRT=1
+INT_DEFINE_END(emulation_assist)
+
 EXC_REAL_BEGIN(emulation_assist, 0xe40, 0x20)
-	INT_HANDLER emulation_assist, 0xe40, ool=1, hsrr=EXC_HV, kvm=1
+	GEN_INT_ENTRY emulation_assist, virt=0, ool=1
 EXC_REAL_END(emulation_assist, 0xe40, 0x20)
 EXC_VIRT_BEGIN(emulation_assist, 0x4e40, 0x20)
-	INT_HANDLER emulation_assist, 0xe40, ool=1, virt=1, hsrr=EXC_HV, kvm=1
+	GEN_INT_ENTRY emulation_assist, virt=1, ool=1
 EXC_VIRT_END(emulation_assist, 0x4e40, 0x20)
-INT_KVM_HANDLER emulation_assist, 0xe40, EXC_HV, PACA_EXGEN, 0
+TRAMP_KVM_BEGIN(emulation_assist_kvm)
+	GEN_KVM emulation_assist
 EXC_COMMON_BEGIN(emulation_assist_common)
-	INT_COMMON 0xe40, PACA_EXGEN, 1, 1, 1, 0, 0
+	GEN_COMMON emulation_assist
 	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	emulation_assist_interrupt
@@ -1675,11 +1840,32 @@ EXC_COMMON_BEGIN(emulation_assist_common)
  * first, and then eventaully from there to the trampoline to get into virtual
  * mode.
  */
+INT_DEFINE_BEGIN(hmi_exception_early)
+	IVEC=0xe60
+	IHSRR=EXC_HV
+	IEARLY=1
+	ISTACK=0
+	IRECONCILE=0
+	IKUAP=0 /* We don't touch AMR here, we never go to virtual mode */
+	IKVM_REAL=1
+INT_DEFINE_END(hmi_exception_early)
+
+INT_DEFINE_BEGIN(hmi_exception)
+	IVEC=0xe60
+	IHSRR=EXC_HV
+	IMASK=IRQS_DISABLED
+	IKVM_REAL=1
+INT_DEFINE_END(hmi_exception)
+
 EXC_REAL_BEGIN(hmi_exception, 0xe60, 0x20)
-	INT_HANDLER hmi_exception, 0xe60, ool=1, early=1, hsrr=EXC_HV, ri=0, kvm=1
+	GEN_INT_ENTRY hmi_exception_early, virt=0, ool=1
 EXC_REAL_END(hmi_exception, 0xe60, 0x20)
 EXC_VIRT_NONE(0x4e60, 0x20)
-INT_KVM_HANDLER hmi_exception, 0xe60, EXC_HV, PACA_EXGEN, 0
+TRAMP_KVM_BEGIN(hmi_exception_early_kvm)
+	GEN_KVM hmi_exception_early
+TRAMP_KVM_BEGIN(hmi_exception_kvm)
+	GEN_KVM hmi_exception
+
 EXC_COMMON_BEGIN(hmi_exception_early_common)
 	mtctr	r10			/* Restore ctr */
 	mfspr	r11,SPRN_HSRR0		/* Save HSRR0 */
@@ -1688,8 +1874,7 @@ EXC_COMMON_BEGIN(hmi_exception_early_common)
 	ld	r1,PACAEMERGSP(r13)	/* Use emergency stack for realmode */
 	subi	r1,r1,INT_FRAME_SIZE	/* alloc stack frame		*/
 
-	/* We don't touch AMR here, we never go to virtual mode */
-	INT_COMMON 0xe60, PACA_EXGEN, 0, 0, 0, 0, 0
+	GEN_COMMON hmi_exception_early
 
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	hmi_exception_realmode
@@ -1705,10 +1890,10 @@ EXC_COMMON_BEGIN(hmi_exception_early_common)
 	 * firmware.
 	 */
 	EXCEPTION_RESTORE_REGS EXC_HV
-	INT_HANDLER hmi_exception, 0xe60, hsrr=EXC_HV, bitmask=IRQS_DISABLED, kvm=1
+	GEN_INT_ENTRY hmi_exception, virt=0
 
 EXC_COMMON_BEGIN(hmi_exception_common)
-	INT_COMMON 0xe60, PACA_EXGEN, 1, 1, 1, 0, 0
+	GEN_COMMON hmi_exception
 	FINISH_NAP
 	RUNLATCH_ON
 	bl	save_nvgprs
@@ -1717,15 +1902,24 @@ EXC_COMMON_BEGIN(hmi_exception_common)
 	b	ret_from_except
 
 
+INT_DEFINE_BEGIN(h_doorbell)
+	IVEC=0xe80
+	IHSRR=EXC_HV
+	IMASK=IRQS_DISABLED
+	IKVM_REAL=1
+	IKVM_VIRT=1
+INT_DEFINE_END(h_doorbell)
+
 EXC_REAL_BEGIN(h_doorbell, 0xe80, 0x20)
-	INT_HANDLER h_doorbell, 0xe80, ool=1, hsrr=EXC_HV, bitmask=IRQS_DISABLED, kvm=1
+	GEN_INT_ENTRY h_doorbell, virt=0, ool=1
 EXC_REAL_END(h_doorbell, 0xe80, 0x20)
 EXC_VIRT_BEGIN(h_doorbell, 0x4e80, 0x20)
-	INT_HANDLER h_doorbell, 0xe80, ool=1, virt=1, hsrr=EXC_HV, bitmask=IRQS_DISABLED, kvm=1
+	GEN_INT_ENTRY h_doorbell, virt=1, ool=1
 EXC_VIRT_END(h_doorbell, 0x4e80, 0x20)
-INT_KVM_HANDLER h_doorbell, 0xe80, EXC_HV, PACA_EXGEN, 0
+TRAMP_KVM_BEGIN(h_doorbell_kvm)
+	GEN_KVM h_doorbell
 EXC_COMMON_BEGIN(h_doorbell_common)
-	INT_COMMON 0xe80, PACA_EXGEN, 1, 1, 1, 0, 0
+	GEN_COMMON h_doorbell
 	FINISH_NAP
 	RUNLATCH_ON
 	addi	r3,r1,STACK_FRAME_OVERHEAD
@@ -1737,15 +1931,24 @@ EXC_COMMON_BEGIN(h_doorbell_common)
 	b	ret_from_except_lite
 
 
+INT_DEFINE_BEGIN(h_virt_irq)
+	IVEC=0xea0
+	IHSRR=EXC_HV
+	IMASK=IRQS_DISABLED
+	IKVM_REAL=1
+	IKVM_VIRT=1
+INT_DEFINE_END(h_virt_irq)
+
 EXC_REAL_BEGIN(h_virt_irq, 0xea0, 0x20)
-	INT_HANDLER h_virt_irq, 0xea0, ool=1, hsrr=EXC_HV, bitmask=IRQS_DISABLED, kvm=1
+	GEN_INT_ENTRY h_virt_irq, virt=0, ool=1
 EXC_REAL_END(h_virt_irq, 0xea0, 0x20)
 EXC_VIRT_BEGIN(h_virt_irq, 0x4ea0, 0x20)
-	INT_HANDLER h_virt_irq, 0xea0, ool=1, virt=1, hsrr=EXC_HV, bitmask=IRQS_DISABLED, kvm=1
+	GEN_INT_ENTRY h_virt_irq, virt=1, ool=1
 EXC_VIRT_END(h_virt_irq, 0x4ea0, 0x20)
-INT_KVM_HANDLER h_virt_irq, 0xea0, EXC_HV, PACA_EXGEN, 0
+TRAMP_KVM_BEGIN(h_virt_irq_kvm)
+	GEN_KVM h_virt_irq
 EXC_COMMON_BEGIN(h_virt_irq_common)
-	INT_COMMON 0xea0, PACA_EXGEN, 1, 1, 1, 0, 0
+	GEN_COMMON h_virt_irq
 	FINISH_NAP
 	RUNLATCH_ON
 	addi	r3,r1,STACK_FRAME_OVERHEAD
@@ -1759,15 +1962,22 @@ EXC_REAL_NONE(0xee0, 0x20)
 EXC_VIRT_NONE(0x4ee0, 0x20)
 
 
+INT_DEFINE_BEGIN(performance_monitor)
+	IVEC=0xf00
+	IMASK=IRQS_PMI_DISABLED
+	IKVM_REAL=1
+INT_DEFINE_END(performance_monitor)
+
 EXC_REAL_BEGIN(performance_monitor, 0xf00, 0x20)
-	INT_HANDLER performance_monitor, 0xf00, ool=1, bitmask=IRQS_PMI_DISABLED, kvm=1
+	GEN_INT_ENTRY performance_monitor, virt=0, ool=1
 EXC_REAL_END(performance_monitor, 0xf00, 0x20)
 EXC_VIRT_BEGIN(performance_monitor, 0x4f00, 0x20)
-	INT_HANDLER performance_monitor, 0xf00, ool=1, virt=1, bitmask=IRQS_PMI_DISABLED
+	GEN_INT_ENTRY performance_monitor, virt=1, ool=1
 EXC_VIRT_END(performance_monitor, 0x4f00, 0x20)
-INT_KVM_HANDLER performance_monitor, 0xf00, EXC_STD, PACA_EXGEN, 0
+TRAMP_KVM_BEGIN(performance_monitor_kvm)
+	GEN_KVM performance_monitor
 EXC_COMMON_BEGIN(performance_monitor_common)
-	INT_COMMON 0xf00, PACA_EXGEN, 1, 1, 1, 0, 0
+	GEN_COMMON performance_monitor
 	FINISH_NAP
 	RUNLATCH_ON
 	addi	r3,r1,STACK_FRAME_OVERHEAD
@@ -1775,15 +1985,22 @@ EXC_COMMON_BEGIN(performance_monitor_common)
 	b	ret_from_except_lite
 
 
+INT_DEFINE_BEGIN(altivec_unavailable)
+	IVEC=0xf20
+	IRECONCILE=0
+	IKVM_REAL=1
+INT_DEFINE_END(altivec_unavailable)
+
 EXC_REAL_BEGIN(altivec_unavailable, 0xf20, 0x20)
-	INT_HANDLER altivec_unavailable, 0xf20, ool=1, kvm=1
+	GEN_INT_ENTRY altivec_unavailable, virt=0, ool=1
 EXC_REAL_END(altivec_unavailable, 0xf20, 0x20)
 EXC_VIRT_BEGIN(altivec_unavailable, 0x4f20, 0x20)
-	INT_HANDLER altivec_unavailable, 0xf20, ool=1, virt=1
+	GEN_INT_ENTRY altivec_unavailable, virt=1, ool=1
 EXC_VIRT_END(altivec_unavailable, 0x4f20, 0x20)
-INT_KVM_HANDLER altivec_unavailable, 0xf20, EXC_STD, PACA_EXGEN, 0
+TRAMP_KVM_BEGIN(altivec_unavailable_kvm)
+	GEN_KVM altivec_unavailable
 EXC_COMMON_BEGIN(altivec_unavailable_common)
-	INT_COMMON 0xf20, PACA_EXGEN, 1, 1, 0, 0, 0
+	GEN_COMMON altivec_unavailable
 #ifdef CONFIG_ALTIVEC
 BEGIN_FTR_SECTION
 	beq	1f
@@ -1816,15 +2033,22 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 	b	ret_from_except
 
 
+INT_DEFINE_BEGIN(vsx_unavailable)
+	IVEC=0xf40
+	IRECONCILE=0
+	IKVM_REAL=1
+INT_DEFINE_END(vsx_unavailable)
+
 EXC_REAL_BEGIN(vsx_unavailable, 0xf40, 0x20)
-	INT_HANDLER vsx_unavailable, 0xf40, ool=1, kvm=1
+	GEN_INT_ENTRY vsx_unavailable, virt=0, ool=1
 EXC_REAL_END(vsx_unavailable, 0xf40, 0x20)
 EXC_VIRT_BEGIN(vsx_unavailable, 0x4f40, 0x20)
-	INT_HANDLER vsx_unavailable, 0xf40, ool=1, virt=1
+	GEN_INT_ENTRY vsx_unavailable, virt=1, ool=1
 EXC_VIRT_END(vsx_unavailable, 0x4f40, 0x20)
-INT_KVM_HANDLER vsx_unavailable, 0xf40, EXC_STD, PACA_EXGEN, 0
+TRAMP_KVM_BEGIN(vsx_unavailable_kvm)
+	GEN_KVM vsx_unavailable
 EXC_COMMON_BEGIN(vsx_unavailable_common)
-	INT_COMMON 0xf40, PACA_EXGEN, 1, 1, 0, 0, 0
+	GEN_COMMON vsx_unavailable
 #ifdef CONFIG_VSX
 BEGIN_FTR_SECTION
 	beq	1f
@@ -1856,30 +2080,44 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
 	b	ret_from_except
 
 
+INT_DEFINE_BEGIN(facility_unavailable)
+	IVEC=0xf60
+	IKVM_REAL=1
+INT_DEFINE_END(facility_unavailable)
+
 EXC_REAL_BEGIN(facility_unavailable, 0xf60, 0x20)
-	INT_HANDLER facility_unavailable, 0xf60, ool=1, kvm=1
+	GEN_INT_ENTRY facility_unavailable, virt=0, ool=1
 EXC_REAL_END(facility_unavailable, 0xf60, 0x20)
 EXC_VIRT_BEGIN(facility_unavailable, 0x4f60, 0x20)
-	INT_HANDLER facility_unavailable, 0xf60, ool=1, virt=1
+	GEN_INT_ENTRY facility_unavailable, virt=1, ool=1
 EXC_VIRT_END(facility_unavailable, 0x4f60, 0x20)
-INT_KVM_HANDLER facility_unavailable, 0xf60, EXC_STD, PACA_EXGEN, 0
+TRAMP_KVM_BEGIN(facility_unavailable_kvm)
+	GEN_KVM facility_unavailable
 EXC_COMMON_BEGIN(facility_unavailable_common)
-	INT_COMMON 0xf60, PACA_EXGEN, 1, 1, 1, 0, 0
+	GEN_COMMON facility_unavailable
 	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	facility_unavailable_exception
 	b	ret_from_except
 
 
+INT_DEFINE_BEGIN(h_facility_unavailable)
+	IVEC=0xf80
+	IHSRR=EXC_HV
+	IKVM_REAL=1
+	IKVM_VIRT=1
+INT_DEFINE_END(h_facility_unavailable)
+
 EXC_REAL_BEGIN(h_facility_unavailable, 0xf80, 0x20)
-	INT_HANDLER h_facility_unavailable, 0xf80, ool=1, hsrr=EXC_HV, kvm=1
+	GEN_INT_ENTRY h_facility_unavailable, virt=0, ool=1
 EXC_REAL_END(h_facility_unavailable, 0xf80, 0x20)
 EXC_VIRT_BEGIN(h_facility_unavailable, 0x4f80, 0x20)
-	INT_HANDLER h_facility_unavailable, 0xf80, ool=1, virt=1, hsrr=EXC_HV, kvm=1
+	GEN_INT_ENTRY h_facility_unavailable, virt=1, ool=1
 EXC_VIRT_END(h_facility_unavailable, 0x4f80, 0x20)
-INT_KVM_HANDLER h_facility_unavailable, 0xf80, EXC_HV, PACA_EXGEN, 0
+TRAMP_KVM_BEGIN(h_facility_unavailable_kvm)
+	GEN_KVM h_facility_unavailable
 EXC_COMMON_BEGIN(h_facility_unavailable_common)
-	INT_COMMON 0xf80, PACA_EXGEN, 1, 1, 1, 0, 0
+	GEN_COMMON h_facility_unavailable
 	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	facility_unavailable_exception
@@ -1899,13 +2137,21 @@ EXC_REAL_NONE(0x1100, 0x100)
 EXC_VIRT_NONE(0x5100, 0x100)
 
 #ifdef CONFIG_CBE_RAS
+INT_DEFINE_BEGIN(cbe_system_error)
+	IVEC=0x1200
+	IHSRR=EXC_HV
+	IKVM_SKIP=1
+	IKVM_REAL=1
+INT_DEFINE_END(cbe_system_error)
+
 EXC_REAL_BEGIN(cbe_system_error, 0x1200, 0x100)
-	INT_HANDLER cbe_system_error, 0x1200, ool=1, hsrr=EXC_HV, kvm=1
+	GEN_INT_ENTRY cbe_system_error, virt=0
 EXC_REAL_END(cbe_system_error, 0x1200, 0x100)
 EXC_VIRT_NONE(0x5200, 0x100)
-INT_KVM_HANDLER cbe_system_error, 0x1200, EXC_HV, PACA_EXGEN, 1
+TRAMP_KVM_BEGIN(cbe_system_error_kvm)
+	GEN_KVM cbe_system_error
 EXC_COMMON_BEGIN(cbe_system_error_common)
-	INT_COMMON 0x1200, PACA_EXGEN, 1, 1, 1, 0, 0
+	GEN_COMMON cbe_system_error
 	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	cbe_system_error_exception
@@ -1916,15 +2162,22 @@ EXC_VIRT_NONE(0x5200, 0x100)
 #endif
 
 
+INT_DEFINE_BEGIN(instruction_breakpoint)
+	IVEC=0x1300
+	IKVM_SKIP=1
+	IKVM_REAL=1
+INT_DEFINE_END(instruction_breakpoint)
+
 EXC_REAL_BEGIN(instruction_breakpoint, 0x1300, 0x100)
-	INT_HANDLER instruction_breakpoint, 0x1300, kvm=1
+	GEN_INT_ENTRY instruction_breakpoint, virt=0
 EXC_REAL_END(instruction_breakpoint, 0x1300, 0x100)
 EXC_VIRT_BEGIN(instruction_breakpoint, 0x5300, 0x100)
-	INT_HANDLER instruction_breakpoint, 0x1300, virt=1
+	GEN_INT_ENTRY instruction_breakpoint, virt=1
 EXC_VIRT_END(instruction_breakpoint, 0x5300, 0x100)
-INT_KVM_HANDLER instruction_breakpoint, 0x1300, EXC_STD, PACA_EXGEN, 1
+TRAMP_KVM_BEGIN(instruction_breakpoint_kvm)
+	GEN_KVM instruction_breakpoint
 EXC_COMMON_BEGIN(instruction_breakpoint_common)
-	INT_COMMON 0x1300, PACA_EXGEN, 1, 1, 1, 0, 0
+	GEN_COMMON instruction_breakpoint
 	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	instruction_breakpoint_exception
@@ -1934,30 +2187,35 @@ EXC_COMMON_BEGIN(instruction_breakpoint_common)
 EXC_REAL_NONE(0x1400, 0x100)
 EXC_VIRT_NONE(0x5400, 0x100)
 
-EXC_REAL_BEGIN(denorm_exception_hv, 0x1500, 0x100)
-	INT_HANDLER denorm_exception_hv, 0x1500, early=2, hsrr=EXC_HV
+INT_DEFINE_BEGIN(denorm_exception)
+	IVEC=0x1500
+	IHSRR=EXC_HV
+	IEARLY=2
+INT_DEFINE_END(denorm_exception)
+
+EXC_REAL_BEGIN(denorm_exception, 0x1500, 0x100)
+	GEN_INT_ENTRY denorm_exception, virt=0
 #ifdef CONFIG_PPC_DENORMALISATION
 	mfspr	r10,SPRN_HSRR1
 	andis.	r10,r10,(HSRR1_DENORM)@h /* denorm? */
 	bne+	denorm_assist
 #endif
-	KVMTEST denorm_exception_hv, EXC_HV 0x1500
-	INT_SAVE_SRR_AND_JUMP denorm_common, EXC_HV, 1
-EXC_REAL_END(denorm_exception_hv, 0x1500, 0x100)
-
+	KVMTEST denorm_exception, EXC_HV, 0x1500
+	INT_SAVE_SRR_AND_JUMP denorm_exception_common, EXC_HV, 1
+EXC_REAL_END(denorm_exception, 0x1500, 0x100)
 #ifdef CONFIG_PPC_DENORMALISATION
 EXC_VIRT_BEGIN(denorm_exception, 0x5500, 0x100)
-	INT_HANDLER denorm_exception, 0x1500, 0, 2, 1, EXC_HV, PACA_EXGEN, 1, 0, 0, 0, 0
+	GEN_INT_ENTRY denorm_exception, virt=1
 	mfspr	r10,SPRN_HSRR1
 	andis.	r10,r10,(HSRR1_DENORM)@h /* denorm? */
 	bne+	denorm_assist
-	INT_VIRT_SAVE_SRR_AND_JUMP denorm_common, EXC_HV
+	INT_VIRT_SAVE_SRR_AND_JUMP denorm_exception_common, EXC_HV
 EXC_VIRT_END(denorm_exception, 0x5500, 0x100)
 #else
 EXC_VIRT_NONE(0x5500, 0x100)
 #endif
-
-INT_KVM_HANDLER denorm_exception_hv, 0x1500, EXC_HV, PACA_EXGEN, 0
+TRAMP_KVM_BEGIN(denorm_exception_kvm)
+	GEN_KVM denorm_exception
 
 #ifdef CONFIG_PPC_DENORMALISATION
 TRAMP_REAL_BEGIN(denorm_assist)
@@ -2028,8 +2286,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 	b	.
 #endif
 
-EXC_COMMON_BEGIN(denorm_common)
-	INT_COMMON 0x1500, PACA_EXGEN, 1, 1, 1, 0, 0
+EXC_COMMON_BEGIN(denorm_exception_common)
+	GEN_COMMON denorm_exception
 	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	unknown_exception
@@ -2037,13 +2295,21 @@ EXC_COMMON_BEGIN(denorm_common)
 
 
 #ifdef CONFIG_CBE_RAS
+INT_DEFINE_BEGIN(cbe_maintenance)
+	IVEC=0x1600
+	IHSRR=EXC_HV
+	IKVM_SKIP=1
+	IKVM_REAL=1
+INT_DEFINE_END(cbe_maintenance)
+
 EXC_REAL_BEGIN(cbe_maintenance, 0x1600, 0x100)
-	INT_HANDLER cbe_maintenance, 0x1600, ool=1, hsrr=EXC_HV, kvm=1
+	GEN_INT_ENTRY cbe_maintenance, virt=0
 EXC_REAL_END(cbe_maintenance, 0x1600, 0x100)
 EXC_VIRT_NONE(0x5600, 0x100)
-INT_KVM_HANDLER cbe_maintenance, 0x1600, EXC_HV, PACA_EXGEN, 1
+TRAMP_KVM_BEGIN(cbe_maintenance_kvm)
+	GEN_KVM cbe_maintenance
 EXC_COMMON_BEGIN(cbe_maintenance_common)
-	INT_COMMON 0x1600, PACA_EXGEN, 1, 1, 1, 0, 0
+	GEN_COMMON cbe_maintenance
 	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	cbe_maintenance_exception
@@ -2054,15 +2320,21 @@ EXC_VIRT_NONE(0x5600, 0x100)
 #endif
 
 
+INT_DEFINE_BEGIN(altivec_assist)
+	IVEC=0x1700
+	IKVM_REAL=1
+INT_DEFINE_END(altivec_assist)
+
 EXC_REAL_BEGIN(altivec_assist, 0x1700, 0x100)
-	INT_HANDLER altivec_assist, 0x1700, kvm=1
+	GEN_INT_ENTRY altivec_assist, virt=0
 EXC_REAL_END(altivec_assist, 0x1700, 0x100)
 EXC_VIRT_BEGIN(altivec_assist, 0x5700, 0x100)
-	INT_HANDLER altivec_assist, 0x1700, virt=1
+	GEN_INT_ENTRY altivec_assist, virt=1
 EXC_VIRT_END(altivec_assist, 0x5700, 0x100)
-INT_KVM_HANDLER altivec_assist, 0x1700, EXC_STD, PACA_EXGEN, 0
+TRAMP_KVM_BEGIN(altivec_assist_kvm)
+	GEN_KVM altivec_assist
 EXC_COMMON_BEGIN(altivec_assist_common)
-	INT_COMMON 0x1700, PACA_EXGEN, 1, 1, 1, 0, 0
+	GEN_COMMON altivec_assist
 	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 #ifdef CONFIG_ALTIVEC
@@ -2074,13 +2346,21 @@ EXC_COMMON_BEGIN(altivec_assist_common)
 
 
 #ifdef CONFIG_CBE_RAS
+INT_DEFINE_BEGIN(cbe_thermal)
+	IVEC=0x1800
+	IHSRR=EXC_HV
+	IKVM_SKIP=1
+	IKVM_REAL=1
+INT_DEFINE_END(cbe_thermal)
+
 EXC_REAL_BEGIN(cbe_thermal, 0x1800, 0x100)
-	INT_HANDLER cbe_thermal, 0x1800, ool=1, hsrr=EXC_HV, kvm=1
+	GEN_INT_ENTRY cbe_thermal, virt=0
 EXC_REAL_END(cbe_thermal, 0x1800, 0x100)
 EXC_VIRT_NONE(0x5800, 0x100)
-INT_KVM_HANDLER cbe_thermal, 0x1800, EXC_HV, PACA_EXGEN, 1
+TRAMP_KVM_BEGIN(cbe_thermal_kvm)
+	GEN_KVM cbe_thermal
 EXC_COMMON_BEGIN(cbe_thermal_common)
-	INT_COMMON 0x1800, PACA_EXGEN, 1, 1, 1, 0, 0
+	GEN_COMMON cbe_thermal
 	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	cbe_thermal_exception
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v3 06/32] powerpc/64s/exception: Remove old INT_ENTRY macro
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (4 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 05/32] powerpc/64s/exception: Move all interrupt handlers to new style code gen macros Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 07/32] powerpc/64s/exception: Remove old INT_COMMON macro Nicholas Piggin
                   ` (27 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 68 ++++++++++++----------------
 1 file changed, 30 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 0157ba48efe9..74bf6e0bf61f 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -482,13 +482,13 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
  * - Fall through and continue executing in real, unrelocated mode.
  *   This is done if early=2.
  */
-.macro INT_HANDLER name, vec, ool=0, early=0, virt=0, hsrr=0, area=PACA_EXGEN, ri=1, dar=0, dsisr=0, bitmask=0, kvm=0
+.macro GEN_INT_ENTRY name, virt, ool=0
 	SET_SCRATCH0(r13)			/* save r13 */
 	GET_PACA(r13)
-	std	r9,\area\()+EX_R9(r13)		/* save r9 */
+	std	r9,IAREA+EX_R9(r13)		/* save r9 */
 	OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR)
 	HMT_MEDIUM
-	std	r10,\area\()+EX_R10(r13)	/* save r10 - r12 */
+	std	r10,IAREA+EX_R10(r13)		/* save r10 - r12 */
 	OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR)
 	.if \ool
 	.if !\virt
@@ -502,47 +502,47 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	.endif
 	.endif
 
-	OPT_SAVE_REG_TO_PACA(\area\()+EX_PPR, r9, CPU_FTR_HAS_PPR)
-	OPT_SAVE_REG_TO_PACA(\area\()+EX_CFAR, r10, CPU_FTR_CFAR)
+	OPT_SAVE_REG_TO_PACA(IAREA+EX_PPR, r9, CPU_FTR_HAS_PPR)
+	OPT_SAVE_REG_TO_PACA(IAREA+EX_CFAR, r10, CPU_FTR_CFAR)
 	INTERRUPT_TO_KERNEL
-	SAVE_CTR(r10, \area\())
+	SAVE_CTR(r10, IAREA)
 	mfcr	r9
-	.if \kvm
-		KVMTEST \name \hsrr \vec
+	.if (!\virt && IKVM_REAL) || (\virt && IKVM_VIRT)
+		KVMTEST \name IHSRR IVEC
 	.endif
-	.if \bitmask
+	.if IMASK
 		lbz	r10,PACAIRQSOFTMASK(r13)
-		andi.	r10,r10,\bitmask
+		andi.	r10,r10,IMASK
 		/* Associate vector numbers with bits in paca->irq_happened */
-		.if \vec == 0x500 || \vec == 0xea0
+		.if IVEC == 0x500 || IVEC == 0xea0
 		li	r10,PACA_IRQ_EE
-		.elseif \vec == 0x900
+		.elseif IVEC == 0x900
 		li	r10,PACA_IRQ_DEC
-		.elseif \vec == 0xa00 || \vec == 0xe80
+		.elseif IVEC == 0xa00 || IVEC == 0xe80
 		li	r10,PACA_IRQ_DBELL
-		.elseif \vec == 0xe60
+		.elseif IVEC == 0xe60
 		li	r10,PACA_IRQ_HMI
-		.elseif \vec == 0xf00
+		.elseif IVEC == 0xf00
 		li	r10,PACA_IRQ_PMI
 		.else
 		.abort "Bad maskable vector"
 		.endif
 
-		.if \hsrr == EXC_HV_OR_STD
+		.if IHSRR == EXC_HV_OR_STD
 		BEGIN_FTR_SECTION
 		bne	masked_Hinterrupt
 		FTR_SECTION_ELSE
 		bne	masked_interrupt
 		ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-		.elseif \hsrr
+		.elseif IHSRR
 		bne	masked_Hinterrupt
 		.else
 		bne	masked_interrupt
 		.endif
 	.endif
 
-	std	r11,\area\()+EX_R11(r13)
-	std	r12,\area\()+EX_R12(r13)
+	std	r11,IAREA+EX_R11(r13)
+	std	r12,IAREA+EX_R12(r13)
 
 	/*
 	 * DAR/DSISR, SCRATCH0 must be read before setting MSR[RI],
@@ -550,47 +550,39 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	 * not recoverable if they are live.
 	 */
 	GET_SCRATCH0(r10)
-	std	r10,\area\()+EX_R13(r13)
-	.if \dar == 1
-	.if \hsrr
+	std	r10,IAREA+EX_R13(r13)
+	.if IDAR == 1
+	.if IHSRR
 	mfspr	r10,SPRN_HDAR
 	.else
 	mfspr	r10,SPRN_DAR
 	.endif
-	std	r10,\area\()+EX_DAR(r13)
+	std	r10,IAREA+EX_DAR(r13)
 	.endif
-	.if \dsisr == 1
-	.if \hsrr
+	.if IDSISR == 1
+	.if IHSRR
 	mfspr	r10,SPRN_HDSISR
 	.else
 	mfspr	r10,SPRN_DSISR
 	.endif
-	stw	r10,\area\()+EX_DSISR(r13)
+	stw	r10,IAREA+EX_DSISR(r13)
 	.endif
 
-	.if \early == 2
+	.if IEARLY == 2
 	/* nothing more */
-	.elseif \early
+	.elseif IEARLY
 	mfctr	r10			/* save ctr, even for !RELOCATABLE */
 	BRANCH_TO_C000(r11, \name\()_common)
 	.elseif !\virt
-	INT_SAVE_SRR_AND_JUMP \name\()_common, \hsrr, \ri
+	INT_SAVE_SRR_AND_JUMP \name\()_common, IHSRR, ISET_RI
 	.else
-	INT_VIRT_SAVE_SRR_AND_JUMP \name\()_common, \hsrr
+	INT_VIRT_SAVE_SRR_AND_JUMP \name\()_common, IHSRR
 	.endif
 	.if \ool
 	.popsection
 	.endif
 .endm
 
-.macro GEN_INT_ENTRY name, virt, ool=0
-	.if ! \virt
-		INT_HANDLER \name, IVEC, \ool, IEARLY, \virt, IHSRR, IAREA, ISET_RI, IDAR, IDSISR, IMASK, IKVM_REAL
-	.else
-		INT_HANDLER \name, IVEC, \ool, IEARLY, \virt, IHSRR, IAREA, ISET_RI, IDAR, IDSISR, IMASK, IKVM_VIRT
-	.endif
-.endm
-
 /*
  * On entry r13 points to the paca, r9-r13 are saved in the paca,
  * r9 contains the saved CR, r11 and r12 contain the saved SRR0 and
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v3 07/32] powerpc/64s/exception: Remove old INT_COMMON macro
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (5 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 06/32] powerpc/64s/exception: Remove old INT_ENTRY macro Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 08/32] powerpc/64s/exception: Remove old INT_KVM_HANDLER Nicholas Piggin
                   ` (26 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 51 +++++++++++++---------------
 1 file changed, 24 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 74bf6e0bf61f..90514766dc7d 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -591,8 +591,8 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
  * If stack=0, then the stack is already set in r1, and r1 is saved in r10.
  * PPR save and CPU accounting is not done for the !stack case (XXX why not?)
  */
-.macro INT_COMMON vec, area, stack, kuap, reconcile, dar, dsisr
-	.if \stack
+.macro GEN_COMMON name
+	.if ISTACK
 	andi.	r10,r12,MSR_PR		/* See if coming from user	*/
 	mr	r10,r1			/* Save r1			*/
 	subi	r1,r1,INT_FRAME_SIZE	/* alloc frame on kernel stack	*/
@@ -609,54 +609,54 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	std	r0,GPR0(r1)		/* save r0 in stackframe	*/
 	std	r10,GPR1(r1)		/* save r1 in stackframe	*/
 
-	.if \stack
-	.if \kuap
+	.if ISTACK
+	.if IKUAP
 	kuap_save_amr_and_lock r9, r10, cr1, cr0
 	.endif
 	beq	101f			/* if from kernel mode		*/
 	ACCOUNT_CPU_USER_ENTRY(r13, r9, r10)
-	SAVE_PPR(\area, r9)
+	SAVE_PPR(IAREA, r9)
 101:
 	.else
-	.if \kuap
+	.if IKUAP
 	kuap_save_amr_and_lock r9, r10, cr1
 	.endif
 	.endif
 
 	/* Save original regs values from save area to stack frame. */
-	ld	r9,\area+EX_R9(r13)	/* move r9, r10 to stackframe	*/
-	ld	r10,\area+EX_R10(r13)
+	ld	r9,IAREA+EX_R9(r13)	/* move r9, r10 to stackframe	*/
+	ld	r10,IAREA+EX_R10(r13)
 	std	r9,GPR9(r1)
 	std	r10,GPR10(r1)
-	ld	r9,\area+EX_R11(r13)	/* move r11 - r13 to stackframe	*/
-	ld	r10,\area+EX_R12(r13)
-	ld	r11,\area+EX_R13(r13)
+	ld	r9,IAREA+EX_R11(r13)	/* move r11 - r13 to stackframe	*/
+	ld	r10,IAREA+EX_R12(r13)
+	ld	r11,IAREA+EX_R13(r13)
 	std	r9,GPR11(r1)
 	std	r10,GPR12(r1)
 	std	r11,GPR13(r1)
-	.if \dar
-	.if \dar == 2
+	.if IDAR
+	.if IDAR == 2
 	ld	r10,_NIP(r1)
 	.else
-	ld	r10,\area+EX_DAR(r13)
+	ld	r10,IAREA+EX_DAR(r13)
 	.endif
 	std	r10,_DAR(r1)
 	.endif
-	.if \dsisr
-	.if \dsisr == 2
+	.if IDSISR
+	.if IDSISR == 2
 	ld	r10,_MSR(r1)
 	lis	r11,DSISR_SRR1_MATCH_64S@h
 	and	r10,r10,r11
 	.else
-	lwz	r10,\area+EX_DSISR(r13)
+	lwz	r10,IAREA+EX_DSISR(r13)
 	.endif
 	std	r10,_DSISR(r1)
 	.endif
 BEGIN_FTR_SECTION_NESTED(66)
-	ld	r10,\area+EX_CFAR(r13)
+	ld	r10,IAREA+EX_CFAR(r13)
 	std	r10,ORIG_GPR3(r1)
 END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66)
-	GET_CTR(r10, \area)
+	GET_CTR(r10, IAREA)
 	std	r10,_CTR(r1)
 	std	r2,GPR2(r1)		/* save r2 in stackframe	*/
 	SAVE_4GPRS(3, r1)		/* save r3 - r6 in stackframe   */
@@ -668,26 +668,22 @@ END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66)
 	mfspr	r11,SPRN_XER		/* save XER in stackframe	*/
 	std	r10,SOFTE(r1)
 	std	r11,_XER(r1)
-	li	r9,(\vec)+1
+	li	r9,(IVEC)+1
 	std	r9,_TRAP(r1)		/* set trap number		*/
 	li	r10,0
 	ld	r11,exception_marker@toc(r2)
 	std	r10,RESULT(r1)		/* clear regs->result		*/
 	std	r11,STACK_FRAME_OVERHEAD-16(r1) /* mark the frame	*/
 
-	.if \stack
+	.if ISTACK
 	ACCOUNT_STOLEN_TIME
 	.endif
 
-	.if \reconcile
+	.if IRECONCILE
 	RECONCILE_IRQ_STATE(r10, r11)
 	.endif
 .endm
 
-.macro GEN_COMMON name
-	INT_COMMON IVEC, IAREA, ISTACK, IKUAP, IRECONCILE, IDAR, IDSISR
-.endm
-
 /*
  * Restore all registers including H/SRR0/1 saved in a stack frame of a
  * standard exception.
@@ -2387,7 +2383,8 @@ EXC_COMMON_BEGIN(soft_nmi_common)
 	mr	r10,r1
 	ld	r1,PACAEMERGSP(r13)
 	subi	r1,r1,INT_FRAME_SIZE
-	INT_COMMON 0x900, PACA_EXGEN, 0, 1, 1, 0, 0
+	__ISTACK(decrementer)=0
+	GEN_COMMON decrementer
 	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	soft_nmi_interrupt
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v3 08/32] powerpc/64s/exception: Remove old INT_KVM_HANDLER
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (6 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 07/32] powerpc/64s/exception: Remove old INT_COMMON macro Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 09/32] powerpc/64s/exception: Add ISIDE option Nicholas Piggin
                   ` (25 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 55 +++++++++++++---------------
 1 file changed, 26 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 90514766dc7d..cba99f9a815b 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -266,15 +266,6 @@ do_define_int n
 	.endif
 .endm
 
-.macro INT_KVM_HANDLER name, vec, hsrr, area, skip
-	TRAMP_KVM_BEGIN(\name\()_kvm)
-	KVM_HANDLER \vec, \hsrr, \area, \skip
-.endm
-
-.macro GEN_KVM name
-	KVM_HANDLER IVEC, IHSRR, IAREA, IKVM_SKIP
-.endm
-
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 /*
@@ -293,35 +284,35 @@ do_define_int n
 	bne	\name\()_kvm
 .endm
 
-.macro KVM_HANDLER vec, hsrr, area, skip
-	.if \skip
+.macro GEN_KVM name
+	.if IKVM_SKIP
 	cmpwi	r10,KVM_GUEST_MODE_SKIP
 	beq	89f
 	.else
 BEGIN_FTR_SECTION_NESTED(947)
-	ld	r10,\area+EX_CFAR(r13)
+	ld	r10,IAREA+EX_CFAR(r13)
 	std	r10,HSTATE_CFAR(r13)
 END_FTR_SECTION_NESTED(CPU_FTR_CFAR,CPU_FTR_CFAR,947)
 	.endif
 
 BEGIN_FTR_SECTION_NESTED(948)
-	ld	r10,\area+EX_PPR(r13)
+	ld	r10,IAREA+EX_PPR(r13)
 	std	r10,HSTATE_PPR(r13)
 END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
-	ld	r10,\area+EX_R10(r13)
+	ld	r10,IAREA+EX_R10(r13)
 	std	r12,HSTATE_SCRATCH0(r13)
 	sldi	r12,r9,32
 	/* HSRR variants have the 0x2 bit added to their trap number */
-	.if \hsrr == EXC_HV_OR_STD
+	.if IHSRR == EXC_HV_OR_STD
 	BEGIN_FTR_SECTION
-	ori	r12,r12,(\vec + 0x2)
+	ori	r12,r12,(IVEC + 0x2)
 	FTR_SECTION_ELSE
-	ori	r12,r12,(\vec)
+	ori	r12,r12,(IVEC)
 	ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-	.elseif \hsrr
-	ori	r12,r12,(\vec + 0x2)
+	.elseif IHSRR
+	ori	r12,r12,(IVEC + 0x2)
 	.else
-	ori	r12,r12,(\vec)
+	ori	r12,r12,(IVEC)
 	.endif
 
 #ifdef CONFIG_RELOCATABLE
@@ -334,25 +325,25 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	std	r9,HSTATE_SCRATCH1(r13)
 	__LOAD_FAR_HANDLER(r9, kvmppc_interrupt)
 	mtctr	r9
-	ld	r9,\area+EX_R9(r13)
+	ld	r9,IAREA+EX_R9(r13)
 	bctr
 #else
-	ld	r9,\area+EX_R9(r13)
+	ld	r9,IAREA+EX_R9(r13)
 	b	kvmppc_interrupt
 #endif
 
 
-	.if \skip
+	.if IKVM_SKIP
 89:	mtocrf	0x80,r9
-	ld	r9,\area+EX_R9(r13)
-	ld	r10,\area+EX_R10(r13)
-	.if \hsrr == EXC_HV_OR_STD
+	ld	r9,IAREA+EX_R9(r13)
+	ld	r10,IAREA+EX_R10(r13)
+	.if IHSRR == EXC_HV_OR_STD
 	BEGIN_FTR_SECTION
 	b	kvmppc_skip_Hinterrupt
 	FTR_SECTION_ELSE
 	b	kvmppc_skip_interrupt
 	ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-	.elseif \hsrr
+	.elseif IHSRR
 	b	kvmppc_skip_Hinterrupt
 	.else
 	b	kvmppc_skip_interrupt
@@ -363,7 +354,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 #else
 .macro KVMTEST name, hsrr, n
 .endm
-.macro KVM_HANDLER name, vec, hsrr, area, skip
+.macro GEN_KVM name
 .endm
 #endif
 
@@ -1627,6 +1618,12 @@ EXC_VIRT_NONE(0x4b00, 0x100)
  * without saving, though xer is not a good idea to use, as hardware may
  * interpret some bits so it may be costly to change them.
  */
+INT_DEFINE_BEGIN(system_call)
+	IVEC=0xc00
+	IKVM_REAL=1
+	IKVM_VIRT=1
+INT_DEFINE_END(system_call)
+
 .macro SYSTEM_CALL virt
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 	/*
@@ -1720,7 +1717,7 @@ TRAMP_KVM_BEGIN(system_call_kvm)
 	SET_SCRATCH0(r10)
 	std	r9,PACA_EXGEN+EX_R9(r13)
 	mfcr	r9
-	KVM_HANDLER 0xc00, EXC_STD, PACA_EXGEN, 0
+	GEN_KVM system_call
 #endif
 
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v3 09/32] powerpc/64s/exception: Add ISIDE option
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (7 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 08/32] powerpc/64s/exception: Remove old INT_KVM_HANDLER Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 10/32] powerpc/64s/exception: move real->virt switch into the common handler Nicholas Piggin
                   ` (24 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

Rather than using DAR=2 to select the i-side registers, add an
explicit option.
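
For illustration, the instruction_access definition below (condensed
from one of the hunks in this patch) now spells out the i-side register
selection explicitly instead of overloading the value 2; the comment on
IISIDE is mine:

INT_DEFINE_BEGIN(instruction_access)
	IVEC=0x400
	IISIDE=1	/* take DAR from NIP and DSISR from SRR1 */
	IDAR=1
	IDSISR=1
	IKVM_REAL=1
INT_DEFINE_END(instruction_access)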

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index cba99f9a815b..4eb099046f9d 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -199,6 +199,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define IVEC		.L_IVEC_\name\()
 #define IHSRR		.L_IHSRR_\name\()
 #define IAREA		.L_IAREA_\name\()
+#define IISIDE		.L_IISIDE_\name\()
 #define IDAR		.L_IDAR_\name\()
 #define IDSISR		.L_IDSISR_\name\()
 #define ISET_RI		.L_ISET_RI_\name\()
@@ -231,6 +232,9 @@ do_define_int n
 	.ifndef IAREA
 		IAREA=PACA_EXGEN
 	.endif
+	.ifndef IISIDE
+		IISIDE=0
+	.endif
 	.ifndef IDAR
 		IDAR=0
 	.endif
@@ -542,7 +546,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	 */
 	GET_SCRATCH0(r10)
 	std	r10,IAREA+EX_R13(r13)
-	.if IDAR == 1
+	.if IDAR && !IISIDE
 	.if IHSRR
 	mfspr	r10,SPRN_HDAR
 	.else
@@ -550,7 +554,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	.endif
 	std	r10,IAREA+EX_DAR(r13)
 	.endif
-	.if IDSISR == 1
+	.if IDSISR && !IISIDE
 	.if IHSRR
 	mfspr	r10,SPRN_HDSISR
 	.else
@@ -625,16 +629,18 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	std	r9,GPR11(r1)
 	std	r10,GPR12(r1)
 	std	r11,GPR13(r1)
+
 	.if IDAR
-	.if IDAR == 2
+	.if IISIDE
 	ld	r10,_NIP(r1)
 	.else
 	ld	r10,IAREA+EX_DAR(r13)
 	.endif
 	std	r10,_DAR(r1)
 	.endif
+
 	.if IDSISR
-	.if IDSISR == 2
+	.if IISIDE
 	ld	r10,_MSR(r1)
 	lis	r11,DSISR_SRR1_MATCH_64S@h
 	and	r10,r10,r11
@@ -643,6 +649,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	.endif
 	std	r10,_DSISR(r1)
 	.endif
+
 BEGIN_FTR_SECTION_NESTED(66)
 	ld	r10,IAREA+EX_CFAR(r13)
 	std	r10,ORIG_GPR3(r1)
@@ -1311,8 +1318,9 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
 INT_DEFINE_BEGIN(instruction_access)
 	IVEC=0x400
-	IDAR=2
-	IDSISR=2
+	IISIDE=1
+	IDAR=1
+	IDSISR=1
 	IKVM_REAL=1
 INT_DEFINE_END(instruction_access)
 
@@ -1341,7 +1349,8 @@ INT_DEFINE_BEGIN(instruction_access_slb)
 	IVEC=0x480
 	IAREA=PACA_EXSLB
 	IRECONCILE=0
-	IDAR=2
+	IISIDE=1
+	IDAR=1
 	IKVM_REAL=1
 INT_DEFINE_END(instruction_access_slb)
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v3 10/32] powerpc/64s/exception: move real->virt switch into the common handler
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (8 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 09/32] powerpc/64s/exception: Add ISIDE option Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 11/32] powerpc/64s/exception: move soft-mask test to common code Nicholas Piggin
                   ` (23 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

The real mode interrupt entry points currently use rfid to branch to
the common handler in virtual mode. This is a significant amount of
code, and forces other code (notably the KVM test) to live in the
real mode handler.

In the interest of minimising the amount of code that runs unrelocated,
move the switch to virt mode into the common code, and do it with
mtmsrd, which avoids clobbering SRRs (although the post-KVMTEST
performance of real-mode interrupt handlers is not a big concern these
days).

This requires CTR to always be saved (real-mode needs to reach 0xc...),
but that's not a huge impact these days. It could be optimized away in
future.
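
As a sketch of the result (condensed from the __GEN_COMMON_ENTRY hunk
below, showing only the SRR case), the common handler now begins:

DEFINE_FIXED_SYMBOL(\name\()_common_real)
\name\()_common_real:
	ld	r10,PACAKMSR(r13)	/* MSR value for kernel (IR/DR on) */
	xori	r10,r10,MSR_RI		/* RI stays clear while SRRs are live */
	mtmsrd	r10			/* go virtual; SRR0/1 are untouched */

	.balign IFETCH_ALIGN_BYTES
DEFINE_FIXED_SYMBOL(\name\()_common_virt)
\name\()_common_virt:

Real-mode entry reaches _common_real via mtctr/bctr, while virt entry
can branch (or fall through) to _common_virt directly.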

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/exception-64s.h |   4 -
 arch/powerpc/kernel/exceptions-64s.S     | 251 ++++++++++-------------
 2 files changed, 109 insertions(+), 146 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 33f4f72eb035..47bd4ea0837d 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -33,11 +33,7 @@
 #include <asm/feature-fixups.h>
 
 /* PACA save area size in u64 units (exgen, exmc, etc) */
-#if defined(CONFIG_RELOCATABLE)
 #define EX_SIZE		10
-#else
-#define EX_SIZE		9
-#endif
 
 /*
  * maximum recursive depth of MCE exceptions
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 4eb099046f9d..112cdb446e03 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -32,16 +32,10 @@
 #define EX_CCR		52
 #define EX_CFAR		56
 #define EX_PPR		64
-#if defined(CONFIG_RELOCATABLE)
 #define EX_CTR		72
 .if EX_SIZE != 10
 	.error "EX_SIZE is wrong"
 .endif
-#else
-.if EX_SIZE != 9
-	.error "EX_SIZE is wrong"
-.endif
-#endif
 
 /*
  * Following are fixed section helper macros.
@@ -124,22 +118,6 @@ name:
 #define EXC_HV		1
 #define EXC_STD		0
 
-#if defined(CONFIG_RELOCATABLE)
-/*
- * If we support interrupts with relocation on AND we're a relocatable kernel,
- * we need to use CTR to get to the 2nd level handler.  So, save/restore it
- * when required.
- */
-#define SAVE_CTR(reg, area)	mfctr	reg ; 	std	reg,area+EX_CTR(r13)
-#define GET_CTR(reg, area) 			ld	reg,area+EX_CTR(r13)
-#define RESTORE_CTR(reg, area)	ld	reg,area+EX_CTR(r13) ; mtctr reg
-#else
-/* ...else CTR is unused and in register. */
-#define SAVE_CTR(reg, area)
-#define GET_CTR(reg, area) 	mfctr	reg
-#define RESTORE_CTR(reg, area)
-#endif
-
 /*
  * PPR save/restore macros used in exceptions-64s.S
  * Used for P7 or later processors
@@ -199,6 +177,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define IVEC		.L_IVEC_\name\()
 #define IHSRR		.L_IHSRR_\name\()
 #define IAREA		.L_IAREA_\name\()
+#define IVIRT		.L_IVIRT_\name\()
 #define IISIDE		.L_IISIDE_\name\()
 #define IDAR		.L_IDAR_\name\()
 #define IDSISR		.L_IDSISR_\name\()
@@ -232,6 +211,9 @@ do_define_int n
 	.ifndef IAREA
 		IAREA=PACA_EXGEN
 	.endif
+	.ifndef IVIRT
+		IVIRT=1
+	.endif
 	.ifndef IISIDE
 		IISIDE=0
 	.endif
@@ -325,7 +307,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	 * outside the head section. CONFIG_RELOCATABLE KVM expects CTR
 	 * to be saved in HSTATE_SCRATCH1.
 	 */
-	mfctr	r9
+	ld	r9,IAREA+EX_CTR(r13)
 	std	r9,HSTATE_SCRATCH1(r13)
 	__LOAD_FAR_HANDLER(r9, kvmppc_interrupt)
 	mtctr	r9
@@ -362,101 +344,6 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 .endm
 #endif
 
-.macro INT_SAVE_SRR_AND_JUMP label, hsrr, set_ri
-	ld	r10,PACAKMSR(r13)	/* get MSR value for kernel */
-	.if ! \set_ri
-	xori	r10,r10,MSR_RI		/* Clear MSR_RI */
-	.endif
-	.if \hsrr == EXC_HV_OR_STD
-	BEGIN_FTR_SECTION
-	mfspr	r11,SPRN_HSRR0		/* save HSRR0 */
-	mfspr	r12,SPRN_HSRR1		/* and HSRR1 */
-	mtspr	SPRN_HSRR1,r10
-	FTR_SECTION_ELSE
-	mfspr	r11,SPRN_SRR0		/* save SRR0 */
-	mfspr	r12,SPRN_SRR1		/* and SRR1 */
-	mtspr	SPRN_SRR1,r10
-	ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-	.elseif \hsrr
-	mfspr	r11,SPRN_HSRR0		/* save HSRR0 */
-	mfspr	r12,SPRN_HSRR1		/* and HSRR1 */
-	mtspr	SPRN_HSRR1,r10
-	.else
-	mfspr	r11,SPRN_SRR0		/* save SRR0 */
-	mfspr	r12,SPRN_SRR1		/* and SRR1 */
-	mtspr	SPRN_SRR1,r10
-	.endif
-	LOAD_HANDLER(r10, \label\())
-	.if \hsrr == EXC_HV_OR_STD
-	BEGIN_FTR_SECTION
-	mtspr	SPRN_HSRR0,r10
-	HRFI_TO_KERNEL
-	FTR_SECTION_ELSE
-	mtspr	SPRN_SRR0,r10
-	RFI_TO_KERNEL
-	ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-	.elseif \hsrr
-	mtspr	SPRN_HSRR0,r10
-	HRFI_TO_KERNEL
-	.else
-	mtspr	SPRN_SRR0,r10
-	RFI_TO_KERNEL
-	.endif
-	b	.	/* prevent speculative execution */
-.endm
-
-/* INT_SAVE_SRR_AND_JUMP works for real or virt, this is faster but virt only */
-.macro INT_VIRT_SAVE_SRR_AND_JUMP label, hsrr
-#ifdef CONFIG_RELOCATABLE
-	.if \hsrr == EXC_HV_OR_STD
-	BEGIN_FTR_SECTION
-	mfspr	r11,SPRN_HSRR0	/* save HSRR0 */
-	FTR_SECTION_ELSE
-	mfspr	r11,SPRN_SRR0	/* save SRR0 */
-	ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-	.elseif \hsrr
-	mfspr	r11,SPRN_HSRR0	/* save HSRR0 */
-	.else
-	mfspr	r11,SPRN_SRR0	/* save SRR0 */
-	.endif
-	LOAD_HANDLER(r12, \label\())
-	mtctr	r12
-	.if \hsrr == EXC_HV_OR_STD
-	BEGIN_FTR_SECTION
-	mfspr	r12,SPRN_HSRR1	/* and HSRR1 */
-	FTR_SECTION_ELSE
-	mfspr	r12,SPRN_SRR1	/* and HSRR1 */
-	ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-	.elseif \hsrr
-	mfspr	r12,SPRN_HSRR1	/* and HSRR1 */
-	.else
-	mfspr	r12,SPRN_SRR1	/* and HSRR1 */
-	.endif
-	li	r10,MSR_RI
-	mtmsrd 	r10,1		/* Set RI (EE=0) */
-	bctr
-#else
-	.if \hsrr == EXC_HV_OR_STD
-	BEGIN_FTR_SECTION
-	mfspr	r11,SPRN_HSRR0		/* save HSRR0 */
-	mfspr	r12,SPRN_HSRR1		/* and HSRR1 */
-	FTR_SECTION_ELSE
-	mfspr	r11,SPRN_SRR0		/* save SRR0 */
-	mfspr	r12,SPRN_SRR1		/* and SRR1 */
-	ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-	.elseif \hsrr
-	mfspr	r11,SPRN_HSRR0		/* save HSRR0 */
-	mfspr	r12,SPRN_HSRR1		/* and HSRR1 */
-	.else
-	mfspr	r11,SPRN_SRR0		/* save SRR0 */
-	mfspr	r12,SPRN_SRR1		/* and SRR1 */
-	.endif
-	li	r10,MSR_RI
-	mtmsrd 	r10,1			/* Set RI (EE=0) */
-	b	\label
-#endif
-.endm
-
 /*
  * This is the BOOK3S interrupt entry code macro.
  *
@@ -477,6 +364,23 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
  * - Fall through and continue executing in real, unrelocated mode.
  *   This is done if early=2.
  */
+
+.macro GEN_BRANCH_TO_COMMON name, virt
+	.if \virt
+#ifndef CONFIG_RELOCATABLE
+	b	\name\()_common_virt
+#else
+	LOAD_HANDLER(r10, \name\()_common_virt)
+	mtctr	r10
+	bctr
+#endif
+	.else
+	LOAD_HANDLER(r10, \name\()_common_real)
+	mtctr	r10
+	bctr
+	.endif
+.endm
+
 .macro GEN_INT_ENTRY name, virt, ool=0
 	SET_SCRATCH0(r13)			/* save r13 */
 	GET_PACA(r13)
@@ -500,8 +404,10 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	OPT_SAVE_REG_TO_PACA(IAREA+EX_PPR, r9, CPU_FTR_HAS_PPR)
 	OPT_SAVE_REG_TO_PACA(IAREA+EX_CFAR, r10, CPU_FTR_CFAR)
 	INTERRUPT_TO_KERNEL
-	SAVE_CTR(r10, IAREA)
+	mfctr	r10
+	std	r10,IAREA+EX_CTR(r13)
 	mfcr	r9
+
 	.if (!\virt && IKVM_REAL) || (\virt && IKVM_VIRT)
 		KVMTEST \name IHSRR IVEC
 	.endif
@@ -566,27 +472,58 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	.if IEARLY == 2
 	/* nothing more */
 	.elseif IEARLY
-	mfctr	r10			/* save ctr, even for !RELOCATABLE */
 	BRANCH_TO_C000(r11, \name\()_common)
-	.elseif !\virt
-	INT_SAVE_SRR_AND_JUMP \name\()_common, IHSRR, ISET_RI
 	.else
-	INT_VIRT_SAVE_SRR_AND_JUMP \name\()_common, IHSRR
+	.if IHSRR == EXC_HV_OR_STD
+	BEGIN_FTR_SECTION
+	mfspr	r11,SPRN_HSRR0		/* save HSRR0 */
+	mfspr	r12,SPRN_HSRR1		/* and HSRR1 */
+	FTR_SECTION_ELSE
+	mfspr	r11,SPRN_SRR0		/* save SRR0 */
+	mfspr	r12,SPRN_SRR1		/* and SRR1 */
+	ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
+	.elseif IHSRR
+	mfspr	r11,SPRN_HSRR0		/* save HSRR0 */
+	mfspr	r12,SPRN_HSRR1		/* and HSRR1 */
+	.else
+	mfspr	r11,SPRN_SRR0		/* save SRR0 */
+	mfspr	r12,SPRN_SRR1		/* and SRR1 */
 	.endif
+	GEN_BRANCH_TO_COMMON \name \virt
+	.endif
+
 	.if \ool
 	.popsection
 	.endif
 .endm
 
 /*
- * On entry r13 points to the paca, r9-r13 are saved in the paca,
- * r9 contains the saved CR, r11 and r12 contain the saved SRR0 and
- * SRR1, and relocation is on.
- *
- * If stack=0, then the stack is already set in r1, and r1 is saved in r10.
- * PPR save and CPU accounting is not done for the !stack case (XXX why not?)
+ * __GEN_COMMON_ENTRY is required to receive the branch from interrupt
+ * entry, except in the case of the IEARLY handlers.
+ * This switches to virtual mode and sets MSR[RI].
  */
-.macro GEN_COMMON name
+.macro __GEN_COMMON_ENTRY name
+DEFINE_FIXED_SYMBOL(\name\()_common_real)
+\name\()_common_real:
+	ld	r10,PACAKMSR(r13)	/* get MSR value for kernel */
+	/* MSR[RI] is clear iff using SRR regs */
+	.if IHSRR == EXC_HV_OR_STD
+	BEGIN_FTR_SECTION
+	xori	r10,r10,MSR_RI
+	END_FTR_SECTION_IFCLR(CPU_FTR_HVMODE)
+	.elseif ! IHSRR
+	xori	r10,r10,MSR_RI
+	.endif
+	mtmsrd	r10
+
+	.if IVIRT
+	.balign IFETCH_ALIGN_BYTES
+DEFINE_FIXED_SYMBOL(\name\()_common_virt)
+\name\()_common_virt:
+	.endif /* IVIRT */
+.endm
+
+.macro __GEN_COMMON_BODY name
 	.if ISTACK
 	andi.	r10,r12,MSR_PR		/* See if coming from user	*/
 	mr	r10,r1			/* Save r1			*/
@@ -604,6 +541,11 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	std	r0,GPR0(r1)		/* save r0 in stackframe	*/
 	std	r10,GPR1(r1)		/* save r1 in stackframe	*/
 
+	.if ISET_RI
+	li	r10,MSR_RI
+	mtmsrd	r10,1			/* Set MSR_RI */
+	.endif
+
 	.if ISTACK
 	.if IKUAP
 	kuap_save_amr_and_lock r9, r10, cr1, cr0
@@ -654,7 +596,7 @@ BEGIN_FTR_SECTION_NESTED(66)
 	ld	r10,IAREA+EX_CFAR(r13)
 	std	r10,ORIG_GPR3(r1)
 END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66)
-	GET_CTR(r10, IAREA)
+	ld	r10,IAREA+EX_CTR(r13)
 	std	r10,_CTR(r1)
 	std	r2,GPR2(r1)		/* save r2 in stackframe	*/
 	SAVE_4GPRS(3, r1)		/* save r3 - r6 in stackframe   */
@@ -682,6 +624,19 @@ END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66)
 	.endif
 .endm
 
+/*
+ * On entry r13 points to the paca, r9-r13 are saved in the paca,
+ * r9 contains the saved CR, r11 and r12 contain the saved SRR0 and
+ * SRR1, and relocation is on.
+ *
+ * If stack=0, then the stack is already set in r1, and r1 is saved in r10.
+ * PPR save and CPU accounting is not done for the !stack case (XXX why not?)
+ */
+.macro GEN_COMMON name
+	__GEN_COMMON_ENTRY \name
+	__GEN_COMMON_BODY \name
+.endm
+
 /*
  * Restore all registers including H/SRR0/1 saved in a stack frame of a
  * standard exception.
@@ -834,6 +789,7 @@ EXC_VIRT_NONE(0x4000, 0x100)
 INT_DEFINE_BEGIN(system_reset)
 	IVEC=0x100
 	IAREA=PACA_EXNMI
+	IVIRT=0 /* no virt entry point */
 	/*
 	 * MSR_RI is not enabled, because PACA_EXNMI and nmi stack is
 	 * being used, so a nested NMI exception would corrupt it.
@@ -913,6 +869,7 @@ TRAMP_REAL_BEGIN(system_reset_fwnmi)
 #endif /* CONFIG_PPC_PSERIES */
 
 EXC_COMMON_BEGIN(system_reset_common)
+	__GEN_COMMON_ENTRY system_reset
 	/*
 	 * Increment paca->in_nmi then enable MSR_RI. SLB or MCE will be able
 	 * to recover, but nested NMI will notice in_nmi and not recover
@@ -928,7 +885,7 @@ EXC_COMMON_BEGIN(system_reset_common)
 	mr	r10,r1
 	ld	r1,PACA_NMI_EMERG_SP(r13)
 	subi	r1,r1,INT_FRAME_SIZE
-	GEN_COMMON system_reset
+	__GEN_COMMON_BODY system_reset
 	bl	save_nvgprs
 	/*
 	 * Set IRQS_ALL_DISABLED unconditionally so arch_irqs_disabled does
@@ -973,6 +930,7 @@ EXC_COMMON_BEGIN(system_reset_common)
 INT_DEFINE_BEGIN(machine_check_early)
 	IVEC=0x200
 	IAREA=PACA_EXMC
+	IVIRT=0 /* no virt entry point */
 	/*
 	 * MSR_RI is not enabled, because PACA_EXMC is being used, so a
 	 * nested machine check corrupts it. machine_check_common enables
@@ -990,6 +948,7 @@ INT_DEFINE_END(machine_check_early)
 INT_DEFINE_BEGIN(machine_check)
 	IVEC=0x200
 	IAREA=PACA_EXMC
+	IVIRT=0 /* no virt entry point */
 	ISET_RI=0
 	IDAR=1
 	IDSISR=1
@@ -1022,7 +981,6 @@ TRAMP_KVM_BEGIN(machine_check_kvm)
 	EXCEPTION_RESTORE_REGS EXC_STD
 
 EXC_COMMON_BEGIN(machine_check_early_common)
-	mtctr	r10			/* Restore ctr */
 	mfspr	r11,SPRN_SRR0
 	mfspr	r12,SPRN_SRR1
 
@@ -1061,7 +1019,7 @@ EXC_COMMON_BEGIN(machine_check_early_common)
 	bgt	cr1,unrecoverable_mce	/* Check if we hit limit of 4 */
 	subi	r1,r1,INT_FRAME_SIZE	/* alloc stack frame */
 
-	GEN_COMMON machine_check_early
+	__GEN_COMMON_BODY machine_check_early
 
 BEGIN_FTR_SECTION
 	bl	enable_machine_check
@@ -1448,6 +1406,8 @@ EXC_VIRT_END(program_check, 0x4700, 0x100)
 TRAMP_KVM_BEGIN(program_check_kvm)
 	GEN_KVM program_check
 EXC_COMMON_BEGIN(program_check_common)
+	__GEN_COMMON_ENTRY program_check
+
 	/*
 	 * It's possible to receive a TM Bad Thing type program check with
 	 * userspace register values (in particular r1), but with SRR1 reporting
@@ -1473,11 +1433,11 @@ EXC_COMMON_BEGIN(program_check_common)
 	ld	r1,PACAEMERGSP(r13)	/* Use emergency stack		*/
 	subi	r1,r1,INT_FRAME_SIZE	/* alloc stack frame		*/
 	__ISTACK(program_check)=0
-	GEN_COMMON program_check
+	__GEN_COMMON_BODY program_check
 	b 3f
 2:
 	__ISTACK(program_check)=1
-	GEN_COMMON program_check
+	__GEN_COMMON_BODY program_check
 3:
 	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
@@ -1861,14 +1821,13 @@ TRAMP_KVM_BEGIN(hmi_exception_kvm)
 	GEN_KVM hmi_exception
 
 EXC_COMMON_BEGIN(hmi_exception_early_common)
-	mtctr	r10			/* Restore ctr */
 	mfspr	r11,SPRN_HSRR0		/* Save HSRR0 */
 	mfspr	r12,SPRN_HSRR1		/* Save HSRR1 */
 	mr	r10,r1			/* Save r1 */
 	ld	r1,PACAEMERGSP(r13)	/* Use emergency stack for realmode */
 	subi	r1,r1,INT_FRAME_SIZE	/* alloc stack frame		*/
 
-	GEN_COMMON hmi_exception_early
+	__GEN_COMMON_BODY hmi_exception_early
 
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	hmi_exception_realmode
@@ -2195,7 +2154,9 @@ EXC_REAL_BEGIN(denorm_exception, 0x1500, 0x100)
 	bne+	denorm_assist
 #endif
 	KVMTEST denorm_exception, EXC_HV, 0x1500
-	INT_SAVE_SRR_AND_JUMP denorm_exception_common, EXC_HV, 1
+	mfspr	r11,SPRN_HSRR0
+	mfspr	r12,SPRN_HSRR1
+	GEN_BRANCH_TO_COMMON denorm_exception, virt=0
 EXC_REAL_END(denorm_exception, 0x1500, 0x100)
 #ifdef CONFIG_PPC_DENORMALISATION
 EXC_VIRT_BEGIN(denorm_exception, 0x5500, 0x100)
@@ -2203,7 +2164,9 @@ EXC_VIRT_BEGIN(denorm_exception, 0x5500, 0x100)
 	mfspr	r10,SPRN_HSRR1
 	andis.	r10,r10,(HSRR1_DENORM)@h /* denorm? */
 	bne+	denorm_assist
-	INT_VIRT_SAVE_SRR_AND_JUMP denorm_exception_common, EXC_HV
+	mfspr	r11,SPRN_HSRR0
+	mfspr	r12,SPRN_HSRR1
+	GEN_BRANCH_TO_COMMON denorm_exception, virt=1
 EXC_VIRT_END(denorm_exception, 0x5500, 0x100)
 #else
 EXC_VIRT_NONE(0x5500, 0x100)
@@ -2374,7 +2337,11 @@ EXC_VIRT_NONE(0x5800, 0x100)
 	std	r12,PACA_EXGEN+EX_R12(r13);		\
 	GET_SCRATCH0(r10);				\
 	std	r10,PACA_EXGEN+EX_R13(r13);		\
-	INT_SAVE_SRR_AND_JUMP soft_nmi_common, _H, 1
+	mfspr	r11,SPRN_SRR0;		/* save SRR0 */	\
+	mfspr	r12,SPRN_SRR1;		/* and SRR1 */	\
+	LOAD_HANDLER(r10, soft_nmi_common);		\
+	mtctr	r10;					\
+	bctr
 
 /*
  * Branch to soft_nmi_interrupt using the emergency stack. The emergency
@@ -2390,7 +2357,7 @@ EXC_COMMON_BEGIN(soft_nmi_common)
 	ld	r1,PACAEMERGSP(r13)
 	subi	r1,r1,INT_FRAME_SIZE
 	__ISTACK(decrementer)=0
-	GEN_COMMON decrementer
+	__GEN_COMMON_BODY decrementer
 	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	soft_nmi_interrupt
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v3 11/32] powerpc/64s/exception: move soft-mask test to common code
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (9 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 10/32] powerpc/64s/exception: move real->virt switch into the common handler Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 12/32] powerpc/64s/exception: move KVM " Nicholas Piggin
                   ` (22 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

As well as moving code out of the unrelocated vectors, this allows the
masked handlers to be moved to common code, and allows the soft_nmi
handler to be generated more like a regular handler.
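
Condensed to the SRR case, the test that moves into __GEN_COMMON_BODY
amounts to the following (the full hunk below also selects the
PACA_IRQ_* bit from IVEC so the masked handler can record it in
paca->irq_happened):

	.if IMASK
	lbz	r10,PACAIRQSOFTMASK(r13)
	andi.	r10,r10,IMASK
	li	r10,PACA_IRQ_EE		/* PACA_IRQ_* chosen by IVEC */
	bne	masked_interrupt
	.endif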

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 106 +++++++++++++--------------
 1 file changed, 49 insertions(+), 57 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 112cdb446e03..a23f2450f9ed 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -411,36 +411,6 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	.if (!\virt && IKVM_REAL) || (\virt && IKVM_VIRT)
 		KVMTEST \name IHSRR IVEC
 	.endif
-	.if IMASK
-		lbz	r10,PACAIRQSOFTMASK(r13)
-		andi.	r10,r10,IMASK
-		/* Associate vector numbers with bits in paca->irq_happened */
-		.if IVEC == 0x500 || IVEC == 0xea0
-		li	r10,PACA_IRQ_EE
-		.elseif IVEC == 0x900
-		li	r10,PACA_IRQ_DEC
-		.elseif IVEC == 0xa00 || IVEC == 0xe80
-		li	r10,PACA_IRQ_DBELL
-		.elseif IVEC == 0xe60
-		li	r10,PACA_IRQ_HMI
-		.elseif IVEC == 0xf00
-		li	r10,PACA_IRQ_PMI
-		.else
-		.abort "Bad maskable vector"
-		.endif
-
-		.if IHSRR == EXC_HV_OR_STD
-		BEGIN_FTR_SECTION
-		bne	masked_Hinterrupt
-		FTR_SECTION_ELSE
-		bne	masked_interrupt
-		ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-		.elseif IHSRR
-		bne	masked_Hinterrupt
-		.else
-		bne	masked_interrupt
-		.endif
-	.endif
 
 	std	r11,IAREA+EX_R11(r13)
 	std	r12,IAREA+EX_R12(r13)
@@ -524,6 +494,37 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt)
 .endm
 
 .macro __GEN_COMMON_BODY name
+	.if IMASK
+		lbz	r10,PACAIRQSOFTMASK(r13)
+		andi.	r10,r10,IMASK
+		/* Associate vector numbers with bits in paca->irq_happened */
+		.if IVEC == 0x500 || IVEC == 0xea0
+		li	r10,PACA_IRQ_EE
+		.elseif IVEC == 0x900
+		li	r10,PACA_IRQ_DEC
+		.elseif IVEC == 0xa00 || IVEC == 0xe80
+		li	r10,PACA_IRQ_DBELL
+		.elseif IVEC == 0xe60
+		li	r10,PACA_IRQ_HMI
+		.elseif IVEC == 0xf00
+		li	r10,PACA_IRQ_PMI
+		.else
+		.abort "Bad maskable vector"
+		.endif
+
+		.if IHSRR == EXC_HV_OR_STD
+		BEGIN_FTR_SECTION
+		bne	masked_Hinterrupt
+		FTR_SECTION_ELSE
+		bne	masked_interrupt
+		ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
+		.elseif IHSRR
+		bne	masked_Hinterrupt
+		.else
+		bne	masked_interrupt
+		.endif
+	.endif
+
 	.if ISTACK
 	andi.	r10,r12,MSR_PR		/* See if coming from user	*/
 	mr	r10,r1			/* Save r1			*/
@@ -2330,18 +2331,10 @@ EXC_VIRT_NONE(0x5800, 0x100)
 
 #ifdef CONFIG_PPC_WATCHDOG
 
-#define MASKED_DEC_HANDLER_LABEL 3f
-
-#define MASKED_DEC_HANDLER(_H)				\
-3: /* soft-nmi */					\
-	std	r12,PACA_EXGEN+EX_R12(r13);		\
-	GET_SCRATCH0(r10);				\
-	std	r10,PACA_EXGEN+EX_R13(r13);		\
-	mfspr	r11,SPRN_SRR0;		/* save SRR0 */	\
-	mfspr	r12,SPRN_SRR1;		/* and SRR1 */	\
-	LOAD_HANDLER(r10, soft_nmi_common);		\
-	mtctr	r10;					\
-	bctr
+INT_DEFINE_BEGIN(soft_nmi)
+	IVEC=0x900
+	ISTACK=0
+INT_DEFINE_END(soft_nmi)
 
 /*
  * Branch to soft_nmi_interrupt using the emergency stack. The emergency
@@ -2353,19 +2346,16 @@ EXC_VIRT_NONE(0x5800, 0x100)
  * and run it entirely with interrupts hard disabled.
  */
 EXC_COMMON_BEGIN(soft_nmi_common)
+	mfspr	r11,SPRN_SRR0
 	mr	r10,r1
 	ld	r1,PACAEMERGSP(r13)
 	subi	r1,r1,INT_FRAME_SIZE
-	__ISTACK(decrementer)=0
-	__GEN_COMMON_BODY decrementer
+	__GEN_COMMON_BODY soft_nmi
 	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	soft_nmi_interrupt
 	b	ret_from_except
 
-#else /* CONFIG_PPC_WATCHDOG */
-#define MASKED_DEC_HANDLER_LABEL 2f /* normal return */
-#define MASKED_DEC_HANDLER(_H)
 #endif /* CONFIG_PPC_WATCHDOG */
 
 /*
@@ -2384,7 +2374,6 @@ masked_Hinterrupt:
 	.else
 masked_interrupt:
 	.endif
-	std	r11,PACA_EXGEN+EX_R11(r13)
 	lbz	r11,PACAIRQHAPPENED(r13)
 	or	r11,r11,r10
 	stb	r11,PACAIRQHAPPENED(r13)
@@ -2393,26 +2382,30 @@ masked_interrupt:
 	lis	r10,0x7fff
 	ori	r10,r10,0xffff
 	mtspr	SPRN_DEC,r10
-	b	MASKED_DEC_HANDLER_LABEL
+#ifdef CONFIG_PPC_WATCHDOG
+	b	soft_nmi_common
+#else
+	b	2f
+#endif
 1:	andi.	r10,r10,PACA_IRQ_MUST_HARD_MASK
 	beq	2f
+	xori	r12,r12,MSR_EE	/* clear MSR_EE */
 	.if \hsrr
-	mfspr	r10,SPRN_HSRR1
-	xori	r10,r10,MSR_EE	/* clear MSR_EE */
-	mtspr	SPRN_HSRR1,r10
+	mtspr	SPRN_HSRR1,r12
 	.else
-	mfspr	r10,SPRN_SRR1
-	xori	r10,r10,MSR_EE	/* clear MSR_EE */
-	mtspr	SPRN_SRR1,r10
+	mtspr	SPRN_SRR1,r12
 	.endif
 	ori	r11,r11,PACA_IRQ_HARD_DIS
 	stb	r11,PACAIRQHAPPENED(r13)
 2:	/* done */
+	ld	r10,PACA_EXGEN+EX_CTR(r13)
+	mtctr	r10
 	mtcrf	0x80,r9
 	std	r1,PACAR1(r13)
 	ld	r9,PACA_EXGEN+EX_R9(r13)
 	ld	r10,PACA_EXGEN+EX_R10(r13)
 	ld	r11,PACA_EXGEN+EX_R11(r13)
+	ld	r12,PACA_EXGEN+EX_R12(r13)
 	/* returns to kernel where r13 must be set up, so don't restore it */
 	.if \hsrr
 	HRFI_TO_KERNEL
@@ -2420,7 +2413,6 @@ masked_interrupt:
 	RFI_TO_KERNEL
 	.endif
 	b	.
-	MASKED_DEC_HANDLER(\hsrr\())
 .endm
 
 TRAMP_REAL_BEGIN(stf_barrier_fallback)
@@ -2527,7 +2519,7 @@ TRAMP_REAL_BEGIN(hrfi_flush_fallback)
  * instruction code patches (which end up in the common .text area)
  * cannot reach these if they are put there.
  */
-USE_FIXED_SECTION(virt_trampolines)
+USE_TEXT_SECTION()
 	MASKED_INTERRUPT EXC_STD
 	MASKED_INTERRUPT EXC_HV
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v3 12/32] powerpc/64s/exception: move KVM test to common code
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (10 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 11/32] powerpc/64s/exception: move soft-mask test to common code Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 13/32] powerpc/64s/exception: remove confusing IEARLY option Nicholas Piggin
                   ` (21 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

This allows more code to be moved out of unrelocated regions. The system
call KVMTEST is changed to be open-coded and remain in the tramp area to
avoid having to move it to entry_64.S. The custom nature of the system
call entry code means the hcall case can be made more streamlined than
the paths taken by regular interrupt handlers.
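
Condensed from the system_call_kvm hunk below, the open-coded hcall
path just rebuilds the register convention kvmppc_interrupt expects and
branches to it directly (__LOAD_FAR_HANDLER is needed in the
CONFIG_RELOCATABLE case):

	mfctr	r10
	SET_SCRATCH0(r10)		/* orig r13 was moved to ctr earlier */
	mfcr	r10
	std	r12,HSTATE_SCRATCH0(r13)
	sldi	r12,r10,32		/* CR in the top half... */
	ori	r12,r12,0xc00		/* ...trap number in the bottom */
	ld	r10,PACA_EXGEN+EX_R10(r13)
	b	kvmppc_interrupt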

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S    | 239 ++++++++++++------------
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  11 --
 arch/powerpc/kvm/book3s_segment.S       |   7 -
 3 files changed, 119 insertions(+), 138 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index a23f2450f9ed..eb2f6ee4d652 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -44,7 +44,6 @@
  * EXC_VIRT_BEGIN/END  - virt (AIL), unrelocated exception vectors
  * TRAMP_REAL_BEGIN    - real, unrelocated helpers (virt may call these)
  * TRAMP_VIRT_BEGIN    - virt, unreloc helpers (in practice, real can use)
- * TRAMP_KVM_BEGIN     - KVM handlers, these are put into real, unrelocated
  * EXC_COMMON          - After switching to virtual, relocated mode.
  */
 
@@ -74,13 +73,6 @@ name:
 #define TRAMP_VIRT_BEGIN(name)					\
 	FIXED_SECTION_ENTRY_BEGIN(virt_trampolines, name)
 
-#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
-#define TRAMP_KVM_BEGIN(name)					\
-	TRAMP_VIRT_BEGIN(name)
-#else
-#define TRAMP_KVM_BEGIN(name)
-#endif
-
 #define EXC_REAL_NONE(start, size)				\
 	FIXED_SECTION_ENTRY_BEGIN_LOCATION(real_vectors, exc_real_##start##_##unused, start, size); \
 	FIXED_SECTION_ENTRY_END_LOCATION(real_vectors, exc_real_##start##_##unused, start, size)
@@ -271,6 +263,9 @@ do_define_int n
 .endm
 
 .macro GEN_KVM name
+	.balign IFETCH_ALIGN_BYTES
+\name\()_kvm:
+
 	.if IKVM_SKIP
 	cmpwi	r10,KVM_GUEST_MODE_SKIP
 	beq	89f
@@ -281,13 +276,18 @@ BEGIN_FTR_SECTION_NESTED(947)
 END_FTR_SECTION_NESTED(CPU_FTR_CFAR,CPU_FTR_CFAR,947)
 	.endif
 
+	ld	r10,PACA_EXGEN+EX_CTR(r13)
+	mtctr	r10
 BEGIN_FTR_SECTION_NESTED(948)
 	ld	r10,IAREA+EX_PPR(r13)
 	std	r10,HSTATE_PPR(r13)
 END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
-	ld	r10,IAREA+EX_R10(r13)
+	ld	r11,IAREA+EX_R11(r13)
+	ld	r12,IAREA+EX_R12(r13)
 	std	r12,HSTATE_SCRATCH0(r13)
 	sldi	r12,r9,32
+	ld	r9,IAREA+EX_R9(r13)
+	ld	r10,IAREA+EX_R10(r13)
 	/* HSRR variants have the 0x2 bit added to their trap number */
 	.if IHSRR == EXC_HV_OR_STD
 	BEGIN_FTR_SECTION
@@ -300,29 +300,16 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	.else
 	ori	r12,r12,(IVEC)
 	.endif
-
-#ifdef CONFIG_RELOCATABLE
-	/*
-	 * KVM requires __LOAD_FAR_HANDLER beause kvmppc_interrupt lives
-	 * outside the head section. CONFIG_RELOCATABLE KVM expects CTR
-	 * to be saved in HSTATE_SCRATCH1.
-	 */
-	ld	r9,IAREA+EX_CTR(r13)
-	std	r9,HSTATE_SCRATCH1(r13)
-	__LOAD_FAR_HANDLER(r9, kvmppc_interrupt)
-	mtctr	r9
-	ld	r9,IAREA+EX_R9(r13)
-	bctr
-#else
-	ld	r9,IAREA+EX_R9(r13)
 	b	kvmppc_interrupt
-#endif
-
 
 	.if IKVM_SKIP
 89:	mtocrf	0x80,r9
+	ld	r10,PACA_EXGEN+EX_CTR(r13)
+	mtctr	r10
 	ld	r9,IAREA+EX_R9(r13)
 	ld	r10,IAREA+EX_R10(r13)
+	ld	r11,IAREA+EX_R11(r13)
+	ld	r12,IAREA+EX_R12(r13)
 	.if IHSRR == EXC_HV_OR_STD
 	BEGIN_FTR_SECTION
 	b	kvmppc_skip_Hinterrupt
@@ -407,11 +394,6 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	mfctr	r10
 	std	r10,IAREA+EX_CTR(r13)
 	mfcr	r9
-
-	.if (!\virt && IKVM_REAL) || (\virt && IKVM_VIRT)
-		KVMTEST \name IHSRR IVEC
-	.endif
-
 	std	r11,IAREA+EX_R11(r13)
 	std	r12,IAREA+EX_R12(r13)
 
@@ -475,6 +457,10 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 .macro __GEN_COMMON_ENTRY name
 DEFINE_FIXED_SYMBOL(\name\()_common_real)
 \name\()_common_real:
+	.if IKVM_REAL
+		KVMTEST \name IHSRR IVEC
+	.endif
+
 	ld	r10,PACAKMSR(r13)	/* get MSR value for kernel */
 	/* MSR[RI] is clear iff using SRR regs */
 	.if IHSRR == EXC_HV_OR_STD
@@ -487,9 +473,17 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real)
 	mtmsrd	r10
 
 	.if IVIRT
+	.if IKVM_VIRT
+	b	1f /* skip the virt test coming from real */
+	.endif
+
 	.balign IFETCH_ALIGN_BYTES
 DEFINE_FIXED_SYMBOL(\name\()_common_virt)
 \name\()_common_virt:
+	.if IKVM_VIRT
+		KVMTEST \name IHSRR IVEC
+1:
+	.endif
 	.endif /* IVIRT */
 .endm
 
@@ -848,8 +842,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
 	 */
 EXC_REAL_END(system_reset, 0x100, 0x100)
 EXC_VIRT_NONE(0x4100, 0x100)
-TRAMP_KVM_BEGIN(system_reset_kvm)
-	GEN_KVM system_reset
 
 #ifdef CONFIG_PPC_P7_NAP
 TRAMP_REAL_BEGIN(system_reset_idle_wake)
@@ -927,6 +919,8 @@ EXC_COMMON_BEGIN(system_reset_common)
 	EXCEPTION_RESTORE_REGS EXC_STD
 	RFI_TO_USER_OR_KERNEL
 
+	GEN_KVM system_reset
+
 
 INT_DEFINE_BEGIN(machine_check_early)
 	IVEC=0x200
@@ -968,9 +962,6 @@ TRAMP_REAL_BEGIN(machine_check_fwnmi)
 	GEN_INT_ENTRY machine_check_early, virt=0
 #endif
 
-TRAMP_KVM_BEGIN(machine_check_kvm)
-	GEN_KVM machine_check
-
 #define MACHINE_CHECK_HANDLER_WINDUP			\
 	/* Clear MSR_RI before setting SRR0 and SRR1. */\
 	li	r9,0;					\
@@ -1126,6 +1117,9 @@ EXC_COMMON_BEGIN(machine_check_common)
 	bl	machine_check_exception
 	b	ret_from_except
 
+	GEN_KVM machine_check
+
+
 #ifdef CONFIG_PPC_P7_NAP
 /*
  * This is an idle wakeup. Low level machine check has already been
@@ -1218,8 +1212,6 @@ EXC_REAL_END(data_access, 0x300, 0x80)
 EXC_VIRT_BEGIN(data_access, 0x4300, 0x80)
 	GEN_INT_ENTRY data_access, virt=1
 EXC_VIRT_END(data_access, 0x4300, 0x80)
-TRAMP_KVM_BEGIN(data_access_kvm)
-	GEN_KVM data_access
 EXC_COMMON_BEGIN(data_access_common)
 	GEN_COMMON data_access
 	ld	r4,_DAR(r1)
@@ -1232,6 +1224,8 @@ MMU_FTR_SECTION_ELSE
 	b	handle_page_fault
 ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
+	GEN_KVM data_access
+
 
 INT_DEFINE_BEGIN(data_access_slb)
 	IVEC=0x380
@@ -1248,8 +1242,6 @@ EXC_REAL_END(data_access_slb, 0x380, 0x80)
 EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80)
 	GEN_INT_ENTRY data_access_slb, virt=1
 EXC_VIRT_END(data_access_slb, 0x4380, 0x80)
-TRAMP_KVM_BEGIN(data_access_slb_kvm)
-	GEN_KVM data_access_slb
 EXC_COMMON_BEGIN(data_access_slb_common)
 	GEN_COMMON data_access_slb
 	ld	r4,_DAR(r1)
@@ -1274,6 +1266,8 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 	bl	do_bad_slb_fault
 	b	ret_from_except
 
+	GEN_KVM data_access_slb
+
 
 INT_DEFINE_BEGIN(instruction_access)
 	IVEC=0x400
@@ -1289,8 +1283,6 @@ EXC_REAL_END(instruction_access, 0x400, 0x80)
 EXC_VIRT_BEGIN(instruction_access, 0x4400, 0x80)
 	GEN_INT_ENTRY instruction_access, virt=1
 EXC_VIRT_END(instruction_access, 0x4400, 0x80)
-TRAMP_KVM_BEGIN(instruction_access_kvm)
-	GEN_KVM instruction_access
 EXC_COMMON_BEGIN(instruction_access_common)
 	GEN_COMMON instruction_access
 	ld	r4,_DAR(r1)
@@ -1303,6 +1295,8 @@ MMU_FTR_SECTION_ELSE
 	b	handle_page_fault
 ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
+	GEN_KVM instruction_access
+
 
 INT_DEFINE_BEGIN(instruction_access_slb)
 	IVEC=0x480
@@ -1319,8 +1313,6 @@ EXC_REAL_END(instruction_access_slb, 0x480, 0x80)
 EXC_VIRT_BEGIN(instruction_access_slb, 0x4480, 0x80)
 	GEN_INT_ENTRY instruction_access_slb, virt=1
 EXC_VIRT_END(instruction_access_slb, 0x4480, 0x80)
-TRAMP_KVM_BEGIN(instruction_access_slb_kvm)
-	GEN_KVM instruction_access_slb
 EXC_COMMON_BEGIN(instruction_access_slb_common)
 	GEN_COMMON instruction_access_slb
 	ld	r4,_DAR(r1)
@@ -1345,6 +1337,9 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 	bl	do_bad_slb_fault
 	b	ret_from_except
 
+	GEN_KVM instruction_access_slb
+
+
 INT_DEFINE_BEGIN(hardware_interrupt)
 	IVEC=0x500
 	IHSRR=EXC_HV_OR_STD
@@ -1359,8 +1354,6 @@ EXC_REAL_END(hardware_interrupt, 0x500, 0x100)
 EXC_VIRT_BEGIN(hardware_interrupt, 0x4500, 0x100)
 	GEN_INT_ENTRY hardware_interrupt, virt=1
 EXC_VIRT_END(hardware_interrupt, 0x4500, 0x100)
-TRAMP_KVM_BEGIN(hardware_interrupt_kvm)
-	GEN_KVM hardware_interrupt
 EXC_COMMON_BEGIN(hardware_interrupt_common)
 	GEN_COMMON hardware_interrupt
 	FINISH_NAP
@@ -1369,6 +1362,8 @@ EXC_COMMON_BEGIN(hardware_interrupt_common)
 	bl	do_IRQ
 	b	ret_from_except_lite
 
+	GEN_KVM hardware_interrupt
+
 
 INT_DEFINE_BEGIN(alignment)
 	IVEC=0x600
@@ -1383,8 +1378,6 @@ EXC_REAL_END(alignment, 0x600, 0x100)
 EXC_VIRT_BEGIN(alignment, 0x4600, 0x100)
 	GEN_INT_ENTRY alignment, virt=1
 EXC_VIRT_END(alignment, 0x4600, 0x100)
-TRAMP_KVM_BEGIN(alignment_kvm)
-	GEN_KVM alignment
 EXC_COMMON_BEGIN(alignment_common)
 	GEN_COMMON alignment
 	bl	save_nvgprs
@@ -1392,6 +1385,8 @@ EXC_COMMON_BEGIN(alignment_common)
 	bl	alignment_exception
 	b	ret_from_except
 
+	GEN_KVM alignment
+
 
 INT_DEFINE_BEGIN(program_check)
 	IVEC=0x700
@@ -1404,8 +1399,6 @@ EXC_REAL_END(program_check, 0x700, 0x100)
 EXC_VIRT_BEGIN(program_check, 0x4700, 0x100)
 	GEN_INT_ENTRY program_check, virt=1
 EXC_VIRT_END(program_check, 0x4700, 0x100)
-TRAMP_KVM_BEGIN(program_check_kvm)
-	GEN_KVM program_check
 EXC_COMMON_BEGIN(program_check_common)
 	__GEN_COMMON_ENTRY program_check
 
@@ -1445,6 +1438,8 @@ EXC_COMMON_BEGIN(program_check_common)
 	bl	program_check_exception
 	b	ret_from_except
 
+	GEN_KVM program_check
+
 
 INT_DEFINE_BEGIN(fp_unavailable)
 	IVEC=0x800
@@ -1458,8 +1453,6 @@ EXC_REAL_END(fp_unavailable, 0x800, 0x100)
 EXC_VIRT_BEGIN(fp_unavailable, 0x4800, 0x100)
 	GEN_INT_ENTRY fp_unavailable, virt=1
 EXC_VIRT_END(fp_unavailable, 0x4800, 0x100)
-TRAMP_KVM_BEGIN(fp_unavailable_kvm)
-	GEN_KVM fp_unavailable
 EXC_COMMON_BEGIN(fp_unavailable_common)
 	GEN_COMMON fp_unavailable
 	bne	1f			/* if from user, just load it up */
@@ -1490,6 +1483,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM)
 	b	ret_from_except
 #endif
 
+	GEN_KVM fp_unavailable
+
 
 INT_DEFINE_BEGIN(decrementer)
 	IVEC=0x900
@@ -1503,8 +1498,6 @@ EXC_REAL_END(decrementer, 0x900, 0x80)
 EXC_VIRT_BEGIN(decrementer, 0x4900, 0x80)
 	GEN_INT_ENTRY decrementer, virt=1
 EXC_VIRT_END(decrementer, 0x4900, 0x80)
-TRAMP_KVM_BEGIN(decrementer_kvm)
-	GEN_KVM decrementer
 EXC_COMMON_BEGIN(decrementer_common)
 	GEN_COMMON decrementer
 	FINISH_NAP
@@ -1513,6 +1506,8 @@ EXC_COMMON_BEGIN(decrementer_common)
 	bl	timer_interrupt
 	b	ret_from_except_lite
 
+	GEN_KVM decrementer
+
 
 INT_DEFINE_BEGIN(hdecrementer)
 	IVEC=0x980
@@ -1527,8 +1522,6 @@ EXC_REAL_END(hdecrementer, 0x980, 0x80)
 EXC_VIRT_BEGIN(hdecrementer, 0x4980, 0x80)
 	GEN_INT_ENTRY hdecrementer, virt=1
 EXC_VIRT_END(hdecrementer, 0x4980, 0x80)
-TRAMP_KVM_BEGIN(hdecrementer_kvm)
-	GEN_KVM hdecrementer
 EXC_COMMON_BEGIN(hdecrementer_common)
 	GEN_COMMON hdecrementer
 	bl	save_nvgprs
@@ -1536,6 +1529,8 @@ EXC_COMMON_BEGIN(hdecrementer_common)
 	bl	hdec_interrupt
 	b	ret_from_except
 
+	GEN_KVM hdecrementer
+
 
 INT_DEFINE_BEGIN(doorbell_super)
 	IVEC=0xa00
@@ -1549,8 +1544,6 @@ EXC_REAL_END(doorbell_super, 0xa00, 0x100)
 EXC_VIRT_BEGIN(doorbell_super, 0x4a00, 0x100)
 	GEN_INT_ENTRY doorbell_super, virt=1
 EXC_VIRT_END(doorbell_super, 0x4a00, 0x100)
-TRAMP_KVM_BEGIN(doorbell_super_kvm)
-	GEN_KVM doorbell_super
 EXC_COMMON_BEGIN(doorbell_super_common)
 	GEN_COMMON doorbell_super
 	FINISH_NAP
@@ -1563,6 +1556,8 @@ EXC_COMMON_BEGIN(doorbell_super_common)
 #endif
 	b	ret_from_except_lite
 
+	GEN_KVM doorbell_super
+
 
 EXC_REAL_NONE(0xb00, 0x100)
 EXC_VIRT_NONE(0x4b00, 0x100)
@@ -1667,6 +1662,7 @@ EXC_VIRT_BEGIN(system_call, 0x4c00, 0x100)
 EXC_VIRT_END(system_call, 0x4c00, 0x100)
 
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+TRAMP_REAL_BEGIN(system_call_kvm)
 	/*
 	 * This is a hcall, so register convention is as above, with these
 	 * differences:
@@ -1674,20 +1670,35 @@ EXC_VIRT_END(system_call, 0x4c00, 0x100)
 	 * ctr = orig r13
 	 * orig r10 saved in PACA
 	 */
-TRAMP_KVM_BEGIN(system_call_kvm)
 	 /*
 	  * Save the PPR (on systems that support it) before changing to
 	  * HMT_MEDIUM. That allows the KVM code to save that value into the
 	  * guest state (it is the guest's PPR value).
 	  */
-	OPT_GET_SPR(r10, SPRN_PPR, CPU_FTR_HAS_PPR)
+BEGIN_FTR_SECTION_NESTED(948)
+	mfspr	r10,SPRN_PPR
+	std	r10,HSTATE_PPR(r13)
+END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	HMT_MEDIUM
-	OPT_SAVE_REG_TO_PACA(PACA_EXGEN+EX_PPR, r10, CPU_FTR_HAS_PPR)
 	mfctr	r10
 	SET_SCRATCH0(r10)
-	std	r9,PACA_EXGEN+EX_R9(r13)
-	mfcr	r9
-	GEN_KVM system_call
+	mfcr	r10
+	std	r12,HSTATE_SCRATCH0(r13)
+	sldi	r12,r10,32
+	ori	r12,r12,0xc00
+#ifdef CONFIG_RELOCATABLE
+	/*
+	 * Requires __LOAD_FAR_HANDLER because kvmppc_interrupt lives
+	 * outside the head section.
+	 */
+	__LOAD_FAR_HANDLER(r10, kvmppc_interrupt)
+	mtctr   r10
+	ld	r10,PACA_EXGEN+EX_R10(r13)
+	bctr
+#else
+	ld	r10,PACA_EXGEN+EX_R10(r13)
+	b       kvmppc_interrupt
+#endif
 #endif
 
 
@@ -1702,8 +1713,6 @@ EXC_REAL_END(single_step, 0xd00, 0x100)
 EXC_VIRT_BEGIN(single_step, 0x4d00, 0x100)
 	GEN_INT_ENTRY single_step, virt=1
 EXC_VIRT_END(single_step, 0x4d00, 0x100)
-TRAMP_KVM_BEGIN(single_step_kvm)
-	GEN_KVM single_step
 EXC_COMMON_BEGIN(single_step_common)
 	GEN_COMMON single_step
 	bl	save_nvgprs
@@ -1711,6 +1720,8 @@ EXC_COMMON_BEGIN(single_step_common)
 	bl	single_step_exception
 	b	ret_from_except
 
+	GEN_KVM single_step
+
 
 INT_DEFINE_BEGIN(h_data_storage)
 	IVEC=0xe00
@@ -1728,8 +1739,6 @@ EXC_REAL_END(h_data_storage, 0xe00, 0x20)
 EXC_VIRT_BEGIN(h_data_storage, 0x4e00, 0x20)
 	GEN_INT_ENTRY h_data_storage, virt=1, ool=1
 EXC_VIRT_END(h_data_storage, 0x4e00, 0x20)
-TRAMP_KVM_BEGIN(h_data_storage_kvm)
-	GEN_KVM h_data_storage
 EXC_COMMON_BEGIN(h_data_storage_common)
 	GEN_COMMON h_data_storage
 	bl      save_nvgprs
@@ -1743,6 +1752,8 @@ MMU_FTR_SECTION_ELSE
 ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_TYPE_RADIX)
 	b       ret_from_except
 
+	GEN_KVM h_data_storage
+
 
 INT_DEFINE_BEGIN(h_instr_storage)
 	IVEC=0xe20
@@ -1757,8 +1768,6 @@ EXC_REAL_END(h_instr_storage, 0xe20, 0x20)
 EXC_VIRT_BEGIN(h_instr_storage, 0x4e20, 0x20)
 	GEN_INT_ENTRY h_instr_storage, virt=1, ool=1
 EXC_VIRT_END(h_instr_storage, 0x4e20, 0x20)
-TRAMP_KVM_BEGIN(h_instr_storage_kvm)
-	GEN_KVM h_instr_storage
 EXC_COMMON_BEGIN(h_instr_storage_common)
 	GEN_COMMON h_instr_storage
 	bl	save_nvgprs
@@ -1766,6 +1775,8 @@ EXC_COMMON_BEGIN(h_instr_storage_common)
 	bl	unknown_exception
 	b	ret_from_except
 
+	GEN_KVM h_instr_storage
+
 
 INT_DEFINE_BEGIN(emulation_assist)
 	IVEC=0xe40
@@ -1780,8 +1791,6 @@ EXC_REAL_END(emulation_assist, 0xe40, 0x20)
 EXC_VIRT_BEGIN(emulation_assist, 0x4e40, 0x20)
 	GEN_INT_ENTRY emulation_assist, virt=1, ool=1
 EXC_VIRT_END(emulation_assist, 0x4e40, 0x20)
-TRAMP_KVM_BEGIN(emulation_assist_kvm)
-	GEN_KVM emulation_assist
 EXC_COMMON_BEGIN(emulation_assist_common)
 	GEN_COMMON emulation_assist
 	bl	save_nvgprs
@@ -1789,6 +1798,8 @@ EXC_COMMON_BEGIN(emulation_assist_common)
 	bl	emulation_assist_interrupt
 	b	ret_from_except
 
+	GEN_KVM emulation_assist
+
 
 /*
  * hmi_exception trampoline is a special case. It jumps to hmi_exception_early
@@ -1816,10 +1827,6 @@ EXC_REAL_BEGIN(hmi_exception, 0xe60, 0x20)
 	GEN_INT_ENTRY hmi_exception_early, virt=0, ool=1
 EXC_REAL_END(hmi_exception, 0xe60, 0x20)
 EXC_VIRT_NONE(0x4e60, 0x20)
-TRAMP_KVM_BEGIN(hmi_exception_early_kvm)
-	GEN_KVM hmi_exception_early
-TRAMP_KVM_BEGIN(hmi_exception_kvm)
-	GEN_KVM hmi_exception
 
 EXC_COMMON_BEGIN(hmi_exception_early_common)
 	mfspr	r11,SPRN_HSRR0		/* Save HSRR0 */
@@ -1846,6 +1853,8 @@ EXC_COMMON_BEGIN(hmi_exception_early_common)
 	EXCEPTION_RESTORE_REGS EXC_HV
 	GEN_INT_ENTRY hmi_exception, virt=0
 
+	GEN_KVM hmi_exception_early
+
 EXC_COMMON_BEGIN(hmi_exception_common)
 	GEN_COMMON hmi_exception
 	FINISH_NAP
@@ -1855,6 +1864,8 @@ EXC_COMMON_BEGIN(hmi_exception_common)
 	bl	handle_hmi_exception
 	b	ret_from_except
 
+	GEN_KVM hmi_exception
+
 
 INT_DEFINE_BEGIN(h_doorbell)
 	IVEC=0xe80
@@ -1870,8 +1881,6 @@ EXC_REAL_END(h_doorbell, 0xe80, 0x20)
 EXC_VIRT_BEGIN(h_doorbell, 0x4e80, 0x20)
 	GEN_INT_ENTRY h_doorbell, virt=1, ool=1
 EXC_VIRT_END(h_doorbell, 0x4e80, 0x20)
-TRAMP_KVM_BEGIN(h_doorbell_kvm)
-	GEN_KVM h_doorbell
 EXC_COMMON_BEGIN(h_doorbell_common)
 	GEN_COMMON h_doorbell
 	FINISH_NAP
@@ -1884,6 +1893,8 @@ EXC_COMMON_BEGIN(h_doorbell_common)
 #endif
 	b	ret_from_except_lite
 
+	GEN_KVM h_doorbell
+
 
 INT_DEFINE_BEGIN(h_virt_irq)
 	IVEC=0xea0
@@ -1899,8 +1910,6 @@ EXC_REAL_END(h_virt_irq, 0xea0, 0x20)
 EXC_VIRT_BEGIN(h_virt_irq, 0x4ea0, 0x20)
 	GEN_INT_ENTRY h_virt_irq, virt=1, ool=1
 EXC_VIRT_END(h_virt_irq, 0x4ea0, 0x20)
-TRAMP_KVM_BEGIN(h_virt_irq_kvm)
-	GEN_KVM h_virt_irq
 EXC_COMMON_BEGIN(h_virt_irq_common)
 	GEN_COMMON h_virt_irq
 	FINISH_NAP
@@ -1909,6 +1918,8 @@ EXC_COMMON_BEGIN(h_virt_irq_common)
 	bl	do_IRQ
 	b	ret_from_except_lite
 
+	GEN_KVM h_virt_irq
+
 
 EXC_REAL_NONE(0xec0, 0x20)
 EXC_VIRT_NONE(0x4ec0, 0x20)
@@ -1928,8 +1939,6 @@ EXC_REAL_END(performance_monitor, 0xf00, 0x20)
 EXC_VIRT_BEGIN(performance_monitor, 0x4f00, 0x20)
 	GEN_INT_ENTRY performance_monitor, virt=1, ool=1
 EXC_VIRT_END(performance_monitor, 0x4f00, 0x20)
-TRAMP_KVM_BEGIN(performance_monitor_kvm)
-	GEN_KVM performance_monitor
 EXC_COMMON_BEGIN(performance_monitor_common)
 	GEN_COMMON performance_monitor
 	FINISH_NAP
@@ -1938,6 +1947,8 @@ EXC_COMMON_BEGIN(performance_monitor_common)
 	bl	performance_monitor_exception
 	b	ret_from_except_lite
 
+	GEN_KVM performance_monitor
+
 
 INT_DEFINE_BEGIN(altivec_unavailable)
 	IVEC=0xf20
@@ -1951,8 +1962,6 @@ EXC_REAL_END(altivec_unavailable, 0xf20, 0x20)
 EXC_VIRT_BEGIN(altivec_unavailable, 0x4f20, 0x20)
 	GEN_INT_ENTRY altivec_unavailable, virt=1, ool=1
 EXC_VIRT_END(altivec_unavailable, 0x4f20, 0x20)
-TRAMP_KVM_BEGIN(altivec_unavailable_kvm)
-	GEN_KVM altivec_unavailable
 EXC_COMMON_BEGIN(altivec_unavailable_common)
 	GEN_COMMON altivec_unavailable
 #ifdef CONFIG_ALTIVEC
@@ -1986,6 +1995,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 	bl	altivec_unavailable_exception
 	b	ret_from_except
 
+	GEN_KVM altivec_unavailable
+
 
 INT_DEFINE_BEGIN(vsx_unavailable)
 	IVEC=0xf40
@@ -1999,8 +2010,6 @@ EXC_REAL_END(vsx_unavailable, 0xf40, 0x20)
 EXC_VIRT_BEGIN(vsx_unavailable, 0x4f40, 0x20)
 	GEN_INT_ENTRY vsx_unavailable, virt=1, ool=1
 EXC_VIRT_END(vsx_unavailable, 0x4f40, 0x20)
-TRAMP_KVM_BEGIN(vsx_unavailable_kvm)
-	GEN_KVM vsx_unavailable
 EXC_COMMON_BEGIN(vsx_unavailable_common)
 	GEN_COMMON vsx_unavailable
 #ifdef CONFIG_VSX
@@ -2033,6 +2042,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
 	bl	vsx_unavailable_exception
 	b	ret_from_except
 
+	GEN_KVM vsx_unavailable
+
 
 INT_DEFINE_BEGIN(facility_unavailable)
 	IVEC=0xf60
@@ -2045,8 +2056,6 @@ EXC_REAL_END(facility_unavailable, 0xf60, 0x20)
 EXC_VIRT_BEGIN(facility_unavailable, 0x4f60, 0x20)
 	GEN_INT_ENTRY facility_unavailable, virt=1, ool=1
 EXC_VIRT_END(facility_unavailable, 0x4f60, 0x20)
-TRAMP_KVM_BEGIN(facility_unavailable_kvm)
-	GEN_KVM facility_unavailable
 EXC_COMMON_BEGIN(facility_unavailable_common)
 	GEN_COMMON facility_unavailable
 	bl	save_nvgprs
@@ -2054,6 +2063,8 @@ EXC_COMMON_BEGIN(facility_unavailable_common)
 	bl	facility_unavailable_exception
 	b	ret_from_except
 
+	GEN_KVM facility_unavailable
+
 
 INT_DEFINE_BEGIN(h_facility_unavailable)
 	IVEC=0xf80
@@ -2068,8 +2079,6 @@ EXC_REAL_END(h_facility_unavailable, 0xf80, 0x20)
 EXC_VIRT_BEGIN(h_facility_unavailable, 0x4f80, 0x20)
 	GEN_INT_ENTRY h_facility_unavailable, virt=1, ool=1
 EXC_VIRT_END(h_facility_unavailable, 0x4f80, 0x20)
-TRAMP_KVM_BEGIN(h_facility_unavailable_kvm)
-	GEN_KVM h_facility_unavailable
 EXC_COMMON_BEGIN(h_facility_unavailable_common)
 	GEN_COMMON h_facility_unavailable
 	bl	save_nvgprs
@@ -2077,6 +2086,8 @@ EXC_COMMON_BEGIN(h_facility_unavailable_common)
 	bl	facility_unavailable_exception
 	b	ret_from_except
 
+	GEN_KVM h_facility_unavailable
+
 
 EXC_REAL_NONE(0xfa0, 0x20)
 EXC_VIRT_NONE(0x4fa0, 0x20)
@@ -2102,14 +2113,15 @@ EXC_REAL_BEGIN(cbe_system_error, 0x1200, 0x100)
 	GEN_INT_ENTRY cbe_system_error, virt=0
 EXC_REAL_END(cbe_system_error, 0x1200, 0x100)
 EXC_VIRT_NONE(0x5200, 0x100)
-TRAMP_KVM_BEGIN(cbe_system_error_kvm)
-	GEN_KVM cbe_system_error
 EXC_COMMON_BEGIN(cbe_system_error_common)
 	GEN_COMMON cbe_system_error
 	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	cbe_system_error_exception
 	b	ret_from_except
+
+	GEN_KVM cbe_system_error
+
 #else /* CONFIG_CBE_RAS */
 EXC_REAL_NONE(0x1200, 0x100)
 EXC_VIRT_NONE(0x5200, 0x100)
@@ -2128,8 +2140,6 @@ EXC_REAL_END(instruction_breakpoint, 0x1300, 0x100)
 EXC_VIRT_BEGIN(instruction_breakpoint, 0x5300, 0x100)
 	GEN_INT_ENTRY instruction_breakpoint, virt=1
 EXC_VIRT_END(instruction_breakpoint, 0x5300, 0x100)
-TRAMP_KVM_BEGIN(instruction_breakpoint_kvm)
-	GEN_KVM instruction_breakpoint
 EXC_COMMON_BEGIN(instruction_breakpoint_common)
 	GEN_COMMON instruction_breakpoint
 	bl	save_nvgprs
@@ -2137,6 +2147,8 @@ EXC_COMMON_BEGIN(instruction_breakpoint_common)
 	bl	instruction_breakpoint_exception
 	b	ret_from_except
 
+	GEN_KVM instruction_breakpoint
+
 
 EXC_REAL_NONE(0x1400, 0x100)
 EXC_VIRT_NONE(0x5400, 0x100)
@@ -2145,6 +2157,7 @@ INT_DEFINE_BEGIN(denorm_exception)
 	IVEC=0x1500
 	IHSRR=EXC_HV
 	IEARLY=2
+	IKVM_REAL=1
 INT_DEFINE_END(denorm_exception)
 
 EXC_REAL_BEGIN(denorm_exception, 0x1500, 0x100)
@@ -2154,7 +2167,6 @@ EXC_REAL_BEGIN(denorm_exception, 0x1500, 0x100)
 	andis.	r10,r10,(HSRR1_DENORM)@h /* denorm? */
 	bne+	denorm_assist
 #endif
-	KVMTEST denorm_exception, EXC_HV, 0x1500
 	mfspr	r11,SPRN_HSRR0
 	mfspr	r12,SPRN_HSRR1
 	GEN_BRANCH_TO_COMMON denorm_exception, virt=0
@@ -2172,8 +2184,6 @@ EXC_VIRT_END(denorm_exception, 0x5500, 0x100)
 #else
 EXC_VIRT_NONE(0x5500, 0x100)
 #endif
-TRAMP_KVM_BEGIN(denorm_exception_kvm)
-	GEN_KVM denorm_exception
 
 #ifdef CONFIG_PPC_DENORMALISATION
 TRAMP_REAL_BEGIN(denorm_assist)
@@ -2251,6 +2261,8 @@ EXC_COMMON_BEGIN(denorm_exception_common)
 	bl	unknown_exception
 	b	ret_from_except
 
+	GEN_KVM denorm_exception
+
 
 #ifdef CONFIG_CBE_RAS
 INT_DEFINE_BEGIN(cbe_maintenance)
@@ -2264,14 +2276,15 @@ EXC_REAL_BEGIN(cbe_maintenance, 0x1600, 0x100)
 	GEN_INT_ENTRY cbe_maintenance, virt=0
 EXC_REAL_END(cbe_maintenance, 0x1600, 0x100)
 EXC_VIRT_NONE(0x5600, 0x100)
-TRAMP_KVM_BEGIN(cbe_maintenance_kvm)
-	GEN_KVM cbe_maintenance
 EXC_COMMON_BEGIN(cbe_maintenance_common)
 	GEN_COMMON cbe_maintenance
 	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	cbe_maintenance_exception
 	b	ret_from_except
+
+	GEN_KVM cbe_maintenance
+
 #else /* CONFIG_CBE_RAS */
 EXC_REAL_NONE(0x1600, 0x100)
 EXC_VIRT_NONE(0x5600, 0x100)
@@ -2289,8 +2302,6 @@ EXC_REAL_END(altivec_assist, 0x1700, 0x100)
 EXC_VIRT_BEGIN(altivec_assist, 0x5700, 0x100)
 	GEN_INT_ENTRY altivec_assist, virt=1
 EXC_VIRT_END(altivec_assist, 0x5700, 0x100)
-TRAMP_KVM_BEGIN(altivec_assist_kvm)
-	GEN_KVM altivec_assist
 EXC_COMMON_BEGIN(altivec_assist_common)
 	GEN_COMMON altivec_assist
 	bl	save_nvgprs
@@ -2302,6 +2313,8 @@ EXC_COMMON_BEGIN(altivec_assist_common)
 #endif
 	b	ret_from_except
 
+	GEN_KVM altivec_assist
+
 
 #ifdef CONFIG_CBE_RAS
 INT_DEFINE_BEGIN(cbe_thermal)
@@ -2315,14 +2328,15 @@ EXC_REAL_BEGIN(cbe_thermal, 0x1800, 0x100)
 	GEN_INT_ENTRY cbe_thermal, virt=0
 EXC_REAL_END(cbe_thermal, 0x1800, 0x100)
 EXC_VIRT_NONE(0x5800, 0x100)
-TRAMP_KVM_BEGIN(cbe_thermal_kvm)
-	GEN_KVM cbe_thermal
 EXC_COMMON_BEGIN(cbe_thermal_common)
 	GEN_COMMON cbe_thermal
 	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	cbe_thermal_exception
 	b	ret_from_except
+
+	GEN_KVM cbe_thermal
+
 #else /* CONFIG_CBE_RAS */
 EXC_REAL_NONE(0x1800, 0x100)
 EXC_VIRT_NONE(0x5800, 0x100)
@@ -2514,17 +2528,12 @@ TRAMP_REAL_BEGIN(hrfi_flush_fallback)
 	GET_SCRATCH0(r13);
 	hrfid
 
-/*
- * Real mode exceptions actually use this too, but alternate
- * instruction code patches (which end up in the common .text area)
- * cannot reach these if they are put there.
- */
 USE_TEXT_SECTION()
 	MASKED_INTERRUPT EXC_STD
 	MASKED_INTERRUPT EXC_HV
 
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
-TRAMP_REAL_BEGIN(kvmppc_skip_interrupt)
+kvmppc_skip_interrupt:
 	/*
 	 * Here all GPRs are unchanged from when the interrupt happened
 	 * except for r13, which is saved in SPRG_SCRATCH0.
@@ -2536,7 +2545,7 @@ TRAMP_REAL_BEGIN(kvmppc_skip_interrupt)
 	RFI_TO_KERNEL
 	b	.
 
-TRAMP_REAL_BEGIN(kvmppc_skip_Hinterrupt)
+kvmppc_skip_Hinterrupt:
 	/*
 	 * Here all GPRs are unchanged from when the interrupt happened
 	 * except for r13, which is saved in SPRG_SCRATCH0.
@@ -2549,16 +2558,6 @@ TRAMP_REAL_BEGIN(kvmppc_skip_Hinterrupt)
 	b	.
 #endif
 
-/*
- * Ensure that any handlers that get invoked from the exception prologs
- * above are below the first 64KB (0x10000) of the kernel image because
- * the prologs assemble the addresses of these handlers using the
- * LOAD_HANDLER macro, which uses an ori instruction.
- */
-
-/*** Common interrupt handlers ***/
-
-
 	/*
 	 * Relocation-on interrupts: A subset of the interrupts can be delivered
 	 * with IR=1/DR=1, if AIL==2 and MSR.HV won't be changed by delivering
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index dbc2fecc37f0..780a499c7114 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1266,7 +1266,6 @@ kvmppc_interrupt_hv:
 	 * R12		= (guest CR << 32) | interrupt vector
 	 * R13		= PACA
 	 * guest R12 saved in shadow VCPU SCRATCH0
-	 * guest CTR saved in shadow VCPU SCRATCH1 if RELOCATABLE
 	 * guest R13 saved in SPRN_SCRATCH0
 	 */
 	std	r9, HSTATE_SCRATCH2(r13)
@@ -1367,12 +1366,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 11:	stw	r3,VCPU_HEIR(r9)
 
 	/* these are volatile across C function calls */
-#ifdef CONFIG_RELOCATABLE
-	ld	r3, HSTATE_SCRATCH1(r13)
-	mtctr	r3
-#else
 	mfctr	r3
-#endif
 	mfxer	r4
 	std	r3, VCPU_CTR(r9)
 	std	r4, VCPU_XER(r9)
@@ -3258,7 +3252,6 @@ END_FTR_SECTION_IFCLR(CPU_FTR_P9_TM_HV_ASSIST)
  * r12 is (CR << 32) | vector
  * r13 points to our PACA
  * r12 is saved in HSTATE_SCRATCH0(r13)
- * ctr is saved in HSTATE_SCRATCH1(r13) if RELOCATABLE
  * r9 is saved in HSTATE_SCRATCH2(r13)
  * r13 is saved in HSPRG1
  * cfar is saved in HSTATE_CFAR(r13)
@@ -3307,11 +3300,7 @@ kvmppc_bad_host_intr:
 	ld	r5, HSTATE_CFAR(r13)
 	std	r5, ORIG_GPR3(r1)
 	mflr	r3
-#ifdef CONFIG_RELOCATABLE
-	ld	r4, HSTATE_SCRATCH1(r13)
-#else
 	mfctr	r4
-#endif
 	mfxer	r5
 	lbz	r6, PACAIRQSOFTMASK(r13)
 	std	r3, _LINK(r1)
diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S
index 0169bab544dd..1f492aa4c8d6 100644
--- a/arch/powerpc/kvm/book3s_segment.S
+++ b/arch/powerpc/kvm/book3s_segment.S
@@ -167,16 +167,9 @@ kvmppc_interrupt_pr:
 	 * R12             = (guest CR << 32) | exit handler id
 	 * R13             = PACA
 	 * HSTATE.SCRATCH0 = guest R12
-	 * HSTATE.SCRATCH1 = guest CTR if RELOCATABLE
 	 */
 #ifdef CONFIG_PPC64
 	/* Match 32-bit entry */
-#ifdef CONFIG_RELOCATABLE
-	std	r9, HSTATE_SCRATCH2(r13)
-	ld	r9, HSTATE_SCRATCH1(r13)
-	mtctr	r9
-	ld	r9, HSTATE_SCRATCH2(r13)
-#endif
 	rotldi	r12, r12, 32		  /* Flip R12 halves for stw */
 	stw	r12, HSTATE_SCRATCH1(r13) /* CR is now in the low half */
 	srdi	r12, r12, 32		  /* shift trap into low half */
-- 
2.23.0



* [PATCH v3 13/32] powerpc/64s/exception: remove confusing IEARLY option
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (11 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 12/32] powerpc/64s/exception: move KVM " Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 14/32] powerpc/64s/exception: remove the SPR saving patch code macros Nicholas Piggin
                   ` (20 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

Replace IEARLY=1 and IEARLY=2 with IBRANCH_TO_COMMON, which controls
whether the entry code branches to a common handler, and
IREALMODE_COMMON, which controls whether the common handler should
remain in real mode.

These special cases no longer avoid loading the SRR registers; there
is no point, as most of them load the registers immediately anyway.
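
For illustration, a definition site then reads roughly like this (a
sketch distilled from the hunks below, with unrelated flags elided):

	INT_DEFINE_BEGIN(machine_check_early)
		IVEC=0x200
		IREALMODE_COMMON=1	/* common handler stays in real mode (was IEARLY=1) */
		...
	INT_DEFINE_END(machine_check_early)

	INT_DEFINE_BEGIN(denorm_exception)
		IVEC=0x1500
		IBRANCH_TO_COMMON=0	/* entry code does its own branch (was IEARLY=2) */
		...
	INT_DEFINE_END(denorm_exception)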

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 48 ++++++++++++++--------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index eb2f6ee4d652..f4f35d01fe00 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -174,7 +174,8 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define IDAR		.L_IDAR_\name\()
 #define IDSISR		.L_IDSISR_\name\()
 #define ISET_RI		.L_ISET_RI_\name\()
-#define IEARLY		.L_IEARLY_\name\()
+#define IBRANCH_TO_COMMON	.L_IBRANCH_TO_COMMON_\name\()
+#define IREALMODE_COMMON	.L_IREALMODE_COMMON_\name\()
 #define IMASK		.L_IMASK_\name\()
 #define IKVM_SKIP	.L_IKVM_SKIP_\name\()
 #define IKVM_REAL	.L_IKVM_REAL_\name\()
@@ -218,8 +219,15 @@ do_define_int n
 	.ifndef ISET_RI
 		ISET_RI=1
 	.endif
-	.ifndef IEARLY
-		IEARLY=0
+	.ifndef IBRANCH_TO_COMMON
+		IBRANCH_TO_COMMON=1
+	.endif
+	.ifndef IREALMODE_COMMON
+		IREALMODE_COMMON=0
+	.else
+		.if ! IBRANCH_TO_COMMON
+			.error "IREALMODE_COMMON=1 but IBRANCH_TO_COMMON=0"
+		.endif
 	.endif
 	.ifndef IMASK
 		IMASK=0
@@ -353,6 +361,11 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
  */
 
 .macro GEN_BRANCH_TO_COMMON name, virt
+	.if IREALMODE_COMMON
+	LOAD_HANDLER(r10, \name\()_common)
+	mtctr	r10
+	bctr
+	.else
 	.if \virt
 #ifndef CONFIG_RELOCATABLE
 	b	\name\()_common_virt
@@ -366,6 +379,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	mtctr	r10
 	bctr
 	.endif
+	.endif
 .endm
 
 .macro GEN_INT_ENTRY name, virt, ool=0
@@ -421,11 +435,6 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	stw	r10,IAREA+EX_DSISR(r13)
 	.endif
 
-	.if IEARLY == 2
-	/* nothing more */
-	.elseif IEARLY
-	BRANCH_TO_C000(r11, \name\()_common)
-	.else
 	.if IHSRR == EXC_HV_OR_STD
 	BEGIN_FTR_SECTION
 	mfspr	r11,SPRN_HSRR0		/* save HSRR0 */
@@ -441,6 +450,8 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	mfspr	r11,SPRN_SRR0		/* save SRR0 */
 	mfspr	r12,SPRN_SRR1		/* and SRR1 */
 	.endif
+
+	.if IBRANCH_TO_COMMON
 	GEN_BRANCH_TO_COMMON \name \virt
 	.endif
 
@@ -926,6 +937,7 @@ INT_DEFINE_BEGIN(machine_check_early)
 	IVEC=0x200
 	IAREA=PACA_EXMC
 	IVIRT=0 /* no virt entry point */
+	IREALMODE_COMMON=1
 	/*
 	 * MSR_RI is not enabled, because PACA_EXMC is being used, so a
 	 * nested machine check corrupts it. machine_check_common enables
@@ -933,7 +945,6 @@ INT_DEFINE_BEGIN(machine_check_early)
 	 */
 	ISET_RI=0
 	ISTACK=0
-	IEARLY=1
 	IDAR=1
 	IDSISR=1
 	IRECONCILE=0
@@ -973,9 +984,6 @@ TRAMP_REAL_BEGIN(machine_check_fwnmi)
 	EXCEPTION_RESTORE_REGS EXC_STD
 
 EXC_COMMON_BEGIN(machine_check_early_common)
-	mfspr	r11,SPRN_SRR0
-	mfspr	r12,SPRN_SRR1
-
 	/*
 	 * Switch to mc_emergency stack and handle re-entrancy (we limit
 	 * the nested MCE upto level 4 to avoid stack overflow).
@@ -1809,7 +1817,7 @@ EXC_COMMON_BEGIN(emulation_assist_common)
 INT_DEFINE_BEGIN(hmi_exception_early)
 	IVEC=0xe60
 	IHSRR=EXC_HV
-	IEARLY=1
+	IREALMODE_COMMON=1
 	ISTACK=0
 	IRECONCILE=0
 	IKUAP=0 /* We don't touch AMR here, we never go to virtual mode */
@@ -1829,8 +1837,6 @@ EXC_REAL_END(hmi_exception, 0xe60, 0x20)
 EXC_VIRT_NONE(0x4e60, 0x20)
 
 EXC_COMMON_BEGIN(hmi_exception_early_common)
-	mfspr	r11,SPRN_HSRR0		/* Save HSRR0 */
-	mfspr	r12,SPRN_HSRR1		/* Save HSRR1 */
 	mr	r10,r1			/* Save r1 */
 	ld	r1,PACAEMERGSP(r13)	/* Use emergency stack for realmode */
 	subi	r1,r1,INT_FRAME_SIZE	/* alloc stack frame		*/
@@ -2156,29 +2162,23 @@ EXC_VIRT_NONE(0x5400, 0x100)
 INT_DEFINE_BEGIN(denorm_exception)
 	IVEC=0x1500
 	IHSRR=EXC_HV
-	IEARLY=2
+	IBRANCH_TO_COMMON=0
 	IKVM_REAL=1
 INT_DEFINE_END(denorm_exception)
 
 EXC_REAL_BEGIN(denorm_exception, 0x1500, 0x100)
 	GEN_INT_ENTRY denorm_exception, virt=0
 #ifdef CONFIG_PPC_DENORMALISATION
-	mfspr	r10,SPRN_HSRR1
-	andis.	r10,r10,(HSRR1_DENORM)@h /* denorm? */
+	andis.	r10,r12,(HSRR1_DENORM)@h /* denorm? */
 	bne+	denorm_assist
 #endif
-	mfspr	r11,SPRN_HSRR0
-	mfspr	r12,SPRN_HSRR1
 	GEN_BRANCH_TO_COMMON denorm_exception, virt=0
 EXC_REAL_END(denorm_exception, 0x1500, 0x100)
 #ifdef CONFIG_PPC_DENORMALISATION
 EXC_VIRT_BEGIN(denorm_exception, 0x5500, 0x100)
 	GEN_INT_ENTRY denorm_exception, virt=1
-	mfspr	r10,SPRN_HSRR1
-	andis.	r10,r10,(HSRR1_DENORM)@h /* denorm? */
+	andis.	r10,r12,(HSRR1_DENORM)@h /* denorm? */
 	bne+	denorm_assist
-	mfspr	r11,SPRN_HSRR0
-	mfspr	r12,SPRN_HSRR1
 	GEN_BRANCH_TO_COMMON denorm_exception, virt=1
 EXC_VIRT_END(denorm_exception, 0x5500, 0x100)
 #else
-- 
2.23.0



* [PATCH v3 14/32] powerpc/64s/exception: remove the SPR saving patch code macros
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (12 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 13/32] powerpc/64s/exception: remove confusing IEARLY option Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 15/32] powerpc/64s/exception: trim unused arguments from KVMTEST macro Nicholas Piggin
                   ` (19 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

These are used infrequently enough that they don't provide much help,
so inline them.
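
As a before/after sketch of the PPR read (condensed from the hunks
below), the patch-code macro:

	OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR)

becomes an open-coded feature section at the use site:

	BEGIN_FTR_SECTION
		mfspr	r9,SPRN_PPR
	END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)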

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 82 ++++++++++------------------
 1 file changed, 28 insertions(+), 54 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index f4f35d01fe00..feb563416abd 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -110,46 +110,6 @@ name:
 #define EXC_HV		1
 #define EXC_STD		0
 
-/*
- * PPR save/restore macros used in exceptions-64s.S
- * Used for P7 or later processors
- */
-#define SAVE_PPR(area, ra)						\
-BEGIN_FTR_SECTION_NESTED(940)						\
-	ld	ra,area+EX_PPR(r13);	/* Read PPR from paca */	\
-	std	ra,_PPR(r1);						\
-END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,940)
-
-#define RESTORE_PPR_PACA(area, ra)					\
-BEGIN_FTR_SECTION_NESTED(941)						\
-	ld	ra,area+EX_PPR(r13);					\
-	mtspr	SPRN_PPR,ra;						\
-END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,941)
-
-/*
- * Get an SPR into a register if the CPU has the given feature
- */
-#define OPT_GET_SPR(ra, spr, ftr)					\
-BEGIN_FTR_SECTION_NESTED(943)						\
-	mfspr	ra,spr;							\
-END_FTR_SECTION_NESTED(ftr,ftr,943)
-
-/*
- * Set an SPR from a register if the CPU has the given feature
- */
-#define OPT_SET_SPR(ra, spr, ftr)					\
-BEGIN_FTR_SECTION_NESTED(943)						\
-	mtspr	spr,ra;							\
-END_FTR_SECTION_NESTED(ftr,ftr,943)
-
-/*
- * Save a register to the PACA if the CPU has the given feature
- */
-#define OPT_SAVE_REG_TO_PACA(offset, ra, ftr)				\
-BEGIN_FTR_SECTION_NESTED(943)						\
-	std	ra,offset(r13);						\
-END_FTR_SECTION_NESTED(ftr,ftr,943)
-
 /*
  * Branch to label using its 0xC000 address. This results in instruction
  * address suitable for MSR[IR]=0 or 1, which allows relocation to be turned
@@ -278,18 +238,18 @@ do_define_int n
 	cmpwi	r10,KVM_GUEST_MODE_SKIP
 	beq	89f
 	.else
-BEGIN_FTR_SECTION_NESTED(947)
+BEGIN_FTR_SECTION
 	ld	r10,IAREA+EX_CFAR(r13)
 	std	r10,HSTATE_CFAR(r13)
-END_FTR_SECTION_NESTED(CPU_FTR_CFAR,CPU_FTR_CFAR,947)
+END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 	.endif
 
 	ld	r10,PACA_EXGEN+EX_CTR(r13)
 	mtctr	r10
-BEGIN_FTR_SECTION_NESTED(948)
+BEGIN_FTR_SECTION
 	ld	r10,IAREA+EX_PPR(r13)
 	std	r10,HSTATE_PPR(r13)
-END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	ld	r11,IAREA+EX_R11(r13)
 	ld	r12,IAREA+EX_R12(r13)
 	std	r12,HSTATE_SCRATCH0(r13)
@@ -386,10 +346,14 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	SET_SCRATCH0(r13)			/* save r13 */
 	GET_PACA(r13)
 	std	r9,IAREA+EX_R9(r13)		/* save r9 */
-	OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR)
+BEGIN_FTR_SECTION
+	mfspr	r9,SPRN_PPR
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	HMT_MEDIUM
 	std	r10,IAREA+EX_R10(r13)		/* save r10 - r12 */
-	OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR)
+BEGIN_FTR_SECTION
+	mfspr	r10,SPRN_CFAR
+END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 	.if \ool
 	.if !\virt
 	b	tramp_real_\name
@@ -402,8 +366,12 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	.endif
 	.endif
 
-	OPT_SAVE_REG_TO_PACA(IAREA+EX_PPR, r9, CPU_FTR_HAS_PPR)
-	OPT_SAVE_REG_TO_PACA(IAREA+EX_CFAR, r10, CPU_FTR_CFAR)
+BEGIN_FTR_SECTION
+	std	r9,IAREA+EX_PPR(r13)
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
+BEGIN_FTR_SECTION
+	std	r10,IAREA+EX_CFAR(r13)
+END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 	INTERRUPT_TO_KERNEL
 	mfctr	r10
 	std	r10,IAREA+EX_CTR(r13)
@@ -558,7 +526,10 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt)
 	.endif
 	beq	101f			/* if from kernel mode		*/
 	ACCOUNT_CPU_USER_ENTRY(r13, r9, r10)
-	SAVE_PPR(IAREA, r9)
+BEGIN_FTR_SECTION
+	ld	r9,IAREA+EX_PPR(r13)	/* Read PPR from paca		*/
+	std	r9,_PPR(r1)
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 101:
 	.else
 	.if IKUAP
@@ -598,10 +569,10 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt)
 	std	r10,_DSISR(r1)
 	.endif
 
-BEGIN_FTR_SECTION_NESTED(66)
+BEGIN_FTR_SECTION
 	ld	r10,IAREA+EX_CFAR(r13)
 	std	r10,ORIG_GPR3(r1)
-END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66)
+END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 	ld	r10,IAREA+EX_CTR(r13)
 	std	r10,_CTR(r1)
 	std	r2,GPR2(r1)		/* save r2 in stackframe	*/
@@ -1683,10 +1654,10 @@ TRAMP_REAL_BEGIN(system_call_kvm)
 	  * HMT_MEDIUM. That allows the KVM code to save that value into the
 	  * guest state (it is the guest's PPR value).
 	  */
-BEGIN_FTR_SECTION_NESTED(948)
+BEGIN_FTR_SECTION
 	mfspr	r10,SPRN_PPR
 	std	r10,HSTATE_PPR(r13)
-END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	HMT_MEDIUM
 	mfctr	r10
 	SET_SCRATCH0(r10)
@@ -2241,7 +2212,10 @@ denorm_done:
 	mtspr	SPRN_HSRR0,r11
 	mtcrf	0x80,r9
 	ld	r9,PACA_EXGEN+EX_R9(r13)
-	RESTORE_PPR_PACA(PACA_EXGEN, r10)
+BEGIN_FTR_SECTION
+	ld	r10,PACA_EXGEN+EX_PPR(r13)
+	mtspr	SPRN_PPR,r10
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 BEGIN_FTR_SECTION
 	ld	r10,PACA_EXGEN+EX_CFAR(r13)
 	mtspr	SPRN_CFAR,r10
-- 
2.23.0



* [PATCH v3 15/32] powerpc/64s/exception: trim unused arguments from KVMTEST macro
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (13 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 14/32] powerpc/64s/exception: remove the SPR saving patch code macros Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 16/32] powerpc/64s/exception: hdecrementer avoid touching the stack Nicholas Piggin
                   ` (18 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index feb563416abd..7e056488d42a 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -224,7 +224,7 @@ do_define_int n
 #define kvmppc_interrupt kvmppc_interrupt_pr
 #endif
 
-.macro KVMTEST name, hsrr, n
+.macro KVMTEST name
 	lbz	r10,HSTATE_IN_GUEST(r13)
 	cmpwi	r10,0
 	bne	\name\()_kvm
@@ -293,7 +293,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 .endm
 
 #else
-.macro KVMTEST name, hsrr, n
+.macro KVMTEST name
 .endm
 .macro GEN_KVM name
 .endm
@@ -437,7 +437,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 DEFINE_FIXED_SYMBOL(\name\()_common_real)
 \name\()_common_real:
 	.if IKVM_REAL
-		KVMTEST \name IHSRR IVEC
+		KVMTEST \name
 	.endif
 
 	ld	r10,PACAKMSR(r13)	/* get MSR value for kernel */
@@ -460,7 +460,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real)
 DEFINE_FIXED_SYMBOL(\name\()_common_virt)
 \name\()_common_virt:
 	.if IKVM_VIRT
-		KVMTEST \name IHSRR IVEC
+		KVMTEST \name
 1:
 	.endif
 	.endif /* IVIRT */
@@ -1582,7 +1582,7 @@ INT_DEFINE_END(system_call)
 	GET_PACA(r13)
 	std	r10,PACA_EXGEN+EX_R10(r13)
 	INTERRUPT_TO_KERNEL
-	KVMTEST system_call EXC_STD 0xc00 /* uses r10, branch to system_call_kvm */
+	KVMTEST system_call /* uses r10, branch to system_call_kvm */
 	mfctr	r9
 #else
 	mr	r9,r13
-- 
2.23.0



* [PATCH v3 16/32] powerpc/64s/exception: hdecrementer avoid touching the stack
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (14 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 15/32] powerpc/64s/exception: trim unused arguments from KVMTEST macro Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 17/32] powerpc/64s/exception: re-inline some handlers Nicholas Piggin
                   ` (17 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

The hdec interrupt handler is reported to sometimes fire in Linux if
KVM leaves it pending after a guest exits. This is harmless, so there
is a no-op handler for it.

The interrupt handler currently uses the regular kernel stack. Change
this to avoid touching the stack entirely.

This should be the last place where the regular Linux stack can be
accessed with asynchronous interrupts (including PMI) soft-masked.
It might be possible to take advantage of this invariant, e.g., to
context switch the kernel stack SLB entry without clearing MSR[EE].
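
Condensed from the hunk below, the stackless exit just restores state
from the PACA save area and returns, never loading or storing via r1:

	ld	r10,PACA_EXGEN+EX_CTR(r13)	/* restore CTR from the PACA */
	mtctr	r10
	mtcrf	0x80,r9				/* restore CR field from r9 */
	ld	r9,PACA_EXGEN+EX_R9(r13)	/* then the GPRs, r13 last */
	ld	r10,PACA_EXGEN+EX_R10(r13)
	ld	r11,PACA_EXGEN+EX_R11(r13)
	ld	r12,PACA_EXGEN+EX_R12(r13)
	ld	r13,PACA_EXGEN+EX_R13(r13)
	HRFI_TO_KERNEL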

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/time.h      |  1 -
 arch/powerpc/kernel/exceptions-64s.S | 25 ++++++++++++++++++++-----
 arch/powerpc/kernel/time.c           |  9 ---------
 3 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 08dbe3e6831c..e0107495c4de 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -24,7 +24,6 @@ extern struct clock_event_device decrementer_clockevent;
 
 
 extern void generic_calibrate_decr(void);
-extern void hdec_interrupt(struct pt_regs *regs);
 
 /* Some sane defaults: 125 MHz timebase, 1GHz processor */
 extern unsigned long ppc_proc_freq;
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 7e056488d42a..f87dc4bf937d 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1491,6 +1491,8 @@ EXC_COMMON_BEGIN(decrementer_common)
 INT_DEFINE_BEGIN(hdecrementer)
 	IVEC=0x980
 	IHSRR=EXC_HV
+	ISTACK=0
+	IRECONCILE=0
 	IKVM_REAL=1
 	IKVM_VIRT=1
 INT_DEFINE_END(hdecrementer)
@@ -1502,11 +1504,24 @@ EXC_VIRT_BEGIN(hdecrementer, 0x4980, 0x80)
 	GEN_INT_ENTRY hdecrementer, virt=1
 EXC_VIRT_END(hdecrementer, 0x4980, 0x80)
 EXC_COMMON_BEGIN(hdecrementer_common)
-	GEN_COMMON hdecrementer
-	bl	save_nvgprs
-	addi	r3,r1,STACK_FRAME_OVERHEAD
-	bl	hdec_interrupt
-	b	ret_from_except
+	__GEN_COMMON_ENTRY hdecrementer
+	/*
+	 * Hypervisor decrementer interrupts not caught by the KVM test
+	 * shouldn't occur but are sometimes left pending on exit from a KVM
+	 * guest.  We don't need to do anything to clear them, as they are
+	 * edge-triggered.
+	 *
+	 * Be careful to avoid touching the kernel stack.
+	 */
+	ld	r10,PACA_EXGEN+EX_CTR(r13)
+	mtctr	r10
+	mtcrf	0x80,r9
+	ld	r9,PACA_EXGEN+EX_R9(r13)
+	ld	r10,PACA_EXGEN+EX_R10(r13)
+	ld	r11,PACA_EXGEN+EX_R11(r13)
+	ld	r12,PACA_EXGEN+EX_R12(r13)
+	ld	r13,PACA_EXGEN+EX_R13(r13)
+	HRFI_TO_KERNEL
 
 	GEN_KVM hdecrementer
 
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 1168e8b37e30..bda9cb4a0a5f 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -663,15 +663,6 @@ void timer_broadcast_interrupt(void)
 }
 #endif
 
-/*
- * Hypervisor decrementer interrupts shouldn't occur but are sometimes
- * left pending on exit from a KVM guest.  We don't need to do anything
- * to clear them, as they are edge-triggered.
- */
-void hdec_interrupt(struct pt_regs *regs)
-{
-}
-
 #ifdef CONFIG_SUSPEND
 static void generic_suspend_disable_irqs(void)
 {
-- 
2.23.0



* [PATCH v3 17/32] powerpc/64s/exception: re-inline some handlers
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (15 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 16/32] powerpc/64s/exception: hdecrementer avoid touching the stack Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 18/32] powerpc/64s/exception: Clean up SRR specifiers Nicholas Piggin
                   ` (16 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

The reduction in interrupt entry size allows some handlers to be
re-inlined.
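
Concretely, the ool (out-of-line) placement can be dropped for these
entries, e.g. for data_access (a sketch of the hunk below):

	/* before: entry code too large, branched out to a trampoline */
	GEN_INT_ENTRY data_access, virt=0, ool=1

	/* after: small enough to fit in the 0x80-byte vector again */
	GEN_INT_ENTRY data_access, virt=0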

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index f87dc4bf937d..ae0e68899f0e 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1186,7 +1186,7 @@ INT_DEFINE_BEGIN(data_access)
 INT_DEFINE_END(data_access)
 
 EXC_REAL_BEGIN(data_access, 0x300, 0x80)
-	GEN_INT_ENTRY data_access, virt=0, ool=1
+	GEN_INT_ENTRY data_access, virt=0
 EXC_REAL_END(data_access, 0x300, 0x80)
 EXC_VIRT_BEGIN(data_access, 0x4300, 0x80)
 	GEN_INT_ENTRY data_access, virt=1
@@ -1216,7 +1216,7 @@ INT_DEFINE_BEGIN(data_access_slb)
 INT_DEFINE_END(data_access_slb)
 
 EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80)
-	GEN_INT_ENTRY data_access_slb, virt=0, ool=1
+	GEN_INT_ENTRY data_access_slb, virt=0
 EXC_REAL_END(data_access_slb, 0x380, 0x80)
 EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80)
 	GEN_INT_ENTRY data_access_slb, virt=1
@@ -1472,7 +1472,7 @@ INT_DEFINE_BEGIN(decrementer)
 INT_DEFINE_END(decrementer)
 
 EXC_REAL_BEGIN(decrementer, 0x900, 0x80)
-	GEN_INT_ENTRY decrementer, virt=0, ool=1
+	GEN_INT_ENTRY decrementer, virt=0
 EXC_REAL_END(decrementer, 0x900, 0x80)
 EXC_VIRT_BEGIN(decrementer, 0x4900, 0x80)
 	GEN_INT_ENTRY decrementer, virt=1
-- 
2.23.0



* [PATCH v3 18/32] powerpc/64s/exception: Clean up SRR specifiers
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (16 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 17/32] powerpc/64s/exception: re-inline some handlers Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 19/32] powerpc/64s/exception: add more comments for interrupt handlers Nicholas Piggin
                   ` (15 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

Remove more magic numbers and replace them with nicely named bools.
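
Call sites then pass a named bool with a default instead, e.g. (a
sketch based on the hunks below):

	/* before: magic EXC_* specifiers */
	EXCEPTION_RESTORE_REGS EXC_HV
	MASKED_INTERRUPT EXC_STD

	/* after: hsrr is a plain bool defaulting to 0 (SRR) */
	EXCEPTION_RESTORE_REGS hsrr=1
	MASKED_INTERRUPT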

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 68 +++++++++++++---------------
 1 file changed, 32 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index ae0e68899f0e..b01ff51892dc 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -105,11 +105,6 @@ name:
 	ori	reg,reg,(ABS_ADDR(label))@l;				\
 	addis	reg,reg,(ABS_ADDR(label))@h
 
-/* Exception register prefixes */
-#define EXC_HV_OR_STD	2 /* depends on HVMODE */
-#define EXC_HV		1
-#define EXC_STD		0
-
 /*
  * Branch to label using its 0xC000 address. This results in instruction
  * address suitable for MSR[IR]=0 or 1, which allows relocation to be turned
@@ -128,6 +123,7 @@ name:
  */
 #define IVEC		.L_IVEC_\name\()
 #define IHSRR		.L_IHSRR_\name\()
+#define IHSRR_IF_HVMODE	.L_IHSRR_IF_HVMODE_\name\()
 #define IAREA		.L_IAREA_\name\()
 #define IVIRT		.L_IVIRT_\name\()
 #define IISIDE		.L_IISIDE_\name\()
@@ -159,7 +155,10 @@ do_define_int n
 		.error "IVEC not defined"
 	.endif
 	.ifndef IHSRR
-		IHSRR=EXC_STD
+		IHSRR=0
+	.endif
+	.ifndef IHSRR_IF_HVMODE
+		IHSRR_IF_HVMODE=0
 	.endif
 	.ifndef IAREA
 		IAREA=PACA_EXGEN
@@ -257,7 +256,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	ld	r9,IAREA+EX_R9(r13)
 	ld	r10,IAREA+EX_R10(r13)
 	/* HSRR variants have the 0x2 bit added to their trap number */
-	.if IHSRR == EXC_HV_OR_STD
+	.if IHSRR_IF_HVMODE
 	BEGIN_FTR_SECTION
 	ori	r12,r12,(IVEC + 0x2)
 	FTR_SECTION_ELSE
@@ -278,7 +277,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	ld	r10,IAREA+EX_R10(r13)
 	ld	r11,IAREA+EX_R11(r13)
 	ld	r12,IAREA+EX_R12(r13)
-	.if IHSRR == EXC_HV_OR_STD
+	.if IHSRR_IF_HVMODE
 	BEGIN_FTR_SECTION
 	b	kvmppc_skip_Hinterrupt
 	FTR_SECTION_ELSE
@@ -403,7 +402,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 	stw	r10,IAREA+EX_DSISR(r13)
 	.endif
 
-	.if IHSRR == EXC_HV_OR_STD
+	.if IHSRR_IF_HVMODE
 	BEGIN_FTR_SECTION
 	mfspr	r11,SPRN_HSRR0		/* save HSRR0 */
 	mfspr	r12,SPRN_HSRR1		/* and HSRR1 */
@@ -485,7 +484,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt)
 		.abort "Bad maskable vector"
 		.endif
 
-		.if IHSRR == EXC_HV_OR_STD
+		.if IHSRR_IF_HVMODE
 		BEGIN_FTR_SECTION
 		bne	masked_Hinterrupt
 		FTR_SECTION_ELSE
@@ -618,12 +617,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
  * Restore all registers including H/SRR0/1 saved in a stack frame of a
  * standard exception.
  */
-.macro EXCEPTION_RESTORE_REGS hsrr
+.macro EXCEPTION_RESTORE_REGS hsrr=0
 	/* Move original SRR0 and SRR1 into the respective regs */
 	ld	r9,_MSR(r1)
-	.if \hsrr == EXC_HV_OR_STD
-	.error "EXC_HV_OR_STD Not implemented for EXCEPTION_RESTORE_REGS"
-	.endif
 	.if \hsrr
 	mtspr	SPRN_HSRR1,r9
 	.else
@@ -898,7 +894,7 @@ EXC_COMMON_BEGIN(system_reset_common)
 	ld	r10,SOFTE(r1)
 	stb	r10,PACAIRQSOFTMASK(r13)
 
-	EXCEPTION_RESTORE_REGS EXC_STD
+	EXCEPTION_RESTORE_REGS
 	RFI_TO_USER_OR_KERNEL
 
 	GEN_KVM system_reset
@@ -952,7 +948,7 @@ TRAMP_REAL_BEGIN(machine_check_fwnmi)
 	lhz	r12,PACA_IN_MCE(r13);			\
 	subi	r12,r12,1;				\
 	sth	r12,PACA_IN_MCE(r13);			\
-	EXCEPTION_RESTORE_REGS EXC_STD
+	EXCEPTION_RESTORE_REGS
 
 EXC_COMMON_BEGIN(machine_check_early_common)
 	/*
@@ -1321,7 +1317,7 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
 INT_DEFINE_BEGIN(hardware_interrupt)
 	IVEC=0x500
-	IHSRR=EXC_HV_OR_STD
+	IHSRR_IF_HVMODE=1
 	IMASK=IRQS_DISABLED
 	IKVM_REAL=1
 	IKVM_VIRT=1
@@ -1490,7 +1486,7 @@ EXC_COMMON_BEGIN(decrementer_common)
 
 INT_DEFINE_BEGIN(hdecrementer)
 	IVEC=0x980
-	IHSRR=EXC_HV
+	IHSRR=1
 	ISTACK=0
 	IRECONCILE=0
 	IKVM_REAL=1
@@ -1719,7 +1715,7 @@ EXC_COMMON_BEGIN(single_step_common)
 
 INT_DEFINE_BEGIN(h_data_storage)
 	IVEC=0xe00
-	IHSRR=EXC_HV
+	IHSRR=1
 	IDAR=1
 	IDSISR=1
 	IKVM_SKIP=1
@@ -1751,7 +1747,7 @@ ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_TYPE_RADIX)
 
 INT_DEFINE_BEGIN(h_instr_storage)
 	IVEC=0xe20
-	IHSRR=EXC_HV
+	IHSRR=1
 	IKVM_REAL=1
 	IKVM_VIRT=1
 INT_DEFINE_END(h_instr_storage)
@@ -1774,7 +1770,7 @@ EXC_COMMON_BEGIN(h_instr_storage_common)
 
 INT_DEFINE_BEGIN(emulation_assist)
 	IVEC=0xe40
-	IHSRR=EXC_HV
+	IHSRR=1
 	IKVM_REAL=1
 	IKVM_VIRT=1
 INT_DEFINE_END(emulation_assist)
@@ -1802,7 +1798,7 @@ EXC_COMMON_BEGIN(emulation_assist_common)
  */
 INT_DEFINE_BEGIN(hmi_exception_early)
 	IVEC=0xe60
-	IHSRR=EXC_HV
+	IHSRR=1
 	IREALMODE_COMMON=1
 	ISTACK=0
 	IRECONCILE=0
@@ -1812,7 +1808,7 @@ INT_DEFINE_END(hmi_exception_early)
 
 INT_DEFINE_BEGIN(hmi_exception)
 	IVEC=0xe60
-	IHSRR=EXC_HV
+	IHSRR=1
 	IMASK=IRQS_DISABLED
 	IKVM_REAL=1
 INT_DEFINE_END(hmi_exception)
@@ -1834,7 +1830,7 @@ EXC_COMMON_BEGIN(hmi_exception_early_common)
 	cmpdi	cr0,r3,0
 	bne	1f
 
-	EXCEPTION_RESTORE_REGS EXC_HV
+	EXCEPTION_RESTORE_REGS hsrr=1
 	HRFI_TO_USER_OR_KERNEL
 
 1:
@@ -1842,7 +1838,7 @@ EXC_COMMON_BEGIN(hmi_exception_early_common)
 	 * Go to virtual mode and pull the HMI event information from
 	 * firmware.
 	 */
-	EXCEPTION_RESTORE_REGS EXC_HV
+	EXCEPTION_RESTORE_REGS hsrr=1
 	GEN_INT_ENTRY hmi_exception, virt=0
 
 	GEN_KVM hmi_exception_early
@@ -1861,7 +1857,7 @@ EXC_COMMON_BEGIN(hmi_exception_common)
 
 INT_DEFINE_BEGIN(h_doorbell)
 	IVEC=0xe80
-	IHSRR=EXC_HV
+	IHSRR=1
 	IMASK=IRQS_DISABLED
 	IKVM_REAL=1
 	IKVM_VIRT=1
@@ -1890,7 +1886,7 @@ EXC_COMMON_BEGIN(h_doorbell_common)
 
 INT_DEFINE_BEGIN(h_virt_irq)
 	IVEC=0xea0
-	IHSRR=EXC_HV
+	IHSRR=1
 	IMASK=IRQS_DISABLED
 	IKVM_REAL=1
 	IKVM_VIRT=1
@@ -2060,7 +2056,7 @@ EXC_COMMON_BEGIN(facility_unavailable_common)
 
 INT_DEFINE_BEGIN(h_facility_unavailable)
 	IVEC=0xf80
-	IHSRR=EXC_HV
+	IHSRR=1
 	IKVM_REAL=1
 	IKVM_VIRT=1
 INT_DEFINE_END(h_facility_unavailable)
@@ -2096,7 +2092,7 @@ EXC_VIRT_NONE(0x5100, 0x100)
 #ifdef CONFIG_CBE_RAS
 INT_DEFINE_BEGIN(cbe_system_error)
 	IVEC=0x1200
-	IHSRR=EXC_HV
+	IHSRR=1
 	IKVM_SKIP=1
 	IKVM_REAL=1
 INT_DEFINE_END(cbe_system_error)
@@ -2147,8 +2143,8 @@ EXC_VIRT_NONE(0x5400, 0x100)
 
 INT_DEFINE_BEGIN(denorm_exception)
 	IVEC=0x1500
-	IHSRR=EXC_HV
+	IHSRR=1
 	IBRANCH_TO_COMMON=0
 	IKVM_REAL=1
 INT_DEFINE_END(denorm_exception)
 
@@ -2256,7 +2252,7 @@ EXC_COMMON_BEGIN(denorm_exception_common)
 #ifdef CONFIG_CBE_RAS
 INT_DEFINE_BEGIN(cbe_maintenance)
 	IVEC=0x1600
-	IHSRR=EXC_HV
+	IHSRR=1
 	IKVM_SKIP=1
 	IKVM_REAL=1
 INT_DEFINE_END(cbe_maintenance)
@@ -2308,7 +2304,7 @@ EXC_COMMON_BEGIN(altivec_assist_common)
 #ifdef CONFIG_CBE_RAS
 INT_DEFINE_BEGIN(cbe_thermal)
 	IVEC=0x1800
-	IHSRR=EXC_HV
+	IHSRR=1
 	IKVM_SKIP=1
 	IKVM_REAL=1
 INT_DEFINE_END(cbe_thermal)
@@ -2371,7 +2367,7 @@ EXC_COMMON_BEGIN(soft_nmi_common)
  * - Else it is one of PACA_IRQ_MUST_HARD_MASK, so hard disable and return.
  * This is called with r10 containing the value to OR to the paca field.
  */
-.macro MASKED_INTERRUPT hsrr
+.macro MASKED_INTERRUPT hsrr=0
 	.if \hsrr
 masked_Hinterrupt:
 	.else
@@ -2518,8 +2514,8 @@ TRAMP_REAL_BEGIN(hrfi_flush_fallback)
 	hrfid
 
 USE_TEXT_SECTION()
-	MASKED_INTERRUPT EXC_STD
-	MASKED_INTERRUPT EXC_HV
+	MASKED_INTERRUPT
+	MASKED_INTERRUPT hsrr=1
 
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 kvmppc_skip_interrupt:
-- 
2.23.0



* [PATCH v3 19/32] powerpc/64s/exception: add more comments for interrupt handlers
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (17 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 18/32] powerpc/64s/exception: Clean up SRR specifiers Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 20/32] powerpc/64s/exception: only test KVM in SRR interrupts when PR KVM is supported Nicholas Piggin
                   ` (14 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

A few of the non-standard handlers are left uncommented, and some of
the descriptions could use more detail.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 391 ++++++++++++++++++++++++---
 1 file changed, 353 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index b01ff51892dc..e976cbf4f4aa 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -121,26 +121,26 @@ name:
 /*
  * Interrupt code generation macros
  */
-#define IVEC		.L_IVEC_\name\()
-#define IHSRR		.L_IHSRR_\name\()
-#define IHSRR_IF_HVMODE	.L_IHSRR_IF_HVMODE_\name\()
-#define IAREA		.L_IAREA_\name\()
-#define IVIRT		.L_IVIRT_\name\()
-#define IISIDE		.L_IISIDE_\name\()
-#define IDAR		.L_IDAR_\name\()
-#define IDSISR		.L_IDSISR_\name\()
-#define ISET_RI		.L_ISET_RI_\name\()
-#define IBRANCH_TO_COMMON	.L_IBRANCH_TO_COMMON_\name\()
-#define IREALMODE_COMMON	.L_IREALMODE_COMMON_\name\()
-#define IMASK		.L_IMASK_\name\()
-#define IKVM_SKIP	.L_IKVM_SKIP_\name\()
-#define IKVM_REAL	.L_IKVM_REAL_\name\()
+#define IVEC		.L_IVEC_\name\()	/* Interrupt vector address */
+#define IHSRR		.L_IHSRR_\name\()	/* Sets SRR or HSRR registers */
+#define IHSRR_IF_HVMODE	.L_IHSRR_IF_HVMODE_\name\() /* HSRR if HV else SRR */
+#define IAREA		.L_IAREA_\name\()	/* PACA save area */
+#define IVIRT		.L_IVIRT_\name\()	/* Has virt mode entry point */
+#define IISIDE		.L_IISIDE_\name\()	/* Uses SRR0/1 not DAR/DSISR */
+#define IDAR		.L_IDAR_\name\()	/* Uses DAR (or SRR0) */
+#define IDSISR		.L_IDSISR_\name\()	/* Uses DSISR (or SRR1) */
+#define ISET_RI		.L_ISET_RI_\name\()	/* Run common code w/ MSR[RI]=1 */
+#define IBRANCH_TO_COMMON	.L_IBRANCH_TO_COMMON_\name\() /* ENTRY branch to common */
+#define IREALMODE_COMMON	.L_IREALMODE_COMMON_\name\() /* Common runs in realmode */
+#define IMASK		.L_IMASK_\name\()	/* IRQ soft-mask bit */
+#define IKVM_SKIP	.L_IKVM_SKIP_\name\()	/* Generate KVM skip handler */
+#define IKVM_REAL	.L_IKVM_REAL_\name\()	/* Real entry tests KVM */
 #define __IKVM_REAL(name)	.L_IKVM_REAL_ ## name
-#define IKVM_VIRT	.L_IKVM_VIRT_\name\()
-#define ISTACK		.L_ISTACK_\name\()
+#define IKVM_VIRT	.L_IKVM_VIRT_\name\()	/* Virt entry tests KVM */
+#define ISTACK		.L_ISTACK_\name\()	/* Set regular kernel stack */
 #define __ISTACK(name)	.L_ISTACK_ ## name
-#define IRECONCILE	.L_IRECONCILE_\name\()
-#define IKUAP		.L_IKUAP_\name\()
+#define IRECONCILE	.L_IRECONCILE_\name\()	/* Do RECONCILE_IRQ_STATE */
+#define IKUAP		.L_IKUAP_\name\()	/* Do KUAP lock */
 
 #define INT_DEFINE_BEGIN(n)						\
 .macro int_define_ ## n name
@@ -759,6 +759,39 @@ __start_interrupts:
 EXC_VIRT_NONE(0x4000, 0x100)
 
 
+/**
+ * Interrupt 0x100 - System Reset Interrupt (SRESET aka NMI).
+ * This is a non-maskable, asynchronous interrupt always taken in real-mode.
+ * It is caused by:
+ * - Wake from power-saving state, on powernv.
+ * - An NMI from another CPU, triggered by firmware or hypercall.
+ * - As a crash/debug signal injected from the BMC, firmware or hypervisor.
+ *
+ * Handling:
+ * Power-save wakeup is the only performance critical path, so this is
+ * determined as quickly as possible first. In this case volatile registers
+ * can be discarded and SPRs like CFAR don't need to be read.
+ *
+ * If not a powersave wakeup, then it's run as a regular interrupt, however
+ * it uses its own stack and PACA save area to preserve the regular kernel
+ * environment for debugging.
+ *
+ * This interrupt is not maskable, so triggering it when MSR[RI] is clear,
+ * or SCRATCH0 is in use, etc. may cause a crash. It's also not entirely
+ * correct to switch to virtual mode to run the regular interrupt handler
+ * because it might be interrupted when the MMU is in a bad state (e.g., SLB
+ * is clear).
+ *
+ * FWNMI:
+ * PAPR specifies a "fwnmi" facility which sends the sreset to a different
+ * entry point with a different register set up. Some hypervisors will
+ * send the sreset to 0x100 in the guest if it is not fwnmi capable.
+ *
+ * KVM:
+ * Unlike most SRR interrupts, this may be taken by the host while executing
+ * in a guest, so a KVM test is required. KVM will pull the CPU out of guest
+ * mode and then raise the sreset.
+ */
 INT_DEFINE_BEGIN(system_reset)
 	IVEC=0x100
 	IAREA=PACA_EXNMI
@@ -834,6 +867,7 @@ TRAMP_REAL_BEGIN(system_reset_idle_wake)
  * Vectors for the FWNMI option.  Share common code.
  */
 TRAMP_REAL_BEGIN(system_reset_fwnmi)
+	/* XXX: fwnmi guest could run a nested/PR guest, so why no test?  */
 	__IKVM_REAL(system_reset)=0
 	GEN_INT_ENTRY system_reset, virt=0
 
@@ -900,6 +934,44 @@ EXC_COMMON_BEGIN(system_reset_common)
 	GEN_KVM system_reset
 
 
+/**
+ * Interrupt 0x200 - Machine Check Interrupt (MCE).
+ * This is a non-maskable interrupt always taken in real-mode. It can be
+ * synchronous or asynchronous, caused by hardware or software, and it may be
+ * taken in a power-saving state.
+ *
+ * Handling:
+ * Similarly to system reset, this uses its own stack and PACA save area;
+ * the difference is that re-entrancy is allowed on the machine check stack.
+ *
+ * machine_check_early is run in real mode, and carefully decodes the
+ * machine check and tries to handle it (e.g., flush the SLB if there was an
+ * error detected there), determines if it was recoverable and logs the
+ * event.
+ *
+ * Then, depending on the execution context when the interrupt is taken, there
+ * are 3 main actions:
+ * - Executing in kernel mode. The event is queued with irq_work, which means
+ *   it is handled when it is next safe to do so (i.e., the kernel has enabled
+ *   interrupts), which could be immediately when the interrupt returns. This
+ *   avoids nasty issues like switching to virtual mode when the MMU is in a
+ *   bad state, or when executing OPAL code. (SRESET is exposed to such issues,
+ *   but it has different priorities). Check to see if the CPU was in power
+ *   save, and return via the wake up code if it was.
+ *
+ * - Executing in user mode. machine_check_exception is run like a normal
+ *   interrupt handler, which processes the data generated by the early handler.
+ *
+ * - Executing in guest mode. The interrupt is run with its KVM test, and
+ *   branches to KVM to deal with. KVM may queue the event for the host
+ *   to report later.
+ *
+ * This interrupt is not maskable, so if it triggers when MSR[RI] is clear,
+ * or SCRATCH0 is in use, it may cause a crash.
+ *
+ * KVM:
+ * See SRESET.
+ */
 INT_DEFINE_BEGIN(machine_check_early)
 	IVEC=0x200
 	IAREA=PACA_EXMC
@@ -1159,19 +1231,28 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
 
 
 /**
- * 0x300 - Data Storage Interrupt (DSI)
- * This interrupt is generated due to a data access which does not have a valid
- * page table entry with permissions to allow the data access to be performed.
- * DAWR matches also fault here, as do RC updates, and minor misc errors e.g.,
- * copy/paste, AMO, certain invalid CI accesses, etc.
+ * Interrupt 0x300 - Data Storage Interrupt (DSI).
+ * This is a synchronous interrupt generated due to a data access exception,
+ * e.g., a load or store which does not have a valid page table entry with
+ * permissions. DAWR matches also fault here, as do RC updates, and minor misc
+ * errors e.g., copy/paste, AMO, certain invalid CI accesses, etc.
+ *
+ * Handling:
+ * - Hash MMU
+ *   Go to do_hash_page first to see if the HPT can be filled from an entry in
+ *   the Linux page table. Hash faults can hit in kernel mode in a fairly
+ *   arbitrary state (e.g., interrupts disabled, locks held) when accessing
+ *   "non-bolted" regions, e.g., vmalloc space. However these should always be
+ *   backed by Linux page tables.
  *
- * This interrupt is delivered to the guest (HV bit unchanged).
+ *   If none is found, do a Linux page fault. Linux page faults can happen in
+ *   kernel mode due to user copy operations of course.
  *
- * Linux HPT responds by first attempting to refill the hash table from the
- * Linux page table, then going to a full page fault if the Linux page table
- * entry was insufficient. RPT goes straight to full page fault.
+ * - Radix MMU
+ *   The hardware loads from the Linux page table directly, so a fault goes
+ *   immediately to Linux page fault.
  *
- * PR KVM ...?
+ * Conditions like DAWR match are handled on the way in to Linux page fault.
  */
 INT_DEFINE_BEGIN(data_access)
 	IVEC=0x300
@@ -1202,6 +1283,24 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 	GEN_KVM data_access
 
 
+/**
+ * Interrupt 0x380 - Data Segment Interrupt (DSLB).
+ * This is a synchronous interrupt in response to an MMU fault from a missing
+ * SLB entry for HPT, or an address outside the RPT translation range.
+ *
+ * Handling:
+ * - HPT:
+ *   This refills the SLB, or reports an access fault similarly to a bad page
+ *   fault. When coming from user-mode, the SLB handler may access any kernel
+ *   data, though it may itself take a DSLB. When coming from kernel mode,
+ *   recursive faults must be avoided so access is restricted to the kernel
+ *   image text/data, kernel stack, and any data allocated below
+ *   ppc64_bolted_size (first segment). The kernel handler must avoid stomping
+ *   on user-handler data structures.
+ *
+ * A dedicated save area EXSLB is used (XXX: but it actually need not be
+ * these days, we could use EXGEN).
+ */
 INT_DEFINE_BEGIN(data_access_slb)
 	IVEC=0x380
 	IAREA=PACA_EXSLB
@@ -1244,6 +1343,15 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 	GEN_KVM data_access_slb
 
 
+/**
+ * Interrupt 0x400 - Instruction Storage Interrupt (ISI).
+ * This is a synchronous interrupt in response to an MMU fault due to an
+ * instruction fetch.
+ *
+ * Handling:
+ * Similar to DSI, though in response to fetch. The faulting address is found
+ * in SRR0 (rather than DAR), and status in SRR1 (rather than DSISR).
+ */
 INT_DEFINE_BEGIN(instruction_access)
 	IVEC=0x400
 	IISIDE=1
@@ -1273,6 +1381,15 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 	GEN_KVM instruction_access
 
 
+/**
+ * Interrupt 0x480 - Instruction Segment Interrupt (ISLB).
+ * This is a synchronous interrupt in response to an MMU fault due to an
+ * instruction fetch.
+ *
+ * Handling:
+ * Similar to DSLB, though in response to fetch. The faulting address is found
+ * in SRR0 (rather than DAR).
+ */
 INT_DEFINE_BEGIN(instruction_access_slb)
 	IVEC=0x480
 	IAREA=PACA_EXSLB
@@ -1315,6 +1432,29 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 	GEN_KVM instruction_access_slb
 
 
+/**
+ * Interrupt 0x500 - External Interrupt.
+ * This is an asynchronous maskable interrupt in response to an "external
+ * exception" from the interrupt controller or hypervisor (e.g., device
+ * interrupt). It is maskable in hardware by clearing MSR[EE], and
+ * soft-maskable with IRQS_DISABLED mask (i.e., local_irq_disable()).
+ *
+ * When running in HV mode, Linux sets up the LPCR[LPES] bit such that
+ * interrupts are delivered with HSRR registers, while guests use SRRs,
+ * which requires IHSRR_IF_HVMODE.
+ *
+ * On bare metal POWER9 and later, Linux sets the LPCR[HVICE] bit such that
+ * external interrupts are delivered as Hypervisor Virtualization Interrupts
+ * rather than External Interrupts.
+ *
+ * Handling:
+ * This calls into Linux IRQ handler. NVGPRs are not saved to reduce overhead,
+ * because registers at the time of the interrupt are not so important as it is
+ * asynchronous.
+ *
+ * If soft masked, the masked handler will note the pending interrupt for
+ * replay, and clear MSR[EE] in the interrupted context.
+ */
 INT_DEFINE_BEGIN(hardware_interrupt)
 	IVEC=0x500
 	IHSRR_IF_HVMODE=1
@@ -1340,6 +1480,10 @@ EXC_COMMON_BEGIN(hardware_interrupt_common)
 	GEN_KVM hardware_interrupt
 
 
+/**
+ * Interrupt 0x600 - Alignment Interrupt
+ * This is a synchronous interrupt in response to a data alignment fault.
+ */
 INT_DEFINE_BEGIN(alignment)
 	IVEC=0x600
 	IDAR=1
@@ -1363,6 +1507,15 @@ EXC_COMMON_BEGIN(alignment_common)
 	GEN_KVM alignment
 
 
+/**
+ * Interrupt 0x700 - Program Interrupt (program check).
+ * This is a synchronous interrupt in response to various instruction faults:
+ * traps, privilege errors, TM errors, floating point exceptions.
+ *
+ * Handling:
+ * This interrupt may use the "emergency stack" in some cases when being taken
+ * from kernel context, which complicates handling.
+ */
 INT_DEFINE_BEGIN(program_check)
 	IVEC=0x700
 	IKVM_REAL=1
@@ -1416,6 +1569,15 @@ EXC_COMMON_BEGIN(program_check_common)
 	GEN_KVM program_check
 
 
+/*
+ * Interrupt 0x800 - Floating-Point Unavailable Interrupt.
+ * This is a synchronous interrupt in response to executing an fp instruction
+ * with MSR[FP]=0.
+ *
+ * Handling:
+ * This will load FP registers and enable the FP bit if coming from userspace,
+ * otherwise report a bad kernel use of FP.
+ */
 INT_DEFINE_BEGIN(fp_unavailable)
 	IVEC=0x800
 	IRECONCILE=0
@@ -1461,6 +1623,23 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM)
 	GEN_KVM fp_unavailable
 
 
+/**
+ * Interrupt 0x900 - Decrementer Interrupt.
+ * This is an asynchronous interrupt in response to a decrementer exception
+ * (e.g., DEC has wrapped below zero). It is maskable in hardware by clearing
+ * MSR[EE], and soft-maskable with IRQS_DISABLED mask (i.e.,
+ * local_irq_disable()).
+ *
+ * Handling:
+ * This calls into Linux timer handler. NVGPRs are not saved (see 0x500).
+ *
+ * If soft masked, the masked handler will note the pending interrupt for
+ * replay, and bump the decrementer to a high value, leaving MSR[EE] enabled
+ * in the interrupted context.
+ * If PPC_WATCHDOG is configured, the soft masked handler will actually set
+ * things back up to run soft_nmi_interrupt as a regular interrupt handler
+ * on the emergency stack.
+ */
 INT_DEFINE_BEGIN(decrementer)
 	IVEC=0x900
 	IMASK=IRQS_DISABLED
@@ -1484,6 +1663,16 @@ EXC_COMMON_BEGIN(decrementer_common)
 	GEN_KVM decrementer
 
 
+/**
+ * Interrupt 0x980 - Hypervisor Decrementer Interrupt.
+ * This is an asynchronous interrupt, similar to 0x900 but for the HDEC
+ * register.
+ *
+ * Handling:
+ * Linux does not use this outside KVM, where it is used to keep a host timer
+ * while the guest is given control of DEC. It should normally be caught by
+ * the KVM test and routed there.
+ */
 INT_DEFINE_BEGIN(hdecrementer)
 	IVEC=0x980
 	IHSRR=1
@@ -1522,6 +1711,20 @@ EXC_COMMON_BEGIN(hdecrementer_common)
 	GEN_KVM hdecrementer
 
 
+/**
+ * Interrupt 0xa00 - Directed Privileged Doorbell Interrupt.
+ * This is an asynchronous interrupt in response to a msgsndp doorbell.
+ * It is maskable in hardware by clearing MSR[EE], and soft-maskable with
+ * IRQS_DISABLED mask (i.e., local_irq_disable()).
+ *
+ * Handling:
+ * Guests may use this for IPIs between threads in a core if the
+ * hypervisor supports it. NVGPRs are not saved (see 0x500).
+ *
+ * If soft masked, the masked handler will note the pending interrupt for
+ * replay, leaving MSR[EE] enabled in the interrupted context because the
+ * doorbells are edge triggered.
+ */
 INT_DEFINE_BEGIN(doorbell_super)
 	IVEC=0xa00
 	IMASK=IRQS_DISABLED
@@ -1552,16 +1755,20 @@ EXC_COMMON_BEGIN(doorbell_super_common)
 EXC_REAL_NONE(0xb00, 0x100)
 EXC_VIRT_NONE(0x4b00, 0x100)
 
-/*
- * system call / hypercall (0xc00, 0x4c00)
- *
- * The system call exception is invoked with "sc 0" and does not alter HV bit.
- *
- * The hypercall is invoked with "sc 1" and sets HV=1.
+/**
+ * Interrupt 0xc00 - System Call Interrupt (syscall, hcall).
+ * This is a synchronous interrupt invoked with the "sc" instruction. The
+ * system call is invoked with "sc 0" and does not alter the HV bit, so it
+ * is directed to the currently running OS. The hypercall is invoked with
+ * "sc 1" and it sets HV=1, so it elevates to hypervisor.
  *
  * In HPT, sc 1 always goes to 0xc00 real mode. In RADIX, sc 1 can go to
  * 0x4c00 virtual mode.
  *
+ * Handling:
+ * If the KVM test fires then it was due to a hypercall and is accordingly
+ * routed to KVM. Otherwise this executes a normal Linux system call.
+ *
  * Call convention:
  *
  * syscall and hypercalls register conventions are documented in
@@ -1692,6 +1899,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 #endif
 
 
+/**
+ * Interrupt 0xd00 - Trace Interrupt.
+ * This is a synchronous interrupt in response to instruction step or
+ * breakpoint faults.
+ */
 INT_DEFINE_BEGIN(single_step)
 	IVEC=0xd00
 	IKVM_REAL=1
@@ -1713,6 +1925,18 @@ EXC_COMMON_BEGIN(single_step_common)
 	GEN_KVM single_step
 
 
+/**
+ * Interrupt 0xe00 - Hypervisor Data Storage Interrupt (HDSI).
+ * This is a synchronous interrupt in response to an MMU fault caused by a
+ * guest data access.
+ *
+ * Handling:
+ * This should always get routed to KVM. In radix MMU mode, this is caused
+ * by a guest nested radix access that can't be performed due to the
+ * partition scope page table. In hash mode, this can be caused by guests
+ * running with translation disabled (virtual real mode) or with VPM enabled.
+ * KVM will update the page table structures or disallow the access.
+ */
 INT_DEFINE_BEGIN(h_data_storage)
 	IVEC=0xe00
 	IHSRR=1
@@ -1745,6 +1969,11 @@ ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_TYPE_RADIX)
 	GEN_KVM h_data_storage
 
 
+/**
+ * Interrupt 0xe20 - Hypervisor Instruction Storage Interrupt (HISI).
+ * This is a synchronous interrupt in response to an MMU fault caused by a
+ * guest instruction fetch, similar to HDSI.
+ */
 INT_DEFINE_BEGIN(h_instr_storage)
 	IVEC=0xe20
 	IHSRR=1
@@ -1768,6 +1997,9 @@ EXC_COMMON_BEGIN(h_instr_storage_common)
 	GEN_KVM h_instr_storage
 
 
+/**
+ * Interrupt 0xe40 - Hypervisor Emulation Assistance Interrupt.
+ */
 INT_DEFINE_BEGIN(emulation_assist)
 	IVEC=0xe40
 	IHSRR=1
@@ -1791,10 +2023,29 @@ EXC_COMMON_BEGIN(emulation_assist_common)
 	GEN_KVM emulation_assist
 
 
-/*
- * hmi_exception trampoline is a special case. It jumps to hmi_exception_early
- * first, and then eventaully from there to the trampoline to get into virtual
- * mode.
+/**
+ * Interrupt 0xe60 - Hypervisor Maintenance Interrupt (HMI).
+ * This is an asynchronous interrupt caused by a Hypervisor Maintenance
+ * Exception. It is always taken in real mode but uses HSRR registers
+ * unlike SRESET and MCE.
+ *
+ * It is maskable in hardware by clearing MSR[EE], and partially soft-maskable
+ * with IRQS_DISABLED mask (i.e., local_irq_disable()).
+ *
+ * Handling:
+ * This is a special case: it is handled similarly to machine checks, with an
+ * initial real mode handler that is not soft-masked and attempts to fix the
+ * problem, followed by a regular handler which is soft-maskable and reports
+ * the problem.
+ *
+ * The emergency stack is used for the early real mode handler.
+ *
+ * XXX: unclear why MCE and HMI schemes could not be made common, e.g.,
+ * either use soft-masking for the MCE, or use irq_work for the HMI.
+ *
+ * KVM:
+ * Unlike MCE, this calls into KVM without calling the real mode handler
+ * first.
  */
 INT_DEFINE_BEGIN(hmi_exception_early)
 	IVEC=0xe60
@@ -1855,6 +2106,11 @@ EXC_COMMON_BEGIN(hmi_exception_common)
 	GEN_KVM hmi_exception
 
 
+/**
+ * Interrupt 0xe80 - Directed Hypervisor Doorbell Interrupt.
+ * This is an asynchronous interrupt in response to a msgsnd doorbell.
+ * Similar to the 0xa00 doorbell but for host rather than guest.
+ */
 INT_DEFINE_BEGIN(h_doorbell)
 	IVEC=0xe80
 	IHSRR=1
@@ -1884,6 +2140,11 @@ EXC_COMMON_BEGIN(h_doorbell_common)
 	GEN_KVM h_doorbell
 
 
+/**
+ * Interrupt 0xea0 - Hypervisor Virtualization Interrupt.
+ * This is an asynchronous interrupt in response to an "external exception".
+ * Similar to 0x500 but for host only.
+ */
 INT_DEFINE_BEGIN(h_virt_irq)
 	IVEC=0xea0
 	IHSRR=1
@@ -1915,6 +2176,22 @@ EXC_REAL_NONE(0xee0, 0x20)
 EXC_VIRT_NONE(0x4ee0, 0x20)
 
 
+/**
+ * Interrupt 0xf00 - Performance Monitor Interrupt (PMI, PMU).
+ * This is an asynchronous interrupt in response to a PMU exception.
+ * It is maskable in hardware by clearing MSR[EE], and soft-maskable with
+ * IRQS_PMI_DISABLED mask (NOTE: NOT local_irq_disable()).
+ *
+ * Handling:
+ * This calls into the perf subsystem.
+ *
+ * Like the watchdog soft-nmi, it appears to Linux as an NMI interrupt, in
+ * that it runs under local_irq_disable. However it may be soft-masked in
+ * powerpc-specific code.
+ *
+ * If soft masked, the masked handler will note the pending interrupt for
+ * replay, and clear MSR[EE] in the interrupted context.
+ */
 INT_DEFINE_BEGIN(performance_monitor)
 	IVEC=0xf00
 	IMASK=IRQS_PMI_DISABLED
@@ -1938,6 +2215,12 @@ EXC_COMMON_BEGIN(performance_monitor_common)
 	GEN_KVM performance_monitor
 
 
+/**
+ * Interrupt 0xf20 - Vector Unavailable Interrupt.
+ * This is a synchronous interrupt in response to
+ * executing a vector (or altivec) instruction with MSR[VEC]=0.
+ * Similar to FP unavailable.
+ */
 INT_DEFINE_BEGIN(altivec_unavailable)
 	IVEC=0xf20
 	IRECONCILE=0
@@ -1986,6 +2269,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 	GEN_KVM altivec_unavailable
 
 
+/**
+ * Interrupt 0xf40 - VSX Unavailable Interrupt.
+ * This is a synchronous interrupt in response to
+ * executing a VSX instruction with MSR[VSX]=0.
+ * Similar to FP unavailable.
+ */
 INT_DEFINE_BEGIN(vsx_unavailable)
 	IVEC=0xf40
 	IRECONCILE=0
@@ -2033,6 +2322,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
 	GEN_KVM vsx_unavailable
 
 
+/**
+ * Interrupt 0xf60 - Facility Unavailable Interrupt.
+ * This is a synchronous interrupt in response to
+ * executing an instruction without access to the facility that can be
+ * resolved by the OS (e.g., FSCR, MSR).
+ * Similar to FP unavailable.
+ */
 INT_DEFINE_BEGIN(facility_unavailable)
 	IVEC=0xf60
 	IKVM_REAL=1
@@ -2054,6 +2350,13 @@ EXC_COMMON_BEGIN(facility_unavailable_common)
 	GEN_KVM facility_unavailable
 
 
+/**
+ * Interrupt 0xf80 - Hypervisor Facility Unavailable Interrupt.
+ * This is a synchronous interrupt in response to
+ * executing an instruction without access to the facility that can only
+ * be resolved in HV mode (e.g., HFSCR).
+ * Similar to FP unavailable.
+ */
 INT_DEFINE_BEGIN(h_facility_unavailable)
 	IVEC=0xf80
 	IHSRR=1
@@ -2141,6 +2444,18 @@ EXC_COMMON_BEGIN(instruction_breakpoint_common)
 EXC_REAL_NONE(0x1400, 0x100)
 EXC_VIRT_NONE(0x5400, 0x100)
 
+/**
+ * Interrupt 0x1500 - Soft Patch Interrupt.
+ *
+ * Handling:
+ * This is an implementation-specific interrupt which can be used for a
+ * range of exceptions.
+ *
+ * This interrupt handler is unique in that it runs the denormal assist
+ * code even for guests (and even in guest context) without going to KVM,
+ * for speed. POWER9 does not raise denorm exceptions, so this special case
+ * could be phased out in future.
+ */
 INT_DEFINE_BEGIN(denorm_exception)
 	IVEC=0x1500
 	IHSRR=1
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v3 20/32] powerpc/64s/exception: only test KVM in SRR interrupts when PR KVM is supported
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (18 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 19/32] powerpc/64s/exception: add more comments for interrupt handlers Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 21/32] powerpc/64s/exception: sreset interrupts reconcile fix Nicholas Piggin
                   ` (13 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

Apart from SRESET, MCE, and syscall (hcall variant), the SRR type
interrupts are not escalated to hypervisor mode, so they are delivered
to the OS.

When running PR KVM, the OS is the hypervisor, and the guest runs with
MSR[PR]=1, so these interrupts must test if a guest was running when
interrupted. These tests are required at the real-mode entry points
because the PR KVM host runs with LPCR[AIL]=0.

In HV KVM and nested HV KVM, the guest always receives these interrupts,
so there is no need for the host to make this test. So remove the tests
if PR KVM is not configured.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
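Review aid, not part of the patch: a toy C model of the routing rules
described above. The function and parameter names are invented for
illustration; the kernel of course encodes this decision at build time
via the IKVM_REAL/IHSRR parameters rather than evaluating it at run time.

#include <stdbool.h>
#include <stdio.h>

/* Does this interrupt's real-mode entry point need a KVM test? */
static bool needs_kvm_test(bool escalates_to_hv, bool pr_kvm_possible)
{
	if (escalates_to_hv)		/* HSRR types, SRESET, MCE, sc 1 */
		return true;		/* the host takes these for guests */
	return pr_kvm_possible;		/* SRR types reach HV guests
					 * directly; only a PR KVM host
					 * must test for a guest */
}

int main(void)
{
	printf("SRR type, PR KVM not possible: %d\n", needs_kvm_test(false, false));
	printf("SRR type, PR KVM possible:     %d\n", needs_kvm_test(false, true));
	printf("HSRR type, any config:         %d\n", needs_kvm_test(true, false));
	return 0;
}
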
 arch/powerpc/kernel/exceptions-64s.S | 65 ++++++++++++++++++++++++++--
 1 file changed, 62 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index e976cbf4f4aa..c23eb9c572b2 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -214,9 +214,36 @@ do_define_int n
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 /*
- * If hv is possible, interrupts come into to the hv version
- * of the kvmppc_interrupt code, which then jumps to the PR handler,
- * kvmppc_interrupt_pr, if the guest is a PR guest.
+ * All interrupts which set HSRR registers, as well as SRESET and MCE and
+ * syscall when invoked with "sc 1", switch to MSR[HV]=1 (HVMODE) to be taken,
+ * so they all generally need to test whether they were taken in guest context.
+ *
+ * Note: SRESET and MCE may also be sent to the guest by the hypervisor, and be
+ * taken with MSR[HV]=0.
+ *
+ * Interrupts which set SRR registers (with the above exceptions) do not
+ * elevate to MSR[HV]=1 mode, though most can be taken when running with
+ * MSR[HV]=1 (e.g., bare metal kernel and userspace). So these interrupts do
+ * not need to test whether a guest is running because they get delivered to
+ * the guest directly, including nested HV KVM guests.
+ *
+ * The exception is PR KVM, where the guest runs with MSR[PR]=1 and the host
+ * runs with MSR[HV]=0, so the host takes all interrupts on behalf of the
+ * guest. PR KVM runs with LPCR[AIL]=0 which causes interrupts to always be
+ * delivered to the real-mode entry point; therefore such interrupts only test
+ * KVM in their real mode handlers, and only when PR KVM is possible.
+ *
+ * Interrupts that are taken in MSR[HV]=0 and escalate to MSR[HV]=1 are always
+ * delivered in real-mode when the MMU is in hash mode because the MMU
+ * registers are not set appropriately to translate host addresses. In nested
+ * radix mode these can be delivered in virt-mode as the host translations are
+ * used implicitly (see: effective LPID, effective PID).
+ */
+
+/*
+ * If an interrupt is taken while a guest is running, it is immediately routed
+ * to KVM to handle. If both HV and PR KVM are possible, KVM interrupts go first
+ * to kvmppc_interrupt_hv, which handles the PR guest case.
  */
 #define kvmppc_interrupt kvmppc_interrupt_hv
 #else
@@ -1258,8 +1285,10 @@ INT_DEFINE_BEGIN(data_access)
 	IVEC=0x300
 	IDAR=1
 	IDSISR=1
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 	IKVM_SKIP=1
 	IKVM_REAL=1
+#endif
 INT_DEFINE_END(data_access)
 
 EXC_REAL_BEGIN(data_access, 0x300, 0x80)
@@ -1306,8 +1335,10 @@ INT_DEFINE_BEGIN(data_access_slb)
 	IAREA=PACA_EXSLB
 	IRECONCILE=0
 	IDAR=1
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 	IKVM_SKIP=1
 	IKVM_REAL=1
+#endif
 INT_DEFINE_END(data_access_slb)
 
 EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80)
@@ -1357,7 +1388,9 @@ INT_DEFINE_BEGIN(instruction_access)
 	IISIDE=1
 	IDAR=1
 	IDSISR=1
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 	IKVM_REAL=1
+#endif
 INT_DEFINE_END(instruction_access)
 
 EXC_REAL_BEGIN(instruction_access, 0x400, 0x80)
@@ -1396,7 +1429,9 @@ INT_DEFINE_BEGIN(instruction_access_slb)
 	IRECONCILE=0
 	IISIDE=1
 	IDAR=1
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 	IKVM_REAL=1
+#endif
 INT_DEFINE_END(instruction_access_slb)
 
 EXC_REAL_BEGIN(instruction_access_slb, 0x480, 0x80)
@@ -1488,7 +1523,9 @@ INT_DEFINE_BEGIN(alignment)
 	IVEC=0x600
 	IDAR=1
 	IDSISR=1
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 	IKVM_REAL=1
+#endif
 INT_DEFINE_END(alignment)
 
 EXC_REAL_BEGIN(alignment, 0x600, 0x100)
@@ -1518,7 +1555,9 @@ EXC_COMMON_BEGIN(alignment_common)
  */
 INT_DEFINE_BEGIN(program_check)
 	IVEC=0x700
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 	IKVM_REAL=1
+#endif
 INT_DEFINE_END(program_check)
 
 EXC_REAL_BEGIN(program_check, 0x700, 0x100)
@@ -1581,7 +1620,9 @@ EXC_COMMON_BEGIN(program_check_common)
 INT_DEFINE_BEGIN(fp_unavailable)
 	IVEC=0x800
 	IRECONCILE=0
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 	IKVM_REAL=1
+#endif
 INT_DEFINE_END(fp_unavailable)
 
 EXC_REAL_BEGIN(fp_unavailable, 0x800, 0x100)
@@ -1643,7 +1684,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM)
 INT_DEFINE_BEGIN(decrementer)
 	IVEC=0x900
 	IMASK=IRQS_DISABLED
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 	IKVM_REAL=1
+#endif
 INT_DEFINE_END(decrementer)
 
 EXC_REAL_BEGIN(decrementer, 0x900, 0x80)
@@ -1728,7 +1771,9 @@ EXC_COMMON_BEGIN(hdecrementer_common)
 INT_DEFINE_BEGIN(doorbell_super)
 	IVEC=0xa00
 	IMASK=IRQS_DISABLED
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 	IKVM_REAL=1
+#endif
 INT_DEFINE_END(doorbell_super)
 
 EXC_REAL_BEGIN(doorbell_super, 0xa00, 0x100)
@@ -1906,7 +1951,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
  */
 INT_DEFINE_BEGIN(single_step)
 	IVEC=0xd00
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 	IKVM_REAL=1
+#endif
 INT_DEFINE_END(single_step)
 
 EXC_REAL_BEGIN(single_step, 0xd00, 0x100)
@@ -2195,7 +2242,9 @@ EXC_VIRT_NONE(0x4ee0, 0x20)
 INT_DEFINE_BEGIN(performance_monitor)
 	IVEC=0xf00
 	IMASK=IRQS_PMI_DISABLED
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 	IKVM_REAL=1
+#endif
 INT_DEFINE_END(performance_monitor)
 
 EXC_REAL_BEGIN(performance_monitor, 0xf00, 0x20)
@@ -2224,7 +2273,9 @@ EXC_COMMON_BEGIN(performance_monitor_common)
 INT_DEFINE_BEGIN(altivec_unavailable)
 	IVEC=0xf20
 	IRECONCILE=0
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 	IKVM_REAL=1
+#endif
 INT_DEFINE_END(altivec_unavailable)
 
 EXC_REAL_BEGIN(altivec_unavailable, 0xf20, 0x20)
@@ -2278,7 +2329,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 INT_DEFINE_BEGIN(vsx_unavailable)
 	IVEC=0xf40
 	IRECONCILE=0
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 	IKVM_REAL=1
+#endif
 INT_DEFINE_END(vsx_unavailable)
 
 EXC_REAL_BEGIN(vsx_unavailable, 0xf40, 0x20)
@@ -2331,7 +2384,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
  */
 INT_DEFINE_BEGIN(facility_unavailable)
 	IVEC=0xf60
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 	IKVM_REAL=1
+#endif
 INT_DEFINE_END(facility_unavailable)
 
 EXC_REAL_BEGIN(facility_unavailable, 0xf60, 0x20)
@@ -2421,8 +2476,10 @@ EXC_VIRT_NONE(0x5200, 0x100)
 
 INT_DEFINE_BEGIN(instruction_breakpoint)
 	IVEC=0x1300
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 	IKVM_SKIP=1
 	IKVM_REAL=1
+#endif
 INT_DEFINE_END(instruction_breakpoint)
 
 EXC_REAL_BEGIN(instruction_breakpoint, 0x1300, 0x100)
@@ -2593,7 +2650,9 @@ EXC_VIRT_NONE(0x5600, 0x100)
 
 INT_DEFINE_BEGIN(altivec_assist)
 	IVEC=0x1700
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 	IKVM_REAL=1
+#endif
 INT_DEFINE_END(altivec_assist)
 
 EXC_REAL_BEGIN(altivec_assist, 0x1700, 0x100)
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v3 21/32] powerpc/64s/exception: sreset interrupts reconcile fix
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (19 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 20/32] powerpc/64s/exception: only test KVM in SRR interrupts when PR KVM is supported Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 22/32] powerpc/64s/exception: soft nmi interrupt should not use ret_from_except Nicholas Piggin
                   ` (12 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

This adds IRQ_HARD_DIS to irq_happened. Although it doesn't seem to
matter much because we're not allowed to enable irqs in an NMI handler,
the soft-irq debugging code is becoming more strict about ensuring
IRQ_HARD_DIS is in sync with MSR[EE], so this may help avoid asserts or
other issues.

Add a comment explaining why MCE does not have this. Early machine
check is generally much smaller and more contained code which runs in
real mode and will explode if you look at it wrong anyway, though
there's an argument that we should do similar reconciling for the MCE
as well.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
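Review aid, not part of the patch: the asm sequence added below corresponds
roughly to the following C, assuming the usual paca accessors. The function
name is invented, and the real code must stay in asm because it runs before
the interrupt state is reconciled.

#include <asm/hw_irq.h>
#include <asm/paca.h>
#include <asm/ptrace.h>

/* Mask everything, stash the old irq_happened in the otherwise-unused
 * _DAR save slot, and record that MSR[EE] is hard-disabled. */
static inline void nmi_soft_mask_save(struct pt_regs *regs)
{
	local_paca->irq_soft_mask = IRQS_ALL_DISABLED;
	regs->dar = local_paca->irq_happened;
	local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
}
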
 arch/powerpc/kernel/exceptions-64s.S | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index c23eb9c572b2..6ff5ea236b17 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -920,18 +920,19 @@ EXC_COMMON_BEGIN(system_reset_common)
 	__GEN_COMMON_BODY system_reset
 	bl	save_nvgprs
 	/*
-	 * Set IRQS_ALL_DISABLED unconditionally so arch_irqs_disabled does
+	 * Set IRQS_ALL_DISABLED unconditionally so irqs_disabled() does
 	 * the right thing. We do not want to reconcile because that goes
 	 * through irq tracing which we don't want in NMI.
 	 *
-	 * Save PACAIRQHAPPENED because some code will do a hard disable
-	 * (e.g., xmon). So we want to restore this back to where it was
-	 * when we return. DAR is unused in the stack, so save it there.
+	 * Save PACAIRQHAPPENED to _DAR (otherwise unused), and set HARD_DIS
+	 * as we are running with MSR[EE]=0.
 	 */
 	li	r10,IRQS_ALL_DISABLED
 	stb	r10,PACAIRQSOFTMASK(r13)
 	lbz	r10,PACAIRQHAPPENED(r13)
 	std	r10,_DAR(r1)
+	ori	r10,r10,PACA_IRQ_HARD_DIS
+	stb	r10,PACAIRQHAPPENED(r13)
 
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	system_reset_exception
@@ -976,6 +977,11 @@ EXC_COMMON_BEGIN(system_reset_common)
  * error detected there), determines if it was recoverable and logs the
  * event.
  *
+ * This early code does not "reconcile" irq soft-mask state like SRESET or
+ * regular interrupts do, so irqs_disabled() among other things may not work
+ * properly (irq disable/enable already doesn't work because irq tracing
+ * cannot work in real mode).
+ *
  * Then, depending on the execution context when the interrupt is taken, there
  * are 3 main actions:
  * - Executing in kernel mode. The event is queued with irq_work, which means
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v3 22/32] powerpc/64s/exception: soft nmi interrupt should not use ret_from_except
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (20 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 21/32] powerpc/64s/exception: sreset interrupts reconcile fix Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 23/32] powerpc/64: system call remove non-volatile GPR save optimisation Nicholas Piggin
                   ` (11 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

The soft nmi handler does not reconcile interrupt state, so it should
not return via the normal ret_from_except path. Return like other NMIs,
using the EXCEPTION_RESTORE_REGS macro.

This becomes important when the scv interrupt is implemented, which
must handle soft-masked interrupts that have r13 set to something other
than the PACA -- returning to kernel in this case must restore r13.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
v3:
- save/restore irq soft mask state like other NMIs rather than a normal
  reconcile, to avoid soft mask warnings or possibly worse.
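
Review aid, not part of the patch: the restore added below is roughly the
following C, undoing the entry-side stash (see the sreset reconcile patch).
The function name is invented, and the real code must be asm since it runs
with MSR[RI]=0.

#include <asm/paca.h>
#include <asm/ptrace.h>

/* Put back irq_happened (stashed in _DAR at entry) and the soft mask
 * state saved in the SOFTE slot, before EXCEPTION_RESTORE_REGS. */
static inline void nmi_soft_mask_restore(struct pt_regs *regs)
{
	local_paca->irq_happened = regs->dar;
	local_paca->irq_soft_mask = regs->softe;
}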

 arch/powerpc/kernel/exceptions-64s.S | 29 +++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 6ff5ea236b17..5ddfc32cacad 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -2713,6 +2713,7 @@ EXC_VIRT_NONE(0x5800, 0x100)
 INT_DEFINE_BEGIN(soft_nmi)
 	IVEC=0x900
 	ISTACK=0
+	IRECONCILE=0	/* Soft-NMI may fire under local_irq_disable */
 INT_DEFINE_END(soft_nmi)
 
 /*
@@ -2731,9 +2732,35 @@ EXC_COMMON_BEGIN(soft_nmi_common)
 	subi	r1,r1,INT_FRAME_SIZE
 	__GEN_COMMON_BODY soft_nmi
 	bl	save_nvgprs
+
+	/*
+	 * Set IRQS_ALL_DISABLED and save PACAIRQHAPPENED (see
+	 * system_reset_common)
+	 */
+	li	r10,IRQS_ALL_DISABLED
+	stb	r10,PACAIRQSOFTMASK(r13)
+	lbz	r10,PACAIRQHAPPENED(r13)
+	std	r10,_DAR(r1)
+	ori	r10,r10,PACA_IRQ_HARD_DIS
+	stb	r10,PACAIRQHAPPENED(r13)
+
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	soft_nmi_interrupt
-	b	ret_from_except
+
+	/* Clear MSR_RI before setting SRR0 and SRR1. */
+	li	r9,0
+	mtmsrd	r9,1
+
+	/*
+	 * Restore soft mask settings.
+	 */
+	ld	r10,_DAR(r1)
+	stb	r10,PACAIRQHAPPENED(r13)
+	ld	r10,SOFTE(r1)
+	stb	r10,PACAIRQSOFTMASK(r13)
+
+	EXCEPTION_RESTORE_REGS hsrr=0
+	RFI_TO_KERNEL
 
 #endif /* CONFIG_PPC_WATCHDOG */
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v3 23/32] powerpc/64: system call remove non-volatile GPR save optimisation
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (21 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 22/32] powerpc/64s/exception: soft nmi interrupt should not use ret_from_except Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 24/32] powerpc/64: sstep ifdef the deprecated fast endian switch syscall Nicholas Piggin
                   ` (10 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

powerpc has an optimisation where interrupts avoid saving the
non-volatile (or callee saved) registers to the interrupt stack frame if
they are not required.

Two problems with this are that an interrupt does not always know
whether it will need non-volatiles; and if it does need them, they can
only be saved from the entry-scoped asm code (because we don't control
what the C compiler does with these registers).

System calls are the most difficult: some system calls always require
all registers (e.g., fork, to copy regs into the child).  Sometimes
registers are only required under certain conditions (e.g., tracing,
signal delivery). These cases require ugly logic in the call chains
(e.g., ppc_fork), and require a lot of logic to be implemented in asm.

So remove the optimisation for system calls, and always save NVGPRs on
entry. Modern high performance CPUs are not so sensitive, because the
stores are dense in cache and can be hidden by other expensive work in
the syscall path -- the null syscall selftests benchmark on POWER9 is
not slowed (124.40ns before and 123.64ns after, i.e., within the noise).

Other interrupts retain the NVGPR optimisation for now.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
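Review aid, not part of the patch: a sketch of why fork-type calls need the
full frame. The function name is invented; the real copy happens in the
arch's copy_thread() path.

#include <asm/ptrace.h>

/* With NVGPRs saved unconditionally at syscall entry, a fork-style syscall
 * can snapshot the whole frame for the child. Previously r14-r31 in the
 * frame were stale unless an asm wrapper (e.g., ppc_fork) saved them. */
static void sketch_copy_frame_to_child(struct pt_regs *child,
				       const struct pt_regs *parent)
{
	*child = *parent;
}
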
 arch/powerpc/kernel/entry_64.S           | 72 +++++-------------------
 arch/powerpc/kernel/syscalls/syscall.tbl | 22 +++++---
 2 files changed, 28 insertions(+), 66 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 6ba675b0cf7d..14afe12eae8c 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -98,13 +98,14 @@ END_BTB_FLUSH_SECTION
 	std	r11,_XER(r1)
 	std	r11,_CTR(r1)
 	std	r9,GPR13(r1)
+	SAVE_NVGPRS(r1)
 	mflr	r10
 	/*
 	 * This clears CR0.SO (bit 28), which is the error indication on
 	 * return from this system call.
 	 */
 	rldimi	r2,r11,28,(63-28)
-	li	r11,0xc01
+	li	r11,0xc00
 	std	r10,_LINK(r1)
 	std	r11,_TRAP(r1)
 	std	r3,ORIG_GPR3(r1)
@@ -323,7 +324,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
 /* Traced system call support */
 .Lsyscall_dotrace:
-	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	do_syscall_trace_enter
 
@@ -408,7 +408,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	mtmsrd	r10,1
 #endif /* CONFIG_PPC_BOOK3E */
 
-	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	do_syscall_trace_leave
 	b	ret_from_except
@@ -442,62 +441,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 _ASM_NOKPROBE_SYMBOL(system_call_common);
 _ASM_NOKPROBE_SYMBOL(system_call_exit);
 
-/* Save non-volatile GPRs, if not already saved. */
-_GLOBAL(save_nvgprs)
-	ld	r11,_TRAP(r1)
-	andi.	r0,r11,1
-	beqlr-
-	SAVE_NVGPRS(r1)
-	clrrdi	r0,r11,1
-	std	r0,_TRAP(r1)
-	blr
-_ASM_NOKPROBE_SYMBOL(save_nvgprs);
-
-	
-/*
- * The sigsuspend and rt_sigsuspend system calls can call do_signal
- * and thus put the process into the stopped state where we might
- * want to examine its user state with ptrace.  Therefore we need
- * to save all the nonvolatile registers (r14 - r31) before calling
- * the C code.  Similarly, fork, vfork and clone need the full
- * register state on the stack so that it can be copied to the child.
- */
-
-_GLOBAL(ppc_fork)
-	bl	save_nvgprs
-	bl	sys_fork
-	b	.Lsyscall_exit
-
-_GLOBAL(ppc_vfork)
-	bl	save_nvgprs
-	bl	sys_vfork
-	b	.Lsyscall_exit
-
-_GLOBAL(ppc_clone)
-	bl	save_nvgprs
-	bl	sys_clone
-	b	.Lsyscall_exit
-
-_GLOBAL(ppc_clone3)
-       bl      save_nvgprs
-       bl      sys_clone3
-       b       .Lsyscall_exit
-
-_GLOBAL(ppc32_swapcontext)
-	bl	save_nvgprs
-	bl	compat_sys_swapcontext
-	b	.Lsyscall_exit
-
-_GLOBAL(ppc64_swapcontext)
-	bl	save_nvgprs
-	bl	sys_swapcontext
-	b	.Lsyscall_exit
-
-_GLOBAL(ppc_switch_endian)
-	bl	save_nvgprs
-	bl	sys_switch_endian
-	b	.Lsyscall_exit
-
 _GLOBAL(ret_from_fork)
 	bl	schedule_tail
 	REST_NVGPRS(r1)
@@ -516,6 +459,17 @@ _GLOBAL(ret_from_kernel_thread)
 	li	r3,0
 	b	.Lsyscall_exit
 
+/* Save non-volatile GPRs, if not already saved. */
+_GLOBAL(save_nvgprs)
+	ld	r11,_TRAP(r1)
+	andi.	r0,r11,1
+	beqlr-
+	SAVE_NVGPRS(r1)
+	clrrdi	r0,r11,1
+	std	r0,_TRAP(r1)
+	blr
+_ASM_NOKPROBE_SYMBOL(save_nvgprs);
+
 #ifdef CONFIG_PPC_BOOK3S_64
 
 #define FLUSH_COUNT_CACHE	\
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 35b61bfc1b1a..220ae11555f2 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -9,7 +9,9 @@
 #
 0	nospu	restart_syscall			sys_restart_syscall
 1	nospu	exit				sys_exit
-2	nospu	fork				ppc_fork
+2	32	fork				ppc_fork			sys_fork
+2	64	fork				sys_fork
+2	spu	fork				sys_ni_syscall
 3	common	read				sys_read
 4	common	write				sys_write
 5	common	open				sys_open			compat_sys_open
@@ -158,7 +160,9 @@
 119	32	sigreturn			sys_sigreturn			compat_sys_sigreturn
 119	64	sigreturn			sys_ni_syscall
 119	spu	sigreturn			sys_ni_syscall
-120	nospu	clone				ppc_clone
+120	32	clone				ppc_clone			sys_clone
+120	64	clone				sys_clone
+120	spu	clone				sys_ni_syscall
 121	common	setdomainname			sys_setdomainname
 122	common	uname				sys_newuname
 123	common	modify_ldt			sys_ni_syscall
@@ -240,7 +244,9 @@
 186	spu	sendfile			sys_sendfile64
 187	common	getpmsg				sys_ni_syscall
 188	common 	putpmsg				sys_ni_syscall
-189	nospu	vfork				ppc_vfork
+189	32	vfork				ppc_vfork			sys_vfork
+189	64	vfork				sys_vfork
+189	spu	vfork				sys_ni_syscall
 190	common	ugetrlimit			sys_getrlimit			compat_sys_getrlimit
 191	common	readahead			sys_readahead			compat_sys_readahead
 192	32	mmap2				sys_mmap2			compat_sys_mmap2
@@ -316,8 +322,8 @@
 248	32	clock_nanosleep			sys_clock_nanosleep_time32
 248	64	clock_nanosleep			sys_clock_nanosleep
 248	spu	clock_nanosleep			sys_clock_nanosleep
-249	32	swapcontext			ppc_swapcontext			ppc32_swapcontext
-249	64	swapcontext			ppc64_swapcontext
+249	32	swapcontext			ppc_swapcontext			compat_sys_swapcontext
+249	64	swapcontext			sys_swapcontext
 249	spu	swapcontext			sys_ni_syscall
 250	common	tgkill				sys_tgkill
 251	32	utimes				sys_utimes_time32
@@ -456,7 +462,7 @@
 361	common	bpf				sys_bpf
 362	nospu	execveat			sys_execveat			compat_sys_execveat
 363	32	switch_endian			sys_ni_syscall
-363	64	switch_endian			ppc_switch_endian
+363	64	switch_endian			sys_switch_endian
 363	spu	switch_endian			sys_ni_syscall
 364	common	userfaultfd			sys_userfaultfd
 365	common	membarrier			sys_membarrier
@@ -516,6 +522,8 @@
 432	common	fsmount				sys_fsmount
 433	common	fspick				sys_fspick
 434	common	pidfd_open			sys_pidfd_open
-435	nospu	clone3				ppc_clone3
+435	32	clone3				ppc_clone3			sys_clone3
+435	64	clone3				sys_clone3
+435	spu	clone3				sys_ni_syscall
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v3 24/32] powerpc/64: sstep ifdef the deprecated fast endian switch syscall
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (22 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 23/32] powerpc/64: system call remove non-volatile GPR save optimisation Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 25/32] powerpc/64: system call implement entry/exit logic in C Nicholas Piggin
                   ` (9 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/lib/sstep.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index c077acb983a1..5f3a7bd9d90d 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -3179,8 +3179,9 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 		 * entry code works.  If that is changed, this will
 		 * need to be changed also.
 		 */
-		if (regs->gpr[0] == 0x1ebe &&
-		    cpu_has_feature(CPU_FTR_REAL_LE)) {
+		if (IS_ENABLED(CONFIG_PPC_FAST_ENDIAN_SWITCH) &&
+				cpu_has_feature(CPU_FTR_REAL_LE) &&
+				regs->gpr[0] == 0x1ebe) {
 			regs->msr ^= MSR_LE;
 			goto instr_done;
 		}
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v3 25/32] powerpc/64: system call implement entry/exit logic in C
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (23 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 24/32] powerpc/64: sstep ifdef the deprecated fast endian switch syscall Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-03-19  9:18   ` Christophe Leroy
  2020-02-25 17:35 ` [PATCH v3 26/32] powerpc/64: system call zero volatile registers when returning Nicholas Piggin
                   ` (8 subsequent siblings)
  33 siblings, 1 reply; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

System call entry and particularly exit code is beyond the limit of what
is reasonable to implement in asm.

This conversion moves all conditional branches out of the asm code,
except for the case that all GPRs should be restored at exit.

Null syscall test is about 5% faster after this patch, because the exit
work is handled under local_irq_disable, and the hard mask and pending
interrupt replay is handled after that, which avoids games with MSR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
---

v2,rebase (from Michal):
- Add endian conversion for dtl_idx (ms)
- Fix sparse warning about missing declaration (ms)
- Add unistd.h to fix some defconfigs, add SPDX, minor formatting (mpe)

v3: Fixes thanks to reports from mpe and selftests errors:
- Several soft-mask debug and unsafe smp_processor_id() warnings due to
  tracing and other false positives due to checks in "unreconciled" code.
- Fix a bug with syscall tracing functions that set registers (e.g.,
  PTRACE_SETREG) not setting GPRs properly.
- Fix silly tabort_syscall bug that causes kernel crashes when making system
  calls in transactional state.
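
Review aid, not part of the patch: a condensed paraphrase of the exit
algorithm in syscall_exit_prepare() below, as a reading guide. The function
name is invented; errno handling, accounting, TM, and math restore are
omitted; it assumes the helpers this patch adds (__hard_EE_RI_disable,
__hard_RI_enable) and the same headers as syscall_64.c.

#include <linux/sched.h>
#include <asm/hw_irq.h>
#include <asm/paca.h>
#include <asm/ptrace.h>
#include <asm/signal.h>

notrace static void exit_loop_sketch(struct pt_regs *regs,
				     unsigned long *ti_flagsp)
{
again:
	local_irq_disable();
	while (unlikely(READ_ONCE(*ti_flagsp) & _TIF_USER_WORK_MASK)) {
		local_irq_enable();
		if (READ_ONCE(*ti_flagsp) & _TIF_NEED_RESCHED)
			schedule();
		else
			do_notify_resume(regs, READ_ONCE(*ti_flagsp));
		local_irq_disable();
	}
	trace_hardirqs_on();		/* may touch vmaps, so do it with RI=1 */
	__hard_EE_RI_disable();		/* MSR[EE]=0, MSR[RI]=0 */
	if (unlikely(lazy_irq_pending())) {
		/* An irq became pending: back out, replay it, and retry. */
		__hard_RI_enable();
		trace_hardirqs_off();
		local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
		local_irq_enable();
		goto again;
	}
	local_paca->irq_happened = 0;
	irq_soft_mask_set(IRQS_ENABLED);
}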

 arch/powerpc/include/asm/asm-prototypes.h     |  17 +-
 .../powerpc/include/asm/book3s/64/kup-radix.h |  14 +-
 arch/powerpc/include/asm/cputime.h            |  29 ++
 arch/powerpc/include/asm/hw_irq.h             |   4 +
 arch/powerpc/include/asm/ptrace.h             |   3 +
 arch/powerpc/include/asm/signal.h             |   3 +
 arch/powerpc/include/asm/switch_to.h          |   5 +
 arch/powerpc/include/asm/time.h               |   3 +
 arch/powerpc/kernel/Makefile                  |   3 +-
 arch/powerpc/kernel/entry_64.S                | 338 +++---------------
 arch/powerpc/kernel/signal.h                  |   2 -
 arch/powerpc/kernel/syscall_64.c              | 213 +++++++++++
 arch/powerpc/kernel/systbl.S                  |   9 +-
 13 files changed, 328 insertions(+), 315 deletions(-)
 create mode 100644 arch/powerpc/kernel/syscall_64.c

diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 983c0084fb3f..4b3609554e76 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -97,6 +97,12 @@ ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp,
 unsigned long __init early_init(unsigned long dt_ptr);
 void __init machine_init(u64 dt_ptr);
 #endif
+#ifdef CONFIG_PPC64
+long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs);
+notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs);
+notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr);
+notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr);
+#endif
 
 long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low,
 		      u32 len_high, u32 len_low);
@@ -104,14 +110,6 @@ long sys_switch_endian(void);
 notrace unsigned int __check_irq_replay(void);
 void notrace restore_interrupts(void);
 
-/* ptrace */
-long do_syscall_trace_enter(struct pt_regs *regs);
-void do_syscall_trace_leave(struct pt_regs *regs);
-
-/* process */
-void restore_math(struct pt_regs *regs);
-void restore_tm_state(struct pt_regs *regs);
-
 /* prom_init (OpenFirmware) */
 unsigned long __init prom_init(unsigned long r3, unsigned long r4,
 			       unsigned long pp,
@@ -122,9 +120,6 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4,
 void __init early_setup(unsigned long dt_ptr);
 void early_setup_secondary(void);
 
-/* time */
-void accumulate_stolen_time(void);
-
 /* misc runtime */
 extern u64 __bswapdi2(u64);
 extern s64 __lshrdi3(s64, int);
diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index 90dd3a3fc8c7..71081d90f999 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -3,6 +3,7 @@
 #define _ASM_POWERPC_BOOK3S_64_KUP_RADIX_H
 
 #include <linux/const.h>
+#include <asm/reg.h>
 
 #define AMR_KUAP_BLOCK_READ	UL(0x4000000000000000)
 #define AMR_KUAP_BLOCK_WRITE	UL(0x8000000000000000)
@@ -56,7 +57,14 @@
 
 #ifdef CONFIG_PPC_KUAP
 
-#include <asm/reg.h>
+#include <asm/mmu.h>
+#include <asm/ptrace.h>
+
+static inline void kuap_check_amr(void)
+{
+	if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && mmu_has_feature(MMU_FTR_RADIX_KUAP))
+		WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED);
+}
 
 /*
  * We support individually allowing read or write, but we don't support nesting
@@ -127,6 +135,10 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
 		    (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)),
 		    "Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read");
 }
+#else /* CONFIG_PPC_KUAP */
+static inline void kuap_check_amr(void)
+{
+}
 #endif /* CONFIG_PPC_KUAP */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
index 2431b4ada2fa..6639a6847cc0 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -44,6 +44,28 @@ static inline unsigned long cputime_to_usecs(const cputime_t ct)
 #ifdef CONFIG_PPC64
 #define get_accounting(tsk)	(&get_paca()->accounting)
 static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
+
+/*
+ * account_cpu_user_entry/exit run "unreconciled", so they can't trace and
+ * can't use get_paca()
+ */
+static notrace inline void account_cpu_user_entry(void)
+{
+	unsigned long tb = mftb();
+	struct cpu_accounting_data *acct = &local_paca->accounting;
+
+	acct->utime += (tb - acct->starttime_user);
+	acct->starttime = tb;
+}
+static notrace inline void account_cpu_user_exit(void)
+{
+	unsigned long tb = mftb();
+	struct cpu_accounting_data *acct = &local_paca->accounting;
+
+	acct->stime += (tb - acct->starttime);
+	acct->starttime_user = tb;
+}
+
 #else
 #define get_accounting(tsk)	(&task_thread_info(tsk)->accounting)
 /*
@@ -61,5 +83,12 @@ static inline void arch_vtime_task_switch(struct task_struct *prev)
 #endif
 
 #endif /* __KERNEL__ */
+#else /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
+static inline void account_cpu_user_entry(void)
+{
+}
+static inline void account_cpu_user_exit(void)
+{
+}
 #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 #endif /* __POWERPC_CPUTIME_H */
diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index e3a905e3d573..310583e62bd9 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -228,9 +228,13 @@ static inline bool arch_irqs_disabled(void)
 #ifdef CONFIG_PPC_BOOK3E
 #define __hard_irq_enable()	wrtee(MSR_EE)
 #define __hard_irq_disable()	wrtee(0)
+#define __hard_EE_RI_disable()	wrtee(0)
+#define __hard_RI_enable()	do { } while (0)
 #else
 #define __hard_irq_enable()	__mtmsrd(MSR_EE|MSR_RI, 1)
 #define __hard_irq_disable()	__mtmsrd(MSR_RI, 1)
+#define __hard_EE_RI_disable()	__mtmsrd(0, 1)
+#define __hard_RI_enable()	__mtmsrd(MSR_RI, 1)
 #endif
 
 #define hard_irq_disable()	do {					\
diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index ee3ada66deb5..082a40153b94 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -138,6 +138,9 @@ extern unsigned long profile_pc(struct pt_regs *regs);
 #define profile_pc(regs) instruction_pointer(regs)
 #endif
 
+long do_syscall_trace_enter(struct pt_regs *regs);
+void do_syscall_trace_leave(struct pt_regs *regs);
+
 #define kernel_stack_pointer(regs) ((regs)->gpr[1])
 static inline int is_syscall_success(struct pt_regs *regs)
 {
diff --git a/arch/powerpc/include/asm/signal.h b/arch/powerpc/include/asm/signal.h
index 0803ca8b9149..99e1c6de27bc 100644
--- a/arch/powerpc/include/asm/signal.h
+++ b/arch/powerpc/include/asm/signal.h
@@ -6,4 +6,7 @@
 #include <uapi/asm/signal.h>
 #include <uapi/asm/ptrace.h>
 
+struct pt_regs;
+void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags);
+
 #endif /* _ASM_POWERPC_SIGNAL_H */
diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index 5b03d8a82409..476008bc3d08 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -5,6 +5,7 @@
 #ifndef _ASM_POWERPC_SWITCH_TO_H
 #define _ASM_POWERPC_SWITCH_TO_H
 
+#include <linux/sched.h>
 #include <asm/reg.h>
 
 struct thread_struct;
@@ -22,6 +23,10 @@ extern void switch_booke_debug_regs(struct debug_reg *new_debug);
 
 extern int emulate_altivec(struct pt_regs *);
 
+void restore_math(struct pt_regs *regs);
+
+void restore_tm_state(struct pt_regs *regs);
+
 extern void flush_all_to_thread(struct task_struct *);
 extern void giveup_all(struct task_struct *);
 
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index e0107495c4de..39ce95016a3a 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -194,5 +194,8 @@ DECLARE_PER_CPU(u64, decrementers_next_tb);
 /* Convert timebase ticks to nanoseconds */
 unsigned long long tb_to_ns(unsigned long long tb_ticks);
 
+/* SPLPAR */
+void accumulate_stolen_time(void);
+
 #endif /* __KERNEL__ */
 #endif /* __POWERPC_TIME_H */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 78a1b22d4fd8..5700231a8988 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -50,7 +50,8 @@ obj-y				:= cputable.o ptrace.o syscalls.o \
 				   of_platform.o prom_parse.o
 obj-$(CONFIG_PPC64)		+= setup_64.o sys_ppc32.o \
 				   signal_64.o ptrace32.o \
-				   paca.o nvram_64.o firmware.o note.o
+				   paca.o nvram_64.o firmware.o note.o \
+				   syscall_64.o
 obj-$(CONFIG_VDSO32)		+= vdso32/
 obj-$(CONFIG_PPC_WATCHDOG)	+= watchdog.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= hw_breakpoint.o
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 14afe12eae8c..7404290fa132 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -63,12 +63,7 @@ exception_marker:
 
 	.globl system_call_common
 system_call_common:
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-BEGIN_FTR_SECTION
-	extrdi.	r10, r12, 1, (63-MSR_TS_T_LG) /* transaction active? */
-	bne	.Ltabort_syscall
-END_FTR_SECTION_IFSET(CPU_FTR_TM)
-#endif
+_ASM_NOKPROBE_SYMBOL(system_call_common)
 	mr	r10,r1
 	ld	r1,PACAKSAVE(r13)
 	std	r10,0(r1)
@@ -76,344 +71,101 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM)
 	std	r12,_MSR(r1)
 	std	r0,GPR0(r1)
 	std	r10,GPR1(r1)
+	std	r2,GPR2(r1)
 #ifdef CONFIG_PPC_FSL_BOOK3E
 START_BTB_FLUSH_SECTION
 	BTB_FLUSH(r10)
 END_BTB_FLUSH_SECTION
 #endif
-	ACCOUNT_CPU_USER_ENTRY(r13, r10, r11)
-	std	r2,GPR2(r1)
+	ld	r2,PACATOC(r13)
+	mfcr	r12
+	li	r11,0
+	/* Can we avoid saving r3-r8 in the common case? */
 	std	r3,GPR3(r1)
-	mfcr	r2
 	std	r4,GPR4(r1)
 	std	r5,GPR5(r1)
 	std	r6,GPR6(r1)
 	std	r7,GPR7(r1)
 	std	r8,GPR8(r1)
-	li	r11,0
+	/* Zero r9-r12; these should only be needed when restoring all GPRs */
 	std	r11,GPR9(r1)
 	std	r11,GPR10(r1)
 	std	r11,GPR11(r1)
 	std	r11,GPR12(r1)
-	std	r11,_XER(r1)
-	std	r11,_CTR(r1)
 	std	r9,GPR13(r1)
 	SAVE_NVGPRS(r1)
+	std	r11,_XER(r1)
+	std	r11,_CTR(r1)
 	mflr	r10
+
 	/*
 	 * This clears CR0.SO (bit 28), which is the error indication on
 	 * return from this system call.
 	 */
-	rldimi	r2,r11,28,(63-28)
+	rldimi	r12,r11,28,(63-28)
 	li	r11,0xc00
 	std	r10,_LINK(r1)
 	std	r11,_TRAP(r1)
+	std	r12,_CCR(r1)
 	std	r3,ORIG_GPR3(r1)
-	std	r2,_CCR(r1)
-	ld	r2,PACATOC(r13)
-	addi	r9,r1,STACK_FRAME_OVERHEAD
+	addi	r10,r1,STACK_FRAME_OVERHEAD
 	ld	r11,exception_marker@toc(r2)
-	std	r11,-16(r9)		/* "regshere" marker */
-
-	kuap_check_amr r10, r11
-
-#if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) && defined(CONFIG_PPC_SPLPAR)
-BEGIN_FW_FTR_SECTION
-	/* see if there are any DTL entries to process */
-	ld	r10,PACALPPACAPTR(r13)	/* get ptr to VPA */
-	ld	r11,PACA_DTL_RIDX(r13)	/* get log read index */
-	addi	r10,r10,LPPACA_DTLIDX
-	LDX_BE	r10,0,r10		/* get log write index */
-	cmpd	r11,r10
-	beq+	33f
-	bl	accumulate_stolen_time
-	REST_GPR(0,r1)
-	REST_4GPRS(3,r1)
-	REST_2GPRS(7,r1)
-	addi	r9,r1,STACK_FRAME_OVERHEAD
-33:
-END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
-#endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE && CONFIG_PPC_SPLPAR */
-
-	/*
-	 * A syscall should always be called with interrupts enabled
-	 * so we just unconditionally hard-enable here. When some kind
-	 * of irq tracing is used, we additionally check that condition
-	 * is correct
-	 */
-#if defined(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG) && defined(CONFIG_BUG)
-	lbz	r10,PACAIRQSOFTMASK(r13)
-1:	tdnei	r10,IRQS_ENABLED
-	EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
-#endif
-
-#ifdef CONFIG_PPC_BOOK3E
-	wrteei	1
-#else
-	li	r11,MSR_RI
-	ori	r11,r11,MSR_EE
-	mtmsrd	r11,1
-#endif /* CONFIG_PPC_BOOK3E */
-
-system_call:			/* label this so stack traces look sane */
-	/* We do need to set SOFTE in the stack frame or the return
-	 * from interrupt will be painful
-	 */
-	li	r10,IRQS_ENABLED
-	std	r10,SOFTE(r1)
-
-	ld	r11, PACA_THREAD_INFO(r13)
-	ld	r10,TI_FLAGS(r11)
-	andi.	r11,r10,_TIF_SYSCALL_DOTRACE
-	bne	.Lsyscall_dotrace		/* does not return */
-	cmpldi	0,r0,NR_syscalls
-	bge-	.Lsyscall_enosys
-
-.Lsyscall:
-/*
- * Need to vector to 32 Bit or default sys_call_table here,
- * based on caller's run-mode / personality.
- */
-	ld	r11,SYS_CALL_TABLE@toc(2)
-	andis.	r10,r10,_TIF_32BIT@h
-	beq	15f
-	ld	r11,COMPAT_SYS_CALL_TABLE@toc(2)
-	clrldi	r3,r3,32
-	clrldi	r4,r4,32
-	clrldi	r5,r5,32
-	clrldi	r6,r6,32
-	clrldi	r7,r7,32
-	clrldi	r8,r8,32
-15:
-	slwi	r0,r0,3
-
-	barrier_nospec_asm
-	/*
-	 * Prevent the load of the handler below (based on the user-passed
-	 * system call number) being speculatively executed until the test
-	 * against NR_syscalls and branch to .Lsyscall_enosys above has
-	 * committed.
-	 */
+	std	r11,-16(r10)		/* "regshere" marker */
 
-	ldx	r12,r11,r0	/* Fetch system call handler [ptr] */
-	mtctr   r12
-	bctrl			/* Call handler */
+	/* Calling convention has r9 = orig r0, r10 = regs */
+	mr	r9,r0
+	bl	system_call_exception
 
-	/* syscall_exit can exit to kernel mode, via ret_from_kernel_thread */
 .Lsyscall_exit:
-	std	r3,RESULT(r1)
+	addi    r4,r1,STACK_FRAME_OVERHEAD
+	bl	syscall_exit_prepare
 
-#ifdef CONFIG_DEBUG_RSEQ
-	/* Check whether the syscall is issued inside a restartable sequence */
-	addi    r3,r1,STACK_FRAME_OVERHEAD
-	bl      rseq_syscall
-	ld	r3,RESULT(r1)
-#endif
-
-	ld	r12, PACA_THREAD_INFO(r13)
-
-	ld	r8,_MSR(r1)
-
-/*
- * This is a few instructions into the actual syscall exit path (which actually
- * starts at .Lsyscall_exit) to cater to kprobe blacklisting and to reduce the
- * number of visible symbols for profiling purposes.
- *
- * We can probe from system_call until this point as MSR_RI is set. But once it
- * is cleared below, we won't be able to take a trap.
- *
- * This is blacklisted from kprobes further below with _ASM_NOKPROBE_SYMBOL().
- */
-system_call_exit:
-	/*
-	 * Disable interrupts so current_thread_info()->flags can't change,
-	 * and so that we don't get interrupted after loading SRR0/1.
-	 *
-	 * Leave MSR_RI enabled for now, because with THREAD_INFO_IN_TASK we
-	 * could fault on the load of the TI_FLAGS below.
-	 */
-#ifdef CONFIG_PPC_BOOK3E
-	wrteei	0
-#else
-	li	r11,MSR_RI
-	mtmsrd	r11,1
-#endif /* CONFIG_PPC_BOOK3E */
-
-	ld	r9,TI_FLAGS(r12)
-	li	r11,-MAX_ERRNO
-	andi.	r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
-	bne-	.Lsyscall_exit_work
+	ld	r2,_CCR(r1)
+	ld	r4,_NIP(r1)
+	ld	r5,_MSR(r1)
+	ld	r6,_LINK(r1)
 
-	andi.	r0,r8,MSR_FP
-	beq 2f
-#ifdef CONFIG_ALTIVEC
-	andis.	r0,r8,MSR_VEC@h
-	bne	3f
-#endif
-2:	addi    r3,r1,STACK_FRAME_OVERHEAD
-	bl	restore_math
-	ld	r8,_MSR(r1)
-	ld	r3,RESULT(r1)
-	li	r11,-MAX_ERRNO
-
-3:	cmpld	r3,r11
-	ld	r5,_CCR(r1)
-	bge-	.Lsyscall_error
-.Lsyscall_error_cont:
-	ld	r7,_NIP(r1)
 BEGIN_FTR_SECTION
 	stdcx.	r0,0,r1			/* to clear the reservation */
 END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
-	andi.	r6,r8,MSR_PR
-	ld	r4,_LINK(r1)
-
-	kuap_check_amr r10, r11
 
-#ifdef CONFIG_PPC_BOOK3S
-	/*
-	 * Clear MSR_RI, MSR_EE is already and remains disabled. We could do
-	 * this later, but testing shows that doing it here causes less slow
-	 * down than doing it closer to the rfid.
-	 */
-	li	r11,0
-	mtmsrd	r11,1
-#endif
+	mtspr	SPRN_SRR0,r4
+	mtspr	SPRN_SRR1,r5
+	mtlr	r6
 
-	beq-	1f
-	ACCOUNT_CPU_USER_EXIT(r13, r11, r12)
+	cmpdi	r3,0
+	bne	.Lsyscall_restore_regs
+.Lsyscall_restore_regs_cont:
 
 BEGIN_FTR_SECTION
 	HMT_MEDIUM_LOW
 END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-	std	r8, PACATMSCRATCH(r13)
-#endif
-
 	/*
 	 * We don't need to restore AMR on the way back to userspace for KUAP.
 	 * The value of AMR only matters while we're in the kernel.
 	 */
-	ld	r13,GPR13(r1)	/* only restore r13 if returning to usermode */
+	mtcr	r2
 	ld	r2,GPR2(r1)
+	ld	r3,GPR3(r1)
+	ld	r13,GPR13(r1)
 	ld	r1,GPR1(r1)
-	mtlr	r4
-	mtcr	r5
-	mtspr	SPRN_SRR0,r7
-	mtspr	SPRN_SRR1,r8
 	RFI_TO_USER
 	b	.	/* prevent speculative execution */
 
-1:	/* exit to kernel */
-	kuap_restore_amr r2
-
-	ld	r2,GPR2(r1)
-	ld	r1,GPR1(r1)
-	mtlr	r4
-	mtcr	r5
-	mtspr	SPRN_SRR0,r7
-	mtspr	SPRN_SRR1,r8
-	RFI_TO_KERNEL
-	b	.	/* prevent speculative execution */
-
-.Lsyscall_error:
-	oris	r5,r5,0x1000	/* Set SO bit in CR */
-	neg	r3,r3
-	std	r5,_CCR(r1)
-	b	.Lsyscall_error_cont
-
-/* Traced system call support */
-.Lsyscall_dotrace:
-	addi	r3,r1,STACK_FRAME_OVERHEAD
-	bl	do_syscall_trace_enter
-
-	/*
-	 * We use the return value of do_syscall_trace_enter() as the syscall
-	 * number. If the syscall was rejected for any reason do_syscall_trace_enter()
-	 * returns an invalid syscall number and the test below against
-	 * NR_syscalls will fail.
-	 */
-	mr	r0,r3
-
-	/* Restore argument registers just clobbered and/or possibly changed. */
-	ld	r3,GPR3(r1)
-	ld	r4,GPR4(r1)
-	ld	r5,GPR5(r1)
-	ld	r6,GPR6(r1)
-	ld	r7,GPR7(r1)
-	ld	r8,GPR8(r1)
-
-	/* Repopulate r9 and r10 for the syscall path */
-	addi	r9,r1,STACK_FRAME_OVERHEAD
-	ld	r10, PACA_THREAD_INFO(r13)
-	ld	r10,TI_FLAGS(r10)
-
-	cmpldi	r0,NR_syscalls
-	blt+	.Lsyscall
-
-	/* Return code is already in r3 thanks to do_syscall_trace_enter() */
-	b	.Lsyscall_exit
-
-
-.Lsyscall_enosys:
-	li	r3,-ENOSYS
-	b	.Lsyscall_exit
-	
-.Lsyscall_exit_work:
-	/* If TIF_RESTOREALL is set, don't scribble on either r3 or ccr.
-	 If TIF_NOERROR is set, just save r3 as it is. */
-
-	andi.	r0,r9,_TIF_RESTOREALL
-	beq+	0f
+.Lsyscall_restore_regs:
+	ld	r3,_CTR(r1)
+	ld	r4,_XER(r1)
 	REST_NVGPRS(r1)
-	b	2f
-0:	cmpld	r3,r11		/* r11 is -MAX_ERRNO */
-	blt+	1f
-	andi.	r0,r9,_TIF_NOERROR
-	bne-	1f
-	ld	r5,_CCR(r1)
-	neg	r3,r3
-	oris	r5,r5,0x1000	/* Set SO bit in CR */
-	std	r5,_CCR(r1)
-1:	std	r3,GPR3(r1)
-2:	andi.	r0,r9,(_TIF_PERSYSCALL_MASK)
-	beq	4f
-
-	/* Clear per-syscall TIF flags if any are set.  */
-
-	li	r11,_TIF_PERSYSCALL_MASK
-	addi	r12,r12,TI_FLAGS
-3:	ldarx	r10,0,r12
-	andc	r10,r10,r11
-	stdcx.	r10,0,r12
-	bne-	3b
-	subi	r12,r12,TI_FLAGS
-
-4:	/* Anything else left to do? */
-BEGIN_FTR_SECTION
-	lis	r3,DEFAULT_PPR@highest	/* Set default PPR */
-	sldi	r3,r3,32	/* bits 11-13 are used for ppr */
-	std	r3,_PPR(r1)
-END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
-
-	andi.	r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP)
-	beq	ret_from_except_lite
-
-	/* Re-enable interrupts */
-#ifdef CONFIG_PPC_BOOK3E
-	wrteei	1
-#else
-	li	r10,MSR_RI
-	ori	r10,r10,MSR_EE
-	mtmsrd	r10,1
-#endif /* CONFIG_PPC_BOOK3E */
-
-	addi	r3,r1,STACK_FRAME_OVERHEAD
-	bl	do_syscall_trace_leave
-	b	ret_from_except
+	mtctr	r3
+	mtspr	SPRN_XER,r4
+	ld	r0,GPR0(r1)
+	REST_8GPRS(4, r1)
+	ld	r12,GPR12(r1)
+	b	.Lsyscall_restore_regs_cont
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-.Ltabort_syscall:
+_GLOBAL(tabort_syscall) /* (unsigned long nip, unsigned long msr) */
 	/* Firstly we need to enable TM in the kernel */
 	mfmsr	r10
 	li	r9, 1
@@ -433,13 +185,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	li	r9, MSR_RI
 	andc	r10, r10, r9
 	mtmsrd	r10, 1
-	mtspr	SPRN_SRR0, r11
-	mtspr	SPRN_SRR1, r12
+	mtspr	SPRN_SRR0, r3
+	mtspr	SPRN_SRR1, r4
 	RFI_TO_USER
 	b	.	/* prevent speculative execution */
 #endif
-_ASM_NOKPROBE_SYMBOL(system_call_common);
-_ASM_NOKPROBE_SYMBOL(system_call_exit);
 
 _GLOBAL(ret_from_fork)
 	bl	schedule_tail
diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h
index 800433685888..d396efca4068 100644
--- a/arch/powerpc/kernel/signal.h
+++ b/arch/powerpc/kernel/signal.h
@@ -10,8 +10,6 @@
 #ifndef _POWERPC_ARCH_SIGNAL_H
 #define _POWERPC_ARCH_SIGNAL_H
 
-extern void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags);
-
 extern void __user *get_sigframe(struct ksignal *ksig, unsigned long sp,
 				  size_t frame_size, int is_32);
 
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
new file mode 100644
index 000000000000..20f77cc19df8
--- /dev/null
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -0,0 +1,213 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <linux/err.h>
+#include <asm/asm-prototypes.h>
+#include <asm/book3s/64/kup-radix.h>
+#include <asm/cputime.h>
+#include <asm/hw_irq.h>
+#include <asm/kprobes.h>
+#include <asm/paca.h>
+#include <asm/ptrace.h>
+#include <asm/reg.h>
+#include <asm/signal.h>
+#include <asm/switch_to.h>
+#include <asm/syscall.h>
+#include <asm/time.h>
+#include <asm/unistd.h>
+
+extern void __noreturn tabort_syscall(unsigned long nip, unsigned long msr);
+
+typedef long (*syscall_fn)(long, long, long, long, long, long);
+
+/* Has to run notrace because it is entered "unreconciled" */
+notrace long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8,
+			   unsigned long r0, struct pt_regs *regs)
+{
+	unsigned long ti_flags;
+	syscall_fn f;
+
+	BUG_ON(!(regs->msr & MSR_PR));
+
+	if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
+	    unlikely(regs->msr & MSR_TS_T))
+		tabort_syscall(regs->nip, regs->msr);
+
+	account_cpu_user_entry();
+
+#ifdef CONFIG_PPC_SPLPAR
+	if (IS_ENABLED(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) &&
+	    firmware_has_feature(FW_FEATURE_SPLPAR)) {
+		struct lppaca *lp = local_paca->lppaca_ptr;
+
+		if (unlikely(local_paca->dtl_ridx != be64_to_cpu(lp->dtl_idx)))
+			accumulate_stolen_time();
+	}
+#endif
+
+	kuap_check_amr();
+
+	/*
+	 * A syscall should always be called with interrupts enabled
+	 * so we just unconditionally hard-enable here. When some kind
+	 * of irq tracing is used, we additionally check that condition
+	 * is correct
+	 */
+	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) {
+		WARN_ON(irq_soft_mask_return() != IRQS_ENABLED);
+		WARN_ON(local_paca->irq_happened);
+	}
+	/*
+	 * This is not required for the syscall exit path, but makes the
+	 * stack frame look nicer. If this was initialised in the first stack
+	 * frame, or if the unwinder was taught the first stack frame always
+	 * returns to user with IRQS_ENABLED, this store could be avoided!
+	 */
+	regs->softe = IRQS_ENABLED;
+
+	__hard_irq_enable();
+
+	ti_flags = current_thread_info()->flags;
+	if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
+		/*
+		 * We use the return value of do_syscall_trace_enter() as the
+		 * syscall number. If the syscall was rejected for any reason
+		 * do_syscall_trace_enter() returns an invalid syscall number
+		 * and the test against NR_syscalls will fail; in that case the
+		 * return value to be used is in regs->gpr[3].
+		 */
+		r0 = do_syscall_trace_enter(regs);
+		if (unlikely(r0 >= NR_syscalls))
+			return regs->gpr[3];
+		r3 = regs->gpr[3];
+		r4 = regs->gpr[4];
+		r5 = regs->gpr[5];
+		r6 = regs->gpr[6];
+		r7 = regs->gpr[7];
+		r8 = regs->gpr[8];
+
+	} else if (unlikely(r0 >= NR_syscalls)) {
+		return -ENOSYS;
+	}
+
+	/* May be faster to do array_index_nospec? */
+	barrier_nospec();
+
+	if (unlikely(ti_flags & _TIF_32BIT)) {
+		f = (void *)compat_sys_call_table[r0];
+
+		r3 &= 0x00000000ffffffffULL;
+		r4 &= 0x00000000ffffffffULL;
+		r5 &= 0x00000000ffffffffULL;
+		r6 &= 0x00000000ffffffffULL;
+		r7 &= 0x00000000ffffffffULL;
+		r8 &= 0x00000000ffffffffULL;
+
+	} else {
+		f = (void *)sys_call_table[r0];
+	}
+
+	return f(r3, r4, r5, r6, r7, r8);
+}
+
+/*
+ * This should be called after a syscall returns, with r3 the return value
+ * from the syscall. If this function returns non-zero, the system call
+ * exit assembly should additionally load all GPR registers and CTR and XER
+ * from the interrupt frame.
+ *
+ * The function graph tracer cannot trace the return side of this function,
+ * because RI=0 and soft mask state is "unreconciled", so it is marked notrace.
+ */
+notrace unsigned long syscall_exit_prepare(unsigned long r3,
+					   struct pt_regs *regs)
+{
+	unsigned long *ti_flagsp = &current_thread_info()->flags;
+	unsigned long ti_flags;
+	unsigned long ret = 0;
+
+	regs->result = r3;
+
+	/* Check whether the syscall is issued inside a restartable sequence */
+	rseq_syscall(regs);
+
+	ti_flags = *ti_flagsp;
+
+	if (unlikely(r3 >= (unsigned long)-MAX_ERRNO)) {
+		if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
+			r3 = -r3;
+			regs->ccr |= 0x10000000; /* Set SO bit in CR */
+		}
+	}
+
+	if (unlikely(ti_flags & _TIF_PERSYSCALL_MASK)) {
+		if (ti_flags & _TIF_RESTOREALL)
+			ret = _TIF_RESTOREALL;
+		else
+			regs->gpr[3] = r3;
+		clear_bits(_TIF_PERSYSCALL_MASK, ti_flagsp);
+	} else {
+		regs->gpr[3] = r3;
+	}
+
+	if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
+		do_syscall_trace_leave(regs);
+		ret |= _TIF_RESTOREALL;
+	}
+
+again:
+	local_irq_disable();
+	ti_flags = READ_ONCE(*ti_flagsp);
+	while (unlikely(ti_flags & _TIF_USER_WORK_MASK)) {
+		local_irq_enable();
+		if (ti_flags & _TIF_NEED_RESCHED) {
+			schedule();
+		} else {
+			/*
+			 * SIGPENDING must restore signal handler function
+			 * argument GPRs, and some non-volatiles (e.g., r1).
+			 * Restore all for now. This could be made lighter.
+			 */
+			if (ti_flags & _TIF_SIGPENDING)
+				ret |= _TIF_RESTOREALL;
+			do_notify_resume(regs, ti_flags);
+		}
+		local_irq_disable();
+		ti_flags = READ_ONCE(*ti_flagsp);
+	}
+
+	if (IS_ENABLED(CONFIG_PPC_BOOK3S) && IS_ENABLED(CONFIG_PPC_FPU)) {
+		unsigned long mathflags = MSR_FP;
+
+		if (IS_ENABLED(CONFIG_ALTIVEC))
+			mathflags |= MSR_VEC;
+
+		if ((regs->msr & mathflags) != mathflags)
+			restore_math(regs);
+	}
+
+	/* This must be done with RI=1 because tracing may touch vmaps */
+	trace_hardirqs_on();
+
+	/* This pattern matches prep_irq_for_idle */
+	__hard_EE_RI_disable();
+	if (unlikely(lazy_irq_pending())) {
+		__hard_RI_enable();
+		trace_hardirqs_off();
+		local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
+		local_irq_enable();
+		/* Took an interrupt which may have more exit work to do. */
+		goto again;
+	}
+	local_paca->irq_happened = 0;
+	irq_soft_mask_set(IRQS_ENABLED);
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+	local_paca->tm_scratch = regs->msr;
+#endif
+
+	kuap_check_amr();
+
+	account_cpu_user_exit();
+
+	return ret;
+}
diff --git a/arch/powerpc/kernel/systbl.S b/arch/powerpc/kernel/systbl.S
index 5b905a2f4e4d..d34276f3c495 100644
--- a/arch/powerpc/kernel/systbl.S
+++ b/arch/powerpc/kernel/systbl.S
@@ -16,25 +16,22 @@
 
 #ifdef CONFIG_PPC64
 	.p2align	3
+#define __SYSCALL(nr, entry)	.8byte entry
+#else
+#define __SYSCALL(nr, entry)	.long entry
 #endif
 
 .globl sys_call_table
 sys_call_table:
 #ifdef CONFIG_PPC64
-#define __SYSCALL(nr, entry)	.8byte DOTSYM(entry)
 #include <asm/syscall_table_64.h>
-#undef __SYSCALL
 #else
-#define __SYSCALL(nr, entry)	.long entry
 #include <asm/syscall_table_32.h>
-#undef __SYSCALL
 #endif
 
 #ifdef CONFIG_COMPAT
 .globl compat_sys_call_table
 compat_sys_call_table:
 #define compat_sys_sigsuspend	sys_sigsuspend
-#define __SYSCALL(nr, entry)	.8byte DOTSYM(entry)
 #include <asm/syscall_table_c32.h>
-#undef __SYSCALL
 #endif
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v3 26/32] powerpc/64: system call zero volatile registers when returning
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (24 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 25/32] powerpc/64: system call implement entry/exit logic in C Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 21:20   ` Segher Boessenkool
  2020-02-25 17:35 ` [PATCH v3 27/32] powerpc/64: implement soft interrupt replay in C Nicholas Piggin
                   ` (7 subsequent siblings)
  33 siblings, 1 reply; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

Volatile GPRs (and CTR and XER) may hold kernel addresses and other
potentially sensitive data after a syscall, so zero them before
returning to userspace.
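
For illustration, a minimal userspace probe along these lines shows the
difference (a sketch only, not part of this patch; the getpid syscall
number and the choice of r8 are arbitrary examples):

  /* sketch: inspect a volatile GPR immediately after a syscall returns */
  #include <stdio.h>

  int main(void)
  {
	long ret, post_r8;

	asm volatile(
		"li	0,20\n\t"	/* r0 = __NR_getpid on powerpc */
		"sc\n\t"
		"mr	%0,3\n\t"	/* r3 holds the return value */
		"mr	%1,8\n\t"	/* r8 is volatile across the syscall */
		: "=r" (ret), "=r" (post_r8)
		:
		: "r0", "r3", "r4", "r5", "r6", "r7", "r8", "r9",
		  "r10", "r11", "r12", "ctr", "xer", "cr0", "memory");

	/* without this patch r8 may hold stale kernel values; with it, 0 */
	printf("pid=%ld r8=%#lx\n", ret, post_r8);
	return 0;
  }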

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/entry_64.S | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 7404290fa132..0e2c56573a41 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -135,6 +135,18 @@ END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
 
 	cmpdi	r3,0
 	bne	.Lsyscall_restore_regs
+	li	r0,0
+	li	r4,0
+	li	r5,0
+	li	r6,0
+	li	r7,0
+	li	r8,0
+	li	r9,0
+	li	r10,0
+	li	r11,0
+	li	r12,0
+	mtctr	r0
+	mtspr	SPRN_XER,r0
 .Lsyscall_restore_regs_cont:
 
 BEGIN_FTR_SECTION
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v3 27/32] powerpc/64: implement soft interrupt replay in C
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (25 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 26/32] powerpc/64: system call zero volatile registers when returning Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 28/32] powerpc/64s: interrupt implement exit logic " Nicholas Piggin
                   ` (6 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

When local_irq_enable() finds a pending soft-masked interrupt, it
"replays" it by setting up registers like the initial interrupt entry,
then calls into the low level handler to set up an interrupt stack
frame and process the interrupt.

This is not necessary, and uses more stack than needed. The high level
interrupt handler can be called directly from C, with just a pt_regs set
up on the stack. This should be faster and use less stack.
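
As a rough sketch of the resulting flow (simplified from the patch below;
the PS3 workaround, HMI/PMI/doorbell cases and irq tracing are omitted):

  /* sketch: local_irq_enable() now replays pending interrupts in C */
  static void replay_soft_interrupts_sketch(void)
  {
	struct pt_regs regs;

	ppc_save_regs(&regs);		/* pt_regs lives on our own stack */
	regs.softe = IRQS_ALL_DISABLED;

	if (local_paca->irq_happened & PACA_IRQ_DEC) {
		local_paca->irq_happened &= ~PACA_IRQ_DEC;
		regs.trap = 0x900;
		timer_interrupt(&regs);	/* call the handler directly */
	}
	if (local_paca->irq_happened & PACA_IRQ_EE) {
		local_paca->irq_happened &= ~PACA_IRQ_EE;
		regs.trap = 0x500;
		do_IRQ(&regs);
	}
	/* ... likewise for doorbell, PMI, HMI; loop if more arrived */
  }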

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/hw_irq.h    |   1 -
 arch/powerpc/kernel/exceptions-64e.S |  32 ------
 arch/powerpc/kernel/exceptions-64s.S |  47 --------
 arch/powerpc/kernel/irq.c            | 165 +++++++++++++++++++++------
 4 files changed, 130 insertions(+), 115 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index 310583e62bd9..0e9a9598f91f 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -52,7 +52,6 @@
 #ifndef __ASSEMBLY__
 
 extern void replay_system_reset(void);
-extern void __replay_interrupt(unsigned int vector);
 
 extern void timer_interrupt(struct pt_regs *);
 extern void timer_broadcast_interrupt(void);
diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index e4076e3c072d..4efac5490216 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -1002,38 +1002,6 @@ masked_interrupt_book3e_0x280:
 masked_interrupt_book3e_0x2c0:
 	masked_interrupt_book3e PACA_IRQ_DBELL 0
 
-/*
- * Called from arch_local_irq_enable when an interrupt needs
- * to be resent. r3 contains either 0x500,0x900,0x260 or 0x280
- * to indicate the kind of interrupt. MSR:EE is already off.
- * We generate a stackframe like if a real interrupt had happened.
- *
- * Note: While MSR:EE is off, we need to make sure that _MSR
- * in the generated frame has EE set to 1 or the exception
- * handler will not properly re-enable them.
- */
-_GLOBAL(__replay_interrupt)
-	/* We are going to jump to the exception common code which
-	 * will retrieve various register values from the PACA which
-	 * we don't give a damn about.
-	 */
-	mflr	r10
-	mfmsr	r11
-	mfcr	r4
-	mtspr	SPRN_SPRG_GEN_SCRATCH,r13;
-	std	r1,PACA_EXGEN+EX_R1(r13);
-	stw	r4,PACA_EXGEN+EX_CR(r13);
-	ori	r11,r11,MSR_EE
-	subi	r1,r1,INT_FRAME_SIZE;
-	cmpwi	cr0,r3,0x500
-	beq	exc_0x500_common
-	cmpwi	cr0,r3,0x900
-	beq	exc_0x900_common
-	cmpwi	cr0,r3,0x280
-	beq	exc_0x280_common
-	blr
-
-
 /*
  * This is called from 0x300 and 0x400 handlers after the prologs with
  * r14 and r15 containing the fault address and error code, with the
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 5ddfc32cacad..bad8cd9e7dba 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -3146,50 +3146,3 @@ doorbell_super_common_msgclr:
 	LOAD_REG_IMMEDIATE(r3, PPC_DBELL_MSGTYPE << (63-36))
 	PPC_MSGCLRP(3)
 	b 	doorbell_super_common
-
-/*
- * Called from arch_local_irq_enable when an interrupt needs
- * to be resent. r3 contains 0x500, 0x900, 0xa00 or 0xe80 to indicate
- * which kind of interrupt. MSR:EE is already off. We generate a
- * stackframe like if a real interrupt had happened.
- *
- * Note: While MSR:EE is off, we need to make sure that _MSR
- * in the generated frame has EE set to 1 or the exception
- * handler will not properly re-enable them.
- *
- * Note that we don't specify LR as the NIP (return address) for
- * the interrupt because that would unbalance the return branch
- * predictor.
- */
-_GLOBAL(__replay_interrupt)
-	/* We are going to jump to the exception common code which
-	 * will retrieve various register values from the PACA which
-	 * we don't give a damn about, so we don't bother storing them.
-	 */
-	mfmsr	r12
-	LOAD_REG_ADDR(r11, replay_interrupt_return)
-	mfcr	r9
-	ori	r12,r12,MSR_EE
-	cmpwi	r3,0x900
-	beq	decrementer_common
-	cmpwi	r3,0x500
-BEGIN_FTR_SECTION
-	beq	h_virt_irq_common
-FTR_SECTION_ELSE
-	beq	hardware_interrupt_common
-ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_300)
-	cmpwi	r3,0xf00
-	beq	performance_monitor_common
-BEGIN_FTR_SECTION
-	cmpwi	r3,0xa00
-	beq	h_doorbell_common_msgclr
-	cmpwi	r3,0xe60
-	beq	hmi_exception_common
-FTR_SECTION_ELSE
-	cmpwi	r3,0xa00
-	beq	doorbell_super_common_msgclr
-ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
-replay_interrupt_return:
-	blr
-
-_ASM_NOKPROBE_SYMBOL(__replay_interrupt)
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 5c9b11878555..afd74eba70aa 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -70,6 +70,7 @@
 #include <asm/paca.h>
 #include <asm/firmware.h>
 #include <asm/lv1call.h>
+#include <asm/dbell.h>
 #endif
 #define CREATE_TRACE_POINTS
 #include <asm/trace.h>
@@ -230,10 +231,121 @@ notrace unsigned int __check_irq_replay(void)
 	return 0;
 }
 
+static void replay_soft_interrupts(void)
+{
+	/*
+	 * We use local_paca rather than get_paca() to avoid all
+	 * the debug_smp_processor_id() business in this low level
+	 * function.
+	 */
+	unsigned char happened = local_paca->irq_happened;
+	struct pt_regs regs;
+
+	ppc_save_regs(&regs);
+	regs.softe = IRQS_ALL_DISABLED;
+
+again:
+	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
+		WARN_ON_ONCE(mfmsr() & MSR_EE);
+
+	if (happened & PACA_IRQ_HARD_DIS) {
+		/*
+		 * We may have missed a decrementer interrupt if hard disabled.
+		 * Check the decrementer register in case we had a rollover
+		 * while hard disabled.
+		 */
+		if (!(happened & PACA_IRQ_DEC)) {
+			if (decrementer_check_overflow())
+				happened |= PACA_IRQ_DEC;
+		}
+	}
+
+	/*
+	 * Force the delivery of pending soft-disabled interrupts on PS3.
+	 * Any HV call will have this side effect.
+	 */
+	if (firmware_has_feature(FW_FEATURE_PS3_LV1)) {
+		u64 tmp, tmp2;
+		lv1_get_version_info(&tmp, &tmp2);
+	}
+
+	/*
+	 * Check if a hypervisor Maintenance interrupt happened.
+	 * This is a higher priority interrupt than the others, so
+	 * replay it first.
+	 */
+	if (IS_ENABLED(CONFIG_PPC_BOOK3S) && (happened & PACA_IRQ_HMI)) {
+		local_paca->irq_happened &= ~PACA_IRQ_HMI;
+		regs.trap = 0xe60;
+		handle_hmi_exception(&regs);
+		if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
+			hard_irq_disable();
+	}
+
+	if (happened & PACA_IRQ_DEC) {
+		local_paca->irq_happened &= ~PACA_IRQ_DEC;
+		regs.trap = 0x900;
+		timer_interrupt(&regs);
+		if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
+			hard_irq_disable();
+	}
+
+	if (happened & PACA_IRQ_EE) {
+		local_paca->irq_happened &= ~PACA_IRQ_EE;
+		regs.trap = 0x500;
+		do_IRQ(&regs);
+		if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
+			hard_irq_disable();
+	}
+
+	/*
+	 * Check if an EPR external interrupt happened. This bit is typically
+	 * set if we need to handle another "edge" interrupt from within the
+	 * MPIC "EPR" handler.
+	 */
+	if (IS_ENABLED(CONFIG_PPC_BOOK3E) && (happened & PACA_IRQ_EE_EDGE)) {
+		local_paca->irq_happened &= ~PACA_IRQ_EE_EDGE;
+		regs.trap = 0x500;
+		do_IRQ(&regs);
+		if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
+			hard_irq_disable();
+	}
+
+	if (IS_ENABLED(CONFIG_PPC_DOORBELL) && (happened & PACA_IRQ_DBELL)) {
+		local_paca->irq_happened &= ~PACA_IRQ_DBELL;
+		if (IS_ENABLED(CONFIG_PPC_BOOK3E))
+			regs.trap = 0x280;
+		else
+			regs.trap = 0xa00;
+		doorbell_exception(&regs);
+		if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
+			hard_irq_disable();
+	}
+
+	/* Book3E does not support soft-masking PMI interrupts */
+	if (IS_ENABLED(CONFIG_PPC_BOOK3S) && (happened & PACA_IRQ_PMI)) {
+		local_paca->irq_happened &= ~PACA_IRQ_PMI;
+		regs.trap = 0xf00;
+		performance_monitor_exception(&regs);
+		if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
+			hard_irq_disable();
+	}
+
+	happened = local_paca->irq_happened;
+	if (happened & ~PACA_IRQ_HARD_DIS) {
+		/*
+		 * We are responding to the next interrupt, so interrupt-off
+		 * latencies should be reset here.
+		 */
+		trace_hardirqs_on();
+		trace_hardirqs_off();
+		goto again;
+	}
+}
+
 notrace void arch_local_irq_restore(unsigned long mask)
 {
 	unsigned char irq_happened;
-	unsigned int replay;
 
 	/* Write the new soft-enabled value */
 	irq_soft_mask_set(mask);
@@ -255,24 +367,16 @@ notrace void arch_local_irq_restore(unsigned long mask)
 	 */
 	irq_happened = get_irq_happened();
 	if (!irq_happened) {
-#ifdef CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
-		WARN_ON_ONCE(!(mfmsr() & MSR_EE));
-#endif
+		if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
+			WARN_ON_ONCE(!(mfmsr() & MSR_EE));
 		return;
 	}
 
-	/*
-	 * We need to hard disable to get a trusted value from
-	 * __check_irq_replay(). We also need to soft-disable
-	 * again to avoid warnings in there due to the use of
-	 * per-cpu variables.
-	 */
+	/* We need to hard disable to replay. */
 	if (!(irq_happened & PACA_IRQ_HARD_DIS)) {
-#ifdef CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
-		WARN_ON_ONCE(!(mfmsr() & MSR_EE));
-#endif
+		if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
+			WARN_ON_ONCE(!(mfmsr() & MSR_EE));
 		__hard_irq_disable();
-#ifdef CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
 	} else {
 		/*
 		 * We should already be hard disabled here. We had bugs
@@ -280,35 +384,26 @@ notrace void arch_local_irq_restore(unsigned long mask)
 		 * warn if we are wrong. Only do that when IRQ tracing
 		 * is enabled as mfmsr() can be costly.
 		 */
-		if (WARN_ON_ONCE(mfmsr() & MSR_EE))
-			__hard_irq_disable();
-#endif
+		if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) {
+			if (WARN_ON_ONCE(mfmsr() & MSR_EE))
+				__hard_irq_disable();
+		}
+
+		if (irq_happened == PACA_IRQ_HARD_DIS) {
+			local_paca->irq_happened = 0;
+			__hard_irq_enable();
+			return;
+		}
 	}
 
 	irq_soft_mask_set(IRQS_ALL_DISABLED);
 	trace_hardirqs_off();
 
-	/*
-	 * Check if anything needs to be re-emitted. We haven't
-	 * soft-enabled yet to avoid warnings in decrementer_check_overflow
-	 * accessing per-cpu variables
-	 */
-	replay = __check_irq_replay();
+	replay_soft_interrupts();
+	local_paca->irq_happened = 0;
 
-	/* We can soft-enable now */
 	trace_hardirqs_on();
 	irq_soft_mask_set(IRQS_ENABLED);
-
-	/*
-	 * And replay if we have to. This will return with interrupts
-	 * hard-enabled.
-	 */
-	if (replay) {
-		__replay_interrupt(replay);
-		return;
-	}
-
-	/* Finally, let's ensure we are hard enabled */
 	__hard_irq_enable();
 }
 EXPORT_SYMBOL(arch_local_irq_restore);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v3 28/32] powerpc/64s: interrupt implement exit logic in C
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (26 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 27/32] powerpc/64: implement soft interrupt replay in C Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2021-01-27  8:54   ` Christophe Leroy
                     ` (3 more replies)
  2020-02-25 17:35 ` [PATCH v3 29/32] powerpc/64s/exception: remove lite interrupt return Nicholas Piggin
                   ` (5 subsequent siblings)
  33 siblings, 4 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

Implement the bulk of interrupt return logic in C. The asm return code
must handle a few cases: restoring full GPRs, and emulating stack store.

The stack store emulation is significantly simplified: rather than creating
a new return frame and switching to that before performing the store, it
uses the PACA to keep a scratch register around to perform the store.

The asm return code is moved into 64e for now. The new logic has made
allowance for 64e, but I don't have a full environment that works well
to test it, and even booting in emulated qemu is not great for stress
testing. 64e shouldn't be too far off working with this, given a bit
more testing and auditing of the logic.

This is slightly faster on a POWER9 (page fault speed increases about
1.1%), probably due to reduced mtmsrd.
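
In C terms, the operation being emulated is just the store half of the
interrupted "stdu r1,-N(r1)" (a sketch; emulate_loadstore has already
written the decremented value into regs->gpr[1], and in practice the
store must be done in asm after the GPRs are reloaded, which is why
PACA_EXGEN is used as scratch):

  /* sketch only: semantics of the emulated stack store */
  static void emulate_stack_store_sketch(struct pt_regs *regs)
  {
	/*
	 * The interrupt frame sits immediately below the interrupted
	 * context's frame, so the pre-stdu r1 can be recomputed.
	 */
	unsigned long frame = (unsigned long)regs - STACK_FRAME_OVERHEAD;
	unsigned long old_r1 = frame + INT_FRAME_SIZE;

	/* perform the store to the new (already updated) r1 */
	*(unsigned long *)regs->gpr[1] = old_r1;
  }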

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
---
v2,rebase (from Michal):
- Move the FP restore functions to restore_math. They are not used
  anywhere else and when restore_math is not built gcc warns about them
  being unused (ms)
- Add asm/context_tracking.h include to exceptions-64e.S for SCHEDULE_USER
  definition

v3:
- Fix return from interrupt replay problem by replaying interrupts rather
  than enabling irqs. This ends up being cleaner and __check_irq_replay
  goes away completely for 64s. Should bring 64e up to speed and kill a lot
  of cruft after it's proven on 64s.
- Don't use _GLOBAL if it's not called from C
- Simplify stack store emulation code further, add a bit more commenting.
- Some missing no probe annotations
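
For orientation, the user-return half of the new exit logic has roughly
this shape (a sketch modelled on syscall_exit_prepare from the previous
patch; the real interrupt_exit_user_prepare below differs in detail):

  /* sketch: C exit path for interrupts returning to userspace */
  notrace unsigned long interrupt_exit_user_prepare_sketch(struct pt_regs *regs)
  {
	unsigned long ret = 0;

  again:
	local_irq_disable();
	/* handle _TIF_NEED_RESCHED, signals, restore_math() etc. here */

	trace_hardirqs_on();
	__hard_EE_RI_disable();
	if (unlikely(lazy_irq_pending())) {
		__hard_RI_enable();
		trace_hardirqs_off();
		local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
		local_irq_enable();	/* replays the interrupt in C */
		goto again;
	}
	local_paca->irq_happened = 0;
	irq_soft_mask_set(IRQS_ENABLED);

	account_cpu_user_exit();
	return ret;	/* non-zero: asm restores full GPRs too */
  }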

 .../powerpc/include/asm/book3s/64/kup-radix.h |  10 +
 arch/powerpc/include/asm/hw_irq.h             |   1 +
 arch/powerpc/include/asm/switch_to.h          |   6 +
 arch/powerpc/kernel/entry_64.S                | 486 +++++-------------
 arch/powerpc/kernel/exceptions-64e.S          | 255 ++++++++-
 arch/powerpc/kernel/exceptions-64s.S          | 119 ++---
 arch/powerpc/kernel/irq.c                     |  36 +-
 arch/powerpc/kernel/process.c                 |  89 ++--
 arch/powerpc/kernel/syscall_64.c              | 164 +++++-
 arch/powerpc/kernel/vector.S                  |   2 +-
 10 files changed, 642 insertions(+), 526 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index 71081d90f999..3bcef989a35d 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -60,6 +60,12 @@
 #include <asm/mmu.h>
 #include <asm/ptrace.h>
 
+static inline void kuap_restore_amr(struct pt_regs *regs)
+{
+	if (mmu_has_feature(MMU_FTR_RADIX_KUAP))
+		mtspr(SPRN_AMR, regs->kuap);
+}
+
 static inline void kuap_check_amr(void)
 {
 	if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && mmu_has_feature(MMU_FTR_RADIX_KUAP))
@@ -136,6 +142,10 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
 		    "Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read");
 }
 #else /* CONFIG_PPC_KUAP */
+static inline void kuap_restore_amr(struct pt_regs *regs)
+{
+}
+
 static inline void kuap_check_amr(void)
 {
 }
diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index 0e9a9598f91f..e0e71777961f 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -52,6 +52,7 @@
 #ifndef __ASSEMBLY__
 
 extern void replay_system_reset(void);
+extern void replay_soft_interrupts(void);
 
 extern void timer_interrupt(struct pt_regs *);
 extern void timer_broadcast_interrupt(void);
diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index 476008bc3d08..b867b58b1093 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -23,7 +23,13 @@ extern void switch_booke_debug_regs(struct debug_reg *new_debug);
 
 extern int emulate_altivec(struct pt_regs *);
 
+#ifdef CONFIG_PPC_BOOK3S_64
 void restore_math(struct pt_regs *regs);
+#else
+static inline void restore_math(struct pt_regs *regs)
+{
+}
+#endif
 
 void restore_tm_state(struct pt_regs *regs);
 
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 0e2c56573a41..e13eac968dfc 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -16,6 +16,7 @@
 
 #include <linux/errno.h>
 #include <linux/err.h>
+#include <asm/cache.h>
 #include <asm/unistd.h>
 #include <asm/processor.h>
 #include <asm/page.h>
@@ -221,6 +222,7 @@ _GLOBAL(ret_from_kernel_thread)
 	li	r3,0
 	b	.Lsyscall_exit
 
+#ifdef CONFIG_PPC_BOOK3E
 /* Save non-volatile GPRs, if not already saved. */
 _GLOBAL(save_nvgprs)
 	ld	r11,_TRAP(r1)
@@ -231,6 +233,7 @@ _GLOBAL(save_nvgprs)
 	std	r0,_TRAP(r1)
 	blr
 _ASM_NOKPROBE_SYMBOL(save_nvgprs);
+#endif
 
 #ifdef CONFIG_PPC_BOOK3S_64
 
@@ -294,7 +297,7 @@ flush_count_cache:
  * state of one is saved on its kernel stack.  Then the state
  * of the other is restored from its kernel stack.  The memory
  * management hardware is updated to the second process's state.
- * Finally, we can return to the second process, via ret_from_except.
+ * Finally, we can return to the second process, via interrupt_return.
  * On entry, r3 points to the THREAD for the current task, r4
  * points to the THREAD for the new task.
  *
@@ -446,408 +449,151 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 	addi	r1,r1,SWITCH_FRAME_SIZE
 	blr
 
-	.align	7
-_GLOBAL(ret_from_except)
-	ld	r11,_TRAP(r1)
-	andi.	r0,r11,1
-	bne	ret_from_except_lite
-	REST_NVGPRS(r1)
-
-_GLOBAL(ret_from_except_lite)
+#ifdef CONFIG_PPC_BOOK3S
 	/*
-	 * Disable interrupts so that current_thread_info()->flags
-	 * can't change between when we test it and when we return
-	 * from the interrupt.
-	 */
-#ifdef CONFIG_PPC_BOOK3E
-	wrteei	0
-#else
-	li	r10,MSR_RI
-	mtmsrd	r10,1		  /* Update machine state */
-#endif /* CONFIG_PPC_BOOK3E */
+	 * If MSR EE/RI was never enabled, IRQs not reconciled, NVGPRs not
+	 * touched, AMR not set, no exit work created, then this can be used.
+	 */
+	.balign IFETCH_ALIGN_BYTES
+	.globl fast_interrupt_return
+fast_interrupt_return:
+_ASM_NOKPROBE_SYMBOL(fast_interrupt_return)
+	ld	r4,_MSR(r1)
+	andi.	r0,r4,MSR_PR
+	bne	.Lfast_user_interrupt_return
+	andi.	r0,r4,MSR_RI
+	bne+	.Lfast_kernel_interrupt_return
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	unrecoverable_exception
+	b	. /* should not get here */
 
-	ld	r9, PACA_THREAD_INFO(r13)
-	ld	r3,_MSR(r1)
-#ifdef CONFIG_PPC_BOOK3E
-	ld	r10,PACACURRENT(r13)
-#endif /* CONFIG_PPC_BOOK3E */
-	ld	r4,TI_FLAGS(r9)
-	andi.	r3,r3,MSR_PR
-	beq	resume_kernel
-#ifdef CONFIG_PPC_BOOK3E
-	lwz	r3,(THREAD+THREAD_DBCR0)(r10)
-#endif /* CONFIG_PPC_BOOK3E */
+	.balign IFETCH_ALIGN_BYTES
+	.globl interrupt_return
+interrupt_return:
+_ASM_NOKPROBE_SYMBOL(interrupt_return)
+	REST_NVGPRS(r1)
 
-	/* Check current_thread_info()->flags */
-	andi.	r0,r4,_TIF_USER_WORK_MASK
-	bne	1f
-#ifdef CONFIG_PPC_BOOK3E
-	/*
-	 * Check to see if the dbcr0 register is set up to debug.
-	 * Use the internal debug mode bit to do this.
-	 */
-	andis.	r0,r3,DBCR0_IDM@h
-	beq	restore
-	mfmsr	r0
-	rlwinm	r0,r0,0,~MSR_DE	/* Clear MSR.DE */
-	mtmsr	r0
-	mtspr	SPRN_DBCR0,r3
-	li	r10, -1
-	mtspr	SPRN_DBSR,r10
-	b	restore
-#else
-	addi	r3,r1,STACK_FRAME_OVERHEAD
-	bl	restore_math
-	b	restore
-#endif
-1:	andi.	r0,r4,_TIF_NEED_RESCHED
-	beq	2f
-	bl	restore_interrupts
-	SCHEDULE_USER
-	b	ret_from_except_lite
-2:
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-	andi.	r0,r4,_TIF_USER_WORK_MASK & ~_TIF_RESTORE_TM
-	bne	3f		/* only restore TM if nothing else to do */
+	.balign IFETCH_ALIGN_BYTES
+	.globl interrupt_return_lite
+interrupt_return_lite:
+_ASM_NOKPROBE_SYMBOL(interrupt_return_lite)
+	ld	r4,_MSR(r1)
+	andi.	r0,r4,MSR_PR
+	beq	.Lkernel_interrupt_return
 	addi	r3,r1,STACK_FRAME_OVERHEAD
-	bl	restore_tm_state
-	b	restore
-3:
-#endif
-	bl	save_nvgprs
-	/*
-	 * Use a non volatile GPR to save and restore our thread_info flags
-	 * across the call to restore_interrupts.
-	 */
-	mr	r30,r4
-	bl	restore_interrupts
-	mr	r4,r30
-	addi	r3,r1,STACK_FRAME_OVERHEAD
-	bl	do_notify_resume
-	b	ret_from_except
-
-resume_kernel:
-	/* check current_thread_info, _TIF_EMULATE_STACK_STORE */
-	andis.	r8,r4,_TIF_EMULATE_STACK_STORE@h
-	beq+	1f
+	bl	interrupt_exit_user_prepare
+	cmpdi	r3,0
+	bne-	.Lrestore_nvgprs
 
-	addi	r8,r1,INT_FRAME_SIZE	/* Get the kprobed function entry */
+.Lfast_user_interrupt_return:
+	ld	r11,_NIP(r1)
+	ld	r12,_MSR(r1)
+BEGIN_FTR_SECTION
+	ld	r10,_PPR(r1)
+	mtspr	SPRN_PPR,r10
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
+	mtspr	SPRN_SRR0,r11
+	mtspr	SPRN_SRR1,r12
 
-	ld	r3,GPR1(r1)
-	subi	r3,r3,INT_FRAME_SIZE	/* dst: Allocate a trampoline exception frame */
-	mr	r4,r1			/* src:  current exception frame */
-	mr	r1,r3			/* Reroute the trampoline frame to r1 */
+BEGIN_FTR_SECTION
+	stdcx.	r0,0,r1		/* to clear the reservation */
+FTR_SECTION_ELSE
+	ldarx	r0,0,r1
+ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
 
-	/* Copy from the original to the trampoline. */
-	li	r5,INT_FRAME_SIZE/8	/* size: INT_FRAME_SIZE */
-	li	r6,0			/* start offset: 0 */
-	mtctr	r5
-2:	ldx	r0,r6,r4
-	stdx	r0,r6,r3
-	addi	r6,r6,8
-	bdnz	2b
-
-	/* Do real store operation to complete stdu */
-	ld	r5,GPR1(r1)
-	std	r8,0(r5)
-
-	/* Clear _TIF_EMULATE_STACK_STORE flag */
-	lis	r11,_TIF_EMULATE_STACK_STORE@h
-	addi	r5,r9,TI_FLAGS
-0:	ldarx	r4,0,r5
-	andc	r4,r4,r11
-	stdcx.	r4,0,r5
-	bne-	0b
-1:
-
-#ifdef CONFIG_PREEMPTION
-	/* Check if we need to preempt */
-	andi.	r0,r4,_TIF_NEED_RESCHED
-	beq+	restore
-	/* Check that preempt_count() == 0 and interrupts are enabled */
-	lwz	r8,TI_PREEMPT(r9)
-	cmpwi	cr0,r8,0
-	bne	restore
-	ld	r0,SOFTE(r1)
-	andi.	r0,r0,IRQS_DISABLED
-	bne	restore
+	ld	r3,_CCR(r1)
+	ld	r4,_LINK(r1)
+	ld	r5,_CTR(r1)
+	ld	r6,_XER(r1)
+	li	r0,0
 
-	/*
-	 * Here we are preempting the current task. We want to make
-	 * sure we are soft-disabled first and reconcile irq state.
-	 */
-	RECONCILE_IRQ_STATE(r3,r4)
-	bl	preempt_schedule_irq
+	REST_4GPRS(7, r1)
+	REST_2GPRS(11, r1)
+	REST_GPR(13, r1)
 
-	/*
-	 * arch_local_irq_restore() from preempt_schedule_irq above may
-	 * enable hard interrupt but we really should disable interrupts
-	 * when we return from the interrupt, and so that we don't get
-	 * interrupted after loading SRR0/1.
-	 */
-#ifdef CONFIG_PPC_BOOK3E
-	wrteei	0
-#else
-	li	r10,MSR_RI
-	mtmsrd	r10,1		  /* Update machine state */
-#endif /* CONFIG_PPC_BOOK3E */
-#endif /* CONFIG_PREEMPTION */
+	mtcr	r3
+	mtlr	r4
+	mtctr	r5
+	mtspr	SPRN_XER,r6
 
-	.globl	fast_exc_return_irq
-fast_exc_return_irq:
-restore:
-	/*
-	 * This is the main kernel exit path. First we check if we
-	 * are about to re-enable interrupts
-	 */
-	ld	r5,SOFTE(r1)
-	lbz	r6,PACAIRQSOFTMASK(r13)
-	andi.	r5,r5,IRQS_DISABLED
-	bne	.Lrestore_irq_off
+	REST_4GPRS(2, r1)
+	REST_GPR(6, r1)
+	REST_GPR(0, r1)
+	REST_GPR(1, r1)
+	RFI_TO_USER
+	b	.	/* prevent speculative execution */
 
-	/* We are enabling, were we already enabled ? Yes, just return */
-	andi.	r6,r6,IRQS_DISABLED
-	beq	cr0,.Ldo_restore
+.Lrestore_nvgprs:
+	REST_NVGPRS(r1)
+	b	.Lfast_user_interrupt_return
 
-	/*
-	 * We are about to soft-enable interrupts (we are hard disabled
-	 * at this point). We check if there's anything that needs to
-	 * be replayed first.
-	 */
-	lbz	r0,PACAIRQHAPPENED(r13)
-	cmpwi	cr0,r0,0
-	bne-	.Lrestore_check_irq_replay
+	.balign IFETCH_ALIGN_BYTES
+.Lkernel_interrupt_return:
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	interrupt_exit_kernel_prepare
+	cmpdi	cr1,r3,0
 
-	/*
-	 * Get here when nothing happened while soft-disabled, just
-	 * soft-enable and move-on. We will hard-enable as a side
-	 * effect of rfi
-	 */
-.Lrestore_no_replay:
-	TRACE_ENABLE_INTS
-	li	r0,IRQS_ENABLED
-	stb	r0,PACAIRQSOFTMASK(r13);
+.Lfast_kernel_interrupt_return:
+	ld	r11,_NIP(r1)
+	ld	r12,_MSR(r1)
+	mtspr	SPRN_SRR0,r11
+	mtspr	SPRN_SRR1,r12
 
-	/*
-	 * Final return path. BookE is handled in a different file
-	 */
-.Ldo_restore:
-#ifdef CONFIG_PPC_BOOK3E
-	b	exception_return_book3e
-#else
-	/*
-	 * Clear the reservation. If we know the CPU tracks the address of
-	 * the reservation then we can potentially save some cycles and use
-	 * a larx. On POWER6 and POWER7 this is significantly faster.
-	 */
 BEGIN_FTR_SECTION
 	stdcx.	r0,0,r1		/* to clear the reservation */
 FTR_SECTION_ELSE
-	ldarx	r4,0,r1
+	ldarx	r0,0,r1
 ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
 
-	/*
-	 * Some code path such as load_up_fpu or altivec return directly
-	 * here. They run entirely hard disabled and do not alter the
-	 * interrupt state. They also don't use lwarx/stwcx. and thus
-	 * are known not to leave dangling reservations.
-	 */
-	.globl	fast_exception_return
-fast_exception_return:
-	ld	r3,_MSR(r1)
+	ld	r3,_LINK(r1)
 	ld	r4,_CTR(r1)
-	ld	r0,_LINK(r1)
-	mtctr	r4
-	mtlr	r0
-	ld	r4,_XER(r1)
-	mtspr	SPRN_XER,r4
-
-	kuap_check_amr r5, r6
-
-	REST_8GPRS(5, r1)
-
-	andi.	r0,r3,MSR_RI
-	beq-	.Lunrecov_restore
-
-	/*
-	 * Clear RI before restoring r13.  If we are returning to
-	 * userspace and we take an exception after restoring r13,
-	 * we end up corrupting the userspace r13 value.
-	 */
-	li	r4,0
-	mtmsrd	r4,1
-
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-	/* TM debug */
-	std	r3, PACATMSCRATCH(r13) /* Stash returned-to MSR */
-#endif
-	/*
-	 * r13 is our per cpu area, only restore it if we are returning to
-	 * userspace the value stored in the stack frame may belong to
-	 * another CPU.
-	 */
-	andi.	r0,r3,MSR_PR
-	beq	1f
-BEGIN_FTR_SECTION
-	/* Restore PPR */
-	ld	r2,_PPR(r1)
-	mtspr	SPRN_PPR,r2
-END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
-	ACCOUNT_CPU_USER_EXIT(r13, r2, r4)
-	REST_GPR(13, r1)
-
-	/*
-	 * We don't need to restore AMR on the way back to userspace for KUAP.
-	 * The value of AMR only matters while we're in the kernel.
-	 */
-	mtspr	SPRN_SRR1,r3
-
-	ld	r2,_CCR(r1)
-	mtcrf	0xFF,r2
-	ld	r2,_NIP(r1)
-	mtspr	SPRN_SRR0,r2
-
-	ld	r0,GPR0(r1)
-	ld	r2,GPR2(r1)
-	ld	r3,GPR3(r1)
-	ld	r4,GPR4(r1)
-	ld	r1,GPR1(r1)
-	RFI_TO_USER
-	b	.	/* prevent speculative execution */
+	ld	r5,_XER(r1)
+	ld	r6,_CCR(r1)
+	li	r0,0
 
-1:	mtspr	SPRN_SRR1,r3
+	REST_4GPRS(7, r1)
+	REST_2GPRS(11, r1)
 
-	ld	r2,_CCR(r1)
-	mtcrf	0xFF,r2
-	ld	r2,_NIP(r1)
-	mtspr	SPRN_SRR0,r2
+	mtlr	r3
+	mtctr	r4
+	mtspr	SPRN_XER,r5
 
 	/*
 	 * Leaving a stale exception_marker on the stack can confuse
 	 * the reliable stack unwinder later on. Clear it.
 	 */
-	li	r2,0
-	std	r2,STACK_FRAME_OVERHEAD-16(r1)
+	std	r0,STACK_FRAME_OVERHEAD-16(r1)
 
-	ld	r0,GPR0(r1)
-	ld	r2,GPR2(r1)
-	ld	r3,GPR3(r1)
+	REST_4GPRS(2, r1)
 
-	kuap_restore_amr r4
-
-	ld	r4,GPR4(r1)
-	ld	r1,GPR1(r1)
+	bne-	cr1,1f /* emulate stack store */
+	mtcr	r6
+	REST_GPR(6, r1)
+	REST_GPR(0, r1)
+	REST_GPR(1, r1)
 	RFI_TO_KERNEL
 	b	.	/* prevent speculative execution */
 
-#endif /* CONFIG_PPC_BOOK3E */
-
-	/*
-	 * We are returning to a context with interrupts soft disabled.
-	 *
-	 * However, we may also about to hard enable, so we need to
-	 * make sure that in this case, we also clear PACA_IRQ_HARD_DIS
-	 * or that bit can get out of sync and bad things will happen
-	 */
-.Lrestore_irq_off:
-	ld	r3,_MSR(r1)
-	lbz	r7,PACAIRQHAPPENED(r13)
-	andi.	r0,r3,MSR_EE
-	beq	1f
-	rlwinm	r7,r7,0,~PACA_IRQ_HARD_DIS
-	stb	r7,PACAIRQHAPPENED(r13)
-1:
-#if defined(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG) && defined(CONFIG_BUG)
-	/* The interrupt should not have soft enabled. */
-	lbz	r7,PACAIRQSOFTMASK(r13)
-1:	tdeqi	r7,IRQS_ENABLED
-	EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
-#endif
-	b	.Ldo_restore
-
-	/*
-	 * Something did happen, check if a re-emit is needed
-	 * (this also clears paca->irq_happened)
-	 */
-.Lrestore_check_irq_replay:
-	/* XXX: We could implement a fast path here where we check
-	 * for irq_happened being just 0x01, in which case we can
-	 * clear it and return. That means that we would potentially
-	 * miss a decrementer having wrapped all the way around.
-	 *
-	 * Still, this might be useful for things like hash_page
-	 */
-	bl	__check_irq_replay
-	cmpwi	cr0,r3,0
-	beq	.Lrestore_no_replay
- 
-	/*
-	 * We need to re-emit an interrupt. We do so by re-using our
-	 * existing exception frame. We first change the trap value,
-	 * but we need to ensure we preserve the low nibble of it
-	 */
-	ld	r4,_TRAP(r1)
-	clrldi	r4,r4,60
-	or	r4,r4,r3
-	std	r4,_TRAP(r1)
-
-	/*
-	 * PACA_IRQ_HARD_DIS won't always be set here, so set it now
-	 * to reconcile the IRQ state. Tracing is already accounted for.
-	 */
-	lbz	r4,PACAIRQHAPPENED(r13)
-	ori	r4,r4,PACA_IRQ_HARD_DIS
-	stb	r4,PACAIRQHAPPENED(r13)
-
-	/*
-	 * Then find the right handler and call it. Interrupts are
-	 * still soft-disabled and we keep them that way.
-	*/
-	cmpwi	cr0,r3,0x500
-	bne	1f
-	addi	r3,r1,STACK_FRAME_OVERHEAD;
- 	bl	do_IRQ
-	b	ret_from_except
-1:	cmpwi	cr0,r3,0xf00
-	bne	1f
-	addi	r3,r1,STACK_FRAME_OVERHEAD;
-	bl	performance_monitor_exception
-	b	ret_from_except
-1:	cmpwi	cr0,r3,0xe60
-	bne	1f
-	addi	r3,r1,STACK_FRAME_OVERHEAD;
-	bl	handle_hmi_exception
-	b	ret_from_except
-1:	cmpwi	cr0,r3,0x900
-	bne	1f
-	addi	r3,r1,STACK_FRAME_OVERHEAD;
-	bl	timer_interrupt
-	b	ret_from_except
-#ifdef CONFIG_PPC_DOORBELL
-1:
-#ifdef CONFIG_PPC_BOOK3E
-	cmpwi	cr0,r3,0x280
-#else
-	cmpwi	cr0,r3,0xa00
-#endif /* CONFIG_PPC_BOOK3E */
-	bne	1f
-	addi	r3,r1,STACK_FRAME_OVERHEAD;
-	bl	doorbell_exception
-#endif /* CONFIG_PPC_DOORBELL */
-1:	b	ret_from_except /* What else to do here ? */
- 
-.Lunrecov_restore:
-	addi	r3,r1,STACK_FRAME_OVERHEAD
-	bl	unrecoverable_exception
-	b	.Lunrecov_restore
-
-_ASM_NOKPROBE_SYMBOL(ret_from_except);
-_ASM_NOKPROBE_SYMBOL(ret_from_except_lite);
-_ASM_NOKPROBE_SYMBOL(resume_kernel);
-_ASM_NOKPROBE_SYMBOL(fast_exc_return_irq);
-_ASM_NOKPROBE_SYMBOL(restore);
-_ASM_NOKPROBE_SYMBOL(fast_exception_return);
+1:	/*
+	 * Emulate stack store with update. New r1 value was already calculated
+	 * and updated in our interrupt regs by emulate_loadstore, but we can't
+	 * store the previous value of r1 to the stack before re-loading our
+	 * registers from it, otherwise they could be clobbered.  Use
+	 * PACA_EXGEN as temporary storage to hold the store data, as
+	 * interrupts are disabled here so it won't be clobbered.
+	 */
+	mtcr	r6
+	std	r9,PACA_EXGEN+0(r13)
+	addi	r9,r1,INT_FRAME_SIZE /* get original r1 */
+	REST_GPR(6, r1)
+	REST_GPR(0, r1)
+	REST_GPR(1, r1)
+	std	r9,0(r1) /* perform store component of stdu */
+	ld	r9,PACA_EXGEN+0(r13)
 
+	RFI_TO_KERNEL
+	b	.	/* prevent speculative execution */
+#endif /* CONFIG_PPC_BOOK3S */
 
 #ifdef CONFIG_PPC_RTAS
 /*
diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index 4efac5490216..d9ed79415100 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -24,6 +24,7 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_booke_hv_asm.h>
 #include <asm/feature-fixups.h>
+#include <asm/context_tracking.h>
 
 /* XXX This will ultimately add space for a special exception save
  *     structure used to save things like SRR0/SRR1, SPRGs, MAS, etc...
@@ -1041,17 +1042,161 @@ alignment_more:
 	bl	alignment_exception
 	b	ret_from_except
 
-/*
- * We branch here from entry_64.S for the last stage of the exception
- * return code path. MSR:EE is expected to be off at that point
- */
-_GLOBAL(exception_return_book3e)
-	b	1f
+	.align	7
+_GLOBAL(ret_from_except)
+	ld	r11,_TRAP(r1)
+	andi.	r0,r11,1
+	bne	ret_from_except_lite
+	REST_NVGPRS(r1)
+
+_GLOBAL(ret_from_except_lite)
+	/*
+	 * Disable interrupts so that current_thread_info()->flags
+	 * can't change between when we test it and when we return
+	 * from the interrupt.
+	 */
+	wrteei	0
+
+	ld	r9, PACA_THREAD_INFO(r13)
+	ld	r3,_MSR(r1)
+	ld	r10,PACACURRENT(r13)
+	ld	r4,TI_FLAGS(r9)
+	andi.	r3,r3,MSR_PR
+	beq	resume_kernel
+	lwz	r3,(THREAD+THREAD_DBCR0)(r10)
+
+	/* Check current_thread_info()->flags */
+	andi.	r0,r4,_TIF_USER_WORK_MASK
+	bne	1f
+	/*
+	 * Check to see if the dbcr0 register is set up to debug.
+	 * Use the internal debug mode bit to do this.
+	 */
+	andis.	r0,r3,DBCR0_IDM@h
+	beq	restore
+	mfmsr	r0
+	rlwinm	r0,r0,0,~MSR_DE	/* Clear MSR.DE */
+	mtmsr	r0
+	mtspr	SPRN_DBCR0,r3
+	li	r10, -1
+	mtspr	SPRN_DBSR,r10
+	b	restore
+1:	andi.	r0,r4,_TIF_NEED_RESCHED
+	beq	2f
+	bl	restore_interrupts
+	SCHEDULE_USER
+	b	ret_from_except_lite
+2:
+	bl	save_nvgprs
+	/*
+	 * Use a non volatile GPR to save and restore our thread_info flags
+	 * across the call to restore_interrupts.
+	 */
+	mr	r30,r4
+	bl	restore_interrupts
+	mr	r4,r30
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	do_notify_resume
+	b	ret_from_except
+
+resume_kernel:
+	/* check current_thread_info, _TIF_EMULATE_STACK_STORE */
+	andis.	r8,r4,_TIF_EMULATE_STACK_STORE@h
+	beq+	1f
+
+	addi	r8,r1,INT_FRAME_SIZE	/* Get the kprobed function entry */
+
+	ld	r3,GPR1(r1)
+	subi	r3,r3,INT_FRAME_SIZE	/* dst: Allocate a trampoline exception frame */
+	mr	r4,r1			/* src:  current exception frame */
+	mr	r1,r3			/* Reroute the trampoline frame to r1 */
+
+	/* Copy from the original to the trampoline. */
+	li	r5,INT_FRAME_SIZE/8	/* size: INT_FRAME_SIZE */
+	li	r6,0			/* start offset: 0 */
+	mtctr	r5
+2:	ldx	r0,r6,r4
+	stdx	r0,r6,r3
+	addi	r6,r6,8
+	bdnz	2b
+
+	/* Do real store operation to complete stdu */
+	ld	r5,GPR1(r1)
+	std	r8,0(r5)
+
+	/* Clear _TIF_EMULATE_STACK_STORE flag */
+	lis	r11,_TIF_EMULATE_STACK_STORE@h
+	addi	r5,r9,TI_FLAGS
+0:	ldarx	r4,0,r5
+	andc	r4,r4,r11
+	stdcx.	r4,0,r5
+	bne-	0b
+1:
+
+#ifdef CONFIG_PREEMPTION
+	/* Check if we need to preempt */
+	andi.	r0,r4,_TIF_NEED_RESCHED
+	beq+	restore
+	/* Check that preempt_count() == 0 and interrupts are enabled */
+	lwz	r8,TI_PREEMPT(r9)
+	cmpwi	cr0,r8,0
+	bne	restore
+	ld	r0,SOFTE(r1)
+	andi.	r0,r0,IRQS_DISABLED
+	bne	restore
+
+	/*
+	 * Here we are preempting the current task. We want to make
+	 * sure we are soft-disabled first and reconcile irq state.
+	 */
+	RECONCILE_IRQ_STATE(r3,r4)
+	bl	preempt_schedule_irq
+
+	/*
+	 * arch_local_irq_restore() from preempt_schedule_irq above may
+	 * enable hard interrupt but we really should disable interrupts
+	 * when we return from the interrupt, and so that we don't get
+	 * interrupted after loading SRR0/1.
+	 */
+	wrteei	0
+#endif /* CONFIG_PREEMPTION */
+
+restore:
+	/*
+	 * This is the main kernel exit path. First we check if we
+	 * are about to re-enable interrupts
+	 */
+	ld	r5,SOFTE(r1)
+	lbz	r6,PACAIRQSOFTMASK(r13)
+	andi.	r5,r5,IRQS_DISABLED
+	bne	.Lrestore_irq_off
+
+	/* We are enabling, were we already enabled ? Yes, just return */
+	andi.	r6,r6,IRQS_DISABLED
+	beq	cr0,fast_exception_return
+
+	/*
+	 * We are about to soft-enable interrupts (we are hard disabled
+	 * at this point). We check if there's anything that needs to
+	 * be replayed first.
+	 */
+	lbz	r0,PACAIRQHAPPENED(r13)
+	cmpwi	cr0,r0,0
+	bne-	.Lrestore_check_irq_replay
+
+	/*
+	 * Get here when nothing happened while soft-disabled, just
+	 * soft-enable and move-on. We will hard-enable as a side
+	 * effect of rfi
+	 */
+.Lrestore_no_replay:
+	TRACE_ENABLE_INTS
+	li	r0,IRQS_ENABLED
+	stb	r0,PACAIRQSOFTMASK(r13);
 
 /* This is the return from load_up_fpu fast path which could do with
  * less GPR restores in fact, but for now we have a single return path
  */
-	.globl fast_exception_return
 fast_exception_return:
 	wrteei	0
 1:	mr	r0,r13
@@ -1092,6 +1237,102 @@ fast_exception_return:
 	mfspr	r13,SPRN_SPRG_GEN_SCRATCH
 	rfi
 
+	/*
+	 * We are returning to a context with interrupts soft disabled.
+	 *
+	 * However, we may also be about to hard enable, so we need to
+	 * make sure that in this case, we also clear PACA_IRQ_HARD_DIS
+	 * or that bit can get out of sync and bad things will happen
+	 */
+.Lrestore_irq_off:
+	ld	r3,_MSR(r1)
+	lbz	r7,PACAIRQHAPPENED(r13)
+	andi.	r0,r3,MSR_EE
+	beq	1f
+	rlwinm	r7,r7,0,~PACA_IRQ_HARD_DIS
+	stb	r7,PACAIRQHAPPENED(r13)
+1:
+#if defined(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG) && defined(CONFIG_BUG)
+	/* The interrupt should not have soft enabled. */
+	lbz	r7,PACAIRQSOFTMASK(r13)
+1:	tdeqi	r7,IRQS_ENABLED
+	EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
+#endif
+	b	fast_exception_return
+
+	/*
+	 * Something did happen, check if a re-emit is needed
+	 * (this also clears paca->irq_happened)
+	 */
+.Lrestore_check_irq_replay:
+	/* XXX: We could implement a fast path here where we check
+	 * for irq_happened being just 0x01, in which case we can
+	 * clear it and return. That means that we would potentially
+	 * miss a decrementer having wrapped all the way around.
+	 *
+	 * Still, this might be useful for things like hash_page
+	 */
+	bl	__check_irq_replay
+	cmpwi	cr0,r3,0
+	beq	.Lrestore_no_replay
+
+	/*
+	 * We need to re-emit an interrupt. We do so by re-using our
+	 * existing exception frame. We first change the trap value,
+	 * but we need to ensure we preserve the low nibble of it
+	 */
+	ld	r4,_TRAP(r1)
+	clrldi	r4,r4,60
+	or	r4,r4,r3
+	std	r4,_TRAP(r1)
+
+	/*
+	 * PACA_IRQ_HARD_DIS won't always be set here, so set it now
+	 * to reconcile the IRQ state. Tracing is already accounted for.
+	 */
+	lbz	r4,PACAIRQHAPPENED(r13)
+	ori	r4,r4,PACA_IRQ_HARD_DIS
+	stb	r4,PACAIRQHAPPENED(r13)
+
+	/*
+	 * Then find the right handler and call it. Interrupts are
+	 * still soft-disabled and we keep them that way.
+	*/
+	cmpwi	cr0,r3,0x500
+	bne	1f
+	addi	r3,r1,STACK_FRAME_OVERHEAD;
+	bl	do_IRQ
+	b	ret_from_except
+1:	cmpwi	cr0,r3,0xf00
+	bne	1f
+	addi	r3,r1,STACK_FRAME_OVERHEAD;
+	bl	performance_monitor_exception
+	b	ret_from_except
+1:	cmpwi	cr0,r3,0xe60
+	bne	1f
+	addi	r3,r1,STACK_FRAME_OVERHEAD;
+	bl	handle_hmi_exception
+	b	ret_from_except
+1:	cmpwi	cr0,r3,0x900
+	bne	1f
+	addi	r3,r1,STACK_FRAME_OVERHEAD;
+	bl	timer_interrupt
+	b	ret_from_except
+#ifdef CONFIG_PPC_DOORBELL
+1:
+	cmpwi	cr0,r3,0x280
+	bne	1f
+	addi	r3,r1,STACK_FRAME_OVERHEAD;
+	bl	doorbell_exception
+#endif /* CONFIG_PPC_DOORBELL */
+1:	b	ret_from_except /* What else to do here ? */
+
+_ASM_NOKPROBE_SYMBOL(ret_from_except);
+_ASM_NOKPROBE_SYMBOL(ret_from_except_lite);
+_ASM_NOKPROBE_SYMBOL(resume_kernel);
+_ASM_NOKPROBE_SYMBOL(restore);
+_ASM_NOKPROBE_SYMBOL(fast_exception_return);
+
 /*
  * Trampolines used when spotting a bad kernel stack pointer in
  * the exception entry code.
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index bad8cd9e7dba..d635fd4e40ea 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -575,6 +575,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	std	r10,GPR12(r1)
 	std	r11,GPR13(r1)
 
+	SAVE_NVGPRS(r1)
+
 	.if IDAR
 	.if IISIDE
 	ld	r10,_NIP(r1)
@@ -611,7 +613,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 	mfspr	r11,SPRN_XER		/* save XER in stackframe	*/
 	std	r10,SOFTE(r1)
 	std	r11,_XER(r1)
-	li	r9,(IVEC)+1
+	li	r9,IVEC
 	std	r9,_TRAP(r1)		/* set trap number		*/
 	li	r10,0
 	ld	r11,exception_marker@toc(r2)
@@ -918,7 +920,6 @@ EXC_COMMON_BEGIN(system_reset_common)
 	ld	r1,PACA_NMI_EMERG_SP(r13)
 	subi	r1,r1,INT_FRAME_SIZE
 	__GEN_COMMON_BODY system_reset
-	bl	save_nvgprs
 	/*
 	 * Set IRQS_ALL_DISABLED unconditionally so irqs_disabled() does
 	 * the right thing. We do not want to reconcile because that goes
@@ -1099,7 +1100,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
 	li	r10,MSR_RI
 	mtmsrd	r10,1
 
-	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	machine_check_early
 	std	r3,RESULT(r1)	/* Save result */
@@ -1192,10 +1192,9 @@ EXC_COMMON_BEGIN(machine_check_common)
 	/* Enable MSR_RI when finished with PACA_EXMC */
 	li	r10,MSR_RI
 	mtmsrd 	r10,1
-	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	machine_check_exception
-	b	ret_from_except
+	b	interrupt_return
 
 	GEN_KVM machine_check
 
@@ -1362,20 +1361,19 @@ BEGIN_MMU_FTR_SECTION
 	bl	do_slb_fault
 	cmpdi	r3,0
 	bne-	1f
-	b	fast_exception_return
+	b	fast_interrupt_return
 1:	/* Error case */
 MMU_FTR_SECTION_ELSE
 	/* Radix case, access is outside page table range */
 	li	r3,-EFAULT
 ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 	std	r3,RESULT(r1)
-	bl	save_nvgprs
 	RECONCILE_IRQ_STATE(r10, r11)
 	ld	r4,_DAR(r1)
 	ld	r5,RESULT(r1)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	do_bad_slb_fault
-	b	ret_from_except
+	b	interrupt_return
 
 	GEN_KVM data_access_slb
 
@@ -1455,20 +1453,19 @@ BEGIN_MMU_FTR_SECTION
 	bl	do_slb_fault
 	cmpdi	r3,0
 	bne-	1f
-	b	fast_exception_return
+	b	fast_interrupt_return
 1:	/* Error case */
 MMU_FTR_SECTION_ELSE
 	/* Radix case, access is outside page table range */
 	li	r3,-EFAULT
 ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 	std	r3,RESULT(r1)
-	bl	save_nvgprs
 	RECONCILE_IRQ_STATE(r10, r11)
 	ld	r4,_DAR(r1)
 	ld	r5,RESULT(r1)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	do_bad_slb_fault
-	b	ret_from_except
+	b	interrupt_return
 
 	GEN_KVM instruction_access_slb
 
@@ -1516,7 +1513,7 @@ EXC_COMMON_BEGIN(hardware_interrupt_common)
 	RUNLATCH_ON
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	do_IRQ
-	b	ret_from_except_lite
+	b	interrupt_return_lite
 
 	GEN_KVM hardware_interrupt
 
@@ -1542,10 +1539,9 @@ EXC_VIRT_BEGIN(alignment, 0x4600, 0x100)
 EXC_VIRT_END(alignment, 0x4600, 0x100)
 EXC_COMMON_BEGIN(alignment_common)
 	GEN_COMMON alignment
-	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	alignment_exception
-	b	ret_from_except
+	b	interrupt_return
 
 	GEN_KVM alignment
 
@@ -1606,10 +1602,9 @@ EXC_COMMON_BEGIN(program_check_common)
 	__ISTACK(program_check)=1
 	__GEN_COMMON_BODY program_check
 3:
-	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	program_check_exception
-	b	ret_from_except
+	b	interrupt_return
 
 	GEN_KVM program_check
 
@@ -1640,7 +1635,6 @@ EXC_VIRT_END(fp_unavailable, 0x4800, 0x100)
 EXC_COMMON_BEGIN(fp_unavailable_common)
 	GEN_COMMON fp_unavailable
 	bne	1f			/* if from user, just load it up */
-	bl	save_nvgprs
 	RECONCILE_IRQ_STATE(r10, r11)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	kernel_fp_unavailable_exception
@@ -1657,14 +1651,13 @@ BEGIN_FTR_SECTION
 END_FTR_SECTION_IFSET(CPU_FTR_TM)
 #endif
 	bl	load_up_fpu
-	b	fast_exception_return
+	b	fast_interrupt_return
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 2:	/* User process was in a transaction */
-	bl	save_nvgprs
 	RECONCILE_IRQ_STATE(r10, r11)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	fp_unavailable_tm
-	b	ret_from_except
+	b	interrupt_return
 #endif
 
 	GEN_KVM fp_unavailable
@@ -1707,7 +1700,7 @@ EXC_COMMON_BEGIN(decrementer_common)
 	RUNLATCH_ON
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	timer_interrupt
-	b	ret_from_except_lite
+	b	interrupt_return_lite
 
 	GEN_KVM decrementer
 
@@ -1798,7 +1791,7 @@ EXC_COMMON_BEGIN(doorbell_super_common)
 #else
 	bl	unknown_exception
 #endif
-	b	ret_from_except_lite
+	b	interrupt_return_lite
 
 	GEN_KVM doorbell_super
 
@@ -1970,10 +1963,9 @@ EXC_VIRT_BEGIN(single_step, 0x4d00, 0x100)
 EXC_VIRT_END(single_step, 0x4d00, 0x100)
 EXC_COMMON_BEGIN(single_step_common)
 	GEN_COMMON single_step
-	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	single_step_exception
-	b	ret_from_except
+	b	interrupt_return
 
 	GEN_KVM single_step
 
@@ -2008,7 +2000,6 @@ EXC_VIRT_BEGIN(h_data_storage, 0x4e00, 0x20)
 EXC_VIRT_END(h_data_storage, 0x4e00, 0x20)
 EXC_COMMON_BEGIN(h_data_storage_common)
 	GEN_COMMON h_data_storage
-	bl      save_nvgprs
 	addi    r3,r1,STACK_FRAME_OVERHEAD
 BEGIN_MMU_FTR_SECTION
 	ld	r4,_DAR(r1)
@@ -2017,7 +2008,7 @@ BEGIN_MMU_FTR_SECTION
 MMU_FTR_SECTION_ELSE
 	bl      unknown_exception
 ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_TYPE_RADIX)
-	b       ret_from_except
+	b       interrupt_return
 
 	GEN_KVM h_data_storage
 
@@ -2042,10 +2033,9 @@ EXC_VIRT_BEGIN(h_instr_storage, 0x4e20, 0x20)
 EXC_VIRT_END(h_instr_storage, 0x4e20, 0x20)
 EXC_COMMON_BEGIN(h_instr_storage_common)
 	GEN_COMMON h_instr_storage
-	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	unknown_exception
-	b	ret_from_except
+	b	interrupt_return
 
 	GEN_KVM h_instr_storage
 
@@ -2068,10 +2058,9 @@ EXC_VIRT_BEGIN(emulation_assist, 0x4e40, 0x20)
 EXC_VIRT_END(emulation_assist, 0x4e40, 0x20)
 EXC_COMMON_BEGIN(emulation_assist_common)
 	GEN_COMMON emulation_assist
-	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	emulation_assist_interrupt
-	b	ret_from_except
+	b	interrupt_return
 
 	GEN_KVM emulation_assist
 
@@ -2151,10 +2140,9 @@ EXC_COMMON_BEGIN(hmi_exception_common)
 	GEN_COMMON hmi_exception
 	FINISH_NAP
 	RUNLATCH_ON
-	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	handle_hmi_exception
-	b	ret_from_except
+	b	interrupt_return
 
 	GEN_KVM hmi_exception
 
@@ -2188,7 +2176,7 @@ EXC_COMMON_BEGIN(h_doorbell_common)
 #else
 	bl	unknown_exception
 #endif
-	b	ret_from_except_lite
+	b	interrupt_return_lite
 
 	GEN_KVM h_doorbell
 
@@ -2218,7 +2206,7 @@ EXC_COMMON_BEGIN(h_virt_irq_common)
 	RUNLATCH_ON
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	do_IRQ
-	b	ret_from_except_lite
+	b	interrupt_return_lite
 
 	GEN_KVM h_virt_irq
 
@@ -2265,7 +2253,7 @@ EXC_COMMON_BEGIN(performance_monitor_common)
 	RUNLATCH_ON
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	performance_monitor_exception
-	b	ret_from_except_lite
+	b	interrupt_return_lite
 
 	GEN_KVM performance_monitor
 
@@ -2305,23 +2293,21 @@ BEGIN_FTR_SECTION
   END_FTR_SECTION_NESTED(CPU_FTR_TM, CPU_FTR_TM, 69)
 #endif
 	bl	load_up_altivec
-	b	fast_exception_return
+	b	fast_interrupt_return
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 2:	/* User process was in a transaction */
-	bl	save_nvgprs
 	RECONCILE_IRQ_STATE(r10, r11)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	altivec_unavailable_tm
-	b	ret_from_except
+	b	interrupt_return
 #endif
 1:
 END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 #endif
-	bl	save_nvgprs
 	RECONCILE_IRQ_STATE(r10, r11)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	altivec_unavailable_exception
-	b	ret_from_except
+	b	interrupt_return
 
 	GEN_KVM altivec_unavailable
 
@@ -2363,20 +2349,18 @@ BEGIN_FTR_SECTION
 	b	load_up_vsx
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 2:	/* User process was in a transaction */
-	bl	save_nvgprs
 	RECONCILE_IRQ_STATE(r10, r11)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	vsx_unavailable_tm
-	b	ret_from_except
+	b	interrupt_return
 #endif
 1:
 END_FTR_SECTION_IFSET(CPU_FTR_VSX)
 #endif
-	bl	save_nvgprs
 	RECONCILE_IRQ_STATE(r10, r11)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	vsx_unavailable_exception
-	b	ret_from_except
+	b	interrupt_return
 
 	GEN_KVM vsx_unavailable
 
@@ -2403,10 +2387,9 @@ EXC_VIRT_BEGIN(facility_unavailable, 0x4f60, 0x20)
 EXC_VIRT_END(facility_unavailable, 0x4f60, 0x20)
 EXC_COMMON_BEGIN(facility_unavailable_common)
 	GEN_COMMON facility_unavailable
-	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	facility_unavailable_exception
-	b	ret_from_except
+	b	interrupt_return
 
 	GEN_KVM facility_unavailable
 
@@ -2433,10 +2416,9 @@ EXC_VIRT_BEGIN(h_facility_unavailable, 0x4f80, 0x20)
 EXC_VIRT_END(h_facility_unavailable, 0x4f80, 0x20)
 EXC_COMMON_BEGIN(h_facility_unavailable_common)
 	GEN_COMMON h_facility_unavailable
-	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	facility_unavailable_exception
-	b	ret_from_except
+	b	interrupt_return
 
 	GEN_KVM h_facility_unavailable
 
@@ -2467,10 +2449,9 @@ EXC_REAL_END(cbe_system_error, 0x1200, 0x100)
 EXC_VIRT_NONE(0x5200, 0x100)
 EXC_COMMON_BEGIN(cbe_system_error_common)
 	GEN_COMMON cbe_system_error
-	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	cbe_system_error_exception
-	b	ret_from_except
+	b	interrupt_return
 
 	GEN_KVM cbe_system_error
 
@@ -2496,10 +2477,9 @@ EXC_VIRT_BEGIN(instruction_breakpoint, 0x5300, 0x100)
 EXC_VIRT_END(instruction_breakpoint, 0x5300, 0x100)
 EXC_COMMON_BEGIN(instruction_breakpoint_common)
 	GEN_COMMON instruction_breakpoint
-	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	instruction_breakpoint_exception
-	b	ret_from_except
+	b	interrupt_return
 
 	GEN_KVM instruction_breakpoint
 
@@ -2619,10 +2599,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 
 EXC_COMMON_BEGIN(denorm_exception_common)
 	GEN_COMMON denorm_exception
-	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	unknown_exception
-	b	ret_from_except
+	b	interrupt_return
 
 	GEN_KVM denorm_exception
 
@@ -2641,10 +2620,9 @@ EXC_REAL_END(cbe_maintenance, 0x1600, 0x100)
 EXC_VIRT_NONE(0x5600, 0x100)
 EXC_COMMON_BEGIN(cbe_maintenance_common)
 	GEN_COMMON cbe_maintenance
-	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	cbe_maintenance_exception
-	b	ret_from_except
+	b	interrupt_return
 
 	GEN_KVM cbe_maintenance
 
@@ -2669,14 +2647,13 @@ EXC_VIRT_BEGIN(altivec_assist, 0x5700, 0x100)
 EXC_VIRT_END(altivec_assist, 0x5700, 0x100)
 EXC_COMMON_BEGIN(altivec_assist_common)
 	GEN_COMMON altivec_assist
-	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 #ifdef CONFIG_ALTIVEC
 	bl	altivec_assist_exception
 #else
 	bl	unknown_exception
 #endif
-	b	ret_from_except
+	b	interrupt_return
 
 	GEN_KVM altivec_assist
 
@@ -2695,10 +2672,9 @@ EXC_REAL_END(cbe_thermal, 0x1800, 0x100)
 EXC_VIRT_NONE(0x5800, 0x100)
 EXC_COMMON_BEGIN(cbe_thermal_common)
 	GEN_COMMON cbe_thermal
-	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	cbe_thermal_exception
-	b	ret_from_except
+	b	interrupt_return
 
 	GEN_KVM cbe_thermal
 
@@ -2731,7 +2707,6 @@ EXC_COMMON_BEGIN(soft_nmi_common)
 	ld	r1,PACAEMERGSP(r13)
 	subi	r1,r1,INT_FRAME_SIZE
 	__GEN_COMMON_BODY soft_nmi
-	bl	save_nvgprs
 
 	/*
 	 * Set IRQS_ALL_DISABLED and save PACAIRQHAPPENED (see
@@ -3063,7 +3038,7 @@ do_hash_page:
         cmpdi	r3,0			/* see if __hash_page succeeded */
 
 	/* Success */
-	beq	fast_exc_return_irq	/* Return from exception on success */
+	beq	interrupt_return_lite	/* Return from exception on success */
 
 	/* Error */
 	blt-	13f
@@ -3080,17 +3055,15 @@ handle_page_fault:
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	do_page_fault
 	cmpdi	r3,0
-	beq+	ret_from_except_lite
-	bl	save_nvgprs
+	beq+	interrupt_return_lite
 	mr	r5,r3
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	ld	r4,_DAR(r1)
 	bl	bad_page_fault
-	b	ret_from_except
+	b	interrupt_return
 
 /* We have a data breakpoint exception - handle it */
 handle_dabr_fault:
-	bl	save_nvgprs
 	ld      r4,_DAR(r1)
 	ld      r5,_DSISR(r1)
 	addi    r3,r1,STACK_FRAME_OVERHEAD
@@ -3098,21 +3071,20 @@ handle_dabr_fault:
 	/*
 	 * do_break() may have changed the NV GPRS while handling a breakpoint.
 	 * If so, we need to restore them with their updated values. Don't use
-	 * ret_from_except_lite here.
+	 * interrupt_return_lite here.
 	 */
-	b       ret_from_except
+	b       interrupt_return
 
 
 #ifdef CONFIG_PPC_BOOK3S_64
 /* We have a page fault that hash_page could handle but HV refused
  * the PTE insertion
  */
-13:	bl	save_nvgprs
-	mr	r5,r3
+13:	mr	r5,r3
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	ld	r4,_DAR(r1)
 	bl	low_hash_fault
-	b	ret_from_except
+	b	interrupt_return
 #endif
 
 /*
@@ -3122,11 +3094,10 @@ handle_dabr_fault:
  * were soft-disabled.  We want to invoke the exception handler for
  * the access, or panic if there isn't a handler.
  */
-77:	bl	save_nvgprs
-	addi	r3,r1,STACK_FRAME_OVERHEAD
+77:	addi	r3,r1,STACK_FRAME_OVERHEAD
 	li	r5,SIGSEGV
 	bl	bad_page_fault
-	b	ret_from_except
+	b	interrupt_return
 
 /*
  * When doorbell is triggered from system reset wakeup, the message is
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index afd74eba70aa..6ea27dbcb872 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -110,6 +110,8 @@ static inline notrace int decrementer_check_overflow(void)
 	return now >= *next_tb;
 }
 
+#ifdef CONFIG_PPC_BOOK3E
+
 /* This is called whenever we are re-enabling interrupts
  * and returns either 0 (nothing to do) or 500/900/280/a00/e80 if
  * there's an EE, DEC or DBELL to generate.
@@ -169,41 +171,16 @@ notrace unsigned int __check_irq_replay(void)
 		}
 	}
 
-	/*
-	 * Force the delivery of pending soft-disabled interrupts on PS3.
-	 * Any HV call will have this side effect.
-	 */
-	if (firmware_has_feature(FW_FEATURE_PS3_LV1)) {
-		u64 tmp, tmp2;
-		lv1_get_version_info(&tmp, &tmp2);
-	}
-
-	/*
-	 * Check if an hypervisor Maintenance interrupt happened.
-	 * This is a higher priority interrupt than the others, so
-	 * replay it first.
-	 */
-	if (happened & PACA_IRQ_HMI) {
-		local_paca->irq_happened &= ~PACA_IRQ_HMI;
-		return 0xe60;
-	}
-
 	if (happened & PACA_IRQ_DEC) {
 		local_paca->irq_happened &= ~PACA_IRQ_DEC;
 		return 0x900;
 	}
 
-	if (happened & PACA_IRQ_PMI) {
-		local_paca->irq_happened &= ~PACA_IRQ_PMI;
-		return 0xf00;
-	}
-
 	if (happened & PACA_IRQ_EE) {
 		local_paca->irq_happened &= ~PACA_IRQ_EE;
 		return 0x500;
 	}
 
-#ifdef CONFIG_PPC_BOOK3E
 	/*
 	 * Check if an EPR external interrupt happened this bit is typically
 	 * set if we need to handle another "edge" interrupt from within the
@@ -218,20 +195,15 @@ notrace unsigned int __check_irq_replay(void)
 		local_paca->irq_happened &= ~PACA_IRQ_DBELL;
 		return 0x280;
 	}
-#else
-	if (happened & PACA_IRQ_DBELL) {
-		local_paca->irq_happened &= ~PACA_IRQ_DBELL;
-		return 0xa00;
-	}
-#endif /* CONFIG_PPC_BOOK3E */
 
 	/* There should be nothing left ! */
 	BUG_ON(local_paca->irq_happened != 0);
 
 	return 0;
 }
+#endif /* CONFIG_PPC_BOOK3E */
 
-static void replay_soft_interrupts(void)
+void replay_soft_interrupts(void)
 {
 	/*
 	 * We use local_paca rather than get_paca() to avoid all
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index fad50db9dcf2..1dea4d280f6f 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -236,23 +236,9 @@ void enable_kernel_fp(void)
 	}
 }
 EXPORT_SYMBOL(enable_kernel_fp);
-
-static int restore_fp(struct task_struct *tsk)
-{
-	if (tsk->thread.load_fp) {
-		load_fp_state(&current->thread.fp_state);
-		current->thread.load_fp++;
-		return 1;
-	}
-	return 0;
-}
-#else
-static int restore_fp(struct task_struct *tsk) { return 0; }
 #endif /* CONFIG_PPC_FPU */
 
 #ifdef CONFIG_ALTIVEC
-#define loadvec(thr) ((thr).load_vec)
-
 static void __giveup_altivec(struct task_struct *tsk)
 {
 	unsigned long msr;
@@ -318,21 +304,6 @@ void flush_altivec_to_thread(struct task_struct *tsk)
 	}
 }
 EXPORT_SYMBOL_GPL(flush_altivec_to_thread);
-
-static int restore_altivec(struct task_struct *tsk)
-{
-	if (cpu_has_feature(CPU_FTR_ALTIVEC) && (tsk->thread.load_vec)) {
-		load_vr_state(&tsk->thread.vr_state);
-		tsk->thread.used_vr = 1;
-		tsk->thread.load_vec++;
-
-		return 1;
-	}
-	return 0;
-}
-#else
-#define loadvec(thr) 0
-static inline int restore_altivec(struct task_struct *tsk) { return 0; }
 #endif /* CONFIG_ALTIVEC */
 
 #ifdef CONFIG_VSX
@@ -400,18 +371,6 @@ void flush_vsx_to_thread(struct task_struct *tsk)
 	}
 }
 EXPORT_SYMBOL_GPL(flush_vsx_to_thread);
-
-static int restore_vsx(struct task_struct *tsk)
-{
-	if (cpu_has_feature(CPU_FTR_VSX)) {
-		tsk->thread.used_vsr = 1;
-		return 1;
-	}
-
-	return 0;
-}
-#else
-static inline int restore_vsx(struct task_struct *tsk) { return 0; }
 #endif /* CONFIG_VSX */
 
 #ifdef CONFIG_SPE
@@ -511,6 +470,53 @@ void giveup_all(struct task_struct *tsk)
 }
 EXPORT_SYMBOL(giveup_all);
 
+#ifdef CONFIG_PPC_BOOK3S_64
+#ifdef CONFIG_PPC_FPU
+static int restore_fp(struct task_struct *tsk)
+{
+	if (tsk->thread.load_fp) {
+		load_fp_state(&current->thread.fp_state);
+		current->thread.load_fp++;
+		return 1;
+	}
+	return 0;
+}
+#else
+static int restore_fp(struct task_struct *tsk) { return 0; }
+#endif /* CONFIG_PPC_FPU */
+
+#ifdef CONFIG_ALTIVEC
+#define loadvec(thr) ((thr).load_vec)
+static int restore_altivec(struct task_struct *tsk)
+{
+	if (cpu_has_feature(CPU_FTR_ALTIVEC) && (tsk->thread.load_vec)) {
+		load_vr_state(&tsk->thread.vr_state);
+		tsk->thread.used_vr = 1;
+		tsk->thread.load_vec++;
+
+		return 1;
+	}
+	return 0;
+}
+#else
+#define loadvec(thr) 0
+static inline int restore_altivec(struct task_struct *tsk) { return 0; }
+#endif /* CONFIG_ALTIVEC */
+
+#ifdef CONFIG_VSX
+static int restore_vsx(struct task_struct *tsk)
+{
+	if (cpu_has_feature(CPU_FTR_VSX)) {
+		tsk->thread.used_vsr = 1;
+		return 1;
+	}
+
+	return 0;
+}
+#else
+static inline int restore_vsx(struct task_struct *tsk) { return 0; }
+#endif /* CONFIG_VSX */
+
 /*
  * The exception exit path calls restore_math() with interrupts hard disabled
  * but the soft irq state not "reconciled". ftrace code that calls
@@ -551,6 +557,7 @@ void notrace restore_math(struct pt_regs *regs)
 
 	regs->msr = msr;
 }
+#endif
 
 static void save_all(struct task_struct *tsk)
 {
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index 20f77cc19df8..08e0bebbd3b6 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -26,7 +26,11 @@ notrace long system_call_exception(long r3, long r4, long r5, long r6, long r7,
 	unsigned long ti_flags;
 	syscall_fn f;
 
+	if (IS_ENABLED(CONFIG_PPC_BOOK3S))
+		BUG_ON(!(regs->msr & MSR_RI));
 	BUG_ON(!(regs->msr & MSR_PR));
+	BUG_ON(!FULL_REGS(regs));
+	BUG_ON(regs->softe != IRQS_ENABLED);
 
 	if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
 	    unlikely(regs->msr & MSR_TS_T))
@@ -195,7 +199,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
 		trace_hardirqs_off();
 		local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
 		local_irq_enable();
-		/* Took an interrupt which may have more exit work to do. */
+		/* Took an interrupt, may have more exit work to do. */
 		goto again;
 	}
 	local_paca->irq_happened = 0;
@@ -211,3 +215,161 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
 
 	return ret;
 }
+
+#ifdef CONFIG_PPC_BOOK3S /* BOOK3E not yet using this */
+notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr)
+{
+#ifdef CONFIG_PPC_BOOK3E
+	struct thread_struct *ts = &current->thread;
+#endif
+	unsigned long *ti_flagsp = &current_thread_info()->flags;
+	unsigned long ti_flags;
+	unsigned long flags;
+	unsigned long ret = 0;
+
+	if (IS_ENABLED(CONFIG_PPC_BOOK3S))
+		BUG_ON(!(regs->msr & MSR_RI));
+	BUG_ON(!(regs->msr & MSR_PR));
+	BUG_ON(!FULL_REGS(regs));
+	BUG_ON(regs->softe != IRQS_ENABLED);
+
+	local_irq_save(flags);
+
+again:
+	ti_flags = READ_ONCE(*ti_flagsp);
+	while (unlikely(ti_flags & (_TIF_USER_WORK_MASK & ~_TIF_RESTORE_TM))) {
+		local_irq_enable(); /* returning to user: may enable */
+		if (ti_flags & _TIF_NEED_RESCHED) {
+			schedule();
+		} else {
+			if (ti_flags & _TIF_SIGPENDING)
+				ret |= _TIF_RESTOREALL;
+			do_notify_resume(regs, ti_flags);
+		}
+		local_irq_disable();
+		ti_flags = READ_ONCE(*ti_flagsp);
+	}
+
+	if (IS_ENABLED(CONFIG_PPC_BOOK3S)) {
+		unsigned long mathflags = 0;
+
+		if (IS_ENABLED(CONFIG_PPC_FPU))
+			mathflags |= MSR_FP;
+		if (IS_ENABLED(CONFIG_ALTIVEC))
+			mathflags |= MSR_VEC;
+
+		if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
+						(ti_flags & _TIF_RESTORE_TM))
+			restore_tm_state(regs);
+		else if ((regs->msr & mathflags) != mathflags)
+			restore_math(regs);
+	}
+
+	trace_hardirqs_on();
+	__hard_EE_RI_disable();
+	if (unlikely(lazy_irq_pending())) {
+		__hard_RI_enable();
+		trace_hardirqs_off();
+		local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
+		local_irq_enable();
+		local_irq_disable();
+		/* Took an interrupt, may have more exit work to do. */
+		goto again;
+	}
+	local_paca->irq_happened = 0;
+	irq_soft_mask_set(IRQS_ENABLED);
+
+#ifdef CONFIG_PPC_BOOK3E
+	if (unlikely(ts->debug.dbcr0 & DBCR0_IDM)) {
+		/*
+		 * Check to see if the dbcr0 register is set up to debug.
+		 * Use the internal debug mode bit to do this.
+		 */
+		mtmsr(mfmsr() & ~MSR_DE);
+		mtspr(SPRN_DBCR0, ts->debug.dbcr0);
+		mtspr(SPRN_DBSR, -1);
+	}
+#endif
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+	local_paca->tm_scratch = regs->msr;
+#endif
+
+	kuap_check_amr();
+
+	account_cpu_user_exit();
+
+	return ret;
+}
+
+void unrecoverable_exception(struct pt_regs *regs);
+void preempt_schedule_irq(void);
+
+notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr)
+{
+	unsigned long *ti_flagsp = &current_thread_info()->flags;
+	unsigned long flags;
+
+	if (IS_ENABLED(CONFIG_PPC_BOOK3S) && unlikely(!(regs->msr & MSR_RI)))
+		unrecoverable_exception(regs);
+	BUG_ON(regs->msr & MSR_PR);
+	BUG_ON(!FULL_REGS(regs));
+
+	local_irq_save(flags);
+
+	if (regs->softe == IRQS_ENABLED) {
+		/* Returning to a kernel context with local irqs enabled. */
+		WARN_ON_ONCE(!(regs->msr & MSR_EE));
+again:
+		if (IS_ENABLED(CONFIG_PREEMPT)) {
+			/* Return to preemptible kernel context */
+			if (unlikely(*ti_flagsp & _TIF_NEED_RESCHED)) {
+				if (preempt_count() == 0)
+					preempt_schedule_irq();
+			}
+		}
+
+		trace_hardirqs_on();
+		__hard_EE_RI_disable();
+		if (unlikely(lazy_irq_pending())) {
+			__hard_RI_enable();
+			irq_soft_mask_set(IRQS_ALL_DISABLED);
+			trace_hardirqs_off();
+			local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
+			/*
+			 * Can't local_irq_enable in case we are in interrupt
+			 * context. Must replay directly.
+			 */
+			replay_soft_interrupts();
+			irq_soft_mask_set(flags);
+			/* Took an interrupt, may have more exit work to do. */
+			goto again;
+		}
+		local_paca->irq_happened = 0;
+		irq_soft_mask_set(IRQS_ENABLED);
+	} else {
+		/* Returning to a kernel context with local irqs disabled. */
+		trace_hardirqs_on();
+		__hard_EE_RI_disable();
+		if (regs->msr & MSR_EE)
+			local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
+	}
+
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+	local_paca->tm_scratch = regs->msr;
+#endif
+
+	/*
+	 * We don't need to restore AMR on the way back to userspace for KUAP.
+	 * The value of AMR only matters while we're in the kernel.
+	 */
+	kuap_restore_amr(regs);
+
+	if (unlikely(*ti_flagsp & _TIF_EMULATE_STACK_STORE)) {
+		clear_bits(_TIF_EMULATE_STACK_STORE, ti_flagsp);
+		return 1;
+	}
+	return 0;
+}
+#endif
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index 25c14a0981bf..d20c5e79e03c 100644
--- a/arch/powerpc/kernel/vector.S
+++ b/arch/powerpc/kernel/vector.S
@@ -134,7 +134,7 @@ _GLOBAL(load_up_vsx)
 	/* enable use of VSX after return */
 	oris	r12,r12,MSR_VSX@h
 	std	r12,_MSR(r1)
-	b	fast_exception_return
+	b	fast_interrupt_return
 
 #endif /* CONFIG_VSX */
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v3 29/32] powerpc/64s/exception: remove lite interrupt return
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (27 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 28/32] powerpc/64s: interrupt implement exit logic " Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 30/32] powerpc/64: system call reconcile interrupts Nicholas Piggin
                   ` (4 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

The difference between lite and regular returns is that the regular
case restores all NVGPRs, whereas the lite case skips that. This is
quite clumsy, though: most interrupts want the NVGPRs saved for
debugging, not modified in the caller, so the NVGPR restore is
unnecessary most of the time. Restore NVGPRs explicitly for the
handlers that require it, and move everything else over to skipping
the restore unless the interrupt return demands it (e.g., handling a
signal).
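
For illustration, the decision in a minimal C sketch (paraphrasing
interrupt_exit_user_prepare() from the previous patch; the helper name
here is made up for the sketch, not code from this series):

	/* Sketch: decide whether the asm return path must restore NVGPRs. */
	static unsigned long exit_work_nvgpr_flags(unsigned long ti_flags)
	{
		unsigned long ret = 0;

		/* Signal delivery rewrites user registers, so ask the
		 * asm to restore all GPRs on return. */
		if (ti_flags & _TIF_SIGPENDING)
			ret |= _TIF_RESTOREALL;

		return ret;
	}

In the common case the flag stays clear and REST_NVGPRS is skipped;
handlers that emulate instructions (alignment, program check, etc.)
instead do REST_NVGPRS(r1) themselves before interrupt_return, as the
hunks below show.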

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
v3:
- Add a couple of missing restore cases for instruction emulation

 arch/powerpc/kernel/entry_64.S       |  6 ------
 arch/powerpc/kernel/exceptions-64s.S | 24 ++++++++++++++----------
 2 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index e13eac968dfc..6d5464f83c05 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -471,12 +471,6 @@ _ASM_NOKPROBE_SYMBOL(fast_interrupt_return)
 	.globl interrupt_return
 interrupt_return:
 _ASM_NOKPROBE_SYMBOL(interrupt_return)
-	REST_NVGPRS(r1)
-
-	.balign IFETCH_ALIGN_BYTES
-	.globl interrupt_return_lite
-interrupt_return_lite:
-_ASM_NOKPROBE_SYMBOL(interrupt_return_lite)
 	ld	r4,_MSR(r1)
 	andi.	r0,r4,MSR_PR
 	beq	.Lkernel_interrupt_return
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index d635fd4e40ea..b53e452cbca0 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1513,7 +1513,7 @@ EXC_COMMON_BEGIN(hardware_interrupt_common)
 	RUNLATCH_ON
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	do_IRQ
-	b	interrupt_return_lite
+	b	interrupt_return
 
 	GEN_KVM hardware_interrupt
 
@@ -1541,6 +1541,7 @@ EXC_COMMON_BEGIN(alignment_common)
 	GEN_COMMON alignment
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	alignment_exception
+	REST_NVGPRS(r1) /* instruction emulation may change GPRs */
 	b	interrupt_return
 
 	GEN_KVM alignment
@@ -1604,6 +1605,7 @@ EXC_COMMON_BEGIN(program_check_common)
 3:
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	program_check_exception
+	REST_NVGPRS(r1) /* instruction emulation may change GPRs */
 	b	interrupt_return
 
 	GEN_KVM program_check
@@ -1700,7 +1702,7 @@ EXC_COMMON_BEGIN(decrementer_common)
 	RUNLATCH_ON
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	timer_interrupt
-	b	interrupt_return_lite
+	b	interrupt_return
 
 	GEN_KVM decrementer
 
@@ -1791,7 +1793,7 @@ EXC_COMMON_BEGIN(doorbell_super_common)
 #else
 	bl	unknown_exception
 #endif
-	b	interrupt_return_lite
+	b	interrupt_return
 
 	GEN_KVM doorbell_super
 
@@ -2060,6 +2062,7 @@ EXC_COMMON_BEGIN(emulation_assist_common)
 	GEN_COMMON emulation_assist
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	emulation_assist_interrupt
+	REST_NVGPRS(r1) /* instruction emulation may change GPRs */
 	b	interrupt_return
 
 	GEN_KVM emulation_assist
@@ -2176,7 +2179,7 @@ EXC_COMMON_BEGIN(h_doorbell_common)
 #else
 	bl	unknown_exception
 #endif
-	b	interrupt_return_lite
+	b	interrupt_return
 
 	GEN_KVM h_doorbell
 
@@ -2206,7 +2209,7 @@ EXC_COMMON_BEGIN(h_virt_irq_common)
 	RUNLATCH_ON
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	do_IRQ
-	b	interrupt_return_lite
+	b	interrupt_return
 
 	GEN_KVM h_virt_irq
 
@@ -2253,7 +2256,7 @@ EXC_COMMON_BEGIN(performance_monitor_common)
 	RUNLATCH_ON
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	performance_monitor_exception
-	b	interrupt_return_lite
+	b	interrupt_return
 
 	GEN_KVM performance_monitor
 
@@ -2650,6 +2653,7 @@ EXC_COMMON_BEGIN(altivec_assist_common)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 #ifdef CONFIG_ALTIVEC
 	bl	altivec_assist_exception
+	REST_NVGPRS(r1) /* instruction emulation may change GPRs */
 #else
 	bl	unknown_exception
 #endif
@@ -3038,7 +3042,7 @@ do_hash_page:
         cmpdi	r3,0			/* see if __hash_page succeeded */
 
 	/* Success */
-	beq	interrupt_return_lite	/* Return from exception on success */
+	beq	interrupt_return	/* Return from exception on success */
 
 	/* Error */
 	blt-	13f
@@ -3055,7 +3059,7 @@ handle_page_fault:
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	do_page_fault
 	cmpdi	r3,0
-	beq+	interrupt_return_lite
+	beq+	interrupt_return
 	mr	r5,r3
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	ld	r4,_DAR(r1)
@@ -3070,9 +3074,9 @@ handle_dabr_fault:
 	bl      do_break
 	/*
 	 * do_break() may have changed the NV GPRS while handling a breakpoint.
-	 * If so, we need to restore them with their updated values. Don't use
-	 * interrupt_return_lite here.
+	 * If so, we need to restore them with their updated values.
 	 */
+	REST_NVGPRS(r1)
 	b       interrupt_return
 
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v3 30/32] powerpc/64: system call reconcile interrupts
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (28 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 29/32] powerpc/64s/exception: remove lite interrupt return Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 31/32] powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked Nicholas Piggin
                   ` (3 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

This reconciles interrupts in the system call case like all other
interrupts. This allows system_call_common to be shared with the
scv system call implementation in a subsequent patch.
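
For reference, what the asm added below does, expressed as a C sketch
(field names per struct paca_struct; the real code must stay in asm
because calling into C at this point would clobber the syscall
arguments):

	/*
	 * Reconcile the irq state without tracing: soft-disable and
	 * record the hard disable.  trace_hardirqs_off() is deferred
	 * to system_call_exception().
	 */
	local_paca->irq_soft_mask = IRQS_ALL_DISABLED;
	local_paca->irq_happened = PACA_IRQ_HARD_DIS;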

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/entry_64.S   | 11 +++++++++++
 arch/powerpc/kernel/syscall_64.c | 28 +++++++++++++---------------
 2 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 6d5464f83c05..8406812c9734 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -113,6 +113,17 @@ END_BTB_FLUSH_SECTION
 	ld	r11,exception_marker@toc(r2)
 	std	r11,-16(r10)		/* "regshere" marker */
 
+	/*
+	 * RECONCILE_IRQ_STATE without calling trace_hardirqs_off(), which
+	 * would clobber syscall parameters. Also we always enter with IRQs
+	 * enabled and nothing pending. system_call_exception() will call
+	 * trace_hardirqs_off().
+	 */
+	li	r11,IRQS_ALL_DISABLED
+	li	r12,PACA_IRQ_HARD_DIS
+	stb	r11,PACAIRQSOFTMASK(r13)
+	stb	r12,PACAIRQHAPPENED(r13)
+
 	/* Calling convention has r9 = orig r0, r10 = regs */
 	mr	r9,r0
 	bl	system_call_exception
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index 08e0bebbd3b6..32601a572ff0 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -19,13 +19,19 @@ extern void __noreturn tabort_syscall(unsigned long nip, unsigned long msr);
 
 typedef long (*syscall_fn)(long, long, long, long, long, long);
 
-/* Has to run notrace because it is entered "unreconciled" */
-notrace long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8,
-			   unsigned long r0, struct pt_regs *regs)
+/* Has to run notrace because it is entered not completely "reconciled" */
+notrace long system_call_exception(long r3, long r4, long r5,
+				   long r6, long r7, long r8,
+				   unsigned long r0, struct pt_regs *regs)
 {
 	unsigned long ti_flags;
 	syscall_fn f;
 
+	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
+		BUG_ON(irq_soft_mask_return() != IRQS_ALL_DISABLED);
+
+	trace_hardirqs_off(); /* finish reconciling */
+
 	if (IS_ENABLED(CONFIG_PPC_BOOK3S))
 		BUG_ON(!(regs->msr & MSR_RI));
 	BUG_ON(!(regs->msr & MSR_PR));
@@ -33,8 +39,10 @@ notrace long system_call_exception(long r3, long r4, long r5, long r6, long r7,
 	BUG_ON(regs->softe != IRQS_ENABLED);
 
 	if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
-	    unlikely(regs->msr & MSR_TS_T))
+	    unlikely(regs->msr & MSR_TS_T)) {
+		local_irq_enable();
 		tabort_syscall(regs->nip, regs->msr);
+	}
 
 	account_cpu_user_entry();
 
@@ -50,16 +58,6 @@ notrace long system_call_exception(long r3, long r4, long r5, long r6, long r7,
 
 	kuap_check_amr();
 
-	/*
-	 * A syscall should always be called with interrupts enabled
-	 * so we just unconditionally hard-enable here. When some kind
-	 * of irq tracing is used, we additionally check that condition
-	 * is correct
-	 */
-	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) {
-		WARN_ON(irq_soft_mask_return() != IRQS_ENABLED);
-		WARN_ON(local_paca->irq_happened);
-	}
 	/*
 	 * This is not required for the syscall exit path, but makes the
 	 * stack frame look nicer. If this was initialised in the first stack
@@ -68,7 +66,7 @@ notrace long system_call_exception(long r3, long r4, long r5, long r6, long r7,
 	 */
 	regs->softe = IRQS_ENABLED;
 
-	__hard_irq_enable();
+	local_irq_enable();
 
 	ti_flags = current_thread_info()->flags;
 	if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v3 31/32] powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (29 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 30/32] powerpc/64: system call reconcile interrupts Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-02-25 17:35 ` [PATCH v3 32/32] powerpc/64s: system call support for scv/rfscv instructions Nicholas Piggin
                   ` (2 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

The scv instruction causes an interrupt which can enter the kernel with
MSR[EE]=1, thus allowing interrupts to hit at any time. These must not
be taken as normal interrupts, because they come from MSR[PR]=0 context,
and yet the kernel stack is not yet set up and r13 is not set to the
PACA.

Treat this as a soft-masked interrupt regardless of the soft-mask
state. This does not affect behaviour yet, because currently all
interrupts are taken with MSR[EE]=0.
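
The new test, rendered as a C sketch (the patch does this on r11/r12 in
the __GEN_COMMON_BODY asm; the helper name is invented, and this ignores
the real/virt address handling that LOAD_HANDLER takes care of):

	extern char __end_interrupts[];

	static bool nip_implicitly_soft_masked(struct pt_regs *regs)
	{
		/* Interrupts from userspace skip the soft-mask tests. */
		if (regs->msr & MSR_PR)
			return false;
		/* Kernel text below __end_interrupts is treated as masked. */
		return regs->nip < (unsigned long)__end_interrupts;
	}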

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index b53e452cbca0..7a6be3f32973 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -494,8 +494,24 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt)
 
 .macro __GEN_COMMON_BODY name
 	.if IMASK
+		.if ! ISTACK
+		.error "No support for masked interrupt to use custom stack"
+		.endif
+
+		/* If coming from user, skip soft-mask tests. */
+		andi.	r10,r12,MSR_PR
+		bne	2f
+
+		/* Kernel code running below __end_interrupts is implicitly
+		 * soft-masked */
+		LOAD_HANDLER(r10, __end_interrupts)
+		cmpd	r11,r10
+		li	r10,IMASK
+		blt-	1f
+
+		/* Test the soft mask state against our interrupt's bit */
 		lbz	r10,PACAIRQSOFTMASK(r13)
-		andi.	r10,r10,IMASK
+1:		andi.	r10,r10,IMASK
 		/* Associate vector numbers with bits in paca->irq_happened */
 		.if IVEC == 0x500 || IVEC == 0xea0
 		li	r10,PACA_IRQ_EE
@@ -526,7 +542,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt)
 
 	.if ISTACK
 	andi.	r10,r12,MSR_PR		/* See if coming from user	*/
-	mr	r10,r1			/* Save r1			*/
+2:	mr	r10,r1			/* Save r1			*/
 	subi	r1,r1,INT_FRAME_SIZE	/* alloc frame on kernel stack	*/
 	beq-	100f
 	ld	r1,PACAKSAVE(r13)	/* kernel stack to use		*/
@@ -2791,7 +2807,8 @@ masked_interrupt:
 	ld	r10,PACA_EXGEN+EX_R10(r13)
 	ld	r11,PACA_EXGEN+EX_R11(r13)
 	ld	r12,PACA_EXGEN+EX_R12(r13)
-	/* returns to kernel where r13 must be set up, so don't restore it */
+	ld	r13,PACA_EXGEN+EX_R13(r13)
+	/* May return to masked low address where r13 is not set up */
 	.if \hsrr
 	HRFI_TO_KERNEL
 	.else
@@ -2950,6 +2967,10 @@ EXC_COMMON_BEGIN(ppc64_runlatch_on_trampoline)
 
 USE_FIXED_SECTION(virt_trampolines)
 	/*
+	 * All code below __end_interrupts is treated as soft-masked. If
+	 * any code runs here with MSR[EE]=1, it must then cope with pending
+	 * soft interrupt being raised (i.e., by ensuring it is replayed).
+	 *
 	 * The __end_interrupts marker must be past the out-of-line (OOL)
 	 * handlers, so that they are copied to real address 0x100 when running
 	 * a relocatable kernel. This ensures they can be reached from the short
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v3 32/32] powerpc/64s: system call support for scv/rfscv instructions
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
                   ` (30 preceding siblings ...)
  2020-02-25 17:35 ` [PATCH v3 31/32] powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked Nicholas Piggin
@ 2020-02-25 17:35 ` Nicholas Piggin
  2020-03-01 12:20     ` kbuild test robot
  2020-03-19 12:19   ` Michal Suchanek
  2020-03-20 10:20   ` Michal Suchanek
  33 siblings, 1 reply; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-25 17:35 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

Add support for the scv instruction on POWER9 and later CPUs.

For now this implements the zeroth scv vector 'scv 0', as identical
to 'sc' system calls, with the exception that lr is not preserved, and
it is 64-bit only. There may yet be changes made to this ABI, so it's
for testing only.

rfscv is implemented to return from scv type system calls. It cannot
be used to return from sc system calls because those are defined to
preserve lr.

In a comparison of the getpid syscall, the test program had scv taking
about 3 more cycles in user mode (92 vs 89 for sc), due to lr handling.
getpid syscall throughput on POWER9 is improved by 33%, mostly due to
reducing mtmsr and mtspr.
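
As a userspace illustration (not part of the patch; the helper name is
mine, and it assumes a toolchain that knows the scv mnemonic), a getpid
via scv 0 could look like this. Callers should first check
PPC_FEATURE2_SCV in AT_HWCAP2 and fall back to sc otherwise; the
clobber list follows the ABI documented below (r0, r4-r12, lr, ctr,
xer and the volatile cr fields):

	static long scv0_getpid(void)
	{
		register long r0 asm("r0") = 20;	/* __NR_getpid */
		register long r3 asm("r3");

		asm volatile("scv 0"
			     : "=r" (r3), "+r" (r0)
			     :
			     : "r4", "r5", "r6", "r7", "r8", "r9", "r10",
			       "r11", "r12", "lr", "ctr", "xer", "cr0",
			       "cr1", "cr5", "cr6", "cr7", "memory");

		/* (unsigned long)r3 >= -4095UL indicates -errno in r3 */
		return r3;
	}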

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 Documentation/powerpc/syscall64-abi.rst   |  42 +++++---
 arch/powerpc/include/asm/asm-prototypes.h |   2 +-
 arch/powerpc/include/asm/exception-64s.h  |   6 ++
 arch/powerpc/include/asm/head-64.h        |   2 +-
 arch/powerpc/include/asm/ppc_asm.h        |   2 +
 arch/powerpc/include/asm/processor.h      |   2 +-
 arch/powerpc/include/asm/setup.h          |   4 +-
 arch/powerpc/kernel/cpu_setup_power.S     |   2 +-
 arch/powerpc/kernel/cputable.c            |   3 +-
 arch/powerpc/kernel/dt_cpu_ftrs.c         |   1 +
 arch/powerpc/kernel/entry_64.S            | 114 +++++++++++++++++++++
 arch/powerpc/kernel/exceptions-64s.S      | 119 +++++++++++++++++++++-
 arch/powerpc/kernel/setup_64.c            |   5 +-
 arch/powerpc/kernel/syscall_64.c          |  14 ++-
 arch/powerpc/platforms/pseries/setup.c    |   8 +-
 15 files changed, 295 insertions(+), 31 deletions(-)

diff --git a/Documentation/powerpc/syscall64-abi.rst b/Documentation/powerpc/syscall64-abi.rst
index e49f69f941b9..30c045e8726e 100644
--- a/Documentation/powerpc/syscall64-abi.rst
+++ b/Documentation/powerpc/syscall64-abi.rst
@@ -5,6 +5,15 @@ Power Architecture 64-bit Linux system call ABI
 syscall
 =======
 
+Invocation
+----------
+The syscall is made with the sc instruction, and returns with execution
+continuing at the instruction following the sc instruction.
+
+If PPC_FEATURE2_SCV appears in the AT_HWCAP2 ELF auxiliary vector, the
+scv 0 instruction is an alternative that may be used, with some differences
+to calling sequence.
+
 syscall calling sequence\ [1]_ matches the Power Architecture 64-bit ELF ABI
 specification C function calling sequence, including register preservation
 rules, with the following differences.
@@ -12,16 +21,23 @@ rules, with the following differences.
 .. [1] Some syscalls (typically low-level management functions) may have
        different calling sequences (e.g., rt_sigreturn).
 
-Parameters and return value
----------------------------
+Parameters
+----------
 The system call number is specified in r0.
 
 There is a maximum of 6 integer parameters to a syscall, passed in r3-r8.
 
-Both a return value and a return error code are returned. cr0.SO is the return
-error code, and r3 is the return value or error code. When cr0.SO is clear,
-the syscall succeeded and r3 is the return value. When cr0.SO is set, the
-syscall failed and r3 is the error code that generally corresponds to errno.
+Return value
+------------
+- For the sc instruction, both a return value and a return error code are
+  returned. cr0.SO is the return error code, and r3 is the return value or
+  error code. When cr0.SO is clear, the syscall succeeded and r3 is the return
+  value. When cr0.SO is set, the syscall failed and r3 is the error code that
+  generally corresponds to errno.
+
+- For the scv 0 instruction, the return value indicates failure if it
+  is >= -MAX_ERRNO (-4095) as an unsigned comparison, in which case it is the
+  negated return error code. Otherwise it is the successful return value.
 
 Stack
 -----
@@ -34,22 +50,23 @@ Register preservation rules match the ELF ABI calling sequence with the
 following differences:
 
 =========== ============= ========================================
+--- For the sc instruction ---
 r0          Volatile      (System call number.)
 r3          Volatile      (Parameter 1, and return value.)
 r4-r8       Volatile      (Parameters 2-6.)
-cr0         Volatile      (cr0.SO is the return error condition)
+cr0         Volatile      (cr0.SO is the return error condition.)
 cr1, cr5-7  Nonvolatile
 lr          Nonvolatile
+
+--- For the scv 0 instruction ---
+r0          Volatile      (System call number.)
+r3          Volatile      (Parameter 1, and return value.)
+r4-r8       Volatile      (Parameters 2-6.)
 =========== ============= ========================================
 
 All floating point and vector data registers as well as control and status
 registers are nonvolatile.
 
-Invocation
-----------
-The syscall is performed with the sc instruction, and returns with execution
-continuing at the instruction following the sc instruction.
-
 Transactional Memory
 --------------------
 Syscall behavior can change if the processor is in transactional or suspended
@@ -75,6 +92,7 @@ auxiliary vector.
   returning to the caller. This case is not well defined or supported, so this
   behavior should not be relied upon.
 
+scv 0 syscalls will always behave as PPC_FEATURE2_HTM_NOSC.
 
 vsyscall
 ========
diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 4b3609554e76..2ea43e4afdff 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -99,7 +99,7 @@ void __init machine_init(u64 dt_ptr);
 #endif
 #ifdef CONFIG_PPC64
 long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs);
-notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs);
+notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs, long scv);
 notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr);
 notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr);
 #endif
diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 47bd4ea0837d..0c2fe7f042d1 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -123,6 +123,12 @@
 	hrfid;								\
 	b	hrfi_flush_fallback
 
+#define RFSCV_TO_USER							\
+	STF_EXIT_BARRIER_SLOT;						\
+	RFI_FLUSH_SLOT;							\
+	RFSCV;								\
+	b	rfscv_flush_fallback
+
 #endif /* __ASSEMBLY__ */
 
 #endif	/* _ASM_POWERPC_EXCEPTION_H */
diff --git a/arch/powerpc/include/asm/head-64.h b/arch/powerpc/include/asm/head-64.h
index 2dabcf668292..4cb9efa2eb21 100644
--- a/arch/powerpc/include/asm/head-64.h
+++ b/arch/powerpc/include/asm/head-64.h
@@ -128,7 +128,7 @@ end_##sname:
 	.if ((start) % (size) != 0);				\
 	.error "Fixed section exception vector misalignment";	\
 	.endif;							\
-	.if ((size) != 0x20) && ((size) != 0x80) && ((size) != 0x100); \
+	.if ((size) != 0x20) && ((size) != 0x80) && ((size) != 0x100) && ((size) != 0x1000); \
 	.error "Fixed section exception vector bad size";	\
 	.endif;							\
 	.if (start) < sname##_start;				\
diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index 6b03dff61a05..160f3bb77ea4 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -755,6 +755,8 @@ END_FTR_SECTION_NESTED(CPU_FTR_CELL_TB_BUG, CPU_FTR_CELL_TB_BUG, 96)
 #define N_SLINE	68
 #define N_SO	100
 
+#define RFSCV	.long 0x4c0000a4
+
 /*
  * Create an endian fixup trampoline
  *
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index eedcbfb9a6ff..414569940c3f 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -304,7 +304,7 @@ struct thread_struct {
 	.regs = (struct pt_regs *)INIT_SP - 1, /* XXX bogus, I think */ \
 	.addr_limit = KERNEL_DS, \
 	.fpexc_mode = 0, \
-	.fscr = FSCR_TAR | FSCR_EBB \
+	.fscr = FSCR_TAR | FSCR_EBB | FSCR_SCV \
 }
 #endif
 
diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h
index 65676e2325b8..9efbddee2bca 100644
--- a/arch/powerpc/include/asm/setup.h
+++ b/arch/powerpc/include/asm/setup.h
@@ -30,12 +30,12 @@ void setup_panic(void);
 #define ARCH_PANIC_TIMEOUT 180
 
 #ifdef CONFIG_PPC_PSERIES
-extern void pseries_enable_reloc_on_exc(void);
+extern bool pseries_enable_reloc_on_exc(void);
 extern void pseries_disable_reloc_on_exc(void);
 extern void pseries_big_endian_exceptions(void);
 extern void pseries_little_endian_exceptions(void);
 #else
-static inline void pseries_enable_reloc_on_exc(void) {}
+static inline bool pseries_enable_reloc_on_exc(void) { return false; }
 static inline void pseries_disable_reloc_on_exc(void) {}
 static inline void pseries_big_endian_exceptions(void) {}
 static inline void pseries_little_endian_exceptions(void) {}
diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
index a460298c7ddb..6b087275d499 100644
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -184,7 +184,7 @@ __init_LPCR_ISA300:
 
 __init_FSCR:
 	mfspr	r3,SPRN_FSCR
-	ori	r3,r3,FSCR_TAR|FSCR_DSCR|FSCR_EBB
+	ori	r3,r3,FSCR_SCV|FSCR_TAR|FSCR_DSCR|FSCR_EBB
 	mtspr	SPRN_FSCR,r3
 	blr
 
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index e745abc5457a..286d896546fb 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -118,7 +118,8 @@ extern void __restore_cpu_e6500(void);
 #define COMMON_USER2_POWER9	(COMMON_USER2_POWER8 | \
 				 PPC_FEATURE2_ARCH_3_00 | \
 				 PPC_FEATURE2_HAS_IEEE128 | \
-				 PPC_FEATURE2_DARN )
+				 PPC_FEATURE2_DARN | \
+				 PPC_FEATURE2_SCV)
 
 #ifdef CONFIG_PPC_BOOK3E_64
 #define COMMON_USER_BOOKE	(COMMON_USER_PPC64 | PPC_FEATURE_BOOKE)
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 182b4047c1ef..48340d288825 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -566,6 +566,7 @@ static struct dt_cpu_feature_match __initdata
 	{"little-endian", feat_enable_le, CPU_FTR_REAL_LE},
 	{"smt", feat_enable_smt, 0},
 	{"interrupt-facilities", feat_enable, 0},
+	{"system-call-vectored", feat_enable, 0},
 	{"timer-facilities", feat_enable, 0},
 	{"timer-facilities-v3", feat_enable, 0},
 	{"debug-facilities", feat_enable, 0},
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 8406812c9734..4c0d0400e93d 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -62,6 +62,119 @@ exception_marker:
 	.section	".text"
 	.align 7
 
+	.globl system_call_vectored_common
+system_call_vectored_common:
+	INTERRUPT_TO_KERNEL
+	mr	r10,r1
+	ld	r1,PACAKSAVE(r13)
+	std	r10,0(r1)
+	std	r11,_NIP(r1)
+	std	r12,_MSR(r1)
+	std	r0,GPR0(r1)
+	std	r10,GPR1(r1)
+	std	r2,GPR2(r1)
+	ld	r2,PACATOC(r13)
+	mfcr	r12
+	li	r11,0
+	/* Can we avoid saving r3-r8 in common case? */
+	std	r3,GPR3(r1)
+	std	r4,GPR4(r1)
+	std	r5,GPR5(r1)
+	std	r6,GPR6(r1)
+	std	r7,GPR7(r1)
+	std	r8,GPR8(r1)
+	/* Zero r9-r12, this should only be required when restoring all GPRs */
+	std	r11,GPR9(r1)
+	std	r11,GPR10(r1)
+	std	r11,GPR11(r1)
+	std	r11,GPR12(r1)
+	std	r9,GPR13(r1)
+	SAVE_NVGPRS(r1)
+	std	r11,_XER(r1)
+	std	r11,_LINK(r1)
+	std	r11,_CTR(r1)
+
+	li	r11,0xc00
+	std	r11,_TRAP(r1)
+	std	r12,_CCR(r1)
+	std	r3,ORIG_GPR3(r1)
+	addi	r10,r1,STACK_FRAME_OVERHEAD
+	ld	r11,exception_marker@toc(r2)
+	std	r11,-16(r10)		/* "regshere" marker */
+
+	/*
+	 * RECONCILE_IRQ_STATE without calling trace_hardirqs_off(), which
+	 * would clobber syscall parameters. Also we always enter with IRQs
+	 * enabled and nothing pending. system_call_exception() will call
+	 * trace_hardirqs_off().
+	 *
+	 * scv enters with MSR[EE]=1, so don't set PACA_IRQ_HARD_DIS.
+	 */
+	li	r9,IRQS_ALL_DISABLED
+	stb	r9,PACAIRQSOFTMASK(r13)
+
+	/* Calling convention has r9 = orig r0, r10 = regs */
+	mr	r9,r0
+	bl	system_call_exception
+
+.Lsyscall_vectored_exit:
+	addi    r4,r1,STACK_FRAME_OVERHEAD
+	li	r5,1 /* scv */
+	bl	syscall_exit_prepare
+
+	ld	r2,_CCR(r1)
+	ld	r4,_NIP(r1)
+	ld	r5,_MSR(r1)
+
+BEGIN_FTR_SECTION
+	stdcx.	r0,0,r1			/* to clear the reservation */
+END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
+
+	mtlr	r4
+	mtctr	r5
+
+	cmpdi	r3,0
+	bne	syscall_vectored_restore_regs
+	li	r0,0
+	li	r4,0
+	li	r5,0
+	li	r6,0
+	li	r7,0
+	li	r8,0
+	li	r9,0
+	li	r10,0
+	li	r11,0
+	li	r12,0
+	mtspr	SPRN_XER,r0
+.Lsyscall_vectored_restore_regs_cont:
+
+BEGIN_FTR_SECTION
+	HMT_MEDIUM_LOW
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
+
+	/*
+	 * We don't need to restore AMR on the way back to userspace for KUAP.
+	 * The value of AMR only matters while we're in the kernel.
+	 */
+	mtcr	r2
+	ld	r2,GPR2(r1)
+	ld	r3,GPR3(r1)
+	ld	r13,GPR13(r1)
+	ld	r1,GPR1(r1)
+	RFSCV_TO_USER
+	b	.	/* prevent speculative execution */
+_ASM_NOKPROBE_SYMBOL(system_call_vectored_common);
+
+syscall_vectored_restore_regs:
+	ld	r4,_XER(r1)
+	REST_NVGPRS(r1)
+	mtspr	SPRN_XER,r4
+	ld	r0,GPR0(r1)
+	REST_8GPRS(4, r1)
+	ld	r12,GPR12(r1)
+	b	.Lsyscall_vectored_restore_regs_cont
+
+	.balign IFETCH_ALIGN_BYTES
 	.globl system_call_common
 system_call_common:
 _ASM_NOKPROBE_SYMBOL(system_call_common)
@@ -130,6 +243,7 @@ END_BTB_FLUSH_SECTION
 
 .Lsyscall_exit:
 	addi    r4,r1,STACK_FRAME_OVERHEAD
+	li	r5,0 /* !scv */
 	bl	syscall_exit_prepare
 
 	ld	r2,_CCR(r1)
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 7a6be3f32973..6a936c9199d6 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -742,6 +742,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP)
  * guarantee they will be delivered virtually. Some conditions (see the ISA)
  * cause exceptions to be delivered in real mode.
  *
+ * The scv instructions are a special case. They get a 0x3000 offset applied.
+ * scv exceptions have unique reentrancy properties, see below.
+ *
  * It's impossible to receive interrupts below 0x300 via AIL.
  *
  * KVM: None of the virtual exceptions are from the guest. Anything that
@@ -751,8 +754,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP)
  * We layout physical memory as follows:
  * 0x0000 - 0x00ff : Secondary processor spin code
  * 0x0100 - 0x18ff : Real mode pSeries interrupt vectors
- * 0x1900 - 0x3fff : Real mode trampolines
- * 0x4000 - 0x58ff : Relon (IR=1,DR=1) mode pSeries interrupt vectors
+ * 0x1900 - 0x2fff : Real mode trampolines
+ * 0x3000 - 0x58ff : Relon (IR=1,DR=1) mode pSeries interrupt vectors
  * 0x5900 - 0x6fff : Relon mode trampolines
  * 0x7000 - 0x7fff : FWNMI data area
  * 0x8000 -   .... : Common interrupt handlers, remaining early
@@ -763,8 +766,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP)
  * vectors there.
  */
 OPEN_FIXED_SECTION(real_vectors,        0x0100, 0x1900)
-OPEN_FIXED_SECTION(real_trampolines,    0x1900, 0x4000)
-OPEN_FIXED_SECTION(virt_vectors,        0x4000, 0x5900)
+OPEN_FIXED_SECTION(real_trampolines,    0x1900, 0x3000)
+OPEN_FIXED_SECTION(virt_vectors,        0x3000, 0x5900)
 OPEN_FIXED_SECTION(virt_trampolines,    0x5900, 0x7000)
 
 #ifdef CONFIG_PPC_POWERNV
@@ -800,6 +803,73 @@ USE_FIXED_SECTION(real_vectors)
 	.globl __start_interrupts
 __start_interrupts:
 
+/**
+ * Interrupt 0x3000 - System Call Vectored Interrupt (syscall).
+ * This is a synchronous interrupt invoked with the "scv" instruction. The
+ * system call does not alter the HV bit, so it is directed to the OS.
+ *
+ * Handling:
+ * scv instructions enter the kernel without changing EE, RI, ME, or HV.
+ * In particular, this means we can take a maskable interrupt at any point
+ * in the scv handler, which is unlike any other interrupt. This is solved
+ * by treating the instruction addresses below __end_interrupts as being
+ * soft-masked.
+ *
+ * AIL-0 mode scv exceptions go to 0x17000-0x17fff, but we set AIL-3 and
+ * ensure scv is never executed with relocation off, which means AIL-0
+ * should never happen.
+ *
+ * Before leaving the below __end_interrupts text, at least one of the following
+ * must be true:
+ * - MSR[PR]=1 (i.e., return to userspace)
+ * - MSR_EE|MSR_RI is set (no reentrant exceptions)
+ * - Standard kernel environment is set up (stack, paca, etc)
+ *
+ * Call convention:
+ *
+ * syscall register convention is in Documentation/powerpc/syscall64-abi.rst
+ */
+EXC_VIRT_BEGIN(system_call_vectored, 0x3000, 0x1000)
+	/* SCV 0 */
+.L_scv0:
+	mr	r9,r13
+	GET_PACA(r13)
+	mflr	r11
+	mfctr	r12
+	li	r10,IRQS_ALL_DISABLED
+	stb	r10,PACAIRQSOFTMASK(r13)
+#ifdef CONFIG_RELOCATABLE
+	b	system_call_vectored_tramp
+#else
+	b	system_call_vectored_common
+#endif
+	nop
+
+	/* SCV 1 - 127 */
+	.rept	127
+	/*
+	 * cause scv to return -ENOSYS.
+	 * This may look a bit funny to tracing.
+	 */
+	li	r0,-1
+	b	.L_scv0
+	nop
+	nop
+	nop
+	nop
+	nop
+	nop
+	.endr
+EXC_VIRT_END(system_call_vectored, 0x3000, 0x1000)
+
+#ifdef CONFIG_RELOCATABLE
+TRAMP_VIRT_BEGIN(system_call_vectored_tramp)
+	__LOAD_HANDLER(r10, system_call_vectored_common)
+	mtctr	r10
+	bctr
+#endif
+
+
 /* No virt vectors corresponding with 0x0..0x100 */
 EXC_VIRT_NONE(0x4000, 0x100)
 
@@ -2916,6 +2986,47 @@ TRAMP_REAL_BEGIN(hrfi_flush_fallback)
 	GET_SCRATCH0(r13);
 	hrfid
 
+TRAMP_REAL_BEGIN(rfscv_flush_fallback)
+	/* system call volatile */
+	mr	r7,r13
+	GET_PACA(r13);
+	mr	r8,r1
+	ld	r1,PACAKSAVE(r13)
+	mfctr	r9
+	ld	r10,PACA_RFI_FLUSH_FALLBACK_AREA(r13)
+	ld	r11,PACA_L1D_FLUSH_SIZE(r13)
+	srdi	r11,r11,(7 + 3) /* 128 byte lines, unrolled 8x */
+	mtctr	r11
+	DCBT_BOOK3S_STOP_ALL_STREAM_IDS(r11) /* Stop prefetch streams */
+
+	/* order ld/st prior to dcbt stop all streams with flushing */
+	sync
+
+	/*
+	 * The load addresses are at staggered offsets within cachelines,
+	 * which suits some pipelines better (on others it should not
+	 * hurt).
+	 */
+1:
+	ld	r11,(0x80 + 8)*0(r10)
+	ld	r11,(0x80 + 8)*1(r10)
+	ld	r11,(0x80 + 8)*2(r10)
+	ld	r11,(0x80 + 8)*3(r10)
+	ld	r11,(0x80 + 8)*4(r10)
+	ld	r11,(0x80 + 8)*5(r10)
+	ld	r11,(0x80 + 8)*6(r10)
+	ld	r11,(0x80 + 8)*7(r10)
+	addi	r10,r10,0x80*8
+	bdnz	1b
+
+	mtctr	r9
+	li	r9,0
+	li	r10,0
+	li	r11,0
+	mr	r1,r8
+	mr	r13,r7
+	RFSCV
+
 USE_TEXT_SECTION()
 	MASKED_INTERRUPT
 	MASKED_INTERRUPT hsrr=1
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index e05e6dd67ae6..3bf03666ee09 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -196,7 +196,10 @@ static void __init configure_exceptions(void)
 	/* Under a PAPR hypervisor, we need hypercalls */
 	if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
 		/* Enable AIL if possible */
-		pseries_enable_reloc_on_exc();
+		if (!pseries_enable_reloc_on_exc()) {
+			init_task.thread.fscr &= ~FSCR_SCV;
+			cur_cpu_spec->cpu_user_features2 &= ~PPC_FEATURE2_SCV;
+		}
 
 		/*
 		 * Tell the hypervisor that we want our exceptions to
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index 32601a572ff0..87d95b455b83 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -121,7 +121,8 @@ notrace long system_call_exception(long r3, long r4, long r5,
  * because RI=0 and soft mask state is "unreconciled", so it is marked notrace.
  */
 notrace unsigned long syscall_exit_prepare(unsigned long r3,
-					   struct pt_regs *regs)
+					   struct pt_regs *regs,
+					   long scv)
 {
 	unsigned long *ti_flagsp = &current_thread_info()->flags;
 	unsigned long ti_flags;
@@ -134,7 +135,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
 
 	ti_flags = *ti_flagsp;
 
-	if (unlikely(r3 >= (unsigned long)-MAX_ERRNO)) {
+	if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && !scv) {
 		if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
 			r3 = -r3;
 			regs->ccr |= 0x10000000; /* Set SO bit in CR */
@@ -191,9 +192,14 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
 	trace_hardirqs_on();
 
 	/* This pattern matches prep_irq_for_idle */
-	__hard_EE_RI_disable();
+	/* scv need not set RI=0 because SRRs are not used */
+	if (scv)
+		__hard_irq_disable();
+	else
+		__hard_EE_RI_disable();
 	if (unlikely(lazy_irq_pending())) {
-		__hard_RI_enable();
+		if (!scv)
+			__hard_RI_enable();
 		trace_hardirqs_off();
 		local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
 		local_irq_enable();
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 0c8421dd01ab..17d17f064a2d 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -349,7 +349,7 @@ static void pseries_lpar_idle(void)
  * to ever be a problem in practice we can move this into a kernel thread to
  * finish off the process later in boot.
  */
-void pseries_enable_reloc_on_exc(void)
+bool pseries_enable_reloc_on_exc(void)
 {
 	long rc;
 	unsigned int delay, total_delay = 0;
@@ -360,11 +360,13 @@ void pseries_enable_reloc_on_exc(void)
 			if (rc == H_P2) {
 				pr_info("Relocation on exceptions not"
 					" supported\n");
+				return false;
 			} else if (rc != H_SUCCESS) {
 				pr_warn("Unable to enable relocation"
 					" on exceptions: %ld\n", rc);
+				return false;
 			}
-			break;
+			return true;
 		}
 
 		delay = get_longbusy_msecs(rc);
@@ -373,7 +375,7 @@ void pseries_enable_reloc_on_exc(void)
 			pr_warn("Warning: Giving up waiting to enable "
 				"relocation on exceptions (%u msec)!\n",
 				total_delay);
-			return;
+			return false;
 		}
 
 		mdelay(delay);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* Re: [PATCH v3 26/32] powerpc/64: system call zero volatile registers when returning
  2020-02-25 17:35 ` [PATCH v3 26/32] powerpc/64: system call zero volatile registers when returning Nicholas Piggin
@ 2020-02-25 21:20   ` Segher Boessenkool
  2020-02-26  3:39     ` Nicholas Piggin
  2020-03-07  0:54     ` [PATCH] Fix " Nicholas Piggin
  0 siblings, 2 replies; 161+ messages in thread
From: Segher Boessenkool @ 2020-02-25 21:20 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Michal Suchanek, linuxppc-dev

Hi!

On Wed, Feb 26, 2020 at 03:35:35AM +1000, Nicholas Piggin wrote:
> Kernel addresses and potentially other sensitive data could be leaked
> in volatile registers after a syscall.

>  	cmpdi	r3,0
>  	bne	.Lsyscall_restore_regs
> +	li	r0,0
> +	li	r4,0
> +	li	r5,0
> +	li	r6,0
> +	li	r7,0
> +	li	r8,0
> +	li	r9,0
> +	li	r10,0
> +	li	r11,0
> +	li	r12,0
> +	mtctr	r0
> +	mtspr	SPRN_XER,r0
>  .Lsyscall_restore_regs_cont:

What about LR?  Is that taken care of later?

This also deserves a big fat comment imo, it is very important after
all, and not so obvious.


Segher

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v3 26/32] powerpc/64: system call zero volatile registers when returning
  2020-02-25 21:20   ` Segher Boessenkool
@ 2020-02-26  3:39     ` Nicholas Piggin
  2020-03-07  0:54     ` [PATCH] Fix " Nicholas Piggin
  1 sibling, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-02-26  3:39 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Michal Suchanek, linuxppc-dev

Segher Boessenkool's on February 26, 2020 7:20 am:
> Hi!
> 
> On Wed, Feb 26, 2020 at 03:35:35AM +1000, Nicholas Piggin wrote:
>> Kernel addresses and potentially other sensitive data could be leaked
>> in volatile registers after a syscall.
> 
>>  	cmpdi	r3,0
>>  	bne	.Lsyscall_restore_regs
>> +	li	r0,0
>> +	li	r4,0
>> +	li	r5,0
>> +	li	r6,0
>> +	li	r7,0
>> +	li	r8,0
>> +	li	r9,0
>> +	li	r10,0
>> +	li	r11,0
>> +	li	r12,0
>> +	mtctr	r0
>> +	mtspr	SPRN_XER,r0
>>  .Lsyscall_restore_regs_cont:
> 
> What about LR?  Is that taken care of later?

LR is preserved by sc as per ABI.

> This also deserves a big fat comment imo, it is very important after
> all, and not so obvious.

Sure I can add something.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v3 32/32] powerpc/64s: system call support for scv/rfscv instructions
  2020-02-25 17:35 ` [PATCH v3 32/32] powerpc/64s: system call support for scv/rfscv instructions Nicholas Piggin
@ 2020-03-01 12:20     ` kbuild test robot
  0 siblings, 0 replies; 161+ messages in thread
From: kbuild test robot @ 2020-03-01 12:20 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, kbuild-all

[-- Attachment #1: Type: text/plain, Size: 7145 bytes --]

Hi Nicholas,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.6-rc3 next-20200228]
[cannot apply to kvm-ppc/kvm-ppc-next scottwood/next]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Nicholas-Piggin/powerpc-64-interrupts-and-syscalls-series/20200226-043224
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-ppc64e_defconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 7.5.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.5.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   arch/powerpc/kernel/entry_64.S: Assembler messages:
>> arch/powerpc/kernel/entry_64.S:67: Error: unrecognized opcode: `interrupt_to_kernel'
>> arch/powerpc/kernel/entry_64.S:164: Error: unrecognized opcode: `rfscv_to_user'

vim +67 arch/powerpc/kernel/entry_64.S

    47	
    48	/*
    49	 * System calls.
    50	 */
    51		.section	".toc","aw"
    52	SYS_CALL_TABLE:
    53		.tc sys_call_table[TC],sys_call_table
    54	
    55	COMPAT_SYS_CALL_TABLE:
    56		.tc compat_sys_call_table[TC],compat_sys_call_table
    57	
    58	/* This value is used to mark exception frames on the stack. */
    59	exception_marker:
    60		.tc	ID_EXC_MARKER[TC],STACK_FRAME_REGS_MARKER
    61	
    62		.section	".text"
    63		.align 7
    64	
    65		.globl system_call_vectored_common
    66	system_call_vectored_common:
  > 67		INTERRUPT_TO_KERNEL
    68		mr	r10,r1
    69		ld	r1,PACAKSAVE(r13)
    70		std	r10,0(r1)
    71		std	r11,_NIP(r1)
    72		std	r12,_MSR(r1)
    73		std	r0,GPR0(r1)
    74		std	r10,GPR1(r1)
    75		std	r2,GPR2(r1)
    76		ld	r2,PACATOC(r13)
    77		mfcr	r12
    78		li	r11,0
    79		/* Can we avoid saving r3-r8 in common case? */
    80		std	r3,GPR3(r1)
    81		std	r4,GPR4(r1)
    82		std	r5,GPR5(r1)
    83		std	r6,GPR6(r1)
    84		std	r7,GPR7(r1)
    85		std	r8,GPR8(r1)
    86		/* Zero r9-r12, this should only be required when restoring all GPRs */
    87		std	r11,GPR9(r1)
    88		std	r11,GPR10(r1)
    89		std	r11,GPR11(r1)
    90		std	r11,GPR12(r1)
    91		std	r9,GPR13(r1)
    92		SAVE_NVGPRS(r1)
    93		std	r11,_XER(r1)
    94		std	r11,_LINK(r1)
    95		std	r11,_CTR(r1)
    96	
    97		li	r11,0xc00
    98		std	r11,_TRAP(r1)
    99		std	r12,_CCR(r1)
   100		std	r3,ORIG_GPR3(r1)
   101		addi	r10,r1,STACK_FRAME_OVERHEAD
   102		ld	r11,exception_marker@toc(r2)
   103		std	r11,-16(r10)		/* "regshere" marker */
   104	
   105		/*
   106		 * RECONCILE_IRQ_STATE without calling trace_hardirqs_off(), which
   107		 * would clobber syscall parameters. Also we always enter with IRQs
   108		 * enabled and nothing pending. system_call_exception() will call
   109		 * trace_hardirqs_off().
   110		 *
   111		 * scv enters with MSR[EE]=1, so don't set PACA_IRQ_HARD_DIS.
   112		 */
   113		li	r9,IRQS_ALL_DISABLED
   114		stb	r9,PACAIRQSOFTMASK(r13)
   115	
   116		/* Calling convention has r9 = orig r0, r10 = regs */
   117		mr	r9,r0
   118		bl	system_call_exception
   119	
   120	.Lsyscall_vectored_exit:
   121		addi    r4,r1,STACK_FRAME_OVERHEAD
   122		li	r5,1 /* scv */
   123		bl	syscall_exit_prepare
   124	
   125		ld	r2,_CCR(r1)
   126		ld	r4,_NIP(r1)
   127		ld	r5,_MSR(r1)
   128	
   129	BEGIN_FTR_SECTION
   130		stdcx.	r0,0,r1			/* to clear the reservation */
   131	END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
   132	
   133		mtlr	r4
   134		mtctr	r5
   135	
   136		cmpdi	r3,0
   137		bne	syscall_vectored_restore_regs
   138		li	r0,0
   139		li	r4,0
   140		li	r5,0
   141		li	r6,0
   142		li	r7,0
   143		li	r8,0
   144		li	r9,0
   145		li	r10,0
   146		li	r11,0
   147		li	r12,0
   148		mtspr	SPRN_XER,r0
   149	.Lsyscall_vectored_restore_regs_cont:
   150	
   151	BEGIN_FTR_SECTION
   152		HMT_MEDIUM_LOW
   153	END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
   154	
   155		/*
   156		 * We don't need to restore AMR on the way back to userspace for KUAP.
   157		 * The value of AMR only matters while we're in the kernel.
   158		 */
   159		mtcr	r2
   160		ld	r2,GPR2(r1)
   161		ld	r3,GPR3(r1)
   162		ld	r13,GPR13(r1)
   163		ld	r1,GPR1(r1)
 > 164		RFSCV_TO_USER
   165		b	.	/* prevent speculative execution */
   166	_ASM_NOKPROBE_SYMBOL(system_call_vectored_common);
   167	
   168	syscall_vectored_restore_regs:
   169		ld	r4,_XER(r1)
   170		REST_NVGPRS(r1)
   171		mtspr	SPRN_XER,r4
   172		ld	r0,GPR0(r1)
   173		REST_8GPRS(4, r1)
   174		ld	r12,GPR12(r1)
   175		b	.Lsyscall_vectored_restore_regs_cont
   176	
   177		.balign IFETCH_ALIGN_BYTES
   178		.globl system_call_common
   179	system_call_common:
   180	_ASM_NOKPROBE_SYMBOL(system_call_common)
   181		mr	r10,r1
   182		ld	r1,PACAKSAVE(r13)
   183		std	r10,0(r1)
   184		std	r11,_NIP(r1)
   185		std	r12,_MSR(r1)
   186		std	r0,GPR0(r1)
   187		std	r10,GPR1(r1)
   188		std	r2,GPR2(r1)
   189	#ifdef CONFIG_PPC_FSL_BOOK3E
   190	START_BTB_FLUSH_SECTION
   191		BTB_FLUSH(r10)
   192	END_BTB_FLUSH_SECTION
   193	#endif
   194		ld	r2,PACATOC(r13)
   195		mfcr	r12
   196		li	r11,0
   197		/* Can we avoid saving r3-r8 in common case? */
   198		std	r3,GPR3(r1)
   199		std	r4,GPR4(r1)
   200		std	r5,GPR5(r1)
   201		std	r6,GPR6(r1)
   202		std	r7,GPR7(r1)
   203		std	r8,GPR8(r1)
   204		/* Zero r9-r12, this should only be required when restoring all GPRs */
   205		std	r11,GPR9(r1)
   206		std	r11,GPR10(r1)
   207		std	r11,GPR11(r1)
   208		std	r11,GPR12(r1)
   209		std	r9,GPR13(r1)
   210		SAVE_NVGPRS(r1)
   211		std	r11,_XER(r1)
   212		std	r11,_CTR(r1)
   213		mflr	r10
   214	
   215		/*
   216		 * This clears CR0.SO (bit 28), which is the error indication on
   217		 * return from this system call.
   218		 */
   219		rldimi	r12,r11,28,(63-28)
   220		li	r11,0xc00
   221		std	r10,_LINK(r1)
   222		std	r11,_TRAP(r1)
   223		std	r12,_CCR(r1)
   224		std	r3,ORIG_GPR3(r1)
   225		addi	r10,r1,STACK_FRAME_OVERHEAD
   226		ld	r11,exception_marker@toc(r2)
   227		std	r11,-16(r10)		/* "regshere" marker */
   228	
   229		/*
   230		 * RECONCILE_IRQ_STATE without calling trace_hardirqs_off(), which
   231		 * would clobber syscall parameters. Also we always enter with IRQs
   232		 * enabled and nothing pending. system_call_exception() will call
   233		 * trace_hardirqs_off().
   234		 */
   235		li	r11,IRQS_ALL_DISABLED
   236		li	r12,PACA_IRQ_HARD_DIS
   237		stb	r11,PACAIRQSOFTMASK(r13)
   238		stb	r12,PACAIRQHAPPENED(r13)
   239	
   240		/* Calling convention has r9 = orig r0, r10 = regs */
   241		mr	r9,r0
   242		bl	system_call_exception
   243	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 17284 bytes --]

^ permalink raw reply	[flat|nested] 161+ messages in thread

* [PATCH] Fix powerpc/64: system call zero volatile registers when returning
  2020-02-25 21:20   ` Segher Boessenkool
  2020-02-26  3:39     ` Nicholas Piggin
@ 2020-03-07  0:54     ` Nicholas Piggin
  1 sibling, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-03-07  0:54 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Michal Suchanek, linuxppc-dev

Here's an incremental fix that can be folded into the patch.

Excerpts from Segher Boessenkool's message of February 26, 2020 7:20 am:
> Hi!
> 
> On Wed, Feb 26, 2020 at 03:35:35AM +1000, Nicholas Piggin wrote:
>> Kernel addresses and potentially other sensitive data could be leaked
>> in volatile registers after a syscall.
> 
>>  	cmpdi	r3,0
>>  	bne	.Lsyscall_restore_regs
>> +	li	r0,0
>> +	li	r4,0
>> +	li	r5,0
>> +	li	r6,0
>> +	li	r7,0
>> +	li	r8,0
>> +	li	r9,0
>> +	li	r10,0
>> +	li	r11,0
>> +	li	r12,0
>> +	mtctr	r0
>> +	mtspr	SPRN_XER,r0
>>  .Lsyscall_restore_regs_cont:
> 
> What about LR?  Is that taken care of later?
> 
> This also deserves a big fat comment imo, it is very important after
> all, and not so obvious.
> 
> 
> Segher
> 

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/entry_64.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 0e2c56573a41..ea534375250b 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -135,6 +135,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
 
 	cmpdi	r3,0
 	bne	.Lsyscall_restore_regs
+	/* Zero volatile regs that may contain sensitive kernel data */
 	li	r0,0
 	li	r4,0
 	li	r5,0
-- 
2.23.0

^ permalink raw reply related	[flat|nested] 161+ messages in thread

* Re: [PATCH v3 25/32] powerpc/64: system call implement entry/exit logic in C
  2020-02-25 17:35 ` [PATCH v3 25/32] powerpc/64: system call implement entry/exit logic in C Nicholas Piggin
@ 2020-03-19  9:18   ` Christophe Leroy
  2020-03-20  3:39     ` Nicholas Piggin
  0 siblings, 1 reply; 161+ messages in thread
From: Christophe Leroy @ 2020-03-19  9:18 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Michal Suchanek



On 25/02/2020 at 18:35, Nicholas Piggin wrote:
> System call entry and particularly exit code is beyond the limit of what
> is reasonable to implement in asm.
> 
> This conversion moves all conditional branches out of the asm code,
> except for the case that all GPRs should be restored at exit.
> 
> Null syscall test is about 5% faster after this patch, because the exit
> work is handled under local_irq_disable, and the hard mask and pending
> interrupt replay is handled after that, which avoids games with MSR.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> ---
> 
> v2,rebase (from Michal):
> - Add endian conversion for dtl_idx (ms)
> - Fix sparse warning about missing declaration (ms)
> - Add unistd.h to fix some defconfigs, add SPDX, minor formatting (mpe)
> 
> v3: Fixes thanks to reports from mpe and selftests errors:
> - Several soft-mask debug and unsafe smp_processor_id() warnings due to
>    tracing and other false positives due to checks in "unreconciled" code.
> - Fix a bug with syscall tracing functions that set registers (e.g.,
>    PTRACE_SETREG) not setting GPRs properly.
> - Fix silly tabort_syscall bug that causes kernel crashes when making system
>    calls in transactional state.
> 
>   arch/powerpc/include/asm/asm-prototypes.h     |  17 +-
>   .../powerpc/include/asm/book3s/64/kup-radix.h |  14 +-
>   arch/powerpc/include/asm/cputime.h            |  29 ++
>   arch/powerpc/include/asm/hw_irq.h             |   4 +
>   arch/powerpc/include/asm/ptrace.h             |   3 +
>   arch/powerpc/include/asm/signal.h             |   3 +
>   arch/powerpc/include/asm/switch_to.h          |   5 +
>   arch/powerpc/include/asm/time.h               |   3 +
>   arch/powerpc/kernel/Makefile                  |   3 +-
>   arch/powerpc/kernel/entry_64.S                | 338 +++---------------
>   arch/powerpc/kernel/signal.h                  |   2 -
>   arch/powerpc/kernel/syscall_64.c              | 213 +++++++++++
>   arch/powerpc/kernel/systbl.S                  |   9 +-
>   13 files changed, 328 insertions(+), 315 deletions(-)
>   create mode 100644 arch/powerpc/kernel/syscall_64.c
> 
> diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
> index 983c0084fb3f..4b3609554e76 100644
> --- a/arch/powerpc/include/asm/asm-prototypes.h
> +++ b/arch/powerpc/include/asm/asm-prototypes.h
> @@ -97,6 +97,12 @@ ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp,
>   unsigned long __init early_init(unsigned long dt_ptr);
>   void __init machine_init(u64 dt_ptr);
>   #endif
> +#ifdef CONFIG_PPC64

This ifdef is not necessary as it has no matching #else.
Having a function declaration without a definition is not an issue.
Keep in mind that we are aiming at generalising this to PPC32.

> +long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs);
> +notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs);
> +notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr);
> +notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr);
> +#endif
>   
>   long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low,
>   		      u32 len_high, u32 len_low);
> @@ -104,14 +110,6 @@ long sys_switch_endian(void);
>   notrace unsigned int __check_irq_replay(void);
>   void notrace restore_interrupts(void);
>   
> -/* ptrace */
> -long do_syscall_trace_enter(struct pt_regs *regs);
> -void do_syscall_trace_leave(struct pt_regs *regs);
> -
> -/* process */
> -void restore_math(struct pt_regs *regs);
> -void restore_tm_state(struct pt_regs *regs);
> -
>   /* prom_init (OpenFirmware) */
>   unsigned long __init prom_init(unsigned long r3, unsigned long r4,
>   			       unsigned long pp,
> @@ -122,9 +120,6 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4,
>   void __init early_setup(unsigned long dt_ptr);
>   void early_setup_secondary(void);
>   
> -/* time */
> -void accumulate_stolen_time(void);
> -
>   /* misc runtime */
>   extern u64 __bswapdi2(u64);
>   extern s64 __lshrdi3(s64, int);
> diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h
> index 90dd3a3fc8c7..71081d90f999 100644
> --- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
> @@ -3,6 +3,7 @@
>   #define _ASM_POWERPC_BOOK3S_64_KUP_RADIX_H
>   
>   #include <linux/const.h>
> +#include <asm/reg.h>
>   
>   #define AMR_KUAP_BLOCK_READ	UL(0x4000000000000000)
>   #define AMR_KUAP_BLOCK_WRITE	UL(0x8000000000000000)
> @@ -56,7 +57,14 @@
>   
>   #ifdef CONFIG_PPC_KUAP
>   
> -#include <asm/reg.h>
> +#include <asm/mmu.h>
> +#include <asm/ptrace.h>
> +
> +static inline void kuap_check_amr(void)
> +{
> +	if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && mmu_has_feature(MMU_FTR_RADIX_KUAP))
> +		WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED);
> +}
>   
>   /*
>    * We support individually allowing read or write, but we don't support nesting
> @@ -127,6 +135,10 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
>   		    (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)),
>   		    "Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read");
>   }
> +#else /* CONFIG_PPC_KUAP */
> +static inline void kuap_check_amr(void)
> +{
> +}
>   #endif /* CONFIG_PPC_KUAP */
>   
>   #endif /* __ASSEMBLY__ */
> diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
> index 2431b4ada2fa..6639a6847cc0 100644
> --- a/arch/powerpc/include/asm/cputime.h
> +++ b/arch/powerpc/include/asm/cputime.h
> @@ -44,6 +44,28 @@ static inline unsigned long cputime_to_usecs(const cputime_t ct)
>   #ifdef CONFIG_PPC64
>   #define get_accounting(tsk)	(&get_paca()->accounting)
>   static inline void arch_vtime_task_switch(struct task_struct *tsk) { }

Could we have the below additions sit outside of this PPC64 ifdef, to be
reused on PPC32?

> +
> +/*
> + * account_cpu_user_entry/exit runs "unreconciled", so can't trace,
> + * can't use use get_paca()
> + */
> +static notrace inline void account_cpu_user_entry(void)
> +{
> +	unsigned long tb = mftb();
> +	struct cpu_accounting_data *acct = &local_paca->accounting;

In the spirit of reusing that code on PPC32, can we use get_accounting()?
Or an alternate version of get_accounting(), e.g. a
get_accounting_notrace() to be defined?
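
Something like this, perhaps (a sketch only; get_accounting_notrace() is
the name suggested above, not an existing helper):

/* notrace variant, safe to call from "unreconciled" entry code */
#ifdef CONFIG_PPC64
static notrace inline struct cpu_accounting_data *get_accounting_notrace(void)
{
	return &local_paca->accounting;
}
#else
static notrace inline struct cpu_accounting_data *get_accounting_notrace(void)
{
	return &task_thread_info(current)->accounting;
}
#endif

account_cpu_user_entry() could then start with
	struct cpu_accounting_data *acct = get_accounting_notrace();
on both 64bit and 32bit.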

> +
> +	acct->utime += (tb - acct->starttime_user);
> +	acct->starttime = tb;
> +}
> +static notrace inline void account_cpu_user_exit(void)
> +{
> +	unsigned long tb = mftb();
> +	struct cpu_accounting_data *acct = &local_paca->accounting;
> +
> +	acct->stime += (tb - acct->starttime);
> +	acct->starttime_user = tb;
> +}
> +
>   #else
>   #define get_accounting(tsk)	(&task_thread_info(tsk)->accounting)
>   /*
> @@ -61,5 +83,12 @@ static inline void arch_vtime_task_switch(struct task_struct *prev)
>   #endif
>   
>   #endif /* __KERNEL__ */
> +#else /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
> +static inline void account_cpu_user_entry(void)
> +{
> +}
> +static inline void account_cpu_user_exit(void)
> +{
> +}
>   #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
>   #endif /* __POWERPC_CPUTIME_H */
> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> index e3a905e3d573..310583e62bd9 100644
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -228,9 +228,13 @@ static inline bool arch_irqs_disabled(void)
>   #ifdef CONFIG_PPC_BOOK3E
>   #define __hard_irq_enable()	wrtee(MSR_EE)
>   #define __hard_irq_disable()	wrtee(0)
> +#define __hard_EE_RI_disable()	wrtee(0)
> +#define __hard_RI_enable()	do { } while (0)
>   #else
>   #define __hard_irq_enable()	__mtmsrd(MSR_EE|MSR_RI, 1)
>   #define __hard_irq_disable()	__mtmsrd(MSR_RI, 1)
> +#define __hard_EE_RI_disable()	__mtmsrd(0, 1)
> +#define __hard_RI_enable()	__mtmsrd(MSR_RI, 1)
>   #endif
>   
>   #define hard_irq_disable()	do {					\
> diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
> index ee3ada66deb5..082a40153b94 100644
> --- a/arch/powerpc/include/asm/ptrace.h
> +++ b/arch/powerpc/include/asm/ptrace.h
> @@ -138,6 +138,9 @@ extern unsigned long profile_pc(struct pt_regs *regs);
>   #define profile_pc(regs) instruction_pointer(regs)
>   #endif
>   
> +long do_syscall_trace_enter(struct pt_regs *regs);
> +void do_syscall_trace_leave(struct pt_regs *regs);
> +
>   #define kernel_stack_pointer(regs) ((regs)->gpr[1])
>   static inline int is_syscall_success(struct pt_regs *regs)
>   {
> diff --git a/arch/powerpc/include/asm/signal.h b/arch/powerpc/include/asm/signal.h
> index 0803ca8b9149..99e1c6de27bc 100644
> --- a/arch/powerpc/include/asm/signal.h
> +++ b/arch/powerpc/include/asm/signal.h
> @@ -6,4 +6,7 @@
>   #include <uapi/asm/signal.h>
>   #include <uapi/asm/ptrace.h>
>   
> +struct pt_regs;
> +void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags);
> +
>   #endif /* _ASM_POWERPC_SIGNAL_H */
> diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
> index 5b03d8a82409..476008bc3d08 100644
> --- a/arch/powerpc/include/asm/switch_to.h
> +++ b/arch/powerpc/include/asm/switch_to.h
> @@ -5,6 +5,7 @@
>   #ifndef _ASM_POWERPC_SWITCH_TO_H
>   #define _ASM_POWERPC_SWITCH_TO_H
>   
> +#include <linux/sched.h>
>   #include <asm/reg.h>
>   
>   struct thread_struct;
> @@ -22,6 +23,10 @@ extern void switch_booke_debug_regs(struct debug_reg *new_debug);
>   
>   extern int emulate_altivec(struct pt_regs *);
>   
> +void restore_math(struct pt_regs *regs);
> +
> +void restore_tm_state(struct pt_regs *regs);
> +
>   extern void flush_all_to_thread(struct task_struct *);
>   extern void giveup_all(struct task_struct *);
>   
> diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
> index e0107495c4de..39ce95016a3a 100644
> --- a/arch/powerpc/include/asm/time.h
> +++ b/arch/powerpc/include/asm/time.h
> @@ -194,5 +194,8 @@ DECLARE_PER_CPU(u64, decrementers_next_tb);
>   /* Convert timebase ticks to nanoseconds */
>   unsigned long long tb_to_ns(unsigned long long tb_ticks);
>   
> +/* SPLPAR */
> +void accumulate_stolen_time(void);
> +
>   #endif /* __KERNEL__ */
>   #endif /* __POWERPC_TIME_H */
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 78a1b22d4fd8..5700231a8988 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -50,7 +50,8 @@ obj-y				:= cputable.o ptrace.o syscalls.o \
>   				   of_platform.o prom_parse.o
>   obj-$(CONFIG_PPC64)		+= setup_64.o sys_ppc32.o \
>   				   signal_64.o ptrace32.o \
> -				   paca.o nvram_64.o firmware.o note.o
> +				   paca.o nvram_64.o firmware.o note.o \
> +				   syscall_64.o
>   obj-$(CONFIG_VDSO32)		+= vdso32/
>   obj-$(CONFIG_PPC_WATCHDOG)	+= watchdog.o
>   obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= hw_breakpoint.o

[snip]

> diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h
> index 800433685888..d396efca4068 100644
> --- a/arch/powerpc/kernel/signal.h
> +++ b/arch/powerpc/kernel/signal.h
> @@ -10,8 +10,6 @@
>   #ifndef _POWERPC_ARCH_SIGNAL_H
>   #define _POWERPC_ARCH_SIGNAL_H
>   
> -extern void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags);
> -
>   extern void __user *get_sigframe(struct ksignal *ksig, unsigned long sp,
>   				  size_t frame_size, int is_32);
>   
> diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c

Could some part of it go in a syscall.c to be reused on PPC32?

> new file mode 100644
> index 000000000000..20f77cc19df8
> --- /dev/null
> +++ b/arch/powerpc/kernel/syscall_64.c
> @@ -0,0 +1,213 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +
> +#include <linux/err.h>
> +#include <asm/asm-prototypes.h>
> +#include <asm/book3s/64/kup-radix.h>
> +#include <asm/cputime.h>
> +#include <asm/hw_irq.h>
> +#include <asm/kprobes.h>
> +#include <asm/paca.h>
> +#include <asm/ptrace.h>
> +#include <asm/reg.h>
> +#include <asm/signal.h>
> +#include <asm/switch_to.h>
> +#include <asm/syscall.h>
> +#include <asm/time.h>
> +#include <asm/unistd.h>
> +
> +extern void __noreturn tabort_syscall(unsigned long nip, unsigned long msr);
> +
> +typedef long (*syscall_fn)(long, long, long, long, long, long);
> +
> +/* Has to run notrace because it is entered "unreconciled" */
> +notrace long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8,
> +			   unsigned long r0, struct pt_regs *regs)
> +{
> +	unsigned long ti_flags;
> +	syscall_fn f;
> +
> +	BUG_ON(!(regs->msr & MSR_PR));
> +
> +	if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
> +	    unlikely(regs->msr & MSR_TS_T))
> +		tabort_syscall(regs->nip, regs->msr);
> +
> +	account_cpu_user_entry();
> +
> +#ifdef CONFIG_PPC_SPLPAR
> +	if (IS_ENABLED(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) &&
> +	    firmware_has_feature(FW_FEATURE_SPLPAR)) {
> +		struct lppaca *lp = local_paca->lppaca_ptr;
> +
> +		if (unlikely(local_paca->dtl_ridx != be64_to_cpu(lp->dtl_idx)))
> +			accumulate_stolen_time();
> +	}
> +#endif
> +
> +	kuap_check_amr();
> +
> +	/*
> +	 * A syscall should always be called with interrupts enabled
> +	 * so we just unconditionally hard-enable here. When some kind
> +	 * of irq tracing is used, we additionally check that condition
> +	 * is correct
> +	 */
> +	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) {
> +		WARN_ON(irq_soft_mask_return() != IRQS_ENABLED);
> +		WARN_ON(local_paca->irq_happened);
> +	}
> +	/*
> +	 * This is not required for the syscall exit path, but makes the
> +	 * stack frame look nicer. If this was initialised in the first stack
> +	 * frame, or if the unwinder was taught the first stack frame always
> +	 * returns to user with IRQS_ENABLED, this store could be avoided!
> +	 */
> +	regs->softe = IRQS_ENABLED;

softe doesn't exist on PPC32. Can we do that through a helper?
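
For example (a sketch; regs_set_softe() is a made-up name here, not an
existing helper):

static inline void regs_set_softe(struct pt_regs *regs, unsigned long val)
{
#ifdef CONFIG_PPC64
	regs->softe = val;
#endif
}

The common code could then call regs_set_softe(regs, IRQS_ENABLED) and
PPC32 would get a no-op.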

> +
> +	__hard_irq_enable();

This doesn't exist on PPC32. Should we define __hard_irq_enable() as
arch_local_irq_enable() on PPC32?
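
i.e. something along these lines in hw_irq.h (a sketch, assuming PPC32
keeps no soft-mask state to reconcile):

#ifdef CONFIG_PPC32
#define __hard_irq_enable()	arch_local_irq_enable()
#endif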

> +
> +	ti_flags = current_thread_info()->flags;
> +	if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
> +		/*
> +		 * We use the return value of do_syscall_trace_enter() as the
> +		 * syscall number. If the syscall was rejected for any reason
> +		 * do_syscall_trace_enter() returns an invalid syscall number
> +		 * and the test against NR_syscalls will fail and the return
> +		 * value to be used is in regs->gpr[3].
> +		 */
> +		r0 = do_syscall_trace_enter(regs);
> +		if (unlikely(r0 >= NR_syscalls))
> +			return regs->gpr[3];
> +		r3 = regs->gpr[3];
> +		r4 = regs->gpr[4];
> +		r5 = regs->gpr[5];
> +		r6 = regs->gpr[6];
> +		r7 = regs->gpr[7];
> +		r8 = regs->gpr[8];
> +
> +	} else if (unlikely(r0 >= NR_syscalls)) {
> +		return -ENOSYS;
> +	}
> +
> +	/* May be faster to do array_index_nospec? */
> +	barrier_nospec();
> +
> +	if (unlikely(ti_flags & _TIF_32BIT)) {

Use is_compat_task() instead?
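
i.e. (a sketch; is_compat_task() comes from linux/compat.h and should
evaluate to 0 when CONFIG_COMPAT is disabled, so the compat branch can
be discarded at compile time):

	if (unlikely(is_compat_task()))
		f = (void *)compat_sys_call_table[r0];	/* 32bit arg truncation omitted */
	else
		f = (void *)sys_call_table[r0];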

> +		f = (void *)compat_sys_call_table[r0];
> +
> +		r3 &= 0x00000000ffffffffULL;
> +		r4 &= 0x00000000ffffffffULL;
> +		r5 &= 0x00000000ffffffffULL;
> +		r6 &= 0x00000000ffffffffULL;
> +		r7 &= 0x00000000ffffffffULL;
> +		r8 &= 0x00000000ffffffffULL;
> +
> +	} else {
> +		f = (void *)sys_call_table[r0];
> +	}
> +
> +	return f(r3, r4, r5, r6, r7, r8);
> +}
> +
> +/*
> + * This should be called after a syscall returns, with r3 the return value
> + * from the syscall. If this function returns non-zero, the system call
> + * exit assembly should additionally load all GPR registers and CTR and XER
> + * from the interrupt frame.
> + *
> + * The function graph tracer can not trace the return side of this function,
> + * because RI=0 and soft mask state is "unreconciled", so it is marked notrace.
> + */
> +notrace unsigned long syscall_exit_prepare(unsigned long r3,
> +					   struct pt_regs *regs)
> +{
> +	unsigned long *ti_flagsp = &current_thread_info()->flags;
> +	unsigned long ti_flags;
> +	unsigned long ret = 0;
> +
> +	regs->result = r3;
> +
> +	/* Check whether the syscall is issued inside a restartable sequence */
> +	rseq_syscall(regs);
> +
> +	ti_flags = *ti_flagsp;
> +
> +	if (unlikely(r3 >= (unsigned long)-MAX_ERRNO)) {
> +		if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
> +			r3 = -r3;
> +			regs->ccr |= 0x10000000; /* Set SO bit in CR */
> +		}
> +	}
> +
> +	if (unlikely(ti_flags & _TIF_PERSYSCALL_MASK)) {
> +		if (ti_flags & _TIF_RESTOREALL)
> +			ret = _TIF_RESTOREALL;
> +		else
> +			regs->gpr[3] = r3;
> +		clear_bits(_TIF_PERSYSCALL_MASK, ti_flagsp);
> +	} else {
> +		regs->gpr[3] = r3;
> +	}
> +
> +	if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
> +		do_syscall_trace_leave(regs);
> +		ret |= _TIF_RESTOREALL;
> +	}
> +
> +again:
> +	local_irq_disable();
> +	ti_flags = READ_ONCE(*ti_flagsp);
> +	while (unlikely(ti_flags & _TIF_USER_WORK_MASK)) {
> +		local_irq_enable();
> +		if (ti_flags & _TIF_NEED_RESCHED) {
> +			schedule();
> +		} else {
> +			/*
> +			 * SIGPENDING must restore signal handler function
> +			 * argument GPRs, and some non-volatiles (e.g., r1).
> +			 * Restore all for now. This could be made lighter.
> +			 */
> +			if (ti_flags & _TIF_SIGPENDING)
> +				ret |= _TIF_RESTOREALL;
> +			do_notify_resume(regs, ti_flags);
> +		}
> +		local_irq_disable();
> +		ti_flags = READ_ONCE(*ti_flagsp);
> +	}
> +
> +	if (IS_ENABLED(CONFIG_PPC_BOOK3S) && IS_ENABLED(CONFIG_PPC_FPU)) {
> +		unsigned long mathflags = MSR_FP;
> +
> +		if (IS_ENABLED(CONFIG_ALTIVEC))
> +			mathflags |= MSR_VEC;
> +
> +		if ((regs->msr & mathflags) != mathflags)
> +			restore_math(regs);
> +	}
> +
> +	/* This must be done with RI=1 because tracing may touch vmaps */
> +	trace_hardirqs_on();
> +
> +	/* This pattern matches prep_irq_for_idle */
> +	__hard_EE_RI_disable();
> +	if (unlikely(lazy_irq_pending())) {
> +		__hard_RI_enable();
> +		trace_hardirqs_off();
> +		local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
> +		local_irq_enable();
> +		/* Took an interrupt which may have more exit work to do. */
> +		goto again;
> +	}
> +	local_paca->irq_happened = 0;
> +	irq_soft_mask_set(IRQS_ENABLED);
> +
> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> +	local_paca->tm_scratch = regs->msr;
> +#endif
> +
> +	kuap_check_amr();
> +
> +	account_cpu_user_exit();
> +
> +	return ret;
> +}
> diff --git a/arch/powerpc/kernel/systbl.S b/arch/powerpc/kernel/systbl.S
> index 5b905a2f4e4d..d34276f3c495 100644
> --- a/arch/powerpc/kernel/systbl.S
> +++ b/arch/powerpc/kernel/systbl.S
> @@ -16,25 +16,22 @@
>   
>   #ifdef CONFIG_PPC64
>   	.p2align	3
> +#define __SYSCALL(nr, entry)	.8byte entry
> +#else
> +#define __SYSCALL(nr, entry)	.long entry
>   #endif
>   
>   .globl sys_call_table
>   sys_call_table:
>   #ifdef CONFIG_PPC64
> -#define __SYSCALL(nr, entry)	.8byte DOTSYM(entry)
>   #include <asm/syscall_table_64.h>
> -#undef __SYSCALL
>   #else
> -#define __SYSCALL(nr, entry)	.long entry
>   #include <asm/syscall_table_32.h>
> -#undef __SYSCALL
>   #endif
>   
>   #ifdef CONFIG_COMPAT
>   .globl compat_sys_call_table
>   compat_sys_call_table:
>   #define compat_sys_sigsuspend	sys_sigsuspend
> -#define __SYSCALL(nr, entry)	.8byte DOTSYM(entry)
>   #include <asm/syscall_table_c32.h>
> -#undef __SYSCALL
>   #endif
> 

Christophe

^ permalink raw reply	[flat|nested] 161+ messages in thread

* [PATCH v11 0/8] Disable compat cruft on ppc64le v11
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
@ 2020-03-19 12:19   ` Michal Suchanek
  2020-02-25 17:35 ` [PATCH v3 02/32] powerpc/64s/exception: Add GEN_COMMON macro that uses INT_DEFINE parameters Nicholas Piggin
                     ` (32 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Michal Suchanek @ 2020-03-19 12:19 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michal Suchanek, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Andy Shevchenko, Christophe Leroy,
	Thomas Gleixner, Arnd Bergmann, Nayna Jain, Eric Richter,
	Claudio Carvalho, Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

Less code means fewer bugs, so add a knob to skip the compat stuff.
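
The knob could look something like this in arch/powerpc/Kconfig (a
sketch of the idea rather than the exact hunk, which is in patch 6/8):

config COMPAT
	bool "Enable support for 32bit binaries"
	depends on PPC64
	default y if !CPU_LITTLE_ENDIAN
	select COMPAT_BINFMT_ELF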

Changes in v2: saner CONFIG_COMPAT ifdefs
Changes in v3:
 - change llseek to 32bit instead of building it unconditionally in fs
 - clean up the makefile conditionals
 - remove some ifdefs or convert to IS_ENABLED where possible
Changes in v4:
 - cleanup is_32bit_task and current_is_64bit
 - more makefile cleanup
Changes in v5:
 - more current_is_64bit cleanup
 - split off callchain.c 32bit and 64bit parts
Changes in v6:
 - cleanup makefile after split
 - consolidate read_user_stack_32
 - fix some checkpatch warnings
Changes in v7:
 - add back __ARCH_WANT_SYS_LLSEEK to fix build with llseek
 - remove leftover hunk
 - add review tags
Changes in v8:
 - consolidate valid_user_sp to fix it in the split callchain.c
 - fix build errors/warnings with PPC64 !COMPAT and PPC32
Changes in v9:
 - remove current_is_64bit()
Changes in v10:
 - rebase, sent together with the syscall cleanup
Changes in v11:
 - rebase
 - add MAINTAINERS pattern for ppc perf

Michal Suchanek (8):
  powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
  powerpc: move common register copy functions from signal_32.c to
    signal.c
  powerpc/perf: consolidate read_user_stack_32
  powerpc/perf: consolidate valid_user_sp
  powerpc/64: make buildable without CONFIG_COMPAT
  powerpc/64: Make COMPAT user-selectable disabled on littleendian by
    default.
  powerpc/perf: split callchain.c by bitness
  MAINTAINERS: perf: Add pattern that matches ppc perf to the perf
    entry.

 MAINTAINERS                            |   2 +
 arch/powerpc/Kconfig                   |   5 +-
 arch/powerpc/include/asm/thread_info.h |   4 +-
 arch/powerpc/include/asm/unistd.h      |   1 +
 arch/powerpc/kernel/Makefile           |   6 +-
 arch/powerpc/kernel/entry_64.S         |   2 +
 arch/powerpc/kernel/signal.c           | 144 +++++++++-
 arch/powerpc/kernel/signal_32.c        | 140 ----------
 arch/powerpc/kernel/syscall_64.c       |   6 +-
 arch/powerpc/kernel/vdso.c             |   3 +-
 arch/powerpc/perf/Makefile             |   5 +-
 arch/powerpc/perf/callchain.c          | 356 +------------------------
 arch/powerpc/perf/callchain.h          |  20 ++
 arch/powerpc/perf/callchain_32.c       | 196 ++++++++++++++
 arch/powerpc/perf/callchain_64.c       | 174 ++++++++++++
 fs/read_write.c                        |   3 +-
 16 files changed, 556 insertions(+), 511 deletions(-)
 create mode 100644 arch/powerpc/perf/callchain.h
 create mode 100644 arch/powerpc/perf/callchain_32.c
 create mode 100644 arch/powerpc/perf/callchain_64.c

-- 
2.23.0


^ permalink raw reply	[flat|nested] 161+ messages in thread

* [PATCH v11 1/8] powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
  2020-03-19 12:19   ` Michal Suchanek
@ 2020-03-19 12:19     ` Michal Suchanek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchanek @ 2020-03-19 12:19 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michal Suchanek, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Andy Shevchenko, Christophe Leroy,
	Thomas Gleixner, Arnd Bergmann, Nayna Jain, Eric Richter,
	Claudio Carvalho, Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

This partially reverts commit caf6f9c8a326 ("asm-generic: Remove
unneeded __ARCH_WANT_SYS_LLSEEK macro")

When CONFIG_COMPAT is disabled on ppc64 the kernel does not build.

There is resistance to both removing the llseek syscall from the 64bit
syscall tables and building the llseek interface unconditionally.

Link: https://lore.kernel.org/lkml/20190828151552.GA16855@infradead.org/
Link: https://lore.kernel.org/lkml/20190829214319.498c7de2@naga/

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
---
v7: new patch
---
 arch/powerpc/include/asm/unistd.h | 1 +
 fs/read_write.c                   | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index b0720c7c3fcf..700fcdac2e3c 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -31,6 +31,7 @@
 #define __ARCH_WANT_SYS_SOCKETCALL
 #define __ARCH_WANT_SYS_FADVISE64
 #define __ARCH_WANT_SYS_GETPGRP
+#define __ARCH_WANT_SYS_LLSEEK
 #define __ARCH_WANT_SYS_NICE
 #define __ARCH_WANT_SYS_OLD_GETRLIMIT
 #define __ARCH_WANT_SYS_OLD_UNAME
diff --git a/fs/read_write.c b/fs/read_write.c
index 59d819c5b92e..bbfa9b12b15e 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -331,7 +331,8 @@ COMPAT_SYSCALL_DEFINE3(lseek, unsigned int, fd, compat_off_t, offset, unsigned i
 }
 #endif
 
-#if !defined(CONFIG_64BIT) || defined(CONFIG_COMPAT)
+#if !defined(CONFIG_64BIT) || defined(CONFIG_COMPAT) || \
+	defined(__ARCH_WANT_SYS_LLSEEK)
 SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
 		unsigned long, offset_low, loff_t __user *, result,
 		unsigned int, whence)
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v11 2/8] powerpc: move common register copy functions from signal_32.c to signal.c
  2020-03-19 12:19   ` Michal Suchanek
@ 2020-03-19 12:19     ` Michal Suchanek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchanek @ 2020-03-19 12:19 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michal Suchanek, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Andy Shevchenko, Christophe Leroy,
	Thomas Gleixner, Arnd Bergmann, Nayna Jain, Eric Richter,
	Claudio Carvalho, Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

These functions are required for 64bit as well.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/kernel/signal.c    | 141 ++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/signal_32.c | 140 -------------------------------
 2 files changed, 141 insertions(+), 140 deletions(-)

diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index d215f9554553..4b0152108f61 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -18,12 +18,153 @@
 #include <linux/syscalls.h>
 #include <asm/hw_breakpoint.h>
 #include <linux/uaccess.h>
+#include <asm/switch_to.h>
 #include <asm/unistd.h>
 #include <asm/debug.h>
 #include <asm/tm.h>
 
 #include "signal.h"
 
+#ifdef CONFIG_VSX
+unsigned long copy_fpr_to_user(void __user *to,
+			       struct task_struct *task)
+{
+	u64 buf[ELF_NFPREG];
+	int i;
+
+	/* save FPR copy to local buffer then write to the thread_struct */
+	for (i = 0; i < (ELF_NFPREG - 1) ; i++)
+		buf[i] = task->thread.TS_FPR(i);
+	buf[i] = task->thread.fp_state.fpscr;
+	return __copy_to_user(to, buf, ELF_NFPREG * sizeof(double));
+}
+
+unsigned long copy_fpr_from_user(struct task_struct *task,
+				 void __user *from)
+{
+	u64 buf[ELF_NFPREG];
+	int i;
+
+	if (__copy_from_user(buf, from, ELF_NFPREG * sizeof(double)))
+		return 1;
+	for (i = 0; i < (ELF_NFPREG - 1) ; i++)
+		task->thread.TS_FPR(i) = buf[i];
+	task->thread.fp_state.fpscr = buf[i];
+
+	return 0;
+}
+
+unsigned long copy_vsx_to_user(void __user *to,
+			       struct task_struct *task)
+{
+	u64 buf[ELF_NVSRHALFREG];
+	int i;
+
+	/* save FPR copy to local buffer then write to the thread_struct */
+	for (i = 0; i < ELF_NVSRHALFREG; i++)
+		buf[i] = task->thread.fp_state.fpr[i][TS_VSRLOWOFFSET];
+	return __copy_to_user(to, buf, ELF_NVSRHALFREG * sizeof(double));
+}
+
+unsigned long copy_vsx_from_user(struct task_struct *task,
+				 void __user *from)
+{
+	u64 buf[ELF_NVSRHALFREG];
+	int i;
+
+	if (__copy_from_user(buf, from, ELF_NVSRHALFREG * sizeof(double)))
+		return 1;
+	for (i = 0; i < ELF_NVSRHALFREG ; i++)
+		task->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i];
+	return 0;
+}
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+unsigned long copy_ckfpr_to_user(void __user *to,
+				  struct task_struct *task)
+{
+	u64 buf[ELF_NFPREG];
+	int i;
+
+	/* save FPR copy to local buffer then write to the thread_struct */
+	for (i = 0; i < (ELF_NFPREG - 1) ; i++)
+		buf[i] = task->thread.TS_CKFPR(i);
+	buf[i] = task->thread.ckfp_state.fpscr;
+	return __copy_to_user(to, buf, ELF_NFPREG * sizeof(double));
+}
+
+unsigned long copy_ckfpr_from_user(struct task_struct *task,
+					  void __user *from)
+{
+	u64 buf[ELF_NFPREG];
+	int i;
+
+	if (__copy_from_user(buf, from, ELF_NFPREG * sizeof(double)))
+		return 1;
+	for (i = 0; i < (ELF_NFPREG - 1) ; i++)
+		task->thread.TS_CKFPR(i) = buf[i];
+	task->thread.ckfp_state.fpscr = buf[i];
+
+	return 0;
+}
+
+unsigned long copy_ckvsx_to_user(void __user *to,
+				  struct task_struct *task)
+{
+	u64 buf[ELF_NVSRHALFREG];
+	int i;
+
+	/* save FPR copy to local buffer then write to the thread_struct */
+	for (i = 0; i < ELF_NVSRHALFREG; i++)
+		buf[i] = task->thread.ckfp_state.fpr[i][TS_VSRLOWOFFSET];
+	return __copy_to_user(to, buf, ELF_NVSRHALFREG * sizeof(double));
+}
+
+unsigned long copy_ckvsx_from_user(struct task_struct *task,
+					  void __user *from)
+{
+	u64 buf[ELF_NVSRHALFREG];
+	int i;
+
+	if (__copy_from_user(buf, from, ELF_NVSRHALFREG * sizeof(double)))
+		return 1;
+	for (i = 0; i < ELF_NVSRHALFREG ; i++)
+		task->thread.ckfp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i];
+	return 0;
+}
+#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
+#else
+inline unsigned long copy_fpr_to_user(void __user *to,
+				      struct task_struct *task)
+{
+	return __copy_to_user(to, task->thread.fp_state.fpr,
+			      ELF_NFPREG * sizeof(double));
+}
+
+inline unsigned long copy_fpr_from_user(struct task_struct *task,
+					void __user *from)
+{
+	return __copy_from_user(task->thread.fp_state.fpr, from,
+			      ELF_NFPREG * sizeof(double));
+}
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+inline unsigned long copy_ckfpr_to_user(void __user *to,
+					 struct task_struct *task)
+{
+	return __copy_to_user(to, task->thread.ckfp_state.fpr,
+			      ELF_NFPREG * sizeof(double));
+}
+
+inline unsigned long copy_ckfpr_from_user(struct task_struct *task,
+						 void __user *from)
+{
+	return __copy_from_user(task->thread.ckfp_state.fpr, from,
+				ELF_NFPREG * sizeof(double));
+}
+#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
+#endif
+
 /* Log an error when sending an unhandled signal to a process. Controlled
  * through debug.exception-trace sysctl.
  */
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 1b090a76b444..4f96d29a22bf 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -235,146 +235,6 @@ struct rt_sigframe {
 	int			abigap[56];
 };
 
-#ifdef CONFIG_VSX
-unsigned long copy_fpr_to_user(void __user *to,
-			       struct task_struct *task)
-{
-	u64 buf[ELF_NFPREG];
-	int i;
-
-	/* save FPR copy to local buffer then write to the thread_struct */
-	for (i = 0; i < (ELF_NFPREG - 1) ; i++)
-		buf[i] = task->thread.TS_FPR(i);
-	buf[i] = task->thread.fp_state.fpscr;
-	return __copy_to_user(to, buf, ELF_NFPREG * sizeof(double));
-}
-
-unsigned long copy_fpr_from_user(struct task_struct *task,
-				 void __user *from)
-{
-	u64 buf[ELF_NFPREG];
-	int i;
-
-	if (__copy_from_user(buf, from, ELF_NFPREG * sizeof(double)))
-		return 1;
-	for (i = 0; i < (ELF_NFPREG - 1) ; i++)
-		task->thread.TS_FPR(i) = buf[i];
-	task->thread.fp_state.fpscr = buf[i];
-
-	return 0;
-}
-
-unsigned long copy_vsx_to_user(void __user *to,
-			       struct task_struct *task)
-{
-	u64 buf[ELF_NVSRHALFREG];
-	int i;
-
-	/* save FPR copy to local buffer then write to the thread_struct */
-	for (i = 0; i < ELF_NVSRHALFREG; i++)
-		buf[i] = task->thread.fp_state.fpr[i][TS_VSRLOWOFFSET];
-	return __copy_to_user(to, buf, ELF_NVSRHALFREG * sizeof(double));
-}
-
-unsigned long copy_vsx_from_user(struct task_struct *task,
-				 void __user *from)
-{
-	u64 buf[ELF_NVSRHALFREG];
-	int i;
-
-	if (__copy_from_user(buf, from, ELF_NVSRHALFREG * sizeof(double)))
-		return 1;
-	for (i = 0; i < ELF_NVSRHALFREG ; i++)
-		task->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i];
-	return 0;
-}
-
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-unsigned long copy_ckfpr_to_user(void __user *to,
-				  struct task_struct *task)
-{
-	u64 buf[ELF_NFPREG];
-	int i;
-
-	/* save FPR copy to local buffer then write to the thread_struct */
-	for (i = 0; i < (ELF_NFPREG - 1) ; i++)
-		buf[i] = task->thread.TS_CKFPR(i);
-	buf[i] = task->thread.ckfp_state.fpscr;
-	return __copy_to_user(to, buf, ELF_NFPREG * sizeof(double));
-}
-
-unsigned long copy_ckfpr_from_user(struct task_struct *task,
-					  void __user *from)
-{
-	u64 buf[ELF_NFPREG];
-	int i;
-
-	if (__copy_from_user(buf, from, ELF_NFPREG * sizeof(double)))
-		return 1;
-	for (i = 0; i < (ELF_NFPREG - 1) ; i++)
-		task->thread.TS_CKFPR(i) = buf[i];
-	task->thread.ckfp_state.fpscr = buf[i];
-
-	return 0;
-}
-
-unsigned long copy_ckvsx_to_user(void __user *to,
-				  struct task_struct *task)
-{
-	u64 buf[ELF_NVSRHALFREG];
-	int i;
-
-	/* save FPR copy to local buffer then write to the thread_struct */
-	for (i = 0; i < ELF_NVSRHALFREG; i++)
-		buf[i] = task->thread.ckfp_state.fpr[i][TS_VSRLOWOFFSET];
-	return __copy_to_user(to, buf, ELF_NVSRHALFREG * sizeof(double));
-}
-
-unsigned long copy_ckvsx_from_user(struct task_struct *task,
-					  void __user *from)
-{
-	u64 buf[ELF_NVSRHALFREG];
-	int i;
-
-	if (__copy_from_user(buf, from, ELF_NVSRHALFREG * sizeof(double)))
-		return 1;
-	for (i = 0; i < ELF_NVSRHALFREG ; i++)
-		task->thread.ckfp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i];
-	return 0;
-}
-#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
-#else
-inline unsigned long copy_fpr_to_user(void __user *to,
-				      struct task_struct *task)
-{
-	return __copy_to_user(to, task->thread.fp_state.fpr,
-			      ELF_NFPREG * sizeof(double));
-}
-
-inline unsigned long copy_fpr_from_user(struct task_struct *task,
-					void __user *from)
-{
-	return __copy_from_user(task->thread.fp_state.fpr, from,
-			      ELF_NFPREG * sizeof(double));
-}
-
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-inline unsigned long copy_ckfpr_to_user(void __user *to,
-					 struct task_struct *task)
-{
-	return __copy_to_user(to, task->thread.ckfp_state.fpr,
-			      ELF_NFPREG * sizeof(double));
-}
-
-inline unsigned long copy_ckfpr_from_user(struct task_struct *task,
-						 void __user *from)
-{
-	return __copy_from_user(task->thread.ckfp_state.fpr, from,
-				ELF_NFPREG * sizeof(double));
-}
-#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
-#endif
-
 /*
  * Save the current user registers on the user stack.
  * We only save the altivec/spe registers if the process has used
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

-#endif
-
 /*
  * Save the current user registers on the user stack.
  * We only save the altivec/spe registers if the process has used
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread
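
The copy_fpr_to_user()/copy_vsx_to_user() helpers moved by the patch
above all follow the same marshalling pattern: gather the registers
(with fpscr riding in the last slot) into a flat on-stack buffer so
that a single __copy_to_user() crosses the user boundary. A minimal
userspace sketch of that pattern follows; NFPREG, struct thread_fp and
copy_fpr_to_buf() are made-up stand-ins for the kernel names, and
memcpy() stands in for __copy_to_user():

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NFPREG 33	/* 32 FPRs plus fpscr, mirroring ELF_NFPREG */

struct thread_fp {	/* stand-in for the kernel's thread.fp_state */
	uint64_t fpr[NFPREG - 1];
	uint64_t fpscr;
};

static unsigned long copy_fpr_to_buf(uint64_t *to, const struct thread_fp *t)
{
	uint64_t buf[NFPREG];
	int i;

	/* gather FPRs into the local buffer, then append fpscr */
	for (i = 0; i < NFPREG - 1; i++)
		buf[i] = t->fpr[i];
	buf[i] = t->fpscr;

	/* in the kernel this is one __copy_to_user() call */
	memcpy(to, buf, sizeof(buf));
	return 0;
}

int main(void)
{
	struct thread_fp t = { .fpscr = 0x5a };
	uint64_t out[NFPREG];

	copy_fpr_to_buf(out, &t);
	printf("fpscr slot: 0x%llx\n", (unsigned long long)out[NFPREG - 1]);
	return 0;
}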

* [PATCH v11 3/8] powerpc/perf: consolidate read_user_stack_32
  2020-03-19 12:19   ` Michal Suchanek
@ 2020-03-19 12:19     ` Michal Suchanek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchanek @ 2020-03-19 12:19 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michal Suchanek, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Andy Shevchenko, Christophe Leroy,
	Thomas Gleixner, Arnd Bergmann, Nayna Jain, Eric Richter,
	Claudio Carvalho, Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

There are two almost identical copies of read_user_stack_32() for 32bit
and 64bit.

The function is used only in 32bit code, which will be split out in the
next patch, so consolidate it into a single function.
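
The consolidated function keys the 64bit slow fallback on
IS_ENABLED(CONFIG_PPC64) instead of an #ifdef, so both paths always
compile and the dead one is discarded by the optimizer. A rough
userspace sketch of that pattern, where CONFIG_PPC64, IS_ENABLED() and
read_slow() are simplified stand-ins rather than the kernel
definitions:

#include <stdio.h>

#define CONFIG_PPC64 1		/* assumption: a 64bit build */
#define IS_ENABLED(cfg) (cfg)	/* the kernel macro is more elaborate */

static int read_slow(int *ret)	/* stands in for read_user_stack_slow() */
{
	*ret = 42;
	return 0;
}

static int read_word(int *ret)
{
	int rc = -1;		/* pretend the fast probe failed */

	if (IS_ENABLED(CONFIG_PPC64) && rc)
		return read_slow(ret);	/* dead code when CONFIG_PPC64 is 0 */

	return rc;
}

int main(void)
{
	int val;

	if (!read_word(&val))
		printf("read %d via the slow path\n", val);
	return 0;
}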

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
v6:  new patch
v8:  move the consolidated function out of the ifdef block.
v11: rebase on top of def0bfdbd603
---
 arch/powerpc/perf/callchain.c | 48 +++++++++++++++++------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index cbc251981209..c9a78c6e4361 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -161,18 +161,6 @@ static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
 	return read_user_stack_slow(ptr, ret, 8);
 }
 
-static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
-{
-	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
-	    ((unsigned long)ptr & 3))
-		return -EFAULT;
-
-	if (!probe_user_read(ret, ptr, sizeof(*ret)))
-		return 0;
-
-	return read_user_stack_slow(ptr, ret, 4);
-}
-
 static inline int valid_user_sp(unsigned long sp, int is_64)
 {
 	if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 32)
@@ -277,19 +265,9 @@ static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
 }
 
 #else  /* CONFIG_PPC64 */
-/*
- * On 32-bit we just access the address and let hash_page create a
- * HPTE if necessary, so there is no need to fall back to reading
- * the page tables.  Since this is called at interrupt level,
- * do_page_fault() won't treat a DSI as a page fault.
- */
-static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
+static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
 {
-	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
-	    ((unsigned long)ptr & 3))
-		return -EFAULT;
-
-	return probe_user_read(ret, ptr, sizeof(*ret));
+	return 0;
 }
 
 static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
@@ -312,6 +290,28 @@ static inline int valid_user_sp(unsigned long sp, int is_64)
 
 #endif /* CONFIG_PPC64 */
 
+/*
+ * On 32-bit we just access the address and let hash_page create a
+ * HPTE if necessary, so there is no need to fall back to reading
+ * the page tables.  Since this is called at interrupt level,
+ * do_page_fault() won't treat a DSI as a page fault.
+ */
+static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
+{
+	int rc;
+
+	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
+	    ((unsigned long)ptr & 3))
+		return -EFAULT;
+
+	rc = probe_user_read(ret, ptr, sizeof(*ret));
+
+	if (IS_ENABLED(CONFIG_PPC64) && rc)
+		return read_user_stack_slow(ptr, ret, 4);
+
+	return rc;
+}
+
 /*
  * Layout for non-RT signal frames
  */
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v11 4/8] powerpc/perf: consolidate valid_user_sp
  2020-03-19 12:19   ` Michal Suchanek
@ 2020-03-19 12:19     ` Michal Suchanek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchanek @ 2020-03-19 12:19 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michal Suchanek, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Andy Shevchenko, Christophe Leroy,
	Thomas Gleixner, Arnd Bergmann, Nayna Jain, Eric Richter,
	Claudio Carvalho, Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

Merge the 32bit and 64bit versions of valid_user_sp().

Halve the check constants on 32bit.

Use STACK_TOP since it is already defined.

Passing is_64 is now redundant, since is_32bit_task() already
determines which callchain variant is used. Use STACK_TOP and
is_32bit_task() directly.

This removes a page from the valid 32bit area on 64bit:
 #define TASK_SIZE_USER32 (0x0000000100000000UL - (1 * PAGE_SIZE))
 #define STACK_TOP_USER32 TASK_SIZE_USER32
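
The merged helper is short enough to exercise standalone; below is a
self-contained sketch with STACK_TOP and is_32bit_task() replaced by
userspace stand-ins (and a 64bit host assumed) so the halved alignment
mask and frame slack can be seen directly:

#include <stdbool.h>
#include <stdio.h>

#define STACK_TOP 0x100000000UL	/* stand-in; assumes a 64bit host */

static bool is_32bit_task(void)	/* stand-in: pretend a 32bit task */
{
	return true;
}

static int valid_user_sp(unsigned long sp)
{
	bool is_64 = !is_32bit_task();

	/* alignment mask and minimum frame slack are halved for 32bit */
	if (!sp || (sp & (is_64 ? 7 : 3)) || sp > STACK_TOP - (is_64 ? 32 : 16))
		return 0;
	return 1;
}

int main(void)
{
	printf("%d\n", valid_user_sp(0xff000000UL));	/* 1: aligned, in range */
	printf("%d\n", valid_user_sp(0xff000002UL));	/* 0: not 4-byte aligned */
	return 0;
}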

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
---
v8: new patch
v11: simplify by using is_32bit_task()
---
 arch/powerpc/perf/callchain.c | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index c9a78c6e4361..194c7fd933e6 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -102,6 +102,15 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
 	}
 }
 
+static inline int valid_user_sp(unsigned long sp)
+{
+	bool is_64 = !is_32bit_task();
+
+	if (!sp || (sp & (is_64 ? 7 : 3)) || sp > STACK_TOP - (is_64 ? 32 : 16))
+		return 0;
+	return 1;
+}
+
 #ifdef CONFIG_PPC64
 /*
  * On 64-bit we don't want to invoke hash_page on user addresses from
@@ -161,13 +170,6 @@ static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
 	return read_user_stack_slow(ptr, ret, 8);
 }
 
-static inline int valid_user_sp(unsigned long sp, int is_64)
-{
-	if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 32)
-		return 0;
-	return 1;
-}
-
 /*
  * 64-bit user processes use the same stack frame for RT and non-RT signals.
  */
@@ -226,7 +228,7 @@ static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
 
 	while (entry->nr < entry->max_stack) {
 		fp = (unsigned long __user *) sp;
-		if (!valid_user_sp(sp, 1) || read_user_stack_64(fp, &next_sp))
+		if (!valid_user_sp(sp) || read_user_stack_64(fp, &next_sp))
 			return;
 		if (level > 0 && read_user_stack_64(&fp[2], &next_ip))
 			return;
@@ -275,13 +277,6 @@ static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry
 {
 }
 
-static inline int valid_user_sp(unsigned long sp, int is_64)
-{
-	if (!sp || (sp & 7) || sp > TASK_SIZE - 32)
-		return 0;
-	return 1;
-}
-
 #define __SIGNAL_FRAMESIZE32	__SIGNAL_FRAMESIZE
 #define sigcontext32		sigcontext
 #define mcontext32		mcontext
@@ -423,7 +418,7 @@ static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
 
 	while (entry->nr < entry->max_stack) {
 		fp = (unsigned int __user *) (unsigned long) sp;
-		if (!valid_user_sp(sp, 0) || read_user_stack_32(fp, &next_sp))
+		if (!valid_user_sp(sp) || read_user_stack_32(fp, &next_sp))
 			return;
 		if (level > 0 && read_user_stack_32(&fp[1], &next_ip))
 			return;
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v11 5/8] powerpc/64: make buildable without CONFIG_COMPAT
  2020-03-19 12:19   ` Michal Suchanek
@ 2020-03-19 12:19     ` Michal Suchanek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchanek @ 2020-03-19 12:19 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michal Suchanek, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Andy Shevchenko, Christophe Leroy,
	Thomas Gleixner, Arnd Bergmann, Nayna Jain, Eric Richter,
	Claudio Carvalho, Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

There are numerous references to 32bit functions in generic and 64bit
code, so ifdef them out.
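
What makes this workable is that is_32bit_task() stays meaningful in
every configuration: a per-task flag test under COMPAT and a build-time
constant otherwise, so compat-only branches compile away without an
#ifdef at each call site. A rough userspace sketch of that folding,
with the CONFIG_* values, IS_ENABLED() and tif_32bit as simplified
stand-ins:

#include <stdbool.h>
#include <stdio.h>

#define CONFIG_COMPAT 0		/* assumption: 64bit build without compat */
#define CONFIG_PPC32  0
#define IS_ENABLED(cfg) (cfg)	/* simplified; the kernel macro differs */

static bool tif_32bit;		/* stands in for test_thread_flag(TIF_32BIT) */

static bool is_32bit_task(void)
{
	if (IS_ENABLED(CONFIG_COMPAT))
		return tif_32bit;		/* runtime test, per task */
	return IS_ENABLED(CONFIG_PPC32);	/* constant: false in this build */
}

int main(void)
{
	if (is_32bit_task())
		puts("compat path");		/* eliminated by the compiler here */
	else
		puts("64bit-only path");
	return 0;
}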

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
---
v2:
- fix 32bit ifdef condition in signal.c
- simplify the compat ifdef condition in vdso.c - 64bit is redundant
- simplify the compat ifdef condition in callchain.c - 64bit is redundant
v3:
- use IS_ENABLED and maybe_unused where possible
- do not ifdef declarations
- clean up Makefile
v4:
- further makefile cleanup
- simplify is_32bit_task conditions
- avoid ifdef in condition by using return
v5:
- avoid unreachable code on 32bit
- make is_current_64bit constant on !COMPAT
- add stub perf_callchain_user_32 to avoid some ifdefs
v6:
- consolidate current_is_64bit
v7:
- remove leftover perf_callchain_user_32 stub from previous series version
v8:
- fix build again - too trigger-happy with stub removal
- remove a vdso.c hunk that causes warning according to kbuild test robot
v9:
- removed current_is_64bit in previous patch
v10:
- rebase on top of 70ed86f4de5bd
---
 arch/powerpc/include/asm/thread_info.h | 4 ++--
 arch/powerpc/kernel/Makefile           | 6 +++---
 arch/powerpc/kernel/entry_64.S         | 2 ++
 arch/powerpc/kernel/signal.c           | 3 +--
 arch/powerpc/kernel/syscall_64.c       | 6 ++----
 arch/powerpc/kernel/vdso.c             | 3 ++-
 arch/powerpc/perf/callchain.c          | 8 +++++++-
 7 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index a2270749b282..ca6c97025704 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -162,10 +162,10 @@ static inline bool test_thread_local_flags(unsigned int flags)
 	return (ti->local_flags & flags) != 0;
 }
 
-#ifdef CONFIG_PPC64
+#ifdef CONFIG_COMPAT
 #define is_32bit_task()	(test_thread_flag(TIF_32BIT))
 #else
-#define is_32bit_task()	(1)
+#define is_32bit_task()	(IS_ENABLED(CONFIG_PPC32))
 #endif
 
 #if defined(CONFIG_PPC64)
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 5700231a8988..98a1c143b613 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -42,16 +42,16 @@ CFLAGS_btext.o += -DDISABLE_BRANCH_PROFILING
 endif
 
 obj-y				:= cputable.o ptrace.o syscalls.o \
-				   irq.o align.o signal_32.o pmc.o vdso.o \
+				   irq.o align.o signal_$(BITS).o pmc.o vdso.o \
 				   process.o systbl.o idle.o \
 				   signal.o sysfs.o cacheinfo.o time.o \
 				   prom.o traps.o setup-common.o \
 				   udbg.o misc.o io.o misc_$(BITS).o \
 				   of_platform.o prom_parse.o
-obj-$(CONFIG_PPC64)		+= setup_64.o sys_ppc32.o \
-				   signal_64.o ptrace32.o \
+obj-$(CONFIG_PPC64)		+= setup_64.o \
 				   paca.o nvram_64.o firmware.o note.o \
 				   syscall_64.o
+obj-$(CONFIG_COMPAT)		+= sys_ppc32.o ptrace32.o signal_32.o
 obj-$(CONFIG_VDSO32)		+= vdso32/
 obj-$(CONFIG_PPC_WATCHDOG)	+= watchdog.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= hw_breakpoint.o
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 4c0d0400e93d..fe1421e08f09 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -52,8 +52,10 @@
 SYS_CALL_TABLE:
 	.tc sys_call_table[TC],sys_call_table
 
+#ifdef CONFIG_COMPAT
 COMPAT_SYS_CALL_TABLE:
 	.tc compat_sys_call_table[TC],compat_sys_call_table
+#endif
 
 /* This value is used to mark exception frames on the stack. */
 exception_marker:
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index 4b0152108f61..a264989626fd 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -247,7 +247,6 @@ static void do_signal(struct task_struct *tsk)
 	sigset_t *oldset = sigmask_to_save();
 	struct ksignal ksig = { .sig = 0 };
 	int ret;
-	int is32 = is_32bit_task();
 
 	BUG_ON(tsk != current);
 
@@ -277,7 +276,7 @@ static void do_signal(struct task_struct *tsk)
 
 	rseq_signal_deliver(&ksig, tsk->thread.regs);
 
-	if (is32) {
+	if (is_32bit_task()) {
         	if (ksig.ka.sa.sa_flags & SA_SIGINFO)
 			ret = handle_rt_signal32(&ksig, oldset, tsk);
 		else
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index 87d95b455b83..2dcbfe38f5ac 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -24,7 +24,6 @@ notrace long system_call_exception(long r3, long r4, long r5,
 				   long r6, long r7, long r8,
 				   unsigned long r0, struct pt_regs *regs)
 {
-	unsigned long ti_flags;
 	syscall_fn f;
 
 	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
@@ -68,8 +67,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
 
 	local_irq_enable();
 
-	ti_flags = current_thread_info()->flags;
-	if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
+	if (unlikely(current_thread_info()->flags & _TIF_SYSCALL_DOTRACE)) {
 		/*
 		 * We use the return value of do_syscall_trace_enter() as the
 		 * syscall number. If the syscall was rejected for any reason
@@ -94,7 +92,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
 	/* May be faster to do array_index_nospec? */
 	barrier_nospec();
 
-	if (unlikely(ti_flags & _TIF_32BIT)) {
+	if (unlikely(is_32bit_task())) {
 		f = (void *)compat_sys_call_table[r0];
 
 		r3 &= 0x00000000ffffffffULL;
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index b9a108411c0d..77da3b7d304d 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -656,7 +656,8 @@ static void __init vdso_setup_syscall_map(void)
 		if (sys_call_table[i] != sys_ni_syscall)
 			vdso_data->syscall_map_64[i >> 5] |=
 				0x80000000UL >> (i & 0x1f);
-		if (compat_sys_call_table[i] != sys_ni_syscall)
+		if (IS_ENABLED(CONFIG_COMPAT) &&
+		    compat_sys_call_table[i] != sys_ni_syscall)
 			vdso_data->syscall_map_32[i >> 5] |=
 				0x80000000UL >> (i & 0x1f);
 #else /* CONFIG_PPC64 */
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index 194c7fd933e6..8a274bd523b1 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -15,7 +15,7 @@
 #include <asm/sigcontext.h>
 #include <asm/ucontext.h>
 #include <asm/vdso.h>
-#ifdef CONFIG_PPC64
+#ifdef CONFIG_COMPAT
 #include "../kernel/ppc32.h"
 #endif
 #include <asm/pte-walk.h>
@@ -285,6 +285,7 @@ static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry
 
 #endif /* CONFIG_PPC64 */
 
+#if defined(CONFIG_PPC32) || defined(CONFIG_COMPAT)
 /*
  * On 32-bit we just access the address and let hash_page create a
  * HPTE if necessary, so there is no need to fall back to reading
@@ -448,6 +449,11 @@ static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
 		sp = next_sp;
 	}
 }
+#else /* 32bit */
+static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
+				   struct pt_regs *regs)
+{}
+#endif /* 32bit */
 
 void
 perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v11 6/8] powerpc/64: Make COMPAT user-selectable disabled on littleendian by default.
  2020-03-19 12:19   ` Michal Suchanek
@ 2020-03-19 12:19     ` Michal Suchanek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchanek @ 2020-03-19 12:19 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michal Suchanek, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Andy Shevchenko, Christophe Leroy,
	Thomas Gleixner, Arnd Bergmann, Nayna Jain, Eric Richter,
	Claudio Carvalho, Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

On bigendian ppc64 it is common to have 32bit legacy binaries, but much
less so on littleendian.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
v3: make configurable
---
 arch/powerpc/Kconfig | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 497b7d0b2d7e..29d00b3959b9 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -264,8 +264,9 @@ config PANIC_TIMEOUT
 	default 180
 
 config COMPAT
-	bool
-	default y if PPC64
+	bool "Enable support for 32bit binaries"
+	depends on PPC64
+	default y if !CPU_LITTLE_ENDIAN
 	select COMPAT_BINFMT_ELF
 	select ARCH_WANT_OLD_COMPAT_IPC
 	select COMPAT_OLD_SIGACTION
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v11 7/8] powerpc/perf: split callchain.c by bitness
  2020-03-19 12:19   ` Michal Suchanek
@ 2020-03-19 12:19     ` Michal Suchanek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchanek @ 2020-03-19 12:19 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michal Suchanek, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Andy Shevchenko, Christophe Leroy,
	Thomas Gleixner, Arnd Bergmann, Nayna Jain, Eric Richter,
	Claudio Carvalho, Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

Building callchain.c with !COMPAT proved quite ugly with all the
defines. Splitting out the 32bit and 64bit parts looks better.

No code change intended.
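
One detail worth noting about the split: helpers shared by the two new
files move into callchain.h, and anything defined there (such as
valid_user_sp()) must be static inline to avoid duplicate symbols once
both objects are linked. A compressed, single-file sketch of that
layout; the section comments stand in for the real files and valid_sp()
is a simplified placeholder, not the actual check:

#include <stdbool.h>
#include <stdio.h>

/* callchain.h, in miniature: a helper both bitness variants include */
static inline bool valid_sp(unsigned long sp)
{
	return sp && !(sp & 7);
}

/* the callchain_32.c side */
static void walk_32(unsigned long sp)
{
	printf("32bit walker: sp %s\n", valid_sp(sp) ? "ok" : "bad");
}

/* the callchain_64.c side */
static void walk_64(unsigned long sp)
{
	printf("64bit walker: sp %s\n", valid_sp(sp) ? "ok" : "bad");
}

int main(void)
{
	walk_32(0x1000);
	walk_64(0x1001);
	return 0;
}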

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
---
v6:
 - move current_is_64bit consolidation to earlier patch
 - move defines to the top of callchain_32.c
 - Makefile cleanup
v8:
 - fix valid_user_sp
v11:
 - rebase on top of def0bfdbd603
---
 arch/powerpc/perf/Makefile       |   5 +-
 arch/powerpc/perf/callchain.c    | 357 +------------------------------
 arch/powerpc/perf/callchain.h    |  20 ++
 arch/powerpc/perf/callchain_32.c | 196 +++++++++++++++++
 arch/powerpc/perf/callchain_64.c | 174 +++++++++++++++
 5 files changed, 395 insertions(+), 357 deletions(-)
 create mode 100644 arch/powerpc/perf/callchain.h
 create mode 100644 arch/powerpc/perf/callchain_32.c
 create mode 100644 arch/powerpc/perf/callchain_64.c

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index c155dcbb8691..53d614e98537 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -1,6 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0
 
-obj-$(CONFIG_PERF_EVENTS)	+= callchain.o perf_regs.o
+obj-$(CONFIG_PERF_EVENTS)	+= callchain.o callchain_$(BITS).o perf_regs.o
+ifdef CONFIG_COMPAT
+obj-$(CONFIG_PERF_EVENTS)	+= callchain_32.o
+endif
 
 obj-$(CONFIG_PPC_PERF_CTRS)	+= core-book3s.o bhrb.o
 obj64-$(CONFIG_PPC_PERF_CTRS)	+= ppc970-pmu.o power5-pmu.o \
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index 8a274bd523b1..dd5051015008 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -15,11 +15,9 @@
 #include <asm/sigcontext.h>
 #include <asm/ucontext.h>
 #include <asm/vdso.h>
-#ifdef CONFIG_COMPAT
-#include "../kernel/ppc32.h"
-#endif
 #include <asm/pte-walk.h>
 
+#include "callchain.h"
 
 /*
  * Is sp valid as the address of the next kernel stack frame after prev_sp?
@@ -102,359 +100,6 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
 	}
 }
 
-static inline int valid_user_sp(unsigned long sp)
-{
-	bool is_64 = !is_32bit_task();
-
-	if (!sp || (sp & (is_64 ? 7 : 3)) || sp > STACK_TOP - (is_64 ? 32 : 16))
-		return 0;
-	return 1;
-}
-
-#ifdef CONFIG_PPC64
-/*
- * On 64-bit we don't want to invoke hash_page on user addresses from
- * interrupt context, so if the access faults, we read the page tables
- * to find which page (if any) is mapped and access it directly.
- */
-static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
-{
-	int ret = -EFAULT;
-	pgd_t *pgdir;
-	pte_t *ptep, pte;
-	unsigned shift;
-	unsigned long addr = (unsigned long) ptr;
-	unsigned long offset;
-	unsigned long pfn, flags;
-	void *kaddr;
-
-	pgdir = current->mm->pgd;
-	if (!pgdir)
-		return -EFAULT;
-
-	local_irq_save(flags);
-	ptep = find_current_mm_pte(pgdir, addr, NULL, &shift);
-	if (!ptep)
-		goto err_out;
-	if (!shift)
-		shift = PAGE_SHIFT;
-
-	/* align address to page boundary */
-	offset = addr & ((1UL << shift) - 1);
-
-	pte = READ_ONCE(*ptep);
-	if (!pte_present(pte) || !pte_user(pte))
-		goto err_out;
-	pfn = pte_pfn(pte);
-	if (!page_is_ram(pfn))
-		goto err_out;
-
-	/* no highmem to worry about here */
-	kaddr = pfn_to_kaddr(pfn);
-	memcpy(buf, kaddr + offset, nb);
-	ret = 0;
-err_out:
-	local_irq_restore(flags);
-	return ret;
-}
-
-static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
-{
-	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned long) ||
-	    ((unsigned long)ptr & 7))
-		return -EFAULT;
-
-	if (!probe_user_read(ret, ptr, sizeof(*ret)))
-		return 0;
-
-	return read_user_stack_slow(ptr, ret, 8);
-}
-
-/*
- * 64-bit user processes use the same stack frame for RT and non-RT signals.
- */
-struct signal_frame_64 {
-	char		dummy[__SIGNAL_FRAMESIZE];
-	struct ucontext	uc;
-	unsigned long	unused[2];
-	unsigned int	tramp[6];
-	struct siginfo	*pinfo;
-	void		*puc;
-	struct siginfo	info;
-	char		abigap[288];
-};
-
-static int is_sigreturn_64_address(unsigned long nip, unsigned long fp)
-{
-	if (nip == fp + offsetof(struct signal_frame_64, tramp))
-		return 1;
-	if (vdso64_rt_sigtramp && current->mm->context.vdso_base &&
-	    nip == current->mm->context.vdso_base + vdso64_rt_sigtramp)
-		return 1;
-	return 0;
-}
-
-/*
- * Do some sanity checking on the signal frame pointed to by sp.
- * We check the pinfo and puc pointers in the frame.
- */
-static int sane_signal_64_frame(unsigned long sp)
-{
-	struct signal_frame_64 __user *sf;
-	unsigned long pinfo, puc;
-
-	sf = (struct signal_frame_64 __user *) sp;
-	if (read_user_stack_64((unsigned long __user *) &sf->pinfo, &pinfo) ||
-	    read_user_stack_64((unsigned long __user *) &sf->puc, &puc))
-		return 0;
-	return pinfo == (unsigned long) &sf->info &&
-		puc == (unsigned long) &sf->uc;
-}
-
-static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
-				   struct pt_regs *regs)
-{
-	unsigned long sp, next_sp;
-	unsigned long next_ip;
-	unsigned long lr;
-	long level = 0;
-	struct signal_frame_64 __user *sigframe;
-	unsigned long __user *fp, *uregs;
-
-	next_ip = perf_instruction_pointer(regs);
-	lr = regs->link;
-	sp = regs->gpr[1];
-	perf_callchain_store(entry, next_ip);
-
-	while (entry->nr < entry->max_stack) {
-		fp = (unsigned long __user *) sp;
-		if (!valid_user_sp(sp) || read_user_stack_64(fp, &next_sp))
-			return;
-		if (level > 0 && read_user_stack_64(&fp[2], &next_ip))
-			return;
-
-		/*
-		 * Note: the next_sp - sp >= signal frame size check
-		 * is true when next_sp < sp, which can happen when
-		 * transitioning from an alternate signal stack to the
-		 * normal stack.
-		 */
-		if (next_sp - sp >= sizeof(struct signal_frame_64) &&
-		    (is_sigreturn_64_address(next_ip, sp) ||
-		     (level <= 1 && is_sigreturn_64_address(lr, sp))) &&
-		    sane_signal_64_frame(sp)) {
-			/*
-			 * This looks like an signal frame
-			 */
-			sigframe = (struct signal_frame_64 __user *) sp;
-			uregs = sigframe->uc.uc_mcontext.gp_regs;
-			if (read_user_stack_64(&uregs[PT_NIP], &next_ip) ||
-			    read_user_stack_64(&uregs[PT_LNK], &lr) ||
-			    read_user_stack_64(&uregs[PT_R1], &sp))
-				return;
-			level = 0;
-			perf_callchain_store_context(entry, PERF_CONTEXT_USER);
-			perf_callchain_store(entry, next_ip);
-			continue;
-		}
-
-		if (level == 0)
-			next_ip = lr;
-		perf_callchain_store(entry, next_ip);
-		++level;
-		sp = next_sp;
-	}
-}
-
-#else  /* CONFIG_PPC64 */
-static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
-{
-	return 0;
-}
-
-static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
-					  struct pt_regs *regs)
-{
-}
-
-#define __SIGNAL_FRAMESIZE32	__SIGNAL_FRAMESIZE
-#define sigcontext32		sigcontext
-#define mcontext32		mcontext
-#define ucontext32		ucontext
-#define compat_siginfo_t	struct siginfo
-
-#endif /* CONFIG_PPC64 */
-
-#if defined(CONFIG_PPC32) || defined(CONFIG_COMPAT)
-/*
- * On 32-bit we just access the address and let hash_page create a
- * HPTE if necessary, so there is no need to fall back to reading
- * the page tables.  Since this is called at interrupt level,
- * do_page_fault() won't treat a DSI as a page fault.
- */
-static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
-{
-	int rc;
-
-	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
-	    ((unsigned long)ptr & 3))
-		return -EFAULT;
-
-	rc = probe_user_read(ret, ptr, sizeof(*ret));
-
-	if (IS_ENABLED(CONFIG_PPC64) && rc)
-		return read_user_stack_slow(ptr, ret, 4);
-
-	return rc;
-}
-
-/*
- * Layout for non-RT signal frames
- */
-struct signal_frame_32 {
-	char			dummy[__SIGNAL_FRAMESIZE32];
-	struct sigcontext32	sctx;
-	struct mcontext32	mctx;
-	int			abigap[56];
-};
-
-/*
- * Layout for RT signal frames
- */
-struct rt_signal_frame_32 {
-	char			dummy[__SIGNAL_FRAMESIZE32 + 16];
-	compat_siginfo_t	info;
-	struct ucontext32	uc;
-	int			abigap[56];
-};
-
-static int is_sigreturn_32_address(unsigned int nip, unsigned int fp)
-{
-	if (nip == fp + offsetof(struct signal_frame_32, mctx.mc_pad))
-		return 1;
-	if (vdso32_sigtramp && current->mm->context.vdso_base &&
-	    nip == current->mm->context.vdso_base + vdso32_sigtramp)
-		return 1;
-	return 0;
-}
-
-static int is_rt_sigreturn_32_address(unsigned int nip, unsigned int fp)
-{
-	if (nip == fp + offsetof(struct rt_signal_frame_32,
-				 uc.uc_mcontext.mc_pad))
-		return 1;
-	if (vdso32_rt_sigtramp && current->mm->context.vdso_base &&
-	    nip == current->mm->context.vdso_base + vdso32_rt_sigtramp)
-		return 1;
-	return 0;
-}
-
-static int sane_signal_32_frame(unsigned int sp)
-{
-	struct signal_frame_32 __user *sf;
-	unsigned int regs;
-
-	sf = (struct signal_frame_32 __user *) (unsigned long) sp;
-	if (read_user_stack_32((unsigned int __user *) &sf->sctx.regs, &regs))
-		return 0;
-	return regs == (unsigned long) &sf->mctx;
-}
-
-static int sane_rt_signal_32_frame(unsigned int sp)
-{
-	struct rt_signal_frame_32 __user *sf;
-	unsigned int regs;
-
-	sf = (struct rt_signal_frame_32 __user *) (unsigned long) sp;
-	if (read_user_stack_32((unsigned int __user *) &sf->uc.uc_regs, &regs))
-		return 0;
-	return regs == (unsigned long) &sf->uc.uc_mcontext;
-}
-
-static unsigned int __user *signal_frame_32_regs(unsigned int sp,
-				unsigned int next_sp, unsigned int next_ip)
-{
-	struct mcontext32 __user *mctx = NULL;
-	struct signal_frame_32 __user *sf;
-	struct rt_signal_frame_32 __user *rt_sf;
-
-	/*
-	 * Note: the next_sp - sp >= signal frame size check
-	 * is true when next_sp < sp, for example, when
-	 * transitioning from an alternate signal stack to the
-	 * normal stack.
-	 */
-	if (next_sp - sp >= sizeof(struct signal_frame_32) &&
-	    is_sigreturn_32_address(next_ip, sp) &&
-	    sane_signal_32_frame(sp)) {
-		sf = (struct signal_frame_32 __user *) (unsigned long) sp;
-		mctx = &sf->mctx;
-	}
-
-	if (!mctx && next_sp - sp >= sizeof(struct rt_signal_frame_32) &&
-	    is_rt_sigreturn_32_address(next_ip, sp) &&
-	    sane_rt_signal_32_frame(sp)) {
-		rt_sf = (struct rt_signal_frame_32 __user *) (unsigned long) sp;
-		mctx = &rt_sf->uc.uc_mcontext;
-	}
-
-	if (!mctx)
-		return NULL;
-	return mctx->mc_gregs;
-}
-
-static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
-				   struct pt_regs *regs)
-{
-	unsigned int sp, next_sp;
-	unsigned int next_ip;
-	unsigned int lr;
-	long level = 0;
-	unsigned int __user *fp, *uregs;
-
-	next_ip = perf_instruction_pointer(regs);
-	lr = regs->link;
-	sp = regs->gpr[1];
-	perf_callchain_store(entry, next_ip);
-
-	while (entry->nr < entry->max_stack) {
-		fp = (unsigned int __user *) (unsigned long) sp;
-		if (!valid_user_sp(sp) || read_user_stack_32(fp, &next_sp))
-			return;
-		if (level > 0 && read_user_stack_32(&fp[1], &next_ip))
-			return;
-
-		uregs = signal_frame_32_regs(sp, next_sp, next_ip);
-		if (!uregs && level <= 1)
-			uregs = signal_frame_32_regs(sp, next_sp, lr);
-		if (uregs) {
-			/*
-			 * This looks like an signal frame, so restart
-			 * the stack trace with the values in it.
-			 */
-			if (read_user_stack_32(&uregs[PT_NIP], &next_ip) ||
-			    read_user_stack_32(&uregs[PT_LNK], &lr) ||
-			    read_user_stack_32(&uregs[PT_R1], &sp))
-				return;
-			level = 0;
-			perf_callchain_store_context(entry, PERF_CONTEXT_USER);
-			perf_callchain_store(entry, next_ip);
-			continue;
-		}
-
-		if (level == 0)
-			next_ip = lr;
-		perf_callchain_store(entry, next_ip);
-		++level;
-		sp = next_sp;
-	}
-}
-#else /* 32bit */
-static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
-				   struct pt_regs *regs)
-{}
-#endif /* 32bit */
-
 void
 perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
diff --git a/arch/powerpc/perf/callchain.h b/arch/powerpc/perf/callchain.h
new file mode 100644
index 000000000000..8631a96d627d
--- /dev/null
+++ b/arch/powerpc/perf/callchain.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _POWERPC_PERF_CALLCHAIN_H
+#define _POWERPC_PERF_CALLCHAIN_H
+
+int read_user_stack_slow(void __user *ptr, void *buf, int nb);
+void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
+			    struct pt_regs *regs);
+void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
+			    struct pt_regs *regs);
+
+static inline int valid_user_sp(unsigned long sp)
+{
+	bool is_64 = !is_32bit_task();
+
+	if (!sp || (sp & (is_64 ? 7 : 3)) || sp > STACK_TOP - (is_64 ? 32 : 16))
+		return 0;
+	return 1;
+}
+
+#endif /* _POWERPC_PERF_CALLCHAIN_H */
diff --git a/arch/powerpc/perf/callchain_32.c b/arch/powerpc/perf/callchain_32.c
new file mode 100644
index 000000000000..25729c651cb2
--- /dev/null
+++ b/arch/powerpc/perf/callchain_32.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Performance counter callchain support - powerpc architecture code
+ *
+ * Copyright © 2009 Paul Mackerras, IBM Corporation.
+ */
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/perf_event.h>
+#include <linux/percpu.h>
+#include <linux/uaccess.h>
+#include <linux/mm.h>
+#include <asm/ptrace.h>
+#include <asm/pgtable.h>
+#include <asm/sigcontext.h>
+#include <asm/ucontext.h>
+#include <asm/vdso.h>
+#include <asm/pte-walk.h>
+
+#include "callchain.h"
+
+#ifdef CONFIG_PPC64
+#include "../kernel/ppc32.h"
+#else  /* CONFIG_PPC64 */
+
+#define __SIGNAL_FRAMESIZE32	__SIGNAL_FRAMESIZE
+#define sigcontext32		sigcontext
+#define mcontext32		mcontext
+#define ucontext32		ucontext
+#define compat_siginfo_t	struct siginfo
+
+#endif /* CONFIG_PPC64 */
+
+/*
+ * On 32-bit we just access the address and let hash_page create a
+ * HPTE if necessary, so there is no need to fall back to reading
+ * the page tables.  Since this is called at interrupt level,
+ * do_page_fault() won't treat a DSI as a page fault.
+ */
+static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
+{
+	int rc;
+
+	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
+	    ((unsigned long)ptr & 3))
+		return -EFAULT;
+
+	rc = probe_user_read(ret, ptr, sizeof(*ret));
+
+	if (IS_ENABLED(CONFIG_PPC64) && rc)
+		return read_user_stack_slow(ptr, ret, 4);
+
+	return rc;
+}
+
+/*
+ * Layout for non-RT signal frames
+ */
+struct signal_frame_32 {
+	char			dummy[__SIGNAL_FRAMESIZE32];
+	struct sigcontext32	sctx;
+	struct mcontext32	mctx;
+	int			abigap[56];
+};
+
+/*
+ * Layout for RT signal frames
+ */
+struct rt_signal_frame_32 {
+	char			dummy[__SIGNAL_FRAMESIZE32 + 16];
+	compat_siginfo_t	info;
+	struct ucontext32	uc;
+	int			abigap[56];
+};
+
+static int is_sigreturn_32_address(unsigned int nip, unsigned int fp)
+{
+	if (nip == fp + offsetof(struct signal_frame_32, mctx.mc_pad))
+		return 1;
+	if (vdso32_sigtramp && current->mm->context.vdso_base &&
+	    nip == current->mm->context.vdso_base + vdso32_sigtramp)
+		return 1;
+	return 0;
+}
+
+static int is_rt_sigreturn_32_address(unsigned int nip, unsigned int fp)
+{
+	if (nip == fp + offsetof(struct rt_signal_frame_32,
+				 uc.uc_mcontext.mc_pad))
+		return 1;
+	if (vdso32_rt_sigtramp && current->mm->context.vdso_base &&
+	    nip == current->mm->context.vdso_base + vdso32_rt_sigtramp)
+		return 1;
+	return 0;
+}
+
+static int sane_signal_32_frame(unsigned int sp)
+{
+	struct signal_frame_32 __user *sf;
+	unsigned int regs;
+
+	sf = (struct signal_frame_32 __user *) (unsigned long) sp;
+	if (read_user_stack_32((unsigned int __user *) &sf->sctx.regs, &regs))
+		return 0;
+	return regs == (unsigned long) &sf->mctx;
+}
+
+static int sane_rt_signal_32_frame(unsigned int sp)
+{
+	struct rt_signal_frame_32 __user *sf;
+	unsigned int regs;
+
+	sf = (struct rt_signal_frame_32 __user *) (unsigned long) sp;
+	if (read_user_stack_32((unsigned int __user *) &sf->uc.uc_regs, &regs))
+		return 0;
+	return regs == (unsigned long) &sf->uc.uc_mcontext;
+}
+
+static unsigned int __user *signal_frame_32_regs(unsigned int sp,
+				unsigned int next_sp, unsigned int next_ip)
+{
+	struct mcontext32 __user *mctx = NULL;
+	struct signal_frame_32 __user *sf;
+	struct rt_signal_frame_32 __user *rt_sf;
+
+	/*
+	 * Note: the next_sp - sp >= signal frame size check
+	 * is true when next_sp < sp, for example, when
+	 * transitioning from an alternate signal stack to the
+	 * normal stack.
+	 */
+	if (next_sp - sp >= sizeof(struct signal_frame_32) &&
+	    is_sigreturn_32_address(next_ip, sp) &&
+	    sane_signal_32_frame(sp)) {
+		sf = (struct signal_frame_32 __user *) (unsigned long) sp;
+		mctx = &sf->mctx;
+	}
+
+	if (!mctx && next_sp - sp >= sizeof(struct rt_signal_frame_32) &&
+	    is_rt_sigreturn_32_address(next_ip, sp) &&
+	    sane_rt_signal_32_frame(sp)) {
+		rt_sf = (struct rt_signal_frame_32 __user *) (unsigned long) sp;
+		mctx = &rt_sf->uc.uc_mcontext;
+	}
+
+	if (!mctx)
+		return NULL;
+	return mctx->mc_gregs;
+}
+
+void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
+			    struct pt_regs *regs)
+{
+	unsigned int sp, next_sp;
+	unsigned int next_ip;
+	unsigned int lr;
+	long level = 0;
+	unsigned int __user *fp, *uregs;
+
+	next_ip = perf_instruction_pointer(regs);
+	lr = regs->link;
+	sp = regs->gpr[1];
+	perf_callchain_store(entry, next_ip);
+
+	while (entry->nr < entry->max_stack) {
+		fp = (unsigned int __user *) (unsigned long) sp;
+		if (!valid_user_sp(sp) || read_user_stack_32(fp, &next_sp))
+			return;
+		if (level > 0 && read_user_stack_32(&fp[1], &next_ip))
+			return;
+
+		uregs = signal_frame_32_regs(sp, next_sp, next_ip);
+		if (!uregs && level <= 1)
+			uregs = signal_frame_32_regs(sp, next_sp, lr);
+		if (uregs) {
+			/*
+			 * This looks like an signal frame, so restart
+			 * the stack trace with the values in it.
+			 */
+			if (read_user_stack_32(&uregs[PT_NIP], &next_ip) ||
+			    read_user_stack_32(&uregs[PT_LNK], &lr) ||
+			    read_user_stack_32(&uregs[PT_R1], &sp))
+				return;
+			level = 0;
+			perf_callchain_store_context(entry, PERF_CONTEXT_USER);
+			perf_callchain_store(entry, next_ip);
+			continue;
+		}
+
+		if (level == 0)
+			next_ip = lr;
+		perf_callchain_store(entry, next_ip);
+		++level;
+		sp = next_sp;
+	}
+}
diff --git a/arch/powerpc/perf/callchain_64.c b/arch/powerpc/perf/callchain_64.c
new file mode 100644
index 000000000000..7e8eed59dd18
--- /dev/null
+++ b/arch/powerpc/perf/callchain_64.c
@@ -0,0 +1,174 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Performance counter callchain support - powerpc architecture code
+ *
+ * Copyright © 2009 Paul Mackerras, IBM Corporation.
+ */
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/perf_event.h>
+#include <linux/percpu.h>
+#include <linux/uaccess.h>
+#include <linux/mm.h>
+#include <asm/ptrace.h>
+#include <asm/pgtable.h>
+#include <asm/sigcontext.h>
+#include <asm/ucontext.h>
+#include <asm/vdso.h>
+#include <asm/pte-walk.h>
+
+#include "callchain.h"
+
+/*
+ * On 64-bit we don't want to invoke hash_page on user addresses from
+ * interrupt context, so if the access faults, we read the page tables
+ * to find which page (if any) is mapped and access it directly.
+ */
+int read_user_stack_slow(void __user *ptr, void *buf, int nb)
+{
+	int ret = -EFAULT;
+	pgd_t *pgdir;
+	pte_t *ptep, pte;
+	unsigned int shift;
+	unsigned long addr = (unsigned long) ptr;
+	unsigned long offset;
+	unsigned long pfn, flags;
+	void *kaddr;
+
+	pgdir = current->mm->pgd;
+	if (!pgdir)
+		return -EFAULT;
+
+	local_irq_save(flags);
+	ptep = find_current_mm_pte(pgdir, addr, NULL, &shift);
+	if (!ptep)
+		goto err_out;
+	if (!shift)
+		shift = PAGE_SHIFT;
+
+	/* align address to page boundary */
+	offset = addr & ((1UL << shift) - 1);
+
+	pte = READ_ONCE(*ptep);
+	if (!pte_present(pte) || !pte_user(pte))
+		goto err_out;
+	pfn = pte_pfn(pte);
+	if (!page_is_ram(pfn))
+		goto err_out;
+
+	/* no highmem to worry about here */
+	kaddr = pfn_to_kaddr(pfn);
+	memcpy(buf, kaddr + offset, nb);
+	ret = 0;
+err_out:
+	local_irq_restore(flags);
+	return ret;
+}
+
+static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
+{
+	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned long) ||
+	    ((unsigned long)ptr & 7))
+		return -EFAULT;
+
+	if (!probe_user_read(ret, ptr, sizeof(*ret)))
+		return 0;
+
+	return read_user_stack_slow(ptr, ret, 8);
+}
+
+/*
+ * 64-bit user processes use the same stack frame for RT and non-RT signals.
+ */
+struct signal_frame_64 {
+	char		dummy[__SIGNAL_FRAMESIZE];
+	struct ucontext	uc;
+	unsigned long	unused[2];
+	unsigned int	tramp[6];
+	struct siginfo	*pinfo;
+	void		*puc;
+	struct siginfo	info;
+	char		abigap[288];
+};
+
+static int is_sigreturn_64_address(unsigned long nip, unsigned long fp)
+{
+	if (nip == fp + offsetof(struct signal_frame_64, tramp))
+		return 1;
+	if (vdso64_rt_sigtramp && current->mm->context.vdso_base &&
+	    nip == current->mm->context.vdso_base + vdso64_rt_sigtramp)
+		return 1;
+	return 0;
+}
+
+/*
+ * Do some sanity checking on the signal frame pointed to by sp.
+ * We check the pinfo and puc pointers in the frame.
+ */
+static int sane_signal_64_frame(unsigned long sp)
+{
+	struct signal_frame_64 __user *sf;
+	unsigned long pinfo, puc;
+
+	sf = (struct signal_frame_64 __user *) sp;
+	if (read_user_stack_64((unsigned long __user *) &sf->pinfo, &pinfo) ||
+	    read_user_stack_64((unsigned long __user *) &sf->puc, &puc))
+		return 0;
+	return pinfo == (unsigned long) &sf->info &&
+		puc == (unsigned long) &sf->uc;
+}
+
+void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
+			    struct pt_regs *regs)
+{
+	unsigned long sp, next_sp;
+	unsigned long next_ip;
+	unsigned long lr;
+	long level = 0;
+	struct signal_frame_64 __user *sigframe;
+	unsigned long __user *fp, *uregs;
+
+	next_ip = perf_instruction_pointer(regs);
+	lr = regs->link;
+	sp = regs->gpr[1];
+	perf_callchain_store(entry, next_ip);
+
+	while (entry->nr < entry->max_stack) {
+		fp = (unsigned long __user *) sp;
+		if (!valid_user_sp(sp) || read_user_stack_64(fp, &next_sp))
+			return;
+		if (level > 0 && read_user_stack_64(&fp[2], &next_ip))
+			return;
+
+		/*
+		 * Note: the next_sp - sp >= signal frame size check
+		 * is true when next_sp < sp, which can happen when
+		 * transitioning from an alternate signal stack to the
+		 * normal stack.
+		 */
+		if (next_sp - sp >= sizeof(struct signal_frame_64) &&
+		    (is_sigreturn_64_address(next_ip, sp) ||
+		     (level <= 1 && is_sigreturn_64_address(lr, sp))) &&
+		    sane_signal_64_frame(sp)) {
+			/*
+			 * This looks like a signal frame
+			 */
+			sigframe = (struct signal_frame_64 __user *) sp;
+			uregs = sigframe->uc.uc_mcontext.gp_regs;
+			if (read_user_stack_64(&uregs[PT_NIP], &next_ip) ||
+			    read_user_stack_64(&uregs[PT_LNK], &lr) ||
+			    read_user_stack_64(&uregs[PT_R1], &sp))
+				return;
+			level = 0;
+			perf_callchain_store_context(entry, PERF_CONTEXT_USER);
+			perf_callchain_store(entry, next_ip);
+			continue;
+		}
+
+		if (level == 0)
+			next_ip = lr;
+		perf_callchain_store(entry, next_ip);
+		++level;
+		sp = next_sp;
+	}
+}
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread
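
The walk above relies on the powerpc ABI stack layout: every frame stores a
back chain (the caller's SP) at offset 0, and the saved LR lives at offset 16
on 64-bit (hence fp[2]) or offset 4 on 32-bit (hence fp[1]). Note also that
the unsigned next_sp - sp comparison deliberately stays true when next_sp < sp
(the subtraction wraps to a large value), so a jump from an alternate signal
stack back to the normal stack is still considered for signal-frame matching.
Below is a minimal user-space sketch of the same back-chain walk for a ppc64
binary; it illustrates the layout only, it is not code from the patch, and the
helper name is made up:

#include <stdio.h>

#define MAX_DEPTH 64

/*
 * Walk the ppc64 ABI back chain: frame[0] is the caller's SP,
 * frame[2] (offset 16) is the LR save slot.
 */
static void walk_backchain(const unsigned long *sp)
{
	long level = 0;

	while (sp && level < MAX_DEPTH) {
		const unsigned long *next_sp = (const unsigned long *)sp[0];

		/*
		 * Like the kernel loop, skip the LR slot of frame 0:
		 * the first return address comes from the register
		 * state (pt_regs in the kernel), not from the stack.
		 */
		if (level > 0)
			printf("frame %ld: return address 0x%lx\n",
			       level, sp[2]);

		if (!next_sp || next_sp <= sp)	/* stack grows down */
			break;
		sp = next_sp;
		level++;
	}
}

int main(void)
{
	walk_backchain(__builtin_frame_address(0));
	return 0;
}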

* [PATCH v11 8/8] MAINTAINERS: perf: Add pattern that matches ppc perf to the perf entry.
  2020-03-19 12:19   ` Michal Suchanek
@ 2020-03-19 12:19     ` Michal Suchanek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchanek @ 2020-03-19 12:19 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michal Suchanek, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Andy Shevchenko, Christophe Leroy,
	Thomas Gleixner, Arnd Bergmann, Nayna Jain, Eric Richter,
	Claudio Carvalho, Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
---
v10: new patch
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index bc8dbe4fe4c9..329bf4a31412 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13088,6 +13088,8 @@ F:	arch/*/kernel/*/perf_event*.c
 F:	arch/*/kernel/*/*/perf_event*.c
 F:	arch/*/include/asm/perf_event.h
 F:	arch/*/kernel/perf_callchain.c
+F:	arch/*/perf/*
+F:	arch/*/perf/*/*
 F:	arch/*/events/*
 F:	arch/*/events/*/*
 F:	tools/perf/
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread
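
In MAINTAINERS pattern syntax a trailing "*" matches files at exactly that
depth, not below it, which is why two lines are needed here: arch/*/perf/*
covers files directly under an arch perf directory (for example
arch/powerpc/perf/callchain_32.c) and arch/*/perf/*/* covers files one level
deeper. A quick way to check that such a pattern takes effect, using
get_maintainer.pl's -f option (which treats the argument as a file rather
than a patch):

  $ ./scripts/get_maintainer.pl -f arch/powerpc/perf/callchain_32.c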

* Re: [PATCH v11 0/8] Disable compat cruft on ppc64le v11
  2020-03-19 12:19   ` Michal Suchanek
@ 2020-03-19 12:36     ` Christophe Leroy
  -1 siblings, 0 replies; 161+ messages in thread
From: Christophe Leroy @ 2020-03-19 12:36 UTC (permalink / raw)
  To: Michal Suchanek, linuxppc-dev
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Alexander Viro, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron,
	Andy Shevchenko, Thomas Gleixner, Arnd Bergmann, Nayna Jain,
	Eric Richter, Claudio Carvalho, Nicholas Piggin, Hari Bathini,
	Masahiro Yamada, Thiago Jung Bauermann,
	Sebastian Andrzej Siewior, Valentin Schneider, Jordan Niethe,
	Michael Neuling, Gustavo Luiz Duarte, Allison Randal,
	Eric W. Biederman, linux-kernel, linux-fsdevel

You sent it twice? Any difference between the two dispatches?

Christophe

On 19/03/2020 at 13:19, Michal Suchanek wrote:
> Less code means fewer bugs so add a knob to skip the compat stuff.
> 
> Changes in v2: saner CONFIG_COMPAT ifdefs
> Changes in v3:
>   - change llseek to 32bit instead of building it unconditionally in fs
>   - cleanup the makefile conditionals
>   - remove some ifdefs or convert to IS_DEFINED where possible
> Changes in v4:
>   - cleanup is_32bit_task and current_is_64bit
>   - more makefile cleanup
> Changes in v5:
>   - more current_is_64bit cleanup
>   - split off callchain.c 32bit and 64bit parts
> Changes in v6:
>   - cleanup makefile after split
>   - consolidate read_user_stack_32
>   - fix some checkpatch warnings
> Changes in v7:
>   - add back __ARCH_WANT_SYS_LLSEEK to fix build with llseek
>   - remove leftover hunk
>   - add review tags
> Changes in v8:
>   - consolidate valid_user_sp to fix it in the split callchain.c
>   - fix build errors/warnings with PPC64 !COMPAT and PPC32
> Changes in v9:
>   - remove current_is_64bit()
> Changes in v10:
>   - rebase, sent together with the syscall cleanup
> Changes in v11:
>   - rebase
>   - add MAINTAINERS pattern for ppc perf
> 
> Michal Suchanek (8):
>    powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
>    powerpc: move common register copy functions from signal_32.c to
>      signal.c
>    powerpc/perf: consolidate read_user_stack_32
>    powerpc/perf: consolidate valid_user_sp
>    powerpc/64: make buildable without CONFIG_COMPAT
>    powerpc/64: Make COMPAT user-selectable disabled on littleendian by
>      default.
>    powerpc/perf: split callchain.c by bitness
>    MAINTAINERS: perf: Add pattern that matches ppc perf to the perf
>      entry.
> 
>   MAINTAINERS                            |   2 +
>   arch/powerpc/Kconfig                   |   5 +-
>   arch/powerpc/include/asm/thread_info.h |   4 +-
>   arch/powerpc/include/asm/unistd.h      |   1 +
>   arch/powerpc/kernel/Makefile           |   6 +-
>   arch/powerpc/kernel/entry_64.S         |   2 +
>   arch/powerpc/kernel/signal.c           | 144 +++++++++-
>   arch/powerpc/kernel/signal_32.c        | 140 ----------
>   arch/powerpc/kernel/syscall_64.c       |   6 +-
>   arch/powerpc/kernel/vdso.c             |   3 +-
>   arch/powerpc/perf/Makefile             |   5 +-
>   arch/powerpc/perf/callchain.c          | 356 +------------------------
>   arch/powerpc/perf/callchain.h          |  20 ++
>   arch/powerpc/perf/callchain_32.c       | 196 ++++++++++++++
>   arch/powerpc/perf/callchain_64.c       | 174 ++++++++++++
>   fs/read_write.c                        |   3 +-
>   16 files changed, 556 insertions(+), 511 deletions(-)
>   create mode 100644 arch/powerpc/perf/callchain.h
>   create mode 100644 arch/powerpc/perf/callchain_32.c
>   create mode 100644 arch/powerpc/perf/callchain_64.c
> 

^ permalink raw reply	[flat|nested] 161+ messages in thread
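
For context, the "knob" from the cover letter is CONFIG_COMPAT made
user-selectable instead of being forced on for all of PPC64, defaulting off
on little endian. A sketch of the shape such a Kconfig entry takes; the
prompt text and the select line here are assumptions inferred from the patch
titles, not copied from the diff:

config COMPAT
	bool "Enable support for 32bit binaries"
	depends on PPC64
	default y if !CPU_LITTLE_ENDIAN
	select COMPAT_BINFMT_ELF if BINFMT_ELF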

* Re: [PATCH v11 8/8] MAINTAINERS: perf: Add pattern that matches ppc perf to the perf entry.
  2020-03-19 12:19     ` Michal Suchanek
@ 2020-03-19 13:37       ` Andy Shevchenko
  -1 siblings, 0 replies; 161+ messages in thread
From: Andy Shevchenko @ 2020-03-19 13:37 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: open list:LINUX FOR POWERPC PA SEMI PWRFICIENT,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Alexander Viro, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron,
	Andy Shevchenko, Christophe Leroy, Thomas Gleixner,
	Arnd Bergmann, Nayna Jain, Eric Richter, Claudio Carvalho,
	Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	Linux Kernel Mailing List, Linux FS Devel

On Thu, Mar 19, 2020 at 2:21 PM Michal Suchanek <msuchanek@suse.de> wrote:
>
> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> ---
> v10: new patch
> ---
>  MAINTAINERS | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index bc8dbe4fe4c9..329bf4a31412 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13088,6 +13088,8 @@ F:      arch/*/kernel/*/perf_event*.c
>  F:     arch/*/kernel/*/*/perf_event*.c
>  F:     arch/*/include/asm/perf_event.h
>  F:     arch/*/kernel/perf_callchain.c
> +F:     arch/*/perf/*
> +F:     arch/*/perf/*/*
>  F:     arch/*/events/*
>  F:     arch/*/events/*/*
>  F:     tools/perf/

Had you run parse-maintainers.pl?

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v11 8/8] MAINTAINERS: perf: Add pattern that matches ppc perf to the perf entry.
  2020-03-19 13:37       ` Andy Shevchenko
@ 2020-03-19 14:00         ` Michal Suchánek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchánek @ 2020-03-19 14:00 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: open list:LINUX FOR POWERPC PA SEMI PWRFICIENT,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Alexander Viro, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron,
	Andy Shevchenko, Christophe Leroy, Thomas Gleixner,
	Arnd Bergmann, Nayna Jain, Eric Richter, Claudio Carvalho,
	Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	Linux Kernel Mailing List, Linux FS Devel

On Thu, Mar 19, 2020 at 03:37:03PM +0200, Andy Shevchenko wrote:
> On Thu, Mar 19, 2020 at 2:21 PM Michal Suchanek <msuchanek@suse.de> wrote:
> >
> > Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> > ---
> > v10: new patch
> > ---
> >  MAINTAINERS | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index bc8dbe4fe4c9..329bf4a31412 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -13088,6 +13088,8 @@ F:      arch/*/kernel/*/perf_event*.c
> >  F:     arch/*/kernel/*/*/perf_event*.c
> >  F:     arch/*/include/asm/perf_event.h
> >  F:     arch/*/kernel/perf_callchain.c
> > +F:     arch/*/perf/*
> > +F:     arch/*/perf/*/*
> >  F:     arch/*/events/*
> >  F:     arch/*/events/*/*
> >  F:     tools/perf/
> 
> Had you run parse-maintainers.pl?
Did not know it exists. The output is:

scripts/parse-maintainers.pl 
Odd non-pattern line '
Documentation/devicetree/bindings/media/ti,cal.yaml
' for 'TI VPE/CAL DRIVERS' at scripts/parse-maintainers.pl line 147,
<$file> line 16756.

Thanks

Michal

^ permalink raw reply	[flat|nested] 161+ messages in thread
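
For reference, parse-maintainers.pl rewrites MAINTAINERS with sections and
field lines in canonical order, and it aborts on entries it cannot parse,
which is what the TI VPE/CAL error above is. A typical invocation, assuming
the script's --input/--output options (it writes a new file rather than
editing MAINTAINERS in place):

  $ perl scripts/parse-maintainers.pl --input=MAINTAINERS --output=MAINTAINERS.new
  $ diff -u MAINTAINERS MAINTAINERS.new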

* Re: [PATCH v11 0/8] Disable compat cruft on ppc64le v11
  2020-03-19 12:36     ` Christophe Leroy
@ 2020-03-19 14:01       ` Michal Suchánek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchánek @ 2020-03-19 14:01 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: linuxppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Andy Shevchenko, Thomas Gleixner,
	Arnd Bergmann, Nayna Jain, Eric Richter, Claudio Carvalho,
	Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

On Thu, Mar 19, 2020 at 01:36:56PM +0100, Christophe Leroy wrote:
> You sent it twice? Any difference between the two dispatches?
Some headers were broken the first time around.

Thanks

Michal
> 
> Christophe
> 
> On 19/03/2020 at 13:19, Michal Suchanek wrote:
> > Less code means fewer bugs so add a knob to skip the compat stuff.
> > 
> > Changes in v2: saner CONFIG_COMPAT ifdefs
> > Changes in v3:
> >   - change llseek to 32bit instead of building it unconditionally in fs
> >   - cleanup the makefile conditionals
> >   - remove some ifdefs or convert to IS_DEFINED where possible
> > Changes in v4:
> >   - cleanup is_32bit_task and current_is_64bit
> >   - more makefile cleanup
> > Changes in v5:
> >   - more current_is_64bit cleanup
> >   - split off callchain.c 32bit and 64bit parts
> > Changes in v6:
> >   - cleanup makefile after split
> >   - consolidate read_user_stack_32
> >   - fix some checkpatch warnings
> > Changes in v7:
> >   - add back __ARCH_WANT_SYS_LLSEEK to fix build with llseek
> >   - remove leftover hunk
> >   - add review tags
> > Changes in v8:
> >   - consolidate valid_user_sp to fix it in the split callchain.c
> >   - fix build errors/warnings with PPC64 !COMPAT and PPC32
> > Changes in v9:
> >   - remove current_is_64bit()
> > Changes in v10:
> >   - rebase, sent together with the syscall cleanup
> > Changes in v11:
> >   - rebase
> >   - add MAINTAINERS pattern for ppc perf
> > 
> > Michal Suchanek (8):
> >    powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
> >    powerpc: move common register copy functions from signal_32.c to
> >      signal.c
> >    powerpc/perf: consolidate read_user_stack_32
> >    powerpc/perf: consolidate valid_user_sp
> >    powerpc/64: make buildable without CONFIG_COMPAT
> >    powerpc/64: Make COMPAT user-selectable disabled on littleendian by
> >      default.
> >    powerpc/perf: split callchain.c by bitness
> >    MAINTAINERS: perf: Add pattern that matches ppc perf to the perf
> >      entry.
> > 
> >   MAINTAINERS                            |   2 +
> >   arch/powerpc/Kconfig                   |   5 +-
> >   arch/powerpc/include/asm/thread_info.h |   4 +-
> >   arch/powerpc/include/asm/unistd.h      |   1 +
> >   arch/powerpc/kernel/Makefile           |   6 +-
> >   arch/powerpc/kernel/entry_64.S         |   2 +
> >   arch/powerpc/kernel/signal.c           | 144 +++++++++-
> >   arch/powerpc/kernel/signal_32.c        | 140 ----------
> >   arch/powerpc/kernel/syscall_64.c       |   6 +-
> >   arch/powerpc/kernel/vdso.c             |   3 +-
> >   arch/powerpc/perf/Makefile             |   5 +-
> >   arch/powerpc/perf/callchain.c          | 356 +------------------------
> >   arch/powerpc/perf/callchain.h          |  20 ++
> >   arch/powerpc/perf/callchain_32.c       | 196 ++++++++++++++
> >   arch/powerpc/perf/callchain_64.c       | 174 ++++++++++++
> >   fs/read_write.c                        |   3 +-
> >   16 files changed, 556 insertions(+), 511 deletions(-)
> >   create mode 100644 arch/powerpc/perf/callchain.h
> >   create mode 100644 arch/powerpc/perf/callchain_32.c
> >   create mode 100644 arch/powerpc/perf/callchain_64.c
> > 

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v11 8/8] MAINTAINERS: perf: Add pattern that matches ppc perf to the perf entry.
  2020-03-19 14:00         ` Michal Suchánek
@ 2020-03-19 14:26           ` Andy Shevchenko
  -1 siblings, 0 replies; 161+ messages in thread
From: Andy Shevchenko @ 2020-03-19 14:26 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: open list:LINUX FOR POWERPC PA SEMI PWRFICIENT,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Alexander Viro, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron,
	Christophe Leroy, Thomas Gleixner, Arnd Bergmann, Nayna Jain,
	Eric Richter, Claudio Carvalho, Nicholas Piggin, Hari Bathini,
	Masahiro Yamada, Thiago Jung Bauermann,
	Sebastian Andrzej Siewior, Valentin Schneider, Jordan Niethe,
	Michael Neuling, Gustavo Luiz Duarte, Allison Randal,
	Eric W. Biederman, Linux Kernel Mailing List, Linux FS Devel

On Thu, Mar 19, 2020 at 03:00:08PM +0100, Michal Suchánek wrote:
> On Thu, Mar 19, 2020 at 03:37:03PM +0200, Andy Shevchenko wrote:
> > On Thu, Mar 19, 2020 at 2:21 PM Michal Suchanek <msuchanek@suse.de> wrote:
> > >
> > > Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> > > ---
> > > v10: new patch
> > > ---
> > >  MAINTAINERS | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > index bc8dbe4fe4c9..329bf4a31412 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -13088,6 +13088,8 @@ F:      arch/*/kernel/*/perf_event*.c
> > >  F:     arch/*/kernel/*/*/perf_event*.c
> > >  F:     arch/*/include/asm/perf_event.h
> > >  F:     arch/*/kernel/perf_callchain.c
> > > +F:     arch/*/perf/*
> > > +F:     arch/*/perf/*/*
> > >  F:     arch/*/events/*
> > >  F:     arch/*/events/*/*
> > >  F:     tools/perf/
> > 
> > Had you run parse-maintainers.pl?
> Did not know it exists. The output is:
> 
> scripts/parse-maintainers.pl 
> Odd non-pattern line '
> Documentation/devicetree/bindings/media/ti,cal.yaml
> ' for 'TI VPE/CAL DRIVERS' at scripts/parse-maintainers.pl line 147,
> <$file> line 16756.

It is fixed in media tree and available in linux next as

d44535cb14c9 ("media: MAINTAINERS: Sort entries in database for TI VPE/CAL")

-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH v11 8/8] MAINTAINERS: perf: Add pattern that matches ppc perf to the perf entry.
  2020-03-19 12:19     ` Michal Suchanek
@ 2020-03-19 17:03       ` Joe Perches
  -1 siblings, 0 replies; 161+ messages in thread
From: Joe Perches @ 2020-03-19 17:03 UTC (permalink / raw)
  To: Michal Suchanek, linuxppc-dev
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Alexander Viro, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron,
	Andy Shevchenko, Christophe Leroy, Thomas Gleixner,
	Arnd Bergmann, Nayna Jain, Eric Richter, Claudio Carvalho,
	Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

On Thu, 2020-03-19 at 13:19 +0100, Michal Suchanek wrote:
> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> ---
> v10: new patch
> ---
>  MAINTAINERS | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index bc8dbe4fe4c9..329bf4a31412 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13088,6 +13088,8 @@ F:	arch/*/kernel/*/perf_event*.c
>  F:	arch/*/kernel/*/*/perf_event*.c
>  F:	arch/*/include/asm/perf_event.h
>  F:	arch/*/kernel/perf_callchain.c
> +F:	arch/*/perf/*
> +F:	arch/*/perf/*/*

While I understand the desire, I believe that
repetitive listings like this don't really
help much.

Having a single entry of:

F:	arch/*/perf/

would serve the same purpose.

Nominally, the difference between the 2 entries
vs the 1 entry is this:

F:	arch/*/perf/*

Only the specific files in any directory that matches
this pattern but not any files in their subdirectories
are maintained.

F:	arch/*/perf/*/*

Only the files in any top level subdirectory of any
directory that matches this pattern are maintained
but not files in any directory of those subdirectories.

F:	arch/*/perf/

Any file or any file in any subdirectory of any
directory that matches this pattern is maintained.
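
For instance, with a hypothetical layout (paths invented purely to
illustrate the glob semantics described above):

	arch/foo/perf/core.c      matched by arch/*/perf/* and by arch/*/perf/
	arch/foo/perf/util/ev.c   matched by arch/*/perf/*/* and by arch/*/perf/
	arch/foo/perf/a/b/deep.c  matched only by arch/*/perf/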




* Re: [PATCH v3 25/32] powerpc/64: system call implement entry/exit logic in C
  2020-03-19  9:18   ` Christophe Leroy
@ 2020-03-20  3:39     ` Nicholas Piggin
  0 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-03-20  3:39 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev; +Cc: Michal Suchanek

Christophe Leroy's on March 19, 2020 7:18 pm:
> 
> 
> On 25/02/2020 at 18:35, Nicholas Piggin wrote:
>> System call entry and particularly exit code is beyond the limit of what
>> is reasonable to implement in asm.
>> 
>> This conversion moves all conditional branches out of the asm code,
>> except for the case that all GPRs should be restored at exit.
>> 
>> Null syscall test is about 5% faster after this patch, because the exit
>> work is handled under local_irq_disable, and the hard mask and pending
>> interrupt replay is handled after that, which avoids games with MSR.
>> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
>> ---
>> 
>> v2,rebase (from Michal):
>> - Add endian conversion for dtl_idx (ms)
>> - Fix sparse warning about missing declaration (ms)
>> - Add unistd.h to fix some defconfigs, add SPDX, minor formatting (mpe)
>> 
>> v3: Fixes thanks to reports from mpe and selftests errors:
>> - Several soft-mask debug and unsafe smp_processor_id() warnings due to
>>    tracing and other false positives due to checks in "unreconciled" code.
>> - Fix a bug with syscall tracing functions that set registers (e.g.,
>>    PTRACE_SETREG) not setting GPRs properly.
>> - Fix silly tabort_syscall bug that causes kernel crashes when making system
>>    calls in transactional state.
>> 
>>   arch/powerpc/include/asm/asm-prototypes.h     |  17 +-
>>   .../powerpc/include/asm/book3s/64/kup-radix.h |  14 +-
>>   arch/powerpc/include/asm/cputime.h            |  29 ++
>>   arch/powerpc/include/asm/hw_irq.h             |   4 +
>>   arch/powerpc/include/asm/ptrace.h             |   3 +
>>   arch/powerpc/include/asm/signal.h             |   3 +
>>   arch/powerpc/include/asm/switch_to.h          |   5 +
>>   arch/powerpc/include/asm/time.h               |   3 +
>>   arch/powerpc/kernel/Makefile                  |   3 +-
>>   arch/powerpc/kernel/entry_64.S                | 338 +++---------------
>>   arch/powerpc/kernel/signal.h                  |   2 -
>>   arch/powerpc/kernel/syscall_64.c              | 213 +++++++++++
>>   arch/powerpc/kernel/systbl.S                  |   9 +-
>>   13 files changed, 328 insertions(+), 315 deletions(-)
>>   create mode 100644 arch/powerpc/kernel/syscall_64.c
>> 
>> diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
>> index 983c0084fb3f..4b3609554e76 100644
>> --- a/arch/powerpc/include/asm/asm-prototypes.h
>> +++ b/arch/powerpc/include/asm/asm-prototypes.h
>> @@ -97,6 +97,12 @@ ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp,
>>   unsigned long __init early_init(unsigned long dt_ptr);
>>   void __init machine_init(u64 dt_ptr);
>>   #endif
>> +#ifdef CONFIG_PPC64
> 
> This ifdef is not necessary as it has no pending #else.
> Having a function declaration without a definition is not an issue.
> Keeping in mind that we are aiming at generalising this to PPC32.

Well, there are other unnecessary ifdefs in there too, I think. But sure.
This patch also got the interrupt_exit_ prototypes leaked in from the
later patch so I could fix those.

>> diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
>> index 2431b4ada2fa..6639a6847cc0 100644
>> --- a/arch/powerpc/include/asm/cputime.h
>> +++ b/arch/powerpc/include/asm/cputime.h
>> @@ -44,6 +44,28 @@ static inline unsigned long cputime_to_usecs(const cputime_t ct)
>>   #ifdef CONFIG_PPC64
>>   #define get_accounting(tsk)	(&get_paca()->accounting)
>>   static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
> 
> Could we have the below additions sit outside of this PPC64 ifdef, to be 
> reused on PPC32?

Okay.

>> +
>> +/*
>> + * account_cpu_user_entry/exit runs "unreconciled", so can't trace,
>> + * can't use get_paca()
>> + */
>> +static notrace inline void account_cpu_user_entry(void)
>> +{
>> +	unsigned long tb = mftb();
>> +	struct cpu_accounting_data *acct = &local_paca->accounting;
> 
> In the spirit of reusing that code on PPC32, can we use
> get_accounting()? Or an alternate version of get_accounting(), e.g. a
> get_accounting_notrace() to be defined?

Okay.

>> diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
> 
> Could some part of it go in a syscall.c to be reused on PPC32?

I could put it all in syscall.c and then we can adjust with some ifdefs
or helpers. I don't think there is enough to be worth syscall.c,
syscall_32.c, and syscall_64.c.

I wonder about the interrupt returns as well; they don't really make
sense in a file called syscall.c, but the code is very similar to
system call exit. Should we just call it interrupts.c?

>> +	/*
>> +	 * This is not required for the syscall exit path, but makes the
>> +	 * stack frame look nicer. If this was initialised in the first stack
>> +	 * frame, or if the unwinder was taught the first stack frame always
>> +	 * returns to user with IRQS_ENABLED, this store could be avoided!
>> +	 */
>> +	regs->softe = IRQS_ENABLED;
> 
> softe doesn't exist on PPC32. Can we do that through a helper?

I guess we can have regs_set_irq_state(regs, IRQS_ENABLED); or
something like that.

We make that helper and a _get_ counterpart in a later patch which 
covers other cases in the tree as well.
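
Something like this minimal sketch, perhaps (the name is from above; the
body is just my assumption of the obvious shape, not the final helper):

	static inline void regs_set_irq_state(struct pt_regs *regs,
					      unsigned long state)
	{
	#ifdef CONFIG_PPC64
		regs->softe = state;
	#endif
	}

On PPC32 it would compile away to nothing, since softe only exists in
the 64-bit pt_regs.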

>> +
>> +	__hard_irq_enable();
> 
> This doesn't exist on PPC32. Should we define __hard_irq_enable() as 
> arch_local_irq_enable() on PPC32?

This goes away with patch 29. Better not to have this ugly thing
spill into ppc32 code at all if we can avoid it :)

> 
>> +
>> +	ti_flags = current_thread_info()->flags;
>> +	if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
>> +		/*
>> +		 * We use the return value of do_syscall_trace_enter() as the
>> +		 * syscall number. If the syscall was rejected for any reason
>> +		 * do_syscall_trace_enter() returns an invalid syscall number
>> +		 * and the test against NR_syscalls will fail and the return
>> +		 * value to be used is in regs->gpr[3].
>> +		 */
>> +		r0 = do_syscall_trace_enter(regs);
>> +		if (unlikely(r0 >= NR_syscalls))
>> +			return regs->gpr[3];
>> +		r3 = regs->gpr[3];
>> +		r4 = regs->gpr[4];
>> +		r5 = regs->gpr[5];
>> +		r6 = regs->gpr[6];
>> +		r7 = regs->gpr[7];
>> +		r8 = regs->gpr[8];
>> +
>> +	} else if (unlikely(r0 >= NR_syscalls)) {
>> +		return -ENOSYS;
>> +	}
>> +
>> +	/* May be faster to do array_index_nospec? */
>> +	barrier_nospec();
>> +
>> +	if (unlikely(ti_flags & _TIF_32BIT)) {
> 
> Use is_compat_task() instead?

Michal pointed this out, he's got patches that do this on top of this
series.
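
For reference, a rough sketch of the two alternatives raised in this
hunk (assuming the generic array_index_nospec() from <linux/nospec.h>
and is_compat_task() from <linux/compat.h>; illustrative only, not what
the series as posted does):

	/* clamp the syscall number under speculation instead of barrier_nospec() */
	r0 = array_index_nospec(r0, NR_syscalls);

	/* test compat state directly instead of open-coding _TIF_32BIT */
	if (unlikely(is_compat_task())) {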

Incremental diff for your suggestions below. Now it is likely we're
going to have a few ifdefs, particularly in the exit paths, where the
complexity of handling irq soft-masked state means helpers don't make
much sense. I don't think that will be such a bad thing, but we can
come to it as we go.

Thanks,
Nick

---
 arch/powerpc/include/asm/asm-prototypes.h |  4 ---
 arch/powerpc/include/asm/cputime.h        | 38 +++++++++++++----------
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 4b3609554e76..ab59a4904254 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -97,12 +97,8 @@ ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp,
 unsigned long __init early_init(unsigned long dt_ptr);
 void __init machine_init(u64 dt_ptr);
 #endif
-#ifdef CONFIG_PPC64
 long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs);
 notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs);
-notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr);
-notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr);
-#endif
 
 long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low,
 		      u32 len_high, u32 len_low);
diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
index 6639a6847cc0..0fccd5ea1e9a 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -43,8 +43,26 @@ static inline unsigned long cputime_to_usecs(const cputime_t ct)
  */
 #ifdef CONFIG_PPC64
 #define get_accounting(tsk)	(&get_paca()->accounting)
+#define raw_get_accounting(tsk)	(&local_paca->accounting)
 static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
 
+#else
+#define get_accounting(tsk)	(&task_thread_info(tsk)->accounting)
+#define raw_get_accounting(tsk)	get_accounting(tsk)
+/*
+ * Called from the context switch with interrupts disabled, to charge all
+ * accumulated times to the current process, and to prepare accounting on
+ * the next process.
+ */
+static inline void arch_vtime_task_switch(struct task_struct *prev)
+{
+	struct cpu_accounting_data *acct = get_accounting(current);
+	struct cpu_accounting_data *acct0 = get_accounting(prev);
+
+	acct->starttime = acct0->starttime;
+}
+#endif
+
 /*
  * account_cpu_user_entry/exit runs "unreconciled", so can't trace,
  * can't use get_paca()
@@ -52,35 +70,21 @@ static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
 static notrace inline void account_cpu_user_entry(void)
 {
 	unsigned long tb = mftb();
-	struct cpu_accounting_data *acct = &local_paca->accounting;
+	struct cpu_accounting_data *acct = raw_get_accounting(current);
 
 	acct->utime += (tb - acct->starttime_user);
 	acct->starttime = tb;
 }
+
 static notrace inline void account_cpu_user_exit(void)
 {
 	unsigned long tb = mftb();
-	struct cpu_accounting_data *acct = &local_paca->accounting;
+	struct cpu_accounting_data *acct = raw_get_accounting(current);
 
 	acct->stime += (tb - acct->starttime);
 	acct->starttime_user = tb;
 }
 
-#else
-#define get_accounting(tsk)	(&task_thread_info(tsk)->accounting)
-/*
- * Called from the context switch with interrupts disabled, to charge all
- * accumulated times to the current process, and to prepare accounting on
- * the next process.
- */
-static inline void arch_vtime_task_switch(struct task_struct *prev)
-{
-	struct cpu_accounting_data *acct = get_accounting(current);
-	struct cpu_accounting_data *acct0 = get_accounting(prev);
-
-	acct->starttime = acct0->starttime;
-}
-#endif
 
 #endif /* __KERNEL__ */
 #else /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
-- 
2.23.0



* [PATCH v12 0/8] Disable compat cruft on ppc64le v12
  2020-02-25 17:35 [PATCH v3 00/32] powerpc/64: interrupts and syscalls series Nicholas Piggin
@ 2020-03-20 10:20   ` Michal Suchanek
  2020-02-25 17:35 ` [PATCH v3 02/32] powerpc/64s/exception: Add GEN_COMMON macro that uses INT_DEFINE parameters Nicholas Piggin
                     ` (32 subsequent siblings)
  33 siblings, 0 replies; 161+ messages in thread
From: Michal Suchanek @ 2020-03-20 10:20 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michal Suchanek, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Andy Shevchenko, Christophe Leroy,
	Thomas Gleixner, Arnd Bergmann, Nayna Jain, Eric Richter,
	Claudio Carvalho, Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

Less code means fewer bugs, so add a knob to skip the compat stuff.

Changes in v2: saner CONFIG_COMPAT ifdefs
Changes in v3:
 - change llseek to 32bit instead of building it unconditionally in fs
 - cleanup the makefile conditionals
 - remove some ifdefs or convert to IS_DEFINED where possible
Changes in v4:
 - cleanup is_32bit_task and current_is_64bit
 - more makefile cleanup
Changes in v5:
 - more current_is_64bit cleanup
 - split off callchain.c 32bit and 64bit parts
Changes in v6:
 - cleanup makefile after split
 - consolidate read_user_stack_32
 - fix some checkpatch warnings
Changes in v7:
 - add back __ARCH_WANT_SYS_LLSEEK to fix build with llseek
 - remove leftover hunk
 - add review tags
Changes in v8:
 - consolidate valid_user_sp to fix it in the split callchain.c
 - fix build errors/warnings with PPC64 !COMPAT and PPC32
Changes in v9:
 - remove current_is_64bit()
Changes in v10:
 - rebase, sent together with the syscall cleanup
Changes in v11:
 - rebase
 - add MAINTAINERS pattern for ppc perf
Changes in v12:
 - simplify valid_user_sp and change to invalid_user_sp
 - remove superfluous perf patterns in MAINTAINERS
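
The knob itself ends up as a user-visible Kconfig option, roughly along
these lines (a sketch of the shape implied by patch 6/8 of the series;
the exact selects and wording may differ):

	config COMPAT
		bool "Enable support for 32bit binaries"
		depends on PPC64
		default y if !CPU_LITTLE_ENDIAN
		select COMPAT_BINFMT_ELF
		select ARCH_WANT_OLD_COMPAT_IPC
		select COMPAT_OLD_SIGACTION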

Michal Suchanek (8):
  powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
  powerpc: move common register copy functions from signal_32.c to
    signal.c
  powerpc/perf: consolidate read_user_stack_32
  powerpc/perf: consolidate valid_user_sp -> invalid_user_sp
  powerpc/64: make buildable without CONFIG_COMPAT
  powerpc/64: Make COMPAT user-selectable disabled on littleendian by
    default.
  powerpc/perf: split callchain.c by bitness
  MAINTAINERS: perf: Add pattern that matches ppc perf to the perf
    entry.

 MAINTAINERS                            |   6 +-
 arch/powerpc/Kconfig                   |   5 +-
 arch/powerpc/include/asm/thread_info.h |   4 +-
 arch/powerpc/include/asm/unistd.h      |   1 +
 arch/powerpc/kernel/Makefile           |   6 +-
 arch/powerpc/kernel/entry_64.S         |   2 +
 arch/powerpc/kernel/signal.c           | 144 +++++++++-
 arch/powerpc/kernel/signal_32.c        | 140 ----------
 arch/powerpc/kernel/syscall_64.c       |   6 +-
 arch/powerpc/kernel/vdso.c             |   3 +-
 arch/powerpc/perf/Makefile             |   5 +-
 arch/powerpc/perf/callchain.c          | 356 +------------------------
 arch/powerpc/perf/callchain.h          |  19 ++
 arch/powerpc/perf/callchain_32.c       | 196 ++++++++++++++
 arch/powerpc/perf/callchain_64.c       | 174 ++++++++++++
 fs/read_write.c                        |   3 +-
 16 files changed, 556 insertions(+), 514 deletions(-)
 create mode 100644 arch/powerpc/perf/callchain.h
 create mode 100644 arch/powerpc/perf/callchain_32.c
 create mode 100644 arch/powerpc/perf/callchain_64.c

-- 
2.23.0



* [PATCH v12 1/8] powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
  2020-03-20 10:20   ` Michal Suchanek
@ 2020-03-20 10:20     ` Michal Suchanek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchanek @ 2020-03-20 10:20 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michal Suchanek, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Andy Shevchenko, Christophe Leroy,
	Thomas Gleixner, Arnd Bergmann, Nayna Jain, Eric Richter,
	Claudio Carvalho, Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

This partially reverts commit caf6f9c8a326 ("asm-generic: Remove
unneeded __ARCH_WANT_SYS_LLSEEK macro")

When CONFIG_COMPAT is disabled on ppc64 the kernel does not build.

There is resistance to both removing the llseek syscall from the 64bit
syscall tables and building the llseek interface unconditionally.

Link: https://lore.kernel.org/lkml/20190828151552.GA16855@infradead.org/
Link: https://lore.kernel.org/lkml/20190829214319.498c7de2@naga/

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
---
v7: new patch
---
 arch/powerpc/include/asm/unistd.h | 1 +
 fs/read_write.c                   | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index b0720c7c3fcf..700fcdac2e3c 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -31,6 +31,7 @@
 #define __ARCH_WANT_SYS_SOCKETCALL
 #define __ARCH_WANT_SYS_FADVISE64
 #define __ARCH_WANT_SYS_GETPGRP
+#define __ARCH_WANT_SYS_LLSEEK
 #define __ARCH_WANT_SYS_NICE
 #define __ARCH_WANT_SYS_OLD_GETRLIMIT
 #define __ARCH_WANT_SYS_OLD_UNAME
diff --git a/fs/read_write.c b/fs/read_write.c
index 59d819c5b92e..bbfa9b12b15e 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -331,7 +331,8 @@ COMPAT_SYSCALL_DEFINE3(lseek, unsigned int, fd, compat_off_t, offset, unsigned i
 }
 #endif
 
-#if !defined(CONFIG_64BIT) || defined(CONFIG_COMPAT)
+#if !defined(CONFIG_64BIT) || defined(CONFIG_COMPAT) || \
+	defined(__ARCH_WANT_SYS_LLSEEK)
 SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
 		unsigned long, offset_low, loff_t __user *, result,
 		unsigned int, whence)
-- 
2.23.0



* [PATCH v12 2/8] powerpc: move common register copy functions from signal_32.c to signal.c
  2020-03-20 10:20   ` Michal Suchanek
@ 2020-03-20 10:20     ` Michal Suchanek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchanek @ 2020-03-20 10:20 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michal Suchanek, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Andy Shevchenko, Christophe Leroy,
	Thomas Gleixner, Arnd Bergmann, Nayna Jain, Eric Richter,
	Claudio Carvalho, Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

These functions are required for 64bit as well.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/kernel/signal.c    | 141 ++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/signal_32.c | 140 -------------------------------
 2 files changed, 141 insertions(+), 140 deletions(-)

diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index d215f9554553..4b0152108f61 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -18,12 +18,153 @@
 #include <linux/syscalls.h>
 #include <asm/hw_breakpoint.h>
 #include <linux/uaccess.h>
+#include <asm/switch_to.h>
 #include <asm/unistd.h>
 #include <asm/debug.h>
 #include <asm/tm.h>
 
 #include "signal.h"
 
+#ifdef CONFIG_VSX
+unsigned long copy_fpr_to_user(void __user *to,
+			       struct task_struct *task)
+{
+	u64 buf[ELF_NFPREG];
+	int i;
+
+	/* save FPR copy to local buffer then write to the thread_struct */
+	for (i = 0; i < (ELF_NFPREG - 1) ; i++)
+		buf[i] = task->thread.TS_FPR(i);
+	buf[i] = task->thread.fp_state.fpscr;
+	return __copy_to_user(to, buf, ELF_NFPREG * sizeof(double));
+}
+
+unsigned long copy_fpr_from_user(struct task_struct *task,
+				 void __user *from)
+{
+	u64 buf[ELF_NFPREG];
+	int i;
+
+	if (__copy_from_user(buf, from, ELF_NFPREG * sizeof(double)))
+		return 1;
+	for (i = 0; i < (ELF_NFPREG - 1) ; i++)
+		task->thread.TS_FPR(i) = buf[i];
+	task->thread.fp_state.fpscr = buf[i];
+
+	return 0;
+}
+
+unsigned long copy_vsx_to_user(void __user *to,
+			       struct task_struct *task)
+{
+	u64 buf[ELF_NVSRHALFREG];
+	int i;
+
+	/* save FPR copy to local buffer then write to the thread_struct */
+	for (i = 0; i < ELF_NVSRHALFREG; i++)
+		buf[i] = task->thread.fp_state.fpr[i][TS_VSRLOWOFFSET];
+	return __copy_to_user(to, buf, ELF_NVSRHALFREG * sizeof(double));
+}
+
+unsigned long copy_vsx_from_user(struct task_struct *task,
+				 void __user *from)
+{
+	u64 buf[ELF_NVSRHALFREG];
+	int i;
+
+	if (__copy_from_user(buf, from, ELF_NVSRHALFREG * sizeof(double)))
+		return 1;
+	for (i = 0; i < ELF_NVSRHALFREG ; i++)
+		task->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i];
+	return 0;
+}
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+unsigned long copy_ckfpr_to_user(void __user *to,
+				  struct task_struct *task)
+{
+	u64 buf[ELF_NFPREG];
+	int i;
+
+	/* save FPR copy to local buffer then write to the thread_struct */
+	for (i = 0; i < (ELF_NFPREG - 1) ; i++)
+		buf[i] = task->thread.TS_CKFPR(i);
+	buf[i] = task->thread.ckfp_state.fpscr;
+	return __copy_to_user(to, buf, ELF_NFPREG * sizeof(double));
+}
+
+unsigned long copy_ckfpr_from_user(struct task_struct *task,
+					  void __user *from)
+{
+	u64 buf[ELF_NFPREG];
+	int i;
+
+	if (__copy_from_user(buf, from, ELF_NFPREG * sizeof(double)))
+		return 1;
+	for (i = 0; i < (ELF_NFPREG - 1) ; i++)
+		task->thread.TS_CKFPR(i) = buf[i];
+	task->thread.ckfp_state.fpscr = buf[i];
+
+	return 0;
+}
+
+unsigned long copy_ckvsx_to_user(void __user *to,
+				  struct task_struct *task)
+{
+	u64 buf[ELF_NVSRHALFREG];
+	int i;
+
+	/* save FPR copy to local buffer then write to the thread_struct */
+	for (i = 0; i < ELF_NVSRHALFREG; i++)
+		buf[i] = task->thread.ckfp_state.fpr[i][TS_VSRLOWOFFSET];
+	return __copy_to_user(to, buf, ELF_NVSRHALFREG * sizeof(double));
+}
+
+unsigned long copy_ckvsx_from_user(struct task_struct *task,
+					  void __user *from)
+{
+	u64 buf[ELF_NVSRHALFREG];
+	int i;
+
+	if (__copy_from_user(buf, from, ELF_NVSRHALFREG * sizeof(double)))
+		return 1;
+	for (i = 0; i < ELF_NVSRHALFREG ; i++)
+		task->thread.ckfp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i];
+	return 0;
+}
+#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
+#else
+inline unsigned long copy_fpr_to_user(void __user *to,
+				      struct task_struct *task)
+{
+	return __copy_to_user(to, task->thread.fp_state.fpr,
+			      ELF_NFPREG * sizeof(double));
+}
+
+inline unsigned long copy_fpr_from_user(struct task_struct *task,
+					void __user *from)
+{
+	return __copy_from_user(task->thread.fp_state.fpr, from,
+			      ELF_NFPREG * sizeof(double));
+}
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+inline unsigned long copy_ckfpr_to_user(void __user *to,
+					 struct task_struct *task)
+{
+	return __copy_to_user(to, task->thread.ckfp_state.fpr,
+			      ELF_NFPREG * sizeof(double));
+}
+
+inline unsigned long copy_ckfpr_from_user(struct task_struct *task,
+						 void __user *from)
+{
+	return __copy_from_user(task->thread.ckfp_state.fpr, from,
+				ELF_NFPREG * sizeof(double));
+}
+#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
+#endif
+
 /* Log an error when sending an unhandled signal to a process. Controlled
  * through debug.exception-trace sysctl.
  */
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 1b090a76b444..4f96d29a22bf 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -235,146 +235,6 @@ struct rt_sigframe {
 	int			abigap[56];
 };
 
-#ifdef CONFIG_VSX
-unsigned long copy_fpr_to_user(void __user *to,
-			       struct task_struct *task)
-{
-	u64 buf[ELF_NFPREG];
-	int i;
-
-	/* save FPR copy to local buffer then write to the thread_struct */
-	for (i = 0; i < (ELF_NFPREG - 1) ; i++)
-		buf[i] = task->thread.TS_FPR(i);
-	buf[i] = task->thread.fp_state.fpscr;
-	return __copy_to_user(to, buf, ELF_NFPREG * sizeof(double));
-}
-
-unsigned long copy_fpr_from_user(struct task_struct *task,
-				 void __user *from)
-{
-	u64 buf[ELF_NFPREG];
-	int i;
-
-	if (__copy_from_user(buf, from, ELF_NFPREG * sizeof(double)))
-		return 1;
-	for (i = 0; i < (ELF_NFPREG - 1) ; i++)
-		task->thread.TS_FPR(i) = buf[i];
-	task->thread.fp_state.fpscr = buf[i];
-
-	return 0;
-}
-
-unsigned long copy_vsx_to_user(void __user *to,
-			       struct task_struct *task)
-{
-	u64 buf[ELF_NVSRHALFREG];
-	int i;
-
-	/* save FPR copy to local buffer then write to the thread_struct */
-	for (i = 0; i < ELF_NVSRHALFREG; i++)
-		buf[i] = task->thread.fp_state.fpr[i][TS_VSRLOWOFFSET];
-	return __copy_to_user(to, buf, ELF_NVSRHALFREG * sizeof(double));
-}
-
-unsigned long copy_vsx_from_user(struct task_struct *task,
-				 void __user *from)
-{
-	u64 buf[ELF_NVSRHALFREG];
-	int i;
-
-	if (__copy_from_user(buf, from, ELF_NVSRHALFREG * sizeof(double)))
-		return 1;
-	for (i = 0; i < ELF_NVSRHALFREG ; i++)
-		task->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i];
-	return 0;
-}
-
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-unsigned long copy_ckfpr_to_user(void __user *to,
-				  struct task_struct *task)
-{
-	u64 buf[ELF_NFPREG];
-	int i;
-
-	/* save FPR copy to local buffer then write to the thread_struct */
-	for (i = 0; i < (ELF_NFPREG - 1) ; i++)
-		buf[i] = task->thread.TS_CKFPR(i);
-	buf[i] = task->thread.ckfp_state.fpscr;
-	return __copy_to_user(to, buf, ELF_NFPREG * sizeof(double));
-}
-
-unsigned long copy_ckfpr_from_user(struct task_struct *task,
-					  void __user *from)
-{
-	u64 buf[ELF_NFPREG];
-	int i;
-
-	if (__copy_from_user(buf, from, ELF_NFPREG * sizeof(double)))
-		return 1;
-	for (i = 0; i < (ELF_NFPREG - 1) ; i++)
-		task->thread.TS_CKFPR(i) = buf[i];
-	task->thread.ckfp_state.fpscr = buf[i];
-
-	return 0;
-}
-
-unsigned long copy_ckvsx_to_user(void __user *to,
-				  struct task_struct *task)
-{
-	u64 buf[ELF_NVSRHALFREG];
-	int i;
-
-	/* save FPR copy to local buffer then write to the thread_struct */
-	for (i = 0; i < ELF_NVSRHALFREG; i++)
-		buf[i] = task->thread.ckfp_state.fpr[i][TS_VSRLOWOFFSET];
-	return __copy_to_user(to, buf, ELF_NVSRHALFREG * sizeof(double));
-}
-
-unsigned long copy_ckvsx_from_user(struct task_struct *task,
-					  void __user *from)
-{
-	u64 buf[ELF_NVSRHALFREG];
-	int i;
-
-	if (__copy_from_user(buf, from, ELF_NVSRHALFREG * sizeof(double)))
-		return 1;
-	for (i = 0; i < ELF_NVSRHALFREG ; i++)
-		task->thread.ckfp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i];
-	return 0;
-}
-#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
-#else
-inline unsigned long copy_fpr_to_user(void __user *to,
-				      struct task_struct *task)
-{
-	return __copy_to_user(to, task->thread.fp_state.fpr,
-			      ELF_NFPREG * sizeof(double));
-}
-
-inline unsigned long copy_fpr_from_user(struct task_struct *task,
-					void __user *from)
-{
-	return __copy_from_user(task->thread.fp_state.fpr, from,
-			      ELF_NFPREG * sizeof(double));
-}
-
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-inline unsigned long copy_ckfpr_to_user(void __user *to,
-					 struct task_struct *task)
-{
-	return __copy_to_user(to, task->thread.ckfp_state.fpr,
-			      ELF_NFPREG * sizeof(double));
-}
-
-inline unsigned long copy_ckfpr_from_user(struct task_struct *task,
-						 void __user *from)
-{
-	return __copy_from_user(task->thread.ckfp_state.fpr, from,
-				ELF_NFPREG * sizeof(double));
-}
-#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
-#endif
-
 /*
  * Save the current user registers on the user stack.
  * We only save the altivec/spe registers if the process has used
-- 
2.23.0



* [PATCH v12 3/8] powerpc/perf: consolidate read_user_stack_32
  2020-03-20 10:20   ` Michal Suchanek
@ 2020-03-20 10:20     ` Michal Suchanek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchanek @ 2020-03-20 10:20 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michal Suchanek, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Andy Shevchenko, Christophe Leroy,
	Thomas Gleixner, Arnd Bergmann, Nayna Jain, Eric Richter,
	Claudio Carvalho, Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

There are two almost identical copies for 32bit and 64bit.

The function is used only in 32bit code, which will be split out in the
next patch, so consolidate the two copies into one function.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
v6:  new patch
v8:  move the consolidated function out of the ifdef block.
v11: rebase on top of def0bfdbd603
---
 arch/powerpc/perf/callchain.c | 48 +++++++++++++++++------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index cbc251981209..c9a78c6e4361 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -161,18 +161,6 @@ static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
 	return read_user_stack_slow(ptr, ret, 8);
 }
 
-static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
-{
-	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
-	    ((unsigned long)ptr & 3))
-		return -EFAULT;
-
-	if (!probe_user_read(ret, ptr, sizeof(*ret)))
-		return 0;
-
-	return read_user_stack_slow(ptr, ret, 4);
-}
-
 static inline int valid_user_sp(unsigned long sp, int is_64)
 {
 	if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 32)
@@ -277,19 +265,9 @@ static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
 }
 
 #else  /* CONFIG_PPC64 */
-/*
- * On 32-bit we just access the address and let hash_page create a
- * HPTE if necessary, so there is no need to fall back to reading
- * the page tables.  Since this is called at interrupt level,
- * do_page_fault() won't treat a DSI as a page fault.
- */
-static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
+static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
 {
-	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
-	    ((unsigned long)ptr & 3))
-		return -EFAULT;
-
-	return probe_user_read(ret, ptr, sizeof(*ret));
+	return 0;
 }
 
 static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
@@ -312,6 +290,28 @@ static inline int valid_user_sp(unsigned long sp, int is_64)
 
 #endif /* CONFIG_PPC64 */
 
+/*
+ * On 32-bit we just access the address and let hash_page create a
+ * HPTE if necessary, so there is no need to fall back to reading
+ * the page tables.  Since this is called at interrupt level,
+ * do_page_fault() won't treat a DSI as a page fault.
+ */
+static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
+{
+	int rc;
+
+	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
+	    ((unsigned long)ptr & 3))
+		return -EFAULT;
+
+	rc = probe_user_read(ret, ptr, sizeof(*ret));
+
+	if (IS_ENABLED(CONFIG_PPC64) && rc)
+		return read_user_stack_slow(ptr, ret, 4);
+
+	return rc;
+}
+
 /*
  * Layout for non-RT signal frames
  */
-- 
2.23.0



* [PATCH v12 4/8] powerpc/perf: consolidate valid_user_sp -> invalid_user_sp
  2020-03-20 10:20   ` Michal Suchanek
@ 2020-03-20 10:20     ` Michal Suchanek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchanek @ 2020-03-20 10:20 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michal Suchanek, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Andy Shevchenko, Christophe Leroy,
	Thomas Gleixner, Arnd Bergmann, Nayna Jain, Eric Richter,
	Claudio Carvalho, Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

Merge the 32bit and 64bit versions.

Halve the check constants on 32bit.

Use STACK_TOP since it is always defined.

Passing is_64 is now redundant, since is_32bit_task() determines which
callchain variant to use. Use STACK_TOP and is_32bit_task() directly.

This removes a page from the valid 32bit area on 64bit:
 #define TASK_SIZE_USER32 (0x0000000100000000UL - (1 * PAGE_SIZE))
 #define STACK_TOP_USER32 TASK_SIZE_USER32

Change the return value to bool, since callers invert it anyway.

Rename the helper to invalid_user_sp to avoid inverting the return value
twice.
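
For illustration, a minimal userspace sketch of what the merged helper
computes for each task width (STACK_TOP is a stand-in constant here, not
the kernel's per-task value):

  #include <stdbool.h>
  #include <stdio.h>

  /* Stand-in for the kernel's per-task STACK_TOP; illustration only. */
  #define STACK_TOP 0x100000000UL

  static bool invalid_user_sp(unsigned long sp, bool is_32bit)
  {
          unsigned long mask = is_32bit ? 3 : 7;
          unsigned long top = STACK_TOP - (is_32bit ? 16 : 32);

          return !sp || (sp & mask) || sp > top;
  }

  int main(void)
  {
          /* aligned and below top -> valid (prints 0) */
          printf("%d\n", invalid_user_sp(0xfffffff0UL, true));
          /* misaligned -> invalid (prints 1) */
          printf("%d\n", invalid_user_sp(0xfffffff2UL, true));
          return 0;
  }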

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
---
v8: new patch
v11: simplify by using is_32bit_task()
v12:
 - simplify by precalculating subexpressions
 - change return value to bool
 - remove double inversion
---
 arch/powerpc/perf/callchain.c | 26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index c9a78c6e4361..001d0473a61f 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -102,6 +102,14 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
 	}
 }
 
+static inline bool invalid_user_sp(unsigned long sp)
+{
+	unsigned long mask = is_32bit_task() ? 3 : 7;
+	unsigned long top = STACK_TOP - (is_32bit_task() ? 16 : 32);
+
+	return (!sp || (sp & mask) || (sp > top));
+}
+
 #ifdef CONFIG_PPC64
 /*
  * On 64-bit we don't want to invoke hash_page on user addresses from
@@ -161,13 +169,6 @@ static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
 	return read_user_stack_slow(ptr, ret, 8);
 }
 
-static inline int valid_user_sp(unsigned long sp, int is_64)
-{
-	if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 32)
-		return 0;
-	return 1;
-}
-
 /*
  * 64-bit user processes use the same stack frame for RT and non-RT signals.
  */
@@ -226,7 +227,7 @@ static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
 
 	while (entry->nr < entry->max_stack) {
 		fp = (unsigned long __user *) sp;
-		if (!valid_user_sp(sp, 1) || read_user_stack_64(fp, &next_sp))
+		if (invalid_user_sp(sp) || read_user_stack_64(fp, &next_sp))
 			return;
 		if (level > 0 && read_user_stack_64(&fp[2], &next_ip))
 			return;
@@ -275,13 +276,6 @@ static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry
 {
 }
 
-static inline int valid_user_sp(unsigned long sp, int is_64)
-{
-	if (!sp || (sp & 7) || sp > TASK_SIZE - 32)
-		return 0;
-	return 1;
-}
-
 #define __SIGNAL_FRAMESIZE32	__SIGNAL_FRAMESIZE
 #define sigcontext32		sigcontext
 #define mcontext32		mcontext
@@ -423,7 +417,7 @@ static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
 
 	while (entry->nr < entry->max_stack) {
 		fp = (unsigned int __user *) (unsigned long) sp;
-		if (!valid_user_sp(sp, 0) || read_user_stack_32(fp, &next_sp))
+		if (invalid_user_sp(sp) || read_user_stack_32(fp, &next_sp))
 			return;
 		if (level > 0 && read_user_stack_32(&fp[1], &next_ip))
 			return;
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v12 5/8] powerpc/64: make buildable without CONFIG_COMPAT
  2020-03-20 10:20   ` Michal Suchanek
@ 2020-03-20 10:20     ` Michal Suchanek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchanek @ 2020-03-20 10:20 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michal Suchanek, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Andy Shevchenko, Christophe Leroy,
	Thomas Gleixner, Arnd Bergmann, Nayna Jain, Eric Richter,
	Claudio Carvalho, Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

There are numerous references to 32bit functions in generic and 64bit
code, so ifdef them out.
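
As a rough sketch of why IS_ENABLED() is preferred over #ifdef where
possible (userspace illustration; IS_ENABLED is simplified to a plain
macro here, the real one in <linux/kconfig.h> is more involved): both
branches stay visible to the compiler, so the compat-only code keeps
compiling even when it is constant-false and discarded.

  #include <stdio.h>

  #define CONFIG_COMPAT 0       /* pretend a !COMPAT build */
  #define IS_ENABLED(cfg) (cfg) /* simplified stand-in */

  static long compat_path(void) { return 32; }
  static long native_path(void) { return 64; }

  int main(void)
  {
          long bits;

          /* The compat branch is still parsed and type-checked, but
           * is constant-false and compiled out - no #ifdef needed. */
          if (IS_ENABLED(CONFIG_COMPAT))
                  bits = compat_path();
          else
                  bits = native_path();

          printf("%ld-bit path\n", bits);
          return 0;
  }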

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
---
v2:
- fix 32bit ifdef condition in signal.c
- simplify the compat ifdef condition in vdso.c - 64bit is redundant
- simplify the compat ifdef condition in callchain.c - 64bit is redundant
v3:
- use IS_ENABLED and maybe_unused where possible
- do not ifdef declarations
- clean up Makefile
v4:
- further makefile cleanup
- simplify is_32bit_task conditions
- avoid ifdef in condition by using return
v5:
- avoid unreachable code on 32bit
- make is_current_64bit constant on !COMPAT
- add stub perf_callchain_user_32 to avoid some ifdefs
v6:
- consolidate current_is_64bit
v7:
- remove leftover perf_callchain_user_32 stub from previous series version
v8:
- fix build again - too trigger-happy with stub removal
- remove a vdso.c hunk that causes warning according to kbuild test robot
v9:
- removed current_is_64bit in previous patch
v10:
- rebase on top of 70ed86f4de5bd
---
 arch/powerpc/include/asm/thread_info.h | 4 ++--
 arch/powerpc/kernel/Makefile           | 6 +++---
 arch/powerpc/kernel/entry_64.S         | 2 ++
 arch/powerpc/kernel/signal.c           | 3 +--
 arch/powerpc/kernel/syscall_64.c       | 6 ++----
 arch/powerpc/kernel/vdso.c             | 3 ++-
 arch/powerpc/perf/callchain.c          | 8 +++++++-
 7 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index a2270749b282..ca6c97025704 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -162,10 +162,10 @@ static inline bool test_thread_local_flags(unsigned int flags)
 	return (ti->local_flags & flags) != 0;
 }
 
-#ifdef CONFIG_PPC64
+#ifdef CONFIG_COMPAT
 #define is_32bit_task()	(test_thread_flag(TIF_32BIT))
 #else
-#define is_32bit_task()	(1)
+#define is_32bit_task()	(IS_ENABLED(CONFIG_PPC32))
 #endif
 
 #if defined(CONFIG_PPC64)
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 5700231a8988..98a1c143b613 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -42,16 +42,16 @@ CFLAGS_btext.o += -DDISABLE_BRANCH_PROFILING
 endif
 
 obj-y				:= cputable.o ptrace.o syscalls.o \
-				   irq.o align.o signal_32.o pmc.o vdso.o \
+				   irq.o align.o signal_$(BITS).o pmc.o vdso.o \
 				   process.o systbl.o idle.o \
 				   signal.o sysfs.o cacheinfo.o time.o \
 				   prom.o traps.o setup-common.o \
 				   udbg.o misc.o io.o misc_$(BITS).o \
 				   of_platform.o prom_parse.o
-obj-$(CONFIG_PPC64)		+= setup_64.o sys_ppc32.o \
-				   signal_64.o ptrace32.o \
+obj-$(CONFIG_PPC64)		+= setup_64.o \
 				   paca.o nvram_64.o firmware.o note.o \
 				   syscall_64.o
+obj-$(CONFIG_COMPAT)		+= sys_ppc32.o ptrace32.o signal_32.o
 obj-$(CONFIG_VDSO32)		+= vdso32/
 obj-$(CONFIG_PPC_WATCHDOG)	+= watchdog.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= hw_breakpoint.o
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 4c0d0400e93d..fe1421e08f09 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -52,8 +52,10 @@
 SYS_CALL_TABLE:
 	.tc sys_call_table[TC],sys_call_table
 
+#ifdef CONFIG_COMPAT
 COMPAT_SYS_CALL_TABLE:
 	.tc compat_sys_call_table[TC],compat_sys_call_table
+#endif
 
 /* This value is used to mark exception frames on the stack. */
 exception_marker:
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index 4b0152108f61..a264989626fd 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -247,7 +247,6 @@ static void do_signal(struct task_struct *tsk)
 	sigset_t *oldset = sigmask_to_save();
 	struct ksignal ksig = { .sig = 0 };
 	int ret;
-	int is32 = is_32bit_task();
 
 	BUG_ON(tsk != current);
 
@@ -277,7 +276,7 @@ static void do_signal(struct task_struct *tsk)
 
 	rseq_signal_deliver(&ksig, tsk->thread.regs);
 
-	if (is32) {
+	if (is_32bit_task()) {
         	if (ksig.ka.sa.sa_flags & SA_SIGINFO)
 			ret = handle_rt_signal32(&ksig, oldset, tsk);
 		else
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index 87d95b455b83..2dcbfe38f5ac 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -24,7 +24,6 @@ notrace long system_call_exception(long r3, long r4, long r5,
 				   long r6, long r7, long r8,
 				   unsigned long r0, struct pt_regs *regs)
 {
-	unsigned long ti_flags;
 	syscall_fn f;
 
 	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
@@ -68,8 +67,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
 
 	local_irq_enable();
 
-	ti_flags = current_thread_info()->flags;
-	if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
+	if (unlikely(current_thread_info()->flags & _TIF_SYSCALL_DOTRACE)) {
 		/*
 		 * We use the return value of do_syscall_trace_enter() as the
 		 * syscall number. If the syscall was rejected for any reason
@@ -94,7 +92,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
 	/* May be faster to do array_index_nospec? */
 	barrier_nospec();
 
-	if (unlikely(ti_flags & _TIF_32BIT)) {
+	if (unlikely(is_32bit_task())) {
 		f = (void *)compat_sys_call_table[r0];
 
 		r3 &= 0x00000000ffffffffULL;
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index b9a108411c0d..77da3b7d304d 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -656,7 +656,8 @@ static void __init vdso_setup_syscall_map(void)
 		if (sys_call_table[i] != sys_ni_syscall)
 			vdso_data->syscall_map_64[i >> 5] |=
 				0x80000000UL >> (i & 0x1f);
-		if (compat_sys_call_table[i] != sys_ni_syscall)
+		if (IS_ENABLED(CONFIG_COMPAT) &&
+		    compat_sys_call_table[i] != sys_ni_syscall)
 			vdso_data->syscall_map_32[i >> 5] |=
 				0x80000000UL >> (i & 0x1f);
 #else /* CONFIG_PPC64 */
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index 001d0473a61f..b5afd0bec4f8 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -15,7 +15,7 @@
 #include <asm/sigcontext.h>
 #include <asm/ucontext.h>
 #include <asm/vdso.h>
-#ifdef CONFIG_PPC64
+#ifdef CONFIG_COMPAT
 #include "../kernel/ppc32.h"
 #endif
 #include <asm/pte-walk.h>
@@ -284,6 +284,7 @@ static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry
 
 #endif /* CONFIG_PPC64 */
 
+#if defined(CONFIG_PPC32) || defined(CONFIG_COMPAT)
 /*
  * On 32-bit we just access the address and let hash_page create a
  * HPTE if necessary, so there is no need to fall back to reading
@@ -447,6 +448,11 @@ static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
 		sp = next_sp;
 	}
 }
+#else /* 32bit */
+static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
+				   struct pt_regs *regs)
+{}
+#endif /* 32bit */
 
 void
 perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v12 6/8] powerpc/64: Make COMPAT user-selectable disabled on littleendian by default.
  2020-03-20 10:20   ` Michal Suchanek
@ 2020-03-20 10:20     ` Michal Suchanek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchanek @ 2020-03-20 10:20 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michal Suchanek, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Andy Shevchenko, Christophe Leroy,
	Thomas Gleixner, Arnd Bergmann, Nayna Jain, Eric Richter,
	Claudio Carvalho, Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

On bigendian ppc64 it is common to have 32bit legacy binaries but much
less so on littleendian.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
v3: make configurable
---
 arch/powerpc/Kconfig | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 497b7d0b2d7e..29d00b3959b9 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -264,8 +264,9 @@ config PANIC_TIMEOUT
 	default 180
 
 config COMPAT
-	bool
-	default y if PPC64
+	bool "Enable support for 32bit binaries"
+	depends on PPC64
+	default y if !CPU_LITTLE_ENDIAN
 	select COMPAT_BINFMT_ELF
 	select ARCH_WANT_OLD_COMPAT_IPC
 	select COMPAT_OLD_SIGACTION
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v12 7/8] powerpc/perf: split callchain.c by bitness
  2020-03-20 10:20   ` Michal Suchanek
@ 2020-03-20 10:20     ` Michal Suchanek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchanek @ 2020-03-20 10:20 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michal Suchanek, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Andy Shevchenko, Christophe Leroy,
	Thomas Gleixner, Arnd Bergmann, Nayna Jain, Eric Richter,
	Claudio Carvalho, Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

Building callchain.c with !COMPAT proved quite ugly with all the
defines. Splitting out the 32bit and 64bit parts looks better.

No code change intended.
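
One subtlety of the split, sketched below: on a 32bit build callchain_64.c
is not compiled at all, so read_user_stack_slow() exists only as the
declaration in callchain.h. That still links because its only caller
guards the call with IS_ENABLED(CONFIG_PPC64), which the compiler folds
away together with the symbol reference. A userspace analogue (assumes
the constant-false branch is eliminated, as in normal kernel builds;
IS_ENABLED simplified to a plain macro):

  #include <stdio.h>

  #define CONFIG_PPC64 0        /* pretend a 32bit build */
  #define IS_ENABLED(cfg) (cfg) /* simplified stand-in */

  /* Declared as in callchain.h but deliberately never defined,
   * mimicking a build where callchain_64.o does not exist. */
  int read_user_stack_slow(void *ptr, void *buf, int nb);

  static int read_word(unsigned int *ptr, unsigned int *val)
  {
          int rc = 0;

          *val = *ptr; /* stands in for a successful probe_user_read() */

          /* Constant-false: the call and the reference are discarded,
           * so the missing definition never reaches the linker. */
          if (IS_ENABLED(CONFIG_PPC64) && rc)
                  return read_user_stack_slow(ptr, val, 4);

          return rc;
  }

  int main(void)
  {
          unsigned int src = 42, word;

          read_word(&src, &word);
          printf("%u\n", word);
          return 0;
  }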

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
---
v6:
 - move current_is_64bit consolidation to earlier patch
 - move defines to the top of callchain_32.c
 - Makefile cleanup
v8:
 - fix valid_user_sp
v11:
 - rebase on top of def0bfdbd603
---
 arch/powerpc/perf/Makefile       |   5 +-
 arch/powerpc/perf/callchain.c    | 356 +------------------------------
 arch/powerpc/perf/callchain.h    |  19 ++
 arch/powerpc/perf/callchain_32.c | 196 +++++++++++++++++
 arch/powerpc/perf/callchain_64.c | 174 +++++++++++++++
 5 files changed, 394 insertions(+), 356 deletions(-)
 create mode 100644 arch/powerpc/perf/callchain.h
 create mode 100644 arch/powerpc/perf/callchain_32.c
 create mode 100644 arch/powerpc/perf/callchain_64.c

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index c155dcbb8691..53d614e98537 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -1,6 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0
 
-obj-$(CONFIG_PERF_EVENTS)	+= callchain.o perf_regs.o
+obj-$(CONFIG_PERF_EVENTS)	+= callchain.o callchain_$(BITS).o perf_regs.o
+ifdef CONFIG_COMPAT
+obj-$(CONFIG_PERF_EVENTS)	+= callchain_32.o
+endif
 
 obj-$(CONFIG_PPC_PERF_CTRS)	+= core-book3s.o bhrb.o
 obj64-$(CONFIG_PPC_PERF_CTRS)	+= ppc970-pmu.o power5-pmu.o \
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index b5afd0bec4f8..dd5051015008 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -15,11 +15,9 @@
 #include <asm/sigcontext.h>
 #include <asm/ucontext.h>
 #include <asm/vdso.h>
-#ifdef CONFIG_COMPAT
-#include "../kernel/ppc32.h"
-#endif
 #include <asm/pte-walk.h>
 
+#include "callchain.h"
 
 /*
  * Is sp valid as the address of the next kernel stack frame after prev_sp?
@@ -102,358 +100,6 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
 	}
 }
 
-static inline bool invalid_user_sp(unsigned long sp)
-{
-	unsigned long mask = is_32bit_task() ? 3 : 7;
-	unsigned long top = STACK_TOP - (is_32bit_task() ? 16 : 32);
-
-	return (!sp || (sp & mask) || (sp > top));
-}
-
-#ifdef CONFIG_PPC64
-/*
- * On 64-bit we don't want to invoke hash_page on user addresses from
- * interrupt context, so if the access faults, we read the page tables
- * to find which page (if any) is mapped and access it directly.
- */
-static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
-{
-	int ret = -EFAULT;
-	pgd_t *pgdir;
-	pte_t *ptep, pte;
-	unsigned shift;
-	unsigned long addr = (unsigned long) ptr;
-	unsigned long offset;
-	unsigned long pfn, flags;
-	void *kaddr;
-
-	pgdir = current->mm->pgd;
-	if (!pgdir)
-		return -EFAULT;
-
-	local_irq_save(flags);
-	ptep = find_current_mm_pte(pgdir, addr, NULL, &shift);
-	if (!ptep)
-		goto err_out;
-	if (!shift)
-		shift = PAGE_SHIFT;
-
-	/* align address to page boundary */
-	offset = addr & ((1UL << shift) - 1);
-
-	pte = READ_ONCE(*ptep);
-	if (!pte_present(pte) || !pte_user(pte))
-		goto err_out;
-	pfn = pte_pfn(pte);
-	if (!page_is_ram(pfn))
-		goto err_out;
-
-	/* no highmem to worry about here */
-	kaddr = pfn_to_kaddr(pfn);
-	memcpy(buf, kaddr + offset, nb);
-	ret = 0;
-err_out:
-	local_irq_restore(flags);
-	return ret;
-}
-
-static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
-{
-	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned long) ||
-	    ((unsigned long)ptr & 7))
-		return -EFAULT;
-
-	if (!probe_user_read(ret, ptr, sizeof(*ret)))
-		return 0;
-
-	return read_user_stack_slow(ptr, ret, 8);
-}
-
-/*
- * 64-bit user processes use the same stack frame for RT and non-RT signals.
- */
-struct signal_frame_64 {
-	char		dummy[__SIGNAL_FRAMESIZE];
-	struct ucontext	uc;
-	unsigned long	unused[2];
-	unsigned int	tramp[6];
-	struct siginfo	*pinfo;
-	void		*puc;
-	struct siginfo	info;
-	char		abigap[288];
-};
-
-static int is_sigreturn_64_address(unsigned long nip, unsigned long fp)
-{
-	if (nip == fp + offsetof(struct signal_frame_64, tramp))
-		return 1;
-	if (vdso64_rt_sigtramp && current->mm->context.vdso_base &&
-	    nip == current->mm->context.vdso_base + vdso64_rt_sigtramp)
-		return 1;
-	return 0;
-}
-
-/*
- * Do some sanity checking on the signal frame pointed to by sp.
- * We check the pinfo and puc pointers in the frame.
- */
-static int sane_signal_64_frame(unsigned long sp)
-{
-	struct signal_frame_64 __user *sf;
-	unsigned long pinfo, puc;
-
-	sf = (struct signal_frame_64 __user *) sp;
-	if (read_user_stack_64((unsigned long __user *) &sf->pinfo, &pinfo) ||
-	    read_user_stack_64((unsigned long __user *) &sf->puc, &puc))
-		return 0;
-	return pinfo == (unsigned long) &sf->info &&
-		puc == (unsigned long) &sf->uc;
-}
-
-static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
-				   struct pt_regs *regs)
-{
-	unsigned long sp, next_sp;
-	unsigned long next_ip;
-	unsigned long lr;
-	long level = 0;
-	struct signal_frame_64 __user *sigframe;
-	unsigned long __user *fp, *uregs;
-
-	next_ip = perf_instruction_pointer(regs);
-	lr = regs->link;
-	sp = regs->gpr[1];
-	perf_callchain_store(entry, next_ip);
-
-	while (entry->nr < entry->max_stack) {
-		fp = (unsigned long __user *) sp;
-		if (invalid_user_sp(sp) || read_user_stack_64(fp, &next_sp))
-			return;
-		if (level > 0 && read_user_stack_64(&fp[2], &next_ip))
-			return;
-
-		/*
-		 * Note: the next_sp - sp >= signal frame size check
-		 * is true when next_sp < sp, which can happen when
-		 * transitioning from an alternate signal stack to the
-		 * normal stack.
-		 */
-		if (next_sp - sp >= sizeof(struct signal_frame_64) &&
-		    (is_sigreturn_64_address(next_ip, sp) ||
-		     (level <= 1 && is_sigreturn_64_address(lr, sp))) &&
-		    sane_signal_64_frame(sp)) {
-			/*
-			 * This looks like an signal frame
-			 */
-			sigframe = (struct signal_frame_64 __user *) sp;
-			uregs = sigframe->uc.uc_mcontext.gp_regs;
-			if (read_user_stack_64(&uregs[PT_NIP], &next_ip) ||
-			    read_user_stack_64(&uregs[PT_LNK], &lr) ||
-			    read_user_stack_64(&uregs[PT_R1], &sp))
-				return;
-			level = 0;
-			perf_callchain_store_context(entry, PERF_CONTEXT_USER);
-			perf_callchain_store(entry, next_ip);
-			continue;
-		}
-
-		if (level == 0)
-			next_ip = lr;
-		perf_callchain_store(entry, next_ip);
-		++level;
-		sp = next_sp;
-	}
-}
-
-#else  /* CONFIG_PPC64 */
-static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
-{
-	return 0;
-}
-
-static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
-					  struct pt_regs *regs)
-{
-}
-
-#define __SIGNAL_FRAMESIZE32	__SIGNAL_FRAMESIZE
-#define sigcontext32		sigcontext
-#define mcontext32		mcontext
-#define ucontext32		ucontext
-#define compat_siginfo_t	struct siginfo
-
-#endif /* CONFIG_PPC64 */
-
-#if defined(CONFIG_PPC32) || defined(CONFIG_COMPAT)
-/*
- * On 32-bit we just access the address and let hash_page create a
- * HPTE if necessary, so there is no need to fall back to reading
- * the page tables.  Since this is called at interrupt level,
- * do_page_fault() won't treat a DSI as a page fault.
- */
-static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
-{
-	int rc;
-
-	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
-	    ((unsigned long)ptr & 3))
-		return -EFAULT;
-
-	rc = probe_user_read(ret, ptr, sizeof(*ret));
-
-	if (IS_ENABLED(CONFIG_PPC64) && rc)
-		return read_user_stack_slow(ptr, ret, 4);
-
-	return rc;
-}
-
-/*
- * Layout for non-RT signal frames
- */
-struct signal_frame_32 {
-	char			dummy[__SIGNAL_FRAMESIZE32];
-	struct sigcontext32	sctx;
-	struct mcontext32	mctx;
-	int			abigap[56];
-};
-
-/*
- * Layout for RT signal frames
- */
-struct rt_signal_frame_32 {
-	char			dummy[__SIGNAL_FRAMESIZE32 + 16];
-	compat_siginfo_t	info;
-	struct ucontext32	uc;
-	int			abigap[56];
-};
-
-static int is_sigreturn_32_address(unsigned int nip, unsigned int fp)
-{
-	if (nip == fp + offsetof(struct signal_frame_32, mctx.mc_pad))
-		return 1;
-	if (vdso32_sigtramp && current->mm->context.vdso_base &&
-	    nip == current->mm->context.vdso_base + vdso32_sigtramp)
-		return 1;
-	return 0;
-}
-
-static int is_rt_sigreturn_32_address(unsigned int nip, unsigned int fp)
-{
-	if (nip == fp + offsetof(struct rt_signal_frame_32,
-				 uc.uc_mcontext.mc_pad))
-		return 1;
-	if (vdso32_rt_sigtramp && current->mm->context.vdso_base &&
-	    nip == current->mm->context.vdso_base + vdso32_rt_sigtramp)
-		return 1;
-	return 0;
-}
-
-static int sane_signal_32_frame(unsigned int sp)
-{
-	struct signal_frame_32 __user *sf;
-	unsigned int regs;
-
-	sf = (struct signal_frame_32 __user *) (unsigned long) sp;
-	if (read_user_stack_32((unsigned int __user *) &sf->sctx.regs, &regs))
-		return 0;
-	return regs == (unsigned long) &sf->mctx;
-}
-
-static int sane_rt_signal_32_frame(unsigned int sp)
-{
-	struct rt_signal_frame_32 __user *sf;
-	unsigned int regs;
-
-	sf = (struct rt_signal_frame_32 __user *) (unsigned long) sp;
-	if (read_user_stack_32((unsigned int __user *) &sf->uc.uc_regs, &regs))
-		return 0;
-	return regs == (unsigned long) &sf->uc.uc_mcontext;
-}
-
-static unsigned int __user *signal_frame_32_regs(unsigned int sp,
-				unsigned int next_sp, unsigned int next_ip)
-{
-	struct mcontext32 __user *mctx = NULL;
-	struct signal_frame_32 __user *sf;
-	struct rt_signal_frame_32 __user *rt_sf;
-
-	/*
-	 * Note: the next_sp - sp >= signal frame size check
-	 * is true when next_sp < sp, for example, when
-	 * transitioning from an alternate signal stack to the
-	 * normal stack.
-	 */
-	if (next_sp - sp >= sizeof(struct signal_frame_32) &&
-	    is_sigreturn_32_address(next_ip, sp) &&
-	    sane_signal_32_frame(sp)) {
-		sf = (struct signal_frame_32 __user *) (unsigned long) sp;
-		mctx = &sf->mctx;
-	}
-
-	if (!mctx && next_sp - sp >= sizeof(struct rt_signal_frame_32) &&
-	    is_rt_sigreturn_32_address(next_ip, sp) &&
-	    sane_rt_signal_32_frame(sp)) {
-		rt_sf = (struct rt_signal_frame_32 __user *) (unsigned long) sp;
-		mctx = &rt_sf->uc.uc_mcontext;
-	}
-
-	if (!mctx)
-		return NULL;
-	return mctx->mc_gregs;
-}
-
-static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
-				   struct pt_regs *regs)
-{
-	unsigned int sp, next_sp;
-	unsigned int next_ip;
-	unsigned int lr;
-	long level = 0;
-	unsigned int __user *fp, *uregs;
-
-	next_ip = perf_instruction_pointer(regs);
-	lr = regs->link;
-	sp = regs->gpr[1];
-	perf_callchain_store(entry, next_ip);
-
-	while (entry->nr < entry->max_stack) {
-		fp = (unsigned int __user *) (unsigned long) sp;
-		if (invalid_user_sp(sp) || read_user_stack_32(fp, &next_sp))
-			return;
-		if (level > 0 && read_user_stack_32(&fp[1], &next_ip))
-			return;
-
-		uregs = signal_frame_32_regs(sp, next_sp, next_ip);
-		if (!uregs && level <= 1)
-			uregs = signal_frame_32_regs(sp, next_sp, lr);
-		if (uregs) {
-			/*
-			 * This looks like an signal frame, so restart
-			 * the stack trace with the values in it.
-			 */
-			if (read_user_stack_32(&uregs[PT_NIP], &next_ip) ||
-			    read_user_stack_32(&uregs[PT_LNK], &lr) ||
-			    read_user_stack_32(&uregs[PT_R1], &sp))
-				return;
-			level = 0;
-			perf_callchain_store_context(entry, PERF_CONTEXT_USER);
-			perf_callchain_store(entry, next_ip);
-			continue;
-		}
-
-		if (level == 0)
-			next_ip = lr;
-		perf_callchain_store(entry, next_ip);
-		++level;
-		sp = next_sp;
-	}
-}
-#else /* 32bit */
-static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
-				   struct pt_regs *regs)
-{}
-#endif /* 32bit */
-
 void
 perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
diff --git a/arch/powerpc/perf/callchain.h b/arch/powerpc/perf/callchain.h
new file mode 100644
index 000000000000..7a2cb9e1181a
--- /dev/null
+++ b/arch/powerpc/perf/callchain.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _POWERPC_PERF_CALLCHAIN_H
+#define _POWERPC_PERF_CALLCHAIN_H
+
+int read_user_stack_slow(void __user *ptr, void *buf, int nb);
+void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
+			    struct pt_regs *regs);
+void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
+			    struct pt_regs *regs);
+
+static inline bool invalid_user_sp(unsigned long sp)
+{
+	unsigned long mask = is_32bit_task() ? 3 : 7;
+	unsigned long top = STACK_TOP - (is_32bit_task() ? 16 : 32);
+
+	return (!sp || (sp & mask) || (sp > top));
+}
+
+#endif /* _POWERPC_PERF_CALLCHAIN_H */
diff --git a/arch/powerpc/perf/callchain_32.c b/arch/powerpc/perf/callchain_32.c
new file mode 100644
index 000000000000..8aa951003141
--- /dev/null
+++ b/arch/powerpc/perf/callchain_32.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Performance counter callchain support - powerpc architecture code
+ *
+ * Copyright © 2009 Paul Mackerras, IBM Corporation.
+ */
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/perf_event.h>
+#include <linux/percpu.h>
+#include <linux/uaccess.h>
+#include <linux/mm.h>
+#include <asm/ptrace.h>
+#include <asm/pgtable.h>
+#include <asm/sigcontext.h>
+#include <asm/ucontext.h>
+#include <asm/vdso.h>
+#include <asm/pte-walk.h>
+
+#include "callchain.h"
+
+#ifdef CONFIG_PPC64
+#include "../kernel/ppc32.h"
+#else  /* CONFIG_PPC64 */
+
+#define __SIGNAL_FRAMESIZE32	__SIGNAL_FRAMESIZE
+#define sigcontext32		sigcontext
+#define mcontext32		mcontext
+#define ucontext32		ucontext
+#define compat_siginfo_t	struct siginfo
+
+#endif /* CONFIG_PPC64 */
+
+/*
+ * On 32-bit we just access the address and let hash_page create a
+ * HPTE if necessary, so there is no need to fall back to reading
+ * the page tables.  Since this is called at interrupt level,
+ * do_page_fault() won't treat a DSI as a page fault.
+ */
+static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
+{
+	int rc;
+
+	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
+	    ((unsigned long)ptr & 3))
+		return -EFAULT;
+
+	rc = probe_user_read(ret, ptr, sizeof(*ret));
+
+	if (IS_ENABLED(CONFIG_PPC64) && rc)
+		return read_user_stack_slow(ptr, ret, 4);
+
+	return rc;
+}
+
+/*
+ * Layout for non-RT signal frames
+ */
+struct signal_frame_32 {
+	char			dummy[__SIGNAL_FRAMESIZE32];
+	struct sigcontext32	sctx;
+	struct mcontext32	mctx;
+	int			abigap[56];
+};
+
+/*
+ * Layout for RT signal frames
+ */
+struct rt_signal_frame_32 {
+	char			dummy[__SIGNAL_FRAMESIZE32 + 16];
+	compat_siginfo_t	info;
+	struct ucontext32	uc;
+	int			abigap[56];
+};
+
+static int is_sigreturn_32_address(unsigned int nip, unsigned int fp)
+{
+	if (nip == fp + offsetof(struct signal_frame_32, mctx.mc_pad))
+		return 1;
+	if (vdso32_sigtramp && current->mm->context.vdso_base &&
+	    nip == current->mm->context.vdso_base + vdso32_sigtramp)
+		return 1;
+	return 0;
+}
+
+static int is_rt_sigreturn_32_address(unsigned int nip, unsigned int fp)
+{
+	if (nip == fp + offsetof(struct rt_signal_frame_32,
+				 uc.uc_mcontext.mc_pad))
+		return 1;
+	if (vdso32_rt_sigtramp && current->mm->context.vdso_base &&
+	    nip == current->mm->context.vdso_base + vdso32_rt_sigtramp)
+		return 1;
+	return 0;
+}
+
+static int sane_signal_32_frame(unsigned int sp)
+{
+	struct signal_frame_32 __user *sf;
+	unsigned int regs;
+
+	sf = (struct signal_frame_32 __user *) (unsigned long) sp;
+	if (read_user_stack_32((unsigned int __user *) &sf->sctx.regs, &regs))
+		return 0;
+	return regs == (unsigned long) &sf->mctx;
+}
+
+static int sane_rt_signal_32_frame(unsigned int sp)
+{
+	struct rt_signal_frame_32 __user *sf;
+	unsigned int regs;
+
+	sf = (struct rt_signal_frame_32 __user *) (unsigned long) sp;
+	if (read_user_stack_32((unsigned int __user *) &sf->uc.uc_regs, &regs))
+		return 0;
+	return regs == (unsigned long) &sf->uc.uc_mcontext;
+}
+
+static unsigned int __user *signal_frame_32_regs(unsigned int sp,
+				unsigned int next_sp, unsigned int next_ip)
+{
+	struct mcontext32 __user *mctx = NULL;
+	struct signal_frame_32 __user *sf;
+	struct rt_signal_frame_32 __user *rt_sf;
+
+	/*
+	 * Note: the next_sp - sp >= signal frame size check
+	 * is true when next_sp < sp, for example, when
+	 * transitioning from an alternate signal stack to the
+	 * normal stack.
+	 */
+	if (next_sp - sp >= sizeof(struct signal_frame_32) &&
+	    is_sigreturn_32_address(next_ip, sp) &&
+	    sane_signal_32_frame(sp)) {
+		sf = (struct signal_frame_32 __user *) (unsigned long) sp;
+		mctx = &sf->mctx;
+	}
+
+	if (!mctx && next_sp - sp >= sizeof(struct rt_signal_frame_32) &&
+	    is_rt_sigreturn_32_address(next_ip, sp) &&
+	    sane_rt_signal_32_frame(sp)) {
+		rt_sf = (struct rt_signal_frame_32 __user *) (unsigned long) sp;
+		mctx = &rt_sf->uc.uc_mcontext;
+	}
+
+	if (!mctx)
+		return NULL;
+	return mctx->mc_gregs;
+}
+
+void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
+			    struct pt_regs *regs)
+{
+	unsigned int sp, next_sp;
+	unsigned int next_ip;
+	unsigned int lr;
+	long level = 0;
+	unsigned int __user *fp, *uregs;
+
+	next_ip = perf_instruction_pointer(regs);
+	lr = regs->link;
+	sp = regs->gpr[1];
+	perf_callchain_store(entry, next_ip);
+
+	while (entry->nr < entry->max_stack) {
+		fp = (unsigned int __user *) (unsigned long) sp;
+		if (invalid_user_sp(sp) || read_user_stack_32(fp, &next_sp))
+			return;
+		if (level > 0 && read_user_stack_32(&fp[1], &next_ip))
+			return;
+
+		uregs = signal_frame_32_regs(sp, next_sp, next_ip);
+		if (!uregs && level <= 1)
+			uregs = signal_frame_32_regs(sp, next_sp, lr);
+		if (uregs) {
+			/*
+			 * This looks like a signal frame, so restart
+			 * the stack trace with the values in it.
+			 */
+			if (read_user_stack_32(&uregs[PT_NIP], &next_ip) ||
+			    read_user_stack_32(&uregs[PT_LNK], &lr) ||
+			    read_user_stack_32(&uregs[PT_R1], &sp))
+				return;
+			level = 0;
+			perf_callchain_store_context(entry, PERF_CONTEXT_USER);
+			perf_callchain_store(entry, next_ip);
+			continue;
+		}
+
+		if (level == 0)
+			next_ip = lr;
+		perf_callchain_store(entry, next_ip);
+		++level;
+		sp = next_sp;
+	}
+}
diff --git a/arch/powerpc/perf/callchain_64.c b/arch/powerpc/perf/callchain_64.c
new file mode 100644
index 000000000000..df1ffd8b20f2
--- /dev/null
+++ b/arch/powerpc/perf/callchain_64.c
@@ -0,0 +1,174 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Performance counter callchain support - powerpc architecture code
+ *
+ * Copyright © 2009 Paul Mackerras, IBM Corporation.
+ */
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/perf_event.h>
+#include <linux/percpu.h>
+#include <linux/uaccess.h>
+#include <linux/mm.h>
+#include <asm/ptrace.h>
+#include <asm/pgtable.h>
+#include <asm/sigcontext.h>
+#include <asm/ucontext.h>
+#include <asm/vdso.h>
+#include <asm/pte-walk.h>
+
+#include "callchain.h"
+
+/*
+ * On 64-bit we don't want to invoke hash_page on user addresses from
+ * interrupt context, so if the access faults, we read the page tables
+ * to find which page (if any) is mapped and access it directly.
+ */
+int read_user_stack_slow(void __user *ptr, void *buf, int nb)
+{
+	int ret = -EFAULT;
+	pgd_t *pgdir;
+	pte_t *ptep, pte;
+	unsigned int shift;
+	unsigned long addr = (unsigned long) ptr;
+	unsigned long offset;
+	unsigned long pfn, flags;
+	void *kaddr;
+
+	pgdir = current->mm->pgd;
+	if (!pgdir)
+		return -EFAULT;
+
+	local_irq_save(flags);
+	ptep = find_current_mm_pte(pgdir, addr, NULL, &shift);
+	if (!ptep)
+		goto err_out;
+	if (!shift)
+		shift = PAGE_SHIFT;
+
+	/* align address to page boundary */
+	offset = addr & ((1UL << shift) - 1);
+
+	pte = READ_ONCE(*ptep);
+	if (!pte_present(pte) || !pte_user(pte))
+		goto err_out;
+	pfn = pte_pfn(pte);
+	if (!page_is_ram(pfn))
+		goto err_out;
+
+	/* no highmem to worry about here */
+	kaddr = pfn_to_kaddr(pfn);
+	memcpy(buf, kaddr + offset, nb);
+	ret = 0;
+err_out:
+	local_irq_restore(flags);
+	return ret;
+}
+
+static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
+{
+	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned long) ||
+	    ((unsigned long)ptr & 7))
+		return -EFAULT;
+
+	if (!probe_user_read(ret, ptr, sizeof(*ret)))
+		return 0;
+
+	return read_user_stack_slow(ptr, ret, 8);
+}
+
+/*
+ * 64-bit user processes use the same stack frame for RT and non-RT signals.
+ */
+struct signal_frame_64 {
+	char		dummy[__SIGNAL_FRAMESIZE];
+	struct ucontext	uc;
+	unsigned long	unused[2];
+	unsigned int	tramp[6];
+	struct siginfo	*pinfo;
+	void		*puc;
+	struct siginfo	info;
+	char		abigap[288];
+};
+
+static int is_sigreturn_64_address(unsigned long nip, unsigned long fp)
+{
+	if (nip == fp + offsetof(struct signal_frame_64, tramp))
+		return 1;
+	if (vdso64_rt_sigtramp && current->mm->context.vdso_base &&
+	    nip == current->mm->context.vdso_base + vdso64_rt_sigtramp)
+		return 1;
+	return 0;
+}
+
+/*
+ * Do some sanity checking on the signal frame pointed to by sp.
+ * We check the pinfo and puc pointers in the frame.
+ */
+static int sane_signal_64_frame(unsigned long sp)
+{
+	struct signal_frame_64 __user *sf;
+	unsigned long pinfo, puc;
+
+	sf = (struct signal_frame_64 __user *) sp;
+	if (read_user_stack_64((unsigned long __user *) &sf->pinfo, &pinfo) ||
+	    read_user_stack_64((unsigned long __user *) &sf->puc, &puc))
+		return 0;
+	return pinfo == (unsigned long) &sf->info &&
+		puc == (unsigned long) &sf->uc;
+}
+
+void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
+			    struct pt_regs *regs)
+{
+	unsigned long sp, next_sp;
+	unsigned long next_ip;
+	unsigned long lr;
+	long level = 0;
+	struct signal_frame_64 __user *sigframe;
+	unsigned long __user *fp, *uregs;
+
+	next_ip = perf_instruction_pointer(regs);
+	lr = regs->link;
+	sp = regs->gpr[1];
+	perf_callchain_store(entry, next_ip);
+
+	while (entry->nr < entry->max_stack) {
+		fp = (unsigned long __user *) sp;
+		if (invalid_user_sp(sp) || read_user_stack_64(fp, &next_sp))
+			return;
+		if (level > 0 && read_user_stack_64(&fp[2], &next_ip))
+			return;
+
+		/*
+		 * Note: the next_sp - sp >= signal frame size check
+		 * is true when next_sp < sp, which can happen when
+		 * transitioning from an alternate signal stack to the
+		 * normal stack.
+		 */
+		if (next_sp - sp >= sizeof(struct signal_frame_64) &&
+		    (is_sigreturn_64_address(next_ip, sp) ||
+		     (level <= 1 && is_sigreturn_64_address(lr, sp))) &&
+		    sane_signal_64_frame(sp)) {
+			/*
+			 * This looks like a signal frame
+			 */
+			sigframe = (struct signal_frame_64 __user *) sp;
+			uregs = sigframe->uc.uc_mcontext.gp_regs;
+			if (read_user_stack_64(&uregs[PT_NIP], &next_ip) ||
+			    read_user_stack_64(&uregs[PT_LNK], &lr) ||
+			    read_user_stack_64(&uregs[PT_R1], &sp))
+				return;
+			level = 0;
+			perf_callchain_store_context(entry, PERF_CONTEXT_USER);
+			perf_callchain_store(entry, next_ip);
+			continue;
+		}
+
+		if (level == 0)
+			next_ip = lr;
+		perf_callchain_store(entry, next_ip);
+		++level;
+		sp = next_sp;
+	}
+}
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v12 7/8] powerpc/perf: split callchain.c by bitness
@ 2020-03-20 10:20     ` Michal Suchanek
  0 siblings, 0 replies; 161+ messages in thread
From: Michal Suchanek @ 2020-03-20 10:20 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Mark Rutland, Gustavo Luiz Duarte, Peter Zijlstra,
	Sebastian Andrzej Siewior, linux-kernel, Paul Mackerras,
	Jiri Olsa, Rob Herring, Michael Neuling, Mauro Carvalho Chehab,
	Masahiro Yamada, Nayna Jain, Alexander Shishkin, Ingo Molnar,
	Allison Randal, Jordan Niethe, Michal Suchanek,
	Valentin Schneider, Arnd Bergmann, Arnaldo Carvalho de Melo,
	Alexander Viro, Jonathan Cameron, Namhyung Kim, Thomas Gleixner,
	Andy Shevchenko, Hari Bathini, Greg Kroah-Hartman,
	Nicholas Piggin, Claudio Carvalho, Eric Richter,
	Eric W. Biederman, linux-fsdevel, David S. Miller,
	Thiago Jung Bauermann

Building callchain.c with !COMPAT proved quite ugly with all the
defines. Splitting out the 32bit and 64bit parts looks better.

No code change intended.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
---
v6:
 - move current_is_64bit consolidation to earlier patch
 - move defines to the top of callchain_32.c
 - Makefile cleanup
v8:
 - fix valid_user_sp
v11:
 - rebase on top of def0bfdbd603
---
 arch/powerpc/perf/Makefile       |   5 +-
 arch/powerpc/perf/callchain.c    | 356 +------------------------------
 arch/powerpc/perf/callchain.h    |  19 ++
 arch/powerpc/perf/callchain_32.c | 196 +++++++++++++++++
 arch/powerpc/perf/callchain_64.c | 174 +++++++++++++++
 5 files changed, 394 insertions(+), 356 deletions(-)
 create mode 100644 arch/powerpc/perf/callchain.h
 create mode 100644 arch/powerpc/perf/callchain_32.c
 create mode 100644 arch/powerpc/perf/callchain_64.c

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index c155dcbb8691..53d614e98537 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -1,6 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0
 
-obj-$(CONFIG_PERF_EVENTS)	+= callchain.o perf_regs.o
+obj-$(CONFIG_PERF_EVENTS)	+= callchain.o callchain_$(BITS).o perf_regs.o
+ifdef CONFIG_COMPAT
+obj-$(CONFIG_PERF_EVENTS)	+= callchain_32.o
+endif
 
 obj-$(CONFIG_PPC_PERF_CTRS)	+= core-book3s.o bhrb.o
 obj64-$(CONFIG_PPC_PERF_CTRS)	+= ppc970-pmu.o power5-pmu.o \
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index b5afd0bec4f8..dd5051015008 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -15,11 +15,9 @@
 #include <asm/sigcontext.h>
 #include <asm/ucontext.h>
 #include <asm/vdso.h>
-#ifdef CONFIG_COMPAT
-#include "../kernel/ppc32.h"
-#endif
 #include <asm/pte-walk.h>
 
+#include "callchain.h"
 
 /*
  * Is sp valid as the address of the next kernel stack frame after prev_sp?
@@ -102,358 +100,6 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
 	}
 }
 
-static inline bool invalid_user_sp(unsigned long sp)
-{
-	unsigned long mask = is_32bit_task() ? 3 : 7;
-	unsigned long top = STACK_TOP - (is_32bit_task() ? 16 : 32);
-
-	return (!sp || (sp & mask) || (sp > top));
-}
-
-#ifdef CONFIG_PPC64
-/*
- * On 64-bit we don't want to invoke hash_page on user addresses from
- * interrupt context, so if the access faults, we read the page tables
- * to find which page (if any) is mapped and access it directly.
- */
-static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
-{
-	int ret = -EFAULT;
-	pgd_t *pgdir;
-	pte_t *ptep, pte;
-	unsigned shift;
-	unsigned long addr = (unsigned long) ptr;
-	unsigned long offset;
-	unsigned long pfn, flags;
-	void *kaddr;
-
-	pgdir = current->mm->pgd;
-	if (!pgdir)
-		return -EFAULT;
-
-	local_irq_save(flags);
-	ptep = find_current_mm_pte(pgdir, addr, NULL, &shift);
-	if (!ptep)
-		goto err_out;
-	if (!shift)
-		shift = PAGE_SHIFT;
-
-	/* align address to page boundary */
-	offset = addr & ((1UL << shift) - 1);
-
-	pte = READ_ONCE(*ptep);
-	if (!pte_present(pte) || !pte_user(pte))
-		goto err_out;
-	pfn = pte_pfn(pte);
-	if (!page_is_ram(pfn))
-		goto err_out;
-
-	/* no highmem to worry about here */
-	kaddr = pfn_to_kaddr(pfn);
-	memcpy(buf, kaddr + offset, nb);
-	ret = 0;
-err_out:
-	local_irq_restore(flags);
-	return ret;
-}
-
-static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
-{
-	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned long) ||
-	    ((unsigned long)ptr & 7))
-		return -EFAULT;
-
-	if (!probe_user_read(ret, ptr, sizeof(*ret)))
-		return 0;
-
-	return read_user_stack_slow(ptr, ret, 8);
-}
-
-/*
- * 64-bit user processes use the same stack frame for RT and non-RT signals.
- */
-struct signal_frame_64 {
-	char		dummy[__SIGNAL_FRAMESIZE];
-	struct ucontext	uc;
-	unsigned long	unused[2];
-	unsigned int	tramp[6];
-	struct siginfo	*pinfo;
-	void		*puc;
-	struct siginfo	info;
-	char		abigap[288];
-};
-
-static int is_sigreturn_64_address(unsigned long nip, unsigned long fp)
-{
-	if (nip == fp + offsetof(struct signal_frame_64, tramp))
-		return 1;
-	if (vdso64_rt_sigtramp && current->mm->context.vdso_base &&
-	    nip == current->mm->context.vdso_base + vdso64_rt_sigtramp)
-		return 1;
-	return 0;
-}
-
-/*
- * Do some sanity checking on the signal frame pointed to by sp.
- * We check the pinfo and puc pointers in the frame.
- */
-static int sane_signal_64_frame(unsigned long sp)
-{
-	struct signal_frame_64 __user *sf;
-	unsigned long pinfo, puc;
-
-	sf = (struct signal_frame_64 __user *) sp;
-	if (read_user_stack_64((unsigned long __user *) &sf->pinfo, &pinfo) ||
-	    read_user_stack_64((unsigned long __user *) &sf->puc, &puc))
-		return 0;
-	return pinfo == (unsigned long) &sf->info &&
-		puc == (unsigned long) &sf->uc;
-}
-
-static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
-				   struct pt_regs *regs)
-{
-	unsigned long sp, next_sp;
-	unsigned long next_ip;
-	unsigned long lr;
-	long level = 0;
-	struct signal_frame_64 __user *sigframe;
-	unsigned long __user *fp, *uregs;
-
-	next_ip = perf_instruction_pointer(regs);
-	lr = regs->link;
-	sp = regs->gpr[1];
-	perf_callchain_store(entry, next_ip);
-
-	while (entry->nr < entry->max_stack) {
-		fp = (unsigned long __user *) sp;
-		if (invalid_user_sp(sp) || read_user_stack_64(fp, &next_sp))
-			return;
-		if (level > 0 && read_user_stack_64(&fp[2], &next_ip))
-			return;
-
-		/*
-		 * Note: the next_sp - sp >= signal frame size check
-		 * is true when next_sp < sp, which can happen when
-		 * transitioning from an alternate signal stack to the
-		 * normal stack.
-		 */
-		if (next_sp - sp >= sizeof(struct signal_frame_64) &&
-		    (is_sigreturn_64_address(next_ip, sp) ||
-		     (level <= 1 && is_sigreturn_64_address(lr, sp))) &&
-		    sane_signal_64_frame(sp)) {
-			/*
-			 * This looks like an signal frame
-			 */
-			sigframe = (struct signal_frame_64 __user *) sp;
-			uregs = sigframe->uc.uc_mcontext.gp_regs;
-			if (read_user_stack_64(&uregs[PT_NIP], &next_ip) ||
-			    read_user_stack_64(&uregs[PT_LNK], &lr) ||
-			    read_user_stack_64(&uregs[PT_R1], &sp))
-				return;
-			level = 0;
-			perf_callchain_store_context(entry, PERF_CONTEXT_USER);
-			perf_callchain_store(entry, next_ip);
-			continue;
-		}
-
-		if (level == 0)
-			next_ip = lr;
-		perf_callchain_store(entry, next_ip);
-		++level;
-		sp = next_sp;
-	}
-}
-
-#else  /* CONFIG_PPC64 */
-static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
-{
-	return 0;
-}
-
-static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
-					  struct pt_regs *regs)
-{
-}
-
-#define __SIGNAL_FRAMESIZE32	__SIGNAL_FRAMESIZE
-#define sigcontext32		sigcontext
-#define mcontext32		mcontext
-#define ucontext32		ucontext
-#define compat_siginfo_t	struct siginfo
-
-#endif /* CONFIG_PPC64 */
-
-#if defined(CONFIG_PPC32) || defined(CONFIG_COMPAT)
-/*
- * On 32-bit we just access the address and let hash_page create a
- * HPTE if necessary, so there is no need to fall back to reading
- * the page tables.  Since this is called at interrupt level,
- * do_page_fault() won't treat a DSI as a page fault.
- */
-static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
-{
-	int rc;
-
-	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
-	    ((unsigned long)ptr & 3))
-		return -EFAULT;
-
-	rc = probe_user_read(ret, ptr, sizeof(*ret));
-
-	if (IS_ENABLED(CONFIG_PPC64) && rc)
-		return read_user_stack_slow(ptr, ret, 4);
-
-	return rc;
-}
-
-/*
- * Layout for non-RT signal frames
- */
-struct signal_frame_32 {
-	char			dummy[__SIGNAL_FRAMESIZE32];
-	struct sigcontext32	sctx;
-	struct mcontext32	mctx;
-	int			abigap[56];
-};
-
-/*
- * Layout for RT signal frames
- */
-struct rt_signal_frame_32 {
-	char			dummy[__SIGNAL_FRAMESIZE32 + 16];
-	compat_siginfo_t	info;
-	struct ucontext32	uc;
-	int			abigap[56];
-};
-
-static int is_sigreturn_32_address(unsigned int nip, unsigned int fp)
-{
-	if (nip == fp + offsetof(struct signal_frame_32, mctx.mc_pad))
-		return 1;
-	if (vdso32_sigtramp && current->mm->context.vdso_base &&
-	    nip == current->mm->context.vdso_base + vdso32_sigtramp)
-		return 1;
-	return 0;
-}
-
-static int is_rt_sigreturn_32_address(unsigned int nip, unsigned int fp)
-{
-	if (nip == fp + offsetof(struct rt_signal_frame_32,
-				 uc.uc_mcontext.mc_pad))
-		return 1;
-	if (vdso32_rt_sigtramp && current->mm->context.vdso_base &&
-	    nip == current->mm->context.vdso_base + vdso32_rt_sigtramp)
-		return 1;
-	return 0;
-}
-
-static int sane_signal_32_frame(unsigned int sp)
-{
-	struct signal_frame_32 __user *sf;
-	unsigned int regs;
-
-	sf = (struct signal_frame_32 __user *) (unsigned long) sp;
-	if (read_user_stack_32((unsigned int __user *) &sf->sctx.regs, &regs))
-		return 0;
-	return regs == (unsigned long) &sf->mctx;
-}
-
-static int sane_rt_signal_32_frame(unsigned int sp)
-{
-	struct rt_signal_frame_32 __user *sf;
-	unsigned int regs;
-
-	sf = (struct rt_signal_frame_32 __user *) (unsigned long) sp;
-	if (read_user_stack_32((unsigned int __user *) &sf->uc.uc_regs, &regs))
-		return 0;
-	return regs == (unsigned long) &sf->uc.uc_mcontext;
-}
-
-static unsigned int __user *signal_frame_32_regs(unsigned int sp,
-				unsigned int next_sp, unsigned int next_ip)
-{
-	struct mcontext32 __user *mctx = NULL;
-	struct signal_frame_32 __user *sf;
-	struct rt_signal_frame_32 __user *rt_sf;
-
-	/*
-	 * Note: the next_sp - sp >= signal frame size check
-	 * is true when next_sp < sp, for example, when
-	 * transitioning from an alternate signal stack to the
-	 * normal stack.
-	 */
-	if (next_sp - sp >= sizeof(struct signal_frame_32) &&
-	    is_sigreturn_32_address(next_ip, sp) &&
-	    sane_signal_32_frame(sp)) {
-		sf = (struct signal_frame_32 __user *) (unsigned long) sp;
-		mctx = &sf->mctx;
-	}
-
-	if (!mctx && next_sp - sp >= sizeof(struct rt_signal_frame_32) &&
-	    is_rt_sigreturn_32_address(next_ip, sp) &&
-	    sane_rt_signal_32_frame(sp)) {
-		rt_sf = (struct rt_signal_frame_32 __user *) (unsigned long) sp;
-		mctx = &rt_sf->uc.uc_mcontext;
-	}
-
-	if (!mctx)
-		return NULL;
-	return mctx->mc_gregs;
-}
-
-static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
-				   struct pt_regs *regs)
-{
-	unsigned int sp, next_sp;
-	unsigned int next_ip;
-	unsigned int lr;
-	long level = 0;
-	unsigned int __user *fp, *uregs;
-
-	next_ip = perf_instruction_pointer(regs);
-	lr = regs->link;
-	sp = regs->gpr[1];
-	perf_callchain_store(entry, next_ip);
-
-	while (entry->nr < entry->max_stack) {
-		fp = (unsigned int __user *) (unsigned long) sp;
-		if (invalid_user_sp(sp) || read_user_stack_32(fp, &next_sp))
-			return;
-		if (level > 0 && read_user_stack_32(&fp[1], &next_ip))
-			return;
-
-		uregs = signal_frame_32_regs(sp, next_sp, next_ip);
-		if (!uregs && level <= 1)
-			uregs = signal_frame_32_regs(sp, next_sp, lr);
-		if (uregs) {
-			/*
-			 * This looks like an signal frame, so restart
-			 * the stack trace with the values in it.
-			 */
-			if (read_user_stack_32(&uregs[PT_NIP], &next_ip) ||
-			    read_user_stack_32(&uregs[PT_LNK], &lr) ||
-			    read_user_stack_32(&uregs[PT_R1], &sp))
-				return;
-			level = 0;
-			perf_callchain_store_context(entry, PERF_CONTEXT_USER);
-			perf_callchain_store(entry, next_ip);
-			continue;
-		}
-
-		if (level == 0)
-			next_ip = lr;
-		perf_callchain_store(entry, next_ip);
-		++level;
-		sp = next_sp;
-	}
-}
-#else /* 32bit */
-static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
-				   struct pt_regs *regs)
-{}
-#endif /* 32bit */
-
 void
 perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
diff --git a/arch/powerpc/perf/callchain.h b/arch/powerpc/perf/callchain.h
new file mode 100644
index 000000000000..7a2cb9e1181a
--- /dev/null
+++ b/arch/powerpc/perf/callchain.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _POWERPC_PERF_CALLCHAIN_H
+#define _POWERPC_PERF_CALLCHAIN_H
+
+int read_user_stack_slow(void __user *ptr, void *buf, int nb);
+void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
+			    struct pt_regs *regs);
+void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
+			    struct pt_regs *regs);
+
+static inline bool invalid_user_sp(unsigned long sp)
+{
+	unsigned long mask = is_32bit_task() ? 3 : 7;
+	unsigned long top = STACK_TOP - (is_32bit_task() ? 16 : 32);
+
+	return (!sp || (sp & mask) || (sp > top));
+}
+
+#endif /* _POWERPC_PERF_CALLCHAIN_H */
diff --git a/arch/powerpc/perf/callchain_32.c b/arch/powerpc/perf/callchain_32.c
new file mode 100644
index 000000000000..8aa951003141
--- /dev/null
+++ b/arch/powerpc/perf/callchain_32.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Performance counter callchain support - powerpc architecture code
+ *
+ * Copyright © 2009 Paul Mackerras, IBM Corporation.
+ */
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/perf_event.h>
+#include <linux/percpu.h>
+#include <linux/uaccess.h>
+#include <linux/mm.h>
+#include <asm/ptrace.h>
+#include <asm/pgtable.h>
+#include <asm/sigcontext.h>
+#include <asm/ucontext.h>
+#include <asm/vdso.h>
+#include <asm/pte-walk.h>
+
+#include "callchain.h"
+
+#ifdef CONFIG_PPC64
+#include "../kernel/ppc32.h"
+#else  /* CONFIG_PPC64 */
+
+#define __SIGNAL_FRAMESIZE32	__SIGNAL_FRAMESIZE
+#define sigcontext32		sigcontext
+#define mcontext32		mcontext
+#define ucontext32		ucontext
+#define compat_siginfo_t	struct siginfo
+
+#endif /* CONFIG_PPC64 */
+
+/*
+ * On 32-bit we just access the address and let hash_page create a
+ * HPTE if necessary, so there is no need to fall back to reading
+ * the page tables.  Since this is called at interrupt level,
+ * do_page_fault() won't treat a DSI as a page fault.
+ */
+static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
+{
+	int rc;
+
+	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
+	    ((unsigned long)ptr & 3))
+		return -EFAULT;
+
+	rc = probe_user_read(ret, ptr, sizeof(*ret));
+
+	if (IS_ENABLED(CONFIG_PPC64) && rc)
+		return read_user_stack_slow(ptr, ret, 4);
+
+	return rc;
+}
+
+/*
+ * Layout for non-RT signal frames
+ */
+struct signal_frame_32 {
+	char			dummy[__SIGNAL_FRAMESIZE32];
+	struct sigcontext32	sctx;
+	struct mcontext32	mctx;
+	int			abigap[56];
+};
+
+/*
+ * Layout for RT signal frames
+ */
+struct rt_signal_frame_32 {
+	char			dummy[__SIGNAL_FRAMESIZE32 + 16];
+	compat_siginfo_t	info;
+	struct ucontext32	uc;
+	int			abigap[56];
+};
+
+static int is_sigreturn_32_address(unsigned int nip, unsigned int fp)
+{
+	if (nip == fp + offsetof(struct signal_frame_32, mctx.mc_pad))
+		return 1;
+	if (vdso32_sigtramp && current->mm->context.vdso_base &&
+	    nip == current->mm->context.vdso_base + vdso32_sigtramp)
+		return 1;
+	return 0;
+}
+
+static int is_rt_sigreturn_32_address(unsigned int nip, unsigned int fp)
+{
+	if (nip == fp + offsetof(struct rt_signal_frame_32,
+				 uc.uc_mcontext.mc_pad))
+		return 1;
+	if (vdso32_rt_sigtramp && current->mm->context.vdso_base &&
+	    nip == current->mm->context.vdso_base + vdso32_rt_sigtramp)
+		return 1;
+	return 0;
+}
+
+static int sane_signal_32_frame(unsigned int sp)
+{
+	struct signal_frame_32 __user *sf;
+	unsigned int regs;
+
+	sf = (struct signal_frame_32 __user *) (unsigned long) sp;
+	if (read_user_stack_32((unsigned int __user *) &sf->sctx.regs, &regs))
+		return 0;
+	return regs == (unsigned long) &sf->mctx;
+}
+
+static int sane_rt_signal_32_frame(unsigned int sp)
+{
+	struct rt_signal_frame_32 __user *sf;
+	unsigned int regs;
+
+	sf = (struct rt_signal_frame_32 __user *) (unsigned long) sp;
+	if (read_user_stack_32((unsigned int __user *) &sf->uc.uc_regs, &regs))
+		return 0;
+	return regs == (unsigned long) &sf->uc.uc_mcontext;
+}
+
+static unsigned int __user *signal_frame_32_regs(unsigned int sp,
+				unsigned int next_sp, unsigned int next_ip)
+{
+	struct mcontext32 __user *mctx = NULL;
+	struct signal_frame_32 __user *sf;
+	struct rt_signal_frame_32 __user *rt_sf;
+
+	/*
+	 * Note: the next_sp - sp >= signal frame size check
+	 * is true when next_sp < sp, for example, when
+	 * transitioning from an alternate signal stack to the
+	 * normal stack.
+	 */
+	if (next_sp - sp >= sizeof(struct signal_frame_32) &&
+	    is_sigreturn_32_address(next_ip, sp) &&
+	    sane_signal_32_frame(sp)) {
+		sf = (struct signal_frame_32 __user *) (unsigned long) sp;
+		mctx = &sf->mctx;
+	}
+
+	if (!mctx && next_sp - sp >= sizeof(struct rt_signal_frame_32) &&
+	    is_rt_sigreturn_32_address(next_ip, sp) &&
+	    sane_rt_signal_32_frame(sp)) {
+		rt_sf = (struct rt_signal_frame_32 __user *) (unsigned long) sp;
+		mctx = &rt_sf->uc.uc_mcontext;
+	}
+
+	if (!mctx)
+		return NULL;
+	return mctx->mc_gregs;
+}
+
+void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
+			    struct pt_regs *regs)
+{
+	unsigned int sp, next_sp;
+	unsigned int next_ip;
+	unsigned int lr;
+	long level = 0;
+	unsigned int __user *fp, *uregs;
+
+	next_ip = perf_instruction_pointer(regs);
+	lr = regs->link;
+	sp = regs->gpr[1];
+	perf_callchain_store(entry, next_ip);
+
+	while (entry->nr < entry->max_stack) {
+		fp = (unsigned int __user *) (unsigned long) sp;
+		if (invalid_user_sp(sp) || read_user_stack_32(fp, &next_sp))
+			return;
+		if (level > 0 && read_user_stack_32(&fp[1], &next_ip))
+			return;
+
+		uregs = signal_frame_32_regs(sp, next_sp, next_ip);
+		if (!uregs && level <= 1)
+			uregs = signal_frame_32_regs(sp, next_sp, lr);
+		if (uregs) {
+			/*
+			 * This looks like an signal frame, so restart
+			 * the stack trace with the values in it.
+			 */
+			if (read_user_stack_32(&uregs[PT_NIP], &next_ip) ||
+			    read_user_stack_32(&uregs[PT_LNK], &lr) ||
+			    read_user_stack_32(&uregs[PT_R1], &sp))
+				return;
+			level = 0;
+			perf_callchain_store_context(entry, PERF_CONTEXT_USER);
+			perf_callchain_store(entry, next_ip);
+			continue;
+		}
+
+		if (level == 0)
+			next_ip = lr;
+		perf_callchain_store(entry, next_ip);
+		++level;
+		sp = next_sp;
+	}
+}
diff --git a/arch/powerpc/perf/callchain_64.c b/arch/powerpc/perf/callchain_64.c
new file mode 100644
index 000000000000..df1ffd8b20f2
--- /dev/null
+++ b/arch/powerpc/perf/callchain_64.c
@@ -0,0 +1,174 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Performance counter callchain support - powerpc architecture code
+ *
+ * Copyright © 2009 Paul Mackerras, IBM Corporation.
+ */
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/perf_event.h>
+#include <linux/percpu.h>
+#include <linux/uaccess.h>
+#include <linux/mm.h>
+#include <asm/ptrace.h>
+#include <asm/pgtable.h>
+#include <asm/sigcontext.h>
+#include <asm/ucontext.h>
+#include <asm/vdso.h>
+#include <asm/pte-walk.h>
+
+#include "callchain.h"
+
+/*
+ * On 64-bit we don't want to invoke hash_page on user addresses from
+ * interrupt context, so if the access faults, we read the page tables
+ * to find which page (if any) is mapped and access it directly.
+ */
+int read_user_stack_slow(void __user *ptr, void *buf, int nb)
+{
+	int ret = -EFAULT;
+	pgd_t *pgdir;
+	pte_t *ptep, pte;
+	unsigned int shift;
+	unsigned long addr = (unsigned long) ptr;
+	unsigned long offset;
+	unsigned long pfn, flags;
+	void *kaddr;
+
+	pgdir = current->mm->pgd;
+	if (!pgdir)
+		return -EFAULT;
+
+	local_irq_save(flags);
+	ptep = find_current_mm_pte(pgdir, addr, NULL, &shift);
+	if (!ptep)
+		goto err_out;
+	if (!shift)
+		shift = PAGE_SHIFT;
+
+	/* align address to page boundary */
+	offset = addr & ((1UL << shift) - 1);
+
+	pte = READ_ONCE(*ptep);
+	if (!pte_present(pte) || !pte_user(pte))
+		goto err_out;
+	pfn = pte_pfn(pte);
+	if (!page_is_ram(pfn))
+		goto err_out;
+
+	/* no highmem to worry about here */
+	kaddr = pfn_to_kaddr(pfn);
+	memcpy(buf, kaddr + offset, nb);
+	ret = 0;
+err_out:
+	local_irq_restore(flags);
+	return ret;
+}
+
+static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
+{
+	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned long) ||
+	    ((unsigned long)ptr & 7))
+		return -EFAULT;
+
+	if (!probe_user_read(ret, ptr, sizeof(*ret)))
+		return 0;
+
+	return read_user_stack_slow(ptr, ret, 8);
+}
+
+/*
+ * 64-bit user processes use the same stack frame for RT and non-RT signals.
+ */
+struct signal_frame_64 {
+	char		dummy[__SIGNAL_FRAMESIZE];
+	struct ucontext	uc;
+	unsigned long	unused[2];
+	unsigned int	tramp[6];
+	struct siginfo	*pinfo;
+	void		*puc;
+	struct siginfo	info;
+	char		abigap[288];
+};
+
+static int is_sigreturn_64_address(unsigned long nip, unsigned long fp)
+{
+	if (nip == fp + offsetof(struct signal_frame_64, tramp))
+		return 1;
+	if (vdso64_rt_sigtramp && current->mm->context.vdso_base &&
+	    nip == current->mm->context.vdso_base + vdso64_rt_sigtramp)
+		return 1;
+	return 0;
+}
+
+/*
+ * Do some sanity checking on the signal frame pointed to by sp.
+ * We check the pinfo and puc pointers in the frame.
+ */
+static int sane_signal_64_frame(unsigned long sp)
+{
+	struct signal_frame_64 __user *sf;
+	unsigned long pinfo, puc;
+
+	sf = (struct signal_frame_64 __user *) sp;
+	if (read_user_stack_64((unsigned long __user *) &sf->pinfo, &pinfo) ||
+	    read_user_stack_64((unsigned long __user *) &sf->puc, &puc))
+		return 0;
+	return pinfo == (unsigned long) &sf->info &&
+		puc == (unsigned long) &sf->uc;
+}
+
+void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
+			    struct pt_regs *regs)
+{
+	unsigned long sp, next_sp;
+	unsigned long next_ip;
+	unsigned long lr;
+	long level = 0;
+	struct signal_frame_64 __user *sigframe;
+	unsigned long __user *fp, *uregs;
+
+	next_ip = perf_instruction_pointer(regs);
+	lr = regs->link;
+	sp = regs->gpr[1];
+	perf_callchain_store(entry, next_ip);
+
+	while (entry->nr < entry->max_stack) {
+		fp = (unsigned long __user *) sp;
+		if (invalid_user_sp(sp) || read_user_stack_64(fp, &next_sp))
+			return;
+		if (level > 0 && read_user_stack_64(&fp[2], &next_ip))
+			return;
+
+		/*
+		 * Note: the next_sp - sp >= signal frame size check
+		 * is true when next_sp < sp, which can happen when
+		 * transitioning from an alternate signal stack to the
+		 * normal stack.
+		 */
+		if (next_sp - sp >= sizeof(struct signal_frame_64) &&
+		    (is_sigreturn_64_address(next_ip, sp) ||
+		     (level <= 1 && is_sigreturn_64_address(lr, sp))) &&
+		    sane_signal_64_frame(sp)) {
+			/*
+			 * This looks like an signal frame
+			 */
+			sigframe = (struct signal_frame_64 __user *) sp;
+			uregs = sigframe->uc.uc_mcontext.gp_regs;
+			if (read_user_stack_64(&uregs[PT_NIP], &next_ip) ||
+			    read_user_stack_64(&uregs[PT_LNK], &lr) ||
+			    read_user_stack_64(&uregs[PT_R1], &sp))
+				return;
+			level = 0;
+			perf_callchain_store_context(entry, PERF_CONTEXT_USER);
+			perf_callchain_store(entry, next_ip);
+			continue;
+		}
+
+		if (level == 0)
+			next_ip = lr;
+		perf_callchain_store(entry, next_ip);
+		++level;
+		sp = next_sp;
+	}
+}
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* [PATCH v12 8/8] MAINTAINERS: perf: Add pattern that matches ppc perf to the perf entry.
  2020-03-20 10:20   ` Michal Suchanek
@ 2020-03-20 10:20     ` Michal Suchanek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchanek @ 2020-03-20 10:20 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Michal Suchanek, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Andy Shevchenko, Christophe Leroy,
	Thomas Gleixner, Arnd Bergmann, Nayna Jain, Eric Richter,
	Claudio Carvalho, Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

While at it, also simplify the existing perf patterns.
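
For reference, a trailing slash on an F: pattern matches everything in
and below that directory, while a trailing "*" matches only a single
path component, so the directory forms subsume the old one- and
two-level globs. A minimal illustration (the paths are examples only):

	F:	arch/*/events/		# arch/x86/events/core.c and
					#   arch/x86/events/intel/core.c
	F:	arch/*/perf/		# arch/powerpc/perf/callchain.c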

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
---
v10: new patch
v12: remove redundant entries
---
 MAINTAINERS | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index e1a99197fb34..578429d22220 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13080,7 +13080,7 @@ R:	Namhyung Kim <namhyung@kernel.org>
 L:	linux-kernel@vger.kernel.org
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core
 S:	Supported
-F:	kernel/events/*
+F:	kernel/events/
 F:	include/linux/perf_event.h
 F:	include/uapi/linux/perf_event.h
 F:	arch/*/kernel/perf_event*.c
@@ -13088,8 +13088,8 @@ F:	arch/*/kernel/*/perf_event*.c
 F:	arch/*/kernel/*/*/perf_event*.c
 F:	arch/*/include/asm/perf_event.h
 F:	arch/*/kernel/perf_callchain.c
-F:	arch/*/events/*
-F:	arch/*/events/*/*
+F:	arch/*/events/
+F:	arch/*/perf/
 F:	tools/perf/
 
 PERFORMANCE EVENTS SUBSYSTEM ARM64 PMU EVENTS
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* Re: [PATCH v12 8/8] MAINTAINERS: perf: Add pattern that matches ppc perf to the perf entry.
  2020-03-20 10:20     ` Michal Suchanek
@ 2020-03-20 10:33       ` Andy Shevchenko
  -1 siblings, 0 replies; 161+ messages in thread
From: Andy Shevchenko @ 2020-03-20 10:33 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: linuxppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Christophe Leroy, Thomas Gleixner,
	Arnd Bergmann, Nayna Jain, Eric Richter, Claudio Carvalho,
	Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

On Fri, Mar 20, 2020 at 11:20:19AM +0100, Michal Suchanek wrote:
> While at it also simplify the existing perf patterns.
> 

And it still missed the fixes from parse-maintainers.pl.

I see it as below in linux-next (after running the script):

PERFORMANCE EVENTS SUBSYSTEM
M:      Peter Zijlstra <peterz@infradead.org>
M:      Ingo Molnar <mingo@redhat.com>
M:      Arnaldo Carvalho de Melo <acme@kernel.org>
R:      Mark Rutland <mark.rutland@arm.com>
R:      Alexander Shishkin <alexander.shishkin@linux.intel.com>
R:      Jiri Olsa <jolsa@redhat.com>
R:      Namhyung Kim <namhyung@kernel.org>
L:      linux-kernel@vger.kernel.org
S:      Supported
T:      git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core
F:      arch/*/events/*
F:      arch/*/events/*/*
F:      arch/*/include/asm/perf_event.h
F:      arch/*/kernel/*/*/perf_event*.c
F:      arch/*/kernel/*/perf_event*.c
F:      arch/*/kernel/perf_callchain.c
F:      arch/*/kernel/perf_event*.c
F:      include/linux/perf_event.h
F:      include/uapi/linux/perf_event.h
F:      kernel/events/*
F:      tools/perf/
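
That listing can be extracted from the script's MAINTAINERS.new output
with, for example:

	$ awk '/^PERFORMANCE EVENTS SUBSYSTEM$/,/^$/' MAINTAINERS.new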

> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13080,7 +13080,7 @@ R:	Namhyung Kim <namhyung@kernel.org>
>  L:	linux-kernel@vger.kernel.org
>  T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core
>  S:	Supported
> -F:	kernel/events/*
> +F:	kernel/events/
>  F:	include/linux/perf_event.h
>  F:	include/uapi/linux/perf_event.h
>  F:	arch/*/kernel/perf_event*.c
> @@ -13088,8 +13088,8 @@ F:	arch/*/kernel/*/perf_event*.c
>  F:	arch/*/kernel/*/*/perf_event*.c
>  F:	arch/*/include/asm/perf_event.h
>  F:	arch/*/kernel/perf_callchain.c
> -F:	arch/*/events/*
> -F:	arch/*/events/*/*
> +F:	arch/*/events/
> +F:	arch/*/perf/
>  F:	tools/perf/
>  
>  PERFORMANCE EVENTS SUBSYSTEM ARM64 PMU EVENTS

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v12 8/8] MAINTAINERS: perf: Add pattern that matches ppc perf to the perf entry.
  2020-03-20 10:33       ` Andy Shevchenko
@ 2020-03-20 11:23         ` Michal Suchánek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchánek @ 2020-03-20 11:23 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: linuxppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Christophe Leroy, Thomas Gleixner,
	Arnd Bergmann, Nayna Jain, Eric Richter, Claudio Carvalho,
	Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

On Fri, Mar 20, 2020 at 12:33:50PM +0200, Andy Shevchenko wrote:
> On Fri, Mar 20, 2020 at 11:20:19AM +0100, Michal Suchanek wrote:
> > While at it also simplify the existing perf patterns.
> > 
> 
> And still missed fixes from parse-maintainers.pl.

Oh, that script UX is truly ingenious. It provides no output and quietly
creates MAINTAINERS.new, which is, of course, not included in the patch.
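
For anyone else trying it, the workflow appears to be roughly this,
assuming the script's defaults:

	$ perl scripts/parse-maintainers.pl
	$ diff -u MAINTAINERS MAINTAINERS.new	# review the reordering
	$ mv MAINTAINERS.new MAINTAINERS	# adopt it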

Thanks

Michal

> 
> I see it like below in the linux-next (after the script)
> 
> PERFORMANCE EVENTS SUBSYSTEM
> M:      Peter Zijlstra <peterz@infradead.org>
> M:      Ingo Molnar <mingo@redhat.com>
> M:      Arnaldo Carvalho de Melo <acme@kernel.org>
> R:      Mark Rutland <mark.rutland@arm.com>
> R:      Alexander Shishkin <alexander.shishkin@linux.intel.com>
> R:      Jiri Olsa <jolsa@redhat.com>
> R:      Namhyung Kim <namhyung@kernel.org>
> L:      linux-kernel@vger.kernel.org
> S:      Supported
> T:      git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core
> F:      arch/*/events/*
> F:      arch/*/events/*/*
> F:      arch/*/include/asm/perf_event.h
> F:      arch/*/kernel/*/*/perf_event*.c
> F:      arch/*/kernel/*/perf_event*.c
> F:      arch/*/kernel/perf_callchain.c
> F:      arch/*/kernel/perf_event*.c
> F:      include/linux/perf_event.h
> F:      include/uapi/linux/perf_event.h
> F:      kernel/events/*
> F:      tools/perf/
> 
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -13080,7 +13080,7 @@ R:	Namhyung Kim <namhyung@kernel.org>
> >  L:	linux-kernel@vger.kernel.org
> >  T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core
> >  S:	Supported
> > -F:	kernel/events/*
> > +F:	kernel/events/
> >  F:	include/linux/perf_event.h
> >  F:	include/uapi/linux/perf_event.h
> >  F:	arch/*/kernel/perf_event*.c
> > @@ -13088,8 +13088,8 @@ F:	arch/*/kernel/*/perf_event*.c
> >  F:	arch/*/kernel/*/*/perf_event*.c
> >  F:	arch/*/include/asm/perf_event.h
> >  F:	arch/*/kernel/perf_callchain.c
> > -F:	arch/*/events/*
> > -F:	arch/*/events/*/*
> > +F:	arch/*/events/
> > +F:	arch/*/perf/
> >  F:	tools/perf/
> >  
> >  PERFORMANCE EVENTS SUBSYSTEM ARM64 PMU EVENTS
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
> 

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v12 8/8] MAINTAINERS: perf: Add pattern that matches ppc perf to the perf entry.
  2020-03-20 11:23         ` Michal Suchánek
@ 2020-03-20 12:42           ` Andy Shevchenko
  -1 siblings, 0 replies; 161+ messages in thread
From: Andy Shevchenko @ 2020-03-20 12:42 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: linuxppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Christophe Leroy, Thomas Gleixner,
	Arnd Bergmann, Nayna Jain, Eric Richter, Claudio Carvalho,
	Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

On Fri, Mar 20, 2020 at 12:23:38PM +0100, Michal Suchánek wrote:
> On Fri, Mar 20, 2020 at 12:33:50PM +0200, Andy Shevchenko wrote:
> > On Fri, Mar 20, 2020 at 11:20:19AM +0100, Michal Suchanek wrote:
> > > While at it also simplify the existing perf patterns.

> > And still missed fixes from parse-maintainers.pl.
> 
> Oh, that script UX is truly ingenious.

You have at least two options, their combinations, etc:
 - complain to the author :-)
 - send a patch :-)

> It provides no output and quietly
> creates MAINTAINERS.new which is, of course, not included in the patch.

Yes, it also took me a while to understand how it works; luckily it has a
little help note.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v12 8/8] MAINTAINERS: perf: Add pattern that matches ppc perf to the perf entry.
  2020-03-20 12:42           ` Andy Shevchenko
@ 2020-03-20 14:42             ` Joe Perches
  -1 siblings, 0 replies; 161+ messages in thread
From: Joe Perches @ 2020-03-20 14:42 UTC (permalink / raw)
  To: Andy Shevchenko, Michal Suchánek
  Cc: linuxppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Christophe Leroy, Thomas Gleixner,
	Arnd Bergmann, Nayna Jain, Eric Richter, Claudio Carvalho,
	Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

On Fri, 2020-03-20 at 14:42 +0200, Andy Shevchenko wrote:
> On Fri, Mar 20, 2020 at 12:23:38PM +0100, Michal Suchánek wrote:
> > On Fri, Mar 20, 2020 at 12:33:50PM +0200, Andy Shevchenko wrote:
> > > On Fri, Mar 20, 2020 at 11:20:19AM +0100, Michal Suchanek wrote:
> > > > While at it also simplify the existing perf patterns.
> > > And still missed fixes from parse-maintainers.pl.
> > 
> > Oh, that script UX is truly ingenious.
> 
> You have at least two options, their combinations, etc:
>  - complain to the author :-)
>  - send a patch :-)

Recently:

https://lore.kernel.org/lkml/4d5291fa3fb4962b1fa55e8fd9ef421ef0c1b1e5.camel@perches.com/



^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v12 8/8] MAINTAINERS: perf: Add pattern that matches ppc perf to the perf entry.
  2020-03-20 14:42             ` Joe Perches
@ 2020-03-20 16:28               ` Michal Suchánek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchánek @ 2020-03-20 16:28 UTC (permalink / raw)
  To: Joe Perches
  Cc: Andy Shevchenko, Mark Rutland, Gustavo Luiz Duarte,
	Peter Zijlstra, Sebastian Andrzej Siewior, linux-kernel,
	Paul Mackerras, Jiri Olsa, Rob Herring, Michael Neuling,
	Mauro Carvalho Chehab, Masahiro Yamada, Nayna Jain,
	Alexander Shishkin, Ingo Molnar, Allison Randal, Jordan Niethe,
	Valentin Schneider, Arnd Bergmann, Arnaldo Carvalho de Melo,
	Alexander Viro, Jonathan Cameron, Namhyung Kim, Thomas Gleixner,
	Hari Bathini, Greg Kroah-Hartman, Nicholas Piggin,
	Claudio Carvalho, Eric Richter, Eric W. Biederman, linux-fsdevel,
	linuxppc-dev, David S. Miller, Thiago Jung Bauermann

On Fri, Mar 20, 2020 at 07:42:03AM -0700, Joe Perches wrote:
> On Fri, 2020-03-20 at 14:42 +0200, Andy Shevchenko wrote:
> > On Fri, Mar 20, 2020 at 12:23:38PM +0100, Michal Suchánek wrote:
> > > On Fri, Mar 20, 2020 at 12:33:50PM +0200, Andy Shevchenko wrote:
> > > > On Fri, Mar 20, 2020 at 11:20:19AM +0100, Michal Suchanek wrote:
> > > > > While at it also simplify the existing perf patterns.
> > > > And still missed fixes from parse-maintainers.pl.
> > > 
> > > Oh, that script UX is truly ingenious.
> > 
> > You have at least two options, their combinations, etc:
> >  - complain to the author :-)
> >  - send a patch :-)
> 
> Recently:
> 
> https://lore.kernel.org/lkml/4d5291fa3fb4962b1fa55e8fd9ef421ef0c1b1e5.camel@perches.com/

Can we expect that reordering is taken care of in that discussion, then?

Thanks

Michal

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v12 8/8] MAINTAINERS: perf: Add pattern that matches ppc perf to the perf entry.
  2020-03-20 14:42             ` Joe Perches
@ 2020-03-20 16:31               ` Andy Shevchenko
  -1 siblings, 0 replies; 161+ messages in thread
From: Andy Shevchenko @ 2020-03-20 16:31 UTC (permalink / raw)
  To: Joe Perches
  Cc: Michal Suchánek, linuxppc-dev, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Christophe Leroy, Thomas Gleixner,
	Arnd Bergmann, Nayna Jain, Eric Richter, Claudio Carvalho,
	Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

On Fri, Mar 20, 2020 at 07:42:03AM -0700, Joe Perches wrote:
> On Fri, 2020-03-20 at 14:42 +0200, Andy Shevchenko wrote:
> > On Fri, Mar 20, 2020 at 12:23:38PM +0100, Michal Suchánek wrote:
> > > On Fri, Mar 20, 2020 at 12:33:50PM +0200, Andy Shevchenko wrote:
> > > > On Fri, Mar 20, 2020 at 11:20:19AM +0100, Michal Suchanek wrote:
> > > > > While at it also simplify the existing perf patterns.
> > > > And still missed fixes from parse-maintainers.pl.
> > > 
> > > Oh, that script UX is truly ingenious.
> > 
> > You have at least two options, their combinations, etc:
> >  - complain to the author :-)
> >  - send a patch :-)
> 
> Recently:
> 
> https://lore.kernel.org/lkml/4d5291fa3fb4962b1fa55e8fd9ef421ef0c1b1e5.camel@perches.com/

But why?

Shouldn't we rather run a MAINTAINERS cleanup once and require people to
use parse-maintainers.pl for good?

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v12 8/8] MAINTAINERS: perf: Add pattern that matches ppc perf to the perf entry.
  2020-03-20 16:31               ` Andy Shevchenko
@ 2020-03-20 16:42                 ` Michal Suchánek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchánek @ 2020-03-20 16:42 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Joe Perches, linuxppc-dev, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Christophe Leroy, Thomas Gleixner,
	Arnd Bergmann, Nayna Jain, Eric Richter, Claudio Carvalho,
	Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

On Fri, Mar 20, 2020 at 06:31:57PM +0200, Andy Shevchenko wrote:
> On Fri, Mar 20, 2020 at 07:42:03AM -0700, Joe Perches wrote:
> > On Fri, 2020-03-20 at 14:42 +0200, Andy Shevchenko wrote:
> > > On Fri, Mar 20, 2020 at 12:23:38PM +0100, Michal Suchánek wrote:
> > > > On Fri, Mar 20, 2020 at 12:33:50PM +0200, Andy Shevchenko wrote:
> > > > > On Fri, Mar 20, 2020 at 11:20:19AM +0100, Michal Suchanek wrote:
> > > > > > While at it also simplify the existing perf patterns.
> > > > > And still missed fixes from parse-maintainers.pl.
> > > > 
> > > > Oh, that script UX is truly ingenious.
> > > 
> > > You have at least two options, their combinations, etc:
> > >  - complain to the author :-)
> > >  - send a patch :-)
> > 
> > Recently:
> > 
> > https://lore.kernel.org/lkml/4d5291fa3fb4962b1fa55e8fd9ef421ef0c1b1e5.camel@perches.com/
> 
> But why?
> 
> Shouldn't we rather run MAINTAINERS clean up once and require people to use
> parse-maintainers.pl for good?

That cleanup has not happened yet, and I am not volunteering for one.
The difference between MAINTAINERS and MAINTAINERS.new is:

 MAINTAINERS | 5510 +++++++++++++++++++++++++++++------------------------------
 1 file changed, 2755 insertions(+), 2755 deletions(-)
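
That diffstat can be reproduced by comparing the two files directly, for
example:

	$ git diff --no-index --stat MAINTAINERS MAINTAINERS.new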

Thanks

Michal

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v12 8/8] MAINTAINERS: perf: Add pattern that matches ppc perf to the perf entry.
  2020-03-20 16:42                 ` Michal Suchánek
@ 2020-03-20 16:47                   ` Andy Shevchenko
  -1 siblings, 0 replies; 161+ messages in thread
From: Andy Shevchenko @ 2020-03-20 16:47 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: Joe Perches, linuxppc-dev, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Christophe Leroy, Thomas Gleixner,
	Arnd Bergmann, Nayna Jain, Eric Richter, Claudio Carvalho,
	Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

On Fri, Mar 20, 2020 at 05:42:04PM +0100, Michal Suchánek wrote:
> On Fri, Mar 20, 2020 at 06:31:57PM +0200, Andy Shevchenko wrote:
> > On Fri, Mar 20, 2020 at 07:42:03AM -0700, Joe Perches wrote:
> > > On Fri, 2020-03-20 at 14:42 +0200, Andy Shevchenko wrote:
> > > > On Fri, Mar 20, 2020 at 12:23:38PM +0100, Michal Suchánek wrote:
> > > > > On Fri, Mar 20, 2020 at 12:33:50PM +0200, Andy Shevchenko wrote:
> > > > > > On Fri, Mar 20, 2020 at 11:20:19AM +0100, Michal Suchanek wrote:
> > > > > > > While at it also simplify the existing perf patterns.
> > > > > > And still missed fixes from parse-maintainers.pl.
> > > > > 
> > > > > Oh, that script UX is truly ingenious.
> > > > 
> > > > You have at least two options, their combinations, etc:
> > > >  - complain to the author :-)
> > > >  - send a patch :-)
> > > 
> > > Recently:
> > > 
> > > https://lore.kernel.org/lkml/4d5291fa3fb4962b1fa55e8fd9ef421ef0c1b1e5.camel@perches.com/
> > 
> > But why?
> > 
> > Shouldn't we rather run MAINTAINERS clean up once and require people to use
> > parse-maintainers.pl for good?
> 
> That cleanup did not happen yet, and I am not volunteering for one.
> The difference between MAINTAINERS and MAINTAINERS.new is:
> 
>  MAINTAINERS | 5510 +++++++++++++++++++++++++++++------------------------------
>  1 file changed, 2755 insertions(+), 2755 deletions(-)

Yes, it was basically a reply to Joe.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v12 8/8] MAINTAINERS: perf: Add pattern that matches ppc perf to the perf entry.
  2020-03-20 16:31               ` Andy Shevchenko
@ 2020-03-20 21:36                 ` Joe Perches
  -1 siblings, 0 replies; 161+ messages in thread
From: Joe Perches @ 2020-03-20 21:36 UTC (permalink / raw)
  To: Andy Shevchenko, Linus Torvalds
  Cc: Michal Suchánek, linuxppc-dev, linux-kernel, linux-fsdevel

(removed a bunch of cc's)

On Fri, 2020-03-20 at 18:31 +0200, Andy Shevchenko wrote:
> On Fri, Mar 20, 2020 at 07:42:03AM -0700, Joe Perches wrote:
> > On Fri, 2020-03-20 at 14:42 +0200, Andy Shevchenko wrote:
> > > On Fri, Mar 20, 2020 at 12:23:38PM +0100, Michal Suchánek wrote:
> > > > On Fri, Mar 20, 2020 at 12:33:50PM +0200, Andy Shevchenko wrote:
> > > > > On Fri, Mar 20, 2020 at 11:20:19AM +0100, Michal Suchanek wrote:
> > > > > > While at it also simplify the existing perf patterns.
> > > > > And still missed fixes from parse-maintainers.pl.
> > > > 
> > > > Oh, that script UX is truly ingenious.
> > > 
> > > You have at least two options, their combinations, etc:
> > >  - complain to the author :-)
> > >  - send a patch :-)
> > 
> > Recently:
> > 
> > https://lore.kernel.org/lkml/4d5291fa3fb4962b1fa55e8fd9ef421ef0c1b1e5.camel@perches.com/
> 
> But why?
> 
> Shouldn't we rather run MAINTAINERS clean up once and require people to use
> parse-maintainers.pl for good?

That can basically only be done by Linus just before he releases
an RC1.

I am for it.  One day...



^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v11 3/8] powerpc/perf: consolidate read_user_stack_32
  2020-03-19 12:19     ` Michal Suchanek
@ 2020-03-24  8:48       ` Nicholas Piggin
  -1 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-03-24  8:48 UTC (permalink / raw)
  To: linuxppc-dev, Michal Suchanek
  Cc: Arnaldo Carvalho de Melo, Alexander Shishkin, Allison Randal,
	Andy Shevchenko, Arnd Bergmann, Thiago Jung Bauermann,
	Benjamin Herrenschmidt, Sebastian Andrzej Siewior,
	Claudio Carvalho, Christophe Leroy, David S. Miller,
	Eric W. Biederman, Eric Richter, Greg Kroah-Hartman,
	Gustavo Luiz Duarte, Hari Bathini, Jordan Niethe, Jiri Olsa,
	Jonathan Cameron, linux-fsdevel, linux-kernel, Mark Rutland,
	Masahiro Yamada, Mauro Carvalho Chehab, Michael Neuling,
	Ingo Molnar, Michael Ellerman, Namhyung Kim, Nayna Jain,
	Paul Mackerras, Peter Zijlstra, Rob Herring, Thomas Gleixner,
	Valentin Schneider, Alexander Viro

Michal Suchanek's on March 19, 2020 10:19 pm:
> There are two almost identical copies for 32bit and 64bit.
> 
> The function is used only in 32bit code which will be split out in next
> patch so consolidate to one function.
> 
> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
> v6:  new patch
> v8:  move the consolidated function out of the ifdef block.
> v11: rebase on top of def0bfdbd603
> ---
>  arch/powerpc/perf/callchain.c | 48 +++++++++++++++++------------------
>  1 file changed, 24 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> index cbc251981209..c9a78c6e4361 100644
> --- a/arch/powerpc/perf/callchain.c
> +++ b/arch/powerpc/perf/callchain.c
> @@ -161,18 +161,6 @@ static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
>  	return read_user_stack_slow(ptr, ret, 8);
>  }
>  
> -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> -{
> -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> -	    ((unsigned long)ptr & 3))
> -		return -EFAULT;
> -
> -	if (!probe_user_read(ret, ptr, sizeof(*ret)))
> -		return 0;
> -
> -	return read_user_stack_slow(ptr, ret, 4);
> -}
> -
>  static inline int valid_user_sp(unsigned long sp, int is_64)
>  {
>  	if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 32)
> @@ -277,19 +265,9 @@ static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
>  }
>  
>  #else  /* CONFIG_PPC64 */
> -/*
> - * On 32-bit we just access the address and let hash_page create a
> - * HPTE if necessary, so there is no need to fall back to reading
> - * the page tables.  Since this is called at interrupt level,
> - * do_page_fault() won't treat a DSI as a page fault.
> - */
> -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> +static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
>  {
> -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> -	    ((unsigned long)ptr & 3))
> -		return -EFAULT;
> -
> -	return probe_user_read(ret, ptr, sizeof(*ret));
> +	return 0;
>  }
>  
>  static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> @@ -312,6 +290,28 @@ static inline int valid_user_sp(unsigned long sp, int is_64)
>  
>  #endif /* CONFIG_PPC64 */
>  
> +/*
> + * On 32-bit we just access the address and let hash_page create a
> + * HPTE if necessary, so there is no need to fall back to reading
> + * the page tables.  Since this is called at interrupt level,
> + * do_page_fault() won't treat a DSI as a page fault.
> + */

The comment is actually probably better to stay in the 32-bit
read_user_stack_slow implementation. Is that function defined
on 32-bit purely so that you can use IS_ENABLED()? In that case
I would prefer to put a BUG() there which makes it self documenting.
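
A minimal sketch of that suggestion (illustrative only, not from the
posted series): the 32-bit stub exists purely so the
IS_ENABLED(CONFIG_PPC64) test compiles, so it should never actually run.

static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
{
	/* Unreachable: callers only take this path when CONFIG_PPC64. */
	BUG();
	return -EFAULT;
}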

Thanks,
Nick

> +static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> +{
> +	int rc;
> +
> +	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> +	    ((unsigned long)ptr & 3))
> +		return -EFAULT;
> +
> +	rc = probe_user_read(ret, ptr, sizeof(*ret));
> +
> +	if (IS_ENABLED(CONFIG_PPC64) && rc)
> +		return read_user_stack_slow(ptr, ret, 4);
> +
> +	return rc;
> +}
> +
>  /*
>   * Layout for non-RT signal frames
>   */
> -- 
> 2.23.0
> 
> 

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v11 5/8] powerpc/64: make buildable without CONFIG_COMPAT
  2020-03-19 12:19     ` Michal Suchanek
@ 2020-03-24  8:54       ` Nicholas Piggin
  -1 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-03-24  8:54 UTC (permalink / raw)
  To: linuxppc-dev, Michal Suchanek
  Cc: Arnaldo Carvalho de Melo, Alexander Shishkin, Allison Randal,
	Andy Shevchenko, Arnd Bergmann, Thiago Jung Bauermann,
	Benjamin Herrenschmidt, Sebastian Andrzej Siewior,
	Claudio Carvalho, Christophe Leroy, David S. Miller,
	Eric W. Biederman, Eric Richter, Greg Kroah-Hartman,
	Gustavo Luiz Duarte, Hari Bathini, Jordan Niethe, Jiri Olsa,
	Jonathan Cameron, linux-fsdevel, linux-kernel, Mark Rutland,
	Masahiro Yamada, Mauro Carvalho Chehab, Michael Neuling,
	Ingo Molnar, Michael Ellerman, Namhyung Kim, Nayna Jain,
	Paul Mackerras, Peter Zijlstra, Rob Herring, Thomas Gleixner,
	Valentin Schneider, Alexander Viro

Michal Suchanek's on March 19, 2020 10:19 pm:
> diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> index 4b0152108f61..a264989626fd 100644
> --- a/arch/powerpc/kernel/signal.c
> +++ b/arch/powerpc/kernel/signal.c
> @@ -247,7 +247,6 @@ static void do_signal(struct task_struct *tsk)
>  	sigset_t *oldset = sigmask_to_save();
>  	struct ksignal ksig = { .sig = 0 };
>  	int ret;
> -	int is32 = is_32bit_task();
>  
>  	BUG_ON(tsk != current);
>  
> @@ -277,7 +276,7 @@ static void do_signal(struct task_struct *tsk)
>  
>  	rseq_signal_deliver(&ksig, tsk->thread.regs);
>  
> -	if (is32) {
> +	if (is_32bit_task()) {
>          	if (ksig.ka.sa.sa_flags & SA_SIGINFO)
>  			ret = handle_rt_signal32(&ksig, oldset, tsk);
>  		else

Unnecessary?

> diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
> index 87d95b455b83..2dcbfe38f5ac 100644
> --- a/arch/powerpc/kernel/syscall_64.c
> +++ b/arch/powerpc/kernel/syscall_64.c
> @@ -24,7 +24,6 @@ notrace long system_call_exception(long r3, long r4, long r5,
>  				   long r6, long r7, long r8,
>  				   unsigned long r0, struct pt_regs *regs)
>  {
> -	unsigned long ti_flags;
>  	syscall_fn f;
>  
>  	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
> @@ -68,8 +67,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
>  
>  	local_irq_enable();
>  
> -	ti_flags = current_thread_info()->flags;
> -	if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
> +	if (unlikely(current_thread_info()->flags & _TIF_SYSCALL_DOTRACE)) {
>  		/*
>  		 * We use the return value of do_syscall_trace_enter() as the
>  		 * syscall number. If the syscall was rejected for any reason
> @@ -94,7 +92,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
>  	/* May be faster to do array_index_nospec? */
>  	barrier_nospec();
>  
> -	if (unlikely(ti_flags & _TIF_32BIT)) {
> +	if (unlikely(is_32bit_task())) {

Problem is, does this allow the load of ti_flags to be used for both
tests, or does test_bit make it re-load?

This could maybe be fixed by testing if(IS_ENABLED(CONFIG_COMPAT) &&
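
Concretely, a rough sketch of that combined test (illustrative only,
not from the posted patch): the single ti_flags load is kept, and the
compiler can discard the whole 32-bit branch when COMPAT is not built.

	unsigned long ti_flags = current_thread_info()->flags;

	if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
		/* syscall tracing path, unchanged */
	}

	if (IS_ENABLED(CONFIG_COMPAT) && unlikely(ti_flags & _TIF_32BIT)) {
		/* compat dispatch; compiled out entirely without COMPAT */
	}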

Other than these, the patches all look pretty good to me.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v11 5/8] powerpc/64: make buildable without CONFIG_COMPAT
  2020-03-24  8:54       ` Nicholas Piggin
@ 2020-03-24 19:30         ` Michal Suchánek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchánek @ 2020-03-24 19:30 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linuxppc-dev, Mark Rutland, Gustavo Luiz Duarte,
	Alexander Shishkin, Sebastian Andrzej Siewior, linux-kernel,
	Paul Mackerras, Jiri Olsa, Rob Herring, Michael Neuling,
	Eric Richter, Masahiro Yamada, Nayna Jain, Peter Zijlstra,
	Ingo Molnar, Hari Bathini, Jordan Niethe, Valentin Schneider,
	Arnd Bergmann, Arnaldo Carvalho de Melo, Alexander Viro,
	Jonathan Cameron, Namhyung Kim, Thomas Gleixner, Andy Shevchenko,
	Allison Randal, Greg Kroah-Hartman, Claudio Carvalho,
	Mauro Carvalho Chehab, Eric W. Biederman, linux-fsdevel,
	David S. Miller, Thiago Jung Bauermann

On Tue, Mar 24, 2020 at 06:54:20PM +1000, Nicholas Piggin wrote:
> Michal Suchanek's on March 19, 2020 10:19 pm:
> > diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> > index 4b0152108f61..a264989626fd 100644
> > --- a/arch/powerpc/kernel/signal.c
> > +++ b/arch/powerpc/kernel/signal.c
> > @@ -247,7 +247,6 @@ static void do_signal(struct task_struct *tsk)
> >  	sigset_t *oldset = sigmask_to_save();
> >  	struct ksignal ksig = { .sig = 0 };
> >  	int ret;
> > -	int is32 = is_32bit_task();
> >  
> >  	BUG_ON(tsk != current);
> >  
> > @@ -277,7 +276,7 @@ static void do_signal(struct task_struct *tsk)
> >  
> >  	rseq_signal_deliver(&ksig, tsk->thread.regs);
> >  
> > -	if (is32) {
> > +	if (is_32bit_task()) {
> >          	if (ksig.ka.sa.sa_flags & SA_SIGINFO)
> >  			ret = handle_rt_signal32(&ksig, oldset, tsk);
> >  		else
> 
> Unnecessary?
> 
> > diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
> > index 87d95b455b83..2dcbfe38f5ac 100644
> > --- a/arch/powerpc/kernel/syscall_64.c
> > +++ b/arch/powerpc/kernel/syscall_64.c
> > @@ -24,7 +24,6 @@ notrace long system_call_exception(long r3, long r4, long r5,
> >  				   long r6, long r7, long r8,
> >  				   unsigned long r0, struct pt_regs *regs)
> >  {
> > -	unsigned long ti_flags;
> >  	syscall_fn f;
> >  
> >  	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
> > @@ -68,8 +67,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
> >  
> >  	local_irq_enable();
> >  
> > -	ti_flags = current_thread_info()->flags;
> > -	if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
> > +	if (unlikely(current_thread_info()->flags & _TIF_SYSCALL_DOTRACE)) {
> >  		/*
> >  		 * We use the return value of do_syscall_trace_enter() as the
> >  		 * syscall number. If the syscall was rejected for any reason
> > @@ -94,7 +92,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
> >  	/* May be faster to do array_index_nospec? */
> >  	barrier_nospec();
> >  
> > -	if (unlikely(ti_flags & _TIF_32BIT)) {
> > +	if (unlikely(is_32bit_task())) {
> 
> Problem is, does this allow the load of ti_flags to be used for both
> tests, or does test_bit make it re-load?
> 
> This could maybe be fixed by testing if(IS_ENABLED(CONFIG_COMPAT) &&
Both points already discussed here:

https://lore.kernel.org/linuxppc-dev/13fa324dc879a7f325290bf2e131b87eb491cd7b.1573576649.git.msuchanek@suse.de/

Thanks

Michal

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v11 3/8] powerpc/perf: consolidate read_user_stack_32
  2020-03-24  8:48       ` Nicholas Piggin
@ 2020-03-24 19:38         ` Michal Suchánek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchánek @ 2020-03-24 19:38 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linuxppc-dev, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Allison Randal, Andy Shevchenko, Arnd Bergmann,
	Thiago Jung Bauermann, Benjamin Herrenschmidt,
	Sebastian Andrzej Siewior, Claudio Carvalho, Christophe Leroy,
	David S. Miller, Eric W. Biederman, Eric Richter,
	Greg Kroah-Hartman, Gustavo Luiz Duarte, Hari Bathini,
	Jordan Niethe, Jiri Olsa, Jonathan Cameron, linux-fsdevel,
	linux-kernel, Mark Rutland, Masahiro Yamada,
	Mauro Carvalho Chehab, Michael Neuling, Ingo Molnar,
	Michael Ellerman, Namhyung Kim, Nayna Jain, Paul Mackerras,
	Peter Zijlstra, Rob Herring, Thomas Gleixner, Valentin Schneider,
	Alexander Viro

On Tue, Mar 24, 2020 at 06:48:20PM +1000, Nicholas Piggin wrote:
> Michal Suchanek's on March 19, 2020 10:19 pm:
> > There are two almost identical copies for 32bit and 64bit.
> > 
> > The function is used only in 32bit code which will be split out in next
> > patch so consolidate to one function.
> > 
> > Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> > Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > ---
> > v6:  new patch
> > v8:  move the consolidated function out of the ifdef block.
> > v11: rebase on top of def0bfdbd603
> > ---
> >  arch/powerpc/perf/callchain.c | 48 +++++++++++++++++------------------
> >  1 file changed, 24 insertions(+), 24 deletions(-)
> > 
> > diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> > index cbc251981209..c9a78c6e4361 100644
> > --- a/arch/powerpc/perf/callchain.c
> > +++ b/arch/powerpc/perf/callchain.c
> > @@ -161,18 +161,6 @@ static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
> >  	return read_user_stack_slow(ptr, ret, 8);
> >  }
> >  
> > -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> > -{
> > -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> > -	    ((unsigned long)ptr & 3))
> > -		return -EFAULT;
> > -
> > -	if (!probe_user_read(ret, ptr, sizeof(*ret)))
> > -		return 0;
> > -
> > -	return read_user_stack_slow(ptr, ret, 4);
> > -}
> > -
> >  static inline int valid_user_sp(unsigned long sp, int is_64)
> >  {
> >  	if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 32)
> > @@ -277,19 +265,9 @@ static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> >  }
> >  
> >  #else  /* CONFIG_PPC64 */
> > -/*
> > - * On 32-bit we just access the address and let hash_page create a
> > - * HPTE if necessary, so there is no need to fall back to reading
> > - * the page tables.  Since this is called at interrupt level,
> > - * do_page_fault() won't treat a DSI as a page fault.
> > - */
> > -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> > +static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> >  {
> > -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> > -	    ((unsigned long)ptr & 3))
> > -		return -EFAULT;
> > -
> > -	return probe_user_read(ret, ptr, sizeof(*ret));
> > +	return 0;
> >  }
> >  
> >  static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> > @@ -312,6 +290,28 @@ static inline int valid_user_sp(unsigned long sp, int is_64)
> >  
> >  #endif /* CONFIG_PPC64 */
> >  
> > +/*
> > + * On 32-bit we just access the address and let hash_page create a
> > + * HPTE if necessary, so there is no need to fall back to reading
> > + * the page tables.  Since this is called at interrupt level,
> > + * do_page_fault() won't treat a DSI as a page fault.
> > + */
> 
> The comment is actually probably better to stay in the 32-bit
> read_user_stack_slow implementation. Is that function defined
> on 32-bit purely so that you can use IS_ENABLED()? In that case
It documents the IS_ENABLED() and that's where it is. The 32bit
definition is only a technical detail.
> I would prefer to put a BUG() there which makes it self documenting.
Which will cause checkpatch complaints about introducing new BUG() which
is frowned on.

Thanks

Michal

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v3 01/32] powerpc/64s/exception: Introduce INT_DEFINE parameter block for code generation
  2020-02-25 17:35 ` [PATCH v3 01/32] powerpc/64s/exception: Introduce INT_DEFINE parameter block for code generation Nicholas Piggin
@ 2020-04-01 12:53   ` Michael Ellerman
  0 siblings, 0 replies; 161+ messages in thread
From: Michael Ellerman @ 2020-04-01 12:53 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Michal Suchanek, Nicholas Piggin

On Tue, 2020-02-25 at 17:35:10 UTC, Nicholas Piggin wrote:
> The code generation macro arguments are difficult to read, and
> defaults can't easily be used.
> 
> This introduces a block where parameters can be set for interrupt
> handler code generation by the subsequent macros, and adds the first
> generation macro for interrupt entry.
> 
> One interrupt handler is converted to the new macros to demonstrate
> the change, the rest will be coverted all at once.
> 
> No generated code change.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Patches 1-30 applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a42a239db3262b8185cb1a07a9350392ef1439ca

cheers

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v11 3/8] powerpc/perf: consolidate read_user_stack_32
  2020-03-24 19:38         ` Michal Suchánek
@ 2020-04-03  7:13           ` Nicholas Piggin
  -1 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-04-03  7:13 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: Arnaldo Carvalho de Melo, Alexander Shishkin, Allison Randal,
	Andy Shevchenko, Arnd Bergmann, Thiago Jung Bauermann,
	Benjamin Herrenschmidt, Sebastian Andrzej Siewior,
	Claudio Carvalho, Christophe Leroy, David S. Miller,
	Eric W. Biederman, Eric Richter, Greg Kroah-Hartman,
	Gustavo Luiz Duarte, Hari Bathini, Jordan Niethe, Jiri Olsa,
	Jonathan Cameron, linux-fsdevel, linux-kernel, linuxppc-dev,
	Mark Rutland, Masahiro Yamada, Mauro Carvalho Chehab,
	Michael Neuling, Ingo Molnar, Michael Ellerman, Namhyung Kim,
	Nayna Jain, Paul Mackerras, Peter Zijlstra, Rob Herring,
	Thomas Gleixner, Valentin Schneider, Alexander Viro

Michal Suchánek's on March 25, 2020 5:38 am:
> On Tue, Mar 24, 2020 at 06:48:20PM +1000, Nicholas Piggin wrote:
>> Michal Suchanek's on March 19, 2020 10:19 pm:
>> > There are two almost identical copies for 32bit and 64bit.
>> > 
>> > The function is used only in 32bit code which will be split out in next
>> > patch so consolidate to one function.
>> > 
>> > Signed-off-by: Michal Suchanek <msuchanek@suse.de>
>> > Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> > ---
>> > v6:  new patch
>> > v8:  move the consolidated function out of the ifdef block.
>> > v11: rebase on top of def0bfdbd603
>> > ---
>> >  arch/powerpc/perf/callchain.c | 48 +++++++++++++++++------------------
>> >  1 file changed, 24 insertions(+), 24 deletions(-)
>> > 
>> > diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
>> > index cbc251981209..c9a78c6e4361 100644
>> > --- a/arch/powerpc/perf/callchain.c
>> > +++ b/arch/powerpc/perf/callchain.c
>> > @@ -161,18 +161,6 @@ static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
>> >  	return read_user_stack_slow(ptr, ret, 8);
>> >  }
>> >  
>> > -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
>> > -{
>> > -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
>> > -	    ((unsigned long)ptr & 3))
>> > -		return -EFAULT;
>> > -
>> > -	if (!probe_user_read(ret, ptr, sizeof(*ret)))
>> > -		return 0;
>> > -
>> > -	return read_user_stack_slow(ptr, ret, 4);
>> > -}
>> > -
>> >  static inline int valid_user_sp(unsigned long sp, int is_64)
>> >  {
>> >  	if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 32)
>> > @@ -277,19 +265,9 @@ static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
>> >  }
>> >  
>> >  #else  /* CONFIG_PPC64 */
>> > -/*
>> > - * On 32-bit we just access the address and let hash_page create a
>> > - * HPTE if necessary, so there is no need to fall back to reading
>> > - * the page tables.  Since this is called at interrupt level,
>> > - * do_page_fault() won't treat a DSI as a page fault.
>> > - */
>> > -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
>> > +static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
>> >  {
>> > -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
>> > -	    ((unsigned long)ptr & 3))
>> > -		return -EFAULT;
>> > -
>> > -	return probe_user_read(ret, ptr, sizeof(*ret));
>> > +	return 0;
>> >  }
>> >  
>> >  static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
>> > @@ -312,6 +290,28 @@ static inline int valid_user_sp(unsigned long sp, int is_64)
>> >  
>> >  #endif /* CONFIG_PPC64 */
>> >  
>> > +/*
>> > + * On 32-bit we just access the address and let hash_page create a
>> > + * HPTE if necessary, so there is no need to fall back to reading
>> > + * the page tables.  Since this is called at interrupt level,
>> > + * do_page_fault() won't treat a DSI as a page fault.
>> > + */
>> 
>> The comment is actually probably better to stay in the 32-bit
>> read_user_stack_slow implementation. Is that function defined
>> on 32-bit purely so that you can use IS_ENABLED()? In that case
> It documents the IS_ENABLED() and that's where it is. The 32bit
> definition is only a technical detail.

Sorry for the late reply, I've been busy trying to fix bugs in the C
rewrite series. I don't think that is the right place; the comment is
a ppc32 implementation detail, and ppc64 has an equivalent comment at
the top of its read_user_stack functions.

>> I would prefer to put a BUG() there which makes it self documenting.
> Which will cause checkpatch complaints about introducing new BUG() which
> is frowned on.

It's fine in this case; that warning is about not introducing
runtime bugs, and this wouldn't be one.

But... I actually don't like adding read_user_stack_slow on 32-bit
and especially not just to make IS_ENABLED work.

IMO this would be better if you really want to consolidate it:

---

diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index cbc251981209..ca3a599b3f54 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -108,7 +108,7 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
  * interrupt context, so if the access faults, we read the page tables
  * to find which page (if any) is mapped and access it directly.
  */
-static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
+static int read_user_stack_slow(const void __user *ptr, void *buf, int nb)
 {
 	int ret = -EFAULT;
 	pgd_t *pgdir;
@@ -149,28 +149,21 @@ static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
 	return ret;
 }
 
-static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
+static int __read_user_stack(const void __user *ptr, void *ret, size_t size)
 {
-	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned long) ||
-	    ((unsigned long)ptr & 7))
+	if ((unsigned long)ptr > TASK_SIZE - size ||
+	    ((unsigned long)ptr & (size - 1)))
 		return -EFAULT;
 
-	if (!probe_user_read(ret, ptr, sizeof(*ret)))
+	if (!probe_user_read(ret, ptr, size))
 		return 0;
 
-	return read_user_stack_slow(ptr, ret, 8);
+	return read_user_stack_slow(ptr, ret, size);
 }
 
-static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
+static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
 {
-	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
-	    ((unsigned long)ptr & 3))
-		return -EFAULT;
-
-	if (!probe_user_read(ret, ptr, sizeof(*ret)))
-		return 0;
-
-	return read_user_stack_slow(ptr, ret, 4);
+	return __read_user_stack(ptr, ret, sizeof(*ret));
 }
 
 static inline int valid_user_sp(unsigned long sp, int is_64)
@@ -283,13 +276,13 @@ static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
  * the page tables.  Since this is called at interrupt level,
  * do_page_fault() won't treat a DSI as a page fault.
  */
-static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
+static int __read_user_stack(const void __user *ptr, void *ret, size_t size)
 {
-	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
-	    ((unsigned long)ptr & 3))
+	if ((unsigned long)ptr > TASK_SIZE - size ||
+	    ((unsigned long)ptr & (size - 1)))
 		return -EFAULT;
 
-	return probe_user_read(ret, ptr, sizeof(*ret));
+	return probe_user_read(ret, ptr, size);
 }
 
 static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
@@ -312,6 +305,11 @@ static inline int valid_user_sp(unsigned long sp, int is_64)
 
 #endif /* CONFIG_PPC64 */
 
+static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
+{
+	return __read_user_stack(ptr, ret, sizeof(*ret));
+}
+
 /*
  * Layout for non-RT signal frames
  */

^ permalink raw reply related	[flat|nested] 161+ messages in thread

* Re: [PATCH v11 5/8] powerpc/64: make buildable without CONFIG_COMPAT
  2020-03-24 19:30         ` Michal Suchánek
@ 2020-04-03  7:16           ` Nicholas Piggin
  -1 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-04-03  7:16 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: Arnaldo Carvalho de Melo, Alexander Shishkin, Allison Randal,
	Andy Shevchenko, Arnd Bergmann, Thiago Jung Bauermann,
	Sebastian Andrzej Siewior, Claudio Carvalho, David S. Miller,
	Eric W. Biederman, Eric Richter, Greg Kroah-Hartman,
	Gustavo Luiz Duarte, Hari Bathini, Jordan Niethe, Jiri Olsa,
	Jonathan Cameron, linux-fsdevel, linux-kernel, linuxppc-dev,
	Mark Rutland, Masahiro Yamada, Mauro Carvalho Chehab,
	Michael Neuling, Ingo Molnar, Namhyung Kim, Nayna Jain,
	Paul Mackerras, Peter Zijlstra, Rob Herring, Thomas Gleixner,
	Valentin Schneider, Alexander Viro

Michal Suchánek's on March 25, 2020 5:30 am:
> On Tue, Mar 24, 2020 at 06:54:20PM +1000, Nicholas Piggin wrote:
>> Michal Suchanek's on March 19, 2020 10:19 pm:
>> > diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
>> > index 4b0152108f61..a264989626fd 100644
>> > --- a/arch/powerpc/kernel/signal.c
>> > +++ b/arch/powerpc/kernel/signal.c
>> > @@ -247,7 +247,6 @@ static void do_signal(struct task_struct *tsk)
>> >  	sigset_t *oldset = sigmask_to_save();
>> >  	struct ksignal ksig = { .sig = 0 };
>> >  	int ret;
>> > -	int is32 = is_32bit_task();
>> >  
>> >  	BUG_ON(tsk != current);
>> >  
>> > @@ -277,7 +276,7 @@ static void do_signal(struct task_struct *tsk)
>> >  
>> >  	rseq_signal_deliver(&ksig, tsk->thread.regs);
>> >  
>> > -	if (is32) {
>> > +	if (is_32bit_task()) {
>> >          	if (ksig.ka.sa.sa_flags & SA_SIGINFO)
>> >  			ret = handle_rt_signal32(&ksig, oldset, tsk);
>> >  		else
>> 
>> Unnecessary?
>> 
>> > diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
>> > index 87d95b455b83..2dcbfe38f5ac 100644
>> > --- a/arch/powerpc/kernel/syscall_64.c
>> > +++ b/arch/powerpc/kernel/syscall_64.c
>> > @@ -24,7 +24,6 @@ notrace long system_call_exception(long r3, long r4, long r5,
>> >  				   long r6, long r7, long r8,
>> >  				   unsigned long r0, struct pt_regs *regs)
>> >  {
>> > -	unsigned long ti_flags;
>> >  	syscall_fn f;
>> >  
>> >  	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
>> > @@ -68,8 +67,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
>> >  
>> >  	local_irq_enable();
>> >  
>> > -	ti_flags = current_thread_info()->flags;
>> > -	if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
>> > +	if (unlikely(current_thread_info()->flags & _TIF_SYSCALL_DOTRACE)) {
>> >  		/*
>> >  		 * We use the return value of do_syscall_trace_enter() as the
>> >  		 * syscall number. If the syscall was rejected for any reason
>> > @@ -94,7 +92,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
>> >  	/* May be faster to do array_index_nospec? */
>> >  	barrier_nospec();
>> >  
>> > -	if (unlikely(ti_flags & _TIF_32BIT)) {
>> > +	if (unlikely(is_32bit_task())) {
>> 
>> Problem is, does this allow the load of ti_flags to be used for both
>> tests, or does test_bit make it re-load?
>> 
>> This could maybe be fixed by testing if(IS_ENABLED(CONFIG_COMPAT) &&
> Both points already discussed here:

Agh, I'm hopeless.

I don't think it really resolves this issue. But I probably don't have
time to look at the generated asm, and might never, because it won't
really hit LE unless we add a 32-bit ABI. It's pretty minor either way.

Sorry for being difficult, I really do like your patches :)

Thanks,
Nick

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v11 0/8] Disable compat cruft on ppc64le v11
  2020-03-19 12:19   ` Michal Suchanek
@ 2020-04-03  7:25     ` Nicholas Piggin
  -1 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-04-03  7:25 UTC (permalink / raw)
  To: linuxppc-dev, Michal Suchanek
  Cc: Arnaldo Carvalho de Melo, Alexander Shishkin, Allison Randal,
	Andy Shevchenko, Arnd Bergmann, Thiago Jung Bauermann,
	Benjamin Herrenschmidt, Sebastian Andrzej Siewior,
	Claudio Carvalho, Christophe Leroy, David S. Miller,
	Eric W. Biederman, Eric Richter, Greg Kroah-Hartman,
	Gustavo Luiz Duarte, Hari Bathini, Jordan Niethe, Jiri Olsa,
	Jonathan Cameron, linux-fsdevel, linux-kernel, Mark Rutland,
	Masahiro Yamada, Mauro Carvalho Chehab, Michael Neuling,
	Ingo Molnar, Michael Ellerman, Namhyung Kim, Nayna Jain,
	Paul Mackerras, Peter Zijlstra, Rob Herring, Thomas Gleixner,
	Valentin Schneider, Alexander Viro

Michal Suchanek's on March 19, 2020 10:19 pm:
> Less code means fewer bugs, so add a knob to skip the compat stuff.
> 
> Changes in v2: saner CONFIG_COMPAT ifdefs
> Changes in v3:
>  - change llseek to 32bit instead of building it unconditionally in fs
>  - cleanup the makefile conditionals
>  - remove some ifdefs or convert to IS_DEFINED where possible
> Changes in v4:
>  - cleanup is_32bit_task and current_is_64bit
>  - more makefile cleanup
> Changes in v5:
>  - more current_is_64bit cleanup
>  - split off callchain.c 32bit and 64bit parts
> Changes in v6:
>  - cleanup makefile after split
>  - consolidate read_user_stack_32
>  - fix some checkpatch warnings
> Changes in v7:
>  - add back __ARCH_WANT_SYS_LLSEEK to fix build with llseek
>  - remove leftover hunk
>  - add review tags
> Changes in v8:
>  - consolidate valid_user_sp to fix it in the split callchain.c
>  - fix build errors/warnings with PPC64 !COMPAT and PPC32
> Changes in v9:
>  - remove current_is_64bit()
> Changes in v10:
>  - rebase, sent together with the syscall cleanup
> Changes in v11:
>  - rebase
>  - add MAINTAINERS pattern for ppc perf

These all look good to me. I had some minor comment about one patch but 
not really a big deal and there were more cleanups on top of it, so I 
don't mind if it's merged as is.

Actually I think we have a bit of stack reading fixes for 64s radix now
(not a bug fix as such, but we don't need the hash fault logic in radix),
so if I get around to that I can propose the changes in that series.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v11 0/8] Disable compat cruft on ppc64le v11
  2020-04-03  7:25     ` Nicholas Piggin
@ 2020-04-03  7:26       ` Christophe Leroy
  -1 siblings, 0 replies; 161+ messages in thread
From: Christophe Leroy @ 2020-04-03  7:26 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev, Michal Suchanek
  Cc: Arnaldo Carvalho de Melo, Alexander Shishkin, Allison Randal,
	Andy Shevchenko, Arnd Bergmann, Thiago Jung Bauermann,
	Benjamin Herrenschmidt, Sebastian Andrzej Siewior,
	Claudio Carvalho, David S. Miller, Eric W. Biederman,
	Eric Richter, Greg Kroah-Hartman, Gustavo Luiz Duarte,
	Hari Bathini, Jordan Niethe, Jiri Olsa, Jonathan Cameron,
	linux-fsdevel, linux-kernel, Mark Rutland, Masahiro Yamada,
	Mauro Carvalho Chehab, Michael Neuling, Ingo Molnar,
	Michael Ellerman, Namhyung Kim, Nayna Jain, Paul Mackerras,
	Peter Zijlstra, Rob Herring, Thomas Gleixner, Valentin Schneider,
	Alexander Viro



On 03/04/2020 at 09:25, Nicholas Piggin wrote:
> Michal Suchanek's on March 19, 2020 10:19 pm:
>> Less code means fewer bugs, so add a knob to skip the compat stuff.
>>
>> Changes in v2: saner CONFIG_COMPAT ifdefs
>> Changes in v3:
>>   - change llseek to 32bit instead of building it unconditionally in fs
>>   - cleanup the makefile conditionals
>>   - remove some ifdefs or convert to IS_DEFINED where possible
>> Changes in v4:
>>   - cleanup is_32bit_task and current_is_64bit
>>   - more makefile cleanup
>> Changes in v5:
>>   - more current_is_64bit cleanup
>>   - split off callchain.c 32bit and 64bit parts
>> Changes in v6:
>>   - cleanup makefile after split
>>   - consolidate read_user_stack_32
>>   - fix some checkpatch warnings
>> Changes in v7:
>>   - add back __ARCH_WANT_SYS_LLSEEK to fix build with llseek
>>   - remove leftover hunk
>>   - add review tags
>> Changes in v8:
>>   - consolidate valid_user_sp to fix it in the split callchain.c
>>   - fix build errors/warnings with PPC64 !COMPAT and PPC32
>> Changes in v9:
>>   - remove current_is_64bit()
>> Changes in v10:
>>   - rebase, sent together with the syscall cleanup
>> Changes in v11:
>>   - rebase
>>   - add MAINTAINERS pattern for ppc perf
> 
> These all look good to me. I had some minor comment about one patch but
> not really a big deal and there were more cleanups on top of it, so I
> don't mind if it's merged as is.
> 
> Actually I think we have a bit of stack reading fixes for 64s radix now
> (not a bug fix as such, but we don't need the hash fault logic in radix),
> so if I get around to that I can propose the changes in that series.
> 

As far as I can see, there is a v12

Christophe

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v11 0/8] Disable compat cruft on ppc64le v11
  2020-04-03  7:26       ` Christophe Leroy
@ 2020-04-03  9:43         ` Nicholas Piggin
  -1 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-04-03  9:43 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev, Michal Suchanek
  Cc: Arnaldo Carvalho de Melo, Alexander Shishkin, Allison Randal,
	Andy Shevchenko, Arnd Bergmann, Thiago Jung Bauermann,
	Benjamin Herrenschmidt, Sebastian Andrzej Siewior,
	Claudio Carvalho, David S. Miller, Eric W. Biederman,
	Eric Richter, Greg Kroah-Hartman, Gustavo Luiz Duarte,
	Hari Bathini, Jordan Niethe, Jiri Olsa, Jonathan Cameron,
	linux-fsdevel, linux-kernel, Mark Rutland, Masahiro Yamada,
	Mauro Carvalho Chehab, Michael Neuling, Ingo Molnar,
	Michael Ellerman, Namhyung Kim, Nayna Jain, Paul Mackerras,
	Peter Zijlstra, Rob Herring, Thomas Gleixner, Valentin Schneider,
	Alexander Viro

Christophe Leroy's on April 3, 2020 5:26 pm:
> 
> 
> On 03/04/2020 at 09:25, Nicholas Piggin wrote:
>> Michal Suchanek's on March 19, 2020 10:19 pm:
>>> Less code means fewer bugs, so add a knob to skip the compat stuff.
>>>
>>> Changes in v2: saner CONFIG_COMPAT ifdefs
>>> Changes in v3:
>>>   - change llseek to 32bit instead of building it unconditionally in fs
>>>   - cleanup the makefile conditionals
>>>   - remove some ifdefs or convert to IS_DEFINED where possible
>>> Changes in v4:
>>>   - cleanup is_32bit_task and current_is_64bit
>>>   - more makefile cleanup
>>> Changes in v5:
>>>   - more current_is_64bit cleanup
>>>   - split off callchain.c 32bit and 64bit parts
>>> Changes in v6:
>>>   - cleanup makefile after split
>>>   - consolidate read_user_stack_32
>>>   - fix some checkpatch warnings
>>> Changes in v7:
>>>   - add back __ARCH_WANT_SYS_LLSEEK to fix build with llseek
>>>   - remove leftover hunk
>>>   - add review tags
>>> Changes in v8:
>>>   - consolidate valid_user_sp to fix it in the split callchain.c
>>>   - fix build errors/warnings with PPC64 !COMPAT and PPC32
>>> Changes in v9:
>>>   - remove current_is_64bit()
>>> Changes in v10:
>>>   - rebase, sent together with the syscall cleanup
>>> Changes in v11:
>>>   - rebase
>>>   - add MAINTAINERS pattern for ppc perf
>> 
>> These all look good to me. I had some minor comment about one patch but
>> not really a big deal and there were more cleanups on top of it, so I
>> don't mind if it's merged as is.
>> 
>> Actually I think we have a bit of stack reading fixes for 64s radix now
>> (not a bug fix as such, but we don't need the hash fault logic in radix),
>> so if I get around to that I can propose the changes in that series.
>> 
> 
> As far as I can see, there is a v12

For the most part I was looking at the patches in mpe's next-test
tree on github, if that's the v12 series, same comment applies but
it's a pretty small nitpick.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v11 3/8] powerpc/perf: consolidate read_user_stack_32
  2020-04-03  7:13           ` Nicholas Piggin
@ 2020-04-03 10:52             ` Michal Suchánek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchánek @ 2020-04-03 10:52 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Arnaldo Carvalho de Melo, Alexander Shishkin, Allison Randal,
	Andy Shevchenko, Arnd Bergmann, Thiago Jung Bauermann,
	Benjamin Herrenschmidt, Sebastian Andrzej Siewior,
	Claudio Carvalho, Christophe Leroy, David S. Miller,
	Eric W. Biederman, Eric Richter, Greg Kroah-Hartman,
	Gustavo Luiz Duarte, Hari Bathini, Jordan Niethe, Jiri Olsa,
	Jonathan Cameron, linux-fsdevel, linux-kernel, linuxppc-dev,
	Mark Rutland, Masahiro Yamada, Mauro Carvalho Chehab,
	Michael Neuling, Ingo Molnar, Michael Ellerman, Namhyung Kim,
	Nayna Jain, Paul Mackerras, Peter Zijlstra, Rob Herring,
	Thomas Gleixner, Valentin Schneider, Alexander Viro

Hello,

there are 3 variants of the function

read_user_stack_64

32bit read_user_stack_32
64bit read_user_stack_32
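
For orientation, a rough summary of the three, reconstructed from the
diffs quoted below (not the exact source):

/* ppc64 */
static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret);
	/* 8-byte read, falls back to read_user_stack_slow(ptr, ret, 8) */
static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret);
	/* 4-byte read, falls back to read_user_stack_slow(ptr, ret, 4) */

/* ppc32 */
static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret);
	/* 4-byte read only; hash_page creates the HPTE, no slow path */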

On Fri, Apr 03, 2020 at 05:13:25PM +1000, Nicholas Piggin wrote:
> Michal Suchánek's on March 25, 2020 5:38 am:
> > On Tue, Mar 24, 2020 at 06:48:20PM +1000, Nicholas Piggin wrote:
> >> Michal Suchanek's on March 19, 2020 10:19 pm:
> >> > There are two almost identical copies for 32bit and 64bit.
> >> > 
> >> > The function is used only in 32bit code which will be split out in next
> >> > patch so consolidate to one function.
> >> > 
> >> > Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> >> > Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
> >> > ---
> >> > v6:  new patch
> >> > v8:  move the consolidated function out of the ifdef block.
> >> > v11: rebase on top of def0bfdbd603
> >> > ---
> >> >  arch/powerpc/perf/callchain.c | 48 +++++++++++++++++------------------
> >> >  1 file changed, 24 insertions(+), 24 deletions(-)
> >> > 
> >> > diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> >> > index cbc251981209..c9a78c6e4361 100644
> >> > --- a/arch/powerpc/perf/callchain.c
> >> > +++ b/arch/powerpc/perf/callchain.c
> >> > @@ -161,18 +161,6 @@ static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
> >> >  	return read_user_stack_slow(ptr, ret, 8);
> >> >  }
> >> >  
> >> > -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> >> > -{
> >> > -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> >> > -	    ((unsigned long)ptr & 3))
> >> > -		return -EFAULT;
> >> > -
> >> > -	if (!probe_user_read(ret, ptr, sizeof(*ret)))
> >> > -		return 0;
> >> > -
> >> > -	return read_user_stack_slow(ptr, ret, 4);
> >> > -}
> >> > -
> >> >  static inline int valid_user_sp(unsigned long sp, int is_64)
> >> >  {
> >> >  	if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 32)
> >> > @@ -277,19 +265,9 @@ static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> >> >  }
> >> >  
> >> >  #else  /* CONFIG_PPC64 */
> >> > -/*
> >> > - * On 32-bit we just access the address and let hash_page create a
> >> > - * HPTE if necessary, so there is no need to fall back to reading
> >> > - * the page tables.  Since this is called at interrupt level,
> >> > - * do_page_fault() won't treat a DSI as a page fault.
> >> > - */
> >> > -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> >> > +static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> >> >  {
> >> > -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> >> > -	    ((unsigned long)ptr & 3))
> >> > -		return -EFAULT;
> >> > -
> >> > -	return probe_user_read(ret, ptr, sizeof(*ret));
> >> > +	return 0;
> >> >  }
> >> >  
> >> >  static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> >> > @@ -312,6 +290,28 @@ static inline int valid_user_sp(unsigned long sp, int is_64)
> >> >  
> >> >  #endif /* CONFIG_PPC64 */
> >> >  
> >> > +/*
> >> > + * On 32-bit we just access the address and let hash_page create a
> >> > + * HPTE if necessary, so there is no need to fall back to reading
> >> > + * the page tables.  Since this is called at interrupt level,
> >> > + * do_page_fault() won't treat a DSI as a page fault.
> >> > + */
> >> 
> >> The comment is actually probably better to stay in the 32-bit
> >> read_user_stack_slow implementation. Is that function defined
> >> on 32-bit purely so that you can use IS_ENABLED()? In that case
> > It documents the IS_ENABLED() and that's where it is. The 32bit
> > definition is only a technical detail.
> 
> Sorry for the late reply, busy trying to fix bugs in the C rewrite
> series. I don't think it is the right place, it should be in the
> ppc32 implementation detail. ppc64 has an equivalent comment at the
> top of its read_user_stack functions.
> 
> >> I would prefer to put a BUG() there which makes it self documenting.
> > Which will cause checkpatch complaints about introducing new BUG() which
> > is frowned on.
> 
> It's fine in this case, that warning is about not introducing
> runtime bugs, but this wouldn't be.
> 
> But... I actually don't like adding read_user_stack_slow on 32-bit
> and especially not just to make IS_ENABLED work.
> 
> IMO this would be better if you really want to consolidate it
> 
> ---
> 
> diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> index cbc251981209..ca3a599b3f54 100644
> --- a/arch/powerpc/perf/callchain.c
> +++ b/arch/powerpc/perf/callchain.c
> @@ -108,7 +108,7 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
>   * interrupt context, so if the access faults, we read the page tables
>   * to find which page (if any) is mapped and access it directly.
>   */
> -static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> +static int read_user_stack_slow(const void __user *ptr, void *buf, int nb)
>  {
>  	int ret = -EFAULT;
>  	pgd_t *pgdir;
> @@ -149,28 +149,21 @@ static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
>  	return ret;
>  }
>  
> -static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
> +static int __read_user_stack(const void __user *ptr, void *ret, size_t size)
>  {
> -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned long) ||
> -	    ((unsigned long)ptr & 7))
> +	if ((unsigned long)ptr > TASK_SIZE - size ||
> +	    ((unsigned long)ptr & (size - 1)))
>  		return -EFAULT;
>  
> -	if (!probe_user_read(ret, ptr, sizeof(*ret)))
> +	if (!probe_user_read(ret, ptr, size))
>  		return 0;
>  
> -	return read_user_stack_slow(ptr, ret, 8);
> +	return read_user_stack_slow(ptr, ret, size);
>  }
>  
> -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> +static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
>  {
> -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> -	    ((unsigned long)ptr & 3))
> -		return -EFAULT;
> -
> -	if (!probe_user_read(ret, ptr, sizeof(*ret)))
> -		return 0;
> -
> -	return read_user_stack_slow(ptr, ret, 4);
> +	return __read_user_stack(ptr, ret, sizeof(*ret));
>  }
>  
>  static inline int valid_user_sp(unsigned long sp, int is_64)
> @@ -283,13 +276,13 @@ static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
>   * the page tables.  Since this is called at interrupt level,
>   * do_page_fault() won't treat a DSI as a page fault.
>   */
> -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> +static int __read_user_stack(const void __user *ptr, void *ret, size_t size)
>  {
> -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> -	    ((unsigned long)ptr & 3))
> +	if ((unsigned long)ptr > TASK_SIZE - size ||
> +	    ((unsigned long)ptr & (size - 1)))
>  		return -EFAULT;
>  
> -	return probe_user_read(ret, ptr, sizeof(*ret));
> +	return probe_user_read(ret, ptr, size);
>  }
>  
>  static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> @@ -312,6 +305,11 @@ static inline int valid_user_sp(unsigned long sp, int is_64)
>  
>  #endif /* CONFIG_PPC64 */
>  
> +static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> +{
> +	return __read_user_stack(ptr, ret, sizeof(*ret));
Does not work for 64bit read_user_stack_32 ^ this should be 4.

Other than that it should preserve the existing logic just fine.

Thanks

Michal

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v11 3/8] powerpc/perf: consolidate read_user_stack_32
  2020-04-03 10:52             ` Michal Suchánek
@ 2020-04-03 11:26               ` Nicholas Piggin
  -1 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2020-04-03 11:26 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: Arnaldo Carvalho de Melo, Alexander Shishkin, Allison Randal,
	Andy Shevchenko, Arnd Bergmann, Thiago Jung Bauermann,
	Benjamin Herrenschmidt, Sebastian Andrzej Siewior,
	Claudio Carvalho, Christophe Leroy, David S. Miller,
	Eric W. Biederman, Eric Richter, Greg Kroah-Hartman,
	Gustavo Luiz Duarte, Hari Bathini, Jordan Niethe, Jiri Olsa,
	Jonathan Cameron, linux-fsdevel, linux-kernel, linuxppc-dev,
	Mark Rutland, Masahiro Yamada, Mauro Carvalho Chehab,
	Michael Neuling, Ingo Molnar, Michael Ellerman, Namhyung Kim,
	Nayna Jain, Paul Mackerras, Peter Zijlstra, Rob Herring,
	Thomas Gleixner, Valentin Schneider, Alexander Viro

Michal Suchánek's on April 3, 2020 8:52 pm:
> Hello,
> 
> there are 3 variants of the function
> 
> read_user_stack_64
> 
> 32bit read_user_stack_32
> 64bit read_user_stack_32

Right.

> On Fri, Apr 03, 2020 at 05:13:25PM +1000, Nicholas Piggin wrote:
[...]
>>  #endif /* CONFIG_PPC64 */
>>  
>> +static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
>> +{
>> +	return __read_user_stack(ptr, ret, sizeof(*ret));
> Does not work for 64bit read_user_stack_32 ^ this should be 4.
> 
> Other than that it should preserve the existing logic just fine.

sizeof(int) == 4 on 64bit so it should work.
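
Spelled out, the consolidated caller reads the right width on both
configurations; a sketch (the BUILD_BUG_ON is an illustrative addition,
not part of the patch):

static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
{
	/* unsigned int is 4 bytes on ppc32 and ppc64 alike */
	BUILD_BUG_ON(sizeof(*ret) != 4);
	return __read_user_stack(ptr, ret, sizeof(*ret));
}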

Thanks,
Nick

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v11 3/8] powerpc/perf: consolidate read_user_stack_32
  2020-04-03 11:26               ` Nicholas Piggin
@ 2020-04-03 11:51                 ` Michal Suchánek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchánek @ 2020-04-03 11:51 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Arnaldo Carvalho de Melo, Alexander Shishkin, Allison Randal,
	Andy Shevchenko, Arnd Bergmann, Thiago Jung Bauermann,
	Benjamin Herrenschmidt, Sebastian Andrzej Siewior,
	Claudio Carvalho, Christophe Leroy, David S. Miller,
	Eric W. Biederman, Eric Richter, Greg Kroah-Hartman,
	Gustavo Luiz Duarte, Hari Bathini, Jordan Niethe, Jiri Olsa,
	Jonathan Cameron, linux-fsdevel, linux-kernel, linuxppc-dev,
	Mark Rutland, Masahiro Yamada, Mauro Carvalho Chehab,
	Michael Neuling, Ingo Molnar, Michael Ellerman, Namhyung Kim,
	Nayna Jain, Paul Mackerras, Peter Zijlstra, Rob Herring,
	Thomas Gleixner, Valentin Schneider, Alexander Viro

On Fri, Apr 03, 2020 at 09:26:27PM +1000, Nicholas Piggin wrote:
> Michal Suchánek's on April 3, 2020 8:52 pm:
> > Hello,
> > 
> > there are 3 variants of the function
> > 
> > read_user_stack_64
> > 
> > 32bit read_user_stack_32
> > 64bit read_user_stack_32
> 
> Right.
> 
> > On Fri, Apr 03, 2020 at 05:13:25PM +1000, Nicholas Piggin wrote:
> [...]
> >>  #endif /* CONFIG_PPC64 */
> >>  
> >> +static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> >> +{
> >> +	return __read_user_stack(ptr, ret, sizeof(*ret));
> > Does not work for 64bit read_user_stack_32 ^ this should be 4.
> > 
> > Other than that it should preserve the existing logic just fine.
> 
> sizeof(int) == 4 on 64bit so it should work.
> 
Right, the type is different for the 32bit and 64bit version.
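
That is, per the quoted diff the callers keep their distinct widths
while sharing one helper; a sketch of the end result:

static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
{
	return __read_user_stack(ptr, ret, sizeof(*ret));	/* 8 bytes */
}

static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
{
	return __read_user_stack(ptr, ret, sizeof(*ret));	/* 4 bytes */
}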

Thanks

Michal

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v11 0/8] Disable compat cruft on ppc64le v11
  2020-04-03  9:43         ` Nicholas Piggin
@ 2020-04-05  0:40           ` Michael Ellerman
  -1 siblings, 0 replies; 161+ messages in thread
From: Michael Ellerman @ 2020-04-05  0:40 UTC (permalink / raw)
  To: Nicholas Piggin, Christophe Leroy, linuxppc-dev, Michal Suchanek
  Cc: Arnaldo Carvalho de Melo, Alexander Shishkin, Allison Randal,
	Andy Shevchenko, Arnd Bergmann, Thiago Jung Bauermann,
	Benjamin Herrenschmidt, Sebastian Andrzej Siewior,
	Claudio Carvalho, David S. Miller, Eric W. Biederman,
	Eric Richter, Greg Kroah-Hartman, Gustavo Luiz Duarte,
	Hari Bathini, Jordan Niethe, Jiri Olsa, Jonathan Cameron,
	linux-fsdevel, linux-kernel, Mark Rutland, Masahiro Yamada,
	Mauro Carvalho Chehab, Michael Neuling, Ingo Molnar,
	Namhyung Kim, Nayna Jain, Paul Mackerras, Peter Zijlstra,
	Rob Herring, Thomas Gleixner, Valentin Schneider, Alexander Viro

Nicholas Piggin <npiggin@gmail.com> writes:
> Christophe Leroy's on April 3, 2020 5:26 pm:
>> On 03/04/2020 at 09:25, Nicholas Piggin wrote:
>>> Michal Suchanek's on March 19, 2020 10:19 pm:
>>>> Less code means fewer bugs, so add a knob to skip the compat stuff.
>>>>
>>>> Changes in v2: saner CONFIG_COMPAT ifdefs
>>>> Changes in v3:
>>>>   - change llseek to 32bit instead of building it unconditionally in fs
>>>>   - cleanup the makefile conditionals
>>>>   - remove some ifdefs or convert to IS_DEFINED where possible
>>>> Changes in v4:
>>>>   - cleanup is_32bit_task and current_is_64bit
>>>>   - more makefile cleanup
>>>> Changes in v5:
>>>>   - more current_is_64bit cleanup
>>>>   - split off callchain.c 32bit and 64bit parts
>>>> Changes in v6:
>>>>   - cleanup makefile after split
>>>>   - consolidate read_user_stack_32
>>>>   - fix some checkpatch warnings
>>>> Changes in v7:
>>>>   - add back __ARCH_WANT_SYS_LLSEEK to fix build with llseek
>>>>   - remove leftover hunk
>>>>   - add review tags
>>>> Changes in v8:
>>>>   - consolidate valid_user_sp to fix it in the split callchain.c
>>>>   - fix build errors/warnings with PPC64 !COMPAT and PPC32
>>>> Changes in v9:
>>>>   - remove current_is_64bit()
>>>> Changes in v10:
>>>>   - rebase, sent together with the syscall cleanup
>>>> Changes in v11:
>>>>   - rebase
>>>>   - add MAINTAINERS pattern for ppc perf
>>> 
>>> These all look good to me. I had some minor comment about one patch but
>>> not really a big deal and there were more cleanups on top of it, so I
>>> don't mind if it's merged as is.
>>> 
>>> Actually I think we have a bit of stack reading fixes for 64s radix now
>>> (not a bug fix as such, but we don't need the hash fault logic in radix),
>>> so if I get around to that I can propose the changes in that series.
>>> 
>> 
>> As far as I can see, there is a v12
>
> For the most part I was looking at the patches in mpe's next-test
> tree on github, if that's the v12 series, same comment applies but
> it's a pretty small nitpick.

Yeah I have v12 in my tree.

This has floated around long enough (our fault), so I'm going to take it
and we can fix anything up later.

cheers

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v12 1/8] powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
  2020-03-20 10:20     ` Michal Suchanek
  (?)
@ 2020-04-06 13:05     ` Michael Ellerman
  -1 siblings, 0 replies; 161+ messages in thread
From: Michael Ellerman @ 2020-04-06 13:05 UTC (permalink / raw)
  To: Michal Suchanek, linuxppc-dev
  Cc: Mark Rutland,
	Gustavo Luiz Duarte, Peter Zijlstra, Jordan Niethe,
	Sebastian Andrzej Siewior, Claudio Carvalho, Paul Mackerras,
	Jiri Olsa, Rob Herring, Michael Neuling, Mauro Carvalho Chehab,
	Masahiro Yamada, Nayna Jain, Alexander Shishkin,
	Ingo Molnar, Hari Bathini, Michal Suchanek, Valentin Schneider,
	Arnd Bergmann, Andy Shevchenko, Arnaldo Carvalho de Melo,
	Alexander Viro, Jonathan Cameron, Namhyung Kim, Thomas Gleixner,
	Allison Randal, Greg Kroah-Hartman, Nicholas Piggin,
	linux-kernel, Eric Richter, Eric W. Biederman, linux-fsdevel,
	David S. Miller, Thiago Jung Bauermann

On Fri, 2020-03-20 at 10:20:12 UTC, Michal Suchanek wrote:
> This partially reverts commit caf6f9c8a326 ("asm-generic: Remove
> unneeded __ARCH_WANT_SYS_LLSEEK macro")
> 
> When CONFIG_COMPAT is disabled on ppc64 the kernel does not build.
> 
> There is resistance to both removing the llseek syscall from the 64bit
> syscall tables and building the llseek interface unconditionally.
> 
> Link: https://lore.kernel.org/lkml/20190828151552.GA16855@infradead.org/
> Link: https://lore.kernel.org/lkml/20190829214319.498c7de2@naga/
> 
> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> Reviewed-by: Arnd Bergmann <arnd@arndb.de>
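
The restored mechanism is roughly the following (a sketch of the partial
revert; the exact files and condition are assumptions based on the
commit being reverted):

/* arch/powerpc/include/asm/unistd.h */
#define __ARCH_WANT_SYS_LLSEEK

/* fs/read_write.c: keep sys_llseek whenever some 32-bit interface needs it */
#if !defined(CONFIG_64BIT) || defined(CONFIG_COMPAT) || \
    defined(__ARCH_WANT_SYS_LLSEEK)
SYSCALL_DEFINE5(llseek, ...)	/* definition elided */
#endif

so a CONFIG_COMPAT=n ppc64 kernel still provides llseek without pulling
in the rest of the compat layer.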

Patches 1-7 applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/9e62ccec3ba0a17c8050ea78500dfdd0e4c5c0cc

cheers

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v11 3/8] powerpc/perf: consolidate read_user_stack_32
  2020-04-03  7:13           ` Nicholas Piggin
@ 2020-04-06 20:52             ` Michal Suchánek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchánek @ 2020-04-06 20:52 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Arnaldo Carvalho de Melo, Alexander Shishkin, Allison Randal,
	Andy Shevchenko, Arnd Bergmann, Thiago Jung Bauermann,
	Benjamin Herrenschmidt, Sebastian Andrzej Siewior,
	Claudio Carvalho, Christophe Leroy, David S. Miller,
	Eric W. Biederman, Eric Richter, Greg Kroah-Hartman,
	Gustavo Luiz Duarte, Hari Bathini, Jordan Niethe, Jiri Olsa,
	Jonathan Cameron, linux-fsdevel, linux-kernel, linuxppc-dev,
	Mark Rutland, Masahiro Yamada, Mauro Carvalho Chehab,
	Michael Neuling, Ingo Molnar, Michael Ellerman, Namhyung Kim,
	Nayna Jain, Paul Mackerras, Peter Zijlstra, Rob Herring,
	Thomas Gleixner, Valentin Schneider, Alexander Viro

On Fri, Apr 03, 2020 at 05:13:25PM +1000, Nicholas Piggin wrote:
> Michal Suchánek's on March 25, 2020 5:38 am:
> > On Tue, Mar 24, 2020 at 06:48:20PM +1000, Nicholas Piggin wrote:
> >> Michal Suchanek's on March 19, 2020 10:19 pm:
> >> > There are two almost identical copies for 32bit and 64bit.
> >> > 
> >> > The function is used only in 32bit code which will be split out in next
> >> > patch so consolidate to one function.
> >> > 
> >> > Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> >> > Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
> >> > ---
> >> > v6:  new patch
> >> > v8:  move the consolidated function out of the ifdef block.
> >> > v11: rebase on top of def0bfdbd603
> >> > ---
> >> >  arch/powerpc/perf/callchain.c | 48 +++++++++++++++++------------------
> >> >  1 file changed, 24 insertions(+), 24 deletions(-)
> >> > 
> >> > diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> >> > index cbc251981209..c9a78c6e4361 100644
> >> > --- a/arch/powerpc/perf/callchain.c
> >> > +++ b/arch/powerpc/perf/callchain.c
> >> > @@ -161,18 +161,6 @@ static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
> >> >  	return read_user_stack_slow(ptr, ret, 8);
> >> >  }
> >> >  
> >> > -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> >> > -{
> >> > -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> >> > -	    ((unsigned long)ptr & 3))
> >> > -		return -EFAULT;
> >> > -
> >> > -	if (!probe_user_read(ret, ptr, sizeof(*ret)))
> >> > -		return 0;
> >> > -
> >> > -	return read_user_stack_slow(ptr, ret, 4);
> >> > -}
> >> > -
> >> >  static inline int valid_user_sp(unsigned long sp, int is_64)
> >> >  {
> >> >  	if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 32)
> >> > @@ -277,19 +265,9 @@ static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> >> >  }
> >> >  
> >> >  #else  /* CONFIG_PPC64 */
> >> > -/*
> >> > - * On 32-bit we just access the address and let hash_page create a
> >> > - * HPTE if necessary, so there is no need to fall back to reading
> >> > - * the page tables.  Since this is called at interrupt level,
> >> > - * do_page_fault() won't treat a DSI as a page fault.
> >> > - */
> >> > -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> >> > +static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> >> >  {
> >> > -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> >> > -	    ((unsigned long)ptr & 3))
> >> > -		return -EFAULT;
> >> > -
> >> > -	return probe_user_read(ret, ptr, sizeof(*ret));
> >> > +	return 0;
> >> >  }
> >> >  
> >> >  static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> >> > @@ -312,6 +290,28 @@ static inline int valid_user_sp(unsigned long sp, int is_64)
> >> >  
> >> >  #endif /* CONFIG_PPC64 */
> >> >  
> >> > +/*
> >> > + * On 32-bit we just access the address and let hash_page create a
> >> > + * HPTE if necessary, so there is no need to fall back to reading
> >> > + * the page tables.  Since this is called at interrupt level,
> >> > + * do_page_fault() won't treat a DSI as a page fault.
> >> > + */
> >> 
> >> The comment is actually probably better to stay in the 32-bit
> >> read_user_stack_slow implementation. Is that function defined
> >> on 32-bit purely so that you can use IS_ENABLED()? In that case
> > It documents the IS_ENABLED() and that's where it is. The 32bit
> > definition is only a technical detail.
> 
> Sorry for the late reply, busy trying to fix bugs in the C rewrite
> series. I don't think it is the right place, it should be in the
> ppc32 implementation detail.
Which no longer exists once the 32bit and 64bit parts are split.
> ppc64 has an equivalent comment at the top of its read_user_stack functions.
> 
> >> I would prefer to put a BUG() there which makes it self documenting.
> > Which will cause checkpatch complaints about introducing new BUG() which
> > is frowned on.
> 
> It's fine in this case, that warning is about not introducing
> runtime bugs, but this wouldn't be.
> 
> But... I actually don't like adding read_user_stack_slow on 32-bit
> and especially not just to make IS_ENABLED work.
That's to avoid breaking the build at this point; the function is removed later.
> 
> IMO this would be better if you really want to consolidate it
> 
> ---
> 
> diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> index cbc251981209..ca3a599b3f54 100644
> --- a/arch/powerpc/perf/callchain.c
> +++ b/arch/powerpc/perf/callchain.c
> @@ -108,7 +108,7 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
>   * interrupt context, so if the access faults, we read the page tables
>   * to find which page (if any) is mapped and access it directly.
>   */
> -static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> +static int read_user_stack_slow(const void __user *ptr, void *buf, int nb)
>  {
>  	int ret = -EFAULT;
>  	pgd_t *pgdir;
> @@ -149,28 +149,21 @@ static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
>  	return ret;
>  }
>  
> -static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
> +static int __read_user_stack(const void __user *ptr, void *ret, size_t size)
>  {
> -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned long) ||
> -	    ((unsigned long)ptr & 7))
> +	if ((unsigned long)ptr > TASK_SIZE - size ||
> +	    ((unsigned long)ptr & (size - 1)))
>  		return -EFAULT;
>  
> -	if (!probe_user_read(ret, ptr, sizeof(*ret)))
> +	if (!probe_user_read(ret, ptr, size))
>  		return 0;
>  
> -	return read_user_stack_slow(ptr, ret, 8);
> +	return read_user_stack_slow(ptr, ret, size);
>  }
>  
> -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> +static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
>  {
> -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> -	    ((unsigned long)ptr & 3))
> -		return -EFAULT;
> -
> -	if (!probe_user_read(ret, ptr, sizeof(*ret)))
> -		return 0;
> -
> -	return read_user_stack_slow(ptr, ret, 4);
> +	return __read_user_stack(ptr, ret, sizeof(*ret));
>  }
>  
>  static inline int valid_user_sp(unsigned long sp, int is_64)
> @@ -283,13 +276,13 @@ static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
>   * the page tables.  Since this is called at interrupt level,
>   * do_page_fault() won't treat a DSI as a page fault.
>   */
> -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> +static int __read_user_stack(const void __user *ptr, void *ret, size_t size)
>  {
> -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> -	    ((unsigned long)ptr & 3))
> +	if ((unsigned long)ptr > TASK_SIZE - size ||
> +	    ((unsigned long)ptr & (size - 1)))
>  		return -EFAULT;
>  
> -	return probe_user_read(ret, ptr, sizeof(*ret));
> +	return probe_user_read(ret, ptr, size);
>  }
>  
>  static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> @@ -312,6 +305,11 @@ static inline int valid_user_sp(unsigned long sp, int is_64)
>  
>  #endif /* CONFIG_PPC64 */
>  
> +static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> +{
> +	return __read_user_stack(ptr, ret, sizeof(*ret));
> +}
> +
>  /*
>   * Layout for non-RT signal frames
>   */

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v11 3/8] powerpc/perf: consolidate read_user_stack_32
@ 2020-04-06 20:52             ` Michal Suchánek
  0 siblings, 0 replies; 161+ messages in thread
From: Michal Suchánek @ 2020-04-06 20:52 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Mark Rutland, Gustavo Luiz Duarte, Alexander Shishkin,
	Sebastian Andrzej Siewior, linux-kernel, Paul Mackerras,
	Jiri Olsa, Rob Herring, Michael Neuling, Eric Richter,
	Masahiro Yamada, Nayna Jain, Peter Zijlstra, Ingo Molnar,
	Hari Bathini, Jordan Niethe, Valentin Schneider, Arnd Bergmann,
	Arnaldo Carvalho de Melo, Alexander Viro, Jonathan Cameron,
	Namhyung Kim, Thomas Gleixner, Andy Shevchenko, Allison Randal,
	Greg Kroah-Hartman, Claudio Carvalho, Mauro Carvalho Chehab,
	Eric W. Biederman, linux-fsdevel, linuxppc-dev, David S. Miller,
	Thiago Jung Bauermann

On Fri, Apr 03, 2020 at 05:13:25PM +1000, Nicholas Piggin wrote:
> Michal Suchánek's on March 25, 2020 5:38 am:
> > On Tue, Mar 24, 2020 at 06:48:20PM +1000, Nicholas Piggin wrote:
> >> Michal Suchanek's on March 19, 2020 10:19 pm:
> >> > There are two almost identical copies for 32bit and 64bit.
> >> > 
> >> > The function is used only in 32bit code which will be split out in next
> >> > patch so consolidate to one function.
> >> > 
> >> > Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> >> > Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
> >> > ---
> >> > v6:  new patch
> >> > v8:  move the consolidated function out of the ifdef block.
> >> > v11: rebase on top of def0bfdbd603
> >> > ---
> >> >  arch/powerpc/perf/callchain.c | 48 +++++++++++++++++------------------
> >> >  1 file changed, 24 insertions(+), 24 deletions(-)
> >> > 
> >> > diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> >> > index cbc251981209..c9a78c6e4361 100644
> >> > --- a/arch/powerpc/perf/callchain.c
> >> > +++ b/arch/powerpc/perf/callchain.c
> >> > @@ -161,18 +161,6 @@ static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
> >> >  	return read_user_stack_slow(ptr, ret, 8);
> >> >  }
> >> >  
> >> > -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> >> > -{
> >> > -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> >> > -	    ((unsigned long)ptr & 3))
> >> > -		return -EFAULT;
> >> > -
> >> > -	if (!probe_user_read(ret, ptr, sizeof(*ret)))
> >> > -		return 0;
> >> > -
> >> > -	return read_user_stack_slow(ptr, ret, 4);
> >> > -}
> >> > -
> >> >  static inline int valid_user_sp(unsigned long sp, int is_64)
> >> >  {
> >> >  	if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 32)
> >> > @@ -277,19 +265,9 @@ static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> >> >  }
> >> >  
> >> >  #else  /* CONFIG_PPC64 */
> >> > -/*
> >> > - * On 32-bit we just access the address and let hash_page create a
> >> > - * HPTE if necessary, so there is no need to fall back to reading
> >> > - * the page tables.  Since this is called at interrupt level,
> >> > - * do_page_fault() won't treat a DSI as a page fault.
> >> > - */
> >> > -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> >> > +static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> >> >  {
> >> > -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> >> > -	    ((unsigned long)ptr & 3))
> >> > -		return -EFAULT;
> >> > -
> >> > -	return probe_user_read(ret, ptr, sizeof(*ret));
> >> > +	return 0;
> >> >  }
> >> >  
> >> >  static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> >> > @@ -312,6 +290,28 @@ static inline int valid_user_sp(unsigned long sp, int is_64)
> >> >  
> >> >  #endif /* CONFIG_PPC64 */
> >> >  
> >> > +/*
> >> > + * On 32-bit we just access the address and let hash_page create a
> >> > + * HPTE if necessary, so there is no need to fall back to reading
> >> > + * the page tables.  Since this is called at interrupt level,
> >> > + * do_page_fault() won't treat a DSI as a page fault.
> >> > + */
> >> 
> >> The comment is actually probably better to stay in the 32-bit
> >> read_user_stack_slow implementation. Is that function defined
> >> on 32-bit purely so that you can use IS_ENABLED()? In that case
> > It documents the IS_ENABLED() check, and that's where the check is. The
> > 32bit definition is only a technical detail.
> 
> Sorry for the late reply, I've been busy trying to fix bugs in the C
> rewrite series. I don't think this is the right place; the comment
> belongs with the ppc32 implementation detail.
Which no longer exists after the 32bit and 64bit parts are split.
> ppc64 has an equivalent comment at the top of its read_user_stack functions.
> 
> >> I would prefer to put a BUG() there which makes it self documenting.
> > Which will cause checkpatch complaints about introducing new BUG() which
> > is frowned on.
> 
> It's fine in this case: that warning is about not introducing
> runtime bugs, and this wouldn't be one.
> 
> But... I actually don't like adding read_user_stack_slow on 32-bit
> and especially not just to make IS_ENABLED work.
That's to avoid breaking the build at this point. The function is removed later.
> 
> IMO this would be better if you really want to consolidate it
> 
> ---
> 
> diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> index cbc251981209..ca3a599b3f54 100644
> --- a/arch/powerpc/perf/callchain.c
> +++ b/arch/powerpc/perf/callchain.c
> @@ -108,7 +108,7 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
>   * interrupt context, so if the access faults, we read the page tables
>   * to find which page (if any) is mapped and access it directly.
>   */
> -static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> +static int read_user_stack_slow(const void __user *ptr, void *buf, int nb)
>  {
>  	int ret = -EFAULT;
>  	pgd_t *pgdir;
> @@ -149,28 +149,21 @@ static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
>  	return ret;
>  }
>  
> -static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
> +static int __read_user_stack(const void __user *ptr, void *ret, size_t size)
>  {
> -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned long) ||
> -	    ((unsigned long)ptr & 7))
> +	if ((unsigned long)ptr > TASK_SIZE - size ||
> +	    ((unsigned long)ptr & (size - 1)))
>  		return -EFAULT;
>  
> -	if (!probe_user_read(ret, ptr, sizeof(*ret)))
> +	if (!probe_user_read(ret, ptr, size))
>  		return 0;
>  
> -	return read_user_stack_slow(ptr, ret, 8);
> +	return read_user_stack_slow(ptr, ret, size);
>  }
>  
> -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> +static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
>  {
> -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> -	    ((unsigned long)ptr & 3))
> -		return -EFAULT;
> -
> -	if (!probe_user_read(ret, ptr, sizeof(*ret)))
> -		return 0;
> -
> -	return read_user_stack_slow(ptr, ret, 4);
> +	return __read_user_stack(ptr, ret, sizeof(*ret));
>  }
>  
>  static inline int valid_user_sp(unsigned long sp, int is_64)
> @@ -283,13 +276,13 @@ static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
>   * the page tables.  Since this is called at interrupt level,
>   * do_page_fault() won't treat a DSI as a page fault.
>   */
> -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> +static int __read_user_stack(const void __user *ptr, void *ret, size_t size)
>  {
> -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> -	    ((unsigned long)ptr & 3))
> +	if ((unsigned long)ptr > TASK_SIZE - size ||
> +	    ((unsigned long)ptr & (size - 1)))
>  		return -EFAULT;
>  
> -	return probe_user_read(ret, ptr, sizeof(*ret));
> +	return probe_user_read(ret, ptr, size);
>  }
>  
>  static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> @@ -312,6 +305,11 @@ static inline int valid_user_sp(unsigned long sp, int is_64)
>  
>  #endif /* CONFIG_PPC64 */
>  
> +static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> +{
> +	return __read_user_stack(ptr, ret, sizeof(*ret));
> +}
> +
>  /*
>   * Layout for non-RT signal frames
>   */

^ permalink raw reply	[flat|nested] 161+ messages in thread

* [PATCH] powerpcs: perf: consolidate perf_callchain_user_64 and perf_callchain_user_32
  2020-04-03  7:13           ` Nicholas Piggin
@ 2020-04-06 21:00             ` Michal Suchanek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchanek @ 2020-04-06 21:00 UTC (permalink / raw)
  To: linuxppc-dev, Nicholas Piggin
  Cc: Michal Suchanek, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linux-kernel,
	Christophe Leroy

perf_callchain_user_64 and perf_callchain_user_32 are nearly identical.
Consolidate into one function with thin wrappers.

Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
---
 arch/powerpc/perf/callchain.h    | 24 +++++++++++++++++++++++-
 arch/powerpc/perf/callchain_32.c | 21 ++-------------------
 arch/powerpc/perf/callchain_64.c | 14 ++++----------
 3 files changed, 29 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/perf/callchain.h b/arch/powerpc/perf/callchain.h
index 7a2cb9e1181a..7540bb71cb60 100644
--- a/arch/powerpc/perf/callchain.h
+++ b/arch/powerpc/perf/callchain.h
@@ -2,7 +2,7 @@
 #ifndef _POWERPC_PERF_CALLCHAIN_H
 #define _POWERPC_PERF_CALLCHAIN_H
 
-int read_user_stack_slow(void __user *ptr, void *buf, int nb);
+int read_user_stack_slow(const void __user *ptr, void *buf, int nb);
 void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
 			    struct pt_regs *regs);
 void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
@@ -16,4 +16,26 @@ static inline bool invalid_user_sp(unsigned long sp)
 	return (!sp || (sp & mask) || (sp > top));
 }
 
+/*
+ * On 32-bit we just access the address and let hash_page create a
+ * HPTE if necessary, so there is no need to fall back to reading
+ * the page tables.  Since this is called at interrupt level,
+ * do_page_fault() won't treat a DSI as a page fault.
+ */
+static inline int __read_user_stack(const void __user *ptr, void *ret,
+				    size_t size)
+{
+	int rc;
+
+	if ((unsigned long)ptr > TASK_SIZE - size ||
+			((unsigned long)ptr & (size - 1)))
+		return -EFAULT;
+	rc = probe_user_read(ret, ptr, size);
+
+	if (rc && IS_ENABLED(CONFIG_PPC64))
+		return read_user_stack_slow(ptr, ret, size);
+
+	return rc;
+}
+
 #endif /* _POWERPC_PERF_CALLCHAIN_H */
diff --git a/arch/powerpc/perf/callchain_32.c b/arch/powerpc/perf/callchain_32.c
index 8aa951003141..1b4621f177e8 100644
--- a/arch/powerpc/perf/callchain_32.c
+++ b/arch/powerpc/perf/callchain_32.c
@@ -31,26 +31,9 @@
 
 #endif /* CONFIG_PPC64 */
 
-/*
- * On 32-bit we just access the address and let hash_page create a
- * HPTE if necessary, so there is no need to fall back to reading
- * the page tables.  Since this is called at interrupt level,
- * do_page_fault() won't treat a DSI as a page fault.
- */
-static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
+static int read_user_stack_32(const unsigned int __user *ptr, unsigned int *ret)
 {
-	int rc;
-
-	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
-	    ((unsigned long)ptr & 3))
-		return -EFAULT;
-
-	rc = probe_user_read(ret, ptr, sizeof(*ret));
-
-	if (IS_ENABLED(CONFIG_PPC64) && rc)
-		return read_user_stack_slow(ptr, ret, 4);
-
-	return rc;
+	return __read_user_stack(ptr, ret, sizeof(*ret));
 }
 
 /*
diff --git a/arch/powerpc/perf/callchain_64.c b/arch/powerpc/perf/callchain_64.c
index df1ffd8b20f2..55bbc25a54ed 100644
--- a/arch/powerpc/perf/callchain_64.c
+++ b/arch/powerpc/perf/callchain_64.c
@@ -24,7 +24,7 @@
  * interrupt context, so if the access faults, we read the page tables
  * to find which page (if any) is mapped and access it directly.
  */
-int read_user_stack_slow(void __user *ptr, void *buf, int nb)
+int read_user_stack_slow(const void __user *ptr, void *buf, int nb)
 {
 	int ret = -EFAULT;
 	pgd_t *pgdir;
@@ -65,16 +65,10 @@ int read_user_stack_slow(void __user *ptr, void *buf, int nb)
 	return ret;
 }
 
-static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
+static int read_user_stack_64(const unsigned long __user *ptr,
+			      unsigned long *ret)
 {
-	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned long) ||
-	    ((unsigned long)ptr & 7))
-		return -EFAULT;
-
-	if (!probe_user_read(ret, ptr, sizeof(*ret)))
-		return 0;
-
-	return read_user_stack_slow(ptr, ret, 8);
+	return __read_user_stack(ptr, ret, sizeof(*ret));
 }
 
 /*
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 161+ messages in thread

* Re: [PATCH] powerpcs: perf: consolidate perf_callchain_user_64 and perf_callchain_user_32
  2020-04-06 21:00             ` Michal Suchanek
@ 2020-04-07  5:21               ` Christophe Leroy
  -1 siblings, 0 replies; 161+ messages in thread
From: Christophe Leroy @ 2020-04-07  5:21 UTC (permalink / raw)
  To: Michal Suchanek, linuxppc-dev, Nicholas Piggin
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linux-kernel



On 06/04/2020 at 23:00, Michal Suchanek wrote:
> perf_callchain_user_64 and perf_callchain_user_32 are nearly identical.
> Consolidate into one function with thin wrappers.
> 
> Suggested-by: Nicholas Piggin <npiggin@gmail.com>
> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> ---
>   arch/powerpc/perf/callchain.h    | 24 +++++++++++++++++++++++-
>   arch/powerpc/perf/callchain_32.c | 21 ++-------------------
>   arch/powerpc/perf/callchain_64.c | 14 ++++----------
>   3 files changed, 29 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/powerpc/perf/callchain.h b/arch/powerpc/perf/callchain.h
> index 7a2cb9e1181a..7540bb71cb60 100644
> --- a/arch/powerpc/perf/callchain.h
> +++ b/arch/powerpc/perf/callchain.h
> @@ -2,7 +2,7 @@
>   #ifndef _POWERPC_PERF_CALLCHAIN_H
>   #define _POWERPC_PERF_CALLCHAIN_H
>   
> -int read_user_stack_slow(void __user *ptr, void *buf, int nb);
> +int read_user_stack_slow(const void __user *ptr, void *buf, int nb);

Does the constification of ptr have to be in this patch?
Wouldn't it be better to have it as a separate patch?

>   void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
>   			    struct pt_regs *regs);
>   void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
> @@ -16,4 +16,26 @@ static inline bool invalid_user_sp(unsigned long sp)
>   	return (!sp || (sp & mask) || (sp > top));
>   }
>   
> +/*
> + * On 32-bit we just access the address and let hash_page create a
> + * HPTE if necessary, so there is no need to fall back to reading
> + * the page tables.  Since this is called at interrupt level,
> + * do_page_fault() won't treat a DSI as a page fault.
> + */
> +static inline int __read_user_stack(const void __user *ptr, void *ret,
> +				    size_t size)
> +{
> +	int rc;
> +
> +	if ((unsigned long)ptr > TASK_SIZE - size ||
> +			((unsigned long)ptr & (size - 1)))
> +		return -EFAULT;
> +	rc = probe_user_read(ret, ptr, size);
> +
> +	if (rc && IS_ENABLED(CONFIG_PPC64))

gcc is probably smart enough to deal with it efficiently, but it would
be more correct to test rc after checking CONFIG_PPC64.
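
Something like this, maybe (untested sketch, same logic with the test
reordered):

	rc = probe_user_read(ret, ptr, size);

	/* only PPC64 has the slow fallback, let the constant short-circuit */
	if (IS_ENABLED(CONFIG_PPC64) && rc)
		return read_user_stack_slow(ptr, ret, size);

	return rc;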

> +		return read_user_stack_slow(ptr, ret, size);
> +
> +	return rc;
> +}
> +
>   #endif /* _POWERPC_PERF_CALLCHAIN_H */
> diff --git a/arch/powerpc/perf/callchain_32.c b/arch/powerpc/perf/callchain_32.c
> index 8aa951003141..1b4621f177e8 100644
> --- a/arch/powerpc/perf/callchain_32.c
> +++ b/arch/powerpc/perf/callchain_32.c
> @@ -31,26 +31,9 @@
>   
>   #endif /* CONFIG_PPC64 */
>   
> -/*
> - * On 32-bit we just access the address and let hash_page create a
> - * HPTE if necessary, so there is no need to fall back to reading
> - * the page tables.  Since this is called at interrupt level,
> - * do_page_fault() won't treat a DSI as a page fault.
> - */
> -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> +static int read_user_stack_32(const unsigned int __user *ptr, unsigned int *ret)
>   {
> -	int rc;
> -
> -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> -	    ((unsigned long)ptr & 3))
> -		return -EFAULT;
> -
> -	rc = probe_user_read(ret, ptr, sizeof(*ret));
> -
> -	if (IS_ENABLED(CONFIG_PPC64) && rc)
> -		return read_user_stack_slow(ptr, ret, 4);
> -
> -	return rc;
> > +	return __read_user_stack(ptr, ret, sizeof(*ret));
>   }
>   
>   /*
> diff --git a/arch/powerpc/perf/callchain_64.c b/arch/powerpc/perf/callchain_64.c
> index df1ffd8b20f2..55bbc25a54ed 100644
> --- a/arch/powerpc/perf/callchain_64.c
> +++ b/arch/powerpc/perf/callchain_64.c
> @@ -24,7 +24,7 @@
>    * interrupt context, so if the access faults, we read the page tables
>    * to find which page (if any) is mapped and access it directly.
>    */
> -int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> +int read_user_stack_slow(const void __user *ptr, void *buf, int nb)
>   {
>   	int ret = -EFAULT;
>   	pgd_t *pgdir;
> @@ -65,16 +65,10 @@ int read_user_stack_slow(void __user *ptr, void *buf, int nb)
>   	return ret;
>   }
>   
> -static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
> +static int read_user_stack_64(const unsigned long __user *ptr,
> +			      unsigned long *ret)
>   {
> -	if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned long) ||
> -	    ((unsigned long)ptr & 7))
> -		return -EFAULT;
> -
> -	if (!probe_user_read(ret, ptr, sizeof(*ret)))
> -		return 0;
> -
> -	return read_user_stack_slow(ptr, ret, 8);
> +	return __read_user_stack(ptr, ret, sizeof(*ret));
>   }
>   
>   /*
> 

Christophe

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v12 5/8] powerpc/64: make buildable without CONFIG_COMPAT
  2020-03-20 10:20     ` Michal Suchanek
@ 2020-04-07  5:50       ` Christophe Leroy
  -1 siblings, 0 replies; 161+ messages in thread
From: Christophe Leroy @ 2020-04-07  5:50 UTC (permalink / raw)
  To: Michal Suchanek, linuxppc-dev
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Alexander Viro, Mauro Carvalho Chehab, David S. Miller,
	Rob Herring, Greg Kroah-Hartman, Jonathan Cameron,
	Andy Shevchenko, Thomas Gleixner, Arnd Bergmann, Nayna Jain,
	Eric Richter, Claudio Carvalho, Nicholas Piggin, Hari Bathini,
	Masahiro Yamada, Thiago Jung Bauermann,
	Sebastian Andrzej Siewior, Valentin Schneider, Jordan Niethe,
	Michael Neuling, Gustavo Luiz Duarte, Allison Randal,
	Eric W. Biederman, linux-kernel, linux-fsdevel



On 20/03/2020 at 11:20, Michal Suchanek wrote:
> There are numerous references to 32bit functions in generic and 64bit
> code so ifdef them out.
> 
> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> ---
> v2:
> - fix 32bit ifdef condition in signal.c
> - simplify the compat ifdef condition in vdso.c - 64bit is redundant
> - simplify the compat ifdef condition in callchain.c - 64bit is redundant
> v3:
> - use IS_ENABLED and maybe_unused where possible
> - do not ifdef declarations
> - clean up Makefile
> v4:
> - further makefile cleanup
> - simplify is_32bit_task conditions
> - avoid ifdef in condition by using return
> v5:
> - avoid unreachable code on 32bit
> - make is_current_64bit constant on !COMPAT
> - add stub perf_callchain_user_32 to avoid some ifdefs
> v6:
> - consolidate current_is_64bit
> v7:
> - remove leftover perf_callchain_user_32 stub from previous series version
> v8:
> - fix build again - too trigger-happy with stub removal
> - remove a vdso.c hunk that causes warning according to kbuild test robot
> v9:
> - removed current_is_64bit in previous patch
> v10:
> - rebase on top of 70ed86f4de5bd
> ---
>   arch/powerpc/include/asm/thread_info.h | 4 ++--
>   arch/powerpc/kernel/Makefile           | 6 +++---
>   arch/powerpc/kernel/entry_64.S         | 2 ++
>   arch/powerpc/kernel/signal.c           | 3 +--
>   arch/powerpc/kernel/syscall_64.c       | 6 ++----
>   arch/powerpc/kernel/vdso.c             | 3 ++-
>   arch/powerpc/perf/callchain.c          | 8 +++++++-
>   7 files changed, 19 insertions(+), 13 deletions(-)
> 

[...]

> diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
> index 87d95b455b83..2dcbfe38f5ac 100644
> --- a/arch/powerpc/kernel/syscall_64.c
> +++ b/arch/powerpc/kernel/syscall_64.c
> @@ -24,7 +24,6 @@ notrace long system_call_exception(long r3, long r4, long r5,
>   				   long r6, long r7, long r8,
>   				   unsigned long r0, struct pt_regs *regs)
>   {
> -	unsigned long ti_flags;
>   	syscall_fn f;
>   
>   	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
> @@ -68,8 +67,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
>   
>   	local_irq_enable();
>   
> -	ti_flags = current_thread_info()->flags;
> -	if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
> +	if (unlikely(current_thread_info()->flags & _TIF_SYSCALL_DOTRACE)) {
>   		/*
>   		 * We use the return value of do_syscall_trace_enter() as the
>   		 * syscall number. If the syscall was rejected for any reason
> @@ -94,7 +92,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
>   	/* May be faster to do array_index_nospec? */
>   	barrier_nospec();
>   
> -	if (unlikely(ti_flags & _TIF_32BIT)) {
> +	if (unlikely(is_32bit_task())) {

is_compat() should be used here instead, because we don't want to use
compat_sys_call_table() on PPC32.
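
I.e. something like (untested; is_compat_task() is presumably the helper
meant here, as clarified later in the thread):

-	if (unlikely(is_32bit_task())) {
+	if (unlikely(is_compat_task())) {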

>   		f = (void *)compat_sys_call_table[r0];
>   
>   		r3 &= 0x00000000ffffffffULL;

Christophe

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v12 5/8] powerpc/64: make buildable without CONFIG_COMPAT
  2020-04-07  5:50       ` Christophe Leroy
@ 2020-04-07  9:57         ` Michal Suchánek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchánek @ 2020-04-07  9:57 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: linuxppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Alexander Viro, Mauro Carvalho Chehab,
	David S. Miller, Rob Herring, Greg Kroah-Hartman,
	Jonathan Cameron, Andy Shevchenko, Thomas Gleixner,
	Arnd Bergmann, Nayna Jain, Eric Richter, Claudio Carvalho,
	Nicholas Piggin, Hari Bathini, Masahiro Yamada,
	Thiago Jung Bauermann, Sebastian Andrzej Siewior,
	Valentin Schneider, Jordan Niethe, Michael Neuling,
	Gustavo Luiz Duarte, Allison Randal, Eric W. Biederman,
	linux-kernel, linux-fsdevel

On Tue, Apr 07, 2020 at 07:50:30AM +0200, Christophe Leroy wrote:
> 
> 
> > On 20/03/2020 at 11:20, Michal Suchanek wrote:
> > There are numerous references to 32bit functions in generic and 64bit
> > code so ifdef them out.
> > 
> > Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> > ---
> > v2:
> > - fix 32bit ifdef condition in signal.c
> > - simplify the compat ifdef condition in vdso.c - 64bit is redundant
> > - simplify the compat ifdef condition in callchain.c - 64bit is redundant
> > v3:
> > - use IS_ENABLED and maybe_unused where possible
> > - do not ifdef declarations
> > - clean up Makefile
> > v4:
> > - further makefile cleanup
> > - simplify is_32bit_task conditions
> > - avoid ifdef in condition by using return
> > v5:
> > - avoid unreachable code on 32bit
> > - make is_current_64bit constant on !COMPAT
> > - add stub perf_callchain_user_32 to avoid some ifdefs
> > v6:
> > - consolidate current_is_64bit
> > v7:
> > - remove leftover perf_callchain_user_32 stub from previous series version
> > v8:
> > - fix build again - too trigger-happy with stub removal
> > - remove a vdso.c hunk that causes warning according to kbuild test robot
> > v9:
> > - removed current_is_64bit in previous patch
> > v10:
> > - rebase on top of 70ed86f4de5bd
> > ---
> >   arch/powerpc/include/asm/thread_info.h | 4 ++--
> >   arch/powerpc/kernel/Makefile           | 6 +++---
> >   arch/powerpc/kernel/entry_64.S         | 2 ++
> >   arch/powerpc/kernel/signal.c           | 3 +--
> >   arch/powerpc/kernel/syscall_64.c       | 6 ++----
> >   arch/powerpc/kernel/vdso.c             | 3 ++-
> >   arch/powerpc/perf/callchain.c          | 8 +++++++-
> >   7 files changed, 19 insertions(+), 13 deletions(-)
> > 
> 
> [...]
> 
> > diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
> > index 87d95b455b83..2dcbfe38f5ac 100644
> > --- a/arch/powerpc/kernel/syscall_64.c
> > +++ b/arch/powerpc/kernel/syscall_64.c
> > @@ -24,7 +24,6 @@ notrace long system_call_exception(long r3, long r4, long r5,
> >   				   long r6, long r7, long r8,
> >   				   unsigned long r0, struct pt_regs *regs)
> >   {
> > -	unsigned long ti_flags;
> >   	syscall_fn f;
> >   	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
> > @@ -68,8 +67,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
> >   	local_irq_enable();
> > -	ti_flags = current_thread_info()->flags;
> > -	if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
> > +	if (unlikely(current_thread_info()->flags & _TIF_SYSCALL_DOTRACE)) {
> >   		/*
> >   		 * We use the return value of do_syscall_trace_enter() as the
> >   		 * syscall number. If the syscall was rejected for any reason
> > @@ -94,7 +92,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
> >   	/* May be faster to do array_index_nospec? */
> >   	barrier_nospec();
> > -	if (unlikely(ti_flags & _TIF_32BIT)) {
> > +	if (unlikely(is_32bit_task())) {
> 
> is_compat() should be used here instead, because we don't want to use
is_compat_task()
> compat_sys_call_table() on PPC32.
> 
> >   		f = (void *)compat_sys_call_table[r0];
> >   		r3 &= 0x00000000ffffffffULL;
> 
That only applies once you use this for 32bit as well. Right now it's
64bit only, so the two are the same.

Thanks

Michal

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH] powerpcs: perf: consolidate perf_callchain_user_64 and perf_callchain_user_32
  2020-04-07  5:21               ` Christophe Leroy
@ 2020-04-09 11:22                 ` Michal Suchánek
  -1 siblings, 0 replies; 161+ messages in thread
From: Michal Suchánek @ 2020-04-09 11:22 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: linuxppc-dev, Nicholas Piggin, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linux-kernel

On Tue, Apr 07, 2020 at 07:21:06AM +0200, Christophe Leroy wrote:
> 
> 
> > On 06/04/2020 at 23:00, Michal Suchanek wrote:
> > perf_callchain_user_64 and perf_callchain_user_32 are nearly identical.
> > Consolidate into one function with thin wrappers.
> > 
> > Suggested-by: Nicholas Piggin <npiggin@gmail.com>
> > Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> > ---
> >   arch/powerpc/perf/callchain.h    | 24 +++++++++++++++++++++++-
> >   arch/powerpc/perf/callchain_32.c | 21 ++-------------------
> >   arch/powerpc/perf/callchain_64.c | 14 ++++----------
> >   3 files changed, 29 insertions(+), 30 deletions(-)
> > 
> > diff --git a/arch/powerpc/perf/callchain.h b/arch/powerpc/perf/callchain.h
> > index 7a2cb9e1181a..7540bb71cb60 100644
> > --- a/arch/powerpc/perf/callchain.h
> > +++ b/arch/powerpc/perf/callchain.h
> > @@ -2,7 +2,7 @@
> >   #ifndef _POWERPC_PERF_CALLCHAIN_H
> >   #define _POWERPC_PERF_CALLCHAIN_H
> > -int read_user_stack_slow(void __user *ptr, void *buf, int nb);
> > +int read_user_stack_slow(const void __user *ptr, void *buf, int nb);
> 
> Does the constification of ptr have to be in this patch?
It was in the original patch. The code is touched anyway.
> Wouldn't it be better to have it as a separate patch?
Don't care much either way. Can resend it as separate patches.
> 
> >   void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> >   			    struct pt_regs *regs);
> >   void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
> > @@ -16,4 +16,26 @@ static inline bool invalid_user_sp(unsigned long sp)
> >   	return (!sp || (sp & mask) || (sp > top));
> >   }
> > +/*
> > + * On 32-bit we just access the address and let hash_page create a
> > + * HPTE if necessary, so there is no need to fall back to reading
> > + * the page tables.  Since this is called at interrupt level,
> > + * do_page_fault() won't treat a DSI as a page fault.
> > + */
> > +static inline int __read_user_stack(const void __user *ptr, void *ret,
> > +				    size_t size)
> > +{
> > +	int rc;
> > +
> > +	if ((unsigned long)ptr > TASK_SIZE - size ||
> > +			((unsigned long)ptr & (size - 1)))
> > +		return -EFAULT;
> > +	rc = probe_user_read(ret, ptr, size);
> > +
> > +	if (rc && IS_ENABLED(CONFIG_PPC64))
> 
> gcc is probably smart enough to deal with it efficiently, but it would
> be more correct to test rc after checking CONFIG_PPC64.
IS_ENABLED(CONFIG_PPC64) is constant, so that part of the check should be
compiled out in any case.
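
E.g. on PPC32 the preprocessor turns IS_ENABLED(CONFIG_PPC64) into a
literal 0, so the compiler effectively sees (sketch):

	if (rc && 0)	/* always false, branch and call are dead code */
		return read_user_stack_slow(ptr, ret, size);

and eliminates it whichever way round the operands are written.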

Thanks

Michal

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v3 28/32] powerpc/64s: interrupt implement exit logic in C
  2020-02-25 17:35 ` [PATCH v3 28/32] powerpc/64s: interrupt implement exit logic " Nicholas Piggin
@ 2021-01-27  8:54   ` Christophe Leroy
  2021-01-28  0:09     ` Nicholas Piggin
  2021-02-03 16:25   ` Christophe Leroy
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 161+ messages in thread
From: Christophe Leroy @ 2021-01-27  8:54 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Michal Suchanek



On 25/02/2020 at 18:35, Nicholas Piggin wrote:
> Implement the bulk of interrupt return logic in C. The asm return code
> must handle a few cases: restoring full GPRs, and emulating stack store.
> 
> The stack store emulation is significantly simplified: rather than creating
> a new return frame and switching to that before performing the store, it
> uses the PACA to keep a scratch register around to perform the store.
> 
> The asm return code is moved into 64e for now. The new logic has made
> allowance for 64e, but I don't have a full environment that works well
> to test it, and even booting in emulated qemu is not great for stress
> testing. 64e shouldn't be too far off working with this, given a bit
> more testing and auditing of the logic.
> 
> This is slightly faster on a POWER9 (page fault speed increases about
> 1.1%), probably due to reduced mtmsrd.
> 

How do you measure 'page fault' speed?


^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v3 28/32] powerpc/64s: interrupt implement exit logic in C
  2021-01-27  8:54   ` Christophe Leroy
@ 2021-01-28  0:09     ` Nicholas Piggin
  0 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2021-01-28  0:09 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev; +Cc: Michal Suchanek

Excerpts from Christophe Leroy's message of January 27, 2021 6:54 pm:
> 
> 
> Le 25/02/2020 à 18:35, Nicholas Piggin a écrit :
>> Implement the bulk of interrupt return logic in C. The asm return code
>> must handle a few cases: restoring full GPRs, and emulating stack store.
>> 
>> The stack store emulation is significantly simplified: rather than creating
>> a new return frame and switching to that before performing the store, it
>> uses the PACA to keep a scratch register around to perform the store.
>> 
>> The asm return code is moved into 64e for now. The new logic has made
>> allowance for 64e, but I don't have a full environment that works well
>> to test it, and even booting in emulated qemu is not great for stress
>> testing. 64e shouldn't be too far off working with this, given a bit
>> more testing and auditing of the logic.
>> 
>> This is slightly faster on a POWER9 (page fault speed increases about
>> 1.1%), probably due to reduced mtmsrd.
>> 
> 
> How do you measure 'page fault' speed?

mmap 1000 pages, store to each one, mprotect(PROT_READ) then 
mprotect(PROT_READ|PROT_WRITE), then store a byte to each page
and measure the cost. Something like that IIRC.
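
For illustration, a minimal userspace sketch of that kind of loop (page
count, page size and timing details here are assumptions, not the actual
harness):

	#include <stdio.h>
	#include <sys/mman.h>
	#include <time.h>

	#define NPAGES	1000
	#define PGSIZE	4096	/* use sysconf(_SC_PAGESIZE) in real code */

	int main(void)
	{
		char *p = mmap(NULL, NPAGES * PGSIZE, PROT_READ|PROT_WRITE,
			       MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
		struct timespec t0, t1;
		long ns;
		int i;

		if (p == MAP_FAILED)
			return 1;

		for (i = 0; i < NPAGES; i++)	/* populate the pages */
			p[i * PGSIZE] = 1;

		/*
		 * Write-protect, then allow writes again; the PTEs are
		 * typically left read-only, so each store below takes a
		 * minor page fault.
		 */
		mprotect(p, NPAGES * PGSIZE, PROT_READ);
		mprotect(p, NPAGES * PGSIZE, PROT_READ|PROT_WRITE);

		clock_gettime(CLOCK_MONOTONIC, &t0);
		for (i = 0; i < NPAGES; i++)	/* one write fault per page */
			p[i * PGSIZE] = 2;
		clock_gettime(CLOCK_MONOTONIC, &t1);

		ns = (t1.tv_sec - t0.tv_sec) * 1000000000L +
		     (t1.tv_nsec - t0.tv_nsec);
		printf("%ld ns per fault\n", ns / NPAGES);
		return 0;
	}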

Thanks,
Nick

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v3 28/32] powerpc/64s: interrupt implement exit logic in C
  2020-02-25 17:35 ` [PATCH v3 28/32] powerpc/64s: interrupt implement exit logic " Nicholas Piggin
  2021-01-27  8:54   ` Christophe Leroy
@ 2021-02-03 16:25   ` Christophe Leroy
  2021-02-04  3:27     ` Nicholas Piggin
  2021-02-27 10:07   ` Christophe Leroy
  2021-03-15 13:41   ` Christophe Leroy
  3 siblings, 1 reply; 161+ messages in thread
From: Christophe Leroy @ 2021-02-03 16:25 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Michal Suchanek



On 25/02/2020 at 18:35, Nicholas Piggin wrote:
> Implement the bulk of interrupt return logic in C. The asm return code
> must handle a few cases: restoring full GPRs, and emulating stack store.
> 


> +notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr)
> +{
> +	unsigned long *ti_flagsp = &current_thread_info()->flags;
> +	unsigned long flags;
> +
> +	if (IS_ENABLED(CONFIG_PPC_BOOK3S) && unlikely(!(regs->msr & MSR_RI)))
> +		unrecoverable_exception(regs);
> +	BUG_ON(regs->msr & MSR_PR);
> +	BUG_ON(!FULL_REGS(regs));
> +
> +	local_irq_save(flags);
> +
> +	if (regs->softe == IRQS_ENABLED) {
> +		/* Returning to a kernel context with local irqs enabled. */
> +		WARN_ON_ONCE(!(regs->msr & MSR_EE));
> +again:
> +		if (IS_ENABLED(CONFIG_PREEMPT)) {
> +			/* Return to preemptible kernel context */
> +			if (unlikely(*ti_flagsp & _TIF_NEED_RESCHED)) {
> +				if (preempt_count() == 0)
> +					preempt_schedule_irq();
> +			}
> +		}
> +
> +		trace_hardirqs_on();
> +		__hard_EE_RI_disable();
> +		if (unlikely(lazy_irq_pending())) {
> +			__hard_RI_enable();
> +			irq_soft_mask_set(IRQS_ALL_DISABLED);
> +			trace_hardirqs_off();
> +			local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
> +			/*
> +			 * Can't local_irq_enable in case we are in interrupt
> +			 * context. Must replay directly.
> +			 */
> +			replay_soft_interrupts();
> +			irq_soft_mask_set(flags);
> +			/* Took an interrupt, may have more exit work to do. */
> +			goto again;
> +		}
> +		local_paca->irq_happened = 0;
> +		irq_soft_mask_set(IRQS_ENABLED);
> +	} else {
> +		/* Returning to a kernel context with local irqs disabled. */
> +		trace_hardirqs_on();
> +		__hard_EE_RI_disable();
> +		if (regs->msr & MSR_EE)
> +			local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
> +	}
> +
> +
> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> +	local_paca->tm_scratch = regs->msr;
> +#endif
> +
> +	/*
> +	 * We don't need to restore AMR on the way back to userspace for KUAP.
> +	 * The value of AMR only matters while we're in the kernel.
> +	 */
> +	kuap_restore_amr(regs);

Is that correct to restore KUAP state here ? Shouldn't we have it at lower level in assembly ?

Isn't there a risk that someone manages to call interrupt_exit_kernel_prepare() or the end of it in 
a way or another, and get the previous KUAP state restored by this way ?

Also, it looks a bit strange to have kuap_save_amr_and_lock() done at lowest level in assembly, and 
kuap_restore_amr() done in upper level. That looks unbalanced.

Christophe


> +
> +	if (unlikely(*ti_flagsp & _TIF_EMULATE_STACK_STORE)) {
> +		clear_bits(_TIF_EMULATE_STACK_STORE, ti_flagsp);
> +		return 1;
> +	}
> +	return 0;
> +}
> +#endif
> diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
> index 25c14a0981bf..d20c5e79e03c 100644
> --- a/arch/powerpc/kernel/vector.S
> +++ b/arch/powerpc/kernel/vector.S
> @@ -134,7 +134,7 @@ _GLOBAL(load_up_vsx)
>   	/* enable use of VSX after return */
>   	oris	r12,r12,MSR_VSX@h
>   	std	r12,_MSR(r1)
> -	b	fast_exception_return
> +	b	fast_interrupt_return
>   
>   #endif /* CONFIG_VSX */
>   
> 

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v3 28/32] powerpc/64s: interrupt implement exit logic in C
  2021-02-03 16:25   ` Christophe Leroy
@ 2021-02-04  3:27     ` Nicholas Piggin
  2021-02-04  8:03       ` Christophe Leroy
  0 siblings, 1 reply; 161+ messages in thread
From: Nicholas Piggin @ 2021-02-04  3:27 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev; +Cc: Michal Suchanek

Excerpts from Christophe Leroy's message of February 4, 2021 2:25 am:
> 
> 
> Le 25/02/2020 à 18:35, Nicholas Piggin a écrit :
>> Implement the bulk of interrupt return logic in C. The asm return code
>> must handle a few cases: restoring full GPRs, and emulating stack store.
>> 
> 
> 
>> +notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr)
>> +{
>> +	unsigned long *ti_flagsp = &current_thread_info()->flags;
>> +	unsigned long flags;
>> +
>> +	if (IS_ENABLED(CONFIG_PPC_BOOK3S) && unlikely(!(regs->msr & MSR_RI)))
>> +		unrecoverable_exception(regs);
>> +	BUG_ON(regs->msr & MSR_PR);
>> +	BUG_ON(!FULL_REGS(regs));
>> +
>> +	local_irq_save(flags);
>> +
>> +	if (regs->softe == IRQS_ENABLED) {
>> +		/* Returning to a kernel context with local irqs enabled. */
>> +		WARN_ON_ONCE(!(regs->msr & MSR_EE));
>> +again:
>> +		if (IS_ENABLED(CONFIG_PREEMPT)) {
>> +			/* Return to preemptible kernel context */
>> +			if (unlikely(*ti_flagsp & _TIF_NEED_RESCHED)) {
>> +				if (preempt_count() == 0)
>> +					preempt_schedule_irq();
>> +			}
>> +		}
>> +
>> +		trace_hardirqs_on();
>> +		__hard_EE_RI_disable();
>> +		if (unlikely(lazy_irq_pending())) {
>> +			__hard_RI_enable();
>> +			irq_soft_mask_set(IRQS_ALL_DISABLED);
>> +			trace_hardirqs_off();
>> +			local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
>> +			/*
>> +			 * Can't local_irq_enable in case we are in interrupt
>> +			 * context. Must replay directly.
>> +			 */
>> +			replay_soft_interrupts();
>> +			irq_soft_mask_set(flags);
>> +			/* Took an interrupt, may have more exit work to do. */
>> +			goto again;
>> +		}
>> +		local_paca->irq_happened = 0;
>> +		irq_soft_mask_set(IRQS_ENABLED);
>> +	} else {
>> +		/* Returning to a kernel context with local irqs disabled. */
>> +		trace_hardirqs_on();
>> +		__hard_EE_RI_disable();
>> +		if (regs->msr & MSR_EE)
>> +			local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
>> +	}
>> +
>> +
>> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>> +	local_paca->tm_scratch = regs->msr;
>> +#endif
>> +
>> +	/*
>> +	 * We don't need to restore AMR on the way back to userspace for KUAP.
>> +	 * The value of AMR only matters while we're in the kernel.
>> +	 */
>> +	kuap_restore_amr(regs);
> 
> Is that correct to restore KUAP state here ? Shouldn't we have it at lower level in assembly ?
> 
> Isn't there a risk that someone manages to call interrupt_exit_kernel_prepare() or the end of it in 
> a way or another, and get the previous KUAP state restored by this way ?

I'm not sure if there is much more risk if it's here rather than the 
instruction being in another place in the code.

There's a lot of user access around the kernel too; if you want to find a 
gadget to unlock KUAP then I suppose there is a pretty large attack
surface.

> Also, it looks a bit strange to have kuap_save_amr_and_lock() done at lowest level in assembly, and 
> kuap_restore_amr() done in upper level. That looks unbalanced.

I'd like to bring the entry assembly into C.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v3 28/32] powerpc/64s: interrupt implement exit logic in C
  2021-02-04  3:27     ` Nicholas Piggin
@ 2021-02-04  8:03       ` Christophe Leroy
  2021-02-04  8:53         ` Nicholas Piggin
  0 siblings, 1 reply; 161+ messages in thread
From: Christophe Leroy @ 2021-02-04  8:03 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Michal Suchanek



Le 04/02/2021 à 04:27, Nicholas Piggin a écrit :
> Excerpts from Christophe Leroy's message of February 4, 2021 2:25 am:
>>
>>
>> Le 25/02/2020 à 18:35, Nicholas Piggin a écrit :
>>> Implement the bulk of interrupt return logic in C. The asm return code
>>> must handle a few cases: restoring full GPRs, and emulating stack store.
>>>
>>
>>
>>> +notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr)
>>> +{
>>> +	unsigned long *ti_flagsp = &current_thread_info()->flags;
>>> +	unsigned long flags;
>>> +
>>> +	if (IS_ENABLED(CONFIG_PPC_BOOK3S) && unlikely(!(regs->msr & MSR_RI)))
>>> +		unrecoverable_exception(regs);
>>> +	BUG_ON(regs->msr & MSR_PR);
>>> +	BUG_ON(!FULL_REGS(regs));
>>> +
>>> +	local_irq_save(flags);
>>> +
>>> +	if (regs->softe == IRQS_ENABLED) {
>>> +		/* Returning to a kernel context with local irqs enabled. */
>>> +		WARN_ON_ONCE(!(regs->msr & MSR_EE));
>>> +again:
>>> +		if (IS_ENABLED(CONFIG_PREEMPT)) {
>>> +			/* Return to preemptible kernel context */
>>> +			if (unlikely(*ti_flagsp & _TIF_NEED_RESCHED)) {
>>> +				if (preempt_count() == 0)
>>> +					preempt_schedule_irq();
>>> +			}
>>> +		}
>>> +
>>> +		trace_hardirqs_on();
>>> +		__hard_EE_RI_disable();
>>> +		if (unlikely(lazy_irq_pending())) {
>>> +			__hard_RI_enable();
>>> +			irq_soft_mask_set(IRQS_ALL_DISABLED);
>>> +			trace_hardirqs_off();
>>> +			local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
>>> +			/*
>>> +			 * Can't local_irq_enable in case we are in interrupt
>>> +			 * context. Must replay directly.
>>> +			 */
>>> +			replay_soft_interrupts();
>>> +			irq_soft_mask_set(flags);
>>> +			/* Took an interrupt, may have more exit work to do. */
>>> +			goto again;
>>> +		}
>>> +		local_paca->irq_happened = 0;
>>> +		irq_soft_mask_set(IRQS_ENABLED);
>>> +	} else {
>>> +		/* Returning to a kernel context with local irqs disabled. */
>>> +		trace_hardirqs_on();
>>> +		__hard_EE_RI_disable();
>>> +		if (regs->msr & MSR_EE)
>>> +			local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
>>> +	}
>>> +
>>> +
>>> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>>> +	local_paca->tm_scratch = regs->msr;
>>> +#endif
>>> +
>>> +	/*
>>> +	 * We don't need to restore AMR on the way back to userspace for KUAP.
>>> +	 * The value of AMR only matters while we're in the kernel.
>>> +	 */
>>> +	kuap_restore_amr(regs);
>>
>> Is that correct to restore KUAP state here ? Shouldn't we have it at lower level in assembly ?
>>
>> Isn't there a risk that someone manages to call interrupt_exit_kernel_prepare() or the end of it in
>> a way or another, and get the previous KUAP state restored by this way ?
> 
> I'm not sure if there much more risk if it's here rather than the
> instruction being in another place in the code.
> 
> There's a lot of user access around the kernel too if you want to find a
> gadget to unlock KUAP then I suppose there is a pretty large attack
> surface.

My understanding is that user access scope is strictly limited, for instance we enforce the 
begin/end of user access to be in the same function, and we refrain from calling any other function 
inside the user access window. x86 even has 'objtool' to enforce it at build time. So in theory 
there is no way to get out of the function while user access is open.
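
i.e. the usual shape is something like this (a sketch with the generic
helpers; uptr/val/Efault are just illustrative names):

	if (!user_access_begin(uptr, sizeof(*uptr)))
		return -EFAULT;
	/* window open: straight-line code only, no function calls */
	unsafe_put_user(val, uptr, Efault);
	user_access_end();
	return 0;
Efault:
	user_access_end();
	return -EFAULT;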

Here with the interrupt exit function it is free beer. You have a place where you re-open user 
access and return with a simple blr. So that's open bar. If someone manages to just call the 
interrupt exit function, then user access remains open.

> 
>> Also, it looks a bit strange to have kuap_save_amr_and_lock() done at lowest level in assembly, and
>> kuap_restore_amr() done in upper level. That looks unbalanced.
> 
> I'd like to bring the entry assembly into C.
> 

I really think it's not a good idea. We'll get better control and readability by keeping it at the 
lowest possible level in assembly.

x86 only saves and restores SMAP state in assembly.

Christophe

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v3 28/32] powerpc/64s: interrupt implement exit logic in C
  2021-02-04  8:03       ` Christophe Leroy
@ 2021-02-04  8:53         ` Nicholas Piggin
  2021-02-05  0:22           ` Michael Ellerman
  0 siblings, 1 reply; 161+ messages in thread
From: Nicholas Piggin @ 2021-02-04  8:53 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev; +Cc: Michal Suchanek

Excerpts from Christophe Leroy's message of February 4, 2021 6:03 pm:
> 
> 
> Le 04/02/2021 à 04:27, Nicholas Piggin a écrit :
>> Excerpts from Christophe Leroy's message of February 4, 2021 2:25 am:
>>>
>>>
>>> Le 25/02/2020 à 18:35, Nicholas Piggin a écrit :
>>>> Implement the bulk of interrupt return logic in C. The asm return code
>>>> must handle a few cases: restoring full GPRs, and emulating stack store.
>>>>
>>>
>>>
>>>> +notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr)
>>>> +{
>>>> +	unsigned long *ti_flagsp = &current_thread_info()->flags;
>>>> +	unsigned long flags;
>>>> +
>>>> +	if (IS_ENABLED(CONFIG_PPC_BOOK3S) && unlikely(!(regs->msr & MSR_RI)))
>>>> +		unrecoverable_exception(regs);
>>>> +	BUG_ON(regs->msr & MSR_PR);
>>>> +	BUG_ON(!FULL_REGS(regs));
>>>> +
>>>> +	local_irq_save(flags);
>>>> +
>>>> +	if (regs->softe == IRQS_ENABLED) {
>>>> +		/* Returning to a kernel context with local irqs enabled. */
>>>> +		WARN_ON_ONCE(!(regs->msr & MSR_EE));
>>>> +again:
>>>> +		if (IS_ENABLED(CONFIG_PREEMPT)) {
>>>> +			/* Return to preemptible kernel context */
>>>> +			if (unlikely(*ti_flagsp & _TIF_NEED_RESCHED)) {
>>>> +				if (preempt_count() == 0)
>>>> +					preempt_schedule_irq();
>>>> +			}
>>>> +		}
>>>> +
>>>> +		trace_hardirqs_on();
>>>> +		__hard_EE_RI_disable();
>>>> +		if (unlikely(lazy_irq_pending())) {
>>>> +			__hard_RI_enable();
>>>> +			irq_soft_mask_set(IRQS_ALL_DISABLED);
>>>> +			trace_hardirqs_off();
>>>> +			local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
>>>> +			/*
>>>> +			 * Can't local_irq_enable in case we are in interrupt
>>>> +			 * context. Must replay directly.
>>>> +			 */
>>>> +			replay_soft_interrupts();
>>>> +			irq_soft_mask_set(flags);
>>>> +			/* Took an interrupt, may have more exit work to do. */
>>>> +			goto again;
>>>> +		}
>>>> +		local_paca->irq_happened = 0;
>>>> +		irq_soft_mask_set(IRQS_ENABLED);
>>>> +	} else {
>>>> +		/* Returning to a kernel context with local irqs disabled. */
>>>> +		trace_hardirqs_on();
>>>> +		__hard_EE_RI_disable();
>>>> +		if (regs->msr & MSR_EE)
>>>> +			local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
>>>> +	}
>>>> +
>>>> +
>>>> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>>>> +	local_paca->tm_scratch = regs->msr;
>>>> +#endif
>>>> +
>>>> +	/*
>>>> +	 * We don't need to restore AMR on the way back to userspace for KUAP.
>>>> +	 * The value of AMR only matters while we're in the kernel.
>>>> +	 */
>>>> +	kuap_restore_amr(regs);
>>>
>>> Is that correct to restore KUAP state here ? Shouldn't we have it at lower level in assembly ?
>>>
>>> Isn't there a risk that someone manages to call interrupt_exit_kernel_prepare() or the end of it in
>>> a way or another, and get the previous KUAP state restored by this way ?
>> 
>> I'm not sure if there is much more risk if it's here rather than the
>> instruction being in another place in the code.
>> 
>> There's a lot of user access around the kernel too; if you want to find a
>> gadget to unlock KUAP then I suppose there is a pretty large attack
>> surface.
> 
> My understanding is that user access scope is strictly limited, for instance we enforce the 
> begin/end of user access to be in the same function, and we refrain from calling any other function 
> inside the user access window. x86 even has 'objtool' to enforce it at build time. So in theory 
> there is no way to get out of the function while user access is open.
> 
> Here with the interrupt exit function it is free beer. You have a place where you re-open user 
> access and return with a simple blr. So that's open bar. If someone manages to just call the 
> interrupt exit function, then user access remains open.

Hmm okay maybe that's a good point.

>>> Also, it looks a bit strange to have kuap_save_amr_and_lock() done at lowest level in assembly, and
>>> kuap_restore_amr() done in upper level. That looks unbalanced.
>> 
>> I'd like to bring the entry assembly into C.
>> 
> 
> I really think it's not a good idea. We'll get better control and readability by keeping it at the 
> lowest possible level in assembly.
> 
> x86 only saves and restores SMAP state in assembly.

Okay we could go the other way and move the unlock into asm then.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v3 28/32] powerpc/64s: interrupt implement exit logic in C
  2021-02-04  8:53         ` Nicholas Piggin
@ 2021-02-05  0:22           ` Michael Ellerman
  2021-02-05  2:16             ` Nicholas Piggin
  0 siblings, 1 reply; 161+ messages in thread
From: Michael Ellerman @ 2021-02-05  0:22 UTC (permalink / raw)
  To: Nicholas Piggin, Christophe Leroy, linuxppc-dev; +Cc: Michal Suchanek

Nicholas Piggin <npiggin@gmail.com> writes:
> Excerpts from Christophe Leroy's message of February 4, 2021 6:03 pm:
>> Le 04/02/2021 à 04:27, Nicholas Piggin a écrit :
>>> Excerpts from Christophe Leroy's message of February 4, 2021 2:25 am:
>>>> Le 25/02/2020 à 18:35, Nicholas Piggin a écrit :
...
>>>>> +
>>>>> +	/*
>>>>> +	 * We don't need to restore AMR on the way back to userspace for KUAP.
>>>>> +	 * The value of AMR only matters while we're in the kernel.
>>>>> +	 */
>>>>> +	kuap_restore_amr(regs);
>>>>
>>>> Is that correct to restore KUAP state here ? Shouldn't we have it at lower level in assembly ?
>>>>
>>>> Isn't there a risk that someone manages to call interrupt_exit_kernel_prepare() or the end of it in
>>>> a way or another, and get the previous KUAP state restored by this way ?
>>> 
>>> I'm not sure if there much more risk if it's here rather than the
>>> instruction being in another place in the code.
>>> 
>>> There's a lot of user access around the kernel too if you want to find a
>>> gadget to unlock KUAP then I suppose there is a pretty large attack
>>> surface.
>> 
>> My understanding is that user access scope is strictly limited, for instance we enforce the 
>> begin/end of user access to be in the same function, and we refrain from calling any other function 
>> inside the user access window. x86 even has 'objtool' to enforce it at build time. So in theory 
>> there is no way to get out of the function while user access is open.
>> 
>> Here with the interrupt exit function it is free beer. You have a place where you re-open user 
>> access and return with a simple blr. So that's open bar. If someone manages to just call the 
>> interrupt exit function, then user access remains open.
>
> Hmm okay maybe that's a good point.

I don't think it's a very attractive gadget: it's not just a plain blr,
it does a full stack frame tear down before the return. And there are no
LR reloads anywhere very close.

Obviously it depends on what the compiler decides to do, it's possible
it could be a usable gadget. But there are other places that are more
attractive I think, eg:

c00000000061d768:	a6 03 3d 7d 	mtspr   29,r9
c00000000061d76c:	2c 01 00 4c 	isync
c00000000061d770:	00 00 00 60 	nop
c00000000061d774:	30 00 21 38 	addi    r1,r1,48
c00000000061d778:	20 00 80 4e 	blr


So I don't think we should redesign the code *purely* because we're
worried about interrupt_exit_kernel_prepare() being a useful gadget. If
we can come up with a way to restructure it that reads well and is
maintainable, and also reduces the chance of it being a good gadget then
sure.

cheers

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v3 28/32] powerpc/64s: interrupt implement exit logic in C
  2021-02-05  0:22           ` Michael Ellerman
@ 2021-02-05  2:16             ` Nicholas Piggin
  2021-02-05  6:04               ` Christophe Leroy
  0 siblings, 1 reply; 161+ messages in thread
From: Nicholas Piggin @ 2021-02-05  2:16 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev, Michael Ellerman; +Cc: Michal Suchanek

Excerpts from Michael Ellerman's message of February 5, 2021 10:22 am:
> Nicholas Piggin <npiggin@gmail.com> writes:
>> Excerpts from Christophe Leroy's message of February 4, 2021 6:03 pm:
>>> Le 04/02/2021 à 04:27, Nicholas Piggin a écrit :
>>>> Excerpts from Christophe Leroy's message of February 4, 2021 2:25 am:
>>>>> Le 25/02/2020 à 18:35, Nicholas Piggin a écrit :
> ...
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * We don't need to restore AMR on the way back to userspace for KUAP.
>>>>>> +	 * The value of AMR only matters while we're in the kernel.
>>>>>> +	 */
>>>>>> +	kuap_restore_amr(regs);
>>>>>
>>>>> Is that correct to restore KUAP state here ? Shouldn't we have it at lower level in assembly ?
>>>>>
>>>>> Isn't there a risk that someone manages to call interrupt_exit_kernel_prepare() or the end of it in
>>>>> a way or another, and get the previous KUAP state restored by this way ?
>>>> 
>>>> I'm not sure if there is much more risk if it's here rather than the
>>>> instruction being in another place in the code.
>>>> 
>>>> There's a lot of user access around the kernel too; if you want to find a
>>>> gadget to unlock KUAP then I suppose there is a pretty large attack
>>>> surface.
>>> 
>>> My understanding is that user access scope is strictly limited, for instance we enforce the 
>>> begin/end of user access to be in the same function, and we refrain from calling any other function 
>>> inside the user access window. x86 even has 'objtool' to enforce it at build time. So in theory 
>>> there is no way to get out of the function while user access is open.
>>> 
>>> Here with the interrupt exit function it is free beer. You have a place where you re-open user 
>>> access and return with a simple blr. So that's open bar. If someone manages to just call the 
>>> interrupt exit function, then user access remains open.
>>
>> Hmm okay maybe that's a good point.
> 
> I don't think it's a very attractive gadget: it's not just a plain blr,
> it does a full stack frame tear down before the return. And there are no
> LR reloads anywhere very close.
> 
> Obviously it depends on what the compiler decides to do, it's possible
> it could be a usable gadget. But there are other places that are more
> attractive I think, eg:
> 
> c00000000061d768:	a6 03 3d 7d 	mtspr   29,r9
> c00000000061d76c:	2c 01 00 4c 	isync
> c00000000061d770:	00 00 00 60 	nop
> c00000000061d774:	30 00 21 38 	addi    r1,r1,48
> c00000000061d778:	20 00 80 4e 	blr
> 
> 
> So I don't think we should redesign the code *purely* because we're
> worried about interrupt_exit_kernel_prepare() being a useful gadget. If
> we can come up with a way to restructure it that reads well and is
> maintainable, and also reduces the chance of it being a good gadget then
> sure.

Okay. That would be good if we can keep it in C, the pkeys + kuap combo
is fairly complicated and we might want to do something cleverer with it, 
so that would make it even more difficult in asm.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v3 28/32] powerpc/64s: interrupt implement exit logic in C
  2021-02-05  2:16             ` Nicholas Piggin
@ 2021-02-05  6:04               ` Christophe Leroy
  2021-02-06  2:28                 ` Nicholas Piggin
  0 siblings, 1 reply; 161+ messages in thread
From: Christophe Leroy @ 2021-02-05  6:04 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev, Michael Ellerman; +Cc: Michal Suchanek



Le 05/02/2021 à 03:16, Nicholas Piggin a écrit :
> Excerpts from Michael Ellerman's message of February 5, 2021 10:22 am:
>> Nicholas Piggin <npiggin@gmail.com> writes:
>>> Excerpts from Christophe Leroy's message of February 4, 2021 6:03 pm:
>>>> Le 04/02/2021 à 04:27, Nicholas Piggin a écrit :
>>>>> Excerpts from Christophe Leroy's message of February 4, 2021 2:25 am:
>>>>>> Le 25/02/2020 à 18:35, Nicholas Piggin a écrit :
>> ...
>>>>>>> +
>>>>>>> +	/*
>>>>>>> +	 * We don't need to restore AMR on the way back to userspace for KUAP.
>>>>>>> +	 * The value of AMR only matters while we're in the kernel.
>>>>>>> +	 */
>>>>>>> +	kuap_restore_amr(regs);
>>>>>>
>>>>>> Is that correct to restore KUAP state here ? Shouldn't we have it at lower level in assembly ?
>>>>>>
>>>>>> Isn't there a risk that someone manages to call interrupt_exit_kernel_prepare() or the end of it in
>>>>>> a way or another, and get the previous KUAP state restored by this way ?
>>>>>
>>>>> I'm not sure if there is much more risk if it's here rather than the
>>>>> instruction being in another place in the code.
>>>>>
>>>>> There's a lot of user access around the kernel too; if you want to find a
>>>>> gadget to unlock KUAP then I suppose there is a pretty large attack
>>>>> surface.
>>>>
>>>> My understanding is that user access scope is strictly limited, for instance we enforce the
>>>> begin/end of user access to be in the same function, and we refrain from calling any other function
>>>> inside the user access window. x86 even has 'objtool' to enforce it at build time. So in theory
>>>> there is no way to get out of the function while user access is open.
>>>>
>>>> Here with the interrupt exit function it is free beer. You have a place where you re-open user
>>>> access and return with a simple blr. So that's open bar. If someone manages to just call the
>>>> interrupt exit function, then user access remains open.
>>>
>>> Hmm okay maybe that's a good point.
>>
>> I don't think it's a very attractive gadget: it's not just a plain blr,
>> it does a full stack frame tear down before the return. And there are no
>> LR reloads anywhere very close.
>>
>> Obviously it depends on what the compiler decides to do, it's possible
>> it could be a usable gadget. But there are other places that are more
>> attractive I think, eg:
>>
>> c00000000061d768:	a6 03 3d 7d 	mtspr   29,r9
>> c00000000061d76c:	2c 01 00 4c 	isync
>> c00000000061d770:	00 00 00 60 	nop
>> c00000000061d774:	30 00 21 38 	addi    r1,r1,48
>> c00000000061d778:	20 00 80 4e 	blr
>>
>>
>> So I don't think we should redesign the code *purely* because we're
>> worried about interrupt_exit_kernel_prepare() being a useful gadget. If
>> we can come up with a way to restructure it that reads well and is
>> maintainable, and also reduces the chance of it being a good gadget then
>> sure.
> 
> Okay. That would be good if we can keep it in C, the pkeys + kuap combo
> is fairly complicated and we might want to do something cleverer with it,
> so that would make it even more difficult in asm.
> 

Ok.

For ppc32, I prefer to keep it in assembly for the time being and move everything from ASM to C at 
once after porting syscall and interrupts to C and wrappers.

Hope this is OK for you.

Christophe

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v3 28/32] powerpc/64s: interrupt implement exit logic in C
  2021-02-05  6:04               ` Christophe Leroy
@ 2021-02-06  2:28                 ` Nicholas Piggin
  0 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2021-02-06  2:28 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev, Michael Ellerman; +Cc: Michal Suchanek

Excerpts from Christophe Leroy's message of February 5, 2021 4:04 pm:
> 
> 
> Le 05/02/2021 à 03:16, Nicholas Piggin a écrit :
>> Excerpts from Michael Ellerman's message of February 5, 2021 10:22 am:
>>> Nicholas Piggin <npiggin@gmail.com> writes:
>>>> Excerpts from Christophe Leroy's message of February 4, 2021 6:03 pm:
>>>>> Le 04/02/2021 à 04:27, Nicholas Piggin a écrit :
>>>>>> Excerpts from Christophe Leroy's message of February 4, 2021 2:25 am:
>>>>>>> Le 25/02/2020 à 18:35, Nicholas Piggin a écrit :
>>> ...
>>>>>>>> +
>>>>>>>> +	/*
>>>>>>>> +	 * We don't need to restore AMR on the way back to userspace for KUAP.
>>>>>>>> +	 * The value of AMR only matters while we're in the kernel.
>>>>>>>> +	 */
>>>>>>>> +	kuap_restore_amr(regs);
>>>>>>>
>>>>>>> Is that correct to restore KUAP state here ? Shouldn't we have it at lower level in assembly ?
>>>>>>>
>>>>>>> Isn't there a risk that someone manages to call interrupt_exit_kernel_prepare() or the end of it in
>>>>>>> a way or another, and get the previous KUAP state restored by this way ?
>>>>>>
>>>>>> I'm not sure if there is much more risk if it's here rather than the
>>>>>> instruction being in another place in the code.
>>>>>>
>>>>>> There's a lot of user access around the kernel too; if you want to find a
>>>>>> gadget to unlock KUAP then I suppose there is a pretty large attack
>>>>>> surface.
>>>>>
>>>>> My understanding is that user access scope is strictly limited, for instance we enforce the
>>>>> begin/end of user access to be in the same function, and we refrain from calling any other function
>>>>> inside the user access window. x86 even has 'objtool' to enforce it at build time. So in theory
>>>>> there is no way to get out of the function while user access is open.
>>>>>
>>>>> Here with the interrupt exit function it is free beer. You have a place where you re-open user
>>>>> access and return with a simple blr. So that's open bar. If someone manages to just call the
>>>>> interrupt exit function, then user access remains open.
>>>>
>>>> Hmm okay maybe that's a good point.
>>>
>>> I don't think it's a very attractive gadget: it's not just a plain blr,
>>> it does a full stack frame tear down before the return. And there are no
>>> LR reloads anywhere very close.
>>>
>>> Obviously it depends on what the compiler decides to do, it's possible
>>> it could be a usable gadget. But there are other places that are more
>>> attractive I think, eg:
>>>
>>> c00000000061d768:	a6 03 3d 7d 	mtspr   29,r9
>>> c00000000061d76c:	2c 01 00 4c 	isync
>>> c00000000061d770:	00 00 00 60 	nop
>>> c00000000061d774:	30 00 21 38 	addi    r1,r1,48
>>> c00000000061d778:	20 00 80 4e 	blr
>>>
>>>
>>> So I don't think we should redesign the code *purely* because we're
>>> worried about interrupt_exit_kernel_prepare() being a useful gadget. If
>>> we can come up with a way to restructure it that reads well and is
>>> maintainable, and also reduces the chance of it being a good gadget then
>>> sure.
>> 
>> Okay. That would be good if we can keep it in C, the pkeys + kuap combo
>> is fairly complicated and we might want to do something cleverer with it,
>> so that would make it even more difficult in asm.
>> 
> 
> Ok.
> 
> For ppc32, I prefer to keep it in assembly for the time being and move everything from ASM to C at 
> once after porting syscall and interrupts to C and wrappers.
> 
> Hope this is OK for you.

I don't see a problem with that.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 161+ messages in thread

* Re: [PATCH v3 28/32] powerpc/64s: interrupt implement exit logic in C
  2020-02-25 17:35 ` [PATCH v3 28/32] powerpc/64s: interrupt implement exit logic " Nicholas Piggin
  2021-01-27  8:54   ` Christophe Leroy
  2021-02-03 16:25   ` Christophe Leroy
@ 2021-02-27 10:07   ` Christophe Leroy
  2021-03-01  0:47     ` Nicholas Piggin
  2021-03-15 13:41   ` Christophe Leroy
  3 siblings, 1 reply; 161+ messages in thread
From: Christophe Leroy @ 2021-02-27 10:07 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Michal Suchanek



Le 25/02/2020 à 18:35, Nicholas Piggin a écrit :
> Implement the bulk of interrupt return logic in C. The asm return code
> must handle a few cases: restoring full GPRs, and emulating stack store.
> 
> The stack store emulation is significantly simplified: rather than creating
> a new return frame and switching to that before performing the store, it
> uses the PACA to keep a scratch register around to perform the store.
> 
> The asm return code is moved into 64e for now. The new logic has made
> allowance for 64e, but I don't have a full environment that works well
> to test it, and even booting in emulated qemu is not great for stress
> testing. 64e shouldn't be too far off working with this, given a bit
> more testing and auditing of the logic.
> 
> This is slightly faster on a POWER9 (page fault speed increases about
> 1.1%), probably due to reduced mtmsrd.


This series, and especially this patch, has added an awful number of BUG_ON() traps.

We have had an issue open at https://github.com/linuxppc/issues/issues/88 since 2017 for reducing 
the number of BUG_ON()s.

And the kernel Documentation is explicit about the intention to deprecate BUG_ON(); see 
https://www.kernel.org/doc/html/latest/process/deprecated.html?highlight=bug_on :

BUG() and BUG_ON()
Use WARN() and WARN_ON() instead, and handle the “impossible” error condition as gracefully as 
possible. While the BUG()-family of APIs were originally designed to act as an “impossible 
situation” assert and to kill a kernel thread “safely”, they turn out to just be too risky. (e.g. 
“In what order do locks need to be released? Have various states been restored?”) Very commonly, 
using BUG() will destabilize a system or entirely break it, which makes it impossible to debug or 
even get viable crash reports. Linus has very strong feelings about this.

So ... can we do something cleaner with all the BUG_ON()s recently added ?
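
E.g. something of this shape instead of a bare BUG_ON() (purely
illustrative; 'cond' and the early return stand in for whatever graceful
handling each call site actually allows):

	if (WARN_ON_ONCE(cond)) {
		/* recover as gracefully as the call site allows,
		 * e.g. take a safe early exit instead of dying */
		return 0;
	}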

Christophe

> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> ---
> v2,rebase (from Michal):
> - Move the FP restore functions to restore_math. They are not used
>    anywhere else and when restore_math is not built gcc warns about them
>    being unused (ms)
> - Add asm/context_tracking.h include to exceptions-64e.S for SCHEDULE_USER
>    definition
> 
> v3:
> - Fix return from interrupt replay problem by replaying interrupts rather
>    than enabling irqs. This ends up being cleaner and __check_irq_replay
>    goes away completely for 64s. Should bring 64e up to speed and kill a lot
>    of cruft after it's proven on 64s.
> - Don't use _GLOBAL if it's not called from C
> - Simplify stack store emulation code further, add a bit more commenting.
> - Some missing no probe annotations
> 
>   .../powerpc/include/asm/book3s/64/kup-radix.h |  10 +
>   arch/powerpc/include/asm/hw_irq.h             |   1 +
>   arch/powerpc/include/asm/switch_to.h          |   6 +
>   arch/powerpc/kernel/entry_64.S                | 486 +++++-------------
>   arch/powerpc/kernel/exceptions-64e.S          | 255 ++++++++-
>   arch/powerpc/kernel/exceptions-64s.S          | 119 ++---
>   arch/powerpc/kernel/irq.c                     |  36 +-
>   arch/powerpc/kernel/process.c                 |  89 ++--
>   arch/powerpc/kernel/syscall_64.c              | 164 +++++-
>   arch/powerpc/kernel/vector.S                  |   2 +-
>   10 files changed, 642 insertions(+), 526 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h
> index 71081d90f999..3bcef989a35d 100644
> --- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
> @@ -60,6 +60,12 @@
>   #include <asm/mmu.h>
>   #include <asm/ptrace.h>
>   
> +static inline void kuap_restore_amr(struct pt_regs *regs)
> +{
> +	if (mmu_has_feature(MMU_FTR_RADIX_KUAP))
> +		mtspr(SPRN_AMR, regs->kuap);
> +}
> +
>   static inline void kuap_check_amr(void)
>   {
>   	if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && mmu_has_feature(MMU_FTR_RADIX_KUAP))
> @@ -136,6 +142,10 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
>   		    "Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read");
>   }
>   #else /* CONFIG_PPC_KUAP */
> +static inline void kuap_restore_amr(struct pt_regs *regs)
> +{
> +}
> +
>   static inline void kuap_check_amr(void)
>   {
>   }
> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> index 0e9a9598f91f..e0e71777961f 100644
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -52,6 +52,7 @@
>   #ifndef __ASSEMBLY__
>   
>   extern void replay_system_reset(void);
> +extern void replay_soft_interrupts(void);
>   
>   extern void timer_interrupt(struct pt_regs *);
>   extern void timer_broadcast_interrupt(void);
> diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
> index 476008bc3d08..b867b58b1093 100644
> --- a/arch/powerpc/include/asm/switch_to.h
> +++ b/arch/powerpc/include/asm/switch_to.h
> @@ -23,7 +23,13 @@ extern void switch_booke_debug_regs(struct debug_reg *new_debug);
>   
>   extern int emulate_altivec(struct pt_regs *);
>   
> +#ifdef CONFIG_PPC_BOOK3S_64
>   void restore_math(struct pt_regs *regs);
> +#else
> +static inline void restore_math(struct pt_regs *regs)
> +{
> +}
> +#endif
>   
>   void restore_tm_state(struct pt_regs *regs);
>   
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 0e2c56573a41..e13eac968dfc 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -16,6 +16,7 @@
>   
>   #include <linux/errno.h>
>   #include <linux/err.h>
> +#include <asm/cache.h>
>   #include <asm/unistd.h>
>   #include <asm/processor.h>
>   #include <asm/page.h>
> @@ -221,6 +222,7 @@ _GLOBAL(ret_from_kernel_thread)
>   	li	r3,0
>   	b	.Lsyscall_exit
>   
> +#ifdef CONFIG_PPC_BOOK3E
>   /* Save non-volatile GPRs, if not already saved. */
>   _GLOBAL(save_nvgprs)
>   	ld	r11,_TRAP(r1)
> @@ -231,6 +233,7 @@ _GLOBAL(save_nvgprs)
>   	std	r0,_TRAP(r1)
>   	blr
>   _ASM_NOKPROBE_SYMBOL(save_nvgprs);
> +#endif
>   
>   #ifdef CONFIG_PPC_BOOK3S_64
>   
> @@ -294,7 +297,7 @@ flush_count_cache:
>    * state of one is saved on its kernel stack.  Then the state
>    * of the other is restored from its kernel stack.  The memory
>    * management hardware is updated to the second process's state.
> - * Finally, we can return to the second process, via ret_from_except.
> + * Finally, we can return to the second process, via interrupt_return.
>    * On entry, r3 points to the THREAD for the current task, r4
>    * points to the THREAD for the new task.
>    *
> @@ -446,408 +449,151 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
>   	addi	r1,r1,SWITCH_FRAME_SIZE
>   	blr
>   
> -	.align	7
> -_GLOBAL(ret_from_except)
> -	ld	r11,_TRAP(r1)
> -	andi.	r0,r11,1
> -	bne	ret_from_except_lite
> -	REST_NVGPRS(r1)
> -
> -_GLOBAL(ret_from_except_lite)
> +#ifdef CONFIG_PPC_BOOK3S
>   	/*
> -	 * Disable interrupts so that current_thread_info()->flags
> -	 * can't change between when we test it and when we return
> -	 * from the interrupt.
> -	 */
> -#ifdef CONFIG_PPC_BOOK3E
> -	wrteei	0
> -#else
> -	li	r10,MSR_RI
> -	mtmsrd	r10,1		  /* Update machine state */
> -#endif /* CONFIG_PPC_BOOK3E */
> +	 * If MSR EE/RI was never enabled, IRQs not reconciled, NVGPRs not
> +	 * touched, AMR not set, no exit work created, then this can be used.
> +	 */
> +	.balign IFETCH_ALIGN_BYTES
> +	.globl fast_interrupt_return
> +fast_interrupt_return:
> +_ASM_NOKPROBE_SYMBOL(fast_interrupt_return)
> +	ld	r4,_MSR(r1)
> +	andi.	r0,r4,MSR_PR
> +	bne	.Lfast_user_interrupt_return
> +	andi.	r0,r4,MSR_RI
> +	bne+	.Lfast_kernel_interrupt_return
> +	addi	r3,r1,STACK_FRAME_OVERHEAD
> +	bl	unrecoverable_exception
> +	b	. /* should not get here */
>   
> -	ld	r9, PACA_THREAD_INFO(r13)
> -	ld	r3,_MSR(r1)
> -#ifdef CONFIG_PPC_BOOK3E
> -	ld	r10,PACACURRENT(r13)
> -#endif /* CONFIG_PPC_BOOK3E */
> -	ld	r4,TI_FLAGS(r9)
> -	andi.	r3,r3,MSR_PR
> -	beq	resume_kernel
> -#ifdef CONFIG_PPC_BOOK3E
> -	lwz	r3,(THREAD+THREAD_DBCR0)(r10)
> -#endif /* CONFIG_PPC_BOOK3E */
> +	.balign IFETCH_ALIGN_BYTES
> +	.globl interrupt_return
> +interrupt_return:
> +_ASM_NOKPROBE_SYMBOL(interrupt_return)
> +	REST_NVGPRS(r1)
>   
> -	/* Check current_thread_info()->flags */
> -	andi.	r0,r4,_TIF_USER_WORK_MASK
> -	bne	1f
> -#ifdef CONFIG_PPC_BOOK3E
> -	/*
> -	 * Check to see if the dbcr0 register is set up to debug.
> -	 * Use the internal debug mode bit to do this.
> -	 */
> -	andis.	r0,r3,DBCR0_IDM@h
> -	beq	restore
> -	mfmsr	r0
> -	rlwinm	r0,r0,0,~MSR_DE	/* Clear MSR.DE */
> -	mtmsr	r0
> -	mtspr	SPRN_DBCR0,r3
> -	li	r10, -1
> -	mtspr	SPRN_DBSR,r10
> -	b	restore
> -#else
> -	addi	r3,r1,STACK_FRAME_OVERHEAD
> -	bl	restore_math
> -	b	restore
> -#endif
> -1:	andi.	r0,r4,_TIF_NEED_RESCHED
> -	beq	2f
> -	bl	restore_interrupts
> -	SCHEDULE_USER
> -	b	ret_from_except_lite
> -2:
> -#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> -	andi.	r0,r4,_TIF_USER_WORK_MASK & ~_TIF_RESTORE_TM
> -	bne	3f		/* only restore TM if nothing else to do */
> +	.balign IFETCH_ALIGN_BYTES
> +	.globl interrupt_return_lite
> +interrupt_return_lite:
> +_ASM_NOKPROBE_SYMBOL(interrupt_return_lite)
> +	ld	r4,_MSR(r1)
> +	andi.	r0,r4,MSR_PR
> +	beq	.Lkernel_interrupt_return
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
> -	bl	restore_tm_state
> -	b	restore
> -3:
> -#endif
> -	bl	save_nvgprs
> -	/*
> -	 * Use a non volatile GPR to save and restore our thread_info flags
> -	 * across the call to restore_interrupts.
> -	 */
> -	mr	r30,r4
> -	bl	restore_interrupts
> -	mr	r4,r30
> -	addi	r3,r1,STACK_FRAME_OVERHEAD
> -	bl	do_notify_resume
> -	b	ret_from_except
> -
> -resume_kernel:
> -	/* check current_thread_info, _TIF_EMULATE_STACK_STORE */
> -	andis.	r8,r4,_TIF_EMULATE_STACK_STORE@h
> -	beq+	1f
> +	bl	interrupt_exit_user_prepare
> +	cmpdi	r3,0
> +	bne-	.Lrestore_nvgprs
>   
> -	addi	r8,r1,INT_FRAME_SIZE	/* Get the kprobed function entry */
> +.Lfast_user_interrupt_return:
> +	ld	r11,_NIP(r1)
> +	ld	r12,_MSR(r1)
> +BEGIN_FTR_SECTION
> +	ld	r10,_PPR(r1)
> +	mtspr	SPRN_PPR,r10
> +END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
> +	mtspr	SPRN_SRR0,r11
> +	mtspr	SPRN_SRR1,r12
>   
> -	ld	r3,GPR1(r1)
> -	subi	r3,r3,INT_FRAME_SIZE	/* dst: Allocate a trampoline exception frame */
> -	mr	r4,r1			/* src:  current exception frame */
> -	mr	r1,r3			/* Reroute the trampoline frame to r1 */
> +BEGIN_FTR_SECTION
> +	stdcx.	r0,0,r1		/* to clear the reservation */
> +FTR_SECTION_ELSE
> +	ldarx	r0,0,r1
> +ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
>   
> -	/* Copy from the original to the trampoline. */
> -	li	r5,INT_FRAME_SIZE/8	/* size: INT_FRAME_SIZE */
> -	li	r6,0			/* start offset: 0 */
> -	mtctr	r5
> -2:	ldx	r0,r6,r4
> -	stdx	r0,r6,r3
> -	addi	r6,r6,8
> -	bdnz	2b
> -
> -	/* Do real store operation to complete stdu */
> -	ld	r5,GPR1(r1)
> -	std	r8,0(r5)
> -
> -	/* Clear _TIF_EMULATE_STACK_STORE flag */
> -	lis	r11,_TIF_EMULATE_STACK_STORE@h
> -	addi	r5,r9,TI_FLAGS
> -0:	ldarx	r4,0,r5
> -	andc	r4,r4,r11
> -	stdcx.	r4,0,r5
> -	bne-	0b
> -1:
> -
> -#ifdef CONFIG_PREEMPTION
> -	/* Check if we need to preempt */
> -	andi.	r0,r4,_TIF_NEED_RESCHED
> -	beq+	restore
> -	/* Check that preempt_count() == 0 and interrupts are enabled */
> -	lwz	r8,TI_PREEMPT(r9)
> -	cmpwi	cr0,r8,0
> -	bne	restore
> -	ld	r0,SOFTE(r1)
> -	andi.	r0,r0,IRQS_DISABLED
> -	bne	restore
> +	ld	r3,_CCR(r1)
> +	ld	r4,_LINK(r1)
> +	ld	r5,_CTR(r1)
> +	ld	r6,_XER(r1)
> +	li	r0,0
>   
> -	/*
> -	 * Here we are preempting the current task. We want to make
> -	 * sure we are soft-disabled first and reconcile irq state.
> -	 */
> -	RECONCILE_IRQ_STATE(r3,r4)
> -	bl	preempt_schedule_irq
> +	REST_4GPRS(7, r1)
> +	REST_2GPRS(11, r1)
> +	REST_GPR(13, r1)
>   
> -	/*
> -	 * arch_local_irq_restore() from preempt_schedule_irq above may
> -	 * enable hard interrupt but we really should disable interrupts
> -	 * when we return from the interrupt, and so that we don't get
> -	 * interrupted after loading SRR0/1.
> -	 */
> -#ifdef CONFIG_PPC_BOOK3E
> -	wrteei	0
> -#else
> -	li	r10,MSR_RI
> -	mtmsrd	r10,1		  /* Update machine state */
> -#endif /* CONFIG_PPC_BOOK3E */
> -#endif /* CONFIG_PREEMPTION */
> +	mtcr	r3
> +	mtlr	r4
> +	mtctr	r5
> +	mtspr	SPRN_XER,r6
>   
> -	.globl	fast_exc_return_irq
> -fast_exc_return_irq:
> -restore:
> -	/*
> -	 * This is the main kernel exit path. First we check if we
> -	 * are about to re-enable interrupts
> -	 */
> -	ld	r5,SOFTE(r1)
> -	lbz	r6,PACAIRQSOFTMASK(r13)
> -	andi.	r5,r5,IRQS_DISABLED
> -	bne	.Lrestore_irq_off
> +	REST_4GPRS(2, r1)
> +	REST_GPR(6, r1)
> +	REST_GPR(0, r1)
> +	REST_GPR(1, r1)
> +	RFI_TO_USER
> +	b	.	/* prevent speculative execution */
>   
> -	/* We are enabling, were we already enabled ? Yes, just return */
> -	andi.	r6,r6,IRQS_DISABLED
> -	beq	cr0,.Ldo_restore
> +.Lrestore_nvgprs:
> +	REST_NVGPRS(r1)
> +	b	.Lfast_user_interrupt_return
>   
> -	/*
> -	 * We are about to soft-enable interrupts (we are hard disabled
> -	 * at this point). We check if there's anything that needs to
> -	 * be replayed first.
> -	 */
> -	lbz	r0,PACAIRQHAPPENED(r13)
> -	cmpwi	cr0,r0,0
> -	bne-	.Lrestore_check_irq_replay
> +	.balign IFETCH_ALIGN_BYTES
> +.Lkernel_interrupt_return:
> +	addi	r3,r1,STACK_FRAME_OVERHEAD
> +	bl	interrupt_exit_kernel_prepare
> +	cmpdi	cr1,r3,0
>   
> -	/*
> -	 * Get here when nothing happened while soft-disabled, just
> -	 * soft-enable and move-on. We will hard-enable as a side
> -	 * effect of rfi
> -	 */
> -.Lrestore_no_replay:
> -	TRACE_ENABLE_INTS
> -	li	r0,IRQS_ENABLED
> -	stb	r0,PACAIRQSOFTMASK(r13);
> +.Lfast_kernel_interrupt_return:
> +	ld	r11,_NIP(r1)
> +	ld	r12,_MSR(r1)
> +	mtspr	SPRN_SRR0,r11
> +	mtspr	SPRN_SRR1,r12
>   
> -	/*
> -	 * Final return path. BookE is handled in a different file
> -	 */
> -.Ldo_restore:
> -#ifdef CONFIG_PPC_BOOK3E
> -	b	exception_return_book3e
> -#else
> -	/*
> -	 * Clear the reservation. If we know the CPU tracks the address of
> -	 * the reservation then we can potentially save some cycles and use
> -	 * a larx. On POWER6 and POWER7 this is significantly faster.
> -	 */
>   BEGIN_FTR_SECTION
>   	stdcx.	r0,0,r1		/* to clear the reservation */
>   FTR_SECTION_ELSE
> -	ldarx	r4,0,r1
> +	ldarx	r0,0,r1
>   ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
>   
> -	/*
> -	 * Some code path such as load_up_fpu or altivec return directly
> -	 * here. They run entirely hard disabled and do not alter the
> -	 * interrupt state. They also don't use lwarx/stwcx. and thus
> -	 * are known not to leave dangling reservations.
> -	 */
> -	.globl	fast_exception_return
> -fast_exception_return:
> -	ld	r3,_MSR(r1)
> +	ld	r3,_LINK(r1)
>   	ld	r4,_CTR(r1)
> -	ld	r0,_LINK(r1)
> -	mtctr	r4
> -	mtlr	r0
> -	ld	r4,_XER(r1)
> -	mtspr	SPRN_XER,r4
> -
> -	kuap_check_amr r5, r6
> -
> -	REST_8GPRS(5, r1)
> -
> -	andi.	r0,r3,MSR_RI
> -	beq-	.Lunrecov_restore
> -
> -	/*
> -	 * Clear RI before restoring r13.  If we are returning to
> -	 * userspace and we take an exception after restoring r13,
> -	 * we end up corrupting the userspace r13 value.
> -	 */
> -	li	r4,0
> -	mtmsrd	r4,1
> -
> -#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> -	/* TM debug */
> -	std	r3, PACATMSCRATCH(r13) /* Stash returned-to MSR */
> -#endif
> -	/*
> -	 * r13 is our per cpu area, only restore it if we are returning to
> -	 * userspace the value stored in the stack frame may belong to
> -	 * another CPU.
> -	 */
> -	andi.	r0,r3,MSR_PR
> -	beq	1f
> -BEGIN_FTR_SECTION
> -	/* Restore PPR */
> -	ld	r2,_PPR(r1)
> -	mtspr	SPRN_PPR,r2
> -END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
> -	ACCOUNT_CPU_USER_EXIT(r13, r2, r4)
> -	REST_GPR(13, r1)
> -
> -	/*
> -	 * We don't need to restore AMR on the way back to userspace for KUAP.
> -	 * The value of AMR only matters while we're in the kernel.
> -	 */
> -	mtspr	SPRN_SRR1,r3
> -
> -	ld	r2,_CCR(r1)
> -	mtcrf	0xFF,r2
> -	ld	r2,_NIP(r1)
> -	mtspr	SPRN_SRR0,r2
> -
> -	ld	r0,GPR0(r1)
> -	ld	r2,GPR2(r1)
> -	ld	r3,GPR3(r1)
> -	ld	r4,GPR4(r1)
> -	ld	r1,GPR1(r1)
> -	RFI_TO_USER
> -	b	.	/* prevent speculative execution */
> +	ld	r5,_XER(r1)
> +	ld	r6,_CCR(r1)
> +	li	r0,0
>   
> -1:	mtspr	SPRN_SRR1,r3
> +	REST_4GPRS(7, r1)
> +	REST_2GPRS(11, r1)
>   
> -	ld	r2,_CCR(r1)
> -	mtcrf	0xFF,r2
> -	ld	r2,_NIP(r1)
> -	mtspr	SPRN_SRR0,r2
> +	mtlr	r3
> +	mtctr	r4
> +	mtspr	SPRN_XER,r5
>   
>   	/*
>   	 * Leaving a stale exception_marker on the stack can confuse
>   	 * the reliable stack unwinder later on. Clear it.
>   	 */
> -	li	r2,0
> -	std	r2,STACK_FRAME_OVERHEAD-16(r1)
> +	std	r0,STACK_FRAME_OVERHEAD-16(r1)
>   
> -	ld	r0,GPR0(r1)
> -	ld	r2,GPR2(r1)
> -	ld	r3,GPR3(r1)
> +	REST_4GPRS(2, r1)
>   
> -	kuap_restore_amr r4
> -
> -	ld	r4,GPR4(r1)
> -	ld	r1,GPR1(r1)
> +	bne-	cr1,1f /* emulate stack store */
> +	mtcr	r6
> +	REST_GPR(6, r1)
> +	REST_GPR(0, r1)
> +	REST_GPR(1, r1)
>   	RFI_TO_KERNEL
>   	b	.	/* prevent speculative execution */
>   
> -#endif /* CONFIG_PPC_BOOK3E */
> -
> -	/*
> -	 * We are returning to a context with interrupts soft disabled.
> -	 *
> -	 * However, we may also about to hard enable, so we need to
> -	 * make sure that in this case, we also clear PACA_IRQ_HARD_DIS
> -	 * or that bit can get out of sync and bad things will happen
> -	 */
> -.Lrestore_irq_off:
> -	ld	r3,_MSR(r1)
> -	lbz	r7,PACAIRQHAPPENED(r13)
> -	andi.	r0,r3,MSR_EE
> -	beq	1f
> -	rlwinm	r7,r7,0,~PACA_IRQ_HARD_DIS
> -	stb	r7,PACAIRQHAPPENED(r13)
> -1:
> -#if defined(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG) && defined(CONFIG_BUG)
> -	/* The interrupt should not have soft enabled. */
> -	lbz	r7,PACAIRQSOFTMASK(r13)
> -1:	tdeqi	r7,IRQS_ENABLED
> -	EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
> -#endif
> -	b	.Ldo_restore
> -
> -	/*
> -	 * Something did happen, check if a re-emit is needed
> -	 * (this also clears paca->irq_happened)
> -	 */
> -.Lrestore_check_irq_replay:
> -	/* XXX: We could implement a fast path here where we check
> -	 * for irq_happened being just 0x01, in which case we can
> -	 * clear it and return. That means that we would potentially
> -	 * miss a decrementer having wrapped all the way around.
> -	 *
> -	 * Still, this might be useful for things like hash_page
> -	 */
> -	bl	__check_irq_replay
> -	cmpwi	cr0,r3,0
> -	beq	.Lrestore_no_replay
> -
> -	/*
> -	 * We need to re-emit an interrupt. We do so by re-using our
> -	 * existing exception frame. We first change the trap value,
> -	 * but we need to ensure we preserve the low nibble of it
> -	 */
> -	ld	r4,_TRAP(r1)
> -	clrldi	r4,r4,60
> -	or	r4,r4,r3
> -	std	r4,_TRAP(r1)
> -
> -	/*
> -	 * PACA_IRQ_HARD_DIS won't always be set here, so set it now
> -	 * to reconcile the IRQ state. Tracing is already accounted for.
> -	 */
> -	lbz	r4,PACAIRQHAPPENED(r13)
> -	ori	r4,r4,PACA_IRQ_HARD_DIS
> -	stb	r4,PACAIRQHAPPENED(r13)
> -
> -	/*
> -	 * Then find the right handler and call it. Interrupts are
> -	 * still soft-disabled and we keep them that way.
> -	*/
> -	cmpwi	cr0,r3,0x500
> -	bne	1f
> -	addi	r3,r1,STACK_FRAME_OVERHEAD;
> - 	bl	do_IRQ
> -	b	ret_from_except
> -1:	cmpwi	cr0,r3,0xf00
> -	bne	1f
> -	addi	r3,r1,STACK_FRAME_OVERHEAD;
> -	bl	performance_monitor_exception
> -	b	ret_from_except
> -1:	cmpwi	cr0,r3,0xe60
> -	bne	1f
> -	addi	r3,r1,STACK_FRAME_OVERHEAD;
> -	bl	handle_hmi_exception
> -	b	ret_from_except
> -1:	cmpwi	cr0,r3,0x900
> -	bne	1f
> -	addi	r3,r1,STACK_FRAME_OVERHEAD;
> -	bl	timer_interrupt
> -	b	ret_from_except
> -#ifdef CONFIG_PPC_DOORBELL
> -1:
> -#ifdef CONFIG_PPC_BOOK3E
> -	cmpwi	cr0,r3,0x280
> -#else
> -	cmpwi	cr0,r3,0xa00
> -#endif /* CONFIG_PPC_BOOK3E */
> -	bne	1f
> -	addi	r3,r1,STACK_FRAME_OVERHEAD;
> -	bl	doorbell_exception
> -#endif /* CONFIG_PPC_DOORBELL */
> -1:	b	ret_from_except /* What else to do here ? */
> -
> -.Lunrecov_restore:
> -	addi	r3,r1,STACK_FRAME_OVERHEAD
> -	bl	unrecoverable_exception
> -	b	.Lunrecov_restore
> -
> -_ASM_NOKPROBE_SYMBOL(ret_from_except);
> -_ASM_NOKPROBE_SYMBOL(ret_from_except_lite);
> -_ASM_NOKPROBE_SYMBOL(resume_kernel);
> -_ASM_NOKPROBE_SYMBOL(fast_exc_return_irq);
> -_ASM_NOKPROBE_SYMBOL(restore);
> -_ASM_NOKPROBE_SYMBOL(fast_exception_return);
> +1:	/*
> +	 * Emulate stack store with update. New r1 value was already calculated
> +	 * and updated in our interrupt regs by emulate_loadstore, but we can't
> +	 * store the previous value of r1 to the stack before re-loading our
> +	 * registers from it, otherwise they could be clobbered.  Use
> +	 * PACA_EXGEN as temporary storage to hold the store data, as
> +	 * interrupts are disabled here so it won't be clobbered.
> +	 */
> +	mtcr	r6
> +	std	r9,PACA_EXGEN+0(r13)
> +	addi	r9,r1,INT_FRAME_SIZE /* get original r1 */
> +	REST_GPR(6, r1)
> +	REST_GPR(0, r1)
> +	REST_GPR(1, r1)
> +	std	r9,0(r1) /* perform store component of stdu */
> +	ld	r9,PACA_EXGEN+0(r13)
>   
> +	RFI_TO_KERNEL
> +	b	.	/* prevent speculative execution */
> +#endif /* CONFIG_PPC_BOOK3S */
>   
>   #ifdef CONFIG_PPC_RTAS
>   /*
> diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
> index 4efac5490216..d9ed79415100 100644
> --- a/arch/powerpc/kernel/exceptions-64e.S
> +++ b/arch/powerpc/kernel/exceptions-64e.S
> @@ -24,6 +24,7 @@
>   #include <asm/kvm_asm.h>
>   #include <asm/kvm_booke_hv_asm.h>
>   #include <asm/feature-fixups.h>
> +#include <asm/context_tracking.h>
>   
>   /* XXX This will ultimately add space for a special exception save
>    *     structure used to save things like SRR0/SRR1, SPRGs, MAS, etc...
> @@ -1041,17 +1042,161 @@ alignment_more:
>   	bl	alignment_exception
>   	b	ret_from_except
>   
> -/*
> - * We branch here from entry_64.S for the last stage of the exception
> - * return code path. MSR:EE is expected to be off at that point
> - */
> -_GLOBAL(exception_return_book3e)
> -	b	1f
> +	.align	7
> +_GLOBAL(ret_from_except)
> +	ld	r11,_TRAP(r1)
> +	andi.	r0,r11,1
> +	bne	ret_from_except_lite
> +	REST_NVGPRS(r1)
> +
> +_GLOBAL(ret_from_except_lite)
> +	/*
> +	 * Disable interrupts so that current_thread_info()->flags
> +	 * can't change between when we test it and when we return
> +	 * from the interrupt.
> +	 */
> +	wrteei	0
> +
> +	ld	r9, PACA_THREAD_INFO(r13)
> +	ld	r3,_MSR(r1)
> +	ld	r10,PACACURRENT(r13)
> +	ld	r4,TI_FLAGS(r9)
> +	andi.	r3,r3,MSR_PR
> +	beq	resume_kernel
> +	lwz	r3,(THREAD+THREAD_DBCR0)(r10)
> +
> +	/* Check current_thread_info()->flags */
> +	andi.	r0,r4,_TIF_USER_WORK_MASK
> +	bne	1f
> +	/*
> +	 * Check to see if the dbcr0 register is set up to debug.
> +	 * Use the internal debug mode bit to do this.
> +	 */
> +	andis.	r0,r3,DBCR0_IDM@h
> +	beq	restore
> +	mfmsr	r0
> +	rlwinm	r0,r0,0,~MSR_DE	/* Clear MSR.DE */
> +	mtmsr	r0
> +	mtspr	SPRN_DBCR0,r3
> +	li	r10, -1
> +	mtspr	SPRN_DBSR,r10
> +	b	restore
> +1:	andi.	r0,r4,_TIF_NEED_RESCHED
> +	beq	2f
> +	bl	restore_interrupts
> +	SCHEDULE_USER
> +	b	ret_from_except_lite
> +2:
> +	bl	save_nvgprs
> +	/*
> +	 * Use a non volatile GPR to save and restore our thread_info flags
> +	 * across the call to restore_interrupts.
> +	 */
> +	mr	r30,r4
> +	bl	restore_interrupts
> +	mr	r4,r30
> +	addi	r3,r1,STACK_FRAME_OVERHEAD
> +	bl	do_notify_resume
> +	b	ret_from_except
> +
> +resume_kernel:
> +	/* check current_thread_info, _TIF_EMULATE_STACK_STORE */
> +	andis.	r8,r4,_TIF_EMULATE_STACK_STORE@h
> +	beq+	1f
> +
> +	addi	r8,r1,INT_FRAME_SIZE	/* Get the kprobed function entry */
> +
> +	ld	r3,GPR1(r1)
> +	subi	r3,r3,INT_FRAME_SIZE	/* dst: Allocate a trampoline exception frame */
> +	mr	r4,r1			/* src:  current exception frame */
> +	mr	r1,r3			/* Reroute the trampoline frame to r1 */
> +
> +	/* Copy from the original to the trampoline. */
> +	li	r5,INT_FRAME_SIZE/8	/* size: INT_FRAME_SIZE */
> +	li	r6,0			/* start offset: 0 */
> +	mtctr	r5
> +2:	ldx	r0,r6,r4
> +	stdx	r0,r6,r3
> +	addi	r6,r6,8
> +	bdnz	2b
> +
> +	/* Do real store operation to complete stdu */
> +	ld	r5,GPR1(r1)
> +	std	r8,0(r5)
> +
> +	/* Clear _TIF_EMULATE_STACK_STORE flag */
> +	lis	r11,_TIF_EMULATE_STACK_STORE@h
> +	addi	r5,r9,TI_FLAGS
> +0:	ldarx	r4,0,r5
> +	andc	r4,r4,r11
> +	stdcx.	r4,0,r5
> +	bne-	0b
> +1:
> +
> +#ifdef CONFIG_PREEMPT
> +	/* Check if we need to preempt */
> +	andi.	r0,r4,_TIF_NEED_RESCHED
> +	beq+	restore
> +	/* Check that preempt_count() == 0 and interrupts are enabled */
> +	lwz	r8,TI_PREEMPT(r9)
> +	cmpwi	cr0,r8,0
> +	bne	restore
> +	ld	r0,SOFTE(r1)
> +	andi.	r0,r0,IRQS_DISABLED
> +	bne	restore
> +
> +	/*
> +	 * Here we are preempting the current task. We want to make
> +	 * sure we are soft-disabled first and reconcile irq state.
> +	 */
> +	RECONCILE_IRQ_STATE(r3,r4)
> +	bl	preempt_schedule_irq
> +
> +	/*
> +	 * arch_local_irq_restore() from preempt_schedule_irq above may
> +	 * enable hard interrupt but we really should disable interrupts
> +	 * when we return from the interrupt, and so that we don't get
> +	 * interrupted after loading SRR0/1.
> +	 */
> +	wrteei	0
> +#endif /* CONFIG_PREEMPT */
> +
> +restore:
> +	/*
> +	 * This is the main kernel exit path. First we check if we
> +	 * are about to re-enable interrupts
> +	 */
> +	ld	r5,SOFTE(r1)
> +	lbz	r6,PACAIRQSOFTMASK(r13)
> +	andi.	r5,r5,IRQS_DISABLED
> +	bne	.Lrestore_irq_off
> +
> +	/* We are enabling, were we already enabled ? Yes, just return */
> +	andi.	r6,r6,IRQS_DISABLED
> +	beq	cr0,fast_exception_return
> +
> +	/*
> +	 * We are about to soft-enable interrupts (we are hard disabled
> +	 * at this point). We check if there's anything that needs to
> +	 * be replayed first.
> +	 */
> +	lbz	r0,PACAIRQHAPPENED(r13)
> +	cmpwi	cr0,r0,0
> +	bne-	.Lrestore_check_irq_replay
> +
> +	/*
> +	 * Get here when nothing happened while soft-disabled, just
> +	 * soft-enable and move-on. We will hard-enable as a side
> +	 * effect of rfi
> +	 */
> +.Lrestore_no_replay:
> +	TRACE_ENABLE_INTS
> +	li	r0,IRQS_ENABLED
> +	stb	r0,PACAIRQSOFTMASK(r13);
>   
>   /* This is the return from load_up_fpu fast path which could do with
>    * less GPR restores in fact, but for now we have a single return path
>    */
> -	.globl fast_exception_return
>   fast_exception_return:
>   	wrteei	0
>   1:	mr	r0,r13
> @@ -1092,6 +1237,102 @@ fast_exception_return:
>   	mfspr	r13,SPRN_SPRG_GEN_SCRATCH
>   	rfi
>   
> +	/*
> +	 * We are returning to a context with interrupts soft disabled.
> +	 *
> +	 * However, we may also about to hard enable, so we need to
> +	 * make sure that in this case, we also clear PACA_IRQ_HARD_DIS
> +	 * or that bit can get out of sync and bad things will happen
> +	 */
> +.Lrestore_irq_off:
> +	ld	r3,_MSR(r1)
> +	lbz	r7,PACAIRQHAPPENED(r13)
> +	andi.	r0,r3,MSR_EE
> +	beq	1f
> +	rlwinm	r7,r7,0,~PACA_IRQ_HARD_DIS
> +	stb	r7,PACAIRQHAPPENED(r13)
> +1:
> +#if defined(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG) && defined(CONFIG_BUG)
> +	/* The interrupt should not have soft enabled. */
> +	lbz	r7,PACAIRQSOFTMASK(r13)
> +1:	tdeqi	r7,IRQS_ENABLED
> +	EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
> +#endif
> +	b	fast_exception_return
> +
> +	/*
> +	 * Something did happen, check if a re-emit is needed
> +	 * (this also clears paca->irq_happened)
> +	 */
> +.Lrestore_check_irq_replay:
> +	/* XXX: We could implement a fast path here where we check
> +	 * for irq_happened being just 0x01, in which case we can
> +	 * clear it and return. That means that we would potentially
> +	 * miss a decrementer having wrapped all the way around.
> +	 *
> +	 * Still, this might be useful for things like hash_page
> +	 */
> +	bl	__check_irq_replay
> +	cmpwi	cr0,r3,0
> +	beq	.Lrestore_no_replay
> +
> +	/*
> +	 * We need to re-emit an interrupt. We do so by re-using our
> +	 * existing exception frame. We first change the trap value,
> +	 * but we need to ensure we preserve the low nibble of it
> +	 */
> +	ld	r4,_TRAP(r1)
> +	clrldi	r4,r4,60
> +	or	r4,r4,r3
> +	std	r4,_TRAP(r1)
> +
> +	/*
> +	 * PACA_IRQ_HARD_DIS won't always be set here, so set it now
> +	 * to reconcile the IRQ state. Tracing is already accounted for.
> +	 */
> +	lbz	r4,PACAIRQHAPPENED(r13)
> +	ori	r4,r4,PACA_IRQ_HARD_DIS
> +	stb	r4,PACAIRQHAPPENED(r13)
> +
> +	/*
> +	 * Then find the right handler and call it. Interrupts are
> +	 * still soft-disabled and we keep them that way.
> +	 */
> +	cmpwi	cr0,r3,0x500
> +	bne	1f
> +	addi	r3,r1,STACK_FRAME_OVERHEAD;
> +	bl	do_IRQ
> +	b	ret_from_except
> +1:	cmpwi	cr0,r3,0xf00
> +	bne	1f
> +	addi	r3,r1,STACK_FRAME_OVERHEAD;
> +	bl	performance_monitor_exception
> +	b	ret_from_except
> +1:	cmpwi	cr0,r3,0xe60
> +	bne	1f
> +	addi	r3,r1,STACK_FRAME_OVERHEAD;
> +	bl	handle_hmi_exception
> +	b	ret_from_except
> +1:	cmpwi	cr0,r3,0x900
> +	bne	1f
> +	addi	r3,r1,STACK_FRAME_OVERHEAD;
> +	bl	timer_interrupt
> +	b	ret_from_except
> +#ifdef CONFIG_PPC_DOORBELL
> +1:
> +	cmpwi	cr0,r3,0x280
> +	bne	1f
> +	addi	r3,r1,STACK_FRAME_OVERHEAD;
> +	bl	doorbell_exception
> +#endif /* CONFIG_PPC_DOORBELL */
> +1:	b	ret_from_except /* What else to do here? */
> +
> +_ASM_NOKPROBE_SYMBOL(ret_from_except);
> +_ASM_NOKPROBE_SYMBOL(ret_from_except_lite);
> +_ASM_NOKPROBE_SYMBOL(resume_kernel);
> +_ASM_NOKPROBE_SYMBOL(restore);
> +_ASM_NOKPROBE_SYMBOL(fast_exception_return);
> +
>   /*
>    * Trampolines used when spotting a bad kernel stack pointer in
>    * the exception entry code.
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index bad8cd9e7dba..d635fd4e40ea 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -575,6 +575,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
>   	std	r10,GPR12(r1)
>   	std	r11,GPR13(r1)
>   
> +	SAVE_NVGPRS(r1)
> +
>   	.if IDAR
>   	.if IISIDE
>   	ld	r10,_NIP(r1)
> @@ -611,7 +613,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
>   	mfspr	r11,SPRN_XER		/* save XER in stackframe	*/
>   	std	r10,SOFTE(r1)
>   	std	r11,_XER(r1)
> -	li	r9,(IVEC)+1
> +	li	r9,IVEC
>   	std	r9,_TRAP(r1)		/* set trap number		*/
>   	li	r10,0
>   	ld	r11,exception_marker@toc(r2)
> @@ -918,7 +920,6 @@ EXC_COMMON_BEGIN(system_reset_common)
>   	ld	r1,PACA_NMI_EMERG_SP(r13)
>   	subi	r1,r1,INT_FRAME_SIZE
>   	__GEN_COMMON_BODY system_reset
> -	bl	save_nvgprs
>   	/*
>   	 * Set IRQS_ALL_DISABLED unconditionally so irqs_disabled() does
>   	 * the right thing. We do not want to reconcile because that goes
> @@ -1099,7 +1100,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
>   	li	r10,MSR_RI
>   	mtmsrd	r10,1
>   
> -	bl	save_nvgprs
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	machine_check_early
>   	std	r3,RESULT(r1)	/* Save result */
> @@ -1192,10 +1192,9 @@ EXC_COMMON_BEGIN(machine_check_common)
>   	/* Enable MSR_RI when finished with PACA_EXMC */
>   	li	r10,MSR_RI
>   	mtmsrd 	r10,1
> -	bl	save_nvgprs
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	machine_check_exception
> -	b	ret_from_except
> +	b	interrupt_return
>   
>   	GEN_KVM machine_check
>   
> @@ -1362,20 +1361,19 @@ BEGIN_MMU_FTR_SECTION
>   	bl	do_slb_fault
>   	cmpdi	r3,0
>   	bne-	1f
> -	b	fast_exception_return
> +	b	fast_interrupt_return
>   1:	/* Error case */
>   MMU_FTR_SECTION_ELSE
>   	/* Radix case, access is outside page table range */
>   	li	r3,-EFAULT
>   ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
>   	std	r3,RESULT(r1)
> -	bl	save_nvgprs
>   	RECONCILE_IRQ_STATE(r10, r11)
>   	ld	r4,_DAR(r1)
>   	ld	r5,RESULT(r1)
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	do_bad_slb_fault
> -	b	ret_from_except
> +	b	interrupt_return
>   
>   	GEN_KVM data_access_slb
>   
> @@ -1455,20 +1453,19 @@ BEGIN_MMU_FTR_SECTION
>   	bl	do_slb_fault
>   	cmpdi	r3,0
>   	bne-	1f
> -	b	fast_exception_return
> +	b	fast_interrupt_return
>   1:	/* Error case */
>   MMU_FTR_SECTION_ELSE
>   	/* Radix case, access is outside page table range */
>   	li	r3,-EFAULT
>   ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
>   	std	r3,RESULT(r1)
> -	bl	save_nvgprs
>   	RECONCILE_IRQ_STATE(r10, r11)
>   	ld	r4,_DAR(r1)
>   	ld	r5,RESULT(r1)
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	do_bad_slb_fault
> -	b	ret_from_except
> +	b	interrupt_return
>   
>   	GEN_KVM instruction_access_slb
>   
> @@ -1516,7 +1513,7 @@ EXC_COMMON_BEGIN(hardware_interrupt_common)
>   	RUNLATCH_ON
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	do_IRQ
> -	b	ret_from_except_lite
> +	b	interrupt_return_lite
>   
>   	GEN_KVM hardware_interrupt
>   
> @@ -1542,10 +1539,9 @@ EXC_VIRT_BEGIN(alignment, 0x4600, 0x100)
>   EXC_VIRT_END(alignment, 0x4600, 0x100)
>   EXC_COMMON_BEGIN(alignment_common)
>   	GEN_COMMON alignment
> -	bl	save_nvgprs
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	alignment_exception
> -	b	ret_from_except
> +	b	interrupt_return
>   
>   	GEN_KVM alignment
>   
> @@ -1606,10 +1602,9 @@ EXC_COMMON_BEGIN(program_check_common)
>   	__ISTACK(program_check)=1
>   	__GEN_COMMON_BODY program_check
>   3:
> -	bl	save_nvgprs
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	program_check_exception
> -	b	ret_from_except
> +	b	interrupt_return
>   
>   	GEN_KVM program_check
>   
> @@ -1640,7 +1635,6 @@ EXC_VIRT_END(fp_unavailable, 0x4800, 0x100)
>   EXC_COMMON_BEGIN(fp_unavailable_common)
>   	GEN_COMMON fp_unavailable
>   	bne	1f			/* if from user, just load it up */
> -	bl	save_nvgprs
>   	RECONCILE_IRQ_STATE(r10, r11)
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	kernel_fp_unavailable_exception
> @@ -1657,14 +1651,13 @@ BEGIN_FTR_SECTION
>   END_FTR_SECTION_IFSET(CPU_FTR_TM)
>   #endif
>   	bl	load_up_fpu
> -	b	fast_exception_return
> +	b	fast_interrupt_return
>   #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>   2:	/* User process was in a transaction */
> -	bl	save_nvgprs
>   	RECONCILE_IRQ_STATE(r10, r11)
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	fp_unavailable_tm
> -	b	ret_from_except
> +	b	interrupt_return
>   #endif
>   
>   	GEN_KVM fp_unavailable
> @@ -1707,7 +1700,7 @@ EXC_COMMON_BEGIN(decrementer_common)
>   	RUNLATCH_ON
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	timer_interrupt
> -	b	ret_from_except_lite
> +	b	interrupt_return_lite
>   
>   	GEN_KVM decrementer
>   
> @@ -1798,7 +1791,7 @@ EXC_COMMON_BEGIN(doorbell_super_common)
>   #else
>   	bl	unknown_exception
>   #endif
> -	b	ret_from_except_lite
> +	b	interrupt_return_lite
>   
>   	GEN_KVM doorbell_super
>   
> @@ -1970,10 +1963,9 @@ EXC_VIRT_BEGIN(single_step, 0x4d00, 0x100)
>   EXC_VIRT_END(single_step, 0x4d00, 0x100)
>   EXC_COMMON_BEGIN(single_step_common)
>   	GEN_COMMON single_step
> -	bl	save_nvgprs
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	single_step_exception
> -	b	ret_from_except
> +	b	interrupt_return
>   
>   	GEN_KVM single_step
>   
> @@ -2008,7 +2000,6 @@ EXC_VIRT_BEGIN(h_data_storage, 0x4e00, 0x20)
>   EXC_VIRT_END(h_data_storage, 0x4e00, 0x20)
>   EXC_COMMON_BEGIN(h_data_storage_common)
>   	GEN_COMMON h_data_storage
> -	bl      save_nvgprs
>   	addi    r3,r1,STACK_FRAME_OVERHEAD
>   BEGIN_MMU_FTR_SECTION
>   	ld	r4,_DAR(r1)
> @@ -2017,7 +2008,7 @@ BEGIN_MMU_FTR_SECTION
>   MMU_FTR_SECTION_ELSE
>   	bl      unknown_exception
>   ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_TYPE_RADIX)
> -	b       ret_from_except
> +	b       interrupt_return
>   
>   	GEN_KVM h_data_storage
>   
> @@ -2042,10 +2033,9 @@ EXC_VIRT_BEGIN(h_instr_storage, 0x4e20, 0x20)
>   EXC_VIRT_END(h_instr_storage, 0x4e20, 0x20)
>   EXC_COMMON_BEGIN(h_instr_storage_common)
>   	GEN_COMMON h_instr_storage
> -	bl	save_nvgprs
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	unknown_exception
> -	b	ret_from_except
> +	b	interrupt_return
>   
>   	GEN_KVM h_instr_storage
>   
> @@ -2068,10 +2058,9 @@ EXC_VIRT_BEGIN(emulation_assist, 0x4e40, 0x20)
>   EXC_VIRT_END(emulation_assist, 0x4e40, 0x20)
>   EXC_COMMON_BEGIN(emulation_assist_common)
>   	GEN_COMMON emulation_assist
> -	bl	save_nvgprs
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	emulation_assist_interrupt
> -	b	ret_from_except
> +	b	interrupt_return
>   
>   	GEN_KVM emulation_assist
>   
> @@ -2151,10 +2140,9 @@ EXC_COMMON_BEGIN(hmi_exception_common)
>   	GEN_COMMON hmi_exception
>   	FINISH_NAP
>   	RUNLATCH_ON
> -	bl	save_nvgprs
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	handle_hmi_exception
> -	b	ret_from_except
> +	b	interrupt_return
>   
>   	GEN_KVM hmi_exception
>   
> @@ -2188,7 +2176,7 @@ EXC_COMMON_BEGIN(h_doorbell_common)
>   #else
>   	bl	unknown_exception
>   #endif
> -	b	ret_from_except_lite
> +	b	interrupt_return_lite
>   
>   	GEN_KVM h_doorbell
>   
> @@ -2218,7 +2206,7 @@ EXC_COMMON_BEGIN(h_virt_irq_common)
>   	RUNLATCH_ON
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	do_IRQ
> -	b	ret_from_except_lite
> +	b	interrupt_return_lite
>   
>   	GEN_KVM h_virt_irq
>   
> @@ -2265,7 +2253,7 @@ EXC_COMMON_BEGIN(performance_monitor_common)
>   	RUNLATCH_ON
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	performance_monitor_exception
> -	b	ret_from_except_lite
> +	b	interrupt_return_lite
>   
>   	GEN_KVM performance_monitor
>   
> @@ -2305,23 +2293,21 @@ BEGIN_FTR_SECTION
>     END_FTR_SECTION_NESTED(CPU_FTR_TM, CPU_FTR_TM, 69)
>   #endif
>   	bl	load_up_altivec
> -	b	fast_exception_return
> +	b	fast_interrupt_return
>   #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>   2:	/* User process was in a transaction */
> -	bl	save_nvgprs
>   	RECONCILE_IRQ_STATE(r10, r11)
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	altivec_unavailable_tm
> -	b	ret_from_except
> +	b	interrupt_return
>   #endif
>   1:
>   END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
>   #endif
> -	bl	save_nvgprs
>   	RECONCILE_IRQ_STATE(r10, r11)
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	altivec_unavailable_exception
> -	b	ret_from_except
> +	b	interrupt_return
>   
>   	GEN_KVM altivec_unavailable
>   
> @@ -2363,20 +2349,18 @@ BEGIN_FTR_SECTION
>   	b	load_up_vsx
>   #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>   2:	/* User process was in a transaction */
> -	bl	save_nvgprs
>   	RECONCILE_IRQ_STATE(r10, r11)
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	vsx_unavailable_tm
> -	b	ret_from_except
> +	b	interrupt_return
>   #endif
>   1:
>   END_FTR_SECTION_IFSET(CPU_FTR_VSX)
>   #endif
> -	bl	save_nvgprs
>   	RECONCILE_IRQ_STATE(r10, r11)
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	vsx_unavailable_exception
> -	b	ret_from_except
> +	b	interrupt_return
>   
>   	GEN_KVM vsx_unavailable
>   
> @@ -2403,10 +2387,9 @@ EXC_VIRT_BEGIN(facility_unavailable, 0x4f60, 0x20)
>   EXC_VIRT_END(facility_unavailable, 0x4f60, 0x20)
>   EXC_COMMON_BEGIN(facility_unavailable_common)
>   	GEN_COMMON facility_unavailable
> -	bl	save_nvgprs
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	facility_unavailable_exception
> -	b	ret_from_except
> +	b	interrupt_return
>   
>   	GEN_KVM facility_unavailable
>   
> @@ -2433,10 +2416,9 @@ EXC_VIRT_BEGIN(h_facility_unavailable, 0x4f80, 0x20)
>   EXC_VIRT_END(h_facility_unavailable, 0x4f80, 0x20)
>   EXC_COMMON_BEGIN(h_facility_unavailable_common)
>   	GEN_COMMON h_facility_unavailable
> -	bl	save_nvgprs
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	facility_unavailable_exception
> -	b	ret_from_except
> +	b	interrupt_return
>   
>   	GEN_KVM h_facility_unavailable
>   
> @@ -2467,10 +2449,9 @@ EXC_REAL_END(cbe_system_error, 0x1200, 0x100)
>   EXC_VIRT_NONE(0x5200, 0x100)
>   EXC_COMMON_BEGIN(cbe_system_error_common)
>   	GEN_COMMON cbe_system_error
> -	bl	save_nvgprs
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	cbe_system_error_exception
> -	b	ret_from_except
> +	b	interrupt_return
>   
>   	GEN_KVM cbe_system_error
>   
> @@ -2496,10 +2477,9 @@ EXC_VIRT_BEGIN(instruction_breakpoint, 0x5300, 0x100)
>   EXC_VIRT_END(instruction_breakpoint, 0x5300, 0x100)
>   EXC_COMMON_BEGIN(instruction_breakpoint_common)
>   	GEN_COMMON instruction_breakpoint
> -	bl	save_nvgprs
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	instruction_breakpoint_exception
> -	b	ret_from_except
> +	b	interrupt_return
>   
>   	GEN_KVM instruction_breakpoint
>   
> @@ -2619,10 +2599,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
>   
>   EXC_COMMON_BEGIN(denorm_exception_common)
>   	GEN_COMMON denorm_exception
> -	bl	save_nvgprs
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	unknown_exception
> -	b	ret_from_except
> +	b	interrupt_return
>   
>   	GEN_KVM denorm_exception
>   
> @@ -2641,10 +2620,9 @@ EXC_REAL_END(cbe_maintenance, 0x1600, 0x100)
>   EXC_VIRT_NONE(0x5600, 0x100)
>   EXC_COMMON_BEGIN(cbe_maintenance_common)
>   	GEN_COMMON cbe_maintenance
> -	bl	save_nvgprs
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	cbe_maintenance_exception
> -	b	ret_from_except
> +	b	interrupt_return
>   
>   	GEN_KVM cbe_maintenance
>   
> @@ -2669,14 +2647,13 @@ EXC_VIRT_BEGIN(altivec_assist, 0x5700, 0x100)
>   EXC_VIRT_END(altivec_assist, 0x5700, 0x100)
>   EXC_COMMON_BEGIN(altivec_assist_common)
>   	GEN_COMMON altivec_assist
> -	bl	save_nvgprs
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   #ifdef CONFIG_ALTIVEC
>   	bl	altivec_assist_exception
>   #else
>   	bl	unknown_exception
>   #endif
> -	b	ret_from_except
> +	b	interrupt_return
>   
>   	GEN_KVM altivec_assist
>   
> @@ -2695,10 +2672,9 @@ EXC_REAL_END(cbe_thermal, 0x1800, 0x100)
>   EXC_VIRT_NONE(0x5800, 0x100)
>   EXC_COMMON_BEGIN(cbe_thermal_common)
>   	GEN_COMMON cbe_thermal
> -	bl	save_nvgprs
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	cbe_thermal_exception
> -	b	ret_from_except
> +	b	interrupt_return
>   
>   	GEN_KVM cbe_thermal
>   
> @@ -2731,7 +2707,6 @@ EXC_COMMON_BEGIN(soft_nmi_common)
>   	ld	r1,PACAEMERGSP(r13)
>   	subi	r1,r1,INT_FRAME_SIZE
>   	__GEN_COMMON_BODY soft_nmi
> -	bl	save_nvgprs
>   
>   	/*
>   	 * Set IRQS_ALL_DISABLED and save PACAIRQHAPPENED (see
> @@ -3063,7 +3038,7 @@ do_hash_page:
>           cmpdi	r3,0			/* see if __hash_page succeeded */
>   
>   	/* Success */
> -	beq	fast_exc_return_irq	/* Return from exception on success */
> +	beq	interrupt_return_lite	/* Return from exception on success */
>   
>   	/* Error */
>   	blt-	13f
> @@ -3080,17 +3055,15 @@ handle_page_fault:
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	do_page_fault
>   	cmpdi	r3,0
> -	beq+	ret_from_except_lite
> -	bl	save_nvgprs
> +	beq+	interrupt_return_lite
>   	mr	r5,r3
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	ld	r4,_DAR(r1)
>   	bl	bad_page_fault
> -	b	ret_from_except
> +	b	interrupt_return
>   
>   /* We have a data breakpoint exception - handle it */
>   handle_dabr_fault:
> -	bl	save_nvgprs
>   	ld      r4,_DAR(r1)
>   	ld      r5,_DSISR(r1)
>   	addi    r3,r1,STACK_FRAME_OVERHEAD
> @@ -3098,21 +3071,20 @@ handle_dabr_fault:
>   	/*
>   	 * do_break() may have changed the NV GPRS while handling a breakpoint.
>   	 * If so, we need to restore them with their updated values. Don't use
> -	 * ret_from_except_lite here.
> +	 * interrupt_return_lite here.
>   	 */
> -	b       ret_from_except
> +	b       interrupt_return
>   
>   
>   #ifdef CONFIG_PPC_BOOK3S_64
>   /* We have a page fault that hash_page could handle but HV refused
>    * the PTE insertion
>    */
> -13:	bl	save_nvgprs
> -	mr	r5,r3
> +13:	mr	r5,r3
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	ld	r4,_DAR(r1)
>   	bl	low_hash_fault
> -	b	ret_from_except
> +	b	interrupt_return
>   #endif
>   
>   /*
> @@ -3122,11 +3094,10 @@ handle_dabr_fault:
>    * were soft-disabled.  We want to invoke the exception handler for
>    * the access, or panic if there isn't a handler.
>    */
> -77:	bl	save_nvgprs
> -	addi	r3,r1,STACK_FRAME_OVERHEAD
> +77:	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	li	r5,SIGSEGV
>   	bl	bad_page_fault
> -	b	ret_from_except
> +	b	interrupt_return
>   
>   /*
>    * When doorbell is triggered from system reset wakeup, the message is
> diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> index afd74eba70aa..6ea27dbcb872 100644
> --- a/arch/powerpc/kernel/irq.c
> +++ b/arch/powerpc/kernel/irq.c
> @@ -110,6 +110,8 @@ static inline notrace int decrementer_check_overflow(void)
>   	return now >= *next_tb;
>   }
>   
> +#ifdef CONFIG_PPC_BOOK3E
> +
>   /* This is called whenever we are re-enabling interrupts
>    * and returns either 0 (nothing to do) or 500/900/280/a00/e80 if
>    * there's an EE, DEC or DBELL to generate.
> @@ -169,41 +171,16 @@ notrace unsigned int __check_irq_replay(void)
>   		}
>   	}
>   
> -	/*
> -	 * Force the delivery of pending soft-disabled interrupts on PS3.
> -	 * Any HV call will have this side effect.
> -	 */
> -	if (firmware_has_feature(FW_FEATURE_PS3_LV1)) {
> -		u64 tmp, tmp2;
> -		lv1_get_version_info(&tmp, &tmp2);
> -	}
> -
> -	/*
> -	 * Check if an hypervisor Maintenance interrupt happened.
> -	 * This is a higher priority interrupt than the others, so
> -	 * replay it first.
> -	 */
> -	if (happened & PACA_IRQ_HMI) {
> -		local_paca->irq_happened &= ~PACA_IRQ_HMI;
> -		return 0xe60;
> -	}
> -
>   	if (happened & PACA_IRQ_DEC) {
>   		local_paca->irq_happened &= ~PACA_IRQ_DEC;
>   		return 0x900;
>   	}
>   
> -	if (happened & PACA_IRQ_PMI) {
> -		local_paca->irq_happened &= ~PACA_IRQ_PMI;
> -		return 0xf00;
> -	}
> -
>   	if (happened & PACA_IRQ_EE) {
>   		local_paca->irq_happened &= ~PACA_IRQ_EE;
>   		return 0x500;
>   	}
>   
> -#ifdef CONFIG_PPC_BOOK3E
>   	/*
>   	 * Check if an EPR external interrupt happened this bit is typically
>   	 * set if we need to handle another "edge" interrupt from within the
> @@ -218,20 +195,15 @@ notrace unsigned int __check_irq_replay(void)
>   		local_paca->irq_happened &= ~PACA_IRQ_DBELL;
>   		return 0x280;
>   	}
> -#else
> -	if (happened & PACA_IRQ_DBELL) {
> -		local_paca->irq_happened &= ~PACA_IRQ_DBELL;
> -		return 0xa00;
> -	}
> -#endif /* CONFIG_PPC_BOOK3E */
>   
>   	/* There should be nothing left ! */
>   	BUG_ON(local_paca->irq_happened != 0);
>   
>   	return 0;
>   }
> +#endif /* CONFIG_PPC_BOOK3E */
>   
> -static void replay_soft_interrupts(void)
> +void replay_soft_interrupts(void)
>   {
>   	/*
>   	 * We use local_paca rather than get_paca() to avoid all
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index fad50db9dcf2..1dea4d280f6f 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -236,23 +236,9 @@ void enable_kernel_fp(void)
>   	}
>   }
>   EXPORT_SYMBOL(enable_kernel_fp);
> -
> -static int restore_fp(struct task_struct *tsk)
> -{
> -	if (tsk->thread.load_fp) {
> -		load_fp_state(&current->thread.fp_state);
> -		current->thread.load_fp++;
> -		return 1;
> -	}
> -	return 0;
> -}
> -#else
> -static int restore_fp(struct task_struct *tsk) { return 0; }
>   #endif /* CONFIG_PPC_FPU */
>   
>   #ifdef CONFIG_ALTIVEC
> -#define loadvec(thr) ((thr).load_vec)
> -
>   static void __giveup_altivec(struct task_struct *tsk)
>   {
>   	unsigned long msr;
> @@ -318,21 +304,6 @@ void flush_altivec_to_thread(struct task_struct *tsk)
>   	}
>   }
>   EXPORT_SYMBOL_GPL(flush_altivec_to_thread);
> -
> -static int restore_altivec(struct task_struct *tsk)
> -{
> -	if (cpu_has_feature(CPU_FTR_ALTIVEC) && (tsk->thread.load_vec)) {
> -		load_vr_state(&tsk->thread.vr_state);
> -		tsk->thread.used_vr = 1;
> -		tsk->thread.load_vec++;
> -
> -		return 1;
> -	}
> -	return 0;
> -}
> -#else
> -#define loadvec(thr) 0
> -static inline int restore_altivec(struct task_struct *tsk) { return 0; }
>   #endif /* CONFIG_ALTIVEC */
>   
>   #ifdef CONFIG_VSX
> @@ -400,18 +371,6 @@ void flush_vsx_to_thread(struct task_struct *tsk)
>   	}
>   }
>   EXPORT_SYMBOL_GPL(flush_vsx_to_thread);
> -
> -static int restore_vsx(struct task_struct *tsk)
> -{
> -	if (cpu_has_feature(CPU_FTR_VSX)) {
> -		tsk->thread.used_vsr = 1;
> -		return 1;
> -	}
> -
> -	return 0;
> -}
> -#else
> -static inline int restore_vsx(struct task_struct *tsk) { return 0; }
>   #endif /* CONFIG_VSX */
>   
>   #ifdef CONFIG_SPE
> @@ -511,6 +470,53 @@ void giveup_all(struct task_struct *tsk)
>   }
>   EXPORT_SYMBOL(giveup_all);
>   
> +#ifdef CONFIG_PPC_BOOK3S_64
> +#ifdef CONFIG_PPC_FPU
> +static int restore_fp(struct task_struct *tsk)
> +{
> +	if (tsk->thread.load_fp) {
> +		load_fp_state(&current->thread.fp_state);
> +		current->thread.load_fp++;
> +		return 1;
> +	}
> +	return 0;
> +}
> +#else
> +static int restore_fp(struct task_struct *tsk) { return 0; }
> +#endif /* CONFIG_PPC_FPU */
> +
> +#ifdef CONFIG_ALTIVEC
> +#define loadvec(thr) ((thr).load_vec)
> +static int restore_altivec(struct task_struct *tsk)
> +{
> +	if (cpu_has_feature(CPU_FTR_ALTIVEC) && (tsk->thread.load_vec)) {
> +		load_vr_state(&tsk->thread.vr_state);
> +		tsk->thread.used_vr = 1;
> +		tsk->thread.load_vec++;
> +
> +		return 1;
> +	}
> +	return 0;
> +}
> +#else
> +#define loadvec(thr) 0
> +static inline int restore_altivec(struct task_struct *tsk) { return 0; }
> +#endif /* CONFIG_ALTIVEC */
> +
> +#ifdef CONFIG_VSX
> +static int restore_vsx(struct task_struct *tsk)
> +{
> +	if (cpu_has_feature(CPU_FTR_VSX)) {
> +		tsk->thread.used_vsr = 1;
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +#else
> +static inline int restore_vsx(struct task_struct *tsk) { return 0; }
> +#endif /* CONFIG_VSX */
> +
>   /*
>    * The exception exit path calls restore_math() with interrupts hard disabled
>    * but the soft irq state not "reconciled". ftrace code that calls
> @@ -551,6 +557,7 @@ void notrace restore_math(struct pt_regs *regs)
>   
>   	regs->msr = msr;
>   }
> +#endif
>   
>   static void save_all(struct task_struct *tsk)
>   {
> diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
> index 20f77cc19df8..08e0bebbd3b6 100644
> --- a/arch/powerpc/kernel/syscall_64.c
> +++ b/arch/powerpc/kernel/syscall_64.c
> @@ -26,7 +26,11 @@ notrace long system_call_exception(long r3, long r4, long r5, long r6, long r7,
>   	unsigned long ti_flags;
>   	syscall_fn f;
>   
> +	if (IS_ENABLED(CONFIG_PPC_BOOK3S))
> +		BUG_ON(!(regs->msr & MSR_RI));
>   	BUG_ON(!(regs->msr & MSR_PR));
> +	BUG_ON(!FULL_REGS(regs));
> +	BUG_ON(regs->softe != IRQS_ENABLED);
>   
>   	if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
>   	    unlikely(regs->msr & MSR_TS_T))
> @@ -195,7 +199,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
>   		trace_hardirqs_off();
>   		local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
>   		local_irq_enable();
> -		/* Took an interrupt which may have more exit work to do. */
> +		/* Took an interrupt, may have more exit work to do. */
>   		goto again;
>   	}
>   	local_paca->irq_happened = 0;
> @@ -211,3 +215,161 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
>   
>   	return ret;
>   }
> +
> +#ifdef CONFIG_PPC_BOOK3S /* BOOK3E not yet using this */
> +notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr)
> +{
> +#ifdef CONFIG_PPC_BOOK3E
> +	struct thread_struct *ts = &current->thread;
> +#endif
> +	unsigned long *ti_flagsp = &current_thread_info()->flags;
> +	unsigned long ti_flags;
> +	unsigned long flags;
> +	unsigned long ret = 0;
> +
> +	if (IS_ENABLED(CONFIG_PPC_BOOK3S))
> +		BUG_ON(!(regs->msr & MSR_RI));
> +	BUG_ON(!(regs->msr & MSR_PR));
> +	BUG_ON(!FULL_REGS(regs));
> +	BUG_ON(regs->softe != IRQS_ENABLED);
> +
> +	local_irq_save(flags);
> +
> +again:
> +	ti_flags = READ_ONCE(*ti_flagsp);
> +	while (unlikely(ti_flags & (_TIF_USER_WORK_MASK & ~_TIF_RESTORE_TM))) {
> +		local_irq_enable(); /* returning to user: may enable */
> +		if (ti_flags & _TIF_NEED_RESCHED) {
> +			schedule();
> +		} else {
> +			if (ti_flags & _TIF_SIGPENDING)
> +				ret |= _TIF_RESTOREALL;
> +			do_notify_resume(regs, ti_flags);
> +		}
> +		local_irq_disable();
> +		ti_flags = READ_ONCE(*ti_flagsp);
> +	}
> +
> +	if (IS_ENABLED(CONFIG_PPC_BOOK3S)) {
> +		unsigned long mathflags = 0;
> +
> +		if (IS_ENABLED(CONFIG_PPC_FPU))
> +			mathflags |= MSR_FP;
> +		if (IS_ENABLED(CONFIG_ALTIVEC))
> +			mathflags |= MSR_VEC;
> +
> +		if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
> +						(ti_flags & _TIF_RESTORE_TM))
> +			restore_tm_state(regs);
> +		else if ((regs->msr & mathflags) != mathflags)
> +			restore_math(regs);
> +	}
> +
> +	trace_hardirqs_on();
> +	__hard_EE_RI_disable();
> +	if (unlikely(lazy_irq_pending())) {
> +		__hard_RI_enable();
> +		trace_hardirqs_off();
> +		local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
> +		local_irq_enable();
> +		local_irq_disable();
> +		/* Took an interrupt, may have more exit work to do. */
> +		goto again;
> +	}
> +	local_paca->irq_happened = 0;
> +	irq_soft_mask_set(IRQS_ENABLED);
> +
> +#ifdef CONFIG_PPC_BOOK3E
> +	if (unlikely(ts->debug.dbcr0 & DBCR0_IDM)) {
> +		/*
> +		 * Check to see if the dbcr0 register is set up to debug.
> +		 * Use the internal debug mode bit to do this.
> +		 */
> +		mtmsr(mfmsr() & ~MSR_DE);
> +		mtspr(SPRN_DBCR0, ts->debug.dbcr0);
> +		mtspr(SPRN_DBSR, -1);
> +	}
> +#endif
> +
> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> +	local_paca->tm_scratch = regs->msr;
> +#endif
> +
> +	kuap_check_amr();
> +
> +	account_cpu_user_exit();
> +
> +	return ret;
> +}
> +
> +void unrecoverable_exception(struct pt_regs *regs);
> +void preempt_schedule_irq(void);
> +
> +notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr)
> +{
> +	unsigned long *ti_flagsp = &current_thread_info()->flags;
> +	unsigned long flags;
> +
> +	if (IS_ENABLED(CONFIG_PPC_BOOK3S) && unlikely(!(regs->msr & MSR_RI)))
> +		unrecoverable_exception(regs);
> +	BUG_ON(regs->msr & MSR_PR);
> +	BUG_ON(!FULL_REGS(regs));
> +
> +	local_irq_save(flags);
> +
> +	if (regs->softe == IRQS_ENABLED) {
> +		/* Returning to a kernel context with local irqs enabled. */
> +		WARN_ON_ONCE(!(regs->msr & MSR_EE));
> +again:
> +		if (IS_ENABLED(CONFIG_PREEMPT)) {
> +			/* Return to preemptible kernel context */
> +			if (unlikely(*ti_flagsp & _TIF_NEED_RESCHED)) {
> +				if (preempt_count() == 0)
> +					preempt_schedule_irq();
> +			}
> +		}
> +
> +		trace_hardirqs_on();
> +		__hard_EE_RI_disable();
> +		if (unlikely(lazy_irq_pending())) {
> +			__hard_RI_enable();
> +			irq_soft_mask_set(IRQS_ALL_DISABLED);
> +			trace_hardirqs_off();
> +			local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
> +			/*
> +			 * Can't local_irq_enable in case we are in interrupt
> +			 * context. Must replay directly.
> +			 */
> +			replay_soft_interrupts();
> +			irq_soft_mask_set(flags);
> +			/* Took an interrupt, may have more exit work to do. */
> +			goto again;
> +		}
> +		local_paca->irq_happened = 0;
> +		irq_soft_mask_set(IRQS_ENABLED);
> +	} else {
> +		/* Returning to a kernel context with local irqs disabled. */
> +		trace_hardirqs_on();
> +		__hard_EE_RI_disable();
> +		if (regs->msr & MSR_EE)
> +			local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
> +	}
> +
> +
> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> +	local_paca->tm_scratch = regs->msr;
> +#endif
> +
> +	/*
> +	 * We don't need to restore AMR on the way back to userspace for KUAP.
> +	 * The value of AMR only matters while we're in the kernel.
> +	 */
> +	kuap_restore_amr(regs);
> +
> +	if (unlikely(*ti_flagsp & _TIF_EMULATE_STACK_STORE)) {
> +		clear_bits(_TIF_EMULATE_STACK_STORE, ti_flagsp);
> +		return 1;
> +	}
> +	return 0;
> +}
> +#endif
> diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
> index 25c14a0981bf..d20c5e79e03c 100644
> --- a/arch/powerpc/kernel/vector.S
> +++ b/arch/powerpc/kernel/vector.S
> @@ -134,7 +134,7 @@ _GLOBAL(load_up_vsx)
>   	/* enable use of VSX after return */
>   	oris	r12,r12,MSR_VSX@h
>   	std	r12,_MSR(r1)
> -	b	fast_exception_return
> +	b	fast_interrupt_return
>   
>   #endif /* CONFIG_VSX */
>   
> 

* Re: [PATCH v3 28/32] powerpc/64s: interrupt implement exit logic in C
  2021-02-27 10:07   ` Christophe Leroy
@ 2021-03-01  0:47     ` Nicholas Piggin
  0 siblings, 0 replies; 161+ messages in thread
From: Nicholas Piggin @ 2021-03-01  0:47 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev; +Cc: Michal Suchanek

Excerpts from Christophe Leroy's message of February 27, 2021 8:07 pm:
> 
> 
> On 25/02/2020 at 18:35, Nicholas Piggin wrote:
>> Implement the bulk of interrupt return logic in C. The asm return code
>> must handle a few cases: restoring full GPRs, and emulating stack store.
>> 
>> The stack store emulation is significantly simplified: rather than creating
>> a new return frame and switching to that before performing the store, it
>> uses the PACA to keep a scratch register around to perform the store.
>> 
>> The asm return code is moved into 64e for now. The new logic has made
>> allowance for 64e, but I don't have a full environment that works well
>> to test it, and even booting in emulated qemu is not great for stress
>> testing. 64e shouldn't be too far off working with this, given a bit
>> more testing and auditing of the logic.
>> 
>> This is slightly faster on a POWER9 (page fault speed increases about
>> 1.1%), probably due to reduced mtmsrd.
> 
> 
> This series, and especially this patch, has added an awful number of BUG_ON() traps.
> 
> We have had an issue open at https://github.com/linuxppc/issues/issues/88 since 2017 for reducing
> the number of BUG_ON()s.
> 
> And the kernel documentation is explicit about the intent to deprecate BUG_ON(); see
> https://www.kernel.org/doc/html/latest/process/deprecated.html?highlight=bug_on
> 
> BUG() and BUG_ON()
> Use WARN() and WARN_ON() instead, and handle the “impossible” error condition as gracefully as 
> possible. While the BUG()-family of APIs were originally designed to act as an “impossible 
> situation” assert and to kill a kernel thread “safely”, they turn out to just be too risky. (e.g. 
> “In what order do locks need to be released? Have various states been restored?”) Very commonly, 
> using BUG() will destabilize a system or entirely break it, which makes it impossible to debug or 
> even get viable crash reports. Linus has very strong feelings about this.
> 
> So ... can we do something cleaner with all the BUG_ON()s recently added?

Yeah you're right. Some of it is probably overkill due to paranoia when 
developing the series.

Now that we have a bit more confidence, we could probably look at
cutting down on these.

I do get a bit concerned about detecting a problem in code like this
and attempting to just continue; it usually means the system is going
to crash pretty badly anyway (and the WARN_ON trap interrupt would
probably finish you off regardless). So I think removing the more
obvious checks from production builds entirely (perhaps keeping them
behind a PPC debug config option) is the right way to go.
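
Something like this is what I have in mind (a rough sketch only, and
CONFIG_PPC_IRQ_EXIT_DEBUG is a made-up name, not an existing Kconfig
symbol):

#ifdef CONFIG_PPC_IRQ_EXIT_DEBUG
/* Debug builds: warn (once) and keep going rather than BUG. */
#define irq_exit_assert(cond)	WARN_ON_ONCE(!(cond))
#else
/* Production builds: compile the check out entirely. */
#define irq_exit_assert(cond)	do { } while (0)
#endif

	/* then e.g. in interrupt_exit_user_prepare(): */
	irq_exit_assert(regs->msr & MSR_RI);
	irq_exit_assert(regs->msr & MSR_PR);
	irq_exit_assert(FULL_REGS(regs));

That way production kernels skip the checks, and debug builds get a
warning we can act on rather than a dead box.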

Thanks,
Nick

* Re: [PATCH v3 28/32] powerpc/64s: interrupt implement exit logic in C
  2020-02-25 17:35 ` [PATCH v3 28/32] powerpc/64s: interrupt implement exit logic " Nicholas Piggin
                     ` (2 preceding siblings ...)
  2021-02-27 10:07   ` Christophe Leroy
@ 2021-03-15 13:41   ` Christophe Leroy
  2021-03-16  7:36     ` Nicholas Piggin
  3 siblings, 1 reply; 161+ messages in thread
From: Christophe Leroy @ 2021-03-15 13:41 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Michal Suchanek



On 25/02/2020 at 18:35, Nicholas Piggin wrote:
> Implement the bulk of interrupt return logic in C. The asm return code
> must handle a few cases: restoring full GPRs, and emulating stack store.
> 
> The stack store emulation is significantly simplified: rather than creating
> a new return frame and switching to that before performing the store, it
> uses the PACA to keep a scratch register around to perform the store.
> 
> The asm return code is moved into 64e for now. The new logic has made
> allowance for 64e, but I don't have a full environment that works well
> to test it, and even booting in emulated qemu is not great for stress
> testing. 64e shouldn't be too far off working with this, given a bit
> more testing and auditing of the logic.
> 
> This is slightly faster on a POWER9 (page fault speed increases about
> 1.1%), probably due to reduced mtmsrd.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> ---

...

> +notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr)
> +{

...

> +
> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> +	local_paca->tm_scratch = regs->msr;
> +#endif

Could we define a helper for that in asm/tm.h that becomes a no-op when
CONFIG_PPC_TRANSACTIONAL_MEM is not selected?
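
Something like the following, for instance (an untested sketch, and the
helper name is invented):

#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
static inline void tm_save_scratch(struct pt_regs *regs)
{
	/* Stash the return MSR so a later oops can report it. */
	local_paca->tm_scratch = regs->msr;
}
#else
static inline void tm_save_scratch(struct pt_regs *regs) { }
#endif

Then the two #ifdef blocks in the exit paths become plain calls to
tm_save_scratch(regs).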

> +
> +	kuap_check_amr();
> +
> +	account_cpu_user_exit();
> +
> +	return ret;
> +}
> +
> +void unrecoverable_exception(struct pt_regs *regs);
> +void preempt_schedule_irq(void);
> +
> +notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr)
> +{
> +

...

> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> +	local_paca->tm_scratch = regs->msr;
> +#endif
> +
> +	/*
> +	 * We don't need to restore AMR on the way back to userspace for KUAP.
> +	 * The value of AMR only matters while we're in the kernel.
> +	 */
> +	kuap_restore_amr(regs);
> +
> +	if (unlikely(*ti_flagsp & _TIF_EMULATE_STACK_STORE)) {
> +		clear_bits(_TIF_EMULATE_STACK_STORE, ti_flagsp);
> +		return 1;
> +	}
> +	return 0;
> +}
> +#endif

Christophe

* Re: [PATCH v3 28/32] powerpc/64s: interrupt implement exit logic in C
  2021-03-15 13:41   ` Christophe Leroy
@ 2021-03-16  7:36     ` Nicholas Piggin
  2021-03-19 11:44       ` Michael Ellerman
  0 siblings, 1 reply; 161+ messages in thread
From: Nicholas Piggin @ 2021-03-16  7:36 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev; +Cc: Michal Suchanek

Excerpts from Christophe Leroy's message of March 15, 2021 11:41 pm:
> 
> 
>> On 25/02/2020 at 18:35, Nicholas Piggin wrote:
>> Implement the bulk of interrupt return logic in C. The asm return code
>> must handle a few cases: restoring full GPRs, and emulating stack store.
>> 
>> The stack store emulation is significantly simplified: rather than creating
>> a new return frame and switching to that before performing the store, it
>> uses the PACA to keep a scratch register around to perform the store.
>> 
>> The asm return code is moved into 64e for now. The new logic has made
>> allowance for 64e, but I don't have a full environment that works well
>> to test it, and even booting in emulated qemu is not great for stress
>> testing. 64e shouldn't be too far off working with this, given a bit
>> more testing and auditing of the logic.
>> 
>> This is slightly faster on a POWER9 (page fault speed increases about
>> 1.1%), probably due to reduced mtmsrd.
>> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
>> ---
> 
> ...
> 
>> +notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr)
>> +{
> 
> ...
> 
>> +
>> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>> +	local_paca->tm_scratch = regs->msr;
>> +#endif
> 
> Could we define a helper for that in asm/tm.h that becomes a no-op when
> CONFIG_PPC_TRANSACTIONAL_MEM is not selected?

Yeah I wanted to do something about that. I don't know what it's used 
for here. I guess it saves the return MSR so if that causes a crash then 
the next oops would see it, but I wonder if we can just get that from 
SRR1 + program check error codes, or if there is something we can't
reconstruct from there. Have to check with someone who knows TM better.

Thanks,
Nick

* Re: [PATCH v3 28/32] powerpc/64s: interrupt implement exit logic in C
  2021-03-16  7:36     ` Nicholas Piggin
@ 2021-03-19 11:44       ` Michael Ellerman
  0 siblings, 0 replies; 161+ messages in thread
From: Michael Ellerman @ 2021-03-19 11:44 UTC (permalink / raw)
  To: Nicholas Piggin, Christophe Leroy, linuxppc-dev; +Cc: Michal Suchanek

Nicholas Piggin <npiggin@gmail.com> writes:
> Excerpts from Christophe Leroy's message of March 15, 2021 11:41 pm:
>> 
>> On 25/02/2020 at 18:35, Nicholas Piggin wrote:
>>> Implement the bulk of interrupt return logic in C. The asm return code
>>> must handle a few cases: restoring full GPRs, and emulating stack store.
>>> 
>>> The stack store emulation is significantly simplified: rather than creating
>>> a new return frame and switching to that before performing the store, it
>>> uses the PACA to keep a scratch register around to perform the store.
>>> 
>>> The asm return code is moved into 64e for now. The new logic has made
>>> allowance for 64e, but I don't have a full environment that works well
>>> to test it, and even booting in emulated qemu is not great for stress
>>> testing. 64e shouldn't be too far off working with this, given a bit
>>> more testing and auditing of the logic.
>>> 
>>> This is slightly faster on a POWER9 (page fault speed increases about
>>> 1.1%), probably due to reduced mtmsrd.
>>> 
>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
>>> ---
>> 
>> ...
>> 
>>> +notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr)
>>> +{
>> 
>> ...
>> 
>>> +
>>> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>>> +	local_paca->tm_scratch = regs->msr;
>>> +#endif
>> 
>> Could we define a helper for that in asm/tm.h that becomes a no-op when
>> CONFIG_PPC_TRANSACTIONAL_MEM is not selected?
>
> Yeah I wanted to do something about that. I don't know what it's used 
> for here. I guess it saves the return MSR so if that causes a crash then 
> the next oops would see it, but I wonder if we can just get that from 
> SRR1 + program check error codes, or if there is something we can't
> reconstruct from there.

In the cases when you need it, you can't reconstruct it :)

But given the TM code is on life support we could probably drop
tm_scratch.

I don't think we've used it in anger for several years. Probably since
265e60a170d0 ("powerpc/64s: Use emergency stack for kernel TM Bad Thing
program checks") (Oct 2017).

If one of us has to debug some hairy TM issue we can always add it back
temporarily in a dev kernel.

cheers
