linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH 0/4] powerpc/64: syscalls in C
@ 2019-08-27  3:30 Nicholas Piggin
  2019-08-27  3:30 ` [PATCH 1/4] powerpc: convert to copy_thread_tls Nicholas Piggin
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Nicholas Piggin @ 2019-08-27  3:30 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

This hasn't been heavily stress tested or tested in all cases (32-bit
binaries, single stepping, etc.), but so far it's running stably for
normal kernel development and microbenchmarks.

Thanks,
Nick

Nicholas Piggin (4):
  powerpc: convert to copy_thread_tls
  powerpc/64: remove support for kernel-mode syscalls
  powerpc/64: system call remove non-volatile GPR save optimisation
  powerpc/64: system call implement the bulk of the logic in C

 arch/powerpc/Kconfig                      |   1 +
 arch/powerpc/include/asm/asm-prototypes.h |  11 -
 arch/powerpc/include/asm/ptrace.h         |   3 +
 arch/powerpc/include/asm/signal.h         |   2 +
 arch/powerpc/include/asm/switch_to.h      |   4 +
 arch/powerpc/include/asm/time.h           |   3 +
 arch/powerpc/kernel/Makefile              |   3 +-
 arch/powerpc/kernel/entry_64.S            | 424 ++++------------------
 arch/powerpc/kernel/exceptions-64s.S      |   2 -
 arch/powerpc/kernel/process.c             |  15 +-
 arch/powerpc/kernel/signal.h              |   2 -
 arch/powerpc/kernel/syscall_64.c          | 202 +++++++++++
 arch/powerpc/kernel/syscalls/syscall.tbl  |  22 +-
 13 files changed, 305 insertions(+), 389 deletions(-)
 create mode 100644 arch/powerpc/kernel/syscall_64.c

-- 
2.22.0



* [PATCH 1/4] powerpc: convert to copy_thread_tls
  2019-08-27  3:30 [PATCH 0/4] powerpc/64: syscalls in C Nicholas Piggin
@ 2019-08-27  3:30 ` Nicholas Piggin
  2019-08-27  6:07   ` Christophe Leroy
  2019-09-02  3:29   ` Michael Ellerman
  2019-08-27  3:30 ` [PATCH 2/4] powerpc/64: remove support for kernel-mode syscalls Nicholas Piggin
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 19+ messages in thread
From: Nicholas Piggin @ 2019-08-27  3:30 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

Commit 3033f14ab78c3 ("clone: support passing tls argument via C rather
than pt_regs magic") introduced the HAVE_COPY_THREAD_TLS option. Use it
to avoid a subtle assumption about the argument ordering of clone type
syscalls.
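
For context, the old code relied on the TLS pointer sitting in the trapped
r6 (one of clone's argument registers). A minimal before/after sketch of
the line that changes (the full diff is below):

	/* before: copy_thread() digs the value out of the child's regs */
	childregs->gpr[13] = childregs->gpr[6];

	/* after: copy_thread_tls() receives it as an explicit argument */
	childregs->gpr[13] = tls;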

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/Kconfig          | 1 +
 arch/powerpc/kernel/process.c | 9 +++++----
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d8dcd8820369..7477a3263225 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -182,6 +182,7 @@ config PPC
 	select HAVE_STACKPROTECTOR		if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
 	select HAVE_STACKPROTECTOR		if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
 	select HAVE_CONTEXT_TRACKING		if PPC64
+	select HAVE_COPY_THREAD_TLS
 	select HAVE_DEBUG_KMEMLEAK
 	select HAVE_DEBUG_STACKOVERFLOW
 	select HAVE_DYNAMIC_FTRACE
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 8fc4de0d22b4..24621e7e5033 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1600,8 +1600,9 @@ static void setup_ksp_vsid(struct task_struct *p, unsigned long sp)
 /*
  * Copy architecture-specific thread state
  */
-int copy_thread(unsigned long clone_flags, unsigned long usp,
-		unsigned long kthread_arg, struct task_struct *p)
+int copy_thread_tls(unsigned long clone_flags, unsigned long usp,
+		unsigned long kthread_arg, struct task_struct *p,
+		unsigned long tls)
 {
 	struct pt_regs *childregs, *kregs;
 	extern void ret_from_fork(void);
@@ -1642,10 +1643,10 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
 		if (clone_flags & CLONE_SETTLS) {
 #ifdef CONFIG_PPC64
 			if (!is_32bit_task())
-				childregs->gpr[13] = childregs->gpr[6];
+				childregs->gpr[13] = tls;
 			else
 #endif
-				childregs->gpr[2] = childregs->gpr[6];
+				childregs->gpr[2] = tls;
 		}
 
 		f = ret_from_fork;
-- 
2.22.0



* [PATCH 2/4] powerpc/64: remove support for kernel-mode syscalls
  2019-08-27  3:30 [PATCH 0/4] powerpc/64: syscalls in C Nicholas Piggin
  2019-08-27  3:30 ` [PATCH 1/4] powerpc: convert to copy_thread_tls Nicholas Piggin
@ 2019-08-27  3:30 ` Nicholas Piggin
  2019-08-27  6:13   ` Christophe Leroy
  2019-08-27  3:30 ` [PATCH 2/4] powerpc/64s: " Nicholas Piggin
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Nicholas Piggin @ 2019-08-27  3:30 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

There is support for the kernel to execute the 'sc 0' instruction and
make a system call to itself. This is a relic that is unused in the
tree, therefore untested. It's also highly questionable for modules to
be doing this.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/entry_64.S       | 21 ++++++---------------
 arch/powerpc/kernel/exceptions-64s.S |  2 --
 2 files changed, 6 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 0a0b5310f54a..6467bdab8d40 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -69,24 +69,20 @@ BEGIN_FTR_SECTION
 	bne	.Ltabort_syscall
 END_FTR_SECTION_IFSET(CPU_FTR_TM)
 #endif
-	andi.	r10,r12,MSR_PR
 	mr	r10,r1
-	addi	r1,r1,-INT_FRAME_SIZE
-	beq-	1f
 	ld	r1,PACAKSAVE(r13)
-1:	std	r10,0(r1)
+	std	r10,0(r1)
 	std	r11,_NIP(r1)
 	std	r12,_MSR(r1)
 	std	r0,GPR0(r1)
 	std	r10,GPR1(r1)
-	beq	2f			/* if from kernel mode */
 #ifdef CONFIG_PPC_FSL_BOOK3E
 START_BTB_FLUSH_SECTION
 	BTB_FLUSH(r10)
 END_BTB_FLUSH_SECTION
 #endif
 	ACCOUNT_CPU_USER_ENTRY(r13, r10, r11)
-2:	std	r2,GPR2(r1)
+	std	r2,GPR2(r1)
 	std	r3,GPR3(r1)
 	mfcr	r2
 	std	r4,GPR4(r1)
@@ -122,14 +118,13 @@ END_BTB_FLUSH_SECTION
 
 #if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) && defined(CONFIG_PPC_SPLPAR)
 BEGIN_FW_FTR_SECTION
-	beq	33f
-	/* if from user, see if there are any DTL entries to process */
+	/* see if there are any DTL entries to process */
 	ld	r10,PACALPPACAPTR(r13)	/* get ptr to VPA */
 	ld	r11,PACA_DTL_RIDX(r13)	/* get log read index */
 	addi	r10,r10,LPPACA_DTLIDX
 	LDX_BE	r10,0,r10		/* get log write index */
-	cmpd	cr1,r11,r10
-	beq+	cr1,33f
+	cmpd	r11,r10
+	beq+	33f
 	bl	accumulate_stolen_time
 	REST_GPR(0,r1)
 	REST_4GPRS(3,r1)
@@ -203,6 +198,7 @@ system_call:			/* label this so stack traces look sane */
 	mtctr   r12
 	bctrl			/* Call handler */
 
+	/* syscall_exit can exit to kernel mode, via ret_from_kernel_thread */
 .Lsyscall_exit:
 	std	r3,RESULT(r1)
 
@@ -216,11 +212,6 @@ system_call:			/* label this so stack traces look sane */
 	ld	r12, PACA_THREAD_INFO(r13)
 
 	ld	r8,_MSR(r1)
-#ifdef CONFIG_PPC_BOOK3S
-	/* No MSR:RI on BookE */
-	andi.	r10,r8,MSR_RI
-	beq-	.Lunrecov_restore
-#endif
 
 /*
  * This is a few instructions into the actual syscall exit path (which actually
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 6ba3cc2ef8ab..768f133de4f1 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1521,8 +1521,6 @@ EXC_COMMON(trap_0b_common, 0xb00, unknown_exception)
  * system call / hypercall (0xc00, 0x4c00)
  *
  * The system call exception is invoked with "sc 0" and does not alter HV bit.
- * There is support for kernel code to invoke system calls but there are no
- * in-tree users.
  *
  * The hypercall is invoked with "sc 1" and sets HV=1.
  *
-- 
2.22.0



* [PATCH 2/4] powerpc/64s: remove support for kernel-mode syscalls
  2019-08-27  3:30 [PATCH 0/4] powerpc/64: syscalls in C Nicholas Piggin
  2019-08-27  3:30 ` [PATCH 1/4] powerpc: convert to copy_thread_tls Nicholas Piggin
  2019-08-27  3:30 ` [PATCH 2/4] powerpc/64: remove support for kernel-mode syscalls Nicholas Piggin
@ 2019-08-27  3:30 ` Nicholas Piggin
  2019-08-27  6:14   ` Christophe Leroy
  2019-08-27  3:30 ` [PATCH 3/4] powerpc/64: system call remove non-volatile GPR save optimisation Nicholas Piggin
  2019-08-27  3:30 ` [PATCH 4/4] powerpc/64: system call implement the bulk of the logic in C Nicholas Piggin
  4 siblings, 1 reply; 19+ messages in thread
From: Nicholas Piggin @ 2019-08-27  3:30 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

There is support for the kernel to execute the 'sc 0' instruction and
make a system call to itself. This is a relic that is unused in the
tree, therefore untested. It's also highly questionable for modules to
be doing this.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/entry_64.S       | 21 ++++++---------------
 arch/powerpc/kernel/exceptions-64s.S |  2 --
 2 files changed, 6 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 0a0b5310f54a..6467bdab8d40 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -69,24 +69,20 @@ BEGIN_FTR_SECTION
 	bne	.Ltabort_syscall
 END_FTR_SECTION_IFSET(CPU_FTR_TM)
 #endif
-	andi.	r10,r12,MSR_PR
 	mr	r10,r1
-	addi	r1,r1,-INT_FRAME_SIZE
-	beq-	1f
 	ld	r1,PACAKSAVE(r13)
-1:	std	r10,0(r1)
+	std	r10,0(r1)
 	std	r11,_NIP(r1)
 	std	r12,_MSR(r1)
 	std	r0,GPR0(r1)
 	std	r10,GPR1(r1)
-	beq	2f			/* if from kernel mode */
 #ifdef CONFIG_PPC_FSL_BOOK3E
 START_BTB_FLUSH_SECTION
 	BTB_FLUSH(r10)
 END_BTB_FLUSH_SECTION
 #endif
 	ACCOUNT_CPU_USER_ENTRY(r13, r10, r11)
-2:	std	r2,GPR2(r1)
+	std	r2,GPR2(r1)
 	std	r3,GPR3(r1)
 	mfcr	r2
 	std	r4,GPR4(r1)
@@ -122,14 +118,13 @@ END_BTB_FLUSH_SECTION
 
 #if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) && defined(CONFIG_PPC_SPLPAR)
 BEGIN_FW_FTR_SECTION
-	beq	33f
-	/* if from user, see if there are any DTL entries to process */
+	/* see if there are any DTL entries to process */
 	ld	r10,PACALPPACAPTR(r13)	/* get ptr to VPA */
 	ld	r11,PACA_DTL_RIDX(r13)	/* get log read index */
 	addi	r10,r10,LPPACA_DTLIDX
 	LDX_BE	r10,0,r10		/* get log write index */
-	cmpd	cr1,r11,r10
-	beq+	cr1,33f
+	cmpd	r11,r10
+	beq+	33f
 	bl	accumulate_stolen_time
 	REST_GPR(0,r1)
 	REST_4GPRS(3,r1)
@@ -203,6 +198,7 @@ system_call:			/* label this so stack traces look sane */
 	mtctr   r12
 	bctrl			/* Call handler */
 
+	/* syscall_exit can exit to kernel mode, via ret_from_kernel_thread */
 .Lsyscall_exit:
 	std	r3,RESULT(r1)
 
@@ -216,11 +212,6 @@ system_call:			/* label this so stack traces look sane */
 	ld	r12, PACA_THREAD_INFO(r13)
 
 	ld	r8,_MSR(r1)
-#ifdef CONFIG_PPC_BOOK3S
-	/* No MSR:RI on BookE */
-	andi.	r10,r8,MSR_RI
-	beq-	.Lunrecov_restore
-#endif
 
 /*
  * This is a few instructions into the actual syscall exit path (which actually
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 6ba3cc2ef8ab..768f133de4f1 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1521,8 +1521,6 @@ EXC_COMMON(trap_0b_common, 0xb00, unknown_exception)
  * system call / hypercall (0xc00, 0x4c00)
  *
  * The system call exception is invoked with "sc 0" and does not alter HV bit.
- * There is support for kernel code to invoke system calls but there are no
- * in-tree users.
  *
  * The hypercall is invoked with "sc 1" and sets HV=1.
  *
-- 
2.22.0



* [PATCH 3/4] powerpc/64: system call remove non-volatile GPR save optimisation
  2019-08-27  3:30 [PATCH 0/4] powerpc/64: syscalls in C Nicholas Piggin
                   ` (2 preceding siblings ...)
  2019-08-27  3:30 ` [PATCH 2/4] powerpc/64s: " Nicholas Piggin
@ 2019-08-27  3:30 ` Nicholas Piggin
  2019-08-27  8:10   ` Segher Boessenkool
  2019-08-27  3:30 ` [PATCH 4/4] powerpc/64: system call implement the bulk of the logic in C Nicholas Piggin
  4 siblings, 1 reply; 19+ messages in thread
From: Nicholas Piggin @ 2019-08-27  3:30 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

powerpc has an optimisation where interrupts avoid saving the
non-volatile (or callee saved) registers to the interrupt stack frame if
they are not required.

Two problems with this are that an interrupt does not always know
whether it will need non-volatiles; and if it does need them, they can
only be saved from the entry-scoped asm code (because we don't control
what the C compiler does with these registers).

System calls are about the worst example of this: some system calls
always require all registers (e.g., fork, to copy regs into the child),
while others only require them under certain conditions (e.g., tracing,
signal delivery). This causes weird, ugly logic in the call chains
(e.g., ppc_fork), and also prevents moving substantial amounts of logic
to C.
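
To illustrate the fork case: copy_thread() copies the parent's pt_regs
wholesale into the child, non-volatiles included, so they must have been
saved at entry. A rough sketch of the user-thread path in process.c:

	struct pt_regs *regs = current_pt_regs();

	CHECK_FULL_REGS(regs);	/* warns if r14-r31 were never saved */
	*childregs = *regs;	/* child inherits every GPR */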

Remove this optimisation for system calls, and always save NVGPRs on
entry. On modern CPUs this doesn't appear to be worth the complexity,
because the stores can be hidden by other work quite well: the null
syscall selftests benchmark on a POWER9 measures 124.40ns before and
123.64ns after, i.e., well within the noise.
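
(The null syscall numbers come from the powerpc selftests; the measurement
is essentially a tight loop over a trivial syscall, along the lines of this
standalone sketch, not the selftest itself:

	#include <stdio.h>
	#include <sys/time.h>
	#include <unistd.h>

	int main(void)
	{
		unsigned long i, iters = 10000000;
		struct timeval start, end;
		double ns;

		gettimeofday(&start, NULL);
		for (i = 0; i < iters; i++)
			getppid();	/* "null" syscall: no real work */
		gettimeofday(&end, NULL);

		ns = (end.tv_sec - start.tv_sec) * 1e9 +
		     (end.tv_usec - start.tv_usec) * 1e3;
		printf("%.2f ns per syscall\n", ns / iters);
		return 0;
	}
)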

Other interrupts are a little less complex in their NVGPR handling: many
do less work at entry, and they don't have as clear a fast path for
avoiding the GPR restore as system calls do, so this patch does not
remove the facility completely until more testing can be done.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/entry_64.S           | 72 +++++-------------------
 arch/powerpc/kernel/syscalls/syscall.tbl | 22 +++++---
 2 files changed, 28 insertions(+), 66 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 6467bdab8d40..5a3e0b5c9ad1 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -98,13 +98,14 @@ END_BTB_FLUSH_SECTION
 	std	r11,_XER(r1)
 	std	r11,_CTR(r1)
 	std	r9,GPR13(r1)
+	SAVE_NVGPRS(r1)
 	mflr	r10
 	/*
 	 * This clears CR0.SO (bit 28), which is the error indication on
 	 * return from this system call.
 	 */
 	rldimi	r2,r11,28,(63-28)
-	li	r11,0xc01
+	li	r11,0xc00
 	std	r10,_LINK(r1)
 	std	r11,_TRAP(r1)
 	std	r3,ORIG_GPR3(r1)
@@ -323,7 +324,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
 /* Traced system call support */
 .Lsyscall_dotrace:
-	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	do_syscall_trace_enter
 
@@ -408,7 +408,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	mtmsrd	r10,1
 #endif /* CONFIG_PPC_BOOK3E */
 
-	bl	save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	do_syscall_trace_leave
 	b	ret_from_except
@@ -442,62 +441,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 _ASM_NOKPROBE_SYMBOL(system_call_common);
 _ASM_NOKPROBE_SYMBOL(system_call_exit);
 
-/* Save non-volatile GPRs, if not already saved. */
-_GLOBAL(save_nvgprs)
-	ld	r11,_TRAP(r1)
-	andi.	r0,r11,1
-	beqlr-
-	SAVE_NVGPRS(r1)
-	clrrdi	r0,r11,1
-	std	r0,_TRAP(r1)
-	blr
-_ASM_NOKPROBE_SYMBOL(save_nvgprs);
-
-	
-/*
- * The sigsuspend and rt_sigsuspend system calls can call do_signal
- * and thus put the process into the stopped state where we might
- * want to examine its user state with ptrace.  Therefore we need
- * to save all the nonvolatile registers (r14 - r31) before calling
- * the C code.  Similarly, fork, vfork and clone need the full
- * register state on the stack so that it can be copied to the child.
- */
-
-_GLOBAL(ppc_fork)
-	bl	save_nvgprs
-	bl	sys_fork
-	b	.Lsyscall_exit
-
-_GLOBAL(ppc_vfork)
-	bl	save_nvgprs
-	bl	sys_vfork
-	b	.Lsyscall_exit
-
-_GLOBAL(ppc_clone)
-	bl	save_nvgprs
-	bl	sys_clone
-	b	.Lsyscall_exit
-
-_GLOBAL(ppc_clone3)
-       bl      save_nvgprs
-       bl      sys_clone3
-       b       .Lsyscall_exit
-
-_GLOBAL(ppc32_swapcontext)
-	bl	save_nvgprs
-	bl	compat_sys_swapcontext
-	b	.Lsyscall_exit
-
-_GLOBAL(ppc64_swapcontext)
-	bl	save_nvgprs
-	bl	sys_swapcontext
-	b	.Lsyscall_exit
-
-_GLOBAL(ppc_switch_endian)
-	bl	save_nvgprs
-	bl	sys_switch_endian
-	b	.Lsyscall_exit
-
 _GLOBAL(ret_from_fork)
 	bl	schedule_tail
 	REST_NVGPRS(r1)
@@ -516,6 +459,17 @@ _GLOBAL(ret_from_kernel_thread)
 	li	r3,0
 	b	.Lsyscall_exit
 
+/* Save non-volatile GPRs, if not already saved. */
+_GLOBAL(save_nvgprs)
+	ld	r11,_TRAP(r1)
+	andi.	r0,r11,1
+	beqlr-
+	SAVE_NVGPRS(r1)
+	clrrdi	r0,r11,1
+	std	r0,_TRAP(r1)
+	blr
+_ASM_NOKPROBE_SYMBOL(save_nvgprs);
+
 #ifdef CONFIG_PPC_BOOK3S_64
 
 #define FLUSH_COUNT_CACHE	\
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 43f736ed47f2..010b9f445586 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -9,7 +9,9 @@
 #
 0	nospu	restart_syscall			sys_restart_syscall
 1	nospu	exit				sys_exit
-2	nospu	fork				ppc_fork
+2	32	fork				ppc_fork			sys_fork
+2	64	fork				sys_fork
+2	spu	fork				sys_ni_syscall
 3	common	read				sys_read
 4	common	write				sys_write
 5	common	open				sys_open			compat_sys_open
@@ -158,7 +160,9 @@
 119	32	sigreturn			sys_sigreturn			compat_sys_sigreturn
 119	64	sigreturn			sys_ni_syscall
 119	spu	sigreturn			sys_ni_syscall
-120	nospu	clone				ppc_clone
+120	32	clone				ppc_clone			sys_clone
+120	64	clone				sys_clone
+120	spu	clone				sys_ni_syscall
 121	common	setdomainname			sys_setdomainname
 122	common	uname				sys_newuname
 123	common	modify_ldt			sys_ni_syscall
@@ -240,7 +244,9 @@
 186	spu	sendfile			sys_sendfile64
 187	common	getpmsg				sys_ni_syscall
 188	common 	putpmsg				sys_ni_syscall
-189	nospu	vfork				ppc_vfork
+189	32	vfork				ppc_vfork			sys_vfork
+189	64	vfork				sys_vfork
+189	spu	vfork				sys_ni_syscall
 190	common	ugetrlimit			sys_getrlimit			compat_sys_getrlimit
 191	common	readahead			sys_readahead			compat_sys_readahead
 192	32	mmap2				sys_mmap2			compat_sys_mmap2
@@ -316,8 +322,8 @@
 248	32	clock_nanosleep			sys_clock_nanosleep_time32
 248	64	clock_nanosleep			sys_clock_nanosleep
 248	spu	clock_nanosleep			sys_clock_nanosleep
-249	32	swapcontext			ppc_swapcontext			ppc32_swapcontext
-249	64	swapcontext			ppc64_swapcontext
+249	32	swapcontext			ppc_swapcontext			compat_sys_swapcontext
+249	64	swapcontext			sys_swapcontext
 249	spu	swapcontext			sys_ni_syscall
 250	common	tgkill				sys_tgkill
 251	32	utimes				sys_utimes_time32
@@ -456,7 +462,7 @@
 361	common	bpf				sys_bpf
 362	nospu	execveat			sys_execveat			compat_sys_execveat
 363	32	switch_endian			sys_ni_syscall
-363	64	switch_endian			ppc_switch_endian
+363	64	switch_endian			sys_switch_endian
 363	spu	switch_endian			sys_ni_syscall
 364	common	userfaultfd			sys_userfaultfd
 365	common	membarrier			sys_membarrier
@@ -516,4 +522,6 @@
 432	common	fsmount				sys_fsmount
 433	common	fspick				sys_fspick
 434	common	pidfd_open			sys_pidfd_open
-435	nospu	clone3				ppc_clone3
+435	32	clone3				ppc_clone3			sys_clone3
+435	64	clone3				sys_clone3
+435	nospu	clone3				sys_ni_syscall
-- 
2.22.0



* [PATCH 4/4] powerpc/64: system call implement the bulk of the logic in C
  2019-08-27  3:30 [PATCH 0/4] powerpc/64: syscalls in C Nicholas Piggin
                   ` (3 preceding siblings ...)
  2019-08-27  3:30 ` [PATCH 3/4] powerpc/64: system call remove non-volatile GPR save optimisation Nicholas Piggin
@ 2019-08-27  3:30 ` Nicholas Piggin
  2019-08-27  6:46   ` Christophe Leroy
  2019-08-27 14:35   ` kbuild test robot
  4 siblings, 2 replies; 19+ messages in thread
From: Nicholas Piggin @ 2019-08-27  3:30 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

System call entry and particularly exit code is beyond the limit of what
is reasonable to implement in asm.

This conversion moves all conditional branches out of the asm code,
except a relatively simple test to see whether all GPRs should be
restored at exit time.

The null syscall test is about 5% faster after this patch, because the
exit work is handled under local_irq_disable(), and the hard mask and
pending interrupt replay are handled after that, which avoids games with
the MSR.

The asm instruction scheduling has not really been analysed and
optimised yet, as there are higher level features and improvements to
add first.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/asm-prototypes.h |  11 -
 arch/powerpc/include/asm/ptrace.h         |   3 +
 arch/powerpc/include/asm/signal.h         |   2 +
 arch/powerpc/include/asm/switch_to.h      |   4 +
 arch/powerpc/include/asm/time.h           |   3 +
 arch/powerpc/kernel/Makefile              |   3 +-
 arch/powerpc/kernel/entry_64.S            | 343 ++++------------------
 arch/powerpc/kernel/process.c             |   6 +-
 arch/powerpc/kernel/signal.h              |   2 -
 arch/powerpc/kernel/syscall_64.c          | 202 +++++++++++++
 10 files changed, 271 insertions(+), 308 deletions(-)
 create mode 100644 arch/powerpc/kernel/syscall_64.c

diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index ec1c97a8e8cb..f00ef8924a99 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -92,14 +92,6 @@ long sys_switch_endian(void);
 notrace unsigned int __check_irq_replay(void);
 void notrace restore_interrupts(void);
 
-/* ptrace */
-long do_syscall_trace_enter(struct pt_regs *regs);
-void do_syscall_trace_leave(struct pt_regs *regs);
-
-/* process */
-void restore_math(struct pt_regs *regs);
-void restore_tm_state(struct pt_regs *regs);
-
 /* prom_init (OpenFirmware) */
 unsigned long __init prom_init(unsigned long r3, unsigned long r4,
 			       unsigned long pp,
@@ -110,9 +102,6 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4,
 void __init early_setup(unsigned long dt_ptr);
 void early_setup_secondary(void);
 
-/* time */
-void accumulate_stolen_time(void);
-
 /* misc runtime */
 extern u64 __bswapdi2(u64);
 extern s64 __lshrdi3(s64, int);
diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index feee1b21bbd5..af363086403a 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -138,6 +138,9 @@ extern unsigned long profile_pc(struct pt_regs *regs);
 #define profile_pc(regs) instruction_pointer(regs)
 #endif
 
+long do_syscall_trace_enter(struct pt_regs *regs);
+void do_syscall_trace_leave(struct pt_regs *regs);
+
 #define kernel_stack_pointer(regs) ((regs)->gpr[1])
 static inline int is_syscall_success(struct pt_regs *regs)
 {
diff --git a/arch/powerpc/include/asm/signal.h b/arch/powerpc/include/asm/signal.h
index 0803ca8b9149..0113be8dcb59 100644
--- a/arch/powerpc/include/asm/signal.h
+++ b/arch/powerpc/include/asm/signal.h
@@ -6,4 +6,6 @@
 #include <uapi/asm/signal.h>
 #include <uapi/asm/ptrace.h>
 
+void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags);
+
 #endif /* _ASM_POWERPC_SIGNAL_H */
diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index 5b03d8a82409..231fcb2058cf 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -5,6 +5,7 @@
 #ifndef _ASM_POWERPC_SWITCH_TO_H
 #define _ASM_POWERPC_SWITCH_TO_H
 
+#include <linux/sched.h>
 #include <asm/reg.h>
 
 struct thread_struct;
@@ -22,6 +23,9 @@ extern void switch_booke_debug_regs(struct debug_reg *new_debug);
 
 extern int emulate_altivec(struct pt_regs *);
 
+void restore_math(struct pt_regs *regs);
+void restore_tm_state(struct pt_regs *regs);
+
 extern void flush_all_to_thread(struct task_struct *);
 extern void giveup_all(struct task_struct *);
 
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 54f4ec1f9fab..ba97858c9d76 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -199,5 +199,8 @@ DECLARE_PER_CPU(u64, decrementers_next_tb);
 /* Convert timebase ticks to nanoseconds */
 unsigned long long tb_to_ns(unsigned long long tb_ticks);
 
+/* SPLPAR */
+void accumulate_stolen_time(void);
+
 #endif /* __KERNEL__ */
 #endif /* __POWERPC_TIME_H */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 56dfa7a2a6f2..7f53cc07f933 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -52,7 +52,8 @@ obj-y				:= cputable.o ptrace.o syscalls.o \
 				   of_platform.o prom_parse.o
 obj-$(CONFIG_PPC64)		+= setup_64.o sys_ppc32.o \
 				   signal_64.o ptrace32.o \
-				   paca.o nvram_64.o firmware.o
+				   paca.o nvram_64.o firmware.o \
+				   syscall_64.o
 obj-$(CONFIG_VDSO32)		+= vdso32/
 obj-$(CONFIG_PPC_WATCHDOG)	+= watchdog.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= hw_breakpoint.o
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 5a3e0b5c9ad1..2ec825a85f5b 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -63,12 +63,6 @@ exception_marker:
 
 	.globl system_call_common
 system_call_common:
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-BEGIN_FTR_SECTION
-	extrdi.	r10, r12, 1, (63-MSR_TS_T_LG) /* transaction active? */
-	bne	.Ltabort_syscall
-END_FTR_SECTION_IFSET(CPU_FTR_TM)
-#endif
 	mr	r10,r1
 	ld	r1,PACAKSAVE(r13)
 	std	r10,0(r1)
@@ -76,350 +70,118 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM)
 	std	r12,_MSR(r1)
 	std	r0,GPR0(r1)
 	std	r10,GPR1(r1)
+	std	r2,GPR2(r1)
 #ifdef CONFIG_PPC_FSL_BOOK3E
+	andi.	r10,r12,MSR_PR
+	beq	1f			/* if from kernel mode */
 START_BTB_FLUSH_SECTION
 	BTB_FLUSH(r10)
 END_BTB_FLUSH_SECTION
+1:
 #endif
-	ACCOUNT_CPU_USER_ENTRY(r13, r10, r11)
-	std	r2,GPR2(r1)
+	ld	r2,PACATOC(r13)
+	mfcr	r12
+	li	r11,0
+	/* Can we avoid saving r3-r8 in common case? */
 	std	r3,GPR3(r1)
-	mfcr	r2
 	std	r4,GPR4(r1)
 	std	r5,GPR5(r1)
 	std	r6,GPR6(r1)
 	std	r7,GPR7(r1)
 	std	r8,GPR8(r1)
-	li	r11,0
+	/* Zero r9-r12, this should only be required when restoring all GPRs */
 	std	r11,GPR9(r1)
 	std	r11,GPR10(r1)
 	std	r11,GPR11(r1)
 	std	r11,GPR12(r1)
-	std	r11,_XER(r1)
-	std	r11,_CTR(r1)
 	std	r9,GPR13(r1)
 	SAVE_NVGPRS(r1)
+	std	r11,_XER(r1)
+	std	r11,_CTR(r1)
 	mflr	r10
+
 	/*
 	 * This clears CR0.SO (bit 28), which is the error indication on
 	 * return from this system call.
 	 */
-	rldimi	r2,r11,28,(63-28)
+	rldimi	r12,r11,28,(63-28)
 	li	r11,0xc00
 	std	r10,_LINK(r1)
 	std	r11,_TRAP(r1)
+	std	r12,_CCR(r1)
 	std	r3,ORIG_GPR3(r1)
-	std	r2,_CCR(r1)
-	ld	r2,PACATOC(r13)
-	addi	r9,r1,STACK_FRAME_OVERHEAD
+	addi	r10,r1,STACK_FRAME_OVERHEAD
 	ld	r11,exception_marker@toc(r2)
-	std	r11,-16(r9)		/* "regshere" marker */
-
-	kuap_check_amr r10, r11
-
-#if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) && defined(CONFIG_PPC_SPLPAR)
-BEGIN_FW_FTR_SECTION
-	/* see if there are any DTL entries to process */
-	ld	r10,PACALPPACAPTR(r13)	/* get ptr to VPA */
-	ld	r11,PACA_DTL_RIDX(r13)	/* get log read index */
-	addi	r10,r10,LPPACA_DTLIDX
-	LDX_BE	r10,0,r10		/* get log write index */
-	cmpd	r11,r10
-	beq+	33f
-	bl	accumulate_stolen_time
-	REST_GPR(0,r1)
-	REST_4GPRS(3,r1)
-	REST_2GPRS(7,r1)
-	addi	r9,r1,STACK_FRAME_OVERHEAD
-33:
-END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
-#endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE && CONFIG_PPC_SPLPAR */
-
-	/*
-	 * A syscall should always be called with interrupts enabled
-	 * so we just unconditionally hard-enable here. When some kind
-	 * of irq tracing is used, we additionally check that condition
-	 * is correct
-	 */
-#if defined(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG) && defined(CONFIG_BUG)
-	lbz	r10,PACAIRQSOFTMASK(r13)
-1:	tdnei	r10,IRQS_ENABLED
-	EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
-#endif
-
-#ifdef CONFIG_PPC_BOOK3E
-	wrteei	1
-#else
-	li	r11,MSR_RI
-	ori	r11,r11,MSR_EE
-	mtmsrd	r11,1
-#endif /* CONFIG_PPC_BOOK3E */
-
-system_call:			/* label this so stack traces look sane */
-	/* We do need to set SOFTE in the stack frame or the return
-	 * from interrupt will be painful
-	 */
-	li	r10,IRQS_ENABLED
-	std	r10,SOFTE(r1)
-
-	ld	r11, PACA_THREAD_INFO(r13)
-	ld	r10,TI_FLAGS(r11)
-	andi.	r11,r10,_TIF_SYSCALL_DOTRACE
-	bne	.Lsyscall_dotrace		/* does not return */
-	cmpldi	0,r0,NR_syscalls
-	bge-	.Lsyscall_enosys
-
-.Lsyscall:
-/*
- * Need to vector to 32 Bit or default sys_call_table here,
- * based on caller's run-mode / personality.
- */
-	ld	r11,SYS_CALL_TABLE@toc(2)
-	andis.	r10,r10,_TIF_32BIT@h
-	beq	15f
-	ld	r11,COMPAT_SYS_CALL_TABLE@toc(2)
-	clrldi	r3,r3,32
-	clrldi	r4,r4,32
-	clrldi	r5,r5,32
-	clrldi	r6,r6,32
-	clrldi	r7,r7,32
-	clrldi	r8,r8,32
-15:
-	slwi	r0,r0,3
-
-	barrier_nospec_asm
-	/*
-	 * Prevent the load of the handler below (based on the user-passed
-	 * system call number) being speculatively executed until the test
-	 * against NR_syscalls and branch to .Lsyscall_enosys above has
-	 * committed.
-	 */
+	std	r11,-16(r10)		/* "regshere" marker */
 
-	ldx	r12,r11,r0	/* Fetch system call handler [ptr] */
-	mtctr   r12
-	bctrl			/* Call handler */
+	/* Calling convention has r9 = orig r0, r10 = regs */
+	mr	r9,r0
+	bl	system_call_exception
 
-	/* syscall_exit can exit to kernel mode, via ret_from_kernel_thread */
 .Lsyscall_exit:
-	std	r3,RESULT(r1)
-
-#ifdef CONFIG_DEBUG_RSEQ
-	/* Check whether the syscall is issued inside a restartable sequence */
-	addi    r3,r1,STACK_FRAME_OVERHEAD
-	bl      rseq_syscall
-	ld	r3,RESULT(r1)
-#endif
-
-	ld	r12, PACA_THREAD_INFO(r13)
-
-	ld	r8,_MSR(r1)
-
-/*
- * This is a few instructions into the actual syscall exit path (which actually
- * starts at .Lsyscall_exit) to cater to kprobe blacklisting and to reduce the
- * number of visible symbols for profiling purposes.
- *
- * We can probe from system_call until this point as MSR_RI is set. But once it
- * is cleared below, we won't be able to take a trap.
- *
- * This is blacklisted from kprobes further below with _ASM_NOKPROBE_SYMBOL().
- */
-system_call_exit:
-	/*
-	 * Disable interrupts so current_thread_info()->flags can't change,
-	 * and so that we don't get interrupted after loading SRR0/1.
-	 *
-	 * Leave MSR_RI enabled for now, because with THREAD_INFO_IN_TASK we
-	 * could fault on the load of the TI_FLAGS below.
-	 */
-#ifdef CONFIG_PPC_BOOK3E
-	wrteei	0
-#else
-	li	r11,MSR_RI
-	mtmsrd	r11,1
-#endif /* CONFIG_PPC_BOOK3E */
+	addi    r4,r1,STACK_FRAME_OVERHEAD
+	bl	syscall_exit_prepare
 
-	ld	r9,TI_FLAGS(r12)
-	li	r11,-MAX_ERRNO
-	andi.	r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
-	bne-	.Lsyscall_exit_work
+	ld	r2,_CCR(r1)
+	ld	r4,_NIP(r1)
+	ld	r5,_MSR(r1)
+	ld	r6,_LINK(r1)
 
-	andi.	r0,r8,MSR_FP
-	beq 2f
-#ifdef CONFIG_ALTIVEC
-	andis.	r0,r8,MSR_VEC@h
-	bne	3f
-#endif
-2:	addi    r3,r1,STACK_FRAME_OVERHEAD
-	bl	restore_math
-	ld	r8,_MSR(r1)
-	ld	r3,RESULT(r1)
-	li	r11,-MAX_ERRNO
-
-3:	cmpld	r3,r11
-	ld	r5,_CCR(r1)
-	bge-	.Lsyscall_error
-.Lsyscall_error_cont:
-	ld	r7,_NIP(r1)
 BEGIN_FTR_SECTION
 	stdcx.	r0,0,r1			/* to clear the reservation */
 END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
-	andi.	r6,r8,MSR_PR
-	ld	r4,_LINK(r1)
 
-	kuap_check_amr r10, r11
-
-#ifdef CONFIG_PPC_BOOK3S
-	/*
-	 * Clear MSR_RI, MSR_EE is already and remains disabled. We could do
-	 * this later, but testing shows that doing it here causes less slow
-	 * down than doing it closer to the rfid.
-	 */
-	li	r11,0
-	mtmsrd	r11,1
+	mtspr	SPRN_SRR0,r4
+	mtspr	SPRN_SRR1,r5
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+	std	r5,PACATMSCRATCH(r13)
 #endif
+	mtlr	r6
 
-	beq-	1f
-	ACCOUNT_CPU_USER_EXIT(r13, r11, r12)
+	cmpdi	r3,0
+	bne	syscall_restore_regs
+.Lsyscall_restore_regs_cont:
 
 BEGIN_FTR_SECTION
 	HMT_MEDIUM_LOW
 END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-	std	r8, PACATMSCRATCH(r13)
-#endif
-
 	/*
 	 * We don't need to restore AMR on the way back to userspace for KUAP.
 	 * The value of AMR only matters while we're in the kernel.
 	 */
-	ld	r13,GPR13(r1)	/* only restore r13 if returning to usermode */
+	mtcr	r2
 	ld	r2,GPR2(r1)
+	ld	r3,GPR3(r1)
+	ld	r13,GPR13(r1)
 	ld	r1,GPR1(r1)
-	mtlr	r4
-	mtcr	r5
-	mtspr	SPRN_SRR0,r7
-	mtspr	SPRN_SRR1,r8
 	RFI_TO_USER
 	b	.	/* prevent speculative execution */
+_ASM_NOKPROBE_SYMBOL(system_call_common);
 
-1:	/* exit to kernel */
-	kuap_restore_amr r2
-
-	ld	r2,GPR2(r1)
-	ld	r1,GPR1(r1)
-	mtlr	r4
-	mtcr	r5
-	mtspr	SPRN_SRR0,r7
-	mtspr	SPRN_SRR1,r8
-	RFI_TO_KERNEL
-	b	.	/* prevent speculative execution */
-
-.Lsyscall_error:
-	oris	r5,r5,0x1000	/* Set SO bit in CR */
-	neg	r3,r3
-	std	r5,_CCR(r1)
-	b	.Lsyscall_error_cont
-
-/* Traced system call support */
-.Lsyscall_dotrace:
-	addi	r3,r1,STACK_FRAME_OVERHEAD
-	bl	do_syscall_trace_enter
-
-	/*
-	 * We use the return value of do_syscall_trace_enter() as the syscall
-	 * number. If the syscall was rejected for any reason do_syscall_trace_enter()
-	 * returns an invalid syscall number and the test below against
-	 * NR_syscalls will fail.
-	 */
-	mr	r0,r3
-
-	/* Restore argument registers just clobbered and/or possibly changed. */
-	ld	r3,GPR3(r1)
-	ld	r4,GPR4(r1)
-	ld	r5,GPR5(r1)
-	ld	r6,GPR6(r1)
-	ld	r7,GPR7(r1)
-	ld	r8,GPR8(r1)
-
-	/* Repopulate r9 and r10 for the syscall path */
-	addi	r9,r1,STACK_FRAME_OVERHEAD
-	ld	r10, PACA_THREAD_INFO(r13)
-	ld	r10,TI_FLAGS(r10)
-
-	cmpldi	r0,NR_syscalls
-	blt+	.Lsyscall
-
-	/* Return code is already in r3 thanks to do_syscall_trace_enter() */
-	b	.Lsyscall_exit
-
-
-.Lsyscall_enosys:
-	li	r3,-ENOSYS
-	b	.Lsyscall_exit
-	
-.Lsyscall_exit_work:
-	/* If TIF_RESTOREALL is set, don't scribble on either r3 or ccr.
-	 If TIF_NOERROR is set, just save r3 as it is. */
-
-	andi.	r0,r9,_TIF_RESTOREALL
-	beq+	0f
+syscall_restore_regs:
+	ld	r3,_CTR(r1)
+	ld	r4,_XER(r1)
 	REST_NVGPRS(r1)
-	b	2f
-0:	cmpld	r3,r11		/* r11 is -MAX_ERRNO */
-	blt+	1f
-	andi.	r0,r9,_TIF_NOERROR
-	bne-	1f
-	ld	r5,_CCR(r1)
-	neg	r3,r3
-	oris	r5,r5,0x1000	/* Set SO bit in CR */
-	std	r5,_CCR(r1)
-1:	std	r3,GPR3(r1)
-2:	andi.	r0,r9,(_TIF_PERSYSCALL_MASK)
-	beq	4f
-
-	/* Clear per-syscall TIF flags if any are set.  */
-
-	li	r11,_TIF_PERSYSCALL_MASK
-	addi	r12,r12,TI_FLAGS
-3:	ldarx	r10,0,r12
-	andc	r10,r10,r11
-	stdcx.	r10,0,r12
-	bne-	3b
-	subi	r12,r12,TI_FLAGS
-
-4:	/* Anything else left to do? */
-BEGIN_FTR_SECTION
-	lis	r3,DEFAULT_PPR@highest	/* Set default PPR */
-	sldi	r3,r3,32	/* bits 11-13 are used for ppr */
-	std	r3,_PPR(r1)
-END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
-
-	andi.	r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP)
-	beq	ret_from_except_lite
-
-	/* Re-enable interrupts */
-#ifdef CONFIG_PPC_BOOK3E
-	wrteei	1
-#else
-	li	r10,MSR_RI
-	ori	r10,r10,MSR_EE
-	mtmsrd	r10,1
-#endif /* CONFIG_PPC_BOOK3E */
-
-	addi	r3,r1,STACK_FRAME_OVERHEAD
-	bl	do_syscall_trace_leave
-	b	ret_from_except
+	mtctr	r3
+	mtspr	SPRN_XER,r4
+	ld	r0,GPR0(r1)
+	REST_8GPRS(4, r1)
+	ld	r12,GPR12(r1)
+	b	.Lsyscall_restore_regs_cont
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-.Ltabort_syscall:
+	.globl tabort_syscall
+tabort_syscall:
 	/* Firstly we need to enable TM in the kernel */
 	mfmsr	r10
 	li	r9, 1
 	rldimi	r10, r9, MSR_TM_LG, 63-MSR_TM_LG
 	mtmsrd	r10, 0
 
+	ld	r11,_NIP(r13)
+	ld	r12,_MSR(r13)
+
 	/* tabort, this dooms the transaction, nothing else */
 	li	r9, (TM_CAUSE_SYSCALL|TM_CAUSE_PERSISTENT)
 	TABORT(R9)
@@ -438,9 +200,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	RFI_TO_USER
 	b	.	/* prevent speculative execution */
 #endif
-_ASM_NOKPROBE_SYMBOL(system_call_common);
-_ASM_NOKPROBE_SYMBOL(system_call_exit);
-
 _GLOBAL(ret_from_fork)
 	bl	schedule_tail
 	REST_NVGPRS(r1)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 24621e7e5033..f444525da9ce 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1609,7 +1609,6 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp,
 	extern void ret_from_kernel_thread(void);
 	void (*f)(void);
 	unsigned long sp = (unsigned long)task_stack_page(p) + THREAD_SIZE;
-	struct thread_info *ti = task_thread_info(p);
 
 	klp_init_thread_info(p);
 
@@ -1617,6 +1616,8 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp,
 	sp -= sizeof(struct pt_regs);
 	childregs = (struct pt_regs *) sp;
 	if (unlikely(p->flags & PF_KTHREAD)) {
+		struct thread_info *ti = task_thread_info(p);
+
 		/* kernel thread */
 		memset(childregs, 0, sizeof(struct pt_regs));
 		childregs->gpr[1] = sp + sizeof(struct pt_regs);
@@ -1634,12 +1635,13 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp,
 	} else {
 		/* user thread */
 		struct pt_regs *regs = current_pt_regs();
+
 		CHECK_FULL_REGS(regs);
 		*childregs = *regs;
 		if (usp)
 			childregs->gpr[1] = usp;
 		p->thread.regs = childregs;
-		childregs->gpr[3] = 0;  /* Result from fork() */
+		/* ret_from_fork sets fork() result to 0 */
 		if (clone_flags & CLONE_SETTLS) {
 #ifdef CONFIG_PPC64
 			if (!is_32bit_task())
diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h
index 800433685888..d396efca4068 100644
--- a/arch/powerpc/kernel/signal.h
+++ b/arch/powerpc/kernel/signal.h
@@ -10,8 +10,6 @@
 #ifndef _POWERPC_ARCH_SIGNAL_H
 #define _POWERPC_ARCH_SIGNAL_H
 
-extern void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags);
-
 extern void __user *get_sigframe(struct ksignal *ksig, unsigned long sp,
 				  size_t frame_size, int is_32);
 
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
new file mode 100644
index 000000000000..98ed970796d5
--- /dev/null
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -0,0 +1,202 @@
+#include <linux/err.h>
+#include <asm/hw_irq.h>
+#include <asm/paca.h>
+#include <asm/ptrace.h>
+#include <asm/reg.h>
+#include <asm/signal.h>
+#include <asm/switch_to.h>
+#include <asm/syscall.h>
+#include <asm/time.h>
+
+extern void __noreturn tabort_syscall(void);
+
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+static inline void account_cpu_user_entry(void)
+{
+	unsigned long tb = mftb();
+
+	local_paca->accounting.utime += (tb - local_paca->accounting.starttime_user);
+	local_paca->accounting.starttime = tb;
+}
+static inline void account_cpu_user_exit(void)
+{
+	unsigned long tb = mftb();
+
+	local_paca->accounting.stime += (tb - local_paca->accounting.starttime);
+	local_paca->accounting.starttime_user = tb;
+}
+#else
+static inline void account_cpu_user_entry(void)
+{
+}
+static inline void account_cpu_user_exit(void)
+{
+}
+#endif
+
+typedef long (*syscall_fn)(long, long, long, long, long, long);
+
+long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs)
+{
+	unsigned long ti_flags;
+	syscall_fn f;
+
+	BUG_ON(!(regs->msr & MSR_PR));
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+	if (unlikely(regs->msr & MSR_TS_T))
+		tabort_syscall();
+#endif
+
+	account_cpu_user_entry();
+
+#if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) && defined(CONFIG_PPC_SPLPAR)
+	if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
+		struct lppaca *lp = get_lppaca();
+
+		if (unlikely(local_paca->dtl_ridx != lp->dtl_idx))
+			accumulate_stolen_time();
+	}
+#endif
+
+#ifdef CONFIG_PPC_KUAP_DEBUG
+	if (mmu_has_feature(MMU_FTR_RADIX_KUAP))
+		WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED);
+#endif
+
+	/*
+	 * A syscall should always be called with interrupts enabled
+	 * so we just unconditionally hard-enable here. When some kind
+	 * of irq tracing is used, we additionally check that condition
+	 * is correct
+	 */
+#if defined(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)
+	WARN_ON(irq_soft_mask_return() != IRQS_ENABLED);
+	WARN_ON(local_paca->irq_happened);
+#endif
+
+	__hard_irq_enable();
+
+	/*
+	 * We do need to set SOFTE in the stack frame or the return
+	 * from interrupt will be painful
+	 */
+	regs->softe = IRQS_ENABLED;
+
+	ti_flags = current_thread_info()->flags;
+	if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
+		/*
+		 * We use the return value of do_syscall_trace_enter() as the
+		 * syscall number. If the syscall was rejected for any reason
+		 * do_syscall_trace_enter() returns an invalid syscall number
+		 * and the test below against NR_syscalls will fail.
+		 */
+		r0 = do_syscall_trace_enter(regs);
+	}
+
+	if (unlikely(r0 >= NR_syscalls))
+		return -ENOSYS;
+
+	/* May be faster to do array_index_nospec? */
+	barrier_nospec();
+
+	if (unlikely(ti_flags & _TIF_32BIT)) {
+		f = (void *)compat_sys_call_table[r0];
+
+		r3 &= 0x00000000ffffffffULL;
+		r4 &= 0x00000000ffffffffULL;
+		r5 &= 0x00000000ffffffffULL;
+		r6 &= 0x00000000ffffffffULL;
+		r7 &= 0x00000000ffffffffULL;
+		r8 &= 0x00000000ffffffffULL;
+
+	} else {
+		f = (void *)sys_call_table[r0];
+	}
+
+	return f(r3, r4, r5, r6, r7, r8);
+}
+
+unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs)
+{
+	unsigned long *ti_flagsp = &current_thread_info()->flags;
+	unsigned long ti_flags;
+	unsigned long ret = 0;
+
+	regs->result = r3;
+
+	/* Check whether the syscall is issued inside a restartable sequence */
+	rseq_syscall(regs);
+
+	ti_flags = *ti_flagsp;
+	if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE))
+		do_syscall_trace_leave(regs);
+
+	if (unlikely(r3 >= (unsigned long)-MAX_ERRNO)) {
+		if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
+			r3 = -r3;
+			regs->ccr |= 0x10000000; /* Set SO bit in CR */
+		}
+	}
+
+	if (unlikely(ti_flags & _TIF_PERSYSCALL_MASK)) {
+		if (ti_flags & _TIF_RESTOREALL)
+			ret = _TIF_RESTOREALL;
+		else
+			regs->gpr[3] = r3;
+		clear_bits(_TIF_PERSYSCALL_MASK, ti_flagsp);
+	} else {
+		regs->gpr[3] = r3;
+	}
+
+again:
+	local_irq_disable();
+	ti_flags = READ_ONCE(*ti_flagsp);
+	while (unlikely(ti_flags & _TIF_USER_WORK_MASK)) {
+		local_irq_enable();
+		if (ti_flags & _TIF_NEED_RESCHED) {
+			schedule();
+		} else {
+			/*
+			 * SIGPENDING must restore signal handler function
+			 * argument GPRs, and some non-volatiles (e.g., r1).
+			 * Restore all for now. This could be made lighter.
+			 */
+			if (ti_flags & _TIF_SIGPENDING)
+				ret |= _TIF_RESTOREALL;
+			do_notify_resume(regs, ti_flags);
+		}
+		local_irq_disable();
+		ti_flags = READ_ONCE(*ti_flagsp);
+	}
+
+#ifdef CONFIG_ALTIVEC
+	if ((regs->msr & (MSR_FP|MSR_VEC)) != (MSR_FP|MSR_VEC))
+#else
+	if ((regs->msr & MSR_FP) != MSR_FP)
+#endif
+		restore_math(regs);
+
+	/* This pattern matches prep_irq_for_idle */
+	__mtmsrd(0, 1);	/* Disable MSR_EE and MSR_RI */
+	if (unlikely(lazy_irq_pending())) {
+		__mtmsrd(MSR_RI, 1);
+		local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
+		local_irq_enable();
+		/* Took an interrupt which may have more exit work to do. */
+		goto again;
+	}
+	trace_hardirqs_on();
+	local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
+	irq_soft_mask_set(IRQS_ENABLED);
+
+	account_cpu_user_exit();
+
+#ifdef CONFIG_PPC_KUAP_DEBUG
+	if (mmu_has_feature(MMU_FTR_RADIX_KUAP))
+		WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED);
+#endif
+
+	return ret;
+}
+
-- 
2.22.0



* Re: [PATCH 1/4] powerpc: convert to copy_thread_tls
  2019-08-27  3:30 ` [PATCH 1/4] powerpc: convert to copy_thread_tls Nicholas Piggin
@ 2019-08-27  6:07   ` Christophe Leroy
  2019-08-27 10:13     ` Nicholas Piggin
  2019-09-02  3:29   ` Michael Ellerman
  1 sibling, 1 reply; 19+ messages in thread
From: Christophe Leroy @ 2019-08-27  6:07 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev



On 27/08/2019 at 05:30, Nicholas Piggin wrote:
> Commit 3033f14ab78c3 ("clone: support passing tls argument via C rather
> than pt_regs magic") introduced the HAVE_COPY_THREAD_TLS option. Use it
> to avoid a subtle assumption about the argument ordering of clone type
> syscalls.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/Kconfig          | 1 +
>   arch/powerpc/kernel/process.c | 9 +++++----
>   2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index d8dcd8820369..7477a3263225 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -182,6 +182,7 @@ config PPC
>   	select HAVE_STACKPROTECTOR		if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
>   	select HAVE_STACKPROTECTOR		if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
>   	select HAVE_CONTEXT_TRACKING		if PPC64
> +	select HAVE_COPY_THREAD_TLS
>   	select HAVE_DEBUG_KMEMLEAK
>   	select HAVE_DEBUG_STACKOVERFLOW
>   	select HAVE_DYNAMIC_FTRACE
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 8fc4de0d22b4..24621e7e5033 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1600,8 +1600,9 @@ static void setup_ksp_vsid(struct task_struct *p, unsigned long sp)
>   /*
>    * Copy architecture-specific thread state
>    */
> -int copy_thread(unsigned long clone_flags, unsigned long usp,
> -		unsigned long kthread_arg, struct task_struct *p)
> +int copy_thread_tls(unsigned long clone_flags, unsigned long usp,
> +		unsigned long kthread_arg, struct task_struct *p,
> +		unsigned long tls)
>   {
>   	struct pt_regs *childregs, *kregs;
>   	extern void ret_from_fork(void);
> @@ -1642,10 +1643,10 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
>   		if (clone_flags & CLONE_SETTLS) {
>   #ifdef CONFIG_PPC64

is_32bit_task() exists and always returns 1 on PPC32, so this gross ifdef
in the middle of an if/else is pointless; it should be dropped.
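
Something like this (untested) should work for both, since the compiler
can fold the constant on PPC32:

		if (clone_flags & CLONE_SETTLS) {
			if (!is_32bit_task())
				childregs->gpr[13] = tls;
			else
				childregs->gpr[2] = tls;
		}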

>   			if (!is_32bit_task())
> -				childregs->gpr[13] = childregs->gpr[6];
> +				childregs->gpr[13] = tls;
>   			else
>   #endif
> -				childregs->gpr[2] = childregs->gpr[6];
> +				childregs->gpr[2] = tls;
>   		}
>   
>   		f = ret_from_fork;
> 

Christophe


* Re: [PATCH 2/4] powerpc/64: remove support for kernel-mode syscalls
  2019-08-27  3:30 ` [PATCH 2/4] powerpc/64: remove support for kernel-mode syscalls Nicholas Piggin
@ 2019-08-27  6:13   ` Christophe Leroy
  2019-08-27 10:20     ` Nicholas Piggin
  0 siblings, 1 reply; 19+ messages in thread
From: Christophe Leroy @ 2019-08-27  6:13 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev



On 27/08/2019 at 05:30, Nicholas Piggin wrote:
> There is support for the kernel to execute the 'sc 0' instruction and
> make a system call to itself. This is a relic that is unused in the
> tree, therefore untested. It's also highly questionable for modules to
> be doing this.

I like it.

I dropped support for that in PPC32 when I added fast-path syscalls.

> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/kernel/entry_64.S       | 21 ++++++---------------
>   arch/powerpc/kernel/exceptions-64s.S |  2 --
>   2 files changed, 6 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 0a0b5310f54a..6467bdab8d40 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -69,24 +69,20 @@ BEGIN_FTR_SECTION
>   	bne	.Ltabort_syscall
>   END_FTR_SECTION_IFSET(CPU_FTR_TM)
>   #endif
> -	andi.	r10,r12,MSR_PR
>   	mr	r10,r1
> -	addi	r1,r1,-INT_FRAME_SIZE
> -	beq-	1f
>   	ld	r1,PACAKSAVE(r13)
> -1:	std	r10,0(r1)
> +	std	r10,0(r1)
>   	std	r11,_NIP(r1)
>   	std	r12,_MSR(r1)
>   	std	r0,GPR0(r1)
>   	std	r10,GPR1(r1)
> -	beq	2f			/* if from kernel mode */
>   #ifdef CONFIG_PPC_FSL_BOOK3E
>   START_BTB_FLUSH_SECTION
>   	BTB_FLUSH(r10)
>   END_BTB_FLUSH_SECTION
>   #endif
>   	ACCOUNT_CPU_USER_ENTRY(r13, r10, r11)
> -2:	std	r2,GPR2(r1)
> +	std	r2,GPR2(r1)
>   	std	r3,GPR3(r1)
>   	mfcr	r2
>   	std	r4,GPR4(r1)
> @@ -122,14 +118,13 @@ END_BTB_FLUSH_SECTION
>   
>   #if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) && defined(CONFIG_PPC_SPLPAR)
>   BEGIN_FW_FTR_SECTION
> -	beq	33f
> -	/* if from user, see if there are any DTL entries to process */
> +	/* see if there are any DTL entries to process */
>   	ld	r10,PACALPPACAPTR(r13)	/* get ptr to VPA */
>   	ld	r11,PACA_DTL_RIDX(r13)	/* get log read index */
>   	addi	r10,r10,LPPACA_DTLIDX
>   	LDX_BE	r10,0,r10		/* get log write index */
> -	cmpd	cr1,r11,r10
> -	beq+	cr1,33f
> +	cmpd	r11,r10
> +	beq+	33f

Any need for this change? Why not keep it as is?

Christophe

>   	bl	accumulate_stolen_time
>   	REST_GPR(0,r1)
>   	REST_4GPRS(3,r1)
> @@ -203,6 +198,7 @@ system_call:			/* label this so stack traces look sane */
>   	mtctr   r12
>   	bctrl			/* Call handler */
>   
> +	/* syscall_exit can exit to kernel mode, via ret_from_kernel_thread */
>   .Lsyscall_exit:
>   	std	r3,RESULT(r1)
>   
> @@ -216,11 +212,6 @@ system_call:			/* label this so stack traces look sane */
>   	ld	r12, PACA_THREAD_INFO(r13)
>   
>   	ld	r8,_MSR(r1)
> -#ifdef CONFIG_PPC_BOOK3S
> -	/* No MSR:RI on BookE */
> -	andi.	r10,r8,MSR_RI
> -	beq-	.Lunrecov_restore
> -#endif
>   
>   /*
>    * This is a few instructions into the actual syscall exit path (which actually
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 6ba3cc2ef8ab..768f133de4f1 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1521,8 +1521,6 @@ EXC_COMMON(trap_0b_common, 0xb00, unknown_exception)
>    * system call / hypercall (0xc00, 0x4c00)
>    *
>    * The system call exception is invoked with "sc 0" and does not alter HV bit.
> - * There is support for kernel code to invoke system calls but there are no
> - * in-tree users.
>    *
>    * The hypercall is invoked with "sc 1" and sets HV=1.
>    *
> 


* Re: [PATCH 2/4] powerpc/64s: remove support for kernel-mode syscalls
  2019-08-27  3:30 ` [PATCH 2/4] powerpc/64s: " Nicholas Piggin
@ 2019-08-27  6:14   ` Christophe Leroy
  2019-08-27 10:21     ` Nicholas Piggin
  0 siblings, 1 reply; 19+ messages in thread
From: Christophe Leroy @ 2019-08-27  6:14 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev

Er ... That's a duplicate of [PATCH 2/4] "powerpc/64: remove support
for kernel-mode syscalls"?

On 27/08/2019 at 05:30, Nicholas Piggin wrote:
> There is support for the kernel to execute the 'sc 0' instruction and
> make a system call to itself. This is a relic that is unused in the
> tree, therefore untested. It's also highly questionable for modules to
> be doing this.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/kernel/entry_64.S       | 21 ++++++---------------
>   arch/powerpc/kernel/exceptions-64s.S |  2 --
>   2 files changed, 6 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 0a0b5310f54a..6467bdab8d40 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -69,24 +69,20 @@ BEGIN_FTR_SECTION
>   	bne	.Ltabort_syscall
>   END_FTR_SECTION_IFSET(CPU_FTR_TM)
>   #endif
> -	andi.	r10,r12,MSR_PR
>   	mr	r10,r1
> -	addi	r1,r1,-INT_FRAME_SIZE
> -	beq-	1f
>   	ld	r1,PACAKSAVE(r13)
> -1:	std	r10,0(r1)
> +	std	r10,0(r1)
>   	std	r11,_NIP(r1)
>   	std	r12,_MSR(r1)
>   	std	r0,GPR0(r1)
>   	std	r10,GPR1(r1)
> -	beq	2f			/* if from kernel mode */
>   #ifdef CONFIG_PPC_FSL_BOOK3E
>   START_BTB_FLUSH_SECTION
>   	BTB_FLUSH(r10)
>   END_BTB_FLUSH_SECTION
>   #endif
>   	ACCOUNT_CPU_USER_ENTRY(r13, r10, r11)
> -2:	std	r2,GPR2(r1)
> +	std	r2,GPR2(r1)
>   	std	r3,GPR3(r1)
>   	mfcr	r2
>   	std	r4,GPR4(r1)
> @@ -122,14 +118,13 @@ END_BTB_FLUSH_SECTION
>   
>   #if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) && defined(CONFIG_PPC_SPLPAR)
>   BEGIN_FW_FTR_SECTION
> -	beq	33f
> -	/* if from user, see if there are any DTL entries to process */
> +	/* see if there are any DTL entries to process */
>   	ld	r10,PACALPPACAPTR(r13)	/* get ptr to VPA */
>   	ld	r11,PACA_DTL_RIDX(r13)	/* get log read index */
>   	addi	r10,r10,LPPACA_DTLIDX
>   	LDX_BE	r10,0,r10		/* get log write index */
> -	cmpd	cr1,r11,r10
> -	beq+	cr1,33f
> +	cmpd	r11,r10
> +	beq+	33f
>   	bl	accumulate_stolen_time
>   	REST_GPR(0,r1)
>   	REST_4GPRS(3,r1)
> @@ -203,6 +198,7 @@ system_call:			/* label this so stack traces look sane */
>   	mtctr   r12
>   	bctrl			/* Call handler */
>   
> +	/* syscall_exit can exit to kernel mode, via ret_from_kernel_thread */
>   .Lsyscall_exit:
>   	std	r3,RESULT(r1)
>   
> @@ -216,11 +212,6 @@ system_call:			/* label this so stack traces look sane */
>   	ld	r12, PACA_THREAD_INFO(r13)
>   
>   	ld	r8,_MSR(r1)
> -#ifdef CONFIG_PPC_BOOK3S
> -	/* No MSR:RI on BookE */
> -	andi.	r10,r8,MSR_RI
> -	beq-	.Lunrecov_restore
> -#endif
>   
>   /*
>    * This is a few instructions into the actual syscall exit path (which actually
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 6ba3cc2ef8ab..768f133de4f1 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1521,8 +1521,6 @@ EXC_COMMON(trap_0b_common, 0xb00, unknown_exception)
>    * system call / hypercall (0xc00, 0x4c00)
>    *
>    * The system call exception is invoked with "sc 0" and does not alter HV bit.
> - * There is support for kernel code to invoke system calls but there are no
> - * in-tree users.
>    *
>    * The hypercall is invoked with "sc 1" and sets HV=1.
>    *
> 


* Re: [PATCH 4/4] powerpc/64: system call implement the bulk of the logic in C
  2019-08-27  3:30 ` [PATCH 4/4] powerpc/64: system call implement the bulk of the logic in C Nicholas Piggin
@ 2019-08-27  6:46   ` Christophe Leroy
  2019-08-27 10:30     ` Nicholas Piggin
  2019-08-27 14:35   ` kbuild test robot
  1 sibling, 1 reply; 19+ messages in thread
From: Christophe Leroy @ 2019-08-27  6:46 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev



On 27/08/2019 at 05:30, Nicholas Piggin wrote:
> System call entry and particularly exit code is beyond the limit of what
> is reasonable to implement in asm.
> 
> This conversion moves all conditional branches out of the asm code,
> except a relatively simple test to see whether all GPRs should be
> restored at exit time.
> 
> Null syscall test is about 5% faster after this patch, because the exit
> work is handled under local_irq_disable, and the hard mask and pending
> interrupt replay is handled after that, which avoids games with MSR.

Interesting optimisation.

> 
> The asm instruction scheduling has not really been analysed and
> optimised yet, as there are higher level features and improvements to
> add first.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/include/asm/asm-prototypes.h |  11 -
>   arch/powerpc/include/asm/ptrace.h         |   3 +
>   arch/powerpc/include/asm/signal.h         |   2 +
>   arch/powerpc/include/asm/switch_to.h      |   4 +
>   arch/powerpc/include/asm/time.h           |   3 +
>   arch/powerpc/kernel/Makefile              |   3 +-
>   arch/powerpc/kernel/entry_64.S            | 343 ++++------------------
>   arch/powerpc/kernel/process.c             |   6 +-
>   arch/powerpc/kernel/signal.h              |   2 -
>   arch/powerpc/kernel/syscall_64.c          | 202 +++++++++++++
>   10 files changed, 271 insertions(+), 308 deletions(-)
>   create mode 100644 arch/powerpc/kernel/syscall_64.c
> 

[...]

> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 24621e7e5033..f444525da9ce 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1609,7 +1609,6 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp,
>   	extern void ret_from_kernel_thread(void);
>   	void (*f)(void);
>   	unsigned long sp = (unsigned long)task_stack_page(p) + THREAD_SIZE;
> -	struct thread_info *ti = task_thread_info(p);
>   
>   	klp_init_thread_info(p);
>   
> @@ -1617,6 +1616,8 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp,
>   	sp -= sizeof(struct pt_regs);
>   	childregs = (struct pt_regs *) sp;
>   	if (unlikely(p->flags & PF_KTHREAD)) {
> +		struct thread_info *ti = task_thread_info(p);
> +
>   		/* kernel thread */
>   		memset(childregs, 0, sizeof(struct pt_regs));
>   		childregs->gpr[1] = sp + sizeof(struct pt_regs);
> @@ -1634,12 +1635,13 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp,
>   	} else {
>   		/* user thread */
>   		struct pt_regs *regs = current_pt_regs();
> +
>   		CHECK_FULL_REGS(regs);
>   		*childregs = *regs;
>   		if (usp)
>   			childregs->gpr[1] = usp;
>   		p->thread.regs = childregs;
> -		childregs->gpr[3] = 0;  /* Result from fork() */
> +		/* ret_from_fork sets fork() result to 0 */

Does PPC32 ret_from_fork() do it as well?

>   		if (clone_flags & CLONE_SETTLS) {
>   #ifdef CONFIG_PPC64
>   			if (!is_32bit_task())
> diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h
> index 800433685888..d396efca4068 100644
> --- a/arch/powerpc/kernel/signal.h
> +++ b/arch/powerpc/kernel/signal.h
> @@ -10,8 +10,6 @@
>   #ifndef _POWERPC_ARCH_SIGNAL_H
>   #define _POWERPC_ARCH_SIGNAL_H
>   
> -extern void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags);
> -
>   extern void __user *get_sigframe(struct ksignal *ksig, unsigned long sp,
>   				  size_t frame_size, int is_32);
>   
> diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
> new file mode 100644
> index 000000000000..98ed970796d5
> --- /dev/null
> +++ b/arch/powerpc/kernel/syscall_64.c
> @@ -0,0 +1,202 @@
> +#include <linux/err.h>
> +#include <asm/hw_irq.h>
> +#include <asm/paca.h>
> +#include <asm/ptrace.h>
> +#include <asm/reg.h>
> +#include <asm/signal.h>
> +#include <asm/switch_to.h>
> +#include <asm/syscall.h>
> +#include <asm/time.h>
> +
> +extern void __noreturn tabort_syscall(void);
> +
> +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
> +static inline void account_cpu_user_entry(void)
> +{
> +	unsigned long tb = mftb();
> +
> +	local_paca->accounting.utime += (tb - local_paca->accounting.starttime_user);
> +	local_paca->accounting.starttime = tb;
> +}
> +static inline void account_cpu_user_exit(void)
> +{
> +	unsigned long tb = mftb();
> +
> +	local_paca->accounting.stime += (tb - local_paca->accounting.starttime);
> +	local_paca->accounting.starttime_user = tb;
> +}
> +#else
> +static inline void account_cpu_user_entry(void)
> +{
> +}
> +static inline void account_cpu_user_exit(void)
> +{
> +}
> +#endif

I think this block should go in arch/powerpc/include/asm/cputime.h, we 
should limit #ifdefs as much as possible in C files.

And use get_accounting() instead of local_paca->accounting in order to 
be usable on PPC32 as well
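
For instance, a rough and untested sketch (assuming get_accounting() and 
struct cpu_accounting_data keep their current form):

/* arch/powerpc/include/asm/cputime.h */
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
static inline void account_cpu_user_entry(void)
{
	unsigned long tb = mftb();
	struct cpu_accounting_data *acct = get_accounting(current);

	acct->utime += (tb - acct->starttime_user);
	acct->starttime = tb;
}

static inline void account_cpu_user_exit(void)
{
	unsigned long tb = mftb();
	struct cpu_accounting_data *acct = get_accounting(current);

	acct->stime += (tb - acct->starttime);
	acct->starttime_user = tb;
}
#else
static inline void account_cpu_user_entry(void) { }
static inline void account_cpu_user_exit(void) { }
#endif

That keeps the #ifdef in the header, and should work with the 
thread_info based accounting on PPC32 as well.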



> +
> +typedef long (*syscall_fn)(long, long, long, long, long, long);
> +
> +long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs)
> +{
> +	unsigned long ti_flags;
> +	syscall_fn f;
> +
> +	BUG_ON(!(regs->msr & MSR_PR));
> +
> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> +	if (unlikely(regs->msr & MSR_TS_T))
> +		tabort_syscall();
> +#endif

MSR_TS_T and tabort_syscall() are declared regardless of 
CONFIG_PPC_TRANSACTIONAL_MEM so IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) 
should be used instead of #ifdef
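
i.e. something like (untested):

	if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
	    unlikely(regs->msr & MSR_TS_T))
		tabort_syscall();

so the call should simply get optimised away when the option is not 
selected.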

> +
> +	account_cpu_user_entry();
> +
> +#if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) && defined(CONFIG_PPC_SPLPAR)
> +	if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
> +		struct lppaca *lp = get_lppaca();
> +
> +		if (unlikely(local_paca->dtl_ridx != lp->dtl_idx))
> +			accumulate_stolen_time();
> +	}
> +#endif

Same here, I think everything is available so IS_ENABLED() should be 
used instead of #if
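
i.e. roughly (untested, and assuming get_lppaca() and the dtl fields 
still compile when CONFIG_PPC_SPLPAR is not selected):

	if (IS_ENABLED(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) &&
	    IS_ENABLED(CONFIG_PPC_SPLPAR) &&
	    firmware_has_feature(FW_FEATURE_SPLPAR)) {
		struct lppaca *lp = get_lppaca();

		if (unlikely(local_paca->dtl_ridx != lp->dtl_idx))
			accumulate_stolen_time();
	}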

> +
> +#ifdef CONFIG_PPC_KUAP_DEBUG
> +	if (mmu_has_feature(MMU_FTR_RADIX_KUAP))
> +		WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED);
> +#endif

This should go in a helper in one of the kup/kuap header files.
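
Maybe something like this, with an empty stub where KUAP is not 
supported (untested sketch, kuap_check_amr() is just a suggested name):

/* e.g. in the book3s/64 kup-radix header */
static inline void kuap_check_amr(void)
{
#ifdef CONFIG_PPC_KUAP_DEBUG
	if (mmu_has_feature(MMU_FTR_RADIX_KUAP))
		WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED);
#endif
}

Then the C code just calls kuap_check_amr() unconditionally.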

> +
> +	/*
> +	 * A syscall should always be called with interrupts enabled
> +	 * so we just unconditionally hard-enable here. When some kind
> +	 * of irq tracing is used, we additionally check that condition
> +	 * is correct
> +	 */
> +#if defined(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)
> +	WARN_ON(irq_soft_mask_return() != IRQS_ENABLED);
> +	WARN_ON(local_paca->irq_happened);
> +#endif

Can we use IS_ENABLED() here as well?
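
i.e. (untested):

	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) {
		WARN_ON(irq_soft_mask_return() != IRQS_ENABLED);
		WARN_ON(local_paca->irq_happened);
	}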

> +
> +	__hard_irq_enable();
> +
> +	/*
> +	 * We do need to set SOFTE in the stack frame or the return
> +	 * from interrupt will be painful
> +	 */
> +	regs->softe = IRQS_ENABLED;
> +
> +	ti_flags = current_thread_info()->flags;
> +	if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
> +		/*
> +		 * We use the return value of do_syscall_trace_enter() as the
> +		 * syscall number. If the syscall was rejected for any reason
> +		 * do_syscall_trace_enter() returns an invalid syscall number
> +		 * and the test below against NR_syscalls will fail.
> +		 */
> +		r0 = do_syscall_trace_enter(regs);
> +	}
> +
> +	if (unlikely(r0 >= NR_syscalls))
> +		return -ENOSYS;
> +
> +	/* May be faster to do array_index_nospec? */
> +	barrier_nospec();
> +
> +	if (unlikely(ti_flags & _TIF_32BIT)) {
> +		f = (void *)compat_sys_call_table[r0];
> +
> +		r3 &= 0x00000000ffffffffULL;
> +		r4 &= 0x00000000ffffffffULL;
> +		r5 &= 0x00000000ffffffffULL;
> +		r6 &= 0x00000000ffffffffULL;
> +		r7 &= 0x00000000ffffffffULL;
> +		r8 &= 0x00000000ffffffffULL;
> +
> +	} else {
> +		f = (void *)sys_call_table[r0];
> +	}
> +
> +	return f(r3, r4, r5, r6, r7, r8);
> +}
> +
> +unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs)
> +{
> +	unsigned long *ti_flagsp = &current_thread_info()->flags;
> +	unsigned long ti_flags;
> +	unsigned long ret = 0;
> +
> +	regs->result = r3;
> +
> +	/* Check whether the syscall is issued inside a restartable sequence */
> +	rseq_syscall(regs);
> +
> +	ti_flags = *ti_flagsp;
> +	if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE))
> +		do_syscall_trace_leave(regs);
> +
> +	if (unlikely(r3 >= (unsigned long)-MAX_ERRNO)) {
> +		if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
> +			r3 = -r3;
> +			regs->ccr |= 0x10000000; /* Set SO bit in CR */
> +		}
> +	}
> +
> +	if (unlikely(ti_flags & _TIF_PERSYSCALL_MASK)) {
> +		if (ti_flags & _TIF_RESTOREALL)
> +			ret = _TIF_RESTOREALL;
> +		else
> +			regs->gpr[3] = r3;
> +		clear_bits(_TIF_PERSYSCALL_MASK, ti_flagsp);
> +	} else {
> +		regs->gpr[3] = r3;
> +	}
> +
> +again:
> +	local_irq_disable();
> +	ti_flags = READ_ONCE(*ti_flagsp);
> +	while (unlikely(ti_flags & _TIF_USER_WORK_MASK)) {
> +		local_irq_enable();
> +		if (ti_flags & _TIF_NEED_RESCHED) {
> +			schedule();
> +		} else {
> +			/*
> +			 * SIGPENDING must restore signal handler function
> +			 * argument GPRs, and some non-volatiles (e.g., r1).
> +			 * Restore all for now. This could be made lighter.
> +			 */
> +			if (ti_flags & _TIF_SIGPENDING)
> +				ret |= _TIF_RESTOREALL;
> +			do_notify_resume(regs, ti_flags);
> +		}
> +		local_irq_disable();
> +		ti_flags = READ_ONCE(*ti_flagsp);
> +	}
> +
> +#ifdef CONFIG_ALTIVEC
> +	if ((regs->msr & (MSR_FP|MSR_VEC)) != (MSR_FP|MSR_VEC))
> +#else
> +	if ((regs->msr & MSR_FP) != MSR_FP)
> +#endif

Use 'if (IS_ENABLED(CONFIG_ALTIVEC)) / else' instead of an 
#ifdef/#else/#endif
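
For instance (untested, 'mathflags' is just a suggested name):

	unsigned long mathflags = MSR_FP;

	if (IS_ENABLED(CONFIG_ALTIVEC))
		mathflags |= MSR_VEC;

	if ((regs->msr & mathflags) != mathflags)
		restore_math(regs);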

> +		restore_math(regs);
> +
> +	/* This pattern matches prep_irq_for_idle */
> +	__mtmsrd(0, 1);	/* Disable MSR_EE and MSR_RI */
> +	if (unlikely(lazy_irq_pending())) {
> +		__mtmsrd(MSR_RI, 1);
> +		local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
> +		local_irq_enable();
> +		/* Took an interrupt which may have more exit work to do. */
> +		goto again;
> +	}
> +	trace_hardirqs_on();
> +	local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
> +	irq_soft_mask_set(IRQS_ENABLED);
> +
> +	account_cpu_user_exit();
> +
> +#ifdef CONFIG_PPC_KUAP_DEBUG
> +	if (mmu_has_feature(MMU_FTR_RADIX_KUAP))
> +		WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED);
> +#endif

Same, define a helper in the kuap header files to avoid the #ifdefs and
platform-specific stuff here.

> +
> +	return ret;
> +}
> +
> 

Christophe

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] powerpc/64: system call remove non-volatile GPR save optimisation
  2019-08-27  3:30 ` [PATCH 3/4] powerpc/64: system call remove non-volatile GPR save optimisation Nicholas Piggin
@ 2019-08-27  8:10   ` Segher Boessenkool
  0 siblings, 0 replies; 19+ messages in thread
From: Segher Boessenkool @ 2019-08-27  8:10 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev

On Tue, Aug 27, 2019 at 01:30:09PM +1000, Nicholas Piggin wrote:
> -435	nospu	clone3				ppc_clone3
> +435	32	clone3				ppc_clone3			sys_clone3
> +435	64	clone3				sys_clone3
> +435	nospu	clone3				sys_ni_syscall

s/no//  on that last line.  Rest looks good to me :-)
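
i.e. that last line should read:

435	spu	clone3				sys_ni_syscall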


Segher

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] powerpc: convert to copy_thread_tls
  2019-08-27  6:07   ` Christophe Leroy
@ 2019-08-27 10:13     ` Nicholas Piggin
  2019-08-27 14:00       ` Christophe Leroy
  0 siblings, 1 reply; 19+ messages in thread
From: Nicholas Piggin @ 2019-08-27 10:13 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev

Christophe Leroy's on August 27, 2019 4:07 pm:
> 
> 
> On 27/08/2019 at 05:30, Nicholas Piggin wrote:
>> Commit 3033f14ab78c3 ("clone: support passing tls argument via C rather
>> than pt_regs magic") introduced the HAVE_COPY_THREAD_TLS option. Use it
>> to avoid a subtle assumption about the argument ordering of clone type
>> syscalls.
>> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>   arch/powerpc/Kconfig          | 1 +
>>   arch/powerpc/kernel/process.c | 9 +++++----
>>   2 files changed, 6 insertions(+), 4 deletions(-)
>> 
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index d8dcd8820369..7477a3263225 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -182,6 +182,7 @@ config PPC
>>   	select HAVE_STACKPROTECTOR		if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
>>   	select HAVE_STACKPROTECTOR		if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
>>   	select HAVE_CONTEXT_TRACKING		if PPC64
>> +	select HAVE_COPY_THREAD_TLS
>>   	select HAVE_DEBUG_KMEMLEAK
>>   	select HAVE_DEBUG_STACKOVERFLOW
>>   	select HAVE_DYNAMIC_FTRACE
>> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
>> index 8fc4de0d22b4..24621e7e5033 100644
>> --- a/arch/powerpc/kernel/process.c
>> +++ b/arch/powerpc/kernel/process.c
>> @@ -1600,8 +1600,9 @@ static void setup_ksp_vsid(struct task_struct *p, unsigned long sp)
>>   /*
>>    * Copy architecture-specific thread state
>>    */
>> -int copy_thread(unsigned long clone_flags, unsigned long usp,
>> -		unsigned long kthread_arg, struct task_struct *p)
>> +int copy_thread_tls(unsigned long clone_flags, unsigned long usp,
>> +		unsigned long kthread_arg, struct task_struct *p,
>> +		unsigned long tls)
>>   {
>>   	struct pt_regs *childregs, *kregs;
>>   	extern void ret_from_fork(void);
>> @@ -1642,10 +1643,10 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
>>   		if (clone_flags & CLONE_SETTLS) {
>>   #ifdef CONFIG_PPC64
> 
> is_32bit_task() exists and always returns 1 on PPC32, so this gross
> ifdef in the middle of an if/else is pointless and should be dropped.

I could do that as another patch in the series.
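
Something like this, I suppose (untested, keeping the existing
gpr[13]/gpr[2] assignments):

		if (clone_flags & CLONE_SETTLS) {
			if (!is_32bit_task())
				childregs->gpr[13] = tls;
			else
				childregs->gpr[2] = tls;
		}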

Thanks,
Nick

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] powerpc/64: remove support for kernel-mode syscalls
  2019-08-27  6:13   ` Christophe Leroy
@ 2019-08-27 10:20     ` Nicholas Piggin
  2019-08-27 10:41       ` Nicholas Piggin
  0 siblings, 1 reply; 19+ messages in thread
From: Nicholas Piggin @ 2019-08-27 10:20 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev

Christophe Leroy's on August 27, 2019 4:13 pm:
> 
> 
> On 27/08/2019 at 05:30, Nicholas Piggin wrote:
>> There is support for the kernel to execute the 'sc 0' instruction and
>> make a system call to itself. This is a relic that is unused in the
>> tree, therefore untested. It's also highly questionable for modules to
>> be doing this.
> 
> I like it.
> 
> I dropped support for that in PPC32 when I added fast-path syscalls.

Good, then we'll match again.

>> -	beq	2f			/* if from kernel mode */
>>   #ifdef CONFIG_PPC_FSL_BOOK3E
>>   START_BTB_FLUSH_SECTION
>>   	BTB_FLUSH(r10)
>>   END_BTB_FLUSH_SECTION
>>   #endif
>>   	ACCOUNT_CPU_USER_ENTRY(r13, r10, r11)
>> -2:	std	r2,GPR2(r1)

Btw. there is a hunk which restores this optimisation but it leaked
into a later patch; I'll move it back here.

>> @@ -122,14 +118,13 @@ END_BTB_FLUSH_SECTION
>>   
>>   #if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) && defined(CONFIG_PPC_SPLPAR)
>>   BEGIN_FW_FTR_SECTION
>> -	beq	33f
>> -	/* if from user, see if there are any DTL entries to process */
>> +	/* see if there are any DTL entries to process */
>>   	ld	r10,PACALPPACAPTR(r13)	/* get ptr to VPA */
>>   	ld	r11,PACA_DTL_RIDX(r13)	/* get log read index */
>>   	addi	r10,r10,LPPACA_DTLIDX
>>   	LDX_BE	r10,0,r10		/* get log write index */
>> -	cmpd	cr1,r11,r10
>> -	beq+	cr1,33f
>> +	cmpd	r11,r10
>> +	beq+	33f
> 
> Any need to do this change? Why not keep it as is?

We don't need to keep cr0 alive with the MSR_PR compare, so I prefer
to keep the cr usage more compact; as a rule I use cr0 for short-lived
results and the other fields for long-lived ones.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] powerpc/64s: remove support for kernel-mode syscalls
  2019-08-27  6:14   ` Christophe Leroy
@ 2019-08-27 10:21     ` Nicholas Piggin
  0 siblings, 0 replies; 19+ messages in thread
From: Nicholas Piggin @ 2019-08-27 10:21 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev

Christophe Leroy's on August 27, 2019 4:14 pm:
> Er... that's a duplicate of [PATCH 2/4] "powerpc/64: remove support
> for kernel-mode syscalls"?

Yeah sorry I changed the title and got myself confused.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] powerpc/64: system call implement the bulk of the logic in C
  2019-08-27  6:46   ` Christophe Leroy
@ 2019-08-27 10:30     ` Nicholas Piggin
  0 siblings, 0 replies; 19+ messages in thread
From: Nicholas Piggin @ 2019-08-27 10:30 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev

Christophe Leroy's on August 27, 2019 4:46 pm:
> 
> 
> On 27/08/2019 at 05:30, Nicholas Piggin wrote:
>> System call entry and particularly exit code is beyond the limit of what
>> is reasonable to implement in asm.
>> 
>> This conversion moves all conditional branches out of the asm code,
>> except a relatively simple test to see whether all GPRs should be
>> restored at exit time.
>> 
>> Null syscall test is about 5% faster after this patch, because the exit
>> work is handled under local_irq_disable, and the hard mask and pending
>> interrupt replay is handled after that, which avoids games with MSR.
> 
> Interesting optimisation.
> 
>> 
>> The asm instruction scheduling has not really been analysed and
>> optimised yet, as there are higher level features and improvements to
>> add first.
>> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>   arch/powerpc/include/asm/asm-prototypes.h |  11 -
>>   arch/powerpc/include/asm/ptrace.h         |   3 +
>>   arch/powerpc/include/asm/signal.h         |   2 +
>>   arch/powerpc/include/asm/switch_to.h      |   4 +
>>   arch/powerpc/include/asm/time.h           |   3 +
>>   arch/powerpc/kernel/Makefile              |   3 +-
>>   arch/powerpc/kernel/entry_64.S            | 343 ++++------------------
>>   arch/powerpc/kernel/process.c             |   6 +-
>>   arch/powerpc/kernel/signal.h              |   2 -
>>   arch/powerpc/kernel/syscall_64.c          | 202 +++++++++++++
>>   10 files changed, 271 insertions(+), 308 deletions(-)
>>   create mode 100644 arch/powerpc/kernel/syscall_64.c
>> 
> 
> [...]
> 
>> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
>> index 24621e7e5033..f444525da9ce 100644
>> --- a/arch/powerpc/kernel/process.c
>> +++ b/arch/powerpc/kernel/process.c
>> @@ -1609,7 +1609,6 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp,
>>   	extern void ret_from_kernel_thread(void);
>>   	void (*f)(void);
>>   	unsigned long sp = (unsigned long)task_stack_page(p) + THREAD_SIZE;
>> -	struct thread_info *ti = task_thread_info(p);
>>   
>>   	klp_init_thread_info(p);
>>   
>> @@ -1617,6 +1616,8 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp,
>>   	sp -= sizeof(struct pt_regs);
>>   	childregs = (struct pt_regs *) sp;
>>   	if (unlikely(p->flags & PF_KTHREAD)) {
>> +		struct thread_info *ti = task_thread_info(p);
>> +
>>   		/* kernel thread */
>>   		memset(childregs, 0, sizeof(struct pt_regs));
>>   		childregs->gpr[1] = sp + sizeof(struct pt_regs);
>> @@ -1634,12 +1635,13 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp,
>>   	} else {
>>   		/* user thread */
>>   		struct pt_regs *regs = current_pt_regs();
>> +
>>   		CHECK_FULL_REGS(regs);
>>   		*childregs = *regs;
>>   		if (usp)
>>   			childregs->gpr[1] = usp;
>>   		p->thread.regs = childregs;
>> -		childregs->gpr[3] = 0;  /* Result from fork() */
>> +		/* ret_from_fork sets fork() result to 0 */
> 
> Does PPC32 ret_from_fork() do it as well?

AFAICT it does, but I guess it should be a patch by itself
with a powerpc: prefix. I will split it out.

>> +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
>> +static inline void account_cpu_user_entry(void)
>> +{
>> +	unsigned long tb = mftb();
>> +
>> +	local_paca->accounting.utime += (tb - local_paca->accounting.starttime_user);
>> +	local_paca->accounting.starttime = tb;
>> +}
>> +static inline void account_cpu_user_exit(void)
>> +{
>> +	unsigned long tb = mftb();
>> +
>> +	local_paca->accounting.stime += (tb - local_paca->accounting.starttime);
>> +	local_paca->accounting.starttime_user = tb;
>> +}
>> +#else
>> +static inline void account_cpu_user_entry(void)
>> +{
>> +}
>> +static inline void account_cpu_user_exit(void)
>> +{
>> +}
>> +#endif
> 
> I think this block should go in arch/powerpc/include/asm/cputime.h, we 
> should limit #ifdefs as much as possible in C files.
> 
> And use get_accounting() instead of local_paca->accounting in order to 
> be usable on PPC32 as well

Sure thing. I need to move them for doing the other interrupt returns
in C as well.

>> +
>> +typedef long (*syscall_fn)(long, long, long, long, long, long);
>> +
>> +long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs)
>> +{
>> +	unsigned long ti_flags;
>> +	syscall_fn f;
>> +
>> +	BUG_ON(!(regs->msr & MSR_PR));
>> +
>> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>> +	if (unlikely(regs->msr & MSR_TS_T))
>> +		tabort_syscall();
>> +#endif
> 
> MSR_TS_T and tabort_syscall() are declared regardless of 
> CONFIG_PPC_TRANSACTIONAL_MEM so IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) 
> should be used instead of #ifdef

Will do.

>> +#if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) && defined(CONFIG_PPC_SPLPAR)
>> +	if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
>> +		struct lppaca *lp = get_lppaca();
>> +
>> +		if (unlikely(local_paca->dtl_ridx != lp->dtl_idx))
>> +			accumulate_stolen_time();
>> +	}
>> +#endif
> 
> Same here, I think everything is available so IS_ENABLED() should be 
> used instead of #if

Will do.

>> +
>> +#ifdef CONFIG_PPC_KUAP_DEBUG
>> +	if (mmu_has_feature(MMU_FTR_RADIX_KUAP))
>> +		WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED);
>> +#endif
> 
> This should go in a helper in one of the kup/kuap header files.

Will do.

>> +
>> +	/*
>> +	 * A syscall should always be called with interrupts enabled
>> +	 * so we just unconditionally hard-enable here. When some kind
>> +	 * of irq tracing is used, we additionally check that condition
>> +	 * is correct
>> +	 */
>> +#if defined(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)
>> +	WARN_ON(irq_soft_mask_return() != IRQS_ENABLED);
>> +	WARN_ON(local_paca->irq_happened);
>> +#endif
> 
> Can we use IS_ENABLED() here as well?

I think so.

>> +#ifdef CONFIG_ALTIVEC
>> +	if ((regs->msr & (MSR_FP|MSR_VEC)) != (MSR_FP|MSR_VEC))
>> +#else
>> +	if ((regs->msr & MSR_FP) != MSR_FP)
>> +#endif
> 
> Use 'if (IS_ENABLED(CONFIG_ALTIVEC)) / else' instead of an 
> #ifdef/#else/#endif

Yeah that'll be nicer.

>> +#ifdef CONFIG_PPC_KUAP_DEBUG
>> +	if (mmu_has_feature(MMU_FTR_RADIX_KUAP))
>> +		WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED);
>> +#endif
> 
> Same, define a helper in the kuap header files to avoid the #ifdefs and
> platform-specific stuff here.

Will do. Thanks for the quick review.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] powerpc/64: remove support for kernel-mode syscalls
  2019-08-27 10:20     ` Nicholas Piggin
@ 2019-08-27 10:41       ` Nicholas Piggin
  0 siblings, 0 replies; 19+ messages in thread
From: Nicholas Piggin @ 2019-08-27 10:41 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev

Nicholas Piggin's on August 27, 2019 8:20 pm:
> Christophe Leroy's on August 27, 2019 4:13 pm:
>> 
>> 
>> On 27/08/2019 at 05:30, Nicholas Piggin wrote:
>>> There is support for the kernel to execute the 'sc 0' instruction and
>>> make a system call to itself. This is a relic that is unused in the
>>> tree, therefore untested. It's also highly questionable for modules to
>>> be doing this.
>> 
>> I like it.
>> 
>> I dropped support for that in PPC32 when I added fast-path syscalls.
> 
> Good, then we'll match again.
> 
>>> -	beq	2f			/* if from kernel mode */
>>>   #ifdef CONFIG_PPC_FSL_BOOK3E
>>>   START_BTB_FLUSH_SECTION
>>>   	BTB_FLUSH(r10)
>>>   END_BTB_FLUSH_SECTION
>>>   #endif
>>>   	ACCOUNT_CPU_USER_ENTRY(r13, r10, r11)
>>> -2:	std	r2,GPR2(r1)
> 
> Btw. there is a hunk which restores this optimisation but it leaked
> into a later patch, I'll move it back here.

I'm wrong. Now that I look at it again, the hunk should of course be
removed completely.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] powerpc: convert to copy_thread_tls
  2019-08-27 10:13     ` Nicholas Piggin
@ 2019-08-27 14:00       ` Christophe Leroy
  0 siblings, 0 replies; 19+ messages in thread
From: Christophe Leroy @ 2019-08-27 14:00 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev



On 27/08/2019 at 12:13, Nicholas Piggin wrote:
> Christophe Leroy's on August 27, 2019 4:07 pm:
>>
>>
>> On 27/08/2019 at 05:30, Nicholas Piggin wrote:
>>> Commit 3033f14ab78c3 ("clone: support passing tls argument via C rather
>>> than pt_regs magic") introduced the HAVE_COPY_THREAD_TLS option. Use it
>>> to avoid a subtle assumption about the argument ordering of clone type
>>> syscalls.
>>>
>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>> ---
>>>    arch/powerpc/Kconfig          | 1 +
>>>    arch/powerpc/kernel/process.c | 9 +++++----
>>>    2 files changed, 6 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>>> index d8dcd8820369..7477a3263225 100644
>>> --- a/arch/powerpc/Kconfig
>>> +++ b/arch/powerpc/Kconfig
>>> @@ -182,6 +182,7 @@ config PPC
>>>    	select HAVE_STACKPROTECTOR		if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
>>>    	select HAVE_STACKPROTECTOR		if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
>>>    	select HAVE_CONTEXT_TRACKING		if PPC64
>>> +	select HAVE_COPY_THREAD_TLS
>>>    	select HAVE_DEBUG_KMEMLEAK
>>>    	select HAVE_DEBUG_STACKOVERFLOW
>>>    	select HAVE_DYNAMIC_FTRACE
>>> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
>>> index 8fc4de0d22b4..24621e7e5033 100644
>>> --- a/arch/powerpc/kernel/process.c
>>> +++ b/arch/powerpc/kernel/process.c
>>> @@ -1600,8 +1600,9 @@ static void setup_ksp_vsid(struct task_struct *p, unsigned long sp)
>>>    /*
>>>     * Copy architecture-specific thread state
>>>     */
>>> -int copy_thread(unsigned long clone_flags, unsigned long usp,
>>> -		unsigned long kthread_arg, struct task_struct *p)
>>> +int copy_thread_tls(unsigned long clone_flags, unsigned long usp,
>>> +		unsigned long kthread_arg, struct task_struct *p,
>>> +		unsigned long tls)
>>>    {
>>>    	struct pt_regs *childregs, *kregs;
>>>    	extern void ret_from_fork(void);
>>> @@ -1642,10 +1643,10 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
>>>    		if (clone_flags & CLONE_SETTLS) {
>>>    #ifdef CONFIG_PPC64
>>
>> is_32bit_task() exists and always returns 1 on PPC32, so this gross
>> ifdef in the middle of an if/else is pointless and should be dropped.
> 
> I could do that as another patch in the series.


Yes, that would be good, because if I do an independent patch for that
it will conflict with your series.

Thanks
Christophe

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] powerpc/64: system call implement the bulk of the logic in C
  2019-08-27  3:30 ` [PATCH 4/4] powerpc/64: system call implement the bulk of the logic in C Nicholas Piggin
  2019-08-27  6:46   ` Christophe Leroy
@ 2019-08-27 14:35   ` kbuild test robot
  1 sibling, 0 replies; 19+ messages in thread
From: kbuild test robot @ 2019-08-27 14:35 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, kbuild-all, Nicholas Piggin

[-- Attachment #1: Type: text/plain, Size: 1254 bytes --]

Hi Nicholas,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.3-rc6 next-20190827]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Nicholas-Piggin/powerpc-64-syscalls-in-C/20190827-202000
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 7.4.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.4.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   powerpc64-linux-ld: warning: orphan section `.gnu.hash' from `linker stubs' being placed in section `.gnu.hash'.
   arch/powerpc/kernel/syscall_64.o: In function `.system_call_exception':
>> syscall_64.c:(.text+0x180): undefined reference to `.tabort_syscall'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 25348 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] powerpc: convert to copy_thread_tls
  2019-08-27  3:30 ` [PATCH 1/4] powerpc: convert to copy_thread_tls Nicholas Piggin
  2019-08-27  6:07   ` Christophe Leroy
@ 2019-09-02  3:29   ` Michael Ellerman
  1 sibling, 0 replies; 19+ messages in thread
From: Michael Ellerman @ 2019-09-02  3:29 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin

On Tue, 2019-08-27 at 03:30:06 UTC, Nicholas Piggin wrote:
> Commit 3033f14ab78c3 ("clone: support passing tls argument via C rather
> than pt_regs magic") introduced the HAVE_COPY_THREAD_TLS option. Use it
> to avoid a subtle assumption about the argument ordering of clone type
> syscalls.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Patches 1 & 2 applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/facd04a904ff6cdc6ee85d6e85d500f478a1bec4

cheers

^ permalink raw reply	[flat|nested] 19+ messages in thread


Thread overview: 19+ messages
2019-08-27  3:30 [PATCH 0/4] powerpc/64: syscalls in C Nicholas Piggin
2019-08-27  3:30 ` [PATCH 1/4] powerpc: convert to copy_thread_tls Nicholas Piggin
2019-08-27  6:07   ` Christophe Leroy
2019-08-27 10:13     ` Nicholas Piggin
2019-08-27 14:00       ` Christophe Leroy
2019-09-02  3:29   ` Michael Ellerman
2019-08-27  3:30 ` [PATCH 2/4] powerpc/64: remove support for kernel-mode syscalls Nicholas Piggin
2019-08-27  6:13   ` Christophe Leroy
2019-08-27 10:20     ` Nicholas Piggin
2019-08-27 10:41       ` Nicholas Piggin
2019-08-27  3:30 ` [PATCH 2/4] powerpc/64s: " Nicholas Piggin
2019-08-27  6:14   ` Christophe Leroy
2019-08-27 10:21     ` Nicholas Piggin
2019-08-27  3:30 ` [PATCH 3/4] powerpc/64: system call remove non-volatile GPR save optimisation Nicholas Piggin
2019-08-27  8:10   ` Segher Boessenkool
2019-08-27  3:30 ` [PATCH 4/4] powerpc/64: system call implement the bulk of the logic in C Nicholas Piggin
2019-08-27  6:46   ` Christophe Leroy
2019-08-27 10:30     ` Nicholas Piggin
2019-08-27 14:35   ` kbuild test robot
