* [PATCH 0/9] ARM: remove set_fs callers and implementation
@ 2020-09-07 15:36 Arnd Bergmann
  2020-09-07 15:36 ` [PATCH 1/9] mm/maccess: fix unaligned copy_{from,to}_kernel_nofault Arnd Bergmann
                   ` (8 more replies)
  0 siblings, 9 replies; 24+ messages in thread
From: Arnd Bergmann @ 2020-09-07 15:36 UTC (permalink / raw)
  To: Christoph Hellwig, Russell King
  Cc: Alexander Viro, kernel, linux-arch, linux-arm-kernel,
	linus.walleij, Arnd Bergmann

Hi Christoph, Russell,

As promised, here is my series to remove set_fs() from arch/arm, based
on the architecture-independent patches. The series is also available in
my playground tree [1].

I have tested the oabi-compat changes using the LTP tests for the three
modified syscalls using an Armv7 kernel and a Debian 5 OABI user space,
and I have lightly tested the get_kernel_nofault infrastructure by
loading the test_lockup.ko module after setting CONFIG_DEBUG_SPINLOCK.

Let me know if there is a more thorough test I should try.

Overall, a bit of ugliness gets introduced in the oabi-compat code and in
the extra code path for TUSER(). If anyone has a better idea, I can try a
different approach.

I assume these patches can go through Al's tree along with the
corresponding changes for other architectures, once Russell is happy
with them.

Please review.

     Arnd

[1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=arm-kill-set_fs

Arnd Bergmann (9):
  mm/maccess: fix unaligned copy_{from,to}_kernel_nofault
  ARM: traps: use get_kernel_nofault instead of set_fs()
  ARM: oabi-compat: add epoll_pwait handler
  ARM: syscall: always store thread_info->syscall
  ARM: oabi-compat: rework epoll_wait/epoll_pwait emulation
  ARM: oabi-compat: rework sys_semtimedop emulation
  ARM: oabi-compat: rework fcntl64() emulation
  ARM: uaccess: add __{get,put}_kernel_nofault
  ARM: uaccess: remove set_fs() implementation

 arch/arm/Kconfig                   |   1 -
 arch/arm/include/asm/ptrace.h      |   1 -
 arch/arm/include/asm/syscall.h     |  14 +++
 arch/arm/include/asm/thread_info.h |   4 -
 arch/arm/include/asm/uaccess-asm.h |   6 -
 arch/arm/include/asm/uaccess.h     | 169 ++++++++++++++-------------
 arch/arm/kernel/asm-offsets.c      |   3 +-
 arch/arm/kernel/entry-common.S     |  16 +--
 arch/arm/kernel/process.c          |   7 +-
 arch/arm/kernel/ptrace.c           |   4 +-
 arch/arm/kernel/signal.c           |   8 --
 arch/arm/kernel/sys_oabi-compat.c  | 180 ++++++++++++++++-------------
 arch/arm/kernel/traps.c            |  69 +++++------
 arch/arm/lib/copy_from_user.S      |   3 +-
 arch/arm/lib/copy_to_user.S        |   3 +-
 arch/arm/tools/syscall.tbl         |   2 +-
 fs/eventpoll.c                     |   5 +-
 include/linux/eventpoll.h          |  16 +++
 include/linux/syscalls.h           |   2 +
 ipc/sem.c                          |  84 +++++++++-----
 mm/maccess.c                       |  28 ++++-
 21 files changed, 338 insertions(+), 287 deletions(-)

-- 
2.27.0



* [PATCH 1/9] mm/maccess: fix unaligned copy_{from,to}_kernel_nofault
  2020-09-07 15:36 [PATCH 0/9] ARM: remove set_fs callers and implementation Arnd Bergmann
@ 2020-09-07 15:36 ` Arnd Bergmann
  2020-09-08  6:11   ` Christoph Hellwig
  2020-09-27  9:25   ` Linus Walleij
  2020-09-07 15:36 ` [PATCH 2/9] ARM: traps: use get_kernel_nofault instead of set_fs() Arnd Bergmann
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 24+ messages in thread
From: Arnd Bergmann @ 2020-09-07 15:36 UTC (permalink / raw)
  To: Christoph Hellwig, Russell King
  Cc: Alexander Viro, kernel, linux-arch, linux-arm-kernel,
	linus.walleij, Arnd Bergmann, Andrew Morton, Daniel Borkmann,
	Alexei Starovoitov, linux-mm, linux-kernel

On machines such as ARMv5 that trap unaligned accesses, these
two functions can be slow when each access needs to be emulated,
or they might not work at all.

Change them so that each loop is only used when both the src
and dst pointers are naturally aligned.
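
A minimal sketch of the idea (both_aligned_to_4() is only an
illustration, not part of the patch): OR-ing the two addresses gives a
value whose low bits are clear only when *both* pointers have at least
that alignment, so one test per access size is enough:

	static bool both_aligned_to_4(const void *dst, const void *src)
	{
		unsigned long align = (unsigned long)dst | (unsigned long)src;

		/* bits 0-1 are clear only if dst and src are 4-byte aligned */
		return (align & 3) == 0;
	}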

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 mm/maccess.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/mm/maccess.c b/mm/maccess.c
index 3bd70405f2d8..d3f1a1f0b1c1 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -24,13 +24,21 @@ bool __weak copy_from_kernel_nofault_allowed(const void *unsafe_src,
 
 long copy_from_kernel_nofault(void *dst, const void *src, size_t size)
 {
+	unsigned long align = 0;
+
+	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
+		align = (unsigned long)dst | (unsigned long)src;
+
 	if (!copy_from_kernel_nofault_allowed(src, size))
 		return -ERANGE;
 
 	pagefault_disable();
-	copy_from_kernel_nofault_loop(dst, src, size, u64, Efault);
-	copy_from_kernel_nofault_loop(dst, src, size, u32, Efault);
-	copy_from_kernel_nofault_loop(dst, src, size, u16, Efault);
+	if (!(align & 7))
+		copy_from_kernel_nofault_loop(dst, src, size, u64, Efault);
+	if (!(align & 3))
+		copy_from_kernel_nofault_loop(dst, src, size, u32, Efault);
+	if (!(align & 1))
+		copy_from_kernel_nofault_loop(dst, src, size, u16, Efault);
 	copy_from_kernel_nofault_loop(dst, src, size, u8, Efault);
 	pagefault_enable();
 	return 0;
@@ -50,10 +58,18 @@ EXPORT_SYMBOL_GPL(copy_from_kernel_nofault);
 
 long copy_to_kernel_nofault(void *dst, const void *src, size_t size)
 {
+	unsigned long align = 0;
+
+	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
+		align = (unsigned long)dst | (unsigned long)src;
+
 	pagefault_disable();
-	copy_to_kernel_nofault_loop(dst, src, size, u64, Efault);
-	copy_to_kernel_nofault_loop(dst, src, size, u32, Efault);
-	copy_to_kernel_nofault_loop(dst, src, size, u16, Efault);
+	if (!(align & 7))
+		copy_to_kernel_nofault_loop(dst, src, size, u64, Efault);
+	if (!(align & 3))
+		copy_to_kernel_nofault_loop(dst, src, size, u32, Efault);
+	if (!(align & 1))
+		copy_to_kernel_nofault_loop(dst, src, size, u16, Efault);
 	copy_to_kernel_nofault_loop(dst, src, size, u8, Efault);
 	pagefault_enable();
 	return 0;
-- 
2.27.0



* [PATCH 2/9] ARM: traps: use get_kernel_nofault instead of set_fs()
  2020-09-07 15:36 [PATCH 0/9] ARM: remove set_fs callers and implementation Arnd Bergmann
  2020-09-07 15:36 ` [PATCH 1/9] mm/maccess: fix unaligned copy_{from,to}_kernel_nofault Arnd Bergmann
@ 2020-09-07 15:36 ` Arnd Bergmann
  2020-09-08  6:15   ` Christoph Hellwig
  2020-09-07 15:36 ` [PATCH 3/9] ARM: oabi-compat: add epoll_pwait handler Arnd Bergmann
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Arnd Bergmann @ 2020-09-07 15:36 UTC (permalink / raw)
  To: Christoph Hellwig, Russell King
  Cc: Alexander Viro, kernel, linux-arch, linux-arm-kernel,
	linus.walleij, Arnd Bergmann, Russell King, Andrew Morton,
	Dmitry Safonov, linux-kernel

The stack dumping code needs to work for both kernel and user mode,
and currently this works by using set_fs() and then calling get_user()
to carefully access a potentially invalid pointer.

Change both locations to handle user and kernel mode differently, using
get_kernel_nofault() in case of kernel pointers.

I change __get_user() to get_user() here for consistency, as user space
stacks should not point into kernel memory.

In dump_backtrace_entry() I assume that dump_mem() can only operate on
kernel pointers when in_entry_text(from) is true, rather than checking
the mode register.
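
The resulting per-word access pattern, in sketch form (this mirrors the
dump_mem() hunk below; val, err, p and kernel_mode are the variables
used there):

	u32 val;
	int err;

	if (kernel_mode)	/* dumping a kernel stack */
		err = get_kernel_nofault(val, (u32 *)p);
	else			/* dumping a user stack */
		err = get_user(val, (u32 *)p);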

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/kernel/traps.c | 69 ++++++++++++++++++-----------------------
 1 file changed, 31 insertions(+), 38 deletions(-)

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 17d5a785df28..ebed261b356f 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -60,7 +60,7 @@ static int __init user_debug_setup(char *str)
 __setup("user_debug=", user_debug_setup);
 #endif
 
-static void dump_mem(const char *, const char *, unsigned long, unsigned long);
+static void dump_mem(const char *, const char *, unsigned long, unsigned long, bool kernel_mode);
 
 void dump_backtrace_entry(unsigned long where, unsigned long from,
 			  unsigned long frame, const char *loglvl)
@@ -76,7 +76,7 @@ void dump_backtrace_entry(unsigned long where, unsigned long from,
 #endif
 
 	if (in_entry_text(from) && end <= ALIGN(frame, THREAD_SIZE))
-		dump_mem(loglvl, "Exception stack", frame + 4, end);
+		dump_mem(loglvl, "Exception stack", frame + 4, end, true);
 }
 
 void dump_backtrace_stm(u32 *stack, u32 instruction, const char *loglvl)
@@ -119,20 +119,11 @@ static int verify_stack(unsigned long sp)
  * Dump out the contents of some memory nicely...
  */
 static void dump_mem(const char *lvl, const char *str, unsigned long bottom,
-		     unsigned long top)
+		     unsigned long top, bool kernel_mode)
 {
 	unsigned long first;
-	mm_segment_t fs;
 	int i;
 
-	/*
-	 * We need to switch to kernel mode so that we can use __get_user
-	 * to safely read from kernel space.  Note that we now dump the
-	 * code first, just in case the backtrace kills us.
-	 */
-	fs = get_fs();
-	set_fs(KERNEL_DS);
-
 	printk("%s%s(0x%08lx to 0x%08lx)\n", lvl, str, bottom, top);
 
 	for (first = bottom & ~31; first < top; first += 32) {
@@ -144,20 +135,25 @@ static void dump_mem(const char *lvl, const char *str, unsigned long bottom,
 
 		for (p = first, i = 0; i < 8 && p < top; i++, p += 4) {
 			if (p >= bottom && p < top) {
-				unsigned long val;
-				if (__get_user(val, (unsigned long *)p) == 0)
-					sprintf(str + i * 9, " %08lx", val);
+				u32 val;
+				int err;
+
+				if (kernel_mode)
+					err = get_kernel_nofault(val, (u32 *)p);
+				else
+					err = get_user(val, (u32 *)p);
+
+				if (!err)
+					sprintf(str + i * 9, " %08x", val);
 				else
 					sprintf(str + i * 9, " ????????");
 			}
 		}
 		printk("%s%04lx:%s\n", lvl, first & 0xffff, str);
 	}
-
-	set_fs(fs);
 }
 
-static void __dump_instr(const char *lvl, struct pt_regs *regs)
+static void dump_instr(const char *lvl, struct pt_regs *regs)
 {
 	unsigned long addr = instruction_pointer(regs);
 	const int thumb = thumb_mode(regs);
@@ -173,10 +169,20 @@ static void __dump_instr(const char *lvl, struct pt_regs *regs)
 	for (i = -4; i < 1 + !!thumb; i++) {
 		unsigned int val, bad;
 
-		if (thumb)
-			bad = get_user(val, &((u16 *)addr)[i]);
-		else
-			bad = get_user(val, &((u32 *)addr)[i]);
+		if (!user_mode(regs)) {
+			if (thumb) {
+				u16 val16;
+				bad = get_kernel_nofault(val16, &((u16 *)addr)[i]);
+				val = val16;
+			} else {
+				bad = get_kernel_nofault(val, &((u32 *)addr)[i]);
+			}
+		} else {
+			if (thumb)
+				bad = get_user(val, &((u16 *)addr)[i]);
+			else
+				bad = get_user(val, &((u32 *)addr)[i]);
+		}
 
 		if (!bad)
 			p += sprintf(p, i == 0 ? "(%0*x) " : "%0*x ",
@@ -189,20 +195,6 @@ static void __dump_instr(const char *lvl, struct pt_regs *regs)
 	printk("%sCode: %s\n", lvl, str);
 }
 
-static void dump_instr(const char *lvl, struct pt_regs *regs)
-{
-	mm_segment_t fs;
-
-	if (!user_mode(regs)) {
-		fs = get_fs();
-		set_fs(KERNEL_DS);
-		__dump_instr(lvl, regs);
-		set_fs(fs);
-	} else {
-		__dump_instr(lvl, regs);
-	}
-}
-
 #ifdef CONFIG_ARM_UNWIND
 static inline void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
 				  const char *loglvl)
@@ -276,6 +268,7 @@ static int __die(const char *str, int err, struct pt_regs *regs)
 	struct task_struct *tsk = current;
 	static int die_counter;
 	int ret;
+	bool kernel_mode = !user_mode(regs);
 
 	pr_emerg("Internal error: %s: %x [#%d]" S_PREEMPT S_SMP S_ISA "\n",
 	         str, err, ++die_counter);
@@ -290,9 +283,9 @@ static int __die(const char *str, int err, struct pt_regs *regs)
 	pr_emerg("Process %.*s (pid: %d, stack limit = 0x%p)\n",
 		 TASK_COMM_LEN, tsk->comm, task_pid_nr(tsk), end_of_stack(tsk));
 
-	if (!user_mode(regs) || in_interrupt()) {
+	if (kernel_mode || in_interrupt()) {
 		dump_mem(KERN_EMERG, "Stack: ", regs->ARM_sp,
-			 THREAD_SIZE + (unsigned long)task_stack_page(tsk));
+			 THREAD_SIZE + (unsigned long)task_stack_page(tsk), kernel_mode);
 		dump_backtrace(regs, tsk, KERN_EMERG);
 		dump_instr(KERN_EMERG, regs);
 	}
-- 
2.27.0



* [PATCH 3/9] ARM: oabi-compat: add epoll_pwait handler
  2020-09-07 15:36 [PATCH 0/9] ARM: remove set_fs callers and implementation Arnd Bergmann
  2020-09-07 15:36 ` [PATCH 1/9] mm/maccess: fix unaligned copy_{from,to}_kernel_nofault Arnd Bergmann
  2020-09-07 15:36 ` [PATCH 2/9] ARM: traps: use get_kernel_nofault instead of set_fs() Arnd Bergmann
@ 2020-09-07 15:36 ` Arnd Bergmann
  2020-09-08  6:16   ` Christoph Hellwig
  2020-09-07 15:36 ` [PATCH 4/9] ARM: syscall: always store thread_info->syscall Arnd Bergmann
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Arnd Bergmann @ 2020-09-07 15:36 UTC (permalink / raw)
  To: Christoph Hellwig, Russell King
  Cc: Alexander Viro, kernel, linux-arch, linux-arm-kernel,
	linus.walleij, Arnd Bergmann, stable, Mikael Pettersson,
	Russell King, Russell King, Christian Brauner, Andrew Morton,
	linux-kernel

The epoll_wait() syscall has a special version for OABI compat
mode to convert the arguments to the EABI structure layout
of the kernel. However, the later epoll_pwait() syscall was
added in arch/arm in linux-2.6.32 without this conversion.

Use the same kind of handler for both.
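
For reference, the difference boils down to structure padding: EABI
aligns the 64-bit member to 8 bytes while OABI only requires 4-byte
alignment. A simplified sketch (see sys_oabi-compat.c for the real
definitions):

	/* EABI/kernel layout: 4 bytes of padding after events, 16 bytes */
	struct epoll_event {
		__u32	events;
		__u64	data;
	};

	/* OABI user layout: no padding, 12 bytes */
	struct oabi_epoll_event {
		__u32	events;
		__u64	data;
	} __attribute__((packed, aligned(4)));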

Fixes: 369842658a36 ("ARM: 5677/1: ARM support for TIF_RESTORE_SIGMASK/pselect6/ppoll/epoll_pwait")
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/kernel/sys_oabi-compat.c | 35 ++++++++++++++++++++++++++++---
 arch/arm/tools/syscall.tbl        |  2 +-
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index 0203e545bbc8..2ce3e8c6ca91 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -264,9 +264,8 @@ asmlinkage long sys_oabi_epoll_ctl(int epfd, int op, int fd,
 	return do_epoll_ctl(epfd, op, fd, &kernel, false);
 }
 
-asmlinkage long sys_oabi_epoll_wait(int epfd,
-				    struct oabi_epoll_event __user *events,
-				    int maxevents, int timeout)
+static long do_oabi_epoll_wait(int epfd, struct oabi_epoll_event __user *events,
+			       int maxevents, int timeout)
 {
 	struct epoll_event *kbuf;
 	struct oabi_epoll_event e;
@@ -299,6 +298,36 @@ asmlinkage long sys_oabi_epoll_wait(int epfd,
 	return err ? -EFAULT : ret;
 }
 
+SYSCALL_DEFINE4(oabi_epoll_wait, int, epfd, struct oabi_epoll_event __user *, events,
+		int, maxevents, int, timeout)
+{
+	return do_oabi_epoll_wait(epfd, events, maxevents, timeout);
+}
+
+/*
+ * Implement the event wait interface for the eventpoll file. It is the kernel
+ * part of the user space epoll_pwait(2).
+ */
+SYSCALL_DEFINE6(oabi_epoll_pwait, int, epfd, struct oabi_epoll_event __user *, events,
+		int, maxevents, int, timeout, const sigset_t __user *, sigmask,
+		size_t, sigsetsize)
+{
+	int error;
+
+	/*
+	 * If the caller wants a certain signal mask to be set during the wait,
+	 * we apply it here.
+	 */
+	error = set_user_sigmask(sigmask, sigsetsize);
+	if (error)
+		return error;
+
+	error = do_oabi_epoll_wait(epfd, events, maxevents, timeout);
+	restore_saved_sigmask_unless(error == -EINTR);
+
+	return error;
+}
+
 struct oabi_sembuf {
 	unsigned short	sem_num;
 	short		sem_op;
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 171077cbf419..39a24bee7df8 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -360,7 +360,7 @@
 343	common	vmsplice		sys_vmsplice
 344	common	move_pages		sys_move_pages
 345	common	getcpu			sys_getcpu
-346	common	epoll_pwait		sys_epoll_pwait
+346	common	epoll_pwait		sys_epoll_pwait		sys_oabi_epoll_pwait
 347	common	kexec_load		sys_kexec_load
 348	common	utimensat		sys_utimensat_time32
 349	common	signalfd		sys_signalfd
-- 
2.27.0



* [PATCH 4/9] ARM: syscall: always store thread_info->syscall
  2020-09-07 15:36 [PATCH 0/9] ARM: remove set_fs callers and implementation Arnd Bergmann
                   ` (2 preceding siblings ...)
  2020-09-07 15:36 ` [PATCH 3/9] ARM: oabi-compat: add epoll_pwait handler Arnd Bergmann
@ 2020-09-07 15:36 ` Arnd Bergmann
  2020-09-28  9:41   ` Linus Walleij
  2020-09-07 15:36 ` [PATCH 5/9] ARM: oabi-compat: rework epoll_wait/epoll_pwait emulation Arnd Bergmann
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Arnd Bergmann @ 2020-09-07 15:36 UTC (permalink / raw)
  To: Christoph Hellwig, Russell King
  Cc: Alexander Viro, kernel, linux-arch, linux-arm-kernel,
	linus.walleij, Arnd Bergmann, Russell King, Oleg Nesterov,
	linux-kernel

The system call number is used in a couple of places, in particular
ptrace, seccomp and /proc/<pid>/syscall.

The last one apparently never worked reliably on ARM for tasks
that are not currently getting traced.

Storing the syscall number in the normal entry path makes it work,
as well as allowing us to see if the current system call is for
OABI compat mode, which is the next thing I want to hook into.
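
To illustrate (raw and nr are illustrative variables; the entry code
below stores the number with __NR_OABI_SYSCALL_BASE still included for
old-ABI calls):

	unsigned int raw = task_thread_info(task)->syscall;

	if (raw & __NR_OABI_SYSCALL_BASE)
		nr = raw & ~__NR_OABI_SYSCALL_BASE;	/* OABI call */
	else
		nr = raw;				/* EABI call */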

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/include/asm/syscall.h | 3 +++
 arch/arm/kernel/asm-offsets.c  | 1 +
 arch/arm/kernel/entry-common.S | 7 +++++--
 arch/arm/kernel/ptrace.c       | 4 ++--
 4 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index fd02761ba06c..ff6cc365eaf7 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -22,6 +22,9 @@ extern const unsigned long sys_call_table[];
 static inline int syscall_get_nr(struct task_struct *task,
 				 struct pt_regs *regs)
 {
+	if (IS_ENABLED(CONFIG_OABI_COMPAT))
+		return task_thread_info(task)->syscall & ~__NR_OABI_SYSCALL_BASE;
+
 	return task_thread_info(task)->syscall;
 }
 
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index a1570c8bab25..97af6735172b 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -46,6 +46,7 @@ int main(void)
   DEFINE(TI_CPU,		offsetof(struct thread_info, cpu));
   DEFINE(TI_CPU_DOMAIN,		offsetof(struct thread_info, cpu_domain));
   DEFINE(TI_CPU_SAVE,		offsetof(struct thread_info, cpu_context));
+  DEFINE(TI_SYSCALL,		offsetof(struct thread_info, syscall));
   DEFINE(TI_USED_CP,		offsetof(struct thread_info, used_cp));
   DEFINE(TI_TP_VALUE,		offsetof(struct thread_info, tp_value));
   DEFINE(TI_FPSTATE,		offsetof(struct thread_info, fpstate));
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index 271cb8a1eba1..2ea3a1989fed 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -223,6 +223,7 @@ ENTRY(vector_swi)
 	/* saved_psr and saved_pc are now dead */
 
 	uaccess_disable tbl
+	get_thread_info tsk
 
 	adr	tbl, sys_call_table		@ load syscall table pointer
 
@@ -234,13 +235,16 @@ ENTRY(vector_swi)
 	 * get the old ABI syscall table address.
 	 */
 	bics	r10, r10, #0xff000000
+	str	r10, [tsk, #TI_SYSCALL]
 	eorne	scno, r10, #__NR_OABI_SYSCALL_BASE
 	ldrne	tbl, =sys_oabi_call_table
 #elif !defined(CONFIG_AEABI)
 	bic	scno, scno, #0xff000000		@ mask off SWI op-code
+	str	scno, [tsk, #TI_SYSCALL]
 	eor	scno, scno, #__NR_SYSCALL_BASE	@ check OS number
+#else
+	str	scno, [tsk, #TI_SYSCALL]
 #endif
-	get_thread_info tsk
 	/*
 	 * Reload the registers that may have been corrupted on entry to
 	 * the syscall assembly (by tracing or context tracking.)
@@ -285,7 +289,6 @@ ENDPROC(vector_swi)
 	 * context switches, and waiting for our parent to respond.
 	 */
 __sys_trace:
-	mov	r1, scno
 	add	r0, sp, #S_OFF
 	bl	syscall_trace_enter
 	mov	scno, r0
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 2771e682220b..252060663b00 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -885,9 +885,9 @@ static void tracehook_report_syscall(struct pt_regs *regs,
 	regs->ARM_ip = ip;
 }
 
-asmlinkage int syscall_trace_enter(struct pt_regs *regs, int scno)
+asmlinkage int syscall_trace_enter(struct pt_regs *regs)
 {
-	current_thread_info()->syscall = scno;
+	int scno;
 
 	if (test_thread_flag(TIF_SYSCALL_TRACE))
 		tracehook_report_syscall(regs, PTRACE_SYSCALL_ENTER);
-- 
2.27.0



* [PATCH 5/9] ARM: oabi-compat: rework epoll_wait/epoll_pwait emulation
  2020-09-07 15:36 [PATCH 0/9] ARM: remove set_fs callers and implementation Arnd Bergmann
                   ` (3 preceding siblings ...)
  2020-09-07 15:36 ` [PATCH 4/9] ARM: syscall: always store thread_info->syscall Arnd Bergmann
@ 2020-09-07 15:36 ` Arnd Bergmann
  2020-09-08  6:20   ` Christoph Hellwig
  2020-09-07 15:36 ` [PATCH 6/9] ARM: oabi-compat: rework sys_semtimedop emulation Arnd Bergmann
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Arnd Bergmann @ 2020-09-07 15:36 UTC (permalink / raw)
  To: Christoph Hellwig, Russell King
  Cc: Alexander Viro, kernel, linux-arch, linux-arm-kernel,
	linus.walleij, Arnd Bergmann, Russell King, Christian Brauner,
	Andrew Morton, linux-kernel, linux-fsdevel

The epoll_wait() system call wrapper is one of the remaining users of
the set_fs() infrastructure for Arm. Unfortunately, changing it to not
require set_fs() is rather complex.

The approach I'm taking here is to allow architectures to override
the code that copies the output to user space, and let the oabi-compat
implementation check whether it is getting called from an EABI or OABI
system call based on the thread_info->syscall value.

The in_oabi_syscall() check here mirrors the in_compat_syscall() and
in_x32_syscall() helpers for 32-bit compat implementations on other
architectures.

Overall, the amount of code goes down, at least once the newly added
sys_oabi_epoll_pwait() helper gets removed again. The downside is
added complexity in the source code for the native implementation.
There should be no difference in runtime performance except for Arm
kernels with CONFIG_OABI_COMPAT enabled that now have to go through
an external function call to check which of the two variants to use.
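
A hypothetical helper to illustrate how in_oabi_syscall() is meant to
be used (epoll_uevent_size() does not exist in the patch;
epoll_put_uevent() below is the real user):

	static size_t epoll_uevent_size(void)
	{
		if (in_oabi_syscall())
			return sizeof(struct oabi_epoll_event);	/* packed, 12 bytes */

		return sizeof(struct epoll_event);	/* 16 bytes on EABI */
	}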

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/include/asm/syscall.h    | 11 +++++
 arch/arm/kernel/sys_oabi-compat.c | 72 +++++++------------------------
 arch/arm/tools/syscall.tbl        |  4 +-
 fs/eventpoll.c                    |  5 +--
 include/linux/eventpoll.h         | 16 +++++++
 5 files changed, 46 insertions(+), 62 deletions(-)

diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index ff6cc365eaf7..0d8afceeefd9 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -28,6 +28,17 @@ static inline int syscall_get_nr(struct task_struct *task,
 	return task_thread_info(task)->syscall;
 }
 
+static inline bool __in_oabi_syscall(struct task_struct *task)
+{
+	return IS_ENABLED(CONFIG_OABI_COMPAT) &&
+		(task_thread_info(task)->syscall & __NR_OABI_SYSCALL_BASE);
+}
+
+static inline bool in_oabi_syscall(void)
+{
+	return __in_oabi_syscall(current);
+}
+
 static inline void syscall_rollback(struct task_struct *task,
 				    struct pt_regs *regs)
 {
diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index 2ce3e8c6ca91..abf1153c5315 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -83,6 +83,8 @@
 #include <linux/uaccess.h>
 #include <linux/slab.h>
 
+#include <asm/syscall.h>
+
 struct oldabi_stat64 {
 	unsigned long long st_dev;
 	unsigned int	__pad1;
@@ -264,68 +266,24 @@ asmlinkage long sys_oabi_epoll_ctl(int epfd, int op, int fd,
 	return do_epoll_ctl(epfd, op, fd, &kernel, false);
 }
 
-static long do_oabi_epoll_wait(int epfd, struct oabi_epoll_event __user *events,
-			       int maxevents, int timeout)
+struct epoll_event __user *
+epoll_put_uevent(__poll_t revents, __u64 data, struct epoll_event __user *uevent)
 {
-	struct epoll_event *kbuf;
-	struct oabi_epoll_event e;
-	mm_segment_t fs;
-	long ret, err, i;
+	if (in_oabi_syscall()) {
+		struct oabi_epoll_event *oevent = (void __user *)uevent;
 
-	if (maxevents <= 0 ||
-			maxevents > (INT_MAX/sizeof(*kbuf)) ||
-			maxevents > (INT_MAX/sizeof(*events)))
-		return -EINVAL;
-	if (!access_ok(events, sizeof(*events) * maxevents))
-		return -EFAULT;
-	kbuf = kmalloc_array(maxevents, sizeof(*kbuf), GFP_KERNEL);
-	if (!kbuf)
-		return -ENOMEM;
-	fs = get_fs();
-	set_fs(KERNEL_DS);
-	ret = sys_epoll_wait(epfd, kbuf, maxevents, timeout);
-	set_fs(fs);
-	err = 0;
-	for (i = 0; i < ret; i++) {
-		e.events = kbuf[i].events;
-		e.data = kbuf[i].data;
-		err = __copy_to_user(events, &e, sizeof(e));
-		if (err)
-			break;
-		events++;
-	}
-	kfree(kbuf);
-	return err ? -EFAULT : ret;
-}
+		if (__put_user(revents, &oevent->events) ||
+		    __put_user(data, &oevent->data))
+			return NULL;
 
-SYSCALL_DEFINE4(oabi_epoll_wait, int, epfd, struct oabi_epoll_event __user *, events,
-		int, maxevents, int, timeout)
-{
-	return do_oabi_epoll_wait(epfd, events, maxevents, timeout);
-}
-
-/*
- * Implement the event wait interface for the eventpoll file. It is the kernel
- * part of the user space epoll_pwait(2).
- */
-SYSCALL_DEFINE6(oabi_epoll_pwait, int, epfd, struct oabi_epoll_event __user *, events,
-		int, maxevents, int, timeout, const sigset_t __user *, sigmask,
-		size_t, sigsetsize)
-{
-	int error;
-
-	/*
-	 * If the caller wants a certain signal mask to be set during the wait,
-	 * we apply it here.
-	 */
-	error = set_user_sigmask(sigmask, sigsetsize);
-	if (error)
-		return error;
+		return (void __user *)uevent+1;
+	}
 
-	error = do_oabi_epoll_wait(epfd, events, maxevents, timeout);
-	restore_saved_sigmask_unless(error == -EINTR);
+	if (__put_user(revents, &uevent->events) ||
+	    __put_user(data, &uevent->data))
+		return NULL;
 
-	return error;
+	return uevent+1;
 }
 
 struct oabi_sembuf {
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 39a24bee7df8..fe5cd48fed91 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -266,7 +266,7 @@
 249	common	lookup_dcookie		sys_lookup_dcookie
 250	common	epoll_create		sys_epoll_create
 251	common	epoll_ctl		sys_epoll_ctl		sys_oabi_epoll_ctl
-252	common	epoll_wait		sys_epoll_wait		sys_oabi_epoll_wait
+252	common	epoll_wait		sys_epoll_wait
 253	common	remap_file_pages	sys_remap_file_pages
 # 254 for set_thread_area
 # 255 for get_thread_area
@@ -360,7 +360,7 @@
 343	common	vmsplice		sys_vmsplice
 344	common	move_pages		sys_move_pages
 345	common	getcpu			sys_getcpu
-346	common	epoll_pwait		sys_epoll_pwait		sys_oabi_epoll_pwait
+346	common	epoll_pwait		sys_epoll_pwait
 347	common	kexec_load		sys_kexec_load
 348	common	utimensat		sys_utimensat_time32
 349	common	signalfd		sys_signalfd
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 12eebcdea9c8..796d9e72dc96 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1745,8 +1745,8 @@ static __poll_t ep_send_events_proc(struct eventpoll *ep, struct list_head *head
 		if (!revents)
 			continue;
 
-		if (__put_user(revents, &uevent->events) ||
-		    __put_user(epi->event.data, &uevent->data)) {
+		uevent = epoll_put_uevent(revents, epi->event.data, uevent);
+		if (!uevent) {
 			list_add(&epi->rdllink, head);
 			ep_pm_stay_awake(epi);
 			if (!esed->res)
@@ -1754,7 +1754,6 @@ static __poll_t ep_send_events_proc(struct eventpoll *ep, struct list_head *head
 			return 0;
 		}
 		esed->res++;
-		uevent++;
 		if (epi->event.events & EPOLLONESHOT)
 			epi->event.events &= EP_PRIVATE_BITS;
 		else if (!(epi->event.events & EPOLLET)) {
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 8f000fada5a4..60df60ee78c6 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -77,4 +77,20 @@ static inline void eventpoll_release(struct file *file) {}
 
 #endif
 
+#if !defined(CONFIG_ARM) || !defined(CONFIG_OABI_COMPAT)
+/* ARM OABI has an incompatible struct layout and needs a special handler */
+static inline struct epoll_event __user *
+epoll_put_uevent(__poll_t revents, __u64 data, struct epoll_event __user *uevent)
+{
+	if (__put_user(revents, &uevent->events) ||
+	    __put_user(data, &uevent->data))
+		return NULL;
+
+	return uevent+1;
+}
+#else
+struct epoll_event __user *
+epoll_put_uevent(__poll_t revents, __u64 data, struct epoll_event __user *uevent);
+#endif
+
 #endif /* #ifndef _LINUX_EVENTPOLL_H */
-- 
2.27.0



* [PATCH 6/9] ARM: oabi-compat: rework sys_semtimedop emulation
  2020-09-07 15:36 [PATCH 0/9] ARM: remove set_fs callers and implementation Arnd Bergmann
                   ` (4 preceding siblings ...)
  2020-09-07 15:36 ` [PATCH 5/9] ARM: oabi-compat: rework epoll_wait/epoll_pwait emulation Arnd Bergmann
@ 2020-09-07 15:36 ` Arnd Bergmann
  2020-09-07 15:36 ` [PATCH 7/9] ARM: oabi-compat: rework fcntl64() emulation Arnd Bergmann
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Arnd Bergmann @ 2020-09-07 15:36 UTC (permalink / raw)
  To: Christoph Hellwig, Russell King, Russell King
  Cc: Alexander Viro, kernel, linux-arch, linux-arm-kernel,
	linus.walleij, Arnd Bergmann, Christian Brauner,
	Dominik Brodowski, Andrew Morton, linux-kernel, linux-api

sys_oabi_semtimedop() is one of the last users of set_fs() on Arm. To
remove this one, expose the internal code of the actual implementation
that operates on a kernel pointer and call it directly after copying.
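
In sketch form, the oabi wrapper then calls the exposed helper directly
on kernel memory (matching the sys_oabi_semtimedop() hunk below):

	/* sops[] was already copied in and converted from oabi_sembuf */
	if (timeout) {
		struct timespec64 ts;

		err = get_old_timespec32(&ts, timeout);
		if (!err)
			err = __do_semtimedop(semid, sops, nsops, &ts, ns);
	} else {
		err = __do_semtimedop(semid, sops, nsops, NULL, ns);
	}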

There should be no measurable impact on the normal execution of this
function, and it makes the overly long function a little shorter, which
may help readability.

While reworking the oabi version, make it behave a little more like
the native one, using kvmalloc_array() and restructuring the code
flow in a similar way.

The naming of __do_semtimedop() is not very good; I hope someone can
come up with a better name.

One regression was spotted by kernel test robot <rong.a.chen@intel.com>
and fixed before the first mailing list submission.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/kernel/sys_oabi-compat.c | 38 ++++++++------
 include/linux/syscalls.h          |  2 +
 ipc/sem.c                         | 84 +++++++++++++++++++------------
 3 files changed, 76 insertions(+), 48 deletions(-)

diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index abf1153c5315..d3c6460d13ca 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -80,6 +80,7 @@
 #include <linux/socket.h>
 #include <linux/net.h>
 #include <linux/ipc.h>
+#include <linux/ipc_namespace.h>
 #include <linux/uaccess.h>
 #include <linux/slab.h>
 
@@ -293,46 +294,51 @@ struct oabi_sembuf {
 	unsigned short	__pad;
 };
 
+#define sc_semopm     sem_ctls[2]
+
 asmlinkage long sys_oabi_semtimedop(int semid,
 				    struct oabi_sembuf __user *tsops,
 				    unsigned nsops,
 				    const struct old_timespec32 __user *timeout)
 {
+	struct ipc_namespace *ns;
 	struct sembuf *sops;
-	struct old_timespec32 local_timeout;
 	long err;
 	int i;
 
+	ns = current->nsproxy->ipc_ns;
+	if (nsops > ns->sc_semopm)
+		return -E2BIG;
 	if (nsops < 1 || nsops > SEMOPM)
 		return -EINVAL;
-	if (!access_ok(tsops, sizeof(*tsops) * nsops))
-		return -EFAULT;
-	sops = kmalloc_array(nsops, sizeof(*sops), GFP_KERNEL);
+	sops = kvmalloc_array(nsops, sizeof(*sops), GFP_KERNEL);
 	if (!sops)
 		return -ENOMEM;
 	err = 0;
 	for (i = 0; i < nsops; i++) {
 		struct oabi_sembuf osb;
-		err |= __copy_from_user(&osb, tsops, sizeof(osb));
+		err |= copy_from_user(&osb, tsops, sizeof(osb));
 		sops[i].sem_num = osb.sem_num;
 		sops[i].sem_op = osb.sem_op;
 		sops[i].sem_flg = osb.sem_flg;
 		tsops++;
 	}
-	if (timeout) {
-		/* copy this as well before changing domain protection */
-		err |= copy_from_user(&local_timeout, timeout, sizeof(*timeout));
-		timeout = &local_timeout;
-	}
 	if (err) {
 		err = -EFAULT;
-	} else {
-		mm_segment_t fs = get_fs();
-		set_fs(KERNEL_DS);
-		err = sys_semtimedop_time32(semid, sops, nsops, timeout);
-		set_fs(fs);
+		goto out;
+	}
+
+	if (timeout) {
+		struct timespec64 ts;
+		err = get_old_timespec32(&ts, timeout);
+		if (err)
+			goto out;
+		err = __do_semtimedop(semid, sops, nsops, &ts, ns);
+		goto out;
 	}
-	kfree(sops);
+	err = __do_semtimedop(semid, sops, nsops, NULL, ns);
+out:
+	kvfree(sops);
 	return err;
 }
 
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 75ac7f8ae93c..c77bd4cce536 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1340,6 +1340,8 @@ long ksys_old_shmctl(int shmid, int cmd, struct shmid_ds __user *buf);
 long compat_ksys_semtimedop(int semid, struct sembuf __user *tsems,
 			    unsigned int nsops,
 			    const struct old_timespec32 __user *timeout);
+long __do_semtimedop(int semid, struct sembuf *tsems, unsigned int nsops,
+		     const struct timespec64 *timeout, struct ipc_namespace *ns);
 
 int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
 		int __user *optlen);
diff --git a/ipc/sem.c b/ipc/sem.c
index 8c0244e0365e..515a39a67534 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -1978,46 +1978,34 @@ static struct sem_undo *find_alloc_undo(struct ipc_namespace *ns, int semid)
 	return un;
 }
 
-static long do_semtimedop(int semid, struct sembuf __user *tsops,
-		unsigned nsops, const struct timespec64 *timeout)
+long __do_semtimedop(int semid, struct sembuf *sops,
+		unsigned nsops, const struct timespec64 *timeout,
+		struct ipc_namespace *ns)
 {
 	int error = -EINVAL;
 	struct sem_array *sma;
-	struct sembuf fast_sops[SEMOPM_FAST];
-	struct sembuf *sops = fast_sops, *sop;
+	struct sembuf *sop;
 	struct sem_undo *un;
 	int max, locknum;
 	bool undos = false, alter = false, dupsop = false;
 	struct sem_queue queue;
 	unsigned long dup = 0, jiffies_left = 0;
-	struct ipc_namespace *ns;
-
-	ns = current->nsproxy->ipc_ns;
 
 	if (nsops < 1 || semid < 0)
 		return -EINVAL;
 	if (nsops > ns->sc_semopm)
 		return -E2BIG;
-	if (nsops > SEMOPM_FAST) {
-		sops = kvmalloc_array(nsops, sizeof(*sops), GFP_KERNEL);
-		if (sops == NULL)
-			return -ENOMEM;
-	}
-
-	if (copy_from_user(sops, tsops, nsops * sizeof(*tsops))) {
-		error =  -EFAULT;
-		goto out_free;
-	}
 
 	if (timeout) {
 		if (timeout->tv_sec < 0 || timeout->tv_nsec < 0 ||
 			timeout->tv_nsec >= 1000000000L) {
 			error = -EINVAL;
-			goto out_free;
+			goto out;
 		}
 		jiffies_left = timespec64_to_jiffies(timeout);
 	}
 
+
 	max = 0;
 	for (sop = sops; sop < sops + nsops; sop++) {
 		unsigned long mask = 1ULL << ((sop->sem_num) % BITS_PER_LONG);
@@ -2046,7 +2034,7 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
 		un = find_alloc_undo(ns, semid);
 		if (IS_ERR(un)) {
 			error = PTR_ERR(un);
-			goto out_free;
+			goto out;
 		}
 	} else {
 		un = NULL;
@@ -2057,25 +2045,25 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
 	if (IS_ERR(sma)) {
 		rcu_read_unlock();
 		error = PTR_ERR(sma);
-		goto out_free;
+		goto out;
 	}
 
 	error = -EFBIG;
 	if (max >= sma->sem_nsems) {
 		rcu_read_unlock();
-		goto out_free;
+		goto out;
 	}
 
 	error = -EACCES;
 	if (ipcperms(ns, &sma->sem_perm, alter ? S_IWUGO : S_IRUGO)) {
 		rcu_read_unlock();
-		goto out_free;
+		goto out;
 	}
 
 	error = security_sem_semop(&sma->sem_perm, sops, nsops, alter);
 	if (error) {
 		rcu_read_unlock();
-		goto out_free;
+		goto out;
 	}
 
 	error = -EIDRM;
@@ -2089,7 +2077,7 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
 	 * entangled here and why it's RMID race safe on comments at sem_lock()
 	 */
 	if (!ipc_valid_object(&sma->sem_perm))
-		goto out_unlock_free;
+		goto out_unlock;
 	/*
 	 * semid identifiers are not unique - find_alloc_undo may have
 	 * allocated an undo structure, it was invalidated by an RMID
@@ -2098,7 +2086,7 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
 	 * "un" itself is guaranteed by rcu.
 	 */
 	if (un && un->semid == -1)
-		goto out_unlock_free;
+		goto out_unlock;
 
 	queue.sops = sops;
 	queue.nsops = nsops;
@@ -2124,10 +2112,10 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
 		rcu_read_unlock();
 		wake_up_q(&wake_q);
 
-		goto out_free;
+		goto out;
 	}
 	if (error < 0) /* non-blocking error path */
-		goto out_unlock_free;
+		goto out_unlock;
 
 	/*
 	 * We need to sleep on this operation, so we put the current
@@ -2192,14 +2180,14 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
 		if (error != -EINTR) {
 			/* see SEM_BARRIER_2 for purpose/pairing */
 			smp_acquire__after_ctrl_dep();
-			goto out_free;
+			goto out;
 		}
 
 		rcu_read_lock();
 		locknum = sem_lock(sma, sops, nsops);
 
 		if (!ipc_valid_object(&sma->sem_perm))
-			goto out_unlock_free;
+			goto out_unlock;
 
 		/*
 		 * No necessity for any barrier: We are protect by sem_lock()
@@ -2211,7 +2199,7 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
 		 * Leave without unlink_queue(), but with sem_unlock().
 		 */
 		if (error != -EINTR)
-			goto out_unlock_free;
+			goto out_unlock;
 
 		/*
 		 * If an interrupt occurred we have to clean up the queue.
@@ -2222,13 +2210,45 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
 
 	unlink_queue(sma, &queue);
 
-out_unlock_free:
+out_unlock:
 	sem_unlock(sma, locknum);
 	rcu_read_unlock();
+out:
+	return error;
+}
+
+static long do_semtimedop(int semid, struct sembuf __user *tsops,
+		unsigned nsops, const struct timespec64 *timeout)
+{
+	struct sembuf fast_sops[SEMOPM_FAST];
+	struct sembuf *sops = fast_sops;
+	struct ipc_namespace *ns;
+	int ret;
+
+	ns = current->nsproxy->ipc_ns;
+	if (nsops > ns->sc_semopm)
+		return -E2BIG;
+	if (nsops < 1)
+		return -EINVAL;
+
+	if (nsops > SEMOPM_FAST) {
+		sops = kvmalloc_array(nsops, sizeof(*sops), GFP_KERNEL);
+		if (sops == NULL)
+			return -ENOMEM;
+	}
+
+	if (copy_from_user(sops, tsops, nsops * sizeof(*tsops))) {
+		ret =  -EFAULT;
+		goto out_free;
+	}
+
+	ret = __do_semtimedop(semid, sops, nsops, timeout, ns);
+
 out_free:
 	if (sops != fast_sops)
 		kvfree(sops);
-	return error;
+
+	return ret;
 }
 
 long ksys_semtimedop(int semid, struct sembuf __user *tsops,
-- 
2.27.0



* [PATCH 7/9] ARM: oabi-compat: rework fcntl64() emulation
  2020-09-07 15:36 [PATCH 0/9] ARM: remove set_fs callers and implementation Arnd Bergmann
                   ` (5 preceding siblings ...)
  2020-09-07 15:36 ` [PATCH 6/9] ARM: oabi-compat: rework sys_semtimedop emulation Arnd Bergmann
@ 2020-09-07 15:36 ` Arnd Bergmann
  2020-09-07 15:36 ` [PATCH 8/9] ARM: uaccess: add __{get,put}_kernel_nofault Arnd Bergmann
  2020-09-07 15:36 ` [PATCH 9/9] ARM: uaccess: remove set_fs() implementation Arnd Bergmann
  8 siblings, 0 replies; 24+ messages in thread
From: Arnd Bergmann @ 2020-09-07 15:36 UTC (permalink / raw)
  To: Christoph Hellwig, Russell King, Russell King
  Cc: Alexander Viro, kernel, linux-arch, linux-arm-kernel,
	linus.walleij, Arnd Bergmann, linux-kernel

This is one of the last users of get_fs(), and it is fairly easy to
change since the infrastructure for it is already there.

The replacement here is essentially a copy of the existing fcntl64()
syscall entry function.
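
The F_GETLK64 path, in sketch form (the conversion helpers and the
fcntl_getlk64()/fcntl_setlk64() calls below replace the old
set_fs()-based round trip through sys_fcntl64()):

	err = get_oabi_flock(&flock, argp);	/* OABI layout -> struct flock64 */
	if (!err)
		err = fcntl_getlk64(f.file, cmd, &flock);
	if (!err)
		err = put_oabi_flock(&flock, argp);	/* struct flock64 -> OABI layout */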

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/kernel/sys_oabi-compat.c | 93 ++++++++++++++++++++-----------
 1 file changed, 60 insertions(+), 33 deletions(-)

diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index d3c6460d13ca..13956d5e50d7 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -194,56 +194,83 @@ struct oabi_flock64 {
 	pid_t	l_pid;
 } __attribute__ ((packed,aligned(4)));
 
-static long do_locks(unsigned int fd, unsigned int cmd,
-				 unsigned long arg)
+static int get_oabi_flock(struct flock64 *kernel, struct oabi_flock64 __user *arg)
 {
-	struct flock64 kernel;
 	struct oabi_flock64 user;
-	mm_segment_t fs;
-	long ret;
 
 	if (copy_from_user(&user, (struct oabi_flock64 __user *)arg,
 			   sizeof(user)))
 		return -EFAULT;
-	kernel.l_type	= user.l_type;
-	kernel.l_whence	= user.l_whence;
-	kernel.l_start	= user.l_start;
-	kernel.l_len	= user.l_len;
-	kernel.l_pid	= user.l_pid;
-
-	fs = get_fs();
-	set_fs(KERNEL_DS);
-	ret = sys_fcntl64(fd, cmd, (unsigned long)&kernel);
-	set_fs(fs);
-
-	if (!ret && (cmd == F_GETLK64 || cmd == F_OFD_GETLK)) {
-		user.l_type	= kernel.l_type;
-		user.l_whence	= kernel.l_whence;
-		user.l_start	= kernel.l_start;
-		user.l_len	= kernel.l_len;
-		user.l_pid	= kernel.l_pid;
-		if (copy_to_user((struct oabi_flock64 __user *)arg,
-				 &user, sizeof(user)))
-			ret = -EFAULT;
-	}
-	return ret;
+
+	kernel->l_type	 = user.l_type;
+	kernel->l_whence = user.l_whence;
+	kernel->l_start	 = user.l_start;
+	kernel->l_len	 = user.l_len;
+	kernel->l_pid	 = user.l_pid;
+
+	return 0;
+}
+
+static int put_oabi_flock(struct flock64 *kernel, struct oabi_flock64 __user *arg)
+{
+	struct oabi_flock64 user;
+
+	user.l_type	= kernel->l_type;
+	user.l_whence	= kernel->l_whence;
+	user.l_start	= kernel->l_start;
+	user.l_len	= kernel->l_len;
+	user.l_pid	= kernel->l_pid;
+
+	if (copy_to_user((struct oabi_flock64 __user *)arg,
+			 &user, sizeof(user)))
+		return -EFAULT;
+
+	return 0;
 }
 
 asmlinkage long sys_oabi_fcntl64(unsigned int fd, unsigned int cmd,
 				 unsigned long arg)
 {
+	void __user *argp = (void __user *)arg;
+	struct fd f = fdget_raw(fd);
+	struct flock64 flock;
+	long err = -EBADF;
+
+	if (!f.file)
+		goto out;
+
 	switch (cmd) {
-	case F_OFD_GETLK:
-	case F_OFD_SETLK:
-	case F_OFD_SETLKW:
 	case F_GETLK64:
+	case F_OFD_GETLK:
+		err = security_file_fcntl(f.file, cmd, arg);
+		if (err)
+			break;
+		err = get_oabi_flock(&flock, argp);
+		if (err)
+			break;
+		err = fcntl_getlk64(f.file, cmd, &flock);
+		if (!err)
+		       err = put_oabi_flock(&flock, argp);
+		break;
 	case F_SETLK64:
 	case F_SETLKW64:
-		return do_locks(fd, cmd, arg);
-
+	case F_OFD_SETLK:
+	case F_OFD_SETLKW:
+		err = security_file_fcntl(f.file, cmd, arg);
+		if (err)
+			break;
+		err = get_oabi_flock(&flock, argp);
+		if (err)
+			break;
+		err = fcntl_setlk64(fd, f.file, cmd, &flock);
+		break;
 	default:
-		return sys_fcntl64(fd, cmd, arg);
+		err = sys_fcntl64(fd, cmd, arg);
+		break;
 	}
+	fdput(f);
+out:
+	return err;
 }
 
 struct oabi_epoll_event {
-- 
2.27.0



* [PATCH 8/9] ARM: uaccess: add __{get,put}_kernel_nofault
  2020-09-07 15:36 [PATCH 0/9] ARM: remove set_fs callers and implementation Arnd Bergmann
                   ` (6 preceding siblings ...)
  2020-09-07 15:36 ` [PATCH 7/9] ARM: oabi-compat: rework fcntl64() emulation Arnd Bergmann
@ 2020-09-07 15:36 ` Arnd Bergmann
  2020-09-07 15:36 ` [PATCH 9/9] ARM: uaccess: remove set_fs() implementation Arnd Bergmann
  8 siblings, 0 replies; 24+ messages in thread
From: Arnd Bergmann @ 2020-09-07 15:36 UTC (permalink / raw)
  To: Christoph Hellwig, Russell King, Russell King
  Cc: Alexander Viro, kernel, linux-arch, linux-arm-kernel,
	linus.walleij, Arnd Bergmann, linux-kernel

These mimic the behavior of get_user and put_user, except
for domain switching, address limit checking and handling
of mismatched sizes, none of which are relevant here.

To work with pre-Armv6 kernels, this has to avoid TUSER()
inside of the new macros. Instead, the new approach passes the "t"
string along with the opcode, which is a bit uglier but
avoids duplicating more code.
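
For illustration, the same asm template then serves both cases and only
the mnemonic string changes (a sketch of the two callers seen in the
hunks below):

	__get_user_asm(x, addr, err, "ldrb" TUSER());	/* user: typically "ldrbt" */
	__get_user_asm(x, addr, err, "ldrb" "");	/* kernel: plain "ldrb" */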

As there is no __get_user_asm_dword(), I work around it
by copying 32 bits at a time, which is possible because
the output size is known.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/include/asm/uaccess.h | 123 ++++++++++++++++++++++-----------
 1 file changed, 83 insertions(+), 40 deletions(-)

diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
index a13d90206472..4f60638755c4 100644
--- a/arch/arm/include/asm/uaccess.h
+++ b/arch/arm/include/asm/uaccess.h
@@ -308,11 +308,11 @@ static inline void set_fs(mm_segment_t fs)
 #define __get_user(x, ptr)						\
 ({									\
 	long __gu_err = 0;						\
-	__get_user_err((x), (ptr), __gu_err);				\
+	__get_user_err((x), (ptr), __gu_err, TUSER());			\
 	__gu_err;							\
 })
 
-#define __get_user_err(x, ptr, err)					\
+#define __get_user_err(x, ptr, err, __t)				\
 do {									\
 	unsigned long __gu_addr = (unsigned long)(ptr);			\
 	unsigned long __gu_val;						\
@@ -321,18 +321,19 @@ do {									\
 	might_fault();							\
 	__ua_flags = uaccess_save_and_enable();				\
 	switch (sizeof(*(ptr))) {					\
-	case 1:	__get_user_asm_byte(__gu_val, __gu_addr, err);	break;	\
-	case 2:	__get_user_asm_half(__gu_val, __gu_addr, err);	break;	\
-	case 4:	__get_user_asm_word(__gu_val, __gu_addr, err);	break;	\
+	case 1:	__get_user_asm_byte(__gu_val, __gu_addr, err, __t); break;	\
+	case 2:	__get_user_asm_half(__gu_val, __gu_addr, err, __t); break;	\
+	case 4:	__get_user_asm_word(__gu_val, __gu_addr, err, __t); break;	\
 	default: (__gu_val) = __get_user_bad();				\
 	}								\
 	uaccess_restore(__ua_flags);					\
 	(x) = (__typeof__(*(ptr)))__gu_val;				\
 } while (0)
+#endif
 
 #define __get_user_asm(x, addr, err, instr)			\
 	__asm__ __volatile__(					\
-	"1:	" TUSER(instr) " %1, [%2], #0\n"		\
+	"1:	" instr " %1, [%2], #0\n"			\
 	"2:\n"							\
 	"	.pushsection .text.fixup,\"ax\"\n"		\
 	"	.align	2\n"					\
@@ -348,40 +349,38 @@ do {									\
 	: "r" (addr), "i" (-EFAULT)				\
 	: "cc")
 
-#define __get_user_asm_byte(x, addr, err)			\
-	__get_user_asm(x, addr, err, ldrb)
+#define __get_user_asm_byte(x, addr, err, __t)			\
+	__get_user_asm(x, addr, err, "ldrb" __t)
 
 #if __LINUX_ARM_ARCH__ >= 6
 
-#define __get_user_asm_half(x, addr, err)			\
-	__get_user_asm(x, addr, err, ldrh)
+#define __get_user_asm_half(x, addr, err, __t)			\
+	__get_user_asm(x, addr, err, "ldrh" __t)
 
 #else
 
 #ifndef __ARMEB__
-#define __get_user_asm_half(x, __gu_addr, err)			\
+#define __get_user_asm_half(x, __gu_addr, err, __t)		\
 ({								\
 	unsigned long __b1, __b2;				\
-	__get_user_asm_byte(__b1, __gu_addr, err);		\
-	__get_user_asm_byte(__b2, __gu_addr + 1, err);		\
+	__get_user_asm_byte(__b1, __gu_addr, err, __t);		\
+	__get_user_asm_byte(__b2, __gu_addr + 1, err, __t);	\
 	(x) = __b1 | (__b2 << 8);				\
 })
 #else
-#define __get_user_asm_half(x, __gu_addr, err)			\
+#define __get_user_asm_half(x, __gu_addr, err, __t)		\
 ({								\
 	unsigned long __b1, __b2;				\
-	__get_user_asm_byte(__b1, __gu_addr, err);		\
-	__get_user_asm_byte(__b2, __gu_addr + 1, err);		\
+	__get_user_asm_byte(__b1, __gu_addr, err, __t);		\
+	__get_user_asm_byte(__b2, __gu_addr + 1, err, __t);	\
 	(x) = (__b1 << 8) | __b2;				\
 })
 #endif
 
 #endif /* __LINUX_ARM_ARCH__ >= 6 */
 
-#define __get_user_asm_word(x, addr, err)			\
-	__get_user_asm(x, addr, err, ldr)
-#endif
-
+#define __get_user_asm_word(x, addr, err, __t)			\
+	__get_user_asm(x, addr, err, "ldr" __t)
 
 #define __put_user_switch(x, ptr, __err, __fn)				\
 	do {								\
@@ -425,7 +424,7 @@ do {									\
 #define __put_user_nocheck(x, __pu_ptr, __err, __size)			\
 	do {								\
 		unsigned long __pu_addr = (unsigned long)__pu_ptr;	\
-		__put_user_nocheck_##__size(x, __pu_addr, __err);	\
+		__put_user_nocheck_##__size(x, __pu_addr, __err, TUSER());\
 	} while (0)
 
 #define __put_user_nocheck_1 __put_user_asm_byte
@@ -433,9 +432,11 @@ do {									\
 #define __put_user_nocheck_4 __put_user_asm_word
 #define __put_user_nocheck_8 __put_user_asm_dword
 
+#endif /* !CONFIG_CPU_SPECTRE */
+
 #define __put_user_asm(x, __pu_addr, err, instr)		\
 	__asm__ __volatile__(					\
-	"1:	" TUSER(instr) " %1, [%2], #0\n"		\
+	"1:	" instr " %1, [%2], #0\n"		\
 	"2:\n"							\
 	"	.pushsection .text.fixup,\"ax\"\n"		\
 	"	.align	2\n"					\
@@ -450,36 +451,36 @@ do {									\
 	: "r" (x), "r" (__pu_addr), "i" (-EFAULT)		\
 	: "cc")
 
-#define __put_user_asm_byte(x, __pu_addr, err)			\
-	__put_user_asm(x, __pu_addr, err, strb)
+#define __put_user_asm_byte(x, __pu_addr, err, __t)		\
+	__put_user_asm(x, __pu_addr, err, "strb" __t)
 
 #if __LINUX_ARM_ARCH__ >= 6
 
-#define __put_user_asm_half(x, __pu_addr, err)			\
-	__put_user_asm(x, __pu_addr, err, strh)
+#define __put_user_asm_half(x, __pu_addr, err, __t)		\
+	__put_user_asm(x, __pu_addr, err, "strh" __t)
 
 #else
 
 #ifndef __ARMEB__
-#define __put_user_asm_half(x, __pu_addr, err)			\
+#define __put_user_asm_half(x, __pu_addr, err, __t)		\
 ({								\
 	unsigned long __temp = (__force unsigned long)(x);	\
-	__put_user_asm_byte(__temp, __pu_addr, err);		\
-	__put_user_asm_byte(__temp >> 8, __pu_addr + 1, err);	\
+	__put_user_asm_byte(__temp, __pu_addr, err, __t);	\
+	__put_user_asm_byte(__temp >> 8, __pu_addr + 1, err, __t);\
 })
 #else
-#define __put_user_asm_half(x, __pu_addr, err)			\
+#define __put_user_asm_half(x, __pu_addr, err, __t)		\
 ({								\
 	unsigned long __temp = (__force unsigned long)(x);	\
-	__put_user_asm_byte(__temp >> 8, __pu_addr, err);	\
-	__put_user_asm_byte(__temp, __pu_addr + 1, err);	\
+	__put_user_asm_byte(__temp >> 8, __pu_addr, err, __t);	\
+	__put_user_asm_byte(__temp, __pu_addr + 1, err, __t);	\
 })
 #endif
 
 #endif /* __LINUX_ARM_ARCH__ >= 6 */
 
-#define __put_user_asm_word(x, __pu_addr, err)			\
-	__put_user_asm(x, __pu_addr, err, str)
+#define __put_user_asm_word(x, __pu_addr, err, __t)		\
+	__put_user_asm(x, __pu_addr, err, "str" __t)
 
 #ifndef __ARMEB__
 #define	__reg_oper0	"%R2"
@@ -489,12 +490,12 @@ do {									\
 #define	__reg_oper1	"%R2"
 #endif
 
-#define __put_user_asm_dword(x, __pu_addr, err)			\
+#define __put_user_asm_dword(x, __pu_addr, err, __t)		\
 	__asm__ __volatile__(					\
- ARM(	"1:	" TUSER(str) "	" __reg_oper1 ", [%1], #4\n"	) \
- ARM(	"2:	" TUSER(str) "	" __reg_oper0 ", [%1]\n"	) \
- THUMB(	"1:	" TUSER(str) "	" __reg_oper1 ", [%1]\n"	) \
- THUMB(	"2:	" TUSER(str) "	" __reg_oper0 ", [%1, #4]\n"	) \
+ ARM(	"1:	str" __t "	" __reg_oper1 ", [%1], #4\n"  ) \
+ ARM(	"2:	str" __t "	" __reg_oper0 ", [%1]\n"      ) \
+ THUMB(	"1:	str" __t "	" __reg_oper1 ", [%1]\n"      ) \
+ THUMB(	"2:	str" __t "	" __reg_oper0 ", [%1, #4]\n"  ) \
 	"3:\n"							\
 	"	.pushsection .text.fixup,\"ax\"\n"		\
 	"	.align	2\n"					\
@@ -510,7 +511,49 @@ do {									\
 	: "r" (x), "i" (-EFAULT)				\
 	: "cc")
 
-#endif /* !CONFIG_CPU_SPECTRE */
+#define HAVE_GET_KERNEL_NOFAULT
+
+#define __get_kernel_nofault(dst, src, type, err_label)			\
+do {									\
+	const type *__pk_ptr = (src);					\
+	unsigned long __src = (unsigned long)(__pk_ptr);		\
+	type __val;							\
+	int __err = 0;							\
+	switch (sizeof(type)) {						\
+	case 1:	__get_user_asm_byte(__val, __src, __err, ""); break;	\
+	case 2: __get_user_asm_half(__val, __src, __err, ""); break;	\
+	case 4: __get_user_asm_word(__val, __src, __err, ""); break;	\
+	case 8: {							\
+		u32 *__v32 = (u32*)&__val;				\
+		__get_user_asm_word(__v32[0], __src, __err, "");	\
+		if (__err)						\
+			break;						\
+		__get_user_asm_word(__v32[1], __src+4, __err, "");	\
+		break;							\
+	}								\
+	default: __err = __get_user_bad(); break;			\
+	}								\
+	*(type *)(dst) = __val;						\
+	if (__err)							\
+		goto err_label;						\
+} while (0)
+
+#define __put_kernel_nofault(dst, src, type, err_label)			\
+do {									\
+	const type *__pk_ptr = (dst);					\
+	unsigned long __dst = (unsigned long)__pk_ptr;			\
+	int __err = 0;							\
+	type __val = *(type *)src;					\
+	switch (sizeof(type)) {						\
+	case 1: __put_user_asm_byte(__val, __dst, __err, ""); break;	\
+	case 2:	__put_user_asm_half(__val, __dst, __err, ""); break;	\
+	case 4:	__put_user_asm_word(__val, __dst, __err, ""); break;	\
+	case 8:	__put_user_asm_dword(__val, __dst, __err, ""); break;	\
+	default: __err = __put_user_bad(); break;			\
+	}								\
+	if (__err)							\
+		goto err_label;						\
+} while (0)
 
 #ifdef CONFIG_MMU
 extern unsigned long __must_check
-- 
2.27.0



* [PATCH 9/9] ARM: uaccess: remove set_fs() implementation
  2020-09-07 15:36 [PATCH 0/9] ARM: remove set_fs callers and implementation Arnd Bergmann
                   ` (7 preceding siblings ...)
  2020-09-07 15:36 ` [PATCH 8/9] ARM: uaccess: add __{get,put}_kernel_nofault Arnd Bergmann
@ 2020-09-07 15:36 ` Arnd Bergmann
  8 siblings, 0 replies; 24+ messages in thread
From: Arnd Bergmann @ 2020-09-07 15:36 UTC (permalink / raw)
  To: Christoph Hellwig, Russell King, Russell King
  Cc: Alexander Viro, kernel, linux-arch, linux-arm-kernel,
	linus.walleij, Arnd Bergmann, Oleg Nesterov

There are no remaining callers of set_fs(), so just remove it
along with all associated code that operates on
thread_info->addr_limit.

There are still further optimizations that can be done:

- In get_user(), the address check could be moved entirely
  into the out of line code, rather than passing a constant
  as an argument,

- I assume the DACR handling can be simplified as we now
  only change it during user access when CONFIG_CPU_SW_DOMAIN_PAN
  is set, but not during set_fs().

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/Kconfig                   |  1 -
 arch/arm/include/asm/ptrace.h      |  1 -
 arch/arm/include/asm/thread_info.h |  4 ---
 arch/arm/include/asm/uaccess-asm.h |  6 ----
 arch/arm/include/asm/uaccess.h     | 46 +++---------------------------
 arch/arm/kernel/asm-offsets.c      |  2 --
 arch/arm/kernel/entry-common.S     |  9 ------
 arch/arm/kernel/process.c          |  7 +----
 arch/arm/kernel/signal.c           |  8 ------
 arch/arm/lib/copy_from_user.S      |  3 +-
 arch/arm/lib/copy_to_user.S        |  3 +-
 11 files changed, 7 insertions(+), 83 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 87e1478a42dc..e00d94b16658 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -118,7 +118,6 @@ config ARM
 	select PCI_SYSCALL if PCI
 	select PERF_USE_VMALLOC
 	select RTC_LIB
-	select SET_FS
 	select SYS_SUPPORTS_APM_EMULATION
 	# Above selects are sorted alphabetically; please add new ones
 	# according to that.  Thanks.
diff --git a/arch/arm/include/asm/ptrace.h b/arch/arm/include/asm/ptrace.h
index 91d6b7856be4..93051e2f402c 100644
--- a/arch/arm/include/asm/ptrace.h
+++ b/arch/arm/include/asm/ptrace.h
@@ -19,7 +19,6 @@ struct pt_regs {
 struct svc_pt_regs {
 	struct pt_regs regs;
 	u32 dacr;
-	u32 addr_limit;
 };
 
 #define to_svc_pt_regs(r) container_of(r, struct svc_pt_regs, regs)
diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index 536b6b979f63..8b705f611216 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -23,8 +23,6 @@ struct task_struct;
 
 #include <asm/types.h>
 
-typedef unsigned long mm_segment_t;
-
 struct cpu_context_save {
 	__u32	r4;
 	__u32	r5;
@@ -46,7 +44,6 @@ struct cpu_context_save {
 struct thread_info {
 	unsigned long		flags;		/* low level flags */
 	int			preempt_count;	/* 0 => preemptable, <0 => bug */
-	mm_segment_t		addr_limit;	/* address limit */
 	struct task_struct	*task;		/* main task structure */
 	__u32			cpu;		/* cpu */
 	__u32			cpu_domain;	/* cpu domain */
@@ -72,7 +69,6 @@ struct thread_info {
 	.task		= &tsk,						\
 	.flags		= 0,						\
 	.preempt_count	= INIT_PREEMPT_COUNT,				\
-	.addr_limit	= KERNEL_DS,					\
 }
 
 /*
diff --git a/arch/arm/include/asm/uaccess-asm.h b/arch/arm/include/asm/uaccess-asm.h
index 907571fd05c6..6451a433912c 100644
--- a/arch/arm/include/asm/uaccess-asm.h
+++ b/arch/arm/include/asm/uaccess-asm.h
@@ -84,12 +84,8 @@
 	 * if \disable is set.
 	 */
 	.macro	uaccess_entry, tsk, tmp0, tmp1, tmp2, disable
-	ldr	\tmp1, [\tsk, #TI_ADDR_LIMIT]
-	mov	\tmp2, #TASK_SIZE
-	str	\tmp2, [\tsk, #TI_ADDR_LIMIT]
  DACR(	mrc	p15, 0, \tmp0, c3, c0, 0)
  DACR(	str	\tmp0, [sp, #SVC_DACR])
-	str	\tmp1, [sp, #SVC_ADDR_LIMIT]
 	.if \disable && IS_ENABLED(CONFIG_CPU_SW_DOMAIN_PAN)
 	/* kernel=client, user=no access */
 	mov	\tmp2, #DACR_UACCESS_DISABLE
@@ -106,9 +102,7 @@
 
 	/* Restore the user access state previously saved by uaccess_entry */
 	.macro	uaccess_exit, tsk, tmp0, tmp1
-	ldr	\tmp1, [sp, #SVC_ADDR_LIMIT]
  DACR(	ldr	\tmp0, [sp, #SVC_DACR])
-	str	\tmp1, [\tsk, #TI_ADDR_LIMIT]
  DACR(	mcr	p15, 0, \tmp0, c3, c0, 0)
 	.endm
 
diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
index 4f60638755c4..084d1c07c2d0 100644
--- a/arch/arm/include/asm/uaccess.h
+++ b/arch/arm/include/asm/uaccess.h
@@ -52,32 +52,8 @@ static __always_inline void uaccess_restore(unsigned int flags)
 extern int __get_user_bad(void);
 extern int __put_user_bad(void);
 
-/*
- * Note that this is actually 0x1,0000,0000
- */
-#define KERNEL_DS	0x00000000
-
 #ifdef CONFIG_MMU
 
-#define USER_DS		TASK_SIZE
-#define get_fs()	(current_thread_info()->addr_limit)
-
-static inline void set_fs(mm_segment_t fs)
-{
-	current_thread_info()->addr_limit = fs;
-
-	/*
-	 * Prevent a mispredicted conditional call to set_fs from forwarding
-	 * the wrong address limit to access_ok under speculation.
-	 */
-	dsb(nsh);
-	isb();
-
-	modify_domain(DOMAIN_KERNEL, fs ? DOMAIN_CLIENT : DOMAIN_MANAGER);
-}
-
-#define uaccess_kernel()	(get_fs() == KERNEL_DS)
-
 /*
  * We use 33-bit arithmetic here.  Success returns zero, failure returns
  * addr_limit.  We take advantage that addr_limit will be zero for KERNEL_DS,
@@ -89,7 +65,7 @@ static inline void set_fs(mm_segment_t fs)
 	__asm__(".syntax unified\n" \
 		"adds %1, %2, %3; sbcscc %1, %1, %0; movcc %0, #0" \
 		: "=&r" (flag), "=&r" (roksum) \
-		: "r" (addr), "Ir" (size), "0" (current_thread_info()->addr_limit) \
+		: "r" (addr), "Ir" (size), "0" (TASK_SIZE) \
 		: "cc"); \
 	flag; })
 
@@ -120,7 +96,7 @@ static inline void __user *__uaccess_mask_range_ptr(const void __user *ptr,
 	"	subshs	%1, %1, %2\n"
 	"	movlo	%0, #0\n"
 	: "+r" (safe_ptr), "=&r" (tmp)
-	: "r" (size), "r" (current_thread_info()->addr_limit)
+	: "r" (size), "r" (TASK_SIZE)
 	: "cc");
 
 	csdb();
@@ -194,7 +170,7 @@ extern int __get_user_64t_4(void *);
 
 #define __get_user_check(x, p)						\
 	({								\
-		unsigned long __limit = current_thread_info()->addr_limit - 1; \
+		unsigned long __limit = TASK_SIZE - 1; \
 		register typeof(*(p)) __user *__p asm("r0") = (p);	\
 		register __inttype(x) __r2 asm("r2");			\
 		register unsigned long __l asm("r1") = __limit;		\
@@ -245,7 +221,7 @@ extern int __put_user_8(void *, unsigned long long);
 
 #define __put_user_check(__pu_val, __ptr, __err, __s)			\
 	({								\
-		unsigned long __limit = current_thread_info()->addr_limit - 1; \
+		unsigned long __limit = TASK_SIZE - 1; \
 		register typeof(__pu_val) __r2 asm("r2") = __pu_val;	\
 		register const void __user *__p asm("r0") = __ptr;	\
 		register unsigned long __l asm("r1") = __limit;		\
@@ -262,19 +238,8 @@ extern int __put_user_8(void *, unsigned long long);
 
 #else /* CONFIG_MMU */
 
-/*
- * uClinux has only one addr space, so has simplified address limits.
- */
-#define USER_DS			KERNEL_DS
-
-#define uaccess_kernel()	(true)
 #define __addr_ok(addr)		((void)(addr), 1)
 #define __range_ok(addr, size)	((void)(addr), 0)
-#define get_fs()		(KERNEL_DS)
-
-static inline void set_fs(mm_segment_t fs)
-{
-}
 
 #define get_user(x, p)	__get_user(x, p)
 #define __put_user_check __put_user_nocheck
@@ -283,9 +248,6 @@ static inline void set_fs(mm_segment_t fs)
 
 #define access_ok(addr, size)	(__range_ok(addr, size) == 0)
 
-#define user_addr_max() \
-	(uaccess_kernel() ? ~0UL : get_fs())
-
 #ifdef CONFIG_CPU_SPECTRE
 /*
  * When mitigating Spectre variant 1, it is not worth fixing the non-
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 97af6735172b..78f0a25baf2d 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -41,7 +41,6 @@ int main(void)
   BLANK();
   DEFINE(TI_FLAGS,		offsetof(struct thread_info, flags));
   DEFINE(TI_PREEMPT,		offsetof(struct thread_info, preempt_count));
-  DEFINE(TI_ADDR_LIMIT,		offsetof(struct thread_info, addr_limit));
   DEFINE(TI_TASK,		offsetof(struct thread_info, task));
   DEFINE(TI_CPU,		offsetof(struct thread_info, cpu));
   DEFINE(TI_CPU_DOMAIN,		offsetof(struct thread_info, cpu_domain));
@@ -90,7 +89,6 @@ int main(void)
   DEFINE(S_OLD_R0,		offsetof(struct pt_regs, ARM_ORIG_r0));
   DEFINE(PT_REGS_SIZE,		sizeof(struct pt_regs));
   DEFINE(SVC_DACR,		offsetof(struct svc_pt_regs, dacr));
-  DEFINE(SVC_ADDR_LIMIT,	offsetof(struct svc_pt_regs, addr_limit));
   DEFINE(SVC_REGS_SIZE,		sizeof(struct svc_pt_regs));
   BLANK();
   DEFINE(SIGFRAME_RC3_OFFSET,	offsetof(struct sigframe, retcode[3]));
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index 2ea3a1989fed..610e32273c81 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -49,9 +49,6 @@ __ret_fast_syscall:
  UNWIND(.fnstart	)
  UNWIND(.cantunwind	)
 	disable_irq_notrace			@ disable interrupts
-	ldr	r2, [tsk, #TI_ADDR_LIMIT]
-	cmp	r2, #TASK_SIZE
-	blne	addr_limit_check_failed
 	ldr	r1, [tsk, #TI_FLAGS]		@ re-check for syscall tracing
 	tst	r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
 	bne	fast_work_pending
@@ -86,9 +83,6 @@ __ret_fast_syscall:
 	bl	do_rseq_syscall
 #endif
 	disable_irq_notrace			@ disable interrupts
-	ldr	r2, [tsk, #TI_ADDR_LIMIT]
-	cmp	r2, #TASK_SIZE
-	blne	addr_limit_check_failed
 	ldr	r1, [tsk, #TI_FLAGS]		@ re-check for syscall tracing
 	tst	r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
 	beq	no_work_pending
@@ -127,9 +121,6 @@ ret_slow_syscall:
 #endif
 	disable_irq_notrace			@ disable interrupts
 ENTRY(ret_to_user_from_irq)
-	ldr	r2, [tsk, #TI_ADDR_LIMIT]
-	cmp	r2, #TASK_SIZE
-	blne	addr_limit_check_failed
 	ldr	r1, [tsk, #TI_FLAGS]
 	tst	r1, #_TIF_WORK_MASK
 	bne	slow_work_pending
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 8e6ace03e960..28a1a4a9dd77 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -97,7 +97,7 @@ void __show_regs(struct pt_regs *regs)
 	unsigned long flags;
 	char buf[64];
 #ifndef CONFIG_CPU_V7M
-	unsigned int domain, fs;
+	unsigned int domain;
 #ifdef CONFIG_CPU_SW_DOMAIN_PAN
 	/*
 	 * Get the domain register for the parent context. In user
@@ -106,14 +106,11 @@ void __show_regs(struct pt_regs *regs)
 	 */
 	if (user_mode(regs)) {
 		domain = DACR_UACCESS_ENABLE;
-		fs = get_fs();
 	} else {
 		domain = to_svc_pt_regs(regs)->dacr;
-		fs = to_svc_pt_regs(regs)->addr_limit;
 	}
 #else
 	domain = get_domain();
-	fs = get_fs();
 #endif
 #endif
 
@@ -149,8 +146,6 @@ void __show_regs(struct pt_regs *regs)
 		if ((domain & domain_mask(DOMAIN_USER)) ==
 		    domain_val(DOMAIN_USER, DOMAIN_NOACCESS))
 			segment = "none";
-		else if (fs == KERNEL_DS)
-			segment = "kernel";
 		else
 			segment = "user";
 
diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index c9dc912b83f0..618b5d938317 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -710,14 +710,6 @@ struct page *get_signal_page(void)
 	return page;
 }
 
-/* Defer to generic check */
-asmlinkage void addr_limit_check_failed(void)
-{
-#ifdef CONFIG_MMU
-	addr_limit_user_check();
-#endif
-}
-
 #ifdef CONFIG_DEBUG_RSEQ
 asmlinkage void do_rseq_syscall(struct pt_regs *regs)
 {
diff --git a/arch/arm/lib/copy_from_user.S b/arch/arm/lib/copy_from_user.S
index f8016e3db65d..f481ef789a93 100644
--- a/arch/arm/lib/copy_from_user.S
+++ b/arch/arm/lib/copy_from_user.S
@@ -109,8 +109,7 @@
 
 ENTRY(arm_copy_from_user)
 #ifdef CONFIG_CPU_SPECTRE
-	get_thread_info r3
-	ldr	r3, [r3, #TI_ADDR_LIMIT]
+	mov	r3, #TASK_SIZE
 	uaccess_mask_range_ptr r1, r2, r3, ip
 #endif
 
diff --git a/arch/arm/lib/copy_to_user.S b/arch/arm/lib/copy_to_user.S
index ebfe4cb3d912..215da16c7d6e 100644
--- a/arch/arm/lib/copy_to_user.S
+++ b/arch/arm/lib/copy_to_user.S
@@ -109,8 +109,7 @@
 ENTRY(__copy_to_user_std)
 WEAK(arm_copy_to_user)
 #ifdef CONFIG_CPU_SPECTRE
-	get_thread_info r3
-	ldr	r3, [r3, #TI_ADDR_LIMIT]
+	mov	r3, #TASK_SIZE
 	uaccess_mask_range_ptr r0, r2, r3, ip
 #endif
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/9] mm/maccess: fix unaligned copy_{from,to}_kernel_nofault
  2020-09-07 15:36 ` [PATCH 1/9] mm/maccess: fix unaligned copy_{from,to}_kernel_nofault Arnd Bergmann
@ 2020-09-08  6:11   ` Christoph Hellwig
  2020-09-27  9:25   ` Linus Walleij
  1 sibling, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2020-09-08  6:11 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Christoph Hellwig, Russell King, Alexander Viro, kernel,
	linux-arch, linux-arm-kernel, linus.walleij, Andrew Morton,
	Daniel Borkmann, Alexei Starovoitov, linux-mm, linux-kernel

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

This should probably go into Al's base.set_fs branch.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/9] ARM: traps: use get_kernel_nofault instead of set_fs()
  2020-09-07 15:36 ` [PATCH 2/9] ARM: traps: use get_kernel_nofault instead of set_fs() Arnd Bergmann
@ 2020-09-08  6:15   ` Christoph Hellwig
  2020-09-17 17:29     ` Arnd Bergmann
  0 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2020-09-08  6:15 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Christoph Hellwig, Russell King, Alexander Viro, kernel,
	linux-arch, linux-arm-kernel, linus.walleij, Russell King,
	Andrew Morton, Dmitry Safonov, linux-kernel

> +static void dump_mem(const char *, const char *, unsigned long, unsigned long, bool kernel_mode);

This adds a pointlessly long line.  

And looking at the code I don't see why the argument is even needed.

dump_mem() currently does an unconditional set_fs(KERNEL_DS), so it
should always use get_kernel_nofault.

> +static void dump_instr(const char *lvl, struct pt_regs *regs)
>  {
>  	unsigned long addr = instruction_pointer(regs);
>  	const int thumb = thumb_mode(regs);
> @@ -173,10 +169,20 @@ static void __dump_instr(const char *lvl, struct pt_regs *regs)
>  	for (i = -4; i < 1 + !!thumb; i++) {
>  		unsigned int val, bad;
>  
> -		if (thumb)
> -			bad = get_user(val, &((u16 *)addr)[i]);
> -		else
> -			bad = get_user(val, &((u32 *)addr)[i]);
> +		if (!user_mode(regs)) {
> +			if (thumb) {
> +				u16 val16;
> +				bad = get_kernel_nofault(val16, &((u16 *)addr)[i]);
> +				val = val16;
> +			} else {
> +				bad = get_kernel_nofault(val, &((u32 *)addr)[i]);
> +			}
> +		} else {
> +			if (thumb)
> +				bad = get_user(val, &((u16 *)addr)[i]);
> +			else
> +				bad = get_user(val, &((u32 *)addr)[i]);
> +		}

When I looked at this earlier I just added a little helper to make
this a little easier to read.   Here is my patch from an old tree:

http://git.infradead.org/users/hch/misc.git/commitdiff/67413030ccb7a64a7eb828e13ff0795f4eadfeb7

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/9] ARM: oabi-compat: add epoll_pwait handler
  2020-09-07 15:36 ` [PATCH 3/9] ARM: oabi-compat: add epoll_pwait handler Arnd Bergmann
@ 2020-09-08  6:16   ` Christoph Hellwig
  0 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2020-09-08  6:16 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Christoph Hellwig, Russell King, Alexander Viro, kernel,
	linux-arch, linux-arm-kernel, linus.walleij, stable,
	Mikael Pettersson, Russell King, Russell King, Christian Brauner,
	Andrew Morton, linux-kernel

> +SYSCALL_DEFINE4(oabi_epoll_wait, int, epfd, struct oabi_epoll_event __user *, events,
> +		int, maxevents, int, timeout)

> +SYSCALL_DEFINE6(oabi_epoll_pwait, int, epfd, struct oabi_epoll_event __user *, events,
> +		int, maxevents, int, timeout, const sigset_t __user *, sigmask,
> +		size_t, sigsetsize)

More pointlessly long lines..

Otherwise this looks good to me:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/9] ARM: oabi-compat: rework epoll_wait/epoll_pwait emulation
  2020-09-07 15:36 ` [PATCH 5/9] ARM: oabi-compat: rework epoll_wait/epoll_pwait emulation Arnd Bergmann
@ 2020-09-08  6:20   ` Christoph Hellwig
  2020-09-08 20:56     ` Arnd Bergmann
  0 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2020-09-08  6:20 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Christoph Hellwig, Russell King, Alexander Viro, kernel,
	linux-arch, linux-arm-kernel, linus.walleij, Russell King,
	Christian Brauner, Andrew Morton, linux-kernel, linux-fsdevel

On Mon, Sep 07, 2020 at 05:36:46PM +0200, Arnd Bergmann wrote:
> The epoll_wait() system call wrapper is one of the remaining users of
> the set_fs() infrastructure for Arm. Changing it to not require set_fs()
> is rather complex unfortunately.
> 
> The approach I'm taking here is to allow architectures to override
> the code that copies the output to user space, and let the oabi-compat
> implementation check whether it is getting called from an EABI or OABI
> system call based on the thread_info->syscall value.
> 
> The in_oabi_syscall() check here mirrors the in_compat_syscall() and
> in_x32_syscall() helpers for 32-bit compat implementations on other
> architectures.
> 
> Overall, the amount of code goes down, at least with the newly added
> sys_oabi_epoll_pwait() helper getting removed again. The downside
> is added complexity in the source code for the native implementation.
> There should be no difference in runtime performance except for Arm
> kernels with CONFIG_OABI_COMPAT enabled that now have to go through
> an external function call to check which of the two variants to use.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/arm/include/asm/syscall.h    | 11 +++++
>  arch/arm/kernel/sys_oabi-compat.c | 72 +++++++------------------------
>  arch/arm/tools/syscall.tbl        |  4 +-
>  fs/eventpoll.c                    |  5 +--
>  include/linux/eventpoll.h         | 16 +++++++
>  5 files changed, 46 insertions(+), 62 deletions(-)
> 
> diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
> index ff6cc365eaf7..0d8afceeefd9 100644
> --- a/arch/arm/include/asm/syscall.h
> +++ b/arch/arm/include/asm/syscall.h
> @@ -28,6 +28,17 @@ static inline int syscall_get_nr(struct task_struct *task,
>  	return task_thread_info(task)->syscall;
>  }
>  
> +static inline bool __in_oabi_syscall(struct task_struct *task)
> +{
> +	return IS_ENABLED(CONFIG_OABI_COMPAT) &&
> +		(task_thread_info(task)->syscall & __NR_OABI_SYSCALL_BASE);
> +}
> +
> +static inline bool in_oabi_syscall(void)
> +{
> +	return __in_oabi_syscall(current);
> +}
> +
>  static inline void syscall_rollback(struct task_struct *task,
>  				    struct pt_regs *regs)
>  {
> diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
> index 2ce3e8c6ca91..abf1153c5315 100644
> --- a/arch/arm/kernel/sys_oabi-compat.c
> +++ b/arch/arm/kernel/sys_oabi-compat.c
> @@ -83,6 +83,8 @@
>  #include <linux/uaccess.h>
>  #include <linux/slab.h>
>  
> +#include <asm/syscall.h>
> +
>  struct oldabi_stat64 {
>  	unsigned long long st_dev;
>  	unsigned int	__pad1;
> @@ -264,68 +266,24 @@ asmlinkage long sys_oabi_epoll_ctl(int epfd, int op, int fd,
>  	return do_epoll_ctl(epfd, op, fd, &kernel, false);
>  }
>  
> -static long do_oabi_epoll_wait(int epfd, struct oabi_epoll_event __user *events,
> -			       int maxevents, int timeout)
> +struct epoll_event __user *
> +epoll_put_uevent(__poll_t revents, __u64 data, struct epoll_event __user *uevent)
>  {
> +	if (in_oabi_syscall()) {
> +		struct oabi_epoll_event *oevent = (void __user *)uevent;
>  
> +		if (__put_user(revents, &oevent->events) ||
> +		    __put_user(data, &oevent->data))
> +			return NULL;
>  
> +		return (void __user *)uevent+1;
> +	}

I wonder if we'd be better off doing the in_oabi_syscall() branch in
the common code.  E.g. rename in_oabi_syscall to in_legacy_syscall and
stub it out for all other architectures.  Then just do

	if (in_oabi_syscall()
		legacy_syscall_foo_bit();
	else
		normal_syscall_foo_bit();

in common code, where so far only arm provides
legacy_syscall_foo_bit().
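
A minimal sketch of the stub side, assuming a shared header that the
architecture can override (the in_legacy_syscall name is only the
suggestion from above, nothing defines it yet):

	/* generic fallback, overridden by arch/arm when OABI compat is in */
	#ifndef in_legacy_syscall
	static inline bool in_legacy_syscall(void)
	{
		return false;
	}
	#endif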

Tons of long lines again in this patch..

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/9] ARM: oabi-compat: rework epoll_wait/epoll_pwait emulation
  2020-09-08  6:20   ` Christoph Hellwig
@ 2020-09-08 20:56     ` Arnd Bergmann
  0 siblings, 0 replies; 24+ messages in thread
From: Arnd Bergmann @ 2020-09-08 20:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Russell King, Alexander Viro, linux-,
	linux-arch, Linux ARM, Linus Walleij, Russell King,
	Christian Brauner, Andrew Morton, linux-kernel,
	Linux FS-devel Mailing List

On Tue, Sep 8, 2020 at 8:20 AM Christoph Hellwig <hch@lst.de> wrote:
> > @@ -264,68 +266,24 @@ asmlinkage long sys_oabi_epoll_ctl(int epfd, int op, int fd,
> >       return do_epoll_ctl(epfd, op, fd, &kernel, false);
> >  }
> >
> > -static long do_oabi_epoll_wait(int epfd, struct oabi_epoll_event __user *events,
> > -                            int maxevents, int timeout)
> > +struct epoll_event __user *
> > +epoll_put_uevent(__poll_t revents, __u64 data, struct epoll_event __user *uevent)
> >  {
> > +     if (in_oabi_syscall()) {
> > +             struct oabi_epoll_event *oevent = (void __user *)uevent;
> >
> > +             if (__put_user(revents, &oevent->events) ||
> > +                 __put_user(data, &oevent->data))
> > +                     return NULL;
> >
> > +             return (void __user *)uevent+1;

FWIW, this line needs to be

         return (void __user *)(oevent+1);
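
The difference is just operator precedence: the cast binds before the
addition, so the two expressions step the pointer by very different
amounts:

	return (void __user *)uevent + 1;    /* void arithmetic: advances 1 byte       */
	return (void __user *)(oevent + 1);  /* advances by one whole oabi_epoll_event */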

It turns out that while I thought I had tested this already, my earlier
tests were on the EABI Debian 5 instead of the OABI version of the
same distro. I reproduced it both ways now and LTP successfully
found that bug ;-)

> I wonder if we'd be better off doing the in_oabi_syscall() branch in
> the common code.  E.g. rename in_oabi_syscall to in_legacy_syscall and
> stub it out for all other architectures.  Then just do
>
>         if (in_oabi_syscall())
>                 legacy_syscall_foo_bit();
>         else
>                 normal_syscall_foo_bit();
>
> in common code, where so far only arm provides
> legacy_syscall_foo_bit().

I tried out different ways; the first one I had used an #ifdef in the
C code, which I did not like much.

Moving the different code path into common code would avoid that
#ifdef but also put the rather obscure oabi-compat code into a
much more prominent location. I'd prefer to keep it out of there
as much as possible and hope we don't need to do this anywhere
else. x86-32 has some similar issues with struct layout, but that
already goes through the normal compat layer on 64-bit kernels.

> Tons of long lines again in this patch..

Fixed now.

       Arnd

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/9] ARM: traps: use get_kernel_nofault instead of set_fs()
  2020-09-08  6:15   ` Christoph Hellwig
@ 2020-09-17 17:29     ` Arnd Bergmann
  2020-09-18  7:42       ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 24+ messages in thread
From: Arnd Bergmann @ 2020-09-17 17:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Russell King, Alexander Viro, linux-,
	linux-arch, Linux ARM, Linus Walleij, Russell King,
	Andrew Morton, Dmitry Safonov, linux-kernel

On Tue, Sep 8, 2020 at 8:15 AM Christoph Hellwig <hch@lst.de> wrote:
>
> > +static void dump_mem(const char *, const char *, unsigned long, unsigned long, bool kernel_mode);
>
> This adds a pointlessly long line.

Fixed.

> And looking at the code I don't see why the argument is even needed.
>
> dump_mem() currently does an unconditional set_fs(KERNEL_DS), so it
> should always use get_kernel_nofault.

I had looked at

        if (!user_mode(regs) || in_interrupt()) {
                dump_mem(KERN_EMERG, "Stack: ", regs->ARM_sp,
                         THREAD_SIZE + (unsigned long)task_stack_page(tsk), kernel_mode);

which told me that there should be at least some code path ending up in
__die() that has in_interrupt() set but comes from user mode.

In this case, the memory to print would be a user pointer and cannot be
accessed by get_kernel_nofault() (but could be accessed with
"set_fs(KERNEL_DS); __get_user()").

I looked through the history now and the only code path I could
find that would arrive here this way is from bad_mode(), indicating
that there is probably a hardware bug or the contents of *regs are
corrupted.

Russell might have a better explanation for this, but I would assume
now that you are right in that we don't ever need to care about
dumping user space addresses here.

> > +static void dump_instr(const char *lvl, struct pt_regs *regs)
> >  {
> >       unsigned long addr = instruction_pointer(regs);
> >       const int thumb = thumb_mode(regs);
> > @@ -173,10 +169,20 @@ static void __dump_instr(const char *lvl, struct pt_regs *regs)
> >       for (i = -4; i < 1 + !!thumb; i++) {
> >               unsigned int val, bad;
> >
> > -             if (thumb)
> > -                     bad = get_user(val, &((u16 *)addr)[i]);
> > -             else
> > -                     bad = get_user(val, &((u32 *)addr)[i]);
> > +             if (!user_mode(regs)) {
> > +                     if (thumb) {
> > +                             u16 val16;
> > +                             bad = get_kernel_nofault(val16, &((u16 *)addr)[i]);
> > +                             val = val16;
> > +                     } else {
> > +                             bad = get_kernel_nofault(val, &((u32 *)addr)[i]);
> > +                     }
> > +             } else {
> > +                     if (thumb)
> > +                             bad = get_user(val, &((u16 *)addr)[i]);
> > +                     else
> > +                             bad = get_user(val, &((u32 *)addr)[i]);
> > +             }
>
> When I looked at this earlier I just added a little helper to make
> this a little easier to read.   Here is my patch from an old tree:
>
> http://git.infradead.org/users/hch/misc.git/commitdiff/67413030ccb7a64a7eb828e13ff0795f4eadfeb7

I think your version was broken for the in-kernel thumb version
because get_kernel_nofault does not widen the result
from 16 to 32 bits. I tried fixing this in your version, but the
result ended up more complex than my version here, so I
decided to keep what I had.

       Arnd

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/9] ARM: traps: use get_kernel_nofault instead of set_fs()
  2020-09-17 17:29     ` Arnd Bergmann
@ 2020-09-18  7:42       ` Russell King - ARM Linux admin
  2020-09-18 12:36         ` Arnd Bergmann
  0 siblings, 1 reply; 24+ messages in thread
From: Russell King - ARM Linux admin @ 2020-09-18  7:42 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Christoph Hellwig, Alexander Viro, linux-,
	linux-arch, Linux ARM, Linus Walleij, Andrew Morton,
	Dmitry Safonov, linux-kernel

On Thu, Sep 17, 2020 at 07:29:37PM +0200, Arnd Bergmann wrote:
> On Tue, Sep 8, 2020 at 8:15 AM Christoph Hellwig <hch@lst.de> wrote:
> >
> > > +static void dump_mem(const char *, const char *, unsigned long, unsigned long, bool kernel_mode);
> >
> > This adds a pointlessly long line.
> 
> Fixed.
> 
> > And looking at the code I don't see why the argument is even needed.
> >
> > dump_mem() currently does an unconditional set_fs(KERNEL_DS), so it
> > should always use get_kernel_nofault.
> 
> I had looked at
> 
>         if (!user_mode(regs) || in_interrupt()) {
>                 dump_mem(KERN_EMERG, "Stack: ", regs->ARM_sp,
>                          THREAD_SIZE + (unsigned long)task_stack_page(tsk), kernel_mode);
> 
> which told me that there should be at least some code path ending up in
> __die() that has in_interrupt() set but comes from user mode.
> 
> In this case, the memory to print would be a user pointer and cannot be
> accessed by get_kernel_nofault() (but could be accessed with
> "set_fs(KERNEL_DS); __get_user()").
> 
> I looked through the history now and the only code path I could
> find that would arrive here this way is from bad_mode(), indicating
> that there is probably a hardware bug or the contents of *regs are
> corrupted.

Yes, that's correct.  It isn't something entirely theoretical; although
we never see it now, it used to happen in the distant past due to saved
regs corruption.  If bad_mode() ever gets called, all bets are off and
we're irrecoverably crashing.

Note that in that case, while user_mode(regs) may return true or false,
regs->ARM_sp and regs->ARM_lr are always the SVC mode stack and return
address after regs has been stacked, and not the expected values for
the parent context (which we have most likely long since destroyed.)

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/9] ARM: traps: use get_kernel_nofault instead of set_fs()
  2020-09-18  7:42       ` Russell King - ARM Linux admin
@ 2020-09-18 12:36         ` Arnd Bergmann
  0 siblings, 0 replies; 24+ messages in thread
From: Arnd Bergmann @ 2020-09-18 12:36 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Christoph Hellwig, Alexander Viro, linux-,
	linux-arch, Linux ARM, Linus Walleij, Andrew Morton,
	Dmitry Safonov, linux-kernel

On Fri, Sep 18, 2020 at 9:42 AM Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Thu, Sep 17, 2020 at 07:29:37PM +0200, Arnd Bergmann wrote:
> > On Tue, Sep 8, 2020 at 8:15 AM Christoph Hellwig <hch@lst.de> wrote:
> >
> > I looked through the history now and the only code path I could
> > find that would arrive here this way is from bad_mode(), indicating
> > that there is probably a hardware bug or the contents of *regs are
> > corrupted.
>
> Yes, that's correct.  It isn't something entirely theoretical, although
> we never see it now, it used to happen in the distant past due to saved
> regs corruption.  If bad_mode() ever gets called, all bets are off and
> we're irrecoverably crashing.
>
> Note that in that case, while user_mode(regs) may return true or false,
> regs->ARM_sp and regs->ARM_lr are always the SVC mode stack and return
> address after regs has been stacked, and not the expected values for
> the parent context (which we have most likely long since destroyed.)

Ok, I have rewritten the patch and my changelog text accordingly, sending
an updated version now.

Thanks,

      Arnd

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/9] mm/maccess: fix unaligned copy_{from,to}_kernel_nofault
  2020-09-07 15:36 ` [PATCH 1/9] mm/maccess: fix unaligned copy_{from,to}_kernel_nofault Arnd Bergmann
  2020-09-08  6:11   ` Christoph Hellwig
@ 2020-09-27  9:25   ` Linus Walleij
  1 sibling, 0 replies; 24+ messages in thread
From: Linus Walleij @ 2020-09-27  9:25 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Christoph Hellwig, Russell King, Alexander Viro,
	linux-kernel@vger.kernel.org, linux-arch, Linux ARM,
	Andrew Morton, Daniel Borkmann, Alexei Starovoitov, linux-mm,
	linux-kernel

On Mon, Sep 7, 2020 at 5:37 PM Arnd Bergmann <arnd@arndb.de> wrote:

> On machines such as ARMv5 that trap unaligned accesses, these
> two functions can be slow when each access needs to be emulated,
> or they might not work at all.
>
> Change them so that each loop is only used when both the src
> and dst pointers are naturally aligned.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

That's a clever optimization idea, nice!
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/9] ARM: syscall: always store thread_info->syscall
  2020-09-07 15:36 ` [PATCH 4/9] ARM: syscall: always store thread_info->syscall Arnd Bergmann
@ 2020-09-28  9:41   ` Linus Walleij
  2020-09-28 12:42     ` Arnd Bergmann
  0 siblings, 1 reply; 24+ messages in thread
From: Linus Walleij @ 2020-09-28  9:41 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Christoph Hellwig, Russell King, Alexander Viro,
	linux-kernel@vger.kernel.org, linux-arch, Linux ARM,
	Russell King, Oleg Nesterov, linux-kernel

Hi Arnd,

help me out here because I feel vaguely stupid...

On Mon, Sep 7, 2020 at 5:38 PM Arnd Bergmann <arnd@arndb.de> wrote:

>  {
> +       if (IS_ENABLED(CONFIG_OABI_COMPAT))
> +               return task_thread_info(task)->syscall & ~__NR_OABI_SYSCALL_BASE;

Where __NR_OABI_SYSCALL_BASE is
#define __NR_OABI_SYSCALL_BASE       0x900000

So you will end up with syscall number & 0xFF6FFFFF
masking off bits 20 and 23.

I suppose this is based on this:

>         bics    r10, r10, #0xff000000
> +       str     r10, [tsk, #TI_SYSCALL]

OK we mask off bits 24-31 before we store this.

>         bic     scno, scno, #0xff000000         @ mask off SWI op-code
> +       str     scno, [tsk, #TI_SYSCALL]

And here too.

>         eor     scno, scno, #__NR_SYSCALL_BASE  @ check OS number

And then this happens, which will ... I don't know really.
Exclusive or with 0x900000 is not immediately evident
to me; I suppose it is for everyone else... :/

I need some idea how this numberspace is managed in order to
understand the code so I can review it, I guess it all makes perfect
sense but I need some background here.

Thanks,
Linus Walleij

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/9] ARM: syscall: always store thread_info->syscall
  2020-09-28  9:41   ` Linus Walleij
@ 2020-09-28 12:42     ` Arnd Bergmann
  2020-09-28 15:08       ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 24+ messages in thread
From: Arnd Bergmann @ 2020-09-28 12:42 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Christoph Hellwig, Russell King, Alexander Viro,
	linux-kernel@vger.kernel.org, linux-arch, Linux ARM,
	Russell King, Oleg Nesterov, linux-kernel

On Mon, Sep 28, 2020 at 11:41 AM Linus Walleij <linus.walleij@linaro.org> wrote:
>
> Hi Arnd,
>
> help me out here because I feel vaguely stupid...
>
> On Mon, Sep 7, 2020 at 5:38 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> >  {
> > +       if (IS_ENABLED(CONFIG_OABI_COMPAT))
> > +               return task_thread_info(task)->syscall & ~__NR_OABI_SYSCALL_BASE;
>
> Where __NR_OABI_SYSCALL_BASE is
> #define __NR_OABI_SYSCALL_BASE       0x900000
>
> So you will end up with syscall number & 0xFF6FFFFF
> masking off bits 20 and 23.

Right. I fixed a bug in here since I sent this, the correct version also
needs to mask away the __NR_OABI_SYSCALL_BASE for a native
oabi kernel, not just for an eabi kernel with oabi-compat mode.

> I suppose this is based on this:
>
> >         bics    r10, r10, #0xff000000
> > +       str     r10, [tsk, #TI_SYSCALL]
>
> OK we mask off bits 24-31 before we store this.
>
> >         bic     scno, scno, #0xff000000         @ mask off SWI op-code
> > +       str     scno, [tsk, #TI_SYSCALL]
>
> And here too.
>
> >         eor     scno, scno, #__NR_SYSCALL_BASE  @ check OS number
>
> And then this happens, which will ... I don't know really.
> Exclusive or with 0x900000 is not immediately evident
> to me; I suppose it is for everyone else... :/

This is how the SWI/SVC immediate argument gets turned into
a system call number that is used as an offset into the sys_call_table.

OABI syscalls are called with '__NR_OABI_SYSCALL_BASE | scno'
in the immediate argument of the instruction, so using an
'eor ... , #__NR_SYSCALL_BASE' means that any valid
argument afterwards is a number between zero and
__NR_syscalls, and any invalid argument is a number outside
of that range.

EABI syscalls are just 'SVC 0' with the syscall number in register 7
and no offset.
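
A worked example, assuming the historical ARM definition of
__NR_read as (__NR_SYSCALL_BASE + 3):

	/* OABI read(): the number is encoded in the instruction itself
	 *   userspace executes:   swi 0x900003
	 *   entry code computes:  scno  = opcode & ~0xff000000;  // 0x00900003
	 *                         scno ^= 0x00900000;            // -> 3
	 *
	 * EABI read(): plain "svc 0" with the number in a register
	 *   userspace executes:   mov r7, #3; svc 0
	 *   and r7 is used directly as the sys_call_table index.
	 */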

See also
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3f2829a31573e3e502b874c8d69a765f7a778793

> I need some idea how this numberspace is managed in order to
> understand the code so I can review it, I guess it all makes perfect
> sense but I need some background here.

I also had never understood this part before, and I'm still not
sure where the 0x900000 actually comes from, though my best
guess is that this was intended as a an OS specific number space,
with '9' being assigned to Linux (similar to the way Itanium and
MIPS do with their respective offsets). By the time EABI got added,
this was apparently no longer considered helpful.

        Arnd

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/9] ARM: syscall: always store thread_info->syscall
  2020-09-28 12:42     ` Arnd Bergmann
@ 2020-09-28 15:08       ` Russell King - ARM Linux admin
  0 siblings, 0 replies; 24+ messages in thread
From: Russell King - ARM Linux admin @ 2020-09-28 15:08 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linus Walleij, Christoph Hellwig, Alexander Viro,
	linux-kernel@vger.kernel.org, linux-arch, Linux ARM,
	Oleg Nesterov, linux-kernel

On Mon, Sep 28, 2020 at 02:42:43PM +0200, Arnd Bergmann wrote:
> > I need some idea how this numberspace is managed in order to
> > understand the code so I can review it, I guess it all makes perfect
> > sense but I need some background here.
> 
> I also had never understood this part before, and I'm still not
> sure where the 0x900000 actually comes from, though my best
> guess is that this was intended as a an OS specific number space,
> with '9' being assigned to Linux (similar to the way Itanium and
> MIPS do with their respective offsets). By the time EABI got added,
> this was apparently no longer considered helpful.

It is an OS-specific number space, originally designed to allow
RISC OS programs to be run under Linux.  There was indeed such a
project, but that died and the code was ripped out.  EABI, by using
SWI 0 (or more accurately, by not reading the SWI opcode), trampled
over the ability to run RISC OS programs under Linux.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/9] mm/maccess: fix unaligned copy_{from,to}_kernel_nofault
  2020-10-30 15:49 ` [PATCH 1/9] mm/maccess: fix unaligned copy_{from,to}_kernel_nofault Arnd Bergmann
@ 2020-11-06  8:57   ` Linus Walleij
  0 siblings, 0 replies; 24+ messages in thread
From: Linus Walleij @ 2020-11-06  8:57 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Russell King, Christoph Hellwig, linux-kernel, Linux ARM,
	Linux-Arch, Linux Memory Management List, Al Viro, Arnd Bergmann

On Fri, Oct 30, 2020 at 4:49 PM Arnd Bergmann <arnd@kernel.org> wrote:

> From: Arnd Bergmann <arnd@arndb.de>
>
> On machines such as ARMv5 that trap unaligned accesses, these
> two functions can be slow when each access needs to be emulated,
> or they might not work at all.
>
> Change them so that each loop is only used when both the src
> and dst pointers are naturally aligned.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/9] mm/maccess: fix unaligned copy_{from,to}_kernel_nofault
  2020-10-30 15:45 [PATCH v4 0/9] ARM: remove set_fs callers and implementation Arnd Bergmann
@ 2020-10-30 15:49 ` Arnd Bergmann
  2020-11-06  8:57   ` Linus Walleij
  0 siblings, 1 reply; 24+ messages in thread
From: Arnd Bergmann @ 2020-10-30 15:49 UTC (permalink / raw)
  To: Russell King, Christoph Hellwig
  Cc: linux-kernel, linux-arm-kernel, linux-arch, linux-mm, viro,
	linus.walleij, arnd

From: Arnd Bergmann <arnd@arndb.de>

On machines such as ARMv5 that trap unaligned accesses, these
two functions can be slow when each access needs to be emulated,
or they might not work at all.

Change them so that each loop is only used when both the src
and dst pointers are naturally aligned.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 mm/maccess.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)
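
A quick illustration of how the combined mask selects the widest safe
loop (the example addresses are made up):

	/* dst = 0x1004, src = 0x2008:
	 *   align = 0x1004 | 0x2008 = 0x300c
	 *   align & 7 != 0  ->  skip the u64 loop
	 *   align & 3 == 0  ->  bulk copy with the u32 loop
	 *   align & 1 == 0  ->  any 2-byte tail via the u16 loop, last byte via u8
	 */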

diff --git a/mm/maccess.c b/mm/maccess.c
index 3bd70405f2d8..d3f1a1f0b1c1 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -24,13 +24,21 @@ bool __weak copy_from_kernel_nofault_allowed(const void *unsafe_src,
 
 long copy_from_kernel_nofault(void *dst, const void *src, size_t size)
 {
+	unsigned long align = 0;
+
+	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
+		align = (unsigned long)dst | (unsigned long)src;
+
 	if (!copy_from_kernel_nofault_allowed(src, size))
 		return -ERANGE;
 
 	pagefault_disable();
-	copy_from_kernel_nofault_loop(dst, src, size, u64, Efault);
-	copy_from_kernel_nofault_loop(dst, src, size, u32, Efault);
-	copy_from_kernel_nofault_loop(dst, src, size, u16, Efault);
+	if (!(align & 7))
+		copy_from_kernel_nofault_loop(dst, src, size, u64, Efault);
+	if (!(align & 3))
+		copy_from_kernel_nofault_loop(dst, src, size, u32, Efault);
+	if (!(align & 1))
+		copy_from_kernel_nofault_loop(dst, src, size, u16, Efault);
 	copy_from_kernel_nofault_loop(dst, src, size, u8, Efault);
 	pagefault_enable();
 	return 0;
@@ -50,10 +58,18 @@ EXPORT_SYMBOL_GPL(copy_from_kernel_nofault);
 
 long copy_to_kernel_nofault(void *dst, const void *src, size_t size)
 {
+	unsigned long align = 0;
+
+	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
+		align = (unsigned long)dst | (unsigned long)src;
+
 	pagefault_disable();
-	copy_to_kernel_nofault_loop(dst, src, size, u64, Efault);
-	copy_to_kernel_nofault_loop(dst, src, size, u32, Efault);
-	copy_to_kernel_nofault_loop(dst, src, size, u16, Efault);
+	if (!(align & 7))
+		copy_to_kernel_nofault_loop(dst, src, size, u64, Efault);
+	if (!(align & 3))
+		copy_to_kernel_nofault_loop(dst, src, size, u32, Efault);
+	if (!(align & 1))
+		copy_to_kernel_nofault_loop(dst, src, size, u16, Efault);
 	copy_to_kernel_nofault_loop(dst, src, size, u8, Efault);
 	pagefault_enable();
 	return 0;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2020-11-06  8:57 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-07 15:36 [PATCH 0/9] ARM: remove set_fs callers and implementation Arnd Bergmann
2020-09-07 15:36 ` [PATCH 1/9] mm/maccess: fix unaligned copy_{from,to}_kernel_nofault Arnd Bergmann
2020-09-08  6:11   ` Christoph Hellwig
2020-09-27  9:25   ` Linus Walleij
2020-09-07 15:36 ` [PATCH 2/9] ARM: traps: use get_kernel_nofault instead of set_fs() Arnd Bergmann
2020-09-08  6:15   ` Christoph Hellwig
2020-09-17 17:29     ` Arnd Bergmann
2020-09-18  7:42       ` Russell King - ARM Linux admin
2020-09-18 12:36         ` Arnd Bergmann
2020-09-07 15:36 ` [PATCH 3/9] ARM: oabi-compat: add epoll_pwait handler Arnd Bergmann
2020-09-08  6:16   ` Christoph Hellwig
2020-09-07 15:36 ` [PATCH 4/9] ARM: syscall: always store thread_info->syscall Arnd Bergmann
2020-09-28  9:41   ` Linus Walleij
2020-09-28 12:42     ` Arnd Bergmann
2020-09-28 15:08       ` Russell King - ARM Linux admin
2020-09-07 15:36 ` [PATCH 5/9] ARM: oabi-compat: rework epoll_wait/epoll_pwait emulation Arnd Bergmann
2020-09-08  6:20   ` Christoph Hellwig
2020-09-08 20:56     ` Arnd Bergmann
2020-09-07 15:36 ` [PATCH 6/9] ARM: oabi-compat: rework sys_semtimedop emulation Arnd Bergmann
2020-09-07 15:36 ` [PATCH 7/9] ARM: oabi-compat: rework fcntl64() emulation Arnd Bergmann
2020-09-07 15:36 ` [PATCH 8/9] ARM: uaccess: add __{get,put}_kernel_nofault Arnd Bergmann
2020-09-07 15:36 ` [PATCH 9/9] ARM: uaccess: remove set_fs() implementation Arnd Bergmann
2020-10-30 15:45 [PATCH v4 0/9] ARM: remove set_fs callers and implementation Arnd Bergmann
2020-10-30 15:49 ` [PATCH 1/9] mm/maccess: fix unaligned copy_{from,to}_kernel_nofault Arnd Bergmann
2020-11-06  8:57   ` Linus Walleij
