linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 00/10] ARM: remove set_fs callers and implementation
@ 2021-07-26 14:11 Arnd Bergmann
  2021-07-26 14:11 ` [PATCH v5 01/10] mm/maccess: fix unaligned copy_{from,to}_kernel_nofault Arnd Bergmann
                   ` (10 more replies)
  0 siblings, 11 replies; 17+ messages in thread
From: Arnd Bergmann @ 2021-07-26 14:11 UTC (permalink / raw)
  To: Russell King
  Cc: Arnd Bergmann, linux-kernel, linux-arm-kernel, linux-arch,
	linux-mm, Alexander Viro, Linus Walleij

From: Arnd Bergmann <arnd@arndb.de>

Hi Christoph, Russell,

This is the rebased version of my ARM set_fs patches on top of
v5.14-rc2. There were a couple of minor conflicts that I resolved,
but otherwise this is still identical to the version I tested earlier.

The one notable change this time is that I rename thread_info->syscall
to thread_info->abi_syscall, to clarify that this field now contains
both the ABI bits (0x900000 for OABI, 0x000000 for EABI) in kernels
that support both, and every access to this has to mask out the bits
it needs. As Russell mentioned before, having a 'syscall' field that
not just contains the syscall number is confusing and error-prone, so
I hope this change is good enough.

I have tested the oabi-compat changes using the LTP tests for the three
modified syscalls using an Armv7 kernel and a Debian 5 OABI user space.

I also tested the syscall_get_nr() in all combinations of OABI/EABI
kernel user space and fixed the bugs I found after Russell pointed out
those issues in the early versions.

Linus Walleij did additional testing of v4 on Footbridge with OABI
user space.

With this and the m68k changes getting merged, we are very close
to eliminating set_fs completely.

Russell, you can pull these from

 https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git arm-setfs_v5

or I can add them to the ARM patch tracker if you prefer.

     Arnd

v4: https://lore.kernel.org/linux-arm-kernel/20201030154519.1245983-1-arnd@kernel.org/
v3: https://lore.kernel.org/linux-arm-kernel/20201001141233.119343-1-arnd@arndb.de/
v2: https://lore.kernel.org/linux-arm-kernel/20200918124624.1469673-1-arnd@arndb.de/
v1: https://lore.kernel.org/linux-arm-kernel/20200907153701.2981205-1-arnd@arndb.de/

Arnd Bergmann (9):
  mm/maccess: fix unaligned copy_{from,to}_kernel_nofault
  ARM: traps: use get_kernel_nofault instead of set_fs()
  ARM: oabi-compat: add epoll_pwait handler
  ARM: syscall: always store thread_info->abi_syscall
  ARM: oabi-compat: rework epoll_wait/epoll_pwait emulation
  ARM: oabi-compat: rework sys_semtimedop emulation
  ARM: oabi-compat: rework fcntl64() emulation
  ARM: uaccess: add __{get,put}_kernel_nofault
  ARM: uaccess: remove set_fs() implementation
  ARM: oabi-compat: fix oabi epoll sparse warning

 arch/arm/Kconfig                   |   1 -
 arch/arm/include/asm/ptrace.h      |   1 -
 arch/arm/include/asm/syscall.h     |  16 ++-
 arch/arm/include/asm/thread_info.h |   6 +-
 arch/arm/include/asm/uaccess-asm.h |   6 -
 arch/arm/include/asm/uaccess.h     | 169 ++++++++++++-----------
 arch/arm/include/uapi/asm/unistd.h |   1 +
 arch/arm/kernel/asm-offsets.c      |   3 +-
 arch/arm/kernel/entry-common.S     |  20 +--
 arch/arm/kernel/process.c          |   7 +-
 arch/arm/kernel/ptrace.c           |  14 +-
 arch/arm/kernel/signal.c           |   8 --
 arch/arm/kernel/sys_oabi-compat.c  | 214 +++++++++++++++++------------
 arch/arm/kernel/traps.c            |  47 +++----
 arch/arm/lib/copy_from_user.S      |   3 +-
 arch/arm/lib/copy_to_user.S        |   3 +-
 arch/arm/tools/syscall.tbl         |   2 +-
 fs/eventpoll.c                     |   5 +-
 include/linux/eventpoll.h          |  18 +++
 include/linux/syscalls.h           |   3 +
 ipc/sem.c                          |  84 ++++++-----
 mm/maccess.c                       |  28 +++-
 22 files changed, 361 insertions(+), 298 deletions(-)

Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>

-- 
2.29.2



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5 01/10] mm/maccess: fix unaligned copy_{from,to}_kernel_nofault
  2021-07-26 14:11 [PATCH v5 00/10] ARM: remove set_fs callers and implementation Arnd Bergmann
@ 2021-07-26 14:11 ` Arnd Bergmann
  2021-07-26 14:11 ` [PATCH v5 02/10] ARM: traps: use get_kernel_nofault instead of set_fs() Arnd Bergmann
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Arnd Bergmann @ 2021-07-26 14:11 UTC (permalink / raw)
  To: Russell King
  Cc: Arnd Bergmann, linux-kernel, linux-arm-kernel, linux-arch,
	linux-mm, Alexander Viro, Linus Walleij, Christoph Hellwig

From: Arnd Bergmann <arnd@arndb.de>

On machines such as ARMv5 that trap unaligned accesses, these
two functions can be slow when each access needs to be emulated,
or they might not work at all.

Change them so that each loop is only used when both the src
and dst pointers are naturally aligned.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 mm/maccess.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/mm/maccess.c b/mm/maccess.c
index 3bd70405f2d8..d3f1a1f0b1c1 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -24,13 +24,21 @@ bool __weak copy_from_kernel_nofault_allowed(const void *unsafe_src,
 
 long copy_from_kernel_nofault(void *dst, const void *src, size_t size)
 {
+	unsigned long align = 0;
+
+	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
+		align = (unsigned long)dst | (unsigned long)src;
+
 	if (!copy_from_kernel_nofault_allowed(src, size))
 		return -ERANGE;
 
 	pagefault_disable();
-	copy_from_kernel_nofault_loop(dst, src, size, u64, Efault);
-	copy_from_kernel_nofault_loop(dst, src, size, u32, Efault);
-	copy_from_kernel_nofault_loop(dst, src, size, u16, Efault);
+	if (!(align & 7))
+		copy_from_kernel_nofault_loop(dst, src, size, u64, Efault);
+	if (!(align & 3))
+		copy_from_kernel_nofault_loop(dst, src, size, u32, Efault);
+	if (!(align & 1))
+		copy_from_kernel_nofault_loop(dst, src, size, u16, Efault);
 	copy_from_kernel_nofault_loop(dst, src, size, u8, Efault);
 	pagefault_enable();
 	return 0;
@@ -50,10 +58,18 @@ EXPORT_SYMBOL_GPL(copy_from_kernel_nofault);
 
 long copy_to_kernel_nofault(void *dst, const void *src, size_t size)
 {
+	unsigned long align = 0;
+
+	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
+		align = (unsigned long)dst | (unsigned long)src;
+
 	pagefault_disable();
-	copy_to_kernel_nofault_loop(dst, src, size, u64, Efault);
-	copy_to_kernel_nofault_loop(dst, src, size, u32, Efault);
-	copy_to_kernel_nofault_loop(dst, src, size, u16, Efault);
+	if (!(align & 7))
+		copy_to_kernel_nofault_loop(dst, src, size, u64, Efault);
+	if (!(align & 3))
+		copy_to_kernel_nofault_loop(dst, src, size, u32, Efault);
+	if (!(align & 1))
+		copy_to_kernel_nofault_loop(dst, src, size, u16, Efault);
 	copy_to_kernel_nofault_loop(dst, src, size, u8, Efault);
 	pagefault_enable();
 	return 0;
-- 
2.29.2



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5 02/10] ARM: traps: use get_kernel_nofault instead of set_fs()
  2021-07-26 14:11 [PATCH v5 00/10] ARM: remove set_fs callers and implementation Arnd Bergmann
  2021-07-26 14:11 ` [PATCH v5 01/10] mm/maccess: fix unaligned copy_{from,to}_kernel_nofault Arnd Bergmann
@ 2021-07-26 14:11 ` Arnd Bergmann
  2021-07-26 14:11 ` [PATCH v5 03/10] ARM: oabi-compat: add epoll_pwait handler Arnd Bergmann
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Arnd Bergmann @ 2021-07-26 14:11 UTC (permalink / raw)
  To: Russell King
  Cc: Arnd Bergmann, linux-kernel, linux-arm-kernel, linux-arch,
	linux-mm, Alexander Viro, Linus Walleij, Christoph Hellwig

From: Arnd Bergmann <arnd@arndb.de>

ARM uses set_fs() and __get_user() to allow the stack dumping code to
access possibly invalid pointers carefully. These can be changed to the
simpler get_kernel_nofault(), and allow the eventual removal of set_fs().

dump_instr() will print either kernel or user space pointers,
depending on how it was called. For dump_mem(), I assume we are only
interested in kernel pointers, and the only time that this is called
with user_mode(regs)==true is when the regs themselves are unreliable
as a result of the condition that caused the trap.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/kernel/traps.c | 47 ++++++++++++++---------------------------
 1 file changed, 16 insertions(+), 31 deletions(-)

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 64308e3a5d0c..10dd3ef1f398 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -122,17 +122,8 @@ static void dump_mem(const char *lvl, const char *str, unsigned long bottom,
 		     unsigned long top)
 {
 	unsigned long first;
-	mm_segment_t fs;
 	int i;
 
-	/*
-	 * We need to switch to kernel mode so that we can use __get_user
-	 * to safely read from kernel space.  Note that we now dump the
-	 * code first, just in case the backtrace kills us.
-	 */
-	fs = get_fs();
-	set_fs(KERNEL_DS);
-
 	printk("%s%s(0x%08lx to 0x%08lx)\n", lvl, str, bottom, top);
 
 	for (first = bottom & ~31; first < top; first += 32) {
@@ -145,7 +136,7 @@ static void dump_mem(const char *lvl, const char *str, unsigned long bottom,
 		for (p = first, i = 0; i < 8 && p < top; i++, p += 4) {
 			if (p >= bottom && p < top) {
 				unsigned long val;
-				if (__get_user(val, (unsigned long *)p) == 0)
+				if (get_kernel_nofault(val, (unsigned long *)p))
 					sprintf(str + i * 9, " %08lx", val);
 				else
 					sprintf(str + i * 9, " ????????");
@@ -153,11 +144,9 @@ static void dump_mem(const char *lvl, const char *str, unsigned long bottom,
 		}
 		printk("%s%04lx:%s\n", lvl, first & 0xffff, str);
 	}
-
-	set_fs(fs);
 }
 
-static void __dump_instr(const char *lvl, struct pt_regs *regs)
+static void dump_instr(const char *lvl, struct pt_regs *regs)
 {
 	unsigned long addr = instruction_pointer(regs);
 	const int thumb = thumb_mode(regs);
@@ -173,10 +162,20 @@ static void __dump_instr(const char *lvl, struct pt_regs *regs)
 	for (i = -4; i < 1 + !!thumb; i++) {
 		unsigned int val, bad;
 
-		if (thumb)
-			bad = get_user(val, &((u16 *)addr)[i]);
-		else
-			bad = get_user(val, &((u32 *)addr)[i]);
+		if (!user_mode(regs)) {
+			if (thumb) {
+				u16 val16;
+				bad = get_kernel_nofault(val16, &((u16 *)addr)[i]);
+				val = val16;
+			} else {
+				bad = get_kernel_nofault(val, &((u32 *)addr)[i]);
+			}
+		} else {
+			if (thumb)
+				bad = get_user(val, &((u16 *)addr)[i]);
+			else
+				bad = get_user(val, &((u32 *)addr)[i]);
+		}
 
 		if (!bad)
 			p += sprintf(p, i == 0 ? "(%0*x) " : "%0*x ",
@@ -189,20 +188,6 @@ static void __dump_instr(const char *lvl, struct pt_regs *regs)
 	printk("%sCode: %s\n", lvl, str);
 }
 
-static void dump_instr(const char *lvl, struct pt_regs *regs)
-{
-	mm_segment_t fs;
-
-	if (!user_mode(regs)) {
-		fs = get_fs();
-		set_fs(KERNEL_DS);
-		__dump_instr(lvl, regs);
-		set_fs(fs);
-	} else {
-		__dump_instr(lvl, regs);
-	}
-}
-
 #ifdef CONFIG_ARM_UNWIND
 static inline void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
 				  const char *loglvl)
-- 
2.29.2



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5 03/10] ARM: oabi-compat: add epoll_pwait handler
  2021-07-26 14:11 [PATCH v5 00/10] ARM: remove set_fs callers and implementation Arnd Bergmann
  2021-07-26 14:11 ` [PATCH v5 01/10] mm/maccess: fix unaligned copy_{from,to}_kernel_nofault Arnd Bergmann
  2021-07-26 14:11 ` [PATCH v5 02/10] ARM: traps: use get_kernel_nofault instead of set_fs() Arnd Bergmann
@ 2021-07-26 14:11 ` Arnd Bergmann
  2021-07-26 14:11 ` [PATCH v5 04/10] ARM: syscall: always store thread_info->abi_syscall Arnd Bergmann
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Arnd Bergmann @ 2021-07-26 14:11 UTC (permalink / raw)
  To: Russell King
  Cc: Arnd Bergmann, linux-kernel, linux-arm-kernel, linux-arch,
	linux-mm, Alexander Viro, Linus Walleij, stable,
	Christoph Hellwig

From: Arnd Bergmann <arnd@arndb.de>

The epoll_wait() syscall has a special version for OABI compat
mode to convert the arguments to the EABI structure layout
of the kernel. However, the later epoll_pwait() syscall was
added in arch/arm in linux-2.6.32 without this conversion.

Use the same kind of handler for both.

Fixes: 369842658a36 ("ARM: 5677/1: ARM support for TIF_RESTORE_SIGMASK/pselect6/ppoll/epoll_pwait")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/kernel/sys_oabi-compat.c | 38 ++++++++++++++++++++++++++++---
 arch/arm/tools/syscall.tbl        |  2 +-
 2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index 075a2e0ed2c1..443203fafb6b 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -265,9 +265,8 @@ asmlinkage long sys_oabi_epoll_ctl(int epfd, int op, int fd,
 	return do_epoll_ctl(epfd, op, fd, &kernel, false);
 }
 
-asmlinkage long sys_oabi_epoll_wait(int epfd,
-				    struct oabi_epoll_event __user *events,
-				    int maxevents, int timeout)
+static long do_oabi_epoll_wait(int epfd, struct oabi_epoll_event __user *events,
+			       int maxevents, int timeout)
 {
 	struct epoll_event *kbuf;
 	struct oabi_epoll_event e;
@@ -314,6 +313,39 @@ asmlinkage long sys_oabi_epoll_wait(int epfd,
 }
 #endif
 
+SYSCALL_DEFINE4(oabi_epoll_wait, int, epfd,
+		struct oabi_epoll_event __user *, events,
+		int, maxevents, int, timeout)
+{
+	return do_oabi_epoll_wait(epfd, events, maxevents, timeout);
+}
+
+/*
+ * Implement the event wait interface for the eventpoll file. It is the kernel
+ * part of the user space epoll_pwait(2).
+ */
+SYSCALL_DEFINE6(oabi_epoll_pwait, int, epfd,
+		struct oabi_epoll_event __user *, events, int, maxevents,
+		int, timeout, const sigset_t __user *, sigmask,
+		size_t, sigsetsize)
+{
+	int error;
+
+	/*
+	 * If the caller wants a certain signal mask to be set during the wait,
+	 * we apply it here.
+	 */
+	error = set_user_sigmask(sigmask, sigsetsize);
+	if (error)
+		return error;
+
+	error = do_oabi_epoll_wait(epfd, events, maxevents, timeout);
+	restore_saved_sigmask_unless(error == -EINTR);
+
+	return error;
+}
+#endif
+
 struct oabi_sembuf {
 	unsigned short	sem_num;
 	short		sem_op;
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index c5df1179fc5d..11d0b960b2c2 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -360,7 +360,7 @@
 343	common	vmsplice		sys_vmsplice
 344	common	move_pages		sys_move_pages
 345	common	getcpu			sys_getcpu
-346	common	epoll_pwait		sys_epoll_pwait
+346	common	epoll_pwait		sys_epoll_pwait		sys_oabi_epoll_pwait
 347	common	kexec_load		sys_kexec_load
 348	common	utimensat		sys_utimensat_time32
 349	common	signalfd		sys_signalfd
-- 
2.29.2



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5 04/10] ARM: syscall: always store thread_info->abi_syscall
  2021-07-26 14:11 [PATCH v5 00/10] ARM: remove set_fs callers and implementation Arnd Bergmann
                   ` (2 preceding siblings ...)
  2021-07-26 14:11 ` [PATCH v5 03/10] ARM: oabi-compat: add epoll_pwait handler Arnd Bergmann
@ 2021-07-26 14:11 ` Arnd Bergmann
  2021-07-26 14:11 ` [PATCH v5 05/10] ARM: oabi-compat: rework epoll_wait/epoll_pwait emulation Arnd Bergmann
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Arnd Bergmann @ 2021-07-26 14:11 UTC (permalink / raw)
  To: Russell King
  Cc: Arnd Bergmann, linux-kernel, linux-arm-kernel, linux-arch,
	linux-mm, Alexander Viro, Linus Walleij

From: Arnd Bergmann <arnd@arndb.de>

The system call number is used in a a couple of places, in particular
ptrace, seccomp and /proc/<pid>/syscall.

The last one apparently never worked reliably on ARM for tasks that are
not currently getting traced.

Storing the syscall number in the normal entry path makes it work,
as well as allowing us to see if the current system call is for OABI
compat mode, which is the next thing I want to hook into.

Since the thread_info->syscall field is not just the number any more, it
is now renamed to abi_syscall. In kernels that enable both OABI and EABI,
the upper bits of this field encode 0x900000 (__NR_OABI_SYSCALL_BASE)
for OABI tasks, while normal EABI tasks do not set the upper bits. This
makes it possible to implement the in_oabi_syscall() helper later.

All other users of thread_info->syscall go through the syscall_get_nr()
helper, which in turn filters out the ABI bits.

Note that the ABI information is lost with PTRACE_SET_SYSCALL, so one
cannot set the internal number to a particular version, but this was
already the case. We could change it to let gdb encode the ABI type along
with the syscall in a CONFIG_OABI_COMPAT-enabled kernel, but that itself
would be a (backwards-compatible) ABI change, so I don't do it here.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/include/asm/syscall.h     |  5 ++++-
 arch/arm/include/asm/thread_info.h |  2 +-
 arch/arm/include/uapi/asm/unistd.h |  1 +
 arch/arm/kernel/asm-offsets.c      |  1 +
 arch/arm/kernel/entry-common.S     |  8 ++++++--
 arch/arm/kernel/ptrace.c           | 14 ++++++++------
 6 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index fd02761ba06c..f055e846a5cc 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -22,7 +22,10 @@ extern const unsigned long sys_call_table[];
 static inline int syscall_get_nr(struct task_struct *task,
 				 struct pt_regs *regs)
 {
-	return task_thread_info(task)->syscall;
+	if (IS_ENABLED(CONFIG_AEABI) && !IS_ENABLED(CONFIG_OABI_COMPAT))
+		return task_thread_info(task)->abi_syscall;
+
+	return task_thread_info(task)->abi_syscall & __NR_SYSCALL_MASK;
 }
 
 static inline void syscall_rollback(struct task_struct *task,
diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index 70d4cbc49ae1..17c56051747b 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -62,7 +62,7 @@ struct thread_info {
 	unsigned long		stack_canary;
 #endif
 	struct cpu_context_save	cpu_context;	/* cpu context */
-	__u32			syscall;	/* syscall number */
+	__u32			abi_syscall;	/* ABI type and syscall nr */
 	__u8			used_cp[16];	/* thread used copro */
 	unsigned long		tp_value[2];	/* TLS registers */
 #ifdef CONFIG_CRUNCH
diff --git a/arch/arm/include/uapi/asm/unistd.h b/arch/arm/include/uapi/asm/unistd.h
index ae7749e15726..a1149911464c 100644
--- a/arch/arm/include/uapi/asm/unistd.h
+++ b/arch/arm/include/uapi/asm/unistd.h
@@ -15,6 +15,7 @@
 #define _UAPI__ASM_ARM_UNISTD_H
 
 #define __NR_OABI_SYSCALL_BASE	0x900000
+#define __NR_SYSCALL_MASK	0x0fffff
 
 #if defined(__thumb__) || defined(__ARM_EABI__)
 #define __NR_SYSCALL_BASE	0
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 70993af22d80..a0945b898ca3 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -48,6 +48,7 @@ int main(void)
   DEFINE(TI_CPU,		offsetof(struct thread_info, cpu));
   DEFINE(TI_CPU_DOMAIN,		offsetof(struct thread_info, cpu_domain));
   DEFINE(TI_CPU_SAVE,		offsetof(struct thread_info, cpu_context));
+  DEFINE(TI_ABI_SYSCALL,	offsetof(struct thread_info, abi_syscall));
   DEFINE(TI_USED_CP,		offsetof(struct thread_info, used_cp));
   DEFINE(TI_TP_VALUE,		offsetof(struct thread_info, tp_value));
   DEFINE(TI_FPSTATE,		offsetof(struct thread_info, fpstate));
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index 7f0b7aba1498..e837af90cd44 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -226,6 +226,7 @@ ENTRY(vector_swi)
 	/* saved_psr and saved_pc are now dead */
 
 	uaccess_disable tbl
+	get_thread_info tsk
 
 	adr	tbl, sys_call_table		@ load syscall table pointer
 
@@ -237,13 +238,17 @@ ENTRY(vector_swi)
 	 * get the old ABI syscall table address.
 	 */
 	bics	r10, r10, #0xff000000
+	strne	r10, [tsk, #TI_ABI_SYSCALL]
+	streq	scno, [tsk, #TI_ABI_SYSCALL]
 	eorne	scno, r10, #__NR_OABI_SYSCALL_BASE
 	ldrne	tbl, =sys_oabi_call_table
 #elif !defined(CONFIG_AEABI)
 	bic	scno, scno, #0xff000000		@ mask off SWI op-code
+	str	scno, [tsk, #TI_ABI_SYSCALL]
 	eor	scno, scno, #__NR_SYSCALL_BASE	@ check OS number
+#else
+	str	scno, [tsk, #TI_ABI_SYSCALL]
 #endif
-	get_thread_info tsk
 	/*
 	 * Reload the registers that may have been corrupted on entry to
 	 * the syscall assembly (by tracing or context tracking.)
@@ -288,7 +293,6 @@ ENDPROC(vector_swi)
 	 * context switches, and waiting for our parent to respond.
 	 */
 __sys_trace:
-	mov	r1, scno
 	add	r0, sp, #S_OFF
 	bl	syscall_trace_enter
 	mov	scno, r0
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index 2771e682220b..d886ea8910cb 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -25,6 +25,7 @@
 #include <linux/tracehook.h>
 #include <linux/unistd.h>
 
+#include <asm/syscall.h>
 #include <asm/traps.h>
 
 #define CREATE_TRACE_POINTS
@@ -811,7 +812,8 @@ long arch_ptrace(struct task_struct *child, long request,
 			break;
 
 		case PTRACE_SET_SYSCALL:
-			task_thread_info(child)->syscall = data;
+			task_thread_info(child)->abi_syscall = data &
+							__NR_SYSCALL_MASK;
 			ret = 0;
 			break;
 
@@ -880,14 +882,14 @@ static void tracehook_report_syscall(struct pt_regs *regs,
 	if (dir == PTRACE_SYSCALL_EXIT)
 		tracehook_report_syscall_exit(regs, 0);
 	else if (tracehook_report_syscall_entry(regs))
-		current_thread_info()->syscall = -1;
+		current_thread_info()->abi_syscall = -1;
 
 	regs->ARM_ip = ip;
 }
 
-asmlinkage int syscall_trace_enter(struct pt_regs *regs, int scno)
+asmlinkage int syscall_trace_enter(struct pt_regs *regs)
 {
-	current_thread_info()->syscall = scno;
+	int scno;
 
 	if (test_thread_flag(TIF_SYSCALL_TRACE))
 		tracehook_report_syscall(regs, PTRACE_SYSCALL_ENTER);
@@ -898,11 +900,11 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs, int scno)
 		return -1;
 #else
 	/* XXX: remove this once OABI gets fixed */
-	secure_computing_strict(current_thread_info()->syscall);
+	secure_computing_strict(syscall_get_nr(current, regs));
 #endif
 
 	/* Tracer or seccomp may have changed syscall. */
-	scno = current_thread_info()->syscall;
+	scno = syscall_get_nr(current, regs);
 
 	if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
 		trace_sys_enter(regs, scno);
-- 
2.29.2



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5 05/10] ARM: oabi-compat: rework epoll_wait/epoll_pwait emulation
  2021-07-26 14:11 [PATCH v5 00/10] ARM: remove set_fs callers and implementation Arnd Bergmann
                   ` (3 preceding siblings ...)
  2021-07-26 14:11 ` [PATCH v5 04/10] ARM: syscall: always store thread_info->abi_syscall Arnd Bergmann
@ 2021-07-26 14:11 ` Arnd Bergmann
  2021-07-26 14:11 ` [PATCH v5 06/10] ARM: oabi-compat: rework sys_semtimedop emulation Arnd Bergmann
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Arnd Bergmann @ 2021-07-26 14:11 UTC (permalink / raw)
  To: Russell King
  Cc: Arnd Bergmann, linux-kernel, linux-arm-kernel, linux-arch,
	linux-mm, Alexander Viro, Linus Walleij

From: Arnd Bergmann <arnd@arndb.de>

The epoll_wait() system call wrapper is one of the remaining users of
the set_fs() infrasturcture for Arm. Changing it to not require set_fs()
is rather complex unfortunately.

The approach I'm taking here is to allow architectures to override
the code that copies the output to user space, and let the oabi-compat
implementation check whether it is getting called from an EABI or OABI
system call based on the thread_info->syscall value.

The in_oabi_syscall() check here mirrors the in_compat_syscall() and
in_x32_syscall() helpers for 32-bit compat implementations on other
architectures.

Overall, the amount of code goes down, at least with the newly added
sys_oabi_epoll_pwait() helper getting removed again. The downside
is added complexity in the source code for the native implementation.
There should be no difference in runtime performance except for Arm
kernels with CONFIG_OABI_COMPAT enabled that now have to go through
an external function call to check which of the two variants to use.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/include/asm/syscall.h    | 11 ++++
 arch/arm/kernel/sys_oabi-compat.c | 83 ++++++-------------------------
 arch/arm/tools/syscall.tbl        |  4 +-
 fs/eventpoll.c                    |  5 +-
 include/linux/eventpoll.h         | 18 +++++++
 5 files changed, 49 insertions(+), 72 deletions(-)

diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index f055e846a5cc..24c19d63ff0a 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -28,6 +28,17 @@ static inline int syscall_get_nr(struct task_struct *task,
 	return task_thread_info(task)->abi_syscall & __NR_SYSCALL_MASK;
 }
 
+static inline bool __in_oabi_syscall(struct task_struct *task)
+{
+	return IS_ENABLED(CONFIG_OABI_COMPAT) &&
+		(task_thread_info(task)->abi_syscall & __NR_OABI_SYSCALL_BASE);
+}
+
+static inline bool in_oabi_syscall(void)
+{
+	return __in_oabi_syscall(current);
+}
+
 static inline void syscall_rollback(struct task_struct *task,
 				    struct pt_regs *regs)
 {
diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index 443203fafb6b..1f6a433200f1 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -83,6 +83,8 @@
 #include <linux/uaccess.h>
 #include <linux/slab.h>
 
+#include <asm/syscall.h>
+
 struct oldabi_stat64 {
 	unsigned long long st_dev;
 	unsigned int	__pad1;
@@ -264,87 +266,34 @@ asmlinkage long sys_oabi_epoll_ctl(int epfd, int op, int fd,
 
 	return do_epoll_ctl(epfd, op, fd, &kernel, false);
 }
-
-static long do_oabi_epoll_wait(int epfd, struct oabi_epoll_event __user *events,
-			       int maxevents, int timeout)
-{
-	struct epoll_event *kbuf;
-	struct oabi_epoll_event e;
-	mm_segment_t fs;
-	long ret, err, i;
-
-	if (maxevents <= 0 ||
-			maxevents > (INT_MAX/sizeof(*kbuf)) ||
-			maxevents > (INT_MAX/sizeof(*events)))
-		return -EINVAL;
-	if (!access_ok(events, sizeof(*events) * maxevents))
-		return -EFAULT;
-	kbuf = kmalloc_array(maxevents, sizeof(*kbuf), GFP_KERNEL);
-	if (!kbuf)
-		return -ENOMEM;
-	fs = get_fs();
-	set_fs(KERNEL_DS);
-	ret = sys_epoll_wait(epfd, kbuf, maxevents, timeout);
-	set_fs(fs);
-	err = 0;
-	for (i = 0; i < ret; i++) {
-		e.events = kbuf[i].events;
-		e.data = kbuf[i].data;
-		err = __copy_to_user(events, &e, sizeof(e));
-		if (err)
-			break;
-		events++;
-	}
-	kfree(kbuf);
-	return err ? -EFAULT : ret;
-}
 #else
 asmlinkage long sys_oabi_epoll_ctl(int epfd, int op, int fd,
 				   struct oabi_epoll_event __user *event)
 {
 	return -EINVAL;
 }
-
-asmlinkage long sys_oabi_epoll_wait(int epfd,
-				    struct oabi_epoll_event __user *events,
-				    int maxevents, int timeout)
-{
-	return -EINVAL;
-}
 #endif
 
-SYSCALL_DEFINE4(oabi_epoll_wait, int, epfd,
-		struct oabi_epoll_event __user *, events,
-		int, maxevents, int, timeout)
+struct epoll_event __user *
+epoll_put_uevent(__poll_t revents, __u64 data,
+		 struct epoll_event __user *uevent)
 {
-	return do_oabi_epoll_wait(epfd, events, maxevents, timeout);
-}
+	if (in_oabi_syscall()) {
+		struct oabi_epoll_event __user *oevent = (void __user *)uevent;
 
-/*
- * Implement the event wait interface for the eventpoll file. It is the kernel
- * part of the user space epoll_pwait(2).
- */
-SYSCALL_DEFINE6(oabi_epoll_pwait, int, epfd,
-		struct oabi_epoll_event __user *, events, int, maxevents,
-		int, timeout, const sigset_t __user *, sigmask,
-		size_t, sigsetsize)
-{
-	int error;
+		if (__put_user(revents, &oevent->events) ||
+		    __put_user(data, &oevent->data))
+			return NULL;
 
-	/*
-	 * If the caller wants a certain signal mask to be set during the wait,
-	 * we apply it here.
-	 */
-	error = set_user_sigmask(sigmask, sigsetsize);
-	if (error)
-		return error;
+		return (void __user *)(oevent+1);
+	}
 
-	error = do_oabi_epoll_wait(epfd, events, maxevents, timeout);
-	restore_saved_sigmask_unless(error == -EINTR);
+	if (__put_user(revents, &uevent->events) ||
+	    __put_user(data, &uevent->data))
+		return NULL;
 
-	return error;
+	return uevent+1;
 }
-#endif
 
 struct oabi_sembuf {
 	unsigned short	sem_num;
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 11d0b960b2c2..344424a9611f 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -266,7 +266,7 @@
 249	common	lookup_dcookie		sys_lookup_dcookie
 250	common	epoll_create		sys_epoll_create
 251	common	epoll_ctl		sys_epoll_ctl		sys_oabi_epoll_ctl
-252	common	epoll_wait		sys_epoll_wait		sys_oabi_epoll_wait
+252	common	epoll_wait		sys_epoll_wait
 253	common	remap_file_pages	sys_remap_file_pages
 # 254 for set_thread_area
 # 255 for get_thread_area
@@ -360,7 +360,7 @@
 343	common	vmsplice		sys_vmsplice
 344	common	move_pages		sys_move_pages
 345	common	getcpu			sys_getcpu
-346	common	epoll_pwait		sys_epoll_pwait		sys_oabi_epoll_pwait
+346	common	epoll_pwait		sys_epoll_pwait
 347	common	kexec_load		sys_kexec_load
 348	common	utimensat		sys_utimensat_time32
 349	common	signalfd		sys_signalfd
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 1e596e1d0bba..c90c4352325e 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1684,8 +1684,8 @@ static int ep_send_events(struct eventpoll *ep,
 		if (!revents)
 			continue;
 
-		if (__put_user(revents, &events->events) ||
-		    __put_user(epi->event.data, &events->data)) {
+		events = epoll_put_uevent(revents, epi->event.data, events);
+		if (!events) {
 			list_add(&epi->rdllink, &txlist);
 			ep_pm_stay_awake(epi);
 			if (!res)
@@ -1693,7 +1693,6 @@ static int ep_send_events(struct eventpoll *ep,
 			break;
 		}
 		res++;
-		events++;
 		if (epi->event.events & EPOLLONESHOT)
 			epi->event.events &= EP_PRIVATE_BITS;
 		else if (!(epi->event.events & EPOLLET)) {
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 593322c946e6..3337745d81bd 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -68,4 +68,22 @@ static inline void eventpoll_release(struct file *file) {}
 
 #endif
 
+#if defined(CONFIG_ARM) && defined(CONFIG_OABI_COMPAT)
+/* ARM OABI has an incompatible struct layout and needs a special handler */
+extern struct epoll_event __user *
+epoll_put_uevent(__poll_t revents, __u64 data,
+		 struct epoll_event __user *uevent);
+#else
+static inline struct epoll_event __user *
+epoll_put_uevent(__poll_t revents, __u64 data,
+		 struct epoll_event __user *uevent)
+{
+	if (__put_user(revents, &uevent->events) ||
+	    __put_user(data, &uevent->data))
+		return NULL;
+
+	return uevent+1;
+}
+#endif
+
 #endif /* #ifndef _LINUX_EVENTPOLL_H */
-- 
2.29.2



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5 06/10] ARM: oabi-compat: rework sys_semtimedop emulation
  2021-07-26 14:11 [PATCH v5 00/10] ARM: remove set_fs callers and implementation Arnd Bergmann
                   ` (4 preceding siblings ...)
  2021-07-26 14:11 ` [PATCH v5 05/10] ARM: oabi-compat: rework epoll_wait/epoll_pwait emulation Arnd Bergmann
@ 2021-07-26 14:11 ` Arnd Bergmann
  2021-07-26 14:11 ` [PATCH v5 07/10] ARM: oabi-compat: rework fcntl64() emulation Arnd Bergmann
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Arnd Bergmann @ 2021-07-26 14:11 UTC (permalink / raw)
  To: Russell King
  Cc: Arnd Bergmann, linux-kernel, linux-arm-kernel, linux-arch,
	linux-mm, Alexander Viro, Linus Walleij

From: Arnd Bergmann <arnd@arndb.de>

sys_oabi_semtimedop() is one of the last users of set_fs() on Arm. To
remove this one, expose the internal code of the actual implementation
that operates on a kernel pointer and call it directly after copying.

There should be no measurable impact on the normal execution of this
function, and it makes the overly long function a little shorter, which
may help readability.

While reworking the oabi version, make it behave a little more like
the native one, using kvmalloc_array() and restructure the code
flow in a similar way.

The naming of __do_semtimedop() is not very good, I hope someone can
come up with a better name.

One regression was spotted by kernel test robot <rong.a.chen@intel.com>
and fixed before the first mailing list submission.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/kernel/sys_oabi-compat.c | 60 ++++++++++++++++------
 include/linux/syscalls.h          |  3 ++
 ipc/sem.c                         | 84 +++++++++++++++++++------------
 3 files changed, 99 insertions(+), 48 deletions(-)

diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index 1f6a433200f1..5ea365c35ca5 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -80,6 +80,7 @@
 #include <linux/socket.h>
 #include <linux/net.h>
 #include <linux/ipc.h>
+#include <linux/ipc_namespace.h>
 #include <linux/uaccess.h>
 #include <linux/slab.h>
 
@@ -302,46 +303,52 @@ struct oabi_sembuf {
 	unsigned short	__pad;
 };
 
+#define sc_semopm     sem_ctls[2]
+
+#ifdef CONFIG_SYSVIPC
 asmlinkage long sys_oabi_semtimedop(int semid,
 				    struct oabi_sembuf __user *tsops,
 				    unsigned nsops,
 				    const struct old_timespec32 __user *timeout)
 {
+	struct ipc_namespace *ns;
 	struct sembuf *sops;
-	struct old_timespec32 local_timeout;
 	long err;
 	int i;
 
+	ns = current->nsproxy->ipc_ns;
+	if (nsops > ns->sc_semopm)
+		return -E2BIG;
 	if (nsops < 1 || nsops > SEMOPM)
 		return -EINVAL;
-	if (!access_ok(tsops, sizeof(*tsops) * nsops))
-		return -EFAULT;
-	sops = kmalloc_array(nsops, sizeof(*sops), GFP_KERNEL);
+	sops = kvmalloc_array(nsops, sizeof(*sops), GFP_KERNEL);
 	if (!sops)
 		return -ENOMEM;
 	err = 0;
 	for (i = 0; i < nsops; i++) {
 		struct oabi_sembuf osb;
-		err |= __copy_from_user(&osb, tsops, sizeof(osb));
+		err |= copy_from_user(&osb, tsops, sizeof(osb));
 		sops[i].sem_num = osb.sem_num;
 		sops[i].sem_op = osb.sem_op;
 		sops[i].sem_flg = osb.sem_flg;
 		tsops++;
 	}
-	if (timeout) {
-		/* copy this as well before changing domain protection */
-		err |= copy_from_user(&local_timeout, timeout, sizeof(*timeout));
-		timeout = &local_timeout;
-	}
 	if (err) {
 		err = -EFAULT;
-	} else {
-		mm_segment_t fs = get_fs();
-		set_fs(KERNEL_DS);
-		err = sys_semtimedop_time32(semid, sops, nsops, timeout);
-		set_fs(fs);
+		goto out;
+	}
+
+	if (timeout) {
+		struct timespec64 ts;
+		err = get_old_timespec32(&ts, timeout);
+		if (err)
+			goto out;
+		err = __do_semtimedop(semid, sops, nsops, &ts, ns);
+		goto out;
 	}
-	kfree(sops);
+	err = __do_semtimedop(semid, sops, nsops, NULL, ns);
+out:
+	kvfree(sops);
 	return err;
 }
 
@@ -368,6 +375,27 @@ asmlinkage int sys_oabi_ipc(uint call, int first, int second, int third,
 		return sys_ipc(call, first, second, third, ptr, fifth);
 	}
 }
+#else
+asmlinkage long sys_oabi_semtimedop(int semid,
+				    struct oabi_sembuf __user *tsops,
+				    unsigned nsops,
+				    const struct old_timespec32 __user *timeout)
+{
+	return -ENOSYS;
+}
+
+asmlinkage long sys_oabi_semop(int semid, struct oabi_sembuf __user *tsops,
+			       unsigned nsops)
+{
+	return -ENOSYS;
+}
+
+asmlinkage int sys_oabi_ipc(uint call, int first, int second, int third,
+			    void __user *ptr, long fifth)
+{
+	return -ENOSYS;
+}
+#endif
 
 asmlinkage long sys_oabi_bind(int fd, struct sockaddr __user *addr, int addrlen)
 {
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 69c9a7010081..6c6fc3fd5b72 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1373,6 +1373,9 @@ long ksys_old_shmctl(int shmid, int cmd, struct shmid_ds __user *buf);
 long compat_ksys_semtimedop(int semid, struct sembuf __user *tsems,
 			    unsigned int nsops,
 			    const struct old_timespec32 __user *timeout);
+long __do_semtimedop(int semid, struct sembuf *tsems, unsigned int nsops,
+		     const struct timespec64 *timeout,
+		     struct ipc_namespace *ns);
 
 int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
 		int __user *optlen);
diff --git a/ipc/sem.c b/ipc/sem.c
index 971e75d28364..ae8d9104b0a0 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -1984,46 +1984,34 @@ static struct sem_undo *find_alloc_undo(struct ipc_namespace *ns, int semid)
 	return un;
 }
 
-static long do_semtimedop(int semid, struct sembuf __user *tsops,
-		unsigned nsops, const struct timespec64 *timeout)
+long __do_semtimedop(int semid, struct sembuf *sops,
+		unsigned nsops, const struct timespec64 *timeout,
+		struct ipc_namespace *ns)
 {
 	int error = -EINVAL;
 	struct sem_array *sma;
-	struct sembuf fast_sops[SEMOPM_FAST];
-	struct sembuf *sops = fast_sops, *sop;
+	struct sembuf *sop;
 	struct sem_undo *un;
 	int max, locknum;
 	bool undos = false, alter = false, dupsop = false;
 	struct sem_queue queue;
 	unsigned long dup = 0, jiffies_left = 0;
-	struct ipc_namespace *ns;
-
-	ns = current->nsproxy->ipc_ns;
 
 	if (nsops < 1 || semid < 0)
 		return -EINVAL;
 	if (nsops > ns->sc_semopm)
 		return -E2BIG;
-	if (nsops > SEMOPM_FAST) {
-		sops = kvmalloc_array(nsops, sizeof(*sops), GFP_KERNEL);
-		if (sops == NULL)
-			return -ENOMEM;
-	}
-
-	if (copy_from_user(sops, tsops, nsops * sizeof(*tsops))) {
-		error =  -EFAULT;
-		goto out_free;
-	}
 
 	if (timeout) {
 		if (timeout->tv_sec < 0 || timeout->tv_nsec < 0 ||
 			timeout->tv_nsec >= 1000000000L) {
 			error = -EINVAL;
-			goto out_free;
+			goto out;
 		}
 		jiffies_left = timespec64_to_jiffies(timeout);
 	}
 
+
 	max = 0;
 	for (sop = sops; sop < sops + nsops; sop++) {
 		unsigned long mask = 1ULL << ((sop->sem_num) % BITS_PER_LONG);
@@ -2052,7 +2040,7 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
 		un = find_alloc_undo(ns, semid);
 		if (IS_ERR(un)) {
 			error = PTR_ERR(un);
-			goto out_free;
+			goto out;
 		}
 	} else {
 		un = NULL;
@@ -2063,25 +2051,25 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
 	if (IS_ERR(sma)) {
 		rcu_read_unlock();
 		error = PTR_ERR(sma);
-		goto out_free;
+		goto out;
 	}
 
 	error = -EFBIG;
 	if (max >= sma->sem_nsems) {
 		rcu_read_unlock();
-		goto out_free;
+		goto out;
 	}
 
 	error = -EACCES;
 	if (ipcperms(ns, &sma->sem_perm, alter ? S_IWUGO : S_IRUGO)) {
 		rcu_read_unlock();
-		goto out_free;
+		goto out;
 	}
 
 	error = security_sem_semop(&sma->sem_perm, sops, nsops, alter);
 	if (error) {
 		rcu_read_unlock();
-		goto out_free;
+		goto out;
 	}
 
 	error = -EIDRM;
@@ -2095,7 +2083,7 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
 	 * entangled here and why it's RMID race safe on comments at sem_lock()
 	 */
 	if (!ipc_valid_object(&sma->sem_perm))
-		goto out_unlock_free;
+		goto out_unlock;
 	/*
 	 * semid identifiers are not unique - find_alloc_undo may have
 	 * allocated an undo structure, it was invalidated by an RMID
@@ -2104,7 +2092,7 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
 	 * "un" itself is guaranteed by rcu.
 	 */
 	if (un && un->semid == -1)
-		goto out_unlock_free;
+		goto out_unlock;
 
 	queue.sops = sops;
 	queue.nsops = nsops;
@@ -2130,10 +2118,10 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
 		rcu_read_unlock();
 		wake_up_q(&wake_q);
 
-		goto out_free;
+		goto out;
 	}
 	if (error < 0) /* non-blocking error path */
-		goto out_unlock_free;
+		goto out_unlock;
 
 	/*
 	 * We need to sleep on this operation, so we put the current
@@ -2198,14 +2186,14 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
 		if (error != -EINTR) {
 			/* see SEM_BARRIER_2 for purpose/pairing */
 			smp_acquire__after_ctrl_dep();
-			goto out_free;
+			goto out;
 		}
 
 		rcu_read_lock();
 		locknum = sem_lock(sma, sops, nsops);
 
 		if (!ipc_valid_object(&sma->sem_perm))
-			goto out_unlock_free;
+			goto out_unlock;
 
 		/*
 		 * No necessity for any barrier: We are protect by sem_lock()
@@ -2217,7 +2205,7 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
 		 * Leave without unlink_queue(), but with sem_unlock().
 		 */
 		if (error != -EINTR)
-			goto out_unlock_free;
+			goto out_unlock;
 
 		/*
 		 * If an interrupt occurred we have to clean up the queue.
@@ -2228,13 +2216,45 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
 
 	unlink_queue(sma, &queue);
 
-out_unlock_free:
+out_unlock:
 	sem_unlock(sma, locknum);
 	rcu_read_unlock();
+out:
+	return error;
+}
+
+static long do_semtimedop(int semid, struct sembuf __user *tsops,
+		unsigned nsops, const struct timespec64 *timeout)
+{
+	struct sembuf fast_sops[SEMOPM_FAST];
+	struct sembuf *sops = fast_sops;
+	struct ipc_namespace *ns;
+	int ret;
+
+	ns = current->nsproxy->ipc_ns;
+	if (nsops > ns->sc_semopm)
+		return -E2BIG;
+	if (nsops < 1)
+		return -EINVAL;
+
+	if (nsops > SEMOPM_FAST) {
+		sops = kvmalloc_array(nsops, sizeof(*sops), GFP_KERNEL);
+		if (sops == NULL)
+			return -ENOMEM;
+	}
+
+	if (copy_from_user(sops, tsops, nsops * sizeof(*tsops))) {
+		ret =  -EFAULT;
+		goto out_free;
+	}
+
+	ret = __do_semtimedop(semid, sops, nsops, timeout, ns);
+
 out_free:
 	if (sops != fast_sops)
 		kvfree(sops);
-	return error;
+
+	return ret;
 }
 
 long ksys_semtimedop(int semid, struct sembuf __user *tsops,
-- 
2.29.2



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5 07/10] ARM: oabi-compat: rework fcntl64() emulation
  2021-07-26 14:11 [PATCH v5 00/10] ARM: remove set_fs callers and implementation Arnd Bergmann
                   ` (5 preceding siblings ...)
  2021-07-26 14:11 ` [PATCH v5 06/10] ARM: oabi-compat: rework sys_semtimedop emulation Arnd Bergmann
@ 2021-07-26 14:11 ` Arnd Bergmann
  2021-07-26 14:11 ` [PATCH v5 08/10] ARM: uaccess: add __{get,put}_kernel_nofault Arnd Bergmann
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Arnd Bergmann @ 2021-07-26 14:11 UTC (permalink / raw)
  To: Russell King
  Cc: Arnd Bergmann, linux-kernel, linux-arm-kernel, linux-arch,
	linux-mm, Alexander Viro, Linus Walleij

From: Arnd Bergmann <arnd@arndb.de>

This is one of the last users of get_fs(), and this is fairly easy to
change, since the infrastructure for it is already there.

The replacement here is essentially a copy of the existing fcntl64()
syscall entry function.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/kernel/sys_oabi-compat.c | 93 ++++++++++++++++++++-----------
 1 file changed, 60 insertions(+), 33 deletions(-)

diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index 5ea365c35ca5..223ee46b6e75 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -194,56 +194,83 @@ struct oabi_flock64 {
 	pid_t	l_pid;
 } __attribute__ ((packed,aligned(4)));
 
-static long do_locks(unsigned int fd, unsigned int cmd,
-				 unsigned long arg)
+static int get_oabi_flock(struct flock64 *kernel, struct oabi_flock64 __user *arg)
 {
-	struct flock64 kernel;
 	struct oabi_flock64 user;
-	mm_segment_t fs;
-	long ret;
 
 	if (copy_from_user(&user, (struct oabi_flock64 __user *)arg,
 			   sizeof(user)))
 		return -EFAULT;
-	kernel.l_type	= user.l_type;
-	kernel.l_whence	= user.l_whence;
-	kernel.l_start	= user.l_start;
-	kernel.l_len	= user.l_len;
-	kernel.l_pid	= user.l_pid;
-
-	fs = get_fs();
-	set_fs(KERNEL_DS);
-	ret = sys_fcntl64(fd, cmd, (unsigned long)&kernel);
-	set_fs(fs);
-
-	if (!ret && (cmd == F_GETLK64 || cmd == F_OFD_GETLK)) {
-		user.l_type	= kernel.l_type;
-		user.l_whence	= kernel.l_whence;
-		user.l_start	= kernel.l_start;
-		user.l_len	= kernel.l_len;
-		user.l_pid	= kernel.l_pid;
-		if (copy_to_user((struct oabi_flock64 __user *)arg,
-				 &user, sizeof(user)))
-			ret = -EFAULT;
-	}
-	return ret;
+
+	kernel->l_type	 = user.l_type;
+	kernel->l_whence = user.l_whence;
+	kernel->l_start	 = user.l_start;
+	kernel->l_len	 = user.l_len;
+	kernel->l_pid	 = user.l_pid;
+
+	return 0;
+}
+
+static int put_oabi_flock(struct flock64 *kernel, struct oabi_flock64 __user *arg)
+{
+	struct oabi_flock64 user;
+
+	user.l_type	= kernel->l_type;
+	user.l_whence	= kernel->l_whence;
+	user.l_start	= kernel->l_start;
+	user.l_len	= kernel->l_len;
+	user.l_pid	= kernel->l_pid;
+
+	if (copy_to_user((struct oabi_flock64 __user *)arg,
+			 &user, sizeof(user)))
+		return -EFAULT;
+
+	return 0;
 }
 
 asmlinkage long sys_oabi_fcntl64(unsigned int fd, unsigned int cmd,
 				 unsigned long arg)
 {
+	void __user *argp = (void __user *)arg;
+	struct fd f = fdget_raw(fd);
+	struct flock64 flock;
+	long err = -EBADF;
+
+	if (!f.file)
+		goto out;
+
 	switch (cmd) {
-	case F_OFD_GETLK:
-	case F_OFD_SETLK:
-	case F_OFD_SETLKW:
 	case F_GETLK64:
+	case F_OFD_GETLK:
+		err = security_file_fcntl(f.file, cmd, arg);
+		if (err)
+			break;
+		err = get_oabi_flock(&flock, argp);
+		if (err)
+			break;
+		err = fcntl_getlk64(f.file, cmd, &flock);
+		if (!err)
+		       err = put_oabi_flock(&flock, argp);
+		break;
 	case F_SETLK64:
 	case F_SETLKW64:
-		return do_locks(fd, cmd, arg);
-
+	case F_OFD_SETLK:
+	case F_OFD_SETLKW:
+		err = security_file_fcntl(f.file, cmd, arg);
+		if (err)
+			break;
+		err = get_oabi_flock(&flock, argp);
+		if (err)
+			break;
+		err = fcntl_setlk64(fd, f.file, cmd, &flock);
+		break;
 	default:
-		return sys_fcntl64(fd, cmd, arg);
+		err = sys_fcntl64(fd, cmd, arg);
+		break;
 	}
+	fdput(f);
+out:
+	return err;
 }
 
 struct oabi_epoll_event {
-- 
2.29.2



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5 08/10] ARM: uaccess: add __{get,put}_kernel_nofault
  2021-07-26 14:11 [PATCH v5 00/10] ARM: remove set_fs callers and implementation Arnd Bergmann
                   ` (6 preceding siblings ...)
  2021-07-26 14:11 ` [PATCH v5 07/10] ARM: oabi-compat: rework fcntl64() emulation Arnd Bergmann
@ 2021-07-26 14:11 ` Arnd Bergmann
  2022-01-12 17:29   ` Daniel Thompson
  2021-07-26 14:11 ` [PATCH v5 09/10] ARM: uaccess: remove set_fs() implementation Arnd Bergmann
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 17+ messages in thread
From: Arnd Bergmann @ 2021-07-26 14:11 UTC (permalink / raw)
  To: Russell King
  Cc: Arnd Bergmann, linux-kernel, linux-arm-kernel, linux-arch,
	linux-mm, Alexander Viro, Linus Walleij

From: Arnd Bergmann <arnd@arndb.de>

These mimic the behavior of get_user and put_user, except
for domain switching, address limit checking and handling
of mismatched sizes, none of which are relevant here.

To work with pre-Armv6 kernels, this has to avoid TUSER()
inside of the new macros, the new approach passes the "t"
string along with the opcode, which is a bit uglier but
avoids duplicating more code.

As there is no __get_user_asm_dword(), I work around it
by copying 32 bit at a time, which is possible because
the output size is known.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/include/asm/uaccess.h | 123 ++++++++++++++++++++++-----------
 1 file changed, 83 insertions(+), 40 deletions(-)

diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
index a13d90206472..4f60638755c4 100644
--- a/arch/arm/include/asm/uaccess.h
+++ b/arch/arm/include/asm/uaccess.h
@@ -308,11 +308,11 @@ static inline void set_fs(mm_segment_t fs)
 #define __get_user(x, ptr)						\
 ({									\
 	long __gu_err = 0;						\
-	__get_user_err((x), (ptr), __gu_err);				\
+	__get_user_err((x), (ptr), __gu_err, TUSER());			\
 	__gu_err;							\
 })
 
-#define __get_user_err(x, ptr, err)					\
+#define __get_user_err(x, ptr, err, __t)				\
 do {									\
 	unsigned long __gu_addr = (unsigned long)(ptr);			\
 	unsigned long __gu_val;						\
@@ -321,18 +321,19 @@ do {									\
 	might_fault();							\
 	__ua_flags = uaccess_save_and_enable();				\
 	switch (sizeof(*(ptr))) {					\
-	case 1:	__get_user_asm_byte(__gu_val, __gu_addr, err);	break;	\
-	case 2:	__get_user_asm_half(__gu_val, __gu_addr, err);	break;	\
-	case 4:	__get_user_asm_word(__gu_val, __gu_addr, err);	break;	\
+	case 1:	__get_user_asm_byte(__gu_val, __gu_addr, err, __t); break;	\
+	case 2:	__get_user_asm_half(__gu_val, __gu_addr, err, __t); break;	\
+	case 4:	__get_user_asm_word(__gu_val, __gu_addr, err, __t); break;	\
 	default: (__gu_val) = __get_user_bad();				\
 	}								\
 	uaccess_restore(__ua_flags);					\
 	(x) = (__typeof__(*(ptr)))__gu_val;				\
 } while (0)
+#endif
 
 #define __get_user_asm(x, addr, err, instr)			\
 	__asm__ __volatile__(					\
-	"1:	" TUSER(instr) " %1, [%2], #0\n"		\
+	"1:	" instr " %1, [%2], #0\n"			\
 	"2:\n"							\
 	"	.pushsection .text.fixup,\"ax\"\n"		\
 	"	.align	2\n"					\
@@ -348,40 +349,38 @@ do {									\
 	: "r" (addr), "i" (-EFAULT)				\
 	: "cc")
 
-#define __get_user_asm_byte(x, addr, err)			\
-	__get_user_asm(x, addr, err, ldrb)
+#define __get_user_asm_byte(x, addr, err, __t)			\
+	__get_user_asm(x, addr, err, "ldrb" __t)
 
 #if __LINUX_ARM_ARCH__ >= 6
 
-#define __get_user_asm_half(x, addr, err)			\
-	__get_user_asm(x, addr, err, ldrh)
+#define __get_user_asm_half(x, addr, err, __t)			\
+	__get_user_asm(x, addr, err, "ldrh" __t)
 
 #else
 
 #ifndef __ARMEB__
-#define __get_user_asm_half(x, __gu_addr, err)			\
+#define __get_user_asm_half(x, __gu_addr, err, __t)		\
 ({								\
 	unsigned long __b1, __b2;				\
-	__get_user_asm_byte(__b1, __gu_addr, err);		\
-	__get_user_asm_byte(__b2, __gu_addr + 1, err);		\
+	__get_user_asm_byte(__b1, __gu_addr, err, __t);		\
+	__get_user_asm_byte(__b2, __gu_addr + 1, err, __t);	\
 	(x) = __b1 | (__b2 << 8);				\
 })
 #else
-#define __get_user_asm_half(x, __gu_addr, err)			\
+#define __get_user_asm_half(x, __gu_addr, err, __t)		\
 ({								\
 	unsigned long __b1, __b2;				\
-	__get_user_asm_byte(__b1, __gu_addr, err);		\
-	__get_user_asm_byte(__b2, __gu_addr + 1, err);		\
+	__get_user_asm_byte(__b1, __gu_addr, err, __t);		\
+	__get_user_asm_byte(__b2, __gu_addr + 1, err, __t);	\
 	(x) = (__b1 << 8) | __b2;				\
 })
 #endif
 
 #endif /* __LINUX_ARM_ARCH__ >= 6 */
 
-#define __get_user_asm_word(x, addr, err)			\
-	__get_user_asm(x, addr, err, ldr)
-#endif
-
+#define __get_user_asm_word(x, addr, err, __t)			\
+	__get_user_asm(x, addr, err, "ldr" __t)
 
 #define __put_user_switch(x, ptr, __err, __fn)				\
 	do {								\
@@ -425,7 +424,7 @@ do {									\
 #define __put_user_nocheck(x, __pu_ptr, __err, __size)			\
 	do {								\
 		unsigned long __pu_addr = (unsigned long)__pu_ptr;	\
-		__put_user_nocheck_##__size(x, __pu_addr, __err);	\
+		__put_user_nocheck_##__size(x, __pu_addr, __err, TUSER());\
 	} while (0)
 
 #define __put_user_nocheck_1 __put_user_asm_byte
@@ -433,9 +432,11 @@ do {									\
 #define __put_user_nocheck_4 __put_user_asm_word
 #define __put_user_nocheck_8 __put_user_asm_dword
 
+#endif /* !CONFIG_CPU_SPECTRE */
+
 #define __put_user_asm(x, __pu_addr, err, instr)		\
 	__asm__ __volatile__(					\
-	"1:	" TUSER(instr) " %1, [%2], #0\n"		\
+	"1:	" instr " %1, [%2], #0\n"		\
 	"2:\n"							\
 	"	.pushsection .text.fixup,\"ax\"\n"		\
 	"	.align	2\n"					\
@@ -450,36 +451,36 @@ do {									\
 	: "r" (x), "r" (__pu_addr), "i" (-EFAULT)		\
 	: "cc")
 
-#define __put_user_asm_byte(x, __pu_addr, err)			\
-	__put_user_asm(x, __pu_addr, err, strb)
+#define __put_user_asm_byte(x, __pu_addr, err, __t)		\
+	__put_user_asm(x, __pu_addr, err, "strb" __t)
 
 #if __LINUX_ARM_ARCH__ >= 6
 
-#define __put_user_asm_half(x, __pu_addr, err)			\
-	__put_user_asm(x, __pu_addr, err, strh)
+#define __put_user_asm_half(x, __pu_addr, err, __t)		\
+	__put_user_asm(x, __pu_addr, err, "strh" __t)
 
 #else
 
 #ifndef __ARMEB__
-#define __put_user_asm_half(x, __pu_addr, err)			\
+#define __put_user_asm_half(x, __pu_addr, err, __t)		\
 ({								\
 	unsigned long __temp = (__force unsigned long)(x);	\
-	__put_user_asm_byte(__temp, __pu_addr, err);		\
-	__put_user_asm_byte(__temp >> 8, __pu_addr + 1, err);	\
+	__put_user_asm_byte(__temp, __pu_addr, err, __t);	\
+	__put_user_asm_byte(__temp >> 8, __pu_addr + 1, err, __t);\
 })
 #else
-#define __put_user_asm_half(x, __pu_addr, err)			\
+#define __put_user_asm_half(x, __pu_addr, err, __t)		\
 ({								\
 	unsigned long __temp = (__force unsigned long)(x);	\
-	__put_user_asm_byte(__temp >> 8, __pu_addr, err);	\
-	__put_user_asm_byte(__temp, __pu_addr + 1, err);	\
+	__put_user_asm_byte(__temp >> 8, __pu_addr, err, __t);	\
+	__put_user_asm_byte(__temp, __pu_addr + 1, err, __t);	\
 })
 #endif
 
 #endif /* __LINUX_ARM_ARCH__ >= 6 */
 
-#define __put_user_asm_word(x, __pu_addr, err)			\
-	__put_user_asm(x, __pu_addr, err, str)
+#define __put_user_asm_word(x, __pu_addr, err, __t)		\
+	__put_user_asm(x, __pu_addr, err, "str" __t)
 
 #ifndef __ARMEB__
 #define	__reg_oper0	"%R2"
@@ -489,12 +490,12 @@ do {									\
 #define	__reg_oper1	"%R2"
 #endif
 
-#define __put_user_asm_dword(x, __pu_addr, err)			\
+#define __put_user_asm_dword(x, __pu_addr, err, __t)		\
 	__asm__ __volatile__(					\
- ARM(	"1:	" TUSER(str) "	" __reg_oper1 ", [%1], #4\n"	) \
- ARM(	"2:	" TUSER(str) "	" __reg_oper0 ", [%1]\n"	) \
- THUMB(	"1:	" TUSER(str) "	" __reg_oper1 ", [%1]\n"	) \
- THUMB(	"2:	" TUSER(str) "	" __reg_oper0 ", [%1, #4]\n"	) \
+ ARM(	"1:	str" __t "	" __reg_oper1 ", [%1], #4\n"  ) \
+ ARM(	"2:	str" __t "	" __reg_oper0 ", [%1]\n"      ) \
+ THUMB(	"1:	str" __t "	" __reg_oper1 ", [%1]\n"      ) \
+ THUMB(	"2:	str" __t "	" __reg_oper0 ", [%1, #4]\n"  ) \
 	"3:\n"							\
 	"	.pushsection .text.fixup,\"ax\"\n"		\
 	"	.align	2\n"					\
@@ -510,7 +511,49 @@ do {									\
 	: "r" (x), "i" (-EFAULT)				\
 	: "cc")
 
-#endif /* !CONFIG_CPU_SPECTRE */
+#define HAVE_GET_KERNEL_NOFAULT
+
+#define __get_kernel_nofault(dst, src, type, err_label)			\
+do {									\
+	const type *__pk_ptr = (src);					\
+	unsigned long __src = (unsigned long)(__pk_ptr);		\
+	type __val;							\
+	int __err = 0;							\
+	switch (sizeof(type)) {						\
+	case 1:	__get_user_asm_byte(__val, __src, __err, ""); break;	\
+	case 2: __get_user_asm_half(__val, __src, __err, ""); break;	\
+	case 4: __get_user_asm_word(__val, __src, __err, ""); break;	\
+	case 8: {							\
+		u32 *__v32 = (u32*)&__val;				\
+		__get_user_asm_word(__v32[0], __src, __err, "");	\
+		if (__err)						\
+			break;						\
+		__get_user_asm_word(__v32[1], __src+4, __err, "");	\
+		break;							\
+	}								\
+	default: __err = __get_user_bad(); break;			\
+	}								\
+	*(type *)(dst) = __val;						\
+	if (__err)							\
+		goto err_label;						\
+} while (0)
+
+#define __put_kernel_nofault(dst, src, type, err_label)			\
+do {									\
+	const type *__pk_ptr = (dst);					\
+	unsigned long __dst = (unsigned long)__pk_ptr;			\
+	int __err = 0;							\
+	type __val = *(type *)src;					\
+	switch (sizeof(type)) {						\
+	case 1: __put_user_asm_byte(__val, __dst, __err, ""); break;	\
+	case 2:	__put_user_asm_half(__val, __dst, __err, ""); break;	\
+	case 4:	__put_user_asm_word(__val, __dst, __err, ""); break;	\
+	case 8:	__put_user_asm_dword(__val, __dst, __err, ""); break;	\
+	default: __err = __put_user_bad(); break;			\
+	}								\
+	if (__err)							\
+		goto err_label;						\
+} while (0)
 
 #ifdef CONFIG_MMU
 extern unsigned long __must_check
-- 
2.29.2



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5 09/10] ARM: uaccess: remove set_fs() implementation
  2021-07-26 14:11 [PATCH v5 00/10] ARM: remove set_fs callers and implementation Arnd Bergmann
                   ` (7 preceding siblings ...)
  2021-07-26 14:11 ` [PATCH v5 08/10] ARM: uaccess: add __{get,put}_kernel_nofault Arnd Bergmann
@ 2021-07-26 14:11 ` Arnd Bergmann
  2021-07-26 14:11 ` [PATCH v5 10/10] ARM: oabi-compat: fix oabi epoll sparse warning Arnd Bergmann
  2021-08-11  6:39 ` [PATCH v5 00/10] ARM: remove set_fs callers and implementation Christoph Hellwig
  10 siblings, 0 replies; 17+ messages in thread
From: Arnd Bergmann @ 2021-07-26 14:11 UTC (permalink / raw)
  To: Russell King
  Cc: Arnd Bergmann, linux-kernel, linux-arm-kernel, linux-arch,
	linux-mm, Alexander Viro, Linus Walleij

From: Arnd Bergmann <arnd@arndb.de>

There are no remaining callers of set_fs(), so just remove it
along with all associated code that operates on
thread_info->addr_limit.

There are still further optimizations that can be done:

- In get_user(), the address check could be moved entirely
  into the out of line code, rather than passing a constant
  as an argument,

- I assume the DACR handling can be simplified as we now
  only change it during user access when CONFIG_CPU_SW_DOMAIN_PAN
  is set, but not during set_fs().

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/Kconfig                   |  1 -
 arch/arm/include/asm/ptrace.h      |  1 -
 arch/arm/include/asm/thread_info.h |  4 ---
 arch/arm/include/asm/uaccess-asm.h |  6 ----
 arch/arm/include/asm/uaccess.h     | 46 +++---------------------------
 arch/arm/kernel/asm-offsets.c      |  2 --
 arch/arm/kernel/entry-common.S     | 12 --------
 arch/arm/kernel/process.c          |  7 +----
 arch/arm/kernel/signal.c           |  8 ------
 arch/arm/lib/copy_from_user.S      |  3 +-
 arch/arm/lib/copy_to_user.S        |  3 +-
 11 files changed, 7 insertions(+), 86 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 82f908fa5676..40a3a96b3d4e 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -126,7 +126,6 @@ config ARM
 	select PCI_SYSCALL if PCI
 	select PERF_USE_VMALLOC
 	select RTC_LIB
-	select SET_FS
 	select SYS_SUPPORTS_APM_EMULATION
 	# Above selects are sorted alphabetically; please add new ones
 	# according to that.  Thanks.
diff --git a/arch/arm/include/asm/ptrace.h b/arch/arm/include/asm/ptrace.h
index 91d6b7856be4..93051e2f402c 100644
--- a/arch/arm/include/asm/ptrace.h
+++ b/arch/arm/include/asm/ptrace.h
@@ -19,7 +19,6 @@ struct pt_regs {
 struct svc_pt_regs {
 	struct pt_regs regs;
 	u32 dacr;
-	u32 addr_limit;
 };
 
 #define to_svc_pt_regs(r) container_of(r, struct svc_pt_regs, regs)
diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index 17c56051747b..d89931aed59f 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -31,8 +31,6 @@ struct task_struct;
 
 #include <asm/types.h>
 
-typedef unsigned long mm_segment_t;
-
 struct cpu_context_save {
 	__u32	r4;
 	__u32	r5;
@@ -54,7 +52,6 @@ struct cpu_context_save {
 struct thread_info {
 	unsigned long		flags;		/* low level flags */
 	int			preempt_count;	/* 0 => preemptable, <0 => bug */
-	mm_segment_t		addr_limit;	/* address limit */
 	struct task_struct	*task;		/* main task structure */
 	__u32			cpu;		/* cpu */
 	__u32			cpu_domain;	/* cpu domain */
@@ -80,7 +77,6 @@ struct thread_info {
 	.task		= &tsk,						\
 	.flags		= 0,						\
 	.preempt_count	= INIT_PREEMPT_COUNT,				\
-	.addr_limit	= KERNEL_DS,					\
 }
 
 /*
diff --git a/arch/arm/include/asm/uaccess-asm.h b/arch/arm/include/asm/uaccess-asm.h
index e6eb7a2aaf1e..6451a433912c 100644
--- a/arch/arm/include/asm/uaccess-asm.h
+++ b/arch/arm/include/asm/uaccess-asm.h
@@ -84,12 +84,8 @@
 	 * if \disable is set.
 	 */
 	.macro	uaccess_entry, tsk, tmp0, tmp1, tmp2, disable
-	ldr	\tmp1, [\tsk, #TI_ADDR_LIMIT]
-	ldr	\tmp2, =TASK_SIZE
-	str	\tmp2, [\tsk, #TI_ADDR_LIMIT]
  DACR(	mrc	p15, 0, \tmp0, c3, c0, 0)
  DACR(	str	\tmp0, [sp, #SVC_DACR])
-	str	\tmp1, [sp, #SVC_ADDR_LIMIT]
 	.if \disable && IS_ENABLED(CONFIG_CPU_SW_DOMAIN_PAN)
 	/* kernel=client, user=no access */
 	mov	\tmp2, #DACR_UACCESS_DISABLE
@@ -106,9 +102,7 @@
 
 	/* Restore the user access state previously saved by uaccess_entry */
 	.macro	uaccess_exit, tsk, tmp0, tmp1
-	ldr	\tmp1, [sp, #SVC_ADDR_LIMIT]
  DACR(	ldr	\tmp0, [sp, #SVC_DACR])
-	str	\tmp1, [\tsk, #TI_ADDR_LIMIT]
  DACR(	mcr	p15, 0, \tmp0, c3, c0, 0)
 	.endm
 
diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
index 4f60638755c4..084d1c07c2d0 100644
--- a/arch/arm/include/asm/uaccess.h
+++ b/arch/arm/include/asm/uaccess.h
@@ -52,32 +52,8 @@ static __always_inline void uaccess_restore(unsigned int flags)
 extern int __get_user_bad(void);
 extern int __put_user_bad(void);
 
-/*
- * Note that this is actually 0x1,0000,0000
- */
-#define KERNEL_DS	0x00000000
-
 #ifdef CONFIG_MMU
 
-#define USER_DS		TASK_SIZE
-#define get_fs()	(current_thread_info()->addr_limit)
-
-static inline void set_fs(mm_segment_t fs)
-{
-	current_thread_info()->addr_limit = fs;
-
-	/*
-	 * Prevent a mispredicted conditional call to set_fs from forwarding
-	 * the wrong address limit to access_ok under speculation.
-	 */
-	dsb(nsh);
-	isb();
-
-	modify_domain(DOMAIN_KERNEL, fs ? DOMAIN_CLIENT : DOMAIN_MANAGER);
-}
-
-#define uaccess_kernel()	(get_fs() == KERNEL_DS)
-
 /*
  * We use 33-bit arithmetic here.  Success returns zero, failure returns
  * addr_limit.  We take advantage that addr_limit will be zero for KERNEL_DS,
@@ -89,7 +65,7 @@ static inline void set_fs(mm_segment_t fs)
 	__asm__(".syntax unified\n" \
 		"adds %1, %2, %3; sbcscc %1, %1, %0; movcc %0, #0" \
 		: "=&r" (flag), "=&r" (roksum) \
-		: "r" (addr), "Ir" (size), "0" (current_thread_info()->addr_limit) \
+		: "r" (addr), "Ir" (size), "0" (TASK_SIZE) \
 		: "cc"); \
 	flag; })
 
@@ -120,7 +96,7 @@ static inline void __user *__uaccess_mask_range_ptr(const void __user *ptr,
 	"	subshs	%1, %1, %2\n"
 	"	movlo	%0, #0\n"
 	: "+r" (safe_ptr), "=&r" (tmp)
-	: "r" (size), "r" (current_thread_info()->addr_limit)
+	: "r" (size), "r" (TASK_SIZE)
 	: "cc");
 
 	csdb();
@@ -194,7 +170,7 @@ extern int __get_user_64t_4(void *);
 
 #define __get_user_check(x, p)						\
 	({								\
-		unsigned long __limit = current_thread_info()->addr_limit - 1; \
+		unsigned long __limit = TASK_SIZE - 1; \
 		register typeof(*(p)) __user *__p asm("r0") = (p);	\
 		register __inttype(x) __r2 asm("r2");			\
 		register unsigned long __l asm("r1") = __limit;		\
@@ -245,7 +221,7 @@ extern int __put_user_8(void *, unsigned long long);
 
 #define __put_user_check(__pu_val, __ptr, __err, __s)			\
 	({								\
-		unsigned long __limit = current_thread_info()->addr_limit - 1; \
+		unsigned long __limit = TASK_SIZE - 1; \
 		register typeof(__pu_val) __r2 asm("r2") = __pu_val;	\
 		register const void __user *__p asm("r0") = __ptr;	\
 		register unsigned long __l asm("r1") = __limit;		\
@@ -262,19 +238,8 @@ extern int __put_user_8(void *, unsigned long long);
 
 #else /* CONFIG_MMU */
 
-/*
- * uClinux has only one addr space, so has simplified address limits.
- */
-#define USER_DS			KERNEL_DS
-
-#define uaccess_kernel()	(true)
 #define __addr_ok(addr)		((void)(addr), 1)
 #define __range_ok(addr, size)	((void)(addr), 0)
-#define get_fs()		(KERNEL_DS)
-
-static inline void set_fs(mm_segment_t fs)
-{
-}
 
 #define get_user(x, p)	__get_user(x, p)
 #define __put_user_check __put_user_nocheck
@@ -283,9 +248,6 @@ static inline void set_fs(mm_segment_t fs)
 
 #define access_ok(addr, size)	(__range_ok(addr, size) == 0)
 
-#define user_addr_max() \
-	(uaccess_kernel() ? ~0UL : get_fs())
-
 #ifdef CONFIG_CPU_SPECTRE
 /*
  * When mitigating Spectre variant 1, it is not worth fixing the non-
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index a0945b898ca3..c60fefabc868 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -43,7 +43,6 @@ int main(void)
   BLANK();
   DEFINE(TI_FLAGS,		offsetof(struct thread_info, flags));
   DEFINE(TI_PREEMPT,		offsetof(struct thread_info, preempt_count));
-  DEFINE(TI_ADDR_LIMIT,		offsetof(struct thread_info, addr_limit));
   DEFINE(TI_TASK,		offsetof(struct thread_info, task));
   DEFINE(TI_CPU,		offsetof(struct thread_info, cpu));
   DEFINE(TI_CPU_DOMAIN,		offsetof(struct thread_info, cpu_domain));
@@ -92,7 +91,6 @@ int main(void)
   DEFINE(S_OLD_R0,		offsetof(struct pt_regs, ARM_ORIG_r0));
   DEFINE(PT_REGS_SIZE,		sizeof(struct pt_regs));
   DEFINE(SVC_DACR,		offsetof(struct svc_pt_regs, dacr));
-  DEFINE(SVC_ADDR_LIMIT,	offsetof(struct svc_pt_regs, addr_limit));
   DEFINE(SVC_REGS_SIZE,		sizeof(struct svc_pt_regs));
   BLANK();
   DEFINE(SIGFRAME_RC3_OFFSET,	offsetof(struct sigframe, retcode[3]));
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index e837af90cd44..d9c99db50243 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -49,10 +49,6 @@ __ret_fast_syscall:
  UNWIND(.fnstart	)
  UNWIND(.cantunwind	)
 	disable_irq_notrace			@ disable interrupts
-	ldr	r2, [tsk, #TI_ADDR_LIMIT]
-	ldr	r1, =TASK_SIZE
-	cmp	r2, r1
-	blne	addr_limit_check_failed
 	ldr	r1, [tsk, #TI_FLAGS]		@ re-check for syscall tracing
 	movs	r1, r1, lsl #16
 	bne	fast_work_pending
@@ -87,10 +83,6 @@ __ret_fast_syscall:
 	bl	do_rseq_syscall
 #endif
 	disable_irq_notrace			@ disable interrupts
-	ldr	r2, [tsk, #TI_ADDR_LIMIT]
-	ldr     r1, =TASK_SIZE
-	cmp     r2, r1
-	blne	addr_limit_check_failed
 	ldr	r1, [tsk, #TI_FLAGS]		@ re-check for syscall tracing
 	movs	r1, r1, lsl #16
 	beq	no_work_pending
@@ -129,10 +121,6 @@ ret_slow_syscall:
 #endif
 	disable_irq_notrace			@ disable interrupts
 ENTRY(ret_to_user_from_irq)
-	ldr	r2, [tsk, #TI_ADDR_LIMIT]
-	ldr     r1, =TASK_SIZE
-	cmp	r2, r1
-	blne	addr_limit_check_failed
 	ldr	r1, [tsk, #TI_FLAGS]
 	movs	r1, r1, lsl #16
 	bne	slow_work_pending
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index fc9e8b37eaa8..581542fbf5bf 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -108,7 +108,7 @@ void __show_regs(struct pt_regs *regs)
 	unsigned long flags;
 	char buf[64];
 #ifndef CONFIG_CPU_V7M
-	unsigned int domain, fs;
+	unsigned int domain;
 #ifdef CONFIG_CPU_SW_DOMAIN_PAN
 	/*
 	 * Get the domain register for the parent context. In user
@@ -117,14 +117,11 @@ void __show_regs(struct pt_regs *regs)
 	 */
 	if (user_mode(regs)) {
 		domain = DACR_UACCESS_ENABLE;
-		fs = get_fs();
 	} else {
 		domain = to_svc_pt_regs(regs)->dacr;
-		fs = to_svc_pt_regs(regs)->addr_limit;
 	}
 #else
 	domain = get_domain();
-	fs = get_fs();
 #endif
 #endif
 
@@ -160,8 +157,6 @@ void __show_regs(struct pt_regs *regs)
 		if ((domain & domain_mask(DOMAIN_USER)) ==
 		    domain_val(DOMAIN_USER, DOMAIN_NOACCESS))
 			segment = "none";
-		else if (fs == KERNEL_DS)
-			segment = "kernel";
 		else
 			segment = "user";
 
diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index a3a38d0a4c85..3e7ddac086c2 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -711,14 +711,6 @@ struct page *get_signal_page(void)
 	return page;
 }
 
-/* Defer to generic check */
-asmlinkage void addr_limit_check_failed(void)
-{
-#ifdef CONFIG_MMU
-	addr_limit_user_check();
-#endif
-}
-
 #ifdef CONFIG_DEBUG_RSEQ
 asmlinkage void do_rseq_syscall(struct pt_regs *regs)
 {
diff --git a/arch/arm/lib/copy_from_user.S b/arch/arm/lib/copy_from_user.S
index f8016e3db65d..480a20766137 100644
--- a/arch/arm/lib/copy_from_user.S
+++ b/arch/arm/lib/copy_from_user.S
@@ -109,8 +109,7 @@
 
 ENTRY(arm_copy_from_user)
 #ifdef CONFIG_CPU_SPECTRE
-	get_thread_info r3
-	ldr	r3, [r3, #TI_ADDR_LIMIT]
+	ldr	r3, =TASK_SIZE
 	uaccess_mask_range_ptr r1, r2, r3, ip
 #endif
 
diff --git a/arch/arm/lib/copy_to_user.S b/arch/arm/lib/copy_to_user.S
index ebfe4cb3d912..842ea5ede485 100644
--- a/arch/arm/lib/copy_to_user.S
+++ b/arch/arm/lib/copy_to_user.S
@@ -109,8 +109,7 @@
 ENTRY(__copy_to_user_std)
 WEAK(arm_copy_to_user)
 #ifdef CONFIG_CPU_SPECTRE
-	get_thread_info r3
-	ldr	r3, [r3, #TI_ADDR_LIMIT]
+	ldr	r3, =TASK_SIZE
 	uaccess_mask_range_ptr r0, r2, r3, ip
 #endif
 
-- 
2.29.2



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5 10/10] ARM: oabi-compat: fix oabi epoll sparse warning
  2021-07-26 14:11 [PATCH v5 00/10] ARM: remove set_fs callers and implementation Arnd Bergmann
                   ` (8 preceding siblings ...)
  2021-07-26 14:11 ` [PATCH v5 09/10] ARM: uaccess: remove set_fs() implementation Arnd Bergmann
@ 2021-07-26 14:11 ` Arnd Bergmann
  2021-08-11  6:39 ` [PATCH v5 00/10] ARM: remove set_fs callers and implementation Christoph Hellwig
  10 siblings, 0 replies; 17+ messages in thread
From: Arnd Bergmann @ 2021-07-26 14:11 UTC (permalink / raw)
  To: Russell King
  Cc: Arnd Bergmann, linux-kernel, linux-arm-kernel, linux-arch,
	linux-mm, Alexander Viro, Linus Walleij, kernel test robot

From: Arnd Bergmann <arnd@arndb.de>

As my patches change the oabi epoll definition, I received a report
from the kernel test robot about a pre-existing issue with a mismatched
__poll_t type.

The OABI code was correct when it was initially added in linux-2.16,
but a later (also correct) change to the generic __poll_t triggered a
type mismatch warning from sparse.

As __poll_t is always 32-bit bits wide and otherwise compatible, using
this instead of __u32 in the oabi_epoll_event definition is a valid
workaround.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 8ced390c2b18 ("define __poll_t, annotate constants")
Fixes: ee219b946e4b ("uapi: turn __poll_t sparse checks on by default")
Fixes: 687ad0191488 ("[ARM] 3109/1: old ABI compat: syscall wrappers for ABI impedance matching")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/kernel/sys_oabi-compat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index 223ee46b6e75..68112c172025 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -274,7 +274,7 @@ asmlinkage long sys_oabi_fcntl64(unsigned int fd, unsigned int cmd,
 }
 
 struct oabi_epoll_event {
-	__u32 events;
+	__poll_t events;
 	__u64 data;
 } __attribute__ ((packed,aligned(4)));
 
-- 
2.29.2



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 00/10] ARM: remove set_fs callers and implementation
  2021-07-26 14:11 [PATCH v5 00/10] ARM: remove set_fs callers and implementation Arnd Bergmann
                   ` (9 preceding siblings ...)
  2021-07-26 14:11 ` [PATCH v5 10/10] ARM: oabi-compat: fix oabi epoll sparse warning Arnd Bergmann
@ 2021-08-11  6:39 ` Christoph Hellwig
  2021-08-11  7:31   ` Arnd Bergmann
  10 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2021-08-11  6:39 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Russell King, Arnd Bergmann, linux-kernel, linux-arm-kernel,
	linux-arch, linux-mm, Alexander Viro, Linus Walleij

From my high-level perspective this looks great and I'd love to see it
merged:

Acked-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 00/10] ARM: remove set_fs callers and implementation
  2021-08-11  6:39 ` [PATCH v5 00/10] ARM: remove set_fs callers and implementation Christoph Hellwig
@ 2021-08-11  7:31   ` Arnd Bergmann
  0 siblings, 0 replies; 17+ messages in thread
From: Arnd Bergmann @ 2021-08-11  7:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Russell King, Arnd Bergmann, Linux Kernel Mailing List,
	Linux ARM, linux-arch, Linux-MM, Alexander Viro, Linus Walleij

On Wed, Aug 11, 2021 at 8:39 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> From my high-level perspective this looks great and I'd love to see it
> merged:
>
> Acked-by: Christoph Hellwig <hch@lst.de>

Thank you for taking another look, and for the reminder. I've now added the
series to Russell's patch tracker.

       Arnd


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 08/10] ARM: uaccess: add __{get,put}_kernel_nofault
  2021-07-26 14:11 ` [PATCH v5 08/10] ARM: uaccess: add __{get,put}_kernel_nofault Arnd Bergmann
@ 2022-01-12 17:29   ` Daniel Thompson
       [not found]     ` <Yd8ZEbywqjXkAx9k@shell.armlinux.org.uk>
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Thompson @ 2022-01-12 17:29 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Russell King, Arnd Bergmann, linux-kernel, linux-arm-kernel,
	linux-arch, linux-mm, Alexander Viro, Linus Walleij

On Mon, Jul 26, 2021 at 04:11:39PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> 
> These mimic the behavior of get_user and put_user, except
> for domain switching, address limit checking and handling
> of mismatched sizes, none of which are relevant here.
> 
> To work with pre-Armv6 kernels, this has to avoid TUSER()
> inside of the new macros, the new approach passes the "t"
> string along with the opcode, which is a bit uglier but
> avoids duplicating more code.
> 
> As there is no __get_user_asm_dword(), I work around it
> by copying 32 bit at a time, which is possible because
> the output size is known.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

I've just been bisecting some regressions running the kgdbts tests on
arm and this patch came up.

It looks like once this patch applies then copy_from_kernel_nofault()
starts faulting when it called from kgdb. I've put an example stack
trace at the bottom of this mail and the most simplified reproduction
I currently have is:
~~~
make multi_v7_defconfig
../scripts/config --enable KGDB --enable KGDB_TESTS
make olddefconfig
make -j `nproc`
qemu-system-arm -M virt -m 1G -nographic \
                -kernel arch/arm/boot/zImage -initrd rootfs.cpio.gz
# Boot and login
echo V1 > /sys/module/kgdbts/parameters/kgdbts
~~~

I suspect this will reproduce on any arm system with CONFIG_KGDB and
CONFIG_KGDB_TESTS enabled simply by running that last echo command...
but I have only tested on QEMU for now.


Daniel.


Stack trace:
~~~
# echo kgdbts=V1F1000 > /sys/module/kgdbts/parameters/kgdbts
[   34.995507] KGDB: Registered I/O driver kgdbts
[   35.038102] kgdbts:RUN plant and detach test

Entering kdb (current=0xd4264380, pid 134) on processor 0 due to Keyboard Entry
[0]kdb> [   35.056005] kgdbts:RUN sw breakpoint test
[   35.062309] kgdbts:RUN bad memory access test
[   35.063619] 8<--- cut here ---
[   35.064022] Unhandled fault: page domain fault (0x01b) at 0x00000000
[   35.064212] pgd = (ptrval)
[   35.064459] [00000000] *pgd=942dc835, *pte=00000000, *ppte=00000000
[   35.065071] Internal error: : 1b [#1] SMP ARM
[   35.065381] KGDB: re-enter exception: ALL breakpoints killed
[   35.065850] ---[ end trace 909d8c43057666be ]---
[   35.066088] 8<--- cut here ---
[   35.066189] Unhandled fault: page domain fault (0x01b) at 0x00000000
[   35.066332] pgd = (ptrval)
[   35.066406] [00000000] *pgd=942dc835, *pte=00000000, *ppte=00000000
[   35.066597] Internal error: : 1b [#2] SMP ARM
[   35.066906] CPU: 0 PID: 134 Comm: sh Tainted: G      D           5.14.0-rc1-00013-g2df4c9a741a0 #60
[   35.067152] Hardware name: ARM-Versatile Express
[   35.067432] [<c0311bdc>] (unwind_backtrace) from [<c030bdc0>] (show_stack+0x10/0x14)
[   35.067880] [<c030bdc0>] (show_stack) from [<c114b9c8>] (dump_stack_lvl+0x58/0x70)
[   35.068054] [<c114b9c8>] (dump_stack_lvl) from [<c0430cdc>] (kgdb_reenter_check+0x104/0x150)
[   35.068213] [<c0430cdc>] (kgdb_reenter_check) from [<c0430dcc>] (kgdb_handle_exception+0xa4/0x114)
[   35.068395] [<c0430dcc>] (kgdb_handle_exception) from [<c0311268>] (kgdb_notify+0x30/0x74)
[   35.068563] [<c0311268>] (kgdb_notify) from [<c037422c>] (atomic_notifier_call_chain+0xac/0x194)
[   35.068745] [<c037422c>] (atomic_notifier_call_chain) from [<c0374370>] (notify_die+0x5c/0xbc)
[   35.068933] [<c0374370>] (notify_die) from [<c030bf04>] (die+0x140/0x544)
[   35.069079] [<c030bf04>] (die) from [<c03164d4>] (do_DataAbort+0xb8/0xbc)
[   35.069220] [<c03164d4>] (do_DataAbort) from [<c0300afc>] (__dabt_svc+0x5c/0xa0)
[   35.069434] Exception stack(0xd4249c10 to 0xd4249c58)
[   35.069616] 9c00:                                     ???????? ???????? ???????? ????????
[   35.069776] 9c20: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
[   35.069943] 9c40: ???????? ???????? ???????? ???????? ???????? ????????
[   35.070107] [<c0300afc>] (__dabt_svc) from [<c049c8c4>] (copy_from_kernel_nofault+0x114/0x13c)
[   35.070291] [<c049c8c4>] (copy_from_kernel_nofault) from [<c0431688>] (kgdb_mem2hex+0x1c/0x88)
[   35.070463] [<c0431688>] (kgdb_mem2hex) from [<c04322b0>] (gdb_serial_stub+0x8c4/0x1088)
[   35.070640] [<c04322b0>] (gdb_serial_stub) from [<c04302e8>] (kgdb_cpu_enter+0x4f4/0x988)
[   35.070796] [<c04302e8>] (kgdb_cpu_enter) from [<c0430e08>] (kgdb_handle_exception+0xe0/0x114)
[   35.070982] [<c0430e08>] (kgdb_handle_exception) from [<c0311210>] (kgdb_compiled_brk_fn+0x24/0x2c)
[   35.071166] [<c0311210>] (kgdb_compiled_brk_fn) from [<c030c40c>] (do_undefinstr+0x104/0x230)
[   35.071342] [<c030c40c>] (do_undefinstr) from [<c0300c6c>] (__und_svc_finish+0x0/0x54)
[   35.071502] Exception stack(0xd4249dc8 to 0xd4249e10)
[   35.071614] 9dc0:                   ???????? ???????? ???????? ???????? ???????? ????????
[   35.071778] 9de0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
[   35.071944] 9e00: ???????? ???????? ???????? ????????
[   35.072054] [<c0300c6c>] (__und_svc_finish) from [<c042fd20>] (kgdb_breakpoint+0x30/0x58)
[   35.072211] [<c042fd20>] (kgdb_breakpoint) from [<c0b14b08>] (configure_kgdbts+0x228/0x68c)
[   35.072395] [<c0b14b08>] (configure_kgdbts) from [<c036fdcc>] (param_attr_store+0x60/0xb8)
[   35.072560] [<c036fdcc>] (param_attr_store) from [<c05bcf14>] (kernfs_fop_write_iter+0x110/0x1d4)
[   35.072745] [<c05bcf14>] (kernfs_fop_write_iter) from [<c050f074>] (vfs_write+0x350/0x508)
[   35.072920] [<c050f074>] (vfs_write) from [<c050f370>] (ksys_write+0x64/0xdc)
[   35.073075] [<c050f370>] (ksys_write) from [<c03000c0>] (ret_fast_syscall+0x0/0x2c)
[   35.073259] Exception stack(0xd4249fa8 to 0xd4249ff0)
[   35.073372] 9fa0:                   ???????? ???????? ???????? ???????? ???????? ????????
[   35.073527] 9fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
[   35.073679] 9fe0: ???????? ???????? ???????? ????????
[   35.073960] Kernel panic - not syncing: Recursive entry to debugger
[   36.286118] SMP: failed to stop secondary CPUs
[   36.286568] ---[ end Kernel panic - not syncing: Recursive entry to debugger ]---
~~~


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 08/10] ARM: uaccess: add __{get,put}_kernel_nofault
       [not found]     ` <Yd8ZEbywqjXkAx9k@shell.armlinux.org.uk>
@ 2022-01-13  9:47       ` Daniel Thompson
  2022-01-13 11:14         ` Arnd Bergmann
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Thompson @ 2022-01-13  9:47 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Arnd Bergmann, Arnd Bergmann, linux-kernel, linux-arm-kernel,
	linux-arch, linux-mm, Alexander Viro, Linus Walleij

On Wed, Jan 12, 2022 at 06:08:17PM +0000, Russell King (Oracle) wrote:
> On Wed, Jan 12, 2022 at 05:29:03PM +0000, Daniel Thompson wrote:
> > On Mon, Jul 26, 2021 at 04:11:39PM +0200, Arnd Bergmann wrote:
> > > From: Arnd Bergmann <arnd@arndb.de>
> > > 
> > > These mimic the behavior of get_user and put_user, except
> > > for domain switching, address limit checking and handling
> > > of mismatched sizes, none of which are relevant here.
> > > 
> > > To work with pre-Armv6 kernels, this has to avoid TUSER()
> > > inside of the new macros, the new approach passes the "t"
> > > string along with the opcode, which is a bit uglier but
> > > avoids duplicating more code.
> > > 
> > > As there is no __get_user_asm_dword(), I work around it
> > > by copying 32 bit at a time, which is possible because
> > > the output size is known.
> > > 
> > > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> > 
> > I've just been bisecting some regressions running the kgdbts tests on
> > arm and this patch came up.
> 
> So the software PAN code is working :)

Interesting. I noticed it was odd that kgdbts works just fine
if launched from kernel command line. I guess that runs before
PAN is activated. Neat.


> The kernel attempted to access an address that is in the userspace
> domain (NULL pointer) and took an exception.
> 
> I suppose we should handle a domain fault more gracefully - what are
> the required semantics if the kernel attempts a userspace access
> using one of the _nofault() accessors?

I think the best answer might well be that, if the arch provides
implementations of hooks such as copy_from_kernel_nofault_allowed()
then the kernel should never attempt a userspace access using the
_nofault() accessors. That means they can do whatever they like!

In other words something like the patch below looks like a promising
approach.


Daniel.


From f66a63b504ff582f261a506c54ceab8c0e77a98c Mon Sep 17 00:00:00 2001
From: Daniel Thompson <daniel.thompson@linaro.org>
Date: Thu, 13 Jan 2022 09:34:45 +0000
Subject: [PATCH] arm: mm: Implement copy_from_kernel_nofault_allowed()

Currently copy_from_kernel_nofault() can actually fault (due to software
PAN) if we attempt userspace access. In any case, the documented
behaviour for this function is to return -ERANGE if we attempt an access
outside of kernel space.

Implementing copy_from_kernel_nofault_allowed() solves both these
problems.

Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
---
 arch/arm/mm/Makefile  | 2 +-
 arch/arm/mm/maccess.c | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/mm/maccess.c

diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index 3510503bc5e6..d1c5f4f256de 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -3,7 +3,7 @@
 # Makefile for the linux arm-specific parts of the memory manager.
 #
 
-obj-y				:= extable.o fault.o init.o iomap.o
+obj-y				:= extable.o fault.o init.o iomap.o maccess.o
 obj-y				+= dma-mapping$(MMUEXT).o
 obj-$(CONFIG_MMU)		+= fault-armv.o flush.o idmap.o ioremap.o \
 				   mmap.o pgd.o mmu.o pageattr.o
diff --git a/arch/arm/mm/maccess.c b/arch/arm/mm/maccess.c
new file mode 100644
index 000000000000..0251062cb40d
--- /dev/null
+++ b/arch/arm/mm/maccess.c
@@ -0,0 +1,9 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/uaccess.h>
+#include <linux/kernel.h>
+
+bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
+{
+	return (unsigned long)unsafe_src >= TASK_SIZE;
+}
-- 
2.33.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 08/10] ARM: uaccess: add __{get,put}_kernel_nofault
  2022-01-13  9:47       ` Daniel Thompson
@ 2022-01-13 11:14         ` Arnd Bergmann
  2022-02-01 17:29           ` Daniel Thompson
  0 siblings, 1 reply; 17+ messages in thread
From: Arnd Bergmann @ 2022-01-13 11:14 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Russell King (Oracle),
	Arnd Bergmann, Linux Kernel Mailing List, Linux ARM, linux-arch,
	Linux-MM, Alexander Viro, Linus Walleij

On Thu, Jan 13, 2022 at 10:47 AM Daniel Thompson
<daniel.thompson@linaro.org> wrote:
> On Wed, Jan 12, 2022 at 06:08:17PM +0000, Russell King (Oracle) wrote:
>
> > The kernel attempted to access an address that is in the userspace
> > domain (NULL pointer) and took an exception.
> >
> > I suppose we should handle a domain fault more gracefully - what are
> > the required semantics if the kernel attempts a userspace access
> > using one of the _nofault() accessors?
>
> I think the best answer might well be that, if the arch provides
> implementations of hooks such as copy_from_kernel_nofault_allowed()
> then the kernel should never attempt a userspace access using the
> _nofault() accessors. That means they can do whatever they like!
>
> In other words something like the patch below looks like a promising
> approach.

Right, it seems this is the same as on x86.

> From f66a63b504ff582f261a506c54ceab8c0e77a98c Mon Sep 17 00:00:00 2001
> From: Daniel Thompson <daniel.thompson@linaro.org>
> Date: Thu, 13 Jan 2022 09:34:45 +0000
> Subject: [PATCH] arm: mm: Implement copy_from_kernel_nofault_allowed()
>
> Currently copy_from_kernel_nofault() can actually fault (due to software
> PAN) if we attempt userspace access. In any case, the documented
> behaviour for this function is to return -ERANGE if we attempt an access
> outside of kernel space.
>
> Implementing copy_from_kernel_nofault_allowed() solves both these
> problems.
>
> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>

Reviewed-by: Arnd Bergmann <arnd@arndb.de>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 08/10] ARM: uaccess: add __{get,put}_kernel_nofault
  2022-01-13 11:14         ` Arnd Bergmann
@ 2022-02-01 17:29           ` Daniel Thompson
  0 siblings, 0 replies; 17+ messages in thread
From: Daniel Thompson @ 2022-02-01 17:29 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Russell King (Oracle),
	Arnd Bergmann, Linux Kernel Mailing List, Linux ARM, linux-arch,
	Linux-MM, Alexander Viro, Linus Walleij

On Thu, Jan 13, 2022 at 12:14:50PM +0100, Arnd Bergmann wrote:
> On Thu, Jan 13, 2022 at 10:47 AM Daniel Thompson
> <daniel.thompson@linaro.org> wrote:
> > On Wed, Jan 12, 2022 at 06:08:17PM +0000, Russell King (Oracle) wrote:
> >
> > > The kernel attempted to access an address that is in the userspace
> > > domain (NULL pointer) and took an exception.
> > >
> > > I suppose we should handle a domain fault more gracefully - what are
> > > the required semantics if the kernel attempts a userspace access
> > > using one of the _nofault() accessors?
> >
> > I think the best answer might well be that, if the arch provides
> > implementations of hooks such as copy_from_kernel_nofault_allowed()
> > then the kernel should never attempt a userspace access using the
> > _nofault() accessors. That means they can do whatever they like!
> >
> > In other words something like the patch below looks like a promising
> > approach.
> 
> Right, it seems this is the same as on x86.

Hmnn...

Looking a bit deeper into copy_from_kernel_nofault() there is an odd
asymmetry between copy_to_kernel_nofault(). Basically there is
copy_from_kernel_nofault_allowed() but no corresponding
copy_to_kernel_nofault_allowed() which means we cannot defend memory
pokes using a helper function.

I checked the behaviour of copy_to_kernel_nofault() on arm, arm64, mips,
powerpc, riscv, x86 kernels (which is pretty much everything where I
know how to fire up qemu). All except arm gracefully handle an
attempt to write to userspace (well, NULL actually) with
copy_to_kernel_nofault() so I think there still a few more changes
to fully fix this.

Looks like we would need a slightly more assertive change, either adding
a copy_to_kernel_nofault_allowed() or modifying the arm dabt handlers to
avoid faults on userspace access.

Any views on which is better?


Daniel.

> 
> > From f66a63b504ff582f261a506c54ceab8c0e77a98c Mon Sep 17 00:00:00 2001
> > From: Daniel Thompson <daniel.thompson@linaro.org>
> > Date: Thu, 13 Jan 2022 09:34:45 +0000
> > Subject: [PATCH] arm: mm: Implement copy_from_kernel_nofault_allowed()
> >
> > Currently copy_from_kernel_nofault() can actually fault (due to software
> > PAN) if we attempt userspace access. In any case, the documented
> > behaviour for this function is to return -ERANGE if we attempt an access
> > outside of kernel space.
> >
> > Implementing copy_from_kernel_nofault_allowed() solves both these
> > problems.
> >
> > Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
> 
> Reviewed-by: Arnd Bergmann <arnd@arndb.de>


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-02-01 17:29 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-26 14:11 [PATCH v5 00/10] ARM: remove set_fs callers and implementation Arnd Bergmann
2021-07-26 14:11 ` [PATCH v5 01/10] mm/maccess: fix unaligned copy_{from,to}_kernel_nofault Arnd Bergmann
2021-07-26 14:11 ` [PATCH v5 02/10] ARM: traps: use get_kernel_nofault instead of set_fs() Arnd Bergmann
2021-07-26 14:11 ` [PATCH v5 03/10] ARM: oabi-compat: add epoll_pwait handler Arnd Bergmann
2021-07-26 14:11 ` [PATCH v5 04/10] ARM: syscall: always store thread_info->abi_syscall Arnd Bergmann
2021-07-26 14:11 ` [PATCH v5 05/10] ARM: oabi-compat: rework epoll_wait/epoll_pwait emulation Arnd Bergmann
2021-07-26 14:11 ` [PATCH v5 06/10] ARM: oabi-compat: rework sys_semtimedop emulation Arnd Bergmann
2021-07-26 14:11 ` [PATCH v5 07/10] ARM: oabi-compat: rework fcntl64() emulation Arnd Bergmann
2021-07-26 14:11 ` [PATCH v5 08/10] ARM: uaccess: add __{get,put}_kernel_nofault Arnd Bergmann
2022-01-12 17:29   ` Daniel Thompson
     [not found]     ` <Yd8ZEbywqjXkAx9k@shell.armlinux.org.uk>
2022-01-13  9:47       ` Daniel Thompson
2022-01-13 11:14         ` Arnd Bergmann
2022-02-01 17:29           ` Daniel Thompson
2021-07-26 14:11 ` [PATCH v5 09/10] ARM: uaccess: remove set_fs() implementation Arnd Bergmann
2021-07-26 14:11 ` [PATCH v5 10/10] ARM: oabi-compat: fix oabi epoll sparse warning Arnd Bergmann
2021-08-11  6:39 ` [PATCH v5 00/10] ARM: remove set_fs callers and implementation Christoph Hellwig
2021-08-11  7:31   ` Arnd Bergmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).