All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v14 0/4] Checkpoint Support for Syscall User Dispatch
@ 2023-03-28 16:48 ` Gregory Price
  0 siblings, 0 replies; 44+ messages in thread
From: Gregory Price @ 2023-03-28 16:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-doc, linux-arm-kernel, linux-arch, oleg, avagin, peterz,
	luto, krisman, tglx, corbet, shuah, catalin.marinas, arnd, will,
	mark.rutland, tongtiangen, robin.murphy, Gregory Price

v14: implement task_access_ok variant for cross-task pointer checks
     patch 2/4 changed from access_ok to task_access_ok

v13: sizeof consistency and cosmetic changes in patch 2

v12: split test into its own patch
     change from padding a u8 to using a u64
     casting issues
     checkpatch.pl

[truncating version history]

Syscall user dispatch makes it possible to cleanly intercept system
calls from user-land.  However, most transparent checkpoint software
presently leverages some combination of ptrace and system call
injection to place software in a ready-to-checkpoint state.

If Syscall User Dispatch is enabled at the time of being quiesced,
injected system calls will subsequently be interposed upon and
dispatched to the task's signal handler.

Patch summary:
- Create new task_access_ok which leverages the provided task's
  information when validating userland pointers.  For ARM64 this means
  MTE tags are accounted for.  For all other architectures, this simply
  reduces to access_ok (presently).

- Refactor configuration setting interface to operate on a task
  rather than current, so the set and error paths can be consolidated

- Implement a getter interface for Syscall User Dispatch config info.
  To resume successfully, the checkpoint/resume software has to
  save and restore this information.  Presently this configuration
  is write-only, with no way for C/R software to save it.

  This was done in ptrace because syscall user dispatch is not part of
  uapi. The syscall_user_dispatch_config structure was added to the
  ptrace exports.

- Selftest for the new feature

Gregory Price (4):
  asm-generic,arm64: create task variant of access_ok
  syscall_user_dispatch: helper function to operate on given task
  ptrace,syscall_user_dispatch: checkpoint/restore support for SUD
  selftest,ptrace: Add selftest for syscall user dispatch config api

 .../admin-guide/syscall-user-dispatch.rst     |  4 ++
 arch/arm64/include/asm/uaccess.h              | 13 +++-
 include/asm-generic/access_ok.h               | 10 +++
 include/linux/syscall_user_dispatch.h         | 18 +++++
 include/uapi/linux/ptrace.h                   | 29 ++++++++
 kernel/entry/syscall_user_dispatch.c          | 67 ++++++++++++++---
 kernel/ptrace.c                               |  9 +++
 tools/testing/selftests/ptrace/.gitignore     |  1 +
 tools/testing/selftests/ptrace/Makefile       |  2 +-
 tools/testing/selftests/ptrace/get_set_sud.c  | 72 +++++++++++++++++++
 10 files changed, 213 insertions(+), 12 deletions(-)
 create mode 100644 tools/testing/selftests/ptrace/get_set_sud.c

-- 
2.39.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v14 0/4] Checkpoint Support for Syscall User Dispatch
@ 2023-03-28 16:48 ` Gregory Price
  0 siblings, 0 replies; 44+ messages in thread
From: Gregory Price @ 2023-03-28 16:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-doc, linux-arm-kernel, linux-arch, oleg, avagin, peterz,
	luto, krisman, tglx, corbet, shuah, catalin.marinas, arnd, will,
	mark.rutland, tongtiangen, robin.murphy, Gregory Price

v14: implement task_access_ok variant for cross-task pointer checks
     patch 2/4 changed from access_ok to task_access_ok

v13: sizeof consistency and cosmetic changes in patch 2

v12: split test into its own patch
     change from padding a u8 to using a u64
     casting issues
     checkpatch.pl

[truncating version history]

Syscall user dispatch makes it possible to cleanly intercept system
calls from user-land.  However, most transparent checkpoint software
presently leverages some combination of ptrace and system call
injection to place software in a ready-to-checkpoint state.

If Syscall User Dispatch is enabled at the time of being quiesced,
injected system calls will subsequently be interposed upon and
dispatched to the task's signal handler.

Patch summary:
- Create new task_access_ok which leverages the provided task's
  information when validating userland pointers.  For ARM64 this means
  MTE tags are accounted for.  For all other architectures, this simply
  reduces to access_ok (presently).

- Refactor configuration setting interface to operate on a task
  rather than current, so the set and error paths can be consolidated

- Implement a getter interface for Syscall User Dispatch config info.
  To resume successfully, the checkpoint/resume software has to
  save and restore this information.  Presently this configuration
  is write-only, with no way for C/R software to save it.

  This was done in ptrace because syscall user dispatch is not part of
  uapi. The syscall_user_dispatch_config structure was added to the
  ptrace exports.

- Selftest for the new feature

Gregory Price (4):
  asm-generic,arm64: create task variant of access_ok
  syscall_user_dispatch: helper function to operate on given task
  ptrace,syscall_user_dispatch: checkpoint/restore support for SUD
  selftest,ptrace: Add selftest for syscall user dispatch config api

 .../admin-guide/syscall-user-dispatch.rst     |  4 ++
 arch/arm64/include/asm/uaccess.h              | 13 +++-
 include/asm-generic/access_ok.h               | 10 +++
 include/linux/syscall_user_dispatch.h         | 18 +++++
 include/uapi/linux/ptrace.h                   | 29 ++++++++
 kernel/entry/syscall_user_dispatch.c          | 67 ++++++++++++++---
 kernel/ptrace.c                               |  9 +++
 tools/testing/selftests/ptrace/.gitignore     |  1 +
 tools/testing/selftests/ptrace/Makefile       |  2 +-
 tools/testing/selftests/ptrace/get_set_sud.c  | 72 +++++++++++++++++++
 10 files changed, 213 insertions(+), 12 deletions(-)
 create mode 100644 tools/testing/selftests/ptrace/get_set_sud.c

-- 
2.39.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
  2023-03-28 16:48 ` Gregory Price
@ 2023-03-28 16:48   ` Gregory Price
  -1 siblings, 0 replies; 44+ messages in thread
From: Gregory Price @ 2023-03-28 16:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-doc, linux-arm-kernel, linux-arch, oleg, avagin, peterz,
	luto, krisman, tglx, corbet, shuah, catalin.marinas, arnd, will,
	mark.rutland, tongtiangen, robin.murphy, Gregory Price

On arm64, access_ok makes adjustments to pointers based on whether
memory tagging is enabled for a task (ARM MTE). When leveraging ptrace,
it's possible for a task to enable/disable various kernel features (such
as syscall user dispatch) which require user points as arguments.

To enable Task A to set these features via ptrace with Task B's
pointers, a task variant of access_ok is required for architectures with
features such as memory tagging.

If the architecture does not implement task_access_ok, the operation
reduces to access_ok and the task argument is discarded.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 arch/arm64/include/asm/uaccess.h | 13 +++++++++++--
 include/asm-generic/access_ok.h  | 10 ++++++++++
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 5c7b2f9d5913..1a51a54f264f 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -35,7 +35,9 @@ static inline int __access_ok(const void __user *ptr, unsigned long size);
  * This is equivalent to the following test:
  * (u65)addr + (u65)size <= (u65)TASK_SIZE_MAX
  */
-static inline int access_ok(const void __user *addr, unsigned long size)
+static inline int task_access_ok(struct task_struct *task,
+				 const void __user *addr,
+				 unsigned long size)
 {
 	/*
 	 * Asynchronous I/O running in a kernel thread does not have the
@@ -43,11 +45,18 @@ static inline int access_ok(const void __user *addr, unsigned long size)
 	 * the user address before checking.
 	 */
 	if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) &&
-	    (current->flags & PF_KTHREAD || test_thread_flag(TIF_TAGGED_ADDR)))
+	    (task->flags & PF_KTHREAD || test_ti_thread_flag(task, TIF_TAGGED_ADDR)))
 		addr = untagged_addr(addr);
 
 	return likely(__access_ok(addr, size));
 }
+
+static inline int access_ok(const void __user *addr, unsigned long size)
+{
+	return task_access_ok(current, addr, size);
+}
+
+#define task_access_ok task_access_ok
 #define access_ok access_ok
 
 #include <asm-generic/access_ok.h>
diff --git a/include/asm-generic/access_ok.h b/include/asm-generic/access_ok.h
index 2866ae61b1cd..31465773c40a 100644
--- a/include/asm-generic/access_ok.h
+++ b/include/asm-generic/access_ok.h
@@ -45,4 +45,14 @@ static inline int __access_ok(const void __user *ptr, unsigned long size)
 #define access_ok(addr, size) likely(__access_ok(addr, size))
 #endif
 
+/*
+ * Some architectures may have special features (such as ARM MTE)
+ * that require handling if access_ok is called on a pointer from one
+ * task in the context of another.  On most architectures this operation
+ * is equivalent to simply __access_ok.
+ */
+#ifndef task_access_ok
+#define task_access_ok(task, addr, size) likely(__access_ok(addr, size))
+#endif
+
 #endif
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
@ 2023-03-28 16:48   ` Gregory Price
  0 siblings, 0 replies; 44+ messages in thread
From: Gregory Price @ 2023-03-28 16:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-doc, linux-arm-kernel, linux-arch, oleg, avagin, peterz,
	luto, krisman, tglx, corbet, shuah, catalin.marinas, arnd, will,
	mark.rutland, tongtiangen, robin.murphy, Gregory Price

On arm64, access_ok makes adjustments to pointers based on whether
memory tagging is enabled for a task (ARM MTE). When leveraging ptrace,
it's possible for a task to enable/disable various kernel features (such
as syscall user dispatch) which require user points as arguments.

To enable Task A to set these features via ptrace with Task B's
pointers, a task variant of access_ok is required for architectures with
features such as memory tagging.

If the architecture does not implement task_access_ok, the operation
reduces to access_ok and the task argument is discarded.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 arch/arm64/include/asm/uaccess.h | 13 +++++++++++--
 include/asm-generic/access_ok.h  | 10 ++++++++++
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 5c7b2f9d5913..1a51a54f264f 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -35,7 +35,9 @@ static inline int __access_ok(const void __user *ptr, unsigned long size);
  * This is equivalent to the following test:
  * (u65)addr + (u65)size <= (u65)TASK_SIZE_MAX
  */
-static inline int access_ok(const void __user *addr, unsigned long size)
+static inline int task_access_ok(struct task_struct *task,
+				 const void __user *addr,
+				 unsigned long size)
 {
 	/*
 	 * Asynchronous I/O running in a kernel thread does not have the
@@ -43,11 +45,18 @@ static inline int access_ok(const void __user *addr, unsigned long size)
 	 * the user address before checking.
 	 */
 	if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) &&
-	    (current->flags & PF_KTHREAD || test_thread_flag(TIF_TAGGED_ADDR)))
+	    (task->flags & PF_KTHREAD || test_ti_thread_flag(task, TIF_TAGGED_ADDR)))
 		addr = untagged_addr(addr);
 
 	return likely(__access_ok(addr, size));
 }
+
+static inline int access_ok(const void __user *addr, unsigned long size)
+{
+	return task_access_ok(current, addr, size);
+}
+
+#define task_access_ok task_access_ok
 #define access_ok access_ok
 
 #include <asm-generic/access_ok.h>
diff --git a/include/asm-generic/access_ok.h b/include/asm-generic/access_ok.h
index 2866ae61b1cd..31465773c40a 100644
--- a/include/asm-generic/access_ok.h
+++ b/include/asm-generic/access_ok.h
@@ -45,4 +45,14 @@ static inline int __access_ok(const void __user *ptr, unsigned long size)
 #define access_ok(addr, size) likely(__access_ok(addr, size))
 #endif
 
+/*
+ * Some architectures may have special features (such as ARM MTE)
+ * that require handling if access_ok is called on a pointer from one
+ * task in the context of another.  On most architectures this operation
+ * is equivalent to simply __access_ok.
+ */
+#ifndef task_access_ok
+#define task_access_ok(task, addr, size) likely(__access_ok(addr, size))
+#endif
+
 #endif
-- 
2.39.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v14 2/4] syscall_user_dispatch: helper function to operate on given task
  2023-03-28 16:48 ` Gregory Price
@ 2023-03-28 16:48   ` Gregory Price
  -1 siblings, 0 replies; 44+ messages in thread
From: Gregory Price @ 2023-03-28 16:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-doc, linux-arm-kernel, linux-arch, oleg, avagin, peterz,
	luto, krisman, tglx, corbet, shuah, catalin.marinas, arnd, will,
	mark.rutland, tongtiangen, robin.murphy, Gregory Price

Preparatory patch ahead of set/get interfaces which will allow a
ptrace to get/set the syscall user dispatch configuration of a task.

This will simplify the set interface and consolidates error paths.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/entry/syscall_user_dispatch.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index 0b6379adff6b..a0624c3cda75 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -68,8 +68,9 @@ bool syscall_user_dispatch(struct pt_regs *regs)
 	return true;
 }
 
-int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
-			      unsigned long len, char __user *selector)
+static int task_set_syscall_user_dispatch(struct task_struct *task, unsigned long mode,
+					  unsigned long offset, unsigned long len,
+					  char __user *selector)
 {
 	switch (mode) {
 	case PR_SYS_DISPATCH_OFF:
@@ -86,7 +87,7 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
 		if (offset && offset + len <= offset)
 			return -EINVAL;
 
-		if (selector && !access_ok(selector, sizeof(*selector)))
+		if (selector && !task_access_ok(task, selector, sizeof(*selector)))
 			return -EFAULT;
 
 		break;
@@ -94,15 +95,21 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
 		return -EINVAL;
 	}
 
-	current->syscall_dispatch.selector = selector;
-	current->syscall_dispatch.offset = offset;
-	current->syscall_dispatch.len = len;
-	current->syscall_dispatch.on_dispatch = false;
+	task->syscall_dispatch.selector = selector;
+	task->syscall_dispatch.offset = offset;
+	task->syscall_dispatch.len = len;
+	task->syscall_dispatch.on_dispatch = false;
 
 	if (mode == PR_SYS_DISPATCH_ON)
-		set_syscall_work(SYSCALL_USER_DISPATCH);
+		set_task_syscall_work(task, SYSCALL_USER_DISPATCH);
 	else
-		clear_syscall_work(SYSCALL_USER_DISPATCH);
+		clear_task_syscall_work(task, SYSCALL_USER_DISPATCH);
 
 	return 0;
 }
+
+int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
+			      unsigned long len, char __user *selector)
+{
+	return task_set_syscall_user_dispatch(current, mode, offset, len, selector);
+}
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v14 2/4] syscall_user_dispatch: helper function to operate on given task
@ 2023-03-28 16:48   ` Gregory Price
  0 siblings, 0 replies; 44+ messages in thread
From: Gregory Price @ 2023-03-28 16:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-doc, linux-arm-kernel, linux-arch, oleg, avagin, peterz,
	luto, krisman, tglx, corbet, shuah, catalin.marinas, arnd, will,
	mark.rutland, tongtiangen, robin.murphy, Gregory Price

Preparatory patch ahead of set/get interfaces which will allow a
ptrace to get/set the syscall user dispatch configuration of a task.

This will simplify the set interface and consolidates error paths.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/entry/syscall_user_dispatch.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index 0b6379adff6b..a0624c3cda75 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -68,8 +68,9 @@ bool syscall_user_dispatch(struct pt_regs *regs)
 	return true;
 }
 
-int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
-			      unsigned long len, char __user *selector)
+static int task_set_syscall_user_dispatch(struct task_struct *task, unsigned long mode,
+					  unsigned long offset, unsigned long len,
+					  char __user *selector)
 {
 	switch (mode) {
 	case PR_SYS_DISPATCH_OFF:
@@ -86,7 +87,7 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
 		if (offset && offset + len <= offset)
 			return -EINVAL;
 
-		if (selector && !access_ok(selector, sizeof(*selector)))
+		if (selector && !task_access_ok(task, selector, sizeof(*selector)))
 			return -EFAULT;
 
 		break;
@@ -94,15 +95,21 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
 		return -EINVAL;
 	}
 
-	current->syscall_dispatch.selector = selector;
-	current->syscall_dispatch.offset = offset;
-	current->syscall_dispatch.len = len;
-	current->syscall_dispatch.on_dispatch = false;
+	task->syscall_dispatch.selector = selector;
+	task->syscall_dispatch.offset = offset;
+	task->syscall_dispatch.len = len;
+	task->syscall_dispatch.on_dispatch = false;
 
 	if (mode == PR_SYS_DISPATCH_ON)
-		set_syscall_work(SYSCALL_USER_DISPATCH);
+		set_task_syscall_work(task, SYSCALL_USER_DISPATCH);
 	else
-		clear_syscall_work(SYSCALL_USER_DISPATCH);
+		clear_task_syscall_work(task, SYSCALL_USER_DISPATCH);
 
 	return 0;
 }
+
+int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
+			      unsigned long len, char __user *selector)
+{
+	return task_set_syscall_user_dispatch(current, mode, offset, len, selector);
+}
-- 
2.39.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v14 3/4] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD
  2023-03-28 16:48 ` Gregory Price
@ 2023-03-28 16:48   ` Gregory Price
  -1 siblings, 0 replies; 44+ messages in thread
From: Gregory Price @ 2023-03-28 16:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-doc, linux-arm-kernel, linux-arch, oleg, avagin, peterz,
	luto, krisman, tglx, corbet, shuah, catalin.marinas, arnd, will,
	mark.rutland, tongtiangen, robin.murphy, Gregory Price

Implement ptrace getter/setter interface for syscall user dispatch.

These prctl settings are presently write-only, making it impossible to
implement transparent checkpoint/restore via software like CRIU.

'on_dispatch' field is not exposed because it is a kernel-internal
only field that cannot be 'true' when returning to userland.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
---
 .../admin-guide/syscall-user-dispatch.rst     |  4 ++
 include/linux/syscall_user_dispatch.h         | 18 ++++++++
 include/uapi/linux/ptrace.h                   | 29 +++++++++++++
 kernel/entry/syscall_user_dispatch.c          | 42 +++++++++++++++++++
 kernel/ptrace.c                               |  9 ++++
 5 files changed, 102 insertions(+)

diff --git a/Documentation/admin-guide/syscall-user-dispatch.rst b/Documentation/admin-guide/syscall-user-dispatch.rst
index 60314953c728..f7648c08297e 100644
--- a/Documentation/admin-guide/syscall-user-dispatch.rst
+++ b/Documentation/admin-guide/syscall-user-dispatch.rst
@@ -73,6 +73,10 @@ thread-wide, without the need to invoke the kernel directly.  selector
 can be set to SYSCALL_DISPATCH_FILTER_ALLOW or SYSCALL_DISPATCH_FILTER_BLOCK.
 Any other value should terminate the program with a SIGSYS.
 
+Additionally, a task's syscall user dispatch configuration can be peeked
+and poked via the PTRACE_(GET|SET)_SYSCALL_USER_DISPATCH_CONFIG ptrace
+requests. This is useful for checkpoint/restart software.
+
 Security Notes
 --------------
 
diff --git a/include/linux/syscall_user_dispatch.h b/include/linux/syscall_user_dispatch.h
index a0ae443fb7df..641ca8880995 100644
--- a/include/linux/syscall_user_dispatch.h
+++ b/include/linux/syscall_user_dispatch.h
@@ -22,6 +22,12 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
 #define clear_syscall_work_syscall_user_dispatch(tsk) \
 	clear_task_syscall_work(tsk, SYSCALL_USER_DISPATCH)
 
+int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+				     void __user *data);
+
+int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+				     void __user *data);
+
 #else
 struct syscall_user_dispatch {};
 
@@ -35,6 +41,18 @@ static inline void clear_syscall_work_syscall_user_dispatch(struct task_struct *
 {
 }
 
+static inline int syscall_user_dispatch_get_config(struct task_struct *task,
+						   unsigned long size, void __user *data)
+{
+	return -EINVAL;
+}
+
+static inline int syscall_user_dispatch_set_config(struct task_struct *task,
+						   unsigned long size, void __user *data)
+{
+	return -EINVAL;
+}
+
 #endif /* CONFIG_GENERIC_ENTRY */
 
 #endif /* _SYSCALL_USER_DISPATCH_H */
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index 195ae64a8c87..1e77b02344c3 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -112,6 +112,35 @@ struct ptrace_rseq_configuration {
 	__u32 pad;
 };
 
+#define PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG 0x4210
+#define PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG 0x4211
+
+/*
+ * struct ptrace_sud_config - Per-task configuration for SUD
+ * @mode:	One of PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF
+ * @selector:	Tracee's user virtual address of SUD selector
+ * @offset:	SUD exclusion area (virtual address)
+ * @len:	Length of SUD exclusion area
+ *
+ * Used to get/set the syscall user dispatch configuration for tracee.
+ * process.  Selector is optional (may be NULL), and if invalid will produce
+ * a SIGSEGV in the tracee upon first access.
+ *
+ * If mode is PR_SYS_DISPATCH_ON, syscall dispatch will be enabled. If
+ * PR_SYS_DISPATCH_OFF, syscall dispatch will be disabled and all other
+ * parameters must be 0.  The value in *selector (if not null), also determines
+ * whether syscall dispatch will occur.
+ *
+ * The SUD Exclusion area described by offset/len is the virtual address space
+ * from which syscalls will not produce a user dispatch.
+ */
+struct ptrace_sud_config {
+	__u64 mode;
+	__u64 selector;
+	__u64 offset;
+	__u64 len;
+};
+
 /*
  * These values are stored in task->ptrace_message
  * by ptrace_stop to describe the current syscall-stop.
diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index a0624c3cda75..af49c44fdc67 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -4,6 +4,7 @@
  */
 #include <linux/sched.h>
 #include <linux/prctl.h>
+#include <linux/ptrace.h>
 #include <linux/syscall_user_dispatch.h>
 #include <linux/uaccess.h>
 #include <linux/signal.h>
@@ -113,3 +114,44 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
 {
 	return task_set_syscall_user_dispatch(current, mode, offset, len, selector);
 }
+
+int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+				     void __user *data)
+{
+	struct syscall_user_dispatch *sd = &task->syscall_dispatch;
+	struct ptrace_sud_config cfg;
+
+	if (size != sizeof(cfg))
+		return -EINVAL;
+
+	if (test_task_syscall_work(task, SYSCALL_USER_DISPATCH))
+		cfg.mode = PR_SYS_DISPATCH_ON;
+	else
+		cfg.mode = PR_SYS_DISPATCH_OFF;
+
+	cfg.offset = sd->offset;
+	cfg.len = sd->len;
+	cfg.selector = (__u64)(uintptr_t)sd->selector;
+
+	if (copy_to_user(data, &cfg, sizeof(cfg)))
+		return -EFAULT;
+
+	return 0;
+}
+
+int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+				     void __user *data)
+{
+	int rc;
+	struct ptrace_sud_config cfg;
+
+	if (size != sizeof(cfg))
+		return -EINVAL;
+
+	if (copy_from_user(&cfg, data, sizeof(cfg)))
+		return -EFAULT;
+
+	rc = task_set_syscall_user_dispatch(task, cfg.mode, cfg.offset, cfg.len,
+					    (char __user *)(uintptr_t)cfg.selector);
+	return rc;
+}
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 54482193e1ed..d99376532b56 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -32,6 +32,7 @@
 #include <linux/compat.h>
 #include <linux/sched/signal.h>
 #include <linux/minmax.h>
+#include <linux/syscall_user_dispatch.h>
 
 #include <asm/syscall.h>	/* for syscall_get_* */
 
@@ -1259,6 +1260,14 @@ int ptrace_request(struct task_struct *child, long request,
 		break;
 #endif
 
+	case PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG:
+		ret = syscall_user_dispatch_set_config(child, addr, datavp);
+		break;
+
+	case PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG:
+		ret = syscall_user_dispatch_get_config(child, addr, datavp);
+		break;
+
 	default:
 		break;
 	}
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v14 3/4] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD
@ 2023-03-28 16:48   ` Gregory Price
  0 siblings, 0 replies; 44+ messages in thread
From: Gregory Price @ 2023-03-28 16:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-doc, linux-arm-kernel, linux-arch, oleg, avagin, peterz,
	luto, krisman, tglx, corbet, shuah, catalin.marinas, arnd, will,
	mark.rutland, tongtiangen, robin.murphy, Gregory Price

Implement ptrace getter/setter interface for syscall user dispatch.

These prctl settings are presently write-only, making it impossible to
implement transparent checkpoint/restore via software like CRIU.

'on_dispatch' field is not exposed because it is a kernel-internal
only field that cannot be 'true' when returning to userland.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
---
 .../admin-guide/syscall-user-dispatch.rst     |  4 ++
 include/linux/syscall_user_dispatch.h         | 18 ++++++++
 include/uapi/linux/ptrace.h                   | 29 +++++++++++++
 kernel/entry/syscall_user_dispatch.c          | 42 +++++++++++++++++++
 kernel/ptrace.c                               |  9 ++++
 5 files changed, 102 insertions(+)

diff --git a/Documentation/admin-guide/syscall-user-dispatch.rst b/Documentation/admin-guide/syscall-user-dispatch.rst
index 60314953c728..f7648c08297e 100644
--- a/Documentation/admin-guide/syscall-user-dispatch.rst
+++ b/Documentation/admin-guide/syscall-user-dispatch.rst
@@ -73,6 +73,10 @@ thread-wide, without the need to invoke the kernel directly.  selector
 can be set to SYSCALL_DISPATCH_FILTER_ALLOW or SYSCALL_DISPATCH_FILTER_BLOCK.
 Any other value should terminate the program with a SIGSYS.
 
+Additionally, a task's syscall user dispatch configuration can be peeked
+and poked via the PTRACE_(GET|SET)_SYSCALL_USER_DISPATCH_CONFIG ptrace
+requests. This is useful for checkpoint/restart software.
+
 Security Notes
 --------------
 
diff --git a/include/linux/syscall_user_dispatch.h b/include/linux/syscall_user_dispatch.h
index a0ae443fb7df..641ca8880995 100644
--- a/include/linux/syscall_user_dispatch.h
+++ b/include/linux/syscall_user_dispatch.h
@@ -22,6 +22,12 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
 #define clear_syscall_work_syscall_user_dispatch(tsk) \
 	clear_task_syscall_work(tsk, SYSCALL_USER_DISPATCH)
 
+int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+				     void __user *data);
+
+int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+				     void __user *data);
+
 #else
 struct syscall_user_dispatch {};
 
@@ -35,6 +41,18 @@ static inline void clear_syscall_work_syscall_user_dispatch(struct task_struct *
 {
 }
 
+static inline int syscall_user_dispatch_get_config(struct task_struct *task,
+						   unsigned long size, void __user *data)
+{
+	return -EINVAL;
+}
+
+static inline int syscall_user_dispatch_set_config(struct task_struct *task,
+						   unsigned long size, void __user *data)
+{
+	return -EINVAL;
+}
+
 #endif /* CONFIG_GENERIC_ENTRY */
 
 #endif /* _SYSCALL_USER_DISPATCH_H */
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index 195ae64a8c87..1e77b02344c3 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -112,6 +112,35 @@ struct ptrace_rseq_configuration {
 	__u32 pad;
 };
 
+#define PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG 0x4210
+#define PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG 0x4211
+
+/*
+ * struct ptrace_sud_config - Per-task configuration for SUD
+ * @mode:	One of PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF
+ * @selector:	Tracee's user virtual address of SUD selector
+ * @offset:	SUD exclusion area (virtual address)
+ * @len:	Length of SUD exclusion area
+ *
+ * Used to get/set the syscall user dispatch configuration for tracee.
+ * process.  Selector is optional (may be NULL), and if invalid will produce
+ * a SIGSEGV in the tracee upon first access.
+ *
+ * If mode is PR_SYS_DISPATCH_ON, syscall dispatch will be enabled. If
+ * PR_SYS_DISPATCH_OFF, syscall dispatch will be disabled and all other
+ * parameters must be 0.  The value in *selector (if not null), also determines
+ * whether syscall dispatch will occur.
+ *
+ * The SUD Exclusion area described by offset/len is the virtual address space
+ * from which syscalls will not produce a user dispatch.
+ */
+struct ptrace_sud_config {
+	__u64 mode;
+	__u64 selector;
+	__u64 offset;
+	__u64 len;
+};
+
 /*
  * These values are stored in task->ptrace_message
  * by ptrace_stop to describe the current syscall-stop.
diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index a0624c3cda75..af49c44fdc67 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -4,6 +4,7 @@
  */
 #include <linux/sched.h>
 #include <linux/prctl.h>
+#include <linux/ptrace.h>
 #include <linux/syscall_user_dispatch.h>
 #include <linux/uaccess.h>
 #include <linux/signal.h>
@@ -113,3 +114,44 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
 {
 	return task_set_syscall_user_dispatch(current, mode, offset, len, selector);
 }
+
+int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+				     void __user *data)
+{
+	struct syscall_user_dispatch *sd = &task->syscall_dispatch;
+	struct ptrace_sud_config cfg;
+
+	if (size != sizeof(cfg))
+		return -EINVAL;
+
+	if (test_task_syscall_work(task, SYSCALL_USER_DISPATCH))
+		cfg.mode = PR_SYS_DISPATCH_ON;
+	else
+		cfg.mode = PR_SYS_DISPATCH_OFF;
+
+	cfg.offset = sd->offset;
+	cfg.len = sd->len;
+	cfg.selector = (__u64)(uintptr_t)sd->selector;
+
+	if (copy_to_user(data, &cfg, sizeof(cfg)))
+		return -EFAULT;
+
+	return 0;
+}
+
+int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+				     void __user *data)
+{
+	int rc;
+	struct ptrace_sud_config cfg;
+
+	if (size != sizeof(cfg))
+		return -EINVAL;
+
+	if (copy_from_user(&cfg, data, sizeof(cfg)))
+		return -EFAULT;
+
+	rc = task_set_syscall_user_dispatch(task, cfg.mode, cfg.offset, cfg.len,
+					    (char __user *)(uintptr_t)cfg.selector);
+	return rc;
+}
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 54482193e1ed..d99376532b56 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -32,6 +32,7 @@
 #include <linux/compat.h>
 #include <linux/sched/signal.h>
 #include <linux/minmax.h>
+#include <linux/syscall_user_dispatch.h>
 
 #include <asm/syscall.h>	/* for syscall_get_* */
 
@@ -1259,6 +1260,14 @@ int ptrace_request(struct task_struct *child, long request,
 		break;
 #endif
 
+	case PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG:
+		ret = syscall_user_dispatch_set_config(child, addr, datavp);
+		break;
+
+	case PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG:
+		ret = syscall_user_dispatch_get_config(child, addr, datavp);
+		break;
+
 	default:
 		break;
 	}
-- 
2.39.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v14 4/4] selftest,ptrace: Add selftest for syscall user dispatch config api
  2023-03-28 16:48 ` Gregory Price
@ 2023-03-28 16:48   ` Gregory Price
  -1 siblings, 0 replies; 44+ messages in thread
From: Gregory Price @ 2023-03-28 16:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-doc, linux-arm-kernel, linux-arch, oleg, avagin, peterz,
	luto, krisman, tglx, corbet, shuah, catalin.marinas, arnd, will,
	mark.rutland, tongtiangen, robin.murphy, Gregory Price

Validate that the following new ptrace requests work as expected

* PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG
  - returns the contents of task->syscall_dispatch if enabled

* PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG
  - sets the contents of task->syscall_dispatch

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 tools/testing/selftests/ptrace/.gitignore    |  1 +
 tools/testing/selftests/ptrace/Makefile      |  2 +-
 tools/testing/selftests/ptrace/get_set_sud.c | 72 ++++++++++++++++++++
 3 files changed, 74 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/ptrace/get_set_sud.c

diff --git a/tools/testing/selftests/ptrace/.gitignore b/tools/testing/selftests/ptrace/.gitignore
index 792318aaa30c..b7dde152e75a 100644
--- a/tools/testing/selftests/ptrace/.gitignore
+++ b/tools/testing/selftests/ptrace/.gitignore
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
 get_syscall_info
+get_set_sud
 peeksiginfo
 vmaccess
diff --git a/tools/testing/selftests/ptrace/Makefile b/tools/testing/selftests/ptrace/Makefile
index 2f1f532c39db..33a36b73bcb9 100644
--- a/tools/testing/selftests/ptrace/Makefile
+++ b/tools/testing/selftests/ptrace/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 CFLAGS += -std=c99 -pthread -iquote../../../../include/uapi -Wall
 
-TEST_GEN_PROGS := get_syscall_info peeksiginfo vmaccess
+TEST_GEN_PROGS := get_syscall_info peeksiginfo vmaccess get_set_sud
 
 include ../lib.mk
diff --git a/tools/testing/selftests/ptrace/get_set_sud.c b/tools/testing/selftests/ptrace/get_set_sud.c
new file mode 100644
index 000000000000..5297b10d25c3
--- /dev/null
+++ b/tools/testing/selftests/ptrace/get_set_sud.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include "../kselftest_harness.h"
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <sys/wait.h>
+#include <sys/syscall.h>
+#include <sys/prctl.h>
+
+#include "linux/ptrace.h"
+
+static int sys_ptrace(int request, pid_t pid, void *addr, void *data)
+{
+	return syscall(SYS_ptrace, request, pid, addr, data);
+}
+
+TEST(get_set_sud)
+{
+	struct ptrace_sud_config config;
+	pid_t child;
+	int ret = 0;
+	int status;
+
+	child = fork();
+	ASSERT_GE(child, 0);
+	if (child == 0) {
+		ASSERT_EQ(0, sys_ptrace(PTRACE_TRACEME, 0, 0, 0)) {
+			TH_LOG("PTRACE_TRACEME: %m");
+		}
+		kill(getpid(), SIGSTOP);
+		_exit(1);
+	}
+
+	waitpid(child, &status, 0);
+
+	memset(&config, 0xff, sizeof(config));
+	config.mode = PR_SYS_DISPATCH_ON;
+
+	ret = sys_ptrace(PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG, child,
+			 (void *)sizeof(config), &config);
+
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(config.mode, PR_SYS_DISPATCH_OFF);
+	ASSERT_EQ(config.selector, 0);
+	ASSERT_EQ(config.offset, 0);
+	ASSERT_EQ(config.len, 0);
+
+	config.mode = PR_SYS_DISPATCH_ON;
+	config.selector = 0;
+	config.offset = 0x400000;
+	config.len = 0x1000;
+
+	ret = sys_ptrace(PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG, child,
+			 (void *)sizeof(config), &config);
+
+	ASSERT_EQ(ret, 0);
+
+	memset(&config, 1, sizeof(config));
+	ret = sys_ptrace(PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG, child,
+			 (void *)sizeof(config), &config);
+
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(config.mode, PR_SYS_DISPATCH_ON);
+	ASSERT_EQ(config.selector, 0);
+	ASSERT_EQ(config.offset, 0x400000);
+	ASSERT_EQ(config.len, 0x1000);
+
+	kill(child, SIGKILL);
+}
+
+TEST_HARNESS_MAIN
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v14 4/4] selftest,ptrace: Add selftest for syscall user dispatch config api
@ 2023-03-28 16:48   ` Gregory Price
  0 siblings, 0 replies; 44+ messages in thread
From: Gregory Price @ 2023-03-28 16:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-doc, linux-arm-kernel, linux-arch, oleg, avagin, peterz,
	luto, krisman, tglx, corbet, shuah, catalin.marinas, arnd, will,
	mark.rutland, tongtiangen, robin.murphy, Gregory Price

Validate that the following new ptrace requests work as expected

* PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG
  - returns the contents of task->syscall_dispatch if enabled

* PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG
  - sets the contents of task->syscall_dispatch

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 tools/testing/selftests/ptrace/.gitignore    |  1 +
 tools/testing/selftests/ptrace/Makefile      |  2 +-
 tools/testing/selftests/ptrace/get_set_sud.c | 72 ++++++++++++++++++++
 3 files changed, 74 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/ptrace/get_set_sud.c

diff --git a/tools/testing/selftests/ptrace/.gitignore b/tools/testing/selftests/ptrace/.gitignore
index 792318aaa30c..b7dde152e75a 100644
--- a/tools/testing/selftests/ptrace/.gitignore
+++ b/tools/testing/selftests/ptrace/.gitignore
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
 get_syscall_info
+get_set_sud
 peeksiginfo
 vmaccess
diff --git a/tools/testing/selftests/ptrace/Makefile b/tools/testing/selftests/ptrace/Makefile
index 2f1f532c39db..33a36b73bcb9 100644
--- a/tools/testing/selftests/ptrace/Makefile
+++ b/tools/testing/selftests/ptrace/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 CFLAGS += -std=c99 -pthread -iquote../../../../include/uapi -Wall
 
-TEST_GEN_PROGS := get_syscall_info peeksiginfo vmaccess
+TEST_GEN_PROGS := get_syscall_info peeksiginfo vmaccess get_set_sud
 
 include ../lib.mk
diff --git a/tools/testing/selftests/ptrace/get_set_sud.c b/tools/testing/selftests/ptrace/get_set_sud.c
new file mode 100644
index 000000000000..5297b10d25c3
--- /dev/null
+++ b/tools/testing/selftests/ptrace/get_set_sud.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include "../kselftest_harness.h"
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <sys/wait.h>
+#include <sys/syscall.h>
+#include <sys/prctl.h>
+
+#include "linux/ptrace.h"
+
+static int sys_ptrace(int request, pid_t pid, void *addr, void *data)
+{
+	return syscall(SYS_ptrace, request, pid, addr, data);
+}
+
+TEST(get_set_sud)
+{
+	struct ptrace_sud_config config;
+	pid_t child;
+	int ret = 0;
+	int status;
+
+	child = fork();
+	ASSERT_GE(child, 0);
+	if (child == 0) {
+		ASSERT_EQ(0, sys_ptrace(PTRACE_TRACEME, 0, 0, 0)) {
+			TH_LOG("PTRACE_TRACEME: %m");
+		}
+		kill(getpid(), SIGSTOP);
+		_exit(1);
+	}
+
+	waitpid(child, &status, 0);
+
+	memset(&config, 0xff, sizeof(config));
+	config.mode = PR_SYS_DISPATCH_ON;
+
+	ret = sys_ptrace(PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG, child,
+			 (void *)sizeof(config), &config);
+
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(config.mode, PR_SYS_DISPATCH_OFF);
+	ASSERT_EQ(config.selector, 0);
+	ASSERT_EQ(config.offset, 0);
+	ASSERT_EQ(config.len, 0);
+
+	config.mode = PR_SYS_DISPATCH_ON;
+	config.selector = 0;
+	config.offset = 0x400000;
+	config.len = 0x1000;
+
+	ret = sys_ptrace(PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG, child,
+			 (void *)sizeof(config), &config);
+
+	ASSERT_EQ(ret, 0);
+
+	memset(&config, 1, sizeof(config));
+	ret = sys_ptrace(PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG, child,
+			 (void *)sizeof(config), &config);
+
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(config.mode, PR_SYS_DISPATCH_ON);
+	ASSERT_EQ(config.selector, 0);
+	ASSERT_EQ(config.offset, 0x400000);
+	ASSERT_EQ(config.len, 0x1000);
+
+	kill(child, SIGKILL);
+}
+
+TEST_HARNESS_MAIN
-- 
2.39.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
  2023-03-28 16:48   ` Gregory Price
@ 2023-03-28 18:46     ` Arnd Bergmann
  -1 siblings, 0 replies; 44+ messages in thread
From: Arnd Bergmann @ 2023-03-28 18:46 UTC (permalink / raw)
  To: Gregory Price, linux-kernel
  Cc: linux-doc, linux-arm-kernel, Linux-Arch, Oleg Nesterov, avagin,
	Peter Zijlstra, Andy Lutomirski, krisman, Thomas Gleixner,
	Jonathan Corbet, shuah, Catalin Marinas, Will Deacon,
	Mark Rutland, tongtiangen, Robin Murphy, Gregory Price

On Tue, Mar 28, 2023, at 18:48, Gregory Price wrote:
> On arm64, access_ok makes adjustments to pointers based on whether
> memory tagging is enabled for a task (ARM MTE). When leveraging ptrace,
> it's possible for a task to enable/disable various kernel features (such
> as syscall user dispatch) which require user points as arguments.
>
> To enable Task A to set these features via ptrace with Task B's
> pointers, a task variant of access_ok is required for architectures with
> features such as memory tagging.
>
> If the architecture does not implement task_access_ok, the operation
> reduces to access_ok and the task argument is discarded.
>
> Signed-off-by: Gregory Price <gregory.price@memverge.com>

For asm-generic:

Acked-by: Arnd Bergmann <arnd@arndb.de>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
@ 2023-03-28 18:46     ` Arnd Bergmann
  0 siblings, 0 replies; 44+ messages in thread
From: Arnd Bergmann @ 2023-03-28 18:46 UTC (permalink / raw)
  To: Gregory Price, linux-kernel
  Cc: linux-doc, linux-arm-kernel, Linux-Arch, Oleg Nesterov, avagin,
	Peter Zijlstra, Andy Lutomirski, krisman, Thomas Gleixner,
	Jonathan Corbet, shuah, Catalin Marinas, Will Deacon,
	Mark Rutland, tongtiangen, Robin Murphy, Gregory Price

On Tue, Mar 28, 2023, at 18:48, Gregory Price wrote:
> On arm64, access_ok makes adjustments to pointers based on whether
> memory tagging is enabled for a task (ARM MTE). When leveraging ptrace,
> it's possible for a task to enable/disable various kernel features (such
> as syscall user dispatch) which require user points as arguments.
>
> To enable Task A to set these features via ptrace with Task B's
> pointers, a task variant of access_ok is required for architectures with
> features such as memory tagging.
>
> If the architecture does not implement task_access_ok, the operation
> reduces to access_ok and the task argument is discarded.
>
> Signed-off-by: Gregory Price <gregory.price@memverge.com>

For asm-generic:

Acked-by: Arnd Bergmann <arnd@arndb.de>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
  2023-03-29 16:03         ` Oleg Nesterov
@ 2023-03-29  3:56           ` Gregory Price
  -1 siblings, 0 replies; 44+ messages in thread
From: Gregory Price @ 2023-03-29  3:56 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Arnd Bergmann, Gregory Price, linux-kernel, linux-doc,
	linux-arm-kernel, Linux-Arch, avagin, Peter Zijlstra,
	Andy Lutomirski, krisman, Thomas Gleixner, Jonathan Corbet,
	shuah, Catalin Marinas, Will Deacon, Mark Rutland, tongtiangen,
	Robin Murphy

On Wed, Mar 29, 2023 at 06:03:23PM +0200, Oleg Nesterov wrote:
> On 03/29, Arnd Bergmann wrote:
> >
> > On Wed, Mar 29, 2023, at 17:15, Oleg Nesterov wrote:
> > >
> > > This look as if access_ok() or __access_ok() doesn't depend on task, but
> > > this is not true in general. Say, TASK_SIZE_MAX can check is_32bit_task()
> > > test_thread_flag(TIF_32BIT...) and this uses "current".
> > >
> > > Again, we probably do not care, but I don't like the fact task_access_ok()
> > > looks as if task_access_ok(task) returns the same result as "task" calling
> > > access_ok().
> >
> > I think the idea of TASK_SIZE_MAX is that it is a compile-time constant and in fact independent of current, while TASK_SIZE
> > takes TIF_32BIT into account.
> 
> Say, arch/loongarch defines TASK_SIZE which depends on test_thread_flag(TIF_32BIT_ADDR)
> but it doesn't define TASK_SIZE_MAX, so __access_ok() will use TASK_SIZE.
> 
> Oleg.
> 

I did not notice this at first.  Thinking of solutions, I'd originally
considered writing a similar change in asm-generic that I made in arm64,
but that would have ultimately resulted in "(void) task;" because task
appears unused.

Now it seems like TASK_SIZE/_MAX seems like a dangerous define
combination that hides relevant functionality. Fixing this seeems to
naturally want a "TASK_TASK_SIZE(task)" which is... uh... annoying.

Not sure how I should proceed here, but this makes me wonder if there
are oversights like this elsewhere, as this seems like a pretty easy
thing to overlook.

~Gregory

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
@ 2023-03-29  3:56           ` Gregory Price
  0 siblings, 0 replies; 44+ messages in thread
From: Gregory Price @ 2023-03-29  3:56 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Arnd Bergmann, Gregory Price, linux-kernel, linux-doc,
	linux-arm-kernel, Linux-Arch, avagin, Peter Zijlstra,
	Andy Lutomirski, krisman, Thomas Gleixner, Jonathan Corbet,
	shuah, Catalin Marinas, Will Deacon, Mark Rutland, tongtiangen,
	Robin Murphy

On Wed, Mar 29, 2023 at 06:03:23PM +0200, Oleg Nesterov wrote:
> On 03/29, Arnd Bergmann wrote:
> >
> > On Wed, Mar 29, 2023, at 17:15, Oleg Nesterov wrote:
> > >
> > > This look as if access_ok() or __access_ok() doesn't depend on task, but
> > > this is not true in general. Say, TASK_SIZE_MAX can check is_32bit_task()
> > > test_thread_flag(TIF_32BIT...) and this uses "current".
> > >
> > > Again, we probably do not care, but I don't like the fact task_access_ok()
> > > looks as if task_access_ok(task) returns the same result as "task" calling
> > > access_ok().
> >
> > I think the idea of TASK_SIZE_MAX is that it is a compile-time constant and in fact independent of current, while TASK_SIZE
> > takes TIF_32BIT into account.
> 
> Say, arch/loongarch defines TASK_SIZE which depends on test_thread_flag(TIF_32BIT_ADDR)
> but it doesn't define TASK_SIZE_MAX, so __access_ok() will use TASK_SIZE.
> 
> Oleg.
> 

I did not notice this at first.  Thinking of solutions, I'd originally
considered writing a similar change in asm-generic that I made in arm64,
but that would have ultimately resulted in "(void) task;" because task
appears unused.

Now it seems like TASK_SIZE/_MAX seems like a dangerous define
combination that hides relevant functionality. Fixing this seeems to
naturally want a "TASK_TASK_SIZE(task)" which is... uh... annoying.

Not sure how I should proceed here, but this makes me wonder if there
are oversights like this elsewhere, as this seems like a pretty easy
thing to overlook.

~Gregory

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
  2023-03-29 16:22     ` Catalin Marinas
@ 2023-03-29  4:34       ` Gregory Price
  -1 siblings, 0 replies; 44+ messages in thread
From: Gregory Price @ 2023-03-29  4:34 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Gregory Price, linux-kernel, linux-doc, linux-arm-kernel,
	linux-arch, oleg, avagin, peterz, luto, krisman, tglx, corbet,
	shuah, arnd, will, mark.rutland, tongtiangen, robin.murphy

On Wed, Mar 29, 2023 at 05:22:49PM +0100, Catalin Marinas wrote:
> On Tue, Mar 28, 2023 at 12:48:08PM -0400, Gregory Price wrote:
> > diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> > index 5c7b2f9d5913..1a51a54f264f 100644
> > --- a/arch/arm64/include/asm/uaccess.h
> > +++ b/arch/arm64/include/asm/uaccess.h
> > @@ -35,7 +35,9 @@ static inline int __access_ok(const void __user *ptr, unsigned long size);
> >   * This is equivalent to the following test:
> >   * (u65)addr + (u65)size <= (u65)TASK_SIZE_MAX
> >   */
> > -static inline int access_ok(const void __user *addr, unsigned long size)
> > +static inline int task_access_ok(struct task_struct *task,
> > +				 const void __user *addr,
> > +				 unsigned long size)
> >  {
> >  	/*
> >  	 * Asynchronous I/O running in a kernel thread does not have the
> > @@ -43,11 +45,18 @@ static inline int access_ok(const void __user *addr, unsigned long size)
> >  	 * the user address before checking.
> >  	 */
> >  	if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) &&
> > -	    (current->flags & PF_KTHREAD || test_thread_flag(TIF_TAGGED_ADDR)))
> > +	    (task->flags & PF_KTHREAD || test_ti_thread_flag(task, TIF_TAGGED_ADDR)))
> >  		addr = untagged_addr(addr);
> >  
> >  	return likely(__access_ok(addr, size));
> >  }
> > +
> > +static inline int access_ok(const void __user *addr, unsigned long size)
> > +{
> > +	return task_access_ok(current, addr, size);
> > +}
> > +
> > +#define task_access_ok task_access_ok
> 
> I'd not bother with this at all. In the generic code you can either do
> an __access_ok() check directly or just
> access_ok(untagged_addr(selector), ...) with a comment that address
> tagging of the ptraced task may not be enabled.
> 
> -- 
> Catalin

This was my original proposal, but the comment that lead to this patch
was the following:

"""
If this would be correct, then access_ok() on arm64 would
unconditionally untag the checked address, but it does not. Simply
because untagging is only valid if the task enabled pointer tagging. If
it didn't a tagged pointer is obviously invalid.

Why would ptrace make this suddenly valid?
"""

https://lore.kernel.org/all/87a605anvx.ffs@tglx/

I did not have a sufficient answer for this so I went down this path.

It does seem simpler to simply untag the address, however it didn't seem
like a good solution to simply leave an identified bad edge case.

with access_ok(untagged_addr(addr), ...) it breaks down like this:

(tracer,tracee) : result 

tag,tag     : untagged - (correct)
tag,untag   : untagged - incorrect as this would have been an impossible
              state to reach through the standard prctl interface.  Will
	      lead to a SIGSEGV in the tracee upon next syscall
untag,tag   : untagged - (correct)
untag,untag : no-op - (correct), tagged address will fail to set

Basically if the tracer is a tagged process while the tracee is not, it
would become possible to set the tracee's selector to a tagged pointer.

It's beyond me to say whether or not this situation is "ok" and "the
user's fault", but it does feel like an addressable problem.

Thoughts?
~Gregory

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
@ 2023-03-29  4:34       ` Gregory Price
  0 siblings, 0 replies; 44+ messages in thread
From: Gregory Price @ 2023-03-29  4:34 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Gregory Price, linux-kernel, linux-doc, linux-arm-kernel,
	linux-arch, oleg, avagin, peterz, luto, krisman, tglx, corbet,
	shuah, arnd, will, mark.rutland, tongtiangen, robin.murphy

On Wed, Mar 29, 2023 at 05:22:49PM +0100, Catalin Marinas wrote:
> On Tue, Mar 28, 2023 at 12:48:08PM -0400, Gregory Price wrote:
> > diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> > index 5c7b2f9d5913..1a51a54f264f 100644
> > --- a/arch/arm64/include/asm/uaccess.h
> > +++ b/arch/arm64/include/asm/uaccess.h
> > @@ -35,7 +35,9 @@ static inline int __access_ok(const void __user *ptr, unsigned long size);
> >   * This is equivalent to the following test:
> >   * (u65)addr + (u65)size <= (u65)TASK_SIZE_MAX
> >   */
> > -static inline int access_ok(const void __user *addr, unsigned long size)
> > +static inline int task_access_ok(struct task_struct *task,
> > +				 const void __user *addr,
> > +				 unsigned long size)
> >  {
> >  	/*
> >  	 * Asynchronous I/O running in a kernel thread does not have the
> > @@ -43,11 +45,18 @@ static inline int access_ok(const void __user *addr, unsigned long size)
> >  	 * the user address before checking.
> >  	 */
> >  	if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) &&
> > -	    (current->flags & PF_KTHREAD || test_thread_flag(TIF_TAGGED_ADDR)))
> > +	    (task->flags & PF_KTHREAD || test_ti_thread_flag(task, TIF_TAGGED_ADDR)))
> >  		addr = untagged_addr(addr);
> >  
> >  	return likely(__access_ok(addr, size));
> >  }
> > +
> > +static inline int access_ok(const void __user *addr, unsigned long size)
> > +{
> > +	return task_access_ok(current, addr, size);
> > +}
> > +
> > +#define task_access_ok task_access_ok
> 
> I'd not bother with this at all. In the generic code you can either do
> an __access_ok() check directly or just
> access_ok(untagged_addr(selector), ...) with a comment that address
> tagging of the ptraced task may not be enabled.
> 
> -- 
> Catalin

This was my original proposal, but the comment that lead to this patch
was the following:

"""
If this would be correct, then access_ok() on arm64 would
unconditionally untag the checked address, but it does not. Simply
because untagging is only valid if the task enabled pointer tagging. If
it didn't a tagged pointer is obviously invalid.

Why would ptrace make this suddenly valid?
"""

https://lore.kernel.org/all/87a605anvx.ffs@tglx/

I did not have a sufficient answer for this so I went down this path.

It does seem simpler to simply untag the address, however it didn't seem
like a good solution to simply leave an identified bad edge case.

with access_ok(untagged_addr(addr), ...) it breaks down like this:

(tracer,tracee) : result 

tag,tag     : untagged - (correct)
tag,untag   : untagged - incorrect as this would have been an impossible
              state to reach through the standard prctl interface.  Will
	      lead to a SIGSEGV in the tracee upon next syscall
untag,tag   : untagged - (correct)
untag,untag : no-op - (correct), tagged address will fail to set

Basically if the tracer is a tagged process while the tracee is not, it
would become possible to set the tracee's selector to a tagged pointer.

It's beyond me to say whether or not this situation is "ok" and "the
user's fault", but it does feel like an addressable problem.

Thoughts?
~Gregory

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
  2023-03-29 17:13             ` Oleg Nesterov
@ 2023-03-29  5:37               ` Gregory Price
  -1 siblings, 0 replies; 44+ messages in thread
From: Gregory Price @ 2023-03-29  5:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Arnd Bergmann, Gregory Price, linux-kernel, linux-doc,
	linux-arm-kernel, Linux-Arch, avagin, Peter Zijlstra,
	Andy Lutomirski, krisman, Thomas Gleixner, Jonathan Corbet,
	shuah, Catalin Marinas, Will Deacon, Mark Rutland, tongtiangen,
	Robin Murphy

On Wed, Mar 29, 2023 at 07:13:22PM +0200, Oleg Nesterov wrote:
> On 03/28, Gregory Price wrote:
> >
> > Not sure how I should proceed here,
> 
> Can't we just kill this access_ok() in set_syscall_user_dispatch() ?
> I don't think it buys too much.
> 
> Oleg.
> 
> diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
> index 0b6379adff6b..d2e516ece52b 100644
> --- a/kernel/entry/syscall_user_dispatch.c
> +++ b/kernel/entry/syscall_user_dispatch.c
> @@ -43,11 +43,7 @@ bool syscall_user_dispatch(struct pt_regs *regs)
>  		return false;
>  
>  	if (likely(sd->selector)) {
> -		/*
> -		 * access_ok() is performed once, at prctl time, when
> -		 * the selector is loaded by userspace.
> -		 */
> -		if (unlikely(__get_user(state, sd->selector))) {
> +		if (unlikely(get_user(state, sd->selector))) {
>  			force_exit_sig(SIGSEGV);
>  			return true;
>  		}
> @@ -86,9 +82,6 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
>  		if (offset && offset + len <= offset)
>  			return -EINVAL;
>  
> -		if (selector && !access_ok(selector, sizeof(*selector)))
> -			return -EFAULT;
> -
>  		break;
>  	default:
>  		return -EINVAL;
> 

The result of this would be either a task calling via prctl or a tracer
calling via ptrace would be capable of setting selector to a bad pointer
and producing a SIGSEGV on the next system call.

It's a pretty small footgun, but maybe that's reasonable?

From a user perspective, debugging this behavior would be nightmarish.
Your call to prctl/ptrace would succeed and the process would continue
to execute until the next syscall - at which point you incur a SIGSEGV,
and depending on the syscall that could mean anything?

Everything feels bad here lol.

~Gregory

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
@ 2023-03-29  5:37               ` Gregory Price
  0 siblings, 0 replies; 44+ messages in thread
From: Gregory Price @ 2023-03-29  5:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Arnd Bergmann, Gregory Price, linux-kernel, linux-doc,
	linux-arm-kernel, Linux-Arch, avagin, Peter Zijlstra,
	Andy Lutomirski, krisman, Thomas Gleixner, Jonathan Corbet,
	shuah, Catalin Marinas, Will Deacon, Mark Rutland, tongtiangen,
	Robin Murphy

On Wed, Mar 29, 2023 at 07:13:22PM +0200, Oleg Nesterov wrote:
> On 03/28, Gregory Price wrote:
> >
> > Not sure how I should proceed here,
> 
> Can't we just kill this access_ok() in set_syscall_user_dispatch() ?
> I don't think it buys too much.
> 
> Oleg.
> 
> diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
> index 0b6379adff6b..d2e516ece52b 100644
> --- a/kernel/entry/syscall_user_dispatch.c
> +++ b/kernel/entry/syscall_user_dispatch.c
> @@ -43,11 +43,7 @@ bool syscall_user_dispatch(struct pt_regs *regs)
>  		return false;
>  
>  	if (likely(sd->selector)) {
> -		/*
> -		 * access_ok() is performed once, at prctl time, when
> -		 * the selector is loaded by userspace.
> -		 */
> -		if (unlikely(__get_user(state, sd->selector))) {
> +		if (unlikely(get_user(state, sd->selector))) {
>  			force_exit_sig(SIGSEGV);
>  			return true;
>  		}
> @@ -86,9 +82,6 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
>  		if (offset && offset + len <= offset)
>  			return -EINVAL;
>  
> -		if (selector && !access_ok(selector, sizeof(*selector)))
> -			return -EFAULT;
> -
>  		break;
>  	default:
>  		return -EINVAL;
> 

The result of this would be either a task calling via prctl or a tracer
calling via ptrace would be capable of setting selector to a bad pointer
and producing a SIGSEGV on the next system call.

It's a pretty small footgun, but maybe that's reasonable?

From a user perspective, debugging this behavior would be nightmarish.
Your call to prctl/ptrace would succeed and the process would continue
to execute until the next syscall - at which point you incur a SIGSEGV,
and depending on the syscall that could mean anything?

Everything feels bad here lol.

~Gregory

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
  2023-03-29 17:58                 ` Oleg Nesterov
@ 2023-03-29  5:54                   ` Gregory Price
  -1 siblings, 0 replies; 44+ messages in thread
From: Gregory Price @ 2023-03-29  5:54 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Arnd Bergmann, Gregory Price, linux-kernel, linux-doc,
	linux-arm-kernel, Linux-Arch, avagin, Peter Zijlstra,
	Andy Lutomirski, krisman, Thomas Gleixner, Jonathan Corbet,
	shuah, Catalin Marinas, Will Deacon, Mark Rutland, tongtiangen,
	Robin Murphy

On Wed, Mar 29, 2023 at 07:58:51PM +0200, Oleg Nesterov wrote:
> On 03/29, Gregory Price wrote:
> >
> > On Wed, Mar 29, 2023 at 07:13:22PM +0200, Oleg Nesterov wrote:
> > >
> > > -		if (selector && !access_ok(selector, sizeof(*selector)))
> > > -			return -EFAULT;
> > > -
> > >  		break;
> > >  	default:
> > >  		return -EINVAL;
> > >
> >
> > The result of this would be either a task calling via prctl or a tracer
> > calling via ptrace would be capable of setting selector to a bad pointer
> > and producing a SIGSEGV on the next system call.
> 
> Yes,
> 
> > It's a pretty small footgun, but maybe that's reasonable?
> 
> I hope this is reasonable,
> 
> > From a user perspective, debugging this behavior would be nightmarish.
> > Your call to prctl/ptrace would succeed and the process would continue
> > to execute until the next syscall - at which point you incur a SIGSEGV,
> 
> Yes. But how does this differ from the case when, for example, user
> does prtcl(PR_SET_SYSCALL_USER_DISPATCH, selector = 1) ? Or another
> bad address < TASK_SIZE?
> 
> access_ok() will happily succeed, then later syscall_user_dispatch()
> will equally trigger SIGSEGV.
> 
> Oleg.
> 

I'm convinced now, this feels like the correct solution.  I will pull
your suggested patch ahead and drop the task variant of access_ok.

Am I ok to add your signed-off-by to the suggested patch, and i'll add
it to the series?  Not quite sure what the correct set of tags is,
since i don't have any suggested changes to your patch.

~Gregory

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
@ 2023-03-29  5:54                   ` Gregory Price
  0 siblings, 0 replies; 44+ messages in thread
From: Gregory Price @ 2023-03-29  5:54 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Arnd Bergmann, Gregory Price, linux-kernel, linux-doc,
	linux-arm-kernel, Linux-Arch, avagin, Peter Zijlstra,
	Andy Lutomirski, krisman, Thomas Gleixner, Jonathan Corbet,
	shuah, Catalin Marinas, Will Deacon, Mark Rutland, tongtiangen,
	Robin Murphy

On Wed, Mar 29, 2023 at 07:58:51PM +0200, Oleg Nesterov wrote:
> On 03/29, Gregory Price wrote:
> >
> > On Wed, Mar 29, 2023 at 07:13:22PM +0200, Oleg Nesterov wrote:
> > >
> > > -		if (selector && !access_ok(selector, sizeof(*selector)))
> > > -			return -EFAULT;
> > > -
> > >  		break;
> > >  	default:
> > >  		return -EINVAL;
> > >
> >
> > The result of this would be either a task calling via prctl or a tracer
> > calling via ptrace would be capable of setting selector to a bad pointer
> > and producing a SIGSEGV on the next system call.
> 
> Yes,
> 
> > It's a pretty small footgun, but maybe that's reasonable?
> 
> I hope this is reasonable,
> 
> > From a user perspective, debugging this behavior would be nightmarish.
> > Your call to prctl/ptrace would succeed and the process would continue
> > to execute until the next syscall - at which point you incur a SIGSEGV,
> 
> Yes. But how does this differ from the case when, for example, user
> does prtcl(PR_SET_SYSCALL_USER_DISPATCH, selector = 1) ? Or another
> bad address < TASK_SIZE?
> 
> access_ok() will happily succeed, then later syscall_user_dispatch()
> will equally trigger SIGSEGV.
> 
> Oleg.
> 

I'm convinced now, this feels like the correct solution.  I will pull
your suggested patch ahead and drop the task variant of access_ok.

Am I ok to add your signed-off-by to the suggested patch, and i'll add
it to the series?  Not quite sure what the correct set of tags is,
since i don't have any suggested changes to your patch.

~Gregory

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
  2023-03-29 17:58                 ` Oleg Nesterov
@ 2023-03-29 10:02                   ` Gregory Price
  -1 siblings, 0 replies; 44+ messages in thread
From: Gregory Price @ 2023-03-29 10:02 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Arnd Bergmann, Gregory Price, linux-kernel, linux-doc,
	linux-arm-kernel, Linux-Arch, avagin, Peter Zijlstra,
	Andy Lutomirski, krisman, Thomas Gleixner, Jonathan Corbet,
	shuah, Catalin Marinas, Will Deacon, Mark Rutland, tongtiangen,
	Robin Murphy

On Wed, Mar 29, 2023 at 07:58:51PM +0200, Oleg Nesterov wrote:
> On 03/29, Gregory Price wrote:
> >
> > On Wed, Mar 29, 2023 at 07:13:22PM +0200, Oleg Nesterov wrote:
> > >
> > > -		if (selector && !access_ok(selector, sizeof(*selector)))
> > > -			return -EFAULT;
> > > -
> > >  		break;
> > >  	default:
> > >  		return -EINVAL;
> > >
> >
> > The result of this would be either a task calling via prctl or a tracer
> > calling via ptrace would be capable of setting selector to a bad pointer
> > and producing a SIGSEGV on the next system call.
> 
> Yes,
> 
> > It's a pretty small footgun, but maybe that's reasonable?
> 
> I hope this is reasonable,
> 
> > From a user perspective, debugging this behavior would be nightmarish.
> > Your call to prctl/ptrace would succeed and the process would continue
> > to execute until the next syscall - at which point you incur a SIGSEGV,
> 
> Yes. But how does this differ from the case when, for example, user
> does prtcl(PR_SET_SYSCALL_USER_DISPATCH, selector = 1) ? Or another
> bad address < TASK_SIZE?
> 
> access_ok() will happily succeed, then later syscall_user_dispatch()
> will equally trigger SIGSEGV.
> 
> Oleg.
> 

Last note on this before I push up another patch set.

The change from __get_user to get_user also introduces a call to
might_fault() which adds a larger callstack for every syscall /
dispatch.  This turns into a might_sleep and might_reschedule, which
represent a very different pattern of execution from before.

At the very least, syscall-user-dispatch will be less performant as the
selector is validated on every syscall.  I have to assume that is why
they chose to validate it upon activating SUD - to avoid the overhead.

The current cost of a dispatch is about 3-5us (2 context switches + the
signal system).  This could be a small amount of overhead comparatively.
However, this additional overhead would apply to ALL system calls,
regardless of whether they dispatch or not.  That seems concerning for
syscall hotpath code.


So given this, the three options I presently see available are:

1) drop access_ok on SUD setup, validate the pointer on every syscall
   with get_user instead of __get user, or

2) create task_access_ok and deal with the TASK_SIZE implications (or
   not? there seems to be some argument for and against)

3) indescriminately untag all pointers and allow tracers to set selector
   to what otherwise would be a bad value in a particuarly degenerate
   case.  (There seems to be some argument for/against this?)

Will leave this for comment for a day or so before I push another set.

Personally I fall on the side of untagging the pointer for the access_ok
check, as its only real effect is the situation where a tracer has tagging
and enabled, and is tracing an untagged task.  That seems extremely narrow,
and not particularly realistic, and the result is the tracee firing a
SIGSEGV - which is equivalent to allowing the pointer being invalid in
the first place without the additional overhead.

~Gregory

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
@ 2023-03-29 10:02                   ` Gregory Price
  0 siblings, 0 replies; 44+ messages in thread
From: Gregory Price @ 2023-03-29 10:02 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Arnd Bergmann, Gregory Price, linux-kernel, linux-doc,
	linux-arm-kernel, Linux-Arch, avagin, Peter Zijlstra,
	Andy Lutomirski, krisman, Thomas Gleixner, Jonathan Corbet,
	shuah, Catalin Marinas, Will Deacon, Mark Rutland, tongtiangen,
	Robin Murphy

On Wed, Mar 29, 2023 at 07:58:51PM +0200, Oleg Nesterov wrote:
> On 03/29, Gregory Price wrote:
> >
> > On Wed, Mar 29, 2023 at 07:13:22PM +0200, Oleg Nesterov wrote:
> > >
> > > -		if (selector && !access_ok(selector, sizeof(*selector)))
> > > -			return -EFAULT;
> > > -
> > >  		break;
> > >  	default:
> > >  		return -EINVAL;
> > >
> >
> > The result of this would be either a task calling via prctl or a tracer
> > calling via ptrace would be capable of setting selector to a bad pointer
> > and producing a SIGSEGV on the next system call.
> 
> Yes,
> 
> > It's a pretty small footgun, but maybe that's reasonable?
> 
> I hope this is reasonable,
> 
> > From a user perspective, debugging this behavior would be nightmarish.
> > Your call to prctl/ptrace would succeed and the process would continue
> > to execute until the next syscall - at which point you incur a SIGSEGV,
> 
> Yes. But how does this differ from the case when, for example, user
> does prtcl(PR_SET_SYSCALL_USER_DISPATCH, selector = 1) ? Or another
> bad address < TASK_SIZE?
> 
> access_ok() will happily succeed, then later syscall_user_dispatch()
> will equally trigger SIGSEGV.
> 
> Oleg.
> 

Last note on this before I push up another patch set.

The change from __get_user to get_user also introduces a call to
might_fault() which adds a larger callstack for every syscall /
dispatch.  This turns into a might_sleep and might_reschedule, which
represent a very different pattern of execution from before.

At the very least, syscall-user-dispatch will be less performant as the
selector is validated on every syscall.  I have to assume that is why
they chose to validate it upon activating SUD - to avoid the overhead.

The current cost of a dispatch is about 3-5us (2 context switches + the
signal system).  This could be a small amount of overhead comparatively.
However, this additional overhead would apply to ALL system calls,
regardless of whether they dispatch or not.  That seems concerning for
syscall hotpath code.


So given this, the three options I presently see available are:

1) drop access_ok on SUD setup, validate the pointer on every syscall
   with get_user instead of __get user, or

2) create task_access_ok and deal with the TASK_SIZE implications (or
   not? there seems to be some argument for and against)

3) indescriminately untag all pointers and allow tracers to set selector
   to what otherwise would be a bad value in a particuarly degenerate
   case.  (There seems to be some argument for/against this?)

Will leave this for comment for a day or so before I push another set.

Personally I fall on the side of untagging the pointer for the access_ok
check, as its only real effect is the situation where a tracer has tagging
and enabled, and is tracing an untagged task.  That seems extremely narrow,
and not particularly realistic, and the result is the tracee firing a
SIGSEGV - which is equivalent to allowing the pointer being invalid in
the first place without the additional overhead.

~Gregory

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
  2023-03-28 16:48   ` Gregory Price
@ 2023-03-29 15:15     ` Oleg Nesterov
  -1 siblings, 0 replies; 44+ messages in thread
From: Oleg Nesterov @ 2023-03-29 15:15 UTC (permalink / raw)
  To: Gregory Price
  Cc: linux-kernel, linux-doc, linux-arm-kernel, linux-arch, avagin,
	peterz, luto, krisman, tglx, corbet, shuah, catalin.marinas,
	arnd, will, mark.rutland, tongtiangen, robin.murphy,
	Gregory Price

Hmm. I am not comfortable with this change...

I won't really argue because I don't have a better solution and because
I think we don't really care as long as task_set_syscall_user_dispatch()
is the only user of task_access_ok(), but still...

OK, so this version changes set_syscall_user_dispatch() to use
task_access_ok() instead of access_ok() because task != current.

On 03/28, Gregory Price wrote:
>
> If the architecture does not implement task_access_ok, the operation
> reduces to access_ok and the task argument is discarded.

No, with this patch it reduces to __access_ok(). And this already doesn't
look very good to me, but this is minor.

> --- a/include/asm-generic/access_ok.h
> +++ b/include/asm-generic/access_ok.h
> @@ -45,4 +45,14 @@ static inline int __access_ok(const void __user *ptr, unsigned long size)
>  #define access_ok(addr, size) likely(__access_ok(addr, size))
>  #endif
>
> +/*
> + * Some architectures may have special features (such as ARM MTE)
> + * that require handling if access_ok is called on a pointer from one
> + * task in the context of another.  On most architectures this operation
> + * is equivalent to simply __access_ok.
> + */
> +#ifndef task_access_ok
> +#define task_access_ok(task, addr, size) likely(__access_ok(addr, size))
> +#endif

Lets ignore arm64.

This look as if access_ok() or __access_ok() doesn't depend on task, but
this is not true in general. Say, TASK_SIZE_MAX can check is_32bit_task()
test_thread_flag(TIF_32BIT...) and this uses "current".

Again, we probably do not care, but I don't like the fact task_access_ok()
looks as if task_access_ok(task) returns the same result as "task" calling
access_ok().

Oleg.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
@ 2023-03-29 15:15     ` Oleg Nesterov
  0 siblings, 0 replies; 44+ messages in thread
From: Oleg Nesterov @ 2023-03-29 15:15 UTC (permalink / raw)
  To: Gregory Price
  Cc: linux-kernel, linux-doc, linux-arm-kernel, linux-arch, avagin,
	peterz, luto, krisman, tglx, corbet, shuah, catalin.marinas,
	arnd, will, mark.rutland, tongtiangen, robin.murphy,
	Gregory Price

Hmm. I am not comfortable with this change...

I won't really argue because I don't have a better solution and because
I think we don't really care as long as task_set_syscall_user_dispatch()
is the only user of task_access_ok(), but still...

OK, so this version changes set_syscall_user_dispatch() to use
task_access_ok() instead of access_ok() because task != current.

On 03/28, Gregory Price wrote:
>
> If the architecture does not implement task_access_ok, the operation
> reduces to access_ok and the task argument is discarded.

No, with this patch it reduces to __access_ok(). And this already doesn't
look very good to me, but this is minor.

> --- a/include/asm-generic/access_ok.h
> +++ b/include/asm-generic/access_ok.h
> @@ -45,4 +45,14 @@ static inline int __access_ok(const void __user *ptr, unsigned long size)
>  #define access_ok(addr, size) likely(__access_ok(addr, size))
>  #endif
>
> +/*
> + * Some architectures may have special features (such as ARM MTE)
> + * that require handling if access_ok is called on a pointer from one
> + * task in the context of another.  On most architectures this operation
> + * is equivalent to simply __access_ok.
> + */
> +#ifndef task_access_ok
> +#define task_access_ok(task, addr, size) likely(__access_ok(addr, size))
> +#endif

Lets ignore arm64.

This look as if access_ok() or __access_ok() doesn't depend on task, but
this is not true in general. Say, TASK_SIZE_MAX can check is_32bit_task()
test_thread_flag(TIF_32BIT...) and this uses "current".

Again, we probably do not care, but I don't like the fact task_access_ok()
looks as if task_access_ok(task) returns the same result as "task" calling
access_ok().

Oleg.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
  2023-03-29 15:15     ` Oleg Nesterov
@ 2023-03-29 15:41       ` Arnd Bergmann
  -1 siblings, 0 replies; 44+ messages in thread
From: Arnd Bergmann @ 2023-03-29 15:41 UTC (permalink / raw)
  To: Oleg Nesterov, Gregory Price
  Cc: linux-kernel, linux-doc, linux-arm-kernel, Linux-Arch, avagin,
	Peter Zijlstra, Andy Lutomirski, krisman, Thomas Gleixner,
	Jonathan Corbet, shuah, Catalin Marinas, Will Deacon,
	Mark Rutland, tongtiangen, Robin Murphy, Gregory Price

On Wed, Mar 29, 2023, at 17:15, Oleg Nesterov wrote:
>
> This look as if access_ok() or __access_ok() doesn't depend on task, but
> this is not true in general. Say, TASK_SIZE_MAX can check is_32bit_task()
> test_thread_flag(TIF_32BIT...) and this uses "current".
>
> Again, we probably do not care, but I don't like the fact task_access_ok()
> looks as if task_access_ok(task) returns the same result as "task" calling
> access_ok().

I think the idea of TASK_SIZE_MAX is that it is a compile-time constant and in fact independent of current, while TASK_SIZE
takes TIF_32BIT into account.

     Arnd

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
@ 2023-03-29 15:41       ` Arnd Bergmann
  0 siblings, 0 replies; 44+ messages in thread
From: Arnd Bergmann @ 2023-03-29 15:41 UTC (permalink / raw)
  To: Oleg Nesterov, Gregory Price
  Cc: linux-kernel, linux-doc, linux-arm-kernel, Linux-Arch, avagin,
	Peter Zijlstra, Andy Lutomirski, krisman, Thomas Gleixner,
	Jonathan Corbet, shuah, Catalin Marinas, Will Deacon,
	Mark Rutland, tongtiangen, Robin Murphy, Gregory Price

On Wed, Mar 29, 2023, at 17:15, Oleg Nesterov wrote:
>
> This look as if access_ok() or __access_ok() doesn't depend on task, but
> this is not true in general. Say, TASK_SIZE_MAX can check is_32bit_task()
> test_thread_flag(TIF_32BIT...) and this uses "current".
>
> Again, we probably do not care, but I don't like the fact task_access_ok()
> looks as if task_access_ok(task) returns the same result as "task" calling
> access_ok().

I think the idea of TASK_SIZE_MAX is that it is a compile-time constant and in fact independent of current, while TASK_SIZE
takes TIF_32BIT into account.

     Arnd

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
  2023-03-29 15:41       ` Arnd Bergmann
@ 2023-03-29 16:03         ` Oleg Nesterov
  -1 siblings, 0 replies; 44+ messages in thread
From: Oleg Nesterov @ 2023-03-29 16:03 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Gregory Price, linux-kernel, linux-doc, linux-arm-kernel,
	Linux-Arch, avagin, Peter Zijlstra, Andy Lutomirski, krisman,
	Thomas Gleixner, Jonathan Corbet, shuah, Catalin Marinas,
	Will Deacon, Mark Rutland, tongtiangen, Robin Murphy,
	Gregory Price

On 03/29, Arnd Bergmann wrote:
>
> On Wed, Mar 29, 2023, at 17:15, Oleg Nesterov wrote:
> >
> > This look as if access_ok() or __access_ok() doesn't depend on task, but
> > this is not true in general. Say, TASK_SIZE_MAX can check is_32bit_task()
> > test_thread_flag(TIF_32BIT...) and this uses "current".
> >
> > Again, we probably do not care, but I don't like the fact task_access_ok()
> > looks as if task_access_ok(task) returns the same result as "task" calling
> > access_ok().
>
> I think the idea of TASK_SIZE_MAX is that it is a compile-time constant and in fact independent of current, while TASK_SIZE
> takes TIF_32BIT into account.

Say, arch/loongarch defines TASK_SIZE which depends on test_thread_flag(TIF_32BIT_ADDR)
but it doesn't define TASK_SIZE_MAX, so __access_ok() will use TASK_SIZE.

Oleg.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
@ 2023-03-29 16:03         ` Oleg Nesterov
  0 siblings, 0 replies; 44+ messages in thread
From: Oleg Nesterov @ 2023-03-29 16:03 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Gregory Price, linux-kernel, linux-doc, linux-arm-kernel,
	Linux-Arch, avagin, Peter Zijlstra, Andy Lutomirski, krisman,
	Thomas Gleixner, Jonathan Corbet, shuah, Catalin Marinas,
	Will Deacon, Mark Rutland, tongtiangen, Robin Murphy,
	Gregory Price

On 03/29, Arnd Bergmann wrote:
>
> On Wed, Mar 29, 2023, at 17:15, Oleg Nesterov wrote:
> >
> > This look as if access_ok() or __access_ok() doesn't depend on task, but
> > this is not true in general. Say, TASK_SIZE_MAX can check is_32bit_task()
> > test_thread_flag(TIF_32BIT...) and this uses "current".
> >
> > Again, we probably do not care, but I don't like the fact task_access_ok()
> > looks as if task_access_ok(task) returns the same result as "task" calling
> > access_ok().
>
> I think the idea of TASK_SIZE_MAX is that it is a compile-time constant and in fact independent of current, while TASK_SIZE
> takes TIF_32BIT into account.

Say, arch/loongarch defines TASK_SIZE which depends on test_thread_flag(TIF_32BIT_ADDR)
but it doesn't define TASK_SIZE_MAX, so __access_ok() will use TASK_SIZE.

Oleg.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
  2023-03-28 16:48   ` Gregory Price
@ 2023-03-29 16:22     ` Catalin Marinas
  -1 siblings, 0 replies; 44+ messages in thread
From: Catalin Marinas @ 2023-03-29 16:22 UTC (permalink / raw)
  To: Gregory Price
  Cc: linux-kernel, linux-doc, linux-arm-kernel, linux-arch, oleg,
	avagin, peterz, luto, krisman, tglx, corbet, shuah, arnd, will,
	mark.rutland, tongtiangen, robin.murphy, Gregory Price

On Tue, Mar 28, 2023 at 12:48:08PM -0400, Gregory Price wrote:
> diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> index 5c7b2f9d5913..1a51a54f264f 100644
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@ -35,7 +35,9 @@ static inline int __access_ok(const void __user *ptr, unsigned long size);
>   * This is equivalent to the following test:
>   * (u65)addr + (u65)size <= (u65)TASK_SIZE_MAX
>   */
> -static inline int access_ok(const void __user *addr, unsigned long size)
> +static inline int task_access_ok(struct task_struct *task,
> +				 const void __user *addr,
> +				 unsigned long size)
>  {
>  	/*
>  	 * Asynchronous I/O running in a kernel thread does not have the
> @@ -43,11 +45,18 @@ static inline int access_ok(const void __user *addr, unsigned long size)
>  	 * the user address before checking.
>  	 */
>  	if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) &&
> -	    (current->flags & PF_KTHREAD || test_thread_flag(TIF_TAGGED_ADDR)))
> +	    (task->flags & PF_KTHREAD || test_ti_thread_flag(task, TIF_TAGGED_ADDR)))
>  		addr = untagged_addr(addr);
>  
>  	return likely(__access_ok(addr, size));
>  }
> +
> +static inline int access_ok(const void __user *addr, unsigned long size)
> +{
> +	return task_access_ok(current, addr, size);
> +}
> +
> +#define task_access_ok task_access_ok

I'd not bother with this at all. In the generic code you can either do
an __access_ok() check directly or just
access_ok(untagged_addr(selector), ...) with a comment that address
tagging of the ptraced task may not be enabled.

-- 
Catalin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
@ 2023-03-29 16:22     ` Catalin Marinas
  0 siblings, 0 replies; 44+ messages in thread
From: Catalin Marinas @ 2023-03-29 16:22 UTC (permalink / raw)
  To: Gregory Price
  Cc: linux-kernel, linux-doc, linux-arm-kernel, linux-arch, oleg,
	avagin, peterz, luto, krisman, tglx, corbet, shuah, arnd, will,
	mark.rutland, tongtiangen, robin.murphy, Gregory Price

On Tue, Mar 28, 2023 at 12:48:08PM -0400, Gregory Price wrote:
> diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> index 5c7b2f9d5913..1a51a54f264f 100644
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@ -35,7 +35,9 @@ static inline int __access_ok(const void __user *ptr, unsigned long size);
>   * This is equivalent to the following test:
>   * (u65)addr + (u65)size <= (u65)TASK_SIZE_MAX
>   */
> -static inline int access_ok(const void __user *addr, unsigned long size)
> +static inline int task_access_ok(struct task_struct *task,
> +				 const void __user *addr,
> +				 unsigned long size)
>  {
>  	/*
>  	 * Asynchronous I/O running in a kernel thread does not have the
> @@ -43,11 +45,18 @@ static inline int access_ok(const void __user *addr, unsigned long size)
>  	 * the user address before checking.
>  	 */
>  	if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) &&
> -	    (current->flags & PF_KTHREAD || test_thread_flag(TIF_TAGGED_ADDR)))
> +	    (task->flags & PF_KTHREAD || test_ti_thread_flag(task, TIF_TAGGED_ADDR)))
>  		addr = untagged_addr(addr);
>  
>  	return likely(__access_ok(addr, size));
>  }
> +
> +static inline int access_ok(const void __user *addr, unsigned long size)
> +{
> +	return task_access_ok(current, addr, size);
> +}
> +
> +#define task_access_ok task_access_ok

I'd not bother with this at all. In the generic code you can either do
an __access_ok() check directly or just
access_ok(untagged_addr(selector), ...) with a comment that address
tagging of the ptraced task may not be enabled.

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
  2023-03-29  3:56           ` Gregory Price
@ 2023-03-29 17:13             ` Oleg Nesterov
  -1 siblings, 0 replies; 44+ messages in thread
From: Oleg Nesterov @ 2023-03-29 17:13 UTC (permalink / raw)
  To: Gregory Price
  Cc: Arnd Bergmann, Gregory Price, linux-kernel, linux-doc,
	linux-arm-kernel, Linux-Arch, avagin, Peter Zijlstra,
	Andy Lutomirski, krisman, Thomas Gleixner, Jonathan Corbet,
	shuah, Catalin Marinas, Will Deacon, Mark Rutland, tongtiangen,
	Robin Murphy

On 03/28, Gregory Price wrote:
>
> Not sure how I should proceed here,

Can't we just kill this access_ok() in set_syscall_user_dispatch() ?
I don't think it buys too much.

Oleg.

diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index 0b6379adff6b..d2e516ece52b 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -43,11 +43,7 @@ bool syscall_user_dispatch(struct pt_regs *regs)
 		return false;
 
 	if (likely(sd->selector)) {
-		/*
-		 * access_ok() is performed once, at prctl time, when
-		 * the selector is loaded by userspace.
-		 */
-		if (unlikely(__get_user(state, sd->selector))) {
+		if (unlikely(get_user(state, sd->selector))) {
 			force_exit_sig(SIGSEGV);
 			return true;
 		}
@@ -86,9 +82,6 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
 		if (offset && offset + len <= offset)
 			return -EINVAL;
 
-		if (selector && !access_ok(selector, sizeof(*selector)))
-			return -EFAULT;
-
 		break;
 	default:
 		return -EINVAL;


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
@ 2023-03-29 17:13             ` Oleg Nesterov
  0 siblings, 0 replies; 44+ messages in thread
From: Oleg Nesterov @ 2023-03-29 17:13 UTC (permalink / raw)
  To: Gregory Price
  Cc: Arnd Bergmann, Gregory Price, linux-kernel, linux-doc,
	linux-arm-kernel, Linux-Arch, avagin, Peter Zijlstra,
	Andy Lutomirski, krisman, Thomas Gleixner, Jonathan Corbet,
	shuah, Catalin Marinas, Will Deacon, Mark Rutland, tongtiangen,
	Robin Murphy

On 03/28, Gregory Price wrote:
>
> Not sure how I should proceed here,

Can't we just kill this access_ok() in set_syscall_user_dispatch() ?
I don't think it buys too much.

Oleg.

diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index 0b6379adff6b..d2e516ece52b 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -43,11 +43,7 @@ bool syscall_user_dispatch(struct pt_regs *regs)
 		return false;
 
 	if (likely(sd->selector)) {
-		/*
-		 * access_ok() is performed once, at prctl time, when
-		 * the selector is loaded by userspace.
-		 */
-		if (unlikely(__get_user(state, sd->selector))) {
+		if (unlikely(get_user(state, sd->selector))) {
 			force_exit_sig(SIGSEGV);
 			return true;
 		}
@@ -86,9 +82,6 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
 		if (offset && offset + len <= offset)
 			return -EINVAL;
 
-		if (selector && !access_ok(selector, sizeof(*selector)))
-			return -EFAULT;
-
 		break;
 	default:
 		return -EINVAL;


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
  2023-03-29  5:37               ` Gregory Price
@ 2023-03-29 17:58                 ` Oleg Nesterov
  -1 siblings, 0 replies; 44+ messages in thread
From: Oleg Nesterov @ 2023-03-29 17:58 UTC (permalink / raw)
  To: Gregory Price
  Cc: Arnd Bergmann, Gregory Price, linux-kernel, linux-doc,
	linux-arm-kernel, Linux-Arch, avagin, Peter Zijlstra,
	Andy Lutomirski, krisman, Thomas Gleixner, Jonathan Corbet,
	shuah, Catalin Marinas, Will Deacon, Mark Rutland, tongtiangen,
	Robin Murphy

On 03/29, Gregory Price wrote:
>
> On Wed, Mar 29, 2023 at 07:13:22PM +0200, Oleg Nesterov wrote:
> >
> > -		if (selector && !access_ok(selector, sizeof(*selector)))
> > -			return -EFAULT;
> > -
> >  		break;
> >  	default:
> >  		return -EINVAL;
> >
>
> The result of this would be either a task calling via prctl or a tracer
> calling via ptrace would be capable of setting selector to a bad pointer
> and producing a SIGSEGV on the next system call.

Yes,

> It's a pretty small footgun, but maybe that's reasonable?

I hope this is reasonable,

> From a user perspective, debugging this behavior would be nightmarish.
> Your call to prctl/ptrace would succeed and the process would continue
> to execute until the next syscall - at which point you incur a SIGSEGV,

Yes. But how does this differ from the case when, for example, user
does prtcl(PR_SET_SYSCALL_USER_DISPATCH, selector = 1) ? Or another
bad address < TASK_SIZE?

access_ok() will happily succeed, then later syscall_user_dispatch()
will equally trigger SIGSEGV.

Oleg.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
@ 2023-03-29 17:58                 ` Oleg Nesterov
  0 siblings, 0 replies; 44+ messages in thread
From: Oleg Nesterov @ 2023-03-29 17:58 UTC (permalink / raw)
  To: Gregory Price
  Cc: Arnd Bergmann, Gregory Price, linux-kernel, linux-doc,
	linux-arm-kernel, Linux-Arch, avagin, Peter Zijlstra,
	Andy Lutomirski, krisman, Thomas Gleixner, Jonathan Corbet,
	shuah, Catalin Marinas, Will Deacon, Mark Rutland, tongtiangen,
	Robin Murphy

On 03/29, Gregory Price wrote:
>
> On Wed, Mar 29, 2023 at 07:13:22PM +0200, Oleg Nesterov wrote:
> >
> > -		if (selector && !access_ok(selector, sizeof(*selector)))
> > -			return -EFAULT;
> > -
> >  		break;
> >  	default:
> >  		return -EINVAL;
> >
>
> The result of this would be either a task calling via prctl or a tracer
> calling via ptrace would be capable of setting selector to a bad pointer
> and producing a SIGSEGV on the next system call.

Yes,

> It's a pretty small footgun, but maybe that's reasonable?

I hope this is reasonable,

> From a user perspective, debugging this behavior would be nightmarish.
> Your call to prctl/ptrace would succeed and the process would continue
> to execute until the next syscall - at which point you incur a SIGSEGV,

Yes. But how does this differ from the case when, for example, user
does prtcl(PR_SET_SYSCALL_USER_DISPATCH, selector = 1) ? Or another
bad address < TASK_SIZE?

access_ok() will happily succeed, then later syscall_user_dispatch()
will equally trigger SIGSEGV.

Oleg.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
  2023-03-29  5:54                   ` Gregory Price
@ 2023-03-29 18:13                     ` Oleg Nesterov
  -1 siblings, 0 replies; 44+ messages in thread
From: Oleg Nesterov @ 2023-03-29 18:13 UTC (permalink / raw)
  To: Gregory Price
  Cc: Arnd Bergmann, Gregory Price, linux-kernel, linux-doc,
	linux-arm-kernel, Linux-Arch, avagin, Peter Zijlstra,
	Andy Lutomirski, krisman, Thomas Gleixner, Jonathan Corbet,
	shuah, Catalin Marinas, Will Deacon, Mark Rutland, tongtiangen,
	Robin Murphy

On 03/29, Gregory Price wrote:
>
> Am I ok to add your signed-off-by to the suggested patch, and i'll add
> it to the series?  Not quite sure what the correct set of tags is,
> since i don't have any suggested changes to your patch.

Feel free to add my SOB, but probably Suggested-by: make more sense.

Oleg.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
@ 2023-03-29 18:13                     ` Oleg Nesterov
  0 siblings, 0 replies; 44+ messages in thread
From: Oleg Nesterov @ 2023-03-29 18:13 UTC (permalink / raw)
  To: Gregory Price
  Cc: Arnd Bergmann, Gregory Price, linux-kernel, linux-doc,
	linux-arm-kernel, Linux-Arch, avagin, Peter Zijlstra,
	Andy Lutomirski, krisman, Thomas Gleixner, Jonathan Corbet,
	shuah, Catalin Marinas, Will Deacon, Mark Rutland, tongtiangen,
	Robin Murphy

On 03/29, Gregory Price wrote:
>
> Am I ok to add your signed-off-by to the suggested patch, and i'll add
> it to the series?  Not quite sure what the correct set of tags is,
> since i don't have any suggested changes to your patch.

Feel free to add my SOB, but probably Suggested-by: make more sense.

Oleg.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
  2023-03-29 16:03         ` Oleg Nesterov
@ 2023-03-29 18:29           ` Arnd Bergmann
  -1 siblings, 0 replies; 44+ messages in thread
From: Arnd Bergmann @ 2023-03-29 18:29 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Gregory Price, linux-kernel, linux-doc, linux-arm-kernel,
	Linux-Arch, avagin, Peter Zijlstra, Andy Lutomirski, krisman,
	Thomas Gleixner, Jonathan Corbet, shuah, Catalin Marinas,
	Will Deacon, Mark Rutland, tongtiangen, Robin Murphy,
	Gregory Price

On Wed, Mar 29, 2023, at 18:03, Oleg Nesterov wrote:
> On 03/29, Arnd Bergmann wrote:
>>
>> I think the idea of TASK_SIZE_MAX is that it is a compile-time constant and in fact independent of current, while TASK_SIZE
>> takes TIF_32BIT into account.
>
> Say, arch/loongarch defines TASK_SIZE which depends on 
> test_thread_flag(TIF_32BIT_ADDR)
> but it doesn't define TASK_SIZE_MAX, so __access_ok() will use TASK_SIZE.

I'd consider that a bug in loongarch, though it's
as harmless as it gets: The only downside is that
it's missing an optimization from constant-folding
the value, and since there is no CONFIG_COMPAT on
loongarch yet, it doesn't even have a different
value.

TASK_SIZE_MAX become mandatory here when I worked
on the optimized access_ok() across architectures,
and the reason it's safe to use is that access_ok()
has to only guarantee that a task cannot access
data that it can't already access, i.e. kernel
data. Passing a pointer between TASK_SIZE and
TASK_SIZE_MAX will still cause a -EFAULT error
because of the trap.

    Arnd

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
@ 2023-03-29 18:29           ` Arnd Bergmann
  0 siblings, 0 replies; 44+ messages in thread
From: Arnd Bergmann @ 2023-03-29 18:29 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Gregory Price, linux-kernel, linux-doc, linux-arm-kernel,
	Linux-Arch, avagin, Peter Zijlstra, Andy Lutomirski, krisman,
	Thomas Gleixner, Jonathan Corbet, shuah, Catalin Marinas,
	Will Deacon, Mark Rutland, tongtiangen, Robin Murphy,
	Gregory Price

On Wed, Mar 29, 2023, at 18:03, Oleg Nesterov wrote:
> On 03/29, Arnd Bergmann wrote:
>>
>> I think the idea of TASK_SIZE_MAX is that it is a compile-time constant and in fact independent of current, while TASK_SIZE
>> takes TIF_32BIT into account.
>
> Say, arch/loongarch defines TASK_SIZE which depends on 
> test_thread_flag(TIF_32BIT_ADDR)
> but it doesn't define TASK_SIZE_MAX, so __access_ok() will use TASK_SIZE.

I'd consider that a bug in loongarch, though it's
as harmless as it gets: The only downside is that
it's missing an optimization from constant-folding
the value, and since there is no CONFIG_COMPAT on
loongarch yet, it doesn't even have a different
value.

TASK_SIZE_MAX become mandatory here when I worked
on the optimized access_ok() across architectures,
and the reason it's safe to use is that access_ok()
has to only guarantee that a task cannot access
data that it can't already access, i.e. kernel
data. Passing a pointer between TASK_SIZE and
TASK_SIZE_MAX will still cause a -EFAULT error
because of the trap.

    Arnd

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
  2023-03-29 10:02                   ` Gregory Price
@ 2023-03-29 23:54                     ` Oleg Nesterov
  -1 siblings, 0 replies; 44+ messages in thread
From: Oleg Nesterov @ 2023-03-29 23:54 UTC (permalink / raw)
  To: Gregory Price
  Cc: Arnd Bergmann, Gregory Price, linux-kernel, linux-doc,
	linux-arm-kernel, Linux-Arch, avagin, Peter Zijlstra,
	Andy Lutomirski, krisman, Thomas Gleixner, Jonathan Corbet,
	shuah, Catalin Marinas, Will Deacon, Mark Rutland, tongtiangen,
	Robin Murphy

On 03/29, Gregory Price wrote:
>
> Last note on this before I push up another patch set.
>
> The change from __get_user to get_user also introduces a call to
> might_fault() which adds a larger callstack for every syscall /
> dispatch.  This turns into a might_sleep and might_reschedule, which
> represent a very different pattern of execution from before.

might_fault() is nop unless CONFIG_PROVE_LOCKING || DEBUG_ATOMIC_SLEEP.

Again, I won't really argue with task_access_ok(). Just I am not sure
2/4 gives enough justification for this new helper with unclear semantics
(until we ensure that access_ok() doesn't depend on current).

Oleg.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
@ 2023-03-29 23:54                     ` Oleg Nesterov
  0 siblings, 0 replies; 44+ messages in thread
From: Oleg Nesterov @ 2023-03-29 23:54 UTC (permalink / raw)
  To: Gregory Price
  Cc: Arnd Bergmann, Gregory Price, linux-kernel, linux-doc,
	linux-arm-kernel, Linux-Arch, avagin, Peter Zijlstra,
	Andy Lutomirski, krisman, Thomas Gleixner, Jonathan Corbet,
	shuah, Catalin Marinas, Will Deacon, Mark Rutland, tongtiangen,
	Robin Murphy

On 03/29, Gregory Price wrote:
>
> Last note on this before I push up another patch set.
>
> The change from __get_user to get_user also introduces a call to
> might_fault() which adds a larger callstack for every syscall /
> dispatch.  This turns into a might_sleep and might_reschedule, which
> represent a very different pattern of execution from before.

might_fault() is nop unless CONFIG_PROVE_LOCKING || DEBUG_ATOMIC_SLEEP.

Again, I won't really argue with task_access_ok(). Just I am not sure
2/4 gives enough justification for this new helper with unclear semantics
(until we ensure that access_ok() doesn't depend on current).

Oleg.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
  2023-03-30 14:05         ` Catalin Marinas
@ 2023-03-30  4:18           ` Gregory Price
  -1 siblings, 0 replies; 44+ messages in thread
From: Gregory Price @ 2023-03-30  4:18 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Gregory Price, linux-kernel, linux-doc, linux-arm-kernel,
	linux-arch, oleg, avagin, peterz, luto, krisman, tglx, corbet,
	shuah, arnd, will, mark.rutland, tongtiangen, robin.murphy

On Thu, Mar 30, 2023 at 03:05:07PM +0100, Catalin Marinas wrote:
> 
> Ah, thanks for the pointer.
> 
> For ptrace(), we live with this relaxation as there's no easy way to
> check. Take __access_remote_vm() for example, it ends up in
> get_user_pages_remote() -> ... -> __get_user_pages() which just untags
> the address. Even if it would want to do this conditionally, the tag
> pointer is enabled per thread (and inherited) but the GUP API only takes
> the mm.
> 
> While we could improve it as ptrace() can tell which thread it is
> tracing, I don't think it's worth the effort. On arm64, top-byte-ignore
> was enabled from the start for in-user accesses but not at the syscall
> level. We wanted to avoid breaking some use-cases with untagging all
> user pointers, hence the explicit opt-in to catch some issues (glibc did
> have a problem with brk() ignoring the top byte -
> https://bugzilla.redhat.com/show_bug.cgi?id=1797052).
> 
> So yeah, this access_ok() in a few places is a best effort to catch some
> potential ABI regressions like the one above and also as a way to force
> the old ABI (mostly) via sysctl. But we do have places like GUP where we
> don't have the thread information (only the mm), so we just
> indiscriminately untag the pointer.
> 
> Note that there is no security risk for the access itself. The Arm
> architecture selects the user vs kernel address spaces based on bit 55
> rather than 63. Untagging a pointer sign-extends bit 55.
> 
> > I did not have a sufficient answer for this so I went down this path.
> > 
> > It does seem simpler to simply untag the address, however it didn't seem
> > like a good solution to simply leave an identified bad edge case.
> > 
> > with access_ok(untagged_addr(addr), ...) it breaks down like this:
> > 
> > (tracer,tracee) : result 
> > 
> > tag,tag     : untagged - (correct)
> > tag,untag   : untagged - incorrect as this would have been an impossible
> >               state to reach through the standard prctl interface.  Will
> > 	      lead to a SIGSEGV in the tracee upon next syscall
> 
> Well, even without untagging the pointer, the tracer can set a random
> address that passes access_ok() but still faults in the tracee. It's the
> tracer that should ensure the pointer is valid in the context of the
> tracee.
> 
> Now, even if the selector pointer is tagged, the accesses still work
> fine (top-byte-ignore) unless MTE is enabled in the tracee and the tag
> should match the region's colour. But, again, that's no different from a
> debugger changing pointer variables in the debugged process, they should
> be valid and it's not for the kernel to sanitise them.
> 
> > untag,tag   : untagged - (correct)
> > untag,untag : no-op - (correct), tagged address will fail to set
> > 
> > Basically if the tracer is a tagged process while the tracee is not, it
> > would become possible to set the tracee's selector to a tagged pointer.
> 
> Yes, but does it matter? You'd trust the tracer to work correctly. There
> are multiple ways it can break the tracee here even if access_ok()
> worked as intended.
> 
> > It's beyond me to say whether or not this situation is "ok" and "the
> > user's fault", but it does feel like an addressable problem.
> 
> To me, the situation looks fine. While it's addressable, we have other
> places where the tag is ignored on the ptrace() path, so I don't think
> it's worth the effort.
> 
> -- 
> Catalin

Thank you for the extensive breakdown.  Given this, it seems like I
should just revert to untagging the pointer and drop the access_ok
extensions.

I'll add a comment at the untag site that discusses this interaction.

Thanks!
Gregory

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
@ 2023-03-30  4:18           ` Gregory Price
  0 siblings, 0 replies; 44+ messages in thread
From: Gregory Price @ 2023-03-30  4:18 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Gregory Price, linux-kernel, linux-doc, linux-arm-kernel,
	linux-arch, oleg, avagin, peterz, luto, krisman, tglx, corbet,
	shuah, arnd, will, mark.rutland, tongtiangen, robin.murphy

On Thu, Mar 30, 2023 at 03:05:07PM +0100, Catalin Marinas wrote:
> 
> Ah, thanks for the pointer.
> 
> For ptrace(), we live with this relaxation as there's no easy way to
> check. Take __access_remote_vm() for example, it ends up in
> get_user_pages_remote() -> ... -> __get_user_pages() which just untags
> the address. Even if it would want to do this conditionally, the tag
> pointer is enabled per thread (and inherited) but the GUP API only takes
> the mm.
> 
> While we could improve it as ptrace() can tell which thread it is
> tracing, I don't think it's worth the effort. On arm64, top-byte-ignore
> was enabled from the start for in-user accesses but not at the syscall
> level. We wanted to avoid breaking some use-cases with untagging all
> user pointers, hence the explicit opt-in to catch some issues (glibc did
> have a problem with brk() ignoring the top byte -
> https://bugzilla.redhat.com/show_bug.cgi?id=1797052).
> 
> So yeah, this access_ok() in a few places is a best effort to catch some
> potential ABI regressions like the one above and also as a way to force
> the old ABI (mostly) via sysctl. But we do have places like GUP where we
> don't have the thread information (only the mm), so we just
> indiscriminately untag the pointer.
> 
> Note that there is no security risk for the access itself. The Arm
> architecture selects the user vs kernel address spaces based on bit 55
> rather than 63. Untagging a pointer sign-extends bit 55.
> 
> > I did not have a sufficient answer for this so I went down this path.
> > 
> > It does seem simpler to simply untag the address, however it didn't seem
> > like a good solution to simply leave an identified bad edge case.
> > 
> > with access_ok(untagged_addr(addr), ...) it breaks down like this:
> > 
> > (tracer,tracee) : result 
> > 
> > tag,tag     : untagged - (correct)
> > tag,untag   : untagged - incorrect as this would have been an impossible
> >               state to reach through the standard prctl interface.  Will
> > 	      lead to a SIGSEGV in the tracee upon next syscall
> 
> Well, even without untagging the pointer, the tracer can set a random
> address that passes access_ok() but still faults in the tracee. It's the
> tracer that should ensure the pointer is valid in the context of the
> tracee.
> 
> Now, even if the selector pointer is tagged, the accesses still work
> fine (top-byte-ignore) unless MTE is enabled in the tracee and the tag
> should match the region's colour. But, again, that's no different from a
> debugger changing pointer variables in the debugged process, they should
> be valid and it's not for the kernel to sanitise them.
> 
> > untag,tag   : untagged - (correct)
> > untag,untag : no-op - (correct), tagged address will fail to set
> > 
> > Basically if the tracer is a tagged process while the tracee is not, it
> > would become possible to set the tracee's selector to a tagged pointer.
> 
> Yes, but does it matter? You'd trust the tracer to work correctly. There
> are multiple ways it can break the tracee here even if access_ok()
> worked as intended.
> 
> > It's beyond me to say whether or not this situation is "ok" and "the
> > user's fault", but it does feel like an addressable problem.
> 
> To me, the situation looks fine. While it's addressable, we have other
> places where the tag is ignored on the ptrace() path, so I don't think
> it's worth the effort.
> 
> -- 
> Catalin

Thank you for the extensive breakdown.  Given this, it seems like I
should just revert to untagging the pointer and drop the access_ok
extensions.

I'll add a comment at the untag site that discusses this interaction.

Thanks!
Gregory

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
  2023-03-29  4:34       ` Gregory Price
@ 2023-03-30 14:05         ` Catalin Marinas
  -1 siblings, 0 replies; 44+ messages in thread
From: Catalin Marinas @ 2023-03-30 14:05 UTC (permalink / raw)
  To: Gregory Price
  Cc: Gregory Price, linux-kernel, linux-doc, linux-arm-kernel,
	linux-arch, oleg, avagin, peterz, luto, krisman, tglx, corbet,
	shuah, arnd, will, mark.rutland, tongtiangen, robin.murphy

On Wed, Mar 29, 2023 at 12:34:04AM -0400, Gregory Price wrote:
> On Wed, Mar 29, 2023 at 05:22:49PM +0100, Catalin Marinas wrote:
> > On Tue, Mar 28, 2023 at 12:48:08PM -0400, Gregory Price wrote:
> > > diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> > > index 5c7b2f9d5913..1a51a54f264f 100644
> > > --- a/arch/arm64/include/asm/uaccess.h
> > > +++ b/arch/arm64/include/asm/uaccess.h
> > > @@ -35,7 +35,9 @@ static inline int __access_ok(const void __user *ptr, unsigned long size);
> > >   * This is equivalent to the following test:
> > >   * (u65)addr + (u65)size <= (u65)TASK_SIZE_MAX
> > >   */
> > > -static inline int access_ok(const void __user *addr, unsigned long size)
> > > +static inline int task_access_ok(struct task_struct *task,
> > > +				 const void __user *addr,
> > > +				 unsigned long size)
> > >  {
> > >  	/*
> > >  	 * Asynchronous I/O running in a kernel thread does not have the
> > > @@ -43,11 +45,18 @@ static inline int access_ok(const void __user *addr, unsigned long size)
> > >  	 * the user address before checking.
> > >  	 */
> > >  	if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) &&
> > > -	    (current->flags & PF_KTHREAD || test_thread_flag(TIF_TAGGED_ADDR)))
> > > +	    (task->flags & PF_KTHREAD || test_ti_thread_flag(task, TIF_TAGGED_ADDR)))
> > >  		addr = untagged_addr(addr);
> > >  
> > >  	return likely(__access_ok(addr, size));
> > >  }
> > > +
> > > +static inline int access_ok(const void __user *addr, unsigned long size)
> > > +{
> > > +	return task_access_ok(current, addr, size);
> > > +}
> > > +
> > > +#define task_access_ok task_access_ok
> > 
> > I'd not bother with this at all. In the generic code you can either do
> > an __access_ok() check directly or just
> > access_ok(untagged_addr(selector), ...) with a comment that address
> > tagging of the ptraced task may not be enabled.
> 
> This was my original proposal, but the comment that lead to this patch
> was the following:
> 
> """
> If this would be correct, then access_ok() on arm64 would
> unconditionally untag the checked address, but it does not. Simply
> because untagging is only valid if the task enabled pointer tagging. If
> it didn't a tagged pointer is obviously invalid.
> 
> Why would ptrace make this suddenly valid?
> """
> 
> https://lore.kernel.org/all/87a605anvx.ffs@tglx/

Ah, thanks for the pointer.

For ptrace(), we live with this relaxation as there's no easy way to
check. Take __access_remote_vm() for example, it ends up in
get_user_pages_remote() -> ... -> __get_user_pages() which just untags
the address. Even if it would want to do this conditionally, the tag
pointer is enabled per thread (and inherited) but the GUP API only takes
the mm.

While we could improve it as ptrace() can tell which thread it is
tracing, I don't think it's worth the effort. On arm64, top-byte-ignore
was enabled from the start for in-user accesses but not at the syscall
level. We wanted to avoid breaking some use-cases with untagging all
user pointers, hence the explicit opt-in to catch some issues (glibc did
have a problem with brk() ignoring the top byte -
https://bugzilla.redhat.com/show_bug.cgi?id=1797052).

So yeah, this access_ok() in a few places is a best effort to catch some
potential ABI regressions like the one above and also as a way to force
the old ABI (mostly) via sysctl. But we do have places like GUP where we
don't have the thread information (only the mm), so we just
indiscriminately untag the pointer.

Note that there is no security risk for the access itself. The Arm
architecture selects the user vs kernel address spaces based on bit 55
rather than 63. Untagging a pointer sign-extends bit 55.

> I did not have a sufficient answer for this so I went down this path.
> 
> It does seem simpler to simply untag the address, however it didn't seem
> like a good solution to simply leave an identified bad edge case.
> 
> with access_ok(untagged_addr(addr), ...) it breaks down like this:
> 
> (tracer,tracee) : result 
> 
> tag,tag     : untagged - (correct)
> tag,untag   : untagged - incorrect as this would have been an impossible
>               state to reach through the standard prctl interface.  Will
> 	      lead to a SIGSEGV in the tracee upon next syscall

Well, even without untagging the pointer, the tracer can set a random
address that passes access_ok() but still faults in the tracee. It's the
tracer that should ensure the pointer is valid in the context of the
tracee.

Now, even if the selector pointer is tagged, the accesses still work
fine (top-byte-ignore) unless MTE is enabled in the tracee and the tag
should match the region's colour. But, again, that's no different from a
debugger changing pointer variables in the debugged process, they should
be valid and it's not for the kernel to sanitise them.

> untag,tag   : untagged - (correct)
> untag,untag : no-op - (correct), tagged address will fail to set
> 
> Basically if the tracer is a tagged process while the tracee is not, it
> would become possible to set the tracee's selector to a tagged pointer.

Yes, but does it matter? You'd trust the tracer to work correctly. There
are multiple ways it can break the tracee here even if access_ok()
worked as intended.

> It's beyond me to say whether or not this situation is "ok" and "the
> user's fault", but it does feel like an addressable problem.

To me, the situation looks fine. While it's addressable, we have other
places where the tag is ignored on the ptrace() path, so I don't think
it's worth the effort.

-- 
Catalin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok
@ 2023-03-30 14:05         ` Catalin Marinas
  0 siblings, 0 replies; 44+ messages in thread
From: Catalin Marinas @ 2023-03-30 14:05 UTC (permalink / raw)
  To: Gregory Price
  Cc: Gregory Price, linux-kernel, linux-doc, linux-arm-kernel,
	linux-arch, oleg, avagin, peterz, luto, krisman, tglx, corbet,
	shuah, arnd, will, mark.rutland, tongtiangen, robin.murphy

On Wed, Mar 29, 2023 at 12:34:04AM -0400, Gregory Price wrote:
> On Wed, Mar 29, 2023 at 05:22:49PM +0100, Catalin Marinas wrote:
> > On Tue, Mar 28, 2023 at 12:48:08PM -0400, Gregory Price wrote:
> > > diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> > > index 5c7b2f9d5913..1a51a54f264f 100644
> > > --- a/arch/arm64/include/asm/uaccess.h
> > > +++ b/arch/arm64/include/asm/uaccess.h
> > > @@ -35,7 +35,9 @@ static inline int __access_ok(const void __user *ptr, unsigned long size);
> > >   * This is equivalent to the following test:
> > >   * (u65)addr + (u65)size <= (u65)TASK_SIZE_MAX
> > >   */
> > > -static inline int access_ok(const void __user *addr, unsigned long size)
> > > +static inline int task_access_ok(struct task_struct *task,
> > > +				 const void __user *addr,
> > > +				 unsigned long size)
> > >  {
> > >  	/*
> > >  	 * Asynchronous I/O running in a kernel thread does not have the
> > > @@ -43,11 +45,18 @@ static inline int access_ok(const void __user *addr, unsigned long size)
> > >  	 * the user address before checking.
> > >  	 */
> > >  	if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) &&
> > > -	    (current->flags & PF_KTHREAD || test_thread_flag(TIF_TAGGED_ADDR)))
> > > +	    (task->flags & PF_KTHREAD || test_ti_thread_flag(task, TIF_TAGGED_ADDR)))
> > >  		addr = untagged_addr(addr);
> > >  
> > >  	return likely(__access_ok(addr, size));
> > >  }
> > > +
> > > +static inline int access_ok(const void __user *addr, unsigned long size)
> > > +{
> > > +	return task_access_ok(current, addr, size);
> > > +}
> > > +
> > > +#define task_access_ok task_access_ok
> > 
> > I'd not bother with this at all. In the generic code you can either do
> > an __access_ok() check directly or just
> > access_ok(untagged_addr(selector), ...) with a comment that address
> > tagging of the ptraced task may not be enabled.
> 
> This was my original proposal, but the comment that lead to this patch
> was the following:
> 
> """
> If this would be correct, then access_ok() on arm64 would
> unconditionally untag the checked address, but it does not. Simply
> because untagging is only valid if the task enabled pointer tagging. If
> it didn't a tagged pointer is obviously invalid.
> 
> Why would ptrace make this suddenly valid?
> """
> 
> https://lore.kernel.org/all/87a605anvx.ffs@tglx/

Ah, thanks for the pointer.

For ptrace(), we live with this relaxation as there's no easy way to
check. Take __access_remote_vm() for example, it ends up in
get_user_pages_remote() -> ... -> __get_user_pages() which just untags
the address. Even if it would want to do this conditionally, the tag
pointer is enabled per thread (and inherited) but the GUP API only takes
the mm.

While we could improve it as ptrace() can tell which thread it is
tracing, I don't think it's worth the effort. On arm64, top-byte-ignore
was enabled from the start for in-user accesses but not at the syscall
level. We wanted to avoid breaking some use-cases with untagging all
user pointers, hence the explicit opt-in to catch some issues (glibc did
have a problem with brk() ignoring the top byte -
https://bugzilla.redhat.com/show_bug.cgi?id=1797052).

So yeah, this access_ok() in a few places is a best effort to catch some
potential ABI regressions like the one above and also as a way to force
the old ABI (mostly) via sysctl. But we do have places like GUP where we
don't have the thread information (only the mm), so we just
indiscriminately untag the pointer.

Note that there is no security risk for the access itself. The Arm
architecture selects the user vs kernel address spaces based on bit 55
rather than 63. Untagging a pointer sign-extends bit 55.

> I did not have a sufficient answer for this so I went down this path.
> 
> It does seem simpler to simply untag the address, however it didn't seem
> like a good solution to simply leave an identified bad edge case.
> 
> with access_ok(untagged_addr(addr), ...) it breaks down like this:
> 
> (tracer,tracee) : result 
> 
> tag,tag     : untagged - (correct)
> tag,untag   : untagged - incorrect as this would have been an impossible
>               state to reach through the standard prctl interface.  Will
> 	      lead to a SIGSEGV in the tracee upon next syscall

Well, even without untagging the pointer, the tracer can set a random
address that passes access_ok() but still faults in the tracee. It's the
tracer that should ensure the pointer is valid in the context of the
tracee.

Now, even if the selector pointer is tagged, the accesses still work
fine (top-byte-ignore) unless MTE is enabled in the tracee and the tag
should match the region's colour. But, again, that's no different from a
debugger changing pointer variables in the debugged process, they should
be valid and it's not for the kernel to sanitise them.

> untag,tag   : untagged - (correct)
> untag,untag : no-op - (correct), tagged address will fail to set
> 
> Basically if the tracer is a tagged process while the tracee is not, it
> would become possible to set the tracee's selector to a tagged pointer.

Yes, but does it matter? You'd trust the tracer to work correctly. There
are multiple ways it can break the tracee here even if access_ok()
worked as intended.

> It's beyond me to say whether or not this situation is "ok" and "the
> user's fault", but it does feel like an addressable problem.

To me, the situation looks fine. While it's addressable, we have other
places where the tag is ignored on the ptrace() path, so I don't think
it's worth the effort.

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2023-03-30 14:44 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-28 16:48 [PATCH v14 0/4] Checkpoint Support for Syscall User Dispatch Gregory Price
2023-03-28 16:48 ` Gregory Price
2023-03-28 16:48 ` [PATCH v14 1/4] asm-generic,arm64: create task variant of access_ok Gregory Price
2023-03-28 16:48   ` Gregory Price
2023-03-28 18:46   ` Arnd Bergmann
2023-03-28 18:46     ` Arnd Bergmann
2023-03-29 15:15   ` Oleg Nesterov
2023-03-29 15:15     ` Oleg Nesterov
2023-03-29 15:41     ` Arnd Bergmann
2023-03-29 15:41       ` Arnd Bergmann
2023-03-29 16:03       ` Oleg Nesterov
2023-03-29 16:03         ` Oleg Nesterov
2023-03-29  3:56         ` Gregory Price
2023-03-29  3:56           ` Gregory Price
2023-03-29 17:13           ` Oleg Nesterov
2023-03-29 17:13             ` Oleg Nesterov
2023-03-29  5:37             ` Gregory Price
2023-03-29  5:37               ` Gregory Price
2023-03-29 17:58               ` Oleg Nesterov
2023-03-29 17:58                 ` Oleg Nesterov
2023-03-29  5:54                 ` Gregory Price
2023-03-29  5:54                   ` Gregory Price
2023-03-29 18:13                   ` Oleg Nesterov
2023-03-29 18:13                     ` Oleg Nesterov
2023-03-29 10:02                 ` Gregory Price
2023-03-29 10:02                   ` Gregory Price
2023-03-29 23:54                   ` Oleg Nesterov
2023-03-29 23:54                     ` Oleg Nesterov
2023-03-29 18:29         ` Arnd Bergmann
2023-03-29 18:29           ` Arnd Bergmann
2023-03-29 16:22   ` Catalin Marinas
2023-03-29 16:22     ` Catalin Marinas
2023-03-29  4:34     ` Gregory Price
2023-03-29  4:34       ` Gregory Price
2023-03-30 14:05       ` Catalin Marinas
2023-03-30 14:05         ` Catalin Marinas
2023-03-30  4:18         ` Gregory Price
2023-03-30  4:18           ` Gregory Price
2023-03-28 16:48 ` [PATCH v14 2/4] syscall_user_dispatch: helper function to operate on given task Gregory Price
2023-03-28 16:48   ` Gregory Price
2023-03-28 16:48 ` [PATCH v14 3/4] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD Gregory Price
2023-03-28 16:48   ` Gregory Price
2023-03-28 16:48 ` [PATCH v14 4/4] selftest,ptrace: Add selftest for syscall user dispatch config api Gregory Price
2023-03-28 16:48   ` Gregory Price

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.