linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v13 0/3] Checkpoint Support for Syscall User Dispatch
@ 2023-03-01 20:58 Gregory Price
  2023-03-01 20:58 ` [PATCH v13 1/3] syscall_user_dispatch: helper function to operate on given task Gregory Price
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Gregory Price @ 2023-03-01 20:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-doc, oleg, avagin, peterz, luto, krisman, tglx, corbet,
	shuah, Gregory Price

v13: sizeof consistency and cosmetic changes in patch 2

v12: split test into its own patch
     change from padding a u8 to using a u64
     casting issues
     checkpatch.pl

[truncated version history]

Syscall user dispatch makes it possible to cleanly intercept system
calls from user-land.  However, most transparent checkpoint software
presently leverages some combination of ptrace and system call
injection to place software in a ready-to-checkpoint state.

If Syscall User Dispatch is enabled at the time of being quiesced,
injected system calls will subsequently be interposed upon and
dispatched to the task's signal handler.

Patch summary:
- Refactor configuration setting interface to operate on a task
  rather than current, so the set and error paths can be consolidated

- Implement a getter interface for Syscall User Dispatch config info.
  To resume successfully, the checkpoint/resume software has to
  save and restore this information.  Presently this configuration
  is write-only, with no way for C/R software to save it.

  This was done in ptrace because syscall user dispatch is not part of
  uapi. The syscall_user_dispatch_config structure was added to the
  ptrace exports.

- Selftest for the new feature

Gregory Price (3):
  syscall_user_dispatch: helper function to operate on given task
  ptrace,syscall_user_dispatch: checkpoint/restore support for SUD
  selftest,ptrace: Add selftest for syscall user dispatch config api

 .../admin-guide/syscall-user-dispatch.rst     |  4 ++
 include/linux/syscall_user_dispatch.h         | 18 +++++
 include/uapi/linux/ptrace.h                   | 29 ++++++++
 kernel/entry/syscall_user_dispatch.c          | 65 ++++++++++++++---
 kernel/ptrace.c                               |  9 +++
 tools/testing/selftests/ptrace/.gitignore     |  1 +
 tools/testing/selftests/ptrace/Makefile       |  2 +-
 tools/testing/selftests/ptrace/get_set_sud.c  | 72 +++++++++++++++++++
 8 files changed, 191 insertions(+), 9 deletions(-)
 create mode 100644 tools/testing/selftests/ptrace/get_set_sud.c

-- 
2.39.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v13 1/3] syscall_user_dispatch: helper function to operate on given task
  2023-03-01 20:58 [PATCH v13 0/3] Checkpoint Support for Syscall User Dispatch Gregory Price
@ 2023-03-01 20:58 ` Gregory Price
  2023-03-06 18:04   ` Oleg Nesterov
  2023-03-21 15:41   ` Thomas Gleixner
  2023-03-01 20:58 ` [PATCH v13 2/3] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD Gregory Price
  2023-03-01 20:58 ` [PATCH v13 3/3] selftest,ptrace: Add selftest for syscall user dispatch config api Gregory Price
  2 siblings, 2 replies; 10+ messages in thread
From: Gregory Price @ 2023-03-01 20:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-doc, oleg, avagin, peterz, luto, krisman, tglx, corbet,
	shuah, Gregory Price

Preparatory patch ahead of set/get interfaces which will allow a
ptrace to get/set the syscall user dispatch configuration of a task.

This will simplify the set interface and consolidates error paths.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 kernel/entry/syscall_user_dispatch.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index 0b6379adff6b..22396b234854 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -68,8 +68,9 @@ bool syscall_user_dispatch(struct pt_regs *regs)
 	return true;
 }
 
-int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
-			      unsigned long len, char __user *selector)
+static int task_set_syscall_user_dispatch(struct task_struct *task, unsigned long mode,
+					  unsigned long offset, unsigned long len,
+					  char __user *selector)
 {
 	switch (mode) {
 	case PR_SYS_DISPATCH_OFF:
@@ -94,15 +95,21 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
 		return -EINVAL;
 	}
 
-	current->syscall_dispatch.selector = selector;
-	current->syscall_dispatch.offset = offset;
-	current->syscall_dispatch.len = len;
-	current->syscall_dispatch.on_dispatch = false;
+	task->syscall_dispatch.selector = selector;
+	task->syscall_dispatch.offset = offset;
+	task->syscall_dispatch.len = len;
+	task->syscall_dispatch.on_dispatch = false;
 
 	if (mode == PR_SYS_DISPATCH_ON)
-		set_syscall_work(SYSCALL_USER_DISPATCH);
+		set_task_syscall_work(task, SYSCALL_USER_DISPATCH);
 	else
-		clear_syscall_work(SYSCALL_USER_DISPATCH);
+		clear_task_syscall_work(task, SYSCALL_USER_DISPATCH);
 
 	return 0;
 }
+
+int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
+			      unsigned long len, char __user *selector)
+{
+	return task_set_syscall_user_dispatch(current, mode, offset, len, selector);
+}
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v13 2/3] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD
  2023-03-01 20:58 [PATCH v13 0/3] Checkpoint Support for Syscall User Dispatch Gregory Price
  2023-03-01 20:58 ` [PATCH v13 1/3] syscall_user_dispatch: helper function to operate on given task Gregory Price
@ 2023-03-01 20:58 ` Gregory Price
  2023-03-06 18:04   ` Oleg Nesterov
  2023-03-01 20:58 ` [PATCH v13 3/3] selftest,ptrace: Add selftest for syscall user dispatch config api Gregory Price
  2 siblings, 1 reply; 10+ messages in thread
From: Gregory Price @ 2023-03-01 20:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-doc, oleg, avagin, peterz, luto, krisman, tglx, corbet,
	shuah, Gregory Price

Implement ptrace getter/setter interface for syscall user dispatch.

These prctl settings are presently write-only, making it impossible to
implement transparent checkpoint/restore via software like CRIU.

'on_dispatch' field is not exposed because it is a kernel-internal
only field that cannot be 'true' when returning to userland.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 .../admin-guide/syscall-user-dispatch.rst     |  4 ++
 include/linux/syscall_user_dispatch.h         | 18 ++++++++
 include/uapi/linux/ptrace.h                   | 29 +++++++++++++
 kernel/entry/syscall_user_dispatch.c          | 42 +++++++++++++++++++
 kernel/ptrace.c                               |  9 ++++
 5 files changed, 102 insertions(+)

diff --git a/Documentation/admin-guide/syscall-user-dispatch.rst b/Documentation/admin-guide/syscall-user-dispatch.rst
index 60314953c728..f7648c08297e 100644
--- a/Documentation/admin-guide/syscall-user-dispatch.rst
+++ b/Documentation/admin-guide/syscall-user-dispatch.rst
@@ -73,6 +73,10 @@ thread-wide, without the need to invoke the kernel directly.  selector
 can be set to SYSCALL_DISPATCH_FILTER_ALLOW or SYSCALL_DISPATCH_FILTER_BLOCK.
 Any other value should terminate the program with a SIGSYS.
 
+Additionally, a task's syscall user dispatch configuration can be peeked
+and poked via the PTRACE_(GET|SET)_SYSCALL_USER_DISPATCH_CONFIG ptrace
+requests. This is useful for checkpoint/restart software.
+
 Security Notes
 --------------
 
diff --git a/include/linux/syscall_user_dispatch.h b/include/linux/syscall_user_dispatch.h
index a0ae443fb7df..641ca8880995 100644
--- a/include/linux/syscall_user_dispatch.h
+++ b/include/linux/syscall_user_dispatch.h
@@ -22,6 +22,12 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
 #define clear_syscall_work_syscall_user_dispatch(tsk) \
 	clear_task_syscall_work(tsk, SYSCALL_USER_DISPATCH)
 
+int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+				     void __user *data);
+
+int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+				     void __user *data);
+
 #else
 struct syscall_user_dispatch {};
 
@@ -35,6 +41,18 @@ static inline void clear_syscall_work_syscall_user_dispatch(struct task_struct *
 {
 }
 
+static inline int syscall_user_dispatch_get_config(struct task_struct *task,
+						   unsigned long size, void __user *data)
+{
+	return -EINVAL;
+}
+
+static inline int syscall_user_dispatch_set_config(struct task_struct *task,
+						   unsigned long size, void __user *data)
+{
+	return -EINVAL;
+}
+
 #endif /* CONFIG_GENERIC_ENTRY */
 
 #endif /* _SYSCALL_USER_DISPATCH_H */
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index 195ae64a8c87..1e77b02344c3 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -112,6 +112,35 @@ struct ptrace_rseq_configuration {
 	__u32 pad;
 };
 
+#define PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG 0x4210
+#define PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG 0x4211
+
+/*
+ * struct ptrace_sud_config - Per-task configuration for SUD
+ * @mode:	One of PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF
+ * @selector:	Tracee's user virtual address of SUD selector
+ * @offset:	SUD exclusion area (virtual address)
+ * @len:	Length of SUD exclusion area
+ *
+ * Used to get/set the syscall user dispatch configuration for tracee.
+ * process.  Selector is optional (may be NULL), and if invalid will produce
+ * a SIGSEGV in the tracee upon first access.
+ *
+ * If mode is PR_SYS_DISPATCH_ON, syscall dispatch will be enabled. If
+ * PR_SYS_DISPATCH_OFF, syscall dispatch will be disabled and all other
+ * parameters must be 0.  The value in *selector (if not null), also determines
+ * whether syscall dispatch will occur.
+ *
+ * The SUD Exclusion area described by offset/len is the virtual address space
+ * from which syscalls will not produce a user dispatch.
+ */
+struct ptrace_sud_config {
+	__u64 mode;
+	__u64 selector;
+	__u64 offset;
+	__u64 len;
+};
+
 /*
  * These values are stored in task->ptrace_message
  * by ptrace_stop to describe the current syscall-stop.
diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index 22396b234854..5191d2eac168 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -4,6 +4,7 @@
  */
 #include <linux/sched.h>
 #include <linux/prctl.h>
+#include <linux/ptrace.h>
 #include <linux/syscall_user_dispatch.h>
 #include <linux/uaccess.h>
 #include <linux/signal.h>
@@ -113,3 +114,44 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
 {
 	return task_set_syscall_user_dispatch(current, mode, offset, len, selector);
 }
+
+int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+				     void __user *data)
+{
+	struct syscall_user_dispatch *sd = &task->syscall_dispatch;
+	struct ptrace_sud_config cfg;
+
+	if (size != sizeof(cfg))
+		return -EINVAL;
+
+	if (test_task_syscall_work(task, SYSCALL_USER_DISPATCH))
+		cfg.mode = PR_SYS_DISPATCH_ON;
+	else
+		cfg.mode = PR_SYS_DISPATCH_OFF;
+
+	cfg.offset = sd->offset;
+	cfg.len = sd->len;
+	cfg.selector = (__u64)(uintptr_t)sd->selector;
+
+	if (copy_to_user(data, &cfg, sizeof(cfg)))
+		return -EFAULT;
+
+	return 0;
+}
+
+int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+				     void __user *data)
+{
+	int rc;
+	struct ptrace_sud_config cfg;
+
+	if (size != sizeof(cfg))
+		return -EINVAL;
+
+	if (copy_from_user(&cfg, data, sizeof(cfg)))
+		return -EFAULT;
+
+	rc = task_set_syscall_user_dispatch(task, cfg.mode, cfg.offset, cfg.len,
+					    (char __user *)(uintptr_t)cfg.selector);
+	return rc;
+}
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 0786450074c1..443057bee87c 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -32,6 +32,7 @@
 #include <linux/compat.h>
 #include <linux/sched/signal.h>
 #include <linux/minmax.h>
+#include <linux/syscall_user_dispatch.h>
 
 #include <asm/syscall.h>	/* for syscall_get_* */
 
@@ -1259,6 +1260,14 @@ int ptrace_request(struct task_struct *child, long request,
 		break;
 #endif
 
+	case PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG:
+		ret = syscall_user_dispatch_set_config(child, addr, datavp);
+		break;
+
+	case PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG:
+		ret = syscall_user_dispatch_get_config(child, addr, datavp);
+		break;
+
 	default:
 		break;
 	}
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v13 3/3] selftest,ptrace: Add selftest for syscall user dispatch config api
  2023-03-01 20:58 [PATCH v13 0/3] Checkpoint Support for Syscall User Dispatch Gregory Price
  2023-03-01 20:58 ` [PATCH v13 1/3] syscall_user_dispatch: helper function to operate on given task Gregory Price
  2023-03-01 20:58 ` [PATCH v13 2/3] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD Gregory Price
@ 2023-03-01 20:58 ` Gregory Price
  2 siblings, 0 replies; 10+ messages in thread
From: Gregory Price @ 2023-03-01 20:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-doc, oleg, avagin, peterz, luto, krisman, tglx, corbet,
	shuah, Gregory Price

Validate that the following new ptrace requests work as expected

* PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG
  - returns the contents of task->syscall_dispatch if enabled

* PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG
  - sets the contents of task->syscall_dispatch

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 tools/testing/selftests/ptrace/.gitignore    |  1 +
 tools/testing/selftests/ptrace/Makefile      |  2 +-
 tools/testing/selftests/ptrace/get_set_sud.c | 72 ++++++++++++++++++++
 3 files changed, 74 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/ptrace/get_set_sud.c

diff --git a/tools/testing/selftests/ptrace/.gitignore b/tools/testing/selftests/ptrace/.gitignore
index 792318aaa30c..b7dde152e75a 100644
--- a/tools/testing/selftests/ptrace/.gitignore
+++ b/tools/testing/selftests/ptrace/.gitignore
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
 get_syscall_info
+get_set_sud
 peeksiginfo
 vmaccess
diff --git a/tools/testing/selftests/ptrace/Makefile b/tools/testing/selftests/ptrace/Makefile
index 96ffa94afb91..1c631740a730 100644
--- a/tools/testing/selftests/ptrace/Makefile
+++ b/tools/testing/selftests/ptrace/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 CFLAGS += -std=c99 -pthread -Wall $(KHDR_INCLUDES)
 
-TEST_GEN_PROGS := get_syscall_info peeksiginfo vmaccess
+TEST_GEN_PROGS := get_syscall_info peeksiginfo vmaccess get_set_sud
 
 include ../lib.mk
diff --git a/tools/testing/selftests/ptrace/get_set_sud.c b/tools/testing/selftests/ptrace/get_set_sud.c
new file mode 100644
index 000000000000..5297b10d25c3
--- /dev/null
+++ b/tools/testing/selftests/ptrace/get_set_sud.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include "../kselftest_harness.h"
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <sys/wait.h>
+#include <sys/syscall.h>
+#include <sys/prctl.h>
+
+#include "linux/ptrace.h"
+
+static int sys_ptrace(int request, pid_t pid, void *addr, void *data)
+{
+	return syscall(SYS_ptrace, request, pid, addr, data);
+}
+
+TEST(get_set_sud)
+{
+	struct ptrace_sud_config config;
+	pid_t child;
+	int ret = 0;
+	int status;
+
+	child = fork();
+	ASSERT_GE(child, 0);
+	if (child == 0) {
+		ASSERT_EQ(0, sys_ptrace(PTRACE_TRACEME, 0, 0, 0)) {
+			TH_LOG("PTRACE_TRACEME: %m");
+		}
+		kill(getpid(), SIGSTOP);
+		_exit(1);
+	}
+
+	waitpid(child, &status, 0);
+
+	memset(&config, 0xff, sizeof(config));
+	config.mode = PR_SYS_DISPATCH_ON;
+
+	ret = sys_ptrace(PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG, child,
+			 (void *)sizeof(config), &config);
+
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(config.mode, PR_SYS_DISPATCH_OFF);
+	ASSERT_EQ(config.selector, 0);
+	ASSERT_EQ(config.offset, 0);
+	ASSERT_EQ(config.len, 0);
+
+	config.mode = PR_SYS_DISPATCH_ON;
+	config.selector = 0;
+	config.offset = 0x400000;
+	config.len = 0x1000;
+
+	ret = sys_ptrace(PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG, child,
+			 (void *)sizeof(config), &config);
+
+	ASSERT_EQ(ret, 0);
+
+	memset(&config, 1, sizeof(config));
+	ret = sys_ptrace(PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG, child,
+			 (void *)sizeof(config), &config);
+
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(config.mode, PR_SYS_DISPATCH_ON);
+	ASSERT_EQ(config.selector, 0);
+	ASSERT_EQ(config.offset, 0x400000);
+	ASSERT_EQ(config.len, 0x1000);
+
+	kill(child, SIGKILL);
+}
+
+TEST_HARNESS_MAIN
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v13 1/3] syscall_user_dispatch: helper function to operate on given task
  2023-03-01 20:58 ` [PATCH v13 1/3] syscall_user_dispatch: helper function to operate on given task Gregory Price
@ 2023-03-06 18:04   ` Oleg Nesterov
  2023-03-21 15:41   ` Thomas Gleixner
  1 sibling, 0 replies; 10+ messages in thread
From: Oleg Nesterov @ 2023-03-06 18:04 UTC (permalink / raw)
  To: Gregory Price
  Cc: linux-kernel, linux-doc, avagin, peterz, luto, krisman, tglx,
	corbet, shuah, Gregory Price

On 03/01, Gregory Price wrote:
>
> Preparatory patch ahead of set/get interfaces which will allow a
> ptrace to get/set the syscall user dispatch configuration of a task.
>
> This will simplify the set interface and consolidates error paths.
>
> Signed-off-by: Gregory Price <gregory.price@memverge.com>

Reviewed-by: Oleg Nesterov <oleg@redhat.com>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v13 2/3] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD
  2023-03-01 20:58 ` [PATCH v13 2/3] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD Gregory Price
@ 2023-03-06 18:04   ` Oleg Nesterov
  0 siblings, 0 replies; 10+ messages in thread
From: Oleg Nesterov @ 2023-03-06 18:04 UTC (permalink / raw)
  To: Gregory Price
  Cc: linux-kernel, linux-doc, avagin, peterz, luto, krisman, tglx,
	corbet, shuah, Gregory Price

On 03/01, Gregory Price wrote:
>
> Implement ptrace getter/setter interface for syscall user dispatch.
> 
> These prctl settings are presently write-only, making it impossible to
> implement transparent checkpoint/restore via software like CRIU.
> 
> 'on_dispatch' field is not exposed because it is a kernel-internal
> only field that cannot be 'true' when returning to userland.
> 
> Signed-off-by: Gregory Price <gregory.price@memverge.com>

Reviewed-by: Oleg Nesterov <oleg@redhat.com>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v13 1/3] syscall_user_dispatch: helper function to operate on given task
  2023-03-01 20:58 ` [PATCH v13 1/3] syscall_user_dispatch: helper function to operate on given task Gregory Price
  2023-03-06 18:04   ` Oleg Nesterov
@ 2023-03-21 15:41   ` Thomas Gleixner
  2023-03-21 16:55     ` Gregory Price
  1 sibling, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2023-03-21 15:41 UTC (permalink / raw)
  To: Gregory Price, linux-kernel
  Cc: linux-doc, oleg, avagin, peterz, luto, krisman, corbet, shuah,
	Gregory Price, Mark Rutland, Will Deacon

Gregory!

On Wed, Mar 01 2023 at 15:58, Gregory Price wrote:
> +static int task_set_syscall_user_dispatch(struct task_struct *task, unsigned long mode,
> +					  unsigned long offset, unsigned long len,
> +					  char __user *selector)
>  {
>  	switch (mode) {
>  	case PR_SYS_DISPATCH_OFF:
        ...

	case PR_SYS_DISPATCH_ON:
		if (selector && !access_ok(selector, sizeof(*selector)))
			return -EFAULT;

I'm not seing how this can work on ARM64 when user pointer tagging is
enabled in the tracee, but not in the tracer. In such a case, if the
pointer is tagged, access_ok() will fail because access_ok() wont untag
it.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v13 1/3] syscall_user_dispatch: helper function to operate on given task
  2023-03-21 15:41   ` Thomas Gleixner
@ 2023-03-21 16:55     ` Gregory Price
  2023-03-21 19:46       ` Thomas Gleixner
  0 siblings, 1 reply; 10+ messages in thread
From: Gregory Price @ 2023-03-21 16:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Gregory Price, linux-kernel, linux-doc, oleg, avagin, peterz,
	luto, krisman, corbet, shuah, Mark Rutland, Will Deacon

On Tue, Mar 21, 2023 at 04:41:37PM +0100, Thomas Gleixner wrote:
> Gregory!
> 
> On Wed, Mar 01 2023 at 15:58, Gregory Price wrote:
> > +static int task_set_syscall_user_dispatch(struct task_struct *task, unsigned long mode,
> > +					  unsigned long offset, unsigned long len,
> > +					  char __user *selector)
> >  {
> >  	switch (mode) {
> >  	case PR_SYS_DISPATCH_OFF:
>         ...
> 
> 	case PR_SYS_DISPATCH_ON:
> 		if (selector && !access_ok(selector, sizeof(*selector)))
> 			return -EFAULT;
> 
> I'm not seing how this can work on ARM64 when user pointer tagging is
> enabled in the tracee, but not in the tracer. In such a case, if the
> pointer is tagged, access_ok() will fail because access_ok() wont untag
> it.
> 
> Thanks,
> 
>         tglx

I see that untagged_addr(x) is available to clear tags, I don't see an
immediate issues with converting to:

!access_ok(untagged_addr(selector), sizeof(*selector))

In both the tracee calling the prctl interface and the tracer calling
the ptrace interface the tag will be cleared, which appears to be the
intended effect.  Just want a sanity check before i push it through, as
I'm not overly familiar with the ARM/tagging ecosystem.

Seems reasoanble that this change should live with this commit, so i'll
plan to squash and push it up if the change is reasonable.

Thanks for your input
~Gregory

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v13 1/3] syscall_user_dispatch: helper function to operate on given task
  2023-03-21 16:55     ` Gregory Price
@ 2023-03-21 19:46       ` Thomas Gleixner
  2023-03-21 21:12         ` Gregory Price
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2023-03-21 19:46 UTC (permalink / raw)
  To: Gregory Price
  Cc: Gregory Price, linux-kernel, linux-doc, oleg, avagin, peterz,
	luto, krisman, corbet, shuah, Mark Rutland, Will Deacon

On Tue, Mar 21 2023 at 12:55, Gregory Price wrote:
> On Tue, Mar 21, 2023 at 04:41:37PM +0100, Thomas Gleixner wrote:
>> On Wed, Mar 01 2023 at 15:58, Gregory Price wrote:
>> > +static int task_set_syscall_user_dispatch(struct task_struct *task, unsigned long mode,
>> > +					  unsigned long offset, unsigned long len,
>> > +					  char __user *selector)
>> >  {
>> >  	switch (mode) {
>> >  	case PR_SYS_DISPATCH_OFF:
>>         ...
>> 
>> 	case PR_SYS_DISPATCH_ON:
>> 		if (selector && !access_ok(selector, sizeof(*selector)))
>> 			return -EFAULT;
>> 
>> I'm not seing how this can work on ARM64 when user pointer tagging is
>> enabled in the tracee, but not in the tracer. In such a case, if the
>> pointer is tagged, access_ok() will fail because access_ok() wont untag
>> it.
>
> I see that untagged_addr(x) is available to clear tags, I don't see an
> immediate issues with converting to:
>
> !access_ok(untagged_addr(selector), sizeof(*selector))

If this would be correct, then access_ok() on arm64 would
unconditionally untag the checked address, but it does not. Simply
because untagging is only valid if the task enabled pointer tagging. If
it didn't a tagged pointer is obviously invalid.

Why would ptrace make this suddenly valid?

Just because it's in the way of what you want to achieve is not a really
sufficient justification.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v13 1/3] syscall_user_dispatch: helper function to operate on given task
  2023-03-21 19:46       ` Thomas Gleixner
@ 2023-03-21 21:12         ` Gregory Price
  0 siblings, 0 replies; 10+ messages in thread
From: Gregory Price @ 2023-03-21 21:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Gregory Price, linux-kernel, linux-doc, oleg, avagin, peterz,
	luto, krisman, corbet, shuah, Mark Rutland, Will Deacon

On Tue, Mar 21, 2023 at 08:46:26PM +0100, Thomas Gleixner wrote:
> On Tue, Mar 21 2023 at 12:55, Gregory Price wrote:
> > On Tue, Mar 21, 2023 at 04:41:37PM +0100, Thomas Gleixner wrote:
> >> On Wed, Mar 01 2023 at 15:58, Gregory Price wrote:
> >> > +static int task_set_syscall_user_dispatch(struct task_struct *task, unsigned long mode,
> >> > +					  unsigned long offset, unsigned long len,
> >> > +					  char __user *selector)
> >> >  {
> >> >  	switch (mode) {
> >> >  	case PR_SYS_DISPATCH_OFF:
> >>         ...
> >> 
> >> 	case PR_SYS_DISPATCH_ON:
> >> 		if (selector && !access_ok(selector, sizeof(*selector)))
> >> 			return -EFAULT;
> >> 
> >> I'm not seing how this can work on ARM64 when user pointer tagging is
> >> enabled in the tracee, but not in the tracer. In such a case, if the
> >> pointer is tagged, access_ok() will fail because access_ok() wont untag
> >> it.
> >
> > I see that untagged_addr(x) is available to clear tags, I don't see an
> > immediate issues with converting to:
> >
> > !access_ok(untagged_addr(selector), sizeof(*selector))
> 
> If this would be correct, then access_ok() on arm64 would
> unconditionally untag the checked address, but it does not. Simply
> because untagging is only valid if the task enabled pointer tagging. If
> it didn't a tagged pointer is obviously invalid.
> 
> Why would ptrace make this suddenly valid?
> 
> Just because it's in the way of what you want to achieve is not a really
> sufficient justification.
> 
> Thanks,
> 
>         tglx


Ah, I see, The issue stems from this code in arch/arm64/asm/uaccess.h

static inline int access_ok(const void __user *addr, unsigned long size)
{
        /*
         * Asynchronous I/O running in a kernel thread does not have the
         * TIF_TAGGED_ADDR flag of the process owning the mm, so always untag
         * the user address before checking.
         */
        if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) &&
            (current->flags & PF_KTHREAD || test_thread_flag(TIF_TAGGED_ADDR)))
                addr = untagged_addr(addr);

        return likely(__access_ok(addr, size));
}

The calling task clears the tags if the tagged flag is set.

The problem is that no task_access_ok equivalent exists to validate a
pointer based on another task's settings.



The "clean" way to fix this issue is with a task_access_ok, this keeps
things portable.

On ARM64, it looks like refactoring access_ok into the following: 

static inline int task_access_ok(struct task_struct *task,
                                 const void __user *addr,
				 unsigned long size)
{
        /*
         * Asynchronous I/O running in a kernel thread does not have the
         * TIF_TAGGED_ADDR flag of the process owning the mm, so always untag
         * the user address before checking.
         */
        if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) &&
            (task->flags & PF_KTHREAD || test_ti_thread_flag(task, TIF_TAGGED_ADDR)))
                addr = untagged_addr(addr);

        return likely(__access_ok(addr, size));
}

static inline int access_ok(const void __user *addr, unsigned long size)
{
	return task_access_ok(current, addr, size);
}

#define task_access_ok task_access_ok
#define access_ok access_ok



A similar change is made in include/asm-generic/access_ok.h

If this is an amenable solution, I will pull this into a patch ahead of
the changes in syscall user dispatch.


~Gregory

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-03-21 21:12 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-01 20:58 [PATCH v13 0/3] Checkpoint Support for Syscall User Dispatch Gregory Price
2023-03-01 20:58 ` [PATCH v13 1/3] syscall_user_dispatch: helper function to operate on given task Gregory Price
2023-03-06 18:04   ` Oleg Nesterov
2023-03-21 15:41   ` Thomas Gleixner
2023-03-21 16:55     ` Gregory Price
2023-03-21 19:46       ` Thomas Gleixner
2023-03-21 21:12         ` Gregory Price
2023-03-01 20:58 ` [PATCH v13 2/3] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD Gregory Price
2023-03-06 18:04   ` Oleg Nesterov
2023-03-01 20:58 ` [PATCH v13 3/3] selftest,ptrace: Add selftest for syscall user dispatch config api Gregory Price

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).