linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v9 0/1] Checkpoint Support for Syscall User Dispatch
@ 2023-02-10  7:25 Gregory Price
  2023-02-10  7:25 ` [PATCH v9 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD Gregory Price
  0 siblings, 1 reply; 6+ messages in thread
From: Gregory Price @ 2023-02-10  7:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-doc, oleg, avagin, peterz, luto, krisman, tglx, corbet,
	shuah, Gregory Price

v9: tglx feedback
    whitespace
    documentation of ptrace struct
    shorten struct name
    helper function for set_syscall_user_dispatch
    use task variant of set/clear_syscall_work 
    use task variant of test_syscall_work in getter
    selftest

v8: fix include issue   Reported-by: kernel test robot <lkp@intel.com>
    summary:
    +++ b/kernel/entry/syscall_user_dispatch.c
    + include <linux/ptrace.h>

v7: drop ptrace suspend flag, not required
    hanging unreferenced variable
                whitespace

v6: drop fs/proc/array update, it's not needed
    drop on_dispatch field exposure in config structure, it's not
    checkpoint relevant.
    (Thank you for the reviews Oleg and Andrei)

v5: automated test for !defined(GENERIC_ENTRY) failed, fix fs/proc
    use ifdef for GENERIC_ENTRY || TIF_SYSCALL_USER_DISPATCH
    note: syscall user dispatch is not presently supported for
          non-generic entry, but could be implemented. question is
          whether the TIF_ define should be carved out now or then

v4: Whitespace
    s/CHECKPOINT_RESTART/CHECKPOINT_RESUME
    check test_syscall_work(SYSCALL_USER_DISPATCH) to determine if it's
    turned on or not in fs/proc/array and getter interface

v3: Kernel test robot static function fix
    Whitespace nitpicks

v2: Implements the getter/setter interface in ptrace rather than prctl

Syscall user dispatch makes it possible to cleanly intercept system
calls from user-land.  However, most transparent checkpoint software
presently leverages some combination of ptrace and system call
injection to place software in a ready-to-checkpoint state.

If Syscall User Dispatch is enabled at the time of being quiesced,
injected system calls will subsequently be interposed upon and
dispatched to the task's signal handler.

Patch summary:
- Implement a getter interface for Syscall User Dispatch config info.
  To resume successfully, the checkpoint/resume software has to
  save and restore this information.  Presently this configuration
  is write-only, with no way for C/R software to save it.

  This was done in ptrace because syscall user dispatch is not part of
  uapi. The syscall_user_dispatch_config structure was added to the
  ptrace exports.


Gregory Price (1):
  ptrace,syscall_user_dispatch: checkpoint/restore support for SUD

 .../admin-guide/syscall-user-dispatch.rst     |  5 +-
 include/linux/syscall_user_dispatch.h         | 18 +++++
 include/uapi/linux/ptrace.h                   | 29 +++++++
 kernel/entry/syscall_user_dispatch.c          | 63 +++++++++++++--
 kernel/ptrace.c                               |  9 +++
 tools/testing/selftests/ptrace/.gitignore     |  1 +
 tools/testing/selftests/ptrace/Makefile       |  2 +-
 tools/testing/selftests/ptrace/get_set_sud.c  | 80 +++++++++++++++++++
 8 files changed, 197 insertions(+), 10 deletions(-)
 create mode 100644 tools/testing/selftests/ptrace/get_set_sud.c

-- 
2.39.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v9 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD
  2023-02-10  7:25 [PATCH v9 0/1] Checkpoint Support for Syscall User Dispatch Gregory Price
@ 2023-02-10  7:25 ` Gregory Price
  2023-02-13 20:26   ` Thomas Gleixner
  0 siblings, 1 reply; 6+ messages in thread
From: Gregory Price @ 2023-02-10  7:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-doc, oleg, avagin, peterz, luto, krisman, tglx, corbet,
	shuah, Gregory Price

Implement ptrace getter/setter interface for syscall user dispatch.

These prctl settings are presently write-only, making it impossible to
implement transparent checkpoint/restore via software like CRIU.

'on_dispatch' field is not exposed because it is a kernel-internal
only field that cannot be 'true' when returning to userland.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 .../admin-guide/syscall-user-dispatch.rst     |  5 +-
 include/linux/syscall_user_dispatch.h         | 18 +++++
 include/uapi/linux/ptrace.h                   | 29 +++++++
 kernel/entry/syscall_user_dispatch.c          | 63 +++++++++++++--
 kernel/ptrace.c                               |  9 +++
 tools/testing/selftests/ptrace/.gitignore     |  1 +
 tools/testing/selftests/ptrace/Makefile       |  2 +-
 tools/testing/selftests/ptrace/get_set_sud.c  | 80 +++++++++++++++++++
 8 files changed, 197 insertions(+), 10 deletions(-)
 create mode 100644 tools/testing/selftests/ptrace/get_set_sud.c

diff --git a/Documentation/admin-guide/syscall-user-dispatch.rst b/Documentation/admin-guide/syscall-user-dispatch.rst
index 60314953c728..a23ae21a1d5b 100644
--- a/Documentation/admin-guide/syscall-user-dispatch.rst
+++ b/Documentation/admin-guide/syscall-user-dispatch.rst
@@ -43,7 +43,10 @@ doesn't rely on any of the syscall ABI to make the filtering.  It uses
 only the syscall dispatcher address and the userspace key.
 
 As the ABI of these intercepted syscalls is unknown to Linux, these
-syscalls are not instrumentable via ptrace or the syscall tracepoints.
+syscalls are not instrumentable via ptrace or the syscall tracepoints,
+however an interfaces to suspend, checkpoint, and restore syscall user
+dispatch configuration has been added to ptrace to assist userland
+checkpoint/restart software.
 
 Interface
 ---------
diff --git a/include/linux/syscall_user_dispatch.h b/include/linux/syscall_user_dispatch.h
index a0ae443fb7df..641ca8880995 100644
--- a/include/linux/syscall_user_dispatch.h
+++ b/include/linux/syscall_user_dispatch.h
@@ -22,6 +22,12 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
 #define clear_syscall_work_syscall_user_dispatch(tsk) \
 	clear_task_syscall_work(tsk, SYSCALL_USER_DISPATCH)
 
+int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+				     void __user *data);
+
+int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+				     void __user *data);
+
 #else
 struct syscall_user_dispatch {};
 
@@ -35,6 +41,18 @@ static inline void clear_syscall_work_syscall_user_dispatch(struct task_struct *
 {
 }
 
+static inline int syscall_user_dispatch_get_config(struct task_struct *task,
+						   unsigned long size, void __user *data)
+{
+	return -EINVAL;
+}
+
+static inline int syscall_user_dispatch_set_config(struct task_struct *task,
+						   unsigned long size, void __user *data)
+{
+	return -EINVAL;
+}
+
 #endif /* CONFIG_GENERIC_ENTRY */
 
 #endif /* _SYSCALL_USER_DISPATCH_H */
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index 195ae64a8c87..e79739e86a68 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -112,6 +112,35 @@ struct ptrace_rseq_configuration {
 	__u32 pad;
 };
 
+#define PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG 0x4210
+#define PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG 0x4211
+
+/*
+ * struct ptrace_sud_config - Per-task configuration for SUD
+ * @mode:	One of PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF
+ * @selector:	Tracee's user virtual address of SUD selector
+ * @offset:	SUD exclusion area (virtual address)
+ * @len:	Length of SUD exclusion area
+ *
+ * Used to get/set the syscall user dispatch configuration for tracee.
+ * process.  Selector is optional (may be NULL), and if invalid will produce
+ * a SIGSEGV in the tracee upon first access.
+ *
+ * If mode is PR_SYS_DISPATCH_ON, syscall dispatch will be enabled. If
+ * PR_SYS_DISPATCH_OFF, syscall dispatch will be disabled and all other
+ * parameters must be 0.  The value in *selector (if not null), also determines
+ * whether syscall dispatch will occur.
+ *
+ * The SUD Exclusion area described by offset/len is the virtual address space
+ * from which syscalls will not produce a user dispatch.
+ */
+struct ptrace_sud_config {
+	__u64 mode;
+	__s8 *selector;
+	__u64 offset;
+	__u64 len;
+};
+
 /*
  * These values are stored in task->ptrace_message
  * by ptrace_stop to describe the current syscall-stop.
diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index 0b6379adff6b..7fffcf1a53cb 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -4,6 +4,7 @@
  */
 #include <linux/sched.h>
 #include <linux/prctl.h>
+#include <linux/ptrace.h>
 #include <linux/syscall_user_dispatch.h>
 #include <linux/uaccess.h>
 #include <linux/signal.h>
@@ -68,8 +69,9 @@ bool syscall_user_dispatch(struct pt_regs *regs)
 	return true;
 }
 
-int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
-			      unsigned long len, char __user *selector)
+static int task_set_syscall_user_dispatch(struct task_struct *task, unsigned long mode,
+		                          unsigned long offset, unsigned long len,
+		                          char __user *selector)
 {
 	switch (mode) {
 	case PR_SYS_DISPATCH_OFF:
@@ -94,15 +96,60 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
 		return -EINVAL;
 	}
 
-	current->syscall_dispatch.selector = selector;
-	current->syscall_dispatch.offset = offset;
-	current->syscall_dispatch.len = len;
-	current->syscall_dispatch.on_dispatch = false;
+	task->syscall_dispatch.selector = selector;
+	task->syscall_dispatch.offset = offset;
+	task->syscall_dispatch.len = len;
+	task->syscall_dispatch.on_dispatch = false;
 
 	if (mode == PR_SYS_DISPATCH_ON)
-		set_syscall_work(SYSCALL_USER_DISPATCH);
+		set_task_syscall_work(task, SYSCALL_USER_DISPATCH);
 	else
-		clear_syscall_work(SYSCALL_USER_DISPATCH);
+		clear_task_syscall_work(task, SYSCALL_USER_DISPATCH);
 
 	return 0;
 }
+
+int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
+		              unsigned long len, char __user *selector)
+{
+	return task_set_syscall_user_dispatch(current, mode, offset, len, selector);
+}
+
+int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+		                     void __user *data)
+{
+	struct syscall_user_dispatch *sd = &task->syscall_dispatch;
+	struct ptrace_sud_config config;
+
+	if (size != sizeof(struct ptrace_sud_config))
+		return -EINVAL;
+
+	if (test_task_syscall_work(task, SYSCALL_USER_DISPATCH))
+		config.mode = PR_SYS_DISPATCH_ON;
+	else
+		config.mode = PR_SYS_DISPATCH_OFF;
+
+	config.offset = sd->offset;
+	config.len = sd->len;
+	config.selector = sd->selector;
+
+	if (copy_to_user(data, &config, sizeof(config)))
+		return -EFAULT;
+
+	return 0;
+}
+
+int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+		                     void __user *data)
+{
+	struct ptrace_sud_config config;
+
+	if (size != sizeof(struct ptrace_sud_config))
+		return -EINVAL;
+
+	if (copy_from_user(&config, data, sizeof(config)))
+		return -EFAULT;
+
+	return task_set_syscall_user_dispatch(task, config.mode, config.offset,
+	                                      config.len, config.selector);
+}
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 54482193e1ed..d99376532b56 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -32,6 +32,7 @@
 #include <linux/compat.h>
 #include <linux/sched/signal.h>
 #include <linux/minmax.h>
+#include <linux/syscall_user_dispatch.h>
 
 #include <asm/syscall.h>	/* for syscall_get_* */
 
@@ -1259,6 +1260,14 @@ int ptrace_request(struct task_struct *child, long request,
 		break;
 #endif
 
+	case PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG:
+		ret = syscall_user_dispatch_set_config(child, addr, datavp);
+		break;
+
+	case PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG:
+		ret = syscall_user_dispatch_get_config(child, addr, datavp);
+		break;
+
 	default:
 		break;
 	}
diff --git a/tools/testing/selftests/ptrace/.gitignore b/tools/testing/selftests/ptrace/.gitignore
index 792318aaa30c..b7dde152e75a 100644
--- a/tools/testing/selftests/ptrace/.gitignore
+++ b/tools/testing/selftests/ptrace/.gitignore
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
 get_syscall_info
+get_set_sud
 peeksiginfo
 vmaccess
diff --git a/tools/testing/selftests/ptrace/Makefile b/tools/testing/selftests/ptrace/Makefile
index 2f1f532c39db..33a36b73bcb9 100644
--- a/tools/testing/selftests/ptrace/Makefile
+++ b/tools/testing/selftests/ptrace/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 CFLAGS += -std=c99 -pthread -iquote../../../../include/uapi -Wall
 
-TEST_GEN_PROGS := get_syscall_info peeksiginfo vmaccess
+TEST_GEN_PROGS := get_syscall_info peeksiginfo vmaccess get_set_sud
 
 include ../lib.mk
diff --git a/tools/testing/selftests/ptrace/get_set_sud.c b/tools/testing/selftests/ptrace/get_set_sud.c
new file mode 100644
index 000000000000..b22ab35cbca7
--- /dev/null
+++ b/tools/testing/selftests/ptrace/get_set_sud.c
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include "../kselftest_harness.h"
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <sys/wait.h>
+#include <sys/syscall.h>
+#include <sys/prctl.h>
+
+#include "linux/ptrace.h"
+
+static int sys_ptrace(int request, pid_t pid, void *addr, void *data)
+{
+	return syscall(SYS_ptrace, request, pid, addr, data);
+}
+
+TEST(get_set_sud)
+{
+	struct ptrace_sud_config config;
+	pid_t child;
+	int ret = 0;
+	int status;
+
+	child = fork();
+	ASSERT_GE(child, 0);
+	if (child == 0) {
+		ASSERT_EQ(0, sys_ptrace(PTRACE_TRACEME, 0, 0, 0)) {
+			TH_LOG("PTRACE_TRACEME: %m");
+		}
+		kill(getpid(), SIGSTOP);
+		sleep(2);
+		_exit(1);
+	}
+
+	waitpid(child, &status, 0);
+
+	config.mode = PR_SYS_DISPATCH_ON;
+	config.selector = (void*)0xDEADBEEF;
+	config.offset = 0x12345678;
+	config.len = 0x87654321;
+
+	ret = sys_ptrace(PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG, child,
+			 (void*)sizeof(config), &config);
+	if (ret < 0) {
+		ASSERT_EQ(errno, EIO);
+		goto leave;
+	}
+
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(config.mode, PR_SYS_DISPATCH_OFF);
+	ASSERT_EQ(config.selector, (void*)NULL);
+	ASSERT_EQ(config.offset, 0);
+	ASSERT_EQ(config.len, 0);
+
+	config.mode = PR_SYS_DISPATCH_ON;
+	config.selector = (void*)NULL;
+	config.offset = 0x400000;
+	config.len = 0x1000;
+
+	ret = sys_ptrace(PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG, child,
+			 (void*)sizeof(config), &config);
+
+	ASSERT_EQ(ret, 0);
+
+	memset(&config, 1, sizeof(config));
+	ret = sys_ptrace(PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG, child,
+			 (void*)sizeof(config), &config);
+
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(config.mode, PR_SYS_DISPATCH_ON);
+	ASSERT_EQ(config.selector, (void*)NULL);
+	ASSERT_EQ(config.offset, 0x400000);
+	ASSERT_EQ(config.len, 0x1000);
+
+leave:
+	kill(child, SIGKILL);
+}
+
+TEST_HARNESS_MAIN
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v9 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD
  2023-02-10  7:25 ` [PATCH v9 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD Gregory Price
@ 2023-02-13 20:26   ` Thomas Gleixner
  2023-02-13 21:00     ` Gregory Price
  2023-02-14  0:00     ` Gregory Price
  0 siblings, 2 replies; 6+ messages in thread
From: Thomas Gleixner @ 2023-02-13 20:26 UTC (permalink / raw)
  To: Gregory Price, linux-kernel
  Cc: linux-doc, oleg, avagin, peterz, luto, krisman, corbet, shuah,
	Gregory Price

On Fri, Feb 10 2023 at 02:25, Gregory Price wrote:
>  
>  As the ABI of these intercepted syscalls is unknown to Linux, these
> -syscalls are not instrumentable via ptrace or the syscall tracepoints.
> +syscalls are not instrumentable via ptrace or the syscall tracepoints,
> +however an interfaces to suspend, checkpoint, and restore syscall user
> +dispatch configuration has been added to ptrace to assist userland
> +checkpoint/restart software.

The above is incomprehensible word salad to me. Once the ptrace
functions are there then they can be used independent of CRIU, no?

> + * struct ptrace_sud_config - Per-task configuration for SUD
> + * @mode:	One of PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF
> + * @selector:	Tracee's user virtual address of SUD selector
> + * @offset:	SUD exclusion area (virtual address)
> + * @len:	Length of SUD exclusion area
> + *
> + * Used to get/set the syscall user dispatch configuration for tracee.
> + * process.  Selector is optional (may be NULL), and if invalid will produce
> + * a SIGSEGV in the tracee upon first access.
> + *
> + * If mode is PR_SYS_DISPATCH_ON, syscall dispatch will be enabled. If
> + * PR_SYS_DISPATCH_OFF, syscall dispatch will be disabled and all other
> + * parameters must be 0.  The value in *selector (if not null), also determines
> + * whether syscall dispatch will occur.
> + *
> + * The SUD Exclusion area described by offset/len is the virtual address space
> + * from which syscalls will not produce a user dispatch.
> + */
> +struct ptrace_sud_config {
> +	__u64 mode;
> +	__s8 *selector;

How is this correct for a 32bit ptracer running on a 64bit kernel? Aside
of not wiring up the compat syscall without any argumentation in the
changelog.


> --- a/kernel/entry/syscall_user_dispatch.c
> +++ b/kernel/entry/syscall_user_dispatch.c

This section:

> -int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
> -			      unsigned long len, char __user *selector)
> +static int task_set_syscall_user_dispatch(struct task_struct *task, unsigned long mode,
> +		                          unsigned long offset, unsigned long len,
> +		                          char __user *selector)
>  {
>  	switch (mode) {
>  	case PR_SYS_DISPATCH_OFF:
> @@ -94,15 +96,60 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
>  		return -EINVAL;
>  	}
>  
> -	current->syscall_dispatch.selector = selector;
> -	current->syscall_dispatch.offset = offset;
> -	current->syscall_dispatch.len = len;
> -	current->syscall_dispatch.on_dispatch = false;
> +	task->syscall_dispatch.selector = selector;
> +	task->syscall_dispatch.offset = offset;
> +	task->syscall_dispatch.len = len;
> +	task->syscall_dispatch.on_dispatch = false;
>  
>  	if (mode == PR_SYS_DISPATCH_ON)
> -		set_syscall_work(SYSCALL_USER_DISPATCH);
> +		set_task_syscall_work(task, SYSCALL_USER_DISPATCH);
>  	else
> -		clear_syscall_work(SYSCALL_USER_DISPATCH);
> +		clear_task_syscall_work(task, SYSCALL_USER_DISPATCH);
>  
>  	return 0;
>  }
> +
> +int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
> +		              unsigned long len, char __user *selector)
> +{
> +	return task_set_syscall_user_dispatch(current, mode, offset, len, selector);
> +}

until here want's to be a seperate preparatory patch.

> +++ b/tools/testing/selftests/ptrace/get_set_sud.c
> +	child = fork();
> +	ASSERT_GE(child, 0);
> +	if (child == 0) {
> +		ASSERT_EQ(0, sys_ptrace(PTRACE_TRACEME, 0, 0, 0)) {
> +			TH_LOG("PTRACE_TRACEME: %m");
> +		}
> +		kill(getpid(), SIGSTOP);
> +		sleep(2);

The purpose of this sleep is what?

> +		_exit(1);
> +	}
> +
> +	waitpid(child, &status, 0);
> +
> +	config.mode = PR_SYS_DISPATCH_ON;
> +	config.selector = (void*)0xDEADBEEF;
> +	config.offset = 0x12345678;
> +	config.len = 0x87654321;

What's the purpose of these magic numbers? memset(&config, 0xff,...) is
sufficient, no?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v9 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD
  2023-02-13 20:26   ` Thomas Gleixner
@ 2023-02-13 21:00     ` Gregory Price
  2023-02-14  0:00     ` Gregory Price
  1 sibling, 0 replies; 6+ messages in thread
From: Gregory Price @ 2023-02-13 21:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Gregory Price, linux-kernel, linux-doc, oleg, avagin, peterz,
	luto, krisman, corbet, shuah

On Mon, Feb 13, 2023 at 09:26:21PM +0100, Thomas Gleixner wrote:
> On Fri, Feb 10 2023 at 02:25, Gregory Price wrote:
> >  
> >  As the ABI of these intercepted syscalls is unknown to Linux, these
> > -syscalls are not instrumentable via ptrace or the syscall tracepoints.
> > +syscalls are not instrumentable via ptrace or the syscall tracepoints,
> > +however an interfaces to suspend, checkpoint, and restore syscall user
> > +dispatch configuration has been added to ptrace to assist userland
> > +checkpoint/restart software.
> 
> The above is incomprehensible word salad to me. Once the ptrace
> functions are there then they can be used independent of CRIU, no?
> 

The verbiage here is half-baked, I'll just break out a separate
paragraph to explain better (or drop entirely, if that's preferable).

Since SUD isn't really designed for anything other than syscall
emulation, there's not much of a use for these get/set interfaces
outside the context of checkpoint/restart.  GDB and friends are already
perfectly happy and capable of debugging SUD enabled software in the
absense of these interfaces and have no need to disable the feature.

> > + * struct ptrace_sud_config - Per-task configuration for SUD
> > + * @mode:	One of PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF
> > + * @selector:	Tracee's user virtual address of SUD selector
> > + * @offset:	SUD exclusion area (virtual address)
> > + * @len:	Length of SUD exclusion area
> > + *
> > + * Used to get/set the syscall user dispatch configuration for tracee.
> > + * process.  Selector is optional (may be NULL), and if invalid will produce
> > + * a SIGSEGV in the tracee upon first access.
> > + *
> > + * If mode is PR_SYS_DISPATCH_ON, syscall dispatch will be enabled. If
> > + * PR_SYS_DISPATCH_OFF, syscall dispatch will be disabled and all other
> > + * parameters must be 0.  The value in *selector (if not null), also determines
> > + * whether syscall dispatch will occur.
> > + *
> > + * The SUD Exclusion area described by offset/len is the virtual address space
> > + * from which syscalls will not produce a user dispatch.
> > + */
> > +struct ptrace_sud_config {
> > +	__u64 mode;
> > +	__s8 *selector;
> 
> How is this correct for a 32bit ptracer running on a 64bit kernel? Aside
> of not wiring up the compat syscall without any argumentation in the
> changelog.
> 

you're right, these would need to be unsigned long/pointers, i will
take a closer look at how ptrace manages this elsewhere and come back.

> 
> > --- a/kernel/entry/syscall_user_dispatch.c
> > +++ b/kernel/entry/syscall_user_dispatch.c
> 
> This section:
> 
> > -int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
> > -			      unsigned long len, char __user *selector)
> > +static int task_set_syscall_user_dispatch(struct task_struct *task, unsigned long mode,
> > +		                          unsigned long offset, unsigned long len,
> > +		                          char __user *selector)
> >  {
> >  	switch (mode) {
> >  	case PR_SYS_DISPATCH_OFF:
> > @@ -94,15 +96,60 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
> >  		return -EINVAL;
> >  	}
> >  
> > -	current->syscall_dispatch.selector = selector;
> > -	current->syscall_dispatch.offset = offset;
> > -	current->syscall_dispatch.len = len;
> > -	current->syscall_dispatch.on_dispatch = false;
> > +	task->syscall_dispatch.selector = selector;
> > +	task->syscall_dispatch.offset = offset;
> > +	task->syscall_dispatch.len = len;
> > +	task->syscall_dispatch.on_dispatch = false;
> >  
> >  	if (mode == PR_SYS_DISPATCH_ON)
> > -		set_syscall_work(SYSCALL_USER_DISPATCH);
> > +		set_task_syscall_work(task, SYSCALL_USER_DISPATCH);
> >  	else
> > -		clear_syscall_work(SYSCALL_USER_DISPATCH);
> > +		clear_task_syscall_work(task, SYSCALL_USER_DISPATCH);
> >  
> >  	return 0;
> >  }
> > +
> > +int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
> > +		              unsigned long len, char __user *selector)
> > +{
> > +	return task_set_syscall_user_dispatch(current, mode, offset, len, selector);
> > +}
> 
> until here want's to be a seperate preparatory patch.
> 

I had considered this, but didn't know what was preferable, given that
there's not much reason to create the functions outside the context of
this patch.

Will do.

> > +++ b/tools/testing/selftests/ptrace/get_set_sud.c
> > +	child = fork();
> > +	ASSERT_GE(child, 0);
> > +	if (child == 0) {
> > +		ASSERT_EQ(0, sys_ptrace(PTRACE_TRACEME, 0, 0, 0)) {
> > +			TH_LOG("PTRACE_TRACEME: %m");
> > +		}
> > +		kill(getpid(), SIGSTOP);
> > +		sleep(2);
> 
> The purpose of this sleep is what?
> 

artifact of taking other tests as an outline, will drop it and rerun.

> > +		_exit(1);
> > +	}
> > +
> > +	waitpid(child, &status, 0);
> > +
> > +	config.mode = PR_SYS_DISPATCH_ON;
> > +	config.selector = (void*)0xDEADBEEF;
> > +	config.offset = 0x12345678;
> > +	config.len = 0x87654321;
> 
> What's the purpose of these magic numbers? memset(&config, 0xff,...) is
> sufficient, no?
> 
> Thanks,
> 
>         tglx

Nothing, will drop.

~Gregory

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v9 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD
  2023-02-13 20:26   ` Thomas Gleixner
  2023-02-13 21:00     ` Gregory Price
@ 2023-02-14  0:00     ` Gregory Price
  2023-02-16 13:33       ` Oleg Nesterov
  1 sibling, 1 reply; 6+ messages in thread
From: Gregory Price @ 2023-02-14  0:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Gregory Price, linux-kernel, linux-doc, oleg, avagin, peterz,
	luto, krisman, corbet, shuah

On Mon, Feb 13, 2023 at 09:26:21PM +0100, Thomas Gleixner wrote:
> On Fri, Feb 10 2023 at 02:25, Gregory Price wrote:
> > +struct ptrace_sud_config {
> > +	__u64 mode;
> > +	__s8 *selector;
> 
> How is this correct for a 32bit ptracer running on a 64bit kernel? Aside
> of not wiring up the compat syscall without any argumentation in the
> changelog.
> 

I'm having a little trouble wrapping my head around what is "right" here
with regard to compat.  Granted I've never had to deal with compat
issues, so please excuse the ignorance if this is a trivial issue.


PEEKSIGINFO has a similar u64 field for an argument, and likewise does
not have plumbing through the compat path (it falls through to
ptrace_request).

There is no compat counterpart to this peeksiginfo structure, and
therefore no compat plumbing.  

struct ptrace_peeksiginfo_args {
        __u64 off;      /* from which siginfo to start */
        __u32 flags;
        __s32 nr;       /* how may siginfos to take */
};

static int ptrace_peek_siginfo(struct task_struct *child,
                               unsigned long addr,
                               unsigned long data)
{
        struct ptrace_peeksiginfo_args arg;
[...snip...]
        ret = copy_from_user(&arg, (void __user *) addr,
                                sizeof(struct ptrace_peeksiginfo_args));
[...snip...]
}

This makes it appear that the args struct has a __u64 field regardless
of mode. Either this or anything compiling in 32-bit mode will run into
an issue with any of these __u64 structures.  That doesn't seem likely.

This code suggests there's no need for special compat code. Otherwise,
is this a bug?  Too much data would be copied from userland if the field
is truncated to 32-bit when compiled.

either way, would an appropriate compat structure be:

struct compat_ptrace_sud_config {
    __u32 mode;
    __s8 *selector;
    __u32 offset_high;
    __u32 offset_low;
    __u32 len_high;
    __u32 len_low;
};

Or is a 32-bit program attempting to ptrace a 64-bit program not
generally expected to be supported? (from a bit of research, seems
like "you can make the attempt, good luck", levels of supported).

Thanks again for your time,
~Gregory

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v9 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD
  2023-02-14  0:00     ` Gregory Price
@ 2023-02-16 13:33       ` Oleg Nesterov
  0 siblings, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2023-02-16 13:33 UTC (permalink / raw)
  To: Gregory Price
  Cc: Thomas Gleixner, Gregory Price, linux-kernel, linux-doc, avagin,
	peterz, luto, krisman, corbet, shuah

On 02/13, Gregory Price wrote:
>
> On Mon, Feb 13, 2023 at 09:26:21PM +0100, Thomas Gleixner wrote:
> > On Fri, Feb 10 2023 at 02:25, Gregory Price wrote:
> > > +struct ptrace_sud_config {
> > > +	__u64 mode;
> > > +	__s8 *selector;
> >
> > How is this correct for a 32bit ptracer running on a 64bit kernel? Aside
> > of not wiring up the compat syscall without any argumentation in the
> > changelog.
> >
>
> I'm having a little trouble wrapping my head around what is "right" here
> with regard to compat.  Granted I've never had to deal with compat
> issues, so please excuse the ignorance if this is a trivial issue.

The problem is the sizeof(selector). 4 bytes for 32bit ptracer but the
kernel will write 8 bytes. I think you should make "selector" __u64 too.

Oleg.


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-02-16 13:34 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-10  7:25 [PATCH v9 0/1] Checkpoint Support for Syscall User Dispatch Gregory Price
2023-02-10  7:25 ` [PATCH v9 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD Gregory Price
2023-02-13 20:26   ` Thomas Gleixner
2023-02-13 21:00     ` Gregory Price
2023-02-14  0:00     ` Gregory Price
2023-02-16 13:33       ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).