linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v8 0/1] Checkpoint Support for Syscall User Dispatch
@ 2023-01-31 14:44 Gregory Price
  2023-01-31 14:44 ` [PATCH v8 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD Gregory Price
  0 siblings, 1 reply; 7+ messages in thread
From: Gregory Price @ 2023-01-31 14:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-doc, oleg, avagin, peterz, luto, krisman, tglx, corbet,
	shuah, Gregory Price

v8: fix include issue   Reported-by: kernel test robot <lkp@intel.com>
    summary:
    +++ b/kernel/entry/syscall_user_dispatch.c
    + include <linux/ptrace.h>

v7: drop ptrace suspend flag, not required
    hanging unreferenced variable
                whitespace

v6: drop fs/proc/array update, it's not needed
    drop on_dispatch field exposure in config structure, it's not
    checkpoint relevant.
    (Thank you for the reviews Oleg and Andrei)

v5: automated test for !defined(GENERIC_ENTRY) failed, fix fs/proc
    use ifdef for GENERIC_ENTRY || TIF_SYSCALL_USER_DISPATCH
    note: syscall user dispatch is not presently supported for
          non-generic entry, but could be implemented. question is
          whether the TIF_ define should be carved out now or then

v4: Whitespace
    s/CHECKPOINT_RESTART/CHECKPOINT_RESUME
    check test_syscall_work(SYSCALL_USER_DISPATCH) to determine if it's
    turned on or not in fs/proc/array and getter interface

v3: Kernel test robot static function fix
    Whitespace nitpicks

v2: Implements the getter/setter interface in ptrace rather than prctl

Syscall user dispatch makes it possible to cleanly intercept system
calls from user-land.  However, most transparent checkpoint software
presently leverages some combination of ptrace and system call
injection to place software in a ready-to-checkpoint state.

If Syscall User Dispatch is enabled at the time of being quiesced,
injected system calls will subsequently be interposed upon and
dispatched to the task's signal handler.

Patch summary:
- Implement a getter interface for Syscall User Dispatch config info.
  To resume successfully, the checkpoint/resume software has to
  save and restore this information.  Presently this configuration
  is write-only, with no way for C/R software to save it.

  This was done in ptrace because syscall user dispatch is not part of
  uapi. The syscall_user_dispatch_config structure was added to the
  ptrace exports.

Gregory Price (1):
  ptrace,syscall_user_dispatch: checkpoint/restore support for SUD

 .../admin-guide/syscall-user-dispatch.rst     |  5 ++-
 include/linux/syscall_user_dispatch.h         | 18 +++++++++
 include/uapi/linux/ptrace.h                   |  9 +++++
 kernel/entry/syscall_user_dispatch.c          | 40 +++++++++++++++++++
 kernel/ptrace.c                               |  9 +++++
 5 files changed, 80 insertions(+), 1 deletion(-)

-- 
2.39.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v8 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD
  2023-01-31 14:44 [PATCH v8 0/1] Checkpoint Support for Syscall User Dispatch Gregory Price
@ 2023-01-31 14:44 ` Gregory Price
  2023-01-31 18:58   ` Oleg Nesterov
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Gregory Price @ 2023-01-31 14:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-doc, oleg, avagin, peterz, luto, krisman, tglx, corbet,
	shuah, Gregory Price

Implement ptrace getter/setter interface for syscall user dispatch.

These prctl settings are presently write-only, making it impossible to
implement transparent checkpoint/restore via software like CRIU.

'on_dispatch' field is not exposed because it is a kernel-internal
only field that cannot be 'true' when returning to userland.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 .../admin-guide/syscall-user-dispatch.rst     |  5 ++-
 include/linux/syscall_user_dispatch.h         | 18 +++++++++
 include/uapi/linux/ptrace.h                   |  9 +++++
 kernel/entry/syscall_user_dispatch.c          | 40 +++++++++++++++++++
 kernel/ptrace.c                               |  9 +++++
 5 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/syscall-user-dispatch.rst b/Documentation/admin-guide/syscall-user-dispatch.rst
index 60314953c728..a23ae21a1d5b 100644
--- a/Documentation/admin-guide/syscall-user-dispatch.rst
+++ b/Documentation/admin-guide/syscall-user-dispatch.rst
@@ -43,7 +43,10 @@ doesn't rely on any of the syscall ABI to make the filtering.  It uses
 only the syscall dispatcher address and the userspace key.
 
 As the ABI of these intercepted syscalls is unknown to Linux, these
-syscalls are not instrumentable via ptrace or the syscall tracepoints.
+syscalls are not instrumentable via ptrace or the syscall tracepoints,
+however an interfaces to suspend, checkpoint, and restore syscall user
+dispatch configuration has been added to ptrace to assist userland
+checkpoint/restart software.
 
 Interface
 ---------
diff --git a/include/linux/syscall_user_dispatch.h b/include/linux/syscall_user_dispatch.h
index a0ae443fb7df..5de2d64ace19 100644
--- a/include/linux/syscall_user_dispatch.h
+++ b/include/linux/syscall_user_dispatch.h
@@ -22,6 +22,12 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
 #define clear_syscall_work_syscall_user_dispatch(tsk) \
 	clear_task_syscall_work(tsk, SYSCALL_USER_DISPATCH)
 
+int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+	void __user *data);
+
+int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+	void __user *data);
+
 #else
 struct syscall_user_dispatch {};
 
@@ -35,6 +41,18 @@ static inline void clear_syscall_work_syscall_user_dispatch(struct task_struct *
 {
 }
 
+static inline int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+	void __user *data)
+{
+	return -EINVAL;
+}
+
+static inline int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+	void __user *data)
+{
+	return -EINVAL;
+}
+
 #endif /* CONFIG_GENERIC_ENTRY */
 
 #endif /* _SYSCALL_USER_DISPATCH_H */
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index 195ae64a8c87..6d2f3b86f932 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -112,6 +112,15 @@ struct ptrace_rseq_configuration {
 	__u32 pad;
 };
 
+#define PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG 0x4210
+#define PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG 0x4211
+struct syscall_user_dispatch_config {
+	__u64 mode;
+	__s8 *selector;
+	__u64 offset;
+	__u64 len;
+};
+
 /*
  * These values are stored in task->ptrace_message
  * by ptrace_stop to describe the current syscall-stop.
diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index 0b6379adff6b..05d4053a771e 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -4,6 +4,7 @@
  */
 #include <linux/sched.h>
 #include <linux/prctl.h>
+#include <linux/ptrace.h>
 #include <linux/syscall_user_dispatch.h>
 #include <linux/uaccess.h>
 #include <linux/signal.h>
@@ -106,3 +107,42 @@ int set_syscall_user_dispatch(unsigned long mode, unsigned long offset,
 
 	return 0;
 }
+
+int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
+		void __user *data)
+{
+	struct syscall_user_dispatch *sd = &task->syscall_dispatch;
+	struct syscall_user_dispatch_config config;
+
+	if (size != sizeof(struct syscall_user_dispatch_config))
+		return -EINVAL;
+
+	if (test_syscall_work(SYSCALL_USER_DISPATCH))
+		config.mode = PR_SYS_DISPATCH_ON;
+	else
+		config.mode = PR_SYS_DISPATCH_OFF;
+
+	config.offset = sd->offset;
+	config.len = sd->len;
+	config.selector = sd->selector;
+
+	if (copy_to_user(data, &config, sizeof(config)))
+		return -EFAULT;
+
+	return 0;
+}
+
+int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
+		void __user *data)
+{
+	struct syscall_user_dispatch_config config;
+
+	if (size != sizeof(struct syscall_user_dispatch_config))
+		return -EINVAL;
+
+	if (copy_from_user(&config, data, sizeof(config)))
+		return -EFAULT;
+
+	return set_syscall_user_dispatch(config.mode, config.offset, config.len,
+			config.selector);
+}
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 54482193e1ed..d99376532b56 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -32,6 +32,7 @@
 #include <linux/compat.h>
 #include <linux/sched/signal.h>
 #include <linux/minmax.h>
+#include <linux/syscall_user_dispatch.h>
 
 #include <asm/syscall.h>	/* for syscall_get_* */
 
@@ -1259,6 +1260,14 @@ int ptrace_request(struct task_struct *child, long request,
 		break;
 #endif
 
+	case PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG:
+		ret = syscall_user_dispatch_set_config(child, addr, datavp);
+		break;
+
+	case PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG:
+		ret = syscall_user_dispatch_get_config(child, addr, datavp);
+		break;
+
 	default:
 		break;
 	}
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v8 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD
  2023-01-31 14:44 ` [PATCH v8 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD Gregory Price
@ 2023-01-31 18:58   ` Oleg Nesterov
  2023-02-09 20:29   ` Andrei Vagin
  2023-02-09 23:45   ` Thomas Gleixner
  2 siblings, 0 replies; 7+ messages in thread
From: Oleg Nesterov @ 2023-01-31 18:58 UTC (permalink / raw)
  To: Gregory Price
  Cc: linux-kernel, linux-doc, avagin, peterz, luto, krisman, tglx,
	corbet, shuah, Gregory Price

On 01/31, Gregory Price wrote:
>
> Implement ptrace getter/setter interface for syscall user dispatch.
>
> These prctl settings are presently write-only, making it impossible to
> implement transparent checkpoint/restore via software like CRIU.
>
> 'on_dispatch' field is not exposed because it is a kernel-internal
> only field that cannot be 'true' when returning to userland.
>
> Signed-off-by: Gregory Price <gregory.price@memverge.com>

Reviewed-by: Oleg Nesterov <oleg@redhat.com>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v8 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD
  2023-01-31 14:44 ` [PATCH v8 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD Gregory Price
  2023-01-31 18:58   ` Oleg Nesterov
@ 2023-02-09 20:29   ` Andrei Vagin
  2023-02-09 23:45   ` Thomas Gleixner
  2 siblings, 0 replies; 7+ messages in thread
From: Andrei Vagin @ 2023-02-09 20:29 UTC (permalink / raw)
  To: Gregory Price
  Cc: linux-kernel, linux-doc, oleg, peterz, luto, krisman, tglx,
	corbet, shuah, Gregory Price

On Tue, Jan 31, 2023 at 7:30 AM Gregory Price <gourry.memverge@gmail.com> wrote:
>
> Implement ptrace getter/setter interface for syscall user dispatch.
>
> These prctl settings are presently write-only, making it impossible to
> implement transparent checkpoint/restore via software like CRIU.
>
> 'on_dispatch' field is not exposed because it is a kernel-internal
> only field that cannot be 'true' when returning to userland.
>

Acked-by: Andrei Vagin <avagin@gmail.com>

> Signed-off-by: Gregory Price <gregory.price@memverge.com>

<snip>

> diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
> index 195ae64a8c87..6d2f3b86f932 100644
> --- a/include/uapi/linux/ptrace.h
> +++ b/include/uapi/linux/ptrace.h
> @@ -112,6 +112,15 @@ struct ptrace_rseq_configuration {
>         __u32 pad;
>  };
>
> +#define PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG 0x4210
> +#define PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG 0x4211
> +struct syscall_user_dispatch_config {

nit: all structures in this header have the ptrace prefix. I think it
is better to add it to this one too.

> +       __u64 mode;
> +       __s8 *selector;
> +       __u64 offset;
> +       __u64 len;
> +};
> +
>  /*
>   * These values are stored in task->ptrace_message
>   * by ptrace_stop to describe the current syscall-stop.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v8 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD
  2023-01-31 14:44 ` [PATCH v8 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD Gregory Price
  2023-01-31 18:58   ` Oleg Nesterov
  2023-02-09 20:29   ` Andrei Vagin
@ 2023-02-09 23:45   ` Thomas Gleixner
  2023-02-09 23:46     ` Thomas Gleixner
  2 siblings, 1 reply; 7+ messages in thread
From: Thomas Gleixner @ 2023-02-09 23:45 UTC (permalink / raw)
  To: Gregory Price, linux-kernel
  Cc: linux-doc, oleg, avagin, peterz, luto, krisman, corbet, shuah,
	Gregory Price

On Tue, Jan 31 2023 at 09:44, Gregory Price wrote:
>  
> +static inline int syscall_user_dispatch_get_config(struct task_struct *task, unsigned long size,
> +	void __user *data)

https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#line-breaks

All over the place.

> index 195ae64a8c87..6d2f3b86f932 100644
> --- a/include/uapi/linux/ptrace.h
> +++ b/include/uapi/linux/ptrace.h
> @@ -112,6 +112,15 @@ struct ptrace_rseq_configuration {
>  	__u32 pad;
>  };
>  
> +#define PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG 0x4210
> +#define PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG 0x4211
> +struct syscall_user_dispatch_config {

Can you please visibly separate the defines from the struct definition
by a newline? Glueing that stuff together is just horrible to read.

> +	__u64 mode;
> +	__s8 *selector;
> +	__u64 offset;
> +	__u64 len;

https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#struct-declarations-and-initializers

Please add proper documentation to this struct. It's user space ABI and
it's not the job of the man page maintainers to figure out what this
actually means.

> +int syscall_user_dispatch_set_config(struct task_struct *task, unsigned long size,
> +		void __user *data)
> +{
> +	struct syscall_user_dispatch_config config;
> +
> +	if (size != sizeof(struct syscall_user_dispatch_config))
> +		return -EINVAL;
> +
> +	if (copy_from_user(&config, data, sizeof(config)))
> +		return -EFAULT;
> +
> +	return set_syscall_user_dispatch(config.mode, config.offset, config.len,
> +			config.selector);

How is this supposed to work? This is called from the ptracer to set the
user dispatch mode on the ptracee, i.e. on @task.

But set_syscall_user_dispatch() operates on current, which is the
ptracer itself.

Clearly well tested with the non-existant selftest, which is part of
this submission.

So please fix the above issues, add a selftest and proper documentation.

I'm neither impressed by this patch nor by the reviews.

Sigh

        tglx


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v8 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD
  2023-02-09 23:45   ` Thomas Gleixner
@ 2023-02-09 23:46     ` Thomas Gleixner
  2023-02-10  1:27       ` Gregory Price
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Gleixner @ 2023-02-09 23:46 UTC (permalink / raw)
  To: Gregory Price, linux-kernel
  Cc: linux-doc, oleg, avagin, peterz, luto, krisman, corbet, shuah,
	Gregory Price

On Fri, Feb 10 2023 at 00:45, Thomas Gleixner wrote:
> I'm neither impressed by this patch nor by the reviews.

I just noticed that this is V8. I'm truly impressed now.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v8 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD
  2023-02-09 23:46     ` Thomas Gleixner
@ 2023-02-10  1:27       ` Gregory Price
  0 siblings, 0 replies; 7+ messages in thread
From: Gregory Price @ 2023-02-10  1:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Gregory Price, linux-kernel, linux-doc, oleg, avagin, peterz,
	luto, krisman, corbet, shuah

On Fri, Feb 10, 2023 at 12:46:41AM +0100, Thomas Gleixner wrote:
> On Fri, Feb 10 2023 at 00:45, Thomas Gleixner wrote:
> > I'm neither impressed by this patch nor by the reviews.
> 
> I just noticed that this is V8. I'm truly impressed now.

I'll address your feedback, thank you for the review :]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-02-10  4:58 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-31 14:44 [PATCH v8 0/1] Checkpoint Support for Syscall User Dispatch Gregory Price
2023-01-31 14:44 ` [PATCH v8 1/1] ptrace,syscall_user_dispatch: checkpoint/restore support for SUD Gregory Price
2023-01-31 18:58   ` Oleg Nesterov
2023-02-09 20:29   ` Andrei Vagin
2023-02-09 23:45   ` Thomas Gleixner
2023-02-09 23:46     ` Thomas Gleixner
2023-02-10  1:27       ` Gregory Price

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).