From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
To: Andrei Vagin <avagin@gmail.com>,
	linux-kernel@vger.kernel.org, linux-api@vger.kernel.org
Cc: linux-um@lists.infradead.org, criu@openvz.org, avagin@google.com,
	Andrew Morton <akpm@linux-foundation.org>,
	Andy Lutomirski <luto@kernel.org>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Dmitry Safonov <0x7f454c46@gmail.com>,
	Ingo Molnar <mingo@redhat.com>, Jeff Dike <jdike@addtoit.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Richard Weinberger <richard@nod.at>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 0/4 POC] Allow executing code and syscalls in another address space
Date: Wed, 14 Apr 2021 08:22:28 +0100
Message-ID: <78cdee11-1923-595f-90d2-e236efbafa6a@cambridgegreys.com>
In-Reply-To: <20210414055217.543246-1-avagin@gmail.com>

On 14/04/2021 06:52, Andrei Vagin wrote:
> We already have process_vm_readv and process_vm_writev to read and write
> a process's memory faster than we can with ptrace. Now it is time for
> process_vm_exec, which allows executing code in the address space of
> another process. We can do this with ptrace but it is much
> slower.
> 
> = Use-cases =
> 
> Here are two known use-cases. The first one is “application kernel”
> sandboxes like User-mode Linux and gVisor. In this case, we have a
> process that runs the sandbox kernel and a set of stub processes that
> are used to manage guest address spaces. Guest code is executed in the
> context of stub processes but all system calls are intercepted and
> handled in the sandbox kernel. Right now, these sorts of sandboxes use
> PTRACE_SYSEMU to trap system calls, but process_vm_exec can
> significantly speed them up.

Certainly interesting, but it will require UML to rework most of its memory
management, and we will most likely need extra mm support to make use of
it in UML. We are unlikely to get away with just one syscall there.

> 
> Another use-case is CRIU (Checkpoint/Restore in User-space). Several
> process properties can be obtained only from the process itself. Right
> now, we use parasite code that is injected into the process. We do
> this with ptrace but it is slow, unsafe, and tricky. process_vm_exec can
> simplify the process of injecting parasite code and it will allow
> pre-dumping memory without stopping processes. The pre-dump here is when we
> enable a memory tracker and dump the memory while a process continues
> running. On each iteration we dump memory that has changed since
> the previous iteration. In the final step, we will stop processes and
> dump their full state. Right now the most effective way to dump process
> memory is to create a set of pipes and splice memory into these pipes
> from the parasite code. With process_vm_exec, we will be able to call
> vmsplice directly. It means that we will not need to stop a process to
> inject the parasite code.
> 
> = How it works =
> 
> process_vm_exec has two modes:
> 
> * Execute code in an address space of a target process and stop on any
>    signal or system call.
> 
> * Execute a system call in an address space of a target process.
> 
> int process_vm_exec(pid_t pid, struct sigcontext *uctx,
> 		    unsigned long flags, siginfo_t *siginfo,
> 		    sigset_t  *sigmask, size_t sizemask)
> 
> PID - target process identification. We could consider using a pidfd
> instead of a PID here.
> 
> sigcontext contains the process state with which the process will be
> resumed after switching the address space; when the process is
> stopped, its state will be saved back to sigcontext.
> 
> siginfo is information about a signal that has interrupted the process.
> If a process is interrupted by a system call, siginfo will contain a
> synthetic siginfo of the SIGSYS signal.
> 
> sigmask is a set of signals that process_vm_exec returns via siginfo.
> 
> = How fast is it =
> 
> In the fourth patch, you can find two benchmarks that execute a function
> that makes system calls in a loop. ptrace_vm_exec uses ptrace to trap
> system calls; process_vm_exec uses the process_vm_exec syscall to do the
> same thing.
> 
> ptrace_vm_exec:   1446 ns/syscall
> process_vm_exec:   289 ns/syscall
> 
> PS: This version is just a prototype. Its goal is to collect the initial
> feedback, to discuss the interfaces, and maybe to get some advice on
> implementation.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> Cc: Christian Brauner <christian.brauner@ubuntu.com>
> Cc: Dmitry Safonov <0x7f454c46@gmail.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Jeff Dike <jdike@addtoit.com>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Cc: Michael Kerrisk (man-pages) <mtk.manpages@gmail.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Richard Weinberger <richard@nod.at>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> 
> Andrei Vagin (4):
>    signal: add a helper to restore a process state from sigcontex
>    arch/x86: implement the process_vm_exec syscall
>    arch/x86: allow to execute syscalls via process_vm_exec
>    selftests: add tests for process_vm_exec
> 
>   arch/Kconfig                                  |  15 ++
>   arch/x86/Kconfig                              |   1 +
>   arch/x86/entry/common.c                       |  19 +++
>   arch/x86/entry/syscalls/syscall_64.tbl        |   1 +
>   arch/x86/include/asm/sigcontext.h             |   2 +
>   arch/x86/kernel/Makefile                      |   1 +
>   arch/x86/kernel/process_vm_exec.c             | 160 ++++++++++++++++++
>   arch/x86/kernel/signal.c                      | 125 ++++++++++----
>   include/linux/entry-common.h                  |   2 +
>   include/linux/process_vm_exec.h               |  17 ++
>   include/linux/sched.h                         |   7 +
>   include/linux/syscalls.h                      |   6 +
>   include/uapi/asm-generic/unistd.h             |   4 +-
>   include/uapi/linux/process_vm_exec.h          |   8 +
>   kernel/entry/common.c                         |   2 +-
>   kernel/fork.c                                 |   9 +
>   kernel/sys_ni.c                               |   2 +
>   .../selftests/process_vm_exec/Makefile        |   7 +
>   tools/testing/selftests/process_vm_exec/log.h |  26 +++
>   .../process_vm_exec/process_vm_exec.c         | 105 ++++++++++++
>   .../process_vm_exec/process_vm_exec_fault.c   | 111 ++++++++++++
>   .../process_vm_exec/process_vm_exec_syscall.c |  81 +++++++++
>   .../process_vm_exec/ptrace_vm_exec.c          | 111 ++++++++++++
>   23 files changed, 785 insertions(+), 37 deletions(-)
>   create mode 100644 arch/x86/kernel/process_vm_exec.c
>   create mode 100644 include/linux/process_vm_exec.h
>   create mode 100644 include/uapi/linux/process_vm_exec.h
>   create mode 100644 tools/testing/selftests/process_vm_exec/Makefile
>   create mode 100644 tools/testing/selftests/process_vm_exec/log.h
>   create mode 100644 tools/testing/selftests/process_vm_exec/process_vm_exec.c
>   create mode 100644 tools/testing/selftests/process_vm_exec/process_vm_exec_fault.c
>   create mode 100644 tools/testing/selftests/process_vm_exec/process_vm_exec_syscall.c
>   create mode 100644 tools/testing/selftests/process_vm_exec/ptrace_vm_exec.c
> 


-- 
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
