All of lore.kernel.org
 help / color / mirror / Atom feed
From: Benjamin Berg <benjamin@sipsolutions.net>
To: Johannes Berg <johannes@sipsolutions.net>,
	Anton Ivanov <anton.ivanov@cambridgegreys.com>,
	Andrei Vagin <avagin@gmail.com>,
	linux-kernel@vger.kernel.org, linux-api@vger.kernel.org
Cc: linux-um@lists.infradead.org, criu@openvz.org, avagin@google.com,
	Andrew Morton <akpm@linux-foundation.org>,
	Andy Lutomirski <luto@kernel.org>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Dmitry Safonov <0x7f454c46@gmail.com>,
	Ingo Molnar <mingo@redhat.com>, Jeff Dike <jdike@addtoit.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Richard Weinberger <richard@nod.at>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 0/4 POC] Allow executing code and syscalls in another address space
Date: Wed, 14 Apr 2021 11:24:19 +0200	[thread overview]
Message-ID: <c35e321fe3d3e6993c0d9a1ad638230a8f77866b.camel@sipsolutions.net> (raw)
In-Reply-To: <9f8280540bbc6f3c857ac5749eeafcd145577da3.camel@sipsolutions.net>

[-- Attachment #1: Type: text/plain, Size: 2225 bytes --]

On Wed, 2021-04-14 at 09:34 +0200, Johannes Berg wrote:
> On Wed, 2021-04-14 at 08:22 +0100, Anton Ivanov wrote:
> > On 14/04/2021 06:52, Andrei Vagin wrote:
> > > We already have process_vm_readv and process_vm_writev to read and
> > > write
> > > to a process memory faster than we can do this with ptrace. And now
> > > it
> > > is time for process_vm_exec that allows executing code in an
> > > address
> > > space of another process. We can do this with ptrace but it is much
> > > slower.
> > > 
> > > = Use-cases =
> > > 
> > > Here are two known use-cases. The first one is “application kernel”
> > > sandboxes like User-mode Linux and gVisor. In this case, we have a
> > > process that runs the sandbox kernel and a set of stub processes
> > > that
> > > are used to manage guest address spaces. Guest code is executed in
> > > the
> > > context of stub processes but all system calls are intercepted and
> > > handled in the sandbox kernel. Right now, these sort of sandboxes
> > > use
> > > PTRACE_SYSEMU to trap system calls, but the process_vm_exec can
> > > significantly speed them up.
> > 
> > Certainly interesting, but will require um to rework most of its
> > memory 
> > management and we will most likely need extra mm support to make use
> > of 
> > it in UML. We are not likely to get away just with one syscall there.
> 
> Might help the seccomp mode though:
> 
> https://patchwork.ozlabs.org/project/linux-um/list/?series=231980

Hmm, to me it sounds like it replaces both ptrace and seccomp mode
while completely avoiding the scheduling overhead that these techniques
have. I think everything UML needs is covered:

 * The new API can do syscalls in the target memory space
   (we can modify the address space)
 * The new API can run code until the next syscall happens
   (or a signal happens, which means SIGALRM for scheduling works)
 * Single step tracing should work by setting EFLAGS

I think the memory management itself stays fundamentally the same. We
just do the initial clone() using CLONE_STOPPED. We don't need any stub
code/data and we have everything we need to modify the address space
and run the userspace process.

Benjamin

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

WARNING: multiple messages have this Message-ID (diff)
From: Benjamin Berg <benjamin@sipsolutions.net>
To: Johannes Berg <johannes@sipsolutions.net>,
	Anton Ivanov <anton.ivanov@cambridgegreys.com>,
	Andrei Vagin <avagin@gmail.com>,
	linux-kernel@vger.kernel.org, linux-api@vger.kernel.org
Cc: linux-um@lists.infradead.org, criu@openvz.org, avagin@google.com,
	Andrew Morton <akpm@linux-foundation.org>,
	Andy Lutomirski <luto@kernel.org>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Dmitry Safonov <0x7f454c46@gmail.com>,
	Ingo Molnar <mingo@redhat.com>, Jeff Dike <jdike@addtoit.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Richard Weinberger <richard@nod.at>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 0/4 POC] Allow executing code and syscalls in another address space
Date: Wed, 14 Apr 2021 11:24:19 +0200	[thread overview]
Message-ID: <c35e321fe3d3e6993c0d9a1ad638230a8f77866b.camel@sipsolutions.net> (raw)
In-Reply-To: <9f8280540bbc6f3c857ac5749eeafcd145577da3.camel@sipsolutions.net>


[-- Attachment #1.1: Type: text/plain, Size: 2225 bytes --]

On Wed, 2021-04-14 at 09:34 +0200, Johannes Berg wrote:
> On Wed, 2021-04-14 at 08:22 +0100, Anton Ivanov wrote:
> > On 14/04/2021 06:52, Andrei Vagin wrote:
> > > We already have process_vm_readv and process_vm_writev to read and
> > > write
> > > to a process memory faster than we can do this with ptrace. And now
> > > it
> > > is time for process_vm_exec that allows executing code in an
> > > address
> > > space of another process. We can do this with ptrace but it is much
> > > slower.
> > > 
> > > = Use-cases =
> > > 
> > > Here are two known use-cases. The first one is “application kernel”
> > > sandboxes like User-mode Linux and gVisor. In this case, we have a
> > > process that runs the sandbox kernel and a set of stub processes
> > > that
> > > are used to manage guest address spaces. Guest code is executed in
> > > the
> > > context of stub processes but all system calls are intercepted and
> > > handled in the sandbox kernel. Right now, these sort of sandboxes
> > > use
> > > PTRACE_SYSEMU to trap system calls, but the process_vm_exec can
> > > significantly speed them up.
> > 
> > Certainly interesting, but will require um to rework most of its
> > memory 
> > management and we will most likely need extra mm support to make use
> > of 
> > it in UML. We are not likely to get away just with one syscall there.
> 
> Might help the seccomp mode though:
> 
> https://patchwork.ozlabs.org/project/linux-um/list/?series=231980

Hmm, to me it sounds like it replaces both ptrace and seccomp mode
while completely avoiding the scheduling overhead that these techniques
have. I think everything UML needs is covered:

 * The new API can do syscalls in the target memory space
   (we can modify the address space)
 * The new API can run code until the next syscall happens
   (or a signal happens, which means SIGALRM for scheduling works)
 * Single step tracing should work by setting EFLAGS

I think the memory management itself stays fundamentally the same. We
just do the initial clone() using CLONE_STOPPED. We don't need any stub
code/data and we have everything we need to modify the address space
and run the userspace process.

Benjamin

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 152 bytes --]

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um

  reply	other threads:[~2021-04-14  9:24 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-14  5:52 [PATCH 0/4 POC] Allow executing code and syscalls in another address space Andrei Vagin
2021-04-14  5:52 ` Andrei Vagin
2021-04-14  5:52 ` [PATCH 1/4] signal: add a helper to restore a process state from sigcontex Andrei Vagin
2021-04-14  5:52   ` Andrei Vagin
2021-04-14  5:52 ` [PATCH 2/4] arch/x86: implement the process_vm_exec syscall Andrei Vagin
2021-04-14  5:52   ` Andrei Vagin
2021-04-14 17:09   ` Oleg Nesterov
2021-04-14 17:09     ` Oleg Nesterov
2021-04-23  6:59     ` Andrei Vagin
2021-04-23  6:59       ` Andrei Vagin
2021-06-28 16:13   ` Jann Horn
2021-06-28 16:13     ` Jann Horn
2021-06-28 16:30     ` Andy Lutomirski
2021-06-28 17:14       ` Jann Horn
2021-06-28 17:14         ` Jann Horn
2021-06-28 18:18         ` Eric W. Biederman
2021-06-28 18:18           ` Eric W. Biederman
2021-06-29  1:01           ` Andrei Vagin
2021-06-29  1:01             ` Andrei Vagin
2021-07-02  6:22     ` Andrei Vagin
2021-07-02  6:22       ` Andrei Vagin
2021-07-02 11:51       ` Jann Horn
2021-07-02 11:51         ` Jann Horn
2021-07-02 11:51         ` Jann Horn
2021-07-02 20:40         ` Andy Lutomirski
2021-07-02 20:40           ` Andy Lutomirski
2021-07-02  8:51   ` Peter Zijlstra
2021-07-02  8:51     ` Peter Zijlstra
2021-07-02 22:21     ` Andrei Vagin
2021-07-02 22:21       ` Andrei Vagin
2021-07-02 20:56   ` Jann Horn
2021-07-02 20:56     ` Jann Horn
2021-07-02 22:48     ` Andrei Vagin
2021-07-02 22:48       ` Andrei Vagin
2021-04-14  5:52 ` [PATCH 3/4] arch/x86: allow to execute syscalls via process_vm_exec Andrei Vagin
2021-04-14  5:52   ` Andrei Vagin
2021-04-14  5:52 ` [PATCH 4/4] selftests: add tests for process_vm_exec Andrei Vagin
2021-04-14  5:52   ` Andrei Vagin
2021-04-14  6:46 ` [PATCH 0/4 POC] Allow executing code and syscalls in another address space Jann Horn
2021-04-14  6:46   ` Jann Horn
2021-04-14 22:10   ` Andrei Vagin
2021-04-14 22:10     ` Andrei Vagin
2021-07-02  6:57   ` Andrei Vagin
2021-07-02  6:57     ` Andrei Vagin
2021-07-02 15:12     ` Jann Horn
2021-07-02 15:12       ` Jann Horn
2021-07-02 15:12       ` Jann Horn
2021-07-18  0:38       ` Andrei Vagin
2021-07-18  0:38         ` Andrei Vagin
2021-04-14  7:22 ` Anton Ivanov
2021-04-14  7:22   ` Anton Ivanov
2021-04-14  7:34   ` Johannes Berg
2021-04-14  7:34     ` Johannes Berg
2021-04-14  9:24     ` Benjamin Berg [this message]
2021-04-14  9:24       ` Benjamin Berg
2021-04-14 10:27 ` Florian Weimer
2021-04-14 10:27   ` Florian Weimer
2021-04-14 11:24   ` Jann Horn
2021-04-14 11:24     ` Jann Horn
2021-04-14 12:20     ` Florian Weimer
2021-04-14 12:20       ` Florian Weimer
2021-04-14 13:58       ` Jann Horn
2021-04-14 13:58         ` Jann Horn
2021-04-16 19:29 ` Kirill Smelkov
2021-04-16 19:29   ` Kirill Smelkov
2021-04-17 16:28 ` sbaugh
2021-04-17 16:28   ` sbaugh
2021-07-02 22:44 ` Andy Lutomirski
2021-07-02 22:44   ` Andy Lutomirski
2021-07-18  1:34   ` Andrei Vagin
2021-07-18  1:34     ` Andrei Vagin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c35e321fe3d3e6993c0d9a1ad638230a8f77866b.camel@sipsolutions.net \
    --to=benjamin@sipsolutions.net \
    --cc=0x7f454c46@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=anton.ivanov@cambridgegreys.com \
    --cc=avagin@gmail.com \
    --cc=avagin@google.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=criu@openvz.org \
    --cc=jdike@addtoit.com \
    --cc=johannes@sipsolutions.net \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-um@lists.infradead.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=mtk.manpages@gmail.com \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=richard@nod.at \
    --cc=rppt@linux.ibm.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.