All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrei Vagin <avagin@gmail.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
	linux-um@lists.infradead.org, criu@openvz.org, avagin@google.com,
	Andrew Morton <akpm@linux-foundation.org>,
	Anton Ivanov <anton.ivanov@cambridgegreys.com>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Dmitry Safonov <0x7f454c46@gmail.com>,
	Ingo Molnar <mingo@redhat.com>, Jeff Dike <jdike@addtoit.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Richard Weinberger <richard@nod.at>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 0/4 POC] Allow executing code and syscalls in another address space
Date: Sat, 17 Jul 2021 18:34:39 -0700	[thread overview]
Message-ID: <YPOFL6QHZl2pq+cb@gmail.com> (raw)
In-Reply-To: <6073e4c6-6fe8-0448-4586-5d04d7154164@kernel.org>

On Fri, Jul 02, 2021 at 03:44:41PM -0700, Andy Lutomirski wrote:
> On 4/13/21 10:52 PM, Andrei Vagin wrote:
> 
> > process_vm_exec has two modes:
> > 
> > * Execute code in an address space of a target process and stop on any
> >   signal or system call.
> 
> We already have a perfectly good context switch mechanism: context
> switches.  If you execute code, you are basically guaranteed to be
> subject to being hijacked, which means you pretty much can't allow
> syscalls.  But there's a lot of non-syscall state, and I think context
> switching needs to be done with extreme care.
> 
> (Just as example, suppose you switch mms, then set %gs to point to the
> LDT, then switch back.  Now you're in a weird state.  With %ss the plot
> is a bit thicker.  And there are emulated vsyscalls and such.)
> 
> If you, PeterZ, and the UMCG could all find an acceptable, efficient way
> to wake-and-wait so you can switch into an injected task in the target
> process and switch back quickly, then I think a much nicer solution will
> become available.

I know about umcg and I even did a prototype that used fuxet_swap (the
previous attempt of umcg). Here are a few problems and maybe you will
have some ideas on how to solve them.

The main question is how to hijack a stub process where a guest code is
executing. We need to trap system calls, memory faults, and other
exceptions and handle them in the Sentry (supervisor/kernel). All
interested events except system calls generate signals. We can use
seccomp to get signals on system calls too. In my prototype, a guest
code is running in stub processes. One stub process is for each guest
address space. In a stub process, I set a signal handler for SIGSEGV,
SIGBUS, SIGFPE, SIGSYS, SIGILL, set an alternate signal stack, and set
seccomp rules. The signal handler communicates with the Sentry
(supervisor/kernel) via shared memory and uses futex_swap to make fast
switches to the Sentry and back to a stub process.

Here are a few problems. First, we have a signal handler code, its
stack, and a shared memory region in a guest address space, and we need
to guarantee that a guest code will not be able to use them to do
something unexpected.

The second problem is performance. It is much faster if we compare it
with the ptrace platform, but it is still a few times slower than
process_vm_exec. Signal handling is expensive. The kernel has to
generate a signal frame, execute a signal handler, and then it needs to
call rt_sigreturn. Futex_swap makes fast context switches, but it is
still slower than process_vm_exec. UMCG should be faster because it
doesn’t have a futex overhead.

Andy, what do you think about the idea to rework process_vm_exec so that
it executes code and syscalls in the context of a target process?
Maybe you see other ways how we can “hijack” a remote process?

Thanks,
Andrei

> 
> > 
> > * Execute a system call in an address space of a target process.
> 
> I could get behind this, but there are plenty of cans of worms to watch
> out for.  Serious auditing would be needed.

WARNING: multiple messages have this Message-ID (diff)
From: Andrei Vagin <avagin@gmail.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
	linux-um@lists.infradead.org, criu@openvz.org, avagin@google.com,
	Andrew Morton <akpm@linux-foundation.org>,
	Anton Ivanov <anton.ivanov@cambridgegreys.com>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Dmitry Safonov <0x7f454c46@gmail.com>,
	Ingo Molnar <mingo@redhat.com>, Jeff Dike <jdike@addtoit.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Richard Weinberger <richard@nod.at>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 0/4 POC] Allow executing code and syscalls in another address space
Date: Sat, 17 Jul 2021 18:34:39 -0700	[thread overview]
Message-ID: <YPOFL6QHZl2pq+cb@gmail.com> (raw)
In-Reply-To: <6073e4c6-6fe8-0448-4586-5d04d7154164@kernel.org>

On Fri, Jul 02, 2021 at 03:44:41PM -0700, Andy Lutomirski wrote:
> On 4/13/21 10:52 PM, Andrei Vagin wrote:
> 
> > process_vm_exec has two modes:
> > 
> > * Execute code in an address space of a target process and stop on any
> >   signal or system call.
> 
> We already have a perfectly good context switch mechanism: context
> switches.  If you execute code, you are basically guaranteed to be
> subject to being hijacked, which means you pretty much can't allow
> syscalls.  But there's a lot of non-syscall state, and I think context
> switching needs to be done with extreme care.
> 
> (Just as example, suppose you switch mms, then set %gs to point to the
> LDT, then switch back.  Now you're in a weird state.  With %ss the plot
> is a bit thicker.  And there are emulated vsyscalls and such.)
> 
> If you, PeterZ, and the UMCG could all find an acceptable, efficient way
> to wake-and-wait so you can switch into an injected task in the target
> process and switch back quickly, then I think a much nicer solution will
> become available.

I know about umcg and I even did a prototype that used fuxet_swap (the
previous attempt of umcg). Here are a few problems and maybe you will
have some ideas on how to solve them.

The main question is how to hijack a stub process where a guest code is
executing. We need to trap system calls, memory faults, and other
exceptions and handle them in the Sentry (supervisor/kernel). All
interested events except system calls generate signals. We can use
seccomp to get signals on system calls too. In my prototype, a guest
code is running in stub processes. One stub process is for each guest
address space. In a stub process, I set a signal handler for SIGSEGV,
SIGBUS, SIGFPE, SIGSYS, SIGILL, set an alternate signal stack, and set
seccomp rules. The signal handler communicates with the Sentry
(supervisor/kernel) via shared memory and uses futex_swap to make fast
switches to the Sentry and back to a stub process.

Here are a few problems. First, we have a signal handler code, its
stack, and a shared memory region in a guest address space, and we need
to guarantee that a guest code will not be able to use them to do
something unexpected.

The second problem is performance. It is much faster if we compare it
with the ptrace platform, but it is still a few times slower than
process_vm_exec. Signal handling is expensive. The kernel has to
generate a signal frame, execute a signal handler, and then it needs to
call rt_sigreturn. Futex_swap makes fast context switches, but it is
still slower than process_vm_exec. UMCG should be faster because it
doesn’t have a futex overhead.

Andy, what do you think about the idea to rework process_vm_exec so that
it executes code and syscalls in the context of a target process?
Maybe you see other ways how we can “hijack” a remote process?

Thanks,
Andrei

> 
> > 
> > * Execute a system call in an address space of a target process.
> 
> I could get behind this, but there are plenty of cans of worms to watch
> out for.  Serious auditing would be needed.

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um

  reply	other threads:[~2021-07-18  1:38 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-14  5:52 [PATCH 0/4 POC] Allow executing code and syscalls in another address space Andrei Vagin
2021-04-14  5:52 ` Andrei Vagin
2021-04-14  5:52 ` [PATCH 1/4] signal: add a helper to restore a process state from sigcontex Andrei Vagin
2021-04-14  5:52   ` Andrei Vagin
2021-04-14  5:52 ` [PATCH 2/4] arch/x86: implement the process_vm_exec syscall Andrei Vagin
2021-04-14  5:52   ` Andrei Vagin
2021-04-14 17:09   ` Oleg Nesterov
2021-04-14 17:09     ` Oleg Nesterov
2021-04-23  6:59     ` Andrei Vagin
2021-04-23  6:59       ` Andrei Vagin
2021-06-28 16:13   ` Jann Horn
2021-06-28 16:13     ` Jann Horn
2021-06-28 16:30     ` Andy Lutomirski
2021-06-28 17:14       ` Jann Horn
2021-06-28 17:14         ` Jann Horn
2021-06-28 18:18         ` Eric W. Biederman
2021-06-28 18:18           ` Eric W. Biederman
2021-06-29  1:01           ` Andrei Vagin
2021-06-29  1:01             ` Andrei Vagin
2021-07-02  6:22     ` Andrei Vagin
2021-07-02  6:22       ` Andrei Vagin
2021-07-02 11:51       ` Jann Horn
2021-07-02 11:51         ` Jann Horn
2021-07-02 11:51         ` Jann Horn
2021-07-02 20:40         ` Andy Lutomirski
2021-07-02 20:40           ` Andy Lutomirski
2021-07-02  8:51   ` Peter Zijlstra
2021-07-02  8:51     ` Peter Zijlstra
2021-07-02 22:21     ` Andrei Vagin
2021-07-02 22:21       ` Andrei Vagin
2021-07-02 20:56   ` Jann Horn
2021-07-02 20:56     ` Jann Horn
2021-07-02 22:48     ` Andrei Vagin
2021-07-02 22:48       ` Andrei Vagin
2021-04-14  5:52 ` [PATCH 3/4] arch/x86: allow to execute syscalls via process_vm_exec Andrei Vagin
2021-04-14  5:52   ` Andrei Vagin
2021-04-14  5:52 ` [PATCH 4/4] selftests: add tests for process_vm_exec Andrei Vagin
2021-04-14  5:52   ` Andrei Vagin
2021-04-14  6:46 ` [PATCH 0/4 POC] Allow executing code and syscalls in another address space Jann Horn
2021-04-14  6:46   ` Jann Horn
2021-04-14 22:10   ` Andrei Vagin
2021-04-14 22:10     ` Andrei Vagin
2021-07-02  6:57   ` Andrei Vagin
2021-07-02  6:57     ` Andrei Vagin
2021-07-02 15:12     ` Jann Horn
2021-07-02 15:12       ` Jann Horn
2021-07-02 15:12       ` Jann Horn
2021-07-18  0:38       ` Andrei Vagin
2021-07-18  0:38         ` Andrei Vagin
2021-04-14  7:22 ` Anton Ivanov
2021-04-14  7:22   ` Anton Ivanov
2021-04-14  7:34   ` Johannes Berg
2021-04-14  7:34     ` Johannes Berg
2021-04-14  9:24     ` Benjamin Berg
2021-04-14  9:24       ` Benjamin Berg
2021-04-14 10:27 ` Florian Weimer
2021-04-14 10:27   ` Florian Weimer
2021-04-14 11:24   ` Jann Horn
2021-04-14 11:24     ` Jann Horn
2021-04-14 12:20     ` Florian Weimer
2021-04-14 12:20       ` Florian Weimer
2021-04-14 13:58       ` Jann Horn
2021-04-14 13:58         ` Jann Horn
2021-04-16 19:29 ` Kirill Smelkov
2021-04-16 19:29   ` Kirill Smelkov
2021-04-17 16:28 ` sbaugh
2021-04-17 16:28   ` sbaugh
2021-07-02 22:44 ` Andy Lutomirski
2021-07-02 22:44   ` Andy Lutomirski
2021-07-18  1:34   ` Andrei Vagin [this message]
2021-07-18  1:34     ` Andrei Vagin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YPOFL6QHZl2pq+cb@gmail.com \
    --to=avagin@gmail.com \
    --cc=0x7f454c46@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=anton.ivanov@cambridgegreys.com \
    --cc=avagin@google.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=criu@openvz.org \
    --cc=jdike@addtoit.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-um@lists.infradead.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=mtk.manpages@gmail.com \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=richard@nod.at \
    --cc=rppt@linux.ibm.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.