All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jann Horn <jannh@google.com>
To: Andrei Vagin <avagin@gmail.com>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	Andy Lutomirski <luto@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
	linux-um@lists.infradead.org, criu@openvz.org, avagin@google.com,
	Andrew Morton <akpm@linux-foundation.org>,
	Anton Ivanov <anton.ivanov@cambridgegreys.com>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Dmitry Safonov <0x7f454c46@gmail.com>,
	Ingo Molnar <mingo@redhat.com>, Jeff Dike <jdike@addtoit.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Richard Weinberger <richard@nod.at>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 2/4] arch/x86: implement the process_vm_exec syscall
Date: Fri, 2 Jul 2021 22:56:38 +0200	[thread overview]
Message-ID: <CAG48ez37ZUNvWy1eOvrW13kFRM-_ZW175x99Nyjq43w4Qz1qJQ@mail.gmail.com> (raw)
In-Reply-To: <20210414055217.543246-3-avagin@gmail.com>

On Wed, Apr 14, 2021 at 7:59 AM Andrei Vagin <avagin@gmail.com> wrote:
> This change introduces the new system call:
> process_vm_exec(pid_t pid, struct sigcontext *uctx, unsigned long flags,
>                 siginfo_t * uinfo, sigset_t *sigmask, size_t sizemask)
>
> process_vm_exec allows to execute the current process in an address
> space of another process.
>
> process_vm_exec swaps the current address space with an address space of
> a specified process, sets a state from sigcontex and resumes the process.
> When a process receives a signal or calls a system call,
> process_vm_exec saves the process state back to sigcontext, restores the
> origin address space, restores the origin process state, and returns to
> userspace.
>
> If it was interrupted by a signal and the signal is in the user_mask,
> the signal is dequeued and information about it is saved in uinfo.
> If process_vm_exec is interrupted by a system call, a synthetic siginfo
> for the SIGSYS signal is generated.
>
> The behavior of this system call is similar to PTRACE_SYSEMU but
> everything is happing in the context of one process, so
> process_vm_exec shows a better performance.
>
> PTRACE_SYSEMU is primarily used to implement sandboxes (application
> kernels) like User-mode Linux or gVisor. These type of sandboxes
> intercepts applications system calls and acts as the guest kernel.
> A simple benchmark, where a "tracee" process executes systems calls in a
> loop and a "tracer" process traps syscalls and handles them just
> incrementing the tracee instruction pointer to skip the syscall
> instruction shows that process_vm_exec works more than 5 times faster
> than PTRACE_SYSEMU.
[...]
> +long swap_vm_exec_context(struct sigcontext __user *uctx)
> +{
> +       struct sigcontext ctx = {};
> +       sigset_t set = {};
> +
> +
> +       if (copy_from_user(&ctx, uctx, CONTEXT_COPY_SIZE))
> +               return -EFAULT;
> +       /* A floating point state is managed from user-space. */
> +       if (ctx.fpstate != 0)
> +               return -EINVAL;
> +       if (!user_access_begin(uctx, sizeof(*uctx)))
> +               return -EFAULT;
> +       unsafe_put_sigcontext(uctx, NULL, current_pt_regs(), (&set), Efault);
> +       user_access_end();
> +
> +       if (__restore_sigcontext(current_pt_regs(), &ctx, 0))
> +               goto badframe;
> +
> +       return 0;
> +Efault:
> +       user_access_end();
> +badframe:
> +       signal_fault(current_pt_regs(), uctx, "swap_vm_exec_context");
> +       return -EFAULT;
> +}

Comparing the pieces of context that restore_sigcontext() restores
with what a normal task switch does (see __switch_to() and callees), I
noticed: On CPUs with FSGSBASE support, I think sandboxed code could
overwrite FSBASE/GSBASE using the WRFSBASE/WRGSBASE instructions,
causing the supervisor to access attacker-controlled addresses when it
tries to access a thread-local variable like "errno"? Signal handling
saves the segment registers, but not the FS/GS base addresses.


jannh@laptop:~/test$ cat signal_gsbase.c
// compile with -mfsgsbase
#include <stdio.h>
#include <signal.h>
#include <immintrin.h>

void signal_handler(int sig, siginfo_t *info, void *ucontext_) {
  puts("signal handler");
  _writegsbase_u64(0x12345678);
}

int main(void) {
  struct sigaction new_act = {
    .sa_sigaction = signal_handler,
    .sa_flags = SA_SIGINFO
  };
  sigaction(SIGUSR1, &new_act, NULL);

  printf("original gsbase is 0x%lx\n", _readgsbase_u64());
  raise(SIGUSR1);
  printf("post-signal gsbase is 0x%lx\n", _readgsbase_u64());
}
jannh@laptop:~/test$ gcc -o signal_gsbase signal_gsbase.c -mfsgsbase
jannh@laptop:~/test$ ./signal_gsbase
original gsbase is 0x0
signal handler
post-signal gsbase is 0x12345678
jannh@laptop:~/test$


So to make this usable for a sandboxing usecase, you'd also have to
save and restore FSBASE/GSBASE, just like __switch_to().

WARNING: multiple messages have this Message-ID (diff)
From: Jann Horn <jannh@google.com>
To: Andrei Vagin <avagin@gmail.com>,
	the arch/x86 maintainers <x86@kernel.org>,
	Andy Lutomirski <luto@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
	linux-um@lists.infradead.org, criu@openvz.org, avagin@google.com,
	Andrew Morton <akpm@linux-foundation.org>,
	Anton Ivanov <anton.ivanov@cambridgegreys.com>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Dmitry Safonov <0x7f454c46@gmail.com>,
	Ingo Molnar <mingo@redhat.com>, Jeff Dike <jdike@addtoit.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Richard Weinberger <richard@nod.at>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 2/4] arch/x86: implement the process_vm_exec syscall
Date: Fri, 2 Jul 2021 22:56:38 +0200	[thread overview]
Message-ID: <CAG48ez37ZUNvWy1eOvrW13kFRM-_ZW175x99Nyjq43w4Qz1qJQ@mail.gmail.com> (raw)
In-Reply-To: <20210414055217.543246-3-avagin@gmail.com>

On Wed, Apr 14, 2021 at 7:59 AM Andrei Vagin <avagin@gmail.com> wrote:
> This change introduces the new system call:
> process_vm_exec(pid_t pid, struct sigcontext *uctx, unsigned long flags,
>                 siginfo_t * uinfo, sigset_t *sigmask, size_t sizemask)
>
> process_vm_exec allows to execute the current process in an address
> space of another process.
>
> process_vm_exec swaps the current address space with an address space of
> a specified process, sets a state from sigcontex and resumes the process.
> When a process receives a signal or calls a system call,
> process_vm_exec saves the process state back to sigcontext, restores the
> origin address space, restores the origin process state, and returns to
> userspace.
>
> If it was interrupted by a signal and the signal is in the user_mask,
> the signal is dequeued and information about it is saved in uinfo.
> If process_vm_exec is interrupted by a system call, a synthetic siginfo
> for the SIGSYS signal is generated.
>
> The behavior of this system call is similar to PTRACE_SYSEMU but
> everything is happing in the context of one process, so
> process_vm_exec shows a better performance.
>
> PTRACE_SYSEMU is primarily used to implement sandboxes (application
> kernels) like User-mode Linux or gVisor. These type of sandboxes
> intercepts applications system calls and acts as the guest kernel.
> A simple benchmark, where a "tracee" process executes systems calls in a
> loop and a "tracer" process traps syscalls and handles them just
> incrementing the tracee instruction pointer to skip the syscall
> instruction shows that process_vm_exec works more than 5 times faster
> than PTRACE_SYSEMU.
[...]
> +long swap_vm_exec_context(struct sigcontext __user *uctx)
> +{
> +       struct sigcontext ctx = {};
> +       sigset_t set = {};
> +
> +
> +       if (copy_from_user(&ctx, uctx, CONTEXT_COPY_SIZE))
> +               return -EFAULT;
> +       /* A floating point state is managed from user-space. */
> +       if (ctx.fpstate != 0)
> +               return -EINVAL;
> +       if (!user_access_begin(uctx, sizeof(*uctx)))
> +               return -EFAULT;
> +       unsafe_put_sigcontext(uctx, NULL, current_pt_regs(), (&set), Efault);
> +       user_access_end();
> +
> +       if (__restore_sigcontext(current_pt_regs(), &ctx, 0))
> +               goto badframe;
> +
> +       return 0;
> +Efault:
> +       user_access_end();
> +badframe:
> +       signal_fault(current_pt_regs(), uctx, "swap_vm_exec_context");
> +       return -EFAULT;
> +}

Comparing the pieces of context that restore_sigcontext() restores
with what a normal task switch does (see __switch_to() and callees), I
noticed: On CPUs with FSGSBASE support, I think sandboxed code could
overwrite FSBASE/GSBASE using the WRFSBASE/WRGSBASE instructions,
causing the supervisor to access attacker-controlled addresses when it
tries to access a thread-local variable like "errno"? Signal handling
saves the segment registers, but not the FS/GS base addresses.


jannh@laptop:~/test$ cat signal_gsbase.c
// compile with -mfsgsbase
#include <stdio.h>
#include <signal.h>
#include <immintrin.h>

void signal_handler(int sig, siginfo_t *info, void *ucontext_) {
  puts("signal handler");
  _writegsbase_u64(0x12345678);
}

int main(void) {
  struct sigaction new_act = {
    .sa_sigaction = signal_handler,
    .sa_flags = SA_SIGINFO
  };
  sigaction(SIGUSR1, &new_act, NULL);

  printf("original gsbase is 0x%lx\n", _readgsbase_u64());
  raise(SIGUSR1);
  printf("post-signal gsbase is 0x%lx\n", _readgsbase_u64());
}
jannh@laptop:~/test$ gcc -o signal_gsbase signal_gsbase.c -mfsgsbase
jannh@laptop:~/test$ ./signal_gsbase
original gsbase is 0x0
signal handler
post-signal gsbase is 0x12345678
jannh@laptop:~/test$


So to make this usable for a sandboxing usecase, you'd also have to
save and restore FSBASE/GSBASE, just like __switch_to().

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um


  parent reply	other threads:[~2021-07-02 20:57 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-14  5:52 [PATCH 0/4 POC] Allow executing code and syscalls in another address space Andrei Vagin
2021-04-14  5:52 ` Andrei Vagin
2021-04-14  5:52 ` [PATCH 1/4] signal: add a helper to restore a process state from sigcontex Andrei Vagin
2021-04-14  5:52   ` Andrei Vagin
2021-04-14  5:52 ` [PATCH 2/4] arch/x86: implement the process_vm_exec syscall Andrei Vagin
2021-04-14  5:52   ` Andrei Vagin
2021-04-14 17:09   ` Oleg Nesterov
2021-04-14 17:09     ` Oleg Nesterov
2021-04-23  6:59     ` Andrei Vagin
2021-04-23  6:59       ` Andrei Vagin
2021-06-28 16:13   ` Jann Horn
2021-06-28 16:13     ` Jann Horn
2021-06-28 16:30     ` Andy Lutomirski
2021-06-28 17:14       ` Jann Horn
2021-06-28 17:14         ` Jann Horn
2021-06-28 18:18         ` Eric W. Biederman
2021-06-28 18:18           ` Eric W. Biederman
2021-06-29  1:01           ` Andrei Vagin
2021-06-29  1:01             ` Andrei Vagin
2021-07-02  6:22     ` Andrei Vagin
2021-07-02  6:22       ` Andrei Vagin
2021-07-02 11:51       ` Jann Horn
2021-07-02 11:51         ` Jann Horn
2021-07-02 11:51         ` Jann Horn
2021-07-02 20:40         ` Andy Lutomirski
2021-07-02 20:40           ` Andy Lutomirski
2021-07-02  8:51   ` Peter Zijlstra
2021-07-02  8:51     ` Peter Zijlstra
2021-07-02 22:21     ` Andrei Vagin
2021-07-02 22:21       ` Andrei Vagin
2021-07-02 20:56   ` Jann Horn [this message]
2021-07-02 20:56     ` Jann Horn
2021-07-02 22:48     ` Andrei Vagin
2021-07-02 22:48       ` Andrei Vagin
2021-04-14  5:52 ` [PATCH 3/4] arch/x86: allow to execute syscalls via process_vm_exec Andrei Vagin
2021-04-14  5:52   ` Andrei Vagin
2021-04-14  5:52 ` [PATCH 4/4] selftests: add tests for process_vm_exec Andrei Vagin
2021-04-14  5:52   ` Andrei Vagin
2021-04-14  6:46 ` [PATCH 0/4 POC] Allow executing code and syscalls in another address space Jann Horn
2021-04-14  6:46   ` Jann Horn
2021-04-14 22:10   ` Andrei Vagin
2021-04-14 22:10     ` Andrei Vagin
2021-07-02  6:57   ` Andrei Vagin
2021-07-02  6:57     ` Andrei Vagin
2021-07-02 15:12     ` Jann Horn
2021-07-02 15:12       ` Jann Horn
2021-07-02 15:12       ` Jann Horn
2021-07-18  0:38       ` Andrei Vagin
2021-07-18  0:38         ` Andrei Vagin
2021-04-14  7:22 ` Anton Ivanov
2021-04-14  7:22   ` Anton Ivanov
2021-04-14  7:34   ` Johannes Berg
2021-04-14  7:34     ` Johannes Berg
2021-04-14  9:24     ` Benjamin Berg
2021-04-14  9:24       ` Benjamin Berg
2021-04-14 10:27 ` Florian Weimer
2021-04-14 10:27   ` Florian Weimer
2021-04-14 11:24   ` Jann Horn
2021-04-14 11:24     ` Jann Horn
2021-04-14 12:20     ` Florian Weimer
2021-04-14 12:20       ` Florian Weimer
2021-04-14 13:58       ` Jann Horn
2021-04-14 13:58         ` Jann Horn
2021-04-16 19:29 ` Kirill Smelkov
2021-04-16 19:29   ` Kirill Smelkov
2021-04-17 16:28 ` sbaugh
2021-04-17 16:28   ` sbaugh
2021-07-02 22:44 ` Andy Lutomirski
2021-07-02 22:44   ` Andy Lutomirski
2021-07-18  1:34   ` Andrei Vagin
2021-07-18  1:34     ` Andrei Vagin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAG48ez37ZUNvWy1eOvrW13kFRM-_ZW175x99Nyjq43w4Qz1qJQ@mail.gmail.com \
    --to=jannh@google.com \
    --cc=0x7f454c46@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=anton.ivanov@cambridgegreys.com \
    --cc=avagin@gmail.com \
    --cc=avagin@google.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=criu@openvz.org \
    --cc=jdike@addtoit.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-um@lists.infradead.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=mtk.manpages@gmail.com \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=richard@nod.at \
    --cc=rppt@linux.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.