All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrei Vagin <avagin@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Andrei Vagin <avagin@google.com>,
	Sean Christopherson <seanjc@google.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Jianfeng Tan <henry.tjf@antfin.com>,
	Adin Scannell <ascannell@google.com>,
	Konstantin Bogomolov <bogomolov@google.com>,
	Etienne Perot <eperot@google.com>
Subject: [PATCH 0/5] KVM/x86: add a new hypercall to execute host system
Date: Fri, 22 Jul 2022 16:02:36 -0700	[thread overview]
Message-ID: <20220722230241.1944655-1-avagin@google.com> (raw)

There is a class of applications that use KVM to manage multiple address
spaces rather than use it as an isolation boundary. In all other terms,
they are normal processes that execute system calls, handle signals,
etc. Currently, each time when such a process needs to interact with the
operation system, it has to switch to host and back to guest. Such
entire switches are expensive and significantly increase the overhead of
system calls. The new hypercall reduces this overhead by more than two
times.

The new hypercall runs system calls on the host.  As for native system
calls, seccomp filters are executed before system calls. It takes one
argument that is a pointer to a pt_regs structure in the host address
space. It provides registers to execute a system call according to the
calling convention. Arguments are passed in %rdi, %rsi, %rdx, %r10, %r8
and %r9 and a return code is stored in %rax. 

The hypercall returns 0 if a system call has been executed. Otherwise,
it returns an error code.

This series introduces a new capability that has to be set to enable the
hypercall. The new hypercall is a backdoor for regular virtual machines,
so it is disabled by default. There is another standard way to allow
hypercalls via cpuid. It has not been used because one of the common
ways to manage them is to request all available features and let them
all together. In this case, it is a hard requirement that the new
hypercall can be enabled only intentionally.

= Background =

gVisor is one such application. It is an application kernel written in
Go that implements a substantial portion of the Linux system call
interface. gVisor intercepts application system calls and acts as the
guest kernel. It has a platform abstraction that implements interception
of syscalls, basic context switching, and memory mapping functionality.
Currently, it has two platforms: ptrace and KVM.

The ptrace platform uses PTRACE_SYSEMU to execute user code without
allowing it to perform host system calls, and it creates stub processes
to manage user address spaces. This platform is primarily for testing
needs due to its bad performance.

Another option is the KVM platform. In this case, the Sentry (gVisor
kernel) can run in a guest ring0 and create/manage multiple address
spaces. Its performance is much better than the ptrace one, but it is
still not great compared with the native performance. This change
optimizes the most critical part, which is the syscall overhead.  The
idea of using vmcall to execute system calls isn’t new. Two large users
of gVisor (Google and AntFinacial) have out-of-tree code to implement
such hypercalls.

In the Google kernel, we have a kvm-like subsystem designed especially
for gVisor. This change is the first step of integrating it into the KVM
code base and making it available to all Linux users.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jianfeng Tan <henry.tjf@antfin.com>
Cc: Adin Scannell <ascannell@google.com>
Cc: Konstantin Bogomolov <bogomolov@google.com>
Cc: Etienne Perot <eperot@google.com>

Andrei Vagin (5):
  kernel: add a new helper to execute system calls from kernel code
  kvm: add controls to enable/disable paravirtualized system calls
  KVM/x86: add a new hypercall to execute host system calls.
  selftests/kvm/x86_64: set rax before vmcall
  selftests/kvm/x86_64: add tests for KVM_HC_HOST_SYSCALL

 Documentation/virt/kvm/x86/hypercalls.rst     |  15 ++
 arch/x86/entry/common.c                       |  48 ++++++
 arch/x86/include/asm/syscall.h                |   1 +
 arch/x86/include/uapi/asm/kvm_para.h          |   2 +
 arch/x86/kvm/cpuid.c                          |  25 +++
 arch/x86/kvm/cpuid.h                          |   8 +-
 arch/x86/kvm/x86.c                            |  37 +++++
 include/uapi/linux/kvm.h                      |   1 +
 include/uapi/linux/kvm_para.h                 |   1 +
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/x86_64/processor.h  |   4 +
 .../selftests/kvm/lib/x86_64/processor.c      |   2 +-
 .../kvm/x86_64/kvm_pv_syscall_test.c          | 145 ++++++++++++++++++
 14 files changed, 289 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/kvm_pv_syscall_test.c

-- 
2.37.0.rc0.161.g10f37bed90-goog


             reply	other threads:[~2022-07-22 23:02 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-22 23:02 Andrei Vagin [this message]
2022-07-22 23:02 ` [PATCH 1/5] kernel: add a new helper to execute system calls from kernel code Andrei Vagin
2022-07-22 23:02 ` [PATCH 2/5] kvm/x86: add controls to enable/disable paravirtualized system calls Andrei Vagin
2022-07-22 23:02 ` [PATCH 3/5] KVM/x86: add a new hypercall to execute host " Andrei Vagin
2022-07-22 23:02 ` [PATCH 4/5] selftests/kvm/x86_64: set rax before vmcall Andrei Vagin
2022-08-01 11:32   ` Vitaly Kuznetsov
2022-08-01 12:43     ` Paolo Bonzini
2022-07-22 23:02 ` [PATCH 5/5] selftests/kvm/x86_64: add tests for KVM_HC_HOST_SYSCALL Andrei Vagin
2022-07-22 23:41 ` [PATCH 0/5] KVM/x86: add a new hypercall to execute host system Sean Christopherson
2022-07-26  8:33   ` Andrei Vagin
2022-07-26 10:27     ` Paolo Bonzini
2022-07-27  6:44       ` Andrei Vagin
2022-07-26 15:10     ` Sean Christopherson
2022-07-26 22:10       ` Thomas Gleixner
2022-07-27  1:03         ` Andrei Vagin
2022-08-22 20:26           ` Andrei Vagin
2022-07-27  0:25       ` Andrei Vagin
2022-07-26 21:27   ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220722230241.1944655-1-avagin@google.com \
    --to=avagin@google.com \
    --cc=ascannell@google.com \
    --cc=bogomolov@google.com \
    --cc=eperot@google.com \
    --cc=henry.tjf@antfin.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.