* [PATCH v2 0/4] Implements the NetBSD Virtual Machine Monitor accelerator
       [not found] <20200107124903.16505-1-n54@gmx.com>
@ 2020-01-28 14:09 ` Kamil Rytarowski
  2020-01-28 14:09   ` [PATCH v2 1/4] Add the NVMM vcpu API Kamil Rytarowski
                     ` (5 more replies)
  0 siblings, 6 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-01-28 14:09 UTC (permalink / raw)
  To: rth, ehabkost, philmd, slp, pbonzini, peter.maydell, max
  Cc: Kamil Rytarowski, qemu-devel

Hello QEMU Community!

Over the past year the NetBSD team has been working hard on a new user-mode API
for our hypervisor that will be released as part of the upcoming NetBSD 9.0.
This new API adds user-mode capabilities to create and manage virtual machines,
configure memory mappings for guest machines, and create and control execution
of virtual processors.

With this new API we are now able to bring our hypervisor to the QEMU
community! The following patches implement the NetBSD Virtual Machine Monitor
accelerator (NVMM) for QEMU on NetBSD 9.0 and newer hosts.

When compiling QEMU for x86_64, passing the --enable-nvmm flag to configure
builds the accelerator. At runtime, using '-accel nvmm' should give a
significant performance improvement over emulation, much like using 'hax'
on NetBSD.

The documentation for this new API is visible at https://man.netbsd.org under
the libnvmm(3) and nvmm(4) pages.

NVMM was designed and implemented by Maxime Villard.

Thank you for your feedback.

References:
https://m00nbsd.net/4e0798b7f2620c965d0dd9d6a7a2f296.html

Test plan:

1. Download a NetBSD 9.0 pre-release snapshot:
http://nycdn.netbsd.org/pub/NetBSD-daily/netbsd-9/latest/images/NetBSD-9.0_RC1-amd64.iso

2. Install it natively on reasonably recent x86_64 hardware (Intel or AMD).

There is no support for nested virtualization in NVMM.

3. Set up the system.

 export PKG_PATH=http://www.ki.nu/pkgsrc/packages/current/NetBSD-9.0_RC1/All
 pkg_add git gmake python37 glib2 bison pkgconf pixman

Install mozilla-rootcerts and follow post-install instructions.

 pkg_add mozilla-rootcerts

More information: https://wiki.qemu.org/Hosts/BSD#NetBSD

4. Build QEMU

 mkdir build
 cd build
 ../configure --python=python3.7 --enable-nvmm
 gmake
 gmake check

5. Test

 qemu -accel nvmm ...


History:
v1 -> v2:
 - Included the testing plan as requested by Philippe Mathieu-Daude
 - Formatting nit fix in qemu-options.hx
 - Document NVMM in the accel section of qemu-options.hx

Maxime Villard (4):
  Add the NVMM vcpu API
  Add the NetBSD Virtual Machine Monitor accelerator.
  Introduce the NVMM impl
  Add the NVMM acceleration enlightenments

 accel/stubs/Makefile.objs |    1 +
 accel/stubs/nvmm-stub.c   |   43 ++
 configure                 |   36 ++
 cpus.c                    |   58 ++
 include/sysemu/hw_accel.h |   14 +
 include/sysemu/nvmm.h     |   35 ++
 qemu-options.hx           |   16 +-
 target/i386/Makefile.objs |    1 +
 target/i386/helper.c      |    2 +-
 target/i386/nvmm-all.c    | 1222 +++++++++++++++++++++++++++++++++++++
 10 files changed, 1419 insertions(+), 9 deletions(-)
 create mode 100644 accel/stubs/nvmm-stub.c
 create mode 100644 include/sysemu/nvmm.h
 create mode 100644 target/i386/nvmm-all.c

--
2.24.1



* [PATCH v2 1/4] Add the NVMM vcpu API
  2020-01-28 14:09 ` [PATCH v2 0/4] Implements the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
@ 2020-01-28 14:09   ` Kamil Rytarowski
  2020-02-03 11:42     ` Philippe Mathieu-Daudé
  2020-01-28 14:09   ` [PATCH v2 2/4] Add the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 79+ messages in thread
From: Kamil Rytarowski @ 2020-01-28 14:09 UTC (permalink / raw)
  To: rth, ehabkost, philmd, slp, pbonzini, peter.maydell, max
  Cc: Kamil Rytarowski, qemu-devel

From: Maxime Villard <max@m00nbsd.net>

Adds stubs for the NetBSD Virtual Machine Monitor (NVMM) and introduces the
sysemu/nvmm.h API for vcpu initialization, execution, teardown, and state
synchronization.

Signed-off-by: Maxime Villard <max@m00nbsd.net>
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
---
 accel/stubs/Makefile.objs |  1 +
 accel/stubs/nvmm-stub.c   | 43 +++++++++++++++++++++++++++++++++++++++
 include/sysemu/nvmm.h     | 35 +++++++++++++++++++++++++++++++
 3 files changed, 79 insertions(+)
 create mode 100644 accel/stubs/nvmm-stub.c
 create mode 100644 include/sysemu/nvmm.h
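
Note for readers: the stub file and the nvmm_enabled() macro in this patch
follow a common compile-time fallback pattern. The following is a minimal
sketch of how that pattern behaves when CONFIG_NVMM is not defined. SketchCPU
and pick_vcpu_backend() are hypothetical names used only for illustration;
they are not part of QEMU or of this patch.

```c
/* Sketch only: SketchCPU stands in for QEMU's CPUState. */
#include <assert.h>

typedef struct SketchCPU {
    int index;
} SketchCPU;

/* Without CONFIG_NVMM the check folds to a constant 0, as in nvmm.h. */
#ifdef CONFIG_NVMM
int nvmm_enabled(void);
#else
#define nvmm_enabled() (0)
#endif

/* Stub: always fails, so a caller that skips the guard gets an error. */
int nvmm_init_vcpu(SketchCPU *cpu)
{
    (void)cpu;
    return -1;
}

/*
 * Hypothetical caller: returns 1 if NVMM was used, 0 if the caller should
 * fall back to another accelerator, and -1 if NVMM initialization failed.
 */
int pick_vcpu_backend(SketchCPU *cpu)
{
    if (nvmm_enabled()) {
        return nvmm_init_vcpu(cpu) == 0 ? 1 : -1;
    }
    return 0;
}
```

Because the macro expands to a constant, the guarded branch is dead code in
builds without NVMM, while the stub still satisfies the linker.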

diff --git a/accel/stubs/Makefile.objs b/accel/stubs/Makefile.objs
index 3894caf95d..09f2d3e1dd 100644
--- a/accel/stubs/Makefile.objs
+++ b/accel/stubs/Makefile.objs
@@ -1,5 +1,6 @@
 obj-$(call lnot,$(CONFIG_HAX))  += hax-stub.o
 obj-$(call lnot,$(CONFIG_HVF))  += hvf-stub.o
 obj-$(call lnot,$(CONFIG_WHPX)) += whpx-stub.o
+obj-$(call lnot,$(CONFIG_NVMM)) += nvmm-stub.o
 obj-$(call lnot,$(CONFIG_KVM))  += kvm-stub.o
 obj-$(call lnot,$(CONFIG_TCG))  += tcg-stub.o
diff --git a/accel/stubs/nvmm-stub.c b/accel/stubs/nvmm-stub.c
new file mode 100644
index 0000000000..c2208b84a3
--- /dev/null
+++ b/accel/stubs/nvmm-stub.c
@@ -0,0 +1,43 @@
+/*
+ * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
+ *
+ * NetBSD Virtual Machine Monitor (NVMM) accelerator stub.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "cpu.h"
+#include "sysemu/nvmm.h"
+
+int nvmm_init_vcpu(CPUState *cpu)
+{
+    return -1;
+}
+
+int nvmm_vcpu_exec(CPUState *cpu)
+{
+    return -1;
+}
+
+void nvmm_destroy_vcpu(CPUState *cpu)
+{
+}
+
+void nvmm_cpu_synchronize_state(CPUState *cpu)
+{
+}
+
+void nvmm_cpu_synchronize_post_reset(CPUState *cpu)
+{
+}
+
+void nvmm_cpu_synchronize_post_init(CPUState *cpu)
+{
+}
+
+void nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu)
+{
+}
diff --git a/include/sysemu/nvmm.h b/include/sysemu/nvmm.h
new file mode 100644
index 0000000000..10496f3980
--- /dev/null
+++ b/include/sysemu/nvmm.h
@@ -0,0 +1,35 @@
+/*
+ * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
+ *
+ * NetBSD Virtual Machine Monitor (NVMM) accelerator support.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_NVMM_H
+#define QEMU_NVMM_H
+
+#include "config-host.h"
+#include "qemu-common.h"
+
+int nvmm_init_vcpu(CPUState *);
+int nvmm_vcpu_exec(CPUState *);
+void nvmm_destroy_vcpu(CPUState *);
+
+void nvmm_cpu_synchronize_state(CPUState *);
+void nvmm_cpu_synchronize_post_reset(CPUState *);
+void nvmm_cpu_synchronize_post_init(CPUState *);
+void nvmm_cpu_synchronize_pre_loadvm(CPUState *);
+
+#ifdef CONFIG_NVMM
+
+int nvmm_enabled(void);
+
+#else /* CONFIG_NVMM */
+
+#define nvmm_enabled() (0)
+
+#endif /* CONFIG_NVMM */
+
+#endif /* QEMU_NVMM_H */
--
2.24.1




* [PATCH v2 2/4] Add the NetBSD Virtual Machine Monitor accelerator.
  2020-01-28 14:09 ` [PATCH v2 0/4] Implements the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
  2020-01-28 14:09   ` [PATCH v2 1/4] Add the NVMM vcpu API Kamil Rytarowski
@ 2020-01-28 14:09   ` Kamil Rytarowski
  2020-02-03 11:41     ` Philippe Mathieu-Daudé
  2020-01-28 14:09   ` [PATCH v2 3/4] Introduce the NVMM impl Kamil Rytarowski
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 79+ messages in thread
From: Kamil Rytarowski @ 2020-01-28 14:09 UTC (permalink / raw)
  To: rth, ehabkost, philmd, slp, pbonzini, peter.maydell, max
  Cc: Kamil Rytarowski, qemu-devel

From: Maxime Villard <max@m00nbsd.net>

Introduces the configure support for the new NetBSD Virtual Machine Monitor that
allows for hypervisor acceleration from usermode components on the NetBSD
platform.

Signed-off-by: Maxime Villard <max@m00nbsd.net>
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
---
 configure       | 36 ++++++++++++++++++++++++++++++++++++
 qemu-options.hx | 16 ++++++++--------
 2 files changed, 44 insertions(+), 8 deletions(-)
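
Note for readers: the qemu-options.hx text touched by this patch says that
when several accelerators are listed, the next one is used if the previous
one fails to initialize. The following is a rough sketch of that fallback
rule for a colon-separated list such as "nvmm:tcg". The table, the init
functions, and select_accel() are hypothetical and do not reflect how QEMU
implements accelerator selection internally.

```c
/* Sketch only: a toy accelerator table, not QEMU's accelerator classes. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct accel_sketch {
    const char *name;
    int (*init)(void);      /* returns 0 on success, -1 on failure */
};

static int init_fails(void) { return -1; }
static int init_ok(void) { return 0; }

static const struct accel_sketch accel_table[] = {
    { "nvmm", init_fails }, /* pretend the NVMM device is unavailable */
    { "tcg",  init_ok },
};

/*
 * Walk a colon-separated list left to right and return the name of the
 * first accelerator that is both known and initializes successfully, or
 * NULL if none does.
 */
const char *select_accel(const char *list)
{
    const size_t n = sizeof(accel_table) / sizeof(accel_table[0]);
    const char *p = list;

    while (*p != '\0') {
        size_t len = strcspn(p, ":");

        for (size_t i = 0; i < n; i++) {
            const struct accel_sketch *a = &accel_table[i];

            if (strlen(a->name) == len &&
                strncmp(a->name, p, len) == 0 &&
                a->init() == 0) {
                return a->name;
            }
        }
        p += len;
        if (*p == ':') {
            p++;
        }
    }
    return NULL;
}
```

With the toy table above, "nvmm:tcg" falls through to tcg because the first
entry fails to initialize.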

diff --git a/configure b/configure
index 0ce2c0354a..eb456a271e 100755
--- a/configure
+++ b/configure
@@ -241,6 +241,17 @@ supported_whpx_target() {
     return 1
 }

+supported_nvmm_target() {
+    test "$nvmm" = "yes" || return 1
+    glob "$1" "*-softmmu" || return 1
+    case "${1%-softmmu}" in
+        i386|x86_64)
+            return 0
+        ;;
+    esac
+    return 1
+}
+
 supported_target() {
     case "$1" in
         *-softmmu)
@@ -268,6 +279,7 @@ supported_target() {
     supported_hax_target "$1" && return 0
     supported_hvf_target "$1" && return 0
     supported_whpx_target "$1" && return 0
+    supported_nvmm_target "$1" && return 0
     print_error "TCG disabled, but hardware accelerator not available for '$target'"
     return 1
 }
@@ -387,6 +399,7 @@ kvm="no"
 hax="no"
 hvf="no"
 whpx="no"
+nvmm="no"
 rdma=""
 pvrdma=""
 gprof="no"
@@ -1168,6 +1181,10 @@ for opt do
   ;;
   --enable-whpx) whpx="yes"
   ;;
+  --disable-nvmm) nvmm="no"
+  ;;
+  --enable-nvmm) nvmm="yes"
+  ;;
   --disable-tcg-interpreter) tcg_interpreter="no"
   ;;
   --enable-tcg-interpreter) tcg_interpreter="yes"
@@ -1768,6 +1785,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   hax             HAX acceleration support
   hvf             Hypervisor.framework acceleration support
   whpx            Windows Hypervisor Platform acceleration support
+  nvmm            NetBSD Virtual Machine Monitor acceleration support
   rdma            Enable RDMA-based migration
   pvrdma          Enable PVRDMA support
   vde             support for vde network
@@ -2757,6 +2775,20 @@ if test "$whpx" != "no" ; then
     fi
 fi

+##########################################
+# NetBSD Virtual Machine Monitor (NVMM) accelerator check
+if test "$nvmm" != "no" ; then
+    if check_include "nvmm.h" ; then
+        nvmm="yes"
+        LIBS="-lnvmm $LIBS"
+    else
+        if test "$nvmm" = "yes"; then
+            feature_not_found "NVMM" "NVMM is not available"
+        fi
+        nvmm="no"
+    fi
+fi
+
 ##########################################
 # Sparse probe
 if test "$sparse" != "no" ; then
@@ -6495,6 +6527,7 @@ echo "KVM support       $kvm"
 echo "HAX support       $hax"
 echo "HVF support       $hvf"
 echo "WHPX support      $whpx"
+echo "NVMM support      $nvmm"
 echo "TCG support       $tcg"
 if test "$tcg" = "yes" ; then
     echo "TCG debug enabled $debug_tcg"
@@ -7771,6 +7804,9 @@ fi
 if test "$target_aligned_only" = "yes" ; then
   echo "TARGET_ALIGNED_ONLY=y" >> $config_target_mak
 fi
+if supported_nvmm_target $target; then
+    echo "CONFIG_NVMM=y" >> $config_target_mak
+fi
 if test "$target_bigendian" = "yes" ; then
   echo "TARGET_WORDS_BIGENDIAN=y" >> $config_target_mak
 fi
diff --git a/qemu-options.hx b/qemu-options.hx
index e9d6231438..4ddf7c91a0 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -31,7 +31,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
     "-machine [type=]name[,prop[=value][,...]]\n"
     "                selects emulated machine ('-machine help' for list)\n"
     "                property accel=accel1[:accel2[:...]] selects accelerator\n"
-    "                supported accelerators are kvm, xen, hax, hvf, whpx or tcg (default: tcg)\n"
+    "                supported accelerators are kvm, xen, hax, hvf, nvmm, whpx or tcg (default: tcg)\n"
     "                vmport=on|off|auto controls emulation of vmport (default: auto)\n"
     "                dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
     "                mem-merge=on|off controls memory merge support (default: on)\n"
@@ -63,9 +63,9 @@ Supported machine properties are:
 @table @option
 @item accel=@var{accels1}[:@var{accels2}[:...]]
 This is used to enable an accelerator. Depending on the target architecture,
-kvm, xen, hax, hvf, whpx or tcg can be available. By default, tcg is used. If there is
-more than one accelerator specified, the next one is used if the previous one
-fails to initialize.
+kvm, xen, hax, hvf, nvmm, whpx or tcg can be available. By default, tcg is used.
+If there is more than one accelerator specified, the next one is used if the
+previous one fails to initialize.
 @item vmport=on|off|auto
 Enables emulation of VMWare IO port, for vmmouse etc. auto says to select the
 value based on accel. For accel=xen the default is off otherwise the default
@@ -110,7 +110,7 @@ ETEXI

 DEF("accel", HAS_ARG, QEMU_OPTION_accel,
     "-accel [accel=]accelerator[,prop[=value][,...]]\n"
-    "                select accelerator (kvm, xen, hax, hvf, whpx or tcg; use 'help' for a list)\n"
+    "                select accelerator (kvm, xen, hax, hvf, nvmm, whpx or tcg; use 'help' for a list)\n"
     "                igd-passthru=on|off (enable Xen integrated Intel graphics passthrough, default=off)\n"
     "                kernel-irqchip=on|off|split controls accelerated irqchip support (default=on)\n"
     "                kvm-shadow-mem=size of KVM shadow MMU in bytes\n"
@@ -120,9 +120,9 @@ STEXI
 @item -accel @var{name}[,prop=@var{value}[,...]]
 @findex -accel
 This is used to enable an accelerator. Depending on the target architecture,
-kvm, xen, hax, hvf, whpx or tcg can be available. By default, tcg is used. If there is
-more than one accelerator specified, the next one is used if the previous one
-fails to initialize.
+kvm, xen, hax, hvf, nvmm, whpx or tcg can be available. By default, tcg is used.
+If there is more than one accelerator specified, the next one is used if the
+previous one fails to initialize.
 @table @option
 @item igd-passthru=on|off
 When Xen is in use, this option controls whether Intel integrated graphics
--
2.24.1




* [PATCH v2 3/4] Introduce the NVMM impl
  2020-01-28 14:09 ` [PATCH v2 0/4] Implements the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
  2020-01-28 14:09   ` [PATCH v2 1/4] Add the NVMM vcpu API Kamil Rytarowski
  2020-01-28 14:09   ` [PATCH v2 2/4] Add the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
@ 2020-01-28 14:09   ` Kamil Rytarowski
  2020-02-03 11:51     ` Philippe Mathieu-Daudé
  2020-01-28 14:09   ` [PATCH v2 4/4] Add the NVMM acceleration enlightenments Kamil Rytarowski
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 79+ messages in thread
From: Kamil Rytarowski @ 2020-01-28 14:09 UTC (permalink / raw)
  To: rth, ehabkost, philmd, slp, pbonzini, peter.maydell, max
  Cc: Kamil Rytarowski, qemu-devel

From: Maxime Villard <max@m00nbsd.net>

Implements the NetBSD Virtual Machine Monitor (NVMM) target, which acts as a
hypervisor accelerator for QEMU on the NetBSD platform. This makes QEMU much
faster than the emulated x86_64 paths that are taken on NetBSD today.

Signed-off-by: Maxime Villard <max@m00nbsd.net>
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
---
 target/i386/Makefile.objs |    1 +
 target/i386/nvmm-all.c    | 1222 +++++++++++++++++++++++++++++++++++++
 2 files changed, 1223 insertions(+)
 create mode 100644 target/i386/nvmm-all.c
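
Note for readers: nvmm_set_registers() and nvmm_get_registers() in this patch
convert between QEMU's packed representations and NVMM's per-field state. Two
of those conversions are sketched below: segment attributes (using portable
equivalents of NetBSD's __SHIFTOUT/__SHIFTIN) and the x87 status and tag
words. All sk_/SK_ names are hypothetical, and the mask values are assumptions
based on the usual x86 descriptor layout rather than copied from QEMU's
headers.

```c
/* Sketch only: stand-in masks and structs, not QEMU or NVMM definitions. */
#include <assert.h>
#include <stdint.h>

/* Portable equivalents of NetBSD's __SHIFTOUT/__SHIFTIN helpers. */
#define SK_LOWEST_SET_BIT(mask) ((((mask) - 1) & (mask)) ^ (mask))
#define SK_SHIFTOUT(x, mask)    (((x) & (mask)) / SK_LOWEST_SET_BIT(mask))
#define SK_SHIFTIN(x, mask)     ((x) * SK_LOWEST_SET_BIT(mask))

/* Assumed positions of the attribute bits inside the packed flags word. */
#define SK_DESC_TYPE_MASK (0xfu << 8)
#define SK_DESC_S_MASK    (1u << 12)
#define SK_DESC_DPL_MASK  (3u << 13)
#define SK_DESC_P_MASK    (1u << 15)
#define SK_DESC_AVL_MASK  (1u << 20)
#define SK_DESC_L_MASK    (1u << 21)
#define SK_DESC_B_MASK    (1u << 22)
#define SK_DESC_G_MASK    (1u << 23)

struct sk_seg_attrib {
    unsigned type, s, dpl, p, avl, l, def, g;
};

/* Packed flags -> individual fields (direction of nvmm_set_segment). */
struct sk_seg_attrib sk_unpack_attrib(uint32_t flags)
{
    struct sk_seg_attrib a;

    a.type = SK_SHIFTOUT(flags, SK_DESC_TYPE_MASK);
    a.s    = SK_SHIFTOUT(flags, SK_DESC_S_MASK);
    a.dpl  = SK_SHIFTOUT(flags, SK_DESC_DPL_MASK);
    a.p    = SK_SHIFTOUT(flags, SK_DESC_P_MASK);
    a.avl  = SK_SHIFTOUT(flags, SK_DESC_AVL_MASK);
    a.l    = SK_SHIFTOUT(flags, SK_DESC_L_MASK);
    a.def  = SK_SHIFTOUT(flags, SK_DESC_B_MASK);
    a.g    = SK_SHIFTOUT(flags, SK_DESC_G_MASK);
    return a;
}

/* Individual fields -> packed flags (direction of nvmm_get_segment). */
uint32_t sk_pack_attrib(struct sk_seg_attrib a)
{
    return SK_SHIFTIN((uint32_t)a.type, SK_DESC_TYPE_MASK) |
           SK_SHIFTIN((uint32_t)a.s,    SK_DESC_S_MASK)    |
           SK_SHIFTIN((uint32_t)a.dpl,  SK_DESC_DPL_MASK)  |
           SK_SHIFTIN((uint32_t)a.p,    SK_DESC_P_MASK)    |
           SK_SHIFTIN((uint32_t)a.avl,  SK_DESC_AVL_MASK)  |
           SK_SHIFTIN((uint32_t)a.l,    SK_DESC_L_MASK)    |
           SK_SHIFTIN((uint32_t)a.def,  SK_DESC_B_MASK)    |
           SK_SHIFTIN((uint32_t)a.g,    SK_DESC_G_MASK);
}

/* x87 status word: the TOP field occupies bits 11..13 (mask 0x3800). */
uint16_t sk_make_fx_sw(uint16_t fpus, unsigned fpstt)
{
    return (uint16_t)((fpus & ~0x3800u) | ((fpstt & 0x7u) << 11));
}

/*
 * Abridged tag word: bit i set means register i holds a value, whereas the
 * split representation stores 1 for "empty", hence the negation.
 */
uint8_t sk_make_fx_tw(const uint8_t fptags[8])
{
    uint8_t tw = 0;

    for (int i = 0; i < 8; i++) {
        tw |= (uint8_t)((!fptags[i]) << i);
    }
    return tw;
}
```

Packing and unpacking are inverses only for bits covered by the masks; any
other bits in the flags word are dropped on the round trip.
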

diff --git a/target/i386/Makefile.objs b/target/i386/Makefile.objs
index 48e0c28434..bdcdb32e93 100644
--- a/target/i386/Makefile.objs
+++ b/target/i386/Makefile.objs
@@ -17,6 +17,7 @@ obj-$(CONFIG_HAX) += hax-all.o hax-mem.o hax-posix.o
 endif
 obj-$(CONFIG_HVF) += hvf/
 obj-$(CONFIG_WHPX) += whpx-all.o
+obj-$(CONFIG_NVMM) += nvmm-all.o
 endif
 obj-$(CONFIG_SEV) += sev.o
 obj-$(call lnot,$(CONFIG_SEV)) += sev-stub.o
diff --git a/target/i386/nvmm-all.c b/target/i386/nvmm-all.c
new file mode 100644
index 0000000000..66b08f4f66
--- /dev/null
+++ b/target/i386/nvmm-all.c
@@ -0,0 +1,1222 @@
+/*
+ * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
+ *
+ * NetBSD Virtual Machine Monitor (NVMM) accelerator for QEMU.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/address-spaces.h"
+#include "exec/ioport.h"
+#include "qemu-common.h"
+#include "strings.h"
+#include "sysemu/accel.h"
+#include "sysemu/nvmm.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/cpus.h"
+#include "qemu/main-loop.h"
+#include "hw/boards.h"
+#include "qemu/error-report.h"
+#include "qemu/queue.h"
+#include "qapi/error.h"
+#include "migration/blocker.h"
+
+#include <nvmm.h>
+
+struct qemu_vcpu {
+    struct nvmm_vcpu vcpu;
+    uint8_t tpr;
+    bool stop;
+
+    /* Window-exiting for INTs/NMIs. */
+    bool int_window_exit;
+    bool nmi_window_exit;
+
+    /* The guest is in an interrupt shadow (POP SS, etc). */
+    bool int_shadow;
+};
+
+struct qemu_machine {
+    struct nvmm_capability cap;
+    struct nvmm_machine mach;
+};
+
+/* -------------------------------------------------------------------------- */
+
+static bool nvmm_allowed;
+static struct qemu_machine qemu_mach;
+
+static struct qemu_vcpu *
+get_qemu_vcpu(CPUState *cpu)
+{
+    return (struct qemu_vcpu *)cpu->hax_vcpu;
+}
+
+static struct nvmm_machine *
+get_nvmm_mach(void)
+{
+    return &qemu_mach.mach;
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_set_segment(struct nvmm_x64_state_seg *nseg, const SegmentCache *qseg)
+{
+    uint32_t attrib = qseg->flags;
+
+    nseg->selector = qseg->selector;
+    nseg->limit = qseg->limit;
+    nseg->base = qseg->base;
+    nseg->attrib.type = __SHIFTOUT(attrib, DESC_TYPE_MASK);
+    nseg->attrib.s = __SHIFTOUT(attrib, DESC_S_MASK);
+    nseg->attrib.dpl = __SHIFTOUT(attrib, DESC_DPL_MASK);
+    nseg->attrib.p = __SHIFTOUT(attrib, DESC_P_MASK);
+    nseg->attrib.avl = __SHIFTOUT(attrib, DESC_AVL_MASK);
+    nseg->attrib.l = __SHIFTOUT(attrib, DESC_L_MASK);
+    nseg->attrib.def = __SHIFTOUT(attrib, DESC_B_MASK);
+    nseg->attrib.g = __SHIFTOUT(attrib, DESC_G_MASK);
+}
+
+static void
+nvmm_set_registers(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t bitmap;
+    size_t i;
+    int ret;
+
+    assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
+
+    /* GPRs. */
+    state->gprs[NVMM_X64_GPR_RAX] = env->regs[R_EAX];
+    state->gprs[NVMM_X64_GPR_RCX] = env->regs[R_ECX];
+    state->gprs[NVMM_X64_GPR_RDX] = env->regs[R_EDX];
+    state->gprs[NVMM_X64_GPR_RBX] = env->regs[R_EBX];
+    state->gprs[NVMM_X64_GPR_RSP] = env->regs[R_ESP];
+    state->gprs[NVMM_X64_GPR_RBP] = env->regs[R_EBP];
+    state->gprs[NVMM_X64_GPR_RSI] = env->regs[R_ESI];
+    state->gprs[NVMM_X64_GPR_RDI] = env->regs[R_EDI];
+    state->gprs[NVMM_X64_GPR_R8]  = env->regs[R_R8];
+    state->gprs[NVMM_X64_GPR_R9]  = env->regs[R_R9];
+    state->gprs[NVMM_X64_GPR_R10] = env->regs[R_R10];
+    state->gprs[NVMM_X64_GPR_R11] = env->regs[R_R11];
+    state->gprs[NVMM_X64_GPR_R12] = env->regs[R_R12];
+    state->gprs[NVMM_X64_GPR_R13] = env->regs[R_R13];
+    state->gprs[NVMM_X64_GPR_R14] = env->regs[R_R14];
+    state->gprs[NVMM_X64_GPR_R15] = env->regs[R_R15];
+
+    /* RIP and RFLAGS. */
+    state->gprs[NVMM_X64_GPR_RIP] = env->eip;
+    state->gprs[NVMM_X64_GPR_RFLAGS] = env->eflags;
+
+    /* Segments. */
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_CS], &env->segs[R_CS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_DS], &env->segs[R_DS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_ES], &env->segs[R_ES]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_FS], &env->segs[R_FS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_GS], &env->segs[R_GS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_SS], &env->segs[R_SS]);
+
+    /* Special segments. */
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_GDT], &env->gdt);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_LDT], &env->ldt);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_TR], &env->tr);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_IDT], &env->idt);
+
+    /* Control registers. */
+    state->crs[NVMM_X64_CR_CR0] = env->cr[0];
+    state->crs[NVMM_X64_CR_CR2] = env->cr[2];
+    state->crs[NVMM_X64_CR_CR3] = env->cr[3];
+    state->crs[NVMM_X64_CR_CR4] = env->cr[4];
+    state->crs[NVMM_X64_CR_CR8] = qcpu->tpr;
+    state->crs[NVMM_X64_CR_XCR0] = env->xcr0;
+
+    /* Debug registers. */
+    state->drs[NVMM_X64_DR_DR0] = env->dr[0];
+    state->drs[NVMM_X64_DR_DR1] = env->dr[1];
+    state->drs[NVMM_X64_DR_DR2] = env->dr[2];
+    state->drs[NVMM_X64_DR_DR3] = env->dr[3];
+    state->drs[NVMM_X64_DR_DR6] = env->dr[6];
+    state->drs[NVMM_X64_DR_DR7] = env->dr[7];
+
+    /* FPU. */
+    state->fpu.fx_cw = env->fpuc;
+    state->fpu.fx_sw = (env->fpus & ~0x3800) | ((env->fpstt & 0x7) << 11);
+    state->fpu.fx_tw = 0;
+    for (i = 0; i < 8; i++) {
+        state->fpu.fx_tw |= (!env->fptags[i]) << i;
+    }
+    state->fpu.fx_opcode = env->fpop;
+    state->fpu.fx_ip.fa_64 = env->fpip;
+    state->fpu.fx_dp.fa_64 = env->fpdp;
+    state->fpu.fx_mxcsr = env->mxcsr;
+    state->fpu.fx_mxcsr_mask = 0x0000FFFF;
+    assert(sizeof(state->fpu.fx_87_ac) == sizeof(env->fpregs));
+    memcpy(state->fpu.fx_87_ac, env->fpregs, sizeof(env->fpregs));
+    for (i = 0; i < 16; i++) {
+        memcpy(&state->fpu.fx_xmm[i].xmm_bytes[0],
+            &env->xmm_regs[i].ZMM_Q(0), 8);
+        memcpy(&state->fpu.fx_xmm[i].xmm_bytes[8],
+            &env->xmm_regs[i].ZMM_Q(1), 8);
+    }
+
+    /* MSRs. */
+    state->msrs[NVMM_X64_MSR_EFER] = env->efer;
+    state->msrs[NVMM_X64_MSR_STAR] = env->star;
+#ifdef TARGET_X86_64
+    state->msrs[NVMM_X64_MSR_LSTAR] = env->lstar;
+    state->msrs[NVMM_X64_MSR_CSTAR] = env->cstar;
+    state->msrs[NVMM_X64_MSR_SFMASK] = env->fmask;
+    state->msrs[NVMM_X64_MSR_KERNELGSBASE] = env->kernelgsbase;
+#endif
+    state->msrs[NVMM_X64_MSR_SYSENTER_CS]  = env->sysenter_cs;
+    state->msrs[NVMM_X64_MSR_SYSENTER_ESP] = env->sysenter_esp;
+    state->msrs[NVMM_X64_MSR_SYSENTER_EIP] = env->sysenter_eip;
+    state->msrs[NVMM_X64_MSR_PAT] = env->pat;
+    state->msrs[NVMM_X64_MSR_TSC] = env->tsc;
+
+    bitmap =
+        NVMM_X64_STATE_SEGS |
+        NVMM_X64_STATE_GPRS |
+        NVMM_X64_STATE_CRS  |
+        NVMM_X64_STATE_DRS  |
+        NVMM_X64_STATE_MSRS |
+        NVMM_X64_STATE_FPU;
+
+    ret = nvmm_vcpu_setstate(mach, vcpu, bitmap);
+    if (ret == -1) {
+        error_report("NVMM: Failed to set virtual processor context,"
+            " error=%d", errno);
+    }
+}
+
+static void
+nvmm_get_segment(SegmentCache *qseg, const struct nvmm_x64_state_seg *nseg)
+{
+    qseg->selector = nseg->selector;
+    qseg->limit = nseg->limit;
+    qseg->base = nseg->base;
+
+    qseg->flags =
+        __SHIFTIN((uint32_t)nseg->attrib.type, DESC_TYPE_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.s, DESC_S_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.dpl, DESC_DPL_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.p, DESC_P_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.avl, DESC_AVL_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.l, DESC_L_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.def, DESC_B_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.g, DESC_G_MASK);
+}
+
+static void
+nvmm_get_registers(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t bitmap, tpr;
+    size_t i;
+    int ret;
+
+    assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
+
+    bitmap =
+        NVMM_X64_STATE_SEGS |
+        NVMM_X64_STATE_GPRS |
+        NVMM_X64_STATE_CRS  |
+        NVMM_X64_STATE_DRS  |
+        NVMM_X64_STATE_MSRS |
+        NVMM_X64_STATE_FPU;
+
+    ret = nvmm_vcpu_getstate(mach, vcpu, bitmap);
+    if (ret == -1) {
+        error_report("NVMM: Failed to get virtual processor context,"
+            " error=%d", errno);
+    }
+
+    /* GPRs. */
+    env->regs[R_EAX] = state->gprs[NVMM_X64_GPR_RAX];
+    env->regs[R_ECX] = state->gprs[NVMM_X64_GPR_RCX];
+    env->regs[R_EDX] = state->gprs[NVMM_X64_GPR_RDX];
+    env->regs[R_EBX] = state->gprs[NVMM_X64_GPR_RBX];
+    env->regs[R_ESP] = state->gprs[NVMM_X64_GPR_RSP];
+    env->regs[R_EBP] = state->gprs[NVMM_X64_GPR_RBP];
+    env->regs[R_ESI] = state->gprs[NVMM_X64_GPR_RSI];
+    env->regs[R_EDI] = state->gprs[NVMM_X64_GPR_RDI];
+    env->regs[R_R8]  = state->gprs[NVMM_X64_GPR_R8];
+    env->regs[R_R9]  = state->gprs[NVMM_X64_GPR_R9];
+    env->regs[R_R10] = state->gprs[NVMM_X64_GPR_R10];
+    env->regs[R_R11] = state->gprs[NVMM_X64_GPR_R11];
+    env->regs[R_R12] = state->gprs[NVMM_X64_GPR_R12];
+    env->regs[R_R13] = state->gprs[NVMM_X64_GPR_R13];
+    env->regs[R_R14] = state->gprs[NVMM_X64_GPR_R14];
+    env->regs[R_R15] = state->gprs[NVMM_X64_GPR_R15];
+
+    /* RIP and RFLAGS. */
+    env->eip = state->gprs[NVMM_X64_GPR_RIP];
+    env->eflags = state->gprs[NVMM_X64_GPR_RFLAGS];
+
+    /* Segments. */
+    nvmm_get_segment(&env->segs[R_ES], &state->segs[NVMM_X64_SEG_ES]);
+    nvmm_get_segment(&env->segs[R_CS], &state->segs[NVMM_X64_SEG_CS]);
+    nvmm_get_segment(&env->segs[R_SS], &state->segs[NVMM_X64_SEG_SS]);
+    nvmm_get_segment(&env->segs[R_DS], &state->segs[NVMM_X64_SEG_DS]);
+    nvmm_get_segment(&env->segs[R_FS], &state->segs[NVMM_X64_SEG_FS]);
+    nvmm_get_segment(&env->segs[R_GS], &state->segs[NVMM_X64_SEG_GS]);
+
+    /* Special segments. */
+    nvmm_get_segment(&env->gdt, &state->segs[NVMM_X64_SEG_GDT]);
+    nvmm_get_segment(&env->ldt, &state->segs[NVMM_X64_SEG_LDT]);
+    nvmm_get_segment(&env->tr, &state->segs[NVMM_X64_SEG_TR]);
+    nvmm_get_segment(&env->idt, &state->segs[NVMM_X64_SEG_IDT]);
+
+    /* Control registers. */
+    env->cr[0] = state->crs[NVMM_X64_CR_CR0];
+    env->cr[2] = state->crs[NVMM_X64_CR_CR2];
+    env->cr[3] = state->crs[NVMM_X64_CR_CR3];
+    env->cr[4] = state->crs[NVMM_X64_CR_CR4];
+    tpr = state->crs[NVMM_X64_CR_CR8];
+    if (tpr != qcpu->tpr) {
+        qcpu->tpr = tpr;
+        cpu_set_apic_tpr(x86_cpu->apic_state, tpr);
+    }
+    env->xcr0 = state->crs[NVMM_X64_CR_XCR0];
+
+    /* Debug registers. */
+    env->dr[0] = state->drs[NVMM_X64_DR_DR0];
+    env->dr[1] = state->drs[NVMM_X64_DR_DR1];
+    env->dr[2] = state->drs[NVMM_X64_DR_DR2];
+    env->dr[3] = state->drs[NVMM_X64_DR_DR3];
+    env->dr[6] = state->drs[NVMM_X64_DR_DR6];
+    env->dr[7] = state->drs[NVMM_X64_DR_DR7];
+
+    /* FPU. */
+    env->fpuc = state->fpu.fx_cw;
+    env->fpstt = (state->fpu.fx_sw >> 11) & 0x7;
+    env->fpus = state->fpu.fx_sw & ~0x3800;
+    for (i = 0; i < 8; i++) {
+        env->fptags[i] = !((state->fpu.fx_tw >> i) & 1);
+    }
+    env->fpop = state->fpu.fx_opcode;
+    env->fpip = state->fpu.fx_ip.fa_64;
+    env->fpdp = state->fpu.fx_dp.fa_64;
+    env->mxcsr = state->fpu.fx_mxcsr;
+    assert(sizeof(state->fpu.fx_87_ac) == sizeof(env->fpregs));
+    memcpy(env->fpregs, state->fpu.fx_87_ac, sizeof(env->fpregs));
+    for (i = 0; i < 16; i++) {
+        memcpy(&env->xmm_regs[i].ZMM_Q(0),
+            &state->fpu.fx_xmm[i].xmm_bytes[0], 8);
+        memcpy(&env->xmm_regs[i].ZMM_Q(1),
+            &state->fpu.fx_xmm[i].xmm_bytes[8], 8);
+    }
+
+    /* MSRs. */
+    env->efer = state->msrs[NVMM_X64_MSR_EFER];
+    env->star = state->msrs[NVMM_X64_MSR_STAR];
+#ifdef TARGET_X86_64
+    env->lstar = state->msrs[NVMM_X64_MSR_LSTAR];
+    env->cstar = state->msrs[NVMM_X64_MSR_CSTAR];
+    env->fmask = state->msrs[NVMM_X64_MSR_SFMASK];
+    env->kernelgsbase = state->msrs[NVMM_X64_MSR_KERNELGSBASE];
+#endif
+    env->sysenter_cs  = state->msrs[NVMM_X64_MSR_SYSENTER_CS];
+    env->sysenter_esp = state->msrs[NVMM_X64_MSR_SYSENTER_ESP];
+    env->sysenter_eip = state->msrs[NVMM_X64_MSR_SYSENTER_EIP];
+    env->pat = state->msrs[NVMM_X64_MSR_PAT];
+    env->tsc = state->msrs[NVMM_X64_MSR_TSC];
+
+    x86_update_hflags(env);
+}
+
+static bool
+nvmm_can_take_int(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    struct nvmm_machine *mach = get_nvmm_mach();
+
+    if (qcpu->int_window_exit) {
+        return false;
+    }
+
+    if (qcpu->int_shadow || !(env->eflags & IF_MASK)) {
+        struct nvmm_x64_state *state = vcpu->state;
+
+        /* Exit on interrupt window. */
+        nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_INTR);
+        state->intr.int_window_exiting = 1;
+        nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_INTR);
+
+        return false;
+    }
+
+    return true;
+}
+
+static bool
+nvmm_can_take_nmi(CPUState *cpu)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+
+    /*
+     * Contrary to INTs, NMIs always schedule an exit when they are
+     * completed. Therefore, if window-exiting is enabled, it means
+     * NMIs are blocked.
+     */
+    if (qcpu->nmi_window_exit) {
+        return false;
+    }
+
+    return true;
+}
+
+/*
+ * Called before the VCPU is run. We inject events generated by the I/O
+ * thread, and synchronize the guest TPR.
+ */
+static void
+nvmm_vcpu_pre_run(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    struct nvmm_vcpu_event *event = vcpu->event;
+    bool has_event = false;
+    bool sync_tpr = false;
+    uint8_t tpr;
+    int ret;
+
+    qemu_mutex_lock_iothread();
+
+    tpr = cpu_get_apic_tpr(x86_cpu->apic_state);
+    if (tpr != qcpu->tpr) {
+        qcpu->tpr = tpr;
+        sync_tpr = true;
+    }
+
+    /*
+     * Force the VCPU out of its inner loop to process any INIT requests
+     * or commit pending TPR access.
+     */
+    if (cpu->interrupt_request & (CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
+        cpu->exit_request = 1;
+    }
+
+    if (!has_event && (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+        if (nvmm_can_take_nmi(cpu)) {
+            cpu->interrupt_request &= ~CPU_INTERRUPT_NMI;
+            event->type = NVMM_VCPU_EVENT_INTR;
+            event->vector = 2;
+            has_event = true;
+        }
+    }
+
+    if (!has_event && (cpu->interrupt_request & CPU_INTERRUPT_HARD)) {
+        if (nvmm_can_take_int(cpu)) {
+            cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
+            event->type = NVMM_VCPU_EVENT_INTR;
+            event->vector = cpu_get_pic_interrupt(env);
+            has_event = true;
+        }
+    }
+
+    /* Don't want SMIs. */
+    if (cpu->interrupt_request & CPU_INTERRUPT_SMI) {
+        cpu->interrupt_request &= ~CPU_INTERRUPT_SMI;
+    }
+
+    if (sync_tpr) {
+        ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_CRS);
+        if (ret == -1) {
+            error_report("NVMM: Failed to get CPU state,"
+                " error=%d", errno);
+        }
+
+        state->crs[NVMM_X64_CR_CR8] = qcpu->tpr;
+
+        ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_CRS);
+        if (ret == -1) {
+            error_report("NVMM: Failed to set CPU state,"
+                " error=%d", errno);
+        }
+    }
+
+    if (has_event) {
+        ret = nvmm_vcpu_inject(mach, vcpu);
+        if (ret == -1) {
+            error_report("NVMM: Failed to inject event,"
+                " error=%d", errno);
+        }
+    }
+
+    qemu_mutex_unlock_iothread();
+}
+
+/*
+ * Called after the VCPU ran. We synchronize the host view of the TPR and
+ * RFLAGS.
+ */
+static void
+nvmm_vcpu_post_run(CPUState *cpu, struct nvmm_vcpu_exit *exit)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    uint64_t tpr;
+
+    env->eflags = exit->exitstate.rflags;
+    qcpu->int_shadow = exit->exitstate.int_shadow;
+    qcpu->int_window_exit = exit->exitstate.int_window_exiting;
+    qcpu->nmi_window_exit = exit->exitstate.nmi_window_exiting;
+
+    tpr = exit->exitstate.cr8;
+    if (qcpu->tpr != tpr) {
+        qcpu->tpr = tpr;
+        qemu_mutex_lock_iothread();
+        cpu_set_apic_tpr(x86_cpu->apic_state, qcpu->tpr);
+        qemu_mutex_unlock_iothread();
+    }
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_io_callback(struct nvmm_io *io)
+{
+    MemTxAttrs attrs = { 0 };
+    int ret;
+
+    ret = address_space_rw(&address_space_io, io->port, attrs, io->data,
+        io->size, !io->in);
+    if (ret != MEMTX_OK) {
+        error_report("NVMM: I/O Transaction Failed "
+            "[%s, port=%u, size=%zu]", (io->in ? "in" : "out"),
+            io->port, io->size);
+    }
+
+    /* XXX Needed, otherwise infinite loop. */
+    current_cpu->vcpu_dirty = false;
+}
+
+static void
+nvmm_mem_callback(struct nvmm_mem *mem)
+{
+    cpu_physical_memory_rw(mem->gpa, mem->data, mem->size, mem->write);
+
+    /* XXX Needed, otherwise infinite loop. */
+    current_cpu->vcpu_dirty = false;
+}
+
+static struct nvmm_assist_callbacks nvmm_callbacks = {
+    .io = nvmm_io_callback,
+    .mem = nvmm_mem_callback
+};
+
+/* -------------------------------------------------------------------------- */
+
+static int
+nvmm_handle_mem(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
+{
+    int ret;
+
+    ret = nvmm_assist_mem(mach, vcpu);
+    if (ret == -1) {
+        error_report("NVMM: Mem Assist Failed [gpa=%p]",
+            (void *)vcpu->exit->u.mem.gpa);
+    }
+
+    return ret;
+}
+
+static int
+nvmm_handle_io(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
+{
+    int ret;
+
+    ret = nvmm_assist_io(mach, vcpu);
+    if (ret == -1) {
+        error_report("NVMM: I/O Assist Failed [port=%d]",
+            (int)vcpu->exit->u.io.port);
+    }
+
+    return ret;
+}
+
+static int
+nvmm_handle_rdmsr(struct nvmm_machine *mach, CPUState *cpu,
+    struct nvmm_vcpu_exit *exit)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t val;
+    int ret;
+
+    switch (exit->u.rdmsr.msr) {
+    case MSR_IA32_APICBASE:
+        val = cpu_get_apic_base(x86_cpu->apic_state);
+        break;
+    case MSR_MTRRcap:
+    case MSR_MTRRdefType:
+    case MSR_MCG_CAP:
+    case MSR_MCG_STATUS:
+        val = 0;
+        break;
+    default: /* More MSRs to add? */
+        val = 0;
+        error_report("NVMM: Unexpected RDMSR 0x%x, ignored",
+            exit->u.rdmsr.msr);
+        break;
+    }
+
+    ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    state->gprs[NVMM_X64_GPR_RAX] = (val & 0xFFFFFFFF);
+    state->gprs[NVMM_X64_GPR_RDX] = (val >> 32);
+    state->gprs[NVMM_X64_GPR_RIP] = exit->u.rdmsr.npc;
+
+    ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static int
+nvmm_handle_wrmsr(struct nvmm_machine *mach, CPUState *cpu,
+    struct nvmm_vcpu_exit *exit)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t val;
+    int ret;
+
+    val = exit->u.wrmsr.val;
+
+    switch (exit->u.wrmsr.msr) {
+    case MSR_IA32_APICBASE:
+        cpu_set_apic_base(x86_cpu->apic_state, val);
+        break;
+    case MSR_MTRRdefType:
+    case MSR_MCG_STATUS:
+        break;
+    default: /* More MSRs to add? */
+        error_report("NVMM: Unexpected WRMSR 0x%x [val=0x%" PRIx64
+            "], ignored", exit->u.wrmsr.msr, val);
+        break;
+    }
+
+    ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    state->gprs[NVMM_X64_GPR_RIP] = exit->u.wrmsr.npc;
+
+    ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static int
+nvmm_handle_halted(struct nvmm_machine *mach, CPUState *cpu,
+    struct nvmm_vcpu_exit *exit)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    int ret = 0;
+
+    qemu_mutex_lock_iothread();
+
+    if (!((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
+          (env->eflags & IF_MASK)) &&
+        !(cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+        cpu->exception_index = EXCP_HLT;
+        cpu->halted = true;
+        ret = 1;
+    }
+
+    qemu_mutex_unlock_iothread();
+
+    return ret;
+}
+
+static int
+nvmm_inject_ud(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
+{
+    struct nvmm_vcpu_event *event = vcpu->event;
+
+    event->type = NVMM_VCPU_EVENT_EXCP;
+    event->vector = 6;
+    event->u.excp.error = 0;
+
+    return nvmm_vcpu_inject(mach, vcpu);
+}
+
+static int
+nvmm_vcpu_loop(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_vcpu_exit *exit = vcpu->exit;
+    int ret;
+
+    /*
+     * Some asynchronous events must be handled outside of the inner
+     * VCPU loop. They are handled here.
+     */
+    if (cpu->interrupt_request & CPU_INTERRUPT_INIT) {
+        nvmm_cpu_synchronize_state(cpu);
+        do_cpu_init(x86_cpu);
+        /* XXX: reset the INT/NMI windows */
+    }
+    if (cpu->interrupt_request & CPU_INTERRUPT_POLL) {
+        cpu->interrupt_request &= ~CPU_INTERRUPT_POLL;
+        apic_poll_irq(x86_cpu->apic_state);
+    }
+    if (((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
+         (env->eflags & IF_MASK)) ||
+        (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+        cpu->halted = false;
+    }
+    if (cpu->interrupt_request & CPU_INTERRUPT_SIPI) {
+        nvmm_cpu_synchronize_state(cpu);
+        do_cpu_sipi(x86_cpu);
+    }
+    if (cpu->interrupt_request & CPU_INTERRUPT_TPR) {
+        cpu->interrupt_request &= ~CPU_INTERRUPT_TPR;
+        nvmm_cpu_synchronize_state(cpu);
+        apic_handle_tpr_access_report(x86_cpu->apic_state, env->eip,
+            env->tpr_access_type);
+    }
+
+    if (cpu->halted) {
+        cpu->exception_index = EXCP_HLT;
+        atomic_set(&cpu->exit_request, false);
+        return 0;
+    }
+
+    qemu_mutex_unlock_iothread();
+    cpu_exec_start(cpu);
+
+    /*
+     * Inner VCPU loop.
+     */
+    do {
+        if (cpu->vcpu_dirty) {
+            nvmm_set_registers(cpu);
+            cpu->vcpu_dirty = false;
+        }
+
+        if (qcpu->stop) {
+            cpu->exception_index = EXCP_INTERRUPT;
+            qcpu->stop = false;
+            ret = 1;
+            break;
+        }
+
+        nvmm_vcpu_pre_run(cpu);
+
+        if (atomic_read(&cpu->exit_request)) {
+            qemu_cpu_kick_self();
+        }
+
+        ret = nvmm_vcpu_run(mach, vcpu);
+        if (ret == -1) {
+            error_report("NVMM: Failed to exec a virtual processor,"
+                " error=%d", errno);
+            break;
+        }
+
+        nvmm_vcpu_post_run(cpu, exit);
+
+        switch (exit->reason) {
+        case NVMM_VCPU_EXIT_NONE:
+            break;
+        case NVMM_VCPU_EXIT_MEMORY:
+            ret = nvmm_handle_mem(mach, vcpu);
+            break;
+        case NVMM_VCPU_EXIT_IO:
+            ret = nvmm_handle_io(mach, vcpu);
+            break;
+        case NVMM_VCPU_EXIT_INT_READY:
+        case NVMM_VCPU_EXIT_NMI_READY:
+        case NVMM_VCPU_EXIT_TPR_CHANGED:
+            break;
+        case NVMM_VCPU_EXIT_HALTED:
+            ret = nvmm_handle_halted(mach, cpu, exit);
+            break;
+        case NVMM_VCPU_EXIT_SHUTDOWN:
+            qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
+            cpu->exception_index = EXCP_INTERRUPT;
+            ret = 1;
+            break;
+        case NVMM_VCPU_EXIT_RDMSR:
+            ret = nvmm_handle_rdmsr(mach, cpu, exit);
+            break;
+        case NVMM_VCPU_EXIT_WRMSR:
+            ret = nvmm_handle_wrmsr(mach, cpu, exit);
+            break;
+        case NVMM_VCPU_EXIT_MONITOR:
+        case NVMM_VCPU_EXIT_MWAIT:
+            ret = nvmm_inject_ud(mach, vcpu);
+            break;
+        default:
+            error_report("NVMM: Unexpected VM exit code 0x%" PRIx64
+                " [hw=0x%" PRIx64 "]", exit->reason, exit->u.inv.hwcode);
+            nvmm_get_registers(cpu);
+            qemu_mutex_lock_iothread();
+            qemu_system_guest_panicked(cpu_get_crash_info(cpu));
+            qemu_mutex_unlock_iothread();
+            ret = -1;
+            break;
+        }
+    } while (ret == 0);
+
+    cpu_exec_end(cpu);
+    qemu_mutex_lock_iothread();
+    current_cpu = cpu;
+
+    atomic_set(&cpu->exit_request, false);
+
+    return ret < 0;
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+do_nvmm_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
+{
+    nvmm_get_registers(cpu);
+    cpu->vcpu_dirty = true;
+}
+
+static void
+do_nvmm_cpu_synchronize_post_reset(CPUState *cpu, run_on_cpu_data arg)
+{
+    nvmm_set_registers(cpu);
+    cpu->vcpu_dirty = false;
+}
+
+static void
+do_nvmm_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg)
+{
+    nvmm_set_registers(cpu);
+    cpu->vcpu_dirty = false;
+}
+
+static void
+do_nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu, run_on_cpu_data arg)
+{
+    cpu->vcpu_dirty = true;
+}
+
+void nvmm_cpu_synchronize_state(CPUState *cpu)
+{
+    if (!cpu->vcpu_dirty) {
+        run_on_cpu(cpu, do_nvmm_cpu_synchronize_state, RUN_ON_CPU_NULL);
+    }
+}
+
+void nvmm_cpu_synchronize_post_reset(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_nvmm_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
+}
+
+void nvmm_cpu_synchronize_post_init(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_nvmm_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
+}
+
+void nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_nvmm_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
+}
+
+/* -------------------------------------------------------------------------- */
+
+static Error *nvmm_migration_blocker;
+
+static void
+nvmm_ipi_signal(int sigcpu)
+{
+    struct qemu_vcpu *qcpu;
+
+    if (current_cpu) {
+        qcpu = get_qemu_vcpu(current_cpu);
+        qcpu->stop = true;
+    }
+}
+
+static void
+nvmm_init_cpu_signals(void)
+{
+    struct sigaction sigact;
+    sigset_t set;
+
+    /* Install the IPI handler. */
+    memset(&sigact, 0, sizeof(sigact));
+    sigact.sa_handler = nvmm_ipi_signal;
+    sigaction(SIG_IPI, &sigact, NULL);
+
+    /* Allow IPIs on the current thread. */
+    pthread_sigmask(SIG_BLOCK, NULL, &set);
+    sigdelset(&set, SIG_IPI);
+    pthread_sigmask(SIG_SETMASK, &set, NULL);
+}
+
+int
+nvmm_init_vcpu(CPUState *cpu)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct nvmm_vcpu_conf_cpuid cpuid;
+    struct nvmm_vcpu_conf_tpr tpr;
+    Error *local_error = NULL;
+    struct qemu_vcpu *qcpu;
+    int ret, err;
+
+    nvmm_init_cpu_signals();
+
+    if (nvmm_migration_blocker == NULL) {
+        error_setg(&nvmm_migration_blocker,
+            "NVMM: Migration not supported");
+
+        (void)migrate_add_blocker(nvmm_migration_blocker, &local_error);
+        if (local_error) {
+            error_report_err(local_error);
+            migrate_del_blocker(nvmm_migration_blocker);
+            error_free(nvmm_migration_blocker);
+            return -EINVAL;
+        }
+    }
+
+    qcpu = g_malloc0(sizeof(*qcpu));
+    if (qcpu == NULL) {
+        error_report("NVMM: Failed to allocate VCPU context.");
+        return -ENOMEM;
+    }
+
+    ret = nvmm_vcpu_create(mach, cpu->cpu_index, &qcpu->vcpu);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Failed to create a virtual processor,"
+            " error=%d", err);
+        g_free(qcpu);
+        return -err;
+    }
+
+    memset(&cpuid, 0, sizeof(cpuid));
+    cpuid.mask = 1;
+    cpuid.leaf = 0x00000001;
+    cpuid.u.mask.set.edx = CPUID_MCE | CPUID_MCA | CPUID_MTRR;
+    ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_CPUID,
+        &cpuid);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Failed to configure a virtual processor,"
+            " error=%d", err);
+        g_free(qcpu);
+        return -err;
+    }
+
+    ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_CALLBACKS,
+        &nvmm_callbacks);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Failed to configure a virtual processor,"
+            " error=%d", err);
+        g_free(qcpu);
+        return -err;
+    }
+
+    if (qemu_mach.cap.arch.vcpu_conf_support & NVMM_CAP_ARCH_VCPU_CONF_TPR) {
+        memset(&tpr, 0, sizeof(tpr));
+        tpr.exit_changed = 1;
+        ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_TPR, &tpr);
+        if (ret == -1) {
+            err = errno;
+            error_report("NVMM: Failed to configure a virtual processor,"
+                " error=%d", err);
+            g_free(qcpu);
+            return -err;
+        }
+    }
+
+    cpu->vcpu_dirty = true;
+    cpu->hax_vcpu = (struct hax_vcpu_state *)qcpu;
+
+    return 0;
+}
+
+int
+nvmm_vcpu_exec(CPUState *cpu)
+{
+    int ret, fatal;
+
+    while (1) {
+        if (cpu->exception_index >= EXCP_INTERRUPT) {
+            ret = cpu->exception_index;
+            cpu->exception_index = -1;
+            break;
+        }
+
+        fatal = nvmm_vcpu_loop(cpu);
+
+        if (fatal) {
+            error_report("NVMM: Failed to execute a VCPU.");
+            abort();
+        }
+    }
+
+    return ret;
+}
+
+void
+nvmm_destroy_vcpu(CPUState *cpu)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+
+    nvmm_vcpu_destroy(mach, &qcpu->vcpu);
+    g_free(cpu->hax_vcpu);
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_update_mapping(hwaddr start_pa, ram_addr_t size, uintptr_t hva,
+    bool add, bool rom, const char *name)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    int ret, prot;
+
+    if (add) {
+        prot = PROT_READ | PROT_EXEC;
+        if (!rom) {
+            prot |= PROT_WRITE;
+        }
+        ret = nvmm_gpa_map(mach, hva, start_pa, size, prot);
+    } else {
+        ret = nvmm_gpa_unmap(mach, hva, start_pa, size);
+    }
+
+    if (ret == -1) {
+        error_report("NVMM: Failed to %s GPA range '%s' PA:%p, "
+            "Size:%p bytes, HostVA:%p, error=%d",
+            (add ? "map" : "unmap"), name, (void *)(uintptr_t)start_pa,
+            (void *)size, (void *)hva, errno);
+    }
+}
+
+static void
+nvmm_process_section(MemoryRegionSection *section, int add)
+{
+    MemoryRegion *mr = section->mr;
+    hwaddr start_pa = section->offset_within_address_space;
+    ram_addr_t size = int128_get64(section->size);
+    unsigned int delta;
+    uintptr_t hva;
+
+    if (!memory_region_is_ram(mr)) {
+        return;
+    }
+
+    /* Adjust start_pa and size so that they are page-aligned. */
+    delta = qemu_real_host_page_size - (start_pa & ~qemu_real_host_page_mask);
+    delta &= ~qemu_real_host_page_mask;
+    if (delta > size) {
+        return;
+    }
+    start_pa += delta;
+    size -= delta;
+    size &= qemu_real_host_page_mask;
+    if (!size || (start_pa & ~qemu_real_host_page_mask)) {
+        return;
+    }
+
+    hva = (uintptr_t)memory_region_get_ram_ptr(mr) +
+        section->offset_within_region + delta;
+
+    nvmm_update_mapping(start_pa, size, hva, add,
+        memory_region_is_rom(mr), mr->name);
+}
+
+static void
+nvmm_region_add(MemoryListener *listener, MemoryRegionSection *section)
+{
+    memory_region_ref(section->mr);
+    nvmm_process_section(section, 1);
+}
+
+static void
+nvmm_region_del(MemoryListener *listener, MemoryRegionSection *section)
+{
+    nvmm_process_section(section, 0);
+    memory_region_unref(section->mr);
+}
+
+static void
+nvmm_transaction_begin(MemoryListener *listener)
+{
+    /* nothing */
+}
+
+static void
+nvmm_transaction_commit(MemoryListener *listener)
+{
+    /* nothing */
+}
+
+static void
+nvmm_log_sync(MemoryListener *listener, MemoryRegionSection *section)
+{
+    MemoryRegion *mr = section->mr;
+
+    if (!memory_region_is_ram(mr)) {
+        return;
+    }
+
+    memory_region_set_dirty(mr, 0, int128_get64(section->size));
+}
+
+static MemoryListener nvmm_memory_listener = {
+    .begin = nvmm_transaction_begin,
+    .commit = nvmm_transaction_commit,
+    .region_add = nvmm_region_add,
+    .region_del = nvmm_region_del,
+    .log_sync = nvmm_log_sync,
+    .priority = 10,
+};
+
+static void
+nvmm_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    uintptr_t hva = (uintptr_t)host;
+    int ret;
+
+    ret = nvmm_hva_map(mach, hva, size);
+
+    if (ret == -1) {
+        error_report("NVMM: Failed to map HVA, HostVA:%p "
+            "Size:%p bytes, error=%d",
+            (void *)hva, (void *)size, errno);
+    }
+}
+
+static struct RAMBlockNotifier nvmm_ram_notifier = {
+    .ram_block_added = nvmm_ram_block_added
+};
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_handle_interrupt(CPUState *cpu, int mask)
+{
+    cpu->interrupt_request |= mask;
+
+    if (!qemu_cpu_is_self(cpu)) {
+        qemu_cpu_kick(cpu);
+    }
+}
+
+/* -------------------------------------------------------------------------- */
+
+static int
+nvmm_accel_init(MachineState *ms)
+{
+    int ret, err;
+
+    ret = nvmm_init();
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Initialization failed, error=%d", err);
+        return -err;
+    }
+
+    ret = nvmm_capability(&qemu_mach.cap);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Unable to fetch capability, error=%d", err);
+        return -err;
+    }
+    if (qemu_mach.cap.version != 1) {
+        error_report("NVMM: Unsupported version %u", qemu_mach.cap.version);
+        return -EPROGMISMATCH;
+    }
+    if (qemu_mach.cap.state_size != sizeof(struct nvmm_x64_state)) {
+        error_report("NVMM: Wrong state size %u", qemu_mach.cap.state_size);
+        return -EPROGMISMATCH;
+    }
+
+    ret = nvmm_machine_create(&qemu_mach.mach);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Machine creation failed, error=%d", err);
+        return -err;
+    }
+
+    memory_listener_register(&nvmm_memory_listener, &address_space_memory);
+    ram_block_notifier_add(&nvmm_ram_notifier);
+
+    cpu_interrupt_handler = nvmm_handle_interrupt;
+
+    printf("NetBSD Virtual Machine Monitor accelerator is operational\n");
+    return 0;
+}
+
+int
+nvmm_enabled(void)
+{
+    return nvmm_allowed;
+}
+
+static void
+nvmm_accel_class_init(ObjectClass *oc, void *data)
+{
+    AccelClass *ac = ACCEL_CLASS(oc);
+    ac->name = "NVMM";
+    ac->init_machine = nvmm_accel_init;
+    ac->allowed = &nvmm_allowed;
+}
+
+static const TypeInfo nvmm_accel_type = {
+    .name = ACCEL_CLASS_NAME("nvmm"),
+    .parent = TYPE_ACCEL,
+    .class_init = nvmm_accel_class_init,
+};
+
+static void
+nvmm_type_init(void)
+{
+    type_register_static(&nvmm_accel_type);
+}
+
+type_init(nvmm_type_init);
--
2.24.1
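
Note on the page-alignment step in nvmm_process_section() above: the
arithmetic can be checked in isolation. The following is a minimal sketch,
not code from the patch. trim_to_pages() and its page_size parameter are
hypothetical names standing in for the inline computation on
qemu_real_host_page_size and qemu_real_host_page_mask, and the mask is
assumed to be ~(page_size - 1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: shrink the range
 * [*start, *start + *size) inward so that it covers whole pages only.
 * Returns false when no whole page remains.
 */
static bool
trim_to_pages(uint64_t *start, uint64_t *size, uint64_t page_size)
{
    uint64_t mask, delta;

    /* The mask trick only works for a non-zero power of two. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    mask = ~(page_size - 1);

    /* Bytes from *start up to the next page boundary (0 if aligned). */
    delta = (page_size - (*start & ~mask)) & ~mask;
    if (delta > *size) {
        return false;
    }

    *start += delta;
    *size = (*size - delta) & mask;    /* round the length down */
    return *size != 0;
}
```

With a 4 KiB page, a section starting at 0x1234 with length 0x3000 is
trimmed to start 0x2000 and length 0x2000, so the partial pages at both ends
are left unmapped.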




* [PATCH v2 4/4] Add the NVMM acceleration enlightenments
  2020-01-28 14:09 ` [PATCH v2 0/4] Implements the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
                     ` (2 preceding siblings ...)
  2020-01-28 14:09   ` [PATCH v2 3/4] Introduce the NVMM impl Kamil Rytarowski
@ 2020-01-28 14:09   ` Kamil Rytarowski
  2020-02-03 11:54     ` Philippe Mathieu-Daudé
  2020-02-03  9:52   ` [PATCH v2 0/4] Implements the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
  2020-02-06 11:57   ` [PATCH v3 " Kamil Rytarowski
  5 siblings, 1 reply; 79+ messages in thread
From: Kamil Rytarowski @ 2020-01-28 14:09 UTC (permalink / raw)
  To: rth, ehabkost, philmd, slp, pbonzini, peter.maydell, max
  Cc: Kamil Rytarowski, qemu-devel

From: Maxime Villard <max@m00nbsd.net>

Implement the NVMM accelerator CPU enlightenments needed to actually use the
nvmm-all accelerator on NetBSD platforms.

Signed-off-by: Maxime Villard <max@m00nbsd.net>
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
---
 cpus.c                    | 58 +++++++++++++++++++++++++++++++++++++++
 include/sysemu/hw_accel.h | 14 ++++++++++
 target/i386/helper.c      |  2 +-
 3 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index b472378b70..3c3f63588c 100644
--- a/cpus.c
+++ b/cpus.c
@@ -42,6 +42,7 @@
 #include "sysemu/hax.h"
 #include "sysemu/hvf.h"
 #include "sysemu/whpx.h"
+#include "sysemu/nvmm.h"
 #include "exec/exec-all.h"

 #include "qemu/thread.h"
@@ -1666,6 +1667,48 @@ static void *qemu_whpx_cpu_thread_fn(void *arg)
     return NULL;
 }

+static void *qemu_nvmm_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+    int r;
+
+    assert(nvmm_enabled());
+
+    rcu_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+    cpu->thread_id = qemu_get_thread_id();
+    current_cpu = cpu;
+
+    r = nvmm_init_vcpu(cpu);
+    if (r < 0) {
+        fprintf(stderr, "nvmm_init_vcpu failed: %s\n", strerror(-r));
+        exit(1);
+    }
+
+    /* signal CPU creation */
+    cpu->created = true;
+    qemu_cond_signal(&qemu_cpu_cond);
+
+    do {
+        if (cpu_can_run(cpu)) {
+            r = nvmm_vcpu_exec(cpu);
+            if (r == EXCP_DEBUG) {
+                cpu_handle_guest_debug(cpu);
+            }
+        }
+        qemu_wait_io_event(cpu);
+    } while (!cpu->unplug || cpu_can_run(cpu));
+
+    nvmm_destroy_vcpu(cpu);
+    cpu->created = false;
+    qemu_cond_signal(&qemu_cpu_cond);
+    qemu_mutex_unlock_iothread();
+    rcu_unregister_thread();
+    return NULL;
+}
+
 #ifdef _WIN32
 static void CALLBACK dummy_apc_func(ULONG_PTR unused)
 {
@@ -2029,6 +2072,19 @@ static void qemu_whpx_start_vcpu(CPUState *cpu)
 #endif
 }

+static void qemu_nvmm_start_vcpu(CPUState *cpu)
+{
+    char thread_name[VCPU_THREAD_NAME_SIZE];
+
+    cpu->thread = g_malloc0(sizeof(QemuThread));
+    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+    qemu_cond_init(cpu->halt_cond);
+    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/NVMM",
+             cpu->cpu_index);
+    qemu_thread_create(cpu->thread, thread_name, qemu_nvmm_cpu_thread_fn,
+                       cpu, QEMU_THREAD_JOINABLE);
+}
+
 static void qemu_dummy_start_vcpu(CPUState *cpu)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
@@ -2069,6 +2125,8 @@ void qemu_init_vcpu(CPUState *cpu)
         qemu_tcg_init_vcpu(cpu);
     } else if (whpx_enabled()) {
         qemu_whpx_start_vcpu(cpu);
+    } else if (nvmm_enabled()) {
+        qemu_nvmm_start_vcpu(cpu);
     } else {
         qemu_dummy_start_vcpu(cpu);
     }
diff --git a/include/sysemu/hw_accel.h b/include/sysemu/hw_accel.h
index 0ec2372477..dbfa7a02f9 100644
--- a/include/sysemu/hw_accel.h
+++ b/include/sysemu/hw_accel.h
@@ -15,6 +15,7 @@
 #include "sysemu/hax.h"
 #include "sysemu/kvm.h"
 #include "sysemu/whpx.h"
+#include "sysemu/nvmm.h"

 static inline void cpu_synchronize_state(CPUState *cpu)
 {
@@ -27,6 +28,9 @@ static inline void cpu_synchronize_state(CPUState *cpu)
     if (whpx_enabled()) {
         whpx_cpu_synchronize_state(cpu);
     }
+    if (nvmm_enabled()) {
+        nvmm_cpu_synchronize_state(cpu);
+    }
 }

 static inline void cpu_synchronize_post_reset(CPUState *cpu)
@@ -40,6 +44,10 @@ static inline void cpu_synchronize_post_reset(CPUState *cpu)
     if (whpx_enabled()) {
         whpx_cpu_synchronize_post_reset(cpu);
     }
+    if (nvmm_enabled()) {
+        nvmm_cpu_synchronize_post_reset(cpu);
+    }
+
 }

 static inline void cpu_synchronize_post_init(CPUState *cpu)
@@ -53,6 +61,9 @@ static inline void cpu_synchronize_post_init(CPUState *cpu)
     if (whpx_enabled()) {
         whpx_cpu_synchronize_post_init(cpu);
     }
+    if (nvmm_enabled()) {
+        nvmm_cpu_synchronize_post_init(cpu);
+    }
 }

 static inline void cpu_synchronize_pre_loadvm(CPUState *cpu)
@@ -66,6 +77,9 @@ static inline void cpu_synchronize_pre_loadvm(CPUState *cpu)
     if (whpx_enabled()) {
         whpx_cpu_synchronize_pre_loadvm(cpu);
     }
+    if (nvmm_enabled()) {
+        nvmm_cpu_synchronize_pre_loadvm(cpu);
+    }
 }

 #endif /* QEMU_HW_ACCEL_H */
diff --git a/target/i386/helper.c b/target/i386/helper.c
index c3a6e4fabe..2e79d61329 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -981,7 +981,7 @@ void cpu_report_tpr_access(CPUX86State *env, TPRAccess access)
     X86CPU *cpu = env_archcpu(env);
     CPUState *cs = env_cpu(env);

-    if (kvm_enabled() || whpx_enabled()) {
+    if (kvm_enabled() || whpx_enabled() || nvmm_enabled()) {
         env->tpr_access_type = access;

         cpu_interrupt(cs, CPU_INTERRUPT_TPR);
--
2.24.1
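
Note on the synchronization hooks wired up above: they rely on
cpu->vcpu_dirty as a lazy-synchronization marker. The following toy model
is a sketch under the assumption that only the behaviour visible in patch
3/4 matters (do_nvmm_cpu_synchronize_state() and the start of the inner loop
in nvmm_vcpu_loop()). struct toy_cpu, its counters, and the toy_* functions
are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for CPUState; the counters are for illustration only. */
struct toy_cpu {
    bool vcpu_dirty;    /* QEMU's register copy is newer than the VCPU's */
    int fetches;        /* times registers were read from the hypervisor */
    int pushes;         /* times registers were written back to it */
};

/* Models nvmm_cpu_synchronize_state(): fetch once, then mark dirty. */
static void
toy_synchronize_state(struct toy_cpu *cpu)
{
    assert(cpu);
    if (!cpu->vcpu_dirty) {
        cpu->fetches++;
        cpu->vcpu_dirty = true;
    }
}

/* Models the top of the inner VCPU loop: write back only when dirty. */
static void
toy_before_run(struct toy_cpu *cpu)
{
    assert(cpu);
    if (cpu->vcpu_dirty) {
        cpu->pushes++;
        cpu->vcpu_dirty = false;
    }
}
```

Calling toy_synchronize_state() repeatedly between two runs costs a single
fetch, and the next run performs a single write-back, which is the property
the cpu_synchronize_*() callers depend on.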




* Re: [PATCH v2 0/4] Implements the NetBSD Virtual Machine Monitor accelerator
  2020-01-28 14:09 ` [PATCH v2 0/4] Implements the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
                     ` (3 preceding siblings ...)
  2020-01-28 14:09   ` [PATCH v2 4/4] Add the NVMM acceleration enlightenments Kamil Rytarowski
@ 2020-02-03  9:52   ` Kamil Rytarowski
  2020-02-06 11:57   ` [PATCH v3 " Kamil Rytarowski
  5 siblings, 0 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-02-03  9:52 UTC (permalink / raw)
  To: rth, ehabkost, philmd, slp, pbonzini, peter.maydell, max; +Cc: qemu-devel

Ping?

We plan to release NetBSD 9.0 in two weeks and we would love to have
this patchset merged.

"A second (and hopefully final) release candidate for the upcoming
NetBSD 9.0 release is now available.
Please help testing it!

Tentative final 9.0 release date: February 14, 2020"

http://netbsd.org/

On 28.01.2020 15:09, Kamil Rytarowski wrote:
> Hello QEMU Community!
>
> Over the past year the NetBSD team has been working hard on a new user-mode API
> for our hypervisor that will be released as part of the upcoming NetBSD 9.0.
> This new API adds user-mode capabilities to create and manage virtual machines,
> configure memory mappings for guest machines, and create and control execution
> of virtual processors.
>
> With this new API we are now able to bring our hypervisor to the QEMU
> community! The following patches implement the NetBSD Virtual Machine Monitor
> accelerator (NVMM) for QEMU on NetBSD 9.0 and newer hosts.
>
> When compiling QEMU for x86_64, passing the --enable-nvmm flag builds the
> accelerator. At runtime, using '-accel nvmm' should give a significant
> performance improvement over emulation, much like when using 'hax' on
> NetBSD.
>
> The documentation for this new API is visible at https://man.netbsd.org under
> the libnvmm(3) and nvmm(4) pages.
>
> NVMM was designed and implemented by Maxime Villard.
>
> Thank you for your feedback.
>
> References:
> https://m00nbsd.net/4e0798b7f2620c965d0dd9d6a7a2f296.html
>
> Test plan:
>
> 1. Download a NetBSD 9.0 pre-release snapshot:
> http://nycdn.netbsd.org/pub/NetBSD-daily/netbsd-9/latest/images/NetBSD-9.0_RC1-amd64.iso
>
> 2. Install it natively on reasonably recent x86_64 hardware (Intel or AMD).
>
> There is no support for nested virtualization in NVMM.
>
> 3. Set up the system.
>
>  export PKG_PATH=http://www.ki.nu/pkgsrc/packages/current/NetBSD-9.0_RC1/All
>  pkg_add git gmake python37 glib2 bison pkgconf pixman
>
> Install mozilla-rootcerts and follow post-install instructions.
>
>  pkg_add mozilla-rootcerts
>
> More information: https://wiki.qemu.org/Hosts/BSD#NetBSD
>
> 4. Build QEMU
>
>  mkdir build
>  cd build
>  ../configure --python=python3.7 --enable-nvmm
>  gmake
>  gmake check
>
> 5. Test
>
>  qemu -accel nvmm ...
>
>
> History:
> v1 -> v2:
>  - Included the testing plan as requested by Philippe Mathieu-Daude
>  - Formatting nit fix in qemu-options.hx
>  - Document NVMM in the accel section of qemu-options.hx
>
> Maxime Villard (4):
>   Add the NVMM vcpu API
>   Add the NetBSD Virtual Machine Monitor accelerator.
>   Introduce the NVMM impl
>   Add the NVMM acceleration enlightenments
>
>  accel/stubs/Makefile.objs |    1 +
>  accel/stubs/nvmm-stub.c   |   43 ++
>  configure                 |   36 ++
>  cpus.c                    |   58 ++
>  include/sysemu/hw_accel.h |   14 +
>  include/sysemu/nvmm.h     |   35 ++
>  qemu-options.hx           |   16 +-
>  target/i386/Makefile.objs |    1 +
>  target/i386/helper.c      |    2 +-
>  target/i386/nvmm-all.c    | 1222 +++++++++++++++++++++++++++++++++++++
>  10 files changed, 1419 insertions(+), 9 deletions(-)
>  create mode 100644 accel/stubs/nvmm-stub.c
>  create mode 100644 include/sysemu/nvmm.h
>  create mode 100644 target/i386/nvmm-all.c
>
> --
> 2.24.1
>




* Re: [PATCH v2 2/4] Add the NetBSD Virtual Machine Monitor accelerator.
  2020-01-28 14:09   ` [PATCH v2 2/4] Add the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
@ 2020-02-03 11:41     ` Philippe Mathieu-Daudé
  2020-02-03 11:56       ` Kamil Rytarowski
  2020-03-02 17:11       ` Paolo Bonzini
  0 siblings, 2 replies; 79+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-02-03 11:41 UTC (permalink / raw)
  To: Kamil Rytarowski, rth, ehabkost, slp, pbonzini, peter.maydell, max
  Cc: qemu-devel

On 1/28/20 3:09 PM, Kamil Rytarowski wrote:
> From: Maxime Villard <max@m00nbsd.net>
> 
> Introduces the configure support for the new NetBSD Virtual Machine Monitor that
> allows for hypervisor acceleration from usermode components on the NetBSD
> platform.
> 
> Signed-off-by: Maxime Villard <max@m00nbsd.net>
> Signed-off-by: Kamil Rytarowski <n54@gmx.com>
> Reviewed-by: Sergio Lopez <slp@redhat.com>
> ---
>   configure       | 36 ++++++++++++++++++++++++++++++++++++
>   qemu-options.hx | 16 ++++++++--------
>   2 files changed, 44 insertions(+), 8 deletions(-)
> 
> diff --git a/configure b/configure
> index 0ce2c0354a..eb456a271e 100755
> --- a/configure
> +++ b/configure
> @@ -241,6 +241,17 @@ supported_whpx_target() {
>       return 1
>   }
> 
> +supported_nvmm_target() {
> +    test "$nvmm" = "yes" || return 1
> +    glob "$1" "*-softmmu" || return 1
> +    case "${1%-softmmu}" in
> +        i386|x86_64)
> +            return 0
> +        ;;
> +    esac
> +    return 1
> +}
> +
>   supported_target() {
>       case "$1" in
>           *-softmmu)
> @@ -268,6 +279,7 @@ supported_target() {
>       supported_hax_target "$1" && return 0
>       supported_hvf_target "$1" && return 0
>       supported_whpx_target "$1" && return 0
> +    supported_nvmm_target "$1" && return 0
>       print_error "TCG disabled, but hardware accelerator not available for '$target'"
>       return 1
>   }
> @@ -387,6 +399,7 @@ kvm="no"
>   hax="no"
>   hvf="no"
>   whpx="no"
> +nvmm="no"
>   rdma=""
>   pvrdma=""
>   gprof="no"
> @@ -1168,6 +1181,10 @@ for opt do
>     ;;
>     --enable-whpx) whpx="yes"
>     ;;
> +  --disable-nvmm) nvmm="no"
> +  ;;
> +  --enable-nvmm) nvmm="yes"
> +  ;;
>     --disable-tcg-interpreter) tcg_interpreter="no"
>     ;;
>     --enable-tcg-interpreter) tcg_interpreter="yes"
> @@ -1768,6 +1785,7 @@ disabled with --disable-FEATURE, default is enabled if available:
>     hax             HAX acceleration support
>     hvf             Hypervisor.framework acceleration support
>     whpx            Windows Hypervisor Platform acceleration support
> +  nvmm            NetBSD Virtual Machine Monitor acceleration support
>     rdma            Enable RDMA-based migration
>     pvrdma          Enable PVRDMA support
>     vde             support for vde network
> @@ -2757,6 +2775,20 @@ if test "$whpx" != "no" ; then
>       fi
>   fi
> 

Maybe you can add something like:

if test "$targetos" = "NetBSD"; then
     nvmm="check"
fi

to build by default with NVMM if available.
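
As a rough sketch of how a "check" default could interact with the probe
quoted below: the function here is a standalone model, not configure code.
resolve_nvmm, request and have_header are hypothetical names introduced for
illustration; only nvmm and targetos come from the patch. It assumes the
NetBSD default is applied before the command-line options are parsed, so
that an explicit --disable-nvmm still wins.

```shell
#!/bin/sh
# Sketch only: resolve_nvmm, request and have_header are hypothetical and
# are not part of QEMU's configure script.
#
# resolve_nvmm REQUEST TARGETOS HAVE_HEADER
#   REQUEST     "" (no option given), "yes" (--enable-nvmm), "no" (--disable-nvmm)
#   TARGETOS    host OS name as configure would detect it
#   HAVE_HEADER "yes" if nvmm.h was found, anything else otherwise
# Prints the final value of $nvmm, or returns 1 when NVMM was requested
# explicitly but the header is missing.
resolve_nvmm() {
    request=$1
    targetos=$2
    have_header=$3

    nvmm="no"                   # default from the patch
    if test "$targetos" = "NetBSD"; then
        nvmm="check"            # suggested default: probe on NetBSD hosts
    fi
    if test -n "$request"; then
        nvmm=$request           # an explicit --enable/--disable overrides it
    fi

    # Same shape as the probe block in the patch.
    if test "$nvmm" != "no"; then
        if test "$have_header" = "yes"; then
            nvmm="yes"
        else
            if test "$nvmm" = "yes"; then
                echo "ERROR: NVMM is not available" >&2
                return 1
            fi
            nvmm="no"
        fi
    fi
    echo "$nvmm"
}
```

With this ordering, resolve_nvmm "" NetBSD yes prints "yes",
resolve_nvmm no NetBSD yes prints "no", and resolve_nvmm yes Linux no fails
with an error, which matches the existing feature_not_found behaviour for
explicit requests.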

> +##########################################
> +# NetBSD Virtual Machine Monitor (NVMM) accelerator check
> +if test "$nvmm" != "no" ; then
> +    if check_include "nvmm.h" ; then
> +        nvmm="yes"
> +        LIBS="-lnvmm $LIBS"
> +    else
> +        if test "$nvmm" = "yes"; then
> +            feature_not_found "NVMM" "NVMM is not available"
> +        fi
> +        nvmm="no"
> +    fi
> +fi
> +
>   ##########################################
>   # Sparse probe
>   if test "$sparse" != "no" ; then
> @@ -6495,6 +6527,7 @@ echo "KVM support       $kvm"
>   echo "HAX support       $hax"
>   echo "HVF support       $hvf"
>   echo "WHPX support      $whpx"
> +echo "NVMM support      $nvmm"
>   echo "TCG support       $tcg"
>   if test "$tcg" = "yes" ; then
>       echo "TCG debug enabled $debug_tcg"
> @@ -7771,6 +7804,9 @@ fi
>   if test "$target_aligned_only" = "yes" ; then
>     echo "TARGET_ALIGNED_ONLY=y" >> $config_target_mak
>   fi
> +if supported_nvmm_target $target; then
> +    echo "CONFIG_NVMM=y" >> $config_target_mak
> +fi
>   if test "$target_bigendian" = "yes" ; then
>     echo "TARGET_WORDS_BIGENDIAN=y" >> $config_target_mak
>   fi
> diff --git a/qemu-options.hx b/qemu-options.hx
> index e9d6231438..4ddf7c91a0 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -31,7 +31,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>       "-machine [type=]name[,prop[=value][,...]]\n"
>       "                selects emulated machine ('-machine help' for list)\n"
>       "                property accel=accel1[:accel2[:...]] selects accelerator\n"
> -    "                supported accelerators are kvm, xen, hax, hvf, whpx or tcg (default: tcg)\n"
> +    "                supported accelerators are kvm, xen, hax, hvf, nvmm, whpx or tcg (default: tcg)\n"
>       "                vmport=on|off|auto controls emulation of vmport (default: auto)\n"
>       "                dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
>       "                mem-merge=on|off controls memory merge support (default: on)\n"
> @@ -63,9 +63,9 @@ Supported machine properties are:
>   @table @option
>   @item accel=@var{accels1}[:@var{accels2}[:...]]
>   This is used to enable an accelerator. Depending on the target architecture,
> -kvm, xen, hax, hvf, whpx or tcg can be available. By default, tcg is used. If there is
> -more than one accelerator specified, the next one is used if the previous one
> -fails to initialize.
> +kvm, xen, hax, hvf, nvmm, whpx or tcg can be available. By default, tcg is used.
> +If there is more than one accelerator specified, the next one is used if the
> +previous one fails to initialize.
>   @item vmport=on|off|auto
>   Enables emulation of VMWare IO port, for vmmouse etc. auto says to select the
>   value based on accel. For accel=xen the default is off otherwise the default
> @@ -110,7 +110,7 @@ ETEXI
> 
>   DEF("accel", HAS_ARG, QEMU_OPTION_accel,
>       "-accel [accel=]accelerator[,prop[=value][,...]]\n"
> -    "                select accelerator (kvm, xen, hax, hvf, whpx or tcg; use 'help' for a list)\n"
> +    "                select accelerator (kvm, xen, hax, hvf, nvmm, whpx or tcg; use 'help' for a list)\n"
>       "                igd-passthru=on|off (enable Xen integrated Intel graphics passthrough, default=off)\n"
>       "                kernel-irqchip=on|off|split controls accelerated irqchip support (default=on)\n"
>       "                kvm-shadow-mem=size of KVM shadow MMU in bytes\n"
> @@ -120,9 +120,9 @@ STEXI
>   @item -accel @var{name}[,prop=@var{value}[,...]]
>   @findex -accel
>   This is used to enable an accelerator. Depending on the target architecture,
> -kvm, xen, hax, hvf, whpx or tcg can be available. By default, tcg is used. If there is
> -more than one accelerator specified, the next one is used if the previous one
> -fails to initialize.
> +kvm, xen, hax, hvf, nvmm, whpx or tcg can be available. By default, tcg is used.
> +If there is more than one accelerator specified, the next one is used if the
> +previous one fails to initialize.
>   @table @option
>   @item igd-passthru=on|off
>   When Xen is in use, this option controls whether Intel integrated graphics
> --
> 2.24.1
> 

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>




* Re: [PATCH v2 1/4] Add the NVMM vcpu API
  2020-01-28 14:09   ` [PATCH v2 1/4] Add the NVMM vcpu API Kamil Rytarowski
@ 2020-02-03 11:42     ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 79+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-02-03 11:42 UTC (permalink / raw)
  To: Kamil Rytarowski, rth, ehabkost, slp, pbonzini, peter.maydell, max
  Cc: qemu-devel

On 1/28/20 3:09 PM, Kamil Rytarowski wrote:
> From: Maxime Villard <max@m00nbsd.net>
> 
> Adds support for the NetBSD Virtual Machine Monitor (NVMM) stubs and
> introduces the nvmm.h sysemu API for vcpu scheduling and management.
> 
> Signed-off-by: Maxime Villard <max@m00nbsd.net>
> Signed-off-by: Kamil Rytarowski <n54@gmx.com>
> Reviewed-by: Sergio Lopez <slp@redhat.com>
> ---
>   accel/stubs/Makefile.objs |  1 +
>   accel/stubs/nvmm-stub.c   | 43 +++++++++++++++++++++++++++++++++++++++
>   include/sysemu/nvmm.h     | 35 +++++++++++++++++++++++++++++++
>   3 files changed, 79 insertions(+)
>   create mode 100644 accel/stubs/nvmm-stub.c
>   create mode 100644 include/sysemu/nvmm.h
> 
> diff --git a/accel/stubs/Makefile.objs b/accel/stubs/Makefile.objs
> index 3894caf95d..09f2d3e1dd 100644
> --- a/accel/stubs/Makefile.objs
> +++ b/accel/stubs/Makefile.objs
> @@ -1,5 +1,6 @@
>   obj-$(call lnot,$(CONFIG_HAX))  += hax-stub.o
>   obj-$(call lnot,$(CONFIG_HVF))  += hvf-stub.o
>   obj-$(call lnot,$(CONFIG_WHPX)) += whpx-stub.o
> +obj-$(call lnot,$(CONFIG_NVMM)) += nvmm-stub.o
>   obj-$(call lnot,$(CONFIG_KVM))  += kvm-stub.o
>   obj-$(call lnot,$(CONFIG_TCG))  += tcg-stub.o
> diff --git a/accel/stubs/nvmm-stub.c b/accel/stubs/nvmm-stub.c
> new file mode 100644
> index 0000000000..c2208b84a3
> --- /dev/null
> +++ b/accel/stubs/nvmm-stub.c
> @@ -0,0 +1,43 @@
> +/*
> + * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
> + *
> + * NetBSD Virtual Machine Monitor (NVMM) accelerator stub.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +#include "cpu.h"
> +#include "sysemu/nvmm.h"
> +
> +int nvmm_init_vcpu(CPUState *cpu)
> +{
> +    return -1;
> +}
> +
> +int nvmm_vcpu_exec(CPUState *cpu)
> +{
> +    return -1;
> +}
> +
> +void nvmm_destroy_vcpu(CPUState *cpu)
> +{
> +}
> +
> +void nvmm_cpu_synchronize_state(CPUState *cpu)
> +{
> +}
> +
> +void nvmm_cpu_synchronize_post_reset(CPUState *cpu)
> +{
> +}
> +
> +void nvmm_cpu_synchronize_post_init(CPUState *cpu)
> +{
> +}
> +
> +void nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu)
> +{
> +}
> diff --git a/include/sysemu/nvmm.h b/include/sysemu/nvmm.h
> new file mode 100644
> index 0000000000..10496f3980
> --- /dev/null
> +++ b/include/sysemu/nvmm.h
> @@ -0,0 +1,35 @@
> +/*
> + * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
> + *
> + * NetBSD Virtual Machine Monitor (NVMM) accelerator support.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_NVMM_H
> +#define QEMU_NVMM_H
> +
> +#include "config-host.h"
> +#include "qemu-common.h"
> +
> +int nvmm_init_vcpu(CPUState *);
> +int nvmm_vcpu_exec(CPUState *);
> +void nvmm_destroy_vcpu(CPUState *);
> +
> +void nvmm_cpu_synchronize_state(CPUState *);
> +void nvmm_cpu_synchronize_post_reset(CPUState *);
> +void nvmm_cpu_synchronize_post_init(CPUState *);
> +void nvmm_cpu_synchronize_pre_loadvm(CPUState *);
> +
> +#ifdef CONFIG_NVMM
> +
> +int nvmm_enabled(void);
> +
> +#else /* CONFIG_NVMM */
> +
> +#define nvmm_enabled() (0)
> +
> +#endif /* CONFIG_NVMM */
> +
> +#endif /* QEMU_NVMM_H */
> --
> 2.24.1
> 

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>




* Re: [PATCH v2 3/4] Introduce the NVMM impl
  2020-01-28 14:09   ` [PATCH v2 3/4] Introduce the NVMM impl Kamil Rytarowski
@ 2020-02-03 11:51     ` Philippe Mathieu-Daudé
  2020-02-05 17:22       ` Kamil Rytarowski
  2020-02-05 17:47       ` Maxime Villard
  0 siblings, 2 replies; 79+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-02-03 11:51 UTC (permalink / raw)
  To: Kamil Rytarowski, rth, ehabkost, slp, pbonzini, peter.maydell, max
  Cc: qemu-devel

On 1/28/20 3:09 PM, Kamil Rytarowski wrote:
> From: Maxime Villard <max@m00nbsd.net>
> 
> Implements the NetBSD Virtual Machine Monitor (NVMM) target, which
> acts as a hypervisor accelerator for QEMU on the NetBSD platform. This gives
> QEMU much greater speed than the emulated x86_64 paths that are taken on
> NetBSD today.
> 
> Signed-off-by: Maxime Villard <max@m00nbsd.net>
> Signed-off-by: Kamil Rytarowski <n54@gmx.com>
> Reviewed-by: Sergio Lopez <slp@redhat.com>
> ---
>   target/i386/Makefile.objs |    1 +
>   target/i386/nvmm-all.c    | 1222 +++++++++++++++++++++++++++++++++++++
>   2 files changed, 1223 insertions(+)
>   create mode 100644 target/i386/nvmm-all.c
> 
> diff --git a/target/i386/Makefile.objs b/target/i386/Makefile.objs
> index 48e0c28434..bdcdb32e93 100644
> --- a/target/i386/Makefile.objs
> +++ b/target/i386/Makefile.objs
> @@ -17,6 +17,7 @@ obj-$(CONFIG_HAX) += hax-all.o hax-mem.o hax-posix.o
>   endif
>   obj-$(CONFIG_HVF) += hvf/
>   obj-$(CONFIG_WHPX) += whpx-all.o
> +obj-$(CONFIG_NVMM) += nvmm-all.o
>   endif
>   obj-$(CONFIG_SEV) += sev.o
>   obj-$(call lnot,$(CONFIG_SEV)) += sev-stub.o
> diff --git a/target/i386/nvmm-all.c b/target/i386/nvmm-all.c
> new file mode 100644
> index 0000000000..66b08f4f66
> --- /dev/null
> +++ b/target/i386/nvmm-all.c
> @@ -0,0 +1,1222 @@
> +/*
> + * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
> + *
> + * NetBSD Virtual Machine Monitor (NVMM) accelerator for QEMU.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "cpu.h"
> +#include "exec/address-spaces.h"
> +#include "exec/ioport.h"
> +#include "qemu-common.h"
> +#include "strings.h"
> +#include "sysemu/accel.h"
> +#include "sysemu/nvmm.h"
> +#include "sysemu/sysemu.h"
> +#include "sysemu/cpus.h"
> +#include "qemu/main-loop.h"
> +#include "hw/boards.h"

You don't need to include "hw/boards.h" here.

> +#include "qemu/error-report.h"
> +#include "qemu/queue.h"
> +#include "qapi/error.h"
> +#include "migration/blocker.h"
> +
> +#include <nvmm.h>
> +
> +struct qemu_vcpu {
> +    struct nvmm_vcpu vcpu;
> +    uint8_t tpr;
> +    bool stop;
> +
> +    /* Window-exiting for INTs/NMIs. */
> +    bool int_window_exit;
> +    bool nmi_window_exit;
> +
> +    /* The guest is in an interrupt shadow (POP SS, etc). */
> +    bool int_shadow;
> +};
> +
> +struct qemu_machine {
> +    struct nvmm_capability cap;
> +    struct nvmm_machine mach;
> +};
> +
> +/* -------------------------------------------------------------------------- */
> +
> +static bool nvmm_allowed;
> +static struct qemu_machine qemu_mach;
> +
> +static struct qemu_vcpu *
> +get_qemu_vcpu(CPUState *cpu)
> +{
> +    return (struct qemu_vcpu *)cpu->hax_vcpu;
> +}
> +
> +static struct nvmm_machine *
> +get_nvmm_mach(void)
> +{
> +    return &qemu_mach.mach;
> +}
> +
> +/* -------------------------------------------------------------------------- */
> +
> +static void
> +nvmm_set_segment(struct nvmm_x64_state_seg *nseg, const SegmentCache *qseg)
> +{
> +    uint32_t attrib = qseg->flags;
> +
> +    nseg->selector = qseg->selector;
> +    nseg->limit = qseg->limit;
> +    nseg->base = qseg->base;
> +    nseg->attrib.type = __SHIFTOUT(attrib, DESC_TYPE_MASK);
> +    nseg->attrib.s = __SHIFTOUT(attrib, DESC_S_MASK);
> +    nseg->attrib.dpl = __SHIFTOUT(attrib, DESC_DPL_MASK);
> +    nseg->attrib.p = __SHIFTOUT(attrib, DESC_P_MASK);
> +    nseg->attrib.avl = __SHIFTOUT(attrib, DESC_AVL_MASK);
> +    nseg->attrib.l = __SHIFTOUT(attrib, DESC_L_MASK);
> +    nseg->attrib.def = __SHIFTOUT(attrib, DESC_B_MASK);
> +    nseg->attrib.g = __SHIFTOUT(attrib, DESC_G_MASK);
> +}
> +
> +static void
> +nvmm_set_registers(CPUState *cpu)
> +{
> +    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
> +    struct nvmm_machine *mach = get_nvmm_mach();
> +    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
> +    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
> +    struct nvmm_x64_state *state = vcpu->state;
> +    uint64_t bitmap;
> +    size_t i;
> +    int ret;
> +
> +    assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
> +
> +    /* GPRs. */
> +    state->gprs[NVMM_X64_GPR_RAX] = env->regs[R_EAX];
> +    state->gprs[NVMM_X64_GPR_RCX] = env->regs[R_ECX];
> +    state->gprs[NVMM_X64_GPR_RDX] = env->regs[R_EDX];
> +    state->gprs[NVMM_X64_GPR_RBX] = env->regs[R_EBX];
> +    state->gprs[NVMM_X64_GPR_RSP] = env->regs[R_ESP];
> +    state->gprs[NVMM_X64_GPR_RBP] = env->regs[R_EBP];
> +    state->gprs[NVMM_X64_GPR_RSI] = env->regs[R_ESI];
> +    state->gprs[NVMM_X64_GPR_RDI] = env->regs[R_EDI];
> +    state->gprs[NVMM_X64_GPR_R8]  = env->regs[R_R8];
> +    state->gprs[NVMM_X64_GPR_R9]  = env->regs[R_R9];
> +    state->gprs[NVMM_X64_GPR_R10] = env->regs[R_R10];
> +    state->gprs[NVMM_X64_GPR_R11] = env->regs[R_R11];
> +    state->gprs[NVMM_X64_GPR_R12] = env->regs[R_R12];
> +    state->gprs[NVMM_X64_GPR_R13] = env->regs[R_R13];
> +    state->gprs[NVMM_X64_GPR_R14] = env->regs[R_R14];
> +    state->gprs[NVMM_X64_GPR_R15] = env->regs[R_R15];
> +
> +    /* RIP and RFLAGS. */
> +    state->gprs[NVMM_X64_GPR_RIP] = env->eip;
> +    state->gprs[NVMM_X64_GPR_RFLAGS] = env->eflags;
> +
> +    /* Segments. */
> +    nvmm_set_segment(&state->segs[NVMM_X64_SEG_CS], &env->segs[R_CS]);
> +    nvmm_set_segment(&state->segs[NVMM_X64_SEG_DS], &env->segs[R_DS]);
> +    nvmm_set_segment(&state->segs[NVMM_X64_SEG_ES], &env->segs[R_ES]);
> +    nvmm_set_segment(&state->segs[NVMM_X64_SEG_FS], &env->segs[R_FS]);
> +    nvmm_set_segment(&state->segs[NVMM_X64_SEG_GS], &env->segs[R_GS]);
> +    nvmm_set_segment(&state->segs[NVMM_X64_SEG_SS], &env->segs[R_SS]);
> +
> +    /* Special segments. */
> +    nvmm_set_segment(&state->segs[NVMM_X64_SEG_GDT], &env->gdt);
> +    nvmm_set_segment(&state->segs[NVMM_X64_SEG_LDT], &env->ldt);
> +    nvmm_set_segment(&state->segs[NVMM_X64_SEG_TR], &env->tr);
> +    nvmm_set_segment(&state->segs[NVMM_X64_SEG_IDT], &env->idt);
> +
> +    /* Control registers. */
> +    state->crs[NVMM_X64_CR_CR0] = env->cr[0];
> +    state->crs[NVMM_X64_CR_CR2] = env->cr[2];
> +    state->crs[NVMM_X64_CR_CR3] = env->cr[3];
> +    state->crs[NVMM_X64_CR_CR4] = env->cr[4];
> +    state->crs[NVMM_X64_CR_CR8] = qcpu->tpr;
> +    state->crs[NVMM_X64_CR_XCR0] = env->xcr0;
> +
> +    /* Debug registers. */
> +    state->drs[NVMM_X64_DR_DR0] = env->dr[0];
> +    state->drs[NVMM_X64_DR_DR1] = env->dr[1];
> +    state->drs[NVMM_X64_DR_DR2] = env->dr[2];
> +    state->drs[NVMM_X64_DR_DR3] = env->dr[3];
> +    state->drs[NVMM_X64_DR_DR6] = env->dr[6];
> +    state->drs[NVMM_X64_DR_DR7] = env->dr[7];
> +
> +    /* FPU. */
> +    state->fpu.fx_cw = env->fpuc;
> +    state->fpu.fx_sw = (env->fpus & ~0x3800) | ((env->fpstt & 0x7) << 11);
> +    state->fpu.fx_tw = 0;
> +    for (i = 0; i < 8; i++) {
> +        state->fpu.fx_tw |= (!env->fptags[i]) << i;
> +    }
> +    state->fpu.fx_opcode = env->fpop;
> +    state->fpu.fx_ip.fa_64 = env->fpip;
> +    state->fpu.fx_dp.fa_64 = env->fpdp;
> +    state->fpu.fx_mxcsr = env->mxcsr;
> +    state->fpu.fx_mxcsr_mask = 0x0000FFFF;
> +    assert(sizeof(state->fpu.fx_87_ac) == sizeof(env->fpregs));
> +    memcpy(state->fpu.fx_87_ac, env->fpregs, sizeof(env->fpregs));
> +    for (i = 0; i < 16; i++) {
> +        memcpy(&state->fpu.fx_xmm[i].xmm_bytes[0],
> +            &env->xmm_regs[i].ZMM_Q(0), 8);
> +        memcpy(&state->fpu.fx_xmm[i].xmm_bytes[8],
> +            &env->xmm_regs[i].ZMM_Q(1), 8);
> +    }
> +
> +    /* MSRs. */
> +    state->msrs[NVMM_X64_MSR_EFER] = env->efer;
> +    state->msrs[NVMM_X64_MSR_STAR] = env->star;
> +#ifdef TARGET_X86_64
> +    state->msrs[NVMM_X64_MSR_LSTAR] = env->lstar;
> +    state->msrs[NVMM_X64_MSR_CSTAR] = env->cstar;
> +    state->msrs[NVMM_X64_MSR_SFMASK] = env->fmask;
> +    state->msrs[NVMM_X64_MSR_KERNELGSBASE] = env->kernelgsbase;
> +#endif
> +    state->msrs[NVMM_X64_MSR_SYSENTER_CS]  = env->sysenter_cs;
> +    state->msrs[NVMM_X64_MSR_SYSENTER_ESP] = env->sysenter_esp;
> +    state->msrs[NVMM_X64_MSR_SYSENTER_EIP] = env->sysenter_eip;
> +    state->msrs[NVMM_X64_MSR_PAT] = env->pat;
> +    state->msrs[NVMM_X64_MSR_TSC] = env->tsc;
> +
> +    bitmap =
> +        NVMM_X64_STATE_SEGS |
> +        NVMM_X64_STATE_GPRS |
> +        NVMM_X64_STATE_CRS  |
> +        NVMM_X64_STATE_DRS  |
> +        NVMM_X64_STATE_MSRS |
> +        NVMM_X64_STATE_FPU;
> +
> +    ret = nvmm_vcpu_setstate(mach, vcpu, bitmap);
> +    if (ret == -1) {
> +        error_report("NVMM: Failed to set virtual processor context,"
> +            " error=%d", errno);
> +    }
> +}
> +
> +static void
> +nvmm_get_segment(SegmentCache *qseg, const struct nvmm_x64_state_seg *nseg)
> +{
> +    qseg->selector = nseg->selector;
> +    qseg->limit = nseg->limit;
> +    qseg->base = nseg->base;
> +
> +    qseg->flags =
> +        __SHIFTIN((uint32_t)nseg->attrib.type, DESC_TYPE_MASK) |
> +        __SHIFTIN((uint32_t)nseg->attrib.s, DESC_S_MASK) |
> +        __SHIFTIN((uint32_t)nseg->attrib.dpl, DESC_DPL_MASK) |
> +        __SHIFTIN((uint32_t)nseg->attrib.p, DESC_P_MASK) |
> +        __SHIFTIN((uint32_t)nseg->attrib.avl, DESC_AVL_MASK) |
> +        __SHIFTIN((uint32_t)nseg->attrib.l, DESC_L_MASK) |
> +        __SHIFTIN((uint32_t)nseg->attrib.def, DESC_B_MASK) |
> +        __SHIFTIN((uint32_t)nseg->attrib.g, DESC_G_MASK);
> +}
> +
> +static void
> +nvmm_get_registers(CPUState *cpu)
> +{
> +    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
> +    struct nvmm_machine *mach = get_nvmm_mach();
> +    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
> +    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
> +    X86CPU *x86_cpu = X86_CPU(cpu);
> +    struct nvmm_x64_state *state = vcpu->state;
> +    uint64_t bitmap, tpr;
> +    size_t i;
> +    int ret;
> +
> +    assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
> +
> +    bitmap =
> +        NVMM_X64_STATE_SEGS |
> +        NVMM_X64_STATE_GPRS |
> +        NVMM_X64_STATE_CRS  |
> +        NVMM_X64_STATE_DRS  |
> +        NVMM_X64_STATE_MSRS |
> +        NVMM_X64_STATE_FPU;
> +
> +    ret = nvmm_vcpu_getstate(mach, vcpu, bitmap);
> +    if (ret == -1) {
> +        error_report("NVMM: Failed to get virtual processor context,"
> +            " error=%d", errno);
> +    }
> +
> +    /* GPRs. */
> +    env->regs[R_EAX] = state->gprs[NVMM_X64_GPR_RAX];
> +    env->regs[R_ECX] = state->gprs[NVMM_X64_GPR_RCX];
> +    env->regs[R_EDX] = state->gprs[NVMM_X64_GPR_RDX];
> +    env->regs[R_EBX] = state->gprs[NVMM_X64_GPR_RBX];
> +    env->regs[R_ESP] = state->gprs[NVMM_X64_GPR_RSP];
> +    env->regs[R_EBP] = state->gprs[NVMM_X64_GPR_RBP];
> +    env->regs[R_ESI] = state->gprs[NVMM_X64_GPR_RSI];
> +    env->regs[R_EDI] = state->gprs[NVMM_X64_GPR_RDI];
> +    env->regs[R_R8]  = state->gprs[NVMM_X64_GPR_R8];
> +    env->regs[R_R9]  = state->gprs[NVMM_X64_GPR_R9];
> +    env->regs[R_R10] = state->gprs[NVMM_X64_GPR_R10];
> +    env->regs[R_R11] = state->gprs[NVMM_X64_GPR_R11];
> +    env->regs[R_R12] = state->gprs[NVMM_X64_GPR_R12];
> +    env->regs[R_R13] = state->gprs[NVMM_X64_GPR_R13];
> +    env->regs[R_R14] = state->gprs[NVMM_X64_GPR_R14];
> +    env->regs[R_R15] = state->gprs[NVMM_X64_GPR_R15];
> +
> +    /* RIP and RFLAGS. */
> +    env->eip = state->gprs[NVMM_X64_GPR_RIP];
> +    env->eflags = state->gprs[NVMM_X64_GPR_RFLAGS];
> +
> +    /* Segments. */
> +    nvmm_get_segment(&env->segs[R_ES], &state->segs[NVMM_X64_SEG_ES]);
> +    nvmm_get_segment(&env->segs[R_CS], &state->segs[NVMM_X64_SEG_CS]);
> +    nvmm_get_segment(&env->segs[R_SS], &state->segs[NVMM_X64_SEG_SS]);
> +    nvmm_get_segment(&env->segs[R_DS], &state->segs[NVMM_X64_SEG_DS]);
> +    nvmm_get_segment(&env->segs[R_FS], &state->segs[NVMM_X64_SEG_FS]);
> +    nvmm_get_segment(&env->segs[R_GS], &state->segs[NVMM_X64_SEG_GS]);
> +
> +    /* Special segments. */
> +    nvmm_get_segment(&env->gdt, &state->segs[NVMM_X64_SEG_GDT]);
> +    nvmm_get_segment(&env->ldt, &state->segs[NVMM_X64_SEG_LDT]);
> +    nvmm_get_segment(&env->tr, &state->segs[NVMM_X64_SEG_TR]);
> +    nvmm_get_segment(&env->idt, &state->segs[NVMM_X64_SEG_IDT]);
> +
> +    /* Control registers. */
> +    env->cr[0] = state->crs[NVMM_X64_CR_CR0];
> +    env->cr[2] = state->crs[NVMM_X64_CR_CR2];
> +    env->cr[3] = state->crs[NVMM_X64_CR_CR3];
> +    env->cr[4] = state->crs[NVMM_X64_CR_CR4];
> +    tpr = state->crs[NVMM_X64_CR_CR8];
> +    if (tpr != qcpu->tpr) {
> +        qcpu->tpr = tpr;
> +        cpu_set_apic_tpr(x86_cpu->apic_state, tpr);
> +    }
> +    env->xcr0 = state->crs[NVMM_X64_CR_XCR0];
> +
> +    /* Debug registers. */
> +    env->dr[0] = state->drs[NVMM_X64_DR_DR0];
> +    env->dr[1] = state->drs[NVMM_X64_DR_DR1];
> +    env->dr[2] = state->drs[NVMM_X64_DR_DR2];
> +    env->dr[3] = state->drs[NVMM_X64_DR_DR3];
> +    env->dr[6] = state->drs[NVMM_X64_DR_DR6];
> +    env->dr[7] = state->drs[NVMM_X64_DR_DR7];
> +
> +    /* FPU. */
> +    env->fpuc = state->fpu.fx_cw;
> +    env->fpstt = (state->fpu.fx_sw >> 11) & 0x7;
> +    env->fpus = state->fpu.fx_sw & ~0x3800;
> +    for (i = 0; i < 8; i++) {
> +        env->fptags[i] = !((state->fpu.fx_tw >> i) & 1);
> +    }
> +    env->fpop = state->fpu.fx_opcode;
> +    env->fpip = state->fpu.fx_ip.fa_64;
> +    env->fpdp = state->fpu.fx_dp.fa_64;
> +    env->mxcsr = state->fpu.fx_mxcsr;
> +    assert(sizeof(state->fpu.fx_87_ac) == sizeof(env->fpregs));
> +    memcpy(env->fpregs, state->fpu.fx_87_ac, sizeof(env->fpregs));
> +    for (i = 0; i < 16; i++) {
> +        memcpy(&env->xmm_regs[i].ZMM_Q(0),
> +            &state->fpu.fx_xmm[i].xmm_bytes[0], 8);
> +        memcpy(&env->xmm_regs[i].ZMM_Q(1),
> +            &state->fpu.fx_xmm[i].xmm_bytes[8], 8);
> +    }
> +
> +    /* MSRs. */
> +    env->efer = state->msrs[NVMM_X64_MSR_EFER];
> +    env->star = state->msrs[NVMM_X64_MSR_STAR];
> +#ifdef TARGET_X86_64
> +    env->lstar = state->msrs[NVMM_X64_MSR_LSTAR];
> +    env->cstar = state->msrs[NVMM_X64_MSR_CSTAR];
> +    env->fmask = state->msrs[NVMM_X64_MSR_SFMASK];
> +    env->kernelgsbase = state->msrs[NVMM_X64_MSR_KERNELGSBASE];
> +#endif
> +    env->sysenter_cs  = state->msrs[NVMM_X64_MSR_SYSENTER_CS];
> +    env->sysenter_esp = state->msrs[NVMM_X64_MSR_SYSENTER_ESP];
> +    env->sysenter_eip = state->msrs[NVMM_X64_MSR_SYSENTER_EIP];
> +    env->pat = state->msrs[NVMM_X64_MSR_PAT];
> +    env->tsc = state->msrs[NVMM_X64_MSR_TSC];
> +
> +    x86_update_hflags(env);
> +}
> +
> +static bool
> +nvmm_can_take_int(CPUState *cpu)
> +{
> +    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
> +    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
> +    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
> +    struct nvmm_machine *mach = get_nvmm_mach();
> +
> +    if (qcpu->int_window_exit) {
> +        return false;
> +    }
> +
> +    if (qcpu->int_shadow || !(env->eflags & IF_MASK)) {
> +        struct nvmm_x64_state *state = vcpu->state;
> +
> +        /* Exit on interrupt window. */
> +        nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_INTR);
> +        state->intr.int_window_exiting = 1;
> +        nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_INTR);
> +
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
> +static bool
> +nvmm_can_take_nmi(CPUState *cpu)
> +{
> +    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
> +
> +    /*
> +     * Contrary to INTs, NMIs always schedule an exit when they are
> +     * completed. Therefore, if window-exiting is enabled, it means
> +     * NMIs are blocked.
> +     */
> +    if (qcpu->nmi_window_exit) {
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
> +/*
> + * Called before the VCPU is run. We inject events generated by the I/O
> + * thread, and synchronize the guest TPR.
> + */
> +static void
> +nvmm_vcpu_pre_run(CPUState *cpu)
> +{
> +    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
> +    struct nvmm_machine *mach = get_nvmm_mach();
> +    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
> +    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
> +    X86CPU *x86_cpu = X86_CPU(cpu);
> +    struct nvmm_x64_state *state = vcpu->state;
> +    struct nvmm_vcpu_event *event = vcpu->event;
> +    bool has_event = false;
> +    bool sync_tpr = false;
> +    uint8_t tpr;
> +    int ret;
> +
> +    qemu_mutex_lock_iothread();
> +
> +    tpr = cpu_get_apic_tpr(x86_cpu->apic_state);
> +    if (tpr != qcpu->tpr) {
> +        qcpu->tpr = tpr;
> +        sync_tpr = true;
> +    }
> +
> +    /*
> +     * Force the VCPU out of its inner loop to process any INIT requests
> +     * or commit pending TPR access.
> +     */
> +    if (cpu->interrupt_request & (CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
> +        cpu->exit_request = 1;
> +    }
> +
> +    if (!has_event && (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
> +        if (nvmm_can_take_nmi(cpu)) {
> +            cpu->interrupt_request &= ~CPU_INTERRUPT_NMI;
> +            event->type = NVMM_VCPU_EVENT_INTR;
> +            event->vector = 2;
> +            has_event = true;
> +        }
> +    }
> +
> +    if (!has_event && (cpu->interrupt_request & CPU_INTERRUPT_HARD)) {
> +        if (nvmm_can_take_int(cpu)) {
> +            cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
> +            event->type = NVMM_VCPU_EVENT_INTR;
> +            event->vector = cpu_get_pic_interrupt(env);
> +            has_event = true;
> +        }
> +    }
> +
> +    /* Don't want SMIs. */
> +    if (cpu->interrupt_request & CPU_INTERRUPT_SMI) {
> +        cpu->interrupt_request &= ~CPU_INTERRUPT_SMI;
> +    }
> +
> +    if (sync_tpr) {
> +        ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_CRS);
> +        if (ret == -1) {
> +            error_report("NVMM: Failed to get CPU state,"
> +                " error=%d", errno);
> +        }
> +
> +        state->crs[NVMM_X64_CR_CR8] = qcpu->tpr;
> +
> +        ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_CRS);
> +        if (ret == -1) {
> +            error_report("NVMM: Failed to set CPU state,"
> +                " error=%d", errno);
> +        }
> +    }
> +
> +    if (has_event) {
> +        ret = nvmm_vcpu_inject(mach, vcpu);
> +        if (ret == -1) {
> +            error_report("NVMM: Failed to inject event,"
> +                " error=%d", errno);
> +        }
> +    }
> +
> +    qemu_mutex_unlock_iothread();
> +}
> +
> +/*
> + * Called after the VCPU ran. We synchronize the host view of the TPR and
> + * RFLAGS.
> + */
> +static void
> +nvmm_vcpu_post_run(CPUState *cpu, struct nvmm_vcpu_exit *exit)
> +{
> +    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
> +    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
> +    X86CPU *x86_cpu = X86_CPU(cpu);
> +    uint64_t tpr;
> +
> +    env->eflags = exit->exitstate.rflags;
> +    qcpu->int_shadow = exit->exitstate.int_shadow;
> +    qcpu->int_window_exit = exit->exitstate.int_window_exiting;
> +    qcpu->nmi_window_exit = exit->exitstate.nmi_window_exiting;
> +
> +    tpr = exit->exitstate.cr8;
> +    if (qcpu->tpr != tpr) {
> +        qcpu->tpr = tpr;
> +        qemu_mutex_lock_iothread();
> +        cpu_set_apic_tpr(x86_cpu->apic_state, qcpu->tpr);
> +        qemu_mutex_unlock_iothread();
> +    }
> +}
> +
> +/* -------------------------------------------------------------------------- */
> +
> +static void
> +nvmm_io_callback(struct nvmm_io *io)
> +{
> +    MemTxAttrs attrs = { 0 };
> +    int ret;
> +
> +    ret = address_space_rw(&address_space_io, io->port, attrs, io->data,
> +        io->size, !io->in);
> +    if (ret != MEMTX_OK) {
> +        error_report("NVMM: I/O Transaction Failed "
> +            "[%s, port=%u, size=%zu]", (io->in ? "in" : "out"),
> +            io->port, io->size);
> +    }
> +
> +    /* XXX Needed, otherwise infinite loop. */

This seems OK; why the XXX in the comment?

> +    current_cpu->vcpu_dirty = false;
> +}
> +
> +static void
> +nvmm_mem_callback(struct nvmm_mem *mem)
> +{
> +    cpu_physical_memory_rw(mem->gpa, mem->data, mem->size, mem->write);
> +
> +    /* XXX Needed, otherwise infinite loop. */
> +    current_cpu->vcpu_dirty = false;
> +}
> +
> +static struct nvmm_assist_callbacks nvmm_callbacks = {
> +    .io = nvmm_io_callback,
> +    .mem = nvmm_mem_callback
> +};
> +
> +/* -------------------------------------------------------------------------- */
> +
> +static int
> +nvmm_handle_mem(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
> +{
> +    int ret;
> +
> +    ret = nvmm_assist_mem(mach, vcpu);
> +    if (ret == -1) {
> +        error_report("NVMM: Mem Assist Failed [gpa=%p]",
> +            (void *)vcpu->exit->u.mem.gpa);
> +    }
> +
> +    return ret;
> +}
> +
> +static int
> +nvmm_handle_io(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
> +{
> +    int ret;
> +
> +    ret = nvmm_assist_io(mach, vcpu);
> +    if (ret == -1) {
> +        error_report("NVMM: I/O Assist Failed [port=%d]",
> +            (int)vcpu->exit->u.io.port);
> +    }
> +
> +    return ret;
> +}
> +
> +static int
> +nvmm_handle_rdmsr(struct nvmm_machine *mach, CPUState *cpu,
> +    struct nvmm_vcpu_exit *exit)
> +{
> +    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
> +    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
> +    X86CPU *x86_cpu = X86_CPU(cpu);
> +    struct nvmm_x64_state *state = vcpu->state;
> +    uint64_t val;
> +    int ret;
> +
> +    switch (exit->u.rdmsr.msr) {
> +    case MSR_IA32_APICBASE:
> +        val = cpu_get_apic_base(x86_cpu->apic_state);
> +        break;
> +    case MSR_MTRRcap:
> +    case MSR_MTRRdefType:
> +    case MSR_MCG_CAP:
> +    case MSR_MCG_STATUS:
> +        val = 0;
> +        break;
> +    default: /* More MSRs to add? */
> +        val = 0;
> +        error_report("NVMM: Unexpected RDMSR 0x%x, ignored",
> +            exit->u.rdmsr.msr);
> +        break;
> +    }
> +
> +    ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_GPRS);
> +    if (ret == -1) {
> +        return -1;
> +    }
> +
> +    state->gprs[NVMM_X64_GPR_RAX] = (val & 0xFFFFFFFF);
> +    state->gprs[NVMM_X64_GPR_RDX] = (val >> 32);
> +    state->gprs[NVMM_X64_GPR_RIP] = exit->u.rdmsr.npc;
> +
> +    ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_GPRS);
> +    if (ret == -1) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int
> +nvmm_handle_wrmsr(struct nvmm_machine *mach, CPUState *cpu,
> +    struct nvmm_vcpu_exit *exit)
> +{
> +    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
> +    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
> +    X86CPU *x86_cpu = X86_CPU(cpu);
> +    struct nvmm_x64_state *state = vcpu->state;
> +    uint64_t val;
> +    int ret;
> +
> +    val = exit->u.wrmsr.val;
> +
> +    switch (exit->u.wrmsr.msr) {
> +    case MSR_IA32_APICBASE:
> +        cpu_set_apic_base(x86_cpu->apic_state, val);
> +        break;
> +    case MSR_MTRRdefType:
> +    case MSR_MCG_STATUS:
> +        break;
> +    default: /* More MSRs to add? */
> +        error_report("NVMM: Unexpected WRMSR 0x%x [val=0x%lx], ignored",
> +            exit->u.wrmsr.msr, val);
> +        break;
> +    }
> +
> +    ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_GPRS);
> +    if (ret == -1) {
> +        return -1;
> +    }
> +
> +    state->gprs[NVMM_X64_GPR_RIP] = exit->u.wrmsr.npc;
> +
> +    ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_GPRS);
> +    if (ret == -1) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int
> +nvmm_handle_halted(struct nvmm_machine *mach, CPUState *cpu,
> +    struct nvmm_vcpu_exit *exit)
> +{
> +    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
> +    int ret = 0;
> +
> +    qemu_mutex_lock_iothread();
> +
> +    if (!((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
> +          (env->eflags & IF_MASK)) &&
> +        !(cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
> +        cpu->exception_index = EXCP_HLT;
> +        cpu->halted = true;
> +        ret = 1;
> +    }
> +
> +    qemu_mutex_unlock_iothread();
> +
> +    return ret;
> +}
> +
> +static int
> +nvmm_inject_ud(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
> +{
> +    struct nvmm_vcpu_event *event = vcpu->event;
> +
> +    event->type = NVMM_VCPU_EVENT_EXCP;
> +    event->vector = 6;
> +    event->u.excp.error = 0;
> +
> +    return nvmm_vcpu_inject(mach, vcpu);
> +}
> +
> +static int
> +nvmm_vcpu_loop(CPUState *cpu)
> +{
> +    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
> +    struct nvmm_machine *mach = get_nvmm_mach();
> +    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
> +    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
> +    X86CPU *x86_cpu = X86_CPU(cpu);
> +    struct nvmm_vcpu_exit *exit = vcpu->exit;
> +    int ret;
> +
> +    /*
> +     * Some asynchronous events must be handled outside of the inner
> +     * VCPU loop. They are handled here.
> +     */
> +    if (cpu->interrupt_request & CPU_INTERRUPT_INIT) {
> +        nvmm_cpu_synchronize_state(cpu);
> +        do_cpu_init(x86_cpu);
> +        /* XXX: reset the INT/NMI windows */

What is the problem?

> +    }
> +    if (cpu->interrupt_request & CPU_INTERRUPT_POLL) {
> +        cpu->interrupt_request &= ~CPU_INTERRUPT_POLL;
> +        apic_poll_irq(x86_cpu->apic_state);
> +    }
> +    if (((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
> +         (env->eflags & IF_MASK)) ||
> +        (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
> +        cpu->halted = false;
> +    }
> +    if (cpu->interrupt_request & CPU_INTERRUPT_SIPI) {
> +        nvmm_cpu_synchronize_state(cpu);
> +        do_cpu_sipi(x86_cpu);
> +    }
> +    if (cpu->interrupt_request & CPU_INTERRUPT_TPR) {
> +        cpu->interrupt_request &= ~CPU_INTERRUPT_TPR;
> +        nvmm_cpu_synchronize_state(cpu);
> +        apic_handle_tpr_access_report(x86_cpu->apic_state, env->eip,
> +            env->tpr_access_type);
> +    }
> +
> +    if (cpu->halted) {
> +        cpu->exception_index = EXCP_HLT;
> +        atomic_set(&cpu->exit_request, false);
> +        return 0;
> +    }
> +
> +    qemu_mutex_unlock_iothread();
> +    cpu_exec_start(cpu);
> +
> +    /*
> +     * Inner VCPU loop.
> +     */
> +    do {
> +        if (cpu->vcpu_dirty) {
> +            nvmm_set_registers(cpu);
> +            cpu->vcpu_dirty = false;
> +        }
> +
> +        if (qcpu->stop) {
> +            cpu->exception_index = EXCP_INTERRUPT;
> +            qcpu->stop = false;
> +            ret = 1;
> +            break;
> +        }
> +
> +        nvmm_vcpu_pre_run(cpu);
> +
> +        if (atomic_read(&cpu->exit_request)) {
> +            qemu_cpu_kick_self();
> +        }
> +
> +        ret = nvmm_vcpu_run(mach, vcpu);
> +        if (ret == -1) {
> +            error_report("NVMM: Failed to exec a virtual processor,"
> +                " error=%d", errno);
> +            break;
> +        }
> +
> +        nvmm_vcpu_post_run(cpu, exit);
> +
> +        switch (exit->reason) {
> +        case NVMM_VCPU_EXIT_NONE:
> +            break;
> +        case NVMM_VCPU_EXIT_MEMORY:
> +            ret = nvmm_handle_mem(mach, vcpu);
> +            break;
> +        case NVMM_VCPU_EXIT_IO:
> +            ret = nvmm_handle_io(mach, vcpu);
> +            break;
> +        case NVMM_VCPU_EXIT_INT_READY:
> +        case NVMM_VCPU_EXIT_NMI_READY:
> +        case NVMM_VCPU_EXIT_TPR_CHANGED:
> +            break;
> +        case NVMM_VCPU_EXIT_HALTED:
> +            ret = nvmm_handle_halted(mach, cpu, exit);
> +            break;
> +        case NVMM_VCPU_EXIT_SHUTDOWN:
> +            qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
> +            cpu->exception_index = EXCP_INTERRUPT;
> +            ret = 1;
> +            break;
> +        case NVMM_VCPU_EXIT_RDMSR:
> +            ret = nvmm_handle_rdmsr(mach, cpu, exit);
> +            break;
> +        case NVMM_VCPU_EXIT_WRMSR:
> +            ret = nvmm_handle_wrmsr(mach, cpu, exit);
> +            break;
> +        case NVMM_VCPU_EXIT_MONITOR:
> +        case NVMM_VCPU_EXIT_MWAIT:
> +            ret = nvmm_inject_ud(mach, vcpu);
> +            break;
> +        default:
> +            error_report("NVMM: Unexpected VM exit code 0x%lx [hw=0x%lx]",
> +                exit->reason, exit->u.inv.hwcode);
> +            nvmm_get_registers(cpu);
> +            qemu_mutex_lock_iothread();
> +            qemu_system_guest_panicked(cpu_get_crash_info(cpu));
> +            qemu_mutex_unlock_iothread();
> +            ret = -1;
> +            break;
> +        }
> +    } while (ret == 0);
> +
> +    cpu_exec_end(cpu);
> +    qemu_mutex_lock_iothread();
> +    current_cpu = cpu;
> +
> +    atomic_set(&cpu->exit_request, false);
> +
> +    return ret < 0;
> +}
> +
> +/* -------------------------------------------------------------------------- */
> +
> +static void
> +do_nvmm_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
> +{
> +    nvmm_get_registers(cpu);
> +    cpu->vcpu_dirty = true;
> +}
> +
> +static void
> +do_nvmm_cpu_synchronize_post_reset(CPUState *cpu, run_on_cpu_data arg)
> +{
> +    nvmm_set_registers(cpu);
> +    cpu->vcpu_dirty = false;
> +}
> +
> +static void
> +do_nvmm_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg)
> +{
> +    nvmm_set_registers(cpu);
> +    cpu->vcpu_dirty = false;
> +}
> +
> +static void
> +do_nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu, run_on_cpu_data arg)
> +{
> +    cpu->vcpu_dirty = true;
> +}
> +
> +void nvmm_cpu_synchronize_state(CPUState *cpu)
> +{
> +    if (!cpu->vcpu_dirty) {
> +        run_on_cpu(cpu, do_nvmm_cpu_synchronize_state, RUN_ON_CPU_NULL);
> +    }
> +}
> +
> +void nvmm_cpu_synchronize_post_reset(CPUState *cpu)
> +{
> +    run_on_cpu(cpu, do_nvmm_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
> +}
> +
> +void nvmm_cpu_synchronize_post_init(CPUState *cpu)
> +{
> +    run_on_cpu(cpu, do_nvmm_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
> +}
> +
> +void nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu)
> +{
> +    run_on_cpu(cpu, do_nvmm_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
> +}
> +
> +/* -------------------------------------------------------------------------- */
> +
> +static Error *nvmm_migration_blocker;
> +
> +static void
> +nvmm_ipi_signal(int sigcpu)
> +{
> +    struct qemu_vcpu *qcpu;
> +
> +    if (current_cpu) {
> +        qcpu = get_qemu_vcpu(current_cpu);
> +        qcpu->stop = true;
> +    }
> +}
> +
> +static void
> +nvmm_init_cpu_signals(void)
> +{
> +    struct sigaction sigact;
> +    sigset_t set;
> +
> +    /* Install the IPI handler. */
> +    memset(&sigact, 0, sizeof(sigact));
> +    sigact.sa_handler = nvmm_ipi_signal;
> +    sigaction(SIG_IPI, &sigact, NULL);
> +
> +    /* Allow IPIs on the current thread. */
> +    sigprocmask(SIG_BLOCK, NULL, &set);
> +    sigdelset(&set, SIG_IPI);
> +    pthread_sigmask(SIG_SETMASK, &set, NULL);
> +}
> +
> +int
> +nvmm_init_vcpu(CPUState *cpu)
> +{
> +    struct nvmm_machine *mach = get_nvmm_mach();
> +    struct nvmm_vcpu_conf_cpuid cpuid;
> +    struct nvmm_vcpu_conf_tpr tpr;
> +    Error *local_error = NULL;
> +    struct qemu_vcpu *qcpu;
> +    int ret, err;
> +
> +    nvmm_init_cpu_signals();
> +
> +    if (nvmm_migration_blocker == NULL) {
> +        error_setg(&nvmm_migration_blocker,
> +            "NVMM: Migration not supported");
> +
> +        (void)migrate_add_blocker(nvmm_migration_blocker, &local_error);
> +        if (local_error) {
> +            error_report_err(local_error);
> +            migrate_del_blocker(nvmm_migration_blocker);
> +            error_free(nvmm_migration_blocker);
> +            return -EINVAL;
> +        }
> +    }
> +
> +    qcpu = g_malloc0(sizeof(*qcpu));
> +    if (qcpu == NULL) {
> +        error_report("NVMM: Failed to allocate VCPU context.");
> +        return -ENOMEM;
> +    }
> +
> +    ret = nvmm_vcpu_create(mach, cpu->cpu_index, &qcpu->vcpu);
> +    if (ret == -1) {
> +        err = errno;
> +        error_report("NVMM: Failed to create a virtual processor,"
> +            " error=%d", err);
> +        g_free(qcpu);
> +        return -err;
> +    }
> +
> +    memset(&cpuid, 0, sizeof(cpuid));
> +    cpuid.mask = 1;
> +    cpuid.leaf = 0x00000001;
> +    cpuid.u.mask.set.edx = CPUID_MCE | CPUID_MCA | CPUID_MTRR;
> +    ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_CPUID,
> +        &cpuid);
> +    if (ret == -1) {
> +        err = errno;
> +        error_report("NVMM: Failed to configure a virtual processor,"
> +            " error=%d", err);
> +        g_free(qcpu);
> +        return -err;
> +    }
> +
> +    ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_CALLBACKS,
> +        &nvmm_callbacks);
> +    if (ret == -1) {
> +        err = errno;
> +        error_report("NVMM: Failed to configure a virtual processor,"
> +            " error=%d", err);
> +        g_free(qcpu);
> +        return -err;
> +    }
> +
> +    if (qemu_mach.cap.arch.vcpu_conf_support & NVMM_CAP_ARCH_VCPU_CONF_TPR) {
> +        memset(&tpr, 0, sizeof(tpr));
> +        tpr.exit_changed = 1;
> +        ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_TPR, &tpr);
> +        if (ret == -1) {
> +            err = errno;
> +            error_report("NVMM: Failed to configure a virtual processor,"
> +                " error=%d", err);
> +            g_free(qcpu);
> +            return -err;
> +        }
> +    }
> +
> +    cpu->vcpu_dirty = true;
> +    cpu->hax_vcpu = (struct hax_vcpu_state *)qcpu;
> +
> +    return 0;
> +}
> +
> +int
> +nvmm_vcpu_exec(CPUState *cpu)
> +{
> +    int ret, fatal;
> +
> +    while (1) {
> +        if (cpu->exception_index >= EXCP_INTERRUPT) {
> +            ret = cpu->exception_index;
> +            cpu->exception_index = -1;
> +            break;
> +        }
> +
> +        fatal = nvmm_vcpu_loop(cpu);
> +
> +        if (fatal) {
> +            error_report("NVMM: Failed to execute a VCPU.");
> +            abort();
> +        }
> +    }
> +
> +    return ret;
> +}
> +
> +void
> +nvmm_destroy_vcpu(CPUState *cpu)
> +{
> +    struct nvmm_machine *mach = get_nvmm_mach();
> +    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
> +
> +    nvmm_vcpu_destroy(mach, &qcpu->vcpu);
> +    g_free(cpu->hax_vcpu);
> +}
> +
> +/* -------------------------------------------------------------------------- */
> +
> +static void
> +nvmm_update_mapping(hwaddr start_pa, ram_addr_t size, uintptr_t hva,
> +    bool add, bool rom, const char *name)
> +{
> +    struct nvmm_machine *mach = get_nvmm_mach();
> +    int ret, prot;
> +
> +    if (add) {
> +        prot = PROT_READ | PROT_EXEC;
> +        if (!rom) {
> +            prot |= PROT_WRITE;
> +        }
> +        ret = nvmm_gpa_map(mach, hva, start_pa, size, prot);
> +    } else {
> +        ret = nvmm_gpa_unmap(mach, hva, start_pa, size);
> +    }
> +
> +    if (ret == -1) {
> +        error_report("NVMM: Failed to %s GPA range '%s' PA:%p, "
> +            "Size:%p bytes, HostVA:%p, error=%d",
> +            (add ? "map" : "unmap"), name, (void *)(uintptr_t)start_pa,
> +            (void *)size, (void *)hva, errno);
> +    }
> +}
> +
> +static void
> +nvmm_process_section(MemoryRegionSection *section, int add)
> +{
> +    MemoryRegion *mr = section->mr;
> +    hwaddr start_pa = section->offset_within_address_space;
> +    ram_addr_t size = int128_get64(section->size);
> +    unsigned int delta;
> +    uintptr_t hva;
> +
> +    if (!memory_region_is_ram(mr)) {
> +        return;
> +    }
> +
> +    /* Adjust start_pa and size so that they are page-aligned. */
> +    delta = qemu_real_host_page_size - (start_pa & ~qemu_real_host_page_mask);
> +    delta &= ~qemu_real_host_page_mask;
> +    if (delta > size) {
> +        return;
> +    }
> +    start_pa += delta;
> +    size -= delta;
> +    size &= qemu_real_host_page_mask;
> +    if (!size || (start_pa & ~qemu_real_host_page_mask)) {
> +        return;
> +    }
> +
> +    hva = (uintptr_t)memory_region_get_ram_ptr(mr) +
> +        section->offset_within_region + delta;
> +
> +    nvmm_update_mapping(start_pa, size, hva, add,
> +        memory_region_is_rom(mr), mr->name);
> +}
> +
> +static void
> +nvmm_region_add(MemoryListener *listener, MemoryRegionSection *section)
> +{
> +    memory_region_ref(section->mr);
> +    nvmm_process_section(section, 1);
> +}
> +
> +static void
> +nvmm_region_del(MemoryListener *listener, MemoryRegionSection *section)
> +{
> +    nvmm_process_section(section, 0);
> +    memory_region_unref(section->mr);
> +}
> +
> +static void
> +nvmm_transaction_begin(MemoryListener *listener)
> +{
> +    /* nothing */
> +}
> +
> +static void
> +nvmm_transaction_commit(MemoryListener *listener)
> +{
> +    /* nothing */
> +}
> +
> +static void
> +nvmm_log_sync(MemoryListener *listener, MemoryRegionSection *section)
> +{
> +    MemoryRegion *mr = section->mr;
> +
> +    if (!memory_region_is_ram(mr)) {
> +        return;
> +    }
> +
> +    memory_region_set_dirty(mr, 0, int128_get64(section->size));
> +}
> +
> +static MemoryListener nvmm_memory_listener = {
> +    .begin = nvmm_transaction_begin,
> +    .commit = nvmm_transaction_commit,
> +    .region_add = nvmm_region_add,
> +    .region_del = nvmm_region_del,
> +    .log_sync = nvmm_log_sync,
> +    .priority = 10,
> +};
> +
> +static void
> +nvmm_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
> +{
> +    struct nvmm_machine *mach = get_nvmm_mach();
> +    uintptr_t hva = (uintptr_t)host;
> +    int ret;
> +
> +    ret = nvmm_hva_map(mach, hva, size);
> +
> +    if (ret == -1) {
> +        error_report("NVMM: Failed to map HVA, HostVA:%p "
> +            "Size:%p bytes, error=%d",
> +            (void *)hva, (void *)size, errno);
> +    }
> +}
> +
> +static struct RAMBlockNotifier nvmm_ram_notifier = {
> +    .ram_block_added = nvmm_ram_block_added
> +};
> +
> +/* -------------------------------------------------------------------------- */
> +
> +static void
> +nvmm_handle_interrupt(CPUState *cpu, int mask)
> +{
> +    cpu->interrupt_request |= mask;
> +
> +    if (!qemu_cpu_is_self(cpu)) {
> +        qemu_cpu_kick(cpu);
> +    }
> +}
> +
> +/* -------------------------------------------------------------------------- */
> +
> +static int
> +nvmm_accel_init(MachineState *ms)
> +{
> +    int ret, err;
> +
> +    ret = nvmm_init();
> +    if (ret == -1) {
> +        err = errno;
> +        error_report("NVMM: Initialization failed, error=%d", errno);
> +        return -err;
> +    }
> +
> +    ret = nvmm_capability(&qemu_mach.cap);
> +    if (ret == -1) {
> +        err = errno;
> +        error_report("NVMM: Unable to fetch capability, error=%d", errno);
> +        return -err;
> +    }
> +    if (qemu_mach.cap.version != 1) {
> +        error_report("NVMM: Unsupported version %u", qemu_mach.cap.version);
> +        return -EPROGMISMATCH;
> +    }
> +    if (qemu_mach.cap.state_size != sizeof(struct nvmm_x64_state)) {
> +        error_report("NVMM: Wrong state size %u", qemu_mach.cap.state_size);
> +        return -EPROGMISMATCH;
> +    }
> +
> +    ret = nvmm_machine_create(&qemu_mach.mach);
> +    if (ret == -1) {
> +        err = errno;
> +        error_report("NVMM: Machine creation failed, error=%d", errno);
> +        return -err;
> +    }
> +
> +    memory_listener_register(&nvmm_memory_listener, &address_space_memory);
> +    ram_block_notifier_add(&nvmm_ram_notifier);
> +
> +    cpu_interrupt_handler = nvmm_handle_interrupt;
> +
> +    printf("NetBSD Virtual Machine Monitor accelerator is operational\n");
> +    return 0;
> +}
> +
> +int
> +nvmm_enabled(void)
> +{
> +    return nvmm_allowed;
> +}
> +
> +static void
> +nvmm_accel_class_init(ObjectClass *oc, void *data)
> +{
> +    AccelClass *ac = ACCEL_CLASS(oc);
> +    ac->name = "NVMM";
> +    ac->init_machine = nvmm_accel_init;
> +    ac->allowed = &nvmm_allowed;
> +}
> +
> +static const TypeInfo nvmm_accel_type = {
> +    .name = ACCEL_CLASS_NAME("nvmm"),
> +    .parent = TYPE_ACCEL,
> +    .class_init = nvmm_accel_class_init,
> +};
> +
> +static void
> +nvmm_type_init(void)
> +{
> +    type_register_static(&nvmm_accel_type);
> +}
> +
> +type_init(nvmm_type_init);
> --
> 2.24.1
> 

Except for the XXX comments, LGTM, but I'm not an x86 guy.
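
For anyone else reading along, the page-alignment arithmetic in the quoted
nvmm_process_section() can be checked in isolation. The following is only a
sketch under assumptions: shrink_to_pages() and its parameters are
hypothetical names, not part of the patch, and the page size is passed in
explicitly instead of coming from qemu_real_host_page_size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): shrink [*start, *start + *size)
 * to the largest page-aligned range it contains, following the same steps
 * as the quoted nvmm_process_section(). page_size must be a power of two.
 * Returns false, leaving the outputs untouched, if no whole page fits.
 */
static bool
shrink_to_pages(uint64_t page_size, uint64_t *start, uint64_t *size,
    uint64_t *delta_out)
{
    uint64_t offset_mask = page_size - 1;   /* low bits of an address */
    uint64_t delta, new_size;

    assert(page_size != 0 && (page_size & offset_mask) == 0);

    /* Bytes to skip to reach the next page boundary (0 if aligned). */
    delta = (page_size - (*start & offset_mask)) & offset_mask;
    if (delta > *size) {
        return false;
    }

    /* Round the remaining length down to a whole number of pages. */
    new_size = (*size - delta) & ~offset_mask;
    if (new_size == 0) {
        return false;
    }

    *start += delta;
    *size = new_size;
    *delta_out = delta;     /* the host address moves by the same amount */
    return true;
}
```

With 4 KiB pages, a section at 0x1100 of length 0x3000 would be mapped as
0x2000..0x3fff, and the host address is advanced by the same 0xf00 bytes, as
the patch does when computing hva.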




* Re: [PATCH v2 4/4] Add the NVMM acceleration enlightenments
  2020-01-28 14:09   ` [PATCH v2 4/4] Add the NVMM acceleration enlightenments Kamil Rytarowski
@ 2020-02-03 11:54     ` Philippe Mathieu-Daudé
  2020-02-06 10:24       ` Kamil Rytarowski
  0 siblings, 1 reply; 79+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-02-03 11:54 UTC (permalink / raw)
  To: Kamil Rytarowski, rth, ehabkost, slp, pbonzini, peter.maydell, max
  Cc: qemu-devel

On 1/28/20 3:09 PM, Kamil Rytarowski wrote:
> From: Maxime Villard <max@m00nbsd.net>
> 
> Implements the NVMM accelerator cpu enlightenments to actually use the nvmm-all
> accelerator on NetBSD platforms.
> 
> Signed-off-by: Maxime Villard <max@m00nbsd.net>
> Signed-off-by: Kamil Rytarowski <n54@gmx.com>
> Reviewed-by: Sergio Lopez <slp@redhat.com>
> ---
>   cpus.c                    | 58 +++++++++++++++++++++++++++++++++++++++
>   include/sysemu/hw_accel.h | 14 ++++++++++
>   target/i386/helper.c      |  2 +-
>   3 files changed, 73 insertions(+), 1 deletion(-)
> 
> diff --git a/cpus.c b/cpus.c
> index b472378b70..3c3f63588c 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -42,6 +42,7 @@
>   #include "sysemu/hax.h"
>   #include "sysemu/hvf.h"
>   #include "sysemu/whpx.h"
> +#include "sysemu/nvmm.h"
>   #include "exec/exec-all.h"
> 
>   #include "qemu/thread.h"
> @@ -1666,6 +1667,48 @@ static void *qemu_whpx_cpu_thread_fn(void *arg)
>       return NULL;
>   }
> 
> +static void *qemu_nvmm_cpu_thread_fn(void *arg)
> +{
> +    CPUState *cpu = arg;
> +    int r;
> +
> +    assert(nvmm_enabled());
> +
> +    rcu_register_thread();
> +
> +    qemu_mutex_lock_iothread();
> +    qemu_thread_get_self(cpu->thread);
> +    cpu->thread_id = qemu_get_thread_id();
> +    current_cpu = cpu;
> +
> +    r = nvmm_init_vcpu(cpu);
> +    if (r < 0) {
> +        fprintf(stderr, "nvmm_init_vcpu failed: %s\n", strerror(-r));
> +        exit(1);
> +    }
> +
> +    /* signal CPU creation */
> +    cpu->created = true;
> +    qemu_cond_signal(&qemu_cpu_cond);
> +
> +    do {
> +        if (cpu_can_run(cpu)) {
> +            r = nvmm_vcpu_exec(cpu);
> +            if (r == EXCP_DEBUG) {
> +                cpu_handle_guest_debug(cpu);
> +            }
> +        }
> +        qemu_wait_io_event(cpu);
> +    } while (!cpu->unplug || cpu_can_run(cpu));
> +
> +    nvmm_destroy_vcpu(cpu);
> +    cpu->created = false;
> +    qemu_cond_signal(&qemu_cpu_cond);
> +    qemu_mutex_unlock_iothread();
> +    rcu_unregister_thread();
> +    return NULL;
> +}
> +
>   #ifdef _WIN32
>   static void CALLBACK dummy_apc_func(ULONG_PTR unused)
>   {
> @@ -2029,6 +2072,19 @@ static void qemu_whpx_start_vcpu(CPUState *cpu)
>   #endif
>   }
> 
> +static void qemu_nvmm_start_vcpu(CPUState *cpu)
> +{
> +    char thread_name[VCPU_THREAD_NAME_SIZE];
> +
> +    cpu->thread = g_malloc0(sizeof(QemuThread));
> +    cpu->halt_cond = g_malloc0(sizeof(QemuCond));

Nitpick, we prefer g_new0().

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

> +    qemu_cond_init(cpu->halt_cond);
> +    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/NVMM",
> +             cpu->cpu_index);
> +    qemu_thread_create(cpu->thread, thread_name, qemu_nvmm_cpu_thread_fn,
> +                       cpu, QEMU_THREAD_JOINABLE);
> +}
> +
>   static void qemu_dummy_start_vcpu(CPUState *cpu)
>   {
>       char thread_name[VCPU_THREAD_NAME_SIZE];
> @@ -2069,6 +2125,8 @@ void qemu_init_vcpu(CPUState *cpu)
>           qemu_tcg_init_vcpu(cpu);
>       } else if (whpx_enabled()) {
>           qemu_whpx_start_vcpu(cpu);
> +    } else if (nvmm_enabled()) {
> +        qemu_nvmm_start_vcpu(cpu);
>       } else {
>           qemu_dummy_start_vcpu(cpu);
>       }
> diff --git a/include/sysemu/hw_accel.h b/include/sysemu/hw_accel.h
> index 0ec2372477..dbfa7a02f9 100644
> --- a/include/sysemu/hw_accel.h
> +++ b/include/sysemu/hw_accel.h
> @@ -15,6 +15,7 @@
>   #include "sysemu/hax.h"
>   #include "sysemu/kvm.h"
>   #include "sysemu/whpx.h"
> +#include "sysemu/nvmm.h"
> 
>   static inline void cpu_synchronize_state(CPUState *cpu)
>   {
> @@ -27,6 +28,9 @@ static inline void cpu_synchronize_state(CPUState *cpu)
>       if (whpx_enabled()) {
>           whpx_cpu_synchronize_state(cpu);
>       }
> +    if (nvmm_enabled()) {
> +        nvmm_cpu_synchronize_state(cpu);
> +    }
>   }
> 
>   static inline void cpu_synchronize_post_reset(CPUState *cpu)
> @@ -40,6 +44,10 @@ static inline void cpu_synchronize_post_reset(CPUState *cpu)
>       if (whpx_enabled()) {
>           whpx_cpu_synchronize_post_reset(cpu);
>       }
> +    if (nvmm_enabled()) {
> +        nvmm_cpu_synchronize_post_reset(cpu);
> +    }
> +
>   }
> 
>   static inline void cpu_synchronize_post_init(CPUState *cpu)
> @@ -53,6 +61,9 @@ static inline void cpu_synchronize_post_init(CPUState *cpu)
>       if (whpx_enabled()) {
>           whpx_cpu_synchronize_post_init(cpu);
>       }
> +    if (nvmm_enabled()) {
> +        nvmm_cpu_synchronize_post_init(cpu);
> +    }
>   }
> 
>   static inline void cpu_synchronize_pre_loadvm(CPUState *cpu)
> @@ -66,6 +77,9 @@ static inline void cpu_synchronize_pre_loadvm(CPUState *cpu)
>       if (whpx_enabled()) {
>           whpx_cpu_synchronize_pre_loadvm(cpu);
>       }
> +    if (nvmm_enabled()) {
> +        nvmm_cpu_synchronize_pre_loadvm(cpu);
> +    }
>   }
> 
>   #endif /* QEMU_HW_ACCEL_H */
> diff --git a/target/i386/helper.c b/target/i386/helper.c
> index c3a6e4fabe..2e79d61329 100644
> --- a/target/i386/helper.c
> +++ b/target/i386/helper.c
> @@ -981,7 +981,7 @@ void cpu_report_tpr_access(CPUX86State *env, TPRAccess access)
>       X86CPU *cpu = env_archcpu(env);
>       CPUState *cs = env_cpu(env);
> 
> -    if (kvm_enabled() || whpx_enabled()) {
> +    if (kvm_enabled() || whpx_enabled() || nvmm_enabled()) {
>           env->tpr_access_type = access;
> 
>           cpu_interrupt(cs, CPU_INTERRUPT_TPR);
> --
> 2.24.1
> 




* Re: [PATCH v2 2/4] Add the NetBSD Virtual Machine Monitor accelerator.
  2020-02-03 11:41     ` Philippe Mathieu-Daudé
@ 2020-02-03 11:56       ` Kamil Rytarowski
  2020-02-03 12:10         ` Philippe Mathieu-Daudé
  2020-03-02 17:12         ` Paolo Bonzini
  2020-03-02 17:11       ` Paolo Bonzini
  1 sibling, 2 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-02-03 11:56 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	rth, ehabkost, slp, pbonzini, peter.maydell, max
  Cc: qemu-devel

On 03.02.2020 12:41, Philippe Mathieu-Daudé wrote:
>> @@ -1768,6 +1785,7 @@ disabled with --disable-FEATURE, default is
>> enabled if available:
>>     hax             HAX acceleration support
>>     hvf             Hypervisor.framework acceleration support
>>     whpx            Windows Hypervisor Platform acceleration support
>> +  nvmm            NetBSD Virtual Machine Monitor acceleration support
>>     rdma            Enable RDMA-based migration
>>     pvrdma          Enable PVRDMA support
>>     vde             support for vde network
>> @@ -2757,6 +2775,20 @@ if test "$whpx" != "no" ; then
>>       fi
>>   fi
>>
> 
> Maybe you can add something like:
> 
> if test "$targetos" = "NetBSD"; then
>     nvmm="check"
> fi
> 
> to build by default with NVMM if available.

I will add nvmm=yes to the NetBSD) targetos check section.


* Re: [PATCH v2 2/4] Add the NetBSD Virtual Machine Monitor accelerator.
  2020-02-03 11:56       ` Kamil Rytarowski
@ 2020-02-03 12:10         ` Philippe Mathieu-Daudé
  2020-03-02 17:12         ` Paolo Bonzini
  1 sibling, 0 replies; 79+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-02-03 12:10 UTC (permalink / raw)
  To: Kamil Rytarowski, rth, ehabkost, slp, pbonzini, peter.maydell, max
  Cc: qemu-devel

On 2/3/20 12:56 PM, Kamil Rytarowski wrote:
> On 03.02.2020 12:41, Philippe Mathieu-Daudé wrote:
>>> @@ -1768,6 +1785,7 @@ disabled with --disable-FEATURE, default is
>>> enabled if available:
>>>      hax             HAX acceleration support
>>>      hvf             Hypervisor.framework acceleration support
>>>      whpx            Windows Hypervisor Platform acceleration support
>>> +  nvmm            NetBSD Virtual Machine Monitor acceleration support
>>>      rdma            Enable RDMA-based migration
>>>      pvrdma          Enable PVRDMA support
>>>      vde             support for vde network
>>> @@ -2757,6 +2775,20 @@ if test "$whpx" != "no" ; then
>>>        fi
>>>    fi
>>>
>>
>> Maybe you can add something like:
>>
>> if test "$targetos" = "NetBSD"; then
>>      nvmm="check"
>> fi
>>
>> to build by default with NVMM if available.
> 
> I will add nvmm=yes to the NetBSD) targetos check section.

Ah yes, clever :)




* Re: [PATCH v2 3/4] Introduce the NVMM impl
  2020-02-03 11:51     ` Philippe Mathieu-Daudé
@ 2020-02-05 17:22       ` Kamil Rytarowski
  2020-02-05 17:47       ` Maxime Villard
  1 sibling, 0 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-02-05 17:22 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	rth, ehabkost, slp, pbonzini, peter.maydell, max
  Cc: qemu-devel

On 03.02.2020 12:51, Philippe Mathieu-Daudé wrote:
> Except the XXX comments, LGTM but I'm not a X86 guy.
>
>

These comments were old and I will drop them and resubmit.



* Re: [PATCH v2 3/4] Introduce the NVMM impl
  2020-02-03 11:51     ` Philippe Mathieu-Daudé
  2020-02-05 17:22       ` Kamil Rytarowski
@ 2020-02-05 17:47       ` Maxime Villard
  1 sibling, 0 replies; 79+ messages in thread
From: Maxime Villard @ 2020-02-05 17:47 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	Kamil Rytarowski, rth, ehabkost, slp, pbonzini, peter.maydell
  Cc: qemu-devel

Hi

Le 03/02/2020 à 12:51, Philippe Mathieu-Daudé a écrit :
>> +static void
>> +nvmm_io_callback(struct nvmm_io *io)
>> +{
>> +    MemTxAttrs attrs = { 0 };
>> +    int ret;
>> +
>> +    ret = address_space_rw(&address_space_io, io->port, attrs, io->data,
>> +        io->size, !io->in);
>> +    if (ret != MEMTX_OK) {
>> +        error_report("NVMM: I/O Transaction Failed "
>> +            "[%s, port=%u, size=%zu]", (io->in ? "in" : "out"),
>> +            io->port, io->size);
>> +    }
>> +
>> +    /* XXX Needed, otherwise infinite loop. */
> 
> This seems OK, why the XXX in comment?
> 
>> +    current_cpu->vcpu_dirty = false;
>> +}

Because the other implementations don't do that and avoid the infinite loop
somehow. I didn't completely understand why, so I left an XXX.

>> +static int
>> +nvmm_vcpu_loop(CPUState *cpu)
>> +{
>> +    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
>> +    struct nvmm_machine *mach = get_nvmm_mach();
>> +    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
>> +    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
>> +    X86CPU *x86_cpu = X86_CPU(cpu);
>> +    struct nvmm_vcpu_exit *exit = vcpu->exit;
>> +    int ret;
>> +
>> +    /*
>> +     * Some asynchronous events must be handled outside of the inner
>> +     * VCPU loop. They are handled here.
>> +     */
>> +    if (cpu->interrupt_request & CPU_INTERRUPT_INIT) {
>> +        nvmm_cpu_synchronize_state(cpu);
>> +        do_cpu_init(x86_cpu);
>> +        /* XXX: reset the INT/NMI windows */
> 
> What is the problem?

The int/nmi windows are not set back to the reset state. Not complicated
to do but I never got around to doing it. This can easily be addressed
in a future patch.

Maxime



* Re: [PATCH v2 4/4] Add the NVMM acceleration enlightenments
  2020-02-03 11:54     ` Philippe Mathieu-Daudé
@ 2020-02-06 10:24       ` Kamil Rytarowski
  2020-02-06 12:18         ` Philippe Mathieu-Daudé
  2020-02-06 13:06         ` Markus Armbruster
  0 siblings, 2 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-02-06 10:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	rth, ehabkost, slp, pbonzini, peter.maydell, max
  Cc: qemu-devel

On 03.02.2020 12:54, Philippe Mathieu-Daudé wrote:
>> @@ -2029,6 +2072,19 @@ static void qemu_whpx_start_vcpu(CPUState *cpu)
>>   #endif
>>   }
>>
>> +static void qemu_nvmm_start_vcpu(CPUState *cpu)
>> +{
>> +    char thread_name[VCPU_THREAD_NAME_SIZE];
>> +
>> +    cpu->thread = g_malloc0(sizeof(QemuThread));
>> +    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>
> Nitpick, we prefer g_new0().

In this file the other qemu_*_start_vcpu() functions use g_malloc0().

I will leave this part unchanged and defer it to future style fixups if
someone is interested.
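
For reference, the difference between the two styles under discussion is
roughly the following. This is a sketch only: NEW0 is a hypothetical
stand-in built on calloc(), not GLib's g_new0(), and struct demo_thread is
an invented type used in place of QemuThread so that the example builds
without GLib or QEMU headers.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for the g_new0(type, n) style: the caller names
 * the element type and a count and gets back a zeroed pointer of the
 * matching type, so the size and the pointer type cannot disagree.
 */
#define NEW0(type, n) ((type *)calloc((n), sizeof(type)))

/* Invented type, standing in for QemuThread. */
struct demo_thread {
    int id;
    void *arg;
};

static struct demo_thread *
demo_thread_alloc(void)
{
    /*
     * Byte-count style, comparable to g_malloc0(sizeof(QemuThread)):
     *     struct demo_thread *t = calloc(1, sizeof(struct demo_thread));
     * Type-and-count style, comparable to g_new0(QemuThread, 1):
     */
    struct demo_thread *t = NEW0(struct demo_thread, 1);

    /* GLib aborts on allocation failure; assert() stands in for that. */
    assert(t != NULL);
    return t;
}
```

Both forms return zeroed memory; the second only moves the sizeof() and the
cast into the macro.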



* [PATCH v3 0/4] Implements the NetBSD Virtual Machine Monitor accelerator
  2020-01-28 14:09 ` [PATCH v2 0/4] Implements the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
                     ` (4 preceding siblings ...)
  2020-02-03  9:52   ` [PATCH v2 0/4] Implements the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
@ 2020-02-06 11:57   ` Kamil Rytarowski
  2020-02-06 11:57     ` [PATCH v3 1/4] Add the NVMM vcpu API Kamil Rytarowski
                       ` (5 more replies)
  5 siblings, 6 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-02-06 11:57 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max
  Cc: Kamil Rytarowski, qemu-devel

Hello QEMU Community!

Over the past year the NetBSD team has been working hard on a new user-mode API
for our hypervisor that will be released as part of the upcoming NetBSD 9.0.
This new API adds user-mode capabilities to create and manage virtual machines,
configure memory mappings for guest machines, and create and control execution
of virtual processors.

With this new API we are now able to bring our hypervisor to the QEMU
community! The following patches implement the NetBSD Virtual Machine Monitor
accelerator (NVMM) for QEMU on NetBSD 9.0 and newer hosts.

When compiling QEMU for x86_64, passing the --enable-nvmm flag builds the
accelerator. At runtime, using '-accel nvmm' should give a significant
performance improvement over emulation, much like using 'hax' on NetBSD.

The documentation for this new API is available at https://man.netbsd.org
under the libnvmm(3) and nvmm(4) pages.

NVMM was designed and implemented by Maxime Villard.

Thank you for your feedback.

References:
https://m00nbsd.net/4e0798b7f2620c965d0dd9d6a7a2f296.html

Test plan:

1. Download a NetBSD 9.0 pre-release snapshot:
http://nycdn.netbsd.org/pub/NetBSD-daily/netbsd-9/latest/images/NetBSD-9.0_RC1-amd64.iso

2. Install it natively on reasonably recent x86_64 hardware (Intel or AMD).

There is no support for nested virtualization in NVMM.

3. Set up the system.

 export PKG_PATH=http://www.ki.nu/pkgsrc/packages/current/NetBSD-9.0_RC1/All
 pkg_add git gmake python37 glib2 bison pkgconf pixman

Install mozilla-rootcerts and follow its post-install instructions.

 pkg_add mozilla-rootcerts

More information: https://wiki.qemu.org/Hosts/BSD#NetBSD

4. Build QEMU

 mkdir build
 cd build
 ../configure --python=python3.7
 gmake
 gmake check

5. Test

 qemu -accel nvmm ...


History:
v2 -> v3:
 - Register nvmm in targetos NetBSD check
 - Stop including hw/boards.h
 - Rephrase old code comments (remove XXX)
v1 -> v2:
 - Included the testing plan as requested by Philippe Mathieu-Daude
 - Formatting nit fix in qemu-options.hx
 - Document NVMM in the accel section of qemu-options.hx

Maxime Villard (4):
  Add the NVMM vcpu API
  Add the NetBSD Virtual Machine Monitor accelerator.
  Introduce the NVMM impl
  Add the NVMM acceleration enlightenments

 accel/stubs/Makefile.objs |    1 +
 accel/stubs/nvmm-stub.c   |   43 ++
 configure                 |   37 ++
 cpus.c                    |   58 ++
 include/sysemu/hw_accel.h |   14 +
 include/sysemu/nvmm.h     |   35 ++
 qemu-options.hx           |   16 +-
 target/i386/Makefile.objs |    1 +
 target/i386/helper.c      |    2 +-
 target/i386/nvmm-all.c    | 1221 +++++++++++++++++++++++++++++++++++++
 10 files changed, 1419 insertions(+), 9 deletions(-)
 create mode 100644 accel/stubs/nvmm-stub.c
 create mode 100644 include/sysemu/nvmm.h
 create mode 100644 target/i386/nvmm-all.c

--
2.25.0



* [PATCH v3 1/4] Add the NVMM vcpu API
  2020-02-06 11:57   ` [PATCH v3 " Kamil Rytarowski
@ 2020-02-06 11:57     ` Kamil Rytarowski
  2020-02-06 21:06       ` Jared McNeill
  2020-02-06 11:57     ` [PATCH v3 2/4] Add the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
                       ` (4 subsequent siblings)
  5 siblings, 1 reply; 79+ messages in thread
From: Kamil Rytarowski @ 2020-02-06 11:57 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max
  Cc: Kamil Rytarowski, qemu-devel

From: Maxime Villard <max@m00nbsd.net>

Adds the NetBSD Virtual Machine Monitor (NVMM) stubs and introduces the
nvmm.h sysemu API for vcpu scheduling and management.

Signed-off-by: Maxime Villard <max@m00nbsd.net>
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 accel/stubs/Makefile.objs |  1 +
 accel/stubs/nvmm-stub.c   | 43 +++++++++++++++++++++++++++++++++++++++
 include/sysemu/nvmm.h     | 35 +++++++++++++++++++++++++++++++
 3 files changed, 79 insertions(+)
 create mode 100644 accel/stubs/nvmm-stub.c
 create mode 100644 include/sysemu/nvmm.h
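
The stub arrangement described in the commit message above can be illustrated
outside QEMU. What follows is a minimal, self-contained sketch, not the patch's
code: DEMO_CONFIG_NVMM, struct demo_cpu and the demo_* functions are
hypothetical stand-ins for CONFIG_NVMM, CPUState and the nvmm_* entry points.
It shows why a build without the accelerator still compiles and links: the
stub reports failure, and the "enabled" check collapses to a constant 0.

```c
/* Hypothetical stand-in for QEMU's CPUState. */
struct demo_cpu {
    int index;
};

/* Leave DEMO_CONFIG_NVMM undefined to model a build without NVMM. */
#ifdef DEMO_CONFIG_NVMM
int demo_nvmm_enabled(void);
#else
#define demo_nvmm_enabled() (0)
#endif

/* Stub: always fails, like the int-returning functions in nvmm-stub.c. */
int demo_nvmm_init_vcpu(struct demo_cpu *cpu)
{
    (void)cpu;
    return -1;
}

/* Caller-side pattern: only reach the accelerator when it is enabled. */
int demo_start_vcpu(struct demo_cpu *cpu)
{
    if (demo_nvmm_enabled()) {
        return demo_nvmm_init_vcpu(cpu);
    }
    return 0; /* another accelerator would take over (not modelled here) */
}
```

Because the disabled case is a macro that expands to (0), the compiler can drop
the guarded call entirely, so the stub only has to exist to satisfy the linker.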

diff --git a/accel/stubs/Makefile.objs b/accel/stubs/Makefile.objs
index 3894caf95d..09f2d3e1dd 100644
--- a/accel/stubs/Makefile.objs
+++ b/accel/stubs/Makefile.objs
@@ -1,5 +1,6 @@
 obj-$(call lnot,$(CONFIG_HAX))  += hax-stub.o
 obj-$(call lnot,$(CONFIG_HVF))  += hvf-stub.o
 obj-$(call lnot,$(CONFIG_WHPX)) += whpx-stub.o
+obj-$(call lnot,$(CONFIG_NVMM)) += nvmm-stub.o
 obj-$(call lnot,$(CONFIG_KVM))  += kvm-stub.o
 obj-$(call lnot,$(CONFIG_TCG))  += tcg-stub.o
diff --git a/accel/stubs/nvmm-stub.c b/accel/stubs/nvmm-stub.c
new file mode 100644
index 0000000000..c2208b84a3
--- /dev/null
+++ b/accel/stubs/nvmm-stub.c
@@ -0,0 +1,43 @@
+/*
+ * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
+ *
+ * NetBSD Virtual Machine Monitor (NVMM) accelerator stub.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "cpu.h"
+#include "sysemu/nvmm.h"
+
+int nvmm_init_vcpu(CPUState *cpu)
+{
+    return -1;
+}
+
+int nvmm_vcpu_exec(CPUState *cpu)
+{
+    return -1;
+}
+
+void nvmm_destroy_vcpu(CPUState *cpu)
+{
+}
+
+void nvmm_cpu_synchronize_state(CPUState *cpu)
+{
+}
+
+void nvmm_cpu_synchronize_post_reset(CPUState *cpu)
+{
+}
+
+void nvmm_cpu_synchronize_post_init(CPUState *cpu)
+{
+}
+
+void nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu)
+{
+}
diff --git a/include/sysemu/nvmm.h b/include/sysemu/nvmm.h
new file mode 100644
index 0000000000..10496f3980
--- /dev/null
+++ b/include/sysemu/nvmm.h
@@ -0,0 +1,35 @@
+/*
+ * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
+ *
+ * NetBSD Virtual Machine Monitor (NVMM) accelerator support.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_NVMM_H
+#define QEMU_NVMM_H
+
+#include "config-host.h"
+#include "qemu-common.h"
+
+int nvmm_init_vcpu(CPUState *);
+int nvmm_vcpu_exec(CPUState *);
+void nvmm_destroy_vcpu(CPUState *);
+
+void nvmm_cpu_synchronize_state(CPUState *);
+void nvmm_cpu_synchronize_post_reset(CPUState *);
+void nvmm_cpu_synchronize_post_init(CPUState *);
+void nvmm_cpu_synchronize_pre_loadvm(CPUState *);
+
+#ifdef CONFIG_NVMM
+
+int nvmm_enabled(void);
+
+#else /* CONFIG_NVMM */
+
+#define nvmm_enabled() (0)
+
+#endif /* CONFIG_NVMM */
+
+#endif /* QEMU_NVMM_H */
--
2.25.0



* [PATCH v3 2/4] Add the NetBSD Virtual Machine Monitor accelerator.
  2020-02-06 11:57   ` [PATCH v3 " Kamil Rytarowski
  2020-02-06 11:57     ` [PATCH v3 1/4] Add the NVMM vcpu API Kamil Rytarowski
@ 2020-02-06 11:57     ` Kamil Rytarowski
  2020-02-06 21:06       ` Jared McNeill
  2020-02-06 11:57     ` [PATCH v3 3/4] Introduce the NVMM impl Kamil Rytarowski
                       ` (3 subsequent siblings)
  5 siblings, 1 reply; 79+ messages in thread
From: Kamil Rytarowski @ 2020-02-06 11:57 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max
  Cc: Kamil Rytarowski, qemu-devel

From: Maxime Villard <max@m00nbsd.net>

Introduces configure support for the new NetBSD Virtual Machine Monitor, which
allows hypervisor acceleration from usermode components on the NetBSD
platform.

Signed-off-by: Maxime Villard <max@m00nbsd.net>
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 configure       | 37 +++++++++++++++++++++++++++++++++++++
 qemu-options.hx | 16 ++++++++--------
 2 files changed, 45 insertions(+), 8 deletions(-)
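
The option text touched by this patch states that when several accelerators
are listed, the next one is used if the previous one fails to initialize. The
following is a minimal sketch of that fallback rule only. It is not QEMU's
accelerator code: struct demo_accel, demo_fail, demo_ok and demo_pick_accel
are hypothetical names introduced for illustration.

```c
#include <stddef.h>

/* Hypothetical accelerator entry: a name plus an init hook. */
struct demo_accel {
    const char *name;
    int (*init)(void); /* returns 0 on success, -1 on failure */
};

/* Hypothetical init hooks modelling an unavailable and an available accel. */
int demo_fail(void) { return -1; }
int demo_ok(void)   { return 0; }

/*
 * Try each accelerator in the order given and return the first one whose
 * init hook succeeds, or NULL when every entry fails.
 */
const struct demo_accel *
demo_pick_accel(const struct demo_accel *list, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (list[i].init() == 0) {
            return &list[i];
        }
    }
    return NULL;
}
```

With a list of { "nvmm", demo_fail } followed by { "tcg", demo_ok }, the sketch
selects the second entry, which mirrors what a request such as accel=nvmm:tcg
is documented to do on a host where NVMM cannot be initialized.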

diff --git a/configure b/configure
index 115dc38085..d4a837cf9d 100755
--- a/configure
+++ b/configure
@@ -241,6 +241,17 @@ supported_whpx_target() {
     return 1
 }

+supported_nvmm_target() {
+    test "$nvmm" = "yes" || return 1
+    glob "$1" "*-softmmu" || return 1
+    case "${1%-softmmu}" in
+        i386|x86_64)
+            return 0
+        ;;
+    esac
+    return 1
+}
+
 supported_target() {
     case "$1" in
         *-softmmu)
@@ -268,6 +279,7 @@ supported_target() {
     supported_hax_target "$1" && return 0
     supported_hvf_target "$1" && return 0
     supported_whpx_target "$1" && return 0
+    supported_nvmm_target "$1" && return 0
     print_error "TCG disabled, but hardware accelerator not available for '$target'"
     return 1
 }
@@ -388,6 +400,7 @@ kvm="no"
 hax="no"
 hvf="no"
 whpx="no"
+nvmm="no"
 rdma=""
 pvrdma=""
 gprof="no"
@@ -823,6 +836,7 @@ DragonFly)
 NetBSD)
   bsd="yes"
   hax="yes"
+  nvmm="yes"
   make="${MAKE-gmake}"
   audio_drv_list="oss try-sdl"
   audio_possible_drivers="oss sdl"
@@ -1169,6 +1183,10 @@ for opt do
   ;;
   --enable-whpx) whpx="yes"
   ;;
+  --disable-nvmm) nvmm="no"
+  ;;
+  --enable-nvmm) nvmm="yes"
+  ;;
   --disable-tcg-interpreter) tcg_interpreter="no"
   ;;
   --enable-tcg-interpreter) tcg_interpreter="yes"
@@ -1773,6 +1791,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   hax             HAX acceleration support
   hvf             Hypervisor.framework acceleration support
   whpx            Windows Hypervisor Platform acceleration support
+  nvmm            NetBSD Virtual Machine Monitor acceleration support
   rdma            Enable RDMA-based migration
   pvrdma          Enable PVRDMA support
   vde             support for vde network
@@ -2764,6 +2783,20 @@ if test "$whpx" != "no" ; then
     fi
 fi

+##########################################
+# NetBSD Virtual Machine Monitor (NVMM) accelerator check
+if test "$nvmm" != "no" ; then
+    if check_include "nvmm.h" ; then
+        nvmm="yes"
+        LIBS="-lnvmm $LIBS"
+    else
+        if test "$nvmm" = "yes"; then
+            feature_not_found "NVMM" "NVMM is not available"
+        fi
+        nvmm="no"
+    fi
+fi
+
 ##########################################
 # Sparse probe
 if test "$sparse" != "no" ; then
@@ -6543,6 +6576,7 @@ echo "KVM support       $kvm"
 echo "HAX support       $hax"
 echo "HVF support       $hvf"
 echo "WHPX support      $whpx"
+echo "NVMM support      $nvmm"
 echo "TCG support       $tcg"
 if test "$tcg" = "yes" ; then
     echo "TCG debug enabled $debug_tcg"
@@ -7828,6 +7862,9 @@ fi
 if test "$target_aligned_only" = "yes" ; then
   echo "TARGET_ALIGNED_ONLY=y" >> $config_target_mak
 fi
+if supported_nvmm_target $target; then
+    echo "CONFIG_NVMM=y" >> $config_target_mak
+fi
 if test "$target_bigendian" = "yes" ; then
   echo "TARGET_WORDS_BIGENDIAN=y" >> $config_target_mak
 fi
diff --git a/qemu-options.hx b/qemu-options.hx
index 224a8e8712..10c046c916 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -31,7 +31,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
     "-machine [type=]name[,prop[=value][,...]]\n"
     "                selects emulated machine ('-machine help' for list)\n"
     "                property accel=accel1[:accel2[:...]] selects accelerator\n"
-    "                supported accelerators are kvm, xen, hax, hvf, whpx or tcg (default: tcg)\n"
+    "                supported accelerators are kvm, xen, hax, hvf, nvmm, whpx or tcg (default: tcg)\n"
     "                vmport=on|off|auto controls emulation of vmport (default: auto)\n"
     "                dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
     "                mem-merge=on|off controls memory merge support (default: on)\n"
@@ -64,9 +64,9 @@ Supported machine properties are:
 @table @option
 @item accel=@var{accels1}[:@var{accels2}[:...]]
 This is used to enable an accelerator. Depending on the target architecture,
-kvm, xen, hax, hvf, whpx or tcg can be available. By default, tcg is used. If there is
-more than one accelerator specified, the next one is used if the previous one
-fails to initialize.
+kvm, xen, hax, hvf, nvmm, whpx or tcg can be available. By default, tcg is used.
+If there is more than one accelerator specified, the next one is used if the
+previous one fails to initialize.
 @item vmport=on|off|auto
 Enables emulation of VMWare IO port, for vmmouse etc. auto says to select the
 value based on accel. For accel=xen the default is off otherwise the default
@@ -114,7 +114,7 @@ ETEXI

 DEF("accel", HAS_ARG, QEMU_OPTION_accel,
     "-accel [accel=]accelerator[,prop[=value][,...]]\n"
-    "                select accelerator (kvm, xen, hax, hvf, whpx or tcg; use 'help' for a list)\n"
+    "                select accelerator (kvm, xen, hax, hvf, nvmm, whpx or tcg; use 'help' for a list)\n"
     "                igd-passthru=on|off (enable Xen integrated Intel graphics passthrough, default=off)\n"
     "                kernel-irqchip=on|off|split controls accelerated irqchip support (default=on)\n"
     "                kvm-shadow-mem=size of KVM shadow MMU in bytes\n"
@@ -124,9 +124,9 @@ STEXI
 @item -accel @var{name}[,prop=@var{value}[,...]]
 @findex -accel
 This is used to enable an accelerator. Depending on the target architecture,
-kvm, xen, hax, hvf, whpx or tcg can be available. By default, tcg is used. If there is
-more than one accelerator specified, the next one is used if the previous one
-fails to initialize.
+kvm, xen, hax, hvf, nvmm, whpx or tcg can be available. By default, tcg is used.
+If there is more than one accelerator specified, the next one is used if the
+previous one fails to initialize.
 @table @option
 @item igd-passthru=on|off
 When Xen is in use, this option controls whether Intel integrated graphics
--
2.25.0



* [PATCH v3 3/4] Introduce the NVMM impl
  2020-02-06 11:57   ` [PATCH v3 " Kamil Rytarowski
  2020-02-06 11:57     ` [PATCH v3 1/4] Add the NVMM vcpu API Kamil Rytarowski
  2020-02-06 11:57     ` [PATCH v3 2/4] Add the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
@ 2020-02-06 11:57     ` Kamil Rytarowski
  2020-02-06 21:07       ` Jared McNeill
  2020-02-06 11:57     ` [PATCH v3 4/4] Add the NVMM acceleration enlightenments Kamil Rytarowski
                       ` (2 subsequent siblings)
  5 siblings, 1 reply; 79+ messages in thread
From: Kamil Rytarowski @ 2020-02-06 11:57 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max
  Cc: Kamil Rytarowski, qemu-devel

From: Maxime Villard <max@m00nbsd.net>

Implements the NetBSD Virtual Machine Monitor (NVMM) target, which acts as a
hypervisor accelerator for QEMU on the NetBSD platform. This gives QEMU much
greater speed than the emulated x86_64 paths that are taken on NetBSD today.

Signed-off-by: Maxime Villard <max@m00nbsd.net>
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
---
 target/i386/Makefile.objs |    1 +
 target/i386/nvmm-all.c    | 1221 +++++++++++++++++++++++++++++++++++++
 2 files changed, 1222 insertions(+)
 create mode 100644 target/i386/nvmm-all.c
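
The segment conversion in nvmm-all.c moves descriptor attribute bits between a
packed 32-bit flags word and individual fields, using the mask-based helpers
__SHIFTOUT and __SHIFTIN. The following sketch shows the idea in portable C.
It rests on assumptions: the DEMO_* macros are local re-implementations of the
helper semantics as understood here, the mask values follow the usual x86
descriptor layout rather than being copied from QEMU, and struct
demo_seg_attrib is a hypothetical stand-in for the NVMM attribute fields.

```c
#include <stdint.h>

/* Assumed helper semantics: divide or multiply by the mask's lowest set bit. */
#define DEMO_LOWEST_SET_BIT(mask) ((((mask) - 1) & (mask)) ^ (mask))
#define DEMO_SHIFTOUT(x, mask)    (((x) & (mask)) / DEMO_LOWEST_SET_BIT(mask))
#define DEMO_SHIFTIN(x, mask)     ((x) * DEMO_LOWEST_SET_BIT(mask))

/* Illustrative masks (assumed values, x86 descriptor bit positions). */
#define DEMO_TYPE_MASK (UINT32_C(0xF) << 8)
#define DEMO_S_MASK    (UINT32_C(0x1) << 12)
#define DEMO_DPL_MASK  (UINT32_C(0x3) << 13)
#define DEMO_P_MASK    (UINT32_C(0x1) << 15)
#define DEMO_AVL_MASK  (UINT32_C(0x1) << 20)
#define DEMO_L_MASK    (UINT32_C(0x1) << 21)
#define DEMO_B_MASK    (UINT32_C(0x1) << 22)
#define DEMO_G_MASK    (UINT32_C(0x1) << 23)

/* Hypothetical stand-in for the per-segment attribute fields. */
struct demo_seg_attrib {
    unsigned int type:4, s:1, dpl:2, p:1, avl:1, l:1, def:1, g:1;
};

/* Packed flags word -> individual fields (direction of nvmm_set_segment). */
struct demo_seg_attrib demo_unpack(uint32_t flags)
{
    struct demo_seg_attrib a;

    a.type = DEMO_SHIFTOUT(flags, DEMO_TYPE_MASK);
    a.s    = DEMO_SHIFTOUT(flags, DEMO_S_MASK);
    a.dpl  = DEMO_SHIFTOUT(flags, DEMO_DPL_MASK);
    a.p    = DEMO_SHIFTOUT(flags, DEMO_P_MASK);
    a.avl  = DEMO_SHIFTOUT(flags, DEMO_AVL_MASK);
    a.l    = DEMO_SHIFTOUT(flags, DEMO_L_MASK);
    a.def  = DEMO_SHIFTOUT(flags, DEMO_B_MASK);
    a.g    = DEMO_SHIFTOUT(flags, DEMO_G_MASK);
    return a;
}

/* Individual fields -> packed flags word (direction of nvmm_get_segment). */
uint32_t demo_pack(struct demo_seg_attrib a)
{
    return DEMO_SHIFTIN((uint32_t)a.type, DEMO_TYPE_MASK) |
           DEMO_SHIFTIN((uint32_t)a.s, DEMO_S_MASK) |
           DEMO_SHIFTIN((uint32_t)a.dpl, DEMO_DPL_MASK) |
           DEMO_SHIFTIN((uint32_t)a.p, DEMO_P_MASK) |
           DEMO_SHIFTIN((uint32_t)a.avl, DEMO_AVL_MASK) |
           DEMO_SHIFTIN((uint32_t)a.l, DEMO_L_MASK) |
           DEMO_SHIFTIN((uint32_t)a.def, DEMO_B_MASK) |
           DEMO_SHIFTIN((uint32_t)a.g, DEMO_G_MASK);
}
```

Deriving the shift from the mask keeps each field described by a single
constant, so the two directions cannot drift apart. Bits outside the listed
masks are dropped on a round trip, which matches the patch's behaviour of
rebuilding the flags word from the attribute fields alone.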

diff --git a/target/i386/Makefile.objs b/target/i386/Makefile.objs
index 48e0c28434..bdcdb32e93 100644
--- a/target/i386/Makefile.objs
+++ b/target/i386/Makefile.objs
@@ -17,6 +17,7 @@ obj-$(CONFIG_HAX) += hax-all.o hax-mem.o hax-posix.o
 endif
 obj-$(CONFIG_HVF) += hvf/
 obj-$(CONFIG_WHPX) += whpx-all.o
+obj-$(CONFIG_NVMM) += nvmm-all.o
 endif
 obj-$(CONFIG_SEV) += sev.o
 obj-$(call lnot,$(CONFIG_SEV)) += sev-stub.o
diff --git a/target/i386/nvmm-all.c b/target/i386/nvmm-all.c
new file mode 100644
index 0000000000..6988400f53
--- /dev/null
+++ b/target/i386/nvmm-all.c
@@ -0,0 +1,1221 @@
+/*
+ * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
+ *
+ * NetBSD Virtual Machine Monitor (NVMM) accelerator for QEMU.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/address-spaces.h"
+#include "exec/ioport.h"
+#include "qemu-common.h"
+#include "strings.h"
+#include "sysemu/accel.h"
+#include "sysemu/nvmm.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/cpus.h"
+#include "qemu/main-loop.h"
+#include "qemu/error-report.h"
+#include "qemu/queue.h"
+#include "qapi/error.h"
+#include "migration/blocker.h"
+
+#include <nvmm.h>
+
+struct qemu_vcpu {
+    struct nvmm_vcpu vcpu;
+    uint8_t tpr;
+    bool stop;
+
+    /* Window-exiting for INTs/NMIs. */
+    bool int_window_exit;
+    bool nmi_window_exit;
+
+    /* The guest is in an interrupt shadow (POP SS, etc). */
+    bool int_shadow;
+};
+
+struct qemu_machine {
+    struct nvmm_capability cap;
+    struct nvmm_machine mach;
+};
+
+/* -------------------------------------------------------------------------- */
+
+static bool nvmm_allowed;
+static struct qemu_machine qemu_mach;
+
+static struct qemu_vcpu *
+get_qemu_vcpu(CPUState *cpu)
+{
+    return (struct qemu_vcpu *)cpu->hax_vcpu;
+}
+
+static struct nvmm_machine *
+get_nvmm_mach(void)
+{
+    return &qemu_mach.mach;
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_set_segment(struct nvmm_x64_state_seg *nseg, const SegmentCache *qseg)
+{
+    uint32_t attrib = qseg->flags;
+
+    nseg->selector = qseg->selector;
+    nseg->limit = qseg->limit;
+    nseg->base = qseg->base;
+    nseg->attrib.type = __SHIFTOUT(attrib, DESC_TYPE_MASK);
+    nseg->attrib.s = __SHIFTOUT(attrib, DESC_S_MASK);
+    nseg->attrib.dpl = __SHIFTOUT(attrib, DESC_DPL_MASK);
+    nseg->attrib.p = __SHIFTOUT(attrib, DESC_P_MASK);
+    nseg->attrib.avl = __SHIFTOUT(attrib, DESC_AVL_MASK);
+    nseg->attrib.l = __SHIFTOUT(attrib, DESC_L_MASK);
+    nseg->attrib.def = __SHIFTOUT(attrib, DESC_B_MASK);
+    nseg->attrib.g = __SHIFTOUT(attrib, DESC_G_MASK);
+}
+
+static void
+nvmm_set_registers(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t bitmap;
+    size_t i;
+    int ret;
+
+    assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
+
+    /* GPRs. */
+    state->gprs[NVMM_X64_GPR_RAX] = env->regs[R_EAX];
+    state->gprs[NVMM_X64_GPR_RCX] = env->regs[R_ECX];
+    state->gprs[NVMM_X64_GPR_RDX] = env->regs[R_EDX];
+    state->gprs[NVMM_X64_GPR_RBX] = env->regs[R_EBX];
+    state->gprs[NVMM_X64_GPR_RSP] = env->regs[R_ESP];
+    state->gprs[NVMM_X64_GPR_RBP] = env->regs[R_EBP];
+    state->gprs[NVMM_X64_GPR_RSI] = env->regs[R_ESI];
+    state->gprs[NVMM_X64_GPR_RDI] = env->regs[R_EDI];
+    state->gprs[NVMM_X64_GPR_R8]  = env->regs[R_R8];
+    state->gprs[NVMM_X64_GPR_R9]  = env->regs[R_R9];
+    state->gprs[NVMM_X64_GPR_R10] = env->regs[R_R10];
+    state->gprs[NVMM_X64_GPR_R11] = env->regs[R_R11];
+    state->gprs[NVMM_X64_GPR_R12] = env->regs[R_R12];
+    state->gprs[NVMM_X64_GPR_R13] = env->regs[R_R13];
+    state->gprs[NVMM_X64_GPR_R14] = env->regs[R_R14];
+    state->gprs[NVMM_X64_GPR_R15] = env->regs[R_R15];
+
+    /* RIP and RFLAGS. */
+    state->gprs[NVMM_X64_GPR_RIP] = env->eip;
+    state->gprs[NVMM_X64_GPR_RFLAGS] = env->eflags;
+
+    /* Segments. */
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_CS], &env->segs[R_CS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_DS], &env->segs[R_DS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_ES], &env->segs[R_ES]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_FS], &env->segs[R_FS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_GS], &env->segs[R_GS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_SS], &env->segs[R_SS]);
+
+    /* Special segments. */
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_GDT], &env->gdt);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_LDT], &env->ldt);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_TR], &env->tr);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_IDT], &env->idt);
+
+    /* Control registers. */
+    state->crs[NVMM_X64_CR_CR0] = env->cr[0];
+    state->crs[NVMM_X64_CR_CR2] = env->cr[2];
+    state->crs[NVMM_X64_CR_CR3] = env->cr[3];
+    state->crs[NVMM_X64_CR_CR4] = env->cr[4];
+    state->crs[NVMM_X64_CR_CR8] = qcpu->tpr;
+    state->crs[NVMM_X64_CR_XCR0] = env->xcr0;
+
+    /* Debug registers. */
+    state->drs[NVMM_X64_DR_DR0] = env->dr[0];
+    state->drs[NVMM_X64_DR_DR1] = env->dr[1];
+    state->drs[NVMM_X64_DR_DR2] = env->dr[2];
+    state->drs[NVMM_X64_DR_DR3] = env->dr[3];
+    state->drs[NVMM_X64_DR_DR6] = env->dr[6];
+    state->drs[NVMM_X64_DR_DR7] = env->dr[7];
+
+    /* FPU. */
+    state->fpu.fx_cw = env->fpuc;
+    state->fpu.fx_sw = (env->fpus & ~0x3800) | ((env->fpstt & 0x7) << 11);
+    state->fpu.fx_tw = 0;
+    for (i = 0; i < 8; i++) {
+        state->fpu.fx_tw |= (!env->fptags[i]) << i;
+    }
+    state->fpu.fx_opcode = env->fpop;
+    state->fpu.fx_ip.fa_64 = env->fpip;
+    state->fpu.fx_dp.fa_64 = env->fpdp;
+    state->fpu.fx_mxcsr = env->mxcsr;
+    state->fpu.fx_mxcsr_mask = 0x0000FFFF;
+    assert(sizeof(state->fpu.fx_87_ac) == sizeof(env->fpregs));
+    memcpy(state->fpu.fx_87_ac, env->fpregs, sizeof(env->fpregs));
+    for (i = 0; i < 16; i++) {
+        memcpy(&state->fpu.fx_xmm[i].xmm_bytes[0],
+            &env->xmm_regs[i].ZMM_Q(0), 8);
+        memcpy(&state->fpu.fx_xmm[i].xmm_bytes[8],
+            &env->xmm_regs[i].ZMM_Q(1), 8);
+    }
+
+    /* MSRs. */
+    state->msrs[NVMM_X64_MSR_EFER] = env->efer;
+    state->msrs[NVMM_X64_MSR_STAR] = env->star;
+#ifdef TARGET_X86_64
+    state->msrs[NVMM_X64_MSR_LSTAR] = env->lstar;
+    state->msrs[NVMM_X64_MSR_CSTAR] = env->cstar;
+    state->msrs[NVMM_X64_MSR_SFMASK] = env->fmask;
+    state->msrs[NVMM_X64_MSR_KERNELGSBASE] = env->kernelgsbase;
+#endif
+    state->msrs[NVMM_X64_MSR_SYSENTER_CS]  = env->sysenter_cs;
+    state->msrs[NVMM_X64_MSR_SYSENTER_ESP] = env->sysenter_esp;
+    state->msrs[NVMM_X64_MSR_SYSENTER_EIP] = env->sysenter_eip;
+    state->msrs[NVMM_X64_MSR_PAT] = env->pat;
+    state->msrs[NVMM_X64_MSR_TSC] = env->tsc;
+
+    bitmap =
+        NVMM_X64_STATE_SEGS |
+        NVMM_X64_STATE_GPRS |
+        NVMM_X64_STATE_CRS  |
+        NVMM_X64_STATE_DRS  |
+        NVMM_X64_STATE_MSRS |
+        NVMM_X64_STATE_FPU;
+
+    ret = nvmm_vcpu_setstate(mach, vcpu, bitmap);
+    if (ret == -1) {
+        error_report("NVMM: Failed to set virtual processor context,"
+            " error=%d", errno);
+    }
+}
+
+static void
+nvmm_get_segment(SegmentCache *qseg, const struct nvmm_x64_state_seg *nseg)
+{
+    qseg->selector = nseg->selector;
+    qseg->limit = nseg->limit;
+    qseg->base = nseg->base;
+
+    qseg->flags =
+        __SHIFTIN((uint32_t)nseg->attrib.type, DESC_TYPE_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.s, DESC_S_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.dpl, DESC_DPL_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.p, DESC_P_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.avl, DESC_AVL_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.l, DESC_L_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.def, DESC_B_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.g, DESC_G_MASK);
+}
+
+static void
+nvmm_get_registers(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t bitmap, tpr;
+    size_t i;
+    int ret;
+
+    assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
+
+    bitmap =
+        NVMM_X64_STATE_SEGS |
+        NVMM_X64_STATE_GPRS |
+        NVMM_X64_STATE_CRS  |
+        NVMM_X64_STATE_DRS  |
+        NVMM_X64_STATE_MSRS |
+        NVMM_X64_STATE_FPU;
+
+    ret = nvmm_vcpu_getstate(mach, vcpu, bitmap);
+    if (ret == -1) {
+        error_report("NVMM: Failed to get virtual processor context,"
+            " error=%d", errno);
+    }
+
+    /* GPRs. */
+    env->regs[R_EAX] = state->gprs[NVMM_X64_GPR_RAX];
+    env->regs[R_ECX] = state->gprs[NVMM_X64_GPR_RCX];
+    env->regs[R_EDX] = state->gprs[NVMM_X64_GPR_RDX];
+    env->regs[R_EBX] = state->gprs[NVMM_X64_GPR_RBX];
+    env->regs[R_ESP] = state->gprs[NVMM_X64_GPR_RSP];
+    env->regs[R_EBP] = state->gprs[NVMM_X64_GPR_RBP];
+    env->regs[R_ESI] = state->gprs[NVMM_X64_GPR_RSI];
+    env->regs[R_EDI] = state->gprs[NVMM_X64_GPR_RDI];
+    env->regs[R_R8]  = state->gprs[NVMM_X64_GPR_R8];
+    env->regs[R_R9]  = state->gprs[NVMM_X64_GPR_R9];
+    env->regs[R_R10] = state->gprs[NVMM_X64_GPR_R10];
+    env->regs[R_R11] = state->gprs[NVMM_X64_GPR_R11];
+    env->regs[R_R12] = state->gprs[NVMM_X64_GPR_R12];
+    env->regs[R_R13] = state->gprs[NVMM_X64_GPR_R13];
+    env->regs[R_R14] = state->gprs[NVMM_X64_GPR_R14];
+    env->regs[R_R15] = state->gprs[NVMM_X64_GPR_R15];
+
+    /* RIP and RFLAGS. */
+    env->eip = state->gprs[NVMM_X64_GPR_RIP];
+    env->eflags = state->gprs[NVMM_X64_GPR_RFLAGS];
+
+    /* Segments. */
+    nvmm_get_segment(&env->segs[R_ES], &state->segs[NVMM_X64_SEG_ES]);
+    nvmm_get_segment(&env->segs[R_CS], &state->segs[NVMM_X64_SEG_CS]);
+    nvmm_get_segment(&env->segs[R_SS], &state->segs[NVMM_X64_SEG_SS]);
+    nvmm_get_segment(&env->segs[R_DS], &state->segs[NVMM_X64_SEG_DS]);
+    nvmm_get_segment(&env->segs[R_FS], &state->segs[NVMM_X64_SEG_FS]);
+    nvmm_get_segment(&env->segs[R_GS], &state->segs[NVMM_X64_SEG_GS]);
+
+    /* Special segments. */
+    nvmm_get_segment(&env->gdt, &state->segs[NVMM_X64_SEG_GDT]);
+    nvmm_get_segment(&env->ldt, &state->segs[NVMM_X64_SEG_LDT]);
+    nvmm_get_segment(&env->tr, &state->segs[NVMM_X64_SEG_TR]);
+    nvmm_get_segment(&env->idt, &state->segs[NVMM_X64_SEG_IDT]);
+
+    /* Control registers. */
+    env->cr[0] = state->crs[NVMM_X64_CR_CR0];
+    env->cr[2] = state->crs[NVMM_X64_CR_CR2];
+    env->cr[3] = state->crs[NVMM_X64_CR_CR3];
+    env->cr[4] = state->crs[NVMM_X64_CR_CR4];
+    tpr = state->crs[NVMM_X64_CR_CR8];
+    if (tpr != qcpu->tpr) {
+        qcpu->tpr = tpr;
+        cpu_set_apic_tpr(x86_cpu->apic_state, tpr);
+    }
+    env->xcr0 = state->crs[NVMM_X64_CR_XCR0];
+
+    /* Debug registers. */
+    env->dr[0] = state->drs[NVMM_X64_DR_DR0];
+    env->dr[1] = state->drs[NVMM_X64_DR_DR1];
+    env->dr[2] = state->drs[NVMM_X64_DR_DR2];
+    env->dr[3] = state->drs[NVMM_X64_DR_DR3];
+    env->dr[6] = state->drs[NVMM_X64_DR_DR6];
+    env->dr[7] = state->drs[NVMM_X64_DR_DR7];
+
+    /* FPU. */
+    env->fpuc = state->fpu.fx_cw;
+    env->fpstt = (state->fpu.fx_sw >> 11) & 0x7;
+    env->fpus = state->fpu.fx_sw & ~0x3800;
+    for (i = 0; i < 8; i++) {
+        env->fptags[i] = !((state->fpu.fx_tw >> i) & 1);
+    }
+    env->fpop = state->fpu.fx_opcode;
+    env->fpip = state->fpu.fx_ip.fa_64;
+    env->fpdp = state->fpu.fx_dp.fa_64;
+    env->mxcsr = state->fpu.fx_mxcsr;
+    assert(sizeof(state->fpu.fx_87_ac) == sizeof(env->fpregs));
+    memcpy(env->fpregs, state->fpu.fx_87_ac, sizeof(env->fpregs));
+    for (i = 0; i < 16; i++) {
+        memcpy(&env->xmm_regs[i].ZMM_Q(0),
+            &state->fpu.fx_xmm[i].xmm_bytes[0], 8);
+        memcpy(&env->xmm_regs[i].ZMM_Q(1),
+            &state->fpu.fx_xmm[i].xmm_bytes[8], 8);
+    }
+
+    /* MSRs. */
+    env->efer = state->msrs[NVMM_X64_MSR_EFER];
+    env->star = state->msrs[NVMM_X64_MSR_STAR];
+#ifdef TARGET_X86_64
+    env->lstar = state->msrs[NVMM_X64_MSR_LSTAR];
+    env->cstar = state->msrs[NVMM_X64_MSR_CSTAR];
+    env->fmask = state->msrs[NVMM_X64_MSR_SFMASK];
+    env->kernelgsbase = state->msrs[NVMM_X64_MSR_KERNELGSBASE];
+#endif
+    env->sysenter_cs  = state->msrs[NVMM_X64_MSR_SYSENTER_CS];
+    env->sysenter_esp = state->msrs[NVMM_X64_MSR_SYSENTER_ESP];
+    env->sysenter_eip = state->msrs[NVMM_X64_MSR_SYSENTER_EIP];
+    env->pat = state->msrs[NVMM_X64_MSR_PAT];
+    env->tsc = state->msrs[NVMM_X64_MSR_TSC];
+
+    x86_update_hflags(env);
+}
+
+static bool
+nvmm_can_take_int(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    struct nvmm_machine *mach = get_nvmm_mach();
+
+    if (qcpu->int_window_exit) {
+        return false;
+    }
+
+    if (qcpu->int_shadow || !(env->eflags & IF_MASK)) {
+        struct nvmm_x64_state *state = vcpu->state;
+
+        /* Exit on interrupt window. */
+        nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_INTR);
+        state->intr.int_window_exiting = 1;
+        nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_INTR);
+
+        return false;
+    }
+
+    return true;
+}
+
+static bool
+nvmm_can_take_nmi(CPUState *cpu)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+
+    /*
+     * Contrary to INTs, NMIs always schedule an exit when they are
+     * completed. Therefore, if window-exiting is enabled, it means
+     * NMIs are blocked.
+     */
+    if (qcpu->nmi_window_exit) {
+        return false;
+    }
+
+    return true;
+}
+
+/*
+ * Called before the VCPU is run. We inject events generated by the I/O
+ * thread, and synchronize the guest TPR.
+ */
+static void
+nvmm_vcpu_pre_run(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    struct nvmm_vcpu_event *event = vcpu->event;
+    bool has_event = false;
+    bool sync_tpr = false;
+    uint8_t tpr;
+    int ret;
+
+    qemu_mutex_lock_iothread();
+
+    tpr = cpu_get_apic_tpr(x86_cpu->apic_state);
+    if (tpr != qcpu->tpr) {
+        qcpu->tpr = tpr;
+        sync_tpr = true;
+    }
+
+    /*
+     * Force the VCPU out of its inner loop to process any INIT requests
+     * or commit pending TPR access.
+     */
+    if (cpu->interrupt_request & (CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
+        cpu->exit_request = 1;
+    }
+
+    if (!has_event && (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+        if (nvmm_can_take_nmi(cpu)) {
+            cpu->interrupt_request &= ~CPU_INTERRUPT_NMI;
+            event->type = NVMM_VCPU_EVENT_INTR;
+            event->vector = 2;
+            has_event = true;
+        }
+    }
+
+    if (!has_event && (cpu->interrupt_request & CPU_INTERRUPT_HARD)) {
+        if (nvmm_can_take_int(cpu)) {
+            cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
+            event->type = NVMM_VCPU_EVENT_INTR;
+            event->vector = cpu_get_pic_interrupt(env);
+            has_event = true;
+        }
+    }
+
+    /* Don't want SMIs. */
+    if (cpu->interrupt_request & CPU_INTERRUPT_SMI) {
+        cpu->interrupt_request &= ~CPU_INTERRUPT_SMI;
+    }
+
+    if (sync_tpr) {
+        ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_CRS);
+        if (ret == -1) {
+            error_report("NVMM: Failed to get CPU state,"
+                " error=%d", errno);
+        }
+
+        state->crs[NVMM_X64_CR_CR8] = qcpu->tpr;
+
+        ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_CRS);
+        if (ret == -1) {
+            error_report("NVMM: Failed to set CPU state,"
+                " error=%d", errno);
+        }
+    }
+
+    if (has_event) {
+        ret = nvmm_vcpu_inject(mach, vcpu);
+        if (ret == -1) {
+            error_report("NVMM: Failed to inject event,"
+                " error=%d", errno);
+        }
+    }
+
+    qemu_mutex_unlock_iothread();
+}
+
+/*
+ * Called after the VCPU ran. We synchronize the host view of the TPR and
+ * RFLAGS.
+ */
+static void
+nvmm_vcpu_post_run(CPUState *cpu, struct nvmm_vcpu_exit *exit)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    uint64_t tpr;
+
+    env->eflags = exit->exitstate.rflags;
+    qcpu->int_shadow = exit->exitstate.int_shadow;
+    qcpu->int_window_exit = exit->exitstate.int_window_exiting;
+    qcpu->nmi_window_exit = exit->exitstate.nmi_window_exiting;
+
+    tpr = exit->exitstate.cr8;
+    if (qcpu->tpr != tpr) {
+        qcpu->tpr = tpr;
+        qemu_mutex_lock_iothread();
+        cpu_set_apic_tpr(x86_cpu->apic_state, qcpu->tpr);
+        qemu_mutex_unlock_iothread();
+    }
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_io_callback(struct nvmm_io *io)
+{
+    MemTxAttrs attrs = { 0 };
+    int ret;
+
+    ret = address_space_rw(&address_space_io, io->port, attrs, io->data,
+        io->size, !io->in);
+    if (ret != MEMTX_OK) {
+        error_report("NVMM: I/O Transaction Failed "
+            "[%s, port=%u, size=%zu]", (io->in ? "in" : "out"),
+            io->port, io->size);
+    }
+
+    /* Needed, otherwise infinite loop. */
+    current_cpu->vcpu_dirty = false;
+}
+
+static void
+nvmm_mem_callback(struct nvmm_mem *mem)
+{
+    cpu_physical_memory_rw(mem->gpa, mem->data, mem->size, mem->write);
+
+    /* Needed, otherwise infinite loop. */
+    current_cpu->vcpu_dirty = false;
+}
+
+static struct nvmm_assist_callbacks nvmm_callbacks = {
+    .io = nvmm_io_callback,
+    .mem = nvmm_mem_callback
+};
+
+/* -------------------------------------------------------------------------- */
+
+static int
+nvmm_handle_mem(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
+{
+    int ret;
+
+    ret = nvmm_assist_mem(mach, vcpu);
+    if (ret == -1) {
+        error_report("NVMM: Mem Assist Failed [gpa=%p]",
+            (void *)vcpu->exit->u.mem.gpa);
+    }
+
+    return ret;
+}
+
+static int
+nvmm_handle_io(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
+{
+    int ret;
+
+    ret = nvmm_assist_io(mach, vcpu);
+    if (ret == -1) {
+        error_report("NVMM: I/O Assist Failed [port=%d]",
+            (int)vcpu->exit->u.io.port);
+    }
+
+    return ret;
+}
+
+static int
+nvmm_handle_rdmsr(struct nvmm_machine *mach, CPUState *cpu,
+    struct nvmm_vcpu_exit *exit)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t val;
+    int ret;
+
+    switch (exit->u.rdmsr.msr) {
+    case MSR_IA32_APICBASE:
+        val = cpu_get_apic_base(x86_cpu->apic_state);
+        break;
+    case MSR_MTRRcap:
+    case MSR_MTRRdefType:
+    case MSR_MCG_CAP:
+    case MSR_MCG_STATUS:
+        val = 0;
+        break;
+    default: /* More MSRs to add? */
+        val = 0;
+        error_report("NVMM: Unexpected RDMSR 0x%x, ignored",
+            exit->u.rdmsr.msr);
+        break;
+    }
+
+    ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    state->gprs[NVMM_X64_GPR_RAX] = (val & 0xFFFFFFFF);
+    state->gprs[NVMM_X64_GPR_RDX] = (val >> 32);
+    state->gprs[NVMM_X64_GPR_RIP] = exit->u.rdmsr.npc;
+
+    ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static int
+nvmm_handle_wrmsr(struct nvmm_machine *mach, CPUState *cpu,
+    struct nvmm_vcpu_exit *exit)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t val;
+    int ret;
+
+    val = exit->u.wrmsr.val;
+
+    switch (exit->u.wrmsr.msr) {
+    case MSR_IA32_APICBASE:
+        cpu_set_apic_base(x86_cpu->apic_state, val);
+        break;
+    case MSR_MTRRdefType:
+    case MSR_MCG_STATUS:
+        break;
+    default: /* More MSRs to add? */
+        error_report("NVMM: Unexpected WRMSR 0x%x [val=0x%" PRIx64 "], ignored",
+            exit->u.wrmsr.msr, val);
+        break;
+    }
+
+    ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    state->gprs[NVMM_X64_GPR_RIP] = exit->u.wrmsr.npc;
+
+    ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static int
+nvmm_handle_halted(struct nvmm_machine *mach, CPUState *cpu,
+    struct nvmm_vcpu_exit *exit)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    int ret = 0;
+
+    qemu_mutex_lock_iothread();
+
+    if (!((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
+          (env->eflags & IF_MASK)) &&
+        !(cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+        cpu->exception_index = EXCP_HLT;
+        cpu->halted = true;
+        ret = 1;
+    }
+
+    qemu_mutex_unlock_iothread();
+
+    return ret;
+}
+
+static int
+nvmm_inject_ud(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
+{
+    struct nvmm_vcpu_event *event = vcpu->event;
+
+    event->type = NVMM_VCPU_EVENT_EXCP;
+    event->vector = 6;
+    event->u.excp.error = 0;
+
+    return nvmm_vcpu_inject(mach, vcpu);
+}
+
+static int
+nvmm_vcpu_loop(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_vcpu_exit *exit = vcpu->exit;
+    int ret;
+
+    /*
+     * Some asynchronous events must be handled outside of the inner
+     * VCPU loop. They are handled here.
+     */
+    if (cpu->interrupt_request & CPU_INTERRUPT_INIT) {
+        nvmm_cpu_synchronize_state(cpu);
+        do_cpu_init(x86_cpu);
+        /* set int/nmi windows back to the reset state */
+    }
+    if (cpu->interrupt_request & CPU_INTERRUPT_POLL) {
+        cpu->interrupt_request &= ~CPU_INTERRUPT_POLL;
+        apic_poll_irq(x86_cpu->apic_state);
+    }
+    if (((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
+         (env->eflags & IF_MASK)) ||
+        (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+        cpu->halted = false;
+    }
+    if (cpu->interrupt_request & CPU_INTERRUPT_SIPI) {
+        nvmm_cpu_synchronize_state(cpu);
+        do_cpu_sipi(x86_cpu);
+    }
+    if (cpu->interrupt_request & CPU_INTERRUPT_TPR) {
+        cpu->interrupt_request &= ~CPU_INTERRUPT_TPR;
+        nvmm_cpu_synchronize_state(cpu);
+        apic_handle_tpr_access_report(x86_cpu->apic_state, env->eip,
+            env->tpr_access_type);
+    }
+
+    if (cpu->halted) {
+        cpu->exception_index = EXCP_HLT;
+        atomic_set(&cpu->exit_request, false);
+        return 0;
+    }
+
+    qemu_mutex_unlock_iothread();
+    cpu_exec_start(cpu);
+
+    /*
+     * Inner VCPU loop.
+     */
+    do {
+        if (cpu->vcpu_dirty) {
+            nvmm_set_registers(cpu);
+            cpu->vcpu_dirty = false;
+        }
+
+        if (qcpu->stop) {
+            cpu->exception_index = EXCP_INTERRUPT;
+            qcpu->stop = false;
+            ret = 1;
+            break;
+        }
+
+        nvmm_vcpu_pre_run(cpu);
+
+        if (atomic_read(&cpu->exit_request)) {
+            qemu_cpu_kick_self();
+        }
+
+        ret = nvmm_vcpu_run(mach, vcpu);
+        if (ret == -1) {
+            error_report("NVMM: Failed to exec a virtual processor,"
+                " error=%d", errno);
+            break;
+        }
+
+        nvmm_vcpu_post_run(cpu, exit);
+
+        switch (exit->reason) {
+        case NVMM_VCPU_EXIT_NONE:
+            break;
+        case NVMM_VCPU_EXIT_MEMORY:
+            ret = nvmm_handle_mem(mach, vcpu);
+            break;
+        case NVMM_VCPU_EXIT_IO:
+            ret = nvmm_handle_io(mach, vcpu);
+            break;
+        case NVMM_VCPU_EXIT_INT_READY:
+        case NVMM_VCPU_EXIT_NMI_READY:
+        case NVMM_VCPU_EXIT_TPR_CHANGED:
+            break;
+        case NVMM_VCPU_EXIT_HALTED:
+            ret = nvmm_handle_halted(mach, cpu, exit);
+            break;
+        case NVMM_VCPU_EXIT_SHUTDOWN:
+            qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
+            cpu->exception_index = EXCP_INTERRUPT;
+            ret = 1;
+            break;
+        case NVMM_VCPU_EXIT_RDMSR:
+            ret = nvmm_handle_rdmsr(mach, cpu, exit);
+            break;
+        case NVMM_VCPU_EXIT_WRMSR:
+            ret = nvmm_handle_wrmsr(mach, cpu, exit);
+            break;
+        case NVMM_VCPU_EXIT_MONITOR:
+        case NVMM_VCPU_EXIT_MWAIT:
+            ret = nvmm_inject_ud(mach, vcpu);
+            break;
+        default:
+            error_report("NVMM: Unexpected VM exit code 0x%" PRIx64
+                " [hw=0x%" PRIx64 "]", exit->reason, exit->u.inv.hwcode);
+            nvmm_get_registers(cpu);
+            qemu_mutex_lock_iothread();
+            qemu_system_guest_panicked(cpu_get_crash_info(cpu));
+            qemu_mutex_unlock_iothread();
+            ret = -1;
+            break;
+        }
+    } while (ret == 0);
+
+    cpu_exec_end(cpu);
+    qemu_mutex_lock_iothread();
+    current_cpu = cpu;
+
+    atomic_set(&cpu->exit_request, false);
+
+    return ret < 0;
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+do_nvmm_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
+{
+    nvmm_get_registers(cpu);
+    cpu->vcpu_dirty = true;
+}
+
+static void
+do_nvmm_cpu_synchronize_post_reset(CPUState *cpu, run_on_cpu_data arg)
+{
+    nvmm_set_registers(cpu);
+    cpu->vcpu_dirty = false;
+}
+
+static void
+do_nvmm_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg)
+{
+    nvmm_set_registers(cpu);
+    cpu->vcpu_dirty = false;
+}
+
+static void
+do_nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu, run_on_cpu_data arg)
+{
+    cpu->vcpu_dirty = true;
+}
+
+void nvmm_cpu_synchronize_state(CPUState *cpu)
+{
+    if (!cpu->vcpu_dirty) {
+        run_on_cpu(cpu, do_nvmm_cpu_synchronize_state, RUN_ON_CPU_NULL);
+    }
+}
+
+void nvmm_cpu_synchronize_post_reset(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_nvmm_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
+}
+
+void nvmm_cpu_synchronize_post_init(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_nvmm_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
+}
+
+void nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_nvmm_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
+}
+
+/* -------------------------------------------------------------------------- */
+
+static Error *nvmm_migration_blocker;
+
+static void
+nvmm_ipi_signal(int sigcpu)
+{
+    struct qemu_vcpu *qcpu;
+
+    if (current_cpu) {
+        qcpu = get_qemu_vcpu(current_cpu);
+        qcpu->stop = true;
+    }
+}
+
+static void
+nvmm_init_cpu_signals(void)
+{
+    struct sigaction sigact;
+    sigset_t set;
+
+    /* Install the IPI handler. */
+    memset(&sigact, 0, sizeof(sigact));
+    sigact.sa_handler = nvmm_ipi_signal;
+    sigaction(SIG_IPI, &sigact, NULL);
+
+    /* Allow IPIs on the current thread. */
+    sigprocmask(SIG_BLOCK, NULL, &set);
+    sigdelset(&set, SIG_IPI);
+    pthread_sigmask(SIG_SETMASK, &set, NULL);
+}
+
+int
+nvmm_init_vcpu(CPUState *cpu)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct nvmm_vcpu_conf_cpuid cpuid;
+    struct nvmm_vcpu_conf_tpr tpr;
+    Error *local_error = NULL;
+    struct qemu_vcpu *qcpu;
+    int ret, err;
+
+    nvmm_init_cpu_signals();
+
+    if (nvmm_migration_blocker == NULL) {
+        error_setg(&nvmm_migration_blocker,
+            "NVMM: Migration not supported");
+
+        (void)migrate_add_blocker(nvmm_migration_blocker, &local_error);
+        if (local_error) {
+            error_report_err(local_error);
+            migrate_del_blocker(nvmm_migration_blocker);
+            error_free(nvmm_migration_blocker);
+            return -EINVAL;
+        }
+    }
+
+    qcpu = g_malloc0(sizeof(*qcpu));
+    if (qcpu == NULL) {
+        error_report("NVMM: Failed to allocate VCPU context.");
+        return -ENOMEM;
+    }
+
+    ret = nvmm_vcpu_create(mach, cpu->cpu_index, &qcpu->vcpu);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Failed to create a virtual processor,"
+            " error=%d", err);
+        g_free(qcpu);
+        return -err;
+    }
+
+    memset(&cpuid, 0, sizeof(cpuid));
+    cpuid.mask = 1;
+    cpuid.leaf = 0x00000001;
+    cpuid.u.mask.set.edx = CPUID_MCE | CPUID_MCA | CPUID_MTRR;
+    ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_CPUID,
+        &cpuid);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Failed to configure a virtual processor,"
+            " error=%d", err);
+        g_free(qcpu);
+        return -err;
+    }
+
+    ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_CALLBACKS,
+        &nvmm_callbacks);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Failed to configure a virtual processor,"
+            " error=%d", err);
+        g_free(qcpu);
+        return -err;
+    }
+
+    if (qemu_mach.cap.arch.vcpu_conf_support & NVMM_CAP_ARCH_VCPU_CONF_TPR) {
+        memset(&tpr, 0, sizeof(tpr));
+        tpr.exit_changed = 1;
+        ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_TPR, &tpr);
+        if (ret == -1) {
+            err = errno;
+            error_report("NVMM: Failed to configure a virtual processor,"
+                " error=%d", err);
+            g_free(qcpu);
+            return -err;
+        }
+    }
+
+    cpu->vcpu_dirty = true;
+    cpu->hax_vcpu = (struct hax_vcpu_state *)qcpu;
+
+    return 0;
+}
+
+int
+nvmm_vcpu_exec(CPUState *cpu)
+{
+    int ret, fatal;
+
+    while (1) {
+        if (cpu->exception_index >= EXCP_INTERRUPT) {
+            ret = cpu->exception_index;
+            cpu->exception_index = -1;
+            break;
+        }
+
+        fatal = nvmm_vcpu_loop(cpu);
+
+        if (fatal) {
+            error_report("NVMM: Failed to execute a VCPU.");
+            abort();
+        }
+    }
+
+    return ret;
+}
+
+void
+nvmm_destroy_vcpu(CPUState *cpu)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+
+    nvmm_vcpu_destroy(mach, &qcpu->vcpu);
+    g_free(cpu->hax_vcpu);
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_update_mapping(hwaddr start_pa, ram_addr_t size, uintptr_t hva,
+    bool add, bool rom, const char *name)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    int ret, prot;
+
+    if (add) {
+        prot = PROT_READ | PROT_EXEC;
+        if (!rom) {
+            prot |= PROT_WRITE;
+        }
+        ret = nvmm_gpa_map(mach, hva, start_pa, size, prot);
+    } else {
+        ret = nvmm_gpa_unmap(mach, hva, start_pa, size);
+    }
+
+    if (ret == -1) {
+        error_report("NVMM: Failed to %s GPA range '%s' PA:%p, "
+            "Size:%p bytes, HostVA:%p, error=%d",
+            (add ? "map" : "unmap"), name, (void *)(uintptr_t)start_pa,
+            (void *)size, (void *)hva, errno);
+    }
+}
+
+static void
+nvmm_process_section(MemoryRegionSection *section, int add)
+{
+    MemoryRegion *mr = section->mr;
+    hwaddr start_pa = section->offset_within_address_space;
+    ram_addr_t size = int128_get64(section->size);
+    unsigned int delta;
+    uintptr_t hva;
+
+    if (!memory_region_is_ram(mr)) {
+        return;
+    }
+
+    /* Adjust start_pa and size so that they are page-aligned. */
+    delta = qemu_real_host_page_size - (start_pa & ~qemu_real_host_page_mask);
+    delta &= ~qemu_real_host_page_mask;
+    if (delta > size) {
+        return;
+    }
+    start_pa += delta;
+    size -= delta;
+    size &= qemu_real_host_page_mask;
+    if (!size || (start_pa & ~qemu_real_host_page_mask)) {
+        return;
+    }
+
+    hva = (uintptr_t)memory_region_get_ram_ptr(mr) +
+        section->offset_within_region + delta;
+
+    nvmm_update_mapping(start_pa, size, hva, add,
+        memory_region_is_rom(mr), mr->name);
+}
+
+static void
+nvmm_region_add(MemoryListener *listener, MemoryRegionSection *section)
+{
+    memory_region_ref(section->mr);
+    nvmm_process_section(section, 1);
+}
+
+static void
+nvmm_region_del(MemoryListener *listener, MemoryRegionSection *section)
+{
+    nvmm_process_section(section, 0);
+    memory_region_unref(section->mr);
+}
+
+static void
+nvmm_transaction_begin(MemoryListener *listener)
+{
+    /* nothing */
+}
+
+static void
+nvmm_transaction_commit(MemoryListener *listener)
+{
+    /* nothing */
+}
+
+static void
+nvmm_log_sync(MemoryListener *listener, MemoryRegionSection *section)
+{
+    MemoryRegion *mr = section->mr;
+
+    if (!memory_region_is_ram(mr)) {
+        return;
+    }
+
+    memory_region_set_dirty(mr, 0, int128_get64(section->size));
+}
+
+static MemoryListener nvmm_memory_listener = {
+    .begin = nvmm_transaction_begin,
+    .commit = nvmm_transaction_commit,
+    .region_add = nvmm_region_add,
+    .region_del = nvmm_region_del,
+    .log_sync = nvmm_log_sync,
+    .priority = 10,
+};
+
+static void
+nvmm_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    uintptr_t hva = (uintptr_t)host;
+    int ret;
+
+    ret = nvmm_hva_map(mach, hva, size);
+
+    if (ret == -1) {
+        error_report("NVMM: Failed to map HVA, HostVA:%p "
+            "Size:%p bytes, error=%d",
+            (void *)hva, (void *)size, errno);
+    }
+}
+
+static struct RAMBlockNotifier nvmm_ram_notifier = {
+    .ram_block_added = nvmm_ram_block_added
+};
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_handle_interrupt(CPUState *cpu, int mask)
+{
+    cpu->interrupt_request |= mask;
+
+    if (!qemu_cpu_is_self(cpu)) {
+        qemu_cpu_kick(cpu);
+    }
+}
+
+/* -------------------------------------------------------------------------- */
+
+static int
+nvmm_accel_init(MachineState *ms)
+{
+    int ret, err;
+
+    ret = nvmm_init();
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Initialization failed, error=%d", err);
+        return -err;
+    }
+
+    ret = nvmm_capability(&qemu_mach.cap);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Unable to fetch capability, error=%d", err);
+        return -err;
+    }
+    if (qemu_mach.cap.version != 1) {
+        error_report("NVMM: Unsupported version %u", qemu_mach.cap.version);
+        return -EPROGMISMATCH;
+    }
+    if (qemu_mach.cap.state_size != sizeof(struct nvmm_x64_state)) {
+        error_report("NVMM: Wrong state size %u", qemu_mach.cap.state_size);
+        return -EPROGMISMATCH;
+    }
+
+    ret = nvmm_machine_create(&qemu_mach.mach);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Machine creation failed, error=%d", err);
+        return -err;
+    }
+
+    memory_listener_register(&nvmm_memory_listener, &address_space_memory);
+    ram_block_notifier_add(&nvmm_ram_notifier);
+
+    cpu_interrupt_handler = nvmm_handle_interrupt;
+
+    printf("NetBSD Virtual Machine Monitor accelerator is operational\n");
+    return 0;
+}
+
+int
+nvmm_enabled(void)
+{
+    return nvmm_allowed;
+}
+
+static void
+nvmm_accel_class_init(ObjectClass *oc, void *data)
+{
+    AccelClass *ac = ACCEL_CLASS(oc);
+    ac->name = "NVMM";
+    ac->init_machine = nvmm_accel_init;
+    ac->allowed = &nvmm_allowed;
+}
+
+static const TypeInfo nvmm_accel_type = {
+    .name = ACCEL_CLASS_NAME("nvmm"),
+    .parent = TYPE_ACCEL,
+    .class_init = nvmm_accel_class_init,
+};
+
+static void
+nvmm_type_init(void)
+{
+    type_register_static(&nvmm_accel_type);
+}
+
+type_init(nvmm_type_init);
--
2.25.0
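
For readers following the page-alignment step in nvmm_process_section()
above, here is a minimal standalone sketch of the same arithmetic. It
assumes a power-of-two page size; page_align_range() is a hypothetical
name used only for illustration and is not part of the patch or of
libnvmm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): shrink [*start, *start + *size)
 * to the largest page-aligned range it fully contains.
 * Returns false when no whole page fits.
 */
static bool
page_align_range(uint64_t page_size, uint64_t *start, uint64_t *size)
{
    uint64_t offset_mask = page_size - 1;   /* low bits within a page */
    uint64_t delta;

    /* Assumption: page_size is a non-zero power of two. */
    assert(page_size != 0 && (page_size & offset_mask) == 0);

    /* Bytes needed to reach the next page boundary (0 if already aligned). */
    delta = (page_size - (*start & offset_mask)) & offset_mask;
    if (delta > *size) {
        return false;
    }

    *start += delta;                          /* round the start up */
    *size = (*size - delta) & ~offset_mask;   /* round the length down */

    return *size != 0;
}
```

With a 4 KiB page, a range starting at 0x1234 with length 0x3000 becomes
start 0x2000 with length 0x2000, so only whole pages are handed to the
mapping call.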


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v3 4/4] Add the NVMM acceleration enlightenments
  2020-02-06 11:57   ` [PATCH v3 " Kamil Rytarowski
                       ` (2 preceding siblings ...)
  2020-02-06 11:57     ` [PATCH v3 3/4] Introduce the NVMM impl Kamil Rytarowski
@ 2020-02-06 11:57     ` Kamil Rytarowski
  2020-02-06 21:07       ` Jared McNeill
  2020-02-06 13:13     ` [PATCH v3 0/4] Implements the NetBSD Virtual Machine Monitor accelerator no-reply
  2020-02-06 21:32     ` [PATCH v4 " Kamil Rytarowski
  5 siblings, 1 reply; 79+ messages in thread
From: Kamil Rytarowski @ 2020-02-06 11:57 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max
  Cc: Kamil Rytarowski, qemu-devel

From: Maxime Villard <max@m00nbsd.net>

Implements the NVMM accelerator cpu enlightenments to actually use the nvmm-all
accelerator on NetBSD platforms.

Signed-off-by: Maxime Villard <max@m00nbsd.net>
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 cpus.c                    | 58 +++++++++++++++++++++++++++++++++++++++
 include/sysemu/hw_accel.h | 14 ++++++++++
 target/i386/helper.c      |  2 +-
 3 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index b4f8b84b61..f833da4a60 100644
--- a/cpus.c
+++ b/cpus.c
@@ -42,6 +42,7 @@
 #include "sysemu/hax.h"
 #include "sysemu/hvf.h"
 #include "sysemu/whpx.h"
+#include "sysemu/nvmm.h"
 #include "exec/exec-all.h"

 #include "qemu/thread.h"
@@ -1670,6 +1671,48 @@ static void *qemu_whpx_cpu_thread_fn(void *arg)
     return NULL;
 }

+static void *qemu_nvmm_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+    int r;
+
+    assert(nvmm_enabled());
+
+    rcu_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+    cpu->thread_id = qemu_get_thread_id();
+    current_cpu = cpu;
+
+    r = nvmm_init_vcpu(cpu);
+    if (r < 0) {
+        fprintf(stderr, "nvmm_init_vcpu failed: %s\n", strerror(-r));
+        exit(1);
+    }
+
+    /* signal CPU creation */
+    cpu->created = true;
+    qemu_cond_signal(&qemu_cpu_cond);
+
+    do {
+        if (cpu_can_run(cpu)) {
+            r = nvmm_vcpu_exec(cpu);
+            if (r == EXCP_DEBUG) {
+                cpu_handle_guest_debug(cpu);
+            }
+        }
+        qemu_wait_io_event(cpu);
+    } while (!cpu->unplug || cpu_can_run(cpu));
+
+    nvmm_destroy_vcpu(cpu);
+    cpu->created = false;
+    qemu_cond_signal(&qemu_cpu_cond);
+    qemu_mutex_unlock_iothread();
+    rcu_unregister_thread();
+    return NULL;
+}
+
 #ifdef _WIN32
 static void CALLBACK dummy_apc_func(ULONG_PTR unused)
 {
@@ -2038,6 +2081,19 @@ static void qemu_whpx_start_vcpu(CPUState *cpu)
 #endif
 }

+static void qemu_nvmm_start_vcpu(CPUState *cpu)
+{
+    char thread_name[VCPU_THREAD_NAME_SIZE];
+
+    cpu->thread = g_malloc0(sizeof(QemuThread));
+    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+    qemu_cond_init(cpu->halt_cond);
+    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/NVMM",
+             cpu->cpu_index);
+    qemu_thread_create(cpu->thread, thread_name, qemu_nvmm_cpu_thread_fn,
+                       cpu, QEMU_THREAD_JOINABLE);
+}
+
 static void qemu_dummy_start_vcpu(CPUState *cpu)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
@@ -2078,6 +2134,8 @@ void qemu_init_vcpu(CPUState *cpu)
         qemu_tcg_init_vcpu(cpu);
     } else if (whpx_enabled()) {
         qemu_whpx_start_vcpu(cpu);
+    } else if (nvmm_enabled()) {
+        qemu_nvmm_start_vcpu(cpu);
     } else {
         qemu_dummy_start_vcpu(cpu);
     }
diff --git a/include/sysemu/hw_accel.h b/include/sysemu/hw_accel.h
index 0ec2372477..dbfa7a02f9 100644
--- a/include/sysemu/hw_accel.h
+++ b/include/sysemu/hw_accel.h
@@ -15,6 +15,7 @@
 #include "sysemu/hax.h"
 #include "sysemu/kvm.h"
 #include "sysemu/whpx.h"
+#include "sysemu/nvmm.h"

 static inline void cpu_synchronize_state(CPUState *cpu)
 {
@@ -27,6 +28,9 @@ static inline void cpu_synchronize_state(CPUState *cpu)
     if (whpx_enabled()) {
         whpx_cpu_synchronize_state(cpu);
     }
+    if (nvmm_enabled()) {
+        nvmm_cpu_synchronize_state(cpu);
+    }
 }

 static inline void cpu_synchronize_post_reset(CPUState *cpu)
@@ -40,6 +44,10 @@ static inline void cpu_synchronize_post_reset(CPUState *cpu)
     if (whpx_enabled()) {
         whpx_cpu_synchronize_post_reset(cpu);
     }
+    if (nvmm_enabled()) {
+        nvmm_cpu_synchronize_post_reset(cpu);
+    }
+
 }

 static inline void cpu_synchronize_post_init(CPUState *cpu)
@@ -53,6 +61,9 @@ static inline void cpu_synchronize_post_init(CPUState *cpu)
     if (whpx_enabled()) {
         whpx_cpu_synchronize_post_init(cpu);
     }
+    if (nvmm_enabled()) {
+        nvmm_cpu_synchronize_post_init(cpu);
+    }
 }

 static inline void cpu_synchronize_pre_loadvm(CPUState *cpu)
@@ -66,6 +77,9 @@ static inline void cpu_synchronize_pre_loadvm(CPUState *cpu)
     if (whpx_enabled()) {
         whpx_cpu_synchronize_pre_loadvm(cpu);
     }
+    if (nvmm_enabled()) {
+        nvmm_cpu_synchronize_pre_loadvm(cpu);
+    }
 }

 #endif /* QEMU_HW_ACCEL_H */
diff --git a/target/i386/helper.c b/target/i386/helper.c
index c3a6e4fabe..2e79d61329 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -981,7 +981,7 @@ void cpu_report_tpr_access(CPUX86State *env, TPRAccess access)
     X86CPU *cpu = env_archcpu(env);
     CPUState *cs = env_cpu(env);

-    if (kvm_enabled() || whpx_enabled()) {
+    if (kvm_enabled() || whpx_enabled() || nvmm_enabled()) {
         env->tpr_access_type = access;

         cpu_interrupt(cs, CPU_INTERRUPT_TPR);
--
2.25.0


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 4/4] Add the NVMM acceleration enlightenments
  2020-02-06 10:24       ` Kamil Rytarowski
@ 2020-02-06 12:18         ` Philippe Mathieu-Daudé
  2020-02-06 13:06         ` Markus Armbruster
  1 sibling, 0 replies; 79+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-02-06 12:18 UTC (permalink / raw)
  To: Kamil Rytarowski, rth, ehabkost, slp, pbonzini, peter.maydell, max
  Cc: qemu-devel

On 2/6/20 11:24 AM, Kamil Rytarowski wrote:
> On 03.02.2020 12:54, Philippe Mathieu-Daudé wrote:
>>> @@ -2029,6 +2072,19 @@ static void qemu_whpx_start_vcpu(CPUState *cpu)
>>>    #endif
>>>    }
>>>
>>> +static void qemu_nvmm_start_vcpu(CPUState *cpu)
>>> +{
>>> +    char thread_name[VCPU_THREAD_NAME_SIZE];
>>> +
>>> +    cpu->thread = g_malloc0(sizeof(QemuThread));
>>> +    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>>
>> Nitpick, we prefer g_new0().
> 
> In this file other qemu_*_start_vcpu() use  g_malloc0().
> 
> I will leave this part unchanged and defer to future style fixups if
> someone is interested.

Fair enough.



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 4/4] Add the NVMM acceleration enlightenments
  2020-02-06 10:24       ` Kamil Rytarowski
  2020-02-06 12:18         ` Philippe Mathieu-Daudé
@ 2020-02-06 13:06         ` Markus Armbruster
  2020-02-06 13:09           ` Philippe Mathieu-Daudé
  1 sibling, 1 reply; 79+ messages in thread
From: Markus Armbruster @ 2020-02-06 13:06 UTC (permalink / raw)
  To: Kamil Rytarowski
  Cc: peter.maydell, ehabkost, slp, qemu-devel, pbonzini,
	Philippe Mathieu-Daudé,
	max, rth

Kamil Rytarowski <n54@gmx.com> writes:

> On 03.02.2020 12:54, Philippe Mathieu-Daudé wrote:
>>> @@ -2029,6 +2072,19 @@ static void qemu_whpx_start_vcpu(CPUState *cpu)
>>>   #endif
>>>   }
>>>
>>> +static void qemu_nvmm_start_vcpu(CPUState *cpu)
>>> +{
>>> +    char thread_name[VCPU_THREAD_NAME_SIZE];
>>> +
>>> +    cpu->thread = g_malloc0(sizeof(QemuThread));
>>> +    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>>
>> Nitpick, we prefer g_new0().
>
> In this file other qemu_*_start_vcpu() use  g_malloc0().
>
> I will leave this part unchanged and defer to future style fixups if
> someone is interested.

Time to re-run Coccinelle with the semantic patch from commit
b45c03f585e.



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 4/4] Add the NVMM acceleration enlightenments
  2020-02-06 13:06         ` Markus Armbruster
@ 2020-02-06 13:09           ` Philippe Mathieu-Daudé
  2020-02-06 13:31             ` Kamil Rytarowski
  0 siblings, 1 reply; 79+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-02-06 13:09 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Peter Maydell, Eduardo Habkost, Sergio Lopez, QEMU Developers,
	Paolo Bonzini, Kamil Rytarowski, max, Richard Henderson

On Thu, Feb 6, 2020 at 2:06 PM Markus Armbruster <armbru@redhat.com> wrote:
> Kamil Rytarowski <n54@gmx.com> writes:
>
> > On 03.02.2020 12:54, Philippe Mathieu-Daudé wrote:
> >>> @@ -2029,6 +2072,19 @@ static void qemu_whpx_start_vcpu(CPUState *cpu)
> >>>   #endif
> >>>   }
> >>>
> >>> +static void qemu_nvmm_start_vcpu(CPUState *cpu)
> >>> +{
> >>> +    char thread_name[VCPU_THREAD_NAME_SIZE];
> >>> +
> >>> +    cpu->thread = g_malloc0(sizeof(QemuThread));
> >>> +    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
> >>
> >> Nitpick, we prefer g_new0().
> >
> > In this file other qemu_*_start_vcpu() use  g_malloc0().
> >
> > I will leave this part unchanged and defer to future style fixups if
> > someone is interested.
>
> Time to re-run Coccinelle with the semantic patch from commit
> b45c03f585e.

I thought about it, but then noticed it would be clever to modify
checkpatch to refuse 'g_malloc0?(.*sizeof.*);'
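
A minimal sketch of what such a check would match, written in C with
POSIX regcomp() rather than in checkpatch's Perl. The parentheses are
escaped here because POSIX extended syntax treats bare ones as a group.
line_uses_malloc_sizeof() is a hypothetical helper for illustration, not
an existing QEMU or checkpatch function.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: return true when a source line calls g_malloc() or
 * g_malloc0() with a sizeof expression in its argument, the pattern that
 * g_new()/g_new0() are meant to replace.
 */
static bool
line_uses_malloc_sizeof(const char *line)
{
    regex_t re;
    bool hit;

    /* "0?" makes the trailing 0 optional, so both variants are caught. */
    if (regcomp(&re, "g_malloc0?\\(.*sizeof.*\\);",
                REG_EXTENDED | REG_NOSUB) != 0) {
        return false;
    }

    hit = (regexec(&re, line, 0, NULL, 0) == 0);
    regfree(&re);

    return hit;
}
```

Under this pattern the two g_malloc0(sizeof(...)) lines quoted above
would be flagged, while a g_new0(QemuThread, 1) call or a plain
g_malloc0(len) would not.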



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v3 0/4] Implements the NetBSD Virtual Machine Monitor accelerator
  2020-02-06 11:57   ` [PATCH v3 " Kamil Rytarowski
                       ` (3 preceding siblings ...)
  2020-02-06 11:57     ` [PATCH v3 4/4] Add the NVMM acceleration enlightenments Kamil Rytarowski
@ 2020-02-06 13:13     ` no-reply
  2020-02-06 13:21       ` Kamil Rytarowski
  2020-02-06 21:32     ` [PATCH v4 " Kamil Rytarowski
  5 siblings, 1 reply; 79+ messages in thread
From: no-reply @ 2020-02-06 13:13 UTC (permalink / raw)
  To: n54
  Cc: peter.maydell, ehabkost, slp, qemu-devel, pbonzini, n54, philmd,
	max, rth

Patchew URL: https://patchew.org/QEMU/20200206115731.13552-1-n54@gmx.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH v3 0/4] Implements the NetBSD Virtual Machine Monitor accelerator
Message-id: 20200206115731.13552-1-n54@gmx.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
fatal: git fetch_pack: expected ACK/NAK, got 'ERR upload-pack: not our ref 1c298dad3d820f7a2161054ff581cf2fa65ee1b4'
fatal: The remote end hung up unexpectedly
error: Could not fetch 3c8cf5a9c21ff8782164d1def7f44bd888713384
Traceback (most recent call last):
  File "patchew-tester/src/patchew-cli", line 521, in test_one
    git_clone_repo(clone, r["repo"], r["head"], logf, True)
  File "patchew-tester/src/patchew-cli", line 48, in git_clone_repo
    stdout=logf, stderr=logf)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['git', 'remote', 'add', '-f', '--mirror=fetch', '3c8cf5a9c21ff8782164d1def7f44bd888713384', 'https://github.com/patchew-project/qemu']' returned non-zero exit status 1.



The full log is available at
http://patchew.org/logs/20200206115731.13552-1-n54@gmx.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v3 0/4] Implements the NetBSD Virtual Machine Monitor accelerator
  2020-02-06 13:13     ` [PATCH v3 0/4] Implements the NetBSD Virtual Machine Monitor accelerator no-reply
@ 2020-02-06 13:21       ` Kamil Rytarowski
  2020-02-06 16:01         ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 79+ messages in thread
From: Kamil Rytarowski @ 2020-02-06 13:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, ehabkost, slp, pbonzini, philmd, max, rth

Am I supposed to do something with this or is this an issue in a script?

On 06.02.2020 14:13, no-reply@patchew.org wrote:
> Patchew URL: https://patchew.org/QEMU/20200206115731.13552-1-n54@gmx.com/
>
>
>
> Hi,
>
> This series seems to have some coding style problems. See output below for
> more information:
>
> Subject: [PATCH v3 0/4] Implements the NetBSD Virtual Machine Monitor accelerator
> Message-id: 20200206115731.13552-1-n54@gmx.com
> Type: series
>
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> git rev-parse base > /dev/null || exit 0
> git config --local diff.renamelimit 0
> git config --local diff.renames True
> git config --local diff.algorithm histogram
> ./scripts/checkpatch.pl --mailback base..
> === TEST SCRIPT END ===
>
> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> fatal: git fetch_pack: expected ACK/NAK, got 'ERR upload-pack: not our ref 1c298dad3d820f7a2161054ff581cf2fa65ee1b4'
> fatal: The remote end hung up unexpectedly
> error: Could not fetch 3c8cf5a9c21ff8782164d1def7f44bd888713384
> Traceback (most recent call last):
>   File "patchew-tester/src/patchew-cli", line 521, in test_one
>     git_clone_repo(clone, r["repo"], r["head"], logf, True)
>   File "patchew-tester/src/patchew-cli", line 48, in git_clone_repo
>     stdout=logf, stderr=logf)
>   File "/opt/rh/rh-python36/root/usr/lib64/python3.6/subprocess.py", line 291, in check_call
>     raise CalledProcessError(retcode, cmd)
> subprocess.CalledProcessError: Command '['git', 'remote', 'add', '-f', '--mirror=fetch', '3c8cf5a9c21ff8782164d1def7f44bd888713384', 'https://github.com/patchew-project/qemu']' returned non-zero exit status 1.
>
>
>
> The full log is available at
> http://patchew.org/logs/20200206115731.13552-1-n54@gmx.com/testing.checkpatch/?type=message.
> ---
> Email generated automatically by Patchew [https://patchew.org/].
> Please send your feedback to patchew-devel@redhat.com
>



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 4/4] Add the NVMM acceleration enlightenments
  2020-02-06 13:09           ` Philippe Mathieu-Daudé
@ 2020-02-06 13:31             ` Kamil Rytarowski
  2020-02-06 14:13               ` Markus Armbruster
  0 siblings, 1 reply; 79+ messages in thread
From: Kamil Rytarowski @ 2020-02-06 13:31 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Markus Armbruster
  Cc: Peter Maydell, Eduardo Habkost, Sergio Lopez, QEMU Developers,
	Paolo Bonzini, max, Richard Henderson

On 06.02.2020 14:09, Philippe Mathieu-Daudé wrote:
> On Thu, Feb 6, 2020 at 2:06 PM Markus Armbruster <armbru@redhat.com> wrote:
>> Kamil Rytarowski <n54@gmx.com> writes:
>>
>>> On 03.02.2020 12:54, Philippe Mathieu-Daudé wrote:
>>>>> @@ -2029,6 +2072,19 @@ static void qemu_whpx_start_vcpu(CPUState *cpu)
>>>>>   #endif
>>>>>   }
>>>>>
>>>>> +static void qemu_nvmm_start_vcpu(CPUState *cpu)
>>>>> +{
>>>>> +    char thread_name[VCPU_THREAD_NAME_SIZE];
>>>>> +
>>>>> +    cpu->thread = g_malloc0(sizeof(QemuThread));
>>>>> +    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>>>>
>>>> Nitpick, we prefer g_new0().
>>>
>>> In this file other qemu_*_start_vcpu() use  g_malloc0().
>>>
>>> I will leave this part unchanged and defer to future style fixups if
>>> someone is interested.
>>
>> Time to re-run Coccinelle with the semantic patch from commit
>> b45c03f585e.
>
> I thought about it, but then noticed it would be clever to modify
> checkpatch to refuse 'g_malloc0?(.*sizeof.*);'
>
>

As the patchset has been reviewed, could we please merge it in its
current (v3) form (*)?

Feel free to fixup the style after that as you like.

We plan to release NetBSD 9.0 in 1-2 weeks unless there will be a delay.

https://blog.netbsd.org/tnf/entry/second_final_release_candidate_for

(*) https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg01405.html
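
For readers following the g_new0() nitpick quoted above, here is a minimal
sketch of why the typed form tends to be preferred. It deliberately does not
use GLib: NEW0(), DemoThread, DemoCond and DemoCPU are hypothetical stand-ins
built on calloc(), mimicking how g_new0(type, n) takes a type and a count and
returns a zeroed pointer already cast to that type.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for QemuThread and QemuCond; not the QEMU types. */
typedef struct { unsigned long id; } DemoThread;
typedef struct { int waiters; } DemoCond;

/*
 * Hypothetical analogue of GLib's g_new0(type, n): allocate n zeroed
 * objects and cast the result to type *, so assigning it to a pointer
 * of a different type draws a compiler diagnostic.
 */
#define NEW0(type, n) ((type *)calloc((n), sizeof(type)))

typedef struct {
    DemoThread *thread;
    DemoCond *halt_cond;
} DemoCPU;

/* Returns 0 on success, -1 if either allocation fails. */
int demo_cpu_alloc(DemoCPU *cpu)
{
    cpu->thread = NEW0(DemoThread, 1);
    cpu->halt_cond = NEW0(DemoCond, 1);
    if (!cpu->thread || !cpu->halt_cond) {
        free(cpu->thread);
        free(cpu->halt_cond);
        cpu->thread = NULL;
        cpu->halt_cond = NULL;
        return -1;
    }
    return 0;
}
```

One difference to keep in mind: the GLib allocators abort the program when
memory runs out, whereas this calloc()-based sketch reports the failure to
the caller.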


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 4/4] Add the NVMM acceleration enlightenments
  2020-02-06 13:31             ` Kamil Rytarowski
@ 2020-02-06 14:13               ` Markus Armbruster
  2020-02-06 15:38                 ` Kamil Rytarowski
  0 siblings, 1 reply; 79+ messages in thread
From: Markus Armbruster @ 2020-02-06 14:13 UTC (permalink / raw)
  To: Kamil Rytarowski
  Cc: Peter Maydell, Eduardo Habkost, Sergio Lopez, QEMU Developers,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	max, Richard Henderson

Kamil Rytarowski <n54@gmx.com> writes:

> On 06.02.2020 14:09, Philippe Mathieu-Daudé wrote:
>> On Thu, Feb 6, 2020 at 2:06 PM Markus Armbruster <armbru@redhat.com> wrote:
>>> Kamil Rytarowski <n54@gmx.com> writes:
>>>
>>>> On 03.02.2020 12:54, Philippe Mathieu-Daudé wrote:
>>>>>> @@ -2029,6 +2072,19 @@ static void qemu_whpx_start_vcpu(CPUState *cpu)
>>>>>>   #endif
>>>>>>   }
>>>>>>
>>>>>> +static void qemu_nvmm_start_vcpu(CPUState *cpu)
>>>>>> +{
>>>>>> +    char thread_name[VCPU_THREAD_NAME_SIZE];
>>>>>> +
>>>>>> +    cpu->thread = g_malloc0(sizeof(QemuThread));
>>>>>> +    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>>>>>
>>>>> Nitpick, we prefer g_new0().
>>>>
>>>> In this file other qemu_*_start_vcpu() use  g_malloc0().
>>>>
>>>> I will leave this part unchanged and defer to future style fixups if
>>>> someone is interested.
>>>
>>> Time to re-run Coccinelle with the semantic patch from commit
>>> b45c03f585e.
>>
>> I thought about it, but then noticed it would be clever to modify
>> checkpatch to refuse 'g_malloc0?(.*sizeof.*);'
>>
>>
>
> As the patchset was reviewed, could we please merge it in the current
> (v3) form (*) please?

No objection.  If I wanted you to clean this up before we accept your
work, I would've told you :)

[...]
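
The checkpatch idea quoted above can be tried out in isolation. The sketch
below is only an illustration, under the assumption that the quoted pattern
is compiled as a POSIX extended regular expression through <regex.h>;
checkpatch itself is a Perl script, and line_uses_malloc_sizeof() is a
hypothetical helper name.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Returns 1 if `line` matches the pattern quoted in the thread, 0 if it
 * does not, and -1 if the pattern fails to compile.
 */
int line_uses_malloc_sizeof(const char *line)
{
    /* Same text as in the thread, compiled as a POSIX extended regex. */
    static const char pattern[] = "g_malloc0?(.*sizeof.*);";
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return -1;
    }
    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Read as a regular expression, the parentheses form a group instead of
matching literal parentheses, so the pattern would also flag a line such as
"p = g_malloc_n(n, sizeof(x));". A real check would probably escape them.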



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 4/4] Add the NVMM acceleration enlightenments
  2020-02-06 14:13               ` Markus Armbruster
@ 2020-02-06 15:38                 ` Kamil Rytarowski
  2020-02-06 16:07                   ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 79+ messages in thread
From: Kamil Rytarowski @ 2020-02-06 15:38 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Peter Maydell, Eduardo Habkost, Sergio Lopez, QEMU Developers,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	max, Richard Henderson

On 06.02.2020 15:13, Markus Armbruster wrote:
> Kamil Rytarowski <n54@gmx.com> writes:
>
>> On 06.02.2020 14:09, Philippe Mathieu-Daudé wrote:
>>> On Thu, Feb 6, 2020 at 2:06 PM Markus Armbruster <armbru@redhat.com> wrote:
>>>> Kamil Rytarowski <n54@gmx.com> writes:
>>>>
>>>>> On 03.02.2020 12:54, Philippe Mathieu-Daudé wrote:
>>>>>>> @@ -2029,6 +2072,19 @@ static void qemu_whpx_start_vcpu(CPUState *cpu)
>>>>>>>   #endif
>>>>>>>   }
>>>>>>>
>>>>>>> +static void qemu_nvmm_start_vcpu(CPUState *cpu)
>>>>>>> +{
>>>>>>> +    char thread_name[VCPU_THREAD_NAME_SIZE];
>>>>>>> +
>>>>>>> +    cpu->thread = g_malloc0(sizeof(QemuThread));
>>>>>>> +    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>>>>>>
>>>>>> Nitpick, we prefer g_new0().
>>>>>
>>>>> In this file other qemu_*_start_vcpu() use  g_malloc0().
>>>>>
>>>>> I will leave this part unchanged and defer to future style fixups if
>>>>> someone is interested.
>>>>
>>>> Time to re-run Coccinelle with the semantic patch from commit
>>>> b45c03f585e.
>>>
>>> I thought about it, but then noticed it would be clever to modify
>>> checkpatch to refuse 'g_malloc0?(.*sizeof.*);'
>>>
>>>
>>
>> As the patchset was reviewed, could we please merge it in the current
>> (v3) form (*) please?
>
> No objection.  If I wanted you to clean this up before we accept your
> work, I would've told you :)
>
> [...]
>
>

I see. I don't have a merge queue of my own, so I depend on yours.

Thank you in advance!


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v3 0/4] Implements the NetBSD Virtual Machine Monitor accelerator
  2020-02-06 13:21       ` Kamil Rytarowski
@ 2020-02-06 16:01         ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 79+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-02-06 16:01 UTC (permalink / raw)
  To: Kamil Rytarowski, qemu-devel
  Cc: patchew-devel, peter.maydell, ehabkost, slp, pbonzini, max, rth

On 2/6/20 2:21 PM, Kamil Rytarowski wrote:
> Am I supposed to do something with this or is this an issue in a script?

I think it is either a full disk or a network failure.

> On 06.02.2020 14:13, no-reply@patchew.org wrote:
>> Patchew URL: https://patchew.org/QEMU/20200206115731.13552-1-n54@gmx.com/
>>
>>
>>
>> Hi,
>>
>> This series seems to have some coding style problems. See output below for
>> more information:
>>
>> Subject: [PATCH v3 0/4] Implements the NetBSD Virtual Machine Monitor accelerator
>> Message-id: 20200206115731.13552-1-n54@gmx.com
>> Type: series
>>
>> === TEST SCRIPT BEGIN ===
>> #!/bin/bash
>> git rev-parse base > /dev/null || exit 0
>> git config --local diff.renamelimit 0
>> git config --local diff.renames True
>> git config --local diff.algorithm histogram
>> ./scripts/checkpatch.pl --mailback base..
>> === TEST SCRIPT END ===
>>
>> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
>> fatal: git fetch_pack: expected ACK/NAK, got 'ERR upload-pack: not our ref 1c298dad3d820f7a2161054ff581cf2fa65ee1b4'
>> fatal: The remote end hung up unexpectedly
>> error: Could not fetch 3c8cf5a9c21ff8782164d1def7f44bd888713384
>> Traceback (most recent call last):
>>    File "patchew-tester/src/patchew-cli", line 521, in test_one
>>      git_clone_repo(clone, r["repo"], r["head"], logf, True)
>>    File "patchew-tester/src/patchew-cli", line 48, in git_clone_repo
>>      stdout=logf, stderr=logf)
>>    File "/opt/rh/rh-python36/root/usr/lib64/python3.6/subprocess.py", line 291, in check_call
>>      raise CalledProcessError(retcode, cmd)
>> subprocess.CalledProcessError: Command '['git', 'remote', 'add', '-f', '--mirror=fetch', '3c8cf5a9c21ff8782164d1def7f44bd888713384', 'https://github.com/patchew-project/qemu']' returned non-zero exit status 1.
>>
>>
>>
>> The full log is available at
>> http://patchew.org/logs/20200206115731.13552-1-n54@gmx.com/testing.checkpatch/?type=message.



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 4/4] Add the NVMM acceleration enlightenments
  2020-02-06 15:38                 ` Kamil Rytarowski
@ 2020-02-06 16:07                   ` Philippe Mathieu-Daudé
  2020-02-06 16:59                     ` Kamil Rytarowski
  0 siblings, 1 reply; 79+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-02-06 16:07 UTC (permalink / raw)
  To: Kamil Rytarowski
  Cc: Peter Maydell, Eduardo Habkost, Sergio Lopez, QEMU Developers,
	Markus Armbruster, Paolo Bonzini, max, Richard Henderson

On 2/6/20 4:38 PM, Kamil Rytarowski wrote:
> On 06.02.2020 15:13, Markus Armbruster wrote:
>> Kamil Rytarowski <n54@gmx.com> writes:
>>
>>> On 06.02.2020 14:09, Philippe Mathieu-Daudé wrote:
>>>> On Thu, Feb 6, 2020 at 2:06 PM Markus Armbruster <armbru@redhat.com> wrote:
>>>>> Kamil Rytarowski <n54@gmx.com> writes:
>>>>>
>>>>>> On 03.02.2020 12:54, Philippe Mathieu-Daudé wrote:
>>>>>>>> @@ -2029,6 +2072,19 @@ static void qemu_whpx_start_vcpu(CPUState *cpu)
>>>>>>>>    #endif
>>>>>>>>    }
>>>>>>>>
>>>>>>>> +static void qemu_nvmm_start_vcpu(CPUState *cpu)
>>>>>>>> +{
>>>>>>>> +    char thread_name[VCPU_THREAD_NAME_SIZE];
>>>>>>>> +
>>>>>>>> +    cpu->thread = g_malloc0(sizeof(QemuThread));
>>>>>>>> +    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>>>>>>>
>>>>>>> Nitpick, we prefer g_new0().
>>>>>>
>>>>>> In this file other qemu_*_start_vcpu() use  g_malloc0().
>>>>>>
>>>>>> I will leave this part unchanged and defer to future style fixups if
>>>>>> someone is interested.
>>>>>
>>>>> Time to re-run Coccinelle with the semantic patch from commit
>>>>> b45c03f585e.
>>>>
>>>> I thought about it, but then noticed it would be clever to modify
>>>> checkpatch to refuse 'g_malloc0?(.*sizeof.*);'
>>>>
>>>>
>>>
>>> As the patchset was reviewed, could we please merge it in the current
>>> (v3) form (*) please?
>>
>> No objection.  If I wanted you to clean this up before we accept your
>> work, I would've told you :)
>>
>> [...]
>>
>>
> 
> I see. I don't own myself a merge queue so I depend on yours.

Since, as you said [*], you'd love to have this feature in NetBSD 9.0, I
have no objection either. You still need an x86 specialist to review
patch 3; the usual reviewers (Paolo, Eduardo, Richard) are currently very
busy.

Also, while I'd love to use this feature to run QEMU CI on NetBSD
regularly, I don't have time to test it on bare-metal hardware :|
Do you perhaps know someone from the NetBSD community who already has?

[*] https://www.mail-archive.com/qemu-devel@nongnu.org/msg676199.html



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 4/4] Add the NVMM acceleration enlightenments
  2020-02-06 16:07                   ` Philippe Mathieu-Daudé
@ 2020-02-06 16:59                     ` Kamil Rytarowski
  0 siblings, 0 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-02-06 16:59 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Peter Maydell, Eduardo Habkost, Sergio Lopez, Markus Armbruster,
	QEMU Developers, Paolo Bonzini, max, Richard Henderson

On 06.02.2020 17:07, Philippe Mathieu-Daudé wrote:
> On 2/6/20 4:38 PM, Kamil Rytarowski wrote:
>> On 06.02.2020 15:13, Markus Armbruster wrote:
>>> Kamil Rytarowski <n54@gmx.com> writes:
>>>
>>>> On 06.02.2020 14:09, Philippe Mathieu-Daudé wrote:
>>>>> On Thu, Feb 6, 2020 at 2:06 PM Markus Armbruster
>>>>> <armbru@redhat.com> wrote:
>>>>>> Kamil Rytarowski <n54@gmx.com> writes:
>>>>>>
>>>>>>> On 03.02.2020 12:54, Philippe Mathieu-Daudé wrote:
>>>>>>>>> @@ -2029,6 +2072,19 @@ static void
>>>>>>>>> qemu_whpx_start_vcpu(CPUState *cpu)
>>>>>>>>>    #endif
>>>>>>>>>    }
>>>>>>>>>
>>>>>>>>> +static void qemu_nvmm_start_vcpu(CPUState *cpu)
>>>>>>>>> +{
>>>>>>>>> +    char thread_name[VCPU_THREAD_NAME_SIZE];
>>>>>>>>> +
>>>>>>>>> +    cpu->thread = g_malloc0(sizeof(QemuThread));
>>>>>>>>> +    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
>>>>>>>>
>>>>>>>> Nitpick, we prefer g_new0().
>>>>>>>
>>>>>>> In this file other qemu_*_start_vcpu() use  g_malloc0().
>>>>>>>
>>>>>>> I will leave this part unchanged and defer to future style
>>>>>>> fixups if
>>>>>>> someone is interested.
>>>>>>
>>>>>> Time to re-run Coccinelle with the semantic patch from commit
>>>>>> b45c03f585e.
>>>>>
>>>>> I thought about it, but then noticed it would be clever to modify
>>>>> checkpatch to refuse 'g_malloc0?(.*sizeof.*);'
>>>>>
>>>>>
>>>>
>>>> As the patchset was reviewed, could we please merge it in the current
>>>> (v3) form (*) please?
>>>
>>> No objection.  If I wanted you to clean this up before we accept your
>>> work, I would've told you :)
>>>
>>> [...]
>>>
>>>
>>
>> I see. I don't own myself a merge queue so I depend on yours.
>
> As you said [*] you'd love to have this feature in NetBSD 9.0, no
> objection neither. You still need some X86 specialist to review patch 3.
> The usual reviewers Paolo/Eduardo/Richard are currently very busy.
>
> Also while I'd love to use this feature to be able to regularly run QEMU
> CI on NetBSD, I don't have time to test it on a bare metal hardware :|
> Maybe do you know someone from the NetBSD community who already did?
>
> [*] https://www.mail-archive.com/qemu-devel@nongnu.org/msg676199.html
>
>

I'm going to find a person to test it and submit "Tested-by:".


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v3 1/4] Add the NVMM vcpu API
  2020-02-06 11:57     ` [PATCH v3 1/4] Add the NVMM vcpu API Kamil Rytarowski
@ 2020-02-06 21:06       ` Jared McNeill
  0 siblings, 0 replies; 79+ messages in thread
From: Jared McNeill @ 2020-02-06 21:06 UTC (permalink / raw)
  To: Kamil Rytarowski
  Cc: peter.maydell, ehabkost, slp, Kamil Rytarowski, qemu-devel,
	pbonzini, philmd, max, rth

[-- Attachment #1: Type: text/plain, Size: 3441 bytes --]

Tested-by: Jared McNeill <jmcneill@invisible.ca>

On Thu, 6 Feb 2020, Kamil Rytarowski wrote:

> From: Maxime Villard <max@m00nbsd.net>
>
> Adds support for the NetBSD Virtual Machine Monitor (NVMM) stubs and
> introduces the nvmm.h sysemu API for vcpu scheduling and management.
>
> Signed-off-by: Maxime Villard <max@m00nbsd.net>
> Signed-off-by: Kamil Rytarowski <n54@gmx.com>
> Reviewed-by: Sergio Lopez <slp@redhat.com>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
> accel/stubs/Makefile.objs |  1 +
> accel/stubs/nvmm-stub.c   | 43 +++++++++++++++++++++++++++++++++++++++
> include/sysemu/nvmm.h     | 35 +++++++++++++++++++++++++++++++
> 3 files changed, 79 insertions(+)
> create mode 100644 accel/stubs/nvmm-stub.c
> create mode 100644 include/sysemu/nvmm.h
>
> diff --git a/accel/stubs/Makefile.objs b/accel/stubs/Makefile.objs
> index 3894caf95d..09f2d3e1dd 100644
> --- a/accel/stubs/Makefile.objs
> +++ b/accel/stubs/Makefile.objs
> @@ -1,5 +1,6 @@
> obj-$(call lnot,$(CONFIG_HAX))  += hax-stub.o
> obj-$(call lnot,$(CONFIG_HVF))  += hvf-stub.o
> obj-$(call lnot,$(CONFIG_WHPX)) += whpx-stub.o
> +obj-$(call lnot,$(CONFIG_NVMM)) += nvmm-stub.o
> obj-$(call lnot,$(CONFIG_KVM))  += kvm-stub.o
> obj-$(call lnot,$(CONFIG_TCG))  += tcg-stub.o
> diff --git a/accel/stubs/nvmm-stub.c b/accel/stubs/nvmm-stub.c
> new file mode 100644
> index 0000000000..c2208b84a3
> --- /dev/null
> +++ b/accel/stubs/nvmm-stub.c
> @@ -0,0 +1,43 @@
> +/*
> + * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
> + *
> + * NetBSD Virtual Machine Monitor (NVMM) accelerator stub.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +#include "cpu.h"
> +#include "sysemu/nvmm.h"
> +
> +int nvmm_init_vcpu(CPUState *cpu)
> +{
> +    return -1;
> +}
> +
> +int nvmm_vcpu_exec(CPUState *cpu)
> +{
> +    return -1;
> +}
> +
> +void nvmm_destroy_vcpu(CPUState *cpu)
> +{
> +}
> +
> +void nvmm_cpu_synchronize_state(CPUState *cpu)
> +{
> +}
> +
> +void nvmm_cpu_synchronize_post_reset(CPUState *cpu)
> +{
> +}
> +
> +void nvmm_cpu_synchronize_post_init(CPUState *cpu)
> +{
> +}
> +
> +void nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu)
> +{
> +}
> diff --git a/include/sysemu/nvmm.h b/include/sysemu/nvmm.h
> new file mode 100644
> index 0000000000..10496f3980
> --- /dev/null
> +++ b/include/sysemu/nvmm.h
> @@ -0,0 +1,35 @@
> +/*
> + * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
> + *
> + * NetBSD Virtual Machine Monitor (NVMM) accelerator support.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_NVMM_H
> +#define QEMU_NVMM_H
> +
> +#include "config-host.h"
> +#include "qemu-common.h"
> +
> +int nvmm_init_vcpu(CPUState *);
> +int nvmm_vcpu_exec(CPUState *);
> +void nvmm_destroy_vcpu(CPUState *);
> +
> +void nvmm_cpu_synchronize_state(CPUState *);
> +void nvmm_cpu_synchronize_post_reset(CPUState *);
> +void nvmm_cpu_synchronize_post_init(CPUState *);
> +void nvmm_cpu_synchronize_pre_loadvm(CPUState *);
> +
> +#ifdef CONFIG_NVMM
> +
> +int nvmm_enabled(void);
> +
> +#else /* CONFIG_NVMM */
> +
> +#define nvmm_enabled() (0)
> +
> +#endif /* CONFIG_NVMM */
> +
> +#endif /* QEMU_NVMM_H */
> --
> 2.25.0
>
>
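
The header quoted above pairs a real nvmm_enabled() with a constant-zero
macro when CONFIG_NVMM is not set, while the stub file supplies empty
functions so callers still link. A self-contained sketch of that pattern
follows; every demo_* name and DEMO_CONFIG_ACCEL are hypothetical stand-ins,
not QEMU identifiers.

```c
/* Hypothetical stand-in for CPUState. */
typedef struct {
    int index;
    int synced;
} DemoCPU;

#ifdef DEMO_CONFIG_ACCEL

/* "Real" build: the accelerator is present and does actual work. */
int demo_accel_enabled(void)
{
    return 1;
}

void demo_accel_synchronize_state(DemoCPU *cpu)
{
    cpu->synced = 1;
}

#else /* DEMO_CONFIG_ACCEL */

/* Constant 0, so the compiler can drop every branch guarded by it. */
#define demo_accel_enabled() (0)

/* Empty stub: keeps callers compiling and linking without the feature. */
void demo_accel_synchronize_state(DemoCPU *cpu)
{
    (void)cpu;
}

#endif /* DEMO_CONFIG_ACCEL */

/* Caller in the style of cpu_synchronize_state() from hw_accel.h. */
void demo_cpu_synchronize_state(DemoCPU *cpu)
{
    if (demo_accel_enabled()) {
        demo_accel_synchronize_state(cpu);
    }
}
```

Built without -DDEMO_CONFIG_ACCEL, the call in demo_cpu_synchronize_state()
is guarded by a literal 0 and never runs; built with it, the same caller
reaches the real implementation with no source change.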

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v3 2/4] Add the NetBSD Virtual Machine Monitor accelerator.
  2020-02-06 11:57     ` [PATCH v3 2/4] Add the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
@ 2020-02-06 21:06       ` Jared McNeill
  0 siblings, 0 replies; 79+ messages in thread
From: Jared McNeill @ 2020-02-06 21:06 UTC (permalink / raw)
  To: Kamil Rytarowski
  Cc: peter.maydell, ehabkost, slp, Kamil Rytarowski, qemu-devel,
	pbonzini, philmd, max, rth

[-- Attachment #1: Type: text/plain, Size: 6743 bytes --]

Tested-by: Jared McNeill <jmcneill@invisible.ca>

On Thu, 6 Feb 2020, Kamil Rytarowski wrote:

> From: Maxime Villard <max@m00nbsd.net>
>
> Introduces the configure support for the new NetBSD Virtual Machine Monitor that
> allows for hypervisor acceleration from usermode components on the NetBSD
> platform.
>
> Signed-off-by: Maxime Villard <max@m00nbsd.net>
> Signed-off-by: Kamil Rytarowski <n54@gmx.com>
> Reviewed-by: Sergio Lopez <slp@redhat.com>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
> configure       | 37 +++++++++++++++++++++++++++++++++++++
> qemu-options.hx | 16 ++++++++--------
> 2 files changed, 45 insertions(+), 8 deletions(-)
>
> diff --git a/configure b/configure
> index 115dc38085..d4a837cf9d 100755
> --- a/configure
> +++ b/configure
> @@ -241,6 +241,17 @@ supported_whpx_target() {
>     return 1
> }
>
> +supported_nvmm_target() {
> +    test "$nvmm" = "yes" || return 1
> +    glob "$1" "*-softmmu" || return 1
> +    case "${1%-softmmu}" in
> +        i386|x86_64)
> +            return 0
> +        ;;
> +    esac
> +    return 1
> +}
> +
> supported_target() {
>     case "$1" in
>         *-softmmu)
> @@ -268,6 +279,7 @@ supported_target() {
>     supported_hax_target "$1" && return 0
>     supported_hvf_target "$1" && return 0
>     supported_whpx_target "$1" && return 0
> +    supported_nvmm_target "$1" && return 0
>     print_error "TCG disabled, but hardware accelerator not available for '$target'"
>     return 1
> }
> @@ -388,6 +400,7 @@ kvm="no"
> hax="no"
> hvf="no"
> whpx="no"
> +nvmm="no"
> rdma=""
> pvrdma=""
> gprof="no"
> @@ -823,6 +836,7 @@ DragonFly)
> NetBSD)
>   bsd="yes"
>   hax="yes"
> +  nvmm="yes"
>   make="${MAKE-gmake}"
>   audio_drv_list="oss try-sdl"
>   audio_possible_drivers="oss sdl"
> @@ -1169,6 +1183,10 @@ for opt do
>   ;;
>   --enable-whpx) whpx="yes"
>   ;;
> +  --disable-nvmm) nvmm="no"
> +  ;;
> +  --enable-nvmm) nvmm="yes"
> +  ;;
>   --disable-tcg-interpreter) tcg_interpreter="no"
>   ;;
>   --enable-tcg-interpreter) tcg_interpreter="yes"
> @@ -1773,6 +1791,7 @@ disabled with --disable-FEATURE, default is enabled if available:
>   hax             HAX acceleration support
>   hvf             Hypervisor.framework acceleration support
>   whpx            Windows Hypervisor Platform acceleration support
> +  nvmm            NetBSD Virtual Machine Monitor acceleration support
>   rdma            Enable RDMA-based migration
>   pvrdma          Enable PVRDMA support
>   vde             support for vde network
> @@ -2764,6 +2783,20 @@ if test "$whpx" != "no" ; then
>     fi
> fi
>
> +##########################################
> +# NetBSD Virtual Machine Monitor (NVMM) accelerator check
> +if test "$nvmm" != "no" ; then
> +    if check_include "nvmm.h" ; then
> +        nvmm="yes"
> +        LIBS="-lnvmm $LIBS"
> +    else
> +        if test "$nvmm" = "yes"; then
> +            feature_not_found "NVMM" "NVMM is not available"
> +        fi
> +        nvmm="no"
> +    fi
> +fi
> +
> ##########################################
> # Sparse probe
> if test "$sparse" != "no" ; then
> @@ -6543,6 +6576,7 @@ echo "KVM support       $kvm"
> echo "HAX support       $hax"
> echo "HVF support       $hvf"
> echo "WHPX support      $whpx"
> +echo "NVMM support      $nvmm"
> echo "TCG support       $tcg"
> if test "$tcg" = "yes" ; then
>     echo "TCG debug enabled $debug_tcg"
> @@ -7828,6 +7862,9 @@ fi
> if test "$target_aligned_only" = "yes" ; then
>   echo "TARGET_ALIGNED_ONLY=y" >> $config_target_mak
> fi
> +if supported_nvmm_target $target; then
> +    echo "CONFIG_NVMM=y" >> $config_target_mak
> +fi
> if test "$target_bigendian" = "yes" ; then
>   echo "TARGET_WORDS_BIGENDIAN=y" >> $config_target_mak
> fi
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 224a8e8712..10c046c916 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -31,7 +31,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>     "-machine [type=]name[,prop[=value][,...]]\n"
>     "                selects emulated machine ('-machine help' for list)\n"
>     "                property accel=accel1[:accel2[:...]] selects accelerator\n"
> -    "                supported accelerators are kvm, xen, hax, hvf, whpx or tcg (default: tcg)\n"
> +    "                supported accelerators are kvm, xen, hax, hvf, nvmm, whpx or tcg (default: tcg)\n"
>     "                vmport=on|off|auto controls emulation of vmport (default: auto)\n"
>     "                dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
>     "                mem-merge=on|off controls memory merge support (default: on)\n"
> @@ -64,9 +64,9 @@ Supported machine properties are:
> @table @option
> @item accel=@var{accels1}[:@var{accels2}[:...]]
> This is used to enable an accelerator. Depending on the target architecture,
> -kvm, xen, hax, hvf, whpx or tcg can be available. By default, tcg is used. If there is
> -more than one accelerator specified, the next one is used if the previous one
> -fails to initialize.
> +kvm, xen, hax, hvf, nvmm, whpx or tcg can be available. By default, tcg is used.
> +If there is more than one accelerator specified, the next one is used if the
> +previous one fails to initialize.
> @item vmport=on|off|auto
> Enables emulation of VMWare IO port, for vmmouse etc. auto says to select the
> value based on accel. For accel=xen the default is off otherwise the default
> @@ -114,7 +114,7 @@ ETEXI
>
> DEF("accel", HAS_ARG, QEMU_OPTION_accel,
>     "-accel [accel=]accelerator[,prop[=value][,...]]\n"
> -    "                select accelerator (kvm, xen, hax, hvf, whpx or tcg; use 'help' for a list)\n"
> +    "                select accelerator (kvm, xen, hax, hvf, nvmm, whpx or tcg; use 'help' for a list)\n"
>     "                igd-passthru=on|off (enable Xen integrated Intel graphics passthrough, default=off)\n"
>     "                kernel-irqchip=on|off|split controls accelerated irqchip support (default=on)\n"
>     "                kvm-shadow-mem=size of KVM shadow MMU in bytes\n"
> @@ -124,9 +124,9 @@ STEXI
> @item -accel @var{name}[,prop=@var{value}[,...]]
> @findex -accel
> This is used to enable an accelerator. Depending on the target architecture,
> -kvm, xen, hax, hvf, whpx or tcg can be available. By default, tcg is used. If there is
> -more than one accelerator specified, the next one is used if the previous one
> -fails to initialize.
> +kvm, xen, hax, hvf, nvmm, whpx or tcg can be available. By default, tcg is used.
> +If there is more than one accelerator specified, the next one is used if the
> +previous one fails to initialize.
> @table @option
> @item igd-passthru=on|off
> When Xen is in use, this option controls whether Intel integrated graphics
> --
> 2.25.0
>
>
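
The option text in the patch above says that with accel=accel1[:accel2[:...]]
the next accelerator is used if the previous one fails to initialize. That
fallback can be sketched as below. This is not QEMU's option parser:
demo_accel_init() and demo_select_accel() are hypothetical, and the init hook
simply pretends that only "tcg" is available.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical init hook: only "tcg" succeeds in this sketch. */
static int demo_accel_init(const char *name)
{
    return strcmp(name, "tcg") == 0 ? 0 : -1;
}

/*
 * Walk a colon-separated list such as "nvmm:tcg" and leave the first
 * accelerator whose init succeeds in `chosen`. Returns 0 on success and
 * -1 if no entry initialized; `chosen` is then set to the empty string.
 */
int demo_select_accel(const char *list, char *chosen, size_t chosen_len)
{
    const char *p = list;

    while (*p != '\0') {
        size_t len = strcspn(p, ":");   /* length of the current entry */

        if (len > 0 && len < chosen_len) {
            memcpy(chosen, p, len);
            chosen[len] = '\0';
            if (demo_accel_init(chosen) == 0) {
                return 0;
            }
        }
        p += len;
        if (*p == ':') {
            p++;                        /* skip the separator */
        }
    }
    if (chosen_len > 0) {
        chosen[0] = '\0';
    }
    return -1;
}
```

With this hook, "nvmm:tcg" falls through to tcg, which mirrors what a user
would see on a host where NVMM cannot be initialized.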

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v3 4/4] Add the NVMM acceleration enlightenments
  2020-02-06 11:57     ` [PATCH v3 4/4] Add the NVMM acceleration enlightenments Kamil Rytarowski
@ 2020-02-06 21:07       ` Jared McNeill
  0 siblings, 0 replies; 79+ messages in thread
From: Jared McNeill @ 2020-02-06 21:07 UTC (permalink / raw)
  To: Kamil Rytarowski
  Cc: peter.maydell, ehabkost, slp, Kamil Rytarowski, qemu-devel,
	pbonzini, philmd, max, rth

[-- Attachment #1: Type: text/plain, Size: 5330 bytes --]

Tested-by: Jared McNeill <jmcneill@invisible.ca>

On Thu, 6 Feb 2020, Kamil Rytarowski wrote:

> From: Maxime Villard <max@m00nbsd.net>
>
> Implements the NVMM accelerator cpu enlightenments to actually use the nvmm-all
> accelerator on NetBSD platforms.
>
> Signed-off-by: Maxime Villard <max@m00nbsd.net>
> Signed-off-by: Kamil Rytarowski <n54@gmx.com>
> Reviewed-by: Sergio Lopez <slp@redhat.com>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
> cpus.c                    | 58 +++++++++++++++++++++++++++++++++++++++
> include/sysemu/hw_accel.h | 14 ++++++++++
> target/i386/helper.c      |  2 +-
> 3 files changed, 73 insertions(+), 1 deletion(-)
>
> diff --git a/cpus.c b/cpus.c
> index b4f8b84b61..f833da4a60 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -42,6 +42,7 @@
> #include "sysemu/hax.h"
> #include "sysemu/hvf.h"
> #include "sysemu/whpx.h"
> +#include "sysemu/nvmm.h"
> #include "exec/exec-all.h"
>
> #include "qemu/thread.h"
> @@ -1670,6 +1671,48 @@ static void *qemu_whpx_cpu_thread_fn(void *arg)
>     return NULL;
> }
>
> +static void *qemu_nvmm_cpu_thread_fn(void *arg)
> +{
> +    CPUState *cpu = arg;
> +    int r;
> +
> +    assert(nvmm_enabled());
> +
> +    rcu_register_thread();
> +
> +    qemu_mutex_lock_iothread();
> +    qemu_thread_get_self(cpu->thread);
> +    cpu->thread_id = qemu_get_thread_id();
> +    current_cpu = cpu;
> +
> +    r = nvmm_init_vcpu(cpu);
> +    if (r < 0) {
> +        fprintf(stderr, "nvmm_init_vcpu failed: %s\n", strerror(-r));
> +        exit(1);
> +    }
> +
> +    /* signal CPU creation */
> +    cpu->created = true;
> +    qemu_cond_signal(&qemu_cpu_cond);
> +
> +    do {
> +        if (cpu_can_run(cpu)) {
> +            r = nvmm_vcpu_exec(cpu);
> +            if (r == EXCP_DEBUG) {
> +                cpu_handle_guest_debug(cpu);
> +            }
> +        }
> +        qemu_wait_io_event(cpu);
> +    } while (!cpu->unplug || cpu_can_run(cpu));
> +
> +    nvmm_destroy_vcpu(cpu);
> +    cpu->created = false;
> +    qemu_cond_signal(&qemu_cpu_cond);
> +    qemu_mutex_unlock_iothread();
> +    rcu_unregister_thread();
> +    return NULL;
> +}
> +
> #ifdef _WIN32
> static void CALLBACK dummy_apc_func(ULONG_PTR unused)
> {
> @@ -2038,6 +2081,19 @@ static void qemu_whpx_start_vcpu(CPUState *cpu)
> #endif
> }
>
> +static void qemu_nvmm_start_vcpu(CPUState *cpu)
> +{
> +    char thread_name[VCPU_THREAD_NAME_SIZE];
> +
> +    cpu->thread = g_malloc0(sizeof(QemuThread));
> +    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
> +    qemu_cond_init(cpu->halt_cond);
> +    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/NVMM",
> +             cpu->cpu_index);
> +    qemu_thread_create(cpu->thread, thread_name, qemu_nvmm_cpu_thread_fn,
> +                       cpu, QEMU_THREAD_JOINABLE);
> +}
> +
> static void qemu_dummy_start_vcpu(CPUState *cpu)
> {
>     char thread_name[VCPU_THREAD_NAME_SIZE];
> @@ -2078,6 +2134,8 @@ void qemu_init_vcpu(CPUState *cpu)
>         qemu_tcg_init_vcpu(cpu);
>     } else if (whpx_enabled()) {
>         qemu_whpx_start_vcpu(cpu);
> +    } else if (nvmm_enabled()) {
> +        qemu_nvmm_start_vcpu(cpu);
>     } else {
>         qemu_dummy_start_vcpu(cpu);
>     }
> diff --git a/include/sysemu/hw_accel.h b/include/sysemu/hw_accel.h
> index 0ec2372477..dbfa7a02f9 100644
> --- a/include/sysemu/hw_accel.h
> +++ b/include/sysemu/hw_accel.h
> @@ -15,6 +15,7 @@
> #include "sysemu/hax.h"
> #include "sysemu/kvm.h"
> #include "sysemu/whpx.h"
> +#include "sysemu/nvmm.h"
>
> static inline void cpu_synchronize_state(CPUState *cpu)
> {
> @@ -27,6 +28,9 @@ static inline void cpu_synchronize_state(CPUState *cpu)
>     if (whpx_enabled()) {
>         whpx_cpu_synchronize_state(cpu);
>     }
> +    if (nvmm_enabled()) {
> +        nvmm_cpu_synchronize_state(cpu);
> +    }
> }
>
> static inline void cpu_synchronize_post_reset(CPUState *cpu)
> @@ -40,6 +44,10 @@ static inline void cpu_synchronize_post_reset(CPUState *cpu)
>     if (whpx_enabled()) {
>         whpx_cpu_synchronize_post_reset(cpu);
>     }
> +    if (nvmm_enabled()) {
> +        nvmm_cpu_synchronize_post_reset(cpu);
> +    }
> +
> }
>
> static inline void cpu_synchronize_post_init(CPUState *cpu)
> @@ -53,6 +61,9 @@ static inline void cpu_synchronize_post_init(CPUState *cpu)
>     if (whpx_enabled()) {
>         whpx_cpu_synchronize_post_init(cpu);
>     }
> +    if (nvmm_enabled()) {
> +        nvmm_cpu_synchronize_post_init(cpu);
> +    }
> }
>
> static inline void cpu_synchronize_pre_loadvm(CPUState *cpu)
> @@ -66,6 +77,9 @@ static inline void cpu_synchronize_pre_loadvm(CPUState *cpu)
>     if (whpx_enabled()) {
>         whpx_cpu_synchronize_pre_loadvm(cpu);
>     }
> +    if (nvmm_enabled()) {
> +        nvmm_cpu_synchronize_pre_loadvm(cpu);
> +    }
> }
>
> #endif /* QEMU_HW_ACCEL_H */
> diff --git a/target/i386/helper.c b/target/i386/helper.c
> index c3a6e4fabe..2e79d61329 100644
> --- a/target/i386/helper.c
> +++ b/target/i386/helper.c
> @@ -981,7 +981,7 @@ void cpu_report_tpr_access(CPUX86State *env, TPRAccess access)
>     X86CPU *cpu = env_archcpu(env);
>     CPUState *cs = env_cpu(env);
>
> -    if (kvm_enabled() || whpx_enabled()) {
> +    if (kvm_enabled() || whpx_enabled() || nvmm_enabled()) {
>         env->tpr_access_type = access;
>
>         cpu_interrupt(cs, CPU_INTERRUPT_TPR);
> --
> 2.25.0
>
>
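
The thread function in the patch above follows a common handshake: take the
big lock, mark the vcpu as created and signal the creator, then loop until
unplug is requested, sleeping between runs. A simplified pthread sketch of
that shape is given here. It is an assumption-laden illustration, not QEMU
code: DemoVCPU and the demo_* functions are hypothetical, one mutex plays
the role of the iothread lock, and a counter stands where nvmm_vcpu_exec()
would run the guest.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t lock;         /* plays the role of the iothread lock */
    pthread_cond_t created_cond;  /* signalled on creation and teardown */
    pthread_cond_t halt_cond;     /* the vcpu thread sleeps on this */
    bool created;
    bool unplug;
    int runs;
} DemoVCPU;

static void *demo_vcpu_thread_fn(void *arg)
{
    DemoVCPU *cpu = arg;

    pthread_mutex_lock(&cpu->lock);

    /* Signal CPU creation to the thread that started us. */
    cpu->created = true;
    pthread_cond_signal(&cpu->created_cond);

    while (!cpu->unplug) {
        cpu->runs++;   /* stand-in for running the guest once */
        /* Sleep until someone kicks the vcpu; releases the lock. */
        pthread_cond_wait(&cpu->halt_cond, &cpu->lock);
    }

    cpu->created = false;
    pthread_cond_signal(&cpu->created_cond);
    pthread_mutex_unlock(&cpu->lock);
    return NULL;
}

/* Start one vcpu thread, wait for it, unplug it; returns its run count. */
int demo_vcpu_run_once(void)
{
    DemoVCPU cpu;
    pthread_t tid;
    int runs;

    pthread_mutex_init(&cpu.lock, NULL);
    pthread_cond_init(&cpu.created_cond, NULL);
    pthread_cond_init(&cpu.halt_cond, NULL);
    cpu.created = false;
    cpu.unplug = false;
    cpu.runs = 0;

    if (pthread_create(&tid, NULL, demo_vcpu_thread_fn, &cpu) != 0) {
        return -1;
    }

    pthread_mutex_lock(&cpu.lock);
    while (!cpu.created) {
        pthread_cond_wait(&cpu.created_cond, &cpu.lock);
    }
    cpu.unplug = true;                    /* request teardown */
    pthread_cond_signal(&cpu.halt_cond);  /* kick the sleeping vcpu */
    pthread_mutex_unlock(&cpu.lock);

    pthread_join(tid, NULL);
    runs = cpu.runs;

    pthread_cond_destroy(&cpu.halt_cond);
    pthread_cond_destroy(&cpu.created_cond);
    pthread_mutex_destroy(&cpu.lock);
    return runs;
}
```

Because the vcpu thread holds the lock from the moment it sets created until
it sleeps on halt_cond, the creator can only observe created == true once
the vcpu is already waiting, so the kick is not lost.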

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v3 3/4] Introduce the NVMM impl
  2020-02-06 11:57     ` [PATCH v3 3/4] Introduce the NVMM impl Kamil Rytarowski
@ 2020-02-06 21:07       ` Jared McNeill
  0 siblings, 0 replies; 79+ messages in thread
From: Jared McNeill @ 2020-02-06 21:07 UTC (permalink / raw)
  To: Kamil Rytarowski
  Cc: peter.maydell, ehabkost, slp, Kamil Rytarowski, qemu-devel,
	pbonzini, philmd, max, rth

Tested-by: Jared McNeill <jmcneill@invisible.ca>

On Thu, 6 Feb 2020, Kamil Rytarowski wrote:

> From: Maxime Villard <max@m00nbsd.net>
>
> Implements the NetBSD Virtual Machine Monitor (NVMM) target, which
> acts as a hypervisor accelerator for QEMU on the NetBSD platform. This gives
> QEMU much greater speed than the emulated x86_64 paths that are taken on
> NetBSD today.
>
> Signed-off-by: Maxime Villard <max@m00nbsd.net>
> Signed-off-by: Kamil Rytarowski <n54@gmx.com>
> Reviewed-by: Sergio Lopez <slp@redhat.com>
> ---
> target/i386/Makefile.objs |    1 +
> target/i386/nvmm-all.c    | 1221 +++++++++++++++++++++++++++++++++++++
> 2 files changed, 1222 insertions(+)
> create mode 100644 target/i386/nvmm-all.c
>
> diff --git a/target/i386/Makefile.objs b/target/i386/Makefile.objs
> index 48e0c28434..bdcdb32e93 100644
> --- a/target/i386/Makefile.objs
> +++ b/target/i386/Makefile.objs
> @@ -17,6 +17,7 @@ obj-$(CONFIG_HAX) += hax-all.o hax-mem.o hax-posix.o
> endif
> obj-$(CONFIG_HVF) += hvf/
> obj-$(CONFIG_WHPX) += whpx-all.o
> +obj-$(CONFIG_NVMM) += nvmm-all.o
> endif
> obj-$(CONFIG_SEV) += sev.o
> obj-$(call lnot,$(CONFIG_SEV)) += sev-stub.o
> diff --git a/target/i386/nvmm-all.c b/target/i386/nvmm-all.c
> new file mode 100644
> index 0000000000..6988400f53
> --- /dev/null
> +++ b/target/i386/nvmm-all.c
> @@ -0,0 +1,1221 @@
> +/*
> + * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
> + *
> + * NetBSD Virtual Machine Monitor (NVMM) accelerator for QEMU.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "cpu.h"
> +#include "exec/address-spaces.h"
> +#include "exec/ioport.h"
> +#include "qemu-common.h"
> +#include "strings.h"
> +#include "sysemu/accel.h"
> +#include "sysemu/nvmm.h"
> +#include "sysemu/sysemu.h"
> +#include "sysemu/cpus.h"
> +#include "qemu/main-loop.h"
> +#include "qemu/error-report.h"
> +#include "qemu/queue.h"
> +#include "qapi/error.h"
> +#include "migration/blocker.h"
> +
> +#include <nvmm.h>
> +
> +struct qemu_vcpu {
> +    struct nvmm_vcpu vcpu;
> +    uint8_t tpr;
> +    bool stop;
> +
> +    /* Window-exiting for INTs/NMIs. */
> +    bool int_window_exit;
> +    bool nmi_window_exit;
> +
> +    /* The guest is in an interrupt shadow (POP SS, etc). */
> +    bool int_shadow;
> +};
> +
> +struct qemu_machine {
> +    struct nvmm_capability cap;
> +    struct nvmm_machine mach;
> +};
> +
> +/* -------------------------------------------------------------------------- */
> +
> +static bool nvmm_allowed;
> +static struct qemu_machine qemu_mach;
> +
> +static struct qemu_vcpu *
> +get_qemu_vcpu(CPUState *cpu)
> +{
> +    return (struct qemu_vcpu *)cpu->hax_vcpu;
> +}
> +
> +static struct nvmm_machine *
> +get_nvmm_mach(void)
> +{
> +    return &qemu_mach.mach;
> +}
> +
> +/* -------------------------------------------------------------------------- */
> +
> +static void
> +nvmm_set_segment(struct nvmm_x64_state_seg *nseg, const SegmentCache *qseg)
> +{
> +    uint32_t attrib = qseg->flags;
> +
> +    nseg->selector = qseg->selector;
> +    nseg->limit = qseg->limit;
> +    nseg->base = qseg->base;
> +    nseg->attrib.type = __SHIFTOUT(attrib, DESC_TYPE_MASK);
> +    nseg->attrib.s = __SHIFTOUT(attrib, DESC_S_MASK);
> +    nseg->attrib.dpl = __SHIFTOUT(attrib, DESC_DPL_MASK);
> +    nseg->attrib.p = __SHIFTOUT(attrib, DESC_P_MASK);
> +    nseg->attrib.avl = __SHIFTOUT(attrib, DESC_AVL_MASK);
> +    nseg->attrib.l = __SHIFTOUT(attrib, DESC_L_MASK);
> +    nseg->attrib.def = __SHIFTOUT(attrib, DESC_B_MASK);
> +    nseg->attrib.g = __SHIFTOUT(attrib, DESC_G_MASK);
> +}
> +
> +static void
> +nvmm_set_registers(CPUState *cpu)
> +{
> +    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
> +    struct nvmm_machine *mach = get_nvmm_mach();
> +    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
> +    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
> +    struct nvmm_x64_state *state = vcpu->state;
> +    uint64_t bitmap;
> +    size_t i;
> +    int ret;
> +
> +    assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
> +
> +    /* GPRs. */
> +    state->gprs[NVMM_X64_GPR_RAX] = env->regs[R_EAX];
> +    state->gprs[NVMM_X64_GPR_RCX] = env->regs[R_ECX];
> +    state->gprs[NVMM_X64_GPR_RDX] = env->regs[R_EDX];
> +    state->gprs[NVMM_X64_GPR_RBX] = env->regs[R_EBX];
> +    state->gprs[NVMM_X64_GPR_RSP] = env->regs[R_ESP];
> +    state->gprs[NVMM_X64_GPR_RBP] = env->regs[R_EBP];
> +    state->gprs[NVMM_X64_GPR_RSI] = env->regs[R_ESI];
> +    state->gprs[NVMM_X64_GPR_RDI] = env->regs[R_EDI];
> +    state->gprs[NVMM_X64_GPR_R8]  = env->regs[R_R8];
> +    state->gprs[NVMM_X64_GPR_R9]  = env->regs[R_R9];
> +    state->gprs[NVMM_X64_GPR_R10] = env->regs[R_R10];
> +    state->gprs[NVMM_X64_GPR_R11] = env->regs[R_R11];
> +    state->gprs[NVMM_X64_GPR_R12] = env->regs[R_R12];
> +    state->gprs[NVMM_X64_GPR_R13] = env->regs[R_R13];
> +    state->gprs[NVMM_X64_GPR_R14] = env->regs[R_R14];
> +    state->gprs[NVMM_X64_GPR_R15] = env->regs[R_R15];
> +
> +    /* RIP and RFLAGS. */
> +    state->gprs[NVMM_X64_GPR_RIP] = env->eip;
> +    state->gprs[NVMM_X64_GPR_RFLAGS] = env->eflags;
> +
> +    /* Segments. */
> +    nvmm_set_segment(&state->segs[NVMM_X64_SEG_CS], &env->segs[R_CS]);
> +    nvmm_set_segment(&state->segs[NVMM_X64_SEG_DS], &env->segs[R_DS]);
> +    nvmm_set_segment(&state->segs[NVMM_X64_SEG_ES], &env->segs[R_ES]);
> +    nvmm_set_segment(&state->segs[NVMM_X64_SEG_FS], &env->segs[R_FS]);
> +    nvmm_set_segment(&state->segs[NVMM_X64_SEG_GS], &env->segs[R_GS]);
> +    nvmm_set_segment(&state->segs[NVMM_X64_SEG_SS], &env->segs[R_SS]);
> +
> +    /* Special segments. */
> +    nvmm_set_segment(&state->segs[NVMM_X64_SEG_GDT], &env->gdt);
> +    nvmm_set_segment(&state->segs[NVMM_X64_SEG_LDT], &env->ldt);
> +    nvmm_set_segment(&state->segs[NVMM_X64_SEG_TR], &env->tr);
> +    nvmm_set_segment(&state->segs[NVMM_X64_SEG_IDT], &env->idt);
> +
> +    /* Control registers. */
> +    state->crs[NVMM_X64_CR_CR0] = env->cr[0];
> +    state->crs[NVMM_X64_CR_CR2] = env->cr[2];
> +    state->crs[NVMM_X64_CR_CR3] = env->cr[3];
> +    state->crs[NVMM_X64_CR_CR4] = env->cr[4];
> +    state->crs[NVMM_X64_CR_CR8] = qcpu->tpr;
> +    state->crs[NVMM_X64_CR_XCR0] = env->xcr0;
> +
> +    /* Debug registers. */
> +    state->drs[NVMM_X64_DR_DR0] = env->dr[0];
> +    state->drs[NVMM_X64_DR_DR1] = env->dr[1];
> +    state->drs[NVMM_X64_DR_DR2] = env->dr[2];
> +    state->drs[NVMM_X64_DR_DR3] = env->dr[3];
> +    state->drs[NVMM_X64_DR_DR6] = env->dr[6];
> +    state->drs[NVMM_X64_DR_DR7] = env->dr[7];
> +
> +    /* FPU. */
> +    state->fpu.fx_cw = env->fpuc;
> +    state->fpu.fx_sw = (env->fpus & ~0x3800) | ((env->fpstt & 0x7) << 11);
> +    state->fpu.fx_tw = 0;
> +    for (i = 0; i < 8; i++) {
> +        state->fpu.fx_tw |= (!env->fptags[i]) << i;
> +    }
> +    state->fpu.fx_opcode = env->fpop;
> +    state->fpu.fx_ip.fa_64 = env->fpip;
> +    state->fpu.fx_dp.fa_64 = env->fpdp;
> +    state->fpu.fx_mxcsr = env->mxcsr;
> +    state->fpu.fx_mxcsr_mask = 0x0000FFFF;
> +    assert(sizeof(state->fpu.fx_87_ac) == sizeof(env->fpregs));
> +    memcpy(state->fpu.fx_87_ac, env->fpregs, sizeof(env->fpregs));
> +    for (i = 0; i < 16; i++) {
> +        memcpy(&state->fpu.fx_xmm[i].xmm_bytes[0],
> +            &env->xmm_regs[i].ZMM_Q(0), 8);
> +        memcpy(&state->fpu.fx_xmm[i].xmm_bytes[8],
> +            &env->xmm_regs[i].ZMM_Q(1), 8);
> +    }
> +
> +    /* MSRs. */
> +    state->msrs[NVMM_X64_MSR_EFER] = env->efer;
> +    state->msrs[NVMM_X64_MSR_STAR] = env->star;
> +#ifdef TARGET_X86_64
> +    state->msrs[NVMM_X64_MSR_LSTAR] = env->lstar;
> +    state->msrs[NVMM_X64_MSR_CSTAR] = env->cstar;
> +    state->msrs[NVMM_X64_MSR_SFMASK] = env->fmask;
> +    state->msrs[NVMM_X64_MSR_KERNELGSBASE] = env->kernelgsbase;
> +#endif
> +    state->msrs[NVMM_X64_MSR_SYSENTER_CS]  = env->sysenter_cs;
> +    state->msrs[NVMM_X64_MSR_SYSENTER_ESP] = env->sysenter_esp;
> +    state->msrs[NVMM_X64_MSR_SYSENTER_EIP] = env->sysenter_eip;
> +    state->msrs[NVMM_X64_MSR_PAT] = env->pat;
> +    state->msrs[NVMM_X64_MSR_TSC] = env->tsc;
> +
> +    bitmap =
> +        NVMM_X64_STATE_SEGS |
> +        NVMM_X64_STATE_GPRS |
> +        NVMM_X64_STATE_CRS  |
> +        NVMM_X64_STATE_DRS  |
> +        NVMM_X64_STATE_MSRS |
> +        NVMM_X64_STATE_FPU;
> +
> +    ret = nvmm_vcpu_setstate(mach, vcpu, bitmap);
> +    if (ret == -1) {
> +        error_report("NVMM: Failed to set virtual processor context,"
> +            " error=%d", errno);
> +    }
> +}
> +
> +static void
> +nvmm_get_segment(SegmentCache *qseg, const struct nvmm_x64_state_seg *nseg)
> +{
> +    qseg->selector = nseg->selector;
> +    qseg->limit = nseg->limit;
> +    qseg->base = nseg->base;
> +
> +    qseg->flags =
> +        __SHIFTIN((uint32_t)nseg->attrib.type, DESC_TYPE_MASK) |
> +        __SHIFTIN((uint32_t)nseg->attrib.s, DESC_S_MASK) |
> +        __SHIFTIN((uint32_t)nseg->attrib.dpl, DESC_DPL_MASK) |
> +        __SHIFTIN((uint32_t)nseg->attrib.p, DESC_P_MASK) |
> +        __SHIFTIN((uint32_t)nseg->attrib.avl, DESC_AVL_MASK) |
> +        __SHIFTIN((uint32_t)nseg->attrib.l, DESC_L_MASK) |
> +        __SHIFTIN((uint32_t)nseg->attrib.def, DESC_B_MASK) |
> +        __SHIFTIN((uint32_t)nseg->attrib.g, DESC_G_MASK);
> +}
> +
> +static void
> +nvmm_get_registers(CPUState *cpu)
> +{
> +    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
> +    struct nvmm_machine *mach = get_nvmm_mach();
> +    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
> +    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
> +    X86CPU *x86_cpu = X86_CPU(cpu);
> +    struct nvmm_x64_state *state = vcpu->state;
> +    uint64_t bitmap, tpr;
> +    size_t i;
> +    int ret;
> +
> +    assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
> +
> +    bitmap =
> +        NVMM_X64_STATE_SEGS |
> +        NVMM_X64_STATE_GPRS |
> +        NVMM_X64_STATE_CRS  |
> +        NVMM_X64_STATE_DRS  |
> +        NVMM_X64_STATE_MSRS |
> +        NVMM_X64_STATE_FPU;
> +
> +    ret = nvmm_vcpu_getstate(mach, vcpu, bitmap);
> +    if (ret == -1) {
> +        error_report("NVMM: Failed to get virtual processor context,"
> +            " error=%d", errno);
> +    }
> +
> +    /* GPRs. */
> +    env->regs[R_EAX] = state->gprs[NVMM_X64_GPR_RAX];
> +    env->regs[R_ECX] = state->gprs[NVMM_X64_GPR_RCX];
> +    env->regs[R_EDX] = state->gprs[NVMM_X64_GPR_RDX];
> +    env->regs[R_EBX] = state->gprs[NVMM_X64_GPR_RBX];
> +    env->regs[R_ESP] = state->gprs[NVMM_X64_GPR_RSP];
> +    env->regs[R_EBP] = state->gprs[NVMM_X64_GPR_RBP];
> +    env->regs[R_ESI] = state->gprs[NVMM_X64_GPR_RSI];
> +    env->regs[R_EDI] = state->gprs[NVMM_X64_GPR_RDI];
> +    env->regs[R_R8]  = state->gprs[NVMM_X64_GPR_R8];
> +    env->regs[R_R9]  = state->gprs[NVMM_X64_GPR_R9];
> +    env->regs[R_R10] = state->gprs[NVMM_X64_GPR_R10];
> +    env->regs[R_R11] = state->gprs[NVMM_X64_GPR_R11];
> +    env->regs[R_R12] = state->gprs[NVMM_X64_GPR_R12];
> +    env->regs[R_R13] = state->gprs[NVMM_X64_GPR_R13];
> +    env->regs[R_R14] = state->gprs[NVMM_X64_GPR_R14];
> +    env->regs[R_R15] = state->gprs[NVMM_X64_GPR_R15];
> +
> +    /* RIP and RFLAGS. */
> +    env->eip = state->gprs[NVMM_X64_GPR_RIP];
> +    env->eflags = state->gprs[NVMM_X64_GPR_RFLAGS];
> +
> +    /* Segments. */
> +    nvmm_get_segment(&env->segs[R_ES], &state->segs[NVMM_X64_SEG_ES]);
> +    nvmm_get_segment(&env->segs[R_CS], &state->segs[NVMM_X64_SEG_CS]);
> +    nvmm_get_segment(&env->segs[R_SS], &state->segs[NVMM_X64_SEG_SS]);
> +    nvmm_get_segment(&env->segs[R_DS], &state->segs[NVMM_X64_SEG_DS]);
> +    nvmm_get_segment(&env->segs[R_FS], &state->segs[NVMM_X64_SEG_FS]);
> +    nvmm_get_segment(&env->segs[R_GS], &state->segs[NVMM_X64_SEG_GS]);
> +
> +    /* Special segments. */
> +    nvmm_get_segment(&env->gdt, &state->segs[NVMM_X64_SEG_GDT]);
> +    nvmm_get_segment(&env->ldt, &state->segs[NVMM_X64_SEG_LDT]);
> +    nvmm_get_segment(&env->tr, &state->segs[NVMM_X64_SEG_TR]);
> +    nvmm_get_segment(&env->idt, &state->segs[NVMM_X64_SEG_IDT]);
> +
> +    /* Control registers. */
> +    env->cr[0] = state->crs[NVMM_X64_CR_CR0];
> +    env->cr[2] = state->crs[NVMM_X64_CR_CR2];
> +    env->cr[3] = state->crs[NVMM_X64_CR_CR3];
> +    env->cr[4] = state->crs[NVMM_X64_CR_CR4];
> +    tpr = state->crs[NVMM_X64_CR_CR8];
> +    if (tpr != qcpu->tpr) {
> +        qcpu->tpr = tpr;
> +        cpu_set_apic_tpr(x86_cpu->apic_state, tpr);
> +    }
> +    env->xcr0 = state->crs[NVMM_X64_CR_XCR0];
> +
> +    /* Debug registers. */
> +    env->dr[0] = state->drs[NVMM_X64_DR_DR0];
> +    env->dr[1] = state->drs[NVMM_X64_DR_DR1];
> +    env->dr[2] = state->drs[NVMM_X64_DR_DR2];
> +    env->dr[3] = state->drs[NVMM_X64_DR_DR3];
> +    env->dr[6] = state->drs[NVMM_X64_DR_DR6];
> +    env->dr[7] = state->drs[NVMM_X64_DR_DR7];
> +
> +    /* FPU. */
> +    env->fpuc = state->fpu.fx_cw;
> +    env->fpstt = (state->fpu.fx_sw >> 11) & 0x7;
> +    env->fpus = state->fpu.fx_sw & ~0x3800;
> +    for (i = 0; i < 8; i++) {
> +        env->fptags[i] = !((state->fpu.fx_tw >> i) & 1);
> +    }
> +    env->fpop = state->fpu.fx_opcode;
> +    env->fpip = state->fpu.fx_ip.fa_64;
> +    env->fpdp = state->fpu.fx_dp.fa_64;
> +    env->mxcsr = state->fpu.fx_mxcsr;
> +    assert(sizeof(state->fpu.fx_87_ac) == sizeof(env->fpregs));
> +    memcpy(env->fpregs, state->fpu.fx_87_ac, sizeof(env->fpregs));
> +    for (i = 0; i < 16; i++) {
> +        memcpy(&env->xmm_regs[i].ZMM_Q(0),
> +            &state->fpu.fx_xmm[i].xmm_bytes[0], 8);
> +        memcpy(&env->xmm_regs[i].ZMM_Q(1),
> +            &state->fpu.fx_xmm[i].xmm_bytes[8], 8);
> +    }
> +
> +    /* MSRs. */
> +    env->efer = state->msrs[NVMM_X64_MSR_EFER];
> +    env->star = state->msrs[NVMM_X64_MSR_STAR];
> +#ifdef TARGET_X86_64
> +    env->lstar = state->msrs[NVMM_X64_MSR_LSTAR];
> +    env->cstar = state->msrs[NVMM_X64_MSR_CSTAR];
> +    env->fmask = state->msrs[NVMM_X64_MSR_SFMASK];
> +    env->kernelgsbase = state->msrs[NVMM_X64_MSR_KERNELGSBASE];
> +#endif
> +    env->sysenter_cs  = state->msrs[NVMM_X64_MSR_SYSENTER_CS];
> +    env->sysenter_esp = state->msrs[NVMM_X64_MSR_SYSENTER_ESP];
> +    env->sysenter_eip = state->msrs[NVMM_X64_MSR_SYSENTER_EIP];
> +    env->pat = state->msrs[NVMM_X64_MSR_PAT];
> +    env->tsc = state->msrs[NVMM_X64_MSR_TSC];
> +
> +    x86_update_hflags(env);
> +}
> +
> +static bool
> +nvmm_can_take_int(CPUState *cpu)
> +{
> +    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
> +    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
> +    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
> +    struct nvmm_machine *mach = get_nvmm_mach();
> +
> +    if (qcpu->int_window_exit) {
> +        return false;
> +    }
> +
> +    if (qcpu->int_shadow || !(env->eflags & IF_MASK)) {
> +        struct nvmm_x64_state *state = vcpu->state;
> +
> +        /* Exit on interrupt window. */
> +        nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_INTR);
> +        state->intr.int_window_exiting = 1;
> +        nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_INTR);
> +
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
> +static bool
> +nvmm_can_take_nmi(CPUState *cpu)
> +{
> +    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
> +
> +    /*
> +     * Contrary to INTs, NMIs always schedule an exit when they are
> +     * completed. Therefore, if window-exiting is enabled, it means
> +     * NMIs are blocked.
> +     */
> +    if (qcpu->nmi_window_exit) {
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
> +/*
> + * Called before the VCPU is run. We inject events generated by the I/O
> + * thread, and synchronize the guest TPR.
> + */
> +static void
> +nvmm_vcpu_pre_run(CPUState *cpu)
> +{
> +    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
> +    struct nvmm_machine *mach = get_nvmm_mach();
> +    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
> +    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
> +    X86CPU *x86_cpu = X86_CPU(cpu);
> +    struct nvmm_x64_state *state = vcpu->state;
> +    struct nvmm_vcpu_event *event = vcpu->event;
> +    bool has_event = false;
> +    bool sync_tpr = false;
> +    uint8_t tpr;
> +    int ret;
> +
> +    qemu_mutex_lock_iothread();
> +
> +    tpr = cpu_get_apic_tpr(x86_cpu->apic_state);
> +    if (tpr != qcpu->tpr) {
> +        qcpu->tpr = tpr;
> +        sync_tpr = true;
> +    }
> +
> +    /*
> +     * Force the VCPU out of its inner loop to process any INIT requests
> +     * or commit pending TPR access.
> +     */
> +    if (cpu->interrupt_request & (CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
> +        cpu->exit_request = 1;
> +    }
> +
> +    if (!has_event && (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
> +        if (nvmm_can_take_nmi(cpu)) {
> +            cpu->interrupt_request &= ~CPU_INTERRUPT_NMI;
> +            event->type = NVMM_VCPU_EVENT_INTR;
> +            event->vector = 2;
> +            has_event = true;
> +        }
> +    }
> +
> +    if (!has_event && (cpu->interrupt_request & CPU_INTERRUPT_HARD)) {
> +        if (nvmm_can_take_int(cpu)) {
> +            cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
> +            event->type = NVMM_VCPU_EVENT_INTR;
> +            event->vector = cpu_get_pic_interrupt(env);
> +            has_event = true;
> +        }
> +    }
> +
> +    /* Don't want SMIs. */
> +    if (cpu->interrupt_request & CPU_INTERRUPT_SMI) {
> +        cpu->interrupt_request &= ~CPU_INTERRUPT_SMI;
> +    }
> +
> +    if (sync_tpr) {
> +        ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_CRS);
> +        if (ret == -1) {
> +            error_report("NVMM: Failed to get CPU state,"
> +                " error=%d", errno);
> +        }
> +
> +        state->crs[NVMM_X64_CR_CR8] = qcpu->tpr;
> +
> +        ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_CRS);
> +        if (ret == -1) {
> +            error_report("NVMM: Failed to set CPU state,"
> +                " error=%d", errno);
> +        }
> +    }
> +
> +    if (has_event) {
> +        ret = nvmm_vcpu_inject(mach, vcpu);
> +        if (ret == -1) {
> +            error_report("NVMM: Failed to inject event,"
> +                " error=%d", errno);
> +        }
> +    }
> +
> +    qemu_mutex_unlock_iothread();
> +}
> +
> +/*
> + * Called after the VCPU ran. We synchronize the host view of the TPR and
> + * RFLAGS.
> + */
> +static void
> +nvmm_vcpu_post_run(CPUState *cpu, struct nvmm_vcpu_exit *exit)
> +{
> +    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
> +    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
> +    X86CPU *x86_cpu = X86_CPU(cpu);
> +    uint64_t tpr;
> +
> +    env->eflags = exit->exitstate.rflags;
> +    qcpu->int_shadow = exit->exitstate.int_shadow;
> +    qcpu->int_window_exit = exit->exitstate.int_window_exiting;
> +    qcpu->nmi_window_exit = exit->exitstate.nmi_window_exiting;
> +
> +    tpr = exit->exitstate.cr8;
> +    if (qcpu->tpr != tpr) {
> +        qcpu->tpr = tpr;
> +        qemu_mutex_lock_iothread();
> +        cpu_set_apic_tpr(x86_cpu->apic_state, qcpu->tpr);
> +        qemu_mutex_unlock_iothread();
> +    }
> +}
> +
> +/* -------------------------------------------------------------------------- */
> +
> +static void
> +nvmm_io_callback(struct nvmm_io *io)
> +{
> +    MemTxAttrs attrs = { 0 };
> +    int ret;
> +
> +    ret = address_space_rw(&address_space_io, io->port, attrs, io->data,
> +        io->size, !io->in);
> +    if (ret != MEMTX_OK) {
> +        error_report("NVMM: I/O Transaction Failed "
> +            "[%s, port=%u, size=%zu]", (io->in ? "in" : "out"),
> +            io->port, io->size);
> +    }
> +
> +    /* Needed, otherwise infinite loop. */
> +    current_cpu->vcpu_dirty = false;
> +}
> +
> +static void
> +nvmm_mem_callback(struct nvmm_mem *mem)
> +{
> +    cpu_physical_memory_rw(mem->gpa, mem->data, mem->size, mem->write);
> +
> +    /* XXX Needed, otherwise infinite loop. */
> +    current_cpu->vcpu_dirty = false;
> +}
> +
> +static struct nvmm_assist_callbacks nvmm_callbacks = {
> +    .io = nvmm_io_callback,
> +    .mem = nvmm_mem_callback
> +};
> +
> +/* -------------------------------------------------------------------------- */
> +
> +static int
> +nvmm_handle_mem(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
> +{
> +    int ret;
> +
> +    ret = nvmm_assist_mem(mach, vcpu);
> +    if (ret == -1) {
> +        error_report("NVMM: Mem Assist Failed [gpa=%p]",
> +            (void *)vcpu->exit->u.mem.gpa);
> +    }
> +
> +    return ret;
> +}
> +
> +static int
> +nvmm_handle_io(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
> +{
> +    int ret;
> +
> +    ret = nvmm_assist_io(mach, vcpu);
> +    if (ret == -1) {
> +        error_report("NVMM: I/O Assist Failed [port=%d]",
> +            (int)vcpu->exit->u.io.port);
> +    }
> +
> +    return ret;
> +}
> +
> +static int
> +nvmm_handle_rdmsr(struct nvmm_machine *mach, CPUState *cpu,
> +    struct nvmm_vcpu_exit *exit)
> +{
> +    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
> +    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
> +    X86CPU *x86_cpu = X86_CPU(cpu);
> +    struct nvmm_x64_state *state = vcpu->state;
> +    uint64_t val;
> +    int ret;
> +
> +    switch (exit->u.rdmsr.msr) {
> +    case MSR_IA32_APICBASE:
> +        val = cpu_get_apic_base(x86_cpu->apic_state);
> +        break;
> +    case MSR_MTRRcap:
> +    case MSR_MTRRdefType:
> +    case MSR_MCG_CAP:
> +    case MSR_MCG_STATUS:
> +        val = 0;
> +        break;
> +    default: /* More MSRs to add? */
> +        val = 0;
> +        error_report("NVMM: Unexpected RDMSR 0x%x, ignored",
> +            exit->u.rdmsr.msr);
> +        break;
> +    }
> +
> +    ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_GPRS);
> +    if (ret == -1) {
> +        return -1;
> +    }
> +
> +    state->gprs[NVMM_X64_GPR_RAX] = (val & 0xFFFFFFFF);
> +    state->gprs[NVMM_X64_GPR_RDX] = (val >> 32);
> +    state->gprs[NVMM_X64_GPR_RIP] = exit->u.rdmsr.npc;
> +
> +    ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_GPRS);
> +    if (ret == -1) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int
> +nvmm_handle_wrmsr(struct nvmm_machine *mach, CPUState *cpu,
> +    struct nvmm_vcpu_exit *exit)
> +{
> +    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
> +    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
> +    X86CPU *x86_cpu = X86_CPU(cpu);
> +    struct nvmm_x64_state *state = vcpu->state;
> +    uint64_t val;
> +    int ret;
> +
> +    val = exit->u.wrmsr.val;
> +
> +    switch (exit->u.wrmsr.msr) {
> +    case MSR_IA32_APICBASE:
> +        cpu_set_apic_base(x86_cpu->apic_state, val);
> +        break;
> +    case MSR_MTRRdefType:
> +    case MSR_MCG_STATUS:
> +        break;
> +    default: /* More MSRs to add? */
> +        error_report("NVMM: Unexpected WRMSR 0x%x [val=0x%" PRIx64
> +            "], ignored", exit->u.wrmsr.msr, val);
> +        break;
> +    }
> +
> +    ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_GPRS);
> +    if (ret == -1) {
> +        return -1;
> +    }
> +
> +    state->gprs[NVMM_X64_GPR_RIP] = exit->u.wrmsr.npc;
> +
> +    ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_GPRS);
> +    if (ret == -1) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int
> +nvmm_handle_halted(struct nvmm_machine *mach, CPUState *cpu,
> +    struct nvmm_vcpu_exit *exit)
> +{
> +    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
> +    int ret = 0;
> +
> +    qemu_mutex_lock_iothread();
> +
> +    if (!((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
> +          (env->eflags & IF_MASK)) &&
> +        !(cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
> +        cpu->exception_index = EXCP_HLT;
> +        cpu->halted = true;
> +        ret = 1;
> +    }
> +
> +    qemu_mutex_unlock_iothread();
> +
> +    return ret;
> +}
> +
> +static int
> +nvmm_inject_ud(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
> +{
> +    struct nvmm_vcpu_event *event = vcpu->event;
> +
> +    event->type = NVMM_VCPU_EVENT_EXCP;
> +    event->vector = 6;
> +    event->u.excp.error = 0;
> +
> +    return nvmm_vcpu_inject(mach, vcpu);
> +}
> +
> +static int
> +nvmm_vcpu_loop(CPUState *cpu)
> +{
> +    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
> +    struct nvmm_machine *mach = get_nvmm_mach();
> +    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
> +    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
> +    X86CPU *x86_cpu = X86_CPU(cpu);
> +    struct nvmm_vcpu_exit *exit = vcpu->exit;
> +    int ret;
> +
> +    /*
> +     * Some asynchronous events must be handled outside of the inner
> +     * VCPU loop. They are handled here.
> +     */
> +    if (cpu->interrupt_request & CPU_INTERRUPT_INIT) {
> +        nvmm_cpu_synchronize_state(cpu);
> +        do_cpu_init(x86_cpu);
> +        /* set int/nmi windows back to the reset state */
> +    }
> +    if (cpu->interrupt_request & CPU_INTERRUPT_POLL) {
> +        cpu->interrupt_request &= ~CPU_INTERRUPT_POLL;
> +        apic_poll_irq(x86_cpu->apic_state);
> +    }
> +    if (((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
> +         (env->eflags & IF_MASK)) ||
> +        (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
> +        cpu->halted = false;
> +    }
> +    if (cpu->interrupt_request & CPU_INTERRUPT_SIPI) {
> +        nvmm_cpu_synchronize_state(cpu);
> +        do_cpu_sipi(x86_cpu);
> +    }
> +    if (cpu->interrupt_request & CPU_INTERRUPT_TPR) {
> +        cpu->interrupt_request &= ~CPU_INTERRUPT_TPR;
> +        nvmm_cpu_synchronize_state(cpu);
> +        apic_handle_tpr_access_report(x86_cpu->apic_state, env->eip,
> +            env->tpr_access_type);
> +    }
> +
> +    if (cpu->halted) {
> +        cpu->exception_index = EXCP_HLT;
> +        atomic_set(&cpu->exit_request, false);
> +        return 0;
> +    }
> +
> +    qemu_mutex_unlock_iothread();
> +    cpu_exec_start(cpu);
> +
> +    /*
> +     * Inner VCPU loop.
> +     */
> +    do {
> +        if (cpu->vcpu_dirty) {
> +            nvmm_set_registers(cpu);
> +            cpu->vcpu_dirty = false;
> +        }
> +
> +        if (qcpu->stop) {
> +            cpu->exception_index = EXCP_INTERRUPT;
> +            qcpu->stop = false;
> +            ret = 1;
> +            break;
> +        }
> +
> +        nvmm_vcpu_pre_run(cpu);
> +
> +        if (atomic_read(&cpu->exit_request)) {
> +            qemu_cpu_kick_self();
> +        }
> +
> +        ret = nvmm_vcpu_run(mach, vcpu);
> +        if (ret == -1) {
> +            error_report("NVMM: Failed to exec a virtual processor,"
> +                " error=%d", errno);
> +            break;
> +        }
> +
> +        nvmm_vcpu_post_run(cpu, exit);
> +
> +        switch (exit->reason) {
> +        case NVMM_VCPU_EXIT_NONE:
> +            break;
> +        case NVMM_VCPU_EXIT_MEMORY:
> +            ret = nvmm_handle_mem(mach, vcpu);
> +            break;
> +        case NVMM_VCPU_EXIT_IO:
> +            ret = nvmm_handle_io(mach, vcpu);
> +            break;
> +        case NVMM_VCPU_EXIT_INT_READY:
> +        case NVMM_VCPU_EXIT_NMI_READY:
> +        case NVMM_VCPU_EXIT_TPR_CHANGED:
> +            break;
> +        case NVMM_VCPU_EXIT_HALTED:
> +            ret = nvmm_handle_halted(mach, cpu, exit);
> +            break;
> +        case NVMM_VCPU_EXIT_SHUTDOWN:
> +            qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
> +            cpu->exception_index = EXCP_INTERRUPT;
> +            ret = 1;
> +            break;
> +        case NVMM_VCPU_EXIT_RDMSR:
> +            ret = nvmm_handle_rdmsr(mach, cpu, exit);
> +            break;
> +        case NVMM_VCPU_EXIT_WRMSR:
> +            ret = nvmm_handle_wrmsr(mach, cpu, exit);
> +            break;
> +        case NVMM_VCPU_EXIT_MONITOR:
> +        case NVMM_VCPU_EXIT_MWAIT:
> +            ret = nvmm_inject_ud(mach, vcpu);
> +            break;
> +        default:
> +            error_report("NVMM: Unexpected VM exit code 0x%lx [hw=0x%lx]",
> +                exit->reason, exit->u.inv.hwcode);
> +            nvmm_get_registers(cpu);
> +            qemu_mutex_lock_iothread();
> +            qemu_system_guest_panicked(cpu_get_crash_info(cpu));
> +            qemu_mutex_unlock_iothread();
> +            ret = -1;
> +            break;
> +        }
> +    } while (ret == 0);
> +
> +    cpu_exec_end(cpu);
> +    qemu_mutex_lock_iothread();
> +    current_cpu = cpu;
> +
> +    atomic_set(&cpu->exit_request, false);
> +
> +    return ret < 0;
> +}
> +
> +/* -------------------------------------------------------------------------- */
> +
> +static void
> +do_nvmm_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
> +{
> +    nvmm_get_registers(cpu);
> +    cpu->vcpu_dirty = true;
> +}
> +
> +static void
> +do_nvmm_cpu_synchronize_post_reset(CPUState *cpu, run_on_cpu_data arg)
> +{
> +    nvmm_set_registers(cpu);
> +    cpu->vcpu_dirty = false;
> +}
> +
> +static void
> +do_nvmm_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg)
> +{
> +    nvmm_set_registers(cpu);
> +    cpu->vcpu_dirty = false;
> +}
> +
> +static void
> +do_nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu, run_on_cpu_data arg)
> +{
> +    cpu->vcpu_dirty = true;
> +}
> +
> +void nvmm_cpu_synchronize_state(CPUState *cpu)
> +{
> +    if (!cpu->vcpu_dirty) {
> +        run_on_cpu(cpu, do_nvmm_cpu_synchronize_state, RUN_ON_CPU_NULL);
> +    }
> +}
> +
> +void nvmm_cpu_synchronize_post_reset(CPUState *cpu)
> +{
> +    run_on_cpu(cpu, do_nvmm_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
> +}
> +
> +void nvmm_cpu_synchronize_post_init(CPUState *cpu)
> +{
> +    run_on_cpu(cpu, do_nvmm_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
> +}
> +
> +void nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu)
> +{
> +    run_on_cpu(cpu, do_nvmm_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
> +}
> +
> +/* -------------------------------------------------------------------------- */
> +
> +static Error *nvmm_migration_blocker;
> +
> +static void
> +nvmm_ipi_signal(int sigcpu)
> +{
> +    struct qemu_vcpu *qcpu;
> +
> +    if (current_cpu) {
> +        qcpu = get_qemu_vcpu(current_cpu);
> +        qcpu->stop = true;
> +    }
> +}
> +
> +static void
> +nvmm_init_cpu_signals(void)
> +{
> +    struct sigaction sigact;
> +    sigset_t set;
> +
> +    /* Install the IPI handler. */
> +    memset(&sigact, 0, sizeof(sigact));
> +    sigact.sa_handler = nvmm_ipi_signal;
> +    sigaction(SIG_IPI, &sigact, NULL);
> +
> +    /* Allow IPIs on the current thread. */
> +    sigprocmask(SIG_BLOCK, NULL, &set);
> +    sigdelset(&set, SIG_IPI);
> +    pthread_sigmask(SIG_SETMASK, &set, NULL);
> +}
> +
> +int
> +nvmm_init_vcpu(CPUState *cpu)
> +{
> +    struct nvmm_machine *mach = get_nvmm_mach();
> +    struct nvmm_vcpu_conf_cpuid cpuid;
> +    struct nvmm_vcpu_conf_tpr tpr;
> +    Error *local_error = NULL;
> +    struct qemu_vcpu *qcpu;
> +    int ret, err;
> +
> +    nvmm_init_cpu_signals();
> +
> +    if (nvmm_migration_blocker == NULL) {
> +        error_setg(&nvmm_migration_blocker,
> +            "NVMM: Migration not supported");
> +
> +        (void)migrate_add_blocker(nvmm_migration_blocker, &local_error);
> +        if (local_error) {
> +            error_report_err(local_error);
> +            migrate_del_blocker(nvmm_migration_blocker);
> +            error_free(nvmm_migration_blocker);
> +            return -EINVAL;
> +        }
> +    }
> +
> +    qcpu = g_malloc0(sizeof(*qcpu));
> +    if (qcpu == NULL) {
> +        error_report("NVMM: Failed to allocate VCPU context.");
> +        return -ENOMEM;
> +    }
> +
> +    ret = nvmm_vcpu_create(mach, cpu->cpu_index, &qcpu->vcpu);
> +    if (ret == -1) {
> +        err = errno;
> +        error_report("NVMM: Failed to create a virtual processor,"
> +            " error=%d", err);
> +        g_free(qcpu);
> +        return -err;
> +    }
> +
> +    memset(&cpuid, 0, sizeof(cpuid));
> +    cpuid.mask = 1;
> +    cpuid.leaf = 0x00000001;
> +    cpuid.u.mask.set.edx = CPUID_MCE | CPUID_MCA | CPUID_MTRR;
> +    ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_CPUID,
> +        &cpuid);
> +    if (ret == -1) {
> +        err = errno;
> +        error_report("NVMM: Failed to configure a virtual processor,"
> +            " error=%d", err);
> +        g_free(qcpu);
> +        return -err;
> +    }
> +
> +    ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_CALLBACKS,
> +        &nvmm_callbacks);
> +    if (ret == -1) {
> +        err = errno;
> +        error_report("NVMM: Failed to configure a virtual processor,"
> +            " error=%d", err);
> +        g_free(qcpu);
> +        return -err;
> +    }
> +
> +    if (qemu_mach.cap.arch.vcpu_conf_support & NVMM_CAP_ARCH_VCPU_CONF_TPR) {
> +        memset(&tpr, 0, sizeof(tpr));
> +        tpr.exit_changed = 1;
> +        ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_TPR, &tpr);
> +        if (ret == -1) {
> +            err = errno;
> +            error_report("NVMM: Failed to configure a virtual processor,"
> +                " error=%d", err);
> +            g_free(qcpu);
> +            return -err;
> +        }
> +    }
> +
> +    cpu->vcpu_dirty = true;
> +    cpu->hax_vcpu = (struct hax_vcpu_state *)qcpu;
> +
> +    return 0;
> +}
> +
> +int
> +nvmm_vcpu_exec(CPUState *cpu)
> +{
> +    int ret, fatal;
> +
> +    while (1) {
> +        if (cpu->exception_index >= EXCP_INTERRUPT) {
> +            ret = cpu->exception_index;
> +            cpu->exception_index = -1;
> +            break;
> +        }
> +
> +        fatal = nvmm_vcpu_loop(cpu);
> +
> +        if (fatal) {
> +            error_report("NVMM: Failed to execute a VCPU.");
> +            abort();
> +        }
> +    }
> +
> +    return ret;
> +}
> +
> +void
> +nvmm_destroy_vcpu(CPUState *cpu)
> +{
> +    struct nvmm_machine *mach = get_nvmm_mach();
> +    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
> +
> +    nvmm_vcpu_destroy(mach, &qcpu->vcpu);
> +    g_free(cpu->hax_vcpu);
> +}
> +
> +/* -------------------------------------------------------------------------- */
> +
> +static void
> +nvmm_update_mapping(hwaddr start_pa, ram_addr_t size, uintptr_t hva,
> +    bool add, bool rom, const char *name)
> +{
> +    struct nvmm_machine *mach = get_nvmm_mach();
> +    int ret, prot;
> +
> +    if (add) {
> +        prot = PROT_READ | PROT_EXEC;
> +        if (!rom) {
> +            prot |= PROT_WRITE;
> +        }
> +        ret = nvmm_gpa_map(mach, hva, start_pa, size, prot);
> +    } else {
> +        ret = nvmm_gpa_unmap(mach, hva, start_pa, size);
> +    }
> +
> +    if (ret == -1) {
> +        error_report("NVMM: Failed to %s GPA range '%s' PA:%p, "
> +            "Size:%p bytes, HostVA:%p, error=%d",
> +            (add ? "map" : "unmap"), name, (void *)(uintptr_t)start_pa,
> +            (void *)size, (void *)hva, errno);
> +    }
> +}
> +
> +static void
> +nvmm_process_section(MemoryRegionSection *section, int add)
> +{
> +    MemoryRegion *mr = section->mr;
> +    hwaddr start_pa = section->offset_within_address_space;
> +    ram_addr_t size = int128_get64(section->size);
> +    unsigned int delta;
> +    uintptr_t hva;
> +
> +    if (!memory_region_is_ram(mr)) {
> +        return;
> +    }
> +
> +    /* Adjust start_pa and size so that they are page-aligned. */
> +    delta = qemu_real_host_page_size - (start_pa & ~qemu_real_host_page_mask);
> +    delta &= ~qemu_real_host_page_mask;
> +    if (delta > size) {
> +        return;
> +    }
> +    start_pa += delta;
> +    size -= delta;
> +    size &= qemu_real_host_page_mask;
> +    if (!size || (start_pa & ~qemu_real_host_page_mask)) {
> +        return;
> +    }
> +
> +    hva = (uintptr_t)memory_region_get_ram_ptr(mr) +
> +        section->offset_within_region + delta;
> +
> +    nvmm_update_mapping(start_pa, size, hva, add,
> +        memory_region_is_rom(mr), mr->name);
> +}
> +
> +static void
> +nvmm_region_add(MemoryListener *listener, MemoryRegionSection *section)
> +{
> +    memory_region_ref(section->mr);
> +    nvmm_process_section(section, 1);
> +}
> +
> +static void
> +nvmm_region_del(MemoryListener *listener, MemoryRegionSection *section)
> +{
> +    nvmm_process_section(section, 0);
> +    memory_region_unref(section->mr);
> +}
> +
> +static void
> +nvmm_transaction_begin(MemoryListener *listener)
> +{
> +    /* nothing */
> +}
> +
> +static void
> +nvmm_transaction_commit(MemoryListener *listener)
> +{
> +    /* nothing */
> +}
> +
> +static void
> +nvmm_log_sync(MemoryListener *listener, MemoryRegionSection *section)
> +{
> +    MemoryRegion *mr = section->mr;
> +
> +    if (!memory_region_is_ram(mr)) {
> +        return;
> +    }
> +
> +    memory_region_set_dirty(mr, 0, int128_get64(section->size));
> +}
> +
> +static MemoryListener nvmm_memory_listener = {
> +    .begin = nvmm_transaction_begin,
> +    .commit = nvmm_transaction_commit,
> +    .region_add = nvmm_region_add,
> +    .region_del = nvmm_region_del,
> +    .log_sync = nvmm_log_sync,
> +    .priority = 10,
> +};
> +
> +static void
> +nvmm_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
> +{
> +    struct nvmm_machine *mach = get_nvmm_mach();
> +    uintptr_t hva = (uintptr_t)host;
> +    int ret;
> +
> +    ret = nvmm_hva_map(mach, hva, size);
> +
> +    if (ret == -1) {
> +        error_report("NVMM: Failed to map HVA, HostVA:%p "
> +            "Size:%p bytes, error=%d",
> +            (void *)hva, (void *)size, errno);
> +    }
> +}
> +
> +static struct RAMBlockNotifier nvmm_ram_notifier = {
> +    .ram_block_added = nvmm_ram_block_added
> +};
> +
> +/* -------------------------------------------------------------------------- */
> +
> +static void
> +nvmm_handle_interrupt(CPUState *cpu, int mask)
> +{
> +    cpu->interrupt_request |= mask;
> +
> +    if (!qemu_cpu_is_self(cpu)) {
> +        qemu_cpu_kick(cpu);
> +    }
> +}
> +
> +/* -------------------------------------------------------------------------- */
> +
> +static int
> +nvmm_accel_init(MachineState *ms)
> +{
> +    int ret, err;
> +
> +    ret = nvmm_init();
> +    if (ret == -1) {
> +        err = errno;
> +        error_report("NVMM: Initialization failed, error=%d", errno);
> +        return -err;
> +    }
> +
> +    ret = nvmm_capability(&qemu_mach.cap);
> +    if (ret == -1) {
> +        err = errno;
> +        error_report("NVMM: Unable to fetch capability, error=%d", errno);
> +        return -err;
> +    }
> +    if (qemu_mach.cap.version != 1) {
> +        error_report("NVMM: Unsupported version %u", qemu_mach.cap.version);
> +        return -EPROGMISMATCH;
> +    }
> +    if (qemu_mach.cap.state_size != sizeof(struct nvmm_x64_state)) {
> +        error_report("NVMM: Wrong state size %u", qemu_mach.cap.state_size);
> +        return -EPROGMISMATCH;
> +    }
> +
> +    ret = nvmm_machine_create(&qemu_mach.mach);
> +    if (ret == -1) {
> +        err = errno;
> +        error_report("NVMM: Machine creation failed, error=%d", errno);
> +        return -err;
> +    }
> +
> +    memory_listener_register(&nvmm_memory_listener, &address_space_memory);
> +    ram_block_notifier_add(&nvmm_ram_notifier);
> +
> +    cpu_interrupt_handler = nvmm_handle_interrupt;
> +
> +    printf("NetBSD Virtual Machine Monitor accelerator is operational\n");
> +    return 0;
> +}
> +
> +int
> +nvmm_enabled(void)
> +{
> +    return nvmm_allowed;
> +}
> +
> +static void
> +nvmm_accel_class_init(ObjectClass *oc, void *data)
> +{
> +    AccelClass *ac = ACCEL_CLASS(oc);
> +    ac->name = "NVMM";
> +    ac->init_machine = nvmm_accel_init;
> +    ac->allowed = &nvmm_allowed;
> +}
> +
> +static const TypeInfo nvmm_accel_type = {
> +    .name = ACCEL_CLASS_NAME("nvmm"),
> +    .parent = TYPE_ACCEL,
> +    .class_init = nvmm_accel_class_init,
> +};
> +
> +static void
> +nvmm_type_init(void)
> +{
> +    type_register_static(&nvmm_accel_type);
> +}
> +
> +type_init(nvmm_type_init);
> --
> 2.25.0
>
>
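
For readers following the page-alignment step in nvmm_process_section()
quoted above, here is a minimal standalone sketch of the same arithmetic.
The helper name trim_to_pages() and its signature are made up for
illustration and are not part of the patch; page_size - 1 stands in for
~qemu_real_host_page_mask, assuming the page size is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): shrink [*start, *start + *size)
 * inward so that it begins and ends on a page boundary.
 * Returns false when no whole page remains; *start and *size may already
 * have been adjusted in that case and should then be ignored.
 */
bool
trim_to_pages(uint64_t *start, uint64_t *size, uint64_t page_size)
{
    uint64_t offset_mask = page_size - 1; /* low bits: offset within a page */
    uint64_t delta;

    /* The masking below only works for a power-of-two page size. */
    assert(page_size != 0 && (page_size & offset_mask) == 0);

    /* Bytes from *start to the next page boundary, 0 if already aligned. */
    delta = (page_size - (*start & offset_mask)) & offset_mask;
    if (delta > *size) {
        return false;
    }

    *start += delta;
    *size -= delta;
    *size &= ~offset_mask; /* round the length down to whole pages */

    return *size != 0;
}
```

With 4 KiB pages, a section starting at 0x1234 with length 0x3000 is trimmed
to start 0x2000 and length 0x2000. The quoted code also advances the host
virtual address by the same delta before calling nvmm_update_mapping().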



* [PATCH v4 0/4] Implements the NetBSD Virtual Machine Monitor accelerator
  2020-02-06 11:57   ` [PATCH v3 " Kamil Rytarowski
                       ` (4 preceding siblings ...)
  2020-02-06 13:13     ` [PATCH v3 0/4] Implements the NetBSD Virtual Machine Monitor accelerator no-reply
@ 2020-02-06 21:32     ` Kamil Rytarowski
  2020-02-06 21:32       ` [PATCH v4 1/4] Add the NVMM vcpu API Kamil Rytarowski
                         ` (4 more replies)
  5 siblings, 5 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-02-06 21:32 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max, jmcneill
  Cc: Kamil Rytarowski, qemu-devel

Hello QEMU Community!

Over the past year the NetBSD team has been working hard on a new user-mode API
for our hypervisor that will be released as part of the upcoming NetBSD 9.0.
This new API adds user-mode capabilities to create and manage virtual machines,
configure memory mappings for guest machines, and create and control execution
of virtual processors.

With this new API we are now able to bring our hypervisor to the QEMU
community! The following patches implement the NetBSD Virtual Machine Monitor
accelerator (NVMM) for QEMU on NetBSD 9.0 and newer hosts.

When compiling QEMU for x86_64, passing the --enable-nvmm flag builds the
accelerator. At runtime, using '-accel nvmm' should give a significant
performance improvement over emulation, much like when using 'hax' on
NetBSD.

The documentation for this new API is visible at https://man.netbsd.org under
the libnvmm(3) and nvmm(4) pages.

NVMM was designed and implemented by Maxime Villard.

Thank you for your feedback.

References:
https://m00nbsd.net/4e0798b7f2620c965d0dd9d6a7a2f296.html

Test plan:

1. Download a NetBSD 9.0 pre-release snapshot:
http://nycdn.netbsd.org/pub/NetBSD-daily/netbsd-9/latest/images/NetBSD-9.0_RC1-amd64.iso

2. Install it natively on reasonably recent x86_64 hardware (Intel or AMD).

There is no support for nested virtualization in NVMM.

3. Set up the system.

 export PKG_PATH=http://www.ki.nu/pkgsrc/packages/current/NetBSD-9.0_RC1/All
 pkg_add git gmake python37 glib2 bison pkgconf pixman

Install mozilla-rootcerts and follow post-install instructions.

 pkg_add mozilla-rootcerts

More information: https://wiki.qemu.org/Hosts/BSD#NetBSD

4. Build qemu

 mkdir build
 cd build
 ../configure --python=python3.7
 gmake
 gmake check

5. Test

 qemu -accel nvmm ...


History:
v3 -> v4:
 - Correct build warning by adding a missing include
 - Do not set R8-R15 registers unless TARGET_X86_64
v2 -> v3:
 - Register nvmm in targetos NetBSD check
 - Stop including hw/boards.h
 - Rephrase old code comments (remove XXX)
v1 -> v2:
 - Included the testing plan as requested by Philippe Mathieu-Daude
 - Formatting nit fix in qemu-options.hx
 - Document NVMM in the accel section of qemu-options.hx

Maxime Villard (4):
  Add the NVMM vcpu API
  Add the NetBSD Virtual Machine Monitor accelerator.
  Introduce the NVMM impl
  Add the NVMM acceleration enlightenments

 accel/stubs/Makefile.objs |    1 +
 accel/stubs/nvmm-stub.c   |   43 ++
 configure                 |   37 ++
 cpus.c                    |   58 ++
 include/sysemu/hw_accel.h |   14 +
 include/sysemu/nvmm.h     |   35 ++
 qemu-options.hx           |   16 +-
 target/i386/Makefile.objs |    1 +
 target/i386/helper.c      |    2 +-
 target/i386/nvmm-all.c    | 1226 +++++++++++++++++++++++++++++++++++++
 10 files changed, 1424 insertions(+), 9 deletions(-)
 create mode 100644 accel/stubs/nvmm-stub.c
 create mode 100644 include/sysemu/nvmm.h
 create mode 100644 target/i386/nvmm-all.c

--
2.25.0




* [PATCH v4 1/4] Add the NVMM vcpu API
  2020-02-06 21:32     ` [PATCH v4 " Kamil Rytarowski
@ 2020-02-06 21:32       ` Kamil Rytarowski
  2020-08-11 12:47         ` [PATCH v5 " Kamil Rytarowski
  2020-08-11 13:01         ` [PATCH v5 1/4] Add the NVMM vcpu API Kamil Rytarowski
  2020-02-06 21:32       ` [PATCH v4 2/4] Add the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
                         ` (3 subsequent siblings)
  4 siblings, 2 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-02-06 21:32 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max, jmcneill
  Cc: Kamil Rytarowski, qemu-devel

From: Maxime Villard <max@m00nbsd.net>

Adds stubs for the NetBSD Virtual Machine Monitor (NVMM) accelerator and
introduces the nvmm.h sysemu API for vcpu scheduling and management.

Signed-off-by: Maxime Villard <max@m00nbsd.net>
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Jared McNeill <jmcneill@invisible.ca>
---
 accel/stubs/Makefile.objs |  1 +
 accel/stubs/nvmm-stub.c   | 43 +++++++++++++++++++++++++++++++++++++++
 include/sysemu/nvmm.h     | 35 +++++++++++++++++++++++++++++++
 3 files changed, 79 insertions(+)
 create mode 100644 accel/stubs/nvmm-stub.c
 create mode 100644 include/sysemu/nvmm.h
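
A note on the nvmm_enabled() fallback in the header below: when CONFIG_NVMM
is not defined, the macro expands to the constant 0, so callers can test it
unconditionally and the NVMM branch becomes dead code. The following is only
a sketch of that pattern; would_use_nvmm() is a hypothetical caller invented
for illustration, and CONFIG_NVMM is deliberately left undefined here.

```c
#include <assert.h>
#include <string.h>

/* Same shape as the header in this patch; CONFIG_NVMM is not defined here,
 * which mimics a build on a host without NVMM. */
#ifdef CONFIG_NVMM
int nvmm_enabled(void);
#else
#define nvmm_enabled() (0)
#endif

/*
 * Hypothetical caller (not part of the patch): returns 1 if the NVMM code
 * path would be taken for the requested accelerator name.
 */
int
would_use_nvmm(const char *requested)
{
    assert(requested != NULL);

    /* Without CONFIG_NVMM this is "(0) && ...", which is always 0. */
    return nvmm_enabled() && strcmp(requested, "nvmm") == 0;
}
```

In such a build would_use_nvmm("nvmm") returns 0. The functions the header
declares unconditionally (nvmm_init_vcpu() and friends) still need
definitions at link time, which is what nvmm-stub.c provides.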

diff --git a/accel/stubs/Makefile.objs b/accel/stubs/Makefile.objs
index 3894caf95d..09f2d3e1dd 100644
--- a/accel/stubs/Makefile.objs
+++ b/accel/stubs/Makefile.objs
@@ -1,5 +1,6 @@
 obj-$(call lnot,$(CONFIG_HAX))  += hax-stub.o
 obj-$(call lnot,$(CONFIG_HVF))  += hvf-stub.o
 obj-$(call lnot,$(CONFIG_WHPX)) += whpx-stub.o
+obj-$(call lnot,$(CONFIG_NVMM)) += nvmm-stub.o
 obj-$(call lnot,$(CONFIG_KVM))  += kvm-stub.o
 obj-$(call lnot,$(CONFIG_TCG))  += tcg-stub.o
diff --git a/accel/stubs/nvmm-stub.c b/accel/stubs/nvmm-stub.c
new file mode 100644
index 0000000000..c2208b84a3
--- /dev/null
+++ b/accel/stubs/nvmm-stub.c
@@ -0,0 +1,43 @@
+/*
+ * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
+ *
+ * NetBSD Virtual Machine Monitor (NVMM) accelerator stub.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "cpu.h"
+#include "sysemu/nvmm.h"
+
+int nvmm_init_vcpu(CPUState *cpu)
+{
+    return -1;
+}
+
+int nvmm_vcpu_exec(CPUState *cpu)
+{
+    return -1;
+}
+
+void nvmm_destroy_vcpu(CPUState *cpu)
+{
+}
+
+void nvmm_cpu_synchronize_state(CPUState *cpu)
+{
+}
+
+void nvmm_cpu_synchronize_post_reset(CPUState *cpu)
+{
+}
+
+void nvmm_cpu_synchronize_post_init(CPUState *cpu)
+{
+}
+
+void nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu)
+{
+}
diff --git a/include/sysemu/nvmm.h b/include/sysemu/nvmm.h
new file mode 100644
index 0000000000..10496f3980
--- /dev/null
+++ b/include/sysemu/nvmm.h
@@ -0,0 +1,35 @@
+/*
+ * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
+ *
+ * NetBSD Virtual Machine Monitor (NVMM) accelerator support.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_NVMM_H
+#define QEMU_NVMM_H
+
+#include "config-host.h"
+#include "qemu-common.h"
+
+int nvmm_init_vcpu(CPUState *);
+int nvmm_vcpu_exec(CPUState *);
+void nvmm_destroy_vcpu(CPUState *);
+
+void nvmm_cpu_synchronize_state(CPUState *);
+void nvmm_cpu_synchronize_post_reset(CPUState *);
+void nvmm_cpu_synchronize_post_init(CPUState *);
+void nvmm_cpu_synchronize_pre_loadvm(CPUState *);
+
+#ifdef CONFIG_NVMM
+
+int nvmm_enabled(void);
+
+#else /* CONFIG_NVMM */
+
+#define nvmm_enabled() (0)
+
+#endif /* CONFIG_NVMM */
+
+#endif /* QEMU_NVMM_H */
--
2.25.0




* [PATCH v4 2/4] Add the NetBSD Virtual Machine Monitor accelerator.
  2020-02-06 21:32     ` [PATCH v4 " Kamil Rytarowski
  2020-02-06 21:32       ` [PATCH v4 1/4] Add the NVMM vcpu API Kamil Rytarowski
@ 2020-02-06 21:32       ` Kamil Rytarowski
  2020-02-06 21:32       ` [PATCH v4 3/4] Introduce the NVMM impl Kamil Rytarowski
                         ` (2 subsequent siblings)
  4 siblings, 0 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-02-06 21:32 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max, jmcneill
  Cc: Kamil Rytarowski, qemu-devel

From: Maxime Villard <max@m00nbsd.net>

Introduces the configure support for the new NetBSD Virtual Machine Monitor that
allows for hypervisor acceleration from usermode components on the NetBSD
platform.

Signed-off-by: Maxime Villard <max@m00nbsd.net>
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Jared McNeill <jmcneill@invisible.ca>
---
 configure       | 37 +++++++++++++++++++++++++++++++++++++
 qemu-options.hx | 16 ++++++++--------
 2 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/configure b/configure
index 115dc38085..d4a837cf9d 100755
--- a/configure
+++ b/configure
@@ -241,6 +241,17 @@ supported_whpx_target() {
     return 1
 }

+supported_nvmm_target() {
+    test "$nvmm" = "yes" || return 1
+    glob "$1" "*-softmmu" || return 1
+    case "${1%-softmmu}" in
+        i386|x86_64)
+            return 0
+        ;;
+    esac
+    return 1
+}
+
 supported_target() {
     case "$1" in
         *-softmmu)
@@ -268,6 +279,7 @@ supported_target() {
     supported_hax_target "$1" && return 0
     supported_hvf_target "$1" && return 0
     supported_whpx_target "$1" && return 0
+    supported_nvmm_target "$1" && return 0
     print_error "TCG disabled, but hardware accelerator not available for '$target'"
     return 1
 }
@@ -388,6 +400,7 @@ kvm="no"
 hax="no"
 hvf="no"
 whpx="no"
+nvmm="no"
 rdma=""
 pvrdma=""
 gprof="no"
@@ -823,6 +836,7 @@ DragonFly)
 NetBSD)
   bsd="yes"
   hax="yes"
+  nvmm="yes"
   make="${MAKE-gmake}"
   audio_drv_list="oss try-sdl"
   audio_possible_drivers="oss sdl"
@@ -1169,6 +1183,10 @@ for opt do
   ;;
   --enable-whpx) whpx="yes"
   ;;
+  --disable-nvmm) nvmm="no"
+  ;;
+  --enable-nvmm) nvmm="yes"
+  ;;
   --disable-tcg-interpreter) tcg_interpreter="no"
   ;;
   --enable-tcg-interpreter) tcg_interpreter="yes"
@@ -1773,6 +1791,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   hax             HAX acceleration support
   hvf             Hypervisor.framework acceleration support
   whpx            Windows Hypervisor Platform acceleration support
+  nvmm            NetBSD Virtual Machine Monitor acceleration support
   rdma            Enable RDMA-based migration
   pvrdma          Enable PVRDMA support
   vde             support for vde network
@@ -2764,6 +2783,20 @@ if test "$whpx" != "no" ; then
     fi
 fi

+##########################################
+# NetBSD Virtual Machine Monitor (NVMM) accelerator check
+if test "$nvmm" != "no" ; then
+    if check_include "nvmm.h" ; then
+        nvmm="yes"
+	LIBS="-lnvmm $LIBS"
+    else
+        if test "$nvmm" = "yes"; then
+            feature_not_found "NVMM" "NVMM is not available"
+        fi
+        nvmm="no"
+    fi
+fi
+
 ##########################################
 # Sparse probe
 if test "$sparse" != "no" ; then
@@ -6543,6 +6576,7 @@ echo "KVM support       $kvm"
 echo "HAX support       $hax"
 echo "HVF support       $hvf"
 echo "WHPX support      $whpx"
+echo "NVMM support      $nvmm"
 echo "TCG support       $tcg"
 if test "$tcg" = "yes" ; then
     echo "TCG debug enabled $debug_tcg"
@@ -7828,6 +7862,9 @@ fi
 if test "$target_aligned_only" = "yes" ; then
   echo "TARGET_ALIGNED_ONLY=y" >> $config_target_mak
 fi
+if supported_nvmm_target $target; then
+    echo "CONFIG_NVMM=y" >> $config_target_mak
+fi
 if test "$target_bigendian" = "yes" ; then
   echo "TARGET_WORDS_BIGENDIAN=y" >> $config_target_mak
 fi
diff --git a/qemu-options.hx b/qemu-options.hx
index 224a8e8712..10c046c916 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -31,7 +31,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
     "-machine [type=]name[,prop[=value][,...]]\n"
     "                selects emulated machine ('-machine help' for list)\n"
     "                property accel=accel1[:accel2[:...]] selects accelerator\n"
-    "                supported accelerators are kvm, xen, hax, hvf, whpx or tcg (default: tcg)\n"
+    "                supported accelerators are kvm, xen, hax, hvf, nvmm, whpx or tcg (default: tcg)\n"
     "                vmport=on|off|auto controls emulation of vmport (default: auto)\n"
     "                dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
     "                mem-merge=on|off controls memory merge support (default: on)\n"
@@ -64,9 +64,9 @@ Supported machine properties are:
 @table @option
 @item accel=@var{accels1}[:@var{accels2}[:...]]
 This is used to enable an accelerator. Depending on the target architecture,
-kvm, xen, hax, hvf, whpx or tcg can be available. By default, tcg is used. If there is
-more than one accelerator specified, the next one is used if the previous one
-fails to initialize.
+kvm, xen, hax, hvf, nvmm, whpx or tcg can be available. By default, tcg is used.
+If there is more than one accelerator specified, the next one is used if the
+previous one fails to initialize.
 @item vmport=on|off|auto
 Enables emulation of VMWare IO port, for vmmouse etc. auto says to select the
 value based on accel. For accel=xen the default is off otherwise the default
@@ -114,7 +114,7 @@ ETEXI

 DEF("accel", HAS_ARG, QEMU_OPTION_accel,
     "-accel [accel=]accelerator[,prop[=value][,...]]\n"
-    "                select accelerator (kvm, xen, hax, hvf, whpx or tcg; use 'help' for a list)\n"
+    "                select accelerator (kvm, xen, hax, hvf, nvmm, whpx or tcg; use 'help' for a list)\n"
     "                igd-passthru=on|off (enable Xen integrated Intel graphics passthrough, default=off)\n"
     "                kernel-irqchip=on|off|split controls accelerated irqchip support (default=on)\n"
     "                kvm-shadow-mem=size of KVM shadow MMU in bytes\n"
@@ -124,9 +124,9 @@ STEXI
 @item -accel @var{name}[,prop=@var{value}[,...]]
 @findex -accel
 This is used to enable an accelerator. Depending on the target architecture,
-kvm, xen, hax, hvf, whpx or tcg can be available. By default, tcg is used. If there is
-more than one accelerator specified, the next one is used if the previous one
-fails to initialize.
+kvm, xen, hax, hvf, nvmm, whpx or tcg can be available. By default, tcg is used.
+If there is more than one accelerator specified, the next one is used if the
+previous one fails to initialize.
 @table @option
 @item igd-passthru=on|off
 When Xen is in use, this option controls whether Intel integrated graphics
--
2.25.0




* [PATCH v4 3/4] Introduce the NVMM impl
  2020-02-06 21:32     ` [PATCH v4 " Kamil Rytarowski
  2020-02-06 21:32       ` [PATCH v4 1/4] Add the NVMM vcpu API Kamil Rytarowski
  2020-02-06 21:32       ` [PATCH v4 2/4] Add the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
@ 2020-02-06 21:32       ` Kamil Rytarowski
  2020-02-06 23:28         ` [PATCH v4 3/4 FIXUP] " Kamil Rytarowski
  2020-03-02 18:13         ` [PATCH v4 3/4] " Paolo Bonzini
  2020-02-06 21:32       ` [PATCH v4 4/4] Add the NVMM acceleration enlightenments Kamil Rytarowski
  2020-02-17  9:07       ` [PATCH v4 0/4] Implements the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
  4 siblings, 2 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-02-06 21:32 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max, jmcneill
  Cc: Kamil Rytarowski, qemu-devel

From: Maxime Villard <max@m00nbsd.net>

Implements the NetBSD Virtual Machine Monitor (NVMM) target, which
acts as a hypervisor accelerator for QEMU on the NetBSD platform. This gives
QEMU much greater speed than the emulated x86_64 paths that are taken on
NetBSD today.

Signed-off-by: Maxime Villard <max@m00nbsd.net>
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Tested-by: Jared McNeill <jmcneill@invisible.ca>
---
 target/i386/Makefile.objs |    1 +
 target/i386/nvmm-all.c    | 1226 +++++++++++++++++++++++++++++++++++++
 2 files changed, 1227 insertions(+)
 create mode 100644 target/i386/nvmm-all.c
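
A note on the FPU conversion in nvmm_set_registers() in the diff below: the
status word is stored with the x87 top-of-stack index folded into bits
11-13, and the tag word is built in the abridged one-bit-per-register form.
The sketch below isolates those two expressions. The function names
pack_fx_sw() and pack_fx_tw() are invented for illustration; the sketch
assumes, as the patch's "!env->fptags[i]" suggests, that a non-zero fptags
entry marks an empty register.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: combine the status word with the top-of-stack index.
 * 0x3800 covers bits 11-13 (the TOP field), which are cleared and then
 * replaced by fpstt.
 */
uint16_t
pack_fx_sw(uint16_t fpus, unsigned int fpstt)
{
    assert(fpstt < 8); /* TOP is a 3-bit field */

    return (uint16_t)((fpus & ~0x3800) | ((fpstt & 0x7) << 11));
}

/*
 * Hypothetical helper: build the abridged tag byte. Bit i is set when
 * fptags[i] is zero, i.e. when register i is assumed to be in use.
 */
uint8_t
pack_fx_tw(const uint8_t fptags[8])
{
    uint8_t tw = 0;
    int i;

    for (i = 0; i < 8; i++) {
        tw |= (uint8_t)((!fptags[i]) << i);
    }

    return tw;
}
```

For instance, pack_fx_sw(0x0000, 5) gives 0x2800, and a tag array in which
only entries 0 and 7 are zero packs to 0x81.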

diff --git a/target/i386/Makefile.objs b/target/i386/Makefile.objs
index 48e0c28434..bdcdb32e93 100644
--- a/target/i386/Makefile.objs
+++ b/target/i386/Makefile.objs
@@ -17,6 +17,7 @@ obj-$(CONFIG_HAX) += hax-all.o hax-mem.o hax-posix.o
 endif
 obj-$(CONFIG_HVF) += hvf/
 obj-$(CONFIG_WHPX) += whpx-all.o
+obj-$(CONFIG_NVMM) += nvmm-all.o
 endif
 obj-$(CONFIG_SEV) += sev.o
 obj-$(call lnot,$(CONFIG_SEV)) += sev-stub.o
diff --git a/target/i386/nvmm-all.c b/target/i386/nvmm-all.c
new file mode 100644
index 0000000000..a21908f46a
--- /dev/null
+++ b/target/i386/nvmm-all.c
@@ -0,0 +1,1226 @@
+/*
+ * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
+ *
+ * NetBSD Virtual Machine Monitor (NVMM) accelerator for QEMU.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/address-spaces.h"
+#include "exec/ioport.h"
+#include "qemu-common.h"
+#include "strings.h"
+#include "sysemu/accel.h"
+#include "sysemu/nvmm.h"
+#include "sysemu/runstate.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/cpus.h"
+#include "qemu/main-loop.h"
+#include "qemu/error-report.h"
+#include "qemu/queue.h"
+#include "qapi/error.h"
+#include "migration/blocker.h"
+
+#include <nvmm.h>
+
+struct qemu_vcpu {
+    struct nvmm_vcpu vcpu;
+    uint8_t tpr;
+    bool stop;
+
+    /* Window-exiting for INTs/NMIs. */
+    bool int_window_exit;
+    bool nmi_window_exit;
+
+    /* The guest is in an interrupt shadow (POP SS, etc). */
+    bool int_shadow;
+};
+
+struct qemu_machine {
+    struct nvmm_capability cap;
+    struct nvmm_machine mach;
+};
+
+/* -------------------------------------------------------------------------- */
+
+static bool nvmm_allowed;
+static struct qemu_machine qemu_mach;
+
+static struct qemu_vcpu *
+get_qemu_vcpu(CPUState *cpu)
+{
+    return (struct qemu_vcpu *)cpu->hax_vcpu;
+}
+
+static struct nvmm_machine *
+get_nvmm_mach(void)
+{
+    return &qemu_mach.mach;
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_set_segment(struct nvmm_x64_state_seg *nseg, const SegmentCache *qseg)
+{
+    uint32_t attrib = qseg->flags;
+
+    nseg->selector = qseg->selector;
+    nseg->limit = qseg->limit;
+    nseg->base = qseg->base;
+    nseg->attrib.type = __SHIFTOUT(attrib, DESC_TYPE_MASK);
+    nseg->attrib.s = __SHIFTOUT(attrib, DESC_S_MASK);
+    nseg->attrib.dpl = __SHIFTOUT(attrib, DESC_DPL_MASK);
+    nseg->attrib.p = __SHIFTOUT(attrib, DESC_P_MASK);
+    nseg->attrib.avl = __SHIFTOUT(attrib, DESC_AVL_MASK);
+    nseg->attrib.l = __SHIFTOUT(attrib, DESC_L_MASK);
+    nseg->attrib.def = __SHIFTOUT(attrib, DESC_B_MASK);
+    nseg->attrib.g = __SHIFTOUT(attrib, DESC_G_MASK);
+}
+
+static void
+nvmm_set_registers(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t bitmap;
+    size_t i;
+    int ret;
+
+    assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
+
+    /* GPRs. */
+    state->gprs[NVMM_X64_GPR_RAX] = env->regs[R_EAX];
+    state->gprs[NVMM_X64_GPR_RCX] = env->regs[R_ECX];
+    state->gprs[NVMM_X64_GPR_RDX] = env->regs[R_EDX];
+    state->gprs[NVMM_X64_GPR_RBX] = env->regs[R_EBX];
+    state->gprs[NVMM_X64_GPR_RSP] = env->regs[R_ESP];
+    state->gprs[NVMM_X64_GPR_RBP] = env->regs[R_EBP];
+    state->gprs[NVMM_X64_GPR_RSI] = env->regs[R_ESI];
+    state->gprs[NVMM_X64_GPR_RDI] = env->regs[R_EDI];
+#ifdef TARGET_X86_64
+    state->gprs[NVMM_X64_GPR_R8]  = env->regs[R_R8];
+    state->gprs[NVMM_X64_GPR_R9]  = env->regs[R_R9];
+    state->gprs[NVMM_X64_GPR_R10] = env->regs[R_R10];
+    state->gprs[NVMM_X64_GPR_R11] = env->regs[R_R11];
+    state->gprs[NVMM_X64_GPR_R12] = env->regs[R_R12];
+    state->gprs[NVMM_X64_GPR_R13] = env->regs[R_R13];
+    state->gprs[NVMM_X64_GPR_R14] = env->regs[R_R14];
+    state->gprs[NVMM_X64_GPR_R15] = env->regs[R_R15];
+#endif
+
+    /* RIP and RFLAGS. */
+    state->gprs[NVMM_X64_GPR_RIP] = env->eip;
+    state->gprs[NVMM_X64_GPR_RFLAGS] = env->eflags;
+
+    /* Segments. */
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_CS], &env->segs[R_CS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_DS], &env->segs[R_DS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_ES], &env->segs[R_ES]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_FS], &env->segs[R_FS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_GS], &env->segs[R_GS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_SS], &env->segs[R_SS]);
+
+    /* Special segments. */
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_GDT], &env->gdt);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_LDT], &env->ldt);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_TR], &env->tr);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_IDT], &env->idt);
+
+    /* Control registers. */
+    state->crs[NVMM_X64_CR_CR0] = env->cr[0];
+    state->crs[NVMM_X64_CR_CR2] = env->cr[2];
+    state->crs[NVMM_X64_CR_CR3] = env->cr[3];
+    state->crs[NVMM_X64_CR_CR4] = env->cr[4];
+    state->crs[NVMM_X64_CR_CR8] = qcpu->tpr;
+    state->crs[NVMM_X64_CR_XCR0] = env->xcr0;
+
+    /* Debug registers. */
+    state->drs[NVMM_X64_DR_DR0] = env->dr[0];
+    state->drs[NVMM_X64_DR_DR1] = env->dr[1];
+    state->drs[NVMM_X64_DR_DR2] = env->dr[2];
+    state->drs[NVMM_X64_DR_DR3] = env->dr[3];
+    state->drs[NVMM_X64_DR_DR6] = env->dr[6];
+    state->drs[NVMM_X64_DR_DR7] = env->dr[7];
+
+    /* FPU. */
+    state->fpu.fx_cw = env->fpuc;
+    state->fpu.fx_sw = (env->fpus & ~0x3800) | ((env->fpstt & 0x7) << 11);
+    state->fpu.fx_tw = 0;
+    for (i = 0; i < 8; i++) {
+        state->fpu.fx_tw |= (!env->fptags[i]) << i;
+    }
+    state->fpu.fx_opcode = env->fpop;
+    state->fpu.fx_ip.fa_64 = env->fpip;
+    state->fpu.fx_dp.fa_64 = env->fpdp;
+    state->fpu.fx_mxcsr = env->mxcsr;
+    state->fpu.fx_mxcsr_mask = 0x0000FFFF;
+    assert(sizeof(state->fpu.fx_87_ac) == sizeof(env->fpregs));
+    memcpy(state->fpu.fx_87_ac, env->fpregs, sizeof(env->fpregs));
+    for (i = 0; i < 16; i++) {
+        memcpy(&state->fpu.fx_xmm[i].xmm_bytes[0],
+            &env->xmm_regs[i].ZMM_Q(0), 8);
+        memcpy(&state->fpu.fx_xmm[i].xmm_bytes[8],
+            &env->xmm_regs[i].ZMM_Q(1), 8);
+    }
+
+    /* MSRs. */
+    state->msrs[NVMM_X64_MSR_EFER] = env->efer;
+    state->msrs[NVMM_X64_MSR_STAR] = env->star;
+#ifdef TARGET_X86_64
+    state->msrs[NVMM_X64_MSR_LSTAR] = env->lstar;
+    state->msrs[NVMM_X64_MSR_CSTAR] = env->cstar;
+    state->msrs[NVMM_X64_MSR_SFMASK] = env->fmask;
+    state->msrs[NVMM_X64_MSR_KERNELGSBASE] = env->kernelgsbase;
+#endif
+    state->msrs[NVMM_X64_MSR_SYSENTER_CS]  = env->sysenter_cs;
+    state->msrs[NVMM_X64_MSR_SYSENTER_ESP] = env->sysenter_esp;
+    state->msrs[NVMM_X64_MSR_SYSENTER_EIP] = env->sysenter_eip;
+    state->msrs[NVMM_X64_MSR_PAT] = env->pat;
+    state->msrs[NVMM_X64_MSR_TSC] = env->tsc;
+
+    bitmap =
+        NVMM_X64_STATE_SEGS |
+        NVMM_X64_STATE_GPRS |
+        NVMM_X64_STATE_CRS  |
+        NVMM_X64_STATE_DRS  |
+        NVMM_X64_STATE_MSRS |
+        NVMM_X64_STATE_FPU;
+
+    ret = nvmm_vcpu_setstate(mach, vcpu, bitmap);
+    if (ret == -1) {
+        error_report("NVMM: Failed to set virtual processor context,"
+            " error=%d", errno);
+    }
+}
+
+static void
+nvmm_get_segment(SegmentCache *qseg, const struct nvmm_x64_state_seg *nseg)
+{
+    qseg->selector = nseg->selector;
+    qseg->limit = nseg->limit;
+    qseg->base = nseg->base;
+
+    qseg->flags =
+        __SHIFTIN((uint32_t)nseg->attrib.type, DESC_TYPE_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.s, DESC_S_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.dpl, DESC_DPL_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.p, DESC_P_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.avl, DESC_AVL_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.l, DESC_L_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.def, DESC_B_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.g, DESC_G_MASK);
+}
+
+static void
+nvmm_get_registers(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t bitmap, tpr;
+    size_t i;
+    int ret;
+
+    assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
+
+    bitmap =
+        NVMM_X64_STATE_SEGS |
+        NVMM_X64_STATE_GPRS |
+        NVMM_X64_STATE_CRS  |
+        NVMM_X64_STATE_DRS  |
+        NVMM_X64_STATE_MSRS |
+        NVMM_X64_STATE_FPU;
+
+    ret = nvmm_vcpu_getstate(mach, vcpu, bitmap);
+    if (ret == -1) {
+        error_report("NVMM: Failed to get virtual processor context,"
+            " error=%d", errno);
+    }
+
+    /* GPRs. */
+    env->regs[R_EAX] = state->gprs[NVMM_X64_GPR_RAX];
+    env->regs[R_ECX] = state->gprs[NVMM_X64_GPR_RCX];
+    env->regs[R_EDX] = state->gprs[NVMM_X64_GPR_RDX];
+    env->regs[R_EBX] = state->gprs[NVMM_X64_GPR_RBX];
+    env->regs[R_ESP] = state->gprs[NVMM_X64_GPR_RSP];
+    env->regs[R_EBP] = state->gprs[NVMM_X64_GPR_RBP];
+    env->regs[R_ESI] = state->gprs[NVMM_X64_GPR_RSI];
+    env->regs[R_EDI] = state->gprs[NVMM_X64_GPR_RDI];
+#ifdef TARGET_X86_64
+    env->regs[R_R8]  = state->gprs[NVMM_X64_GPR_R8];
+    env->regs[R_R9]  = state->gprs[NVMM_X64_GPR_R9];
+    env->regs[R_R10] = state->gprs[NVMM_X64_GPR_R10];
+    env->regs[R_R11] = state->gprs[NVMM_X64_GPR_R11];
+    env->regs[R_R12] = state->gprs[NVMM_X64_GPR_R12];
+    env->regs[R_R13] = state->gprs[NVMM_X64_GPR_R13];
+    env->regs[R_R14] = state->gprs[NVMM_X64_GPR_R14];
+    env->regs[R_R15] = state->gprs[NVMM_X64_GPR_R15];
+#endif
+
+    /* RIP and RFLAGS. */
+    env->eip = state->gprs[NVMM_X64_GPR_RIP];
+    env->eflags = state->gprs[NVMM_X64_GPR_RFLAGS];
+
+    /* Segments. */
+    nvmm_get_segment(&env->segs[R_ES], &state->segs[NVMM_X64_SEG_ES]);
+    nvmm_get_segment(&env->segs[R_CS], &state->segs[NVMM_X64_SEG_CS]);
+    nvmm_get_segment(&env->segs[R_SS], &state->segs[NVMM_X64_SEG_SS]);
+    nvmm_get_segment(&env->segs[R_DS], &state->segs[NVMM_X64_SEG_DS]);
+    nvmm_get_segment(&env->segs[R_FS], &state->segs[NVMM_X64_SEG_FS]);
+    nvmm_get_segment(&env->segs[R_GS], &state->segs[NVMM_X64_SEG_GS]);
+
+    /* Special segments. */
+    nvmm_get_segment(&env->gdt, &state->segs[NVMM_X64_SEG_GDT]);
+    nvmm_get_segment(&env->ldt, &state->segs[NVMM_X64_SEG_LDT]);
+    nvmm_get_segment(&env->tr, &state->segs[NVMM_X64_SEG_TR]);
+    nvmm_get_segment(&env->idt, &state->segs[NVMM_X64_SEG_IDT]);
+
+    /* Control registers. */
+    env->cr[0] = state->crs[NVMM_X64_CR_CR0];
+    env->cr[2] = state->crs[NVMM_X64_CR_CR2];
+    env->cr[3] = state->crs[NVMM_X64_CR_CR3];
+    env->cr[4] = state->crs[NVMM_X64_CR_CR4];
+    tpr = state->crs[NVMM_X64_CR_CR8];
+    if (tpr != qcpu->tpr) {
+        qcpu->tpr = tpr;
+        cpu_set_apic_tpr(x86_cpu->apic_state, tpr);
+    }
+    env->xcr0 = state->crs[NVMM_X64_CR_XCR0];
+
+    /* Debug registers. */
+    env->dr[0] = state->drs[NVMM_X64_DR_DR0];
+    env->dr[1] = state->drs[NVMM_X64_DR_DR1];
+    env->dr[2] = state->drs[NVMM_X64_DR_DR2];
+    env->dr[3] = state->drs[NVMM_X64_DR_DR3];
+    env->dr[6] = state->drs[NVMM_X64_DR_DR6];
+    env->dr[7] = state->drs[NVMM_X64_DR_DR7];
+
+    /* FPU. */
+    env->fpuc = state->fpu.fx_cw;
+    env->fpstt = (state->fpu.fx_sw >> 11) & 0x7;
+    env->fpus = state->fpu.fx_sw & ~0x3800;
+    for (i = 0; i < 8; i++) {
+        env->fptags[i] = !((state->fpu.fx_tw >> i) & 1);
+    }
+    env->fpop = state->fpu.fx_opcode;
+    env->fpip = state->fpu.fx_ip.fa_64;
+    env->fpdp = state->fpu.fx_dp.fa_64;
+    env->mxcsr = state->fpu.fx_mxcsr;
+    assert(sizeof(state->fpu.fx_87_ac) == sizeof(env->fpregs));
+    memcpy(env->fpregs, state->fpu.fx_87_ac, sizeof(env->fpregs));
+    for (i = 0; i < 16; i++) {
+        memcpy(&env->xmm_regs[i].ZMM_Q(0),
+            &state->fpu.fx_xmm[i].xmm_bytes[0], 8);
+        memcpy(&env->xmm_regs[i].ZMM_Q(1),
+            &state->fpu.fx_xmm[i].xmm_bytes[8], 8);
+    }
+
+    /* MSRs. */
+    env->efer = state->msrs[NVMM_X64_MSR_EFER];
+    env->star = state->msrs[NVMM_X64_MSR_STAR];
+#ifdef TARGET_X86_64
+    env->lstar = state->msrs[NVMM_X64_MSR_LSTAR];
+    env->cstar = state->msrs[NVMM_X64_MSR_CSTAR];
+    env->fmask = state->msrs[NVMM_X64_MSR_SFMASK];
+    env->kernelgsbase = state->msrs[NVMM_X64_MSR_KERNELGSBASE];
+#endif
+    env->sysenter_cs  = state->msrs[NVMM_X64_MSR_SYSENTER_CS];
+    env->sysenter_esp = state->msrs[NVMM_X64_MSR_SYSENTER_ESP];
+    env->sysenter_eip = state->msrs[NVMM_X64_MSR_SYSENTER_EIP];
+    env->pat = state->msrs[NVMM_X64_MSR_PAT];
+    env->tsc = state->msrs[NVMM_X64_MSR_TSC];
+
+    x86_update_hflags(env);
+}
+
+static bool
+nvmm_can_take_int(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    struct nvmm_machine *mach = get_nvmm_mach();
+
+    if (qcpu->int_window_exit) {
+        return false;
+    }
+
+    if (qcpu->int_shadow || !(env->eflags & IF_MASK)) {
+        struct nvmm_x64_state *state = vcpu->state;
+
+        /* Exit on interrupt window. */
+        nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_INTR);
+        state->intr.int_window_exiting = 1;
+        nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_INTR);
+
+        return false;
+    }
+
+    return true;
+}
+
+static bool
+nvmm_can_take_nmi(CPUState *cpu)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+
+    /*
+     * Contrary to INTs, NMIs always schedule an exit when they are
+     * completed. Therefore, if window-exiting is enabled, it means
+     * NMIs are blocked.
+     */
+    if (qcpu->nmi_window_exit) {
+        return false;
+    }
+
+    return true;
+}
+
+/*
+ * Called before the VCPU is run. We inject events generated by the I/O
+ * thread, and synchronize the guest TPR.
+ */
+static void
+nvmm_vcpu_pre_run(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    struct nvmm_vcpu_event *event = vcpu->event;
+    bool has_event = false;
+    bool sync_tpr = false;
+    uint8_t tpr;
+    int ret;
+
+    qemu_mutex_lock_iothread();
+
+    tpr = cpu_get_apic_tpr(x86_cpu->apic_state);
+    if (tpr != qcpu->tpr) {
+        qcpu->tpr = tpr;
+        sync_tpr = true;
+    }
+
+    /*
+     * Force the VCPU out of its inner loop to process any INIT requests
+     * or commit pending TPR access.
+     */
+    if (cpu->interrupt_request & (CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
+        cpu->exit_request = 1;
+    }
+
+    if (!has_event && (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+        if (nvmm_can_take_nmi(cpu)) {
+            cpu->interrupt_request &= ~CPU_INTERRUPT_NMI;
+            event->type = NVMM_VCPU_EVENT_INTR;
+            event->vector = 2;
+            has_event = true;
+        }
+    }
+
+    if (!has_event && (cpu->interrupt_request & CPU_INTERRUPT_HARD)) {
+        if (nvmm_can_take_int(cpu)) {
+            cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
+            event->type = NVMM_VCPU_EVENT_INTR;
+            event->vector = cpu_get_pic_interrupt(env);
+            has_event = true;
+        }
+    }
+
+    /* Don't want SMIs. */
+    if (cpu->interrupt_request & CPU_INTERRUPT_SMI) {
+        cpu->interrupt_request &= ~CPU_INTERRUPT_SMI;
+    }
+
+    if (sync_tpr) {
+        ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_CRS);
+        if (ret == -1) {
+            error_report("NVMM: Failed to get CPU state,"
+                " error=%d", errno);
+        }
+
+        state->crs[NVMM_X64_CR_CR8] = qcpu->tpr;
+
+        ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_CRS);
+        if (ret == -1) {
+            error_report("NVMM: Failed to set CPU state,"
+                " error=%d", errno);
+        }
+    }
+
+    if (has_event) {
+        ret = nvmm_vcpu_inject(mach, vcpu);
+        if (ret == -1) {
+            error_report("NVMM: Failed to inject event,"
+                " error=%d", errno);
+        }
+    }
+
+    qemu_mutex_unlock_iothread();
+}
+
+/*
+ * Called after the VCPU ran. We synchronize the host view of the TPR and
+ * RFLAGS.
+ */
+static void
+nvmm_vcpu_post_run(CPUState *cpu, struct nvmm_vcpu_exit *exit)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    uint64_t tpr;
+
+    env->eflags = exit->exitstate.rflags;
+    qcpu->int_shadow = exit->exitstate.int_shadow;
+    qcpu->int_window_exit = exit->exitstate.int_window_exiting;
+    qcpu->nmi_window_exit = exit->exitstate.nmi_window_exiting;
+
+    tpr = exit->exitstate.cr8;
+    if (qcpu->tpr != tpr) {
+        qcpu->tpr = tpr;
+        qemu_mutex_lock_iothread();
+        cpu_set_apic_tpr(x86_cpu->apic_state, qcpu->tpr);
+        qemu_mutex_unlock_iothread();
+    }
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_io_callback(struct nvmm_io *io)
+{
+    MemTxAttrs attrs = { 0 };
+    int ret;
+
+    ret = address_space_rw(&address_space_io, io->port, attrs, io->data,
+        io->size, !io->in);
+    if (ret != MEMTX_OK) {
+        error_report("NVMM: I/O Transaction Failed "
+            "[%s, port=%u, size=%zu]", (io->in ? "in" : "out"),
+            io->port, io->size);
+    }
+
+    /* Needed, otherwise infinite loop. */
+    current_cpu->vcpu_dirty = false;
+}
+
+static void
+nvmm_mem_callback(struct nvmm_mem *mem)
+{
+    cpu_physical_memory_rw(mem->gpa, mem->data, mem->size, mem->write);
+
+    /* XXX Needed, otherwise infinite loop. */
+    current_cpu->vcpu_dirty = false;
+}
+
+static struct nvmm_assist_callbacks nvmm_callbacks = {
+    .io = nvmm_io_callback,
+    .mem = nvmm_mem_callback
+};
+
+/* -------------------------------------------------------------------------- */
+
+static int
+nvmm_handle_mem(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
+{
+    int ret;
+
+    ret = nvmm_assist_mem(mach, vcpu);
+    if (ret == -1) {
+        error_report("NVMM: Mem Assist Failed [gpa=%p]",
+            (void *)vcpu->exit->u.mem.gpa);
+    }
+
+    return ret;
+}
+
+static int
+nvmm_handle_io(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
+{
+    int ret;
+
+    ret = nvmm_assist_io(mach, vcpu);
+    if (ret == -1) {
+        error_report("NVMM: I/O Assist Failed [port=%d]",
+            (int)vcpu->exit->u.io.port);
+    }
+
+    return ret;
+}
+
+static int
+nvmm_handle_rdmsr(struct nvmm_machine *mach, CPUState *cpu,
+    struct nvmm_vcpu_exit *exit)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t val;
+    int ret;
+
+    switch (exit->u.rdmsr.msr) {
+    case MSR_IA32_APICBASE:
+        val = cpu_get_apic_base(x86_cpu->apic_state);
+        break;
+    case MSR_MTRRcap:
+    case MSR_MTRRdefType:
+    case MSR_MCG_CAP:
+    case MSR_MCG_STATUS:
+        val = 0;
+        break;
+    default: /* More MSRs to add? */
+        val = 0;
+        error_report("NVMM: Unexpected RDMSR 0x%x, ignored",
+            exit->u.rdmsr.msr);
+        break;
+    }
+
+    ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    state->gprs[NVMM_X64_GPR_RAX] = (val & 0xFFFFFFFF);
+    state->gprs[NVMM_X64_GPR_RDX] = (val >> 32);
+    state->gprs[NVMM_X64_GPR_RIP] = exit->u.rdmsr.npc;
+
+    ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static int
+nvmm_handle_wrmsr(struct nvmm_machine *mach, CPUState *cpu,
+    struct nvmm_vcpu_exit *exit)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t val;
+    int ret;
+
+    val = exit->u.wrmsr.val;
+
+    switch (exit->u.wrmsr.msr) {
+    case MSR_IA32_APICBASE:
+        cpu_set_apic_base(x86_cpu->apic_state, val);
+        break;
+    case MSR_MTRRdefType:
+    case MSR_MCG_STATUS:
+        break;
+    default: /* More MSRs to add? */
+        error_report("NVMM: Unexpected WRMSR 0x%x [val=0x%lx], ignored",
+            exit->u.wrmsr.msr, val);
+        break;
+    }
+
+    ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    state->gprs[NVMM_X64_GPR_RIP] = exit->u.wrmsr.npc;
+
+    ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static int
+nvmm_handle_halted(struct nvmm_machine *mach, CPUState *cpu,
+    struct nvmm_vcpu_exit *exit)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    int ret = 0;
+
+    qemu_mutex_lock_iothread();
+
+    if (!((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
+          (env->eflags & IF_MASK)) &&
+        !(cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+        cpu->exception_index = EXCP_HLT;
+        cpu->halted = true;
+        ret = 1;
+    }
+
+    qemu_mutex_unlock_iothread();
+
+    return ret;
+}
+
+static int
+nvmm_inject_ud(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
+{
+    struct nvmm_vcpu_event *event = vcpu->event;
+
+    event->type = NVMM_VCPU_EVENT_EXCP;
+    event->vector = 6;
+    event->u.excp.error = 0;
+
+    return nvmm_vcpu_inject(mach, vcpu);
+}
+
+static int
+nvmm_vcpu_loop(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_vcpu_exit *exit = vcpu->exit;
+    int ret;
+
+    /*
+     * Some asynchronous events must be handled outside of the inner
+     * VCPU loop. They are handled here.
+     */
+    if (cpu->interrupt_request & CPU_INTERRUPT_INIT) {
+        nvmm_cpu_synchronize_state(cpu);
+        do_cpu_init(x86_cpu);
+        /* set int/nmi windows back to the reset state */
+    }
+    if (cpu->interrupt_request & CPU_INTERRUPT_POLL) {
+        cpu->interrupt_request &= ~CPU_INTERRUPT_POLL;
+        apic_poll_irq(x86_cpu->apic_state);
+    }
+    if (((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
+         (env->eflags & IF_MASK)) ||
+        (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+        cpu->halted = false;
+    }
+    if (cpu->interrupt_request & CPU_INTERRUPT_SIPI) {
+        nvmm_cpu_synchronize_state(cpu);
+        do_cpu_sipi(x86_cpu);
+    }
+    if (cpu->interrupt_request & CPU_INTERRUPT_TPR) {
+        cpu->interrupt_request &= ~CPU_INTERRUPT_TPR;
+        nvmm_cpu_synchronize_state(cpu);
+        apic_handle_tpr_access_report(x86_cpu->apic_state, env->eip,
+            env->tpr_access_type);
+    }
+
+    if (cpu->halted) {
+        cpu->exception_index = EXCP_HLT;
+        atomic_set(&cpu->exit_request, false);
+        return 0;
+    }
+
+    qemu_mutex_unlock_iothread();
+    cpu_exec_start(cpu);
+
+    /*
+     * Inner VCPU loop.
+     */
+    do {
+        if (cpu->vcpu_dirty) {
+            nvmm_set_registers(cpu);
+            cpu->vcpu_dirty = false;
+        }
+
+        if (qcpu->stop) {
+            cpu->exception_index = EXCP_INTERRUPT;
+            qcpu->stop = false;
+            ret = 1;
+            break;
+        }
+
+        nvmm_vcpu_pre_run(cpu);
+
+        if (atomic_read(&cpu->exit_request)) {
+            qemu_cpu_kick_self();
+        }
+
+        ret = nvmm_vcpu_run(mach, vcpu);
+        if (ret == -1) {
+            error_report("NVMM: Failed to exec a virtual processor,"
+                " error=%d", errno);
+            break;
+        }
+
+        nvmm_vcpu_post_run(cpu, exit);
+
+        switch (exit->reason) {
+        case NVMM_VCPU_EXIT_NONE:
+            break;
+        case NVMM_VCPU_EXIT_MEMORY:
+            ret = nvmm_handle_mem(mach, vcpu);
+            break;
+        case NVMM_VCPU_EXIT_IO:
+            ret = nvmm_handle_io(mach, vcpu);
+            break;
+        case NVMM_VCPU_EXIT_INT_READY:
+        case NVMM_VCPU_EXIT_NMI_READY:
+        case NVMM_VCPU_EXIT_TPR_CHANGED:
+            break;
+        case NVMM_VCPU_EXIT_HALTED:
+            ret = nvmm_handle_halted(mach, cpu, exit);
+            break;
+        case NVMM_VCPU_EXIT_SHUTDOWN:
+            qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
+            cpu->exception_index = EXCP_INTERRUPT;
+            ret = 1;
+            break;
+        case NVMM_VCPU_EXIT_RDMSR:
+            ret = nvmm_handle_rdmsr(mach, cpu, exit);
+            break;
+        case NVMM_VCPU_EXIT_WRMSR:
+            ret = nvmm_handle_wrmsr(mach, cpu, exit);
+            break;
+        case NVMM_VCPU_EXIT_MONITOR:
+        case NVMM_VCPU_EXIT_MWAIT:
+            ret = nvmm_inject_ud(mach, vcpu);
+            break;
+        default:
+            error_report("NVMM: Unexpected VM exit code 0x%lx [hw=0x%lx]",
+                exit->reason, exit->u.inv.hwcode);
+            nvmm_get_registers(cpu);
+            qemu_mutex_lock_iothread();
+            qemu_system_guest_panicked(cpu_get_crash_info(cpu));
+            qemu_mutex_unlock_iothread();
+            ret = -1;
+            break;
+        }
+    } while (ret == 0);
+
+    cpu_exec_end(cpu);
+    qemu_mutex_lock_iothread();
+    current_cpu = cpu;
+
+    atomic_set(&cpu->exit_request, false);
+
+    return ret < 0;
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+do_nvmm_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
+{
+    nvmm_get_registers(cpu);
+    cpu->vcpu_dirty = true;
+}
+
+static void
+do_nvmm_cpu_synchronize_post_reset(CPUState *cpu, run_on_cpu_data arg)
+{
+    nvmm_set_registers(cpu);
+    cpu->vcpu_dirty = false;
+}
+
+static void
+do_nvmm_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg)
+{
+    nvmm_set_registers(cpu);
+    cpu->vcpu_dirty = false;
+}
+
+static void
+do_nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu, run_on_cpu_data arg)
+{
+    cpu->vcpu_dirty = true;
+}
+
+void nvmm_cpu_synchronize_state(CPUState *cpu)
+{
+    if (!cpu->vcpu_dirty) {
+        run_on_cpu(cpu, do_nvmm_cpu_synchronize_state, RUN_ON_CPU_NULL);
+    }
+}
+
+void nvmm_cpu_synchronize_post_reset(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_nvmm_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
+}
+
+void nvmm_cpu_synchronize_post_init(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_nvmm_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
+}
+
+void nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_nvmm_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
+}
+
+/* -------------------------------------------------------------------------- */
+
+static Error *nvmm_migration_blocker;
+
+static void
+nvmm_ipi_signal(int sigcpu)
+{
+    struct qemu_vcpu *qcpu;
+
+    if (current_cpu) {
+        qcpu = get_qemu_vcpu(current_cpu);
+        qcpu->stop = true;
+    }
+}
+
+static void
+nvmm_init_cpu_signals(void)
+{
+    struct sigaction sigact;
+    sigset_t set;
+
+    /* Install the IPI handler. */
+    memset(&sigact, 0, sizeof(sigact));
+    sigact.sa_handler = nvmm_ipi_signal;
+    sigaction(SIG_IPI, &sigact, NULL);
+
+    /* Allow IPIs on the current thread. */
+    sigprocmask(SIG_BLOCK, NULL, &set);
+    sigdelset(&set, SIG_IPI);
+    pthread_sigmask(SIG_SETMASK, &set, NULL);
+}
+
+int
+nvmm_init_vcpu(CPUState *cpu)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct nvmm_vcpu_conf_cpuid cpuid;
+    struct nvmm_vcpu_conf_tpr tpr;
+    Error *local_error = NULL;
+    struct qemu_vcpu *qcpu;
+    int ret, err;
+
+    nvmm_init_cpu_signals();
+
+    if (nvmm_migration_blocker == NULL) {
+        error_setg(&nvmm_migration_blocker,
+            "NVMM: Migration not supported");
+
+        (void)migrate_add_blocker(nvmm_migration_blocker, &local_error);
+        if (local_error) {
+            error_report_err(local_error);
+            migrate_del_blocker(nvmm_migration_blocker);
+            error_free(nvmm_migration_blocker);
+            return -EINVAL;
+        }
+    }
+
+    qcpu = g_malloc0(sizeof(*qcpu));
+    if (qcpu == NULL) {
+        error_report("NVMM: Failed to allocate VCPU context.");
+        return -ENOMEM;
+    }
+
+    ret = nvmm_vcpu_create(mach, cpu->cpu_index, &qcpu->vcpu);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Failed to create a virtual processor,"
+            " error=%d", err);
+        g_free(qcpu);
+        return -err;
+    }
+
+    memset(&cpuid, 0, sizeof(cpuid));
+    cpuid.mask = 1;
+    cpuid.leaf = 0x00000001;
+    cpuid.u.mask.set.edx = CPUID_MCE | CPUID_MCA | CPUID_MTRR;
+    ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_CPUID,
+        &cpuid);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Failed to configure a virtual processor,"
+            " error=%d", err);
+        g_free(qcpu);
+        return -err;
+    }
+
+    ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_CALLBACKS,
+        &nvmm_callbacks);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Failed to configure a virtual processor,"
+            " error=%d", err);
+        g_free(qcpu);
+        return -err;
+    }
+
+    if (qemu_mach.cap.arch.vcpu_conf_support & NVMM_CAP_ARCH_VCPU_CONF_TPR) {
+        memset(&tpr, 0, sizeof(tpr));
+        tpr.exit_changed = 1;
+        ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_TPR, &tpr);
+        if (ret == -1) {
+            err = errno;
+            error_report("NVMM: Failed to configure a virtual processor,"
+                " error=%d", err);
+            g_free(qcpu);
+            return -err;
+        }
+    }
+
+    cpu->vcpu_dirty = true;
+    cpu->hax_vcpu = (struct hax_vcpu_state *)qcpu;
+
+    return 0;
+}
+
+int
+nvmm_vcpu_exec(CPUState *cpu)
+{
+    int ret, fatal;
+
+    while (1) {
+        if (cpu->exception_index >= EXCP_INTERRUPT) {
+            ret = cpu->exception_index;
+            cpu->exception_index = -1;
+            break;
+        }
+
+        fatal = nvmm_vcpu_loop(cpu);
+
+        if (fatal) {
+            error_report("NVMM: Failed to execute a VCPU.");
+            abort();
+        }
+    }
+
+    return ret;
+}
+
+void
+nvmm_destroy_vcpu(CPUState *cpu)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+
+    nvmm_vcpu_destroy(mach, &qcpu->vcpu);
+    g_free(cpu->hax_vcpu);
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_update_mapping(hwaddr start_pa, ram_addr_t size, uintptr_t hva,
+    bool add, bool rom, const char *name)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    int ret, prot;
+
+    if (add) {
+        prot = PROT_READ | PROT_EXEC;
+        if (!rom) {
+            prot |= PROT_WRITE;
+        }
+        ret = nvmm_gpa_map(mach, hva, start_pa, size, prot);
+    } else {
+        ret = nvmm_gpa_unmap(mach, hva, start_pa, size);
+    }
+
+    if (ret == -1) {
+        error_report("NVMM: Failed to %s GPA range '%s' PA:%p, "
+            "Size:%p bytes, HostVA:%p, error=%d",
+            (add ? "map" : "unmap"), name, (void *)(uintptr_t)start_pa,
+            (void *)size, (void *)hva, errno);
+    }
+}
+
+static void
+nvmm_process_section(MemoryRegionSection *section, int add)
+{
+    MemoryRegion *mr = section->mr;
+    hwaddr start_pa = section->offset_within_address_space;
+    ram_addr_t size = int128_get64(section->size);
+    unsigned int delta;
+    uintptr_t hva;
+
+    if (!memory_region_is_ram(mr)) {
+        return;
+    }
+
+    /* Adjust start_pa and size so that they are page-aligned. */
+    delta = qemu_real_host_page_size - (start_pa & ~qemu_real_host_page_mask);
+    delta &= ~qemu_real_host_page_mask;
+    if (delta > size) {
+        return;
+    }
+    start_pa += delta;
+    size -= delta;
+    size &= qemu_real_host_page_mask;
+    if (!size || (start_pa & ~qemu_real_host_page_mask)) {
+        return;
+    }
+
+    hva = (uintptr_t)memory_region_get_ram_ptr(mr) +
+        section->offset_within_region + delta;
+
+    nvmm_update_mapping(start_pa, size, hva, add,
+        memory_region_is_rom(mr), mr->name);
+}
+
+static void
+nvmm_region_add(MemoryListener *listener, MemoryRegionSection *section)
+{
+    memory_region_ref(section->mr);
+    nvmm_process_section(section, 1);
+}
+
+static void
+nvmm_region_del(MemoryListener *listener, MemoryRegionSection *section)
+{
+    nvmm_process_section(section, 0);
+    memory_region_unref(section->mr);
+}
+
+static void
+nvmm_transaction_begin(MemoryListener *listener)
+{
+    /* nothing */
+}
+
+static void
+nvmm_transaction_commit(MemoryListener *listener)
+{
+    /* nothing */
+}
+
+static void
+nvmm_log_sync(MemoryListener *listener, MemoryRegionSection *section)
+{
+    MemoryRegion *mr = section->mr;
+
+    if (!memory_region_is_ram(mr)) {
+        return;
+    }
+
+    memory_region_set_dirty(mr, 0, int128_get64(section->size));
+}
+
+static MemoryListener nvmm_memory_listener = {
+    .begin = nvmm_transaction_begin,
+    .commit = nvmm_transaction_commit,
+    .region_add = nvmm_region_add,
+    .region_del = nvmm_region_del,
+    .log_sync = nvmm_log_sync,
+    .priority = 10,
+};
+
+static void
+nvmm_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    uintptr_t hva = (uintptr_t)host;
+    int ret;
+
+    ret = nvmm_hva_map(mach, hva, size);
+
+    if (ret == -1) {
+        error_report("NVMM: Failed to map HVA, HostVA:%p "
+            "Size:%p bytes, error=%d",
+            (void *)hva, (void *)size, errno);
+    }
+}
+
+static struct RAMBlockNotifier nvmm_ram_notifier = {
+    .ram_block_added = nvmm_ram_block_added
+};
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_handle_interrupt(CPUState *cpu, int mask)
+{
+    cpu->interrupt_request |= mask;
+
+    if (!qemu_cpu_is_self(cpu)) {
+        qemu_cpu_kick(cpu);
+    }
+}
+
+/* -------------------------------------------------------------------------- */
+
+static int
+nvmm_accel_init(MachineState *ms)
+{
+    int ret, err;
+
+    ret = nvmm_init();
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Initialization failed, error=%d", err);
+        return -err;
+    }
+
+    ret = nvmm_capability(&qemu_mach.cap);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Unable to fetch capability, error=%d", err);
+        return -err;
+    }
+    if (qemu_mach.cap.version != 1) {
+        error_report("NVMM: Unsupported version %u", qemu_mach.cap.version);
+        return -EPROGMISMATCH;
+    }
+    if (qemu_mach.cap.state_size != sizeof(struct nvmm_x64_state)) {
+        error_report("NVMM: Wrong state size %u", qemu_mach.cap.state_size);
+        return -EPROGMISMATCH;
+    }
+
+    ret = nvmm_machine_create(&qemu_mach.mach);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Machine creation failed, error=%d", err);
+        return -err;
+    }
+
+    memory_listener_register(&nvmm_memory_listener, &address_space_memory);
+    ram_block_notifier_add(&nvmm_ram_notifier);
+
+    cpu_interrupt_handler = nvmm_handle_interrupt;
+
+    printf("NetBSD Virtual Machine Monitor accelerator is operational\n");
+    return 0;
+}
+
+int
+nvmm_enabled(void)
+{
+    return nvmm_allowed;
+}
+
+static void
+nvmm_accel_class_init(ObjectClass *oc, void *data)
+{
+    AccelClass *ac = ACCEL_CLASS(oc);
+    ac->name = "NVMM";
+    ac->init_machine = nvmm_accel_init;
+    ac->allowed = &nvmm_allowed;
+}
+
+static const TypeInfo nvmm_accel_type = {
+    .name = ACCEL_CLASS_NAME("nvmm"),
+    .parent = TYPE_ACCEL,
+    .class_init = nvmm_accel_class_init,
+};
+
+static void
+nvmm_type_init(void)
+{
+    type_register_static(&nvmm_accel_type);
+}
+
+type_init(nvmm_type_init);
--
2.25.0
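
Note on the page alignment in nvmm_process_section(): the arithmetic there
shrinks a section to the largest page-aligned range it fully contains, and
skips the section when no whole page fits. The following is only a minimal
standalone sketch of that arithmetic, not code from the patch. PAGE_SIZE,
PAGE_MASK and align_range() are hypothetical stand-ins for
qemu_real_host_page_size, qemu_real_host_page_mask and the inline logic,
and a 4 KiB host page is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB host pages; stand-ins for the qemu_real_host_page_* globals. */
#define PAGE_SIZE ((uint64_t)4096)
#define PAGE_MASK (~(PAGE_SIZE - 1))

/*
 * Hypothetical helper: shrink [*start, *start + *size) to the largest
 * page-aligned range inside it. Returns 1 on success, 0 when the range
 * does not contain a single whole page.
 */
static int align_range(uint64_t *start, uint64_t *size)
{
    /* Distance from *start up to the next page boundary. */
    uint64_t delta = PAGE_SIZE - (*start & ~PAGE_MASK);

    /* An already aligned start gives PAGE_SIZE above; fold that to 0. */
    delta &= ~PAGE_MASK;
    if (delta > *size) {
        return 0;
    }

    *start += delta;     /* round the start up */
    *size -= delta;
    *size &= PAGE_MASK;  /* round the length down */

    return *size != 0;
}
```

With start=0x1234 and size=0x3000 this yields start=0x2000 and size=0x2000:
the partial pages at both ends are dropped and only the two whole pages in
between would be handed to nvmm_gpa_map().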


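Note on the FPU conversion in nvmm_get_registers(): the FXSAVE-style status
word carries the x87 stack TOP in bits 11..13, which the patch moves into
env->fpstt and clears from env->fpus. The abridged tag byte uses 1 for a
valid register, while the code stores the inverse in env->fptags[]. Below is
a small sketch of that split and of the reverse direction. The struct and
function names are hypothetical and exist only for illustration; they are
not QEMU or libnvmm identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define X87_TOP_MASK  0x3800u  /* bits 11..13 of the status word */
#define X87_TOP_SHIFT 11

/* Hypothetical container mirroring env->fpus, env->fpstt and env->fptags. */
struct x87_split {
    uint16_t fpus;       /* status word with TOP cleared */
    unsigned fpstt;      /* stack TOP, 0..7 */
    uint8_t  fptags[8];  /* 1 = empty, i.e. the inverse of the abridged tag bit */
};

/* Split an FXSAVE status word and abridged tag byte, as the get path does. */
static struct x87_split x87_split_state(uint16_t fx_sw, uint8_t fx_tw)
{
    struct x87_split s;
    int i;

    s.fpstt = (fx_sw >> X87_TOP_SHIFT) & 0x7;
    s.fpus = (uint16_t)(fx_sw & ~X87_TOP_MASK);
    for (i = 0; i < 8; i++) {
        s.fptags[i] = !((fx_tw >> i) & 1);
    }
    return s;
}

/* Rebuild the status word, as a set path would have to do. */
static uint16_t x87_join_sw(const struct x87_split *s)
{
    return (uint16_t)((s->fpus & ~X87_TOP_MASK) |
                      ((s->fpstt & 0x7) << X87_TOP_SHIFT));
}
```

Splitting and joining are inverses of each other, so a get followed by a
set should leave the status word unchanged.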


* [PATCH v4 4/4] Add the NVMM acceleration enlightenments
  2020-02-06 21:32     ` [PATCH v4 " Kamil Rytarowski
                         ` (2 preceding siblings ...)
  2020-02-06 21:32       ` [PATCH v4 3/4] Introduce the NVMM impl Kamil Rytarowski
@ 2020-02-06 21:32       ` Kamil Rytarowski
  2020-02-17  9:07       ` [PATCH v4 0/4] Implements the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
  4 siblings, 0 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-02-06 21:32 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max, jmcneill
  Cc: Kamil Rytarowski, qemu-devel

From: Maxime Villard <max@m00nbsd.net>

Implement the NVMM accelerator CPU enlightenments needed to actually use the
nvmm-all accelerator on NetBSD platforms.

Signed-off-by: Maxime Villard <max@m00nbsd.net>
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Jared McNeill <jmcneill@invisible.ca>
---
 cpus.c                    | 58 +++++++++++++++++++++++++++++++++++++++
 include/sysemu/hw_accel.h | 14 ++++++++++
 target/i386/helper.c      |  2 +-
 3 files changed, 73 insertions(+), 1 deletion(-)
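
Note on the synchronization hooks: the hw_accel.h changes forward to the
nvmm_cpu_synchronize_*() functions from patch 3/4, which rely on the
cpu->vcpu_dirty flag to avoid redundant register transfers. Registers are
fetched from the hypervisor at most once per stop, and pushed back only if
QEMU may have modified them. The sketch below models that protocol with a
single register. It is a toy under stated assumptions: struct toy_vcpu and
both functions are hypothetical and do not exist in QEMU or libnvmm.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-register model of a VCPU with two copies of its state. */
struct toy_vcpu {
    int  qemu_reg;  /* QEMU-side copy (stands in for CPUX86State) */
    int  hv_reg;    /* hypervisor-side copy (stands in for NVMM state) */
    bool dirty;     /* true: the QEMU copy is authoritative */
    int  fetches;   /* number of hypervisor -> QEMU transfers */
    int  pushes;    /* number of QEMU -> hypervisor transfers */
};

/* Models nvmm_cpu_synchronize_state(): fetch only if not already fetched. */
static void toy_synchronize_state(struct toy_vcpu *v)
{
    if (!v->dirty) {
        v->qemu_reg = v->hv_reg;  /* nvmm_get_registers() in the patch */
        v->fetches++;
        v->dirty = true;
    }
}

/* Models the top of the inner VCPU loop: push only if QEMU owns the state. */
static void toy_before_run(struct toy_vcpu *v)
{
    if (v->dirty) {
        v->hv_reg = v->qemu_reg;  /* nvmm_set_registers() in the patch */
        v->pushes++;
        v->dirty = false;
    }
}
```

Calling toy_synchronize_state() twice in a row performs one fetch, and a
later toy_before_run() performs one push, which is the behaviour the
vcpu_dirty checks in the real code are meant to give.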

diff --git a/cpus.c b/cpus.c
index b4f8b84b61..f833da4a60 100644
--- a/cpus.c
+++ b/cpus.c
@@ -42,6 +42,7 @@
 #include "sysemu/hax.h"
 #include "sysemu/hvf.h"
 #include "sysemu/whpx.h"
+#include "sysemu/nvmm.h"
 #include "exec/exec-all.h"

 #include "qemu/thread.h"
@@ -1670,6 +1671,48 @@ static void *qemu_whpx_cpu_thread_fn(void *arg)
     return NULL;
 }

+static void *qemu_nvmm_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+    int r;
+
+    assert(nvmm_enabled());
+
+    rcu_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+    cpu->thread_id = qemu_get_thread_id();
+    current_cpu = cpu;
+
+    r = nvmm_init_vcpu(cpu);
+    if (r < 0) {
+        fprintf(stderr, "nvmm_init_vcpu failed: %s\n", strerror(-r));
+        exit(1);
+    }
+
+    /* signal CPU creation */
+    cpu->created = true;
+    qemu_cond_signal(&qemu_cpu_cond);
+
+    do {
+        if (cpu_can_run(cpu)) {
+            r = nvmm_vcpu_exec(cpu);
+            if (r == EXCP_DEBUG) {
+                cpu_handle_guest_debug(cpu);
+            }
+        }
+        qemu_wait_io_event(cpu);
+    } while (!cpu->unplug || cpu_can_run(cpu));
+
+    nvmm_destroy_vcpu(cpu);
+    cpu->created = false;
+    qemu_cond_signal(&qemu_cpu_cond);
+    qemu_mutex_unlock_iothread();
+    rcu_unregister_thread();
+    return NULL;
+}
+
 #ifdef _WIN32
 static void CALLBACK dummy_apc_func(ULONG_PTR unused)
 {
@@ -2038,6 +2081,19 @@ static void qemu_whpx_start_vcpu(CPUState *cpu)
 #endif
 }

+static void qemu_nvmm_start_vcpu(CPUState *cpu)
+{
+    char thread_name[VCPU_THREAD_NAME_SIZE];
+
+    cpu->thread = g_malloc0(sizeof(QemuThread));
+    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+    qemu_cond_init(cpu->halt_cond);
+    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/NVMM",
+             cpu->cpu_index);
+    qemu_thread_create(cpu->thread, thread_name, qemu_nvmm_cpu_thread_fn,
+                       cpu, QEMU_THREAD_JOINABLE);
+}
+
 static void qemu_dummy_start_vcpu(CPUState *cpu)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
@@ -2078,6 +2134,8 @@ void qemu_init_vcpu(CPUState *cpu)
         qemu_tcg_init_vcpu(cpu);
     } else if (whpx_enabled()) {
         qemu_whpx_start_vcpu(cpu);
+    } else if (nvmm_enabled()) {
+        qemu_nvmm_start_vcpu(cpu);
     } else {
         qemu_dummy_start_vcpu(cpu);
     }
diff --git a/include/sysemu/hw_accel.h b/include/sysemu/hw_accel.h
index 0ec2372477..dbfa7a02f9 100644
--- a/include/sysemu/hw_accel.h
+++ b/include/sysemu/hw_accel.h
@@ -15,6 +15,7 @@
 #include "sysemu/hax.h"
 #include "sysemu/kvm.h"
 #include "sysemu/whpx.h"
+#include "sysemu/nvmm.h"

 static inline void cpu_synchronize_state(CPUState *cpu)
 {
@@ -27,6 +28,9 @@ static inline void cpu_synchronize_state(CPUState *cpu)
     if (whpx_enabled()) {
         whpx_cpu_synchronize_state(cpu);
     }
+    if (nvmm_enabled()) {
+        nvmm_cpu_synchronize_state(cpu);
+    }
 }

 static inline void cpu_synchronize_post_reset(CPUState *cpu)
@@ -40,6 +44,10 @@ static inline void cpu_synchronize_post_reset(CPUState *cpu)
     if (whpx_enabled()) {
         whpx_cpu_synchronize_post_reset(cpu);
     }
+    if (nvmm_enabled()) {
+        nvmm_cpu_synchronize_post_reset(cpu);
+    }
+
 }

 static inline void cpu_synchronize_post_init(CPUState *cpu)
@@ -53,6 +61,9 @@ static inline void cpu_synchronize_post_init(CPUState *cpu)
     if (whpx_enabled()) {
         whpx_cpu_synchronize_post_init(cpu);
     }
+    if (nvmm_enabled()) {
+        nvmm_cpu_synchronize_post_init(cpu);
+    }
 }

 static inline void cpu_synchronize_pre_loadvm(CPUState *cpu)
@@ -66,6 +77,9 @@ static inline void cpu_synchronize_pre_loadvm(CPUState *cpu)
     if (whpx_enabled()) {
         whpx_cpu_synchronize_pre_loadvm(cpu);
     }
+    if (nvmm_enabled()) {
+        nvmm_cpu_synchronize_pre_loadvm(cpu);
+    }
 }

 #endif /* QEMU_HW_ACCEL_H */
diff --git a/target/i386/helper.c b/target/i386/helper.c
index c3a6e4fabe..2e79d61329 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -981,7 +981,7 @@ void cpu_report_tpr_access(CPUX86State *env, TPRAccess access)
     X86CPU *cpu = env_archcpu(env);
     CPUState *cs = env_cpu(env);

-    if (kvm_enabled() || whpx_enabled()) {
+    if (kvm_enabled() || whpx_enabled() || nvmm_enabled()) {
         env->tpr_access_type = access;

         cpu_interrupt(cs, CPU_INTERRUPT_TPR);
--
2.25.0




* [PATCH v4 3/4 FIXUP] Introduce the NVMM impl
  2020-02-06 21:32       ` [PATCH v4 3/4] Introduce the NVMM impl Kamil Rytarowski
@ 2020-02-06 23:28         ` Kamil Rytarowski
  2020-03-02 18:13         ` [PATCH v4 3/4] " Paolo Bonzini
  1 sibling, 0 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-02-06 23:28 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max, jmcneill
  Cc: Kamil Rytarowski, qemu-devel

From: Maxime Villard <max@m00nbsd.net>

Implement the NetBSD Virtual Machine Monitor (NVMM) target, which acts as
a hypervisor accelerator for QEMU on the NetBSD platform. This makes QEMU
much faster than the emulated x86_64 paths that are taken on NetBSD today.

Signed-off-by: Maxime Villard <max@m00nbsd.net>
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Tested-by: Jared McNeill <jmcneill@invisible.ca>
---
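Note on the x87 state conversion in nvmm_set_registers() and
nvmm_get_registers(): QEMU keeps the top-of-stack index (fpstt) separate
from the status word (fpus) and marks empty registers with fptags[i] == 1,
while the FXSAVE-style area used by NVMM stores TOP in bits 11-13 of the
status word and uses an abridged tag byte where a set bit means "valid".
The following is only a standalone sketch of that packing, not part of the
patch. The struct and function names are hypothetical stand-ins for the
CPUX86State fields; the bit arithmetic mirrors the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the x87 fields QEMU keeps in CPUX86State. */
struct x87_view {
    uint16_t fpus;      /* status word, TOP field (bits 11-13) cleared */
    unsigned fpstt;     /* top-of-stack index, 0..7 */
    uint8_t fptags[8];  /* 1 = register empty, 0 = register valid */
};

/* Fold the separate TOP index back into bits 11-13 of the status word. */
static uint16_t pack_status(const struct x87_view *v)
{
    return (uint16_t)((v->fpus & ~0x3800) | ((v->fpstt & 0x7) << 11));
}

/* Build the abridged tag byte: bit i is set when register i is valid. */
static uint8_t pack_tags(const struct x87_view *v)
{
    uint8_t tw = 0;
    int i;

    for (i = 0; i < 8; i++) {
        tw |= (uint8_t)((!v->fptags[i]) << i);
    }
    return tw;
}

/* Inverse direction: split TOP out again and invert the tag sense. */
static void unpack_x87(struct x87_view *v, uint16_t sw, uint8_t tw)
{
    int i;

    v->fpstt = (sw >> 11) & 0x7;
    v->fpus = sw & ~0x3800;
    for (i = 0; i < 8; i++) {
        v->fptags[i] = !((tw >> i) & 1);
    }
}

/* Pack a view, unpack it again, and check nothing was lost. */
static void x87_roundtrip_check(const struct x87_view *in)
{
    struct x87_view out;
    int i;

    unpack_x87(&out, pack_status(in), pack_tags(in));
    assert(out.fpstt == (in->fpstt & 0x7));
    assert(out.fpus == (in->fpus & ~0x3800));
    for (i = 0; i < 8; i++) {
        assert(out.fptags[i] == !!in->fptags[i]);
    }
}
```

Calling x87_roundtrip_check() on any view should pass as long as the two
directions stay symmetric, which is the property the set/get pair in the
patch relies on.
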
 target/i386/Makefile.objs |    1 +
 target/i386/nvmm-all.c    | 1226 +++++++++++++++++++++++++++++++++++++
 2 files changed, 1227 insertions(+)
 create mode 100644 target/i386/nvmm-all.c
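
Note on nvmm_set_segment() and nvmm_get_segment(): they rely on NetBSD's
__SHIFTOUT()/__SHIFTIN() macros to move descriptor attribute fields
between QEMU's packed flags word and the individual NVMM attrib fields.
The sketch below shows the idea with portable helpers; it is not part of
the patch. The helper names are hypothetical, and the mask values are
assumed to match QEMU's DESC_*_MASK definitions in target/i386/cpu.h.

```c
#include <stdint.h>

/* Assumed to mirror QEMU's descriptor flag masks (target/i386/cpu.h). */
#define SK_DESC_TYPE_MASK (0xfu << 8)
#define SK_DESC_S_MASK    (1u << 12)
#define SK_DESC_DPL_MASK  (3u << 13)
#define SK_DESC_P_MASK    (1u << 15)
#define SK_DESC_AVL_MASK  (1u << 20)
#define SK_DESC_L_MASK    (1u << 21)
#define SK_DESC_B_MASK    (1u << 22)
#define SK_DESC_G_MASK    (1u << 23)

/* Isolate the lowest set bit of a non-zero mask. */
static uint32_t lowest_set_bit(uint32_t mask)
{
    return mask & (~mask + 1u);
}

/* Extract the field selected by mask, shifted down to bit 0. */
static uint32_t shift_out(uint32_t x, uint32_t mask)
{
    return (x & mask) / lowest_set_bit(mask);
}

/* Place a field value at the position selected by mask. */
static uint32_t shift_in(uint32_t val, uint32_t mask)
{
    return (val * lowest_set_bit(mask)) & mask;
}

/* Hypothetical unpacked view, loosely modelled on the NVMM attrib fields. */
struct seg_attrib {
    uint32_t type, s, dpl, p, avl, l, def, g;
};

static struct seg_attrib attrib_from_flags(uint32_t flags)
{
    struct seg_attrib a;

    a.type = shift_out(flags, SK_DESC_TYPE_MASK);
    a.s    = shift_out(flags, SK_DESC_S_MASK);
    a.dpl  = shift_out(flags, SK_DESC_DPL_MASK);
    a.p    = shift_out(flags, SK_DESC_P_MASK);
    a.avl  = shift_out(flags, SK_DESC_AVL_MASK);
    a.l    = shift_out(flags, SK_DESC_L_MASK);
    a.def  = shift_out(flags, SK_DESC_B_MASK);
    a.g    = shift_out(flags, SK_DESC_G_MASK);
    return a;
}

static uint32_t flags_from_attrib(const struct seg_attrib *a)
{
    return shift_in(a->type, SK_DESC_TYPE_MASK) |
           shift_in(a->s, SK_DESC_S_MASK) |
           shift_in(a->dpl, SK_DESC_DPL_MASK) |
           shift_in(a->p, SK_DESC_P_MASK) |
           shift_in(a->avl, SK_DESC_AVL_MASK) |
           shift_in(a->l, SK_DESC_L_MASK) |
           shift_in(a->def, SK_DESC_B_MASK) |
           shift_in(a->g, SK_DESC_G_MASK);
}
```

Only the bits covered by the eight masks survive a round trip through
these helpers; any other bits in the flags word are dropped, which is
also true of the conversion in the patch.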

diff --git a/target/i386/Makefile.objs b/target/i386/Makefile.objs
index 48e0c28434..bdcdb32e93 100644
--- a/target/i386/Makefile.objs
+++ b/target/i386/Makefile.objs
@@ -17,6 +17,7 @@ obj-$(CONFIG_HAX) += hax-all.o hax-mem.o hax-posix.o
 endif
 obj-$(CONFIG_HVF) += hvf/
 obj-$(CONFIG_WHPX) += whpx-all.o
+obj-$(CONFIG_NVMM) += nvmm-all.o
 endif
 obj-$(CONFIG_SEV) += sev.o
 obj-$(call lnot,$(CONFIG_SEV)) += sev-stub.o
diff --git a/target/i386/nvmm-all.c b/target/i386/nvmm-all.c
new file mode 100644
index 0000000000..b3f1c11984
--- /dev/null
+++ b/target/i386/nvmm-all.c
@@ -0,0 +1,1226 @@
+/*
+ * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
+ *
+ * NetBSD Virtual Machine Monitor (NVMM) accelerator for QEMU.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/address-spaces.h"
+#include "exec/ioport.h"
+#include "qemu-common.h"
+#include "strings.h"
+#include "sysemu/accel.h"
+#include "sysemu/nvmm.h"
+#include "sysemu/runstate.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/cpus.h"
+#include "qemu/main-loop.h"
+#include "qemu/error-report.h"
+#include "qemu/queue.h"
+#include "qapi/error.h"
+#include "migration/blocker.h"
+
+#include <nvmm.h>
+
+struct qemu_vcpu {
+    struct nvmm_vcpu vcpu;
+    uint8_t tpr;
+    bool stop;
+
+    /* Window-exiting for INTs/NMIs. */
+    bool int_window_exit;
+    bool nmi_window_exit;
+
+    /* The guest is in an interrupt shadow (POP SS, etc). */
+    bool int_shadow;
+};
+
+struct qemu_machine {
+    struct nvmm_capability cap;
+    struct nvmm_machine mach;
+};
+
+/* -------------------------------------------------------------------------- */
+
+static bool nvmm_allowed;
+static struct qemu_machine qemu_mach;
+
+static struct qemu_vcpu *
+get_qemu_vcpu(CPUState *cpu)
+{
+    return (struct qemu_vcpu *)cpu->hax_vcpu;
+}
+
+static struct nvmm_machine *
+get_nvmm_mach(void)
+{
+    return &qemu_mach.mach;
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_set_segment(struct nvmm_x64_state_seg *nseg, const SegmentCache *qseg)
+{
+    uint32_t attrib = qseg->flags;
+
+    nseg->selector = qseg->selector;
+    nseg->limit = qseg->limit;
+    nseg->base = qseg->base;
+    nseg->attrib.type = __SHIFTOUT(attrib, DESC_TYPE_MASK);
+    nseg->attrib.s = __SHIFTOUT(attrib, DESC_S_MASK);
+    nseg->attrib.dpl = __SHIFTOUT(attrib, DESC_DPL_MASK);
+    nseg->attrib.p = __SHIFTOUT(attrib, DESC_P_MASK);
+    nseg->attrib.avl = __SHIFTOUT(attrib, DESC_AVL_MASK);
+    nseg->attrib.l = __SHIFTOUT(attrib, DESC_L_MASK);
+    nseg->attrib.def = __SHIFTOUT(attrib, DESC_B_MASK);
+    nseg->attrib.g = __SHIFTOUT(attrib, DESC_G_MASK);
+}
+
+static void
+nvmm_set_registers(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t bitmap;
+    size_t i;
+    int ret;
+
+    assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
+
+    /* GPRs. */
+    state->gprs[NVMM_X64_GPR_RAX] = env->regs[R_EAX];
+    state->gprs[NVMM_X64_GPR_RCX] = env->regs[R_ECX];
+    state->gprs[NVMM_X64_GPR_RDX] = env->regs[R_EDX];
+    state->gprs[NVMM_X64_GPR_RBX] = env->regs[R_EBX];
+    state->gprs[NVMM_X64_GPR_RSP] = env->regs[R_ESP];
+    state->gprs[NVMM_X64_GPR_RBP] = env->regs[R_EBP];
+    state->gprs[NVMM_X64_GPR_RSI] = env->regs[R_ESI];
+    state->gprs[NVMM_X64_GPR_RDI] = env->regs[R_EDI];
+#ifdef TARGET_X86_64
+    state->gprs[NVMM_X64_GPR_R8]  = env->regs[R_R8];
+    state->gprs[NVMM_X64_GPR_R9]  = env->regs[R_R9];
+    state->gprs[NVMM_X64_GPR_R10] = env->regs[R_R10];
+    state->gprs[NVMM_X64_GPR_R11] = env->regs[R_R11];
+    state->gprs[NVMM_X64_GPR_R12] = env->regs[R_R12];
+    state->gprs[NVMM_X64_GPR_R13] = env->regs[R_R13];
+    state->gprs[NVMM_X64_GPR_R14] = env->regs[R_R14];
+    state->gprs[NVMM_X64_GPR_R15] = env->regs[R_R15];
+#endif
+
+    /* RIP and RFLAGS. */
+    state->gprs[NVMM_X64_GPR_RIP] = env->eip;
+    state->gprs[NVMM_X64_GPR_RFLAGS] = env->eflags;
+
+    /* Segments. */
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_CS], &env->segs[R_CS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_DS], &env->segs[R_DS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_ES], &env->segs[R_ES]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_FS], &env->segs[R_FS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_GS], &env->segs[R_GS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_SS], &env->segs[R_SS]);
+
+    /* Special segments. */
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_GDT], &env->gdt);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_LDT], &env->ldt);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_TR], &env->tr);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_IDT], &env->idt);
+
+    /* Control registers. */
+    state->crs[NVMM_X64_CR_CR0] = env->cr[0];
+    state->crs[NVMM_X64_CR_CR2] = env->cr[2];
+    state->crs[NVMM_X64_CR_CR3] = env->cr[3];
+    state->crs[NVMM_X64_CR_CR4] = env->cr[4];
+    state->crs[NVMM_X64_CR_CR8] = qcpu->tpr;
+    state->crs[NVMM_X64_CR_XCR0] = env->xcr0;
+
+    /* Debug registers. */
+    state->drs[NVMM_X64_DR_DR0] = env->dr[0];
+    state->drs[NVMM_X64_DR_DR1] = env->dr[1];
+    state->drs[NVMM_X64_DR_DR2] = env->dr[2];
+    state->drs[NVMM_X64_DR_DR3] = env->dr[3];
+    state->drs[NVMM_X64_DR_DR6] = env->dr[6];
+    state->drs[NVMM_X64_DR_DR7] = env->dr[7];
+
+    /* FPU. */
+    state->fpu.fx_cw = env->fpuc;
+    state->fpu.fx_sw = (env->fpus & ~0x3800) | ((env->fpstt & 0x7) << 11);
+    state->fpu.fx_tw = 0;
+    for (i = 0; i < 8; i++) {
+        state->fpu.fx_tw |= (!env->fptags[i]) << i;
+    }
+    state->fpu.fx_opcode = env->fpop;
+    state->fpu.fx_ip.fa_64 = env->fpip;
+    state->fpu.fx_dp.fa_64 = env->fpdp;
+    state->fpu.fx_mxcsr = env->mxcsr;
+    state->fpu.fx_mxcsr_mask = 0x0000FFFF;
+    assert(sizeof(state->fpu.fx_87_ac) == sizeof(env->fpregs));
+    memcpy(state->fpu.fx_87_ac, env->fpregs, sizeof(env->fpregs));
+    for (i = 0; i < CPU_NB_REGS; i++) {
+        memcpy(&state->fpu.fx_xmm[i].xmm_bytes[0],
+            &env->xmm_regs[i].ZMM_Q(0), 8);
+        memcpy(&state->fpu.fx_xmm[i].xmm_bytes[8],
+            &env->xmm_regs[i].ZMM_Q(1), 8);
+    }
+
+    /* MSRs. */
+    state->msrs[NVMM_X64_MSR_EFER] = env->efer;
+    state->msrs[NVMM_X64_MSR_STAR] = env->star;
+#ifdef TARGET_X86_64
+    state->msrs[NVMM_X64_MSR_LSTAR] = env->lstar;
+    state->msrs[NVMM_X64_MSR_CSTAR] = env->cstar;
+    state->msrs[NVMM_X64_MSR_SFMASK] = env->fmask;
+    state->msrs[NVMM_X64_MSR_KERNELGSBASE] = env->kernelgsbase;
+#endif
+    state->msrs[NVMM_X64_MSR_SYSENTER_CS]  = env->sysenter_cs;
+    state->msrs[NVMM_X64_MSR_SYSENTER_ESP] = env->sysenter_esp;
+    state->msrs[NVMM_X64_MSR_SYSENTER_EIP] = env->sysenter_eip;
+    state->msrs[NVMM_X64_MSR_PAT] = env->pat;
+    state->msrs[NVMM_X64_MSR_TSC] = env->tsc;
+
+    bitmap =
+        NVMM_X64_STATE_SEGS |
+        NVMM_X64_STATE_GPRS |
+        NVMM_X64_STATE_CRS  |
+        NVMM_X64_STATE_DRS  |
+        NVMM_X64_STATE_MSRS |
+        NVMM_X64_STATE_FPU;
+
+    ret = nvmm_vcpu_setstate(mach, vcpu, bitmap);
+    if (ret == -1) {
+        error_report("NVMM: Failed to set virtual processor context,"
+            " error=%d", errno);
+    }
+}
+
+static void
+nvmm_get_segment(SegmentCache *qseg, const struct nvmm_x64_state_seg *nseg)
+{
+    qseg->selector = nseg->selector;
+    qseg->limit = nseg->limit;
+    qseg->base = nseg->base;
+
+    qseg->flags =
+        __SHIFTIN((uint32_t)nseg->attrib.type, DESC_TYPE_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.s, DESC_S_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.dpl, DESC_DPL_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.p, DESC_P_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.avl, DESC_AVL_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.l, DESC_L_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.def, DESC_B_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.g, DESC_G_MASK);
+}
+
+static void
+nvmm_get_registers(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t bitmap, tpr;
+    size_t i;
+    int ret;
+
+    assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
+
+    bitmap =
+        NVMM_X64_STATE_SEGS |
+        NVMM_X64_STATE_GPRS |
+        NVMM_X64_STATE_CRS  |
+        NVMM_X64_STATE_DRS  |
+        NVMM_X64_STATE_MSRS |
+        NVMM_X64_STATE_FPU;
+
+    ret = nvmm_vcpu_getstate(mach, vcpu, bitmap);
+    if (ret == -1) {
+        error_report("NVMM: Failed to get virtual processor context,"
+            " error=%d", errno);
+    }
+
+    /* GPRs. */
+    env->regs[R_EAX] = state->gprs[NVMM_X64_GPR_RAX];
+    env->regs[R_ECX] = state->gprs[NVMM_X64_GPR_RCX];
+    env->regs[R_EDX] = state->gprs[NVMM_X64_GPR_RDX];
+    env->regs[R_EBX] = state->gprs[NVMM_X64_GPR_RBX];
+    env->regs[R_ESP] = state->gprs[NVMM_X64_GPR_RSP];
+    env->regs[R_EBP] = state->gprs[NVMM_X64_GPR_RBP];
+    env->regs[R_ESI] = state->gprs[NVMM_X64_GPR_RSI];
+    env->regs[R_EDI] = state->gprs[NVMM_X64_GPR_RDI];
+#ifdef TARGET_X86_64
+    env->regs[R_R8]  = state->gprs[NVMM_X64_GPR_R8];
+    env->regs[R_R9]  = state->gprs[NVMM_X64_GPR_R9];
+    env->regs[R_R10] = state->gprs[NVMM_X64_GPR_R10];
+    env->regs[R_R11] = state->gprs[NVMM_X64_GPR_R11];
+    env->regs[R_R12] = state->gprs[NVMM_X64_GPR_R12];
+    env->regs[R_R13] = state->gprs[NVMM_X64_GPR_R13];
+    env->regs[R_R14] = state->gprs[NVMM_X64_GPR_R14];
+    env->regs[R_R15] = state->gprs[NVMM_X64_GPR_R15];
+#endif
+
+    /* RIP and RFLAGS. */
+    env->eip = state->gprs[NVMM_X64_GPR_RIP];
+    env->eflags = state->gprs[NVMM_X64_GPR_RFLAGS];
+
+    /* Segments. */
+    nvmm_get_segment(&env->segs[R_ES], &state->segs[NVMM_X64_SEG_ES]);
+    nvmm_get_segment(&env->segs[R_CS], &state->segs[NVMM_X64_SEG_CS]);
+    nvmm_get_segment(&env->segs[R_SS], &state->segs[NVMM_X64_SEG_SS]);
+    nvmm_get_segment(&env->segs[R_DS], &state->segs[NVMM_X64_SEG_DS]);
+    nvmm_get_segment(&env->segs[R_FS], &state->segs[NVMM_X64_SEG_FS]);
+    nvmm_get_segment(&env->segs[R_GS], &state->segs[NVMM_X64_SEG_GS]);
+
+    /* Special segments. */
+    nvmm_get_segment(&env->gdt, &state->segs[NVMM_X64_SEG_GDT]);
+    nvmm_get_segment(&env->ldt, &state->segs[NVMM_X64_SEG_LDT]);
+    nvmm_get_segment(&env->tr, &state->segs[NVMM_X64_SEG_TR]);
+    nvmm_get_segment(&env->idt, &state->segs[NVMM_X64_SEG_IDT]);
+
+    /* Control registers. */
+    env->cr[0] = state->crs[NVMM_X64_CR_CR0];
+    env->cr[2] = state->crs[NVMM_X64_CR_CR2];
+    env->cr[3] = state->crs[NVMM_X64_CR_CR3];
+    env->cr[4] = state->crs[NVMM_X64_CR_CR4];
+    tpr = state->crs[NVMM_X64_CR_CR8];
+    if (tpr != qcpu->tpr) {
+        qcpu->tpr = tpr;
+        cpu_set_apic_tpr(x86_cpu->apic_state, tpr);
+    }
+    env->xcr0 = state->crs[NVMM_X64_CR_XCR0];
+
+    /* Debug registers. */
+    env->dr[0] = state->drs[NVMM_X64_DR_DR0];
+    env->dr[1] = state->drs[NVMM_X64_DR_DR1];
+    env->dr[2] = state->drs[NVMM_X64_DR_DR2];
+    env->dr[3] = state->drs[NVMM_X64_DR_DR3];
+    env->dr[6] = state->drs[NVMM_X64_DR_DR6];
+    env->dr[7] = state->drs[NVMM_X64_DR_DR7];
+
+    /* FPU. */
+    env->fpuc = state->fpu.fx_cw;
+    env->fpstt = (state->fpu.fx_sw >> 11) & 0x7;
+    env->fpus = state->fpu.fx_sw & ~0x3800;
+    for (i = 0; i < 8; i++) {
+        env->fptags[i] = !((state->fpu.fx_tw >> i) & 1);
+    }
+    env->fpop = state->fpu.fx_opcode;
+    env->fpip = state->fpu.fx_ip.fa_64;
+    env->fpdp = state->fpu.fx_dp.fa_64;
+    env->mxcsr = state->fpu.fx_mxcsr;
+    assert(sizeof(state->fpu.fx_87_ac) == sizeof(env->fpregs));
+    memcpy(env->fpregs, state->fpu.fx_87_ac, sizeof(env->fpregs));
+    for (i = 0; i < CPU_NB_REGS; i++) {
+        memcpy(&env->xmm_regs[i].ZMM_Q(0),
+            &state->fpu.fx_xmm[i].xmm_bytes[0], 8);
+        memcpy(&env->xmm_regs[i].ZMM_Q(1),
+            &state->fpu.fx_xmm[i].xmm_bytes[8], 8);
+    }
+
+    /* MSRs. */
+    env->efer = state->msrs[NVMM_X64_MSR_EFER];
+    env->star = state->msrs[NVMM_X64_MSR_STAR];
+#ifdef TARGET_X86_64
+    env->lstar = state->msrs[NVMM_X64_MSR_LSTAR];
+    env->cstar = state->msrs[NVMM_X64_MSR_CSTAR];
+    env->fmask = state->msrs[NVMM_X64_MSR_SFMASK];
+    env->kernelgsbase = state->msrs[NVMM_X64_MSR_KERNELGSBASE];
+#endif
+    env->sysenter_cs  = state->msrs[NVMM_X64_MSR_SYSENTER_CS];
+    env->sysenter_esp = state->msrs[NVMM_X64_MSR_SYSENTER_ESP];
+    env->sysenter_eip = state->msrs[NVMM_X64_MSR_SYSENTER_EIP];
+    env->pat = state->msrs[NVMM_X64_MSR_PAT];
+    env->tsc = state->msrs[NVMM_X64_MSR_TSC];
+
+    x86_update_hflags(env);
+}
+
+static bool
+nvmm_can_take_int(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    struct nvmm_machine *mach = get_nvmm_mach();
+
+    if (qcpu->int_window_exit) {
+        return false;
+    }
+
+    if (qcpu->int_shadow || !(env->eflags & IF_MASK)) {
+        struct nvmm_x64_state *state = vcpu->state;
+
+        /* Exit on interrupt window. */
+        nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_INTR);
+        state->intr.int_window_exiting = 1;
+        nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_INTR);
+
+        return false;
+    }
+
+    return true;
+}
+
+static bool
+nvmm_can_take_nmi(CPUState *cpu)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+
+    /*
+     * Contrary to INTs, NMIs always schedule an exit when they are
+     * completed. Therefore, if window-exiting is enabled, it means
+     * NMIs are blocked.
+     */
+    if (qcpu->nmi_window_exit) {
+        return false;
+    }
+
+    return true;
+}
+
+/*
+ * Called before the VCPU is run. We inject events generated by the I/O
+ * thread, and synchronize the guest TPR.
+ */
+static void
+nvmm_vcpu_pre_run(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    struct nvmm_vcpu_event *event = vcpu->event;
+    bool has_event = false;
+    bool sync_tpr = false;
+    uint8_t tpr;
+    int ret;
+
+    qemu_mutex_lock_iothread();
+
+    tpr = cpu_get_apic_tpr(x86_cpu->apic_state);
+    if (tpr != qcpu->tpr) {
+        qcpu->tpr = tpr;
+        sync_tpr = true;
+    }
+
+    /*
+     * Force the VCPU out of its inner loop to process any INIT requests
+     * or commit pending TPR access.
+     */
+    if (cpu->interrupt_request & (CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
+        cpu->exit_request = 1;
+    }
+
+    if (!has_event && (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+        if (nvmm_can_take_nmi(cpu)) {
+            cpu->interrupt_request &= ~CPU_INTERRUPT_NMI;
+            event->type = NVMM_VCPU_EVENT_INTR;
+            event->vector = 2;
+            has_event = true;
+        }
+    }
+
+    if (!has_event && (cpu->interrupt_request & CPU_INTERRUPT_HARD)) {
+        if (nvmm_can_take_int(cpu)) {
+            cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
+            event->type = NVMM_VCPU_EVENT_INTR;
+            event->vector = cpu_get_pic_interrupt(env);
+            has_event = true;
+        }
+    }
+
+    /* Don't want SMIs. */
+    if (cpu->interrupt_request & CPU_INTERRUPT_SMI) {
+        cpu->interrupt_request &= ~CPU_INTERRUPT_SMI;
+    }
+
+    if (sync_tpr) {
+        ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_CRS);
+        if (ret == -1) {
+            error_report("NVMM: Failed to get CPU state,"
+                " error=%d", errno);
+        }
+
+        state->crs[NVMM_X64_CR_CR8] = qcpu->tpr;
+
+        ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_CRS);
+        if (ret == -1) {
+            error_report("NVMM: Failed to set CPU state,"
+                " error=%d", errno);
+        }
+    }
+
+    if (has_event) {
+        ret = nvmm_vcpu_inject(mach, vcpu);
+        if (ret == -1) {
+            error_report("NVMM: Failed to inject event,"
+                " error=%d", errno);
+        }
+    }
+
+    qemu_mutex_unlock_iothread();
+}
+
+/*
+ * Called after the VCPU ran. We synchronize the host view of the TPR and
+ * RFLAGS.
+ */
+static void
+nvmm_vcpu_post_run(CPUState *cpu, struct nvmm_vcpu_exit *exit)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    uint64_t tpr;
+
+    env->eflags = exit->exitstate.rflags;
+    qcpu->int_shadow = exit->exitstate.int_shadow;
+    qcpu->int_window_exit = exit->exitstate.int_window_exiting;
+    qcpu->nmi_window_exit = exit->exitstate.nmi_window_exiting;
+
+    tpr = exit->exitstate.cr8;
+    if (qcpu->tpr != tpr) {
+        qcpu->tpr = tpr;
+        qemu_mutex_lock_iothread();
+        cpu_set_apic_tpr(x86_cpu->apic_state, qcpu->tpr);
+        qemu_mutex_unlock_iothread();
+    }
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_io_callback(struct nvmm_io *io)
+{
+    MemTxAttrs attrs = { 0 };
+    int ret;
+
+    ret = address_space_rw(&address_space_io, io->port, attrs, io->data,
+        io->size, !io->in);
+    if (ret != MEMTX_OK) {
+        error_report("NVMM: I/O Transaction Failed "
+            "[%s, port=%u, size=%zu]", (io->in ? "in" : "out"),
+            io->port, io->size);
+    }
+
+    /* Needed, otherwise infinite loop. */
+    current_cpu->vcpu_dirty = false;
+}
+
+static void
+nvmm_mem_callback(struct nvmm_mem *mem)
+{
+    cpu_physical_memory_rw(mem->gpa, mem->data, mem->size, mem->write);
+
+    /* XXX Needed, otherwise infinite loop. */
+    current_cpu->vcpu_dirty = false;
+}
+
+static struct nvmm_assist_callbacks nvmm_callbacks = {
+    .io = nvmm_io_callback,
+    .mem = nvmm_mem_callback
+};
+
+/* -------------------------------------------------------------------------- */
+
+static int
+nvmm_handle_mem(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
+{
+    int ret;
+
+    ret = nvmm_assist_mem(mach, vcpu);
+    if (ret == -1) {
+        error_report("NVMM: Mem Assist Failed [gpa=%p]",
+            (void *)vcpu->exit->u.mem.gpa);
+    }
+
+    return ret;
+}
+
+static int
+nvmm_handle_io(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
+{
+    int ret;
+
+    ret = nvmm_assist_io(mach, vcpu);
+    if (ret == -1) {
+        error_report("NVMM: I/O Assist Failed [port=%d]",
+            (int)vcpu->exit->u.io.port);
+    }
+
+    return ret;
+}
+
+static int
+nvmm_handle_rdmsr(struct nvmm_machine *mach, CPUState *cpu,
+    struct nvmm_vcpu_exit *exit)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t val;
+    int ret;
+
+    switch (exit->u.rdmsr.msr) {
+    case MSR_IA32_APICBASE:
+        val = cpu_get_apic_base(x86_cpu->apic_state);
+        break;
+    case MSR_MTRRcap:
+    case MSR_MTRRdefType:
+    case MSR_MCG_CAP:
+    case MSR_MCG_STATUS:
+        val = 0;
+        break;
+    default: /* More MSRs to add? */
+        val = 0;
+        error_report("NVMM: Unexpected RDMSR 0x%x, ignored",
+            exit->u.rdmsr.msr);
+        break;
+    }
+
+    ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    state->gprs[NVMM_X64_GPR_RAX] = (val & 0xFFFFFFFF);
+    state->gprs[NVMM_X64_GPR_RDX] = (val >> 32);
+    state->gprs[NVMM_X64_GPR_RIP] = exit->u.rdmsr.npc;
+
+    ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static int
+nvmm_handle_wrmsr(struct nvmm_machine *mach, CPUState *cpu,
+    struct nvmm_vcpu_exit *exit)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t val;
+    int ret;
+
+    val = exit->u.wrmsr.val;
+
+    switch (exit->u.wrmsr.msr) {
+    case MSR_IA32_APICBASE:
+        cpu_set_apic_base(x86_cpu->apic_state, val);
+        break;
+    case MSR_MTRRdefType:
+    case MSR_MCG_STATUS:
+        break;
+    default: /* More MSRs to add? */
+        error_report("NVMM: Unexpected WRMSR 0x%x [val=0x%lx], ignored",
+            exit->u.wrmsr.msr, val);
+        break;
+    }
+
+    ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    state->gprs[NVMM_X64_GPR_RIP] = exit->u.wrmsr.npc;
+
+    ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static int
+nvmm_handle_halted(struct nvmm_machine *mach, CPUState *cpu,
+    struct nvmm_vcpu_exit *exit)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    int ret = 0;
+
+    qemu_mutex_lock_iothread();
+
+    if (!((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
+          (env->eflags & IF_MASK)) &&
+        !(cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+        cpu->exception_index = EXCP_HLT;
+        cpu->halted = true;
+        ret = 1;
+    }
+
+    qemu_mutex_unlock_iothread();
+
+    return ret;
+}
+
+static int
+nvmm_inject_ud(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
+{
+    struct nvmm_vcpu_event *event = vcpu->event;
+
+    event->type = NVMM_VCPU_EVENT_EXCP;
+    event->vector = 6;
+    event->u.excp.error = 0;
+
+    return nvmm_vcpu_inject(mach, vcpu);
+}
+
+static int
+nvmm_vcpu_loop(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_vcpu_exit *exit = vcpu->exit;
+    int ret;
+
+    /*
+     * Some asynchronous events must be handled outside of the inner
+     * VCPU loop. They are handled here.
+     */
+    if (cpu->interrupt_request & CPU_INTERRUPT_INIT) {
+        nvmm_cpu_synchronize_state(cpu);
+        do_cpu_init(x86_cpu);
+        /* set int/nmi windows back to the reset state */
+    }
+    if (cpu->interrupt_request & CPU_INTERRUPT_POLL) {
+        cpu->interrupt_request &= ~CPU_INTERRUPT_POLL;
+        apic_poll_irq(x86_cpu->apic_state);
+    }
+    if (((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
+         (env->eflags & IF_MASK)) ||
+        (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+        cpu->halted = false;
+    }
+    if (cpu->interrupt_request & CPU_INTERRUPT_SIPI) {
+        nvmm_cpu_synchronize_state(cpu);
+        do_cpu_sipi(x86_cpu);
+    }
+    if (cpu->interrupt_request & CPU_INTERRUPT_TPR) {
+        cpu->interrupt_request &= ~CPU_INTERRUPT_TPR;
+        nvmm_cpu_synchronize_state(cpu);
+        apic_handle_tpr_access_report(x86_cpu->apic_state, env->eip,
+            env->tpr_access_type);
+    }
+
+    if (cpu->halted) {
+        cpu->exception_index = EXCP_HLT;
+        atomic_set(&cpu->exit_request, false);
+        return 0;
+    }
+
+    qemu_mutex_unlock_iothread();
+    cpu_exec_start(cpu);
+
+    /*
+     * Inner VCPU loop.
+     */
+    do {
+        if (cpu->vcpu_dirty) {
+            nvmm_set_registers(cpu);
+            cpu->vcpu_dirty = false;
+        }
+
+        if (qcpu->stop) {
+            cpu->exception_index = EXCP_INTERRUPT;
+            qcpu->stop = false;
+            ret = 1;
+            break;
+        }
+
+        nvmm_vcpu_pre_run(cpu);
+
+        if (atomic_read(&cpu->exit_request)) {
+            qemu_cpu_kick_self();
+        }
+
+        ret = nvmm_vcpu_run(mach, vcpu);
+        if (ret == -1) {
+            error_report("NVMM: Failed to exec a virtual processor,"
+                " error=%d", errno);
+            break;
+        }
+
+        nvmm_vcpu_post_run(cpu, exit);
+
+        switch (exit->reason) {
+        case NVMM_VCPU_EXIT_NONE:
+            break;
+        case NVMM_VCPU_EXIT_MEMORY:
+            ret = nvmm_handle_mem(mach, vcpu);
+            break;
+        case NVMM_VCPU_EXIT_IO:
+            ret = nvmm_handle_io(mach, vcpu);
+            break;
+        case NVMM_VCPU_EXIT_INT_READY:
+        case NVMM_VCPU_EXIT_NMI_READY:
+        case NVMM_VCPU_EXIT_TPR_CHANGED:
+            break;
+        case NVMM_VCPU_EXIT_HALTED:
+            ret = nvmm_handle_halted(mach, cpu, exit);
+            break;
+        case NVMM_VCPU_EXIT_SHUTDOWN:
+            qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
+            cpu->exception_index = EXCP_INTERRUPT;
+            ret = 1;
+            break;
+        case NVMM_VCPU_EXIT_RDMSR:
+            ret = nvmm_handle_rdmsr(mach, cpu, exit);
+            break;
+        case NVMM_VCPU_EXIT_WRMSR:
+            ret = nvmm_handle_wrmsr(mach, cpu, exit);
+            break;
+        case NVMM_VCPU_EXIT_MONITOR:
+        case NVMM_VCPU_EXIT_MWAIT:
+            ret = nvmm_inject_ud(mach, vcpu);
+            break;
+        default:
+            error_report("NVMM: Unexpected VM exit code 0x%lx [hw=0x%lx]",
+                exit->reason, exit->u.inv.hwcode);
+            nvmm_get_registers(cpu);
+            qemu_mutex_lock_iothread();
+            qemu_system_guest_panicked(cpu_get_crash_info(cpu));
+            qemu_mutex_unlock_iothread();
+            ret = -1;
+            break;
+        }
+    } while (ret == 0);
+
+    cpu_exec_end(cpu);
+    qemu_mutex_lock_iothread();
+    current_cpu = cpu;
+
+    atomic_set(&cpu->exit_request, false);
+
+    return ret < 0;
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+do_nvmm_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
+{
+    nvmm_get_registers(cpu);
+    cpu->vcpu_dirty = true;
+}
+
+static void
+do_nvmm_cpu_synchronize_post_reset(CPUState *cpu, run_on_cpu_data arg)
+{
+    nvmm_set_registers(cpu);
+    cpu->vcpu_dirty = false;
+}
+
+static void
+do_nvmm_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg)
+{
+    nvmm_set_registers(cpu);
+    cpu->vcpu_dirty = false;
+}
+
+static void
+do_nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu, run_on_cpu_data arg)
+{
+    cpu->vcpu_dirty = true;
+}
+
+void nvmm_cpu_synchronize_state(CPUState *cpu)
+{
+    if (!cpu->vcpu_dirty) {
+        run_on_cpu(cpu, do_nvmm_cpu_synchronize_state, RUN_ON_CPU_NULL);
+    }
+}
+
+void nvmm_cpu_synchronize_post_reset(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_nvmm_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
+}
+
+void nvmm_cpu_synchronize_post_init(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_nvmm_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
+}
+
+void nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_nvmm_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
+}
+
+/* -------------------------------------------------------------------------- */
+
+static Error *nvmm_migration_blocker;
+
+static void
+nvmm_ipi_signal(int sigcpu)
+{
+    struct qemu_vcpu *qcpu;
+
+    if (current_cpu) {
+        qcpu = get_qemu_vcpu(current_cpu);
+        qcpu->stop = true;
+    }
+}
+
+static void
+nvmm_init_cpu_signals(void)
+{
+    struct sigaction sigact;
+    sigset_t set;
+
+    /* Install the IPI handler. */
+    memset(&sigact, 0, sizeof(sigact));
+    sigact.sa_handler = nvmm_ipi_signal;
+    sigaction(SIG_IPI, &sigact, NULL);
+
+    /* Allow IPIs on the current thread. */
+    sigprocmask(SIG_BLOCK, NULL, &set);
+    sigdelset(&set, SIG_IPI);
+    pthread_sigmask(SIG_SETMASK, &set, NULL);
+}
+
+int
+nvmm_init_vcpu(CPUState *cpu)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct nvmm_vcpu_conf_cpuid cpuid;
+    struct nvmm_vcpu_conf_tpr tpr;
+    Error *local_error = NULL;
+    struct qemu_vcpu *qcpu;
+    int ret, err;
+
+    nvmm_init_cpu_signals();
+
+    if (nvmm_migration_blocker == NULL) {
+        error_setg(&nvmm_migration_blocker,
+            "NVMM: Migration not supported");
+
+        (void)migrate_add_blocker(nvmm_migration_blocker, &local_error);
+        if (local_error) {
+            error_report_err(local_error);
+            migrate_del_blocker(nvmm_migration_blocker);
+            error_free(nvmm_migration_blocker);
+            nvmm_migration_blocker = NULL;
+            return -EINVAL;
+        }
+    }
+
+    /* g_malloc0() aborts on allocation failure, so no NULL check is needed. */
+    qcpu = g_malloc0(sizeof(*qcpu));
+
+    ret = nvmm_vcpu_create(mach, cpu->cpu_index, &qcpu->vcpu);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Failed to create a virtual processor,"
+            " error=%d", err);
+        g_free(qcpu);
+        return -err;
+    }
+
+    memset(&cpuid, 0, sizeof(cpuid));
+    cpuid.mask = 1;
+    cpuid.leaf = 0x00000001;
+    cpuid.u.mask.set.edx = CPUID_MCE | CPUID_MCA | CPUID_MTRR;
+    ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_CPUID,
+        &cpuid);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Failed to configure a virtual processor,"
+            " error=%d", err);
+        g_free(qcpu);
+        return -err;
+    }
+
+    ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_CALLBACKS,
+        &nvmm_callbacks);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Failed to configure a virtual processor,"
+            " error=%d", err);
+        g_free(qcpu);
+        return -err;
+    }
+
+    if (qemu_mach.cap.arch.vcpu_conf_support & NVMM_CAP_ARCH_VCPU_CONF_TPR) {
+        memset(&tpr, 0, sizeof(tpr));
+        tpr.exit_changed = 1;
+        ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_TPR, &tpr);
+        if (ret == -1) {
+            err = errno;
+            error_report("NVMM: Failed to configure a virtual processor,"
+                " error=%d", err);
+            g_free(qcpu);
+            return -err;
+        }
+    }
+
+    cpu->vcpu_dirty = true;
+    cpu->hax_vcpu = (struct hax_vcpu_state *)qcpu;
+
+    return 0;
+}
+
+int
+nvmm_vcpu_exec(CPUState *cpu)
+{
+    int ret, fatal;
+
+    while (1) {
+        if (cpu->exception_index >= EXCP_INTERRUPT) {
+            ret = cpu->exception_index;
+            cpu->exception_index = -1;
+            break;
+        }
+
+        fatal = nvmm_vcpu_loop(cpu);
+
+        if (fatal) {
+            error_report("NVMM: Failed to execute a VCPU.");
+            abort();
+        }
+    }
+
+    return ret;
+}
+
+void
+nvmm_destroy_vcpu(CPUState *cpu)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+
+    nvmm_vcpu_destroy(mach, &qcpu->vcpu);
+    g_free(cpu->hax_vcpu);
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_update_mapping(hwaddr start_pa, ram_addr_t size, uintptr_t hva,
+    bool add, bool rom, const char *name)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    int ret, prot;
+
+    if (add) {
+        prot = PROT_READ | PROT_EXEC;
+        if (!rom) {
+            prot |= PROT_WRITE;
+        }
+        ret = nvmm_gpa_map(mach, hva, start_pa, size, prot);
+    } else {
+        ret = nvmm_gpa_unmap(mach, hva, start_pa, size);
+    }
+
+    if (ret == -1) {
+        error_report("NVMM: Failed to %s GPA range '%s' PA:%p, "
+            "Size:%p bytes, HostVA:%p, error=%d",
+            (add ? "map" : "unmap"), name, (void *)(uintptr_t)start_pa,
+            (void *)size, (void *)hva, errno);
+    }
+}
+
+static void
+nvmm_process_section(MemoryRegionSection *section, int add)
+{
+    MemoryRegion *mr = section->mr;
+    hwaddr start_pa = section->offset_within_address_space;
+    ram_addr_t size = int128_get64(section->size);
+    unsigned int delta;
+    uintptr_t hva;
+
+    if (!memory_region_is_ram(mr)) {
+        return;
+    }
+
+    /* Adjust start_pa and size so that they are page-aligned. */
+    delta = qemu_real_host_page_size - (start_pa & ~qemu_real_host_page_mask);
+    delta &= ~qemu_real_host_page_mask;
+    if (delta > size) {
+        return;
+    }
+    start_pa += delta;
+    size -= delta;
+    size &= qemu_real_host_page_mask;
+    if (!size || (start_pa & ~qemu_real_host_page_mask)) {
+        return;
+    }
+
+    hva = (uintptr_t)memory_region_get_ram_ptr(mr) +
+        section->offset_within_region + delta;
+
+    nvmm_update_mapping(start_pa, size, hva, add,
+        memory_region_is_rom(mr), mr->name);
+}
+
+static void
+nvmm_region_add(MemoryListener *listener, MemoryRegionSection *section)
+{
+    memory_region_ref(section->mr);
+    nvmm_process_section(section, 1);
+}
+
+static void
+nvmm_region_del(MemoryListener *listener, MemoryRegionSection *section)
+{
+    nvmm_process_section(section, 0);
+    memory_region_unref(section->mr);
+}
+
+static void
+nvmm_transaction_begin(MemoryListener *listener)
+{
+    /* nothing */
+}
+
+static void
+nvmm_transaction_commit(MemoryListener *listener)
+{
+    /* nothing */
+}
+
+static void
+nvmm_log_sync(MemoryListener *listener, MemoryRegionSection *section)
+{
+    MemoryRegion *mr = section->mr;
+
+    if (!memory_region_is_ram(mr)) {
+        return;
+    }
+
+    memory_region_set_dirty(mr, 0, int128_get64(section->size));
+}
+
+static MemoryListener nvmm_memory_listener = {
+    .begin = nvmm_transaction_begin,
+    .commit = nvmm_transaction_commit,
+    .region_add = nvmm_region_add,
+    .region_del = nvmm_region_del,
+    .log_sync = nvmm_log_sync,
+    .priority = 10,
+};
+
+static void
+nvmm_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    uintptr_t hva = (uintptr_t)host;
+    int ret;
+
+    ret = nvmm_hva_map(mach, hva, size);
+
+    if (ret == -1) {
+        error_report("NVMM: Failed to map HVA, HostVA:%p "
+            "Size:%p bytes, error=%d",
+            (void *)hva, (void *)size, errno);
+    }
+}
+
+static struct RAMBlockNotifier nvmm_ram_notifier = {
+    .ram_block_added = nvmm_ram_block_added
+};
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_handle_interrupt(CPUState *cpu, int mask)
+{
+    cpu->interrupt_request |= mask;
+
+    if (!qemu_cpu_is_self(cpu)) {
+        qemu_cpu_kick(cpu);
+    }
+}
+
+/* -------------------------------------------------------------------------- */
+
+static int
+nvmm_accel_init(MachineState *ms)
+{
+    int ret, err;
+
+    ret = nvmm_init();
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Initialization failed, error=%d", err);
+        return -err;
+    }
+
+    ret = nvmm_capability(&qemu_mach.cap);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Unable to fetch capability, error=%d", err);
+        return -err;
+    }
+    if (qemu_mach.cap.version != 1) {
+        error_report("NVMM: Unsupported version %u", qemu_mach.cap.version);
+        return -EPROGMISMATCH;
+    }
+    if (qemu_mach.cap.state_size != sizeof(struct nvmm_x64_state)) {
+        error_report("NVMM: Wrong state size %u", qemu_mach.cap.state_size);
+        return -EPROGMISMATCH;
+    }
+
+    ret = nvmm_machine_create(&qemu_mach.mach);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Machine creation failed, error=%d", err);
+        return -err;
+    }
+
+    memory_listener_register(&nvmm_memory_listener, &address_space_memory);
+    ram_block_notifier_add(&nvmm_ram_notifier);
+
+    cpu_interrupt_handler = nvmm_handle_interrupt;
+
+    printf("NetBSD Virtual Machine Monitor accelerator is operational\n");
+    return 0;
+}
+
+int
+nvmm_enabled(void)
+{
+    return nvmm_allowed;
+}
+
+static void
+nvmm_accel_class_init(ObjectClass *oc, void *data)
+{
+    AccelClass *ac = ACCEL_CLASS(oc);
+    ac->name = "NVMM";
+    ac->init_machine = nvmm_accel_init;
+    ac->allowed = &nvmm_allowed;
+}
+
+static const TypeInfo nvmm_accel_type = {
+    .name = ACCEL_CLASS_NAME("nvmm"),
+    .parent = TYPE_ACCEL,
+    .class_init = nvmm_accel_class_init,
+};
+
+static void
+nvmm_type_init(void)
+{
+    type_register_static(&nvmm_accel_type);
+}
+
+type_init(nvmm_type_init);
--
2.25.0


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 0/4] Implements the NetBSD Virtual Machine Monitor accelerator
  2020-02-06 21:32     ` [PATCH v4 " Kamil Rytarowski
                         ` (3 preceding siblings ...)
  2020-02-06 21:32       ` [PATCH v4 4/4] Add the NVMM acceleration enlightenments Kamil Rytarowski
@ 2020-02-17  9:07       ` Kamil Rytarowski
  2020-02-24 15:17         ` Kamil Rytarowski
  4 siblings, 1 reply; 79+ messages in thread
From: Kamil Rytarowski @ 2020-02-17  9:07 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max, jmcneill
  Cc: qemu-devel

Ping?

On 06.02.2020 22:32, Kamil Rytarowski wrote:
> Hello QEMU Community!
>
> Over the past year the NetBSD team has been working hard on a new user-mode API
> for our hypervisor that will be released as part of the upcoming NetBSD 9.0.
> This new API adds user-mode capabilities to create and manage virtual machines,
> configure memory mappings for guest machines, and create and control execution
> of virtual processors.
>
> With this new API we are now able to bring our hypervisor to the QEMU
> community! The following patches implement the NetBSD Virtual Machine Monitor
> accelerator (NVMM) for QEMU on NetBSD 9.0 and newer hosts.
>
> When compiling QEMU for x86_64, passing the --enable-nvmm flag builds the
> accelerator. At runtime, using '-accel nvmm' should yield a significant
> performance improvement over emulation, much like using 'hax' on NetBSD.
>
> The documentation for this new API is visible at https://man.netbsd.org under
> the libnvmm(3) and nvmm(4) pages.
>
> NVMM was designed and implemented by Maxime Villard.
>
> Thank you for your feedback.
>
> References:
> https://m00nbsd.net/4e0798b7f2620c965d0dd9d6a7a2f296.html
>
> Test plan:
>
> 1. Download a NetBSD 9.0 pre-release snapshot:
> http://nycdn.netbsd.org/pub/NetBSD-daily/netbsd-9/latest/images/NetBSD-9.0_RC1-amd64.iso
>
> 2. Install it natively on reasonably recent x86_64 hardware (Intel or AMD).
>
> There is no support for nested virtualization in NVMM.
>
> 3. Set up the system.
>
>  export PKG_PATH=http://www.ki.nu/pkgsrc/packages/current/NetBSD-9.0_RC1/All
>  pkg_add git gmake python37 glib2 bison pkgconf pixman
>
> Install mozilla-rootcerts and follow post-install instructions.
>
>  pkg_add mozilla-rootcerts
>
> More information: https://wiki.qemu.org/Hosts/BSD#NetBSD
>
> 4. Build QEMU
>
>  mkdir build
>  cd build
>  ../configure --python=python3.7
>  gmake
>  gmake check
>
> 5. Test
>
>  qemu -accel nvmm ...
>
>
> History:
> v3 -> v4:
>  - Correct build warning by adding a missing include
>  - Do not set R8-R15 registers unless TARGET_X86_64
> v2 -> v3:
>  - Register nvmm in targetos NetBSD check
>  - Stop including hw/boards.h
>  - Rephrase old code comments (remove XXX)
> v1 -> v2:
>  - Included the testing plan as requested by Philippe Mathieu-Daude
>  - Formatting nit fix in qemu-options.hx
>  - Document NVMM in the accel section of qemu-options.hx
>
> Maxime Villard (4):
>   Add the NVMM vcpu API
>   Add the NetBSD Virtual Machine Monitor accelerator.
>   Introduce the NVMM impl
>   Add the NVMM acceleration enlightenments
>
>  accel/stubs/Makefile.objs |    1 +
>  accel/stubs/nvmm-stub.c   |   43 ++
>  configure                 |   37 ++
>  cpus.c                    |   58 ++
>  include/sysemu/hw_accel.h |   14 +
>  include/sysemu/nvmm.h     |   35 ++
>  qemu-options.hx           |   16 +-
>  target/i386/Makefile.objs |    1 +
>  target/i386/helper.c      |    2 +-
>  target/i386/nvmm-all.c    | 1226 +++++++++++++++++++++++++++++++++++++
>  10 files changed, 1424 insertions(+), 9 deletions(-)
>  create mode 100644 accel/stubs/nvmm-stub.c
>  create mode 100644 include/sysemu/nvmm.h
>  create mode 100644 target/i386/nvmm-all.c
>
> --
> 2.25.0
>
>



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 0/4] Implements the NetBSD Virtual Machine Monitor accelerator
  2020-02-17  9:07       ` [PATCH v4 0/4] Implements the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
@ 2020-02-24 15:17         ` Kamil Rytarowski
  2020-03-02 17:02           ` Kamil Rytarowski
  0 siblings, 1 reply; 79+ messages in thread
From: Kamil Rytarowski @ 2020-02-24 15:17 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max, jmcneill
  Cc: qemu-devel

Ping?

On 17.02.2020 10:07, Kamil Rytarowski wrote:
> Ping?



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 0/4] Implements the NetBSD Virtual Machine Monitor accelerator
  2020-02-24 15:17         ` Kamil Rytarowski
@ 2020-03-02 17:02           ` Kamil Rytarowski
  2020-03-02 17:10             ` Eduardo Habkost
  0 siblings, 1 reply; 79+ messages in thread
From: Kamil Rytarowski @ 2020-03-02 17:02 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max, jmcneill
  Cc: qemu-devel

Ping?

On 24.02.2020 16:17, Kamil Rytarowski wrote:
> Ping?



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 0/4] Implements the NetBSD Virtual Machine Monitor accelerator
  2020-03-02 17:02           ` Kamil Rytarowski
@ 2020-03-02 17:10             ` Eduardo Habkost
  2020-03-02 17:10               ` Kamil Rytarowski
  0 siblings, 1 reply; 79+ messages in thread
From: Eduardo Habkost @ 2020-03-02 17:10 UTC (permalink / raw)
  To: Kamil Rytarowski
  Cc: peter.maydell, slp, qemu-devel, jmcneill, pbonzini, philmd, max, rth

Hi Kamil, Maxime,

I haven't managed to reserve time to review this, sorry for that.
I hope others can chime in before I do.

Would any of you be willing to be included as maintainer of the
new code on MAINTAINERS?


On Mon, Mar 02, 2020 at 06:02:18PM +0100, Kamil Rytarowski wrote:
> Ping?

-- 
Eduardo



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 0/4] Implements the NetBSD Virtual Machine Monitor accelerator
  2020-03-02 17:10             ` Eduardo Habkost
@ 2020-03-02 17:10               ` Kamil Rytarowski
  2020-03-02 17:22                 ` Eduardo Habkost
  0 siblings, 1 reply; 79+ messages in thread
From: Kamil Rytarowski @ 2020-03-02 17:10 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: peter.maydell, slp, qemu-devel, jmcneill, pbonzini, philmd, max, rth

On 02.03.2020 18:10, Eduardo Habkost wrote:
> Hi Kamil, Maxime,
>
> I haven't managed to reserve time to review this, sorry for that.
> I hope others can chime in before I do.
>
> Would any of you be willing to be included as maintainer of the
> new code on MAINTAINERS?
>

I'm already mentioned as the NetBSD maintainer and NVMM is NetBSD-only
(at least today).




^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 2/4] Add the NetBSD Virtual Machine Monitor accelerator.
  2020-02-03 11:41     ` Philippe Mathieu-Daudé
  2020-02-03 11:56       ` Kamil Rytarowski
@ 2020-03-02 17:11       ` Paolo Bonzini
  2020-03-02 18:09         ` Kamil Rytarowski
  1 sibling, 1 reply; 79+ messages in thread
From: Paolo Bonzini @ 2020-03-02 17:11 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	Kamil Rytarowski, rth, ehabkost, slp, peter.maydell, max
  Cc: qemu-devel

On 03/02/20 12:41, Philippe Mathieu-Daudé wrote:
> 
> Maybe you can add something like:
> 
> if test "$targetos" = "NetBSD"; then
>     nvmm="check"
> fi

You could do just nvmm="" and, below,

if test "$nvmm" != "no" && test "$targetos" = "NetBSD"

But maybe even testing for NetBSD is not needed, since nvmm.h will likely not
be there on other hosts.

Paolo

> to build by default with NVMM if available.
> 
>> +##########################################
>> +# NetBSD Virtual Machine Monitor (NVMM) accelerator check
>> +if test "$nvmm" != "no" ; then
>> +    if check_include "nvmm.h" ; then
>> +        nvmm="yes"
>> +        LIBS="-lnvmm $LIBS"
>> +    else
>> +        if test "$nvmm" = "yes"; then
>> +            feature_not_found "NVMM" "NVMM is not available"
>> +        fi
>> +        nvmm="no"
>> +    fi
>> +fi 



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 2/4] Add the NetBSD Virtual Machine Monitor accelerator.
  2020-02-03 11:56       ` Kamil Rytarowski
  2020-02-03 12:10         ` Philippe Mathieu-Daudé
@ 2020-03-02 17:12         ` Paolo Bonzini
  2020-03-02 18:05           ` Kamil Rytarowski
  1 sibling, 1 reply; 79+ messages in thread
From: Paolo Bonzini @ 2020-03-02 17:12 UTC (permalink / raw)
  To: Kamil Rytarowski, Philippe Mathieu-Daudé,
	rth, ehabkost, slp, peter.maydell, max
  Cc: qemu-devel

On 03/02/20 12:56, Kamil Rytarowski wrote:
> On 03.02.2020 12:41, Philippe Mathieu-Daudé wrote:
>>> @@ -1768,6 +1785,7 @@ disabled with --disable-FEATURE, default is
>>> enabled if available:
>>>     hax             HAX acceleration support
>>>     hvf             Hypervisor.framework acceleration support
>>>     whpx            Windows Hypervisor Platform acceleration support
>>> +  nvmm            NetBSD Virtual Machine Monitor acceleration support
>>>     rdma            Enable RDMA-based migration
>>>     pvrdma          Enable PVRDMA support
>>>     vde             support for vde network
>>> @@ -2757,6 +2775,20 @@ if test "$whpx" != "no" ; then
>>>       fi
>>>   fi
>>>
>>
>> Maybe you can add something like:
>>
>> if test "$targetos" = "NetBSD"; then
>>     nvmm="check"
>> fi
>>
>> to build by default with NVMM if available.
> 
> I will add nvmm=yes to the NetBSD) targetos check section.

No: nvmm=yes is supposed to fail the build if nvmm.h is not available,
so it is not a good default.

Paolo
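
The distinction made above (an empty value means auto-detect, "yes" means
required, "no" means disabled) can be illustrated with a small standalone
sketch. This is not QEMU's configure code: resolve_nvmm is a hypothetical
helper, and its second argument stands in for the result of the real
check_include "nvmm.h" test.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only (not part of QEMU's configure).
#   $1 = requested value: "" (auto-detect), "yes" (required) or "no" (disabled)
#   $2 = "yes" if nvmm.h was found, anything else otherwise
# Prints the resolved value ("yes" or "no"); returns 1 if the build must fail.
resolve_nvmm() {
    requested=$1
    have_header=$2

    if test "$requested" = "no"; then
        # Explicitly disabled: never probe, never enable.
        echo "no"
        return 0
    fi
    if test "$have_header" = "yes"; then
        echo "yes"
        return 0
    fi
    if test "$requested" = "yes"; then
        # Explicit request that cannot be satisfied: stop the build.
        echo "ERROR: NVMM requested but nvmm.h is not available" >&2
        return 1
    fi
    # Auto-detect mode: quietly fall back to disabled.
    echo "no"
}

resolve_nvmm ""    "yes"   # auto-detect, header present
resolve_nvmm ""    "no"    # auto-detect, header missing
resolve_nvmm "no"  "yes"   # explicitly disabled
resolve_nvmm "yes" "no" || echo "configure would stop here"
```

With an empty default, a host without nvmm.h still builds; with "yes" as the
default, the same host would hit the error branch, which is the objection
raised above.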



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 0/4] Implements the NetBSD Virtual Machine Monitor accelerator
  2020-03-02 17:10               ` Kamil Rytarowski
@ 2020-03-02 17:22                 ` Eduardo Habkost
  0 siblings, 0 replies; 79+ messages in thread
From: Eduardo Habkost @ 2020-03-02 17:22 UTC (permalink / raw)
  To: Kamil Rytarowski
  Cc: peter.maydell, slp, qemu-devel, jmcneill, pbonzini, philmd, max, rth

On Mon, Mar 02, 2020 at 06:10:50PM +0100, Kamil Rytarowski wrote:
> On 02.03.2020 18:10, Eduardo Habkost wrote:
> > Hi Kamil, Maxime,
> >
> > I haven't managed to reserve time to review this, sorry for that.
> > I hope others can chime in before I do.
> >
> > Would any of you be willing to be included as maintainer of the
> > new code on MAINTAINERS?
> >
> 
> I'm already mentioned as the NetBSD maintainer and NVMM is NetBSD-only
> (at least today).

I don't see the new files (accel/stubs/nvmm-stub.c,
include/sysemu/nvmm.h, target/i386/nvmm-all.c) being added to
MAINTAINERS.  Can you add them in a follow-up patch?

-- 
Eduardo



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 2/4] Add the NetBSD Virtual Machine Monitor accelerator.
  2020-03-02 17:12         ` Paolo Bonzini
@ 2020-03-02 18:05           ` Kamil Rytarowski
  2020-03-02 19:14             ` Maxime Villard
  0 siblings, 1 reply; 79+ messages in thread
From: Kamil Rytarowski @ 2020-03-02 18:05 UTC (permalink / raw)
  To: Paolo Bonzini, Philippe Mathieu-Daudé,
	rth, ehabkost, slp, peter.maydell, max
  Cc: qemu-devel

On 02.03.2020 18:12, Paolo Bonzini wrote:
> On 03/02/20 12:56, Kamil Rytarowski wrote:
>> On 03.02.2020 12:41, Philippe Mathieu-Daudé wrote:
>>>> @@ -1768,6 +1785,7 @@ disabled with --disable-FEATURE, default is
>>>> enabled if available:
>>>>     hax             HAX acceleration support
>>>>     hvf             Hypervisor.framework acceleration support
>>>>     whpx            Windows Hypervisor Platform acceleration support
>>>> +  nvmm            NetBSD Virtual Machine Monitor acceleration support
>>>>     rdma            Enable RDMA-based migration
>>>>     pvrdma          Enable PVRDMA support
>>>>     vde             support for vde network
>>>> @@ -2757,6 +2775,20 @@ if test "$whpx" != "no" ; then
>>>>       fi
>>>>   fi
>>>>
>>>
>>> Maybe you can add something like:
>>>
>>> if test "$targetos" = "NetBSD"; then
>>>     nvmm="check"
>>> fi
>>>
>>> to build by default with NVMM if available.
>>
>> I will add nvmm=yes to the NetBSD) targetos check section.
>
> No, nvmm=yes instead should fail the build if nvmm.h is not available.
> That is not a good default.
>
> Paolo
>
>

Most users will have nvmm.h in place now, and this is still tunable.

I have no opinion on what to put there; nvmm=check still works.

diff --git a/configure b/configure
index d4a837cf9d..b3560d88bb 100755
--- a/configure
+++ b/configure
@@ -836,7 +836,7 @@ DragonFly)
 NetBSD)
   bsd="yes"
   hax="yes"
-  nvmm="yes"
+  nvmm="check"
   make="${MAKE-gmake}"
   audio_drv_list="oss try-sdl"
   audio_possible_drivers="oss sdl"


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 2/4] Add the NetBSD Virtual Machine Monitor accelerator.
  2020-03-02 17:11       ` Paolo Bonzini
@ 2020-03-02 18:09         ` Kamil Rytarowski
  0 siblings, 0 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-03-02 18:09 UTC (permalink / raw)
  To: Paolo Bonzini, Philippe Mathieu-Daudé,
	rth, ehabkost, slp, peter.maydell, max
  Cc: qemu-devel

On 02.03.2020 18:11, Paolo Bonzini wrote:
> On 03/02/20 12:41, Philippe Mathieu-Daudé wrote:
>>
>> Maybe you can add something like:
>>
>> if test "$targetos" = "NetBSD"; then
>>     nvmm="check"
>> fi
>
> You could do just nvmm="" and, below,
>
> if test "$nvmm" != "no" && test "$targetos" = "NetBSD"
>
> But maybe even testing NetBSD is not needed since nvmm.h will likely not
> be there.
>
> Paolo
>

I have no opinion here.

On request, I can simply change nvmm="yes" to nvmm="check" and be done.

>> to build by default with NVMM if available.
>>
>>> +##########################################
>>> +# NetBSD Virtual Machine Monitor (NVMM) accelerator check
>>> +if test "$nvmm" != "no" ; then
>>> +    if check_include "nvmm.h" ; then
>>> +        nvmm="yes"
>>> +    LIBS="-lnvmm $LIBS"
>>> +    else
>>> +        if test "$nvmm" = "yes"; then
>>> +            feature_not_found "NVMM" "NVMM is not available"
>>> +        fi
>>> +        nvmm="no"
>>> +    fi
>>> +fi
>



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 3/4] Introduce the NVMM impl
  2020-02-06 21:32       ` [PATCH v4 3/4] Introduce the NVMM impl Kamil Rytarowski
  2020-02-06 23:28         ` [PATCH v4 3/4 FIXUP] " Kamil Rytarowski
@ 2020-03-02 18:13         ` Paolo Bonzini
  2020-03-02 19:28           ` Maxime Villard
  1 sibling, 1 reply; 79+ messages in thread
From: Paolo Bonzini @ 2020-03-02 18:13 UTC (permalink / raw)
  To: Kamil Rytarowski, rth, ehabkost, slp, peter.maydell, philmd, max,
	jmcneill
  Cc: qemu-devel

On 06/02/20 22:32, Kamil Rytarowski wrote:
> +get_qemu_vcpu(CPUState *cpu)
> +{
> +    return (struct qemu_vcpu *)cpu->hax_vcpu;
> +}

Please make hax_vcpu a void * and rename it to "accel_data".

> +    nseg->attrib.g = __SHIFTOUT(attrib, DESC_G_MASK);

> +        __SHIFTIN((uint32_t)nseg->attrib.g, DESC_G_MASK);

What are __SHIFTOUT and __SHIFTIN?

> 
> +    if (qcpu->int_window_exit) {

Should it assert the condition in the "if" below?

> +        return false;
> +    }
> +
> +    if (qcpu->int_shadow || !(env->eflags & IF_MASK)) {
> +        struct nvmm_x64_state *state = vcpu->state;
> +
> +        /* Exit on interrupt window. */
> +        nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_INTR);
> +        state->intr.int_window_exiting = 1;
> +        nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_INTR);

... and should this set qcpu->int_window_exit?

> +
> +        return false;
> +    }

Have you tried running kvm-unit-tests?

> +
> +    /* Needed, otherwise infinite loop. */
> +    current_cpu->vcpu_dirty = false;

Can you explain this?

> +        break;
> +    default: /* More MSRs to add? */
> +        val = 0;

I would add at least MSR_IA32_TSC.

> 
> +
> +        if (qcpu->stop) {
> +            cpu->exception_index = EXCP_INTERRUPT;
> +            qcpu->stop = false;
> +            ret = 1;
> +            break;
> +        }
> +
> +        nvmm_vcpu_pre_run(cpu);
> +
> +        if (atomic_read(&cpu->exit_request)) {
> +            qemu_cpu_kick_self();
> +        }
> +

This is racy without something like KVM's immediate_exit mechanism.
This should be fixed in NVMM.

Paolo



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 2/4] Add the NetBSD Virtual Machine Monitor accelerator.
  2020-03-02 18:05           ` Kamil Rytarowski
@ 2020-03-02 19:14             ` Maxime Villard
  2020-03-02 19:40               ` Paolo Bonzini
  0 siblings, 1 reply; 79+ messages in thread
From: Maxime Villard @ 2020-03-02 19:14 UTC (permalink / raw)
  To: Kamil Rytarowski, Paolo Bonzini, Philippe Mathieu-Daudé,
	rth, ehabkost, slp, peter.maydell
  Cc: qemu-devel

Le 02/03/2020 à 19:05, Kamil Rytarowski a écrit :
> On 02.03.2020 18:12, Paolo Bonzini wrote:
>> On 03/02/20 12:56, Kamil Rytarowski wrote:
>>> On 03.02.2020 12:41, Philippe Mathieu-Daudé wrote:
>>>>> @@ -1768,6 +1785,7 @@ disabled with --disable-FEATURE, default is
>>>>> enabled if available:
>>>>>     hax             HAX acceleration support
>>>>>     hvf             Hypervisor.framework acceleration support
>>>>>     whpx            Windows Hypervisor Platform acceleration support
>>>>> +  nvmm            NetBSD Virtual Machine Monitor acceleration support
>>>>>     rdma            Enable RDMA-based migration
>>>>>     pvrdma          Enable PVRDMA support
>>>>>     vde             support for vde network
>>>>> @@ -2757,6 +2775,20 @@ if test "$whpx" != "no" ; then
>>>>>       fi
>>>>>   fi
>>>>>
>>>>
>>>> Maybe you can add something like:
>>>>
>>>> if test "$targetos" = "NetBSD"; then
>>>>     nvmm="check"
>>>> fi
>>>>
>>>> to build by default with NVMM if available.
>>>
>>> I will add nvmm=yes to the NetBSD) targetos check section.
>>
>> No, nvmm=yes instead should fail the build if nvmm.h is not available.
>> That is not a good default.
>>
>> Paolo
>>
>>
> 
> Most users will get nvmm.h in place now and this is still a tunable.
> 
> I have got no opinion what to put there, nvmm=check still works.

I would keep "yes", for consistency with the other entries. Changing all
entries to "check" should be done in a separate commit, unrelated to
NVMM.

> diff --git a/configure b/configure
> index d4a837cf9d..b3560d88bb 100755
> --- a/configure
> +++ b/configure
> @@ -836,7 +836,7 @@ DragonFly)
>  NetBSD)
>    bsd="yes"
>    hax="yes"
> -  nvmm="yes"
> +  nvmm="check"
>    make="${MAKE-gmake}"
>    audio_drv_list="oss try-sdl"
>    audio_possible_drivers="oss sdl"
> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 3/4] Introduce the NVMM impl
  2020-03-02 18:13         ` [PATCH v4 3/4] " Paolo Bonzini
@ 2020-03-02 19:28           ` Maxime Villard
  2020-03-02 19:35             ` Paolo Bonzini
  0 siblings, 1 reply; 79+ messages in thread
From: Maxime Villard @ 2020-03-02 19:28 UTC (permalink / raw)
  To: Paolo Bonzini, Kamil Rytarowski, rth, ehabkost, slp,
	peter.maydell, philmd, jmcneill
  Cc: qemu-devel

Le 02/03/2020 à 19:13, Paolo Bonzini a écrit :
> On 06/02/20 22:32, Kamil Rytarowski wrote:
>> +get_qemu_vcpu(CPUState *cpu)
>> +{
>> +    return (struct qemu_vcpu *)cpu->hax_vcpu;
>> +}
> 
> Please make hax_vcpu a void * and rename it to "accel_data".

NVMM reproduces the existing logic in the other accelerators. I agree
that it should be "accel_data" with void *, but that should be done
in all accelerators in a separate commit, unrelated to NVMM.

>> +    nseg->attrib.g = __SHIFTOUT(attrib, DESC_G_MASK);
> 
>> +        __SHIFTIN((uint32_t)nseg->attrib.g, DESC_G_MASK);
> 
> What are __SHIFTOUT and __SHIFTIN?

They are macros in NetBSD.
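
For readers who do not have the NetBSD headers at hand, the following is
a rough sketch of what the two macros are assumed to do: extract a bit
field named by a mask, and place a value into such a field. The function
names are hypothetical stand-ins, and the behaviour is written from
memory of NetBSD's <sys/cdefs.h>, so treat it as an approximation rather
than the authoritative definition.

```c
#include <assert.h>
#include <stdint.h>

/* Isolate the least significant set bit of a mask, e.g. 0x0f00 -> 0x0100. */
static uint32_t sketch_lowest_set_bit(uint32_t mask)
{
    return mask & (~mask + 1u);
}

/* Assumed equivalent of __SHIFTOUT(x, mask): read the field out of x. */
static uint32_t sketch_shiftout(uint32_t x, uint32_t mask)
{
    assert(mask != 0);
    return (x & mask) / sketch_lowest_set_bit(mask);
}

/* Assumed equivalent of __SHIFTIN(x, mask): move x into field position. */
static uint32_t sketch_shiftin(uint32_t x, uint32_t mask)
{
    assert(mask != 0);
    return x * sketch_lowest_set_bit(mask);
}
```

With a single-bit mask such as 0x00800000 (used here only as an example
value), sketch_shiftout() yields 0 or 1 and sketch_shiftin() moves that
bit back to position 23.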

>> +    if (qcpu->int_window_exit) {
> 
> Should it assert the condition in the "if" below?

No, because if int_window_exit is set, then state->intr.int_window_exiting
is set too, so there is no point doing get+set.

>> +        return false;
>> +    }
>> +
>> +    if (qcpu->int_shadow || !(env->eflags & IF_MASK)) {
>> +        struct nvmm_x64_state *state = vcpu->state;
>> +
>> +        /* Exit on interrupt window. */
>> +        nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_INTR);
>> +        state->intr.int_window_exiting = 1;
>> +        nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_INTR);
> 
> ... and should this set qcpu->int_window_exit?

Mmh, maybe. Not a big problem though, because at worst it just means we
set int_window_exiting to one while it was already one. I'll think about
that for a future commit. (I'm not immediately able to test.)

>> +
>> +        return false;
>> +    }
> 
> Have you tried running kvm-unit-tests?

I didn't know kvm-unit-tests (until now). I developed my own tests and
ran them in qemu-nvmm. But good to know, I'll try these tests.

>> +    /* Needed, otherwise infinite loop. */
>> +    current_cpu->vcpu_dirty = false;
> 
> Can you explain this?

If vcpu_dirty remains true, we land here in the next iteration of the
loop:

        if (cpu->vcpu_dirty) {
            nvmm_set_registers(cpu);
            cpu->vcpu_dirty = false;
        }

And the (now updated) register values are lost. The guest stays on the
same instruction.
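
A toy model of the explanation above, under the assumption that the exit
handler advances the hypervisor-side register state while QEMU still
holds an older copy. The struct and function names are hypothetical and
only imitate the shape of the loop; this is not the NVMM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-copy view of one register. */
struct toy_vcpu {
    uint64_t qemu_rip;  /* QEMU-side copy (stale after the handler ran) */
    uint64_t hv_rip;    /* hypervisor-side copy (what the guest resumes at) */
    bool dirty;         /* "QEMU copy must be pushed before the next run" */
};

/* Exit handler: the emulated instruction is 2 bytes long in this toy. */
static void toy_handle_exit(struct toy_vcpu *v, bool clear_dirty)
{
    v->hv_rip += 2;
    if (clear_dirty) {
        v->dirty = false;   /* the line questioned in the review */
    }
}

/* Top of the next loop iteration, shaped like the snippet quoted above. */
static void toy_next_iteration(struct toy_vcpu *v)
{
    if (v->dirty) {
        v->hv_rip = v->qemu_rip;   /* stands in for nvmm_set_registers() */
        v->dirty = false;
    }
}

/* Returns the address the guest would resume at after one exit. */
static uint64_t toy_rip_after_exit(bool clear_dirty)
{
    struct toy_vcpu v = { .qemu_rip = 0x1000, .hv_rip = 0x1000, .dirty = true };

    toy_handle_exit(&v, clear_dirty);
    toy_next_iteration(&v);
    return v.hv_rip;
}
```

If the flag is left set, the stale copy overwrites the advanced one and
the guest resumes at the old address, which matches the "infinite loop"
comment in the patch.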

>> +        break;
>> +    default: /* More MSRs to add? */
>> +        val = 0;
> 
> I would add at least MSR_IA32_TSC.

MSR_IA32_TSC is handled by the kernel, that's why it isn't there. The
values do get synced:

    state->msrs[NVMM_X64_MSR_TSC] = env->tsc;
    ...
    env->tsc = state->msrs[NVMM_X64_MSR_TSC];

>> +        if (qcpu->stop) {
>> +            cpu->exception_index = EXCP_INTERRUPT;
>> +            qcpu->stop = false;
>> +            ret = 1;
>> +            break;
>> +        }
>> +
>> +        nvmm_vcpu_pre_run(cpu);
>> +
>> +        if (atomic_read(&cpu->exit_request)) {
>> +            qemu_cpu_kick_self();
>> +        }
>> +
> 
> This is racy without something like KVM's immediate_exit mechanism.
> This should be fixed in NVMM.

I don't immediately see how this is racy. It reproduces the existing
logic found in whpx-all.c, and if there is a real problem it can be
fixed in a future commit along with WHPX.

Maxime


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 3/4] Introduce the NVMM impl
  2020-03-02 19:28           ` Maxime Villard
@ 2020-03-02 19:35             ` Paolo Bonzini
  2020-03-10  6:45               ` Maxime Villard
  0 siblings, 1 reply; 79+ messages in thread
From: Paolo Bonzini @ 2020-03-02 19:35 UTC (permalink / raw)
  To: Maxime Villard
  Cc: peter.maydell, ehabkost, slp, qemu-devel, jmcneill,
	Kamil Rytarowski, philmd, rth

Il lun 2 mar 2020, 20:28 Maxime Villard <max@m00nbsd.net> ha scritto:

>
> >> +        nvmm_vcpu_pre_run(cpu);
> >> +
> >> +        if (atomic_read(&cpu->exit_request)) {
> >> +            qemu_cpu_kick_self();
> >> +        }
> >> +
> >
> > This is racy without something like KVM's immediate_exit mechanism.
> > This should be fixed in NVMM.
>
> I don't immediately see how this is racy.


You can get an IPI signal immediately after reading cpu->exit_request.

> It reproduces the existing
> logic found in whpx-all.c, and if there is a real problem it can be
> fixed in a future commit along with WHPX.
>

It's buggy there too, and it has to be fixed in the hypervisor, so it can't
be done at the same time in both. KVM does it right by having a flag
("immediate_exit") that is set by the signal handler and checked by the
hypervisor.

An earlier version of KVM instead atomically unblocked the signal while
executing the guest, and then ate it with a sigwaitinfo after exiting back
to userspace.

You don't have to fix it immediately, but adding a FIXME would be a good
idea.

Paolo
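
A minimal sketch of the flag-based scheme described above, assuming a
POSIX host. It is a toy: toy_vcpu_run() merely pretends to be the
hypervisor entry point, and every name here is hypothetical. The point
it illustrates is that the flag is consulted at entry, so a kick that
arrives after userland has finished its own checks still prevents the
guest from running.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical stand-in for a flag shared between userland and hypervisor. */
static volatile sig_atomic_t toy_immediate_exit;

/* Kick handler: only records that the vCPU must not (re)enter the guest. */
static void toy_kick_handler(int signo)
{
    (void)signo;
    toy_immediate_exit = 1;
}

/* Pretend hypervisor entry: returns true only if guest code would run. */
static bool toy_vcpu_run(void)
{
    if (toy_immediate_exit) {
        return false;   /* entry refused, control goes back to userland */
    }
    return true;
}

/* One pass where the kick lands after userland's own exit_request check. */
static bool toy_kick_then_run(void)
{
    bool entered;

    signal(SIGUSR1, toy_kick_handler);
    raise(SIGUSR1);             /* handler has run by the time this returns */
    entered = toy_vcpu_run();   /* sees the flag and does not enter */
    toy_immediate_exit = 0;     /* userland clears the flag once it is back */
    return entered;
}
```

Without the check inside toy_vcpu_run(), the same late kick would only be
noticed on the following loop iteration.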


> Maxime
>
>


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 2/4] Add the NetBSD Virtual Machine Monitor accelerator.
  2020-03-02 19:14             ` Maxime Villard
@ 2020-03-02 19:40               ` Paolo Bonzini
  2020-03-02 21:10                 ` Kamil Rytarowski
  0 siblings, 1 reply; 79+ messages in thread
From: Paolo Bonzini @ 2020-03-02 19:40 UTC (permalink / raw)
  To: Maxime Villard
  Cc: peter.maydell, ehabkost, slp, qemu-devel, Kamil Rytarowski,
	Philippe Mathieu-Daudé,
	rth

Il lun 2 mar 2020, 20:14 Maxime Villard <max@m00nbsd.net> ha scritto:

> Le 02/03/2020 à 19:05, Kamil Rytarowski a écrit :
> > On 02.03.2020 18:12, Paolo Bonzini wrote:
> >> On 03/02/20 12:56, Kamil Rytarowski wrote:
> >>> On 03.02.2020 12:41, Philippe Mathieu-Daudé wrote:
> >>>>> @@ -1768,6 +1785,7 @@ disabled with --disable-FEATURE, default is
> >>>>> enabled if available:
> >>>>>     hax             HAX acceleration support
> >>>>>     hvf             Hypervisor.framework acceleration support
> >>>>>     whpx            Windows Hypervisor Platform acceleration support
> >>>>> +  nvmm            NetBSD Virtual Machine Monitor acceleration
> support
> >>>>>     rdma            Enable RDMA-based migration
> >>>>>     pvrdma          Enable PVRDMA support
> >>>>>     vde             support for vde network
> >>>>> @@ -2757,6 +2775,20 @@ if test "$whpx" != "no" ; then
> >>>>>       fi
> >>>>>   fi
> >>>>>
> >>>>
> >>>> Maybe you can add something like:
> >>>>
> >>>> if test "$targetos" = "NetBSD"; then
> >>>>     nvmm="check"
> >>>> fi
> >>>>
> >>>> to build by default with NVMM if available.
> >>>
> >>> I will add nvmm=yes to the NetBSD) targetos check section.
> >>
> >> No, nvmm=yes instead should fail the build if nvmm.h is not available.
> >> That is not a good default.
> >>
> >> Paolo
> >>
> >>
> >
> > Most users will get nvmm.h in place now and this is still a tunable.
> >
> > I have got no opinion what to put there, nvmm=check still works.
>
> I would keep "yes", for consistency with the other entries. Changing all
> entries to "check" should be done in a separate commit, unrelated to
> NVMM.
>

The difference is that KVM for example does not need external includes or
libraries.

Paolo


> > diff --git a/configure b/configure
> > index d4a837cf9d..b3560d88bb 100755
> > --- a/configure
> > +++ b/configure
> > @@ -836,7 +836,7 @@ DragonFly)
> >  NetBSD)
> >    bsd="yes"
> >    hax="yes"
> > -  nvmm="yes"
> > +  nvmm="check"
> >    make="${MAKE-gmake}"
> >    audio_drv_list="oss try-sdl"
> >    audio_possible_drivers="oss sdl"
> >
>
>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 2/4] Add the NetBSD Virtual Machine Monitor accelerator.
  2020-03-02 19:40               ` Paolo Bonzini
@ 2020-03-02 21:10                 ` Kamil Rytarowski
  2020-03-02 22:45                   ` Paolo Bonzini
  0 siblings, 1 reply; 79+ messages in thread
From: Kamil Rytarowski @ 2020-03-02 21:10 UTC (permalink / raw)
  To: Paolo Bonzini, Maxime Villard
  Cc: peter.maydell, ehabkost, slp, qemu-devel,
	Philippe Mathieu-Daudé,
	rth

On 02.03.2020 20:40, Paolo Bonzini wrote:
>
>
> Il lun 2 mar 2020, 20:14 Maxime Villard <max@m00nbsd.net
> <mailto:max@m00nbsd.net>> ha scritto:
>
>     Le 02/03/2020 à 19:05, Kamil Rytarowski a écrit :
>     > On 02.03.2020 18:12, Paolo Bonzini wrote:
>     >> On 03/02/20 12:56, Kamil Rytarowski wrote:
>     >>> On 03.02.2020 12:41, Philippe Mathieu-Daudé wrote:
>     >>>>> @@ -1768,6 +1785,7 @@ disabled with --disable-FEATURE, default is
>     >>>>> enabled if available:
>     >>>>>     hax             HAX acceleration support
>     >>>>>     hvf             Hypervisor.framework acceleration support
>     >>>>>     whpx            Windows Hypervisor Platform acceleration
>     support
>     >>>>> +  nvmm            NetBSD Virtual Machine Monitor acceleration
>     support
>     >>>>>     rdma            Enable RDMA-based migration
>     >>>>>     pvrdma          Enable PVRDMA support
>     >>>>>     vde             support for vde network
>     >>>>> @@ -2757,6 +2775,20 @@ if test "$whpx" != "no" ; then
>     >>>>>       fi
>     >>>>>   fi
>     >>>>>
>     >>>>
>     >>>> Maybe you can add something like:
>     >>>>
>     >>>> if test "$targetos" = "NetBSD"; then
>     >>>>     nvmm="check"
>     >>>> fi
>     >>>>
>     >>>> to build by default with NVMM if available.
>     >>>
>     >>> I will add nvmm=yes to the NetBSD) targetos check section.
>     >>
>     >> No, nvmm=yes instead should fail the build if nvmm.h is not
>     available.
>     >> That is not a good default.
>     >>
>     >> Paolo
>     >>
>     >>
>     >
>     > Most users will get nvmm.h in place now and this is still a tunable.
>     >
>     > I have got no opinion what to put there, nvmm=check still works.
>
>     I would keep "yes", for consistency with the other entries. Changing all
>     entries to "check" should be done in a separate commit, unrelated to
>     NVMM.
>
>
> The difference is that KVM for example does not need external includes
> or libraries.
>
> Paolo
>

We don't support this scenario, and a year from now there might be no
supported release without NVMM.

The only concern is using QEMU on non-amd64 hosts, but we do not have
many QEMU users there, for understandable reasons.

For AArch64 we plan to implement a dedicated NVMM backend.

>
>     > diff --git a/configure b/configure
>     > index d4a837cf9d..b3560d88bb 100755
>     > --- a/configure
>     > +++ b/configure
>     > @@ -836,7 +836,7 @@ DragonFly)
>     >  NetBSD)
>     >    bsd="yes"
>     >    hax="yes"
>     > -  nvmm="yes"
>     > +  nvmm="check"
>     >    make="${MAKE-gmake}"
>     >    audio_drv_list="oss try-sdl"
>     >    audio_possible_drivers="oss sdl"
>     >
>



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 2/4] Add the NetBSD Virtual Machine Monitor accelerator.
  2020-03-02 21:10                 ` Kamil Rytarowski
@ 2020-03-02 22:45                   ` Paolo Bonzini
  0 siblings, 0 replies; 79+ messages in thread
From: Paolo Bonzini @ 2020-03-02 22:45 UTC (permalink / raw)
  To: Kamil Rytarowski
  Cc: Maydell, Peter, ehabkost, slp, qemu-devel,
	Philippe Mathieu-Daudé,
	Maxime Villard, rth

Il lun 2 mar 2020, 22:11 Kamil Rytarowski <n54@gmx.com> ha scritto:

> > The difference is that KVM for example does not need external includes
> > or libraries.
>

> We don't support this scenario


What scenario?

> and after a year there might be no
> supported release without NVMM.
>
> The only concern is about using qemu on !amd64, but we have there not
> many users of qemu for understandable reasons.
>

How do you know?

Paolo


> For AArch64 we plan to implement a dedicated NVMM backend.
>
> >
> >     > diff --git a/configure b/configure
> >     > index d4a837cf9d..b3560d88bb 100755
> >     > --- a/configure
> >     > +++ b/configure
> >     > @@ -836,7 +836,7 @@ DragonFly)
> >     >  NetBSD)
> >     >    bsd="yes"
> >     >    hax="yes"
> >     > -  nvmm="yes"
> >     > +  nvmm="check"
> >     >    make="${MAKE-gmake}"
> >     >    audio_drv_list="oss try-sdl"
> >     >    audio_possible_drivers="oss sdl"
> >     >
> >
>
>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 3/4] Introduce the NVMM impl
  2020-03-02 19:35             ` Paolo Bonzini
@ 2020-03-10  6:45               ` Maxime Villard
  2020-03-10 10:15                 ` Kamil Rytarowski
                                   ` (2 more replies)
  0 siblings, 3 replies; 79+ messages in thread
From: Maxime Villard @ 2020-03-10  6:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: peter.maydell, ehabkost, slp, qemu-devel, jmcneill,
	Kamil Rytarowski, philmd, rth

Le 02/03/2020 à 20:35, Paolo Bonzini a écrit :
> 
> 
> Il lun 2 mar 2020, 20:28 Maxime Villard <max@m00nbsd.net <mailto:max@m00nbsd.net>> ha scritto:
> 
> 
>     >> +        nvmm_vcpu_pre_run(cpu);
>     >> +
>     >> +        if (atomic_read(&cpu->exit_request)) {
>     >> +            qemu_cpu_kick_self();
>     >> +        }
>     >> +
>     >
>     > This is racy without something like KVM's immediate_exit mechanism.
>     > This should be fixed in NVMM.
> 
>     I don't immediately see how this is racy.
> 
> 
> You can get an IPI signal immediately after reading cpu->exit_request.
> 
>     It reproduces the existing
>     logic found in whpx-all.c, and if there is a real problem it can be
>     fixed in a future commit along with WHPX.
> 
> 
> It's buggy there too and it has to be fixed in the hypervisor so it can't be done at the same time in both. KVM does it right by having a flag ("immediate_exit") that is set by the signal handler and checked by the hypervisor.
> 
> An earlier version of KVM instead atomically unblocked the signal while executing the guest, and then ate it with a sigwaitinfo after exiting back to userspace.
> 
> You don't have to fix it immediately, but adding a FIXME would be a good idea.
> 
> Paolo

Kamil, please add /* FIXME: possible race here */ before the atomic_read().

Thanks


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 3/4] Introduce the NVMM impl
  2020-03-10  6:45               ` Maxime Villard
@ 2020-03-10 10:15                 ` Kamil Rytarowski
  2020-03-10 10:58                 ` Paolo Bonzini
  2020-07-21 13:42                 ` Kamil Rytarowski
  2 siblings, 0 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-03-10 10:15 UTC (permalink / raw)
  To: Maxime Villard, Paolo Bonzini
  Cc: peter.maydell, ehabkost, slp, qemu-devel, jmcneill, philmd, rth

On 10.03.2020 07:45, Maxime Villard wrote:
> Le 02/03/2020 à 20:35, Paolo Bonzini a écrit :
>>
>>
>> Il lun 2 mar 2020, 20:28 Maxime Villard <max@m00nbsd.net <mailto:max@m00nbsd.net>> ha scritto:
>>
>>
>>     >> +        nvmm_vcpu_pre_run(cpu);
>>     >> +
>>     >> +        if (atomic_read(&cpu->exit_request)) {
>>     >> +            qemu_cpu_kick_self();
>>     >> +        }
>>     >> +
>>     >
>>     > This is racy without something like KVM's immediate_exit mechanism.
>>     > This should be fixed in NVMM.
>>
>>     I don't immediately see how this is racy.
>>
>>
>> You can get an IPI signal immediately after reading cpu->exit_request.
>>
>>     It reproduces the existing
>>     logic found in whpx-all.c, and if there is a real problem it can be
>>     fixed in a future commit along with WHPX.
>>
>>
>> It's buggy there too and it has to be fixed in the hypervisor so it can't be done at the same time in both. KVM does it right by having a flag ("immediate_exit") that is set by the signal handler and checked by the hypervisor.
>>
>> An earlier version of KVM instead atomically unblocked the signal while executing the guest, and then ate it with a sigwaitinfo after exiting back to userspace.
>>
>> You don't have to fix it immediately, but adding a FIXME would be a good idea.
>>
>> Paolo
>
> Kamil, please add /* FIXME: possible race here */ before the atomic_read().
>
> Thanks
>

I will do it and submit a new patchset revision.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 3/4] Introduce the NVMM impl
  2020-03-10  6:45               ` Maxime Villard
  2020-03-10 10:15                 ` Kamil Rytarowski
@ 2020-03-10 10:58                 ` Paolo Bonzini
  2020-03-10 19:14                   ` Maxime Villard
  2020-07-21 13:42                 ` Kamil Rytarowski
  2 siblings, 1 reply; 79+ messages in thread
From: Paolo Bonzini @ 2020-03-10 10:58 UTC (permalink / raw)
  To: Maxime Villard
  Cc: peter.maydell, ehabkost, slp, qemu-devel, jmcneill,
	Kamil Rytarowski, philmd, rth

On 10/03/20 07:45, Maxime Villard wrote:
> > It reproduces the existing logic found in whpx-all.c, and if there is
> >
> 
> It's buggy there too and it has to be fixed in the hypervisor so it
> can't be done at the same time in both. KVM does it right by having
> a flag ("immediate_exit") that is set by the signal handler and
> checked by the hypervisor.

For what it's worth, WHPX's whpx_vcpu_kick invokes a "cancel entry" API
that is probably similar to what KVM does, so there's nothing to do there.

Paolo



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 3/4] Introduce the NVMM impl
  2020-03-10 10:58                 ` Paolo Bonzini
@ 2020-03-10 19:14                   ` Maxime Villard
  2020-03-11 18:03                     ` Paolo Bonzini
  0 siblings, 1 reply; 79+ messages in thread
From: Maxime Villard @ 2020-03-10 19:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: peter.maydell, ehabkost, slp, qemu-devel, jmcneill,
	Kamil Rytarowski, philmd, rth

Le 10/03/2020 à 11:58, Paolo Bonzini a écrit :
> On 10/03/20 07:45, Maxime Villard wrote:
>>> It reproduces the existing logic found in whpx-all.c, and if there is
>>>
>>
>> It's buggy there too and it has to be fixed in the hypervisor so it
>> can't be done at the same time in both. KVM does it right by having
>> a flag ("immediate_exit") that is set by the signal handler and
>> checked by the hypervisor.
> 
> For what it's worth, WHPX's whpx_vcpu_kick invokes a "cancel entry" API
> that is probably similar to what KVM does, so there's nothing to do there.
> 
> Paolo

Having had some time to look at the actual code today, I'm wondering why
WHPX uses whpx_vcpu_kick(), and why I even use qemu_cpu_kick_self() in
NVMM.

whpx_vcpu_kick() cancels the RunVirtualProcessor on the given VCPU. But
that VCPU is not executing anyway, since we're the calling thread, and if
we are in userland then we cannot be in the guest. It seems to me the code
should just break the loop directly instead of doing a self-kick, given
that it is known that the VCPU is not executing.

Maybe, whpx_vcpu_kick() causes a WHvRunVpExitReasonCanceled in the
WHvRunVirtualProcessor() call that follows, which in turn causes "ret=1"
to leave the loop. That is, maybe the next WHvRunVirtualProcessor() acks
the cancellation and leaves without doing anything, even if the
cancellation was received when this function wasn't executing. So there is
no bad effect, given that we still end up leaving the loop, which is the
desired functional behavior.

Looking at NVMM now, it seems to me there is the same thing. We do a
self-kick but we're the calling thread and know the VCPU isn't executing.
As a result of the self-kick the IPI handler sets
	qcpu->stop = true;
And in the next iteration of the loop, we break because this bool is set.
Again, we took a difficult path, but this is the desired final behavior.

In both WHPX and NVMM, it seems that we are doing unnecessary self-kicks,
and could just break the loop right away.

What am I getting wrong? The threading logic in QEMU is not really easy
to grasp.

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 3/4] Introduce the NVMM impl
  2020-03-10 19:14                   ` Maxime Villard
@ 2020-03-11 18:03                     ` Paolo Bonzini
  2020-03-11 20:14                       ` Maxime Villard
  0 siblings, 1 reply; 79+ messages in thread
From: Paolo Bonzini @ 2020-03-11 18:03 UTC (permalink / raw)
  To: Maxime Villard
  Cc: peter.maydell, ehabkost, slp, qemu-devel, jmcneill,
	Kamil Rytarowski, philmd, rth

On 10/03/20 20:14, Maxime Villard wrote:
> Maybe, whpx_vcpu_kick() causes a WHvRunVpExitReasonCanceled in the
> WHvRunVirtualProcessor() call that follows, which in turn causes "ret=1"
> to leave the loop. That is, maybe the next WHvRunVirtualProcessor() acks
> the cancellation and leaves without doing anything, even if the
> cancellation was received when this function wasn't executing. So there is
> no bad effect, given that we still end up leaving the loop, which is the
> desired functional behavior.

Yes, that's exactly the effect, and it solves the race in the same way
as KVM's run->immediate_exit flag.

> Looking at NVMM now, it seems to me there is the same thing. We do a
> self-kick but we're the calling thread and know the VCPU isn't executing.
> As a result of the self-kick the IPI handler sets
> 	qcpu->stop = true;
> And in the next iteration of the loop, we break because this bool is set

The problem is that qcpu->stop is checked _before_ entering the
hypervisor and not after, so there is a small race window.

Paolo



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 3/4] Introduce the NVMM impl
  2020-03-11 18:03                     ` Paolo Bonzini
@ 2020-03-11 20:14                       ` Maxime Villard
  2020-03-11 20:42                         ` Paolo Bonzini
  0 siblings, 1 reply; 79+ messages in thread
From: Maxime Villard @ 2020-03-11 20:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: peter.maydell, ehabkost, slp, qemu-devel, jmcneill,
	Kamil Rytarowski, philmd, rth

Le 11/03/2020 à 19:03, Paolo Bonzini a écrit :
> On 10/03/20 20:14, Maxime Villard wrote:
>> Maybe, whpx_vcpu_kick() causes a WHvRunVpExitReasonCanceled in the
>> WHvRunVirtualProcessor() call that follows, which in turn causes "ret=1"
>> to leave the loop. That is, maybe the next WHvRunVirtualProcessor() acks
>> the cancellation and leaves without doing anything, even if the
>> cancellation was received when this function wasn't executing. So there is
>> no bad effect, given that we still end up leaving the loop, which is the
>> desired functional behavior.
> 
> Yes, that's exactly the effect, and it solves the race in the same way
> as KVM's run->immediate_exit flag.
> 
>> Looking at NVMM now, it seems to me there is the same thing. We do a
>> self-kick but we're the calling thread and know the VCPU isn't executing.
>> As a result of the self-kick the IPI handler sets
>> 	qcpu->stop = true;
>> And in the next iteration of the loop, we break because this bool is set
> 
> The problem is that qcpu->stop is checked _before_ entering the
> hypervisor and not after, so there is a small race window.

Ok. I don't understand what's supposed to be the race here. If we get an
IPI between the check and the call to nvmm_vcpu_run() then we'll just do
one run and stop in the next iteration, because the IPI will have set
qcpu->stop. Is this extra iteration undesired?


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 3/4] Introduce the NVMM impl
  2020-03-11 20:14                       ` Maxime Villard
@ 2020-03-11 20:42                         ` Paolo Bonzini
  2020-03-11 21:21                           ` Maxime Villard
  0 siblings, 1 reply; 79+ messages in thread
From: Paolo Bonzini @ 2020-03-11 20:42 UTC (permalink / raw)
  To: Maxime Villard
  Cc: peter.maydell, ehabkost, slp, qemu-devel, jmcneill,
	Kamil Rytarowski, philmd, rth

On 11/03/20 21:14, Maxime Villard wrote:
>> The problem is that qcpu->stop is checked _before_ entering the
>> hypervisor and not after, so there is a small race window.
> Ok. I don't understand what's supposed to be the race here. If we get an
> IPI between the check and the call to nvmm_vcpu_run() then we'll just do
> one run and stop in the next iteration, because the IPI will have set
> qcpu->stop. Is this extra iteration undesired?

Yes, you don't know how long that run would take.  I don't know about
NVMM but for KVM it may even never leave if the guest is in HLT state.
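
A small single-threaded model of that window, as a sketch only: the names
(toy_vcpu, toy_ipi_handler, toy_vcpu_run, toy_exec_loop) are hypothetical and
the "kick" is injected by hand between the check and the run, rather than
delivered by a real signal. It shows that a kick landing in the window costs
one full run before the loop notices it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-VCPU state; not the real qemu_vcpu. */
struct toy_vcpu {
    volatile bool stop; /* set by the kick (IPI) handler */
    int runs;           /* completed calls into the "hypervisor" */
};

/* What the IPI handler does: it only sets the flag. */
static void toy_ipi_handler(struct toy_vcpu *v)
{
    assert(v != NULL);
    v->stop = true;
}

/* Stand-in for the run call; a real run may last arbitrarily long. */
static void toy_vcpu_run(struct toy_vcpu *v)
{
    v->runs++;
}

/*
 * Check-then-run loop. If kick_in_window is true, the kick is delivered
 * after the stop check but before the run, i.e. inside the race window.
 * Returns the number of runs performed before the loop noticed the kick.
 */
static int toy_exec_loop(struct toy_vcpu *v, bool kick_in_window)
{
    assert(v != NULL);
    for (;;) {
        if (v->stop) {
            v->stop = false;
            break;
        }
        if (kick_in_window) {
            toy_ipi_handler(v);
            kick_in_window = false;
        }
        toy_vcpu_run(v);
    }
    return v->runs;
}
```

A kick that arrives before the check stops the loop with zero runs; a kick
inside the window lets one run through, and how long that run lasts depends
entirely on the hypervisor and the guest.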

Paolo



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 3/4] Introduce the NVMM impl
  2020-03-11 20:42                         ` Paolo Bonzini
@ 2020-03-11 21:21                           ` Maxime Villard
  2020-03-11 21:22                             ` Kamil Rytarowski
  2020-03-11 21:44                             ` Paolo Bonzini
  0 siblings, 2 replies; 79+ messages in thread
From: Maxime Villard @ 2020-03-11 21:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: peter.maydell, ehabkost, slp, qemu-devel, jmcneill,
	Kamil Rytarowski, philmd, rth

Le 11/03/2020 à 21:42, Paolo Bonzini a écrit :
> On 11/03/20 21:14, Maxime Villard wrote:
>>> The problem is that qcpu->stop is checked _before_ entering the
>>> hypervisor and not after, so there is a small race window.
>> Ok. I don't understand what's supposed to be the race here. If we get an
>> IPI between the check and the call to nvmm_vcpu_run() then we'll just do
>> one run and stop in the next iteration, because the IPI will have set
>> qcpu->stop. Is this extra iteration undesired?
> 
> Yes, you don't know how long that run would take.  I don't know about
> NVMM but for KVM it may even never leave if the guest is in HLT state.

Ok, I see, thanks.

In NVMM the runs are short, the syscalls are fast, and pending signals
cause returns to userland. Therefore, in practice, it's not a big problem,
because (1) the window is small and (2) if we have a miss it's not going
to take long to come back to Qemu.

I see a quick kernel change I can make to reduce 95% of the window
already in the current state. The remaining 5% will need a new
nvmm_vcpu_kick() function.

For now this issue is unimportant and no Qemu change is required.

Kamil, please also drop the XXX in
    /* XXX Needed, otherwise infinite loop. */
It's not a bug.

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 3/4] Introduce the NVMM impl
  2020-03-11 21:21                           ` Maxime Villard
@ 2020-03-11 21:22                             ` Kamil Rytarowski
  2020-03-11 21:44                             ` Paolo Bonzini
  1 sibling, 0 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-03-11 21:22 UTC (permalink / raw)
  To: Maxime Villard, Paolo Bonzini
  Cc: peter.maydell, ehabkost, slp, qemu-devel, jmcneill, philmd, rth

On 11.03.2020 22:21, Maxime Villard wrote:
> Le 11/03/2020 à 21:42, Paolo Bonzini a écrit :
>> On 11/03/20 21:14, Maxime Villard wrote:
>>>> The problem is that qcpu->stop is checked _before_ entering the
>>>> hypervisor and not after, so there is a small race window.
>>> Ok. I don't understand what's supposed to be the race here. If we get an
>>> IPI between the check and the call to nvmm_vcpu_run() then we'll just do
>>> one run and stop in the next iteration, because the IPI will have set
>>> qcpu->stop. Is this extra iteration undesired?
>>
>> Yes, you don't know how long that run would take.  I don't know about
>> NVMM but for KVM it may even never leave if the guest is in HLT state.
>
> Ok, I see, thanks.
>
> In NVMM the runs are short, the syscalls are fast, and pending signals
> cause returns to userland. Therefore, in practice, it's not a big problem,
> because (1) the window is small and (2) if we have a miss it's not going
> to take long to come back to Qemu.
>
> I see a quick kernel change I can make to reduce 95% of the window
> already in the current state. The remaining 5% will need a new
> nvmm_vcpu_kick() function.
>
> For now this issue is unimportant and no Qemu change is required.
>
> Kamil, please also drop the XXX in
>     /* XXX Needed, otherwise infinite loop. */
> It's not a bug.
>

OK. I will do it.

> Thanks,
> Maxime
>



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 3/4] Introduce the NVMM impl
  2020-03-11 21:21                           ` Maxime Villard
  2020-03-11 21:22                             ` Kamil Rytarowski
@ 2020-03-11 21:44                             ` Paolo Bonzini
  2020-03-12  7:08                               ` Maxime Villard
  1 sibling, 1 reply; 79+ messages in thread
From: Paolo Bonzini @ 2020-03-11 21:44 UTC (permalink / raw)
  To: Maxime Villard
  Cc: peter.maydell, ehabkost, slp, qemu-devel, jmcneill,
	Kamil Rytarowski, philmd, rth

On 11/03/20 22:21, Maxime Villard wrote:
>> Yes, you don't know how long that run would take.  I don't know about
>> NVMM but for KVM it may even never leave if the guest is in HLT state.
> Ok, I see, thanks.
> 
> In NVMM the runs are short

How do you ensure that a guest with interrupts off exits promptly?

> , the syscalls are fast, and pending signals
> cause returns to userland. Therefore, in practice, it's not a big problem,
> because (1) the window is small and (2) if we have a miss it's not going
> to take long to come back to Qemu.
> 
> I see a quick kernel change I can make to reduce 95% of the window
> already in the current state. The remaining 5% will need a new
> nvmm_vcpu_kick() function.

You can also do what KVM did until a few years ago: swap the signal mask
atomically when you enter the hypervisor (e.g. unmasking SIGUSR1---this
has to be done in the kernel) and when you leave it.  Then in QEMU you
keep SIGUSR1 masked and "eat" it with sigwaitinfo.
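
The userspace half of that scheme could look roughly like the sketch below.
It only covers what QEMU would do (keep SIGUSR1 blocked, then consume a
pending one after returning); the atomic mask swap in the kernel is not
shown. The helper names are hypothetical, and sigtimedwait() with a zero
timeout is used here in place of sigwaitinfo() so that the helper never
blocks when no kick is pending.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <time.h>

/* Keep the kick signal blocked in the calling (VCPU) thread. */
static int block_kick_signal(void)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    return pthread_sigmask(SIG_BLOCK, &set, NULL);
}

/*
 * "Eat" a pending kick without blocking.
 * Returns 1 if a pending SIGUSR1 was consumed, 0 if none was pending,
 * and -1 on an unexpected error.
 */
static int eat_pending_kick(void)
{
    sigset_t set;
    const struct timespec no_wait = { 0, 0 };

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    for (;;) {
        int sig = sigtimedwait(&set, NULL, &no_wait);
        if (sig > 0) {
            assert(sig == SIGUSR1); /* the only signal in the set */
            return 1;
        }
        if (errno == EAGAIN) {
            return 0;               /* nothing pending */
        }
        if (errno != EINTR) {
            return -1;
        }
    }
}
```

Because the signal stays blocked in userspace, a kick sent at any point
before the hypervisor entry remains pending rather than being handled and
forgotten; it is the kernel-side unmasking during the run that turns it
into an immediate exit.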

> For now this issue is unimportant and no Qemu change is required.

If you say so.

Paolo



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 3/4] Introduce the NVMM impl
  2020-03-11 21:44                             ` Paolo Bonzini
@ 2020-03-12  7:08                               ` Maxime Villard
  0 siblings, 0 replies; 79+ messages in thread
From: Maxime Villard @ 2020-03-12  7:08 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: peter.maydell, ehabkost, slp, qemu-devel, jmcneill,
	Kamil Rytarowski, philmd, rth

Le 11/03/2020 à 22:44, Paolo Bonzini a écrit :
> On 11/03/20 22:21, Maxime Villard wrote:
>>> Yes, you don't know how long that run would take.  I don't know about
>>> NVMM but for KVM it may even never leave if the guest is in HLT state.
>> Ok, I see, thanks.
>>
>> In NVMM the runs are short
> 
> How do you ensure that a guest with interrupts off exits promptly?

In NVMM there are several conditions unrelated to the guest state which
cause returns to userland. These are reschedulings, signals and softints.
They happen "regularly". As the man page states: "this gives a chance
for emulator software to halt the VM in its tracks".

There was a specific reason this design was chosen, but it's true that a
nvmm_vcpu_kick() is more precise and warranted here.

>> , the syscalls are fast, and pending signals
>> cause returns to userland. Therefore, in practice, it's not a big problem,
>> because (1) the window is small and (2) if we have a miss it's not going
>> to take long to come back to Qemu.
>>
>> I see a quick kernel change I can make to reduce 95% of the window
>> already in the current state. The remaining 5% will need a new
>> nvmm_vcpu_kick() function.
> 
> You can also do what KVM did until a few years ago: swap the signal mask
> atomically when you enter the hypervisor (e.g. unmasking SIGUSR1---this
> has to be done in the kernel) and when you leave it.  Then in QEMU you
> keep SIGUSR1 masked and "eat" it with sigwaitinfo.
> 
>> For now this issue is unimportant and no Qemu change is required.
> 
> If you say so.

At first I thought the race was an actual locking problem. In fact it's
just a delay which on NVMM happens to be small, so yeah, not a very
important issue. It will be addressed in a future patch set soon.

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v4 3/4] Introduce the NVMM impl
  2020-03-10  6:45               ` Maxime Villard
  2020-03-10 10:15                 ` Kamil Rytarowski
  2020-03-10 10:58                 ` Paolo Bonzini
@ 2020-07-21 13:42                 ` Kamil Rytarowski
  2 siblings, 0 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-07-21 13:42 UTC (permalink / raw)
  To: Maxime Villard, Paolo Bonzini
  Cc: peter.maydell, ehabkost, slp, qemu-devel, jmcneill, philmd, rth

On 10.03.2020 07:45, Maxime Villard wrote:
> Le 02/03/2020 à 20:35, Paolo Bonzini a écrit :
>>
>>
>> Il lun 2 mar 2020, 20:28 Maxime Villard <max@m00nbsd.net <mailto:max@m00nbsd.net>> ha scritto:
>>
>>
>>     >> +        nvmm_vcpu_pre_run(cpu);
>>     >> +
>>     >> +        if (atomic_read(&cpu->exit_request)) {
>>     >> +            qemu_cpu_kick_self();
>>     >> +        }
>>     >> +
>>     >
>>     > This is racy without something like KVM's immediate_exit mechanism.
>>     > This should be fixed in NVMM.
>>
>>     I don't immediately see how this is racy.
>>
>>
>> You can get an IPI signal immediately after reading cpu->exit_request.
>>
>>     It reproduces the existing
>>     logic found in whpx-all.c, and if there is a real problem it can be
>>     fixed in a future commit along with WHPX.
>>
>>
>> It's buggy there too and it has to be fixed in the hypervisor so it can't be done at the same time in both. KVM does it right by having a flag ("immediate_exit") that is set by the signal handler and checked by the hypervisor.
>>
>> An earlier version of KVM instead atomically unblocked the signal while executing the guest, and then ate it with a sigwaitinfo after exiting back to userspace.
>>
>> You don't have to fix it immediately, but adding a FIXME would be a good idea.
>>
>> Paolo
> 
> Kamil, please add /* FIXME: possible race here */ before the atomic_read().
> 
> Thanks
> 

So, is this still considered as a possible race? Were there any other
changes to be introduced?

Can we see this patchset merged?


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [PATCH v5 1/4] Add the NVMM vcpu API
  2020-02-06 21:32       ` [PATCH v4 1/4] Add the NVMM vcpu API Kamil Rytarowski
@ 2020-08-11 12:47         ` Kamil Rytarowski
  2020-08-11 12:47           ` [PATCH v5 2/4] Add the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
                             ` (2 more replies)
  2020-08-11 13:01         ` [PATCH v5 1/4] Add the NVMM vcpu API Kamil Rytarowski
  1 sibling, 3 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-08-11 12:47 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max, jmcneill
  Cc: Kamil Rytarowski, qemu-devel

From: Maxime Villard <max@m00nbsd.net>

Adds the NetBSD Virtual Machine Monitor (NVMM) stubs and introduces the
nvmm.h sysemu API for vcpu creation, execution and state
synchronization.

Signed-off-by: Maxime Villard <max@m00nbsd.net>
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Jared McNeill <jmcneill@invisible.ca>
---
 accel/stubs/Makefile.objs |  1 +
 accel/stubs/nvmm-stub.c   | 43 +++++++++++++++++++++++++++++++++++++++
 include/sysemu/nvmm.h     | 35 +++++++++++++++++++++++++++++++
 3 files changed, 79 insertions(+)
 create mode 100644 accel/stubs/nvmm-stub.c
 create mode 100644 include/sysemu/nvmm.h

diff --git a/accel/stubs/Makefile.objs b/accel/stubs/Makefile.objs
index bbd14e71fb..38660a0b9b 100644
--- a/accel/stubs/Makefile.objs
+++ b/accel/stubs/Makefile.objs
@@ -1,6 +1,7 @@
 obj-$(call lnot,$(CONFIG_HAX))  += hax-stub.o
 obj-$(call lnot,$(CONFIG_HVF))  += hvf-stub.o
 obj-$(call lnot,$(CONFIG_WHPX)) += whpx-stub.o
+obj-$(call lnot,$(CONFIG_NVMM)) += nvmm-stub.o
 obj-$(call lnot,$(CONFIG_KVM))  += kvm-stub.o
 obj-$(call lnot,$(CONFIG_TCG))  += tcg-stub.o
 obj-$(call lnot,$(CONFIG_XEN))  += xen-stub.o
diff --git a/accel/stubs/nvmm-stub.c b/accel/stubs/nvmm-stub.c
new file mode 100644
index 0000000000..c2208b84a3
--- /dev/null
+++ b/accel/stubs/nvmm-stub.c
@@ -0,0 +1,43 @@
+/*
+ * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
+ *
+ * NetBSD Virtual Machine Monitor (NVMM) accelerator stub.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "cpu.h"
+#include "sysemu/nvmm.h"
+
+int nvmm_init_vcpu(CPUState *cpu)
+{
+    return -1;
+}
+
+int nvmm_vcpu_exec(CPUState *cpu)
+{
+    return -1;
+}
+
+void nvmm_destroy_vcpu(CPUState *cpu)
+{
+}
+
+void nvmm_cpu_synchronize_state(CPUState *cpu)
+{
+}
+
+void nvmm_cpu_synchronize_post_reset(CPUState *cpu)
+{
+}
+
+void nvmm_cpu_synchronize_post_init(CPUState *cpu)
+{
+}
+
+void nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu)
+{
+}
diff --git a/include/sysemu/nvmm.h b/include/sysemu/nvmm.h
new file mode 100644
index 0000000000..10496f3980
--- /dev/null
+++ b/include/sysemu/nvmm.h
@@ -0,0 +1,35 @@
+/*
+ * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
+ *
+ * NetBSD Virtual Machine Monitor (NVMM) accelerator support.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_NVMM_H
+#define QEMU_NVMM_H
+
+#include "config-host.h"
+#include "qemu-common.h"
+
+int nvmm_init_vcpu(CPUState *);
+int nvmm_vcpu_exec(CPUState *);
+void nvmm_destroy_vcpu(CPUState *);
+
+void nvmm_cpu_synchronize_state(CPUState *);
+void nvmm_cpu_synchronize_post_reset(CPUState *);
+void nvmm_cpu_synchronize_post_init(CPUState *);
+void nvmm_cpu_synchronize_pre_loadvm(CPUState *);
+
+#ifdef CONFIG_NVMM
+
+int nvmm_enabled(void);
+
+#else /* CONFIG_NVMM */
+
+#define nvmm_enabled() (0)
+
+#endif /* CONFIG_NVMM */
+
+#endif /* QEMU_NVMM_H */
-- 
2.24.1




^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v5 2/4] Add the NetBSD Virtual Machine Monitor accelerator.
  2020-08-11 12:47         ` [PATCH v5 " Kamil Rytarowski
@ 2020-08-11 12:47           ` Kamil Rytarowski
  2020-08-11 12:47           ` [PATCH v5 3/4] Introduce the NVMM impl Kamil Rytarowski
  2020-08-11 12:47           ` [PATCH v5 4/4] Add the NVMM acceleration enlightenments Kamil Rytarowski
  2 siblings, 0 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-08-11 12:47 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max, jmcneill
  Cc: Kamil Rytarowski, qemu-devel

From: Maxime Villard <max@m00nbsd.net>

Introduces the configure support for the new NetBSD Virtual Machine Monitor that
allows for hypervisor acceleration from usermode components on the NetBSD
platform.

Signed-off-by: Maxime Villard <max@m00nbsd.net>
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Jared McNeill <jmcneill@invisible.ca>
---
 configure       | 37 +++++++++++++++++++++++++++++++++++++
 qemu-options.hx | 10 +++++-----
 2 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/configure b/configure
index 2acc4d1465..fb9ffba2bf 100755
--- a/configure
+++ b/configure
@@ -246,6 +246,17 @@ supported_whpx_target() {
     return 1
 }
 
+supported_nvmm_target() {
+    test "$nvmm" = "yes" || return 1
+    glob "$1" "*-softmmu" || return 1
+    case "${1%-softmmu}" in
+        i386|x86_64)
+            return 0
+        ;;
+    esac
+    return 1
+}
+
 supported_target() {
     case "$1" in
         *-softmmu)
@@ -273,6 +284,7 @@ supported_target() {
     supported_hax_target "$1" && return 0
     supported_hvf_target "$1" && return 0
     supported_whpx_target "$1" && return 0
+    supported_nvmm_target "$1" && return 0
     print_error "TCG disabled, but hardware accelerator not available for '$target'"
     return 1
 }
@@ -395,6 +407,7 @@ kvm="no"
 hax="no"
 hvf="no"
 whpx="no"
+nvmm="no"
 rdma=""
 pvrdma=""
 gprof="no"
@@ -847,6 +860,7 @@ DragonFly)
 NetBSD)
   bsd="yes"
   hax="yes"
+  nvmm="yes"
   make="${MAKE-gmake}"
   audio_drv_list="oss try-sdl"
   audio_possible_drivers="oss sdl"
@@ -1233,6 +1247,10 @@ for opt do
   ;;
   --enable-whpx) whpx="yes"
   ;;
+  --disable-nvmm) nvmm="no"
+  ;;
+  --enable-nvmm) nvmm="yes"
+  ;;
   --disable-tcg-interpreter) tcg_interpreter="no"
   ;;
   --enable-tcg-interpreter) tcg_interpreter="yes"
@@ -1879,6 +1897,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   hax             HAX acceleration support
   hvf             Hypervisor.framework acceleration support
   whpx            Windows Hypervisor Platform acceleration support
+  nvmm            NetBSD Virtual Machine Monitor acceleration support
   rdma            Enable RDMA-based migration
   pvrdma          Enable PVRDMA support
   vde             support for vde network
@@ -2965,6 +2984,20 @@ if test "$whpx" != "no" ; then
     fi
 fi
 
+##########################################
+# NetBSD Virtual Machine Monitor (NVMM) accelerator check
+if test "$nvmm" != "no" ; then
+    if check_include "nvmm.h" ; then
+        nvmm="yes"
+        LIBS="-lnvmm $LIBS"
+    else
+        if test "$nvmm" = "yes"; then
+            feature_not_found "NVMM" "NVMM is not available"
+        fi
+        nvmm="no"
+    fi
+fi
+
 ##########################################
 # Sparse probe
 if test "$sparse" != "no" ; then
@@ -6934,6 +6967,7 @@ echo "KVM support       $kvm"
 echo "HAX support       $hax"
 echo "HVF support       $hvf"
 echo "WHPX support      $whpx"
+echo "NVMM support      $nvmm"
 echo "TCG support       $tcg"
 if test "$tcg" = "yes" ; then
     echo "TCG debug enabled $debug_tcg"
@@ -8332,6 +8366,9 @@ fi
 if test "$target_aligned_only" = "yes" ; then
   echo "TARGET_ALIGNED_ONLY=y" >> $config_target_mak
 fi
+if supported_nvmm_target $target; then
+    echo "CONFIG_NVMM=y" >> $config_target_mak
+fi
 if test "$target_bigendian" = "yes" ; then
   echo "TARGET_WORDS_BIGENDIAN=y" >> $config_target_mak
 fi
diff --git a/qemu-options.hx b/qemu-options.hx
index 708583b4ce..697accaa7e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -26,7 +26,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
     "-machine [type=]name[,prop[=value][,...]]\n"
     "                selects emulated machine ('-machine help' for list)\n"
     "                property accel=accel1[:accel2[:...]] selects accelerator\n"
-    "                supported accelerators are kvm, xen, hax, hvf, whpx or tcg (default: tcg)\n"
+    "                supported accelerators are kvm, xen, hax, hvf, nvmm, whpx or tcg (default: tcg)\n"
     "                vmport=on|off|auto controls emulation of vmport (default: auto)\n"
     "                dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
     "                mem-merge=on|off controls memory merge support (default: on)\n"
@@ -58,7 +58,7 @@ SRST
 
     ``accel=accels1[:accels2[:...]]``
         This is used to enable an accelerator. Depending on the target
-        architecture, kvm, xen, hax, hvf, whpx or tcg can be available.
+        architecture, kvm, xen, hax, hvf, nvmm, whpx or tcg can be available.
         By default, tcg is used. If there is more than one accelerator
         specified, the next one is used if the previous one fails to
         initialize.
@@ -119,7 +119,7 @@ ERST
 
 DEF("accel", HAS_ARG, QEMU_OPTION_accel,
     "-accel [accel=]accelerator[,prop[=value][,...]]\n"
-    "                select accelerator (kvm, xen, hax, hvf, whpx or tcg; use 'help' for a list)\n"
+    "                select accelerator (kvm, xen, hax, hvf, nvmm, whpx or tcg; use 'help' for a list)\n"
     "                igd-passthru=on|off (enable Xen integrated Intel graphics passthrough, default=off)\n"
     "                kernel-irqchip=on|off|split controls accelerated irqchip support (default=on)\n"
     "                kvm-shadow-mem=size of KVM shadow MMU in bytes\n"
@@ -128,8 +128,8 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel,
 SRST
 ``-accel name[,prop=value[,...]]``
     This is used to enable an accelerator. Depending on the target
-    architecture, kvm, xen, hax, hvf, whpx or tcg can be available. By
-    default, tcg is used. If there is more than one accelerator
+    architecture, kvm, xen, hax, hvf, nvmm, whpx or tcg can be available.
+    By default, tcg is used. If there is more than one accelerator
     specified, the next one is used if the previous one fails to
     initialize.
 
-- 
2.24.1




^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v5 3/4] Introduce the NVMM impl
  2020-08-11 12:47         ` [PATCH v5 " Kamil Rytarowski
  2020-08-11 12:47           ` [PATCH v5 2/4] Add the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
@ 2020-08-11 12:47           ` Kamil Rytarowski
  2020-08-11 12:47           ` [PATCH v5 4/4] Add the NVMM acceleration enlightenments Kamil Rytarowski
  2 siblings, 0 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-08-11 12:47 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max, jmcneill
  Cc: Kamil Rytarowski, qemu-devel

From: Maxime Villard <max@m00nbsd.net>

Implements the NetBSD Virtual Machine Monitor (NVMM) target, which
acts as a hypervisor accelerator for QEMU on the NetBSD platform. This gives
QEMU much greater speed than the emulated x86_64 paths that are taken on
NetBSD today.

Signed-off-by: Maxime Villard <max@m00nbsd.net>
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Tested-by: Jared McNeill <jmcneill@invisible.ca>
---
 target/i386/Makefile.objs |    1 +
 target/i386/nvmm-all.c    | 1226 +++++++++++++++++++++++++++++++++++++
 2 files changed, 1227 insertions(+)
 create mode 100644 target/i386/nvmm-all.c

diff --git a/target/i386/Makefile.objs b/target/i386/Makefile.objs
index 0b93143e27..ff0df68404 100644
--- a/target/i386/Makefile.objs
+++ b/target/i386/Makefile.objs
@@ -18,6 +18,7 @@ obj-$(CONFIG_HAX) += hax-all.o hax-mem.o hax-posix.o
 endif
 obj-$(CONFIG_HVF) += hvf/
 obj-$(CONFIG_WHPX) += whpx-all.o
+obj-$(CONFIG_NVMM) += nvmm-all.o
 endif
 obj-$(CONFIG_SEV) += sev.o
 obj-$(call lnot,$(CONFIG_SEV)) += sev-stub.o
diff --git a/target/i386/nvmm-all.c b/target/i386/nvmm-all.c
new file mode 100644
index 0000000000..408f7305b9
--- /dev/null
+++ b/target/i386/nvmm-all.c
@@ -0,0 +1,1226 @@
+/*
+ * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
+ *
+ * NetBSD Virtual Machine Monitor (NVMM) accelerator for QEMU.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/address-spaces.h"
+#include "exec/ioport.h"
+#include "qemu-common.h"
+#include "strings.h"
+#include "sysemu/accel.h"
+#include "sysemu/nvmm.h"
+#include "sysemu/runstate.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/cpus.h"
+#include "qemu/main-loop.h"
+#include "qemu/error-report.h"
+#include "qemu/queue.h"
+#include "qapi/error.h"
+#include "migration/blocker.h"
+
+#include <nvmm.h>
+
+struct qemu_vcpu {
+    struct nvmm_vcpu vcpu;
+    uint8_t tpr;
+    bool stop;
+
+    /* Window-exiting for INTs/NMIs. */
+    bool int_window_exit;
+    bool nmi_window_exit;
+
+    /* The guest is in an interrupt shadow (POP SS, etc). */
+    bool int_shadow;
+};
+
+struct qemu_machine {
+    struct nvmm_capability cap;
+    struct nvmm_machine mach;
+};
+
+/* -------------------------------------------------------------------------- */
+
+static bool nvmm_allowed;
+static struct qemu_machine qemu_mach;
+
+static struct qemu_vcpu *
+get_qemu_vcpu(CPUState *cpu)
+{
+    return (struct qemu_vcpu *)cpu->hax_vcpu;
+}
+
+static struct nvmm_machine *
+get_nvmm_mach(void)
+{
+    return &qemu_mach.mach;
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_set_segment(struct nvmm_x64_state_seg *nseg, const SegmentCache *qseg)
+{
+    uint32_t attrib = qseg->flags;
+
+    nseg->selector = qseg->selector;
+    nseg->limit = qseg->limit;
+    nseg->base = qseg->base;
+    nseg->attrib.type = __SHIFTOUT(attrib, DESC_TYPE_MASK);
+    nseg->attrib.s = __SHIFTOUT(attrib, DESC_S_MASK);
+    nseg->attrib.dpl = __SHIFTOUT(attrib, DESC_DPL_MASK);
+    nseg->attrib.p = __SHIFTOUT(attrib, DESC_P_MASK);
+    nseg->attrib.avl = __SHIFTOUT(attrib, DESC_AVL_MASK);
+    nseg->attrib.l = __SHIFTOUT(attrib, DESC_L_MASK);
+    nseg->attrib.def = __SHIFTOUT(attrib, DESC_B_MASK);
+    nseg->attrib.g = __SHIFTOUT(attrib, DESC_G_MASK);
+}
+
+static void
+nvmm_set_registers(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t bitmap;
+    size_t i;
+    int ret;
+
+    assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
+
+    /* GPRs. */
+    state->gprs[NVMM_X64_GPR_RAX] = env->regs[R_EAX];
+    state->gprs[NVMM_X64_GPR_RCX] = env->regs[R_ECX];
+    state->gprs[NVMM_X64_GPR_RDX] = env->regs[R_EDX];
+    state->gprs[NVMM_X64_GPR_RBX] = env->regs[R_EBX];
+    state->gprs[NVMM_X64_GPR_RSP] = env->regs[R_ESP];
+    state->gprs[NVMM_X64_GPR_RBP] = env->regs[R_EBP];
+    state->gprs[NVMM_X64_GPR_RSI] = env->regs[R_ESI];
+    state->gprs[NVMM_X64_GPR_RDI] = env->regs[R_EDI];
+#ifdef TARGET_X86_64
+    state->gprs[NVMM_X64_GPR_R8]  = env->regs[R_R8];
+    state->gprs[NVMM_X64_GPR_R9]  = env->regs[R_R9];
+    state->gprs[NVMM_X64_GPR_R10] = env->regs[R_R10];
+    state->gprs[NVMM_X64_GPR_R11] = env->regs[R_R11];
+    state->gprs[NVMM_X64_GPR_R12] = env->regs[R_R12];
+    state->gprs[NVMM_X64_GPR_R13] = env->regs[R_R13];
+    state->gprs[NVMM_X64_GPR_R14] = env->regs[R_R14];
+    state->gprs[NVMM_X64_GPR_R15] = env->regs[R_R15];
+#endif
+
+    /* RIP and RFLAGS. */
+    state->gprs[NVMM_X64_GPR_RIP] = env->eip;
+    state->gprs[NVMM_X64_GPR_RFLAGS] = env->eflags;
+
+    /* Segments. */
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_CS], &env->segs[R_CS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_DS], &env->segs[R_DS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_ES], &env->segs[R_ES]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_FS], &env->segs[R_FS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_GS], &env->segs[R_GS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_SS], &env->segs[R_SS]);
+
+    /* Special segments. */
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_GDT], &env->gdt);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_LDT], &env->ldt);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_TR], &env->tr);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_IDT], &env->idt);
+
+    /* Control registers. */
+    state->crs[NVMM_X64_CR_CR0] = env->cr[0];
+    state->crs[NVMM_X64_CR_CR2] = env->cr[2];
+    state->crs[NVMM_X64_CR_CR3] = env->cr[3];
+    state->crs[NVMM_X64_CR_CR4] = env->cr[4];
+    state->crs[NVMM_X64_CR_CR8] = qcpu->tpr;
+    state->crs[NVMM_X64_CR_XCR0] = env->xcr0;
+
+    /* Debug registers. */
+    state->drs[NVMM_X64_DR_DR0] = env->dr[0];
+    state->drs[NVMM_X64_DR_DR1] = env->dr[1];
+    state->drs[NVMM_X64_DR_DR2] = env->dr[2];
+    state->drs[NVMM_X64_DR_DR3] = env->dr[3];
+    state->drs[NVMM_X64_DR_DR6] = env->dr[6];
+    state->drs[NVMM_X64_DR_DR7] = env->dr[7];
+
+    /* FPU. */
+    state->fpu.fx_cw = env->fpuc;
+    state->fpu.fx_sw = (env->fpus & ~0x3800) | ((env->fpstt & 0x7) << 11);
+    state->fpu.fx_tw = 0;
+    for (i = 0; i < 8; i++) {
+        state->fpu.fx_tw |= (!env->fptags[i]) << i;
+    }
+    state->fpu.fx_opcode = env->fpop;
+    state->fpu.fx_ip.fa_64 = env->fpip;
+    state->fpu.fx_dp.fa_64 = env->fpdp;
+    state->fpu.fx_mxcsr = env->mxcsr;
+    state->fpu.fx_mxcsr_mask = 0x0000FFFF;
+    assert(sizeof(state->fpu.fx_87_ac) == sizeof(env->fpregs));
+    memcpy(state->fpu.fx_87_ac, env->fpregs, sizeof(env->fpregs));
+    for (i = 0; i < 16; i++) {
+        memcpy(&state->fpu.fx_xmm[i].xmm_bytes[0],
+            &env->xmm_regs[i].ZMM_Q(0), 8);
+        memcpy(&state->fpu.fx_xmm[i].xmm_bytes[8],
+            &env->xmm_regs[i].ZMM_Q(1), 8);
+    }
+
+    /* MSRs. */
+    state->msrs[NVMM_X64_MSR_EFER] = env->efer;
+    state->msrs[NVMM_X64_MSR_STAR] = env->star;
+#ifdef TARGET_X86_64
+    state->msrs[NVMM_X64_MSR_LSTAR] = env->lstar;
+    state->msrs[NVMM_X64_MSR_CSTAR] = env->cstar;
+    state->msrs[NVMM_X64_MSR_SFMASK] = env->fmask;
+    state->msrs[NVMM_X64_MSR_KERNELGSBASE] = env->kernelgsbase;
+#endif
+    state->msrs[NVMM_X64_MSR_SYSENTER_CS]  = env->sysenter_cs;
+    state->msrs[NVMM_X64_MSR_SYSENTER_ESP] = env->sysenter_esp;
+    state->msrs[NVMM_X64_MSR_SYSENTER_EIP] = env->sysenter_eip;
+    state->msrs[NVMM_X64_MSR_PAT] = env->pat;
+    state->msrs[NVMM_X64_MSR_TSC] = env->tsc;
+
+    bitmap =
+        NVMM_X64_STATE_SEGS |
+        NVMM_X64_STATE_GPRS |
+        NVMM_X64_STATE_CRS  |
+        NVMM_X64_STATE_DRS  |
+        NVMM_X64_STATE_MSRS |
+        NVMM_X64_STATE_FPU;
+
+    ret = nvmm_vcpu_setstate(mach, vcpu, bitmap);
+    if (ret == -1) {
+        error_report("NVMM: Failed to set virtual processor context,"
+            " error=%d", errno);
+    }
+}
+
+static void
+nvmm_get_segment(SegmentCache *qseg, const struct nvmm_x64_state_seg *nseg)
+{
+    qseg->selector = nseg->selector;
+    qseg->limit = nseg->limit;
+    qseg->base = nseg->base;
+
+    qseg->flags =
+        __SHIFTIN((uint32_t)nseg->attrib.type, DESC_TYPE_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.s, DESC_S_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.dpl, DESC_DPL_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.p, DESC_P_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.avl, DESC_AVL_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.l, DESC_L_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.def, DESC_B_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.g, DESC_G_MASK);
+}
+
+static void
+nvmm_get_registers(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t bitmap, tpr;
+    size_t i;
+    int ret;
+
+    assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
+
+    bitmap =
+        NVMM_X64_STATE_SEGS |
+        NVMM_X64_STATE_GPRS |
+        NVMM_X64_STATE_CRS  |
+        NVMM_X64_STATE_DRS  |
+        NVMM_X64_STATE_MSRS |
+        NVMM_X64_STATE_FPU;
+
+    ret = nvmm_vcpu_getstate(mach, vcpu, bitmap);
+    if (ret == -1) {
+        error_report("NVMM: Failed to get virtual processor context,"
+            " error=%d", errno);
+    }
+
+    /* GPRs. */
+    env->regs[R_EAX] = state->gprs[NVMM_X64_GPR_RAX];
+    env->regs[R_ECX] = state->gprs[NVMM_X64_GPR_RCX];
+    env->regs[R_EDX] = state->gprs[NVMM_X64_GPR_RDX];
+    env->regs[R_EBX] = state->gprs[NVMM_X64_GPR_RBX];
+    env->regs[R_ESP] = state->gprs[NVMM_X64_GPR_RSP];
+    env->regs[R_EBP] = state->gprs[NVMM_X64_GPR_RBP];
+    env->regs[R_ESI] = state->gprs[NVMM_X64_GPR_RSI];
+    env->regs[R_EDI] = state->gprs[NVMM_X64_GPR_RDI];
+#ifdef TARGET_X86_64
+    env->regs[R_R8]  = state->gprs[NVMM_X64_GPR_R8];
+    env->regs[R_R9]  = state->gprs[NVMM_X64_GPR_R9];
+    env->regs[R_R10] = state->gprs[NVMM_X64_GPR_R10];
+    env->regs[R_R11] = state->gprs[NVMM_X64_GPR_R11];
+    env->regs[R_R12] = state->gprs[NVMM_X64_GPR_R12];
+    env->regs[R_R13] = state->gprs[NVMM_X64_GPR_R13];
+    env->regs[R_R14] = state->gprs[NVMM_X64_GPR_R14];
+    env->regs[R_R15] = state->gprs[NVMM_X64_GPR_R15];
+#endif
+
+    /* RIP and RFLAGS. */
+    env->eip = state->gprs[NVMM_X64_GPR_RIP];
+    env->eflags = state->gprs[NVMM_X64_GPR_RFLAGS];
+
+    /* Segments. */
+    nvmm_get_segment(&env->segs[R_ES], &state->segs[NVMM_X64_SEG_ES]);
+    nvmm_get_segment(&env->segs[R_CS], &state->segs[NVMM_X64_SEG_CS]);
+    nvmm_get_segment(&env->segs[R_SS], &state->segs[NVMM_X64_SEG_SS]);
+    nvmm_get_segment(&env->segs[R_DS], &state->segs[NVMM_X64_SEG_DS]);
+    nvmm_get_segment(&env->segs[R_FS], &state->segs[NVMM_X64_SEG_FS]);
+    nvmm_get_segment(&env->segs[R_GS], &state->segs[NVMM_X64_SEG_GS]);
+
+    /* Special segments. */
+    nvmm_get_segment(&env->gdt, &state->segs[NVMM_X64_SEG_GDT]);
+    nvmm_get_segment(&env->ldt, &state->segs[NVMM_X64_SEG_LDT]);
+    nvmm_get_segment(&env->tr, &state->segs[NVMM_X64_SEG_TR]);
+    nvmm_get_segment(&env->idt, &state->segs[NVMM_X64_SEG_IDT]);
+
+    /* Control registers. */
+    env->cr[0] = state->crs[NVMM_X64_CR_CR0];
+    env->cr[2] = state->crs[NVMM_X64_CR_CR2];
+    env->cr[3] = state->crs[NVMM_X64_CR_CR3];
+    env->cr[4] = state->crs[NVMM_X64_CR_CR4];
+    tpr = state->crs[NVMM_X64_CR_CR8];
+    if (tpr != qcpu->tpr) {
+        qcpu->tpr = tpr;
+        cpu_set_apic_tpr(x86_cpu->apic_state, tpr);
+    }
+    env->xcr0 = state->crs[NVMM_X64_CR_XCR0];
+
+    /* Debug registers. */
+    env->dr[0] = state->drs[NVMM_X64_DR_DR0];
+    env->dr[1] = state->drs[NVMM_X64_DR_DR1];
+    env->dr[2] = state->drs[NVMM_X64_DR_DR2];
+    env->dr[3] = state->drs[NVMM_X64_DR_DR3];
+    env->dr[6] = state->drs[NVMM_X64_DR_DR6];
+    env->dr[7] = state->drs[NVMM_X64_DR_DR7];
+
+    /* FPU. */
+    env->fpuc = state->fpu.fx_cw;
+    env->fpstt = (state->fpu.fx_sw >> 11) & 0x7;
+    env->fpus = state->fpu.fx_sw & ~0x3800;
+    for (i = 0; i < 8; i++) {
+        env->fptags[i] = !((state->fpu.fx_tw >> i) & 1);
+    }
+    env->fpop = state->fpu.fx_opcode;
+    env->fpip = state->fpu.fx_ip.fa_64;
+    env->fpdp = state->fpu.fx_dp.fa_64;
+    env->mxcsr = state->fpu.fx_mxcsr;
+    assert(sizeof(state->fpu.fx_87_ac) == sizeof(env->fpregs));
+    memcpy(env->fpregs, state->fpu.fx_87_ac, sizeof(env->fpregs));
+    for (i = 0; i < 16; i++) {
+        memcpy(&env->xmm_regs[i].ZMM_Q(0),
+            &state->fpu.fx_xmm[i].xmm_bytes[0], 8);
+        memcpy(&env->xmm_regs[i].ZMM_Q(1),
+            &state->fpu.fx_xmm[i].xmm_bytes[8], 8);
+    }
+
+    /* MSRs. */
+    env->efer = state->msrs[NVMM_X64_MSR_EFER];
+    env->star = state->msrs[NVMM_X64_MSR_STAR];
+#ifdef TARGET_X86_64
+    env->lstar = state->msrs[NVMM_X64_MSR_LSTAR];
+    env->cstar = state->msrs[NVMM_X64_MSR_CSTAR];
+    env->fmask = state->msrs[NVMM_X64_MSR_SFMASK];
+    env->kernelgsbase = state->msrs[NVMM_X64_MSR_KERNELGSBASE];
+#endif
+    env->sysenter_cs  = state->msrs[NVMM_X64_MSR_SYSENTER_CS];
+    env->sysenter_esp = state->msrs[NVMM_X64_MSR_SYSENTER_ESP];
+    env->sysenter_eip = state->msrs[NVMM_X64_MSR_SYSENTER_EIP];
+    env->pat = state->msrs[NVMM_X64_MSR_PAT];
+    env->tsc = state->msrs[NVMM_X64_MSR_TSC];
+
+    x86_update_hflags(env);
+}
+
+static bool
+nvmm_can_take_int(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    struct nvmm_machine *mach = get_nvmm_mach();
+
+    if (qcpu->int_window_exit) {
+        return false;
+    }
+
+    if (qcpu->int_shadow || !(env->eflags & IF_MASK)) {
+        struct nvmm_x64_state *state = vcpu->state;
+
+        /* Exit on interrupt window. */
+        nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_INTR);
+        state->intr.int_window_exiting = 1;
+        nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_INTR);
+
+        return false;
+    }
+
+    return true;
+}
+
+static bool
+nvmm_can_take_nmi(CPUState *cpu)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+
+    /*
+     * Contrary to INTs, NMIs always schedule an exit when they are
+     * completed. Therefore, if window-exiting is enabled, it means
+     * NMIs are blocked.
+     */
+    if (qcpu->nmi_window_exit) {
+        return false;
+    }
+
+    return true;
+}
+
+/*
+ * Called before the VCPU is run. We inject events generated by the I/O
+ * thread, and synchronize the guest TPR.
+ */
+static void
+nvmm_vcpu_pre_run(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    struct nvmm_vcpu_event *event = vcpu->event;
+    bool has_event = false;
+    bool sync_tpr = false;
+    uint8_t tpr;
+    int ret;
+
+    qemu_mutex_lock_iothread();
+
+    tpr = cpu_get_apic_tpr(x86_cpu->apic_state);
+    if (tpr != qcpu->tpr) {
+        qcpu->tpr = tpr;
+        sync_tpr = true;
+    }
+
+    /*
+     * Force the VCPU out of its inner loop to process any INIT requests
+     * or commit pending TPR access.
+     */
+    if (cpu->interrupt_request & (CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
+        cpu->exit_request = 1;
+    }
+
+    if (!has_event && (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+        if (nvmm_can_take_nmi(cpu)) {
+            cpu->interrupt_request &= ~CPU_INTERRUPT_NMI;
+            event->type = NVMM_VCPU_EVENT_INTR;
+            event->vector = 2;
+            has_event = true;
+        }
+    }
+
+    if (!has_event && (cpu->interrupt_request & CPU_INTERRUPT_HARD)) {
+        if (nvmm_can_take_int(cpu)) {
+            cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
+            event->type = NVMM_VCPU_EVENT_INTR;
+            event->vector = cpu_get_pic_interrupt(env);
+            has_event = true;
+        }
+    }
+
+    /* Don't want SMIs. */
+    if (cpu->interrupt_request & CPU_INTERRUPT_SMI) {
+        cpu->interrupt_request &= ~CPU_INTERRUPT_SMI;
+    }
+
+    if (sync_tpr) {
+        ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_CRS);
+        if (ret == -1) {
+            error_report("NVMM: Failed to get CPU state,"
+                " error=%d", errno);
+        }
+
+        state->crs[NVMM_X64_CR_CR8] = qcpu->tpr;
+
+        ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_CRS);
+        if (ret == -1) {
+            error_report("NVMM: Failed to set CPU state,"
+                " error=%d", errno);
+        }
+    }
+
+    if (has_event) {
+        ret = nvmm_vcpu_inject(mach, vcpu);
+        if (ret == -1) {
+            error_report("NVMM: Failed to inject event,"
+                " error=%d", errno);
+        }
+    }
+
+    qemu_mutex_unlock_iothread();
+}
+
+/*
+ * Called after the VCPU ran. We synchronize the host view of the TPR and
+ * RFLAGS.
+ */
+static void
+nvmm_vcpu_post_run(CPUState *cpu, struct nvmm_vcpu_exit *exit)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    uint64_t tpr;
+
+    env->eflags = exit->exitstate.rflags;
+    qcpu->int_shadow = exit->exitstate.int_shadow;
+    qcpu->int_window_exit = exit->exitstate.int_window_exiting;
+    qcpu->nmi_window_exit = exit->exitstate.nmi_window_exiting;
+
+    tpr = exit->exitstate.cr8;
+    if (qcpu->tpr != tpr) {
+        qcpu->tpr = tpr;
+        qemu_mutex_lock_iothread();
+        cpu_set_apic_tpr(x86_cpu->apic_state, qcpu->tpr);
+        qemu_mutex_unlock_iothread();
+    }
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_io_callback(struct nvmm_io *io)
+{
+    MemTxAttrs attrs = { 0 };
+    int ret;
+
+    ret = address_space_rw(&address_space_io, io->port, attrs, io->data,
+        io->size, !io->in);
+    if (ret != MEMTX_OK) {
+        error_report("NVMM: I/O Transaction Failed "
+            "[%s, port=%u, size=%zu]", (io->in ? "in" : "out"),
+            io->port, io->size);
+    }
+
+    /* Needed, otherwise infinite loop. */
+    current_cpu->vcpu_dirty = false;
+}
+
+static void
+nvmm_mem_callback(struct nvmm_mem *mem)
+{
+    cpu_physical_memory_rw(mem->gpa, mem->data, mem->size, mem->write);
+
+    /* XXX Needed, otherwise infinite loop. */
+    current_cpu->vcpu_dirty = false;
+}
+
+static struct nvmm_assist_callbacks nvmm_callbacks = {
+    .io = nvmm_io_callback,
+    .mem = nvmm_mem_callback
+};
+
+/* -------------------------------------------------------------------------- */
+
+static int
+nvmm_handle_mem(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
+{
+    int ret;
+
+    ret = nvmm_assist_mem(mach, vcpu);
+    if (ret == -1) {
+        error_report("NVMM: Mem Assist Failed [gpa=%p]",
+            (void *)vcpu->exit->u.mem.gpa);
+    }
+
+    return ret;
+}
+
+static int
+nvmm_handle_io(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
+{
+    int ret;
+
+    ret = nvmm_assist_io(mach, vcpu);
+    if (ret == -1) {
+        error_report("NVMM: I/O Assist Failed [port=%d]",
+            (int)vcpu->exit->u.io.port);
+    }
+
+    return ret;
+}
+
+static int
+nvmm_handle_rdmsr(struct nvmm_machine *mach, CPUState *cpu,
+    struct nvmm_vcpu_exit *exit)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t val;
+    int ret;
+
+    switch (exit->u.rdmsr.msr) {
+    case MSR_IA32_APICBASE:
+        val = cpu_get_apic_base(x86_cpu->apic_state);
+        break;
+    case MSR_MTRRcap:
+    case MSR_MTRRdefType:
+    case MSR_MCG_CAP:
+    case MSR_MCG_STATUS:
+        val = 0;
+        break;
+    default: /* More MSRs to add? */
+        val = 0;
+        error_report("NVMM: Unexpected RDMSR 0x%x, ignored",
+            exit->u.rdmsr.msr);
+        break;
+    }
+
+    ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    state->gprs[NVMM_X64_GPR_RAX] = (val & 0xFFFFFFFF);
+    state->gprs[NVMM_X64_GPR_RDX] = (val >> 32);
+    state->gprs[NVMM_X64_GPR_RIP] = exit->u.rdmsr.npc;
+
+    ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static int
+nvmm_handle_wrmsr(struct nvmm_machine *mach, CPUState *cpu,
+    struct nvmm_vcpu_exit *exit)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t val;
+    int ret;
+
+    val = exit->u.wrmsr.val;
+
+    switch (exit->u.wrmsr.msr) {
+    case MSR_IA32_APICBASE:
+        cpu_set_apic_base(x86_cpu->apic_state, val);
+        break;
+    case MSR_MTRRdefType:
+    case MSR_MCG_STATUS:
+        break;
+    default: /* More MSRs to add? */
+        error_report("NVMM: Unexpected WRMSR 0x%x [val=0x%" PRIx64
+            "], ignored", exit->u.wrmsr.msr, val);
+        break;
+    }
+
+    ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    state->gprs[NVMM_X64_GPR_RIP] = exit->u.wrmsr.npc;
+
+    ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static int
+nvmm_handle_halted(struct nvmm_machine *mach, CPUState *cpu,
+    struct nvmm_vcpu_exit *exit)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    int ret = 0;
+
+    qemu_mutex_lock_iothread();
+
+    if (!((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
+          (env->eflags & IF_MASK)) &&
+        !(cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+        cpu->exception_index = EXCP_HLT;
+        cpu->halted = true;
+        ret = 1;
+    }
+
+    qemu_mutex_unlock_iothread();
+
+    return ret;
+}
+
+static int
+nvmm_inject_ud(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
+{
+    struct nvmm_vcpu_event *event = vcpu->event;
+
+    event->type = NVMM_VCPU_EVENT_EXCP;
+    event->vector = 6;
+    event->u.excp.error = 0;
+
+    return nvmm_vcpu_inject(mach, vcpu);
+}
+
+static int
+nvmm_vcpu_loop(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_vcpu_exit *exit = vcpu->exit;
+    int ret;
+
+    /*
+     * Some asynchronous events must be handled outside of the inner
+     * VCPU loop. They are handled here.
+     */
+    if (cpu->interrupt_request & CPU_INTERRUPT_INIT) {
+        nvmm_cpu_synchronize_state(cpu);
+        do_cpu_init(x86_cpu);
+        /* set int/nmi windows back to the reset state */
+    }
+    if (cpu->interrupt_request & CPU_INTERRUPT_POLL) {
+        cpu->interrupt_request &= ~CPU_INTERRUPT_POLL;
+        apic_poll_irq(x86_cpu->apic_state);
+    }
+    if (((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
+         (env->eflags & IF_MASK)) ||
+        (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+        cpu->halted = false;
+    }
+    if (cpu->interrupt_request & CPU_INTERRUPT_SIPI) {
+        nvmm_cpu_synchronize_state(cpu);
+        do_cpu_sipi(x86_cpu);
+    }
+    if (cpu->interrupt_request & CPU_INTERRUPT_TPR) {
+        cpu->interrupt_request &= ~CPU_INTERRUPT_TPR;
+        nvmm_cpu_synchronize_state(cpu);
+        apic_handle_tpr_access_report(x86_cpu->apic_state, env->eip,
+            env->tpr_access_type);
+    }
+
+    if (cpu->halted) {
+        cpu->exception_index = EXCP_HLT;
+        atomic_set(&cpu->exit_request, false);
+        return 0;
+    }
+
+    qemu_mutex_unlock_iothread();
+    cpu_exec_start(cpu);
+
+    /*
+     * Inner VCPU loop.
+     */
+    do {
+        if (cpu->vcpu_dirty) {
+            nvmm_set_registers(cpu);
+            cpu->vcpu_dirty = false;
+        }
+
+        if (qcpu->stop) {
+            cpu->exception_index = EXCP_INTERRUPT;
+            qcpu->stop = false;
+            ret = 1;
+            break;
+        }
+
+        nvmm_vcpu_pre_run(cpu);
+
+        if (atomic_read(&cpu->exit_request)) {
+            qemu_cpu_kick_self();
+        }
+
+        ret = nvmm_vcpu_run(mach, vcpu);
+        if (ret == -1) {
+            error_report("NVMM: Failed to exec a virtual processor,"
+                " error=%d", errno);
+            break;
+        }
+
+        nvmm_vcpu_post_run(cpu, exit);
+
+        switch (exit->reason) {
+        case NVMM_VCPU_EXIT_NONE:
+            break;
+        case NVMM_VCPU_EXIT_MEMORY:
+            ret = nvmm_handle_mem(mach, vcpu);
+            break;
+        case NVMM_VCPU_EXIT_IO:
+            ret = nvmm_handle_io(mach, vcpu);
+            break;
+        case NVMM_VCPU_EXIT_INT_READY:
+        case NVMM_VCPU_EXIT_NMI_READY:
+        case NVMM_VCPU_EXIT_TPR_CHANGED:
+            break;
+        case NVMM_VCPU_EXIT_HALTED:
+            ret = nvmm_handle_halted(mach, cpu, exit);
+            break;
+        case NVMM_VCPU_EXIT_SHUTDOWN:
+            qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
+            cpu->exception_index = EXCP_INTERRUPT;
+            ret = 1;
+            break;
+        case NVMM_VCPU_EXIT_RDMSR:
+            ret = nvmm_handle_rdmsr(mach, cpu, exit);
+            break;
+        case NVMM_VCPU_EXIT_WRMSR:
+            ret = nvmm_handle_wrmsr(mach, cpu, exit);
+            break;
+        case NVMM_VCPU_EXIT_MONITOR:
+        case NVMM_VCPU_EXIT_MWAIT:
+            ret = nvmm_inject_ud(mach, vcpu);
+            break;
+        default:
+            error_report("NVMM: Unexpected VM exit code 0x%" PRIx64
+                " [hw=0x%" PRIx64 "]", exit->reason, exit->u.inv.hwcode);
+            nvmm_get_registers(cpu);
+            qemu_mutex_lock_iothread();
+            qemu_system_guest_panicked(cpu_get_crash_info(cpu));
+            qemu_mutex_unlock_iothread();
+            ret = -1;
+            break;
+        }
+    } while (ret == 0);
+
+    cpu_exec_end(cpu);
+    qemu_mutex_lock_iothread();
+    current_cpu = cpu;
+
+    atomic_set(&cpu->exit_request, false);
+
+    return ret < 0;
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+do_nvmm_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
+{
+    nvmm_get_registers(cpu);
+    cpu->vcpu_dirty = true;
+}
+
+static void
+do_nvmm_cpu_synchronize_post_reset(CPUState *cpu, run_on_cpu_data arg)
+{
+    nvmm_set_registers(cpu);
+    cpu->vcpu_dirty = false;
+}
+
+static void
+do_nvmm_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg)
+{
+    nvmm_set_registers(cpu);
+    cpu->vcpu_dirty = false;
+}
+
+static void
+do_nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu, run_on_cpu_data arg)
+{
+    cpu->vcpu_dirty = true;
+}
+
+void nvmm_cpu_synchronize_state(CPUState *cpu)
+{
+    if (!cpu->vcpu_dirty) {
+        run_on_cpu(cpu, do_nvmm_cpu_synchronize_state, RUN_ON_CPU_NULL);
+    }
+}
+
+void nvmm_cpu_synchronize_post_reset(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_nvmm_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
+}
+
+void nvmm_cpu_synchronize_post_init(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_nvmm_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
+}
+
+void nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_nvmm_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
+}
+
+/* -------------------------------------------------------------------------- */
+
+static Error *nvmm_migration_blocker;
+
+static void
+nvmm_ipi_signal(int sigcpu)
+{
+    struct qemu_vcpu *qcpu;
+
+    if (current_cpu) {
+        qcpu = get_qemu_vcpu(current_cpu);
+        qcpu->stop = true;
+    }
+}
+
+static void
+nvmm_init_cpu_signals(void)
+{
+    struct sigaction sigact;
+    sigset_t set;
+
+    /* Install the IPI handler. */
+    memset(&sigact, 0, sizeof(sigact));
+    sigact.sa_handler = nvmm_ipi_signal;
+    sigaction(SIG_IPI, &sigact, NULL);
+
+    /* Allow IPIs on the current thread. */
+    sigprocmask(SIG_BLOCK, NULL, &set);
+    sigdelset(&set, SIG_IPI);
+    pthread_sigmask(SIG_SETMASK, &set, NULL);
+}
+
+int
+nvmm_init_vcpu(CPUState *cpu)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct nvmm_vcpu_conf_cpuid cpuid;
+    struct nvmm_vcpu_conf_tpr tpr;
+    Error *local_error = NULL;
+    struct qemu_vcpu *qcpu;
+    int ret, err;
+
+    nvmm_init_cpu_signals();
+
+    if (nvmm_migration_blocker == NULL) {
+        error_setg(&nvmm_migration_blocker,
+            "NVMM: Migration not supported");
+
+        (void)migrate_add_blocker(nvmm_migration_blocker, &local_error);
+        if (local_error) {
+            error_report_err(local_error);
+            migrate_del_blocker(nvmm_migration_blocker);
+            error_free(nvmm_migration_blocker);
+            return -EINVAL;
+        }
+    }
+
+    qcpu = g_malloc0(sizeof(*qcpu));
+    if (qcpu == NULL) {
+        error_report("NVMM: Failed to allocate VCPU context.");
+        return -ENOMEM;
+    }
+
+    ret = nvmm_vcpu_create(mach, cpu->cpu_index, &qcpu->vcpu);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Failed to create a virtual processor,"
+            " error=%d", err);
+        g_free(qcpu);
+        return -err;
+    }
+
+    memset(&cpuid, 0, sizeof(cpuid));
+    cpuid.mask = 1;
+    cpuid.leaf = 0x00000001;
+    cpuid.u.mask.set.edx = CPUID_MCE | CPUID_MCA | CPUID_MTRR;
+    ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_CPUID,
+        &cpuid);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Failed to configure a virtual processor,"
+            " error=%d", err);
+        g_free(qcpu);
+        return -err;
+    }
+
+    ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_CALLBACKS,
+        &nvmm_callbacks);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Failed to configure a virtual processor,"
+            " error=%d", err);
+        g_free(qcpu);
+        return -err;
+    }
+
+    if (qemu_mach.cap.arch.vcpu_conf_support & NVMM_CAP_ARCH_VCPU_CONF_TPR) {
+        memset(&tpr, 0, sizeof(tpr));
+        tpr.exit_changed = 1;
+        ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_TPR, &tpr);
+        if (ret == -1) {
+            err = errno;
+            error_report("NVMM: Failed to configure a virtual processor,"
+                " error=%d", err);
+            g_free(qcpu);
+            return -err;
+        }
+    }
+
+    cpu->vcpu_dirty = true;
+    cpu->hax_vcpu = (struct hax_vcpu_state *)qcpu;
+
+    return 0;
+}
+
+int
+nvmm_vcpu_exec(CPUState *cpu)
+{
+    int ret, fatal;
+
+    while (1) {
+        if (cpu->exception_index >= EXCP_INTERRUPT) {
+            ret = cpu->exception_index;
+            cpu->exception_index = -1;
+            break;
+        }
+
+        fatal = nvmm_vcpu_loop(cpu);
+
+        if (fatal) {
+            error_report("NVMM: Failed to execute a VCPU.");
+            abort();
+        }
+    }
+
+    return ret;
+}
+
+void
+nvmm_destroy_vcpu(CPUState *cpu)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+
+    nvmm_vcpu_destroy(mach, &qcpu->vcpu);
+    g_free(cpu->hax_vcpu);
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_update_mapping(hwaddr start_pa, ram_addr_t size, uintptr_t hva,
+    bool add, bool rom, const char *name)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    int ret, prot;
+
+    if (add) {
+        prot = PROT_READ | PROT_EXEC;
+        if (!rom) {
+            prot |= PROT_WRITE;
+        }
+        ret = nvmm_gpa_map(mach, hva, start_pa, size, prot);
+    } else {
+        ret = nvmm_gpa_unmap(mach, hva, start_pa, size);
+    }
+
+    if (ret == -1) {
+        error_report("NVMM: Failed to %s GPA range '%s' PA:%p, "
+            "Size:%p bytes, HostVA:%p, error=%d",
+            (add ? "map" : "unmap"), name, (void *)(uintptr_t)start_pa,
+            (void *)size, (void *)hva, errno);
+    }
+}
+
+static void
+nvmm_process_section(MemoryRegionSection *section, int add)
+{
+    MemoryRegion *mr = section->mr;
+    hwaddr start_pa = section->offset_within_address_space;
+    ram_addr_t size = int128_get64(section->size);
+    unsigned int delta;
+    uintptr_t hva;
+
+    if (!memory_region_is_ram(mr)) {
+        return;
+    }
+
+    /* Adjust start_pa and size so that they are page-aligned. */
+    delta = qemu_real_host_page_size - (start_pa & ~qemu_real_host_page_mask);
+    delta &= ~qemu_real_host_page_mask;
+    if (delta > size) {
+        return;
+    }
+    start_pa += delta;
+    size -= delta;
+    size &= qemu_real_host_page_mask;
+    if (!size || (start_pa & ~qemu_real_host_page_mask)) {
+        return;
+    }
+
+    hva = (uintptr_t)memory_region_get_ram_ptr(mr) +
+        section->offset_within_region + delta;
+
+    nvmm_update_mapping(start_pa, size, hva, add,
+        memory_region_is_rom(mr), mr->name);
+}
+
+static void
+nvmm_region_add(MemoryListener *listener, MemoryRegionSection *section)
+{
+    memory_region_ref(section->mr);
+    nvmm_process_section(section, 1);
+}
+
+static void
+nvmm_region_del(MemoryListener *listener, MemoryRegionSection *section)
+{
+    nvmm_process_section(section, 0);
+    memory_region_unref(section->mr);
+}
+
+static void
+nvmm_transaction_begin(MemoryListener *listener)
+{
+    /* nothing */
+}
+
+static void
+nvmm_transaction_commit(MemoryListener *listener)
+{
+    /* nothing */
+}
+
+static void
+nvmm_log_sync(MemoryListener *listener, MemoryRegionSection *section)
+{
+    MemoryRegion *mr = section->mr;
+
+    if (!memory_region_is_ram(mr)) {
+        return;
+    }
+
+    memory_region_set_dirty(mr, 0, int128_get64(section->size));
+}
+
+static MemoryListener nvmm_memory_listener = {
+    .begin = nvmm_transaction_begin,
+    .commit = nvmm_transaction_commit,
+    .region_add = nvmm_region_add,
+    .region_del = nvmm_region_del,
+    .log_sync = nvmm_log_sync,
+    .priority = 10,
+};
+
+static void
+nvmm_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    uintptr_t hva = (uintptr_t)host;
+    int ret;
+
+    ret = nvmm_hva_map(mach, hva, size);
+
+    if (ret == -1) {
+        error_report("NVMM: Failed to map HVA, HostVA:%p "
+            "Size:%p bytes, error=%d",
+            (void *)hva, (void *)size, errno);
+    }
+}
+
+static struct RAMBlockNotifier nvmm_ram_notifier = {
+    .ram_block_added = nvmm_ram_block_added
+};
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_handle_interrupt(CPUState *cpu, int mask)
+{
+    cpu->interrupt_request |= mask;
+
+    if (!qemu_cpu_is_self(cpu)) {
+        qemu_cpu_kick(cpu);
+    }
+}
+
+/* -------------------------------------------------------------------------- */
+
+static int
+nvmm_accel_init(MachineState *ms)
+{
+    int ret, err;
+
+    ret = nvmm_init();
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Initialization failed, error=%d", err);
+        return -err;
+    }
+
+    ret = nvmm_capability(&qemu_mach.cap);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Unable to fetch capability, error=%d", err);
+        return -err;
+    }
+    if (qemu_mach.cap.version != NVMM_KERN_VERSION) {
+        error_report("NVMM: Unsupported version %u", qemu_mach.cap.version);
+        return -EPROGMISMATCH;
+    }
+    if (qemu_mach.cap.state_size != sizeof(struct nvmm_x64_state)) {
+        error_report("NVMM: Wrong state size %u", qemu_mach.cap.state_size);
+        return -EPROGMISMATCH;
+    }
+
+    ret = nvmm_machine_create(&qemu_mach.mach);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Machine creation failed, error=%d", err);
+        return -err;
+    }
+
+    memory_listener_register(&nvmm_memory_listener, &address_space_memory);
+    ram_block_notifier_add(&nvmm_ram_notifier);
+
+    cpu_interrupt_handler = nvmm_handle_interrupt;
+
+    printf("NetBSD Virtual Machine Monitor accelerator is operational\n");
+    return 0;
+}
+
+int
+nvmm_enabled(void)
+{
+    return nvmm_allowed;
+}
+
+static void
+nvmm_accel_class_init(ObjectClass *oc, void *data)
+{
+    AccelClass *ac = ACCEL_CLASS(oc);
+    ac->name = "NVMM";
+    ac->init_machine = nvmm_accel_init;
+    ac->allowed = &nvmm_allowed;
+}
+
+static const TypeInfo nvmm_accel_type = {
+    .name = ACCEL_CLASS_NAME("nvmm"),
+    .parent = TYPE_ACCEL,
+    .class_init = nvmm_accel_class_init,
+};
+
+static void
+nvmm_type_init(void)
+{
+    type_register_static(&nvmm_accel_type);
+}
+
+type_init(nvmm_type_init);
-- 
2.24.1
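
For readers following the page-alignment step in nvmm_process_section() above, here is a minimal standalone sketch of the same arithmetic. The names align_range and struct aligned_range are hypothetical and are not part of the patch; page_size stands in for qemu_real_host_page_size and is assumed to be a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type, for illustration only. */
struct aligned_range {
    uint64_t start;   /* page-aligned guest physical start */
    uint64_t size;    /* size rounded down to whole pages */
    uint64_t delta;   /* bytes skipped at the front of the section */
};

/*
 * Shrink [start_pa, start_pa + size) to the largest page-aligned
 * sub-range. Returns false when no whole page fits, which is the
 * case where the patch silently skips the section.
 */
static bool
align_range(uint64_t start_pa, uint64_t size, uint64_t page_size,
    struct aligned_range *out)
{
    uint64_t page_mask, delta;

    /* The mask arithmetic only works for a power-of-two page size. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    page_mask = ~(page_size - 1);

    /* Distance from start_pa up to the next page boundary (0 if aligned). */
    delta = page_size - (start_pa & ~page_mask);
    delta &= ~page_mask;
    if (delta > size) {
        return false;
    }

    start_pa += delta;
    size -= delta;
    size &= page_mask;          /* drop the partial page at the tail */
    if (!size || (start_pa & ~page_mask)) {
        return false;
    }

    out->start = start_pa;
    out->size = size;
    out->delta = delta;
    return true;
}
```

With a 4 KiB page, a section starting at 0x1800 with size 0x3000 would be trimmed to start 0x2000 and size 0x2000, with the host virtual address advanced by the same 0x800 bytes.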

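The FPU part of nvmm_get_registers() above splits the FXSAVE status word into QEMU's separate fpstt/fpus fields and inverts the abridged tag byte. A small sketch of that conversion follows, assuming the usual x87 layout (TOP in bits 11 to 13 of the status word, one "valid" bit per register in the abridged tag). The names x87_unpack and struct x87_view are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the fields QEMU keeps separately. */
struct x87_view {
    uint16_t fpus;        /* status word with the TOP field cleared */
    unsigned int fpstt;   /* top-of-stack index, 0..7 */
    uint8_t fptags[8];    /* 1 = register empty, 0 = register valid */
};

static void
x87_unpack(uint16_t fx_sw, uint8_t fx_tw, struct x87_view *v)
{
    size_t i;

    assert(v != NULL);

    /* TOP lives in bits 11..13 of the status word. */
    v->fpstt = (fx_sw >> 11) & 0x7;
    v->fpus = fx_sw & ~0x3800;

    /*
     * The abridged tag has one bit per register, set when the
     * register holds a value. The QEMU-side tag is the opposite:
     * set when the register is empty.
     */
    for (i = 0; i < 8; i++) {
        v->fptags[i] = !((fx_tw >> i) & 1);
    }
}
```

For example, a status word of 0x3941 with an abridged tag of 0x03 would give fpstt = 7, fpus = 0x0141, and only the first two fptags entries cleared.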

* [PATCH v5 4/4] Add the NVMM acceleration enlightenments
  2020-08-11 12:47         ` [PATCH v5 " Kamil Rytarowski
  2020-08-11 12:47           ` [PATCH v5 2/4] Add the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
  2020-08-11 12:47           ` [PATCH v5 3/4] Introduce the NVMM impl Kamil Rytarowski
@ 2020-08-11 12:47           ` Kamil Rytarowski
  2 siblings, 0 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-08-11 12:47 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max, jmcneill
  Cc: Kamil Rytarowski, qemu-devel

From: Maxime Villard <max@m00nbsd.net>

Implement the NVMM accelerator CPU enlightenments so that the nvmm-all
accelerator can actually be used on NetBSD hosts.

Signed-off-by: Maxime Villard <max@m00nbsd.net>
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Jared McNeill <jmcneill@invisible.ca>
---
 include/sysemu/hw_accel.h | 14 ++++++++++
 softmmu/cpus.c            | 58 +++++++++++++++++++++++++++++++++++++++
 target/i386/helper.c      |  2 +-
 3 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/include/sysemu/hw_accel.h b/include/sysemu/hw_accel.h
index e128f8b06b..9e19f5794c 100644
--- a/include/sysemu/hw_accel.h
+++ b/include/sysemu/hw_accel.h
@@ -16,6 +16,7 @@
 #include "sysemu/kvm.h"
 #include "sysemu/hvf.h"
 #include "sysemu/whpx.h"
+#include "sysemu/nvmm.h"
 
 static inline void cpu_synchronize_state(CPUState *cpu)
 {
@@ -31,6 +32,9 @@ static inline void cpu_synchronize_state(CPUState *cpu)
     if (whpx_enabled()) {
         whpx_cpu_synchronize_state(cpu);
     }
+    if (nvmm_enabled()) {
+        nvmm_cpu_synchronize_state(cpu);
+    }
 }
 
 static inline void cpu_synchronize_post_reset(CPUState *cpu)
@@ -47,6 +51,10 @@ static inline void cpu_synchronize_post_reset(CPUState *cpu)
     if (whpx_enabled()) {
         whpx_cpu_synchronize_post_reset(cpu);
     }
+    if (nvmm_enabled()) {
+        nvmm_cpu_synchronize_post_reset(cpu);
+    }
+
 }
 
 static inline void cpu_synchronize_post_init(CPUState *cpu)
@@ -63,6 +71,9 @@ static inline void cpu_synchronize_post_init(CPUState *cpu)
     if (whpx_enabled()) {
         whpx_cpu_synchronize_post_init(cpu);
     }
+    if (nvmm_enabled()) {
+        nvmm_cpu_synchronize_post_init(cpu);
+    }
 }
 
 static inline void cpu_synchronize_pre_loadvm(CPUState *cpu)
@@ -79,6 +90,9 @@ static inline void cpu_synchronize_pre_loadvm(CPUState *cpu)
     if (whpx_enabled()) {
         whpx_cpu_synchronize_pre_loadvm(cpu);
     }
+    if (nvmm_enabled()) {
+        nvmm_cpu_synchronize_pre_loadvm(cpu);
+    }
 }
 
 #endif /* QEMU_HW_ACCEL_H */
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index a802e899ab..3b44b92830 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -43,6 +43,7 @@
 #include "sysemu/hax.h"
 #include "sysemu/hvf.h"
 #include "sysemu/whpx.h"
+#include "sysemu/nvmm.h"
 #include "exec/exec-all.h"
 
 #include "qemu/thread.h"
@@ -1621,6 +1622,48 @@ static void *qemu_whpx_cpu_thread_fn(void *arg)
     return NULL;
 }
 
+static void *qemu_nvmm_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+    int r;
+
+    assert(nvmm_enabled());
+
+    rcu_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+    cpu->thread_id = qemu_get_thread_id();
+    current_cpu = cpu;
+
+    r = nvmm_init_vcpu(cpu);
+    if (r < 0) {
+        fprintf(stderr, "nvmm_init_vcpu failed: %s\n", strerror(-r));
+        exit(1);
+    }
+
+    /* signal CPU creation */
+    cpu->created = true;
+    qemu_cond_signal(&qemu_cpu_cond);
+
+    do {
+        if (cpu_can_run(cpu)) {
+            r = nvmm_vcpu_exec(cpu);
+            if (r == EXCP_DEBUG) {
+                cpu_handle_guest_debug(cpu);
+            }
+        }
+        qemu_wait_io_event(cpu);
+    } while (!cpu->unplug || cpu_can_run(cpu));
+
+    nvmm_destroy_vcpu(cpu);
+    cpu->created = false;
+    qemu_cond_signal(&qemu_cpu_cond);
+    qemu_mutex_unlock_iothread();
+    rcu_unregister_thread();
+    return NULL;
+}
+
 #ifdef _WIN32
 static void CALLBACK dummy_apc_func(ULONG_PTR unused)
 {
@@ -1998,6 +2041,19 @@ static void qemu_whpx_start_vcpu(CPUState *cpu)
 #endif
 }
 
+static void qemu_nvmm_start_vcpu(CPUState *cpu)
+{
+    char thread_name[VCPU_THREAD_NAME_SIZE];
+
+    cpu->thread = g_malloc0(sizeof(QemuThread));
+    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+    qemu_cond_init(cpu->halt_cond);
+    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/NVMM",
+             cpu->cpu_index);
+    qemu_thread_create(cpu->thread, thread_name, qemu_nvmm_cpu_thread_fn,
+                       cpu, QEMU_THREAD_JOINABLE);
+}
+
 static void qemu_dummy_start_vcpu(CPUState *cpu)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
@@ -2038,6 +2094,8 @@ void qemu_init_vcpu(CPUState *cpu)
         qemu_tcg_init_vcpu(cpu);
     } else if (whpx_enabled()) {
         qemu_whpx_start_vcpu(cpu);
+    } else if (nvmm_enabled()) {
+        qemu_nvmm_start_vcpu(cpu);
     } else {
         qemu_dummy_start_vcpu(cpu);
     }
diff --git a/target/i386/helper.c b/target/i386/helper.c
index 70be53e2c3..c2f1aef65c 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -983,7 +983,7 @@ void cpu_report_tpr_access(CPUX86State *env, TPRAccess access)
     X86CPU *cpu = env_archcpu(env);
     CPUState *cs = env_cpu(env);
 
-    if (kvm_enabled() || whpx_enabled()) {
+    if (kvm_enabled() || whpx_enabled() || nvmm_enabled()) {
         env->tpr_access_type = access;
 
         cpu_interrupt(cs, CPU_INTERRUPT_TPR);
-- 
2.24.1





* [PATCH v5 1/4] Add the NVMM vcpu API
  2020-02-06 21:32       ` [PATCH v4 1/4] Add the NVMM vcpu API Kamil Rytarowski
  2020-08-11 12:47         ` [PATCH v5 " Kamil Rytarowski
@ 2020-08-11 13:01         ` Kamil Rytarowski
  2020-08-11 13:01           ` [PATCH v5 2/4] Add the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
                             ` (3 more replies)
  1 sibling, 4 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-08-11 13:01 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max, jmcneill
  Cc: Kamil Rytarowski, qemu-devel

From: Maxime Villard <max@m00nbsd.net>

Adds the NetBSD Virtual Machine Monitor (NVMM) stubs and introduces the
nvmm.h sysemu API for vcpu creation, execution, and state
synchronization.

Signed-off-by: Maxime Villard <max@m00nbsd.net>
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Jared McNeill <jmcneill@invisible.ca>
---
 accel/stubs/Makefile.objs |  1 +
 accel/stubs/nvmm-stub.c   | 43 +++++++++++++++++++++++++++++++++++++++
 include/sysemu/nvmm.h     | 35 +++++++++++++++++++++++++++++++
 3 files changed, 79 insertions(+)
 create mode 100644 accel/stubs/nvmm-stub.c
 create mode 100644 include/sysemu/nvmm.h

diff --git a/accel/stubs/Makefile.objs b/accel/stubs/Makefile.objs
index bbd14e71fb..38660a0b9b 100644
--- a/accel/stubs/Makefile.objs
+++ b/accel/stubs/Makefile.objs
@@ -1,6 +1,7 @@
 obj-$(call lnot,$(CONFIG_HAX))  += hax-stub.o
 obj-$(call lnot,$(CONFIG_HVF))  += hvf-stub.o
 obj-$(call lnot,$(CONFIG_WHPX)) += whpx-stub.o
+obj-$(call lnot,$(CONFIG_NVMM)) += nvmm-stub.o
 obj-$(call lnot,$(CONFIG_KVM))  += kvm-stub.o
 obj-$(call lnot,$(CONFIG_TCG))  += tcg-stub.o
 obj-$(call lnot,$(CONFIG_XEN))  += xen-stub.o
diff --git a/accel/stubs/nvmm-stub.c b/accel/stubs/nvmm-stub.c
new file mode 100644
index 0000000000..c2208b84a3
--- /dev/null
+++ b/accel/stubs/nvmm-stub.c
@@ -0,0 +1,43 @@
+/*
+ * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
+ *
+ * NetBSD Virtual Machine Monitor (NVMM) accelerator stub.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "cpu.h"
+#include "sysemu/nvmm.h"
+
+int nvmm_init_vcpu(CPUState *cpu)
+{
+    return -1;
+}
+
+int nvmm_vcpu_exec(CPUState *cpu)
+{
+    return -1;
+}
+
+void nvmm_destroy_vcpu(CPUState *cpu)
+{
+}
+
+void nvmm_cpu_synchronize_state(CPUState *cpu)
+{
+}
+
+void nvmm_cpu_synchronize_post_reset(CPUState *cpu)
+{
+}
+
+void nvmm_cpu_synchronize_post_init(CPUState *cpu)
+{
+}
+
+void nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu)
+{
+}
diff --git a/include/sysemu/nvmm.h b/include/sysemu/nvmm.h
new file mode 100644
index 0000000000..10496f3980
--- /dev/null
+++ b/include/sysemu/nvmm.h
@@ -0,0 +1,35 @@
+/*
+ * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
+ *
+ * NetBSD Virtual Machine Monitor (NVMM) accelerator support.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_NVMM_H
+#define QEMU_NVMM_H
+
+#include "config-host.h"
+#include "qemu-common.h"
+
+int nvmm_init_vcpu(CPUState *);
+int nvmm_vcpu_exec(CPUState *);
+void nvmm_destroy_vcpu(CPUState *);
+
+void nvmm_cpu_synchronize_state(CPUState *);
+void nvmm_cpu_synchronize_post_reset(CPUState *);
+void nvmm_cpu_synchronize_post_init(CPUState *);
+void nvmm_cpu_synchronize_pre_loadvm(CPUState *);
+
+#ifdef CONFIG_NVMM
+
+int nvmm_enabled(void);
+
+#else /* CONFIG_NVMM */
+
+#define nvmm_enabled() (0)
+
+#endif /* CONFIG_NVMM */
+
+#endif /* QEMU_NVMM_H */
--
2.28.0




* [PATCH v5 2/4] Add the NetBSD Virtual Machine Monitor accelerator.
  2020-08-11 13:01         ` [PATCH v5 1/4] Add the NVMM vcpu API Kamil Rytarowski
@ 2020-08-11 13:01           ` Kamil Rytarowski
  2020-08-11 13:01           ` [PATCH v5 3/4] Introduce the NVMM impl Kamil Rytarowski
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-08-11 13:01 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max, jmcneill
  Cc: Kamil Rytarowski, qemu-devel

From: Maxime Villard <max@m00nbsd.net>

Introduces configure support for the new NetBSD Virtual Machine Monitor,
which allows hypervisor acceleration from usermode components on the
NetBSD platform.

Signed-off-by: Maxime Villard <max@m00nbsd.net>
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Jared McNeill <jmcneill@invisible.ca>
---
 configure       | 37 +++++++++++++++++++++++++++++++++++++
 qemu-options.hx | 10 +++++-----
 2 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/configure b/configure
index 2acc4d1465..fb9ffba2bf 100755
--- a/configure
+++ b/configure
@@ -246,6 +246,17 @@ supported_whpx_target() {
     return 1
 }

+supported_nvmm_target() {
+    test "$nvmm" = "yes" || return 1
+    glob "$1" "*-softmmu" || return 1
+    case "${1%-softmmu}" in
+        i386|x86_64)
+            return 0
+        ;;
+    esac
+    return 1
+}
+
 supported_target() {
     case "$1" in
         *-softmmu)
@@ -273,6 +284,7 @@ supported_target() {
     supported_hax_target "$1" && return 0
     supported_hvf_target "$1" && return 0
     supported_whpx_target "$1" && return 0
+    supported_nvmm_target "$1" && return 0
     print_error "TCG disabled, but hardware accelerator not available for '$target'"
     return 1
 }
@@ -395,6 +407,7 @@ kvm="no"
 hax="no"
 hvf="no"
 whpx="no"
+nvmm="no"
 rdma=""
 pvrdma=""
 gprof="no"
@@ -847,6 +860,7 @@ DragonFly)
 NetBSD)
   bsd="yes"
   hax="yes"
+  nvmm="yes"
   make="${MAKE-gmake}"
   audio_drv_list="oss try-sdl"
   audio_possible_drivers="oss sdl"
@@ -1233,6 +1247,10 @@ for opt do
   ;;
   --enable-whpx) whpx="yes"
   ;;
+  --disable-nvmm) nvmm="no"
+  ;;
+  --enable-nvmm) nvmm="yes"
+  ;;
   --disable-tcg-interpreter) tcg_interpreter="no"
   ;;
   --enable-tcg-interpreter) tcg_interpreter="yes"
@@ -1879,6 +1897,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   hax             HAX acceleration support
   hvf             Hypervisor.framework acceleration support
   whpx            Windows Hypervisor Platform acceleration support
+  nvmm            NetBSD Virtual Machine Monitor acceleration support
   rdma            Enable RDMA-based migration
   pvrdma          Enable PVRDMA support
   vde             support for vde network
@@ -2965,6 +2984,20 @@ if test "$whpx" != "no" ; then
     fi
 fi

+##########################################
+# NetBSD Virtual Machine Monitor (NVMM) accelerator check
+if test "$nvmm" != "no" ; then
+    if check_include "nvmm.h" ; then
+        nvmm="yes"
+        LIBS="-lnvmm $LIBS"
+    else
+        if test "$nvmm" = "yes"; then
+            feature_not_found "NVMM" "NVMM is not available"
+        fi
+        nvmm="no"
+    fi
+fi
+
 ##########################################
 # Sparse probe
 if test "$sparse" != "no" ; then
@@ -6934,6 +6967,7 @@ echo "KVM support       $kvm"
 echo "HAX support       $hax"
 echo "HVF support       $hvf"
 echo "WHPX support      $whpx"
+echo "NVMM support      $nvmm"
 echo "TCG support       $tcg"
 if test "$tcg" = "yes" ; then
     echo "TCG debug enabled $debug_tcg"
@@ -8332,6 +8366,9 @@ fi
 if test "$target_aligned_only" = "yes" ; then
   echo "TARGET_ALIGNED_ONLY=y" >> $config_target_mak
 fi
+if supported_nvmm_target $target; then
+    echo "CONFIG_NVMM=y" >> $config_target_mak
+fi
 if test "$target_bigendian" = "yes" ; then
   echo "TARGET_WORDS_BIGENDIAN=y" >> $config_target_mak
 fi
diff --git a/qemu-options.hx b/qemu-options.hx
index 708583b4ce..697accaa7e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -26,7 +26,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
     "-machine [type=]name[,prop[=value][,...]]\n"
     "                selects emulated machine ('-machine help' for list)\n"
     "                property accel=accel1[:accel2[:...]] selects accelerator\n"
-    "                supported accelerators are kvm, xen, hax, hvf, whpx or tcg (default: tcg)\n"
+    "                supported accelerators are kvm, xen, hax, hvf, nvmm, whpx or tcg (default: tcg)\n"
     "                vmport=on|off|auto controls emulation of vmport (default: auto)\n"
     "                dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
     "                mem-merge=on|off controls memory merge support (default: on)\n"
@@ -58,7 +58,7 @@ SRST

     ``accel=accels1[:accels2[:...]]``
         This is used to enable an accelerator. Depending on the target
-        architecture, kvm, xen, hax, hvf, whpx or tcg can be available.
+        architecture, kvm, xen, hax, hvf, nvmm, whpx or tcg can be available.
         By default, tcg is used. If there is more than one accelerator
         specified, the next one is used if the previous one fails to
         initialize.
@@ -119,7 +119,7 @@ ERST

 DEF("accel", HAS_ARG, QEMU_OPTION_accel,
     "-accel [accel=]accelerator[,prop[=value][,...]]\n"
-    "                select accelerator (kvm, xen, hax, hvf, whpx or tcg; use 'help' for a list)\n"
+    "                select accelerator (kvm, xen, hax, hvf, nvmm, whpx or tcg; use 'help' for a list)\n"
     "                igd-passthru=on|off (enable Xen integrated Intel graphics passthrough, default=off)\n"
     "                kernel-irqchip=on|off|split controls accelerated irqchip support (default=on)\n"
     "                kvm-shadow-mem=size of KVM shadow MMU in bytes\n"
@@ -128,8 +128,8 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel,
 SRST
 ``-accel name[,prop=value[,...]]``
     This is used to enable an accelerator. Depending on the target
-    architecture, kvm, xen, hax, hvf, whpx or tcg can be available. By
-    default, tcg is used. If there is more than one accelerator
+    architecture, kvm, xen, hax, hvf, nvmm, whpx or tcg can be available.
+    By default, tcg is used. If there is more than one accelerator
     specified, the next one is used if the previous one fails to
     initialize.

--
2.28.0




* [PATCH v5 3/4] Introduce the NVMM impl
  2020-08-11 13:01         ` [PATCH v5 1/4] Add the NVMM vcpu API Kamil Rytarowski
  2020-08-11 13:01           ` [PATCH v5 2/4] Add the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
@ 2020-08-11 13:01           ` Kamil Rytarowski
  2020-08-11 13:01           ` [PATCH v5 4/4] Add the NVMM acceleration enlightenments Kamil Rytarowski
  2020-09-04 23:28           ` [PATCH v5 1/4] Add the NVMM vcpu API Kamil Rytarowski
  3 siblings, 0 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-08-11 13:01 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max, jmcneill
  Cc: Kamil Rytarowski, qemu-devel

From: Maxime Villard <max@m00nbsd.net>

Implements the NetBSD Virtual Machine Monitor (NVMM) target, which
acts as a hypervisor accelerator for QEMU on the NetBSD platform. This makes
QEMU much faster than the emulated x86_64 paths that are taken on
NetBSD today.

Signed-off-by: Maxime Villard <max@m00nbsd.net>
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Tested-by: Jared McNeill <jmcneill@invisible.ca>
---
 target/i386/Makefile.objs |    1 +
 target/i386/nvmm-all.c    | 1226 +++++++++++++++++++++++++++++++++++++
 2 files changed, 1227 insertions(+)
 create mode 100644 target/i386/nvmm-all.c

diff --git a/target/i386/Makefile.objs b/target/i386/Makefile.objs
index 0b93143e27..ff0df68404 100644
--- a/target/i386/Makefile.objs
+++ b/target/i386/Makefile.objs
@@ -18,6 +18,7 @@ obj-$(CONFIG_HAX) += hax-all.o hax-mem.o hax-posix.o
 endif
 obj-$(CONFIG_HVF) += hvf/
 obj-$(CONFIG_WHPX) += whpx-all.o
+obj-$(CONFIG_NVMM) += nvmm-all.o
 endif
 obj-$(CONFIG_SEV) += sev.o
 obj-$(call lnot,$(CONFIG_SEV)) += sev-stub.o
diff --git a/target/i386/nvmm-all.c b/target/i386/nvmm-all.c
new file mode 100644
index 0000000000..408f7305b9
--- /dev/null
+++ b/target/i386/nvmm-all.c
@@ -0,0 +1,1226 @@
+/*
+ * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
+ *
+ * NetBSD Virtual Machine Monitor (NVMM) accelerator for QEMU.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/address-spaces.h"
+#include "exec/ioport.h"
+#include "qemu-common.h"
+#include "strings.h"
+#include "sysemu/accel.h"
+#include "sysemu/nvmm.h"
+#include "sysemu/runstate.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/cpus.h"
+#include "qemu/main-loop.h"
+#include "qemu/error-report.h"
+#include "qemu/queue.h"
+#include "qapi/error.h"
+#include "migration/blocker.h"
+
+#include <nvmm.h>
+
+struct qemu_vcpu {
+    struct nvmm_vcpu vcpu;
+    uint8_t tpr;
+    bool stop;
+
+    /* Window-exiting for INTs/NMIs. */
+    bool int_window_exit;
+    bool nmi_window_exit;
+
+    /* The guest is in an interrupt shadow (POP SS, etc). */
+    bool int_shadow;
+};
+
+struct qemu_machine {
+    struct nvmm_capability cap;
+    struct nvmm_machine mach;
+};
+
+/* -------------------------------------------------------------------------- */
+
+static bool nvmm_allowed;
+static struct qemu_machine qemu_mach;
+
+static struct qemu_vcpu *
+get_qemu_vcpu(CPUState *cpu)
+{
+    return (struct qemu_vcpu *)cpu->hax_vcpu;
+}
+
+static struct nvmm_machine *
+get_nvmm_mach(void)
+{
+    return &qemu_mach.mach;
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_set_segment(struct nvmm_x64_state_seg *nseg, const SegmentCache *qseg)
+{
+    uint32_t attrib = qseg->flags;
+
+    nseg->selector = qseg->selector;
+    nseg->limit = qseg->limit;
+    nseg->base = qseg->base;
+    nseg->attrib.type = __SHIFTOUT(attrib, DESC_TYPE_MASK);
+    nseg->attrib.s = __SHIFTOUT(attrib, DESC_S_MASK);
+    nseg->attrib.dpl = __SHIFTOUT(attrib, DESC_DPL_MASK);
+    nseg->attrib.p = __SHIFTOUT(attrib, DESC_P_MASK);
+    nseg->attrib.avl = __SHIFTOUT(attrib, DESC_AVL_MASK);
+    nseg->attrib.l = __SHIFTOUT(attrib, DESC_L_MASK);
+    nseg->attrib.def = __SHIFTOUT(attrib, DESC_B_MASK);
+    nseg->attrib.g = __SHIFTOUT(attrib, DESC_G_MASK);
+}
+
+static void
+nvmm_set_registers(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t bitmap;
+    size_t i;
+    int ret;
+
+    assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
+
+    /* GPRs. */
+    state->gprs[NVMM_X64_GPR_RAX] = env->regs[R_EAX];
+    state->gprs[NVMM_X64_GPR_RCX] = env->regs[R_ECX];
+    state->gprs[NVMM_X64_GPR_RDX] = env->regs[R_EDX];
+    state->gprs[NVMM_X64_GPR_RBX] = env->regs[R_EBX];
+    state->gprs[NVMM_X64_GPR_RSP] = env->regs[R_ESP];
+    state->gprs[NVMM_X64_GPR_RBP] = env->regs[R_EBP];
+    state->gprs[NVMM_X64_GPR_RSI] = env->regs[R_ESI];
+    state->gprs[NVMM_X64_GPR_RDI] = env->regs[R_EDI];
+#ifdef TARGET_X86_64
+    state->gprs[NVMM_X64_GPR_R8]  = env->regs[R_R8];
+    state->gprs[NVMM_X64_GPR_R9]  = env->regs[R_R9];
+    state->gprs[NVMM_X64_GPR_R10] = env->regs[R_R10];
+    state->gprs[NVMM_X64_GPR_R11] = env->regs[R_R11];
+    state->gprs[NVMM_X64_GPR_R12] = env->regs[R_R12];
+    state->gprs[NVMM_X64_GPR_R13] = env->regs[R_R13];
+    state->gprs[NVMM_X64_GPR_R14] = env->regs[R_R14];
+    state->gprs[NVMM_X64_GPR_R15] = env->regs[R_R15];
+#endif
+
+    /* RIP and RFLAGS. */
+    state->gprs[NVMM_X64_GPR_RIP] = env->eip;
+    state->gprs[NVMM_X64_GPR_RFLAGS] = env->eflags;
+
+    /* Segments. */
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_CS], &env->segs[R_CS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_DS], &env->segs[R_DS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_ES], &env->segs[R_ES]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_FS], &env->segs[R_FS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_GS], &env->segs[R_GS]);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_SS], &env->segs[R_SS]);
+
+    /* Special segments. */
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_GDT], &env->gdt);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_LDT], &env->ldt);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_TR], &env->tr);
+    nvmm_set_segment(&state->segs[NVMM_X64_SEG_IDT], &env->idt);
+
+    /* Control registers. */
+    state->crs[NVMM_X64_CR_CR0] = env->cr[0];
+    state->crs[NVMM_X64_CR_CR2] = env->cr[2];
+    state->crs[NVMM_X64_CR_CR3] = env->cr[3];
+    state->crs[NVMM_X64_CR_CR4] = env->cr[4];
+    state->crs[NVMM_X64_CR_CR8] = qcpu->tpr;
+    state->crs[NVMM_X64_CR_XCR0] = env->xcr0;
+
+    /* Debug registers. */
+    state->drs[NVMM_X64_DR_DR0] = env->dr[0];
+    state->drs[NVMM_X64_DR_DR1] = env->dr[1];
+    state->drs[NVMM_X64_DR_DR2] = env->dr[2];
+    state->drs[NVMM_X64_DR_DR3] = env->dr[3];
+    state->drs[NVMM_X64_DR_DR6] = env->dr[6];
+    state->drs[NVMM_X64_DR_DR7] = env->dr[7];
+
+    /* FPU. */
+    state->fpu.fx_cw = env->fpuc;
+    state->fpu.fx_sw = (env->fpus & ~0x3800) | ((env->fpstt & 0x7) << 11);
+    state->fpu.fx_tw = 0;
+    for (i = 0; i < 8; i++) {
+        state->fpu.fx_tw |= (!env->fptags[i]) << i;
+    }
+    state->fpu.fx_opcode = env->fpop;
+    state->fpu.fx_ip.fa_64 = env->fpip;
+    state->fpu.fx_dp.fa_64 = env->fpdp;
+    state->fpu.fx_mxcsr = env->mxcsr;
+    state->fpu.fx_mxcsr_mask = 0x0000FFFF;
+    assert(sizeof(state->fpu.fx_87_ac) == sizeof(env->fpregs));
+    memcpy(state->fpu.fx_87_ac, env->fpregs, sizeof(env->fpregs));
+    for (i = 0; i < 16; i++) {
+        memcpy(&state->fpu.fx_xmm[i].xmm_bytes[0],
+            &env->xmm_regs[i].ZMM_Q(0), 8);
+        memcpy(&state->fpu.fx_xmm[i].xmm_bytes[8],
+            &env->xmm_regs[i].ZMM_Q(1), 8);
+    }
+
+    /* MSRs. */
+    state->msrs[NVMM_X64_MSR_EFER] = env->efer;
+    state->msrs[NVMM_X64_MSR_STAR] = env->star;
+#ifdef TARGET_X86_64
+    state->msrs[NVMM_X64_MSR_LSTAR] = env->lstar;
+    state->msrs[NVMM_X64_MSR_CSTAR] = env->cstar;
+    state->msrs[NVMM_X64_MSR_SFMASK] = env->fmask;
+    state->msrs[NVMM_X64_MSR_KERNELGSBASE] = env->kernelgsbase;
+#endif
+    state->msrs[NVMM_X64_MSR_SYSENTER_CS]  = env->sysenter_cs;
+    state->msrs[NVMM_X64_MSR_SYSENTER_ESP] = env->sysenter_esp;
+    state->msrs[NVMM_X64_MSR_SYSENTER_EIP] = env->sysenter_eip;
+    state->msrs[NVMM_X64_MSR_PAT] = env->pat;
+    state->msrs[NVMM_X64_MSR_TSC] = env->tsc;
+
+    bitmap =
+        NVMM_X64_STATE_SEGS |
+        NVMM_X64_STATE_GPRS |
+        NVMM_X64_STATE_CRS  |
+        NVMM_X64_STATE_DRS  |
+        NVMM_X64_STATE_MSRS |
+        NVMM_X64_STATE_FPU;
+
+    ret = nvmm_vcpu_setstate(mach, vcpu, bitmap);
+    if (ret == -1) {
+        error_report("NVMM: Failed to set virtual processor context,"
+            " error=%d", errno);
+    }
+}
+
+static void
+nvmm_get_segment(SegmentCache *qseg, const struct nvmm_x64_state_seg *nseg)
+{
+    qseg->selector = nseg->selector;
+    qseg->limit = nseg->limit;
+    qseg->base = nseg->base;
+
+    qseg->flags =
+        __SHIFTIN((uint32_t)nseg->attrib.type, DESC_TYPE_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.s, DESC_S_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.dpl, DESC_DPL_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.p, DESC_P_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.avl, DESC_AVL_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.l, DESC_L_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.def, DESC_B_MASK) |
+        __SHIFTIN((uint32_t)nseg->attrib.g, DESC_G_MASK);
+}
+
+static void
+nvmm_get_registers(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t bitmap, tpr;
+    size_t i;
+    int ret;
+
+    assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
+
+    bitmap =
+        NVMM_X64_STATE_SEGS |
+        NVMM_X64_STATE_GPRS |
+        NVMM_X64_STATE_CRS  |
+        NVMM_X64_STATE_DRS  |
+        NVMM_X64_STATE_MSRS |
+        NVMM_X64_STATE_FPU;
+
+    ret = nvmm_vcpu_getstate(mach, vcpu, bitmap);
+    if (ret == -1) {
+        error_report("NVMM: Failed to get virtual processor context,"
+            " error=%d", errno);
+    }
+
+    /* GPRs. */
+    env->regs[R_EAX] = state->gprs[NVMM_X64_GPR_RAX];
+    env->regs[R_ECX] = state->gprs[NVMM_X64_GPR_RCX];
+    env->regs[R_EDX] = state->gprs[NVMM_X64_GPR_RDX];
+    env->regs[R_EBX] = state->gprs[NVMM_X64_GPR_RBX];
+    env->regs[R_ESP] = state->gprs[NVMM_X64_GPR_RSP];
+    env->regs[R_EBP] = state->gprs[NVMM_X64_GPR_RBP];
+    env->regs[R_ESI] = state->gprs[NVMM_X64_GPR_RSI];
+    env->regs[R_EDI] = state->gprs[NVMM_X64_GPR_RDI];
+#ifdef TARGET_X86_64
+    env->regs[R_R8]  = state->gprs[NVMM_X64_GPR_R8];
+    env->regs[R_R9]  = state->gprs[NVMM_X64_GPR_R9];
+    env->regs[R_R10] = state->gprs[NVMM_X64_GPR_R10];
+    env->regs[R_R11] = state->gprs[NVMM_X64_GPR_R11];
+    env->regs[R_R12] = state->gprs[NVMM_X64_GPR_R12];
+    env->regs[R_R13] = state->gprs[NVMM_X64_GPR_R13];
+    env->regs[R_R14] = state->gprs[NVMM_X64_GPR_R14];
+    env->regs[R_R15] = state->gprs[NVMM_X64_GPR_R15];
+#endif
+
+    /* RIP and RFLAGS. */
+    env->eip = state->gprs[NVMM_X64_GPR_RIP];
+    env->eflags = state->gprs[NVMM_X64_GPR_RFLAGS];
+
+    /* Segments. */
+    nvmm_get_segment(&env->segs[R_ES], &state->segs[NVMM_X64_SEG_ES]);
+    nvmm_get_segment(&env->segs[R_CS], &state->segs[NVMM_X64_SEG_CS]);
+    nvmm_get_segment(&env->segs[R_SS], &state->segs[NVMM_X64_SEG_SS]);
+    nvmm_get_segment(&env->segs[R_DS], &state->segs[NVMM_X64_SEG_DS]);
+    nvmm_get_segment(&env->segs[R_FS], &state->segs[NVMM_X64_SEG_FS]);
+    nvmm_get_segment(&env->segs[R_GS], &state->segs[NVMM_X64_SEG_GS]);
+
+    /* Special segments. */
+    nvmm_get_segment(&env->gdt, &state->segs[NVMM_X64_SEG_GDT]);
+    nvmm_get_segment(&env->ldt, &state->segs[NVMM_X64_SEG_LDT]);
+    nvmm_get_segment(&env->tr, &state->segs[NVMM_X64_SEG_TR]);
+    nvmm_get_segment(&env->idt, &state->segs[NVMM_X64_SEG_IDT]);
+
+    /* Control registers. */
+    env->cr[0] = state->crs[NVMM_X64_CR_CR0];
+    env->cr[2] = state->crs[NVMM_X64_CR_CR2];
+    env->cr[3] = state->crs[NVMM_X64_CR_CR3];
+    env->cr[4] = state->crs[NVMM_X64_CR_CR4];
+    tpr = state->crs[NVMM_X64_CR_CR8];
+    if (tpr != qcpu->tpr) {
+        qcpu->tpr = tpr;
+        cpu_set_apic_tpr(x86_cpu->apic_state, tpr);
+    }
+    env->xcr0 = state->crs[NVMM_X64_CR_XCR0];
+
+    /* Debug registers. */
+    env->dr[0] = state->drs[NVMM_X64_DR_DR0];
+    env->dr[1] = state->drs[NVMM_X64_DR_DR1];
+    env->dr[2] = state->drs[NVMM_X64_DR_DR2];
+    env->dr[3] = state->drs[NVMM_X64_DR_DR3];
+    env->dr[6] = state->drs[NVMM_X64_DR_DR6];
+    env->dr[7] = state->drs[NVMM_X64_DR_DR7];
+
+    /* FPU. */
+    env->fpuc = state->fpu.fx_cw;
+    env->fpstt = (state->fpu.fx_sw >> 11) & 0x7;
+    env->fpus = state->fpu.fx_sw & ~0x3800;
+    for (i = 0; i < 8; i++) {
+        env->fptags[i] = !((state->fpu.fx_tw >> i) & 1);
+    }
+    env->fpop = state->fpu.fx_opcode;
+    env->fpip = state->fpu.fx_ip.fa_64;
+    env->fpdp = state->fpu.fx_dp.fa_64;
+    env->mxcsr = state->fpu.fx_mxcsr;
+    assert(sizeof(state->fpu.fx_87_ac) == sizeof(env->fpregs));
+    memcpy(env->fpregs, state->fpu.fx_87_ac, sizeof(env->fpregs));
+    for (i = 0; i < 16; i++) {
+        memcpy(&env->xmm_regs[i].ZMM_Q(0),
+            &state->fpu.fx_xmm[i].xmm_bytes[0], 8);
+        memcpy(&env->xmm_regs[i].ZMM_Q(1),
+            &state->fpu.fx_xmm[i].xmm_bytes[8], 8);
+    }
+
+    /* MSRs. */
+    env->efer = state->msrs[NVMM_X64_MSR_EFER];
+    env->star = state->msrs[NVMM_X64_MSR_STAR];
+#ifdef TARGET_X86_64
+    env->lstar = state->msrs[NVMM_X64_MSR_LSTAR];
+    env->cstar = state->msrs[NVMM_X64_MSR_CSTAR];
+    env->fmask = state->msrs[NVMM_X64_MSR_SFMASK];
+    env->kernelgsbase = state->msrs[NVMM_X64_MSR_KERNELGSBASE];
+#endif
+    env->sysenter_cs  = state->msrs[NVMM_X64_MSR_SYSENTER_CS];
+    env->sysenter_esp = state->msrs[NVMM_X64_MSR_SYSENTER_ESP];
+    env->sysenter_eip = state->msrs[NVMM_X64_MSR_SYSENTER_EIP];
+    env->pat = state->msrs[NVMM_X64_MSR_PAT];
+    env->tsc = state->msrs[NVMM_X64_MSR_TSC];
+
+    x86_update_hflags(env);
+}
+
+static bool
+nvmm_can_take_int(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    struct nvmm_machine *mach = get_nvmm_mach();
+
+    if (qcpu->int_window_exit) {
+        return false;
+    }
+
+    if (qcpu->int_shadow || !(env->eflags & IF_MASK)) {
+        struct nvmm_x64_state *state = vcpu->state;
+
+        /* Exit on interrupt window. */
+        nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_INTR);
+        state->intr.int_window_exiting = 1;
+        nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_INTR);
+
+        return false;
+    }
+
+    return true;
+}
+
+static bool
+nvmm_can_take_nmi(CPUState *cpu)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+
+    /*
+     * Contrary to INTs, NMIs always schedule an exit when they are
+     * completed. Therefore, if window-exiting is enabled, it means
+     * NMIs are blocked.
+     */
+    if (qcpu->nmi_window_exit) {
+        return false;
+    }
+
+    return true;
+}
+
+/*
+ * Called before the VCPU is run. We inject events generated by the I/O
+ * thread, and synchronize the guest TPR.
+ */
+static void
+nvmm_vcpu_pre_run(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    struct nvmm_vcpu_event *event = vcpu->event;
+    bool has_event = false;
+    bool sync_tpr = false;
+    uint8_t tpr;
+    int ret;
+
+    qemu_mutex_lock_iothread();
+
+    tpr = cpu_get_apic_tpr(x86_cpu->apic_state);
+    if (tpr != qcpu->tpr) {
+        qcpu->tpr = tpr;
+        sync_tpr = true;
+    }
+
+    /*
+     * Force the VCPU out of its inner loop to process any INIT requests
+     * or commit pending TPR access.
+     */
+    if (cpu->interrupt_request & (CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
+        cpu->exit_request = 1;
+    }
+
+    if (!has_event && (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+        if (nvmm_can_take_nmi(cpu)) {
+            cpu->interrupt_request &= ~CPU_INTERRUPT_NMI;
+            event->type = NVMM_VCPU_EVENT_INTR;
+            event->vector = 2;
+            has_event = true;
+        }
+    }
+
+    if (!has_event && (cpu->interrupt_request & CPU_INTERRUPT_HARD)) {
+        if (nvmm_can_take_int(cpu)) {
+            cpu->interrupt_request &= ~CPU_INTERRUPT_HARD;
+            event->type = NVMM_VCPU_EVENT_INTR;
+            event->vector = cpu_get_pic_interrupt(env);
+            has_event = true;
+        }
+    }
+
+    /* Don't want SMIs. */
+    if (cpu->interrupt_request & CPU_INTERRUPT_SMI) {
+        cpu->interrupt_request &= ~CPU_INTERRUPT_SMI;
+    }
+
+    if (sync_tpr) {
+        ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_CRS);
+        if (ret == -1) {
+            error_report("NVMM: Failed to get CPU state,"
+                " error=%d", errno);
+        }
+
+        state->crs[NVMM_X64_CR_CR8] = qcpu->tpr;
+
+        ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_CRS);
+        if (ret == -1) {
+            error_report("NVMM: Failed to set CPU state,"
+                " error=%d", errno);
+        }
+    }
+
+    if (has_event) {
+        ret = nvmm_vcpu_inject(mach, vcpu);
+        if (ret == -1) {
+            error_report("NVMM: Failed to inject event,"
+                " error=%d", errno);
+        }
+    }
+
+    qemu_mutex_unlock_iothread();
+}
+
+/*
+ * Called after the VCPU ran. We synchronize the host view of the TPR and
+ * RFLAGS.
+ */
+static void
+nvmm_vcpu_post_run(CPUState *cpu, struct nvmm_vcpu_exit *exit)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    uint64_t tpr;
+
+    env->eflags = exit->exitstate.rflags;
+    qcpu->int_shadow = exit->exitstate.int_shadow;
+    qcpu->int_window_exit = exit->exitstate.int_window_exiting;
+    qcpu->nmi_window_exit = exit->exitstate.nmi_window_exiting;
+
+    tpr = exit->exitstate.cr8;
+    if (qcpu->tpr != tpr) {
+        qcpu->tpr = tpr;
+        qemu_mutex_lock_iothread();
+        cpu_set_apic_tpr(x86_cpu->apic_state, qcpu->tpr);
+        qemu_mutex_unlock_iothread();
+    }
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_io_callback(struct nvmm_io *io)
+{
+    MemTxAttrs attrs = { 0 };
+    int ret;
+
+    ret = address_space_rw(&address_space_io, io->port, attrs, io->data,
+        io->size, !io->in);
+    if (ret != MEMTX_OK) {
+        error_report("NVMM: I/O Transaction Failed "
+            "[%s, port=%u, size=%zu]", (io->in ? "in" : "out"),
+            io->port, io->size);
+    }
+
+    /* Needed, otherwise infinite loop. */
+    current_cpu->vcpu_dirty = false;
+}
+
+static void
+nvmm_mem_callback(struct nvmm_mem *mem)
+{
+    cpu_physical_memory_rw(mem->gpa, mem->data, mem->size, mem->write);
+
+    /* XXX Needed, otherwise infinite loop. */
+    current_cpu->vcpu_dirty = false;
+}
+
+static struct nvmm_assist_callbacks nvmm_callbacks = {
+    .io = nvmm_io_callback,
+    .mem = nvmm_mem_callback
+};
+
+/* -------------------------------------------------------------------------- */
+
+static int
+nvmm_handle_mem(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
+{
+    int ret;
+
+    ret = nvmm_assist_mem(mach, vcpu);
+    if (ret == -1) {
+        error_report("NVMM: Mem Assist Failed [gpa=%p]",
+            (void *)vcpu->exit->u.mem.gpa);
+    }
+
+    return ret;
+}
+
+static int
+nvmm_handle_io(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
+{
+    int ret;
+
+    ret = nvmm_assist_io(mach, vcpu);
+    if (ret == -1) {
+        error_report("NVMM: I/O Assist Failed [port=%d]",
+            (int)vcpu->exit->u.io.port);
+    }
+
+    return ret;
+}
+
+static int
+nvmm_handle_rdmsr(struct nvmm_machine *mach, CPUState *cpu,
+    struct nvmm_vcpu_exit *exit)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t val;
+    int ret;
+
+    switch (exit->u.rdmsr.msr) {
+    case MSR_IA32_APICBASE:
+        val = cpu_get_apic_base(x86_cpu->apic_state);
+        break;
+    case MSR_MTRRcap:
+    case MSR_MTRRdefType:
+    case MSR_MCG_CAP:
+    case MSR_MCG_STATUS:
+        val = 0;
+        break;
+    default: /* More MSRs to add? */
+        val = 0;
+        error_report("NVMM: Unexpected RDMSR 0x%x, ignored",
+            exit->u.rdmsr.msr);
+        break;
+    }
+
+    ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    state->gprs[NVMM_X64_GPR_RAX] = (val & 0xFFFFFFFF);
+    state->gprs[NVMM_X64_GPR_RDX] = (val >> 32);
+    state->gprs[NVMM_X64_GPR_RIP] = exit->u.rdmsr.npc;
+
+    ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static int
+nvmm_handle_wrmsr(struct nvmm_machine *mach, CPUState *cpu,
+    struct nvmm_vcpu_exit *exit)
+{
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_x64_state *state = vcpu->state;
+    uint64_t val;
+    int ret;
+
+    val = exit->u.wrmsr.val;
+
+    switch (exit->u.wrmsr.msr) {
+    case MSR_IA32_APICBASE:
+        cpu_set_apic_base(x86_cpu->apic_state, val);
+        break;
+    case MSR_MTRRdefType:
+    case MSR_MCG_STATUS:
+        break;
+    default: /* More MSRs to add? */
+        error_report("NVMM: Unexpected WRMSR 0x%x [val=0x%" PRIx64
+            "], ignored", exit->u.wrmsr.msr, val);
+        break;
+    }
+
+    ret = nvmm_vcpu_getstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    state->gprs[NVMM_X64_GPR_RIP] = exit->u.wrmsr.npc;
+
+    ret = nvmm_vcpu_setstate(mach, vcpu, NVMM_X64_STATE_GPRS);
+    if (ret == -1) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static int
+nvmm_handle_halted(struct nvmm_machine *mach, CPUState *cpu,
+    struct nvmm_vcpu_exit *exit)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    int ret = 0;
+
+    qemu_mutex_lock_iothread();
+
+    if (!((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
+          (env->eflags & IF_MASK)) &&
+        !(cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+        cpu->exception_index = EXCP_HLT;
+        cpu->halted = true;
+        ret = 1;
+    }
+
+    qemu_mutex_unlock_iothread();
+
+    return ret;
+}
+
+static int
+nvmm_inject_ud(struct nvmm_machine *mach, struct nvmm_vcpu *vcpu)
+{
+    struct nvmm_vcpu_event *event = vcpu->event;
+
+    event->type = NVMM_VCPU_EVENT_EXCP;
+    event->vector = 6;
+    event->u.excp.error = 0;
+
+    return nvmm_vcpu_inject(mach, vcpu);
+}
+
+static int
+nvmm_vcpu_loop(CPUState *cpu)
+{
+    struct CPUX86State *env = (CPUArchState *)cpu->env_ptr;
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+    struct nvmm_vcpu *vcpu = &qcpu->vcpu;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    struct nvmm_vcpu_exit *exit = vcpu->exit;
+    int ret;
+
+    /*
+     * Some asynchronous events must be handled outside of the inner
+     * VCPU loop. They are handled here.
+     */
+    if (cpu->interrupt_request & CPU_INTERRUPT_INIT) {
+        nvmm_cpu_synchronize_state(cpu);
+        do_cpu_init(x86_cpu);
+        /* set int/nmi windows back to the reset state */
+    }
+    if (cpu->interrupt_request & CPU_INTERRUPT_POLL) {
+        cpu->interrupt_request &= ~CPU_INTERRUPT_POLL;
+        apic_poll_irq(x86_cpu->apic_state);
+    }
+    if (((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
+         (env->eflags & IF_MASK)) ||
+        (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+        cpu->halted = false;
+    }
+    if (cpu->interrupt_request & CPU_INTERRUPT_SIPI) {
+        nvmm_cpu_synchronize_state(cpu);
+        do_cpu_sipi(x86_cpu);
+    }
+    if (cpu->interrupt_request & CPU_INTERRUPT_TPR) {
+        cpu->interrupt_request &= ~CPU_INTERRUPT_TPR;
+        nvmm_cpu_synchronize_state(cpu);
+        apic_handle_tpr_access_report(x86_cpu->apic_state, env->eip,
+            env->tpr_access_type);
+    }
+
+    if (cpu->halted) {
+        cpu->exception_index = EXCP_HLT;
+        atomic_set(&cpu->exit_request, false);
+        return 0;
+    }
+
+    qemu_mutex_unlock_iothread();
+    cpu_exec_start(cpu);
+
+    /*
+     * Inner VCPU loop.
+     */
+    do {
+        if (cpu->vcpu_dirty) {
+            nvmm_set_registers(cpu);
+            cpu->vcpu_dirty = false;
+        }
+
+        if (qcpu->stop) {
+            cpu->exception_index = EXCP_INTERRUPT;
+            qcpu->stop = false;
+            ret = 1;
+            break;
+        }
+
+        nvmm_vcpu_pre_run(cpu);
+
+        if (atomic_read(&cpu->exit_request)) {
+            qemu_cpu_kick_self();
+        }
+
+        ret = nvmm_vcpu_run(mach, vcpu);
+        if (ret == -1) {
+            error_report("NVMM: Failed to exec a virtual processor,"
+                " error=%d", errno);
+            break;
+        }
+
+        nvmm_vcpu_post_run(cpu, exit);
+
+        switch (exit->reason) {
+        case NVMM_VCPU_EXIT_NONE:
+            break;
+        case NVMM_VCPU_EXIT_MEMORY:
+            ret = nvmm_handle_mem(mach, vcpu);
+            break;
+        case NVMM_VCPU_EXIT_IO:
+            ret = nvmm_handle_io(mach, vcpu);
+            break;
+        case NVMM_VCPU_EXIT_INT_READY:
+        case NVMM_VCPU_EXIT_NMI_READY:
+        case NVMM_VCPU_EXIT_TPR_CHANGED:
+            break;
+        case NVMM_VCPU_EXIT_HALTED:
+            ret = nvmm_handle_halted(mach, cpu, exit);
+            break;
+        case NVMM_VCPU_EXIT_SHUTDOWN:
+            qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
+            cpu->exception_index = EXCP_INTERRUPT;
+            ret = 1;
+            break;
+        case NVMM_VCPU_EXIT_RDMSR:
+            ret = nvmm_handle_rdmsr(mach, cpu, exit);
+            break;
+        case NVMM_VCPU_EXIT_WRMSR:
+            ret = nvmm_handle_wrmsr(mach, cpu, exit);
+            break;
+        case NVMM_VCPU_EXIT_MONITOR:
+        case NVMM_VCPU_EXIT_MWAIT:
+            ret = nvmm_inject_ud(mach, vcpu);
+            break;
+        default:
+            error_report("NVMM: Unexpected VM exit code 0x%" PRIx64
+                " [hw=0x%" PRIx64 "]", exit->reason, exit->u.inv.hwcode);
+            nvmm_get_registers(cpu);
+            qemu_mutex_lock_iothread();
+            qemu_system_guest_panicked(cpu_get_crash_info(cpu));
+            qemu_mutex_unlock_iothread();
+            ret = -1;
+            break;
+        }
+    } while (ret == 0);
+
+    cpu_exec_end(cpu);
+    qemu_mutex_lock_iothread();
+    current_cpu = cpu;
+
+    atomic_set(&cpu->exit_request, false);
+
+    return ret < 0;
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+do_nvmm_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
+{
+    nvmm_get_registers(cpu);
+    cpu->vcpu_dirty = true;
+}
+
+static void
+do_nvmm_cpu_synchronize_post_reset(CPUState *cpu, run_on_cpu_data arg)
+{
+    nvmm_set_registers(cpu);
+    cpu->vcpu_dirty = false;
+}
+
+static void
+do_nvmm_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg)
+{
+    nvmm_set_registers(cpu);
+    cpu->vcpu_dirty = false;
+}
+
+static void
+do_nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu, run_on_cpu_data arg)
+{
+    cpu->vcpu_dirty = true;
+}
+
+void nvmm_cpu_synchronize_state(CPUState *cpu)
+{
+    if (!cpu->vcpu_dirty) {
+        run_on_cpu(cpu, do_nvmm_cpu_synchronize_state, RUN_ON_CPU_NULL);
+    }
+}
+
+void nvmm_cpu_synchronize_post_reset(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_nvmm_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
+}
+
+void nvmm_cpu_synchronize_post_init(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_nvmm_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
+}
+
+void nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_nvmm_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
+}
+
+/* -------------------------------------------------------------------------- */
+
+static Error *nvmm_migration_blocker;
+
+static void
+nvmm_ipi_signal(int sigcpu)
+{
+    struct qemu_vcpu *qcpu;
+
+    if (current_cpu) {
+        qcpu = get_qemu_vcpu(current_cpu);
+        qcpu->stop = true;
+    }
+}
+
+static void
+nvmm_init_cpu_signals(void)
+{
+    struct sigaction sigact;
+    sigset_t set;
+
+    /* Install the IPI handler. */
+    memset(&sigact, 0, sizeof(sigact));
+    sigact.sa_handler = nvmm_ipi_signal;
+    sigaction(SIG_IPI, &sigact, NULL);
+
+    /* Allow IPIs on the current thread. */
+    pthread_sigmask(SIG_BLOCK, NULL, &set);
+    sigdelset(&set, SIG_IPI);
+    pthread_sigmask(SIG_SETMASK, &set, NULL);
+}
+
+int
+nvmm_init_vcpu(CPUState *cpu)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct nvmm_vcpu_conf_cpuid cpuid;
+    struct nvmm_vcpu_conf_tpr tpr;
+    Error *local_error = NULL;
+    struct qemu_vcpu *qcpu;
+    int ret, err;
+
+    nvmm_init_cpu_signals();
+
+    if (nvmm_migration_blocker == NULL) {
+        error_setg(&nvmm_migration_blocker,
+            "NVMM: Migration not supported");
+
+        (void)migrate_add_blocker(nvmm_migration_blocker, &local_error);
+        if (local_error) {
+            error_report_err(local_error);
+            migrate_del_blocker(nvmm_migration_blocker);
+            error_free(nvmm_migration_blocker);
+            return -EINVAL;
+        }
+    }
+
+    qcpu = g_malloc0(sizeof(*qcpu));
+    if (qcpu == NULL) {
+        error_report("NVMM: Failed to allocate VCPU context.");
+        return -ENOMEM;
+    }
+
+    ret = nvmm_vcpu_create(mach, cpu->cpu_index, &qcpu->vcpu);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Failed to create a virtual processor,"
+            " error=%d", err);
+        g_free(qcpu);
+        return -err;
+    }
+
+    memset(&cpuid, 0, sizeof(cpuid));
+    cpuid.mask = 1;
+    cpuid.leaf = 0x00000001;
+    cpuid.u.mask.set.edx = CPUID_MCE | CPUID_MCA | CPUID_MTRR;
+    ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_CPUID,
+        &cpuid);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Failed to configure a virtual processor,"
+            " error=%d", err);
+        g_free(qcpu);
+        return -err;
+    }
+
+    ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_CALLBACKS,
+        &nvmm_callbacks);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Failed to configure a virtual processor,"
+            " error=%d", err);
+        g_free(qcpu);
+        return -err;
+    }
+
+    if (qemu_mach.cap.arch.vcpu_conf_support & NVMM_CAP_ARCH_VCPU_CONF_TPR) {
+        memset(&tpr, 0, sizeof(tpr));
+        tpr.exit_changed = 1;
+        ret = nvmm_vcpu_configure(mach, &qcpu->vcpu, NVMM_VCPU_CONF_TPR, &tpr);
+        if (ret == -1) {
+            err = errno;
+            error_report("NVMM: Failed to configure a virtual processor,"
+                " error=%d", err);
+            g_free(qcpu);
+            return -err;
+        }
+    }
+
+    cpu->vcpu_dirty = true;
+    cpu->hax_vcpu = (struct hax_vcpu_state *)qcpu;
+
+    return 0;
+}
+
+int
+nvmm_vcpu_exec(CPUState *cpu)
+{
+    int ret, fatal;
+
+    while (1) {
+        if (cpu->exception_index >= EXCP_INTERRUPT) {
+            ret = cpu->exception_index;
+            cpu->exception_index = -1;
+            break;
+        }
+
+        fatal = nvmm_vcpu_loop(cpu);
+
+        if (fatal) {
+            error_report("NVMM: Failed to execute a VCPU.");
+            abort();
+        }
+    }
+
+    return ret;
+}
+
+void
+nvmm_destroy_vcpu(CPUState *cpu)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+
+    nvmm_vcpu_destroy(mach, &qcpu->vcpu);
+    g_free(cpu->hax_vcpu);
+}
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_update_mapping(hwaddr start_pa, ram_addr_t size, uintptr_t hva,
+    bool add, bool rom, const char *name)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    int ret, prot;
+
+    if (add) {
+        prot = PROT_READ | PROT_EXEC;
+        if (!rom) {
+            prot |= PROT_WRITE;
+        }
+        ret = nvmm_gpa_map(mach, hva, start_pa, size, prot);
+    } else {
+        ret = nvmm_gpa_unmap(mach, hva, start_pa, size);
+    }
+
+    if (ret == -1) {
+        error_report("NVMM: Failed to %s GPA range '%s' PA:%p, "
+            "Size:%p bytes, HostVA:%p, error=%d",
+            (add ? "map" : "unmap"), name, (void *)(uintptr_t)start_pa,
+            (void *)size, (void *)hva, errno);
+    }
+}
+
+static void
+nvmm_process_section(MemoryRegionSection *section, int add)
+{
+    MemoryRegion *mr = section->mr;
+    hwaddr start_pa = section->offset_within_address_space;
+    ram_addr_t size = int128_get64(section->size);
+    unsigned int delta;
+    uintptr_t hva;
+
+    if (!memory_region_is_ram(mr)) {
+        return;
+    }
+
+    /* Adjust start_pa and size so that they are page-aligned. */
+    delta = qemu_real_host_page_size - (start_pa & ~qemu_real_host_page_mask);
+    delta &= ~qemu_real_host_page_mask;
+    if (delta > size) {
+        return;
+    }
+    start_pa += delta;
+    size -= delta;
+    size &= qemu_real_host_page_mask;
+    if (!size || (start_pa & ~qemu_real_host_page_mask)) {
+        return;
+    }
+
+    hva = (uintptr_t)memory_region_get_ram_ptr(mr) +
+        section->offset_within_region + delta;
+
+    nvmm_update_mapping(start_pa, size, hva, add,
+        memory_region_is_rom(mr), mr->name);
+}
+
+static void
+nvmm_region_add(MemoryListener *listener, MemoryRegionSection *section)
+{
+    memory_region_ref(section->mr);
+    nvmm_process_section(section, 1);
+}
+
+static void
+nvmm_region_del(MemoryListener *listener, MemoryRegionSection *section)
+{
+    nvmm_process_section(section, 0);
+    memory_region_unref(section->mr);
+}
+
+static void
+nvmm_transaction_begin(MemoryListener *listener)
+{
+    /* nothing */
+}
+
+static void
+nvmm_transaction_commit(MemoryListener *listener)
+{
+    /* nothing */
+}
+
+static void
+nvmm_log_sync(MemoryListener *listener, MemoryRegionSection *section)
+{
+    MemoryRegion *mr = section->mr;
+
+    if (!memory_region_is_ram(mr)) {
+        return;
+    }
+
+    memory_region_set_dirty(mr, 0, int128_get64(section->size));
+}
+
+static MemoryListener nvmm_memory_listener = {
+    .begin = nvmm_transaction_begin,
+    .commit = nvmm_transaction_commit,
+    .region_add = nvmm_region_add,
+    .region_del = nvmm_region_del,
+    .log_sync = nvmm_log_sync,
+    .priority = 10,
+};
+
+static void
+nvmm_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
+{
+    struct nvmm_machine *mach = get_nvmm_mach();
+    uintptr_t hva = (uintptr_t)host;
+    int ret;
+
+    ret = nvmm_hva_map(mach, hva, size);
+
+    if (ret == -1) {
+        error_report("NVMM: Failed to map HVA, HostVA:%p "
+            "Size:%p bytes, error=%d",
+            (void *)hva, (void *)size, errno);
+    }
+}
+
+static struct RAMBlockNotifier nvmm_ram_notifier = {
+    .ram_block_added = nvmm_ram_block_added
+};
+
+/* -------------------------------------------------------------------------- */
+
+static void
+nvmm_handle_interrupt(CPUState *cpu, int mask)
+{
+    cpu->interrupt_request |= mask;
+
+    if (!qemu_cpu_is_self(cpu)) {
+        qemu_cpu_kick(cpu);
+    }
+}
+
+/* -------------------------------------------------------------------------- */
+
+static int
+nvmm_accel_init(MachineState *ms)
+{
+    int ret, err;
+
+    ret = nvmm_init();
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Initialization failed, error=%d", err);
+        return -err;
+    }
+
+    ret = nvmm_capability(&qemu_mach.cap);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Unable to fetch capability, error=%d", err);
+        return -err;
+    }
+    if (qemu_mach.cap.version != NVMM_KERN_VERSION) {
+        error_report("NVMM: Unsupported version %u", qemu_mach.cap.version);
+        return -EPROGMISMATCH;
+    }
+    if (qemu_mach.cap.state_size != sizeof(struct nvmm_x64_state)) {
+        error_report("NVMM: Wrong state size %u", qemu_mach.cap.state_size);
+        return -EPROGMISMATCH;
+    }
+
+    ret = nvmm_machine_create(&qemu_mach.mach);
+    if (ret == -1) {
+        err = errno;
+        error_report("NVMM: Machine creation failed, error=%d", err);
+        return -err;
+    }
+
+    memory_listener_register(&nvmm_memory_listener, &address_space_memory);
+    ram_block_notifier_add(&nvmm_ram_notifier);
+
+    cpu_interrupt_handler = nvmm_handle_interrupt;
+
+    printf("NetBSD Virtual Machine Monitor accelerator is operational\n");
+    return 0;
+}
+
+int
+nvmm_enabled(void)
+{
+    return nvmm_allowed;
+}
+
+static void
+nvmm_accel_class_init(ObjectClass *oc, void *data)
+{
+    AccelClass *ac = ACCEL_CLASS(oc);
+    ac->name = "NVMM";
+    ac->init_machine = nvmm_accel_init;
+    ac->allowed = &nvmm_allowed;
+}
+
+static const TypeInfo nvmm_accel_type = {
+    .name = ACCEL_CLASS_NAME("nvmm"),
+    .parent = TYPE_ACCEL,
+    .class_init = nvmm_accel_class_init,
+};
+
+static void
+nvmm_type_init(void)
+{
+    type_register_static(&nvmm_accel_type);
+}
+
+type_init(nvmm_type_init);
--
2.28.0
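
Reader's note, not part of the patch: the page-alignment step in
nvmm_process_section() above rounds the start of a section up to the next
host page boundary, rounds the size down to whole pages, and skips sections
in which no whole page fits. Below is a minimal standalone sketch of that
arithmetic. The names align_range and struct aligned_range are hypothetical,
and page_size stands in for qemu_real_host_page_size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type; not part of the patch. */
struct aligned_range {
    uint64_t start;   /* start rounded up to a page boundary */
    uint64_t size;    /* size rounded down to whole pages */
    uint64_t delta;   /* bytes skipped at the front of the range */
};

/*
 * Trim [start, start + size) to the largest page-aligned sub-range.
 * Returns false when no whole page fits. page_size must be a power of two.
 */
static bool
align_range(uint64_t start, uint64_t size, uint64_t page_size,
    struct aligned_range *out)
{
    /* Low bits of an address, i.e. ~qemu_real_host_page_mask. */
    uint64_t offset_mask = page_size - 1;
    uint64_t delta;

    assert(page_size != 0 && (page_size & offset_mask) == 0);

    /* Distance to the next page boundary; 0 if already aligned. */
    delta = (page_size - (start & offset_mask)) & offset_mask;
    if (delta > size) {
        return false;
    }

    start += delta;
    size = (size - delta) & ~offset_mask;
    if (size == 0) {
        return false;
    }

    out->start = start;
    out->size = size;
    out->delta = delta;
    return true;
}
```

With a 4 KiB page, a section at 0x1234 of size 0x3000 shrinks to 0x2000
bytes starting at 0x2000, and the host virtual address has to be advanced by
the same delta (0xdcc), as the patch does when it computes hva.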




* [PATCH v5 4/4] Add the NVMM acceleration enlightenments
  2020-08-11 13:01         ` [PATCH v5 1/4] Add the NVMM vcpu API Kamil Rytarowski
  2020-08-11 13:01           ` [PATCH v5 2/4] Add the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
  2020-08-11 13:01           ` [PATCH v5 3/4] Introduce the NVMM impl Kamil Rytarowski
@ 2020-08-11 13:01           ` Kamil Rytarowski
  2020-09-04 23:28           ` [PATCH v5 1/4] Add the NVMM vcpu API Kamil Rytarowski
  3 siblings, 0 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-08-11 13:01 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max, jmcneill
  Cc: Kamil Rytarowski, qemu-devel

From: Maxime Villard <max@m00nbsd.net>

Implement the NVMM accelerator CPU enlightenments that are required to
actually use the nvmm-all accelerator on NetBSD platforms.

Signed-off-by: Maxime Villard <max@m00nbsd.net>
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Jared McNeill <jmcneill@invisible.ca>
---
 include/sysemu/hw_accel.h | 14 ++++++++++
 softmmu/cpus.c            | 58 +++++++++++++++++++++++++++++++++++++++
 target/i386/helper.c      |  2 +-
 3 files changed, 73 insertions(+), 1 deletion(-)
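
Reader's note, not part of the patch: the nvmm_cpu_synchronize_*() hooks
wired up here rely on the vcpu_dirty flag from patch 3/4. The state is
fetched from the hypervisor at most once until the VCPU runs again, and it
is pushed back just before the next run only if QEMU's copy is marked dirty.
Below is a toy model of that protocol, under the assumption that a single
integer can stand in for the whole register file. struct toy_vcpu and the
toy_* functions are hypothetical and are not QEMU or libnvmm APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VCPU; not QEMU's CPUState. */
struct toy_vcpu {
    bool dirty;      /* true: QEMU's copy is the authoritative one */
    int hv_reg;      /* value held by the hypervisor */
    int qemu_reg;    /* QEMU's cached copy */
    int fetches;     /* number of reads from the hypervisor */
    int pushes;      /* number of writes to the hypervisor */
};

/* Models nvmm_cpu_synchronize_state(): fetch once, then mark dirty. */
static void
toy_synchronize_state(struct toy_vcpu *v)
{
    assert(v != NULL);
    if (!v->dirty) {
        v->qemu_reg = v->hv_reg;
        v->fetches++;
        v->dirty = true;
    }
}

/* Models one pass of the inner loop: push if dirty, then run the guest. */
static void
toy_run(struct toy_vcpu *v)
{
    assert(v != NULL);
    if (v->dirty) {
        v->hv_reg = v->qemu_reg;
        v->pushes++;
        v->dirty = false;
    }
    v->hv_reg++;     /* the guest executes and changes its state */
}
```

In this model, repeated toy_synchronize_state() calls between two runs cost
one fetch in total, and a run that follows no synchronization performs no
push at all.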

diff --git a/include/sysemu/hw_accel.h b/include/sysemu/hw_accel.h
index e128f8b06b..9e19f5794c 100644
--- a/include/sysemu/hw_accel.h
+++ b/include/sysemu/hw_accel.h
@@ -16,6 +16,7 @@
 #include "sysemu/kvm.h"
 #include "sysemu/hvf.h"
 #include "sysemu/whpx.h"
+#include "sysemu/nvmm.h"

 static inline void cpu_synchronize_state(CPUState *cpu)
 {
@@ -31,6 +32,9 @@ static inline void cpu_synchronize_state(CPUState *cpu)
     if (whpx_enabled()) {
         whpx_cpu_synchronize_state(cpu);
     }
+    if (nvmm_enabled()) {
+        nvmm_cpu_synchronize_state(cpu);
+    }
 }

 static inline void cpu_synchronize_post_reset(CPUState *cpu)
@@ -47,6 +51,10 @@ static inline void cpu_synchronize_post_reset(CPUState *cpu)
     if (whpx_enabled()) {
         whpx_cpu_synchronize_post_reset(cpu);
     }
+    if (nvmm_enabled()) {
+        nvmm_cpu_synchronize_post_reset(cpu);
+    }
+
 }

 static inline void cpu_synchronize_post_init(CPUState *cpu)
@@ -63,6 +71,9 @@ static inline void cpu_synchronize_post_init(CPUState *cpu)
     if (whpx_enabled()) {
         whpx_cpu_synchronize_post_init(cpu);
     }
+    if (nvmm_enabled()) {
+        nvmm_cpu_synchronize_post_init(cpu);
+    }
 }

 static inline void cpu_synchronize_pre_loadvm(CPUState *cpu)
@@ -79,6 +90,9 @@ static inline void cpu_synchronize_pre_loadvm(CPUState *cpu)
     if (whpx_enabled()) {
         whpx_cpu_synchronize_pre_loadvm(cpu);
     }
+    if (nvmm_enabled()) {
+        nvmm_cpu_synchronize_pre_loadvm(cpu);
+    }
 }

 #endif /* QEMU_HW_ACCEL_H */
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index a802e899ab..3b44b92830 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -43,6 +43,7 @@
 #include "sysemu/hax.h"
 #include "sysemu/hvf.h"
 #include "sysemu/whpx.h"
+#include "sysemu/nvmm.h"
 #include "exec/exec-all.h"

 #include "qemu/thread.h"
@@ -1621,6 +1622,48 @@ static void *qemu_whpx_cpu_thread_fn(void *arg)
     return NULL;
 }

+static void *qemu_nvmm_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+    int r;
+
+    assert(nvmm_enabled());
+
+    rcu_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+    cpu->thread_id = qemu_get_thread_id();
+    current_cpu = cpu;
+
+    r = nvmm_init_vcpu(cpu);
+    if (r < 0) {
+        fprintf(stderr, "nvmm_init_vcpu failed: %s\n", strerror(-r));
+        exit(1);
+    }
+
+    /* signal CPU creation */
+    cpu->created = true;
+    qemu_cond_signal(&qemu_cpu_cond);
+
+    do {
+        if (cpu_can_run(cpu)) {
+            r = nvmm_vcpu_exec(cpu);
+            if (r == EXCP_DEBUG) {
+                cpu_handle_guest_debug(cpu);
+            }
+        }
+        qemu_wait_io_event(cpu);
+    } while (!cpu->unplug || cpu_can_run(cpu));
+
+    nvmm_destroy_vcpu(cpu);
+    cpu->created = false;
+    qemu_cond_signal(&qemu_cpu_cond);
+    qemu_mutex_unlock_iothread();
+    rcu_unregister_thread();
+    return NULL;
+}
+
 #ifdef _WIN32
 static void CALLBACK dummy_apc_func(ULONG_PTR unused)
 {
@@ -1998,6 +2041,19 @@ static void qemu_whpx_start_vcpu(CPUState *cpu)
 #endif
 }

+static void qemu_nvmm_start_vcpu(CPUState *cpu)
+{
+    char thread_name[VCPU_THREAD_NAME_SIZE];
+
+    cpu->thread = g_malloc0(sizeof(QemuThread));
+    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+    qemu_cond_init(cpu->halt_cond);
+    snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/NVMM",
+             cpu->cpu_index);
+    qemu_thread_create(cpu->thread, thread_name, qemu_nvmm_cpu_thread_fn,
+                       cpu, QEMU_THREAD_JOINABLE);
+}
+
 static void qemu_dummy_start_vcpu(CPUState *cpu)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
@@ -2038,6 +2094,8 @@ void qemu_init_vcpu(CPUState *cpu)
         qemu_tcg_init_vcpu(cpu);
     } else if (whpx_enabled()) {
         qemu_whpx_start_vcpu(cpu);
+    } else if (nvmm_enabled()) {
+        qemu_nvmm_start_vcpu(cpu);
     } else {
         qemu_dummy_start_vcpu(cpu);
     }
diff --git a/target/i386/helper.c b/target/i386/helper.c
index 70be53e2c3..c2f1aef65c 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -983,7 +983,7 @@ void cpu_report_tpr_access(CPUX86State *env, TPRAccess access)
     X86CPU *cpu = env_archcpu(env);
     CPUState *cs = env_cpu(env);

-    if (kvm_enabled() || whpx_enabled()) {
+    if (kvm_enabled() || whpx_enabled() || nvmm_enabled()) {
         env->tpr_access_type = access;

         cpu_interrupt(cs, CPU_INTERRUPT_TPR);
--
2.28.0




* Re: [PATCH v5 1/4] Add the NVMM vcpu API
  2020-08-11 13:01         ` [PATCH v5 1/4] Add the NVMM vcpu API Kamil Rytarowski
                             ` (2 preceding siblings ...)
  2020-08-11 13:01           ` [PATCH v5 4/4] Add the NVMM acceleration enlightenments Kamil Rytarowski
@ 2020-09-04 23:28           ` Kamil Rytarowski
  3 siblings, 0 replies; 79+ messages in thread
From: Kamil Rytarowski @ 2020-09-04 23:28 UTC (permalink / raw)
  To: rth, ehabkost, slp, pbonzini, peter.maydell, philmd, max, jmcneill
  Cc: qemu-devel

Ping?

On 11.08.2020 15:01, Kamil Rytarowski wrote:
> From: Maxime Villard <max@m00nbsd.net>
> 
> Add the NetBSD Virtual Machine Monitor (NVMM) stubs and introduce the
> nvmm.h sysemu API for vcpu creation, execution, and state
> synchronization.
> 
> Signed-off-by: Maxime Villard <max@m00nbsd.net>
> Signed-off-by: Kamil Rytarowski <n54@gmx.com>
> Reviewed-by: Sergio Lopez <slp@redhat.com>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> Tested-by: Jared McNeill <jmcneill@invisible.ca>
> ---
>  accel/stubs/Makefile.objs |  1 +
>  accel/stubs/nvmm-stub.c   | 43 +++++++++++++++++++++++++++++++++++++++
>  include/sysemu/nvmm.h     | 35 +++++++++++++++++++++++++++++++
>  3 files changed, 79 insertions(+)
>  create mode 100644 accel/stubs/nvmm-stub.c
>  create mode 100644 include/sysemu/nvmm.h
> 
> diff --git a/accel/stubs/Makefile.objs b/accel/stubs/Makefile.objs
> index bbd14e71fb..38660a0b9b 100644
> --- a/accel/stubs/Makefile.objs
> +++ b/accel/stubs/Makefile.objs
> @@ -1,6 +1,7 @@
>  obj-$(call lnot,$(CONFIG_HAX))  += hax-stub.o
>  obj-$(call lnot,$(CONFIG_HVF))  += hvf-stub.o
>  obj-$(call lnot,$(CONFIG_WHPX)) += whpx-stub.o
> +obj-$(call lnot,$(CONFIG_NVMM)) += nvmm-stub.o
>  obj-$(call lnot,$(CONFIG_KVM))  += kvm-stub.o
>  obj-$(call lnot,$(CONFIG_TCG))  += tcg-stub.o
>  obj-$(call lnot,$(CONFIG_XEN))  += xen-stub.o
> diff --git a/accel/stubs/nvmm-stub.c b/accel/stubs/nvmm-stub.c
> new file mode 100644
> index 0000000000..c2208b84a3
> --- /dev/null
> +++ b/accel/stubs/nvmm-stub.c
> @@ -0,0 +1,43 @@
> +/*
> + * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
> + *
> + * NetBSD Virtual Machine Monitor (NVMM) accelerator stub.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +#include "cpu.h"
> +#include "sysemu/nvmm.h"
> +
> +int nvmm_init_vcpu(CPUState *cpu)
> +{
> +    return -1;
> +}
> +
> +int nvmm_vcpu_exec(CPUState *cpu)
> +{
> +    return -1;
> +}
> +
> +void nvmm_destroy_vcpu(CPUState *cpu)
> +{
> +}
> +
> +void nvmm_cpu_synchronize_state(CPUState *cpu)
> +{
> +}
> +
> +void nvmm_cpu_synchronize_post_reset(CPUState *cpu)
> +{
> +}
> +
> +void nvmm_cpu_synchronize_post_init(CPUState *cpu)
> +{
> +}
> +
> +void nvmm_cpu_synchronize_pre_loadvm(CPUState *cpu)
> +{
> +}
> diff --git a/include/sysemu/nvmm.h b/include/sysemu/nvmm.h
> new file mode 100644
> index 0000000000..10496f3980
> --- /dev/null
> +++ b/include/sysemu/nvmm.h
> @@ -0,0 +1,35 @@
> +/*
> + * Copyright (c) 2018-2019 Maxime Villard, All rights reserved.
> + *
> + * NetBSD Virtual Machine Monitor (NVMM) accelerator support.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_NVMM_H
> +#define QEMU_NVMM_H
> +
> +#include "config-host.h"
> +#include "qemu-common.h"
> +
> +int nvmm_init_vcpu(CPUState *);
> +int nvmm_vcpu_exec(CPUState *);
> +void nvmm_destroy_vcpu(CPUState *);
> +
> +void nvmm_cpu_synchronize_state(CPUState *);
> +void nvmm_cpu_synchronize_post_reset(CPUState *);
> +void nvmm_cpu_synchronize_post_init(CPUState *);
> +void nvmm_cpu_synchronize_pre_loadvm(CPUState *);
> +
> +#ifdef CONFIG_NVMM
> +
> +int nvmm_enabled(void);
> +
> +#else /* CONFIG_NVMM */
> +
> +#define nvmm_enabled() (0)
> +
> +#endif /* CONFIG_NVMM */
> +
> +#endif /* QEMU_NVMM_H */
> --
> 2.28.0
> 
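
Reader's note, not part of the patch: the quoted header declares
nvmm_enabled() as a real function only when CONFIG_NVMM is defined and
otherwise replaces it with the constant 0, so callers such as
cpu_synchronize_state() need no #ifdef of their own. Below is a minimal
sketch of that pattern for a build without CONFIG_NVMM. pick_accel_name()
and the "tcg" fallback are hypothetical and only illustrate a caller.

```c
#include <assert.h>
#include <string.h>

/* Model a host where NVMM support was not configured. */
#undef CONFIG_NVMM

#ifdef CONFIG_NVMM
int nvmm_enabled(void);          /* provided by the real accelerator */
#else
#define nvmm_enabled() (0)       /* constant, so the branch folds away */
#endif

/* Hypothetical caller: no #ifdef needed at the call site. */
static const char *
pick_accel_name(void)
{
    if (nvmm_enabled()) {
        return "nvmm";
    }
    return "tcg";
}
```

Because the macro expands to a compile-time constant, the compiler can
drop the NVMM branch entirely, while the stub file still supplies the other
nvmm_* symbols for the linker.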




Thread overview: 79+ messages
     [not found] <20200107124903.16505-1-n54@gmx.com>
2020-01-28 14:09 ` [PATCH v2 0/4] Implements the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
2020-01-28 14:09   ` [PATCH v2 1/4] Add the NVMM vcpu API Kamil Rytarowski
2020-02-03 11:42     ` Philippe Mathieu-Daudé
2020-01-28 14:09   ` [PATCH v2 2/4] Add the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
2020-02-03 11:41     ` Philippe Mathieu-Daudé
2020-02-03 11:56       ` Kamil Rytarowski
2020-02-03 12:10         ` Philippe Mathieu-Daudé
2020-03-02 17:12         ` Paolo Bonzini
2020-03-02 18:05           ` Kamil Rytarowski
2020-03-02 19:14             ` Maxime Villard
2020-03-02 19:40               ` Paolo Bonzini
2020-03-02 21:10                 ` Kamil Rytarowski
2020-03-02 22:45                   ` Paolo Bonzini
2020-03-02 17:11       ` Paolo Bonzini
2020-03-02 18:09         ` Kamil Rytarowski
2020-01-28 14:09   ` [PATCH v2 3/4] Introduce the NVMM impl Kamil Rytarowski
2020-02-03 11:51     ` Philippe Mathieu-Daudé
2020-02-05 17:22       ` Kamil Rytarowski
2020-02-05 17:47       ` Maxime Villard
2020-01-28 14:09   ` [PATCH v2 4/4] Add the NVMM acceleration enlightenments Kamil Rytarowski
2020-02-03 11:54     ` Philippe Mathieu-Daudé
2020-02-06 10:24       ` Kamil Rytarowski
2020-02-06 12:18         ` Philippe Mathieu-Daudé
2020-02-06 13:06         ` Markus Armbruster
2020-02-06 13:09           ` Philippe Mathieu-Daudé
2020-02-06 13:31             ` Kamil Rytarowski
2020-02-06 14:13               ` Markus Armbruster
2020-02-06 15:38                 ` Kamil Rytarowski
2020-02-06 16:07                   ` Philippe Mathieu-Daudé
2020-02-06 16:59                     ` Kamil Rytarowski
2020-02-03  9:52   ` [PATCH v2 0/4] Implements the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
2020-02-06 11:57   ` [PATCH v3 " Kamil Rytarowski
2020-02-06 11:57     ` [PATCH v3 1/4] Add the NVMM vcpu API Kamil Rytarowski
2020-02-06 21:06       ` Jared McNeill
2020-02-06 11:57     ` [PATCH v3 2/4] Add the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
2020-02-06 21:06       ` Jared McNeill
2020-02-06 11:57     ` [PATCH v3 3/4] Introduce the NVMM impl Kamil Rytarowski
2020-02-06 21:07       ` Jared McNeill
2020-02-06 11:57     ` [PATCH v3 4/4] Add the NVMM acceleration enlightenments Kamil Rytarowski
2020-02-06 21:07       ` Jared McNeill
2020-02-06 13:13     ` [PATCH v3 0/4] Implements the NetBSD Virtual Machine Monitor accelerator no-reply
2020-02-06 13:21       ` Kamil Rytarowski
2020-02-06 16:01         ` Philippe Mathieu-Daudé
2020-02-06 21:32     ` [PATCH v4 " Kamil Rytarowski
2020-02-06 21:32       ` [PATCH v4 1/4] Add the NVMM vcpu API Kamil Rytarowski
2020-08-11 12:47         ` [PATCH v5 " Kamil Rytarowski
2020-08-11 12:47           ` [PATCH v5 2/4] Add the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
2020-08-11 12:47           ` [PATCH v5 3/4] Introduce the NVMM impl Kamil Rytarowski
2020-08-11 12:47           ` [PATCH v5 4/4] Add the NVMM acceleration enlightenments Kamil Rytarowski
2020-08-11 13:01         ` [PATCH v5 1/4] Add the NVMM vcpu API Kamil Rytarowski
2020-08-11 13:01           ` [PATCH v5 2/4] Add the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
2020-08-11 13:01           ` [PATCH v5 3/4] Introduce the NVMM impl Kamil Rytarowski
2020-08-11 13:01           ` [PATCH v5 4/4] Add the NVMM acceleration enlightenments Kamil Rytarowski
2020-09-04 23:28           ` [PATCH v5 1/4] Add the NVMM vcpu API Kamil Rytarowski
2020-02-06 21:32       ` [PATCH v4 2/4] Add the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
2020-02-06 21:32       ` [PATCH v4 3/4] Introduce the NVMM impl Kamil Rytarowski
2020-02-06 23:28         ` [PATCH v4 3/4 FIXUP] " Kamil Rytarowski
2020-03-02 18:13         ` [PATCH v4 3/4] " Paolo Bonzini
2020-03-02 19:28           ` Maxime Villard
2020-03-02 19:35             ` Paolo Bonzini
2020-03-10  6:45               ` Maxime Villard
2020-03-10 10:15                 ` Kamil Rytarowski
2020-03-10 10:58                 ` Paolo Bonzini
2020-03-10 19:14                   ` Maxime Villard
2020-03-11 18:03                     ` Paolo Bonzini
2020-03-11 20:14                       ` Maxime Villard
2020-03-11 20:42                         ` Paolo Bonzini
2020-03-11 21:21                           ` Maxime Villard
2020-03-11 21:22                             ` Kamil Rytarowski
2020-03-11 21:44                             ` Paolo Bonzini
2020-03-12  7:08                               ` Maxime Villard
2020-07-21 13:42                 ` Kamil Rytarowski
2020-02-06 21:32       ` [PATCH v4 4/4] Add the NVMM acceleration enlightenments Kamil Rytarowski
2020-02-17  9:07       ` [PATCH v4 0/4] Implements the NetBSD Virtual Machine Monitor accelerator Kamil Rytarowski
2020-02-24 15:17         ` Kamil Rytarowski
2020-03-02 17:02           ` Kamil Rytarowski
2020-03-02 17:10             ` Eduardo Habkost
2020-03-02 17:10               ` Kamil Rytarowski
2020-03-02 17:22                 ` Eduardo Habkost
