* [Qemu-devel] [QEMU PATCH v4 0/10]: target/i386: kvm: Add support for save and restore of nested state
@ 2019-06-19 16:21 Liran Alon
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 01/10] target/i386: kvm: Delete VMX migration blocker on vCPU init failure Liran Alon
                   ` (10 more replies)
  0 siblings, 11 replies; 20+ messages in thread
From: Liran Alon @ 2019-06-19 16:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: ehabkost, kvm, maran.wilson, mtosatti, dgilbert, pbonzini, rth, jmattson

Hi,

This series aims to add support for QEMU to be able to migrate VMs that
are running nested hypervisors. In order to do so, it utilizes the new
IOCTLs introduced in KVM commit 8fcc4b5923af ("kvm: nVMX: Introduce
KVM_CAP_NESTED_STATE") which was created for this purpose.
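
For reference, the save side of this new API boils down to something
like the following minimal sketch against the raw KVM ioctl interface
(outside QEMU, error handling mostly omitted; per KVM's API docs,
checking KVM_CAP_NESTED_STATE returns the max state size in bytes):

    #include <linux/kvm.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>

    static struct kvm_nested_state *fetch_nested_state(int kvm_fd,
                                                       int vcpu_fd)
    {
        int max_len = ioctl(kvm_fd, KVM_CHECK_EXTENSION,
                            KVM_CAP_NESTED_STATE);
        if (max_len <= 0) {
            return NULL; /* kernel lacks KVM_CAP_NESTED_STATE */
        }

        struct kvm_nested_state *state = calloc(1, max_len);
        state->size = max_len; /* tell the kernel how much room we have */

        if (ioctl(vcpu_fd, KVM_GET_NESTED_STATE, state) < 0) {
            free(state);
            return NULL;
        }
        return state;
    }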

The 1st patch adds a missing cleanup to delete the VMX migration blocker
in case vCPU init fails.

The 2nd patch introduces kvm_arch_destroy_vcpu() to perform per-vCPU
destruction logic that is arch-dependent.

The 3rd patch is a refactoring to use symbolic constants instead of
hard-coded numbers.

The 4th patch fixes QEMU to update DR6 when it re-injects #DB into the
guest after the exception was intercepted by KVM while the guest is
being debugged.

The 5th patch adds a migration blocker for vCPUs exposed with either
Intel VMX or AMD SVM. Until now, migration was blocked only for Intel VMX.

The 6th patch updates linux-headers with the new struct kvm_nested_state
definition, which now has explicit fields for the data portion.

The 7th patch adds vmstate support for saving/restoring kernel integer
types (e.g. __u16).

The 8th patch adds support for saving and restoring nested state in
order to migrate guests which run a nested hypervisor.

The 9th patch adds support for KVM_CAP_EXCEPTION_PAYLOAD. This new KVM
capability allows userspace to properly distinguish between pending and
injected exceptions.

The 10th patch changes the nested virtualization migration blocker to
only be added when the kernel lacks support for one of the capabilities
required for correct nested migration, i.e. either KVM_CAP_NESTED_STATE
or KVM_CAP_EXCEPTION_PAYLOAD.

Regards,
-Liran

v1->v2 changes:
* Add patch to fix a bug when re-injecting #DB into the guest.
* Add support for KVM_CAP_EXCEPTION_PAYLOAD.
* Use explicit fields for struct kvm_nested_state data portion.
* Use vmstate subsections to save/restore nested state in order to properly
  support forward & backwards migration compatibility.
* Remove VMX migration blocker.

v2->v3 changes:
* Add kvm_arch_destroy_vcpu().
* Use DR6_BS where appropriate.
* Add cpu_pre_save() logic to convert pending exception to injected
  exception if guest is running L2.
* Convert max_nested_state_len to int instead of uint32_t.
* Use kvm_arch_destroy_vcpu() to free nested_state.
* Add migration blocker for vCPU exposed with AMD SVM.
* Don't rely on CR4 or MSR_EFER to know if it is required to
  migrate new VMState subsections.
* Signal if vCPU is in guest-mode in hflags, as originally intended by Paolo.

v3->v4 changes:
* Delete the nested migration blocker in case vCPU init fails.
* Change detection of VMX/SVM to not consider CPU vendor.
* Modify linux-headers to not have SVM stubs and use constant for vmcs12 size.
* Wrapped definition of kernel integer vmstate macros with #ifdef CONFIG_LINUX.
* Add migration blocker in case vCPU is exposed with VMX/SVM and kernel
  is lacking one of the KVM caps required to migrate nested workloads.




* [Qemu-devel] [QEMU PATCH v4 01/10] target/i386: kvm: Delete VMX migration blocker on vCPU init failure
  2019-06-19 16:21 [Qemu-devel] [QEMU PATCH v4 0/10]: target/i386: kvm: Add support for save and restore of nested state Liran Alon
@ 2019-06-19 16:21 ` Liran Alon
  2019-06-19 20:30   ` Maran Wilson
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 02/10] KVM: Introduce kvm_arch_destroy_vcpu() Liran Alon
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Liran Alon @ 2019-06-19 16:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: ehabkost, kvm, maran.wilson, mtosatti, dgilbert, Liran Alon,
	pbonzini, rth, jmattson

Commit d98f26073beb ("target/i386: kvm: add VMX migration blocker")
added a migration blocker for vCPUs exposed with Intel VMX because QEMU
doesn't yet contain code to support migration of nested virtualization
workloads.

However, that commit missed deleting the migration blocker in case vCPU
init fails, similar to what is done for invtsc_mig_blocker. This commit
fixes that issue.

Fixes: d98f26073beb ("target/i386: kvm: add VMX migration blocker")
Signed-off-by: Liran Alon <liran.alon@oracle.com>
---
 target/i386/kvm.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 3b29ce5c0d08..7aa7914a498c 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -940,7 +940,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
 
     r = kvm_arch_set_tsc_khz(cs);
     if (r < 0) {
-        goto fail;
+        return r;
     }
 
     /* vcpu's TSC frequency is either specified by user, or following
@@ -1295,7 +1295,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
             if (local_err) {
                 error_report_err(local_err);
                 error_free(invtsc_mig_blocker);
-                return r;
+                goto fail2;
             }
         }
     }
@@ -1346,6 +1346,9 @@ int kvm_arch_init_vcpu(CPUState *cs)
 
  fail:
     migrate_del_blocker(invtsc_mig_blocker);
+ fail2:
+    migrate_del_blocker(vmx_mig_blocker);
+
     return r;
 }
 
-- 
2.20.1




* [Qemu-devel] [QEMU PATCH v4 02/10] KVM: Introduce kvm_arch_destroy_vcpu()
  2019-06-19 16:21 [Qemu-devel] [QEMU PATCH v4 0/10]: target/i386: kvm: Add support for save and restore of nested state Liran Alon
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 01/10] target/i386: kvm: Delete VMX migration blocker on vCPU init failure Liran Alon
@ 2019-06-19 16:21 ` Liran Alon
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 03/10] target/i386: kvm: Use symbolic constant for #DB/#BP exception constants Liran Alon
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Liran Alon @ 2019-06-19 16:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: ehabkost, kvm, maran.wilson, mtosatti, dgilbert, Liran Alon,
	pbonzini, rth, jmattson

Similar to how kvm_init_vcpu() calls kvm_arch_init_vcpu() to perform
arch-dependent initialisation, introduce kvm_arch_destroy_vcpu()
to be called from kvm_destroy_vcpu() to perform arch-dependent
destruction.

This was added because some architectures (such as i386)
currently do not free memory that they have allocated in
kvm_arch_init_vcpu().

Suggested-by: Maran Wilson <maran.wilson@oracle.com>
Reviewed-by: Maran Wilson <maran.wilson@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
---
 accel/kvm/kvm-all.c  |  5 +++++
 include/sysemu/kvm.h |  1 +
 target/arm/kvm32.c   |  5 +++++
 target/arm/kvm64.c   |  5 +++++
 target/i386/kvm.c    | 12 ++++++++++++
 target/mips/kvm.c    |  5 +++++
 target/ppc/kvm.c     |  5 +++++
 target/s390x/kvm.c   | 10 ++++++++++
 8 files changed, 48 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 524c4ddfbd0f..59a3aa3a40da 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -292,6 +292,11 @@ int kvm_destroy_vcpu(CPUState *cpu)
 
     DPRINTF("kvm_destroy_vcpu\n");
 
+    ret = kvm_arch_destroy_vcpu(cpu);
+    if (ret < 0) {
+        goto err;
+    }
+
     mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
     if (mmap_size < 0) {
         ret = mmap_size;
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index a6d1cd190fed..64f55e519df7 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -371,6 +371,7 @@ int kvm_arch_put_registers(CPUState *cpu, int level);
 int kvm_arch_init(MachineState *ms, KVMState *s);
 
 int kvm_arch_init_vcpu(CPUState *cpu);
+int kvm_arch_destroy_vcpu(CPUState *cpu);
 
 bool kvm_vcpu_id_is_valid(int vcpu_id);
 
diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index 4e54e372a668..51f78f722b18 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c
@@ -240,6 +240,11 @@ int kvm_arch_init_vcpu(CPUState *cs)
     return kvm_arm_init_cpreg_list(cpu);
 }
 
+int kvm_arch_destroy_vcpu(CPUState *cs)
+{
+    return 0;
+}
+
 typedef struct Reg {
     uint64_t id;
     int offset;
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 998d21f399f4..22d19c9aec6f 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -654,6 +654,11 @@ int kvm_arch_init_vcpu(CPUState *cs)
     return kvm_arm_init_cpreg_list(cpu);
 }
 
+int kvm_arch_destroy_vcpu(CPUState *cs)
+{
+    return 0;
+}
+
 bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx)
 {
     /* Return true if the regidx is a register we should synchronize
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 7aa7914a498c..efbecfc9d7f0 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -1352,6 +1352,18 @@ int kvm_arch_init_vcpu(CPUState *cs)
     return r;
 }
 
+int kvm_arch_destroy_vcpu(CPUState *cs)
+{
+    X86CPU *cpu = X86_CPU(cs);
+
+    if (cpu->kvm_msr_buf) {
+        g_free(cpu->kvm_msr_buf);
+        cpu->kvm_msr_buf = NULL;
+    }
+
+    return 0;
+}
+
 void kvm_arch_reset_vcpu(X86CPU *cpu)
 {
     CPUX86State *env = &cpu->env;
diff --git a/target/mips/kvm.c b/target/mips/kvm.c
index 8e72850962e1..938f8f144b74 100644
--- a/target/mips/kvm.c
+++ b/target/mips/kvm.c
@@ -91,6 +91,11 @@ int kvm_arch_init_vcpu(CPUState *cs)
     return ret;
 }
 
+int kvm_arch_destroy_vcpu(CPUState *cs)
+{
+    return 0;
+}
+
 void kvm_mips_reset_vcpu(MIPSCPU *cpu)
 {
     CPUMIPSState *env = &cpu->env;
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 3bf0a46c3352..1967ccc51791 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -521,6 +521,11 @@ int kvm_arch_init_vcpu(CPUState *cs)
     return ret;
 }
 
+int kvm_arch_destroy_vcpu(CPUState *cs)
+{
+    return 0;
+}
+
 static void kvm_sw_tlb_put(PowerPCCPU *cpu)
 {
     CPUPPCState *env = &cpu->env;
diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
index e5e2b691f253..c2747c31649b 100644
--- a/target/s390x/kvm.c
+++ b/target/s390x/kvm.c
@@ -368,6 +368,16 @@ int kvm_arch_init_vcpu(CPUState *cs)
     return 0;
 }
 
+int kvm_arch_destroy_vcpu(CPUState *cs)
+{
+    S390CPU *cpu = S390_CPU(cs);
+
+    g_free(cpu->irqstate);
+    cpu->irqstate = NULL;
+
+    return 0;
+}
+
 void kvm_s390_reset_vcpu(S390CPU *cpu)
 {
     CPUState *cs = CPU(cpu);
-- 
2.20.1




* [Qemu-devel] [QEMU PATCH v4 03/10] target/i386: kvm: Use symbolic constant for #DB/#BP exception constants
  2019-06-19 16:21 [Qemu-devel] [QEMU PATCH v4 0/10]: target/i386: kvm: Add support for save and restore of nested state Liran Alon
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 01/10] target/i386: kvm: Delete VMX migration blocker on vCPU init failure Liran Alon
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 02/10] KVM: Introduce kvm_arch_destroy_vcpu() Liran Alon
@ 2019-06-19 16:21 ` Liran Alon
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 04/10] target/i386: kvm: Re-inject #DB to guest with updated DR6 Liran Alon
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Liran Alon @ 2019-06-19 16:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: ehabkost, kvm, maran.wilson, mtosatti, dgilbert, Liran Alon,
	Nikita Leshenko, Krish Sadhukhan, pbonzini, rth, jmattson

Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
---
 target/i386/kvm.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index efbecfc9d7f0..7b3b80833fdd 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -3009,9 +3009,9 @@ static int kvm_guest_debug_workarounds(X86CPU *cpu)
     unsigned long reinject_trap = 0;
 
     if (!kvm_has_vcpu_events()) {
-        if (env->exception_injected == 1) {
+        if (env->exception_injected == EXCP01_DB) {
             reinject_trap = KVM_GUESTDBG_INJECT_DB;
-        } else if (env->exception_injected == 3) {
+        } else if (env->exception_injected == EXCP03_INT3) {
             reinject_trap = KVM_GUESTDBG_INJECT_BP;
         }
         env->exception_injected = -1;
@@ -3523,8 +3523,8 @@ static int kvm_handle_debug(X86CPU *cpu,
     int ret = 0;
     int n;
 
-    if (arch_info->exception == 1) {
-        if (arch_info->dr6 & (1 << 14)) {
+    if (arch_info->exception == EXCP01_DB) {
+        if (arch_info->dr6 & DR6_BS) {
             if (cs->singlestep_enabled) {
                 ret = EXCP_DEBUG;
             }
-- 
2.20.1




* [Qemu-devel] [QEMU PATCH v4 04/10] target/i386: kvm: Re-inject #DB to guest with updated DR6
  2019-06-19 16:21 [Qemu-devel] [QEMU PATCH v4 0/10]: target/i386: kvm: Add support for save and restore of nested state Liran Alon
                   ` (2 preceding siblings ...)
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 03/10] target/i386: kvm: Use symbolic constant for #DB/#BP exception constants Liran Alon
@ 2019-06-19 16:21 ` Liran Alon
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 05/10] target/i386: kvm: Block migration for vCPUs exposed with nested virtualization Liran Alon
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Liran Alon @ 2019-06-19 16:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: ehabkost, kvm, maran.wilson, mtosatti, dgilbert, Liran Alon,
	Nikita Leshenko, Krish Sadhukhan, pbonzini, rth, jmattson

If userspace (QEMU) debugs a guest, then when #DB is raised in the
guest and intercepted by KVM, KVM forwards information about the #DB to
userspace instead of injecting the #DB into the guest.
While doing so, KVM doesn't update the vCPU DR6 but instead reports the
#DB DR6 value to userspace for further handling.
See KVM's handle_exception() DB_VECTOR handler.

QEMU's handler for this case is kvm_handle_debug(). This handler
basically checks if the #DB is related to one of the user-set hardware
breakpoints and, if not, re-injects the #DB into the guest.
The re-injection is done by setting env->exception_injected to #DB,
which will later be passed as events.exception.nr to the
KVM_SET_VCPU_EVENTS ioctl by kvm_put_vcpu_events().

However, in case userspace re-injects #DB, KVM expects userspace to set
the vCPU DR6 as reported to userspace when the #DB was intercepted!
Otherwise, the KVM_REQ_EVENT handler will inject the #DB with a wrong
DR6 into the guest.

Fix this issue by updating the vCPU DR6 appropriately when re-injecting
#DB into the guest.

Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
---
 target/i386/kvm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 7b3b80833fdd..fa01f18b28f0 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -3561,6 +3561,9 @@ static int kvm_handle_debug(X86CPU *cpu,
         /* pass to guest */
         env->exception_injected = arch_info->exception;
         env->has_error_code = 0;
+        if (arch_info->exception == EXCP01_DB) {
+            env->dr[6] = arch_info->dr6;
+        }
     }
 
     return ret;
-- 
2.20.1




* [Qemu-devel] [QEMU PATCH v4 05/10] target/i386: kvm: Block migration for vCPUs exposed with nested virtualization
  2019-06-19 16:21 [Qemu-devel] [QEMU PATCH v4 0/10]: target/i386: kvm: Add support for save and restore of nested state Liran Alon
                   ` (3 preceding siblings ...)
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 04/10] target/i386: kvm: Re-inject #DB to guest with updated DR6 Liran Alon
@ 2019-06-19 16:21 ` Liran Alon
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 06/10] linux-headers: i386: Modify struct kvm_nested_state to have explicit fields for data Liran Alon
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Liran Alon @ 2019-06-19 16:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: ehabkost, kvm, maran.wilson, mtosatti, dgilbert, Liran Alon,
	pbonzini, rth, jmattson

Commit d98f26073beb ("target/i386: kvm: add VMX migration blocker")
added a migration blocker for vCPUs exposed with Intel VMX.
However, migration should also be blocked for vCPUs exposed with
AMD SVM.

Both cases should be blocked because QEMU needs to extract additional
vCPU state from KVM which should be migrated as part of the vCPU
VMState, e.g. whether the vCPU is running in guest-mode or host-mode.

Fixes: d98f26073beb ("target/i386: kvm: add VMX migration blocker")
Reviewed-by: Maran Wilson <maran.wilson@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
---
 target/i386/cpu.c |  6 ------
 target/i386/cpu.h | 22 ++++++++++++++++++++++
 target/i386/kvm.c | 14 +++++++-------
 3 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 536d7d152044..197201087e65 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5170,12 +5170,6 @@ static int x86_cpu_filter_features(X86CPU *cpu)
     return rv;
 }
 
-#define IS_INTEL_CPU(env) ((env)->cpuid_vendor1 == CPUID_VENDOR_INTEL_1 && \
-                           (env)->cpuid_vendor2 == CPUID_VENDOR_INTEL_2 && \
-                           (env)->cpuid_vendor3 == CPUID_VENDOR_INTEL_3)
-#define IS_AMD_CPU(env) ((env)->cpuid_vendor1 == CPUID_VENDOR_AMD_1 && \
-                         (env)->cpuid_vendor2 == CPUID_VENDOR_AMD_2 && \
-                         (env)->cpuid_vendor3 == CPUID_VENDOR_AMD_3)
 static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 {
     CPUState *cs = CPU(dev);
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index fce6660bac00..64bb7fdcb231 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -728,6 +728,13 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS];
 
 #define CPUID_VENDOR_HYGON    "HygonGenuine"
 
+#define IS_INTEL_CPU(env) ((env)->cpuid_vendor1 == CPUID_VENDOR_INTEL_1 && \
+                           (env)->cpuid_vendor2 == CPUID_VENDOR_INTEL_2 && \
+                           (env)->cpuid_vendor3 == CPUID_VENDOR_INTEL_3)
+#define IS_AMD_CPU(env) ((env)->cpuid_vendor1 == CPUID_VENDOR_AMD_1 && \
+                         (env)->cpuid_vendor2 == CPUID_VENDOR_AMD_2 && \
+                         (env)->cpuid_vendor3 == CPUID_VENDOR_AMD_3)
+
 #define CPUID_MWAIT_IBE     (1U << 1) /* Interrupts can exit capability */
 #define CPUID_MWAIT_EMX     (1U << 0) /* enumeration supported */
 
@@ -1866,6 +1873,21 @@ static inline int32_t x86_get_a20_mask(CPUX86State *env)
     }
 }
 
+static inline bool cpu_has_vmx(CPUX86State *env)
+{
+    return (env->features[FEAT_1_ECX] & CPUID_EXT_VMX);
+}
+
+static inline bool cpu_has_svm(CPUX86State *env)
+{
+    return (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_SVM);
+}
+
+static inline bool cpu_has_nested_virt(CPUX86State *env)
+{
+    return (cpu_has_vmx(env) || cpu_has_svm(env));
+}
+
 /* fpu_helper.c */
 void update_fp_status(CPUX86State *env);
 void update_mxcsr_status(CPUX86State *env);
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index fa01f18b28f0..b2041ce5ec3c 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -906,7 +906,7 @@ static int hyperv_init_vcpu(X86CPU *cpu)
 }
 
 static Error *invtsc_mig_blocker;
-static Error *vmx_mig_blocker;
+static Error *nested_virt_mig_blocker;
 
 #define KVM_MAX_CPUID_ENTRIES  100
 
@@ -1270,13 +1270,13 @@ int kvm_arch_init_vcpu(CPUState *cs)
                                   !!(c->ecx & CPUID_EXT_SMX);
     }
 
-    if ((env->features[FEAT_1_ECX] & CPUID_EXT_VMX) && !vmx_mig_blocker) {
-        error_setg(&vmx_mig_blocker,
-                   "Nested VMX virtualization does not support live migration yet");
-        r = migrate_add_blocker(vmx_mig_blocker, &local_err);
+    if (cpu_has_nested_virt(env) && !nested_virt_mig_blocker) {
+        error_setg(&nested_virt_mig_blocker,
+                   "Nested virtualization does not support live migration yet");
+        r = migrate_add_blocker(nested_virt_mig_blocker, &local_err);
         if (local_err) {
             error_report_err(local_err);
-            error_free(vmx_mig_blocker);
+            error_free(nested_virt_mig_blocker);
             return r;
         }
     }
@@ -1347,7 +1347,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
  fail:
     migrate_del_blocker(invtsc_mig_blocker);
  fail2:
-    migrate_del_blocker(vmx_mig_blocker);
+    migrate_del_blocker(nested_virt_mig_blocker);
 
     return r;
 }
-- 
2.20.1




* [Qemu-devel] [QEMU PATCH v4 06/10] linux-headers: i386: Modify struct kvm_nested_state to have explicit fields for data
  2019-06-19 16:21 [Qemu-devel] [QEMU PATCH v4 0/10]: target/i386: kvm: Add support for save and restore of nested state Liran Alon
                   ` (4 preceding siblings ...)
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 05/10] target/i386: kvm: Block migration for vCPUs exposed with nested virtualization Liran Alon
@ 2019-06-19 16:21 ` Liran Alon
  2019-06-19 21:17   ` Maran Wilson
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 07/10] vmstate: Add support for kernel integer types Liran Alon
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Liran Alon @ 2019-06-19 16:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: ehabkost, kvm, maran.wilson, mtosatti, dgilbert, Liran Alon,
	pbonzini, rth, jmattson

Improve the KVM_{GET,SET}_NESTED_STATE structs by detailing the format
of VMX nested state data in a struct.

In order to avoid changing the ioctl values of
KVM_{GET,SET}_NESTED_STATE, there is a need to preserve
sizeof(struct kvm_nested_state). This is done by defining the data
struct as "data.vmx[0]". It was the most elegant way I found to
preserve the struct size while still keeping the struct readable and
easy to maintain. It does have the unfortunate side-effect that the
data now has to be accessed as "data.vmx[0]" rather than just "data.vmx".
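
To illustrate why this preserves the struct size (and therefore the
_IOWR()-encoded ioctl values), consider this hypothetical reduced
example, which is not the actual header:

    struct demo_state {
        __u16 flags;
        __u16 format;
        __u32 size;
        union {
            __u8 pad[120];     /* header union padded to 120 bytes */
        } hdr;
        union {
            __u8 vmx[0];       /* zero-length: contributes 0 bytes */
        } data;
    };
    /* sizeof(struct demo_state) == 128, with or without 'data' */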

Because we are already modifying these structs, I also modified the
following:
* Define the "format" field values as macros.
* Rename vmcs_pa to vmcs12_pa for better readability.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
---
 linux-headers/asm-x86/kvm.h | 33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 7a0e64ccd6ff..6e7dd792e448 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -383,16 +383,26 @@ struct kvm_sync_regs {
 #define KVM_X86_QUIRK_LAPIC_MMIO_HOLE	(1 << 2)
 #define KVM_X86_QUIRK_OUT_7E_INC_RIP	(1 << 3)
 
+#define KVM_STATE_NESTED_FORMAT_VMX	0
+#define KVM_STATE_NESTED_FORMAT_SVM	1
+
 #define KVM_STATE_NESTED_GUEST_MODE	0x00000001
 #define KVM_STATE_NESTED_RUN_PENDING	0x00000002
 #define KVM_STATE_NESTED_EVMCS		0x00000004
 
+#define KVM_STATE_NESTED_VMX_VMCS_SIZE	0x1000
+
 #define KVM_STATE_NESTED_SMM_GUEST_MODE	0x00000001
 #define KVM_STATE_NESTED_SMM_VMXON	0x00000002
 
-struct kvm_vmx_nested_state {
+struct kvm_vmx_nested_state_data {
+	__u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
+	__u8 shadow_vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
+};
+
+struct kvm_vmx_nested_state_hdr {
 	__u64 vmxon_pa;
-	__u64 vmcs_pa;
+	__u64 vmcs12_pa;
 
 	struct {
 		__u16 flags;
@@ -401,24 +411,25 @@ struct kvm_vmx_nested_state {
 
 /* for KVM_CAP_NESTED_STATE */
 struct kvm_nested_state {
-	/* KVM_STATE_* flags */
 	__u16 flags;
-
-	/* 0 for VMX, 1 for SVM.  */
 	__u16 format;
-
-	/* 128 for SVM, 128 + VMCS size for VMX.  */
 	__u32 size;
 
 	union {
-		/* VMXON, VMCS */
-		struct kvm_vmx_nested_state vmx;
+		struct kvm_vmx_nested_state_hdr vmx;
 
 		/* Pad the header to 128 bytes.  */
 		__u8 pad[120];
-	};
+	} hdr;
 
-	__u8 data[0];
+	/*
+	 * Define data region as 0 bytes to preserve backwards-compatibility
+	 * with the old definition of kvm_nested_state in order to avoid
+	 * changing KVM_{GET,SET}_NESTED_STATE ioctl values.
+	 */
+	union {
+		struct kvm_vmx_nested_state_data vmx[0];
+	} data;
 };
 
 #endif /* _ASM_X86_KVM_H */
-- 
2.20.1




* [Qemu-devel] [QEMU PATCH v4 07/10] vmstate: Add support for kernel integer types
  2019-06-19 16:21 [Qemu-devel] [QEMU PATCH v4 0/10]: target/i386: kvm: Add support for save and restore of nested state Liran Alon
                   ` (5 preceding siblings ...)
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 06/10] linux-headers: i386: Modify struct kvm_nested_state to have explicit fields for data Liran Alon
@ 2019-06-19 16:21 ` Liran Alon
  2019-06-19 17:37   ` Dr. David Alan Gilbert
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 08/10] target/i386: kvm: Add support for save and restore nested state Liran Alon
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Liran Alon @ 2019-06-19 16:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: ehabkost, kvm, maran.wilson, mtosatti, dgilbert, Liran Alon,
	Nikita Leshenko, pbonzini, rth, jmattson

Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Maran Wilson <maran.wilson@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
---
 include/migration/vmstate.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 9224370ed59a..ca68584eba4d 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -797,6 +797,19 @@ extern const VMStateInfo vmstate_info_qtailq;
 #define VMSTATE_UINT64_V(_f, _s, _v)                                  \
     VMSTATE_SINGLE(_f, _s, _v, vmstate_info_uint64, uint64_t)
 
+#ifdef CONFIG_LINUX
+
+#define VMSTATE_U8_V(_f, _s, _v)                                   \
+    VMSTATE_SINGLE(_f, _s, _v, vmstate_info_uint8, __u8)
+#define VMSTATE_U16_V(_f, _s, _v)                                  \
+    VMSTATE_SINGLE(_f, _s, _v, vmstate_info_uint16, __u16)
+#define VMSTATE_U32_V(_f, _s, _v)                                  \
+    VMSTATE_SINGLE(_f, _s, _v, vmstate_info_uint32, __u32)
+#define VMSTATE_U64_V(_f, _s, _v)                                  \
+    VMSTATE_SINGLE(_f, _s, _v, vmstate_info_uint64, __u64)
+
+#endif
+
 #define VMSTATE_BOOL(_f, _s)                                          \
     VMSTATE_BOOL_V(_f, _s, 0)
 
@@ -818,6 +831,19 @@ extern const VMStateInfo vmstate_info_qtailq;
 #define VMSTATE_UINT64(_f, _s)                                        \
     VMSTATE_UINT64_V(_f, _s, 0)
 
+#ifdef CONFIG_LINUX
+
+#define VMSTATE_U8(_f, _s)                                         \
+    VMSTATE_U8_V(_f, _s, 0)
+#define VMSTATE_U16(_f, _s)                                        \
+    VMSTATE_U16_V(_f, _s, 0)
+#define VMSTATE_U32(_f, _s)                                        \
+    VMSTATE_U32_V(_f, _s, 0)
+#define VMSTATE_U64(_f, _s)                                        \
+    VMSTATE_U64_V(_f, _s, 0)
+
+#endif
+
 #define VMSTATE_UINT8_EQUAL(_f, _s, _err_hint)                        \
     VMSTATE_SINGLE_FULL(_f, _s, 0, 0,                                 \
                         vmstate_info_uint8_equal, uint8_t, _err_hint)
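
As a usage illustration (hypothetical device state, not part of this
patch), the new macros are used under CONFIG_LINUX exactly like their
uintN_t counterparts:

    /* Hypothetical example: migrating a struct that uses kernel types. */
    struct demo_kernel_state {
        __u64 value;
    };

    static const VMStateDescription vmstate_demo = {
        .name = "demo",
        .version_id = 1,
        .minimum_version_id = 1,
        .fields = (VMStateField[]) {
            VMSTATE_U64(value, struct demo_kernel_state),
            VMSTATE_END_OF_LIST()
        }
    };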
-- 
2.20.1




* [Qemu-devel] [QEMU PATCH v4 08/10] target/i386: kvm: Add support for save and restore nested state
  2019-06-19 16:21 [Qemu-devel] [QEMU PATCH v4 0/10]: target/i386: kvm: Add support for save and restore of nested state Liran Alon
                   ` (6 preceding siblings ...)
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 07/10] vmstate: Add support for kernel integer types Liran Alon
@ 2019-06-19 16:21 ` Liran Alon
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 09/10] target/i386: kvm: Add support for KVM_CAP_EXCEPTION_PAYLOAD Liran Alon
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Liran Alon @ 2019-06-19 16:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: ehabkost, kvm, maran.wilson, mtosatti, dgilbert, Liran Alon,
	Nikita Leshenko, pbonzini, rth, jmattson

Kernel commit 8fcc4b5923af ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")
introduced new IOCTLs to extract and restore vCPU state related to
Intel VMX & AMD SVM.

Utilize these IOCTLs to add support for migration of VMs which are
running nested hypervisors.
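
For reference, the restore side boils down to handing the migrated blob
back to the kernel; a minimal sketch against the raw KVM ioctl
interface (outside QEMU, error handling omitted):

    /*
     * 'state' is the blob obtained via KVM_GET_NESTED_STATE on the
     * source and carried over the migration stream.
     */
    static int restore_nested_state(int vcpu_fd,
                                    struct kvm_nested_state *state)
    {
        return ioctl(vcpu_fd, KVM_SET_NESTED_STATE, state);
    }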

Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Maran Wilson <maran.wilson@oracle.com>
Tested-by: Maran Wilson <maran.wilson@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
---
 accel/kvm/kvm-all.c   |   8 ++
 include/sysemu/kvm.h  |   1 +
 target/i386/cpu.h     |   3 +
 target/i386/kvm.c     |  80 +++++++++++++++++
 target/i386/machine.c | 198 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 290 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 59a3aa3a40da..4fdf5b04b131 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -88,6 +88,7 @@ struct KVMState
 #ifdef KVM_CAP_SET_GUEST_DEBUG
     QTAILQ_HEAD(, kvm_sw_breakpoint) kvm_sw_breakpoints;
 #endif
+    int max_nested_state_len;
     int many_ioeventfds;
     int intx_set_mask;
     bool sync_mmu;
@@ -1678,6 +1679,8 @@ static int kvm_init(MachineState *ms)
     s->debugregs = kvm_check_extension(s, KVM_CAP_DEBUGREGS);
 #endif
 
+    s->max_nested_state_len = kvm_check_extension(s, KVM_CAP_NESTED_STATE);
+
 #ifdef KVM_CAP_IRQ_ROUTING
     kvm_direct_msi_allowed = (kvm_check_extension(s, KVM_CAP_SIGNAL_MSI) > 0);
 #endif
@@ -2245,6 +2248,11 @@ int kvm_has_debugregs(void)
     return kvm_state->debugregs;
 }
 
+int kvm_max_nested_state_length(void)
+{
+    return kvm_state->max_nested_state_len;
+}
+
 int kvm_has_many_ioeventfds(void)
 {
     if (!kvm_enabled()) {
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 64f55e519df7..acd90aebb6c4 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -210,6 +210,7 @@ bool kvm_has_sync_mmu(void);
 int kvm_has_vcpu_events(void);
 int kvm_has_robust_singlestep(void);
 int kvm_has_debugregs(void);
+int kvm_max_nested_state_length(void);
 int kvm_has_pit_state2(void);
 int kvm_has_many_ioeventfds(void);
 int kvm_has_gsi_routing(void);
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 64bb7fdcb231..c1a42aa8851a 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1350,6 +1350,9 @@ typedef struct CPUX86State {
 #if defined(CONFIG_KVM) || defined(CONFIG_HVF)
     void *xsave_buf;
 #endif
+#if defined(CONFIG_KVM)
+    struct kvm_nested_state *nested_state;
+#endif
 #if defined(CONFIG_HVF)
     HVFX86EmulatorState *hvf_emul;
 #endif
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index b2041ce5ec3c..8ae7598aac3d 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -931,6 +931,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
     struct kvm_cpuid_entry2 *c;
     uint32_t signature[3];
     int kvm_base = KVM_CPUID_SIGNATURE;
+    int max_nested_state_len;
     int r;
     Error *local_err = NULL;
 
@@ -1331,6 +1332,24 @@ int kvm_arch_init_vcpu(CPUState *cs)
     if (has_xsave) {
         env->xsave_buf = qemu_memalign(4096, sizeof(struct kvm_xsave));
     }
+
+    max_nested_state_len = kvm_max_nested_state_length();
+    if (max_nested_state_len > 0) {
+        assert(max_nested_state_len >= offsetof(struct kvm_nested_state, data));
+        env->nested_state = g_malloc0(max_nested_state_len);
+
+        env->nested_state->size = max_nested_state_len;
+
+        if (IS_INTEL_CPU(env)) {
+            struct kvm_vmx_nested_state_hdr *vmx_hdr =
+                &env->nested_state->hdr.vmx;
+
+            vmx_hdr->vmxon_pa = -1ull;
+            vmx_hdr->vmcs12_pa = -1ull;
+        }
+
+    }
+
     cpu->kvm_msr_buf = g_malloc0(MSR_BUF_SIZE);
 
     if (!(env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_RDTSCP)) {
@@ -1355,12 +1374,18 @@ int kvm_arch_init_vcpu(CPUState *cs)
 int kvm_arch_destroy_vcpu(CPUState *cs)
 {
     X86CPU *cpu = X86_CPU(cs);
+    CPUX86State *env = &cpu->env;
 
     if (cpu->kvm_msr_buf) {
         g_free(cpu->kvm_msr_buf);
         cpu->kvm_msr_buf = NULL;
     }
 
+    if (env->nested_state) {
+        g_free(env->nested_state);
+        env->nested_state = NULL;
+    }
+
     return 0;
 }
 
@@ -3075,6 +3100,52 @@ static int kvm_get_debugregs(X86CPU *cpu)
     return 0;
 }
 
+static int kvm_put_nested_state(X86CPU *cpu)
+{
+    CPUX86State *env = &cpu->env;
+    int max_nested_state_len = kvm_max_nested_state_length();
+
+    if (max_nested_state_len <= 0) {
+        return 0;
+    }
+
+    assert(env->nested_state->size <= max_nested_state_len);
+    return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_NESTED_STATE, env->nested_state);
+}
+
+static int kvm_get_nested_state(X86CPU *cpu)
+{
+    CPUX86State *env = &cpu->env;
+    int max_nested_state_len = kvm_max_nested_state_length();
+    int ret;
+
+    if (max_nested_state_len <= 0) {
+        return 0;
+    }
+
+    /*
+     * It is possible that migration restored a smaller size into
+     * nested_state->size than what our kernel supports.
+     * We preserve the migration origin's nested_state->size for the
+     * call to KVM_SET_NESTED_STATE, but want our next call to
+     * KVM_GET_NESTED_STATE to use the max size our kernel supports.
+     */
+    env->nested_state->size = max_nested_state_len;
+
+    ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_NESTED_STATE, env->nested_state);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (env->nested_state->flags & KVM_STATE_NESTED_GUEST_MODE) {
+        env->hflags |= HF_GUEST_MASK;
+    } else {
+        env->hflags &= ~HF_GUEST_MASK;
+    }
+
+    return ret;
+}
+
 int kvm_arch_put_registers(CPUState *cpu, int level)
 {
     X86CPU *x86_cpu = X86_CPU(cpu);
@@ -3082,6 +3153,11 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
 
     assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
 
+    ret = kvm_put_nested_state(x86_cpu);
+    if (ret < 0) {
+        return ret;
+    }
+
     if (level >= KVM_PUT_RESET_STATE) {
         ret = kvm_put_msr_feature_control(x86_cpu);
         if (ret < 0) {
@@ -3197,6 +3273,10 @@ int kvm_arch_get_registers(CPUState *cs)
     if (ret < 0) {
         goto out;
     }
+    ret = kvm_get_nested_state(cpu);
+    if (ret < 0) {
+        goto out;
+    }
     ret = 0;
  out:
     cpu_sync_bndcs_hflags(&cpu->env);
diff --git a/target/i386/machine.c b/target/i386/machine.c
index 225b5d433bc4..e27115fdb756 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -231,6 +231,15 @@ static int cpu_pre_save(void *opaque)
         env->segs[R_SS].flags &= ~(env->segs[R_SS].flags & DESC_DPL_MASK);
     }
 
+#ifdef CONFIG_KVM
+    /* Verify we have nested virtualization state from kernel if required */
+    if (cpu_has_nested_virt(env) && !env->nested_state) {
+        error_report("Guest enabled nested virtualization but kernel "
+                "does not support saving of nested state");
+        return -EINVAL;
+    }
+#endif
+
     return 0;
 }
 
@@ -278,6 +287,16 @@ static int cpu_post_load(void *opaque, int version_id)
     env->hflags &= ~HF_CPL_MASK;
     env->hflags |= (env->segs[R_SS].flags >> DESC_DPL_SHIFT) & HF_CPL_MASK;
 
+#ifdef CONFIG_KVM
+    if ((env->hflags & HF_GUEST_MASK) &&
+        (!env->nested_state ||
+        !(env->nested_state->flags & KVM_STATE_NESTED_GUEST_MODE))) {
+        error_report("vCPU set in guest-mode inconsistent with "
+                     "migrated kernel nested state");
+        return -EINVAL;
+    }
+#endif
+
     env->fpstt = (env->fpus_vmstate >> 11) & 7;
     env->fpus = env->fpus_vmstate & ~0x3800;
     env->fptag_vmstate ^= 0xff;
@@ -851,6 +870,182 @@ static const VMStateDescription vmstate_tsc_khz = {
     }
 };
 
+#ifdef CONFIG_KVM
+
+static bool vmx_vmcs12_needed(void *opaque)
+{
+    struct kvm_nested_state *nested_state = opaque;
+    return (nested_state->size >
+            offsetof(struct kvm_nested_state, data.vmx[0].vmcs12));
+}
+
+static const VMStateDescription vmstate_vmx_vmcs12 = {
+    .name = "cpu/kvm_nested_state/vmx/vmcs12",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = vmx_vmcs12_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT8_ARRAY(data.vmx[0].vmcs12,
+                            struct kvm_nested_state,
+                            KVM_STATE_NESTED_VMX_VMCS_SIZE),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static bool vmx_shadow_vmcs12_needed(void *opaque)
+{
+    struct kvm_nested_state *nested_state = opaque;
+    return (nested_state->size >
+            offsetof(struct kvm_nested_state, data.vmx[0].shadow_vmcs12));
+}
+
+static const VMStateDescription vmstate_vmx_shadow_vmcs12 = {
+    .name = "cpu/kvm_nested_state/vmx/shadow_vmcs12",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = vmx_shadow_vmcs12_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT8_ARRAY(data.vmx[0].shadow_vmcs12,
+                            struct kvm_nested_state,
+                            KVM_STATE_NESTED_VMX_VMCS_SIZE),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static bool vmx_nested_state_needed(void *opaque)
+{
+    struct kvm_nested_state *nested_state = opaque;
+
+    return ((nested_state->format == KVM_STATE_NESTED_FORMAT_VMX) &&
+            ((nested_state->hdr.vmx.vmxon_pa != -1ull) ||
+             (nested_state->hdr.vmx.smm.flags & KVM_STATE_NESTED_SMM_VMXON)));
+}
+
+static const VMStateDescription vmstate_vmx_nested_state = {
+    .name = "cpu/kvm_nested_state/vmx",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = vmx_nested_state_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_U64(hdr.vmx.vmxon_pa, struct kvm_nested_state),
+        VMSTATE_U64(hdr.vmx.vmcs12_pa, struct kvm_nested_state),
+        VMSTATE_U16(hdr.vmx.smm.flags, struct kvm_nested_state),
+        VMSTATE_END_OF_LIST()
+    },
+    .subsections = (const VMStateDescription*[]) {
+        &vmstate_vmx_vmcs12,
+        &vmstate_vmx_shadow_vmcs12,
+        NULL,
+    }
+};
+
+static bool svm_nested_state_needed(void *opaque)
+{
+    struct kvm_nested_state *nested_state = opaque;
+
+    return (nested_state->format == KVM_STATE_NESTED_FORMAT_SVM);
+}
+
+static const VMStateDescription vmstate_svm_nested_state = {
+    .name = "cpu/kvm_nested_state/svm",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = svm_nested_state_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static bool nested_state_needed(void *opaque)
+{
+    X86CPU *cpu = opaque;
+    CPUX86State *env = &cpu->env;
+
+    return (env->nested_state &&
+            (vmx_nested_state_needed(env->nested_state) ||
+             svm_nested_state_needed(env->nested_state)));
+}
+
+static int nested_state_post_load(void *opaque, int version_id)
+{
+    X86CPU *cpu = opaque;
+    CPUX86State *env = &cpu->env;
+    struct kvm_nested_state *nested_state = env->nested_state;
+    int min_nested_state_len = offsetof(struct kvm_nested_state, data);
+    int max_nested_state_len = kvm_max_nested_state_length();
+
+    /*
+     * If our kernel doesn't support setting nested state
+     * and we have received nested state from the migration stream,
+     * we need to fail the migration.
+     */
+    if (max_nested_state_len <= 0) {
+        error_report("Received nested state when kernel cannot restore it");
+        return -EINVAL;
+    }
+
+    /*
+     * Verify that the size of the received nested_state struct
+     * at least covers the required header and is not larger
+     * than the max size that our kernel supports.
+     */
+    if (nested_state->size < min_nested_state_len) {
+        error_report("Received nested state size less than min: "
+                     "len=%d, min=%d",
+                     nested_state->size, min_nested_state_len);
+        return -EINVAL;
+    }
+    if (nested_state->size > max_nested_state_len) {
+        error_report("Received unsupported nested state size: "
+                     "nested_state->size=%d, max=%d",
+                     nested_state->size, max_nested_state_len);
+        return -EINVAL;
+    }
+
+    /* Verify format is valid */
+    if ((nested_state->format != KVM_STATE_NESTED_FORMAT_VMX) &&
+        (nested_state->format != KVM_STATE_NESTED_FORMAT_SVM)) {
+        error_report("Received invalid nested state format: %d",
+                     nested_state->format);
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static const VMStateDescription vmstate_kvm_nested_state = {
+    .name = "cpu/kvm_nested_state",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_U16(flags, struct kvm_nested_state),
+        VMSTATE_U16(format, struct kvm_nested_state),
+        VMSTATE_U32(size, struct kvm_nested_state),
+        VMSTATE_END_OF_LIST()
+    },
+    .subsections = (const VMStateDescription*[]) {
+        &vmstate_vmx_nested_state,
+        &vmstate_svm_nested_state,
+        NULL
+    }
+};
+
+static const VMStateDescription vmstate_nested_state = {
+    .name = "cpu/nested_state",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = nested_state_needed,
+    .post_load = nested_state_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_STRUCT_POINTER(env.nested_state, X86CPU,
+                vmstate_kvm_nested_state,
+                struct kvm_nested_state),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+#endif
+
 static bool mcg_ext_ctl_needed(void *opaque)
 {
     X86CPU *cpu = opaque;
@@ -1089,6 +1284,9 @@ VMStateDescription vmstate_x86_cpu = {
         &vmstate_msr_intel_pt,
         &vmstate_msr_virt_ssbd,
         &vmstate_svm_npt,
+#ifdef CONFIG_KVM
+        &vmstate_nested_state,
+#endif
         NULL
     }
 };
-- 
2.20.1




* [Qemu-devel] [QEMU PATCH v4 09/10] target/i386: kvm: Add support for KVM_CAP_EXCEPTION_PAYLOAD
  2019-06-19 16:21 [Qemu-devel] [QEMU PATCH v4 0/10]: target/i386: kvm: Add support for save and restore of nested state Liran Alon
                   ` (7 preceding siblings ...)
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 08/10] target/i386: kvm: Add support for save and restore nested state Liran Alon
@ 2019-06-19 16:21 ` Liran Alon
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 10/10] target/i386: kvm: Add nested migration blocker only when kernel lacks required capabilities Liran Alon
  2019-06-20 12:38 ` [Qemu-devel] [QEMU PATCH v4 0/10]: target/i386: kvm: Add support for save and restore of nested state Paolo Bonzini
  10 siblings, 0 replies; 20+ messages in thread
From: Liran Alon @ 2019-06-19 16:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: ehabkost, kvm, maran.wilson, mtosatti, dgilbert, Liran Alon,
	Nikita Leshenko, pbonzini, rth, jmattson

Kernel commit c4f55198c7c2 ("kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD")
introduced a new KVM capability which allows userspace to correctly
distinguish between pending and injected exceptions.

This distinction is important in nested virtualization scenarios
because an L2 pending exception can still be intercepted by the L1
hypervisor while an L2 injected exception cannot.

Furthermore, when QEMU attempts to inject an exception, it should
specify the exception payload (CR2 in case of #PF or DR6 in case of
#DB) instead of having the payload already delivered in the respective
vCPU register. This is because in case the exception is injected into
an L2 guest and is intercepted by the L1 hypervisor, the payload needs
to be reported to the L1 intercept (VMExit handler) while still
preserving the respective vCPU register unchanged.
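
For example, with the kvm_queue_exception() helper introduced below, a
guest page fault is queued with its payload rather than by writing CR2
up front (hypothetical caller, fault_gva assumed):

    /*
     * Queue a #PF whose faulting address travels as a payload; CR2 is
     * only written directly if the kernel lacks
     * KVM_CAP_EXCEPTION_PAYLOAD.
     */
    kvm_queue_exception(env, EXCP0E_PAGE,
                        true,        /* exception carries a payload */
                        fault_gva);  /* becomes CR2 upon delivery */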

This commit adds support for QEMU to properly utilise this new KVM
capability (KVM_CAP_EXCEPTION_PAYLOAD).

Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
---
 target/i386/cpu.c        |   6 ++-
 target/i386/cpu.h        |   6 ++-
 target/i386/hvf/hvf.c    |  10 ++--
 target/i386/hvf/x86hvf.c |   4 +-
 target/i386/kvm.c        | 101 ++++++++++++++++++++++++++++++++-------
 target/i386/machine.c    |  84 +++++++++++++++++++++++++++++++-
 6 files changed, 187 insertions(+), 24 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 197201087e65..a026e49f5c0d 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -4774,7 +4774,11 @@ static void x86_cpu_reset(CPUState *s)
     memset(env->mtrr_fixed, 0, sizeof(env->mtrr_fixed));
 
     env->interrupt_injected = -1;
-    env->exception_injected = -1;
+    env->exception_nr = -1;
+    env->exception_pending = 0;
+    env->exception_injected = 0;
+    env->exception_has_payload = false;
+    env->exception_payload = 0;
     env->nmi_injected = false;
 #if !defined(CONFIG_USER_ONLY)
     /* We hard-wire the BSP to the first CPU. */
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index c1a42aa8851a..24543dc188f8 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1338,10 +1338,14 @@ typedef struct CPUX86State {
 
     /* For KVM */
     uint32_t mp_state;
-    int32_t exception_injected;
+    int32_t exception_nr;
     int32_t interrupt_injected;
     uint8_t soft_interrupt;
+    uint8_t exception_pending;
+    uint8_t exception_injected;
     uint8_t has_error_code;
+    uint8_t exception_has_payload;
+    uint64_t exception_payload;
     uint32_t ins_len;
     uint32_t sipi_vector;
     bool tsc_valid;
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index 2751c8125ca2..dc4bb63536c8 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -605,7 +605,9 @@ static void hvf_store_events(CPUState *cpu, uint32_t ins_len, uint64_t idtvec_in
     X86CPU *x86_cpu = X86_CPU(cpu);
     CPUX86State *env = &x86_cpu->env;
 
-    env->exception_injected = -1;
+    env->exception_nr = -1;
+    env->exception_pending = 0;
+    env->exception_injected = 0;
     env->interrupt_injected = -1;
     env->nmi_injected = false;
     if (idtvec_info & VMCS_IDT_VEC_VALID) {
@@ -619,7 +621,8 @@ static void hvf_store_events(CPUState *cpu, uint32_t ins_len, uint64_t idtvec_in
             break;
         case VMCS_IDT_VEC_HWEXCEPTION:
         case VMCS_IDT_VEC_SWEXCEPTION:
-            env->exception_injected = idtvec_info & VMCS_IDT_VEC_VECNUM;
+            env->exception_nr = idtvec_info & VMCS_IDT_VEC_VECNUM;
+            env->exception_injected = 1;
             break;
         case VMCS_IDT_VEC_PRIV_SWEXCEPTION:
         default:
@@ -912,7 +915,8 @@ int hvf_vcpu_exec(CPUState *cpu)
             macvm_set_rip(cpu, rip + ins_len);
             break;
         case VMX_REASON_VMCALL:
-            env->exception_injected = EXCP0D_GPF;
+            env->exception_nr = EXCP0D_GPF;
+            env->exception_injected = 1;
             env->has_error_code = true;
             env->error_code = 0;
             break;
diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
index df8e946fbcde..e0ea02d631e6 100644
--- a/target/i386/hvf/x86hvf.c
+++ b/target/i386/hvf/x86hvf.c
@@ -362,8 +362,8 @@ bool hvf_inject_interrupts(CPUState *cpu_state)
     if (env->interrupt_injected != -1) {
         vector = env->interrupt_injected;
         intr_type = VMCS_INTR_T_SWINTR;
-    } else if (env->exception_injected != -1) {
-        vector = env->exception_injected;
+    } else if (env->exception_nr != -1) {
+        vector = env->exception_nr;
         if (vector == EXCP03_INT3 || vector == EXCP04_INTO) {
             intr_type = VMCS_INTR_T_SWEXCEPTION;
         } else {
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 8ae7598aac3d..99480a52ad33 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -104,6 +104,7 @@ static uint32_t num_architectural_pmu_fixed_counters;
 static int has_xsave;
 static int has_xcrs;
 static int has_pit_state2;
+static int has_exception_payload;
 
 static bool has_msr_mcg_ext_ctl;
 
@@ -584,15 +585,56 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
     /* Hope we are lucky for AO MCE */
 }
 
+static void kvm_reset_exception(CPUX86State *env)
+{
+    env->exception_nr = -1;
+    env->exception_pending = 0;
+    env->exception_injected = 0;
+    env->exception_has_payload = false;
+    env->exception_payload = 0;
+}
+
+static void kvm_queue_exception(CPUX86State *env,
+                                int32_t exception_nr,
+                                uint8_t exception_has_payload,
+                                uint64_t exception_payload)
+{
+    assert(env->exception_nr == -1);
+    assert(!env->exception_pending);
+    assert(!env->exception_injected);
+    assert(!env->exception_has_payload);
+
+    env->exception_nr = exception_nr;
+
+    if (has_exception_payload) {
+        env->exception_pending = 1;
+
+        env->exception_has_payload = exception_has_payload;
+        env->exception_payload = exception_payload;
+    } else {
+        env->exception_injected = 1;
+
+        if (exception_nr == EXCP01_DB) {
+            assert(exception_has_payload);
+            env->dr[6] = exception_payload;
+        } else if (exception_nr == EXCP0E_PAGE) {
+            assert(exception_has_payload);
+            env->cr[2] = exception_payload;
+        } else {
+            assert(!exception_has_payload);
+        }
+    }
+}
+
 static int kvm_inject_mce_oldstyle(X86CPU *cpu)
 {
     CPUX86State *env = &cpu->env;
 
-    if (!kvm_has_vcpu_events() && env->exception_injected == EXCP12_MCHK) {
+    if (!kvm_has_vcpu_events() && env->exception_nr == EXCP12_MCHK) {
         unsigned int bank, bank_num = env->mcg_cap & 0xff;
         struct kvm_x86_mce mce;
 
-        env->exception_injected = -1;
+        kvm_reset_exception(env);
 
         /*
          * There must be at least one bank in use if an MCE is pending.
@@ -1613,6 +1655,16 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 
     hv_vpindex_settable = kvm_check_extension(s, KVM_CAP_HYPERV_VP_INDEX);
 
+    has_exception_payload = kvm_check_extension(s, KVM_CAP_EXCEPTION_PAYLOAD);
+    if (has_exception_payload) {
+        ret = kvm_vm_enable_cap(s, KVM_CAP_EXCEPTION_PAYLOAD, 0, true);
+        if (ret < 0) {
+            error_report("kvm: Failed to enable exception payload cap: %s",
+                         strerror(-ret));
+            return ret;
+        }
+    }
+
     ret = kvm_get_supported_msrs(s);
     if (ret < 0) {
         return ret;
@@ -2917,8 +2969,16 @@ static int kvm_put_vcpu_events(X86CPU *cpu, int level)
         return 0;
     }
 
-    events.exception.injected = (env->exception_injected >= 0);
-    events.exception.nr = env->exception_injected;
+    events.flags = 0;
+
+    if (has_exception_payload) {
+        events.flags |= KVM_VCPUEVENT_VALID_PAYLOAD;
+        events.exception.pending = env->exception_pending;
+        events.exception_has_payload = env->exception_has_payload;
+        events.exception_payload = env->exception_payload;
+    }
+    events.exception.nr = env->exception_nr;
+    events.exception.injected = env->exception_injected;
     events.exception.has_error_code = env->has_error_code;
     events.exception.error_code = env->error_code;
 
@@ -2931,7 +2991,6 @@ static int kvm_put_vcpu_events(X86CPU *cpu, int level)
     events.nmi.masked = !!(env->hflags2 & HF2_NMI_MASK);
 
     events.sipi_vector = env->sipi_vector;
-    events.flags = 0;
 
     if (has_msr_smbase) {
         events.smi.smm = !!(env->hflags & HF_SMM_MASK);
@@ -2981,8 +3040,19 @@ static int kvm_get_vcpu_events(X86CPU *cpu)
     if (ret < 0) {
        return ret;
     }
-    env->exception_injected =
-       events.exception.injected ? events.exception.nr : -1;
+
+    if (events.flags & KVM_VCPUEVENT_VALID_PAYLOAD) {
+        env->exception_pending = events.exception.pending;
+        env->exception_has_payload = events.exception_has_payload;
+        env->exception_payload = events.exception_payload;
+    } else {
+        env->exception_pending = 0;
+        env->exception_has_payload = false;
+    }
+    env->exception_injected = events.exception.injected;
+    env->exception_nr =
+        (env->exception_pending || env->exception_injected) ?
+        events.exception.nr : -1;
     env->has_error_code = events.exception.has_error_code;
     env->error_code = events.exception.error_code;
 
@@ -3034,12 +3104,12 @@ static int kvm_guest_debug_workarounds(X86CPU *cpu)
     unsigned long reinject_trap = 0;
 
     if (!kvm_has_vcpu_events()) {
-        if (env->exception_injected == EXCP01_DB) {
+        if (env->exception_nr == EXCP01_DB) {
             reinject_trap = KVM_GUESTDBG_INJECT_DB;
         } else if (env->exception_injected == EXCP03_INT3) {
             reinject_trap = KVM_GUESTDBG_INJECT_BP;
         }
-        env->exception_injected = -1;
+        kvm_reset_exception(env);
     }
 
     /*
@@ -3415,13 +3485,13 @@ int kvm_arch_process_async_events(CPUState *cs)
 
         kvm_cpu_synchronize_state(cs);
 
-        if (env->exception_injected == EXCP08_DBLE) {
+        if (env->exception_nr == EXCP08_DBLE) {
             /* this means triple fault */
             qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
             cs->exit_request = 1;
             return 0;
         }
-        env->exception_injected = EXCP12_MCHK;
+        kvm_queue_exception(env, EXCP12_MCHK, 0, 0);
         env->has_error_code = 0;
 
         cs->halted = 0;
@@ -3636,14 +3706,13 @@ static int kvm_handle_debug(X86CPU *cpu,
     }
     if (ret == 0) {
         cpu_synchronize_state(cs);
-        assert(env->exception_injected == -1);
+        assert(env->exception_nr == -1);
 
         /* pass to guest */
-        env->exception_injected = arch_info->exception;
+        kvm_queue_exception(env, arch_info->exception,
+                            arch_info->exception == EXCP01_DB,
+                            arch_info->dr6);
         env->has_error_code = 0;
-        if (arch_info->exception == EXCP01_DB) {
-            env->dr[6] = arch_info->dr6;
-        }
     }
 
     return ret;
diff --git a/target/i386/machine.c b/target/i386/machine.c
index e27115fdb756..4f292f0660ca 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -240,6 +240,41 @@ static int cpu_pre_save(void *opaque)
     }
 #endif
 
+    /*
+     * When a vCPU is running L2 and an exception is still pending,
+     * it can potentially be intercepted by the L1 hypervisor.
+     * In contrast, an injected exception can no longer be
+     * intercepted.
+     *
+     * Furthermore, when an L2 exception is intercepted by the L1
+     * hypervisor, its exception payload (CR2/DR6 on #PF/#DB)
+     * should not yet be set in the respective vCPU register.
+     * Thus, in case an exception is pending, it is
+     * important to save the exception payload separately.
+     *
+     * Therefore, if an exception is not in a pending state
+     * or the vCPU is not in guest-mode, it is not important to
+     * distinguish between a pending and an injected exception
+     * and we don't need to store the exception payload separately.
+     *
+     * In order to better preserve backwards-compatible migration,
+     * convert a pending exception to an injected exception in
+     * case it is not important to distinguish between them,
+     * as described above.
+     */
+    if (env->exception_pending && !(env->hflags & HF_GUEST_MASK)) {
+        env->exception_pending = 0;
+        env->exception_injected = 1;
+
+        if (env->exception_has_payload) {
+            if (env->exception_nr == EXCP01_DB) {
+                env->dr[6] = env->exception_payload;
+            } else if (env->exception_nr == EXCP0E_PAGE) {
+                env->cr[2] = env->exception_payload;
+            }
+        }
+    }
+
     return 0;
 }
 
@@ -297,6 +332,23 @@ static int cpu_post_load(void *opaque, int version_id)
     }
 #endif
 
+    /*
+     * There are cases where we can get a valid exception_nr with both
+     * exception_pending and exception_injected being cleared.
+     * This can happen in one of the following scenarios:
+     * 1) Source is older QEMU without KVM_CAP_EXCEPTION_PAYLOAD support.
+     * 2) Source is running on a kernel without KVM_CAP_EXCEPTION_PAYLOAD support.
+     * 3) "cpu/exception_info" subsection not sent because there is no exception
+     *    pending or the guest wasn't running L2 (see comment in cpu_pre_save()).
+     *
+     * In those cases, we can just deduce that a valid exception_nr means
+     * we can treat the exception as already injected.
+     */
+    if ((env->exception_nr != -1) &&
+        !env->exception_pending && !env->exception_injected) {
+        env->exception_injected = 1;
+    }
+
     env->fpstt = (env->fpus_vmstate >> 11) & 7;
     env->fpus = env->fpus_vmstate & ~0x3800;
     env->fptag_vmstate ^= 0xff;
@@ -342,6 +394,35 @@ static bool steal_time_msr_needed(void *opaque)
     return cpu->env.steal_time_msr != 0;
 }
 
+static bool exception_info_needed(void *opaque)
+{
+    X86CPU *cpu = opaque;
+    CPUX86State *env = &cpu->env;
+
+    /*
+     * Exception info is only worth saving when we need to
+     * distinguish between a pending and an injected exception,
+     * which is only required when there is a pending exception
+     * and the vCPU is running L2.
+     * For more info, refer to the comment in cpu_pre_save().
+     */
+    return (env->exception_pending && (env->hflags & HF_GUEST_MASK));
+}
+
+static const VMStateDescription vmstate_exception_info = {
+    .name = "cpu/exception_info",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = exception_info_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT8(env.exception_pending, X86CPU),
+        VMSTATE_UINT8(env.exception_injected, X86CPU),
+        VMSTATE_UINT8(env.exception_has_payload, X86CPU),
+        VMSTATE_UINT64(env.exception_payload, X86CPU),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static const VMStateDescription vmstate_steal_time_msr = {
     .name = "cpu/steal_time_msr",
     .version_id = 1,
@@ -1230,7 +1311,7 @@ VMStateDescription vmstate_x86_cpu = {
         VMSTATE_INT32(env.interrupt_injected, X86CPU),
         VMSTATE_UINT32(env.mp_state, X86CPU),
         VMSTATE_UINT64(env.tsc, X86CPU),
-        VMSTATE_INT32(env.exception_injected, X86CPU),
+        VMSTATE_INT32(env.exception_nr, X86CPU),
         VMSTATE_UINT8(env.soft_interrupt, X86CPU),
         VMSTATE_UINT8(env.nmi_injected, X86CPU),
         VMSTATE_UINT8(env.nmi_pending, X86CPU),
@@ -1254,6 +1335,7 @@ VMStateDescription vmstate_x86_cpu = {
         /* The above list is not sorted /wrt version numbers, watch out! */
     },
     .subsections = (const VMStateDescription*[]) {
+        &vmstate_exception_info,
         &vmstate_async_pf_msr,
         &vmstate_pv_eoi_msr,
         &vmstate_steal_time_msr,
-- 
2.20.1




* [Qemu-devel] [QEMU PATCH v4 10/10] target/i386: kvm: Add nested migration blocker only when kernel lacks required capabilities
  2019-06-19 16:21 [Qemu-devel] [QEMU PATCH v4 0/10]: target/i386: kvm: Add support for save and restore of nested state Liran Alon
                   ` (8 preceding siblings ...)
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 09/10] target/i386: kvm: Add support for KVM_CAP_EXCEPTION_PAYLOAD Liran Alon
@ 2019-06-19 16:21 ` Liran Alon
  2019-06-19 23:52   ` Maran Wilson
  2019-06-20 12:38 ` [Qemu-devel] [QEMU PATCH v4 0/10]: target/i386: kvm: Add support for save and restore of nested state Paolo Bonzini
  10 siblings, 1 reply; 20+ messages in thread
From: Liran Alon @ 2019-06-19 16:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: ehabkost, kvm, maran.wilson, mtosatti, dgilbert, Liran Alon,
	pbonzini, rth, jmattson

Previous commits have added support for migration of nested virtualization
workloads. This was done by utilising two new KVM capabilities:
KVM_CAP_NESTED_STATE and KVM_CAP_EXCEPTION_PAYLOAD, both of which are
required in order to correctly migrate such workloads.

Therefore, change the code to add a migration blocker for vCPUs exposed
with Intel VMX or AMD SVM only when one of these kernel capabilities is
missing.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
---
 target/i386/kvm.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 99480a52ad33..a3d0fbed3b35 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -1313,9 +1313,14 @@ int kvm_arch_init_vcpu(CPUState *cs)
                                   !!(c->ecx & CPUID_EXT_SMX);
     }
 
-    if (cpu_has_nested_virt(env) && !nested_virt_mig_blocker) {
+    if (cpu_has_nested_virt(env) && !nested_virt_mig_blocker &&
+        ((kvm_max_nested_state_length() <= 0) || !has_exception_payload)) {
         error_setg(&nested_virt_mig_blocker,
-                   "Nested virtualization does not support live migration yet");
+                   "Kernel do not provide required capabilities for "
+                   "nested virtualization migration. "
+                   "(CAP_NESTED_STATE=%d, CAP_EXCEPTION_PAYLOAD=%d)",
+                   kvm_max_nested_state_length() > 0,
+                   has_exception_payload);
         r = migrate_add_blocker(nested_virt_mig_blocker, &local_err);
         if (local_err) {
             error_report_err(local_err);
-- 
2.20.1




* Re: [Qemu-devel] [QEMU PATCH v4 07/10] vmstate: Add support for kernel integer types
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 07/10] vmstate: Add support for kernel integer types Liran Alon
@ 2019-06-19 17:37   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 20+ messages in thread
From: Dr. David Alan Gilbert @ 2019-06-19 17:37 UTC (permalink / raw)
  To: Liran Alon
  Cc: ehabkost, kvm, maran.wilson, mtosatti, qemu-devel,
	Nikita Leshenko, pbonzini, jmattson, rth

* Liran Alon (liran.alon@oracle.com) wrote:
> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
> Reviewed-by: Maran Wilson <maran.wilson@oracle.com>
> Signed-off-by: Liran Alon <liran.alon@oracle.com>
> ---
>  include/migration/vmstate.h | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 9224370ed59a..ca68584eba4d 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -797,6 +797,19 @@ extern const VMStateInfo vmstate_info_qtailq;
>  #define VMSTATE_UINT64_V(_f, _s, _v)                                  \
>      VMSTATE_SINGLE(_f, _s, _v, vmstate_info_uint64, uint64_t)
>  
> +#ifdef CONFIG_LINUX
> +
> +#define VMSTATE_U8_V(_f, _s, _v)                                   \
> +    VMSTATE_SINGLE(_f, _s, _v, vmstate_info_uint8, __u8)
> +#define VMSTATE_U16_V(_f, _s, _v)                                  \
> +    VMSTATE_SINGLE(_f, _s, _v, vmstate_info_uint16, __u16)
> +#define VMSTATE_U32_V(_f, _s, _v)                                  \
> +    VMSTATE_SINGLE(_f, _s, _v, vmstate_info_uint32, __u32)
> +#define VMSTATE_U64_V(_f, _s, _v)                                  \
> +    VMSTATE_SINGLE(_f, _s, _v, vmstate_info_uint64, __u64)
> +
> +#endif
> +

Right, and that works as well as the comment I suggested, so

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

>  #define VMSTATE_BOOL(_f, _s)                                          \
>      VMSTATE_BOOL_V(_f, _s, 0)
>  
> @@ -818,6 +831,19 @@ extern const VMStateInfo vmstate_info_qtailq;
>  #define VMSTATE_UINT64(_f, _s)                                        \
>      VMSTATE_UINT64_V(_f, _s, 0)
>  
> +#ifdef CONFIG_LINUX
> +
> +#define VMSTATE_U8(_f, _s)                                         \
> +    VMSTATE_U8_V(_f, _s, 0)
> +#define VMSTATE_U16(_f, _s)                                        \
> +    VMSTATE_U16_V(_f, _s, 0)
> +#define VMSTATE_U32(_f, _s)                                        \
> +    VMSTATE_U32_V(_f, _s, 0)
> +#define VMSTATE_U64(_f, _s)                                        \
> +    VMSTATE_U64_V(_f, _s, 0)
> +
> +#endif
> +
>  #define VMSTATE_UINT8_EQUAL(_f, _s, _err_hint)                        \
>      VMSTATE_SINGLE_FULL(_f, _s, 0, 0,                                 \
>                          vmstate_info_uint8_equal, uint8_t, _err_hint)
> -- 
> 2.20.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
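
For illustration, a minimal sketch of how the new kernel-typed macros
above would be used in a VMStateDescription; the struct and field names
below are hypothetical, not from this series:

#ifdef CONFIG_LINUX
/* A state struct that mirrors kernel-defined types directly. */
typedef struct FooState {
    __u16 ctrl;   /* e.g. copied verbatim from a struct kvm_* header */
    __u64 base;
} FooState;

static const VMStateDescription vmstate_foo = {
    .name = "foo",
    .version_id = 1,
    .minimum_version_id = 1,
    .fields = (VMStateField[]) {
        VMSTATE_U16(ctrl, FooState),  /* new macro for a __u16 field */
        VMSTATE_U64(base, FooState),  /* new macro for a __u64 field */
        VMSTATE_END_OF_LIST()
    }
};
#endif /* CONFIG_LINUX */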



* Re: [Qemu-devel] [QEMU PATCH v4 01/10] target/i386: kvm: Delete VMX migration blocker on vCPU init failure
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 01/10] target/i386: kvm: Delete VMX migration blocker on vCPU init failure Liran Alon
@ 2019-06-19 20:30   ` Maran Wilson
  2019-06-19 20:33     ` Liran Alon
  0 siblings, 1 reply; 20+ messages in thread
From: Maran Wilson @ 2019-06-19 20:30 UTC (permalink / raw)
  To: Liran Alon, qemu-devel
  Cc: ehabkost, kvm, mtosatti, dgilbert, pbonzini, jmattson, rth

On 6/19/2019 9:21 AM, Liran Alon wrote:
> Commit d98f26073beb ("target/i386: kvm: add VMX migration blocker")
> added migration blocker for vCPU exposed with Intel VMX because QEMU
> doesn't yet contain code to support migration of nested virtualization
> workloads.
>
> However, that commit missed adding deletion of the migration blocker in
> case init of vCPU failed. Similar to invtsc_mig_blocker. This commit fix
> that issue.
>
> Fixes: d98f26073beb ("target/i386: kvm: add VMX migration blocker")
> Signed-off-by: Liran Alon <liran.alon@oracle.com>
> ---
>   target/i386/kvm.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> index 3b29ce5c0d08..7aa7914a498c 100644
> --- a/target/i386/kvm.c
> +++ b/target/i386/kvm.c
> @@ -940,7 +940,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
>   
>       r = kvm_arch_set_tsc_khz(cs);
>       if (r < 0) {
> -        goto fail;
> +        return r;
>       }
>   
>       /* vcpu's TSC frequency is either specified by user, or following
> @@ -1295,7 +1295,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
>               if (local_err) {
>                   error_report_err(local_err);
>                   error_free(invtsc_mig_blocker);
> -                return r;
> +                goto fail2;
>               }
>           }
>       }
> @@ -1346,6 +1346,9 @@ int kvm_arch_init_vcpu(CPUState *cs)
>   
>    fail:
>       migrate_del_blocker(invtsc_mig_blocker);
> + fail2:
> +    migrate_del_blocker(vmx_mig_blocker);
> +

At the risk of being a bit pedantic...

Your changes don't introduce this problem, but they do make it worse -- 
Since [vmx|invtsc]_mig_blocker are both global in scope, isn't it 
possible you end up deleting one or both valid blockers that were 
created by a previous invocation of kvm_arch_init_vcpu() ?  Seems to me 
that you would need something like an additional pair of local boolean 
variables named created_[vmx|invtsc]_mig_blocker and condition the calls 
to migrate_del_blocker() accordingly. On the positive side, that would 
simplify some of the logic around when and if it's ok to jump to "fail" 
(and you wouldn't need the "fail2").
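
A minimal sketch of the kind of guard meant here, assuming each
invocation tracks what it created itself (function and variable names
are hypothetical, not from the posted patch):

/*
 * Hypothetical sketch only: remember which global blockers *this*
 * invocation created, so a failure path never deletes a blocker
 * installed by an earlier vCPU's init.
 */
static Error *vmx_mig_blocker;

static int init_one_vcpu(CPUX86State *env)
{
    bool created_vmx_blocker = false;
    Error *local_err = NULL;
    int r;

    if (cpu_has_vmx(env) && !vmx_mig_blocker) {
        error_setg(&vmx_mig_blocker,
                   "Nested virtualization does not support live migration yet");
        r = migrate_add_blocker(vmx_mig_blocker, &local_err);
        if (local_err) {
            error_report_err(local_err);
            error_free(vmx_mig_blocker);
            vmx_mig_blocker = NULL;
            return r;
        }
        created_vmx_blocker = true;
    }

    r = rest_of_vcpu_init(env);      /* placeholder for the real work */
    if (r < 0 && created_vmx_blocker) {
        migrate_del_blocker(vmx_mig_blocker);  /* undo only our own */
        error_free(vmx_mig_blocker);
        vmx_mig_blocker = NULL;
    }
    return r;
}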

Thanks,
-Maran

>       return r;
>   }
>   



* Re: [Qemu-devel] [QEMU PATCH v4 01/10] target/i386: kvm: Delete VMX migration blocker on vCPU init failure
  2019-06-19 20:30   ` Maran Wilson
@ 2019-06-19 20:33     ` Liran Alon
  2019-06-19 20:48       ` Maran Wilson
  0 siblings, 1 reply; 20+ messages in thread
From: Liran Alon @ 2019-06-19 20:33 UTC (permalink / raw)
  To: Maran Wilson
  Cc: ehabkost, kvm, mtosatti, dgilbert, qemu-devel, pbonzini, jmattson, rth



> On 19 Jun 2019, at 23:30, Maran Wilson <maran.wilson@oracle.com> wrote:
> 
> On 6/19/2019 9:21 AM, Liran Alon wrote:
>> Commit d98f26073beb ("target/i386: kvm: add VMX migration blocker")
>> added migration blocker for vCPU exposed with Intel VMX because QEMU
>> doesn't yet contain code to support migration of nested virtualization
>> workloads.
>> 
>> However, that commit missed adding deletion of the migration blocker in
>> case init of vCPU failed. Similar to invtsc_mig_blocker. This commit fix
>> that issue.
>> 
>> Fixes: d98f26073beb ("target/i386: kvm: add VMX migration blocker")
>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
>> ---
>>  target/i386/kvm.c | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>> 
>> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
>> index 3b29ce5c0d08..7aa7914a498c 100644
>> --- a/target/i386/kvm.c
>> +++ b/target/i386/kvm.c
>> @@ -940,7 +940,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
>>        r = kvm_arch_set_tsc_khz(cs);
>>      if (r < 0) {
>> -        goto fail;
>> +        return r;
>>      }
>>        /* vcpu's TSC frequency is either specified by user, or following
>> @@ -1295,7 +1295,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
>>              if (local_err) {
>>                  error_report_err(local_err);
>>                  error_free(invtsc_mig_blocker);
>> -                return r;
>> +                goto fail2;
>>              }
>>          }
>>      }
>> @@ -1346,6 +1346,9 @@ int kvm_arch_init_vcpu(CPUState *cs)
>>     fail:
>>      migrate_del_blocker(invtsc_mig_blocker);
>> + fail2:
>> +    migrate_del_blocker(vmx_mig_blocker);
>> +
> 
> At the risk of being a bit pedantic...
> 
> Your changes don't introduce this problem, but they do make it worse -- Since [vmx|invtsc]_mig_blocker are both global in scope, isn't it possible you end up deleting one or both valid blockers that were created by a previous invocation of kvm_arch_init_vcpu() ?  Seems to me that you would need something like an additional pair of local boolean variables named created_[vmx|invtsc]_mig_blocker and condition the calls to migrate_del_blocker() accordingly. On the positive side, that would simplify some of the logic around when and if it's ok to jump to "fail" (and you wouldn't need the "fail2").
> 
> Thanks,
> -Maran

I actually thought about this as well when I encountered this issue.
In general, one can argue that every vCPU should add its own migration blocker on init (if required) and remove its own migration blocker on deletion (on vCPU destroy).
99% of the time, none of this should matter, as all vCPUs of a given VM have exactly the same properties.
Anyway, I decided that this is not really relevant for this patch-series; if it should be addressed further, let it be another unrelated patch-series.
QEMU has too many issues to fix all at once :P. I need to filter.

-Liran

> 
>>      return r;
>>  }
>>  




* Re: [Qemu-devel] [QEMU PATCH v4 01/10] target/i386: kvm: Delete VMX migration blocker on vCPU init failure
  2019-06-19 20:33     ` Liran Alon
@ 2019-06-19 20:48       ` Maran Wilson
  0 siblings, 0 replies; 20+ messages in thread
From: Maran Wilson @ 2019-06-19 20:48 UTC (permalink / raw)
  To: Liran Alon
  Cc: ehabkost, kvm, mtosatti, qemu-devel, dgilbert, pbonzini, rth, jmattson

On 6/19/2019 1:33 PM, Liran Alon wrote:
>> On 19 Jun 2019, at 23:30, Maran Wilson <maran.wilson@oracle.com> wrote:
>>
>> On 6/19/2019 9:21 AM, Liran Alon wrote:
>>> Commit d98f26073beb ("target/i386: kvm: add VMX migration blocker")
>>> added migration blocker for vCPU exposed with Intel VMX because QEMU
>>> doesn't yet contain code to support migration of nested virtualization
>>> workloads.
>>>
>>> However, that commit missed adding deletion of the migration blocker in
>>> case init of vCPU failed. Similar to invtsc_mig_blocker. This commit fix
>>> that issue.
>>>
>>> Fixes: d98f26073beb ("target/i386: kvm: add VMX migration blocker")
>>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
>>> ---
>>>   target/i386/kvm.c | 7 +++++--
>>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
>>> index 3b29ce5c0d08..7aa7914a498c 100644
>>> --- a/target/i386/kvm.c
>>> +++ b/target/i386/kvm.c
>>> @@ -940,7 +940,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
>>>         r = kvm_arch_set_tsc_khz(cs);
>>>       if (r < 0) {
>>> -        goto fail;
>>> +        return r;
>>>       }
>>>         /* vcpu's TSC frequency is either specified by user, or following
>>> @@ -1295,7 +1295,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
>>>               if (local_err) {
>>>                   error_report_err(local_err);
>>>                   error_free(invtsc_mig_blocker);
>>> -                return r;
>>> +                goto fail2;
>>>               }
>>>           }
>>>       }
>>> @@ -1346,6 +1346,9 @@ int kvm_arch_init_vcpu(CPUState *cs)
>>>      fail:
>>>       migrate_del_blocker(invtsc_mig_blocker);
>>> + fail2:
>>> +    migrate_del_blocker(vmx_mig_blocker);
>>> +
>> At the risk of being a bit pedantic...
>>
>> Your changes don't introduce this problem, but they do make it worse -- Since [vmx|invtsc]_mig_blocker are both global in scope, isn't it possible you end up deleting one or both valid blockers that were created by a previous invocation of kvm_arch_init_vcpu() ?  Seems to me that you would need something like an additional pair of local boolean variables named created_[vmx|invtsc]_mig_blocker and condition the calls to migrate_del_blocker() accordingly. On the positive side, that would simplify some of the logic around when and if it's ok to jump to "fail" (and you wouldn't need the "fail2").
>>
>> Thanks,
>> -Maran
> I actually thought about this as well when I encountered this issue.
> In general, one can argue that every vCPU should add its own migration blocker on init (if required) and remove its own migration blocker on deletion (on vCPU destroy).
> 99% of the time, none of this should matter, as all vCPUs of a given VM have exactly the same properties.

The example I was thinking about is a VM that is created with a bunch of 
vCPUs -- all of which require installation of the blocker(s). Then at 
some point in the future, a failed CPU hotplug attempt wipes out all the 
pre-existing blockers and leaves the VM exposed.

But I agree that the problem wasn't introduced by this patch series and 
so it is reasonable to argue that you shouldn't have to fix it here. As 
such:

Reviewed-by: Maran Wilson <maran.wilson@oracle.com>

Thanks,
-Maran

> Anyway, I decided that this is not really relevant for this patch-series; if it should be addressed further, let it be another unrelated patch-series.
> QEMU has too many issues to fix all at once :P. I need to filter.
>
> -Liran
>
>>>       return r;
>>>   }
>>>   
>




* Re: [Qemu-devel] [QEMU PATCH v4 06/10] linux-headers: i386: Modify struct kvm_nested_state to have explicit fields for data
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 06/10] linux-headers: i386: Modify struct kvm_nested_state to have explicit fields for data Liran Alon
@ 2019-06-19 21:17   ` Maran Wilson
  0 siblings, 0 replies; 20+ messages in thread
From: Maran Wilson @ 2019-06-19 21:17 UTC (permalink / raw)
  To: Liran Alon, qemu-devel
  Cc: ehabkost, kvm, mtosatti, dgilbert, pbonzini, jmattson, rth

On 6/19/2019 9:21 AM, Liran Alon wrote:
> Improve the KVM_{GET,SET}_NESTED_STATE structs by detailing the format
> of VMX nested state data in a struct.
>
> In order to avoid changing the ioctl values of
> KVM_{GET,SET}_NESTED_STATE, there is a need to preserve
> sizeof(struct kvm_nested_state). This is done by defining the data
> struct as "data.vmx[0]". It was the most elegant way I found to
> preserve struct size while still keeping struct readable and easy to
> maintain. It does have a misfortunate side-effect that now it has to be
> accessed as "data.vmx[0]" rather than just "data.vmx".
>
> Because we are already modifying these structs, I also modified the
> following:
> * Define the "format" field values as macros.
> * Rename vmcs_pa to vmcs12_pa for better readability.
>
> Signed-off-by: Liran Alon <liran.alon@oracle.com>
> ---
>   linux-headers/asm-x86/kvm.h | 33 ++++++++++++++++++++++-----------
>   1 file changed, 22 insertions(+), 11 deletions(-)
>
> diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
> index 7a0e64ccd6ff..6e7dd792e448 100644
> --- a/linux-headers/asm-x86/kvm.h
> +++ b/linux-headers/asm-x86/kvm.h
> @@ -383,16 +383,26 @@ struct kvm_sync_regs {
>   #define KVM_X86_QUIRK_LAPIC_MMIO_HOLE	(1 << 2)
>   #define KVM_X86_QUIRK_OUT_7E_INC_RIP	(1 << 3)
>   
> +#define KVM_STATE_NESTED_FORMAT_VMX	0
> +#define KVM_STATE_NESTED_FORMAT_SVM	1
> +
>   #define KVM_STATE_NESTED_GUEST_MODE	0x00000001
>   #define KVM_STATE_NESTED_RUN_PENDING	0x00000002
>   #define KVM_STATE_NESTED_EVMCS		0x00000004
>   
> +#define KVM_STATE_NESTED_VMX_VMCS_SIZE	0x1000
> +
>   #define KVM_STATE_NESTED_SMM_GUEST_MODE	0x00000001
>   #define KVM_STATE_NESTED_SMM_VMXON	0x00000002
>   
> -struct kvm_vmx_nested_state {
> +struct kvm_vmx_nested_state_data {
> +	__u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
> +	__u8 shadow_vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
> +};
> +
> +struct kvm_vmx_nested_state_hdr {
>   	__u64 vmxon_pa;
> -	__u64 vmcs_pa;
> +	__u64 vmcs12_pa;
>   
>   	struct {
>   		__u16 flags;
> @@ -401,24 +411,25 @@ struct kvm_vmx_nested_state {
>   
>   /* for KVM_CAP_NESTED_STATE */
>   struct kvm_nested_state {
> -	/* KVM_STATE_* flags */
>   	__u16 flags;
> -
> -	/* 0 for VMX, 1 for SVM.  */
>   	__u16 format;
> -
> -	/* 128 for SVM, 128 + VMCS size for VMX.  */
>   	__u32 size;
>   
>   	union {
> -		/* VMXON, VMCS */
> -		struct kvm_vmx_nested_state vmx;
> +		struct kvm_vmx_nested_state_hdr vmx;
>   
>   		/* Pad the header to 128 bytes.  */
>   		__u8 pad[120];
> -	};
> +	} hdr;
>   
> -	__u8 data[0];
> +	/*
> +	 * Define data region as 0 bytes to preserve backwards-compatability
> +	 * to old definition of kvm_nested_state in order to avoid changing
> +	 * KVM_{GET,PUT}_NESTED_STATE ioctl values.
> +	 */
> +	union {
> +		struct kvm_vmx_nested_state_data vmx[0];
> +	} data;
>   };
>   
>   #endif /* _ASM_X86_KVM_H */

Reviewed-by: Maran Wilson <maran.wilson@oracle.com>

Thanks,
-Maran



* Re: [Qemu-devel] [QEMU PATCH v4 10/10] target/i386: kvm: Add nested migration blocker only when kernel lacks required capabilities
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 10/10] target/i386: kvm: Add nested migration blocker only when kernel lacks required capabilities Liran Alon
@ 2019-06-19 23:52   ` Maran Wilson
  0 siblings, 0 replies; 20+ messages in thread
From: Maran Wilson @ 2019-06-19 23:52 UTC (permalink / raw)
  To: Liran Alon, qemu-devel
  Cc: ehabkost, kvm, mtosatti, dgilbert, pbonzini, jmattson, rth

On 6/19/2019 9:21 AM, Liran Alon wrote:
> Previous commits have added support for migration of nested virtualization
> workloads. This was done by utilising two new KVM capabilities:
> KVM_CAP_NESTED_STATE and KVM_CAP_EXCEPTION_PAYLOAD, both of which are
> required in order to correctly migrate such workloads.
>
> Therefore, change the code to add a migration blocker for vCPUs exposed
> with Intel VMX or AMD SVM only when one of these kernel capabilities is
> missing.
>
> Signed-off-by: Liran Alon <liran.alon@oracle.com>
> ---
>   target/i386/kvm.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> index 99480a52ad33..a3d0fbed3b35 100644
> --- a/target/i386/kvm.c
> +++ b/target/i386/kvm.c
> @@ -1313,9 +1313,14 @@ int kvm_arch_init_vcpu(CPUState *cs)
>                                     !!(c->ecx & CPUID_EXT_SMX);
>       }
>   
> -    if (cpu_has_nested_virt(env) && !nested_virt_mig_blocker) {
> +    if (cpu_has_nested_virt(env) && !nested_virt_mig_blocker &&
> +        ((kvm_max_nested_state_length() <= 0) || !has_exception_payload)) {
>           error_setg(&nested_virt_mig_blocker,
> -                   "Nested virtualization does not support live migration yet");
> +                   "Kernel do not provide required capabilities for "

s/do/does/

And with that change:

Reviewed-by: Maran Wilson <maran.wilson@oracle.com>

Thanks,
-Maran


> +                   "nested virtualization migration. "
> +                   "(CAP_NESTED_STATE=%d, CAP_EXCEPTION_PAYLOAD=%d)",
> +                   kvm_max_nested_state_length() > 0,
> +                   has_exception_payload);
>           r = migrate_add_blocker(nested_virt_mig_blocker, &local_err);
>           if (local_err) {
>               error_report_err(local_err);




* Re: [Qemu-devel] [QEMU PATCH v4 0/10]: target/i386: kvm: Add support for save and restore of nested state
  2019-06-19 16:21 [Qemu-devel] [QEMU PATCH v4 0/10]: target/i386: kvm: Add support for save and restore of nested state Liran Alon
                   ` (9 preceding siblings ...)
  2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 10/10] target/i386: kvm: Add nested migration blocker only when kernel lacks required capabilities Liran Alon
@ 2019-06-20 12:38 ` Paolo Bonzini
  2019-06-20 13:28   ` Liran Alon
  10 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2019-06-20 12:38 UTC (permalink / raw)
  To: Liran Alon, qemu-devel
  Cc: ehabkost, kvm, maran.wilson, mtosatti, dgilbert, jmattson, rth

On 19/06/19 18:21, Liran Alon wrote:
> Hi,
> 
> This series aims to add support for QEMU to be able to migrate VMs that
> are running nested hypervisors. In order to do so, it utilizes the new
> IOCTLs introduced in KVM commit 8fcc4b5923af ("kvm: nVMX: Introduce
> KVM_CAP_NESTED_STATE") which was created for this purpose.

Applied with just three minor changes that should be uncontroversial:

> 6rd patch updates linux-headers to have updated struct kvm_nested_state.
> The updated struct now have explicit fields for the data portion.

Changed patch title to "linux-headers: sync with latest KVM headers from
Linux 5.2"

> 7rd patch add vmstate support for saving/restoring kernel integer types (e.g. __u16).
> 
> 8th patch adds support for saving and restoring nested state in order to migrate
> guests which run a nested hypervisor.

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index e924663f32..f3cf6e1b27 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -1671,10 +1671,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
             struct kvm_vmx_nested_state_hdr *vmx_hdr =
                 &env->nested_state->hdr.vmx;

+            env->nested_state->format = KVM_STATE_NESTED_FORMAT_VMX;
             vmx_hdr->vmxon_pa = -1ull;
             vmx_hdr->vmcs12_pa = -1ull;
         }
-
     }

     cpu->kvm_msr_buf = g_malloc0(MSR_BUF_SIZE);

which is a no-op since KVM_STATE_NESTED_FORMAT_VMX is zero, but it's tidy.

> 9th patch add support for KVM_CAP_EXCEPTION_PAYLOAD. This new KVM capability
> allows userspace to properly distingiush between pending and injecting exceptions.
> 
> 10th patch changes the nested virtualization migration blocker to only
> be added when kernel lack support for one of the capabilities required
> for correct nested migration. i.e. Either KVM_CAP_NESTED_STATE or
> KVM_CAP_EXCEPTION_PAYLOAD.

Had to disable this for SVM unfortunately.



* Re: [Qemu-devel] [QEMU PATCH v4 0/10]: target/i386: kvm: Add support for save and restore of nested state
  2019-06-20 12:38 ` [Qemu-devel] [QEMU PATCH v4 0/10]: target/i386: kvm: Add support for save and restore of nested state Paolo Bonzini
@ 2019-06-20 13:28   ` Liran Alon
  2019-06-20 13:40     ` Liran Alon
  0 siblings, 1 reply; 20+ messages in thread
From: Liran Alon @ 2019-06-20 13:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: ehabkost, kvm, maran.wilson, mtosatti, dgilbert, qemu-devel,
	jmattson, rth



> On 20 Jun 2019, at 15:38, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 19/06/19 18:21, Liran Alon wrote:
>> Hi,
>> 
>> This series aims to add support for QEMU to be able to migrate VMs that
>> are running nested hypervisors. In order to do so, it utilizes the new
>> IOCTLs introduced in KVM commit 8fcc4b5923af ("kvm: nVMX: Introduce
>> KVM_CAP_NESTED_STATE") which was created for this purpose.
> 
> Applied with just three minor changes that should be uncontroversial:

ACK. Where can I see the applied patches for review?

> 
>> 6rd patch updates linux-headers to have updated struct kvm_nested_state.
>> The updated struct now have explicit fields for the data portion.
> 
> Changed patch title to "linux-headers: sync with latest KVM headers from
> Linux 5.2”

ACK.

> 
>> 7rd patch add vmstate support for saving/restoring kernel integer types (e.g. __u16).
>> 
>> 8th patch adds support for saving and restoring nested state in order to migrate
>> guests which run a nested hypervisor.
> 
> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> index e924663f32..f3cf6e1b27 100644
> --- a/target/i386/kvm.c
> +++ b/target/i386/kvm.c
> @@ -1671,10 +1671,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
>             struct kvm_vmx_nested_state_hdr *vmx_hdr =
>                 &env->nested_state->hdr.vmx;
> 
> +            env->nested_state->format = KVM_STATE_NESTED_FORMAT_VMX;
>             vmx_hdr->vmxon_pa = -1ull;
>             vmx_hdr->vmcs12_pa = -1ull;
>         }
> -
>     }
> 
>     cpu->kvm_msr_buf = g_malloc0(MSR_BUF_SIZE);
> 
> which is a no-op since KVM_STATE_NESTED_FORMAT_VMX is zero, but it's tidy.

I agree. My bad. Thanks for adding this :)

> 
>> 9th patch add support for KVM_CAP_EXCEPTION_PAYLOAD. This new KVM capability
>> allows userspace to properly distingiush between pending and injecting exceptions.
>> 
>> 10th patch changes the nested virtualization migration blocker to only
>> be added when kernel lack support for one of the capabilities required
>> for correct nested migration. i.e. Either KVM_CAP_NESTED_STATE or
>> KVM_CAP_EXCEPTION_PAYLOAD.
> 
> Had to disable this for SVM unfortunately.

For backwards compatibility I assume… Sounds reasonable to me so ACK.

Even though I must say I would really like to hear your opinion about the thread I had with David Gilbert regarding QEMU’s migration backwards compatibility:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg622274.html

Thanks for the assistance pushing this forward,
-Liran





* Re: [Qemu-devel] [QEMU PATCH v4 0/10]: target/i386: kvm: Add support for save and restore of nested state
  2019-06-20 13:28   ` Liran Alon
@ 2019-06-20 13:40     ` Liran Alon
  0 siblings, 0 replies; 20+ messages in thread
From: Liran Alon @ 2019-06-20 13:40 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: ehabkost, kvm, maran.wilson, mtosatti, dgilbert, qemu-devel,
	jmattson, rth



> On 20 Jun 2019, at 16:28, Liran Alon <liran.alon@oracle.com> wrote:
> 
> 
> 
>> On 20 Jun 2019, at 15:38, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> 
>> On 19/06/19 18:21, Liran Alon wrote:
>>> Hi,
>>> 
>>> This series aims to add support for QEMU to be able to migrate VMs that
>>> are running nested hypervisors. In order to do so, it utilizes the new
>>> IOCTLs introduced in KVM commit 8fcc4b5923af ("kvm: nVMX: Introduce
>>> KVM_CAP_NESTED_STATE") which was created for this purpose.
>> 
>> Applied with just three minor changes that should be uncontroversial:
> 
> ACK. Where can I see the applied patches for review?
> 
>> 
>>> 6rd patch updates linux-headers to have updated struct kvm_nested_state.
>>> The updated struct now have explicit fields for the data portion.
>> 
>> Changed patch title to "linux-headers: sync with latest KVM headers from
>> Linux 5.2”
> 
> ACK.
> 
>> 
>>> 7rd patch add vmstate support for saving/restoring kernel integer types (e.g. __u16).
>>> 
>>> 8th patch adds support for saving and restoring nested state in order to migrate
>>> guests which run a nested hypervisor.
>> 
>> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
>> index e924663f32..f3cf6e1b27 100644
>> --- a/target/i386/kvm.c
>> +++ b/target/i386/kvm.c
>> @@ -1671,10 +1671,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
>>            struct kvm_vmx_nested_state_hdr *vmx_hdr =
>>                &env->nested_state->hdr.vmx;
>> 
>> +            env->nested_state->format = KVM_STATE_NESTED_FORMAT_VMX;
>>            vmx_hdr->vmxon_pa = -1ull;
>>            vmx_hdr->vmcs12_pa = -1ull;
>>        }
>> -
>>    }
>> 
>>    cpu->kvm_msr_buf = g_malloc0(MSR_BUF_SIZE);
>> 
>> which is a no-op since KVM_STATE_NESTED_FORMAT_VMX is zero, but it's tidy.
> 
> I agree. My bad. Thanks for adding this :)

Actually, I think it makes more sense to condition here on cpu_has_vmx(env) instead of IS_INTEL_CPU(env).
And also add an “else if (cpu_has_svm(env))” that sets env->nested_state->format to KVM_STATE_NESTED_FORMAT_SVM.
If you can change that when applying. :)
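
Something along these lines, presumably (a sketch of the suggested change
on top of the hunk quoted above; untested, not part of the posted series):

if (cpu_has_vmx(env)) {
    struct kvm_vmx_nested_state_hdr *vmx_hdr =
        &env->nested_state->hdr.vmx;

    env->nested_state->format = KVM_STATE_NESTED_FORMAT_VMX;
    vmx_hdr->vmxon_pa = -1ull;
    vmx_hdr->vmcs12_pa = -1ull;
} else if (cpu_has_svm(env)) {
    env->nested_state->format = KVM_STATE_NESTED_FORMAT_SVM;
}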

-Liran

> 
>> 
>>> 9th patch add support for KVM_CAP_EXCEPTION_PAYLOAD. This new KVM capability
>>> allows userspace to properly distingiush between pending and injecting exceptions.
>>> 
>>> 10th patch changes the nested virtualization migration blocker to only
>>> be added when kernel lack support for one of the capabilities required
>>> for correct nested migration. i.e. Either KVM_CAP_NESTED_STATE or
>>> KVM_CAP_EXCEPTION_PAYLOAD.
>> 
>> Had to disable this for SVM unfortunately.
> 
> For backwards compatibility I assume… Sounds reasonable to me so ACK.
> 
> Even though I must say I would really like to hear your opinion about the thread I had with David Gilbert regarding QEMU’s migration backwards compatibility:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg622274.html
> 
> Thanks for the assistance pushing this forward,
> -Liran
> 
> 




Thread overview: 20+ messages
2019-06-19 16:21 [Qemu-devel] [QEMU PATCH v4 0/10]: target/i386: kvm: Add support for save and restore of nested state Liran Alon
2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 01/10] target/i386: kvm: Delete VMX migration blocker on vCPU init failure Liran Alon
2019-06-19 20:30   ` Maran Wilson
2019-06-19 20:33     ` Liran Alon
2019-06-19 20:48       ` Maran Wilson
2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 02/10] KVM: Introduce kvm_arch_destroy_vcpu() Liran Alon
2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 03/10] target/i386: kvm: Use symbolic constant for #DB/#BP exception constants Liran Alon
2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 04/10] target/i386: kvm: Re-inject #DB to guest with updated DR6 Liran Alon
2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 05/10] target/i386: kvm: Block migration for vCPUs exposed with nested virtualization Liran Alon
2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 06/10] linux-headers: i386: Modify struct kvm_nested_state to have explicit fields for data Liran Alon
2019-06-19 21:17   ` Maran Wilson
2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 07/10] vmstate: Add support for kernel integer types Liran Alon
2019-06-19 17:37   ` Dr. David Alan Gilbert
2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 08/10] target/i386: kvm: Add support for save and restore nested state Liran Alon
2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 09/10] target/i386: kvm: Add support for KVM_CAP_EXCEPTION_PAYLOAD Liran Alon
2019-06-19 16:21 ` [Qemu-devel] [QEMU PATCH v4 10/10] target/i386: kvm: Add nested migration blocker only when kernel lacks required capabilities Liran Alon
2019-06-19 23:52   ` Maran Wilson
2019-06-20 12:38 ` [Qemu-devel] [QEMU PATCH v4 0/10]: target/i386: kvm: Add support for save and restore of nested state Paolo Bonzini
2019-06-20 13:28   ` Liran Alon
2019-06-20 13:40     ` Liran Alon
