* [PATCH 0/2]: KVM: i386: Add support for save and restore nested state
@ 2018-09-14  0:38 ` Liran Alon
  0 siblings, 0 replies; 15+ messages in thread
From: Liran Alon @ 2018-09-14  0:38 UTC (permalink / raw)
  To: qemu-devel, rth, ehabkost; +Cc: pbonzini, idan.brown, mtosatti, kvm, jmattson

Hi,

This series adds support for QEMU to migrate VMs that are running
nested hypervisors. To do so, it utilizes the new IOCTLs introduced in
KVM commit 8fcc4b5923af ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE"),
which were created for this purpose.

The 1st patch is not strictly related to the goal of the series: it
makes CPUX86State->xsave_buf compiled only when needed (i.e. when
building with the KVM or HVF CPU accelerator).

The 2nd patch adds support for migrating VMs that are running nested
hypervisors.
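
As context, the kernel API the series builds on looks roughly as
follows. This is a hedged sketch, not part of the patches: error
handling is minimal, and kvm_fd/vcpu_fd are assumed to be open /dev/kvm
and vCPU file descriptors.

    #include <linux/kvm.h>
    #include <sys/ioctl.h>
    #include <stdlib.h>

    static struct kvm_nested_state *save_nested_state(int kvm_fd, int vcpu_fd)
    {
        /* Returns the max size of the nested state blob, or 0 if the
         * kernel does not support KVM_CAP_NESTED_STATE. */
        int max_len = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_NESTED_STATE);
        struct kvm_nested_state *state;

        if (max_len <= 0) {
            return NULL;
        }
        state = calloc(1, max_len);
        state->size = max_len;  /* tell the kernel how big our buffer is */
        if (ioctl(vcpu_fd, KVM_GET_NESTED_STATE, state) < 0) {
            free(state);
            return NULL;
        }
        /* On success the kernel wrote the actual data size to state->size. */
        return state;
    }

    /* Restore is the mirror image on the destination vCPU: */
    static int restore_nested_state(int vcpu_fd, struct kvm_nested_state *state)
    {
        return ioctl(vcpu_fd, KVM_SET_NESTED_STATE, state);
    }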

Regards,
-Liran

* [PATCH 1/2] i386: Compile CPUX86State xsave_buf only when KVM or HVF is supported
@ 2018-09-14  0:38 ` Liran Alon
  0 siblings, 0 replies; 15+ messages in thread
From: Liran Alon @ 2018-09-14  0:38 UTC (permalink / raw)
  To: qemu-devel, rth, ehabkost
  Cc: idan.brown, kvm, mtosatti, Liran Alon, pbonzini, jmattson

While at it, also rename the variable to indicate it is not used only by KVM.

Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
Reviewed-by: Patrick Colp <patrick.colp@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
---
 target/i386/cpu.h         | 4 +++-
 target/i386/hvf/README.md | 2 +-
 target/i386/hvf/hvf.c     | 2 +-
 target/i386/hvf/x86hvf.c  | 4 ++--
 target/i386/kvm.c         | 6 +++---
 5 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index b572a8e4aa41..6e4c2b02f947 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1327,7 +1327,9 @@ typedef struct CPUX86State {
     bool tsc_valid;
     int64_t tsc_khz;
     int64_t user_tsc_khz; /* for sanity check only */
-    void *kvm_xsave_buf;
+#if defined(CONFIG_KVM) || defined(CONFIG_HVF)
+    void *xsave_buf;
+#endif
 #if defined(CONFIG_HVF)
     HVFX86EmulatorState *hvf_emul;
 #endif
diff --git a/target/i386/hvf/README.md b/target/i386/hvf/README.md
index 0d27a0d52b58..2d33477aca50 100644
--- a/target/i386/hvf/README.md
+++ b/target/i386/hvf/README.md
@@ -2,6 +2,6 @@
 
 These sources (and ../hvf-all.c) are adapted from Veertu Inc's vdhh (Veertu Desktop Hosted Hypervisor) (last known location: https://github.com/veertuinc/vdhh) with some minor changes, the most significant of which were:
 
-1. Adapt to our current QEMU's `CPUState` structure and `address_space_rw` API; many struct members have been moved around (emulated x86 state, kvm_xsave_buf) due to historical differences + QEMU needing to handle more emulation targets.
+1. Adapt to our current QEMU's `CPUState` structure and `address_space_rw` API; many struct members have been moved around (emulated x86 state, xsave_buf) due to historical differences + QEMU needing to handle more emulation targets.
 2. Removal of `apic_page` and hyperv-related functionality.
 3. More relaxed use of `qemu_mutex_lock_iothread`.
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index df69e6d0a7af..5db167df98e6 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -587,7 +587,7 @@ int hvf_init_vcpu(CPUState *cpu)
     hvf_reset_vcpu(cpu);
 
     x86cpu = X86_CPU(cpu);
-    x86cpu->env.kvm_xsave_buf = qemu_memalign(4096, 4096);
+    x86cpu->env.xsave_buf = qemu_memalign(4096, 4096);
 
     hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_STAR, 1);
     hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_LSTAR, 1);
diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
index 6c88939b968b..df8e946fbcde 100644
--- a/target/i386/hvf/x86hvf.c
+++ b/target/i386/hvf/x86hvf.c
@@ -75,7 +75,7 @@ void hvf_put_xsave(CPUState *cpu_state)
 
     struct X86XSaveArea *xsave;
 
-    xsave = X86_CPU(cpu_state)->env.kvm_xsave_buf;
+    xsave = X86_CPU(cpu_state)->env.xsave_buf;
 
     x86_cpu_xsave_all_areas(X86_CPU(cpu_state), xsave);
 
@@ -163,7 +163,7 @@ void hvf_get_xsave(CPUState *cpu_state)
 {
     struct X86XSaveArea *xsave;
 
-    xsave = X86_CPU(cpu_state)->env.kvm_xsave_buf;
+    xsave = X86_CPU(cpu_state)->env.xsave_buf;
 
     if (hv_vcpu_read_fpstate(cpu_state->hvf_fd, (void*)xsave, 4096)) {
         abort();
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 0b2a07d3a47b..c1cd8c461fe4 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -1189,7 +1189,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
     }
 
     if (has_xsave) {
-        env->kvm_xsave_buf = qemu_memalign(4096, sizeof(struct kvm_xsave));
+        env->xsave_buf = qemu_memalign(4096, sizeof(struct kvm_xsave));
     }
     cpu->kvm_msr_buf = g_malloc0(MSR_BUF_SIZE);
 
@@ -1639,7 +1639,7 @@ ASSERT_OFFSET(XSAVE_PKRU, pkru_state);
 static int kvm_put_xsave(X86CPU *cpu)
 {
     CPUX86State *env = &cpu->env;
-    X86XSaveArea *xsave = env->kvm_xsave_buf;
+    X86XSaveArea *xsave = env->xsave_buf;
 
     if (!has_xsave) {
         return kvm_put_fpu(cpu);
@@ -2081,7 +2081,7 @@ static int kvm_get_fpu(X86CPU *cpu)
 static int kvm_get_xsave(X86CPU *cpu)
 {
     CPUX86State *env = &cpu->env;
-    X86XSaveArea *xsave = env->kvm_xsave_buf;
+    X86XSaveArea *xsave = env->xsave_buf;
     int ret;
 
     if (!has_xsave) {
-- 
2.16.1

* [PATCH 2/2] KVM: i386: Add support for save and restore nested state
@ 2018-09-14  0:38 ` Liran Alon
  0 siblings, 0 replies; 15+ messages in thread
From: Liran Alon @ 2018-09-14  0:38 UTC (permalink / raw)
  To: qemu-devel, rth, ehabkost
  Cc: idan.brown, kvm, mtosatti, Liran Alon, pbonzini, jmattson

Kernel commit 8fcc4b5923af ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")
introduced new IOCTLs to extract and restore the KVM internal state used
to run a VM that is in VMX operation.

Utilize these IOCTLs to add support for migrating VMs which are running
nested hypervisors.

Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
Reviewed-by: Patrick Colp <patrick.colp@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
---
 accel/kvm/kvm-all.c   | 14 +++++++++++++
 include/sysemu/kvm.h  |  1 +
 target/i386/cpu.h     |  2 ++
 target/i386/kvm.c     | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++
 target/i386/machine.c | 38 +++++++++++++++++++++++++++++++++
 5 files changed, 113 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index de12f78eb8e4..3f0d0dd741b9 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -87,6 +87,7 @@ struct KVMState
 #ifdef KVM_CAP_SET_GUEST_DEBUG
     struct kvm_sw_breakpoint_head kvm_sw_breakpoints;
 #endif
+    uint32_t nested_state_len;
     int many_ioeventfds;
     int intx_set_mask;
     bool sync_mmu;
@@ -1628,6 +1629,14 @@ static int kvm_init(MachineState *ms)
     s->debugregs = kvm_check_extension(s, KVM_CAP_DEBUGREGS);
 #endif
 
+    ret = kvm_check_extension(s, KVM_CAP_NESTED_STATE);
+    if (ret < 0) {
+        fprintf(stderr, "kvm failed to get size of nested state (%d)\n",
+                ret);
+        goto err;
+    }
+    s->nested_state_len = (uint32_t)ret;
+
 #ifdef KVM_CAP_IRQ_ROUTING
     kvm_direct_msi_allowed = (kvm_check_extension(s, KVM_CAP_SIGNAL_MSI) > 0);
 #endif
@@ -2187,6 +2196,11 @@ int kvm_has_debugregs(void)
     return kvm_state->debugregs;
 }
 
+uint32_t kvm_nested_state_length(void)
+{
+    return kvm_state->nested_state_len;
+}
+
 int kvm_has_many_ioeventfds(void)
 {
     if (!kvm_enabled()) {
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 0b64b8e06786..17dadaf57f40 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -210,6 +210,7 @@ bool kvm_has_sync_mmu(void);
 int kvm_has_vcpu_events(void);
 int kvm_has_robust_singlestep(void);
 int kvm_has_debugregs(void);
+uint32_t kvm_nested_state_length(void);
 int kvm_has_pit_state2(void);
 int kvm_has_many_ioeventfds(void);
 int kvm_has_gsi_routing(void);
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 6e4c2b02f947..3b97b5b280f0 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1330,6 +1330,8 @@ typedef struct CPUX86State {
 #if defined(CONFIG_KVM) || defined(CONFIG_HVF)
     void *xsave_buf;
 #endif
+    struct kvm_nested_state *nested_state;
+    uint32_t nested_state_len;   /* needed for migration */
 #if defined(CONFIG_HVF)
     HVFX86EmulatorState *hvf_emul;
 #endif
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index c1cd8c461fe4..8160ab55909a 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -1191,6 +1191,22 @@ int kvm_arch_init_vcpu(CPUState *cs)
     if (has_xsave) {
         env->xsave_buf = qemu_memalign(4096, sizeof(struct kvm_xsave));
     }
+
+    env->nested_state_len = kvm_nested_state_length();
+    if (env->nested_state_len > 0) {
+        uint32_t min_nested_state_len =
+            offsetof(struct kvm_nested_state, size) + sizeof(uint32_t);
+
+        /*
+         * Verify that the nested state length covers at least the
+         * size field of struct kvm_nested_state.
+         */
+        assert(env->nested_state_len >= min_nested_state_len);
+
+        env->nested_state = g_malloc0(env->nested_state_len);
+        env->nested_state->size = env->nested_state_len;
+    }
+
     cpu->kvm_msr_buf = g_malloc0(MSR_BUF_SIZE);
 
     if (!(env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_RDTSCP)) {
@@ -2867,6 +2883,39 @@ static int kvm_get_debugregs(X86CPU *cpu)
     return 0;
 }
 
+static int kvm_put_nested_state(X86CPU *cpu)
+{
+    CPUX86State *env = &cpu->env;
+
+    if (kvm_nested_state_length() == 0) {
+        return 0;
+    }
+
+    assert(env->nested_state->size <= env->nested_state_len);
+    return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_NESTED_STATE, env->nested_state);
+}
+
+static int kvm_get_nested_state(X86CPU *cpu)
+{
+    CPUX86State *env = &cpu->env;
+
+    if (kvm_nested_state_length() == 0) {
+        return 0;
+    }
+
+
+    /*
+     * It is possible that migration restored a smaller size into
+     * nested_state->size than what our kernel supports.
+     * We preserve the migration origin's nested_state->size for the
+     * call to KVM_SET_NESTED_STATE, but want our next call to
+     * KVM_GET_NESTED_STATE to use the max size our kernel supports.
+     */
+    env->nested_state->size = env->nested_state_len;
+
+    return kvm_vcpu_ioctl(CPU(cpu), KVM_GET_NESTED_STATE, env->nested_state);
+}
+
 int kvm_arch_put_registers(CPUState *cpu, int level)
 {
     X86CPU *x86_cpu = X86_CPU(cpu);
@@ -2874,6 +2923,11 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
 
     assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
 
+    ret = kvm_put_nested_state(x86_cpu);
+    if (ret < 0) {
+        return ret;
+    }
+
     if (level >= KVM_PUT_RESET_STATE) {
         ret = kvm_put_msr_feature_control(x86_cpu);
         if (ret < 0) {
@@ -2989,6 +3043,10 @@ int kvm_arch_get_registers(CPUState *cs)
     if (ret < 0) {
         goto out;
     }
+    ret = kvm_get_nested_state(cpu);
+    if (ret < 0) {
+        goto out;
+    }
     ret = 0;
  out:
     cpu_sync_bndcs_hflags(&cpu->env);
diff --git a/target/i386/machine.c b/target/i386/machine.c
index 084c2c73a8f7..394953de8d1c 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -842,6 +842,43 @@ static const VMStateDescription vmstate_tsc_khz = {
     }
 };
 
+static int nested_state_post_load(void *opaque, int version_id)
+{
+    X86CPU *cpu = opaque;
+    CPUX86State *env = &cpu->env;
+
+    /*
+     * Verify that the size specified in the given struct is set
+     * to no more than the size that our kernel supports.
+     */
+    if (env->nested_state->size > env->nested_state_len) {
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static bool nested_state_needed(void *opaque)
+{
+    X86CPU *cpu = opaque;
+    CPUX86State *env = &cpu->env;
+    return (env->nested_state_len > 0);
+}
+
+static const VMStateDescription vmstate_nested_state = {
+    .name = "cpu/nested_state",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .post_load = nested_state_post_load,
+    .needed = nested_state_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_VBUFFER_UINT32(env.nested_state, X86CPU,
+                               0, NULL,
+                               env.nested_state_len),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static bool mcg_ext_ctl_needed(void *opaque)
 {
     X86CPU *cpu = opaque;
@@ -1080,6 +1117,7 @@ VMStateDescription vmstate_x86_cpu = {
         &vmstate_msr_intel_pt,
         &vmstate_msr_virt_ssbd,
         &vmstate_svm_npt,
+        &vmstate_nested_state,
         NULL
     }
 };
-- 
2.16.1

* Re: [PATCH 2/2] KVM: i386: Add support for save and restore nested state
@ 2018-09-14  7:16 ` Paolo Bonzini
  0 siblings, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2018-09-14  7:16 UTC (permalink / raw)
  To: Liran Alon, qemu-devel, rth, ehabkost; +Cc: idan.brown, mtosatti, kvm, jmattson

On 14/09/2018 02:38, Liran Alon wrote:
> Kernel commit 8fcc4b5923af ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")
> introduced new IOCTLs to extract and restore the KVM internal state
> used to run a VM that is in VMX operation.
> 
> Utilize these IOCTLs to add support for migrating VMs which are
> running nested hypervisors.
> 
> Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
> Reviewed-by: Patrick Colp <patrick.colp@oracle.com>
> Signed-off-by: Liran Alon <liran.alon@oracle.com>

Heh, I was going to send a similar patch.  However, things are a bit
more complex for two reasons.

First, I'd prefer to reuse the hflags and hflags2 fields that we already
have, and only store the VMCS blob in the subsection.  For example,
HF_SVMI_MASK is really the same as HF_GUEST_MASK in KVM source code and
KVM_STATE_NESTED_GUEST_MODE in the nested virt state.

More important, this:

>   
> +static int nested_state_post_load(void *opaque, int version_id) 
> +{ 
> +    X86CPU *cpu = opaque; 
> +    CPUX86State *env = &cpu->env; 
> + 
> +    /* 
> +     * Verify that the size specified in the given struct is set 
> +     * to no more than the size that our kernel supports 
> +     */ 
> +    if (env->nested_state->size > env->nested_state_len) { 
> +        return -EINVAL; 
> +    } 
> + 
> +    return 0; 
> +} 
> + 
> +static bool nested_state_needed(void *opaque) 

doesn't work if nested_state_len differs between source and destination,
and could overflow the nested_state buffer if nested_state_len is larger
on the source.

I'll send my version today or next Monday.

Thanks,

Paolo

* Re: [PATCH 2/2] KVM: i386: Add support for save and restore nested state
  2018-09-15 20:48       ` Liran Alon
  2018-09-15 20:57         ` Liran Alon
@ 2018-09-17 14:35         ` Paolo Bonzini
  1 sibling, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2018-09-17 14:35 UTC (permalink / raw)
  To: Liran Alon
  Cc: Idan Brown, ehabkost, kvm list, mtosatti, qemu-devel, rth, Jim Mattson

On 15/09/2018 22:48, Liran Alon wrote:
> 
> 
>> On 14 Sep 2018, at 18:08, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 14/09/2018 16:31, Liran Alon wrote:
>>>> There is still a problem, however, in that the same input stream would
>>>> be parsed differently depending on the kernel version.  In particular,
>>>> if in the future the maximum nested state size grows, you break all
>>>> existing save files.
>>>
>>> I’m not sure I agree with this.
>>> 1) Newer kernels should change struct only by utilizing unused fields in current struct
>>> or enlarging it with extra fields. It should not change the meaning of existing fields.
>>
>> Newer kernels will return a larger size, which is stored in
>> env->nested_state_len, and the file format depends on it:
>>
>>> +        VMSTATE_VBUFFER_UINT32(env.nested_state, X86CPU, 
>>> +                               0, NULL, 
>>> +                               env.nested_state_len), 
>>
> 
> Oh. I thought that QEMU would just receive into the dest buffer only what was sent from the source buffer.
> I didn’t know that it also enforces that the sizes of the source and dest buffers are equal.
> (I thought that dest_buffer_size only needed to be >= src_buffer_size.)

It's much less smart than that. :)  If env.nested_state_len is 12345,
it reads 12345 bytes.  There's no attempt at type checking.

> Is there a simple way to do this in QEMU’s VMSTATE framework without implementing custom save/load callbacks?

You can store the source's size in the stream and use VMSTATE_VALIDATE
to test it against the KVM capability.
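
Roughly like this (an untested sketch, reusing the field names from
your patch; the validator's name is made up):

    static bool nested_state_len_valid(void *opaque, int version_id)
    {
        X86CPU *cpu = opaque;

        /* Reject a stream whose blob is larger than this kernel accepts. */
        return cpu->env.nested_state_len <= kvm_nested_state_length();
    }

    /* ...and in vmstate_nested_state.fields, in this order: */
    VMSTATE_UINT32(env.nested_state_len, X86CPU),
    VMSTATE_VALIDATE("nested_state_len in range", nested_state_len_valid),
    VMSTATE_VBUFFER_UINT32(env.nested_state, X86CPU, 0, NULL,
                           env.nested_state_len),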


>> I agree that the value is small, but it's there.  For example, the
>> debugging commands support AMD nested paging, and sharing the flags
>> means that it works for KVM too, not just TCG.  So I'd prefer to do it
>> the same way for Intel too.
> 
> Can you explain this example? I’m not sure I follow.
> In the solution I proposed above, the nested_state fixed-size header is also shared between hypervisors.
> Thus, flags here are shared exactly the same as hflags are shared.
> Only the blob union-part is not necessarily shared between hypervisors (which makes sense, as it may differ).
> Am I missing something trivial?

There is already a "hypervisor" (TCG) that implements AMD nested paging,
so it makes sense to reuse the place that TCG uses for its hflags.

Paolo

* Re: [PATCH 2/2] KVM: i386: Add support for save and restore nested state
  2018-09-15 20:48       ` Liran Alon
@ 2018-09-15 20:57         ` Liran Alon
  2018-09-17 14:35         ` Paolo Bonzini
  1 sibling, 0 replies; 15+ messages in thread
From: Liran Alon @ 2018-09-15 20:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Idan Brown, ehabkost, kvm list, mtosatti, qemu-devel, rth, Jim Mattson



> On 15 Sep 2018, at 23:48, Liran Alon <liran.alon@oracle.com> wrote:
> 
> 
> 
>> On 14 Sep 2018, at 18:08, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> 
>> On 14/09/2018 16:31, Liran Alon wrote:
>>>> There is still a problem, however, in that the same input stream would
>>>> be parsed differently depending on the kernel version.  In particular,
>>>> if in the future the maximum nested state size grows, you break all
>>>> existing save files.
>>> 
>>> I’m not sure I agree with this.
>>> 1) Newer kernels should change the struct only by utilizing unused fields in the current struct
>>> or enlarging it with extra fields. It should not change the meaning of existing fields.
>> 
>> Newer kernels will return a larger size, which is stored in
>> env->nested_state_len, and the file format depends on it:
>> 
>>> +        VMSTATE_VBUFFER_UINT32(env.nested_state, X86CPU, 
>>> +                               0, NULL, 
>>> +                               env.nested_state_len), 
>> 
> 
> Oh. I thought that QEMU would just receive into the dest buffer only what was sent from the source buffer.
> I didn’t know that it also enforces that the sizes of the source and dest buffers are equal.
> (I thought that dest_buffer_size only needed to be >= src_buffer_size.)
> 
> Anyway, my intention here was that QEMU would only enforce (dest_buffer_size >= source_buffer_size)
> and, if so, receive the source buffer into the destination buffer.
> Is there a simple way to do this in QEMU’s VMSTATE framework without implementing custom save/load callbacks?
> 

One possible, but not so elegant, way to do so is to put env.nested_state_len into the VMState
and declare env.nested_state in the VMState as VMSTATE_VBUFFER_ALLOC_UINT32().
Then, make nested_state_post_load() realloc env.nested_state to the size specified by kvm_nested_state_length(),
guaranteeing that env.nested_state is always at the max size that the local kernel supports.
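
In code, the idea would look roughly like this (an untested sketch):

    static int nested_state_post_load(void *opaque, int version_id)
    {
        X86CPU *cpu = opaque;
        CPUX86State *env = &cpu->env;
        uint32_t max_len = kvm_nested_state_length();

        if (env->nested_state_len > max_len) {
            return -EINVAL;  /* source blob bigger than local kernel max */
        }

        /* Grow the buffer back to the local kernel's max size so that
         * the next KVM_GET_NESTED_STATE can use the full size. */
        env->nested_state = g_realloc(env->nested_state, max_len);
        memset((uint8_t *)env->nested_state + env->nested_state_len, 0,
               max_len - env->nested_state_len);
        env->nested_state_len = max_len;

        return 0;
    }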

-Liran

* Re: [PATCH 2/2] KVM: i386: Add support for save and restore nested state
  2018-09-14 15:08     ` Paolo Bonzini
@ 2018-09-15 20:48       ` Liran Alon
  2018-09-15 20:57         ` Liran Alon
  2018-09-17 14:35         ` Paolo Bonzini
  0 siblings, 2 replies; 15+ messages in thread
From: Liran Alon @ 2018-09-15 20:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Idan Brown, ehabkost, kvm list, mtosatti, qemu-devel, rth, Jim Mattson



> On 14 Sep 2018, at 18:08, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 14/09/2018 16:31, Liran Alon wrote:
>>> There is still a problem, however, in that the same input stream would
>>> be parsed differently depending on the kernel version.  In particular,
>>> if in the future the maximum nested state size grows, you break all
>>> existing save files.
>> 
>> I’m not sure I agree with this.
>> 1) Newer kernels should change the struct only by utilizing unused fields in the current struct
>> or enlarging it with extra fields. It should not change the meaning of existing fields.
> 
> Newer kernels will return a larger size, which is stored in
> env->nested_state_len, and the file format depends on it:
> 
>> +        VMSTATE_VBUFFER_UINT32(env.nested_state, X86CPU, 
>> +                               0, NULL, 
>> +                               env.nested_state_len), 
> 

Oh. I thought that QEMU would just receive into the dest buffer only what was sent from the source buffer.
I didn’t know that it also enforces that the sizes of the source and dest buffers are equal.
(I thought that dest_buffer_size only needed to be >= src_buffer_size.)

Anyway, my intention here was that QEMU would only enforce (dest_buffer_size >= source_buffer_size)
and, if so, receive the source buffer into the destination buffer.
Is there a simple way to do this in QEMU’s VMSTATE framework without implementing custom save/load callbacks?

>>> 
>>> By defining more HF flags.  I'd rather avoid having multiple ways to
>>> store the same thing, in case for example in the future HVF implements
>>> nested virt.
>> 
>> I agree it is possible to define more hflags and to translate kvm_nested_flags->flags to flags in hflags and vice-versa.
>> However, I am not sure I understand the extra value provided by doing so.
>> I think it makes more sense to rename struct kvm_nested_state to struct nested_state and
>> use this as a generic interface to get/set nested_state for all hypervisors.
>> If a given hypervisor, such as HVF, needs to store a different blob than KVM VMX, it can just add another struct field to
>> the union-part of struct kvm_nested_state.
>> This way, the code that handles the save/restore of nested_state can remain generic for all hypervisors.
> 
> I agree that the value is small, but it's there.  For example, the
> debugging commands support AMD nested paging, and sharing the flags
> means that it works for KVM too, not just TCG.  So I'd prefer to do it
> the same way for Intel too.

Can you explain this example? I’m not sure I follow.
In the solution I proposed above, the nested_state fixed-size header is also shared between hypervisors.
Thus, flags here are shared exactly the same as hflags are shared.
Only the blob union-part is not necessarily shared between hypervisors (which makes sense, as it may differ).
Am I missing something trivial?

-Liran

> 
> Paolo

* Re: [PATCH 2/2] KVM: i386: Add support for save and restore nested state
  2018-09-14 14:31   ` Liran Alon
@ 2018-09-14 15:08     ` Paolo Bonzini
  2018-09-15 20:48       ` Liran Alon
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2018-09-14 15:08 UTC (permalink / raw)
  To: Liran Alon
  Cc: Idan Brown, ehabkost, kvm list, mtosatti, qemu-devel, rth, Jim Mattson

On 14/09/2018 16:31, Liran Alon wrote:
>> There is still a problem, however, in that the same input stream would
>> be parsed differently depending on the kernel version.  In particular,
>> if in the future the maximum nested state size grows, you break all
>> existing save files.
> 
> I’m not sure I agree with this.
> 1) Newer kernels should change the struct only by utilizing unused fields in the current struct
> or enlarging it with extra fields. It should not change the meaning of existing fields.

Newer kernels will return a larger size, which is stored in
env->nested_state_len, and the file format depends on it:

> +        VMSTATE_VBUFFER_UINT32(env.nested_state, X86CPU, 
> +                               0, NULL, 
> +                               env.nested_state_len), 

>>
>> By defining more HF flags.  I'd rather avoid having multiple ways to
>> store the same thing, in case for example in the future HVF implements
>> nested virt.
> 
> I agree it is possible to define more hflags and to translate kvm_nested_flags->flags to flags in hflags and vice-versa.
> However, I am not sure I understand the extra value provided by doing so.
> I think it makes more sense to rename struct kvm_nested_state to struct nested_state and
> use this as a generic interface to get/set nested_state for all hypervisors.
> If a given hypervisor, such as HVF, needs to store a different blob than KVM VMX, it can just add another struct field to
> the union-part of struct kvm_nested_state.
> This way, the code that handles the save/restore of nested_state can remain generic for all hypervisors.

I agree that the value is small, but it's there.  For example, the
debugging commands support AMD nested paging, and sharing the flags
means that it works for KVM too, not just TCG.  So I'd prefer to do it
the same way for Intel too.

Paolo

* Re: [PATCH 2/2] KVM: i386: Add support for save and restore nested state
  2018-09-14 10:59 ` Paolo Bonzini
@ 2018-09-14 14:31   ` Liran Alon
  2018-09-14 15:08     ` Paolo Bonzini
  0 siblings, 1 reply; 15+ messages in thread
From: Liran Alon @ 2018-09-14 14:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Idan Brown, ehabkost, kvm list, mtosatti, qemu-devel, rth, Jim Mattson


> On 14 Sep 2018, at 13:59, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 14/09/2018 11:54, Liran Alon wrote:
>>> On 14/09/2018 09:16, Paolo Bonzini wrote:
>>> Heh, I was going to send a similar patch.  However, things are a bit
>>> more complex for two reasons.
>>> 
>>> First, I'd prefer to reuse the hflags and hflags2 fields that we already
>>> have, and only store the VMCS blob in the subsection.  For example,
>>> HF_SVMI_MASK is really the same as HF_GUEST_MASK in KVM source code and
>>> KVM_STATE_NESTED_GUEST_MODE in the nested virt state.
>>> 
>> 
>> Do you mean you intend to only save/restore the “vmx” field in struct kvm_nested_state?
>> (That is, struct kvm_vmx_nested_state)
>> If yes, that is problematic as kvm_nested_state.flags also holds other flags besides KVM_STATE_NESTED_GUEST_MODE.
>> How do you expect to save/restore for example the vmx->nested.nested_run_pending flag that is specified in KVM_STATE_NESTED_RUN_PENDING?
> 
> By defining more HF flags.  I'd rather avoid having multiple ways to
> store the same thing, in case for example in the future HVF implements
> nested virt.

I agree it is possible to define more hflags and to translate kvm_nested_flags->flags to flags in hflags and vice-versa.
However, I am not sure I understand the extra value provided by doing so.
I think it makes more sense to rename struct kvm_nested_state to struct nested_state and
use this as a generic interface to get/set nested_state for all hypervisors.
If a given hypervisor, such as HVF, needs to store a different blob than KVM VMX, it can just add another struct field to
the union-part of struct kvm_nested_state.
This way, the code that handles the save/restore of nested_state can remain generic for all hypervisors.

> 
>> In addition, why is it important to avoid saving/restoring the entire kvm_nested_state struct?
>> It seems to simplify things to just save/restore the entire struct.
> 
> It is indeed simpler, but it adds unnecessary KVM specificities.

I’m not sure I understand this argument.

> 
>>>> +static int nested_state_post_load(void *opaque, int version_id)
>>>> +{
>>>> +    X86CPU *cpu = opaque;
>>>> +    CPUX86State *env = &cpu->env;
>>>> +
>>>> +    /*
>>>> +     * Verify that the size specified in the given struct is set
>>>> +     * to no more than the size that our kernel supports
>>>> +     */
>>>> +    if (env->nested_state->size > env->nested_state_len) {
>>>> +        return -EINVAL;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static bool nested_state_needed(void *opaque)
>>> 
>>> doesn't work if nested_state_len differs between source and destination,
>>> and could overflow the nested_state buffer if nested_state_len is larger
>>> on the source.
>> 
>> This is not accurate.
>> I have actually given a lot of thought to this aspect in the patch.
>> 
>> The above post_load() method actually prevents an overflow from happening on the dest.
>> Note that env->nested_state_len is not passed as part of the migration stream.
>> It is only set by the local kernel's KVM_CAP_NESTED_STATE.
>> 
>> Therefore, we allow the following migrations to execute successfully:
>> 1) Migration from a source with a smaller KVM_CAP_NESTED_STATE to a dest with a bigger one.
>> The above post_load() check will succeed, as the size specified in the migrated nested_state->size
>> is smaller than the dest KVM_CAP_NESTED_STATE (stored in env->nested_state_len).
>> 2) Migration from source to dest when they both have the same KVM_CAP_NESTED_STATE size.
>> Obvious.
>> 3) Migration from source to dest when the source has a bigger KVM_CAP_NESTED_STATE than the dest.
>> This will only succeed in case the size specified in nested_state->size is no larger than the dest KVM_CAP_NESTED_STATE.
> 
> You're right (I got confused with my implementation).
> 
> There is still a problem, however, in that the same input stream would
> be parsed differently depending on the kernel version.  In particular,
> if in the future the maximum nested state size grows, you break all
> existing save files.

I’m not sure I agree with this.
1) Newer kernels should change the struct only by utilizing unused fields in the current struct
or enlarging it with extra fields. It should not change the meaning of existing fields.
2) Newer kernels need to be able to parse and generate structs of the sizes used by older kernels.
Otherwise, you won’t be able to migrate from older kernels to newer kernels and vice-versa.
3) The thing that is really missing here in my patch is that the source doesn’t know which version
of the struct it should ask the kernel to generate in order for the dest to be able to parse it.
For that, the source needs to somehow know the dest's max compatible struct size
and request that the kernel generate a struct of that size.
Is there any mechanism in QEMU which I can utilize to do such logic?

-Liran

> 
> Paolo

* Re: [PATCH 2/2] KVM: i386: Add support for save and restore nested state
  2018-09-14  9:54 Liran Alon
@ 2018-09-14 10:59 ` Paolo Bonzini
  2018-09-14 14:31   ` Liran Alon
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2018-09-14 10:59 UTC (permalink / raw)
  To: Liran Alon
  Cc: Idan Brown, ehabkost, kvm list, mtosatti, qemu-devel, rth, Jim Mattson

On 14/09/2018 11:54, Liran Alon wrote:
>> On 14/09/2018 09:16, Paolo Bonzini wrote:
>> Heh, I was going to send a similar patch.  However, things are a bit
>> more complex for two reasons.
>>
>> First, I'd prefer to reuse the hflags and hflags2 fields that we already
>> have, and only store the VMCS blob in the subsection.  For example,
>> HF_SVMI_MASK is really the same as HF_GUEST_MASK in KVM source code and
>> KVM_STATE_NESTED_GUEST_MODE in the nested virt state.
>>
> 
> Do you mean you intend to only save/restore the “vmx” field in struct kvm_nested_state?
> (That is, struct kvm_vmx_nested_state)
> If yes, that is problematic as kvm_nested_state.flags also holds other flags besides KVM_STATE_NESTED_GUEST_MODE.
> How do you expect to save/restore for example the vmx->nested.nested_run_pending flag that is specified in KVM_STATE_NESTED_RUN_PENDING?

By defining more HF flags.  I'd rather avoid having multiple ways to
store the same thing, in case for example in the future HVF implements
nested virt.
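
Concretely, the translation could look along these lines (an
illustrative sketch only; the HF2_NESTED_RUN_PENDING_MASK name is made
up, not an existing QEMU flag):

    /* Fold KVM's nested-state flags into hflags on get... */
    if (state->flags & KVM_STATE_NESTED_GUEST_MODE) {
        env->hflags |= HF_SVMI_MASK;    /* == HF_GUEST_MASK in KVM terms */
    } else {
        env->hflags &= ~HF_SVMI_MASK;
    }
    /* ...and rebuild them on set. KVM_STATE_NESTED_RUN_PENDING would
     * need a new flag, e.g. a hypothetical HF2_NESTED_RUN_PENDING_MASK
     * in hflags2. */
    state->flags = 0;
    if (env->hflags & HF_SVMI_MASK) {
        state->flags |= KVM_STATE_NESTED_GUEST_MODE;
    }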

> In addition, why is it important to avoid saving/restoring the entire kvm_nested_state struct?
> It seems to simplify things to just save/restore the entire struct.

It is indeed simpler, but it adds unnecessary KVM specificities.

>>> +static int nested_state_post_load(void *opaque, int version_id)
>>> +{
>>> +    X86CPU *cpu = opaque;
>>> +    CPUX86State *env = &cpu->env;
>>> +
>>> +    /*
>>> +     * Verify that the size specified in the given struct is set
>>> +     * to no more than the size that our kernel supports
>>> +     */
>>> +    if (env->nested_state->size > env->nested_state_len) {
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static bool nested_state_needed(void *opaque)
>>
>> doesn't work if nested_state_len differs between source and destination,
>> and could overflow the nested_state buffer if nested_state_len is larger
>> on the source.
> 
> This is not accurate.
> I have actually given a lot of thought to this aspect in the patch.
> 
> The above post_load() method actually prevents an overflow from happening on the dest.
> Note that env->nested_state_len is not passed as part of the migration stream.
> It is only set by the local kernel's KVM_CAP_NESTED_STATE.
> 
> Therefore, we allow the following migrations to execute successfully:
> 1) Migration from a source with a smaller KVM_CAP_NESTED_STATE to a dest with a bigger one.
> The above post_load() check will succeed, as the size specified in the migrated nested_state->size
> is smaller than the dest KVM_CAP_NESTED_STATE (stored in env->nested_state_len).
> 2) Migration from source to dest when they both have the same KVM_CAP_NESTED_STATE size.
> Obvious.
> 3) Migration from source to dest when the source has a bigger KVM_CAP_NESTED_STATE than the dest.
> This will only succeed in case the size specified in nested_state->size is no larger than the dest KVM_CAP_NESTED_STATE.

You're right (I got confused with my implementation).

There is still a problem, however, in that the same input stream would
be parsed differently depending on the kernel version.  In particular,
if in the future the maximum nested state size grows, you break all
existing save files.

Paolo

* Re: [PATCH 2/2] KVM: i386: Add support for save and restore nested state
@ 2018-09-14  9:54 Liran Alon
  2018-09-14 10:59 ` Paolo Bonzini
  0 siblings, 1 reply; 15+ messages in thread
From: Liran Alon @ 2018-09-14  9:54 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Idan Brown, ehabkost, kvm list, mtosatti, qemu-devel, rth, Jim Mattson

>On 14/09/2018 09:16, Paolo Bonzini wrote:
>Heh, I was going to send a similar patch.  However, things are a bit
>more complex for two reasons.
>
>First, I'd prefer to reuse the hflags and hflags2 fields that we already
>have, and only store the VMCS blob in the subsection.  For example,
>HF_SVMI_MASK is really the same as HF_GUEST_MASK in KVM source code and
>KVM_STATE_NESTED_GUEST_MODE in the nested virt state.
>

Do you mean you intend to only save/restore the “vmx” field in struct kvm_nested_state?
(That is, struct kvm_vmx_nested_state)
If yes, that is problematic as kvm_nested_state.flags also holds other flags besides KVM_STATE_NESTED_GUEST_MODE.
How do you expect to save/restore for example the vmx->nested.nested_run_pending flag that is specified in KVM_STATE_NESTED_RUN_PENDING?

In addition, why is it important to avoid saving/restoring the entire kvm_nested_state struct?
It seems to simplify things to just save/restore the entire struct.

>More important, this:
>
>>
>> +static int nested_state_post_load(void *opaque, int version_id)
>> +{
>> +    X86CPU *cpu = opaque;
>> +    CPUX86State *env = &cpu->env;
>> +
>> +    /*
>> +     * Verify that the size specified in the given struct is set
>> +     * to no more than the size that our kernel supports
>> +     */
>> +    if (env->nested_state->size > env->nested_state_len) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static bool nested_state_needed(void *opaque)
>
>doesn't work if nested_state_len differs between source and destination,
>and could overflow the nested_state buffer if nested_state_len is larger
>on the source.

This is not accurate.
I have actually given a lot of thought to this aspect in the patch.

The above post_load() method actually prevents an overflow from happening on the dest.
Note that env->nested_state_len is not passed as part of the migration stream.
It is only set by the local kernel's KVM_CAP_NESTED_STATE.

Therefore, we allow the following migrations to execute successfully:
1) Migration from a source with a smaller KVM_CAP_NESTED_STATE to a dest with a bigger one.
The above post_load() check will succeed, as the size specified in the migrated nested_state->size
is smaller than the dest KVM_CAP_NESTED_STATE (stored in env->nested_state_len).
2) Migration from source to dest when they both have the same KVM_CAP_NESTED_STATE size.
Obvious.
3) Migration from source to dest when the source has a bigger KVM_CAP_NESTED_STATE than the dest.
This will only succeed in case the size specified in nested_state->size is no larger than the dest KVM_CAP_NESTED_STATE.

-Liran

>
>I'll send my version today or next Monday.
>
>Thanks,
>
>Paolo
>

Thread overview:
2018-09-14  0:38 [PATCH 0/2]: KVM: i386: Add support for save and restore nested state Liran Alon
2018-09-14  0:38 ` [PATCH 1/2] i386: Compile CPUX86State xsave_buf only when KVM or HVF is supported Liran Alon
2018-09-14  0:38 ` [PATCH 2/2] KVM: i386: Add support for save and restore nested state Liran Alon
2018-09-14  7:16   ` Paolo Bonzini
2018-09-14  9:54 Liran Alon
2018-09-14 10:59 ` Paolo Bonzini
2018-09-14 14:31   ` Liran Alon
2018-09-14 15:08     ` Paolo Bonzini
2018-09-15 20:48       ` Liran Alon
2018-09-15 20:57         ` Liran Alon
2018-09-17 14:35         ` Paolo Bonzini